
CN116720812A - A big data smart warehousing management system based on data encoding - Google Patents


Info

Publication number
CN116720812A
CN116720812A
Authority
CN
China
Prior art keywords
character
coding
data
data information
information sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311007459.XA
Other languages
Chinese (zh)
Other versions
CN116720812B
Inventor
王阳
王国超
王浩
王亮
曹磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Hengyide Machinery Co ltd
Original Assignee
Hefei Hengyide Machinery Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Hengyide Machinery Co ltd filed Critical Hefei Hengyide Machinery Co ltd
Priority to CN202311007459.XA
Publication of CN116720812A
Application granted
Publication of CN116720812B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q 10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This application relates to the field of data processing and provides a big data smart warehousing management system based on data encoding. The management system includes: a data collection module for acquiring the data to be processed, which comprises multiple data information sequences, each corresponding to the management parameters of one commodity; an encoding module for encoding each data information sequence in the data to be processed to obtain the actual encoding result corresponding to each sequence; a representative character determination module for determining the representative characters of each data information sequence based on the difference between the actual encoding result and an ideal encoding result; and a clustering module for clustering the data to be processed based on the representative characters of each data information sequence to obtain a clustering result. Clustering on the representative characters of the data information sequences reduces the amount of computation, improves algorithm efficiency, and makes the clustering results more stable and accurate.

Description

Big data smart warehouse management system based on data encoding
Technical Field
The application relates to the field of data processing, in particular to a big data intelligent warehouse management system based on data coding.
Background
In big data processing, clustering operations typically require normalizing the data so that the various data dimensions can be compared and aggregated.
However, directly clustering the standardized data can produce weak homogeneity and unstable clustering results because of the huge data volume; at the same time, the features of some data are not distinctive enough, or are difficult to distinguish, so direct clustering rarely yields good results. The data therefore usually needs to be encoded before clustering: encoding reduces the data dimensionality on the one hand, and extracts key information and features on the other, which facilitates subsequent operations.
In a smart warehouse management system, each commodity in the warehouse carries many kinds of information. Cluster analysis of information such as a cargo's origin, destination, transportation mode, transportation time, and vehicle scheduling can optimize the warehouse logistics layout and scheduling and reduce logistics time and cost. Existing data encoding usually relies on frequent-itemset methods; when the data dimensionality is high, a frequent-itemset algorithm must compute the support of a large number of candidate itemsets and their subsets, which increases computational complexity and reduces algorithm efficiency.
Disclosure of Invention
The application provides a big data intelligent warehouse management system based on data coding, which can solve the problems of high computational complexity and low algorithm efficiency of the existing data coding when carrying out cluster analysis on various information in the intelligent warehouse management system.
In order to solve the technical problems, the first technical scheme adopted by the application is as follows: provided is a big data intelligent warehouse management system based on data coding, comprising:
the data acquisition module is used for acquiring data to be processed, wherein the data to be processed comprises a plurality of data information sequences, and each data information sequence corresponds to a management parameter of each commodity;
the coding module is used for coding each data information sequence in the data to be processed, so as to obtain an actual coding result corresponding to each data information sequence;
the representative character determining module is used for determining representative characters of each data information sequence based on the difference between the actual coding result and the ideal coding result;
and the clustering module is used for clustering the data to be processed based on the representative characters of each data information sequence, so as to obtain a clustering result.
In an alternative embodiment, the encoding module is configured to:
coding the data information sequences by utilizing the character arrangement mode of each data information sequence in the data to be processed, so as to obtain an actual coding result corresponding to each data information sequence;
and coding the data information sequences by utilizing the character dictionary sequence of each data information sequence in the data to be processed, so as to obtain an ideal coding result corresponding to each data information sequence.
In an alternative embodiment, the encoding module includes:
the first coding module is used for determining the arrangement mode combination of all characters in each data information sequence by utilizing a full arrangement algorithm, the arrangement mode combination comprises a plurality of character sequences, each character sequence represents a character arrangement mode, the data information sequence is coded by utilizing a BWT coding mode based on the plurality of character sequences in the arrangement mode combination so as to obtain a plurality of first coding results, and the plurality of first coding results form the actual coding result;
the second coding module is used for determining the character dictionary sequence of each data information sequence in the data to be processed, and coding the data information sequences by utilizing a BWT coding mode based on the dictionary character sequences so as to obtain a plurality of second coding results, wherein the ideal coding results are formed by the plurality of second coding results; the character dictionary sequence comprises a plurality of dictionary sequences, and the first coding result corresponds to the second coding result one by one.
In an alternative embodiment, the representative character determination module includes:
the difference calculation module is used for determining the difference of each character in each data information sequence based on the difference between the actual coding result and the ideal coding result of each data information sequence;
and the character determining module is used for determining the representative characters of the data information sequence based on the difference of each character.
In an alternative embodiment, the difference calculating module is configured to:
determining the comprehensive difference of each character in each data information sequence based on the difference between the actual encoding result and the ideal encoding result of each data information sequence;
the variability of each character in the data information sequence is calculated based on the integrated variability of each character and the frequency with which the characters appear in the data information sequence.
In an alternative embodiment, the difference calculating module is configured to:
calculating, for each character, the difference between the character's coding distance in each first coding result of the actual coding result and its dictionary distance in the corresponding second coding result of the ideal coding result; taking the ratio of the absolute value of that difference to the larger value as the distance difference; averaging all the calculated distance differences; and taking the average as the comprehensive difference of each character in each data information sequence. The larger value is the larger of the character's coding distance in the first coding result and its dictionary distance in the corresponding second coding result.
In an alternative embodiment, the difference calculating module is configured to:
calculating the dictionary distance of each character of the second coding result based on the dictionary character distance sequence corresponding to the second coding result; wherein each element in the dictionary character distance sequence is a dictionary distance between two characters.
In an alternative embodiment, the difference calculating module is configured to:
calculating the sum of average distances between the current character and all reference characters in the first coding result, and calculating the distance between the current character and the reference characters based on the calculated sum and the occurrence times of the current character in the first coding result, so as to obtain a coding character distance sequence of the first coding result, wherein each element in the coding character distance sequence is the coding distance between two characters;
and calculating the coding distance of each character of the first coding result based on the coding character distance sequence corresponding to the first coding result.
In an alternative embodiment, the character determining module is configured to:
normalizing the difference of each character;
taking the characters with the difference smaller than a preset value after normalization processing as candidate characters;
and determining the representative character of the data information sequence based on the frequency of the candidate character and the difference after normalization processing.
In an alternative embodiment, the character determining module is further configured to:
calculating the ratio of the frequency of each candidate character to the difference after normalization processing;
and taking the candidate character with the ratio larger than 1 as a representative character of the data information sequence.
The application has the beneficial effects that the big data intelligent warehouse management system based on the data coding, which is different from the prior art, comprises: the data acquisition module is used for acquiring data to be processed, wherein the data to be processed comprises a plurality of data information sequences, and each data information sequence corresponds to a management parameter of each commodity; the coding module is used for coding each data information sequence in the data to be processed, so as to obtain an actual coding result corresponding to each data information sequence; the representative character determining module is used for determining representative characters of each data information sequence based on the difference between the actual coding result and the ideal coding result; and the clustering module is used for clustering the data to be processed based on the representative characters of each data information sequence, so as to obtain a clustering result. According to the method, different data information sequences are clustered through the representative characters representing the data information sequences, so that the calculated amount can be reduced, the algorithm efficiency is improved, and the stability and the accuracy of the clustering result are higher.
Drawings
FIG. 1 is a schematic diagram of a big data intelligent warehouse management system based on data encoding according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of the encoding module of FIG. 1;
fig. 3 is a schematic diagram of an embodiment of the character determining module shown in fig. 1.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The present application will be described in detail with reference to the accompanying drawings and examples.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an embodiment of a big data intelligent warehouse management system based on data encoding according to the present application.
Specifically, the big data intelligent warehouse management system 100 based on data coding provided by the application comprises a data acquisition module 10, a coding module 20, a representative character determining module 30 and a clustering module 40.
The data acquisition module 10 is configured to acquire data to be processed, where the data to be processed includes a plurality of data information sequences, and each data information sequence corresponds to a management parameter of each commodity. The management parameters include high-dimensional information such as the source, destination, transportation mode, transportation time, vehicle scheduling information and the like of the commodity.
The encoding module 20 is configured to encode each data information sequence in the data to be processed, so as to obtain an actual encoding result corresponding to each data information sequence. Specifically, the encoding module 20 is configured to: and coding the data information sequences by utilizing the character arrangement mode of each data information sequence in the data to be processed, so as to obtain the actual coding result corresponding to each data information sequence. And coding the data information sequences by utilizing the character dictionary sequence of each data information sequence in the data to be processed, so as to obtain an ideal coding result corresponding to each data information sequence.
In one embodiment, referring to fig. 2, the encoding module 20 includes: a first encoding module 21 and a second encoding module 22. The first encoding module 21 is configured to determine an arrangement combination of all characters in each data information sequence by using a full permutation algorithm, where the arrangement combination includes a plurality of character sequences, each character sequence represents a character arrangement, and encode the data information sequence by using a BWT encoding method based on the plurality of character sequences in the arrangement combination to obtain a plurality of first encoding results, where the plurality of first encoding results form the actual encoding result. The second encoding module 22 is configured to determine a character dictionary sequence of each data information sequence in the data to be processed, encode the data information sequence by using a BWT encoding manner based on the dictionary character sequence, so as to obtain a plurality of second encoding results, where the plurality of second encoding results form the ideal encoding result; the character dictionary sequence comprises a plurality of dictionary sequences, and the first coding result corresponds to the second coding result one by one.
BWT coding is a coding mode that rearranges the order of characters; the number of each character is unchanged before and after coding. In a BWT-coded data information sequence, occurrences of the same character are not necessarily adjacent, but characters that are close together in the dictionary character order can still end up adjacent in the coded data. That is, only the positions of characters change, while the relative distance between characters of similar dictionary order changes little. The smaller the change in relative distance, the closer the actual coding result is to the ideal coding result, the ideal coding result being one in which characters that are close in dictionary order remain close after coding. BWT coding is not fully ideal, and characters adjacent in the dictionary are not always placed together; the smaller the difference between the actual and ideal BWT coding results, the more tightly such characters are aligned under the sequential rearrangement, and the better those characters represent the nature of the data itself.
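As a concrete illustration of the BWT coding described above, the following sketch is hypothetical code, not part of the patent: the sentinel character `$` and the optional `order` parameter (a custom alphabet permutation, mimicking the per-permutation encodings produced by the full permutation algorithm) are assumptions for illustration. It builds the transform by sorting all rotations of the input and taking the last column:

```python
def bwt(s: str, order: str = "") -> str:
    """Burrows-Wheeler transform of s.

    All rotations of s + '$' are sorted and the last column is returned.
    If given, `order` is a permutation of the alphabet of s and defines a
    custom sort order (it must list every character appearing in s).
    """
    s = s + "$"  # sentinel, ranked smallest under any order
    if order:
        rank = {c: i for i, c in enumerate("$" + order)}
        key = lambda r: [rank[c] for c in r]
    else:
        key = None  # plain lexicographic sort
    rotations = sorted((s[i:] + s[:i] for i in range(len(s))), key=key)
    return "".join(r[-1] for r in rotations)
```

The output is a permutation of the input plus the sentinel, so the character counts are unchanged, as stated above; for example, `bwt("banana")` yields `"annb$aa"`, and changing `order` changes which characters end up adjacent without changing the multiset of characters.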
The present application thus provides a representative character determination module 30, the representative character determination module 30 being adapted to determine a representative character for each data information sequence based on the difference between the actual encoding result and the ideal encoding result. The clustering module 40 is configured to cluster the data to be processed based on the representative character of each data information sequence, so as to obtain a clustering result.
In one embodiment, referring to fig. 3, the representative character determining module 30 includes a difference calculating module 31 and a character determining module 32. The difference calculating module 31 is used for determining the variability of each character in each data information sequence based on the difference between the actual coding result and the ideal coding result of that sequence. In a specific embodiment, the difference calculating module 31 is configured to: determine the comprehensive difference of each character in each data information sequence based on the difference between the actual coding result and the ideal coding result of each data information sequence, and calculate the variability of each character in the data information sequence based on the comprehensive difference of each character and the frequency with which the character appears in the data information sequence.
Specifically, the difference calculating module 31 is configured to: calculating the difference between the coding distance of each character of each first coding result in the actual coding result and the dictionary distance of each character of a second coding result corresponding to the first coding result in the ideal coding result, taking the ratio of the absolute value of the calculated difference to a larger value as the distance difference, averaging all the calculated distance differences, and taking the calculated average as the comprehensive difference of each character in each data information sequence. The larger value is the larger value of the coding distance of each character of each first coding result in the actual coding result and the dictionary distance of each character of the second coding result corresponding to the first coding result in the ideal coding result.
First, the difference calculating module 31 is configured to calculate a dictionary distance of each character of the second encoding result based on a dictionary character distance sequence corresponding to the second encoding result; wherein each element in the dictionary character distance sequence is a dictionary distance between two characters.
In a specific embodiment, assume three characters a, b and c, whose dictionary character distance sequence in the second coding result is [d(a,b), d(a,c), d(b,c)], where d(x,y) denotes the distance between the two characters in brackets. With d(a,b) = 1, d(a,c) = 2 and d(b,c) = 1, the dictionary character distance sequence is [1, 2, 1]. The dictionary distance of each character of the second coding result can then be calculated from the dictionary character distance sequence as the average of that character's pairwise distances, as in formula (1):
d(a) = (d(a,b) + d(a,c)) / 2 = (1 + 2) / 2 = 1.5;
d(b) = (d(a,b) + d(b,c)) / 2 = (1 + 1) / 2 = 1;
d(c) = (d(a,c) + d(b,c)) / 2 = (2 + 1) / 2 = 1.5. (1)
and normalizing the calculated values to obtain the distances of the characters a, b and c, namely the dictionary distance of each character of the second coding result.
Further, the difference calculating module 31 is configured to calculate a sum of average distances between a current character and all reference characters in the first encoding result, calculate a distance between the current character and the reference characters based on the calculated sum and the number of times the current character appears in the first encoding result, and further obtain an encoding character distance sequence of the first encoding result, where each element in the encoding character distance sequence is an encoding distance between two characters; and calculating the coding distance of each character of the first coding result based on the coding character distance sequence corresponding to the first coding result.
In one embodiment, suppose the first coding result is abbaca. The coded character distance sequence is expressed as [d(a,b), d(a,c), d(b,c)]. To compute d(a,b), note that the character a appears three times in the first coding result. For each occurrence of a (recorded as the current character), take the average of its positional distances to all occurrences of b (recorded as the reference character); d(a,b) is then the sum of these three averages divided by 3, the number of times a appears in the first coding result. Numbering positions from 1, a occupies positions 1, 4 and 6 and b occupies positions 2 and 3, so d(a,b) = ((|1-2| + |1-3|)/2 + (|4-2| + |4-3|)/2 + (|6-2| + |6-3|)/2) / 3 = (1.5 + 1.5 + 3.5) / 3 ≈ 2.17. The results d(a,c) and d(b,c) are calculated in the same way, and d(a,b), d(a,c) and d(b,c) form the coded character distance sequence. The coding distance of each character in the first coding result, namely d(a), d(b) and d(c), is then calculated from the coded character distance sequence in the manner of formula (1) above, and will not be repeated here.
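The pairwise coding distance used in the abbaca example can be sketched as follows (illustrative code, not from the patent; zero-based indexing inside the function is an implementation choice, and the absolute positional distance is an assumption consistent with the description above):

```python
def coding_pair_distance(s: str, x: str, y: str) -> float:
    """d(x, y) within an encoded string s: for each occurrence of the
    current character x, average its positional distance to every
    occurrence of the reference character y; then divide the sum of
    these per-occurrence averages by the number of occurrences of x."""
    xs = [i for i, c in enumerate(s) if c == x]
    ys = [i for i, c in enumerate(s) if c == y]
    per_occurrence = [sum(abs(i - j) for j in ys) / len(ys) for i in xs]
    return sum(per_occurrence) / len(xs)
```

On the example string, `coding_pair_distance("abbaca", "a", "b")` returns 6.5 / 3 ≈ 2.17, and the same call with ("a", "c") and ("b", "c") completes the coded character distance sequence.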
Through the above-described process, the coding distance of each character of the first coding result is calculated, the dictionary distance of each character of the second coding result is calculated, the difference calculating module 31 further calculates the difference between the two, and the ratio of the absolute value of the calculated difference to the larger value is taken as the distance difference. The coding distance of each character of all the first coding results and the dictionary distance of each character of the second coding results are further calculated, the difference value of the coding distance and the dictionary distance is further calculated, and the corresponding distance difference is obtained in the same way. And (3) averaging all the calculated distance differences to obtain the comprehensive difference of each character in each data information sequence.
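The comprehensive difference just described, the ratio of the absolute coding-to-dictionary distance difference to the larger of the two distances, averaged over all corresponding coding-result pairs, can be sketched as follows (illustrative code; the guard returning 0 when both distances are zero is an added assumption, since the patent does not address that case):

```python
def distance_difference(coding_d: float, dict_d: float) -> float:
    """|coding distance - dictionary distance| divided by the larger value."""
    larger = max(coding_d, dict_d)
    return abs(coding_d - dict_d) / larger if larger else 0.0

def comprehensive_difference(coding_ds, dict_ds):
    """Average one character's distance differences over all corresponding
    first/second coding result pairs."""
    pairs = list(zip(coding_ds, dict_ds))
    return sum(distance_difference(c, d) for c, d in pairs) / len(pairs)
```

For instance, a character with coding distances [2.0, 1.5] against dictionary distances [1.0, 1.5] has distance differences [0.5, 0.0] and a comprehensive difference of 0.25.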
The variability of each character in the data information sequence is calculated based on the comprehensive difference of each character and the frequency with which the character appears in the data information sequence. In the per-character variability formula, one term denotes the frequency of occurrence of the i-th character in the data information sequence, another denotes the comprehensive difference of the i-th character, and n denotes the number of characters in the data information sequence. The formula also contains a difference weight; the larger this weight, the larger the character's variability.
The character determining module 32 is configured to determine a representative character of the data information sequence based on the variability of each character.
Specifically, for each data information sequence, the characters whose comprehensive difference is small across the different dictionary orders best represent the character distribution of that sequence. The characters with small comprehensive difference in each data information sequence are therefore obtained first; combining them with the character frequencies yields the variability of each character, which expresses how representative the character is of the data information sequence. The representative characters of the data information sequence are obtained from the variability of each character, and the different pieces of data are then clustered by their representative characters.
Specifically, the character determining module 32 normalizes the variability of each character; taking characters with the difference after normalization processing smaller than a preset value, for example, 0.6 as candidate characters; and determining the representative character of the data information sequence based on the frequency of the candidate character and the difference after normalization processing. In one embodiment, the character determination module 32 calculates a ratio of the frequency of each candidate character to the normalized difference; and taking the candidate character with the ratio larger than 1 as a representative character of the data information sequence.
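The selection steps just described, normalize the variability, keep characters below a threshold (0.6 in the example) as candidates, then keep candidates whose frequency-to-normalized-variability ratio exceeds 1, can be sketched as follows. Min-max normalization and the handling of a zero normalized variability are assumptions; the patent does not specify them:

```python
def representative_chars(variability: dict, freq: dict, threshold: float = 0.6) -> list:
    """Representative characters of one data information sequence."""
    lo, hi = min(variability.values()), max(variability.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    norm = {c: (v - lo) / span for c, v in variability.items()}
    # candidates: characters whose normalized variability is below the threshold
    candidates = [c for c, v in norm.items() if v < threshold]
    # keep candidates with frequency / normalized variability > 1;
    # a zero normalized variability is treated as maximally representative
    return [c for c in candidates if norm[c] == 0 or freq[c] / norm[c] > 1]
```

With variability `{"a": 0.1, "b": 0.5, "c": 0.9}` and frequencies `{"a": 0.5, "b": 0.3, "c": 0.2}`, only "a" survives both filters: "c" fails the threshold, and "b" has ratio 0.3 / 0.5 = 0.6.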
The clustering module 40 is configured to cluster the data to be processed based on the representative character of each data information sequence, so as to obtain a clustering result.
Specifically, after the clustering module 40 obtains the representative characters of each piece of data, the different pieces of data are clustered by their representative characters, for example with a hierarchical clustering method, to obtain the clustering result. In the big data smart warehouse management system, articles with similar attributes can then be stored together according to the clustering result, so that articles of the same category share a storage area.
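A minimal sketch of clustering pieces of data by their representative characters follows. The patent names hierarchical clustering as one option but fixes no linkage or similarity measure; the greedy single-pass grouping and the Jaccard similarity below are assumptions for illustration:

```python
def jaccard(a: set, b: set) -> float:
    """Set similarity: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_by_representatives(reps: dict, min_sim: float = 0.5) -> list:
    """Group items whose representative-character sets are similar.

    reps maps an item id to its set of representative characters; an item
    joins the first cluster containing a sufficiently similar member."""
    clusters = []
    for item, chars in reps.items():
        for cluster in clusters:
            if any(jaccard(chars, reps[other]) >= min_sim for other in cluster):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters
```

For example, two goods whose representative-character sets are {a, b} and {a, b, c} land in one cluster (similarity 2/3), while a good with {x, y} forms its own.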
According to this method, the difference between the actual and ideal coding results of each character in each piece of data is calculated, giving the difference of each character under each dictionary order; combining all dictionary orders yields the comprehensive difference of each character in each piece of data, from which the representative characters of each piece of data are obtained. Clustering on representative characters reduces the amount of calculation while making the clustering results more stable and accurate; the resulting categories are then used for storage, which greatly improves the reliability of the storage result and facilitates subsequent operations such as queries and article sorting.
The foregoing is only the embodiments of the present application, and therefore, the scope of the present application is not limited by the above embodiments, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (10)

1. Big data smart warehouse management system based on data coding, characterized by comprising:
the data acquisition module is used for acquiring data to be processed, wherein the data to be processed comprises a plurality of data information sequences, and each data information sequence corresponds to a management parameter of each commodity;
the coding module is used for coding each data information sequence in the data to be processed, so as to obtain an actual coding result corresponding to each data information sequence;
the representative character determining module is used for determining representative characters of each data information sequence based on the difference between the actual coding result and the ideal coding result;
and the clustering module is used for clustering the data to be processed based on the representative characters of each data information sequence, so as to obtain a clustering result.
2. The big data intelligent warehouse management system based on data encoding as claimed in claim 1, wherein the coding module is configured to:
coding the data information sequences by utilizing the character arrangement mode of each data information sequence in the data to be processed, so as to obtain an actual coding result corresponding to each data information sequence;
and coding the data information sequences by utilizing the character dictionary sequence of each data information sequence in the data to be processed, so as to obtain an ideal coding result corresponding to each data information sequence.
3. The big data intelligent warehouse management system based on data encoding of claim 2, wherein the encoding module comprises:
the first coding module is used for determining an arrangement mode combination of all characters in each data information sequence by utilizing a full arrangement (permutation) algorithm, wherein the arrangement mode combination comprises a plurality of character sequences and each character sequence represents one character arrangement mode; the data information sequence is coded by means of a BWT (Burrows-Wheeler Transform) coding mode based on the plurality of character sequences in the arrangement mode combination so as to obtain a plurality of first coding results, and the plurality of first coding results form the actual coding result;
the second coding module is used for determining the character dictionary sequence of each data information sequence in the data to be processed, and coding the data information sequences by means of the BWT coding mode based on the character dictionary sequence so as to obtain a plurality of second coding results, wherein the plurality of second coding results form the ideal coding result; the character dictionary sequence comprises a plurality of dictionary sequences, and the first coding results correspond to the second coding results one by one.
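For illustration only, the two encoding paths in claim 3 can be sketched with the standard Burrows-Wheeler Transform; the toy sequence `bac` and the `$` terminator are assumptions, and the patent's exact BWT variant may differ:

```python
# Illustrative sketch of claim 3: BWT of every character arrangement
# (actual coding result) versus BWT of the dictionary-ordered characters
# (ideal coding result). Standard BWT with a '$' terminator is assumed.
from itertools import permutations

def bwt(s, terminator="$"):
    """Standard BWT: sort all rotations of s+terminator, take the last column."""
    s = s + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

sequence = "bac"  # a toy data information sequence (hypothetical)

# Actual coding result: one BWT per full arrangement of the characters.
actual = [bwt("".join(p)) for p in permutations(sequence)]

# Ideal coding result: BWT of the dictionary-ordered (sorted) characters.
ideal = bwt("".join(sorted(sequence)))

print(ideal)  # → "c$ab"
```

Since the sorted arrangement is itself one of the permutations, the ideal result always appears among the first coding results, which is what makes a per-character comparison between the two meaningful.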
4. The big data intelligent warehouse management system based on data encoding of claim 3, wherein the representative character determining module comprises:
the difference calculation module is used for determining the difference of each character in each data information sequence based on the difference between the actual coding result and the ideal coding result of each data information sequence;
and the character determining module is used for determining the representative characters of the data information sequence based on the difference of each character.
5. The big data intelligent warehouse management system based on data encoding of claim 4, wherein the difference calculation module is configured to:
determining the comprehensive difference of each character in each data information sequence based on the difference between the actual encoding result and the ideal encoding result of each data information sequence;
the variability of each character in the data information sequence is calculated based on the integrated variability of each character and the frequency with which the characters appear in the data information sequence.
6. The big data intelligent warehouse management system based on data encoding of claim 5, wherein the difference calculation module is configured to:
calculating, for each character, the difference between the coding distance of the character in each first coding result of the actual coding result and the dictionary distance of the character in the second coding result of the ideal coding result corresponding to that first coding result; taking the ratio of the absolute value of the calculated difference to the larger value as the distance difference; and averaging all the calculated distance differences, the calculated average being taken as the comprehensive difference of the character in the data information sequence; wherein the larger value is the larger of the coding distance of the character in the first coding result and the dictionary distance of the character in the corresponding second coding result.
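A minimal sketch of the comprehensive-difference formula recited in claim 6, assuming hypothetical per-pair coding and dictionary distances for a single character:

```python
# Illustrative sketch of claim 6: per-pair distance difference is
# |coding_distance - dictionary_distance| / max(coding_distance,
# dictionary_distance); the comprehensive difference is the average over
# all first/second coding-result pairs. All distance values are hypothetical.

def distance_difference(coding_dist, dict_dist):
    larger = max(coding_dist, dict_dist)
    if larger == 0:          # both distances zero: no difference
        return 0.0
    return abs(coding_dist - dict_dist) / larger

def comprehensive_difference(coding_dists, dict_dists):
    """Average distance difference of one character across all pairs."""
    pairs = list(zip(coding_dists, dict_dists))
    return sum(distance_difference(c, d) for c, d in pairs) / len(pairs)

# Hypothetical per-pair distances for one character across three pairs.
coding = [4.0, 2.0, 3.0]
dictionary = [2.0, 2.0, 6.0]
print(round(comprehensive_difference(coding, dictionary), 4))  # → 0.3333
```

The ratio normalises each pair to [0, 1], so characters whose actual and ideal encodings agree across all arrangements obtain a small comprehensive difference.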
7. The big data intelligent warehouse management system based on data encoding of claim 6, wherein the difference calculation module is configured to:
calculating the dictionary distance of each character of the second coding result based on the dictionary character distance sequence corresponding to the second coding result; wherein each element in the dictionary character distance sequence is a dictionary distance between two characters.
8. The big data intelligent warehouse management system based on data encoding of claim 6, wherein the difference calculation module is configured to:
calculating the sum of the average distances between the current character and all reference characters in the first coding result, and calculating the distance between the current character and each reference character based on the calculated sum and the number of occurrences of the current character in the first coding result, so as to obtain a coding character distance sequence of the first coding result, wherein each element in the coding character distance sequence is the coding distance between two characters;
and calculating the coding distance of each character of the first coding result based on the coding character distance sequence corresponding to the first coding result.
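The claim text leaves the exact distance formula open; the following sketch shows one plausible reading, in which the coding distance between a current character and a reference character is the sum of nearest-occurrence position gaps normalised by the current character's occurrence count (the string `abba` is a hypothetical first coding result):

```python
# Illustrative sketch of one possible reading of claim 8; the precise
# formula is not fully specified in the claim, so this is an assumption.

def coding_char_distance(encoded, current, reference):
    """Average positional distance from each occurrence of `current`
    to the nearest occurrence of `reference` in `encoded`."""
    cur_pos = [i for i, ch in enumerate(encoded) if ch == current]
    ref_pos = [i for i, ch in enumerate(encoded) if ch == reference]
    if not cur_pos or not ref_pos:
        return 0.0
    total = sum(min(abs(i - j) for j in ref_pos) for i in cur_pos)
    return total / len(cur_pos)  # normalise by occurrence count

encoded = "abba"  # a toy first coding result (hypothetical)
print(coding_char_distance(encoded, "a", "b"))  # → 1.0
```

Computing this for every character pair yields the coding character distance sequence of claim 8, mirroring the dictionary character distance sequence of claim 7 on the ideal side.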
9. The big data intelligent warehouse management system based on data encoding of claim 4, wherein the character determining module is configured to:
normalizing the difference of each character;
taking the characters with the difference smaller than a preset value after normalization processing as candidate characters;
and determining the representative character of the data information sequence based on the frequency of the candidate character and the difference after normalization processing.
10. The big data intelligent warehouse management system based on data encoding of claim 9, wherein the character determining module is further configured to:
calculating the ratio of the frequency of each candidate character to the difference after normalization processing;
and taking the candidate character with the ratio larger than 1 as a representative character of the data information sequence.
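A minimal sketch of the selection rule in claims 9 and 10; the sum-based normalisation scheme, the preset value of 0.5, and the difference and frequency figures are illustrative assumptions:

```python
# Illustrative sketch of claims 9-10: normalise differences, keep
# characters whose normalised difference is below a preset value as
# candidates, then select candidates whose frequency / normalised
# difference ratio exceeds 1. All numbers are hypothetical.

def representative_chars(diffs, freqs, preset=0.5):
    # Normalise differences so they sum to 1 (one illustrative scheme).
    total = sum(diffs.values()) or 1.0
    norm = {c: d / total for c, d in diffs.items()}
    # Candidates: normalised difference below the preset value.
    candidates = [c for c, n in norm.items() if n < preset]
    # Representative: frequency / normalised difference exceeds 1.
    return {c for c in candidates if norm[c] > 0 and freqs[c] / norm[c] > 1}

diffs = {"a": 0.1, "b": 0.9, "c": 0.3}   # hypothetical character differences
freqs = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical character frequencies
print(sorted(representative_chars(diffs, freqs)))  # → ['a']
```

The ratio test favours characters that are both frequent and stable under re-encoding, which is exactly the property the clustering step relies on.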
CN202311007459.XA 2023-08-11 2023-08-11 Big data wisdom warehouse management system based on data encoding Active CN116720812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311007459.XA CN116720812B (en) 2023-08-11 2023-08-11 Big data wisdom warehouse management system based on data encoding


Publications (2)

Publication Number Publication Date
CN116720812A true CN116720812A (en) 2023-09-08
CN116720812B CN116720812B (en) 2023-10-20

Family

ID=87866530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311007459.XA Active CN116720812B (en) 2023-08-11 2023-08-11 Big data wisdom warehouse management system based on data encoding

Country Status (1)

Country Link
CN (1) CN116720812B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963666A (en) * 1995-08-18 1999-10-05 International Business Machines Corporation Confusion matrix mediated word prediction
US6345119B1 (en) * 1996-02-19 2002-02-05 Fujitsu Limited Handwritten character recognition apparatus and method using a clustering algorithm
JP2004030048A (en) * 2002-06-24 2004-01-29 Nippon Digital Kenkyusho:Kk Character recognition dictionary, character recognition dictionary creating method, and character recognition method
US10171992B1 (en) * 2018-06-22 2019-01-01 International Business Machines Corporation Switching mobile service provider using blockchain
CN111506726A (en) * 2020-03-18 2020-08-07 大箴(杭州)科技有限公司 Short text clustering method and device based on part-of-speech coding and computer equipment
WO2021258853A1 (en) * 2020-06-24 2021-12-30 平安科技(深圳)有限公司 Vocabulary error correction method and apparatus, computer device, and storage medium
CN115270717A (en) * 2022-06-29 2022-11-01 国家计算机网络与信息安全管理中心 Method, device, equipment and medium for detecting vertical position
CN115438629A (en) * 2021-06-02 2022-12-06 北京字节跳动网络技术有限公司 Data processing method, device, storage medium and electronic equipment


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU Bo; LIU Xiaoguang; WANG Gang; WU Di: "Two data compression methods for recommender systems", Computer Engineering & Science, no. 11 *
TAN Li; SUN Jifeng: "A high-throughput DNA sequence data compression algorithm based on codebook index transformation", Acta Electronica Sinica, no. 05 *
ZHONG Shangping; GAO Qingshi: "A lossless compression algorithm for a class of vector maps", Journal of System Simulation, no. 10 *

Also Published As

Publication number Publication date
CN116720812B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
Mohamad et al. Standardization and its effects on K-means clustering algorithm
Rani et al. Recent techniques of clustering of time series data: a survey
CN106529968B (en) Customer classification method and system based on transaction data
CN102663100B (en) Two-stage hybrid particle swarm optimization clustering method
Ma et al. Distance and density clustering for time series data
Hafez Knowledge Discovery in Databases
KR102358357B1 (en) Estimating apparatus for market size, and control method thereof
CN107305577A (en) Correct-distribute address date processing method and system based on K-means
Megalooikonomou et al. A dimensionality reduction technique for efficient similarity analysis of time series databases
Wang et al. Improved KNN algorithm based on preprocessing of center in smart cities
CN116720812B (en) Big data wisdom warehouse management system based on data encoding
CN119065324B (en) Industrial production negative carbon emission optimal control method and system
CN114297582A (en) Modeling method of discrete counting data based on multi-probe locality sensitive Hash negative binomial regression model
CN115186138B (en) A distribution network data comparison method and terminal
CN115314550B (en) Intelligent medical information pushing method and system based on digitization
US20040098412A1 (en) System and method for clustering a set of records
Mola et al. Discriminant analysis and factorial multiple splits in recursive partitioning for data mining
CN117540008A (en) Contract abnormal data risk intelligent analysis method
CN104714953A (en) Time series data motif identification method and device
Górecki et al. An experimental evaluation of time series classification using various distance measures
CN116361722A (en) A Multiple Fault Classification Method Based on Improved Linear Local Tangent Space Arrangement Model
CN111695599B (en) Elastic identification method for user electricity load time
Zhang et al. Efficient top-k DTW-based sensor data similarity search using perceptually important points and dual-bound filtering
CN120069907B (en) Product intelligent tracing method and system based on radio frequency tags
CN118245641B (en) Variable-length time series data similarity query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant