CN117950598B

CN117950598B - Intelligent storage method for design data of electronic product

Info

Publication number: CN117950598B
Application number: CN202410346324.4A
Authority: CN
Inventors: 程伟; 杨丽丹; 杨顺作; 杨丽香; 杨金燕; 杨丽霞
Original assignee: Shenzhen Kaibo Technology Co ltd
Current assignee: Shenzhen Yibaolian Technology Co ltd
Priority date: 2024-03-26
Filing date: 2024-03-26
Publication date: 2024-06-07
Anticipated expiration: 2044-03-26
Also published as: CN117950598A

Abstract

The invention relates to the technical field of data compression, and in particular to an intelligent storage method for electronic product design data. During dictionary coding compression of the data to be stored, a determination condition is checked whenever the current character to be compressed in the look-ahead buffer does not match the characters in the search buffer. If the determination condition is met, a correction necessity parameter is determined and used to judge whether a correction condition is met. If the correction condition is met, the characters in the look-ahead buffer are modified and dictionary coding compression continues on the modified look-ahead buffer, finally yielding a first compressed file. A correction coefficient sequence is determined from the differences between the look-ahead characters before and after modification, and probability coding compression is applied to it to obtain a second compressed file. The first and second compressed files are stored. The invention improves the data compression effect and reduces the storage space occupied by the data.

Description

A method for intelligent storage of electronic product design data

Technical Field

The present invention relates to the technical field of data compression, and in particular to an intelligent storage method for electronic product design data.

Background

As electronic products develop, their designs become more sophisticated and their functions more complex, so the volume of design data, and the storage space it occupies, keeps growing. Controlling the storage space of design data has therefore become a major issue in electronic product data management, and compressing the design data is a natural way to reduce it.

Traditional LZ77 coding is lossless and can be used to compress electronic product design data, effectively reducing its storage space. However, the compression achieved by LZ77 depends on how well the data in the look-ahead buffer matches the data in the search buffer. The more continuous the matches, the better the compression. When matches are frequently interrupted, continuing to apply LZ77 can even increase the encoded length of the data, so compression is poor and the storage of design data suffers.

Summary of the Invention

The purpose of the present invention is to provide an intelligent storage method for electronic product design data that solves the existing problem of data occupying large storage space because of poor compression.

To solve the above technical problem, the present invention provides an intelligent storage method for electronic product design data, comprising the following steps:

Acquiring the data to be stored and performing dictionary coding compression on it. During compression, whenever the current character to be compressed in the look-ahead buffer does not match the characters in the search buffer, judging whether a determination condition is met; if it is, determining a correction necessity parameter from the length of all characters between the current character to be compressed and the character at which the previous mismatch occurred, and from the length of the run of consecutive characters in the look-ahead buffer that match characters in the search buffer;

Judging, according to the correction necessity parameter, whether a correction condition is met; if it is, modifying the characters in the look-ahead buffer according to the characters in the search buffer, continuing dictionary coding compression on the modified look-ahead buffer, and finally obtaining a first compressed file;

Determining a correction coefficient for each character in the look-ahead buffer from the difference between that character before and after the modification, thereby obtaining a correction coefficient sequence for the data to be stored, and performing probability coding compression on this sequence to obtain a second compressed file;

Storing the first compressed file and the second compressed file.

Further, the correction necessity parameter is computed by a formula (not reproduced in this text) in which: $P$ denotes the correction necessity parameter; $L_1$ denotes the length of all characters between the current character to be compressed and the character at which the previous mismatch occurred; $L_2$ denotes the length of the run of consecutive characters in the look-ahead buffer that match characters in the search buffer; and $\mathrm{Norm}(\cdot)$ denotes a normalization function.

Further, the characters in the look-ahead buffer are modified in either of the following ways:

Determining, as replacement characters, the first run of consecutive characters in the look-ahead buffer that match characters in the search buffer; determining, as replaced characters, all characters before the replacement characters in the look-ahead buffer that do not match characters in the search buffer; and swapping the positions of the replacement characters and the replaced characters to obtain the modified look-ahead buffer.

Determining, as target characters, the first run of consecutive characters in the look-ahead buffer that match characters in the search buffer; determining, as modified characters, all characters before the target characters in the look-ahead buffer that do not match characters in the search buffer; and changing the modified characters into characters that exist in the search buffer.

Further, the correction coefficient of each character in the look-ahead buffer is computed from the difference between the character's value after and before correction, by a formula (not reproduced in this text) in which: $c_n$ denotes the correction coefficient of the n-th character in the look-ahead buffer; $x'_n$ denotes the data value of the n-th character in the look-ahead buffer after correction; and $x_n$ denotes its data value before correction.

Further, the determination condition includes at least: the length of all characters between the current character to be compressed and the character at which the previous mismatch occurred is less than a first set character length threshold, and the number of characters in the search buffer is not less than a second set character length threshold.

Further, the correction condition includes at least: the correction necessity parameter is greater than a set parameter threshold.

Further, LZ77 coding is used for the dictionary coding compression.

Further, when the determination condition is not met, dictionary coding compression simply continues.

Further, run-length coding is used to perform the probability coding compression of the correction coefficient sequence, yielding the second compressed file.
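A minimal sketch of the correction coefficient sequence and its run-length coding, under stated assumptions: characters are treated as byte values; because the coefficient formula is not reproduced here, the coefficient is assumed to be the signed difference "modified minus original" (so unmodified characters give 0); and runs are stored as (value, count) pairs, a format chosen only for illustration. All function names are hypothetical.

```python
from typing import List, Tuple


def correction_coefficients(original: bytes, modified: bytes) -> List[int]:
    """ASSUMED coefficient form: c_n = modified[n] - original[n]."""
    if len(original) != len(modified):
        raise ValueError("sequences must have equal length")
    return [m - o for o, m in zip(original, modified)]


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    return [v for v, count in runs for _ in range(count)]


def restore_original(modified: bytes, coefficients: List[int]) -> bytes:
    """Invert the assumed coefficient: original[n] = modified[n] - c_n."""
    return bytes(m - c for m, c in zip(modified, coefficients))


if __name__ == "__main__":
    original = b"EFABGXYZ"
    modified = b"ABABGXYZ"  # e.g. after a substitution-type modification
    coeffs = correction_coefficients(original, modified)
    runs = run_length_encode(coeffs)
    print(runs)  # [(-4, 2), (0, 6)]
    assert restore_original(modified, run_length_decode(runs)) == original
```

Because every unmodified character maps to 0, long stretches of untouched data collapse into a single pair, which is why the sequence is expected to compress well.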

The present invention has the following beneficial effects. During dictionary coding compression of the data to be stored, whenever a data interruption occurs, that is, whenever the current character to be compressed in the look-ahead buffer does not match the characters in the search buffer, the invention judges whether the determination condition is met, i.e. whether the characters at the interruption may need to be modified. If so, it measures how necessary the modification is by determining the correction necessity parameter. When modification is warranted, the characters in the look-ahead buffer are modified according to the characters in the search buffer, lengthening the run of look-ahead characters at the interruption that match the search buffer; dictionary coding compression then continues on the modified look-ahead buffer, finally yielding a first compressed file.

So that the first compressed file can later be decompressed back to the original data, after each modification a correction coefficient sequence for the data to be stored is determined from the differences between the look-ahead characters before and after the modification. Because only some characters of the data are modified, this sequence contains a large number of repeated values, so probability coding compression is applied to it to obtain a second compressed file. By modifying the characters at data interruptions during dictionary coding compression, the invention avoids the need for conventional LZ77 coding to construct multiple triples for the data at each interruption in the look-ahead buffer, which can make the data larger after compression. It thus improves the compression effect, reduces the storage space occupied by the data, and achieves efficient data storage.

Brief Description of the Drawings

To illustrate the technical solutions and advantages of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art could derive other drawings from them without creative effort.

FIG. 1 is a flowchart of the intelligent storage method for electronic product design data according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the coding window according to an embodiment of the present invention.

Detailed Description of the Embodiments

To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the proposed technical solution are described in detail below with reference to the drawings and preferred embodiments. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, all parameters or indicators in the formulas referred to herein are normalized, dimensionless values.

To solve the existing problem of data occupying large storage space because of poor compression, this embodiment provides an intelligent storage method for electronic product design data. Its flowchart is shown in FIG. 1, and it comprises the following steps:

(1) Acquire the data to be stored and perform dictionary coding compression on it. During compression, whenever the current character to be compressed in the look-ahead buffer does not match the characters in the search buffer, judge whether the determination condition is met. If it is, determine the correction necessity parameter from the length of all characters between the current character to be compressed and the character at which the previous mismatch occurred, and from the length of the run of consecutive characters in the look-ahead buffer that match characters in the search buffer.

First, the data to be stored is acquired; in this embodiment it is electronic product design data. The acquired data is then cleaned, for example by correcting erroneous, missing and abnormal values. This improves the accuracy and consistency of the data and removes noise and invalid information, so that the data passed to compression is more complete and accurate. In other embodiments, if the acquired design data is already complete and accurate, this preprocessing can be omitted.

To encode and compress the preprocessed design data, the size of the LZ77 coding window is determined first. Conventional LZ77 coding typically uses a 4096-byte coding window divided into two parts: as shown in FIG. 2, the left part is the search buffer used during encoding and the right part is the look-ahead buffer. A smaller coding window may prevent LZ77 from finding longer matches, limiting compression, while a larger one slows down searching and matching and increases the computational cost of compression. For electronic product design data, this embodiment therefore selects an 8-byte coding window for the subsequent compression.

After the window size is determined, conventional LZ77 coding is first applied to the preprocessed design data for dictionary coding compression. The coding window slides over the data, encoding it as it goes. Whenever the data in the look-ahead buffer fails to match the data in the search buffer, that is, whenever the current character to be compressed in the look-ahead buffer cannot be found in the search buffer, a data interruption is considered to have occurred. If plain LZ77 coding simply continues at this point, the encoded length grows and the compression becomes worse.
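To make the notion of a data interruption concrete, here is a minimal sketch of a plain LZ77 encoder over a string that also records interruption positions. Assumptions not stated in the text: the 8-byte window is split evenly into a 4-character search buffer and a 4-character look-ahead buffer; output triples are (offset, length, next character); and an interruption is recorded whenever the next character to be compressed (the first look-ahead character) does not occur anywhere in the search buffer. Function and type names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    offset: int     # distance back to the match start (0 when there is no match)
    length: int     # number of characters copied from the match
    next_char: str  # literal character emitted after the match


def lz77_encode(data: str, search_size: int = 4,
                lookahead_size: int = 4) -> Tuple[List[Triple], List[int]]:
    """Plain LZ77 over a string, also returning the positions of data
    interruptions: phrase starts whose first character is absent from the
    search buffer."""
    triples: List[Triple] = []
    interruptions: List[int] = []
    pos = 0
    while pos < len(data):
        search_start = max(0, pos - search_size)
        search = data[search_start:pos]
        if data[pos] not in search:
            interruptions.append(pos)
        # Leave at least one character for next_char.
        max_len = min(lookahead_size, len(data) - pos) - 1
        best_len, best_off = 0, 0
        for start in range(search_start, pos):
            length = 0
            # As in standard LZ77, a match may run on into the look-ahead buffer.
            while length < max_len and data[start + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - start
        triples.append(Triple(best_off, best_len, data[pos + best_len]))
        pos += best_len + 1
    return triples, interruptions


def lz77_decode(triples: List[Triple]) -> str:
    out: List[str] = []
    for offset, length, next_char in triples:
        for _ in range(length):
            out.append(out[-offset])  # char-by-char copy handles overlapping matches
        out.append(next_char)
    return "".join(out)
```

For `"ABABCABX"` this sketch reports interruptions at positions 0 and 1 (the first `A` and `B` have nothing to match), while the later `AB` pairs are coded as back-references.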

To improve compression, at each data interruption the length $L_1$ of all characters between the current character to be compressed and the character at which the previous mismatch occurred is obtained, i.e. the number of characters between the previous data interruption and the current one. For the first data interruption in the encoding process, $L_1$ is the number of all characters before it. A larger $L_1$ indicates that LZ77 is compressing the current data well, so even after an interruption LZ77 can simply continue. A smaller $L_1$ indicates that LZ77 is compressing the current data poorly, and the coding needs to be improved to raise the compression effect.
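Given a list of interruption positions (for example, those recorded during an LZ77 pass), $L_1$ can be computed as below. This sketch assumes "between" excludes both endpoints, so two consecutive interruptions give $L_1 = 0$, and it treats the first interruption as preceded by a virtual one at position -1 so that its $L_1$ is the number of characters before it. The function name is hypothetical.

```python
from typing import List


def gap_lengths(interruptions: List[int]) -> List[int]:
    """L1 for each data interruption: the number of characters strictly between
    it and the previous interruption (all characters before it for the first)."""
    lengths: List[int] = []
    previous = -1  # virtual interruption just before the start of the data
    for pos in interruptions:
        lengths.append(pos - previous - 1)
        previous = pos
    return lengths
```

For interruptions at positions 3, 4 and 15 this yields 3, 0 and 10.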

Based on this analysis, once $L_1$ has been obtained, whether the determination condition is met is judged from $L_1$ and the number of characters in the search buffer. In this embodiment, the determination condition is: $L_1$ is less than a first set character length threshold, and the number of characters in the search buffer is not less than a second set character length threshold. The first threshold can be set empirically; this embodiment sets it to 10. To facilitate the later modification of the look-ahead characters, this embodiment sets the second threshold to the length of the search buffer, i.e. the maximum number of characters the search buffer can hold. When the determination condition is not met, LZ77 dictionary coding compression simply continues and the subsequent data is not examined, i.e. there is no need to decide whether to modify the characters in the look-ahead buffer. When it is met, the subsequent data is examined to decide whether the characters in the look-ahead buffer should be modified.
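The determination condition of this embodiment reduces to a two-part test, sketched below. The first threshold of 10 comes from the text; `search_capacity` stands for the second threshold (the search buffer's capacity), and the value 4 used in the example is an assumption about how the 8-byte window is split. Names are hypothetical.

```python
def meets_determination_condition(l1: int, search_fill: int, search_capacity: int,
                                  first_threshold: int = 10) -> bool:
    """L1 below the first threshold (10 in this embodiment) and a search buffer
    holding at least `search_capacity` characters, i.e. a full search buffer."""
    return l1 < first_threshold and search_fill >= search_capacity


# Example with an assumed 4-character search buffer:
# meets_determination_condition(3, 4, 4) -> True   (short gap, full buffer)
# meets_determination_condition(12, 4, 4) -> False (long gap: keep plain LZ77)
```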

To decide whether to modify the characters in the look-ahead buffer, the length of the run of consecutive characters in the look-ahead buffer that match characters in the search buffer after the interruption is obtained. That is, the unmatched characters, starting with the current character to be compressed, are skipped, and matching continues through the look-ahead buffer until a single character or a run of consecutive characters matching the search buffer is found; the length of this matching character or run is denoted $L_2$. If no character in the look-ahead buffer matches the search buffer, $L_2 = 0$.
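A sketch of how $L_2$ might be measured. The text does not define "match" precisely; this sketch assumes a look-ahead character matches if it occurs anywhere in the search buffer, consistent with the EFABG example given later, where A and B are simply said to be contained in the search buffer. It returns the start of the matched run together with $L_2$; the function name is hypothetical.

```python
from typing import Tuple


def matched_run_after_break(lookahead: str, search: str) -> Tuple[int, int]:
    """Return (start, L2): skip look-ahead characters that do not occur in the
    search buffer, then count the run of consecutive characters that do.
    When nothing matches, start == len(lookahead) and L2 == 0."""
    present = set(search)
    start = 0
    while start < len(lookahead) and lookahead[start] not in present:
        start += 1
    end = start
    while end < len(lookahead) and lookahead[end] in present:
        end += 1
    return start, end - start
```

With look-ahead `EFABG` and an illustrative search buffer `CABD`, the sketch skips `E` and `F` and returns `(2, 2)` for the run `AB`.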

On this basis, the correction necessity parameter $P$ is determined from $L_1$ and $L_2$, using the normalization function $\mathrm{Norm}(\cdot)$; the exact expression is not reproduced in this text.

In this formula, a smaller $L_1$ (the length of all characters between the current character to be compressed and the character at the previous mismatch) indicates that LZ77 is compressing the current data poorly. A larger $L_2$ (the length of the run of look-ahead characters that match the search buffer) indicates that modifying the unmatched characters according to the search buffer will substantially increase the number of look-ahead characters that match the search buffer, reducing the number of triples LZ77 must construct and improving compression. In both cases it is more worthwhile to modify the unmatched characters, and the correction necessity parameter $P$ is larger.
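The published expression for $P$ is not reproduced in this text, so the sketch below cannot implement it. Purely to illustrate the behaviour just described, it uses a hypothetical stand-in, $L_2 / (L_1 + L_2)$, which lies in [0, 1], falls as $L_1$ grows and rises as $L_2$ grows. It is not the patent's formula, and the 0.7 threshold used in this embodiment should not be assumed to carry over to it.

```python
def stand_in_necessity(l1: int, l2: int) -> float:
    """HYPOTHETICAL stand-in for the correction necessity parameter, NOT the
    patent's formula: L2 / (L1 + L2), decreasing in L1 and increasing in L2."""
    if l1 < 0 or l2 < 0:
        raise ValueError("lengths must be non-negative")
    if l2 == 0:
        return 0.0  # nothing in the look-ahead buffer could be made to match
    return l2 / (l1 + l2)
```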

(2) Judge, according to the correction necessity parameter, whether the correction condition is met. If it is, modify the characters in the look-ahead buffer according to the characters in the search buffer, continue dictionary coding compression on the modified look-ahead buffer, and finally obtain the first compressed file.

After the correction necessity parameter has been determined, whether the correction condition is met is judged from it. In this embodiment, the correction condition is that the correction necessity parameter is greater than a set parameter threshold, which can be chosen empirically; this embodiment sets it to 0.7. When the correction condition is not met, the characters in the look-ahead buffer are left unchanged and LZ77 dictionary coding compression continues. When it is met, the characters in the look-ahead buffer are modified according to the characters in the search buffer.
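The decision taken at each data interruption combines the determination condition with the correction condition. The sketch below takes both as inputs: whether the determination condition holds, and the correction necessity parameter $P$ computed by whatever formula is in use. The 0.7 threshold and the strict "greater than" comparison come from the text; the names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE_LZ77 = "continue plain LZ77 coding"
    MODIFY_LOOKAHEAD = "modify the look-ahead buffer"


def next_action(determination_met: bool, p: Optional[float],
                threshold: float = 0.7) -> Action:
    """Two-stage gate at a data interruption.

    `p` is the correction necessity parameter; it is only needed when the
    determination condition holds, so callers may pass None otherwise."""
    if not determination_met:
        return Action.CONTINUE_LZ77
    if p is None:
        raise ValueError("p is required when the determination condition is met")
    # Correction condition: P strictly greater than the set parameter threshold.
    return Action.MODIFY_LOOKAHEAD if p > threshold else Action.CONTINUE_LZ77
```

Note that a parameter exactly equal to 0.7 does not trigger a modification, since the condition is "greater than".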

Optionally, the characters in the look-ahead buffer are modified according to the characters in the search buffer as follows:

When modifying the look-ahead buffer, its characters are matched in order against the characters in the search buffer to find the first run of consecutive matching characters, which are taken as the replacement characters. All characters before the replacement characters in the look-ahead buffer are taken as the replaced characters.

The positions of the replacement characters and the replaced characters are then swapped to obtain the modified look-ahead buffer. For example, suppose the look-ahead buffer contains EFABG. The characters E and F do not occur in the search buffer, which causes the data interruption described above; A and B do occur in the search buffer; and G does not. Then AB are the replacement characters and EF the replaced characters, and swapping them gives the modified look-ahead buffer ABEFG. Modifying the look-ahead buffer in this way can improve the subsequent encoding and compression.
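The swap-type modification can be sketched as follows, under the assumption that a character "matches" when it occurs anywhere in the search buffer. The search buffer `CABD` in the example is invented for illustration; the text only states that it contains A and B but not E, F or G. The function name is hypothetical.

```python
def swap_modify(lookahead: str, search: str) -> str:
    """Swap-type modification: move the first run of characters that occur in the
    search buffer ahead of the unmatched characters that precede it."""
    present = set(search)
    start = 0
    while start < len(lookahead) and lookahead[start] not in present:
        start += 1
    end = start
    while end < len(lookahead) and lookahead[end] in present:
        end += 1
    if start == 0 or end == start:
        return lookahead  # no unmatched prefix, or no matching run: nothing to swap
    replaced = lookahead[:start]         # e.g. "EF"
    replacement = lookahead[start:end]   # e.g. "AB"
    return replacement + replaced + lookahead[end:]
```

`swap_modify("EFABG", "CABD")` returns `"ABEFG"`, matching the example above.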

Optionally, in another embodiment, the characters in the look-ahead buffer are modified according to the characters in the search buffer as follows:

The first run of consecutive matching characters in the look-ahead buffer is determined in the same way as above; to distinguish them, these characters are called target characters. All characters before the target characters in the look-ahead buffer are called modified characters. The modified characters are then changed according to the characters in the search buffer, so that the look-ahead buffer contains as long a run as possible of consecutive characters matching the search buffer.

When changing the modified characters, three cases are distinguished.

If the string formed by the target characters occurs in the search buffer and at least as many characters precede it as there are modified characters, the modified characters are replaced directly by the characters immediately preceding that string, taking as many as there are modified characters. After replacement, the string formed by the target characters and all characters before them in the look-ahead buffer also occurs in the search buffer. For example, with the look-ahead buffer EFABG and, ignoring the window size, the search buffer ABABC, the modified characters EF are replaced by the characters A and B that precede the second occurrence of AB in the search buffer, giving ABABG.

If the string formed by the target characters occurs in the search buffer but fewer characters precede it than there are modified characters, all characters preceding the string replace the characters immediately before the target characters in the look-ahead buffer. For example, with the look-ahead buffer EFABG and the search buffer BABC, the character B preceding AB in the search buffer replaces F, the modified character immediately before the target characters, giving EBABG.

If the string formed by the target characters occurs in the search buffer but no character precedes it, the modified characters are replaced by the last characters of the search buffer, i.e. those immediately preceding the modified characters, taking as many as there are modified characters. For example, with the look-ahead buffer EFABG and the search buffer ABCD, the characters CD replace the modified characters EF, giving CDABG.

If the string formed by the target characters does not occur in the search buffer and only a leading part of the target characters does, the same rules are applied to that leading part: the characters preceding it in the search buffer replace the characters before the target characters in the look-ahead buffer, so that after the replacement the look-ahead buffer contains as long a run as possible of consecutive characters that match the search buffer.
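The three replacement cases and the leading-part fallback can be sketched in Python as below. Assumptions not stated in the source: a character counts as matching if it occurs anywhere in the search buffer, and the last occurrence of the target string in the search buffer is used because it has the most characters in front of it (this reproduces the ABABC example). The function names are hypothetical.

```python
from typing import Tuple


def find_target(look_ahead: str, search: str) -> Tuple[int, int]:
    # Assumption: target = first maximal run of characters that each occur in search.
    present = set(search)
    start = 0
    while start < len(look_ahead) and look_ahead[start] not in present:
        start += 1
    end = start
    while end < len(look_ahead) and look_ahead[end] in present:
        end += 1
    return start, end


def modify_look_ahead(look_ahead: str, search: str) -> str:
    start, end = find_target(look_ahead, search)
    k = start  # number of modified characters in front of the target
    if k == 0 or start == end:
        return look_ahead  # nothing in front of the target, or no target at all
    target = look_ahead[start:end]
    # Leading-part fallback: shrink the target until it occurs in the search
    # buffer. This terminates because target[0] is in the search buffer.
    while target not in search:
        target = target[:-1]
    # Assumption: use the last occurrence, which has the most characters before it.
    p = search.rfind(target)
    rest = look_ahead[start:]
    if p >= k:
        # Case 1: copy the k characters immediately before the occurrence.
        return search[p - k:p] + rest
    if p > 0:
        # Case 2: only p characters precede the occurrence; they replace the
        # p modified characters closest to the target.
        return look_ahead[:k - p] + search[:p] + rest
    # Case 3: nothing precedes the occurrence; use the tail of the search
    # buffer, i.e. the characters directly in front of the look-ahead buffer.
    tail = search[-k:]
    return look_ahead[:k - len(tail)] + tail + rest


for search in ("ABABC", "BABC", "ABCD"):
    print(search, modify_look_ahead("EFABG", search))  # ABABG, EBABG, CDABG
```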

After the look-ahead buffer has been modified in this way, LZ77 dictionary coding continues on the modified look-ahead buffer. Whenever the next mismatch (data interruption) occurs, the look-ahead buffer is modified again in the same way and LZ77 coding resumes, until the whole input has been compressed; the result is the first compressed file.
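For reference, a textbook LZ77 coder that emits (offset, length, next character) triples is sketched below. It is a generic illustration rather than the patent's implementation, and the window and maximum match sizes are arbitrary assumptions. Encoding the first example's data before and after EF is rewritten to AB shows why modifying the characters at a mismatch can reduce the number of triples.

```python
from typing import List, Tuple

Triple = Tuple[int, int, str]  # (offset back into the output, match length, next char)


def lz77_encode(data: str, window: int = 255, max_match: int = 15) -> List[Triple]:
    """Textbook LZ77: at each position emit the longest match found in the
    preceding `window` characters (the search buffer) plus the next literal."""
    out: List[Triple] = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so a literal next char always exists;
            # matches may overlap into the look-ahead buffer.
            while (length < max_match and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples: List[Triple]) -> str:
    buf: List[str] = []
    for off, length, ch in triples:
        start = len(buf) - off
        for k in range(length):
            buf.append(buf[start + k])  # character-by-character copy handles overlaps
        buf.append(ch)
    return "".join(buf)


original = "ABABC" + "EFABG"  # search buffer + look-ahead buffer before modification
modified = "ABABC" + "ABABG"  # after EF is rewritten to AB
print(len(lz77_encode(original)), len(lz77_encode(modified)))  # 6 4
```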

(3) Determine the correction coefficient of each character in the look-ahead buffer from the difference between the character before and after the modification, obtain the correction coefficient sequence of the data to be stored, and apply probability coding compression to this sequence to obtain the second compressed file.

So that the first compressed file can later be decompressed back to the original data, after each modification of the look-ahead buffer a correction coefficient is determined for every character in the look-ahead buffer from the difference between that character before and after the modification; in this way a correction coefficient is eventually obtained for every character of the data to be stored. The correction coefficient is the ratio of the modified value to the original value:

$$K_n = \frac{C'_n}{C_n}$$

where $K_n$ denotes the correction coefficient of the $n$-th character in the look-ahead buffer; $C'_n$ denotes the decimal ASCII value of the $n$-th character in the look-ahead buffer after correction; and $C_n$ denotes the decimal ASCII value of the $n$-th character in the look-ahead buffer before correction. $C_n$ is never zero, so it can be used directly as the denominator.
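A worked example of the ratio in Python: for the first case (EFABG rewritten to ABABG) the two changed characters get 65/69 and 66/70 = 33/35, and every unchanged character gets 1. The symbol names in the docstring are illustrative, and keeping the ratios as exact `fractions.Fraction` values is an assumption (the source does not specify a numeric representation); it lets a decoder invert them without rounding.

```python
from fractions import Fraction
from typing import List


def correction_coefficients(before: str, after: str) -> List[Fraction]:
    """K_n = C'_n / C_n, where C is the character's decimal ASCII value
    (Python's ord() gives the same number for ASCII text)."""
    if len(before) != len(after):
        raise ValueError("the modification must keep the buffer length")
    # Fraction raises ZeroDivisionError for a NUL character; the source states C_n is never 0.
    return [Fraction(ord(a), ord(b)) for b, a in zip(before, after)]


print(correction_coefficients("EFABG", "ABABG"))
```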

Determining the correction coefficients in this way for every character of the data to be stored, and arranging them in the order of the characters in the data, yields a correction coefficient sequence. Unmodified characters have a correction coefficient of 1, and because only some characters of the data are modified, the sequence contains long runs of consecutive 1s. Run-length coding is therefore used as the probability coding step for this sequence: runs of repeated values are encoded together, which reduces redundancy and yields the second compressed file.
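A minimal run-length coder for the coefficient sequence, sketched in Python with `itertools.groupby`. The (value, count) pair format is an assumption; the source does not fix how runs are serialized.

```python
from fractions import Fraction
from itertools import groupby
from typing import List, Tuple


def rle_encode(seq: List[Fraction]) -> List[Tuple[Fraction, int]]:
    """Collapse each run of equal coefficients into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(seq)]


def rle_decode(runs: List[Tuple[Fraction, int]]) -> List[Fraction]:
    """Expand (value, count) pairs back into the coefficient sequence."""
    out: List[Fraction] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Coefficients for ABABC + EFABG after EF -> AB: 5 unchanged, 2 changed, 3 unchanged.
coeffs = [Fraction(1)] * 5 + [Fraction(65, 69), Fraction(33, 35)] + [Fraction(1)] * 3
runs = rle_encode(coeffs)
print(runs)  # 4 runs instead of 10 values
```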

(4) Store the first compressed file and the second compressed file.

After the first and second compressed files have been obtained through the above steps, they are stored and transmitted. To decompress, the first compressed file is decoded with the LZ77 decoder and the second compressed file with the run-length decoder, and the two decoding results are combined to recover the original data. Because decompression is the inverse of the compression process described in detail above, it is not described further here.
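How the two decoding results might be combined, as a Python sketch: the run-length file is expanded back into one coefficient per character, and each character of the LZ77-decoded text is restored by inverting the ratio, i.e. original value = modified value / coefficient. It assumes exact `Fraction` coefficients and (value, count) runs; `restore_original` is a hypothetical name.

```python
from fractions import Fraction
from typing import List, Tuple


def rle_decode(runs: List[Tuple[Fraction, int]]) -> List[Fraction]:
    out: List[Fraction] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def restore_original(modified: str, coefficients: List[Fraction]) -> str:
    """Undo the correction: before_n = after_n / K_n for every character."""
    if len(modified) != len(coefficients):
        raise ValueError("exactly one coefficient per character is required")
    chars = []
    for ch, k in zip(modified, coefficients):
        code = Fraction(ord(ch)) / k
        if code.denominator != 1:
            raise ValueError("coefficient does not yield an integer character code")
        chars.append(chr(code.numerator))
    return "".join(chars)


# `modified` stands for the LZ77-decoded text; `runs` for the run-length-decoded file.
runs = [(Fraction(1), 5), (Fraction(65, 69), 1), (Fraction(33, 35), 1), (Fraction(1), 3)]
print(restore_original("ABABCABABG", rle_decode(runs)))  # ABABCEFABG
```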

By modifying the characters at each data interruption (mismatch) during dictionary coding of the data to be stored, the invention improves the compression efficiency of LZ77 coding at these points. It avoids the problem that conventional LZ77 coding must emit several triples for the data at a mismatch in the look-ahead buffer, which can make the data larger after compression, and thereby improves compression, reduces the storage space occupied by the data, and enables efficient data storage.

It should be noted that the above embodiments only illustrate the technical solutions of the present application and do not limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims

1. A method for intelligent storage of electronic product design data, characterized in that it comprises the following steps:

Acquire the data to be stored and perform dictionary coding compression on it. During dictionary coding, whenever the current character to be compressed in the look-ahead buffer does not match the characters in the search buffer, determine whether a judgment condition is met; if it is met, determine a correction-necessary parameter from the character length of all characters between the current character to be compressed and the character at which the previous mismatch occurred, and from the character length of the consecutive characters in the look-ahead buffer that match characters in the search buffer;

Determine, according to the correction-necessary parameter, whether a correction condition is met; if it is met, modify the characters in the look-ahead buffer according to the characters in the search buffer, and continue dictionary coding compression on the modified look-ahead buffer to finally obtain a first compressed file;

Determine the correction coefficient of each character in the look-ahead buffer according to the difference between the characters in the look-ahead buffer before and after the modification, thereby obtaining a correction coefficient sequence for the data to be stored, and perform probability coding compression on the correction coefficient sequence to obtain a second compressed file;

Storing the first compressed file and the second compressed file;

The correction-necessary parameter is calculated as:

[formula not recoverable from the source text]; where $P$ denotes the correction-necessary parameter; $L$ denotes the character length of all characters between the current character to be compressed and the character at which the previous mismatch occurred; $M$ denotes the character length of the consecutive characters in the look-ahead buffer that match characters in the search buffer; and $\mathrm{norm}(\cdot)$ denotes the normalization function;

Modifying the characters in the look-ahead buffer comprises:

Determine the first run of consecutive characters in the look-ahead buffer that match characters in the search buffer as replacement characters, determine all characters in the look-ahead buffer that precede the replacement characters and do not match characters in the search buffer as replaced characters, and exchange the positions of the replacement characters and the replaced characters to obtain the modified look-ahead buffer;
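The position-exchange variant can be sketched in a line of Python. It reads the claim's "replace the positions" as swapping the two blocks, and assumes `start`/`end` delimit the replacement characters found by the same run detection as the description; `swap_modify` is a hypothetical name. With EFABG and target AB this gives ABEFG.

```python
def swap_modify(look_ahead: str, start: int, end: int) -> str:
    """Exchange the replaced characters look_ahead[:start] with the
    replacement characters look_ahead[start:end]; the rest is unchanged."""
    return look_ahead[start:end] + look_ahead[:start] + look_ahead[end:]


print(swap_modify("EFABG", 2, 4))  # ABEFG
```

Unlike the replacement variant, this only reorders characters, so the buffer keeps the same multiset of characters.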

Modifying the characters in the look-ahead buffer comprises:

Determine the first run of consecutive characters in the look-ahead buffer that match characters in the search buffer as target characters, determine all characters before the target characters in the look-ahead buffer that do not match characters in the search buffer as modified characters, and change the modified characters into characters that exist in the search buffer;

The correction coefficient of each character in the look-ahead buffer is calculated as:

$$K_n = \frac{C'_n}{C_n}$$

where $K_n$ denotes the correction coefficient of the $n$-th character in the look-ahead buffer; $C'_n$ denotes the data value of the $n$-th character in the look-ahead buffer after correction; and $C_n$ denotes the data value of the $n$-th character in the look-ahead buffer before correction.

2. The method for intelligent storage of electronic product design data according to claim 1, characterized in that the judgment condition includes at least: the character length of all characters between the current character to be compressed and the character at which the previous mismatch occurred is less than a first set character-length threshold, and the character length of the search buffer is not less than a second set character-length threshold.

3. The method for intelligent storage of electronic product design data according to claim 1, characterized in that the correction condition includes at least: the correction-necessary parameter is greater than a set parameter threshold.
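The decision made at each mismatch (judgment condition of claim 2, correction condition of claim 3, fallback of claim 5) can be sketched in Python as below. The normalized formula for the correction-necessary parameter is not reproduced in this text, so the sketch takes it as a caller-supplied function; `toy_necessity` is a made-up stand-in, not the patent's formula, and all names and threshold values are hypothetical.

```python
from typing import Callable


def should_correct(
    gap_len: int,
    match_len: int,
    search_len: int,
    necessity: Callable[[int, int], float],
    first_len_threshold: int,
    second_len_threshold: int,
    param_threshold: float,
) -> bool:
    """Decide whether to modify the look-ahead buffer at a mismatch.

    gap_len:    characters between this mismatch and the previous one
    match_len:  length of the matching run in the look-ahead buffer
    search_len: current length of the search buffer
    necessity:  correction-necessary parameter as a function of
                (gap_len, match_len), supplied by the caller
    """
    # Claim 2: judgment condition.
    if not (gap_len < first_len_threshold and search_len >= second_len_threshold):
        return False  # claim 5: just continue dictionary coding
    # Claim 3: correction condition.
    return necessity(gap_len, match_len) > param_threshold


# Hypothetical stand-in for the normalized parameter; NOT the patent's formula.
def toy_necessity(gap_len: int, match_len: int) -> float:
    return match_len / max(1, match_len + gap_len)


print(should_correct(3, 2, 10, toy_necessity, 8, 5, 0.3))  # True
```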

4. The method for intelligently storing electronic product design data according to claim 1, characterized in that LZ77 encoding is used for dictionary encoding compression.

5. The method for intelligently storing electronic product design data according to claim 1, wherein when the judgment condition is not met, dictionary coding compression is continued.

6. The method for intelligent storage of electronic product design data according to claim 1, characterized in that the correction coefficient sequence is compressed by probability coding using run-length coding to obtain the second compressed file.