
CN119232171B - A data intelligent compression method for twin systems - Google Patents

A data intelligent compression method for twin systems

Info

Publication number
CN119232171B
CN119232171B CN202411745562.9A
Authority
CN
China
Prior art keywords
data
model
module
probability distribution
bootstrap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411745562.9A
Other languages
Chinese (zh)
Other versions
CN119232171A (en)
Inventor
潘成胜
张晨曦
施建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202411745562.9A priority Critical patent/CN119232171B/en
Publication of CN119232171A publication Critical patent/CN119232171A/en
Application granted granted Critical
Publication of CN119232171B publication Critical patent/CN119232171B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an intelligent data compression method for a twin system. Video, audio and text data are collected and preprocessed; a Bootstrap model is constructed to perform feature extraction and preliminary compression on the input data; a Combined model consisting of the Bootstrap model and a Supporter model is then constructed, and the probability distribution output by the Bootstrap model is merged with the probability distribution output by the Supporter model. The resulting probability distribution is finally compressed with an arithmetic coding method to reduce the volume of the transmitted data, and the data is transmitted to the target device, which decodes and reconstructs the received data to recover the original data before compression.

Description

Data intelligent compression method for twin system
Technical Field
The invention relates to the technical fields of computer science and electronic information engineering, and in particular to an intelligent data compression method oriented to twin systems.
Background
In a digital twin system, a twin model enables accurate monitoring, simulation and optimization of the real world by mapping physical objects in real time. Digital twin systems are widely applied in manufacturing, energy, smart cities, transportation and many other fields, where they improve operational efficiency, reduce failure rates, and provide greater flexibility and automation. Such systems rely on the transmission and processing of large amounts of data to ensure real-time interaction between the virtual model and the physical entity. This data spans many formats, including video, audio, sensor readings and text, and demands extremely low transmission latency. As data volumes continue to grow, efficiently transmitting data while preserving real-time performance has become one of the key challenges facing digital twin systems. Data compression has proven to be a reliable and effective response to this challenge: by reducing redundant information, it shrinks the data volume, raising transmission speed and lowering bandwidth consumption. This is particularly important for digital twin systems that handle diversified data in complex environments; in scenarios with strict low-latency requirements, data compression can meet the challenge while preserving the real-time performance and efficiency of the system. There is therefore a need for compression techniques that enable low-latency, efficient data transmission. However, existing compression methods are usually optimized for specific data types: they perform well on a single type of data, but adapt poorly when facing multiple data types, where their compression effectiveness drops markedly, making it difficult to satisfy low-latency transmission requirements.
To address these problems, the invention provides a general data compression method based on neural networks and arithmetic coding. By designing a Bootstrap model and a Combined model, and combining a multi-head attention mechanism, positional encoding, and residual networks, the method improves both the compression ratio and the compression speed, effectively reducing data transmission latency, adapting to the multiple data types present in a twin system, and providing an efficient compression and transmission scheme for the construction of twin systems.
Disclosure of Invention
The invention aims to solve the problem that, in a twin system, data transmission must meet low-latency and diversified-data requirements, and that failing to do so seriously degrades real-time interaction between the twin system and the physical entity. Existing compression methods are usually designed for specific data types: they perform well on a single type of data, but adapt poorly to multiple data types, where their compression effectiveness drops significantly. In addition, conventional compression methods struggle to compress diversified data efficiently in rapidly changing, complex environments. For these problems, the invention provides an intelligent data compression method for twin systems that improves the compression ratio and compression time efficiency by designing a Bootstrap model and a Combined model and combining a multi-head attention mechanism, positional encoding, and residual networks, thereby effectively reducing transmission latency, adapting to the various data types in a twin system, and ensuring real-time data processing.
To realize these functions, the invention designs an intelligent data compression method oriented to twin systems, comprising the following steps S1 to S5, which complete the compression and transmission of the data:
Step S1: collect video data, audio data and text data as the data to be compressed, and preprocess the data to be compressed, including standardization and format conversion, to obtain preprocessed data;
Step S2: construct and train a Bootstrap model that takes the preprocessed data as input and, based on a positional encoding module, a bidirectional gated recurrent unit, a multi-head attention mechanism, a residual module and functional modules, performs feature extraction and preliminary compression on the preprocessed data, outputting the probability distribution logits_b;
Step S3: construct and train a Combined model consisting of the Bootstrap model and a Supporter model; the Supporter model outputs the probability distribution logits_s based on a positional encoding module, a multi-head attention mechanism, residual modules and functional modules, and the Combined model merges the probability distribution logits_b output by the Bootstrap model with the probability distribution logits_s output by the Supporter model, strengthening the sequence feature representation of the data and compressing it further to generate the probability distribution logits_c;
Step S4: compress the probability distribution output in step S3 a final time using an arithmetic coding method, reducing the volume of the data transmission, and transmit the data to the target device;
Step S5: on the target device, decode and reconstruct the received data using the pre-trained Bootstrap model, the Combined model and an arithmetic decoding method, restoring the original data as it was before compression.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides an intelligent data compression method oriented to twin systems that is suitable for various types of data, including text, video, audio and sensor data. First, consistent and efficient compression across different data types is achieved through specific preprocessing steps and a generic compression model. Second, the Bootstrap and Combined models, together with a multi-head attention mechanism and residual blocks, realize efficient compression of diversified data, with clear advantages in compression ratio and compression speed over traditional methods. Meanwhile, combining these models with the probability model of arithmetic coding ensures highly reliable compression, achieving efficient lossless compression at a lower bit rate and reducing loss and error rates during transmission. Finally, the method adopts a standard neural network architecture and an arithmetic coding algorithm, so it is readily implementable and extensible; it can be conveniently integrated into an existing twin system and customized or extended for specific requirements. In summary, this general data compression method based on neural networks and arithmetic coding adapts to multiple data types, offers clear advantages in compression efficiency, and can significantly reduce data transmission latency; especially in complex environments, it ensures fast transmission and real-time processing of twin-system data, providing an efficient compression and transmission scheme for the construction of twin systems.
Drawings
FIG. 1 is a flow chart of a data intelligent compression method for a twin system provided according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of a Bootstrap model provided according to an embodiment of the present invention;
FIG. 3 is a block diagram of a Combined model provided in accordance with an embodiment of the present invention;
FIG. 4 is a graph of training and testing loss of a Bootstrap model as a function of iteration number, provided in accordance with an embodiment of the present invention;
FIG. 5 is a graph of the change in loss during training of a Combined model provided in accordance with an embodiment of the present invention;
FIG. 6 is a graph comparing the performance of the method of the present invention with a series of compression algorithms on a diverse data set, provided in accordance with an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
An embodiment of the invention provides an intelligent data compression method oriented to twin systems; referring to Fig. 1, the following steps S1 to S5 are executed to complete data compression and transmission:
Step S1, collecting video data, audio data and text data as the data to be compressed, and preprocessing the data to be compressed so that different types of data can be handled effectively; the preprocessing includes standardization and format conversion of the data and yields the preprocessed data;
The method for preprocessing the data to be compressed in step S1 is as follows:
Inter-frame sampling is performed on the data frames of the video and audio data, reducing the sampling rate by the following formula:

$$S' = \{\, S_i \mid i \bmod n = 0 \,\}$$

where $S'$ is the sampled frame sequence, $S_i$ denotes the $i$-th frame in the original frame sequence, and $n$ is the sampling interval, that is, one frame is kept every $n$ frames;
The sampled video and audio data are normalized to a unified scale by the following formula:

$$x_{\mathrm{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where $x_{\mathrm{norm}}$ is the normalized data value, $x$ is the original data value, and $\min(x)$ and $\max(x)$ are the minimum and maximum values in the data, respectively;
For text data, characters are mapped to integers by the following formula:

$$C_{\mathrm{int}} = \mathrm{ord}(c)$$

where $C_{\mathrm{int}}$ is the mapped integer value, $c$ is the original character, and $\mathrm{ord}$ is the function that converts a character to its corresponding ASCII integer value.
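As an illustration, the three preprocessing steps can be sketched in Python as follows; the frame layout and the sampling interval n = 4 are assumptions made for the example, not values fixed by the method:

```python
import numpy as np

def subsample(frames: np.ndarray, n: int = 4) -> np.ndarray:
    """Inter-frame sampling: keep every n-th frame of the sequence."""
    return frames[::n]

def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1], unifying the data scale."""
    return (x - x.min()) / (x.max() - x.min())

def text_to_ints(text: str) -> list:
    """Map each character to its integer code point via ord()."""
    return [ord(c) for c in text]

frames = np.arange(12.0)               # stand-in for a decoded frame sequence
print(subsample(frames))               # [0. 4. 8.]
print(normalize(frames)[:3])           # [0.         0.09090909 0.18181818]
print(text_to_ints("abc"))             # [97, 98, 99]
```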
Step S2, constructing and training a Bootstrap model that takes the preprocessed data as input and, based on a positional encoding module, a bidirectional gated recurrent unit, a multi-head attention mechanism, a residual module and functional modules, performs feature extraction and preliminary compression on the preprocessed data, optimizing the representation of the data and improving compression efficiency;
The Bootstrap model in step S2 comprises, in order, a positional encoding module, a bidirectional gated recurrent unit, a multi-head attention mechanism, a residual module and functional modules.
The Bootstrap model takes the preprocessed data as input and adds position information through the positional encoding module. A bidirectional gated recurrent unit (BiGRU) then outputs embedded features that capture the bidirectional dependencies of the data, improving the model's understanding of sequence data. After a flattening operation, a multi-head attention mechanism outputs attention scores that capture the complex relationships within the data, and a residual module mitigates the vanishing-gradient problem in the deep network. Finally, two parallel functional modules, a linear layer and a fully connected layer, generate the unscaled probability distribution logits_b to complete the preliminary prediction: the linear layer first applies a preliminary transformation to the extracted features, and the fully connected layer then maps the features to the output space.
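Below is a minimal, hedged PyTorch sketch of a Bootstrap-style model under this description; all layer sizes (vocab_size, d_model, n_heads, seq_len) and the exact placement of the flattening step are illustrative assumptions rather than values fixed by the patent:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds sinusoidal position information to a (batch, seq, d_model) tensor."""
    def __init__(self, d_model: int, max_len: int = 4096):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pe[: x.size(1)]

class BootstrapModel(nn.Module):
    """Positional encoding -> BiGRU -> multi-head attention -> residual ->
    flatten -> linear layer + fully connected layer -> unscaled logits_b."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4, seq_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = PositionalEncoding(d_model)
        self.bigru = nn.GRU(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)          # output dim = d_model
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.res = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.linear_head = nn.Linear(seq_len * d_model, d_model)
        self.fc_head = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq) ints
        h = self.pos(self.embed(x))
        h, _ = self.bigru(h)                 # bidirectional embedded features
        a, _ = self.attn(h, h, h)            # self-attention scores applied
        h = h + self.res(a)                  # residual connection
        flat = h.flatten(1)                  # flatten before the two heads
        return self.fc_head(self.linear_head(flat))      # unscaled logits_b

model = BootstrapModel()
x = torch.randint(0, 256, (2, 64))           # two windows of 64 symbols
print(model(x).shape)                        # torch.Size([2, 256]) -> logits_b
```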
The training process of the Bootstrap model is described with reference to Fig. 2. The Bootstrap model is trained by traversing the input data multiple times, optimizing its parameters to minimize the cross-entropy loss and yielding a high-quality symbol-probability prediction model. The Bootstrap model adapts to the requirements of different data sets by automatically selecting hyperparameters, and captures long-term dependencies and complex patterns in the data using the multi-head self-attention mechanism, positional encoding and residual modules. After training, the model parameters are saved and shipped as part of the compressed file, providing an efficient prediction basis for the compression process of the intelligent data compression method for twin systems.
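A hedged sketch of such a training pass over a symbol stream is shown below; the window length, learning rate, epoch count and checkpoint filename are assumptions made for illustration, and the code reuses the BootstrapModel sketch above:

```python
import torch
import torch.nn.functional as F

def train_bootstrap(model, stream, seq_len=64, epochs=5, lr=1e-3):
    """stream: 1-D LongTensor of preprocessed symbols (e.g. byte values)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                          # traverse the data multiple times
        for i in range(0, stream.numel() - seq_len - 1, seq_len):
            x = stream[i:i + seq_len].unsqueeze(0)   # context window (1, seq_len)
            y = stream[i + seq_len].unsqueeze(0)     # the symbol to predict
            loss = F.cross_entropy(model(x), y)      # minimize cross-entropy
            opt.zero_grad()
            loss.backward()                          # backpropagation
            opt.step()
    torch.save(model.state_dict(), "bootstrap.pt")   # parameters travel with the archive
    return model
```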
S3, constructing and training a coded model, wherein the coded model consists of a Bootstrap model and a Supporter model, the Supporter model is based on a position coding module, a multi-head attention mechanism, a residual error module and a functional module, probability distribution logits s is output, the coded model combines probability distribution logits b output by the Bootstrap model with probability distribution logits s output by the Supporter model, and sequence characteristic representation and further compression of data are enhanced to generate probability distribution logits c;
In the Combined model in step S3, the Supporter model inputs the preprocessed data, adds position information for the data through the position coding module, splices the data added with the position information with the embedded features output by the bidirectional gating circulation unit of the Bootstrap model, outputs attention scores through a multi-head attention mechanism, captures complex relations among the data, sequentially passes through a plurality of residual modules, enhances the stability and training efficiency of the model, and finally passes through three parallel functional modules, namely a linear module, a dense module and a residual module, wherein each module serves as an independent predictor for learning features with different complexity. The output vector of each module is reduced to the dimension matched with the vocabulary size through linear transformation, and finally weighted summation is carried out to generate probability distribution logits s, and Supporter model can extract basic mode, complex feature and deep information respectively, so that the accuracy of overall prediction is improved.
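A hedged PyTorch sketch of the three parallel predictors and their weighted sum is given below; the layer sizes and the use of softmax-normalized mixing weights are assumptions for illustration, since the patent specifies only that the three outputs are linearly projected to vocabulary size and then weighted-summed:

```python
import torch
import torch.nn as nn

class SupporterHeads(nn.Module):
    """Three parallel predictors combined by a learnable weighted sum."""
    def __init__(self, d_model: int = 128, vocab_size: int = 256):
        super().__init__()
        self.linear = nn.Linear(d_model, vocab_size)      # basic patterns
        self.dense = nn.Sequential(                       # complex features
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, vocab_size))
        self.res_block = nn.Sequential(                   # deep information
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model))
        self.res_proj = nn.Linear(d_model, vocab_size)
        self.mix = nn.Parameter(torch.ones(3) / 3)        # learnable weights

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, d_model)
        outs = torch.stack([
            self.linear(h),
            self.dense(h),
            self.res_proj(h + self.res_block(h)),         # residual predictor
        ])                                                # (3, batch, vocab)
        w = torch.softmax(self.mix, dim=0)                # normalized mixture
        return (w.view(3, 1, 1) * outs).sum(dim=0)        # logits_s
```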
The Combined model combines the probability distributions output by the Bootstrap model and the Supporter model, generating the final probability distribution logits_c through a convex sum:

$$\mathrm{logits}_c = \lambda \cdot \mathrm{logits}_b + (1 - \lambda) \cdot \mathrm{logits}_s$$

where $\lambda$ is a learnable parameter constrained to the range $[0, 1]$ by a sigmoid activation function. Through this combination, the Combined model significantly improves compression efficiency and prediction precision, ensuring strong performance across diverse data sets.
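As a minimal sketch, the convex sum with a sigmoid-constrained learnable λ can be written as follows (module and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class ConvexCombiner(nn.Module):
    """Convex sum of the two logit streams with a learnable lambda."""
    def __init__(self):
        super().__init__()
        self.raw_lambda = nn.Parameter(torch.zeros(1))    # unconstrained scalar

    def forward(self, logits_b: torch.Tensor, logits_s: torch.Tensor) -> torch.Tensor:
        lam = torch.sigmoid(self.raw_lambda)              # squashed into (0, 1)
        return lam * logits_b + (1.0 - lam) * logits_s    # logits_c
```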
The structure of the Combined model is shown in Fig. 3. In the Combined model, the Bootstrap model is combined with the randomly initialized Supporter model, and the final symbol-probability prediction is generated through the convex sum. When the training phase of the Combined model starts, the Bootstrap model keeps the parameters trained in step S2 frozen, while the Supporter model updates adaptively as needed. The input data sequence is divided into several equal-sized portions, and batch predictions are made in parallel. The prediction of the Combined model is converted into a probability distribution through a softmax activation function and then fed to the arithmetic coder for symbol coding. During encoding, the Supporter model continuously updates its parameters according to the actual symbols being predicted, optimizing its predictive capacity by minimizing the cross-entropy loss. The multi-head self-attention mechanism, positional encoding and residual modules further improve the adaptability and prediction precision of the model, ensuring that it can rapidly adapt to non-stationary statistics in the sequence. Finally, the coded symbols, together with the probability predictions generated by the Combined model, are fed to the arithmetic coder to produce an efficiently compressed file, achieving a higher compression ratio and faster coding speed.
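One encode-time adaptation step might look like the following hedged sketch, in which only the Supporter side receives gradients; the `combined` callable, the optimizer wiring, and the `encoder.encode(symbol, probs)` interface are assumptions added for the example, not an API defined by the patent:

```python
import torch
import torch.nn.functional as F

def adaptive_encode_step(combined, supporter_opt, context, symbol, encoder):
    """One symbol: predict, arithmetic-code, then adapt the Supporter online."""
    logits_c = combined(context)                       # (1, vocab) convex-sum logits
    probs = torch.softmax(logits_c, dim=-1)            # distribution for the coder
    encoder.encode(int(symbol), probs.detach().squeeze(0))  # assumed coder interface
    loss = F.cross_entropy(logits_c, symbol.view(1))   # loss on the actual symbol
    supporter_opt.zero_grad()
    loss.backward()   # reaches only Supporter params, assuming the Bootstrap
    supporter_opt.step()  # side was frozen with requires_grad_(False)
```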
The positional encoding module processes the data as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right)$$

$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$

where $PE$ is the position-encoded data, $pos$ is the sequence position, $i$ is the dimension index, and $d$ is the model dimension.
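A quick numeric check of these formulas, with an assumed toy dimension d = 4:

```python
import math

d = 4  # toy model dimension for the check

def pe(pos: int, k: int) -> float:
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    angle = pos / 10000 ** (2 * (k // 2) / d)
    return math.sin(angle) if k % 2 == 0 else math.cos(angle)

print([round(pe(1, k), 4) for k in range(d)])
# -> [0.8415, 0.5403, 0.01, 1.0]
```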
The multi-head attention mechanism processes the data as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $\mathrm{Attention}(Q, K, V)$ is the attention score; $Q$, $K$ and $V$ denote the query, key and value matrices, respectively; $d_k$ is the dimension of the key vectors, used to scale the dot product so that its magnitude does not grow too large and distort the softmax gradient; and softmax is the normalization function that converts attention scores into a probability distribution.
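The formula transcribes directly into tensor operations; in the sketch below the batch and head dimensions are folded together, and the shapes are illustrative assumptions:

```python
import math
import torch

def attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # scaled dot products
    return torch.softmax(scores, dim=-1) @ V           # probability-weighted values

Q = K = V = torch.randn(2, 8, 16)    # (batch, sequence, d_k)
print(attention(Q, K, V).shape)      # torch.Size([2, 8, 16])
```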
Step S4, finally compressing the probability distribution output in step S3 using an arithmetic coding method, an efficient coding scheme based on the probability distribution of the data, so as to reduce the volume of the data transmission, and transmitting the data to the target device;
The arithmetic coding method in step S4 updates the coding interval as follows:

$$\mathrm{new\_interval} = \big[\, low,\ \ low + (high - low) \times probability \,\big)$$

where new_interval is the new coding interval, low and high are the lower and upper bounds of the current interval, and probability is the occurrence probability of the current symbol; each coded symbol thus narrows the interval in proportion to its probability.
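A hedged sketch of this interval-narrowing step follows. In practice the coder also needs the cumulative probability of the symbols preceding the current one in the model's ordering to locate the sub-interval; that cumulative offset (cum_lo, cum_hi) is an assumption added for the example, since the where-clause above names only low, high and probability:

```python
def narrow_interval(low: float, high: float, cum_lo: float, cum_hi: float):
    """Shrink [low, high) to the sub-interval assigned to the current symbol,
    where cum_hi - cum_lo is that symbol's probability under the model."""
    width = high - low
    return low + width * cum_lo, low + width * cum_hi

# Two symbols: the first occupies [0.0, 0.6) of the CDF (P = 0.6),
# the second occupies [0.6, 1.0) (P = 0.4).
low, high = narrow_interval(0.0, 1.0, 0.0, 0.6)    # -> (0.0, 0.6)
low, high = narrow_interval(low, high, 0.6, 1.0)   # -> (0.36, 0.6)
```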
Step S5, on the target device, decoding and reconstructing the received data using the pre-trained Bootstrap model, the Combined model and an arithmetic decoding method, restoring the original data before compression. During decoding, the arithmetic decoding method works in concert with model inference, ensuring the integrity and accuracy of the data through transmission and recovery.
Fig. 4 shows how the training and test losses of the Bootstrap model of step S2 vary with the number of iterations when the intelligent data compression method for twin systems designed by the invention is applied. As the number of iterations increases, the training loss drops markedly from 1.35 to near 0.98, while the test loss drops from 1.20 to near 0.98, indicating that the performance of the Bootstrap model on the training and validation sets improves continuously and eventually stabilizes. Throughout the process the test loss remains slightly below the training loss, indicating that the Bootstrap model does not overfit and generalizes well. Overall, the Bootstrap model converges successfully during training and achieves the expected compression effect.
Fig. 5 shows the change in loss during training of the Combined model in step S3. This stage of training builds on the model trained in step S2 and reflects the results of further combining and optimizing the model structure. The loss value decreases steadily as training proceeds, and the model's error shrinks under continuous optimization, meaning that the performance of the model improves further during the training of the Combined model. Compared with the training results of step S2, the Combined model starts from the performance of the initial model and continues to reduce the loss through further combination and optimization. This demonstrates that the Combined model successfully inherits the advantages of the earlier training stage and further improves the generalization ability and compression effect of the model. From the overall trend, Fig. 5 shows that the optimization at this stage is effective and the test loss settles at a relatively stable level, confirming the soundness of the model design and training strategy.
Fig. 6 compares the performance of the method designed by the invention against a series of compression algorithms on diverse data sets. These algorithms include the conventional compressors 7-Zip, BSC, zip, bzip, gzip and tar, and the neural-network-based compressors CMIX and DZip. The test data sets cover a wide range of fields, including text (e.g., Webster and Text8), audio (Audio), genomic data (e.g., ill-quality), floating-point data (e.g., num-control), human chromosome data (e.g., H.chr1 and H.chr20), image data (e.g., kodim02), the mixed data sets XOR60 and HMM60, and specific data sets such as np-bases and Enwiki. The experimental results show that the proposed method performs strongly on all data sets, and is particularly superior to the traditional compressors and neural network compressors on the text and genome data sets. Compared with the baseline compression algorithms, the proposed method clearly improves both bits per character (BPC) and overall compression efficiency. Moreover, compared with other neural network compressors, the method offers not only excellent compression efficiency but also clear advantages in processing speed: although its compression of text data is slightly below that of the CMIX method, its encoding and decoding are three to four times as fast, fully reflecting its performance advantage. The method's strong performance stems from its multi-head self-attention mechanism, positional encoding and residual-block architecture; these techniques greatly improve its adaptability and generalization across data types, demonstrating its broad application potential and technical advantages in modern data compression.
In summary, comparison and analysis of the bits per character (BPC) achieved by multiple compressors on different types of real data sets show that the method of the invention provides significant advantages. By introducing a multi-head self-attention mechanism, positional encoding and residual modules, its compression of text, audio and genomic data sets is significantly better than that of traditional compressors and other neural network compressors. These improvements not only raise compression efficiency and prediction accuracy but also strengthen the adaptability and generalization ability of the model. The strong performance of the proposed method across a wide variety of data types therefore demonstrates its feasibility and effectiveness as an efficient, reliable compression method.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (7)

1. A data intelligent compression method for a twin system, characterized in that the following steps S1 to S5 are executed to complete data compression and transmission:

Step S1: collect video data, audio data and text data as the data to be compressed, and preprocess the data to be compressed, including standardization and format conversion, to obtain preprocessed data;

Step S2: construct and train a Bootstrap model that takes the preprocessed data as input and, based on a positional encoding module, a bidirectional gated recurrent unit, a multi-head attention mechanism, a residual module and functional modules, performs feature extraction and preliminary compression on the preprocessed data, outputting the probability distribution logits_b;

Step S3: construct and train a Combined model consisting of the Bootstrap model and a Supporter model, wherein the Supporter model outputs the probability distribution logits_s based on a positional encoding module, a multi-head attention mechanism, a residual module and functional modules, and the Combined model combines the probability distribution logits_b output by the Bootstrap model with the probability distribution logits_s output by the Supporter model, strengthening the sequence feature representation of the data and compressing it further to generate the probability distribution logits_c;

Step S4: compress the probability distribution output in step S3 a final time using an arithmetic coding method to reduce the volume of data transmission, and transmit the data to the target device;

Step S5: on the target device, decode and reconstruct the received data using the pre-trained Bootstrap model, the Combined model and an arithmetic decoding method, restoring the original data before compression.

2. The data intelligent compression method for a twin system according to claim 1, characterized in that the data to be compressed is preprocessed in step S1 as follows:

inter-frame sampling is performed on the data frames of the video data and audio data, reducing the sampling rate by the following formula:

$$S' = \{\, S_i \mid i \bmod n = 0 \,\}$$

where $S'$ is the sampled frame sequence, $S_i$ denotes the $i$-th frame in the original frame sequence, and $n$ is the sampling interval;

the sampled video data and audio data are normalized as follows:

$$x_{\mathrm{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where $x_{\mathrm{norm}}$ is the normalized data value, $x$ is the original data value, and $\min(x)$ and $\max(x)$ are the minimum and maximum values in the data, respectively;

for text data, characters are mapped to integers by the following formula:

$$C_{\mathrm{int}} = \mathrm{ord}(c)$$

where $C_{\mathrm{int}}$ is the mapped integer value, $c$ is the original character, and $\mathrm{ord}$ is the function that converts a character to its corresponding ASCII integer value.

3. The data intelligent compression method for a twin system according to claim 1, characterized in that the Bootstrap model in step S2 comprises, in order, a positional encoding module, a bidirectional gated recurrent unit, a multi-head attention mechanism, a residual module and functional modules;

the Bootstrap model takes the preprocessed data as input and adds position information through the positional encoding module; a bidirectional gated recurrent unit then outputs embedded features that capture the bidirectional dependencies of the data; after a flattening operation, a multi-head attention mechanism outputs attention scores that capture the complex relationships among the data; the unscaled probability distribution logits_b is then generated through the residual module and two parallel functional modules to complete the preliminary prediction, the two parallel functional modules being a linear layer and a fully connected layer: the linear layer first applies a preliminary transformation to the extracted features, and the fully connected layer then maps the features to the output space.

4. The data intelligent compression method for a twin system according to claim 1, characterized in that, in the Combined model of step S3, the Supporter model takes the preprocessed data as input and adds position information through the positional encoding module; the position-encoded data is concatenated with the embedded features output by the bidirectional gated recurrent unit of the Bootstrap model; the concatenated features pass through a multi-head attention mechanism that outputs attention scores capturing the complex relationships among the data, then sequentially through several residual modules, and finally through three parallel functional modules, namely a linear module, a dense module and a residual module; the output vector of each module is reduced by a linear transformation to a dimension matching the vocabulary size, and a weighted sum is taken at the end to generate the probability distribution logits_s;

the Combined model combines the probability distributions output by the Bootstrap model and the Supporter model, generating the final probability distribution logits_c through a convex sum:

$$\mathrm{logits}_c = \lambda \cdot \mathrm{logits}_b + (1 - \lambda) \cdot \mathrm{logits}_s$$

where $\lambda$ is a learnable parameter constrained to the range $[0, 1]$ by a sigmoid activation function.

5. The data intelligent compression method for a twin system according to claim 3 or 4, characterized in that the positional encoding module processes the data as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right)$$

$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$

where $PE$ is the position-encoded data, $pos$ is the sequence position, $i$ is the dimension index, and $d$ is the model dimension.

6. The data intelligent compression method for a twin system according to claim 3 or 4, characterized in that the multi-head attention mechanism processes the data as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $\mathrm{Attention}(Q, K, V)$ is the attention score, $Q$, $K$ and $V$ denote the query, key and value matrices, respectively, $d_k$ is the dimension of the key vectors, and softmax is the normalization function.

7. The data intelligent compression method for a twin system according to claim 1, characterized in that the arithmetic coding method in step S4 is as follows:

$$\mathrm{new\_interval} = \big[\, low,\ \ low + (high - low) \times probability \,\big)$$

where new_interval is the new coding interval, low and high are the lower and upper bounds of the current interval, and probability is the occurrence probability of the current symbol.
CN202411745562.9A 2024-12-02 2024-12-02 A data intelligent compression method for twin systems Active CN119232171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411745562.9A CN119232171B (en) 2024-12-02 2024-12-02 A data intelligent compression method for twin systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411745562.9A CN119232171B (en) 2024-12-02 2024-12-02 A data intelligent compression method for twin systems

Publications (2)

Publication Number Publication Date
CN119232171A CN119232171A (en) 2024-12-31
CN119232171B true CN119232171B (en) 2025-04-25

Family

ID=94070520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411745562.9A Active CN119232171B (en) 2024-12-02 2024-12-02 A data intelligent compression method for twin systems

Country Status (1)

Country Link
CN (1) CN119232171B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120433778B (en) * 2025-07-08 2025-09-05 南京信息工程大学 A universal lossless data compression method based on multimodal feature fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116155297A (en) * 2022-12-29 2023-05-23 杭州和利时自动化有限公司 Data compression method, device, equipment and storage medium
CN116229071A (en) * 2023-03-02 2023-06-06 西安电子科技大学 Integrated MP-Unet segmentation method based on multi-mode MRI

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004507145A (en) * 2000-08-15 2004-03-04 シーゲイト テクノロジー エルエルシー Dual mode compression of operating code
CN110737764B (en) * 2019-10-24 2023-07-07 西北工业大学 Personalized dialogue content generation method
CN113393025A (en) * 2021-06-07 2021-09-14 浙江大学 Non-invasive load decomposition method based on Informer model coding structure

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116155297A (en) * 2022-12-29 2023-05-23 杭州和利时自动化有限公司 Data compression method, device, equipment and storage medium
CN116229071A (en) * 2023-03-02 2023-06-06 西安电子科技大学 Integrated MP-Unet segmentation method based on multi-mode MRI

Also Published As

Publication number Publication date
CN119232171A (en) 2024-12-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant