
CN109800869B - Data compression method and related device - Google Patents


Info

Publication number
CN109800869B
CN109800869B (application CN201811641325.2A)
Authority
CN
China
Prior art keywords
data
data packet
packet
packets
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811641325.2A
Other languages
Chinese (zh)
Other versions
CN109800869A (en)
Inventor
王和国
李爱军
曹庆新
李炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201811641325.2A
Publication of CN109800869A
Priority to PCT/CN2019/114731 (published as WO2020134550A1)
Application granted
Publication of CN109800869B
Legal status: Active

Classifications

    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    (all within GPHYSICS > G06 Computing or calculating; counting > G06N Computing arrangements based on specific computational models > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the application discloses a data compression method and a related device. The method is applied to a neural network model comprising an N-layer structure, where N is an integer greater than 1, and comprises the following steps: acquiring an output data set of an i-th layer structure in the neural network model, wherein the output data set comprises at least one m×n matrix, m and n are integers greater than 1, and i is any one of 1 to N; performing data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, wherein M is an integer greater than or equal to 1; and performing data compression on the M first data packets to obtain two second data packets after data compression. By adopting the embodiment of the application, the compression efficiency of data in the neural network model can be improved.

Description

Data compression method and related device
Technical Field
The present application relates to the field of neural network technology, and in particular, to a data compression method and related apparatus.
Background
With the development of artificial intelligence, neural networks, particularly convolutional neural networks (CNNs), play an increasingly important role. During the operation of a neural network model, its data and weights place heavy demands on system bandwidth, so reducing the bandwidth requirement of the neural network model is an urgent problem.
At present, compressed column storage (CCS) and compressed row storage (CRS) are used to compress data in neural network models, and neither compression scheme achieves high compression efficiency.
Disclosure of Invention
The embodiment of the application provides a data compression method and a related device, which are used for improving the compression efficiency of data in a neural network model.
In a first aspect, an embodiment of the present application provides a data compression method, which is applied to a neural network model including an N-layer structure, where N is an integer greater than 1, and the method includes:
acquiring an output data set of an i-th layer structure in the neural network model, wherein the output data set comprises at least one m×n matrix, m and n are integers greater than 1, and i is any one of 1 to N;
performing data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, wherein M is an integer greater than or equal to 1;
and performing data compression on the M first data packets to obtain two second data packets after data compression.
In one possible example, the obtaining the output data set of the i-th layer structure in the neural network model includes:
when i is 1, acquiring an input data set of a layer 1 structure in the neural network model, wherein the input data set of the layer 1 structure comprises at least one first matrix;
acquiring a weight data set of the 1 st layer structure, decompressing the weight data set of the 1 st layer structure to obtain a second matrix, wherein the second matrix is the decompressed weight data set of the 1 st layer structure;
determining an output data set of the layer 1 structure based on the at least one first matrix and the second matrix;
when 2 ≤ i ≤ N, acquiring an output data set of the (i-1)-th layer structure in the neural network model, wherein the output data set of the (i-1)-th layer structure comprises at least one third matrix;
acquiring a weight data set of the ith layer structure, decompressing the weight data set of the ith layer structure to obtain a fourth matrix, wherein the fourth matrix is the decompressed weight data set of the ith layer structure;
determining an output data set of the i-th layer structure based on the at least one third matrix and the fourth matrix.
In one possible example, the performing data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets includes:
performing array conversion on the at least one m×n matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one m×n matrix after array conversion;
and performing data packet segmentation on the one-dimensional array to obtain M segmented first data packets, wherein each of the 1st to (M-1)-th first data packets comprises P data, the M-th first data packet comprises Q data, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
In one possible example, the performing data compression on the M first data packets to obtain two second data packets after data compression includes:
acquiring data packet information of a j-th first data packet, where the data packet information of the j-th first data packet includes P indication signals, a first data set, and the length of the j-th first data packet, the indication signals are used to indicate whether each of the P data included in the j-th first data packet is zero, the first data set includes at least one non-zero datum among the P data included in the j-th first data packet, and the j-th first data packet is any one of the 1st to (M-1)-th first data packets;
performing the same operation on the M-2 first data packets other than the j-th first data packet among the 1st to (M-1)-th first data packets, to obtain data packet information of each of the M-2 first data packets;
acquiring data packet information of the M-th first data packet, where the data packet information of the M-th first data packet includes Q indication signals, a second data set, and the length of the M-th first data packet, the indication signals are used to indicate whether each of the Q data included in the M-th first data packet is zero, and the second data set includes at least one non-zero datum among the Q data included in the M-th first data packet;
forming a first sub-data packet from the P indication signals and the first data set included in each of the 1st to (M-1)-th first data packets, to obtain M-1 first sub-data packets;
forming an M-th first sub-data packet from the Q indication signals and the second data set included in the M-th first data packet;
forming a 1st second data packet from the M-1 first sub-data packets and the M-th first sub-data packet, based on the ordering of the M first data packets;
and forming a 2nd second data packet from the lengths of the M first data packets, based on the ordering of the M first data packets.
In a possible example, after the performing data compression on the M first data packets to obtain two second data packets after data compression, the method further includes:
performing the same operation on the N-1 layer structures other than the i-th layer structure among the N layer structures, to obtain two second data packets corresponding to each of the N-1 layer structures;
and storing the 2N second data packets corresponding to the N-layer structure into a double-data rate synchronous dynamic random access memory (DDR).
In a second aspect, an embodiment of the present application provides a data compression apparatus, which is applied to a neural network model including an N-layer structure, where N is an integer greater than 1, and the apparatus includes:
an obtaining unit, configured to obtain an output data set of an i-th layer structure in the neural network model, where the output data set includes at least one m×n matrix, m and n are integers greater than 1, and i is any one of 1 to N;
a segmentation unit, configured to perform data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, where M is an integer greater than or equal to 1;
and the compression unit is used for carrying out data compression on the M first data packets to obtain two second data packets after data compression.
In one possible example, in obtaining the output data set of the i-th layer structure in the neural network model, the obtaining unit is specifically configured to:
when i is 1, acquiring an input data set of a layer 1 structure in the neural network model, wherein the input data set of the layer 1 structure comprises at least one first matrix;
acquiring a weight data set of the 1 st layer structure, decompressing the weight data set of the 1 st layer structure to obtain a second matrix, wherein the second matrix is the decompressed weight data set of the 1 st layer structure;
determining an output data set of the layer 1 structure based on the at least one first matrix and the second matrix;
when 2 ≤ i ≤ N, acquiring an output data set of the (i-1)-th layer structure in the neural network model, wherein the output data set of the (i-1)-th layer structure comprises at least one third matrix;
acquiring a weight data set of the ith layer structure, decompressing the weight data set of the ith layer structure to obtain a fourth matrix, wherein the fourth matrix is the decompressed weight data set of the ith layer structure;
determining an output data set of the i-th layer structure based on the at least one third matrix and the fourth matrix.
In a possible example, in terms of performing data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, the segmentation unit is specifically configured to:
perform array conversion on the at least one m×n matrix to obtain a one-dimensional array, where the one-dimensional array is the at least one m×n matrix after array conversion;
and perform data packet segmentation on the one-dimensional array to obtain M segmented first data packets, where each of the 1st to (M-1)-th first data packets includes P data, the M-th first data packet includes Q data, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
In a possible example, in terms of performing data compression on the M first data packets to obtain two second data packets after data compression, the compression unit is specifically configured to:
acquiring data packet information of a j-th first data packet, where the data packet information of the j-th first data packet includes P indication signals, a first data set, and the length of the j-th first data packet, the indication signals are used to indicate whether each of the P data included in the j-th first data packet is zero, the first data set includes at least one non-zero datum among the P data included in the j-th first data packet, and the j-th first data packet is any one of the 1st to (M-1)-th first data packets;
performing the same operation on the M-2 first data packets other than the j-th first data packet among the 1st to (M-1)-th first data packets, to obtain data packet information of each of the M-2 first data packets;
acquiring data packet information of the M-th first data packet, where the data packet information of the M-th first data packet includes Q indication signals, a second data set, and the length of the M-th first data packet, the indication signals are used to indicate whether each of the Q data included in the M-th first data packet is zero, and the second data set includes at least one non-zero datum among the Q data included in the M-th first data packet;
forming a first sub-data packet from the P indication signals and the first data set included in each of the 1st to (M-1)-th first data packets, to obtain M-1 first sub-data packets;
forming an M-th first sub-data packet from the Q indication signals and the second data set included in the M-th first data packet;
forming a 1st second data packet from the M-1 first sub-data packets and the M-th first sub-data packet, based on the ordering of the M first data packets;
and forming a 2nd second data packet from the lengths of the M first data packets, based on the ordering of the M first data packets.
In one possible example, the data compression apparatus further comprises:
the execution unit is used for performing the same operation on the N-1 layer structures other than the i-th layer structure among the N layer structures, to obtain two second data packets corresponding to each of the N-1 layer structures;
and the storage unit is used for storing the 2N second data packets corresponding to the N-layer structure into a double data rate synchronous dynamic random access memory (DDR).
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing steps in the method according to the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement some or all of the steps described in the method according to the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in a method as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.
It can be seen that, in the embodiment of the present application, the data compression apparatus obtains an output data set of an i-th layer structure in the neural network model, where the output data set includes at least one m×n matrix, performs data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, and performs data compression on the M first data packets to obtain two second data packets. Therefore, by segmenting the at least one m×n matrix into M first data packets and compressing the M first data packets into two second data packets, the output data set is compressed into two second data packets, which improves the compression efficiency of data in the neural network model.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the background art, the drawings required for the embodiments or the background art are briefly described below.
Fig. 1A is a schematic flowchart of a data compression method according to an embodiment of the present application;
FIG. 1B is a schematic diagram provided by an embodiment of the present application;
FIG. 1C is another schematic illustration provided by an embodiment of the present application;
FIG. 1D is another schematic illustration provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating another data compression method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating another data compression method according to an embodiment of the present application;
FIG. 4 is a block diagram illustrating functional units of a data compression apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description of the invention
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
The details are described below.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The data compression apparatus according to the embodiment of the present application may be integrated in an electronic device, and the electronic device may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), Mobile Stations (MS), terminal devices (terminal device), and the like. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices.
The following describes embodiments of the present application in detail.
Referring to Fig. 1A, Fig. 1A is a schematic flowchart of a data compression method according to an embodiment of the present application. The method is applied to a neural network model including an N-layer structure, where N is an integer greater than 1, and includes the following steps:
step 101: the data compression device obtains an output data set of an ith layer structure in the neural network model, wherein the output data set comprises at least one m x N matrix, m and N are integers greater than 1, and i is any one of 1 to N.
Taking a convolutional neural network as an example, the N-layer structure includes an input layer, convolutional layers, pooling layers, and a fully connected layer.
In one possible example, the data compression apparatus obtains an output data set of an i-th layer structure in the neural network model, including:
when i is 1, the data compression device acquires an input data set of a layer 1 structure in the neural network model, wherein the input data set of the layer 1 structure comprises at least one first matrix;
the data compression device obtains the weight data set of the 1 st layer structure, and decompresses the weight data set of the 1 st layer structure to obtain a second matrix, wherein the second matrix is the decompressed weight data set of the 1 st layer structure;
the data compression means determines a set of output data of the layer 1 structure based on the at least one first matrix and the second matrix;
when 2 ≤ i ≤ N, the data compression device acquires an output data set of the (i-1)-th layer structure in the neural network model, wherein the output data set of the (i-1)-th layer structure comprises at least one third matrix;
the data compression device obtains the weight data set of the ith layer structure, and decompresses the weight data set of the ith layer structure to obtain a fourth matrix, wherein the fourth matrix is the decompressed weight data set of the ith layer structure;
the data compression means determines an output data set of the i-th layer structure based on the at least one third matrix and the fourth matrix.
The output data set of the 1 st layer structure is the product of at least one first matrix and a second matrix, and the number of columns of each first matrix is the same as the number of rows of the second matrix.
And the output data set of the ith layer structure is the product of at least one third matrix and a fourth matrix, and the column number of each third matrix is the same as the row number of the fourth matrix.
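As a hedged illustration of the layer computation just described (the function names are my own, not from the patent), the output data set of a layer can be sketched in plain Python as the product of each input (or previous-layer output) matrix with the decompressed weight matrix:

```python
def matmul(a, b):
    """Plain-Python matrix product; the column count of `a` must equal
    the row count of `b`, as the description above requires."""
    assert len(a[0]) == len(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def layer_output(input_matrices, weight_matrix):
    """Output data set of one layer: each input (or previous-layer
    output) matrix multiplied by the decompressed weight matrix."""
    return [matmul(m, weight_matrix) for m in input_matrices]
```

This is only a sketch of the data flow; a real implementation would also apply the layer's activation and any convolution-specific indexing, which the patent does not detail here.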
Step 102: the data compression device performs data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, wherein M is an integer greater than or equal to 1.
In one possible example, the data compression apparatus performs data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, including:
the data compression apparatus performs array conversion on the at least one m×n matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one m×n matrix after array conversion;
the data compression apparatus performs data packet segmentation on the one-dimensional array to obtain M segmented first data packets, wherein each of the 1st to (M-1)-th first data packets comprises P data, the M-th first data packet comprises Q data, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
All of the P data and the Q data have the same size.
For example, as shown in Fig. 1B, assuming that P is 10, the data compression apparatus performs array conversion on an 8×8 matrix (a) to obtain a one-dimensional array (b) containing 64 data, and performs data packet segmentation on the one-dimensional array (b) to obtain 7 segmented first data packets (c), where each of the 1st to 6th first data packets includes 10 data and the 7th first data packet includes 4 data.
For example, as shown in Fig. 1C, assuming that P is 16, the data compression apparatus performs array conversion on two 8×8 matrices (d) to obtain a one-dimensional array (e) containing 128 data, and performs data packet segmentation on the one-dimensional array (e) to obtain 8 segmented first data packets (f), where each of the 1st to 8th first data packets includes 16 data.
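The segmentation in the two examples above can be sketched as follows; this is an illustrative reading of the patent, and `segment` is a hypothetical helper name:

```python
def segment(matrices, P):
    """Flatten one or more m×n matrices (row-major) into a
    one-dimensional array, then split it into first data packets of at
    most P data each: the first M-1 packets hold P data, and the last
    holds the remaining Q data (1 <= Q <= P).
    """
    flat = [x for mat in matrices for row in mat for x in row]
    return [flat[i:i + P] for i in range(0, len(flat), P)]
```

With one 8×8 matrix and P = 10 this yields 7 packets (six of 10 data and one of 4), matching Fig. 1B; with two 8×8 matrices and P = 16 it yields 8 packets of 16 data, matching Fig. 1C.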
Step 103: and the data compression device performs data compression on the M first data packets to obtain two second data packets after data compression.
In one possible example, the data compression apparatus performs data compression on the M first data packets to obtain two second data packets after data compression, including:
the data compression device acquires data packet information of a j-th first data packet, where the data packet information of the j-th first data packet includes P indication signals, a first data set, and the length of the j-th first data packet, the indication signals are used to indicate whether each of the P data included in the j-th first data packet is zero, the first data set includes at least one non-zero datum among the P data included in the j-th first data packet, and the j-th first data packet is any one of the 1st to (M-1)-th first data packets;
the data compression device performs the same operation on the M-2 first data packets other than the j-th first data packet among the 1st to (M-1)-th first data packets, to obtain data packet information of each of the M-2 first data packets;
the data compression device acquires data packet information of the M-th first data packet, where the data packet information of the M-th first data packet includes Q indication signals, a second data set, and the length of the M-th first data packet, the indication signals are used to indicate whether each of the Q data included in the M-th first data packet is zero, and the second data set includes at least one non-zero datum among the Q data included in the M-th first data packet;
the data compression device forms a first sub-data packet from the P indication signals and the first data set included in each of the 1st to (M-1)-th first data packets, obtaining M-1 first sub-data packets;
the data compression device forms an M-th first sub-data packet from the Q indication signals and the second data set included in the M-th first data packet;
the data compression device forms a 1st second data packet from the M-1 first sub-data packets and the M-th first sub-data packet, based on the ordering of the M first data packets;
the data compression device forms a 2nd second data packet from the lengths of the M first data packets, based on the ordering of the M first data packets.
The ordering of the M first sub-packets in the 1 st second data packet is the same as the ordering of the M first data packets, that is, the M first sub-packets correspond to the M first data packets one to one.
The length ordering of the M first data packets in the 2 nd second data packet is the same as the ordering of the M first data packets, that is, the lengths of the M first data packets correspond to the M first data packets one to one.
For example, as shown in Fig. 1D, assume that M is 3 and the 3 first data packets are ordered from the 1st to the 3rd. The data compression apparatus obtains the data packet information of the 3 first data packets: the data packet information of the 1st first data packet includes 16 indication signals (16 bits) and 12 non-zero data (data 1 to data 12), and the length of the 1st first data packet is 64 bits; the data packet information of the 2nd first data packet includes 16 indication signals (16 bits) and 10 non-zero data (data 13 to data 22), and the length of the 2nd first data packet is 56 bits; the data packet information of the 3rd first data packet includes 16 indication signals (16 bits) and 11 non-zero data (data 23 to data 33), and the length of the 3rd first data packet is 60 bits. The indication signals and non-zero data included in each of the 3 first data packets are formed into a first sub-data packet, yielding 3 first sub-data packets; based on the ordering of the 3 first data packets, the 3 first sub-data packets form the 1st second data packet (g), and the lengths of the 3 first data packets form the 2nd second data packet (h).
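A minimal sketch of the compression step, assuming (as the Fig. 1D numbers suggest) one indication bit per datum and 4-bit data values; the helper name and the length accounting are illustrative, not prescribed by the patent:

```python
def compress(packets, data_bits=4):
    """For each first data packet, build its packet information: one
    indication bit per datum (1 = non-zero) plus the non-zero values.
    The sub-packets, kept in packet order, make up the 1st second data
    packet; the per-packet bit lengths make up the 2nd second packet.
    """
    sub_packets, lengths = [], []
    for pkt in packets:
        bitmap = [1 if x != 0 else 0 for x in pkt]
        nonzero = [x for x in pkt if x != 0]
        sub_packets.append((bitmap, nonzero))
        # length in bits: one bit per indication signal plus data_bits
        # per non-zero datum; 4-bit data reproduce the 64/56/60-bit
        # lengths in the Fig. 1D example (16 + 4*12 = 64, etc.)
        lengths.append(len(bitmap) + data_bits * len(nonzero))
    return sub_packets, lengths
```

With a 16-element packet containing 12 non-zero values this gives a 64-bit length, matching the 1st first data packet in Fig. 1D.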
It can be seen that, in the embodiment of the present application, the data compression apparatus obtains an output data set of an i-th layer structure in the neural network model, where the output data set includes at least one m×n matrix, performs data packet segmentation on the at least one m×n matrix to obtain M segmented first data packets, and performs data compression on the M first data packets to obtain two second data packets. Therefore, by segmenting the at least one m×n matrix into M first data packets and compressing the M first data packets into two second data packets, the output data set is compressed into two second data packets, which improves the compression efficiency of data in the neural network model.
In one possible example, after the data compression device performs data compression on the M first data packets to obtain two second data packets after the data compression, the method further includes:
the data compression device performs the same operation on the N-1 layer structures other than the i-th layer structure among the N layer structures, to obtain two second data packets corresponding to each of the N-1 layer structures;
and the data compression device stores the 2N second data packets corresponding to the N-layer structure into a double-data-rate synchronous dynamic random access memory (DDR).
As can be seen, in this example, the data compression apparatus stores the 2N second packets corresponding to the N-layer structure into the DDR, and since the storage of the 2N second packets into the DDR is an off-chip storage manner, the occupation of the internal storage space of the neural network model is reduced.
In one possible example, after the data compression apparatus stores the 2N second data packets corresponding to the N-layer structure into the DDR, the method further includes:
the data compression device acquires a first position of data to be searched, where the first position is the p-th row and q-th column of the data to be searched in a target matrix included in the output data set of the i-th layer structure, with 1 ≤ p ≤ m and 1 ≤ q ≤ n;

the data compression device determines, based on the first position and P, to read the lengths of R first data packets, where R is ((p-1)×n+q)/P rounded down;

the data compression device determines, based on the 2nd second data packet corresponding to the i-th layer structure, that the sum of the lengths of the 1st first data packet to the R-th first data packet is S;

the data compression device reads the (S+1)-th indication signal to the [S+((p-1)×n+q)-R×P]-th indication signal in the 1st second data packet corresponding to the i-th layer structure;

the data compression device determines a second position of the data to be searched based on the (S+1)-th indication signal to the [S+((p-1)×n+q)-R×P]-th indication signal, where the second position is a sequence number of the data to be searched in the 1st second data packet corresponding to the i-th layer structure;

and the data compression device reads the target data at the second position from the 1st second data packet corresponding to the i-th layer structure, where the target data is the data to be searched.

Specifically, the data compression apparatus determines the second position of the data to be searched based on the (S+1)-th indication signal to the [S+((p-1)×n+q)-R×P]-th indication signal as follows: the data compression device determines that the number of non-zero indication signals from the (S+1)-th indication signal to the [S+((p-1)×n+q)-R×P]-th indication signal is T; and the data compression device determines that the second position of the data to be searched is S+P+T.
For example, assume the i-th layer structure is a convolutional layer, each non-zero datum is 4 bits, an output data set of the convolutional layer is an 8×8 matrix, and P is 16. The data compression apparatus acquires the first position of the data to be searched as the 7th row, 4th column of the 8×8 matrix, determines, based on the first position and P, to read the lengths of 3 first data packets, and obtains from the 2nd second data packet corresponding to the convolutional layer that the length of the 1st first data packet is 64 bits (the 1st first data packet includes 12 non-zero data), the length of the 2nd first data packet is 56 bits, and the length of the 3rd first data packet is 60 bits. It determines that the sum of the lengths of the 1st to 3rd first data packets is 180 bits, reads the 181st to 184th indication signals in the 1st second data packet corresponding to the convolutional layer, determines that the number of non-zero indication signals among the 181st to 184th indication signals is 4, determines that the second position of the data to be searched is 200, and reads the 200th data from the 1st second data packet corresponding to the convolutional layer, where the 200th data is the data to be searched.
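The lookup described above can be sketched in a simplified, element-level model: offsets count data items rather than bits, and the two second data packets are held as Python structures instead of a packed bit stream, so the arithmetic differs from the bit-level offsets in the text. All function names are hypothetical.

```python
# Sketch of random access into the compressed stream (element-level model;
# the patent's actual scheme works on bit offsets and packet lengths).
P = 16  # data per first data packet

def compress(flat):
    """Split flat data into packets of P; keep (mask, nonzeros) per packet."""
    packets = []
    for start in range(0, len(flat), P):
        chunk = flat[start:start + P]
        mask = [1 if d != 0 else 0 for d in chunk]   # indication signals
        nonzeros = [d for d in chunk if d != 0]
        packets.append((mask, nonzeros))
    return packets

def lookup(packets, p, q, n):
    """Read the element at row p, column q (1-based) of an m x n matrix."""
    idx = (p - 1) * n + (q - 1)        # 0-based linear position
    r, offset = divmod(idx, P)         # full packets skipped, position inside
    mask, nonzeros = packets[r]
    if not mask[offset]:
        return 0                       # a zero was compressed away
    t = sum(mask[:offset + 1])         # rank among the packet's non-zero data
    return nonzeros[t - 1]

# 8 x 8 example: row 7, column 4 is linear position 52 (1-based), i.e. the
# 4th datum of the 4th packet, as in the text.
flat = [0] * 64
flat[51] = 9
assert lookup(compress(flat), 7, 4, 8) == 9
```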
As can be seen, in this example, the data compression apparatus determines, based on the first position of the data to be searched and P, to read the lengths of R first data packets, determines that the sum of the lengths of the 1st to R-th first data packets is S, reads the (S+1)-th to [S+((p-1)×n+q)-R×P]-th indication signals in the 1st second data packet corresponding to the i-th layer structure, determines the second position of the data to be searched based on those indication signals, and reads the target data at the second position from the 1st second data packet corresponding to the i-th layer structure. Therefore, the lengths of the R first data packets are determined from the first position of the data to be searched and P, the second position of the data to be searched is determined based on those lengths, and the position of the compressed source data can be quickly located within the variable-length compressed data.
Referring to fig. 2, fig. 2 is a schematic flow chart of another data compression method according to an embodiment of the present application, consistent with the embodiment shown in fig. 1A, where the data compression method includes:
step 201: the data compression device obtains an output data set of an ith layer structure in the neural network model, wherein the output data set comprises at least one m x N matrix, m and N are integers greater than 1, and i is any one of 1 to N.
Step 202: and the data compression device performs array conversion on the at least one mxn matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one mxn matrix subjected to array conversion.
Step 203: the data compression device divides the one-dimensional array into data packets to obtain M divided first data packets, wherein each of the 1 st first data packet to the M-1 st first data packet comprises P data, the M first data packet comprises Q data, M is an integer greater than or equal to 1, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
Step 204: the data compression device obtains data packet information of a jth first data packet, where the data packet information of the jth first data packet includes P indication signals, a first data set, and a length of the jth first data packet, where the indication signals are used to indicate whether each piece of P data included in the jth first data packet is zero, the first data set includes at least one non-zero piece of P data included in the jth first data packet, and the jth first data packet is any one of the 1 st first data packet to the M-1 st first data packet.
Step 205: and the data compression device executes the same operation on M-2 first data packets except the jth first data packet from the 1 st first data packet to the M-1 st first data packet to obtain data packet information of each first data packet in the M-2 first data packets.
Step 206: the data compression device obtains data packet information of the mth first data packet, the data packet information of the mth first data packet includes Q indication signals, a second data set and a length of the mth first data packet, the indication signals are used for indicating whether each piece of Q data included in the mth first data packet is zero, and the second data set includes at least one piece of non-zero data in the Q data included in the mth first data packet.
Step 207: and the data compression device combines the P indication signals and the first data set included in each of the 1 st to M-1 th first data packets into a first sub-packet to obtain M-1 first sub-packets.
Step 208: and the data compression device combines the Q indication signals and the second data set included by the Mth first data packet into an Mth first sub-packet.
Step 209: and the data compression device combines the M-1 first sub-packets and the Mth first sub-packet into a 1 st second data packet based on the sequencing of the M first data packets.
Step 210: the data compression device groups the lengths of the M first packets into a 2 nd second packet based on the ordering of the M first packets.
It should be noted that, the specific implementation of the steps of the method shown in fig. 2 can refer to the specific implementation described in the above method, and will not be described here.
In accordance with the embodiment shown in fig. 1A and fig. 2, please refer to fig. 3, and fig. 3 is a schematic flow chart of another data compression method provided in the present application, where the data compression method includes:
step 301: the data compression device obtains an output data set of an ith layer structure in the neural network model, wherein the output data set comprises at least one m x N matrix, m and N are integers greater than 1, and i is any one of 1 to N.
Step 302: and the data compression device performs array conversion on the at least one mxn matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one mxn matrix subjected to array conversion.
Step 303: the data compression device divides the data packets of the one-dimensional array to obtain M divided first data packets, wherein each of the 1 st first data packet to the M-1 st first data packet comprises P data, the M first data packet comprises Q data, M is an integer greater than 1, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
Step 304: the data compression device obtains data packet information of a jth first data packet, where the data packet information of the jth first data packet includes P indication signals, a first data set, and a length of the jth first data packet, where the indication signals are used to indicate whether each piece of P data included in the jth first data packet is zero, the first data set includes at least one non-zero piece of P data included in the jth first data packet, and the jth first data packet is any one of the 1 st first data packet to the M-1 st first data packet.
Step 305: and the data compression device executes the same operation on M-2 first data packets except the jth first data packet from the 1 st first data packet to the M-1 st first data packet to obtain data packet information of each first data packet in the M-2 first data packets.
Step 306: the data compression device obtains data packet information of the mth first data packet, the data packet information of the mth first data packet includes Q indication signals, a second data set and a length of the mth first data packet, the indication signals are used for indicating whether each piece of Q data included in the mth first data packet is zero, and the second data set includes at least one piece of non-zero data in the Q data included in the mth first data packet.
Step 307: and the data compression device combines the P indication signals and the first data set included in each of the 1 st to M-1 th first data packets into a first sub-packet to obtain M-1 first sub-packets.
Step 308: and the data compression device combines the Q indication signals and the second data set included by the Mth first data packet into an Mth first sub-packet.
Step 309: and the data compression device combines the M-1 first sub-packets and the Mth first sub-packet into a 1 st second data packet based on the sequencing of the M first data packets.
Step 310: the data compression device groups the lengths of the M first packets into a 2 nd second packet based on the ordering of the M first packets.
Step 311: and the data compression device executes the same operation on the N-1 layers of structures except the ith layer of structure in the N layers of structures to obtain two second data packets corresponding to each layer of structure in the N-1 layers of structures.
Step 312: and the data compression device stores the 2N second data packets corresponding to the N-layer structure into a double-data-rate synchronous dynamic random access memory (DDR).
It should be noted that, the specific implementation of the steps of the method shown in fig. 3 can refer to the specific implementation described in the above method, and will not be described here.
The above embodiments mainly introduce the scheme of the embodiments of the present application from the perspective of the method-side implementation process. It is to be understood that the data compression apparatus includes hardware structures and/or software modules corresponding to the respective functions for implementing the above-described functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the data compression apparatus may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
The following is an embodiment of the apparatus of the present application, which is used to execute the method implemented by the embodiment of the method of the present application. Referring to fig. 4, fig. 4 is a block diagram illustrating functional units of a data compression apparatus 400 according to an embodiment of the present application, in which the data compression apparatus 400 is applied to a neural network model including an N-layer structure, where N is an integer greater than 1, the data compression apparatus 400 includes:
an obtaining unit 401, configured to obtain an output data set of an i-th layer structure in the neural network model, where the output data set includes at least one m × N matrix, m and N are both integers greater than 1, and i is any one of 1 to N;
a dividing unit 402, configured to perform packet division on the at least one mxn matrix to obtain M divided first data packets, where M is an integer greater than or equal to 1;
a compressing unit 403, configured to perform data compression on the M first data packets to obtain two second data packets after data compression.
It can be seen that, in the embodiment of the present application, the data compression apparatus obtains an output data set of an i-th layer structure in the neural network model, where the output data set includes at least one m×n matrix, performs data packet segmentation on the at least one m×n matrix to obtain M divided first data packets, and performs data compression on the M first data packets to obtain two second data packets. By dividing the at least one m×n matrix into M first data packets and compressing the M first data packets into two second data packets, the output data set is compressed into two second data packets, improving the compression efficiency of the data in the neural network model.
In one possible example, in obtaining the output data set of the i-th layer structure in the neural network model, the obtaining unit 401 is specifically configured to:
when i is 1, acquiring an input data set of a layer 1 structure in the neural network model, wherein the input data set of the layer 1 structure comprises at least one first matrix;
acquiring a weight data set of the 1 st layer structure, decompressing the weight data set of the 1 st layer structure to obtain a second matrix, wherein the second matrix is the decompressed weight data set of the 1 st layer structure;
determining an output data set of the layer 1 structure based on the at least one first matrix and the second matrix;
when i is more than or equal to 2 and less than or equal to N, acquiring an output data set of an i-1 layer structure in the neural network model, wherein the output data set of the i-1 layer structure comprises at least one third matrix;
acquiring a weight data set of the ith layer structure, decompressing the weight data set of the ith layer structure to obtain a fourth matrix, wherein the fourth matrix is the decompressed weight data set of the ith layer structure;
determining an output data set of the i-th layer structure based on the at least one third matrix and the fourth matrix.
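The per-layer flow above (decompress the layer's weight data set, then combine it with the layer input to produce the output data set) can be sketched as follows. The patent does not fix the layer operation, so treating it as a plain matrix product is an assumption for illustration; the zero-reconstruction step mirrors the indication-signal scheme used elsewhere in the text, and all names are hypothetical.

```python
# Sketch: determine a layer's output from its input matrices and its
# decompressed weight matrix. The matrix product stands in for the
# (unspecified) layer operation.
def decompress_weights(signals, nonzeros):
    """Rebuild a flat weight list from indication signals + non-zero data."""
    it = iter(nonzeros)
    return [next(it) if s else 0 for s in signals]

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

signals = [1, 0, 0, 1]                      # compressed 2 x 2 weight set
weights_flat = decompress_weights(signals, [2, 3])
w = [weights_flat[0:2], weights_flat[2:4]]  # [[2, 0], [0, 3]]
x = [[1, 1]]                                # layer input (one first matrix)
print(matmul(x, w))                         # [[2, 3]]
```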
In a possible example, in terms of performing packet segmentation on the at least one M × n matrix to obtain M segmented first packets, the segmentation unit 402 is specifically configured to:
performing array conversion on the at least one mxn matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one mxn matrix subjected to array conversion;
and performing data packet segmentation on the one-dimensional array to obtain M segmented first data packets, wherein each of the 1st to (M-1)-th first data packets comprises P data, the M-th first data packet comprises Q data, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
In a possible example, in terms of performing data compression on the M first data packets to obtain two second data packets after data compression, the compression unit 403 is specifically configured to:
acquiring data packet information of a jth first data packet, where the data packet information of the jth first data packet includes P indication signals, a first data set, and a length of the jth first data packet, where the indication signals are used to indicate whether each piece of P data included in the jth first data packet is zero, the first data set includes at least one non-zero piece of P pieces of data included in the jth first data packet, and the jth first data packet is any one of the 1 st first data packet to the M-1 st first data packet;
performing the same operation on M-2 first data packets except the jth first data packet from the 1 st first data packet to the M-1 st first data packet to obtain data packet information of each first data packet in the M-2 first data packets;
acquiring data packet information of the M-th first data packet, where the data packet information of the M-th first data packet includes Q indication signals, a second data set, and the length of the M-th first data packet, where the indication signals are used to indicate whether each of the Q data included in the M-th first data packet is zero, and the second data set includes at least one non-zero datum among the Q data included in the M-th first data packet;
forming a first sub-data packet by P indicating signals and a first data set included in each of the 1 st to M-1 th first data packets to obtain M-1 first sub-data packets;
forming an Mth first sub-data packet by Q indicating signals and a second data set which are included in the Mth first data packet;
forming a 1 st second data packet by the M-1 first sub-packets and the Mth first sub-packet based on the ordering of the M first data packets;
and forming the length of the M first data packets into a 2 nd second data packet based on the sorting of the M first data packets.
In one possible example, the data compression apparatus 400 further includes:
an executing unit 404, configured to execute the same operation on N-1 layers of the N-layer structures except for the ith layer of structure, to obtain two second data packets corresponding to each layer of the N-1 layers of structures;
the storage unit 405 is configured to store the 2N second data packets corresponding to the N-layer structure into the double data rate synchronous dynamic random access memory DDR.
Consistent with the embodiments shown in fig. 1A, fig. 2 and fig. 3, please refer to fig. 5, fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, where the electronic device includes a processor, a memory, a communication interface, and one or more programs, the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the following steps:
acquiring an output data set of an ith layer structure in the neural network model, wherein the output data set comprises at least one m x N matrix, m and N are integers greater than 1, and i is any one of 1 to N;
performing data packet segmentation on the at least one mxn matrix to obtain M segmented first data packets, wherein M is an integer greater than or equal to 1;
and performing data compression on the M first data packets to obtain two second data packets after data compression.
It can be seen that, in the embodiment of the present application, an output data set of an i-th layer structure in a neural network model is obtained, where the output data set includes at least one m×n matrix, the at least one m×n matrix is subjected to data packet segmentation to obtain M segmented first data packets, and the M first data packets are subjected to data compression to obtain two second data packets. By dividing the at least one m×n matrix into M first data packets and compressing the M first data packets into two second data packets, the output data set is compressed into two second data packets, improving the compression efficiency of the data in the neural network model.
In one possible example, in obtaining the output data set of the i-th layer structure in the neural network model, the program comprises instructions for performing the following steps:
when i is 1, acquiring an input data set of a layer 1 structure in the neural network model, wherein the input data set of the layer 1 structure comprises at least one first matrix;
acquiring a weight data set of the 1st layer structure, decompressing the weight data set of the 1st layer structure to obtain a second matrix, wherein the second matrix is the decompressed weight data set of the 1st layer structure;
determining an output data set of the layer 1 structure based on the at least one first matrix and the second matrix;
when i is more than or equal to 2 and less than or equal to N, acquiring an output data set of an i-1 layer structure in the neural network model, wherein the output data set of the i-1 layer structure comprises at least one third matrix;
acquiring a weight data set of the ith layer structure, decompressing the weight data set of the ith layer structure to obtain a fourth matrix, wherein the fourth matrix is the decompressed weight data set of the ith layer structure;
determining an output data set of the i-th layer structure based on the at least one third matrix and the fourth matrix.
In one possible example, in terms of performing packet segmentation on the at least one M × n matrix to obtain M segmented first packets, the program includes instructions specifically configured to:
performing array conversion on the at least one mxn matrix to obtain a one-dimensional array, wherein the one-dimensional array is the at least one mxn matrix subjected to array conversion;
and performing data packet segmentation on the one-dimensional array to obtain M segmented first data packets, wherein each of the 1st to (M-1)-th first data packets comprises P data, the M-th first data packet comprises Q data, P is an integer greater than or equal to 1, and Q is an integer greater than or equal to 1 and less than or equal to P.
In one possible example, in terms of performing data compression on the M first data packets to obtain two second data packets after data compression, the program includes instructions specifically configured to perform the following steps:
acquiring data packet information of a jth first data packet, where the data packet information of the jth first data packet includes P indication signals, a first data set, and a length of the jth first data packet, where the indication signals are used to indicate whether each piece of P data included in the jth first data packet is zero, the first data set includes at least one non-zero piece of P pieces of data included in the jth first data packet, and the jth first data packet is any one of the 1 st first data packet to the M-1 st first data packet;
performing the same operation on M-2 first data packets except the jth first data packet from the 1 st first data packet to the M-1 st first data packet to obtain data packet information of each first data packet in the M-2 first data packets;
acquiring data packet information of the M-th first data packet, where the data packet information of the M-th first data packet includes Q indication signals, a second data set, and the length of the M-th first data packet, where the indication signals are used to indicate whether each of the Q data included in the M-th first data packet is zero, and the second data set includes at least one non-zero datum among the Q data included in the M-th first data packet;
forming a first sub-data packet by P indicating signals and a first data set included in each of the 1 st to M-1 th first data packets to obtain M-1 first sub-data packets;
forming an Mth first sub-data packet by Q indicating signals and a second data set which are included in the Mth first data packet;
forming a 1 st second data packet by the M-1 first sub-packets and the Mth first sub-packet based on the ordering of the M first data packets;
and forming the length of the M first data packets into a 2 nd second data packet based on the sorting of the M first data packets.
In one possible example, the program further includes instructions for performing the steps of:
executing the same operation on N-1 layers of structures except the ith layer of structure in the N layers of structures to obtain two second data packets corresponding to each layer of structure in the N-1 layers of structures;
and storing the 2N second data packets corresponding to the N-layer structure in a double data rate synchronous dynamic random access memory (DDR).
Embodiments of the present application further provide a computer storage medium for storing a computer program, where the computer program is executed by a processor to implement part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-mentioned method of the embodiments of the present application. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific implementation and application scope, and in view of the above, the content of the present specification should not be construed as a limitation to the present application.

Claims (6)

1. A data compression method, applied to a neural network model comprising an N-layer structure, N being an integer greater than 1, the method comprising:

obtaining an output data set of an i-th layer of the neural network model, the output data set comprising at least one m×n matrix, m and n each being an integer greater than 1, and i being any one of 1 to N;

dividing the at least one m×n matrix into data packets to obtain M divided first data packets, M being an integer greater than or equal to 1; and

compressing the M first data packets to obtain two compressed second data packets;

wherein dividing the at least one m×n matrix into data packets to obtain the M divided first data packets comprises:

converting the at least one m×n matrix into a one-dimensional array, the one-dimensional array being the at least one m×n matrix after array conversion; and

dividing the one-dimensional array into the M first data packets, each of the 1st to (M-1)-th first data packets comprising P data items and the M-th first data packet comprising Q data items, P being an integer greater than or equal to 1, and Q being an integer greater than or equal to 1 and less than or equal to P;

and wherein compressing the M first data packets to obtain the two compressed second data packets comprises:

obtaining packet information of a j-th first data packet, the packet information of the j-th first data packet comprising P indicator signals, a first data set and the length of the j-th first data packet, the indicator signals indicating whether each of the P data items in the j-th first data packet is zero, the first data set comprising at least one non-zero data item among the P data items in the j-th first data packet, and the j-th first data packet being any one of the 1st to (M-1)-th first data packets;

performing the same operation on the M-2 first data packets, among the 1st to (M-1)-th first data packets, other than the j-th first data packet, to obtain the packet information of each of the M-2 first data packets;

obtaining packet information of the M-th first data packet, the packet information of the M-th first data packet comprising Q indicator signals, a second data set and the length of the M-th first data packet, the indicator signals indicating whether each of the Q data items in the M-th first data packet is zero, and the second data set comprising at least one non-zero data item among the Q data items;

combining the P indicator signals and the first data set of each of the 1st to (M-1)-th first data packets into a first sub-packet, to obtain M-1 first sub-packets;

combining the Q indicator signals and the second data set of the M-th first data packet into an M-th first sub-packet;

combining the M-1 first sub-packets and the M-th first sub-packet into a 1st second data packet according to the ordering of the M first data packets; and

combining the lengths of the M first data packets into a 2nd second data packet according to the ordering of the M first data packets.

2. The method according to claim 1, wherein obtaining the output data set of the i-th layer of the neural network model comprises:

when i = 1, obtaining an input data set of the 1st layer of the neural network model, the input data set of the 1st layer comprising at least one first matrix; obtaining a weight data set of the 1st layer and decompressing it to obtain a second matrix, the second matrix being the decompressed weight data set of the 1st layer; and determining the output data set of the 1st layer based on the at least one first matrix and the second matrix; and

when 2 ≤ i ≤ N, obtaining the output data set of the (i-1)-th layer of the neural network model, the output data set of the (i-1)-th layer comprising at least one third matrix; obtaining a weight data set of the i-th layer and decompressing it to obtain a fourth matrix, the fourth matrix being the decompressed weight data set of the i-th layer; and determining the output data set of the i-th layer based on the at least one third matrix and the fourth matrix.

3. The method according to claim 1, wherein after compressing the M first data packets to obtain the two compressed second data packets, the method further comprises:

performing the same operation on the N-1 layers of the N-layer structure other than the i-th layer, to obtain two second data packets for each of the N-1 layers; and

storing the 2N second data packets corresponding to the N layers in a double data rate synchronous dynamic random access memory (DDR).

4. A data compression apparatus, applied to a neural network model comprising an N-layer structure, N being an integer greater than 1, the apparatus comprising:

an obtaining unit configured to obtain an output data set of an i-th layer of the neural network model, the output data set comprising at least one m×n matrix, m and n each being an integer greater than 1, and i being any one of 1 to N;

a dividing unit configured to divide the at least one m×n matrix into data packets to obtain M divided first data packets, M being an integer greater than or equal to 1; and

a compression unit configured to compress the M first data packets to obtain two compressed second data packets;

wherein, in dividing the at least one m×n matrix into data packets to obtain the M divided first data packets, the dividing unit is specifically configured to:

convert the at least one m×n matrix into a one-dimensional array, the one-dimensional array being the at least one m×n matrix after array conversion; and

divide the one-dimensional array into the M first data packets, each of the 1st to (M-1)-th first data packets comprising P data items and the M-th first data packet comprising Q data items, P being an integer greater than or equal to 1, and Q being an integer greater than or equal to 1 and less than or equal to P;

and wherein, in compressing the M first data packets to obtain the two compressed second data packets, the compression unit is specifically configured to:

obtain packet information of a j-th first data packet, the packet information of the j-th first data packet comprising P indicator signals, a first data set and the length of the j-th first data packet, the indicator signals indicating whether each of the P data items in the j-th first data packet is zero, the first data set comprising at least one non-zero data item among the P data items in the j-th first data packet, and the j-th first data packet being any one of the 1st to (M-1)-th first data packets;

perform the same operation on the M-2 first data packets, among the 1st to (M-1)-th first data packets, other than the j-th first data packet, to obtain the packet information of each of the M-2 first data packets;

obtain packet information of the M-th first data packet, the packet information of the M-th first data packet comprising Q indicator signals, a second data set and the length of the M-th first data packet, the indicator signals indicating whether each of the Q data items in the M-th first data packet is zero, and the second data set comprising at least one non-zero data item among the Q data items;

combine the P indicator signals and the first data set of each of the 1st to (M-1)-th first data packets into a first sub-packet, to obtain M-1 first sub-packets;

combine the Q indicator signals and the second data set of the M-th first data packet into an M-th first sub-packet;

combine the M-1 first sub-packets and the M-th first sub-packet into a 1st second data packet according to the ordering of the M first data packets; and

combine the lengths of the M first data packets into a 2nd second data packet according to the ordering of the M first data packets.

5. An electronic device, comprising a processor, a memory, a communication interface, and one or more programs, the one or more programs being stored in the memory and configured to be executed by the processor, the one or more programs comprising instructions for performing the steps of the method according to any one of claims 1 to 3.

6. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 3.
CN201811641325.2A 2018-12-29 2018-12-29 Data compression method and related device Active CN109800869B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811641325.2A CN109800869B (en) 2018-12-29 2018-12-29 Data compression method and related device
PCT/CN2019/114731 WO2020134550A1 (en) 2018-12-29 2019-10-31 Data compression method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811641325.2A CN109800869B (en) 2018-12-29 2018-12-29 Data compression method and related device

Publications (2)

Publication Number Publication Date
CN109800869A CN109800869A (en) 2019-05-24
CN109800869B true CN109800869B (en) 2021-03-05

Family

ID=66558223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811641325.2A Active CN109800869B (en) 2018-12-29 2018-12-29 Data compression method and related device

Country Status (2)

Country Link
CN (1) CN109800869B (en)
WO (1) WO2020134550A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800869B (en) * 2018-12-29 2021-03-05 深圳云天励飞技术有限公司 Data compression method and related device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Deep neural network compression method considering load balancing
CN107608937A (en) * 2017-09-11 2018-01-19 浙江大学 Machine learning fan condition monitoring method and device based on a cloud computing platform
CN108615074A (en) * 2018-04-28 2018-10-02 中国科学院计算技术研究所 Neural network processing system and method based on compressed sensing

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN106447034B (en) * 2016-10-27 2019-07-30 中国科学院计算技术研究所 Neural network processor based on data compression, design method, and chip
CN108122030A (en) * 2016-11-30 2018-06-05 华为技术有限公司 Operation method, device and server for convolutional neural networks
CN107220702B (en) * 2017-06-21 2020-11-24 北京图森智途科技有限公司 A computer vision processing method and device for low computing power processing equipment
CN107145939B (en) * 2017-06-21 2020-11-24 北京图森智途科技有限公司 A computer vision processing method and device for low computing power processing equipment
CN107590533B (en) * 2017-08-29 2020-07-31 中国科学院计算技术研究所 Compression device for deep neural network
CN107634937A (en) * 2017-08-29 2018-01-26 中国地质大学(武汉) A wireless sensor network data compression method, device and storage device thereof
CN107565971B (en) * 2017-09-07 2020-04-14 华为技术有限公司 A data compression method and device
CN107634943A (en) * 2017-09-08 2018-01-26 中国地质大学(武汉) A weight reduction wireless sensor network data compression method, device and storage device
CN107729995A (en) * 2017-10-31 2018-02-23 中国科学院计算技术研究所 Method and system for accelerating a neural network processor, and neural network processor
CN108615076B (en) * 2018-04-08 2020-09-11 瑞芯微电子股份有限公司 Deep learning chip-based data storage optimization method and device
CN108763379B (en) * 2018-05-18 2022-06-03 北京奇艺世纪科技有限公司 Data compression method, data decompression method, device and electronic equipment
CN109800869B (en) * 2018-12-29 2021-03-05 深圳云天励飞技术有限公司 Data compression method and related device


Also Published As

Publication number Publication date
CN109800869A (en) 2019-05-24
WO2020134550A1 (en) 2020-07-02

Similar Documents

Publication Publication Date Title
CN113673701B (en) Operation method of neural network model, readable medium and electronic equipment
CN115115720B (en) Image decoding, encoding method, device and equipment
CN111491169B (en) Digital image compression method, device, equipment and medium
CN103139567B (en) The method and apparatus of a kind of image compression and decompression
CN114640354B (en) Data compression methods, apparatus, electronic devices and computer-readable storage media
CN110399511A (en) Image cache method, equipment, storage medium and device based on Redis
CN112668708A (en) Convolution operation device for improving data utilization rate
CN106780363A (en) Picture processing method and device and electronic equipment
CN111310115A (en) Data processing method, device and chip, electronic equipment and storage medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN116912556A (en) Image classification method, device, electronic device and storage medium
CN109800869B (en) Data compression method and related device
EP2783509B1 (en) Method and apparatus for generating a bitstream of repetitive structure discovery based 3d model compression
CN111045726B (en) Deep learning processing device and method supporting coding and decoding
CN108880559B (en) Data compression method, data decompression method, compression device and decompression device
CN113810058B (en) Data compression method, data decompression method, device and electronic equipment
US8515189B2 (en) Image compression method with fixed compression ratio, image decompression method, and electronic device thereof
CN117560013A (en) Data compression methods and electronic devices
CN110869975A (en) Image processing method and apparatus, and video processor
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
CN114070901A (en) Data sending and receiving method, device and equipment based on multi-data alignment
CN110460854B (en) Image compression method
CN112508187A (en) Machine learning model compression method, device and equipment
CN111143641A (en) Deep learning model training method and device and electronic equipment
CN117939127A (en) Image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant