CN118249817B - Decoding method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number: CN118249817B (application CN202410636569.0A)
- Authority: CN (China)
- Prior art keywords: decoding, data, coding, coding sequences, neural network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—Electricity > H03—Electronic circuitry > H03M—Coding; decoding; code conversion in general > H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits > H03M7/30—Compression; expansion; suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression > H03M7/6005—Decoder aspects
- H03M7/3059—Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression > H03M7/3064—Segmenting
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code > H03M7/4031—Fixed length to variable length coding > H03M7/4037—Prefix coding
Abstract
An embodiment of the present disclosure provides a decoding method and device, an electronic device, and a computer-readable storage medium. The decoding method includes: acquiring multiple coding sequences based on Huffman coding, the coding sequences being obtained by segmenting the data to be encoded and encoding each segment separately; loading the coding sequences onto a chip; and using a decoding model on the chip to decode the coding sequences in parallel on-chip, with the multiple codes inside each coding sequence also decoded in parallel. The decoding model is a model with parallel decoding capability obtained by training or fitting on multiple coding-sequence samples, which are themselves obtained by segmenting sample data to be encoded and encoding each segment separately. This scheme realizes parallel decoding of Huffman coding sequences, thereby reducing decoding latency and increasing decoding speed in scenarios such as bandwidth compression for large models.
Description
Technical Field

The embodiments of the present disclosure relate to the field of decoding technology, and in particular to a decoding method and device, an electronic device, and a computer-readable storage medium.

Background

Huffman coding is an entropy coding method that derives a variable-length code table from the statistical characteristics of the data, achieving lossless data compression. It is widely used in the network transmission of files such as images to reduce bandwidth requirements.

However, Huffman decoding is generally serial, and such a decoding method is difficult to map onto the vector-matrix compute units that GPUs (Graphics Processing Units), brain-inspired chips, and similar hardware provide in abundance. The decoding step therefore incurs a large additional time cost, which offsets part of the performance advantage gained from the compressed bandwidth.
Summary of the Invention

The embodiments of the present disclosure provide a decoding method and device, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a decoding method, comprising:

acquiring multiple coding sequences based on Huffman coding, the multiple coding sequences being obtained by segmenting the data to be encoded and encoding each segment separately;

loading the multiple coding sequences onto a chip;

using a decoding model on the chip to decode the multiple coding sequences in parallel on-chip, and to decode the multiple codes within each coding sequence in parallel;

wherein the decoding model is a model with parallel decoding capability obtained by training or fitting on multiple coding-sequence samples, the samples being obtained by segmenting sample data to be encoded and encoding each segment separately.
In a second aspect, the present disclosure provides a decoding device, comprising:

an acquisition module for acquiring multiple coding sequences based on Huffman coding, the multiple coding sequences being obtained by segmenting the data to be encoded and encoding each segment separately;

a loading module for loading the multiple coding sequences onto a chip;

a decoding module for using the decoding model on the chip to decode the multiple coding sequences in parallel on-chip, and to decode the multiple codes within each coding sequence in parallel;

wherein the decoding model is a model with parallel decoding capability obtained by training or fitting on multiple coding-sequence samples, the samples being obtained by segmenting sample data to be encoded and encoding each segment separately.
In a third aspect, the present disclosure provides an electronic device, comprising:

multiple processing cores; and

a network-on-chip configured to exchange data among the multiple processing cores and with the outside; wherein one or more of the processing cores store one or more instructions that, when executed by one or more of the processing cores, enable the one or more processing cores to perform the decoding method described above.

In a fourth aspect, the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor or processing core, implements the decoding method described above.
By using a decoding model on a chip to decode, in parallel and on-chip, multiple coding sequences obtained through Huffman coding, and to decode the multiple codes within each coding sequence in parallel, the embodiments provided by the present disclosure realize parallel decoding of Huffman coding sequences, thereby reducing decoding latency and increasing decoding speed in scenarios such as bandwidth compression for large models.

It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present disclosure, nor to limit its scope. Other features of the present disclosure will become readily apparent from the following description.
Brief Description of the Drawings

The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the embodiments, they serve to explain the present disclosure without limiting it. The above and other features and advantages will become more apparent to those skilled in the art from the detailed example embodiments described with reference to the drawings, in which:

FIG. 1 is a flowchart of a decoding method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a decoding method provided by an embodiment of the present disclosure;

FIG. 3 is a block diagram of a decoding device provided by an embodiment of the present disclosure;

FIG. 4 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description

To help those skilled in the art better understand the technical solutions of the present disclosure, exemplary embodiments are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

Provided there is no conflict, the embodiments of the present disclosure and the features within them may be combined with one another.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not restricted to physical or mechanical connections and may include electrical connections, whether direct or indirect.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will further be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When a large language model (Large Language Model) is deployed for inference on a chip, especially an edge chip, tokens are generated mainly serially. Each generated token therefore requires the entire set of model weights and other parameters to be loaded from memory (video memory) onto the chip once. As a result, for each serially generated token, the demand on memory transfer bandwidth far exceeds the amount of computation, so memory transfer bandwidth is the main bottleneck for inference speed.

In addition, Huffman coding can be used to compress neural network weights and thereby reduce bandwidth requirements. For example, the information content of the chatglm3 weights is only about 72% of their storage size, so there is potential for compression.

However, Huffman decoding is generally serial, and the decoding method is difficult to map onto the vector-matrix multiplication units that GPUs (Graphics Processing Units), brain-inspired chips, and similar hardware provide in abundance. An on-chip Huffman decoding scheme therefore incurs a large additional time cost, which offsets part of the performance advantage gained from the compressed bandwidth.

At present, the basic Huffman decoding method matches the encoded data bit by bit against a binary tree built from the code table, completing the decoding serially. Its main drawback is that it is hard to parallelize, so decoding long sequences takes a long time.
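The bit-by-bit tree walk just described can be sketched as follows, with a hypothetical prefix-code table standing in for the binary tree. Note that the loop is inherently sequential, which is exactly the drawback the disclosure addresses:

```python
# Baseline serial Huffman decode: consume the 0/1 stream one bit at a
# time until a complete codeword is matched. The code table below is a
# hypothetical example, not one taken from the patent.

def huffman_decode_serial(bits, code_table):
    """Decode a 0/1 string bit by bit using a prefix-code table."""
    inverse = {code: sym for sym, code in code_table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:      # a full codeword has been matched
            out.append(inverse[buf])
            buf = ""
    return out

table = {"a": "0", "b": "10", "c": "11"}
decoded = huffman_decode_serial("0101100", table)  # each step depends on the last
```

Because each bit's interpretation depends on where the previous codeword ended, the positions cannot be processed independently, which is why this baseline maps poorly onto vector-matrix hardware.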
In summary, current Huffman decoding is predominantly serial and hard to parallelize, and the decoding style is a poor fit for compute chips equipped with large-scale vector-matrix multiplication units, such as brain-inspired computing chips and GPUs. This makes it difficult to apply in scenarios such as bandwidth reduction for large-model inference.
By using a decoding model on a chip to decode, in parallel and on-chip, multiple coding sequences obtained through Huffman coding, and to decode the multiple codes within each coding sequence in parallel, the embodiments provided by the present disclosure realize parallel decoding of Huffman coding sequences, thereby improving decoding efficiency, reducing decoding latency, and increasing decoding speed in scenarios such as bandwidth compression for large models.

The decoding method according to the embodiments of the present disclosure may be executed by an electronic device such as a terminal device or a server. The terminal device may be a vehicle-mounted device, user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable program instructions stored in a memory, or may be executed by a server.

FIG. 1 is a flowchart of a decoding method provided by an embodiment of the present disclosure. Referring to FIG. 1, the method includes steps S11-S13:

S11. Acquire multiple coding sequences based on Huffman coding; the multiple coding sequences are obtained by segmenting the data to be encoded and encoding each segment separately.
A Huffman code table for the data is constructed in advance from the statistical characteristics of all data to be encoded and transmitted: count the occurrences (probability) of each character in the data; repeatedly take the two smallest counts (probabilities), add them, and make the corresponding nodes the left and right subtrees of a new binary-tree node, until the total probability reaches 1; label the left branch of each node 0 and the right branch 1; read off the 0s and 1s along the path from the root to each character as that character's Huffman code; and map all characters to their Huffman codes in a table, yielding the Huffman code table.
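The construction above can be sketched as follows. This is a minimal sketch: representing each heap node as a partial code map (rather than an explicit binary tree) and the tie-breaking counter are implementation conveniences, not details prescribed by the patent:

```python
import heapq
from collections import Counter

# Merge the two least-frequent nodes repeatedly, prefixing 0 for the
# left subtree and 1 for the right, until a single tree remains.

def build_huffman_table(data):
    counts = Counter(data)
    # Each heap entry: (count, tie-break index, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                 # degenerate case: one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # left branch -> 0
        merged.update({s: "1" + c for s, c in right.items()})  # right -> 1
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

table = build_huffman_table("aaaabbc")  # most frequent symbol -> shortest code
```

As expected for Huffman codes, the resulting table is prefix-free and assigns the shortest code to the most frequent symbol.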
In an embodiment of the present disclosure, the data to be encoded may be large-model weight data. Before acquiring the multiple coding sequences based on Huffman coding, the method may further include:

taking the large-model weight data of each row of the large-model weight matrix as one data segment to be encoded, and performing Huffman coding on each data segment to obtain one coding sequence; or,

taking the large-model weight data of each column of the large-model weight matrix as one data segment to be encoded, and performing Huffman coding on each data segment to obtain one coding sequence.

In an embodiment of the present disclosure, as shown in FIG. 2, the data to be encoded and transmitted may be encoded in segments. For example, for the data in a weight matrix, each row (or each column) may be taken as one data segment to be encoded, and Huffman coding is performed on each segment to obtain one coding sequence. Each coding sequence is a sequence of 0s and/or 1s, referred to as a 0/1 sequence for short.
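The row-wise scheme can be sketched as follows; the toy weight matrix and the code table here are hypothetical examples, not values from the patent:

```python
# Segment the weight matrix by row and Huffman-encode each row into its
# own independent 0/1 sequence, as in the row-wise variant above.

def encode_rows(matrix, table):
    """Encode each row of `matrix` into one 0/1 string using `table`."""
    return ["".join(table[v] for v in row) for row in matrix]

weights = [[3, 8, 3],          # toy "weight matrix": one segment per row
           [8, 5, 3]]
table = {3: "0", 8: "10", 5: "11"}   # assumed frequency-ordered code table
sequences = encode_rows(weights, table)  # one coding sequence per row
```

Each resulting string is an independent coding sequence, which is what later allows the sequences to be decoded in parallel.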
In an embodiment of the present disclosure, before loading the multiple coding sequences onto the chip, the method may further include:

obtaining the first coding sequence, i.e., the longest of the multiple coding sequences;

padding the bit length of each second coding sequence (each coding sequence other than the first) according to the length of the first coding sequence.

In an embodiment of the present disclosure, the multiple 0/1 sequences obtained after encoding may be padded to the length of the longest 0/1 sequence among them.

In an embodiment of the present disclosure, padding the second coding sequences according to the length of the first coding sequence includes:

appending 0s and/or 1s to each second coding sequence according to the length of the first coding sequence.

In an embodiment of the present disclosure, the trailing bits may be padded entirely with 0s, entirely with 1s, or with a combination of 0s and 1s; the exact padding scheme is not limited here and may be defined as needed. The padding should be clearly distinguishable from the original 0/1 sequence, so that the decoding model can easily identify the original 0/1 sequence to be decoded.
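A minimal sketch of this padding step, assuming 0-padding and assuming each sequence's original length is carried alongside so the padding can later be stripped (the patent leaves both choices open):

```python
# Pad every 0/1 sequence to the length of the longest one so the batch
# has a rectangular shape suitable for parallel on-chip processing.

def pad_sequences(seqs, pad_bit="0"):
    longest = max(len(s) for s in seqs)
    # Keep the original length with each sequence so a decoder can tell
    # real codes from padding (one possible scheme; others would work).
    return [(s + pad_bit * (longest - len(s)), len(s)) for s in seqs]

padded = pad_sequences(["0100", "10110", "01"])  # all padded to 5 bits
```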
In an embodiment of the present disclosure, before acquiring the multiple coding sequences based on Huffman coding, the method may further include:

segmenting the data to be encoded such that the difference in statistical characteristics between the resulting data segments stays within a preset difference range.

In an embodiment of the present disclosure, to keep the number of padding bits as small as possible and thus reduce wasted transmission, the data should be segmented so that the statistical characteristics of the segments are as close as possible (for example, so that the rows (or columns) of the weight matrix have similar statistics), i.e., so that the data distributions of the segments are similar.

In an embodiment of the present disclosure, this difference range may be defined per application scenario and is not limited here.
In an embodiment of the present disclosure, for example, suppose that during segmentation the large-model weight matrix is found to contain the values 3, 5, 8, and 9, with 3 and 8 frequent and 5 and 9 rare. If the matrix is segmented by row, the share of 3s and 8s in each row of weight data exceeds 95% while that of 5s and 9s stays within 5%: say 97% in the first row, 96% in the second, and 98% in the third, so the difference range can be set at 1%-3%. Segmenting the matrix by row then keeps the difference in statistical characteristics between the data segments within the preset range. Conversely, if the matrix is segmented by column, some columns may contain about 90% 3s and 8s and about 10% 5s and 9s, while others contain about 50% of each. The statistical characteristics would then differ too much between columns, the segment lengths would diverge too much, and length padding would become inefficient; segmenting by column should therefore be avoided in this case.
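The row-versus-column check described in this example can be sketched as a small distribution comparison; the symbol values (3, 8) and the toy rows are illustrative assumptions, not data from the patent:

```python
from collections import Counter

# Compute each segment's share of the dominant symbols; a small spread
# across segments means the segmentation meets the similarity criterion.

def dominant_share(row, dominant=frozenset({3, 8})):
    counts = Counter(row)
    return sum(counts[s] for s in dominant) / len(row)

rows = [[3, 8, 3, 8, 3, 8, 3, 8, 3, 5],   # toy row segments
        [8, 3, 8, 3, 8, 3, 8, 3, 8, 9]]
shares = [dominant_share(r) for r in rows]
spread = max(shares) - min(shares)  # small spread -> rows are good segments
```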
S12. Load the multiple coding sequences onto the chip.

In an embodiment of the present disclosure, for on-chip decoding based on the decoding model, the multiple coding sequences obtained by segmented encoding can be loaded onto the chip and decoded in parallel by the decoding model.

S13. Use the decoding model on the chip to decode the multiple coding sequences in parallel on-chip, and to decode the multiple codes within each coding sequence in parallel. The decoding model is a model with parallel decoding capability obtained by training or fitting on multiple coding-sequence samples; the samples are obtained by segmenting sample data to be encoded and encoding each segment separately.

In an embodiment of the present disclosure, the decoding model may include, but is not limited to, a neural network, which may be used to perform any of image processing, speech processing, text processing, or video processing tasks.

In an embodiment of the present disclosure, the decoding model may be any model that supports parallel decoding computation, is trained in a data-driven manner, and has a small number of parameters. Data-driven means collecting large amounts of data (for example via the Internet or other software), organizing the data into information, integrating and refining that information, and then training or fitting an automated decision model on the data; in short, decisions and actions are grounded in data. In statistics and machine learning, fitting means finding suitable model parameters or a suitable functional form so that the model describes or predicts the features and trends of a given data set well. Fitting can be performed by various methods, such as least squares, maximum likelihood estimation, or gradient descent. During fitting, different models or functional forms are typically tried, and the parameters are adjusted to minimize the discrepancy between the model and the data. The goal of fitting is a model that performs well on the whole data set and generalizes to new data.
In an embodiment of the present disclosure, where the decoding model includes a neural network, using the on-chip decoding model to decode the multiple coding sequences in parallel, including the multiple codes within each sequence, may include:

inputting the multiple coding sequences into one or more neural networks for parallel decoding.

In an embodiment of the present disclosure, the neural network may include, but is not limited to, a convolutional neural network; any other neural network capable of performing the decoding function falls within the scope of the neural networks of the present disclosure.

In an embodiment of the present disclosure, where the neural network is a one-dimensional neural network, the decoding model may include multiple one-dimensional neural networks, and inputting the multiple coding sequences into one or more neural networks for parallel decoding includes:

inputting the multiple coding sequences into the multiple one-dimensional neural networks, respectively;

decoding one coding sequence with each one-dimensional neural network, and decoding the multiple codes within the corresponding coding sequence in parallel inside each one-dimensional neural network, thereby decoding the multiple coding sequences in parallel.

In an embodiment of the present disclosure, each coding sequence is one-dimensional data, and a one-dimensional neural network can process only one one-dimensional input at a time. Multiple coding sequences can therefore be processed simultaneously by multiple one-dimensional neural networks, achieving parallel decoding.

In an embodiment of the present disclosure, since many neural networks, such as one-dimensional convolutional networks, have good parallel computation properties, the multiple codes within each coding sequence can be decoded in parallel inside each one-dimensional neural network. This realizes a doubly parallel decoding scheme, parallel across coding sequences and parallel within each coding sequence, improving decoding efficiency.
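As a structural illustration of this dual parallelism, the sketch below slides one toy 1-D kernel over several 0/1 sequences: every window position within a sequence is independent of the others (parallel within a sequence), and the identical kernel is applied to each sequence (parallel across sequences). The kernel values are arbitrary assumptions; a real decoding model would be trained or fitted as the patent describes:

```python
# Valid-mode 1-D convolution (cross-correlation, as in deep learning).
# Each output position depends only on its own window, so all positions
# can be computed at once on vector-matrix hardware.

def conv1d(seq, kernel):
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

sequences = [[0, 1, 0, 1, 1, 0],   # two padded 0/1 sequences
             [1, 1, 0, 0, 1, 0]]
kernel = [1, -1, 1]                # untrained toy kernel, shared weights
outputs = [conv1d(s, kernel) for s in sequences]  # same kernel, every sequence
```

The shared kernel is also why the transmission overhead of the decoder itself stays small: the same few weights serve every sequence and every position.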
在本公开实施例中,在神经网络为多维神经网络的情况下,将多个编码序列输入一个或多个神经网络进行并行解码,包括:In the embodiment of the present disclosure, when the neural network is a multi-dimensional neural network, multiple encoding sequences are input into one or more neural networks for parallel decoding, including:
将多个编码序列输入多维神经网络;Input multiple encoding sequences into a multidimensional neural network;
通过多维神经网络对多个编码序列进行并行解码,并在每一维神经网络内部对相应的编码序列内部的多个编码进行并行解码。Multiple coding sequences are decoded in parallel through a multi-dimensional neural network, and multiple codes within the corresponding coding sequence are decoded in parallel within each dimensional neural network.
In an embodiment of the present disclosure, each one-dimensional coding sequence among the multiple coding sequences can be decoded by a respective one-dimensional neural network within the multi-dimensional neural network, thereby achieving parallel decoding of the multiple coding sequences by the multi-dimensional neural network. Each per-dimension neural network may itself have parallel decoding capability; for example, it may have a structure similar to a one-dimensional convolutional network, so that the multiple codes within the corresponding coding sequence are decoded in parallel inside that network.
In the embodiments of the present disclosure, the neural network may include, but is not limited to, a convolutional neural network. For example, it may be a one-dimensional convolutional neural network, and the multiple coding sequences may be decoded in parallel by multiple identical one-dimensional convolutional neural networks.
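The two levels of parallelism can be illustrated with a minimal NumPy sketch (an illustrative stand-in, not the disclosure's actual network or weights): one shared 1-D convolution kernel is applied to a whole batch of code sequences, and every sequence and every window position can be computed independently.

```python
import numpy as np

def conv1d(batch, kernel, bias=0.0):
    """Apply one shared 1-D convolution (stride 1, 'valid') to every
    sequence in the batch. Each window and each sequence is independent
    of the others, which is what makes the scheme parallelizable."""
    n, length = batch.shape
    k = kernel.size
    out = np.empty((n, length - k + 1))
    for i in range(length - k + 1):   # every window position is independent
        out[:, i] = batch[:, i:i + k] @ kernel + bias
    return out

# Four hypothetical zero-padded code sequences (one per weight-matrix row).
codes = np.array([[0, 1, 0, 0, 1, 1, 1, 0, 0, 0],
                  [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]], dtype=float)
kernel = np.array([1.0, -1.0, 0.5])   # toy shared weights
features = conv1d(codes, kernel)
print(features.shape)  # (4, 8): 4 sequences x 8 positions, all independent
```

Because the weights are shared across positions and sequences, the whole computation maps naturally onto the batched, vectorized execution of a GPU or brain-inspired chip.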
In the embodiments of the present disclosure, the advantages of using a one-dimensional convolutional neural network for decoding include:
1. Because of weight sharing, a one-dimensional convolutional neural network has relatively few parameters, so the extra transmission cost it introduces is small.
2. Multiple one-dimensional convolutional neural networks can compute in parallel, so parallel decoding can be completed; moreover, convolution operations are highly optimized on brain-inspired chips and GPUs, so decoding is efficient.
3. In large-model bandwidth-compression scenarios, the added computation does not incur a significant cost, because generating one token requires relatively little computation (a few hundred GFLOPs, i.e. hundreds of billions of floating-point operations), while a chip typically provides tens of TFLOPS (trillions of floating-point operations per second) of throughput, so the on-chip computing resources can complete these decoding calculations.
4. The network design must handle the code-alignment problem. Since the statistical characteristics of the code stream (i.e., the coding sequence) are essentially uniform, the code stream input to the network and the decoded output correspond roughly in position; this property of Huffman coding sequences makes them well suited to processing by neural networks, especially convolutional neural networks.
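Point 3 can be checked with a quick back-of-envelope calculation (the 300 GFLOPs and 50 TFLOPS figures are assumed examples within the ranges quoted above, not values from the disclosure):

```python
# Per-token decode cost vs. chip throughput, at assumed example figures.
decode_flops_per_token = 300e9   # "hundreds of GFLOPs" (assumed: 300)
chip_throughput_flops = 50e12    # "tens of TFLOPS" (assumed: 50)
seconds_per_token = decode_flops_per_token / chip_throughput_flops
print(seconds_per_token)  # 0.006 s: a small fraction of on-chip capacity
```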
In an embodiment of the present disclosure, given that the decoding model includes a neural network, the neural network may include, but is not limited to, a residual connection layer and/or a batch normalization layer, to achieve computational optimization of the neural network.
When an input x is added directly to the output F(x) of a function, the input-output relationship can still be described by some function G(x), but this G(x) can be explicitly decomposed into the linear superposition of F(x) and x. This is the idea of the residual connection (skip connection): the output is expressed as the sum of the input and a nonlinear transformation of the input. Residual connections mitigate the vanishing-gradient and exploding-gradient problems and also help the model converge faster. They are commonly used in neural networks with many layers, such as residual networks (ResNet) and densely connected convolutional networks (DenseNet).
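The decomposition G(x) = F(x) + x can be sketched in a few lines of NumPy (F here is an arbitrary toy nonlinearity chosen for illustration, not a layer from the disclosure):

```python
import numpy as np

def f(x, w):
    """A tiny nonlinear transform F(x): scalar weight followed by ReLU."""
    return np.maximum(0.0, w * x)

def residual_block(x, w):
    """G(x) = F(x) + x: the skip connection adds the input back, so even
    if F(x) collapses to zero, the input still passes straight through."""
    return f(x, w) + x

x = np.array([-2.0, 0.5, 3.0])
out = residual_block(x, w=0.1)
print(out)  # F contributes only where w*x > 0; x always passes through
```

When F contributes nothing (w = 0), G reduces to the identity, which is why gradients can flow through many stacked blocks without vanishing.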
Batch normalization (BN) is an algorithm created to overcome the training difficulty caused by increasing network depth. When the distribution of the training samples is inconsistent with that of the target sample set, the trained model cannot generalize well. As a network deepens during training, the overall distribution of the activation-function inputs gradually drifts toward the upper and lower limits of the activation function's range, causing the gradients of the shallow layers to vanish during backpropagation. Batch normalization pulls this increasingly skewed distribution back to a standardized one, so that the activation-function inputs fall in the region where the activation function is sensitive to its input. This enlarges the gradients, accelerates convergence, and avoids the vanishing-gradient problem.
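The standardization step can be sketched in NumPy as follows (per-feature statistics over the batch; gamma and beta, which are learnable in a real BN layer, are fixed constants here):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a batch of activations to zero mean / unit variance
    per feature, then rescale and shift with gamma and beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A batch whose activation distribution has drifted far from zero:
x = np.array([[10.0, -8.0],
              [12.0, -6.0],
              [14.0, -4.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # ~[0, 0]: pulled back to a standardized distribution
```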
In an embodiment of the present disclosure, when the decoding model includes a neural network, before using the decoding model on the chip to perform on-chip parallel decoding of the multiple coding sequences and parallel decoding of the multiple codes within each coding sequence, the method may further include:
training the neural network with the multiple coding sequences as training data; and
obtaining the decoding model when the loss value of the neural network satisfies a preset condition.
In an embodiment of the present disclosure, the neural network needs to be trained to ensure that it performs the decoding function correctly. For example, a training set can be built from all of the data to be compressed: the current data to be compressed are encoded to obtain the coding sequences, and these coding sequences serve as the training data. In this case the neural network does not need to generalize to other data, because it only needs to decode this particular group of data (for example, the weight data of one network). Alternatively, the neural network can be trained on other data so that it does generalize to other data.
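The train-until-threshold procedure above can be sketched as follows (the "network" is a single linear layer and the data are toy sequences, purely to illustrate the stopping rule on the loss value, not the disclosure's architecture):

```python
import numpy as np

# Toy coding sequences (rows) and the values they should decode to.
codes = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1]], dtype=float)
targets = np.array([[1.0], [2.0], [3.0], [4.0]])

w = np.zeros((4, 1))            # the "decoding model" parameters
threshold, lr = 1e-4, 0.1       # preset loss condition and step size
for step in range(20000):
    pred = codes @ w
    loss = np.mean((pred - targets) ** 2)
    if loss < threshold:        # preset condition met: model is obtained
        break
    grad = 2.0 * codes.T @ (pred - targets) / len(codes)
    w -= lr * grad
print(loss < threshold)  # True: training stopped at the preset condition
```

Only the loss-threshold stopping criterion carries over to the real setting; the architecture, loss, and optimizer would all differ.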
In an embodiment of the present disclosure, the data sample to be encoded may include a large-model weight data sample, and the multiple coding sequence samples may include multiple coding sequences obtained by segmenting the large-model weight data sample and performing Huffman encoding on each segment.
In the embodiments of the present disclosure, training the neural network on multiple coding sequences obtained from large-model weight data samples yields a decoding model capable of decoding large-model weight data in parallel. Based on this decoding model, in a large-model weight-data compression scenario, the weight data in the large-model weight matrix can be segmented to obtain multiple coding sequences, and these coding sequences can be fed into the trained decoding model to quickly obtain the decoding results, thereby accelerating data decoding in that scenario.
In the embodiments of the present disclosure, since neural networks have strong fitting ability, high decoding accuracy can be achieved. In particular, in compression scenarios for neural-network weights such as those of large models, the network is robust to errors in the weight values, so 100% decoding accuracy does not need to be guaranteed, making an approximate neural-network solution suitable.
In the embodiments of the present disclosure, the original weight matrix in FIG. 2 is taken as an example to explain the scheme in detail. The original weight matrix contains multiple rows of data: first row, -8, -2, 4, 3, -9; second row, 1, 5, -3, -1, 7; third row, -2, -9, 5, 0, -1; fourth row, -8, -4, 6, 7, -4. These data can be segmented by row, with each row serving as one data segment, and Huffman encoding is performed on each data segment separately. For example, Huffman encoding of the first row's data segment -8, -2, 4, 3, -9 may yield the coding sequence 01001110. Following the length-padding principle, zeros can be appended after the code sequence 01001110 to obtain the padded coding sequence 0100111000. This padded coding sequence 0100111000 is loaded into the chip and decoded by the on-chip neural network, restoring the original weight data -8, -2, 4, 3, -9. The same operation can be performed for every row of the original weight matrix (each row serving as one data segment). After the Huffman-coded coding sequence of each data segment is obtained, the multiple coding sequences are loaded onto the chip and decoded in parallel by the neural network, thereby increasing the parallelism of Huffman decoding, reducing decoding latency, and improving decoding speed in scenarios such as large-model bandwidth compression.
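The encode-segment-and-pad procedure can be reproduced with a short Python sketch. Note that the code book here is built from the symbol frequencies of this toy matrix, so the resulting bitstrings will generally differ from the 01001110 shown for FIG. 2, which depends on the disclosure's own code table:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Build a Huffman code book (symbol -> bitstring) from frequencies."""
    freq = Counter(symbols)
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + code for s, code in c1.items()}
        merged.update({s: '1' + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

matrix = [[-8, -2, 4, 3, -9],
          [ 1,  5, -3, -1, 7],
          [-2, -9,  5,  0, -1],
          [-8, -4,  6,  7, -4]]
book = huffman_code([v for row in matrix for v in row])
# Each row is one data segment, encoded independently...
segments = [''.join(book[v] for v in row) for row in matrix]
# ...then zero-padded on the right so all segments share one length.
width = max(len(s) for s in segments)
padded = [s.ljust(width, '0') for s in segments]
```

Because Huffman codes are prefix-free, each unpadded segment decodes unambiguously back to its row, and the equal-length padded segments are what would be loaded onto the chip for parallel decoding.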
It can be understood that the various method embodiments mentioned in the present disclosure can be combined with one another to form combined embodiments without violating principles and logic; due to space limitations, the present disclosure does not repeat them. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure also provides a decoding device, an electronic device, and a computer-readable storage medium, all of which can be used to implement any decoding method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method part, which are not repeated here.
FIG. 3 is a block diagram of a decoding device provided by an embodiment of the present disclosure.
Referring to FIG. 3, an embodiment of the present disclosure provides a decoding device 300, which may include:
an acquisition module 301, configured to acquire multiple coding sequences based on Huffman coding, the multiple coding sequences being obtained by segmenting the data to be encoded and encoding the multiple segments separately;
a loading module 302, configured to load the multiple coding sequences onto a chip; and
a decoding module 303, configured to use a decoding model on the chip to perform on-chip parallel decoding of the multiple coding sequences and parallel decoding of the multiple codes within each coding sequence;
wherein the decoding model is a model with parallel decoding capability obtained by training or fitting on multiple coding sequence samples, the multiple coding sequence samples being obtained by segmenting a data sample to be encoded and encoding the multiple segments separately.
FIG. 4 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Referring to FIG. 4, an embodiment of the present disclosure provides an electronic device that includes multiple processing cores 401 and a network on chip 402, wherein the multiple processing cores 401 are all connected to the network on chip 402, and the network on chip 402 is used to exchange data among the multiple processing cores and with the outside.
One or more instructions are stored in one or more of the processing cores 401 and are executed by the one or more processing cores 401, so that the one or more processing cores 401 can execute the decoding method described above.
In some embodiments, the electronic device may be a brain-inspired chip. Because such a chip can adopt a vectorized computing mode and needs to load parameters such as the weight information of the neural network model from external memory, for example double data rate (DDR) synchronous dynamic random-access memory, batch processing in the embodiments of the present disclosure achieves high computing efficiency.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the decoding method described above when executed by a processor or processing core. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the decoding method described above.
It will be appreciated by those skilled in the art that all or some of the steps, systems, and functional modules/units in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or may be implemented as hardware, or may be implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable storage medium, which may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
As is known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is known to those of ordinary skill in the art, communication media typically contain computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network can include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in the computer-readable storage medium in each computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized using the state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
The computer program product described herein may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block in the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine, so that when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause the computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks can occur in an order different from that noted in the drawings. For example, two consecutive blocks can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted only in a general illustrative sense and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless otherwise expressly noted, features, characteristics, and/or elements described in conjunction with a particular embodiment may be used alone, or may be used in combination with features, characteristics, and/or elements described in conjunction with other embodiments. Therefore, those skilled in the art will appreciate that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.
Claims (13)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410636569.0A CN118249817B (en) | 2024-05-22 | 2024-05-22 | Decoding method and device, electronic equipment and computer readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118249817A CN118249817A (en) | 2024-06-25 |
| CN118249817B true CN118249817B (en) | 2024-08-27 |
Family
ID=91555804
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410636569.0A Active CN118249817B (en) | 2024-05-22 | 2024-05-22 | Decoding method and device, electronic equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118249817B (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |