CN106911336A - High-speed parallel low-density parity-check decoder with multi-core scheduling and its decoding method

Info

Publication number: CN106911336A
Application number: CN201710031380.9A
Authority: CN
Inventors: 殷柳国; 张远东; 葛广君
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-01-17
Filing date: 2017-01-17
Publication date: 2017-06-30
Anticipated expiration: 2037-01-17
Also published as: CN106911336B

Abstract

The invention relates to a high-speed parallel low-density parity-check (LDPC) decoder with multi-core scheduling and a decoding method thereof, belonging to the technical field of wireless communication. The decoder comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of multiple decoding cores, connected in sequence. According to the working status of the decoding cores, the multi-core scheduling module assigns each pending word of one codeword length to a decoding core that has finished decoding and is idle. The multi-core scheduling module checks whether a decoding core has finished decoding; if the decoding result passes the parity check, the result is output to the multi-core scheduling module. After decoding, the multi-core scheduling module outputs the decoding results of the decoding cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data. By adding a novel multi-core scheduling module, the invention effectively improves the computing efficiency and decoding rate of the decoder.

Description

High-speed parallel low-density parity-check decoder with multi-core scheduling and its decoding method

Technical field

The invention belongs to the technical field of wireless communication and relates to a high-speed parallel low-density parity-check decoder with multi-core scheduling and a decoding method thereof.

Background

With rapid economic development, both civilian and military applications demand higher resolution and higher precision, and a mature, stable high-resolution system is an important support for national military security and many civilian fields. In the high-resolution field, the United States was the first to develop high-resolution observation satellites and leads the world in this technology. Because space resources are limited and data is exceptionally important in the era of big data, the countries that lead in high-resolution technology are unwilling to share resources, so independent research and development of high-resolution technology is urgent.

As the high-resolution special project (the major national project on the high-resolution Earth observation system) advances, the number of spacecraft and probes is growing rapidly, the data rate of sensed information keeps rising, and the volume of data produced is unprecedented. Data acquisition technology has also developed quickly in recent years and digital image processing has matured, so more comprehensive data analysis requires receivers that can receive and process far larger amounts of data. Satellite high-speed data transmission systems at 300 Mbps are already widely deployed, transmission rates above 3 Gbps are now required, and rates will rise further to 10 Gbps or even 30 Gbps. A transmission system that can meet high-speed data transmission requirements is therefore an important support for the high-resolution special project. However, data rate and system power are directly proportional: a higher rate inevitably raises system power, while system resources are strictly limited. How to process higher-rate data with limited resources has thus become a key problem for high-speed data transmission systems.

Low-density parity-check (LDPC) codes have attracted growing attention because their code construction yields a coding gain very close to the theoretical limit. Their decoding complexity is low, they can be decoded in parallel, and decoding errors are easy to detect, so they satisfy the requirements of both high-speed data transmission and high coding gain; they are increasingly used in high-speed satellite data transmission systems. The decoding capability of a single decoding core is generally limited, so high-speed decoding that keeps pace with high-speed data transmission usually requires several decoding cores working in parallel. Figure 1 shows a conventional high-speed LDPC decoder. It comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of multiple decoding cores, connected in sequence. Each decoding core consists of an input data cache module, a soft information storage module, a variable node array module, and a control module connected in sequence; a check node array module, a variable node storage module, and a check node storage module, each connected to the control module; and a decoding result storage module and an output cache module connected in sequence to the variable node array module. All modules are implemented on a single FPGA chip.

The conventional high-speed LDPC decoder works as follows. Data to be decoded first enters the data cache module (a FIFO inside the FPGA chip) and, on instructions from the multi-core scheduling module, is fed in parallel to the LDPC parallel decoding cores. The parallel decoding cores are shown in the dashed box, and every core works the same way. A single codeword first enters the input data cache module, whose role is to supply the following modules with single-codeword information at the frequency of the decoding core's working clock. The information to be decoded then enters the soft information storage module. According to the fixed number of iterations (denoted N) and the decoding period set by the control module, the soft information is read from storage and cycled through the variable node array module and the check node array module. Once the iteration count is reached, the result is output to the decoding result storage module and the output cache module; the output cache module delivers the decoding result to the following module at whatever clock duty cycle is required. The multi-core scheduling module concatenates the data according to the number of each decoding core and outputs it directly through the data cache module. The conventional high-speed LDPC decoder is thus usually a multi-core parallel design: the single-core decoder is simply replicated, and a conventional multi-core scheduling module and a data cache module at the front end control the whole system.

However, correct decoding in the conventional high-speed LDPC decoder depends not only on each parallel decoder working properly; the scheduling performed by the multi-core scheduling module is even more critical. The conventional multi-core scheduling module is mainly responsible for allocating codewords in order. Once the information to be decoded has been buffered in the data cache module to a depth of one codeword length, the scheduling module reads the pending words from the data cache module one at a time, each of the specified single-codeword length, and assigns them to the decoding cores in order of core number. Because the number of decoding iterations N set by the control module in each parallel decoding core is fixed, the scheduling module only has to wait a fixed time to obtain the decoded data from every core, and then concatenates the data by core number and outputs it. It then continues to read, allocate, and concatenate: it assigns pending words to the cores in order, waits, and outputs the decoded data, repeating until all pending words have been decoded or a stop signal is received. Since the conventional multi-core scheduling module has a simple function and fixed timing, it can be implemented directly with a counter.

Figure 2 shows the workflow of the conventional multi-core scheduling module, using a four-core parallel decoder as an example. After a fixed maximum number of iterations has been set, the scheduling module waits until the data in the input data cache module amounts to one codeword and then starts working. It reads the data from the input data cache module in units of one codeword length and sends the codewords in order to decoding cores 1, 2, 3, and 4. Every core runs for the full period while the scheduling module waits; when all cores have reached the maximum number of iterations, the decoded data is recombined in the order core 1, core 2, core 3, core 4 and output. The scheduling module then repeats this allocation until the system stops. Because all four cores stop only after reaching the maximum number of iterations, they stay in step, and exactly four codewords are decoded in each decoding period.
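The lock-step behaviour of the conventional scheduler can be sketched in a few lines of Python. This is only an illustrative model, not the FPGA counter logic: the function name `conventional_schedule`, the use of "iteration slots" as the time unit, and zero-based codeword indices are assumptions made for the example.

```python
def conventional_schedule(num_codewords, num_cores, fixed_iters):
    """Model lock-step batch decoding with a fixed iteration count.

    Returns (total_iteration_slots, output_order). Time is counted in
    iteration slots (an assumed unit); dispatch and output overhead are ignored.
    """
    output_order = []
    total_slots = 0
    for start in range(0, num_codewords, num_cores):
        # One batch: consecutive codewords go to cores in ascending core order.
        batch = list(range(start, min(start + num_cores, num_codewords)))
        # Every core runs the full fixed number of iterations, so the batch
        # always costs fixed_iters slots regardless of how early a codeword converges.
        total_slots += fixed_iters
        # Results are concatenated in core order, which equals codeword order here.
        output_order.extend(batch)
    return total_slots, output_order


slots, order = conventional_schedule(num_codewords=8, num_cores=4, fixed_iters=20)
print(slots, order)  # 40 [0, 1, 2, 3, 4, 5, 6, 7]
```

Because every batch costs the same number of slots, the total depends only on the number of batches, which is why a simple counter suffices in the conventional design.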

This method is easy to implement. Conventional parallel decoding simply replicates a single decoding core and sends the pending words to the cores in turn. Each core is normally given, in advance and on the basis of simulation, a fixed number of iterations that guarantees the required bit error rate, so the cores stay synchronized and codeword allocation is simple. In actual decoding, however, every core executes the fixed number of iterations even though not every codeword needs the maximum number of iterations to decode successfully; most codewords are decoded in fewer iterations. For a codeword that has already been decoded successfully, further iterations bring no performance gain; they lower decoding efficiency and waste resources.

These are critical problems for a high-speed decoder.

In summary, the fixed number of iterations keeps the overall decoding efficiency of the conventional parallel decoder low and wastes hardware when resources are limited. What is needed is a new multi-core scheduling module that controls the number of iterations according to actual decoding progress and schedules the decoder as a whole, so that limited resources are used more fully and overall decoding efficiency improves.

Summary of the invention

The purpose of the invention is to solve the under-utilization of hardware resources in conventional high-speed LDPC decoders operating with limited resources. It proposes a new high-speed parallel low-density parity-check (LDPC) decoder with multi-core scheduling and a decoding method thereof, in which a new multi-core scheduling module lets the decoding cores run a non-fixed number of iterations, effectively improving decoding efficiency and using hardware resources more fully.

The invention proposes a high-speed parallel low-density parity-check decoder with multi-core scheduling. The decoder is implemented on a single FPGA chip and comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of multiple decoding cores, connected in sequence. It is characterized as follows. The data cache module consists of a FIFO inside the FPGA; this FIFO has a greater storage depth, so that enough buffer space is available when the next codeword arrives. The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder: when the preceding data cache module holds more than one codeword of data, it sends a decoding start signal to the following parallel decoding cores and distributes the buffered data to the cores for decoding; it also receives the decoding end signals fed back by the parallel decoding cores, checks whether each core is idle, and sends the next pending word in the data cache module to an idle core for decoding; after decoding, it outputs the decoding results of the cores to the data cache module in the same order in which the codewords were allocated. Each decoding core in the LDPC parallel decoding core consists of a soft information storage module, a variable node array module, and a control module connected in sequence; a check node array module, a variable node storage module, and a check node storage module, each connected to the control module; and a decoding result storage module connected to the variable node array module. The variable node array module is also connected to the check node storage module and the variable node storage module.

The invention also proposes a decoding method for the above high-speed parallel low-density parity-check decoder with multi-core scheduling, characterized in that:

The stream of words to be decoded first enters the data cache module. According to the working status of the parallel decoding cores behind it, the multi-core scheduling module assigns each pending word of one codeword length to a decoding core that has finished decoding and is idle. On receiving a pending word, a decoding core first places it in the soft information storage module and then decodes it by cycling through the variable node array module and the check node array module; the result of each iteration is stored in the decoding result storage module, and the multi-core scheduling module checks whether the core has finished decoding. If the result passes the parity check, it is output to the multi-core scheduling module and the core becomes idle. Otherwise the core continues iterative decoding until the set maximum number of iterations is reached; if the codeword is still not decoded correctly at that point, decoding of the current codeword on that core is forcibly stopped, the decoding result is output, and a decoding failure is reported. The multi-core scheduling module then sends the next pending word to an idle core. After decoding, the multi-core scheduling module outputs the decoding results of the cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.

Technical features and beneficial effects of the invention:

1) The data cache module of the invention has a greater storage depth, so that enough buffer space is available when the next codeword arrives; the new codeword is held until some decoding core finishes decoding its previous codeword and is then assigned to that core.

2) The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder and lets the decoding cores run a non-fixed number of iterations, which effectively improves decoding efficiency.

3) Because a data cache module in front of the multi-core scheduling module buffers the data to be decoded, and the multi-core scheduling module schedules and controls the codewords, the input and output FIFOs inside each decoding core are omitted to save hardware resources.

With very small hardware overhead, the invention greatly reduces the computation required by the conventional method, markedly improves overall system efficiency, and saves hardware resources overall.

Description of the drawings

Fig. 1 is a structural block diagram of a conventional multi-core parallel LDPC decoder.

Fig. 2 is a working diagram of the conventional multi-core scheduling module.

Fig. 3 is a structural block diagram of the high-speed LDPC decoder with multi-core scheduling according to the invention.

Fig. 4 shows the distribution of the maximum number of iterations in an embodiment of the invention.

Fig. 5 shows the relationship between the number of iterations and decoding performance in an embodiment of the invention.

Fig. 6 shows the relationship between the number of iterations and the computation saved in an embodiment of the invention.

Fig. 7 shows the increase in input FIFO depth in an embodiment of the invention.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings.

The high-speed LDPC decoder with multi-core scheduling proposed by the invention, shown in Figure 3, comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of multiple decoding cores, connected in sequence. Like the conventional LDPC decoder, it can be implemented on a single FPGA chip. It differs from the conventional decoder in the following respects:

1) The data cache module consists of a FIFO inside the FPGA. This FIFO has a greater storage depth than the FIFO in the data cache module of the conventional LDPC decoder, so it can buffer more codeword sequence information. Enough buffer space is therefore available when the next codeword arrives, and the new codeword is assigned to a decoding core once that core has finished decoding its previous codeword;

2) The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder. When the preceding data cache module holds more than one codeword of data, it sends a decoding start signal to the following parallel decoding cores and distributes the buffered data to the cores for decoding. It also receives the decoding end signals fed back by the parallel decoding cores, checks whether each core is idle, and sends the next pending word in the data cache module to an idle core for decoding. After decoding, it outputs the decoding results of the cores to the data cache module in the same order in which the codewords were allocated;

3) Each decoding core in the LDPC parallel decoding core consists of a soft information storage module, a variable node array module, and a control module connected in sequence; a check node array module, a variable node storage module, and a check node storage module, each connected to the control module; and a decoding result storage module connected to the variable node array module. The variable node array module is also connected to the check node storage module and the variable node storage module. The control module no longer sets a fixed number of iterations for all parallel decoding cores; it sets only a maximum number of iterations (a value slightly larger than the fixed number N set by the conventional control module is sufficient). It also controls the decoding period of each iteration, that is, it feeds the soft information in order through the variable node array module and the check node array module for computation. Because the data cache module buffers the data to be decoded with increased storage depth, and the multi-core scheduling module schedules and controls the codewords, the LDPC parallel decoding core of the invention omits two modules found in every core of the conventional LDPC decoder: the input data cache module, which supplied the following modules with single-codeword information at the frequency of the decoding core's working clock, and the output data cache module, which delivered the decoding result to the following module at the required clock duty cycle. This saves hardware resources.

The increase in FIFO depth (relative to the FIFO in the data cache module of the conventional LDPC decoder) is determined as follows. Suppose M codewords are to be decoded in total, of which a are correctly decoded after N iterations, b after N+1 iterations, and c after N+2 iterations. The average number of iterations N′ required by the M codewords can then be calculated from the probabilities of b and c, and the additional bit depth the FIFO must store can be calculated from this average. By extension, when different numbers of codewords require different iteration counts greater than N, the same method gives the increase in FIFO depth for the data cache module of the multi-core-scheduled high-speed LDPC decoder.
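One plausible reading of the average N′ is the count-weighted mean of the iteration numbers, N′ = (a·N + b·(N+1) + c·(N+2)) / M. The sketch below computes that mean; the function name `average_iterations` and the example counts are hypothetical, and the text does not state how N′ is converted into a FIFO depth, so the sketch stops at N′.

```python
def average_iterations(counts):
    """Count-weighted mean iteration number.

    counts maps an iteration count to the number of codewords that were
    decoded correctly after exactly that many iterations.
    """
    total_codewords = sum(counts.values())
    if total_codewords == 0:
        raise ValueError("at least one codeword is required")
    total_iterations = sum(iters * n for iters, n in counts.items())
    return total_iterations / total_codewords


# Hypothetical example: N = 16, M = 100, a = 90, b = 7, c = 3.
N = 16
n_prime = average_iterations({N: 90, N + 1: 7, N + 2: 3})
print(round(n_prime, 2))  # 16.13
```

With these illustrative numbers the result equals N + b/M + 2c/M = 16 + 0.07 + 0.06, which matches the statement that N′ follows from the probabilities of b and c.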

The decoding method of the proposed high-speed LDPC decoder with multi-core scheduling is as follows. The stream of words to be decoded first enters the data cache module. According to the working status of the parallel decoding cores behind it, the multi-core scheduling module assigns each pending word of one codeword length to a decoding core that has finished decoding and is idle. On receiving a pending word, a decoding core first places it in the soft information storage module and, as long as the maximum number of iterations has not been reached, decodes it by cycling through the variable node array module and the check node array module. The result of each iteration is stored in the decoding result storage module, and the multi-core scheduling module checks whether the core has finished decoding. If the result passes the parity check, it is output to the multi-core scheduling module and the core becomes idle. Otherwise the core continues iterative decoding until the set maximum number of iterations is reached; if the codeword is still not decoded correctly at that point, decoding of the current codeword on that core is forcibly stopped, the decoding result is output, and a decoding failure is reported. The multi-core scheduling module then sends the next pending word to an idle core. After decoding, the multi-core scheduling module outputs the decoding results of the cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.
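The per-core control flow described above (iterate, check, stop early on success, force a stop at the maximum) can be sketched as follows. This is a minimal model under stated assumptions: `decode_codeword`, `iterate`, and `parity_ok` are hypothetical names, and the actual variable-node and check-node updates are abstracted behind the `iterate` callable rather than implemented.

```python
def decode_codeword(soft_info, iterate, parity_ok, max_iters):
    """Run iterative decoding with early termination.

    iterate(state) -> (new_state, hard_bits) stands in for one pass through
    the variable node and check node arrays (assumed interface).
    parity_ok(hard_bits) -> bool stands in for the decoding result check.
    Returns (hard_bits, iterations_used, success).
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    state = soft_info
    hard_bits = None
    for n in range(1, max_iters + 1):
        state, hard_bits = iterate(state)
        if parity_ok(hard_bits):
            # Check passed: stop immediately so the core can become idle.
            return hard_bits, n, True
    # Maximum reached without passing the check: forced stop, report failure.
    return hard_bits, max_iters, False


# Toy stand-ins: the "state" is a countdown that passes the check at zero.
def toy_iterate(k):
    k -= 1
    return k, ([0] if k == 0 else [1])

def toy_parity_ok(bits):
    return bits == [0]

print(decode_codeword(3, toy_iterate, toy_parity_ok, max_iters=10))  # ([0], 3, True)
print(decode_codeword(5, toy_iterate, toy_parity_ok, max_iters=2))   # ([1], 2, False)
```

Returning the iteration count alongside the success flag gives a scheduler model the two pieces of feedback the text mentions: when the core became idle and whether a failure must be reported.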

The core of the invention is the comprehensive control and scheduling of the whole system by the multi-core scheduling module. The scheduling module acts on whether each parallel decoding core has finished decoding, that is, on a decoding mechanism with a non-fixed number of iterations. Under this mechanism, as soon as the scheduling module detects that the codeword decoded by a core satisfies the check relations, that core stops iterating, becomes idle, and waits for the scheduling module to allocate a new codeword. The decoding result is checked by XOR-ing the signs of the extrinsic output information of every variable node i linked to check node j and taking the result as the check result, as in the following formula:

$$\mathrm{check} = \bigoplus_{i \in V(j)} \operatorname{sgn}(q_{ij})$$

where V(j) denotes the set of variable nodes i connected to check node j, q_ij is the extrinsic output information of variable node i linked to check node j, check is the result of the check, sgn() is the sign function, which returns the sign of the extrinsic information (0 for positive, 1 for negative), and ⊕ is the XOR operation. check = 0 means the check relation is satisfied. If every check node satisfies check = 0, the codeword has either been decoded correctly or been decoded into another codeword of the code space. In both cases decoding is complete and further iterations cannot change the result, so iteration on the current codeword should stop; the core becomes idle and can start decoding a new codeword.
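The check can be sketched in Python as an XOR over sign bits, one check node at a time. The data layout is an assumption made for illustration: `check_to_vars[j]` lists the variable nodes connected to check node j, and `q[(i, j)]` holds the extrinsic value q_ij; the function names are hypothetical.

```python
def sign_bit(value):
    """Map a soft value to its sign bit: 0 for positive, 1 for negative."""
    return 1 if value < 0 else 0


def check_node_result(messages):
    """XOR the sign bits of all messages arriving at one check node."""
    check = 0
    for value in messages:
        check ^= sign_bit(value)
    return check


def codeword_passes(check_to_vars, q):
    """True when every check node yields check = 0."""
    return all(
        check_node_result([q[(i, j)] for i in var_nodes]) == 0
        for j, var_nodes in enumerate(check_to_vars)
    )


# Toy graph with two check nodes (values are illustrative only).
check_to_vars = [[0, 1, 2], [1, 2, 3]]
q = {
    (0, 0): 1.5, (1, 0): -0.5, (2, 0): -2.0,   # sign bits 0, 1, 1 -> XOR 0
    (1, 1): -1.0, (2, 1): -0.3, (3, 1): 0.7,   # sign bits 1, 1, 0 -> XOR 0
}
print(codeword_passes(check_to_vars, q))  # True
```

A value of exactly zero is treated as positive here, which is a choice of the sketch; the text only specifies the mapping for positive and negative values.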

To address the inability of traditional multi-core scheduling to make full use of the decoding cycle, the present invention improves on the traditional scheduling method. The multi-core scheduling module allocates codewords as follows. Let the LDPC parallel decoding core comprise m decoding cores. Once the input FIFO holds one codeword, the multi-core scheduling module starts working. It first performs the initial allocation in ascending order of decoding core number, sending the first m codewords to the m decoding cores for decoding. After the initial allocation, the multi-core scheduling module remains in a standby detection state. To allocate each subsequent codeword, it determines from the results fed back by the decoding cores which core is idle, records that core's number, and temporarily stores the codeword that core has finished decoding; at the same time it reads one word to be decoded from the front-end data cache module and assigns it to the idle core. It then returns to the standby detection state and waits for the next decoding core to finish. Subsequent codewords are thus dispatched one by one to whichever decoding core is idle, without waiting for all m decoding cores to finish before starting a new round of allocation, so the decoding time of every core is fully used. If several decoding cores finish at the same time, the words to be decoded are allocated to them in ascending order of core number.
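The allocation rule can be sketched as a small Python simulation. This is an illustrative model, not the FPGA implementation: time is measured in decoding iterations, a new word to be decoded is assumed to be always available in the FIFO, and the names `schedule` and `fixed_round_makespan` are hypothetical.

```python
import heapq


def schedule(iterations: list[int], m: int) -> tuple[list[tuple[int, int]], int]:
    """Dispatch codewords to m cores; return ((codeword, core) pairs, total time).

    iterations[k] is the number of iterations codeword k needs.
    Assumption: the next codeword is already buffered whenever a core goes idle.
    """
    assignments: list[tuple[int, int]] = []
    busy: list[tuple[int, int]] = []  # min-heap of (finish_time, core_number)
    next_cw = 0

    # Initial allocation: first m codewords in ascending order of core number.
    for core in range(min(m, len(iterations))):
        heapq.heappush(busy, (iterations[next_cw], core))
        assignments.append((next_cw, core))
        next_cw += 1

    makespan = 0
    while busy:
        # The earliest-finishing core is served first; ties are broken by the
        # smaller core number because tuples compare element by element.
        finish, core = heapq.heappop(busy)
        makespan = max(makespan, finish)
        if next_cw < len(iterations):
            heapq.heappush(busy, (finish + iterations[next_cw], core))
            assignments.append((next_cw, core))
            next_cw += 1
    return assignments, makespan


def fixed_round_makespan(n_codewords: int, m: int, max_iterations: int) -> int:
    """Baseline: every round of m codewords is sized for max_iterations."""
    rounds = -(-n_codewords // m)  # ceiling division
    return rounds * max_iterations


iters = [16, 12, 29, 16, 14, 16]           # hypothetical iteration counts
print(schedule(iters, m=2))                 # ([(0, 0), (1, 1), (2, 1), (3, 0), (4, 0), (5, 1)], 57)
print(fixed_round_makespan(len(iters), 2, 29))  # 87
```

Because the assignment list is built in codeword order, replaying it gives the order in which results must be released so that the output matches the allocation order. In this made-up example the idle-core dispatch finishes in 57 iteration periods, against 87 for rounds sized to the 29-iteration maximum.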

The application results of an embodiment of the present invention are given below:

Take as an example an LDPC code with a code length of 12288 bits and an information length of 10240 bits. No maximum number of iterations is imposed on the decoding process; decoding stops when the codeword is decoded correctly.

Fig. 4 shows the number of iterations executed to decode each codeword correctly; the abscissa is the codeword sequence number and the ordinate is the number of iterations executed when that codeword was decoded correctly. In the figure the maximum number of iterations is 29 and the minimum is 12; the iteration counts cluster densely around 16, and only 0.2% of the codewords need the maximum number of iterations. In a traditional parallel decoder, a maximum number of iterations usually has to be set so that the decoding requirements are met with simple control, and every decoding cycle is sized for that maximum. In this embodiment the maximum would be set to 29, yet only 0.2% of the codewords actually need 29 iterations while 99.8% never reach the maximum, so iterations are wasted and, with them, hardware resources.

Fig. 5 shows the relationship between the number of iterations and decoding performance; the abscissa is the maximum number of iterations and the ordinate is the bit error rate (BER) of the decoding result at that maximum. At a suitable signal-to-noise ratio, increasing the number of iterations lowers the BER markedly: every two additional iterations reduce the BER by roughly one order of magnitude. The new multi-core scheduling parallel decoding scheme with a non-fixed number of iterations therefore reallocates the time saved on codewords that need few iterations to codewords that need more iterations to decode correctly, which is equivalent to raising the overall effective number of iterations while lowering the average number of iterations. In terms of decoding computation, the average number of iterations is positively correlated with the decoder's working clock: a lower average number of iterations allows a lower working clock, and a lower clock rate improves the stability of the hardware circuit and, at the same decoding throughput, reduces the amount of computation. Fig. 6 shows the relationship between the maximum number of iterations and the computation saved; the abscissa is the maximum number of iterations and the ordinate is the percentage of computation saved. In this example the average number of iterations falls from 29 to 16 and the amount of computation is reduced by 46.7%. Hardware resources are used more fully than in the traditional method and the computation is greatly reduced, so for the same performance targets the present invention significantly improves overall system efficiency and saves hardware resources.
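The saving can be reproduced with a short calculation. The sketch below assumes that the amount of computation is proportional to the number of iterations executed; the function names are hypothetical.

```python
def computation_saving(avg_iterations: float, max_iterations: float) -> float:
    """Fraction of iteration work saved versus always running max_iterations.

    Assumption: computation is proportional to the number of iterations.
    """
    return 1.0 - avg_iterations / max_iterations


def implied_average(saving: float, max_iterations: float) -> float:
    """Average iteration count that would produce the given saving."""
    return max_iterations * (1.0 - saving)


print(round(computation_saving(16, 29), 3))  # 0.448
print(round(implied_average(0.467, 29), 2))  # 15.46
```

Under this proportionality assumption an average of exactly 16 iterations gives a saving of about 44.8%, while the stated 46.7% corresponds to an average of about 15.46 iterations. One possible reading is that 16 is a rounded value of the measured average.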

The new multi-core scheduling greatly improves the utilization of the decoding cores, but it introduces two additional overheads: a check-relation verification module added at the check nodes, and an increase in the depth of the input FIFO. The verification module is essentially an XOR of the sign bits of all incoming extrinsic information, which costs only a small amount of LUT resources in hardware.

In the traditional mode, the maximum number of iterations N is usually set according to the actual data rate, the decoding performance target, and the decoding rate the hardware supports, and the additional FIFO depth required by the new multi-core scheduling scheme is calculated relative to N. Suppose 10 codewords are to be decoded. If 9 of them decode correctly after N iterations and 1 decodes correctly after N+1 iterations, the probability that the FIFO must be deepened equals the probability that this 1 codeword occurs. If 8 codewords decode correctly after N iterations, 1 after N+1 iterations, and 1 after N+2 iterations, the probability that the FIFO must be deepened is the probability that one codeword needing N+1 iterations and one needing N+2 iterations both occur. Because codewords that need more than N iterations affect the FIFO depth differently depending on when they occur, the calculation must also take the order of the codewords into account. Fig. 7 shows the increase in FIFO depth, in bits, calculated after each codeword is decoded; the abscissa is the sequence number of the word to be decoded and the ordinate is the number of bits currently buffered in the FIFO awaiting decoding. When the maximum number of iterations is lowered from 29 to 16, the input FIFO depth needs to grow by only two codeword lengths, while decoding performance improves greatly. The number of buffered bits in Fig. 7 never exceeds two codewords, so with the FIFO deepened by two codeword lengths it never fills up during the entire decoding process. In Fig. 6, if the number of iterations is increased from 16 to 29, the input FIFO grows by only two codewords in depth while the bit error rate falls from the order of 10^-4 to the order of 10^-9. The increase in FIFO depth and the small LUT usage of the check-relation verification module are therefore entirely acceptable compared with the improvement in decoding performance. With very small hardware overhead, the present invention thus greatly reduces the computation of the traditional method, significantly improves overall system efficiency, and saves overall hardware resources.
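The dependence of the FIFO backlog on the order of the codewords can be illustrated with a deliberately simplified Python model. All modeling choices here are assumptions rather than part of the described decoder: a single decoding core, whole codewords arriving every `nominal` iteration periods (the budget N), and a hypothetical function name `peak_fifo_backlog`.

```python
def peak_fifo_backlog(iterations: list[int], nominal: int) -> int:
    """Peak number of whole codewords waiting in the input FIFO.

    Simplified model (assumption): one core; codeword k arrives at time
    k * nominal; a codeword leaves the FIFO the moment its decoding starts.
    """
    starts: list[int] = []
    free_at = 0  # time at which the core becomes idle
    for k, iters in enumerate(iterations):
        start = max(k * nominal, free_at)  # wait for both the data and the core
        starts.append(start)
        free_at = start + iters

    peak = 0
    for j in range(len(iterations)):
        t = j * nominal  # occupancy can only rise at an arrival instant
        started = sum(1 for s in starts if s <= t)
        peak = max(peak, (j + 1) - started)
    return peak


# Same multiset of hypothetical iteration counts, different order, N = 16.
print(peak_fifo_backlog([24, 24, 24, 8, 8, 8], nominal=16))  # 2
print(peak_fifo_backlog([24, 8, 24, 8, 24, 8], nominal=16))  # 1
```

Both sequences contain the same iteration counts, yet clustering the slow codewords together doubles the peak backlog in this model, which is why the order of occurrence has to enter the depth calculation.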

Claims

1. A multi-core scheduling high-speed parallel low-density parity-check decoder, implemented on an FPGA chip and comprising, connected in sequence, a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of multiple decoding cores, characterized in that: the data cache module is formed by a FIFO inside the FPGA, the FIFO having a storage depth sufficient to ensure that enough buffer space is available when the next codeword arrives; the multi-core scheduling module controls the scheduling of the entire high-speed LDPC decoder: when the preceding data cache module holds more than one codeword of data, it sends a decoding start signal to the following parallel decoding core and allocates the buffered data to the decoding cores for decoding, and it also receives the decoding end signals fed back by the following parallel decoding core, checks whether each decoding core is idle, and sends the next word to be decoded from the preceding data cache module to an idle decoding core for decoding; each decoding core of the LDPC parallel decoding core consists of a soft information storage module, a variable node array module, and a control module connected in sequence, a check node array module, the variable node array module, a variable node storage module, and a check node storage module each connected to the control module, and a decoding result storage module connected to the variable node array module; the variable node array module is further connected to the check node storage module and the variable node storage module.

2. A decoding method for the decoder according to claim 1, characterized in that: the stream of words to be decoded first enters the data cache module; according to the working state of the back-end parallel decoding core, the multi-core scheduling module assigns a word to be decoded, one codeword in length, to a decoding core that has finished decoding and is idle; on receiving the word to be decoded, each decoding core first stores it in the soft information storage module and then decodes it iteratively through the variable node array module and the check node array module, storing the result of each iteration in the decoding result storage module; the multi-core scheduling module checks whether the decoding core has finished decoding: if the decoding result check is satisfied, the decoding result is output to the multi-core scheduling module and the decoding core becomes idle; otherwise the decoding core continues iterative decoding until the set maximum number of iterations is reached, and if decoding is still incomplete at that point, decoding is forcibly stopped, the decoding of the current codeword is checked, the decoding result is output, and decoding failure information is fed back; the multi-core scheduling module sends the next word to be decoded to the idle decoding core for decoding; after decoding, the multi-core scheduling module outputs the decoding results of the decoding cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.

3. The decoding method according to claim 2, characterized in that the decoding result check performs an XOR operation on the signs of the extrinsic information output by each variable node i connected to check node j and takes the result of the XOR operation as the check result, as shown in the following formula:

$$\mathrm{check} = \operatorname{sgn}(q_{1j}) \oplus \operatorname{sgn}(q_{2j}) \oplus \cdots \oplus \operatorname{sgn}(q_{ij}) \oplus \cdots \oplus \operatorname{sgn}(q_{d_{c_j} j}), \quad 1 \le i \le d_{c_j}$$

where i indexes the variable nodes connected to check node j, of which there are $d_{c_j}$; $q_{ij}$ denotes the extrinsic information output by variable node i connected to check node j; check is the result of the parity check; sgn() is the sign function, which returns the sign of the extrinsic information (0 for positive, 1 for negative); and $\oplus$ denotes the XOR operation; check = 0 indicates that the check relation is satisfied; if all check nodes satisfy check = 0, either the codeword has been decoded correctly or it has been decoded into another codeword in the codeword space; in both cases decoding is complete and the decoding core is idle.

4. The decoding method according to claim 2, characterized in that the multi-core scheduling module allocates the codewords to be allocated as follows: let the LDPC parallel decoding core comprise m decoding cores; once the input FIFO holds one codeword, the multi-core scheduling module starts working; it first performs the initial allocation in ascending order of decoding core number, sending the first m codewords to the m decoding cores for decoding; after the initial allocation, the multi-core scheduling module remains in a standby detection state; to allocate each subsequent codeword, it determines from the results fed back by the decoding cores which decoding core is idle, records that core's number, and temporarily stores the codeword that core has finished decoding, while reading one word to be decoded from the data cache module and assigning it to the currently idle decoding core for decoding; it then returns to the standby detection state and waits for the next decoding core to finish decoding; subsequent codewords are thus dispatched in turn to whichever decoding core is idle; if several decoding cores finish decoding at the same time, the words to be decoded are allocated in ascending order of decoding core number.