Disclosure of Invention
1. Technical problem to be solved
The invention aims to provide an FPGA-based QC-LDPC code decoder and an implementation method thereof, to solve the problems identified in the background art: one existing method markedly degrades decoding performance when the number of conflicting submatrices is large, while another method, built on the first, reduces the number of conflicts by reordering the processing of the submatrices and deferring the use of updated contribution values, but only reduces data-update conflicts and still does not prevent the loss of decoding performance.
2. Technical proposal
An FPGA-based QC-LDPC code decoder, comprising:
The channel information and posterior probability storage unit LLR_RAM is used for storing the initially received channel data and the LLR information after each data-update iteration; there are n RAM storage blocks in total, where n is the number of column blocks of the check matrix H;
the front-end shift register unit F_shift is used for performing the cyclic right-shift operation on the data output by LLR_RAM according to the shift value of the identity submatrix in each layer of the check matrix H; there are k F_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node computing unit VNU is used for updating the extrinsic information of the variable nodes during the iterative process; there are m variable node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the check node computing unit CNU is used for updating the extrinsic information of the check nodes during the iterative process; there are m check node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the posterior probability updating module LUU updates the posterior probability LLR of the current layer according to the updated variable node information and check node information; there are m posterior probability updating units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the back-end shift register unit B_shift is used for reverse-shifting part of the data output by the LUU and writing it back to the RAM, and for performing the cyclic right-shift of the identity submatrix on the remaining data according to the shift requirement of the next layer of the H matrix; there are k B_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the data selector unit Data_selector is used for selecting whether the front-end or the back-end shift register supplies the output data, connecting that data to the VNU module, and starting the data iteration of the next layer;
the check node information storage FIFO unit R_M_FIFO is mainly used for storing the extrinsic information updated by the check nodes; there are k R_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node information storage FIFO unit Q_M_FIFO is mainly used for storing the extrinsic information updated by the variable nodes; there are k Q_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the control module Con_M controls the same-layer data processing of all modules, the data exchange between different layers, and the timing of the 4-stage pipelined frame-interleaved decoding.
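The cyclic right shift performed by F_shift (and reversed by B_shift) corresponds to rotating a Z-element LLR block by the shift value of the non-negative identity submatrix. The following is a minimal software model of that operation, offered only as an illustration under assumed naming, not as the hardware implementation:

```python
def cyclic_right_shift(block, s):
    """Model of one F_shift unit: rotate a Z-element LLR block right by s.

    A non-negative entry s in the base matrix of a QC-LDPC code selects
    the identity matrix cyclically shifted by s, which maps to rotating
    the Z LLRs of that column block.
    """
    z = len(block)
    s %= z
    return block[-s:] + block[:-s] if s else list(block)

def cyclic_left_shift(block, s):
    """Model of the reverse shift used by B_shift when restoring the
    original LLR order before a write-back: undo a right shift by s."""
    z = len(block)
    return cyclic_right_shift(block, z - (s % z))
```

For example, `cyclic_right_shift([1, 2, 3, 4], 1)` yields `[4, 1, 2, 3]`, and applying `cyclic_left_shift` with the same shift value recovers the original block.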
Preferably, the channel information and posterior probability storage unit LLR_RAM performs a complete write of its data only at initialization and when the iterations are finished; in the other iteration layers it performs only selective writes, the write selection being controlled by the control module Con_M. Only at the start of an iteration is all first-layer data read, according to the requirements of the first-layer matrix; the other layers selectively read only part of the data. LLR_RAM data is read 3 clocks ahead of the processing of its layer, the read selection being controlled by the control module Con_M.
Preferably, the front-end barrel shift register unit F_shift feeds the data selector module Data_selector, and its output data is no longer fully shifted and output according to the current-layer shift requirement as in the conventional structure, but is partially shifted and output according to whether the data selector module Data_selector connects it to the next layer, under the control of the control module Con_M.
Preferably, the data selector module Data_selector selects the data connected to the variable node computing unit VNU according to the current iteration layer number, the candidate data being the output values of the front-end and back-end shift register units, respectively.
Preferably, the output data of the back-end shift register unit B_shift is no longer always restored to the original LLR order as in the conventional structure, but is determined by whether the data selector module Data_selector connects it to the next-layer variable node computing unit VNU: if connected to the next-layer VNU, the output data is shifted according to the block-matrix shift requirement of the next layer; if not connected, the shift is reversed to restore the original LLR order and the data is written back to the LLR_RAM module through the data selector module Data_selector.
Preferably, the control module Con_M operates from the moment the channel input enters the LLR_RAM unit: it controls the data-processing sequence within each layer, from F_shift to VNU to CNU to LUU and finally to B_shift, then writes back to the LLR_RAM module as needed; it manages the data flow between layers, passing the data from the first layer to the second layer and so on to the last layer, completing one data iteration, until the required number of iterations is reached; and it controls the 4-stage pipelined frame-interleaved decoding, which is used because each layer of data processing requires 4 clocks, and each clock within a layer corresponds to the processing of a different frame.
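The 4-clock-per-layer, frame-interleaved timing controlled by Con_M can be illustrated with a small scheduling model. This is an assumed software sketch of the described timing, not the control logic itself:

```python
def interleave_schedule(n_layers, n_frames=4):
    """Return (clock, frame, layer) tuples for frame-interleaved layered
    decoding: each layer occupies n_frames consecutive clocks, and each
    of those clocks carries a different frame, so the 4-stage pipeline
    stays full while one frame's layer result is still being computed."""
    sched = []
    clock = 0
    for layer in range(n_layers):
        for frame in range(n_frames):
            sched.append((clock, frame, layer))
            clock += 1
    return sched
```

With `n_layers=2`, the schedule processes frames 0 to 3 of layer 0 in clocks 0 to 3, then frames 0 to 3 of layer 1 in clocks 4 to 7.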
Preferably, the implementation method of the FPGA-based QC-LDPC code decoder comprises the following steps:
S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the LLR_RAM unit; all-zero check node information is written into each R_M_FIFO unit; and the iteration counter item is initialized to 0;
S2, iterating the first-layer data: first, the data in the LLR_RAM units corresponding to the non-negative block matrices in the first layer of the check matrix H are read, extracting the data of each frame in turn; the data are shifted in the front-end shift register unit F_shift, and the shifted data are sent to the variable node computing unit VNU for the variable node data update, then sent to the CNU for the check node update; the LUU then performs the posterior probability update, and finally the posterior probability LLR data are sent to the B_shift unit;
S3, during the VNU update of the first layer, the data of the second layer are prepared according to the non-negative block matrices in the second layer of the check matrix H. If a block of the first layer of H and the vertically adjacent block of the second layer are both non-negative matrices, the data can be obtained directly by shifting the corresponding first-layer LLR data through the B_shift unit, without re-reading the LLR_RAM unit; if the first-layer block is a negative matrix and the second-layer block is a non-negative matrix, the LLR data of the second layer must be read from the LLR_RAM unit and shifted correspondingly by the F_shift unit; if the first-layer block is a non-negative matrix and the second-layer block is a negative matrix, the corresponding first-layer LLR data must be shift-recovered by the B_shift unit and written back into the LLR_RAM unit. The data acquired for the second layer are sent, after shifting, through the Data_selector data selection module to the VNU, and the data update proceeds with operations similar to those of the first layer;
S4, the above steps are repeated until the data of all layers have been updated, and the iteration counter item is incremented by 1; if the set number of iterations has not been reached, the data updating continues; after the iterations are completed, the posterior probability LLRs output by the LUU are restored to the original order by the B_shift unit and written back to the LLR_RAM unit;
S5, outputting the decoding result: the updated data are read from the LLR_RAM unit, the decoding decision is made, and the result is output.
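The layered decoding flow of steps S1 to S5 can be sketched in software. The following is a simplified single-frame model of layered scaled min-sum decoding (names and structure are an illustrative assumption; the 4-frame pipelining and FIFO organization of the hardware are omitted):

```python
import numpy as np

def lmsa_decode(llr, base, z, n_iter=10, alpha=0.75):
    """Simplified layered min-sum (LMSA) model of steps S1-S5.

    llr  : channel LLRs, length = cols(base) * z
    base : base matrix of shift values; -1 marks an all-zero block
    z    : identity submatrix size
    """
    m, n = base.shape
    R = np.zeros((m, n, z))                 # check messages (R_M_FIFO role)
    llr = llr.astype(float).copy()
    for _ in range(n_iter):
        for layer in range(m):              # one layer = one block row
            cols = [j for j in range(n) if base[layer, j] >= 0]
            # VNU: shifted posterior LLR minus the old check message
            Q = {}
            for j in cols:
                blk = np.roll(llr[j*z:(j+1)*z], base[layer, j])  # F_shift
                Q[j] = blk - R[layer, j]
            # CNU: scaled min-sum over the other edges of the row
            for k in range(z):
                vals = np.array([Q[j][k] for j in cols])
                total_sgn = np.prod(np.sign(vals + 1e-12))
                mags = np.abs(vals)
                for idx, j in enumerate(cols):
                    others = np.delete(mags, idx)
                    sgn = total_sgn * np.sign(vals[idx] + 1e-12)
                    R[layer, j][k] = alpha * sgn * others.min()
            # LUU + B_shift: update posterior LLR and rotate back
            for j in cols:
                blk = Q[j] + R[layer, j]
                llr[j*z:(j+1)*z] = np.roll(blk, -base[layer, j])
    return (llr < 0).astype(int)            # hard decision (step S5)
```

As a sanity check, an all-positive channel LLR vector (the all-zero codeword received cleanly) decodes to the all-zero word.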
Preferably, the data-processing algorithm used is the layered min-sum algorithm LMSA.
3. Advantageous effects
Compared with the prior art, the invention has the following advantages:
The FPGA-based QC-LDPC code decoder and its implementation method improve on the structure of the conventional LDPC decoder; in particular, a data selector is introduced before the variable node processing unit VNU. The two inputs of the selector are connected to the shift registers at the input end and the output end of the system, respectively, and the output value of either shift register is flexibly routed to the VNU through the selector, so that the RAM read operation of the current layer is advanced into the update stage of the previous layer;
In the FPGA-based QC-LDPC code decoder and its implementation method, the updated data of a layer are partly written back into the LLR_RAM unit and partly passed directly to the VNU of the next layer, saving the three clock cycles of a RAM read and improving the data transfer efficiency without reducing the decoding performance;
The invention adopts an intra-layer parallel decoding algorithm together with a 4-stage pipeline structure, which further raises the throughput of the decoder. This design significantly increases the decoding speed and reduces the latency without degrading performance, and is suitable for LDPC decoding applications in high-speed communication scenarios.
Detailed Description
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc., which indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "configured to," "engaged with," "connected to," and the like are to be construed broadly, and include, for example, "connected to," whether fixedly connected to, detachably connected to, or integrally connected to, mechanically connected to, electrically connected to, directly connected to, indirectly connected to, and in communication with each other via an intermediate medium. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Referring to fig. 1-5, the present invention provides a technical solution:
an FPGA-based QC-LDPC code decoder, comprising:
The device comprises a channel information and posterior probability storage unit LLR_RAM, a front-end barrel shift register unit F_shift, a variable node computing unit VNU, a check node computing unit CNU, a posterior probability updating unit LUU, a back-end barrel shift register unit B_shift, a data selector module Data_selector, a check node information storage FIFO unit R_M_FIFO, a variable node information storage FIFO unit Q_M_FIFO and a control module Con_M;
In this structure, the LLR values corresponding to a non-negative block submatrix of the next layer are no longer always read out of the LLR_RAM units, shifted, and sent to the VNU. If the current-layer block is a negative submatrix and the next-layer block is a non-negative submatrix, the next-layer LLRs are read from the LLR_RAM unit 3 clocks in advance and sent to the F_shift unit; if the current-layer block is a non-negative submatrix and the next-layer block is a negative matrix, the current-layer LLRs are shift-recovered and written back to the LLR_RAM. In addition, an intra-layer parallel decoding algorithm and a 4-stage pipeline structure are adopted, as shown in fig. 2, which further raises the throughput of the decoder;
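The three routing cases handled between two adjacent layers can be summarized in a small decision sketch. This is only a software illustration of the policy described above, with hypothetical names:

```python
def select_source(cur_nonneg, next_nonneg):
    """Data_selector routing policy for one column block, given whether
    the current-layer and next-layer base-matrix entries are non-negative:
      - both non-negative: B_shift forwards the just-updated LLRs
        straight to the next layer's VNU (no RAM read needed);
      - current negative, next non-negative: LLRs are pre-read from
        LLR_RAM 3 clocks early and shifted by F_shift;
      - current non-negative, next negative: B_shift restores the
        original order and writes back to LLR_RAM."""
    if cur_nonneg and next_nonneg:
        return "B_shift -> VNU"
    if not cur_nonneg and next_nonneg:
        return "LLR_RAM -> F_shift -> VNU"
    if cur_nonneg and not next_nonneg:
        return "B_shift -> LLR_RAM"
    return "idle"
```

The first case is the source of the clock savings: the RAM read of the next layer is replaced by a direct forward through the back-end shift register.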
The channel information and posterior probability storage unit LLR_RAM is used for storing the initially received channel data and the LLR information after each data-update iteration; there are n RAM storage blocks in total, where n is the number of column blocks of the check matrix H;
the front-end shift register unit F_shift is used for performing the cyclic right-shift operation on the data output by LLR_RAM according to the shift value of the identity submatrix in each layer of the check matrix H; there are k F_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node computing unit VNU is used for updating the extrinsic information of the variable nodes during the iterative process; there are m variable node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the check node computing unit CNU is used for updating the extrinsic information of the check nodes during the iterative process; there are m check node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the posterior probability updating module LUU updates the posterior probability LLR of the current layer according to the updated variable node information and check node information; there are m posterior probability updating units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the back-end shift register unit B_shift is used for reverse-shifting part of the data output by the LUU and writing it back to the RAM, and for performing the cyclic right-shift of the identity submatrix on the remaining data according to the shift requirement of the next layer of the H matrix; there are k B_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the data selector unit Data_selector is used for selecting whether the front-end or the back-end shift register supplies the output data, connecting that data to the VNU module, and starting the data iteration of the next layer;
the check node information storage FIFO unit R_M_FIFO is mainly used for storing the extrinsic information updated by the check nodes; there are k R_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node information storage FIFO unit Q_M_FIFO is mainly used for storing the extrinsic information updated by the variable nodes; there are k Q_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the control module Con_M controls the same-layer data processing of all modules, the data exchange between different layers, and the timing of the 4-stage pipelined frame-interleaved decoding;
Specifically, the channel information and posterior probability storage unit LLR_RAM performs a complete write of its data only at initialization and when the iterations are finished; in the other iteration layers it performs only selective writes, the write selection being controlled by the control module Con_M. Only at the start of an iteration is all first-layer data read, according to the requirements of the first-layer matrix; the other layers selectively read only part of the data. LLR_RAM data is read 3 clocks ahead of the processing of its layer, the read selection being controlled by the control module Con_M.
Further, the front-end barrel shift register unit F_shift feeds the data selector module Data_selector, and its output data is no longer fully shifted and output according to the current-layer shift requirement as in the conventional structure, but is partially shifted and output according to whether the data selector module Data_selector connects it to the next layer, under the control of the control module Con_M.
Further, the data selector module Data_selector selects the data connected to the variable node computing unit VNU according to the current iteration layer number, the candidate data being the output values of the front-end and back-end shift register units, respectively.
Further, the output data of the back-end shift register unit B_shift is no longer always restored to the original LLR order as in the conventional structure, but is determined by whether the data selector module Data_selector connects it to the next-layer variable node computing unit VNU: if connected to the next-layer VNU, the output data is shifted according to the block-matrix shift requirement of the next layer; if not connected, the shift is reversed to restore the original LLR order and the data is written back to the LLR_RAM module through the data selector module Data_selector.
Further, the control module Con_M operates from the moment the channel input enters the LLR_RAM unit: it controls the data-processing sequence within each layer, from F_shift to VNU to CNU to LUU and finally to B_shift, then writes back to the LLR_RAM module as needed; it manages the data flow between layers, passing the data from the first layer to the second layer and so on to the last layer, completing one data iteration, until the required number of iterations is reached; and it controls the 4-stage pipelined frame-interleaved decoding, which is used because each layer of data processing requires 4 clocks, and each clock within a layer corresponds to the processing of a different frame.
Notably, the implementation method of the FPGA-based QC-LDPC code decoder comprises the following steps:
S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the LLR_RAM unit; all-zero check node information is written into each R_M_FIFO unit; and the iteration counter item is initialized to 0;
S2, iterating the first-layer data: first, the data in the LLR_RAM units corresponding to the non-negative block matrices in the first layer of the check matrix H are read, extracting the data of each frame in turn; the data are shifted in the front-end shift register unit F_shift, and the shifted data are sent to the variable node computing unit VNU for the variable node data update, then sent to the CNU for the check node update; the LUU then performs the posterior probability update, and finally the posterior probability LLR data are sent to the B_shift unit;
S3, during the VNU update of the first layer, the data of the second layer are prepared according to the non-negative block matrices in the second layer of the check matrix H. If a block of the first layer of H and the vertically adjacent block of the second layer are both non-negative matrices, the data can be obtained directly by shifting the corresponding first-layer LLR data through the B_shift unit, without re-reading the LLR_RAM unit; if the first-layer block is a negative matrix and the second-layer block is a non-negative matrix, the LLR data of the second layer must be read from the LLR_RAM unit and shifted correspondingly by the F_shift unit; if the first-layer block is a non-negative matrix and the second-layer block is a negative matrix, the corresponding first-layer LLR data must be shift-recovered by the B_shift unit and written back into the LLR_RAM unit. The data acquired for the second layer are sent, after shifting, through the Data_selector data selection module to the VNU, and the data update proceeds with operations similar to those of the first layer;
S4, the above steps are repeated until the data of all layers have been updated, and the iteration counter item is incremented by 1; if the set number of iterations has not been reached, the data updating continues; after the iterations are completed, the posterior probability LLRs output by the LUU are restored to the original order by the B_shift unit and written back to the LLR_RAM unit;
S5, outputting the decoding result: the updated data are read from the LLR_RAM unit, the decoding decision is made, and the result is output.
Embodiment one:
The embodiment of the invention provides an FPGA-based QC-LDPC code decoder and an implementation method thereof; the specific processing steps are shown in fig. 4;
The method comprises step S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the channel information and posterior probability storage unit LLR_RAM; each storage block has storage addresses 0-3, corresponding respectively to the storage positions of the same LLR block in the different frames;
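The storage layout of step S1 can be modeled in a few lines. This is an assumed software sketch of the described addressing (one RAM block per column block of H, one address per frame), not the memory implementation:

```python
def init_llr_ram(frames, n_col_blocks, z):
    """S1 initialization sketch: split each frame's channel LLRs by the
    column blocks of H and store them so that RAM block j holds, at
    address f (0-3 for four frames), the jth LLR block of frame f."""
    ram = [[None] * len(frames) for _ in range(n_col_blocks)]
    for f, llr in enumerate(frames):
        assert len(llr) == n_col_blocks * z, "frame length must be n*z"
        for j in range(n_col_blocks):
            ram[j][f] = llr[j * z:(j + 1) * z]
    return ram
```

With this layout, reading "the data of frame f for column block j" is a single access `ram[j][f]`, matching the one-frame-per-clock read order described below.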
S2, iterating the first-layer data:
a) According to the first layer of the check matrix H, the data stored in each LLR_RAM storage block at the positions where the first layer of H is a non-negative submatrix are read in sequence: the first clock reads the data of the first frame, the second clock reads the data of the second frame, and at the same time the second clock begins reading, from each storage block of the R_M_FIFO, the check node information corresponding to the first frame of the first layer. The reading of the four frames of initial LLR block information is completed in the first four clocks, and the first-layer check node information of the four frames is read in the second to fifth clocks. The LLR block data read out are sent, in reading order, into the shift units of the front-end shift register unit F_shift corresponding to the first layer, and shifted according to the shift requirement of the first-layer check matrix; F_shift consumes only one clock per shift operation. The four frames of LLR block data share the same F_shift and are distinguished by the four successive clocks in which they are sent, and the data-read operations are controlled by the control module Con_M;
b) The first-layer front-end shift register unit F_shift outputs data, which the selection module Data_selector connects to each computing unit of the VNU; the check node data of each frame read from R_M_FIFO at that moment are also connected to each computing unit of the VNU. After the VNU computing units complete the variable node data update, the data are sent to the CNU computing units and to the Q_M_FIFO storage units; during the check node data update, the CNU reads the Q_M_FIFO information just stored, and when the check node data update is complete, the data are sent to the LUU computing units and the R_M_FIFO storage units. Each LUU computing unit completes the posterior probability LLR update from the received check node information and the Q_M_FIFO stored information, then sends the data into the shift units of the back-end shift register B_shift. The four frames of LLR block data share the same VNU, CNU and LUU modules, distinguished by the four successive clocks in which they are fed in, and the data operations are controlled by the control module Con_M;
c) The data sent to the shift units of the back-end shift register B_shift are shift-controlled by the control module according to the non-negative blocks in the second layer of the check matrix H. If vertically adjacent blocks of the first and second layers are both non-negative, the second-layer data are obtained directly by shifting in the corresponding first-layer B_shift unit, instead of being read again from the LLR_RAM storage unit; the shifted data are connected by the data selection module Data_selector to the computing units of the second-layer VNU, and the second-layer data update starts. If, for vertically adjacent blocks, the first-layer check block is non-negative and the second-layer check block is negative, the posterior probability LLR block output by the first-layer LUU is shifted directly to restore the original order and written back to the corresponding storage position of the LLR_RAM. If, for vertically adjacent blocks, the first-layer check block is negative but the second-layer check block is non-negative, the data must be read from the corresponding LLR_RAM storage position three clocks before the B_shift unit outputs: two clocks are used for reading, and one clock sends the data to the corresponding shift unit of the front-end shift register module F_shift for shifted output; the output data are sent to the data selection module Data_selector to be connected to the second-layer VNU computing units. At that moment the shift units of the front-end module F_shift and of the back-end module B_shift are connected to the data selection module Data_selector simultaneously, and the second-layer VNU computing units obtain their update data through the Data_selector;
Second-layer data update: the second-layer data are acquired from c), the data update of the second layer is performed with operations similar to b) and c), and the third-layer data are acquired for updating;
One iteration is completed by these operations once the last layer of data has been updated, and the iteration counter item is incremented by 1. If the required number of iterations has not been completed, new layer data are acquired according to step 2 c) and the data of that layer are updated with operations similar to steps 2 b) and 2 c), until the iteration requirement is met; after the iterations are completed, the posterior probability LLRs output by the computing units of the LUU module are sent into the shift units of the back-end shift register module B_shift to restore the original LLR order, and written back into the LLR_RAM storage unit;
The iterated data are then read completely from the LLR_RAM storage unit, the decoding decision is made, and the result is output;
The data-processing algorithm used in this embodiment is the layered min-sum algorithm LMSA;
the calculation formula of the VNU module is as follows:
External messages expressed as the jth check node propagated from the ith variable node of the nth layer at the ith iteration, for a particular m and n, in the mth layer Updated by the a posteriori probability LLR in the n-th layer.
The calculation formula of the CNU module is as follows:

R_{j,i}^{(l)} = α · ∏_{i'∈N(j)\i} sign(Q_{j,i'}^{(l)}) · min_{i'∈N(j)\i} |Q_{j,i'}^{(l)}|

where R_{j,i}^{(l)} denotes the external message propagated from the jth check node to the ith variable node of the nth layer at the lth iteration; N(j)\i denotes the set of variable nodes participating in the jth check equation, excluding the ith variable node; and α is the attenuation factor, α ∈ (0, 1).
The calculation formula of the LUU module is as follows:

LLR_i = Q_{j,i}^{(l)} + R_{j,i}^{(l)}

where LLR_i denotes the updated posterior LLR of the ith variable node of the nth layer at the lth iteration;
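The VNU, CNU and LUU update formulas can be checked numerically; the following worked example uses hypothetical values for one check row with three connected variable nodes and α = 0.75:

```python
alpha = 0.75
llr_old = [1.5, -2.0, 0.5]   # posterior LLRs of the 3 variable nodes in one check row
R_old   = [0.2, -0.3, 0.1]   # check messages from the previous iteration

# VNU: Q = LLR - R_old
Q = [l - r for l, r in zip(llr_old, R_old)]        # ≈ [1.3, -1.7, 0.4]

# CNU (scaled min-sum): for each edge, take the sign product and the
# minimum magnitude over the *other* edges, scaled by alpha
R_new = []
for i in range(len(Q)):
    others = [Q[j] for j in range(len(Q)) if j != i]
    sign = 1
    for v in others:
        sign *= 1 if v >= 0 else -1
    R_new.append(alpha * sign * min(abs(v) for v in others))
# R_new ≈ [-0.3, 0.3, -0.975]

# LUU: LLR = Q + R_new
llr_new = [q + r for q, r in zip(Q, R_new)]        # ≈ [1.0, -1.4, -0.575]
```

Note how the third node, whose own magnitude is smallest, receives the largest-magnitude correction, since the minimum over the other edges excludes it.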
This embodiment was tested in an AWGN channel with BPSK modulation, as shown in fig. 5. The normalization factor is set to α = 0.75. The initialization LLR information, the VNU information and the posterior probability LLR information are quantized in the manner of [6, 8]. In the decoding process, the maximum number of iterations is set to 5, 10 and 15 respectively, the bit error rate is counted, and it is compared with the bit error rate of the unmodified MATLAB LNMS algorithm at 10 iterations. As can be seen from fig. 5, when the number of iterations is set to 10, the decoding performance of the decoder is essentially optimal, and at a bit error rate of 10^-5 the gap between the decoder and the 10-iteration MATLAB result is less than 0.1 dB.
The throughput of this embodiment is obtained by the following formula:

Throughput = (N_ldpc × N_fram × f_clk × Rate) / (N_iter × N_lay × C_lay)

where N_ldpc represents the LDPC code length used, N_fram the number of frames that can be decoded simultaneously, f_clk the system clock frequency used, Rate the code rate, N_iter the number of iterations, N_lay the number of decoder layers, and C_lay the number of clocks required for each layer per iteration. At a 254 MHz clock frequency, the calculated throughput of a single decoder can reach 102.87 Gb/s at maximum. The decoder is therefore suitable for QC-LDPC decoding in high-speed communication scenarios. The FPGA implementation statistics of this embodiment are as follows;
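Under the stated symbol definitions, the throughput expression can be evaluated directly. The sketch below uses placeholder parameter values, since the embodiment's exact code length, layer count and per-layer clock count are not all stated in the text:

```python
def throughput_bps(n_ldpc, n_fram, f_clk, rate, n_iter, n_lay, c_lay):
    """Information throughput as described in the text:
    (N_ldpc * N_fram * f_clk * Rate) / (N_iter * N_lay * C_lay),
    in bits per second when f_clk is given in Hz."""
    return n_ldpc * n_fram * f_clk * rate / (n_iter * n_lay * c_lay)

# Placeholder example: a length-1000, rate-1/2 code, 4 interleaved
# frames, 1 MHz clock, 1 iteration of a 1-layer matrix at 1 clock/layer
example = throughput_bps(1000, 4, 1e6, 0.5, 1, 1, 1)   # 2e9 bit/s
```

The formula shows where the architecture gains throughput: N_fram = 4 from frame interleaving in the numerator, and a small C_lay in the denominator from eliminating the per-layer RAM read clocks.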
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.