Disclosure of Invention
1. Technical problem to be solved
The invention aims to provide an FPGA-based QC-LDPC code decoder and an implementation method thereof, to solve the problems identified in the background art: one existing method markedly degrades decoding performance when the number of conflicting submatrices is large, while another method, built on the first, reduces the number of conflicts by reordering the processing of the submatrices and deferring the use of updated contribution values, but only reduces data-update conflicts and still does not prevent the loss of decoding performance.
2. Technical proposal
An FPGA-based QC-LDPC code decoder, comprising:
The channel information and posterior probability storage unit LLR_RAM is used for storing the initially received channel data and the LLR information after each data-update iteration; there are n RAM storage blocks in total, where n is the number of column blocks of the check matrix H;
the front-end shift register unit F_shift is used for performing the cyclic right-shift operation on the data output by LLR_RAM according to the shift value of the identity submatrix in each layer of the check matrix H; there are k F_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node computing unit VNU is used for updating the extrinsic information of the variable nodes during the iterative process; there are m variable node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the check node computing unit CNU is used for updating the extrinsic information of the check nodes during the iterative process; there are m check node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the posterior probability updating module LUU updates the posterior probability LLR of the current layer according to the updated variable node information and check node information; there are m posterior probability updating units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the back-end shift register unit B_shift is used for reverse-shifting part of the data output by the LUU and writing it back to the RAM, and for performing the cyclic right-shift of the identity submatrix on the remaining data according to the shift requirement of the next layer of the H matrix; there are k B_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the data selector unit Data_selector is used for selecting whether the front-end or the back-end shift register supplies the output data, connecting that data to the VNU module, and starting the data iteration of the next layer;
the check node information storage FIFO unit R_M_FIFO is mainly used for storing the extrinsic information updated by the check nodes; there are k R_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node information storage FIFO unit Q_M_FIFO is mainly used for storing the extrinsic information updated by the variable nodes; there are k Q_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the control module Con_M controls the same-layer data processing of all modules, the data exchange between different layers, and the timing of the 4-stage pipelined frame-interleaved decoding.
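The cyclic right shift performed by F_shift (and reversed by B_shift) corresponds to rotating a Z-element LLR block by the shift value of the non-negative identity submatrix. The following is a minimal software model of that operation, offered only as an illustration under assumed naming, not as the hardware implementation:

```python
def cyclic_right_shift(block, s):
    """Model of one F_shift unit: rotate a Z-element LLR block right by s.

    A non-negative entry s in the base matrix of a QC-LDPC code selects
    the identity matrix cyclically shifted by s, which maps to rotating
    the Z LLRs of that column block.
    """
    z = len(block)
    s %= z
    return block[-s:] + block[:-s] if s else list(block)

def cyclic_left_shift(block, s):
    """Model of the reverse shift used by B_shift when restoring the
    original LLR order before a write-back: undo a right shift by s."""
    z = len(block)
    return cyclic_right_shift(block, z - (s % z))
```

For example, `cyclic_right_shift([1, 2, 3, 4], 1)` yields `[4, 1, 2, 3]`, and applying `cyclic_left_shift` with the same shift value recovers the original block.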
Preferably, the channel information and posterior probability storage unit LLR_RAM performs a complete write of its data only at initialization and when the iterations are finished; in the other iteration layers it performs only selective writes, the write selection being controlled by the control module Con_M. Only at the start of an iteration is all first-layer data read, according to the requirements of the first-layer matrix; the other layers selectively read only part of the data. LLR_RAM data is read 3 clocks ahead of the processing of its layer, the read selection being controlled by the control module Con_M.
Preferably, the front-end barrel shift register unit F_shift feeds the data selector module Data_selector, and its output data is no longer fully shifted and output according to the current-layer shift requirement as in the conventional structure, but is partially shifted and output according to whether the data selector module Data_selector connects it to the next layer, under the control of the control module Con_M.
Preferably, the data selector module Data_selector selects the data connected to the variable node computing unit VNU according to the current iteration layer number, the candidate data being the output values of the front-end and back-end shift register units, respectively.
Preferably, the output data of the back-end shift register unit B_shift is no longer always restored to the original LLR order as in the conventional structure, but is determined by whether the data selector module Data_selector connects it to the next-layer variable node computing unit VNU: if connected to the next-layer VNU, the output data is shifted according to the block-matrix shift requirement of the next layer; if not connected, the shift is reversed to restore the original LLR order and the data is written back to the LLR_RAM module through the data selector module Data_selector.
Preferably, the control module Con_M operates from the moment the channel input enters the LLR_RAM unit: it controls the data-processing sequence within each layer, from F_shift to VNU to CNU to LUU and finally to B_shift, then writes back to the LLR_RAM module as needed; it manages the data flow between layers, passing the data from the first layer to the second layer and so on to the last layer, completing one data iteration, until the required number of iterations is reached; and it controls the 4-stage pipelined frame-interleaved decoding, which is used because each layer of data processing requires 4 clocks, and each clock within a layer corresponds to the processing of a different frame.
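The 4-clock-per-layer, frame-interleaved timing controlled by Con_M can be illustrated with a small scheduling model. This is an assumed software sketch of the described timing, not the control logic itself:

```python
def interleave_schedule(n_layers, n_frames=4):
    """Return (clock, frame, layer) tuples for frame-interleaved layered
    decoding: each layer occupies n_frames consecutive clocks, and each
    of those clocks carries a different frame, so the 4-stage pipeline
    stays full while one frame's layer result is still being computed."""
    sched = []
    clock = 0
    for layer in range(n_layers):
        for frame in range(n_frames):
            sched.append((clock, frame, layer))
            clock += 1
    return sched
```

With `n_layers=2`, the schedule processes frames 0 to 3 of layer 0 in clocks 0 to 3, then frames 0 to 3 of layer 1 in clocks 4 to 7.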
Preferably, the implementation method of the FPGA-based QC-LDPC code decoder comprises the following steps:
S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the LLR_RAM unit; all-zero check node information is written into each R_M_FIFO unit; and the iteration counter item is initialized to 0;
S2, iterating the first-layer data: first, the data in the LLR_RAM units corresponding to the non-negative block matrices in the first layer of the check matrix H are read, extracting the data of each frame in turn; the data are shifted in the front-end shift register unit F_shift, and the shifted data are sent to the variable node computing unit VNU for the variable node data update, then sent to the CNU for the check node update; the LUU then performs the posterior probability update, and finally the posterior probability LLR data are sent to the B_shift unit;
S3, during the VNU update of the first layer, the data of the second layer are prepared according to the non-negative block matrices in the second layer of the check matrix H. If a block of the first layer of H and the vertically adjacent block of the second layer are both non-negative matrices, the data can be obtained directly by shifting the corresponding first-layer LLR data through the B_shift unit, without re-reading the LLR_RAM unit; if the first-layer block is a negative matrix and the second-layer block is a non-negative matrix, the LLR data of the second layer must be read from the LLR_RAM unit and shifted correspondingly by the F_shift unit; if the first-layer block is a non-negative matrix and the second-layer block is a negative matrix, the corresponding first-layer LLR data must be shift-recovered by the B_shift unit and written back into the LLR_RAM unit. The data acquired for the second layer are sent, after shifting, through the Data_selector data selection module to the VNU, and the data update proceeds with operations similar to those of the first layer;
S4, the above steps are repeated until the data of all layers have been updated, and the iteration counter item is incremented by 1; if the set number of iterations has not been reached, the data updating continues; after the iterations are completed, the posterior probability LLRs output by the LUU are restored to the original order by the B_shift unit and written back to the LLR_RAM unit;
S5, outputting the decoding result: the updated data are read from the LLR_RAM unit, the decoding decision is made, and the result is output.
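The layered decoding flow of steps S1 to S5 can be sketched in software. The following is a simplified single-frame model of layered scaled min-sum decoding (names and structure are an illustrative assumption; the 4-frame pipelining and FIFO organization of the hardware are omitted):

```python
import numpy as np

def lmsa_decode(llr, base, z, n_iter=10, alpha=0.75):
    """Simplified layered min-sum (LMSA) model of steps S1-S5.

    llr  : channel LLRs, length = cols(base) * z
    base : base matrix of shift values; -1 marks an all-zero block
    z    : identity submatrix size
    """
    m, n = base.shape
    R = np.zeros((m, n, z))                 # check messages (R_M_FIFO role)
    llr = llr.astype(float).copy()
    for _ in range(n_iter):
        for layer in range(m):              # one layer = one block row
            cols = [j for j in range(n) if base[layer, j] >= 0]
            # VNU: shifted posterior LLR minus the old check message
            Q = {}
            for j in cols:
                blk = np.roll(llr[j*z:(j+1)*z], base[layer, j])  # F_shift
                Q[j] = blk - R[layer, j]
            # CNU: scaled min-sum over the other edges of the row
            for k in range(z):
                vals = np.array([Q[j][k] for j in cols])
                total_sgn = np.prod(np.sign(vals + 1e-12))
                mags = np.abs(vals)
                for idx, j in enumerate(cols):
                    others = np.delete(mags, idx)
                    sgn = total_sgn * np.sign(vals[idx] + 1e-12)
                    R[layer, j][k] = alpha * sgn * others.min()
            # LUU + B_shift: update posterior LLR and rotate back
            for j in cols:
                blk = Q[j] + R[layer, j]
                llr[j*z:(j+1)*z] = np.roll(blk, -base[layer, j])
    return (llr < 0).astype(int)            # hard decision (step S5)
```

As a sanity check, an all-positive channel LLR vector (the all-zero codeword received cleanly) decodes to the all-zero word.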
Preferably, the data-processing algorithm used is the layered min-sum algorithm LMSA.
3. Advantageous effects
Compared with the prior art, the invention has the following advantages:
The FPGA-based QC-LDPC code decoder and its implementation method improve on the structure of the conventional LDPC decoder; in particular, a data selector is introduced before the variable node processing unit VNU. The two inputs of the selector are connected to the shift registers at the input end and the output end of the system, respectively, and the output value of either shift register is flexibly routed to the VNU through the selector, so that the RAM read operation of the current layer is advanced into the update stage of the previous layer;
In the FPGA-based QC-LDPC code decoder and its implementation method, the updated data of a layer are partly written back into the LLR_RAM unit and partly passed directly to the VNU of the next layer, saving the three clock cycles of a RAM read and improving the data transfer efficiency without reducing the decoding performance;
The invention adopts an intra-layer parallel decoding algorithm together with a 4-stage pipeline structure, which further raises the throughput of the decoder. This design significantly increases the decoding speed and reduces the latency without degrading performance, and is suitable for LDPC decoding applications in high-speed communication scenarios.
Detailed Description
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc., which indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "configured to," "engaged with," "connected to," and the like are to be construed broadly, and include, for example, "connected to," whether fixedly connected to, detachably connected to, or integrally connected to, mechanically connected to, electrically connected to, directly connected to, indirectly connected to, and in communication with each other via an intermediate medium. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Referring to fig. 1-5, the present invention provides a technical solution:
an FPGA-based QC-LDPC code decoder, comprising:
The device comprises a channel information and posterior probability storage unit LLR_RAM, a front-end barrel shift register unit F_shift, a variable node computing unit VNU, a check node computing unit CNU, a posterior probability updating unit LUU, a back-end barrel shift register unit B_shift, a data selector module Data_selector, a check node information storage FIFO unit R_M_FIFO, a variable node information storage FIFO unit Q_M_FIFO and a control module Con_M;
In this structure, the LLR values corresponding to a non-negative block submatrix of the next layer are no longer always read out of the LLR_RAM units, shifted, and sent to the VNU. If the current-layer block is a negative submatrix and the next-layer block is a non-negative submatrix, the next-layer LLRs are read from the LLR_RAM unit 3 clocks in advance and sent to the F_shift unit; if the current-layer block is a non-negative submatrix and the next-layer block is a negative matrix, the current-layer LLRs are shift-recovered and written back to the LLR_RAM. In addition, an intra-layer parallel decoding algorithm and a 4-stage pipeline structure are adopted, as shown in fig. 2, which further raises the throughput of the decoder;
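The three routing cases handled between two adjacent layers can be summarized in a small decision sketch. This is only a software illustration of the policy described above, with hypothetical names:

```python
def select_source(cur_nonneg, next_nonneg):
    """Data_selector routing policy for one column block, given whether
    the current-layer and next-layer base-matrix entries are non-negative:
      - both non-negative: B_shift forwards the just-updated LLRs
        straight to the next layer's VNU (no RAM read needed);
      - current negative, next non-negative: LLRs are pre-read from
        LLR_RAM 3 clocks early and shifted by F_shift;
      - current non-negative, next negative: B_shift restores the
        original order and writes back to LLR_RAM."""
    if cur_nonneg and next_nonneg:
        return "B_shift -> VNU"
    if not cur_nonneg and next_nonneg:
        return "LLR_RAM -> F_shift -> VNU"
    if cur_nonneg and not next_nonneg:
        return "B_shift -> LLR_RAM"
    return "idle"
```

The first case is the source of the clock savings: the RAM read of the next layer is replaced by a direct forward through the back-end shift register.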
The channel information and posterior probability storage unit LLR_RAM is used for storing the initially received channel data and the LLR information after each data-update iteration; there are n RAM storage blocks in total, where n is the number of column blocks of the check matrix H;
the front-end shift register unit F_shift is used for performing the cyclic right-shift operation on the data output by LLR_RAM according to the shift value of the identity submatrix in each layer of the check matrix H; there are k F_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node computing unit VNU is used for updating the extrinsic information of the variable nodes during the iterative process; there are m variable node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the check node computing unit CNU is used for updating the extrinsic information of the check nodes during the iterative process; there are m check node computing units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the posterior probability updating module LUU updates the posterior probability LLR of the current layer according to the updated variable node information and check node information; there are m posterior probability updating units in total, where m is the row dimension of the identity submatrix in the check matrix H;
the back-end shift register unit B_shift is used for reverse-shifting part of the data output by the LUU and writing it back to the RAM, and for performing the cyclic right-shift of the identity submatrix on the remaining data according to the shift requirement of the next layer of the H matrix; there are k B_shift blocks in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the data selector unit Data_selector is used for selecting whether the front-end or the back-end shift register supplies the output data, connecting that data to the VNU module, and starting the data iteration of the next layer;
the check node information storage FIFO unit R_M_FIFO is mainly used for storing the extrinsic information updated by the check nodes; there are k R_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the variable node information storage FIFO unit Q_M_FIFO is mainly used for storing the extrinsic information updated by the variable nodes; there are k Q_M_FIFO units in total, where k is the maximum number of non-negative identity submatrices among the column blocks of any layer;
the control module Con_M controls the same-layer data processing of all modules, the data exchange between different layers, and the timing of the 4-stage pipelined frame-interleaved decoding;
Specifically, the channel information and posterior probability storage unit LLR_RAM performs a complete write of its data only at initialization and when the iterations are finished; in the other iteration layers it performs only selective writes, the write selection being controlled by the control module Con_M. Only at the start of an iteration is all first-layer data read, according to the requirements of the first-layer matrix; the other layers selectively read only part of the data. LLR_RAM data is read 3 clocks ahead of the processing of its layer, the read selection being controlled by the control module Con_M.
Further, the front-end barrel shift register unit F_shift feeds the data selector module Data_selector, and its output data is no longer fully shifted and output according to the current-layer shift requirement as in the conventional structure, but is partially shifted and output according to whether the data selector module Data_selector connects it to the next layer, under the control of the control module Con_M.
Further, the data selector module Data_selector selects the data connected to the variable node computing unit VNU according to the current iteration layer number, the candidate data being the output values of the front-end and back-end shift register units, respectively.
Further, the output data of the back-end shift register unit B_shift is no longer always restored to the original LLR order as in the conventional structure, but is determined by whether the data selector module Data_selector connects it to the next-layer variable node computing unit VNU: if connected to the next-layer VNU, the output data is shifted according to the block-matrix shift requirement of the next layer; if not connected, the shift is reversed to restore the original LLR order and the data is written back to the LLR_RAM module through the data selector module Data_selector.
Further, the control module Con_M operates from the moment the channel input enters the LLR_RAM unit: it controls the data-processing sequence within each layer, from F_shift to VNU to CNU to LUU and finally to B_shift, then writes back to the LLR_RAM module as needed; it manages the data flow between layers, passing the data from the first layer to the second layer and so on to the last layer, completing one data iteration, until the required number of iterations is reached; and it controls the 4-stage pipelined frame-interleaved decoding, which is used because each layer of data processing requires 4 clocks, and each clock within a layer corresponds to the processing of a different frame.
Notably, the implementation method of the FPGA-based QC-LDPC code decoder comprises the following steps:
S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the LLR_RAM unit; all-zero check node information is written into each R_M_FIFO unit; and the iteration counter item is initialized to 0;
S2, iterating the first-layer data: first, the data in the LLR_RAM units corresponding to the non-negative block matrices in the first layer of the check matrix H are read, extracting the data of each frame in turn; the data are shifted in the front-end shift register unit F_shift, and the shifted data are sent to the variable node computing unit VNU for the variable node data update, then sent to the CNU for the check node update; the LUU then performs the posterior probability update, and finally the posterior probability LLR data are sent to the B_shift unit;
S3, during the VNU update of the first layer, the data of the second layer are prepared according to the non-negative block matrices in the second layer of the check matrix H. If a block of the first layer of H and the vertically adjacent block of the second layer are both non-negative matrices, the data can be obtained directly by shifting the corresponding first-layer LLR data through the B_shift unit, without re-reading the LLR_RAM unit; if the first-layer block is a negative matrix and the second-layer block is a non-negative matrix, the LLR data of the second layer must be read from the LLR_RAM unit and shifted correspondingly by the F_shift unit; if the first-layer block is a non-negative matrix and the second-layer block is a negative matrix, the corresponding first-layer LLR data must be shift-recovered by the B_shift unit and written back into the LLR_RAM unit. The data acquired for the second layer are sent, after shifting, through the Data_selector data selection module to the VNU, and the data update proceeds with operations similar to those of the first layer;
S4, the above steps are repeated until the data of all layers have been updated, and the iteration counter item is incremented by 1; if the set number of iterations has not been reached, the data updating continues; after the iterations are completed, the posterior probability LLRs output by the LUU are restored to the original order by the B_shift unit and written back to the LLR_RAM unit;
S5, outputting the decoding result: the updated data are read from the LLR_RAM unit, the decoding decision is made, and the result is output.
Embodiment one:
The embodiment of the invention provides an FPGA-based QC-LDPC code decoder and an implementation method thereof; the specific processing steps are shown in fig. 4;
The method comprises step S1, initialization: the received four frames of initial channel LLR information are stored block-by-block, according to the columns of the check matrix H, into the storage blocks of the channel information and posterior probability storage unit LLR_RAM; each storage block has storage addresses 0-3, corresponding respectively to the storage positions of the same LLR block in the different frames;
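The storage layout of step S1 can be modeled in a few lines. This is an assumed software sketch of the described addressing (one RAM block per column block of H, one address per frame), not the memory implementation:

```python
def init_llr_ram(frames, n_col_blocks, z):
    """S1 initialization sketch: split each frame's channel LLRs by the
    column blocks of H and store them so that RAM block j holds, at
    address f (0-3 for four frames), the jth LLR block of frame f."""
    ram = [[None] * len(frames) for _ in range(n_col_blocks)]
    for f, llr in enumerate(frames):
        assert len(llr) == n_col_blocks * z, "frame length must be n*z"
        for j in range(n_col_blocks):
            ram[j][f] = llr[j * z:(j + 1) * z]
    return ram
```

With this layout, reading "the data of frame f for column block j" is a single access `ram[j][f]`, matching the one-frame-per-clock read order described below.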
S2, iterating the first-layer data:
a) According to the first layer of the check matrix H, the data stored in each LLR_RAM storage block at the positions where the first layer of H is a non-negative submatrix are read in sequence: the first clock reads the data of the first frame, the second clock reads the data of the second frame, and at the same time the second clock begins reading, from each storage block of the R_M_FIFO, the check node information corresponding to the first frame of the first layer. The reading of the four frames of initial LLR block information is completed in the first four clocks, and the first-layer check node information of the four frames is read in the second to fifth clocks. The LLR block data read out are sent, in reading order, into the shift units of the front-end shift register unit F_shift corresponding to the first layer, and shifted according to the shift requirement of the first-layer check matrix; F_shift consumes only one clock per shift operation. The four frames of LLR block data share the same F_shift and are distinguished by the four successive clocks in which they are sent, and the data-read operations are controlled by the control module Con_M;
b) The first-layer front-end shift register unit F_shift outputs data, which the selection module Data_selector connects to each computing unit of the VNU; the check node data of each frame read from R_M_FIFO at that moment are also connected to each computing unit of the VNU. After the VNU computing units complete the variable node data update, the data are sent to the CNU computing units and to the Q_M_FIFO storage units; during the check node data update, the CNU reads the Q_M_FIFO information just stored, and when the check node data update is complete, the data are sent to the LUU computing units and the R_M_FIFO storage units. Each LUU computing unit completes the posterior probability LLR update from the received check node information and the Q_M_FIFO stored information, then sends the data into the shift units of the back-end shift register B_shift. The four frames of LLR block data share the same VNU, CNU and LUU modules, distinguished by the four successive clocks in which they are fed in, and the data operations are controlled by the control module Con_M;
c) The data sent to the shift units of the back-end shift register B_shift are shift-controlled by the control module according to the non-negative blocks in the second layer of the check matrix H. If vertically adjacent blocks of the first and second layers are both non-negative, the second-layer data are obtained directly by shifting in the corresponding first-layer B_shift unit, instead of being read again from the LLR_RAM storage unit; the shifted data are connected by the data selection module Data_selector to the computing units of the second-layer VNU, and the second-layer data update starts. If, for vertically adjacent blocks, the first-layer check block is non-negative and the second-layer check block is negative, the posterior probability LLR block output by the first-layer LUU is shifted directly to restore the original order and written back to the corresponding storage position of the LLR_RAM. If, for vertically adjacent blocks, the first-layer check block is negative but the second-layer check block is non-negative, the data must be read from the corresponding LLR_RAM storage position three clocks before the B_shift unit outputs: two clocks are used for reading, and one clock sends the data to the corresponding shift unit of the front-end shift register module F_shift for shifted output; the output data are sent to the data selection module Data_selector to be connected to the second-layer VNU computing units. At that moment the shift units of the front-end module F_shift and of the back-end module B_shift are connected to the data selection module Data_selector simultaneously, and the second-layer VNU computing units obtain their update data through the Data_selector;
Second-layer data update: the second-layer data are acquired from c), the data update of the second layer is performed with operations similar to b) and c), and the third-layer data are acquired for updating;
One iteration is completed by these operations once the last layer of data has been updated, and the iteration counter item is incremented by 1. If the required number of iterations has not been completed, new layer data are acquired according to step 2 c) and the data of that layer are updated with operations similar to steps 2 b) and 2 c), until the iteration requirement is met; after the iterations are completed, the posterior probability LLRs output by the computing units of the LUU module are sent into the shift units of the back-end shift register module B_shift to restore the original LLR order, and written back into the LLR_RAM storage unit;
The iterated data are then read completely from the LLR_RAM storage unit, the decoding decision is made, and the result is output;
The data-processing algorithm used in this embodiment is the layered min-sum algorithm LMSA;
the calculation formula of the VNU module is as follows:
External messages expressed as the jth check node propagated from the ith variable node of the nth layer at the ith iteration, for a particular m and n, in the mth layer Updated by the a posteriori probability LLR in the n-th layer.
The calculation formula of the CNU module is as follows:

R_{j,i}^{(l)} = α · ∏_{i'∈N(j)\i} sign(Q_{j,i'}^{(l)}) · min_{i'∈N(j)\i} |Q_{j,i'}^{(l)}|

where R_{j,i}^{(l)} denotes the external message propagated from the jth check node to the ith variable node of the nth layer at the lth iteration; N(j)\i denotes the set of variable nodes participating in the jth check equation, excluding the ith variable node; and α is the attenuation factor, α ∈ (0, 1).
The calculation formula of the LUU module is as follows:

LLR_i = Q_{j,i}^{(l)} + R_{j,i}^{(l)}

where LLR_i denotes the updated posterior LLR of the ith variable node of the nth layer at the lth iteration;
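The VNU, CNU and LUU update formulas can be checked numerically; the following worked example uses hypothetical values for one check row with three connected variable nodes and α = 0.75:

```python
alpha = 0.75
llr_old = [1.5, -2.0, 0.5]   # posterior LLRs of the 3 variable nodes in one check row
R_old   = [0.2, -0.3, 0.1]   # check messages from the previous iteration

# VNU: Q = LLR - R_old
Q = [l - r for l, r in zip(llr_old, R_old)]        # ≈ [1.3, -1.7, 0.4]

# CNU (scaled min-sum): for each edge, take the sign product and the
# minimum magnitude over the *other* edges, scaled by alpha
R_new = []
for i in range(len(Q)):
    others = [Q[j] for j in range(len(Q)) if j != i]
    sign = 1
    for v in others:
        sign *= 1 if v >= 0 else -1
    R_new.append(alpha * sign * min(abs(v) for v in others))
# R_new ≈ [-0.3, 0.3, -0.975]

# LUU: LLR = Q + R_new
llr_new = [q + r for q, r in zip(Q, R_new)]        # ≈ [1.0, -1.4, -0.575]
```

Note how the third node, whose own magnitude is smallest, receives the largest-magnitude correction, since the minimum over the other edges excludes it.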
This embodiment was tested in an AWGN channel with BPSK modulation, as shown in fig. 5. The normalization factor is set to α = 0.75. The initialization LLR information, the VNU information and the posterior probability LLR information are quantized in the manner of [6, 8]. In the decoding process, the maximum number of iterations is set to 5, 10 and 15 respectively, the bit error rate is counted, and it is compared with the bit error rate of the unmodified MATLAB LNMS algorithm at 10 iterations. As can be seen from fig. 5, when the number of iterations is set to 10, the decoding performance of the decoder is essentially optimal, and at a bit error rate of 10^-5 the gap between the decoder and the 10-iteration MATLAB result is less than 0.1 dB.
The throughput of this embodiment is obtained by the following formula:

Throughput = (N_ldpc × N_fram × f_clk × Rate) / (N_iter × N_lay × C_lay)

where N_ldpc represents the LDPC code length used, N_fram the number of frames that can be decoded simultaneously, f_clk the system clock frequency used, Rate the code rate, N_iter the number of iterations, N_lay the number of decoder layers, and C_lay the number of clocks required for each layer per iteration. At a 254 MHz clock frequency, the calculated throughput of a single decoder can reach 102.87 Gb/s at maximum. The decoder is therefore suitable for QC-LDPC decoding in high-speed communication scenarios. The FPGA implementation statistics of this embodiment are as follows;
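Under the stated symbol definitions, the throughput expression can be evaluated directly. The sketch below uses placeholder parameter values, since the embodiment's exact code length, layer count and per-layer clock count are not all stated in the text:

```python
def throughput_bps(n_ldpc, n_fram, f_clk, rate, n_iter, n_lay, c_lay):
    """Information throughput as described in the text:
    (N_ldpc * N_fram * f_clk * Rate) / (N_iter * N_lay * C_lay),
    in bits per second when f_clk is given in Hz."""
    return n_ldpc * n_fram * f_clk * rate / (n_iter * n_lay * c_lay)

# Placeholder example: a length-1000, rate-1/2 code, 4 interleaved
# frames, 1 MHz clock, 1 iteration of a 1-layer matrix at 1 clock/layer
example = throughput_bps(1000, 4, 1e6, 0.5, 1, 1, 1)   # 2e9 bit/s
```

The formula shows where the architecture gains throughput: N_fram = 4 from frame interleaving in the numerator, and a small C_lay in the denominator from eliminating the per-layer RAM read clocks.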
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.