
TWI552533B - Low density parity check decoding method performing on general graphic processing unit - Google Patents


Info

Publication number
TWI552533B
TWI552533B
Authority
TW
Taiwan
Prior art keywords
message
array
value
target
index
Prior art date
Application number
TW104122282A
Other languages
Chinese (zh)
Other versions
TW201703441A (en)
Inventor
賴伯承
邱奏翰
Original Assignee
華邦電子股份有限公司 (Winbond Electronics Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 華邦電子股份有限公司 (Winbond Electronics Corporation)
Priority to TW104122282A priority Critical patent/TWI552533B/en
Application granted granted Critical
Publication of TWI552533B publication Critical patent/TWI552533B/en
Publication of TW201703441A publication Critical patent/TW201703441A/en

Landscapes

  • Error Detection And Correction (AREA)

Description

Low density parity check decoding method performed on a general-purpose graphics processor

The present invention relates to a parallel data-processing technique, and more particularly to a low-density parity-check decoding method and decoding apparatus executed on a general-purpose graphics processor (GPGPU).

Because the low-density parity-check (LDPC) code is an error-correction code whose performance can approach the Shannon channel limit, LDPC codes are widely used in many current communication-system standards, such as WiFi systems based on IEEE 802.11n, Ethernet systems based on IEEE 802.3, WiMAX systems based on IEEE 802.16e, and Digital Video Broadcasting-Satellite transmission 2nd generation (DVB-S2) systems. Although LDPC codes offer relatively good channel error detection and correction capability, the LDPC decoding flow requires repeated iterative computations to obtain a decoding result. Therefore, when a large parity-check matrix is used to assist the decoding computation, the LDPC decoding flow demands substantial hardware computing power and hardware resources.

General-purpose computing on graphics processing units (GPGPU) uses graphics processors, originally built for graphics tasks, to perform general-purpose computations that would otherwise be handled by the central processing unit, and these computations often have nothing to do with graphics. More specifically, a general-purpose graphics processor is a multi-core architecture that provides powerful computing capability and high data throughput by executing a large number of threads simultaneously. It can therefore be expected that performing low-density parity-check decoding on a general-purpose graphics processor will greatly improve decoding performance.

However, in conventional approaches, general-purpose graphics processors generally support only regular low-density parity-check decoding. For irregular low-density parity-check decoding, decoding performance is often limited by the difficulty and complexity of designing the data structures and memory accesses. Accordingly, those skilled in the art seek a design architecture with a wider range of application that can improve the decoding performance of low-density parity-check decoding executed on a general-purpose graphics processor.

In view of this, the present invention provides a low-density parity-check decoding method and decoding apparatus executed on a general-purpose graphics processor, which achieve a greater degree of computational parallelism to further improve decoding performance, and which support irregular low-density parity-check decoding.

The present invention proposes a low-density parity-check code decoding method executed on a general-purpose graphics processor whose streaming multiprocessors each include a plurality of thread processing cores and a shared memory. The method includes the following steps. Based on the M edges in the Tanner graph associated with the parity-check matrix, each edge is mapped to one of a plurality of threads, so that each thread corresponds to one of a plurality of edge identifiers, where M is an integer greater than 1 and the edges connect a plurality of check nodes with a plurality of bit nodes. When one of the threads is executed, data in the shared memory is accessed according to the edge identifier of that thread, so as to update a plurality of passed messages stored in the shared memory that correspond respectively to the edges.

In an embodiment of the invention, the step of accessing data in the shared memory according to the edge identifier of the thread, so as to update the passed messages stored in the shared memory that correspond respectively to the edges, includes: reading at least one target value index from the M value indices according to the edge identifier of the thread, and reading at least one first target passed message from the M first-direction passed messages stored in the shared memory according to the target value index.

In an embodiment of the invention, a value index array stored in the shared memory holds value indices corresponding respectively to the edges, and a bit-to-check message array stored in the shared memory holds first-direction passed messages corresponding respectively to the edges.

In an embodiment of the invention, the storage positions of the value indices in the value index array depend on the connection state of the Tanner graph, and first-direction passed messages in the bit-to-check message array that correspond to the same check node are arranged adjacently.

In an embodiment of the invention, the step of reading the target value index according to the edge identifier of the thread, and reading the first target passed message from the first-direction passed messages stored in the shared memory according to the target value index, includes: according to the edge identifier of the thread, reading target value indices starting from the i-th value index in the value index array, where i equals the edge identifier of the thread; according to the i-th value index, reading first target passed messages starting from the j-th first-direction passed message in the bit-to-check message array, where j equals the i-th value index; and, while cyclically reading the value index array in sequence, continuing to read first target passed messages from the bit-to-check message array until one of the value indices satisfying a preset condition is read, whereupon reading of the first-direction passed messages in the bit-to-check message array stops. The value index satisfying the preset condition equals the edge identifier of the thread.

In an embodiment of the invention, the step of accessing data in the shared memory according to the edge identifier of the thread, so as to update the passed messages stored in the shared memory that correspond respectively to the edges, further includes: reading a target position index from the M position indices according to the edge identifier of the thread, and updating a second target passed message among the M second-direction passed messages using the target position index and the first target passed message. The target position index indicates the array storage position of the second target passed message.

In an embodiment of the invention, a position index array holds position indices corresponding respectively to the edges, and a check-to-bit message array holds second-direction passed messages corresponding respectively to the edges.

In an embodiment of the invention, the storage positions of the position indices in the position index array depend on the connection state of the Tanner graph, and second-direction passed messages in the check-to-bit message array that correspond to the same bit node are arranged adjacently.

In an embodiment of the invention, the step of reading the target position index according to the edge identifier of the thread includes: according to the edge identifier of the thread, reading the i-th position index in the position index array as the target position index, where i equals the edge identifier of the thread.

In an embodiment of the invention, the step of updating the second target passed message in the second-direction passed messages using the target position index and the first target passed message includes: computing an update message according to the first target passed message, and replacing the k-th second-direction passed message in the check-to-bit message array, as indicated by the target position index, with the update message so as to update the second target passed message, where k equals the target position index.

Based on the above, by mapping each edge of the Tanner graph to one of a plurality of threads, the general-purpose graphics processor can process the update computations of the passed messages in the low-density parity-check decoding flow in parallel. The thread processing cores of the general-purpose graphics processor can read a value index array and accordingly read the first-direction passed messages in the bit-to-check array. In other words, each thread processing core can access data in the shared memory according to the edge identifier of its thread, so as to update the passed messages stored in the shared memory that correspond respectively to the edges. Compared with the conventional decoding approach of assigning data nodes (including bit nodes and check nodes) to different threads for iterative computation, the present invention thus achieves greater computational parallelism. In addition, the edge-based data processing of the present invention supports both regular and irregular low-density parity-check code decoding.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

Based on the property that the number of edges in a Tanner graph is larger than the number of nodes, the present invention proposes an edge-based processing architecture. Compared with the node-based processing currently used on general-purpose graphics processors, the invention further increases the computational parallelism of low-density parity-check decoding executed on a general-purpose graphics processor and thereby improves decoding performance.

FIG. 1 is a schematic diagram of a decoding apparatus according to an embodiment of the invention. The decoding apparatus 100 may be deployed in a wireless communication receiver, for example a receiver using the IEEE 802.11n standard, although the invention is not limited thereto. When the decoding apparatus 100 receives data from a communication channel, it can perform decoding based on a low-density parity-check algorithm, so as to apply a correction procedure to the data received from the channel. In this exemplary embodiment, the decoding apparatus 100 includes a general-purpose graphics processor 20 and a storage unit 30.

In this exemplary embodiment, the general-purpose graphics processor 20 includes a plurality of streaming multiprocessors SM_1~SM_P (P being a positive integer), a cache memory 21, and a dynamic random-access memory 22. Each streaming multiprocessor SM_1~SM_P is configured to process a plurality of threads, and each includes its own shared memory 25_1~25_P. In addition, each streaming multiprocessor SM_1~SM_P contains a plurality of thread processing cores, and thread processing cores belonging to the same streaming multiprocessor can communicate or exchange data through the shared memory. For example, the streaming multiprocessor SM_1 includes thread processing cores C1_1~C1_Q, which jointly access the shared memory 25_1. Similarly, the streaming multiprocessor SM_2 includes thread processing cores C2_1~C2_Q, which jointly access the shared memory 25_2.

In addition, although not shown in FIG. 1, the streaming multiprocessors SM_1~SM_P may further include other elements such as warp schedulers; the invention is not limited in this respect. The streaming multiprocessors SM_1~SM_P can share the cache memory 21, which can be used to transfer data between threads. The thread processing cores in the streaming multiprocessors SM_1~SM_P are configured to execute a large number of threads in parallel. In this exemplary embodiment, the general-purpose graphics processor 20 can use the single-instruction multiple-thread (SIMT) technique to process a large number of threads in parallel according to the same instruction.

The storage unit 30 is, for example, any type of fixed or removable random-access memory (RAM), read-only memory (ROM), flash memory, hard disk, another similar device, or a combination of such devices; the invention is not limited thereto. The storage unit 30 is coupled to the general-purpose graphics processor 20 and stores a plurality of instructions, which the general-purpose graphics processor 20 executes to provide the low-density parity-check decoding function.

In this exemplary embodiment, the general-purpose graphics processor 20 executes the instructions to perform the following steps. Based on the M edges in the Tanner graph associated with the parity-check matrix, each edge is mapped to one of a plurality of threads, so that each thread corresponds to one of a plurality of edge identifiers. When one of the threads is executed, data in the shared memory is accessed according to the edge identifier of that thread, so as to update a plurality of passed messages stored in the shared memory that correspond respectively to the edges.

The low-density parity-check decoding method of the invention is described in further detail below. FIG. 2A is a schematic diagram of a parity-check matrix 210, and FIG. 2B is a schematic diagram of the relationship between the bit nodes and the check nodes of the parity-check matrix 210. As shown in FIG. 2A, the 8 columns of the parity-check matrix 210 correspond respectively to bit nodes B0, B1, B2, B3, B4, B5, B6, B7, and the 4 rows correspond respectively to check nodes C0, C1, C2, C3. The decoding flow of a low-density parity check multiplies probability information with the parity-check matrix 210 to obtain the decoding result.
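The matrix computation underlying this decoding is the parity-check relation: a hard-decision word c passes every check exactly when H·cᵀ = 0 over GF(2). A minimal sketch follows; the particular 4×8 matrix H below is hypothetical, since the actual matrix of FIG. 2A is not reproduced in the text:

```python
# Hypothetical 4x8 parity-check matrix (rows = check nodes C0..C3,
# columns = bit nodes B0..B7); the actual matrix of FIG. 2A is not
# reproduced in the text.
H = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
]

def syndrome(H, c):
    """H * c^T over GF(2); an all-zero syndrome means c passes every check."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in H]

print(syndrome(H, [0] * 8))                   # -> [0, 0, 0, 0]
print(syndrome(H, [1, 0, 0, 0, 0, 0, 0, 0]))  # -> [1, 0, 1, 0]: C0 and C2 fail
```

A nonzero syndrome tells the decoder which checks are violated, which is what the iterative message passing described below works to resolve.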

Referring to FIG. 2B, the parity-check matrix 210 can generally be represented as a Tanner graph, which likewise includes bit nodes B0~B7 and check nodes C0~C3. As shown in FIG. 2B, only where an edge exists between a bit node B0~B7 and a check node C0~C3 (i.e., the bit node and the check node are connected) do the bit node and the check node take turns performing computations. For example, bit node B0 is connected to check node C0 via edge E1. After the connected bit nodes B0~B7 and check nodes C0~C3 shown in FIG. 2B each complete their computations, they temporarily store their respective results in the same memory unit or memory location.
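The edge set of the Tanner graph corresponds one-to-one to the nonzero entries of the parity-check matrix. A minimal sketch, using a hypothetical 4×8 matrix whose 12 ones match the 12 edges of FIG. 2B (the actual matrix of FIG. 2A is not reproduced in the text):

```python
# Hypothetical parity-check matrix: rows = check nodes C0..C3, columns =
# bit nodes B0..B7, with 12 nonzero entries matching the 12 edges of
# FIG. 2B (the actual matrix is not reproduced in the text).
H = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
]

# One edge (check_node, bit_node) per nonzero entry H[c][b].
edges = [(c, b) for c, row in enumerate(H)
         for b, h in enumerate(row) if h == 1]

print(len(edges))   # -> 12
print(edges[0])     # -> (0, 0), i.e. an edge such as E1 between C0 and B0
```

Enumerating edges this way also covers irregular codes, where the node degrees differ from row to row and column to column.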

In this exemplary embodiment, the general-purpose graphics processor 20 executes the low-density parity-check decoding procedure according to the connection state of the Tanner graph, where the procedure includes a horizontal decoding procedure and a vertical decoding procedure. Specifically, in the horizontal decoding procedure, the general-purpose graphics processor 20 computes the passed messages that the check nodes C0~C3 deliver to the bit nodes B0~B7. In the vertical decoding procedure, the general-purpose graphics processor 20 computes the passed messages that the bit nodes B0~B7 deliver to the check nodes C0~C3. These passed messages travel along the edges of the Tanner graph. For example, over the link of edge E1, check node C0 sends bit node B0 the passed message M12, while bit node B0 sends check node C0 the passed message M11.

From the above description of the low-density parity-check decoding flow, during decoding the general-purpose graphics processor 20 must use two arrays to store the first-direction passed messages and the second-direction passed messages, respectively, so that it can iterate on these passed messages according to the decoding algorithm. For example, the general-purpose graphics processor 20 may adopt the sum-product algorithm, the min-sum algorithm, or a bit-flipping algorithm; the invention does not restrict which algorithm is used.
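Of the algorithms named, the min-sum algorithm has the simplest check-node update: the outgoing message on each edge combines the signs and the minimum magnitude of the other incoming messages. A host-side sketch with hypothetical message values:

```python
def min_sum_check_update(incoming):
    """Min-sum update at one check node: the outgoing check-to-bit
    message on each edge is the product of the signs times the minimum
    magnitude over all *other* incoming bit-to-check messages."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

# Hypothetical bit-to-check log-likelihood ratios arriving at one check node:
print(min_sum_check_update([2.0, -1.5, 0.5]))   # -> [-0.5, 0.5, -1.5]
```

The sum-product algorithm would replace the sign/minimum combination with a hyperbolic-tangent product, but the per-edge data flow is the same, which is why the edge-to-thread mapping below is algorithm-agnostic.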

It should be noted that, in the exemplary embodiments of the invention, a first-direction passed message is a message delivered from a bit node to a check node, also called a bit-to-check message. A second-direction passed message is a message delivered from a check node to a bit node, also called a check-to-bit message. In addition, the array storing the first-direction passed messages is called the bit-to-check array, and the array storing the second-direction passed messages is called the check-to-bit array. Notably, in the exemplary embodiments of the invention, the computation of the passed messages of each edge corresponds to a distinct thread. Accordingly, when the general-purpose graphics processor 20 shown in FIG. 1 performs low-density parity-check decoding, its thread processing cores can separately compute the passed messages on distinct edges.

FIG. 3 is a flowchart of a low-density parity-check decoding method according to an embodiment of the invention. In this exemplary embodiment, the method is applicable to the decoding apparatus 100 shown in FIG. 1, although the invention is not limited thereto. Referring to FIG. 3, in step S301, based on the M edges in the Tanner graph associated with the parity-check matrix, each edge is mapped to one of a plurality of threads, so that each thread corresponds to one of a plurality of edge identifiers, where M is an integer greater than 1 and the edges connect a plurality of check nodes with a plurality of bit nodes. In other words, the computation of the passed messages of each edge is assigned to a different thread. Taking the Tanner graph of FIG. 2B as an example, there are 12 edges between check nodes C0~C3 and bit nodes B0~B7, so the computations of the passed messages on these 12 edges (including the first-direction and second-direction passed messages) are completed by distinct thread processing cores simultaneously executing different threads. For example, in the horizontal decoding procedure, the update computations of the second-direction passed messages correspond to 12 distinct threads, each with its own edge identifier.
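The edge-to-thread assignment of step S301 can be pictured as follows. A thread pool stands in for the GPU's thread processing cores, and the 12 (check_node, bit_node) pairs are a hypothetical edge list, since FIG. 2B's exact connections are not reproduced in the text:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list of the 12 Tanner-graph edges of FIG. 2B as
# (check_node, bit_node) pairs; the edge between C0 and B0 (such as E1)
# receives edge identifier 0.
edges = [(0, 0), (0, 1), (0, 3), (1, 2), (1, 4), (1, 6),
         (2, 0), (2, 5), (2, 7), (3, 1), (3, 3), (3, 6)]

def thread_body(edge_id):
    # Step S301 gives each thread one edge; the edge identifier is the
    # thread's only input and selects everything the thread touches.
    check_node, bit_node = edges[edge_id]
    return edge_id, check_node, bit_node

# A thread pool stands in for the concurrently executing GPU threads.
with ThreadPoolExecutor(max_workers=12) as pool:
    results = list(pool.map(thread_body, range(len(edges))))

print(len(results))   # -> 12
print(results[0])     # -> (0, 0, 0): thread 0 handles the C0-B0 edge
```

On the actual GPU the edge identifier would simply be the global thread index of a SIMT launch; no explicit dictionary or pool is needed.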

Then, in step S302, when one of the threads is executed, data in the shared memory is accessed according to the edge identifier of that thread, so as to update the passed messages stored in the shared memory that correspond respectively to the edges. Specifically, in the horizontal decoding procedure, a thread processing core can read, according to the edge identifier of its thread, the at least one first-direction passed message required to update one of the second-direction passed messages.

To describe the invention in detail, another embodiment is set forth below to explain how the appropriate passed messages are accessed according to the edge identifier of a thread for the update computation. FIG. 4 is a flowchart of a low-density parity-check decoding method according to an embodiment of the invention. In this exemplary embodiment, the method is applicable to the decoding apparatus 100 shown in FIG. 1, although the invention is not limited thereto.

Referring to FIG. 4, in step S401, based on the M edges in the Tanner graph associated with the parity-check matrix, each edge is mapped to one of a plurality of threads, so that each thread corresponds to one of a plurality of edge identifiers. Next, in step S402, at least one target value index is read from the M value indices according to the edge identifier of the thread, and at least one first target passed message is read from the M first-direction passed messages stored in the shared memory according to the target value index.

In this exemplary embodiment, the value index array stored in the shared memory holds value indices corresponding respectively to the edges, and the bit-to-check message array stored in the shared memory holds first-direction passed messages corresponding respectively to the edges. A value index indicates which array storage position of the bit-to-check array the thread processing core should read. In particular, the storage positions of the value indices in the value index array depend on the connection state of the Tanner graph, and first-direction passed messages in the bit-to-check message array that correspond to the same check node are arranged adjacently. Based on this arrangement of the value index array and of the storage positions of the value indices, each thread can read from the bit-to-check array the passed messages required to compute a specific edge.
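One realization consistent with this description (an assumption, since the text does not fix the exact encoding) is a per-check-node circular chain: edges of the same check node occupy adjacent slots, and each value index stores the position of the next slot of the same check node:

```python
# Assumed encoding (the text does not fix one): edges of the same check
# node occupy adjacent slots, and value_index[i] stores the position of
# the next slot of the same check node, wrapping within the group.
# Hypothetical grouping of 12 edge slots by check node C0..C3:
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

M = 12
value_index = [0] * M
for group in groups:
    for pos, slot in enumerate(group):
        value_index[slot] = group[(pos + 1) % len(group)]

# Starting from its own slot i, a thread can now follow value_index
# through every other edge of the same check node and stop once the
# value it reads equals i again.
print(value_index)   # -> [1, 2, 0, 4, 5, 3, 7, 8, 6, 10, 11, 9]
```

Because the chains are built from the groups themselves, the same construction works unchanged for irregular codes where the groups have different lengths.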

In more detail, according to the edge identifier of the thread, the thread processing core reads target value indices starting from the i-th value index in the value index array, where i equals the edge identifier of the thread (the edge identifier is an integer). For example, if the edge identifier of the thread is '1', the thread processing core starts reading at least one target value index from the 1st value index in the value index array. Then, according to the i-th value index, the thread processing core starts reading first target passed messages from the j-th first-direction passed message in the bit-to-check message array, where j equals the i-th value index.

While cyclically reading the value index array in sequence, the thread processing core continues reading first target passed messages from the bit-to-check message array until it reads a value index that satisfies a preset condition, whereupon it stops reading first-direction passed messages from the bit-to-check message array. The value index satisfying the preset condition equals the edge identifier of the thread. For example, when the thread processing core executing the thread whose edge identifier is '1' reads the value index '1', it stops reading the value index array and simultaneously stops reading first-direction passed messages. In this way, when one of the threads is executed, each thread computing the second-direction passed message of an edge reads out the correct first-direction passed messages.
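This read loop can be sketched as follows. Here value_index[i] is assumed to store the position of the next message belonging to the same check node (an assumed encoding consistent with the description), and both arrays hold hypothetical values:

```python
# Hypothetical arrays: value_index[i] is assumed to point to the next
# slot of the same check node; bit_to_check holds one first-direction
# passed message per edge, grouped by check node.
value_index = [1, 2, 0, 4, 5, 3, 7, 8, 6, 10, 11, 9]
bit_to_check = [0.8, -1.2, 0.3, 2.1, -0.4, 1.0,
                -2.2, 0.6, -0.9, 1.4, -0.7, 0.2]

def gather_first_direction(edge_id, value_index, bit_to_check):
    """Read value indices starting from position edge_id and collect the
    corresponding bit-to-check messages; stop once the index read back
    equals edge_id (the preset condition)."""
    msgs = []
    j = value_index[edge_id]
    while j != edge_id:
        msgs.append(bit_to_check[j])
        j = value_index[j]
    return msgs

# The thread for edge 0 gathers the messages of edges 1 and 2, the other
# two edges of the same check node:
print(gather_first_direction(0, value_index, bit_to_check))  # -> [-1.2, 0.3]
```

Note that the loop naturally excludes the thread's own edge, which is exactly what the check-node update rules require.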

Returning to the flow of FIG. 4, in step S403, the target position index is read from the M position indices according to the edge identifier of the thread, and the second target passed message among the M second-direction passed messages is updated using the target position index and the first target passed message. The target position index indicates the array storage position of the second target passed message.

於本範例實施例中，位置索引陣列存放有分別對應至這些邊的位置索引值，且檢查點至位元點消息陣列存放有分別對應至這些邊的第二方向傳遞消息。另外，位置索引陣列中的位置索引值的陣列存放位置依據坦那圖的連結狀態而定，檢查點至位元點消息陣列中對應至相同的位元節點的第二方向傳遞消息相鄰排列。基於上述之位置索引陣列的設置與位置索引值之陣列存放位置的安排，各個執行緒可將計算出來的更新後傳遞消息寫入至正確的陣列存放位置，以完成第二方向傳遞消息的更新。In the present exemplary embodiment, the position index array stores position index values respectively corresponding to the edges, and the check-node-to-bit-node message array stores second-direction delivery messages respectively corresponding to the edges. In addition, the array storage locations of the position index values in the position index array depend on the connection state of the Tanner graph, and the second-direction delivery messages corresponding to the same bit node are arranged adjacently in the check-node-to-bit-node message array. Based on this configuration of the position index array and the arrangement of the storage locations of its position index values, each thread can write its computed updated delivery message to the correct array storage location, thereby completing the update of the second-direction delivery messages.

更詳細來說，依據執行緒其中之一的邊識別碼，執行緒處理核心可讀取位置索引陣列中的第i個位置索引值以作為目標位置索引值，其中i等於執行緒其中之一的邊識別碼。舉例而言，假設執行緒的邊識別碼為‘1’，則執行緒處理核心可讀取位置索引陣列中的第1個位置索引值以作為目標位置索引值。其中所述邊識別碼為整數。之後，執行緒處理核心依據第一目標傳遞消息計算出更新消息，並利用更新消息取代檢查點至位元點消息陣列中目標位置索引值所指的第k個第二方向傳遞消息，以更新第二目標傳遞消息，其中k等於目標位置索引值。In more detail, according to the edge identifier of one of the threads, the thread processing core reads the i-th position index value in the position index array as the target position index value, where i equals that thread's edge identifier. For example, if the thread's edge identifier is '1', the thread processing core reads the 1st position index value in the position index array as the target position index value. The edge identifier is an integer. The thread processing core then computes an update message from the first target delivery message, and replaces, with the update message, the k-th second-direction delivery message in the check-node-to-bit-node message array pointed to by the target position index value, thereby updating the second target delivery message, where k equals the target position index value.
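A minimal sketch of this write step. The min-sum style check-node update and the array contents are illustrative assumptions — the patent text does not fix a particular combining formula:

```python
import math

def update_second_direction(edge_id, a3, a4, first_msgs):
    """Compute the check-to-bit update for `edge_id` and store it at a4[k]."""
    sign = 1.0
    for m in first_msgs:                 # product of signs of gathered inputs
        sign *= math.copysign(1.0, m)
    magnitude = min(abs(m) for m in first_msgs)
    k = a3[edge_id]                      # target position index value
    a4[k] = sign * magnitude             # replace the k-th second-direction message
    return a4[k]

a3 = [0, 2, 3, 1, 4]                     # assumed layout: one bit node's slots adjacent
a4 = [0.0] * 5                           # check-to-bit message array
update_second_direction(0, a3, a4, [-1.4, 0.3])   # writes a4[a3[0]] = a4[0]
```

Because each edge owns exactly one slot of the check-to-bit array, concurrent threads never write the same location.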

值得一提的是，低密度奇偶檢查碼一般用奇偶校驗矩陣進行描述，奇偶校驗矩陣中每一行中‘1’的個數稱為該行的行重，每一列中‘1’的個數稱為該列的列重。相應奇偶校驗矩陣的行重和列重都相同的低密度奇偶檢查碼稱為規則的(regular)低密度奇偶檢查碼(奇偶校驗矩陣為規則的)，否則稱為非規則的(irregular)低密度奇偶檢查碼(奇偶校驗矩陣為不規則的)。由於本發明之低密度奇偶檢查解碼是以邊為基礎進行平行化處理，因此本發明之低密度奇偶檢查解碼方法適用於規則的與不規則的低密度奇偶檢查碼，並不會降低不規則的低密度奇偶檢查解碼的解碼效能。It is worth mentioning that a low-density parity check code is generally described by a parity check matrix. The number of '1's in each row of the parity check matrix is called the row weight of that row, and the number of '1's in each column is called the column weight of that column. A low-density parity check code whose parity check matrix has identical row weights and identical column weights is called a regular low-density parity check code (its parity check matrix is regular); otherwise it is called an irregular low-density parity check code (its parity check matrix is irregular). Since the low-density parity check decoding of the present invention is parallelized on an edge basis, the decoding method of the present invention applies to both regular and irregular low-density parity check codes, without degrading the decoding performance for irregular codes.
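The row-weight and column-weight definitions above can be checked mechanically. The small 0/1 matrix here is an assumed example, not the patent's FIG. 2A:

```python
# Assumed 2x4 parity-check matrix for illustration only.
H = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

row_weights = [sum(row) for row in H]        # number of 1s per row
col_weights = [sum(col) for col in zip(*H)]  # number of 1s per column

# Regular LDPC: all row weights identical and all column weights identical.
is_regular = len(set(row_weights)) == 1 and len(set(col_weights)) == 1
print(row_weights, col_weights, is_regular)  # here the code is irregular
```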

相較於以節點為基礎的設計架構，本發明之以邊為基礎的架構係讓單一執行緒負責坦那圖上單一個邊之傳遞消息的更新運算。因此，由於坦那圖上的邊的數量通常大於節點數量，因此本發明之低密度奇偶檢查解碼方法的運算平行度可提升，並從而提升解碼效能。基於前述的說明可知，本發明之以邊為基礎的處理流程需配置4個陣列，分別是取值索引陣列、位元點至檢查點陣列、位置索引陣列，以及檢查點至位元點陣列。取值索引陣列用以控制執行緒從位元點至檢查點陣列中的多個第一方向傳遞消息存取正確的第一目標傳遞消息。各執行緒將從位元點至檢查點陣列持續讀取出至少一第一目標傳遞消息，直至讀取到等於自己的邊識別碼的取值索引值。Compared with a node-based design architecture, the edge-based architecture of the present invention makes a single thread responsible for the update operations of the delivery messages on a single edge of the Tanner graph. Since the number of edges in a Tanner graph is usually larger than the number of nodes, the operational parallelism of the low-density parity check decoding method of the present invention is increased, which in turn improves decoding performance. Based on the foregoing description, the edge-based processing flow of the present invention uses four arrays: the value index array, the bit-node-to-check-node array, the position index array, and the check-node-to-bit-node array. The value index array controls how a thread accesses the correct first target delivery messages among the first-direction delivery messages in the bit-node-to-check-node array. Each thread keeps reading at least one first target delivery message from the bit-node-to-check-node array until it reads a value index value equal to its own edge identifier.
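Under the assumptions used in these sketches (a cyclic-successor reading of the value index array and bit-node-adjacent ordering of the check-to-bit slots), the four arrays named above could be built from a Tanner graph as follows. The helper name, edge numbering, and graph encoding are illustrative, not taken from the patent:

```python
def build_arrays(check_edges, bit_edges, num_edges):
    """check_edges / bit_edges: lists of edge-ID lists per check / bit node."""
    a1 = [0] * num_edges    # value index array: cyclic successor per check node
    for edges in check_edges:
        for pos, e in enumerate(edges):
            a1[e] = edges[(pos + 1) % len(edges)]
    a3 = [0] * num_edges    # position index array: slot in the check-to-bit
    slot = 0                # array; messages of one bit node kept adjacent
    for edges in bit_edges:
        for e in edges:
            a3[e] = slot
            slot += 1
    a2 = [0.0] * num_edges  # bit-to-check messages, one per edge
    a4 = [0.0] * num_edges  # check-to-bit messages, one per edge
    return a1, a2, a3, a4

# Assumed toy graph: C0={0,1,2}, C1={3,4}; B0={0,3}, B1={1}, B2={2}, B3={4}.
a1, a2, a3, a4 = build_arrays(
    check_edges=[[0, 1, 2], [3, 4]],
    bit_edges=[[0, 3], [1], [2], [4]],
    num_edges=5)
print(a1, a3)
```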

圖5A與圖5B為依據本發明一實施例所繪示的資料結構與執行緒的資料存取流程的範例示意圖。須先說明的是，圖5A與圖5B係以圖2B之坦那圖為例進行說明，但本發明並不限制於此。請先參照圖5A，執行緒t0用以負責邊E1上之傳遞消息的更新與計算，且執行緒t0的邊識別碼為‘0’。在步驟1中，根據邊識別碼‘0’，執行緒t0從取值索引陣列a1中讀取出陣列存放位置對應至邊識別碼‘0’的取值索引值‘1’。在步驟2中，依據取值索引值‘1’，執行緒t0從位元點至檢查點陣列a2中讀取出陣列存放位置對應至取值索引值‘1’的第一方向傳遞消息L B1→C0。FIG. 5A and FIG. 5B are schematic diagrams illustrating the data structures and the data access flows of the threads according to an embodiment of the invention. It should first be noted that FIG. 5A and FIG. 5B take the Tanner graph of FIG. 2B as an example, but the invention is not limited thereto. Referring first to FIG. 5A, thread t0 is responsible for updating and computing the delivery messages on edge E1, and the edge identifier of thread t0 is '0'. In step 1, according to the edge identifier '0', thread t0 reads from the value index array a1 the value index value '1' whose array storage location corresponds to the edge identifier '0'. In step 2, according to the value index value '1', thread t0 reads from the bit-node-to-check-node array a2 the first-direction delivery message L B1→C0 whose array storage location corresponds to the value index value '1'.

之後，在步驟3中，執行緒t0繼續從取值索引陣列a1中讀取出位於取值索引值‘1’之後的取值索引值‘2’。在步驟4中，依據取值索引值‘2’，執行緒t0從位元點至檢查點陣列a2中讀取出陣列存放位置對應至取值索引值‘2’的第一方向傳遞消息L B2→C0。在步驟5中，執行緒t0繼續依序讀取取值索引陣列a1，直至讀取到取值索引值‘0’。在步驟6中，執行緒t0利用第一方向傳遞消息L B1→C0以及第一方向傳遞消息L B2→C0計算出更新消息，並依據自己的邊識別碼‘0’來讀取位置索引陣列a3中的目標位置索引值‘0’。在步驟7中，執行緒t0可利用更新消息來寫入至陣列存放位置對應至目標位置索引值‘0’的第二方向傳遞消息，從而獲取更新後的第二方向傳遞消息L C0→B0。Thereafter, in step 3, thread t0 continues to read from the value index array a1 the value index value '2' located after the value index value '1'. In step 4, according to the value index value '2', thread t0 reads from the bit-node-to-check-node array a2 the first-direction delivery message L B2→C0 whose array storage location corresponds to the value index value '2'. In step 5, thread t0 continues reading the value index array a1 in sequence until it reads the value index value '0'. In step 6, thread t0 computes an update message from the first-direction delivery messages L B1→C0 and L B2→C0, and reads the target position index value '0' in the position index array a3 according to its own edge identifier '0'. In step 7, thread t0 writes the update message to the second-direction delivery message whose array storage location corresponds to the target position index value '0', thereby obtaining the updated second-direction delivery message L C0→B0.
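The seven steps traced above for thread t0 can be condensed into one per-edge routine. This is a hedged sketch: the toy arrays, the cyclic-walk reading of a1, and the min-sum combining rule in step 6 are assumptions for illustration.

```python
import math

def edge_thread(edge_id, a1, a2, a3, a4):
    """Steps 1-7 for a single edge-owning thread."""
    msgs = []
    j = a1[edge_id]                 # steps 1-2: first value index and message
    while j != edge_id:             # steps 3-5: cycle until own edge ID read
        msgs.append(a2[j])
        j = a1[j]
    sign = 1.0                      # step 6: combine gathered messages
    for m in msgs:                  # (min-sum rule assumed for illustration)
        sign *= math.copysign(1.0, m)
    update = sign * min(abs(m) for m in msgs)
    a4[a3[edge_id]] = update        # step 7: write to the target position
    return update

a1 = [1, 2, 0, 4, 3]                # assumed toy graph: C0={0,1,2}, C1={3,4}
a2 = [0.9, -1.4, 0.3, 2.0, -0.7]    # assumed bit-to-check values
a3 = [0, 2, 3, 1, 4]
a4 = [0.0] * 5
edge_thread(0, a1, a2, a3, a4)      # t0's walk: gathers a2[1] and a2[2]
```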

相似的，圖5B繪示了執行緒t1計算第二方向傳遞消息L C0→B1的流程。請參照圖2B與圖5B，執行緒t1用以負責邊E2上之傳遞消息的更新與計算，且執行緒t1的邊識別碼為‘1’。執行緒t1可依據步驟1至步驟7中的運算與資料存取流程而獲取更新後的第二方向傳遞消息L C0→B4。本領域技術人員應可參照前文與圖5A的說明而自行推演執行緒t1的運算流程，於此不再贅述。Similarly, FIG. 5B illustrates the flow in which thread t1 computes a second-direction delivery message L C0→B1. Referring to FIG. 2B and FIG. 5B, thread t1 is responsible for updating and computing the delivery messages on edge E2, and the edge identifier of thread t1 is '1'. Following the operations and data access flow of steps 1 to 7, thread t1 obtains the updated second-direction delivery message L C0→B4. Those skilled in the art may derive the operation flow of thread t1 by referring to the foregoing description and FIG. 5A, so it is not repeated here.

圖6為依照本發明一實施例所繪示的資料結構與執行緒的資料存取流程的範例示意圖。須先說明的是，圖6係以圖2B之坦那圖為例進行說明，但本發明並不限制於此。請參照圖6，執行緒t0~t9分別用以負責圖2B之坦那圖中各個邊上之傳遞消息的更新與計算，且執行緒t0~t9的邊識別碼分別為‘0’~‘9’。依據圖2B之坦那圖中結點的連結狀態，取值索引值的安排如圖6所示的存放於取值索引陣列a1，而位置索引值的安排如圖6所示的存放於位置索引陣列a3。圖6繪示了所有執行緒t0~t9的資料存取流程。本領域技術人員應可參照前文與圖5A的說明而自行推演執行緒t0~t9的運算流程，於此不再贅述。FIG. 6 is a schematic diagram illustrating the data structures and the data access flows of the threads according to an embodiment of the invention. It should be noted that FIG. 6 takes the Tanner graph of FIG. 2B as an example, but the invention is not limited thereto. Referring to FIG. 6, threads t0~t9 are respectively responsible for updating and computing the delivery messages on the edges of the Tanner graph of FIG. 2B, and the edge identifiers of threads t0~t9 are '0'~'9', respectively. According to the connection state of the nodes in the Tanner graph of FIG. 2B, the value index values are arranged and stored in the value index array a1 as shown in FIG. 6, and the position index values are arranged and stored in the position index array a3 as shown in FIG. 6. FIG. 6 depicts the data access flows of all threads t0~t9. Those skilled in the art may derive the operation flows of threads t0~t9 by referring to the foregoing description and FIG. 5A, so they are not repeated here.
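The idea of FIG. 6 can be emulated sequentially, with one loop iteration standing in for one GPU thread. Because the update phase only reads the bit-to-check array and only writes the check-to-bit array, the per-edge work is independent and parallelizable. The graph, arrays, and min-sum rule are again illustrative assumptions:

```python
import math

def run_all_edges(a1, a2, a3):
    """Run every edge's 'thread' once; return the check-to-bit array a4."""
    a4 = [0.0] * len(a1)
    for edge_id in range(len(a1)):      # one GPU thread per edge, emulated
        msgs, j = [], a1[edge_id]
        while j != edge_id:             # gather the group's other messages
            msgs.append(a2[j])
            j = a1[j]
        sign = math.prod(math.copysign(1.0, m) for m in msgs)
        a4[a3[edge_id]] = sign * min(abs(m) for m in msgs)
    return a4

a1 = [1, 2, 0, 4, 3]                    # assumed: C0 owns {0,1,2}, C1 owns {3,4}
a2 = [0.9, -1.4, 0.3, 2.0, -0.7]        # assumed bit-to-check values
a3 = [0, 2, 3, 1, 4]                    # assumed: B0:{0,3}, B1:{1}, B2:{2}, B3:{4}
print(run_all_edges(a1, a2, a3))
```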

值得一提的是，基於圖6所示的資料存取流程可知，對平行化處理多個執行緒而言，關於取值索引陣列、位元點至檢查點陣列，以及位置索引陣列的記憶體存取是非常聚合的(coalesced)，此記憶體存取聚合現象可大幅凸顯出利用通量圖形處理器來執行本發明之低密度奇偶檢查解碼方法的優勢。尤其是，於一範例實施例中，取值索引陣列與位元點至檢查點陣列非常適於存放在串流多處理器中的L1快取記憶體。也就是說，本發明可提升將低密度奇偶檢查解碼執行於通量圖形處理器的解碼效能。It is worth mentioning that, as the data access flow shown in FIG. 6 makes clear, when multiple threads are processed in parallel, the memory accesses to the value index array, the bit-node-to-check-node array, and the position index array are highly coalesced. This memory access coalescing greatly highlights the advantage of executing the low-density parity check decoding method of the present invention on a general-purpose graphics processor. In particular, in an exemplary embodiment, the value index array and the bit-node-to-check-node array are well suited to being stored in the L1 cache of a streaming multiprocessor. In other words, the present invention improves the decoding performance of low-density parity check decoding executed on a general-purpose graphics processor.

綜上所述，本發明藉由將坦那圖上的每一邊分別對應至多個執行緒其中之一，致使通量圖形處理器可平行化地處理低密度奇偶檢查碼解碼流程中傳遞消息的更新運算。相較於習知將資料節點(包括位元節點與檢查節點)指派至不同的執行緒來分別進行迭代運算的解碼方式，本發明可獲取更大的運算平行度。此外，本發明之基於坦那圖上的邊的資料處理方式可同時支援規則與不規則的低密度奇偶檢查碼解碼。再者，基於本發明對於取值索引陣列與位置索引陣列的設置，可在不重新排列傳遞消息的前提下達到記憶體存取聚合(memory accessing coalescing)以及資料大量重複存取的特性，以進一步縮短通量圖形處理器執行低密度奇偶檢查解碼時的資料讀取時間。In summary, by mapping each edge of the Tanner graph to one of a plurality of threads, the present invention enables a general-purpose graphics processor to process the update operations of the delivery messages in the low-density parity check decoding flow in parallel. Compared with the conventional decoding approach in which data nodes (including bit nodes and check nodes) are assigned to different threads for iterative operations, the present invention achieves greater operational parallelism. In addition, the edge-based data processing of the present invention supports decoding of both regular and irregular low-density parity check codes. Furthermore, with the arrangement of the value index array and the position index array of the present invention, memory access coalescing and heavily repeated data access can be achieved without rearranging the delivery messages, further shortening the data read time when a general-purpose graphics processor performs low-density parity check decoding.

雖然本發明已以實施例揭露如上,然其並非用以限定本發明,任何所屬技術領域中具有通常知識者,在不脫離本發明的精神和範圍內,當可作些許的更動與潤飾,故本發明的保護範圍當視後附的申請專利範圍所界定者為準。Although the present invention has been disclosed in the above embodiments, it is not intended to limit the present invention, and any one of ordinary skill in the art can make some changes and refinements without departing from the spirit and scope of the present invention. The scope of the invention is defined by the scope of the appended claims.

100‧‧‧解碼裝置
20‧‧‧通量圖形處理器
30‧‧‧儲存單元
21‧‧‧快取記憶體
22‧‧‧動態隨機存取記憶體
SM_1~SM_P‧‧‧串流多處理器
25_1~25_P‧‧‧共享記憶體
C1_1~C1_Q、C2_1~C2_Q‧‧‧執行緒處理核心
210‧‧‧奇偶校驗矩陣
B0~B7‧‧‧位元節點
C0~C3‧‧‧檢查節點
E1、E2‧‧‧邊
M11、M12‧‧‧傳遞消息
S301~S302、S401~S403‧‧‧步驟
t0~t9‧‧‧執行緒
a1‧‧‧取值索引陣列
a2‧‧‧位元點至檢查點陣列
a3‧‧‧位置索引陣列
a4‧‧‧檢查點至位元點陣列
100‧‧‧ decoding device
20‧‧‧ flux graphics processor
30‧‧‧ storage unit
21‧‧‧ Cache memory
22‧‧‧ Dynamic Random Access Memory
SM_1~SM_P‧‧‧Streaming Multiprocessor
25_1~25_P‧‧‧Shared memory
C1_1~C1_Q, C2_1~C2_Q‧‧‧Thread Processing Core
210‧‧‧Parity check matrix
B0~B7‧‧‧ bit node
C0~C3‧‧‧Check node
E1, E2‧‧‧
M11, M12‧‧‧ passing the message
S301~S302, S401~S403‧‧‧ steps
t0~t9‧‧‧threads
a1‧‧‧value index array
a2‧‧‧bit-node-to-check-node array
a3‧‧‧position index array
a4‧‧‧check-node-to-bit-node array

圖1為依據本發明一實施例所繪示的解碼裝置的示意圖。 圖2A為一種奇偶校驗矩陣的示意圖。 圖2B為奇偶校驗矩陣中位元節點與檢查節點之間關係的示意圖。 圖3為依照本發明一實施例所繪示的低密度奇偶檢查解碼方法的流程圖。 圖4為依照本發明一實施例所繪示的低密度奇偶檢查解碼方法的流程圖。 圖5A為依據本發明一實施例所繪示的資料結構與執行緒的資料存取流程的範例示意圖。 圖5B為依據本發明一實施例所繪示的資料結構與執行緒的資料存取流程的範例示意圖。 圖6為依照本發明一實施例所繪示的資料結構與執行緒的資料存取流程的範例示意圖。FIG. 1 is a schematic diagram of a decoding apparatus according to an embodiment of the invention. 2A is a schematic diagram of a parity check matrix. 2B is a schematic diagram of the relationship between a bit node and a check node in a parity check matrix. FIG. 3 is a flowchart of a low density parity check decoding method according to an embodiment of the invention. FIG. 4 is a flowchart of a low density parity check decoding method according to an embodiment of the invention. FIG. 5A is a schematic diagram showing an example of a data structure of a data structure and a thread according to an embodiment of the invention. FIG. 5B is a schematic diagram showing an example of a data structure of a data structure and a thread according to an embodiment of the invention. FIG. 6 is a schematic diagram showing an example of a data structure of a data structure and a thread according to an embodiment of the invention.

S301~S302‧‧‧步驟 S301~S302‧‧‧Steps

Claims (10)

一種執行於通量圖形處理器的低密度奇偶檢查(LDPC)解碼方法，所述通量圖形處理器的一串流多處理器包括多個執行緒處理核心與一共享記憶體，所述方法包括：基於相關於一奇偶校驗矩陣的一坦那圖(Tanner graph)中的M個邊(edge)，將每一該些邊對應至多個執行緒其中之一，致使每一該些執行緒對應至多個邊識別碼其中之一，其中M為大於1的整數且該些邊連接於多個檢查節點與多個位元節點之間；以及當執行該些執行緒其中之一，依據該些執行緒其中之一的邊識別碼存取一共享記憶體中的資料，以更新儲存於該共享記憶體中分別對應至該些邊的多個傳遞消息。A low-density parity check (LDPC) decoding method executed on a general-purpose graphics processor, a streaming multiprocessor of the general-purpose graphics processor comprising a plurality of thread processing cores and a shared memory, the method comprising: based on M edges in a Tanner graph related to a parity check matrix, mapping each of the edges to one of a plurality of threads, such that each of the threads corresponds to one of a plurality of edge identifiers, wherein M is an integer greater than 1 and the edges are connected between a plurality of check nodes and a plurality of bit nodes; and when one of the threads is executed, accessing data in the shared memory according to the edge identifier of the one of the threads, so as to update a plurality of delivery messages stored in the shared memory and respectively corresponding to the edges.
如申請專利範圍第1項所述的低密度奇偶檢查解碼方法，其中依據該些執行緒其中之一的該邊識別碼存取該共享記憶體中的資料，以更新儲存於該共享記憶體中分別對應至該些邊的該些傳遞消息的步驟包括：依據該些執行緒其中之一的該邊識別碼從M個取值索引值中讀取出至少一目標取值索引值，並依據該至少一目標取值索引值從儲存於該共享記憶體中的M個第一方向傳遞消息中讀取出至少一第一目標傳遞消息。The low-density parity check decoding method according to claim 1, wherein the step of accessing the data in the shared memory according to the edge identifier of the one of the threads, so as to update the delivery messages stored in the shared memory and respectively corresponding to the edges, comprises: reading at least one target value index value from M value index values according to the edge identifier of the one of the threads, and reading at least one first target delivery message from M first-direction delivery messages stored in the shared memory according to the at least one target value index value.
如申請專利範圍第2項所述的低密度奇偶檢查解碼方法，其中儲存於該共享記憶體的一取值索引陣列存放有分別對應至該些邊的該些取值索引值，且儲存於該共享記憶體的一位元點至檢查點消息陣列存放有分別對應至該些邊的該些第一方向傳遞消息。The low-density parity check decoding method according to claim 2, wherein a value index array stored in the shared memory stores the value index values respectively corresponding to the edges, and a bit-node-to-check-node message array stored in the shared memory stores the first-direction delivery messages respectively corresponding to the edges.
如申請專利範圍第3項所述的低密度奇偶檢查解碼方法，其中該取值索引陣列中的該些取值索引值的陣列存放位置依據該坦那圖的連結狀態而定，該位元點至檢查點消息陣列中對應至相同的檢查節點的第一方向傳遞消息相鄰排列。The low-density parity check decoding method according to claim 3, wherein the array storage locations of the value index values in the value index array depend on the connection state of the Tanner graph, and the first-direction delivery messages corresponding to the same check node are arranged adjacently in the bit-node-to-check-node message array.
如申請專利範圍第3項所述的低密度奇偶檢查解碼方法，其中依據該些執行緒其中之一的該邊識別碼從該些取值索引值中讀取出該至少一目標取值索引值，並依據該至少一目標取值索引值從儲存於該共享記憶體中的該些第一方向傳遞消息中讀取出該至少一第一目標傳遞消息的步驟包括：依據該些執行緒其中之一的該邊識別碼，從該取值索引陣列中的第i個取值索引值開始讀取出該至少一目標取值索引，其中i等於該些執行緒其中之一的該邊識別碼；依據第i個取值索引值，從該位元點至檢查點消息陣列中的第j個第一方向傳遞消息開始讀取出該至少一第一目標傳遞消息，其中j等於第i個取值索引值；以及響應於依序循環讀取該取值索引陣列而持續從該位元點至檢查點消息陣列讀取出該至少一第一目標傳遞消息，直至讀取到符合一預設條件的該些取值索引值其中之一，以停止讀取該位元點至檢查點消息陣列中的該些第一方向傳遞消息，其中，該些取值索引值其中之一等於該些執行緒其中之一的該邊識別碼。The low-density parity check decoding method according to claim 3, wherein the step of reading the at least one target value index value from the value index values according to the edge identifier of the one of the threads, and reading the at least one first target delivery message from the first-direction delivery messages stored in the shared memory according to the at least one target value index value, comprises: according to the edge identifier of the one of the threads, reading the at least one target value index starting from the i-th value index value in the value index array, wherein i equals the edge identifier of the one of the threads; according to the i-th value index value, reading the at least one first target delivery message starting from the j-th first-direction delivery message in the bit-node-to-check-node message array, wherein j equals the i-th value index value; and in response to sequentially and cyclically reading the value index array, continuing to read the at least one first target delivery message from the bit-node-to-check-node message array until one of the value index values meeting a preset condition is read, so as to stop reading the first-direction delivery messages in the bit-node-to-check-node message array, wherein the one of the value index values equals the edge identifier of the one of the threads.
如申請專利範圍第2項所述的低密度奇偶檢查解碼方法，其中依據該些執行緒其中之一的該邊識別碼存取該共享記憶體中的資料，以更新儲存於該共享記憶體中分別對應至該些邊的該些傳遞消息的步驟更包括：依據該些執行緒其中之一的識別碼從M個位置索引值中讀取一目標位置索引值，並利用該目標位置索引值與該至少一第一目標傳遞消息更新M個第二方向傳遞消息中的第二目標傳遞消息，其中該目標位置索引值指示出第二目標傳遞消息的陣列存放位置。The low-density parity check decoding method according to claim 2, wherein the step of accessing the data in the shared memory according to the edge identifier of the one of the threads, so as to update the delivery messages stored in the shared memory and respectively corresponding to the edges, further comprises: reading a target position index value from M position index values according to the identifier of the one of the threads, and updating a second target delivery message among M second-direction delivery messages by using the target position index value and the at least one first target delivery message, wherein the target position index value indicates an array storage location of the second target delivery message.
如申請專利範圍第6項所述的低密度奇偶檢查解碼方法，其中一位置索引陣列存放有分別對應至該些邊的該些位置索引值，且一檢查點至位元點消息陣列存放有分別對應至該些邊的該些第二方向傳遞消息。The low-density parity check decoding method according to claim 6, wherein a position index array stores the position index values respectively corresponding to the edges, and a check-node-to-bit-node message array stores the second-direction delivery messages respectively corresponding to the edges.
如申請專利範圍第7項所述的低密度奇偶檢查解碼方法，其中該位置索引陣列中的該些位置索引值的陣列存放位置依據該坦那圖的連結狀態而定，該檢查點至位元點消息陣列中對應至相同的位元節點的第二方向傳遞消息相鄰排列。The low-density parity check decoding method according to claim 7, wherein the array storage locations of the position index values in the position index array depend on the connection state of the Tanner graph, and the second-direction delivery messages corresponding to the same bit node are arranged adjacently in the check-node-to-bit-node message array.
如申請專利範圍第7項所述的低密度奇偶檢查解碼方法，其中依據該些執行緒其中之一的識別碼從該些位置索引值中讀取該目標位置索引值的步驟包括：依據該些執行緒其中之一的該邊識別碼，讀取該位置索引陣列中的第i個位置索引值以作為該目標位置索引值，其中i等於該些執行緒其中之一的該邊識別碼。The low-density parity check decoding method according to claim 7, wherein the step of reading the target position index value from the position index values according to the identifier of the one of the threads comprises: according to the edge identifier of the one of the threads, reading the i-th position index value in the position index array as the target position index value, wherein i equals the edge identifier of the one of the threads.
如申請專利範圍第7項所述的低密度奇偶檢查解碼方法，其中利用該目標位置索引值與該至少一第一目標傳遞消息更新該些第二方向傳遞消息中的該第二目標傳遞消息的步驟包括：依據該至少一第一目標傳遞消息計算出一更新消息，並利用該更新消息取代該檢查點至位元點消息陣列中該目標位置索引值所指的第k個第二方向傳遞消息，以更新該第二目標傳遞消息，其中k等於該目標位置索引值。The low-density parity check decoding method according to claim 7, wherein the step of updating the second target delivery message among the second-direction delivery messages by using the target position index value and the at least one first target delivery message comprises: computing an update message according to the at least one first target delivery message, and replacing, with the update message, the k-th second-direction delivery message in the check-node-to-bit-node message array pointed to by the target position index value, so as to update the second target delivery message, wherein k equals the target position index value.
TW104122282A 2015-07-09 2015-07-09 Low density parity check decoding method performing on general graphic processing unit TWI552533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW104122282A TWI552533B (en) 2015-07-09 2015-07-09 Low density parity check decoding method performing on general graphic processing unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW104122282A TWI552533B (en) 2015-07-09 2015-07-09 Low density parity check decoding method performing on general graphic processing unit

Publications (2)

Publication Number Publication Date
TWI552533B true TWI552533B (en) 2016-10-01
TW201703441A TW201703441A (en) 2017-01-16

Family

ID=57848156

Family Applications (1)

Application Number Title Priority Date Filing Date
TW104122282A TWI552533B (en) 2015-07-09 2015-07-09 Low density parity check decoding method performing on general graphic processing unit

Country Status (1)

Country Link
TW (1) TWI552533B (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
G. F. P. Fernandes, V. M. M. d. Silva, M. A. C. Gomes and L. A. P. S. d. Sousa, "Edge Stream Oriented LDPC Decoding," 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), Toulouse, 2008, pp. 237-244. *
G. Falcao, J. Andrade, V. Silva and L. Sousa, "Real-time DVB-S2 LDPC decoding on many-core GPU accelerators," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011, pp. 1685-1688 *
S. Wang, S. Cheng and Q. Wu, "A parallel decoding algorithm of LDPC codes using CUDA," 2008 42nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2008, pp. 171-175 *

Also Published As

Publication number Publication date
TW201703441A (en) 2017-01-16
