TW201018094A

TW201018094A - Method, apparatus, computer program product and device providing semi-parallel low density parity check decoding using a block structured parity check matrix

Info

Publication number: TW201018094A
Application number: TW97141772A
Authority: TW
Inventors: Predrag Radosavljevic; Marjan Karkooti; Baynast Alexandre De; Joseph R Cavallaro
Original assignee: Nokia Corp
Priority date: 2008-10-30
Filing date: 2008-10-30
Publication date: 2010-05-01
Also published as: TWI387211B

Abstract

The invention relates to low density parity check decoding. A method for decoding an encoded data block is described. Encoded data block comprising data sub-blocks are stored. Decoding is performed in a pipelined manner using an irregular, block-structured parity check matrix, where at least two data sub-block matrices of the parity check matrix are read from and written in each of a plurality of clock cycles. The reading and writing of the data sub-blocks is evenly distributed between at least two area of a memory. The decoding is performed with shift values which eliminate cycles at or below a predetermined threshold length. An apparatus, computer program product and device are also described.

Description

201018094 六、發明說明： I：發明所屬之技術領域3 發明領域本發明之具體實施例大致上係有關無線通訊系統，更 5 特別係有關於無線通訊系統中之低密度同位核對解碼。【先前技術:J 發明背景於說明部分及/或附圖中所見若干縮寫定義如下： AN 接取節點 10 APP 後驗機率 ASIC 特殊應用積體電路 BP 信念傳播 DFU 解碼功能單元 DP 資料處理器 15 DSP 數位信號處理器 FEC 正向錯誤校正 FER 訊框錯誤率 FPGA 現場可規劃閘極陣列 LBP 分層式信念傳播 20 LDPC 低密度同位核對 MEM 記憶體 PCM 同位核對矩陣 PROG 程式 RF 射頻 3 201018094 RX 接收器 SBP 標準信念傳播 SNR 信號對雜訊比 TRANS 收發器 τχ 發射器 UE 使用者設備 WiMAX 微波接取之全球交互操作性於典型無線通訊系統中，硬體資源受限制（例如全平行 10架構並非可接受的解決之道，原因在於於晶片上占據大量面積且可撓性小或無），因此應用基於LBP之解碼。比較SBP 解碼演繹法則，LBP解碼演繹法則之主要優點為LBP解碼演繹法則之特徵為收斂性約快兩倍，原因在於可信度訊息之排程為最佳化。201018094 VI. INSTRUCTIONS: I: TECHNICAL FIELD OF THE INVENTION FIELD OF THE INVENTION The present invention is generally related to wireless communication systems, and more particularly to low density parity check decoding in wireless communication systems. [Prior Art: J Background of the Invention A number of abbreviations as defined in the description and/or the drawings are defined as follows: AN Access Node 10 APP Post-Probability ASIC Special Application Integrated Circuit BP Belief Propagation DFU Decoding Function Unit DP Data Processor 15 DSP Digital signal processor FEC Forward error correction FER frame error rate FPGA Field programmable gate array LBP Hierarchical belief propagation 20 LDPC Low density parity check MEM Memory PCM Parity check matrix PROG Program RF RF 3 201018094 RX Receiver SBP Standard belief propagation SNR signal to noise ratio TRANS transceiver τχ Transmitter UE User equipment WiMAX microwave access global interoperability in typical wireless communication systems, hardware resources are limited (eg full parallel 10 architecture is not acceptable The solution is due to the large area occupied on the wafer and the small or no flexibility, so LBP-based decoding is applied. Comparing the SBP decoding deduction rule, the main advantage of the LBP decoding deduction rule is that the LBP decoding deduction rule is characterized by a convergence of approximately twice as fast as the scheduling of the credibility message is optimized.

15 解碼係分層進行(例如PCM分開各列的集合），此處APP 由一層至另一層改良。下一層之解碼處理係於前一層之 APP更新時開始。參考D· Hocevar，「透過LDPC密碼之分層式解碼之複雜度減低的解碼器架構」於信號處理系統SIPS 2004. IEEE工 20 作坊於 107-112 頁，2004 年 10 月；M. Mansour 及 Ν· Shanbhag ’「高資料流量LDPC解碼器」，極大型積體(VLSI) 系統，IEEE異動於第^期，第976-996頁，2003年12月；及 P. Radosavljevic、A. de Baynast、及 J. R. Cavallaro，「用於 LDPC解碼之最佳化訊息通過時程」，信號、系統及電腦， 201018094 第39屆Asilomar會議，2005年11月。 S. Chung、T. Richardson、及R· Urbanke ’「低密度同位15 Decoding is done hierarchically (for example, PCM separates the sets of columns), where APP is modified from one layer to another. The decoding process of the next layer begins when the APP of the previous layer is updated. See D. Hocevar, "Decoder Architecture with Reduced Complexity of Layered Decoding of LDPC Passwords" in Signal Processing Systems SIPS 2004. IEEE 20 Workshop on pages 107-112, October 2004; M. Mansour and Ν · Shanbhag 'High Data Flow LDPC Decoder, Maximal Integrated Body (VLSI) System, IEEE Transaction, Issues 976-996, December 2003; and P. Radosavljevic, A. de Baynast, and JR Cavallaro, "Optimized Messaging Time Lapse for LDPC Decoding", Signals, Systems and Computers, 201018094 The 39th Asilomar Conference, November 2005. S. Chung, T. Richardson, and R· Urbanke ’“Low Density

核對碼使用高斯近似法之和-積解碼分析」，IEEE異動資訊理論，第47期，第657-670頁，2001年2月，提示隨機pcM 5之最佳化。此種最佳化係等於最佳化隨機PCM之輪廓資料。輪廓資料係由兩個多項式、p(x)及λ(χ)定義，其決定pcM 之多行及多列之重量分配特徵，經由密度演化分析而最佳化。另一方面，Mansour提示架構知曉PCM設計俾便達成硬 10體資源與解碼資料流量間之可接受的折衷。PCM為區塊結構化，此處各個子區塊為移位身分矩陣。只考慮規則密碼，結果位元/訊框誤差率效能相當差。有關進一步參考内容參見：A. Prabhakar、K· Narayanan ’「使用線性同餘序列之低密度同位核對碼之虛擬隨機組成」，通訊上之IEEE異動，5〇 15 卷，第9期，1389-1396頁，2002年9月。為了支援IEEE 802.1111無線及別1^八又標準，1^>(：解碼器須達成約1十億位元/秒之解碼資料流量同時使用有線的硬體平行度(半平行解碼器）。解碼器架構須可擴充來支援寬廣範圍之碼速率之解碼及密碼字元大小。具有24個子區塊 20 行之區塊結構化同位核對矩陣係於IEEE 802.11η標準中提出，如此解碼器架構須支援之。雖然使用隨機PCM之全然平行架構將達成高產出量，但由於所支援之PCM非為架構知曉故其缺點為占用大量面積。用於半平行架構之區塊結構化PCM已經用來縮小解碼 201018094 器面積。但為了達成十億位元/秒之資料流量，須以更密切的架構知曉限制來最佳化PCM。明内發明概要 5 根據本發明之具體實施例為一種用於解碼一編碼資料區塊之方法。儲存包含資料子區塊之一編碼資料區塊。解碼係使用不規則的區塊結構化同位核對矩陣而以管線化方式執行。PCM之至少兩個子區塊矩陣可讀取自及寫入於多數時鐘週期之各個週期。資料子區塊之讀及寫係均勾分布 10於3己憶體的至少二區之間。解碼係以可於或低於預定臨界值長度去除週期循環之移位值進行。根據本發明之I個實_為—種祕解碼—已編碼的資料區塊之叹備。該設備具有用於儲存包含資料子區塊 2一已編碼的資料區塊之記憶體。該設備具有處理器來使區塊結構化同位核對矩陣而以管線化方式解碼之至少兩個子區壤矩陣可讀取自及寫入於夕數時鐘週期之各個週期。資分布於記憶體的至少二區之間塊之讀及寫係均勺螂碼係以可於或低於預定臨界值長度去除·循環之移位值進行。 20 根據本發明之另一個具體竇尬/仃指令之程式具體實施之電腦可讀取:為-_機器可讀取數位處理設備實施來執行用、體’該等指令可由一操作。儲存包含資料子區塊之編瑪的資料區塊之用不規則的區塊結構_位=4區塊。解碼係使對矩陣而以管線化方式執 201018094 =pcm之至少兩個子區塊矩陣可讀取自及寫人於多數時。資料子區塊之讀及寫係均勻分布於記度去除週_環之移位值騎。七咏預定臨界值長 5 發明之又__個具體實施例為—種用於解碼 10 之裝置。該裝置具有用於儲存包含資料子 Ιι右:資料區塊之至少兩個裝置。此外，該裝式解d制""區塊結構化同位核對鱗而以管線化方 =:料區塊之袭置’此處職之至少兩個子區塊矩陣之1及寫2人於多數時鐘週期之各個週期。㈣子區塊可勻分布於記憶體的至少二區之間。解碼係以於預㈣界值長度去除週期循環之移位值進行。圖式簡早說明 15 人關W其匕本發明之實施例之面相於後文象細說明結寸圖之各幅圖研讀時將更為彰顯，附圖中：第1圖顯示方程式(1)至⑺。 =2_不適合用於實施本發明之具體實施例之多種電子裝置之簡化區塊圖。 LDPC第顯不對—給定的碼速率，區塊結構碼與隨機 _間之間隙說明，單位為分科圖顯不區塊結構化不規㈣位核對矩陣。顯不根據本發明之實施例ΑΡΡ記憶體之組織。圖顯示根據本發明之實施例RO賴組之實施。顯不根據本發明之實施例_種核對記憶體之實 20 201018094 例。第8圖顯示根據本發明之實施例之一種ldpC解碼器之方塊圖。第9圖顯示根據本發明之實施例之一種dfu之方塊圖。 5 第圖顯示根據本發明之實施例之一種減低的循環置換器之方塊圖。第π圖顯示用於不同碼速率按時鐘週期中之解碼迭代之處理潛伏期延遲。第12圖顯示對多種碼字元長度之解碼資料流量相對於 10 碼速率之說明圖。第13圖顯示對不同迭代最大數目、碼大小1944、碼速率1/2之FER相對於SNR之說明圖。第14圖顯不對不同迭代最大數目、碼大小1944、碼速率5/6之FER相對於SNR之說明圖。 15 帛15圖顯示對預定數目之迭代之最小可達成資料流量之說明圖。第16圖顯示根據本發明之實施例之—種方法。 ΚΙ 3ΛΓ 】較佳實施例之詳細說明據本發明之實施例可克服與架構知曉PCM相關聯之 5時維待於隨機pCM之相同錯誤校正能力。此等實 i…5半平行解碼器架構達成約1十億位元/秒之平均解碼資料流量。根據本發k實施·合架構知曉區塊結構化P C Μ。 201018094 此等PCMit合驗區域纽半平行LDpc解碼器，允許高解碼資料流量(例如高於1十億位元/秒）而未犧牲錯誤校正能力PCM可結合數種架構知曉限制，諸如子區塊矩陣之最小大小（例如移位的身分矩陣）；用於區域有效解碼器設計之 5移位值之有限集合；為了記憶體資料流量增加每層均勻分布的奇/偶非零區塊行；及用於線性編碼之冗餘部分之上三角形結構(例如沿對角線及其上方只有非零元素）。為了具有能力趨近效能，非零子矩陣之移位值可經最佳化來限制短長度週期（例如長4、6及8之週期）的數目。此 1〇外，藉外顯地考慮PCM之區塊結構，透過密度演化分析可將密碼輪廓資料調整為最佳化。參考第2圖’說明適合用於實施本發明之具體實施例之多種電子裝置之簡化方塊圖。第2圖中，無線網路212適用於透過一接取節點（AN) 216而與一使用者設備（UE) 214通 15 訊。UE 214包括一資料處理器DP 218、耦接至DP 218之一記憶體（MEM) 220及耦接至DP 218之一適當RF收發器 (TRANS) 222 (具有一發射器（TX)及一接收器（RX))。MEM 220儲存一程式(PROG) 224。TRANS 222係用來與AN 216 進行雙向無線通訊。注意TRANS 222有至少一根天線來協 20 助通訊。 AN 216包括一DP 226、耦接至DP 226之一MEM 228、及耦接至DP 226之一適當RF TRANS 230 (具有一丁又及一 RX)。MEM 228儲存一PROG 232。TRANS 230係用來與UE 214進行雙向無線通訊。注意TRANS 230有至少一根天線俾 201018094 協助通訊。AN 216係透過資料徑路234而耦接至一個或多個外部網路或外部系統，諸如網際網路236。 PROG 224、232中之一者假設包括程式指令，該等程式指令當由相關聯之DP執行時，如此處討論，允許電子裝 5置根據本發明之具體實施例操作。大致上，UE 214之多個實施例包括但非限於蜂巢式電話、具有無線通訊能力之個人數位助理器（pDA)、具有無線通讯能力之可機式電腦、具有無線通訊能力之影像拍攝裝置諸如數位相機、具有無線通訊能力之遊戲機、具有無線 10通訊能力之音樂儲存及回放設施、允許無線網際網路接取及劉覽之網際網路設施，以及結合此等功能之組合之可攜式單元或終端裝置。本發明之實施例可藉UE 214之DP 218、226中之一者或多者及AN 216執行之電腦軟艘實施，或藉硬體或藉軟體與 15 硬體之組合實施。 MEM 220、228可屬於當地技術環境適合之任一種類型且可使用任何適當資料儲存技術實施，諸如基於半導體之記憶鱧裝置、磁性記憶體裝置及系統、光學記億體裝置及系統、固定式記憶體及活動式記憶體作為非限制性實例。 2〇 DP 218、226可屬於適合本地技術環境之任一種適當類型，且包括通用電腦、特用電腦、微處理器、數位信號處理器 (DSP)及基於多核心處理器架構之處理器（作為非限制性實例）中之一者或多者。如前文討論且特別就具體方法說明，本發明之具體實 201018094 施例可實現為包含於具體電腦可讀取媒體上可具體實施之程式指令之-電腦程式產品。程式指令之執行結果導致包含利用具體實施例之步驟或方法步驟之操作。 5 10 15 20 。大致上，多個實施例可於硬體或特殊用途電路、軟體、邏輯電路或其任—種組合實施。勤斜面相可於硬體實施’而其它面相可於可藉控制器、微處理器或其它運算裝置執行之勒體或軟體實施，但本發明並非囿限於此。雖然本發明之多個面相可描述為方塊圖、流程圖或若干其它圖式代表圖_朗，但輯周知此處所叙此等方塊圖、設備、系統、技術或方法可於硬體、軟體，體、特殊用途電路或邏輯電路、通用硬體或控制器或其它運算裝置或其若干組合(作為非限制性實例）實施。區塊結構化不規則PCM適合用於具有高度解碼資料流量（例如平均資料流量高於丨十億位元/秒)之半平RLDpc解碼器中實施，同時保有與隨機PCM相同的錯誤校正效能。 PCM可以架構知曉限制設計，諸如： a) 使用子區塊矩陣中有限的移位值集合來縮小循環置換器大小，同時避免短週期的存在以及允許面積有效的解碼器設計。 b) 可使用每層均勻分布的奇/偶非零區塊行俾便提高記憶體資料流量。經由允許由PCM之兩個子區塊同時讀/寫可仏度訊息，實質上提高資料流量。設計PCM時，避免記憶體存取衝突，因此可將全部訊息儲存入兩個獨立記憶體模組。舉例言之，屬於奇區塊行之全部訊息儲存於_個記憶 11 201018094 體模組，而屬於偶區塊行之全部訊息儲存於另一個模組。移位值(例如來自於可能數值之縮小集合）經由最小化新成本函數而去除/減少短長度週期（例如長度4、6及8之週期）之數目可調整為最佳化。 5 藉外顯考慮PCM之區塊結構’經由密度演化分析可最佳化PCM輪廓資料。此種輪廓資料係與隨機矩陣所得之輪廓資料不同。由於密度演化分析並未取決於移位值，故此種最佳化大為簡化。根據本發明之實施例之PCM設計不改變ldpc解碼之 10收傲速度。使用此種LDPC碼可達成商平行程度而無任何效能損耗。此種平行程度係高於使用Turbo碼所達成之平行度。 LDPC碼之架構知曉最佳化，導致適合用於半平行高資料流量解碼器設計之區塊結構化PCM。根據本發明之實施 15例之解碼器初步可於FpGA上實施（例如使用Xilinx系統產生器設計工具）用於快速形成原型及功能證實。靶定的高資料流量LDPC解碼器也可設計為ASIC解。比較FPGA實施，可達成更南的 > 料流量(ASIC可提供快速時鐘速度)及顯著較小的閘極計數及功率耗散。固定點實施法可用於解碼器 20設計。比較浮動點實施法之錯誤率效能，依據可接受之效能損耗而定，固定點實施法之算術精度可為7位元或8位元。根據本發明之實施例之密碼最佳化策略獲得區塊結構化卩€厘，其係與1£丑£ 802.1111及\\^^1八又標準可相容。區塊結構化P C Μ表示增加此等標準之資料流量之良好替代之 12 201018094 道。根據本發明之實施例PCM設計提供多項效果，包括 LDPC碼之架構知曉最佳化。區塊結構PCM適合用於架構有效半平行高資料流量解碼器，此種POV[也結合絕佳錯誤校 5正能力。短的週期數目顯著減少，使得錯誤校正效能可媲美隨機PCM。此種pcm允許於單一時鐘週期兩個App訊息子區塊的讀/寫而記憶體無衝突。藉種子PCM中可能的移位值之有限集合可提供面積效率。如此允許顯著簡化的循環置換器設計。 10 隨機PCM可以兩個多項式λ(χ)及p(x)說明。遵照各行， λί說明連接至i度位元節點之邊緣分量；遵照各列，&說明連接至i度檢查節點之邊緣分量。隨機PCM具有絕佳漸近線效能，但缺乏平行度，使用複雜的記憶體存取。如此實際上不容易使用隨機PCM。參考T. Richardson、A. Shokrollahi 15 及R· Urbanke，「趨近不規則低密度同位核對碼之能力之設計」’ IEEE異動資訊理論，第47期，619-637頁，2001年2月。區塊結構化PCM可藉一種輪廓資料定義諸如：兩個多項式λ’(χ)及ρ’(χ)及一個種子矩陣1^*含有子區塊之非零移位值。參考：R.M. Tanner ’「至低複雜度碼之遞歸辦法」， 20資訊理論之1EEE異動，27期，533-547頁，1981年9月，及 A. Prabhakar ' K. Narayanan，「使用線性同餘序列之低密度同位核對碼之虛擬隨機組成」，通訊上之IEEE異動，5〇卷，第 9期，1389-1396頁，2002年9月。由於某種平行度，區塊結構化PCM提供高解碼資料流 13 201018094 量。也允許接近最佳化之漸近線效能。參考： R—jevic、A de Bay_、M Kark〇〇ti、及】忆The checksum using the Gaussian approximation-sum-product decoding analysis, IEEE Transaction Information Theory, No. 47, pp. 657-670, February 2001, suggests the optimization of random pcM 5. This optimization is equivalent to optimizing the profile of the random PCM. The profile data is defined by two polynomials, p(x) and λ(χ), which determine the weight distribution characteristics of multiple rows and columns of pcM, optimized by density evolution analysis. On the other hand, the Mansour hint architecture knows the PCM design and achieves an acceptable compromise between hard-core resources and decoded data traffic. The PCM is a block structure where each sub-block is a shifted identity matrix. Considering only the rule password, the resulting bit/frame error rate performance is quite poor. For further reference, see: A. Prabhakar, K. Narayanan '"Virtual Random Composition of Low Density Parity Check Codes Using Linear Congruence Sequences", IEEE Transaction on Communications, 5〇15, Issue 9, 1389-1396 Page, September 2002. In order to support IEEE 802.1111 wireless and other standards, 1^> (: The decoder must achieve a decoding data flow of about 1 billion bits per second while using wired hardware parallelism (semi-parallel decoder). The decoder architecture shall be scalable to support a wide range of code rate decoding and cipher character sizes. Block-structured co-located check matrix with 24 sub-blocks of 20 rows is proposed in the IEEE 802.11 η standard, so the decoder architecture shall be Although the fully parallel architecture using random PCM will achieve high throughput, the disadvantage of occupying a large amount of area is that the supported PCM is not known for architecture. Block structured PCM for semi-parallel architecture has been used. The area of the 201018094 device is reduced. However, in order to achieve a data flow of one billion bits per second, the PCM must be optimized with closer architectural knowing constraints. Summary of the Invention 5 According to a specific embodiment of the present invention, a method is used for decoding. A method of encoding a data block, storing a coded data block containing one of the data sub-blocks. The decoding system uses an irregular block to structure the co-located check matrix to be pipelined. Execution. At least two sub-block matrices of the PCM can be read and written in each cycle of a plurality of clock cycles. The read and write sectors of the data sub-block are all distributed in at least two regions of the 3 memory. The decoding is performed by removing the shift value of the periodic cycle at or below a predetermined threshold length. According to the present invention, a real _ is a secret decoding - an encoded data block sigh. The device has And a memory for storing the encoded data block including the data sub-block 2. The device has a processor to enable the block to structure the co-check matrix and to decode at least two sub-regions in a pipelined manner. And are written in each cycle of the octave clock cycle. The block is read and written in at least two blocks of the memory, and the code is removed or cycled at or below a predetermined threshold length. The shift value is performed. 20 The computer according to another specific sinusoidal/仃 command according to the present invention is readable by a computer: for -_ machine readable digital processing device implementation to execute, the body 'these instructions can be one Operation. Storage contains data The block of the block is used for the irregular block structure _ bit = 4 blocks. The decoding system allows at least two sub-block matrices of the 201018094 =pcm to be processed in a pipelined manner. And the writer is in most cases. The reading and writing of the data sub-blocks are evenly distributed in the shift removal week _ ring shift value ride. Seven 咏 predetermined threshold value length 5 __ a specific embodiment is - A device for decoding 10. The device has at least two devices for storing a data containing a data block: a data block. In addition, the device is configured to "d" and "block" structured parity checkpoints and pipelines The formula =: The block of the material block is '1' of at least two sub-block matrices of this job and writes 2 people in each cycle of most clock cycles. (4) The sub-blocks are evenly distributed between at least two regions of the memory. The decoding is performed by removing the shift value of the periodic cycle by the pre-(four) boundary value length. BRIEF DESCRIPTION OF THE DRAWINGS The description of the embodiments of the present invention will be more apparent in the following description of the drawings of the drawings, in which: Figure 1 shows the equation (1). To (7). = 2_ is a simplified block diagram of a variety of electronic devices that are not suitable for use in practicing embodiments of the present invention. The LDPC is not correctly displayed—the given code rate, the gap between the block structure code and the random _, and the unit is the sub-section. The picture shows the block-structured irregular (four) bit check matrix. The organization of the memory according to the embodiment of the present invention is not shown. The figure shows the implementation of a RO set according to an embodiment of the invention. Embodiments according to the present invention are not shown in the example of the invention. 20 201018094 Example. Figure 8 is a block diagram showing an ldpC decoder in accordance with an embodiment of the present invention. Figure 9 shows a block diagram of a dfu in accordance with an embodiment of the present invention. 5 The figure shows a block diagram of a reduced cycler in accordance with an embodiment of the present invention. The πth plot shows the processing latency delay for decoding iterations in clock cycles for different code rates. Figure 12 shows an illustration of the decoded data traffic versus the 10 code rate for multiple code character lengths. Figure 13 shows an illustration of the FER versus SNR for the maximum number of different iterations, the code size of 1944, and the code rate of 1/2. Figure 14 shows an illustration of the FER vs. SNR for the maximum number of iterations, the code size of 1944, and the code rate of 5/6. Figure 15 shows an illustration of the minimum achievable data flow for a predetermined number of iterations. Figure 16 shows a method in accordance with an embodiment of the present invention. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT An embodiment of the present invention can overcome the same error correction capabilities of the random pCM that are associated with the architecture-aware PCM. These real i...5 semi-parallel decoder architectures achieve an average decoding data flow of approximately 1 billion bits per second. According to the implementation of the present invention, the block structure structuring P C Μ is known. 201018094 These PCMit test area New semi-parallel LDpc decoders allow high decoded data traffic (eg, above 1 billion bits per second) without sacrificing error correction capability. PCM can be combined with several architecturally aware limits, such as sub-blocks. The minimum size of the matrix (eg, shifted identity matrix); a finite set of 5 shift values for the region effective decoder design; increasing the evenly distributed odd/even non-zero block rows for each layer of memory data traffic; A triangular structure above the redundant portion of the linear encoding (eg, only non-zero elements along the diagonal and above). In order to have capability approaching performance, the shift values of the non-zero sub-matrices can be optimized to limit the number of short length periods (e.g., periods of lengths of 4, 6, and 8). In addition, by explicitly considering the block structure of the PCM, the password profile data can be adjusted to be optimized through density evolution analysis. Referring to Figure 2, there is illustrated a simplified block diagram of various electronic devices suitable for use in practicing the embodiments of the present invention. In FIG. 2, wireless network 212 is adapted to communicate with a user equipment (UE) 214 via an access node (AN) 216. The UE 214 includes a data processor DP 218, a memory (MEM) 220 coupled to the DP 218, and a suitable RF transceiver (TRANS) 222 coupled to the DP 218 (having a transmitter (TX) and receiving (RX)). The MEM 220 stores a program (PROG) 224. The TRANS 222 is used for two-way wireless communication with the AN 216. Note that the TRANS 222 has at least one antenna to coordinate communication. The AN 216 includes a DP 226, a MEM 228 coupled to one of the DPs 226, and an appropriate RF TRANS 230 (having a dual and an RX) coupled to one of the DPs 226. The MEM 228 stores a PROG 232. The TRANS 230 is used for two-way wireless communication with the UE 214. Note that the TRANS 230 has at least one antenna 俾 201018094 to assist with communication. AN 216 is coupled to one or more external networks or external systems, such as Internet 236, via data path 234. One of the PROGs 224, 232 is assumed to include program instructions that, when executed by the associated DP, as discussed herein, allow the electronic device to operate in accordance with a particular embodiment of the present invention. In general, various embodiments of the UE 214 include, but are not limited to, a cellular telephone, a personal digital assistant (pDA) with wireless communication capabilities, a portable computer with wireless communication capabilities, and an image capture device with wireless communication capabilities such as Digital camera, game console with wireless communication capability, music storage and playback facilities with wireless 10 communication capability, Internet access facilities allowing wireless internet access and Liu Yu, and portable combination of these functions Unit or terminal device. Embodiments of the present invention may be implemented by one or more of the DPs 218, 226 of the UE 214 and a computer flexible ship executed by the AN 216, or by a combination of hardware or software and 15 hardware. MEM 220, 228 may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical devices and systems, and fixed memory. Body and mobile memory are non-limiting examples. 2〇 DP 218, 226 may be of any suitable type suitable for the local technical environment, and include general purpose computers, special computers, microprocessors, digital signal processors (DSPs), and processors based on multi-core processor architectures (as One or more of the non-limiting examples). As discussed above and particularly with respect to specific methods, the embodiment of the present invention can be implemented as a computer program product embodied in a specific computer readable medium. The results of execution of the program instructions result in the use of steps or method steps of a particular embodiment. 5 10 15 20 . In general, the various embodiments can be implemented in hardware or special purpose circuits, software, logic circuits, or any combination thereof. The slanting phase can be implemented in hardware and the other aspects can be implemented in a body or software that can be executed by a controller, microprocessor or other computing device, but the invention is not limited thereto. Although the various aspects of the present invention can be described as a block diagram, a flow diagram, or a number of other figures representing a figure, the blocks, devices, systems, techniques, or methods described herein can be used in hardware or software. The body, special purpose circuit or logic circuit, general purpose hardware or controller or other computing device or a combination thereof (as a non-limiting example) is implemented. Block structured irregular PCM is suitable for implementation in a semi-flat RLDpc decoder with highly decoded data streams (e.g., average data traffic above 丨1 billion bits per second) while maintaining the same error correction performance as random PCM. The PCM can be architecturally aware of the restricted design, such as: a) Use a limited set of shift values in the sub-block matrix to reduce the size of the loop switch while avoiding the existence of short periods and allowing for efficient area decoder design. b) The memory data flow can be increased by using evenly distributed odd/even non-zero block lines per layer. Data traffic is substantially increased by allowing simultaneous reading/writing of latitude messages by two sub-blocks of the PCM. When designing PCM, memory access conflicts are avoided, so all messages can be stored in two independent memory modules. For example, all the messages belonging to the odd block row are stored in the _ memory 11 201018094 body module, and all the messages belonging to the even block row are stored in another module. The shift value (e.g., from a reduced set of possible values) can be adjusted to optimize by reducing/decreasing the number of short length periods (e.g., periods of lengths 4, 6, and 8) by minimizing the new cost function. 5 Considering the block structure of PCM by explicit explicit PCM profile data can be optimized by density evolution analysis. This profile data is different from the profile data obtained from the random matrix. Since the density evolution analysis does not depend on the shift value, this optimization is greatly simplified. The PCM design in accordance with an embodiment of the present invention does not change the arbitrarily speed of ldpc decoding. The use of such LDPC codes can achieve a degree of parallelism without any loss of performance. This degree of parallelism is higher than the parallelism achieved using Turbo codes. The architecture of the LDPC code is known to be optimized, resulting in a block structured PCM suitable for use in semi-parallel high-rate traffic decoder designs. A decoder according to the implementation of the present invention can be initially implemented on FpGA (e.g., using the Xilinx System Generator Design Tool) for rapid prototyping and functional verification. The targeted high-rate LDPC decoder can also be designed as an ASIC solution. Comparing FPGA implementations, it is possible to achieve a more recent > material flow (ASIC provides fast clock speed) and significantly smaller gate count and power dissipation. The fixed point implementation can be used in the decoder 20 design. Comparing the error rate performance of the floating point implementation method, the arithmetic accuracy of the fixed point implementation method can be 7 bits or 8 bits depending on the acceptable performance loss. The cryptographic optimization strategy according to an embodiment of the present invention obtains a block structuring, which is compatible with the standard of 1 ugly 802.1111 and \\^^1. Block Structured P C Μ represents a good alternative to increasing the flow of data for these standards 12 201018094. The PCM design provides a number of effects in accordance with embodiments of the present invention, including architectural knowledge optimization of LDPC codes. The block structure PCM is suitable for architecture efficient semi-parallel high data traffic decoders, and this POV [also combines excellent error correction capability]. The number of short cycles is significantly reduced, making the error correction performance comparable to random PCM. This pcm allows read/write of two App message sub-blocks in a single clock cycle without memory conflict. A limited set of possible shift values in the seed PCM can provide area efficiency. This allows for a significantly simplified cyclic permutator design. 10 Random PCM can be described by two polynomials λ(χ) and p(x). According to each line, λί indicates the edge component connected to the i-degree bit node; according to the columns, & description is connected to the edge component of the i-degree check node. Random PCM has excellent asymptotic performance, but lacks parallelism and uses complex memory access. It is thus not practical to use random PCM. Refer to T. Richardson, A. Shokrollahi 15 and R. Urbanke, "Design of the ability to approach irregular low-density parity check codes" IEEE Transaction Information Theory, No. 47, pp. 619-637, February 2001. The block structured PCM can be defined by a profile data such as: two polynomial λ'(χ) and ρ'(χ) and a seed matrix 1^* containing non-zero shift values of the sub-blocks. Reference: RM Tanner '"Recursive Approach to Low Complexity Code", 20 Information Theory 1EEE Transaction, 27 issues, pp. 533-547, September 1981, and A. Prabhakar 'K. Narayanan, "Using Linear Congruence The virtual random composition of the sequence of low-density parity check codes, IEEE Transactions on Communications, 5〇, No. 9, pp. 1389-1396, September 2002. Due to some parallelism, the block structured PCM provides a high decoded data stream 13 201018094. It also allows for near-optimal asymptotic performance. Reference: R-jevic, A de Bay_, M Kark〇〇ti, and]

Cavallar。’基於架構導向同位核對矩陣之高資料流量多重速率LDPC解碼器」，第14屆歐洲信號處理會議 5 (EUSIPCO)，2006年9 月。备產生區塊結構化PCM時，移位值可經最佳化來減少 PCM内部之短週_如長度4、6及8之職之數目。減少此等週期數目，顯著降低FER效能曲線的錯誤底且提升解碼的收斂速度。適當碼設計對短的至中等的碼字元大小(例⑹ ⑩ 10 1〇0〇-30〇〇位元)提供良好錯誤率效能。此外，經由於區塊結構化PCM執行密度演化分析可最佳化。根據本發明之實施例PCM不具有任何長度4之週期，且比較隨機組成具有多於40%之長度6之週期。參考：P. Radosavljevic、A de Baynast、M, Karkooti、及J. R. Cavallaro ’「基於架構導向 15同位核對矩陣之高資料流量多重速率LDPC解碼器」，第14 屆歐洲信號處理會議(EUSIPCO)，2〇06年9月。對一給定的速率R及碼字元大小N，子區塊數目Nc表示 ® 為Nc=N/S，有SxS子區塊。必須小心考慮子區塊數目。更 — 大量子區塊提供更佳的輪廓資料。但更少數的子區塊允許 20更容易移除短週期，且因較高平行度故允許較高資料流量。平衡此等因素，允許對給定之碼字元大小及目標資料流量選擇適當子區塊大小。對Η種子之任何分開元素a、b、c及D，A、B、C、D中有長度4之週期C4之機率係以方程式（1)表示，如第1圖所 14 201018094 示。子中之長度4之週期之平均數以方程式(2)表示，如第 1圖所示。屬於長度4之週期之A之機率以方程式（3)表示，如第1圖所示。包括A之長度4週期之平均數以方程式(4)表示，如第1圖所示。 5 參考 ρ· Radosavljevic、A de Baynast、M. Karkooti、及 J· R. Cavallaro,「基於架構導向同位核對矩陣之高資料流量 • 多重速率LDPC解碼器」，第14屆歐洲信號處理會議 (EUSIPCO)，2006年9 月。整個PCM矩陣中之週期數目係以方程式表示，如第i 10 圖所示。{〇^，〇12,〇13,〇14}表示於11種子之週期八800中之移位值。參考：K.S. Kim、S.H· Lee、Y.H. Kim、J_Y. Ahn，「使用循環移位矩陣二進制LDPC碼之設計」，電子函件，第4〇卷，第5期，325-326頁，2004年3月。移位值總數須至少等於(6)，如第1圖所示。如此允許移 I5除PCM47全部長度為4之週期。參考：p. Ra(j〇savljevic、A de . Baynast、M. Karkooti、及J. R，Cavallaro，「基於架構導向同位核對矩陣之高資料流量多重速率LDPC解碼器」，第14 屆歐洲信號處理會議(EUSIPCO)，2006年9月。經由於區塊結構化PCM執行密度演化分析，可將Η** 20調整為最佳化。用於標準密度演化分析，輪廓資料可以兩個多項式λ(χ)及p(x)表示，此處\為連接至i度位元節點之邊緣比例’及巧為連接至j度核對節點之邊緣比例。可擴充，故為連接於i度位元節點與j度核對節點間之邊緣比例。相同密度演化方程式可用於隨機結構化碼及區塊結構化碼。 15 201018094 LDP=顯_ —較之碼速機塊結構化碼與隨機、曰之間隙’以分貝表示。該線圖顯示有24子區塊行之PCM之小於〇 4分貝之錯誤率效能損耗及有之㈣之小_7分貝之錯誤率效能損耗。 ^ 方程式(4)及(5)可延伸至任何週期長度(例如6、8)。使用方程式⑹’可判定需要移除全部長度4之週期所需最少移位值數目。Cavallar. 'High-data multi-rate LDPC decoder based on architecture-oriented parity check matrix, 14th European Signal Processing Conference 5 (EUSIPCO), September 2006. When generating block structured PCM, the shift value can be optimized to reduce the number of short weeks within the PCM, such as the length of positions 4, 6, and 8. Reducing the number of such cycles significantly reduces the false bottom of the FER performance curve and improves the convergence speed of decoding. The appropriate code design provides good error rate performance for short to medium code character sizes (example (6) 10 10 1 〇 0 〇 -30 〇〇 bits). In addition, density evolution analysis can be optimized via block-structured PCM. The PCM according to an embodiment of the present invention does not have any period of length 4, and the random composition has a period of length 6 of more than 40%. References: P. Radosavljevic, A de Baynast, M, Karkooti, and JR Cavallaro '"High-data-rate multi-rate LDPC decoder based on architecture-oriented 15 co-located check matrix", 14th European Conference on Signal Processing (EUSIPCO), 2〇 September 2006. For a given rate R and codeword size N, the number of sub-blocks Nc indicates that ® is Nc=N/S with SxS sub-blocks. Care must be taken to consider the number of sub-blocks. More — A large number of sub-blocks provide better profile data. However, a smaller number of sub-blocks allow 20 to more easily remove short periods and allow higher data throughput due to higher parallelism. Balancing these factors allows the appropriate sub-block size to be selected for a given code character size and target data traffic. For any of the separate elements a, b, c and D of the seed, the probability of having a period C4 of length 4 in A, B, C, D is represented by equation (1), as shown in Fig. 1 2010 14094. The average of the periods of length 4 in the sub-routine is expressed by equation (2), as shown in Fig. 1. The probability of A belonging to the period of length 4 is expressed by equation (3), as shown in Fig. 1. The average of the length of 4 cycles including A is expressed by equation (4) as shown in Fig. 1. 5 References ρ· Radosavljevic, A de Baynast, M. Karkooti, and J. R. Cavallaro, “High Data Flow Based on Architecture-Oriented Parity Check Matrix • Multi-rate LDPC Decoder”, 14th European Signal Processing Conference (EUSIPCO) , September 2006. The number of cycles in the entire PCM matrix is expressed as an equation, as shown in Figure i10. {〇^, 〇12, 〇13, 〇14} represents the shift value in the period of eight seeds of the eighteen. Reference: KS Kim, SH· Lee, YH Kim, J_Y. Ahn, “Design of Binary LDPC Codes Using Cyclic Shift Matrix”, E-Mail, Vol. 4, No. 5, pp. 325-326, March 2004 . The total number of shift values must be at least equal to (6), as shown in Figure 1. This allows the shifting of I5 except for the period in which the PCM 47 has a total length of four. Reference: p. Ra (j〇savljevic, A de. Baynast, M. Karkooti, and J. R, Cavalaro, “High Data Flow Multirate LDPC Decoder Based on Architecture-Oriented Parity Check Matrix”, 14th European Signal Processing Conference (EUSIPCO), September 2006. The Η**20 can be optimized for optimization due to the block-structured PCM performing density evolution analysis. For standard density evolution analysis, the contour data can be two polynomial λ(χ And p(x) means that here is the edge ratio of the edge connected to the i-degree bit node' and the edge ratio of the node connected to the j-degree check node. It can be expanded, so it is connected to the i-degree bit node and j. The ratio of edges between nodes is checked. The same density evolution equation can be used for random structuring codes and block structuring codes. 15 201018094 LDP=display_--between code-speed block structuring codes and random, 曰 gaps in decibels The line graph shows the error rate performance loss of PCM with 24 sub-block rows less than 〇4 dB and the error rate performance loss of (4) small _7 dB. ^ Equations (4) and (5) can be extended To any cycle length (eg 6, 8). Use equation ⑹ 'may determine the need to remove the entire length of period 4 of the minimum required number of shift values.

密度演化演繹法則可延伸而考慮碼之區塊結構。架構知曉最佳化限制允許PCM的冗餘部分之上三角形結構用於 1〇簡化編碼目的，以及對記憶體資料流量於資訊部分均等分布的奇及偶非零區塊行位置增加。The density evolution deduction rule can be extended to consider the block structure of the code. The architecture knows that the optimization limit allows the triangular structure above the redundant portion of the PCM to be used for simplified coding purposes, as well as the odd and even non-zero block row positions for equal distribution of memory data traffic to the information portion.

根據本發明之實施例LD P C可支援具有架構知曉限制之區塊結構化PCM。由於PCM之特殊結構，可實現高解碼資料流量，經由三層PCM層之管線化，允許每個時鐘週期 15 得自兩個子區塊矩陣之APP及核對訊息的讀/寫。面積有效半平行解碼器實施例利用由於PCM中有限的移位集合故循環置換器的尺寸縮小，且允許每一層全然處理平行度。記憶體可於兩個APP記憶體模組中分成24區塊行。該對APP區塊/行可於每個時鐘週期讀/寫。如此允許於每個時 20 鐘週期讀/寫兩個APP區塊/行而無記憶體衝突。第4圖顯示具有2/3速率之典型區塊結構KpCM。第5、6及7圖顯示根據本發明之實施例之LDPC解蝎器之記憶體組織結構。於此等實例中，使用一個讀及一個寫記憶體埠。 16 201018094 第5圖顯示劃分成為兩個子模組之A P P記憶體之組織結構。於各記憶體模組中，可儲存半數PCM區塊行(例如奇或偶區塊行）。各模組有12區塊行如此具有深度12。一個區塊行中之 5 APP訊息數目以S亦即區塊行寬度表示。兩個補數可用於可靠度訊息之固定點表示法(APP訊息及核對訊息）。可支援任何固定點算術精度。第6圖顯示根據本發明之rom模組之實施例，該模組含有非零位置及連續讀/寫APP區塊行之移位值（例如 10 原先數值或相對數值）。兩個ROM模組各自為一個特定APP模組所專用。模組 k供非零區塊行位置（例如由丨至24)及相對應之身分矩陣之移位值。區塊行之位置為APP記憶體模組之下一個讀/寫位址0 15 可使用兩個額外r〇m模組。此等模組可儲存相對移位值替代原先移位值。相對移位值對相同區塊行之前一個移位值提供相對差。原先移位值用於第一次迭代。如此防止於圮憶體寫入前APP訊息之循環置換。 2〇區塊結構化PCM具有均等分布的奇及偶非零區塊行位 2〇置。如此允許—個模組含有得自奇區塊行之APP訊息及第一模組含有得自偶區塊行之APP訊息。第7圖顯示核對$憶體之組織結構。―個核對記憶體位置含有得自兩個接續區塊矩陣之訊息。核對記憶體之組織結構並未仰賴APPH塊行之讀/寫順 17 201018094 序。係以全零啟動；結果核對訊息位置未與特定區塊行相關。核對記憶體位置可含有來自於兩個非零子矩陣訊息。於若干實施例中，核對記憶體可劃分成子模組，協助解碼器之擴大，且提供多種碼字元大小之支援。 5 第8、9及10圖顯示高資料流量LDPC解碼器、解碼功能單凡、及減低的循環置換器之細節方塊圖，全部皆係根據本發明之實施例。第8圖顯示根據本發明之實施例一種LDpc解碼器8〇〇之方塊圖。此種解碼器可利用硬體資源，諸如：8個解碼功 10能單TC860用於達成每一層完全解碼平行度，此處該層有s 列，兩個核對記憶體850及855 (例如用於協助各層管線化之鏡面855)、四個APP模組810及815包括APP鏡面記憶體 815，以及用於由記憶體讀取後App訊息之區塊移位之循環置換器820。 15 控制器840提供控制邏輯，控制核對記憶體850及855之定址，及ROM模組831、833、836及838之定址（用於APP記憶體模組之定址且決定循環置換之移位值），以及處理内側 S個平行DFU 860。當列連續度\^1^為奇數時，每個時鐘週期一個區塊行可讀/寫自/至APP模組81〇及815，/寫可排程為 20該層的最後(例如最後時鐘週期）。兩個核對訊息子區塊可自動讀/寫自/至核對記憶體850及855，但核對記憶體位置之第二半可能無效。因此控制器840中之控制邏輯電路可能讓 DFU 860中之若干算術FU不能動作，及四個循環置換器820 中之兩個不能動作。ROM1 831及836以及ROM2 833及838 18 201018094 二者可於一次解碼迭代結束時完全讀取。額外R〇M (圖中未顯示）可用於儲存各層之wR值。 5 10 15 ❿ 20 此種循環置換器820於APP訊息之寫入前並未使用反向循環置換。此外，循環置換器82〇具有因三個管線階段之三個時鐘週期之總時間潛伏期延遲，此處S2:1MUX之兩個階段決定一個管線階段。第9圖顯示根據本發明之實施例一種DFU 86〇之方塊圖。顯不用於一個PCM列解碼之三個管線階段（讀取、處理、及寫入階段)之實施例。三個接續層中之各列可同時解碼。於每個時鐘週期，兩似PP訊息及兩個核對訊息載入單-DFU;於每個時鐘週期’兩個核對訊息及兩個App訊息經更新。DFU 860支援每個時鐘週期兩個子區塊矩陣(App 區塊及核對訊息區塊)之讀/寫。允許三個管線階段：讀取、處理、及寫人階段^❹Φ列最小和魏單元则可實現最小和近似值(例如兩個最小訊息之串列搜尋）。第10圖顯示根據本發明之實施例之—種循環置換器謂。區塊結構化PCM之種子矩陣具有有限的移位值集合 (例如由1至15)。循環置換器820有4個2:1 Μυχ 1〇1〇階段。兩個額外「倒裝邏輯電路」階段(s χ 2:ι Μυχ i謂)於二方向執行區塊移位(例如此處相辦偏移為丨至15，或_15至 -1)。於輪入階段及輸出階段之額夕卜「倒震邏輯電路」用於左方向及右方向支援區塊移位。此種邏輯電路逆轉App訊 ^順序：第-訊息變成第挪息，第二訊息變成第㈣訊息專用於輪入「倒裝邏輯電路」，及反之亦然用於輸出「倒裝 19 201018094 邏輯電路」。若相對位移移位值為負（例如_15至-1間）則可利用此種邏輯電路。可估計用於解碼器800之算術部分之標準ASIC閘極數目，包括DFU 860及循環置換器820。於非限制性實例中，使用1944位元（因而S為81)之碼字元大小及8位元之二補數固定點算術精度，閘極總數約為235千閘極。面積只有接近 1.46增加來支援每個時鐘週期讀/寫兩個區塊矩陣。81 DFU 等於189千閘極，此處96千閘極用於處理兩個區塊矩陣。比 10 15 权丁网蚀之興型循環置換器，四個減低的循環置換各自約有11，6千間極。當支援高達⑽之全部移位值時減的循環置換轉供面積賴著縮，卜姆解根據本發明實施例之解碼ϋ可支援任何二補數固定點算術精度。使用半平行架構，以有限的硬體資源可達成高解碼料流量W如平均約1 +億位元/秒）。藉下述辦法提供高資 "丨<_量.每個時鐘週期寫兩個子區塊矩陣(例如Μ?訊息核對訊息區塊）；每-個PCM層之全祕理平行；以及: 接續層之管線化。一The LD P C can support block structured PCM with architectural awareness restrictions in accordance with an embodiment of the present invention. Due to the special structure of the PCM, high decoded data traffic can be achieved, which is pipelined through the three PCM layers, allowing each clock cycle 15 to be read/written from the APP and check messages of the two sub-block matrices. The area effective semi-parallel decoder embodiment utilizes the size reduction of the cyclic displacer due to the limited set of shifts in the PCM and allows each layer to fully handle the parallelism. The memory can be divided into 24 block lines in two APP memory modules. The pair of APP blocks/rows can be read/written every clock cycle. This allows two APP blocks/rows to be read/written at 20 sec intervals without memory conflicts. Figure 4 shows a typical block structure KpCM with a 2/3 rate. Figures 5, 6 and 7 show the memory organization of an LDPC decoder according to an embodiment of the present invention. In these examples, a read and a write memory are used. 16 201018094 Figure 5 shows the organization of the A P P memory divided into two sub-modules. In each memory module, half of the PCM block lines (for example, odd or even block lines) can be stored. Each module has 12 block rows with a depth of 12. The number of 5 APP messages in a block row is represented by S, which is the block line width. Two complements can be used for fixed point representation of the reliability message (APP message and check message). Any fixed point arithmetic precision can be supported. Figure 6 shows an embodiment of a rom module according to the present invention which contains shift values for non-zero positions and consecutive read/write APP block rows (e.g., 10 original values or relative values). The two ROM modules are each dedicated to a specific APP module. Module k provides the shift value of the non-zero block row position (e.g., from 丨 to 24) and the corresponding identity matrix. The location of the block row is a read/write address below the APP memory module. 0 15 Two additional r〇m modules can be used. These modules can store relative shift values instead of the original shift values. The relative shift value provides a relative difference to the previous shift value of the same block row. The original shift value is used for the first iteration. This prevents the cyclic replacement of the APP message before the memory is written. 2〇 Block structured PCM has equally distributed odd and even non-zero block row bits. This allows a module to contain APP messages from odd blocks and the first module to contain APP messages from even blocks. Figure 7 shows the organization of the checksum memory. The “check memory location” contains messages from two consecutive block matrices. Checking the organization of memory does not depend on the reading/writing of the APPH block. The system starts with all zeros; the result check message location is not related to a specific block row. The check memory location can contain messages from two non-zero sub-matrices. In some embodiments, the collating memory can be divided into sub-modules, assisting in the expansion of the decoder, and providing support for multiple code character sizes. 5 Figures 8, 9, and 10 show detailed block diagrams of the high data traffic LDPC decoder, the decoding function, and the reduced cyclic permutator, all in accordance with an embodiment of the present invention. Figure 8 shows a block diagram of an LDpc decoder 8A in accordance with an embodiment of the present invention. Such a decoder can utilize hardware resources, such as: 8 decoding functions 10 can be used to achieve complete decoding parallelism of each layer, where the layer has s columns, two check memories 850 and 855 (for example, Assisting each layer of mirrored mirror 855), four APP modules 810 and 815 include APP mirror memory 815, and a cyclic permutator 820 for block shifting of the App message after reading by the memory. 15 The controller 840 provides control logic for controlling the addressing of the memory banks 850 and 855 and the addressing of the ROM modules 831, 833, 836 and 838 (for addressing of the APP memory module and determining the shift value of the cyclic permutation) And processing the inner S parallel DFU 860. When the column continuity \^1^ is an odd number, one block line per clock cycle can be read/written from/to the APP modules 81〇 and 815, and the write can be scheduled to be 20 at the end of the layer (for example, the last clock) cycle). The two check message sub-blocks can be automatically read/written from/to check memory 850 and 855, but checking the second half of the memory location may be invalid. Thus the control logic in controller 840 may cause some of the arithmetic FUs in DFU 860 to be inoperable, and two of the four cyclic permutators 820 may not be active. Both ROM1 831 and 836 and ROM2 833 and 838 18 201018094 can be completely read at the end of a decoding iteration. Additional R〇M (not shown) can be used to store the wR values for each layer. 5 10 15 ❿ 20 Such a cyclic permutator 820 does not use reverse cyclic permutation until the APP message is written. In addition, the cyclic permutator 82 has a total time latency delay of three clock cycles due to three pipeline stages, where the two phases of S2:1 MUX determine one pipeline phase. Figure 9 is a block diagram showing a DFU 86〇 according to an embodiment of the present invention. An embodiment of three pipeline stages (read, handle, and write stages) for one PCM column decoding is shown. Each of the three successive layers can be decoded simultaneously. At each clock cycle, two PP-like messages and two check messages are loaded into the single-DFU; two check messages and two App messages are updated every clock cycle. The DFU 860 supports read/write of two sub-block matrices (App block and check message block) per clock cycle. Three pipeline stages are allowed: read, process, and write phase. ❹ Φ column minimum and Wei unit can achieve minimum and approximation (for example, two minimum messages for tandem search). Figure 10 shows a cyclic permutator in accordance with an embodiment of the present invention. The seed matrix of the block structured PCM has a finite set of shift values (e.g., from 1 to 15). The cyclic permutator 820 has four stages of 2:1 Μυχ 1〇1〇. Two additional "Flip-In Logic" stages (s χ 2: ι Μυχ i) perform block shifts in the two directions (for example, the offset here is 丨 to 15, or _15 to -1). In the round-up phase and the output phase, the "shocking logic circuit" is used to shift the block in the left and right directions. Such a logic circuit reverses the App message sequence: the first message becomes the first interest, the second message becomes the fourth message, and the message is dedicated to the "flip-chip logic circuit", and vice versa for outputting the "flip-chip 19 201018094 logic circuit" "." This logic can be used if the relative displacement shift value is negative (for example, between _15 and -1). The standard ASIC gate count for the arithmetic portion of decoder 800 can be estimated, including DFU 860 and cyclic permutator 820. In a non-limiting example, a code character size of 1944 bits (and thus S is 81) and a two-complement of 8 bits are used for fixed point arithmetic precision, and the total number of gates is about 235 thousand gates. The area is only increased by approximately 1.46 to support reading/writing of two block matrices per clock cycle. 81 DFU is equal to 189 thousand gates, where 96 thousand gates are used to process two block matrices. Compared with the 10 15 丁网网循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环循环The reduced permutation transfer area is reduced when the total shift value of up to (10) is supported. The decoding of the Bumm solution according to the embodiment of the present invention can support any two-complement fixed point arithmetic precision. Using a semi-parallel architecture, high decoding throughput can be achieved with limited hardware resources, such as an average of about 1 + 100 million bits per second. Use the following method to provide high capital "丨<_quantity. Write two sub-block matrices per clock cycle (for example, message check message block); the full secret of each PCM layer is parallel; and: Pipeline of the continuation layer. One

20 n即恭於母次迭代之解碼潛伏期延遲。」線階段有其本冑之敎期料：Wr/7+5軸週期之言伏期延遲(R) ’ Wr/2+6時鐘週期之處理潛伏期延遲 WR/2:4時鐘週期之寫入潛伏期延遲(w)。由於各層々化，每次迭代之解騎伏期延遲可判定為處理階階段之最大潛伏期延遲，母如万私式⑺所不，此處LJ 讀取潛伏期延遲由於重叠處理/寫人潛伏期勾 20 201018094 不影響總潛伏期延遲。實際上，處理潛伏期延遲p及PCM中之層數決定每次迭代之潛伏期延遲。處理潛伏期延遲經常性大於寫入潛伏期延遲。由於每層全然解碼平行，每次迭代之解碼潛伏期延 5 €並未取決於碼字元大小，藉延伸而取決於每層之列數。彳人迭代之解碼潛伏期延遲係依據碍速率決;^。舉例說明於第11圖。 • 平均解碼資料流量係基於達成FER為10·4之平均迭代數目（此處最大解碼迭代數目設定為⑸。迭代平均數也取決 10於踢字元大小及碼速率；典型約為五次送代。使用2〇〇MHz 時鐘頻率，比較每個時鐘週期支援一個區塊-矩陣之讀/寫之解碼器，平均資料流量增加m.54倍。參考第12圖有關用於多個碼字元長度之資料流量相對於碼速率之舉例說明。第13圖及第14圖顯示用於不同的預定的最大解碼迭代 15數目’FER效能相對於SNR。最大解碼迭代數目取決於多項 • 因素例如SNR、期望之服等。第13圖顯示瑪速率為1/2之實例。第14圖顯示碼速率為5/6之實例。第15圖顯示對預定的最大解碼迭代數目之最小可達成資料流量之實例。於本實例中，最大解碼迭代數目設定為 20 〇根據本發明之實施例之解碼器提供經由使用移位值之有限集合之短週期的減少/去除。比較全然隨機PCM結構，此種減少/去除只以錯誤率效能之邊際損耗執行。此外，此種解碼器使用每層相等分布之奇及偶非零區塊行。由於每 21 201018094 個週期二子行讀/寫自/至記憶體模組，故未發生APP記憶體存取衝突。此種LDPC解碼器提供資料流量的增加而只有有限的硬體額外管理資訊量。第16圖顯示根據本發明之實施例一種用於解碼一編碼 5資料區塊之方法。於步驟1610，儲存包含資料子區塊之一編碼資料區塊。於步驟162〇，使用不規則區塊結構化同纟 -20 n is the decoding latency delay of the mother iteration. The line phase has its own expectation: Wr/7+5 axis cycle delay period delay (R) ' Wr/2+6 clock cycle processing latency delay WR/2: 4 clock cycle write latency Delay (w). Due to the deuteration of each layer, the delay of the solution of each iteration can be determined as the maximum latency delay of the processing stage, as the parent (7) does not, where the LJ read latency delay is due to the overlap processing/write latency period. 201018094 does not affect the total latency delay. In fact, the processing latency delay p and the number of layers in the PCM determine the latency delay for each iteration. The processing latency delay is often greater than the write latency delay. Since each layer is completely decoded in parallel, the decoding latency of each iteration is 5 € and does not depend on the size of the code character, which depends on the number of columns per layer. The decoding latency delay of the iterative iteration is based on the rate of failure; An example is shown in Figure 11. • The average decoded data traffic is based on the average number of iterations with a FER of 10.4 (where the maximum number of decoding iterations is set to (5). The iterative average also depends on the kick size and code rate; typically about five times. Using a 2 〇〇 MHz clock frequency, comparing a block-matrix read/write decoder per clock cycle, the average data traffic is increased by m.54 times. Refer to Figure 12 for multiple codeword lengths. Examples of data traffic versus code rate. Figures 13 and 14 show the number of FER performance versus SNR for different predetermined maximum decoding iterations. The maximum number of decoding iterations depends on multiple factors such as SNR, expectation Figure 13. Figure 13 shows an example of a horse rate of 1/2. Figure 14 shows an example of a code rate of 5/6. Figure 15 shows an example of the minimum achievable data flow for a predetermined maximum number of decoding iterations. In the present example, the maximum number of decoding iterations is set to 20 解码 The decoder according to an embodiment of the present invention provides for the reduction/removal of short periods via the use of a finite set of shift values. PCM architecture, such reduction/removal is only performed with marginal loss of error rate performance. In addition, such decoders use equally distributed odd and even non-zero block rows per layer. Since every 21 201018094 cycles two subrows read/write There is no APP memory access conflict from the memory module. Such an LDPC decoder provides an increase in data traffic with only a limited amount of additional hardware management information. Figure 16 shows an embodiment of the present invention in accordance with an embodiment of the present invention. A method for decoding an encoded 5 data block. In step 1610, storing an encoded data block containing one of the data sub-blocks. In step 162, the irregular block is used to structure the same -

核對矩陣，以管線化方式執行解碼。於多數時鐘週期之各週期可讀取自且寫入PCM2至少兩個子區塊矩陣。資料子區塊之讀及寫係均勻分布於至少兩個記憶體模組間。解碼 Q 10係以移位值執行，其去除於或低於預定臨界值長度之週期。本發明之實施例可於多個組件諸如積體電路模組實施。積體電路之設計可藉大型高度自動化方法執行。複雜而強而有力之軟體工具可用於將邏輯位準設計轉成準備於半導體基材上蝕刻及形成之半導體電路設計。 15 程式諸如加州山景市西諾普系統公司（Synopsys，Inc.) 及加州聖荷西市凱登斯設計公司（Cadence Design)所提供之程式使用經過明破建立之設計規則及預先儲存之設計模 ® 組存庫而於半導體晶片上自動路由導體及定位組件。一1 已經元成半導體電路之設計，呈標準化電子格式（例如 2〇 Opus、GDSII等）之所得設計可傳送至半導體製造廠或稱作「fab」用於製造。刖文說明係藉說明性非限制性實例提供本發明之完整資訊性說明。但相關技藝界之熟諸技藝人士鐘於前文說明連同附圖及隨附之申請專利範圍一起研讀將顯然易知多項 22 201018094 修改及調整。但本發明之教示之全部此等修改及類似修改仍將落入本發明之範圍。此外’可未相對應使用其它特徵可優異地使用本發明之較佳實施例之若干特徵。如此，前文說明須單純考慮為 5本發明之原理之舉例說明而非限制性。【围式簡JfL软^明】第1圖顯示方程式（1)至(乃。 φ 第2圖顯示適合用於實施本發明之具體實施例之多種電子裝置之簡化區塊圖。 1〇第3圖顯讀—給定的碼速率，區塊結構碼與隨機 LDPC碼間之贿說明，單料分貝刚。第4圖顯示區塊結構化不規則同位核對矩陣。第5圖顯示根據本發明之實施例APP記憶體之組織。第6圖顯示根據本發明之實施例ROM模組之實施。第7圖顯不根據本發明之實施例一種核對記憶體之實 • 例。帛8圖顯不根據本發明之實施例之—種LDpc解碼器之方塊圖。第9圖顯不根據本發明之實施例之一種DFu之方塊圖。第0圖顯不根據本發明之實施例之一種減低的循環置換器之方塊圖。 ®顯不用於不同竭速率按時鐘週期中之解碼迭代之處理潛伏期延遲。圖顯不對多種碼字元長度之解碼資料流量相對於 23 201018094 碼速率之說明圖。第13圖顯示對不同迭代最大數目、碼大小1944、碼速率1/2之FER相對於SNR之說明圖。第14圖顯示對不同迭代最大數目、碼大小1944、碼速 5 率5/6之FER相對於SNR之說明圖。第15圖顯示對預定數目之迭代之最小可達成資料流量之說明圖。第16圖顯示根據本發明之實施例之一種方法。【主要元件符號説明】 212…無線網路 815...APP鏡面記憶體 214...使用者設備、UE 820…循環置換器 216…接取節點、AN 831...ROM1 218...資料處理器、DP 833 …R0M2 220...記憶體、MEM 836...ROM1 222...RF收發器 838 …ROM2 224...程式、PROG 840…控制器 226...資料處理器、DP 850.··核對記憶體 228...記憶體、MEM 855…鏡面記憶體 230... RF收發器 860…功能單元 232...程式、PROG 910…串列最小和功能單元 234...資料徑路 1010...2 ： 1 MUX 236...網際網路 1020... 2 ： 1 MUX 800…LDPC解碼器 1610…處理方塊 810...APP 模組 1620...處理方塊 24Check the matrix and perform decoding in a pipelined manner. Each of the plurality of clock cycles can be read from and written to at least two sub-block matrices of PCM2. The reading and writing of the data sub-blocks are evenly distributed between the at least two memory modules. The decoding Q 10 is performed with a shift value that is removed at or below a predetermined threshold length. Embodiments of the invention may be implemented in a plurality of components, such as integrated circuit modules. The design of the integrated circuit can be performed by a large, highly automated method. Complex and powerful software tools can be used to convert a logic level design into a semiconductor circuit design that is etched and formed on a semiconductor substrate. 15 Programs such as Synopsys, Inc., of California, and Cadence Design, San Jose, Calif., use programs that have been established through clear-cut design rules and pre-stored designs. The Model® group stores the conductors and the positioning components automatically on the semiconductor wafer. The design of the semiconductor circuit has been designed in a standardized electronic format (eg, 2〇 Opus, GDSII, etc.) and can be transferred to a semiconductor manufacturing facility or referred to as "fab" for manufacturing. The description of the present invention provides a complete and informative description of the invention by way of illustrative and non-limiting examples. However, it is obvious that the skilled artisans of the relevant artisans, together with the drawings and the accompanying patent application scope, will be able to understand a number of changes and adjustments. All such modifications and similar modifications of the teachings of the present invention will still fall within the scope of the invention. Further, several features of the preferred embodiment of the invention may be used to advantage without the use of other features. As such, the foregoing description is intended to be illustrative and not restrictive. [Simplified JfL Soft] Fig. 1 shows equations (1) to (yes. φ) Fig. 2 shows a simplified block diagram of various electronic devices suitable for implementing the specific embodiments of the present invention. Figure explicit reading - given code rate, bribe description between block structure code and random LDPC code, single material decibel just. Figure 4 shows block structured irregular parity check matrix. Figure 5 shows according to the invention Embodiment FIG. 6 shows the organization of the ROM module. Fig. 6 shows the implementation of the ROM module according to an embodiment of the present invention. Fig. 7 shows an example of a check memory according to an embodiment of the present invention. A block diagram of an LDpc decoder in accordance with an embodiment of the present invention. Figure 9 shows a block diagram of a DFu in accordance with an embodiment of the present invention. Figure 0 shows a reduced cyclic permutation in accordance with an embodiment of the present invention. The block diagram of the device is not used for the processing latency delay of the decoding iterations in different clock rates. The figure shows the decoding data flow of multiple code character lengths relative to the description of the 23 201018094 code rate. Different An illustration of the FER vs. SNR for the maximum number, code size 1944, and code rate 1/2. Figure 14 shows the FER vs. SNR for the maximum number of different iterations, the code size of 1944, and the code rate of 5 rate 5/6. Fig. 15 is a diagram showing the minimum achievable data flow for a predetermined number of iterations. Fig. 16 shows a method according to an embodiment of the present invention. [Main component symbol description] 212...Wireless network 815... APP mirror memory 214...user equipment, UE 820...cyclic permutator 216...access node, AN 831...ROM1 218...data processor, DP 833 ...R0M2 220...memory, MEM 836...ROM1 222...RF transceiver 838 ...ROM2 224...program, PROG 840...controller 226...data processor, DP 850.·check memory 228...memory, MEM 855... Mirror Memory 230... RF Transceiver 860... Function Unit 232... Program, PROG 910... Serial Minimal and Functional Unit 234... Data Path 1010...2: 1 MUX 236... Internet 1020... 2: 1 MUX 800...LDPC Decoder 1610...Process Block 810...APP Module 1620...Process Block 24

Claims

201018094 VII. Patent application scope: 1. A method comprising: storing a data block containing one of the data sub-blocks; and decoding the data in a pipelined manner using an irregular block structure homomorphic check matrix a block, where at least two data sub-block matrices of the parity check matrix are read and written in each cycle of a plurality of clock cycles, where the read and write of the data sub-blocks are It is evenly distributed between at least two regions of the Panichi memory, and the decoding is performed here with a shift value that can be eliminated at or below a predetermined threshold length. 2. As in the method of claim 1, the pipeline here contains at least three layers. 3. The method of claim 1, wherein the decoding is performed using a cyclic permutator that uses a shift value memory module to store the position and shift of the non-zero sub-block matrix. Value/relative offset value. 4. The method of claim 3, wherein the shift value memory module comprises read only memory. 5. If the method of claim 1 is applied, the critical length is 8 here. 6. If the method of claim 1 is applied, the layer is completely treated parallel. 7. The method of claim 1, wherein the first region of the memory stores the data of the odd block row from the parity check matrix and the second region of the memory is stored from the parity check matrix. The block of the block is 25 201018094. 8. If you apply for the method of item 1 of the patent scope, the data flow here is one billion yuan per second. 9. An apparatus, comprising: a memory configured to store a data block containing a bar code of one of a data sub-block; to use an irregular block to structure the parity check matrix to decode the data area in a pipelined manner a block processor, where at least two data sub-block matrices of the parity check matrix are read from and written in each cycle of a plurality of clock cycles, where the read and write of the data sub-blocks It is evenly distributed between at least two regions of a memory, and the decoding is performed here with a shift value that can be eliminated at or below a predetermined threshold length. 10. As for the equipment of claim 9 of the patent scope, a pipeline here contains at least three layers. 11. The device of claim 9, wherein the decoding is performed using a cyclic permutator that uses a shift value memory module to store the position and shift value of the non-zero sub-block matrix. / Relative offset value. 12. In the case of the device of the patent application, the shift value memory module contains read-only memory. 13. If the equipment of claim 9 is applied, the critical length is 8 here. 14. If the equipment of claim 9 is applied, the layer is completely treated parallel. 26 201018094 15. If the device of claim 9 and the surrounding item 9 is stored, the first area of the memory is stored, the data of the odd block of the check matrix and the first area of the case are stored. The even block data from the parity check matrix. < 16. For the equipment of the application for the patent field, item 9, the data flow here is at least 10,000 bits per second. D. 17. For the equipment of claim 9 of the patent scope, the equipment is here implemented in at least ~ integrated circuits. ~ 18. A computer-readable medium that can be read by a machine-readable instruction program. These read-readable instructions can be processed by digital processing to perform the following operations: The storage contains data 7 leaves are encoded in one of the blocks. Data block; and linearization = structured parity check matrix and pipe read: ί: 至少 check matrix at least two data sub-block matrices, ', enter each cycle of most clock cycles, and write The input system is evenly distributed in the presence of (four) degrees. 19. If the application is in the first floor of the Fan Park. The media of 18 items, here a pipeline contains at least three 2 〇 · If the patent application scope ring displacer is executed, the media 'here the decoding system uses a one-wheel permutator to use the shift value memory module to 27 201018094 The position of the non-zero sub-matrices and the shift value/relative offset value. 21. If the applicant applies for the 2G item, the shift value memory model group contains the read-only memory. 22· If you apply for the patent scope _ the media, here the layer has a complete treatment of parallelism. 23. As claimed in claim 18, one of the clocks in the first region stores data from the odd block rows of the parity check matrix and the first region of the memory is stored from the parity check matrix. Even block

data. 24. A device comprising: at least two means for storing - a data subblock of a coded data block, and i for pipelining a structured parity check matrix using a - irregular block a device for decoding the data block, where at least two data sub-block matrices of the same check matrix are read and written in each cycle of a majority of clock cycles,

Here, the reading and writing of the data sub-blocks are between the at least two regions of the sentence-memory, and; here the decanting is shifted by a period that can be eliminated or below a predetermined threshold. The bit value is executed. 25. The device of claim 24, wherein the pipeline comprises at least: a layer. - 26. The apparatus of claim 24, further comprising: a memory device for storing a non-zero sub-block matrix position and a shift value/relative offset 28 201018094 value. 27. The device of claim 24, wherein the memory comprises read-only memory. 28. If the device of claim 24 is applied, the layer is fully treated for parallelism. 29. The apparatus of claim 24, wherein a first storage device stores data from odd block rows of the parity check matrix and a second storage device stores even blocks from the parity check matrix. Information on the line.

29