[go: up one dir, main page]

US20070011592A1 - Decoder architecture for Reed Solomon codes - Google Patents

Decoder architecture for Reed Solomon codes Download PDF

Info

Publication number
US20070011592A1
US20070011592A1 US11/505,451 US50545106A US2007011592A1 US 20070011592 A1 US20070011592 A1 US 20070011592A1 US 50545106 A US50545106 A US 50545106A US 2007011592 A1 US2007011592 A1 US 2007011592A1
Authority
US
United States
Prior art keywords
inp
circumflex over
polynomial
feng
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/505,451
Inventor
Lisa Fredrickson
Leilei Song
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agere Systems LLC
Original Assignee
Agere Systems LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agere Systems LLC filed Critical Agere Systems LLC
Priority to US11/505,451 priority Critical patent/US20070011592A1/en
Assigned to AGERE SYSTEMS INC. reassignment AGERE SYSTEMS INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FREDRICKSON, LISA, SONG, LEILEI
Publication of US20070011592A1 publication Critical patent/US20070011592A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13Linear codes
    • H03M13/15Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
    • H03M13/151Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
    • H03M13/1545Determination of error locations, e.g. Chien search or other methods or arrangements for the determination of the roots of the error locator polynomial
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13Linear codes
    • H03M13/15Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
    • H03M13/151Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
    • H03M13/1525Determination and particular use of error location polynomials
    • H03M13/153Determination and particular use of error location polynomials using the Berlekamp-Massey algorithm
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13Linear codes
    • H03M13/15Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
    • H03M13/151Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
    • H03M13/158Finite field arithmetic processing

Definitions

  • the invention relates to an architecture for a Reed Soloman decoder with improved efficiency.
  • Reed Soloman codes are error-correcting codes used to improve the robustness of communication and storage systems.
  • Some symbols of the transmitted or stored codeword may be erroneous when recovered, and if the number of erroneous symbols is small enough, they can be found and corrected by a Reed Solomon decoder.
  • a typical Reed-Solomon error-correction system utilizes an encoder and decoder.
  • the encoder inputs the k data symbols and appends the r redundant symbols in such a manner that the n symbols of a Reed-Solomon codeword can be regarded a the coefficients of a polynomial of degree n ⁇ 1, which is a multiple of a generator polynomial of degree r ⁇ 1.
  • the roots of the generator polynomial are consecutive powers of a so-called primitive element, as it is commonly known in the literature, which is commonly referred to as ⁇ . These roots are ⁇ Ir0 , ⁇ Ir0+1 , ⁇ Ir0+2 , . . . , ⁇ Ir0+r ⁇ 1 ⁇ , where Ir 0 is an arbitrary logarithm in base ⁇ of the first root. In transmission or storage, some of the codeword symbols may be corrupted when received at the decoder.
  • a typical decoder architecture consists of three processing units.
  • the first is referred to here as a syndromer, whose purpose is to calculate the so-called syndromes, as they are known in the literature, from the received vector.
  • the syndromes are the coefficients of the Fourier transform of an error vector that the syndromer computes from the received vector.
  • the syndromer typically requires one clock cycle per recovered symbol.
  • the second is referred to here as a polynomial determining unit (PDU), and typically dominates the VLSI area of the decoder.
  • the PDU determines various polynomials from the syndromes, depending upon the algorithm employed, including an error locator polynomial.
  • the third processing unit is referred to here as a Chien searcher, which also typically requires one clock cycle per recovered symbol. The function of the Chien searcher is to find roots of the error locator polynomial, in order to locate the errors.
  • a PDU architecture which can process a Reed Solomon codeword within a number of clock cycles equal to the number of symbols in a codeword, n.
  • Early Reed-Solomon decoders utilized a single centralized Galois Field multiplier as part of a specialized arithmetic logic unit to perform the PDU functions.
  • O(t2) the problem of determining the polynomials from the syndromes is proportional in multiplications to the square of t, denoted O(t2), where t is the number of errors to be corrected.
  • O(t2) the number of errors to be corrected.
  • tmax the number of clock cycles to perform the PDU functions in these early implementations has exceeded n. Therefore, the focus today is on PDU architectures which utilize O(t) parallel multipliers and require O(t) clock cycles to determine the polynomials.
  • Feng revised Horiguchi's algorithm to make it more regular and suitable for VLSI implementation To achieve this, Feng reformulated the algorithm and used a special circuit to calculate a scaling factor, Bp. This was done to prevent an overflow in the scratch polynomial B(z).
  • An advantage of Feng's architecture is that the storage for iterative development of various polynomials is minimized, where the total storage is r m-bit registers used to hold the syndromes, t m-bit registers used to hold iterative solutions for ⁇ (z), and t m-bit registers used to hold iterative solutions for B(z), for a total of approximately 4tm registers in the polynomial determination unit.
  • a disadvantage of Feng's architecture is that it uses 3t multipliers, and these multipliers dominate the VLSI area of the PDU.
  • Fredrickson described a PDU architecture which used approximately 2t multipliers in a shared fashion for reduced area while retaining approximately the same iteration time as Feng.
  • the overall area and hence efficiency was improved, but the storage requirements for polynomials was larger than in Feng's architecture.
  • a decoder for error correcting codes comprises a syndrome calculation circuit.
  • a polynomial determining unit calculates polynomials from an output of the syndrome calculation circuit.
  • the polynomials include at least one polynomial that is rotated if the one polynomial overflows.
  • a method of decoding error correcting codes in accordance with yet another aspect of the present invention comprises calculating a plurality of syndromes.
  • a plurality of polynomials are calculated from the calculated plurality of syndromes. At least one of the calculated plurality of polynomials is rotated if the polynomial overflows.
  • a method of correcting errors for decoding error correcting codes in accordance with still another aspect of the present invention comprises calculating an error evaluation polynomial comprising a plurality of coefficients. Each of the plurality of coefficients of the error evaluation polynomial is multiplied by a first factor if each coefficient has overflowed. Each of the plurality of coefficients of the error evaluation polynomial is multiplied by a second factor different from the first factor if each coefficient has not overflowed, whereby the error evaluation polynomial is evaluated.
  • FIG. 1 shows a part of the Berlekamp-Massey algorithm used to determine a locator polynomial according to the background art
  • FIG. 3 shows an algorithm according to a preferred embodiment of the invention
  • FIG. 4 shows a decoder PDU slice according to an embodiment of the invention
  • FIG. 6 shows the use of a dual multiplier circuit according to an embodiment of the invention in performing a Chien search
  • FIG. 7A is a partial view of a circuit implementation according to the background art, implementing the algorithm of FIG. 2 ;
  • FIG. 7B is a further partial view of the background art circuit implementation according to FIG. 7A , implementing the algorithm of FIG.2 ;
  • FIG. 8A is a partial view of a correction block associated with the background art circuit of FIGS. 7A and 7B ;
  • FIG. 8B is a further partial view of the correction block according to the background art.
  • a highly efficient PDU architecture which uses a total of t multipliers in a shared fashion for reduced area while retaining approximately the same iteration time and polynomial storage efficiency of Feng. This is achieved by sharing a single multiplier in each 'slice of the circuit and employing it three times in successive clock cycles.
  • the architecture disclosed here is easily extended to an error-and-erasure decoder with improved efficiency, combining the methods of the disclosed architecture with erasure locator polynomial pre-processing methods that are well-known in the art.
  • step 100 the syndromes are input, and in step 110 the coefficients are updated.
  • U(z) is a sequence of r rearranged syndrome coefficients, while ⁇ is of degree t max and B is of degree r+1.
  • i is used to count the r iterations of the algorithm. Assume that at most t errors have occurred. The body of each iteration is subdivided into three steps 120 , 130 and 140 .
  • step 120 the outputs of t max multipliers, which scale coefficients of the current ⁇ 1 (z) by various coefficients of U, are summed to produce a so-called discrepancy, ⁇ , while the coefficients of B are shifted.
  • step 120 The result of step 120 is used in step 130 .
  • t max multipliers scale the coefficients of the current B i (z) by ⁇ to produce an additive change in the coefficients of ⁇ i (z) .
  • step 130 The results of step 130 are used in step 140 . If the logical condition is true, the integer j is updated and t max multipliers scale the coefficients of B (i+1) (z) by ⁇ to normalize them. The iteration counter i is incremented.
  • step 150 i is compared to r, and if i ⁇ r, the algorithm proceeds with the next iteration, and otherwise the polynomials ⁇ (z) and B(z) have been determined and are output in step 160 .
  • a shortcoming of the algorithm as shown in FIG. 1 is that r+1 symbol storage registers are needed for B. Feng showed a method of reducing this storage as shown in the algorithm of FIG. 2 .
  • the polynomial A of degree at most t max plays a similar role to polynomial B in FIG. 1 .
  • Feng notes that at most t max consecutive coefficients of B are non-zero, and the new variable p counts the number of times the polynomial would have been shifted beyond the storage limits of A in FIG. 1 .
  • FIGS. 7A, 7B , 8 A and 8 B Feng's circuit implementation is shown in FIGS. 7A, 7B , 8 A and 8 B.
  • FIGS. 7A and 7B show partial views of a BMA (Berlekamp-Massey Architecture) circuit block employed to compute the error locator polynomial and the error evaluation polynomial from the output of a syndrome computation block (not shown), and
  • FIGS. 8A and 8B show partial views of a correction circuit block that performs a Chien search and error evaluation on the output of the BMA block.
  • BMA Binarylekamp-Massey Architecture
  • the partial view of Feng's BMA block in FIG.7A shows multipliers 700 1 , 700 2 , . . . , 700 n and 718 1 , 718 2 , . . . , 718 n , adders 705 1 , 705 2 , . . . 705 n , registers 710 1 , 710 2 , . . . , 710 n , 715 1 , 715 2 , . . . , 715 n , 716 and 717 , summer 720 , Galois Field inverter 730 and decision block 740 .
  • the Galois field inverter 730 computes the inverse ⁇ ⁇ 1 of the discrepancy ⁇ .
  • the multipliers 700 1 , 700 2 , . . . , 700 n and 718 1 , 718 2 , . . . , 718 n and adders 705 1 , 705 2 , . . . , 705 n perform Galois field arithmetic. Multiplexers used for loading registers 715 1 , 715 2 , . . . , 715 n , 716 and 717 are omitted for clarity. Registers 715 1 , 715 2 , . . . , 715 n , 716 and 717 are initialized to be the syndromes calculated in the syndrome calculation block and registers 710 1 , 710 2 , . . . , 710 n are initialized to be zero. A critical path is shown by the dotted line at 745 .
  • FIG.7B The remaining circuitry in Feng's BMA block is shown in FIG.7B , where 750 1 . . . 750 n and 755 are multipliers, 760 0 , 760 1 , . . . , 760 n , 765 and 780 0 , 780 1 , . . . , 780 n are multiplexers 770 0 , 770 1 , . . . , 770 n and 775 are registers, 785 is an AND gate, and 790 is a decision block.
  • the registers 770 1 , . . . , 770 n and 775 are initialized to zero, and register 770 0 is initialized to 1. Outputs are obtained at terminals 795 0 , 795 1 , . . . , 795 n and 796 .
  • FIG. 8A is a partial view of the correction block according to Feng, in which 801 1 , 801 2 , . . . , 801 n and 802 1 , 802 2 , 802 n are multiplexers, 810 1 , 810 2 , . . . , 810 n and 811 1 , 811 2 , . . . , 811 n are registers, 815 1 , 815 2 , . . . , 815 n , 816 1 , 816 2 , . . .
  • 816 n and 831 are multipliers
  • 827 and 828 are summers
  • 830 and 845 are adders
  • 833 is a decision block
  • 835 is a Galois Field inverter
  • 840 is an OR gate
  • 837 represents a critical path.
  • FIG. 8B shows recursive circuitry on the right hand side to calculate the numerator of the expression in paragraph earlier as a product of Feng's terms Ba and Bp, the recursive circuitry including a multiplexer 856 , registers 861 and 862 , and a multiplier 866 .
  • On the left hand side are circuits to evaluate the polynomial B(z), employing multiplexers 855 0 , 855 1 , . . . , 855 n , registers 860 0 , 860 1 , . . . , 860 n , multipliers 865 1 , . . . , 865 n and a summer 870 .
  • the outputs of multiplier 866 and summer 870 are then multiplied together by multiplier 880 . 875 represents a critical path.
  • FIG. 3 A preferred algorithm according to the invention is shown in FIG. 3 , which is closely related to a preferred embodiment of the apparatus.
  • the syndromes are input in step 300 , and the coefficients updated in step 310 .
  • an iteration cycle of the previous algorithms has been split into three distinct steps 320 , 330 and 340 , each of which takes one cycle to execute.
  • Feng's implementation requires 3t multipliers to execute steps 220 , 230 and 240 within one clock cycle. By separating these steps into distinct cycles, only t multipliers are required which can be used in various ways to execute each part of the algorithm. Since these multipliers dominate the VLSI area of the PDU, the area is desirably reduced.
  • FIG. 4 the schematic of a submodule referred to as a slice is shown.
  • the slice contains a number of two-way multiplexers 410 , 412 , 414 , 416 , 418 , 420 , 422 , 424 , 426 and 428 , each having three inputs including a select input, such that when the select input is 1(0) the input 1(0) appears on the output, a Galois Field adder 430 , which inputs two 8-bit quantities and outputs their finite field sum, and is equivalent to a bit-wise XOR gate, a Galois Field multiplier 432 , which inputs two 8-bit quantities and outputs their finite field product, and registers 440 , 442 , 444 and 446 , which are D-type clocked flip-flops to hold terms ⁇ , B, o, and U.
  • o is an overflow flag indicating whether B(z) has overflowed, and is either 0 or 1.
  • An o input is provided at 402 , a B input at 404 , an o output at 452 , a B output at 454 , a ⁇ output at 456 , and a multiplier output at 458 .
  • register 446 holds a syndrome value denoted U j
  • register 444 holds a term of the locator polynomial ⁇
  • register 442 holds a term of the B polynomial
  • register 442 holds a flag that shows if the term of the B polynomial has resulted from an overflow during rotation.
  • Slice k holds the coefficients of Z k in the polynomials B(z) and ⁇ (z).
  • the signal init is asserted, which loads registers 440 , 442 , and 444 with all zeroes, and register 446 with an external syndrome term at the rising edge of the clock. All slices except for the implied first one are initialized in this way. The initial value of ⁇ (z) is 1, so all slices except for the implied first one are set to zero. Similarly, the initial B(z) is 1. After initialization, the signal init is reset to zero for the remainder of decoding, and 2t three-step iterations are performed to complete the decoding, where t is the maximum number of errors to be corrected.
  • control signals m0sel, m1sel and m2sel are all asserted (set to 1), while signals m3sel, m4sel and m5sel are deasserted (set to 0).
  • the product mulout k is the Galois field product of a term of U and a term of ⁇ , which is summed with all such terms from all slices in an external Galois Field adder to form the discrepancy.
  • the terms of the three polynomials U, B and 7 are rotated within a slice, so that the term of 7 moves to register 446 , a term of B moves up to register 444 , and a term of U is in register 442 .
  • the B ⁇ k>output of slice k is connected to the B input ⁇ k+1>of slice ⁇ k+1>.
  • the terms of B are shifted ahead from one slice to the next, accomplishing the B (i+1) (z) ⁇ ROTATED [B i (z)] function.
  • the modified B(z) is zB(z), where the highest order term, if it overflows, is shifted back to the lowest order slice, setting the accompanying o(z) flag.
  • control signals m1sel, m2sel and m5sel are all asserted (1), while control signals m4sel and m0sel are deasserted (0).
  • Signal m3sel may or may not be asserted.
  • the product mulout k is the Galois Field product of a term of B and discrepancy ).
  • the Galois Field sum ⁇ k ⁇ ( ⁇ B k ) is one term of the function ⁇ (i+1) (z) ⁇ i (z)+ ⁇ B (i+1) (z), which is loaded in register 442 at the end of iteration step 330 .
  • U U (i+1) (z) ⁇ ROTATED [U i (z)] one of the terms of which is loaded in register 444 at the end of iteration step 330 .
  • U (i+1) (z) ⁇ ROTATED [U i (z)] one of the terms of which is loaded in register 444 at the end of iteration step 330 .
  • These are shifted from one slice to the next using the same B input ⁇ k+1>and B ⁇ k>inter-slice connections used to shift B in iteration step 320 .
  • the value in register 440 is unmodified.
  • An external circuit determines if any of the bits of the discrepancy are non-zero by OR-ing all the bits together. Two external quantities i and 2j are compared, and the result of the comparison is logically AND-ed with the output of the non-zero detection OR gate. This logical signal externally drives m3sel. If m3sel is asserted, the value of register 446 remains unchanged at the end of iteration step 330 , and register 446 retains the unmodified term ⁇ k , accomplishing the function: B (i+1) (z) ⁇ i (z).
  • the value of register 446 is updated at the end of iteration step 330 by taking on the value B k that was in register 444 at the end of iteration step 320 .
  • an external circuit performs a Galois field inversion on the quantity ⁇ , inverts the least significant bit of the result, and loads the modified inversion, denoted delinvp1, in an external register at the end of iteration step 330 .
  • m4sel is asserted, m0sel, m2sel, m3sel and m5sel are deasserted, and m1sel takes the same value that m3sel took in iteration step 330 .
  • the product mulout k is the Galois field product of a term of B and delinvp1.
  • register 442 is updated at the end of iteration step 340 by taking on the value Bk that was in register 446 at the end of step 330 .
  • register 446 takes on the value U k that was in register 444 at the end of step 330 . Since m2sel is deasserted, register 444 takes on the value ⁇ k that was in register 442 at the end of step 330 . Register 440 is unmodified.
  • the updated polynomial terms B, A and U have returned to their original positions at the end of the three steps 320 , 330 and 340 making up one iteration.
  • the polynomials B and ⁇ have taken their final values, and are used by the Chien search unit to determine error locations and values.
  • a unique feature of the slice shown in FIG. 4 is that the terms ⁇ , B, and U are contained in time-varying registers within a slice. This allows the terms which are multiplied to be shifted to the multiplier inputs as needed, so that each multiplier is shared in a high-speed implementation. It also allows for re-use of the polynomial rotation circuitry so that the terms of the B polynomial are shifted in step 320 , while the terms of the U polynomial are shifted using the same circuitry in step 330 .
  • Feng's circuitry was sliced to contain one term of the equivalent ⁇ , A, and U and their associated logic.
  • Table 1 compares the efficiency of the slice shown in FIG. 4 to that of Feng. Each was estimated using synthesis results for a decoder designed for 9 bit symbols. As can be observed from Table 1, Feng's implementation uses nearly twice the VLSI area, but the two implementations have nearly equal iteration delays. Hence, this improved implementation is nearly twice as efficient.
  • FIG. 5 shows the schematic of the entire PDU according to the embodiment of FIG.4 .
  • the PDU includes a number of slices 500 1 , 500 2 , 500 3 , . . . 500 20 , having respective o inputs 502 1 , 502 2 , 502 3 , . . .
  • auxiliary logic indicated generally at 555 including an OR gate 550 , a decision block 552 and multiplexers 560 and 562 .
  • the register output value is fed back to multiplexer 560 at terminal 561 .
  • Delay line 510 includes multiplexers 512 1 , 512 2 , 512 3 . . .
  • the delay line 510 is shifted once very three cycles.
  • An output 518 from the delay line 510 is supplied to input terminal 563 of multiplexer 562 , where it is also combined with the output of multiplexer 560 , to feed the B input 504 of the first slice 500 1 , corresponding to the B input 404 in FIG. 4 .
  • Decision block 552 detects a non-zero condition to determine the value of the o input 502 1 to the first slice 500 1 , corresponding to the o input 402 in FIG. 4 .
  • OR gate 550 is employed to detect a first cycle condition.
  • the present invention also eliminates Feng's special circuit for calculating the numerator of the error value expression. To do this, the Applicants make a judicious choice of Ir0 to use the simplified version of the error value expression, i.e., with numerator 1, and handle storage of the B polynomial in a proprietary fashion.
  • the storage of a constant term for the polynomial B(z) is not required, because initialization of B(z) is performed by the OR gate 550 in the first cycle of Berlekamp-Massey iterations, as shown in FIG. 5 .
  • a non-zero coefficient in the polynomial o(z) indicates that the corresponding term of the resulting polynomial B(z) is the result of overflowing the allotted storage.
  • the polynomial B(z) is typically evaluated in a circuit as shown on the left hand side of FIG. 8B , as previously discussed.
  • the register containing B i has been multiplied by ⁇ ⁇ i .

Landscapes

  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)
  • Detection And Correction Of Errors (AREA)

Abstract

A Reed Solomon decoder architecture. The architecture uses a modified version of the error-evaluator polynomial form proposed by Horiguchi, and later improved by Feng. The architecture is an improvement over Feng in that the area of the dominant PDU unit has been significantly reduced, while maintaining nearly the same iteration time, in novel slice circuitry which rotates terms to share a common multiplier and other circuitry. In addition, a novel implementation of storage of the B polynomial and associated overflow flags allows its storage to be minimized and provides equivalent functioning of the Chien search unit using a proprietary dual-multiplier arrangement in place of random multipliers used by Feng.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The invention relates to an architecture for a Reed Soloman decoder with improved efficiency.
  • 2. Background
  • Reed Soloman codes are error-correcting codes used to improve the robustness of communication and storage systems. A data stream, consisting of k m-bit symbols, is typically protected against errors by appending r m-bit redundant symbols, so that the concatenated stream of n=k+r m-bit symbols forms a Reed Solomon codeword. Some symbols of the transmitted or stored codeword may be erroneous when recovered, and if the number of erroneous symbols is small enough, they can be found and corrected by a Reed Solomon decoder.
  • In some systems, the recovery circuitry generates a measure of each symbol's reliability as the recovered symbols are produced. If a symbol is regarded as unreliable, an erasure flag is generated to indicate its location to the Reed Solomon decoder. If s flagged and t unflagged erroneous symbols are recovered, a Reed Solomon decoder designed to process these erasure flags is capable of correcting the errors if 2t+s<=r . We refer to such a decoder as an error-and-erasure decoder.
  • In other systems the recovery circuitry does not process erasure flags, and such a decoder is therefore referred to as an error-only decoder. If t erroneous symbols (i.e., t errors) are recovered, an error-only Reed Solomon decoder is capable of correcting the errors if 2t<=r.
  • A typical Reed-Solomon error-correction system utilizes an encoder and decoder. The encoder inputs the k data symbols and appends the r redundant symbols in such a manner that the n symbols of a Reed-Solomon codeword can be regarded a the coefficients of a polynomial of degree n−1, which is a multiple of a generator polynomial of degree r−1. The roots of the generator polynomial are consecutive powers of a so-called primitive element, as it is commonly known in the literature, which is commonly referred to as α. These roots are
    Ir0, αIr0+1, αIr0+2, . . . , ═Ir0+r−1},
    where Ir0 is an arbitrary logarithm in base α of the first root. In transmission or storage, some of the codeword symbols may be corrupted when received at the decoder.
  • A typical decoder architecture consists of three processing units. The first is referred to here as a syndromer, whose purpose is to calculate the so-called syndromes, as they are known in the literature, from the received vector. The syndromes are the coefficients of the Fourier transform of an error vector that the syndromer computes from the received vector. The syndromer typically requires one clock cycle per recovered symbol. The second is referred to here as a polynomial determining unit (PDU), and typically dominates the VLSI area of the decoder. The PDU determines various polynomials from the syndromes, depending upon the algorithm employed, including an error locator polynomial. The third processing unit is referred to here as a Chien searcher, which also typically requires one clock cycle per recovered symbol. The function of the Chien searcher is to find roots of the error locator polynomial, in order to locate the errors.
  • For high throughput, a PDU architecture is desired which can process a Reed Solomon codeword within a number of clock cycles equal to the number of symbols in a codeword, n. Early Reed-Solomon decoders utilized a single centralized Galois Field multiplier as part of a specialized arithmetic logic unit to perform the PDU functions. In this case, the problem of determining the polynomials from the syndromes is proportional in multiplications to the square of t, denoted O(t2), where t is the number of errors to be corrected. However, as the maximum number of errors, tmax, to be corrected has grown, the number of clock cycles to perform the PDU functions in these early implementations has exceeded n. Therefore, the focus today is on PDU architectures which utilize O(t) parallel multipliers and require O(t) clock cycles to determine the polynomials.
  • Berlekamp and Massey described algorithms for decoding Reed Solomon codes by determining certain polynomials. Although these algorithms have been slightly refined over time, they are still generally referred to in the literature as the Berlekamp-Massey algorithm. Typical implementations of the Berlekamp-Massey algorithm use the syndromes to determine an error locator polynomial, Λ(z) , and an error evaluator polynomial, Ω(z). As described by Berlekamp, the calculation of these two polynomials uses two additional scratch polynomials, A(z) and B(z). Each of the polynomials is approximately of degree t.
  • Later, Blahut described a method of determining Λ(z) from the syndromes using only the scratch polynomial B(z). Ω(z) was then determined in an additional step, convolving the syndrome polynomial S(z) with the locator polynomial, i.e., Ω(z) =S(z)·Λ(z). Further, Horiguchi showed that it is not necessary to determine the polynomial Ω(z); the errors can be directly determined from Λ(z) and one of the scratch polynomials used to produce it, B(z). Berlekamp's storage of two polynomials or Blahut's additional convolution can be eliminated to increase efficiency.
  • Feng revised Horiguchi's algorithm to make it more regular and suitable for VLSI implementation. To achieve this, Feng reformulated the algorithm and used a special circuit to calculate a scaling factor, Bp. This was done to prevent an overflow in the scratch polynomial B(z). An advantage of Feng's architecture is that the storage for iterative development of various polynomials is minimized, where the total storage is r m-bit registers used to hold the syndromes, t m-bit registers used to hold iterative solutions for Λ(z), and t m-bit registers used to hold iterative solutions for B(z), for a total of approximately 4tm registers in the polynomial determination unit. A disadvantage of Feng's architecture is that it uses 3t multipliers, and these multipliers dominate the VLSI area of the PDU.
  • Fredrickson described a PDU architecture which used approximately 2t multipliers in a shared fashion for reduced area while retaining approximately the same iteration time as Feng. The overall area and hence efficiency was improved, but the storage requirements for polynomials was larger than in Feng's architecture.
  • SUMMARY OF THE INVENTION
  • In accordance with the principles of the present invention, a decoder for error correcting codes comprises a syndrome calculation circuit. A polynomial determining unit calculates polynomials from an output of the syndrome calculation circuit. The polynomials include at least one polynomial that is rotated if the one polynomial overflows.
  • A correction circuit for a decoder for error correcting codes in accordance with another aspect of the present invention comprises a correction circuit to evaluate an error evaluation polynomial. The correction circuit comprises a first multiplier employed when a scratch polynomial has overflowed, and a second multiplier different from the first multiplier employed when the scratch polynomial has not overflowed.
  • A method of decoding error correcting codes in accordance with yet another aspect of the present invention comprises calculating a plurality of syndromes. A plurality of polynomials are calculated from the calculated plurality of syndromes. At least one of the calculated plurality of polynomials is rotated if the polynomial overflows.
  • A method of correcting errors for decoding error correcting codes in accordance with still another aspect of the present invention comprises calculating an error evaluation polynomial comprising a plurality of coefficients. Each of the plurality of coefficients of the error evaluation polynomial is multiplied by a first factor if each coefficient has overflowed. Each of the plurality of coefficients of the error evaluation polynomial is multiplied by a second factor different from the first factor if each coefficient has not overflowed, whereby the error evaluation polynomial is evaluated.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Features and advantages of the present invention will become apparent to those skilled in the art from the following description with reference to the drawings, in which:
  • FIG. 1 shows a part of the Berlekamp-Massey algorithm used to determine a locator polynomial according to the background art;
  • FIG. 2 shows a part of the Berlekamp-Massey algorithm used to determine a locator polynomial as modified by Feng, according to the background art;
  • FIG. 3 shows an algorithm according to a preferred embodiment of the invention;
  • FIG. 4 shows a decoder PDU slice according to an embodiment of the invention;
  • FIG. 5 is a block diagram of the PDU according to an embodiment of the invention;
  • FIG. 6 shows the use of a dual multiplier circuit according to an embodiment of the invention in performing a Chien search;
  • FIG. 7A is a partial view of a circuit implementation according to the background art, implementing the algorithm of FIG. 2;
  • FIG. 7B is a further partial view of the background art circuit implementation according to FIG. 7A, implementing the algorithm of FIG.2;
  • FIG. 8A is a partial view of a correction block associated with the background art circuit of FIGS. 7A and 7B;
  • FIG. 8B is a further partial view of the correction block according to the background art.
  • DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
  • In accordance with the principles of the present invention, a highly efficient PDU architecture is provided which uses a total of t multipliers in a shared fashion for reduced area while retaining approximately the same iteration time and polynomial storage efficiency of Feng. This is achieved by sharing a single multiplier in each 'slice of the circuit and employing it three times in successive clock cycles.
  • In addition, a proprietary alternative to Feng's scaling factor circuitry is disclosed which incorporates rotation of the scratch polynomial B(z) when B(z) overflows, and employs a modified Chien searcher with specialized dual constant multipliers to multiply by two different factors according to whether or not B(z) has overflowed, to eliminate the determination of the scaling factor Bp.
  • In a preferred embodiment of the invention, the described architecture does not process erasure flags, and is therefore an error-only decoder. If t erroneous symbols (i.e., t errors) are recovered, the disclosed Reed Solomon decoder is capable of correcting the errors if 2t<=r. However, the architecture disclosed here is easily extended to an error-and-erasure decoder with improved efficiency, combining the methods of the disclosed architecture with erasure locator polynomial pre-processing methods that are well-known in the art.
  • The Berlekamp-Massey algorithm, as discussed in the Background of the Invention, for determining polynomials Λ(z) and B(z) from the syndromes is shown as a flowchart here for reference in FIG. 1. In step 100 the syndromes are input, and in step 110 the coefficients are updated. In FIG. 1, it is convenient to use polynomial notation for a stored sequence of coefficients, {A0, A1, A2, . . . } where
    A(z)=ΣAiZi
    In FIG. 1, U(z) is a sequence of r rearranged syndrome coefficients, while Λis of degree tmax and B is of degree r+1. The notation
    B(i+1)(z)←zB1(z)
    indicates that the B coefficients are updated in a shifting operation,
    B0 (i+1)=0,B1 (i+1)=B0 1,B2 (i+1)=B1 i, . . . , Br (i+1)=B(r−1) i
    and the notation
    U(i+1)(z)←ROTATED [Ui(z)]
    indicates that the U coefficients are updated in a shifting operation, with the highest degree coefficient shifting to the lowest degree,
    U0 (i+1)=U(r−1) (i), U1 (i+1)=U0 (i), U2 (i+1)=U1 (i), . . . , U(r−1) (i+1)=U(r−2) (i)
  • In FIG. 1, i is used to count the r iterations of the algorithm. Assume that at most t errors have occurred. The body of each iteration is subdivided into three steps 120, 130 and 140.
  • In step 120, the outputs of tmax multipliers, which scale coefficients of the current Λ1(z) by various coefficients of U, are summed to produce a so-called discrepancy, Λ, while the coefficients of B are shifted.
  • The result of step 120 is used in step 130. When Λ≠0, tmax multipliers scale the coefficients of the current Bi(z) by Δ to produce an additive change in the coefficients of Λi(z) . In addition, a non-zero Δ can be inverted to produce Φ=1/Δ, the coefficients of B are loaded with the coefficients of Λi(z) if a logical condition is true, and the coefficients of U are rotated.
  • The results of step 130 are used in step 140. If the logical condition is true, the integer j is updated and tmax multipliers scale the coefficients of B(i+1)(z) by Φ to normalize them. The iteration counter i is incremented.
  • In step 150, i is compared to r, and if i<r, the algorithm proceeds with the next iteration, and otherwise the polynomials Λ(z) and B(z) have been determined and are output in step 160.
  • The roots of the polynomial Λ(z) yield the error locations in that
    Λ(α-i)=0
    implies that the codeword symbol with index n−1−i is in error, where n is the number of symbols in the codeword. When Λ(α−i)=0, the codeword symbol with index n−1−i is corrected by adding the error value,
    e(n−1−i)(−i)(r+Ir0−1)/((B(α−iodd−i))
    where Λodd(Z) is the polynomial obtained from Λ(z) by deleting all terms with even powers of z. When Ir0 is chosen so that r+Ir0−1 is the multiplicative order of α in the Galois field, the formula simplifies to
    e(n−1−i)=1/((B(α−iodd−i)
  • A shortcoming of the algorithm as shown in FIG. 1 is that r+1 symbol storage registers are needed for B. Feng showed a method of reducing this storage as shown in the algorithm of FIG. 2.
  • The algorithm of FIG. 2 includes similar steps to those of FIG.1, in that the syndromes are input in step 200, the coefficients are updated in step 210, then each iteration is carried out in steps 220, 230 and 240, step 250 determines if i=r, and if not then steps 220, 230 and 240 are repeated, and the results are output in step 260.
  • In FIG. 2, the polynomial A of degree at most tmax plays a similar role to polynomial B in FIG. 1. Feng notes that at most tmax consecutive coefficients of B are non-zero, and the new variable p counts the number of times the polynomial would have been shifted beyond the storage limits of A in FIG. 1. Assuming that at most t errors have occurred and comparing the resulting polynomials of FIGS. 1 and 2, Feng shows that the polynomial Λ(z) is the same, and
    B(z)=zpA(z).
  • When Λ(α−i)=0, the codeword symbol with n−i−1 is corrected by adding the error value, e ( n - 1 - i ) = α ( - i ) ( r + lr 0 - 1 ) / ( ( α - ip A ( α - i ) Λ odd ( α - i ) ) = α ( - i ) ( r + lr 0 - 1 - p ) / ( ( A ( α - i ) Λ odd ( α - i ) )
  • Feng's circuit implementation is shown in FIGS. 7A, 7B, 8A and 8B. FIGS. 7A and 7B show partial views of a BMA (Berlekamp-Massey Architecture) circuit block employed to compute the error locator polynomial and the error evaluation polynomial from the output of a syndrome computation block (not shown), and FIGS. 8A and 8B show partial views of a correction circuit block that performs a Chien search and error evaluation on the output of the BMA block.
  • The partial view of Feng's BMA block in FIG.7A shows multipliers 700 1, 700 2, . . . , 700 n and 718 1, 718 2, . . . , 718 n, adders 705 1, 705 2, . . . 705 n, registers 710 1, 710 2, . . . , 710 n, 715 1, 715 2, . . . ,715 n, 716 and 717, summer 720, Galois Field inverter 730 and decision block 740. The Galois field inverter 730 computes the inverse Δ−1 of the discrepancy Δ. The multipliers 700 1, 700 2, . . . , 700 n and 718 1, 718 2, . . . ,718 n and adders 705 1, 705 2, . . . , 705 n perform Galois field arithmetic. Multiplexers used for loading registers 715 1, 715 2, . . . , 715 n, 716 and 717 are omitted for clarity. Registers 715 1, 715 2, . . . ,715 n, 716 and 717 are initialized to be the syndromes calculated in the syndrome calculation block and registers 710 1, 710 2, . . . , 710 n are initialized to be zero. A critical path is shown by the dotted line at 745.
  • The remaining circuitry in Feng's BMA block is shown in FIG.7B, where 750 1. . . 750 n and 755 are multipliers, 760 0, 760 1, . . . , 760 n, 765 and 780 0, 780 1, . . . , 780n are multiplexers 770 0, 770 1, . . . ,770n and 775 are registers, 785 is an AND gate, and 790 is a decision block. The registers 770 1, . . . , 770 n and 775 are initialized to zero, and register 770 0 is initialized to 1. Outputs are obtained at terminals 795 0, 795 1, . . . , 795 n and 796.
  • FIG. 8A is a partial view of the correction block according to Feng, in which 801 1, 801 2, . . . ,801 n and 802 1, 802 2, 802 n are multiplexers, 810 1, 810 2, . . . , 810 n and 811 1, 811 2, . . . , 811 n are registers, 815 1, 815 2, . . . , 815 n, 816 1, 816 2, . . . , 816 n and 831 are multipliers, 827 and 828 are summers, 830 and 845 are adders, 833 is a decision block, 835 is a Galois Field inverter, 840 is an OR gate, and 837 represents a critical path.
  • FIG. 8B shows recursive circuitry on the right hand side to calculate the numerator of the expression in paragraph earlier as a product of Feng's terms Ba and Bp, the recursive circuitry including a multiplexer 856, registers 861 and 862, and a multiplier 866. On the left hand side are circuits to evaluate the polynomial B(z), employing multiplexers 855 0, 855 1, . . . , 855 n, registers 860 0, 860 1, . . . , 860 n, multipliers 865 1, . . . , 865 n and a summer 870. The outputs of multiplier 866 and summer 870 are then multiplied together by multiplier 880. 875 represents a critical path.
  • A preferred algorithm according to the invention is shown in FIG. 3, which is closely related to a preferred embodiment of the apparatus. The syndromes are input in step 300, and the coefficients updated in step 310. In FIG. 3, an iteration cycle of the previous algorithms has been split into three distinct steps 320, 330 and 340, each of which takes one cycle to execute. Feng's implementation requires 3t multipliers to execute steps 220, 230 and 240 within one clock cycle. By separating these steps into distinct cycles, only t multipliers are required which can be used in various ways to execute each part of the algorithm. Since these multipliers dominate the VLSI area of the PDU, the area is desirably reduced. In step 350, i is compared to r, and if i is not equal to r, then steps 320, 330 and 340 are repeated. If i=r, then the results are output in step 360.
  • In FIG. 4, the schematic of a submodule referred to as a slice is shown. The slice contains a number of two- way multiplexers 410, 412, 414, 416, 418, 420, 422, 424, 426 and 428, each having three inputs including a select input, such that when the select input is 1(0) the input 1(0) appears on the output, a Galois Field adder 430, which inputs two 8-bit quantities and outputs their finite field sum, and is equivalent to a bit-wise XOR gate, a Galois Field multiplier 432, which inputs two 8-bit quantities and outputs their finite field product, and registers 440, 442, 444 and 446, which are D-type clocked flip-flops to hold terms Λ, B, o, and U. On the rising edge of the clock signal clk the input values are stored in the flip-flops and appear at the output. The term o is an overflow flag indicating whether B(z) has overflowed, and is either 0 or 1. An o input is provided at 402, a B input at 404, an o output at 452, a B output at 454, a Λ output at 456, and a multiplier output at 458.
  • In general, at the beginning of an iteration (at the start of step 320), register 446 holds a syndrome value denoted Uj, register 444 holds a term of the locator polynomial Λ, register 442 holds a term of the B polynomial, and register 442 holds a flag that shows if the term of the B polynomial has resulted from an overflow during rotation. Slice k holds the coefficients of Zk in the polynomials B(z) and Λ(z).
  • At the start of decoding, the signal init is asserted, which loads registers 440, 442, and 444 with all zeroes, and register 446 with an external syndrome term at the rising edge of the clock. All slices except for the implied first one are initialized in this way. The initial value of Λ(z) is 1, so all slices except for the implied first one are set to zero. Similarly, the initial B(z) is 1. After initialization, the signal init is reset to zero for the remainder of decoding, and 2t three-step iterations are performed to complete the decoding, where t is the maximum number of errors to be corrected.
  • At the start of iteration step 320, control signals m0sel, m1sel and m2sel are all asserted (set to 1), while signals m3sel, m4sel and m5sel are deasserted (set to 0). The product muloutk is the Galois field product of a term of U and a term of Λ, which is summed with all such terms from all slices in an external Galois Field adder to form the discrepancy. At the end of iteration step 320, is loaded into an external register that feeds back the registered signal to the slices. Also, at the end of the iteration step 320, the terms of the three polynomials U, B and 7 are rotated within a slice, so that the term of 7 moves to register 446, a term of B moves up to register 444, and a term of U is in register 442. The B<k>output of slice k is connected to the Binput<k+1>of slice <k+1>. As these B terms are registered, the terms of B are shifted ahead from one slice to the next, accomplishing the
    B(i+1)(z)←ROTATED [Bi(z)]
    function. In this function, the modified B(z) is zB(z), where the highest order term, if it overflows, is shifted back to the lowest order slice, setting the accompanying o(z) flag.
  • At the start of iteration step 330, control signals m1sel, m2sel and m5sel are all asserted (1), while control signals m4sel and m0sel are deasserted (0). Signal m3sel may or may not be asserted. The product muloutk is the Galois Field product of a term of B and discrepancy ). The Galois Field sum Λk⊕(Δ⊕Bk) is one term of the function
    Λ(i+1)(z)←Λi(z)+ΔB(i+1)(z),
    which is loaded in register 442 at the end of iteration step 330. The terms of U are shifted ahead from one slice to the next, accomplishing the function:
    U(i+1)(z)←ROTATED [Ui(z)]
    one of the terms of which is loaded in register 444 at the end of iteration step 330. These are shifted from one slice to the next using the same Binput<k+1>and B<k>inter-slice connections used to shift B in iteration step 320. The value in register 440 is unmodified.
  • Register 446 may take one of two values. This depends upon the expression:
    ((Δ≠0) AND (2j<=i))
    in FIG. 3. An external circuit determines if any of the bits of the discrepancy are non-zero by OR-ing all the bits together. Two external quantities i and 2j are compared, and the result of the comparison is logically AND-ed with the output of the non-zero detection OR gate. This logical signal externally drives m3sel. If m3sel is asserted, the value of register 446 remains unchanged at the end of iteration step 330, and register 446 retains the unmodified term Λk, accomplishing the function:
    B(i+1)(z)←Λi(z).
  • If m3sel is deasserted, the value of register 446 is updated at the end of iteration step 330 by taking on the value Bk that was in register 444 at the end of iteration step 320. At the same time, an external circuit performs a Galois field inversion on the quantity Δ, inverts the least significant bit of the result, and loads the modified inversion, denoted delinvp1, in an external register at the end of iteration step 330. The inversion circuitry handles the case Δ=0 by loading the external register with a 1.
  • At the start of iteration step 340, m4sel is asserted, m0sel, m2sel, m3sel and m5sel are deasserted, and m1sel takes the same value that m3sel took in iteration step 330. The product muloutk is the Galois field product of a term of B and delinvp1. If mlsel is asserted, the input to register 442 is: ( delinvpl B k ( i + 1 ) ) B k ( i + 1 ) = ( ( Δ - 1 1 ) B k ( i + 1 ) ) B k ( i + 1 ) = ( ( Δ - 1 B k ( i + 1 ) ) B k ( i + 1 ) B k ( I + 1 ) = Δ - 1 B k ( i + 1 )
    in a field of characteristic 2 like GF(256). This accomplishes the function
    B(i+1)(z)←ΦB(i+1)(z)
    in FIG. 3. If m1sel is deasserted, then register 442 is updated at the end of iteration step 340 by taking on the value Bk that was in register 446 at the end of step 330.
  • Since m3sel is deasserted, register 446 takes on the value Uk that was in register 444 at the end of step 330. Since m2sel is deasserted, register 444 takes on the value Λk that was in register 442 at the end of step 330. Register 440 is unmodified.
  • In this manner, the updated polynomial terms B, A and U have returned to their original positions at the end of the three steps 320, 330 and 340 making up one iteration. After 2t such iterations, the polynomials B and Λ have taken their final values, and are used by the Chien search unit to determine error locations and values.
  • A unique feature of the slice shown in FIG. 4 is that the terms Λ, B, and U are contained in time-varying registers within a slice. This allows the terms which are multiplied to be shifted to the multiplier inputs as needed, so that each multiplier is shared in a high-speed implementation. It also allows for re-use of the polynomial rotation circuitry so that the terms of the B polynomial are shifted in step 320, while the terms of the U polynomial are shifted using the same circuitry in step 330.
  • To compare implementations, Feng's circuitry was sliced to contain one term of the equivalent Λ, A, and U and their associated logic. Table 1 compares the efficiency of the slice shown in FIG. 4 to that of Feng. Each was estimated using synthesis results for a decoder designed for 9 bit symbols. As can be observed from Table 1, Feng's implementation uses nearly twice the VLSI area, but the two implementations have nearly equal iteration delays. Hence, this improved implementation is nearly twice as efficient.
    TABLE 1
    Comparison of Slice Efficiency
    Feng Preferred Embodiment
    Area
    3 multipliers 1 multiplier
    3 symbol registers 3 symbol registers
    1 symbol adder 1 symbol adder
    5 symbol multiplexers 8 symbol multiplexers
    (˜1,270 gates) (628 gates)
    Delay per iteration 2 multiplications
    1 large summation
    1 inversion 3 inversions
    2 multiplexings
    1 register delay 3 register delays
    1 register setup 3 register setups
    (˜16 ns) (˜16.8 ns)
    Product of Delay * Area 20,320 10,550
  • FIG. 5 shows the schematic of the entire PDU according to the embodiment of FIG.4. This particular PDU has tmax=20, and designs for other values of tmax are easily derived by adding or deleting slices. The PDU includes a number of slices 500 1, 500 2, 500 3, . . . 500 20, having respective o inputs 502 1, 502 2, 502 3, . . . , 502 20 and the final slice 500 20 having an o output 502 21, a delay line 510 for holding half of U, control circuitry shown symbolically at 520, a summer 530, a Galois field inverter 10 540 for obtaining the mutiplicative inverse of ), and auxiliary logic indicated generally at 555, including an OR gate 550, a decision block 552 and multiplexers 560 and 562.
  • Outputs 508 1, 508 2, 508 3. . . 508 20 from slices 500 1, 500 2, 500 3. . . 500 20, corresponding to multiplier output 458 in the exemplary slice 15 shown in FIG.4, are summed by a summer 530, to give a value of Δ at terminal 538, which is then inverted by the Galois field inverter 540 and modified by Galois Field adder 542 to provide a value of (1/Δ)ρ1 at 544. The register output value is fed back to multiplexer 560 at terminal 561. Delay line 510 includes multiplexers 512 1, 512 2, 512 3. . . 512 20, and corresponding registers 514 1, 514 2, 514 3. . . 514 20. The delay line 510 is shifted once very three cycles. An output 518 from the delay line 510 is supplied to input terminal 563 of multiplexer 562, where it is also combined with the output of multiplexer 560, to feed the B input 504 of the first slice 500 1, corresponding to the B input 404 in FIG. 4. Decision block 552 detects a non-zero condition to determine the value of the o input 502 1 to the first slice 500 1, corresponding to the o input 402 in FIG. 4. OR gate 550 is employed to detect a first cycle condition.
  • The present invention also eliminates Feng's special circuit for calculating the numerator of the error value expression. To do this, the Applicants make a judicious choice of Ir0 to use the simplified version of the error value expression, i.e., with numerator 1, and handle storage of the B polynomial in a proprietary fashion.
  • The polynomials B(z) and o(z) here are of the special form, B ( z ) = i = 1 t max B i z i and o ( z ) = i = 1 t max o i z i
    where oi is 0 or 1, and neither polynomial has a constant term. The storage of a constant term for the polynomial B(z) is not required, because initialization of B(z) is performed by the OR gate 550 in the first cycle of Berlekamp-Massey iterations, as shown in FIG. 5. For either of these polynomials, the notation
    B(i+1)(z)←ROTATED [Bi(z)]
    indicates that the coefficients are updated in a shifting operation, with the highest degree coefficient shifting to the coefficient of z,
    B1 (i+1)=Btmax 1, B2 (i+1)=B1 i, . . . , Btmax (i+1) =B(t max 1)i
  • A non-zero coefficient in the polynomial o(z) indicates that the corresponding term of the resulting polynomial B(z) is the result of overflowing the allotted storage.
  • Assuming that at most t errors have occurred, the resulting polynomial Λ(z) is identical to that of FIG. 1. If the resulting polynomial o(z) is zero, the resulting polynomial B(z) is identical to the B(z) obtained in FIG. 1. To distinguish the polynomials when o(z) is non-zero, we refer to the result of FIG. 5 as Bwrapped(Z) . The resulting polynomial B(z) obtained in FIG. 1 can be related to Bwrapped(z) in all cases as follows.
    B0=B0 wrapped=0,B1=(1−o1)B1 wrapped, . . . , Bt max=(1−ot max)Bt max wrapped
    B(t max+1)=o1B1 wrapped, B(T max+1)=o2B2 wrapped , . . . , B2(t max)=ot maxBt max wrapped
  • In a Chien search unit, the polynomial B(z) is typically evaluated in a circuit as shown on the left hand side of FIG. 8B, as previously discussed. A series of Chien search registers is initially loaded with the polynomial B(z). The initial sum of these registers represents the polynomial B(z) evaluated at z=1. In the next cycle, the register containing Bi has been multiplied by ∀−i. After ν cycles, the sum of the registers can be expressed as i = 1 t max α - i v B i = B ( α - v ) .
  • In a preferred embodiment of the invention as shown in FIG. 6, a register 630 is initially loaded with Bi and another one bit register 640 is loaded with the corresponding overflow flag oi via multiplexers 610 and 620 respectively. The overflow flag controls a proprietary dual-constant multiplier. If oi=0, the dual-constant multiplier scales by α−i, but if oi=1, the dual-constant multiplier scales by α−(i+max). The net effect is to provide an equivalent evaluation to that of the polynomial B(z) in FIG. 1, while using only half the B storage registers of FIG. 1. This special circuitry allows us to eliminate two random multipliers in Feng's numerator circuit of FIG. 8B. Although we picture the contents of the dashed box 650 in FIG. 6 as two distinct multipliers 660 and 670 followed by a multiplexer 680 for clarity of understanding, in practice the two multipliers 660 and 670 share a lot of common terms, and are described in such a way as to exploit the commonality and reduce complexity. In Appendix A, a typical single and dual multiplier description is shown in Verilog.
  • While the invention has been described with reference to the exemplary embodiments thereof, those skilled in the art will be able to make various modifications to the described embodiments of the invention without departing from the true spirit and scope of the invention.
    APPENDIX A
    Example Single and Dual Multipliers in Verilog
    Description: multiply by alpha**1003 = alpha**(−20) in GF(1024)
    module mula1003(inp, outp);
    input [9:0] inp;
    output [9:0] outp;
    reg [9:0] outp;
    always @(inp)
    begin
    outp[0] = inp[1] {circumflex over ( )} inp[4] {circumflex over ( )} inp[5] {circumflex over ( )} inp[7];
    outp[1] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[4] {circumflex over ( )} inp[6] {circumflex over ( )} inp[7];
    outp[2] = inp[3] {circumflex over ( )} inp[6] {circumflex over ( )} inp[7] {circumflex over ( )} inp[9];
    outp[3] = inp[2] {circumflex over ( )} inp[3] {circumflex over ( )} inp[6] {circumflex over ( )} inp[8] {circumflex over ( )} inp[9];
    outp[4] = inp[1] {circumflex over ( )} inp[4] {circumflex over ( )} inp[7] {circumflex over ( )} inp[8] {circumflex over ( )} inp[9];
    outp[5] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[5] {circumflex over ( )} inp[6] {circumflex over ( )} inp[7] {circumflex over ( )} inp[8];
    outp[6] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[3] {circumflex over ( )} inp[6] {circumflex over ( )} inp[9];
    outp[7] = inp[0] {circumflex over ( )} inp[2] {circumflex over ( )} inp[3] {circumflex over ( )} inp[7] {circumflex over ( )} inp[8] {circumflex over ( )} inp[9];
    outp[8] = inp[2] {circumflex over ( )} inp[3] {circumflex over ( )} inp[5] {circumflex over ( )} inp[8];
    outp[9] = inp[2] {circumflex over ( )} inp[4] {circumflex over ( )} inp[5] {circumflex over ( )} inp[9];
    end
    endmodule
    Description: multiply by alpha**1003 or alpha**(983) in GF(1024)
    - Note that common terms are labelled cm and shared
    - p0 are remaining terms for alpha**1003, p1 are remaining for
    alpha**(983)
    module duala1003or983(inp, select, outp);
    input [9:0] inp;
    input select;
    output [9:0] outp;
    reg [9:0] outp;
    reg [9:0] p0;
    reg [9:0] p1;
    reg [9:0] cm;
    always @(inp)
    begin
    p0[0] = inp[4] {circumflex over ( )} inp[7];
    p0[1] = inp[1] {circumflex over ( )} inp[6] {circumflex over ( )} inp[7];
    p0[2] = inp[6] {circumflex over ( )} inp[9];
    p0[3] = inp[3] {circumflex over ( )} inp[8] {circumflex over ( )} inp[9];
    p0[4] = inp[1];
    p0[5] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[7];
    p0[6] = inp[0] {circumflex over ( )} inp[3];
    p0[7] = inp[2] {circumflex over ( )} inp[3] {circumflex over ( )} inp[9];
    p0[8] = inp[2] {circumflex over ( )} inp[5];
    p0[9] = inp[4] {circumflex over ( )} inp[5];
    p1[0] = inp[0] {circumflex over ( )} inp[2] {circumflex over ( )} inp[3] {circumflex over ( )} inp[8];
    p1[1] = inp[2] {circumflex over ( )} inp[5] {circumflex over ( )} inp[9];
    p1[2] = inp[1] {circumflex over ( )} inp[2] {circumflex over ( )} inp[4] {circumflex over ( )} inp[5];
    p1[3] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[4] {circumflex over ( )} inp[7];
    p1[4] = inp[2] {circumflex over ( )} inp[6];
    p1[5] = inp[3];
    p1[6] = inp[4] {circumflex over ( )} inp[8];
    p1[7] = inp[1] {circumflex over ( )} inp[5];
    p1[8] = inp[0] {circumflex over ( )} inp[1] {circumflex over ( )} inp[6];
    p1[9] = inp[0] {circumflex over ( )} inp[3] {circumflex over ( )} inp[7];
    cm[0] = inp[1] {circumflex over ( )} inp[5];
    cm[1] = inp[0] {circumflex over ( )} inp[4];
    cm[2] = inp[3] {circumflex over ( )} inp[7];
    cm[3] = inp[2] {circumflex over ( )} inp[6];
    cm[4] = inp[4] {circumflex over ( )} inp[7] {circumflex over ( )} inp[8] {circumflex over ( )} inp[9];
    cm[5] = inp[5] {circumflex over ( )} inp[6] {circumflex over ( )} inp[8];
    cm[6] = inp[1] {circumflex over ( )} inp[6] {circumflex over ( )} inp[9];
    cm[7] = inp[0] {circumflex over ( )} inp[7] {circumflex over ( )} inp[8];
    cm[8] = inp[3] {circumflex over ( )} inp[8];
    cm[9] = inp[2] {circumflex over ( )} inp[9];
    end

Claims (2)

1. A decoder for error correcting codes, comprising:
a syndrome calculation circuit; and
a polynomial determining unit for calculating polynomials from an output of said syndrome calculation circuit, said polynomials including at least one polynomial that is rotated if said at least one polynomial overflows.
2-18. (canceled)
US11/505,451 2002-09-23 2006-08-17 Decoder architecture for Reed Solomon codes Abandoned US20070011592A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/505,451 US20070011592A1 (en) 2002-09-23 2006-08-17 Decoder architecture for Reed Solomon codes

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/251,776 US7117425B2 (en) 2002-09-23 2002-09-23 Decoder architecture for reed solomon codes
US11/505,451 US20070011592A1 (en) 2002-09-23 2006-08-17 Decoder architecture for Reed Solomon codes

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/251,776 Continuation US7117425B2 (en) 2002-09-23 2002-09-23 Decoder architecture for reed solomon codes

Publications (1)

Publication Number Publication Date
US20070011592A1 true US20070011592A1 (en) 2007-01-11

Family

ID=31992818

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/251,776 Expired - Fee Related US7117425B2 (en) 2002-09-23 2002-09-23 Decoder architecture for reed solomon codes
US11/505,451 Abandoned US20070011592A1 (en) 2002-09-23 2006-08-17 Decoder architecture for Reed Solomon codes

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/251,776 Expired - Fee Related US7117425B2 (en) 2002-09-23 2002-09-23 Decoder architecture for reed solomon codes

Country Status (1)

Country Link
US (2) US7117425B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011277A1 (en) * 2008-07-10 2010-01-14 Poeppelman Alan D Adjustable error-correction for a reed solomon encoder/decoder

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7814398B2 (en) * 2006-06-09 2010-10-12 Seagate Technology Llc Communication channel with Reed-Solomon encoding and single parity check
TWI514778B (en) * 2014-03-27 2015-12-21 Storart Technology Co Ltd Method and circuit for shortening latency of chien's search algorithm for bch codewords

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119262A (en) * 1997-08-19 2000-09-12 Chuen-Shen Bernard Shung Method and apparatus for solving key equation polynomials in decoding error correction codes

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011277A1 (en) * 2008-07-10 2010-01-14 Poeppelman Alan D Adjustable error-correction for a reed solomon encoder/decoder
US8151172B2 (en) * 2008-07-10 2012-04-03 Lsi Corporation Adjustable error-correction for a reed solomon encoder/decoder

Also Published As

Publication number Publication date
US20040059990A1 (en) 2004-03-25
US7117425B2 (en) 2006-10-03

Similar Documents

Publication Publication Date Title
US6374383B1 (en) Determining error locations using error correction codes
US4873688A (en) High-speed real-time Reed-Solomon decoder
EP0620654B1 (en) Circuit for performing the Euclidian algorithm in decoding of arithmetical codes
US6637002B1 (en) Decoder for error correcting block codes
Lee High-speed VLSI architecture for parallel Reed-Solomon decoder
US4868828A (en) Architecture for time or transform domain decoding of reed-solomon codes
JP3970337B2 (en) Hardware-optimized Reed-Solomon decoder for large data blocks
US7322004B1 (en) Efficient high-speed Reed-Solomon decoder
US6119262A (en) Method and apparatus for solving key equation polynomials in decoding error correction codes
US6571368B1 (en) Systolic Reed-Solomon decoder
US7870468B1 (en) Reed-solomon decoder using a configurable arithmetic processor
US5535225A (en) Time domain algebraic encoder/decoder
EP0621698B1 (en) Error correction method including erasure correction, and apparatus therefore
US20100199154A1 (en) Reduced processing in high-speed Reed-Solomon decoding
KR100260415B1 (en) High speed serial error position polynomual calculation circuit
EP1370003A1 (en) Reed-Solomon Decoder
EP1502356B1 (en) A method of soft-decision decoding of reed-solomon codes
EP1102406A2 (en) Apparatus and method for decoding digital data
US20070011592A1 (en) Decoder architecture for Reed Solomon codes
Iwamura et al. A design of reed-solomon decoder with systolic-array structure
US20030126543A1 (en) Method and apparatus for solving key equation polynomials in decoding error correction codes
JP3343857B2 (en) Decoding device, arithmetic device, and methods thereof
JP3614978B2 (en) Galois field division method and division apparatus
Park et al. High-speed low-complexity Reed-Solomon decoder using pipelined Berlekamp-Massey algorithm
US7693927B2 (en) Data processing system and method

Legal Events

Date Code Title Description
AS Assignment

Owner name: AGERE SYSTEMS INC., PENNSYLVANIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FREDRICKSON, LISA;SONG, LEILEI;REEL/FRAME:018191/0119

Effective date: 20020919

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION