HK1071967A - Method and apparatus for encoding data in parallel - Google Patents


Info

Publication number: HK1071967A
Authority: HK (Hong Kong)
Application number: HK05104653.5A
Other languages: Chinese (zh)
Inventors: R.S.萨尔维; M.A.海华德
Original Assignee: 高通股份有限公司 (Qualcomm Incorporated)

Description

Method and apparatus for parallel encoding of data bits
Background
Technical Field
The present invention relates to data communications, and more particularly to encoding multiple data bits in parallel (e.g., using a multi-port memory) to substantially reduce the latency associated with the encoding.
Description of the Related Art
In a typical data communication system, data is processed, modulated, and conditioned at a transmitter unit to generate a modulated signal, which is then transmitted to one or more receiver units. The data processing may include, for example, formatting the data into a particular frame format, encoding the formatted data with a particular coding scheme to provide error detection and/or correction at the receiver units, channelizing (i.e., covering) the coded data, and spreading the channelized data across the system bandwidth. The data processing is typically defined by the system or the standard being implemented.
At the receiver unit, the transmitted signal is received, conditioned, demodulated, and digitally processed to recover the transmitted data. The processing at the receiver unit is complementary to that at the transmitter unit and may include, for example, despreading the received samples, decovering the despread samples, and decoding the decovered symbols to recover the transmitted data.
The ability to correct transmission errors enhances the reliability of a data transmission. Many digital communication systems use convolutional codes or Turbo codes to provide error correction capability at the receiver unit. Convolutional codes operate on serial data, one or a few bits at a time. There are a variety of useful convolutional codes, and a variety of algorithms for decoding the received coded information sequence to recover the original data. Turbo coding, in particular, is a parallel concatenated convolutional coding scheme. A concatenated code is a cascaded combination of two or more codes and is used to provide additional error correction capability. For a concatenated code, the code bits between coding stages may be interleaved (i.e., reordered) to provide time diversity, which can further improve performance. An entire packet or frame of code bits is typically stored before the reordering is performed. The reordered code bits are then serially retrieved and encoded by the next coding stage.
Generally, convolutional and Turbo coding is implemented serially on an input bit stream. For each clock cycle, one data bit is provided to the encoder, and two or more code bits are generated based on the code rate of the encoder. Some of the code bits may then be punctured (i.e., removed) to obtain code bits for other code rates.
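The serial encoding and puncturing just described can be sketched as follows. This is an illustrative sketch only: the rate-1/2 generator polynomials (octal 7 and 5, constraint length 3) are a common textbook example, not the codes defined later in this document.

```python
# Sketch of serial convolutional encoding (one data bit per clock cycle,
# two code bits out for a rate-1/2 code) followed by puncturing.
# Generators g0 = 111 (octal 7) and g1 = 101 (octal 5) are illustrative.

def conv_encode_serial(bits, g0=0b111, g1=0b101, k=3):
    """Serially encode one data bit per step; emit two code bits per data bit."""
    state = 0
    out = []
    for b in bits:
        reg = (b << (k - 1)) | state               # newest bit enters at the MSB
        out.append(bin(reg & g0).count("1") % 2)   # code bit from generator g0
        out.append(bin(reg & g1).count("1") % 2)   # code bit from generator g1
        state = reg >> 1                           # shift register for next input
    return out

def puncture(code_bits, pattern):
    """Keep code_bits[i] where pattern[i % len(pattern)] == 1 (1 = keep)."""
    return [c for i, c in enumerate(code_bits) if pattern[i % len(pattern)]]

data = [1, 0, 1, 1]
coded = conv_encode_serial(data)           # 2 code bits per data bit
rate_2_3 = puncture(coded, [1, 1, 1, 0])   # dropping every 4th bit gives rate 2/3
```

Puncturing three of every four code bits of a rate-1/2 code yields four data bits per six transmitted bits, i.e., the rate-2/3 behavior described above.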
Digital multiple-access communication systems typically transmit data in packets and frames, enabling efficient sharing of system resources among active users. For services that cannot tolerate long delays (e.g., voice, video), the packets are selected to be of short duration (e.g., 10 milliseconds), and the codes are selected to have correspondingly short processing delays. However, to improve coding efficiency, it is desirable to process and code larger packets, which can result in longer processing delays when conventional techniques for serially encoding data are used. The longer processing delays may adversely affect the performance of the communication system. For example, a particular user or data rate may be selected for a particular data transmission based on the conditions of the communication link. If the processing delay is excessively long, the link conditions may have changed by the time the data is transmitted, and performance may be compromised or degraded.
It can be seen that a technique for efficiently encoding data with a short processing delay is highly desirable.
Summary of the Invention
According to one aspect, an encoder can encode multiple bits in parallel to substantially reduce encoding time. Two or more encoders may be serially concatenated together to form a concatenated encoder, such as a Turbo encoder commonly used in CDMA communication systems. By encoding M bits in parallel with a first (outer) encoder and N bits in parallel with a second (inner) encoder, the overall encoding latency of the concatenated encoder can be greatly reduced. An interleaver, typically coupled between the first and second encoders and supporting parallel encoding, is capable of receiving a plurality of encoded bits for a write operation and providing a plurality of encoded bits for a read operation.
One embodiment provides a concatenated encoder for encoding multiple data bits in parallel. The concatenated encoder includes a first (outer) encoder, a memory, and a second (inner) encoder coupled in cascade. The first encoder receives and encodes M data bits in parallel in accordance with a first encoding scheme to generate MR code bits. The memory receives and stores the unpunctured (i.e., non-deleted) ones of the MR code bits from the first encoder. The second encoder receives and encodes N code bits in parallel in accordance with a second encoding scheme to generate the coded data. M and N may be any values; for example, M may be eight or more and N may be four or more.
Each of the first and second encoders may be a convolutional encoder that implements a particular polynomial generator matrix (e.g., for a rate 1/2 convolutional code). Each encoder may also be implemented with one or more look-up tables, a state machine, or some other design. To reduce memory requirements, encoding for a particular packet may be performed and completed by an encoder before encoding begins for another packet. To reduce processing delay, the first encoder may encode one packet while the second encoder encodes another packet (i.e., pipelined encoding).
The memory may be implemented as a multi-port memory with P ports (P > 1), as a single memory unit, or as multiple memory units. The memory may be designed to store W code words in parallel for a write operation and to provide R code words in parallel for a read operation, with each code word comprising a particular number of code bits (e.g., eight). The memory may also be used to provide interleaving of the code bits stored within it. For example, W code words may be stored into sequential rows of the memory by each write operation, and R code words may be retrieved from permuted rows of the memory by each read operation.
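A minimal sketch of this write-by-rows, read-in-permuted-order memory organization follows. The 8-bit code words, W = R = 4 words per operation, and the bit-reversed read order are illustrative assumptions, not values taken from this document or any standard.

```python
# Sketch of the interleaver memory organization described above: W code
# words are written into consecutive rows; R code words are read back in
# a permuted row order, which produces the interleaving.

def write_rows(memory, row, words):
    """Store W code words into consecutive rows (one write operation)."""
    for i, w in enumerate(words):
        memory[row + i] = w

def read_rows(memory, rows):
    """Fetch R code words from (possibly non-consecutive) rows (one read)."""
    return [memory[r] for r in rows]

memory = {}
write_rows(memory, 0, [0x11, 0x22, 0x33, 0x44])   # W = 4 consecutive rows
write_rows(memory, 4, [0x55, 0x66, 0x77, 0x88])

# Reading rows in a bit-reversed order yields a reordered (interleaved)
# sequence of code words for the next coding stage.
bit_reversed = [0, 4, 2, 6]                       # 3-bit reversal of rows 0..3
out = read_rows(memory, bit_reversed)
```

The permutation here is purely illustrative; the essential point is that the write operation and the read operation use different row orders.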
The concatenated encoder may further comprise a set of N multiplexers for providing the N encoded bits in parallel to the second encoder. Each multiplexer receives a respective codeword from the memory, selects one of the coded bits within the received codeword, and provides the selected bit to the second encoder.
Another embodiment provides a convolutional encoder for encoding multiple data bits in parallel. The convolutional encoder includes a state machine coupled to an output generator. The state machine receives M data bits in parallel and provides a set of values that indicate the next state of the state machine. The next state is a function of the M data bits and the current state of the state machine. The output generator also receives the M data bits and the current state and generates MR code bits in response thereto. M and MR can be any number greater than one (e.g., M ≧ 8, MR ≧ 16).
The state machine typically implements a particular polynomial generator matrix and may be implemented with a set of logic elements (e.g., gates) coupled to a set of registers. Each logic element combines selected ones of the M data bits and the current state values to implement a particular logic function for one bit of the state machine. The registers store the output values from the logic elements, and the register outputs comprise the current state of the state machine.
To encode data packets, the output generator may include first and second output generators. The first output generator receives the M data bits and the current state and, in response, generates MR code bits for a first encoding phase (e.g., the data phase). The second output generator also receives the M data bits and the current state and, in response, generates MR code bits for a second encoding phase (e.g., the code tail phase). The code bits from either the first or the second output generator are selected depending on the encoding phase being performed. The state machine is typically set to a known state (e.g., all zeros) during the second encoding phase.
Yet another embodiment provides a data encoder for encoding multiple bits in parallel. The data encoder includes an input interface, a multi-bit encoder, a memory, and an output interface. The input interface receives M data bits in parallel and provides the received bits to the multi-bit encoder. The multi-bit encoder can be selected either to receive and encode the M data bits in parallel to generate MR code bits, or to receive and encode N code bits in parallel to generate NR code bits. The memory stores the unpunctured ones of the MR code bits from the multi-bit encoder and provides the N code bits to the multi-bit encoder. The output interface receives the NR code bits from the multi-bit encoder and provides the unpunctured ones of the NR code bits as coded data. The data encoder typically further includes an address generator that generates addresses for the memory write and read operations.
Another embodiment provides a transmitter unit for use in a communication system (e.g., a CDMA system). The transmitter unit includes an encoder, a modulator, and a transmitter coupled in cascade. The encoder receives and encodes M data bits in parallel in accordance with a first encoding scheme to generate MR code bits, stores the unpunctured ones of the MR code bits, interleaves the code bits for a particular packet, receives and encodes N code bits in parallel in accordance with a second encoding scheme to generate NR code bits, and provides the unpunctured ones of the NR code bits as coded data. The modulator receives and modulates the coded data with a particular modulation scheme to generate modulated data, and the transmitter receives and processes the modulated data to generate a modulated signal suitable for transmission. The encoder may be designed to implement Turbo coding or concatenated coding.
Another embodiment provides a method for performing concatenated coding of multiple data bits in parallel. According to the method, M data bits are received and encoded in parallel in accordance with a first encoding scheme to generate MR code bits. Zero or more of the MR code bits may be punctured (i.e., deleted) with a particular puncturing scheme, and the unpunctured code bits are stored in a memory. At the appropriate time, N code bits are retrieved from the memory and encoded in parallel in accordance with a second encoding scheme to generate the coded data. For improved efficiency and reduced latency, W code words of unpunctured code bits may be written to W ports of the memory simultaneously, and R code words of code bits may be read from R ports of the memory simultaneously. To provide interleaving, W code words may be stored into sequential rows of the memory by a write operation, while R code words may be retrieved from permuted rows of the memory by a read operation.
Other aspects and embodiments of the invention are described below.
Brief description of the drawings
The features, nature, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout, and wherein:
FIG. 1 is a block diagram of a communication system;
FIG. 2 is a block diagram of an encoder that may be used to implement some embodiments of the present invention;
FIG. 3 is a diagram of a concatenated encoder implementing a particular set of polynomial generator matrices according to one embodiment;
FIG. 4 is a block diagram of a convolutional encoder for encoding multiple data bits in parallel according to an embodiment;
FIGS. 5A and 5B are schematic diagrams of a convolutional encoder that implements a particular polynomial generator matrix and encodes eight data bits in parallel, according to various embodiments;
FIG. 6 is a schematic diagram of an embodiment of a convolutional encoder that implements another particular polynomial generator matrix and encodes four coded bits in parallel;
FIG. 7A is an illustration of an interleaver;
FIGS. 7B and 7C are diagrams of the interface between the outer convolutional encoder and the interleaver, with and without puncturing, respectively, in accordance with various embodiments;
FIG. 8 is a block diagram of an encoder according to an embodiment;
FIG. 9 is a flow diagram of a method of implementing parallel concatenated coding of multiple data bits, according to one embodiment.
Detailed description of specific embodiments
Fig. 1 is a simplified block diagram of an embodiment of a communication system 100 in which various aspects of the present invention may be implemented. At a transmitter unit 110, traffic data is sent, typically in packets or frames, from a data source 112 to an encoder 114, which formats and encodes the data using a particular coding scheme. The encoder 114 typically further performs interleaving (i.e., reordering) of the code bits. A modulator (MOD) 116 then receives, channelizes (i.e., covers), and spreads the coded data to generate symbols, which are then converted to one or more analog signals. The analog signals are filtered, (quadrature) modulated, amplified, and upconverted by a transmitter (TMTR) 118 to generate a modulated signal, which is then transmitted via an antenna 120 to one or more receiver units.
At receiver unit 130, the transmitted signal is then received by an antenna 132 and provided to a receiver (RCVR) 134. Within receiver 134, the received signal is amplified, filtered, frequency down-converted, quadrature demodulated, and digitized to provide samples. The samples are despread, decovered, and demodulated by a demodulator (DEMOD)136 to generate demodulated symbols. The decoder 138 then decodes the demodulated symbols and (possibly) rearranges the decoded data to recover the transmitted data. The processing performed by demodulator 136 and decoder 138 is complementary to the processing performed at transmitter unit 110. The recovered data is then provided to a data sink 140.
The signal processing described above supports the transmission of voice, video, packet data, messaging, and other types of communications in one direction. A two-way communication system supports two-way data transmission. However, signal processing in the other direction is not shown in fig. 1 for simplicity.
Communication system 100 may be a Code Division Multiple Access (CDMA) system, a Time Division Multiple Access (TDMA) communication system (e.g., a GSM system), a Frequency Division Multiple Access (FDMA) communication system, or other multiple access communication system that supports voice and data communication between users on a terrestrial link.
The use of CDMA techniques in a multiple access communication system is disclosed in U.S. Pat. No. 4,901,307, entitled "SPREAD SPECTRUM MULTIPLE ACCESS COMMUNICATION SYSTEM USING SATELLITE OR TERRESTRIAL REPEATERS," and U.S. Pat. No. 5,103,459, entitled "SYSTEM AND METHOD FOR GENERATING SIGNAL WAVEFORMS IN A CDMA CELLULAR TELEPHONE SYSTEM." Another specific CDMA system is disclosed in U.S. patent application serial No. 08/963,386, entitled "METHOD AND APPARATUS FOR HIGH RATE PACKET DATA TRANSMISSION," filed November 3, 1997 (referred to herein as the HDR system). These patents and patent application are assigned to the assignee of the present invention and are incorporated herein by reference.
CDMA systems are typically designed to conform to one or more standards, such as the "TIA/EIA-95-B Mobile Station-Base Station Compatibility Standard for Dual-Mode Wideband Spread Spectrum Cellular System" (hereinafter the IS-95 standard), the "TIA/EIA-98-C Recommended Minimum Standard for Dual-Mode Wideband Spread Spectrum Cellular Mobile Station" (hereinafter the IS-98 standard), the standard offered by a consortium named the "3rd Generation Partnership Project" (3GPP) and embodied in a set of documents including Document Nos. 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214 (hereinafter the W-CDMA standard), and the "TR-45.5 Physical Layer Standard for cdma2000 Spread Spectrum Systems" (hereinafter the cdma2000 standard). New CDMA standards are continually being proposed and adopted for use. These CDMA standards are incorporated herein by reference.
Fig. 2 is a block diagram of an encoder 200 that may be designed to implement some embodiments of the present invention. Encoder 200 may be used for encoder 114 in Fig. 1. In this embodiment, encoder 200 implements concatenated coding and includes an outer convolutional encoder 212, an interleaver 214, and an inner convolutional encoder 216 coupled in cascade. Outer convolutional encoder 212 receives and convolutionally encodes the input data to generate code bits, which are provided to interleaver 214 for storage. Once the code bits for an entire packet have been stored within interleaver 214, the code bits are retrieved and provided to inner convolutional encoder 216. To achieve interleaving, the code bits are read out in an order different from the order in which the bits were written into interleaver 214. Inner convolutional encoder 216 receives and convolutionally encodes the code bits to generate coded data, which is then provided to the subsequent processing stage.
A conventional convolutional encoder receives and encodes data serially, one bit at a time (e.g., one bit per clock cycle). For communication systems that transmit data in large packets, serial encoding of the data can result in long processing delays. Moreover, for a concatenated encoder made up of multiple convolutional encoders coupled in cascade, the processing delay can be excessively long, especially if both the outer and inner convolutional encoders encode serially.
In one aspect, a convolutional encoder can receive and encode multiple (M) bits in parallel. This capability allows the convolutional encoder to encode a packet of data in approximately 1/M the time required by a conventional convolutional encoder. The improvement is even more pronounced for a concatenated encoder (e.g., a Turbo encoder) in which each individual convolutional encoder processes bits in parallel.
According to another aspect, the interleaver is capable of storing and providing a plurality of data bits in parallel. The interleaver may be implemented using, for example, a multi-port memory. The interleaver, when used with the convolutional encoder described herein, also reduces processing latency because data can be read from and written to the interleaver in a small portion of time.
For simplicity, an embodiment is now described for the downlink data transmission in the HDR communication system described in the aforementioned U.S. patent application serial No. 08/963,386. The HDR system uses a concatenated code that comprises an outer convolutional code, interleaving, and an inner convolutional code. The HDR system also defines two packet formats having the characteristics listed in Table 1.
TABLE 1

Parameter                          Packet format 1   Packet format 2   Units
Total bits/packet                  1024              2048              Bits
Outer convolutional encoder
  Input data bits/packet           1018              2042              Bits
  Code tail bits/packet            4                 4                 Bits
  Outer code rate                  1/2               2/3
  Outer code puncturing pattern    (1111)            (1011)
  Output code bits/packet          2044              3069              Bits
Interleaver depth                  2048              3072              Bits
Inner convolutional encoder
  Input code bits/packet           2044              3069              Bits
  Code tail bits/packet            4                 3                 Bits
  Inner code rate                  1/2               3/4
  Inner code puncturing pattern    (111111)          (111001)
  Output code bits/packet          4096              4096              Bits
Overall code rate                  1/4               1/2
Within the HDR system, the outer convolutional encoder implements convolutional encoding of rate 1/2 defined by the following polynomial generator matrix:
G(D) = [ 1   (1 + D + D^2 + D^4) / (1 + D^3 + D^4) ]        Formula (1)
The inner convolutional encoder within the HDR system implements convolutional encoding of rate 1/2 defined by the following polynomial generator matrix:
G(D) = [ 1   (1 + D + D^2) / (1 + D) ]        Formula (2)
Fig. 3 is an illustration of an encoder 300 that implements the outer and inner convolutional codes defined by equations (1) and (2). Data bit u is provided to outer convolutional encoder 310, which implements equation (1) and generates two outputs y_oa and y_ob. Within encoder 310, data bit u is provided to adder 312, which further couples in cascade to registers 314A through 314D (used to implement a set of delays). The outputs from adder 312 and registers 314A, 314B, and 314D are summed by adders 316A, 316B, and 316C to implement the numerator of the second element in the polynomial generator matrix of equation (1). The outputs from registers 314C and 314D are summed by adder 318 and provided to adder 312 to implement the denominator of the second element in equation (1). The input data bit u is provided as the first output y_oa, and the output from adder 316C comprises the second output y_ob.

The outputs y_oa and y_ob from outer convolutional encoder 310 may be punctured (not shown in Fig. 3 for simplicity). The unpunctured code bits are provided to interleaver 330 and reordered. The reordered code bits v are then provided to inner convolutional encoder 340, which implements equation (2) and generates two outputs y_ia and y_ib. Within encoder 340, code bit v is provided to adder 342, which couples in cascade to registers 344A and 344B. The outputs from adder 342 and registers 344A and 344B are summed by adders 346A and 346B to implement the numerator of the second element in the polynomial generator matrix of equation (2). The output from register 344A is provided to adder 342 to implement the denominator of the second element in equation (2). The input code bit v is provided as the first output y_ia, and the output from adder 346B comprises the second output y_ib.
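The register-level descriptions above can be sketched in Python as bit-serial step functions. The tap positions below follow the adder connections described for Fig. 3; this is an illustrative sketch, not a standards-verified implementation.

```python
# Bit-serial sketch of the outer (registers 314A-314D) and inner
# (registers 344A-344B) recursive encoders described for FIG. 3.

def outer_step(state, u):
    """One step of the rate-1/2 outer code; returns (new_state, y_oa, y_ob)."""
    x1, x2, x3, x4 = state
    w = u ^ x3 ^ x4                  # adder 312: input plus feedback taps x3, x4
    y_oa = u                         # systematic output
    y_ob = w ^ x1 ^ x2 ^ x4          # adders 316A-316C (numerator taps)
    return (w, x1, x2, x3), y_oa, y_ob

def inner_step(state, v):
    """One step of the rate-1/2 inner code; returns (new_state, y_ia, y_ib)."""
    s1, s2 = state
    w = v ^ s1                       # adder 342: input plus feedback tap s1
    y_ia = v                         # systematic output
    y_ib = w ^ s1 ^ s2               # adders 346A-346B (numerator taps)
    return (w, s1), y_ia, y_ib

# Encode a few bits with the outer code, starting from the all-zero state.
state = (0, 0, 0, 0)
code = []
for u in (1, 0, 1):
    state, ya, yb = outer_step(state, u)
    code.extend([ya, yb])
```

Note that y_ob as computed here reduces algebraically to u + x1 + x2 + x3, consistent with the first row of Table 2 below.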
In general, data bits u are provided serially to encoder 310, and coded bits v are also provided serially to encoder 340. For each input data bit, outer convolutional encoder 310 generates two coded bits. Interleaver 330 receives and stores the coded bits and provides the coded bits serially to inner convolutional encoder 340. Bit encoding in serial mode results in long processing delays.
The convolutional encoder of one embodiment is capable of encoding multiple bits in parallel to significantly reduce coding delay. For each clock cycle, multiple (e.g., M) data bits may be received and encoded to generate multiple encoded bits. For a rate 1/2 encoder, 2M code bits are generated for the M data bits. M may be selected to be any number such as, for example, 4, 8, 16, 32, etc. Various additional embodiments of such convolutional encoders are described below.
A number of digital communication systems, such as HDR systems, transmit data in packets. The number of bits within a packet (i.e., the packet size) is selected based on a number of criteria, such as, for example, the data rate, the amount of data to be transmitted, processing delay requirements, etc. To enable the decoder at the receiver unit to start with a known state at the beginning of each packet, the encoder is initialized to a known state (e.g., all zeros) at the beginning of each packet because this shortens the decoding time and improves performance. Initialization is achieved by inserting a set of tail bits at the end of the previous packet. The code tail bits are selected so that the encoder is set to a known state.
In one embodiment, the convolutional encoder is implemented with a look-up table. Referring to Fig. 3, outer convolutional encoder 310 may be viewed as a state machine with a 4-bit state defined by the outputs of registers 314A through 314D. To generate the contents of the look-up table, the M input data bits at time index n may be represented by a vector U_n, the 2M code bits by a vector Y_n, and the current encoder state by a vector X_n. The next encoder state X_{n+1} and the encoder output vector Y_n can then be expressed as:
data of Code tail
Xn+1=f(Xn,Un) Xn+10 formula (3)
Yn=g1(Xn,Un) Yn=g2(Xn,Un) Formula (4)
Each of equations (3) and (4) includes one expression used when the encoder inputs are data bits and another used when the encoder inputs include code tail bits.
Equations (3) and (4) can be computed for all possible combinations of input data bits and encoder states. For example, for equation (4), the output code bits may be computed for input vector U_n = 0...00 with encoder state X_n = 0...00, for input vector U_n = 0...01 with encoder state X_n = 0...00, and so on through input vector U_n = 1...11 with encoder state X_n = 0...00. The output code bits are then computed for all input vectors U_n with encoder state X_n = 0...01. The process continues until all combinations of input vectors and encoder states have been computed. Equation (3) may be computed in a similar manner.
The results of the computations of equations (3) and (4) may be stored in a memory that implements the look-up table. The required memory size depends on the number of data bits to be encoded in parallel and the particular polynomial generator matrix being implemented. For example, to encode eight data bits in parallel for the convolutional code of equation (1), a memory with 12 address bits and 20 data bits (i.e., 4096 × 20) may be used. The 12-bit address comprises the 8 input data bits and the 4 bits of the current encoder state. The 20-bit output comprises the 16 code bits and the 4 bits of the next encoder state.
Once the memory has been loaded, the input data vector U_n and the current encoder state X_n may be applied to the address inputs of the memory, which then provides the output vector Y_n and the next encoder state X_{n+1}. The next encoder state X_{n+1} is stored for use with the next input data vector U_{n+1}.
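A minimal sketch of this table-driven approach follows, assuming the Fig. 3 outer-code taps; the (state, data byte) key plays the role of the 12-bit address, and the (next state, 16 code bits) value matches the 4096 × 20 sizing given above.

```python
# Sketch of the look-up-table embodiment: the table is filled once by
# running a serial reference encoder over all 16 states x 256 data bytes,
# after which eight bits are encoded per "clock cycle" with one access.

def serial_step(state, u):
    """One serial step of the FIG. 3 outer code; state is a 4-bit int (x1 = LSB)."""
    x1, x2, x3, x4 = [(state >> i) & 1 for i in range(4)]
    w = u ^ x3 ^ x4                            # adder 312 output
    ya, yb = u, w ^ x1 ^ x2 ^ x4               # the two code bits
    next_state = w | (x1 << 1) | (x2 << 2) | (x3 << 3)
    return next_state, ya, yb

TABLE = {}
for state in range(16):
    for data in range(256):
        s, bits = state, []
        for i in range(8):                     # unroll eight serial steps
            s, ya, yb = serial_step(s, (data >> i) & 1)
            bits += [ya, yb]
        TABLE[(state, data)] = (s, bits)       # 20 bits of content per entry

def encode8(state, data_byte):
    """Encode eight data bits in a single table access."""
    return TABLE[(state, data_byte)]

next_state, code_bits = encode8(0, 0b10110010)
```

Building the table from the serial encoder guarantees that the parallel and serial encodings agree bit for bit.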
In another embodiment, the convolutional encoder is implemented as a state machine. The encoder state and output can be expressed as shown in equations (3) and (4). Each of equations (3) and (4) may be solved recursively, and the resulting equations implemented in hardware, software, or a combination of both. The recursive equations may be solved as follows. Let X_0^T represent the transposed state vector at time index 0, and let u_0 represent the input data bit at time index 0. The next state and output of the encoder can then be expressed as:
X_1 = A·X_0 + B·u_0        Formula (5)
y_0 = C·X_0 + D·u_0        Formula (6)

where A, B, C, and D are scalars, vectors, and matrices whose elements depend on the particular polynomial generator matrix being implemented. The encoder state equation (5) can be solved recursively as follows:
X_2 = A^2·X_0 + A·B·u_0 + B·u_1
X_3 = A^3·X_0 + A^2·B·u_0 + A·B·u_1 + B·u_2
...
X_8 = A^8·X_0 + A^7·B·u_0 + A^6·B·u_1 + A^5·B·u_2 + A^4·B·u_3 + A^3·B·u_4 + A^2·B·u_5 + A·B·u_6 + B·u_7
the encoder output equation (6) can also be recursively solved in a similar manner.
Equations (5) and (6) encode one data bit u at a time. A similar set of equations may be derived for encoding M data bits in parallel. For example, to encode eight data bits in parallel (i.e., M = 8), the transpose of the input data vector at time index n may be defined as U_n^T = [u_0 u_1 ... u_7] and the transpose of the output code vector as Y_n^T = [y_0 y_1 ... y_15]. With these vector definitions, equations (5) and (6) can be expressed as:

X_{n+1} = F·X_n + G·U_n        Formula (7)
Y_n = H·X_n + I·U_n            Formula (8)

where F, G, H, and I are vectors and matrices whose elements depend on the particular polynomial generator matrix being implemented, the current encoder state X_n, and the input data vector U_n. Equation (7) generates the next encoder state X_{n+1} after the M data bits have been encoded, and equation (8) generates the encoder output Y_n for the input vector U_n.
To determine F, G, H, and I in equations (7) and (8), equations (5) and (6) may be solved recursively using any of a variety of techniques, and the results from the recursive computations may be used to implement equations (7) and (8). For example, a table may be used to list the encoder state and outputs after each input data bit, and the entries in the table may then be used to implement equations (7) and (8), as described below.
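As one concrete way to obtain F, G, H, and I (an assumed technique, not one the text prescribes): because the encoder is linear over GF(2), each matrix column is simply the encoder's response to a unit state vector or a unit input vector. A minimal sketch, using the Fig. 3 outer-code taps:

```python
# Derive the one-shot matrices of equations (7) and (8) by probing a
# serial reference encoder with unit vectors, then apply one parallel step.

def run8(state_bits, data_bits):
    """Serially encode 8 bits; return (next_state_bits, 16 output bits)."""
    x = list(state_bits)                       # [x1, x2, x3, x4]
    out = []
    for u in data_bits:
        w = u ^ x[2] ^ x[3]                    # adder 312 output
        out += [u, w ^ x[0] ^ x[1] ^ x[3]]     # the two code bits
        x = [w, x[0], x[1], x[2]]              # shift registers
    return x, out

def probe(n_state=4, n_in=8):
    """Build F (4x4), G (4x8), H (16x4), I (16x8), stored as lists of columns."""
    F, G, H, I = [], [], [], []
    for j in range(n_state):                   # response to each unit state
        e = [1 if i == j else 0 for i in range(n_state)]
        s, y = run8(e, [0] * n_in)
        F.append(s); H.append(y)
    for j in range(n_in):                      # response to each unit input
        e = [1 if i == j else 0 for i in range(n_in)]
        s, y = run8([0] * n_state, e)
        G.append(s); I.append(y)
    return F, G, H, I

def apply(F, G, H, I, state, data):
    """One parallel step: X_{n+1} = F X_n + G U_n, Y_n = H X_n + I U_n (mod 2)."""
    nxt, y = [0] * 4, [0] * 16
    for j, b in enumerate(state):
        if b:
            nxt = [a ^ c for a, c in zip(nxt, F[j])]
            y = [a ^ c for a, c in zip(y, H[j])]
    for j, b in enumerate(data):
        if b:
            nxt = [a ^ c for a, c in zip(nxt, G[j])]
            y = [a ^ c for a, c in zip(y, I[j])]
    return nxt, y
```

Superposition over GF(2) guarantees that one application of equations (7) and (8) reproduces eight serial encoding steps exactly.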
Table 2 shows the encoder states and outputs after eight input data bits u_0 through u_7 have been serially provided to convolutional encoder 310 in Fig. 3, which implements equation (1). As shown in Fig. 3, registers 314A through 314D initially store the values x_1, x_2, x_3, and x_4, respectively. On the first clock cycle, the first data bit u_0 is provided to the encoder, and the output of adder 312 is computed as x_4 + x_3 + u_0, which is stored in the second row, second column of Table 2. The encoder outputs are computed as y_a0 = u_0 and y_b0 = (x_4 + x_3 + u_0) + x_4 + x_2 + x_1 = x_3 + x_2 + x_1 + u_0 (each adder 316 performs modulo-2 addition). On the next clock cycle, the values from adder 312 and registers 314A through 314C are shifted into registers 314A through 314D, respectively. The next data bit u_1 is provided to the encoder, and the output of adder 312 is computed as x_3 + x_2 + u_1, which is stored in the third row, second column of Table 2. The encoder outputs are computed as y_a1 = u_1 and y_b1 = (x_3 + x_2 + u_1) + x_3 + x_1 + (x_4 + x_3 + u_0) = x_4 + x_3 + x_2 + x_1 + u_1 + u_0. Processing continues until the eighth data bit u_7 has been received and processed.
The encoder output vector

Y_b = [y_b7 y_b6 y_b5 y_b4 y_b3 y_b2 y_b1 y_b0]

corresponds to the input vector U = [u_7 u_6 u_5 u_4 u_3 u_2 u_1 u_0] and is generated from the entries in the last column of Table 2. After the eighth data bit u_7 has been encoded, the encoder state X_{n+1} is generated from the entries in the last row of Table 2. As shown in Table 2, the encoder output vector Y_b and the next encoder state X_{n+1} are each a function of the current encoder state X_n = [x_4 x_3 x_2 x_1] and the input vector U. For the data phase, the encoder output vector Y_a is a function of the input vector U only (i.e., Y_a = U).
TABLE 2

u  | adder 312 output | x1 | x2 | x3 | x4 | ya | yb
u0 | x4+x3+u0 | x1 | x2 | x3 | x4 | u0 | x3+x2+x1+u0
u1 | x3+x2+u1 | x4+x3+u0 | x1 | x2 | x3 | u1 | x4+x3+x2+x1+u0+u1
u2 | x2+x1+u2 | x3+x2+u1 | x4+x3+u0 | x1 | x2 | u2 | x4+x2+x1+u0+u1+u2
u3 | x4+x3+x1+u0+u3 | x2+x1+u2 | x3+x2+u1 | x4+x3+u0 | x1 | u3 | x4+x1+u0+u1+u2+u3
u4 | x4+x2+u0+u1+u4 | x4+x3+x1+u0+u3 | x2+x1+u2 | x3+x2+u1 | x4+x3+u0 | u4 | x4+u0+u1+u2+u3+u4
u5 | x3+x1+u1+u2+u5 | x4+x2+u0+u1+u4 | x4+x3+x1+u0+u3 | x2+x1+u2 | x3+x2+u1 | u5 | x3+u1+u2+u3+u4+u5
u6 | x4+x3+x2+u0+u2+u3+u6 | x3+x1+u1+u2+u5 | x4+x2+u0+u1+u4 | x4+x3+x1+u0+u3 | x2+x1+u2 | u6 | x2+u2+u3+u4+u5+u6
u7 | x3+x2+x1+u1+u3+u4+u7 | x4+x3+x2+u0+u2+u3+u6 | x3+x1+u1+u2+u5 | x4+x2+u0+u1+u4 | x4+x3+x1+u0+u3 | u7 | x1+u3+u4+u5+u6+u7
next state | | x3+x2+x1+u1+u3+u4+u7 | x4+x3+x2+u0+u2+u3+u6 | x3+x1+u1+u2+u5 | x4+x2+u0+u1+u4 | |
Referring to Table 1, for packet format 1, the outer convolutional encoder within the HDR system receives 1018 data bits and four code tail bits per packet. If eight bits are encoded in parallel, 128 clock cycles are used to encode one packet. The first 127 clock cycles are used to encode 1016 data bits (i.e., 127 × 8 = 1016), and the 128th clock cycle is used to encode the remaining two data bits and the four code tail bits. The first 127 clock cycles are referred to as the "data phase," and the last clock cycle is referred to as the "code tail phase."

For packet format 2, the outer convolutional encoder receives 2042 data bits and four code tail bits per packet. If eight bits are encoded in parallel, 256 clock cycles are used to encode one packet. The first 255 clock cycles are used to encode 2040 data bits (i.e., 255 × 8 = 2040), and the 256th clock cycle is used to encode the remaining two data bits and the four code tail bits. The first 255 clock cycles are referred to as the "data phase," and the last clock cycle is referred to as the "code tail phase."
Table 3 shows the encoder states and outputs after the two data bits u0 and u1 and the four code tail bits are serially provided to convolutional encoder 310 in FIG. 3. As before, registers 314A through 314D initially store the values x1, x2, x3, and x4, respectively. In the first two clock cycles, the two data bits u0 and u1 are provided serially to the encoder. The encoder states x1 through x4 and the encoder outputs yc and yd are computed in the manner described above. Thus, the second and third rows of Table 3 are the same as the second and third rows of Table 2. On the third clock cycle, the first code tail bit, with value x2+x1, is provided to the encoder. The tail bit value is selected so that the output of adder 312 equals zero, which flushes the convolutional encoder. The encoder outputs are computed as yc2 = x2+x1 and yd2 = x4+u0+u1. On the next clock cycle, the values from adder 312 and registers 314A through 314C are shifted into registers 314A through 314D, respectively. The second code tail bit is selected as x4+x3+x1+u0, which again sets the output of adder 312 to zero and flushes the encoder. Processing continues, with the last two bits provided to the encoder being zeros.
As shown in Table 3, the encoder outputs Yc and Yd are each a function of the input vector U and the current encoder state Xn. For the code tail phase, the next encoder state Xn+1 is set to the known state of all zeros (i.e., X8 = [0 0 0 0]).
TABLE 3
input | adder 312 out | x1 | x2 | x3 | x4 | yc | yd
u0 | x4+x3+u0 | x1 | x2 | x3 | x4 | u0 | x3+x2+x1+u0
u1 | x3+x2+u1 | x4+x3+u0 | x1 | x2 | x3 | u1 | x4+x3+x2+x1+u0+u1
x2+x1 | 0 | x3+x2+u1 | x4+x3+u0 | x1 | x2 | x2+x1 | x4+u0+u1
x4+x3+x1+u0 | 0 | 0 | x3+x2+u1 | x4+x3+u0 | x1 | x4+x3+x1+u0 | x3+x2+x1+u1
x4+x2+u0+u1 | 0 | 0 | 0 | x3+x2+u1 | x4+x3+u0 | x4+x2+u0+u1 | x4+x3+u0
x3+x2+u1 | 0 | 0 | 0 | 0 | x3+x2+u1 | x3+x2+u1 | x3+x2+u1
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Xn+1 | | 0 | 0 | 0 | 0 | |
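The flushing behavior shown in Table 3 can be verified in the same bit-serial model. The sketch below is illustrative only (plain Python, not the described hardware) and covers just the four code tail bits: choosing each tail bit equal to the current x4+x3 forces the output of adder 312 to zero, so four tail bits drive the four-register encoder to the all-zero state.

```python
def serial_step(state, u):
    # One clock of the outer encoder; the adder 312 output x4^x3^u becomes x1.
    x1, x2, x3, x4 = state
    return (x4 ^ x3 ^ u, x1, x2, x3), u, x3 ^ x2 ^ x1 ^ u

def flush(state):
    # Encode four code tail bits, each chosen as x4^x3 to zero the adder.
    tails = []
    for _ in range(4):
        t = state[3] ^ state[2]          # current x4 ^ x3
        state, _, _ = serial_step(state, t)
        tails.append(t)
    return state, tails

# Regardless of the starting state, four tail bits return the encoder
# to the known all-zero state.
for start in [(1, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 0)]:
    final_state, _ = flush(start)
    assert final_state == (0, 0, 0, 0)
```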
FIG. 4 is a block diagram of an embodiment of a convolutional encoder 400 that can encode multiple input data bits in parallel. Convolutional encoder 400 may be used to implement the data and code tail phases (e.g., as defined in Tables 2 and 3, respectively). The encoder structure shown in FIG. 4 may be used to implement, for example, outer convolutional encoder 310 or inner convolutional encoder 340 in FIG. 3.
Within convolutional encoder 400, the input data bits are provided in parallel as a data vector U to encoder state machine 410, data phase output generator 420, and code tail phase output generator 430. Encoder state machine 410 also receives the current encoder state X and determines the new encoder state based on the received input vector U and the current encoder state X. Encoder state machine 410 may implement, for example, the last row of Table 2.
Data phase output generator 420 and code tail phase output generator 430 also receive the current encoder state X and determine the encoder outputs for the data phase and the code tail phase, respectively, based on the received inputs X and U. Data phase output generator 420 may implement, for example, the last two columns of Table 2, and code tail phase output generator 430 may implement, for example, the last two columns of Table 3. The first and second outputs Ya and Yb from data phase output generator 420 are provided to multiplexers (MUXes) 440A and 440B, respectively. Similarly, the first and second outputs Yc and Yd from code tail phase output generator 430 are provided to multiplexers 440A and 440B, respectively. When operating in the data phase, multiplexers 440A and 440B provide the outputs Ya and Yb, respectively, from data phase output generator 420; when operating in the code tail phase, they provide the outputs Yc and Yd, respectively, from code tail phase output generator 430.
To implement a convolutional encoder that continuously encodes input data bits as they are received, without resetting the encoder state at the start of each packet, only encoder state machine 410 and data phase output generator 420 are needed. For communication systems (e.g., HDR systems) in which data is sent in packets and code tail bits are used to reset the convolutional encoder to a known state at the start of each packet, code tail phase output generator 430 and multiplexers 440A and 440B provide the required encoder outputs.
The design of encoder state machine 410 and data phase output generator 420 depends on the particular polynomial generator matrix being implemented and the number of data bits to be encoded in parallel. The design of code tail phase output generator 430 depends on the polynomial generator matrix, the number of data bits to be encoded in parallel, and the particular frame format (e.g., the number of data and code tail bits to be encoded in the code tail phase). A specific design of convolutional encoder 400 is described below.
FIG. 5A is a schematic diagram of a specific embodiment of a convolutional encoder 500 that can encode eight input data bits in parallel and implements the polynomial generator matrix represented in equation (1). Convolutional encoder 500 includes an encoder state machine 510 that implements the state machine defined in Table 2, and a data phase output generator 520 that generates the encoder outputs defined in Table 2. Encoder state machine 510 and data phase output generator 520 correspond to encoder state machine 410 and data phase output generator 420, respectively, in FIG. 4. In this embodiment, encoder state machine 510 is implemented with AND gates 512A through 512D and registers 514A through 514D, and data phase output generator 520 is implemented with AND gates 522A through 522H.
As shown in FIG. 5A, the eight input data bits u0 through u7 are provided in parallel to the inputs of encoder state machine 510 and data phase output generator 520, each of which also receives the current encoder state defined by x1 through x4. Each AND gate 512 within encoder state machine 510 is selectively coupled to the inputs u0 through u7 and x1 through x4, as defined in the last row of Table 2. For example, AND gate 512A is coupled to the inputs x3, x2, x1, u1, u3, u4, and u7, as defined by the entry in the last row, third column (x1) of Table 2. The outputs of AND gates 512A through 512D are coupled to the inputs of registers 514A through 514D, respectively. The outputs of registers 514A through 514D comprise the state machine outputs x1 through x4, respectively.
Similarly, each AND gate within data phase output generator 520 is selectively coupled to the inputs u0 through u7 and x1 through x4, as defined in the last column of Table 2. For example, AND gate 522A is coupled to the inputs x3, x2, x1, and u0, as defined by the entry in the second row, last column (yb0) of Table 2. The inputs u0 through u7 respectively comprise the encoder outputs ya0 through ya7 (not shown in FIG. 5A for simplicity), and the outputs of AND gates 522A through 522H respectively comprise the encoder outputs yb0 through yb7.
FIG. 5B is a schematic diagram of a specific embodiment of a code tail phase output generator 530 and multiplexers 540A and 540B that implement the code tail phase of the polynomial generator matrix represented in equation (1) for packet formats 1 and 2 shown in Table 1. Code tail phase output generator 530 and multiplexers 540A and 540B correspond to code tail phase output generator 430 and multiplexers 440A and 440B, respectively, in FIG. 4. In this embodiment, code tail phase output generator 530 is implemented with AND gates 532A through 532J and generates the encoder outputs Yc and Yd for the code tail phase defined in Table 3. Multiplexer 540A is implemented with 2×1 multiplexers 542A through 542F and provides the first encoder output Yoa. Similarly, multiplexer 540B is implemented with 2×1 multiplexers 544A through 544H and provides the second encoder output Yob.
Encoder state machine 510, data phase output generator 520, code tail phase output generator 530, and multiplexers 540A and 540B in FIGS. 5A and 5B form a particular implementation of convolutional encoder 400. This implementation realizes the polynomial generator matrix represented in equation (1) for the packet formats described in Table 1. For packet format 1, the 1018 data bits are provided to convolutional encoder 500 over 128 clock cycles. Eight data bits are provided to encoder 500 in each of the first 127 clock cycles, and multiplexers 540A and 540B are selected to provide the outputs Ya and Yb from data phase output generator 520. On the 128th clock cycle, the remaining two data bits, the four code tail bits, and two zeros are provided to encoder 500. Registers 514A through 514D are reset to zero, and multiplexers 540A and 540B are selected to provide the outputs Yc and Yd from code tail phase output generator 530. For packet format 2, the 2042 data bits are provided to convolutional encoder 500 over 256 clock cycles. In each of the first 255 clock cycles, corresponding to the data phase, eight data bits are encoded in parallel, and multiplexers 540A and 540B provide the outputs Ya and Yb, respectively. On the 256th clock cycle, corresponding to the code tail phase, two data bits, four code tail bits, and two zeros are encoded in parallel, and multiplexers 540A and 540B provide the outputs Yc and Yd, respectively.
The specific implementation depicted in FIGS. 5A and 5B is provided for clearer understanding. Different implementations are also contemplated and are within the scope of the invention. Moreover, a different design is typically used for a different polynomial generator matrix, a different number of input data bits, or a different packet format.
In a similar manner, additional convolutional encoders may be designed to implement the polynomial generator matrix in equation (2). In one embodiment, the convolutional encoder is designed to receive and encode four code bits in parallel. Equations (5) and (6) for the next encoder state and the outputs can be solved recursively in the manner described above.
Table 4 shows the encoder states and outputs after the four input code bits v0 through v3 are serially provided to convolutional encoder 340 in FIG. 3. Registers 344A and 344B initially store the values x1 and x2, respectively. On the first clock cycle, the first code bit v0 is provided to encoder 340, and the output of adder 342 is computed as x1+v0, which is stored in the second row, second column of Table 4. The encoder outputs are computed as ye0 = v0 and yf0 = (x1+v0)+x2+x1 = x2+v0. On the next clock cycle, the values from adder 342 and register 344A are shifted into registers 344A and 344B, respectively. The next code bit v1 is provided to encoder 340, and the output of adder 342 is computed as x1+v0+v1, which is stored in the third row, second column. The outputs are computed as ye1 = v1 and yf1 = (x1+v0+v1)+(x1+v0)+x1 = x1+v1. The process continues until the fourth code bit v3 is received and processed.
The encoder output vector Yf is generated from the entries in the last column of Table 4. After the fourth code bit v3 is encoded, the encoder state Xn+1 is generated from the entries in the last row of Table 4. As shown in Table 4, the encoder output vector Yf and the next encoder state Xn+1 are each a function of the current encoder state Xn = [x2 x1] and the input vector V. For the data phase, the encoder output vector Ye is a function of only the input vector V.
TABLE 4
input | adder 342 out | x1 | x2 | ye | yf
v0 | x1+v0 | x1 | x2 | v0 | x2+v0
v1 | x1+v0+v1 | x1+v0 | x1 | v1 | x1+v1
v2 | x1+v0+v1+v2 | x1+v0+v1 | x1+v0 | v2 | x1+v0+v2
v3 | x1+v0+v1+v2+v3 | x1+v0+v1+v2 | x1+v0+v1 | v3 | x1+v0+v1+v3
Xn+1 | | x1+v0+v1+v2+v3 | x1+v0+v1+v2 | |
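As with the outer code, the unrolled equations of Table 4 can be checked against a bit-serial simulation of inner encoder 340. The following Python sketch is illustrative only (not the described hardware): the serial step uses the adder 342 feedback x1+v and the per-step output yf = x2+v, and the parallel step applies the closed-form equations of Table 4.

```python
import random

def serial_step(state, v):
    # One clock of the rate-1/2 inner encoder: adder 342 output x1^v
    # becomes the new x1; ye = v, yf = x2^v.
    x1, x2 = state
    return (x1 ^ v, x1), v, x2 ^ v

def parallel_step(state, v):
    # Four code bits per clock, using the unrolled equations of Table 4.
    x1, x2 = state
    v0, v1, v2, v3 = v
    nxt = (x1 ^ v0 ^ v1 ^ v2 ^ v3, x1 ^ v0 ^ v1 ^ v2)
    yf = [x2 ^ v0, x1 ^ v1, x1 ^ v0 ^ v2, x1 ^ v0 ^ v1 ^ v3]
    return nxt, list(v), yf      # next state, Ye (= V), Yf

# The parallel step must match four serial shifts for any state and input.
random.seed(2)
for _ in range(100):
    state = tuple(random.randint(0, 1) for _ in range(2))
    v = [random.randint(0, 1) for _ in range(4)]
    s, ye_s, yf_s = state, [], []
    for bit in v:
        s, ye, yf = serial_step(s, bit)
        ye_s.append(ye)
        yf_s.append(yf)
    assert parallel_step(state, v) == (s, ye_s, yf_s)
```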
Referring back to Table 1, the inner convolutional encoder within the HDR system receives 2044 code bits and four code tail bits per packet of packet format 1. If four bits are encoded in parallel, 512 clock cycles are used to encode one packet. The first 511 clock cycles are used to encode 2044 code bits (i.e., 511 × 4 = 2044), and the 512th clock cycle is used to encode the four code tail bits. The inner convolutional encoder receives 3069 code bits and three code tail bits per packet of packet format 2. If four bits are encoded in parallel, 768 clock cycles are used to encode one packet. The first 767 clock cycles are used to encode 3068 code bits (i.e., 767 × 4 = 3068), and the 768th clock cycle is used to encode the last code bit and the three code tail bits.
Table 5 shows the states and outputs of the inner convolutional encoder for the code tail phase of packet format 1. On the first clock cycle, the first code tail bit, with value x1, is provided to the encoder. The tail bit value is selected such that the output of adder 342 equals zero. The encoder outputs are computed as yg0 = x1 and yh0 = x2+x1. Processing continues in a similar manner for the remaining three code tail bits.
TABLE 5
input | adder 342 out | x1 | x2 | yg | yh
x1 | 0 | x1 | x2 | x1 | x2+x1
0 | 0 | 0 | x1 | 0 | x1
0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 0 | 0 | 0
Xn+1 | | 0 | 0 | |
Table 6 shows the states and outputs of the inner convolutional encoder for the code tail phase of packet format 2. On the first clock cycle, the last code bit v0 is provided to the encoder, and the encoder states x1 and x2 and the outputs yi0 and yj0 are computed in a similar manner as described above. The second row of Table 6 is thus the same as the second row of Table 4. On the second clock cycle, the first code tail bit, with value x1+v0, is provided to the encoder. The tail bit value is selected such that the output of adder 342 equals zero. The encoder outputs are computed as yi1 = x1+v0 and yj1 = v0. Processing continues in a similar manner for the remaining code tail bits.
TABLE 6
input | adder 342 out | x1 | x2 | yi | yj
v0 | x1+v0 | x1 | x2 | v0 | x2+v0
x1+v0 | 0 | x1+v0 | x1 | x1+v0 | v0
0 | 0 | 0 | x1+v0 | 0 | x1+v0
0 | 0 | 0 | 0 | 0 | 0
Xn+1 | | 0 | 0 | |
FIG. 6 is a schematic diagram of a specific embodiment of a convolutional encoder 600 that can encode four input code bits in parallel and implements the polynomial generator matrix represented in equation (2). Convolutional encoder 600 includes an encoder state machine 610 that implements the state machine defined in Table 4, an output generator 620 that generates the encoder outputs defined in Tables 4 through 6, and multiplexers 640A and 640B that provide the proper encoder outputs for the data and code tail phases of packet formats 1 and 2.
As shown in FIG. 6, the four input code bits v0 through v3 are provided in parallel to the inputs of encoder state machine 610 and output generator 620, each of which also receives the current encoder state, defined as Xn = [x2 x1]. Each AND gate 612 within encoder state machine 610 is selectively coupled to the inputs v0 through v3 and x1 through x2, as defined in the last row of Table 4. For example, AND gate 612A is coupled to the inputs x1, v0, v1, v2, and v3, as defined by the x1 entry in the last row of Table 4. The outputs of AND gates 612A and 612B are coupled to the inputs of registers 614A and 614B, respectively. The outputs of registers 614A and 614B comprise the state machine outputs x1 and x2, respectively.
Similarly, each AND gate 622 within output generator 620 is selectively coupled to the inputs v0 through v3 and x1 through x2, as defined in the last two columns of Tables 4 through 6. For example, AND gate 622A is coupled to the inputs x2 and v0 and generates yf0 (second row, last column of Table 4); AND gate 622B is coupled to the inputs x2 and x1 and generates yh0 (second row, last column of Table 5); and AND gate 622C is coupled to the inputs x2 and v0 and generates yj0 (second row, last column of Table 6). The other encoder outputs are generated as indicated in Tables 4 through 6.
Multiplexer 640A comprises 3×1 multiplexers 642A through 642D, which provide the first encoder outputs yia0 through yia3 of inner convolutional encoder 600. In the data phase, ye0 through ye3 are provided through multiplexers 642A through 642D, respectively. In the code tail phase, multiplexers 642A through 642D provide yg0 through yg3 for packet format 1 and yi0 through yi3 for packet format 2. Similarly, multiplexer 640B comprises 3×1 multiplexers 644A through 644D, which provide the second encoder outputs yib0 through yib3 of inner convolutional encoder 600. In the data phase, yf0 through yf3 are provided through multiplexers 644A through 644D, respectively. In the code tail phase, multiplexers 644A through 644D provide yh0 through yh3 for packet format 1 and yj0 through yj3 for packet format 2.
Another aspect of the present invention provides an interleaver capable of storing a plurality of code bits generated in parallel by the outer convolutional encoder and supplying a plurality of code bits in parallel to the inner convolutional encoder. Referring to FIG. 2, the interleaver is coupled between the outer and inner convolutional encoders. The interleaver is designed to store one or more packets of code bits. After an entire packet has been stored, the code bits are retrieved in a read order different from the write order, thereby interleaving the code bits. If interleaving is not required, the code bits are retrieved from the interleaver in the original order.
The outer convolutional encoder of the example embodiment may be designed to receive and encode M data bits in parallel and generate M·R code bits, where R is related to the code rate of the outer convolutional encoder (e.g., R = 2 for a rate 1/2 encoder). To speed up processing and reduce latency, the interleaver may be designed to store the M·R code bits from the outer convolutional encoder in parallel as they are generated. Similarly, the inner convolutional encoder may be designed to receive and encode N code bits in parallel. Again, to speed up processing and reduce latency, the interleaver may be designed to provide at least N code bits in parallel to the inner convolutional encoder in a single read operation.
The code bits from each of the outer and inner convolutional encoders may be punctured to provide code bits at other code rates. For example, referring to Table 1, the output from the outer convolutional encoder is not punctured for packet format 1, giving a code rate of 1/2, and is punctured for packet format 2 to obtain a code rate of 2/3. Similarly, the output from the inner convolutional encoder is punctured for packet format 1 to obtain a code rate of 1/2 and for packet format 2 to obtain a code rate of 3/4. The interface between the encoder and the interleaver can be designed to efficiently support symbol puncturing.
FIG. 7A is a diagram of an embodiment of an interleaver 700. In this embodiment, interleaver 700 is implemented with a multi-port memory 710 having P ports, where P is greater than one. Depending on the particular memory unit used to implement the interleaver, each of the P ports may be usable as both a write port and a read port, or may be a dedicated write or read port. In the embodiment shown in FIG. 7A, memory 710 includes W write ports, designated D1 through DW, and R read ports, designated Q1 through QR. Memory 710 further includes P address inputs, A1 through AP, one for each of the P ports. Each write and read port can transfer C bits in parallel.
Address generator 720 receives an input address ADDR, generates the necessary address for each active port, and provides the generated addresses to the address inputs A1 through AP of memory 710. Although not shown in FIG. 7A for simplicity, address generator 720 also generates one or more control signals that direct memory 710 to perform a write or read operation.
In one embodiment, memory 710 is configured as a two-dimensional memory with a plurality of rows and columns. In one embodiment, the coded bits are written into successive lines of memory 710. For efficiency, the width of each row may correspond to the width of each port (i.e., C bits). This allows up to W rows of coded bits to be written to the W write ports of memory 710 on each write operation. Once the coded bits of the entire packet are stored in memory 710, the coded bits may be retrieved from memory. In an embodiment, the coded bits are also read out from the memory 710 row by row. For the embodiment shown in fig. 7A, up to R rows of coded bits may be retrieved from R read ports per read operation.
Various designs may be used to provide the code bits from interleaver 700 to the inner convolutional encoder. The particular design used depends on the particular system requirements. In one design, R multiplexers 730A through 730R are coupled to the R read ports Q1 through QR, respectively. In each read operation, up to R rows of code bits are retrieved from memory 710 and provided to multiplexers 730A through 730R, which also receive the control signals AD1 through ADR, respectively. Each multiplexer 730 selects one code bit from the C code bits it receives, according to the corresponding control signal ADx, and provides the selected code bit to the multiplexer output. The control signals AD1 through ADR thus select particular code bits from the retrieved rows. The R multiplexers 730 may be used to provide up to R code bits in parallel to the inner convolutional encoder.
For clearer understanding, a specific design of an interleaver for use with the outer and inner convolutional encoders described in FIGS. 5A, 5B, and 6 will now be described. In the encoder design described above, the outer convolutional encoder receives and encodes 8 data bits in parallel and generates 16 code bits in one clock cycle, and the inner convolutional encoder receives and encodes 4 code bits in parallel. In this particular interleaver design, an 8-port memory is used, with four ports for receiving code bits in a write operation and four ports for providing code bits in a read operation. Each port can receive or provide 8 bits in parallel. Thus, for this particular design, up to 32 code bits may be written to the interleaver in one write operation, and up to 32 code bits may be read from the interleaver in one read operation.
FIG. 7B shows an embodiment of the interface between the outer convolutional encoder and the interleaver when no puncturing is performed. In this embodiment, the code bits generated by the outer convolutional encoder are provided to four registers 732A through 732D. Registers 732A and 732B receive the 16 code bits generated in a first clock cycle, and registers 732C and 732D receive the 16 code bits generated in a second clock cycle. When puncturing is not performed, all 32 code bits in registers 732A through 732D are provided to ports D1 through D4 of the memory, respectively, in a single write operation.
FIG. 7C is a diagram of an embodiment of the interface between the outer convolutional encoder and the interleaver when puncturing is performed. Referring to Table 1, the code bits of the outer code are punctured with the puncturing pattern (1011) for packet format 2. Thus, in one clock cycle, 16 code bits are generated, 4 code bits are punctured, and 12 code bits are stored. The 16 code bits generated in the first clock cycle are initially stored in registers 732A and 732B, and the 16 code bits generated in the second clock cycle are stored in registers 732C and 732D. After puncturing, the remaining 24 code bits are provided to three write ports (e.g., D1 through D3), as shown in FIG. 7C.
The address generator provides the proper addresses for writing the unpunctured code bits to sequential rows in the memory. One address is generated for each active port used to write the code bits. Thus, the address generator generates four addresses for ports D1 through D4 when no puncturing is performed, and three addresses for ports D1 through D3 when puncturing is performed.
To provide four code bits in parallel to the inner convolutional encoder, four rows of code bits are retrieved from the memory and provided to four 8×1 multiplexers. Each multiplexer also receives a respective 3-bit control signal ADx that selects a particular bit within the retrieved row to provide to the inner convolutional encoder. The address of each retrieved bit may thus be divided into two parts: the first part identifies a particular row within the memory, and the second part identifies a particular location within that row. The first part of the address is provided to the proper address input of the memory, and the second part is provided as the control signal ADx. The first and second parts of the address are generated in accordance with the particular interleaving scheme defined by the system or standard being implemented.
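The two-part address can be illustrated with a toy memory model. The sketch below is purely illustrative (a hypothetical 16-row memory with C = 8 bits per row, not the actual memory macro): the high-order part of a bit address selects the row presented to the memory, and the low-order part acts as the 3-bit 8×1 multiplexer control ADx.

```python
import random

C = 8                                        # bits per row and per read port
random.seed(0)
memory = [[random.randint(0, 1) for _ in range(C)] for _ in range(16)]
flat = [b for row in memory for b in row]    # the same bits, linearly addressed

def read_bit(addr):
    row = addr // C                          # first part: row address to the memory
    sel = addr % C                           # second part: 3-bit mux control AD_x
    return memory[row][sel]                  # the 8x1 mux picks one bit of the row

# Splitting the address into (row, select) recovers exactly the linear bit.
for addr in (0, 7, 8, 42, 127):
    assert read_bit(addr) == flat[addr]
```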
The interleaver of the example embodiments may also be implemented with other memories. For example, a single-port memory unit or multiple memory units may be used to store and provide multiple bits in parallel. With a single-port memory unit, multiple write operations may be used to store the generated code bits, and multiple read operations may be used to retrieve the desired code bits. In a design using multiple memory units, each memory unit may operate like a port (or a pair of ports) of a multi-port memory. Thus, a variety of designs may be used to implement the interleaver and are within the scope of the invention.
In the embodiments above, the interleaver is implemented between the outer and inner convolutional encoders. Using this configuration to implement a Turbo encoder may provide certain advantages. In other encoder designs, interleaving after the outer convolutional encoder may not be necessary, in which case a memory may not be needed after the outer convolutional encoder, or a memory may simply be used as a buffer.
The concatenated encoder of the exemplary embodiments may operate in a variety of ways. In a particular design, the encoder is used to encode one data packet at a time. Referring to fig. 2, a specific packet of data may be encoded by an outer convolutional encoder and stored in an interleaver. After the entire packet is encoded by the outer convolutional encoder, the encoded bits are extracted from the interleaver and encoded by the inner convolutional encoder. Once the entire packet is encoded by the inner convolutional encoder, the encoding of the next packet is performed by the outer convolutional encoder. This design reduces the memory requirements of the interleaver, which may be desirable in some applications.
In another specific design, the interleaver is implemented with the capability to store two or more packets of code bits. For example, the memory used to implement the interleaver may be partitioned into two banks, which allows the outer and inner convolutional encoders to operate on two packets simultaneously. The outer convolutional encoder encodes a first packet and stores the code bits for that packet to the first bank. After the entire first packet has been stored, the outer convolutional encoder encodes a second packet and stores the code bits for that packet to the second bank. While the outer convolutional encoder encodes and stores the code bits of the current packet into one bank, the inner convolutional encoder can retrieve and encode the code bits of the previous packet from the other bank. This design reduces processing delays.
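The two-bank operation can be sketched as a simple ping-pong buffer. The sketch below is purely illustrative (sequential Python standing in for concurrent hardware, with a trivial XOR standing in for each encoder): while the "outer encoder" fills one bank with the current packet, the "inner encoder" drains the other bank.

```python
def process(packets):
    # Ping-pong between two interleaver banks; the toy "encoders" just XOR
    # each bit with 1, so the two stages together leave a packet unchanged.
    banks = [None, None]
    out, write, prev = [], 0, None
    for pkt in packets:
        banks[write] = [b ^ 1 for b in pkt]           # outer stage fills one bank
        if prev is not None:
            out.append([b ^ 1 for b in banks[prev]])  # inner stage drains the other
        prev, write = write, 1 - write
    out.append([b ^ 1 for b in banks[prev]])          # drain the final packet
    return out

# Every packet passes through both toy stages, in order, despite the
# alternating bank usage.
assert process([[0, 1, 1], [1, 0, 0]]) == [[0, 1, 1], [1, 0, 0]]
```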
FIG. 8 is a block diagram of a specific design of an encoder 800, which may be used to implement some embodiments. Encoder 800 may be used to implement encoder 114 of FIG. 1. Encoder 800 includes a processing unit 810 coupled to an address generator 820 and a memory 830. Processing unit 810 receives data from buffer 802 and control information from a controller (not shown), encodes the received data in accordance with the control information, and provides the coded data to buffer 850.
In the embodiment shown in FIG. 8, processing unit 810 includes an input interface 812, a multi-bit encoder 814, an output interface 816, and a control unit 818. Input interface 812 generates address and control signals for buffer 802, receives data provided by buffer 802 in response to those signals, and routes the received data to multi-bit encoder 814. Multi-bit encoder 814 implements the outer and inner convolutional encoders and may be implemented with one or more look-up tables or one or more encoders such as the one described in FIG. 4. When operating as the outer convolutional encoder, multi-bit encoder 814 encodes the data from input interface 812 and provides the generated code bits to memory 830. When operating as the inner convolutional encoder, multi-bit encoder 814 encodes the code bits from memory 830 and provides the resulting code bits to output interface 816. Output interface 816 then provides the coded data to buffer 850.
Control unit 818 receives various control information, such as the particular data packet to encode, the location of the packet within buffer 802, the packet format, the coding scheme to use, the location in buffer 850 where the coded packet is to be stored, and so on. Control unit 818 then directs input interface 812 to retrieve the proper data bits from buffer 802, directs multi-bit encoder 814 to use the proper coding scheme, and directs output interface 816 to provide the coded data to the proper location in buffer 850.
Address generator 820 generates the appropriate addresses for writing coded bits to memory 830 and reading coded bits from memory. Address generator 820 may be implemented with logic, a look-up table, or some other design.
The memory 830 stores the coded bits generated by the multi-bit encoder 814 and provides the stored coded bits to the multi-bit encoder 814. Memory 830 may be used to provide interleaving of the coded bits by appropriately generating the addresses. Memory 830 may be implemented with a multi-port memory (as described above) or one or more memory cells.
FIG. 9 is a flow diagram of an embodiment of a process for performing concatenated coding of multiple data bits in parallel. Initially, at step 912, a number (M) of data bits from a particular data packet is received, and at step 914, the bits are encoded in parallel in accordance with a first (e.g., convolutional) coding scheme to generate a number (M·R) of code bits. The number of code bits generated by the first coding scheme depends on the particular code rate of the scheme. At step 916, zero or more of the generated code bits may be punctured with a first puncturing scheme to provide code bits at a different code rate. The unpunctured code bits are then stored to a memory at step 918.
In the embodiment shown in FIG. 9, the entire packet is encoded by the first coding scheme and stored before encoding by the second coding scheme begins. This allows the code bits to be interleaved, as described above. Accordingly, at step 920, a determination is made whether the entire packet has been encoded. If the answer is no, the process returns to step 912 and another M (or fewer) data bits are received.
Otherwise, if the entire packet has been encoded, a number (N) of code bits is retrieved from the memory at step 922 and encoded in parallel in accordance with a second (e.g., convolutional) coding scheme at step 924 to generate a number (N·R) of code bits. Again, the number of code bits generated by the second coding scheme depends on the particular code rate of the scheme. At step 926, zero or more of the generated code bits may be punctured with a second puncturing scheme to provide code bits at another code rate. The unpunctured code bits are then provided, at step 928, as coded data to the next processing unit (e.g., modulator 116 in FIG. 1).
For efficiency and reduced latency, W words of code bits may be stored in parallel to the memory (e.g., via W write ports), and R words of code bits may be retrieved in parallel from the memory (e.g., via R read ports). Writing W words at a time allows the unpunctured code bits from the first coding scheme to be stored in parallel, and reading R words at a time allows N code bits to be provided in parallel to the second coding scheme. The memory may be operated in the manner described above to achieve interleaving of the code bits. For example, W words may be written to sequential rows in the memory, and R words may thereafter be read from permuted rows in the memory.
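The overall flow of FIG. 9 can be sketched end to end with the bit-serial encoders described earlier. This is a toy model, not the described implementation: it omits code tail bits and puncturing, and it uses a hypothetical even-then-odd read order standing in for the standard-defined interleaving.

```python
def outer_step(state, u):
    # Bit-serial outer encoder (rate 1/2): outputs [ya, yb] per input bit.
    x1, x2, x3, x4 = state
    return (x4 ^ x3 ^ u, x1, x2, x3), [u, x3 ^ x2 ^ x1 ^ u]

def inner_step(state, v):
    # Bit-serial inner encoder (rate 1/2): outputs [ye, yf] per input bit.
    x1, x2 = state
    return (x1 ^ v, x1), [v, x2 ^ v]

def encode_packet(bits, interleave):
    # Step 1: outer encode the whole packet and store the code bits to "memory".
    s, mem = (0, 0, 0, 0), []
    for u in bits:
        s, y = outer_step(s, u)
        mem.extend(y)
    # Step 2: read the code bits back in interleaved order and inner encode.
    s, out = (0, 0), []
    for i in interleave(len(mem)):
        s, y = inner_step(s, mem[i])
        out.extend(y)
    return out

# Hypothetical permutation for illustration only: even positions first,
# then odd (the HDR system uses a different, standard-defined order).
perm = lambda n: list(range(0, n, 2)) + list(range(1, n, 2))

coded = encode_packet([1, 0, 1, 1, 0, 0, 1, 0], perm)
assert len(coded) == 4 * 8        # overall rate 1/4 before any puncturing
```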
The encoder and interleaver of the example embodiments can significantly shorten the encoding time. By encoding M data bits in parallel with the outer convolutional encoder and N code bits in parallel with the inner convolutional encoder, the overall coding delay can be greatly reduced. The interleaver supports parallel encoding because it can receive multiple code bits in a write operation and provide multiple code bits in a read operation. The improvement in processing delay for a specific design is shown in Table 7, for packet formats 1 and 2 in the HDR system, with M = 8 and N = 4.
TABLE 7

                                   Packet format 1        Packet format 2
                                  Parallel    Serial     Parallel    Serial
 Outer convolutional encoder
   Input bits                          1018                   2042
   Code tail bits                         4                      4
   Total input bits                    1022                   2046
   Required clock cycles                128      1024          256      2048
 Inner convolutional encoder
   Input bits                          2044                   3069
   Code tail bits                         4                      3
   Total input bits                    2048                   3072
   Required clock cycles                512      2048          768      3072
 Encoding time (20 MHz clock)
   Outer encoder (μsec)                 6.4      51.2         12.8     102.4
   Inner encoder (μsec)                25.6     102.4         38.4     153.6
   Total encoding time (μsec)          32.0     153.6         51.2     256.0
For the specific design shown in Table 7, the overall coding delay is reduced by a factor of 4.8 for packet format 1 and by a factor of 5.0 for packet format 2. Further improvements in processing delay can be achieved by increasing the number of bits encoded in parallel, particularly for the inner convolutional encoder (i.e., by increasing N).
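The parallel-encoding figures in Table 7 follow directly from cycles = ceil(total input bits / bits per cycle) at a 20 MHz clock. A short calculation for packet format 1, using the table's serial time of 153.6 μsec for comparison, reproduces the 4.8x factor:

```python
import math

# Table 7 assumptions: M = 8 bits/cycle (outer encoder), N = 4 bits/cycle
# (inner encoder), 20 MHz clock, one cycle per group of M (or N) input bits.

def parallel_us(total_bits, bits_per_cycle, clock_mhz=20.0):
    # One clock cycle encodes bits_per_cycle input bits.
    return math.ceil(total_bits / bits_per_cycle) / clock_mhz

outer = parallel_us(1022, 8)        # 128 cycles -> 6.4 usec
inner = parallel_us(2048, 4)        # 512 cycles -> 25.6 usec
total_parallel = outer + inner      # 32.0 usec
speedup = 153.6 / total_parallel    # vs. Table 7 serial time: 4.8x
```

The same arithmetic with 2046 and 3072 input bits gives the packet format 2 figures (12.8 + 38.4 = 51.2 μsec, a 5.0x reduction from 256.0 μsec).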
The shorter processing delay provided by the encoder and interleaver of the present invention provides several benefits. Some of these benefits are briefly described below.
First, shorter processing delays make it possible to support services, such as voice and video, that have stringent delay requirements. Efficient coding schemes may therefore be applied to more delay-sensitive applications.
Second, shorter processing delays may improve system performance. For example, if a particular user or data rate is selected for a transmission based on link conditions determined at a particular time, a shorter processing delay increases the likelihood that those conditions will not have changed by the time the data is transmitted. Link conditions typically vary over time, and a longer processing delay increases the likelihood that they change during the transmission, which can degrade performance.
Third, shorter processing delays may improve the capacity of some communication systems. For example, in an HDR system, power control data is multiplexed with traffic data and transmitted to user terminals. The shorter processing delay enables more accurate control of the transmit power of the user terminal, which can increase system capacity and improve performance.
Fourth, a shorter processing delay enables multiple transmitting entities (e.g., three users in a three-sector system) to sequentially share a hardware resource (e.g., an encoder) within one processing slot (e.g., a forward link slot in an HDR system), reducing the overall area of the hardware design.
For the sake of brevity, various aspects and embodiments of the encoder of the present invention have been described specifically for the forward link in an HDR system. The invention may also be used in other communication systems that use the same, similar, or different coding schemes. For example, the encoder of the present invention may be used to implement a convolutional encoder capable of receiving and encoding multiple data bits in parallel. It may likewise be used to implement a concatenated encoder, such as a Turbo encoder, capable of receiving and encoding multiple data bits in parallel. The specific design of the encoder depends on various factors such as the particular polynomial generator matrix being implemented, the number of bits to be encoded in parallel, the packet format, the use of code tail bits, and so on.
The encoder of the present invention may advantageously be used in a base station or a user terminal (such as a mobile unit, a telephone, etc.) within a communication system. The coding for the forward link (i.e., downlink) and the reverse link (i.e., uplink) may differ and generally depends on the particular CDMA system or standard being implemented. Thus, the encoder of the present invention is generally designed for the particular application for which it is used.
Referring to the specific designs shown in Tables 2 and 3, the next state and outputs of the outer convolutional encoder can be generated with functions having up to seven terms. Referring to the specific designs shown in Tables 4 through 6, the next state and outputs of the inner convolutional encoder can be generated with functions having up to five terms. These functions can be readily implemented with logic gates, as is known in the art. Other elements of the inner and outer convolutional encoders (e.g., registers, multiplexers) can also be implemented in a manner known in the art.
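The following sketch illustrates this idea: the next state and the output code bits are each fixed XOR functions of the current state bits and the input bits, so all M bits can be encoded in one clock by combinational logic. The toy rate-1/2, constraint-length-3 generators (1 + D + D^2 and 1 + D^2) are an assumption for illustration, not the polynomials of Tables 2 through 6.

```python
def encode_m_bits(state, bits):
    """Encode M = len(bits) input bits in one pass.

    state = (s1, s2) holds the two delay-register bits; returns the 2M
    code bits and the next state. Every expression below is a pure XOR
    of inputs and state bits, i.e., realizable directly as logic gates;
    the Python loop stands in for the unrolled combinational network.
    """
    s1, s2 = state
    out = []
    for b in bits:
        out.append(b ^ s1 ^ s2)   # output stream for generator 1 + D + D^2
        out.append(b ^ s2)        # output stream for generator 1 + D^2
        s1, s2 = b, s1            # shift register advances by one bit
    return out, (s1, s2)
```

For M = 8 the hardware simply instantiates eight copies of these XOR expressions, each substituting the symbolic state of the previous stage, and registers only the final (s1, s2).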
Some or all of the elements of the inventive encoder described above (e.g., the multi-bit encoder, input and output interfaces, control unit, encoder state machine, output generators, multiplexers, and so on) may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex PLDs (CPLDs), controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof. Some or all of the elements of the encoder may also be implemented in software or firmware executed by a processor.
The memory and storage elements, such as those used to implement the interleaver of the present invention, may be implemented with various memory technologies such as random access memory (RAM), dynamic RAM (DRAM), flash memory, and so on. The storage elements may also be implemented with storage devices such as, for example, a hard disk, a CD-ROM drive, and the like. Various other implementations of the memory unit are possible and within the scope of the invention.
The foregoing description of the preferred embodiments will provide those skilled in the art with an enabling description for implementing the invention. The general principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (31)

1. A concatenated encoder for parallel encoding of a plurality of data bits, comprising:
a first encoder configured to receive and encode a plurality of M data bits in parallel according to a first coding scheme to generate a plurality of MR code bits;
a memory coupled to the first encoder and configured to receive and store unpunctured ones of the MR code bits from the first encoder; and
a second encoder coupled to the memory and configured to receive and encode a plurality of N code bits in parallel according to a second coding scheme to generate coded data.
2. The concatenated encoder of claim 1, wherein the memory is implemented as a multi-port memory having P ports, where P is greater than one.
3. The concatenated encoder of claim 2, wherein the memory is configured to store a plurality of W code words in parallel for a write operation and to provide a plurality of R code words in parallel for a read operation, wherein each code word comprises a particular number of code bits.
4. The concatenated encoder of claim 2, wherein W code words are stored into consecutive rows of the memory during write operations, and R code words are retrieved from permuted rows of the memory during read operations to provide interleaving.
5. The concatenated encoder of claim 2, further comprising:
a plurality of N multiplexers, each multiplexer coupled to a port of the multi-port memory and configured to receive a corresponding code word from the memory, select one of the code bits in the received code word, and provide the selected code bit, wherein the N selected code bits from the plurality of N multiplexers comprise the N code bits provided in parallel to the second encoder.
6. The concatenated encoder of claim 1, wherein each of the first and second encoders is a convolutional encoder implementing a particular polynomial generator matrix.
7. The concatenated encoder of claim 1, wherein each of the first and second encoders is a rate 1/2 convolutional encoder.
8. The concatenated encoder of claim 1, wherein at least one of the first and second encoders is implemented using one or more look-up tables.
9. The concatenated encoder of claim 1, wherein at least one of the first and second encoders is implemented as a state machine.
10. The concatenated encoder of claim 1, wherein the first encoder is operative to receive and encode at least eight data bits in parallel according to a first convolutional coding scheme.
11. The concatenated encoder of claim 1, wherein the second encoder is operative to receive and encode at least four code bits in parallel according to a second convolutional coding scheme.
12. The concatenated encoder of claim 1, wherein the memory is configured to provide interleaving of the coded bits stored in the memory.
13. The concatenated encoder of claim 1, wherein encoding of a first packet by the first and second encoders is completed before encoding of a second packet begins, to reduce memory requirements.
14. The concatenated encoder of claim 1, wherein a first packet is encoded by the second encoder while a second packet is encoded by the first encoder, to reduce processing delays.
15. A convolutional encoder for encoding a plurality of data bits in parallel, comprising:
a state machine configured to receive a plurality of M data bits in parallel and to provide a set of values indicative of a next state of the state machine, wherein the next state is a function of the M data bits and a set of values indicative of a current state of the state machine; and
an output generator coupled to the state machine and configured to receive the M data bits and the set of values for the current state and to generate a plurality of MR code bits in response thereto.
16. The convolutional encoder of claim 15, wherein the output generator comprises:
a first output generator configured to receive the M data bits and the set of values for the current state and to generate a first plurality of MR code bits in response thereto; and
a second output generator configured to receive the M data bits and the set of values for the current state and to generate a second plurality of MR code bits in response thereto;
wherein the first plurality of MR code bits is selected for a first encoding phase and the second plurality of MR code bits is selected for a second encoding phase, and wherein the state machine is set to a known state for the second encoding phase.
17. The convolutional encoder of claim 16, wherein the known state is defined by a set of zero values.
18. The convolutional encoder of claim 15, wherein the state machine implements a particular polynomial generator matrix.
19. The convolutional encoder of claim 15, wherein the state machine comprises:
a set of logic elements, each logic element coupled to selected ones of the M data bits and to the set of values for the current state to implement a particular logic function for a corresponding bit of the state machine; and
a set of registers coupled to the set of logic elements and configured to store output values from the logic elements, wherein the outputs from the set of registers comprise the set of values indicative of the current state of the state machine.
20. The convolutional encoder of claim 15, wherein the state machine and the output generator are designed to receive and encode eight or more data bits in parallel.
21. A data encoder for encoding a plurality of bits in parallel, comprising:
an input interface for receiving a plurality of M data bits;
a multi-bit encoder coupled to the input interface and configured to receive and encode the M data bits in parallel to generate a plurality of MR code bits, or to receive and encode a plurality of N code bits in parallel to generate a plurality of NR code bits;
a memory coupled to the multi-bit encoder and configured to store unpunctured bits of the MR coded bits from the multi-bit encoder and to provide the N coded bits to the multi-bit encoder; and
an output interface coupled to the multi-bit encoder and configured to receive the NR code bits from the multi-bit encoder and to provide unpunctured ones of the NR code bits as coded data.
22. The data encoder of claim 21, further comprising:
an address generator coupled to the memory and configured to generate addresses for memory read and write operations.
23. A transmitter unit in a communication system, comprising:
an encoder configured to receive and encode a plurality of M data bits in parallel according to a first coding scheme to generate a plurality of MR code bits, store unpunctured ones of the MR code bits, interleave the code bits for a particular packet, receive and encode a plurality of N code bits in parallel according to a second coding scheme to generate a plurality of NR code bits, and provide unpunctured ones of the NR code bits as coded data;
a modulator coupled to the encoder and configured to receive and modulate the encoded data with a particular modulation scheme to generate modulated data; and
a transmitter coupled to the modulator and configured to receive and process the modulated data to generate a modulated signal suitable for transmission.
24. The transmitter unit of claim 23, wherein the encoder is configured to implement Turbo coding or concatenated coding.
25. A method for implementing parallel concatenated coding of a plurality of data bits, comprising:
receiving a plurality of M data bits;
encoding M data bits in parallel according to a first encoding scheme to generate a plurality of MR encoded bits;
storing unpunctured bits of the MR code bits in a memory;
retrieving a plurality of N coded bits from a memory; and
encoding the N code bits in parallel according to a second coding scheme to generate coded data.
26. The method of claim 25, wherein storing comprises:
writing W code words of unpunctured code bits via a plurality of W ports of the memory.
27. The method of claim 25, wherein the retrieving comprises:
reading R code words of code bits via a plurality of R ports of the memory.
28. The method of claim 25, wherein each of the first and second coding schemes is a convolutional code.
29. The method of claim 25, wherein the N code bits are retrieved from the memory in a read order different from the write order used to store unpunctured ones of the MR code bits, thereby providing interleaving of the code bits.
30. The method of claim 25, wherein encoding the M data bits and encoding the N code bits each comprises:
applying a first corresponding set of encoding equations if the M data bits or the N code bits, respectively, include no code tail bits, and
applying a second corresponding set of encoding equations if the M data bits or the N code bits include one or more code tail bits.
31. A method of parallel encoding a plurality of data bits, comprising:
receiving a plurality of M data bits, wherein M is greater than or equal to eight;
encoding the M data bits in parallel according to a first convolutional coding scheme to generate a plurality of MR code bits, wherein encoding the M data bits comprises
applying a first set of encoding equations if the M data bits include no code tail bits, and
applying a second set of encoding equations if the M data bits include one or more code tail bits;
storing unpunctured ones of the MR code bits in a memory in a first order;
retrieving a plurality of N code bits from the memory in a second order, wherein N is greater than or equal to four; and
encoding the N code bits in parallel according to a second convolutional coding scheme to generate coded data, wherein encoding the N code bits comprises
applying a third set of encoding equations if the N code bits include no code tail bits, and
applying a fourth set of encoding equations if the N code bits include one or more code tail bits.
HK05104653.5A 2001-09-20 2002-09-06 Method and apparatus for encoding data in parallel HK1071967A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/957,820 2001-09-20

Publications (1)

Publication Number Publication Date
HK1071967A true HK1071967A (en) 2005-08-05
