CN1564125A

CN1564125A - Array-type reconfigurable DSP engine chip structure based on CORDIC units

Info

Publication number: CN1564125A
Application number: CN 200410013670
Authority: CN
Inventors: 杨宇; 毛志刚
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2004-04-09
Filing date: 2004-04-09
Publication date: 2005-01-12

Abstract

An interconnection bus, made up of longitudinal and transverse interconnection buses joined through a reconfigurable switch network, is arranged between reconfigurable processing units laid out in an array. Reconfigurable processing units in the same column are connected longitudinally through basic unit data lines, which are connected to the transverse interconnection bus through the reconfigurable switch network. In transversely adjacent reconfigurable processing units, the adders, shifters, accumulators and registers at the same pipeline stage are connected to their counterparts through interconnection lines containing control switches. Because it is based on the CORDIC algorithm, the reconfigurable processing unit is itself highly reconfigurable: it can realize a wide range of DSP algorithms, and its structure is simple, regular and easy to modularize. The reconfigurable processing unit is therefore well suited as the core unit of a reconfigurable chip.

Description

An array-type reconfigurable DSP engine chip structure based on CORDIC units

Technical field:

The invention relates to the internal structure of a reconfigurable (hardware-programmable) array chip built from coarse-grained basic units with the CORDIC algorithm at their core; the structure is intended mainly for the DSP field. By configuring the chip's reconfigurable hardware resources, the core stages of most DSP algorithms can be executed efficiently, so the chip can serve as an acceleration engine in a DSP system.

Background art:

CORDIC (COordinate Rotation DIgital Computer), also called the coordinate-rotation digital computation method, is an iterative method for computing generalized vector rotations. By setting a small number of parameters in a CORDIC unit, a variety of elementary functions and operations can be realized with simple "shift-and-add" iterations, including trigonometric, inverse trigonometric, hyperbolic and inverse hyperbolic functions, logarithms, exponentials, square roots, multiplication and division. These properties show that the CORDIC algorithm is itself highly reconfigurable (hardware-programmable). Many of these functions and operations are difficult to implement by other methods, yet they occur frequently in DSP algorithms. Existing reconfigurable devices fall into two broad categories: fine-grained and coarse-grained. Our survey of existing coarse-grained array chips found that their data word width is fixed, so each chip suits only the class of applications that share that word width. We believe that a chip whose data word width is itself reconfigurable would be far more adaptable: less of its resources would be wasted, the functional modules it forms would have higher performance and lower power consumption, and it would need much less configuration data, which favors dynamic applications.
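The claim that a few parameters select the function can be illustrated with a floating-point model of the unified shift-and-add CORDIC iteration. This is a sketch, not the patent's circuit: the `cordic` function, its `m`/`mode`/`n` parameters and the 32-iteration default are illustrative assumptions, and only the circular and linear coordinate systems are modeled (the hyperbolic system, which needs repeated iterations, is omitted). Multiplying by `2.0 ** -i` stands in for a hardware right shift by `i` bits.

```python
import math

def cordic(x, y, z, *, m, mode, n=32):
    """Unified shift-and-add CORDIC iteration, floating-point model.

    m    : coordinate system, 1 = circular, 0 = linear (hyperbolic omitted)
    mode : "rotation" drives z toward 0, "vectoring" drives y toward 0
    """
    if m not in (0, 1) or mode not in ("rotation", "vectoring"):
        raise ValueError("unsupported configuration")
    for i in range(n):
        t = 2.0 ** -i                            # stands in for "shift right by i"
        step = math.atan(t) if m == 1 else t     # angle (circular) or step (linear)
        if mode == "rotation":
            d = 1 if z >= 0 else -1
        else:
            d = -1 if y >= 0 else 1
        x, y, z = x - m * d * y * t, y + d * x * t, z - d * step
    return x, y, z

def circular_gain(n=32):
    """Magnitude growth of n circular iterations: prod of sqrt(1 + 2^(-2i))."""
    return math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))

K = circular_gain()
c, s, _ = cordic(1 / K, 0.0, 0.6, m=1, mode="rotation")   # approx. cos(0.6), sin(0.6)
r, _, a = cordic(3.0, 4.0, 0.0, m=1, mode="vectoring")     # approx. r = 5*K, a = atan(4/3)
_, p, _ = cordic(1.5, 0.0, 0.75, m=0, mode="rotation")     # approx. p = 1.5 * 0.75
_, _, q = cordic(2.0, 1.5, 0.0, m=0, mode="vectoring")     # approx. q = 1.5 / 2.0
```

The same loop body serves all four operations; only `m`, `mode` and the initial values change, which is the kind of parameter-level reconfiguration the background describes.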

Summary of the invention:

To solve the problem that existing coarse-grained DSP array chips have a fixed data word width, the invention provides an array-type reconfigurable DSP chip whose data word width can be changed by reconfiguring the basic arithmetic components of adjacent units. The technical solution is as follows. In an array-type reconfigurable DSP engine chip structure based on CORDIC units, an interconnection bus 2 is arranged between several reconfigurable processing units 1 laid out in an array. The longitudinal interconnection bus 2-1 and the transverse interconnection bus 2-2 are connected to each other through a reconfigurable switch network 3. Adjacent reconfigurable processing units 1 in the same column are connected longitudinally through basic unit data lines 4, and the basic unit data lines 4 are connected to the transverse interconnection bus 2-2 through the reconfigurable switch network 3.
Each reconfigurable processing unit 1 is a multi-stage pipeline implementing the CORDIC algorithm. Within the same pipeline stage of two transversely adjacent reconfigurable processing units 1, each adder is connected to the adder in the corresponding position, each shifter to the corresponding shifter, each accumulator to the corresponding accumulator, and each register to the corresponding register, through interconnection lines 7 that include control switches 5. To make adjacent units reconfigurable with respect to each other, the DSP engine chip of the invention establishes reconfigurable paths mainly between two arithmetic components of the same pipeline stage in adjacent units, such as shifters, adders, accumulators and registers. When such a path is closed, the two shifters, adders, accumulators or registers form one functional module with twice the original data word width. We added this reconfigurability to the shifters, adders, accumulators and registers of each unit, so that the shifters, adders and registers at the same pipeline stage of two transversely adjacent units can be configured into the corresponding functional unit of twice the original word width. This allows 4, 9 or 16 8-bit CORDIC units that are adjacent in the layout to be combined into a 16-, 24- or 32-bit CORDIC unit, respectively. Because the data word width of the shifters, adders, registers and accumulators can be changed through reconfiguration, the generality and adaptability of the chip are greatly improved: less of the chip's resources are wasted, the functional modules formed have high performance and low power consumption, and dynamic applications are facilitated.
The reconfigurable processing unit 1, being based on the CORDIC algorithm, is itself highly reconfigurable and can efficiently implement a wide range of DSP algorithms; its structure is simple, regular and easy to modularize, making it well suited as the core unit of a reconfigurable chip. The invention is novel in design, reliable in operation and has considerable value for wider adoption.
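Word-width joining at the adder level can be sketched behaviorally as 8-bit adder slices whose carry path to the neighboring slice passes through a control switch. This is a hedged model, not the circuit of Fig. 5: the patent uses carry-lookahead adders internally, whereas the sketch only models slice-level behavior, and the names `add8`, `to_slices`, `from_slices` and `sliced_add` are hypothetical. Chaining two, three or four slices gives 16-, 24- or 32-bit addition, matching the word widths listed above.

```python
MASK8 = 0xFF  # each processing unit's datapath slice is 8 bits wide

def add8(a, b, cin=0):
    """One 8-bit adder slice: returns (8-bit sum, carry-out)."""
    total = a + b + cin
    return total & MASK8, total >> 8

def to_slices(value, k):
    """Split a k*8-bit value into k slices, least significant first."""
    return [(value >> (8 * i)) & MASK8 for i in range(k)]

def from_slices(slices):
    """Reassemble slices (least significant first) into one integer."""
    return sum(s << (8 * i) for i, s in enumerate(slices))

def sliced_add(a_slices, b_slices, switches):
    """Add slice by slice; switches[i] models the control switch between
    slice i and slice i+1 (True = closed, carry passes; False = open)."""
    out, carry = [], 0
    for i, (a, b) in enumerate(zip(a_slices, b_slices)):
        cin = carry if i > 0 and switches[i - 1] else 0
        s, carry = add8(a, b, cin)
        out.append(s)
    return out

a, b = 0x12F0FF, 0x030F01
joined = sliced_add(to_slices(a, 3), to_slices(b, 3), [True, True])
split = sliced_add(to_slices(a, 3), to_slices(b, 3), [False, False])
print(hex(from_slices(joined)))    # 0x160000: one 24-bit addition
print([hex(s) for s in split])     # ['0x0', '0xff', '0x15']: three 8-bit additions
```

The only difference between the two results is whether the carries cross the slice boundaries, which is the role the description assigns to the control switches.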

Description of drawings:

Fig. 1 is a schematic structural diagram of the invention; Fig. 2 is a schematic structural diagram of the reconfigurable switch network 3 in Embodiment 2; Fig. 3 is a schematic structural diagram of the reconfigurable processing unit 1 in Embodiment 1; Fig. 4 is a schematic diagram of shifter reconfiguration in Embodiment 1; Fig. 5 is a schematic diagram of adder reconfiguration; Fig. 6 is a schematic diagram of the connection structure of the first pipeline stage in adjacent reconfigurable processing units 1.

Detailed description of the embodiments:

Embodiment 1: This embodiment is described in detail below with reference to Fig. 1 and Figs. 3 to 6. An interconnection bus 2 is arranged between several reconfigurable processing units 1 laid out in an array, and the longitudinal interconnection bus 2-1 and the transverse interconnection bus 2-2 are connected to each other through a reconfigurable switch network 3. Adjacent reconfigurable processing units 1 in the same column are connected longitudinally through basic unit data lines 4, and the basic unit data lines 4 are connected to the transverse interconnection bus 2-2 through the reconfigurable switch network 3. Each reconfigurable processing unit 1 is a multi-stage pipeline implementing the CORDIC algorithm. Within the same pipeline stage of transversely adjacent reconfigurable processing units 1, each adder and the adder in the corresponding position, each shifter and the corresponding shifter, each accumulator and the corresponding accumulator, and each register and the corresponding register are connected through interconnection lines 7 that include control switches 5.

As shown in Fig. 4, shifter 1-1-1 and shifter 1-1-2, which belong to two adjacent reconfigurable processing units 1, are both 3-bit right shifters. When a longer data word width is needed, control switch 5 is turned on by an instruction, so that interconnection line 7 becomes a conducting path and shifters 1-1-1 and 1-1-2 combine into one shifter with twice the original data word width. Adders can be reconfigured in the same way to extend the data word width. As shown in Fig. 5, closing control switch 5 between the carry-lookahead adders 1-2-1 and 1-2-2, which occupy the same position in two adjacent reconfigurable processing units 1, yields one carry-lookahead adder with twice the original data word width, thus realizing the reconfiguration. Accumulators and registers can be reconfigured in the same way. Control switch 5 is implemented with a field-effect transistor.
Fig. 6 shows the connection structure of the shifters, adders, registers and accumulators at the same positions in the first pipeline stage of two transversely adjacent reconfigurable processing units 1. Adders 1-2-1 and 1-2-2, adders 1-3-1 and 1-3-2, adders 1-4-1 and 1-4-2, registers 1-5-1 and 1-5-2, registers 1-6-1 and 1-6-2, registers 1-7-1 and 1-7-2, shifters 1-8-1 and 1-8-2, shifters 1-9-1 and 1-9-2, and flag registers 1-10-1 and 1-10-2 are each connected through an interconnection line 7 that includes a switch 5.
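Joining the two 3-bit right shifters of Fig. 4 can be modeled at the bit level: when the switch is closed, the three bits shifted out of the high slice enter the top of the low slice instead of that slice's own fill bits. The sketch assumes two's-complement data and arithmetic (sign-filling) shifts, which the description does not specify, and the function names are hypothetical.

```python
SHIFT = 3  # both shifters in Fig. 4 shift right by three bits

def shr_slice(value, fill):
    """8-bit slice shifted right by SHIFT; `fill` supplies the SHIFT bits entering at the top."""
    return ((value >> SHIFT) | (fill << (8 - SHIFT))) & 0xFF

def sign_fill(value):
    """SHIFT copies of the slice's sign bit (arithmetic shift when working alone)."""
    return (1 << SHIFT) - 1 if value & 0x80 else 0

def paired_shifters(hi, lo, switch_on):
    """Two adjacent 8-bit shifters; returns (hi_out, lo_out)."""
    hi_out = shr_slice(hi, sign_fill(hi))
    # Closed switch: the bits shifted out of the high slice flow into the low slice.
    lo_fill = (hi & ((1 << SHIFT) - 1)) if switch_on else sign_fill(lo)
    return hi_out, shr_slice(lo, lo_fill)

hi, lo = paired_shifters(0xB2, 0xC4, switch_on=True)
print(hex(hi << 8 | lo))   # 0xf658, i.e. 0xB2C4 (-19772) >> 3 = -2472
print([hex(v) for v in paired_shifters(0xB2, 0xC4, switch_on=False)])  # ['0xf6', '0xf8']
```

With the switch open each slice performs its own 8-bit arithmetic shift; with it closed the pair behaves as one 16-bit shifter.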

As shown in Figs. 3 and 6, the chip operates in two phases: a reconfiguration phase and a working phase. In the reconfiguration phase, preset configuration data are written into the reconfigurable parts of the structure above, i.e. the presettable storage cells shown as circles in the figures. The chip's function is then fixed: it behaves like a hardware circuit dedicated to one specific function and can be loaded with data to be processed at any time. In the working phase, one set of data enters the processing engine at the top of the unit on every clock cycle and moves down stage by stage in pipelined fashion; when it reaches the stage where the algorithm ends, it is routed through the interconnection bus to a chip port.
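The two-phase operation can be mimicked with a toy software pipeline in which the per-stage functions are fixed once (the reconfiguration phase) and one input is then clocked in per cycle (the working phase). The class `ConfiguredPipeline` and its use of `None` as an empty pipeline slot are illustrative assumptions; in the chip the stages would be the configured shifters, adders and registers.

```python
class ConfiguredPipeline:
    """Toy model of the two phases: stage functions are fixed at construction
    (reconfiguration phase); clock() then advances data by one stage (working phase)."""

    def __init__(self, stages):
        self.stages = list(stages)               # configuration, fixed from now on
        self.regs = [None] * len(self.stages)    # pipeline registers, None = empty slot

    def clock(self, new_input=None):
        """One clock edge: returns the value leaving the last stage (or None)."""
        out = self.regs[-1]
        # Update from the bottom up so each stage reads its predecessor's old value.
        for i in range(len(self.stages) - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = None if prev is None else self.stages[i](prev)
        self.regs[0] = None if new_input is None else self.stages[0](new_input)
        return out

pipe = ConfiguredPipeline([lambda v: v + 1, lambda v: v * 2, lambda v: v - 3])
outputs = [pipe.clock(v) for v in [1, 2, 3, None, None, None]]
print(outputs)   # [None, None, None, 1, 3, 5]
```

With three stages the first result leaves on the fourth clock, and after that one result leaves per clock, so latency grows with the number of stages while throughput stays at one data set per cycle.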

The internal operation of an 8-bit CORDIC unit is illustrated below using the computation of the sine and cosine of an angle α as an example. The unit has three data inputs (X₀, Y₀, Z₀), initialized to (1, 0, α). The shift sequence of the first ten pipeline stages is (0, 0, 1, 2, 3, 4, 5, 6, 7, 8), and that of the three modulus-correction stages is (2, 5, 8). On entering the first stage, X₀ splits into two paths: one feeds the first adder directly as its augend, and the other, after a "right shift by 0 bits", feeds the second adder as its addend. Y₀ splits into three paths: one feeds the second adder directly as its augend, the second feeds the first adder as its addend after a "right shift by 0 bits", and the third goes to the sign-decision module. Z₀ splits into two paths: one goes to the third adder, which combines it with ±arctan(2⁻⁰), and the other goes to the sign-decision module. The sign-decision module generates a sign bit from the sign of Zᵢ and sends it to the three adders, determining whether each adds or subtracts.
At the next clock edge, the results of the three adders are stored in that stage's pipeline registers for use by the second stage. The same holds for every stage up to the tenth; the stages differ only in their shift amounts, as given by the shift sequence. Stages 11 to 13 perform modulus correction. In each of these stages, the X value splits into two paths: one feeds an adder directly as its augend, and the other feeds the same adder as its addend after a right shift. Y is corrected in the same way as X, while Z is not operated on and is simply held in registers for three stages. The shift sequence of the three correction stages is (2, 5, 8), and whether each correction adder adds or subtracts is controlled by a preset storage bit. The values in the thirteenth-stage pipeline registers are the result of the computation: X₁₃ = cos α, Y₁₃ = sin α, Z₁₃ = 0 (to within the precision of the unit). These sine and cosine values can be output over the interconnection bus or passed to other functional modules.
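The stage-by-stage behavior described above is the standard rotation-mode CORDIC recurrence: with dᵢ = sign(Zᵢ) and shift sᵢ, each stage computes Xᵢ₊₁ = Xᵢ − dᵢYᵢ2^(−sᵢ), Yᵢ₊₁ = Yᵢ + dᵢXᵢ2^(−sᵢ) and Zᵢ₊₁ = Zᵢ − dᵢ·arctan(2^(−sᵢ)). The floating-point sketch below runs the ten rotation stages with the listed shift sequence. It is not bit-accurate to the 8-bit unit, and it swaps in a different gain-compensation technique: X₀ is pre-scaled by 1/K, where K is the product of √(1 + 2^(−2sᵢ)), instead of applying the three modulus-correction stages, because the description does not give their add/subtract preset bits. The function names are hypothetical.

```python
import math

# Shift amounts of the ten rotation stages, as listed in the description.
ROTATION_SHIFTS = (0, 0, 1, 2, 3, 4, 5, 6, 7, 8)

def rotation_gain(shifts=ROTATION_SHIFTS):
    """Magnitude growth of the rotation stages: product of sqrt(1 + 2^(-2s))."""
    return math.prod(math.sqrt(1 + 4.0 ** -s) for s in shifts)

def cordic_sin_cos(alpha, shifts=ROTATION_SHIFTS):
    """Rotation-mode CORDIC in floating point; returns (cos, sin, residual z).

    Assumption: the gain is removed by pre-scaling X0 = 1/K rather than by the
    three modulus-correction stages (2, 5, 8), whose preset bits are not given.
    """
    x, y, z = 1.0 / rotation_gain(shifts), 0.0, alpha
    for s in shifts:
        d = 1 if z >= 0 else -1                # sign-decision module: sign of Z_i
        t = 2.0 ** -s                          # "right shift by s bits"
        x, y = x - d * y * t, y + d * x * t    # X and Y adders
        z -= d * math.atan(t)                  # Z adder: Z_i -/+ arctan(2^-s)
    return x, y, z

c, s, residual = cordic_sin_cos(math.radians(30))
```

For this sequence K is about 2.33, and for angles inside the convergence range the residual angle after the last stage is at most arctan(2⁻⁸) ≈ 0.0039 rad, which bounds the error of the returned cosine and sine. Repeating the 0-bit shift widens the convergence range from about 99.7° (shifts 0 to 8 used once each) to about 144.7°; the description does not state its reason for the repetition, so this is only one reading of it.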

Embodiment 2: This embodiment is described in detail below with reference to Figs. 1 and 2. It differs from Embodiment 1 as follows: the interconnection bus 2 is a 64-bit interconnection bus, and the reconfigurable switch network 3 consists of a number of switching transistors 3-1 placed at the crossing points of the 64 longitudinal interconnection lines 2-1 and the 64 transverse interconnection lines 2-2. The two main terminals of each switching transistor 3-1 are connected to the longitudinal interconnection bus 2-1 and the transverse interconnection bus 2-2 respectively, and its control terminal is connected to a preset memory 3-2 that controls the transistor. In operation, whether each switching transistor 3-1 is on or off is programmed into the preset memory 3-2, which determines how the chip is configured. The switching transistor 3-1 may be either a CMOS or an NMOS switch.
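The switch network of this embodiment behaves like a crossbar whose crossing points are each controlled by one preset bit. A toy model follows; the class and method names are hypothetical, the model propagates values in one direction only (from longitudinal to transverse lines) although a pass transistor conducts both ways, and it treats two different values meeting on one transverse line as a configuration error. If every crossing point carries a switching transistor, one 64 × 64 network needs 64 × 64 = 4096 preset bits.

```python
class SwitchMatrix:
    """Toy crossbar: bits[v][h] is the preset bit for the switching transistor
    at longitudinal line v and transverse line h (True = conducting)."""

    def __init__(self, n=64):
        self.n = n
        self.bits = [[False] * n for _ in range(n)]   # stands in for preset memory 3-2

    def program(self, v, h, on=True):
        """Configuration phase: set one crossing point on or off."""
        self.bits[v][h] = on

    def drive(self, vertical):
        """Working phase: map {longitudinal line: bit} to {transverse line: bit}."""
        horizontal = {}
        for v, value in vertical.items():
            for h in range(self.n):
                if self.bits[v][h]:
                    if horizontal.get(h, value) != value:
                        raise ValueError(f"conflicting values on transverse line {h}")
                    horizontal[h] = value
        return horizontal

sw = SwitchMatrix()
sw.program(3, 10)
sw.program(3, 11)   # one longitudinal line may fan out to several transverse lines
sw.program(5, 20)
print(sw.drive({3: 1, 5: 0}))   # {10: 1, 11: 1, 20: 0}
print(sum(map(sum, sw.bits)))   # 3 of the 4096 preset bits are set
```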

Claims

1. An array-type reconfigurable DSP engine chip structure based on CORDIC units, in which an interconnection bus (2) is arranged between several reconfigurable processing units (1) laid out in an array, the longitudinal interconnection bus (2-1) and the transverse interconnection bus (2-2) are connected to each other through a reconfigurable switch network (3), adjacent reconfigurable processing units (1) in the same longitudinal direction are connected longitudinally through basic unit data lines (4), and the basic unit data lines (4) are connected to the transverse interconnection bus (2-2) through the reconfigurable switch network (3), characterized in that the reconfigurable processing unit (1) is a multi-stage pipeline structure of the CORDIC algorithm, and, in the same pipeline stage of transversely adjacent reconfigurable processing units (1), the adder and the adder at the corresponding position, the shifter and the shifter at the corresponding position, the accumulator and the accumulator at the corresponding position, and the register and the register at the corresponding position are each connected through an interconnection line (7) that includes a control switch (5).

2. The array-type reconfigurable DSP engine chip structure based on CORDIC units according to claim 1, characterized in that the interconnection bus (2) is a 64-bit interconnection bus; the reconfigurable switch network (3) consists of several switching transistors (3-1) arranged at the crossing points of the 64 longitudinal interconnection lines (2-1) and the 64 transverse interconnection lines (2-2); the two main terminals of each switching transistor (3-1) are connected to the longitudinal interconnection bus (2-1) and the transverse interconnection bus (2-2) respectively; and the control terminal of each switching transistor (3-1) is connected to the preset memory (3-2) that controls it.