CN1564125A - Array type reconstructural DSP engine chip structure based on CORDIC unit - Google Patents
Array type reconstructural DSP engine chip structure based on CORDIC unit
- Publication number
- CN1564125A (application CN 200410013670)
- Authority
- CN
- China
- Prior art keywords
- reconfigurable
- interconnection bus
- adder
- interconnection
- reconfigurable processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Multi Processors (AREA)
Abstract
Description
Technical field:
The invention relates to the internal structure of a reconfigurable (hardware-programmable) array chip built from coarse-grained basic units with the CORDIC algorithm at their core. The structure is intended mainly for DSP applications. By configuring the chip's reconfigurable hardware resources, it can efficiently execute the core steps of most DSP algorithms and can serve as an acceleration engine in a DSP system.
Background art:
CORDIC (COordinate Rotation DIgital Computing), also called the coordinate rotation digital computation method, is an iterative method for computing generalized vector rotations. By setting a small number of parameters in a CORDIC unit, simple shift-and-add iterations can implement many elementary functions and operations: trigonometric, inverse trigonometric, hyperbolic, and inverse hyperbolic functions, logarithms, exponentials, square roots, multiplication, and division. These properties show that the CORDIC algorithm is inherently reconfigurable (hardware-programmable). Many of these functions and operations are hard to implement by other means, yet they occur frequently in DSP algorithms. Existing reconfigurable devices fall into two classes, fine-grained and coarse-grained. Our survey of existing coarse-grained array chips found that their data word width is fixed, so each chip suits only applications that share that word width. We believe that a chip with a reconfigurable data word width would be far more adaptable: it would waste fewer chip resources, the functional modules it forms would offer higher performance and lower power consumption, and it would need much less configuration data by comparison, which favors dynamic applications.
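For reference, the shift-and-add iteration behind these functions is commonly written in the unified form below. The notation ($m$, $d_i$, $s_i$, $e_i$) is the conventional one from the CORDIC literature and is an assumption here; the patent text itself does not define these symbols.

```latex
\begin{aligned}
x_{i+1} &= x_i - m\, d_i\, 2^{-s_i}\, y_i \\
y_{i+1} &= y_i + d_i\, 2^{-s_i}\, x_i \\
z_{i+1} &= z_i - d_i\, e_i
\end{aligned}
\qquad
e_i =
\begin{cases}
\arctan\!\left(2^{-s_i}\right), & m = 1 \ \text{(circular)} \\
2^{-s_i}, & m = 0 \ \text{(linear)} \\
\operatorname{artanh}\!\left(2^{-s_i}\right), & m = -1 \ \text{(hyperbolic)}
\end{cases}
```

Here $d_i = \pm 1$ is chosen at each step from the sign of $z_i$ (rotation mode) or of $y_i$ (vectoring mode), and $s_i$ is the shift amount of step $i$. Choosing $m$ and the mode is what selects among the functions listed above, so only a few parameters need to change between configurations.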
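Summary of the invention:
To solve the problem that existing coarse-grained DSP array chips have a fixed data word width, the invention provides an array-type reconfigurable DSP chip that changes the data word width by reconfiguring the basic arithmetic components of adjacent units. The technical solution is as follows. In an array-type reconfigurable DSP engine chip structure based on CORDIC units, interconnection buses 2 are arranged between a number of reconfigurable processing units 1 laid out in an array. The vertical interconnection buses 2-1 and the horizontal interconnection buses 2-2 are connected to each other through a reconfigurable switch network 3. Adjacent reconfigurable processing units 1 in the same column are connected vertically by basic unit data lines 4, and the basic unit data lines 4 are connected to the horizontal interconnection buses 2-2 through the reconfigurable switch network 3. Each reconfigurable processing unit 1 is a multi-stage pipeline implementing the CORDIC algorithm. Within the same pipeline stage of horizontally adjacent reconfigurable processing units 1, each adder, shifter, accumulator, and register is connected to the component at the corresponding position in the neighboring unit by an interconnection line 7 that includes a control switch 5.
To make adjacent units reconfigurable, the DSP engine chip establishes a reconfigurable path between pairs of arithmetic components (shifters, adders, accumulators, registers) that sit in the same pipeline stage of adjacent units. When the path is closed, the two shifters, adders, accumulators, or registers form one functional module whose data word width is twice the original. This reconfiguration capability has been added to the shifters, adders, accumulators, and registers of every unit, so that the components in the same pipeline stage of two horizontally adjacent units can be configured to join into a corresponding functional unit of twice the word width. It allows 4, 9, or 16 8-bit CORDIC units that are adjacent in the layout to be combined into one 16-, 24-, or 32-bit CORDIC unit. Because the data word widths of the shifters, adders, registers, and accumulators can all be changed by reconfiguration, the chip is considerably more general and adaptable: it wastes fewer chip resources, and the functional modules it forms offer high performance and low power consumption, which favors dynamic applications. The reconfigurable processing unit 1 based on the CORDIC algorithm is itself highly reconfigurable and can efficiently implement a wide range of DSP algorithms. Its structure is simple and regular and easy to modularize, which makes it well suited as the core unit of a reconfigurable chip. The invention is novel in design and reliable in operation, and has considerable value for wider adoption.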
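Description of the drawings:
Fig. 1 is a schematic structural diagram of the invention. Fig. 2 is a schematic structural diagram of the reconfigurable switch network 3 in Embodiment 2. Fig. 3 is a schematic structural diagram of the reconfigurable processing unit 1 in Embodiment 1. Fig. 4 is a schematic diagram of shifter reconfiguration in Embodiment 1. Fig. 5 is a schematic diagram of adder reconfiguration. Fig. 6 is a schematic diagram of the connection structure of the first pipeline stage in adjacent reconfigurable processing units 1.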
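Detailed description of the embodiments:
Embodiment 1: This embodiment is described with reference to Fig. 1 and Figs. 3 to 6. Interconnection buses 2 are arranged between a number of reconfigurable processing units 1 laid out in an array. The vertical interconnection buses 2-1 and the horizontal interconnection buses 2-2 are connected to each other through a reconfigurable switch network 3. Adjacent reconfigurable processing units 1 in the same column are connected vertically by basic unit data lines 4, and the basic unit data lines 4 are connected to the horizontal interconnection buses 2-2 through the reconfigurable switch network 3. Each reconfigurable processing unit 1 is a multi-stage pipeline implementing the CORDIC algorithm. Within the same pipeline stage of horizontally adjacent reconfigurable processing units 1, each adder, shifter, accumulator, and register is connected to the component at the corresponding position in the neighboring unit by an interconnection line 7 that includes a control switch 5.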
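As shown in Fig. 4, shifter 1-1-1 and shifter 1-1-2 belong to two adjacent reconfigurable processing units 1, and both shift right by three bits. When a longer data word width is needed, an instruction closes control switch 5, interconnection line 7 becomes a conducting path, and shifters 1-1-1 and 1-1-2 merge into a single shifter with twice the original word width. Adders can be reconfigured in the same way to lengthen the data word. As shown in Fig. 5, carry-lookahead adders 1-2-1 and 1-2-2 occupy the same position in two adjacent reconfigurable processing units 1; closing control switch 5 yields one carry-lookahead adder with twice the original word width, which realizes the reconfiguration function. Accumulators and registers can be reconfigured in the same way. Control switch 5 is implemented with a field-effect transistor. Fig. 6 shows the connection structure of the shifters, adders, registers, and accumulators at matching positions in the first pipeline stage of two horizontally adjacent reconfigurable processing units 1. Each of the following pairs is connected by an interconnection line 7 that includes a switch 5: adders 1-2-1 and 1-2-2, adders 1-3-1 and 1-3-2, adders 1-4-1 and 1-4-2, registers 1-5-1 and 1-5-2, registers 1-6-1 and 1-6-2, registers 1-7-1 and 1-7-2, shifters 1-8-1 and 1-8-2, shifters 1-9-1 and 1-9-2, and flag registers 1-10-1 and 1-10-2.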
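The word-width doubling described above can be sketched in software. The following is a minimal behavioral model, not the patent's circuit: the function names (`add8`, `add_pair`, `shr3_pair`), the choice of which unit holds the high byte, and the use of unsigned (logical) shifts are all assumptions made for illustration. The `switch_closed` flag stands in for control switch 5 on interconnection line 7.

```python
MASK8 = 0xFF  # one basic unit handles an 8-bit word


def add8(a, b, carry_in=0):
    """8-bit adder: returns (sum, carry_out)."""
    total = (a & MASK8) + (b & MASK8) + carry_in
    return total & MASK8, total >> 8


def add_pair(a_hi, a_lo, b_hi, b_lo, switch_closed):
    """Two neighboring 8-bit adders.

    With the switch closed, the carry of the low adder feeds the high
    adder, so the pair behaves as one 16-bit adder. With the switch
    open, the two adders work independently.
    """
    lo, carry = add8(a_lo, b_lo)
    hi, _ = add8(a_hi, b_hi, carry if switch_closed else 0)
    return hi, lo


def shr3_pair(hi, lo, switch_closed):
    """Two neighboring shift-right-by-3 units.

    With the switch closed, the three bits shifted out of the high
    unit enter the top of the low unit, giving one 16-bit shift.
    """
    spill = ((hi & 0b111) << 5) if switch_closed else 0
    return (hi & MASK8) >> 3, ((lo & MASK8) >> 3) | spill


if __name__ == "__main__":
    hi, lo = add_pair(0x12, 0xF0, 0x01, 0x20, switch_closed=True)
    print(f"joined add:   0x{hi:02X}{lo:02X}")
    hi, lo = shr3_pair(0x12, 0xF0, switch_closed=True)
    print(f"joined shift: 0x{hi:02X}{lo:02X}")
```

In this model the only signal crossing between units is the carry (for the adder) or the three spilled bits (for the shifter), which is consistent with the text's description of a single switched interconnection between matching components.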
As shown in Fig. 3 and Fig. 6, the chip operates in two phases: a reconfiguration phase and a working phase. In the reconfiguration phase, the reconfigurable parts of the structure above, namely the presettable memory cells drawn as circles in the figures, are written with preset configuration data. The chip's function is then fixed; it behaves like a hardware circuit that performs one specific function and can start processing data as soon as the data is loaded. In the working phase, one set of data enters the processing engine from the top of the unit on every clock cycle and moves down stage by stage through the pipeline. When it reaches the point where the algorithm ends, it is routed over the interconnection bus to a chip port.
The internal operation of an 8-bit CORDIC unit is illustrated below by computing the sine and cosine of an angle α. The unit has three data inputs (X0, Y0, Z0), initialized to (1, 0, α). The shift sequence of the first ten pipeline stages is (0, 0, 1, 2, 3, 4, 5, 6, 7, 8), and the shift sequence of the three modulus-correction stages is (2, 5, 8). On entering the first stage, X0 splits into two paths: one goes directly to the first adder as its augend, and the other is shifted right by 0 bits and becomes the addend of the second adder. Y0 splits into three paths: one goes directly to the second adder as its augend, the second is shifted right by 0 bits and becomes the addend of the first adder, and the third goes to the sign-detection module. Z0 splits into two paths: one goes to an adder, where it is summed with ±arctan(2^-0), and the other goes to the sign-detection module. The sign-detection module generates a sign bit from the sign of Zi and sends it to the three adders, controlling whether each one adds or subtracts.
The results of the three adders are stored in that stage's pipeline registers when the next clock arrives, for use by the second stage. The same pattern continues through the tenth stage; the stages differ only in the shift amount given by the shift sequence. Stages eleven to thirteen perform modulus correction. In these stages the X value splits into two paths: one goes directly to an adder as its augend, and the other is shifted right and becomes the addend of the same adder. The Y value is corrected in the same way. The Z value is not operated on and is simply registered through the three stages. The shift sequence of the three modulus-correction stages is (2, 5, 8), and whether each stage's adder adds or subtracts is controlled by a preset memory bit. The data in the thirteenth-stage pipeline registers is the result of the computation: X13 = cos α, Y13 = sin α, Z13 = 0. The sine and cosine values can be output over the interconnection bus or passed to other functional modules.
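The ten rotation stages described above can be sketched as follows. This is a floating-point model under stated assumptions, not the 8-bit hardware: the function name `cordic_sin_cos` is hypothetical, and instead of modelling the three shift-and-add modulus-correction stages with shifts (2, 5, 8), the sketch simply divides by the accumulated gain of the rotation stages. Only the shift sequence and the initial values (1, 0, α) come from the text.

```python
import math

# Shift sequence of the ten rotation stages, as given in the text.
ROTATION_SHIFTS = (0, 0, 1, 2, 3, 4, 5, 6, 7, 8)


def cordic_sin_cos(alpha, shifts=ROTATION_SHIFTS):
    """Return (cos, sin, residual_angle) for alpha in radians."""
    x, y, z = 1.0, 0.0, alpha
    gain = 1.0
    for s in shifts:
        d = 1.0 if z >= 0 else -1.0       # sign-detection module
        t = 2.0 ** -s                     # "shift right by s bits"
        # Both updates use the values from the previous stage.
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)             # Z path: add or subtract arctan(2^-s)
        gain *= math.sqrt(1.0 + t * t)    # vector length grows at each stage
    # Stand-in for the modulus-correction stages (assumption, see lead-in).
    return x / gain, y / gain, z


if __name__ == "__main__":
    c, s, z = cordic_sin_cos(0.3)
    print(c, math.cos(0.3))
    print(s, math.sin(0.3))
    print("residual angle:", z)
```

With ten stages ending at a shift of 8, the residual angle is bounded by roughly arctan(2^-8), so the result agrees with the library sine and cosine to about two decimal places; a fixed-point 8-bit unit would add quantization error on top of that.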
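Embodiment 2: This embodiment is described with reference to Fig. 1 and Fig. 2. It differs from Embodiment 1 as follows. The interconnection bus 2 is a 64-bit interconnection bus, and the reconfigurable switch network 3 consists of a number of switch transistors 3-1. A switch transistor 3-1 is placed at each crossing of the 64 vertical interconnection bus lines 2-1 and the 64 horizontal interconnection bus lines 2-2. The two main terminals of each switch transistor 3-1 are connected to a vertical interconnection bus line 2-1 and a horizontal interconnection bus line 2-2 respectively, and its control terminal is connected to a preset memory 3-2 that controls the transistor. In operation, programming the preset memory 3-2 sets each switch transistor 3-1 to conducting or non-conducting, which determines how the chip is configured. The switch transistors 3-1 may be either CMOS or NMOS transistors.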
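The switch network of this embodiment behaves like a programmable crossbar, which can be modelled roughly as below. The sketch is illustrative only: the name `route`, the representation of the preset memory 3-2 as a 64×64 grid of booleans, and the convention that vertical lines drive horizontal lines are assumptions (a real pass transistor conducts in both directions).

```python
N_LINES = 64  # 64 vertical and 64 horizontal bus lines, per the text


def route(vertical_bits, preset_memory):
    """Propagate bits from vertical lines onto horizontal lines.

    preset_memory[v][h] is True when the switch transistor at the
    crossing of vertical line v and horizontal line h is conducting.
    Returns a list of 64 values; None marks an undriven line.
    """
    horizontal = [None] * N_LINES
    for v in range(N_LINES):
        for h in range(N_LINES):
            if preset_memory[v][h]:
                if horizontal[h] is not None:
                    raise ValueError(f"horizontal line {h} has two drivers")
                horizontal[h] = vertical_bits[v]
    return horizontal


if __name__ == "__main__":
    memory = [[False] * N_LINES for _ in range(N_LINES)]
    memory[3][10] = True              # connect vertical 3 to horizontal 10
    bits = [0] * N_LINES
    bits[3] = 1
    print(route(bits, memory)[10])
```

Rejecting two drivers on one horizontal line is a design choice of this model, used to flag configurations that would short two sources together; the patent text does not say how such configurations are handled.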
Claims (2)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200410013670 CN1564125A (en) | 2004-04-09 | 2004-04-09 | Array type reconstructural DSP engine chip structure based on CORDIC unit |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN1564125A (en) | 2005-01-12 |
Family
ID=34478237
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200410013670 Pending CN1564125A (en) | 2004-04-09 | 2004-04-09 | Array type reconstructural DSP engine chip structure based on CORDIC unit |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1564125A (en) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101620587B (en) * | 2008-07-03 | 2011-01-19 | 中国人民解放军信息工程大学 | Flexible reconfigurable task processing unit structure |
| CN102650860A (en) * | 2011-02-25 | 2012-08-29 | 西安邮电学院 | Controller structure of signal processing hardware in novel data stream DSP (digital signal processor) |
| CN102163247A (en) * | 2011-04-02 | 2011-08-24 | 北京大学深圳研究生院 | Array structure of reconfigurable operators |
| CN102214158A (en) * | 2011-06-08 | 2011-10-12 | 清华大学 | Dynamic reconfigurable processor with full-interconnection routing structure |
| CN102339269B (en) * | 2011-09-09 | 2017-10-27 | 北京大学深圳研究生院 | A kind of reconfigurable operator array structure suitable for WLP packing forms |
| CN102339269A (en) * | 2011-09-09 | 2012-02-01 | 北京大学深圳研究生院 | Reconfigurable operator array structure suitable for WLP (Wafer Level Packaging) packaging mode |
| CN102624653A (en) * | 2012-01-13 | 2012-08-01 | 清华大学 | Extensible QR decomposition method based on pipeline working mode |
| CN102624653B (en) * | 2012-01-13 | 2014-08-20 | 清华大学 | Extensible QR decomposition method based on pipeline working mode |
| CN103390071A (en) * | 2012-05-07 | 2013-11-13 | 北京大学深圳研究生院 | Hierarchical interconnection structure of reconfigurable operator array |
| CN106326628B (en) * | 2015-12-03 | 2018-12-28 | 西安邮电大学 | A kind of reconfigurable array structure for realizing natural logrithm and natural exponential function |
| CN106326628A (en) * | 2015-12-03 | 2017-01-11 | 西安邮电大学 | Reconstructing array structure for natural logarithm and natural exponential functions |
| CN105843774B (en) * | 2016-03-23 | 2018-10-02 | 东南大学—无锡集成电路技术研究所 | A kind of Reconfigurable Computation cellular construction that dynamic multi-mode can match |
| CN105843774A (en) * | 2016-03-23 | 2016-08-10 | 东南大学—无锡集成电路技术研究所 | Dynamic multimode configurable reconstructed computation unit structure |
| CN109933372A (en) * | 2019-02-26 | 2019-06-25 | 西安理工大学 | A low-power processor with a multi-mode dynamically switchable architecture |
| CN109933372B (en) * | 2019-02-26 | 2022-12-09 | 西安理工大学 | A multi-mode dynamically switchable architecture low-power processor |
| CN110597755A (en) * | 2019-08-02 | 2019-12-20 | 北京多思安全芯片科技有限公司 | Recombination configuration method of safety processor |
| CN110597755B (en) * | 2019-08-02 | 2024-01-09 | 北京多思安全芯片科技有限公司 | Recombination configuration method of safety processor |
| CN113885832A (en) * | 2021-09-30 | 2022-01-04 | 南京大学 | Reconfigurable computing engine based on CORDIC |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| | WD01 | Invention patent application deemed withdrawn after publication | |