TW201017529A - Computation and addressing method of a eneral sized memory-based FFT processor - Google Patents
Computation and addressing method of a eneral sized memory-based FFT processor Download PDFInfo
- Publication number
- TW201017529A TW201017529A TW97141311A TW97141311A TW201017529A TW 201017529 A TW201017529 A TW 201017529A TW 97141311 A TW97141311 A TW 97141311A TW 97141311 A TW97141311 A TW 97141311A TW 201017529 A TW201017529 A TW 201017529A
- Authority
- TW
- Taiwan
- Prior art keywords
- fourier transform
- memory
- fast fourier
- data
- point
- Prior art date
Links
- 230000015654 memory Effects 0.000 title claims abstract description 122
- 238000000034 method Methods 0.000 title claims description 40
- 238000000354 decomposition reaction Methods 0.000 claims abstract description 24
- 239000013598 vector Substances 0.000 claims abstract description 23
- 238000004364 calculation method Methods 0.000 claims description 43
- 230000006870 function Effects 0.000 claims description 9
- 238000006243 chemical reaction Methods 0.000 claims description 6
- 239000000463 material Substances 0.000 claims 4
- 230000005540 biological transmission Effects 0.000 claims 1
- 235000012054 meals Nutrition 0.000 claims 1
- 239000002023 wood Substances 0.000 claims 1
- 230000006399 behavior Effects 0.000 abstract 1
- 238000013507 mapping Methods 0.000 description 10
- 238000010586 diagram Methods 0.000 description 4
- 238000013500 data storage Methods 0.000 description 2
- 238000003672 processing method Methods 0.000 description 2
- 206010011469 Crying Diseases 0.000 description 1
- 241000638935 Senecio crassissimus Species 0.000 description 1
- 230000000881 depressing effect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Landscapes
- Complex Calculations (AREA)
Abstract
Description
201017529 九、發明說明: 【發明所屬之技術領域】 本發明關於一種快速傅利葉轉換之計算與定址方法以 及使用此方法之以記憶體為基礎之任意點數正/逆向快速 傅立葉轉換處理器設計。對於任意點數的以記憶體為基礎 的離散快速傅立葉轉換處理器設計可以有效的減少處理器 面積與所需的操作時脈。 【先前技術】 Φ 按,有關本發明相關之快速傅利葉轉換計算與定址方 法以及使用此方法之以記憶體為基礎之正/逆向快速傅立 葉轉換處理器之先前技術謹羅列並比較缺點如下: (1) 由於美國專利號4,477, 878名為 “Discrete Fourier transform with non-tumbled output,,不能 支援多記憶體架構 (multi-bank memory structure),因而,對於基r(radix-r)計算時,就需 要有r個時脈週期才能把資料從記憶體中讀出或將計算 β 完的資料寫回記憶體中。這將導致FFT在計算過程中需 要更多的時脈週期,以及為了即時應用所需的更高時脈 速度。 本發明可藉由支援多記憶體定址,而在無記憶體存取衝 突的情況下,將諸如基r的r筆資料在一個時脈週期内 完成讀或寫,以解決先前技術的問題。 (2) 由於美國專利號5091875名為“Fast Fourier transform (FFT) addressing apparatus and method”, 美國專利公開號 20060253514 名為 “Memory-based Fast Fourier Transform device”,以及學術論文 L. G. 6 201017529201017529 IX. Description of the Invention: [Technical Field] The present invention relates to a fast Fourier transform calculation and addressing method and a memory-based arbitrary point forward/reverse fast Fourier transform processor design using the same. The memory-based discrete fast Fourier transform processor design for any number of points can effectively reduce the processor area and the required operating clock. [Prior Art] Φ According to the prior art of the fast Fourier transform calculation and addressing method of the present invention and the memory-based forward/reverse fast Fourier transform processor using the method, the disadvantages are as follows: (1) Since US Patent No. 4,477,878 is called "Discrete Fourier transform with non-tumbled output", it cannot support multi-bank memory structure. Therefore, for base r (radix-r) calculation, it is necessary. There are r clock cycles to read data from memory or write beta-completed data back into memory. This will cause the FFT to require more clock cycles in the calculation process and for immediate application. Higher clock speed. The present invention can read or write a r-pen data such as a base in a clock cycle by supporting multi-memory addressing without a memory access conflict. Solving the problems of the prior art. (2) U.S. Patent No. 5,091,875 is entitled "Fast Fourier transform (FFT) addressing apparatus and method", U.S. Patent Publication No. 2006 0253514 is called "Memory-based Fast Fourier Transform device" and academic paper L. G. 6 201017529
Johnson “Conflict free memory addressing for dedicated FFT hardware,M IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no· 5,pp. 312-316,May 1992等,均僅支援固定的 基r,因此僅能適用在具有N = rn大小的的FFT中。 若考慮到應用在中國數位電視之3780點FFT或者是 PLC應用之3072點FFT的情況時,前述兩件先前技術 即無法運作。 但本發明能夠支援任意基數r的混和。因此,能夠在任 φ 何大小的FFT應用中使用。 (3) 美國專利號 7062523 名為“ “Method for efficiently computing a fast Fourier transform” 僅支援固定的基r,因此不能支援中國DTV或是PLC之 類的應用。除此之外,他也不能支援多記憶體架構 (multi-bank memory structure),其在基 r(radix-r)運算時,需要r個時脈週期自記憶體存取資料。因此 將比使用多記憶體架構的處理器需要更高的時脈來完成 FFT的計算。 本發明除了支援任意大小的FFT應用之可變基數外,尚 ® 支援多記憶體架構,在不產生記憶體存取衝突的情況下 減低所需時脈。 (4) 美國專利號 7, 164, 723 名為“ Modulation apparatus using mixed-radix fast Fourier transform”,以及論文 B. G. Jo, and Μ. H· Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy, ’’ IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, ηο·5, pp. 911-919, May 2005 僅適用於基 2/4 混和的 201017529 演算法(algorithm) ’故僅能工作在N=2n大小上的 FFT ’無法應用於例如n= 3780之中國DTV之其他大小 的FFT應用。 ' 本發明因為可支援任何混合的基數’所以可以滿足上述 需求。除此之外,對於諸如8192的更長點數的處理 器設計,本發明可以讓處理器設計更加有彈性,因本發 明可支援大於基4的演算法。 (5) 美國專利公開號20080025199名為“Method and device for high throughput n-point forward and • inverse fast Fourier transform” 提出 3780 的可能 分解(candidate decomposition ),例如,3780 =Johnson "Conflict free memory addressing for dedicated FFT hardware, M IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992, etc., all support only fixed The base r, therefore, can only be applied to FFTs with N = rn size. If considering the 3780-point FFT applied to Chinese digital TV or the 3072-point FFT of PLC application, the above two prior techniques cannot be operated. However, the present invention can support the mixing of any base r. Therefore, it can be used in FFT applications of any size. (3) US Patent No. 7062523 named "Method for efficient computing a fast Fourier transform" only supports fixed bases. r, therefore can not support applications such as Chinese DTV or PLC. In addition, he can't support multi-bank memory structure, which requires r clock cycles to access data from memory during radix-r operations. Therefore, a higher clock is required than the processor using the multi-memory architecture to complete the FFT calculation. In addition to supporting the variable base of FFT applications of any size, the present invention supports multiple memory architectures to reduce the required clock without creating memory access violations. (4) US Patent No. 7, 164, 723 entitled "Modulation apparatus using mixed-radix fast Fourier transform", and paper BG Jo, and Μ. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor Using novel in-place strategy, '' IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, ηο·5, pp. 911-919, May 2005 only for the base 2/4 hybrid 201017529 algorithm ( Algorithm) 'The FFT that can only work at N=2n size cannot be applied to other sizes of FFT applications such as n=3780 China DTV. 'The present invention can support any mixed base number' so that the above requirements can be met. In addition, for longer processor designs such as 8192, the present invention allows the processor design to be more flexible, as the present invention can support algorithms larger than base 4. (5) US Patent Publication No. 20080025199 A possible decomposition of 3780 is proposed for "Method and device for high throughput n-point forward and • inverse fast Fourier transform", for example, 3780 =
3x3x3x2x2x5x7。其以MDC架構實行每個小點數的FFT 模組來減少後續會提到的一些中國專利中的較大内部暫 存器。然而,由於此方法需要對每個模組在一個時脈週 期中完成運算,因此會需要大量硬體。此外,在實際的 系統應用上是需要依序輸出資料的,但是此專利輸出資 料卻不是依序輸出的,因此尚有部分問題未解決。 (6) 中國專利號01140060.9名稱”3780點離散傅里葉變 春 換處理器系統及其結構”、中國專利號03107204. 6名 稱”具有3780點IDFT/DFT處理器的多載波系統及其方 法”、中國專利公開號200410090873.2名稱”採用升採 樣處理方法實現3780點離散傅立葉變換”、中國專利公 開號200610104144.7名稱”3780點離散傅立葉變換處 理器”以及中國專利公開號200710044716. 1名稱”流水 線結構的3780點快速傅里葉變換處理器”等上述專利可 8 201017529 執行3780點之具有類似管線(pipelined)架構之FFT 處理器,其所提出的架構内部需要大量的暫存器或記憶 體來重新排列資料。此外,對於實際系統應用之需求而 言,依序輸入輸出資料以及支援連續資料流都是必須 的,為了達成這些,中國專利號01140060. 9名稱”3780 點離散傅里葉變換處理器系統及其結構”以及中國專利 參 號03107204. 6名稱”具有3780點IDFT/DFT處理器的多 載波系統及其方法”就至少需要3N字元的記憶體空間; 中國專利公開號200410090873. 2,名稱”採用升採樣處 理方法實現3780點離散傅立葉變換”、中國專利公開號 200710044716. 1,名稱”流水線結構的3780點快速傅里 葉變換處理器”就至少需要5N字元的記憶體空間;中國 ^ 專利公開號200610104144. 7名稱”3780點離散傅立葉 變換處理器”就至少需要6N字元的記憶體空間。 相較前述,本發明僅僅需要2N字元的記憶體空間就可 以做到了 。並且,請注意到在中國專利號 01140060.9、中國專利號03107204.6以及中國專利公 開號200710044716.1之輸出資料並不是有序的,因此 他們需要至少一個N字元的記憶體空間與額外的控制邏 9 201017529 輯來重新排序輸出資料以便依序輸出。 (7)在論文 Z.-X. Yang, Y.-P. Hu,C. -Y. Pan, and L. Yang, “Design of a 3780-point IFFT processor for TDS-OFDM, 5, IEEE Trans. Broadcast., vol. 48, no. 1, pp· 57-61,Mar. 2002所提出的3780點FFT處理器之 輸出資料並非依序排列,為了能夠依序排列,其需要一 ❹ 個緩衝器去重新排列輸出資料,因此其至少需要3N字 元的記憶體空間,才能在能處理連續資料流的前提下達 成此需求,但本發明已如上述僅需2N字元空間的記憶 體。 【發明内容】 有鑑於上述先前技術的缺失,本發明提出一種快速傅 利葉轉換之計算與定址方法以及使用此方法之以記憶體為 ❿ 基礎之任意點數正/逆向快速傅立葉轉換處理器設計,其 方法第一項特徵為: 藉由分解方程式將長點數離散傅立葉轉換的計算分解為數 個短點數的離散傅立葉轉換,並同時將其指標由單一維度 映射成多維度指標向量。 本發明的方法第二項特徵為:藉由控制這些多維度 201017529 指標向量’本發明把原始輸人資料分散存放到數個記憶體 裡,使得在不產生記憶體存取衝突的情況下同時達到計算 期間的資料置換與記憶體完整蝴蝶點數一次存取的目的。 本發明的方法第三項特徵為:當資料置換使用在已計 算完成的舊資料依序輸出與新資料依序輸入時為了往後 計算期間可以繼續保持資料存取時沒有記憶體衝突,本發 •明對於新資料的計算採取與先前資料計算時的反序操作來 達成目的。此方法,對於任意點數的以記憶體為基礎的離 散快速傅立葉轉換處理器設計可以有效的減少處理器面積 與所需的操作時脈。 本發月提出之以s己憶體為基礎之快速傅立葉轉換處理 器設計,其包含:-用以存放資料之主要記憶體、一用以 鲁進行分解後短點數快速傅立料換之處理元件以及一押制 單元,其中該控制單元具有控制以下項目之功能:(Γ)輪 入輸出資料與蝴蝶運算用之記憶體,⑵分解後之短點數 决速傅立葉轉換之計算順序,及⑶以資料置換方式進 資料存取所需之記憶體定址1以執行上述之方法。 【實施方式】 為了達成上述的發明目的,兹將本發明之—具趙實施 201017529 例說明如下: X{k)^NZx(ji)W^ 離散傅立葉轉換之定義為 ”=0 "。將長點數離 散傅立葉轉換分解為數個短點數離散傅立葉轉換之方法已 在論文”IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-25, NO. 3, JUNE 1977” 名稱 『Index Mappings for Multidimensional m3x3x3x2x2x5x7. It implements an FFT module for each small point in the MDC architecture to reduce the larger internal registers of some of the Chinese patents mentioned later. However, since this method requires an operation for each module in one clock cycle, a large amount of hardware is required. In addition, in actual system applications, it is necessary to output data sequentially, but the output of this patent is not output in order, so some problems remain unresolved. (6) Chinese Patent No. 01140060.9 "3780-point discrete Fourier transform spring processor system and its structure", Chinese Patent No. 03107204. 6 name "Multi-carrier system with 3780 IDFT/DFT processor and its method" , Chinese Patent Publication No. 200410090873.2 name "3780 point discrete Fourier transform using upsampling processing method", Chinese Patent Publication No. 200610104144.7 name "3780 point discrete Fourier transform processor" and Chinese patent publication number 200710044716. 1 name "pipeline structure 3780" The above-mentioned patents such as the point fast Fourier transform processor can perform 3780 points of an FFT processor with a pipelined architecture, and the proposed architecture requires a large amount of scratchpads or memory to rearrange the data. . In addition, for the actual system application requirements, sequential input and output data and support for continuous data flow are all necessary. In order to achieve this, the Chinese patent number 011400060. 9 name "3780 point discrete Fourier transform processor system and its "Structure" and Chinese Patent No. 03107204. 6 "Multi-carrier system with 3780-point IDFT/DFT processor and its method" requires at least 3N characters of memory space; Chinese Patent Publication No. 200410090873. 2, name" adopted The upsampling processing method realizes 3780-point discrete Fourier transform", Chinese Patent Publication No. 200710044716. 1. The 3780-point fast Fourier transform processor with the name "pipeline structure" requires at least 5N characters of memory space; China ^ Patent Disclosure No. 200610104144. 7 Name "3780-point discrete Fourier transform processor" requires at least 6N characters of memory space. Compared to the foregoing, the present invention requires only 2N characters of memory space. Also, please note that the output data in Chinese Patent No. 01140060.9, Chinese Patent No. 03107204.6, and Chinese Patent Publication No. 200710044716.1 are not orderly, so they need at least one N-character memory space with additional control logic 9 201017529 To reorder the output data for sequential output. (7) In the paper Z.-X. Yang, Y.-P. Hu, C. -Y. Pan, and L. Yang, "Design of a 3780-point IFFT processor for TDS-OFDM, 5, IEEE Trans. Broadcast., vol. 48, no. 1, pp· 57-61, Mar. 2002 The output data of the 3780-point FFT processor is not arranged in order. In order to be able to arrange in sequence, it needs a buffer to go. The output data is rearranged, so that it requires at least 3N characters of memory space to achieve this requirement on the premise that the continuous data stream can be processed, but the present invention has only 2N character space memory as described above. In view of the above-mentioned shortcomings of the prior art, the present invention proposes a fast Fourier transform calculation and addressing method and an arbitrary-point forward/reverse fast Fourier transform processor design based on the memory using the method, the method first The feature of the term is: Decomposing the calculation of the long-point discrete Fourier transform into a discrete Fourier transform of several short points by the decomposition equation, and simultaneously mapping its index from a single dimension to a multi-dimensional index vector. The second method of the method of the present invention special By controlling these multi-dimensional 201017529 indicator vectors, the present invention distributes the original input data into a plurality of memories so that the data replacement and the memory during the calculation are simultaneously achieved without causing memory access conflicts. The purpose of the complete butterfly point access is as follows. The third feature of the method of the present invention is that when the data replacement is used, the old data sequentially output and the new data are sequentially input, and can be continued for the subsequent calculation period. There is no memory conflict when accessing data. The calculation of new data takes the reverse operation with the previous data calculation to achieve the purpose. This method is a memory-based discrete fast Fourier transform for any number of points. The processor design can effectively reduce the processor area and the required operating clock. This month's FT-based fast Fourier transform processor design includes: - the main memory for storing data a processing element for short-point number fast-changing after decomposing, and a depressing unit, wherein the control unit It has the functions of controlling the following items: (Γ) the memory for the input data and the butterfly operation, (2) the calculation order of the short-point final Fourier transform after decomposition, and (3) the data access method for data access. The memory address is set to perform the above-described method. [Embodiment] In order to achieve the above object, the present invention will be described as follows: X{k)^NZx(ji)W^ Discrete Fourier transform Defined as "=0 ". The method of decomposing long-point discrete Fourier transform into several short-point discrete Fourier transforms has been published in the paper "IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-25, NO. 3, JUNE 1977" For Multidimensional m
Formulation of the DFT and Convolution』中提出。本 發明在此以點數長度N=N1N2之分解方程式(1)加以說明: \n = N2nl+A2n2 modA^ =0,1,,..,TV, -1 [k = + Nxk2 modN -1 (1) 方程式(1)將指標n與k映射為指標向量(nl, n2)與 (kl, k2),使其由單一維度[0,N-l]映射成二維度[0, Nl-1] X [0, N2-1]。其中係數A2與B1之選擇取決於N1 與N2間之關係。於本發明提出之方法中,方程式(1)係數 選用規則如下: 情況一:若N1與N2互質,則選用符合下列條件之A2及 B1為係數 A2 : =plNl 並且 A2 =dlN2 + 1 B1 : =p2N2 並且 B1 =q2Nl + 1 12 201017529 在此,pi、ql、n9 0 ^ ^ P及q2皆為正整數。因此,離散傅立 葉轉換之定義可宜士、 科J冩成以下之方程式(2): ^,A2) = ^ X〇c(«l,n2)»^*'^ «I 叫 12 =Σ{ Σχ(«ι»«2)^^} =ΣΜ.«ί) ^ (2) 隋況一.若Nl與M2非為互質,則選用A2 = B1 =卜因 & H# it葉轉換之定義可寫成以下之方程式⑶: =Σ{ w^ (3)Proposed in Formulation of the DFT and Convolution. The present invention is described herein by the decomposition equation (1) of the point length N = N1N2: \n = N2nl + A2n2 modA^ =0, 1, , .., TV, -1 [k = + Nxk2 modN -1 ( 1) Equation (1) maps the indices n and k into index vectors (nl, n2) and (kl, k2), which are mapped from a single dimension [0, Nl] to a two-dimensional [0, Nl-1] X [ 0, N2-1]. The choice of coefficients A2 and B1 depends on the relationship between N1 and N2. In the method proposed by the present invention, the coefficient selection rule of equation (1) is as follows: Case 1: If N1 and N2 are mutually prime, A2 and B1 satisfying the following conditions are selected as coefficient A2: =plNl and A2 = dlN2 + 1 B1 : =p2N2 and B1 = q2Nl + 1 12 201017529 Here, pi, ql, n9 0 ^ ^ P and q2 are all positive integers. Therefore, the definition of the discrete Fourier transform can be determined by the following equation (2): ^, A2) = ^ X〇c(«l,n2)»^*'^ «I is called 12 =Σ{ Σχ («ι»«2)^^} =ΣΜ.«ί) ^ (2) Condition 1. If Nl and M2 are not mutually prime, then use A2 = B1 = Buin &H# it leaf transformation definition Can be written as the following equation (3): =Σ{ w^ (3)
本發明可由方程式⑵及⑻得知,-長點數(N點) 快速傅立葉轉換可分別在第—及第二階段中以二較短點數 (N1點及N2點)快逮傅立葉轉換加以計算。就定值n2 而言,第一階段N1點快速傅立葉轉換之原始輪入資料為 x(nl,n2), nl-Ο, 1,·..,N1-1。對應此 n2 之第一階段 輸出資料為y(kl,必,kl=G,i ...,N卜卜就定值u 而言,第二階段N2點快速傅立葉轉換之原始輪入資料為 y(kl’ 112),n2=0’ 1,.·.,N2一卜對應此 kl 之第二階段 輸出資料則為X(kl,k2),k2= 〇, i,…,N2h。方程式 13 201017529 (2)與(3)之差異在於若N1與N2非為互質,則第一與第二 階段間存在有如方程式(3)所示之旋轉因子以。 註:就情況一而言,其係數選取亦可如情況二,且本發明 之後續流程相同 現於第一階段計算N1點離散傅立葉轉換並於第二階 段計算N2點離散傅立葉轉換,以求取第一個離散傅立葉 ❿轉換符元。先將原始輸入資料分散存放到數個記憶庫 (memory bank)中。假定N22N1且記憶庫之數量為N2 ’ 則可藉由方程式(4)將該原始輸入資料分散存放至N2個記 憶庫中以避免記憶體衝突。 bank = nl + n2 mod N2 (4) 避免記憶體衝突之關鍵在於透過方程式(1)及(4)將 ❿資料分散存放於記憶庫。一旦選定記憶庫,便需將資料定 址。同一記憶庫中之任兩筆資料應映射至〇到N1_l範圍 内之不同位址。為求簡單,在此選用以下之方程式(5)執 行資料定址。 address = nl (5) 完成第一個離散傅立葉轉換符元之計算後,應將資 料依序輸出。輸出指標係由方程式(1)映射而得。在將計 201017529 之第-個離散傅立葉轉換符元資料依 時’亦以㈣置換之方歧序輸人第二個 符元之輸入資料。亦即,新的原始輸入資料、轉換 輸出貝枓X⑴之位置。計算第: 之順序係與計算第一個離散傅立葉轉換符元時相反 即於第-階段計算N2點離散傅立葉轉換,再 •階段計算I點離散傅立葉轉換,藉以求取第二個離㈣ 立葉轉換符元。完成第二個離散傅立葉轉換符元之叶算 後,同樣以資料置換方式依序輸出計算完成之舊資料並依 序輸入新資料。第三個離散傅立葉轉換符元復依第一個離 散傅立葉轉換符元之方式計算。 ❹ 之 以下將以中國數位電視所需之謂點快速傅立葉轉 換為例詳加說明。以下所述之分解順序僅為可行方式 例 由於3780=4x3x3x3x5x7,本發明可分別進行3點、 4點、5點與7點之快速傅立葉轉換計算。在此,資料係 分散存放於7個記憶庫。 先依分解順序4、3、3、3、5及7執行計算。第一 ^段(4點快速傅立葉轉換)係採用分解方程式進行; 15 201017529 f» = 945„1+2836«2mod3780 [*: = 945^,+ 4^2mod3780 «, ί ' ’2',.·,944 ( 6) 上述方程式將指標η咏灿去人 峨射為向量㈨扮,即由[0,3779]至 [〇, 3] X [0, 944],如表一所示。 表一 第一階段之指標映射 — nl=〇 nl = l η1=2 η1=3 —---— — «2 =〇 χ[〇] χ[945] χ[18 χ[2835 90] ] «2=1 χ[2836] χ[ΐ] χ[94 χ[1891 6] ] »2=2 χ[1892] χ[2837 ] χ[2] χ[947] • · · • · · • · · • · · • · · ^2 =944 χ[944] χ[28 χ[3779 χ[1889] 34] ] 表一中每一列之資料為每個4點快速傅立葉轉換之原 始輪入資料。輸入順序取決於指標nl。例如,& = 1列 16 201017529 中’輸入順序為(nl,(〇,1),(1,1), (2,丨)以及 (3,D ’其分別對應之資料為χ[2836]、χ[1]、 χ[_],如表-所示。 J與 由於每個4點快速傅立葉轉換之原始輸入資料皆具 有相同之指標《2 ’故可藉由方程式⑺將資料分散存入不 同3己憶庫來避免第一階段之記憶體衝突。The present invention can be known from equations (2) and (8), and the long-point (N-point) fast Fourier transform can be calculated by using two shorter points (N1 point and N2 point) fast-changing Fourier transform in the first and second stages, respectively. . For the fixed value n2, the original round entry data of the first stage N1 point fast Fourier transform is x(nl, n2), nl-Ο, 1,·.., N1-1. The output data corresponding to the first stage of this n2 is y(kl, must, kl=G, i ..., N Bu is the fixed value u, the original round entry of the second stage N2 point fast Fourier transform is y (kl' 112), n2=0' 1, 1.·., N2 卜 corresponds to the second stage output data of this kl is X(kl, k2), k2 = 〇, i, ..., N2h. Equation 13 201017529 The difference between (2) and (3) is that if N1 and N2 are not mutually prime, there is a rotation factor as shown in equation (3) between the first and second stages. Note: In case of case 1, the coefficient The selection may also be as in case two, and the subsequent process of the present invention is the same as the first stage calculates the N1 point discrete Fourier transform and the second stage calculates the N2 point discrete Fourier transform to obtain the first discrete Fourier transform symbol. First, the original input data is dispersed into several memory banks. Assuming N22N1 and the number of memory banks is N2 ', the original input data can be distributed to N2 memory banks by equation (4). Avoid memory conflicts. bank = nl + n2 mod N2 (4) The key to avoiding memory conflicts is to pass the equation Equations (1) and (4) distribute the data in the memory. Once the memory is selected, the data needs to be addressed. Any two data in the same memory should be mapped to different addresses within the range of N1_l. For the sake of simplicity, the following equation (5) is used to perform data addressing. address = nl (5) After completing the calculation of the first discrete Fourier transform symbol, the data should be output in order. The output index is determined by the equation (1). The mapping is obtained by inputting the input data of the second symbol in the order of the fourth discrete Fourier transform symbol data of 201017529, which is also replaced by (4). That is, the new original input data, Convert the position of the output 枓X(1). The order of the calculation: is the opposite of the calculation of the first discrete Fourier transform symbol, that is, the N2 point discrete Fourier transform is calculated in the first stage, and the I point discrete Fourier transform is calculated in the stage. Take the second (four) vertical leaf transform symbol. After the second discrete Fourier transform symbol is calculated, the old data calculated by the data is sequentially output and the new data is sequentially input. The third discrete Fourier transform symbol is calculated according to the first discrete Fourier transform symbol. 以下 The following will be explained in detail by the fast Fourier transform required by the Chinese digital TV. The decomposition sequence described below For the only possible example, since 3780=4x3x3x3x5x7, the present invention can perform fast Fourier transform calculation of 3 points, 4 points, 5 points and 7 points respectively. Here, the data is distributed in 7 memory banks. Calculations are performed at 3, 3, 3, 5, and 7. The first segment (4-point fast Fourier transform) is performed using the decomposition equation; 15 201017529 f» = 945„1+2836«2mod3780 [*: = 945^,+ 4^2mod3780 «, ί ' '2',.·,944 ( 6) The above equations illuminate the indicator η 咏 去 为 为 为 向量 向量 向量 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 944], as shown in Table 1. The index map of the first stage of Table 1 - nl=〇nl = l η1=2 η1=3 —---—— — «2 =〇χ[〇] χ[945] χ[18 χ[2835 90] ] «2 =1 χ[2836] χ[ΐ] χ[94 χ[1891 6] ] »2=2 χ[1892] χ[2837 ] χ[2] χ[947] • · · • · · • · · • · • · · ^2 =944 χ[944] χ[28 χ[3779 χ[1889] 34] ] The data in each column of Table 1 is the original round entry for each 4-point Fast Fourier Transform. The order of input depends on the indicator nl. For example, & = 1 column 16 201017529 'The input order is (nl, (〇, 1), (1, 1), (2, 丨) and (3, D 'the corresponding data is χ [2836] χ[1], χ[_], as shown in Table--J. Since the original input data of each 4-point fast Fourier transform has the same index "2", the data can be distributed by equation (7). Different 3 memory pools to avoid the first phase of memory conflicts.
bank = nl + x mod7 (7) 歷經第一階段後,原始資料已被分解為4個9仍點 快速傅立葉轉換之獨立群組,並分別對應於kl= 〇,H 及3。 , 同樣地’本發明亦可以945 = 3χ3χ3χ5χ7之順序分解 上述之945點快速傅立葉轉換,並將指標&映射為向量 ❿(η2,η3’ η4,η5,π6),即由[〇,944]映射為[〇,2]x[〇, 2]x[0,2]χ [〇,4]χ[〇,6]。 結合各階段之所有分解方程式,即可求得378〇點快 速傅立葉轉換依此分解順序之完整指標映射方程式,如方 程式(8)與(9)所示’在選擇記憶庫時係利用方程式(1〇)以 避免記憶體衝突,定址方程式則可採用方程式(11)。 n = 945nl + 1260n2 + 2940n3 + 980n4 + 1512n5 + 17 201017529 540n6 mod3780 (8) k = 945kl + 2380k2 + 3360k3 + 2520k4 + 2268k5 + 540k6 mod3780 (9) bank = nl + n2 + n3 + n4 + n5 + n6 mod7 (10) address = 135nl + 45n2 + 15n3 + 5n4 + n5 • (11) 繼計算第一個快速傅立葉轉換符元,亦即單數之快 速傅立葉轉換符元之後,方程式(1〇)及(11)中所有指標 111將轉換為ki。對於第二個輸入之快速傅立葉轉換符元 而言’亦即雙數之快逮傅立葉轉換符元,該雙數快速傅立 葉轉換符元之輸入資料xeven[n]應被置入單數離散傅立 瘳葉轉換符元之輪出資料XQd❿]之位置,兩者間關係為 其映射指標、記憶庫與絲應分別以方程式⑻、 (ίο)與οι)決定。 以下討論如何於目前雙數快速傅立葉轉換符元之資 =分配下,於計算期間持續避免資料存取時之記憶體衝 哭。 本發明係採用與 為計算雙數快速傅立葉轉換符元 201017529 計算單數快速傅立葉轉換符元時相反之分解順序進行計 算,亦即讓=7χ5χ3χ3χ3χ4,換言之,本發明係以7Bank = nl + x mod7 (7) After the first stage, the original data has been decomposed into four independent groups of 9-point fast Fourier transforms, corresponding to kl=〇, H and 3. Similarly, the present invention can also decompose the above-mentioned 945-point fast Fourier transform in the order of 945 = 3χ3χ3χ5χ7, and map the index & to the vector ❿(η2, η3' η4, η5, π6), that is, [〇, 944] Map to [〇, 2]x[〇, 2]x[0,2]χ [〇,4]χ[〇,6]. Combining all the decomposition equations of each stage, we can obtain the complete index mapping equation of 378〇 fast Fourier transform according to this decomposition order, as shown in equations (8) and (9), using equations when selecting memory. 〇) To avoid memory conflicts, the equation (11) can be used for the address equation. n = 945nl + 1260n2 + 2940n3 + 980n4 + 1512n5 + 17 201017529 540n6 mod3780 (8) k = 945kl + 2380k2 + 3360k3 + 2520k4 + 2268k5 + 540k6 mod3780 (9) bank = nl + n2 + n3 + n4 + n5 + n6 mod7 (10) address = 135nl + 45n2 + 15n3 + 5n4 + n5 • (11) Following the calculation of the first fast Fourier transform symbol, ie the singular fast Fourier transform symbol, in equations (1〇) and (11) All indicator 111 will be converted to ki. For the second input fast Fourier transform symbol, that is, the double-number fast-forward Fourier transform symbol, the input data xeven[n] of the double-number fast Fourier transform symbol should be placed into the singular discrete Fourier transform symbol The position of the data XQd❿], the relationship between the two is its mapping index, memory and silk should be determined by equations (8), (ίο) and οι). The following discussion discusses how to continue to avoid memory crying during data access during the calculation period under the current double-numbered fast Fourier transform symbol. The present invention is calculated using the reverse order of the calculation of the singular fast Fourier transform symbol for calculating the double fast Fourier transform symbol 201017529, that is, let = 7 χ 5 χ 3 χ 3 χ 3 χ 4, in other words, the present invention is 7
*’、5點、3點、3點、3點以及4點快速傅立葉轉換之順 序執行#算。藉㈣財式,本發明可求得雙數快速傅立 葉轉換符元之完整輸入輸出指標映射方程式(12)及(13), 其結果類似於單數快速傅立葉轉換符元之完整指標映射方 程式(8)及(9),方程式(12)及(13)分別將輸入指標η與輸 出指標k映射為向量(ai,a2,a3,a4,a5,a6)與(bl, b2,b3,b4,b5,b6),即由[〇,3779]映射為[〇,6]x[0, 4]χ[0’ 2]χ[0,2]x[〇,2]x[0,3],以計算雙數快速傅 立葉轉換符元。 n = 540al + 2268a2 + 2520a3 + 3360a4 + 2380a5 + 945a6 mod3780 (12) k = 540bl + 1512b2 + 980b3 + 2940b4 + 1260b5 + 945b6 mod3780 (13) 將輸入與輸出指標方程式(9)與(12)以及(8)與(13) 對應比較,可發現此兩組方程式恰以反序向量之關係彼此 匹配,如方程式(14)及(15)所示。 (al, a2, a3, a4, a5, a6) = (k6, k5, k4, k3, k2, 201017529 kl) (14) (nl, n2, n3, n4, n5, n6) = (b6, b5, b4, b3, b2, bl) (15) 因雙數快速傅立葉轉換符元xeven[n]之輸入資料被 置於已計算完成之單數快速傅立葉轉換符元輸出資料*', 5, 3, 3, 3, and 4 points of the fast Fourier transform are executed sequentially. By (4) financial formula, the present invention can obtain the complete input-output index mapping equations (12) and (13) of the double-number fast Fourier transform symbol, and the result is similar to the complete index mapping equation (8) of the singular fast Fourier transform symbol and (9) Equations (12) and (13) map the input index η and the output index k to vectors (ai, a2, a3, a4, a5, a6) and (bl, b2, b3, b4, b5, b6, respectively). ), that is, [〇,3779] is mapped to [〇,6]x[0, 4]χ[0' 2]χ[0,2]x[〇,2]x[0,3] to calculate the double number Fast Fourier transform symbol. n = 540al + 2268a2 + 2520a3 + 3360a4 + 2380a5 + 945a6 mod3780 (12) k = 540bl + 1512b2 + 980b3 + 2940b4 + 1260b5 + 945b6 mod3780 (13) Input and output indicator equations (9) and (12) and (8) Corresponding to (13), it can be found that the two sets of equations match each other in the inverse vector relationship, as shown in equations (14) and (15). (al, a2, a3, a4, a5, a6) = (k6, k5, k4, k3, k2, 201017529 kl) (14) (nl, n2, n3, n4, n5, n6) = (b6, b5, B4, b3, b2, bl) (15) The input data of the double-numbered fast Fourier transform symbol xeven[n] is placed in the calculated singular fast Fourier transform symbol output data.
Xodd [ k ]之位置’而兩者間之關係為k=n,由此可得雙數 ⑩快速傅立葉轉換符元之記憶庫選擇與記憶體定址方程式 (16)及(17)。 bank =技1 + + 技3 + a4 + + 技6 mod7 (16) address = 135a6 + 45a5 + 15a4 + 5a3 + a2 (17) 請注意,輸入映射方程式(12)與輸出映射方程式(9) 相同,且記憶庫選擇方程式(10)與(16)亦保持相同。是 籲以,於雙數快速傅立葉轉換符元之計算期間,藉由倒轉單 數快速傅立葉轉換符元分解順序之計算方式,可使記憶體 始終免於存取衝突。 此外,本發明發現,輸出映射方程式(13)與輸入映 射方程式(8)亦為相同,意指在將第三個快速傅立葉轉換 輸入資料xthirdU]置入已計算完成之雙數快速傅立葉轉 換輸出資料xeven[k]之位置並令其中k=n時,該第三個 20 201017529 快速傅立葉轉換輸入資料之分散儲存方式可與方程式 ⑻、(10)及(11)所決定者相同’也即第三個快速傅立葉 轉換符元的資料存放位置跟單數快速傅立葉轉換符元之資 料存放位置一樣而又回到起始討論的狀態,xthird[n]= x〇dd[n]。 根據以上之實施例,本發明可藉由倒轉先前之快速 ❹傅立葉轉換符元分解順序,設計出無記憶體衝突之可變基 數快速傅立葉轉換處理器,其可同時達到蝴蝶輸出與資料 輸入輸出之資料置換。 第一圖係以記憶體為基礎之3780點離散傅立葉轉換 處理器設計方塊圖,其中MEM一1與ME12為二記憶區塊 (memory block) ’各區塊包含7個記憶庫(memory 參bank),各記憶庫大小為540個字,FFT—C0RE包含可分別 處理短點數離散傅立葉轉換之處理單元,其計算各短點數 離散傅立葉轉換所需的時脈週期數由設計者依需求決定, 此亦決定計算單元所需硬體數量,控制單元主控資料存取 與處理單元之計算。 以下將就離散傅立葉轉換處理器之運作加以闞明, 爲便於說明,在此定義第一類之分解與計算順序依次為4 21 201017529 點、3點、3點、3點、5點與7點離散傅立葉轉換,並將 其倒轉順序定義為第二類順序,依次 ’(‘黏、5 點、3 點、3點、3點與4點離散傅立葉轉換。 假設第-個與第二個離散傅立葉轉換符元分別储存於 記憶區塊MEM—1與_〜2,並假設第一個與第二個離散傅 立葉轉換符元之分解與計算順序皆為第〜類順序,則 ❹以上之說明,本發明可知: ⑴第-、五、九、十三…離散傅立葉轉換符元係儲存於 記憶區塊MEM一1且以第一類順序計算。 ⑵第二、六、十、十四··離散傅立葉轉換符元係儲存於 記憶區塊MEM一2且以第一類順序計算。 (3)第三、七、十一、+五 雜勒捕a 十五…離散傅立葉轉換符元係儲存 ❹於記憶區塊MEM_1且以第二類順序計算。 ⑷第四、人、十:、十六.··離散傅立葉轉換符元係儲存 於記憶區塊MEM—2且以第二類順序計算。 爲說明之便,第二圖中僅以長度為^画⑽之離散 傅立葉轉換為例來說明取得資料存取所需指標向量之硬體 實施方式。如第二圖所示,其係由數個累加器Αι、 A2、…、A5組成。於控制單元中包含3組此硬體設計, 22 201017529 其中2組分別用於產生資料輸入與輸出所需之指標向量, 其參數ΙΠ、U2、U3、114、q與r係由各階段之分解方程式 ⑴決定。而第3組則是用於產生各個短點數傅立葉轉二 所需資料的指標向量,其與前面兩組唯—不同處是此時之 參數q’r皆為〇’即a4,A5可去除。 下表二顯示以記憶體為基礎之離散傅立葉轉換處理 ❹器設計之不同即時應用方法的比較結果。所有項目中,僅 有本案方法與[美國專利4, 477, 878]可支援任何一般基數 之展合’並進行任何點數長度之快速傅立葉轉換。然而, 如别所述’[美國專利4,477,878]未能實現無記憶體衝突 之多記憶體架構(multi-bankmem〇rystructure),因而 無法減少即時應用所需之操作時脈。 ❿ 本發明提出以記憶體為基礎之快速傅立葉轉換處理 器設計方法,使其可同時滿足以下三項目的:⑴蝴蝶運 算與輸入輸出資料時之資料置換;⑵任何基數之混合; 及(3)支援多記憶體架構。 23 201017529 表二 本發明與不同方法之比較 [A1] [A2] [B2] [A5] [A3] [A4] B[3] 本發 明 基數 所有 一般 固定基 數-r 固定 基數-r 基數_ 2/4 所有 一般 資料置 換 是 是 是 是 是 記憶體 2N 個字 2N 個 字 2N個 字 2N個 字 2N 個 字 多重記 憶庫 否 是 否 是 是 [A1]:美國專利 4,477,878 [A2]:美國專利 5, 091, 875 [A3]:美國專利 7,062,523 [A4]:美國專利 7, 164, 723 [A5]:美國專利公開20060253514 24 201017529 [B2]:論文 L. G. Johnson “Conflict free memory addressing for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992. B[3]:論文 B. G. Jo, and Μ. H. Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor ❿ using novel in-place strategy,” IEEE Trans. Circuits Syst. I,Reg. Papers, vol. 52,no. 5,pp. 911-919, May 2005. 綜上所述,當以本發明方法應用於中國數位電視所 需之3780點離散傅立葉轉換時,僅需2N個字之記憶體即 可滿足系統需求,同時達到持續資料流與資料依序輸入輸 出之目的。若欲達成與本案相同效果,[美國專利公開 9 20080025199]、[中國專利 01140060· 9]、[中國專利 03107204.6]之方案需至少3N個字之記憶體,[中國專利 公開 200410090873.2]與[中國專利公開 200710044716. 1] 需至少 5N個字之記憶體,[中國專利公開 200610104144.7 ]需6N個字之記憶體,而[論文2.-又· Yang, Y. -P. Hu, C. -Y. Pan, and L. Yang, ^Design of 25 201017529The position of Xodd [ k ] and the relationship between the two is k = n, thereby obtaining the memory selection and memory addressing equations (16) and (17) of the double number 10 fast Fourier transform symbol. Bank = technique 1 + + technique 3 + a4 + + technique 6 mod7 (16) address = 135a6 + 45a5 + 15a4 + 5a3 + a2 (17) Note that the input mapping equation (12) is the same as the output mapping equation (9). And the memory selection equations (10) and (16) remain the same. It is recalled that during the calculation of the double-numbered fast Fourier transform symbol, the memory is always free of access conflicts by reversing the calculation of the singular fast Fourier transform symbol decomposition order. In addition, the present invention finds that the output mapping equation (13) is the same as the input mapping equation (8), meaning that the third fast Fourier transform input data xthirdU] is placed in the calculated double-number fast Fourier transform output data xeven When the position of [k] is such that k=n, the third 20 201017529 fast Fourier transform input data can be stored in the same manner as those determined by equations (8), (10) and (11), that is, the third The data storage location of the fast Fourier transform symbol is the same as the data storage location of the singular fast Fourier transform symbol, and returns to the state of the initial discussion, xthird[n]= x〇dd[n]. According to the above embodiments, the present invention can design a variable base fast Fourier transform processor without memory conflict by reversing the previous fast Fourier transform symbol decomposition sequence, which can simultaneously achieve butterfly output and data input and output. Data replacement. The first picture is a memory-based 3780-point discrete Fourier transform processor design block diagram, in which MEM-1 and ME12 are two memory blocks (each block contains 7 memory banks (memory reference bank) Each memory bank has a size of 540 words, and the FFT_C0RE includes a processing unit that can separately process the discrete-point Fourier transform of the short-point number, and the number of clock cycles required for calculating the discrete Fourier transform of each short-point number is determined by the designer according to the requirement. This also determines the number of hardware required for the computing unit, and the control unit controls the data access and processing unit calculations. The operation of the discrete Fourier transform processor will be explained below. For convenience of explanation, the decomposition and calculation order of the first type is defined as 4 21 201017529 points, 3 points, 3 points, 3 points, 5 points and 7 points. Discrete Fourier transform, and define its reverse order as the second order, followed by '(', 5, 3, 3, 3, and 4 discrete Fourier transforms. Suppose the first and second discrete Fourier The conversion symbols are stored in the memory blocks MEM-1 and _~2, respectively, and it is assumed that the decomposition and calculation order of the first and second discrete Fourier transform symbols are in the order of the first type, then the above description, this According to the invention, (1) the first, fifth, ninth, thirteenth... discrete Fourier transform symbol is stored in the memory block MEM-1 and is calculated in the first order. (2) Second, sixth, ten, fourteen·discrete Fourier The conversion symbol is stored in the memory block MEM-2 and is calculated in the first type of order. (3) The third, seventh, eleventh, and + five miscellaneous traps a fifteen... the discrete Fourier transform symbol is stored in memory Block MEM_1 is calculated in the second order. (4) Fourth, person, Ten:16. The discrete Fourier transform symbol is stored in the memory block MEM-2 and is calculated in the second order. For the sake of explanation, only the discrete Fourier transform of length (10) is shown in the second figure. As an example, the hardware implementation of the indicator vector required for data access is described. As shown in the second figure, it is composed of several accumulators Αι, A2, ..., A5. The control unit contains 3 groups of this hard. Body design, 22 201017529 Two groups are used to generate the index vector required for data input and output, and the parameters ΙΠ, U2, U3, 114, q and r are determined by the decomposition equation (1) of each stage, while the third group is It is an index vector used to generate the data required for each short point Fourier to two. It is different from the previous two groups. The parameter q'r at this time is 〇', ie, a4, A5 can be removed. Table 2 below shows Comparison of different real-time application methods for memory-based discrete Fourier transform processing. All of the projects, only the method of this case and [US Patent 4, 477, 878] can support the integration of any general base. Make any points faster Fourier transform. However, as described above, [[US Pat. No. 4,477,878] fails to implement a multi-bank mem ry structure without memory conflict, and thus cannot reduce the operational clock required for immediate application. A memory-based fast Fourier transform processor design method is proposed, which can satisfy the following three items: (1) data replacement during butterfly operation and input and output data; (2) mixing of any base; and (3) support for multiple memories 23 201017529 Table 2 Comparison of the present invention and different methods [A1] [A2] [B2] [A5] [A3] [A4] B[3] The cardinality of the invention, all general fixed bases - r fixed base - r base _ 2/4 All general data replacement is yes or no memory 2N words 2N words 2N words 2N words 2N words multiple memory banks whether it is yes [A1]: US Patent 4,477,878 [A2]: US patent 5, 091, 875 [A3]: U.S. Patent 7,062,523 [A4]: U.S. Patent 7,164,723 [A5]: U.S. Patent Publication No. 20060253514 24 201017529 [B2]: Paper LG Johnson "Conflict free memory addressi Ng for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992. B[3]: Paper BG Jo, and Μ H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor ❿ using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911 -919, May 2005. In summary, when the method of the present invention is applied to the 3780-point discrete Fourier transform required for Chinese digital television, only 2N words of memory can be used to meet the system requirements while achieving continuous data flow. The purpose of inputting and outputting with the data in sequence. If you want to achieve the same effect as the case, [US Patent Publication 9 20080025199], [Chinese Patent 0114060], and [Chinese Patent 03107204.6] require at least 3N words of memory, [Chinese Patent Publication 200410090873.2] and [Chinese Patent] Public 200710044716. 1] Memory of at least 5N words is required, [Chinese Patent Publication 200610104144.7] requires 6N words of memory, and [Thesis 2.-Yang Yang, Y. -P. Hu, C. -Y. Pan, and L. Yang, ^Design of 25 201017529
a 3780-point IFFT processor for TDS-OFDM,” IEEEa 3780-point IFFT processor for TDS-OFDM,” IEEE
Trans. Broadcast., vol. 48, no. 1, pp. 57-61, Mar. 2002]亦至少需3N個字以上之記憶體。因此,就此應用而 言,若以本案方法設計3780點離散傅立葉轉換處理器, 則相較於以現有方式設計者可大幅縮減晶片面積。 本發明更提出一種以記憶體為基礎之(正/逆向)快速 ❿傅立葉轉換處理器,用以執行上述之方法,其係包含:一 用以存放資料之主要記憶體、一進行分解後短點數快速傅 立葉轉換之處理元件以及一控制單元,其中該控制單元具 有控制以下項目之功能··( 1)輸入輸出資料與蝴蝶運算用 之記憶體,(2)分解後之短點數快速傅立葉轉換之計算順 序,及(3)以資料置換方式進行資料存取所需之記憶體定 赢址。用以執行上述之方法。該主要記憶體包含二記憶區塊 ❿ (memory block),亦即 MEM_1 與 MEM—2,當 MEM_1 用於快 速傅立葉轉換運算時,MEM_2則用於輸入輸出資料,反之 亦然;且,每一記憶區塊包含Μ個記憶庫(memory bank), 且每一記憶庫之大小為N/M,其中N為快速傅立葉轉換之 點數長度,Μ為由系統設計者自行設定之記憶庫數量,該 處理單元FFT_C0RE係設計為可對分解後之短點數快速傅 26 201017529 立葉轉換進行個別計算,其計算各短點數離散傅立葉轉換 所需的時脈週期數由設計者依需求決定此亦決定計算單 元所需硬體數量。該控制單元之第⑴項控制功能係控制 如前所述之該等記憶區塊,以將其功能切換為快速傅立葉 轉換計算或輸入輸出資料。該控制單元之第(2)項控制功 能係控制該處理元件,使其利用與同一記憶區塊之前次快 籲速傅立葉轉換符元分解順序相反之順序進行短點數快速傅 立葉轉換计算,從而取得快速傅立葉轉換符元;亦即,若 該快速傅立葉轉換符元於一記憶區塊中係以ni點快速傅 立葉轉換、N2點快速傅立葉轉換....至Nk點快速傅立葉 轉換之順序計算’則儲存於同一記憶區塊中之次一快速傅 立葉轉換符元之計算順序為Nk點快速傅立葉轉換叫〇 •點快速傅立葉轉換......至N1點快速傅立葉轉換。該控制 單兀之第(3)項控制功能係控制以資料置換之方式進行資 料存取,從而進行每一記憶區塊之蝴蝶運算與資料輸入輪 八i含3組運作原理如圖二之硬體設計,其中2組分 別用於產生資料輸入與輸出所需之指標向量,其參數 111⑽、113、M、q與r係由各階段之分解方程式決定。 而第3紅肢用於產生各個短點數傅立葉轉換所需資料的 27 201017529 指標向量,其與前面兩組唯一不同處是此時之參數q,r 皆為0。 綜上所述,以上之實施例僅是用來解說本發明之具體 實施方式,本發明之專利範圍仍應以申請專利範圍所載為 準。 【圖式簡單說明】 ❿第一圖本發明快速傅立葉轉換處理器設計方塊圖。 第二圖本發明指標向量產生器硬體實施圖。 【主要元件符號說明】 無Trans. Broadcast., vol. 48, no. 1, pp. 57-61, Mar. 2002] also requires at least 3N words of memory. Therefore, for this application, if the 3780-point discrete Fourier transform processor is designed in the present method, the chip area can be greatly reduced compared to the prior art designer. The invention further provides a memory-based (forward/reverse) fast ❿ Fourier transform processor for performing the above method, which comprises: a main memory for storing data, and a short point after decomposition A fast Fourier transform processing component and a control unit, wherein the control unit has the function of controlling the following items: (1) input and output data and memory for butterfly operation, and (2) short point fast Fourier transform after decomposition The order of calculation, and (3) the memory-based win address required for data access by means of data replacement. Used to perform the above method. The main memory includes two memory blocks, namely MEM_1 and MEM-2. When MEM_1 is used for fast Fourier transform operation, MEM_2 is used for input and output data, and vice versa; and, each memory The block contains a memory bank, and the size of each memory bank is N/M, where N is the length of the point of the fast Fourier transform, and the number of memories is set by the system designer. The unit FFT_C0RE is designed to perform individual calculations on the decomposed short point fast Fu 26 201017529 Fourier transform. The number of clock cycles required to calculate the discrete Fourier transform of each short point is determined by the designer according to the requirements. The number of hardware required. The control function of item (1) of the control unit controls the memory blocks as described above to switch their functions to fast Fourier transform calculations or input and output data. The control function of item (2) of the control unit controls the processing element to perform short-point fast Fourier transform calculation in the reverse order of the previous fast-speed Fourier transform symbol decomposition order of the same memory block, thereby obtaining a fast Fourier transform symbol; that is, if the fast Fourier transform symbol is in a memory block with a ni point fast Fourier transform, an N2 point fast Fourier transform .... to the Nk point fast Fourier transform sequence calculation ' The order of calculation of the next fast Fourier transform symbol stored in the same memory block is Nk point fast Fourier transform called 〇•point fast Fourier transform...to N1 point fast Fourier transform. The control function of item (3) of the control unit controls the data access by means of data replacement, thereby performing the butterfly operation and data input wheel of each memory block, and the operation principle of the three groups is as shown in FIG. The body design, in which two groups are used to generate the index vector required for data input and output, the parameters 111(10), 113, M, q and r are determined by the decomposition equations of each stage. The third red limb is used to generate the 27 201017529 index vector of the data required for each short-point Fourier transform. The only difference from the previous two sets is that the parameter q, r is 0 at this time. In the above, the above embodiments are only intended to illustrate the specific embodiments of the present invention, and the scope of the invention should be determined by the scope of the patent application. [Simple diagram of the drawing] ❿The first figure is a block diagram of the design of the fast Fourier transform processor of the present invention. The second figure shows the hardware implementation diagram of the indicator vector generator of the present invention. [Main component symbol description] None
2828
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW97141311A TW201017529A (en) | 2008-10-28 | 2008-10-28 | Computation and addressing method of a eneral sized memory-based FFT processor |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW97141311A TW201017529A (en) | 2008-10-28 | 2008-10-28 | Computation and addressing method of a eneral sized memory-based FFT processor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201017529A true TW201017529A (en) | 2010-05-01 |
| TWI375171B TWI375171B (en) | 2012-10-21 |
Family
ID=44830889
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW97141311A TW201017529A (en) | 2008-10-28 | 2008-10-28 | Computation and addressing method of a eneral sized memory-based FFT processor |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TW201017529A (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9459812B2 (en) | 2014-02-03 | 2016-10-04 | Ceva D.S.P. Ltd. | System and method for zero contention memory bank access in a reorder stage in mixed radix discrete fourier transform |
| CN109117454A (en) * | 2017-06-23 | 2019-01-01 | 扬智科技股份有限公司 | 3780-point fast Fourier transform processor and operating method thereof |
| US12494235B2 (en) | 2023-03-31 | 2025-12-09 | Taiwan Semiconductor Manufacturing Company, Ltd. | Systems and methods for flexible bank addressing in digital computing-in-memory (DCIM) |
| TWI913640B (en) | 2023-03-31 | 2026-02-01 | 台灣積體電路製造股份有限公司 | Systems, methods and circuits for flexible bank addressing in digital computing-in-memory |
-
2008
- 2008-10-28 TW TW97141311A patent/TW201017529A/en not_active IP Right Cessation
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9459812B2 (en) | 2014-02-03 | 2016-10-04 | Ceva D.S.P. Ltd. | System and method for zero contention memory bank access in a reorder stage in mixed radix discrete fourier transform |
| CN109117454A (en) * | 2017-06-23 | 2019-01-01 | 扬智科技股份有限公司 | 3780-point fast Fourier transform processor and operating method thereof |
| CN109117454B (en) * | 2017-06-23 | 2022-06-14 | 扬智科技股份有限公司 | 3780-point fast Fourier transform processor and its operation method |
| US12494235B2 (en) | 2023-03-31 | 2025-12-09 | Taiwan Semiconductor Manufacturing Company, Ltd. | Systems and methods for flexible bank addressing in digital computing-in-memory (DCIM) |
| TWI913640B (en) | 2023-03-31 | 2026-02-01 | 台灣積體電路製造股份有限公司 | Systems, methods and circuits for flexible bank addressing in digital computing-in-memory |
Also Published As
| Publication number | Publication date |
|---|---|
| TWI375171B (en) | 2012-10-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US8364736B2 (en) | Memory-based FFT/IFFT processor and design method for general sized memory-based FFT processor | |
| CN114816334B (en) | Acceleration unit, related device and method | |
| US8694570B2 (en) | Method and apparatus for evaluation of multi-dimensional discrete fourier transforms | |
| Chen et al. | Hardware efficient mixed radix-25/16/9 FFT for LTE systems | |
| US9582474B2 (en) | Method and apparatus for performing a FFT computation | |
| JP3749022B2 (en) | Parallel system with fast latency and array processing with short waiting time | |
| Hsiao et al. | A generalized mixed-radix algorithm for memory-based FFT processors | |
| JP2022540749A (en) | Systems and methods for shift-based information mixing across channels of neural networks similar to shuffle nets | |
| CN102053948A (en) | Method and system for transposing a matrix on a single instruction multiple data multi-core processor architecture | |
| JP2011100452A5 (en) | ||
| US20040243656A1 (en) | Digital signal processor structure for performing length-scalable fast fourier transformation | |
| TW201229898A (en) | Mechanism for conflict detection using SIMD | |
| US20210303358A1 (en) | Inference Engine Circuit Architecture | |
| CN105224505A (en) | Based on the FFT accelerator installation of matrix transpose operation | |
| WO2023045516A1 (en) | Fft execution method, apparatus and device | |
| JP2000231513A (en) | Memory architecture for parallel data access in any given dimension of an N-dimensional rectangular data array | |
| TW201017529A (en) | Computation and addressing method of a eneral sized memory-based FFT processor | |
| US9176929B2 (en) | Multi-granularity parallel FFT computation device | |
| CN111221501A (en) | Number theory conversion circuit for large number multiplication | |
| Puchała et al. | Effectiveness of Fast Fourier Transform implementations on GPU and CPU | |
| US20060253514A1 (en) | Memory-based Fast Fourier Transform device | |
| CN111047026B (en) | Memory chip capable of executing artificial intelligent operation and operation method thereof | |
| US20080228845A1 (en) | Apparatus for calculating an n-point discrete fourier transform by utilizing cooley-tukey algorithm | |
| TWI237773B (en) | Fast fourier transform processor and dynamic scaling method thereof and radix-8 fast Fourier transform computation method | |
| EP1076296A2 (en) | Data storage for fast fourier transforms |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MM4A | Annulment or lapse of patent due to non-payment of fees |