[go: up one dir, main page]

TW201017529A - Computation and addressing method of a eneral sized memory-based FFT processor - Google Patents

Computation and addressing method of a eneral sized memory-based FFT processor Download PDF

Info

Publication number
TW201017529A
TW201017529A TW97141311A TW97141311A TW201017529A TW 201017529 A TW201017529 A TW 201017529A TW 97141311 A TW97141311 A TW 97141311A TW 97141311 A TW97141311 A TW 97141311A TW 201017529 A TW201017529 A TW 201017529A
Authority
TW
Taiwan
Prior art keywords
fourier transform
memory
fast fourier
data
point
Prior art date
Application number
TW97141311A
Other languages
Chinese (zh)
Other versions
TWI375171B (en
Inventor
Chen-Yi Lee
qing-feng Xiao
Yuan Chen
Original Assignee
Univ Nat Chiao Tung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Chiao Tung filed Critical Univ Nat Chiao Tung
Priority to TW97141311A priority Critical patent/TW201017529A/en
Publication of TW201017529A publication Critical patent/TW201017529A/en
Application granted granted Critical
Publication of TWI375171B publication Critical patent/TWI375171B/zh

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

For a large size FFT computation, the present invention decompose it into several smaller sizes FFT by decomposition equation and then transform the original index from one dimension into multi-dimension vector. By controlling the index vector, the present invention could distribute the input data into different memory banks such that both the in-place policy for computation and the multi-bank memory for high-radix structure could be supported simultaneously without memory conflict. Besides, in order to keep memory conflict-free when the in-place policy is also adopted for I/O data, the present invention reverse the decompose order of FFT to satisfy the vector reverse behavior. The proposed approach can minimize the area and reduce the necessary clock rate effectively for general sized memory-based FFT processor design. Further, the present invention applies the proposed to the 3780-point DFT processor for the Chinese DTV application. Compared with all the prior arts, the present invention only needs 2N words memory to achieve the in order I/O data requirement for continuous data flow. However, all the current patents need at least 3N words memory to achieve this system requirement. Here, N is the DFT size, 3780.

Description

201017529 九、發明說明: 【發明所屬之技術領域】 本發明關於一種快速傅利葉轉換之計算與定址方法以 及使用此方法之以記憶體為基礎之任意點數正/逆向快速 傅立葉轉換處理器設計。對於任意點數的以記憶體為基礎 的離散快速傅立葉轉換處理器設計可以有效的減少處理器 面積與所需的操作時脈。 【先前技術】 Φ 按,有關本發明相關之快速傅利葉轉換計算與定址方 法以及使用此方法之以記憶體為基礎之正/逆向快速傅立 葉轉換處理器之先前技術謹羅列並比較缺點如下: (1) 由於美國專利號4,477, 878名為 “Discrete Fourier transform with non-tumbled output,,不能 支援多記憶體架構 (multi-bank memory structure),因而,對於基r(radix-r)計算時,就需 要有r個時脈週期才能把資料從記憶體中讀出或將計算 β 完的資料寫回記憶體中。這將導致FFT在計算過程中需 要更多的時脈週期,以及為了即時應用所需的更高時脈 速度。 本發明可藉由支援多記憶體定址,而在無記憶體存取衝 突的情況下,將諸如基r的r筆資料在一個時脈週期内 完成讀或寫,以解決先前技術的問題。 (2) 由於美國專利號5091875名為“Fast Fourier transform (FFT) addressing apparatus and method”, 美國專利公開號 20060253514 名為 “Memory-based Fast Fourier Transform device”,以及學術論文 L. G. 6 201017529201017529 IX. Description of the Invention: [Technical Field] The present invention relates to a fast Fourier transform calculation and addressing method and a memory-based arbitrary point forward/reverse fast Fourier transform processor design using the same. The memory-based discrete fast Fourier transform processor design for any number of points can effectively reduce the processor area and the required operating clock. [Prior Art] Φ According to the prior art of the fast Fourier transform calculation and addressing method of the present invention and the memory-based forward/reverse fast Fourier transform processor using the method, the disadvantages are as follows: (1) Since US Patent No. 4,477,878 is called "Discrete Fourier transform with non-tumbled output", it cannot support multi-bank memory structure. Therefore, for base r (radix-r) calculation, it is necessary. There are r clock cycles to read data from memory or write beta-completed data back into memory. This will cause the FFT to require more clock cycles in the calculation process and for immediate application. Higher clock speed. The present invention can read or write a r-pen data such as a base in a clock cycle by supporting multi-memory addressing without a memory access conflict. Solving the problems of the prior art. (2) U.S. Patent No. 5,091,875 is entitled "Fast Fourier transform (FFT) addressing apparatus and method", U.S. Patent Publication No. 2006 0253514 is called "Memory-based Fast Fourier Transform device" and academic paper L. G. 6 201017529

Johnson “Conflict free memory addressing for dedicated FFT hardware,M IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no· 5,pp. 312-316,May 1992等,均僅支援固定的 基r,因此僅能適用在具有N = rn大小的的FFT中。 若考慮到應用在中國數位電視之3780點FFT或者是 PLC應用之3072點FFT的情況時,前述兩件先前技術 即無法運作。 但本發明能夠支援任意基數r的混和。因此,能夠在任 φ 何大小的FFT應用中使用。 (3) 美國專利號 7062523 名為“ “Method for efficiently computing a fast Fourier transform” 僅支援固定的基r,因此不能支援中國DTV或是PLC之 類的應用。除此之外,他也不能支援多記憶體架構 (multi-bank memory structure),其在基 r(radix-r)運算時,需要r個時脈週期自記憶體存取資料。因此 將比使用多記憶體架構的處理器需要更高的時脈來完成 FFT的計算。 本發明除了支援任意大小的FFT應用之可變基數外,尚 ® 支援多記憶體架構,在不產生記憶體存取衝突的情況下 減低所需時脈。 (4) 美國專利號 7, 164, 723 名為“ Modulation apparatus using mixed-radix fast Fourier transform”,以及論文 B. G. Jo, and Μ. H· Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy, ’’ IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, ηο·5, pp. 911-919, May 2005 僅適用於基 2/4 混和的 201017529 演算法(algorithm) ’故僅能工作在N=2n大小上的 FFT ’無法應用於例如n= 3780之中國DTV之其他大小 的FFT應用。 ' 本發明因為可支援任何混合的基數’所以可以滿足上述 需求。除此之外,對於諸如8192的更長點數的處理 器設計,本發明可以讓處理器設計更加有彈性,因本發 明可支援大於基4的演算法。 (5) 美國專利公開號20080025199名為“Method and device for high throughput n-point forward and • inverse fast Fourier transform” 提出 3780 的可能 分解(candidate decomposition ),例如,3780 =Johnson "Conflict free memory addressing for dedicated FFT hardware, M IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992, etc., all support only fixed The base r, therefore, can only be applied to FFTs with N = rn size. If considering the 3780-point FFT applied to Chinese digital TV or the 3072-point FFT of PLC application, the above two prior techniques cannot be operated. However, the present invention can support the mixing of any base r. Therefore, it can be used in FFT applications of any size. (3) US Patent No. 7062523 named "Method for efficient computing a fast Fourier transform" only supports fixed bases. r, therefore can not support applications such as Chinese DTV or PLC. In addition, he can't support multi-bank memory structure, which requires r clock cycles to access data from memory during radix-r operations. Therefore, a higher clock is required than the processor using the multi-memory architecture to complete the FFT calculation. In addition to supporting the variable base of FFT applications of any size, the present invention supports multiple memory architectures to reduce the required clock without creating memory access violations. (4) US Patent No. 7, 164, 723 entitled "Modulation apparatus using mixed-radix fast Fourier transform", and paper BG Jo, and Μ. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor Using novel in-place strategy, '' IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, ηο·5, pp. 911-919, May 2005 only for the base 2/4 hybrid 201017529 algorithm ( Algorithm) 'The FFT that can only work at N=2n size cannot be applied to other sizes of FFT applications such as n=3780 China DTV. 'The present invention can support any mixed base number' so that the above requirements can be met. In addition, for longer processor designs such as 8192, the present invention allows the processor design to be more flexible, as the present invention can support algorithms larger than base 4. (5) US Patent Publication No. 20080025199 A possible decomposition of 3780 is proposed for "Method and device for high throughput n-point forward and • inverse fast Fourier transform", for example, 3780 =

3x3x3x2x2x5x7。其以MDC架構實行每個小點數的FFT 模組來減少後續會提到的一些中國專利中的較大内部暫 存器。然而,由於此方法需要對每個模組在一個時脈週 期中完成運算,因此會需要大量硬體。此外,在實際的 系統應用上是需要依序輸出資料的,但是此專利輸出資 料卻不是依序輸出的,因此尚有部分問題未解決。 (6) 中國專利號01140060.9名稱”3780點離散傅里葉變 春 換處理器系統及其結構”、中國專利號03107204. 6名 稱”具有3780點IDFT/DFT處理器的多載波系統及其方 法”、中國專利公開號200410090873.2名稱”採用升採 樣處理方法實現3780點離散傅立葉變換”、中國專利公 開號200610104144.7名稱”3780點離散傅立葉變換處 理器”以及中國專利公開號200710044716. 1名稱”流水 線結構的3780點快速傅里葉變換處理器”等上述專利可 8 201017529 執行3780點之具有類似管線(pipelined)架構之FFT 處理器,其所提出的架構内部需要大量的暫存器或記憶 體來重新排列資料。此外,對於實際系統應用之需求而 言,依序輸入輸出資料以及支援連續資料流都是必須 的,為了達成這些,中國專利號01140060. 9名稱”3780 點離散傅里葉變換處理器系統及其結構”以及中國專利 參 號03107204. 6名稱”具有3780點IDFT/DFT處理器的多 載波系統及其方法”就至少需要3N字元的記憶體空間; 中國專利公開號200410090873. 2,名稱”採用升採樣處 理方法實現3780點離散傅立葉變換”、中國專利公開號 200710044716. 1,名稱”流水線結構的3780點快速傅里 葉變換處理器”就至少需要5N字元的記憶體空間;中國 ^ 專利公開號200610104144. 7名稱”3780點離散傅立葉 變換處理器”就至少需要6N字元的記憶體空間。 相較前述,本發明僅僅需要2N字元的記憶體空間就可 以做到了 。並且,請注意到在中國專利號 01140060.9、中國專利號03107204.6以及中國專利公 開號200710044716.1之輸出資料並不是有序的,因此 他們需要至少一個N字元的記憶體空間與額外的控制邏 9 201017529 輯來重新排序輸出資料以便依序輸出。 (7)在論文 Z.-X. Yang, Y.-P. Hu,C. -Y. Pan, and L. Yang, “Design of a 3780-point IFFT processor for TDS-OFDM, 5, IEEE Trans. Broadcast., vol. 48, no. 1, pp· 57-61,Mar. 2002所提出的3780點FFT處理器之 輸出資料並非依序排列,為了能夠依序排列,其需要一 ❹ 個緩衝器去重新排列輸出資料,因此其至少需要3N字 元的記憶體空間,才能在能處理連續資料流的前提下達 成此需求,但本發明已如上述僅需2N字元空間的記憶 體。 【發明内容】 有鑑於上述先前技術的缺失,本發明提出一種快速傅 利葉轉換之計算與定址方法以及使用此方法之以記憶體為 ❿ 基礎之任意點數正/逆向快速傅立葉轉換處理器設計,其 方法第一項特徵為: 藉由分解方程式將長點數離散傅立葉轉換的計算分解為數 個短點數的離散傅立葉轉換,並同時將其指標由單一維度 映射成多維度指標向量。 本發明的方法第二項特徵為:藉由控制這些多維度 201017529 指標向量’本發明把原始輸人資料分散存放到數個記憶體 裡,使得在不產生記憶體存取衝突的情況下同時達到計算 期間的資料置換與記憶體完整蝴蝶點數一次存取的目的。 本發明的方法第三項特徵為:當資料置換使用在已計 算完成的舊資料依序輸出與新資料依序輸入時為了往後 計算期間可以繼續保持資料存取時沒有記憶體衝突,本發 •明對於新資料的計算採取與先前資料計算時的反序操作來 達成目的。此方法,對於任意點數的以記憶體為基礎的離 散快速傅立葉轉換處理器設計可以有效的減少處理器面積 與所需的操作時脈。 本發月提出之以s己憶體為基礎之快速傅立葉轉換處理 器設計,其包含:-用以存放資料之主要記憶體、一用以 鲁進行分解後短點數快速傅立料換之處理元件以及一押制 單元,其中該控制單元具有控制以下項目之功能:(Γ)輪 入輸出資料與蝴蝶運算用之記憶體,⑵分解後之短點數 决速傅立葉轉換之計算順序,及⑶以資料置換方式進 資料存取所需之記憶體定址1以執行上述之方法。 【實施方式】 為了達成上述的發明目的,兹將本發明之—具趙實施 201017529 例說明如下: X{k)^NZx(ji)W^ 離散傅立葉轉換之定義為 ”=0 "。將長點數離 散傅立葉轉換分解為數個短點數離散傅立葉轉換之方法已 在論文”IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-25, NO. 3, JUNE 1977” 名稱 『Index Mappings for Multidimensional m3x3x3x2x2x5x7. It implements an FFT module for each small point in the MDC architecture to reduce the larger internal registers of some of the Chinese patents mentioned later. However, since this method requires an operation for each module in one clock cycle, a large amount of hardware is required. In addition, in actual system applications, it is necessary to output data sequentially, but the output of this patent is not output in order, so some problems remain unresolved. (6) Chinese Patent No. 01140060.9 "3780-point discrete Fourier transform spring processor system and its structure", Chinese Patent No. 03107204. 6 name "Multi-carrier system with 3780 IDFT/DFT processor and its method" , Chinese Patent Publication No. 200410090873.2 name "3780 point discrete Fourier transform using upsampling processing method", Chinese Patent Publication No. 200610104144.7 name "3780 point discrete Fourier transform processor" and Chinese patent publication number 200710044716. 1 name "pipeline structure 3780" The above-mentioned patents such as the point fast Fourier transform processor can perform 3780 points of an FFT processor with a pipelined architecture, and the proposed architecture requires a large amount of scratchpads or memory to rearrange the data. . In addition, for the actual system application requirements, sequential input and output data and support for continuous data flow are all necessary. In order to achieve this, the Chinese patent number 011400060. 9 name "3780 point discrete Fourier transform processor system and its "Structure" and Chinese Patent No. 03107204. 6 "Multi-carrier system with 3780-point IDFT/DFT processor and its method" requires at least 3N characters of memory space; Chinese Patent Publication No. 200410090873. 2, name" adopted The upsampling processing method realizes 3780-point discrete Fourier transform", Chinese Patent Publication No. 200710044716. 1. The 3780-point fast Fourier transform processor with the name "pipeline structure" requires at least 5N characters of memory space; China ^ Patent Disclosure No. 200610104144. 7 Name "3780-point discrete Fourier transform processor" requires at least 6N characters of memory space. Compared to the foregoing, the present invention requires only 2N characters of memory space. Also, please note that the output data in Chinese Patent No. 01140060.9, Chinese Patent No. 03107204.6, and Chinese Patent Publication No. 200710044716.1 are not orderly, so they need at least one N-character memory space with additional control logic 9 201017529 To reorder the output data for sequential output. (7) In the paper Z.-X. Yang, Y.-P. Hu, C. -Y. Pan, and L. Yang, "Design of a 3780-point IFFT processor for TDS-OFDM, 5, IEEE Trans. Broadcast., vol. 48, no. 1, pp· 57-61, Mar. 2002 The output data of the 3780-point FFT processor is not arranged in order. In order to be able to arrange in sequence, it needs a buffer to go. The output data is rearranged, so that it requires at least 3N characters of memory space to achieve this requirement on the premise that the continuous data stream can be processed, but the present invention has only 2N character space memory as described above. In view of the above-mentioned shortcomings of the prior art, the present invention proposes a fast Fourier transform calculation and addressing method and an arbitrary-point forward/reverse fast Fourier transform processor design based on the memory using the method, the method first The feature of the term is: Decomposing the calculation of the long-point discrete Fourier transform into a discrete Fourier transform of several short points by the decomposition equation, and simultaneously mapping its index from a single dimension to a multi-dimensional index vector. The second method of the method of the present invention special By controlling these multi-dimensional 201017529 indicator vectors, the present invention distributes the original input data into a plurality of memories so that the data replacement and the memory during the calculation are simultaneously achieved without causing memory access conflicts. The purpose of the complete butterfly point access is as follows. The third feature of the method of the present invention is that when the data replacement is used, the old data sequentially output and the new data are sequentially input, and can be continued for the subsequent calculation period. There is no memory conflict when accessing data. The calculation of new data takes the reverse operation with the previous data calculation to achieve the purpose. This method is a memory-based discrete fast Fourier transform for any number of points. The processor design can effectively reduce the processor area and the required operating clock. This month's FT-based fast Fourier transform processor design includes: - the main memory for storing data a processing element for short-point number fast-changing after decomposing, and a depressing unit, wherein the control unit It has the functions of controlling the following items: (Γ) the memory for the input data and the butterfly operation, (2) the calculation order of the short-point final Fourier transform after decomposition, and (3) the data access method for data access. The memory address is set to perform the above-described method. [Embodiment] In order to achieve the above object, the present invention will be described as follows: X{k)^NZx(ji)W^ Discrete Fourier transform Defined as "=0 ". The method of decomposing long-point discrete Fourier transform into several short-point discrete Fourier transforms has been published in the paper "IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-25, NO. 3, JUNE 1977" For Multidimensional m

Formulation of the DFT and Convolution』中提出。本 發明在此以點數長度N=N1N2之分解方程式(1)加以說明: \n = N2nl+A2n2 modA^ =0,1,,..,TV, -1 [k = + Nxk2 modN -1 (1) 方程式(1)將指標n與k映射為指標向量(nl, n2)與 (kl, k2),使其由單一維度[0,N-l]映射成二維度[0, Nl-1] X [0, N2-1]。其中係數A2與B1之選擇取決於N1 與N2間之關係。於本發明提出之方法中,方程式(1)係數 選用規則如下: 情況一:若N1與N2互質,則選用符合下列條件之A2及 B1為係數 A2 : =plNl 並且 A2 =dlN2 + 1 B1 : =p2N2 並且 B1 =q2Nl + 1 12 201017529 在此,pi、ql、n9 0 ^ ^ P及q2皆為正整數。因此,離散傅立 葉轉換之定義可宜士、 科J冩成以下之方程式(2): ^,A2) = ^ X〇c(«l,n2)»^*'^ «I 叫 12 =Σ{ Σχ(«ι»«2)^^} =ΣΜ.«ί) ^ (2) 隋況一.若Nl與M2非為互質,則選用A2 = B1 =卜因 & H# it葉轉換之定義可寫成以下之方程式⑶: =Σ{ w^ (3)Proposed in Formulation of the DFT and Convolution. The present invention is described herein by the decomposition equation (1) of the point length N = N1N2: \n = N2nl + A2n2 modA^ =0, 1, , .., TV, -1 [k = + Nxk2 modN -1 ( 1) Equation (1) maps the indices n and k into index vectors (nl, n2) and (kl, k2), which are mapped from a single dimension [0, Nl] to a two-dimensional [0, Nl-1] X [ 0, N2-1]. The choice of coefficients A2 and B1 depends on the relationship between N1 and N2. In the method proposed by the present invention, the coefficient selection rule of equation (1) is as follows: Case 1: If N1 and N2 are mutually prime, A2 and B1 satisfying the following conditions are selected as coefficient A2: =plNl and A2 = dlN2 + 1 B1 : =p2N2 and B1 = q2Nl + 1 12 201017529 Here, pi, ql, n9 0 ^ ^ P and q2 are all positive integers. Therefore, the definition of the discrete Fourier transform can be determined by the following equation (2): ^, A2) = ^ X〇c(«l,n2)»^*'^ «I is called 12 =Σ{ Σχ («ι»«2)^^} =ΣΜ.«ί) ^ (2) Condition 1. If Nl and M2 are not mutually prime, then use A2 = B1 = Buin &H# it leaf transformation definition Can be written as the following equation (3): =Σ{ w^ (3)

本發明可由方程式⑵及⑻得知,-長點數(N點) 快速傅立葉轉換可分別在第—及第二階段中以二較短點數 (N1點及N2點)快逮傅立葉轉換加以計算。就定值n2 而言,第一階段N1點快速傅立葉轉換之原始輪入資料為 x(nl,n2), nl-Ο, 1,·..,N1-1。對應此 n2 之第一階段 輸出資料為y(kl,必,kl=G,i ...,N卜卜就定值u 而言,第二階段N2點快速傅立葉轉換之原始輪入資料為 y(kl’ 112),n2=0’ 1,.·.,N2一卜對應此 kl 之第二階段 輸出資料則為X(kl,k2),k2= 〇, i,…,N2h。方程式 13 201017529 (2)與(3)之差異在於若N1與N2非為互質,則第一與第二 階段間存在有如方程式(3)所示之旋轉因子以。 註:就情況一而言,其係數選取亦可如情況二,且本發明 之後續流程相同 現於第一階段計算N1點離散傅立葉轉換並於第二階 段計算N2點離散傅立葉轉換,以求取第一個離散傅立葉 ❿轉換符元。先將原始輸入資料分散存放到數個記憶庫 (memory bank)中。假定N22N1且記憶庫之數量為N2 ’ 則可藉由方程式(4)將該原始輸入資料分散存放至N2個記 憶庫中以避免記憶體衝突。 bank = nl + n2 mod N2 (4) 避免記憶體衝突之關鍵在於透過方程式(1)及(4)將 ❿資料分散存放於記憶庫。一旦選定記憶庫,便需將資料定 址。同一記憶庫中之任兩筆資料應映射至〇到N1_l範圍 内之不同位址。為求簡單,在此選用以下之方程式(5)執 行資料定址。 address = nl (5) 完成第一個離散傅立葉轉換符元之計算後,應將資 料依序輸出。輸出指標係由方程式(1)映射而得。在將計 201017529 之第-個離散傅立葉轉換符元資料依 時’亦以㈣置換之方歧序輸人第二個 符元之輸入資料。亦即,新的原始輸入資料、轉換 輸出貝枓X⑴之位置。計算第: 之順序係與計算第一個離散傅立葉轉換符元時相反 即於第-階段計算N2點離散傅立葉轉換,再 •階段計算I點離散傅立葉轉換,藉以求取第二個離㈣ 立葉轉換符元。完成第二個離散傅立葉轉換符元之叶算 後,同樣以資料置換方式依序輸出計算完成之舊資料並依 序輸入新資料。第三個離散傅立葉轉換符元復依第一個離 散傅立葉轉換符元之方式計算。 ❹ 之 以下將以中國數位電視所需之謂點快速傅立葉轉 換為例詳加說明。以下所述之分解順序僅為可行方式 例 由於3780=4x3x3x3x5x7,本發明可分別進行3點、 4點、5點與7點之快速傅立葉轉換計算。在此,資料係 分散存放於7個記憶庫。 先依分解順序4、3、3、3、5及7執行計算。第一 ^段(4點快速傅立葉轉換)係採用分解方程式進行; 15 201017529 f» = 945„1+2836«2mod3780 [*: = 945^,+ 4^2mod3780 «, ί ' ’2',.·,944 ( 6) 上述方程式將指標η咏灿去人 峨射為向量㈨扮,即由[0,3779]至 [〇, 3] X [0, 944],如表一所示。 表一 第一階段之指標映射 — nl=〇 nl = l η1=2 η1=3 —---— — «2 =〇 χ[〇] χ[945] χ[18 χ[2835 90] ] «2=1 χ[2836] χ[ΐ] χ[94 χ[1891 6] ] »2=2 χ[1892] χ[2837 ] χ[2] χ[947] • · · • · · • · · • · · • · · ^2 =944 χ[944] χ[28 χ[3779 χ[1889] 34] ] 表一中每一列之資料為每個4點快速傅立葉轉換之原 始輪入資料。輸入順序取決於指標nl。例如,& = 1列 16 201017529 中’輸入順序為(nl,(〇,1),(1,1), (2,丨)以及 (3,D ’其分別對應之資料為χ[2836]、χ[1]、 χ[_],如表-所示。 J與 由於每個4點快速傅立葉轉換之原始輸入資料皆具 有相同之指標《2 ’故可藉由方程式⑺將資料分散存入不 同3己憶庫來避免第一階段之記憶體衝突。The present invention can be known from equations (2) and (8), and the long-point (N-point) fast Fourier transform can be calculated by using two shorter points (N1 point and N2 point) fast-changing Fourier transform in the first and second stages, respectively. . For the fixed value n2, the original round entry data of the first stage N1 point fast Fourier transform is x(nl, n2), nl-Ο, 1,·.., N1-1. The output data corresponding to the first stage of this n2 is y(kl, must, kl=G, i ..., N Bu is the fixed value u, the original round entry of the second stage N2 point fast Fourier transform is y (kl' 112), n2=0' 1, 1.·., N2 卜 corresponds to the second stage output data of this kl is X(kl, k2), k2 = 〇, i, ..., N2h. Equation 13 201017529 The difference between (2) and (3) is that if N1 and N2 are not mutually prime, there is a rotation factor as shown in equation (3) between the first and second stages. Note: In case of case 1, the coefficient The selection may also be as in case two, and the subsequent process of the present invention is the same as the first stage calculates the N1 point discrete Fourier transform and the second stage calculates the N2 point discrete Fourier transform to obtain the first discrete Fourier transform symbol. First, the original input data is dispersed into several memory banks. Assuming N22N1 and the number of memory banks is N2 ', the original input data can be distributed to N2 memory banks by equation (4). Avoid memory conflicts. bank = nl + n2 mod N2 (4) The key to avoiding memory conflicts is to pass the equation Equations (1) and (4) distribute the data in the memory. Once the memory is selected, the data needs to be addressed. Any two data in the same memory should be mapped to different addresses within the range of N1_l. For the sake of simplicity, the following equation (5) is used to perform data addressing. address = nl (5) After completing the calculation of the first discrete Fourier transform symbol, the data should be output in order. The output index is determined by the equation (1). The mapping is obtained by inputting the input data of the second symbol in the order of the fourth discrete Fourier transform symbol data of 201017529, which is also replaced by (4). That is, the new original input data, Convert the position of the output 枓X(1). The order of the calculation: is the opposite of the calculation of the first discrete Fourier transform symbol, that is, the N2 point discrete Fourier transform is calculated in the first stage, and the I point discrete Fourier transform is calculated in the stage. Take the second (four) vertical leaf transform symbol. After the second discrete Fourier transform symbol is calculated, the old data calculated by the data is sequentially output and the new data is sequentially input. The third discrete Fourier transform symbol is calculated according to the first discrete Fourier transform symbol. 以下 The following will be explained in detail by the fast Fourier transform required by the Chinese digital TV. The decomposition sequence described below For the only possible example, since 3780=4x3x3x3x5x7, the present invention can perform fast Fourier transform calculation of 3 points, 4 points, 5 points and 7 points respectively. Here, the data is distributed in 7 memory banks. Calculations are performed at 3, 3, 3, 5, and 7. The first segment (4-point fast Fourier transform) is performed using the decomposition equation; 15 201017529 f» = 945„1+2836«2mod3780 [*: = 945^,+ 4^2mod3780 «, ί ' '2',.·,944 ( 6) The above equations illuminate the indicator η 咏 去 为 为 为 向量 向量 向量 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 944], as shown in Table 1. The index map of the first stage of Table 1 - nl=〇nl = l η1=2 η1=3 —---—— — «2 =〇χ[〇] χ[945] χ[18 χ[2835 90] ] «2 =1 χ[2836] χ[ΐ] χ[94 χ[1891 6] ] »2=2 χ[1892] χ[2837 ] χ[2] χ[947] • · · • · · • · · • · • · · ^2 =944 χ[944] χ[28 χ[3779 χ[1889] 34] ] The data in each column of Table 1 is the original round entry for each 4-point Fast Fourier Transform. The order of input depends on the indicator nl. For example, & = 1 column 16 201017529 'The input order is (nl, (〇, 1), (1, 1), (2, 丨) and (3, D 'the corresponding data is χ [2836] χ[1], χ[_], as shown in Table--J. Since the original input data of each 4-point fast Fourier transform has the same index "2", the data can be distributed by equation (7). Different 3 memory pools to avoid the first phase of memory conflicts.

bank = nl + x mod7 (7) 歷經第一階段後,原始資料已被分解為4個9仍點 快速傅立葉轉換之獨立群組,並分別對應於kl= 〇,H 及3。 , 同樣地’本發明亦可以945 = 3χ3χ3χ5χ7之順序分解 上述之945點快速傅立葉轉換,並將指標&映射為向量 ❿(η2,η3’ η4,η5,π6),即由[〇,944]映射為[〇,2]x[〇, 2]x[0,2]χ [〇,4]χ[〇,6]。 結合各階段之所有分解方程式,即可求得378〇點快 速傅立葉轉換依此分解順序之完整指標映射方程式,如方 程式(8)與(9)所示’在選擇記憶庫時係利用方程式(1〇)以 避免記憶體衝突,定址方程式則可採用方程式(11)。 n = 945nl + 1260n2 + 2940n3 + 980n4 + 1512n5 + 17 201017529 540n6 mod3780 (8) k = 945kl + 2380k2 + 3360k3 + 2520k4 + 2268k5 + 540k6 mod3780 (9) bank = nl + n2 + n3 + n4 + n5 + n6 mod7 (10) address = 135nl + 45n2 + 15n3 + 5n4 + n5 • (11) 繼計算第一個快速傅立葉轉換符元,亦即單數之快 速傅立葉轉換符元之後,方程式(1〇)及(11)中所有指標 111將轉換為ki。對於第二個輸入之快速傅立葉轉換符元 而言’亦即雙數之快逮傅立葉轉換符元,該雙數快速傅立 葉轉換符元之輸入資料xeven[n]應被置入單數離散傅立 瘳葉轉換符元之輪出資料XQd❿]之位置,兩者間關係為 其映射指標、記憶庫與絲應分別以方程式⑻、 (ίο)與οι)決定。 以下討論如何於目前雙數快速傅立葉轉換符元之資 =分配下,於計算期間持續避免資料存取時之記憶體衝 哭。 本發明係採用與 為計算雙數快速傅立葉轉換符元 201017529 計算單數快速傅立葉轉換符元時相反之分解順序進行計 算,亦即讓=7χ5χ3χ3χ3χ4,換言之,本發明係以7Bank = nl + x mod7 (7) After the first stage, the original data has been decomposed into four independent groups of 9-point fast Fourier transforms, corresponding to kl=〇, H and 3. Similarly, the present invention can also decompose the above-mentioned 945-point fast Fourier transform in the order of 945 = 3χ3χ3χ5χ7, and map the index & to the vector ❿(η2, η3' η4, η5, π6), that is, [〇, 944] Map to [〇, 2]x[〇, 2]x[0,2]χ [〇,4]χ[〇,6]. Combining all the decomposition equations of each stage, we can obtain the complete index mapping equation of 378〇 fast Fourier transform according to this decomposition order, as shown in equations (8) and (9), using equations when selecting memory. 〇) To avoid memory conflicts, the equation (11) can be used for the address equation. n = 945nl + 1260n2 + 2940n3 + 980n4 + 1512n5 + 17 201017529 540n6 mod3780 (8) k = 945kl + 2380k2 + 3360k3 + 2520k4 + 2268k5 + 540k6 mod3780 (9) bank = nl + n2 + n3 + n4 + n5 + n6 mod7 (10) address = 135nl + 45n2 + 15n3 + 5n4 + n5 • (11) Following the calculation of the first fast Fourier transform symbol, ie the singular fast Fourier transform symbol, in equations (1〇) and (11) All indicator 111 will be converted to ki. For the second input fast Fourier transform symbol, that is, the double-number fast-forward Fourier transform symbol, the input data xeven[n] of the double-number fast Fourier transform symbol should be placed into the singular discrete Fourier transform symbol The position of the data XQd❿], the relationship between the two is its mapping index, memory and silk should be determined by equations (8), (ίο) and οι). The following discussion discusses how to continue to avoid memory crying during data access during the calculation period under the current double-numbered fast Fourier transform symbol. The present invention is calculated using the reverse order of the calculation of the singular fast Fourier transform symbol for calculating the double fast Fourier transform symbol 201017529, that is, let = 7 χ 5 χ 3 χ 3 χ 3 χ 4, in other words, the present invention is 7

*’、5點、3點、3點、3點以及4點快速傅立葉轉換之順 序執行#算。藉㈣財式,本發明可求得雙數快速傅立 葉轉換符元之完整輸入輸出指標映射方程式(12)及(13), 其結果類似於單數快速傅立葉轉換符元之完整指標映射方 程式(8)及(9),方程式(12)及(13)分別將輸入指標η與輸 出指標k映射為向量(ai,a2,a3,a4,a5,a6)與(bl, b2,b3,b4,b5,b6),即由[〇,3779]映射為[〇,6]x[0, 4]χ[0’ 2]χ[0,2]x[〇,2]x[0,3],以計算雙數快速傅 立葉轉換符元。 n = 540al + 2268a2 + 2520a3 + 3360a4 + 2380a5 + 945a6 mod3780 (12) k = 540bl + 1512b2 + 980b3 + 2940b4 + 1260b5 + 945b6 mod3780 (13) 將輸入與輸出指標方程式(9)與(12)以及(8)與(13) 對應比較,可發現此兩組方程式恰以反序向量之關係彼此 匹配,如方程式(14)及(15)所示。 (al, a2, a3, a4, a5, a6) = (k6, k5, k4, k3, k2, 201017529 kl) (14) (nl, n2, n3, n4, n5, n6) = (b6, b5, b4, b3, b2, bl) (15) 因雙數快速傅立葉轉換符元xeven[n]之輸入資料被 置於已計算完成之單數快速傅立葉轉換符元輸出資料*', 5, 3, 3, 3, and 4 points of the fast Fourier transform are executed sequentially. By (4) financial formula, the present invention can obtain the complete input-output index mapping equations (12) and (13) of the double-number fast Fourier transform symbol, and the result is similar to the complete index mapping equation (8) of the singular fast Fourier transform symbol and (9) Equations (12) and (13) map the input index η and the output index k to vectors (ai, a2, a3, a4, a5, a6) and (bl, b2, b3, b4, b5, b6, respectively). ), that is, [〇,3779] is mapped to [〇,6]x[0, 4]χ[0' 2]χ[0,2]x[〇,2]x[0,3] to calculate the double number Fast Fourier transform symbol. n = 540al + 2268a2 + 2520a3 + 3360a4 + 2380a5 + 945a6 mod3780 (12) k = 540bl + 1512b2 + 980b3 + 2940b4 + 1260b5 + 945b6 mod3780 (13) Input and output indicator equations (9) and (12) and (8) Corresponding to (13), it can be found that the two sets of equations match each other in the inverse vector relationship, as shown in equations (14) and (15). (al, a2, a3, a4, a5, a6) = (k6, k5, k4, k3, k2, 201017529 kl) (14) (nl, n2, n3, n4, n5, n6) = (b6, b5, B4, b3, b2, bl) (15) The input data of the double-numbered fast Fourier transform symbol xeven[n] is placed in the calculated singular fast Fourier transform symbol output data.

Xodd [ k ]之位置’而兩者間之關係為k=n,由此可得雙數 ⑩快速傅立葉轉換符元之記憶庫選擇與記憶體定址方程式 (16)及(17)。 bank =技1 + + 技3 + a4 + + 技6 mod7 (16) address = 135a6 + 45a5 + 15a4 + 5a3 + a2 (17) 請注意,輸入映射方程式(12)與輸出映射方程式(9) 相同,且記憶庫選擇方程式(10)與(16)亦保持相同。是 籲以,於雙數快速傅立葉轉換符元之計算期間,藉由倒轉單 數快速傅立葉轉換符元分解順序之計算方式,可使記憶體 始終免於存取衝突。 此外,本發明發現,輸出映射方程式(13)與輸入映 射方程式(8)亦為相同,意指在將第三個快速傅立葉轉換 輸入資料xthirdU]置入已計算完成之雙數快速傅立葉轉 換輸出資料xeven[k]之位置並令其中k=n時,該第三個 20 201017529 快速傅立葉轉換輸入資料之分散儲存方式可與方程式 ⑻、(10)及(11)所決定者相同’也即第三個快速傅立葉 轉換符元的資料存放位置跟單數快速傅立葉轉換符元之資 料存放位置一樣而又回到起始討論的狀態,xthird[n]= x〇dd[n]。 根據以上之實施例,本發明可藉由倒轉先前之快速 ❹傅立葉轉換符元分解順序,設計出無記憶體衝突之可變基 數快速傅立葉轉換處理器,其可同時達到蝴蝶輸出與資料 輸入輸出之資料置換。 第一圖係以記憶體為基礎之3780點離散傅立葉轉換 處理器設計方塊圖,其中MEM一1與ME12為二記憶區塊 (memory block) ’各區塊包含7個記憶庫(memory 參bank),各記憶庫大小為540個字,FFT—C0RE包含可分別 處理短點數離散傅立葉轉換之處理單元,其計算各短點數 離散傅立葉轉換所需的時脈週期數由設計者依需求決定, 此亦決定計算單元所需硬體數量,控制單元主控資料存取 與處理單元之計算。 以下將就離散傅立葉轉換處理器之運作加以闞明, 爲便於說明,在此定義第一類之分解與計算順序依次為4 21 201017529 點、3點、3點、3點、5點與7點離散傅立葉轉換,並將 其倒轉順序定義為第二類順序,依次 ’(‘黏、5 點、3 點、3點、3點與4點離散傅立葉轉換。 假設第-個與第二個離散傅立葉轉換符元分別储存於 記憶區塊MEM—1與_〜2,並假設第一個與第二個離散傅 立葉轉換符元之分解與計算順序皆為第〜類順序,則 ❹以上之說明,本發明可知: ⑴第-、五、九、十三…離散傅立葉轉換符元係儲存於 記憶區塊MEM一1且以第一類順序計算。 ⑵第二、六、十、十四··離散傅立葉轉換符元係儲存於 記憶區塊MEM一2且以第一類順序計算。 (3)第三、七、十一、+五 雜勒捕a 十五…離散傅立葉轉換符元係儲存 ❹於記憶區塊MEM_1且以第二類順序計算。 ⑷第四、人、十:、十六.··離散傅立葉轉換符元係儲存 於記憶區塊MEM—2且以第二類順序計算。 爲說明之便,第二圖中僅以長度為^画⑽之離散 傅立葉轉換為例來說明取得資料存取所需指標向量之硬體 實施方式。如第二圖所示,其係由數個累加器Αι、 A2、…、A5組成。於控制單元中包含3組此硬體設計, 22 201017529 其中2組分別用於產生資料輸入與輸出所需之指標向量, 其參數ΙΠ、U2、U3、114、q與r係由各階段之分解方程式 ⑴決定。而第3組則是用於產生各個短點數傅立葉轉二 所需資料的指標向量,其與前面兩組唯—不同處是此時之 參數q’r皆為〇’即a4,A5可去除。 下表二顯示以記憶體為基礎之離散傅立葉轉換處理 ❹器設計之不同即時應用方法的比較結果。所有項目中,僅 有本案方法與[美國專利4, 477, 878]可支援任何一般基數 之展合’並進行任何點數長度之快速傅立葉轉換。然而, 如别所述’[美國專利4,477,878]未能實現無記憶體衝突 之多記憶體架構(multi-bankmem〇rystructure),因而 無法減少即時應用所需之操作時脈。 ❿ 本發明提出以記憶體為基礎之快速傅立葉轉換處理 器設計方法,使其可同時滿足以下三項目的:⑴蝴蝶運 算與輸入輸出資料時之資料置換;⑵任何基數之混合; 及(3)支援多記憶體架構。 23 201017529 表二 本發明與不同方法之比較 [A1] [A2] [B2] [A5] [A3] [A4] B[3] 本發 明 基數 所有 一般 固定基 數-r 固定 基數-r 基數_ 2/4 所有 一般 資料置 換 是 是 是 是 是 記憶體 2N 個字 2N 個 字 2N個 字 2N個 字 2N 個 字 多重記 憶庫 否 是 否 是 是 [A1]:美國專利 4,477,878 [A2]:美國專利 5, 091, 875 [A3]:美國專利 7,062,523 [A4]:美國專利 7, 164, 723 [A5]:美國專利公開20060253514 24 201017529 [B2]:論文 L. G. Johnson “Conflict free memory addressing for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992. B[3]:論文 B. G. Jo, and Μ. H. Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor ❿ using novel in-place strategy,” IEEE Trans. Circuits Syst. I,Reg. Papers, vol. 52,no. 5,pp. 911-919, May 2005. 綜上所述,當以本發明方法應用於中國數位電視所 需之3780點離散傅立葉轉換時,僅需2N個字之記憶體即 可滿足系統需求,同時達到持續資料流與資料依序輸入輸 出之目的。若欲達成與本案相同效果,[美國專利公開 9 20080025199]、[中國專利 01140060· 9]、[中國專利 03107204.6]之方案需至少3N個字之記憶體,[中國專利 公開 200410090873.2]與[中國專利公開 200710044716. 1] 需至少 5N個字之記憶體,[中國專利公開 200610104144.7 ]需6N個字之記憶體,而[論文2.-又· Yang, Y. -P. Hu, C. -Y. Pan, and L. Yang, ^Design of 25 201017529The position of Xodd [ k ] and the relationship between the two is k = n, thereby obtaining the memory selection and memory addressing equations (16) and (17) of the double number 10 fast Fourier transform symbol. Bank = technique 1 + + technique 3 + a4 + + technique 6 mod7 (16) address = 135a6 + 45a5 + 15a4 + 5a3 + a2 (17) Note that the input mapping equation (12) is the same as the output mapping equation (9). And the memory selection equations (10) and (16) remain the same. It is recalled that during the calculation of the double-numbered fast Fourier transform symbol, the memory is always free of access conflicts by reversing the calculation of the singular fast Fourier transform symbol decomposition order. In addition, the present invention finds that the output mapping equation (13) is the same as the input mapping equation (8), meaning that the third fast Fourier transform input data xthirdU] is placed in the calculated double-number fast Fourier transform output data xeven When the position of [k] is such that k=n, the third 20 201017529 fast Fourier transform input data can be stored in the same manner as those determined by equations (8), (10) and (11), that is, the third The data storage location of the fast Fourier transform symbol is the same as the data storage location of the singular fast Fourier transform symbol, and returns to the state of the initial discussion, xthird[n]= x〇dd[n]. According to the above embodiments, the present invention can design a variable base fast Fourier transform processor without memory conflict by reversing the previous fast Fourier transform symbol decomposition sequence, which can simultaneously achieve butterfly output and data input and output. Data replacement. The first picture is a memory-based 3780-point discrete Fourier transform processor design block diagram, in which MEM-1 and ME12 are two memory blocks (each block contains 7 memory banks (memory reference bank) Each memory bank has a size of 540 words, and the FFT_C0RE includes a processing unit that can separately process the discrete-point Fourier transform of the short-point number, and the number of clock cycles required for calculating the discrete Fourier transform of each short-point number is determined by the designer according to the requirement. This also determines the number of hardware required for the computing unit, and the control unit controls the data access and processing unit calculations. The operation of the discrete Fourier transform processor will be explained below. For convenience of explanation, the decomposition and calculation order of the first type is defined as 4 21 201017529 points, 3 points, 3 points, 3 points, 5 points and 7 points. Discrete Fourier transform, and define its reverse order as the second order, followed by '(', 5, 3, 3, 3, and 4 discrete Fourier transforms. Suppose the first and second discrete Fourier The conversion symbols are stored in the memory blocks MEM-1 and _~2, respectively, and it is assumed that the decomposition and calculation order of the first and second discrete Fourier transform symbols are in the order of the first type, then the above description, this According to the invention, (1) the first, fifth, ninth, thirteenth... discrete Fourier transform symbol is stored in the memory block MEM-1 and is calculated in the first order. (2) Second, sixth, ten, fourteen·discrete Fourier The conversion symbol is stored in the memory block MEM-2 and is calculated in the first type of order. (3) The third, seventh, eleventh, and + five miscellaneous traps a fifteen... the discrete Fourier transform symbol is stored in memory Block MEM_1 is calculated in the second order. (4) Fourth, person, Ten:16. The discrete Fourier transform symbol is stored in the memory block MEM-2 and is calculated in the second order. For the sake of explanation, only the discrete Fourier transform of length (10) is shown in the second figure. As an example, the hardware implementation of the indicator vector required for data access is described. As shown in the second figure, it is composed of several accumulators Αι, A2, ..., A5. The control unit contains 3 groups of this hard. Body design, 22 201017529 Two groups are used to generate the index vector required for data input and output, and the parameters ΙΠ, U2, U3, 114, q and r are determined by the decomposition equation (1) of each stage, while the third group is It is an index vector used to generate the data required for each short point Fourier to two. It is different from the previous two groups. The parameter q'r at this time is 〇', ie, a4, A5 can be removed. Table 2 below shows Comparison of different real-time application methods for memory-based discrete Fourier transform processing. All of the projects, only the method of this case and [US Patent 4, 477, 878] can support the integration of any general base. Make any points faster Fourier transform. However, as described above, [[US Pat. No. 4,477,878] fails to implement a multi-bank mem ry structure without memory conflict, and thus cannot reduce the operational clock required for immediate application. A memory-based fast Fourier transform processor design method is proposed, which can satisfy the following three items: (1) data replacement during butterfly operation and input and output data; (2) mixing of any base; and (3) support for multiple memories 23 201017529 Table 2 Comparison of the present invention and different methods [A1] [A2] [B2] [A5] [A3] [A4] B[3] The cardinality of the invention, all general fixed bases - r fixed base - r base _ 2/4 All general data replacement is yes or no memory 2N words 2N words 2N words 2N words 2N words multiple memory banks whether it is yes [A1]: US Patent 4,477,878 [A2]: US patent 5, 091, 875 [A3]: U.S. Patent 7,062,523 [A4]: U.S. Patent 7,164,723 [A5]: U.S. Patent Publication No. 20060253514 24 201017529 [B2]: Paper LG Johnson "Conflict free memory addressi Ng for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992. B[3]: Paper BG Jo, and Μ H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor ❿ using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911 -919, May 2005. In summary, when the method of the present invention is applied to the 3780-point discrete Fourier transform required for Chinese digital television, only 2N words of memory can be used to meet the system requirements while achieving continuous data flow. The purpose of inputting and outputting with the data in sequence. If you want to achieve the same effect as the case, [US Patent Publication 9 20080025199], [Chinese Patent 0114060], and [Chinese Patent 03107204.6] require at least 3N words of memory, [Chinese Patent Publication 200410090873.2] and [Chinese Patent] Public 200710044716. 1] Memory of at least 5N words is required, [Chinese Patent Publication 200610104144.7] requires 6N words of memory, and [Thesis 2.-Yang Yang, Y. -P. Hu, C. -Y. Pan, and L. Yang, ^Design of 25 201017529

a 3780-point IFFT processor for TDS-OFDM,” IEEEa 3780-point IFFT processor for TDS-OFDM,” IEEE

Trans. Broadcast., vol. 48, no. 1, pp. 57-61, Mar. 2002]亦至少需3N個字以上之記憶體。因此,就此應用而 言,若以本案方法設計3780點離散傅立葉轉換處理器, 則相較於以現有方式設計者可大幅縮減晶片面積。 本發明更提出一種以記憶體為基礎之(正/逆向)快速 ❿傅立葉轉換處理器,用以執行上述之方法,其係包含:一 用以存放資料之主要記憶體、一進行分解後短點數快速傅 立葉轉換之處理元件以及一控制單元,其中該控制單元具 有控制以下項目之功能··( 1)輸入輸出資料與蝴蝶運算用 之記憶體,(2)分解後之短點數快速傅立葉轉換之計算順 序,及(3)以資料置換方式進行資料存取所需之記憶體定 赢址。用以執行上述之方法。該主要記憶體包含二記憶區塊 ❿ (memory block),亦即 MEM_1 與 MEM—2,當 MEM_1 用於快 速傅立葉轉換運算時,MEM_2則用於輸入輸出資料,反之 亦然;且,每一記憶區塊包含Μ個記憶庫(memory bank), 且每一記憶庫之大小為N/M,其中N為快速傅立葉轉換之 點數長度,Μ為由系統設計者自行設定之記憶庫數量,該 處理單元FFT_C0RE係設計為可對分解後之短點數快速傅 26 201017529 立葉轉換進行個別計算,其計算各短點數離散傅立葉轉換 所需的時脈週期數由設計者依需求決定此亦決定計算單 元所需硬體數量。該控制單元之第⑴項控制功能係控制 如前所述之該等記憶區塊,以將其功能切換為快速傅立葉 轉換計算或輸入輸出資料。該控制單元之第(2)項控制功 能係控制該處理元件,使其利用與同一記憶區塊之前次快 籲速傅立葉轉換符元分解順序相反之順序進行短點數快速傅 立葉轉換计算,從而取得快速傅立葉轉換符元;亦即,若 該快速傅立葉轉換符元於一記憶區塊中係以ni點快速傅 立葉轉換、N2點快速傅立葉轉換....至Nk點快速傅立葉 轉換之順序計算’則儲存於同一記憶區塊中之次一快速傅 立葉轉換符元之計算順序為Nk點快速傅立葉轉換叫〇 •點快速傅立葉轉換......至N1點快速傅立葉轉換。該控制 單兀之第(3)項控制功能係控制以資料置換之方式進行資 料存取,從而進行每一記憶區塊之蝴蝶運算與資料輸入輪 八i含3組運作原理如圖二之硬體設計,其中2組分 別用於產生資料輸入與輸出所需之指標向量,其參數 111⑽、113、M、q與r係由各階段之分解方程式決定。 而第3紅肢用於產生各個短點數傅立葉轉換所需資料的 27 201017529 指標向量,其與前面兩組唯一不同處是此時之參數q,r 皆為0。 綜上所述,以上之實施例僅是用來解說本發明之具體 實施方式,本發明之專利範圍仍應以申請專利範圍所載為 準。 【圖式簡單說明】 ❿第一圖本發明快速傅立葉轉換處理器設計方塊圖。 第二圖本發明指標向量產生器硬體實施圖。 【主要元件符號說明】 無Trans. Broadcast., vol. 48, no. 1, pp. 57-61, Mar. 2002] also requires at least 3N words of memory. Therefore, for this application, if the 3780-point discrete Fourier transform processor is designed in the present method, the chip area can be greatly reduced compared to the prior art designer. The invention further provides a memory-based (forward/reverse) fast ❿ Fourier transform processor for performing the above method, which comprises: a main memory for storing data, and a short point after decomposition A fast Fourier transform processing component and a control unit, wherein the control unit has the function of controlling the following items: (1) input and output data and memory for butterfly operation, and (2) short point fast Fourier transform after decomposition The order of calculation, and (3) the memory-based win address required for data access by means of data replacement. Used to perform the above method. The main memory includes two memory blocks, namely MEM_1 and MEM-2. When MEM_1 is used for fast Fourier transform operation, MEM_2 is used for input and output data, and vice versa; and, each memory The block contains a memory bank, and the size of each memory bank is N/M, where N is the length of the point of the fast Fourier transform, and the number of memories is set by the system designer. The unit FFT_C0RE is designed to perform individual calculations on the decomposed short point fast Fu 26 201017529 Fourier transform. The number of clock cycles required to calculate the discrete Fourier transform of each short point is determined by the designer according to the requirements. The number of hardware required. The control function of item (1) of the control unit controls the memory blocks as described above to switch their functions to fast Fourier transform calculations or input and output data. The control function of item (2) of the control unit controls the processing element to perform short-point fast Fourier transform calculation in the reverse order of the previous fast-speed Fourier transform symbol decomposition order of the same memory block, thereby obtaining a fast Fourier transform symbol; that is, if the fast Fourier transform symbol is in a memory block with a ni point fast Fourier transform, an N2 point fast Fourier transform .... to the Nk point fast Fourier transform sequence calculation ' The order of calculation of the next fast Fourier transform symbol stored in the same memory block is Nk point fast Fourier transform called 〇•point fast Fourier transform...to N1 point fast Fourier transform. The control function of item (3) of the control unit controls the data access by means of data replacement, thereby performing the butterfly operation and data input wheel of each memory block, and the operation principle of the three groups is as shown in FIG. The body design, in which two groups are used to generate the index vector required for data input and output, the parameters 111(10), 113, M, q and r are determined by the decomposition equations of each stage. The third red limb is used to generate the 27 201017529 index vector of the data required for each short-point Fourier transform. The only difference from the previous two sets is that the parameter q, r is 0 at this time. In the above, the above embodiments are only intended to illustrate the specific embodiments of the present invention, and the scope of the invention should be determined by the scope of the patent application. [Simple diagram of the drawing] ❿The first figure is a block diagram of the design of the fast Fourier transform processor of the present invention. The second figure shows the hardware implementation diagram of the indicator vector generator of the present invention. [Main component symbol description] None

2828

Claims (1)

201017529 十、申請專利範面: 1. -種任意減快速侧葉轉換之計算蚁財法,其舰在於包 含下列步驟: ⑴將長點數離散傅立葉轉換的計算分解為數舰點數的離散 傅立葉轉換’並同時將其指標由單一維度映射成多維度指標向 量; • ⑵藉由控制這些多維度指標向量,把原始輸入資料分散存放 到數個記憶體裡’使得在不產生記憶體存取衝突的情況下同時達 到計算期間的資料置換與記憶體完整蝴蝶點數一次存取的目的; (3)當紐置換制在⑽算完賴¥資料依序輸出與新資料 依序輸入時’為了往後計算_可以繼續保持資料存取時沒有士己 憶體衝突’對於新資料的計算採取與先前資料計算時的反序操作 i 來達成目的; 依此方法,對於設計任意點數的以記憶體為基礎的快速傅立葉轉 換處理器,可以減少處理器面積與所需的操作時脈。 2. 含:一 種以6己憶體為基礎之正/逆向快速傅立葉轉換處理器 葉棘換:放料之主要記憶趙一進行分解後短點數快速傅立 =:處理·及—控制單元’其中該控制單元具有控制以下 項目之功I . (1)輸入輸出資料與蝴蝶運算用之記憶體(2广 29 201017529 之短點數快迷傅立葉轉換之計算順序,及(3)以資料置換方式進行 資料存取所需之記憶體定址。 3·如申請專概㈣2餐述之以域體絲礎之正/逆向快速傅立 葉轉換處理器,其中,該主要記憶體包含二記憶區塊,為MEM—1 與MEM〜2,當MEM_1用於快速傅立葉轉換運算時,細河-2則 用於輸入輪出資料,反之亦然。 • 4.如申請專利範圍帛3項所述之以記憶體為基礎之正/逆向快速傅立 葉轉換處理器’其中,每—記憶區塊包含Μ個記憶庫,且每一記 隐庫之大小為jV/jVl ’其巾#為快速傅立葉轉換之點數長度,Μ為 由系統設計者自行設定之記憶庫數量。 5.如申請專利範圍第2項所述之以記憶體絲礎之正/逆向快速傅立 葉轉換處理器’其中,該處理單元係設計為可對分解後之短點數快 ❿速傅立葉轉換進行個別計算。 、 6.如申請專概圍第2猶叙以記憶體絲礎之…逆向快速傅立 轉換處理器’其中,該控制單元之第⑴項㈣魏係控制如申 請專利範®第3項所述之該兩記醜塊以將其魏切換為快速傅 立葉轉換計算錢^^資料。 如申請專概Η第2項賴之以記舰為基礎之正/逆向快速傅立 葉轉換處理器,装由 、h控制單元之第(2)項控制功能係控制該處 30 201017529 理元件來朗t請補細第i邮⑶狀步驟執行 ,使其利用 與同-記舰塊之前雜補立葉轉歸元分解順序減之順序進 行短峨賴增_, 即’若該快速傅立葉轉換符元於—記憶區塊中係以%點快速傅立 麵,雜雜議......W賴增換之順序 计算,則儲存_-記_塊中之次—快速傅立葉騎符元之計算 順序為Μ點快逮傅立葉轉換點快速傅立葉轉換至 快速傅立葉轉換。 1 ” 8.如申請專利範圍第2項所述之以記憶體為基礎之正,逆向快速傅立 葉轉換處理器,其中’ _單元之第⑶項控制功能係記憶體定 址並控制以資料置換之方式進行資料存取,從而進行每一記憶區塊201017529 X. Application for patents: 1. The calculation of ant arbitrage method for arbitrary reduction of fast side-lobe conversion. The ship consists of the following steps: (1) Decomposing the calculation of long-point discrete Fourier transform into discrete Fourier transforms of several ship points. 'And at the same time map its indicators from a single dimension to a multi-dimensional indicator vector; • (2) by controlling these multi-dimensional indicator vectors, the original input data is scattered into several memories' so that no memory access conflicts are generated In the case of simultaneous data replacement during the calculation period and the purpose of accessing the complete butterfly point of the memory; (3) When the New Zealand replacement system is in the order of (10), the data is sequentially output and the new data is sequentially input. After the calculation _ can continue to maintain the data access without the conflict of the person's memory conflict 'for the calculation of the new data to take the reverse of the previous data calculation i to achieve the purpose; in this way, for the design of any number of memory Based on the fast Fourier transform processor, you can reduce the processor area and the required operating clock. 2. Including: a positive/reverse fast Fourier transform processor based on 6 recalls. Leaf and spine exchange: the main memory of the discharge material, Zhao Yi, after decomposition, short point number, fast Fourier =: processing · and - control unit The control unit has the function of controlling the following items. (1) Input and output data and memory for butterfly operation (2 Guang 29 201017529 short point number fast fan Fourier transform calculation order, and (3) data replacement method The address of the memory required for data access. 3. If you apply for a special (4) 2 meal, the positive/reverse fast Fourier transform processor of the domain body, where the main memory contains two memory blocks, which are MEM. —1 and MEM~2, when MEM_1 is used for fast Fourier transform operation, Xihe-2 is used to input the wheeled data, and vice versa. • 4. As described in the patent application 帛3, the memory is The basic positive/reverse fast Fourier transform processor', wherein each memory block contains one memory, and the size of each hidden library is jV/jVl 'the towel# is the length of the point of the fast Fourier transform, Μ For system designers The number of memory banks set. 5. The positive/reverse fast Fourier transform processor based on the memory of the second aspect of the patent application scope, wherein the processing unit is designed to be faster for the short points after decomposition The idling Fourier transform is performed for individual calculations. 6. If the application is for the general purpose, the second is the memory of the basics... the reverse fast Fourier transform processor', where the control unit (1) (4) Wei system control is applied The two ugly blocks described in the third paragraph of the patent paradigm are used to convert the Wei to the fast Fourier transform to calculate the money ^^ data. If the application is specifically for the second item, the ship-based forward/reverse fast Fourier The conversion processor, the control function of the (2) item of the control unit is controlled by the control unit of the h control unit, and the step (3) step is executed to make use of the same---- The order of the miscellaneous lobes and the meta-decomposition order is reduced by 顺序, that is, 'if the fast Fourier transform symbol is in the memory block, the fast point is used in the % point, and the miscellaneous discussion..... .W Lai increase the order of calculation, then store _- remember _ The second is the calculation sequence of the fast Fourier ride symbol. The fast Fourier transform point is fast Fourier transform to fast Fourier transform. 1 ” 8. As described in the second paragraph of the patent application, the memory-based positive, An inverse fast Fourier transform processor, wherein the control function of the '(_) element is addressed by the memory and controls data access by means of data replacement, thereby performing each memory block 之蝴蝶運算與資料輸人輸出。此項控制魏為申請專利 第(2)點之實施。 弟1項 9·如申請專利範圍第8項所述之以記憶體為基礎之正/逆向快迷傅立 葉轉換處理器,其中’執行資料存取之記憶庫係由指標向量”立 «2, ···,《k)與方程式(ai)所決定,其中方程式( τ 馬—*常量整 而該指標向量對應每一短點數快速傅立葉轉換,亦即 利範圍第6項所述之%點快速傅立葉轉換、%點快申請專 換...·.·至乂點快速傅立葉轉換。 、零立葉轉 31 201017529 όαι认=叫 + 叱 + ... + „k + e m〇d M ^ 如申請專利範圍第9項所述之以記憶體為基礎之正/逆。 轉轉換處理器’其中’方程式⑹將所有資料分為 入每-記㈣塊之Μ個缝庫,触址料可細指標向子 Uk)之任-函數,該指標向量可將同一群組中任兩曰^資料置^ 兩個不同位址;當記憶庫數量M = Μ,#盔λ,、 入 鲁 1 Μ為乂,’...喝其中 之一。而(tA,t/2,…,似=(H ...,# t·1 ’ M+1,..., k) * (Wi » U2 > ... » Mk-l) = («1 5 «2 * ... » ru, y n -t+i ’ ..· ’ 叫)。而分 解與計算順序為l點快速傅立葉轉換,點快速傅立葉轉換. 至I點快速傅立葉轉換或其逆向順序,則我們採用公式⑹來決定 資料位址。 k-2 k~\ ^={Σ(Π^}+^ mod^/M) (a3) • 11.如申請專利範圍第9項所述之以記憶體為基礎之正/逆向快速傅 立葉轉換處理器,其中,該指標向量係由分解下列方程式⑹與 ⑹於各階段產生,在此以胸从2為例’離散傅立葉轉換之定義 為梢',〇轉’ ’分解方程式⑹與(a5)可將JV點離散傅立葉轉換 分解為兩錄錄數之離立葉娜,㈣雜散傅立葉轉換 與I點離散傅立葉轉換;分解方程式⑹與⑹中之係數木及怂 32 201017529 為正整數’輸人錢出指標分別由該輸人及輸出方程编)與⑹ 轉換; η = Λ^«ι + Λ2η2 k — B\k\ + N\k2Butterfly operation and data input output. This control Wei is the implementation of patent (2). 1 item 9 · The memory-based forward/reverse fast Fourier transform processor as described in item 8 of the patent application scope, wherein the 'memory library for performing data access is determined by the indicator vector' «2, ··, “k) is determined by equation (ai), where the equation (τ __ constant is uniform and the indicator vector corresponds to each short point fast Fourier transform, that is, the % point described in item 6 of the profit range is fast Fourier transform, % point fast application for exchange...···To point fast Fourier transform., Zero-leaf turn 31 201017529 όαι认=叫+ 叱+ ... + „k + em〇d M ^ Memory-based forward/reverse according to item 9 of the scope. The conversion processor 'where' equation (6) divides all the data into one seam library of each-four (four) block, and the contact material can be fine index Sub-Uk)-function, the indicator vector can set two data in the same group to two different addresses; when the number of memory M = Μ, #helmet λ,, enter Lu 1 Μ, '...drink one of them. And (tA, t/2,..., like =(H ...,# t·1 ' M+1,..., k) * (Wi » U2 > ... » Mk-l) = ( «1 5 «2 * ... » ru, yn -t+i ' ..· ' 叫). The decomposition and calculation order is l-point fast Fourier transform, point fast Fourier transform. To I point fast Fourier transform or In the reverse order, we use equation (6) to determine the data address. k-2 k~\ ^={Σ(Π^}+^ mod^/M) (a3) • 11. As described in item 9 of the patent application A memory-based forward/reverse fast Fourier transform processor, wherein the index vector is generated by decomposing the following equations (6) and (6) at each stage, where the chest is defined as a 'discrete Fourier transform' ', 〇转' 'The decomposition equations (6) and (a5) can decompose the JV point discrete Fourier transform into two discrete records, (4) stray Fourier transform and I-point discrete Fourier transform; decomposition equations (6) and (6) Coefficient wood and 怂32 201017529 is a positive integer 'input money out indicators are respectively compiled by the input and output equations) and (6) conversion; η = Λ^«ι + Λ2η2 k — B\k\ + N\k2 mod TV W1,灸1=0,1,·..,Λ^-1 «2, ^2 = 0, 1, ..., N2-\ (34) (a5) ° 12.如申請專利範圍第9項所述之以記憶體為基礎之正/逆向快速傅 立葉轉換處理器,其巾,賊於如ψ請專利範圍第利所述輸入Mod TV W1, moxibustion 1=0,1,·..,Λ^-1 «2, ^2 = 0, 1, ..., N2-\ (34) (a5) ° 12. As claimed The memory-based forward/reverse fast Fourier transform processor described in the nine items, the thief of the thief 鬱 及輸出方程式之該域向量〜,,⑷可經由累加器之硬體運 用而取得。 如申請專職,項所述之以記憶體為基礎之正/逆向快速傅立 葉轉換處理器,其中,以該指標向量^2,,·言用以於 第顺計Μ點離散傅立葉轉換而被存取之料資料的每 二:=::,且計算完成之輸嶋係 硬體運用而轉 侧⑽伽累加器之 33The domain vectors ~, (4) of the output equation can be obtained by the hardware of the accumulator. A memory-based forward/reverse fast Fourier transform processor as described in the application for full-time, wherein the index vector ^2, , is used to be accessed by the fourth-order discrete Fourier transform. Every two materials of the material: =::, and the calculation of the completed transmission is the hardware application and the rotation side (10) gamma accumulator 33
TW97141311A 2008-10-28 2008-10-28 Computation and addressing method of a eneral sized memory-based FFT processor TW201017529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97141311A TW201017529A (en) 2008-10-28 2008-10-28 Computation and addressing method of a eneral sized memory-based FFT processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97141311A TW201017529A (en) 2008-10-28 2008-10-28 Computation and addressing method of a eneral sized memory-based FFT processor

Publications (2)

Publication Number Publication Date
TW201017529A true TW201017529A (en) 2010-05-01
TWI375171B TWI375171B (en) 2012-10-21

Family

ID=44830889

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97141311A TW201017529A (en) 2008-10-28 2008-10-28 Computation and addressing method of a eneral sized memory-based FFT processor

Country Status (1)

Country Link
TW (1) TW201017529A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9459812B2 (en) 2014-02-03 2016-10-04 Ceva D.S.P. Ltd. System and method for zero contention memory bank access in a reorder stage in mixed radix discrete fourier transform
CN109117454A (en) * 2017-06-23 2019-01-01 扬智科技股份有限公司 3780-point fast Fourier transform processor and operating method thereof
US12494235B2 (en) 2023-03-31 2025-12-09 Taiwan Semiconductor Manufacturing Company, Ltd. Systems and methods for flexible bank addressing in digital computing-in-memory (DCIM)
TWI913640B (en) 2023-03-31 2026-02-01 台灣積體電路製造股份有限公司 Systems, methods and circuits for flexible bank addressing in digital computing-in-memory

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9459812B2 (en) 2014-02-03 2016-10-04 Ceva D.S.P. Ltd. System and method for zero contention memory bank access in a reorder stage in mixed radix discrete fourier transform
CN109117454A (en) * 2017-06-23 2019-01-01 扬智科技股份有限公司 3780-point fast Fourier transform processor and operating method thereof
CN109117454B (en) * 2017-06-23 2022-06-14 扬智科技股份有限公司 3780-point fast Fourier transform processor and its operation method
US12494235B2 (en) 2023-03-31 2025-12-09 Taiwan Semiconductor Manufacturing Company, Ltd. Systems and methods for flexible bank addressing in digital computing-in-memory (DCIM)
TWI913640B (en) 2023-03-31 2026-02-01 台灣積體電路製造股份有限公司 Systems, methods and circuits for flexible bank addressing in digital computing-in-memory

Also Published As

Publication number Publication date
TWI375171B (en) 2012-10-21

Similar Documents

Publication Publication Date Title
US8364736B2 (en) Memory-based FFT/IFFT processor and design method for general sized memory-based FFT processor
CN114816334B (en) Acceleration unit, related device and method
US8694570B2 (en) Method and apparatus for evaluation of multi-dimensional discrete fourier transforms
Chen et al. Hardware efficient mixed radix-25/16/9 FFT for LTE systems
US9582474B2 (en) Method and apparatus for performing a FFT computation
JP3749022B2 (en) Parallel system with fast latency and array processing with short waiting time
Hsiao et al. A generalized mixed-radix algorithm for memory-based FFT processors
JP2022540749A (en) Systems and methods for shift-based information mixing across channels of neural networks similar to shuffle nets
CN102053948A (en) Method and system for transposing a matrix on a single instruction multiple data multi-core processor architecture
JP2011100452A5 (en)
US20040243656A1 (en) Digital signal processor structure for performing length-scalable fast fourier transformation
TW201229898A (en) Mechanism for conflict detection using SIMD
US20210303358A1 (en) Inference Engine Circuit Architecture
CN105224505A (en) Based on the FFT accelerator installation of matrix transpose operation
WO2023045516A1 (en) Fft execution method, apparatus and device
JP2000231513A (en) Memory architecture for parallel data access in any given dimension of an N-dimensional rectangular data array
TW201017529A (en) Computation and addressing method of a eneral sized memory-based FFT processor
US9176929B2 (en) Multi-granularity parallel FFT computation device
CN111221501A (en) Number theory conversion circuit for large number multiplication
Puchała et al. Effectiveness of Fast Fourier Transform implementations on GPU and CPU
US20060253514A1 (en) Memory-based Fast Fourier Transform device
CN111047026B (en) Memory chip capable of executing artificial intelligent operation and operation method thereof
US20080228845A1 (en) Apparatus for calculating an n-point discrete fourier transform by utilizing cooley-tukey algorithm
TWI237773B (en) Fast fourier transform processor and dynamic scaling method thereof and radix-8 fast Fourier transform computation method
EP1076296A2 (en) Data storage for fast fourier transforms

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees