TW200915095A

TW200915095A - Fast Fourier Transform (FFT) and inverse FFT (IFFT) device and method

Info

Publication number: TW200915095A
Application number: TW96135456A
Authority: TW
Inventors: Yuan-Chu Yu; Lan-Da Van; Chin-Teng Lin
Original assignee: Univ Nat Chiao Tung
Priority date: 2007-09-21
Filing date: 2007-09-21
Publication date: 2009-04-01

Abstract

Fast Fourier Transform (FFT) and inverse FFT (IFFT) device and its method tare disclosed. By incorporating a retrenched 8-point FFT (R8-FFT) unit and a multiplication-after-write (MAW) scheme, the chip area can be reduced and computation speed is increased, so as to fulfill the needs of a multi-input multi-output (MIMO) orthogonal frequency-division multiplexing (OFDM) wireless communication system.

Description

200915095 l 九、發明說明：【發明所屬之技術領域】本發明係有關一種傅立葉轉換及其反轉換，特別是關 . 於一種快速傅立葉轉換及其反轉換裝置與方法。【先前技術】快速傅立葉轉換係一種分析時間改變信號的技術，簡單地說，快速傅立葉轉換(FFT)係將時域信號映射至其頻域副本，反快速傅立葉轉換(IFFT)係將頻域信號映射至其時域副本，二者互為可逆的轉換。對於一正交多頻系統 (orthogonal frequency division multiplexing ; OFDM)而言， IFFT/FFT模組係處理調變/解調變以達到有效多載波 (multicarrier)傳送所不可或缺之工具，例如無線區域網路 (wireless local area network)802.11 a 標準中，要求一種具高速度與即時處理能力，且能結合一簡易方法而達成高資料處理效率之IFFT/FFT模組。此外，在一些特定的無線通訊系統中，例如非對稱式數位用戶線路(ADSL)、超高速數位用戶迴路（VDSL)或數位影音廣播系統（DAB/DVB-T) 等，可藉由使用IFFT/FFT模組增加傳輸的頻寬或增加傳輸的效率。蝴蝶模組是一種普遍用來實現FFT/IFFT的方式，其係將輸入的資料點利用多層交叉運算的方式，得到 FFT/IFFT的結果，習知的蝴蝶模組具有相同數量的輸入及輸出，如圖1所示，以基數8的FFT/IFFT蝴蝶模組1〇〇 200915095 為例，其具有二個乘法單元n〇及12〇、八個輸入χ[⑴至 x[7]以及八個輪出γ[0]至γ[7]，用以運算藉奇偶數的拆解方式分解由輸人至χ[7]形成的傅立葉轉換矩陣，以同時產生八個對應的輸出Υ[0]至Υ[7]。中華民國專利公告號第265503號提出一種「二維反 =餘弦轉換|置」’採用ΝχΝ型矩陣架構及管線型乘法器架構，以矩陣拆解方式縮小晶片面積，但仍需四分之一長又的複數乘法及儲存參數的唯讀記憶體，因而消耗較的曰a >}面積’同時需要較長的運算瓶頸時間，而且只能 ^用在反離散餘弦轉換裝置上。中華民國專利公告號第409212號提出一種「功率及 :積有=之快速傅立葉轉換處理器」，採用大於4點矩陣茱模、、且核、，以及應用並列式單一延遲迴授路徑（SDF) 采構，，如並列式R22SDF、R23SDF及R24SDF架構，惟 =理器的複數乘法的複雜度高，因此需要較長的運算瓶、間此外，採用複數乘法器配合唯讀記憶體架構，需 p較大的晶片面積，且處理速率（ρΓ〇_ίηέ她)僅為，無法符合多輸入多輸出系統需求。〜。中華民國專利公告號第418362號提出一種「具有平 2構之快速傅立葉轉換裝置」，分別採用並列式更動的二餘弦轉換(MDCT)與更動的離散正弦轉換(MDST)函數 =异法，其中第一 MDCT/MDST模組列是以時間遞迴架構十仃排列實現，需消耗較大的晶片面積。中華民國專利公告號第522315號提出一種「利用離 200915095 散哈特利轉換來計算離散傅立葉轉換及其反轉換之裝 ί 哈㈣彳轉換及矩陣型架構來實現512點之離散二入反轉換，其處理速率僅為R，無法符合多輪入夕戰’』出系統需求。中華民國專利公告號第1224263號提出―種易FFT/iyFT處理器」，採用矩陣型架構及管線型乘法器以 16點R2 SDF型加強架構為發展基礎，以實現32、64、128 及256點的快迷傅立葉轉換，惟其複數乘法的複雜度高，且處理速率僅4 R，無法符合多輸人统需求。中華民國專利公開號第200534121號提出一種「快速傅立，轉換架構及方法」，採用蝴蝶矩陣型，因而消耗較大的明片面積在儲存記憶體上，同時需要較長的運算瓶頸時間’且其處理速率僅為R，無法符合多輸人多輸出系統需求。中華民國專利公開號第20060107號提出一種「快速傅立葉轉換處理器及其動態調整方法及基數-8之快速傅立葉轉換演异法」，採用8點矩陣型蝴蝶模組架構重複運异’需要消耗較大的晶片面積，同時需要較長的運算瓶頸時間’且其處理速率僅為R，無法符合多輸入多輸出系統需求。中華民國專利公開編號200602902提出一種「快速傅立葉轉換方法」，其仍需較長的運算時間，且處理速率僅為R，無法符合多輸入多輸出系統需求。200915095 l IX. Description of the invention: [Technical field to which the invention pertains] The present invention relates to a Fourier transform and its inverse conversion, and more particularly to a fast Fourier transform and its inverse conversion apparatus and method. [Prior Art] Fast Fourier Transform is a technique for analyzing time-varying signals. Simply put, Fast Fourier Transform (FFT) maps time domain signals to their frequency domain replicas, and inverse fast Fourier transform (IFFT) uses frequency domain signals. Maps to its time domain copy, which are mutually reversible. For an orthogonal frequency division multiplexing (OFDM), the IFFT/FFT module is an indispensable tool for processing modulation/demodulation to achieve efficient multicarrier transmission, such as wireless zones. The wireless local area network (802.11a) standard requires an IFFT/FFT module with high speed and instant processing capability, and which can combine high-data processing efficiency with a simple method. In addition, in some specific wireless communication systems, such as asymmetric digital subscriber line (ADSL), ultra-high-speed digital subscriber loop (VDSL) or digital audio and video broadcasting system (DAB/DVB-T), etc., by using IFFT/ The FFT module increases the bandwidth of the transmission or increases the efficiency of the transmission. The butterfly module is a commonly used method for implementing FFT/IFFT. It uses the method of multi-layer crossover operation to obtain the result of FFT/IFFT. The conventional butterfly module has the same number of inputs and outputs. As shown in Fig. 1, the FFT/IFFT butterfly module 1〇〇200915095 with base 8 has two multiplication units n〇 and 12〇, eight inputs χ[(1) to x[7] and eight rounds. γ[0] to γ[7] are used to calculate the Fourier transform matrix formed by the input to χ[7] by the disassembly method of the odd and even numbers to simultaneously generate eight corresponding outputs Υ[0] to Υ [7]. The Republic of China Patent No. 265503 proposes a "two-dimensional inverse = cosine transform | set" using a 矩阵-type matrix architecture and a pipeline-type multiplier architecture to reduce the wafer area by matrix disassembly, but still needs a quarter of a length The complex multiplication and the read-only memory storing the parameters, thus consuming a larger 曰a >} area' requires a longer computational bottleneck time, and can only be used on the inverse discrete cosine transform device. The Republic of China Patent No. 409212 proposes a "power sum: fast Fourier transform processor with a =," using a matrix of more than 4 points, and a core, and applying a parallel single delay feedback path (SDF) The structure, such as the parallel R22SDF, R23SDF and R24SDF architecture, only the complexity of the complex multiplication of the processor is high, so it requires a long calculation bottle, and in addition, using a complex multiplier with a read-only memory architecture requires p The larger the wafer area, and the processing rate (ρΓ〇_ίηέ her) is only, can not meet the needs of multiple input and multiple output systems. ~. The Republic of China Patent No. 418362 proposes a "fast Fourier transform device with a flat structure", which uses a parallel-type modified two-cosine transform (MDCT) and a modified discrete sine transform (MDST) function = the same method, wherein A MDCT/MDST module column is implemented in a time-returned architecture, which requires a large wafer area. The Republic of China Patent No. 522315 proposes a "discrete double-inverted conversion of 512 points by using the dispersion of the Hartley transform from 200915095 to calculate the discrete Fourier transform and its inverse transform." The processing rate is only R, which cannot meet the requirements of multiple rounds of the war. The Republic of China Patent No. 1224263 proposes an "Easy FFT/iyFT Processor", which uses a matrix architecture and a pipeline type multiplier to 16 The R2 SDF-type enhanced architecture is the basis for development to achieve fast, Fourier transforms of 32, 64, 128, and 256 points. However, the complexity of complex multiplication is high, and the processing rate is only 4 R, which cannot meet the requirements of multiple input systems. The Republic of China Patent Publication No. 200534121 proposes a "fast Fourier, conversion architecture and method" that uses a butterfly matrix type, thus consuming a large area of the slice on the memory and requiring a long computational bottleneck time' Its processing rate is only R, which cannot meet the needs of multi-input and multi-output systems. The Republic of China Patent Publication No. 20060107 proposes a "fast Fourier transform processor and its dynamic adjustment method and a fast Fourier transform algorithm for base number-8", which uses an 8-point matrix butterfly module architecture to repeat the same operation. The large wafer area requires a long computational bottleneck time and its processing rate is only R, which cannot meet the requirements of multiple input and multiple output systems. The Republic of China Patent Publication No. 200602902 proposes a "fast Fourier transform method" which still requires a long operation time and the processing rate is only R, which cannot meet the requirements of the MIMO system.

Wei-Hsin Chang and Truong Nguyen 在 IEEE Trans· on 7 200915095Wei-Hsin Chang and Truong Nguyen on IEEE Trans· on 7 200915095

Circuit and System I 中發表的「An OFDM-specified lossless FFT architecture」，以R22SDF架構實現一 64點快速傅立葉轉換’其針對位元階層作簡化以得到最小晶片面積的設計，但複數乘法的複雜度高，且處理速率僅為R，無法符合多輸入多輸出系統需求。"An OFDM-specified lossless FFT architecture" published in Circuit and System I, which implements a 64-point fast Fourier transform with the R22SDF architecture. It simplifies the bit hierarchy to obtain the minimum chip area design, but the complexity of complex multiplication is high. And the processing rate is only R, which cannot meet the requirements of multi-input and multi-output systems.

Chiu C·，Wing Hui，Tion J. D. and McCanny J. V.在 IEEE Journal of Solid-State Circuits 中發表的「A 64-point Fourier transform chip for video motion compensation using phase correlation」，提出一基數·4的多延遲整流架構 (R4MDC)，使其在4x4的多輸人多輸出應用上，各元件的利用率有不錯的表現，但需消耗較大的晶片面積，且複數乘法的複雜度高。Chiu C., Wing Hui, Tion JD and McCanny JV, "A 64-point Fourier transform chip for video motion compensation using phase correlation", published in IEEE Journal of Solid-State Circuits, proposes a multi-delay rectification architecture with a base number of 4. (R4MDC), it makes the utilization of each component have a good performance in the 4x4 multi-input multi-output application, but it consumes a large wafer area, and the complexity of complex multiplication is high.

Haining Jiang, Hanwen Luo, Jifeng Tian and Wentao Song 在 IEEE Trans, on Consumer Electronics 中發表的「Design of an efficient FFT processor for OFDM systems」，提出一基數-4的多延遲整流架構，由於在架構上將處理核心並列排列’導致所需的晶片面積增加，同時系統操作頻率大於輸入資料取樣頻率，不符合低耗能要求0Haining Jiang, Hanwen Luo, Jifeng Tian and Wentao Song, "Design of an efficient FFT processor for OFDM systems" published in IEEE Trans, on Consumer Electronics, proposes a radix-4 multi-delay rectification architecture, due to the architecture will handle The parallel arrangement of cores leads to an increase in the required wafer area, while the system operating frequency is greater than the input data sampling frequency, which does not meet the low energy requirements.

Yunho Jung, Hongil Yoon and Jaeseok Kim 在 IEEE Trans, on Consumer Electronics 中發表的「New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications」，提出一基數_4的多延遲整流架構，藉由改變架構使内部的常數乘法面積減少，但系統操 200915095 作頻率大於輸入資料取樣頻率，不符合低耗能要求。 K. Maharatna，E. Grass and U. Jagdhold 在 IEEE Journal of Solid-State Circuits 中發表的「A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM」，提出一以常數並列乘法器架構之基數-8式快速傅立葉轉換器，其處理速度雖可達4.33R，但卻需消耗較大的晶片面積，且在2x2及4x4多輸入多輸出正交多頻系統應用上各模組的利用率較低。 L. Jia，Y. Gao, J. Isoaho and H. Tenhunen 在 Proc. Eleventh Annu. IEEE Int. ASIC Conf.中發表的「A new VLSI-oriented FFT algorithm and implementation」，提出一傳統式的分離基數-2/8理論，但卻無實現的硬體架構，若以其中的方程式直接實現硬體架構，每個蝴蝶架構中多出一個常數乘法單元，而需消耗較大的晶片面積。Yunho Jung, Hongil Yoon and Jaeseok Kim, "New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications" published in IEEE Trans, on Consumer Electronics, proposes a base-four multi-delay rectification architecture by changing the architecture. The internal constant multiplication area is reduced, but the system operation 200915095 frequency is greater than the input data sampling frequency and does not meet the low energy requirements. K. Maharatna, E. Grass and U. Jagdhold, "A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM", published in IEEE Journal of Solid-State Circuits, proposes a constant parallel multiplier architecture. The base--8 fast Fourier converter has a processing speed of 4.33R, but consumes a large wafer area, and the utilization of each module in 2x2 and 4x4 multiple input multi-output orthogonal multi-frequency system applications Lower. L. Jia, Y. Gao, J. Isoaho and H. Tenhunen, Proc. Eleventh Annu. IEEE Int. ASIC Conf., "A new VLSI-oriented FFT algorithm and implementation", proposes a traditional separation factor - 2/8 theory, but no implementation of the hardware architecture, if the equation is directly implemented in the hardware architecture, each butterfly architecture has a constant multiplication unit, which consumes a large wafer area.

Wen_Chang Yeh and Chein-Wei Jen 在 IEEE Trans, on Signal Processing 中所發表的「High-speed and low-power split-radix FFT」，提出一傳統式的分離基數-2/8理論的硬體架構實現設計，但每個蝴蝶架構中多出一個常數乘法單元，而需消耗較大的晶片面積，同時由於處理速率僅為R，無法符合多輸入多輸出系統需求，且各元件的利用率也略低0 S. Bouguezel，M.O. Ahmad and Μ. N. S. Swamy 在 IEEE Trans, on Circuits and Systems I 中所發表的「A New Radix-2/8 FFT Algorithm for Length-q x 2m DFTs」，提出一 200915095 傳統式的分離基數-2/8理論，但卻無實現的硬體架構，若以其中的方程式直接實現硬體架構，每個蝴蝶架構中將多出一個常數乘法單元，而需消耗較大的晶片面積。 C. T. Lin, Y. C. Yu and L.D. Van 在 IEEE ISCAS2006 Conference 中所發表的「A Low Power 64-Point FFT/IFFT Design for IEEE 802.11a wireless LAN Application」，提出一傳統式的分離基數-2/8理論，但其處理速率僅為R，只能應用到低處理速度之單輸入單輸出正交多頻 (SISO-OFDM)環境下，無法適用於多輸入多輸出系統的應用上。李建松在大同工學院的碩士論文（2006)「Design and implementation of variable-length fast gourier transform processor」，雖揭示各種FFT/IFFT的轉換架構，例如單一延遲迴授路徑（SDF)、單一延遲整流路徑（SDC)及多延遲整流路徑(MDC)的架構’但該等架構均無法同時滿足低晶片面積、高效率以及多輸入多輸出系統應用的需求。由於傳統的R22SDF及R23SDF架構之64點快速傅立葉轉換器，應用在單一輸入單一輸出的無線通訊系統，例如無線區域網路’雖然滿足精簡及高效率之設計，但因耗能大及複數乘法的複雜度高，不適用於多輸入多輸出之系統，而分離基數式(split-radix)和高階基數式(high-radix)快速傅立葉轉換理論’雖可解決複數乘法複雜度高的問題，但高階基數式架構的乘法器數目大量增加，造成晶片成本升高’且分離基數式架構目前未有一高利用率之管線架 200915095 構，使其符合多輸入多輪出益域網路。 …、線通λ系統，例如無線區因此，一種應用於多輪多低晶片面積之快速傅立葉轉有率及為所冀。茱轉換及其反轉換裝置與方法，乃【發明内容】本發明的目的，在於想φ _ . ^ ^ 統且具有高效率及低曰片:：種應用於多輸入多輸出系轉換裝置與方法片面積之快速傅立葉轉換及其反 ^據本H —種快速傅立葉轉換及元，_ 入資料產生複數個第於屮t茱轉換模組根據該複數個輸資料包含-輸出資料，每一該複數個第一輸出個乘法單元，一編號資訊，一乘法模组具有複數該複數個乘μ ㈣編號資訊從具有該位置資訊之第二輸出資料，以:一存該第二_，其*= 矩陣式㈣體複數έ <固第-輸出資料包含衝突輸出資料，該料，並在後：該位置資訊儲存該衝突輸出資根據本發：種該乘法模組進行運算。括⑷進行第—L刑速傅立葉轉換及其反轉換方法包尘蝴蝶運算以根據複數個輸入資料產生 200915095 複數個具有位置資訊及編號資訊之第— 複數個第一輸出資料中該編號資訊不衝貝^，對該一輪出資料’（C)根據該位置資訊館、及未經該第-乘法運算的該複數個第—輪資料經該第一乘法運算之該複數個第一輸出中對未訊不衝突者進行對應該編號資訊的第二乘==資數個具有該位置資訊之第三輸出資料複㈣及未㈣第二乘法運算的該複數個 Τ輪出貧料’（f)重複步卵)及⑷直到Wen_Chang Yeh and Chein-Wei Jen, "High-speed and low-power split-radix FFT" published in IEEE Trans, on Signal Processing, proposes a traditional hardware architecture design based on the separation base-2/8 theory. However, there is one constant multiplication unit in each butterfly structure, which consumes a large wafer area, and because the processing rate is only R, it cannot meet the requirements of the multi-input and multi-output system, and the utilization rate of each component is also slightly lower. S. Bouguezel, MO Ahmad and Μ. NS Swamy's "A New Radix-2/8 FFT Algorithm for Length-q x 2m DFTs" published in IEEE Trans, on Circuits and Systems I, presents a 200915095 traditional separation The base-2/8 theory, but no implementation of the hardware architecture, if the equation is directly implemented in the hardware architecture, each butterfly architecture will have a constant multiplication unit, which consumes a larger wafer area. CT Lin, YC Yu and LD Van at the IEEE ISCAS2006 Conference, "A Low Power 64-Point FFT/IFFT Design for IEEE 802.11a wireless LAN Application", proposes a traditional separation base-2/8 theory, but The processing rate is only R, which can only be applied to a single input single output quadrature multi-frequency (SISO-OFDM) environment with low processing speed, and cannot be applied to applications of multiple input and multiple output systems. Li Jiansong's Master's thesis at the Datong Institute of Technology (2006) "Design and implementation of variable-length fast gourier transform processor", while revealing various FFT/IFFT conversion architectures, such as a single delayed feedback path (SDF), a single delay rectification path ( SDC) and Multi-Delay Rectifier Path (MDC) architectures, but these architectures are not capable of meeting both low die area, high efficiency, and multi-input multi-output system applications. Due to the traditional R22SDF and R23SDF architecture 64-point fast Fourier converter, the wireless communication system used in single input and single output, such as wireless local area network, is designed for streamlined and high efficiency, but it is energy-intensive and complex multiplication. High complexity, not suitable for multi-input and multi-output systems, while split-radix and high-radix fast Fourier transform theory can solve the complex multiplication problem, but high-order The number of multipliers in the base-type architecture has increased significantly, resulting in higher cost for the wafers' and the separation-based architecture does not currently have a highly utilized pipeline rack 200915095, making it compatible with multi-input, multi-round and profit-domain networks. ..., line-pass λ system, such as wireless zone, therefore, a fast Fourier transfer rate for multiple rounds of low wafer area.茱 conversion and its inverse conversion device and method are the content of the invention. The object of the invention is to have a high efficiency and a low : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : The fast Fourier transform of the slice area and the inverse of the H-type fast Fourier transform and the element, the _ input data generates a plurality of 第t茱 conversion modules according to the plurality of data containing-output data, each of the plural a first output multiplication unit, a number information, a multiplication module having a plurality of multiplicative μ (four) number information from the second output data having the position information, to: store the second _, its *= matrix The formula (4) body complex number έ < 固第-output data contains conflicting output data, the material, and after: the location information stores the conflict output resource according to the present invention: the multiplication module performs the operation. (4) Performing the first-L penalty speed Fourier transform and its inverse conversion method, the dust-blown butterfly operation to generate 200915095 multiple pieces of position information and number information based on the plurality of input data - the number of the first output data is not rushed ^, the round of the data '(C) according to the location information hall, and the plurality of first wheel data without the first multiplication operation, the first plurality of first outputs of the first multiplication operation The non-conflict person performs the second multiplication of the numbered information == the number of the third output data with the position information (four) and the fourth (four) second multiplication operation of the plurality of rounds of the poor material '(f) Step eggs) and (4) until

η料均經過該第-或第二乘法運算，以及⑻進行H ㈣該第二及第三資料產生該複數個輸入資枓的快速傅立茱轉換資料或其反轉換資料。 t發明藉由精簡式8點快速傅立葉轉換模組提高硬體利用率，以及利用先寫後乘（multiplication_after_write; MAW)機韻少運算_，並採用矩陣式記，隨模組建構迴授路徑與向前整流路徑架構提高處理速率，以最小的晶 =面積達成符σ夕輸人多輸出正交多頻無線通訊系統需求的目的。【實施方式】本《明提έΗ精簡式8點快速傅立葉轉換模組(R8_FFT) 以降低複數乘法的複雜度’以N點快速傅立葉轉換為例，其轉換序列可表示為 200915095 Z[k]=YiX[n]-Wjr 公式1 其中，X[n]為輸入資料’ Z[k]為輪出資料，^ 、且％，根據基數8之理論，公式j可改寫為N為％轉因子 Z[s + Tt]= ^ /=0 τ~\ ^MT ' X{l-\- Mm\ Wfm 公式2 ’ T，8，s、可拆解成二個其中，k=s+Tt，n=1+Mm，以N等於64為例 t、l及m均為〇至7的整數，因此，公 8點之快速傅立葉運算式並改寫成 " >〇y, y6 Λ w° w2 h ysEach of the n materials undergoes the first or second multiplication operation, and (8) the H (d) the second and third data are used to generate the fast Fourier transform data of the plurality of input assets or the inverse conversion data thereof. The invention improves the hardware utilization by the simplified 8-point fast Fourier transform module, and utilizes the multiplication_after_write (MAW) machine rhyme operation _, and adopts the matrix type to construct the feedback path with the module. The forward rectification path architecture increases the processing rate to achieve the goal of the multi-output orthogonal multi-frequency wireless communication system with a minimum crystal = area. [Embodiment] The "Established 8-point fast Fourier transform module (R8_FFT) to reduce the complexity of complex multiplication" takes the N-point fast Fourier transform as an example. The conversion sequence can be expressed as 200915095 Z[k]= YiX[n]-Wjr Equation 1 where X[n] is the input data 'Z[k] is the round-off data, ^, and %, according to the theory of base 8, formula j can be rewritten as N is the %-turn factor Z[ s + Tt]= ^ /=0 τ~\ ^MT ' X{l-\- Mm\ Wfm Equation 2 ' T,8,s, detachable into two of them, k=s+Tt,n=1 +Mm, where N is equal to 64, t, l, and m are integers from 〇 to 7, so the fast Fourier expression of the 8th point is rewritten as ">〇y, y6 Λ w° w2 h ys

w'w'

公式3 將公式3對18〇。與9〇的多餘項作簡化後得到 200915095 >〇' 1 0 0 y. 0 0 1 y. 0 0 ys 0 0 1 y2 0 0 η 0 0 0 γ6 0 1 0 Λ. 0 0 0 οΚο -旳 ~y%'ο (^〇 + χ4) + (χ2 + χ6)- (尤0+尤4)-(Χ2 +义 6) (^0-Χ4)_ jXX2^X6) ^Χ0~Χ4) + Λχ2-Χ6) (尤1 +A) + (X3 + Jf7) u'+a)-(x3 + a) {x' ~χ5)~ ΛΧ,-^ι) -(Λ，1 ~^s) + 7(^3-^7). &式4 接著以上半部及下半部拆4之上四舰^料財：以公式 >0/1 h/3 其中 = gKFft)+Hkfft)， G2(FFT) + ^2(FFT) > >6 /7 G\(FFT)= ^Χ〇+Χ^ + (Χ2+Χ6) L(’〇 "Ά)-·/(义2 _尤6)_ G2(FFT) (Xq +X4)-(X2 +^e) ^0-^4) + 7(^2-^6), ^ΐ(«τ) = [ι rg1]· "2(印r)=-户[l ej_ '^\(FFT) - H^FFTy -^2(FFT) ~H2(FFT) 公式5a 公式5a (Xl +X5)-(X^+X7) _(Χ{-Χ5)+ΛΧ3~Χ7)_Equation 3 puts Equation 3 to 18〇. Simplify with the excess of 9〇 to get 200915095 >〇' 1 0 0 y. 0 0 1 y. 0 0 ys 0 0 1 y2 0 0 η 0 0 0 γ6 0 1 0 Λ. 0 0 0 οΚο -旳~y%'ο (^〇+ χ4) + (χ2 + χ6)- (尤0+尤4)-(Χ2 +义6) (^0-Χ4)_ jXX2^X6) ^Χ0~Χ4) + Λχ2 -Χ6) (尤1 +A) + (X3 + Jf7) u'+a)-(x3 + a) {x' ~χ5)~ ΛΧ,-^ι) -(Λ,1 ~^s) + 7 (^3-^7). & 4 Then the upper half and the lower half are removed from the 4th ship. The formula is: formula >0/1 h/3 where = gKFft)+Hkfft), G2( FFT) + ^2(FFT) >>6 /7 G\(FFT)= ^Χ〇+Χ^ + (Χ2+Χ6) L('〇"Ά)-·/(义2 _尤6 )_G2(FFT) (Xq +X4)-(X2 +^e) ^0-^4) + 7(^2-^6), ^ΐ(«τ) = [ι rg1]· "2( Print r) = - household [l ej_ '^\(FFT) - H^FFTy -^2(FFT) ~H2(FFT) Equation 5a Equation 5a (Xl +X5)-(X^+X7) _(Χ{ -Χ5)+ΛΧ3~Χ7)_

根據公式6a至6d可得到R8-FJFT 公式6a 公式6b 公式6c 公式6d 在此運鼻過程中，以上 14 200915095 半部及下半部的拆解方式取代習知奇偶數的拆解方式分解公式4所示的矩陣，以化簡所需之單一乘法數。 ^圖2係R8-FFT根據輸出重新排序之分離基數_2/8理論產生的乙型蝴蝶模組200，其具有一個乘法單元21〇、八個輸入及四個輸出，以計算8個輸入資料乂[〇]至χ[7] 為例’要將輪入資料X[〇g Χ[7]完全輸入需八個時脈週期，L型蝴蝶模組200在二個時脈週期中進行運算，以分別產生對應輪入資料Χ[〇]至Χ[7]的輸出Υ[2]、γ [6]、γ[3] 與Υ[7]以及輪出Υ[0]、γ [4]、γ⑴與γ[5]，L型蝴蝶模組 200的利用率為八分之二，反觀如圖i所示的蝴蝶模組係藉由二個乘法單元Π0及120在一個時脈週期中進行運算，以同時產生對應輸入資料X[〇]至X[7]的輸出資料γ[0] 至Υ[7] ’蝴蝶模組1〇〇的利用率為八分之一，因此l型蝴蝶模組200相較於習知的蝴蝶模組100至少提升二倍的利用率。此外’ L型蝴蝶模組200中的乘法單元數目僅為習知的蝴蝶模組1 〇〇的一半’有效減少所需的晶片面積。圖3係本發明之FFT/IFFT裝置的第一個實施例，一採用串列模組(serial blockwise)架構之分離基數_2/8多延遲迴授路徑（redix-2/8 multiple-path delay feedback ; R28MDF)300包括一輸入模組310具有輸入組318及319, 輸入組318及319分別包含複數個輸入單元312用以接收複數個輸入資料，一 R8-FFT 320根據該等輸入資料產生具包含位置資訊及編號資訊的輸出資料，R8-FFT 320的具體架構即為圖2所不之L型蝴蝶模組2 0 0 ’ -~~控制模組 15 200915095According to formulas 6a to 6d, R8-FJFT can be obtained. Equation 6a Equation 6b Equation 6c Equation 6d In this nose-feeding process, the above-mentioned 14 200915095 half and bottom half dismantling methods replace the conventional odd-numbers disassembly method decomposition formula 4 The matrix shown to simplify the single multiplication number required. ^ Figure 2 is a R8-FFT based on the output reordering separation base _2 / 8 theory of the B butterfly module 200, which has a multiplication unit 21 〇, eight inputs and four outputs to calculate 8 input data乂[〇]至χ[7] For example, 'To enter the data X[〇g Χ[7] full input requires eight clock cycles, the L-Butterfly module 200 operates in two clock cycles, To generate the output Υ[2], γ [6], γ[3] and Υ[7], and the Υ[0], γ [4], respectively, corresponding to the rounded data Χ[〇] to Χ[7], γ(1) and γ[5], the utilization rate of the L-shaped butterfly module 200 is two-eighth. In contrast, the butterfly module shown in Fig. i is operated in one clock cycle by two multiplication units Π0 and 120. In order to simultaneously generate the output data γ[0] to Υ[7] of the corresponding input data X[〇] to X[7], the utilization rate of the butterfly module 1〇〇 is one eighth, so the l-type butterfly module The 200 is at least twice as efficient as the conventional butterfly module 100. In addition, the number of multiplying units in the 'L-type butterfly module 200 is only half that of the conventional butterfly module 1' effectively reduces the required wafer area. 3 is a first embodiment of the FFT/IFFT device of the present invention, a discrete block 2/8 multi-delay feedback path using a serial blockwise architecture (redix-2/8 multiple-path delay) The feedback module (R28MDF) 300 includes an input module 310 having input groups 318 and 319. The input groups 318 and 319 respectively include a plurality of input units 312 for receiving a plurality of input data, and an R8-FFT 320 generates a device according to the input data. The output structure including the location information and the number information, the specific structure of the R8-FFT 320 is the L-shaped butterfly module of FIG. 2 2 0 0 ' -~~ control module 15 200915095

350用以控制R8-FFT 320的運算次序，一乘法模組330具有複數個乘法單元用以對R8-FFT 320的輸出資料進行乘法運算，以及一矩陣式記憶體模組340根據該位置資訊儲存經乘法模組330運算後的結果，矩陣式記憶體模組340 採用迴授路徑的架構，用以將其儲存的資料迴授至R8-FFT 320中，產生對應於該等輸入資料的FFT/IFFT資料輸出。在本實施例中，輸入單元312及乘法模組330的示意圖分別如圖4及圖5所示。參照圖4，輸入單元312包括一及閘（AND gate)316根據一閘極控制信號及一時脈產生—位移信號Ss，以及一暫存器314根據位移信號心位移該等輸入資料。參照圖5及圖3，乘法模組330包括一多工器 334、複數個乘法單元332以及一解多工器336，多工器 334根據該編號資訊將r8_fft 32〇輸出資料傳遞至對應的乘法單元332進行對應的乘法運算，經解多工器336將乘法=元332運算的結果分別由Y0至Y4輸出，並根據該位置資訊儲存至矩陣式記憶體模組340，在一實施例中，乘 Μ ^元332包括加法位移器，利用固定位移線路配合加法 =實現乘法運算，因此降低了乘法運算的複雜度，且不需加另—唯讀式記憶體，更進—步減少所需的晶片面積。 —a，6係圖3的運算時序圖，以64點之 ϋ、Γ拉點之腳㈣的運算時間需64個時脈，包括二 =期麗與MS2以及二個輸出時期⑽與⑽: 分別為圖6中乘法時期及輸出時期的時序圖。 …圖7及圖3,輸入模組31〇包括二個輸圖 16 200915095 319’輸入組318及319包含64個輸入單元312，且64個輸入單元312構成由χ〇〇至χ77之8x8矩陣’ 64點資料χ[〇] 至Χ[63]依序輸入至χ〇〇至Xu中，在第1個時脈週期中，資料Χ[56]輸入至χ7〇，R8_FFT 320根據已輸入的資料， • 例如已輸入至Xoo至X〇7中的資料X[0]至X[7]，進行L型蝴蝶運算，產生輸出資料γ〇〇(0)、γ1〇(0)、丫4〇(〇)及γ5〇(〇)，其中，Yab(c)中的ab表示位置資訊(a，b)以及e表示編號資訊，接著乘法模組330根據編號資訊選用對應的乘法單元對R8-FFT 320的輸出資料進行對應的乘法運算，並將運异結果根據位置資訊存入矩陣式記憶體模組34〇中對應的位置。在第2個時脈週期中，資料χ[57]輸入至X7i，r8_fft 320根據已輸入的資料，例如已輸入至χ⑽至χ〇7中的資料 χ[〇]至xm，進行L型蝴蝶運算，產生輸出信號γ2。⑼、 Υ3〇(〇)、丫6〇(0)及Υ^Ο)，經乘法模組33〇根據編號資訊選 =對應的乘法單元進行乘法運算後，將結果根據位置資訊存入矩陣式記憶體模組34〇巾。同樣地，在第3至第Μ =時脈週期中’根據輸人至輸入模組训中的得注意的是，在第6個時脈週期中，鹿到:輸出貝料¥22(4)及Y62(4)的編號資訊互相衝突，對 c乘法單元，例如第四個乘法單元，而形成衝突 2氣⑷’此時，在第6個時脈週期中，選用：：法對編號資訊不衝突的輸出資料Y2 : i資並將衝突輪出資料⑽根據位，）子入陣式冗憶體模組340中對應的位置 17 200915095 (6,2)以便在後續未使用到第四乘法單元的時脈週期中再，衝大輸出貝料γ62⑷提供給第四乘法單元進行乘法運异’例如在第8個β夺脈週期中將衝突輸出資料、⑷提供給第四乘法單元進行乘法運算，此時以R64(4)表示之，並將、、、。果根據位置資訊(6,2)回存至矩陣式記憶體模組⑽。同樣地’在第9個時脈週期中的衝突輸出資料\(句在第 12個時脈週期中進行乘法運算，在第1()個時脈週期中的衝突輸出資料Υ64⑻及Υ74_分別在第η及f 13個時脈週期中進仃乘法算，因此第工至第16個時脈週期形成圖6中的乘法時期_。參照圖8及圖3，在第個時脈週期中’ #料X[8]輸入至Χι。，似_附32()根據從矩記憶體模組340巾迴授的資料的資料進行[型蝴蝶產生刚7贿資料2°°、21。、24。及250輸出。在第18個時脈週期中，資料χ[9]輸入至&，R8_FFT32()根據從矩陣式記憶體模組340中迴授的資料的資料進行L型蝴异’產生FFT/IFFT資料Ζ〇ι、&、乙41及&輸出，，在第19至第32個時脈週期中，辦FT32G根據矩: 式圮憶體杈組340中迴授的資料進行L型蝴蝶運算中的輸出時/月OS1，同樣地，第33至第48個以及第49至第64個時脈週期則分別形成圖的2 期MS2及輸出時期0S2。 J人法時 ^本實施例中，？法模組33。採用並列式迴授路冓，其包括九個乘法單元分別對應編號訊〇〜8，每—今^ 18 200915095 法單元對應不同的乘法運算，五個輸入端分別接收R8_FFT 320的輸出資料及儲存在矩陣式記憶體模組340中的衝突輸出資料’以及五個輸出端Y0至Υ4分別輸出經乘法單元運异後的結果，並根據位置資訊存入矩陣式記憶體模組 /、中對應編说ϊ 0的乘法單元表示X1，可直接輸出不會造成衝突，此種先寫後乘機制可減少運算的延遲時間’進而提高處理速率。由於輪入模組310具有二個輸入組318及319，且在 64個時脈週期中包括二個乘法時期MSI與MS2及二個輸出時期OS1與〇S2 ’顯示R28MDF 300的處理速率可達 2R’適用於2x2多輸入多輸出正交多頻的無線區域網路。此外’無論在乘法時期或輸出時期r8_fft 320及矩陣式記憶體模組340均被使用，因此R8-FFT 320及矩陣式記憶體模組340的利用率達1〇〇〇/。，有效提升各模組的利用率。圖9係本發明FFT/IFFT裝置的第二個實施例，一採用串列模組架構之分離基數-2/8多延遲整流路徑(redix-2/8 multiple_path delay commutator ; R28MDC)400 包括一輸入樓組410具有輸入組460至466，.輸入組460至466分別包含複數個輸入單元412用以接收複數個輸入資料，以及一切換單元414用以切換該複數個輸入資料，一 R8_pFT 4 2 0根據經切換單元切換後輸出之該等輸入資料產生具包含位置資及編號資訊的輸出資料，一乘法模組430具有複數個乘法單元432用以對R8-FFT 420的輸出資料進行 19 200915095 乘法運算，一矩陣式記憶體模組44〇根據該位置資訊存經乘法模組運算後的結果，以及_ R8_FFT 45G根據矩陣式記憶體模組440中儲存的⑽產生㈣於該諸人資料之 FFT/IFFT資料輸出，其中，R8 FFT 42〇及45〇的具體架構即為圖2所示之L型蝴蝶模組·。在本實施例中，乘法模組430採用並列式迴授路徑，以便接收矩阵式記憶體模組440提供的資料或將資料儲存至矩陣式記憶體模組 440中，且輸入單元412及乘法模組43()的架構與圖4及圖5所示之架構相同。在—實施例中，乘法單S 432包括 =法位移器’利用固定位移線路配合加法器實現乘法運算因此降低了乘法運异的複雜度，且不需外加另一唯讀式記憶體’更進一步減少所需的晶片面積。圖10係圖9的運算時序圖，參照圖9圖1〇，以64點之FFT/IFFT為例，一次64點之fft/ifft的運算時間需 64個時脈’包括四個乘法時期順、服、湖與麗以及四個輸出時期⑽、〇S2、〇s3與〇S4，乘法時期及輸出時期的時序圖及各模組之間的運作分別如圖7及圖8 所不’圖9與目3的差異在於矩陣式記憶體· 44〇採用向前整流路㈣構’其财的資料直接提供給 R8-FFT 450 而不迴授至R8_FFT 42G。在第—組的16個時脈週期47〇中’ R8-FFT42〇及乘法模板43〇根據經切換單元414切換 4 L型蝴蝶運算及乘法運算，並將運算結，儲存至矩陣式記憶體模組440中，形成乘法時期MS1。第一組的16個時脈週期472中，R8_FFT 45〇根據矩陣 20 200915095The 350 is used to control the operation order of the R8-FFT 320. The multiplication module 330 has a plurality of multiplication units for multiplying the output data of the R8-FFT 320, and a matrix memory module 340 stores the information according to the location information. As a result of the operation of the multiplication module 330, the matrix memory module 340 adopts a feedback path architecture for feeding back the stored data to the R8-FFT 320 to generate an FFT corresponding to the input data. IFFT data output. In the present embodiment, the schematic diagrams of the input unit 312 and the multiplication module 330 are as shown in Figs. 4 and 5, respectively. Referring to FIG. 4, the input unit 312 includes an AND gate 316 that generates a shift signal Ss according to a gate control signal and a clock, and a register 314 that shifts the input data according to the displacement signal. Referring to FIG. 5 and FIG. 3, the multiplication module 330 includes a multiplexer 334, a plurality of multiplication units 332, and a demultiplexer 336. The multiplexer 334 passes the r8_fft 32〇 output data to the corresponding multiplication according to the number information. The unit 332 performs a corresponding multiplication operation, and the result of the multiplication=element 332 operation is output from the Y0 to Y4 by the demultiplexer 336, and is stored in the matrix memory module 340 according to the position information. In an embodiment, The multiplication unit 332 includes an addition shifter, and uses a fixed displacement line with addition = to achieve multiplication, thereby reducing the complexity of the multiplication operation, and without adding another-read only memory, further reducing the required Wafer area. -a,6 is the operation timing diagram of Figure 3. The operation time of 64 feet and the foot of the pull point (4) requires 64 clocks, including two = phase and MS2 and two output periods (10) and (10): respectively It is a timing chart of the multiplication period and the output period in Fig. 6. ... Figure 7 and Figure 3, the input module 31A includes two maps 16 200915095 319 'Input groups 318 and 319 contain 64 input units 312, and 64 input units 312 form an 8x8 matrix from χ〇〇 to χ 77 The 64-point data χ[〇] to Χ[63] are sequentially input to χ〇〇 to Xu. In the first clock cycle, the data Χ[56] is input to χ7〇, and the R8_FFT 320 is based on the input data. • For example, the data X[0] to X[7] that have been input to Xoo to X〇7 are subjected to the L-shaped butterfly operation to produce the output data γ〇〇(0), γ1〇(0), 丫4〇(〇 And γ5〇(〇), wherein ab in Yab(c) represents position information (a, b) and e represents number information, and then the multiplication module 330 selects a corresponding multiplication unit pair R8-FFT 320 according to the number information. The output data is subjected to a corresponding multiplication operation, and the difference result is stored in the corresponding position in the matrix memory module 34A according to the position information. In the second clock cycle, data χ[57] is input to X7i, and r8_fft 320 performs L-shaped butterfly operation based on the input data, such as data χ[〇] to xm that has been input to χ(10) to χ〇7. , generating an output signal γ2. (9), Υ3〇(〇), 丫6〇(0), and Υ^Ο), multiplied by the multiplication unit selected by the numbering information== after the multiplication module 33, and the result is stored in the matrix memory according to the position information. The body module 34 wipes. Similarly, in the 3rd to Μ = clock cycle, 'note that according to the input to the input module training, in the 6th clock cycle, the deer arrives: output bead material ¥22 (4) And the number information of Y62(4) conflicts with each other, and the c multiplication unit, for example, the fourth multiplication unit, forms a collision 2 gas (4)'. At this time, in the 6th clock cycle, the following:: The conflicting output data Y2: i and the conflict rounding data (10) according to the bit, the corresponding position 17 200915095 (6, 2) in the sub-array redundancy module 340 so that the fourth multiplication unit is not used subsequently In the clock cycle, the large output bead γ62(4) is supplied to the fourth multiplication unit for multiplication, for example, the conflict output data is supplied to the fourth multiplication unit in the eighth β cycle, and (4) is multiplied. At this time, it is represented by R64 (4), and will be, , and . If it is back to the matrix memory module (10) according to the location information (6, 2). Similarly, 'the conflicting output data in the ninth clock cycle\ (the sentence is multiplied in the 12th clock cycle, and the conflict output data Υ64(8) and Υ74_ in the 1st () clock cycle are respectively The η and f 13 clock cycles are multiplied, so the first to the 16th clock cycles form the multiplication period _ in Fig. 6. Referring to Fig. 8 and Fig. 3, in the first clock cycle '# The material X[8] is input to Χι., and the _ attached 32 () is based on the data of the information retrieved from the moment memory module 340. [The butterfly generates the data of 2°°, 21, 24, and 250 output. In the 18th clock cycle, the data χ[9] is input to &, R8_FFT32(), based on the data of the data fed back from the matrix memory module 340, the L-type FFT is generated. The IFFT data Ζ〇ι, &, B 41 and & output, in the 19th to 32th clock cycles, the FT32G is based on the moment: the data retrieved from the group 340 is used to perform the L-shaped butterfly. In the calculation, the output time/month OS1, similarly, the 33rd to 48th and 49th to 64th clock cycles respectively form the 2nd MS2 of the graph and the output period 0S2. In this embodiment, the method module 33 adopts a parallel feedback loop, which includes nine multiplication units respectively corresponding to the numbering signal ~8, and each of the current 18 18150150 law units corresponds to different multiplication operations, five The input end respectively receives the output data of the R8_FFT 320 and the conflicting output data stored in the matrix memory module 340 and the five output terminals Y0 to Υ4 respectively output the results of the multiplication by the multiplication unit, and store the result according to the location information. The multiplication unit of the matrix memory module /, corresponding to the description ϊ 0, represents X1, and the direct output does not cause a collision. This first write and multiplication mechanism can reduce the delay time of the operation, thereby increasing the processing rate. The module 310 has two input groups 318 and 319, and includes two multiplication periods MSI and MS2 and two output periods OS1 and 〇S2 'display R28MDF 300 processing rate up to 2R' in 64 clock cycles. 2x2 multi-input multi-output orthogonal multi-frequency wireless local area network. In addition, 'R8-FFT 320 and matrix are used regardless of the multiplication period or output period r8_fft 320 and matrix memory module 340 are used. The utilization rate of the memory module 340 is up to 1〇〇〇/., effectively improving the utilization rate of each module. Fig. 9 is a second embodiment of the FFT/IFFT device of the present invention, and the separation using the serial module architecture The radix-2/8 multiple delay rectification path (RXMDC) 400 includes an input floor group 410 having input groups 460 to 466. The input groups 460 to 466 respectively include a plurality of input units 412 for Receiving a plurality of input data, and a switching unit 414 is configured to switch the plurality of input data, and an R8_pFT 4 2 0 generates output data including location information and number information according to the input data outputted after switching by the switching unit, The multiplication module 430 has a plurality of multiplying units 432 for performing 19 200915095 multiplication operations on the output data of the R8-FFT 420, and a matrix memory module 44 is subjected to the operation of the multiplication module according to the position information, and _ R8_FFT 45G generates (4) the FFT/IFFT data output of the data according to (10) stored in the matrix memory module 440, wherein the specific structure of the R8 FFT 42〇 and 45〇 is as shown in FIG. · L-shaped butterfly module. In this embodiment, the multiplication module 430 uses a parallel feedback path to receive the data provided by the matrix memory module 440 or store the data in the matrix memory module 440, and the input unit 412 and the multiplication mode The architecture of group 43() is the same as that shown in Figures 4 and 5. In an embodiment, the multiplication single S 432 includes a = method shifter 'utilizing a fixed displacement line with an adder to achieve a multiplication operation, thereby reducing the complexity of multiplication, and without adding another read only memory' further Reduce the required wafer area. FIG. 10 is a timing chart of the operation of FIG. 9. Referring to FIG. 9 and FIG. 1 , taking the 64-point FFT/IFFT as an example, a 64-point fft/ifft operation time requires 64 clocks, including four multiplication periods, Service, Lake and Li, and four output periods (10), 〇S2, 〇s3, and 〇S4, the timing diagrams of the multiplication period and the output period, and the operations between the modules are as shown in Fig. 7 and Fig. 8 respectively. The difference of the third is that the matrix memory 44 〇 uses the forward rectification path (four) to construct its data directly to the R8-FFT 450 without feeding back to the R8_FFT 42G. In the 16-cycle period of the first group, the R8-FFT42〇 and the multiplication template 43〇 switch the 4 L-type butterfly operation and the multiplication operation according to the switching unit 414, and store the operation result in the matrix memory model. In group 440, a multiplication period MS1 is formed. In the first set of 16 clock cycles 472, R8_FFT 45〇 according to the matrix 20 200915095

式記憶體模組440中餘存的資料進行L FF:/IFF7料的輪出’同時_·及乘法= 繼，，對後、，輸入的資料進行運算，形成輸出時法時期順，同樣地，第三組及第四組的16 = 及俱分別形成輪出時期⑽與乘法時#_ = 出時期⑽與乘法時期聰，並在之後的i6個時脈週^ 478中形成輸出_qS4。在此實施射，乘法触43〇採用並列式迴授路，使乘法模組伽具備先寫後乘之機制，以提高處理速度。由於輸入模組41〇具有四個輸入組46〇至466，且在 64個時脈週期中包括四個乘法時期題至聰及四個輸出時期OS1至OS4 ’顯示R28MDC 4〇〇的處理速率可達 4R ’適用於4x4多輪入多輸出正交多頻的無線區域網路。此外，無論在乘法時期或輸出時期R8_FFT42〇及45〇、乘法模組430及矩陣式記憶體模組44〇均被使用，因此 R8-FFT 420及450、乘法模組43〇及矩陣式記憶體模組44〇的利用率達100%，有效提升各模組的利用率。進一步言， R28MDC架構400應用了高速之管線式處理器架構，且在 FFT/IFFT的運异過程中乘法時期及輸出時期同時執行，因此單位時間的處理能力提升為二倍，達到高利用率的目的0 圖11係本發明R28MDF與習知技術應用在2x2多輸入多輸出正交多頻系統中’ 64點之FFT/IFFT轉換效能比較表，其中習知的R2MDC目前僅有理論尚無實際的架 21 200915095 構，若直接根據其理論實現應架構，則需4個唯讀記憶體，由圖11可知，本發發明之R28MDF係在考量乘法模組的複雜度、系統操作頻率、晶片面積、蝴蝶模組利用率、及處理速度後，符合性能與面積之最高效率的架構。圖12係本發明R28MDC與習知技術應用在4x4多輸入多輸出正交多頻系統中，64點之FFT/IFFT轉換效能比較表，由圖12可知，本發發明之R28MDC係模組利用率最高且晶片面積最小的架構。【圖式簡單說明】圖1係習知蝴蝶模組的示意圖；圖2係本發明之L型蝴蝶模組的示意圖；圖3係本發明之FFT/IFFT裝置第一個實施例的示意圖，圖4係圖3中輸入单元的不意圖；圖5係圖3中乘法模組的示意圖；圖6係圖3的運算時序圖；圖7係圖6中乘法時期的時序圖；圖8係圖6中輸出時期的時序圖；圖9係本發明之FFT/IFFT裝置第二個實施例的示意圖，圖10係圖9的運算時序圖；圖11係本發明R28MDF與習知技術應用在2x2多輸入多輸出正交多頻系統中，64點之；FFT/IFFT轉換效能比 22 200915095 較表；以及圖12係本發明R28MDC與習知技術應用在4x4多輸入多輸出正交多頻系統中，64點之FFT/IFFT轉換效能比較表。【主要元件符號說明】 100 蝴蝶模組 110 乘法單元 120 乘法單元 200 L型蝴蝶模組 210 乘法單元 300 分離基數-2/8多延遲迴授路徑 310 輸入模組 312 輸入單元 314 暫存器 316 及閘 318 輸入組 319 輸入組The data remaining in the memory module 440 is subjected to L FF:/IFF7 material rounding 'simultaneous_·and multiplication=step, and then the input data is calculated to form an output time period, similarly The third group and the fourth group of 16 = and the formation of the round-out period (10) and the multiplication time #_ = the out-of-time period (10) and the multiplication period Cong, and the output _qS4 is formed in the subsequent i6 clock cycles 478. In this implementation, the multiply touch 43〇 uses a parallel-type feedback circuit, so that the multiplication module gamma has a mechanism of first writing and then multiplying to improve the processing speed. Since the input module 41〇 has four input groups 46〇 to 466, and includes four multiplication periods in the 64 clock cycles to Cong and four output periods OS1 to OS4 'display R28MDC 4〇〇 processing rate can be Up to 4R' is suitable for 4x4 multi-wheeled multi-output quadrature multi-frequency wireless local area network. In addition, R8-FFT 420 and 450, multiplication module 43〇, and matrix memory are used regardless of the multiplication period or output period R8_FFT42〇 and 45〇, multiplication module 430, and matrix memory module 44〇. The utilization rate of the module 44〇 is 100%, which effectively improves the utilization rate of each module. Furthermore, the R28MDC architecture 400 uses a high-speed pipeline processor architecture, and performs multiplication and output periods simultaneously in the FFT/IFFT migration process, so the processing power per unit time is doubled to achieve high utilization. OBJECTIVE FIG. 11 is a comparison table of the 64-point FFT/IFFT conversion performance of the R28MDF of the present invention and a conventional technique applied in a 2x2 multiple-input multiple-output orthogonal multi-frequency system, wherein the conventional R2MDC has only theoretically no practical. Frame 21 200915095 structure, if directly based on its theory to achieve the architecture, then four read-only memory is required. As can be seen from Figure 11, the R28MDF of the present invention considers the complexity of the multiplication module, the system operating frequency, the wafer area, After the butterfly module utilization rate and processing speed, the architecture with the highest efficiency and area efficiency is adopted. 12 is a comparison table of 64-point FFT/IFFT conversion performance of the R28MDC of the present invention and a conventional technique applied in a 4×4 multiple-input multiple-output orthogonal multi-frequency system. As shown in FIG. 12, the R28MDC module utilization rate of the present invention is shown in FIG. The architecture with the highest and smallest wafer area. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a schematic view of a conventional butterfly module; FIG. 2 is a schematic view of an L-shaped butterfly module of the present invention; FIG. 3 is a schematic view of a first embodiment of an FFT/IFFT device of the present invention; 4 is a schematic diagram of the input unit in FIG. 3; FIG. 5 is a schematic diagram of the multiplication module in FIG. 3; FIG. 6 is an operation timing diagram of FIG. 3; FIG. 7 is a timing chart of the multiplication period in FIG. FIG. 9 is a schematic diagram of a second embodiment of the FFT/IFFT apparatus of the present invention, FIG. 10 is an operation timing diagram of FIG. 9; FIG. 11 is a R28MDF of the present invention and a conventional technique applied to 2×2 multiple inputs. In a multi-output orthogonal multi-frequency system, 64 points; FFT/IFFT conversion performance ratio 22 200915095; and FIG. 12 is a R28MDC of the present invention and a conventional technique applied in a 4x4 multiple input multiple output orthogonal multi-frequency system, 64 Point FFT/IFFT conversion performance comparison table. [Main component symbol description] 100 Butterfly module 110 Multiplication unit 120 Multiplication unit 200 L-type butterfly module 210 Multiplication unit 300 Separation base-2/8 multi-delay feedback path 310 Input module 312 Input unit 314 Register 316 and Gate 318 Input Group 319 Input Group

320 R8-FFT 330 乘法模組 332 乘法單元 334 多工器 336 解多工器 340 矩陣式記憶體模組 23 200915095 350 控制模組 400 分離基數-2/8多延遲整流路徑 410 輸入模組 412 輸入單元 414 切換單元 420 R8-FFT 430 乘法模組 432 乘法單元 440 矩陣式記憶體模組 450 R8-FFT 460-466 輸入組 470-478 16個時脈週期 24320 R8-FFT 330 Multiplication Module 332 Multiplication Unit 334 Multiplexer 336 Demultiplexer 340 Matrix Memory Module 23 200915095 350 Control Module 400 Separation Base-2/8 Multi-Delay Rectifier Path 410 Input Module 412 Input Unit 414 Switching Unit 420 R8-FFT 430 Multiplication Module 432 Multiplication Unit 440 Matrix Memory Module 450 R8-FFT 460-466 Input Group 470-478 16 Clock Cycles 24

Claims

200915095 X. The scope of application for patents·· ^Transfer type ^Four Fourier transform and its inverse conversion device, including: Input data; ] Early 70' to receive a plurality of streamlined 8-point fast Fourier transform modules into the data to generate complex numbers The first-output data, each!;::: one: the data contains a position, the message and the 'number of assets t, the second group, the group' has a plurality of multiplication units, according to the number information of the plurality of materials from the plural The multiplication unit 3' ς 彳运运产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生产生In the cycle, if the plurality of first conflicting outputs are output, the translation method outputs data, and the multiplication module is provided in the subsequent clock cycle to perform the operation. 2 as requested! The fast Fourier transform and its control module are used to control and set the order of operations. The fast Fourier transform and its inverse conversion device in the 8-bit fast Fourier transform single, the array memory 2: having a feedback path structure to feedback the moment Fourier transform two::: second output The data is sent to the reduced 8-point fast conversion, and the fast Fourier 25 200915095 leaf conversion data or its inverse conversion data corresponding to the plurality of input data is generated. Including the fast Fourier transform of item 1 and its inverse conversion device, the memory-mode continuous 8-point fast Fourier transform module switches the fast Fourier transform of the item 4 in the matrix and its inverse conversion device, and the complex rib formation The structure of the front rectification path to provide the fast output of the second output data to the second streamlined 8-point fast-transfer leaf transfer group or its inverse conversion of several rounds including the fast Fourier of the item 1. The conversion and its reverse conversion device, in which the soap element is replaced in the input module, is used for data to the reduced 8-point fast Fourier transform unit.... Each element 1 speed Fourier transform and its inverse conversion The device, according to a closed-loop control signal and a time-shift signal, shifts the plurality of input data according to the 5 green shift signal. The fast Fourier transform and the inverse conversion device of the axe/, item 1 Group / / month ο · point fast Fourier transform module includes an L-shaped butterfly speed Fourier transform and its inverse conversion split in an L-shaped butterfly mode, the upper half and the lower main 1 input and four outputs ' Take The matrix operation after the +4 disassembly. 10. The fast Fourier transform and inverse conversion device of the item 1 is 26, 200915095 2== The group has a parallel feedback path structure to access the matrix. === leaf transformation and its inverse conversion device, steps: (4) Fourier transform and its inverse conversion method, including the following: 亍 L-shaped butterfly operation, according to the plurality of position information and number information, the first number = birth, the plural The first multiplication operation of the number information=person number information in the first output data to generate a plurality of second output data having 5 hai position information; the location # storing the second output data and not The plurality of first-output data of the multiplication operation; ()=the non-Hai-multiplication operation of the plurality of first-output data, the number information does not conflict, and the second=-method corresponding to the number information generates a plurality of The third output asset (4) of the location information stores the third output f material and the plurality of first rounds of data not subjected to the second multiplication according to the location information; (0 repeating steps (d) and (4) until the plurality of An output data is subjected to the first or second multiplication operation; and (g) performing a second L-type butterfly operation 'generating the fast Fourier transform data of the plurality of input data according to the second and third data or inverse conversion data thereof ' 27 200915095 13. The fast method of claim 12 further includes the feedback of the second and third indirect conversion and its inverse conversion butterfly operation. · As requested in item 12, the fast Fourier transform method and the inverse Fourier transform and its inverse conversion L-shaped butterfly operation. Output the material to perform the second 15. The fast method of claim 12, including switching The plurality of input data/input and/or inverse conversion operations. Into the first L-type butterfly 16. As the fast method of the item 12, the current 笫-_ ν, the transformation and the inverse conversion of the square part of the I matrix operation t-butter operation including the upper half And the second half of the second method of the 1i conversion and its inverse conversion and the first multiplication method including the displacement addition operation. 28