JP2012155569A

JP2012155569A - Digital signal processing device

Info

Publication number: JP2012155569A
Application number: JP2011014727A
Authority: JP
Inventors: Hirohiko Shibata; 大彦柴田
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2011-01-27
Filing date: 2011-01-27
Publication date: 2012-08-16

Abstract

Problem: To improve the processing capability of product-sum (multiply-accumulate) operations.
Solution: A digital signal processing device includes a plurality of registers, a data transfer unit that stores data in the plurality of registers in time series in synchronization with a clock signal, and an arithmetic unit that performs an operation on the data stored in the plurality of registers at the same timing. In accordance with a given instruction, the data transfer unit transfers the data stored in a certain register among the plurality of registers at a certain timing of the clock signal so that it is stored, at the next timing, in another designated register among the plurality of registers. This makes it possible to improve the processing capability of SIMD filter operations.
Selected drawing: Figure 1

Description

The present invention relates to digital signal processing. In particular, it relates to a technique for making calculations such as filter operations more efficient in a SIMD processor.

A digital signal processing device generally needs to perform a large number of operations efficiently on a group of data. Its applications are wide-ranging and include machine control, high-efficiency speech encoding and decoding, and image processing.

In digital signal processing on a RISC processor with a typical load-store architecture, the SIMD (Single Instruction Multiple Data) scheme, in which one instruction handles multiple data items in parallel, is frequently used. FIG. 8 shows an example of the arithmetic unit of a typical SIMD digital signal processing device. For pairs each consisting of one Dn and one Dm, a single instruction DopD→D processes four pairs of data in parallel and yields four results Dd.
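As a rough illustration of the SIMD behavior described here, the following Python sketch applies one operation to four (Dn, Dm) pairs and produces four results Dd. The function name `simd_op` and the choice of multiplication as the operation are assumptions made for illustration; they are not taken from FIG. 8.

```python
from operator import mul


def simd_op(op, dn, dm):
    """Apply one operation to every (Dn, Dm) lane pair, as a single SIMD instruction would."""
    if len(dn) != len(dm):
        raise ValueError("Dn and Dm must have the same number of lanes")
    # Each lane is independent: lane k only sees dn[k] and dm[k].
    return [op(a, b) for a, b in zip(dn, dm)]


# Four lanes processed by one "instruction" (here: multiplication).
dd = simd_op(mul, [1, 2, 3, 4], [10, 20, 30, 40])
```

In hardware the four lanes operate in the same clock cycle; the list comprehension only models the lane-wise independence, not the timing.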

Patent Document 1 proposes, as shown in FIG. 9, a method of setting the same data in a plurality of parallelized registers that hold the data to be processed in a SIMD arithmetic unit. According to that document, this improves performance for specific processing such as multiplying a vector variable by a scalar variable.
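The approach attributed to Patent Document 1 can be pictured as copying one scalar into every lane register before a lane-wise multiply. This is a minimal sketch under that reading; the names `broadcast` and `vector_times_scalar` are hypothetical and the register layout of FIG. 9 is not modeled.

```python
def broadcast(scalar, lanes=4):
    """Set the same value in every one of the parallel operand registers."""
    return [scalar] * lanes


def vector_times_scalar(vec, scalar):
    """Multiply each vector element by the scalar held in the matching lane register."""
    regs = broadcast(scalar, len(vec))
    return [v * s for v, s in zip(vec, regs)]
```

For example, `vector_times_scalar([1, 2, 3, 4], 5)` gives `[5, 10, 15, 20]`.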

Patent Document 1: JP-A-2005-174300

High-efficiency speech encoding and decoding requires a method for processing a large amount of data efficiently. However, while the technique of JP-A-2005-174300 is effective for operations between a vector variable and a scalar variable, it is not effective for the filter product-sum operations that are used heavily in high-efficiency speech encoding and decoding.

Means for solving the problem are described below using, in parentheses, the reference signs used in [Mode for Carrying Out the Invention]. These signs are added to clarify the correspondence between the description in [Claims] and [Mode for Carrying Out the Invention]. However, they must not be used to interpret the technical scope of the invention described in [Claims].

In one aspect of the present invention, a digital signal processing device includes a plurality of registers (Kmn), a data transfer unit (L) that stores data in the plurality of registers in time series in synchronization with a clock signal, and an arithmetic unit (mul0 to mul3, prod0 to prod3, add0 to add3, acc0 to acc3) that performs an operation on the data stored in the plurality of registers at the same timing. In accordance with a given instruction, the data transfer unit transfers the data stored in a certain register among the plurality of registers at a certain timing of the clock signal so that it is stored, at the next timing, in another designated register among the plurality of registers.

In another aspect of the present invention, a digital signal processing method includes a step of storing data in a plurality of registers in time series in synchronization with a clock signal, and a step of performing an operation on the data stored in the plurality of registers at the same timing. In the storing step, in accordance with a given instruction, the data stored in a certain register among the plurality of registers at a certain timing of the clock signal is transferred so that it is stored, at the next timing, in another designated register among the plurality of registers.

According to the present invention, the data transfer unit transfers data stored in a certain register at a certain timing to another register at the next timing so that it can be used in the calculation, which improves processing capability. As one example, the processing capability of the product-sum operations used heavily in SIMD high-efficiency speech encoding and decoding can be improved.

FIG. 1 is a schematic diagram of a digital signal processing device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of xtype and the data register group.
FIG. 3 is a signal flow diagram showing an example of the filter operation.
FIG. 4 shows the second argument of mac_n.
FIG. 5 shows the processing code of the analysis filter defined in the standard.
FIG. 6 shows the code for the left shift and rounding used in the code of FIG. 5.
FIG. 7A shows filter processing code according to an embodiment.
FIG. 7B shows filter processing code according to an embodiment.
FIG. 8 shows an example of the arithmetic unit of a SIMD digital signal processing device.
FIG. 9 shows the arithmetic unit configuration for specific processing proposed in JP-A-2005-174300.

Embodiments of the present invention are described below with reference to the drawings.
The digital signal processing device according to an embodiment of the present invention has a "product-sum operation unit operation mode" as an operand of the product-sum operation instruction.
The device also decodes the product-sum operation instruction and the "product-sum operation mode", and controls data transfer between the product-sum-instruction-dedicated data registers and the multiplicand data input registers, as well as data transfer among the multiplicand data input registers.

FIG. 1 is a schematic diagram of the digital signal processing device according to this embodiment. The device includes an instruction memory imem, a data memory dmem, an arithmetic unit ACC, an instruction decoder DEC, and a register file RF. To process efficiently the filter product-sum operations used heavily in high-efficiency speech encoding and decoding, the device has a configuration that differs from the arithmetic units of digital signal processing devices using typical SIMD (FIG. 8) or the SIMD of JP-A-2005-174300 (FIG. 9). For this purpose, an extended product-sum operation instruction is provided, together with an arithmetic unit ACC and an instruction decoder DEC that process it.

The instruction decoder DEC has a function of decoding the special product-sum operation instruction mac4_n. This instruction can specify a "product-sum operation unit operation mode" as its second argument. The instruction decoder DEC decodes this mode together with the instruction code. According to the decoding result, the data transfer unit L transfers data among the product-sum-instruction-dedicated data registers alg0/1 and data0 to data5 and the multiplicand data input registers Kmn in the arithmetic unit ACC.

For example, in the filter operation shown in FIG. 3, the data transfer unit L of FIG. 1 stores digital-signal data in the plurality of registers Kmn in time series in synchronization with the clock signal. The data includes, for example, signal data X_m (m is an integer), such as the luminance of pixels in image processing, and coefficient data a_n (n is an integer) that indicates the characteristics of the filter applied to that signal. In the example of FIG. 3, K00, K02, K40, and K42 are used as data registers that store the signal data. K01, K03, K41, and K43 are used as coefficient registers that store the coefficient data, and are provided in correspondence with the data registers K00, K02, K40, and K42, respectively.

In response to a product-sum operation instruction, the arithmetic unit ACC performs a product-sum operation on time-series data consisting of pairs of the digital signal stored in each of the data registers K00, K02, K40, and K42 and the coefficient stored in the corresponding one of the coefficient registers K01, K03, K41, and K43 at the same timing of the operating clock signal.

In accordance with a given instruction, the data transfer unit L transfers the digital signal stored in each of the data registers at a certain timing so that, at the next timing, it is stored in another data register designated by the product-sum operation instruction.

Specifically, the arithmetic unit ACC takes the list of data from X_0, a_0 at the upper left of FIG. 3 to X_-7, a_10 at the lower right into the registers in order from the bottom row, in time series in synchronization with the clock signal, and multiplies them. The product-sum operation is then carried out by successively summing the products obtained in time-series order. The digital signal processing device of this embodiment is a SIMD processor, and this product-sum operation is executed in response to a single product-sum operation instruction stored in imem.

The arithmetic unit ACC exploits the fact that data stored in one of the registers Kmn at a given timing is used in the adjacent register in the next operation (in FIG. 3, the data x is used while moving from the lower right toward the upper left). In accordance with a given instruction (an instruction stored in the instruction memory imem and decoded by the instruction decoder DEC), the data transfer unit L transfers data stored in certain registers among K00, K02, K40, and K42 at a certain timing of the operating clock signal (for example, in the bottom row of the list in FIG. 3, X_-7 stored in K42, X_-8 stored in K40, and X_-9 stored in K02) so that, at the next timing, it is stored in other registers designated by the instruction set definition (in the second row from the bottom, X_-7 in K40, X_-8 in K02, and X_-9 in K00, respectively).
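The register-to-register shift described above can be sketched in Python as follows. FIG. 3 is not reproduced in this text, so the sketch rests on two assumptions: at each step all four coefficient registers hold the same coefficient (a_10 first, a_0 last), and one new sample enters K42 while the others move one register toward K00. The function name `fir_shift_block` and its argument layout are hypothetical.

```python
def fir_shift_block(samples, coeffs, lanes=4):
    """Compute `lanes` filter outputs with shifting data registers.

    samples: x[-order], ..., x[-1], x[0], ..., x[lanes-1] (oldest first)
    coeffs:  a[0], ..., a[order]
    Returns [y[0], ..., y[lanes-1]] with y[i] = sum_j a[j] * x[i - j].
    """
    order = len(coeffs) - 1
    if len(samples) != order + lanes:
        raise ValueError("need exactly order + lanes samples")

    data = list(samples[:lanes])   # data registers, K00 first and K42 last
    acc = [0] * lanes              # accumulators acc0..acc3
    next_in = lanes                # index of the next sample to enter K42

    for j in range(order, -1, -1):             # a[order] is used first (bottom row)
        coef = [coeffs[j]] * lanes             # assumed: same coefficient in every lane
        for lane in range(lanes):
            acc[lane] += data[lane] * coef[lane]   # multiply, then accumulate
        if j > 0:
            # Shift every value one register toward K00; a new sample enters K42.
            data = data[1:] + [samples[next_in]]
            next_in += 1
    return acc
```

With a three-tap filter, `fir_shift_block([1, 2, 3, 4, 5, 6], [1, 2, 3])` returns `[10, 16, 22, 28]`. Only one new sample is fetched per step; the other three operands are reused from the neighboring register, which is the saving the text attributes to the data transfer unit L.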

Specifically, when a code meaning "FILTER" is specified as the "product-sum operation unit operation mode", the data transfer unit L transfers data among the multiplicand data input registers Kmn. This makes data transfer involving the product-sum-instruction-dedicated data registers alg0/1 and data0 to data5 and the multiplicand data input registers Kmn more efficient. As a result, the processing capability of the product-sum operations used heavily in high-efficiency speech encoding and decoding is improved.

This embodiment is described in more detail below. The arithmetic unit ACC consists of the product-sum-instruction-dedicated data register group alg0/1 and data0 to data5, the 16-bit multiplicand data input registers Kmn (m = 0, 4; n = 0 to 3), multipliers mul0 to mul3, adders add0 to add3, 32-bit registers prod0 to prod3 that hold the outputs of the multipliers mul0 to mul3, and 32-bit registers acc0 to acc3 that hold the outputs of the adders add0 to add3. imem is the instruction memory.

dmem is a data memory that stores the 16-bit variable arrays to be processed and has an interface that reads and writes in 128-bit access units. In this embodiment, "xtype" is provided as the type that represents this 128-bit access unit. The result of a read access to an array that starts at an address offset from an "xtype" interface boundary consists of the content from the start of the array up to the next interface boundary, followed by the remaining content of the access unit. The darkly shaded area in FIG. 2 indicates the area where the data to be processed is stored.
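One reading of the misaligned-read behavior described above is a rotation within the 128-bit access unit: the elements from the array start to the boundary come first, and the elements that precede the array start in the same unit follow. The sketch below models that reading only; `read_xtype`, the element-indexed `memory` list, and the constant `LANES` are assumptions for illustration and do not come from FIG. 2.

```python
LANES = 8  # eight 16-bit elements fit in one 128-bit xtype access unit


def read_xtype(memory, addr):
    """Model a 128-bit read at element address `addr` (which may be misaligned)."""
    base = addr - addr % LANES          # start of the access unit containing addr
    unit = memory[base:base + LANES]    # the eight elements of that unit
    k = addr - base                     # offset of the array start inside the unit
    # Array start .. boundary first, then the rest of the same access unit.
    return unit[k:] + unit[:k]
```

For a memory holding the values 0 to 15, `read_xtype(memory, 3)` yields `[3, 4, 5, 6, 7, 0, 1, 2]`, while an aligned read at address 8 returns the unit unchanged.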

To handle this properly, the data transfer unit L distributes the multiplicand data from the product-sum-instruction-dedicated data register group alg0/1 and data0 to data5 to the 16-bit multiplicand data input registers Kmn (m = 0, 4; n = 0 to 3). The data transfer unit L can also transfer data between Kmn registers. The register file RF is 128 bits.

An instruction set is provided to control the product-sum operation circuit configured in this way. mac4_n() (n = 0 to 5) is an instruction that performs the processing indicated by its second argument on its first argument and stores the result in the product-sum-instruction-dedicated data register group data0 to data5 and in the register file RF. mac_ina() is an instruction that stores its argument in the dedicated data register alg0. mac_inb() is an instruction that stores its argument in the dedicated data register alg1. CLR128() takes two arguments; it determines the logic of L that distributes multiplicand data from the dedicated data register group alg0/1 and data0 to data5 to the 16-bit multiplicand data input registers Kmn (m = 0, 4; n = 0 to 3), and it initializes the dedicated data register group alg0/1 and data0 to data5. set_firx() is an instruction that sets the four shorts on the LSB side of its xtype argument in the multiplicand data input registers K00, K02, K40, and K42.

Of these, mac4_n (n = 0 to 5) is the special instruction used for the filter product-sum operation. For this instruction, the second argument, the "product-sum operation unit operation mode", which specifies how the first argument is processed, is defined as shown in FIG. 4. For example, when FILTER in FIG. 4 is set as the second argument, the data transfer unit L is controlled so that multiplicand data is taken from the dedicated data register group alg0/1 and data0 to data5 into the 16-bit multiplicand data input register Kmn42 and the data is shifted between the Kmn registers.

The operation and effects of this embodiment are now described using, as an example, the analysis filter of the linear prediction method used in AMR (Adaptive Multi-Rate), one of the high-efficiency speech encoding and decoding schemes. FIG. 5 shows the processing code of the analysis filter defined in the standard. a[], x[], and y[] are 16-bit variable arrays holding the linear prediction coefficients, the input signal, and the output signal, respectively. lg is the processing frame length. M = 11 is the order of the linear prediction.
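FIG. 5 is not reproduced in this text. As a point of reference, a direct-form analysis filter of the kind described, y[i] = Σ_j a[j]·x[i−j], can be sketched as below. This is an illustrative sketch with plain integers: the fixed-point scaling, shift, and rounding of the standard code are omitted, the number of taps is taken from `len(a)` rather than fixed, and the name `analysis_filter` and the separate `history` argument are assumptions.

```python
def analysis_filter(a, x, history):
    """Direct-form FIR: y[i] = sum over j of a[j] * x[i - j].

    a:       coefficients a[0] .. a[order]
    x:       input frame x[0] .. x[lg-1]
    history: past inputs x[-order] .. x[-1], oldest first
    """
    order = len(a) - 1
    if len(history) != order:
        raise ValueError("history must hold exactly `order` past samples")
    ext = list(history) + list(x)      # ext[order + i] corresponds to x[i]
    return [
        sum(a[j] * ext[order + i - j] for j in range(order + 1))
        for i in range(len(x))
    ]
```

Each output needs one multiply-accumulate per coefficient, and consecutive outputs reuse almost the same input samples, which is why moving data between neighboring registers is attractive for this kind of loop.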

In addition, the left shift and rounding used in that code are defined as shown in FIG. 6, taking into account saturation on overflow and underflow.
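FIG. 6 is not reproduced here. The sketch below shows what a saturating 32-bit left shift and a rounding step that keeps the upper 16 bits commonly look like in fixed-point speech codec code. It is an assumption about the general technique, not a transcription of FIG. 6, and the names `saturate32`, `shl_sat`, and `round_sat` are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(v):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, v))


def shl_sat(v, n):
    """Left shift by n bits; clamp instead of wrapping on overflow or underflow."""
    return saturate32(v << n)   # Python integers do not wrap, so the shift is exact


def round_sat(v):
    """Round a 32-bit value to its upper 16 bits, saturating the rounding add."""
    return saturate32(v + 0x8000) >> 16
```

For instance, `shl_sat(0x20000000, 3)` clamps to `INT32_MAX` rather than wrapping, and `round_sat(0x00018000)` rounds up to 2.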

FIGS. 7A and 7B are one continuous piece of code, obtained by rewriting FIG. 5 with the instruction set described above. Part (a) transfers the linear prediction coefficient sequence a (0 to 10), and part (b) transfers the input signal sequence x, each from memory to the register file. Part (c) performs the analysis filter processing using the mechanism that characterizes this embodiment. Whereas a conventional SIMD digital signal processing device required 2400 steps, this code executes in about 1700 steps.

The example above described the analysis filter of the linear prediction method. For various other kinds of filter processing as well, processing capability can be improved by having the data transfer unit L transfer data stored in a certain register at a certain timing to another register at the next timing for use in the calculation.

As described above, compared with a conventional SIMD digital signal processing device, the "product-sum operation mode" makes data transfer between the product-sum-instruction-dedicated data registers and the multiplicand data input registers more efficient. As a result, the number of steps in the filter processing used in high-efficiency speech encoding and decoding schemes and the like can be reduced, and processing capability can be improved.

In particular, this embodiment has the following features.
(1) It has a "product-sum operation unit operation mode" as an operand of the product-sum operation instruction.
(2) It decodes the product-sum operation instruction and the "product-sum operation mode", and controls data transfer between the product-sum-instruction-dedicated data registers and the multiplicand data input registers as well as data transfer among the multiplicand data input registers.
(3) It has an extended function control unit that controls the operation mode of the product-sum operation unit, as an operand of the product-sum operation instruction.
(4) The extended function control unit controls data transfer between the product-sum-instruction-dedicated data register group and the register file.
(5) The extended function control unit controls data transfer among the product-sum-instruction-dedicated data registers.
(6) The extended function control unit controls the set values of the product-sum-instruction-dedicated data registers.
(7) The product-sum operation circuit accesses the register file and the memory in parallel.

acc: arithmetic unit
acc0 to acc3: registers
add0 to add3: adders
alg0, alg1: product-sum-instruction-dedicated data registers
data0 to data5: product-sum-instruction-dedicated data registers
DEC: decoder
dmem: data memory
imem: instruction memory
Kmn: multiplicand data input registers
L: data transfer unit
mul0 to mul3: multipliers
prod0 to prod3: registers
RF: register file
sel: selector

Claims

A digital signal processing device comprising:
a plurality of registers;
a data transfer unit that stores data in the plurality of registers in time series in synchronization with a clock signal; and
an arithmetic unit that performs an operation on the data stored in the plurality of registers at the same timing,
wherein, in accordance with a given instruction, the data transfer unit transfers the data stored in a certain register among the plurality of registers at a certain timing of the clock signal so that the data is stored, at the next timing, in another designated register among the plurality of registers.

The digital signal processing apparatus according to claim 1,
The digital signal processing device, wherein the arithmetic unit executes the arithmetic operation in response to one command being given.

The digital signal processing apparatus according to claim 1 or 2,
The plurality of registers are:
Multiple data registers,
A plurality of coefficient registers respectively associated with the plurality of data registers,
The data transfer unit stores digital signals in each of the plurality of data registers and stores coefficients in each of the plurality of coefficient registers in time series in synchronization with the clock signal,
The arithmetic unit, in response to a product-sum operation instruction, performs a product-sum operation on time-series data consisting of pairs of the digital signal stored in each of the plurality of data registers and the coefficient stored in the corresponding one of the plurality of coefficient registers at the same timing,
The data transfer unit, in accordance with the given instruction, transfers the digital signal stored in each of the plurality of data registers at the certain timing so that it is stored, at the next timing, in another designated data register among the plurality of data registers.

The digital signal processing apparatus according to claim 3,
The plurality of data registers include n data registers from the first to the n-th (n is an integer of 2 or more),
The data transfer unit transfers the digital signal stored in the i-th data register (i is an integer from 1 to n−1) among the plurality of data registers at the certain timing to the (i+1)-th data register at the next timing.

A digital signal processing method comprising:
storing data in a plurality of registers in time series in synchronization with a clock signal; and
performing an operation on the data stored in the plurality of registers at the same timing,
wherein, in the storing step, in accordance with a given instruction, the data stored in a certain register among the plurality of registers at a certain timing of the clock signal is transferred so that it is stored, at the next timing, in another designated register among the plurality of registers.

A digital signal processing method according to claim 5, comprising:
The digital signal processing method, wherein the step of executing the operation is executed in response to a command being given.

A digital signal processing method according to claim 5 or 6,
The plurality of registers are:
Multiple data registers,
A plurality of coefficient registers respectively associated with the plurality of data registers,
In the storing step, a digital signal is stored in each of the plurality of data registers and a coefficient is stored in each of the plurality of coefficient registers in time series in synchronization with the clock signal,
In the step of executing the operation, in response to a product-sum operation instruction, a product-sum operation is performed on time-series data consisting of pairs of the digital signal stored in each of the plurality of data registers and the coefficient stored in the corresponding one of the plurality of coefficient registers at the same timing,
In the storing step, in accordance with the given instruction, the digital signal stored in each of the plurality of data registers at the certain timing is further transferred so that it is stored, at the next timing, in another designated data register among the plurality of data registers.

The digital signal processing method according to claim 7, comprising:
The plurality of data registers include n data registers from the first to the n-th (n is an integer of 2 or more),
In the storing step, the digital signal stored in the i-th data register (i is an integer from 1 to n−1) among the plurality of data registers at the certain timing is transferred to the (i+1)-th data register at the next timing.