JP2000172674A

JP2000172674A - Inverse DCT operation device and inverse DCT operation method

Info

Publication number: JP2000172674A
Application number: JP10345710A
Authority: JP
Inventors: Masanori Ishizuka; 正則石塚; Mitsuhiko Ota; 光彦太田; Tadami Kono; 忠美河野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-12-04
Filing date: 1998-12-04
Publication date: 2000-06-23

Abstract

(57) Abstract

Problem: To provide an inverse DCT operation device and method for performing an inverse DCT operation on input data encoded by a DCT operation or the like, which execute the inverse DCT operation at high speed with a small-scale circuit while meeting the operation accuracy specified by the IEEE.

Solution: For the case where the inverse DCT operation on the input data is decomposed and one-dimensional matrix operations are executed sequentially, the device comprises a serial/parallel conversion unit 4 that converts serial-format coefficient values, obtained after the plurality of coefficient values of the input data have been rearranged, into parallel-format coefficient values, and a product-sum operation unit 1 having a plurality of addition/subtraction units that add or subtract previously stored information using the output of the serial/parallel conversion unit as an address. The plurality of addition/subtraction units in the product-sum operation unit are used again to add or subtract the plurality of values resulting from the product-sum operation, after which parallel/serial conversion is performed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an inverse DCT operation device and an inverse DCT operation method for executing an inverse DCT (inverse discrete cosine transform) operation on data encoded by a DCT (discrete cosine transform) operation, a type of orthogonal transform, in order to decode that encoded data.

[0002] Here, the DCT operation is a type of orthogonal transform that uses a fast Fourier transform to remove redundancy in the spatial-frequency components of analog moving-image data and the like; it refers to a technique for encoding and compressing such moving-image data into digital moving-image data according to a high-efficiency coding scheme such as MPEG (Moving Picture Experts Group). The present invention concerns one approach for executing the inverse DCT operation in a small-scale circuit while satisfying the operation accuracy specified by the IEEE (Institute of Electrical and Electronics Engineers, Inc., a US-based electronics society), so that digital moving-image data compressed in this way can be decoded and the original analog moving-image data reproduced with high accuracy.

[0003]

2. Description of the Related Art

FIG. 20 is a block diagram showing one configuration example of a conventional inverse DCT operation device. Here, the configuration of a one-dimensional inverse DCT operation device using the DA (distributed arithmetic) method, as disclosed for example in Japanese Patent Application Laid-Open No. 4-313157 (applicant: Mitsubishi Electric Corporation, filed on November 5, 1992), is shown as a representative example.

[0004] In the one-dimensional inverse DCT operation device shown in FIG. 20, the inverse DCT operation on input data that has been divided into a plurality of blocks and encoded is decomposed into two one-dimensional 8×8 matrix operations, and decoded output data is obtained by sequentially executing the first-dimension and second-dimension 8×8 matrix operations. More specifically, the device of FIG. 20 includes a data rearrangement circuit 200 that rearranges the order in which the coefficient values of the input data are output. The output side of the data rearrangement circuit 200 is connected to the input side of a product-sum operation unit 300. The product-sum operation unit 300 has a plurality of product-sum operation circuits for executing the one-dimensional 8×8 matrix operation (for example, eight product-sum operation circuits consisting of first to eighth product-sum operation circuits 350-1 to 350-8), and the output sides of these product-sum operation circuits are connected to the input side of a post-processing unit 400. The post-processing unit 400 outputs the decoded output data.

[0005] The flow of inverse DCT processing on encoded input data in an inverse DCT operation device of this configuration is as follows. First, the data rearrangement circuit 200 rearranges the order in which the coefficient values of the encoded input data are output. In principle, for serial-format input data that arrives one word at a time, the circuit aligns all the words and outputs them in parallel format from the least significant bit to the most significant bit, thereby rearranging the serial-format coefficient values of the input data into parallel-format coefficient values.

[0006] Next, the parallel-format coefficient values obtained in this way are stored in advance in a memory or the like in the product-sum operation unit 300. Using the stored parallel-format values as addresses, the values obtained from the memory or the like in the product-sum operation unit 300 are added or subtracted by the first to eighth product-sum operation circuits 350-1 to 350-8, and the product-sum operation on the parallel-format coefficient values is executed by cumulative addition without using a multiplier.

[0007] The point to note here is that, in order to apply parallel/serial conversion to the results of the product-sum operation on the parallel-format coefficient values by the first to eighth product-sum operation circuits and obtain the final decoded output data, post-processing becomes necessary in which the plurality of values obtained as results of the product-sum operation are added to or subtracted from one another. In the inverse DCT operation device of FIG. 20, the post-processing unit 400 performs this post-processing by adding or subtracting the plurality of values obtained as results of the product-sum operation on the parallel-format coefficient values.

[0008]

Problems to be Solved by the Invention

As described above, in the conventional inverse DCT operation device using the DA method, a post-processing unit is provided on the output side of the plurality of product-sum operation circuits, and this post-processing unit adds or subtracts the plurality of values obtained as results of the product-sum operation on the coefficient values by those circuits. However, a circuit such as this post-processing unit is redundant, and providing such an extra circuit needlessly increases the circuit scale of the inverse DCT operation device.

[0009] In particular, when decoding a large volume of moving-image data compressed according to the MPEG standard, a two-dimensional 8×8 inverse DCT operation must be performed on each of a plurality of blocks with the operation accuracy specified by the IEEE. However, in a configuration that executes the product-sum operation with eight cumulative addition units in parallel, as in the conventional inverse DCT operation device, many clock cycles are consumed in meeting that accuracy, so it takes a long time to complete the inverse DCT operation on the moving-image data.

SUMMARY OF THE INVENTION

[0010] The present invention has been made in view of the above problems, and its object is to provide an inverse DCT operation device and an inverse DCT operation method that execute the inverse DCT operation at high speed with a small-scale circuit while satisfying the operation accuracy specified by the IEEE.

[0011]

Means for Solving the Problems

FIG. 1 is a block diagram showing the principle configuration of the present invention. The configuration of the inverse DCT operation device of the present invention is shown here in simplified form. To solve the above problems, the inverse DCT operation device of the present invention, as shown in FIG. 1, is intended for the case where an inverse DCT operation on arbitrary encoded input data is decomposed into at least two one-dimensional matrix operations and a predetermined number of these one-dimensional matrix operations are executed sequentially to decode the input data. The device comprises a serial/parallel conversion unit 4 that converts coefficient values, sent in serial format after the plurality of coefficient values of the input data have been rearranged, into parallel-format coefficient values, and a product-sum operation unit 1 having a plurality of addition/subtraction units that add or subtract previously stored information using the values output from the serial/parallel conversion unit 4 as addresses. The plurality of addition/subtraction units are contained in a plurality of cumulative addition units in the product-sum operation unit 1 (shown in FIG. 1 as first to n-th cumulative addition units 3-1 to 3-n).

[0012] In the inverse DCT operation device of the present invention shown in FIG. 1, the plurality of addition/subtraction units in the product-sum operation unit 1 are used again to add or subtract the plurality of values obtained as results of the product-sum operation by the product-sum operation unit 1, after which a parallel/serial conversion unit 5 performs parallel/serial conversion, thereby decoding the input data.

[0013] Further, the inverse DCT operation device according to a first preferred embodiment of the present invention is intended for the case where an inverse DCT operation on arbitrary encoded input data is decomposed into two one-dimensional matrix operations and a predetermined number of first-dimension and second-dimension matrix operations are executed sequentially to decode the input data. The device comprises: a coefficient value rearrangement unit that rearranges the plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from the coefficient value rearrangement unit into parallel-format coefficient values; a product-sum operation unit having a plurality of addition/subtraction units that add or subtract previously stored information using the values output from the serial/parallel conversion unit as addresses; and a transposition unit that transposes the plurality of values obtained as results of the first-dimension product-sum operation by the product-sum operation unit. The plurality of parallel-format values sent from the transposition unit are input to the product-sum operation unit, which executes the second-dimension product-sum operation on them; the plurality of addition/subtraction units in the product-sum operation unit are then used again to add or subtract the plurality of values obtained as results of the second-dimension product-sum operation, after which parallel/serial conversion is performed to decode the input data.

[0014] Further, the inverse DCT operation device according to a second preferred embodiment of the present invention is intended for the case where an inverse DCT operation on arbitrary encoded input data is decomposed into two one-dimensional matrix operations and a predetermined number of first-dimension and second-dimension matrix operations are executed sequentially to decode the input data. The device comprises: two coefficient value rearrangement units that rearrange the plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from each of the two coefficient value rearrangement units into parallel-format coefficient values; a product-sum operation unit having a plurality of addition/subtraction units that add or subtract previously stored information using the values output from the serial/parallel conversion unit as addresses; and two transposition units that transpose the plurality of values obtained as results of the first-dimension product-sum operation by the product-sum operation unit. The plurality of parallel-format values sent from each of the two transposition units are input to the product-sum operation unit, which executes the second-dimension product-sum operation on them; the plurality of addition/subtraction units in the product-sum operation unit are then used again to add or subtract the plurality of values sent from each of the two transposition units, after which parallel/serial conversion is performed to decode the input data.

[0015] Further, the inverse DCT operation device according to a third preferred embodiment of the present invention is intended for the case where an inverse DCT operation on arbitrary encoded input data is decomposed into two one-dimensional matrix operations and a predetermined number of first-dimension and second-dimension matrix operations are executed sequentially to decode the input data. The device comprises: two coefficient value rearrangement units that rearrange the plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from each of the two coefficient value rearrangement units into parallel-format coefficient values; a first-dimension product-sum operation unit having a plurality of first addition/subtraction units that add or subtract previously stored information using the values output from the serial/parallel conversion unit as addresses; and two transposition units that transpose the plurality of values obtained as results of the first-dimension product-sum operation by the first-dimension product-sum operation unit.

[0016] The inverse DCT operation device according to the third preferred embodiment of the present invention further comprises a second-dimension product-sum operation unit having a plurality of second addition/subtraction units that add or subtract previously stored information using the plurality of parallel-format values sent from each of the two transposition units as addresses. The plurality of second addition/subtraction units in the second-dimension product-sum operation unit are used again to add or subtract the plurality of values obtained as results of the second-dimension product-sum operation by that unit, after which parallel/serial conversion is performed to decode the input data.

[0017] Preferably, in the inverse DCT operation devices according to the first to third preferred embodiments of the present invention, when the product-sum operation unit is initialized to perform the cumulative addition for the first-dimension matrix operation, rounding is applied to the least significant bit of the fractional part of the parallel-format coefficient values. Further, preferably, in the first and second preferred embodiments of the present invention, the first-dimension product-sum operation unit includes a bit holding unit that temporarily holds the parallel-format coefficient values output from the serial/parallel conversion unit while shifting them one bit at a time from the least significant bit toward the most significant bit, and a cumulative addition unit that performs cumulative addition using the plurality of addition/subtraction units. The cumulative addition unit executes the cumulative addition using one bit of the parallel-format coefficient values held in the bit holding unit as an address, together with the information already stored in a value storage unit in the cumulative addition unit and the addition/subtraction units.
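One common way to fold rounding into the initialization step is to preload the accumulator with half of one output LSB, so that the final truncation rounds to nearest. The sketch below illustrates that idea only; whether the device described here rounds in exactly this way is an assumption, and the function name `accumulate_with_rounding` and the parameter `frac_bits` are hypothetical.

```python
def accumulate_with_rounding(terms, frac_bits):
    """Sum fixed-point terms and drop frac_bits fractional bits, rounding half up.

    Assumption: the accumulator is initialized with 2**(frac_bits - 1),
    i.e. half of one output LSB, instead of with zero.
    """
    if frac_bits < 1:
        raise ValueError("frac_bits must be at least 1")
    acc = 1 << (frac_bits - 1)   # initialization value carries the rounding
    for term in terms:
        acc += term              # ordinary cumulative addition
    return acc >> frac_bits      # arithmetic shift truncates toward minus infinity
```

Because the rounding constant is loaded when the accumulator is cleared, no additional adder pass is needed at the end of the accumulation.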

[0018] Further, preferably, in the third preferred embodiment of the present invention, the first-dimension product-sum operation unit includes a first bit holding unit that temporarily holds the parallel-format coefficient values output from the serial/parallel conversion unit while shifting them one bit at a time from the least significant bit toward the most significant bit, and a first cumulative addition unit that performs the cumulative addition for the first-dimension matrix operation using the plurality of first addition/subtraction units. The first cumulative addition unit executes the cumulative addition for the first-dimension matrix operation using one bit of the parallel-format coefficient values held in the first bit holding unit as an address, together with the information already stored in a first value storage unit in the first cumulative addition unit and the first addition/subtraction units. The second-dimension product-sum operation unit includes a second bit holding unit that temporarily holds the plurality of parallel-format values sent from each of the two transposition units while shifting them one bit at a time from the least significant bit toward the most significant bit, and a second cumulative addition unit that performs the cumulative addition for the second-dimension matrix operation using the plurality of second addition/subtraction units. The second cumulative addition unit executes the cumulative addition for the second-dimension matrix operation using one bit of the plurality of parallel-format values held in the second bit holding unit as an address, together with the information already stored in a second value storage unit in the second cumulative addition unit and the second addition/subtraction units.

[0019] Further, preferably, in the second and third preferred embodiments of the present invention, the two coefficient value rearrangement units consist of a first coefficient value rearrangement RAM and a second coefficient value rearrangement RAM that rearrange the plurality of coefficient values of the input data alternately. Before the result of rearranging the coefficient values of one block is read out of the first coefficient value rearrangement RAM, the coefficient values of the next block to arrive are written into the second coefficient value rearrangement RAM, and the same read and write operations are carried out for the coefficient values of the blocks that arrive thereafter.
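The alternating use of two RAMs can be modeled in software as a pair of banks with separate write and read pointers. The following is a minimal sketch, not the circuit of this publication: the class `PingPongRam`, its method names, and the even/odd `order` used in the usage example are all hypothetical.

```python
class PingPongRam:
    """Two banks used alternately: one accepts the next block while the
    other still holds a rearranged block waiting to be read."""

    def __init__(self):
        self._banks = [None, None]
        self._write_bank = 0
        self._read_bank = 0

    def write_block(self, block, order):
        """Store block, rearranged according to order, in the current write bank."""
        if self._banks[self._write_bank] is not None:
            raise RuntimeError("write bank still holds an unread block")
        self._banks[self._write_bank] = [block[i] for i in order]
        self._write_bank ^= 1  # the next block goes to the other bank

    def read_block(self):
        """Return the oldest unread block and free its bank."""
        data = self._banks[self._read_bank]
        if data is None:
            raise RuntimeError("no block is ready to read")
        self._banks[self._read_bank] = None
        self._read_bank ^= 1  # the next read comes from the other bank
        return data
```

For example, with `order = [0, 2, 4, 6, 1, 3, 5, 7]`, a second block can be written before the first is read, after which `read_block()` returns the two blocks in arrival order.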

[0020] Further, preferably, in the second and third preferred embodiments of the present invention, the two transposition units consist of a first transposition RAM and a second transposition RAM that alternately transpose the plurality of values obtained as results of the first-dimension product-sum operation on the coefficient values. Before the result of transposing the values of one block is read out of the first transposition RAM, the values of the next block to arrive are written into the second transposition RAM, and the same read and write operations are carried out for the values of the blocks that arrive thereafter.

[0021] Meanwhile, in the inverse DCT operation method of the present invention, an inverse DCT operation on arbitrary encoded input data is decomposed into at least two one-dimensional matrix operations, and a predetermined number of these one-dimensional matrix operations are executed sequentially to decode the input data. To this end, the method converts coefficient values, sent in serial format after the plurality of coefficient values of the input data have been rearranged, into parallel-format coefficient values; uses a plurality of addition/subtraction units that add or subtract previously stored information with the parallel-format values as addresses; executes the product-sum operation by performing cumulative addition with the previously stored information and the plurality of addition/subtraction units; and uses the plurality of addition/subtraction units again to add or subtract the plurality of values obtained as results of the product-sum operation, after which parallel/serial conversion is performed to decode the input data.

[0022] Next, the theoretical background of the DA-method inverse DCT operation is described below. In this method, which underlies the inverse DCT operation device and inverse DCT operation method of the present invention, the result of a product-sum operation is obtained without a multiplier, by cumulative addition using information already stored in a value storage unit such as a memory and an addition/subtraction unit consisting of adders, subtractors, and the like.

[0023] For example, the inverse DCT operation on input data forming a two-dimensional 8×8 matrix is commonly decomposed into two (8×8 matrix) * (1×8 matrix) operations (that is, two one-dimensional matrix operations), as follows:

(8×8 matrix) * (1×8 matrix) operation (the first-dimension matrix operation)
↓
Transposition of the first-dimension matrix operation result
↓
(8×8 matrix) * (1×8 matrix) operation (the second-dimension matrix operation)
↓
Transposition of the second-dimension matrix operation result
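The four-step flow above can be sketched in floating-point Python to show that two one-dimensional passes with transpositions give the two-dimensional result. This is an illustration only: it assumes the usual orthonormal 8-point inverse DCT matrix and uses ordinary multiplication, and the function names are hypothetical.

```python
import math


def idct_matrix(n=8):
    """Return the n x n 1-D inverse DCT matrix C (orthonormal scaling assumed).

    C[k][m] is the weight of input coefficient m in output sample k.
    """
    c = []
    for k in range(n):
        row = []
        for m in range(n):
            scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
            row.append(scale * math.cos((2 * k + 1) * m * math.pi / (2 * n)))
        c.append(row)
    return c


def matvec(c, x):
    """Multiply matrix c by vector x (one product-sum per output row)."""
    return [sum(c[k][m] * x[m] for m in range(len(x))) for k in range(len(c))]


def transpose(a):
    """Swap rows and columns of a square block."""
    return [list(col) for col in zip(*a)]


def idct_2d(block):
    """2-D inverse DCT as: 1-D pass, transpose, 1-D pass, transpose."""
    c = idct_matrix(len(block))
    first = [matvec(c, row) for row in block]        # first-dimension operation
    first_t = transpose(first)                       # transposition of its result
    second = [matvec(c, row) for row in first_t]     # second-dimension operation
    return transpose(second)                         # final transposition
```

Under this scaling, a block whose only non-zero coefficient is a DC value of 8 decodes to a block of all ones.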

[0024] The product-sum operation for the k-th row of the above (8×8 matrix) * (1×8 matrix) operation is written as equation (1) in [Equation 1] below.

[0025]

[Equation 1]

In equation (1), x (components x_{m,n}) is the input data, C (components C_{km}) is the one-dimensional 8×8 inverse DCT transform matrix, and y is the output data.
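Equation (1) itself is not reproduced in this text. A form consistent with the definitions above, offered as an assumption rather than as the published formula, is the k-th row of a matrix-vector product, where x_m denotes the m-th component of the input vector:

```latex
y_k = \sum_{m=0}^{7} C_{km}\, x_m , \qquad k = 0, 1, \dots, 7 \tag{1}
```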

[0026] A method of executing the above product-sum operation without using a multiplier is shown below. When x is written in N-bit binary representation, it takes the form of equation (2) in [Equation 2] below.

[0027]

[Equation 2]

Substituting equation (2) into equation (1) yields equation (3) in [Equation 3] below.
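Equations (2) and (3) are not reproduced in this text. Assuming that each input component x_m is an N-bit two's-complement integer whose n-th bit is x_{m,n} (the number format is an assumption; the text says only "N-bit binary representation"), and that equation (1) is the product-sum y_k = \sum_m C_{km} x_m, they could read:

```latex
\begin{align}
x_m &= -x_{m,N-1}\, 2^{N-1} + \sum_{n=0}^{N-2} x_{m,n}\, 2^{n} \tag{2} \\
y_k &= \sum_{m=0}^{7} C_{km} \left( -x_{m,N-1}\, 2^{N-1} + \sum_{n=0}^{N-2} x_{m,n}\, 2^{n} \right) \tag{3}
\end{align}
```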

[0028]

[Equation 3]

That is, the result is expressed as equation (4) in [Equation 4] below.

[0029]

[Equation 4]

Here, R (components R_k) can be implemented as a memory addressed by the values of the n-th bit of each component of the input data x (8 bits in total).
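The lookup-based evaluation can be sketched in Python as follows. This is an illustration under stated assumptions, not the circuit of this publication: the inputs are taken to be N-bit two's-complement integers, the coefficients are fixed-point integers, and the names `build_lut` and `da_dot` are hypothetical. For each bit position n, the n-th bits of the eight inputs form an 8-bit address; the table entry at that address is shifted by n and accumulated, and the term for the sign bit is subtracted instead of added.

```python
def build_lut(coeffs):
    """Return table R where R[a] is the sum of coeffs[m] over the set bits m of a."""
    n = len(coeffs)
    return [
        sum(coeffs[m] for m in range(n) if (address >> m) & 1)
        for address in range(1 << n)
    ]


def da_dot(coeffs, xs, nbits):
    """Compute sum(coeffs[m] * xs[m]) using table lookups, shifts and adds only.

    xs must be signed integers representable in nbits-bit two's complement.
    """
    lut = build_lut(coeffs)
    mask = (1 << nbits) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns
    acc = 0
    for n in range(nbits):
        # Gather bit n of every input word into one table address.
        address = 0
        for m, word in enumerate(words):
            address |= ((word >> n) & 1) << m
        term = lut[address] << n
        if n == nbits - 1:
            acc -= term  # the sign bit carries negative weight
        else:
            acc += term
    return acc
```

With eight inputs, the table built by `build_lut` has 2^8 = 256 entries for each output row.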

[0030] Written out in components, the matrix operation used in equation (1) above takes the form of equation (5) in [Equation 5] below.

[0031]

[Equation 5]

If the matrix operation of equation (5) is executed as is, R_k requires 2^8 = 256 words. Since k ranges from 0 to 7, a total of 2048 words is required.

[0032] By exploiting the symmetry of the transform matrix C, the product-sum operation can be decomposed as shown in equations (6) and (7) in [Equation 6] and [Equation 7] below. First, the two kinds of product-sum operations of equations (6) and (7) are performed.

[0033]

[Equation 6]

[0034]

[Equation 7]

Next, as shown in equation (8) in [Equation 8] below, the output data y (components y_0 to y_7) is obtained by adding and subtracting the two kinds of components s (components s_0 to s_3) and t (components t_0 to t_3).
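The split into two half-size product-sums followed by additions and subtractions can be checked numerically. The sketch below assumes the usual orthonormal 8-point inverse DCT matrix, C_{km} = c_m cos((2k+1)mπ/16), for which C_{7-k,m} = (-1)^m C_{km}; under that assumption s collects the even-indexed inputs and t the odd-indexed ones. Equations (6) to (8) are not reproduced in this text, so this grouping and the function names are assumptions.

```python
import math


def idct_coeff(k, m, n=8):
    """Entry C[k][m] of the orthonormal n-point inverse DCT matrix."""
    scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * k + 1) * m * math.pi / (2 * n))


def idct_1d_direct(x):
    """Full 8 x 8 product-sum: every output uses all eight inputs."""
    return [sum(idct_coeff(k, m) * x[m] for m in range(8)) for k in range(8)]


def idct_1d_split(x):
    """Two 4-input product-sums (s and t), then one add and one subtract per pair."""
    s = [sum(idct_coeff(k, m) * x[m] for m in (0, 2, 4, 6)) for k in range(4)]
    t = [sum(idct_coeff(k, m) * x[m] for m in (1, 3, 5, 7)) for k in range(4)]
    y = [0.0] * 8
    for k in range(4):
        y[k] = s[k] + t[k]       # upper half of the outputs
        y[7 - k] = s[k] - t[k]   # mirrored lower half
    return y
```

Each of the product-sums for s and t depends on only four inputs, so a lookup table addressed by one bit from each input needs 2^4 = 16 entries instead of 2^8 = 256.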

[0035]

[Equation 8]

When the operation is executed in this way, R_k requires 2^4 = 16 words. Since k ranges from 0 to 7, a total of 64 words is required. In other words, decomposing the matrix operation as described above reduces the number of words needed to 1/16 of the number needed when the matrix operation of equation (5) is executed as is.

[0036] In the configuration of the conventional inverse DCT operation device, the plurality of cumulative addition units execute the operations of equations (6) and (7), and a plurality of adders and subtractors were provided to execute the operation of equation (8) afterwards. In contrast, in the inverse DCT operation device (and inverse DCT operation method) of the present invention, as explained with reference to the principle diagram of FIG. 1, the addition/subtraction units in the plurality of cumulative addition units are used again to execute the operation of equation (8), so the plurality of adders and subtractors that were conventionally provided are no longer needed.

【００３７】かくして、本発明の逆ＤＣＴ演算装置およ
び逆ＤＣＴ演算方法によれば、積和演算の後処理用の複
数の加算器や減算器等の余計な回路が必要なくなるの
で、従来よりも小規模な回路でかつ少ないクロックサイ
クル数で少なくとも２次元の逆ＤＣＴ演算を実行するこ
とが可能になる。さらに、本発明の第１の好ましい実施
態様に係る逆ＤＣＴ演算装置によれば、複数の累積加算
部内の加減算部を再度使用して積和演算の後処理を行う
と共に、１次元目および２次元目の８×８行列等の積和
演算を行う積和演算部に対し１個の係数値並べ替えＲＡ
Ｍと１個の転置ＲＡＭ等を設けるようにしているので、
ＩＥＥＥにより規定される２次元の逆ＤＣＴ演算精度を
満たしつつ、従来よりも小規模な回路でかつ少ないクロ
ックサイクル数で効率の良いパイプライン処理を行うこ
とが可能になる。Thus, according to the inverse DCT operation device and inverse DCT operation method of the present invention, extra circuits such as the adders and subtractors for post-processing the product-sum operation are no longer needed, so an inverse DCT operation of at least two dimensions can be executed with a smaller circuit and in fewer clock cycles than before. Furthermore, in the inverse DCT operation device according to the first preferred embodiment of the present invention, the addition/subtraction units within the cumulative addition units are used again for post-processing the product-sum operation, and one coefficient value rearranging RAM, one transposition RAM, and so on are provided for the product-sum operation unit that performs the first-dimension and second-dimension product-sum operations on 8 × 8 matrices or the like. This makes efficient pipeline processing possible with a smaller circuit and in fewer clock cycles than before, while satisfying the two-dimensional inverse DCT operation accuracy specified by the IEEE.

【００３８】さらに、本発明の第２の好ましい実施態様
に係る逆ＤＣＴ演算装置によれば、複数の累積加算部内
の加減算部を再度使用して積和演算の後処理を行うと共
に、１次元目および２次元目の８×８行列等の積和演算
を行う積和演算部に対し２個の係数値並べ替えＲＡＭと
２個の転置ＲＡＭ等を設けているので、ＩＥＥＥにより
規定される２次元の逆ＤＣＴ演算精度を満たしつつ、従
来よりも小規模な回路で、かつ、第１の実施態様の場合
よりも少ないクロックサイクル数で非常に効率の良いパ
イプライン処理を行うことが可能になる。Further, in the inverse DCT operation device according to the second preferred embodiment of the present invention, the addition/subtraction units within the cumulative addition units are used again for post-processing the product-sum operation, and two coefficient value rearranging RAMs, two transposition RAMs, and so on are provided for the product-sum operation unit that performs the first-dimension and second-dimension product-sum operations on 8 × 8 matrices or the like. This makes highly efficient pipeline processing possible with a smaller circuit than before and in fewer clock cycles than in the first embodiment, while satisfying the two-dimensional inverse DCT operation accuracy specified by the IEEE.

【００３９】さらに、本発明の第３の好ましい実施態様
に係る逆ＤＣＴ演算装置によれば、複数の累積加算部内
の加減算部を再度使用して積和演算の後処理を行うと共
に、１次元目および２次元目の８×８行列等の積和演算
をそれぞれ別個に行う２個の積和演算部に対し２個の係
数値並べ替えＲＡＭと２個の転置ＲＡＭ等を設けている
ので、ＩＥＥＥにより規定される２次元の逆ＤＣＴ演算
精度を満たしつつ、従来よりも小規模な回路で、かつ、
第２の実施態様の場合よりも少ないクロックサイクル数
で非常に効率の良いパイプライン処理を行うことが可能
になる。Further, in the inverse DCT operation device according to the third preferred embodiment of the present invention, the addition/subtraction units within the cumulative addition units are used again for post-processing the product-sum operation, and two coefficient value rearranging RAMs, two transposition RAMs, and so on are provided for two product-sum operation units that separately perform the first-dimension and second-dimension product-sum operations on 8 × 8 matrices or the like. This makes highly efficient pipeline processing possible with a smaller circuit than before and in fewer clock cycles than in the second embodiment, while satisfying the two-dimensional inverse DCT operation accuracy specified by the IEEE.

【００４０】[0040]

【発明の実施の形態】以下、添付図面（図２〜図１９）
を参照しながら、本発明の好ましい実施例の構成を説明
する。図２は、本発明の逆ＤＣＴ演算装置に適用される
積和演算部の具体的な構成を示すブロック図である。な
お、これ以降、前述した構成要素と同様のものについて
は、同一の参照番号を付して表すこととする。DESCRIPTION OF THE EMBODIMENTS The configuration of preferred embodiments of the present invention is described below with reference to the accompanying drawings (FIGS. 2 to 19). FIG. 2 is a block diagram showing a specific configuration of the product-sum operation unit applied to the inverse DCT operation device of the present invention. Hereinafter, components that are the same as those already described are denoted by the same reference numerals.

【００４１】図２においては、本発明の逆ＤＣＴ演算装
置に適用される積和演算部１（図１参照）は、シリアル
／パラレル変換部４から出力されるパラレル形式の係数
値を一時的に保持するためのビット保持部（図１参照）
として機能する１６個のシフトレジスタ（図２では、第
１のシフトレジスタ〜第１６のシフトレジスタ２０−１
〜２０−１６として示す）と、１６個の累積加算部（図
２では、第１の累積加算部〜第１６の累積加算部３０−
１〜３０−１６として示す）とを備えている。さらに、
これらの累積加算部は、上記１６個のシフトレジスタに
保持されている値の１ビットをアドレスとして、上記累
積加算部内の値記憶部（後述の図１８参照）に既に記憶
されている情報と、上記累積加算部内の加算器や減算器
からなる複数の加減算部（後述の図１８参照）とを用い
て累積加算を実行する。In FIG. 2, the product-sum operation unit 1 (see FIG. 1) applied to the inverse DCT operation device of the present invention comprises 16 shift registers (shown in FIG. 2 as the first to sixteenth shift registers 20-1 to 20-16), which function as the bit holding unit (see FIG. 1) that temporarily holds the parallel-format coefficient values output from the serial/parallel conversion unit 4, and 16 cumulative addition units (shown in FIG. 2 as the first to sixteenth cumulative addition units 30-1 to 30-16). These cumulative addition units execute cumulative addition by using one bit of the values held in the 16 shift registers as an address, together with the information already stored in a value storage unit within the cumulative addition unit (see FIG. 18, described later) and a plurality of addition/subtraction units made up of the adders and subtractors within the cumulative addition unit (see FIG. 18, described later).

【００４２】上記のような構成の積和演算部、シリアル
／パラレル変換部４およびパラレル／シリアル変換部５
により、積和演算回路１０が構成される。さらに詳しく
説明すると、上記の第１〜第１６のシフトレジスタ２０
−１〜２０−１６は、シリアル／パラレル変換部４から
送られてくる係数値を受け取った後に、最下位のビット
から最上位のビットに向かって１ビットずつシフトしな
がら上記の第１〜第１６の累積加算部３０−１〜３０−
１６に上記係数値を供給する。The product-sum operation unit configured as above, the serial/parallel conversion unit 4, and the parallel/serial conversion unit 5 together constitute the product-sum operation circuit 10. More specifically, after receiving the coefficient values sent from the serial/parallel conversion unit 4, the first to sixteenth shift registers 20-1 to 20-16 supply those coefficient values to the first to sixteenth cumulative addition units 30-1 to 30-16 while shifting one bit at a time from the least significant bit toward the most significant bit.

【００４３】これらの第１〜第１６の累積加算部では、
第１〜第１６のシフトレジスタから出力される値の中で
前述の式（８）の成分ｓ₀〜ｓ₃または成分ｔ₀〜ｔ₃
に対応する４ビットの係数値をアドレスとして、上記累
積加算部内の値記憶部に既に記憶されている値を参照す
る。さらに、上記累積加算部では、第１〜第１６のシフ
トレジスタの値を１ビットシフトする度に値記憶部より
参照される参照値を累積して加算していく。このような
累積加算を実行する際は、累積された値（すなわち、１
サイクル前の加算の結果）を１ビット右にシフトした値
（すなわち、１／２倍した値）と参照値とを加算する。
上記の累積加算は、前述の式（６）または式（７）の演
算に対応している。In the first to sixteenth cumulative addition units, the 4-bit coefficient value that corresponds to components s₀ to s₃ or components t₀ to t₃ of equation (8), taken from the values output by the first to sixteenth shift registers, is used as an address to look up a value already stored in the value storage unit within the cumulative addition unit. Each time the values of the first to sixteenth shift registers are shifted by one bit, the cumulative addition unit accumulates the reference value read from the value storage unit. In this cumulative addition, the accumulated value (that is, the result of the addition one cycle earlier) is shifted right by one bit (that is, multiplied by 1/2) and then added to the reference value. This cumulative addition corresponds to the operation of equation (6) or equation (7).
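The bit-serial lookup-and-accumulate scheme of paragraph [0043] can be sketched as follows. This is an illustrative model only: the names `build_table` and `da_dot` are hypothetical, exact rational arithmetic stands in for the fixed-point hardware, and the subtraction on the sign bit is the usual two's-complement treatment, which the text above does not spell out and is assumed here.

```python
from fractions import Fraction

def build_table(coeffs):
    """Word `a` holds the sum of coeffs[i] over every bit i that is set in address a."""
    n = len(coeffs)
    return [sum(coeffs[i] for i in range(n) if (a >> i) & 1)
            for a in range(1 << n)]

def da_dot(coeffs, xs, bits):
    """Dot product of coeffs and xs, where xs are two's-complement ints of width `bits`."""
    table = build_table(coeffs)            # 2**4 = 16 words for four inputs
    mask = (1 << bits) - 1
    regs = [x & mask for x in xs]          # contents of the shift registers
    acc = Fraction(0)
    for b in range(bits):                  # least significant bit first
        # One bit from each register forms the table address.
        addr = sum(((r >> b) & 1) << i for i, r in enumerate(regs))
        ref = table[addr]                  # reference value read from storage
        if b == bits - 1:
            acc = acc / 2 - ref            # sign bit has negative weight (assumption)
        else:
            acc = acc / 2 + ref            # previous sum shifted right, plus reference
    return acc * (1 << (bits - 1))         # undo the accumulated right shifts
```

For example, `da_dot([3, -5, 7, 2], [5, -3, 7, -8], 4)` performs four lookup cycles and no multiplications inside the loop.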

【００４４】第１〜第１６のシフトレジスタ１−１６か
ら出力される値を全てシフトした時点で、第１〜第１６
の累積加算部内の加算器や減算器を再度使用して、累積
した値同士を加算または減算し、１６種類の積和演算の
結果を同時に得ることができる。このような加算または
減算は、前述の式（８）の演算に対応している。最終的
に、パラレル／シリアル変換部５は、第１〜第１６の累
積加算部から送られてくる１６種類の積和演算の結果に
それぞれ対応する１６個の値を、シリアル形式の値に変
換し、復号化された出力データとして順次出力する。Once all the bits of the values held in the first to sixteenth shift registers 20-1 to 20-16 have been shifted out, the adders and subtractors within the first to sixteenth cumulative addition units are used again to add or subtract the accumulated values to or from one another, so that the results of 16 kinds of product-sum operations are obtained simultaneously. This addition or subtraction corresponds to the operation of equation (8). Finally, the parallel/serial conversion unit 5 converts the 16 values corresponding to the 16 product-sum results sent from the first to sixteenth cumulative addition units into serial-format values and outputs them sequentially as decoded output data.

【００４５】上記のような構成により、１６個の累積加
算部内で入力データのパイプライン処理等の並列処理を
行うことによって、ＩＥＥＥにより規定される演算精度
を満たしつつ、従来よりも少ないクロックサイクル数で
１次元および２次元の８×８行列の逆ＤＣＴ演算を実行
することができるようになり、累積加算部内の加算器や
減算器を再度使用することで回路規模も節減される。With the above configuration, performing parallel processing such as pipeline processing of the input data within the 16 cumulative addition units makes it possible to execute one-dimensional and two-dimensional 8 × 8 inverse DCT operations in fewer clock cycles than before while satisfying the operation accuracy specified by the IEEE, and using the adders and subtractors within the cumulative addition units a second time also reduces the circuit scale.

【００４６】図３は、本発明の第１の実施例の構成を示
すブロック図である。図３の第１の実施例に係る逆ＤＣ
Ｔ演算装置においては、１次元目および２次元目の８×
８行列の行列演算を順次実行して１６種類の積和演算の
結果を得るために、図２に示したような構成の積和演算
回路１０が設けられている。この積和演算回路１０は、
前述したように、入力データに関連するパラレル形式の
係数値を一時的に保持する１６個のシフトレジスタと、
これらのシフトレジスタから出力される値と既に記憶さ
れている値を使用して累積加算を実行する１６個の累積
加算部とを備えている。FIG. 3 is a block diagram showing the configuration of the first embodiment of the present invention. The inverse DCT operation device according to the first embodiment in FIG. 3 is provided with the product-sum operation circuit 10 configured as shown in FIG. 2 in order to sequentially execute the first-dimension and second-dimension 8 × 8 matrix operations and obtain the results of 16 kinds of product-sum operations. As described above, this product-sum operation circuit 10 comprises 16 shift registers that temporarily hold the parallel-format coefficient values derived from the input data, and 16 cumulative addition units that execute cumulative addition using the values output from these shift registers and the values already stored.

【００４７】さらに、上記第１の実施例では、入力デー
タの複数の係数値の並べ替えを行う係数値並べ替え部を
構成する係数値並べ替えＲＡＭ（random access memor
y：ランダムアクセスメモリの略）６と、上記係数値の
積和演算の結果として得られる複数の値に対する転置
（すなわち、行と列を入れ替える操作）を行う転置部を
構成する転置ＲＡＭ８とが設けられている。Further, the first embodiment is provided with a coefficient value rearranging RAM (random access memory) 6, which constitutes a coefficient value rearranging unit that rearranges the coefficient values of the input data, and a transposition RAM 8, which constitutes a transposition unit that transposes (that is, swaps the rows and columns of) the values obtained as the result of the product-sum operation on the coefficient values.

【００４８】係数値並べ替えＲＡＭ６では、積和演算回
路１０による１次元目の積和演算を実行可能にするため
に、入力データのシリアル形式の係数値を書き込んでか
ら、上記係数値を出力する順序の並べ替えを行い、この
並べ替えを行った結果として得られる複数の値を読み出
すようにしている。また一方で、転置ＲＡＭ８では、８
×８行列の２次元目の積和演算を実行可能にするため
に、積和演算回路１０による１次元目の積和演算の結果
として得られる複数の値を書き込んでから、上記複数の
値に対する転置を行い、この転置を行った結果として得
られる複数の値を読み出すようにしている。In the coefficient value rearranging RAM 6, to enable the first-dimension product-sum operation by the product-sum operation circuit 10, the serial-format coefficient values of the input data are written, the order in which they are output is rearranged, and the resulting values are read out. In the transposition RAM 8, to enable the second-dimension product-sum operation on the 8 × 8 matrix, the values obtained as the result of the first-dimension product-sum operation by the product-sum operation circuit 10 are written, transposed, and the resulting values are read out.
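The write-then-read behaviour of the transposition RAM can be sketched with simple address arithmetic. This is a minimal model under stated assumptions: the function name `transpose_via_ram` and the row-major write order are illustrative choices, and the actual addressing used by the transposition RAM 8 is not given in the text.

```python
def transpose_via_ram(values, n=8):
    """Write n*n values in row-major order, then read them back column by column."""
    ram = [None] * (n * n)
    for address, v in enumerate(values):   # write phase: address = row * n + col
        ram[address] = v
    out = []
    for col in range(n):                   # read phase: same address formula,
        for row in range(n):               # but the column is the outer loop
            out.append(ram[row * n + col])
    return out
```

Reading with the loop order swapped is what exchanges rows and columns; applying the function twice returns the original order.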

【００４９】さらに、図３の第１の実施例では、係数値
並べ替えＲＡＭ６と積和演算回路１０との間に、マルチ
プレクサ（ＭＰＸ）７を設けている。このマルチプレク
サ７は、積和演算回路１０内で１次元目の積和演算また
は２次元目の積和演算のいずれか一方を実行させるため
に、係数値並べ替えＲＡＭ６部から送出されるシリアル
形式の複数の値をパラレル形式の複数の値に変換した
値、または、転置ＲＡＭ８からパラレル形式で送出され
る複数の値のいずれか一方を選択して積和演算回路１０
に入力する機能を有する。Further, in the first embodiment of FIG. 3, a multiplexer (MPX) 7 is provided between the coefficient value rearranging RAM 6 and the product-sum operation circuit 10. To have the product-sum operation circuit 10 execute either the first-dimension or the second-dimension product-sum operation, the multiplexer 7 selects either the values obtained by converting the serial-format values sent from the coefficient value rearranging RAM 6 into parallel format, or the values sent in parallel format from the transposition RAM 8, and inputs the selected values to the product-sum operation circuit 10.

【００５０】このような構成の逆ＤＣＴ演算装置におい
ては、積和演算回路１０内で２次元目の積和演算を行っ
た後に、１６個の累積加算部内の複数の加減算部を再度
使用し、２次元目の積和演算の結果として得られる複数
の値同士を加算または減算してからパラレル／シリアル
変換を行うことにより、復号化された出力データが得ら
れる。In the inverse DCT operation device configured in this way, after the second-dimension product-sum operation is performed in the product-sum operation circuit 10, the addition/subtraction units within the 16 cumulative addition units are used again to add or subtract the values obtained as the result of the second-dimension product-sum operation, and parallel/serial conversion is then performed to obtain the decoded output data.

【００５１】さらに詳しく説明すると、上記第１の実施
例において入力データの８×８行列の逆ＤＣＴ演算を実
行する場合、まず第１に、図３の係数値並べ替えＲＡＭ
６からシリアル形式で送られてくる係数値を、パラレル
形式の係数値に変換する処理を行う。このようなシリア
ル／パラレル変換の処理は、図１のシリアル／パラレル
変換部４により行われる（後述の「係数ロード」）。More specifically, when the first embodiment executes the inverse DCT operation on an 8 × 8 matrix of input data, the first step is to convert the coefficient values sent in serial format from the coefficient value rearranging RAM 6 of FIG. 3 into parallel-format coefficient values. This serial/parallel conversion is performed by the serial/parallel conversion unit 4 of FIG. 1 ("coefficient load," described later).

【００５２】第２に、シリアル／パラレル変換部４から
出力されるパラレル形式の係数値は、図２の１６個のシ
フトレジスタ２０−１〜２０−１６にロードされる。こ
れらのシフトレジスタ２０−１〜２０−１６は、最下位
のビットから最上位のビットに向かって１ビットずつシ
フトしながら、図２の１６個の累積加算部３０−１〜３
０−１６に上記係数値を供給する。Second, the parallel-format coefficient values output from the serial/parallel conversion unit 4 are loaded into the 16 shift registers 20-1 to 20-16 of FIG. 2. These shift registers 20-1 to 20-16 supply the coefficient values to the 16 cumulative addition units 30-1 to 30-16 of FIG. 2 while shifting one bit at a time from the least significant bit toward the most significant bit.

【００５３】これらの１６個の累積加算部では、１６個
のシフトレジスタから出力される値の中の４個の値（４
ビット）をアドレスとして、上記累積加算部内の値記憶
部に既に記憶されている値を参照し、参照値を得る。さ
らに、上記の１６個の累積加算部では、１６個のシフト
レジスタの値を１ビットずつシフトする度に参照される
参照値を累積して加算していく。このような累積加算を
実行する際は、累積された値を１ビット右にシフトした
値と参照値とを加算する。上記の１６個のシフトレジス
タから出力される値を全てシフトした時点で、１６個の
累積加算部内の加算器や減算器等の加減算部を再度使用
して、累積した値同士を加算または減算し、１６種類の
１次元目の積和演算の結果を同時に得ることができる
（後述の「累積加算」）。In these 16 cumulative addition units, four of the values output from the 16 shift registers (4 bits) are used as an address to look up a value already stored in the value storage unit within the cumulative addition unit, which yields a reference value. Each time the values of the 16 shift registers are shifted by one bit, the 16 cumulative addition units accumulate the reference value that is looked up. In this cumulative addition, the accumulated value shifted right by one bit is added to the reference value. Once all the bits of the values in the 16 shift registers have been shifted out, the addition/subtraction units, such as the adders and subtractors within the 16 cumulative addition units, are used again to add or subtract the accumulated values to or from one another, so that the results of 16 kinds of first-dimension product-sum operations are obtained simultaneously ("cumulative addition," described later).

【００５４】第３に、上記の１次元目の積和演算の結果
として得られる複数の値を転置ＲＡＭ８に入力し、行と
列を入れ替える転置の操作を行ってから、上記複数の値
を積和演算回路１０内の１６個のシフトレジスタに再度
入力する。この積和演算回路１０では、１６個の累積加
算部内の値記憶部に既に記憶されている値と、１６個の
シフトレジスタから出力される値を使用して２次元目の
積和演算を行う。さらに、１６個の累積加算部内の加算
部や減算部を再度使用し、２次元目の積和演算の結果と
して得られる複数の値同士を加算または減算してからパ
ラレル／シリアル変換を行うことにより、復号化された
シリアル形式の出力データが得られる（後述の「結果出
力」）。Third, the values obtained as the result of the first-dimension product-sum operation are input to the transposition RAM 8, transposed so that rows and columns are swapped, and then input again to the 16 shift registers in the product-sum operation circuit 10. The product-sum operation circuit 10 performs the second-dimension product-sum operation using the values already stored in the value storage units within the 16 cumulative addition units and the values output from the 16 shift registers. The adders and subtractors within the 16 cumulative addition units are then used again to add or subtract the values obtained as the result of the second-dimension product-sum operation, and parallel/serial conversion is performed to obtain the decoded output data in serial format ("result output," described later).
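The three steps above (first-dimension pass, transposition, second-dimension pass) amount to a row-column decomposition of the two-dimensional inverse DCT. The following sketch checks that decomposition numerically against the direct double sum. It assumes the standard orthonormal inverse DCT; all function names are hypothetical, and the final transposition is included only so that the result can be compared element by element with the direct formula.

```python
import math

def scale(u, n):
    # Normalisation of the orthonormal inverse DCT (assumed form).
    return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)

def idct_1d(x):
    n = len(x)
    return [sum(scale(u, n) * x[u] * math.cos((2 * k + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for k in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct_2d_row_column(block):
    first = [idct_1d(row) for row in block]               # first-dimension pass
    second = [idct_1d(row) for row in transpose(first)]   # transpose, second pass
    return transpose(second)                              # restore row/column order

def idct_2d_direct(block):
    n = len(block)
    return [[sum(scale(u, n) * scale(v, n) * block[u][v]
                 * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                 for u in range(n) for v in range(n))
             for j in range(n)]
            for i in range(n)]
```

The row-column form needs 16 one-dimensional transforms of length 8 instead of a 64-term sum for each of the 64 outputs, which is why a single one-dimensional product-sum circuit can be shared between the two passes.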

【００５５】上記第１の実施例では、複数の係数値に対
し並列処理を行うことによって、ＩＥＥＥにより規定さ
れる２次元の８×８行列の逆ＤＣＴ演算精度を満たしつ
つ、比較的少ないクロックサイクル数で効率の良いパイ
プライン処理を実行することが可能になる。さらに、上
記第１の実施例では、２次元目の積和演算の後処理を行
う際に累積加算部内の加算部や減算部を利用するように
しているので、従来設けていた加算部や減算部が必要な
くなり、回路規模が少なくて済む。In the first embodiment, processing the coefficient values in parallel makes efficient pipeline processing possible in a relatively small number of clock cycles while satisfying the IEEE-specified operation accuracy for the two-dimensional 8 × 8 inverse DCT. Moreover, because the adders and subtractors within the cumulative addition units are used for the post-processing of the second-dimension product-sum operation, the conventionally provided adders and subtractors are no longer needed and the circuit scale is reduced.

【００５６】さらに、上記第１の実施例では、１次元目
の行列演算に係る累積加算を行うために積和演算回路内
の累積加算部を初期化するときに、上記パラレル形式の
係数値の小数部における最下位のビットの丸め処理（四
捨五入）を行うようにしている。このようにすれば、２
次元目の累積加算の際に係数値のビット数を少なくする
ことができるので、ＩＥＥＥにより規定される逆ＤＣＴ
演算精度を満たしつつ、最小限のクロックサイクル数で
パイプライン処理を実行することができる。Further, in the first embodiment, when the cumulative addition units in the product-sum operation circuit are initialized for the cumulative addition of the first-dimension matrix operation, the least significant bit of the fractional part of the parallel-format coefficient values is rounded (rounded half up). This reduces the number of bits of the coefficient values in the second-dimension cumulative addition, so pipeline processing can be executed in the minimum number of clock cycles while satisfying the inverse DCT operation accuracy specified by the IEEE.
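One common way to round at initialization time is to preload the accumulator with half of the smallest retained bit, so that the later truncation rounds half up. The sketch below illustrates that idea with integers; it is an assumption about how such rounding could be realized, not a description of the circuit, and the names `truncate` and `round_by_preload` are hypothetical.

```python
def truncate(value, drop_bits):
    # Arithmetic right shift: discards the low bits (floor division by 2**drop_bits).
    return value >> drop_bits

def round_by_preload(value, drop_bits):
    half = 1 << (drop_bits - 1)   # rounding constant: half of the kept LSB
    acc = half                    # accumulator initialised with the constant
    acc += value                  # the accumulation itself
    return acc >> drop_bits       # truncation now behaves as round half up
```

With two fractional bits dropped, 11 (that is, 2.75) truncates to 2 but rounds to 3, and no separate rounding adder is needed after the accumulation.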

【００５７】図４は、本発明の第２の実施例の構成を示
すブロック図である。図４の第２の実施例に係る逆ＤＣ
Ｔ演算装置においても、前述の第１の実施例の場合と同
じように、１次元目および２次元目の８×８行列の行列
演算を順次実行して１６種類の積和演算の結果を得るた
めに、図２に示したような構成の積和演算回路１０が設
けられている。FIG. 4 is a block diagram showing the configuration of the second embodiment of the present invention. As in the first embodiment described above, the inverse DCT operation device according to the second embodiment in FIG. 4 is provided with the product-sum operation circuit 10 configured as shown in FIG. 2 in order to sequentially execute the first-dimension and second-dimension 8 × 8 matrix operations and obtain the results of 16 kinds of product-sum operations.

【００５８】さらに、上記第２の実施例では、複数の係
数値を構成する複数のブロックに対し交互に並べ替えを
行うための２個の係数値並べ替えＲＡＭ（図４では、第
１の係数値並べ替えＲＡＭ６１および第２の係数値並べ
替えＲＡＭ６２として示す）と、上記積和演算の結果と
して得られる複数の値のブロックに対する転置を交互に
行うための２個の転置ＲＡＭ（図４では、第１の転置Ｒ
ＡＭ８１および第２の転置ＲＡＭ８２として示す）とが
設けられている。Further, the second embodiment is provided with two coefficient value rearranging RAMs (shown in FIG. 4 as the first coefficient value rearranging RAM 61 and the second coefficient value rearranging RAM 62) for alternately rearranging the blocks that make up the coefficient values, and two transposition RAMs (shown in FIG. 4 as the first transposition RAM 81 and the second transposition RAM 82) for alternately transposing the blocks of values obtained as the result of the product-sum operation.

【００５９】さらに、図４の第２の実施例では、２個の
係数値並べ替えＲＡＭと積和演算回路１０との間に、マ
ルチプレクサ７１を設けている。このマルチプレクサ７
１は、積和演算回路１０内で１次元目の積和演算または
２次元目の積和演算のいずれか一方を実行させるため
に、２個の係数値並べ替えＲＡＭから送出されるシリア
ル形式の複数の値をパラレル形式の複数の値に変換した
値、または、２個の転置ＲＡＭからパラレル形式で送出
される複数の値のいずれか一方を選択して積和演算回路
１０に入力する機能を有する。Further, in the second embodiment of FIG. 4, a multiplexer 71 is provided between the two coefficient value rearranging RAMs and the product-sum operation circuit 10. To have the product-sum operation circuit 10 execute either the first-dimension or the second-dimension product-sum operation, the multiplexer 71 selects either the values obtained by converting the serial-format values sent from the two coefficient value rearranging RAMs into parallel format, or the values sent in parallel format from the two transposition RAMs, and inputs the selected values to the product-sum operation circuit 10.

【００６０】このような構成の逆ＤＣＴ演算装置におい
ては、積和演算回路１０内で２次元目の積和演算を行っ
た後に、１６個の累積加算部内の複数の加減算部を再度
使用し、２次元目の積和演算の結果として得られる複数
の値同士を加算または減算してからパラレル／シリアル
変換を行うことにより、復号化された出力データが得ら
れる。In the inverse DCT operation device configured in this way, after the second-dimension product-sum operation is performed in the product-sum operation circuit 10, the addition/subtraction units within the 16 cumulative addition units are used again to add or subtract the values obtained as the result of the second-dimension product-sum operation, and parallel/serial conversion is then performed to obtain the decoded output data.

【００６１】上記の第１および第２の係数値並べ替えＲ
ＡＭ６１、６２では、第１の係数値並べ替えＲＡＭ６１
において一つのブロックの係数値の並べ替えを行った結
果の読み出し動作が実行される前に、上記第２の係数値
並べ替えＲＡＭ６２において次に送られてくる他のブロ
ックの係数値の書き込み動作が実行される。さらに、こ
れ以降に送られてくるブロックの係数値に対し、同様の
読み出し動作および書き込み動作が順次実行される。こ
の場合は、一方の係数値並べ替えＲＡＭの読み出し動作
を実行している期間に、他方の係数値並べ替えＲＡＭの
書き込み動作を実行することができるので、前述の第１
の実施例の場合よりも高速にて係数値の並べ替えを実行
することができる。In the first and second coefficient value rearranging RAMs 61 and 62, before the result of rearranging the coefficient values of one block is read from the first coefficient value rearranging RAM 61, the coefficient values of the next block to arrive are written to the second coefficient value rearranging RAM 62. The same read and write operations are then executed in sequence for the coefficient values of the blocks that follow. In this case, one coefficient value rearranging RAM can be written while the other is being read, so the coefficient values can be rearranged faster than in the first embodiment.

【００６２】また一方で、上記の第１および第２の転置
ＲＡＭ８１、８２では、第１の転置ＲＡＭ８１において
一つのブロックの値の転置を行った結果の読み出し動作
が実行される前に、上記第２の転置ＲＡＭ８２において
次に送られてくるブロックの値の書き込み動作が実行さ
れる。さらに、これ以降に送られてくるブロックの値に
対し、同様の読み出し動作および書き込み動作が順次実
行される。この場合は、一方の転置ＲＡＭの読み出し動
作を実行している期間に、他方の転置ＲＡＭの書き込み
動作を実行することができるので、前述の第１の実施例
の場合よりも高速にて１次元目の積和演算後の転置を実
行することができる。Likewise, in the first and second transposition RAMs 81 and 82, before the result of transposing the values of one block is read from the first transposition RAM 81, the values of the next block to arrive are written to the second transposition RAM 82. The same read and write operations are then executed in sequence for the values of the blocks that follow. In this case, one transposition RAM can be written while the other is being read, so the transposition after the first-dimension product-sum operation can be performed faster than in the first embodiment.
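The alternating use of two RAMs described in paragraphs [0061] and [0062] is a double-buffering (ping-pong) arrangement. The sketch below models only the bank alternation at block granularity; the class name `PingPongRam` and its `step` method are hypothetical, and the per-word timing of the real RAMs is not modelled.

```python
class PingPongRam:
    """Two banks: one receives the incoming block while the other is read out."""

    def __init__(self):
        self.banks = [None, None]
        self.write_bank = 0

    def step(self, incoming_block):
        """Write `incoming_block` into one bank and return the block held in the other."""
        outgoing = self.banks[1 - self.write_bank]   # read side
        self.banks[self.write_bank] = incoming_block # write side, same time slot
        self.write_bank = 1 - self.write_bank        # swap roles for the next block
        return outgoing
```

After the first block has been loaded, every call both accepts a new block and delivers the previous one, so the downstream circuit does not wait for a write to finish.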

【００６３】上記第２の実施例では、前述の第１の実施
例の構成に比べて係数値並べ替えＲＡＭと転置ＲＡＭが
各々１個増えているが、第１の実施例の場合よりも少な
いクロックサイクル数で非常に効率の良いパイプライン
処理を実行することが可能になる。図５は、本発明の第
３の実施例の構成を示すブロック図である。ここでは、
入力データの複数の係数値のシリアル／パラレル変換を
行うシリアル／パラレル変換部（例えば、図２参照）
が、１次元目積和演算回路１１に設けられると共に、２
次元目の積和演算の結果として得られる複数の値のパラ
レル／シリアル変換を行うパラレル／シリアル変換部
（例えば、図２参照）が、２次元目積和演算回路１２に
設けられている。In the second embodiment, there is one more coefficient value rearranging RAM and one more transposition RAM than in the configuration of the first embodiment, but highly efficient pipeline processing can be executed in fewer clock cycles than in the first embodiment. FIG. 5 is a block diagram showing the configuration of the third embodiment of the present invention. Here, the serial/parallel conversion unit (see, for example, FIG. 2), which performs serial/parallel conversion of the coefficient values of the input data, is provided in the first-dimension product-sum operation circuit 11, and the parallel/serial conversion unit (see, for example, FIG. 2), which performs parallel/serial conversion of the values obtained as the result of the second-dimension product-sum operation, is provided in the second-dimension product-sum operation circuit 12.

【００６４】図５の第３の実施例に係る逆ＤＣＴ演算装
置においては、１次元目の８×８行列の積和演算を実行
する１次元目積和演算回路１１と、２次元目の８×８行
列の積和演算を実行する２次元目積和演算回路１２とが
別個に設けられている。ただし、これらの１次元目積和
演算部１１および２次元目積和演算回路１２の各々の構
成は、前述の図２の積和演算回路１０の構成と実質的に
同じである。In the inverse DCT operation device according to the third embodiment in FIG. 5, a first-dimension product-sum operation circuit 11, which executes the first-dimension product-sum operation on the 8 × 8 matrix, and a second-dimension product-sum operation circuit 12, which executes the second-dimension product-sum operation on the 8 × 8 matrix, are provided separately. The configuration of each of the first-dimension product-sum operation circuit 11 and the second-dimension product-sum operation circuit 12 is, however, substantially the same as that of the product-sum operation circuit 10 of FIG. 2 described above.

【００６５】さらに詳しく説明すると、１次元目積和演
算回路１１は、シリアル／パラレル変換部から出力され
るパラレル形式の係数値を一時的に保持するための１６
個のシフトレジスタと、１６個の第１の累積加算部とを
備えている。さらに、これらの累積加算部の各々は、シ
リアル／パラレル変換部から出力される値をアドレスと
して、上記累積加算部内の値記憶部に既に記憶されてい
る情報と、上記累積加算部内の加算部や減算部からなる
複数の加減算部とを用いて１次元目の累積加算を実行す
る。More specifically, the first-dimension product-sum operation circuit 11 comprises 16 shift registers for temporarily holding the parallel-format coefficient values output from the serial/parallel conversion unit, and 16 first cumulative addition units. Each of these cumulative addition units executes the first-dimension cumulative addition by using the value output from the serial/parallel conversion unit as an address, together with the information already stored in the value storage unit within the cumulative addition unit and a plurality of addition/subtraction units made up of the adders and subtractors within the cumulative addition unit.

【００６６】また一方で、２次元目積和演算回路１２
は、２個の転置部８３、８４の各々から送出されるパラ
レル形式の複数の値を一時的に保持するための１６個の
シフトレジスタと、１６個の第２の累積加算部とを備え
ている。さらに、これらの累積加算部の各々は、２個の
転置部の各々から送出されるパラレル形式の複数の値を
アドレスとして、上記累積加算部内の第２の値記憶部に
既に記憶されている情報と、上記累積加算部内の加算部
や減算部からなる複数の第２の加減算部とを用いて２次
元目の累積加算を実行する。Meanwhile, the second-dimension product-sum operation circuit 12 comprises 16 shift registers for temporarily holding the parallel-format values sent from each of the two transposition units 83 and 84, and 16 second cumulative addition units. Each of these cumulative addition units executes the second-dimension cumulative addition by using the parallel-format values sent from each of the two transposition units as an address, together with the information already stored in a second value storage unit within the cumulative addition unit and a plurality of second addition/subtraction units made up of the adders and subtractors within the cumulative addition unit.

【００６７】さらに、上記第３の実施例では、複数の係
数値を構成する複数のブロックに対し交互に並べ替えを
行うための２個の係数値並べ替えＲＡＭ（図５では、第
１の係数値並べ替えＲＡＭ６３および第２の係数値並べ
替えＲＡＭ６４として示す）と、上記係数値の１次元の
積和演算の結果として得られる複数の値のブロックに対
する転置を交互に行うための２個の転置ＲＡＭ（図５で
は、第１の転置ＲＡＭ８３および第２の転置ＲＡＭ８４
として示す）とが設けられている。Further, the third embodiment is provided with two coefficient value rearranging RAMs (shown in FIG. 5 as the first coefficient value rearranging RAM 63 and the second coefficient value rearranging RAM 64) for alternately rearranging the blocks that make up the coefficient values, and two transposition RAMs (shown in FIG. 5 as the first transposition RAM 83 and the second transposition RAM 84) for alternately transposing the blocks of values obtained as the result of the first-dimension product-sum operation on the coefficient values.

【００６８】さらに、図５の第３の実施例では、２個の
係数値並べ替えＲＡＭと１次元目積和演算回路１１との
間に、第１のマルチプレクサ７２を設けている。この第
１のマルチプレクサ７２は、２個の係数値並べ替えＲＡ
Ｍのいずれか一方の係数値並べ替えＲＡＭから送出され
るパラレル形式の複数の値を選択して１次元目積和演算
回路１０に入力する機能を有する。Further, in the third embodiment of FIG. 5, a first multiplexer 72 is provided between the two coefficient value rearranging RAMs and the first-dimension product-sum operation circuit 11. The first multiplexer 72 selects the parallel-format values sent from one of the two coefficient value rearranging RAMs and inputs them to the first-dimension product-sum operation circuit 11.

【００６９】また一方で、図５の第３の実施例では、２
個の第１の転置ＲＡＭ８３および第２の転置ＲＡＭ８４
との間に、第２のマルチプレクサ９２を設けている。こ
の第２のマルチプレクサ９２は、２個の転置ＲＡＭのい
ずれか一方の転置ＲＡＭから送出されるパラレル形式の
複数の値を選択して２次元目積和演算回路１２に入力す
る機能を有する。Meanwhile, in the third embodiment of FIG. 5, a second multiplexer 92 is provided between the two transposition RAMs (the first transposition RAM 83 and the second transposition RAM 84) and the second-dimension product-sum operation circuit 12. The second multiplexer 92 selects the parallel-format values sent from one of the two transposition RAMs and inputs them to the second-dimension product-sum operation circuit 12.

【００７０】このような構成の逆ＤＣＴ演算装置におい
ては、２次元目積和演算回路１２内で２次元目の積和演
算を行った後に、１６個の累積加算部内の複数の加減算
部を再度使用し、２次元目の積和演算の結果として得ら
れる複数の値同士を加算または減算してからパラレル／
シリアル変換を行うことにより、復号化された出力デー
タが得られる。In the inverse DCT operation device configured in this way, after the second-dimension product-sum operation is performed in the second-dimension product-sum operation circuit 12, the addition/subtraction units within the 16 cumulative addition units are used again to add or subtract the values obtained as the result of the second-dimension product-sum operation, and parallel/serial conversion is then performed to obtain the decoded output data.

【００７１】上記の第１および第２の係数値並べ替えＲ
ＡＭ６３、６４では、第１の係数値並べ替えＲＡＭ６３
において一つのブロックの係数値の並べ替えを行った結
果の読み出し動作が実行される前に、上記第２の係数値
並べ替えＲＡＭ６４において次に送られてくる他のブロ
ックの係数値の書き込み動作が実行される。さらに、こ
れ以降に送られてくるブロックの係数値に対し、同様の
読み出し動作および書き込み動作が順次実行される。こ
の場合は、第１および第２の係数値並べ替えＲＡＭ６
３、６４から１次元目積和演算回路１１に向かって、入
力データの複数の係数値を連続的に送出することができ
るので、前述の第２の実施例の場合よりも高速にて係数
値の並べ替えを実行することができる。In the first and second coefficient value rearranging RAMs 63 and 64, before the result of rearranging the coefficient values of one block is read from the first coefficient value rearranging RAM 63, the coefficient values of the next block to arrive are written to the second coefficient value rearranging RAM 64. The same read and write operations are then executed in sequence for the coefficient values of the blocks that follow. In this case, the coefficient values of the input data can be sent continuously from the first and second coefficient value rearranging RAMs 63 and 64 to the first-dimension product-sum operation circuit 11, so the coefficient values can be rearranged faster than in the second embodiment.

【００７２】また一方で、上記の第１および第２の転置
ＲＡＭ８３、８４では、第１の転置ＲＡＭ８３において
一つのブロックの値の転置を行った結果の読み出し動作
が実行される前に、上記第２の転置ＲＡＭ８４において
次に送られてくるブロックの値の書き込み動作が実行さ
れる。さらに、これ以降に送られてくるブロックの値に
対し、同様の読み出し動作および書き込み動作が順次実
行される。この場合は、第１および第２の転置ＲＡＭ８
３、８４から２次元目積和演算回路１２に向かって、パ
ラレル形式の複数の値を連続的に送出することができる
ので、前述の第２の実施例の場合よりも高速にて係数値
の並べ替えを実行することができる。Meanwhile, in the first and second transposition RAMs 83 and 84, before the read operation for the transposed values of one block is performed in the first transposition RAM 83, the write operation for the values of the next block to arrive is performed in the second transposition RAM 84. The same read and write operations are then performed in sequence on the values of the blocks that follow. In this case, values in parallel form can be sent continuously from the first and second transposition RAMs 83 and 84 to the second-dimension product-sum operation circuit 12, so the coefficient values can be rearranged faster than in the second embodiment described above.

【００７３】上記第３の実施例では、前述の第１の実施
例の構成に比べて係数値並べ替えＲＡＭと転置ＲＡＭと
積和演算回路とが各々１個増えているが、第２の実施例
の場合よりもさらに少ないクロックサイクル数で非常に
効率の良いパイプライン処理を実行することが可能にな
る。このような構成では、１サイクルで一つの係数値の
演算処理を実現することができる。In the third embodiment, the coefficient value rearranging RAM, the transposition RAM, and the product-sum operation circuit are each increased by one compared with the configuration of the first embodiment described above, but highly efficient pipeline processing can be executed with even fewer clock cycles than in the second embodiment. With this configuration, one coefficient value can be processed per cycle.

【００７４】図６および図７は、本発明の第１の実施例
の全体的な動作を説明するためのタイミングチャートの
その１およびその２である。ただし、ここでは、図３に
示したような構成の逆ＤＣＴ演算装置を動作させた場合
の、動画像データ等からなる入力データの６ブロック分
のパイプライン処理のタイミングムチャートを図示して
いる。この場合、時間軸の１目盛りは、１６サイクルに
相当する時間の長さを表している。FIGS. 6 and 7 are parts 1 and 2 of a timing chart for explaining the overall operation of the first embodiment of the present invention. The chart shows the pipeline processing of six blocks of input data, such as moving image data, when the inverse DCT operation device configured as shown in FIG. 3 is operated. One division on the time axis represents a length of time corresponding to 16 cycles.

【００７５】これらの６ブロックは、ＭＰＥＧの規格に
準拠した４つの輝度データＹ０、Ｙ１、Ｙ２およびＹ３
と、２つの色差データＣｂ、Ｃｒとを含み、通常、マク
ロブロックとよばれる。このマクロブロック内の１ブロ
ックは、８×８＝６４個の値を有する。図６および図７
に示すように、入力データの複数の係数値の積和演算
を、「係数ロード」、「累積加算」および「結果出力」
の３つのステージに分け、これらの各々のステージを並
列処理（すなわち、パイプライン処理）によって実現す
る。以下、このような並列処理について説明する。These six blocks comprise four luminance data Y0, Y1, Y2, and Y3 and two color difference data Cb and Cr conforming to the MPEG standard, and are usually called a macroblock. One block in this macroblock has 8 × 8 = 64 values. As shown in FIGS. 6 and 7, the product-sum operation on the coefficient values of the input data is divided into three stages, "coefficient loading", "cumulative addition", and "result output", and these stages are carried out by parallel processing (that is, pipeline processing). This parallel processing is described below.

【００７６】「係数ロード」は、６ブロック分の入力デ
ータの書き込み動作（例えば、図６のＹ０ライト〜Ｃｒ
ライト）および読み出し動作（例えば、図６のＹ０リー
ド〜Ｃｒリード）を行った係数値並べ替えＲＡＭ（図３
参照）からシリアル形式で送られてくる６ブロック分の
係数値を、パラレル形式の係数値に並べ替える処理であ
る。この係数値並べ替え処理においては、１次元目およ
び２次元目の輝度データＹ０（１ｓｔ）、Ｙ０（２ｎ
ｄ）、Ｙ１（１ｓｔ）、Ｙ１（２ｎｄ）、Ｙ２（１ｓ
ｔ）、Ｙ２（２ｎｄ）、Ｙ３（１ｓｔ）およびＹ３（２
ｎｄ）と、１次元目および２次元目の色差データＣｂ
（１ｓｔ）、Ｃｂ（２ｎｄ）、Ｃｒ（１ｓｔ）およびＣ
ｒ（２ｎｄ）に対する処理を順次実行する。上記のよう
な係数値並べ替え処理は、図２のシリアル／パラレル変
換部において実行される。"Coefficient loading" is the process of rearranging into parallel form the six blocks of coefficient values sent in serial form from the coefficient value rearranging RAM (see FIG. 3), on which the write operations (for example, Y0 write to Cr write in FIG. 6) and read operations (for example, Y0 read to Cr read in FIG. 6) for the six blocks of input data have been performed. In this coefficient value rearranging process, the first- and second-dimension luminance data Y0 (1st), Y0 (2nd), Y1 (1st), Y1 (2nd), Y2 (1st), Y2 (2nd), Y3 (1st), and Y3 (2nd) and the first- and second-dimension color difference data Cb (1st), Cb (2nd), Cr (1st), and Cr (2nd) are processed in sequence. This coefficient value rearranging process is executed in the serial/parallel conversion unit of FIG. 2.

【００７７】「累積加算」は、図２のシリアル／パラレ
ル変換部から出力されるパラレル形式の係数値を１６個
のシフトレジスタにロードし、これらのシフトレジスタ
内で最下位のビットから最上位のビットに向かって１ビ
ットずつシフトしながら複数の累積加算部へ係数値を供
給する処理である。この累積加算処理は、図２の複数
（例えば、１６個）の累積加算部において実行される。
これらの累積加算部では、１６個のシフトレジスタ中の
４個の値（４ビット）をアドレスとして、既に回路上に
記憶されている値を参照し、参照値を得る。上記の累積
加算部では、１６個のシフトレジスタの値をシフトする
度に参照される値を累積して加算していく。この累積加
算の際は、累積された値（すなわち、１サイクル前の加
算の結果）を１ビット右にシフトした値（すなわち、１
／２倍した値）と参照値とを加算する。１６個のシフト
レジスタの値を全てシフトしたときに、複数の累積加算
部を再度使用して、累積加算した値同士を加算または減
算し、１６個の積和演算の結果を同時に得ることができ
る。"Cumulative addition" is the process of loading the parallel-form coefficient values output from the serial/parallel conversion unit of FIG. 2 into 16 shift registers and supplying the coefficient values to the cumulative addition units while shifting them one bit at a time within the shift registers, from the least significant bit toward the most significant bit. This process is executed in the plural (for example, 16) cumulative addition units of FIG. 2. Each cumulative addition unit uses four values (4 bits) from the 16 shift registers as an address to look up a value already stored in the circuit, which gives a reference value. The cumulative addition units accumulate the values looked up each time the 16 shift registers are shifted. In each accumulation step, the accumulated value (that is, the result of the addition one cycle earlier) shifted right by one bit (that is, multiplied by 1/2) is added to the reference value. When all the values in the 16 shift registers have been shifted, the cumulative addition units are used again to add or subtract the accumulated values from one another, so that the 16 product-sum results are obtained at the same time.

【００７８】「結果出力」は、図２のパラレル／シリア
ル変換部において、積和演算の結果として得られる１６
個の値をシリアル形式で出力する処理である（図５に、
６ブロック分の出力データを、Ｙ０出力、Ｙ１出力、Ｙ
２出力、Ｙ３出力、Ｃｂ出力およびＣｒ出力として示
す）。なお、図３の転置ＲＡＭでは、１次元目の積和演
算の結果として得られる１６個の値の行と列を入れ替え
る転置を実行してから、転置後の１６個の値を積和演算
回路内の１６個のシフトレジスタに再度入力する。この
積和演算回路では、１６個の累積加算部内の値記憶部に
既に記憶されている値と、１６個のシフトレジスタから
出力される値を使用して２次元目の積和演算を行う。さ
らに、１６個の累積加算部内の加算部や減算部を再度使
用し、２次元目の積和演算の結果として得られる複数の
値同士を加算または減算してからパラレル／シリアル変
換を行うことにより、復号化されたシリアル形式の出力
データが得られる。"Result output" is the process in which the parallel/serial conversion unit of FIG. 2 outputs, in serial form, the 16 values obtained as the result of the product-sum operation (FIG. 5 shows the six blocks of output data as Y0 output, Y1 output, Y2 output, Y3 output, Cb output, and Cr output). The transposition RAM of FIG. 3 transposes the 16 values obtained from the first-dimension product-sum operation, exchanging rows and columns, and the 16 transposed values are then input again to the 16 shift registers in the product-sum operation circuit. The product-sum operation circuit performs the second-dimension product-sum operation using the values already stored in the value storage units of the 16 cumulative addition units and the values output from the 16 shift registers. The adders and subtractors in the 16 cumulative addition units are then used again to add or subtract the values obtained from the second-dimension product-sum operation, and parallel/serial conversion is performed to obtain the decoded output data in serial form.
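
The role of the transposition between the two passes can be illustrated in software. The following is a minimal sketch, assuming an orthonormal 8-point inverse DCT evaluated directly in floating point; it does not reproduce equations (5) to (8) of this description or the fixed-point hardware, and the function names are hypothetical.

```python
import math

N = 8  # block size used throughout the text

def idct_1d(coeffs):
    """Orthonormal 8-point inverse DCT of one row (direct O(N^2) form)."""
    out = []
    for n in range(N):
        total = 0.0
        for k in range(N):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        out.append(total)
    return out

def transpose(block):
    """Swap rows and columns, which is the job of the transposition RAM."""
    return [list(col) for col in zip(*block)]

def idct_2d(block):
    """First-dimension pass, transpose, second-dimension pass, transpose back."""
    first = [idct_1d(row) for row in block]
    second = [idct_1d(row) for row in transpose(first)]
    return transpose(second)

# A block with only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
pixels = idct_2d(dc_only)
```

Because the same one-dimensional routine serves both passes, a single product-sum circuit can be reused for the second dimension once the intermediate values have been transposed.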

【００７９】上記のような並列処理を行うことにより、
ＩＥＥＥにより規定される演算精度でもって２次元の８
×８行列の逆ＤＣＴ演算を、比較的少ないクロックサイ
クル数（例えば、６ブロック分のデータの逆ＤＣＴ演算
に対するクロックサイクル数は９６０サイクル）で実行
することができるようになり、回路規模も少なくて済
む。By performing the parallel processing described above, the inverse DCT operation on a two-dimensional 8 × 8 matrix can be executed with the operation accuracy specified by IEEE in a relatively small number of clock cycles (for example, 960 cycles for the inverse DCT operation on six blocks of data), and the circuit scale is also small.

【００８０】図８および図９は、本発明の第２の実施例
の全体的な動作を説明するためのタイミングチャートの
その１およびその２である。ただし、ここでは、図４に
示したような構成の逆ＤＣＴ演算装置を動作させた場合
の、動画像データ等からなる入力データの６ブロック分
のパイプライン処理のタイミングムチャートを図示して
いる。この場合、時間軸の１目盛りは、１６サイクルに
相当する時間の長さを表している。FIGS. 8 and 9 are parts 1 and 2 of a timing chart for explaining the overall operation of the second embodiment of the present invention. The chart shows the pipeline processing of six blocks of input data, such as moving image data, when the inverse DCT operation device configured as shown in FIG. 4 is operated. One division on the time axis represents a length of time corresponding to 16 cycles.

【００８１】これらの６ブロックは、前述の第１の実施
例の場合と同じように、４つの輝度データＹ０、Ｙ１、
Ｙ２およびＹ３と、２つの色差データＣｂ、Ｃｒとを含
む。さらに、図８および図９の第２の実施例において
も、第１の実施例にて既述したように、入力データの複
数の係数値の積和演算を、「係数ロード」、「累積加
算」および「結果出力」の３つのステージに分け、これ
らの各々のステージを並列処理によって実現する。これ
らの３つのステージについては、第１の実施例にて既に
説明しているので、ここでは、その詳細な説明を省略す
る。As in the first embodiment described above, these six blocks comprise four luminance data Y0, Y1, Y2, and Y3 and two color difference data Cb and Cr. In the second embodiment of FIGS. 8 and 9 as well, as already described for the first embodiment, the product-sum operation on the coefficient values of the input data is divided into the three stages "coefficient loading", "cumulative addition", and "result output", and these stages are carried out by parallel processing. These three stages have already been described for the first embodiment, so a detailed description is omitted here.

【００８２】上記第２の実施例では、前述の第１の実施
例の場合と異なり、２個の係数値並べ替えＲＡＭ（例え
ば、図４の第１および第２の係数値並べ替えＲＡＭ６
１、６２）を用いて入力データの係数値の並べ替えを行
うと共に、２個の転置ＲＡＭ（例えば、図４の第１およ
び第２の転置ＲＡＭ８１、８２）を用いて、積和演算後
の複数の値の転置を行っている。Unlike the first embodiment described above, the second embodiment uses two coefficient value rearranging RAMs (for example, the first and second coefficient value rearranging RAMs 61 and 62 of FIG. 4) to rearrange the coefficient values of the input data, and two transposition RAMs (for example, the first and second transposition RAMs 81 and 82 of FIG. 4) to transpose the values resulting from the product-sum operation.

【００８３】図８および図９に示すタイミングチャート
から明らかなように、上記第２の実施例では、２個の係
数値並べ替えＲＡＭにより、シリアル形式で送られてく
る４つの輝度データＹ０、Ｙ１、Ｙ２およびＹ３と、２
つの色差データＣｂ、Ｃｒに対し、交互に係数値並べ替
えを行うようにしている。より具体的にいえば、第１の
係数値並べ替えＲＡＭは、輝度データＹ０、Ｙ２、およ
び色差データＣｂの書き込み（ライト）、係数値並べ替
えおよび読み出し（リード）を実行し、第２の係数値並
べ替えＲＡＭは、輝度データＹ１、Ｙ３、および色差デ
ータＣｒの書き込み（ライト）、係数値並べ替えおよび
読み出し（リード）を実行する。As is clear from the timing charts of FIGS. 8 and 9, in the second embodiment the two coefficient value rearranging RAMs alternately rearrange the coefficient values of the four luminance data Y0, Y1, Y2, and Y3 and the two color difference data Cb and Cr, which are sent in serial form. More specifically, the first coefficient value rearranging RAM performs writing, coefficient value rearranging, and reading for the luminance data Y0 and Y2 and the color difference data Cb, and the second coefficient value rearranging RAM performs writing, coefficient value rearranging, and reading for the luminance data Y1 and Y3 and the color difference data Cr.

【００８４】図８および図９のタイミングチャートで
は、２個の係数値並べ替えＲＡＭを同時に動作させるこ
とによって、一方の係数値並べ替えＲＡＭから一つのブ
ロック分の並べ替え処理後の係数値を読み出している間
に、他方の係数値並べ替えＲＡＭに対し次のブロック分
の係数値を書き込むことができる。それゆえに、上記第
２の実施例では、前述の第１の実施例の場合よりも高速
にて係数値の並べ替え処理を実行することができる。In the timing charts of FIGS. 8 and 9, because the two coefficient value rearranging RAMs operate at the same time, the coefficient values of the next block can be written to one coefficient value rearranging RAM while the rearranged coefficient values of one block are being read from the other. The second embodiment can therefore rearrange the coefficient values faster than the first embodiment described above.
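
The benefit of alternating between two RAMs can be sketched with a simple cycle model. The figures used here (64 cycles to write a block and 64 cycles to read it) are illustrative assumptions, not values taken from the timing charts, and the function names are hypothetical.

```python
def single_ram_cycles(blocks, write_cycles, read_cycles):
    """One RAM: a block is written, then read, before the next write starts."""
    return blocks * (write_cycles + read_cycles)

def ping_pong_cycles(blocks, write_cycles, read_cycles):
    """Two RAMs: block k is read from one RAM while block k+1 is written to the other."""
    write_done = 0           # time at which the most recent write finished
    read_done = 0            # time at which the most recent read finished
    ram_free_at = [0, 0]     # time at which each RAM was last fully read out
    for k in range(blocks):
        ram = k % 2
        # A write can start once the input stream is free and this RAM is drained.
        start_write = max(write_done, ram_free_at[ram])
        write_done = start_write + write_cycles
        # A read can start once the block is written and the output stream is free.
        start_read = max(write_done, read_done)
        read_done = start_read + read_cycles
        ram_free_at[ram] = read_done
    return read_done

single = single_ram_cycles(6, 64, 64)   # 6 * (64 + 64)
double = ping_pong_cycles(6, 64, 64)    # 64 + 6 * 64
```

Under these assumptions only the first write is exposed; every later write is hidden behind the read of the preceding block.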

【００８５】さらに、図８および図９のタイミングチャ
ートでは、２個の転置ＲＡＭを同時に動作させることに
よって、一方の転置ＲＡＭから一つのブロック分の積和
演算結果の転置処理後のデータを読み出している間に、
他方の転置ＲＡＭに対し次のブロック分のデータを書き
込むことができる。それゆえに、上記第２の実施例で
は、前述の第１の実施例の場合よりも高速にて積和演算
結果の転置処理を実行することができる。Furthermore, in the timing charts of FIGS. 8 and 9, because the two transposition RAMs operate at the same time, the data of the next block can be written to one transposition RAM while the transposed product-sum results of one block are being read from the other. The second embodiment can therefore transpose the product-sum results faster than the first embodiment described above.

【００８６】前述の第１の実施例においては、図６およ
び図７のタイミングチャートに示したように、積和演算
回路内で１次元目の積和演算を実行してから２次元目の
積和演算を実行するまでに、複数の累積加算部が動作し
ない期間が存在する。これに対し、図８および図９に示
す第２の実施例においては、一方の係数値並べ替えＲＡ
Ｍから送出されるシリアル形式の係数値をパラレル形式
の係数値に変換して複数の累積加算部に供給する動作に
続いて、他方の係数値並べ替えＲＡＭから送出されるシ
リアル形式の係数値をパラレル形式の係数値に変換して
複数の累積加算部に供給する動作を実行しているので、
複数の累積加算部を連続して動作させることが可能にな
る。In the first embodiment described above, as shown in the timing charts of FIGS. 6 and 7, there is a period in which the cumulative addition units do not operate between the first-dimension product-sum operation and the second-dimension product-sum operation in the product-sum operation circuit. In the second embodiment shown in FIGS. 8 and 9, by contrast, the operation of converting the serial-form coefficient values sent from one coefficient value rearranging RAM into parallel form and supplying them to the cumulative addition units is followed directly by the same operation on the serial-form coefficient values sent from the other coefficient value rearranging RAM, so the cumulative addition units can be operated continuously.

【００８７】上記のように、２個の係数値並べ替えＲＡ
Ｍおよび２個の転置ＲＡＭを用いて６ブロック分のデー
タの並列処理を行うことにより、ＩＥＥＥにより規定さ
れる演算精度でもって２次元の８×８行列の逆ＤＣＴ演
算を、前述の第１の実施例の場合よりも少ないクロック
サイクル数（例えば、６ブロック分のデータの逆ＤＣＴ
演算に対するクロックサイクル数は６４０サイクル、６
ブロック分の積和演算の結果の出力が完了するまでのク
ロックサイクル数は８６４サイクル）で実行することが
できるようになる。As described above, by processing six blocks of data in parallel using two coefficient value rearranging RAMs and two transposition RAMs, the inverse DCT operation on a two-dimensional 8 × 8 matrix can be executed with the operation accuracy specified by IEEE in fewer clock cycles than in the first embodiment described above (for example, 640 cycles for the inverse DCT operation on six blocks of data, and 864 cycles until output of the product-sum results for the six blocks is complete).

【００８８】図１０および図１１は、本発明の第３の実
施例の全体的な動作を説明するためのタイミングチャー
トのその１およびその２である。ただし、ここでは、図
５に示したような構成の逆ＤＣＴ演算装置を動作させた
場合の、動画像データ等からなる入力データの６ブロッ
ク分のパイプライン処理のタイミングムチャートを図示
している。この場合、時間軸の１目盛りは、１６サイク
ルに相当する時間の長さを表している。FIGS. 10 and 11 are parts 1 and 2 of a timing chart for explaining the overall operation of the third embodiment of the present invention. The chart shows the pipeline processing of six blocks of input data, such as moving image data, when the inverse DCT operation device configured as shown in FIG. 5 is operated. One division on the time axis represents a length of time corresponding to 16 cycles.

【００８９】これらの６ブロックは、前述の第１および
第２の実施例の場合と同じように、４つの輝度データＹ
０、Ｙ１、Ｙ２およびＹ３と、２つの色差データＣｂ、
Ｃｒとを含む。さらに、図１０および図１１の第３の実
施例においても、第１の実施例にて既述したように、入
力データの複数の係数値の積和演算を、「係数ロー
ド」、「累積加算」および「結果出力」の３つのステー
ジに分け、これらの各々のステージを並列処理によって
実現する。これらの３つのステージについては、第１の
実施例にて既に説明しているので、ここでは、その詳細
な説明を省略する。As in the first and second embodiments described above, these six blocks comprise four luminance data Y0, Y1, Y2, and Y3 and two color difference data Cb and Cr. In the third embodiment of FIGS. 10 and 11 as well, as already described for the first embodiment, the product-sum operation on the coefficient values of the input data is divided into the three stages "coefficient loading", "cumulative addition", and "result output", and these stages are carried out by parallel processing. These three stages have already been described for the first embodiment, so a detailed description is omitted here.

【００９０】上記第３の実施例では、前述の第２の実施
例の場合と同じように、２個の係数値並べ替えＲＡＭ
（例えば、図５の第１および第２の係数値並べ替えＲＡ
Ｍ６３、６４）を用いて入力データの係数値の並べ替え
を行うと共に、２個の転置ＲＡＭ（例えば、図５の第１
および第２の転置ＲＡＭ８３、８４）を用いて、積和演
算後の複数の値の転置を行っている。ただし、上記第３
の実施例では、前述の第２の実施例の場合と異なり、１
次元目の６ブロック分の積和演算を実行する１次元目積
和演算回路と、２次元目の６ブロック分の積和演算を実
行する２次元目積和演算回路１２とが別個に設けられて
いる。As in the second embodiment described above, the third embodiment uses two coefficient value rearranging RAMs (for example, the first and second coefficient value rearranging RAMs 63 and 64 of FIG. 5) to rearrange the coefficient values of the input data, and two transposition RAMs (for example, the first and second transposition RAMs 83 and 84 of FIG. 5) to transpose the values resulting from the product-sum operation. Unlike the second embodiment, however, the third embodiment provides separately a first-dimension product-sum operation circuit, which performs the first-dimension product-sum operation for the six blocks, and a second-dimension product-sum operation circuit 12, which performs the second-dimension product-sum operation for the six blocks.

【００９１】図１０および図１１に示すタイミングチャ
ートから明らかなように、上記第３の実施例において
も、２個の係数値並べ替えＲＡＭにより、シリアル形式
で送られてくる４つの輝度データＹ０、Ｙ１、Ｙ２およ
びＹ３と、２つの色差データＣｂ、Ｃｒに対し、交互に
係数値並べ替えを行うようにしている。より具体的にい
えば、第１の係数値並べ替えＲＡＭは、輝度データＹ
０、Ｙ２、および色差データＣｂの書き込み（ライ
ト）、係数値並べ替えおよび読み出し（リード）を実行
し、第２の係数値並べ替えＲＡＭは、輝度データＹ１、
Ｙ３、および色差データＣｒの書き込み（ライト）、係
数値並べ替えおよび読み出し（リード）を実行する。As is clear from the timing charts of FIGS. 10 and 11, in the third embodiment as well the two coefficient value rearranging RAMs alternately rearrange the coefficient values of the four luminance data Y0, Y1, Y2, and Y3 and the two color difference data Cb and Cr, which are sent in serial form. More specifically, the first coefficient value rearranging RAM performs writing, coefficient value rearranging, and reading for the luminance data Y0 and Y2 and the color difference data Cb, and the second coefficient value rearranging RAM performs writing, coefficient value rearranging, and reading for the luminance data Y1 and Y3 and the color difference data Cr.

【００９２】図１０および図１１のタイミングチャート
では、２個の係数値並べ替えＲＡＭを同時に動作させる
ことによって、一方の係数値並べ替えＲＡＭから一つの
ブロック分の並べ替え処理後の係数値を読み出している
間に、他方の係数値並べ替えＲＡＭに対し次のブロック
分の係数値を書き込むことができる。さらに、２個の係
数値並べ替えＲＡＭから読み出された係数値を、１次元
目の積和演算のみを実行する１次元目積和演算回路内の
１６個のシフトレジスタにロードするようにしている。
したがって、１次元目積和演算回路では、１次元目の６
ブロック分のデータＹ０（１ｓｔ）、Ｙ１（１ｓｔ）、
Ｙ２（１ｓｔ）、Ｙ３（１ｓｔ）、Ｃｂ（１ｓｔ）、お
よびＣｒ（１ｓｔ）の係数値が連続してロードされる。
それゆえに、上記第３の実施例では、前述の第２の実施
例の場合よりも高速にて係数値の並べ替え処理を実行す
ることができる。In the timing charts of FIGS. 10 and 11, because the two coefficient value rearranging RAMs operate at the same time, the coefficient values of the next block can be written to one coefficient value rearranging RAM while the rearranged coefficient values of one block are being read from the other. Furthermore, the coefficient values read from the two coefficient value rearranging RAMs are loaded into the 16 shift registers in the first-dimension product-sum operation circuit, which performs only the first-dimension product-sum operation. The coefficient values of the six blocks of first-dimension data Y0 (1st), Y1 (1st), Y2 (1st), Y3 (1st), Cb (1st), and Cr (1st) are therefore loaded into the first-dimension product-sum operation circuit in succession. The third embodiment can therefore rearrange the coefficient values faster than the second embodiment described above.

【００９３】また一方で、図１０および図１１のタイミ
ングチャートでは、２個の転置ＲＡＭを同時に動作させ
ることによって、一方の転置ＲＡＭから一つのブロック
分の転置処理後のデータを読み出している間に、他方の
転置ＲＡＭに対し次のブロック分のデータを書き込むこ
とができる。さらに、２個の転置ＲＡＭから読み出され
た転置処理後のデータを、２次元目の積和演算のみを実
行する２次元目積和演算回路内の１６個のシフトレジス
タにロードするようにしている。したがって、１次元目
積和演算回路では、２次元目の６ブロック分のデータＹ
０（２ｎｄ）、Ｙ１（２ｎｄ）、Ｙ２（２ｎｄ）、Ｙ３
（２ｎｄ）、Ｃｂ（２ｎｄ）、およびＣｒ（２ｎｄ）の
値が連続してロードされる。それゆえに、上記第３の実
施例では、前述の第２の実施例の場合よりも高速にて１
次元目の積和演算結果の転置処理を実行することができ
る。Meanwhile, in the timing charts of FIGS. 10 and 11, because the two transposition RAMs operate at the same time, the data of the next block can be written to one transposition RAM while the transposed data of one block is being read from the other. Furthermore, the transposed data read from the two transposition RAMs is loaded into the 16 shift registers in the second-dimension product-sum operation circuit, which performs only the second-dimension product-sum operation. The values of the six blocks of second-dimension data Y0 (2nd), Y1 (2nd), Y2 (2nd), Y3 (2nd), Cb (2nd), and Cr (2nd) are therefore loaded in succession in the first-dimension product-sum operation circuit. The third embodiment can therefore transpose the first-dimension product-sum results faster than the second embodiment described above.

【００９４】上記のように、２個の係数値並べ替えＲＡ
Ｍ、２個の転置ＲＡＭ、および２個の積和演算回路を用
いて６ブロック分のデータの並列処理を行うことによ
り、ＩＥＥＥにより規定される演算精度でもって２次元
の８×８行列の逆ＤＣＴ演算を、前述の第２の実施例の
場合よりも少ないクロックサイクル数（例えば、６ブロ
ック分のデータの逆ＤＣＴ演算に対するクロックサイク
ル数は３８４サイクル、６ブロック分の積和演算の結果
の出力が完了するまでのクロックサイクル数は５７６サ
イクル）で実行することができるようになる。As described above, by processing six blocks of data in parallel using two coefficient value rearranging RAMs, two transposition RAMs, and two product-sum operation circuits, the inverse DCT operation on a two-dimensional 8 × 8 matrix can be executed with the operation accuracy specified by IEEE in fewer clock cycles than in the second embodiment described above (for example, 384 cycles for the inverse DCT operation on six blocks of data, and 576 cycles until output of the product-sum results for the six blocks is complete).

【００９５】上記第３の実施例では、図１０および図１
１の「結果出力」を参照すれば明らかなように、第２の
実施例の場合よりもさらに少ないクロックサイクル数で
非常に効率の良いパイプライン処理を実行することによ
ってシリアル形式のデータを連続的に出力するようにし
ているので、１サイクルで一つの係数値の演算処理を実
現することが可能になる。In the third embodiment, as is clear from the "result output" rows of FIGS. 10 and 11, highly efficient pipeline processing with even fewer clock cycles than in the second embodiment outputs the serial-form data continuously, so one coefficient value can be processed per cycle.
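
The one-value-per-cycle statement can be checked against the cycle counts quoted for the three embodiments (960, 640, and 384 cycles for six blocks). The short calculation below uses only those quoted figures and the block size of 8 × 8; the variable names are hypothetical.

```python
BLOCKS = 6
VALUES_PER_BLOCK = 8 * 8
total_values = BLOCKS * VALUES_PER_BLOCK  # values in one macroblock

# Inverse DCT cycle counts for six blocks, as quoted for each embodiment.
idct_cycles = {"first": 960, "second": 640, "third": 384}

# Average number of clock cycles spent per decoded value.
cycles_per_value = {name: cycles / total_values
                    for name, cycles in idct_cycles.items()}
```

With 384 values in a macroblock, the third embodiment's 384 cycles correspond to exactly one cycle per value, against 2.5 cycles per value for the first embodiment.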

【００９６】ついで、本発明の逆ＤＣＴ演算に係る積和
演算処理を、１次元目の積和演算処理と２次元目の積和
演算処理に分離して詳細に説明する。図１２および図１
３は、本発明の第１の実施例における１次元目の積和演
算処理手順を説明するためのタイミングチャートのその
１およびその２であり、図１４は、本発明の第１の実施
例における１次元目の積和演算処理のビットの動きを示
すタイミングチャートである。ここでは、クロックに基
づいて実行される１次元目の積和演算における「係数ロ
ード」、「累積加算」および「結果出力」の動作を示す
タイミングチャートと、１次元目の積和演算処理におけ
るデータの各ビットの動きを示すタイミングチャートを
図示することとする。Next, the product-sum operation for the inverse DCT operation of the present invention is described in detail, separately for the first dimension and the second dimension. FIGS. 12 and 13 are parts 1 and 2 of a timing chart for explaining the first-dimension product-sum operation procedure in the first embodiment of the present invention, and FIG. 14 is a timing chart showing the movement of bits in the first-dimension product-sum operation in the first embodiment. Shown here are a timing chart of the clock-driven "coefficient loading", "cumulative addition", and "result output" operations in the first-dimension product-sum operation, and a timing chart of the movement of each data bit in that operation.

【００９７】図１２および図１３において、「係数ロー
ド」は、シリアル／パラレル変換部にシリアル形式で入
力される複数の係数（例えば、１６個の係数からなる係
数１〜係数１５）の係数値を、パラレル形式の係数値に
並べ替える。図１２および図１３から明らかなように、
この係数ロードの処理には１６クロック分必要である。
この係数ロードの処理が終了すると、１クロックを使用
して、８×８行列（１ブロック分）の２列または２行分
の係数値を１６個のシフトレジスタにロードする。これ
と同時に、１６個の累積加算部の初期化を行う。In FIGS. 12 and 13, "coefficient loading" rearranges into parallel form the coefficient values of the coefficients (for example, coefficient 1 to coefficient 15, consisting of 16 coefficients) that are input to the serial/parallel conversion unit in serial form. As is clear from FIGS. 12 and 13, coefficient loading takes 16 clocks. When coefficient loading is finished, one clock is used to load the coefficient values for two columns or two rows of the 8 × 8 matrix (one block) into the 16 shift registers. At the same time, the 16 cumulative addition units are initialized.

【００９８】上記１６個のシフトレジスタの値は、「累
積加算」の処理に使われる。１６個の累積加算部では、
１６個のシフトレジスタから出力される値の中でｓ₀〜
ｓ₃またはｔ₀〜ｔ₃（前述の式（８）参照）に対応す
る４ビットをアドレスとして、累積加算部内の値記憶部
より値を参照する。これらの値記憶部より得られた２０
ビットの値と、累積加算部内のレジスタの値（２１ビッ
ト）を右に１ビットシフトしたもの（ここでは、符号ビ
ットを２の補数表現により表すため、最上位のビットを
符号拡張する）とを加算し、この加算結果を上記レジス
タにフィードバックして記憶する。図１４に示すよう
に、上記の累積加算の処理を、１６個のシフトレジスタ
のビット数に相当する回数だけ（ここでは、ビット０〜
ビット１１の１２ビット分）繰り返す。ただし、最後の
ビット（１２ビット目）は符号ビットなので、減算を行
う。すなわち、上記の累積加算の処理は、前述の式
（５）または式（６）の演算に対応している。上記値記
憶部には、予め算出された２０ビットの値が１６ワード
（１ワード＝１６ビット）分だけ格納されている。The values in the 16 shift registers are used in the "cumulative addition" process. Each of the 16 cumulative addition units uses, as an address, the 4 bits corresponding to s₀ to s₃ or t₀ to t₃ (see equation (8) above) among the values output from the 16 shift registers, and looks up a value in the value storage unit inside the cumulative addition unit. The 20-bit value obtained from the value storage unit is added to the value of the register in the cumulative addition unit (21 bits) shifted right by one bit (because the sign is represented in two's complement, the most significant bit is sign-extended), and the sum is fed back to the register and stored. As shown in FIG. 14, this cumulative addition is repeated as many times as there are bits in the 16 shift registers (here, 12 times, for bit 0 to bit 11). Because the last bit (the 12th bit) is the sign bit, however, a subtraction is performed for it. The cumulative addition thus corresponds to the operation of equation (5) or equation (6) above. The value storage unit holds 16 words (1 word = 16 bits) of precomputed 20-bit values.
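
The table lookup, right shift, and sign-bit subtraction described above can be sketched as a bit-serial dot product. This is a minimal sketch, assuming four fixed coefficients per cumulative addition unit and exact rational arithmetic in place of the 20-bit and 21-bit hardware words; the coefficient values and function names are hypothetical and are not those of equations (5) to (8).

```python
from fractions import Fraction

def build_table(coeffs):
    """All 16 partial sums of four fixed coefficients, indexed by a 4-bit address."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(16)]

def bit_serial_dot(coeffs, xs, nbits=12):
    """Dot product of four constants with four nbits-wide two's-complement inputs."""
    table = build_table(coeffs)
    acc = Fraction(0)
    for b in range(nbits):                  # least significant bit first
        # One bit from each of the four inputs forms the lookup address.
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc = acc / 2                       # shift the running sum right by one bit
        if b == nbits - 1:
            acc -= table[addr]              # the sign bit carries negative weight
        else:
            acc += table[addr]
    return acc * 2 ** (nbits - 1)           # undo the scaling introduced by the shifts

result = bit_serial_dot([3, -5, 7, 2], [100, -200, 5, -1])
expected = 3 * 100 + (-5) * (-200) + 7 * 5 + 2 * (-1)
```

No multiplier is needed: each cycle costs one table read and one addition, and only the final cycle differs because the most significant bit of a two's-complement number has weight minus 2 to the power nbits - 1.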

【００９９】図１４のｓ＋ｔまたはｓ−ｔの部分に示す
ように、上記１２ビット分の累積加算の処理が終了した
時点で、１６個の累積加算部から出力される累積加算部
の結果同士を、同累積加算部に含まれる加算器または減
算器を再度使用して加算または減算し、１次元目の積和
演算が完了する。このような加算または減算は、前述の
式（７）の演算に対応する。さらに、上記１次元目の積
和演算の結果の有効なビットである１４ビット分（整数
部１１ビット、小数部３ビット）を切り出し、パラレル
／シリアル変換部に送出する。ただし、最下位のビット
を切り捨てる際には、丸め（小数部の４ビット目に１を
加算する操作：すなわち、四捨五入））を行った後に切
り捨てなければならない。本発明の第１〜第３の実施例
では、１６個の累積加算部のうち、式（７）のｓ₀〜ｓ
₃の演算に対応する累積加算部を初期化するときに、小
数部の４ビット目に対応するビットのみ“１”に初期化
し、それ以外のビットは“０”に初期化することによっ
て、予め丸めを行っている。As shown in the s + t and s - t portions of FIG. 14, when the cumulative addition for the 12 bits is finished, the results output from the 16 cumulative addition units are added to or subtracted from one another by reusing the adders or subtractors contained in those units, which completes the first-dimension product-sum operation. This addition or subtraction corresponds to the operation of equation (7) above. The 14 valid bits of the first-dimension product-sum result (11 integer bits and 3 fractional bits) are then cut out and sent to the parallel/serial conversion unit. When the low-order bits are discarded, however, rounding (adding 1 to the fourth fractional bit, that is, rounding half up) must be performed before they are discarded. In the first to third embodiments of the present invention, the rounding is done in advance: when the cumulative addition units corresponding to the s₀ to s₃ operation of equation (7) are initialized, only the bit corresponding to the fourth fractional bit is initialized to "1" and all other bits are initialized to "0".
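
Rounding by initialization works because addition is associative: a rounding constant placed in the accumulator before the first term gives the same sum as one added at the end. The sketch below is a simplification that assumes integer fixed-point terms and ignores the per-cycle right shift of the hardware accumulator; the function name and the bit counts are hypothetical.

```python
import math

def accumulate_then_truncate(terms, drop_bits, preload_round_bit):
    """Sum fixed-point terms, then discard the lowest drop_bits bits."""
    # Preloading half of the last kept bit turns truncation into rounding.
    acc = (1 << (drop_bits - 1)) if preload_round_bit else 0
    for t in terms:
        acc += t
    return acc >> drop_bits  # arithmetic shift truncates toward minus infinity

terms = [37, -5, 12]                                 # sum is 44, i.e. 5.5 after dropping 3 bits
truncated = accumulate_then_truncate(terms, 3, False)
rounded = accumulate_then_truncate(terms, 3, True)
reference = math.floor(sum(terms) / 8 + 0.5)         # round half up
```

The preload costs no extra adder and no extra cycle, since the accumulator has to be initialized before each block in any case.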

【０１００】さらに、図１２および図１３において、
「結果出力」の処理は、１６個並列に得られた１４ビッ
トの累積演算結果をシリアル形式で順次出力することに
より遂行される。このように、１６サイクルを一つの１
ユニットとして、上記のような「係数ロード」、「累積
加算」および「結果出力」の３つのステージに対し並列
処理を実行することによって、効率の良いパイプライン
処理を行っている。図１４に示す各ビットの動きによ
り、上記の並列処理が実行される様子がよくわかるであ
ろう。このようにして得られた係数１〜係数１６の値
は、転置ＲＡＭに記憶される。Furthermore, in FIGS. 12 and 13, "result output" is carried out by sequentially outputting, in serial form, the 16 cumulative operation results of 14 bits each that were obtained in parallel. In this way, with 16 cycles as one unit, the three stages "coefficient loading", "cumulative addition", and "result output" are executed in parallel, which gives efficient pipeline processing. The movement of each bit shown in FIG. 14 makes clear how this parallel processing proceeds. The values of coefficient 1 to coefficient 16 obtained in this way are stored in the transposition RAM.
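
The overlap of the three stages can be sketched as a schedule in which each group of 16 values advances one stage per 16-cycle slot. This is an idealized model that ignores the single clock used to load the shift registers; the names are hypothetical.

```python
STAGES = ("coefficient load", "cumulative addition", "result output")
UNIT_CYCLES = 16  # one pipeline slot, as stated in the text

def schedule(units):
    """Return {slot: [(unit, stage), ...]} for a three-stage pipeline."""
    table = {}
    for u in range(units):
        for s, stage in enumerate(STAGES):
            # Unit u enters stage s in slot u + s.
            table.setdefault(u + s, []).append((u, stage))
    return table

plan = schedule(4)
total_cycles = len(plan) * UNIT_CYCLES
```

Once the pipeline is full, every slot holds one unit in each stage, so a new group of 16 results is completed every 16 cycles.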

【０１０１】図１５および図１６は、本発明の第１の実
施例における２次元目の積和演算処理手順を説明するた
めのタイミングチャートのその１およびその２であり、
図１７は、本発明の第１の実施例における２次元目の積
和演算処理のビットの動きを示すタイミングチャートで
ある。ここでは、クロックに基づいて実行される２次元
目の積和演算における「係数ロード」、「累積加算」お
よび「結果出力」の動作を示すタイミングチャートと、
２次元目の積和演算処理におけるデータの各ビットの動
きを示すタイミングチャートを図示することとする。FIGS. 15 and 16 are parts 1 and 2 of a timing chart for explaining the second-dimension product-sum operation procedure in the first embodiment of the present invention, and FIG. 17 is a timing chart showing the movement of bits in the second-dimension product-sum operation in the first embodiment. Shown here are a timing chart of the clock-driven "coefficient loading", "cumulative addition", and "result output" operations in the second-dimension product-sum operation, and a timing chart of the movement of each data bit in that operation.

【０１０２】図１５および図１６のタイミングチャート
に基づいて、２次元目の積和演算処理手順を説明する。
転置ＲＡＭによって、１次元目の積和演算の結果の行と
列を入れ替える操作（すなわち、転置）を行い、この転
置により得られた値を積和演算回路に再度入力する。図
１５および図１６において、「係数ロード」は、積和演
算回路にシリアル形式で入力される複数の係数（例え
ば、１６個の係数からなる係数１〜係数１６）の係数値
を、パラレル形式の係数値に並べ替える。図１５および
図１６から明らかなように、この係数ロードの処理には
１６クロック分必要である。この係数ロードの処理が終
了すると、１クロックを使って、８×８行列（１ブロッ
ク分）の２列または２行分の係数値を１６個のシフトレ
ジスタにロードする。転置ＲＡＭに格納される値は１４
ビットであるため、１６個のシフトレジスタには１４ビ
ットの値がロードされる。The second-dimension product-sum operation procedure is described with reference to the timing charts of FIGS. 15 and 16. The transposition RAM exchanges the rows and columns of the first-dimension product-sum results (that is, transposes them), and the values obtained by this transposition are input to the product-sum operation circuit again. In FIGS. 15 and 16, "coefficient loading" rearranges into parallel form the coefficient values of the coefficients (for example, coefficient 1 to coefficient 16, 16 coefficients in all) that are input to the product-sum operation circuit in serial form. As is clear from FIGS. 15 and 16, coefficient loading takes 16 clocks. When coefficient loading is finished, one clock is used to load the coefficient values for two columns or two rows of the 8 × 8 matrix (one block) into the 16 shift registers. Because the values stored in the transposition RAM are 14 bits wide, 14-bit values are loaded into the 16 shift registers.

[0103] The values in the 16 shift registers are used for the "cumulative addition" processing. In each of the 16 cumulative addition units, the 4 bits corresponding to s₀ to s₃ or t₀ to t₃ (see equation (8) above) among the values output from the 16 shift registers are used as an address to look up a value in the value storage unit inside the cumulative addition unit. The 20-bit value obtained from the value storage unit is added to the value of the register in the cumulative addition unit (21 bits) shifted right by one bit (here too, because the sign is represented in two's complement, the most significant bit is sign-extended), and the sum is fed back to the register and stored. As shown in FIG. 17, this cumulative addition is repeated as many times as there are bits in the 16 shift registers (here 14 times, for bit 0 through bit 13). However, because the last bit (the 14th bit) is the sign bit, a subtraction is performed for it. The cumulative addition thus corresponds to the computation of equation (5) or equation (6) above. The value storage unit holds 16 words of precomputed 20-bit values.
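The loop described in this paragraph reads like a table-lookup, bit-serial accumulation in the style of distributed arithmetic. The Python sketch below is one way to model it, under stated assumptions: the coefficients are small integers chosen for illustration, the function names are hypothetical, and unbounded Python integers with a pre-scaled table stand in for the 20-bit table words and 21-bit register, so truncation effects are not modeled.

```python
WIDTH = 14  # bits per shift-register value, as stated in the text


def build_table(coeffs):
    """Return the 16-word table: word a is the sum of coeffs[k] over the set bits k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1) for a in range(16)]


def bit_serial_dot(coeffs, xs):
    """Dot product of four integer coefficients with four signed 14-bit inputs."""
    # Assumption: pre-scale the table by 2**(WIDTH - 1) so that no right shift
    # discards a set bit; real hardware would use fixed-width words instead.
    table = [word << (WIDTH - 1) for word in build_table(coeffs)]
    acc = 0
    for bit in range(WIDTH):  # least significant bit first
        # One bit from each of the four inputs forms the 4-bit table address.
        addr = sum(((x >> bit) & 1) << k for k, x in enumerate(xs))
        acc >>= 1  # Python's >> on ints is arithmetic, so the sign is preserved
        if bit == WIDTH - 1:
            acc -= table[addr]  # the sign bit carries negative weight
        else:
            acc += table[addr]
    return acc


coeffs = [3, -5, 7, 2]
xs = [100, -200, 8191, -8192]
result = bit_serial_dot(coeffs, xs)
```

With the pre-scaling, `result` equals the ordinary dot product of `coeffs` and `xs`, which shows why the final step must subtract: in two's complement the top bit has weight −2¹³ rather than +2¹³.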

[0104] As shown in the s+t or s−t portion of FIG. 17, once the cumulative addition for the 14 bits has finished, the results output from the 16 cumulative addition units are added to or subtracted from one another by reusing the adders or subtractors contained in those same cumulative addition units, which completes the second-dimension product-sum operation. This addition or subtraction corresponds to the computation of equation (7) above. Finally, one clock is used to round the product-sum result and to perform overflow/underflow processing, as shown in FIGS. 15, 16, and 17, yielding the 9-bit value that is the final result of the second-dimension product-sum operation. This second-dimension result may be output at whatever timing suits the stage that follows the inverse DCT operation device, provided that the output does not disturb the pipeline processing described above.
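The post-processing described above (combine partial sums as s+t and s−t, then round and clamp to 9 bits) can be sketched in Python as follows. The number of fractional bits, the round-half-up rule, and the function names are assumptions for illustration; the text states only that rounding and overflow/underflow processing produce a 9-bit result.

```python
def butterfly(s, t):
    """Combine two partial sums into the pair (s + t, s - t)."""
    return s + t, s - t


def round_and_saturate(value, frac_bits, out_bits=9):
    """Round a fixed-point integer to the nearest integer and clamp it.

    value     -- integer holding a number scaled by 2**frac_bits (frac_bits >= 1)
    out_bits  -- width of the signed result; 9 bits gives the range -256..255
    """
    rounded = (value + (1 << (frac_bits - 1))) >> frac_bits  # round half up
    lowest = -(1 << (out_bits - 1))
    highest = (1 << (out_bits - 1)) - 1
    return max(lowest, min(highest, rounded))


total, diff = butterfly(10, 3)
clamped_high = round_and_saturate(1000 << 4, frac_bits=4)
clamped_low = round_and_saturate(-(1000 << 4), frac_bits=4)
```

Clamping rather than wrapping keeps an out-of-range sum at the nearest representable value, which is the usual intent of overflow/underflow handling in a decoder output stage.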

[0105] As described above, the configuration of the first embodiment of the present invention (FIG. 3) can perform the inverse DCT operation on the two-dimensional 8 × 8 matrices of one macroblock (six blocks) in 960 cycles while satisfying the operation accuracy specified by IEEE. The configuration of the second embodiment of the present invention (FIG. 4), by using two coefficient value rearranging RAMs and two transposition RAMs, can perform the inverse DCT operation on the two-dimensional 8 × 8 matrices of one macroblock in 640 cycles while satisfying the operation accuracy specified by IEEE.

[0106] Furthermore, the configuration of the third embodiment of the present invention (FIG. 5), by using two coefficient value rearranging RAMs, two transposition RAMs, and two product-sum operation units, can perform the inverse DCT operation on the two-dimensional 8 × 8 matrices of one macroblock in 384 cycles while satisfying the operation accuracy specified by IEEE. This means that one coefficient value can be processed per clock.
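The per-clock claim follows from simple arithmetic: one macroblock of six 8 × 8 blocks contains 384 coefficient values. The short Python check below divides the cycle counts quoted in the text by that number; the dictionary keys are labels chosen for this example.

```python
COEFFS_PER_MACROBLOCK = 6 * 8 * 8  # six blocks of 8 x 8 coefficient values

# Cycle counts per macroblock as quoted in the text for each embodiment.
cycles = {"first (FIG. 3)": 960, "second (FIG. 4)": 640, "third (FIG. 5)": 384}

# Average number of clock cycles spent per coefficient value.
cycles_per_coeff = {name: n / COEFFS_PER_MACROBLOCK for name, n in cycles.items()}
```

The third embodiment comes out at exactly one cycle per coefficient value, and the first at two and a half.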

[0107] FIG. 18 is a block diagram showing a specific configuration of the first-dimension cumulative addition unit used in the embodiments of the present invention. As shown in FIG. 18, the cumulative addition unit used for the first-dimension product-sum operation includes a value storage unit 31-1 that stores, in advance, values for the parallel-format coefficient bits (for example, 4 bits) output from the serial/parallel conversion unit. On the output side of the value storage unit 31-1 are an addition/subtraction unit 33-1 that includes a plurality of adders and subtractors, a register 34-1 that holds the addition or subtraction result of the addition/subtraction unit 33-1, and a right-shift sign extension unit 32-1 that shifts the value of the register 34-1 to the right and sign-extends it.

[0108] In FIG. 18, the addition/subtraction unit 33-1 adds the 20-bit value obtained from the value storage unit 31-1 to the value of the register 34-1 (for example, 21 bits) after the right-shift sign extension unit 32-1 has shifted it right by one bit (here, because the sign is represented in two's complement, the most significant bit is sign-extended). The result of the addition by the addition/subtraction unit 33-1 is fed back to the register 34-1 and stored as output data for performing the second-dimension product-sum operation.
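To make the right shift with sign extension concrete, the following Python sketch models a fixed-width register as an unsigned bit pattern and performs one accumulate step. The helper names are hypothetical, the 21-bit width is taken from the example in the text, and the sum is assumed to fit in the register, as it would in a correctly sized design.

```python
def to_pattern(value, width=21):
    """Encode a signed integer as a width-bit two's-complement bit pattern."""
    return value & ((1 << width) - 1)


def to_signed(pattern, width=21):
    """Decode a width-bit two's-complement bit pattern into a signed integer."""
    return pattern - (1 << width) if pattern & (1 << (width - 1)) else pattern


def shift_right_sign_extend(pattern, width=21):
    """Shift right by one bit and copy the old sign bit into the vacated top bit."""
    sign = pattern & (1 << (width - 1))
    return (pattern >> 1) | sign


def accumulate_step(pattern, table_value, width=21):
    """One step: shifted register value plus the value read from the table."""
    shifted = to_signed(shift_right_sign_extend(pattern, width), width)
    return to_pattern(shifted + table_value, width)


halved = to_signed(shift_right_sign_extend(to_pattern(-6)))
stepped = to_signed(accumulate_step(to_pattern(-6), 5))
```

Without the copied sign bit, a negative register value would turn into a large positive one after the shift, so the sign extension is what keeps the running sum correct for negative partial results.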

[0109] FIG. 19 is a block diagram showing a specific configuration of the second-dimension cumulative addition unit used in the embodiments of the present invention. As shown in FIG. 19, the cumulative addition unit used for the second-dimension product-sum operation includes a value storage unit 31-2 that stores, in advance, values for the parallel-format bits (for example, 4 bits) sent from the transposition RAM. On the output side of the value storage unit 31-2 are an addition/subtraction unit 33-2 that includes a plurality of adders and subtractors, a register 34-2 that holds the addition or subtraction result of the addition/subtraction unit 33-2, and a right-shift sign extension unit 32-2 that shifts the value of the register 34-2 to the right and sign-extends it.

[0110] In FIG. 19, the addition/subtraction unit 33-2 adds the 18-bit value obtained from the value storage unit 31-2 to the value of the register 34-2 (for example, 19 bits) after the right-shift sign extension unit 32-2 has shifted it right by one bit (here, because the sign is represented in two's complement, the most significant bit is sign-extended). The result of the addition by the addition/subtraction unit 33-2 is fed back to the register 34-2 and stored as the output data of the inverse DCT operation.

[0111]

[Effects of the Invention] As described above, according to the inverse DCT operation device of the present invention, first, the addition/subtraction units in the plurality of cumulative addition units are used again to perform the additions or subtractions needed for post-processing of the product-sum operation. This eliminates the extra circuitry otherwise required for those additions or subtractions and allows the inverse DCT operation to be executed in fewer clock cycles than before.

[0112] Second, according to the inverse DCT operation device of the present invention, the addition/subtraction units in the plurality of cumulative addition units are used again for post-processing of the product-sum operation, and one coefficient value rearranging RAM, one transposition RAM, and related elements are provided for the product-sum operation unit that performs the first-dimension and second-dimension product-sum operations. Efficient pipeline processing therefore becomes possible with a smaller circuit and fewer clock cycles than before, while satisfying the two-dimensional inverse DCT operation accuracy specified by IEEE.

[0113] Third, according to the inverse DCT operation device of the present invention, the addition/subtraction units in the plurality of cumulative addition units are used again for post-processing of the product-sum operation, and two coefficient value rearranging RAMs, two transposition RAMs, and related elements are provided for the product-sum operation unit that performs the first-dimension and second-dimension product-sum operations. Very efficient pipeline processing therefore becomes possible with a smaller circuit than before and with fewer clock cycles than in the case above, while satisfying the two-dimensional inverse DCT operation accuracy specified by IEEE.

[0114] Fourth, according to the inverse DCT operation device of the present invention, a plurality of shift registers temporarily hold the parallel-format coefficient values output from the serial/parallel conversion unit while shifting them one bit at a time, so parallel processing can be executed efficiently in a small number of clock cycles.

[0115] Fifth, according to the inverse DCT operation device of the present invention, the least significant bit of the fractional part of the parallel-format coefficient values is rounded when the cumulative addition units and related elements are initialized for the first-dimension product-sum operation. In the second-dimension product-sum operation, parallel processing can therefore be executed efficiently in a small number of clock cycles while satisfying the operation accuracy specified by IEEE.

[0116] Sixth, according to the inverse DCT operation device of the present invention, when two coefficient value rearranging RAMs alternately rearrange a plurality of coefficient values, the coefficient values of the next incoming block are written to one coefficient value rearranging RAM before the rearranged coefficient values of the current block are read from the other. The coefficient value rearranging can therefore be executed at relatively high speed.

[0117] Seventh, according to the inverse DCT operation device of the present invention, when two transposition RAMs alternately transpose the plurality of values obtained as a result of the first-dimension product-sum operation, the values of the next incoming block are written to one transposition RAM before the transposed values of the current block are read from the other. The transposition can therefore be executed at relatively high speed.
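The benefit of alternating between two RAMs can be illustrated with a simple timing model in Python. This is only a sketch under assumptions: each block takes a fixed number of cycles to write and to read, a bank cannot be written while it is being read, and the cycle values used below are illustrative rather than taken from the timing charts.

```python
def total_cycles(num_blocks, write_cycles, read_cycles, num_banks):
    """Cycles needed to write and then read num_blocks blocks through the RAM(s).

    num_banks == 1: each block is written, then read, strictly in sequence.
    num_banks == 2: writing block n+1 overlaps with reading block n.
    """
    if num_banks == 1:
        return num_blocks * (write_cycles + read_cycles)
    # With two banks, only the first write and the last read are not overlapped;
    # in between, each step is limited by the slower of the two operations.
    step = max(write_cycles, read_cycles)
    return write_cycles + (num_blocks - 1) * step + read_cycles


single_bank = total_cycles(6, 64, 64, num_banks=1)
double_bank = total_cycles(6, 64, 64, num_banks=2)
```

In this model the two-bank arrangement hides most of the write time behind the reads, which is the effect the sixth and seventh points describe for the rearranging and transposition RAMs.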

[0118] Eighth, according to the inverse DCT operation device of the present invention, the addition/subtraction units in the plurality of cumulative addition units are used again for post-processing of the product-sum operation, and two coefficient value rearranging RAMs, two transposition RAMs, and related elements are provided for two product-sum operation units that perform the first-dimension and second-dimension product-sum operations separately. Very efficient pipeline processing therefore becomes possible with a smaller circuit than before and with fewer clock cycles than in the second embodiment, while satisfying the two-dimensional inverse DCT operation accuracy specified by IEEE.

[0119] Ninth, according to the inverse DCT operation method of the present invention, the addition/subtraction units in the plurality of cumulative addition units are used again to perform the additions or subtractions needed for post-processing of the product-sum operation, so the inverse DCT operation can be executed with a smaller circuit and in fewer clock cycles than before.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the principle configuration of the present invention.

FIG. 2 is a block diagram showing a specific configuration of the product-sum operation unit applied to the inverse DCT operation device of the present invention.

FIG. 3 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 5 is a block diagram showing the configuration of the third embodiment of the present invention.

FIG. 6 is a timing chart (part 1) for explaining the overall operation of the first embodiment of the present invention.

FIG. 7 is a timing chart (part 2) for explaining the overall operation of the first embodiment of the present invention.

FIG. 8 is a timing chart (part 1) for explaining the overall operation of the second embodiment of the present invention.

FIG. 9 is a timing chart (part 2) for explaining the overall operation of the second embodiment of the present invention.

FIG. 10 is a timing chart (part 1) for explaining the overall operation of the third embodiment of the present invention.

FIG. 11 is a timing chart (part 2) for explaining the overall operation of the third embodiment of the present invention.

FIG. 12 is a timing chart (part 1) for explaining the first-dimension product-sum operation procedure in the first embodiment of the present invention.

FIG. 13 is a timing chart (part 2) for explaining the first-dimension product-sum operation procedure in the first embodiment of the present invention.

FIG. 14 is a timing chart showing the movement of bits in the first-dimension product-sum operation in the first embodiment of the present invention.

FIG. 15 is a timing chart (part 1) for explaining the second-dimension product-sum operation procedure in the first embodiment of the present invention.

FIG. 16 is a timing chart (part 2) for explaining the second-dimension product-sum operation procedure in the first embodiment of the present invention.

FIG. 17 is a timing chart showing the movement of bits in the second-dimension product-sum operation in the first embodiment of the present invention.

FIG. 18 is a block diagram showing a specific configuration of the first-dimension cumulative addition unit used in the embodiments of the present invention.

FIG. 19 is a block diagram showing a specific configuration of the second-dimension cumulative addition unit used in the embodiments of the present invention.

FIG. 20 is a block diagram showing a configuration example of a conventional inverse DCT operation device.

[Explanation of symbols]

1 ... product-sum operation unit
2-1 to 2-n ... first to n-th data holding units
3-1 to 3-n ... first to n-th cumulative addition units
4 ... serial/parallel conversion unit
5 ... parallel/serial conversion unit
6 ... coefficient value rearranging RAM
7 ... multiplexer (MPX)
8 ... transposition RAM
10 ... product-sum operation circuit
11 ... first-dimension product-sum operation circuit
12 ... second-dimension product-sum operation circuit
20-1 to 20-16 ... first to sixteenth shift registers
30-1 to 30-16 ... first to sixteenth cumulative addition units
61 ... first coefficient value rearranging RAM
62 ... second coefficient value rearranging RAM
63 ... first coefficient value rearranging RAM
64 ... second coefficient value rearranging RAM
71 ... multiplexer
72 ... first multiplexer
81 ... first transposition RAM
82 ... second transposition RAM
83 ... first transposition RAM
84 ... second transposition RAM
92 ... second multiplexer
200 ... data rearranging circuit
300 ... product-sum operation unit
350-1 to 350-8 ... first to eighth product-sum operation circuits
400 ... post-processing unit

Continuation of front page
(72) Inventor: Tadami Kono, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
F-terms (reference): 5B056 AA05 AA06 BB11 BB31 CC01 DD06 FF02 FF03 FF05 FF08 FF16 HH03

Claims

[Claims]

1. An inverse DCT operation device that decomposes an inverse DCT operation on arbitrary encoded input data into at least two one-dimensional matrix operations and sequentially executes a predetermined number of one-dimensional matrix operations to decode the input data, the device comprising: a serial/parallel conversion unit that, after a plurality of coefficient values of the input data have been rearranged, converts the coefficient values sent in serial format into coefficient values in parallel format; and a product-sum operation unit having a plurality of addition/subtraction units that add or subtract information already stored, using a value output from the serial/parallel conversion unit as an address, wherein the plurality of addition/subtraction units in the product-sum operation unit are used again to add or subtract, with one another, a plurality of values obtained as a result of the product-sum operation by the product-sum operation unit, after which parallel/serial conversion is performed, thereby decoding the input data.

2. An inverse DCT operation device that decomposes an inverse DCT operation on arbitrary encoded input data into two one-dimensional matrix operations and sequentially executes a predetermined number of first-dimension and second-dimension matrix operations to decode the input data, the device comprising: a coefficient value rearranging unit that rearranges a plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from the coefficient value rearranging unit into coefficient values in parallel format; a product-sum operation unit having a plurality of addition/subtraction units that add or subtract information already stored, using a value output from the serial/parallel conversion unit as an address; and a transposition unit that transposes a plurality of values obtained as a result of the first-dimension product-sum operation by the product-sum operation unit, wherein a plurality of values in parallel format sent from the transposition unit are input to the product-sum operation unit, the second-dimension product-sum operation is performed on those values, the plurality of addition/subtraction units in the product-sum operation unit are used again to add or subtract, with one another, a plurality of values obtained as a result of the second-dimension product-sum operation, and parallel/serial conversion is then performed, thereby decoding the input data.

3. An inverse DCT operation device that decomposes an inverse DCT operation on arbitrary encoded input data into two one-dimensional matrix operations and sequentially executes a predetermined number of first-dimension and second-dimension matrix operations to decode the input data, the device comprising: two coefficient value rearranging units that rearrange a plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from each of the two coefficient value rearranging units into coefficient values in parallel format; a product-sum operation unit having a plurality of addition/subtraction units that add or subtract information already stored, using a value output from the serial/parallel conversion unit as an address; and two transposition units that transpose a plurality of values obtained as a result of the first-dimension product-sum operation by the product-sum operation unit, wherein a plurality of values in parallel format sent from each of the two transposition units are input to the product-sum operation unit, the second-dimension product-sum operation is performed on those values, the plurality of addition/subtraction units in the product-sum operation unit are used again to add or subtract, with one another, a plurality of values obtained as a result of the second-dimension product-sum operation, and parallel/serial conversion is then performed, thereby decoding the input data.

4. An inverse DCT operation device that decomposes an inverse DCT operation on arbitrary encoded input data into two one-dimensional matrix operations and sequentially executes a predetermined number of first-dimension and second-dimension matrix operations to decode the input data, the device comprising: two coefficient value rearranging units that rearrange a plurality of coefficient values of the input data; a serial/parallel conversion unit that converts the coefficient values sent in serial format from each of the two coefficient value rearranging units into coefficient values in parallel format; a first-dimension product-sum operation unit having a plurality of first addition/subtraction units that add or subtract information already stored, using a value output from the serial/parallel conversion unit as an address; two transposition units that transpose a plurality of values obtained as a result of the first-dimension product-sum operation by the first-dimension product-sum operation unit; and a second-dimension product-sum operation unit having a plurality of second addition/subtraction units that add or subtract information already stored, using as an address a plurality of values in parallel format sent from each of the two transposition units, wherein the plurality of second addition/subtraction units in the second-dimension product-sum operation unit are used again to add or subtract, with one another, a plurality of values obtained as a result of the second-dimension product-sum operation by the second-dimension product-sum operation unit, and parallel/serial conversion is then performed, thereby decoding the input data.

5. The inverse DCT operation device according to claim 1, wherein the product-sum operation unit further comprises: a bit holding unit that temporarily holds the parallel-format coefficient values output from the serial/parallel conversion unit while shifting them one bit at a time from the least significant bit toward the most significant bit; and a cumulative addition unit that performs cumulative addition using the plurality of addition/subtraction units, and wherein the cumulative addition unit performs the cumulative addition using, with one bit of the parallel-format coefficient values held in the bit holding unit as an address, the information already stored in a value storage unit in the cumulative addition unit, together with the addition/subtraction units.

6. An inverse DCT operation device in which the first-dimension product-sum operation unit further comprises: a first bit holding unit that temporarily holds the parallel-format coefficient values output from the serial/parallel conversion unit while shifting them one bit at a time from the least significant bit toward the most significant bit; and a first cumulative addition unit that performs the cumulative addition for the first-dimension matrix operation using the plurality of first addition/subtraction units, the first cumulative addition unit performing that cumulative addition using, with one bit of the parallel-format coefficient values held in the first bit holding unit as an address, the information already stored in a first value storage unit in the first cumulative addition unit, together with the first addition/subtraction units; and in which the second-dimension product-sum operation unit further comprises: a second bit holding unit that temporarily holds the plurality of parallel-format values sent from the transposition units while shifting them one bit at a time from the least significant bit toward the most significant bit; and a second cumulative addition unit that performs the cumulative addition for the second-dimension matrix operation using the plurality of second addition/subtraction units, the second cumulative addition unit performing that cumulative addition using, with one bit of the plurality of parallel-format values held in the second bit holding unit as an address, the information already stored in a second value storage unit in the second cumulative addition unit, together with the second addition/subtraction units.

7. The inverse DCT operation device according to claim 1, wherein, when the product-sum operation unit is initialized to perform the cumulative addition for the first-dimension matrix operation, the least significant bit of the fractional part of the parallel-format coefficient values is rounded.

8. The inverse DCT operation device according to claim 3, wherein the two coefficient value rearranging units comprise a first coefficient value rearranging RAM and a second coefficient value rearranging RAM that alternately rearrange a plurality of coefficient values of the input data, and wherein, before the result of rearranging the coefficient values of one block is read from the first coefficient value rearranging RAM, the coefficient values of the next block to arrive are written to the second coefficient value rearranging RAM, and the same read and write operations are performed for the coefficient values of the blocks that arrive thereafter.

9. The inverse DCT operation device according to claim 3 or 4, wherein the two transposition units comprise a first transposition RAM and a second transposition RAM that alternately transpose a plurality of values obtained as a result of the first-dimension product-sum operation on the coefficient values, and wherein, before the result of transposing the values of one block is read from the first transposition RAM, the values of the next block to arrive are written to the second transposition RAM, and the same read and write operations are performed for the values of the blocks that arrive thereafter.

10. An inverse DCT operation method that decomposes an inverse DCT operation on arbitrary encoded input data into at least two one-dimensional matrix operations and sequentially executes a predetermined number of one-dimensional matrix operations to decode the input data, the method comprising: rearranging a plurality of coefficient values of the input data and then converting the coefficient values sent in serial format into coefficient values in parallel format; executing a product-sum operation by performing cumulative addition using information stored in advance and a plurality of addition/subtraction units that add or subtract that information, with the values converted into parallel format used as an address; and using the plurality of addition/subtraction units again to add or subtract, with one another, a plurality of values obtained as a result of the product-sum operation, and then performing parallel/serial conversion, thereby decoding the input data.