JPH117440A

JPH117440A - Processor, compiler, product-sum operation method, and recording medium

Info

Publication number: JPH117440A
Application number: JP9160202A
Authority: JP
Inventors: Masato Suzuki; 正人鈴木
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-06-17
Filing date: 1997-06-17
Publication date: 1999-01-12

Abstract

PROBLEM TO BE SOLVED: To reduce hardware cost by alternately executing a pair consisting of a load instruction (MOV) and a multiplication instruction (MULA) and a pair consisting of the load instruction and a cumulative addition instruction (ADDA). SOLUTION: An instruction execution circuit 604 executes the load instruction (MOV) and the multiplication instruction (MULA) through parallel operation of a computing element 21 and a multiplier 28, and executes the load instruction (MOV) and the cumulative addition instruction (ADDA) through parallel operation of the element 21 and the multiplier 28. A processor provided with an instruction reading circuit that reads two instructions simultaneously, an instruction decoding circuit that detects instructions to be executed in parallel and controls the circuit 604, and the circuit 604 thereby performs one product-sum operation in two cycles. That is, data are not loaded into the first and second registers at the same time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a processor, a compiler, a product-sum operation method, and a recording medium. More particularly, it relates to a processor that performs product-sum operations, a compiler that translates a high-level language program containing product-sum operation code into a machine instruction program, a product-sum operation method for performing a product-sum operation on values stored in advance in a storage area, and a recording medium on which the high-level language program or the machine instruction program is recorded.

[0002]

2. Description of the Related Art

With the recent development of electronic technology, high-performance processors have become widespread and are used in every field. In application fields such as multimedia in particular, image processing and audio processing are key technologies, and product-sum operations are essential to them. Because image and audio processing must run in real time, product-sum operations must be executed as fast as possible. A product-sum operation takes data one element at a time from two data sequences, multiplies each pair, and cumulatively adds the products.

[0003] Programs that implement image processing and audio processing keep growing in scale as the processing becomes more sophisticated, and they are written in high-level languages to improve the productivity of program development, among other reasons. Executing a program that performs product-sum operations therefore requires both processor technology and compiler technology. Here, a compiler translates a program written in a high-level language into a machine instruction program that a processor can execute.

[0004] (First Prior Art) The first prior art performs product-sum operations on a general-purpose processor alone. Here, a general-purpose processor is a processor that has no special circuit for product-sum operations. FIG. 4 shows a C-language program that performs a product-sum operation.

[0005] FIG. 13 is a program list obtained by translating the C-language program into a machine instruction program. The first prior art is described below with reference to FIGS. 4 and 13. To perform a product-sum operation on a general-purpose processor, the compiler converts the C-language program fragment shown in FIG. 4 into the machine instruction program shown in FIG. 13. In FIG. 4, the for statement represents a loop that cumulatively adds the product of the array variables x[i] and y[i] over the array index i to obtain the variable sum. FIG. 13 shows that one multiply-and-add step (one iteration of the for loop in FIG. 4) requires six instructions. Instructions 1 and 2 read data from the storage areas corresponding to the two array variables. Instruction 3 is a multiplication instruction (the product is obtained as separate upper and lower parts). Instruction 4 is an addition instruction that accumulates the lower part of the product. Instruction 5 transfers the upper part of the product. Instruction 6 is an addition instruction that accumulates the upper part of the product together with the carry produced by instruction 4. Repeatedly executing the machine instruction program of FIG. 13 on a general-purpose processor realizes the product-sum operation.
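The division of work among the six instructions can be illustrated with a small C sketch. FIG. 4 and FIG. 13 are not reproduced in this text, so the function name `mac_six_steps`, the unsigned 16-bit element type, and the explicit length `n` are assumptions made for illustration; only the six-step split follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One product-sum pass written as the six steps described for FIG. 13.
 * Assumptions: unsigned 16-bit elements, explicit element count n. */
uint32_t mac_six_steps(const uint16_t *x, const uint16_t *y, size_t n)
{
    uint16_t acc_lo = 0, acc_hi = 0;   /* 32-bit sum kept as two 16-bit halves */

    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        uint16_t a = x[i];                             /* 1: read first array   */
        uint16_t b = y[i];                             /* 2: read second array  */
        uint32_t p = (uint32_t)a * b;                  /* 3: multiply           */
        uint16_t p_lo = (uint16_t)(p & 0xFFFFu);       /*    lower part         */
        uint16_t p_hi = (uint16_t)(p >> 16);           /*    upper part         */
        uint32_t lo = (uint32_t)acc_lo + p_lo;         /* 4: add lower part     */
        uint16_t carry = (uint16_t)(lo >> 16);         /*    carry out of bit 15 */
        acc_lo = (uint16_t)(lo & 0xFFFFu);
        uint16_t hi_operand = p_hi;                    /* 5: transfer upper part */
        acc_hi = (uint16_t)(acc_hi + hi_operand + carry); /* 6: add upper part and carry */
    }
    return ((uint32_t)acc_hi << 16) | acc_lo;
}
```

Every loop iteration here stands for six separate instructions, which is the cost the later paragraphs set out to reduce.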

[0006] (Second Prior Art) The second prior art performs product-sum operations by adding a dedicated product-sum operation circuit to a general-purpose processor. FIG. 14 is a block diagram showing the configuration of this dedicated circuit, and the second prior art is described below with reference to it. To read two data items simultaneously from storage areas A and B (1203, 1207) and supply them to a multiplier 1209, the dedicated product-sum operation circuit has two systems each of address calculation units A and B (1201, 1205) that obtain the addresses for accessing the storage areas, storage area access units A and B (1202, 1206) that access the storage areas, and data transfer units A and B (1204, 1208) that transfer the data obtained. The circuit further includes the multiplier 1209, which obtains the product of the two data items, an adder 1210 that accumulates the products, and an accumulation result register 1211 that stores the accumulated result.

[0007] In the second prior art, therefore, the dedicated product-sum operation circuit added to the general-purpose processor performs the product-sum operation at high speed. With this technique, one product-sum step is realized by one instruction in one cycle.

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the first prior art, however, even if the general-purpose processor could execute each of instructions 1 through 6 of FIG. 13 in one cycle, one product-sum step would still take six cycles, so the required processing performance cannot be obtained. Moreover, many general-purpose processors have no multiplier and obtain a product by repeated addition in an adder, which requires even more cycles.

[0009] In the second prior art, the dedicated product-sum operation circuit needs two systems each of the means for obtaining the address for accessing a storage area, the means for accessing the storage area, and the means for transferring the data obtained, which increases hardware cost. In view of these points, an object of the present invention is to provide a processor that keeps hardware cost low and still performs product-sum operations at high speed.

[0010]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problems, a processor according to the present invention obtains, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area, and cumulatively adds the products. The processor comprises: instruction reading means for reading instructions from a program memory; instruction decoding means for decoding a predetermined first extended instruction and a predetermined second extended instruction read by the instruction reading means; and instruction execution means. The instruction execution means is made up of: first to fourth registers; data loading means for loading array elements of the first and second array data into the first and second registers, respectively; multiplication means for obtaining the product of the contents of the first and second registers and storing it in the third register; addition means for obtaining the sum of the contents of the third register and the contents of the fourth register and storing it back in the fourth register; first execution control means which, when the instruction decoding means decodes the first extended instruction, causes the multiplication means to execute and, in parallel, causes the data loading means to load a new array element of the first array data; and second execution control means which, when the instruction decoding means decodes the second extended instruction, causes the data loading means to load an array element of the second array data and, in parallel, causes the addition means to execute.

[0011] In the processor according to the present invention, the execution of the data loading means that loads data into the first register and the execution of the data loading means that loads data into the second register are thus controlled separately, by the first execution control means and the second execution control means respectively, and data are never loaded into the first and second registers at the same time. The data loading means therefore needs only one system each of the means for obtaining the address for accessing the storage area and the means for accessing the storage area, so hardware cost can be kept lower than with the dedicated product-sum operation circuit of the second prior art.

[0012] In addition, because the first execution control means runs the data loading means and the multiplication means in parallel, and the second execution control means runs the data loading means and the addition means in parallel, the processor according to the present invention can perform product-sum operations faster than the general-purpose processor of the first prior art.
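A minimal C sketch of this two-step scheme follows. The names `mac_regs`, `load_element`, and `mac_two_step`, the register widths, and the choice to skip the look-ahead load on the last element are assumptions for illustration; the source specifies only which operations each extended instruction pairs. Each step performs at most one memory read, which is why a single address calculation and access path suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mac_regs {
    uint16_t r1;   /* first register: element of the first array   */
    uint16_t r2;   /* second register: element of the second array */
    uint32_t r3;   /* third register: latest product               */
    uint32_t r4;   /* fourth register: running sum                 */
};

/* The single load path: every call is exactly one memory read. */
static uint16_t load_element(const uint16_t *base, size_t index)
{
    return base[index];
}

uint32_t mac_two_step(const uint16_t *x, const uint16_t *y, size_t n)
{
    struct mac_regs r = { 0, 0, 0, 0 };

    assert(n == 0 || (x != NULL && y != NULL));
    if (n == 0)
        return 0;

    r.r1 = load_element(x, 0);   /* start-up loads before the repeated pair */
    r.r2 = load_element(y, 0);

    for (size_t i = 0; i < n; i++) {
        int more = (i + 1 < n);

        /* First extended instruction: multiply and, in parallel, load the
         * next element of the first array. Both read the old registers. */
        uint32_t product = (uint32_t)r.r1 * r.r2;
        uint16_t next_x = more ? load_element(x, i + 1) : r.r1;
        r.r3 = product;
        r.r1 = next_x;

        /* Second extended instruction: add and, in parallel, load the
         * next element of the second array. */
        uint32_t sum = r.r4 + r.r3;
        uint16_t next_y = more ? load_element(y, i + 1) : r.r2;
        r.r4 = sum;
        r.r2 = next_y;
    }
    return r.r4;
}
```

Computing both results from the old register values before committing them is how the sketch imitates two units working in the same cycle.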

[0013]

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention are described in detail below with reference to the drawings.

(Compiler) FIG. 1 is a block diagram showing the configuration of the compiler. A compiler 102 translates a C-language program 101 written by a user and outputs a machine instruction program 110.

[0014] The compiler 102 consists of a file reading unit 103 that reads the C-language program 101 into a read buffer 104; a syntax analysis unit 105 that analyzes the syntax and semantics of the C-language program held in the read buffer 104, generates intermediate code, and writes it to an intermediate code buffer 106; a machine instruction generation unit 107 that takes the intermediate code stored in the intermediate code buffer 106, generates a machine instruction program, and writes it to an output buffer 108; and a file output unit 109 that outputs the machine instruction program stored in the output buffer 108 to a file.

[0015] FIG. 2 is a flowchart showing the processing flow of the syntax analysis unit 105. FIG. 3 is a flowchart showing the processing flow of the machine instruction generation unit 107. FIG. 4 is a program list showing part of a C-language program. In FIG. 4, the for statement represents a loop and is written with the intention of obtaining the variable sum by cumulatively adding the product of the array variables x[i] and y[i] over the array index i.

[0016] FIG. 5 is a list showing the machine instruction program that the compiler generates when given the C-language program of FIG. 4 as input. A machine instruction program is actually a string of 0 and 1 bits, but FIG. 5 uses mnemonic notation to show its meaning. The operation of the compiler configured as above, given the program list of FIG. 4 as input, is described below with reference to FIGS. 1 to 5.

[0017] The file reading unit 103 reads the C-language program written by the user and stores it in the read buffer 104. The syntax analysis unit 105 takes the C-language program stored in the read buffer 104 and analyzes it (step 202). It detects that the statement "sum=sum+x[i]*y[i];" in the program of FIG. 4 is code that "obtains the product of a first array variable and a second array variable for each array element and cumulatively adds the products" (step 203), and therefore outputs intermediate code indicating a product-sum operation to the intermediate code buffer 106 (step 204).

[0018] The machine instruction generation unit 107 takes the intermediate code stored in the intermediate code buffer 106 (step 302), determines that it is intermediate code indicating a product-sum operation (step 303), and outputs the group of machine instructions for product-sum processing (a repetition of MOV, MULA, MOV, ADDA) to the output buffer 108 (step 304). The file output unit 109 outputs the machine instructions stored in the output buffer 108 to a file.
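The emission step can be sketched in C as a function that formats the instruction at a given position of the sequence. The function name `format_mac_instruction` is hypothetical; the mnemonics, the operand forms, and the displacement that grows by 2 bytes per repetition follow the listing the source gives for FIG. 5. How the actual compiler ends the sequence is not stated in the source, so the sketch simply continues the pattern for any index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats instruction number `index` (1-based, as in the FIG. 5 listing)
 * into buf. Returns the value of snprintf. */
int format_mac_instruction(size_t index, char *buf, size_t len)
{
    assert(index >= 1 && buf != NULL);

    /* Two start-up loads precede the repeated group of four instructions. */
    if (index == 1)
        return snprintf(buf, len, "MOV @(A0), D0");
    if (index == 2)
        return snprintf(buf, len, "MOV @(A1), D1");

    size_t group = (index - 3) / 4 + 1;  /* which repetition of the group */
    size_t disp = 2 * group;             /* byte displacement: 2, 4, 6, ... */

    switch ((index - 3) % 4) {
    case 0:
        return snprintf(buf, len, "MOV @(%zu,A0), D0", disp);
    case 1:
        return snprintf(buf, len, "MULA D0, D1");
    case 2:
        return snprintf(buf, len, "MOV @(%zu,A1), D1", disp);
    default:
        return snprintf(buf, len, "ADDA");
    }
}
```

Calling the function for indices 1 through 14 and comparing each result with `strcmp` against the expected mnemonic is a quick way to check the pattern.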

[0019] The machine instruction program output to this file is as shown in the list of FIG. 5. The meaning of instructions 1 to 14 in FIG. 5 is explained below. The same group of four instructions repeats after instruction 14, but those instructions are omitted here.

Instruction 1: MOV @(A0), D0
Loads the 16-bit data at the address indicated by the value of address register A0 into data register D0.

[0020] Instruction 2: MOV @(A1), D1
Loads the 16-bit data at the address indicated by the value of address register A1 into data register D1.

Instruction 3: MOV @(2,A0), D0
Loads the 16-bit data at the address obtained by adding a 2-byte displacement to the value of address register A0 into data register D0.

[0021] Instruction 4: MULA D0, D1
Multiplies the value of data register D0 by the value of data register D1 to obtain a 32-bit product. The product is stored in a predetermined register that is not visible to the program.

Instruction 5: MOV @(2,A1), D1
Loads the 16-bit data at the address obtained by adding a 2-byte displacement to the value of address register A1 into data register D1.

[0022] Instruction 6: ADDA
Adds the 32-bit product generated by the most recent MULA instruction to the value of the 32-bit register formed by concatenating the product-sum lower-digit register ACCL and the product-sum upper-digit register ACCH, and stores the sum back in ACCL and ACCH. If the addition produces a carry, the content of the product-sum carry accumulation register ACCC is incremented by 1.

[0023] Instruction 7: MOV @(4,A0), D0
Loads the 16-bit data at the address obtained by adding a 4-byte displacement to the value of address register A0 into data register D0.

Instruction 8: MULA D0, D1
(Same as instruction 4.)

Instruction 9: MOV @(4,A1), D1
Loads the 16-bit data at the address obtained by adding a 4-byte displacement to the value of address register A1 into data register D1.

[0024] Instruction 10: ADDA
(Same as instruction 6.)

Instruction 11: MOV @(6,A0), D0
Loads the 16-bit data at the address obtained by adding a 6-byte displacement to the value of address register A0 into data register D0.

[0025] Instruction 12: MULA D0, D1
(Same as instruction 4.)

Instruction 13: MOV @(6,A1), D1
Loads the 16-bit data at the address obtained by adding a 6-byte displacement to the value of address register A1 into data register D1.

[0026] Instruction 14: ADDA
(Same as instruction 6.)

Note that the load instructions 1, 2, 3, 5, 7, 9, 11, and 13 use the so-called delayed load scheme, so the instruction immediately after a load cannot use the loaded result.
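The arithmetic of MULA and ADDA can be modeled in C as below. The struct and function names are hypothetical, and treating the operands as unsigned is an assumption; the source states only that the 32-bit product is added to the concatenation of ACCH and ACCL and that ACCC is incremented when the addition carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mac_acc {
    uint16_t accl;   /* lower 16 bits of the running sum        */
    uint16_t acch;   /* upper 16 bits of the running sum        */
    uint16_t accc;   /* number of carries out of the 32-bit sum */
};

/* MULA: 16-bit by 16-bit multiply giving a 32-bit product. */
uint32_t mula(uint16_t d0, uint16_t d1)
{
    return (uint32_t)d0 * d1;
}

/* ADDA: add the latest product to ACCH:ACCL and count any carry in ACCC. */
void adda(struct mac_acc *acc, uint32_t product)
{
    assert(acc != NULL);

    uint32_t old = ((uint32_t)acc->acch << 16) | acc->accl;
    uint32_t sum = old + product;        /* wraps modulo 2^32 */

    if (sum < old)                       /* wrapped, so a carry left bit 31 */
        acc->accc++;
    acc->accl = (uint16_t)(sum & 0xFFFFu);
    acc->acch = (uint16_t)(sum >> 16);
}
```

Starting from a zeroed accumulator, two calls of `adda` with the product of `0xFFFF` and `0xFFFF` leave ACCL at `0x0002`, ACCH at `0xFFFC`, and ACCC at 1.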

[0027] Through the above operation, the compiler 102 translates the product-sum operation in the user-written C-language program 101 into a sequence of machine instructions that repeats, in this order, a load instruction (MOV), a multiplication instruction (MULA), a load instruction (MOV), and a cumulative addition instruction (ADDA), and thereby generates a machine instruction program 110 that maximizes the product-sum processing speed of the processor according to the present invention.

(Processor) FIG. 6 is a schematic configuration diagram of the processor.

[0028] A processor 601 consists of an instruction reading circuit 602 that reads machine instructions (hereinafter simply "instructions") from a program memory 607, an instruction decoding circuit 603 that decodes the instructions read and controls an instruction execution circuit 604, and the instruction execution circuit 604, which executes instructions, accessing a data memory 608 as necessary. The program memory 607 stores the machine instruction program, and the data memory 608 mainly stores operand data. Each instruction of the program stored in the program memory 607 is 16 bits in size.

[0029] An instruction bus 605 supplies instructions from the program memory 607 to the processor 601, and a data bus 606 transfers data between the data memory 608 and the processor 601. The instruction bus 605 is 32 bits wide. The instruction reading circuit 602, the instruction decoding circuit 603, and the instruction execution circuit 604 operate as a pipeline.

[0030] The instruction reading circuit 602 has an instruction buffer that holds four instructions (16 bits each). The instruction reading circuit 602 reads two consecutive instructions at the same time from the program memory 607 via the instruction bus 605 and stores them in the instruction buffer. When the instruction buffer does not have room for two instructions, however, the instruction reading circuit 602 waits until the instruction decoding circuit 603 decodes instructions and frees space.

[0031] When the two instructions read by the instruction reading circuit 602 are MOV and MULA, the instruction decoding circuit 603 takes both instructions out of the instruction buffer and directs the instruction execution circuit 604 to execute MOV and MULA simultaneously. Likewise, when the two instructions are MOV and ADDA, the instruction decoding circuit 603 takes both instructions out of the instruction buffer and directs the instruction execution circuit 604 to execute MOV and ADDA simultaneously.

[0032] When the two instructions read by the instruction reading circuit 602 are anything other than "MOV and MULA" or "MOV and ADDA", however, the instruction decoding circuit 603 takes one instruction out of the instruction buffer and directs the instruction execution circuit 604 to execute that instruction. The instruction execution circuit 604 includes an arithmetic unit, a multiplier, and an adder, and executes the operation corresponding to each instruction. Its detailed configuration is described later.

[0033] FIG. 7 is a timing chart showing the processing timing of the instruction reading circuit 602, the instruction decoding circuit 603, and the instruction execution circuit 604. The instruction numbers in FIG. 7 correspond to the program list of FIG. 5 described above. An outline of the processor's operation is given below with reference to FIGS. 6 and 7. First, the instruction reading circuit 602 reads instructions 1 and 2 from the program memory 607 (cycle 1). The instruction decoding circuit 603 decodes instructions 1 and 2 (cycle 2) and has the instruction execution circuit 604 execute only instruction 1 (cycle 3).

[0034] Because the instruction buffer still has room for two instructions, the instruction reading circuit 602 reads the next instructions 3 and 4 (cycle 2). The instruction decoding circuit 603 decodes instructions 2 and 3 (cycle 3) and has the instruction execution circuit 604 execute only instruction 2 (cycle 4). Because there is room for only one instruction, the instruction reading circuit 602 does not read any instruction (cycle 3). The instruction decoding circuit 603 decodes instructions 3 and 4 (cycle 4) and has the instruction execution circuit 604 execute instructions 3 and 4 in parallel (cycle 5).

[0035] Because room for two instructions has become available, the instruction reading circuit 602 reads instructions 5 and 6 (cycle 4). The instruction decoding circuit 603 decodes instructions 5 and 6 (cycle 5) and has the instruction execution circuit 604 execute instructions 5 and 6 in parallel (cycle 6). The instruction reading circuit 602 reads instructions 7 and 8 (cycle 5). The instruction decoding circuit 603 decodes instructions 7 and 8 (cycle 6) and has the instruction execution circuit 604 execute instructions 7 and 8 in parallel (cycle 7).

[0036] The instruction reading circuit 602 reads instructions 9 and 10 (cycle 6). The instruction decoding circuit 603 decodes instructions 9 and 10 (cycle 7) and has the instruction execution circuit 604 execute instructions 9 and 10 in parallel (cycle 8). The instruction reading circuit 602 reads instructions 11 and 12 (cycle 7). The instruction decoding circuit 603 decodes instructions 11 and 12 (cycle 8) and has the instruction execution circuit 604 execute instructions 11 and 12 in parallel (cycle 9).

[0037] The instruction reading circuit 602 reads instructions 13 and 14 (cycle 8). The instruction decoding circuit 603 decodes instructions 13 and 14 (cycle 9) and has the instruction execution circuit 604 execute instructions 13 and 14 in parallel (cycle 10). As described above, the MOV and MULA pair and the MOV and ADDA pair are executed alternately. Each of these pairs is executed in one machine cycle, as explained in detail below.
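The cycle numbers in this walk-through can be reproduced with a small simulation. This is a sketch under stated assumptions, not the circuit itself: all names are hypothetical, fetch is assumed to look at buffer occupancy at the start of a cycle, decode is assumed to see only instructions fetched in earlier cycles, execution is assumed to follow decode by one cycle, and a lone buffered instruction is assumed to issue by itself.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_MOV, OP_MULA, OP_ADDA, OP_OTHER };

#define BUFFER_SLOTS 4   /* the instruction buffer holds four instructions */

/* Pairing rule: MOV followed by MULA, or MOV followed by ADDA, issue together. */
static int can_pair(enum op first, enum op second)
{
    return first == OP_MOV && (second == OP_MULA || second == OP_ADDA);
}

/* Fills exec_cycle[i] with the cycle in which instruction i executes
 * (instructions are 0-based here; cycles are 1-based as in the text). */
void simulate_issue(const enum op *prog, size_t n, int *exec_cycle)
{
    size_t fetched = 0;   /* instructions placed in the buffer so far */
    size_t issued = 0;    /* instructions taken out by the decoder    */
    int cycle = 1;

    assert(n == 0 || (prog != NULL && exec_cycle != NULL));
    while (issued < n) {
        size_t in_buffer = fetched - issued;   /* occupancy at cycle start */
        size_t width = 0;

        /* Decode stage: look at the oldest one or two buffered instructions. */
        if (in_buffer >= 2 && can_pair(prog[issued], prog[issued + 1]))
            width = 2;
        else if (in_buffer >= 1)
            width = 1;
        for (size_t k = 0; k < width; k++)
            exec_cycle[issued + k] = cycle + 1;

        /* Fetch stage: read two instructions only if two slots were free. */
        if (BUFFER_SLOTS - in_buffer >= 2 && fetched < n)
            fetched += (n - fetched >= 2) ? 2 : 1;

        issued += width;
        cycle++;
    }
}
```

With the 14 instructions of the FIG. 5 listing as input, the sketch yields execution cycles 3 and 4 for the first two loads and then one pair per cycle from cycle 5 to cycle 10, which matches the walk-through.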

[0038] FIG. 8 is a block diagram showing the detailed configuration of the instruction execution circuit 604. The instruction execution circuit 604 consists of the following:
an A bus (ABUS) 14, a B1 bus (B1BUS) 15, and a B2 bus (B2BUS) 16;
an address register file (ARF) 17 and a data register file (DRF) 18;
a selector (SAR) 19 and a selector (SDR) 20;
an arithmetic unit (ALU) 21 that performs arithmetic and logic operations on two data inputs A and B;
a selector (SAA) 22 that selects the A input of the arithmetic unit 21, and a selector (SAB) 23 that selects its B input;
an arithmetic unit output buffer (ALOB) 24 that holds the output of the arithmetic unit 21;
an operand address buffer (OAB) 25, a store buffer (STB) 26, and a load buffer (LDB) 27;
a multiplier (MPY) 28 that multiplies two data inputs A and B and, for a product twice as wide as the input data, outputs the lower part of the product as an L output and, for the upper part of the product, a sum output S and a carry output C;
an adder (CPA) 29 that performs carry-propagate addition with the sum output S and the carry output C of the multiplier 28 as its A input and B input, respectively;
a selector (SMA) 30 that selects the A input of the adder 29, and a selector (SMB) 31 that selects its B input;
a latch (PRL) 32 that holds the L output of the multiplier 28;
a latch (PRH1) 33 that holds the output of the adder 29, which is the upper part of the product;
a latch (PRH2) 34 that adjusts timing so that the content of the latch 33 can be supplied to the adder 29 again;
a product-sum lower-digit register (ACCL) 35 that selectively holds the content of the arithmetic unit output buffer 24 and stores the lower part of the cumulatively added products;
a product-sum upper-digit register (ACCH) 36 that selectively holds the content of the latch 33 and stores the upper part of the cumulatively added products; and
a product-sum carry accumulation register (ACCC) 37 that selectively holds the content of the arithmetic unit output buffer 24 and accumulates the carries produced when the products are cumulatively added.

[0039] The A bus (ABUS) 14, the B1 bus (B1BUS) 15, and the B2 bus (B2BUS) 16 are buses for transferring operand data and result data. The address register file (ARF) 17 consists of four address registers A0 to A3, and the data register file (DRF) 18 consists of four data registers D0 to D3.

[0040] The selector (SAR) 19 selects the input of the address register file 17, and the selector (SDR) 20 selects the input of the data register file 18. The multiplier 28 consists of a plurality of carry-save adders connected in a tree to add the partial products; the sum outputs and carry outputs of the individual-bit carry-save adders in the final stage of the tree are output as they are.
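As a rough illustration of why the multiplier hands a separate sum output and carry output to the adder 29, the following Python sketch models one carry-save reduction step followed by a single carry-propagate addition. The function names and the three sample addends are assumptions made for this example; the actual tree structure of the multiplier 28 is not specified in this text.

```python
def carry_save_add(a, b, c):
    """Reduce three addends to a (sum, carry) pair without propagating carries."""
    partial_sum = a ^ b ^ c                      # per-bit sum, carries ignored
    carry = ((a & b) | (b & c) | (a & c)) << 1   # per-bit carries, moved to their weight
    return partial_sum, carry


def carry_propagate_add(sum_bits, carry_bits, width=32):
    """Final carry-propagate addition (the role the text assigns to adder 29)."""
    return (sum_bits + carry_bits) & ((1 << width) - 1)


# Hypothetical sample addends standing in for three partial products.
a, b, c = 0x1234, 0x0FF0, 0x00FF
s, carry = carry_save_add(a, b, c)
total = carry_propagate_add(s, carry)
```

Only the last step needs a full-width carry chain, which is consistent with the text routing the sum and carry outputs of the upper half of the product to the adder 29.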

[0041] The carry input of the arithmetic unit 21 during addition is connected to the carry output of the adder 29, and the carry input of the adder 29 is connected to the carry output of the arithmetic unit 21, so that the arithmetic unit 21 and the adder 29 operate as a linked pair. Everything from the address register file 17 to the product-sum carry accumulation register 37 is 16 bits wide.
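The carry link between the two 16-bit units can be sketched as follows. This is a minimal model assuming unsigned 16-bit values; `add16` and `add32_linked` are hypothetical names, and the sample operands are the register and latch values that appear later in the cycle t9 description.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit addition returning (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32_linked(a_hi, a_lo, b_hi, b_lo):
    """32-bit addition built from two linked 16-bit additions."""
    lo, carry = add16(a_lo, b_lo)             # lower half: role of arithmetic unit 21
    hi, carry_out = add16(a_hi, b_hi, carry)  # upper half: role of adder 29
    return hi, lo, carry_out


hi, lo, carry_out = add32_linked(0x0FED, 0xABCE, 0x147A, 0xB852)
```

With these operands the lower addition overflows, so the upper addition receives a carry input of 1, which matches the behaviour described for cycle t9.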

[0042] The symbols T1, T2, and TA in FIG. 8 indicate the clock timing at which each register or latch is written. FIG. 9 illustrates the contents of the data memory 608, which stores the data accessed by the machine instruction program of FIG. 5. Data x'1111 is stored at address x'1000, data x'2222 at address x'1002, data x'3333 at address x'1004, data x'4444 at address x'2000, data x'5555 at address x'2002, and data x'6666 at address x'2004. Here, an address is assigned to each byte (8 bits), and each data item is accessed by the address of its least significant byte. The prefix x' denotes a hexadecimal number.
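The memory layout above can be modeled in a few lines of Python. The text states only that addresses are per byte and that a data item is accessed by the address of its least significant byte; placing the most significant byte at the next higher address is an assumption of this sketch, as are the names `store16` and `load16`.

```python
memory = {}  # byte address -> byte value


def store16(addr, value):
    """Store a 16-bit value; the least significant byte sits at addr."""
    memory[addr] = value & 0xFF
    memory[addr + 1] = (value >> 8) & 0xFF  # ASSUMPTION: high byte at addr + 1


def load16(addr):
    """Read a 16-bit value addressed by its least significant byte."""
    return memory[addr] | (memory[addr + 1] << 8)


# Contents of data memory 608 as listed for FIG. 9.
for addr, value in [(0x1000, 0x1111), (0x1002, 0x2222), (0x1004, 0x3333),
                    (0x2000, 0x4444), (0x2002, 0x5555), (0x2004, 0x6666)]:
    store16(addr, value)
```

Because each item occupies two bytes, consecutive array elements are two addresses apart, which is why the displacements used later are 0, 2, and 4.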

[0043] FIGS. 10 to 12 are operation timing charts of the instruction execution circuit 604 in the processor 601. For each timing unit called a machine cycle, they show clock T2, clock T1, and clock TA, together with the values of address registers A0 and A1 in the address register file ARF 17, data registers D0 and D1 in the data register file DRF 18, the product-sum lower-digit register ACCL 35, the product-sum upper-digit register ACCH 36, the product-sum carry accumulation register ACCC 37, the A bus ABUS 14, the B1 bus B1BUS 15, the B2 bus B2BUS 16, the output of the arithmetic unit ALU 21, the output of the multiplier MPY 28, the output of the adder CPA 29, the operand address buffer OAB 25, and the load buffer LDB 27.

[0044] Cycles t1 to t4 are shown in FIG. 10, cycles t5 to t8 in FIG. 11, and cycles t9 and t10 in FIG. 12. In each cycle, the first half, during which clock T2 is high, is labeled (2), and the second half, during which clock T1 is high, is labeled (1). If the instruction execution cycle is taken to be the cycle in which the operand address buffer OAB 25 is output to the data bus 606 for a MOV instruction, the cycle in which the multiplier MPY 28 operates for a MULA instruction, and the cycle in which the arithmetic unit ALU 21 and the adder CPA 29 operate for an ADDA instruction, then cycle t2 of FIG. 10 corresponds to machine cycle 3 of FIG. 7, cycle t3 of FIG. 10 corresponds to machine cycle 4 of FIG. 7, and so on.

[0045] The address register A0 is assumed to hold the initial value x'1000, the address register A1 the initial value x'2000, and the product-sum lower-digit register 35, the product-sum upper-digit register 36, and the product-sum carry accumulation register 37 the initial value x'0000. The operation of the instruction execution circuit 604 when executing the machine instruction program shown in FIG. 5 is described below with reference to FIGS. 8 to 12.

(Instruction 1) Instruction 1 is executed by the instruction execution circuit 604 from cycle t1(1) to cycle t3(2).

[0046] In cycle t1(1), the value x'1000 of the address register A0 and the value x'0000, which results from the displacement being 0, are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t2(2), the resulting value x'1000 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'1000 of the data memory 608.

[0047] The value x'1111 that was read is held in the load buffer 27 in cycle t2(1) and stored in the data register D0 via the selector 20 in cycle t3(2).

(Instruction 2) Instruction 2 is executed by the instruction execution circuit 604 from cycle t2(1) to cycle t4(2).

[0048] In cycle t2(1), the value x'2000 of the address register A1 and the value x'0000, which results from the displacement being 0, are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t3(2), the resulting value x'2000 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'2000 of the data memory 608.

[0049] The value x'4444 that was read is held in the load buffer 27 in cycle t3(1) and stored in the data register D1 via the selector 20 in cycle t4(2).

(Instruction 3) Instruction 3 is executed by the instruction execution circuit 604 from cycle t3(1) to cycle t5(2).

[0050] In cycle t3(1), the value x'1000 of the address register A0 and the displacement value x'0002 are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t4(2), the resulting value x'1002 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'1002 of the data memory 608.
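The address calculation that each MOV instruction performs in the arithmetic unit 21 amounts to adding a displacement to a base address register. The sketch below lists the base and displacement pairs used by the six loads in this walkthrough; wrapping the result to 16 bits is an assumption based on the stated register width, and `effective_address` is a hypothetical name.

```python
def effective_address(base, displacement, width=16):
    """Base register plus displacement, truncated to the register width."""
    return (base + displacement) & ((1 << width) - 1)


A0, A1 = 0x1000, 0x2000  # initial address register values given in the text

# (base, displacement) for the loads of instructions 1, 2, 3, 5, 7, and 9.
loads = [(A0, 0), (A1, 0), (A0, 2), (A1, 2), (A0, 4), (A1, 4)]
addresses = [effective_address(base, disp) for base, disp in loads]
```

The base registers themselves are never modified; only the displacement changes from one load to the next.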

[0051] The value x'2222 that was read is held in the load buffer 27 in cycle t4(1) and stored in the data register D0 via the selector 20 in cycle t5(2).

(Instruction 4) Instruction 4 is decoded by the instruction decoding circuit 603 at the same time as instruction 3 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 4 is executed by the instruction execution circuit 604 from cycle t4(2) to cycle t4(1).

[0052] In cycle t4(2), the value x'1111 of the data register D0 and the value x'4444 of the data register D1 are passed through the A bus 14 and the B2 bus 16, respectively, and multiplied in the multiplier 28. The value x'0c84, the lower 16 bits of the product, is stored in the latch 32 in cycle t4(1). The sum output and the carry output for the upper 16 bits of the product are passed through the selector 30 and the selector 31, respectively, and added in the adder 29; the resulting value x'048d is stored in the latch 33 in cycle t4(1).

(Instruction 5) Instruction 5 is executed by the instruction execution circuit 604 from cycle t4(1) to cycle t6(2).

[0053] In cycle t4(1), the value x'2000 of the address register A1 and the displacement value x'0002 are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t5(2), the resulting value x'2002 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'2002 of the data memory 608.

[0054] The value x'5555 that was read is held in the load buffer 27 in cycle t5(1) and stored in the data register D1 via the selector 20 in cycle t6(2).

(Instruction 6) Instruction 6 is decoded by the instruction decoding circuit 603 at the same time as instruction 5 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 6 is executed by the instruction execution circuit 604 from cycle t5(2) to cycle t6(1).

[0055] In cycle t5(2), the value x'0000 of the product-sum lower-digit register 35 and the value x'0c84 of the latch 32 are passed through the A bus 14 and the B2 bus 16 and through the selector 22 and the selector 23, respectively, and added in the arithmetic unit 21. In cycle t5(1), the resulting value x'0c84 is stored back in the product-sum lower-digit register 35 via the arithmetic unit output buffer 24, and the value x'048d of the latch 33 is transferred to the latch 34. Also in cycle t5(1), the value x'048d held in the latch 34 and the value x'0000 of the product-sum upper-digit register 36 are passed through the selector 31 and the selector 30, respectively, and added in the adder 29. This addition takes the carry output from the most significant bit of the arithmetic unit 21 as the carry input to the least significant bit of the adder 29; here the carry input is 0.

[0056] In cycle t6(2), the resulting value x'048d is stored back in the product-sum upper-digit register 36 via the latch 33. Next, also in cycle t6(2), the value x'0000 of the product-sum carry accumulation register 37, selected by the selector 23, and the value x'0000, which the selector 22 outputs when it selects nothing, are added in the arithmetic unit 21, with the carry output from the most significant bit of the adder 29 as the carry input to the least significant bit. The carry input is 0, and the resulting value x'0000 is stored back in the product-sum carry accumulation register 37 in cycle t6(1) via the arithmetic unit output buffer 24.
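The multiplication and the three chained additions of the first product-sum step can be checked with a short Python model. It is a sketch assuming unsigned 16-bit operands (the text does not state signedness here); `add16` and `adda` are hypothetical names, and the register names follow the abbreviations ACCL, ACCH, ACCC, PRL, and PRH used for FIG. 8.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit addition returning (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def adda(accc, acch, accl, prh, prl):
    """One cumulative addition performed as three chained 16-bit additions."""
    accl, c1 = add16(accl, prl)      # 1st addition: arithmetic unit 21, lower half
    acch, c2 = add16(acch, prh, c1)  # 2nd addition: adder 29, upper half plus carry
    accc, _ = add16(accc, 0, c2)     # 3rd addition: arithmetic unit 21, carry count
    return accc, acch, accl


product = 0x1111 * 0x4444            # D0 * D1 in the multiplier
prh, prl = product >> 16, product & MASK16
state = adda(0x0000, 0x0000, 0x0000, prh, prl)
```

Here `prl` and `prh` reproduce the latch values x'0c84 and x'048d, and `state` reproduces the register contents after cycle t6.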

[0057] As described above, the product is cumulatively added by three addition operations, which completes the product-sum processing for the first iteration.

(Instruction 7) Instruction 7 is executed by the instruction execution circuit 604 from cycle t5(1) to cycle t7(2).

[0058] In cycle t5(1), the value x'1000 of the address register A0 and the displacement value x'0004 are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t6(2), the resulting value x'1004 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'1004 of the data memory 608.

[0059] The value x'3333 that was read is held in the load buffer 27 in cycle t6(1) and stored in the data register D0 via the selector 20 in cycle t7(2).

(Instruction 8) Instruction 8 is decoded by the instruction decoding circuit 603 at the same time as instruction 7 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 8 is executed by the instruction execution circuit 604 from cycle t6(2) to cycle t6(1).

[0060] In cycle t6(2), the value x'2222 of the data register D0 and the value x'5555 of the data register D1 are passed through the A bus 14 and the B2 bus 16, respectively, and multiplied in the multiplier 28. The value x'9f4a, the lower 16 bits of the product, is stored in the latch 32 in cycle t6(1). The sum output and the carry output for the upper 16 bits of the product are passed through the selector 30 and the selector 31, respectively, and added in the adder 29; the resulting value x'0b60 is stored in the latch 33 in cycle t6(1).

(Instruction 9) Instruction 9 is executed by the instruction execution circuit 604 from cycle t6(1) to cycle t8(2).

[0061] In cycle t6(1), the value x'2000 of the address register A1 and the displacement value x'0004 are passed through the B1 bus 15 and the B2 bus 16 and through the selector 23 and the selector 22, respectively, and added in the arithmetic unit 21. In cycle t7(2), the resulting value x'2004 is stored in the operand address buffer 25 via the arithmetic unit output buffer 24 and is output to the data bus 606 as an address, and data is read from address x'2004 of the data memory 608.

[0062] The value x'6666 that was read is held in the load buffer 27 in cycle t7(1) and stored in the data register D1 via the selector 20 in cycle t8(2).

(Instruction 10) Instruction 10 is decoded by the instruction decoding circuit 603 at the same time as instruction 9 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 10 is executed by the instruction execution circuit 604 from cycle t7(2) to cycle t8(1).

[0063] In cycle t7(2), the value x'0c84 of the product-sum lower-digit register 35 and the value x'9f4a of the latch 32 are passed through the A bus 14 and the B2 bus 16 and through the selector 22 and the selector 23, respectively, and added in the arithmetic unit 21. In cycle t7(1), the resulting value x'abce is stored back in the product-sum lower-digit register 35 via the arithmetic unit output buffer 24, and the value x'0b60 of the latch 33 is transferred to the latch 34. Also in cycle t7(1), the value x'0b60 held in the latch 34 and the value x'048d of the product-sum upper-digit register 36 are passed through the selector 31 and the selector 30, respectively, and added in the adder 29. This addition takes the carry output from the most significant bit of the arithmetic unit 21 as the carry input to the least significant bit of the adder 29; here the carry input is 0.

[0064] In cycle t8(2), the resulting value x'0fed is stored back in the product-sum upper-digit register 36 via the latch 33. Next, also in cycle t8(2), the value x'0000 of the product-sum carry accumulation register 37, selected by the selector 23, and the value x'0000, which the selector 22 outputs when it selects nothing, are added in the arithmetic unit 21, with the carry output from the most significant bit of the adder 29 as the carry input to the least significant bit. The carry input is 0, and the resulting value x'0000 is stored back in the product-sum carry accumulation register 37 in cycle t8(1) via the arithmetic unit output buffer 24.

[0065] As described above, the product is cumulatively added by three addition operations, which completes the product-sum processing up to the second iteration.

(Instruction 11) (Description omitted. Its operation is shown by the broken lines in FIG. 11.)

(Instruction 12) Instruction 12 is decoded by the instruction decoding circuit 603 at the same time as instruction 11 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 12 is executed by the instruction execution circuit 604 from cycle t8(2) to cycle t8(1).

[0066] In cycle t8(2), the value x'3333 of the data register D0 and the value x'6666 of the data register D1 are passed through the A bus 14 and the B2 bus 16, respectively, and multiplied in the multiplier 28. The value x'b852, the lower 16 bits of the product, is stored in the latch 32 in cycle t8(1). The sum output and the carry output for the upper 16 bits of the product are passed through the selector 30 and the selector 31, respectively, and added in the adder 29; the resulting value x'147a is stored in the latch 33 in cycle t8(1).

(Instruction 13) (Description omitted. Its operation is shown by the broken lines in FIG. 11.)

(Instruction 14) Instruction 14 is decoded by the instruction decoding circuit 603 at the same time as instruction 13 and is executed by the instruction execution circuit 604 in parallel with it. Strictly speaking, instruction 14 is executed by the instruction execution circuit 604 from cycle t9(2) to cycle t10(1).

[0067] In cycle t9(2), the value x'abce of the product-sum lower-digit register 35 and the value x'b852 of the latch 32 are passed through the A bus 14 and the B2 bus 16 and through the selector 22 and the selector 23, respectively, and added in the arithmetic unit 21. In cycle t9(1), the resulting value x'6420 is stored back in the product-sum lower-digit register 35 via the arithmetic unit output buffer 24, and the value x'147a of the latch 33 is transferred to the latch 34. Also in cycle t9(1), the value x'147a held in the latch 34 and the value x'0fed of the product-sum upper-digit register 36 are passed through the selector 31 and the selector 30, respectively, and added in the adder 29. This addition takes the carry output from the most significant bit of the arithmetic unit 21 as the carry input to the least significant bit of the adder 29; here the carry input is 1.

[0068] In cycle t10(2), the resulting value x'2468 is stored back in the product-sum upper-digit register 36 via the latch 33. Next, also in cycle t10(2), the value x'0000 of the product-sum carry accumulation register 37, selected by the selector 23, and the value x'0000, which the selector 22 outputs when it selects nothing, are added in the arithmetic unit 21, with the carry output from the most significant bit of the adder 29 as the carry input to the least significant bit. The carry input is 0, and the resulting value x'0000 is stored back in the product-sum carry accumulation register 37 in cycle t10(1) via the arithmetic unit output buffer 24.

[0069] As described above, the product is cumulatively added by three addition operations, which completes the product-sum processing up to the third iteration. The broken-line portions of FIGS. 11 and 12 represent the operation of instruction 11, instruction 13, and the instructions that follow instruction 14. In the product-sum processing up to the third iteration described above, no carry was produced from the most significant bit when adding into the product-sum upper-digit register 36; when the number of accumulations increases and such a carry occurs, it is accumulated in the product-sum carry accumulation register 37.
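The register values quoted across the three iterations can be reproduced with a compact trace. The following is a sketch under the assumption of unsigned 16-bit data; `mac_trace` is a hypothetical helper that records (ACCC, ACCH, ACCL) after each cumulative addition. The second call uses made-up operands chosen only to show a carry reaching the carry accumulation register.

```python
MASK16 = 0xFFFF


def mac_trace(xs, ys):
    """Return (ACCC, ACCH, ACCL) after each multiply-accumulate step."""
    accc = acch = accl = 0
    history = []
    for x, y in zip(xs, ys):
        product = x * y
        prh, prl = product >> 16, product & MASK16
        lo = accl + prl                        # lower halves
        hi = acch + prh + (lo >> 16)           # upper halves plus carry from below
        accc = (accc + (hi >> 16)) & MASK16    # carry out of the upper half
        acch, accl = hi & MASK16, lo & MASK16
        history.append((accc, acch, accl))
    return history


# Data from FIG. 9: three elements per array.
trace = mac_trace([0x1111, 0x2222, 0x3333], [0x4444, 0x5555, 0x6666])

# Hypothetical larger operands that overflow 32 bits after two products.
overflow = mac_trace([0xFFFF, 0xFFFF], [0xFFFF, 0xFFFF])
```

The last entry of `trace` corresponds to the 48-bit total x'0000 2468 6420, and the last entry of `overflow` has a carry count of 1, illustrating the case the text mentions for longer accumulations.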

[0070] As described above, the instruction execution circuit 604 executes a load instruction (MOV) and a multiplication instruction (MULA) simultaneously through parallel operation of the arithmetic unit 21 and the multiplier 28, and executes a load instruction (MOV) and a cumulative addition instruction (ADDA) simultaneously through parallel operation of the arithmetic unit 21, the multiplier 28, and the adder 29. As a result, the processor 601, which comprises the instruction reading circuit 602 capable of reading two instructions at once, the instruction decoding circuit 603 that detects instructions to be executed in parallel and controls the instruction execution circuit 604, and the instruction execution circuit 604, can perform the processing for one product-sum operation in two cycles.
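The two-cycles-per-product-sum figure can be illustrated with a simple issue model. The opcode sequence below is an assumption inferred from the walkthrough (FIG. 5 is not reproduced here, and instructions 11 and 13 are assumed to be loads), and the pairing rule is a simplification of whatever the instruction decoding circuit 603 actually checks.

```python
# ASSUMED opcode order for instructions 1 to 14.
program = ["MOV", "MOV",
           "MOV", "MULA", "MOV", "ADDA",
           "MOV", "MULA", "MOV", "ADDA",
           "MOV", "MULA", "MOV", "ADDA"]


def issue_groups(instructions):
    """Pair a MOV with an immediately following MULA or ADDA; issue the rest alone."""
    groups, i = [], 0
    while i < len(instructions):
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if instructions[i] == "MOV" and nxt in ("MULA", "ADDA"):
            groups.append((instructions[i], nxt))
            i += 2
        else:
            groups.append((instructions[i],))
            i += 1
    return groups


groups = issue_groups(program)
paired = [g for g in groups if len(g) == 2]
cycles_per_mac = len(paired) / 3  # three product-sum operations in this program
```

Under these assumptions the two initial loads issue alone and the remaining twelve instructions issue as six pairs, so each product-sum operation occupies two issue slots.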

[0071] The processor and compiler according to the present invention have been described above based on an embodiment, but the present invention is of course not limited to this embodiment. Specifically:

(1) In the embodiment, the compiler 102 translates one product-sum step of the C language program into an instruction sequence consisting of a load instruction, a multiplication instruction, a load instruction, and a cumulative addition instruction, and the processor 601 receives this sequence and decodes and executes the first load instruction and the multiplication instruction in parallel, and the second load instruction and the cumulative addition instruction in parallel. Alternatively, an instruction that performs a load and a multiplication in parallel and an instruction that performs a load and a cumulative addition in parallel may be defined; the compiler 102 then translates the same C language processing into an instruction sequence consisting of these two instructions, and the processor 601 receives the sequence and decodes and executes each instruction on its own.

(2) In the embodiment, everything from the address register file 17 to the product-sum carry accumulation register 37 is 16 bits wide, but everything may instead be 8 bits wide or 32 bits wide.

(3) In the embodiment, the product-sum carry accumulation register 37 is provided to accumulate the carries produced during cumulative addition of the products, so that a 48-bit sum is maintained for 32-bit products. This register may be removed so that the sum is also kept to 32 bits only. One cumulative addition is then achieved with two addition operations. Accordingly, the present embodiment, which operates the arithmetic unit 21, then the adder 29, and then the arithmetic unit 21 again, can be modified so that the arithmetic unit 21 and the adder 29 each operate once, or so that the arithmetic unit 21 operates twice. In the latter case in particular, the adder 29 operates only during multiplication, so the selector 30 and the selector 31 become unnecessary.

(4) In the embodiment, the multiplier 28, the adder 29, and related elements are provided inside the instruction execution circuit 604 of the processor 601. Alternatively, the portion enclosed by the broken line in FIG. 8 may be removed from the instruction execution circuit 604 and provided as an extended arithmetic unit that extends the instruction execution circuit 604 only when needed. This makes it possible to provide a general-purpose processor with low hardware cost and no extended arithmetic unit for applications that do not need product-sum operations, and a processor with the extended arithmetic unit added for applications that do. Even in applications that need product-sum operations, access to the data used for the operations is realized by a single access path consisting of the arithmetic unit 21, the data bus 606, and the address register file 17, and the extended arithmetic unit and the instruction execution circuit 604 operate in parallel. Hardware utilization therefore improves, and the hardware cost of adding the extended arithmetic unit is kept to a minimum.
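The effect of variation (3) above, dropping the carry accumulation register so that the sum is kept to 32 bits, can be seen by comparing accumulators of the two widths. This is a minimal sketch with a hypothetical `accumulate` helper and made-up operands chosen to overflow 32 bits; unsigned arithmetic is assumed.

```python
def accumulate(products, sum_bits):
    """Cumulatively add products, keeping only sum_bits bits of the total."""
    mask = (1 << sum_bits) - 1
    total = 0
    for product in products:
        total = (total + product) & mask
    return total


products = [0xFFFF * 0xFFFF, 0xFFFF * 0xFFFF]  # hypothetical worst-case operands

wide = accumulate(products, 48)    # embodiment: 48-bit sum (ACCC, ACCH, ACCL)
narrow = accumulate(products, 32)  # variation (3): 32-bit sum, carry discarded
```

The 32-bit variant saves one addition per step but wraps around once the running total exceeds 32 bits, whereas the 48-bit sum keeps the overflow.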

[0072]

[Effects of the Invention] As is apparent from the above description, the processor according to the present invention is a processor that obtains, element by element, the product of an array element of first array data and an array element of second array data placed in a storage area and cumulatively adds the products. The processor comprises: instruction reading means for reading instructions from a program memory; instruction decoding means for decoding a predetermined first extended instruction and a predetermined second extended instruction read by the instruction reading means; and instruction execution means. The instruction execution means comprises: first to fourth registers; data load means for loading array elements of the first and second array data into the first and second registers, respectively; multiplication means for obtaining the product of the contents of the first and second registers and storing it in the third register; addition means for obtaining the sum of the contents of the third register and the contents of the fourth register and storing it back in the fourth register; first execution control means for, when the first extended instruction is decoded by the instruction decoding means, causing the multiplication means to execute and, in parallel, causing the data load means to load a new array element of the first array data; and second execution control means for, when the second extended instruction is decoded by the instruction decoding means, causing the data load means to load an array element of the second array data and, in parallel, causing the addition means to execute.
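A software model of the two extended instructions may help in reading the paragraph above. In this sketch, `r1` to `r4` stand for the first to fourth registers; the function name, the prologue loads, and the loop structure are assumptions for illustration, and Python integers are unbounded, so register widths are not modeled.

```python
def dot_product(xs, ys):
    """Product-sum of two arrays using a multiply-with-load / add-with-load pair."""
    n = len(xs)
    r1, r2 = xs[0], ys[0]  # prologue: plain loads of the first elements
    r3 = r4 = 0
    for i in range(n):
        # First extended instruction: multiply the current r1 and r2 into r3
        # while the next element of the first array is loaded into r1.
        r3 = r1 * r2
        if i + 1 < n:
            r1 = xs[i + 1]
        # Second extended instruction: add r3 into r4 while the next element
        # of the second array is loaded into r2.
        r4 += r3
        if i + 1 < n:
            r2 = ys[i + 1]
    return r4


result = dot_product([0x1111, 0x2222, 0x3333], [0x4444, 0x5555, 0x6666])
```

Only one load happens per extended instruction, so the model never needs to fetch from both arrays in the same step.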

[0073] As a result, in the processor according to the present invention, the execution of the data loading means that loads data into the first register and the execution that loads data into the second register are controlled separately by the first execution control means and the second execution control means, respectively, so data is never loaded into the first and second registers at the same time. The data loading means therefore needs only one set of means for computing the address used to access the storage area and means for accessing the storage area, and hardware cost can be kept lower than with the dedicated product-sum operation circuit described as the second prior art.

[0074] In addition, because the first execution control means runs the data loading means and the multiplying means in parallel, and the second execution control means runs the data loading means and the adding means in parallel, the processor according to the present invention can perform product-sum operations faster than the general-purpose processor described as the first prior art.
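
The alternating schedule described in the two paragraphs above can be illustrated with a small software model. The following C sketch is only an illustration under stated assumptions: the struct `mac_regs`, the function `mac_two_cycle`, the cycle counter, the two preload cycles, and the guard that skips the loads on the final element are hypothetical and are not taken from the patent. It shows that each modelled cycle issues at most one memory load, so a single address-generation and memory-access path would suffice, while one multiply-accumulate completes every two cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the four registers named in the text. */
typedef struct {
    int32_t r1;   /* first register: element of the first array   */
    int32_t r2;   /* second register: element of the second array */
    int64_t prod; /* third register: most recent product          */
    int64_t acc;  /* fourth register: running sum                 */
} mac_regs;

/* Returns the sum of a[i] * b[i] for i in [0, n).
 * *cycles receives the number of modelled cycles (an assumption of
 * this sketch: each preload and each extended instruction costs one). */
int64_t mac_two_cycle(const int32_t *a, const int32_t *b, size_t n,
                      size_t *cycles)
{
    mac_regs r = {0, 0, 0, 0};
    size_t t = 0;

    assert(a != NULL && b != NULL && cycles != NULL);

    if (n > 0) {
        r.r1 = a[0]; t++;          /* preload of the first array  */
        r.r2 = b[0]; t++;          /* preload of the second array */
    }
    for (size_t i = 0; i < n; i++) {
        /* First extended instruction: multiply the OLD r1 and r2 while
         * the single load path fetches the next element of a. */
        int64_t product = (int64_t)r.r1 * r.r2;
        if (i + 1 < n) {
            r.r1 = a[i + 1];
        }
        r.prod = product;
        t++;

        /* Second extended instruction: accumulate the product while the
         * same load path fetches the next element of b. */
        r.acc += r.prod;
        if (i + 1 < n) {
            r.r2 = b[i + 1];
        }
        t++;
    }
    *cycles = t;
    return r.acc;
}
```

Because the two loads never fall in the same cycle, the model needs only one load per step, which is the property the text credits for the lower hardware cost.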

[0075] The instruction decoding means of the processor according to the present invention may also include parallel decoding means. The parallel decoding means decodes in parallel a pair of consecutively placed instructions consisting of a first load instruction, which loads an array element of the first array data from the storage area into the first register, and a multiplication instruction, which obtains the product of the contents of the first register before the result of the first load instruction is stored and the contents of the second register, and stores the product in the third register. It likewise decodes in parallel a pair of consecutively placed instructions consisting of a second load instruction, which loads an array element of the second array data from the storage area into the second register, and a cumulative addition instruction, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

[0076] Here, the first extended instruction is the consecutively placed pair consisting of the first load instruction, which loads an array element of the first array data from the storage area into the first register, and the multiplication instruction, which obtains the product of the contents of the first register before the result of the first load instruction is stored and the contents of the second register, and stores the product in the third register. The second extended instruction is the consecutively placed pair consisting of the second load instruction, which loads an array element of the second array data from the storage area into the second register, and the cumulative addition instruction, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

[0077] This removes the need to define the first and second extended instructions as separate instructions solely for product-sum processing; the processor can execute product-sum operations at high speed simply by reading and decoding ordinary instructions. Furthermore, the multiplication instruction may implicitly contain information designating the third register, the cumulative addition instruction may implicitly contain information designating the third and fourth registers, and the third and fourth registers of the processor according to the present invention may be dedicated registers.

[0078] As a result, the multiplying means and the adding means exchange data with dedicated registers, which simplifies the circuit configuration and keeps hardware cost low. Furthermore, the multiplying means of the processor according to the present invention may consist of a plurality of carry-save adders connected in a tree and a carry-propagate adder that adds the sum output and the carry output of the final stage of the tree, and the adding means may use that carry-propagate adder to obtain the sum.
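
The multiplier structure described above, with carry-save adders followed by one carry-propagate adder that is also used for accumulation, can be sketched in C. This is a behavioural illustration only, under several assumptions: the function names `csa`, `cpa`, `multiply_csa`, and `mac_step` are hypothetical, operands are unsigned 32-bit values, and the carry-save adders are chained one after another rather than arranged as a tree (the arithmetic result is the same; a tree only shortens the hardware delay).

```c
#include <assert.h>
#include <stdint.h>

/* Carry-propagate adder: an ordinary full-width addition. */
uint64_t cpa(uint64_t x, uint64_t y)
{
    return x + y;
}

/* Carry-save adder (3:2 compressor): reduces three words to a sum word
 * and a carry word without propagating carries between bit positions. */
void csa(uint64_t x, uint64_t y, uint64_t z, uint64_t *sum, uint64_t *carry)
{
    *sum = x ^ y ^ z;
    *carry = ((x & y) | (x & z) | (y & z)) << 1;
}

/* Multiplies two 32-bit values by compressing the partial products with
 * carry-save adders and resolving the carries once, in the final cpa. */
uint64_t multiply_csa(uint32_t m, uint32_t q)
{
    uint64_t sum = 0;
    uint64_t carry = 0;

    for (int i = 0; i < 32; i++) {
        /* Partial product for bit i of the multiplier q. */
        uint64_t pp = ((q >> i) & 1u) ? ((uint64_t)m << i) : 0;
        csa(sum, carry, pp, &sum, &carry);
    }
    return cpa(sum, carry);
}

/* Accumulation reuses the same carry-propagate adder, mirroring the
 * sharing that the text describes for the adding means. */
uint64_t mac_step(uint64_t acc, uint32_t m, uint32_t q)
{
    return cpa(acc, multiply_csa(m, q));
}
```

In this sketch the only carry-propagating addition is `cpa`, called once to finish the product and once to accumulate it, which is why a single adder of that kind could serve both roles in alternating cycles.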

[0079] As a result, the adding means can share with the multiplying means the carry-propagate adder that forms part of the multiplying means, using it in place of one of the adders it would otherwise need, which keeps hardware cost low. Furthermore, a processor according to the present invention executes a product-sum operation that obtains, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area and cumulatively adds the products. This processor comprises a main processor and a subordinate extended arithmetic unit that operate on the same clock pulses and process data according to instructions in a program memory. The main processor consists of instruction reading means for reading instructions from the program memory, instruction decoding means for decoding the instructions read by the instruction reading means, and instruction execution means for executing instructions according to the decoding result of the instruction decoding means. The instruction execution means comprises instruction execution control means and an adder, and the subordinate extended arithmetic unit comprises a multiplier. The main processor and the subordinate extended arithmetic unit are connected by a first bus that carries data from the instruction execution means to the multiplier and a second bus that carries the multiplication result of the multiplier to the instruction execution means. When the instruction decoding means decodes an instruction of the product-sum operation, the instruction execution control means uses the adder to perform the address calculation for accessing array elements of the first array data, the address calculation for accessing array elements of the second array data, and the cumulative addition, and in parallel causes the subordinate extended arithmetic unit to perform the multiplication using the multiplier.

[0080] As a result, the main processor and the subordinate unit can process in parallel independently of each other, so hardware utilization improves and processing performance relative to hardware cost increases. Furthermore, the compiler according to the present invention generates, from a high-level language program, a machine instruction program for a processor having first to fourth registers, a multiplier, and an adder. The compiler is characterized by comprising detecting means and machine instruction generating means. The detecting means detects, in the high-level language program, code that obtains for each array element the product of an array element of first array data and an array element of second array data and cumulatively adds the products. When such code is detected, the machine instruction generating means generates a machine instruction program in which a first extended instruction and a second extended instruction are repeated. The first extended instruction causes the product of the contents of the first register and the contents of the second register to be stored in the third register while, in parallel, a new array element of the first array data is loaded from the storage area into the first register. The second extended instruction causes an array element of the second array data to be loaded from the storage area into the second register while, in parallel, the sum of the contents of the third register and the contents of the fourth register is stored back in the fourth register.
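
For orientation, the kind of source code the detecting means would look for is an element-wise multiply followed by a running sum. The following is a minimal C sketch, assuming a hypothetical function name `dot_product` and 32-bit integer arrays. The patent's own C program appears in FIG. 4 and is not reproduced here, so this loop is an illustration rather than that program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Product-sum (multiply-accumulate) over two arrays of length n.
 * The loop body multiplies matching elements and adds the product to a
 * running sum, which is the pattern the paragraph above describes. */
int64_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t sum = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)a[i] * b[i];
    }
    return sum;
}
```

A compiler of the kind described would replace the body of such a loop with the repeated pair of extended instructions instead of separate load, multiply, and add instructions.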

[0081] The first extended instruction may be the consecutively placed pair consisting of a first load instruction, which loads an array element of the first array data from the storage area into the first register, and a multiplication instruction, which obtains the product of the contents of the first register before the result of the first load instruction is stored and the contents of the second register, and stores the product in the third register. The second extended instruction may be the consecutively placed pair consisting of a second load instruction, which loads an array element of the second array data from the storage area into the second register, and a cumulative addition instruction, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

[0082] This produces a machine instruction program suited to a processor that has the first to fourth registers, a multiplier, and an adder, and that can execute a data load in parallel with a multiplication and a data load in parallel with a cumulative addition. High-speed product-sum processing is therefore achieved when the processor reads, decodes, and executes the machine instruction program generated by the compiler.

[0083] The machine instruction generating means may additionally generate, ahead of the repetition of the first and second extended instructions, a first preload instruction that loads the first array element from the storage area corresponding to the first array element of the first array data into the first register, and a second preload instruction that loads the array element from the storage area corresponding to the first array element of the second array data into the second register.

[0084] As a result, when the high-level language program contains code that obtains for each array element the product of an array element of the first array data and an array element of the second array data and cumulatively adds the products, the compiler can generate a machine instruction program that makes the processor perform the product-sum processing for the first array elements of the first and second array data.
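
The shape of the generated program, two preload instructions followed by the repeated pair of extended instructions, can be sketched as a small listing emitter in C. The mnemonics MOV, MULA, and ADDA come from the abstract; everything else here is an assumption made for illustration: the function names `append_line` and `emit_mac_listing`, the register names `R1` and `R2`, the operand syntax, the `||` separator marking two instructions issued together, and the choice to emit the final multiply and add without a paired load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Appends one text line plus '\n' to buf at offset *used.
 * Returns 0 on success, -1 if the line does not fit. */
static int append_line(char *buf, size_t cap, size_t *used, const char *line)
{
    size_t len = strlen(line);

    if (*used + len + 2 > cap) {        /* line + '\n' + '\0' */
        return -1;
    }
    memcpy(buf + *used, line, len);
    buf[*used + len] = '\n';
    buf[*used + len + 1] = '\0';
    *used += len + 1;
    return 0;
}

/* Writes a hypothetical listing for an n-element product-sum into buf.
 * Returns the number of lines written, or -1 if buf is too small. */
int emit_mac_listing(char *buf, size_t cap, int n)
{
    char line[64];
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && cap > 0 && n >= 0);
    buf[0] = '\0';
    if (n == 0) {
        return 0;
    }

    /* Preload instructions: first element of each array. */
    if (append_line(buf, cap, &used, "MOV a[0],R1") != 0) return -1;
    if (append_line(buf, cap, &used, "MOV b[0],R2") != 0) return -1;
    count += 2;

    for (int i = 0; i < n; i++) {
        /* First extended instruction: load next a alongside the multiply. */
        if (i + 1 < n) {
            snprintf(line, sizeof line, "MOV a[%d],R1 || MULA R1,R2", i + 1);
        } else {
            snprintf(line, sizeof line, "MULA R1,R2");
        }
        if (append_line(buf, cap, &used, line) != 0) return -1;

        /* Second extended instruction: load next b alongside the add. */
        if (i + 1 < n) {
            snprintf(line, sizeof line, "MOV b[%d],R2 || ADDA", i + 1);
        } else {
            snprintf(line, sizeof line, "ADDA");
        }
        if (append_line(buf, cap, &used, line) != 0) return -1;
        count += 2;
    }
    return count;
}
```

Without the two preload lines, the first multiply would read registers that hold no array data, which is the gap the preload instructions are described as filling.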

[0085] The multiplication instruction generated by the compiler according to the present invention may implicitly contain information designating the third register. This reduces the size of the instruction code of the multiplication instruction. Likewise, the cumulative addition instruction generated by the compiler according to the present invention may implicitly contain information designating the third and fourth registers.

[0086] This reduces the size of the instruction code of the cumulative addition instruction. Furthermore, the product-sum operation method according to the present invention uses a processor having first to fourth registers, a multiplier, and an adder to obtain, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area, and to cumulatively add the products. The method is characterized by repeating the following two operations. The first performs in parallel a first data loading step, which loads an array element of the first array data into the first register, and a multiplying step, which obtains the product of the contents of the first register before that array element is loaded by the first data loading step and the contents of the second register, and stores the product in the third register. The second performs in parallel a second data loading step, which loads an array element of the second array data into the second register, and an adding step, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

[0087] As a result, the processor can execute the data loading step and the multiplying step in parallel, and the data loading step and the adding step in parallel, so product-sum operations can be executed at low cost and at high speed. Furthermore, the recording medium according to the present invention records a product-sum operation program that uses a processor having first to fourth registers, a multiplier, and an adder to obtain, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area, and to cumulatively add the products. The product-sum operation program is characterized in that a first extended instruction and a second extended instruction are repeated. The first extended instruction performs in parallel a first data loading step, which loads an array element of the first array data into the first register, and a multiplying step, which obtains the product of the contents of the first register before that array element is loaded by the first data loading step and the contents of the second register, and stores the product in the third register. The second extended instruction performs in parallel a second data loading step, which loads an array element of the second array data into the second register, and an adding step, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

[0088] As a result, the processor can execute the data loading step and the multiplying step in parallel, and the data loading step and the adding step in parallel, so product-sum operations can be executed at low cost and at high speed. As described above, the processor, compiler, product-sum operation method, and recording medium according to the present invention keep hardware cost low while processing product-sum operations at high speed. They are therefore highly useful in developing multimedia-related products, which rely heavily on product-sum operations, and contribute greatly to the progress of multimedia-related industries.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a compiler according to an embodiment.

FIG. 2 is a flowchart showing the processing flow of the syntax analysis unit according to the embodiment.

FIG. 3 is a flowchart showing the processing flow of the machine instruction generation unit according to the embodiment.

FIG. 4 is a diagram showing a C language program that performs a product-sum operation.

FIG. 5 is a listing of the machine instruction program generated by the compiler according to the embodiment when the C language program shown in FIG. 4 is given as input.

FIG. 6 is a schematic configuration diagram of a processor according to the embodiment.

FIG. 7 is a timing diagram showing the processing timing of the instruction reading circuit, the instruction decoding circuit, and the instruction execution circuit according to the embodiment.

FIG. 8 is a block diagram showing the configuration of the instruction execution circuit according to the embodiment.

FIG. 9 is a diagram explaining the contents of the data memory that stores the data accessed by the machine instruction program of FIG. 5.

FIG. 10 is an operation timing diagram of the instruction execution circuit in cycles t1 to t4, corresponding to the machine instruction program of FIG. 5.

FIG. 11 is an operation timing diagram of the instruction execution circuit in cycles t5 to t8, corresponding to the machine instruction program of FIG. 5.

FIG. 12 is an operation timing diagram of the instruction execution circuit in cycles t9 to t10, corresponding to the machine instruction program of FIG. 5.

FIG. 13 is a program listing that results from translating a C language program into a machine instruction program with a conventional compiler.

FIG. 14 is a block diagram showing the configuration of a conventional dedicated product-sum operation circuit added to a general-purpose processor.

[Explanation of symbols]

14 A bus (ABUS)
15 B1 bus (B1BUS)
16 B2 bus (B2BUS)
17 Address register file (ARF)
18 Data register file (DRF)
21 Arithmetic unit (ALU)
24 Arithmetic unit output buffer (ALOB)
25 Operand address buffer (OAB)
26 Store buffer (STB)
27 Load buffer (LDB)
28 Multiplier (MPY)
29 Adder (CPA)
32 Latch (PRL)
33 Latch (PRH1)
34 Latch (PRH2)
35 Product-sum lower-digit register (ACCL)
36 Product-sum upper-digit register (ACCH)
37 Product-sum carry accumulation register (ACCC)
101 C language program
102 Compiler
103 File reading unit
104 Read buffer
105 Syntax analysis unit
106 Intermediate code buffer
107 Machine instruction generation unit
108 Output buffer
109 File output unit
110 Machine instruction program
601 Processor
602 Instruction reading circuit
603 Instruction decoding circuit
604 Instruction execution circuit
605 Instruction bus
606 Data bus
607 Program memory
608 Data memory

Claims


1. A processor that obtains, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area and cumulatively adds the products, the processor comprising: instruction reading means for reading instructions from a program memory; instruction decoding means for decoding a predetermined first extended instruction and a predetermined second extended instruction read by the instruction reading means; first to fourth registers; data loading means for loading array elements of the first and second array data into the first and second registers, respectively; multiplying means for obtaining the product of the contents of the first and second registers and storing it in the third register; adding means for obtaining the sum of the contents of the third register and the contents of the fourth register and storing it back in the fourth register; and instruction execution means including first execution control means which, when the instruction decoding means decodes the first extended instruction, causes the multiplying means to execute and, in parallel, causes the data loading means to load a new array element of the first array data, and second execution control means which, when the instruction decoding means decodes the second extended instruction, causes the data loading means to load an array element of the second array data and, in parallel, causes the adding means to execute.

2. The processor according to claim 1, wherein the first extended instruction is a consecutively placed pair consisting of a first load instruction, which loads an array element of the first array data from the storage area into the first register, and a multiplication instruction, which obtains the product of the contents of the first register before the result of the first load instruction is stored and the contents of the second register and stores the product in the third register; the second extended instruction is a consecutively placed pair consisting of a second load instruction, which loads an array element of the second array data from the storage area into the second register, and a cumulative addition instruction, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register; and the instruction decoding means comprises parallel decoding means that decodes in parallel the consecutively placed first load instruction and multiplication instruction, and decodes in parallel the consecutively placed second load instruction and cumulative addition instruction.

3. The processor according to claim 2, wherein the multiplication instruction implicitly contains information designating the third register, the cumulative addition instruction implicitly contains information designating the third register and the fourth register, and the third register and the fourth register are dedicated registers.

4. The processor according to claim 3, wherein the multiplying means consists of a plurality of carry-save adders connected in a tree and a carry-propagate adder that adds the sum output and the carry output of the final stage of the tree, and the adding means uses the carry-propagate adder to obtain the sum.

5. A processor that executes a product-sum operation which obtains, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area and cumulatively adds the products, the processor comprising a main processor and a subordinate extended arithmetic unit that operate on the same clock pulses and process data according to instructions in a program memory, wherein: the main processor consists of instruction reading means for reading instructions from the program memory, instruction decoding means for decoding the instructions read by the instruction reading means, and instruction execution means for executing instructions according to the decoding result of the instruction decoding means; the instruction execution means comprises instruction execution control means and an adder; the subordinate extended arithmetic unit comprises a multiplier; the main processor and the subordinate extended arithmetic unit are connected by a first bus that carries data from the instruction execution means to the multiplier and a second bus that carries the multiplication result of the multiplier to the instruction execution means; and when the instruction decoding means decodes an instruction of the product-sum operation, the instruction execution control means uses the adder to perform the address calculation for accessing array elements of the first array data, the address calculation for accessing array elements of the second array data, and the cumulative addition, and in parallel causes the subordinate extended arithmetic unit to perform the multiplication using the multiplier.

6. A compiler that generates, from a high-level language program, a machine instruction program for a processor having first to fourth registers, a multiplier, and an adder, the compiler comprising: detecting means for detecting, in the high-level language program, code that obtains for each array element the product of an array element of first array data and an array element of second array data and cumulatively adds the products; and machine instruction generating means which, when the code is detected, generates a machine instruction program in which a first extended instruction and a second extended instruction are repeated, the first extended instruction causing the product of the contents of the first register and the contents of the second register to be stored in the third register while, in parallel, a new array element of the first array data is loaded from a storage area into the first register, and the second extended instruction causing an array element of the second array data to be loaded from the storage area into the second register while, in parallel, the sum of the contents of the third register and the contents of the fourth register is stored back in the fourth register.

7. The compiler according to claim 6, wherein the first extended instruction is a consecutively placed pair consisting of a first load instruction, which loads an array element of the first array data from the storage area into the first register, and a multiplication instruction, which obtains the product of the contents of the first register before the result of the first load instruction is stored and the contents of the second register and stores the product in the third register, and the second extended instruction is a consecutively placed pair consisting of a second load instruction, which loads an array element of the second array data from the storage area into the second register, and a cumulative addition instruction, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

8. The compiler according to claim 7, wherein the machine instruction generating means additionally generates, ahead of the repetition of the first extended instruction and the second extended instruction, a first preload instruction that loads the first array element from the storage area corresponding to the first array element of the first array data into the first register, and a second preload instruction that loads the array element from the storage area corresponding to the first array element of the second array data into the second register.

9. The compiler according to claim 8, wherein the multiplication instruction implicitly includes information designating the third register.

10. The compiler according to claim 9, wherein the cumulative addition instruction implicitly includes information designating the third register and the fourth register.

11. A product-sum operation method that uses a processor having first to fourth registers, a multiplier, and an adder to obtain, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area and to cumulatively add the products, the method repeating: performing in parallel a first data loading step, which loads an array element of the first array data into the first register, and a multiplying step, which obtains the product of the contents of the first register before that array element is loaded by the first data loading step and the contents of the second register and stores the product in the third register; and performing in parallel a second data loading step, which loads an array element of the second array data into the second register, and an adding step, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.

12. A recording medium on which is recorded a product-sum operation program that uses a processor having first to fourth registers, a multiplier, and an adder to obtain, for each array element, the product of an array element of first array data and an array element of second array data placed in a storage area and to cumulatively add the products, wherein the product-sum operation program repeats a first extended instruction and a second extended instruction, the first extended instruction performing in parallel a first data loading step, which loads an array element of the first array data into the first register, and a multiplying step, which obtains the product of the contents of the first register before that array element is loaded by the first data loading step and the contents of the second register and stores the product in the third register, and the second extended instruction performing in parallel a second data loading step, which loads an array element of the second array data into the second register, and an adding step, which obtains the sum of the contents of the third register and the contents of the fourth register and stores it back in the fourth register.