JPH0962655A - Multiprocessor system

Info

Publication number: JPH0962655A
Application number: JP7215646A
Authority: JP
Inventors: Yoshiko Tamaoki; 由子玉置; Naonobu Sukegawa; 直伸助川; Masanao Ito; 昌尚伊藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-08-24
Filing date: 1995-08-24
Publication date: 1997-03-07

Abstract

(57) [Abstract] [Purpose] To start a slave processor early and to reduce the synchronization processing between the main and slave processors. [Constitution] When started by a scalar processing unit (SPU), a vector processing unit (VPU) sets L bit 915 and enters the instruction execution state. After the start, the SPU sequentially sends a plurality of initialization data, such as the start address of vector data, to the VPU, sets them up in scalar register group 601 in the VPU, and sets the corresponding V bit 620 to 1. Execution control unit 303 waits until the initialization data required by the vector instruction to be executed next has been set up, then transfers that instruction to the instruction holding unit of the resource that executes it, for example instruction holding unit 336 for resource 306, where execution of the instruction is controlled. When the last instruction of the vector instruction sequence is issued, L bit 915 is reset; the SPU waits for this reset before performing the next start and setup. Thereafter, the above operation is repeated.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multiprocessor system having a main processor and a slave processor that is started by the main processor and executes processing designated by the main processor on the basis of data transferred from the main processor.

[0002]

[Prior Art] Vector processors have been used as computers that execute at high speed the operations on array data that occur frequently in scientific and engineering computation. It is also known to build a vector processor from a scalar processing unit, a vector processing unit, and a main memory shared by the two; see, for example, the document "HITAC Scalar-820 Processor Manual, 6020-2" issued by the present applicant. In such a system the vector processing unit operates as a slave processor that processes, in pipeline fashion and as vector data, the array data to be processed successively by a DO loop. The scalar processing unit operates as a main processor that controls the start of the vector processing unit and supplies the appropriate data for this vector data. More specifically, the scalar processing unit transfers to the vector processing unit the main memory address of each of the plural vector data to be processed, the number of elements of the vector data to be processed (the vector length), the address of the program to be executed, and so on. The vector processing unit has a plurality of vector registers and a plurality of pipelined arithmetic units. In accordance with the initialization data, it loads the required vector data from main memory into vector registers, performs vector operations on that data in the pipelined arithmetic units using those vector registers, stores the resulting vector data in appropriate vector registers, and finally stores the result vector data in main memory. This series of processing is carried out by a series of vector instructions. Each vector data to be processed is divided into element groups, each containing as many elements as a vector register can hold (the vector register length), and the series of vector instructions is executed repeatedly, once for the vector data formed by each element group. Meanwhile, while the vector processing unit is processing the vector data of one element group, the scalar processing unit calculates the initialization data for the vector data of the next element group, namely data designating that next vector data, specifically its main memory address, supplies the calculated address to the vector processing unit, and starts the unit. This operation is repeated until all element groups have been processed.
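The division into element groups described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 8-byte element size, and the handling of a shorter final group are not taken from the patent text.

```python
from typing import List, Tuple


def element_groups(base_addr: int, n_elements: int,
                   vreg_len: int, elem_bytes: int = 8) -> List[Tuple[int, int]]:
    """Split one vector into element groups of at most vreg_len elements.

    Returns one (main memory address, element count) pair per group. These
    pairs stand in for the initialization data that the scalar unit would
    compute and hand to the vector unit before each pass.
    Assumption: a final group shorter than vreg_len is allowed.
    """
    groups = []
    for first in range(0, n_elements, vreg_len):
        count = min(vreg_len, n_elements - first)
        groups.append((base_addr + first * elem_bytes, count))
    return groups
```

For a 300-element vector and a hypothetical vector register length of 128, the scalar side would compute three address and length pairs, one per execution of the vector instruction series.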

[0003] The processing speed of such a vector processor is affected by how the main processor starts the slave processor. Specifically, the way the initialization data is passed from the main processor to the slave processor, the timing of that transfer, and the timing at which the slave processor is started all affect the processing speed. In particular, it is important to start the slave processor in a way that reduces the time the main processor needs to start it and lets the main processor and the slave processor operate in parallel as much as possible.

[0004] Japanese Examined Patent Publication No. 2-22418 (the first prior art) is devised to reduce the time needed to transfer the initialization data from the scalar processing unit to the vector processing unit. Before the vector processing unit executes a series of vector instructions, all of the plural initialization data for the vector data to be processed are supplied one after another to a working scalar register group provided in the vector processing unit. The instruction that starts the vector processing unit then transfers the plural initialization data in this working register group, in parallel, to the original scalar register group in the vector processing unit, which vector instructions can designate, and the vector processing unit is started. While the vector processing unit executes the series of vector instructions, the scalar processing unit calculates the initialization data needed for the next execution of the series and sets it in the working scalar register group. When it detects that execution of the series has finished, it executes the start instruction again. As a result, from the second execution onward, the vector processing unit can re-execute the series of vector instructions without delay once the preceding execution ends.
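The two-register-group arrangement of the first prior art can be modelled roughly as follows. This is a minimal sketch; the class and method names are hypothetical and the model ignores timing entirely.

```python
class DoubleBufferedScalarRegs:
    """Toy model of a working register group plus the original register group."""

    def __init__(self, n_regs: int):
        self.working = [None] * n_regs  # written one item at a time by the scalar unit
        self.active = [None] * n_regs   # read by vector instructions

    def setup(self, idx: int, value: int) -> None:
        # Sequential setup goes to the working group only.
        self.working[idx] = value

    def start(self) -> None:
        # The start instruction copies every working register to the
        # active group at once (the parallel transfer), then execution begins.
        self.active = list(self.working)
```

After `start()`, further `setup()` calls for the next pass leave `active` untouched, which is what lets the scalar unit prepare the next initialization data while the current series runs.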

[0005] Japanese Unexamined Patent Publication No. 62-159268 (the second prior art) reduces the start delay at the first execution of the series of vector instructions. The series of vector instructions to be executed by the vector processing unit is divided into a plurality of partial instruction sequences. For each partial instruction sequence, the plural initialization data needed to execute it are set up from the scalar processing unit into appropriate scalar registers in the vector processing unit, after which the vector processing unit is started. After finishing the first partial instruction sequence, the vector processing unit suspends instruction execution until the initialization data for the next partial instruction sequence has been supplied by the scalar processing unit. After starting execution of the first partial instruction sequence, the scalar processing unit calculates, in parallel with that execution, the plural initialization data needed for the next partial instruction sequence, supplies them to other appropriate scalar registers in the vector processing unit, and instructs the vector processing unit to release the suspension. In response to this release instruction, the vector processing unit executes the next partial instruction sequence using the initialization data set at that time. With this method, in which the vector processing unit executes each partial instruction sequence as soon as the portion of the initialization data that the sequence needs has been supplied, the vector processing unit can, unlike in the first prior art, be started before all the initialization data needed for the series of vector instructions has been set up, so execution of the series can begin earlier.
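The effect on the first start can be seen with a little arithmetic. The sketch below is only an illustration: the function name, the per-item setup time, and the item counts are assumptions, not figures from either publication.

```python
from typing import List, Tuple


def first_start_time(setup_counts: List[int], t_setup: int) -> Tuple[int, int]:
    """Time at which the vector unit can first begin executing.

    setup_counts[i] is the number of initialization data items needed by
    partial instruction sequence i; t_setup is the time to set up one item.
    Returns (all_at_once, per_partial):
      all_at_once  models setting up every item before the start,
      per_partial  models starting once the first partial sequence is ready.
    """
    all_at_once = sum(setup_counts) * t_setup
    per_partial = setup_counts[0] * t_setup
    return all_at_once, per_partial
```

With hypothetical counts `[2, 3, 1]` and a setup time of 4 cycles per item, the first start moves from cycle 24 to cycle 8.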

[0006]

[Problems to Be Solved by the Invention] On examining the second prior art, the present inventors found that there is room for improvement in the data transfer time over the transfer path connecting the scalar processing unit and the vector processing unit. The time to transfer initialization data from the scalar processing unit to the vector processing unit depends on the length of the transfer path connecting them. The operating clock cycles of the scalar and vector processing units are being shortened, and the processing speed of these units tends to increase markedly. However, because the two units are mounted on different mounting boards, it is difficult to shorten the transfer path significantly. As a result, as the processing units become faster, the transfer time accounts for a growing share of the total processing time of the vector processor.

[0007] In particular, in the second prior art, the series of vector instructions executed by the vector processing unit is divided into a plurality of partial instruction sequences, and the vector processing unit and the scalar processing unit must synchronize each time execution of a partial instruction sequence ends. Specifically, until the vector processing unit executes the instruction that suspends vector instruction execution, the scalar processing unit must wait before executing the suspension-release instruction for the next partial instruction sequence. This wait occurs for every partial instruction sequence. The second prior art does not describe how the scalar processing unit detects that the wait has ended, but with ordinary techniques the scalar processing unit would have to read status information inside the vector processing unit to determine whether the wait has ended. The time the scalar processing unit needs to read this status information depends on the length of the data transfer path between the two units and, as noted above, accounts for a growing share of the total processing time. Moreover, because the second prior art divides the series of vector instructions into plural partial instruction sequences, the check is performed for each partial instruction sequence. The more partial instruction sequences there are, the more pairs of suspension and suspension-release instructions must be executed, and the total time spent on the checks grows. In the first prior art, by contrast, the check need be made only once per execution of the series of vector instructions.
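The growth in checking time can be made concrete with a small calculation. The function and its parameters are hypothetical, and the model assumes exactly one status read per synchronization point, which the second prior art does not actually specify.

```python
from typing import Tuple


def status_check_time(executions: int, partial_sequences: int,
                      read_latency: float) -> Tuple[float, float]:
    """Total time spent reading vector-unit status.

    Returns (per_partial, per_series):
      per_partial  one check per partial instruction sequence (second prior art),
      per_series   one check per execution of the whole series (first prior art).
    """
    per_partial = executions * partial_sequences * read_latency
    per_series = executions * read_latency
    return per_partial, per_series
```

For example, 10 executions of a series split into 4 partial sequences, with an assumed read latency of 20 cycles, cost 800 cycles of checking against 200 cycles when only one check per execution is needed.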

[0008] The same problem becomes more pronounced when the scalar processing unit can supply initialization data only at a rate slower than its operating clock. Some recent commercial scalar processors have a high operating clock frequency but can output data externally only at a period longer than the clock period. If such a scalar processor is used as the scalar processing unit of the vector processor, the time to set up the plural initialization data in the vector processing unit grows relative to the time needed for the processing of the vector processor as a whole. Therefore, even with the method of the second prior art, in which the needed initialization data is set up for each partial instruction sequence, plural initialization data must still be set up for the plural instructions contained in a partial instruction sequence, and the time this takes increases, in relative terms, in proportion to the slower data transfer of the scalar processing unit.

[0009] Furthermore, in a vector processor according to the first or second prior art, the vector processing unit processes different sets of vector data one after another with the same instruction sequence. Accordingly, while the vector instruction sequence is executing, the scalar processing unit sets up in the vector processing unit the initialization data designating the set of vector data to be processed next. As already described, the first prior art uses two register groups so that this setup can proceed in parallel with execution of the series of vector instructions: the plural initialization data are set up one after another in one group, the vector processing unit is then started again, and at that start the plural initialization data are transferred in parallel to the other group. This method requires two register groups and, in addition, a circuit that transfers plural initialization data in parallel between them.

[0010] The problems described above can arise not only in the vector processor but in any multiprocessor consisting of a main processor and a slave processor. They are a particular concern when, after starting the slave processor, the main processor has little processing to execute in parallel with the processing performed by the slave processor.

[0011] Accordingly, an object of the present invention is to provide a method of starting a slave processor that allows the slave processor to be started earlier, and a multiprocessor for that method.

[0012] Another object of the present invention is to provide a method of starting a slave processor, and a multiprocessor therefor, that allow execution of a vector instruction sequence to begin once only part of the initialization data has been set up in the slave processor, while still shortening the time needed to synchronize instruction execution between the main processor and the slave processor.

[0013] Yet another object of the present invention is to provide a method of starting a slave processor, and a multiprocessor system therefor, in which the next plural initialization data, needed to have the slave processor re-execute a series of processing on different data after the current series of processing ends, can be set up with a simpler circuit before execution of the current series of processing in the slave processor is complete.

[0014]

[Means for Solving the Problems] To achieve the above objects, in the multiprocessor system according to the present invention, the main processor has start means for instructing the slave processor to execute a series of instructions, and means for sequentially supplying to the slave processor the plural initialization data needed to execute the series of instructions. The slave processor has a plurality of registers, each for holding one of the plural initialization data, and instruction execution means for executing the series of instructions in response to the execution instruction from the main processor. The supply means supplies the plural initialization data after the slave processor has been started. The instruction execution means has instruction execution control means that controls execution as follows: when the next instruction to be executed in the series requires the initialization data held in one of the registers, the instruction is executed if valid initialization data has already been written to that register; otherwise the instruction is executed only after waiting for the initialization data to be written to that register.

[0015] As described above, in the present invention the plural initialization data are supplied to the slave processor after it has been started, so the slave processor can begin executing instructions, in parallel with the main processor, immediately after setup of the initialization data begins. Moreover, because the slave processor waits for each initialization data to be set up before executing the instruction that uses it, synchronization between the setup of initialization data by the main processor and the execution of the instructions that use it is easily achieved.
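The wait-for-valid-data behaviour described above can be illustrated with a small cycle-by-cycle model. It is a sketch under simplifying assumptions that are not from the patent: at most one initialization data item arrives per cycle, at most one instruction issues per cycle, and all names (`simulate`, `program`, `arrivals`) are hypothetical.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]      # (name, index of register needed, or None)
Arrival = Optional[Tuple[int, int]]    # (register index, value) arriving in a cycle


def simulate(program: List[Instr], arrivals: List[Arrival], n_regs: int = 4):
    """Return a list of (cycle, instruction name, operand) in issue order."""
    value = [None] * n_regs
    valid = [False] * n_regs           # one valid flag per register
    issued = []
    pc = 0
    cycle = 0
    while pc < len(program):
        # The main processor's data for this cycle, if any, lands first.
        if cycle < len(arrivals) and arrivals[cycle] is not None:
            idx, v = arrivals[cycle]
            value[idx] = v
            valid[idx] = True
        name, need = program[pc]
        if need is None or valid[need]:
            issued.append((cycle, name, None if need is None else value[need]))
            pc += 1
        elif cycle >= len(arrivals):
            # No more data will ever arrive, so the instruction can never issue.
            raise RuntimeError(f"{name} waits forever for register {need}")
        cycle += 1
    return issued
```

In this model the slave side starts at cycle 0 even though only one register has been set up; an instruction stalls only when its own register is not yet valid, so no separate status exchange between the processors is needed per instruction.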

[0016] In a more specific embodiment of the multiprocessor system according to the present invention, the main processor has start means for instructing the slave processor to execute a series of instructions, and means for sequentially supplying to the slave processor the plural initialization data needed to execute the series of instructions. The slave processor has: a plurality of registers, each holding initialization data; instruction decoding means for sequentially decoding the series of instructions in response to the execution instruction from the main processor; a plurality of instruction execution means, each executing one of the plural kinds of processing requested by different instructions; a plurality of instruction holding means, provided in correspondence with the instruction execution means, each holding the decoded information of at least one instruction executable by the corresponding instruction execution means and causing that instruction execution means to execute the instruction on the basis of the held decoded information; and instruction issuing means for determining whether an instruction decoded by the decoding means is executable and, when it is, sending the decoded information of the instruction to the instruction holding means corresponding to the instruction execution means that can execute it. When a decoded instruction requires reading the initialization data held in one of the registers, the instruction issuing means reads that initialization data from the register and supplies it, together with the decoded information of the instruction, to one of the instruction holding means. The slave processor further has means for generating and holding read-completion information, which indicates that reading of the plural initialization data from the registers by the series of instructions is complete, when a specific instruction, located in the series after the plural instructions that require reading the initialization data, is issued by the instruction issuing means. The main processor further has means for reading the read-completion information from that holding means, in parallel with execution of the series of instructions by the slave processor, when the series of instructions is to be executed on the basis of other plural initialization data; and means for restarting the slave processor when this information indicates that reading is complete and, when it indicates that reading is not complete, restarting the slave processor only after waiting for the information to indicate completion.

[0017] Where the slave processor thus has means for issuing each decoded instruction to one of the plural instruction holding means, the initialization data an instruction needs is read from the registers when the instruction is issued and is sent, together with the decoded information of the instruction, to one of the instruction holding means; reading of that initialization data is therefore complete at this sending stage. Even while such an instruction is still held in an instruction holding means and has not yet been executed, once the specific instruction in the series that marks the end of the series has been executed, every preceding instruction that requires initialization data has already been issued. In response to this specific instruction, therefore, all the initialization data in the registers can be regarded as having been read. At this point execution of the already-issued preceding instructions may not yet be complete, but in this embodiment reading of the registers is treated as complete before those instructions finish, so the main processor can restart the slave processor and transfer initialization data to it in parallel with instruction execution in the slave processor.
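The reason the registers can be reused early is that each instruction carries its operand with it once issued. The following toy model illustrates this; the class, its attribute names, and the instruction names are all hypothetical, and actual execution of held instructions is not modelled.

```python
class SlaveModel:
    """Toy slave processor: operands are captured when an instruction is issued."""

    def __init__(self, n_regs: int):
        self.regs = [None] * n_regs
        self.read_complete = True   # stands in for the read-completion information
        self.holding = []           # instruction holding means: (name, captured operand)

    def start(self) -> None:
        # The main processor may (re)start only after all earlier reads are done.
        if not self.read_complete:
            raise RuntimeError("previous initialization data not yet read")
        self.read_complete = False

    def setup(self, idx: int, value: int) -> None:
        self.regs[idx] = value

    def issue(self, name: str, reg=None, last: bool = False) -> None:
        # Reading the register happens here, at issue, not at execution.
        operand = None if reg is None else self.regs[reg]
        self.holding.append((name, operand))
        if last:
            # The specific end-of-series instruction: every earlier
            # instruction has been issued, so every register has been read.
            self.read_complete = True
```

In this model, after the end-of-series instruction is issued the main processor can call `start()` and `setup()` again while earlier instructions still sit in `holding` with their original operands, so overwriting a register does not disturb them.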

[0018] Thus, in this multiprocessor system, each initialization data is transferred from the registers to one of the instruction holding means, so reading of all the initialization data held in the registers can be completed earlier than execution of the instructions that use the data. Moreover, this is achieved mainly by comparatively simple circuitry: means for transferring the initialization data to the instruction holding means, and means for generating and holding the read-completion information in response to the specific instruction.

[0019]

[Embodiments] The multiprocessor system according to the present invention is described below in more detail with reference to the embodiments shown in the drawings.

[0020] <Embodiment 1> The multiprocessor system shown in FIG. 1 is a vector processor consisting of a main memory 100 and, connected to it, a scalar processing unit (SPU) 200 and a vector processing unit (VPU) 300. The SPU is the main processor: it executes scalar processing and controls the start of the VPU. The VPU is the slave processor: it executes the vector processing designated by the SPU.

[0021] In the SPU, instruction fetch unit 201 fetches from main memory 100 the series of scalar instructions to be executed, decode unit 202 decodes the fetched instructions in order, and execution control unit 203 controls execution of each instruction according to what the decoded instruction specifies. For example, if the decoded instruction performs an operation on scalar data, execution control unit 203 controls, as appropriate, a register group 204 consisting of a plurality of general-purpose registers (GR) and a plurality of floating-point registers (FR) (hereinafter the GR/FR register group) and arithmetic unit 205 so that the instruction is executed. When the instruction uses data in main memory 100 for its operation, or writes data obtained by the operation to main memory 100, execution control unit 203 controls cache 206 to read or write that data. Instructions that perform such scalar operations include, for example, instructions that read from main memory 100, or compute, the address of the vector instruction sequence to be executed by the VPU and the base address or vector length of the vector data used by those instructions.

[0022] If the decoded instruction sets initialization data in the VPU, such as the base address of vector data to be processed by the VPU, the execution control unit 203 controls the reading of the GR/FR register group 204 so that the data is transferred from the GR/FR register group 204 to a register group control unit 600 in the VPU. If the instruction decoded by the decode unit 202 is an instruction that activates the VPU, the execution control unit 203 checks the execution state of the VPU and, when execution of the vector instruction sequence has completed, notifies the VPU via signal line 550 of the other initialization data needed to execute the vector processing, such as the vector instruction address and the vector length, and activates the VPU.

[0023] The VPU is provided with a vector register group 304. Each vector register either holds vector data, stored in the main memory 100, that is to be operated on and supplies it to an arithmetic pipeline 305, or holds the result vector data coming from that pipeline. The VPU is also provided with a plurality of resources, for example the arithmetic pipeline 305 and a load/store pipeline 306. The arithmetic pipeline 305 is a circuit that performs the same operation on the elements of vector data in a pipelined manner. The load/store pipeline 306 is a circuit that, in a pipelined manner, loads vector data from the main memory 100 into one of the vector registers or stores vector data from one of the vector registers to the main memory 100.

[0024] The VPU is provided with a sufficient number of arithmetic pipelines 305, so that vector operations can be executed in parallel on different vector data. The VPU is likewise provided with a sufficient number of load/store pipelines 306, so that different vector data can be loaded or stored in parallel. For simplicity, however, FIG. 1 shows only one arithmetic pipeline and one load/store pipeline.

[0025] When the VPU is activated by the SPU, an instruction fetch unit 301 sequentially fetches from the main memory 100 the series of vector instructions designated by the SPU, a decode unit 302 decodes the fetched vector instructions in order, and an execution control unit 303 determines whether each decoded vector instruction can be issued. That is, it determines whether the resource the instruction needs, for example the arithmetic pipeline 305 or the load/store pipeline 306, is available and, if so, transfers the decode information of the instruction (including, for example, the type of operation or processing to be performed and the numbers of the vector registers to be used) to the instruction holding unit 335 or 336 provided for that resource. When the instruction needs initialization data held in one of the scalar registers, the execution control unit 303 issues the instruction only after confirming that the data is already held in that scalar register. On issuing the instruction, it reads the data from the scalar register and transfers it to the instruction holding unit 335 or 336 as part of the decode information of the instruction.

[0026] The instruction holding unit 335 or 336 temporarily holds the decode information transferred from the execution control unit 303 and controls the resource corresponding to that holding unit so that the processing specified by the decode information is executed. For example, if a preceding instruction is using the corresponding resource, then after that instruction has finished, the holding unit starts the corresponding resource (the arithmetic pipeline 305 or the load/store pipeline 306) on the basis of the held decode information and has it carry out the processing the instruction requires. In accordance with the held decode information, the instruction holding unit 335 or 336 directs the vector register in the vector register group 304 designated by the decode information to read out vector data, or directs the designated vector register to write the vector data obtained by the processing. The technique outlined above, in which the VPU uses an execution control unit 303 that controls instruction issue together with instruction holding units 335 and 336 provided for the respective resources, is already described in JP-A-1-128162.

This embodiment differs from the prior art in that, when the execution control unit 303 issues an instruction that needs initialization data in one of the scalar registers, it issues the instruction only after confirming that the initialization data has already been written to that scalar register and, on issue, reads the initialization data from the scalar register and transfers it to one of the instruction holding units as part of the decode information. It differs further in that, when the transferred decode information contains initialization data, the instruction holding unit 335 or 336 also holds that data until the processing specified by the decode information is executed, and at execution time sends it to the corresponding resource for use in that processing.

[0027] In this embodiment, after activating the VPU, the SPU supplies the initialization data, such as the base addresses needed to load the vector data the VPU is to process, one item after another in the order in which the VPU needs them. The register group control unit 600 sets each item of initialization data in the scalar register of the scalar register group 601 designated by the SPU, and sets the valid bit V provided for that scalar register.

[0028] In general, before an instruction is executed, the execution control unit 303 determines whether the resources needed for its execution, such as an arithmetic pipeline, are available, and issues the instruction when they are. When the instruction needs initialization data held in one of the scalar registers, the execution control unit 303 also determines whether that initialization data is available before issuing the instruction. In this embodiment, when the instruction loads or stores vector data, it uses as the base address of that vector data the initialization data stored by the SPU in one of the registers of the scalar register group 601.

[0029] If the initialization data has already been written to the scalar register, the execution control unit 303 issues the instruction; otherwise it waits until the initialization data has been written and then issues the instruction. Whether the initialization data has been written to the scalar register is determined by whether the valid bit V corresponding to that scalar register is 1. When the execution control unit 303 judges that the instruction can be issued, it reads the initialization data from the scalar register designated by the instruction and transfers it, together with the decode information of the instruction, to the instruction holding unit for the resource the instruction needs, for example the instruction holding unit 336 corresponding to the load/store pipeline 306. The instruction holding unit 336 executes the load or store instruction when the corresponding resource becomes available.
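The issue check described above can be sketched in a few lines of Python. This is only an illustrative software model, not the circuit of the embodiment: the class `ScalarRegisterFile`, the function `try_issue`, and the register count of 8 are hypothetical names and values chosen for the example.

```python
class ScalarRegisterFile:
    """Hypothetical model of a scalar register group with one valid (V) bit per register."""

    def __init__(self, size=8):
        self.values = [0] * size
        self.valid = [0] * size  # V bit: 1 once the SPU has written the register


def try_issue(regs, scalar_reg):
    """Model the issue decision for one decoded vector instruction.

    scalar_reg is the number of the scalar register the instruction needs,
    or None if it needs no initialization data.
    Returns (issued, operand): the operand is read at issue time so that it
    can travel with the decode information to the instruction holding unit.
    """
    if scalar_reg is None:
        return True, None            # nothing to wait for
    if regs.valid[scalar_reg] != 1:
        return False, None           # V bit is 0: hold the instruction back
    return True, regs.values[scalar_reg]
```

Because the operand is copied out when the instruction is issued, the register itself is no longer needed by that instruction afterwards, which is what later allows the register to be overwritten before the instruction has actually executed.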

[0030] The series of instructions executed by the VPU ends with a specific instruction, vlnk. When the execution control unit 303 issues this instruction, the register group control unit 600 resets the valid bit group V 620 corresponding to the scalar register group 601, and the initial control unit 350 sets to 1 the read completion bit L, which indicates that reading of the initialization data in the scalar register group 601 is complete.

[0031] After transferring the plurality of initialization data to the scalar register group 601 of the VPU, and while the VPU is still executing the series of vector instructions, the SPU activates the VPU again so that the same series of vector instructions will next process other vector data. For this activation, the SPU uses the read completion bit L to judge whether the VPU has finished reading the initialization data in the scalar register group 601. If reading has not finished, the SPU waits for it to finish. Once reading has finished, the SPU sequentially sets up in the VPU the next plurality of initialization data, including the base addresses of the vector data the VPU is to process next. The VPU then repeats the operation already described, and the SPU and the VPU go on repeating the same operations.

[0032] In the VPU, the instructions issued before the vlnk instruction may actually execute on a resource considerably later than the time at which each was issued. However, by the time the vlnk instruction is issued, the initialization data in the scalar registers designated by the preceding instructions has already been read from those registers, at the time those preceding instructions were issued. In this embodiment, therefore, the read completion bit L is set to 1 when the vlnk instruction is issued, which allows the SPU to set up the next plurality of initialization data in the scalar register group 601 before execution of the series of instructions has completed.

[0033] As described above, in this embodiment the SPU sets initialization data such as base addresses in the VPU one item after another after activating the VPU, and in the VPU each instruction that uses such data waits until the SPU has set the data it needs before it is executed. A vector instruction can therefore execute as soon as the initialization data it uses has been set, so the VPU can begin executing early.

[0034] Furthermore, the SPU needs to activate the VPU only once for each execution of the series of vector instructions. The SPU therefore waits only once for the VPU to finish reading the initialization data already set up, so little time is spent on this wait.

[0035] Furthermore, as described above, the initialization data needed from the scalar registers is read when each instruction is issued and is transferred to the instruction holding unit provided for the resource that instruction needs. This allows the SPU to set the subsequent plurality of initialization data in the scalar register group 601 at any time after the vlnk instruction, the last instruction of the group, has been issued.

[0036] The circuits of this embodiment and their operation are described below with reference to a concrete instruction sequence. FIG. 5 is a Fortran program describing an example of processing to be performed by the vector processor of FIG. 1. As already noted, in the vector processor of FIG. 1 the VPU has a plurality of arithmetic pipelines 305 and load/store pipelines 306 and can perform the same processing on different vector data in parallel. To exploit this, the program of FIG. 5 writes the multiply-add operation as four expressions, so that it is compiled into a vector instruction sequence that the vector processor of FIG. 1 processes four elements at a time in parallel.

[0037] This program consists of a triply nested DO loop that performs a so-called multiply-add operation: the product of two-dimensional arrays X and Y is added to two-dimensional array Z, and the result becomes the new array Z. When only the innermost loop is vectorized, arrays Z and Y are treated as vector data and each element of array X is treated as scalar data. That is, the first expression of the innermost loop is executed as a vector operation that computes the multiply-add of the vector data consisting of the N elements in column j of array Z (hereinafter sometimes called vector data Z0), the vector data consisting of the N elements in column k of array Y, and the element in row k, column j of array X, and replaces column j of array Z with the resulting N-element vector data. After this processing, following the middle DO loop, the value of j is advanced by 4 and the multiply-add is performed using vector data from another column of array Z, another element of array X, and the same vector data of array Y as before. Finally, following the outermost loop, the above processing is repeated for each value of k.
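FIG. 5 itself is not reproduced in this text, so the following Python sketch is a reconstruction of the loop nest from the prose description only. The function name `multiply_add_unrolled`, the use of zero-based indices, and the assumption that N is a multiple of 4 are choices made for the example.

```python
def multiply_add_unrolled(x, y, z, n):
    """Compute z[i][j] += x[k][j] * y[i][k] with the j loop unrolled by 4.

    x, y, z are n-by-n lists of lists; n is assumed to be a multiple of 4.
    The innermost i loop is the one that would be vectorized: for fixed j and k
    it updates one column of z using one column of y and a single scalar of x.
    """
    for k in range(n):                    # outermost loop
        for j in range(0, n, 4):          # middle loop, step 4
            x0, x1, x2, x3 = x[k][j], x[k][j + 1], x[k][j + 2], x[k][j + 3]
            for i in range(n):            # innermost (vectorized) loop
                z[i][j] += x0 * y[i][k]        # column Z0
                z[i][j + 1] += x1 * y[i][k]    # column Z1
                z[i][j + 2] += x2 * y[i][k]    # column Z2
                z[i][j + 3] += x3 * y[i][k]    # column Z3
    return z
```

For a fixed pair (k, j), the four statements share the same column of y, which is why a single load of that column can feed four independent multiply-add operations.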

[0038] The processing written in the second to fourth statements of the innermost loop is executed as the same multiply-add operation on different groups of elements of array Z(i, j). The vector data consisting of the different element groups of array Z(i, j) processed by the second to fourth expressions is called vector data Z1, Z2, and Z3 below. The elements of array X(i, j) used to compute vector data Z0, Z1, Z2, and Z3 are called X0, X1, X2, and X3, respectively.

[0039] As described above, in the program of FIG. 5 one DO loop processes a plurality of vector data, each consisting of N elements. In the apparatus of FIG. 1, each vector data is processed in groups of elements equal in number to the length of a vector register (assumed to be 512 in this embodiment). Each vector data in the main memory 100 is therefore actually processed in sets of 512 elements. That is, on the first execution of the series of vector instructions, the first set of 512 elements of each vector data is loaded from the main memory 100 into one of the vector registers, supplied to the arithmetic pipeline 305, and processed.

[0040] After the series of vector instructions has executed, the next set of 512 elements of each vector is processed in the same way by the same series of vector instructions. Each of these 512-element sets is also called vector data. For each execution of the vector instruction sequence, the SPU computes the initialization data needed for that execution, or fetches it from the main memory 100, and sets it up in the scalar register group 601 of the VPU.
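The division of an N-element vector into 512-element sets, and the per-set base address that the SPU would have to prepare, can be illustrated as follows. This is a sketch under assumptions: the function name `strips` is hypothetical, the stride is taken to be a byte distance between adjacent elements, and the handling of a final set shorter than 512 elements is an assumption that the text above does not spell out.

```python
VECTOR_REGISTER_LENGTH = 512  # vector register length assumed in the embodiment


def strips(base_address, n, stride):
    """Yield (base_address, length) for each run of the vector instruction sequence.

    base_address: main memory address of the first element of the whole vector
    n:            total number of elements
    stride:       address distance between adjacent elements (assumed in bytes)
    """
    for first in range(0, n, VECTOR_REGISTER_LENGTH):
        # Assumption: the last set simply carries the remaining elements.
        length = min(VECTOR_REGISTER_LENGTH, n - first)
        yield base_address + first * stride, length
```

For example, `list(strips(0x1000, 1100, 8))` describes three runs: two of 512 elements and one of 76, each with its own base address.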

[0041] FIGS. 6(a-1) and 6(b-1) outline, respectively, the scalar instruction sequence to be executed by the SPU and the vector instruction sequence to be executed by the VPU in order to run the program of FIG. 5, and FIGS. 6(a-2) and 6(b-2) show flowcharts of the processing of these scalar and vector instruction sequences, respectively. FIGS. 7(a) and 7(b) show the details of the scalar and vector instruction sequences of FIGS. 6(a-1) and 6(b-1).

[0042] As shown in FIG. 7(a), the scalar instruction sequence is broadly divided into four scalar instruction sequences 1, 2, 3, and 4. Scalar instruction sequence 1 contains two mtvpr instructions that set up initialization data. These instructions set up those items, among the initialization data the SPU must set up in the VPU, whose values do not change between repetitions of the vector instruction sequence. Scalar instruction sequence 2 contains the exvp instruction that activates the VPU, and performs the activation only after confirming that the VPU has already finished executing the vector instruction sequence (called vcode in this embodiment). That is, scalar instruction sequence 2 is structured to wait, in a loop using a bc instruction, for the read completion bit L to become 0, and to issue the exvp instruction once L = 0. Scalar instruction sequence 3 is an instruction sequence that sequentially sets up in the VPU those items of the initialization data required by the vector instruction sequence vcode whose values change with each repetition of the vector instruction sequence. It includes instructions that load an item of initialization data to be set up from the main memory 100 into the GR/FR register group 204, and instructions that compute an item of initialization data to be set up from data in the GR/FR register group 204. Scalar instruction sequence 4 is a conditional branch instruction bc that directs the scalar instructions to be repeated from scalar instruction sequence 2 for the next execution of the vector instruction sequence vcode.

[0043] Thus the scalar instruction sequence of this embodiment is characterized in that it contains the instruction exvp, which activates the VPU, before scalar instruction sequence 3 sets up the necessary initialization data. It is also characterized in that, when the instruction exvp is executed, it waits for the VPU to finish reading the initialization data in the scalar register group before scalar instruction sequence 3 is executed. A further feature is that scalar instruction sequence 1 sets up in advance, before the VPU is activated, the initialization data that is used at every repetition of the vector instruction sequence but whose value does not change.

[0044] As shown in FIG. 7(b), the vector instruction sequence vcode actually contains, in addition to vector instruction sequence 11, an instruction 12 that is not a vector instruction. Among the vector instructions in vector instruction sequence 11, an instruction that uses initialization data set up in one of the registers of the scalar register group 601 is issued only after the execution control unit 303 has confirmed that the V bit provided for that scalar register is 1. Instruction 12 is the instruction vlnk, which notifies the VPU that instruction issue in the VPU has finished. It is executed when issue of vector instruction sequence 11 has finished, and resets to 0 the read completion bit L in the initial control unit 350 of the VPU. When the L bit is reset by the vlnk instruction, the instruction fetch unit 301, the decode unit 302, and so on know that there are no further instructions to execute. By examining the L bit, the SPU can detect that all vector instructions in the vector instruction sequence vcode have been issued. In addition, this instruction resets to 0 the V bits corresponding to those registers of the scalar register group 601 that were set up by scalar instruction sequence 3, so that the VPU can detect when the SPU sets up new initialization data in the scalar register group 601. The vlnk instruction can designate the scalar registers whose V bits are to be reset to 0.
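The effect of the vlnk instruction on the L bit and the V bits, as described in the paragraph above, can be modelled with a small state object. The class name `VpuHandshakeState`, the register count, and the register numbers used in the usage note are hypothetical; only the behaviour (L reset to 0, designated V bits reset to 0, other V bits left alone) follows the description.

```python
class VpuHandshakeState:
    """Hypothetical model of the L bit and the per-register V bits of the VPU."""

    def __init__(self, num_scalar_regs=8):
        self.l_bit = 0
        self.v_bits = [0] * num_scalar_regs

    def vlnk(self, regs_to_invalidate):
        """Model issue of vlnk after the last vector instruction of the sequence.

        Resets the L bit to 0 and clears the V bit of every designated register.
        Registers that are not designated keep their V bit, so values that do
        not change between repetitions stay usable without being sent again.
        """
        self.l_bit = 0
        for reg in regs_to_invalidate:
            self.v_bits[reg] = 0
```

If, say, registers 0 and 1 held the stride and the base address of Y, and registers 2 to 5 held per-repetition values, calling `vlnk([2, 3, 4, 5])` would leave registers 0 and 1 valid for the next repetition.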

[0045] Execution of the instruction sequences of FIG. 7 is described in more detail below with reference to the time chart of FIG. 8. The figure shows how instructions execute when the activation and execution of vcode are carried out for two iterations of the outer loop. In FIG. 8, each hatched portion represents the execution of one vector instruction.

[0046] (Initialization data to be set up in the VPU) The initialization data that the SPU must set up in the VPU in order to execute the processing specified by the program of FIG. 5 is as follows.

[0047]
(a) The main memory address vcode of the first instruction of the vector instruction sequence to be executed.
(b) The vector data length N (the plurality of vector data to be processed are all arranged to have the same length).
(c) The main memory address interval between adjacent elements of vector data Z and Y (hereinafter called the stride); in this embodiment it is assumed to be the same for vector data Z and Y.
(d) The starting main memory address of vector data Y (hereinafter called the base address); it does not change from one execution of the vector instruction sequence vcode to the next.
(e) The starting main memory addresses (hereinafter called base addresses) of vector data Z0, Z1, Z2, and Z3; these change with every execution of the vector instruction sequence vcode.
(f) The elements X0, X1, X2, and X3 of vector data X that are used in the multiply-add with vector data Z0, Z1, Z2, and Z3, respectively; these change with every execution of the vector instruction sequence.

(Setup of initialization data by the SPU (part 1))
In this embodiment, scalar instruction sequence 1 sets up items (c) and (d) above in the VPU, one after the other, before the VPU is activated. These setups are performed by two mtvpr instructions. The mtvpr instruction sets data in a scalar register in the VPU: the VPU writes the initialization data designated by the instruction to the scalar register designated by the instruction and sets the corresponding V bit to 1. Specifically, when the decode unit 202 in the SPU decodes an mtvpr instruction, the execution control unit 203 reads the stride or the base address of vector data Y from the register, among those of the GR/FR register group 204 in the SPU, designated by that setup instruction mtvpr, and transfers it via line 701 to the register group control unit 600 in the VPU.

[0048] At this time the number of the scalar register designated by the instruction is also transferred. In the register group control unit 600, as shown in FIG. 2, a write control unit 603 writes the transferred initialization data to the scalar register in the scalar register group 601 having the number transferred from the SPU, and controls a selector 607 to write 1 to the V bit, in the V bit group 602, that corresponds to that scalar register. In FIG. 8, the first two black circles indicate that these two items of initialization data are set up in order.
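As a rough software analogy of the mtvpr setup path just described, the VPU-side effect amounts to a register write paired with setting the matching V bit. The function name `mtvpr` mirrors the instruction mnemonic, but its Python signature, the list-based register model, and the register numbers and values in the usage note are assumptions made for illustration.

```python
def mtvpr(scalar_regs, v_bits, reg_number, value):
    """Model the VPU-side effect of one mtvpr instruction.

    scalar_regs: list modelling the scalar register group
    v_bits:      list modelling the V bit group, one entry per scalar register
    reg_number:  scalar register number sent by the SPU along with the data
    value:       initialization data read from the SPU's GR/FR register
    """
    scalar_regs[reg_number] = value  # write performed by the write control unit
    v_bits[reg_number] = 1           # mark the register as holding valid data
```

Setting up the two loop-invariant items would then be two calls, for example `mtvpr(regs, v, 0, 8)` for a stride of 8 and `mtvpr(regs, v, 1, 0x4000)` for the base address of Y.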

[0049] (Activation of the VPU by the SPU) The first instruction in scalar instruction sequence 2, exvp, directs activation of the VPU. It examines the L bit 915 in the VPU and, if its value is 1, that is, if the vector instructions in the VPU have not finished reading the scalar register group 601, it sets a condition code and terminates. The second instruction in scalar instruction sequence 2 is the conditional branch instruction bc, which examines the condition code and, if the branch condition holds, branches to the designated instruction address, in this case the address wait1 of the exvp instruction. With these two instructions the SPU thus waits for the VPU to finish reading the scalar register group 601.

【００５０】If the L bit 915 is 0 when the exvp instruction is executed, that is, if the reading of the scalar register group 601 by the vector instructions has been completed, the instruction notifies the VPU of the vector instruction address it designates (here also denoted vcode) and the vector length N, directs that the L bit 915 be set to 1, and terminates. Specifically, when the exvp instruction is decoded by the decode unit 202 in the SPU, the execution control unit 203 obtains the value of the L bit 915 in the initial control unit 350 of the VPU, shown in FIG. 3, via the line 550-0. When the value of the L bit 915 is 1, it sets a condition code in a condition code register (not shown) and terminates the instruction.

【００５１】When the L bit is 0, the execution control unit 203 sends the vector instruction address (in this case vcode) and the vector length N specified as operands of the exvp instruction to the initial control unit 350 via the line 550-1. In the initial control unit 350, the instruction fetch activation unit 917 sends the vector instruction address vcode to the instruction fetch unit 301 via the line 552 and activates instruction fetching. The vector length setting unit 920 sends the vector length N to the execution control unit 303 via the line 557-1. Further, the set control unit 916 sets the L bit 915 to 1. When the L bit 915 becomes 1, the instruction decode unit 302 is activated via the line 553 and starts decoding the instructions fetched by the instruction fetch unit 301.
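The exvp/bc handshake described in the preceding paragraphs can be sketched as follows. This is a minimal software model under stated assumptions: the condition code values (1 for "VPU busy, retry", 0 for "started"), the class and function names, and the bounded retry count are all illustrative and do not come from the specification.

```python
class InitialControlUnit:
    """Toy model of the initial control unit 350 holding the L bit 915."""

    def __init__(self):
        self.l_bit = 0
        self.fetch_address = None
        self.vector_length = None


def exvp(icu, vcode_address, n):
    """Model of exvp. Returns an assumed condition code: 1 = busy, 0 = started."""
    if icu.l_bit == 1:
        return 1                        # scalar registers still being read
    icu.fetch_address = vcode_address   # passed on to instruction fetch
    icu.vector_length = n               # passed on to execution control
    icu.l_bit = 1                       # VPU enters the executing state
    return 0


def start_vpu(icu, vcode_address, n, max_tries=1000):
    """Model of 'wait1: exvp ; bc wait1': retry until the VPU is started."""
    for tries in range(1, max_tries + 1):
        if exvp(icu, vcode_address, n) == 0:
            return tries
    raise TimeoutError("VPU did not become available")


icu = InitialControlUnit()
first = exvp(icu, 0x400, 64)    # VPU idle: it is started
second = exvp(icu, 0x400, 64)   # L bit is still 1: busy
icu.l_bit = 0                   # what the last vector instruction would do
tries = start_vpu(icu, 0x400, 64)
```

In hardware the loop ends when the VPU itself clears the L bit; the sketch clears it by hand only to keep the example single-threaded.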

【００５２】Reading the L bit 915 via the line 550 requires the time for a signal to make a round trip over this line. The VPU is generally mounted on a different board from the SPU, so the line 550 is relatively long. For this reason, executing the exvp instruction takes multiple cycles, which is why the execution time of the scalar instruction sequence 2 spans multiple cycles in FIG. 8. As FIG. 8 also shows, for the same reason there is a delay between the execution of the exvp instruction and the change of the L bit from 0 to 1.

【００５３】(Execution of the vector instruction sequence (Part 1)) The vector instructions in the vector instruction sequence vcode consist of the following five partial instruction sequences. The first instruction vlfd loads the vector data Y into a vector register; this vector data Y is used in common for the calculation of each of the vector data Z0, Z1, Z2, and Z3. The second instruction vlfd through the fourth instruction vstd load the vector data Z0 from the main memory 100 into a vector register, perform a multiply-add operation on the vector data Z0, the vector data Y, and the element X0 to compute new vector data Z0, place the result in a vector register, and then store the resulting vector data Z0 from that vector register into the main memory 100. Similarly, the fifth instruction vlfd through the seventh instruction vstd compute the vector data Z1, the eighth instruction vlfd through the tenth instruction vstd compute the vector data Z2, and the eleventh instruction vlfd through the thirteenth instruction vstd compute the vector data Z3.

【００５４】In the VPU, the execution control unit 303 determines whether the first instruction of the fetched vector instruction sequence vcode can be issued and, if so, issues it immediately. In this case, the first instruction vlfd is a vector data load instruction, which loads the vector data it designates from the main memory 100 into the vector register it designates. The base address and stride needed to compute the addresses of the vector data are read from the two scalar registers designated by this instruction. However, when at least one of the two V bits corresponding to these scalar registers is 0, the instruction is issued only after both V bits have become 1. Once both V bits are 1, the necessary data has been written to the scalar register group 601, so the base address and stride are read from the two designated scalar registers in the scalar register group 601 and the instruction is issued.

【００５５】In the program of FIG. 7, this first vlfd instruction loads the vector data Y into a vector register, and the base address and stride it requires have already been set up in scalar registers by the scalar instruction sequence 1 before the VPU is activated. This load instruction can therefore be issued, and it is issued immediately.

【００５６】The second instruction of the vector instruction sequence vcode loads the vector data Z0 from the main memory 100. The initialization data this instruction requires has not been set up by the scalar instruction sequence 1, so the instruction is issued after that initialization data has been set up from the SPU.

【００５７】More specifically, instruction execution in the VPU is controlled as follows. When a vector instruction is decoded by the decode unit 302 in the VPU, the decode result is sent to the execution control unit 303 via the line 503. In addition, if decoding shows that the instruction refers to a scalar register, a directive is given to the register group control unit 600 via the line 509. In the register group control unit 600, as shown in FIG. 2, the read control unit 605 responds to the directive on the line 509 by reading the contents of the designated scalar register via the selector 608 and sending them to the line 702-1, and by reading the value of the V bit corresponding to the designated scalar register via the selector 609 and sending it to the line 702-0.

【００５８】The execution control unit 303 (see FIG. 4) receives the decoded instruction in the instruction register 800 via the line 503. Of the contents of the instruction register 800, the information on the instruction type is taken into the pipe check unit 804 and compared with the contents of the status display units 801 and 802, which indicate the operating status of the operation pipeline and the load/store pipeline respectively, to check whether the instruction may be issued. The status display units 801 and 802 are notified of the operating status of the operation pipeline 305 and the load/store pipeline 306 via the lines 505-1 and 506-1, respectively. An instruction can be issued if a pipeline (resource) capable of executing it is free. In this embodiment, whether a resource is free is actually judged by whether the instruction holding unit provided for that resource can hold a new instruction.

【００５９】Of the contents of the instruction register 800, the information on which registers are used is taken into the register check unit 805 and compared with the contents of the vector register status display unit 803, which indicates the operating status of the vector register group 304, and with the value of the V bit indicated on the line 702-0, to check whether the instruction may be issued. The vector register status display unit 803 is notified of the operating status of the vector register group 304 via the line 504-1. The instruction can be issued when the read/write ports of the required vector registers are free and the V bit corresponding to each scalar register that must be read is 1.

【００６０】When the pipe check unit 804 and the register check unit 805 both determine that the instruction can be issued, the instruction issue unit 806 issues it. That is, it sends the scalar register value received from the line 702-1 and the instruction type received via the line 812 to the operation pipeline 305 or the load/store pipeline 306 via the line 505-0 or the line 506-0. It also sends the vector length received from the line 557-1, together with the vector register number and the read/write distinction received via the line 813, to the vector register group 304 via the line 504-0. Further, it updates the pipeline status display units 801 and 802 and the vector register status display unit 803 via the line 814.
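The two-part issue check described above (pipe check and register check) can be summarized in a short Python sketch. The dictionary layout for an instruction, the pipeline names, and the register numbers are assumptions chosen for illustration; the specification describes hardware units, not this data structure.

```python
def can_issue(instr, pipeline_free, vreg_port_free, v_bits):
    """Return True only if both the pipe check and the register check pass."""
    # Pipe check (unit 804 in the text): the needed pipeline must be free.
    pipe_ok = pipeline_free[instr["pipeline"]]
    # Register check (unit 805 in the text): vector register ports must be
    # free and every scalar register to be read must have its V bit set.
    ports_ok = all(vreg_port_free[r] for r in instr["vector_regs"])
    scalars_ok = all(v_bits[s] == 1 for s in instr["scalar_regs"])
    return pipe_ok and ports_ok and scalars_ok


# Hypothetical vlfd that reads scalar registers 2 (base) and 3 (stride).
vlfd = {"pipeline": "load_store", "vector_regs": [1], "scalar_regs": [2, 3]}
pipeline_free = {"arith": True, "load_store": True}
vreg_port_free = [True, True, True, True]
v_bits = [0, 0, 1, 0]

before = can_issue(vlfd, pipeline_free, vreg_port_free, v_bits)
v_bits[3] = 1   # the SPU sets up the missing item
after = can_issue(vlfd, pipeline_free, vreg_port_free, v_bits)
```

The instruction is held back while one of its V bits is 0 and becomes issuable as soon as the missing item arrives, without any further action by the SPU.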

【００６１】When the V bit corresponding to a scalar register designated by an instruction is 0, the issue unit 806 waits for it to become 1 before issuing the instruction. When issuing the instruction, it reads the contents of the scalar register designated by the instruction and transfers them, together with the decode information of the instruction, to the instruction holding unit (for example, 336) provided for the resource the instruction requires (for example, the load/store pipeline 306). Thus, by the time an instruction has been issued, the reading of the scalar registers it requires is complete.

【００６２】When an instruction issued by the issue unit 806 is transferred to one of the instruction holding units, for example 336, that unit temporarily holds the decode information of the transferred instruction and the initialization data read from the scalar register for that instruction. When the corresponding resource, here the load/store pipeline 306, becomes available, the unit uses the held decode information and initialization data to direct that resource to execute the processing the instruction requests.
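The point of the instruction holding unit is that scalar values are copied at issue time, so later writes to the same scalar registers cannot disturb an instruction that is already issued. A minimal sketch, assuming a hypothetical `HeldInstruction` record and made-up register contents:

```python
from dataclasses import dataclass


@dataclass
class HeldInstruction:
    """Toy model of one entry in an instruction holding unit (e.g. 336)."""
    opcode: str
    scalars: tuple   # scalar register values copied at issue time


def issue(opcode, scalar_numbers, values):
    """Read the designated scalar registers now and keep private copies."""
    return HeldInstruction(opcode, tuple(values[n] for n in scalar_numbers))


values = [0, 0, 0x2000, 8]            # register 2: base address, 3: stride
held = issue("vlfd", (2, 3), values)  # issued, possibly still waiting to run
values[2] = 0x3000                    # SPU sets up the next iteration's base
```

The held instruction still carries the base address 0x2000 even though the register now contains 0x3000, which is why the SPU may start the next setup as soon as the last instruction has been issued.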

【００６３】In the present example, for the first instruction vlfd of the vector instruction sequence vcode, the load/store pipeline 306 loads the vector data Y from the main memory 100 into the one vector register designated by this instruction, based on the base address of the vector data Y set up by the SPU.

【００６４】(Setup of initialization data by the SPU (Part 2)) The scalar instruction sequence 3 sets up, among the initialization data to be set up in the VPU described above, the eight items of initialization data listed under (e) to (f). The scalar instruction sequence 3 consists of four partial instruction groups, which generate the initialization data needed to process each of the vector data Z0 to Z3 according to the four arithmetic expressions of FIG. 5 and set it up in the VPU.

【００６５】That is, the first instruction mtvpr through the fourth instruction add of the scalar instruction sequence 3 are the scalar instructions related to the processing of the vector data Z0. The first instruction mtvpr sets up the base address of the vector data Z0, stored beforehand in a general-purpose register, in the scalar register designated by this instruction. The second instruction lfdu loads one element X0 of the array data X(i,j), used to compute the vector data Z0, into a floating-point register in the SPU. The third instruction fmtvpr sets up this loaded data X0 in a scalar register in the VPU. The fourth instruction add is an addition instruction that computes the base address of the part of the vector data Z that is to be processed as the vector data Z0 at the next execution of the vector instruction sequence vcode. The fifth instruction mtvpr through the eighth instruction add are similar instructions related to the processing of the vector data Z1. The ninth instruction mtvpr through the twelfth instruction add are similar instructions related to the processing of the vector data Z2. The thirteenth instruction mtvpr through the sixteenth instruction add are similar instructions related to the processing of the vector data Z3.

【００６６】(Execution of the vector instruction sequence (Part 2)) When the first instruction mtvpr in the scalar instruction sequence 3 sets up its initialization data in a scalar register in the VPU, both items of initialization data required by the second instruction vlfd of the vector instruction sequence vcode (namely, the base address and stride of the vector data Z0) become available, so this instruction becomes issuable and is issued immediately. The third instruction vfmadd in the vector instruction sequence vcode performs a multiply-add operation: it adds the vector data Z0 in another vector register to the product of the vector data Y in a vector register and the scalar data X0 in a scalar register, and replaces the original vector data Z0 with the result. This instruction is issued once the initialization data X0 it requires has been set up by the third instruction fmtvpr in the scalar instruction sequence 3 executed by the SPU. The second and first vector instructions, which load the vector data Z0 and Y that this instruction requires into the appropriate vector registers, are already executing. The third vector instruction vfmadd therefore uses the data already loaded by the second and first instructions while those instructions continue loading the subsequent elements of the vector data. This technique, known as chaining, is well known in itself; the circuit for it is incorporated in the apparatus of FIG. 1 in this embodiment as well, but is not shown for simplicity. The fourth instruction vfstd of the vector instruction sequence vcode stores the vector data Z0 computed by the third instruction from the vector register into the main memory 100. The initialization data this instruction requires is the base address and stride of this vector. Since these have already been set up in scalar registers by instructions already executed, the fourth instruction vfstd can be executed immediately after the third instruction.
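The load, multiply-add, and store steps for one vector Z0 can be sketched as below. This is an arithmetic illustration only: main memory is modeled as a Python list addressed by element index (an assumption; the real addressing unit is not stated here), and the function names simply reuse the mnemonics from the text.

```python
def vlfd(memory, base, stride, n):
    """Load n elements starting at base with the given stride."""
    return [memory[base + i * stride] for i in range(n)]


def vfmadd(z, y, x0):
    """Element-wise multiply-add: new Z = Z + X0 * Y."""
    return [zi + x0 * yi for zi, yi in zip(z, y)]


def vfstd(memory, base, stride, values):
    """Store the elements back to memory with the given stride."""
    for i, v in enumerate(values):
        memory[base + i * stride] = v


# Hypothetical layout: Y occupies elements 0..3, Z0 occupies elements 4..7.
memory = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
y = vlfd(memory, 0, 1, 4)
z0 = vlfd(memory, 4, 1, 4)
z0 = vfmadd(z0, y, 2.0)    # X0 = 2.0
vfstd(memory, 4, 1, z0)
```

Each of the three calls needs different scalar inputs (base and stride for the loads and the store, X0 for the multiply-add), which is why the corresponding vector instructions can become issuable at different times.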

【００６７】Likewise, the instructions for processing the vector data Z1, Z2, and Z3 are each issued as soon as the initialization data they require has been set up from the SPU.

【００６８】(Execution of the vector link instruction vlnk) After the preceding vector instructions have been issued, the last instruction vlnk of the vector instruction sequence vcode is issued by the instruction issue unit 806 and executed. In the program of FIG. 7, this instruction resets to 0 the V bits of all scalar registers other than the two scalar registers that hold the stride and the base address of the vector data, which were set up by the scalar instruction sequence 1 executed by the SPU before the VPU was activated. Those two registers are left valid because their data is also used at the next execution of the vector instruction sequence vcode. In addition, the instruction resets the L bit 915 to 0 to notify the SPU that the issuing of the vector instruction sequence vcode has finished.

【００６９】This instruction is executed as follows. When the vlnk instruction is decoded by the decode unit 302 in the VPU, the decode result is sent to the execution control unit 303 via the line 503. The execution control unit 303 (see FIG. 4) performs the following processing after all preceding vector instructions have been issued. It sends a reset directive for the V bits designated by the instruction to the register group control unit 600 via the line 702-2, and sends a reset directive for the L bit to the initial control unit 350 via the line 557-0. In the register group control unit 600 (see FIG. 2), the reset control unit 604 responds to the directive on the line 702-2 by resetting, via the selector 607, the designated V bits, in the present example the V bits corresponding to the scalar registers other than the two that hold the stride of the vector data Y and Z and the base address of the vector data Y. In the initial control unit 350 (see FIG. 3), the reset control unit 919 responds to the directive on the line 557-0 by setting the L bit 915 to 0. When the L bit becomes 0, this value is reported to the instruction decode unit 302 via the line 553, and the instruction decode unit 302 stops decoding instructions.
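The effect of vlnk on the V bits and the L bit can be sketched as follows. The function signature, the `keep` set, and the dictionary holding the L bit are assumptions for illustration; in the embodiment the V bits to reset are designated by the instruction itself.

```python
def vlnk(v_bits, keep, control):
    """Toy model of vlnk: clear all V bits except those in `keep`, then clear L."""
    for number in range(len(v_bits)):
        if number not in keep:
            v_bits[number] = 0   # force the next run to wait for fresh data
    control["l_bit"] = 0         # tells the SPU that issuing has finished


# Hypothetical state: registers 0 and 1 hold the stride and the base address
# of Y, which are reused by every execution of the vector instruction sequence.
v_bits = [1, 1, 1, 1, 1]
control = {"l_bit": 1}
vlnk(v_bits, keep={0, 1}, control=control)
```

Keeping the V bits of the reused registers at 1 lets the first load of the next run issue immediately, while the cleared bits make every other instruction wait for the SPU's next setup.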

【００７０】When the vlnk instruction has thus been executed, the VPU enters a state of waiting for activation from the SPU, and the SPU repeats the scalar instruction sequences 2 and 3 already described. In each repetition of the scalar instruction sequence 3, different elements of the array X(i,j) are set up in the VPU as the data X0, X1, X2, and X3.

【００７１】In the embodiment above, the scalar register data an instruction requires is read when the instruction is issued, so by the time the vlnk instruction 12 is issued and the L bit becomes 0, all references to scalar registers by the vector instruction sequence vcode have finished. Consequently, even if the SPU executes the exvp instruction in the scalar instruction sequence 2 again and then writes new values to the scalar registers with the scalar instruction sequence 3, the vector instruction sequence vcode currently executing is not adversely affected. Moreover, because the vlnk instruction 12 at the end of the vector instruction sequence vcode resets the validity bit V of each scalar register, the next execution of the instruction sequence 11 on the reactivated VPU always waits for the second execution of the scalar instruction sequence 3 on the SPU before referring to a scalar register value, and so never mistakenly refers to a previous value. The V bits of the scalar registers that the SPU set in the scalar instruction sequence 1, however, are not reset by the vlnk instruction 12, so those registers can be read immediately on every repetition.

【００７２】Thus, in this embodiment, after the SPU activates the VPU, initialization data is set up in the scalar registers one item at a time as the SPU executes the scalar instruction sequence 3, the validity bit V corresponding to each scalar register is set, and the vector instructions in the instruction sequence 11 of the VPU become issuable one after another as each item of initialization data is set up. Execution of the vector instruction sequence can therefore begin soon after the setup of the initialization data begins, which shortens the execution time of the vector instruction sequence.
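The timing benefit can be illustrated with a small in-order issue model. Everything numeric here is assumed for the sake of the example: the arrival cycles of the setup data, the one-instruction-per-cycle issue rate, and the register names are hypothetical and are not taken from FIG. 8.

```python
def issue_times(instrs, arrival, start=0):
    """Earliest in-order issue cycle of each instruction.

    instrs  : list of tuples naming the scalar items each instruction needs
    arrival : cycle at which each scalar item is set up (its V bit becomes 1)
    start   : first cycle at which the VPU may issue anything
    """
    times = []
    earliest = start
    for needed in instrs:
        ready = max((arrival[r] for r in needed), default=0)
        t = max(earliest, ready)
        times.append(t)
        earliest = t + 1   # assumed: at most one instruction issued per cycle
    return times


arrival = {"y_base": 0, "stride": 0, "z0_base": 3, "x0": 5}
instrs = [
    ("y_base", "stride"),    # load Y
    ("z0_base", "stride"),   # load Z0
    ("x0",),                 # multiply-add
    ("z0_base", "stride"),   # store Z0
]

overlapped = issue_times(instrs, arrival)
# For comparison: a VPU that is started only after every item has arrived.
serialized = issue_times(instrs, arrival, start=max(arrival.values()) + 1)
```

Under these assumptions the overlapped scheme issues its first load at cycle 0 and its last instruction at cycle 6, whereas the serialized scheme cannot issue anything before cycle 6.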

【００７３】Further, even when the SPU supplies the items of initialization data at long intervals, the VPU can begin executing the vector instruction sequence once a small number of items have been set up, so the completion of the vector instruction sequence is less likely to be delayed.

【００７４】In this embodiment, moreover, the SPU needs to wait only once per execution of the vector instruction sequence for the reading of the already set-up initialization data to complete. When the vector instruction sequence is executed repeatedly, the total time spent on this waiting can therefore be reduced.

【００７５】In the embodiment, several items of initialization data are set up before the VPU is activated, and the VPU cannot be activated while they are being set up. However, this initialization data is also used in every repetition of the vector instruction sequence and need not be set up again at each repetition, which speeds up the processing of the VPU accordingly.

【００７６】Further, because the initialization data in a scalar register is read when the instruction issue unit issues the instruction, and is held in the instruction holding unit provided for the resource, the reading of the scalar register group by the vector instruction sequence is complete at the point when the last instruction vlnk of the vector instruction sequence vcode is issued. Consequently, while some instructions of this vector instruction sequence are still executing in the VPU, new initialization data for the next execution of the vector instruction sequence in the VPU can be set up in parallel. When the VPU executes the vector instruction sequence repeatedly, the executions can therefore follow one another without interruption. What is more, setting up the initialization data for the next execution during the current execution is achieved with simple circuitry: a circuit that reads the initialization data an instruction requires when that instruction is issued, and instruction holding units provided for the resources to hold that data.

【００７７】<Modifications> (1) In FIG. 1, the SPU and the VPU are directly connected by the lines 500 and 701, but they may instead be connected by a bus.

【００７８】(2) FIG. 1 shows a vector processor comprising a scalar processing unit and a vector processing unit as the main processor and the slave processor, but the present invention is also applicable to a multiprocessor comprising a main processor and a slave processor that have no direct connection with vector processing.

【００７９】(3) In the embodiment, the instruction issue unit issues the instructions in the vector instruction sequence vcode in the order in which they are decoded. However, as is already known for vector processors, it is also possible to adopt a technique in which, when a decoded instruction cannot be issued, subsequent instructions are issued ahead of it until it becomes issuable. When this overtaking technique is used, the vlnk instruction at the end of the vector instruction sequence vcode must not be issued ahead of the instructions that precede it, or at least not ahead of any preceding instruction that uses initialization data held in a scalar register. That is, because the vlnk instruction updates the L bit to indicate that reading is complete, all such instructions that use initialization data must already have been issued when the vlnk instruction is issued. The issuing of the vlnk instruction must therefore wait until the instructions preceding it have finished reading their initialization data.

【００８０】(4) In this embodiment, if the VPU is configured to execute only one predetermined sequence of instructions, the SPU need not specify to the VPU the sequence of instructions to be executed when it activates the VPU.

【００８１】(5) The technique of this embodiment is also applicable to other multiprocessors comprising a main processor and a slave processor that perform processing different from that of the VPU and SPU of this embodiment. In such systems, the initialization data set up by the main processor need not be data that designates the data to be processed by the slave processor, as in the embodiment; it may be the data to be processed itself.

【００８２】[0082]

【発明の効果】本発明では、主プロセッサによる複数の
初期設定データのセットアップの開始後速やかに、従プ
ロセッサにおいて、このデータを利用する命令あるいは
命令列を実行できる。しかも、従プロセッサの再度の起
動のために必要な、従プロセッサによるセットアップ済
みの初期設定データの読み出しの完了の待ち合わせに要
する時間の総量を減らすこともできる。According to the present invention, the slave processor can execute an instruction or instruction sequence that uses the initialization data promptly after the main processor starts setting up the plurality of initialization data. In addition, the total time spent waiting for the slave processor to finish reading the already set-up initialization data, which is necessary before the slave processor can be restarted, can be reduced.

【００８３】さらに、従プロセッサでの命令の実行中
に、その従プロセッサでの命令列の次の実行のための初
期設定データを簡単な回路でセットアップできる。Further, during execution of an instruction in the slave processor, initialization data for the next execution of the instruction sequence in the slave processor can be set up by a simple circuit.
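
The handshake summarized above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the disclosed circuit: the class and method names are hypothetical, the initial state of the read completion flag is assumed, and the end-of-reads instruction is modeled as a single method call. The validity flags and the read completion flag correspond to the validity information and read completion information described in the claims.

```python
class SlaveModel:
    """Toy model of the slave processor's setup registers (hypothetical names)."""

    def __init__(self, n_regs):
        self.regs = [None] * n_regs
        self.valid = [False] * n_regs  # one validity flag per register
        self.read_complete = True      # assumed initial state: nothing pending

    def can_start(self):
        # The main processor checks this before (re)starting the slave.
        return self.read_complete

    def start(self):
        if not self.read_complete:
            raise RuntimeError("previous initialization data not yet read")
        self.read_complete = False     # activation marks reads as not completed

    def write(self, idx, value):
        # Main processor sets up one item of initialization data after start.
        self.regs[idx] = value
        self.valid[idx] = True

    def try_read(self, idx):
        # Returns (True, value) if the data is valid; otherwise (False, None),
        # meaning the instruction that needs it has to wait.
        if self.valid[idx]:
            return True, self.regs[idx]
        return False, None

    def end_of_reads(self):
        # Models the specific instruction placed after all reading instructions:
        # it invalidates the registers and marks the reads as completed.
        self.valid = [False] * len(self.valid)
        self.read_complete = True
```

Because the slave is started before any data is written, an instruction that needs register 0 simply waits until `write(0, ...)` has happened, and the main processor only has to poll `can_start()` before the next activation.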

[Brief description of drawings]

【図１】本発明によるベクトルプロセッサの概略構成
図。FIG. 1 is a schematic configuration diagram of a vector processor according to the present invention.

【図２】図１の装置に使用するベクトル処理ユニット内
のレジスタ制御ユニットの概略構成図。FIG. 2 is a schematic configuration diagram of a register control unit in the vector processing unit used in the apparatus of FIG. 1.

【図３】図１の装置に使用するベクトル処理ユニット内
の初期制御ユニットの概略構成図。FIG. 3 is a schematic configuration diagram of an initial control unit in the vector processing unit used in the apparatus of FIG. 1.

【図４】図１の装置に使用するベクトル処理ユニット内
の命令発行制御ユニットの概略構成図。FIG. 4 is a schematic configuration diagram of an instruction issue control unit in the vector processing unit used in the apparatus of FIG. 1.

【図５】図１の装置で実行されるべき処理を示すフォー
トランプログラムの一例を示す図。FIG. 5 is a diagram showing an example of a Fortran program representing processing to be executed by the apparatus of FIG. 1.

【図６】図５のプログラムの実行の流を示すフローチャ
ート。FIG. 6 is a flowchart showing the flow of execution of the program of FIG. 5.

【図７】図１の装置で実行されるプログラムの一例を示
す図。FIG. 7 is a diagram showing an example of a program executed by the apparatus of FIG. 1.

【図８】図１の装置に置ける命令実行の模様を示すタイ
ミングチャート。FIG. 8 is a timing chart showing how instructions are executed in the apparatus of FIG. 1.

[Explanation of symbols]

６０１…スカラレジスタ群、６０２…有効性表示ビット
群、９１５…スカラレジスタ群読み出し完了表示ビット
Ｌ。601 ... scalar register group, 602 ... validity indication bit group, 915 ... scalar register group read completion indication bit L.

Claims

[Claims]

1. A multiprocessor system comprising a main processor and a slave processor connected to the main processor, wherein the main processor has activation means for instructing the slave processor to execute a series of instructions, and means for sequentially supplying to the slave processor a plurality of initialization data required to execute the series of instructions; the slave processor has a plurality of registers, each for holding one of the plurality of initialization data, and instruction executing means for executing the series of instructions in response to the execution instruction from the main processor; the supplying means supplies the plurality of initialization data after the slave processor is activated; and the instruction executing means has instruction execution control means for controlling execution such that, when the next instruction to be executed in the series of instructions requires the initialization data held in any one of the plurality of registers, the instruction is executed if valid initialization data has already been written to that one register, and otherwise is executed after waiting for valid initialization data to be written to that one register.

2. The multiprocessor system according to claim 1, wherein the slave processor further has means for holding a plurality of pieces of validity information, each corresponding to one of the plurality of registers and indicating whether the corresponding register holds valid initialization data, and means for changing, when any of the initialization data supplied by the main processor is written to one of the plurality of registers, the validity information corresponding to that register to a value indicating that the data is valid; and the instruction execution control means has means for detecting whether valid initialization data has been written to the one register holding the initialization data required by the next instruction to be executed, based on the validity information corresponding to that register.

3. The multiprocessor system according to claim 1, wherein the main processor further has means for restarting the slave processor, so that the series of instructions is executed based on a plurality of other initialization data, after waiting for the series of instructions to complete reading of the plurality of initialization data from the plurality of registers, and means for supplying the plurality of other initialization data to the slave processor after the restart.

4. The multiprocessor system according to claim 3, wherein the slave processor further has means for holding read completion information indicating whether reading of the plurality of initialization data from the plurality of registers has been completed, and means for changing the read completion information, in response to activation by the main processor, to information indicating that the reading is not completed; the instruction executing means has means for changing the read completion information to a value indicating that the reading by the series of instructions is completed, in response to a specific instruction that is included in the series of instructions and located after the plurality of instructions requiring reading of the plurality of initialization data; and the restart means has means for reading the read completion information from the holding means, restarting the slave processor when this information indicates that reading of the plurality of initialization data is completed, and, when it indicates that the reading is not completed, restarting the slave processor after waiting for the information to indicate that the reading is completed.

5. The multiprocessor system according to claim 4, wherein the slave processor further has means for holding a plurality of pieces of validity information, each indicating whether valid data is held in a corresponding one of the plurality of registers, and means for changing, each time any of the initialization data supplied by the main processor is written to one of the plurality of registers, the validity information corresponding to that register to a value indicating that the data is valid; the instruction execution control means has means for detecting whether valid initialization data has been written to the one register holding the initialization data required by the next instruction to be executed, based on the validity information corresponding to that register; and the slave processor further has means for changing, in response to the specific instruction, the validity information corresponding to the plurality of registers holding the plurality of initialization data to information indicating that the data is invalid.

6. The multiprocessor system according to claim 5, wherein the main processor further has means for setting, before the slave processor is activated by the activation means, some initialization data other than the plurality of initialization data in some of the plurality of registers of the slave processor; and the means in the slave processor for changing the validity information corresponding to the plurality of registers does not change the validity information corresponding to those some registers to a value indicating that the data is invalid.

7. The multiprocessor system according to claim 4, wherein the specific instruction is an instruction located at the end of the series of instructions and informing the slave processor that there is no subsequent instruction to be executed.

8. The multiprocessor system according to claim 5, wherein the means in the slave processor for changing the validity information corresponding to the plurality of registers has means for selectively changing, among the plurality of pieces of validity information, the part corresponding to the registers designated by the specific instruction to a value indicating that the data is invalid.

9. The multiprocessor system according to claim 1, wherein the series of instructions includes a plurality of vector instructions, each requesting that the same processing be performed on a plurality of elements of vector data; the instruction executing means has a plurality of vector processing execution means capable of executing, in response to one of the plurality of vector instructions, the same processing on the plurality of elements of the vector data designated by that instruction in parallel with one another; and said vector instruction includes information designating the vector data to be processed.

10. A multiprocessor system comprising a main processor and a slave processor connected to the main processor, wherein the main processor has activation means for instructing the slave processor to execute a series of instructions, and means for sequentially supplying to the slave processor a plurality of initialization data required to execute the series of instructions; the slave processor has a plurality of registers, each holding one of the initialization data, instruction decoding means for sequentially decoding the series of instructions in response to the execution instruction from the main processor, means for reading, when any instruction decoded by the decoding means requires reading of the initialization data held in any one of the plurality of registers, the initialization data from that one register and holding the read initialization data until the processing requested by the decoded instruction is executed, and holding means for generating and holding, in response to a specific instruction that is included in the series of instructions and located after the plurality of instructions requiring reading of the plurality of initialization data, read completion information indicating that reading of the plurality of initialization data from the plurality of registers by the series of instructions has been completed; and the main processor further has means for detecting, in parallel with execution of the series of instructions by the slave processor, whether the holding means holds the read completion information, and means for controlling execution of an instruction related to restart of the slave processor depending on whether the holding means holds the read completion information.

11. The multiprocessor system according to claim 10, wherein the means, included in the main processor, for controlling execution of the instruction has means for executing the instruction for restarting the slave processor when the holding means holds the read completion information, and, when the holding means does not hold the read completion information, executing the instruction for restarting after waiting for the holding means to hold the read completion information.

12. The multiprocessor system according to claim 11, wherein the means, included in the main processor, for controlling execution of the instruction has means for sequentially executing, after the instruction for restarting is executed, a plurality of instructions, each of which supplies to the slave processor one of the plurality of initialization data used by the slave processor.

13. A multiprocessor system comprising a main processor and a slave processor connected to the main processor, wherein the main processor has activation means for instructing the slave processor to execute a series of instructions, and means for sequentially supplying to the slave processor a plurality of initialization data required to execute the series of instructions; the slave processor has a plurality of registers, each holding one of the initialization data, instruction decoding means for sequentially decoding the series of instructions in response to the execution instruction from the main processor, a plurality of instruction executing means for executing the processing required by the decoded instructions, each executing one of a plurality of kinds of processing required by different instructions, a plurality of instruction holding means, provided in correspondence with the plurality of instruction executing means, each holding decoding information of at least one instruction executable by the corresponding instruction executing means and causing the corresponding instruction executing means to execute the instruction based on the held decoding information, and instruction issuing means for determining whether any instruction decoded by the decoding means is executable and, when it is executable, sending the decoding information of the instruction to the instruction holding means corresponding to the instruction executing means that can execute it; the instruction issuing means has means for reading, when any decoded instruction requires reading of the initialization data held in any one of the plurality of registers, the initialization data from that one register and supplying it to one of the plurality of instruction holding means together with the decoding information of the decoded instruction; the slave processor further has holding means for generating and holding, when a specific instruction that is included in the series of instructions and located after the plurality of instructions requiring reading of the plurality of initialization data is issued by the instruction issuing means, read completion information indicating that reading of the plurality of initialization data from the plurality of registers by the series of instructions has been completed; and the main processor further has means for detecting, in parallel with execution of the series of instructions by the slave processor, whether the read completion information is held in the holding means, executing the instruction related to restart of the slave processor when the holding means holds the read completion information, and, when the holding means does not hold the read completion information, executing the instruction related to restart of the slave processor after waiting for the holding means to hold the read completion information.

14. The multiprocessor system according to claim 13, wherein the slave processor further has means for holding a plurality of pieces of validity information, each corresponding to one of the plurality of registers and indicating whether the corresponding register holds valid initialization data, and means for changing the validity information corresponding to each register, among the plurality of registers, to which any of the initialization data supplied by the main processor has been written, to a value indicating that the data is valid; and the instruction issuance control means has means for detecting whether valid initialization data has already been written to the one register holding the initialization data required by the decoded instruction, based on the validity information corresponding to that register.

15. The multiprocessor system according to claim 14, wherein the slave processor further has means for changing, in response to the specific instruction, the validity information corresponding to the plurality of registers holding the plurality of initialization data to information indicating that the data is invalid.

16. The multiprocessor system according to any one of claims 13 to 14, wherein the supplying means supplies the plurality of initialization data after the slave processor is activated; the instruction issuing means has instruction issuance control means for issuing, when any decoded instruction requires the initialization data held in any one of the plurality of registers, the instruction if valid initialization data has already been written to that one register, and otherwise issuing the instruction after waiting for valid initialization data to be written to that one register; and the main processor further has means for supplying a plurality of other initialization data to the slave processor after the restart.

17. The multiprocessor system according to claim 16, wherein the means in the slave processor for changing the validity information corresponding to the plurality of registers has means for selectively changing, among the plurality of pieces of validity information, the part corresponding to the registers designated by the specific instruction to a value indicating that the data is invalid.

18. The multiprocessor system according to claim 13, wherein the specific instruction is an instruction located at the end of the series of instructions and notifying the slave processor that there is no subsequent instruction to be decoded.

19. The multiprocessor system according to any one of claims 13 to 18, wherein the series of instructions includes a plurality of vector instructions, each requesting that the same processing be performed on a plurality of elements of vector data; the plurality of instruction executing means includes a plurality of vector processing execution means, each capable of executing, in response to one of the plurality of vector instructions, the same processing on the plurality of elements of the vector data designated by that instruction in parallel with one another; and the instructions include information specifying the vector data to be processed.

20. A multiprocessor system comprising a main processor and a slave processor connected to the main processor, wherein the main processor has means for sequentially supplying to the slave processor a plurality of initialization data used by a series of instructions to be executed by the slave processor; the slave processor has a plurality of registers, each for holding one of the plurality of initialization data, means for holding a plurality of pieces of validity information, each corresponding to one of the plurality of registers and indicating whether valid initialization data is held in the corresponding register, means for changing, each time any of the initialization data supplied by the main processor is written to one of the plurality of registers, the validity information corresponding to that register to a value indicating that the data is valid, and instruction executing means for executing the series of instructions; the instruction executing means has means for reading data, in response to each of a plurality of instructions in the series of instructions that require data held in any one of the plurality of registers, from the register designated by that instruction, means for changing, in response to a specific instruction located in the series of instructions after the plurality of instructions that use the data held in at least one of the plurality of registers, the part of the plurality of pieces of validity information corresponding to the registers designated by the specific instruction to information indicating that the data in the corresponding register is invalid, and holding means for generating and holding, in response to the specific instruction, read completion information indicating that reading of the data in the plurality of registers by the series of instructions is completed; and the main processor further has means for detecting whether the read completion information is held in the holding means, and means for executing, when the read completion information is held in the holding means, a plurality of instructions, each supplying one of a plurality of other initialization data to one of the registers corresponding to the invalidated part of the validity information, so that the slave processor executes the series of instructions based on the plurality of other initialization data.