JPH1097465A

JPH1097465A - Multiprocessor system

Info

Publication number: JPH1097465A
Application number: JP8249593A
Authority: JP
Inventors: Yoshiko Tamaoki; 由子玉置; Yonetaro Totsuka; 米太郎戸塚; Masanao Ito; 昌尚伊藤; Naonobu Sukegawa; 直伸助川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-09-20
Filing date: 1996-09-20
Publication date: 1998-04-14
Anticipated expiration: 2016-09-20
Also published as: JP3820645B2

Abstract

(57) Abstract

PROBLEM: In a shared-main-memory multiprocessor, to achieve both the speedup obtained by executing multiple processes concurrently on multiple processors (SMP) and the speedup obtained by executing a single process in parallel on multiple processors (ASMP).

SOLUTION: Mode bits are provided that indicate whether SMP or ASMP execution is in progress and which part of the program is being executed. The mode bits can be changed by the OS or by a user program, and a circuit that performs a different cache coherence control is activated according to the value of the mode bits.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a shared-main-memory multiprocessor system, and more particularly to a multiprocessor system that achieves both higher throughput through concurrent execution of multiple processes and higher speed through parallel execution of a single process.

[0002]

[Prior art] To improve the performance of processor systems, it has become common in recent years to adopt a multiprocessor configuration consisting of multiple processors that share a main memory.

[0003] A shared-main-memory multiprocessor (hereinafter SMP: Symmetric Multi-Processor) has two broad purposes: (1) improving system throughput by executing multiple processes concurrently on multiple processors, and (2) speeding up a single process by executing it in parallel on multiple processors (the process is divided, and the resulting pieces are executed simultaneously, one on each processor).

[0004] To achieve both purposes, the prior art generally does the following. Execution on each individual processor is accelerated with a cache, and consistency among the processor caches is guaranteed by a hardware cache coherence mechanism. Cache coherence mechanisms fall broadly into snoop schemes and directory schemes. In either scheme, the mechanism tracks whether each line in each processor cache matches the other lines and the contents of main memory. When a mismatch arises, it is resolved by copying, updating, or invalidating cache lines over the interconnection network between the processors, which prevents the processors from malfunctioning. Some systems also split the cache into an instruction cache and a data cache and prohibit programs from rewriting instructions, so that no coherence is maintained among the instruction caches. This is described in Shinji Tomita, "Information Science Core Curriculum Course: Computer Architecture I", Maruzen Publishing, pp. 167-177.

[0005]

[Problem to be solved by the invention] In the prior art described above, however, a multiprocessor system sharing a main memory uses exactly the same cache coherence scheme whether the system is executing multiple processes concurrently or executing a single process in parallel.

[0006] When a multiprocessor system is executing multiple processes concurrently, each processor generally runs a different process, so the caches of different processors rarely refer to the same main-memory contents, and performance improves if the cache coherence mechanism is invoked as seldom as possible. For this reason, the cache coherence of a multiprocessor system running multiple processes concurrently is in many cases arranged so that a processor does not send cache contents onto the interconnection network to the other processors when the data is in its own cache, and does not send instruction cache contents onto that network.

[0007] However, the cache coherence scheme of a multiprocessor system that executes multiple processes concurrently on multiple processors is not necessarily the best scheme when the processors of the system execute a single process in parallel.

[0008] This is explained using the example of executing DO10 and DO20 of the FORTRAN program in FIG. 3 in parallel over the index i on four processors. The program in FIG. 3 is executed as follows. Portions 3100 and 3300 are executed by one processor in the multiprocessor system (called the parent processor; here PE0). Portions 3200 and 3400 are shared among several processors (called child processors; here PE1, PE2, and PE3) and the parent processor. When the parent processor PE0 finishes 3100, it starts the child processors PE1 to PE3 and has them execute 3200 for i = 5 to 8, 9 to 12, and 13 to 16, respectively, while it handles i = 1 to 4 itself. When all processors have finished, the parent processor PE0 executes 3300. When that finishes, it starts the child processors PE1 to PE3 again and has them execute 3400 for i = 5 to 8, 9 to 12, and 13 to 16, respectively, while it again handles i = 1 to 4. While the parent processor is executing 3100 and 3300, the child processors wait to be started by the parent processor.
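The division of the index range described above can be sketched in a few lines of Python. This is only an illustration of the block assignment (i = 1 to 4 for PE0, 5 to 8 for PE1, and so on); the function name `block_partition` and the requirement that the iteration count divide evenly are assumptions made for the sketch, not something the specification states.

```python
def block_partition(n_iterations, n_processors):
    """Assign contiguous, equally sized index blocks (1-based) to PE0, PE1, ...

    Hypothetical helper: the specification only gives the resulting ranges
    for 16 iterations on 4 processors.
    """
    if n_iterations % n_processors != 0:
        raise ValueError("sketch assumes the iterations divide evenly")
    size = n_iterations // n_processors
    return {
        f"PE{p}": range(p * size + 1, (p + 1) * size + 1)
        for p in range(n_processors)
    }


if __name__ == "__main__":
    for pe, indices in block_partition(16, 4).items():
        print(pe, list(indices))
```

PE0 (the parent) receives the first block and the child processors receive the remaining blocks, matching the assignment in the paragraph above.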

[0009] In executing this program, the child processors are told the instruction address of the program portion they are to execute only when the parent processor starts them. Consequently, if the cache coherence scheme for concurrent execution of multiple processes described above is used when a single process is executed in parallel on multiple processors, instruction cache misses occur frequently, because instruction cache contents are not communicated to the other processors.

[0010] Likewise, in executing this program, the child processors fetch the data they are to process only after the parent processor starts them. Consequently, if the cache coherence scheme for concurrent execution of multiple processes described above is used when a single process is executed in parallel on multiple processors, data cache misses also occur frequently, because a processor that has the data in its own cache does not send the cache contents onto the interconnection network to the other processors. As a result, because the cache miss penalty is large, executing a single process in parallel yields little performance improvement.

[0011] This situation arises because the coherence scheme required for parallel execution of one process on multiple processors and the coherence scheme required for concurrent execution of multiple processes on multiple processors are inherently different in character, yet coherence is maintained by the same means in both cases.

[0012] An object of the present invention is to provide a system configuration for a shared-main-memory multiprocessor system that implements different cache coherence schemes depending on whether the system is executing multiple processes concurrently or executing a single process in parallel.

[0013]

[Means for solving the problem] To solve the above problem, the present invention provides a system comprising a plurality of processors each having a cache, connection lines coupling the processors, and a content matching control circuit for the caches. The system holds first information identifying whether a first plurality of processors among the processors is in a mode in which multiple processes are executed concurrently on the first plurality of processors or in a mode in which one process is executed in parallel on the first plurality of processors, and the operation of the content matching control circuit is switched according to this information.

[0014] Further, the content matching control circuit is composed of a plurality of functional units, and a circuit is provided that selects which of the functional units to activate according to the information.

[0015] Furthermore, the mode in which one process is executed on the first plurality of processors consists of a mode for executing a parallel portion of the process and a mode for executing a non-parallel portion. The system has means for switching between the mode for executing the parallel portion and the mode for executing the non-parallel portion, and means for switching the operation of the content matching control circuit according to which of these two modes is in effect.

[0016] Furthermore, the content matching control circuit is composed of a plurality of functional units, and a circuit is provided that, when the system is in the mode in which one process is executed on the first plurality of processors and in the mode for executing the non-parallel portion, selects the functional unit so that the caches of the first plurality of processors are each updated with the same entry.

[0017] Furthermore, when the system is in the mode in which one process is executed on the first plurality of processors and in the mode for executing the non-parallel portion, the content matching control circuit updates the caches of the first plurality of processors with the same entry. Furthermore, the content matching control circuit is composed of a plurality of functional units, and when the system is in the mode in which one process is executed on the first plurality of processors and in the mode for executing the parallel portion, the caches of the first plurality of processors are each updated with individual entries.
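As a rough illustration of the mode-dependent behavior described in the preceding paragraphs, the following Python sketch returns which caches would receive a newly fetched line in each mode. It is an interpretation under stated assumptions, not the circuit itself: the names `Mode` and `caches_updated_on_fill` are hypothetical, and "updated with the same entry" is read here as "every cache in the group receives the line", while "individual entries" and the conventional SMP case are read as "only the requesting cache receives it".

```python
from enum import Enum


class Mode(Enum):
    SMP = "smp"                      # multiple processes run concurrently
    ASMP_SINGLE = "asmp_single"      # one process, non-parallel portion
    ASMP_PARALLEL = "asmp_parallel"  # one process, parallel portion


def caches_updated_on_fill(mode, requester, group):
    """Return the processors whose caches take the fetched line (assumed reading)."""
    if mode is Mode.ASMP_SINGLE:
        # Non-parallel portion: all caches in the group get the same entry.
        return list(group)
    # Parallel portion or SMP mode: each cache holds its own entries.
    return [requester]


if __name__ == "__main__":
    group = ["PE0", "PE1", "PE2", "PE3"]
    for mode in Mode:
        print(mode.name, caches_updated_on_fill(mode, "PE0", group))
```

In hardware this choice would be made by the circuit that selects a functional unit; the function above only mirrors the selection rule.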

[0018] Furthermore, the first plurality of processors consists of one parent processor and the remaining child processors, and the operation of the content matching control circuit is changed depending on whether a processor is the parent processor or a child processor.

[0019]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings. First, the terms used in this embodiment are defined. The mode indicating that a system of multiple processors is executing multiple processes concurrently on those processors is called SMP (Symmetric Multi-Processor) mode, and the mode indicating that one process is being executed in parallel on multiple processors is called ASMP (Asynchronous SMP) mode. Further, the processors in ASMP mode are said to be in single mode while they are executing a non-parallelized portion of the program (3100, 3300, etc. in FIG. 3; hereinafter the single portion) and in parallel mode while they are executing a parallelized portion (3200, 3400, etc. in FIG. 3; hereinafter the parallel portion).

[0020] FIG. 1 shows the overall configuration of a system according to one embodiment of the present invention. Processors 10 to 13 (PE0 to PE3) and main memory 43 are connected via address/command bus 41 and data bus 42. Each of processors 10 to 13 has an instruction cache (Icache) and a data cache (Dcache). Signal lines 18 to 21 connect the processors to address/command bus 41, and signal lines 22 to 25 connect the processors to data bus 42. Signal line 26 connects main memory 43 to address/command bus 41, and signal line 27 connects main memory 43 to data bus 42. These components are also found in a conventional SMP system (a multiprocessor system that executes multiple processes concurrently).

[0021] In addition, the system has components specific to this embodiment: synchronization information bus 40, which connects processors 10 to 13 to one another and synchronizes them, and signal lines 14 to 17, which connect the processors to the synchronization information bus. The synchronization information bus is used to communicate the mode information and the program counter values (that is, instruction addresses) described later.

[0022] FIG. 2 shows the internal configuration of processor 10. Processors 11 to 13 have the same configuration, so their description is omitted. The processor consists of instruction cache (Icache) 52, instruction unit 53, which controls instruction execution and the instruction cache, arithmetic unit (ALU) 57, load/store unit (LSU) 56, data cache (Dcache) 51, data unit 50, which controls the data cache, and registers 58. It also has signal lines 60, 62, 63, 64, 65, 66, and 68, which connect these components, and signal lines 18-0, 18-1, 22-0, and 22-1, which connect to the external buses. These components are also found in known SMP systems, and operation in the SMP mode of the present invention is the same as the operation of a known SMP system.

[0023] FIG. 2 further shows components specific to this embodiment: the mode bits in instruction unit 53, signal line 67, which conveys the mode bit information to the data unit, and signal line 14, which connects to synchronization information bus 40. The modes determined by the mode bits are explained with reference to FIG. 11.

[0024] FIG. 11 shows the configuration of the mode bits in this embodiment. The mode bits consist of the following three bits.

(1) ASMP bit: indicates whether the processor is in ASMP mode (Asynchronous SMP mode, in which one process is executed in parallel by multiple processors of the shared-main-memory multiprocessor system) or in SMP mode (in which multiple processes are executed concurrently by multiple processors of the shared-main-memory multiprocessor system). Here, ASMP bit = 1 means ASMP mode and ASMP bit = 0 means SMP mode.

(2) parent bit: indicates, in ASMP mode, whether the processor is the parent or a child. Here, parent bit = 1 means parent and parent bit = 0 means child.

(3) para bit: indicates, in ASMP mode, whether the parallel portion or the single portion of the program is currently being executed. Here, para bit = 1 means the parallel portion is being executed and para bit = 0 means the single portion is being executed.
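The three mode bits can be modeled as a small record, as in the Python sketch below. The bit meanings follow the definitions above; the class name `ModeBits`, the function `describe`, and the returned strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBits:
    asmp: int    # 1 = ASMP mode, 0 = SMP mode
    parent: int  # ASMP mode only: 1 = parent processor, 0 = child processor
    para: int    # ASMP mode only: 1 = parallel portion, 0 = single portion


def describe(bits: ModeBits) -> str:
    """Decode the mode bits into a readable description."""
    if bits.asmp == 0:
        # The parent and para bits are defined only for ASMP mode.
        return "SMP"
    role = "parent" if bits.parent == 1 else "child"
    portion = "parallel" if bits.para == 1 else "single"
    return f"ASMP {role}, {portion} portion"


if __name__ == "__main__":
    print(describe(ModeBits(asmp=1, parent=1, para=0)))
```

Because the parent and para bits are defined only for ASMP mode, the sketch ignores them whenever the ASMP bit is 0.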

[0025] The ASMP bit and the parent bit are changed by the OS (operating system). The para bit is changed by user programs and by the OS. Although each mode type is recorded here as bit information, it may be recorded in any form, provided the means used can record information that identifies the modes. For example, a register may be provided and the identifying information stored in it in the form of numerals or alphabetic symbols.

[0026] The operation of the system in SMP mode is described below. The OS first sets the ASMP bit of each processor that is to operate in SMP mode to 0 (indicating SMP mode) and assigns an independent process to each processor.

[0027] In SMP mode, the processor operates as follows (see FIG. 2). If the instruction is in instruction cache 52, instruction unit 53 takes it from there. If it is not, instruction unit 53 sends an instruction-fetch line transfer request to address/command bus 41 via signal line 18-1 and controls instruction cache 52 so that it receives the instruction line from data bus 42. Instruction unit 53 decodes the fetched instruction and, via signal line 68, controls arithmetic unit (ALU) 57 if it is an arithmetic instruction or load/store unit 56 if it is a load/store instruction.

[0028] If the instruction is a load/store instruction, load/store unit 56 sends the instruction type and the address to data unit 50 via signal line 60. If the data is in data cache 51, data unit 50 controls data cache 51 so that it sends the data to registers 58. If the data is not there, data unit 50 sends a data-fetch line transfer request to address/command bus 41 via signal line 18-0 and controls data cache 51 so that it receives the data line from data bus 42.
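The hit and miss handling of the data unit on a load can be sketched as follows. This is a simplified software model, not the hardware: the class `DataUnit`, the line size of four words, and the log `bus_requests` are assumptions introduced for the sketch, and the request label "LTreq" is borrowed from the bus transaction names given for FIG. 6.

```python
LINE_SIZE = 4  # words per cache line (assumed; not given in the specification)


class DataUnit:
    """Toy model of the data unit serving loads from a data cache."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words standing in for main memory
        self.lines = {}                 # line number -> list of words (the data cache)
        self.bus_requests = []          # line transfer requests put on the bus

    def load(self, address):
        line_no, offset = divmod(address, LINE_SIZE)
        if line_no not in self.lines:
            # Miss: request the line on the address/command bus and
            # receive it from the data bus.
            self.bus_requests.append(("LTreq", line_no))
            start = line_no * LINE_SIZE
            self.lines[line_no] = self.main_memory[start:start + LINE_SIZE]
        # Hit (or just-filled line): deliver the word to the register file.
        return self.lines[line_no][offset]


if __name__ == "__main__":
    unit = DataUnit(list(range(100, 116)))
    print(unit.load(5), unit.load(6), unit.bus_requests)
```

The second load falls in the same line as the first, so it is served from the cache and produces no further bus request.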

[0029] FIG. 6 shows the data cache coherence scheme used in SMP mode in this embodiment. It is a known scheme called the Berkeley protocol (Shinji Tomita, "Information Science Core Curriculum Course: Computer Architecture I", Maruzen Publishing, pp. 170-173).

[0030] In FIG. 6, the circles denote the states of a cache line. "I" is Invalid (the data is not in the own cache). "V" is Valid (the data is in the own cache and its contents match main memory; other caches may hold the same data). "D" is Dirty (the data is in the own cache and its contents differ from main memory; it is not in any other cache). "Sh.D" is Shared Dirty (the data is in the own cache and its contents differ from main memory; other caches may hold the same data).

[0031] FIG. 6(a) shows how the state of each cache line changes in response to accesses issued by the own processor (L: load, ST: store, Castout: write-back to main memory on replacement), and which transactions are generated along with the state change and output to the other processors via the bus (a transaction sent onto the bus is called a bus transaction). Bus transactions are shown in boxes in FIG. 6. They are communicated to the other processors via the buses (address/command bus 41 and data bus 42). The bus transactions are: LTreq, a line transfer request accompanying a load by another processor; LTreq-forST, a line transfer request accompanying a store by another processor; Inv, an invalidation request issued by another processor; and Busout, output of the contents of the corresponding line of the own cache to the data bus.

[0032] FIG. 6(b) shows how the state of the own cache changes when a bus transaction (LTreq, LTreq-forST, Inv, or Busout) is received from another processor via the bus, and which bus transactions to the other processors are generated as a result. The generated bus transactions are shown in boxes. One further generated bus transaction appears here: "Busout & Sh.D instruction" (output of the contents of the corresponding line of the own cache to the data bus, together with a request that the receiving cache place the line in the Sh.D state).

[0033] For example, when the own processor executes a store instruction to a line that is Valid, the line in the own processor is written, so its state moves to Dirty, and at the same time an Inv transaction is issued to the bus (see FIG. 6(a)). If the same line is Valid in another processor, that processor receives the Inv bus transaction, invalidates the line, and moves its state to I (see FIG. 6(b)).
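The worked example above can be expressed as two small lookup tables, one for accesses by the own processor and one for bus transactions received from others. The sketch below deliberately contains only the two transitions stated in the text; the remaining entries of the Berkeley protocol are in FIG. 6 and are not reproduced here. The table and function names are hypothetical.

```python
# (current state, own-processor access) -> (next state, bus transaction issued)
LOCAL_TRANSITIONS = {
    ("V", "ST"): ("D", "Inv"),
}

# (current state, bus transaction received) -> (next state, bus transaction issued)
SNOOP_TRANSITIONS = {
    ("V", "Inv"): ("I", None),
}


def local_access(state, access):
    """Apply an access from the own processor to a cache line state."""
    return LOCAL_TRANSITIONS[(state, access)]


def snoop(state, transaction):
    """Apply a bus transaction from another processor to a cache line state."""
    return SNOOP_TRANSITIONS[(state, transaction)]


def store_to_line_valid_in_two_caches():
    """Replay the example: both caches hold the line as Valid, one stores to it."""
    writer_state, other_state = "V", "V"
    writer_state, bus_transaction = local_access(writer_state, "ST")
    other_state, _ = snoop(other_state, bus_transaction)
    return writer_state, other_state


if __name__ == "__main__":
    print(store_to_line_valid_in_two_caches())
```

After the store, the writing cache holds the only up-to-date copy (Dirty) and the other cache's copy is Invalid, as in the example.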

[0034] This protocol is well known, and its operation is clear from following the state transition diagram, so it is not described further here. With this protocol, the consistency of cache contents is guaranteed even when the caches of multiple processors share the same main memory location in SMP mode. This concludes the description of the operation of the system in SMP mode.

[0035] Next, the operation of the system in ASMP mode is described. The OS first selects as many processors as the program to be executed in parallel requires and sets their ASMP bits to 1. It sets the parent bit to 1 on exactly one of them (this processor becomes the parent) and sets the parent bits of the others to 0 (these processors become the children). It then assigns the threads (corresponding to tasks) of a single process (corresponding to a job) to the selected processors.
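The OS-side setup for ASMP mode can be sketched as below. The function `configure_asmp` and its dictionary layout are hypothetical, and choosing the first selected processor as the parent is an assumption; the specification only requires that exactly one selected processor has its parent bit set to 1.

```python
def configure_asmp(processors, requested):
    """Return the mode bits an OS might set for a job needing `requested` processors."""
    if not 1 <= requested <= len(processors):
        raise ValueError("requested processor count is out of range")
    selected = processors[:requested]
    bits = {}
    for index, pe in enumerate(selected):
        bits[pe] = {
            "asmp": 1,                         # all selected processors enter ASMP mode
            "parent": 1 if index == 0 else 0,  # exactly one parent (first one, by assumption)
        }
    return bits


if __name__ == "__main__":
    print(configure_asmp(["PE0", "PE1", "PE2", "PE3"], 4))
```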

[0036] FIG. 3 is an example of a program to be executed in parallel. Which of its parts are executed in parallel is as described in the section "Problem to be solved by the invention".

[0037] FIG. 4 is an image of the machine-language instruction sequence that executes the program of FIG. 3 in parallel. The numbers to the left of the instruction sequence are given for convenience as the addresses of the machine-language instructions. In this embodiment, the parent processor and the child processors all execute the same instruction sequence, starting from exactly the same address.

[0038] In the instruction sequence, the switch_to_single_mode instruction and the switch_to_para_mode instruction are newly introduced in this embodiment. Their operation depends on whether the processor executing them is the parent or a child, that is, on the value of the parent bit.

[0039] (1) Execution of the switch_to_single_mode instruction

When the parent bit is 1 (parent processor): on decoding the switch_to_single_mode instruction, the processor waits for the child processors to return, via synchronization information bus 40, a signal indicating that they have completed processing up to the barrier (a synchronization point set in advance by the program for the processors concerned). This signal is called the barrier signal. Once the parent has received the signal from all child processors and has thereby confirmed that all processors are synchronized (barrier synchronization), it issues an instruction on synchronization information bus 40 to change the mode to single. (The fact that the parent processor has decoded the switch_to_single_mode instruction itself indicates that the parent has completed processing up to the barrier.)

[0040] When the parent bit is 0 (child processor): on decoding the switch_to_single_mode instruction, the processor sends a signal on synchronization information bus 40 indicating that it has completed processing up to the barrier, and then stops updating its program counter. That is, each child processor enters a state in which instruction fetch and execution are suspended, and the parent processor waits for all child processors to reach the barrier before executing the subsequent instructions.

[0041] (2) Execution of the switch_to_para_mode instruction. When the parent bit is 1 (the parent processor), on decoding the switch_to_para_mode instruction the parent processor instructs the synchronization information bus 40 to change the mode to parallel and sends the program counter of the instruction being executed at that time onto the synchronization information bus 40. When the mode is changed to parallel, each child processor receives the program counter sent on the synchronization information bus 40 and resumes updating its program counter from that value. That is, the parent processor releases the child processors from their suspended (stalled) state and has them resume from the instruction being executed at that time. When the parent bit is 0 (a child processor), decoding the switch_to_para_mode instruction causes the processor to do nothing.
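The behavior of the two mode-switching instructions described above can be sketched as a small software model. This is an illustrative Python sketch only, not the hardware of the embodiment: the class names `Processor` and `SyncBus`, their fields, and the function signatures are assumptions introduced here, and the parent-side barrier wait is modeled as a function that returns False until every child has reported arrival.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Processor:
    name: str
    parent: bool            # parent bit: True for the parent processor
    pc: int = 0             # program counter
    stalled: bool = False   # True while program counter updates are stopped


@dataclass
class SyncBus:
    """Toy stand-in for synchronization information bus 40 (hypothetical)."""
    mode: str = "parallel"
    barrier_arrivals: set = field(default_factory=set)
    resume_pc: Optional[int] = None


def switch_to_single_mode(cpu, bus, n_children):
    """Return True if the processor is done with the instruction."""
    if not cpu.parent:
        # Child: report barrier arrival, then stop updating the PC.
        bus.barrier_arrivals.add(cpu.name)
        cpu.stalled = True
        return True
    # Parent: proceeds only after every child has reached the barrier.
    if len(bus.barrier_arrivals) < n_children:
        return False
    bus.mode = "single"
    bus.barrier_arrivals.clear()
    return True


def switch_to_para_mode(cpu, bus, children):
    if not cpu.parent:
        return  # decoding on a child processor does nothing
    # Parent: switch the mode and publish its program counter.
    bus.mode = "parallel"
    bus.resume_pc = cpu.pc
    for child in children:
        child.pc = bus.resume_pc   # child takes in the published PC
        child.stalled = False      # and resumes PC updates
```

In this model the children never decode switch_to_para_mode themselves while stalled; they are restarted purely by the parent's bus activity, which mirrors the description.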

[0042] Since the switch_to_single_mode and switch_to_para_mode instructions operate as described above, the machine language instruction sequence of FIG. 4 is executed as follows. Instruction sequence 1002 is executed only by the parent processor; instruction 1003 conveys instruction address 920 to the child processors as well, and the parent and child processors execute instruction sequence 1004 in parallel. Here instruction sequence 1002 corresponds to 3100 in FIG. 3, and instruction sequence 1004 corresponds to 3200. The compute_my_addr entry in instruction sequence 1002 is shorthand for an instruction sequence in which each processor independently computes the addresses of the data it is responsible for. Instruction 1005 causes the child processors to suspend execution, and the parent processor, after confirming that barrier synchronization has been achieved, executes instruction sequence 1006. Instruction 1007 then causes the parent and child processors to begin parallel execution of instruction sequence 1008 again, and instruction 1009 returns execution to the parent processor alone. Here instruction sequence 1006 corresponds to 3300 in FIG. 3, and instruction sequence 1008 corresponds to 3400.
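The walk-through of FIG. 4 can be restated as a short sketch that records which processors execute each instruction sequence. The `PROGRAM` encoding and the `executors` function are hypothetical, introduced only for illustration; the numeric labels are the reference numerals used in the text, and four processors (PE0 as parent) are assumed as in the later cache example.

```python
# Hypothetical encoding of the FIG. 4 instruction stream:
# (kind, reference numeral) pairs in program order.
PROGRAM = [
    ("switch_to_single_mode", 1001),
    ("seq", 1002),
    ("switch_to_para_mode", 1003),
    ("seq", 1004),
    ("switch_to_single_mode", 1005),
    ("seq", 1006),
    ("switch_to_para_mode", 1007),
    ("seq", 1008),
    ("switch_to_single_mode", 1009),
]


def executors(program, n_processors=4):
    """Map each instruction sequence to the processors that execute it."""
    mode = "parallel"  # the program is started by parent and children
    result = {}
    for kind, label in program:
        if kind == "switch_to_single_mode":
            mode = "single"
        elif kind == "switch_to_para_mode":
            mode = "parallel"
        elif mode == "single":
            result[label] = ["PE0"]  # parent only
        else:
            result[label] = ["PE%d" % i for i in range(n_processors)]
    return result
```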

[0043] That is, the program shown in FIG. 4 is initially started by the parent processor and the child processors, but the switch_to_single_mode instruction puts the child processors into the suspended state, so that only the parent processor continues processing. When the parent processor subsequently processes the switch_to_para_mode instruction, the value of the parent processor's program counter, used for resuming the program, is notified to the child processors, and all processors carry out program processing from that program counter value. When a switch_to_single_mode instruction is executed again, the same processing as described above is repeated. The above is the sequence of parallel instruction-sequence execution based on the mode bits.

[0044] The cache coherence operation based on the mode bits is described below.

[0045] First, an overview of the operation is given with reference to FIG. 2. In this embodiment, the cache coherence operation in ASMP mode with parallel mode is the same as in SMP mode. That is, data cache coherence is maintained according to FIG. 6, and instruction cache coherence is not maintained.

[0046] In ASMP mode with single mode, only the parent processor executes the instruction sequence, but the coherence mechanism operates so that the results of the parent processor's execution are also reflected in the data caches of the child processors. That is, a cache line written by the parent processor is broadcast to all child processors with the Sh.D attribute, and a data line read by the parent processor is broadcast to all child processors with the V attribute. As for the instruction cache, although the child processors have suspended (stalled) instruction execution, they are controlled so that the result of a line transfer for an instruction fetch issued by the parent processor is also taken into their own instruction caches. As a result, the cache contents of the processors, which diverged during parallel mode, gradually converge to the contents of the parent processor's cache during execution in single mode (details are given later).
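The child-side effect of single-mode execution can be sketched as follows. This is a minimal illustration under stated assumptions: each child data cache is modeled as a plain dictionary from a line label to an attribute string, and the names `CHILD_ATTR` and `broadcast_to_children` are hypothetical. Only the attributes named in the text (Sh.D for lines the parent writes, V for lines the parent reads) are modeled; the parent's own cache state is deliberately left out.

```python
# Attribute a child cache receives for a line, keyed by the parent's access.
CHILD_ATTR = {"store": "Sh.D", "load": "V"}


def broadcast_to_children(child_caches, line, parent_op):
    """Reflect one parent access in every child data cache."""
    attr = CHILD_ATTR[parent_op]
    for cache in child_caches.values():
        cache[line] = attr
    return attr
```

Applied repeatedly, this makes every child cache hold the same lines the parent touched, which is the gradual convergence the paragraph describes.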

[0047] FIG. 7 shows a state transition scheme that maintains data cache coherence while realizing the above. FIG. 7(a) shows how each state transitions in response to accesses generated by the own processor and what bus transactions are generated. FIG. 7(b) shows how the state of the own cache transitions in response to transactions arriving from the bus and what bus transactions are generated.

[0048] For example, when the own processor executes a store instruction to a line in state V, the line in the own processor is written and at the same time a broadcast to the other processors is generated, and the line's state moves to Sh.D (see FIG. 7(a)). Meanwhile, if the same line is in state V in another processor, that processor receives the Broadcast bus transaction, takes the line into its cache, and moves its state to Sh.D (see FIG. 7(b)). That this state transition operates correctly is explained later using the machine language instruction sequence of FIG. 4.
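The example transition can be written as a pair of lookup tables, one for accesses from the own processor and one for transactions seen on the bus. The sketch below is deliberately partial: it contains only the two transitions spelled out in the text, not the full content of FIG. 7, and the table and function names (`OWN_ACCESS`, `BUS_SNOOP`, `store_in_single_mode`) are assumptions made for illustration.

```python
# (current state, processor access) -> (new state, bus transaction)
OWN_ACCESS = {("V", "store"): ("Sh.D", "Broadcast")}

# (current state, bus transaction) -> new state
BUS_SNOOP = {("V", "Broadcast"): "Sh.D"}


def store_in_single_mode(caches, writer, line):
    """Apply a store by `writer` and let the other caches snoop it."""
    new_state, txn = OWN_ACCESS[(caches[writer][line], "store")]
    caches[writer][line] = new_state
    for name, cache in caches.items():
        if name == writer:
            continue
        key = (cache.get(line), txn)
        if key in BUS_SNOOP:           # only modeled transitions fire
            cache[line] = BUS_SNOOP[key]
    return txn
```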

[0049] FIGS. 8 and 9 show a configuration that realizes the processor operation and cache coherence operation based on the mode bits described above.

[0050] FIG. 8 is a configuration diagram of the data unit 50. The data cache state storage mechanism 79 stores the addresses and states of the data lines held in the data cache 51. The value of the mode bits in the instruction unit 53 is output on signal line 67.

[0051] When signal line 67 indicates SMP mode and a load/store request arrives from the load/store unit 56 via signal lines 60-0 and 60-1, the combinational circuit 80, following the state transitions of FIG. 6(a), sends signals on signal lines 101 to 107 and 93 that control the bus transaction generation circuits 71 to 76, the circuit 77 that instructs line fetch into the data cache, and the cache state change circuit 78.

[0052] Specifically, suppose for example that a store request is input via signal line 60-1 and its store address is input on signal line 60-0. The data cache state storage mechanism 79 compares the store address with the cache state and sends the state of the accessed line, namely "I", "V", "D", or "Sh.D", on signal line 91. Signal line 92 carries whether there is a line to be cast out (Castout) as a result of the store request, together with its address. For example, when signal line 91 indicates "V" and there is no line to be cast out, the combinational circuit 80 activates the invalidation transaction generation circuit 73, which generates an invalidation transaction on the address/command bus 41 via the encode circuit 81. The combinational circuit 80 further activates the state change circuit 78 to change the state of the accessed line to "D".

[0053] Transactions arriving from the bus are input on signal line 18-0-1, and the combinational circuit 80, following the state transitions of FIG. 6(b), sends signals that control the bus transaction generation circuits 71 to 76, the circuit 77 that instructs line fetch into the data cache, and the cache state change circuit 78.

[0054] In this embodiment, the operation when signal line 67 indicates ASMP and parallel mode is the same as the operation in SMP mode described above.

[0055] When signal line 67 indicates ASMP and single mode, the combinational circuit 80, following the state transitions of FIG. 7(a) and (b), sends signals that control the bus transaction generation circuits 71 to 76, the circuit 77 that instructs line fetch into the data cache, and the cache state change circuit 78.

[0056] Specifically, suppose for example that a store request is input via signal line 60-1 and its store address is input on signal line 60-0. When the data cache state storage mechanism 79 sends V on signal line 91 and no line to be cast out is indicated on signal line 92, the combinational circuit 80 activates the broadcast transaction generation circuit 76. The broadcast transaction generation circuit 76 instructs the data cache 50, via signal line 65, to send the line reflecting the store result onto the data bus 42, and generates a broadcast transaction on the address/command bus 41 via the encode circuit 81. The combinational circuit 80 further activates the state change circuit 78 to change the state of the accessed line to Sh.D.
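The two worked store examples differ only in the mode indicated on signal line 67, which suggests the following sketch of the selection made by the combinational circuit 80 for a store that hits a line in state V with no castout pending. The function name and its boolean parameters are hypothetical; only this single case is modeled, and the returned strings simply name the circuits mentioned in the text.

```python
def store_hit_valid(asmp, para):
    """Return (activated circuit, new line state) for a store to a V line.

    asmp: True for ASMP mode, False for SMP mode.
    para: True for parallel mode, False for single mode (ignored in SMP).
    """
    if asmp and not para:
        # ASMP single mode: propagate the stored line to the other caches.
        return ("broadcast transaction generation circuit 76", "Sh.D")
    # SMP mode, or ASMP parallel mode, which behaves the same way here.
    return ("invalidation transaction generation circuit 73", "D")
```

Keeping the mode test in one place reflects the design described in the text, where the same state storage and transaction circuits are reused and only the selection logic depends on the mode bits.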

[0057] FIG. 9 is a configuration diagram of the instruction unit 53.

[0058] The instruction cache state storage mechanism 153 stores the addresses of the instruction lines held in the instruction cache 52. The program counter 131 indicates the address of the instruction to be executed next; if, as a result of looking up the instruction address in the instruction cache state storage mechanism 153, the required instruction is not in the instruction cache 52, a fetch request for the instruction line is sent via the state change circuit 134 and signal line 18-1-0. When it has issued an instruction line fetch request, the state change circuit 134 sends a line fetch instruction for the instruction cache on signal line 66-1 and an instruction cache state change instruction on signal line 155. If the required instruction is in the instruction cache 52, an instruction request is sent to the instruction cache 52 via signal line 66-0 and the instruction is delivered via signal line 66-2. The instruction is decoded by the decode circuit 120; if it is an ordinary arithmetic or load/store instruction, the decode result is sent on signal line 68 and controls the arithmetic unit 57 or the load/store unit 56. If the instruction is one that controls the ASMP mode, namely the switch_to_single_mode instruction (1001) or the switch_to_para_mode instruction (1003) of FIG. 4, the decode result is sent on signal line 153. The program counter 131 is updated via signal line 158 each time an instruction is fetched. The above is the operation of the instruction unit 53 that is common regardless of the mode bits.

[0059] Next, the operation of the instruction unit 53 that depends on the mode bits is described.

[0060] When the mode bits 152 indicate SMP mode, the combinational circuit 122, which operates on the combination of the output 153 from the decoder 120 and the output 152 from the mode bits 121, outputs nothing. That is, the instruction unit 53 operates as described above, and the switch_to_single_mode instruction (1001) and the switch_to_para_mode instruction (1003) are ignored.

[0061] When the mode bits 152 indicate ASMP mode and the switch_to_single_mode instruction (1001) or the switch_to_para_mode instruction (1003) is sent on signal line 153, the combinational circuit 122 controls the following circuits in the manner described next: the PC (program counter) fetch circuit 123; the PC update inhibit circuit 124, which inhibits updating of the program counter; the I-line fetch instruction circuit 125; the barrier send circuit 126, which sends a signal notifying that program processing in the own processor has reached the barrier point; the mode broadcast circuit 127, which notifies the child processors that the mode has changed; the barrier completion wait circuit 128, which activates the mode broadcast circuit 127 once arrival at the barrier point has been notified by all child processors; and the PC send circuit 129, which sends the program counter value to the other processors.

[0062] That is, when parent = 1 and para = 1 are indicated and the switch_to_single_mode instruction is sent on signal line 153, the barrier completion wait circuit 128 and the mode broadcast circuit 127 are activated. The barrier completion wait circuit 128 activates the mode broadcast circuit 127 once signal line 14-0 has notified it that all child processors have reached the barrier point. When parent = 1 and para = 0 and the switch_to_para_mode instruction is sent on signal line 153, the mode broadcast circuit 127 and the program counter send circuit 129, which sends the program counter value to the other processors, are activated.

[0063] On the other hand, when parent = 0 and para = 1 are indicated and the switch_to_single_mode instruction is sent on signal line 153, the barrier send circuit 126 and the program counter update inhibit circuit 124 are activated. When parent = 0 and para = 0 and an instruction to switch the mode to parallel is input on signal line 14-0, the mode bits 121 are set to para = 1 and the program counter fetch circuit 123 is activated via the combinational circuit 122, so that the program counter sent on signal line 14-0 is taken in.
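The control performed by the combinational circuit 122 amounts to a small decision table keyed on the mode bits and the incoming event. The sketch below is an assumption-laden restatement, not the circuit itself: the function name `circuit_122`, the event strings, and the use of reference numerals as set members are all introduced here for illustration, and the event "bus_mode_parallel" stands for the parallel-mode indication arriving on signal line 14-0 rather than a decoded instruction.

```python
def circuit_122(asmp, parent, para, event):
    """Return the set of circuits (by reference numeral) that are activated."""
    if not asmp:
        return set()  # SMP mode: both switch instructions are ignored
    table = {
        # parent, parallel mode: wait for barrier, then broadcast the mode
        (1, 1, "switch_to_single_mode"): {128, 127},
        # parent, single mode: broadcast the mode and send the PC
        (1, 0, "switch_to_para_mode"): {127, 129},
        # child, parallel mode: send barrier signal, inhibit PC update
        (0, 1, "switch_to_single_mode"): {126, 124},
        # child, single mode: take in the PC sent on the bus
        (0, 0, "bus_mode_parallel"): {123},
    }
    # Any other combination (e.g. a child decoding switch_to_para_mode)
    # activates nothing.
    return table.get((parent, para, event), set())
```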

[0064] The contents of the caches when the machine language instruction sequence of FIG. 4 is executed are described with reference to FIGS. 6, 7, and 10(a). It is assumed that one cache line can hold four data items.

[0065] FIG. 10 shows the contents of the instruction caches and data caches of the parent processor (PE0) and the child processors (PE1 to PE3) when the instructions and instruction sequences 1001 to 1009 of FIG. 4 are executed on four processors. Since the child processors all behave the same way, FIG. 10 shows only the cache contents of PE1. The contents of the instruction cache are shown using the instruction addresses assigned for convenience in FIG. 4. In the figure, an instruction or data item marked with * indicates that a cache miss or a fetch of broadcast data occurred. A notation such as "A(1)~" in the figure means that the four data items starting from A(1), namely A(1), A(2), A(3), and A(4), are in the cache.
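The line notation used for FIG. 10 can be made concrete with a short helper that expands a tag written as a starting element followed by a tilde into the four data items of that cache line. The function name `expand_line`, the ASCII spelling of the tag, and the `line_size` parameter are assumptions for illustration; the line size of four follows the assumption stated for this example.

```python
import re


def expand_line(tag, line_size=4):
    """Expand a tag such as "A(1)~" into the data items held in that line."""
    m = re.fullmatch(r"([A-Za-z]+)\((\d+)\)~", tag)
    if m is None:
        raise ValueError("not a line tag: %r" % tag)
    name, first = m.group(1), int(m.group(2))
    return ["%s(%d)" % (name, i) for i in range(first, first + line_size)]
```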

[0066] When instruction 1001 of FIG. 4 is executed, assume that an instruction cache miss occurs in both PE0 and PE1. PE1 inhibits updating of its program counter and enters the suspended (stalled) state. PE0 to PE3 enter single mode, and PE0 starts executing instruction sequence 1002. Assuming that the data cache of PE0 held no data, P, S, and A(1)~ are all line-transferred. At this time the data cache of PE1 (a child processor) takes in P, S, and A(1)~ according to FIG. 7(b) [transition from state I by LTreq or LTreq-forST]. The lines are Valid only in PE0 (the parent processor) and Sh.D in the others (the child processors). PE1 (a child processor) also takes the same address 910 as PE0 into its instruction cache. That is, although the child processors are in the suspended state, their instruction and data caches are updated together with the updates to the parent processor's instruction and data caches. Because this fetching is carried out in the shadow of PE0's line transfers, it does not increase the processing time.

[0067] When PE0 (the parent processor) executes instruction 1003, the parent processor sets the mode on the synchronization information bus to parallel and outputs its program counter. PE1 (a child processor) takes in the program counter, and all processors begin parallel execution of instruction sequence 1004. Since the instruction cache of PE1 holds the same lines as PE0, no instruction cache miss occurs. Also, since S is held in the data cache of PE1, no cache miss occurs for S. Cache misses do occur for A(5)~ and B(5)~. Because instruction sequence 1004 is executed according to the state transitions of FIG. 6, the cache contents of PE0 and PE1 come to differ considerably. In PE1, B(5)~ is held in the Dirty state.

[0068] When instruction 1005 is executed, PE1 (a child processor), on reaching the barrier point, sends a barrier signal indicating this onto the synchronization information bus 40 and enters the stalled state. PE0 (the parent processor), on executing instruction 1005, waits for the barrier signals from PE1 to PE3 (the child processors) and, on receiving all of them, sets the mode to single.

[0069] During execution of instruction sequence 1006, cache coherence control follows the state transitions of FIG. 7. Therefore B(5) to B(16), which were modified by PE1 to PE3, are broadcast each time PE0 references them, and all PEs come to hold B(5) to B(16) with the Sh.D attribute. For example, PE0 (the parent processor) moves from state I to Sh.D by L in FIG. 7(a), and PE1 to PE3 (the child processors) move from state I to Sh.D by LTreq in FIG. 7(b).

[0070] Instruction 1007 is executed in the same way as instruction 1003. Instruction sequence 1008 is executed in parallel by all PEs, but since PE1 has already taken B(5)~ into its cache, no miss occurs.

[0071] For comparison, FIG. 5 shows an image of the machine language instruction sequence when the program of FIG. 3 is executed in parallel by the conventional method, and FIG. 10(b) shows the cache contents when this instruction sequence is executed in ordinary SMP mode, that is, according to the state transitions of FIG. 6.

[0072] FIG. 5(a) is the instruction sequence executed by the parent processor, and FIG. 5(b) is the instruction sequence executed by the child processors. The store_begin_addr instruction at 2003 in FIG. 5(a) represents a sequence that starts the child processors and notifies them of the execution start address. The loadall_end instruction at 2005 in FIG. 5(a) represents a sequence that tallies the end flags reported by the child processors. As shown in FIG. 5(b), a child processor is assumed to spin-wait when it reaches a non-parallel part of the program.
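The conventional scheme, in which the child processors spin-wait and all coordination goes through shared memory, can be imitated in software with threads. The sketch below is only an analogy under stated assumptions: Python threads stand in for processors, the function name `run_conventional`, the list-based shared variables, and the default start address are hypothetical, and no cache behavior is modeled.

```python
import threading
import time


def run_conventional(n_children=3, start_addr=0x100):
    begin_addr = [None]                # shared word written by the parent
    end_flags = [False] * n_children   # one end flag per child
    seen = [None] * n_children         # start address each child observed

    def child(i):
        while begin_addr[0] is None:   # spin-wait until started
            time.sleep(0)
        seen[i] = begin_addr[0]        # stand-in for the parallel work
        end_flags[i] = True            # report completion to the parent

    threads = [threading.Thread(target=child, args=(i,))
               for i in range(n_children)]
    for t in threads:
        t.start()
    begin_addr[0] = start_addr         # role played by store_begin_addr
    while not all(end_flags):          # role played by loadall_end
        time.sleep(0)
    for t in threads:
        t.join()
    return seen
```

Every hand-off here is an ordinary load or store to shared data, which is the property that distinguishes this scheme from the bus-based mode switching of the embodiment.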

[0073] As is clear from FIG. 10(b), in the conventional method the instruction cache of PE1 (a child processor) misses on entering a parallelized part of the program (2004, 2008). In addition, data cache misses that did not occur in the embodiment of the present invention occur at 2004 and 2008.

[0074] From the above, the conventional method clearly incurs a larger cache miss penalty, which hinders the performance improvement obtainable from parallel execution of a single process.

[0075]

[Effects of the Invention] As described above, the present invention provides first information that identifies whether the system is in a mode in which a plurality of processes are executed simultaneously by a plurality of processors (SMP mode) or a mode in which one process is executed in parallel by the first plurality of processors (ASMP mode), and switches the operation of the content matching control circuit according to that information, so that a cache coherence control scheme suited to each mode can be selected. For example, in SMP mode, keeping the cache contents of the processors as independent as possible improves the throughput of multi-process execution without needlessly activating the coherence mechanism. In ASMP mode, on the other hand, a cache coherence scheme suited to the part of the program being executed (the mode that executes a parallel part of the process or the mode that executes a non-parallel part) can be adopted, improving the performance of parallel execution of a single process.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of a processor system according to one embodiment of the present invention.

FIG. 2 is a configuration diagram of the processor of the present invention.

FIG. 3 is an example program.

FIG. 4 is an image of the machine language instruction sequence of the program of FIG. 3 according to the present invention.

FIG. 5 is an image of the machine language instruction sequence of the program of FIG. 3 according to the prior art.

FIG. 6 is a state transition diagram illustrating a cache coherence scheme.

FIG. 7 is a state transition diagram illustrating a cache coherence scheme.

FIG. 8 is a configuration diagram of the data cache coherence mechanism of the processor of the present invention.

FIG. 9 is a configuration diagram of the instruction cache coherence mechanism of the processor of the present invention.

FIG. 10 shows the cache contents for the present invention and the prior art.

FIG. 11 shows the configuration of the operation switching mode bits of the present invention.

[Explanation of symbols]

51: data cache; 52: instruction cache; 50: data unit; 53: instruction unit; 121: mode bits; 40: synchronization information bus; 71 to 77: data cache coherence circuits; 123 to 129: instruction cache coherence circuits.

Continued from the front page: (72) Inventor: Naonobu Sukegawa, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

1. A multiprocessor system comprising a group of processors each having a cache, a connection line connecting the group of processors, and a content matching control circuit between the caches, the system further comprising first information identifying whether a first plurality of processors in the group of processors is in a mode in which a plurality of processes are simultaneously executed by the first plurality of processors or a mode in which one process is executed in parallel by the first plurality of processors, wherein the operation of the content matching control circuit is switched according to said information.

2. The multiprocessor system according to claim 1, wherein said content matching control circuit comprises a plurality of functional units, and further comprises a circuit for selecting which one of said functional units to activate according to said information.

3. The multiprocessor system according to claim 1, wherein the mode in which one process is simultaneously executed by the first plurality of processors further includes a mode in which a parallel operation part of the process is executed and a mode in which a non-parallel operation part is executed, the system further comprising means for switching between the mode for executing the parallel operation part and the mode for executing the non-parallel operation part, wherein the operation of the content matching control circuit is switched according to the mode for executing the parallel operation part and the mode for executing the non-parallel operation part.

4. The multiprocessor system according to claim 3, wherein the content matching control circuit is composed of a plurality of functional units, the system further comprising a circuit for selecting the functional units such that, when the mode is the mode in which the one process is simultaneously executed by the first plurality of processors and is the mode in which the non-parallel operation part is executed, each cache of the first plurality of processors is updated with the same entry.

5. The multiprocessor system according to claim 3, wherein the content matching control circuit is composed of a plurality of functional units, the system further comprising a circuit for selecting the functional units such that, when the mode is the mode in which the one process is simultaneously executed by the first plurality of processors and is the mode in which the parallel operation part is executed, each cache of the first plurality of processors is updated with an individual entry.

6. The multiprocessor system according to claim 3, wherein, when the mode is the mode in which one process is simultaneously executed by the first plurality of processors and is the mode in which the non-parallel operation part is executed, the content matching control circuit updates each cache of the first plurality of processors with the same entry.

7. The multiprocessor system, wherein the first plurality of processors consists of one parent processor and the remaining child processors, and the operation of the content matching control circuit is changed depending on whether a processor is the parent processor or a child processor.