JP2009048306A

JP2009048306A - Parallel processing architecture and parallel processing processor using the same

Info

Publication number: JP2009048306A
Application number: JP2007211904A
Authority: JP
Inventors: Riki Fukunaga (福永力)
Original assignee: Tama TLO Co Ltd; Tokyo Metropolitan Public University Corp
Current assignee: Tama TLO Co Ltd; Tokyo Metropolitan Public University Corp
Priority date: 2007-08-15
Filing date: 2007-08-15
Publication date: 2009-03-05

Abstract

[Problem] To configure a parallel processing processor on an FPGA and to process the processes of a program in parallel within a single processor.
[Solution] The processor is provided with process management registers, general-purpose internal stack registers, a memory, and links. The process identification numbers of the program being executed are managed by the process management registers and the memory. The process identification numbers are formed into a linked scheduling list in memory, which connects the processes and is used to perform process switching and channel communication between processes.
[Selected drawing] Figure 1

Description

The present invention relates to a central processing unit (CPU), the core of a computer system. More specifically, it relates to a parallel processing architecture in which a CPU performs parallel processing on its own and also performs parallel processing together with a plurality of other CPUs while synchronizing with them through cooperation or data exchange, and to a parallel processing processor using that architecture.

From the late 1980s to the early 1990s, a processor for parallel processing (the Transputer (registered trademark)) based on CSP (Communicating Sequential Processes) theory was developed and sold by INMOS of the United Kingdom. Transputers worked in concert across multiple networks, where they showed their strength and raised computation speed. The development, manufacture, and sale of the transputer were later discontinued.
With the VLSI (very-large-scale integration) technology of the mid-1980s, a circuit the size of a single transputer occupied an entire VLSI chip, so a transputer network was built on a very large circuit board, and each interface was implemented as a few wires connecting the chips.
Moreover, the published material on the transputer disclosed only a rough block diagram of the architecture; the parallel processing algorithms used inside a single transputer were never made public.

In CSP theory, a computing system (program) is viewed as a set of processes; the processes synchronize by exchanging messages with one another, and the computation proceeds in this way. A process here means "an entity that continues to execute a certain behavior sequentially." Within a program, multiple processes run independently and concurrently, and they communicate through input/output operations executed inside each process. This communication is what synchronizes the processes. When one process reaches an input (or output) operation, it waits for the other process to reach the corresponding output (or input) operation, and the communication takes place once both have reached the data (message) communication step. The data being transferred is exchanged without being stored in a queue or buffered. In this way the two processes are brought into step, and after the communication completes each continues its own processing.

In CSP theory there is no shared memory, either between two communicating processes or among all processes; data is shared between processes only through channel communication.
Occam (registered trademark) is a programming language that embodies CSP theory, and a program written in it is likewise viewed as a set of processes that run in parallel. Two processes running in parallel exchange data through channels: a channel variable common to both processes is defined, and data exchange and synchronization are carried out through it.
Patent Document 1 discloses a configuration in which a plurality of transputers are connected using a switch, and Patent Document 2 discloses an example of a data-parallel processing method applied to image processing using an external frame. Non-Patent Document 1 discloses the register structure of the processor, among other things, but does not disclose the detailed structure of a single processor or an architecture for executing processes in parallel.

Patent Document 1: Japanese Unexamined Patent Publication No. 63-501986
Patent Document 2: Japanese Unexamined Patent Publication No. 3-263164 (JP-A-3-263164)
Non-Patent Document 1: Masaki Yamamoto, Yasuaki Nakai, and Annori Murakami (山本正樹、中井泰明、村上安範), "Introduction to Transputer" (トランスピュータ入門), Nikkan Kogyo Shimbun

The transputer, built on CSP theory, an excellent theory of information science, functions only with Occam, the programming language based on that theory. Occam is therefore indispensable to research on and the development of CSP theory. Because no transputer now exists to run it, however, both the development of the theory and its practical application have been seriously hampered.
Although development of the transputer was discontinued, the need for parallel processors based on CSP theory has only grown since. There is also a large gap between the CMOS (Complementary Metal Oxide Semiconductor) VLSI implementation technology of the transputer era and today's: a VLSI chip of a physical size that could then hold only one processor can now, thanks to the miniaturization of CMOS transistors, hold about 20. A parallel processor built from 20 processors can form a fairly complex network, allowing the system to be scaled up considerably.
An object of the present invention is therefore not only to surpass the prior art but also to provide an architecture for a parallel processing processor that operates at high speed by implementing the transputer on a modern FPGA (Field Programmable Gate Array). A further object is to provide a parallel processing processor (TPCORE) that outperforms the conventional transputer by also providing a number of hardware algorithms for executing multiple processes within a single processor.

The parallel processing architecture of the present invention is a parallel processing architecture for a parallel processing processor that executes programs in the Occam language. A process is the basic unit that makes up a program and is executed sequentially. In the initial stage, before a process runs, the parallel processor generates the process when the start instruction for that process is executed. If there is no queue of waiting processes, the processor executes the generated process, which ends at its end instruction. Alternatively, if during execution the process initiates channel communication, or a timeout operation or stop instruction is executed, the process enters an idle state and waits for a response on the channel from its partner process. When no process is waiting after the process has been generated, the identification number of the process is appended to the tail of the queue of waiting processes and the process waits. While it waits, its identification number advances through the queue; when the waiting process reaches the head, the processor switches to it, the process is executed and ends at its end instruction, and the processor returns to the initial stage.

The parallel processing processor of the present invention forms a network and executes programs in the Occam language. It comprises a processor, a plurality of links, a memory, and a memory controller. The processor has an ALU that performs arithmetic or logical operations; a microcode ROM that stores the microcode controlling the ALU; internal registers, including a register that stores an instruction or the memory address of the next instruction to be executed and general-purpose stack registers; a workspace pointer register that holds the identification number of a process, the basic unit of the program that the processor executes sequentially; process management registers that store data for managing waiting processes; and a microcode ROM controller that controls the microcode ROM. The links are connected to the processor and input and output data. The memory stores the input/output data of the processor or the links and contains workspaces; in a workspace, the identification number of the process to be started and the identification number of the process to be executed next are stored a predetermined address offset apart, forming a scheduling list in which the identification numbers are linked. The memory controller controls the transfer of input/output data to and from the memory.

In the parallel processing architecture of the present invention and the parallel processing processor using it, the processor is provided with process management registers, general-purpose internal registers, a memory, and a link block. The identification number of the process being executed and that of the process to be executed next are stored in the process management registers and the memory, and the process identification numbers are stored in memory as a linked scheduling list, which is used to perform process switching and channel communication between processes.

The parallel processing processor of the present invention (also called TPCORE) understands the machine language (assembly language) of the conventional transputer, yet is built as a different processor with an entirely different architecture, and thereby improves on the transputer's performance.
Because its machine language is compatible with that of the conventional transputer, all software previously developed for the transputer runs on TPCORE, and Occam can be used as well. The parallel processing processor of the present invention therefore makes it possible once again to pursue research on CSP theory through Occam.

First, the CPU 10 (central processing unit), the main part of the parallel processing processor of the present invention, will be described.
FIG. 1 shows a block diagram of the CPU 10. The CPU 10 comprises several kinds of registers (internal registers), two buses, an ALU 31 (Arithmetic Logic Unit) that performs various arithmetic operations, a control unit that controls the TPCORE 50, and other components.
Among the internal registers, those involved in processing in general are Wptr 14 (workspace pointer), Iptr 15 (instruction pointer register), Ireg 26 (instruction register), Oreg 25 (operand register), and the general-purpose stack registers Areg 11, Breg 12, and Creg 13.

Iptr 15 is a 32-bit register that holds the address at which the next instruction to be executed is stored.
Ireg 26 is a 4-bit register that receives the upper 4 bits of the fetched instruction. The stored value is sent to dedicated instruction-decoding hardware, which decodes it and determines the instruction to execute.
Oreg 25 is a 32-bit register that receives the lower 4 bits of the fetched instruction; the stored value is used together with the value of Ireg 26 when the instruction is interpreted.
The general-purpose stack registers Areg 11, Breg 12, and Creg 13 are each 32 bits wide and together form a stack. PUSH and POP operations are performed among these registers as data is input and output.
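
The register behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: it assumes the fetched instruction is one byte (upper 4 bits to Ireg, lower 4 bits to Oreg), that a push shifts Areg to Breg and Breg to Creg, and that a pop leaves Creg unchanged. The class and method names are hypothetical.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # the stack registers are 32 bits wide


@dataclass
class Registers:
    """Illustrative model of Ireg, Oreg and the Areg/Breg/Creg stack."""
    areg: int = 0
    breg: int = 0
    creg: int = 0
    ireg: int = 0  # 4-bit instruction register
    oreg: int = 0  # 32-bit operand register

    def fetch(self, instruction_byte: int) -> None:
        # Assumption: one instruction is a single byte.
        self.ireg = (instruction_byte >> 4) & 0xF  # upper 4 bits
        self.oreg = instruction_byte & 0xF         # lower 4 bits

    def push(self, value: int) -> None:
        # New data enters Areg; older values move down the stack.
        self.creg = self.breg
        self.breg = self.areg
        self.areg = value & MASK32

    def pop(self) -> int:
        # Areg leaves the stack; the remaining values move up.
        # Assumption: Creg keeps its old value after a pop.
        value = self.areg
        self.areg = self.breg
        self.breg = self.creg
        return value
```

Pushing a fourth value in this model silently discards the old Creg, which is one reason a real design has to decide what happens when the three-register stack overflows.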

Wptr 14 (workspace pointer) is the register that manages processes, and Fptr 16 (forward pointer) and Bptr 17 (back pointer) are the registers that manage waiting processes.
Wptr 14 stores a 32-bit value (the process ID, or identification number) indicating the current process. The lower 2 bits of Wptr 14 indicate the priority of the process. So that parallel processes can be executed on a single TPCORE 50, each process is assigned an arbitrary 32-bit value as its process ID.
Whereas Wptr 14 manages the ID of the current process, Fptr 16 and Bptr 17 manage the IDs of processes that are not currently running, that is, waiting processes. Fptr 16 holds the process ID at the head of the waiting processes and Bptr 17 holds the process ID at the tail. When three or more processes are waiting, these two registers alone cannot manage them, so they are managed with a list structure in the memory 42.

The other registers are cnt 21 (counter register) and Temp 29 (temporary register). cnt 21 is 32 bits wide and is used to count loop iterations, shift amounts, and the number of inputs and outputs. Temp 29 is 32 bits wide and temporarily holds intermediate results when the ALU 31 cannot complete an operation such as multiplication or division in one clock cycle.

The ALU 31 performs arithmetic and logical operations inside the TPCORE 50. When an operation is needed during execution, register data and data generated as needed for the operation are sent to the ALU 31, which performs the operation.

The microcode ROM 27 stores the microcode. It controls communication between registers and between the memory 42 and the registers, and it also controls the functions of the ALU 31. In addition, it manages the state transitions of the processor. One microcode word is 68 bits wide, and each bit controls part of the operation of the TPCORE 50. The upper 56 bits of the microcode control the buses, registers, and ALU 31, and the lower 11 bits control the operation of the processor.
The microcode ROM controller 24 calculates the microcode address. There are two address calculation mechanisms, selected by the value of the 64th bit of the microcode. When this value is "1", the address is calculated from the values of Ireg 26 and Oreg 25; when it is "0", bits 9 to 0 of the microcode ROM 27 output are used directly as the next address.
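
The two-way address selection can be sketched as follows. The source does not say how the address is derived from Ireg and Oreg, nor whether "the 64th bit" is counted from 0 or 1, so both `dispatch_address` and `SELECT_BIT` below are placeholders chosen only to make the sketch runnable; all names are hypothetical.

```python
SELECT_BIT = 63         # assumption: "64th bit" counted from 1, i.e. index 63
NEXT_ADDR_MASK = 0x3FF  # bits 9..0 of the microcode word


def dispatch_address(ireg: int, oreg: int) -> int:
    """Placeholder mapping from (Ireg, Oreg) to a microcode address.

    The real mapping is not given in the source; this one simply
    concatenates the two 4-bit fields.
    """
    return ((ireg & 0xF) << 4) | (oreg & 0xF)


def next_microcode_address(microword: int, ireg: int, oreg: int) -> int:
    """Choose the next microcode address according to the select bit."""
    if (microword >> SELECT_BIT) & 1:
        # Select bit is 1: derive the address from the instruction registers.
        return dispatch_address(ireg, oreg)
    # Select bit is 0: the word itself carries the next address.
    return microword & NEXT_ADDR_MASK
```

The first branch corresponds to starting the microcode routine for a newly decoded instruction, and the second to stepping through a routine already in progress.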

Next, FIG. 2 shows a block diagram of the TPCORE 50. In addition to the CPU 10 described above, the TPCORE 50 comprises a link block 45, a memory controller 41, a memory 42 (memories 42-a to 42-d), and other components. Each of the memories 42-a to 42-d is an 8 KB (kilobyte) block of RAM. The link block 45 consists of four interfaces (links) and communicates or exchanges data with other TPCOREs 50.

Each TPCORE 50 carries 32 KB of internal memory, the memory 42. The memory 42 is accessed by both the CPU 10 and the link block 45, and priority between these accesses is managed by the memory controller 41 described later.
The memory 42 can be accessed in two ways, by byte or by word. Because reads from and writes to the memory 42 thus come in two data widths, the 32 KB memory 42 is built from four 8 KB memories 42-a to 42-d. Each of the memories 42-a to 42-d has a data width of 8 bits and a depth of 1024.

The link block 45 contains link interfaces, registers, and other components, and provides the TPCORE 50 with four bidirectional serial links. These four links are controlled independently of the CPU 10; in other words, the link block 45 operates independently of the other parts of the TPCORE 50. Once the link block 45 has received data from the CPU 10 or the memory 42, it can therefore send and receive data regardless of what the CPU 10 is doing.

The memory controller 41 receives memory access requests from the CPU 10 and from the link interfaces, and adapts the requests to the specifications of the memory 42 implemented on the FPGA.
The memory controller 41 manages memory access rights and changes in bus data width (byte width, word width, and so on). The address space can also be extended to 4 GB (gigabytes) or more.

Next, memory access rights will be described. The memory controller 41 controls priority of access to the memory 42 with a single control line. For example, when the control signal is at the "High" level, the link block 45 controls the memory 42; when it is at the "Low" level, the CPU 10 is connected to the memory 42.
When access rights are given to the link block 45, the memory 42 is connected directly, via the links 52-a to 52-d, to the memory 42 on another TPCORE 50, in what is known as a DMA (Direct Memory Access) state. Message passing between TPCOREs 50 is therefore DMA-based and very fast.

Next, the way the memory controller 41 manages changes in bus data width will be described.
The address bus is described first. Of the memory space addressable with 32 bits (4 G), the TPCORE 50 can access the address space allocated as internal memory with a faster clock than the rest of the address space. Furthermore, because the TPCORE 50 is designed to be implemented on an FPGA and treats all address spaces alike in order to simplify the circuitry, the full 4 GB of memory (the memory 42 and other memory, not shown) can be accessed in the same number of clock cycles.
In addition, because the TPCORE 50 is implemented on an FPGA and its capacity is limited, only the lower 15 bits of the 32-bit address are used. Of these 15 bits, bits 14 to 2 are used as the address, and bits 1 to 0 are used as a bank select that determines which of the four memories 42-a to 42-d is chosen. FIG. 3 shows an example of this change in address bus data width.
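
The address split described above amounts to simple bit slicing. A minimal sketch, with a hypothetical function name, in which the bank number 0 to 3 is assumed to stand for the memories 42-a to 42-d in order:

```python
def decode_address(address: int) -> tuple[int, int]:
    """Split a 32-bit address into (row, bank) using only its lower 15 bits.

    Bits 14..2 select the location inside a memory block and bits 1..0
    select one of the four blocks. Bits 31..15 are ignored.
    """
    low15 = address & 0x7FFF  # keep bits 14..0
    row = low15 >> 2          # bits 14..2: address within the block
    bank = low15 & 0x3        # bits 1..0: bank select
    return row, bank
```

For example, `decode_address(0x00000007)` yields row 1 in bank 3, and because the upper bits are discarded, `decode_address(0xFFFF8004)` yields row 1 in bank 0.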

Next, changes in data bus width made by the memory controller 41 will be described.
The CPU 10 and the links 52-a to 52-d can access data at either 32-bit or 8-bit width, so the microcode ROM 27 outputs a control signal that selects the data width. For example, once communication is established, the data carried by the links 52-a to 52-d is 8 bits wide, so the data width is restricted to byte width.

Word access to the memory 42 is described next. Each of the memories 42-a to 42-d can input and output data only 8 bits at a time. By combining the four memories 42-a to 42-d, however, data can be output at either word (32-bit) width or byte width.
FIG. 4 shows the conversion of the data width of the data bus.
When the TPCORE 50 accesses the memory 42 and the data bus width is set to word width, data is output from the four memories 42-a to 42-d simultaneously, and the memory controller 41 combines the four outputs into 32-bit data. In this case only the upper 13 bits of the address output on the address bus are used; the lower 2 bits, the memory select field, are not used.

Byte access to the memory 42 is described next.
When the data width control signal specifies byte width, the TPCORE 50 uses the lower 15 bits of the address on the address bus to determine which of the memories 42-a to 42-d holds the data and the address within it. When the TPCORE 50 performs a byte access to the memory 42, byte-wide data is output from only one of the memories 42-a to 42-d.
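
Word and byte access over four 8-bit memories can be modeled as below. This is a sketch under stated assumptions rather than the circuit in the source: the byte order within a word (bank 0 as the least significant byte) is assumed, the default depth of 8192 rows is chosen so that the 13-bit row index fits, and the class name `BankedMemory` is hypothetical.

```python
class BankedMemory:
    """Four byte-wide banks that together serve byte and word accesses."""

    def __init__(self, depth: int = 8192) -> None:
        # One bytearray per bank, standing in for memories 42-a to 42-d.
        self.banks = [bytearray(depth) for _ in range(4)]

    @staticmethod
    def _split(address: int) -> tuple[int, int]:
        low15 = address & 0x7FFF
        return low15 >> 2, low15 & 0x3  # (row, bank)

    def write_byte(self, address: int, value: int) -> None:
        row, bank = self._split(address)
        self.banks[bank][row] = value & 0xFF

    def read_byte(self, address: int) -> int:
        # Byte access: exactly one bank drives the data bus.
        row, bank = self._split(address)
        return self.banks[bank][row]

    def read_word(self, address: int) -> int:
        # Word access: all four banks output the same row at once and the
        # bank-select bits are ignored.
        # Assumption: bank 0 holds the least significant byte.
        row, _ = self._split(address)
        return sum(self.banks[b][row] << (8 * b) for b in range(4))
```

Because the bank-select bits are ignored for word access, reading a word at any of the four byte addresses of the same row returns the same 32-bit value in this model.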

Next, the prerequisites of the TPCORE 50 are described: (a-1) the memory, (a-2) the process management registers, (a-3) the internal registers, and (a-4) the clocks.

(a-1) The memory 42.
A memory 42 that is 32 bits wide (one word being 32 bits) is provided and is byte-addressed. The direction of increasing addresses, for example upward from address 0x00000000, is called the positive direction, and the opposite the negative direction. The amount of memory is not strictly defined here and may be chosen freely. Programs proceed through memory a word at a time, but channel communication transfers data a byte at a time, so both word and byte access must be possible. To save registers when an interrupt occurs, a five-word area (32 bits wide) is reserved at an arbitrary address and fixed as a dedicated register save area. For convenience, the words are named Cregsaveloc, Bregsaveloc, Aregsaveloc, Iptrsaveloc, and Wptrsaveloc after the registers they save (see FIG. 10). In addition, four words for input channels and four words for output channels are reserved and fixed for external link communication; for convenience they are named Link0Input, ..., Link3Input, Link0Output, ..., Link3Output. In the following description, "address (X)" means an access to the address indicated by X.

(a-2) The process management registers.
Five 32-bit registers are provided in the processor (CPU 10) as process management registers: Wptr 14, which holds the workspace pointer of the current process, and Fptr 16 and Bptr 17, which manage waiting processes. Two independent pairs of Fptr 16 and Bptr 17 are provided, one per priority level. To simplify the notation, the reference numerals of Fptr 16 and Bptr 17 are omitted when a priority level is indicated, and only the priority is attached as a suffix: Fptr0 and Bptr0 are for high-priority processes, and Fptr1 and Bptr1 are for low-priority processes. Where only Fptr 16 and Bptr 17 are written, this means the register pair belonging to a single priority level, whichever level that is.

(a-3) The internal registers.
As described above, one 32-bit register called the instruction pointer (Iptr 15) is provided; it holds the memory address at which the next instruction to be executed is stored. The 32-bit general-purpose stack registers Areg 11, Breg 12, and Creg 13 are also provided. Some further internal registers may optionally be needed, such as cnt 21 (counter register), which holds the number of iterations of a repeated operation, and Temp 29 (temporary register), which holds data temporarily. In what follows, cnt 21 is assumed to be present (it is used in channel communication).

(a-4) The clocks of the TPCORE 50.
The TPCORE 50 has two clocks: the clock used by high-priority processes has a period of 1 μs (microsecond), and the clock used by low-priority processes has a period of 64 μs.

As described above, the TPCORE 50 with its CPU 10 optimizes hardware such as the process management registers, the general-purpose stack registers, and the memory (RAM) 42, and is designed to run the Occam language described below, which allows it to improve on the performance of the conventional transputer.

Next, an outline of the Occam language, which controls the TPCORE 50, will be described.
First, the processes handled by a construction will be described. Here, a construction is a collection of the most basic primitive processes, such as assignment, output, and procedure call (corresponding to a subroutine).
A primitive process is the smallest unit: an assignment statement, an input statement, or an output statement.
An assignment process assigns a value to a variable; assigning the value of y to the variable x is written x := y. An input process inputs a value into a variable; receiving a value from ch1 (channel 1) into the variable x is written ch1 ? x. An output process outputs the value of a variable; outputting the variable y to ch1 is written ch1 ! y.
A declared channel exists between two processes operating in parallel. Once a channel is declared, its communication partner does not change during program execution, although the sending side and the receiving side can be interchanged. Input and output between TPCOREs 50 are performed in synchronization with each other.
For example, when an output process is executed by the upper (earlier) sequential process but the lower (later) sequential process has not yet executed, the upper process keeps waiting until it does, and communication starts once both are ready to communicate. Communication is synchronized in this way.
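The synchronization just described can be sketched as a small single-threaded model in which whichever side reaches the channel first waits for the other. This is only an illustration: the `Channel` class, its method names, and the log messages are hypothetical and are not part of the TPCORE 50 or the Occam language.

```python
EMPTY = None  # marker meaning "no process is waiting on this channel"


class Channel:
    """Toy model of a synchronized channel between two processes."""

    def __init__(self):
        self.waiting = EMPTY  # (kind, process name, value) of the early arrival
        self.log = []

    def output(self, proc, value):
        """Model `ch ! value`; returns None if the sender has to wait."""
        return self._arrive("out", proc, value)

    def input(self, proc):
        """Model `ch ? x`; returns None if the receiver has to wait."""
        return self._arrive("in", proc, None)

    def _arrive(self, kind, proc, value):
        if self.waiting is EMPTY:
            # The partner is not ready yet, so this process waits.
            self.waiting = (kind, proc, value)
            self.log.append(f"{proc} waits")
            return None
        other_kind, _other_proc, other_value = self.waiting
        if other_kind == kind:
            raise RuntimeError("both ends attempted the same direction")
        # Both sides are ready: the transfer takes place now.
        self.waiting = EMPTY
        data = value if kind == "out" else other_value
        self.log.append(f"{data!r} transferred")
        return data
```

For instance, `ch.output("P1", 42)` on a fresh channel returns `None` because P1 must wait, and a following `ch.input("P2")` completes the transfer and returns 42.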

The Occam language has six kinds of constructions: SEQ, PAR, ALT, WHILE, IF, and CASE.
The SEQ (sequential) construction executes its processes in order from the top of the program. The PAR (parallel) construction executes its processes in parallel regardless of the order in which they are written. The ALT (alternative) construction selects and executes the process whose condition, as required by the ALT construction, is satisfied first. The WHILE construction executes a process repeatedly, controlled by the Boolean variable attached to it. The IF construction executes the first process whose guard becomes "true". The CASE construction selects one process from a group of processes.

The SEQ, PAR, and ALT constructions will now be described concretely with reference to programs.
A program example using the SEQ construction is shown below.
SEQ
  Process 1
  Process 2
  ・
  ・
  ・
  Process n

In the above program, the SEQ construction executes the listed processes one after another, starting with Process 1 and continuing with Process 2, Process 3, and so on. When the last process, Process n, ends, the SEQ construction itself ends.

A program example using the PAR construction is shown below.
PAR
  Process 1
  Process 2
  ・
  ・
  ・
  Process n

The PAR construction executes the listed processes simultaneously, and the PAR construction itself ends when all of the processes have ended. Processes operating in parallel interact only by communicating over channels. When one TPCORE 50 is assigned to each process in a PAR construction, the processes in that PAR construction start executing at the same time. A PAR construction executed on a single TPCORE 50, however, is managed by time division and executed in a pseudo-parallel manner. This pseudo-parallel operation will be described later.
Priorities can also be given to processes executed in parallel, by adding the reserved word PRI before the PAR construction.

A program example using the ALT construction is shown below.
ALT
  input 1
    Process 1
  input 2
    Process 2
  ・
  ・
  ・
  input n
    Process n

The ALT construction selects one process to execute from among many. In the above program, input 1, input 2, ..., input n denote input guards, and the process under the input guard that becomes ready first is executed. After the selected process has executed, the ALT construction ends.
Like the PAR construction, the ALT construction can be given priorities, by adding the reserved word PRI before ALT. In the Occam language, priority is assigned in order from the top, but the TPCORE 50 and the transputer have only two priority levels. Therefore, the process written at the top of the program has high priority, and the processes below it have low priority.
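The mapping from Occam's top-to-bottom ordering onto the two hardware priority levels can be illustrated as follows. This is a minimal sketch; the function name `assign_priorities` and the string labels are hypothetical.

```python
def assign_priorities(processes):
    """Map the ordered branches of a PRI construction onto two levels.

    The first branch listed gets "high"; every later branch gets "low",
    because only two priority levels are available.
    """
    return {name: ("high" if index == 0 else "low")
            for index, name in enumerate(processes)}
```

With three branches P1, P2, and P3, only P1 would run at high priority; P2 and P3 share the low-priority level.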

Next, the TPCORE 50 network on an FPGA will be described.
As shown in FIG. 5, the TPCORE 50, like the conventional transputer, has four external interfaces 52-a to 52-d (also called links). The box ₅₁ represents the TPCORE 50 itself, and the line at the center of each side of the box ₅₁, perpendicular to that side, represents one of the interfaces 52-a to 52-d.
As shown in FIG. 5, a TPCORE 50 network can be constructed by connecting these interfaces 52-a to 52-d to the interfaces 52-a to 52-d of other TPCOREs 50.

FIGS. 6(a) to 6(c) show configurations of the TPCORE network 100. The TPCORE network 100 connects a plurality of TPCOREs 50 through their interfaces 52-a to 52-d in a tree connection (FIG. 6(a)), a pipeline connection (FIG. 6(b)), or a lattice connection (FIG. 6(c)) to build a network of parallel processing processors. For example, by using a lattice connection for a graphics accelerator and a tree connection for a parallel database search system, a network can be configured easily and quickly to match the application.
In addition, because the TPCORE 50 described above is formed in an FPGA, the network topology can be freely reorganized to suit the system to which parallel processing is applied. Implementing the TPCORE 50 on an FPGA is therefore highly advantageous.

Next, the communication operation of the parallel processing processor using the external links (Link) will be described. To keep the explanation simple, an example of communication using two TPCOREs 50-1 and 50-2 is shown. In FIG. 7, a link interface (Link Interface) is also referred to simply as a link (Link1, Link2).
As shown in FIG. 7, communication between the TPCOREs 50-1 and 50-2 is established by an Acknowledge (ACK) packet. Until this ACK packet arrives, Link1 (link interface) ₄₅ₐ on the sending (Out) side waits in an idling state.
For example, when an out instruction is executed while Breg ₁₂ of the TPCORE 50-1 holds 0x80000008, the TPCORE 50-1 attempts to output data to the outside through Link2 (link interface) ₄₅, which forms part of the link block ₄₅. The CPU 10 passes the contents of the stack registers (Areg ₁₁, Breg ₁₂, Creg ₁₃) and the process ID to the registers of the link interface and entrusts the communication processing to Link2. It then ends the process that was executing and starts executing the next process in the scheduling list.

After sending 1 byte of data, the TPCORE 50-1 idles until an ACK packet arrives from the partner TPCORE 50-2, and communication is established when the ACK packet is received (see FIG. 8). After communication is established, data continues to be sent 1 byte at a time, with each side confirming transmission and reception through ACK packets, until the number of bytes stored in cnt ₂₁ has been transferred.
Because the CPU 10 and Link1 ₄₅ₐ cannot access the memory ₄₂ₐ at the same time, Link1 ₄₅ₐ has exclusive use of the memory ₄₂ₐ while this communication is in progress. When the communication ends, the ID that was passed to Link1 ₄₅ₐ before the communication is appended to the tail of the scheduling list formed in the memory ₄₂ₐ. This append is controlled by microcode. At the end of the communication, the CPU 10 also returns from the idling state, and ownership of the memories ₄₂ and ₄₂ₐ is handed back to the CPU 10. FIG. 9 shows an example in which the TPCOREs 50-1 and 50-2 each stop the process they are executing, hand the right to use the memory ₄₂ to Link1 ₄₅ₐ and Link2 ₄₅, and exchange ACK packets and data between the TPCORE 50-1 and the TPCORE 50-2.
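The byte-by-byte handshake can be pictured with the following single-threaded sketch, in which every byte sent is followed by an acknowledgement before the next byte goes out. It is an illustration only: the function `link_transfer` and the trace format are hypothetical, and the actual packet format of the link is not modeled.

```python
def link_transfer(data: bytes, cnt: int):
    """Simulate sending `cnt` bytes over a link, one byte per ACK.

    Returns the bytes seen by the receiver and a trace of link events.
    """
    received = bytearray()
    trace = []
    for i in range(cnt):
        byte = data[i]
        trace.append(("send", byte))  # sender puts one byte on the link
        received.append(byte)         # receiver stores the byte
        trace.append(("ack", i))      # receiver acknowledges byte number i
        # The sender idles until the ACK arrives. In this sequential
        # sketch the ACK is already in the trace before the next send.
    return bytes(received), trace
```

Here `cnt` plays the role of the byte count held in cnt ₂₁: calling `link_transfer(b"\x01\x02\x03\x04", 3)` transfers only the first three bytes and records three send/ack pairs.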

Next, the parallel processing of a plurality of processes inside a single TPCORE 50 processor will be described. As preconditions, the TPCORE 50 has the (a-1) memory, (a-2) process management registers, and (a-3) internal registers described above.

First, process management in the TPCORE 50 will be described.
In the following, a register name written with its subscript suffix denotes the register itself, and the same name in parentheses, (XX), denotes its data (an address value or the like). For example, Areg ₁₁ denotes the register, and (Areg ₁₁) denotes the data stored (or to be stored) in the A register.
Processes are managed by Wptr ₁₄ and the scheduling list. Wptr ₁₄ stores the ID (identification number) of the current process, and the scheduling list holds the waiting processes.
When a new process is created or an interrupt occurs while a process is executing, the process ID concerned is appended to the tail of the scheduling list as a waiting process. The list is treated as a queue in which the entry stored first is executed first. The scheduling list manages the order of the processes to be executed with a linked-list structure.

(b-1) Process ID
A process is taken as one unit of processing, and each process is given a 32-bit process ID. The lower 2 bits of the process ID indicate the priority of the process: the priority is "High" when the least significant bit is 0 and "Low" when it is 1. The upper 30 bits indicate the address of the process's workspace on the memory ₄₂. The process workspaces are managed by the Occam compiler.
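The bit layout of the process ID can be sketched as follows. The masks follow the description above (priority in the least significant bit, workspace address in the upper 30 bits); the function names and the alignment check are hypothetical additions.

```python
PRIORITY_MASK = 0x00000001  # least significant bit: 0 = High, 1 = Low
ADDRESS_MASK = 0xFFFFFFFC   # upper 30 bits: workspace address


def make_process_id(workspace_addr: int, low_priority: bool) -> int:
    """Combine a word-aligned workspace address with a priority bit."""
    if not 0 <= workspace_addr <= 0xFFFFFFFF:
        raise ValueError("workspace address must fit in 32 bits")
    if workspace_addr & 0x3:
        raise ValueError("workspace address must be word aligned")
    return workspace_addr | (1 if low_priority else 0)


def priority_of(process_id: int) -> str:
    """Return "High" or "Low" according to the least significant bit."""
    return "Low" if process_id & PRIORITY_MASK else "High"


def workspace_of(process_id: int) -> int:
    """Return the workspace address held in the upper 30 bits."""
    return process_id & ADDRESS_MASK
```

Because workspaces are word aligned, the two lowest address bits are always zero, which is what leaves room for the priority field in the same 32-bit word.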

(b-2) Workspace
For each process, a memory area called a workspace is provided on the memory ₄₂, which is composed of RAM. Several words, in units of 32-bit words, are always reserved in this area in the negative memory direction from the value indicated by the process ID. The workspace is used for holding data (the various registers) when a process being executed in parallel is temporarily suspended by an interrupt from another process, for channel communication, for executing ALT construction instructions, and for process scheduling. The value of Iptr ₁₅ at the start (or restart) of the process is placed at address (workspace start − 4), that is, at process ID − 4, the first word below the workspace.

FIG. 10 shows an example of the workspace of the process ID (Wptr ₁₄). For example, to access the address (Wptr ₁₄ − 1) immediately before the address (Wptr ₁₄) indicated by the process ID, the value of Wptr ₁₄ is sent to the ALU ₃₁, decremented by 1, and the result is used as the access address. Similarly, to access the address (Wptr ₁₄ − 2), the value obtained by subtracting 2 from the address (Wptr ₁₄) is used as the access address.
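The address arithmetic can be illustrated with a short sketch. It assumes byte addressing with 4-byte words, so that the slot written (Wptr ₁₄ − 1) in word units lies 4 bytes below the workspace pointer; this reading, and the names `WORD_BYTES` and `slot_address`, are assumptions rather than statements from the text.

```python
WORD_BYTES = 4  # assumption: 32-bit words in a byte-addressed memory


def slot_address(wptr: int, slot: int) -> int:
    """Return the byte address of the workspace slot `slot` words below `wptr`."""
    if slot < 0:
        raise ValueError("slots are counted in the negative direction from Wptr")
    return wptr - slot * WORD_BYTES
```

Under this assumption, slot 1 of a workspace at 0x80000100 is at 0x800000FC, which matches the "workspace − 4" location used for the saved Iptr ₁₅.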

(b-3) Workspace management
The ID of the process currently executing is held in Wptr ₁₄. When a new process is created while a process is executing, the new process is appended to the tail of the process scheduling list as a waiting process. This list has a first-in first-out (FIFO) structure. At the process's own address (workspace − 8) derived from its process ID, that is, the second word below the workspace, the workspace address (that is, the process ID) of the process to be executed next after it is stored. If there is no further process to execute, an empty signal (empty flag) is stored instead, with a value such as 0x80000000 or 0x80000001 (for low-priority and high-priority processes, respectively). By holding the ID of the next process at this address (workspace − 8), the process scheduling list is formed as a linked list. The ID of the first waiting process is held in Fptr ₁₆ and the ID of the last in Bptr ₁₇. The address (Bptr ₁₇ − 8) therefore holds empty. A scheduling list of this structure (hereinafter also called a queue) is kept independently for each of the two priorities (high and low). FIG. 10 shows an example of the workspace of the process ID (Wptr ₁₄) on the memory ₄₂, and the relationship between the addresses (Wptr ₁₄, Wptr ₁₄ − 1, ..., Wptr ₁₄ − 5) and the storage locations corresponding to the general-purpose registers and process management registers.
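The linked-list layout can be modeled in a few lines. The sketch below keeps one queue for a single priority, uses a dictionary as a sparse stand-in for the RAM, and assumes process IDs whose priority bits are zero so that they can be used directly as addresses. The class and method names are hypothetical.

```python
EMPTY = 0x80000000  # one of the empty-flag values given in the text
LINK_OFFSET = 8     # the next process ID lives at (workspace - 8)


class SchedulingList:
    """Queue of waiting processes threaded through their workspaces."""

    def __init__(self):
        self.memory = {}   # sparse model of RAM: address -> 32-bit word
        self.fptr = EMPTY  # ID of the first waiting process
        self.bptr = EMPTY  # ID of the last waiting process

    def append(self, process_id: int) -> None:
        """Add a process to the tail of the list."""
        self.memory[process_id - LINK_OFFSET] = EMPTY  # new tail has no successor
        if self.fptr == EMPTY:
            self.fptr = process_id  # list was empty: new process is also the head
        else:
            self.memory[self.bptr - LINK_OFFSET] = process_id  # link old tail
        self.bptr = process_id

    def waiting(self):
        """Walk the links from Fptr and return the IDs in execution order."""
        ids, current = [], self.fptr
        while current != EMPTY:
            ids.append(current)
            current = self.memory[current - LINK_OFFSET]
        return ids
```

Only the two end pointers live in registers; every intermediate link is a word inside some process's own workspace, so the list can grow without any additional register.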

Next, concrete examples of workspaces on the memory ₄₂ are shown in Tables 1 and 2. Table 1 shows the workspace contents during execution of a PAR construction, as the relationship between workspace-relative addresses and stored data. Table 2 shows the workspace contents during execution of an ALT construction. In both constructions, data is stored at addresses a predetermined number of words away in the negative direction. For the PAR construction, these locations hold the address to be executed when the process starts or resumes, the ID of the process to be executed next, and the start address of the memory ₄₂ to be accessed when communication starts. For the ALT construction, they hold the guard selection state, the address to be executed when the process starts or resumes, the ID of the process to be executed next, and the ALT execution state.

Next, parallel process handling within a single TPCORE 50 will be described.
FIG. 11 shows the process state transition diagram in an environment in which several kinds of processes run in parallel. Parallel processing begins at the ground stage (the initial, or basic, stage) before the program is executed: when a startp (start) instruction, which starts a process, is supplied to the TPCORE 50 and the program is executed (ST1), a process is created (ST2). If the queue is empty, the created process is executed (ST3) and is terminated by an endp (end) instruction. If, during execution of the process (ST3), channel communication is initiated, timeout processing occurs, or a stopp instruction is executed, the process enters the idling state (ST4) and waits to observe the response on the channel of the partner process (ST5).
If, on the other hand, the queue is not empty after the process is created (ST2), the process ID is appended to the tail of the queue and the process waits (ST5). While it waits, its process ID advances through the queue. When the waiting process reaches the head, the process is switched in and executed (ST3), is terminated by the endp instruction, and the state returns to the initial ground stage (ST1). The same transitions are then repeated.
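The transitions described for FIG. 11 can be written down as a small table-driven state machine. The state labels ST1 to ST5 come from the text; the event names such as `queue_empty` and `head_of_queue` are hypothetical labels chosen for this sketch.

```python
# (current state, event) -> next state, following the description of FIG. 11.
TRANSITIONS = {
    ("ST1", "startp"): "ST2",           # ground stage -> process created
    ("ST2", "queue_empty"): "ST3",      # nothing waiting -> run at once
    ("ST2", "queue_not_empty"): "ST5",  # appended to the tail and waits
    ("ST3", "endp"): "ST1",             # process ends -> ground stage
    ("ST3", "channel"): "ST4",          # channel communication initiated
    ("ST3", "timeout"): "ST4",          # timeout processing
    ("ST3", "stopp"): "ST4",            # stopp instruction executed
    ("ST4", "wait"): "ST5",             # idling -> waiting
    ("ST5", "head_of_queue"): "ST3",    # reached the head -> switched in
}


def run(events, state="ST1"):
    """Apply `events` in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

An event that is not allowed in the current state raises `KeyError`, which makes it easy to check a proposed event sequence against the diagram.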

A concrete example of the parallel process handling described above follows.
When a program written in the Occam language contains a construction that creates new processes, such as a PAR or ALT construction, the compiler generates a group of assembler code that performs the following:
(a1) Build the process ID of the process to be created.
(a2) Create the process using the assembler instruction startp.
(a3) The created process is added to the scheduling list corresponding to its priority.
Steps (a1) to (a3) are repeated as many times as there are processes to create (the number of parallel processes).

Next, process execution and switching will be described with reference to FIGS. 1 and 2.
The TPCORE 50 executes the process indicated by the value of Wptr ₁₄. A process switch occurs when the executing process can no longer run or when the processor (TPCORE 50) enters the idling state. In other words, instead of waiting in the idling state until the process becomes executable again, the processor executes another process, which reduces idle time.
Causes that make a process non-executable or bring about the idling state include the following:
(b1) An assembler instruction that ends or stops the process has been executed.
(b2) An input/output instruction has been executed, but the communication partner is not ready.
(b3) An instruction that performs delay or timeout processing has been executed, but the target time has not yet elapsed.
In these cases, the hardware mechanism of the TPCORE 50 proceeds as follows:
(b4) Check whether there is a waiting process. If there is none, enter the idling state; if there is one, proceed to the next step to switch processes.
(b5) Store the value of Fptr ₁₆ in Wptr ₁₄. At the same time, store the value held at address (Wptr ₁₄ − 1) in Iptr ₁₅.
(b6) Next, fetch the ID of the next waiting process from address (Wptr ₁₄ − 2) and store it in Fptr ₁₆.
(b7) Start the process whose ID was stored in Wptr ₁₄.
The hardware state transitions that carry out such a process switch are described in microcode and controlled by it. Interrupts are an exceptional cause of process switching; they are described later.
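Steps (b4) to (b7) can be sketched in software as follows. The register file is a plain dictionary, the memory is a sparse dictionary, and the word offsets (Wptr ₁₄ − 1) and (Wptr ₁₄ − 2) are taken to be 4 and 8 bytes below the workspace pointer. Clearing Bptr when the list drains is an assumption of this sketch, as are all function and key names; the real switch is carried out by microcode.

```python
EMPTY = 0x80000000  # empty-flag value taken from the text
WORD = 4            # assumption: byte-addressed memory with 32-bit words


def switch_process(regs, memory):
    """Switch to the first waiting process; return False if the CPU idles."""
    if regs["Fptr"] == EMPTY:  # (b4) no waiting process
        regs["Wptr"] = EMPTY
        return False
    regs["Wptr"] = regs["Fptr"]                    # (b5) new current process
    regs["Iptr"] = memory[regs["Wptr"] - 1 * WORD]  # (b5) its resume address
    next_id = memory[regs["Wptr"] - 2 * WORD]       # (b6) successor in the list
    regs["Fptr"] = next_id
    if next_id == EMPTY:
        # Assumption: Bptr is cleared too, so Fptr = Bptr = empty
        # marks an empty list.
        regs["Bptr"] = EMPTY
    return True                                     # (b7) process is started
```

Each call consumes one entry from the head of the list, so repeated calls run the waiting processes in the order in which they were queued.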

Next, the scheduling list mentioned above will be described.
The scheduling list is formed on the memory ₄₂ and manages the order of the processes to be executed with a linked-list structure. Each process has its own workspace, and these workspaces are used to form the scheduling list. FIG. 12 shows an example of the scheduling list when four processes exist. To avoid clutter in FIG. 12, Wptr ₁₄ is abbreviated to Wptr and the process number is appended to it as a suffix.
In the scheduling list formed on the memory ₄₂ (Memory), when, for example, the ID of a process (process 0) is the address (Wptr ₀), the ID of the process to be executed next after it is stored at address (Wptr ₀ − 2). The address from which a process fetches instructions when it begins to execute (the value of Iptr ₁₅) is stored at (Wptr ₁ − 1). The ID of the fourth and last process (address (Bptr ₁₇)) corresponds to the address (Wptr ₃). As shown in FIG. 12, the address (Wptr ₂ − 2) belonging to the third process, for example, holds Wptr ₃, the ID of the next (fourth) process. The processes thus wait in execution order in a linked-list structure in which each process points to the next.
To access such a scheduling list, the TPCORE 50 provides two kinds of registers, Fptr ₁₆ and Bptr ₁₇.
Fptr ₁₆ stores the ID of the first waiting process and Bptr ₁₇ stores the ID of the last. When, for example, the executing process ends and the next waiting process is to be started, Fptr ₁₆ is accessed and the next process starts. When a process is created, Bptr ₁₇ is used to access the tail of the scheduling list and the process is appended. The operation of the scheduling list and the handling of Fptr ₁₆ and Bptr ₁₇ are controlled by microcode.

The scheduling list consists of three elements, the address (Fptr ₁₆), the address (Bptr ₁₇), and memory (the intermediate processes), and the waiting processes are linked in a linked-list structure.
Because the waiting processes are held in the memory ₄₂, fewer registers are needed and hardware resources are saved. If waiting processes were held in registers, then whenever more waiting processes were created than there are registers reserved for the scheduling list, process creation would have to be blocked until a register became free, which complicates the hardware. With the linked-list structure described above, by contrast, an almost unlimited number of waiting processes can be created and the hardware structure can be kept simple.

Next, execution of the PAR construction on the TPCORE 50 will be described concretely.
(c-1) Process execution
The processor 50 executes the process on the basis of Iptr ₁₅ and, after each machine instruction, updates (where necessary) the values in the workspace indicated by Wptr ₁₄.
(c-2) Process start
The process ID of the process to be started is placed in Areg ₁₁, and the offset between the address of the instruction to be executed at the start of the process and the address (Iptr ₁₅) is placed in Breg ₁₂ (both are arranged by the Occam compiler). If there is no waiting process (no process ID is registered in the scheduling list; address (Fptr ₁₆) = (Bptr ₁₇) = empty), the data in Areg ₁₁ is stored in Bptr ₁₇, and Iptr ₁₅ + 4 + Breg ₁₂ is stored at address (workspace − 4). If there is a waiting process, that is, if address (Fptr ₁₆) ≠ address (Bptr ₁₇), the data in Areg ₁₁ is stored at address (Bptr ₁₇ − 8).
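A possible reading of the process start step is sketched below. The text specifies only part of the bookkeeping, so several details are assumptions that are marked in the comments: that "workspace − 4" refers to the workspace of the new process, that the start address is stored in both cases, that the new tail is terminated with empty, and that Fptr and Bptr are both updated. The function name `startp` mirrors the instruction name, but the code is not the microcode of the TPCORE 50.

```python
EMPTY = 0x80000000  # empty-flag value taken from the text


def startp(regs, memory):
    """Queue the process whose ID is in Areg, starting at Iptr + 4 + Breg."""
    new_id = regs["Areg"]
    # Start address of the new process, stored one word below its
    # workspace (assumption: the new process's own workspace is meant).
    memory[new_id - 4] = regs["Iptr"] + 4 + regs["Breg"]
    memory[new_id - 8] = EMPTY             # assumption: new tail has no successor
    if regs["Fptr"] == EMPTY:              # no waiting process
        regs["Fptr"] = new_id              # assumption: the head is set as well
    else:
        memory[regs["Bptr"] - 8] = new_id  # link the old tail to the new process
    regs["Bptr"] = new_id                  # the new process becomes the tail
```

The stored start address is relative to the instruction pointer of the creating process, which is why the compiler places an offset rather than an absolute address in Breg ₁₂.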

(c-3) Suspension and termination of the current process
The current process is suspended when it executes a process end or stop instruction, when it executes an input/output instruction, or when it begins an instruction that waits on channel communication or performs delay or timeout processing (see ST11a in FIG. 13, ST21 in FIG. 14, ST11b in FIG. 15, and ST31 in FIG. 16).

(c-4) Process switching
The process switching operation in the PAR construction will be described (see FIGS. 13 to 16 and Tables 3 to 6). In Tables 3 to 6 and Table 8, the suffix of each pointer register is omitted to avoid clutter.
Whether a waiting process exists is checked; if there is none, the processor (TPCORE 50) itself enters the idling state (see FIG. 16 and Table 3).

When there are one or more waiting processes (Fptr ₁₆ ≠ Bptr ₁₇), the process is switched by the following procedure: the value of Fptr ₁₆ is stored in Wptr ₁₄, the value at address (Fptr ₁₆ − 4) is stored in Iptr ₁₅, and the process ID of the next process, held at address (Fptr ₁₆ − 8), is stored in Fptr ₁₆ (see Ta1 and Ta2 in FIG. 13 and Table 4, and TC1 and TC2 in FIG. 15 and Table 5).

When there is only one waiting process (Fptr ₁₆ = Bptr ₁₇), empty is placed in Fptr ₁₆ after the switch, in place of the process ID of a next process. If Fptr ₁₆ = Bptr ₁₇ = empty when a switch is attempted, Wptr ₁₄ is set to empty and the processor (TPCORE 50) enters the idling state. When the processor returns from a high-priority process and switches to a low-priority process, execution is resumed from the address (Wptr ₁₄) saved in Wptrsaveloc in the workspace interrupt save area, even if address (Fptr ₁₆) = (Bptr ₁₇) = empty and there is no waiting process (see Ta3 in FIG. 13 and Table 4, and TC3 in FIG. 15 and Table 5). The same operation as the "return from interrupt" described later is then performed.
When the high-priority queue is empty and the suspended process has high priority, and a waiting process exists, Wsaveloc is stored in Wptr ₁₄, the data at address (Wptr ₁₄ − 4) is stored in Iptr ₁₅, and the waiting process is fetched and executed (see FIG. 14 and Table 6).

(c-5) Switching to a high-priority process while a low-priority process is running
If a high-priority process (Fptr₍₁₆₎₀ = Bptr₍₁₇₎₀) is created while a low-priority process is running, or if the processor returns from the idling state and Fptr₍₁₆₎₀ = Bptr₍₁₇₎₀ ≠ empty, the high-priority process, which is the only process in its queue, interrupts execution. In this case Wptr₁₄ and Iptr₁₅ of the currently running process and the stack registers Areg₁₁, Breg₁₂, and Creg₁₃ are stored in the predetermined save area of its workspace (Wptrsaveloc, Iptrsaveloc, Aregsaveloc, Bregsaveloc, and Cregsaveloc, contiguous from Wptrsaveloc through Wptrsaveloc + 16). Then Fptr₀ (the Fptr₁₆ of process 0) is stored in Wptr₁₄, and the contents of address (Fptr₀ − 4) are stored in Iptr₁₅.

(c-6) Return from interrupt
A return from interrupt is performed when the scheduling list of high-priority processes becomes empty and Wptrsaveloc holds the workspace address from before the interrupt. Wptr₁₄, Iptr₁₅, Areg₁₁, Breg₁₂, and Creg₁₃ are each restored to their registers from the save locations in memory₄₂. Then empty is stored in Wptrsaveloc.
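As a rough illustration of (c-5) and (c-6), the save and restore of the interrupted context might look as follows. The five save slots are laid out at Wptrsaveloc, Wptrsaveloc + 4, ..., Wptrsaveloc + 16 in the order the text lists them; the function names, the dictionary model of memory and registers, and the value chosen for empty are assumptions made for the sketch.

```python
EMPTY = 0x80000000  # assumed encoding of "empty"
SAVED = ("Wptr", "Iptr", "Areg", "Breg", "Creg")  # order of the five save slots

def enter_high_priority(mem, regs, saveloc, fptr0):
    """Save the low-priority context, then start the high-priority process."""
    for i, name in enumerate(SAVED):
        mem[saveloc + 4 * i] = regs[name]   # Wptrsaveloc .. Wptrsaveloc + 16
    regs["Wptr"] = fptr0                    # head of the high-priority list
    regs["Iptr"] = mem[fptr0 - 4]

def return_from_interrupt(mem, regs, saveloc):
    """Restore the saved context if one exists; report whether it did."""
    if mem.get(saveloc, EMPTY) == EMPTY:
        return False                        # nothing was interrupted
    for i, name in enumerate(SAVED):
        regs[name] = mem[saveloc + 4 * i]
    mem[saveloc] = EMPTY                    # mark the save area as free
    return True
```

Writing empty back to Wptrsaveloc is what lets a later switch tell an interrupted low-priority process apart from an idle core.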

Next, channel communication in the TPCORE 50 is described.
When a communication assembly instruction such as “in” or “out” is executed, the TPCORE 50 first checks whether the communication is with a process inside the same TPCORE 50 or with another TPCORE 50. If the communication uses an external link (Link), the current values of Areg₁₁, Breg₁₂, Creg₁₃, and Wptr₁₄ are passed to the link interface, which takes over the communication processing entirely. If the communication is internal, the channel in memory₄₂ is accessed, the ID stored there is read, and execution continues accordingly. Having confirmed that the communication is internal, the TPCORE 50 does not start the input/output at once; it first checks whether the channel has already been put in the Enable state by an ALT-related assembler instruction. If the channel is not one used by an ALT construction, input/output processing begins.

The execution of internal communication in the TPCORE 50 is described in detail below.
(d-1) Channel
One word is reserved in an arbitrary area of memory for communication between processes (the Occam compiler provides it). The address of this word is the channel address. The word at the channel address holds either a process ID or, as its initial value, for example empty (0x8000000X, where X = 0 or 1 indicates the priority).
(d-2) Start of channel communication
When channel communication is requested, the Occam compiler supplies the information the communication needs: the number of data items to transfer in Areg₁₁, the channel address in Breg₁₂, and, in Creg₁₃, the address of the memory area that will receive (or already holds) the data. When the communication starts, the TPCORE 50 first checks whether the channel is in use by an ALT construction; if it is not, channel input/output begins between the two processes running in parallel.
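The channel word and the register convention of (d-1) and (d-2) can be modelled with a few helpers. This is a minimal sketch: the helper names and the dictionary used for the registers are hypothetical, and the only facts taken from the text are the 0x8000000X form of empty and the roles of Areg, Breg, and Creg.

```python
EMPTY_BASE = 0x80000000  # "empty" with the priority digit X cleared

def empty_word(priority):
    """Return the empty marker 0x8000000X for priority X (0 or 1)."""
    if priority not in (0, 1):
        raise ValueError("priority must be 0 or 1")
    return EMPTY_BASE | priority

def is_empty(word):
    """True if a channel word holds empty for either priority."""
    return (word & ~1 & 0xFFFFFFFF) == EMPTY_BASE

def setup_channel_io(regs, count, channel_address, buffer_address):
    """Load the stack registers before a channel instruction runs."""
    regs["Areg"] = count            # number of data items to transfer
    regs["Breg"] = channel_address  # address of the one-word channel
    regs["Creg"] = buffer_address   # where the data is read from or written to
```

Because workspace addresses are word-aligned, a value with the top bit set and only the lowest bit varying cannot be mistaken for a process ID in this model.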

(d-3) Initiation of communication
In the pseudo-parallel operation described above, processes are said to execute in parallel, but a single processor (TPCORE 50) can execute only one instruction at a time, so at any given instant only one process is actually running. Channel communication therefore ultimately takes the form of a data exchange between a preceding process and a subsequent process.
In the preceding process, the prescribed data is stored in the stack registers Areg₁₁, Breg₁₂, and Creg₁₃, and execution reaches the instruction for channel communication. If the memory word indicated by the channel address is empty, the address of the instruction to be started immediately after the channel communication is first stored one word below the workspace (indicated by address (Wptr₁₄)), at address (Wptr₁₄ − 4), and the value of Creg₁₃ (that is, the data storage address) is stored a further two words below, at address (Wptr₁₄ − 12). The current process ID is then stored at the channel address. The process is removed from the scheduling list and the next waiting process is run: the data in Fptr₁₆ is placed in Wptr₁₄, and address (Fptr₁₆ − 4) is taken as Iptr₁₅. The process thereby enters an idling state, waiting for input/output. If no waiting (subsequent) process exists, the processor (TPCORE 50) enters the idling state.
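The steps taken by the preceding process can be sketched as below. The sketch assumes a dictionary model of memory and registers and a hypothetical function name; advancing Fptr to the following list entry is part of the ordinary process switch and is left out here to keep the example short.

```python
EMPTY = 0x80000000  # assumed encoding of "empty"

def initiate_channel(mem, regs, next_iptr):
    """First process to reach the channel: park itself in the channel word.

    Returns False if a partner is already waiting on the channel, in which
    case the communication is completed by a different path.
    """
    chan = regs["Breg"]
    if mem[chan] != EMPTY:
        return False
    wptr = regs["Wptr"]
    mem[wptr - 4] = next_iptr        # where to resume after the communication
    mem[wptr - 12] = regs["Creg"]    # address of this process's data
    mem[chan] = wptr                 # the channel word now names this process
    head = regs["Fptr"]
    if head == EMPTY:
        regs["Wptr"] = EMPTY         # no one else to run: the core idles
    else:
        regs["Wptr"] = head          # hand the core to the next waiting process
        regs["Iptr"] = mem[head - 4]
    return True
```

Everything the partner will need later (resume address and data address) is left in the parked process's own workspace, so the channel itself stays one word wide.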

(d-4) Establishment of communication
When the processes are switched, the subsequent process starts executing, and it reaches its own instruction for starting the communication, it runs the algorithm described under “Initiation of communication”. This time, however, the channel address (indicated by address (Breg₁₂)) already holds non-empty information written by the operation above (namely the process ID of the preceding process), so communication with that process is established. At this point the processor checks whether the queue of waiting processes is empty.
If the queue is not empty (Fptr₁₆ ≠ empty), processes other than the communication partner are running in parallel, so the partner's process ID is placed at address (Breg₁₂ − 8) and the process is attached to the tail of the queue.
If the queue is empty (Fptr₁₆ = empty), the partner's process ID is stored in Bptr₁₇. The partner's process ID gives the start address of the partner's workspace, and the location at offset −12 from it holds the address at which the partner's data for the channel communication is to be stored (or is already stored). Put simply, this is the data stored at address (Breg₁₂ − 12). This operation yields both the data address of the preceding process and the data address of the current (subsequent) process (held in Creg₁₃, as described above). The preceding process, whose process ID is written at the channel address, has no record at this point of whether the communication is an input or an output for it, but the subsequent process has this information (it can tell channel input from channel output by examining the current instruction), so the channel input/output proceeds without difficulty; that is, source and destination are determined unambiguously. The amount of data given by the value of Areg₁₁ (the number of bytes to communicate) is now moved from the sending side to the receiving side. Because Areg₁₁ is often used for other purposes, its value is copied at the start of communication into cnt₂₁, an optional (counter) register, so that the number of bytes to transfer is retained.

(d-5) End of communication
The preceding process, which has been idling as described under “Initiation of communication”, is appended to the tail of the scheduling list: Bptr₁₇ = address (Breg₁₂), and address (Bptr₁₇ − 4) holds the address of the first instruction to be executed after the preceding process resumes. The channel address is then set to empty.
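The completion of a communication, as described in (d-4) and (d-5), might be modelled as follows. Several details are assumptions made so that the sketch is self-consistent: the partner is linked through the word at (Bptr − 8) of the current tail, whereas the text writes address (Breg − 8); Fptr is set together with Bptr when the list was empty; and data bytes live in the same dictionary as the 32-bit words. The function name and the `is_input` flag are hypothetical.

```python
EMPTY = 0x80000000  # assumed encoding of "empty"

def complete_channel(mem, regs, is_input):
    """Second process to reach the channel: move the data, wake the partner."""
    chan = regs["Breg"]
    partner = mem[chan]                  # process ID left by the first process
    if partner == EMPTY:
        raise ValueError("no partner has initiated communication")
    count = regs["Areg"]                 # number of bytes to move
    partner_buf = mem[partner - 12]      # data address saved by the partner
    own_buf = regs["Creg"]
    # Only the second process knows the direction of the transfer.
    src, dst = (partner_buf, own_buf) if is_input else (own_buf, partner_buf)
    for i in range(count):
        mem[dst + i] = mem[src + i]
    # Put the partner back at the tail of the scheduling list.
    if regs["Fptr"] == EMPTY:
        regs["Fptr"] = partner           # assumption: list had no entries
    else:
        mem[regs["Bptr"] - 8] = partner  # assumption: link word of the old tail
    regs["Bptr"] = partner
    mem[chan] = EMPTY                    # the channel is free again
```

The partner's resume address is already at (partner − 4) from the initiation step, so re-queuing it requires no further bookkeeping.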

FIG. 17 and Table 7 show the state transitions for communication between two processes.
Let the two processes be process A (the preceding process) and process B (the subsequent process). Process A runs first (ST51), while process B waits (ST54). When process A executes a channel communication instruction, the number of data items to move, the channel address, and the data storage address are placed in the stack registers (Areg₁₁, Breg₁₂, Creg₁₃) and channel communication starts (ST52). The ID of process A is stored at the channel address, process A is removed from the queue, the process switch is carried out, and process A is put in the idling state (ST53). Once control has switched from process A to process B, process B starts executing and, on reaching a channel communication instruction, performs the channel communication (ST56). The data at the channel address is stored as the partner's process ID, and process A's communication information, such as the data source address and the number of data items, is accessed. The data is then moved between process B and process A (ST57), and process B finishes its operation (ST58). Process A returns to the waiting list and waits in the queue (ST59). When process A's ID reaches the head of the queue, process A resumes (ST60), carries out the interrupted processing, and terminates with a process end instruction (ST61).

Next, the ALT construction is described.
An ALT construction consists of two or more structures of the same form. Each structural unit is a part called a guard, made up of a logical expression and a channel input or of a channel input alone, together with the process that follows it (guard + process).
Suppose that several of the other processes running in parallel try to start communicating on one or another of the channels that make up the guards of the ALT process. The ALT instruction is a mechanism that selectively executes the process following the first guard to receive a channel input (with its logical expression satisfied); this event is referred to as the guard being released. The mechanism is one of the important programming methods of CSP theory.

Next, the execution of an ALT construction on the TPCORE 50 is described with reference to FIG. 18 and Table 8.
(e-1) Internal states of the ALT construction
An ALT construction is realized by moving among three states, “Enable”, “Wait”, and “Ready”, and a Reset state. The states are distinguished by 32-bit values, and while the ALT construction executes, the value is held in a specific memory area (inside the workspace of the ALT process described below). The values are, for example, 0x80000001, 0x80000002, and 0x80000003, respectively.

(e-2) ALT construction process
FIG. 18 and Table 8 show the state transitions of the ALT construction. Starting an ALT construction means starting one independent process. The process has its own workspace in memory₄₂, and the value of the workspace's start address is set as its process ID. This value is stored in Bptr₁₇. Then 0x80000001, representing the state “Enable”, is placed at the third word in the negative direction from that address, at address (Wptr₁₄ − 12) (see ST81 and F1).
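The state encoding of (e-1) and its location from (e-2) can be written down as a small sketch. The three values and the offset of −12 come from the text; the enum, the helper names, and the dictionary model of memory are assumptions for illustration.

```python
from enum import IntEnum

class AltState(IntEnum):
    """32-bit state values of an ALT construction, as given in the text."""
    ENABLE = 0x80000001
    WAIT = 0x80000002
    READY = 0x80000003

STATE_OFFSET = -12  # third word below the workspace address of the ALT process

def set_alt_state(mem, alt_wptr, state):
    """Record the state in the ALT process's workspace."""
    mem[alt_wptr + STATE_OFFSET] = int(state)

def get_alt_state(mem, alt_wptr):
    """Read the state back as an AltState member."""
    return AltState(mem[alt_wptr + STATE_OFFSET])
```

Starting an ALT construction then amounts to `set_alt_state(mem, wptr, AltState.ENABLE)` on the new process's workspace.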

(e-3) Checking for guard input
The following is done for every guard, one guard at a time. When a guard includes a logical expression, the Occam compiler arranges for the value of that expression at the time the ALT construction executes to be placed in Areg₁₁.
If Areg₁₁ holds a true value when the guard's logical expression has been evaluated, the value at the channel address of the channel input (held in Breg₁₂) is examined next (see FIG. 18, F2). If this location (Breg₁₂) is not empty and another process ID has already been written there (which shows that the channel output process has already initiated channel communication), the ALT process enters the “Ready” state (see ST83 and F4): 0x80000003, the ready flag, is stored in Wptr₁₄, and at the same time the branch-undetermined flag (1) is stored at address (Wptr₁₄).
If the process scheduling list contains processes at this point, the workspace pointer of the process waiting at the head of the list (Fptr₁₆) is stored at address (channel address − 8), and Fptr₁₆ is set to address (channel address), which brings the process communicating with the ALT construction process to the head of the list (this step is skipped if that is already the case).
If, instead, the scheduling list is empty, the channel address is stored in Bptr₁₇. If the guard's channel has received no input (the value of Areg₁₁ is false, or address (Breg₁₂) is the Wptr₁₄ value of the ALT construction process), the guard is ignored. If the guard's logical expression is true and address (Breg₁₂) = empty, the workspace address of the ALT construction process (held in Wptr₁₄) is placed at address (Breg₁₂). (For the guard check, see FIG. 18 and Table 8.)
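The per-guard check can be sketched as a function that classifies one guard. The sketch keeps only the decisions on the channel word and leaves out the reordering of the scheduling list. It assumes that the ready flag is written to the state word at (Wptr − 12), and it encodes the branch-undetermined flag as −1 in 32 bits, although the passage above writes it as (1); the function name and return strings are hypothetical.

```python
EMPTY = 0x80000000       # assumed encoding of "empty"
READY = 0x80000003       # "Ready" state value from the text
UNDECIDED = 0xFFFFFFFF   # -1 as a 32-bit word; some passages write this flag as (1)

def check_guard(mem, alt_wptr, condition, chan):
    """Inspect one guard and return 'ignored', 'waiting' or 'ready'."""
    if not condition:
        return "ignored"                 # logical expression is false
    word = mem[chan]
    if word == alt_wptr:
        return "ignored"                 # already claimed by this ALT, no input yet
    if word == EMPTY:
        mem[chan] = alt_wptr             # claim the channel for the ALT process
        return "waiting"
    # Another process ID is present: an output has already been initiated.
    mem[alt_wptr - 12] = READY           # assumption: state word at Wptr - 12
    mem[alt_wptr] = UNDECIDED            # branch target not chosen yet
    return "ready"
```

Writing the ALT process's own workspace address into an empty channel is what lets a later sender recognise that the channel belongs to an ALT construction.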

(e-4) Wait state (Wait)
After all the channels have been set to wait for input (that is, once the workspace address of the ALT process has been written at the channel address of every channel used in a guard), 0x80000002, indicating “Wait”, is stored at address (Wptr₁₄ − 12), and the next waiting process is started by setting Wptr₁₄ = Fptr₁₆ (Fptr₁₆ − 4 is stored in Iptr₁₅) (see ST82 and F3).
In this way the ALT construction process is again removed from the scheduling list. It waits for input on the channels that make up its guards.
The branch-undetermined flag (1) is stored at the start address of the workspace (Wptr₁₄). As described above, if a channel communication request has already been detected during the check for guard input, the process moves directly to the Ready state without passing through this state.

(e-5) Ready state (Ready)
When the logical expression (if any) of one of the guards is true and an input arrives on its channel, the workspace is stored in Fptr₁₆, making the ALT construction process the first waiting process in the scheduling list. The value 0x80000003, indicating the “Ready” state, is placed in Areg₁₁. A process attempting channel access to the ALT construction process obtains the ALT process's workspace from the channel address and examines the value stored there. If the value is 1, the channel address is judged to be in use by an ALT process (see the description of the check for use by an ALT construction under “Start of channel communication” above). The ALT construction process is then registered again at the head of the scheduling list (Fptr₁₆) by the channel output process (see ST83 and F6).

(e-6) Guard reset (Reset)
Among the input guards of an ALT construction process, only the one guard that is released first has its logical expression satisfied and its channel access accepted. All the other guards must be reset. During the ALT construction process, the value of Areg₁₁ is the result of the guard's logical expression.
If another process attempted channel communication but the logical expression was not true, Areg₁₁ is reset to a false value. If the logical expression was true but no channel communication took place, empty is stored at the address of that channel (the channel address). For a channel whose input guard was released, the contents of address (Wptr₁₄) (which holds the workspace of the ALT construction process) are examined to check whether they are still −1. If so, the destination is still undetermined, so the partner's process ID is stored by setting address (Wptr₁₄) = address (channel address). The address of the process that follows the released guard is then stored in Iptr₁₅ and execution moves there (see ST84 and F7).
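The reset pass can be pictured as one sweep over the guards, sketched below. The representation of a guard as a tuple `(condition, channel_address, branch_address)`, the function name, and the 32-bit encoding of −1 are assumptions; the sketch returns the branch address instead of loading Iptr so that it stays self-contained.

```python
EMPTY = 0x80000000       # assumed encoding of "empty"
UNDECIDED = 0xFFFFFFFF   # -1 as a 32-bit word, the "destination undetermined" flag

def reset_guards(mem, alt_wptr, guards):
    """Release unused channels and pick the first guard that received input.

    guards is a list of (condition, channel_address, branch_address).
    Returns the branch address of the selected guard, or None.
    """
    selected = None
    for condition, chan, branch in guards:
        if not condition:
            continue                     # false guards never claimed a channel
        word = mem[chan]
        if word == alt_wptr:
            mem[chan] = EMPTY            # true guard, but no communication came
        elif word != EMPTY and mem[alt_wptr] == UNDECIDED:
            mem[alt_wptr] = word         # record the partner's process ID
            selected = branch            # execution continues at this branch
    return selected
```

Once the workspace word no longer holds −1, later guards with pending senders are left untouched, which is how only the first released guard wins.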

The above has described how the PAR and ALT constructions are implemented, together with the hardware algorithms for running multiple processes and for channel communication inside a single microprocessor (processor).
The TPCORE 50 of the present invention thus makes it possible to execute multiple processes inside a single processor. This is achieved by devising and implementing hardware algorithms that allow the parallel processing commands (constructors) of Occam to be carried out within a single unit, namely:
・ Sequential execution (ordinary sequential execution of single instructions) (SEQ construction)
・ Parallel processing (PAR construction)
・ Data communication and synchronization between processes (the channel concept)
・ Multiple-channel input processing (ALT construction)

The TPCORE 50 of the present invention, mounted on an FPGA, has been shown to execute transputer instructions completely, yet its architecture is entirely different from the conventional one. As a result, two differences arise, particularly in the memory access method: (a) the relation between the access rates of the memory and external interfaces and the TPCORE operating frequency, and (b) the homogenization of the memory address space. Both lead to improved performance.

In the present invention, Occam is a language created on the basis of CSP theory. A program is a collection of several processes. The ground stage is the basic stage, within the state transitions of a single process during parallel processing, before its start instruction is executed. A process is an entity that keeps executing a given set of actions sequentially. A construction is a collection of the most basic primitive processes, such as assignment, output, and procedure call (corresponding to a subroutine). A channel is a concept or means used for communication (data exchange) between processes running in parallel. A queue is a waiting line, or scheduling list. A workspace is a memory space that stores constructions, instructions, identification numbers, addresses, data, and the like. A scheduling list is a list formed by storing the identification numbers of running and waiting processes in memory. Idling is the waiting state of the processor before it executes a program.

As described above, the parallel processor of the present invention was designed by narrowing down and re-examining the architecture of the conventional transputer, reducing the gate count and logic cell count of the core as far as possible to obtain a slim, compact processor. As a result, up to 18 TPCOREs can be built into a single FPGA when the largest FPGA currently available is used. Under this condition, a single FPGA can hold a four-stage tree structure with up to eight TPCOREs placed at the root portion, or, in a lattice arrangement, a 4 × 4 mesh.
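The core counts can be checked with simple arithmetic. The sketch below reads the four-stage tree as a complete binary tree whose widest stage holds eight cores; the branching factor is an assumption, since the text does not state it, and the function names are hypothetical.

```python
MAX_CORES = 18  # TPCOREs per FPGA stated in the text

def tree_nodes(stages, branching=2):
    """Number of nodes in a complete tree with the given number of stages."""
    return sum(branching ** stage for stage in range(stages))

def mesh_nodes(rows, cols):
    """Number of nodes in a rows x cols mesh."""
    return rows * cols

# Under the binary-tree assumption: 1 + 2 + 4 + 8 cores.
assert tree_nodes(4) <= MAX_CORES
# A 4 x 4 mesh also stays within the limit.
assert mesh_nodes(4, 4) <= MAX_CORES
```

Under these assumptions both topologies leave a few of the 18 cores unused.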

In addition, because the TPCORE of the present invention is formed in an FPGA, the topology of a network of TPCOREs can be rearranged freely to suit the system to which parallel processing is applied. The benefit of realizing the TPCORE on an FPGA is therefore very large.

Furthermore, the system architecture of the TPCORE of the present invention differs from the conventional transputer in the data transfer rate to external interfaces and in the memory access rate, both of which are synchronized with the operating frequency of the TPCORE. In the TPCORE, moreover, the interface transfer rate and the clock are independent.
In addition, transputer memory was hierarchical, with fast internal memory and slower memory, whereas in the TPCORE of the present invention all memory is homogenized and the access rate is the same across the entire 4-Gbyte space. That is, the whole memory space can be accessed uniformly at the same clock rate.

FIG. 1 is a block diagram of the CPU of the present invention.
FIG. 2 is a block diagram of the parallel processing processor.
FIG. 3 is a diagram of the data-width conversion of the address bus.
FIG. 4 is a diagram of the data-width conversion of the data bus.
FIG. 5 is a block diagram of the parallel processing processor with interfaces.
FIG. 6 is a block diagram of a TPCORE network.
FIG. 7 is a diagram showing the operation of the TPCORE at the start of communication.
FIG. 8 is a diagram showing the operation of the TPCORE when communication is established.
FIG. 9 is a diagram showing the communication operation of the TPCORE.
FIG. 10 is a diagram showing the workspace of a process ID.
FIG. 11 is a diagram showing the state transitions of one process during parallel processing.
FIG. 12 is a diagram showing the structure of the scheduling list.
FIG. 13 is a process-switching state transition diagram showing the operation of the PAR construction.
FIG. 14 is another process-switching state transition diagram showing the operation of the PAR construction.
FIG. 15 is another process-switching state transition diagram showing the operation of the PAR construction.
FIG. 16 is another process-switching state transition diagram showing the operation of the PAR construction.
FIG. 17 is a state transition diagram showing the operation of channel communication between processes.
FIG. 18 is a state transition diagram showing the operation of the ALT construction.

Explanation of symbols

10 … CPU, 11 … Areg (A register), 12 … Breg, 13 … Creg, 14 … Wptr (workspace pointer), 15 … Iptr, 16 … Fptr, 17 … Bptr, 21 … cnt, 22 … clk, 23 … Timeout, 24 … microcode ROM controller, 25 … Oreg, 26 … Ireg, 27 … microcode ROM, 28 … microcontroller, 29 … Temp, 31 … ALU, 41 … memory controller, 42, 42-a to 42-d … memory, 45 … link block, 50, 50-1, 50-2 … TPCORE, 52-a to 52-d … link (Link) interface, 100 … TPCORE network.

Claims

1. A parallel processing architecture of a parallel processor that executes a program written in the Occam language, wherein, in the parallel processor: at an initial stage before execution of a process, the process being a basic unit of the program that is executed sequentially, a start instruction of the process is executed; when no process is waiting in the queue, the created process is executed and is terminated by an end instruction of the process, or, when channel communication is raised, a timeout process occurs, or a stop instruction is executed during execution of the process, the process enters an idling state and waits for the response of the counterpart channel; when the process is created and there is no process waiting, the identification number of the process is added to the tail of the process waiting queue and the process waits; while the process waits, its identification number advances through the process waiting queue; and when the waiting process reaches the head of the queue, the processes are switched, the head process is executed and is terminated by the end instruction, and the process returns to the initial stage.

2. The parallel processing architecture according to claim 1, wherein the parallel processor has a pointer register and a memory; when executing the process, the parallel processor stores the identification number of the process in the pointer register or in a workspace of the memory; a plurality of the processes stored in the workspace are connected in a linked structure by their identification numbers; the process indicated by the value of the workspace is executed according to the identification number; and when the executing process is not executable or the processor is in an idling state, the process is executed.

3. The parallel processing architecture as described, wherein the parallel processor, when executing a parallel construction of the processes, exchanges the identification number between the pointer register and the workspace, detects the queue in the workspace, and executes the process at the head of the queue.

4. The parallel processing architecture according to claim 2, wherein, when the parallel processor performs channel communication between the processes, the identification number of the process is stored in the workspace of the memory and the processes are switched from the queue in the workspace.

The parallel processing architecture according to claim 2, wherein, when the parallel processor executes an alternative construct of the process, it exchanges the identification number between the pointer register and the workspace, checks the guard value of each process, and detects and executes the first process whose guard has been released.

A parallel processing processor that forms a network and executes programs written in the Occam language, comprising:
a processor having an ALU that performs arithmetic or logical operations, a microcode ROM that stores microcode controlling the ALU, an internal register comprising a register that stores an instruction, or the memory address at which the instruction to be executed next is stored, and a general-purpose stack register, a workspace pointer register that holds the identification number of a process (the basic unit of a program processed by the processor, executed sequentially), a process management register that stores data for managing waiting processes, and a microcode ROM controller that controls the microcode ROM;
a plurality of links connected to the processor for inputting and outputting data;
a memory that stores input/output data of the processor or the links and provides a workspace, in which the identification number used to start a process and the identification number of the next process to be executed are stored at locations separated by a predetermined address value, thereby forming a scheduling list in which the identification numbers are linked; and
a memory controller that controls transmission and reception of input/output data of the memory.
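
One way to picture the scheduling list recited in this claim is a chain of link words held in memory, each located at a fixed offset from a process's workspace address. The Python sketch below is illustrative only: memory is modelled as a dictionary, and the offset value `LINK_OFFSET`, the marker `END_OF_LIST`, and the function name are assumptions that do not come from the source.

```python
LINK_OFFSET = -2     # assumed: the link word sits two words below the workspace address
END_OF_LIST = None   # assumed marker for "no next process"


def walk_scheduling_list(memory, head):
    """Follow link words from `head` and return the identification numbers in order."""
    order = []
    wptr = head
    while wptr is not END_OF_LIST:
        order.append(wptr)
        # The identification number of the next process is stored at a
        # predetermined distance from the current workspace address.
        wptr = memory.get(wptr + LINK_OFFSET, END_OF_LIST)
    return order


# Three waiting processes whose workspaces start at addresses 100, 200, and 300.
memory = {
    100 + LINK_OFFSET: 200,
    200 + LINK_OFFSET: 300,
    300 + LINK_OFFSET: END_OF_LIST,
}
print(walk_scheduling_list(memory, 100))  # prints [100, 200, 300]
```

Because the identification number doubles as a workspace address, no separate table of processes is needed; the list lives entirely in the workspaces themselves.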

The parallel processing processor according to claim 6, wherein the process is managed by the workspace pointer register and the scheduling list, the workspace pointer register stores the identification number of the current process, and waiting processes are held in the scheduling list.

The parallel processing processor according to claim 6, wherein the process indicated by the value of the workspace pointer register is executed, and the executing process is switched when it cannot be executed or when the processor is in an idling state.

The parallel processing processor according to claim 8, wherein, when another process is created or an interrupt occurs during execution of the process, the identification number of that process is added to the end of the scheduling list as a waiting process.

The parallel processing processor according to claim 6, wherein the memory is formed on the same substrate as the processor on an FPGA, the workspace provides a memory area in the memory for each process, and the memory area consists of several words extending in the negative address direction from the memory address indicated by the process identification number.

The parallel processing processor according to claim 6, wherein, when a process being processed in parallel is temporarily suspended by an interrupt from another process, the workspace is used for holding the data of each register, channel communication, execution of an alternative construct, or process scheduling.

The parallel processing processor according to claim 6, wherein the process management register has first and second pointer registers; the scheduling list is formed of a first process, a last process, and intermediate processes; the first pointer register holds the identification number of the first process; the second pointer register holds the identification number of the last process; and the identification numbers form a linked list structure that connects the waiting processes.
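
The two pointer registers recited in this claim behave like the front and back pointers of a linked queue. The following Python sketch is a minimal model under stated assumptions: the class `ProcessQueue`, its attribute names `fptr` and `bptr`, the dictionary-based memory, and the value of `LINK_OFFSET` are all hypothetical and are not taken from the source.

```python
LINK_OFFSET = -2  # assumed position of the link word relative to the workspace address


class ProcessQueue:
    """Waiting-process list tracked by a first and a second pointer register."""

    def __init__(self):
        self.memory = {}   # workspace memory, address -> word
        self.fptr = None   # first pointer register: first waiting process
        self.bptr = None   # second pointer register: last waiting process

    def enqueue(self, wptr):
        # The new process becomes the last one, so its link word is empty.
        self.memory[wptr + LINK_OFFSET] = None
        if self.fptr is None:
            self.fptr = wptr
        else:
            # Link the previous last process to the new one.
            self.memory[self.bptr + LINK_OFFSET] = wptr
        self.bptr = wptr

    def dequeue(self):
        # Remove and return the first waiting process, or None if the list is empty.
        wptr = self.fptr
        if wptr is None:
            return None
        self.fptr = self.memory[wptr + LINK_OFFSET]
        if self.fptr is None:
            self.bptr = None
        return wptr
```

Keeping a pointer to the last process lets a new waiting process be appended in constant time, without walking the list from its head.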

The parallel processing processor, wherein the internal register has first and second stack registers; when a parallel construct is executed, the identification number of the process to be started by the parallel construct is stored in the first stack register, and the address and offset of the instruction executed at the start of that process are stored in the second stack register; when there is a waiting process, the data of the first stack register is stored at the memory address indicated by the first pointer register; the data of the first stack register is stored in the second pointer register; and the data of a third pointer register, which stores the instruction pointer, and the data of the second stack register are stored at a predetermined address of the memory.

The parallel processing processor according to claim 13, wherein, when executing the parallel construct, the processor checks whether there is a waiting process; when there is a waiting process, it exchanges data between the third pointer register and the first pointer register to switch processes; and when there is no data, it enters the idling state.

The parallel processing processor according to claim 6, wherein, when an interrupt occurs during execution of the process, the processor stores the data of the workspace pointer register, the instruction pointer register, and the internal register in a predetermined area of the workspace, and stores the data of the first pointer register in the workspace pointer register and the instruction pointer register.

The parallel processing processor according to claim 6, wherein, when a channel communication instruction is executed and the value at the channel address is empty, the process that arrives first stores in the workspace the address of the instruction to be resumed immediately after the channel communication together with the data of the internal register, that process is removed from the scheduling list, and the next waiting process is executed.

The parallel processing processor according to claim 16, wherein, after channel communication between the processes is established, the processor checks the state of the process waiting queue; when there is a queue of waiting processes, the processor stores the process identification number of the communication partner in the second pointer register and adds that identification number to the end of the process waiting queue; and when there is no queue of waiting processes, the second pointer register stores the process identification number of the communication partner.
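
The channel handshake in the two claims above can be approximated as follows. This is a simplified Python sketch, not the claimed microcode: the names `Channel`, `communicate`, and `EMPTY` are hypothetical, the waiting queue is an ordinary `deque` rather than a list threaded through workspaces, and a sender is assumed to pass a non-`None` value.

```python
from collections import deque

EMPTY = None  # assumed representation of an empty channel word


class Channel:
    def __init__(self):
        self.word = EMPTY  # EMPTY, or the identification number of the waiting process
        self.data = None   # value being transferred


def communicate(chan, pid, run_queue, value=None):
    """Return True if communication completed, False if `pid` must wait."""
    if chan.word is EMPTY:
        # First process to arrive: record its identification number in the
        # channel word and leave the scheduling list until its partner arrives.
        chan.word = pid
        chan.data = value
        return False
    # Second process to arrive: complete the transfer and put the partner
    # back at the end of the process waiting queue.
    partner = chan.word
    chan.word = EMPTY
    if value is not None:
        chan.data = value
    run_queue.append(partner)
    return True


run_queue = deque()
chan = Channel()
communicate(chan, 1, run_queue, value=42)  # sender arrives first and waits
communicate(chan, 2, run_queue)            # receiver arrives, transfer completes
```

After the second call, process 1 is back in `run_queue`, `chan.data` holds 42, and the channel word is empty again, ready for the next communication.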

The parallel processing processor according to claim 6, wherein the alternative construct of the process is composed of a guard part, consisting of a logical expression and a channel input or of a channel input alone, and a process; when the result of the logical expression of the guard is true and the process is in a ready state, if there is a process in the scheduling list, the first stack register stores the workspace pointer of the process waiting at the head of the list, and the process that communicates with the process of the alternative construct is positioned at the head of the list; and if there is no data in the scheduling list, the second pointer register for process management stores the value of the second stack register.

The parallel processing processor according to claim 18, wherein, after the alternative construct has begun waiting for input on all the channels, a value indicating a standby state is stored in the workspace of the memory, the value of the first pointer register is stored in the workspace pointer register, and execution is started.

The parallel processing processor according to claim 18, wherein, in the alternative construct, when input data is supplied to the channel while the logical expression constituting the guard is true, the data of the workspace is stored in the first pointer register, the process of the alternative construct becomes the first waiting process in the scheduling, and a value indicating a ready state is stored in the first stack register.
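
The guard evaluation described for the alternative construct can be illustrated with a short Python sketch. It models only the selection rule (a branch is eligible when its logical expression is true and its channel has input) and none of the register transfers; the names `run_alternative`, `WAITING`, and `READY` are hypothetical.

```python
WAITING, READY = "waiting", "ready"  # assumed state markers stored in the workspace


def run_alternative(branches):
    """Select a branch of an alternative construct.

    `branches` is a list of (guard, has_input, body) tuples, where `guard`
    is the result of the logical expression and `has_input` tells whether
    the branch's channel currently has input data.
    """
    for guard, has_input, body in branches:
        if guard and has_input:
            # The first branch whose guard is true and whose channel has
            # input makes the process ready to run that branch.
            return READY, body
    # No branch can proceed yet: the process waits on all of its channels.
    return WAITING, None
```

For example, `run_alternative([(False, True, "a"), (True, False, "b"), (True, True, "c")])` selects branch `"c"`, because `"a"` has a false guard and `"b"` has no input yet.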