JPH11282815A

JPH11282815A - Multi-thread computer system and multi-thread execution control method

Info

Publication number: JPH11282815A
Application number: JP10103970A
Authority: JP
Inventors: Junji Sakai; 淳嗣酒井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-03-31
Filing date: 1998-03-31
Publication date: 1999-10-15
Anticipated expiration: 2018-03-31
Also published as: JP3546694B2

Abstract

PROBLEM TO BE SOLVED: To provide a multi-thread computer system that includes a low-overhead user-level interrupt mechanism and a fast exclusive control mechanism for executing fine-grained threads. SOLUTION: When a processor element 11 starts executing a thread, the value of a counter 26 is decremented every clock. When the value of the counter 26 reaches zero, an interrupt control part 41 starts user-level interrupt processing and transfers control to the address of a user thread scheduler 101 that is set in a user handler register 20. Meanwhile, a test-and-set instruction performs a lock operation on a lock variable set 30 inside the processor 1, achieving exclusive control among the processor elements 11.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to parallel computer systems, and more particularly to a multi-thread computer system that efficiently schedules a plurality of threads on a shared-memory multiprocessor computer without going through the operating system.

[0002]

2. Description of the Related Art

To obtain higher processing performance, some computer systems adopt a multiprocessor configuration with a plurality of processor elements in one system. Among such systems, those in which the processor elements share a main memory are called shared-memory multiprocessor computer systems, and they have the advantage that programs are easier to write than on distributed-memory systems.

[0003] On the software side, there is a parallel execution scheme called multi-thread execution, in which one process is divided into flows of control called threads and a plurality of threads are executed concurrently. On a multiprocessor computer system, processing performance is improved by assigning a plurality of threads to a plurality of processor elements and executing them simultaneously.

[0004] Thread management in a multiprocessor computer system is normally performed by a multiprocessor operating system. That is, when a thread creates a new thread, synchronizes with another thread, or terminates, it calls an operating system service.

[0005] Thread preemption is also performed through the operating system. Preemption is the process of suspending a thread that has occupied a processor for a long time and switching to the execution of another thread. A computer system with a preemption function has a hardware timer device and a mechanism by which the timer device sends an interrupt signal to the processor when a predetermined time has elapsed. When a fixed time has elapsed since the user program started executing, the timer interrupt transfers control to the operating system, and the operating system switches the running thread.

[0006] To execute a plurality of threads efficiently on a multiprocessor computer system, it is important to reduce the overhead of thread management operations such as thread creation, synchronization, switching, and termination.

[0007] In Sun Microsystems' Solaris operating system, threads can be managed by a user-level library rather than by the operating system. This allows thread management with lower overhead, but the operating system still intervenes in the case of preemption (Introduction to Multithread Programming, ASCII Publishing, September 1996, pp. 69-70 and pp. 76-77).

[0008] The preemption processing circuit described in Japanese Patent Application Laid-Open No. 5-158900 accepts preemption requests with dedicated hardware rather than with the operating system, but the switching processing after acceptance is performed by an ordinary interrupt to the operating system.

[0009] On the other hand, when reading or writing a computer resource shared by a plurality of threads, for example a shared variable in main memory, processing called exclusive control must be performed to prevent erroneous results caused by multiple threads accessing the resource simultaneously (exclusive control is explained in, for example, A. S. Tanenbaum, "Basics and Applications of OS," Prentice Hall Toppan, Chapter 2, Section 2.2, etc.).

[0010] Exclusive control is usually implemented with processor instructions for exclusive control, such as test-and-set or exchange. These instructions indivisibly inspect the value of a memory cell in main memory and set a new value in that cell. For example, a test-and-set instruction performs the following three operations without being split by an interrupt or by another processor:
(1) Read the value of a memory cell in main memory into a computer register.
(2) Write the value 1 to that memory cell.
(3) Compare the value read into the computer register with the value 0.
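The three operations above can be sketched in software as follows. This is a minimal Python model and not the hardware instruction itself: the class `MemoryCell`, its methods, the helper `run_demo`, and the use of `threading.Lock` to stand in for the hardware's indivisibility are all assumptions made for illustration.

```python
import threading
import time


class MemoryCell:
    """Hypothetical model of one main-memory cell used as a lock variable."""

    def __init__(self):
        self.value = 0
        # Stands in for the hardware guarantee that the three steps
        # below cannot be split by an interrupt or another processor.
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Return True if the cell was 0 (lock acquired), else False."""
        with self._atomic:
            old = self.value      # (1) read the cell into a "register"
            self.value = 1        # (2) write the value 1 to the cell
            return old == 0       # (3) compare the value read with 0

    def clear(self):
        """Release the lock by writing 0 back to the cell."""
        with self._atomic:
            self.value = 0


def run_demo(num_threads=3, increments=200):
    """Protect a shared counter with a spin lock built on test_and_set."""
    cell = MemoryCell()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            while not cell.test_and_set():
                time.sleep(0)     # spin, yielding to other threads
            shared["count"] += 1  # critical section
            cell.clear()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Calling `run_demo()` returns the total number of increments only because every update happens while the cell is held; the busy-wait loop is the software counterpart of retrying the instruction until the comparison in step (3) succeeds.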

[0011] In a shared-memory multiprocessor computer system, these exclusive-control instructions act on the shared main memory so that exclusive control can be performed between different processors. In other words, exclusive control itself requires access to main memory.

[0012]

[Problems to Be Solved by the Invention] The first problem with the prior art is that interrupt response processing fast enough for multi-thread execution is not possible, so the overhead of thread switching is large. This is because interrupts are handled inside the operating system, which causes a context switch from the user program to the operating system.

[0013] The second problem is that exclusive control processing itself causes accesses to main memory, so the overhead of the inter-thread exclusive control and synchronization essential to multi-thread processing is large. Because recent processors are far faster than main memory access, many processors place a small, fast memory called a cache memory between the processor and main memory, and avoid memory access delays by accessing the cache almost exclusively. Exclusive-control instructions, however, always access main memory and therefore slow processing down.

[0014] Both of the above problems increase the overhead of multi-thread processing, and their impact is especially large when multi-thread processing is performed in units of fine-grained threads (threads containing few instructions).

[0015]

SUMMARY OF THE INVENTION

An object of the present invention is to provide a fast user-level interrupt that does not involve the operating system, thereby improving the efficiency of multi-thread processing.

[0016] Another object of the present invention is to provide a fast lock mechanism between processor elements in a multiprocessor computer system, thereby improving the efficiency of multi-thread processing.

[0017]

[Means for Solving the Problems] The multi-thread computer system of the present invention has a fast user-level interrupt mechanism that provides fast user-level interrupts without operating system intervention. Specifically, it comprises an interrupt control unit in the processor (41 in FIG. 2), a counter in each processor element (26 in FIG. 2), and a user handler register (20 in FIG. 2).

[0018] The multi-thread computer system of the present invention also has a fast lock mechanism that provides fast locking between processor elements in a multiprocessor computer system. Specifically, the first fast lock mechanism comprises a lock variable set shared among the processor elements within the processor (30 in FIG. 2) and computer instructions for operating on it. The second fast lock mechanism has lock variables with a queue structure (61, 62, and 69 in FIG. 3). The third fast lock mechanism has a shared cache memory in the processor (71 in FIG. 4) and computer instructions for operating on memory areas in the cache.

[0019] The multi-thread execution control method of the present invention is a method for a multi-thread computer system that comprises a processor including a plurality of processor elements and a main memory shared by those processor elements, and that divides one user process into a plurality of threads and, under the control of a thread scheduler inside the user process, assigns the threads to the processor elements and executes them simultaneously. To provide fast user-level interrupts without operating system intervention, the method includes the following steps:
(a) setting, in the user handler register of the processor element to which a thread of the user process is assigned, the memory address at which the thread scheduler of that user process is located, and setting, in the counter of that processor element, the time quantum value allotted to the thread;
(b) updating the value of the counter in the processor element at a fixed period from the moment the thread starts executing on the processor element, and generating a user-level interrupt when the counter reaches a predetermined count value;
(c) in processing the user-level interrupt, setting the current program counter value of the requesting processor element in the user save PC of that processor element, and setting the memory address held in the user handler register of that processor element in the program counter, thereby transferring control to the thread scheduler in the user process.
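Steps (a) to (c) can be illustrated with a small software model of one processor element. This is only a sketch under stated assumptions: the class `ProcessorElement`, its method names, and the convention that the program counter advances by one per counter period are hypothetical and chosen for readability, and the predetermined count value is taken to be zero.

```python
class ProcessorElement:
    """Hypothetical model of the per-element registers named in the text."""

    def __init__(self):
        self.pc = 0              # program counter
        self.user_handler = 0    # user handler register
        self.user_save_pc = 0    # user save PC
        self.counter = 0         # down counter holding the time quantum
        self.interrupts = 0      # number of user-level interrupts taken

    def dispatch(self, scheduler_addr, quantum, thread_entry):
        """Step (a): load the handler address and quantum, then start the thread."""
        self.user_handler = scheduler_addr
        self.counter = quantum
        self.pc = thread_entry

    def tick(self):
        """Step (b): one counter period; returns True if an interrupt fired."""
        self.pc += 1             # assumed: the thread advances one slot per period
        self.counter -= 1
        if self.counter == 0:    # predetermined count value (zero in this sketch)
            self._user_level_interrupt()
            return True
        return False

    def _user_level_interrupt(self):
        """Step (c): save the PC and jump to the scheduler, with no OS involvement."""
        self.user_save_pc = self.pc
        self.pc = self.user_handler
        self.interrupts += 1
```

For example, after `dispatch(0x1000, 3, 0x200)` the third call to `tick()` fires the interrupt, leaving the interrupted address in `user_save_pc` and the scheduler address in `pc`.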

[0020] The multi-thread execution control method of the present invention also enables fast exclusive control between processor elements in a multiprocessor computer system by implementing exclusive control among the threads executed on the processor elements in any one of the following ways (a), (b), or (c):
(a) Exclusive control using a set of 1-bit storage devices that is provided in the processor and is accessible from each processor element through a computer instruction set that allows values to be manipulated exclusively between processor elements.
(b) Exclusive control using a set of queue structures in the processor that store tokens, each carrying an identification number corresponding to a processor element, in order of arrival, and that are accessible from each processor element through a computer instruction set that allows tokens to be added, searched for, or deleted exclusively between processor elements.
(c) Exclusive control using memory elements in a shared cache memory that are accessible from each processor element through an access arbitration mechanism that arbitrates accesses to the same address between processor elements, the memory elements being accessible through a computer instruction set that allows values to be manipulated exclusively between processor elements.

[0021]

[Operation] When the counter reaches zero, the interrupt control unit starts user-level interrupt processing. In user-level interrupt processing, the value of the user handler register, which has been set in advance to the start address of the user process's thread scheduler, is loaded into the program counter, and control transfers to the thread scheduler without passing through the operating system.

[0022] The fast lock mechanism provided in the processor offers lock acquisition and release functions that require no access to main memory.

[0023]

BEST MODE FOR CARRYING OUT THE INVENTION

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Description of Configuration] Embodiments of the present invention will be described in detail with reference to the drawings.

[0024] FIG. 1 shows an example of a computer configuration in which the present invention is implemented. A processor 1 contains a plurality of processor elements 11, 12, and 19, which access a common main memory 2.

[0025] Referring to FIG. 2, the first embodiment of the present invention comprises a processor 1 containing a plurality of processor elements, an operating system 50 running on the processor, and a user process 100 running on the operating system.

[0026] In addition to the register set, arithmetic units, control unit, and other components found in a typical computer processor, each processor element in the processor 1 has a counter 26 and a zero comparator 25 for generating user-level preemption interrupts, and a user handler register 20 and a user save PC 23 used to transfer control when a user-level interrupt occurs.

[0027] The value of the counter 26 is decremented every clock by the clock signal. The zero comparator 25 compares the value of the counter 26 with zero. The user handler register 20 holds the program counter value to which execution control should be transferred when a user-level interrupt occurs, and the user save PC 23 is a register that saves the program counter value from immediately before the user-level interrupt.

[0028] The program counter (PC) 22 and the interrupt control unit 41 in FIG. 2 have the same functions as the program counter and interrupt control unit of a typical computer processor. The kernel handler register 21 and the kernel save PC 24 have the same functions as the interrupt handler register and the program counter save register used when an ordinary interrupt occurs in a typical computer processor.

[0029] The user process 100 is the running instance of a program, created by the operating system 50. The user process 100 consists of a plurality of threads 102, 103, and 104 that share a main memory space; these threads are assigned to processor elements and executed. The thread scheduler 101 resides inside the user process 100, linked with the user program, and manages all the threads that make up the user process.

[0030] The processor also contains a lock variable set 30 shared by all processor elements and an access arbitration mechanism 40. The lock variable set 30 is a set of lock variables, each able to store a 1-bit state, and the access arbitration mechanism 40 arbitrates accesses to the lock variables among the processor elements.

[0031]

[Description of Operation] The operation of this embodiment will be described in detail with reference to FIG. 2.

[0032] When the operating system 50 starts the user process 100, the user process 100 calls the thread scheduler 101 built into it. The thread scheduler 101 schedules the set of threads in the user process 100 according to a fixed scheduling algorithm. When the thread scheduler 101 selects the thread to execute next (here, thread 102), it sets the time quantum value allotted to the thread 102 in the counter 26 of the processor element 11 to which the thread 102 is assigned, sets the memory address at which the thread scheduler 101 is located in the user handler register 20 of that processor element 11, and then starts execution of the thread 102 on the processor element 11.

[0033] While the thread 102 executes, the value of the counter 26 is decremented at fixed intervals by the clock signal. When the value reaches zero, the zero comparator 25 sends an interrupt request signal to the interrupt control unit 41. The interrupt control unit 41 then starts user-level interrupt processing on the processor element 11.

[0034] When user-level interrupt processing starts, the processor element 11 sets the current value of the program counter 22 in the user save PC 23 and sets the value of the user handler register 20 in the program counter 22. Control thereby transfers quickly to the thread scheduler 101 inside the user process 100 without passing through the operating system 50. The thread scheduler 101 saves the execution state of the thread 102, including the register set values, in a management data area inside the thread scheduler 101, selects the thread to start next according to the prescribed scheduling algorithm, restores the register set values of that thread, sets the time quantum value for that thread in the counter 26, and starts executing it.
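The scheduler's part of this sequence (save the preempted thread's state, choose the next thread, restore its state, reload the counter) might look like the following sketch. The class `UserThreadScheduler`, the dictionary layout of a saved context, the four-register set, and the round-robin policy are all assumptions; the text only says that a prescribed scheduling algorithm is used. The sketch also assumes at least one thread has been registered before the first call.

```python
from collections import deque


class UserThreadScheduler:
    """Hypothetical user-level scheduler; round-robin is an assumed policy."""

    def __init__(self, quantum):
        self.quantum = quantum
        self.ready = deque()      # saved contexts of runnable threads
        self.current = None       # context of the thread now running

    def add_thread(self, name, entry_pc):
        """Register a thread with an initial context (four registers assumed)."""
        self.ready.append({"name": name, "pc": entry_pc, "regs": [0] * 4})

    def on_user_interrupt(self, saved_pc, regs):
        """Handle a counter expiry; returns (next_pc, next_regs, counter_value)."""
        if self.current is not None:
            # Save the preempted thread's execution state in the
            # scheduler's own management data area.
            self.current["pc"] = saved_pc
            self.current["regs"] = list(regs)
            self.ready.append(self.current)
        # Select the next thread and hand back its restored state together
        # with the time quantum to load into the counter.
        self.current = self.ready.popleft()
        return self.current["pc"], list(self.current["regs"]), self.quantum
```

Everything here runs inside the user process, which is the point of the mechanism: no mode switch and no address space change is needed between one thread and the next.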

[0035] For comparison, the case of conventional interrupt processing is described below. First, at the start of interrupt processing, the context is switched from the user program to the operating system. This context switch includes processing such as:
○ setting the current value of the program counter 22 in the kernel save PC 24
○ setting the value of the kernel handler register 21 in the program counter 22
○ switching the processor's operating mode to kernel mode
○ saving the contents of the necessary registers to main memory
Through this context switch, control transfers to the kernel-level interrupt handler 51 inside the operating system 50 and then to the process scheduler 52. The process scheduler 52 analyzes the cause of the interrupt and determines the process to execute next according to a predetermined scheduling algorithm. The context is then switched again and execution of the next process starts. This context switch includes:
○ setting up the register set for the next process
○ setting up the logical address space for the next process
○ switching the processor's operating mode to user mode
Only after this whole series of steps does control finally transfer to the user process.

[0036] In other words, introducing the user-level interrupt mechanism lets user programs use a preemption mechanism that is faster than the conventional one because it has no overhead from passing through the operating system.

[0037] The interrupt sources of the user-level interrupt mechanism are not limited to preemption interrupts caused by the down counter reaching zero. Provided the interrupt control unit supports it, the mechanism can also be used for other purposes, such as handling invalid-instruction interrupts within the user process.

[0038] Next, the fast lock mechanism using the lock variable set 30 is described with reference to FIG. 2. Each processor element accesses a lock variable in the lock variable set 30 using a test-and-set instruction provided by the processor. When a processor element 11 applies the test-and-set instruction to a lock variable, the access arbitration mechanism 40 temporarily blocks accesses to the lock variable from the other processor elements; in the meantime the processor element 11 compares the value of the lock variable with zero, reflects the result in a status flag in the processor element 11, and sets the lock variable to 1. From the software's point of view, the test-and-set instruction described in this embodiment behaves like the test-and-set instruction of a conventional processor, but because it involves no access to main memory, which causes large delays, variables can be locked with far less overhead than before.
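A software model of the lock variable set helps make the status-flag behavior concrete. The sketch below is an assumption-laden illustration rather than the hardware design: `LockVariableSet`, its methods, the packing of all lock variables into one integer, and the `release` operation (the text describes only test-and-set in detail) are hypothetical, and `threading.Lock` stands in for the access arbitration mechanism 40.

```python
import threading


class LockVariableSet:
    """Hypothetical model of an on-chip set of 1-bit lock variables."""

    def __init__(self, size=8):
        self.size = size
        self.bits = 0                      # all lock variables packed in one word
        self._arbiter = threading.Lock()   # stands in for the access arbiter

    def _mask(self, index):
        if not 0 <= index < self.size:
            raise IndexError("no such lock variable")
        return 1 << index

    def test_and_set(self, index):
        """Return the status flag: True if the variable was 0 (lock acquired)."""
        mask = self._mask(index)
        with self._arbiter:                # other elements are held off meanwhile
            was_zero = (self.bits & mask) == 0
            self.bits |= mask              # set the lock variable to 1
            return was_zero

    def release(self, index):
        """Assumed release operation: clear the lock variable back to 0."""
        mask = self._mask(index)
        with self._arbiter:
            self.bits &= ~mask
```

A caller would branch on the returned flag, retrying `test_and_set` until it is true; no step in this path touches a model of main memory.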

[0039] A suitable number of lock variables in the processor is 8, 16, or 32. There are two reasons. First, sophisticated exclusive control and synchronization mechanisms for users can easily be built on top of more basic exclusive control mechanisms, and such basic mechanisms need only a few lock variables. Second, making the number of lock variables a small power of two keeps the instruction field that specifies the lock variable number narrow, which improves instruction efficiency. In addition, keeping the number of lock variables within 32 or 64, the typical bit widths of the register set, allows the whole lock variable set to be treated as a single special register. The lock variable set can then be included in the context belonging to a process, so that each process can use a logically independent fast lock mechanism.

[0040] Introducing the user-level interrupt mechanism and the fast lock mechanism described above enables parallel processing in finer units than before, which raises the effective performance of multiprocessor computers and broadens their range of application. This is the effect of the present embodiment.

[0041]

[Other Embodiments of the Invention] Next, another embodiment of the user-level interrupt mechanism is described.

[0042] In FIG. 2 the input to the counter 26 is a clock signal, but this signal need not be the computer system's clock signal itself. If the system clock divided down to a fraction of its frequency by a suitable frequency divider is used as the input to the counter 26, the number of hardware bits in the counter 26 and the width of the counter value field in the computer instruction that sets the counter 26 can both be reduced, improving the efficiency of both hardware and software. Alternatively, instead of the system clock signal, a signal from the instruction execution control mechanism of the processor element may be fed to the counter 26 so that the counter 26 is decremented each time one instruction is executed.
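The saving from a frequency divider is simple arithmetic, shown here as a small helper. The function name `counter_bits` and the example figures (a time quantum of 100,000 system clocks and a divide-by-16 divider) are illustrative assumptions and do not come from the patent.

```python
def counter_bits(quantum_cycles, divider=1):
    """Bits needed for a down counter that covers quantum_cycles system clocks.

    The counter is assumed to be decremented once every `divider` clocks.
    """
    ticks = -(-quantum_cycles // divider)   # ceiling division
    return max(1, ticks.bit_length())


# Counting every clock: 100,000 does not fit in 16 bits, so 17 are needed.
undivided = counter_bits(100_000)
# Counting every 16th clock: 6,250 ticks fit in 13 bits.
divided = counter_bits(100_000, divider=16)
```

Under these assumed figures the divider removes four bits from both the counter hardware and the instruction field that loads it, at the cost of a coarser time quantum.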

[0043] Next, other embodiments of the fast lock mechanism are described.

[0044] Referring to FIG. 3, in the second embodiment of the fast lock mechanism each lock variable has a queue structure. Each of the queues 61, 62, and 69 can hold up to as many tokens as there are processor elements. Tokens are issued by the processor elements, and each token carries the number of the processor element that issued it. When a queue contains one or more tokens, the processor element that issued the token at the head of the queue holds the lock on the lock variable corresponding to that queue. Two computer instructions are provided for operating on these lock variables: a lock attempt instruction and a lock release instruction. A queue control mechanism 60, located between the processor elements 11, 12, ..., 19 and the queues 61, 62, ..., 69, adds, searches for, and deletes tokens in the queues 61, 62, ..., 69 exclusively between processor elements when a lock attempt instruction or a lock release instruction is executed.

[0045] When the processor element 11 executes a lock attempt instruction, the queue control mechanism 60 checks whether a token of the processor element 11 is present in the queue specified by the operand of the lock attempt instruction (here, queue 61), and inserts a new token if none is present. The queue control mechanism 60 also checks whether the token corresponding to the processor element 11 is at the head of the specified queue 61 and sets the result in the status flag of the processor element 11. By executing a branch instruction that tests the status flag after the lock attempt instruction, the user program determines whether it has acquired the lock.

[0046] When the processor element 11 executes a lock release instruction, the queue control mechanism 60 checks whether a token of the processor element 11 exists in the queue specified by the operand of that instruction (here, the queue 61), and if so, removes the token from the queue 61.
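The behavior of the lock attempt and lock release instructions on one queue can be modeled in software as follows. This is a minimal single-threaded sketch, not the hardware mechanism itself: the class name `QueueLock` and the method names `try_lock` and `unlock` are hypothetical, and the exclusivity that the queue control mechanism 60 provides among processor elements is simply assumed, since each method here runs to completion without interruption.

```python
from collections import deque


class QueueLock:
    """Software model of one lock variable implemented as a token queue."""

    def __init__(self):
        # Tokens are processor element numbers, kept in order of arrival.
        self.tokens = deque()

    def try_lock(self, pe: int) -> bool:
        """Model of the lock attempt instruction.

        Inserts a token for `pe` if none is queued yet, then reports whether
        the token of `pe` is at the head (the value of the status flag).
        """
        if pe not in self.tokens:
            self.tokens.append(pe)
        return self.tokens[0] == pe

    def unlock(self, pe: int) -> None:
        """Model of the lock release instruction: remove the token of `pe`, if any."""
        if pe in self.tokens:
            self.tokens.remove(pe)


if __name__ == "__main__":
    lock = QueueLock()
    print(lock.try_lock(11))  # first requester reaches the head
    print(lock.try_lock(12))  # queued behind processor element 11
    lock.unlock(11)
    print(lock.try_lock(12))  # now at the head
```

Because each processor element has at most one token in a queue, the queue never holds more tokens than there are processor elements, which matches the capacity stated for the queues 61 to 69.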

[0047] Because the order in which locks were requested is recorded in the queue and locks are granted on a first-come, first-served basis, this embodiment has the advantage of providing a fairer exclusive control mechanism.

[0048] Referring to FIG. 4, the third embodiment of the high-speed lock mechanism has no special registers or the like for lock variables and instead uses a memory area in a cache memory (shared cache) 71 in the processor as the lock variables. An access arbitration mechanism 70 is provided between the cache memory 71 and the processor elements 11, 12, and 19, and arbitrates accesses to the same memory address. In addition to the conventional lock instruction for main memory, the processor has a special test-and-set instruction for locking a memory area in the cache 71. With the conventional lock instruction for main memory, the cache line is written back to main memory when the value of the lock variable is set, whereas the special test-and-set instruction is processed in the same way as an ordinary memory access instruction. That is, the cache line is not written back to main memory when a value is set in the lock variable. Therefore, as long as the lock variable is not evicted from the cache to main memory by ordinary cache operation, a lock operation can be performed at the same speed as an ordinary memory access instruction.
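How a user program might build a spin lock on such a test-and-set instruction can be sketched as follows. This is a software model under stated assumptions, not the hardware of FIG. 4: the class `CacheLockWord` and the function `run_demo` are hypothetical names, Python threads stand in for processor elements, and an internal `threading.Lock` stands in for the access arbitration mechanism 70 that serializes accesses to the same address.

```python
import threading
import time


class CacheLockWord:
    """Model of one lock variable held in the shared cache."""

    def __init__(self):
        self._value = 0
        # Stand-in for the access arbitration mechanism (assumption).
        self._arbiter = threading.Lock()

    def test_and_set(self) -> int:
        """Atomically set the word to 1 and return its previous value."""
        with self._arbiter:
            old = self._value
            self._value = 1
            return old

    def clear(self) -> None:
        """Release the lock by writing 0 to the word."""
        with self._arbiter:
            self._value = 0


def run_demo(num_threads: int = 4, iterations: int = 200) -> int:
    """Increment a shared counter from several threads under the spin lock."""
    word = CacheLockWord()
    shared = {"count": 0}

    def worker():
        for _ in range(iterations):
            while word.test_and_set() == 1:
                time.sleep(0)  # lock is held elsewhere: yield and retry
            shared["count"] += 1  # critical section
            word.clear()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run_demo())
```

A return value of 0 from `test_and_set` means the caller acquired the lock. The final count equals the number of threads times the number of iterations only if every increment ran under mutual exclusion.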

【００４９】[0049]

[Effects of the Invention] A first effect of the present invention is that a user-level interrupt mechanism that does not pass through the operating system can be realized. This reduces the overhead of thread switching and enables multi-thread execution at a finer granularity than before.

[0050] A second effect of the present invention is that exclusive control among processor elements can be realized without accessing main memory, which has a relatively slow response time. This reduces the overhead of thread management and inter-thread communication and enables multi-thread execution at a finer granularity than before.

[0051] These low-overhead effects broaden the range of applications of shared memory multiprocessor computers and make it possible to improve performance through multi-thread execution even for programs that have little coarse-grained parallelism.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating a configuration example of a computer in which the present invention is implemented.

FIG. 2 is a diagram showing a first embodiment of the present invention.

FIG. 3 is a diagram showing a second embodiment of the high-speed lock mechanism according to the present invention.

FIG. 4 is a diagram showing a third embodiment of the high-speed lock mechanism according to the present invention.

[Explanation of symbols]

1 Processor
2 Main memory
11-19 Processor elements
20 User handler register
21 Kernel handler register
22 Program counter
23 User save PC
24 Kernel save PC
25 Zero comparator
26 Counter
30 Lock variable set
40 Access arbitration mechanism
41 Interrupt control unit
50 Operating system
51 Kernel-level interrupt handler
52 Process scheduler
60 Queue control mechanism
61-69 Queues
70 Access arbitration mechanism
71 Cache memory
100 User process
101 Thread scheduler
102-104 Threads

Claims

[Claims]

1. A multi-thread computer system comprising: a user handler register that holds the entry address of a user-level interrupt handler; a user save PC, which is a register for saving a program counter value; and an interrupt control unit that, when an interrupt request signal is input, performs interrupt processing that saves the program counter value in the user save PC and sets the value of the user handler register as the new program counter, wherein preemption in the management of a plurality of threads in user space is performed using this interrupt mechanism.

2. The multi-thread computer system according to claim 1, further comprising: a down-counter whose value is decreased every clock; a zero comparator that outputs a signal when the counter value becomes zero; and an interrupt control unit that performs user-level interrupt processing in response to the zero-match output of the zero comparator.

3. A multi-thread computer system as described, comprising: a set of 1-bit storage devices accessible from each processor element; and a computer instruction set for operating on the value of each storage device exclusively among the processor elements.

4. The multi-thread computer system according to claim 1, further comprising: a set of queue structures that are accessible from each processor element and store, in order of arrival, tokens having identification numbers corresponding to the processor elements; a queue control mechanism that performs token addition, search, and deletion operations on the queue structures exclusively among the processor elements; and a computer instruction set for adding a token to, searching for a token in, or deleting a token from the queue structures.

5. The multi-thread computer system according to claim 1, further comprising: a cache memory shared by the processor elements; an access arbitration mechanism that arbitrates accesses to the same address in the cache memory among the processor elements; and a computer instruction set for operating exclusively on the value of a memory element in the cache memory.

6. A multi-thread execution control method in a multi-thread computer system that comprises a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, and in which one user process is divided into a plurality of threads that are assigned to the plurality of processor elements under the control of a thread scheduler in the user process, the method comprising: (a) setting, in a user handler register in a processor element to which a thread of the user process is assigned, the memory address at which the thread scheduler of the user process is located, and setting, in a counter in the processor element, the time quantum value to be assigned to the thread; (c) updating the value of the counter in the processor element at a constant period from the start of execution of the thread, and generating a user-level interrupt when the counter reaches a predetermined count value; and saving the current program counter value of the processor element that issued the interrupt request in a user save PC, and transferring control to the thread scheduler in the user process by setting the memory address held in the user handler register in the processor element in the program counter.

7. The multi-thread execution control method according to claim 6, wherein exclusive control among the plurality of threads executed by the plurality of processor elements is performed using a set of 1-bit storage devices that are accessible from each processor element through a computer instruction set provided in the processor and capable of operating on values exclusively among the processor elements.

8. The multi-thread execution control method according to claim 6, wherein exclusive control among the plurality of threads executed by the plurality of processor elements is performed using a set of queue structures in the processor that store, in order of arrival, tokens having identification numbers corresponding to the processor elements, the queue structures being accessible from each processor element through a computer instruction set capable of adding, searching for, or deleting tokens exclusively.

9. The multi-thread execution control method according to claim 6, wherein exclusive control among the plurality of threads executed by the plurality of processor elements is performed using a memory element in a shared cache memory that is accessible from each processor element through an access arbitration mechanism that arbitrates accesses to the same address among the processor elements, the memory element being accessible through a computer instruction set capable of operating on values exclusively among the processor elements.