JP2004030362A - Load module generation method, load module generation program, and load module generation device

Info

Publication number: JP2004030362A
Application number: JP2002187230A
Authority: JP
Inventors: Hideo Miyake; 三宅　英雄; Teruhiko Kamigata; 上方　輝彦; Kengo Azegami; 畔上　謙吾
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-06-27
Filing date: 2002-06-27
Publication date: 2004-01-29

Abstract

【課題】同一データを読み書きする複数の部分プログラムが、主記憶を共有する複数のプロセッサにそれぞれ割り当てられて実行されても、各プロセッサのキャッシュ・メモリと主記憶の間、あるいはキャッシュ・メモリ相互間においてキャッシュの一貫性が自動的に維持されるプログラムのロードモジュールを生成すること。
【解決手段】コンパイラにより、複数の部分プログラムが共有するデータを特定して所定の識別情報（「＿＿ｓｈｒ＿」などの接頭子）を付加しておき、リンカによるプログラムの結合時に、識別情報の付加されたデータだけを抽出して共有データ領域を形成する。この領域はプログラムの実行時、メモリ空間内でキャッシュ対象外として指定された領域に配置されるので、共有データはキャッシュ・メモリに複写されることがなく、常に主記憶上の唯一の値が参照・更新されることになる。
【選択図】　図４
[Problem] To generate a load module for a program in which cache coherency is automatically maintained, both between each processor's cache memory and the main memory and among the cache memories themselves, even when a plurality of partial programs that read and write the same data are assigned to and executed by a plurality of processors sharing a main memory.
[Solution] A compiler identifies the data shared by a plurality of partial programs and adds predetermined identification information (a prefix such as "__shr_") to it. When the linker combines the program, only the data carrying the identification information is extracted to form a shared data area. At execution time this area is placed in a region of the memory space designated as non-cacheable, so the shared data is never copied to cache memory, and the single value in main memory is always the one referenced and updated.
[Selected drawing] FIG. 4

Description

【０００１】
【発明の属する技術分野】
この発明は、複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成方法、ロードモジュール生成プログラムおよびロードモジュール生成装置に関する。
【０００２】
【従来の技術】
計算機システムにおいては、プログラム実行時の参照の局所性（ｌｏｃａｌｉｔｙ　ｏｆ　ｒｅｆｅｒｅｎｃｅ）を利用することでその処理能力の向上をはかるべく、プロセッサと主記憶との間にキャッシュ機構を設けることがある。
【０００３】
このキャッシュ機構は、主記憶よりは小さい容量だが高速に読み書き可能なキャッシュ・メモリを有し、プロセッサからの読み書き要求に応答して、キャッシュ・メモリあるいは主記憶の読み書きをおこなう。このとき、参照の局所性を活用すべく、一度読み書きした主記憶のメモリ領域の内容（値）をキャッシュ・メモリに複写しておく。複写されたメモリ領域に関しては、主記憶でなくキャッシュ・メモリを読み書きすればよいこととなり、処理の高速化が実現される。
【０００４】
ところで近年の計算機システムでは、複数のプロセッサを搭載し、プログラム中の各部分プログラムをこれらのプロセッサに配分することで処理能力の向上をはかる例がある。このような手法は「共有メモリ型マルチプロセッサ方式（Ｓｈａｒｅｄ−Ｍｅｍｏｒｙ　Ｍｕｌｔｉｐｒｏｃｅｓｓｏｒｓ）」と呼ばれている。そして、この共有メモリ型マルチプロセッサ方式においても、図１０に示すようにプロセッサごとにキャッシュ機構を設けることは、処理能力の向上をはかる上で有効である。
【０００５】
ただ、上記方式においては複数のキャッシュ・メモリを有することになるため、それぞれのキャッシュ・メモリと主記憶の間で、同一アドレスで特定されるメモリ領域の値が一致しなくなることがある。これは主記憶上の任意のメモリ領域に対する任意のプロセッサからの読み出しが、その領域に保持された最新の値を常に返すわけではないことを意味しており、「キャッシュの一致性問題（ｃａｃｈｅ　ｃｏｈｅｒｅｎｃｅ　ｐｒｏｂｌｅｍ）」として知られている。
【０００６】
そして、従来技術においてはこの問題は、「キャッシュ一貫性機構」と呼ばれる物理的な機構によりハードウエア的に解決されるのが通例であった。これは複数の部分プログラムにより読み書きされるデータ（以下では「共有データ」と総称する）の位置を監視し、更新前の古いデータがキャッシュされるのを防ぐキャッシュ一貫性プロトコル（ｃａｃｈｅ　ｃｏｎｓｉｓｔｅｎｃｙ　ｐｒｏｔｏｃｏｌ）にもとづいて、キャッシュの一貫性を保障するものである。
【０００７】
なお、図１１はキャッシュ一貫性機構によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。図中、ｔｅｘｔ　ａｒｅａ１１００はプログラムの命令列を保持する領域、ｄａｔａ　ａｒｅａ１１０１はプログラムから読み書きされるデータ（共有データであると否とを問わない）を保持する領域である。
【０００８】
そして、これら二つの領域はいずれもキャッシュ対象領域、すなわちキャッシュ・メモリに複写される可能性のある領域である。したがって、共有データは当該データを共有する部分プログラムを実行中の、複数のプロセッサのキャッシュ・メモリにそれぞれ複写される可能性があるが、上述のキャッシュ一貫性機構により、各々のキャッシュ・メモリの値（および主記憶の値）が常に一致するように制御されるわけである。
【０００９】
【発明が解決しようとする課題】
しかしながらこうしたハードウエア的な手法では、キャッシュ一貫性機構そのものが複雑であることから、プロセッサの回路規模が大きくなってしまうという問題点があった。
【００１０】
従来は、共有メモリ型マルチプロセッサ方式が採用されるのは主としてハイエンドの製品であったため、この点はあまり問題にならなかったが、今後一般向けのプリンタやデジタルカメラ、デジタルテレビなどに複数のプロセッサを搭載することを考えると、キャッシュの一貫性を保障するためだけにプロセッサが大きく／重くなったり、製品価格が上昇したりすることは避けなければならない。
【００１１】
この発明は上記従来技術による問題をソフトウエア的に解決するため、同一データを共有する複数の部分プログラムがそれぞれ別個のプロセッサにより実行されても、キャッシュの一貫性が自動的に維持されるようなロードモジュールを生成することが可能なロードモジュール生成方法、ロードモジュール生成プログラムおよびロードモジュール生成装置を提供することを目的とする。
【００１２】
【課題を解決するための手段】
上述した課題を解決し、目的を達成するため、この発明にかかるロードモジュール生成方法、ロードモジュール生成プログラムまたはロードモジュール生成装置は、複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成方法において、前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定し、そのようなデータであると判定されたデータには所定の識別情報を付加した上、この識別情報を付加されたデータを結合してメモリ空間内のキャッシュ非対象領域に配置するための領域を形成することを特徴とする。
【００１３】
また、この発明にかかるロードモジュール生成方法は、上記識別情報として共有データに所定の接頭子を付加することを特徴とする。
【００１４】
また、この発明にかかるロードモジュール生成方法は、上記識別情報として共有データの所属するセクションを指定する情報を付加することを特徴とする。
【００１５】
また、この発明にかかるロードモジュール生成方法、ロードモジュール生成プログラムまたはロードモジュール生成装置は、複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成方法において、前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定した上、そのようなデータであると判定されたデータに対してキャッシュ無効化操作を付加することを特徴とする。
【００１６】
また、この発明にかかるロードモジュール生成方法は、上記キャッシュ無効化操作として、共有データに対する読み出し命令の直前に、当該データのキャッシュのインヴァリデート命令を挿入することを特徴とする。
【００１７】
これらの発明によって生成されたロードモジュールの実行時には、複数の部分プログラムにより共有されるデータはそもそもプロセッサのキャッシュ・メモリに複写されないか、あるいは複写はされるが読み出し時には消去されているために、常に主記憶の値が参照・更新されることになる。
【００１８】
【発明の実施の形態】
以下に添付図面を参照して、この発明にかかるロードモジュール生成方法、ロードモジュール生成プログラムおよびロードモジュール生成装置の好適な実施の形態を詳細に説明する。
【００１９】
上述のように、従来技術ではキャッシュ一貫性機構というハードウエアを用いてキャッシュ・メモリ−主記憶間、あるいはキャッシュ・メモリ相互間のデータの同一性を保障していたのであるが、本発明においてはキャッシュの一致性問題を、もっぱらソフトウエア的に解決する方針である。キャッシュの一貫性を、プログラムを実行するプロセッサ側でなく、プロセッサで実行されるプログラム側から保障しようとするアプローチと言ってもよい。
【００２０】
そして、この方針に沿う解決として理論上は下記二つの手法が提案されている（ｃｆ．シメル、カート「ＵＮＩＸ（Ｒ）カーネル内部解析−キャッシュとマルチプロセッサの管理」ソフトバンク社）。
【００２１】
（１）プログラム内の共有データはそもそもキャッシュしないようにする、すなわち常に主記憶上の値を読み書きすることで、共有データのコピーが複数箇所に散在する状況を未然に防止する方式。以下では「アンキャッシュ共有データ手法」と呼ぶ。
【００２２】
ＭＭＵ（Ｍｅｍｏｒｙ　Ｍａｎａｇｅｍｅｎｔ　Ｕｎｉｔ）を備えたプロセッサの中には、メモリ空間内の任意の領域をキャッシュ・メモリに複写しない領域として指定できるものがある。たとえばＳＰＡＲＣでは、ＭＭＵのページ・テーブル・エントリのＣビット（キャッシュ可能ビット）をＯＦＦにすることで、任意の領域をキャッシュ対象外とすることができる。
【００２３】
この機能は主に、メモリ空間とは別にＩ／Ｏ空間を設けるのでなく、メモリ空間の一部をＩ／Ｏ空間として使用する場合に、当該領域をキャッシュ対象外とする必要があることから用意されているものである。Ｉ／Ｏについては、キャッシュがあると最新の値が読み込めない／書き込めないなどの不具合が生ずるため、処理速度を犠牲にしてもキャッシュをしない設定としておくことに合理性がある。
【００２４】
この機能をいわば転用して、プロセッサのメモリ空間上にキャッシュ非対象領域を設け、共有データはもっぱらこの領域に配置するようにすれば、各プロセッサのキャッシュ・メモリには非共有データ、すなわち当該プロセッサで実行中の部分プログラムに固有のデータしか蓄積されないことになり、共有データについてはキャッシュが存在しないので、常に主記憶上の値が読み書きされることになる。
【００２５】
図１は、アンキャッシュ共有データ手法によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。図中、ｔｅｘｔ　ａｒｅａ１００はプログラムの命令列を保持する領域、ｄａｔａ　ａｒｅａ１０１はプログラムから読み書きされるデータのうち、とくに非共有データを保持する領域である。そしてこれら二つの領域は、いずれもキャッシュ対象領域、すなわちキャッシュ・メモリに複写される可能性のある領域である。
【００２６】
これに対し、ｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２はプログラムから読み書きされるデータのうち、とくに共有データを保持する領域である。そして、この領域はキャッシュ非対象領域、すなわちキャッシュ・メモリに複写される可能性のない領域である。
【００２７】
以下で説明する実施の形態１および２は、この「アンキャッシュ共有データ手法」によりキャッシュの一貫性を保障しようとするものである。より具体的には上記を前提として、プログラムのロードモジュールの生成時に、メモリ空間内の異なる場所に配置される上記各領域をどうやって形成するかのその手順に関するものである。
【００２８】
（２）共有データも非共有データも区別なくキャッシュはするが、共有データについては読み出しの直前にキャッシュ・メモリ上のデータを無効化してしまうことで、常に主記憶上の値を読みに行き事実上キャッシュを無視する方式。以下では「選択的キャッシュ無効化操作手法」と呼ぶ。選択的と言うのは、非共有データについてはキャッシュの無効化をおこなわない点による。
【００２９】
キャッシュ機構には、ストア命令を実行しただけでキャッシュ・メモリの値とあわせて主記憶の値も同時に更新されるライトスルー・キャッシュと、キャッシュ・メモリにおける更新を主記憶にも反映させるには、ストア命令の後に改めてフラッシュ（ｆｌｕｓｈ）命令を実行しなければならないライトバック・キャッシュとの二種類がある。そして、キャッシュ機構がいずれのタイプであるかによって、上記の「無効化操作」の手順は異なる。
【００３０】
すなわちライトスルー型の場合は、主記憶には常に最新の値が保持されているので、共有データの読み出しに先立ってインヴァリデート（ｉｎｖａｌｉｄａｔｅ）命令を実行することで、自己のキャッシュ・メモリに残っているそのコピーを消去するだけでよい。これに対しライトバック型の場合は、まずフラッシュ命令を実行してキャッシュ・メモリ上の値で主記憶上の値を更新する、すなわち主記憶上の値を最新の値にアップデートした上で、インヴァリデート命令によりキャッシュを消去し主記憶上の値を読み込むという二段構えになる。
【００３１】
図２は、選択的キャッシュ無効化操作手法によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。図中、ｔｅｘｔ　ａｒｅａ２００はプログラムの命令列を保持する領域、ｄａｔａ　ａｒｅａ２０１はプログラムから読み書きされるデータのうち、とくに非共有データを保持する領域、ｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ２０２は共有データを保持する領域である。そして、これら三つの領域はいずれもキャッシュ対象領域である。
【００３２】
以下で説明する実施の形態３は、この「選択的キャッシュ無効化操作手法」によりキャッシュの一貫性を保障しようとするものである。より具体的には、プログラムのロードモジュールの生成時にあらかじめ共有データを特定しておき、当該データのロード命令の直前にキャッシュの無効化操作、すなわちライトスルー・キャッシュにおいてはインヴァリデート命令、ライトバック・キャッシュにおいてはフラッシュ命令＋インヴァリデート命令をそれぞれ挿入する、その手順に関するものである。
【００３３】
（実施の形態１）
図３は、本発明の実施の形態１にかかるロードモジュール生成装置のハードウエア構成の一例を示すブロック図である。
【００３４】
図中、まずＣＰＵ３０１は装置全体の制御を司る。ＲＯＭ３０２はブートプログラムなどを記憶している。ＲＡＭ３０３はＣＰＵ３０１のワークエリアとして使用される。ＨＤＤ３０４は、ＣＰＵ３０１の制御にしたがってＨＤ３０５に対するデータのリード／ライトを制御する。ＨＤ３０５は、ＨＤＤ３０４の制御にしたがって書き込まれたデータを記憶する。
【００３５】
ＦＤＤ３０６は、ＣＰＵ３０１の制御にしたがってＦＤ３０７に対するデータのリード／ライトを制御する。ＦＤ３０７は、ＦＤＤ３０６の制御にしたがって書き込まれたデータを記憶したり、記憶しているデータをＦＤＤ３０６の磁気ヘッドに読み取らせたりする。着脱可能な記録媒体としては、ＦＤ３０７のほかＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＣＤ−ＲＷ、ＭＯ、ＤＶＤ（Ｄｉｇｉｔａｌ　Ｖｅｒｓａｔｉｌｅ　Ｄｉｓｋ）、メモリカードなどが考えられる。
【００３６】
ディスプレイ３０８は、たとえばＣＲＴ、ＴＦＴ液晶ディスプレイ、プラズマディスプレイなどであって、カーソルやウィンドウをはじめ、文書、画像などの各種データを表示する。ネットワークＩ／Ｆ３０９は、イーサネット（Ｒ）ケーブル３１０を通じてＬＡＮに接続されるとともに、ＬＡＮと装置内部とのデータの送受信を司る。
【００３７】
キーボード３１１は、文字、数値、各種指示などの入力のためのキーを備え、装置内部へのデータの入力をおこなう。タッチパネル式の入力パッドやテンキーなどであってもよい。マウス３１２は、カーソルの移動や範囲選択などをおこなう。ポインティングデバイスとして同様の機能を備えるものであれば、トラックボール、ジョイスティック、十字キー、ジョグダイヤルなどであってもよい。なお、上記各部はバスまたはケーブル３００により接続されている。
【００３８】
次に、図４は本発明の実施の形態１にかかるロードモジュール生成装置の構成を機能的に示すブロック図である。同図に示す各機能部は、具体的には図３に示したＨＤ３０５、ＦＤ３０７などに格納されたプログラム、具体的にはコンパイラ、アセンブラおよびリンカの三つのプログラムを、ＣＰＵ３０１がＲＡＭ３０３に読み出して実行することにより実現される。
【００３９】
図中、４００〜４０４は上記プログラムのうちコンパイラ、４０５〜４０７はアセンブラ、４０８〜４１２はリンカによりそれぞれ実現される機能部である。なお、各機能部の機能についてはすぐ下に述べるフローチャートで説明する。
【００４０】
次に、図５は本発明の実施の形態１にかかるロードモジュール生成装置におけるロードモジュール生成処理の手順を示すフローチャートである。
【００４１】
上記装置ではまずコンパイラが起動され、当該コンパイラにより実現される第１解析部４００が、指定されたプログラムのソース記述を読み込んで字句解析および構文解析をおこなうとともに、当該プログラムをコンパイラの内部表現へと変換する（ステップＳ５０１）。
【００４２】
次に共有データ判定部４０１が、コンパイラの内部表現を走査することで、そこに含まれる個々のデータが部分プログラム間の共有データであるか否かを判定する。このとき、共有データと判定したデータにはその旨の識別子を付加する（ステップＳ５０２）。
【００４３】
次に共有データ識別情報付加部４０２が、コンパイラの内部表現を走査して、ステップＳ５０２で識別子を付加されたデータを検索する。そして検索された共有データに対し、その識別情報として、データ名の先頭に所定の接頭子を付加する（ステップＳ５０３）。なお、この接頭子としては具体的には文字列「＿＿ｓｈｒ＿」などが考えられる。
【００４４】
次に命令列生成部４０３が、コンパイラの内部表現にもとづいて、プログラムの動作を実現する命令列を生成し、当該命令列をコンパイラの内部情報に付加する（ステップＳ５０４）。
【００４５】
次にアセンブリ記述出力部４０４が、コンパイラの内部表現および付加されている命令列にもとづいて、上記プログラムのアセンブリ記述を出力する（ステップＳ５０５）。以上で、コンパイラによるソース記述からアセンブリ記述までの変換処理が終了する。
【００４６】
次に上記装置ではアセンブラが起動され、当該アセンブラにより実現される第２解析部４０５が、ステップＳ５０５でコンパイラのアセンブリ記述出力部４０４から出力されたアセンブリ記述を読み込んで、字句解析をおこなうとともにアセンブラの内部表現へと変換する（ステップＳ５０６）。
【００４７】
次にバイナリ・コード生成部４０６が、アセンブラの内部表現にもとづいてバイナリ・コード（命令コードを含む）を生成し、当該コードをアセンブラの内部情報に付加する（ステップＳ５０７）。
【００４８】
次にオブジェクト出力部４０７が、アセンブラの内部表現および付加されているバイナリ・コードにもとづいて、上記プログラムのオブジェクトを出力する（ステップＳ５０８）。以上で、アセンブラによるアセンブリ記述からオブジェクト・コードまでの変換処理が終了する。
【００４９】
次に上記装置ではリンカが起動され、当該リンカにより実現されるオブジェクト読み込み部４０８が、ステップＳ５０８でアセンブラのオブジェクト出力部４０７から出力されたオブジェクトをリンカの内部表現として読み込む（ステップＳ５０９）。
【００５０】
次に共有データ領域形成部４０９が、リンカの内部表現において、共有データである旨の識別情報（上述の「＿＿ｓｈｒ＿」など）を有するデータを検索する。そして、これらの共有データのみから構成される領域（図１に示したｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２）を形成し、リンカの内部表現として付加する（ステップＳ５１０）。
【００５１】
次にメモリ空間構築部４１０が、リンカの内部表現において、残った非共有データのみから構成される領域（同ｄａｔａ　ａｒｅａ１０１）と命令列のみからなる領域（同ｔｅｘｔ　ａｒｅａ１００）とを形成し、リンカの内部表現として付加する（ステップＳ５１１）。
【００５２】
次にアドレス解決部４１１が、リンカの内部表現において、ｔｅｘｔ　ａｒｅａ１００、ｄａｔａ　ａｒｅａ１０１、ｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２の各メモリ領域のアドレス解決をおこなう（ステップＳ５１２）。
【００５３】
次にロードモジュール出力部４１２が、リンカの内部表現にもとづいて、上記プログラムのロードモジュールを出力する（ステップＳ５１３）。以上で、リンカによるオブジェクト・コードの結合が終了し、ソースからロードモジュールまでのプログラムの変換処理が終了する。
【００５４】
以上説明した実施の形態１によれば、複数の部分プログラムにより共有されるデータにはそれと分かる識別情報（具体的には「＿＿ｓｈｒ＿」などの接頭子）が付され、リンク時にはまずこの共有データのみが抽出されてｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２が形成された後、残りの非共有データでｄａｔａ　ａｒｅａ１０１が、また残りの命令列でｔｅｘｔ　ａｒｅａ１００が、それぞれ形成される。
【００５５】
このように、ロードモジュールの生成時に共有データと非共有データとがあらかじめ異なる領域にまとめられているので、実行時には共有データすなわちｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２をキャッシュ非対象領域に、非共有データすなわちｄａｔａ　ａｒｅａ１０１をキャッシュ対象領域に、それぞれ配置するようにすれば、共有データが方々のプロセッサのキャッシュ・メモリに分散する事態が避けられ、これによりキャッシュの一貫性が維持される。
【００５６】
（実施の形態２）
さて、上述した実施の形態１では、リンカによるメモリ空間の構築に先立ってあらかじめ印を付けておいた共有データをいわば取り分けておくようにしたが、これは従来技術のリンカでは、共有データと非共有データとを別々にまとめるという発想がないためである。すなわち、データは単純にソースに記述された順序で結合されてゆくので、ステップＳ５１０で先に共有データだけを取り分けておかないと、続くステップＳ５１１では共有データと非共有データとの混在する領域が形成されてしまうためである。
【００５７】
ただし、従来技術によるリンカも、同一セクション内のデータをそれぞれまとめて一ブロックとし、各ブロックを異なるメモリ領域に配置する機能を備えている。そこで以下に説明する実施の形態２のように、アセンブラのセクション指定疑似命令を利用して、たとえば共有データ部分をセクションＡ、非共有データ部分をセクションＢ、のように指定しておけば、従来技術のリンカによるメモリ空間構築処理の枠内で、共有データと非共有データとをそれぞれ別個のブロックにまとめることができる。
【００５８】
実施の形態２にかかるロードモジュール生成装置のハードウエア構成は、図３に示した実施の形態１のそれと同一であるので説明を省略する。図６は、本発明の実施の形態２にかかるロードモジュール生成装置の構成を機能的に示すブロック図、図７は当該装置におけるロードモジュール生成処理の手順を示すフローチャートである。実施の形態１との差異は、図６には図４に示した共有データ領域形成部４０９に対応する機能部がないこと、これに伴って、図７には図５に示したステップＳ５１０に相当する処理がないことである。
【００５９】
また、実施の形態１の共有データ識別情報付加部４０２が、ステップＳ５０３で共有データに所定の接頭子を付加したのに対し、実施の形態２の共有データ識別情報付加部６０２は、ステップＳ７０３で共有データの直前にセクション指定疑似命令、たとえば「．ｓｅｃｔ　”ＳＨＡＲＥＤ　ＤＡＴＡ”」のような一行を挿入する。
【００６０】
そして、実施の形態２によるメモリ空間構築部６０９は、ステップＳ７１０で「ＳＨＡＲＥＤ　ＤＡＴＡ」セクションに所属するデータのみを集めてｓｈａｒｅｄ　ｄａｔａ　ａｒｅａ１０２とし、残りのセクション内のデータはまとめてｄａｔａ　ａｒｅａ１０１、命令列はまとめてｔｅｘｔ　ａｒｅａ１００とする。
【００６１】
この実施の形態２によれば、アセンブラのセクション指定疑似命令により共有データと非共有データとがあらかじめ分離されているので、プログラムの実行時に前者をキャッシュ非対象領域に、後者をキャッシュ対象領域に、それぞれ配置すれば、共有データが方々のプロセッサのキャッシュ・メモリに分散する事態が避けられ、これによりキャッシュの一貫性が維持される。
【００６２】
（実施の形態３）
さて、上述した実施の形態１および２では、プロセッサのメモリ空間上にキャッシュ対象外の領域を設けて、当該領域に共有データを配置することが前提である。したがって、共有データはそもそもキャッシュ・メモリに複写されないのであるが、以下で説明する実施の形態３のように、共有データについてもキャッシュはする設定にしておき、ただデータの読み込み時には手持ちのキャッシュを無効にして、常に主記憶までデータを読みに行くように制御してもよい。
【００６３】
実施の形態３にかかるロードモジュール生成装置のハードウエア構成は、図３に示した実施の形態１のそれと同一であるので説明を省略する。図８は、本発明の実施の形態３にかかるロードモジュール生成装置の構成を機能的に示すブロック図、図９は当該装置におけるロードモジュール生成処理の手順を示すフローチャートである。実施の形態１との差異は、図８には図４に示した共有データ領域形成部４０９に対応する機能部がないこと、これに伴って、図９には図５に示したステップＳ５１０に相当する処理がないことである。
【００６４】
また、図８には図４に示した共有データ識別情報付加部４０２に代えて、キャッシュ無効化操作付加部８０２が設けられており、これに伴って、図９のステップＳ９０３では接頭子の付加に代えて、このキャッシュ無効化操作付加部８０２によるキャッシュ無効化操作付加処理がおこなわれることになる。
【００６５】
キャッシュ無効化操作の付加とは、具体的には共有データに対するロード命令の直前に、ライトスルー・キャッシュの場合はキャッシュ・メモリ内の当該データのキャッシュ（厳密には、当該キャッシュを含むキャッシュ・ブロック）のインヴァリデート命令、ライトバック・キャッシュの場合はそのフラッシュ命令およびインヴァリデート命令を、それぞれ挿入することである。
【００６６】
この実施の形態３によれば、共有データも非共有データと同様にキャッシュはされるものの、共有データについて各プロセッサのキャッシュ・メモリに蓄積されているそのコピーは事実上使用されない。
【００６７】
すなわち非共有データの読み出しにおいては、キャッシュ・メモリ内にそのコピーがあればプロセッサは当該コピーを読み出すのであるが、共有データの読み出しにあたっては、その直前にインヴァリデート命令を実行して自ら当該コピーを消去してしまうので、キャッシュ・メモリを飛び越して常に主記憶まで当該データの本体を見に行くことになる。これにより、どのプロセッサも共有データについては主記憶上の同一アドレスを読み書きすることになり、キャッシュの一貫性が維持される。
【００６８】
なお、本実施の形態におけるロードモジュール生成方法は、あらかじめ用意されたプログラム（コンパイラ、アセンブラおよびリンカ）がパーソナルコンピュータ、ワークステーションなどの各種のコンピュータ上で実行されることにより実現されるが、このプログラムはＨＤ、ＦＤ、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤなどのコンピュータで読み取り可能な各種の記録媒体に記録され、当該記録媒体によって配布することができるほか、インターネットなどのネットワークを介して配布することも可能である。
【００６９】
（付記１）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成方法において、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定する共有データ判定工程と、
前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、所定の識別情報を付加する識別情報付加工程と、
前記識別情報付加工程で所定の識別情報を付加されたデータを結合してメモリ空間内のキャッシュ非対象領域に配置するための領域を形成する共有データ領域形成工程と、
を含んだことを特徴とするロードモジュール生成方法。
【００７０】
（付記２）前記識別情報付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報として所定の接頭子を付加することを特徴とする付記１に記載のロードモジュール生成方法。
【００７１】
（付記３）前記識別情報付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報としてその所属するセクションを指定する情報を付加することを特徴とする付記１に記載のロードモジュール生成方法。
【００７２】
（付記４）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成方法において、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定する共有データ判定工程と、
前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対してキャッシュ無効化操作を付加するキャッシュ無効化操作付加工程と、
を含んだことを特徴とするロードモジュール生成方法。
【００７３】
（付記５）前記キャッシュ無効化操作付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対する読み出し命令の直前に、当該データのキャッシュのインヴァリデート命令を挿入することを特徴とする付記４に記載のロードモジュール生成方法。
【００７４】
（付記６）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成プログラムにおいて、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定させる共有データ判定工程と、
前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、所定の識別情報を付加させる識別情報付加工程と、
前記識別情報付加工程で所定の識別情報を付加されたデータを結合してメモリ空間内のキャッシュ非対象領域に配置するための領域を形成させる共有データ領域形成工程と、
をコンピュータに実行させることを特徴とするロードモジュール生成プログラム。
【００７５】
（付記７）前記識別情報付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報として所定の接頭子を付加させることを特徴とする付記６に記載のロードモジュール生成プログラム。
【００７６】
（付記８）前記識別情報付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報としてその所属するセクションを指定する情報を付加させることを特徴とする付記６に記載のロードモジュール生成プログラム。
【００７７】
（付記９）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成プログラムにおいて、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定させる共有データ判定工程と、
前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対してキャッシュ無効化操作を付加させるキャッシュ無効化操作付加工程と、
をコンピュータに実行させることを特徴とするロードモジュール生成プログラム。
【００７８】
（付記１０）前記キャッシュ無効化操作付加工程では、前記共有データ判定工程で少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対する読み出し命令の直前に、当該データのキャッシュのインヴァリデート命令を挿入させることを特徴とする付記９に記載のロードモジュール生成プログラム。
【００７９】
（付記１１）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成装置において、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定する共有データ判定手段と、
前記共有データ判定手段により少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、所定の識別情報を付加する識別情報付加手段と、
前記識別情報付加手段により所定の識別情報を付加されたデータを結合してメモリ空間内のキャッシュ非対象領域に配置するための領域を形成する共有データ領域形成手段と、
を備えたことを特徴とするロードモジュール生成装置。
【００８０】
（付記１２）前記識別情報付加手段は、前記共有データ判定手段により少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報として所定の接頭子を付加することを特徴とする付記１１に記載のロードモジュール生成装置。
【００８１】
（付記１３）前記識別情報付加手段は、前記共有データ判定手段により少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに、識別情報としてその所属するセクションを指定する情報を付加することを特徴とする付記１１に記載のロードモジュール生成装置。
【００８２】
（付記１４）複数の部分プログラムにより構成され、前記各部分プログラムが複数のプロセッサによりそれぞれ実行されるプログラムのロードモジュールを生成するロードモジュール生成装置において、
前記プログラムに含まれる個々のデータが、前記複数の部分プログラムのうち少なくとも二つの部分プログラムにより読み書きされるデータであるか否かを判定する共有データ判定手段と、
前記共有データ判定手段により少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対してキャッシュ無効化操作を付加するキャッシュ無効化操作付加手段と、
を備えたことを特徴とするロードモジュール生成装置。
【００８３】
（付記１５）前記キャッシュ無効化操作付加手段は、前記共有データ判定手段により少なくとも二つの部分プログラムにより読み書きされるデータであると判定されたデータに対する読み出し命令の直前に、当該データのキャッシュのインヴァリデート命令を挿入することを特徴とする付記１４に記載のロードモジュール生成装置。
【００８４】
【発明の効果】
以上説明したように本発明によって生成されたロードモジュールの実行時には、複数の部分プログラムにより共有されるデータはそもそもプロセッサのキャッシュ・メモリに複写されないか、あるいは複写はされるが読み出し時には消去されているために、常に主記憶の値が参照・更新されることになり、これによって、同一データを共有する複数の部分プログラムがそれぞれ別個のプロセッサにより実行されても、キャッシュの一貫性が自動的に維持されるロードモジュールを生成することが可能なロードモジュール生成方法、ロードモジュール生成プログラムおよびロードモジュール生成装置が得られるという効果を奏する。
【図面の簡単な説明】
【図１】アンキャッシュ共有データ手法によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。
【図２】選択的キャッシュ無効化操作手法によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。
【図３】本発明の実施の形態１にかかるロードモジュール生成装置のハードウエア構成の一例を示すブロック図である。
【図４】本発明の実施の形態１にかかるロードモジュール生成装置の構成を機能的に示すブロック図である。
【図５】本発明の実施の形態１にかかるロードモジュール生成装置におけるロードモジュール生成処理の手順を示すフローチャートである。
【図６】本発明の実施の形態２にかかるロードモジュール生成装置の構成を機能的に示すブロック図である。
【図７】本発明の実施の形態２にかかるロードモジュール生成装置におけるロードモジュール生成処理の手順を示すフローチャートである。
【図８】本発明の実施の形態３にかかるロードモジュール生成装置の構成を機能的に示すブロック図である。
【図９】本発明の実施の形態３にかかるロードモジュール生成装置におけるロードモジュール生成処理の手順を示すフローチャートである。
【図１０】キャッシュ機構を備えた共有メモリ型マルチプロセッサ方式の構成を模式的に示す説明図である。
【図１１】従来技術のキャッシュ一貫性機構によりキャッシュの一貫性を保障する場合のメモリマップ例を示す説明図である。
【符号の説明】
３００　バスまたはケーブル
３０１　ＣＰＵ
３０２　ＲＯＭ
３０３　ＲＡＭ
３０４　ＨＤＤ
３０５　ＨＤ
３０６　ＦＤＤ
３０７　ＦＤ
３０８　ディスプレイ
３０９　ネットワークＩ／Ｆ
３１０　イーサネット（Ｒ）ケーブル
３１１　キーボード
３１２　マウス
４００，６００，８００　第１解析部
４０１，６０１，８０１　共有データ判定部
４０２，６０２　共有データ識別情報付加部
４０３，６０３，８０３　命令列生成部
４０４，６０４，８０４　アセンブリ記述出力部
４０５，６０５，８０５　第２解析部
４０６，６０６，８０６　バイナリ・コード生成部
４０７，６０７，８０７　オブジェクト出力部
４０８，６０８，８０８　オブジェクト読み込み部
４０９　共有データ領域形成部
４１０，６０９，８０９　メモリ空間構築部
４１１，６１０，８１０　アドレス解決部
４１２，６１１，８１１　ロードモジュール出力部
８０２　キャッシュ無効化操作付加部
[0001]
[Technical Field of the Invention]
The present invention relates to a load module generation method, a load module generation program, and a load module generation device for generating a load module of a program that is composed of a plurality of partial programs, each of which is executed by one of a plurality of processors.
[0002]
[Prior art]
In a computer system, a cache mechanism may be provided between a processor and a main memory in order to improve processing performance by exploiting locality of reference during program execution.
[0003]
This cache mechanism has a cache memory that is smaller than the main memory but can be read and written at high speed, and it reads and writes either the cache memory or the main memory in response to read/write requests from the processor. To exploit locality of reference, the contents (values) of a main-memory area that has been read or written once are copied to the cache memory. For a copied area, the processor then only needs to read and write the cache memory rather than the main memory, which speeds up processing.
[0004]
In recent computer systems, there are cases in which a plurality of processors are mounted and processing capacity is improved by distributing the partial programs of a program among these processors. This approach is called the "shared-memory multiprocessor" system. In such a shared-memory multiprocessor system too, providing a cache mechanism for each processor, as shown in FIG. 10, is effective in improving processing performance.
[0005]
However, because the above system has a plurality of cache memories, the values of the memory area identified by the same address may cease to match between each cache memory and the main memory. This means that a read by an arbitrary processor from an arbitrary memory area in the main memory does not always return the latest value held for that area, and it is known as the "cache coherence problem".
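The situation described above can be illustrated with a toy model. The following is only a sketch under stated assumptions: the `Cache` class, the address `0x100`, and the write-through behaviour are all hypothetical and chosen for illustration; they are not taken from the patent.

```python
class Cache:
    """A private write-through cache with no coherency mechanism (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # this processor's private copies

    def load(self, addr):
        if addr not in self.lines:        # miss: copy the value from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: return the private copy

    def store(self, addr, value):
        self.lines[addr] = value          # update the private copy
        self.memory[addr] = value         # write through to main memory


memory = {0x100: 0}
p0, p1 = Cache(memory), Cache(memory)

first = p0.load(0x100)   # processor 0 caches the value 0
p1.store(0x100, 1)       # processor 1 updates main memory, but not p0's cache
stale = p0.load(0x100)   # processor 0 still reads its old copy
```

After these steps main memory holds 1 while processor 0 continues to read 0 from its own cache, which is the mismatch the paragraph describes.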
[0006]
In the prior art, this problem has usually been solved in hardware by a physical mechanism called a "cache coherency mechanism". This mechanism guarantees cache coherency on the basis of a cache consistency protocol, which monitors the location of data read and written by a plurality of partial programs (hereinafter collectively referred to as "shared data") and prevents stale, pre-update data from being cached.
[0007]
FIG. 11 is an explanatory diagram showing an example of a memory map in the case where cache coherency is ensured by the cache coherency mechanism. In the figure, a text area 1100 is an area for holding a sequence of instructions of a program, and a data area 1101 is an area for holding data (whether or not it is shared data) read / written from the program.
[0008]
Both of these areas are cache target areas, that is, areas that may be copied to the cache memory. Shared data may therefore be copied to the cache memory of each of the processors executing the partial programs that share it, but the cache coherency mechanism described above ensures that the values in the individual cache memories (and the value in the main memory) always match.
[0009]
[Problems to be solved by the invention]
However, such a hardware approach has the problem that the circuit scale of the processor becomes large, because the cache coherency mechanism itself is complex.
[0010]
Until now the shared-memory multiprocessor system has been adopted mainly in high-end products, so this point has not been much of a problem. However, considering that multiple processors will in future be mounted in consumer printers, digital cameras, digital televisions and the like, it must be avoided that the processor becomes larger and heavier, or that the product price rises, solely in order to guarantee cache coherency.
[0011]
To solve the above problem of the prior art in software, an object of the present invention is to provide a load module generation method, a load module generation program, and a load module generation device capable of generating a load module in which cache coherency is automatically maintained even when a plurality of partial programs sharing the same data are each executed by separate processors.
[0012]
[Means for Solving the Problems]
In order to solve the above problem and achieve the object, the load module generation method, load module generation program, or load module generation device according to the present invention generates a load module of a program that is composed of a plurality of partial programs, each executed by one of a plurality of processors. It is characterized in that it determines whether each item of data included in the program is data read and written by at least two of the plurality of partial programs, adds predetermined identification information to the data so determined, and combines the data carrying this identification information to form an area to be placed in a non-cacheable region of the memory space.
[0013]
Further, a load module generation method according to the present invention is characterized in that a predetermined prefix is added to the shared data as the identification information.
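As a rough illustration of identification by prefix, the sketch below tags shared data names and then groups the tagged names into their own area. The prefix "__shr_" is the example given in the abstract; the function names `add_identification` and `form_areas`, the symbol names, and the representation of data as plain strings are hypothetical simplifications, not the patent's implementation.

```python
SHARED_PREFIX = "__shr_"  # example prefix mentioned in the abstract


def add_identification(symbols, shared):
    """Prepend the prefix to every data name that was judged to be shared."""
    return [SHARED_PREFIX + s if s in shared else s for s in symbols]


def form_areas(symbols):
    """Group prefixed names into a shared data area and the rest into a data area."""
    shared_area = [s for s in symbols if s.startswith(SHARED_PREFIX)]
    data_area = [s for s in symbols if not s.startswith(SHARED_PREFIX)]
    return {"data area": data_area, "shared data area": shared_area}


# Hypothetical data names; only frame_buf is assumed to be shared.
tagged = add_identification(["a_tmp", "frame_buf", "b_tmp"], {"frame_buf"})
areas = form_areas(tagged)
```

Here `areas["shared data area"]` contains only the prefixed name, so that area could later be placed in a non-cacheable region while the remaining data stays cacheable.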
[0014]
The load module generation method according to the present invention is characterized in that information for designating a section to which the shared data belongs is added as the identification information.
[0015]
Further, the load module generation method, load module generation program, or load module generation device according to the present invention generates a load module of a program that is composed of a plurality of partial programs, each executed by one of a plurality of processors, and is characterized in that it determines whether each item of data included in the program is data read and written by at least two of the plurality of partial programs, and adds a cache invalidation operation to the data so determined.
[0016]
Further, the load module generation method according to the present invention is characterized in that, as the cache invalidation operation, immediately before a read instruction for the shared data, an invalidate instruction for a cache of the data is inserted.
[0017]
When a load module generated by these inventions is executed, data shared by a plurality of partial programs is either never copied to the processor's cache memory in the first place, or is copied but has been erased by the time it is read, so the value in the main memory is always the one referenced and updated.
[0018]
[Best Mode for Carrying Out the Invention]
Hereinafter, preferred embodiments of a load module generation method, a load module generation program, and a load module generation device according to the present invention will be described in detail with reference to the accompanying drawings.
[0019]
As described above, the prior art guarantees the identity of data between cache memory and main memory, or among cache memories, by means of hardware called a cache coherency mechanism, whereas the present invention takes the policy of solving the cache coherence problem purely in software. It can be described as an approach that guarantees cache coherency not from the side of the processor executing the program, but from the side of the program executed by the processor.
[0020]
In theory, the following two methods have been proposed as solutions in line with this policy (cf. Schimmel, Curt, "UNIX (R) Kernel Internal Analysis: Cache and Multiprocessor Management", SoftBank Corp.).
[0021]
(1) A method in which shared data in a program is never cached in the first place; that is, the value in main memory is always read and written, which prevents copies of the shared data from being scattered across multiple locations. Hereinafter this is called the "uncached shared data method".
[0022]
Some processors having an MMU (Memory Management Unit) can designate an arbitrary area in a memory space as an area not copied to a cache memory. For example, in SPARC, an arbitrary area can be excluded from the cache by turning off the C bit (cacheable bit) of the page table entry of the MMU.
[0023]
This function exists mainly because, when part of the memory space is used as I/O space instead of providing an I/O space separate from the memory space, that region must be excluded from caching. For I/O, the presence of a cache causes problems such as the latest value not being read or written, so it is reasonable to disable caching even at the cost of processing speed.
[0024]
If this function is repurposed, so to speak, to provide a non-cacheable area in the processor's memory space, and shared data is placed exclusively in that area, then each processor's cache memory accumulates only non-shared data, that is, data specific to the partial program being executed on that processor. Since no cached copy of the shared data exists, the value in main memory is always read and written.
[0025]
FIG. 1 is an explanatory diagram showing an example of a memory map in a case where coherency of a cache is guaranteed by an uncached shared data method. In the drawing, a text area 100 is an area for holding a sequence of instructions of a program, and a data area 101 is an area for holding, in particular, non-shared data among data read / written from the program. Each of these two areas is a cache target area, that is, an area that may be copied to the cache memory.
[0026]
In contrast, the shared data area 102 holds, of the data read and written by the program, specifically the shared data. This area is a non-cacheable area, that is, an area that can never be copied to the cache memory.
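The memory map of FIG. 1 can be sketched as a small simulation in which accesses to the non-cacheable area bypass the cache. This is a minimal sketch only: the addresses, sizes, the `Region` and `Processor` classes, and the write-through behaviour are assumptions made for illustration and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    size: int
    cacheable: bool

    def contains(self, addr):
        return self.start <= addr < self.start + self.size


# Hypothetical layout mirroring FIG. 1; addresses and sizes are made up.
MEMORY_MAP = [
    Region("text area 100", 0x0000, 0x1000, cacheable=True),
    Region("data area 101", 0x1000, 0x1000, cacheable=True),
    Region("shared data area 102", 0x2000, 0x1000, cacheable=False),
]


def is_cacheable(addr):
    for region in MEMORY_MAP:
        if region.contains(addr):
            return region.cacheable
    raise ValueError(f"address {addr:#x} is not mapped")


class Processor:
    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.cache = {}       # private cache of this processor

    def load(self, addr):
        if not is_cacheable(addr):
            return self.memory[addr]      # non-cacheable: always read main memory
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        if is_cacheable(addr):
            self.cache[addr] = value
        self.memory[addr] = value         # write-through, for simplicity


memory = {0x2000: 0}
p0, p1 = Processor(memory), Processor(memory)
p0.load(0x2000)            # read shared data once; nothing is cached
p1.store(0x2000, 7)        # another processor updates it
latest = p0.load(0x2000)   # p0 sees the update because no private copy exists
```

Because the address 0x2000 falls in the non-cacheable region, neither processor ever holds a private copy of it, so every read returns the value in main memory.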
[0027]
Embodiments 1 and 2 described below seek to guarantee cache coherency by this "uncached shared data method". More specifically, given the above as a premise, they concern the procedure for forming, when a program's load module is generated, each of the above areas that are placed at different locations in the memory space.
[0028]
(2) A method in which shared and non-shared data are both cached without distinction, but for shared data the copy in cache memory is invalidated immediately before each read, so that the value in main memory is always fetched and the cache is in effect ignored. Hereinafter this is called the "selective cache invalidation operation method". It is "selective" in that the cache is not invalidated for non-shared data.
[0029]
There are two types of cache mechanism: a write-through cache, in which merely executing a store instruction updates the value in main memory at the same time as the value in cache memory, and a write-back cache, in which a flush instruction must additionally be executed after the store instruction for an update in cache memory to be reflected in main memory. The procedure for the "invalidation operation" above differs depending on which type the cache mechanism is.
[0030]
In the write-through case, main memory always holds the latest value, so it suffices to execute an invalidate instruction before reading the shared data, thereby erasing the copy remaining in the processor's own cache memory. In the write-back case, by contrast, the procedure has two stages: first a flush instruction is executed to update the value in main memory with the value in cache memory, that is, to bring main memory up to the latest value, and then an invalidate instruction erases the cached copy so that the value in main memory is read.
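The two invalidation procedures can be sketched with a toy cache model. The class below is an assumption made for illustration: the method names `flush`, `invalidate`, and `load_shared`, and the per-address granularity, are hypothetical (a real cache operates on cache blocks and uses processor-specific instructions).

```python
class Cache:
    """Toy private cache supporting write-through and write-back modes."""

    def __init__(self, memory, write_back=False):
        self.memory = memory          # shared main memory: address -> value
        self.write_back = write_back
        self.lines = {}               # private copies
        self.dirty = set()            # modified copies not yet written to memory

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)      # main memory is updated only on flush
        else:
            self.memory[addr] = value

    def flush(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

    def load_shared(self, addr):
        """Read shared data: flush first if write-back, then invalidate, then load."""
        if self.write_back:
            self.flush(addr)
        self.invalidate(addr)
        return self.load(addr)


# Write-through: invalidating before the read is enough.
memory = {0x10: 0}
p0, p1 = Cache(memory), Cache(memory)
p0.load(0x10)
p1.store(0x10, 5)
stale = p0.load(0x10)           # plain load returns the old private copy
fresh = p0.load_shared(0x10)    # invalidate + load returns the value in main memory

# Write-back: the processor's own update reaches main memory only when flushed.
wb_memory = {0x20: 0}
wb = Cache(wb_memory, write_back=True)
wb.store(0x20, 9)
before = wb_memory[0x20]        # main memory has not been updated yet
value = wb.load_shared(0x20)    # flush, invalidate, then read main memory
```

In the write-through case the plain load returns the stale copy while `load_shared` returns the current value; in the write-back case the flush step is what brings main memory up to date before the cached copy is discarded.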
[0031]
FIG. 2 is an explanatory diagram illustrating an example of a memory map in a case where cache coherency is ensured by a selective cache invalidation operation method. In the figure, a text area 200 is an area for holding a sequence of instructions of a program, a data area 201 is an area for holding, in particular, non-shared data among data read and written from the program, and a shared data area 202 is an area for holding shared data. These three areas are all cache target areas.
[0032]
The third embodiment described below seeks to guarantee cache coherency by this "selective cache invalidation operation method". More specifically, it concerns the procedure for identifying shared data in advance when a program's load module is generated and inserting a cache invalidation operation immediately before each load instruction for that data: an invalidate instruction for a write-through cache, or a flush instruction followed by an invalidate instruction for a write-back cache.
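The insertion step can be sketched as a pass over an instruction list. This is a simplified illustration under stated assumptions: instructions are modelled as hypothetical `(opcode, operand)` tuples, and the mnemonics "load", "flush", and "invalidate" are placeholders rather than real processor instructions or the patent's internal representation.

```python
def add_cache_invalidation(instructions, shared, write_back=False):
    """Insert invalidation operations immediately before loads of shared data."""
    out = []
    for op, operand in instructions:
        if op == "load" and operand in shared:
            if write_back:
                out.append(("flush", operand))    # write-back only: update main memory first
            out.append(("invalidate", operand))   # discard the cached copy
        out.append((op, operand))
    return out


# Hypothetical program: "counter" is shared, "local_tmp" is not.
program = [("load", "counter"), ("load", "local_tmp"), ("store", "counter")]
write_through = add_cache_invalidation(program, {"counter"})
write_back = add_cache_invalidation(program, {"counter"}, write_back=True)
```

Loads of non-shared data and all stores pass through unchanged, which reflects the "selective" nature of the method.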
[0033]
(Embodiment 1)
FIG. 3 is a block diagram illustrating an example of a hardware configuration of the load module generation device according to the first embodiment of the present invention.
[0034]
In the figure, first, a CPU 301 controls the entire apparatus. The ROM 302 stores a boot program and the like. The RAM 303 is used as a work area of the CPU 301. The HDD 304 controls reading / writing of data from / to the HD 305 under the control of the CPU 301. The HD 305 stores data written under the control of the HDD 304.
[0035]
The FDD 306 controls reading / writing of data from / to the FD 307 under the control of the CPU 301. The FD 307 stores the data written under the control of the FDD 306 and lets the magnetic head of the FDD 306 read the stored data. As the removable recording medium, in addition to the FD 307, a CD-ROM, a CD-R, a CD-RW, an MO, a DVD (Digital Versatile Disk), a memory card, and the like can be considered.
[0036]
The display 308 is, for example, a CRT, a TFT liquid crystal display, a plasma display, or the like, and displays various data such as a document and an image, in addition to a cursor and a window. The network I / F 309 is connected to the LAN via the Ethernet (R) cable 310 and controls transmission and reception of data between the LAN and the inside of the apparatus.
[0037]
The keyboard 311 includes keys for inputting characters, numerical values, various instructions, and the like, and performs input of data into the apparatus. It may be a touch panel type input pad or a numeric keypad. The mouse 312 is used to move a cursor, select a range, and the like. A trackball, a joystick, a cross key, a jog dial, or the like may be used as long as the pointing device has a similar function. Note that the above components are connected by a bus or a cable 300.
[0038]
Next, FIG. 4 is a block diagram functionally showing the configuration of the load module generation device according to the first embodiment of the present invention. Each functional unit shown in the figure is realized by the CPU 301 reading the programs stored on the HD 305 or the FD 307 shown in FIG. 3, specifically the three programs of the compiler, the assembler, and the linker, into the RAM 303 and executing them.
[0039]
In the figure, reference numerals 400 to 404 denote functional units realized by the compiler, 405 to 407 denote functional units realized by the assembler, and 408 to 412 denote functional units realized by the linker. The function of each unit is described with reference to the flowchart explained immediately below.
[0040]
Next, FIG. 5 is a flowchart illustrating a procedure of a load module generation process in the load module generation device according to the first embodiment of the present invention.
[0041]
In the above device, the compiler is started first, and the first analysis unit 400 realized by the compiler reads the source description of the specified program, performs lexical analysis and syntax analysis, and converts the program into the internal representation of the compiler (step S501).
[0042]
Next, the shared data determination unit 401 scans the internal representation of the compiler to determine whether individual data included therein is shared data between partial programs. At this time, an identifier to that effect is added to the data determined to be shared data (step S502).
[0043]
Next, the shared data identification information adding unit 402 scans the internal representation of the compiler and searches for the data to which the identifier was added in step S502. It then adds a predetermined prefix to the head of the data name as identification information for each shared data item found (step S503). The prefix may be a character string such as “__shr_”.
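Steps S502 and S503 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the compiler's internal representation is reduced to a mapping from each partial program to the set of data names it accesses; the names `find_shared`, `add_prefix`, `part_a`, and `part_b` are hypothetical and do not appear in the specification.

```python
SHARED_PREFIX = "__shr_"  # example prefix given in the text


def find_shared(accesses):
    """Return the data names accessed by at least two partial programs.

    `accesses` maps a partial-program name to the set of data names it
    reads or writes (a hypothetical stand-in for the compiler's
    internal representation).
    """
    counts = {}
    for names in accesses.values():
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    return {name for name, n in counts.items() if n >= 2}


def add_prefix(accesses):
    """Prepend SHARED_PREFIX to the name of every shared data item."""
    shared = find_shared(accesses)
    return {
        prog: {SHARED_PREFIX + n if n in shared else n for n in names}
        for prog, names in accesses.items()
    }


# Hypothetical example: two partial programs share only `counter`.
accesses = {
    "part_a": {"counter", "buf_a"},
    "part_b": {"counter", "buf_b"},
}
renamed = add_prefix(accesses)
print(sorted(renamed["part_a"]))
```

In this sketch only `counter` is renamed, so a later stage can recognize shared data from the name alone without consulting the compiler's analysis again.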
[0044]
Next, the instruction sequence generation unit 403 generates an instruction sequence that realizes the operation of the program based on the internal representation of the compiler, and adds the instruction sequence to the internal information of the compiler (step S504).
[0045]
Next, the assembly description output unit 404 outputs the assembly description of the program based on the internal representation of the compiler and the added instruction sequence (step S505). This completes the conversion from the source description to the assembly description by the compiler.
[0046]
Next, in the above apparatus, the assembler is started, and the second analysis unit 405 realized by the assembler reads the assembly description output from the assembly description output unit 404 of the compiler in step S505, performs lexical analysis, and converts it into the internal representation of the assembler (step S506).
[0047]
Next, the binary code generation unit 406 generates a binary code (including an instruction code) based on the internal representation of the assembler, and adds the code to the internal information of the assembler (step S507).
[0048]
Next, the object output unit 407 outputs an object of the program based on the internal representation of the assembler and the added binary code (step S508). This completes the conversion from the assembly description to the object code by the assembler.
[0049]
Next, in the above apparatus, a linker is activated, and the object reading unit 408 realized by the linker reads the object output from the object output unit 407 of the assembler in step S508 as an internal representation of the linker (step S509).
[0050]
Next, the shared data area forming unit 409 searches the internal representation of the linker for data carrying the identification information that marks it as shared data (such as the “__shr_” prefix described above). It then forms an area consisting only of this shared data (the shared data area 102 shown in FIG. 1) and adds it to the internal representation of the linker (step S510).
[0051]
Next, the memory space construction unit 410 forms, in the internal representation of the linker, an area consisting only of the remaining non-shared data (data area 101) and an area consisting only of the instruction sequence (text area 100), and adds them to the internal representation (step S511).
[0052]
Next, the address resolution unit 411 resolves the addresses of the memory areas of the text area 100, the data area 101, and the shared data area 102 in the internal representation of the linker (step S512).
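The linker-side steps S510 to S512 can be sketched as follows. This is a simplified illustration, not the linker described in the specification: the symbol tuples, the function `build_memory_space`, and the three base addresses are assumptions chosen only to make the example runnable.

```python
SHARED_PREFIX = "__shr_"

# Hypothetical base addresses; a real memory map depends on the target.
TEXT_BASE = 0x0000
DATA_BASE = 0x4000
SHARED_BASE = 0x8000  # assumed to lie in the non-cache target area


def build_memory_space(symbols):
    """Group symbols into areas and assign an address to each.

    `symbols` is a list of (name, kind, size) tuples, where kind is
    "text" or "data". Shared data is picked out first (as in step
    S510), then the remaining data and the instructions (step S511),
    and finally every symbol receives an address (step S512).
    """
    shared = [s for s in symbols
              if s[1] == "data" and s[0].startswith(SHARED_PREFIX)]
    data = [s for s in symbols
            if s[1] == "data" and not s[0].startswith(SHARED_PREFIX)]
    text = [s for s in symbols if s[1] == "text"]

    addresses = {}
    for base, area in ((TEXT_BASE, text), (DATA_BASE, data), (SHARED_BASE, shared)):
        offset = base
        for name, _kind, size in area:
            addresses[name] = offset
            offset += size
    return addresses


# Hypothetical input: shared and non-shared symbols arrive interleaved.
symbols = [
    ("main", "text", 64),
    ("buf_a", "data", 16),
    ("__shr_counter", "data", 4),
    ("worker", "text", 32),
    ("__shr_flag", "data", 4),
]
addresses = build_memory_space(symbols)
for name, addr in addresses.items():
    print(f"{name}: {addr:#06x}")
```

Although the shared symbols are interleaved with other symbols in the input, they end up contiguous at `SHARED_BASE`, which is what allows the whole area to be placed in a non-cache target region.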
[0053]
Next, the load module output unit 412 outputs the load module of the program based on the internal representation of the linker (step S513). Thus, the linking of the object code by the linker is completed, and the process of converting the program from the source to the load module is completed.
[0054]
According to the first embodiment described above, identification information (specifically, a prefix such as “__shr_”) is attached to data shared by a plurality of partial programs. At link time, only this shared data is extracted first to form the shared data area 102; the data area 101 is then formed from the remaining non-shared data, and the text area 100 from the instruction sequence.
[0055]
As described above, the shared data and the non-shared data are grouped into separate areas when the load module is generated. Therefore, by placing the shared data (shared data area 102) in the non-cache target area and the non-shared data (data area 101) in the cache target area at execution time, the shared data is prevented from being distributed among the cache memories of the processors, and cache coherency is maintained.
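The effect of placing shared data in a non-cache target area can be seen in a toy model. The sketch below assumes two processors with private write-through caches and no hardware coherency mechanism; the class `Processor`, the boundary `UNCACHED_BASE`, and the two addresses are hypothetical.

```python
UNCACHED_BASE = 0x8000  # hypothetical start of the non-cache target area


class Processor:
    """A processor with a private write-through cache (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # main memory shared by all processors
        self.cache = {}       # private copies: address -> value

    def load(self, addr):
        if addr >= UNCACHED_BASE:
            return self.memory[addr]  # bypass the cache entirely
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.memory[addr] = value     # write through to main memory
        if addr < UNCACHED_BASE:
            self.cache[addr] = value


CACHED_ADDR, UNCACHED_ADDR = 0x4000, 0x8000
memory = {CACHED_ADDR: 0, UNCACHED_ADDR: 0}
p0, p1 = Processor(memory), Processor(memory)

for addr in (CACHED_ADDR, UNCACHED_ADDR):
    p1.load(addr)      # p1 reads first, filling its cache where allowed
    p0.store(addr, 1)  # p0 then updates the value

stale = p1.load(CACHED_ADDR)    # p1 still sees its old cached copy
fresh = p1.load(UNCACHED_ADDR)  # p1 sees the value in main memory
print(stale, fresh)
```

In this model the value at the cached address goes stale for `p1`, while the value in the uncached range is always read from main memory, which corresponds to the behavior the first embodiment relies on.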
[0056]
(Embodiment 2)
In the first embodiment described above, the shared data marked in advance is separated out before the linker constructs the memory space. A prior-art linker does not do this, because it has no notion of combining shared data separately. It simply combines data in the order described in the source, so unless the shared data alone is separated first in step S510, an area in which shared and non-shared data are mixed would be formed in the subsequent step S511.
[0057]
However, a prior-art linker does have a function of collecting data that belongs to the same section into one block and placing each block in a different memory area. Therefore, if the shared data portion is designated as section A and the non-shared data portion as section B using a section specification pseudo instruction of the assembler, as in the second embodiment described below, shared data and non-shared data can be combined into separate blocks within the framework of the memory space construction processing of a prior-art linker.
[0058]
The hardware configuration of the load module generation device according to the second embodiment is the same as that of the first embodiment shown in FIG. 3. FIG. 6 is a block diagram functionally showing the configuration of the load module generation device according to the second embodiment of the present invention, and FIG. 7 is a flowchart showing the procedure of the load module generation process in the device. The difference from the first embodiment is that FIG. 6 has no functional unit corresponding to the shared data area forming unit 409 shown in FIG. 4 and, accordingly, FIG. 7 has no processing corresponding to step S510 shown in FIG. 5.
[0059]
Further, whereas the shared data identification information adding unit 402 according to the first embodiment adds a predetermined prefix to the shared data in step S503, the shared data identification information adding unit 602 according to the second embodiment inserts, in step S703, a section specification pseudo instruction, for example a line such as ‘.sect "SHARED DATA"’, immediately before the shared data.
[0060]
Then, in step S710, the memory space construction unit 609 according to the second embodiment collects only the data belonging to the “SHARED DATA” section to form the shared data area 102, collects the data in the remaining sections as the data area 101, and collects the instruction sequence as the text area 100.
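The section-based grouping of the second embodiment can be sketched in Python. The assembly dialect below is a simplified assumption: only the `.sect "NAME"` form comes from the text, while the `label: .word 0` syntax, the default section name `.data`, and the function `group_by_section` are hypothetical.

```python
def group_by_section(lines, default=".data"):
    """Collect data labels into blocks keyed by section name.

    Each `.sect "NAME"` line switches the current section; every other
    non-empty line is treated as a data definition whose first token is
    `label:`. The syntax is a simplified, hypothetical assembly dialect.
    """
    sections = {}
    current = default
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(".sect"):
            current = line.split('"')[1]  # text between the first quotes
            continue
        label = line.split(":", 1)[0]
        sections.setdefault(current, []).append(label)
    return sections


# Hypothetical compiler output with a section directive before shared data.
assembly = [
    'buf_a:   .word 0',
    '.sect "SHARED DATA"',
    'counter: .word 0',
    'flag:    .word 0',
    '.sect ".data"',
    'buf_b:   .word 0',
]
blocks = group_by_section(assembly)
print(blocks)
```

Because the grouping depends only on section names, an ordinary linker that already places each section in its own memory area could perform it without a dedicated shared data area forming step.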
[0061]
According to the second embodiment, the shared data and the non-shared data are separated in advance by the section specification pseudo instruction of the assembler. By placing the former in the non-cache target area and the latter in the cache target area when the program is executed, the shared data is prevented from being distributed among the cache memories of the processors, and cache coherency is maintained.
[0062]
(Embodiment 3)
The first and second embodiments assume that a non-cache target area is provided in the memory space of the processor and that the shared data is placed in that area, so the shared data is never copied to the cache memory in the first place. Alternatively, as in the third embodiment described below, the shared data may be made a cache target, with the existing cached copy invalidated whenever the data is read, so that the data is always read from main memory.
[0063]
The hardware configuration of the load module generation device according to the third embodiment is the same as that of the first embodiment shown in FIG. 3. FIG. 8 is a block diagram functionally showing the configuration of the load module generation device according to the third embodiment of the present invention, and FIG. 9 is a flowchart showing the procedure of the load module generation process in the device. The difference from the first embodiment is that FIG. 8 has no functional unit corresponding to the shared data area forming unit 409 shown in FIG. 4 and, accordingly, FIG. 9 has no processing corresponding to step S510 shown in FIG. 5.
[0064]
Also, in FIG. 8, a cache invalidation operation adding unit 802 is provided in place of the shared data identification information adding unit 402 shown in FIG. 4. Accordingly, in step S903 of FIG. 9, the cache invalidation operation adding unit 802 performs processing for adding a cache invalidation operation in place of the processing for adding shared data identification information.
[0065]
Specifically, a cache invalidation operation is added immediately before each load instruction for shared data. In the case of a write-through cache, an invalidate instruction for the copy of that data in the cache memory (strictly speaking, for the cache block containing it) is inserted; in the case of a write-back cache, a flush instruction and an invalidate instruction are inserted.
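The insertion performed in step S903 can be sketched as a pass over an instruction list. The representation of instructions as `(opcode, operand)` tuples, the opcode names `load`, `store`, `flush`, and `invalidate`, and the function `add_invalidation` are all assumptions made for illustration; real mnemonics depend on the target processor.

```python
def add_invalidation(instructions, shared, write_back=False):
    """Insert a cache invalidation before every load of shared data.

    `instructions` is a list of (opcode, operand) tuples and `shared`
    is the set of shared data names. The opcode names are hypothetical.
    """
    result = []
    for opcode, operand in instructions:
        if opcode == "load" and operand in shared:
            if write_back:
                # Write-back cache: write the block back before discarding it.
                result.append(("flush", operand))
            result.append(("invalidate", operand))
        result.append((opcode, operand))
    return result


# Hypothetical program in which only `counter` is shared.
program = [("load", "counter"), ("load", "buf_a"), ("store", "counter")]
wt_code = add_invalidation(program, {"counter"})
wb_code = add_invalidation(program, {"counter"}, write_back=True)
print(wt_code)
print(wb_code)
```

Loads of non-shared data such as `buf_a` are left untouched, so they continue to benefit from the cache.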
[0066]
According to the third embodiment, the shared data is cached similarly to the non-shared data, but the copy of the shared data stored in the cache memory of each processor is practically not used.
[0067]
In other words, when reading non-shared data, the processor reads the copy if one exists in the cache memory. When reading shared data, however, the invalidate instruction executed immediately beforehand erases the copy, so the processor bypasses the cache memory and always reads the data itself from main memory. As a result, all processors read and write the same address in main memory for the shared data, and cache coherency is maintained.
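The run-time behavior described here can be checked with a toy model of two processors that have private write-through caches and no hardware coherency mechanism. The class `Processor` and its method `load_shared`, which stands for an invalidate instruction followed by a load, are hypothetical names introduced for this sketch.

```python
class Processor:
    """A processor with a private write-through cache (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # main memory shared by all processors
        self.cache = {}       # private copies: name -> value

    def load(self, name):
        if name not in self.cache:
            self.cache[name] = self.memory[name]
        return self.cache[name]

    def store(self, name, value):
        self.cache[name] = value
        self.memory[name] = value  # write through to main memory

    def invalidate(self, name):
        self.cache.pop(name, None)  # discard the private copy, if any

    def load_shared(self, name):
        # Invalidate immediately before the load, as inserted at compile time.
        self.invalidate(name)
        return self.load(name)


memory = {"counter": 0}
p0, p1 = Processor(memory), Processor(memory)

p1.load("counter")      # p1 caches the initial value
p0.store("counter", 5)  # p0 updates the shared value

plain = p1.load("counter")            # ordinary load hits the stale copy
guarded = p1.load_shared("counter")   # invalidate + load reads main memory
print(plain, guarded)
```

The ordinary load returns the stale cached value, whereas the guarded load returns the value most recently written to main memory.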
[0068]
Note that the load module generation method according to the present embodiments is realized by executing prepared programs (a compiler, an assembler, and a linker) on a computer such as a personal computer or a workstation. These programs are recorded on a computer-readable recording medium such as an HD, FD, CD-ROM, MO, or DVD, and can be distributed via such a recording medium or via a network such as the Internet.
[0069]
(Supplementary Note 1) A load module generation method for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the method comprising:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
an identification information adding step of adding predetermined identification information to data determined in the shared data determination step to be data read and written by at least two partial programs; and
a shared data area forming step of combining the data to which the predetermined identification information has been added in the identification information adding step to form an area to be placed in a non-cache target area in a memory space.
[0070]
(Supplementary Note 2) The load module generation method according to Supplementary Note 1, wherein, in the identification information adding step, a predetermined prefix is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.
[0071]
(Supplementary Note 3) The load module generation method according to Supplementary Note 1, wherein, in the identification information adding step, information specifying the section to which the data belongs is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.
[0072]
(Supplementary Note 4) A load module generation method for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the method comprising:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
a cache invalidation operation adding step of adding a cache invalidation operation for data determined in the shared data determination step to be data read and written by at least two partial programs.
[0073]
(Supplementary Note 5) The load module generation method according to Supplementary Note 4, wherein, in the cache invalidation operation adding step, an invalidate instruction for the cache of the data is inserted immediately before a read instruction for data determined in the shared data determination step to be data read and written by at least two partial programs.
[0074]
(Supplementary Note 6) A load module generation program for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the load module generation program causing a computer to execute:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
an identification information adding step of adding predetermined identification information to data determined in the shared data determination step to be data read and written by at least two partial programs; and
a shared data area forming step of combining the data to which the predetermined identification information has been added in the identification information adding step to form an area to be placed in a non-cache target area in a memory space.
[0075]
(Supplementary Note 7) The load module generation program according to Supplementary Note 6, wherein, in the identification information adding step, a predetermined prefix is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.
[0076]
(Supplementary Note 8) The load module generation program according to Supplementary Note 6, wherein, in the identification information adding step, information specifying the section to which the data belongs is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.
[0077]
(Supplementary Note 9) A load module generation program for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the load module generation program causing a computer to execute:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
a cache invalidation operation adding step of adding a cache invalidation operation for data determined in the shared data determination step to be data read and written by at least two partial programs.
[0078]
(Supplementary Note 10) The load module generation program according to Supplementary Note 9, wherein, in the cache invalidation operation adding step, an invalidate instruction for the cache of the data is inserted immediately before a read instruction for data determined in the shared data determination step to be data read and written by at least two partial programs.
[0079]
(Supplementary Note 11) A load module generation device for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the device comprising:
shared data determination means for determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
identification information adding means for adding predetermined identification information to data determined by the shared data determination means to be data read and written by at least two partial programs; and
shared data area forming means for combining the data to which the predetermined identification information has been added by the identification information adding means to form an area to be placed in a non-cache target area in a memory space.
[0080]
(Supplementary Note 12) The load module generation device according to Supplementary Note 11, wherein the identification information adding means adds a predetermined prefix as the identification information to data determined by the shared data determination means to be data read and written by at least two partial programs.
[0081]
(Supplementary Note 13) The load module generation device according to Supplementary Note 11, wherein the identification information adding means adds, as the identification information, information specifying the section to which the data belongs to data determined by the shared data determination means to be data read and written by at least two partial programs.
[0082]
(Supplementary Note 14) A load module generation device for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the device comprising:
shared data determination means for determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
cache invalidation operation adding means for adding a cache invalidation operation for data determined by the shared data determination means to be data read and written by at least two partial programs.
[0083]
(Supplementary Note 15) The load module generation device according to Supplementary Note 14, wherein the cache invalidation operation adding means inserts an invalidate instruction for the cache of the data immediately before a read instruction for data determined by the shared data determination means to be data read and written by at least two partial programs.
[0084]
[Effects of the Invention]
As described above, when a load module generated according to the present invention is executed, data shared by a plurality of partial programs is either never copied to the cache memory of a processor in the first place, or is copied but erased when read. As a result, the value in main memory is always the one referenced and updated. The invention therefore provides a load module generation method, a load module generation program, and a load module generation device capable of generating a load module in which cache coherency is maintained automatically even when a plurality of partial programs sharing the same data are executed by different processors.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram showing an example of a memory map in a case where cache coherency is ensured by an uncached shared data method.
FIG. 2 is an explanatory diagram showing an example of a memory map in a case where cache coherency is ensured by a selective cache invalidation operation method;
FIG. 3 is a block diagram illustrating an example of a hardware configuration of the load module generation device according to the first embodiment of the present invention;
FIG. 4 is a block diagram functionally showing a configuration of the load module generation device according to the first embodiment of the present invention;
FIG. 5 is a flowchart illustrating a procedure of a load module generation process in the load module generation device according to the first embodiment of the present invention;
FIG. 6 is a block diagram functionally illustrating a configuration of a load module generation device according to a second embodiment of the present invention;
FIG. 7 is a flowchart illustrating a procedure of a load module generation process in the load module generation device according to the second embodiment of the present invention;
FIG. 8 is a block diagram functionally illustrating a configuration of a load module generation device according to a third embodiment of the present invention;
FIG. 9 is a flowchart illustrating a procedure of a load module generation process in the load module generation device according to the third embodiment of the present invention;
FIG. 10 is an explanatory diagram schematically showing a configuration of a shared memory multiprocessor system having a cache mechanism.
FIG. 11 is an explanatory diagram showing an example of a memory map in a case where cache coherency is ensured by a conventional cache coherency mechanism.
[Explanation of symbols]
300 Bus or cable
301 CPU
302 ROM
303 RAM
304 HDD
305 HD
306 FDD
307 FD
308 Display
309 Network I/F
310 Ethernet cable
311 Keyboard
312 Mouse
400, 600, 800 First analysis unit
401, 601, 801 Shared data determination unit
402, 602 Shared data identification information adding unit
403, 603, 803 Instruction sequence generation unit
404, 604, 804 Assembly description output unit
405, 605, 805 Second analysis unit
406, 606, 806 Binary code generation unit
407, 607, 807 Object output unit
408, 608, 808 Object reading unit
409 Shared data area forming unit
410, 609, 809 Memory space construction unit
411, 610, 810 Address resolution unit
412, 611, 811 Load module output unit
802 Cache invalidation operation adding unit

Claims

1. A load module generation method for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the method comprising:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
an identification information adding step of adding predetermined identification information to data determined in the shared data determination step to be data read and written by at least two partial programs; and
a shared data area forming step of combining the data to which the predetermined identification information has been added in the identification information adding step to form an area to be placed in a non-cache target area in a memory space.

2. The load module generation method according to claim 1, wherein, in the identification information adding step, a predetermined prefix is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.

3. The load module generation method according to claim 1, wherein, in the identification information adding step, information specifying the section to which the data belongs is added as the identification information to data determined in the shared data determination step to be data read and written by at least two partial programs.

4. A load module generation method for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the method comprising:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
a cache invalidation operation adding step of adding a cache invalidation operation for data determined in the shared data determination step to be data read and written by at least two partial programs.

5. The load module generation method according to claim 4, wherein, in the cache invalidation operation adding step, an invalidate instruction for the cache of the data is inserted immediately before a read instruction for data determined in the shared data determination step to be data read and written by at least two partial programs.

6. A load module generation program for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the load module generation program causing a computer to execute:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
an identification information adding step of adding predetermined identification information to data determined in the shared data determination step to be data read and written by at least two partial programs; and
a shared data area forming step of combining the data to which the predetermined identification information has been added in the identification information adding step to form an area to be placed in a non-cache target area in a memory space.

7. A load module generation program for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the load module generation program causing a computer to execute:
a shared data determination step of determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
a cache invalidation operation adding step of adding a cache invalidation operation for data determined in the shared data determination step to be data read and written by at least two partial programs.

8. A load module generation device for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the device comprising:
shared data determination means for determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs;
identification information adding means for adding predetermined identification information to data determined by the shared data determination means to be data read and written by at least two partial programs; and
shared data area forming means for combining the data to which the predetermined identification information has been added by the identification information adding means to form an area to be placed in a non-cache target area in a memory space.

9. A load module generation device for generating a load module of a program composed of a plurality of partial programs, the partial programs being executed respectively by a plurality of processors, the device comprising:
shared data determination means for determining whether each item of data included in the program is data read and written by at least two of the plurality of partial programs; and
cache invalidation operation adding means for adding a cache invalidation operation for data determined by the shared data determination means to be data read and written by at least two partial programs.