JP2004280766A

JP2004280766A - Intermediate code execution system

Info

Publication number: JP2004280766A
Application number: JP2003115033A
Authority: JP
Inventors: Akira Kobayashi; 章小林; Tetsuyuki Kobayashi; 哲之小林
Original assignee: Aplix Corp
Current assignee: Aplix Corp
Priority date: 2003-03-15
Filing date: 2003-03-15
Publication date: 2004-10-07

Abstract

Problem: To provide an intermediate code execution system capable of achieving extremely high performance when executing intermediate code in a hardware configuration that has an accelerator.
Solution: An intermediate code execution system 1 includes a processor 10, a main memory 12, a built-in memory 15 that is faster than the main memory 12, and a coprocessor 16 that speeds up execution of the intermediate code. Instructions that cannot use the coprocessor 16 are executed using a core module 21 and a sub-module 22a stored in the built-in memory 15.
[Selected drawing] Fig. 1

Description

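[Brief Description of the Drawings]
FIG. 1 is a schematic diagram of the intermediate code execution system according to the present embodiment.
FIG. 2 is a schematic diagram of the operation of executing a class file in the intermediate code execution system.
[Explanation of Reference Numerals]
1 Intermediate code execution system
10 CPU
12 Main memory
13 Processor core
14 Internal bus
15 Built-in memory
16 Coprocessor
17 External bus
21 Core module
22a, 22b Sub-modules
23 Class file
[0001]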
[Technical Field of the Invention]
The present invention relates to an intermediate code execution system that executes an intermediate code format program having a higher degree of abstraction than native code. More specifically, the present invention relates to an intermediate code execution system that executes a Java (registered trademark) class file.
[0002]
[Prior Art]
To provide programs that do not depend on a computer platform such as hardware or an OS, a method has been proposed in which a virtual machine (VM) is constructed on each platform by software or hardware techniques, and a program in an intermediate code format with a higher degree of abstraction than native code is executed on this virtual machine. One programming language that adopts this method is Java (registered trademark), which uses an intermediate code format called a class file. In the following, the hardware and the virtual machine constructed on that hardware may be referred to collectively as an intermediate code execution system.
[0003]
According to the above method, a single program code can be supplied to and executed on various platforms, so there is no need to prepare as many object codes as there are platforms. This not only simplifies program distribution but also makes software development more efficient. For this reason, virtual machines have been constructed on various computer platforms. More recently, virtual machines have also begun to be constructed on the processors of various processor-equipped electronic devices (hereinafter referred to as embedded devices).
[0004]
A known type of virtual machine is the interpreter type, which is constructed in software on a platform and sequentially interprets and executes the bytecode instructions contained in a class file.
[0005]
However, an interpreter-type virtual machine must extract bytecode instructions from the class file one by one and interpret their contents, and this process can become a system overhead. To reduce this overhead and improve performance, the JIT (Just-In-Time) compiler method, which compiles a class file into native code specific to each hardware platform before executing it, the AOT (Ahead-Of-Time) compiler method, and similar methods have been proposed. Attempts have also been made to construct the virtual machine in hardware, as in a Java (registered trademark) chip specially designed to execute bytecode instructions directly.
[0006]
Because compiler methods such as JIT and AOT execute the processor's native code, they can achieve higher performance than the interpreter method if only instruction execution speed is considered. However, a compiler method requires work memory for the compilation itself and an area for storing native code, which is 4 to 10 times larger than the class file, so it needs far more memory than the interpreter method. It is therefore not easy to apply to embedded devices, whose hardware resources are more tightly constrained than those of ordinary computers. Furthermore, if compilation starts only after execution of the class file has been requested, the compilation itself becomes an overhead and sufficient performance may not be obtained.
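As a rough illustration of the memory argument above, the following sketch estimates how much storage compiled native code might need, using the 4x to 10x expansion stated in the text. The 50 KB class-file size and the function name are assumptions chosen only for illustration.

```python
def native_code_size_range(bytecode_size, low_factor=4, high_factor=10):
    """Return (min, max) native-code size for a given bytecode size.

    The 4x-10x expansion factors come from the text; the function name
    and the sample size below are illustrative assumptions.
    """
    return bytecode_size * low_factor, bytecode_size * high_factor


# Hypothetical example: 50 KB of class files on an embedded device.
bytecode_kb = 50
low_kb, high_kb = native_code_size_range(bytecode_kb)
print(f"{bytecode_kb} KB of bytecode -> {low_kb} to {high_kb} KB of native code")
```

This figure does not include the work memory needed by the compiler itself, which the text names as a further cost.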
[0007]
The Java (registered trademark) chip described above can execute class files with high performance without compilation, but implementing the stack machine specified by Java (registered trademark) entirely in hardware is technically difficult, and even if it is achieved, higher cost is unavoidable. In addition, when a Java (registered trademark) chip is adopted, software development in other programming languages becomes impossible in principle, so existing program assets cannot be used.
[0008]
To improve class file execution performance while avoiding the problems that arise when such a compiler method or Java (registered trademark) chip is applied to embedded devices, hardware accelerator technology (hereinafter referred to as an accelerator), which assists class file execution in hardware, has been proposed. To speed up execution of the bytecode instructions contained in a class file, such accelerators include those that execute the bytecode instructions independently of the processor and those that convert the bytecode instructions into the processor's native code or microcode and pass the result to the processor. They are implemented as a coprocessor attached to an ordinary processor or as a hardware architecture added to an ordinary processor architecture.
[0009]
However, supporting every bytecode instruction in the accelerator increases the gate count and raises cost, among other drawbacks. As a result, an accelerator usually cannot speed up all bytecode instructions, and some bytecode instructions must be executed in software. In particular, coarse-grained bytecode instructions such as method calls are often not supported by the accelerator. Consequently, even though most bytecode instructions are accelerated, the few bytecode instructions that the accelerator does not support can prevent the desired performance from being obtained when a class file is executed.
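The effect described above can be made concrete with a simple cost model. The sketch below computes the expected cycles per bytecode instruction and the share of time spent in software-handled instructions. The instruction mix (5% unsupported) and the cycle counts (1 and 100) are hypothetical values, not figures from the source.

```python
def average_cycles(unsupported_fraction, accel_cycles, software_cycles):
    """Expected cycles per bytecode instruction for a given instruction mix."""
    supported_fraction = 1.0 - unsupported_fraction
    return (supported_fraction * accel_cycles
            + unsupported_fraction * software_cycles)


def software_share(unsupported_fraction, accel_cycles, software_cycles):
    """Fraction of total execution time spent in software-handled instructions."""
    total = average_cycles(unsupported_fraction, accel_cycles, software_cycles)
    return unsupported_fraction * software_cycles / total


# Hypothetical mix: 5% of instructions are unsupported; an accelerated
# instruction costs 1 cycle and a software-handled one costs 100 cycles.
avg = average_cycles(0.05, 1, 100)
share = software_share(0.05, 1, 100)
print(f"average cycles per instruction: {avg:.2f}")
print(f"share of time in software path: {share:.1%}")
```

Under these assumed numbers, the small unsupported fraction accounts for most of the execution time, which is why speeding up the software path matters.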
[0010]
[Patent Documents]
JP-A-2002-163116
US Pat. No. 6,332,215
US Pat. No. 6,338,160
[0011]
[Problems to Be Solved by the Invention]
The present invention has been made in view of the technical characteristics of accelerators described above, and its object is to provide an intermediate code execution system that can achieve extremely high performance when executing intermediate code in a hardware configuration that has an accelerator.
[0012]
[Means for Solving the Problems]
To solve the above problems, the present invention provides an intermediate code execution system that executes a program in an intermediate code format having a higher degree of abstraction than native code, the system comprising:
a processor, a main memory connected to the processor, an auxiliary memory that is connected to the processor and is faster than the main memory, and an accelerator that speeds up execution of some of the instructions included in the intermediate code format program,
wherein program code for executing those instructions, among the instructions included in the intermediate code format program, that cannot use the accelerator is stored in the auxiliary memory, and
wherein, when the intermediate code format program is executed, instructions that can use the accelerator are executed using the accelerator, and instructions that cannot use the accelerator are executed in software using the program code.
[0013]
With the above configuration, instructions that cannot use the accelerator are executed in software using program code stored in the high-speed auxiliary memory, so these instructions can be executed faster, which improves the performance of an intermediate code execution system equipped with an accelerator.
[0014]
In the present invention, the accelerator may, for example, be configured integrally with the processor or be a coprocessor connected to the processor. The accelerator may be one that executes the intermediate code format program independently of the processor, or one that converts some of the instructions in the intermediate code format into native code or microcode of the processor.
[0015]
When the intermediate code format program is a Java (registered trademark) class file, it is preferable that at least the instructions that call methods be executed using the program code stored in the auxiliary memory. This speeds up method call instructions, which are difficult to accelerate and tend to degrade performance, and makes it possible to provide an intermediate code execution system with better performance.
[0016]
Further, the program code stored in the auxiliary memory may have a function of calling a software module corresponding to the instruction to be executed when executing an instruction that cannot use the accelerator.
[0017]
In the above case, it is further preferable that the software module called when executing a method call instruction be stored in the auxiliary memory. This speeds up method call instructions, which are difficult for the accelerator to handle. It is even more preferable to optimize such a software module for the form of method that is called frequently. For example, the module can generally be optimized for calling a method that consists of bytecode instructions and does not require an additional stack frame.
[0018]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic diagram of an intermediate code execution system according to the present embodiment.
The intermediate code execution system 1 is configured to execute Java (registered trademark) class files. Its main hardware configuration includes a CPU 10 and a main memory 12 connected to the CPU 10 via an external bus 17. Although not shown, the system may further include input/output devices such as an LCD, an audio output device, and a key input device, an auxiliary storage device such as a card reader/writer, and a network interface for connecting to a predetermined communication network. When the intermediate code execution system 1 is built on an embedded device, hardware specific to that device may also be connected.
[0019]
The CPU 10 has a processor core 13, which implements the control and arithmetic functions of the CPU 10, and a built-in memory 15 (auxiliary memory) connected to the processor core 13 via an internal bus 14. A coprocessor 16, which functions as a Java (registered trademark) accelerator, and the main memory 12 are connected to the CPU 10. The main memory 12 is connected to the CPU 10 via the external bus 17 and consists of a RAM (for example, SRAM, DRAM, or SDRAM) forming a volatile area used as a work area and a ROM (for example, flash ROM) forming a non-volatile area in which programs are stored (not shown). The built-in memory 15 consists of volatile RAM (for example, SRAM). The internal bus 14 provided in the CPU 10 is wider than the external bus 17, so the CPU 10 can read and write the built-in memory 15 faster than the main memory 12. However, because the built-in memory 15 is located inside the CPU 10, it costs more than the main memory 12 and its capacity is limited.
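To see why a wider bus makes the built-in memory faster to access, the sketch below counts the bus transfers needed to move the same block of code over two bus widths. The 32-bit and 16-bit widths and the 256-byte block size are assumptions for illustration; the source states only that the internal bus is wider than the external bus.

```python
import math


def bus_transfers(num_bytes, bus_width_bits):
    """Number of bus transfers needed to move num_bytes over the given bus."""
    bytes_per_transfer = bus_width_bits // 8
    return math.ceil(num_bytes / bytes_per_transfer)


# Hypothetical figures: a 256-byte handler, a 32-bit internal bus,
# and a 16-bit external bus.
handler_bytes = 256
internal_transfers = bus_transfers(handler_bytes, 32)
external_transfers = bus_transfers(handler_bytes, 16)
print(f"internal bus: {internal_transfers} transfers")
print(f"external bus: {external_transfers} transfers")
```

A real comparison would also depend on clock rates and memory latency, which this sketch leaves out.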
[0020]
The coprocessor 16 is provided between the main memory 12 and the processor core 13 and speeds up execution of Java (registered trademark) class files in the intermediate code execution system 1. For example, when the intermediate code execution system 1 fetches a bytecode instruction from the main memory 12 and executes it, the coprocessor 16 can be configured to convert the fetched bytecode instruction into native code (machine language) or microcode specific to the CPU 10 and pass it to the CPU 10. In this case, the CPU 10 only needs to execute native code or microcode that the processor core 13 can execute directly, so bytecode instructions can be executed at extremely high speed. This embodiment describes the case where the coprocessor 16 converts bytecode instructions into microcode of the CPU 10 in this way.
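The conversion described above can be pictured as a table lookup from bytecode opcodes to micro-operation sequences. The following is a minimal sketch under stated assumptions: the opcode values are the standard JVM ones, but the micro-operation strings, the table, and the `translate` function are hypothetical and do not describe the actual coprocessor.

```python
# Standard JVM opcode values.
ILOAD_0, ILOAD_1, IADD, ISTORE_2 = 0x1A, 0x1B, 0x60, 0x3D
INVOKEVIRTUAL = 0xB6

# Hypothetical micro-operation sequences for the supported opcodes.
MICROCODE_TABLE = {
    ILOAD_0: ["push local[0]"],
    ILOAD_1: ["push local[1]"],
    IADD: ["pop b", "pop a", "push a+b"],
    ISTORE_2: ["pop local[2]"],
}


def translate(opcode):
    """Return the micro-op sequence for opcode, or None if it is unsupported."""
    return MICROCODE_TABLE.get(opcode)


for op in (ILOAD_0, IADD, INVOKEVIRTUAL):
    print(hex(op), "->", translate(op))
```

Here `invokevirtual` is deliberately absent from the table, so the lookup returns `None`, standing in for an instruction the accelerator cannot handle.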
[0021]
However, to avoid the high silicon cost of an increased gate count, the accelerator function of the coprocessor 16 does not support all bytecode instructions. Some bytecode instructions are executed by the core module 21, described later, calling the appropriate code from the sub-modules 22a and 22b. This embodiment shows the case where the accelerator function of the coprocessor 16 does not support several bytecode instructions, including coarse-grained method call instructions such as invokevirtual, invokespecial, invokestatic, and invokeinterface.
[0022]
Next, the software configuration of the intermediate code execution system 1 will be described. The intermediate code execution system 1 has a core module 21 stored in the built-in memory 15 and sub-modules 22a and 22b that are called from the core module 21. The sub-module 22a is stored in the built-in memory 15, and the sub-module 22b is stored in the main memory 12. The core module 21 is executed when a bytecode instruction that the coprocessor 16 does not support appears during execution of a Java (registered trademark) class file and control moves from the coprocessor 16 to the CPU 10. It selects, from the sub-modules 22a and 22b, the code that performs the processing corresponding to that bytecode instruction and calls it.
[0023]
Neither the core module 21 nor the sub-modules 22a and 22b needs to take the form of independent program code; each need only be a set of code that, when executed by the CPU 10, provides various functions including those described above. Such a set is appropriately interpreted as, for example, a set of code forming a functional unit that encompasses concepts such as functions, procedures, and subroutines.
[0024]
Of the sub-modules 22a and 22b, the sub-module 22a is called from the core module 21 when a bytecode instruction that calls a method, such as invokevirtual, invokespecial, invokestatic, or invokeinterface, is executed. In this embodiment, the sub-module 22a is stored in the built-in memory 15 together with the core module 21, so method call bytecode instructions, which the coprocessor 16 does not support, can be executed at high speed using code stored in the built-in memory 15.
[0025]
Only the sub-module 22a, which executes method calls, is stored in the built-in memory 15, and the sub-module 22b is stored in the main memory 12, in order to make effective use of the limited capacity of the built-in memory 15 and to efficiently speed up method calls, which strongly affect the performance of the intermediate code execution system 1. If the built-in memory 15 has capacity to spare, however, both the sub-module 22a and the sub-module 22b may be stored in the built-in memory 15.
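The placement policy above can be sketched as a priority-ordered fit into the limited built-in memory. In this minimal sketch, the module sizes, the capacities, and the `place_modules` function are hypothetical; the source gives no sizes and does not describe a placement algorithm.

```python
def place_modules(modules, fast_capacity):
    """Assign modules to "internal" or "main" memory.

    modules is a list of (name, size_in_bytes) in descending priority.
    A module goes to internal memory if it still fits; otherwise it
    falls back to main memory.
    """
    placement = {}
    remaining = fast_capacity
    for name, size in modules:
        if size <= remaining:
            placement[name] = "internal"
            remaining -= size
        else:
            placement[name] = "main"
    return placement


# Hypothetical sizes, listed in priority order.
modules = [
    ("core_module_21", 2048),
    ("sub_module_22a", 4096),
    ("sub_module_22b", 16384),
]

small = place_modules(modules, 8192)    # 22b does not fit
large = place_modules(modules, 32768)   # everything fits
print(small)
print(large)
```

With the smaller assumed capacity, only the core module and the method call sub-module land in internal memory, matching the arrangement of the embodiment. With the larger one, all three do, matching the alternative the text allows.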
[0026]
Although not shown, the intermediate code execution system 1 also has, in the main memory, a class loader that loads class files onto the heap, a verifier that verifies the loaded class files, a heap management module including a garbage collector that performs garbage collection on objects placed in the main memory 12, a thread management module that controls the flow of processing, a class library referenced as needed during class file execution, a JAR file reader, and so on. The corresponding modules of an ordinary software Java (registered trademark) VM may be used for these.
[0027]
The present invention does not preclude the existence of other software; the software configuration may include an operating system (OS), various applications, service programs, protocol stacks, device drivers, and the like. When the intermediate code execution system 1 is built on an embedded device, it may also include software for implementing functions specific to that device.
[0028]
Next, an outline of the operation of executing the class file 103 stored in the main memory 12 in the intermediate code execution system 1 configured as above will be described with reference to FIG. 2. The class file 103 may, for example, have been extracted from a JAR file (not shown) by the class loader or loaded from outside the intermediate code execution system 1. It is assumed that, before the following operation, the class file 103 has been verified as valid, linking (including creation of static fields and resolution of the constant pool) has been performed, and the class has been initialized.
[0029]
To execute the class file 103, the intermediate code execution system 1 is first initialized (STEP 101). This puts the coprocessor 16 in a state in which it can fetch bytecode instructions from the class file 103 and convert them to microcode, and subsequent operations are performed under the control of the coprocessor 16. The core module 21 and the sub-module 22a may be placed in the built-in memory in STEP 101 or may be placed there beforehand.
[0030]
Next, the coprocessor 16 fetches a bytecode instruction from the class file 103 (STEP 102) and determines whether the accelerator function of the coprocessor 16 supports the fetched bytecode instruction (STEP 103). If the instruction is determined to be supported, it is converted into the corresponding microcode of the CPU 10 (STEP 104), and the microcode is passed to the processor core of the CPU 10 and executed (STEP 105). STEP 103 and STEP 104 need not be strictly separate. The coprocessor 16 may instead attempt to convert the fetched bytecode instruction, convert it if possible, and, if it cannot be converted, output a result or flag indicating so and proceed to STEP 108 and the subsequent steps described below.
[0031]
If it is determined in STEP 103 that the accelerator function does not support the bytecode instruction, control moves from the coprocessor 16 to the CPU 10, and subsequent operations are performed by executing the respective code on the CPU 10. First, the core module 21 is executed on the CPU 10 and calls, from the sub-modules 22a and 22b, the code that performs the processing corresponding to the fetched bytecode instruction (STEP 108). Because the core module 21 is stored in the high-speed built-in memory 15, the CPU 10 can perform STEP 108 at high speed.
[0032]
Next, the code called in STEP 108 is executed by the CPU 10 (STEP 109), which performs the processing corresponding to the fetched bytecode instruction. If the fetched bytecode instruction is a method call instruction, the sub-module 22a that handles these instructions is stored in the high-speed built-in memory 15, so the CPU 10 can perform STEP 109 at high speed.
[0033]
After the bytecode instruction has been executed, in either case it is determined whether the executed instruction was the last one (STEP 106). If it was the last, execution of the class file 103 ends. If not, the target moves to the next bytecode instruction (STEP 107), and the processing from STEP 102 onward is repeated. The intermediate code execution system 1 executes the class file 103 through the operation outlined above.
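The flow from STEP 101 to STEP 109 can be sketched as a small simulation. This is an illustrative model only, not the patented implementation: the program is assumed to be a list of (opcode, operand) pairs, STEP 103 and STEP 104 are merged into one `try_translate` call as the text permits, and `handle_invoke` is a hypothetical stand-in for sub-module 22a that simply doubles the top of the stack. The opcode values are the standard JVM ones.

```python
ICONST_1, ICONST_2, IADD, INVOKESTATIC = 0x04, 0x05, 0x60, 0xB8


def try_translate(opcode):
    """STEP 103/104 merged: return a micro-op callable, or None if unsupported."""
    table = {
        ICONST_1: lambda stack: stack.append(1),
        ICONST_2: lambda stack: stack.append(2),
        IADD: lambda stack: stack.append(stack.pop() + stack.pop()),
    }
    return table.get(opcode)


def handle_invoke(stack, operand):
    """Hypothetical stand-in for sub-module 22a: doubles the top of the stack."""
    stack.append(stack.pop() * 2)


# Lookup performed by the stand-in for core module 21.
SOFTWARE_HANDLERS = {INVOKESTATIC: handle_invoke}


def run(program):
    stack, trace = [], []                      # STEP 101: initialization
    pc = 0
    while pc < len(program):                   # STEP 106: loop until the last instruction is done
        opcode, operand = program[pc]          # STEP 102: fetch
        micro_op = try_translate(opcode)       # STEP 103/104: check and convert
        if micro_op is not None:
            micro_op(stack)                    # STEP 105: execute converted code
            trace.append("accelerator")
        else:
            handler = SOFTWARE_HANDLERS[opcode]  # STEP 108: select handler
            handler(stack, operand)              # STEP 109: execute handler
            trace.append("software")
        pc += 1                                # STEP 107: next instruction
    return stack, trace


program = [(ICONST_1, None), (ICONST_2, None), (IADD, None), (INVOKESTATIC, 0)]
final_stack, path_trace = run(program)
print(final_stack, path_trace)
```

The trace records which path each instruction took, making it easy to see that only the method call instruction falls through to the software handlers.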
[0034]
With this operation, bytecode instructions that the accelerator function of the coprocessor 16 supports are executed at high speed. In addition, STEP 108, which handles bytecode instructions that the accelerator function does not support, is executed at high speed by the CPU 10, and for method call instructions STEP 109 is likewise executed at high speed. The execution speed of unsupported bytecode instructions (especially method call instructions) is therefore improved. As a result, in a hardware configuration that has an accelerator, the intermediate code execution system 1 of this embodiment can achieve extremely high performance by making effective use of the limited-capacity, high-speed built-in memory 15.
[0035]
In this embodiment, the average execution speed of method calls can be improved further by optimizing the sub-module 22a for the forms of method call that occur frequently. When the present inventors examined various class files, they found that frequently called methods are generally Java (registered trademark) methods rather than native methods and do not require an additional stack frame. The sub-module 22a can therefore be configured as follows: optimized code is prepared in the sub-module 22a in which the processing for calling such a method is expanded inline so that it runs without conditional branches or function calls; when the sub-module 22a is called, it determines whether the call matches the frequent pattern; if it does, the call is executed using the optimized code; and if it does not, the call is executed by the normal processing, which includes the usual conditional branches and function calls. Configuring the sub-module 22a in this way further improves the performance of the intermediate code execution system 1. If a method of a different pattern is called frequently, code in which the processing for calling that pattern is expanded inline can be prepared and the sub-module 22a configured in the same manner.
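The fast-path idea described above can be sketched as a single pattern check followed by one of two code paths. In this minimal sketch, the `Method` record, its field names, and the counters are hypothetical. In real handler code the fast path would be inlined native code rather than a Python branch, so the sketch shows only the selection logic.

```python
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    is_native: bool
    needs_new_frame: bool


def is_frequent_pattern(method):
    """Frequent pattern per the text: a Java (bytecode) method that needs no extra stack frame."""
    return not method.is_native and not method.needs_new_frame


def invoke_method(method, stats):
    """Pick the optimized path for the frequent pattern, else the general path."""
    if is_frequent_pattern(method):
        stats["fast"] += 1       # stand-in for the inlined, optimized code
        return "fast"
    stats["general"] += 1        # stand-in for the normal branching code
    return "general"


# Hypothetical call mix.
stats = {"fast": 0, "general": 0}
calls = [
    Method("getX", is_native=False, needs_new_frame=False),
    Method("nativeDraw", is_native=True, needs_new_frame=False),
    Method("compute", is_native=False, needs_new_frame=True),
]
paths = [invoke_method(m, stats) for m in calls]
print(paths, stats)
```

Supporting a second frequent pattern, as the text suggests, would mean adding another predicate and another dedicated path ahead of the general one.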
[0036]
The intermediate code execution system according to the embodiment of the present invention has been described above. However, the present invention is not limited to this embodiment, and various modifications and changes can be made without departing from the gist of the present invention. For example, the above-described embodiment uses the built-in memory 15 as the auxiliary memory, but a cache memory may be used as the auxiliary memory instead, or a DRAM may be used as the main memory and an SRAM as the auxiliary memory.
[0037]
Further, although the present embodiment uses the coprocessor 16 having the accelerator function, an intermediate code execution system with extremely high performance can also be obtained by applying the present invention, in substantially the same manner as in the above embodiment, to a configuration in which the accelerator is integrated with the processor core.
[0038]
Furthermore, when configuring an intermediate code execution system that executes Java (registered trademark) class files, the hardware and/or software can be designed on the basis of the Java (registered trademark) virtual machine specification (original title "The Java(TM) Virtual Machine Specification"), and the only restriction is that class files can be executed. The system can therefore be designed freely without departing from that specification or the gist of the present invention. Moreover, the present invention is not limited to intermediate code execution systems that execute Java (registered trademark) class files.
[0039]
[Effect of the invention]
According to the present invention, instructions that cannot use the accelerator are executed in software using program code stored in a high-speed auxiliary memory, so that those instructions are executed more quickly. This improves the performance of an intermediate code execution system having an accelerator.
[Brief description of the drawings]
FIG. 1 is a schematic diagram of an intermediate code execution system according to the present embodiment.
FIG. 2 is a schematic diagram of an operation of executing a class file in the intermediate code execution system.
[Explanation of symbols]
1 Intermediate code execution system
10 CPU
12 Main memory
13 Processor core
14 Internal bus
15 Internal memory
16 Coprocessor
17 External bus
21 Core module
22a, 22b Submodule
23 Class file

Claims

1. An intermediate code execution system that executes a program in an intermediate code format having a higher degree of abstraction than native code, comprising:
a processor, a main memory connected to the processor, an auxiliary memory faster than the main memory, and an accelerator for accelerating execution of some of the instructions included in the program in the intermediate code format,
wherein program code for executing those instructions, among the instructions included in the program in the intermediate code format, that cannot use the accelerator is stored in the auxiliary memory, and
wherein, when the program in the intermediate code format is executed, instructions that can use the accelerator are executed using the accelerator, and instructions that cannot use the accelerator are executed in software using the program code.

2. The intermediate code execution system according to claim 1, wherein the accelerator is configured integrally with the processor or is a coprocessor connected to the processor.

3. The intermediate code execution system according to claim 1, wherein the accelerator converts some instructions included in the intermediate code format into native code or microcode of the processor.

4. The intermediate code execution system according to any one of claims 1 to 3, wherein the program in the intermediate code format is a Java (registered trademark) class file, and at least an instruction for calling a method is executed by the program code.

5. The intermediate code execution system according to any one of claims 1 to 4, wherein, when executing an instruction that cannot use the accelerator, the program code calls a software module corresponding to the instruction to be executed.

6. The intermediate code execution system according to claim 5, wherein a software module called when executing an instruction for calling a method is stored in the auxiliary memory.

7. The intermediate code execution system according to claim 6, wherein the software module called when executing the instruction for calling the method is optimized for method calls of a frequently occurring form.

8. The intermediate code execution system according to claim 6, wherein the software module called when executing the instruction for calling the method is optimized for calling a method that consists of bytecode instructions and does not require a stack frame to be added.