JP2016181228A

JP2016181228A - Source code calculation device and source code calculation method

Info

Publication number: JP2016181228A
Application number: JP2015062541A
Authority: JP
Inventors: 淳也藤田; Junya Fujita; 輝昭酒田; Teruaki Sakata; 山田　勉; Tsutomu Yamada; 山田　　勉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2015-03-25
Filing date: 2015-03-25
Publication date: 2016-10-13

Abstract

【課題】ソフトウェアプログラムをより適切に解析する技術を提供する。【解決手段】ソフトウェアプログラムのソースコードを解析するためのソースコード演算装置であって、前記ソースコードの構文を解析し、前記ソフトウェアプログラムを、内部に分岐および合流を有しない命令の集合である基本ブロックと前記基本ブロック間の制御構造を示す有向グラフとで表わした制御フローグラフを生成する制御フローグラフ生成部と、前記ソフトウェアプログラムの構文の解析で値が定まらない部分を未定とした前記制御フローグラフに基づいて前記ソフトウェアプログラム内の所定の解決対象を探索し、前記制御フローグラフを用いて前記解決対象の到達定義を計算し、前記到達定義から得られる値を前記制御フローグラフに反映することにより前記制御フローグラフを再構成する解決部と、を有している。【選択図】図１[Problem] To provide a technology for more appropriately analyzing software programs. [Solution] A source code calculation device for analyzing source code of a software program, comprising: a control flow graph generation unit that analyzes the syntax of the source code and generates a control flow graph in which the software program is represented by basic blocks, which are sets of instructions with no internal branches or joins, and a directed graph showing the control structure between the basic blocks; and a resolution unit that searches for a predetermined solution target in the software program based on the control flow graph in which parts whose values are not determined by the analysis of the syntax of the software program are left undetermined, calculates a reaching definition of the solution target using the control flow graph, and reconstructs the control flow graph by reflecting a value obtained from the reaching definition in the control flow graph. [Selected Figure] Figure 1

Description

本発明は、マイクロコントローラ向けソフトウェアプログラムのソースコードを解析する技術に関する。 The present invention relates to a technique for analyzing a source code of a software program for a microcontroller.

近年ではマイクロコントローラ（以下「マイコン」と称す）の一部で生産終了するものが出ている。マイコンの生産終了に伴い、マイコンを内蔵する産業機器などで、生産終了となったマイコンから他のマイコンへの置き換えが進められている。そのために、あるマイコン用に設計された組み込みソフトウェアを他のマイコン用に再設計することが求められている。 In recent years, some microcontrollers (hereinafter referred to as “microcomputers”) have been discontinued. With the end of production of microcomputers, replacement of microcomputers that have been discontinued with other microcomputers is being promoted in industrial equipment with built-in microcomputers. Therefore, it is required to redesign embedded software designed for one microcomputer for another microcomputer.

生産終了となるマイコンを内蔵する製品の中にはソフトウェアプログラムの全部あるいは一部がアセンブリ言語で記述されたものがある。アセンブリ言語の仕様はマイコン製品によって異なるため、アセンブリ言語で記述されたソースコードは、Ｃ言語のような高水準言語のソースコードと異なり、マイコン間の移植性が低い。また、マイコン製品によってアーキテクチャが異なるため、ソフトウェアプログラムの再設計において、アセンブリ言語のソースコードは命令単位の単純な変換では対応できないことが殆どである。例えば、命令単位で変換の手順を一意に定めることは困難であり、フラグなどのマイコン内部処理、メモリマップ、周辺回路の互換などを含めてソースコードを一旦解析した上で、命令の集合が意味する処理内容を考慮しながら変換していくことになる。 Some products that have microcomputers that are no longer in production have software programs written entirely or partially in assembly language. Since the specification of the assembly language varies depending on the microcomputer product, the source code written in the assembly language is different from the source code of the high-level language such as C language, and portability between microcomputers is low. In addition, since the architecture varies depending on the microcomputer product, the assembly language source code cannot be handled by simple conversion in units of instructions in redesigning the software program. For example, it is difficult to uniquely define the conversion procedure in units of instructions, and the set of instructions means that after analyzing the source code once, including microcomputer internal processing such as flags, memory map, peripheral circuit compatibility, etc. Conversion is performed in consideration of the processing contents to be performed.

以上のような背景から、アセンブリ言語で記述されたソースコードの静的な解析、とりわけデータの流れ（データフロー）の解析が重要となる。データフローレベルでソースコードを解析することができれば、命令の集合が意味する処理内容を把握できる。しかし、特にデータフローを考慮してソースコードを修正するにはソースコードの熟読が必須であり、人手で作業する場合、高いスキルと多大な工数が必要とされる。そのためソフトウェアプログラムのデータフローレベルでの解析を自動化することができれば、大きな工数の削減が期待できる。 From the above background, static analysis of source code described in assembly language, especially analysis of data flow (data flow) is important. If the source code can be analyzed at the data flow level, it is possible to grasp the processing contents that the instruction set means. However, in order to modify the source code in consideration of the data flow in particular, it is essential to read the source code thoroughly. When working manually, a high skill and a large amount of man-hour are required. Therefore, if the analysis of the software program at the data flow level can be automated, a great reduction in man-hours can be expected.

特許文献１には、ソースコードからマシンコードへ変換する過程において、処理の並列化が可能であるかどうか判定し、可能であれば並列化されたマシンコードを生成するプログラム並列化支援装置が記載されている。特許文献１では、プログラムを並列化することが可能かどうかソースコードをデータフローレベルで解析し、並列化を阻害する要因を抽出している。 Patent Document 1 describes a program parallelization support apparatus that determines whether processing can be parallelized in the process of converting source code to machine code, and generates parallelized machine code if possible. Has been. In Patent Literature 1, source code is analyzed at the data flow level to determine whether or not a program can be parallelized, and factors that inhibit parallelization are extracted.

特開２００９−１２９１７９号公報JP 2009-129179 A

上述のように、あるマイコン向けに記述されたソースコードを他のマイコン向けに修正するためには、データフローレベルでソースコードを解析する必要がある。ソフトウェアプログラムのデータフローレベルでの解析にはＣＦＧ（制御フローグラフ）が用いられる。ＣＦＧは、ソフトウェアプログラムを、内部に分岐および合流を有しない基本ブロックと、基本ブロック間の分岐あるいは合流などの制御構造を示す有向グラフとによって表現したものである。ＣＦＧはソースコードを構文解析することにより得ることができる。 As described above, in order to correct a source code written for one microcomputer for another microcomputer, it is necessary to analyze the source code at the data flow level. CFG (control flow graph) is used for the analysis of the software program at the data flow level. The CFG represents a software program by a basic block that does not have branching and merging therein and a directed graph that indicates a control structure such as branching or merging between the basic blocks. The CFG can be obtained by parsing the source code.

ソフトウェアプログラムのデータフローレベルでの解析はＣＦＧを用いて到達定義を計算することで実現される。到達定義とは、各基本ブロックの入り口に到達するデータ定義命令の集合である。到達定義から命令間の依存関係やレジスタなどの変数が取り得る値の候補を知ることができる。それらの情報をソースコードの解析に活用することによりプログラムをデータフローレベルで解析することが可能となる。また、そのようなソフトウェアプログラムのデータフローレベルでの解析を可能にするためにＣＦＧが完全なものでなければならない。 Analysis of the software program at the data flow level is realized by calculating the arrival definition using CFG. The arrival definition is a set of data definition instructions that reach the entry of each basic block. From the arrival definition, it is possible to know a dependency relationship between instructions and possible values of a variable such as a register. By utilizing such information for source code analysis, it becomes possible to analyze the program at the data flow level. Also, the CFG must be complete to allow such software programs to be analyzed at the data flow level.

上述したようにＣＦＧはソースコードの構文解析により得られる。しかし、ソースコードの構文を解析しただけで完全なＣＦＧを得られない場合もある。例えば、ソースコードの中に間接アドレッシングモードの分岐命令が含まれている場合、ソースコードの構文解析だけで完全なＣＦＧを得ることはできない。間接アドレッシングモードではソースコードを構文解析しても実際のアドレス番地が直接得られないからである。実際のアドレス番地が分からなければ、分岐命令の分岐先を判定することは難しい。分岐命令の分岐先が判明しなければＣＦＧを完成させることができない。 As described above, CFG is obtained by parsing source code. However, in some cases, a complete CFG cannot be obtained just by analyzing the syntax of the source code. For example, when a branch instruction in the indirect addressing mode is included in the source code, a complete CFG cannot be obtained only by parsing the source code. This is because in the indirect addressing mode, even if the source code is parsed, the actual address address cannot be obtained directly. If the actual address is not known, it is difficult to determine the branch destination of the branch instruction. The CFG cannot be completed unless the branch destination of the branch instruction is known.

特許文献１にはプログラムの並列化の可能性を判断する方法が説明されている。その方法というのは、ＣＦＧを解析してＣ言語のポインタ情報を取得し、得られたポインタ情報を並列化が可能か否かを判定するための指標として利用するというものである。このポインタが、間接アドレッシングモードによる命令と類似する処理を実現するものである。つまり、ポインタが示す変数の内容をアドレス値とし、そのアドレス値が示すメモリ上の番地にあるデータを参照したり、更新したりするという処理が実現される。その際、特許文献１に記載された技術では、ソースコードから直接得たＣＦＧのみを利用してポインタを解析し、ポインタに依存関係があるかどうか判定する。しかし、特許文献１の技術は並列化の可能性を判定することを目的としているため、特定の命令を実行したときにポインタが示すアドレス値まで考慮する必要はない。 Patent Document 1 describes a method for determining the possibility of parallelization of a program. The method is to analyze CFG to obtain C language pointer information and use the obtained pointer information as an index for determining whether or not parallelization is possible. This pointer realizes processing similar to an instruction in the indirect addressing mode. That is, a process is realized in which the content of the variable indicated by the pointer is set as an address value, and data at the address on the memory indicated by the address value is referred to or updated. At that time, in the technique described in Patent Document 1, the pointer is analyzed using only the CFG obtained directly from the source code, and it is determined whether the pointer has a dependency. However, since the technique of Patent Document 1 is intended to determine the possibility of parallelization, it is not necessary to consider the address value indicated by the pointer when a specific instruction is executed.

一方、あるマイコンの組み込みソフトウェアを他のマイコン用に修正する場合、データフローレベルでソースコードを解析し、参照あるいは更新するデータのアドレス値を取得し、新たなマイコンにおいてもそのアドレスのデータを参照あるいは更新するようにソフトウェアプログラムを修正する必要がある。また、アドレス値はマイコン毎に固有であるため必然的に置き換えの対象となることがある。そのためソフトウェアプログラムの再設計で命令を新しく置き換えるためにはアドレス値の算出が不可欠である。 On the other hand, when modifying the embedded software of one microcomputer for another microcomputer, the source code is analyzed at the data flow level, the address value of the data to be referenced or updated is obtained, and the data at that address is also referenced in the new microcomputer. Or the software program needs to be modified to update. In addition, since the address value is unique to each microcomputer, it may inevitably be replaced. For this reason, it is indispensable to calculate an address value to replace a new instruction by redesigning the software program.

しかし、特許文献１に開示された処理では具体的なアドレス値を導出しないため、ポインタが示すアドレス値がＣＦＧに影響するような場合には完全なＣＦＧを得ることができない。 However, since the specific address value is not derived in the process disclosed in Patent Document 1, a complete CFG cannot be obtained when the address value indicated by the pointer affects the CFG.

このように、あるマイコンのソフトウェアプログラムをアーキテクチャが異なる他のマイコン用に書き換えるには、特許文献１の技術では不十分であり、間接アドレッシングモードなどを含め、ソフトウェアプログラムをより適切に解析する技術が必要である。 As described above, the technique of Patent Document 1 is insufficient to rewrite a software program of a microcomputer for another microcomputer having a different architecture, and there is a technique for analyzing a software program more appropriately including an indirect addressing mode. is necessary.

本発明の目的は、ソフトウェアプログラムをより適切に解析する技術を提供することである。 An object of the present invention is to provide a technique for more appropriately analyzing a software program.

本発明の一態様によるソースコード演算装置は、ソフトウェアプログラムのソースコードを解析するためのソースコード演算装置であって、前記ソースコードの構文を解析し、前記ソフトウェアプログラムを、内部に分岐および合流を有しない命令の集合である基本ブロックと前記基本ブロック間の制御構造を示す有向グラフとで表わした制御フローグラフを生成する制御フローグラフ生成部と、前記ソフトウェアプログラムの構文の解析で値が定まらない部分を未定とした前記制御フローグラフに基づいて前記ソフトウェアプログラム内の所定の解決対象を探索し、前記制御フローグラフを用いて前記解決対象の到達定義を計算し、前記到達定義から得られる値を前記制御フローグラフに反映することにより前記制御フローグラフを再構成する解決部と、を有している。 A source code arithmetic device according to an aspect of the present invention is a source code arithmetic device for analyzing a source code of a software program, analyzing the syntax of the source code, and branching and merging the software program internally. A control flow graph generation unit that generates a control flow graph represented by a basic block that is a set of instructions that do not have and a directed graph that indicates a control structure between the basic blocks, and a portion whose value is not determined by analyzing the syntax of the software program A predetermined solution object in the software program is searched based on the control flow graph in which undetermined is determined, the arrival definition of the solution object is calculated using the control flow graph, and the value obtained from the arrival definition is Reconstruct the control flow graph by reflecting it in the control flow graph Has a that solves part.

本発明によれば、ソフトウェアプログラムを構文解析して得た制御フローグラフで未定値を生じさせている解決対象を探索し、その解決対象を制御フローグラフを用いて解決し、その解析で得た値を制御フローグラフに反映させることにより、未定値となっている部分を決定していく。構文解析では明らかとならない部分を制御フローグラフを用いて解決していくので、ソフトウェアプログラムをより適切に解析することが可能となる。 According to the present invention, the control flow graph obtained by parsing the software program is searched for a solution target causing an undetermined value, and the solution target is solved using the control flow graph, and obtained by the analysis. By reflecting the value in the control flow graph, the portion that is undetermined is determined. Since the part that is not clarified by the syntax analysis is solved using the control flow graph, the software program can be analyzed more appropriately.

本実施形態によるソースコード解析装置１００の基本的構成を示すブロック図である。It is a block diagram which shows the basic composition of the source code analysis apparatus 100 by this embodiment. ソースコードとそれを基に生成したＣＦＧの一例を示す図である。It is a figure which shows an example of CFG produced | generated based on the source code and it. ソースコードとそれを基に生成したＣＦＧの一例を示す図である。It is a figure which shows an example of CFG produced | generated based on the source code and it. アドレス値解決部ＢＬ１０２のブロック図である。It is a block diagram of address value resolution part BL102. 本実施形態によるソースコード解析装置の処理を示すフローチャートである。It is a flowchart which shows the process of the source code analyzer by this embodiment. ＣＦＧ管理テーブルＯＢ１０３のイメージを示す図である。It is a figure which shows the image of CFG management table OB103. 図４のステップＳ４０４からＳ４１０までの一連の処理を詳細化したフローチャートである。It is the flowchart which detailed the series of processes from step S404 of FIG. 4 to S410. 分岐にける探索のイメージを示す図である。It is a figure which shows the image of the search in a branch. ＣＦＧオブジェクトが変換される様子を例示する図である。It is a figure which illustrates a mode that a CFG object is converted. プログラム解析処理部ＢＬ１０３による解析処理のフローチャートである。It is a flowchart of the analysis process by program analysis process part BL103. 人間が容易に解読可能な形式となった解析結果が画面に表示されている様子を示す図である。It is a figure which shows a mode that the analysis result used as the format which a human being can decipher easily is displayed on the screen. コンピュータがバッチ処理することができる形式の解析結果を示す図である。It is a figure which shows the analysis result of the format which a computer can batch-process. 他の実施形態によるソースコード解析装置の一部を示すブロック図である。It is a block diagram which shows a part of source code analysis apparatus by other embodiment. 他の実施形態によるソースコード解析装置の解析処理のフローチャートである。It is a flowchart of the analysis process of the source code analyzer by other embodiment.

以下、図面を用いて本実施形態によるソースコード解析装置について説明する。なお、本ソースコード解析装置は所定のソースコード解析方法を実現する装置であり、例えば、あるマイコン用のソフトウェアプログラムを他のマイコン用に再設計するのに利用可能である。 Hereinafter, the source code analysis apparatus according to the present embodiment will be described with reference to the drawings. The source code analysis apparatus is an apparatus that realizes a predetermined source code analysis method, and can be used, for example, to redesign a software program for a certain microcomputer for another microcomputer.

図１は、本実施形態によるソースコード解析装置１００の基本的構成を示すブロック図である。図１を参照すると、ソースコード解析装置１００は、ＣＦＧ生成部ＢＬ１０１と、アドレス値解決部ＢＬ１０２と、ＣＦＧ管理テーブルＯＢ１０３と、プログラム解析処理部ＢＬ１０３と、プログラム最適化処理部ＢＬ１０４を有している。 FIG. 1 is a block diagram showing a basic configuration of a source code analysis apparatus 100 according to the present embodiment. Referring to FIG. 1, the source code analysis apparatus 100 includes a CFG generation unit BL101, an address value resolution unit BL102, a CFG management table OB103, a program analysis processing unit BL103, and a program optimization processing unit BL104. .

ＣＦＧ生成部ＢＬ１０１は、アセンブリ言語等で記述されたソースコードＯＢ１０１を入力とし、そのソースコードＯＢ１０１の構文を解析して制御フローグラフ（ＣＦＧ）を生成する。生成されたＣＦＧがＣＦＧオブジェクトＯＢ１０２となる。ＣＦＧは、ソースコードＯＢ１０１のの構文を解析して生成されるグラフデータであり、そのデータ構造は、内部に分岐および合流を有しない基本ブロックと、基本ブロック間のデータの流れを示す有向グラフと、により表現される。 The CFG generation unit BL101 receives a source code OB101 described in an assembly language or the like, analyzes the syntax of the source code OB101, and generates a control flow graph (CFG). The generated CFG becomes the CFG object OB102. The CFG is graph data generated by analyzing the syntax of the source code OB101, and the data structure thereof includes a basic block that does not have branching and merging therein, a directed graph that indicates the flow of data between the basic blocks, It is expressed by

アドレス値解決部ＢＬ１０２は、ＣＦＧオブジェクトＯＢ１０２を元にソースコードＯＢ１０１のデータフロー解析を実行する。その際、アドレス値解決部ＢＬ１０２は、ＣＦＧオブジェクトＯＢ１０２内にある間接アドレッシングモードの命令を探索し、その命令にて参照される実効アドレスを示すアドレス情報を取得する。 The address value resolution unit BL102 performs data flow analysis of the source code OB101 based on the CFG object OB102. At that time, the address value resolution unit BL102 searches for an indirect addressing mode instruction in the CFG object OB102, and acquires address information indicating an effective address referred to by the instruction.

更に、アドレス値解決部ＢＬ１０２は、取得したアドレス情報を、データフロー解析に利用したＣＦＧオブジェクトＯＢ１０２に反映し、再構成したＣＦＧオブジェクトＯＢ１０２に所定の管理情報を付加し、解析結果としてＣＦＧ管理テーブルＯＢ１０３に記録する。その管理情報には、ＣＦＧオブジェクトＯＢ１０２の識別情報と、開始位置からアドレス解決の対象となった間接アドレッシングモードの命令までのプログラムパスを示すパストレース情報とが含まれる。 Further, the address value resolution unit BL102 reflects the acquired address information on the CFG object OB102 used for the data flow analysis, adds predetermined management information to the reconfigured CFG object OB102, and generates a CFG management table OB103 as an analysis result. To record. The management information includes identification information of the CFG object OB102 and path trace information indicating a program path from the start position to the indirect addressing mode instruction targeted for address resolution.

ＣＦＧ管理テーブルＯＢ１０３は、アドレス値解決部ＢＬ１０２が取得した間接アドレッシングモードの命令で参照する実効アドレスを示すアドレス情報と管理情報とを含む解析結果をデータベースにて保持する。 The CFG management table OB103 holds, in a database, an analysis result including address information indicating an effective address referred to by the indirect addressing mode instruction acquired by the address value resolution unit BL102 and management information.

プログラム解析処理部ＢＬ１０３は、ＣＦＧ生成部ＢＬ１０１がソースコードＯＢ１０１から生成し、アドレス値解決部ＢＬ１０２が間接アドレッシングモードの命令のアドレス解決を行った結果を反映したＣＦＧオブジェクトに対して、そのＣＦＧオブジェクトにおける所定の解析対象命令を解析する所定の解析処理を実行し、その解析結果をプログラム解析結果ＯＢ１０５として出力する。所定の解析処理は、一例を挙げれば、データフロー解析により解析対象命令の到達定義を算出する処理である。その場合、算出した到達定義が解析結果となる。プログラム解析結果ＯＢ１０５は、ＣＦＧ管理テーブルＯＢ１０３に登録されたＣＦＧオブジェクトＯＢ１０２を元に、置き換え前のマイコン用のソースコードＯＢ１０１を置き換え後の新たなマイコンが処理できる形式のソースコードＯＢ１０６へ変換するとき、変換を行うマシンまたは人間が参照する情報である。 The program analysis processing unit BL103 generates a CFG object in which the CFG generation unit BL101 generates the source code OB101 and the address value resolution unit BL102 reflects the result of the address resolution of the instruction in the indirect addressing mode. A predetermined analysis process for analyzing a predetermined analysis target instruction is executed, and the analysis result is output as a program analysis result OB105. For example, the predetermined analysis process is a process for calculating the arrival definition of the instruction to be analyzed by data flow analysis. In that case, the calculated arrival definition becomes the analysis result. Based on the CFG object OB102 registered in the CFG management table OB103, the program analysis result OB105 is converted into the source code OB106 in a format that can be processed by a new microcomputer after replacement of the source code OB101 for the microcomputer before replacement. Information that is referred to by the machine or human that performs the conversion.

プログラム最適化処理部ＢＬ１０４は、プログラム解析結果ＯＢ１０５を元に、入力されたソースコードＯＢ１０１を最適化する。ここでいう最適化は、ソースコードＯＢ１０１を、置き換え後の新たなマイコンで処理しやすい形式にしたり、新たなマイコンで効率よく処理できるようにしたり、または人間の作業を容易化あるいは低減できるようにすることである。具体的には、プログラム解析結果ＯＢ１０５を、置き換え後の新たなマイコンが効率よく自動処理できる形式、あるいは人間にとって可読性の高い形式に変換することが好ましい。 The program optimization processing unit BL104 optimizes the input source code OB101 based on the program analysis result OB105. In this optimization, the source code OB101 is made into a format that can be easily processed by the new microcomputer after replacement, can be efficiently processed by the new microcomputer, or human work can be facilitated or reduced. It is to be. Specifically, it is preferable to convert the program analysis result OB105 into a format that can be automatically processed by the new microcomputer after replacement, or a format that is highly readable for humans.

図２Ａ〜図２Ｂは、ソースコードとそれを基に生成したＣＦＧの一例を示す図である。図２Ａはソースコードの一例を示し、図２Ｂは、図２Ａのソースコードから生成したＣＦＧオブジェクトを示している。 2A to 2B are diagrams illustrating an example of a source code and a CFG generated based on the source code. FIG. 2A shows an example of source code, and FIG. 2B shows a CFG object generated from the source code of FIG. 2A.

図２ＢのＣＦＧオブジェクトＯＢ１０２を見ると、図２Ａに示したソースコード２００が、内部に分岐および合流を含まない基本ブロックＢ２０２、Ｂ２０３、Ｂ２０４に分割されている。更に開始（ＳＴＡＲＴ）ブロックＢ２０１と終了（ＥＮＤ）ブロックＢ２０５を加えた５つのブロックＢ２０１〜２０５が、連結状態を矢印で示した有向グラフで連結されている。 Looking at the CFG object OB102 in FIG. 2B, the source code 200 shown in FIG. 2A is divided into basic blocks B202, B203, and B204 that do not include branching and merging. Further, five blocks B201 to 205 including a start (START) block B201 and an end (END) block B205 are connected by a directed graph whose connection state is indicated by an arrow.

図３は、アドレス値解決部ＢＬ１０２のブロック図である。図３を参照すると、アドレス値解決部ＢＬ１０２は、プログラムパス探索部ＢＬ３０１、データフロ―解析部ＢＬ３０２、ＣＦＧ再構成部ＢＬ３０３、およびＣＦＧ検証部ＢＬ３０４を有している。 FIG. 3 is a block diagram of the address value resolution unit BL102. Referring to FIG. 3, the address value resolution unit BL102 includes a program path search unit BL301, a data flow analysis unit BL302, a CFG reconstruction unit BL303, and a CFG verification unit BL304.

プログラムパス探索部ＢＬ３０１は、入力されたＣＦＧオブジェクトＯＢ１０２からＳＴＡＲＴブロックＢ２０１を探し出し、ＳＴＡＲＴブロック２０１から各基本ブロックの命令を辿り、解析対象命令を探索する。解析対象命令は具体例として間接アドレッシングモードの命令である。プログラムパス探索部ＢＬ３０１は、解析対象命令を発見すると、開始位置からその命令までのプログラムパスを示すプログラムパス情報を出力する。 The program path search unit BL301 searches for the START block B201 from the input CFG object OB102, traces the instruction of each basic block from the START block 201, and searches for the analysis target instruction. The analysis target instruction is an indirect addressing mode instruction as a specific example. When the program path search unit BL301 finds an instruction to be analyzed, it outputs program path information indicating a program path from the start position to the instruction.

データフロー解析部ＢＬ３０２は、プログラムパス探索部ＢＬ３０１からのプログラムパス情報に基づき、ＣＦＧオブジェクトＯＢ１０２を元にデータフロー解析処理を実行することにより、解析対象命令の実効アドレスの候補となる値（アドレス候補値）を生成する。 The data flow analysis unit BL302 executes a data flow analysis process based on the CFG object OB102 based on the program path information from the program path search unit BL301, thereby providing a value (address candidate) that is a candidate for the effective address of the instruction to be analyzed. Value).

ＣＦＧ再構成部ＢＬ３０３は、データフロー解析部ＢＬ３０２が生成したアドレス候補値を元にＣＦＧオブジェクトＯＢ１０２を再構成する。 The CFG reconstruction unit BL303 reconstructs the CFG object OB102 based on the address candidate value generated by the data flow analysis unit BL302.

ＣＦＧ検証部ＢＬ３０４は、再構成されたＣＦＧオブジェクト（再構成ＣＦＧオブジェクトＯＢ３０１）を検証し、管理情報と共にＣＦＧ管理テーブルＯＢ１０３へ保存する。また、ＣＦＧ検証部ＢＬ３０４は、他の構成のＣＦＧが得られる可能性がある場合、再構成ＣＦＧオブジェクトＯＢ３０１を再びプログラムパス探索部ＢＬ３０１へ戻す。ＣＦＧオブジェクトＯＢ３０１の検証として、例えば、開始位置から解析対象命令までのプログラムパスを正常に辿れること等を検証すればよい。 The CFG verification unit BL304 verifies the reconfigured CFG object (reconfigured CFG object OB301) and stores it in the CFG management table OB103 together with the management information. Further, when there is a possibility that a CFG having another configuration may be obtained, the CFG verification unit BL304 returns the reconfigured CFG object OB301 to the program path search unit BL301 again. As the verification of the CFG object OB301, for example, it may be verified that the program path from the start position to the analysis target instruction can be normally traced.

図１に戻り、ＣＦＧ管理テーブルＯＢ１０３は、ＣＦＧ生成部ＢＬ１０１が生成したＣＦＧオブジェクトあるいはそれを再構成した再構成ＣＦＧオブジェクト（以下それらをまとめて「ＣＦＧオブジェクトＯＢ１０２」とも称す）と、その管理情報とを、それらを互いに紐付けた状態で保存するテーブルである。管理情報には、当該ＣＦＧオブジェクトＯＢ１０２の識別情報と、当該ＣＦＧオブジェクトＯＢ１０２が生成に寄与した、解析対象命令である間接アドレッシングモードの命令までのプログラムパス情報が含まれる。 Returning to FIG. 1, the CFG management table OB103 includes a CFG object generated by the CFG generation unit BL101 or a reconfigured CFG object reconstructed from the CFG object (hereinafter collectively referred to as “CFG object OB102”), its management information, Is a table that stores them in a state where they are linked to each other. The management information includes identification information of the CFG object OB102 and program path information from the CFG object OB102 to the indirect addressing mode instruction that is an analysis target instruction that contributes to generation.

プログラム解析処理部ＢＬ１０３は、ＣＦＧ管理テーブルＯＢ１０３に登録されたＣＦＧオブジェクトＯＢ１０２を解析することにより、ソースコードの移植に必要な情報を生成し、生成したプログラム解析結果ＯＢ１０５を出力する。 The program analysis processing unit BL103 generates information necessary for porting the source code by analyzing the CFG object OB102 registered in the CFG management table OB103, and outputs the generated program analysis result OB105.

また、プログラム最適化処理部ＢＬ１０４は、プログラム解析処理部ＢＬ１０３が生成したプログラム解析結果ＯＢ１０５を元に、ソースコードＯＢ１０１に対してプログラム最適化処理を行い、処理結果を出力する。ここでいう最適化処理は、ソースコードＯＢ１０１を移植先のマイコンが処理可能な形式へ変換する処理、あるいはその変換に有用な情報を生成する処理である。プログラム最適化処理部ＢＬ１０４は、例えば、最適化したソースコードＯＢ１０６を出力する。 Further, the program optimization processing unit BL104 performs program optimization processing on the source code OB101 based on the program analysis result OB105 generated by the program analysis processing unit BL103, and outputs the processing result. The optimization process here is a process of converting the source code OB101 into a format that can be processed by the porting-destination microcomputer, or a process of generating information useful for the conversion. For example, the program optimization processing unit BL104 outputs the optimized source code OB106.

図４は、本実施形態によるソースコード解析装置の処理を示すフローチャートである。 FIG. 4 is a flowchart showing processing of the source code analysis apparatus according to the present embodiment.

まず、ソースコード解析装置１００は、入力されたソースコードＯＢ１０１の字句解析および構文解析を実行し、ソースコードＯＢ１０１にプログラミング言語上のエラーがあるかどうかチェックする（ステップＳ４００）。 First, the source code analysis apparatus 100 performs lexical analysis and syntax analysis of the input source code OB101, and checks whether there is an error in the programming language in the source code OB101 (step S400).

ソースコードＯＢ１０１に文法上のエラーがあったら、ソースコード解析装置１００は、エラーを通知して処理を終了する（ステップＳ４０１）。ソースコードＯＢ１０１に文法上のエラーが見当たらない場合、ソースコード解析装置１００は、図２Ｂに示したようなＣＦＧオブジェクトＯＢ１０２を生成する（ステップＳ４０１）。ステップＳ４０１の処理において、プログラム中に間接アドレッシングモード（ＩＤＡ：ＩｎｄｉｒｅｃｔＡｄｄｒｅｓｓｉｎｇＭｏｄｅ）の命令を含む場合がある。しかし、この時点でアドレスの具体的な値を見積もることはできないため、未定値としておく。 If there is a grammatical error in the source code OB101, the source code analyzing apparatus 100 notifies the error and ends the process (step S401). If no grammatical error is found in the source code OB101, the source code analysis apparatus 100 generates a CFG object OB102 as shown in FIG. 2B (step S401). In the process of step S401, the program may include an indirect addressing mode (IDA) instruction. However, since a specific value of the address cannot be estimated at this time, it is set as an undetermined value.

次に、ソースコード解析装置１００は、ソースコードＯＢ１０１のプログラムパスを辿り、間接アドレッシングモードの命令があるか否か判定する（ステップＳ４０２）。ソースコードＯＢ１０１中に間接アドレッシングモードの命令が見つからない場合、ソースコード解析装置１００は、ステップＳ４０１の処理で得られたＣＦＧオブジェクトＯＢ１０２を出力し、処理を終了する。すなわち、間接アドレッシングモードの命令が見つからずにプログラムの終点に到達した場合は処理を終了する。 Next, the source code analyzing apparatus 100 traces the program path of the source code OB101 and determines whether there is an instruction in the indirect addressing mode (step S402). When the indirect addressing mode instruction is not found in the source code OB101, the source code analyzing apparatus 100 outputs the CFG object OB102 obtained by the process of step S401, and ends the process. That is, when the end point of the program is reached without finding the indirect addressing mode instruction, the process is terminated.

ソースコードＯＢ１０１に間接アドレッシングモードの命令が含まれていた場合、ソースコード解析装置１００は、ステップＳ４０１の処理で得られた、その命令を含むＣＦＧオブジェクトＯＢ１０２を探索対象ＣＦＧとして登録し（ステップＳ４０３）、プログラムパス探索処理を開始する（ステップＳ４０４）。 When the instruction of the indirect addressing mode is included in the source code OB101, the source code analysis apparatus 100 registers the CFG object OB102 including the instruction obtained in the process of step S401 as the search target CFG (step S403). Then, the program path search process is started (step S404).

プログラムパスの探索過程において間接アドレッシングモードの命令を発見した場合、ソースコード解析装置１００は、現在探索中のＣＦＧオブジェクトＯＢ１０２を用いてデータフロー解析を実行し、到達定義を計算することで、間接アドレッシングモードの命令のアドレスの候補値を計算する（ステップＳ４０５）。なお、到達定義は、式（１）のデータフロー方程式を計算することで得られる。 When an instruction in the indirect addressing mode is found in the program path search process, the source code analysis apparatus 100 performs data flow analysis using the CFG object OB 102 currently being searched for, and calculates the arrival definition, thereby performing indirect addressing. A candidate value for the address of the mode instruction is calculated (step S405). The arrival definition can be obtained by calculating the data flow equation of Expression (1).

式（１）において、ＩＮ（Ｂ）はＣＦＧオブジェクトＯＢ１０２の基本ブロックＢに到達する定義命令の集合を示す。ＯＵＴ（Ｂ）は基本ブロックＢの出口で有効な定義命令の集合である。Ｐｒｅｄ（Ｂ）は基本ブロックＢの先行ブロックの集合である。ＰはＰｒｅｄ（Ｂ）の要素である。 In Expression (1), IN (B) represents a set of definition instructions that reach the basic block B of the CFG object OB102. OUT (B) is a set of definition instructions effective at the exit of the basic block B. Pred (B) is a set of preceding blocks of the basic block B. P is an element of Pred (B).

ＤＥＦ（Ｂ）は基本ブロックＢにおいて定義される定義命令の集合である。ＫＩＬＬ（Ｂ）は基本ブロックＢにおいて再定義等によって消滅する定義命令の集合である。このデータフロー方程式をステップＳ４０４の処理にて発見した間接アドレッシングモードの命令が属する基本ブロックにおいて計算し、当該命令においてアドレス情報を保持するレジスタまたはメモリの有効な定義集合（到達定義）を計算する。この到達定義を命令セットシミュレータ等により解釈することでアドレス値を得ることができる。 DEF (B) is a set of definition instructions defined in the basic block B. KILL (B) is a set of definition instructions that disappear in the basic block B due to redefinition or the like. This data flow equation is calculated in the basic block to which the indirect addressing mode instruction found in the process of step S404 belongs, and an effective definition set (arrival definition) of the register or memory holding the address information in the instruction is calculated. The address value can be obtained by interpreting this arrival definition by an instruction set simulator or the like.

次に、ソースコード解析装置１００は、アドレス値が有効であるか確認する（ステップＳ４０６）。得られたアドレス値が有効なアドレス値でなければ、エラーを出力して処理を終了する（ステップＳ４１１）。 Next, the source code analyzing apparatus 100 confirms whether the address value is valid (step S406). If the obtained address value is not a valid address value, an error is output and the process is terminated (step S411).

得られたアドレス値が有効であれば、ソースコード解析装置１００は、探索対象ＣＦＧとなっているＣＦＧオブジェクトＯＢ１０２を複製し、そのアドレス候補値を元にＣＦＧオブジェクトＯＢが示すＣＦＧを再構成する（ステップＳ４０７）。ＣＦＧの再構成の際、間接アドレッシングモードの命令であることを示す情報を残しておく。ソースコードＯＢ１０１によっては複数のアドレス値を得る場合があるが、アドレス値毎に、ＣＦＧオブジェクトＯＢ１０２のコピーを複製し、そのアドレス値を用いて再構成する。 If the obtained address value is valid, the source code analyzing apparatus 100 duplicates the CFG object OB102 serving as the search target CFG, and reconfigures the CFG indicated by the CFG object OB based on the address candidate value ( Step S407). When reconfiguring the CFG, information indicating that the instruction is in the indirect addressing mode is left. Depending on the source code OB101, a plurality of address values may be obtained. For each address value, a copy of the CFG object OB102 is duplicated and reconfigured using the address value.

次に、ソースコード解析装置１００は、解析終了条件が満たされているか否か検証する（ステップＳ４０８）。ステップＳ４０８における解析終了条件は、再構成ＣＦＧオブジェクトＯＢ３０１の構造が循環するという条件である。例えば、再構成されたＣＦＧである再構成ＣＦＧオブジェクトＯＢ３０１の構造が、既にＣＦＧ管理テーブルＯＢ１０３に登録されている場合、同じ構造のＣＦＧが繰り返し発言する可能性が高い。同じ構造のＣＦＧが繰り返される場合、すなわちＣＦＧ構造が循環する場合、新たに得られるＣＦＧはこれ以上存在しないこととなる。その場合、それ以上の探索は不要といえるので、解析終了条件が満たされる。解析終了条件が満たされていれば、ソースコード解析装置１００は処理を終了する（ステップＳ４０８のＹｅｓ）。 Next, the source code analyzing apparatus 100 verifies whether or not the analysis end condition is satisfied (step S408). The analysis end condition in step S408 is a condition that the structure of the reconstructed CFG object OB301 circulates. For example, when the structure of the reconfigured CFG object OB301, which is a reconfigured CFG, has already been registered in the CFG management table OB103, there is a high possibility that the CFG having the same structure repeatedly makes a statement. When the CFG having the same structure is repeated, that is, when the CFG structure circulates, there is no more CFG obtained newly. In that case, it can be said that no further search is necessary, and the analysis end condition is satisfied. If the analysis end condition is satisfied, the source code analyzing apparatus 100 ends the process (Yes in step S408).

一方、解析終了条件が満たされていなければ、ソースコード解析装置１００は、再構成ＣＦＧオブジェクトＯＢ３０１を管理情報と共にＣＦＧ管理テーブルＯＢ１０３に登録する（ステップＳ４０９）。 On the other hand, if the analysis end condition is not satisfied, the source code analysis apparatus 100 registers the reconstructed CFG object OB301 together with the management information in the CFG management table OB103 (step S409).

図５は、ＣＦＧ管理テーブルＯＢ１０３のイメージを示す図である。図５中の「ＣＦＧ」が再構成ＣＦＧオブジェクトＯＢ３０１である。「ＮＯ．Ｘ」は再構成ＣＦＧオブジェクトＯＢ３０１の識別番号であり、「ＰＡＴＨ」はプログラムパス情報である。この「ＮＯ．Ｘ」と「ＰＡＴＨ」は管理情報に該当する。 FIG. 5 is a diagram showing an image of the CFG management table OB103. “CFG” in FIG. 5 is the reconstructed CFG object OB301. “NO.X” is an identification number of the reconstructed CFG object OB301, and “PATH” is program path information. “NO.X” and “PATH” correspond to management information.

その後、ソースコード解析装置１００は、再構成ＣＦＧオブジェクトＯＢ３０１を探索対象ＣＦＧとして登録し、アドレス解決処理を実行した間接アドレッシングモードの命令を始点としてステップＳ４０４の探索処理を実行する（ステップＳ４１０）。 After that, the source code analysis apparatus 100 registers the reconstructed CFG object OB301 as a search target CFG, and executes the search process of step S404 with the indirect addressing mode instruction that executed the address resolution process as the starting point (step S410).

なお、ステップＳ４００においてソースコードＯＢ１０１に文法上のエラーが発見された場合、ステップＳ４０６においてアドレスが有効な値でなかった場合は、ステップＳ４１１にてエラー情報を生成して処理を終了する。このエラー情報が生成された場合、ソースコードＯＢ１０１を移植する過程において必ず人手によりプログラムを解析して手直しする作業が発生する。そのため、エラー情報は人間が容易に理解できるように詳細な情報を出力することが好ましい。たとえば、エラー情報として、具体的なアドレス値、そのアドレス値の決定に寄与する到達定義、関連する命令群の情報が該当する。 If a grammatical error is found in the source code OB101 in step S400, or if the address is not a valid value in step S406, error information is generated in step S411 and the process ends. When this error information is generated, an operation of manually analyzing and reworking the program is always generated in the process of porting the source code OB101. Therefore, it is preferable to output detailed information so that error information can be easily understood by humans. For example, the error information includes a specific address value, an arrival definition that contributes to the determination of the address value, and related instruction group information.

図６は、図４のステップＳ４０４からＳ４１０までの一連の処理を詳細化したフローチャートである。 FIG. 6 is a flowchart detailing a series of processing from steps S404 to S410 in FIG.

ソースコード解析装置１００は、ＳＴＡＲＴブロックＢ２０１からプログラム先頭命令を検索する（ステップＳ７００）。プログラムの先頭の命令（以下「プログラム先頭命令」という）は、ソースコードＯＢ１０１から生成したＣＦＧオブジェクトＯＢ１０２におけるＳＴＡＲＴブロックに続く後続ブロックを参照することで見つけることができる。 The source code analyzing apparatus 100 searches for a program head instruction from the START block B201 (step S700). The head instruction of the program (hereinafter referred to as “program head instruction”) can be found by referring to the subsequent block following the START block in the CFG object OB102 generated from the source code OB101.

次に、ソースコード解析装置１００は、それまでのパスのトレースを記録したパストレース情報を初期化し、プログラム先頭命令を探索点として設定する（ステップＳ７０１）。探索点とは、プログラムパスを辿る命令の探索における現在の位置（命令）をいう。次に、ソースコード解析装置１００は、探索点の命令をパストレース情報に追記し（ステップＳ７０２）、その次の命令を探索点として設定する（ステップＳ７０３）。 Next, the source code analyzing apparatus 100 initializes the path trace information in which the trace of the previous path is recorded, and sets the program head instruction as a search point (step S701). The search point refers to the current position (command) in searching for a command that follows the program path. Next, the source code analyzing apparatus 100 adds a search point instruction to the path trace information (step S702), and sets the next instruction as a search point (step S703).

なお、プログラムには分岐が存在する場合がある。ソースコード解析装置１００は、プログラムにおける分岐を発見した場合、その分岐毎に非決定的に探索を行う。図７は、分岐にける探索のイメージを示す図である。 There may be branches in the program. When the source code analyzing apparatus 100 finds a branch in the program, the source code analyzing apparatus 100 performs a nondeterministic search for each branch. FIG. 7 is a diagram showing an image of searching at a branch.

図７を参照すると、基本ブロックＢ８００が命令（分岐命令）１０４により基本ブロックＢ８０１またはＢ８０２に分岐する例が示されている。その場合、命令１０４の次の探索点は命令１０５または命令１１５のいずれかとなる。この際、パストレース情報Ｐ８０３を複製し、基本ブロックＢ８０１を探索する場合のパストレース情報Ｐ８０４と、基本ブロックＢ８０２を探索する場合のパストレース情報Ｐ８０５とを生成する。それらはプログラムパスを辿るときに利用される。 Referring to FIG. 7, an example in which the basic block B800 branches to the basic block B801 or B802 by an instruction (branch instruction) 104 is shown. In that case, the next search point of the instruction 104 is either the instruction 105 or the instruction 115. At this time, the path trace information P803 is duplicated to generate path trace information P804 when searching for the basic block B801 and path trace information P805 when searching for the basic block B802. They are used when following the program path.

探索点がプログラム終点である場合、ソースコード解析装置１００は、探索中のプログラムパスにおける解析処理を終了する（ステップＳ７０４）。すべてのプログラムパスの探索を行い、解析処理を終了した場合に限り、ソースコード解析装置１００は、完全な終了条件が満たされたと判断する。 When the search point is the program end point, the source code analyzing apparatus 100 ends the analysis process in the program path being searched (step S704). Only when all the program paths are searched and the analysis process is completed, the source code analysis apparatus 100 determines that the complete termination condition is satisfied.

しかし、プログラム中にループが存在する場合、プログラムパスが無限通りとなり解析処理を終了することができない場合がある。この場合、ループとなるパスを探索する場合のプログラムパス探索条件に制限をつけることで、プログラムパスを辿る命令の探索を終了させることが可能となる。例えば、同じループを通過する回数をカウントし、そのカウント値（ループ回数）が、閾値である所定値に達したら、ループ終了条件が満たされたと判定してプログラムパスの探索を終了することにしてもよい。その場合、閾値をユーザが手動で設定できてもよい。また、データフロー解析によりループ終了条件を定めることにしてもよい。 However, if there is a loop in the program, the program path may be infinite and the analysis process may not be completed. In this case, by limiting the program path search condition when searching for a loop path, the search for an instruction that follows the program path can be terminated. For example, the number of times of passing through the same loop is counted, and when the count value (the number of loops) reaches a predetermined value which is a threshold value, it is determined that the loop end condition is satisfied and the search for the program path is ended. Also good. In that case, the threshold value may be manually set by the user. Further, the loop end condition may be determined by data flow analysis.

通常、間接アドレッシングモードの命令によって指定されるアドレス値はアドレス空間上で制限された領域内のアドレスのみとなることが多いので、ループ回数が有限となることが大半である。特に、生産終了の対象となるような古い設計のマイコンのプログラム場合、メモリ空間のサイズが乏しく、間接アドレッシングモードの命令で指定されるアドレスが取りうるメモリ領域も極めて限定的である場合が多い。したがって、生産終了のマイコンのソフトウェアプログラムを移植するためにソースコードを解析するという目的においては、ループ回数によってループ終了条件を定めるのは有用かつ現実的な手法である。 Usually, since the address value specified by the instruction in the indirect addressing mode is often only an address in an area limited in the address space, the number of loops is usually limited. In particular, in the case of an old designed microcomputer program that is to be discontinued, the size of the memory space is small, and the memory area that can be taken by the address specified by the instruction in the indirect addressing mode is often extremely limited. Therefore, for the purpose of analyzing the source code in order to port the software program of the microcomputer that has been discontinued, it is a useful and realistic technique to determine the loop end condition by the number of loops.

ソースコード解析装置１００は、次に探索点が間接アドレッシングモードの命令であるか否か判定する（ステップＳ７０５）。そうでなければ、ソースコード解析装置１００は、パストレース情報に追記した上で探索点を次命令に進めて探索処理を継続する（ステップＳ７０５、Ｓ７０２、Ｓ７０３）。一方、ステップＳ７０５の判定において探索点が間接アドレッシングモードの命令である場合、ソースコード解析装置１００は、探索処理で利用しているＣＦＧオブジェクトＯＢ１０２を利用してデータフロー解析を実施し、当該命令における到達定義を計算し、当該命令のアドレス値を取得する（ステップＳ７０６、Ｓ７０７）。 Next, the source code analyzing apparatus 100 determines whether or not the search point is an indirect addressing mode instruction (step S705). Otherwise, the source code analysis apparatus 100 adds to the path trace information and advances the search point to the next command and continues the search process (steps S705, S702, and S703). On the other hand, if the search point is an instruction in the indirect addressing mode in the determination in step S705, the source code analysis device 100 performs data flow analysis using the CFG object OB102 used in the search process, and The arrival definition is calculated, and the address value of the instruction is acquired (steps S706 and S707).

式（１）にて上述したデータフロー方程式の計算は、データフローが全体として不動点に到達するまで計算を繰り返す方法がよく利用される。その場合、パストレース情報に含まれない命令も到達定義に含まれる可能性がある。そこで、式（１）を用いたデータフロー解析により得られた到達定義のうち、パストレース情報に含まれない命令を除去することが好ましい。そのようにパストレース情報に含まれない命令を除去した到達定義を元にアドレスの候補値を計算することで、ソフトウェアプログラムが確実にアクセスし得るアドレス値を得ることができるようになる。 For the calculation of the data flow equation described above in Equation (1), a method of repeating the calculation until the data flow reaches the fixed point as a whole is often used. In that case, an instruction not included in the path trace information may be included in the arrival definition. Therefore, it is preferable to remove instructions that are not included in the path trace information from the arrival definition obtained by the data flow analysis using Expression (1). By calculating the candidate value of the address based on the arrival definition from which the instruction not included in the path trace information is removed, an address value that can be reliably accessed by the software program can be obtained.

次に、ソースコード解析装置１００は、以上の過程で得られたアドレス値が有効であるか否か判定する（ステップＳ７０８）。アドレス値が有効である場合、ソースコード解析装置１００は、探索処理で用いているＣＦＧオブジェクトＯＢ１０２を複製し、アドレス値を元に再構成し、再構成されたＣＦＧオブジェクトＯＢ１０２を生成する（ステップＳ７０９）。 Next, the source code analyzing apparatus 100 determines whether or not the address value obtained in the above process is valid (step S708). If the address value is valid, the source code analysis apparatus 100 duplicates the CFG object OB102 used in the search process, reconfigures it based on the address value, and generates a reconfigured CFG object OB102 (step S709). ).

間接アドレッシングモードの命令にはジャンプ命令と非ジャンプ命令とがある。間接アドレッシングモードの命令がジャンプ命令である場合、ソースコード解析装置１００は、基本ブロック間の連結関係を含めてＣＦＧオブジェクトＯＢ１０２を再構成する。一方、間接アドレッシングモードの命令が非ジャンプ命令である場合、ソースコード解析装置１００は、当該命令を絶対値表記に変換するのみでよい。 Indirect addressing mode instructions include jump instructions and non-jump instructions. When the instruction in the indirect addressing mode is a jump instruction, the source code analysis apparatus 100 reconfigures the CFG object OB102 including the connection relationship between basic blocks. On the other hand, when the instruction in the indirect addressing mode is a non-jump instruction, the source code analysis apparatus 100 only needs to convert the instruction into an absolute value notation.

間接アドレッシングモードの命令のアドレス解決に基づきＣＦＧオブジェクトＯＢ１０２を再構成するときにおいても、当該命令が元々間接アドレッシングモードの命令であったことを識別できる情報（フラグ等）を付与する。この情報は、再探索時に当該命令のアドレス値の解決を再実行する際の指標として利用する。その際、アドレス値を保持するレジスタあるいはメモリの情報も併せて保持してもよい。 Even when the CFG object OB102 is reconfigured based on the address resolution of the instruction in the indirect addressing mode, information (flag or the like) that can identify that the instruction was originally an instruction in the indirect addressing mode is given. This information is used as an index when resolving the address value of the instruction during re-search. At this time, information of a register or memory that holds an address value may also be held.

図８は、ＣＦＧオブジェクトが変換される様子を例示する図である。 FIG. 8 is a diagram illustrating a state in which a CFG object is converted.

ここでＣＦＧ９００はプログラムパス探索で利用中のＣＦＧオブジェクトである。ＣＦＧ９０１およびＣＦＧ９０２は、再構成された後のＣＦＧオブジェクトである。例えば、命令１０３における間接アドレッシングモードのジャンプ命令において、破線矢印によって図示されているように、アドレス値を格納するレジスタＲ０が０ｘＸＸＸＸまたは０ｘＹＹＹＹという値をなる場合について考える。その場合、ＣＦＧ９００を２つコピーし、一方はレジスタのアドレスＲ０＝０ｘＸＸＸＸを元にＣＦＧ９００再構成することでＣＦＧ９０１を取得する。他方はレジスタのアドレスＲ０＝０ｘＹＹＹＹを元にＣＦＧ９００を再構成することでＣＦＧ９０２を取得する。 Here, the CFG 900 is a CFG object being used for program path search. CFG 901 and CFG 902 are CFG objects after being reconstructed. For example, in the jump instruction in the indirect addressing mode in the instruction 103, consider a case where the register R0 for storing an address value has a value of 0xXXXX or 0xYYYY, as illustrated by a dashed arrow. In this case, two copies of CFG900 are copied, and one of them obtains CFG901 by reconfiguring CFG900 based on register address R0 = 0xXXXX. The other obtains CFG 902 by reconfiguring CFG 900 based on register address R0 = 0xYYYY.

ステップＳ７０９にて再構成されたＣＦＧオブジェクトは、管理情報と共にＣＦＧ管理テーブルＯＢ１０３に登録される（ステップＳ７１０）。ＣＦＧオブジェクトが格納された様子のイメージは図５に示した通りであり、管理情報として、ＣＦＧオブジェクトＯＢ１０２の識別情報と、開始位置からアドレス解決対象の命令までのパストレース情報が含まれる。 The CFG object reconstructed in step S709 is registered in the CFG management table OB103 together with the management information (step S710). The image of the state in which the CFG object is stored is as shown in FIG. 5, and includes management information including identification information of the CFG object OB102 and path trace information from the start position to the instruction to be resolved.

次に、ソースコード解析装置１００は、再構成ＣＦＧオブジェクトＯＢ３０１をプログラムパスの探索対象となるＣＦＧオブジェクトとして設定し、解析終了条件が満たされるまで探索処理を繰り返す（ステップＳ７１１）。 Next, the source code analyzing apparatus 100 sets the reconstructed CFG object OB301 as a CFG object to be searched for a program path, and repeats the search process until the analysis end condition is satisfied (step S711).

ステップＳ７０９において、アドレス値の候補が複数となる場合、ソースコード解析装置１００は、それぞれのアドレス値を元に再構成された再構成ＣＦＧオブジェクトＯＢ３０１のそれぞれに対してプログラムパスに沿った探索処理を実行する。この際、ソースコード解析装置１００は、パストレース情報も併せて複製し、プログラムパスの探索において分岐命令を発見した場合と同様、非決定的に処理を実行する。ステップＳ７０８において有効なアドレスの候補値を得られない場合、ソースコード解析装置１００は、前述の通りエラーを生成して処理を終了する（ステップＳ７１２）。 If there are a plurality of address value candidates in step S709, the source code analysis apparatus 100 performs a search process along the program path for each reconstructed CFG object OB301 reconstructed based on each address value. Run. At this time, the source code analyzing apparatus 100 also duplicates the path trace information and executes the process nondeterministically as in the case where the branch instruction is found in the program path search. If a candidate value for a valid address cannot be obtained in step S708, the source code analyzing apparatus 100 generates an error as described above and ends the process (step S712).

以上の処理により、ＣＦＧ管理テーブルＯＢ１０３は、プログラムが取り得る全てのＣＦＧオブジェクトＯＢ１０２のパターンを保持することができ、更に、それらＣＦＧオブジェクトＯＢ１０２は間接アドレッシングモードの命令による影響を考慮済みのものとなる。 As a result of the above processing, the CFG management table OB103 can hold patterns of all CFG objects OB102 that can be taken by the program, and these CFG objects OB102 have already taken into account the influence of instructions in the indirect addressing mode. .

図９は、プログラム解析処理部ＢＬ１０３による解析処理のフローチャートである。解析処理は、ＣＦＧ管理テーブルＯＢ１０３に登録されたＣＦＧオブジェクトＯＢ１０２を利用して実行される。 FIG. 9 is a flowchart of analysis processing by the program analysis processing unit BL103. The analysis process is executed using the CFG object OB102 registered in the CFG management table OB103.

最初に、プログラム解析処理部ＢＬ１０３は、解析の対象となる命令を識別する（ステップＳ１０００）。解析対象命令は、ユーザが手動で指定する方法、所定の条件により機械的に自動指定する方法などがある。 First, the program analysis processing unit BL103 identifies an instruction to be analyzed (step S1000). The analysis target instruction includes a method of manually specifying by the user, a method of automatically specifying mechanically according to a predetermined condition, and the like.

解析対象命令が指定されると、プログラム解析処理部ＢＬ１０３は、ＣＦＧ管理テーブルＯＢ１０３へアクセスし、最初のＣＦＧオブジェクトＯＢ１０２を取得する（ステップＳ１００１）。ＣＦＧ管理テーブルＯＢ１０３においてはＣＦＧオブジェクトＯＢ１０２のデータは様々な管理方法で管理され得る。最も単純な方法はリスト型で記録し、管理する方法である。ＣＦＧオブジェクトＯＢ１０２をリストとして管理する場合、リストの先頭にあるＣＦＧオブジェクトＯＢ１０２が、上述した最初のＣＦＧオブジェクトＯＢ１０２に該当する。 When the analysis target instruction is specified, the program analysis processing unit BL103 accesses the CFG management table OB103 and acquires the first CFG object OB102 (step S1001). In the CFG management table OB103, the data of the CFG object OB102 can be managed by various management methods. The simplest method is to record and manage in a list format. When managing the CFG object OB102 as a list, the CFG object OB102 at the top of the list corresponds to the first CFG object OB102 described above.

次に、プログラム解析処理部ＢＬ１０３は、ＣＦＧ管理テーブルＯＢ１０３から取得した最初のＣＦＧオブジェクトＯＢ１０２に紐付けられた管理情報に含まれるパストレース情報を参照し、そのパストレース情報で示されたプログラムパスに解析対象命令が含まれているか否か検証する（ステップＳ１００２）。 Next, the program analysis processing unit BL103 refers to the path trace information included in the management information associated with the first CFG object OB102 acquired from the CFG management table OB103, and sets the program path indicated by the path trace information. It is verified whether or not an instruction to be analyzed is included (step S1002).

パストレース情報に解析対象命令が含まれていない場合、それは、そのＣＦＧオブジェクトＯＢ１０２は解析対象命令が実行されたときに通らないＣＦＧオブジェクトＯＢ１０２であることを意味するので、プログラム解析処理部ＢＬ１０３は、次に、全てのＣＦＧオブジェクトＯＢ１０２を検証したか否か判定する（ステップＳ１００３）。全てのＣＦＧオブジェクトＯＢ１０２が検証済であれば、プログラム解析処理部ＢＬ１０３は処理を終了する。また、未だ検証を行っていないＣＦＧオブジェクトＯＢ１０２があれば、プログラム解析処理部ＢＬ１０３は次のＣＦＧオブジェクトＯＢ１０２を取得し（ステップＳ１００４）、ステップＳ１００２に戻る。 When the analysis target instruction is not included in the path trace information, it means that the CFG object OB102 is a CFG object OB102 that does not pass when the analysis target instruction is executed. Next, it is determined whether or not all CFG objects OB102 have been verified (step S1003). If all the CFG objects OB102 have been verified, the program analysis processing unit BL103 ends the process. If there is a CFG object OB102 that has not been verified yet, the program analysis processing unit BL103 acquires the next CFG object OB102 (step S1004), and the process returns to step S1002.

ステップＳ１００２にてパストレース情報で示されたプログラムパスに解析対象命令が含まれていたら、プログラム解析処理部ＢＬ１０３は、ＣＦＧオブジェクトＯＢ１０２を利用してソフトウェアプログラムを解析する所定の処理を実行する（ステップＳ１００５）。この解析処理として様々な処理が適用可能であるが、一例として、データフロー解析により解析対象命令の到達定義を計算する処理がある。 If an analysis target instruction is included in the program path indicated by the path trace information in step S1002, the program analysis processing unit BL103 executes predetermined processing for analyzing the software program using the CFG object OB102 (step S1002). S1005). Various processes can be applied as the analysis process. As an example, there is a process of calculating the arrival definition of the instruction to be analyzed by data flow analysis.

次に、プログラム解析処理部ＢＬ１０３は、ステップＳ１００５の解析処理において、パストレース情報に含まれない命令を含むような解析結果が生じたら。それを除去する（ステップＳ１００６）。そのような解析結果は無効なものだからである。例えば、前方解析あるいは後方解析などのＣＦＧオブジェクトＯＢ１０２を用いてデータフロー方程式に基づいてソフトウェアプログラムを解析する場合、到達定義が不動点に達するまでデータフロー方程式を繰り返し解くという処理が存在する場合がある。そのような場合、解析対象命令が実行された後に実行される命令が解析対象命令の解析結果に含まれてくることが起こりうる。そして、後から実行される命令が実行されていない状態の情報に基づく解析結果は無効である。その場合には、ＣＦＧオブジェクトＯＢ１０２に紐付けられたパストレース情報を元に命令の解析結果を得る必要があるため、ステップＳ１００６で、プログラム解析処理部ＢＬ１０３は、無効な解析結果を除去しておく。 Next, the program analysis processing unit BL103 generates an analysis result including an instruction not included in the path trace information in the analysis processing in step S1005. It is removed (step S1006). This is because such an analysis result is invalid. For example, when analyzing a software program based on a data flow equation using a CFG object OB102 such as forward analysis or backward analysis, there may be a process of repeatedly solving the data flow equation until the arrival definition reaches a fixed point. . In such a case, an instruction that is executed after the analysis target instruction is executed may be included in the analysis result of the analysis target instruction. An analysis result based on information on a state in which an instruction to be executed later is not executed is invalid. In that case, since it is necessary to obtain the analysis result of the instruction based on the path trace information associated with the CFG object OB102, the program analysis processing unit BL103 removes the invalid analysis result in step S1006. .

次に、プログラム解析処理部ＢＬ１０３は、得られた解析結果を出力する（ステップＳ１００７）。解析対象命令の解析結果に、その命令の解析に利用されたＣＦＧオブジェクトＯＢ１０２に関する情報を付加する。このとき付加するＣＦＧオブジェクトＯＢ１０２に付加する情報は、ＣＦＧオブジェクトＯＢ１０２に紐付けられた管理情報に含まれている識別情報であってもよいし、パストレース情報を含むＣＦＧオブジェクトＯＢ１０２そのものであってもよい。 Next, the program analysis processing unit BL103 outputs the obtained analysis result (step S1007). Information related to the CFG object OB102 used for analyzing the instruction is added to the analysis result of the instruction to be analyzed. The information added to the CFG object OB102 added at this time may be identification information included in the management information linked to the CFG object OB102, or may be the CFG object OB102 itself including path trace information. Good.

プログラム解析処理部ＢＬ１０３は、解析結果を、人間が容易に解読可能な形式、あるいはコンピュータによるバッチ処理が可能な形式などで出力すればよい。 The program analysis processing unit BL103 may output the analysis result in a format that can be easily read by humans or a format that can be batch processed by a computer.

図１０は、人間が容易に解読可能な形式となった解析結果が画面に表示されている様子を示す図である。図１０の例では、データフローを示す画像の画像データを含めてＷｅｂブラウザで閲覧可能な形式となっている。図１１は、コンピュータがバッチ処理することができる形式の解析結果を示す図である。図１１の例では、ＸＭＬベースの言語で記述された解析結果（ＯＢ１２００）と、ＣＳＶ（ＯＢ１２０１）、バイナリデータなど機械処理可能な形式で出力する。これらの処理結果を元に、プログラム最適化処理部ＢＬ１０４は、ソフトウェアプログラムのソースコードＯＢ１０１を最適化する処理を行う。 FIG. 10 is a diagram illustrating a state in which an analysis result in a format that can be easily deciphered by a human being is displayed on the screen. In the example of FIG. 10, the image data including the image data indicating the data flow is in a format that can be browsed by the Web browser. FIG. 11 is a diagram illustrating an analysis result in a format that can be batch processed by a computer. In the example of FIG. 11, an analysis result (OB1200) described in an XML-based language, CSV (OB1201), binary data, and the like are output in a machine-processable format. Based on these processing results, the program optimization processing unit BL104 performs processing for optimizing the source code OB101 of the software program.

上述したように最適化処理は様々考えられるが、例を挙げると、複数の命令の意味を解析してそれら複数の命令を含んだ処理を効率よく実現するようにコードを変換する処理、内部レジスタを自動更新する処理を含めた不要な命令の削除、制御構造を最適にする処理、異なるメモリ空間の変換、ＩＯに依存する命令の抽出、高可読性のコードの生成処理などである。 As mentioned above, there are various possible optimization processes. For example, the process of converting the code to analyze the meaning of a plurality of instructions and efficiently realizing a process including the plurality of instructions, an internal register Delete unnecessary instructions including the process of automatically updating the process, process for optimizing the control structure, conversion of different memory space, extraction of instructions depending on IO, and process of generating highly readable code.

以上説明したように、本実施形態によると、間接アドレッシングモードの命令による影響を考慮しながらＣＦＧオブジェクトＯＢ１０２を生成でき、そのＣＦＧオブジェクトＯＢ１０２を元に制的解析によりソースコードＯＢ１０１の解析を実行することができる。その解析結果を、ブラウザなどでの可読性が高い方法で閲覧できるようにする手段や、コンピュータが認識可能な形式へ変換することで、ソースコード移行に係る工数削減や、最適化された変換済コードを生成できる。 As described above, according to the present embodiment, the CFG object OB102 can be generated while taking into consideration the influence of the instruction in the indirect addressing mode, and the analysis of the source code OB101 is executed by the systematic analysis based on the CFG object OB102. Can do. The analysis results can be viewed in a way that is highly readable by a browser, etc., or converted into a computer-recognizable format, reducing man-hours related to source code migration and optimized converted code Can be generated.

また、本実施形態では、形式検証などに代表される不具合検証を目的とせず、動作が保証されたプログラムを元に解析するため、有限時間内で解析できることが多く、異なるマイコン間でソースコードＯＢ１０１を移植する場合に有用である。 Further, in the present embodiment, since the analysis is based on a program whose operation is guaranteed without aiming at defect verification represented by format verification or the like, the analysis can often be performed within a finite time, and the source code OB101 between different microcomputers. Useful when transplanting.

また、本実施形態では、制御フローグラフ生成部（ＣＦＧ生成部ＢＬ１０１）が、ソースコードＯＢ１０１の構文を解析し、ソフトウェアプログラムを、内部に分岐および合流を有しない命令の集合である基本ブロックとその基本ブロック間の制御構造を示す有向グラフとで表わした制御フローグラフ（ＣＦＧ）を生成する。解決部（アドレス値解決部ＢＬ１０２）は、ソフトウェアプログラムの構文の解析で値が定まらない部分を未定とした制御フローグラフに基づいてソフトウェアプログラム内の所定の解決対象を探索し、制御フローグラフを用いて解決対象の到達定義を計算し、その到達定義から得られる値を制御フローグラフに反映することにより、制御フローグラフを再構成する。 In the present embodiment, the control flow graph generation unit (CFG generation unit BL101) analyzes the syntax of the source code OB101, and converts the software program into a basic block that is a set of instructions that do not have branching and merging therein, and A control flow graph (CFG) represented by a directed graph indicating a control structure between basic blocks is generated. The resolution unit (address value resolution unit BL102) searches for a predetermined resolution target in the software program based on a control flow graph in which a portion whose value is not determined by analysis of the syntax of the software program is undetermined, and uses the control flow graph Then, the arrival definition to be solved is calculated, and the control flow graph is reconstructed by reflecting the value obtained from the arrival definition in the control flow graph.

これによれば、ソフトウェアプログラムを構文解析して得た制御フローグラフで未定値を生じさせている解決対象を探索し、その解決対象を制御フローグラフを用いて解決し、その解析で得た値を制御フローグラフに反映させることにより、未定値となっている部分を決定していく。構文解析では明らかとならない部分を制御フローグラフを用いて解決していくので、ソフトウェアプログラムをより適切に解析することが可能となる。 According to this, a search target that causes an undetermined value is searched for in the control flow graph obtained by parsing the software program, the solution target is solved using the control flow graph, and the value obtained by the analysis Is reflected in the control flow graph to determine the portion that is an undetermined value. Since the part that is not clarified by the syntax analysis is solved using the control flow graph, the software program can be analyzed more appropriately.

また、本実施形態では、解決部（アドレス値解決部ＢＬ１０２）は、制御フローグラフを再構成すると、解決した解決対象を始点としてプログラムパスを辿って次の解決対象を探索し、その解決対象を解決して制御フローグラフを再構成するという処理を、ソフトウェアプログラム内の全ての解決対象を解決するまで繰り返す。 In the present embodiment, when the resolving unit (address value resolving unit BL102) reconstructs the control flow graph, the resolving target is searched for the next resolving target starting from the resolving target, and the resolving target is determined. The process of resolving and reconfiguring the control flow graph is repeated until all resolution targets in the software program are resolved.

これによれば、解決対象を探索して解決するという処理を全ての解決対象を解決するまで繰り返すので、ソフトウェアプログラムをより適切に解析することが可能となる。 According to this, since the process of searching for and solving a solution target is repeated until all the solution targets are solved, the software program can be analyzed more appropriately.

また、本実施形態では、プログラム解析処理部ＢＬ１０３は、再構成された制御フローグラフに基づいてソースコードを解析する。再構成された制御フローグラフを用いてソースコードを解析することができるので、より適切にソフトウェアプログラムを解析することができる。 In the present embodiment, the program analysis processing unit BL103 analyzes the source code based on the reconfigured control flow graph. Since the source code can be analyzed using the reconfigured control flow graph, the software program can be analyzed more appropriately.

また、本実施形態によれば、プログラム解析処理部ＢＬ１０３は、制御フローグラフに基づくデータフロー解析によりソースコードを解析する。これによれば、データフロー解析でソースコードを解析するので、構文解析では未定値となってしまう部分を確定させることができる。 In addition, according to the present embodiment, the program analysis processing unit BL103 analyzes the source code by data flow analysis based on the control flow graph. According to this, since the source code is analyzed by the data flow analysis, it is possible to determine a portion that becomes an undetermined value by the syntax analysis.

また、本実施形態では、プログラム解析処理部ＢＬ１０３は、ソースコードの解析により得た情報と、データフロー解析に用いた制御フローグラフと、を解析結果として出力する。これによれば、ソースコードの解析結果とともに制御フローグラフを出力するので、ソースコード解析の課程で行われたデータフロー解析の状況を知ることができる。 In this embodiment, the program analysis processing unit BL103 outputs information obtained by analyzing the source code and the control flow graph used for the data flow analysis as analysis results. According to this, since the control flow graph is output together with the analysis result of the source code, it is possible to know the status of the data flow analysis performed in the source code analysis process.

また、本実施形態によれば、解決部（アドレス値解決部ＢＬ１０２）は、制御フローグラフを再構成するとき、ソフトウェアプログラムの開始位置から制御フローグラフまでのプログラムパスを示すパストレース情報を制御フローグラフと共に記録し、パストレース情報を参照してソースコードの静的解析を実行する。これによれば、パストレース情報を記録しそれを参照してソフトウェアプログラムの静的解析を実行するので、パストレース情報が示すプログラムパスに沿ってソフトウェアの静的解析を進めることができる。
（他の実施形態） Further, according to the present embodiment, when resolving the control flow graph, the resolving unit (address value resolving unit BL102) controls the path trace information indicating the program path from the start position of the software program to the control flow graph. Record along with the graph and refer to the path trace information for static analysis of the source code. According to this, since the path trace information is recorded and the static analysis of the software program is executed by referring to the recorded information, the software static analysis can be advanced along the program path indicated by the path trace information.
(Other embodiments)

上述の実施形態では、間接アドレッシングモードの命令を含むソースコードＯＢ１０１の解析を例示したが、ここでは他の実施形態として自己書き換えコードを含むソースコードの解析について例示する。 In the above-described embodiment, the analysis of the source code OB101 including the instruction in the indirect addressing mode is illustrated, but here, the analysis of the source code including the self-rewriting code is illustrated as another embodiment.

自己書き換えコードとは、ＭＯＶ命令などのデータコピー命令を利用し、プログラムを実行中に実行しているプログラムのプログラムコード記憶域のデータを書き換えることにより、プログラムの処理内容を更新する命令である。自己書き換えコードは、アセンブリ言語により簡単に記述可能であり、様々な用途で利用される。上述した実施形態で間接アドレッシングモードの命令に対応したのに対して、他の実施形態では、自己書き換えコードの命令に対応する。 The self-rewriting code is an instruction that uses a data copy instruction such as an MOV instruction to update the processing content of the program by rewriting data in the program code storage area of the program being executed. The self-modifying code can be easily described in an assembly language and used for various purposes. In the above-described embodiment, the instruction in the indirect addressing mode is supported. In the other embodiments, the instruction in the self-rewriting code is supported.

図１２は、他の実施形態によるソースコード解析装置の一部を示すブロック図である。図１２には、本実施形態のソースコード解析装置は、図１に示したソースコード解析装置１００とは、アドレス値解決部ＢＬ１０２が自己書き換えコード解決部ＢＬ１３００へ置き換えられている点で異なることが示されている。 FIG. 12 is a block diagram showing a part of a source code analyzing apparatus according to another embodiment. 12 is different from the source code analysis device 100 shown in FIG. 1 in that the address value resolution unit BL102 is replaced with a self-rewrite code resolution unit BL1300. It is shown.

自己書き換えコード解決部ＢＬ１３００は、自己書き換えコード判定部ＢＬ１３０１と、自己書き換えコード識別部ＢＬ１３０２とを有している。 The self-rewriting code resolution unit BL1300 includes a self-rewriting code determination unit BL1301 and a self-rewriting code identification unit BL1302.

自己書き換えコード判定部ＢＬ１３０１は、入力されたＣＦＧオブジェクトＯＢ１０２内の命令に対して、その命令がメモリ領域にデータを書き込む命令である場合、そのメモリ領域がプログラム領域であるか判定する。データを書き込む命令でプログラム領域を書き換える命令であれば、その命令は自己書き換えコードである。 When the instruction in the CFG object OB102 is an instruction to write data in the memory area, the self-rewriting code determination unit BL1301 determines whether the memory area is a program area. If the instruction is to rewrite the program area with an instruction to write data, the instruction is a self-rewriting code.

自己書き換えコード識別部ＢＬ１３０２は、当該命令が自己書き換えコードであれば、その命令で書き込まれる値を取得し、その値から具体的なメモリの内容を識別する。自己書き換えコードによって書き換えられたメモリの内容が分かれば、ソースコードＯＢ１０１の解析が可能となる。 If the instruction is a self-rewriting code, the self-rewriting code identifying unit BL1302 acquires a value written by the instruction and identifies specific memory contents from the value. If the contents of the memory rewritten by the self-rewriting code are known, the source code OB101 can be analyzed.

自己書き換えコード判定部ＢＬ１３０１は、命令が自己書き換えコードであるか否かを判定するための指標として、メモリにおけるプログラム領域のアドレス情報を定義したプログラム域定義リストＯＢ１３０１を保持している。 The self-rewriting code determination unit BL1301 holds a program area definition list OB1301 that defines address information of a program area in the memory as an index for determining whether or not the instruction is a self-rewriting code.

メモリのプログラム領域を知る方法として、例えば、以下に示す３つの方法がある。ひとつ目は、ソースコードから判定する方法である。二つ目は、マイコンのマニュアルから判定する方法である。三つ目は、ソースコードＯＢ１０１のデータフローを解析し、プログラムカウンタの情報を頼りに実行コードとして参照されるメモリ領域から判定する方法である。 As a method of knowing the program area of the memory, for example, there are the following three methods. The first is a method of judging from source code. The second is a method of judging from the manual of the microcomputer. The third method is a method of analyzing the data flow of the source code OB101 and determining from a memory area referred to as an execution code based on information of the program counter.

また、自己書き換えコード識別部ＢＬ１３０２は、メモリのプログラム領域に書き込まれる値（バイナリ値）と、その値が示す命令についての情報（命令情報）との対応関係を示す情報を保持する命令コードテーブルＯＢ１３０２を有している。命令コードテーブルＯＢ１３０２に格納する情報は、主にマイコンの仕様書から取得できる。 The self-rewriting code identification unit BL1302 holds an instruction code table OB1302 that holds information indicating a correspondence relationship between a value (binary value) written in the program area of the memory and information (instruction information) about an instruction indicated by the value. have. Information stored in the instruction code table OB1302 can be obtained mainly from the specifications of the microcomputer.

図１３は、他の実施形態によるソースコード解析装置の解析処理のフローチャートである。 FIG. 13 is a flowchart of analysis processing of a source code analysis apparatus according to another embodiment.

まず、ソースコード解析装置１００は、図４の間接アドレッシングモードの命令を解析するときと同様に、ソースコードＯＢ１０１からＣＦＧを生成した後、ソースコードＯＢ１０１に自己書き換えコードが存在するか否か確認する（ステップＳ１４００）。ある命令が自己書き換えコードであるかどうかは、プログラム域定義リストＯＢ１３０１を元に、命令によりデータが書き込まれるメモリアドレスがプログラム領域であるか否か判定し、データが書き込まれるメモリアドレスがプログラム領域であれば、その命令は自己書き換えコードであると判定する。 First, the source code analyzing apparatus 100 generates a CFG from the source code OB101 and checks whether or not a self-rewriting code exists in the source code OB101, as in the case of analyzing the indirect addressing mode instruction of FIG. (Step S1400). Whether or not an instruction is a self-rewriting code is determined based on the program area definition list OB1301 whether or not the memory address to which data is written by the instruction is a program area. The memory address to which data is written is the program area. If there is, it is determined that the instruction is a self-rewriting code.

ソースコードＯＢ１０１中に自己書き換えコードの存在を確認した場合、ソースコード解析装置１００は、入力によって指定されたＣＦＧオブジェクトＯＢ１０２を探索対象として設定する（ステップＳ１４０１）。 When the existence of the self-rewriting code is confirmed in the source code OB101, the source code analyzing apparatus 100 sets the CFG object OB102 designated by the input as a search target (step S1401).

次に、ソースコード解析装置１００は、探索対象のＣＦＧオブジェクトＯＢ１０２のプログラムパスを先頭から辿り、自己書き換えコードがあるか否か探索する（ステップＳ１４０２）。 Next, the source code analyzing apparatus 100 traces the program path of the CFG object OB102 to be searched from the top, and searches for a self-rewriting code (step S1402).

プログラムパスを辿ってソフトウェアプログラムの終了位置に到達した場合にはソースコード解析装置１００はソースコードＯＢ１０１の解析ができない旨のエラーを出力し（ステップＳ１４０９）、処理を終了する。 When the program path is traced and the end position of the software program is reached, the source code analyzing apparatus 100 outputs an error indicating that the source code OB101 cannot be analyzed (step S1409), and the process ends.

プログラムパスを辿っていく中で分岐を発見したら、ソースコード解析装置１００は、間接アドレッシングモードの命令を発見したときと同様、分岐先ごとに非決定的に探索を行う。ソースコード解析装置１００は、自己書き換えコードについて、探索で利用したＣＦＧオブジェクトＯＢ１０２を基に到達定義を計算し、書き込まれる値を同定する（ステップＳ１４０３）。この書き込まれる値は、命令の内容を識別するのに利用される。 If a branch is found while following the program path, the source code analyzing apparatus 100 performs a nondeterministic search for each branch destination as in the case of finding an instruction in the indirect addressing mode. The source code analyzing apparatus 100 calculates the arrival definition for the self-rewritten code based on the CFG object OB102 used in the search, and identifies the value to be written (step S1403). This written value is used to identify the contents of the instruction.

次に、ソースコード解析装置１００は、ステップＳ１４０３の処理で得られた値から、命令コードテーブルＯＢ１３０２を参照して新たに書き換わる命令の内容を取得し、その命令を表わすコードが有効なものであるか否か検証する（ステップＳ１４０４）。その命令が有効なものでなければ、ソースコード解析装置１００は、ソースコード２０を解析できない旨を示すエラーを出力して処理を終了する（ステップＳ１４０９）。 Next, the source code analyzing apparatus 100 obtains the contents of the newly rewritten instruction with reference to the instruction code table OB1302 from the value obtained in the process of step S1403, and the code representing the instruction is valid. It is verified whether or not there is (step S1404). If the instruction is not valid, the source code analyzing apparatus 100 outputs an error indicating that the source code 20 cannot be analyzed and ends the processing (step S1409).

その命令が有効なものであれば、ソースコード解析装置１００は、探索対象のＣＦＧオブジェクトＯＢ１０２を複製し、書き換わる命令の情報を基にＣＦＧオブジェクトＯＢ１０２を再構成する（ステップＳ１３０５）。 If the instruction is valid, the source code analyzing apparatus 100 duplicates the CFG object OB102 to be searched, and reconfigures the CFG object OB102 based on the information of the instruction to be rewritten (step S1305).

更に、ソースコード解析装置１００は、間接アドレッシングモードの命令の場合と同様、解析終了条件が満たされているか否か確認する（ステップＳ１４０６）。解析終了条件が満たされて入れば、ソースコード解析装置１００は処理を終了する。解析終了条件が満たされていなければ、ソースコード解析装置１００は、再構成されたＣＦＧオブジェクトＯＢ１０２（再構成ＣＦＧオブジェクトＯＢ３０１）をＣＦＧ管理テーブルＯＢ１０３へ登録する（ステップＳ１４０７）。更に、ソースコード解析装置１００は、再構成ＣＦＧオブジェクトＯＢ３０１を探索対象に設定し（ステップＳ１４０８）、ステップＳ１４０２に戻り、解析終了条件が満たされるまで処理を繰り返す。 Further, the source code analyzing apparatus 100 checks whether or not the analysis end condition is satisfied, as in the case of the indirect addressing mode instruction (step S1406). If the analysis end condition is satisfied, the source code analyzing apparatus 100 ends the process. If the analysis end condition is not satisfied, the source code analysis apparatus 100 registers the reconfigured CFG object OB102 (reconstructed CFG object OB301) in the CFG management table OB103 (step S1407). Further, the source code analysis apparatus 100 sets the reconstructed CFG object OB301 as a search target (step S1408), returns to step S1402, and repeats the processing until the analysis end condition is satisfied.

以上の処理により、ソースコード解析装置１００は、自己書き換えコードを含むソースコードＯＢ１０１のＣＦＧオブジェクトＯＢ１０２を取得することができる。また、ソースコード解析装置１００は、そのＣＦＧオブジェクトＯＢ１０２を利用してソースコードＯＢ１０１の静的解析を実現できる。間接アドレッシングモードの場合と同様、得られたＣＦＧから得られた解析結果を、様々な形式へ変換することで、ソースコードＯＢ１０１の移植に係る工数を削減したり、あるいは最適化された変換済のソースコードを自動生成したりすることも可能となる。 Through the above processing, the source code analyzing apparatus 100 can acquire the CFG object OB102 of the source code OB101 including the self-rewriting code. Further, the source code analyzing apparatus 100 can realize static analysis of the source code OB101 by using the CFG object OB102. As in the indirect addressing mode, the analysis result obtained from the obtained CFG can be converted into various formats to reduce the man-hours related to the porting of the source code OB101 or the optimized converted result. It is also possible to automatically generate source code.

上述した本発明の各実施形態は、本発明の説明のための例示であり、本発明の範囲をそれらの実施形態にのみ限定する趣旨ではない。当業者は、本発明の要旨を逸脱することなしに、他の様々な態様で本発明を実施することができる。 Each of the embodiments of the present invention described above is an example for explaining the present invention, and is not intended to limit the scope of the present invention only to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from the gist of the present invention.

１００…ソースコード解析装置、ＢＬ１０１…ＣＦＧ生成部、ＢＬ１０２…アドレス値解決部、ＢＬ１０３…プログラム解析処理部、ＢＬ１０４…プログラム最適化処理部、ＢＬ１３００…コード解決部、ＢＬ１３０１…コード判定部、ＢＬ１３０２…コード識別部、ＢＬ３０１…プログラムパス探索部、ＢＬ３０２…データフロー解析部、ＢＬ３０３…ＣＦＧ再構成部、ＢＬ３０４…ＣＦＧ検証部、ＯＢ１０１…ソースコード、ＯＢ１０２…ＣＦＧオブジェクト、ＯＢ１０３…ＣＦＧ管理テーブル、ＯＢ１０５…プログラム解析結果、ＯＢ１０６…ソースコード、ＯＢ１３０１…プログラム域定義リスト、ＯＢ１３０２…命令コードテーブル、ＯＢ３０１…再構成ＣＦＧオブジェクト
DESCRIPTION OF SYMBOLS 100 ... Source code analyzer, BL101 ... CFG generation part, BL102 ... Address value resolution part, BL103 ... Program analysis process part, BL104 ... Program optimization process part, BL1300 ... Code resolution part, BL1301 ... Code determination part, BL1302 ... Code Identification unit, BL301 ... Program path search unit, BL302 ... Data flow analysis unit, BL303 ... CFG reconstruction unit, BL304 ... CFG verification unit, OB101 ... Source code, OB102 ... CFG object, OB103 ... CFG management table, OB105 ... Program analysis Result, OB106 ... source code, OB1301 ... program area definition list, OB1302 ... instruction code table, OB301 ... reconstructed CFG object

Claims

A source code arithmetic device for analyzing a source code of a software program,
Analyzing the syntax of the source code and generating a control flow graph representing the software program as a basic block that is a set of instructions that do not have branches and merges therein and a directed graph that indicates a control structure between the basic blocks A control flow graph generation unit;
Based on the control flow graph in which a portion whose value is not determined in the analysis of the syntax of the software program is undetermined, a predetermined resolution target in the software program is searched, and the resolution definition of the resolution target is determined using the control flow graph And resolving the control flow graph by reflecting the value obtained from the arrival definition in the control flow graph, and
A source code arithmetic unit having

When the resolving unit reconstructs the control flow graph, it searches the next resolving target by tracing the program path starting from the resolved resolving target, resolves the resolving target, and reconstructs the control flow graph. Repeat the process until all resolution targets in the software program are resolved,
The source code arithmetic device according to claim 1.

The source code arithmetic device according to claim 1, further comprising a program analysis processing unit that analyzes the source code based on the reconfigured control flow graph.

4. The source code arithmetic device according to claim 3, wherein the program analysis processing unit analyzes the source code by data flow analysis based on the control flow graph.

5. The source code arithmetic device according to claim 4, wherein the program analysis processing unit outputs information obtained by analyzing the source code and the control flow graph used for the data flow analysis as an analysis result.

The resolving unit, when reconfiguring the control flow graph, records path trace information indicating a program path from the start position of the software program to the control flow graph together with the control flow graph, and refers to the path trace information The source code arithmetic device according to claim 1, wherein static analysis of the source code is performed.

The source code arithmetic device according to claim 1, wherein the solution target is an instruction in an indirect addressing mode.

The source code arithmetic device according to claim 1, wherein the solution target is an instruction to rewrite a program area during execution.

A source code calculation method for analyzing a source code of a software program,
Control flow graph generation means analyzes the syntax of the source code, and represents the software program as a basic block that is a set of instructions that do not have branches and merges therein and a directed graph that indicates a control structure between the basic blocks. Generated control flow graph,
A solving means searches for a predetermined solution target in the software program based on the control flow graph in which a portion whose value is not determined in the analysis of the syntax of the software program is undetermined, and uses the control flow graph to solve the solution Reconstructing the control flow graph by calculating the target arrival definition and reflecting the value obtained from the arrival definition in the control flow graph;
Source code calculation method.

When resolving the control flow graph, the solving means searches for a next resolution target by tracing the program path starting from the resolved resolution target, resolves the resolution target, and reconfigures the control flow graph. Repeat the process until all resolution targets in the software program are resolved,
The source code calculation method according to claim 9.