JP2016024784A

JP2016024784A - Bug prediction device, method, and program

Info

Publication number: JP2016024784A
Application number: JP2014151117A
Authority: JP
Inventors: Keiichi Tabata; Haruto Tanno; Morihide Oinuma; Kazuhito Muranushi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2014-07-24
Filing date: 2014-07-24
Publication date: 2016-02-08

Abstract

PROBLEM TO BE SOLVED: To predict, during the period after the manufacturing process has started and before the test process starts, which functions of the created software contain many of the bugs to be found in the test process. SOLUTION: On the premise that locations modified in the near past are more likely to be modified in the near future, the software functions that contain many bugs are estimated by: acquiring the change history of the source code; restricting that history to revisions judged to be bug fixes; creating matrices for deriving source code change locations that are strongly related functionally; and clustering the matrices. SELECTED DRAWING: Figure 1

Description

The present invention relates to a bug prediction apparatus, method, and program, and more particularly to a bug prediction apparatus, method, and program for predicting bugs in source code.

Known techniques include predicting likely bug locations in source code from successive versions of that source code (see, for example, Non-Patent Document 1), and bug prediction based on the empirical rule that frequently modified locations in source code tend to be modified again in the near future (see, for example, Non-Patent Document 2).

There is also a many-to-many technique for mapping source code to functions, based on the empirical rule that locations in source code that tend to be changed at the same time are strongly related functionally (see, for example, Non-Patent Document 3).

Non-Patent Document 1: Y. Higo, K. Murao, S. Kusumoto, K. Inoue, "Predicting Fault-Prone Modules Based on Metrics Transitions", DEFECTS '08.
Non-Patent Document 2: S. Kim, T. Zimmermann, E. J. Whitehead, and A. Zeller, "Predicting faults from cached history", ICSE '07.
Non-Patent Document 3: R. Robbes, D. Pollet, and M. Lanza, "Logical coupling based on fine-grained change information", WCRE '08.
Non-Patent Document 4: A. J. Enright, S. Van Dongen, and C. A. Ouzounis, "An efficient algorithm for large-scale detection of protein families", Nucleic Acids Research, 2002, Vol. 30, No. 7, 1575-1584.
Non-Patent Document 5: Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori, "Extraction of technical terms based on appearance frequency and connection frequency" (in Japanese), Natural Language Processing, 2003.
Non-Patent Document 6: Yuan Tian, J. Lawall, D. Lo, "Identifying Linux bug fixing patches", ICSE 2012.

However, the technique of Non-Patent Document 1 yields "modules" as its bug prediction result and cannot identify the "functions" that contain many bugs.

The technique of Non-Patent Document 2 can predict positions in the source code that are likely to contain bugs, but it likewise cannot identify the "functions" that contain many bugs.

By first obtaining bug-prone source code locations with the techniques of Non-Patent Documents 1 and 2 and then applying the technique of Non-Patent Document 3, candidates for functions with many bugs can be obtained. However, it is not possible to determine which of those candidate functions actually contain bugs.

The present invention has been made in view of the above points. Its object is to provide a bug prediction apparatus, method, and program that can predict, during the period after the manufacturing process has started and before the test process starts, the functions of the created software that contain many of the bugs to be found in the test process.

According to one aspect, there is provided a bug prediction apparatus for detecting functions that contain many bugs in source code. The apparatus has bug estimation means that, on the premise that locations modified in the near past are more likely to be modified in the near future, acquires the change history of the source code, restricts that history to revisions that can be judged from it to be bug fixes, creates matrices for deriving source code change locations that are strongly related functionally, and clusters the matrices to estimate the software functions that contain many bugs.

According to one aspect, it becomes possible to identify, during the period after the manufacturing process has started and before the test process starts, the functions of the created software that contain many of the bugs to be found in the test process.

FIG. 1: Configuration example of the bug prediction apparatus according to an embodiment of the present invention.
FIG. 2: Configuration example of the VCS repository according to an embodiment of the present invention.
FIG. 3: Flowchart of the outline operation of the bug prediction apparatus according to an embodiment of the present invention.
FIG. 4: Example of a matrix generated by the correction location matrix registration unit according to an embodiment of the present invention.
FIG. 5: Detailed flowchart of the change location registration process according to an embodiment of the present invention.
FIG. 6: Detailed flowchart of the bug correction revision registration process according to an embodiment of the present invention.
FIG. 7: Detailed flowchart of the correction location matrix creation process according to an embodiment of the present invention.
FIG. 8: Detailed flowchart of the correction location matrix initialization process according to an embodiment of the present invention.
FIG. 9: Detailed flowchart of the clustering process according to an embodiment of the present invention.
FIG. 10: Detailed flowchart of the feature word extraction process according to an embodiment of the present invention.
FIG. 11: Detailed flowchart of the cluster feature word extraction process according to an embodiment of the present invention.

Embodiments of the present invention will be described below with reference to the drawings.

The present invention rests on two premises. First, it is known from source code change histories that "locations modified in the near past are more likely to be modified in the future", which makes bug prediction possible. Second, a series of source code locations that tend to be changed together in the same revisions of the change history forms a cluster with strong functional relationships, and candidates for the related function can be derived from such a cluster by additionally applying feature word extraction to source code comments and commit messages.

FIG. 1 shows a configuration example of the bug prediction apparatus according to an embodiment of the present invention.

The bug prediction apparatus 1 shown in the figure is connected to a user terminal 2 and, in response to an inquiry from the user terminal 2, outputs the functions that contain many bugs.

The bug prediction apparatus 1 has a change location registration unit 10, a bug correction revision registration unit 20, a correction location matrix registration unit 30, a clustering unit 40, a feature word extraction unit 50, a VCS (Version Control System) repository 11, a change location storage unit 12, a bug correction revision storage unit 13, a correction location matrix storage unit 14, and a cluster storage unit 15.

The VCS repository 11 generally has the functions shown in FIG. 2. The revision storage unit 111 stores, for each revision, the names of the managed files, the file contents, and the commit message. When a commit operation is performed from the user terminal 2, the file difference extraction function 112 acquires the names of the added, changed, and deleted files, the commit message, and the new revision, compares the user's local files with the files of the latest revision in the revision storage unit 111, and extracts the differences. The revision recording function 113 adds the differences, the commit log, and the revision number to the revision storage unit 111 and notifies the user terminal 2 that the processing is complete. The inter-revision difference calculation function 114 acquires two revision numbers through a comparison operation by the user and obtains the differences (file existence and file contents) between the two revisions. The function name derivation function 115 derives the name of the function at each difference location from the context before and after that location. The commit message extraction function 116 extracts all commit messages between the two revisions and returns the difference contents, the function names of the difference locations, and the commit messages. Although FIG. 1 shows the VCS repository 11 inside the bug prediction apparatus 1, the arrangement is not limited to this example, and the repository may be provided outside the bug prediction apparatus 1.

Next, the processing of each unit in the above configuration will be described.

FIG. 3 is a flowchart of the outline operation of the bug prediction apparatus according to an embodiment of the present invention.

Step 210) On receiving an inquiry from the user terminal 2, the change location registration unit 10 reads the revisions from the VCS repository 11, collects the change locations of each revision, and registers them in the change location storage unit 12. The inquiry from the user terminal 2 specifies the location of the VCS repository 11. Each collected change location consists of a file name and a function name. The change location storage unit 12 holds a change revision list, that is, a list of the revisions in which the collected change locations were changed. The detailed processing is described with reference to FIG. 5.

Step 220) Based on the commit messages read from the VCS repository 11, the bug correction revision registration unit 20 registers the revisions that can be judged to be bug fixes in the bug correction revision storage unit 13.

Step 230) For the set of change locations registered in the change location storage unit 12 in step 210, the correction location matrix registration unit 30 creates, for each bug correction revision registered in the bug correction revision storage unit 13 in step 220, a matrix representing whether each pair of change locations was changed together. Specifically, as shown in FIG. 4, one matrix is created per bug correction revision. The matrix has as many rows and as many columns as there are change locations. For change locations i and j, element (i, j) is set to "1" if i and j are both changed in the revision of interest and to "0" otherwise. The example in FIG. 4 shows a revision in which only change locations 1 and 2 are changed.

The resulting matrices are registered in the correction location matrix storage unit 14.
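The rule in step 230 can be written compactly as follows. The notation is an assumption introduced here for illustration: P is the number of change locations, R_r is the set of change locations modified in bug correction revision r, and T^{(r)} is the matrix for that revision. The diagonal is left at zero, following the pair loop in the flowchart of FIG. 8, where j starts at i + 1. The worked case assumes P = 3 purely as an example size; the text does not state the size of the matrix in FIG. 4.

```latex
T^{(r)} \in \{0,1\}^{P \times P}, \qquad
T^{(r)}_{ij} =
\begin{cases}
1 & \text{if } i \neq j,\ i \in R_r,\ \text{and } j \in R_r,\\
0 & \text{otherwise.}
\end{cases}

% Worked case (assumed size P = 3, R_r = \{1, 2\}):
T^{(r)} =
\begin{pmatrix}
0 & 1 & 0\\
1 & 0 & 0\\
0 & 0 & 0
\end{pmatrix}
```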

Step 240) The clustering unit 40 applies a clustering algorithm to the set of correction location matrices registered in the correction location matrix storage unit 14, grouping strongly related change locations into clusters. Specifically, it reads the correction location matrices from the correction location matrix storage unit 14 and applies to them a technique for finding highly related subgraphs from a plurality of given graphs (for example, Non-Patent Document 4), which yields clusters as the classification result. Each cluster consists of a set of change locations and is stored in the cluster storage unit 15.

Step 250) The feature word extraction unit 50 obtains the names of the software functions considered to contain bugs by extracting feature words from commit messages. Specifically, it reads the clustering result from the cluster storage unit 15 and obtains a software function name for each cluster. For all change locations that make up a cluster, it refers to the change location storage unit 12 to obtain the change revision list, obtains the commit message of each of those revisions from the bug correction revision storage unit 13, and applies to the resulting group of commit messages a feature word extraction algorithm such as a natural language processing technique that extracts characteristic words (see, for example, Non-Patent Document 5). The feature words obtained are output to the user terminal 2 as the names of software functions considered to contain bugs.

Next, the details of the change location registration process of step 210 will be described.

FIG. 5 is a detailed flowchart of the change location registration process according to an embodiment of the present invention.

Steps 310 to 330 below are repeated for all revisions in the VCS repository 11.

Step 310) The change location registration unit 10 compares the i-th revision of interest in the VCS repository 11 with the immediately preceding revision and extracts the differences. The extraction assumes the diff command of the VCS: differences marked with symbols such as "+", "-", "*", and "!" are classified and held per difference location.

Step 320) For each difference obtained in step 310, the name of the function in which the difference is located is acquired. If it cannot be acquired, the name of a nearby symbol is used.

Step 330) The difference locations are registered in the change location storage unit 12.

Next, the details of the bug correction revision registration process of step 220 will be described.

FIG. 6 is a detailed flowchart of the bug correction revision registration process according to an embodiment of the present invention.

The bug correction revision registration unit 20 repeats steps 410 to 440 below for all revisions read from the VCS repository 11.

Step 410) The commit message of a revision read from the VCS repository 11 is acquired.

Step 420) A technique for judging from the commit message whether a patch includes a bug fix (for example, Non-Patent Document 6) is applied to determine whether the message contains an expression indicating a bug correction.

Step 430) If such an expression is contained, the process proceeds to step 440. If not, the process returns to step 410.

Step 440) The revision number and the commit message are registered in the bug correction revision storage unit 13.
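The loop of steps 410 to 440 can be sketched as below. The keyword pattern is a deliberately simple stand-in, assumed here only for illustration; it is not the patch classification technique of Non-Patent Document 6. The names `BUG_FIX_PATTERN` and `register_bug_fix_revisions` are hypothetical.

```python
import re

# Hypothetical keyword list standing in for a real bug-fix classifier.
BUG_FIX_PATTERN = re.compile(
    r"\b(fix(es|ed)?|bugs?|defects?|faults?)\b", re.IGNORECASE
)


def register_bug_fix_revisions(revisions):
    """Keep only revisions whose commit message looks like a bug fix.

    revisions: iterable of (revision number, commit message) pairs.
    Returns a dict mapping revision number to commit message, playing the
    role of the bug correction revision storage unit 13.
    """
    bug_fix_store = {}
    for number, message in revisions:          # step 410
        if BUG_FIX_PATTERN.search(message):    # steps 420 and 430
            bug_fix_store[number] = message    # step 440
    return bug_fix_store
```

Word boundaries in the pattern keep words such as "prefix" or "fixture" from being counted as bug fixes.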

Next, the details of the correction location matrix creation process of step 230 will be described.

FIG. 7 is a detailed flowchart of the correction location matrix creation process according to an embodiment of the present invention.

Step 510) The correction location matrix registration unit 30 reads the revisions from the bug correction revision storage unit 13.

Step 520) Let Rv be the number of revisions read.

Step 530) The change locations are read from the change location storage unit 12.

Step 540) Let P be the number of change locations read.

Step 550) The number of processed revisions, rev, is initialized to 0.

Step 560) The correction location matrix initialization process is performed. The detailed processing is described with reference to FIG. 8.

Step 570) Set rev = rev + 1.

Step 580) If rev < Rv, the process returns to step 560. If rev ≥ Rv, the processing of the correction location matrix registration unit 30 ends.

Next, the processing of step 560 will be described in detail.

FIG. 8 is a detailed flowchart of the correction location matrix initialization process according to an embodiment of the present invention.

Step 6010) The correction location matrix registration unit 30 reads from the change location storage unit 12 the change locations of the rev-th revision and sets them as R.

Step 6020) Let T be a P × P zero matrix, where P is the number of change locations read from the change location storage unit 12.

Step 6030) Set change location index i = 0.

Step 6040) Set change location index j = i + 1.

Step 6050) It is determined whether the i-th and j-th unique change locations are both included in R.

Step 6060) If both are included, the process proceeds to step 6070. If not, the process proceeds to step 6080.

Step 6070) Set T(i, j) = T(j, i) = 1.

Step 6080) Set j = j + 1.

Step 6090) If j < P, the process returns to step 6050. If j ≥ P, the process proceeds to step 6100.

Step 6100) Set i = i + 1.

Step 6110) If i < P, the process returns to step 6040. If i ≥ P, the process proceeds to step 6120.

Step 6120) The matrix T and the revision rev are registered in the correction location matrix storage unit 14.
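The outer loop over bug correction revisions (FIG. 7) and the pair loop over change locations (FIG. 8) can be sketched together as follows. This is one possible reading, under the assumption that R is the set of change locations modified in the revision being processed. The function name and the plain-dict storage are hypothetical.

```python
def build_correction_matrices(change_locations, bug_fix_changes):
    """Build one P x P co-change matrix per bug correction revision.

    change_locations: ordered list of the P unique change locations.
    bug_fix_changes: dict mapping each bug correction revision to the set R
        of change locations modified in that revision.
    Returns a dict mapping revision to a list-of-lists 0/1 matrix.
    """
    p = len(change_locations)                  # step 540
    matrices = {}
    for rev, changed in bug_fix_changes.items():   # steps 550 to 580
        t = [[0] * p for _ in range(p)]        # step 6020: zero matrix
        for i in range(p):                     # steps 6030, 6100, 6110
            for j in range(i + 1, p):          # steps 6040, 6080, 6090
                both = (change_locations[i] in changed
                        and change_locations[j] in changed)
                if both:                       # steps 6050 and 6060
                    t[i][j] = t[j][i] = 1      # step 6070
        matrices[rev] = t                      # step 6120
    return matrices
```

Because only pairs with i < j are visited and both symmetric entries are set at once, each matrix is symmetric with a zero diagonal.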

Next, the details of the clustering process of step 240 in FIG. 3 will be described.

FIG. 9 is a detailed flowchart of the clustering process according to an embodiment of the present invention.

Step 710) The clustering unit 40 retrieves the correction location matrix of each revision from the correction location matrix storage unit 14.

Step 720) Clustering is performed by applying to the retrieved group of correction location matrices an existing technique for finding highly related parts from a plurality of given graphs (for example, Non-Patent Document 4).

Step 730) Each cluster obtained in step 720 is registered in the cluster storage unit 15.
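As a rough illustration of steps 710 to 730, the sketch below sums the per-revision matrices into co-change counts and groups change locations by connected components over pairs that co-changed at least `min_cochanges` times. This is a much simpler stand-in chosen for brevity; it is not the algorithm of Non-Patent Document 4. The function name and the threshold parameter are assumptions.

```python
def cluster_change_locations(matrices, min_cochanges=1):
    """Group change location indices that repeatedly change together.

    matrices: list of P x P 0/1 co-change matrices, one per revision.
    Returns a list of clusters, each a sorted list of indices.
    Locations with no sufficiently strong partner are left out.
    """
    if not matrices:
        return []
    p = len(matrices[0])
    # Sum the per-revision matrices into co-change counts.
    weight = [[sum(m[i][j] for m in matrices) for j in range(p)]
              for i in range(p)]
    seen = set()
    clusters = []
    for start in range(p):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                       # depth-first traversal
            node = stack.pop()
            component.append(node)
            for other in range(p):
                strong = weight[node][other] >= min_cochanges
                if other not in seen and strong:
                    seen.add(other)
                    stack.append(other)
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters
```

Raising `min_cochanges` keeps only pairs that were fixed together repeatedly, which trades recall for tighter clusters.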

Next, the details of the feature word extraction process of step 250 in FIG. 3 will be described.

FIG. 10 is a detailed flowchart of the feature word extraction process according to an embodiment of the present invention.

Step 810) The feature word extraction unit 50 reads all clusters from the cluster storage unit 15.

Step 820) All change locations are read from the change location storage unit 12.

Step 830) The cluster feature word extraction process is performed for every cluster read in step 810. The details are described with reference to FIG. 11.

Step 840) The feature words obtained in step 830 are output as the bug-prone functions represented by the clusters.

The cluster feature word extraction process of step 830 will now be described.

FIG. 11 is a detailed flowchart of the cluster feature word extraction process according to an embodiment of the present invention.

Step 910) The feature word extraction unit 50 acquires the change locations that make up a cluster read from the cluster storage unit 15.

Step 920) The set of revisions in which those change locations were changed is acquired.

Step 930) Document D in memory (not shown) is initialized to empty, D = φ.

Step 940) The commit message of each revision in the set acquired in step 920 is read from the bug correction revision storage unit 13.

Step 950) The commit message of the revision is appended to document D.

Step 960) After steps 940 and 950 have been repeated for all revisions in the revision set, feature word extraction is executed on document D. The feature word extraction can be realized by applying an existing feature word extraction technique, such as a natural language processing technique that extracts characteristic words (for example, Non-Patent Document 5).
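Steps 910 to 960 can be sketched as below. Plain word frequency with a small stop-word list is used as a stand-in for the term extraction technique of Non-Patent Document 5, and English commit messages are assumed. The stop-word list, the function name, and the dict-based storage are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real extractor would be language-aware.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "for", "and", "on",
              "when", "fix", "fixed", "fixes", "bug"}


def cluster_feature_words(cluster, change_revisions, bug_fix_messages,
                          top_n=3):
    """Return the most frequent non-stop words for one cluster.

    cluster: iterable of change locations in the cluster.
    change_revisions: dict mapping change location to the list of
        revisions that changed it (the change revision list).
    bug_fix_messages: dict mapping bug correction revision to its
        commit message.
    """
    revisions = set()
    for location in cluster:                          # steps 910 and 920
        revisions.update(change_revisions.get(location, []))
    document = []                                     # step 930: D is empty
    for rev in sorted(revisions):                     # steps 940 and 950
        if rev in bug_fix_messages:
            document.append(bug_fix_messages[rev])
    words = re.findall(r"[a-z]+", " ".join(document).lower())  # step 960
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

Revisions that are not in the bug correction store contribute nothing to D, which matches the restriction to bug correction revisions.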

The bug prediction apparatus 1 according to the present embodiment can be realized, for example, by causing one or more computers to execute a program describing the processing explained in this embodiment. That is, the functions of the bug prediction apparatus 1 can be realized by executing a program corresponding to the processing performed by the bug prediction apparatus 1, using hardware resources such as the CPU, memory, and hard disk built into the computer. The program can be recorded on a computer-readable recording medium (portable memory or the like) and stored or distributed. The program can also be provided over a network, such as the Internet or electronic mail.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
1 Bug prediction apparatus
2 User terminal
10 Change location registration unit
11 VCS (Version Control System) repository
12 Change location storage unit
13 Bug correction revision storage unit
14 Correction location matrix storage unit
15 Cluster storage unit
20 Bug correction revision registration unit
30 Correction location matrix registration unit
40 Clustering unit
50 Feature word extraction unit
111 Revision storage unit
112 File difference extraction function
113 Revision recording function
114 Inter-revision difference calculation function
115 Function name derivation function
116 Commit message extraction function

Claims

A bug prediction apparatus for detecting functions that contain many bugs in source code, the apparatus comprising:
bug estimation means for, on the premise that locations modified in the near past are more likely to be modified in the near future, acquiring a change history of the source code, restricting the change history to revisions that can be judged from it to be bug fixes, creating matrices for deriving source code change locations that are strongly related functionally, and clustering the matrices to estimate the software functions that contain many bugs.

The bug prediction apparatus according to claim 1, wherein the bug estimation means comprises:
change history storage means storing at least a change history and a commit message for each revision of the source code;
change location registration means for extracting the change locations of each revision from the change history storage means and registering a change revision list in change location storage means;
bug correction revision registration means for extracting from the change history storage means, based on the commit messages, revisions that can be judged to be bug fixes, and registering them in bug correction revision storage means;
correction location matrix registration means for generating, for the change revision list acquired from the change location storage means, a matrix indicating, for each bug correction revision acquired from the bug correction revision storage means, whether two change locations are changed together, and registering the matrix in correction location matrix storage means;
clustering means for clustering the matrices in the correction location matrix storage means so that strongly related change locations are grouped together, and storing the resulting clusters in cluster storage means; and
feature word extraction means for acquiring from the change location storage means, based on all change locations constituting each cluster acquired from the cluster storage means, the revisions of those change locations, acquiring the commit message of each of those revisions from the bug correction revision storage means, and extracting feature words from the commit messages to obtain the names of the software functions estimated to contain bugs.

The bug prediction apparatus according to claim 2, wherein
the correction location matrix registration means includes means for making the number of rows and the number of columns of the matrix equal to the number of change locations and, for each bug correction revision, setting matrix element (i, j) to 1 if change location i and change location j are both changed and to 0 otherwise, and
the clustering means includes means for clustering the change locations whose matrix element (i, j) is 1.

A bug prediction method in an apparatus for detecting functions that contain many bugs in source code, the method comprising:
a bug estimation step of, on the premise that locations modified in the near past are more likely to be modified in the near future, acquiring a change history of the source code, restricting the change history to revisions that can be judged from it to be bug fixes, creating matrices for deriving source code change locations that are strongly related functionally, and clustering the matrices to estimate the software functions that contain many bugs.

The bug prediction method according to claim 4, wherein the bug estimation step comprises:
a change location registration step of extracting the change locations of each revision from change history storage means storing at least a change history and a commit message for each revision of the source code, and registering a change revision list in change location storage means;
a bug correction revision registration step of extracting from the change history storage means, based on the commit messages, revisions that can be judged to be bug fixes, and registering them in bug correction revision storage means;
a correction location matrix registration step of generating, for the change revision list acquired from the change location storage means, a matrix indicating, for each bug correction revision acquired from the bug correction revision storage means, whether two change locations are changed together, and registering the matrix in correction location matrix storage means;
a clustering step of clustering the matrices in the correction location matrix storage means so that strongly related change locations are grouped together, and storing the resulting clusters in cluster storage means; and
a feature word extraction step of acquiring from the change location storage means, based on all change locations constituting each cluster acquired from the cluster storage means, the revisions of those change locations, acquiring the commit message of each of those revisions from the bug correction revision storage means, and extracting feature words from the commit messages to obtain the names of the software functions estimated to contain bugs.

The bug prediction method according to claim 5, wherein
in the correction location matrix registration step, the number of rows and the number of columns of the matrix are made equal to the number of change locations and, for each bug correction revision, matrix element (i, j) is set to 1 if change location i and change location j are both changed and to 0 otherwise, and
in the clustering step, clustering is performed on the change locations whose matrix element (i, j) is 1.

A bug prediction program for causing a computer to function as each means of the bug prediction apparatus according to claim 1.