JP2018018267A

JP2018018267A - Software development support apparatus, software development support method, and program

Info

Publication number: JP2018018267A
Application number: JP2016147549A
Authority: JP
Inventors: Nobuaki Tanaka; Yosuke Tamura; Motoya Kobayashi; Takashi Shimada
Original assignee: Intelligent Evaluation Technologies Inc
Current assignee: Intelligent Evaluation Technologies Inc
Priority date: 2016-07-27
Filing date: 2016-07-27
Publication date: 2018-02-01

Abstract

[Problem] To provide a software development support apparatus that detects the occurrence of defects in software development. [Solution] The software development support apparatus includes: a bug-fix commit detection unit that determines, based on information about the source code changed in a commit, whether the commit is an update that fixes a bug introduced in another, earlier commit, and thereby detects commits that are bug fixes; a bug-introducing revision detection unit that, when the commit is detected as an update that fixes a bug, detects the revision of the other commit that introduced the bug fixed in the commit; and a database that stores information identifying the revision of the other commit that introduced the bug. [Selected drawing] FIG. 2

Description

The present invention relates to a software development support apparatus, a software development support method, and a program.

Defects in software development can arise from many causes. For example, even when the same source code is edited, the defect rate differs with the skill of the developer. Even when the same developer does the programming, the defect rate varies with the difficulty of the task and with the day of the week and time of day at which the programming is done.

Various machine-learning-based techniques have been developed to detect such defects. If the occurrence of defects follows some regularity, the accuracy and speed of defect detection can be improved. As one example, Non-Patent Document 1 describes techniques that use a variety of parameters and machine learning algorithms to detect these defects.

Japanese Unexamined Patent Application Publication No. 2016-24784 (JP2016-24784)

T. Hall, “A Systematic Literature Review on Fault Prediction Performance in Software Engineering,” IEEE Trans. on Software Engineering, Nov. 2012, Volume 38, Issue 6, page 1276.

The problem to be solved by the present invention is to realize a software development support apparatus that detects the occurrence of defects in software development.

A software development support apparatus according to one embodiment includes:
a bug-fix commit detection unit that determines, based on information about the source code changed in a commit, whether the commit is an update that fixes a bug introduced in another commit made earlier than the commit, and detects commits that are bug fixes;
a bug-introducing revision detection unit that, when the commit is detected as an update that fixes a bug, detects the revision of the other commit that introduced the bug fixed in the commit; and
a database that stores information identifying the revision of the other commit that introduced the bug.

A software development support method according to one embodiment includes:
a step in which a bug-fix commit detection unit determines, based on information about the source code changed in a commit, whether the commit is an update that fixes a bug introduced in another commit made earlier than the commit, and detects commits that are bug fixes;
a step in which a bug-introducing revision detection unit, when the commit is detected as an update that fixes a bug, detects the revision of the other commit that introduced the bug fixed in the commit; and
a step of storing, in a database, information identifying the revision of the other commit that introduced the bug.

A program according to one embodiment causes a computer to function as:
bug-fix commit detection means that determines, based on information about the source code changed in a commit, whether the commit is an update that fixes a bug introduced in another commit made earlier than the commit, and detects commits that are bug fixes;
bug-introducing revision detection means that, when the commit is detected as an update that fixes a bug, detects the revision of the other commit that introduced the bug fixed in the commit; and
storage means that stores information identifying the revision of the other commit that introduced the bug.

According to the present invention, the regularity with which defects occur in software development is extracted by machine learning, which makes it possible to improve the accuracy and speed of defect detection.

FIG. 1 is a diagram showing how the software development support apparatus according to one embodiment is used.
FIG. 2 is a block diagram showing an outline of the software development support apparatus according to one embodiment.
FIG. 3 is a flowchart showing the operation at the time of a source code commit according to one embodiment.
FIG. 4 is a flowchart showing the operation of the bug-fix commit determination according to one embodiment.
FIG. 5 is a diagram showing an example of the criteria for determining whether a commit is a bug fix.
FIG. 6 is a flowchart showing the operation of learning the bug introduction parameters according to one embodiment.
FIG. 7 is a diagram showing another example of how the software development support apparatus according to one embodiment is used.

Embodiments of the present invention are described below with reference to the drawings. The embodiments do not limit the present invention.

FIG. 1 is a diagram showing software development using the server according to the present embodiment. As shown in FIG. 1, the software development support apparatus 1 and the client 2 are connected via a wired or wireless network. A software developer uses the client 2 to create or modify source code and sends the file containing that source code to the software development support apparatus 1 over the network. Hereinafter, this operation of sending a file from the client 2 to the software development support apparatus 1 is called a commit.

The source code in a committed file is stored in the software development support apparatus 1, and its change history and related information can be viewed from the client 2 over the network. With the software development support apparatus 1, files and source code can be shared among developers even when several developers are modifying the source code in the same file.

FIG. 2 is a block diagram showing the configuration of the software development support apparatus 1. As shown in FIG. 2, the software development support apparatus 1 includes a version management unit 10 and a bug information learning unit 20. The software development support apparatus 1 is implemented as, for example, a bug management system or project management system such as Trac, Redmine, or Bugzilla.

The version management unit 10 is implemented as a distributed or centralized version control system such as git, Mercurial, Subversion, CVS (Concurrent Versions System), BitKeeper, or AlienBrain. The version management unit 10 includes a source code input unit 100, a source code database 102, a change history database 104, a repository information output unit 106, a changed-portion detection unit 108, and a changed-line counting unit 110. It may also include other basic components of a version control system, such as a merge unit for merging source code.

When a file is committed from the client 2 of FIG. 1, the source code input unit 100 receives the file and records the source code it contains in the source code database 102.

The source code database 102 records and holds the source code contained in committed files. In the present embodiment, the source code database 102 records source code per file, but it may instead record source code per module or per function.

When a committed file corresponds to source code already recorded in the source code database 102, the change history database 104 records the difference between the two versions of the source code as change history. Each database, including the source code database 102 and the change history database 104, is built on a recording medium such as an SSD (Solid State Drive) or an HDD (Hard Disk Drive).

In addition to these databases, there may be a database for binary files (not shown), or the source code database 102 may be configured to record binary files. The files recorded in the source code database 102 are not limited to files containing source code. They may be files concerning license information, documents describing how to use the source code, or text files containing notes, and any other necessary file may be recorded. Furthermore, rather than receiving the file itself, the database may record file attributes when a file whose attributes have changed is committed. Hereinafter, these databases are collectively called the repository, and an identifier that identifies each commit, for example a sequentially assigned identification number, is called a revision.

In the above description, the source code database 102 records each committed file, but this is not the only possibility. For example, only the first committed version of each file may be recorded; when the current file is to be output, it is reconstructed by scanning the change history of that file recorded in the change history database 104 in order from past to present. Conversely, the change history database 104 may record only identifiers, such as the numbers of the revisions in which each file changed; when the change history is to be output, the differences are produced by comparing the files of the respective revisions in the source code database 102.
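The reconstruction described in this paragraph, replaying a file's change history from its first commit forward, can be sketched as follows. This is a minimal illustration in Python and not the apparatus's actual storage format: the delta representation and the names make_delta, apply_delta, and reconstruct are assumptions introduced here.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record only the changed spans between two revisions (lists of lines).

    Each entry is (start, end, replacement): lines old[start:end] are
    replaced by `replacement`. Unchanged spans are not stored.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old, delta):
    """Apply one stored delta to a revision and return the next revision."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.extend(old[pos:start])   # copy the unchanged lines before the span
        out.extend(replacement)      # emit the new lines for the span
        pos = end                    # skip the lines that were replaced
    out.extend(old[pos:])
    return out


def reconstruct(initial, history):
    """Rebuild the current file by replaying deltas from oldest to newest."""
    current = list(initial)
    for delta in history:
        current = apply_delta(current, delta)
    return current
```

Storing only the first version plus deltas trades read time for storage: every read of the current file costs one pass per recorded change.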

The repository information output unit 106 outputs information in the repository to the client 2 in response to a request from the client 2. The output may be made by transferring a file to the client 2 or by displaying the information, through a browser, on a monitor connected to the client 2. The information output consists of files recorded in a database such as the source code database 102 and change history recorded in the change history database 104. The source code change history recorded in the change history database 104 may also be converted into a format suitable for display before it is output.

When a committed file is already recorded in the source code database 102, the changed-portion detection unit 108 detects the changed portions by extracting the difference between the file as updated by the commit and the file already recorded in the source code database 102.

The changed-line counting unit 110 scans the changed portions detected by the changed-portion detection unit 108 and counts the number of lines deleted from and the number of lines added to the committed file.

In the present embodiment, the source code input unit 100, the repository information output unit 106, the changed-portion detection unit 108, and the changed-line counting unit 110 are realized by a CPU (Central Processing Unit) provided in the software development support apparatus 1 executing predetermined programs while using various I/O interfaces as needed.

The bug information learning unit 20 performs machine learning on the revisions in which bugs were fixed and on the revisions that introduced the bugs fixed in those apparent bug-fix revisions. The bug information learning unit 20 includes a learning database 200, a bug introduction probability output unit 202, a bug-fix commit detection unit 204, a bug-introducing revision detection unit 206, and a bug introduction probability calculation unit 208.

The learning database 200 stores the parameters for the machine learning that detects bug-introducing revisions, the training data, and the learning model built for calculating the bug introduction probability. Based on the parameters and training data stored in the learning database 200, the bug information learning unit 20 learns the revisions that introduced bugs together with the development environment and other conditions at those revisions, and calculates the bug introduction probability when a file is committed. Revisions that fixed bugs may also be detected by machine learning, in which case the learning database 200 may store training data concerning bug fixes.

When a file is committed, the bug introduction probability output unit 202 outputs the bug introduction probability of that commit to the client 2. As another example, when a developer is about to commit a file, the bug introduction probability may be displayed to the developer through the client 2 before the commit is made.

When a file is committed, the bug-fix commit detection unit 204 determines whether the commit is a bug fix. A detected bug-fix commit serves as the starting point for finding the revision in which the bug fixed by that commit was introduced.

The bug-introducing revision detection unit 206 detects the revision in which a bug was introduced. Specifically, it detects the revision that introduced the bug fixed in the bug fix detected by the bug-fix commit detection unit 204.

The bug introduction probability calculation unit 208 performs machine learning based on the conditions under which files were committed in the bug-introducing revisions detected by the bug-introducing revision detection unit 206, and thereby calculates, when a file is about to be committed, the probability that the commit introduces a bug. The bug introduction probability obtained by the bug introduction probability calculation unit 208 is output to the client 2 through the bug introduction probability output unit 202. The components described above may be connected to one another by, for example, PCIe (Peripheral Component Interconnect Express) or SATA (Serial Advanced Technology Attachment).

Like the source code database 102 and the change history database 104 described above, the learning database 200 is built on a recording medium such as an SSD or an HDD. Like the source code input unit 100, the repository information output unit 106, the changed-portion detection unit 108, and the changed-line counting unit 110 described above, the bug introduction probability output unit 202, the bug-fix commit detection unit 204, the bug-introducing revision detection unit 206, and the bug introduction probability calculation unit 208 are realized by the CPU provided in the software development support apparatus 1 executing predetermined programs while using various I/O interfaces as needed.

Next, the operation of the software development support apparatus 1 in the present embodiment is described. FIG. 3 is a flowchart showing the flow of processing in the software development support apparatus 1.

First, a developer commits a file containing source code through the client 2. The source code in the committed file is input to the software development support apparatus 1 through the source code input unit 100 (step S100). The source code input through the source code input unit 100 is registered in the source code database 102 (step S102).

Next, the changed-portion detection unit 108 compares the file containing the input source code with the source code of the same file already stored in the source code database 102 by an earlier commit, and detects the changed portions (step S104). This detection is performed, for example, by automatically running the diff command whenever a file is committed.

Next, the changed-line counting unit 110 counts, from the change information detected by the changed-portion detection unit 108, the number of lines added to and the number of lines deleted from the file in the commit (step S106). The changed-portion detection unit 108 and the changed-line counting unit 110 may be combined into a single component that detects the changed portions and counts the changed lines at the same time. The detected changed portions and line counts are registered in the change history database 104 (step S108). Step S102 need not occur at the position described above; the source code may instead be registered after step S104 or step S106.
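Steps S104 and S106 can be sketched as follows. The sketch uses Python's standard difflib module in place of the diff command mentioned in the text, and the function name count_changed_lines is an assumption introduced for illustration.

```python
import difflib


def count_changed_lines(old_text, new_text):
    """Return (added, deleted) line counts between two versions of a file."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    added = deleted = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" span counts on both sides: its old lines are deleted
        # and its new lines are added.
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, deleted
```

A modified line is counted once as deleted and once as added, which matches how line-based diff tools report changes.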

Next, it is determined whether the commit is a bug fix (step S110). This determination is described with reference to FIGS. 4 and 5. FIG. 4 is a flowchart showing the details of step S110. FIG. 5 is a graph showing the relationship between the number of added lines and the number of deleted lines of a file in a commit. In FIG. 5, the half line L1 indicates the case where the number of deleted lines equals the number of added lines.

First, the bug-fix commit detection unit 204 compares the number of added lines with the number of deleted lines of the file in the commit, as counted by the changed-line counting unit 110 (step S200). If the number of changed lines in the commit is smaller than a predetermined number, the commit is unlikely to be a bug fix. The unit therefore determines whether the number of lines deleted and added in the commit is at least the predetermined number (step S202). In terms of FIG. 5, this determines whether the commit falls within region R4 or outside it.

Here, the predetermined number of lines is, for example, 8 lines, and more preferably 10 lines. The number may be fixed in advance. Alternatively, information that reveals bug fixes, such as commit messages, may be accumulated and registered in the learning database 200, and the conditions at commit time and the state of the committed files may be monitored, so that supervised learning varies the predetermined number of lines according to the content of the commit.

If the number of deleted and added lines is at least the predetermined number (step S202: Yes), it is determined whether the ratio between the number of deleted lines and the number of added lines is within a predetermined range (step S204). In terms of FIG. 5, this determines whether the commit falls within region R1 or outside it. The reason is that a commit is likely to be a bug fix when the number of deleted lines and the number of added lines do not differ greatly.

For example, it has been found empirically and experimentally that a commit is likely to be a bug fix when 0.8 (corresponding to the half line L3) ≤ (number of added lines / number of deleted lines) ≤ 1.25 (corresponding to the half line L2). These bounds correspond, for example, to deleting 50 lines of source code and adding 40 lines (ratio 0.8), and to deleting 40 lines and adding 50 lines (ratio 1.25). As with the line-count threshold, the ratio may be a value fixed in advance, or supervised learning may vary it according to the content of the commit.

In the present embodiment, bug fixes are detected simply by comparing the numbers of deleted and added lines, but this is not the only possibility. The amount of information deleted and added may instead be compared in terms of the number of characters or the number of bytes. In other words, the amount of information in the deleted portion and in the added portion may be defined based on at least one of the number of lines, the number of characters, and the number of bytes. Lines that carry essentially no program information, such as comment lines and blank lines, may be excluded from the deleted and added lines before the two amounts of information are compared. Furthermore, information relating to program execution may be compared, such as the number of instructions or the number of steps contained in the deleted and added portions, or values that roughly estimate these quantities. Any information may be compared, as long as the comparison between the deleted portion and the added portion is based on information that is significant for judging a bug fix.
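The alternative measures of information amount listed in this paragraph can be sketched as one helper. This is an illustrative Python sketch only: the function name information_amount, the unit names, and the comment prefixes "#" and "//" are assumptions, and real comment detection depends on the programming language of the file.

```python
def information_amount(lines, unit="lines", skip_trivial=False,
                       comment_prefixes=("#", "//")):
    """Measure a deleted or added portion in lines, characters, or bytes.

    When skip_trivial is True, blank lines and lines that start with one of
    the (assumed) comment prefixes are excluded before measuring.
    """
    if skip_trivial:
        lines = [
            line for line in lines
            if line.strip() and not line.strip().startswith(comment_prefixes)
        ]
    if unit == "lines":
        return len(lines)
    if unit == "chars":
        return sum(len(line) for line in lines)
    if unit == "bytes":
        return sum(len(line.encode("utf-8")) for line in lines)
    raise ValueError(f"unknown unit: {unit}")
```

The same function would be applied to the deleted portion and to the added portion, and the two results compared in place of raw line counts.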

As shown in FIG. 5, when the number of deleted lines is sufficiently large compared with a predetermined value, that is, when the commit falls within region R2 between the curve L4 and the y-axis, the commit may be judged to be a removal of functionality, and this situation may be learned. Conversely, when the number of added lines is sufficiently large compared with a predetermined value, that is, when the commit falls within region R3 between the curve L5 and the x-axis, the commit may be judged to be an addition of functionality, and this situation may be learned. Learning these cases may improve the accuracy with which bug-fix commits are identified.

If the ratio between the numbers of deleted and added lines is within the predetermined range (step S204: Yes), the bug-fix commit detection unit 204 determines that the commit is a bug fix (step S206).

On the other hand, if the number of deleted and added lines is less than the predetermined number (step S202: No), or if the ratio between the numbers of deleted and added lines is outside the predetermined range (step S204: No), the unit determines that the commit is not a bug fix (step S208).
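The decision flow of steps S202 to S208 can be sketched as follows, using the example values from the text (10 lines, and a ratio between 0.8 and 1.25). Two points are assumptions of this sketch rather than statements of the source: the line-count threshold is applied to the total of added and deleted lines, and a commit with no deleted lines is treated as not a bug fix because the ratio is undefined. The function name is also hypothetical.

```python
from fractions import Fraction


def is_bug_fix_commit(added, deleted, min_lines=10,
                      low=Fraction(4, 5), high=Fraction(5, 4)):
    """Heuristic bug-fix test on the line counts of one commit."""
    # Step S202: small changes (region R4) are not treated as bug fixes.
    # Assumption: the threshold applies to added + deleted lines.
    if added + deleted < min_lines:
        return False
    # Assumption: with nothing deleted the ratio is undefined, so reject.
    if deleted == 0:
        return False
    # Step S204: low <= added / deleted <= high, written without division.
    # Fractions keep the boundary cases (exactly 0.8 or 1.25) exact.
    return low * deleted <= added <= high * deleted
```

For instance, a commit that deletes 50 lines and adds 40 sits exactly on the lower bound and is accepted, while one that deletes 10 lines and adds 100 is rejected as lying outside region R1.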

Next, if data that should be learned arises during the processing of steps S200 to S208, that data is registered in the learning database 200 as learning information (step S210). For example, when the commit message states that the commit is a bug fix, information on the content and circumstances of the fix, such as the numbers of deleted and added lines, the file name, and the name of the modified module, is registered in the learning database 200. This improves the accuracy with which a later commit is judged to be a bug fix.

Returning to FIG. 3, once it has been determined whether the commit is a bug fix, processing moves on to learning the bug introduction parameters (step S112). FIG. 6 is a flowchart showing an outline of this learning process.

First, if the commit was determined in step S110 to be a bug correction (step S300: Yes), the bug-introducing revision detection unit 206 detects the revision that introduced the bug (step S302). For example, the revision in which the file corrected by the commit was previously committed is extracted from the source code database 102 and the change history database 104 and treated as the bug-introducing revision. Detection is not limited to this. The scope may be narrowed from the file to the module or function that was corrected, or the unit may detect the revision that added or modified the location changed by the correction, for example the location containing the deleted lines.
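
As an illustration of the file-level variant of step S302, the sketch below walks an in-memory commit history backwards from the bug correction commit. The history layout (a list of dictionaries with "revision" and "files" keys, oldest first) and the function name are assumptions made for this example; the source only states that the information comes from the source code database 102 and the change history database 104.

```python
from typing import Dict, List, Optional

def find_bug_introducing_revisions(history: List[Dict],
                                   fix_revision: str) -> Dict[str, Optional[str]]:
    """Map each file touched by the fix to the latest earlier revision that touched it."""
    # Locate the bug correction commit; raises StopIteration if it is unknown.
    index = next(i for i, c in enumerate(history) if c["revision"] == fix_revision)
    result: Dict[str, Optional[str]] = {}
    for name in history[index]["files"]:
        result[name] = None  # stays None if no earlier commit touched the file
        # Walk backwards from the commit just before the fix.
        for earlier in reversed(history[:index]):
            if name in earlier["files"]:
                result[name] = earlier["revision"]
                break
    return result
```

A function-level or line-level variant would follow the same pattern with a finer-grained key than the file name.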

Next, parameters describing the circumstances in which the bug-introducing revision was committed are acquired (step S304). These parameters are indicators that characterize a commit, for example: the developer who made the commit; the date, day of the week, and time of the commit; the numbers of lines deleted and added in the commit; the file name; and information about the committed content, such as the contents of the source code. Combining these parameters yields information such as "a file that a certain developer committed at 3 p.m. contained a bug" or "a bug was introduced by a commit touching a specific function in a certain file."
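
One possible in-memory representation of the parameters acquired in step S304 is sketched below. The class name, field names, and helper function are hypothetical; the source lists the kinds of information but does not prescribe a data layout.

```python
from dataclasses import dataclass
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

@dataclass(frozen=True)
class CommitParameters:
    developer: str
    weekday: str        # day of the week of the commit
    hour: int           # hour of the commit, 0-23
    deleted_lines: int
    added_lines: int
    file_name: str

def extract_parameters(developer: str, committed_at: datetime,
                       deleted_lines: int, added_lines: int,
                       file_name: str) -> CommitParameters:
    """Build the parameter record for one committed file."""
    return CommitParameters(
        developer=developer,
        weekday=WEEKDAYS[committed_at.weekday()],  # Monday is index 0
        hour=committed_at.hour,
        deleted_lines=deleted_lines,
        added_lines=added_lines,
        file_name=file_name,
    )
```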

The parameter information may be acquired from the source code database 102 or from the change history database 104. Moreover, it is not limited to information available from the version control system. For example, information obtained from special input devices may also be acquired as parameters: vital information indicating the developer's state of health at the time of a commit or file update, such as blood pressure, pulse rate, heart rate, body temperature, and activity measures such as calories burned and steps walked that day; and external environment information describing the developer's surroundings at the time of the commit, such as weather, temperature, humidity, atmospheric pressure, and location.

For example, vital signs may be acquired by devices worn directly by the developer, such as a wristband-type blood pressure, pulse rate, body temperature, or activity meter or a heart rate monitor, or by devices the developer does not wear, such as an infrared camera that measures body temperature from images. The developer's breathing and other information may be acquired through a headset or the like, and the developer's health may be estimated from facial images captured by a camera on the development machine. Furthermore, health status may be acquired automatically by a device such as a smartphone, or the developer may enter that day's physical condition into such a device in advance.

The same applies to external information: it may be measured with a thermometer, hygrometer, or barometer, or obtained by accessing a weather information database on the Internet. Location is acquired from, for example, the installation site of the development machine that made the commit or the place where the wireless LAN was used, but is not limited to these methods. Time is not limited to the moment of the commit; it may be the time at which the committed file was last updated before the commit, or a timestamp stored in the file. The interval between the previous commit and the current one may also be acquired. All of this information may be acquired over a wired or a wireless connection.

In any of these cases, the information from the input devices may be recorded as parameters at commit time, or it may be recorded in one of the databases beforehand and later retrieved based on the commit time, the environment, the committing developer, and so on.

FIG. 7 is a schematic diagram showing an example in which the devices that acquire the vital information and external environment information described above, together with an external database, are provided outside the software development support apparatus 1. As shown in FIG. 7, a vital information acquisition sensor 3, an external environment information acquisition sensor 4, and an environment database 5 are additionally provided outside the software development support apparatus 1.

The vital information acquisition sensor 3 is, for example, one of the devices described above: a wristband-type device, a camera of some kind, a headset, or a smartphone.

The external environment information acquisition sensor 4 is, for example, the thermometer, hygrometer, or barometer described above, or a weather information database on the Internet.

The environment database 5 stores this vital information and external environment information. In FIG. 7 the environment database 5 is located outside the software development support apparatus 1, but this is not the only possible form: the information may be recorded in a database inside the apparatus, such as the change history database 104 or the learning database 200, or a separate database may be provided inside the software development support apparatus 1.

After the parameters of the bug-introducing revision have been acquired, these bug inclusion parameters are registered in the learning database (step S306). Registering each parameter in the learning database increases the number of samples available for machine learning and thereby improves its accuracy.

Next, based on the parameters registered in the learning database and on previous learning data, the bug inclusion probability calculation unit 208 performs machine learning, for example to find what tendencies the parameters of bug-introducing revisions show. The learning data that associates the parameters with the bug inclusion probability, as computed by the machine learning, is then updated as the new machine learning parameters (step S308). The learning model estimated by the machine learning may also be registered in the learning database 200.

Labeled training data for machine learning is created by attaching to the bug inclusion parameters described above a label indicating whether each revision was a bug-introducing revision. In some cases the label may be multi-valued rather than binary, for example no bug, low likelihood of a bug, or high likelihood of a bug. A random forest or a support vector machine is typically adopted as the machine learning algorithm, but another algorithm, including deep learning, may be used.

A machine learning model is created from the labeled data produced as above. The hyperparameters used when creating the model are determined by a search such as grid search, random sampling, or Bayesian optimization, and the parameters varied in the search may include the ratio used to decide the bug inclusion probability. Here, a hyperparameter is a parameter that determines the prior model or that affects the probabilistic model as a whole.

The hyperparameters searched by grid search or the like depend on the algorithm adopted: for a random forest, the number of decision trees, their depth, and the number of samples; for a support vector machine, the gamma value and the cost; for deep learning, the choice of activation function, the number of units, the number of intermediate layers, and the initial weights. The bug inclusion parameters at commit time are given as input to the learning model created here, and the bug inclusion probability is output as the result.
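
Grid search, one of the search strategies named above, simply evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below is generic: the function name and the shape of `param_grid` are assumptions, and `score` stands for whatever validation measure is used to compare models (the source does not specify one).

```python
from itertools import product

def grid_search(param_grid, score):
    """Return (best_params, best_score) over all combinations in param_grid.

    param_grid maps each hyperparameter name to a list of candidate values;
    score is a caller-supplied function taking a params dict (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score
```

For a random forest, for example, the grid might be `{"n_trees": [10, 50], "max_depth": [2, 4, 8]}`, with `score` training a model for each combination and returning its accuracy on held-out commits.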

When a random forest is adopted, the parameters and the data on whether a bug occurred are first randomly subsampled, as training data, into a predetermined number of sets. For example, records stating that a commit made by a certain developer within a certain time on a certain day of the week did or did not contain a bug are drawn at random to create subsamples of a predetermined size, typically about two thirds of all samples. A decision tree is then generated for each subsample, using that subsample as training data. For these trees, learning data is created using, as one example, the developer and the time as parameters. The granularity of the binary or multi-valued classification can be refined by adjusting the number of nodes in the final trees. Determining the structure of each tree from different parameters lowers the correlation between the trees. Using the learning model obtained from the learning data in this way, the bug inclusion probability is output by weighting the parameters and selecting functions.
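
The subsample-then-fit procedure can be illustrated with a deliberately simplified stand-in for a random forest: an ensemble of one-split decision trees (stumps), each fitted on a random two-thirds subsample using one randomly chosen numeric feature. All names, the use of stumps instead of full decision trees, and the squared-error split criterion are assumptions made to keep the sketch short; a practical system would more likely use an established library implementation.

```python
import random

def _bug_rate(rows):
    # Fraction of bug-introducing samples; 0.5 when a side of the split is empty.
    return sum(label for _, label in rows) / len(rows) if rows else 0.5

def train_stump(samples, feature):
    """Fit a one-split tree on `feature`; each leaf stores its bug rate."""
    best = None
    for threshold in sorted({x[feature] for x, _ in samples}):
        left = [(x, y) for x, y in samples if x[feature] <= threshold]
        right = [(x, y) for x, y in samples if x[feature] > threshold]
        p_left, p_right = _bug_rate(left), _bug_rate(right)
        error = (sum((y - p_left) ** 2 for _, y in left)
                 + sum((y - p_right) ** 2 for _, y in right))
        if best is None or error < best[0]:
            best = (error, threshold, p_left, p_right)
    return (feature,) + best[1:]

def train_forest(samples, features, n_trees=25, seed=0):
    """Train n_trees stumps, each on a random 2/3 subsample and a random feature."""
    rng = random.Random(seed)
    size = max(1, (2 * len(samples)) // 3)
    return [train_stump(rng.sample(samples, size), rng.choice(features))
            for _ in range(n_trees)]

def bug_probability(forest, x):
    """Average the leaf bug rates of all trees for the parameter dict x."""
    votes = [p_left if x[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)
```

Here `samples` is a list of `(parameters, label)` pairs in which the parameters are numeric (for example the commit hour and the number of added lines) and the label is 1 for a bug-introducing commit and 0 otherwise.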

When a support vector machine is adopted, the parameters and the data on whether a bug occurred are used as training data to compute, in a space whose dimension equals the number of parameters, a separating hyperplane that makes classification possible and maximizes the margin from the points formed by the parameters. As noted above, the hyperparameters include the cost C, which weights the error, and gamma γ, which determines the ratio between maximizing the margin and minimizing the error. As with the random forest, binary or multi-valued classification is possible, and a learning model that outputs the bug inclusion probability is estimated based on the parameters obtained in this way.

When deep learning is adopted, learning data is likewise generated using the parameters and the data on whether a bug occurred as training data. Deep learning is generally implemented as a neural network with multiple intermediate layers. Many architectures exist, for example convolutional neural networks, recurrent neural networks, autoencoders, and deep Boltzmann machines, and several of them may be combined for learning. The set of hyperparameters depends on the algorithm used, but as noted above it includes the choice of the activation function that fires the synapses connecting the neurons of each layer, the number of units in each intermediate layer, the number of intermediate layers, and the initial weighting function. Based on these hyperparameters, a learning model that outputs the bug inclusion probability is estimated by performing binary or multi-valued classification.

The algorithm is not limited to those described above; any machine learning model capable of appropriate supervised learning may be used.

Next, the parameters at commit time are registered in the learning database 200 (step S310). The data registered here are parameters equivalent to those acquired in step S304; that is, data such as the developer and time information of the commit are registered in the learning database 200. As a result, if the commit introduced a bug, a bug correction commit made in the future allows it to be detected as the commit that introduced the bug. If the commit is never detected as bug-introducing, its parameters remain stored in the learning database 200 as parameters of a commit that did not introduce a bug.

On the other hand, if the commit was not determined to be a bug correction in step S300 (step S300: No), the parameters at commit time are likewise registered in the learning database 200 (step S310). The reason is the same as above: the state and environment of the commit are registered as learning data regardless of whether it is a bug correction commit.
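
The bookkeeping implied by steps S306 and S310 can be sketched as a small in-memory store: every commit is registered with its parameters, and a revision is relabeled only when a later bug correction points back to it. The class and method names are hypothetical, and a real learning database 200 would be persistent rather than a Python dictionary.

```python
class LearningDatabase:
    """Toy stand-in for the learning database 200 (in-memory only)."""

    def __init__(self):
        # revision -> {"params": ..., "bug_introducing": bool}
        self._records = {}

    def register_commit(self, revision, params):
        # Step S310: every commit is stored, initially as "no bug introduced".
        self._records[revision] = {"params": params, "bug_introducing": False}

    def mark_bug_introducing(self, revision):
        # Steps S302 to S306: a later bug correction identified this revision.
        self._records[revision]["bug_introducing"] = True

    def training_data(self):
        # Labeled pairs (params, 1 or 0) in registration order.
        return [(r["params"], int(r["bug_introducing"]))
                for r in self._records.values()]
```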

When learning of the bug inclusion parameters is finished, the process returns to FIG. 3, and the bug inclusion probability of the input source code is calculated and output (step S114). The bug inclusion probability calculation unit 208 calculates, based on the learned bug inclusion parameters, the probability that the commit introduced a bug. For example, it may compute that the bug inclusion probability is 30% when a certain developer commits at around 3 o'clock, or 50% when a commit changes a specific part of a certain file. The calculated information is output to the client 2 via the bug inclusion probability output unit 202.
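
Statements such as "30% when a certain developer commits at around 3 o'clock" can be illustrated with a plain frequency count over labeled commits. This is only a simplified stand-in for the learned model, and the function name and record layout are assumptions made for the example.

```python
from collections import defaultdict

def bug_rate_by_key(records, key):
    """Estimate the bug inclusion rate per group.

    records: iterable of (params dict, label) with label 1 for a
    bug-introducing commit and 0 otherwise.
    key: function mapping a params dict to a hashable group key.
    """
    totals = defaultdict(int)
    bugs = defaultdict(int)
    for params, label in records:
        group = key(params)
        totals[group] += 1
        bugs[group] += label
    return {group: bugs[group] / totals[group] for group in totals}
```

Grouping by `lambda p: (p["developer"], p["hour"])` gives one rate per developer and hour; grouping by file name gives one rate per file.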

The processing above was described as taking place after the commit has been made. However, the bug inclusion probability may also be calculated from information on a planned commit before the files to be committed are sent to the software development support apparatus 1, which acts as the server, so that the probability that the planned commit contains a bug can be displayed before committing.

Also, although learning was described above as being performed for each commit, it is not limited to this and may be performed periodically, for example by batch processing. Learning may be performed after a predetermined number of commits, at a fixed time such as midnight, or at any other appropriate timing.
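
The two batch triggers mentioned here, a commit count and a fixed time of day, might be checked as in the sketch below. The function names and default values are hypothetical and only illustrate the idea.

```python
from datetime import datetime, time, timedelta

def due_by_count(commits_since_last_run: int, every_n_commits: int = 100) -> bool:
    """True once the predetermined number of commits has accumulated."""
    return commits_since_last_run >= every_n_commits

def due_by_time(last_run: datetime, now: datetime,
                daily_at: time = time(0, 0)) -> bool:
    """True if the daily scheduled time has passed since the last learning run."""
    # Most recent scheduled instant that is not later than `now`.
    scheduled = datetime.combine(now.date(), daily_at)
    if scheduled > now:
        scheduled -= timedelta(days=1)
    return scheduled > last_run
```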

As described above, the present embodiment makes it possible to show the developer the probability of a defect at or before commit time. For example, if a high defect probability is displayed before a commit, testing in advance can keep the bug from being introduced, and even after the commit, the reliability of bug correction can be improved.

Using the developer as a parameter makes developers aware of the situations in which they tend to introduce bugs, which can prevent bugs in advance. Furthermore, when the bug probability for a particular file is high, a warning can be displayed stating that the file's content is advanced and a bug is likely to be introduced; the developer can then test again and correct bugs before committing, avoiding the introduction of bugs. Thus, machine learning that uses the environment and circumstances at commit time as parameters improves the accuracy and speed of bug correction and also reduces the introduction of bugs in the first place.

In the embodiment described above, each function may be implemented in hardware, such as a circuit having the function, an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array), or in software using a program.

The present invention is not limited to the above embodiments; various modifications are possible, including within the range of equivalents, and embodiments incorporating such modifications, as well as combinations of the embodiments described above, are also included within the scope of the present invention.

1: software development support apparatus, 2: client, 3: vital information acquisition sensor, 4: external environment information acquisition sensor, 5: environment database, 10: version management unit, 100: source code input unit, 102: source code database, 104: change history database, 106: repository information output unit, 108: change location detection unit, 110: changed line count unit, 20: bug information learning unit, 200: learning database, 202: bug inclusion probability output unit, 204: bug correction commit detection unit, 206: bug-introducing revision detection unit, 208: bug inclusion probability calculation unit

Claims

A software development support apparatus comprising:
a bug correction commit detection unit that determines, based on information on the source code changed in a commit, whether the commit is an update that corrects a bug introduced in another commit made before the commit, and thereby detects a commit that is a bug correction;
a bug-introducing revision detection unit that, when the commit is detected as an update that corrects a bug, detects the revision of the other commit that introduced the bug corrected in the commit; and
a database that stores information identifying the revision of the other commit that introduced the bug.

The software development support apparatus according to claim 1, wherein the bug correction commit detection unit determines whether the commit is a commit related to bug correction by comparing the deleted part with the added part of the source code in the files updated in the commit.

The software development support apparatus according to claim 2, wherein the bug correction commit detection unit determines that the commit is a commit related to bug correction when the difference between the information amount of the deleted part and the information amount of the added part is within a predetermined range.

The software development support apparatus described, wherein the information amount of the deleted part and the information amount of the added part are each defined based on at least one of the number of lines, the number of characters, the number of bytes, the number of instructions, and the number of steps.

The software development support apparatus according to claim 1, further comprising:
a bug inclusion probability calculation unit that calculates, based on the circumstances in which the commit was made, a bug inclusion probability, which is the probability that a new bug was introduced into a file updated in the commit; and
a bug inclusion probability output unit that outputs the calculated bug inclusion probability.

The software development support apparatus according to claim 3, wherein
the database stores the state of each commit as parameters, and
the bug inclusion probability calculation unit calculates a learning model by machine learning using the parameters indicating the commit state stored in the database, and calculates the bug inclusion probability of the commit based on the learning model.

The software development support apparatus according to claim 6, wherein the parameter includes at least information on a developer who has made the commit and information on a time at which the commit has been made.

The software development support apparatus according to claim 6 or 7, wherein the bug inclusion probability calculation unit calculates the learning model using, in addition to the parameters indicating the commit state stored in the database, parameters indicating the circumstances of the commit that are stored in an environment database provided inside or outside the software development support apparatus.

The software development support apparatus according to claim 8, wherein the parameters indicating the circumstances of the commit stored in the environment database include at least vital information, which indicates the developer's state of health, or external environment information, which indicates the developer's surroundings.

A software development support method comprising:
a step in which a bug correction commit detection unit determines, based on information on the source code changed in a commit, whether the commit is an update that corrects a bug introduced in another commit made before the commit, and thereby detects a commit that is a bug correction;
a step in which a bug-introducing revision detection unit, when the commit is detected as an update that corrects a bug, detects the revision of the other commit that introduced the bug corrected in the commit; and
a step of storing, in a database, information identifying the revision of the other commit that introduced the bug.

A program that causes a computer to function as:
bug correction commit detection means for determining, based on information on the source code changed in a commit, whether the commit is an update that corrects a bug introduced in another commit made before the commit, and thereby detecting a commit that is a bug correction;
bug-introducing revision detection means for detecting, when the commit is detected as an update that corrects a bug, the revision of the other commit that introduced the bug corrected in the commit; and
storage means for storing information identifying the revision of the other commit that introduced the bug.