JP2018106618A

JP2018106618A - Image data classifying apparatus, object detection apparatus, and program therefor

Info

Publication number: JP2018106618A
Application number: JP2016255555A
Authority: JP
Inventors: Yoshihiko Kawai (河合 吉彦); Masaki Sano (佐野 雅規)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-12-28
Filing date: 2016-12-28
Publication date: 2018-07-05
Anticipated expiration: 2036-12-28
Also published as: JP6924031B2

Abstract

PROBLEM TO BE SOLVED: To provide an image data classification device that classifies image data more robustly and accurately while remaining general-purpose, an object detection device that detects objects in image data more robustly and with higher accuracy, and programs for these devices.

SOLUTION: An image data classification device 1 of the present invention comprises a learning processing unit 2 that learns and constructs a decision tree from training data prepared in advance, using multi-scale convolution filters, and an identification processing unit 3 that classifies an input image to be identified according to the learned decision tree, using the same multi-scale convolution filters. An object detection device 10 of the present invention comprises the image data classification device 1 and classification result determination means 12, 13 that, based on the classification results of the image data classification device 1, execute in parallel a determination process that determines whether an object is present in the image within a scanning window and a feature point selection process that selects feature points serving as image features when the object is present.

SELECTED DRAWING: FIG. 1

Description

The present invention relates to techniques for classifying image data, and in particular to an image data classification device that classifies images through machine learning, an object detection device that uses this image data classification device to detect predetermined objects (such as faces, persons, and vehicles) in image data, and programs for these devices.

Image data is commonly classified with a decision tree constructed by machine learning. A decision tree classifies input data on the basis of if-then rules.

In particular, each node of a decision tree that classifies still-image data computes a predetermined feature value from the input image data (the input image) and serves to split the input images into two groups according to that value. The node branches depending on whether the computed feature value exceeds the node threshold used for the split. The decision tree repeats this branching, and the classification result of the leaf node finally reached is taken as the label of the input image. A label here indicates the classification result for an object to be detected (such as a face, a person, or a vehicle).
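The branching described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Node`, `Leaf`, `classify`, `mean_brightness`, and `toy_tree` are hypothetical, and the brightness feature is a stand-in chosen for brevity, not a feature taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Image = Sequence[Sequence[float]]


@dataclass
class Leaf:
    label: str  # classification result stored at a leaf node


@dataclass
class Node:
    feature: Callable[[Image], float]  # computes this node's feature value
    threshold: float                   # node threshold used for the split
    above: Union["Node", Leaf]         # branch taken when feature > threshold
    below: Union["Node", Leaf]         # branch taken otherwise


def classify(tree: Union[Node, Leaf], image: Image) -> str:
    """Follow the if-then branches until a leaf is reached; return its label."""
    node = tree
    while isinstance(node, Node):
        node = node.above if node.feature(image) > node.threshold else node.below
    return node.label


def mean_brightness(image: Image) -> float:
    """Stand-in feature (an assumption for this sketch): mean pixel value."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


# A one-node toy tree: bright images get "label 1", dark ones "label 2".
toy_tree = Node(mean_brightness, 0.5, Leaf("label 1"), Leaf("label 2"))
```

Calling `classify(toy_tree, image)` walks from the root to a leaf, which mirrors the repeated threshold comparison in the text.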

The procedure for constructing a decision tree by machine learning is as follows. A group of images to which correct labels have been assigned (positive examples) and a group of images to which no correct label has been assigned (negative examples) are prepared in advance as training data. Various machine learning algorithms exist for constructing decision trees, including ID3 and CART. Multiple kinds of correct or incorrect labels (label 1, label 2, and so on) may be used.

A set of various kinds of separating features (hereinafter, the "feature pool"), used to separate positive examples from negative examples and to classify the positive examples, is also prepared in advance. The image features of the positive and negative example images (the feature values that characterize each image) are computed from the features in this pool. Likewise, for an input image to be classified with the decision tree, the feature values that characterize it are computed from the features in the pool.

To construct a decision tree by machine learning, the feature that best separates the training data (more precisely, the image features of the training data) is selected from the feature pool, the data are branched by a node threshold, and each resulting node is branched again in turn. The node threshold is determined anew for each node so that the node under consideration is split into two.

This branching is repeated until the number of training samples belonging to the node under consideration falls to or below a predetermined threshold, or until the separation accuracy of the training data at that node falls to or below a predetermined threshold (that is, until no further improvement in separation accuracy can be expected). The Gini coefficient, information gain, and similar measures are commonly used to judge the quality of a split.
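As a rough illustration of how a node threshold might be chosen, the sketch below scores candidate thresholds for a single feature with the Gini impurity used by CART. The function names, the use of midpoints between sorted feature values as candidates, and the convention that values at or below the threshold go to the left branch are all assumptions made for this sketch.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((list(labels).count(c) / n) ** 2 for c in set(labels))


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    If all values are identical, no split exists and (nan, inf) is returned.
    """
    best = (float("nan"), float("inf"))
    ordered = sorted(set(values))
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2.0
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best
```

Running `best_threshold` for every feature in the pool and keeping the feature with the lowest score would give one way to pick "the feature that best separates the training data"; the stopping rules in the text then decide whether the node is split at all.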

To construct a decision tree with high separation accuracy, the choice of features used for node branching (both the features in the feature pool and the feature values that serve as image features) is important.

As one conventional technique, a method has been disclosed that divides the input image into two small regions and uses as the feature value the sum of the pixels in the first region minus the sum of the pixels in the second region (see, for example, Non-Patent Document 1). Non-Patent Document 1 calls this a Haar-like feature, and its Figure 1 shows examples in which the feature value is the sum of the pixels in the gray region minus the sum of the pixels in the white region. In Non-Patent Document 1, the feature pool consists of such features with the position and size of the small regions varied.
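A two-rectangle Haar-like feature of this kind can be sketched as follows. The region sums are computed with an integral image (summed-area table), the technique Viola and Jones use for fast evaluation. The function names and the left/right layout of the two regions are assumptions for this sketch, which is not a reproduction of the cited report's implementation.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def region_sum(ii: List[List[float]], top: int, left: int, height: int, width: int) -> float:
    """Sum of pixels in the rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(img: Sequence[Sequence[float]], top: int, left: int,
                  height: int, width: int) -> float:
    """Sum of the left half of the window minus the sum of the right half."""
    ii = integral_image(img)
    half = width // 2
    first = region_sum(ii, top, left, height, half)
    second = region_sum(ii, top, left + half, height, half)
    return first - second
```

Varying `top`, `left`, `height`, and `width` yields the family of features that makes up the feature pool described in the text.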

Another disclosed technique assigns a set of regularly arranged coordinate points (pixel coordinates) to the input image in advance so that they can be finely adjusted, selects two of these coordinate points (facial feature points), and uses the difference between the pixel values at the two selected points as the feature value (see, for example, Non-Patent Document 2). Figure 9 of Non-Patent Document 2 shows examples of this feature; the feature pool consists of the various combinations of two selected coordinate points. Non-Patent Document 2 also proposes defining the adjustable coordinate points in a local coordinate system rather than an absolute coordinate system. Facial feature point detection on image data can be used for person recognition.
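The pixel-difference feature and its pool of point pairs can be illustrated with a short sketch. The function names are hypothetical, points are assumed to be `(x, y)` pixel coordinates in an absolute coordinate system, and the local-coordinate refinement mentioned in the text is left out.

```python
import itertools
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def pixel_difference(img: Sequence[Sequence[float]], p1: Point, p2: Point) -> float:
    """Feature value: pixel value at p1 minus pixel value at p2."""
    (x1, y1), (x2, y2) = p1, p2
    return img[y1][x1] - img[y2][x2]


def pair_pool(points: Sequence[Point]) -> List[Tuple[Point, Point]]:
    """Feature pool: every unordered pair of the given coordinate points."""
    return list(itertools.combinations(points, 2))
```

With n coordinate points the pool holds n(n-1)/2 pairs, so it grows quadratically with the number of points.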

Non-Patent Document 1: P. Viola and M. Jones, "Robust Real-time Object Detection", Technical Report Series, CRL 2001/1, February 2001.
Non-Patent Document 2: X. Cao, Y. Wei, F. Wen, and J. Sun, "Face Alignment by Explicit Shape Regression", In Proc. CVPR, 2012.

Non-Patent Document 1 proposes features designed specifically for detecting the region in which a face appears. Non-Patent Document 2 proposes features designed specifically for detecting multiple coordinate points (facial feature points) in a face image.

Because the features in these conventional techniques are designed for a specific purpose, they lack versatility and are difficult to use for classifying image data for any other purpose.

Image data classification generally serves many object detection applications, including face detection and facial feature point detection, vehicle detection and vehicle feature point detection, and combinations of these. Features designed for a single purpose therefore lack versatility.

Furthermore, to use these conventional techniques both to detect whether a face is present in an input image and to detect multiple coordinate points (facial feature points) in the face image for use in person recognition, one would first perform face detection based on the technique of Non-Patent Document 1 and then detect the coordinate points in the face image based on the technique of Non-Patent Document 2. Such a two-stage process is not efficient.

In addition, input images subject to face detection and facial feature point detection generally contain noise and a variety of face orientations (deformations of the face image). High face detection accuracy is required first of all, yet the face detection performance of the technique of Non-Patent Document 1 is not sufficient for practical use.

There is therefore a need for an image data classification technique that classifies image data more robustly and accurately while remaining general-purpose, and for an object detection technique that detects objects in image data more robustly and with higher accuracy, so that objects can be detected robustly and their feature points acquired efficiently even when the input image contains noise or the target object appears in varied orientations.

In view of the above problems, an object of the present invention is to provide an image data classification device that classifies image data more robustly and accurately while remaining general-purpose, an object detection device that detects objects in image data more robustly and with higher accuracy, and programs for these devices.

The image data classification device of the present invention is an image data classification device that classifies image data, comprising: a learning processing unit that learns and constructs a decision tree from training data prepared in advance, using multi-scale convolution filters; and an identification processing unit that classifies an input image to be identified according to the learned decision tree, using the multi-scale convolution filters.

In the image data classification device of the present invention, the learning processing unit comprises: feature pool means that holds, as a feature pool, one or more predetermined reference coordinate points, multiple kinds of convolution filters composed of multiple kinds of filter coefficients predetermined for each filter size, and multiple predetermined filter sizes; first convolution filter processing means that, for each of the input training samples, executes multi-scale convolution filter processing with the multiple kinds of convolution filters at the multiple filter sizes in accordance with the feature pool, and obtains, for each training sample and for each of the one or more reference coordinate points, as many convolution values as there are kinds of convolution filters; separation accuracy calculation means that, based on combination information associating the one or more reference coordinate points, the multiple kinds of convolution filters, and the corresponding convolution values for every training sample, determines, for the one or more reference coordinate points, the kind of convolution filter that most accurately separates all training samples at the node being branched into two groups, together with the node threshold for that separation; and node branching means that constructs the decision tree by separating all the training samples into two groups as a node branch based on the node threshold, holding the kind of convolution filter and the node threshold for that node branch in association with the node, and repeatedly controlling further node branching on all the training samples after the branch.

In the image data classification device of the present invention, the node branching means learns and constructs the decision tree by repeating this control until the number of training samples belonging to the node under consideration falls to or below a predetermined threshold, or until the separation accuracy of the training data at that node falls to or below a predetermined threshold.

In the image data classification device of the present invention, the identification processing unit comprises: learning result storage means that stores the decision tree constructed by the node branching means; and second convolution filter processing means that classifies the input image to be identified according to the learned decision tree, using the multi-scale convolution filters.

Furthermore, the object detection device of the present invention is an object detection device that detects a predetermined object in an input frame image, comprising: the image data classification device of the present invention; and classification result determination means that, based on the classification results of the image data classification device, executes in parallel a determination process that determines whether an object is present in the image within a predetermined scanning window on the input frame image and a feature point selection process that selects feature points serving as image features when the object is present.

The object detection device of the present invention comprises the image data classification device of the present invention, in which: the feature pool means has means for holding, as the feature pool, a plurality of the reference coordinate points, multiple kinds of convolution filters composed of multiple kinds of filter coefficients predetermined for each filter size, and multiple predetermined filter sizes; the convolution filter processing means has means for further obtaining the difference between the convolution values at two specific, updatable coordinate points among the plurality of reference coordinate points; the separation accuracy calculation means has means for determining, based on all combinations of the differences between convolution values at two specific, updatable coordinate points among the plurality of reference coordinate points, the kind of convolution filter that most accurately separates all training samples at the node being branched into two groups for the plurality of reference coordinate points, together with the node threshold for that separation; and the node branching means has means for learning and constructing the decision tree by separating all the training samples into two groups as a node branch based on the node threshold, holding the kind of convolution filter and the node threshold for that node branch in association with the node, and repeatedly controlling further node branching on all the training samples after the branch.

In the object detection device of the present invention, the classification result determination means comprises reference coordinate point updating means that sets initial values for a predetermined number of reference coordinate points among the plurality of reference coordinate points in the feature pool means and causes the image data classification device to update those points so as to correct deviations in their positional relationship, using local coordinate systems whose origins are the respective initial values of the predetermined number of reference coordinate points.

Furthermore, the program of the present invention is configured as a program that causes a computer to function as the image data classification device of the present invention, or as a program that causes a computer to function as the object detection device of the present invention.

The image data classification technique of the present invention makes it possible to classify image data more robustly and accurately while remaining general-purpose, and thus to estimate the labels of image data accurately. Based on this classification technique, target objects can then be detected in image data.

A block diagram showing the schematic configuration of an image data classification device according to one embodiment of the present invention.
A flowchart showing the learning process in the image data classification device according to one embodiment of the present invention.
An explanatory diagram of the learning process in the image data classification device according to one embodiment of the present invention.
A schematic diagram of a decision tree constructed by the image data classification device according to one embodiment of the present invention.
A block diagram showing the schematic configuration of a face detection device, one example configured as an object detection device according to one embodiment of the present invention.
An explanatory diagram of the scanning window setting unit in the face detection device, one example configured as an object detection device according to one embodiment of the present invention.
(a) is an explanatory diagram of three example facial feature values in the face detection device, one example configured as an object detection device according to one embodiment of the present invention; (b) is an explanatory diagram illustrating the reference coordinates updated in a local coordinate system for the three example facial feature values in the face detection device; and (c) is an explanatory diagram illustrating, as a comparative example, the reference coordinates updated in an absolute coordinate system for the three example facial feature values.
An explanatory diagram of the operation of the face detection device, one example configured as an object detection device according to one embodiment of the present invention.
A diagram comparing the performance of the face detection device, one example configured as an object detection device according to one embodiment of the present invention, with the technique of Non-Patent Document 1.

[Image data classification device]
First, an image data classification device 1 according to one embodiment of the present invention is described with reference to FIGS. 1 to 4.

(Device configuration)
FIG. 1 is a block diagram showing the schematic configuration of the image data classification device 1 according to one embodiment of the present invention. The image data classification device 1 classifies image data with a decision tree constructed by machine learning.

To classify the image data of an input still image, each node of the decision tree computes a predetermined feature value from the image data to be classified (the input image) and serves to split the input images into two groups according to that value; the node branches depending on whether the computed feature value exceeds the node threshold used for the split. The decision tree repeats this branching, and the classification result of the leaf node finally reached is taken as the label of the input image. The label indicates the classification result for an object to be detected (such as a face, a person, or a vehicle).

The image data classification device 1 according to the present invention differs from the conventional techniques (in particular, those of Non-Patent Documents 1 and 2) in the features it uses for node branching (both the features in the feature pool and the feature values that serve as image features): as features with greater expressive power, it uses features based on multi-scale convolution filters.

More specifically, in the image data classification device 1 according to the present invention, the features in the feature pool are one or more predetermined reference coordinate points, multiple kinds of convolution filters composed of multiple kinds of filter coefficients predetermined for each filter size, and multiple predetermined filter sizes.

For each reference coordinate point, the image data classification device 1 applies a convolution filter composed of specific filter coefficients at each of the filter sizes, and uses as the image feature value the value obtained by normalizing and combining the filtered pixel values across the filter sizes (the convolution value g).

Note that the features of the present invention can express both of the features used in the techniques of Non-Patent Documents 1 and 2; the details are given later in the description of the object detection device 10 according to the present invention.

That is, as described later with reference to FIG. 3, let h(K+i, K+j) denote collectively each of the m kinds of convolution filters h_m, let N×N pixels (height by width, N odd) denote collectively each of the filter sizes N_n of the convolution filter, and let (x, y) denote collectively the coordinates of each of the k (k an integer of 1 or more) reference coordinate points P_k = (x_k, y_k) on the input image f. The value obtained by normalizing and combining the filtered pixel values across the filter sizes (the convolution value g) is then defined as in Equation (1). The filter sizes N×N of the convolution filters are set in advance in the feature pool, which makes the convolution filter processing multi-scale.

In this example, the filter coefficient values of the convolution filter h(K+i, K+j) (0 ≤ i, j < N) and the reference coordinate points P_k = (x_k, y_k), which are the pixels of interest to which the convolution filter is applied, are set randomly and used as the feature pool. The reference coordinate points P_k may instead be chosen in advance to suit the application. Depending on the application, the kinds of convolution filter h(K+i, K+j) used in the feature pool, the positions of the reference coordinate points P_k, and the filter sizes N×N are preferably configurable from outside.
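A randomly initialized feature pool of this kind might look like the sketch below. Everything specific here is an assumption for illustration: the function name `make_feature_pool`, the dictionary layout, the coefficient range [-1, 1], the uniform sampling, and the margin that keeps each reference point far enough from the border for the filter to fit.

```python
import random
from typing import Dict, List, Tuple

Kernel = List[List[float]]


def make_feature_pool(num_filters: int, filter_size: int, num_points: int,
                      width: int, height: int, seed: int = 0) -> Dict[str, list]:
    """Build a pool of random N x N filters and random reference points.

    filter_size is N (odd); the coefficient range and the uniform sampling
    are assumptions of this sketch, not values given in the source.
    """
    rng = random.Random(seed)
    filters: List[Kernel] = [
        [[rng.uniform(-1.0, 1.0) for _ in range(filter_size)]
         for _ in range(filter_size)]
        for _ in range(num_filters)
    ]
    margin = filter_size // 2  # keep the whole filter window inside the image
    points: List[Tuple[int, int]] = [
        (rng.randrange(margin, width - margin),
         rng.randrange(margin, height - margin))
        for _ in range(num_points)
    ]
    return {"filters": filters, "points": points}
```

Exposing `num_filters`, `filter_size`, and the point positions as parameters reflects the text's preference for a pool that can be configured from outside.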

The image data classification device 1 according to the present invention achieves multi-scale processing by varying the filter size of the convolution filter. In the example described below, to reduce computational cost, this is done by reducing the size of the target image rather than enlarging the filter. An embodiment that enlarges the filter without changing the size of the target image is also possible.

Described more concretely with reference to FIG. 1, the image data classification device 1 of this embodiment comprises a learning processing unit 2 that learns and constructs a decision tree from training data using multi-scale convolution filters, and an identification processing unit 3 that estimates the label of an input image (still image) to be classified according to the learned decision tree, using the multi-scale convolution filters.

The learning processing unit 2 includes a feature amount pool unit 21, a multi-resolution image generation unit 22, a filter convolution unit 23, a separation accuracy calculation unit 24, and a node branching unit 25. As learning data for machine learning, a group of images to which a correct-answer label has been assigned (positive examples) and a group of images to which no correct-answer label has been assigned (negative examples) are prepared in advance.

The feature amount pool unit 21 holds one or more predetermined reference coordinate points, multiple types of convolution filters each composed of filter coefficients predetermined for each filter size, and multiple predetermined filter sizes.

For each of the input learning data, the multi-resolution image generation unit 22 performs multiple resolution conversions according to the feature amount pool held in the feature amount pool unit 21 (resolutions corresponding to the multiple filter sizes), generates multiple resolution images for each learning datum, and outputs them to the filter convolution unit 23.

For the multiple resolution images of each learning datum obtained from the multi-resolution image generation unit 22, the filter convolution unit 23 executes convolution filter processing according to the feature amount pool held in the feature amount pool unit 21 (the individual reference coordinate points and the individual convolution filters). For each combination of a reference coordinate point and a convolution filter with the same filter coefficients, it runs the convolution filter processing on the multiple resolution images, obtains the filtered pixel value for each of the filter sizes, and computes a value that combines these pixel values with normalization (the convolution value g). Thus, for one learning datum, each of the one or more reference coordinate points yields as many convolution values g as there are types of convolution filter.
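A minimal sketch of how one convolution value g might be computed is given below. It assumes that the filter is centered on the reference point, that pixels outside the image count as zero, that the reference point is rescaled to each resolution image, and that the "normalized combination" is a plain average over scales. None of these choices is confirmed by this passage, and the function names are hypothetical. The inner sum is a correlation-style weighted sum (no kernel flip).

```python
def filter_response(img, point, coeffs):
    """Weighted sum of the N x N neighbourhood centred on `point`.

    Pixels outside the image are treated as zero (an assumption).
    """
    n = len(coeffs)
    k = n // 2
    x0, y0 = point
    total = 0.0
    for j in range(n):
        for i in range(n):
            x, y = x0 + i - k, y0 + j - k
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                total += coeffs[j][i] * img[y][x]
    return total


def convolution_value(pyramid, scales, point, coeffs):
    """Average the responses of one filter at one reference point over
    all resolution images (plain averaging is an assumed normalization)."""
    responses = []
    for img, s in zip(pyramid, scales):
        scaled_point = (int(point[0] * s), int(point[1] * s))
        responses.append(filter_response(img, scaled_point, coeffs))
    return sum(responses) / len(responses)
```

Applying the same small filter to progressively smaller images is what stands in for a larger filter on the full-size image.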

Consequently, for one learning datum, each reference coordinate point P_k yields multiple convolution values g, each associated with one of the multiple types of convolution filter h_m, where each filter is convolved at the predetermined number of filter sizes N × N. The combination of the one or more reference coordinate points P_k, the convolution filters h_m, and the convolution values g associated with them is therefore expressed as the feature vector that characterizes that learning datum.

The multi-resolution image generation unit 22 and the filter convolution unit 23 perform the same processing on all the learning data.

The filter convolution unit 23 then outputs to the separation accuracy calculation unit 24, in association with each learning datum, the combination information of the one or more reference coordinate points P_k, the multiple types of convolution filter h_m, and the associated convolution values g, which together form the feature vector characterizing that learning datum.

The separation accuracy calculation unit 24 acquires from the filter convolution unit 23, for every learning datum, the combination information of the one or more reference coordinate points P_k, the multiple types of convolution filter h_m, and the associated convolution values g. For a combination of a preset specific number of reference coordinate points P_k selected from the one or more reference coordinate points P_k (each yielding its own convolution value g), it determines the type of convolution filter h_m that most accurately separates all the learning data into two groups, together with the node threshold for that separation, and outputs them to the node branching unit 25. The quality of a separation is judged with the same measures as in the prior art, such as the Gini coefficient or information gain.
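The search for the best filter and node threshold can be sketched as an exhaustive scan scored by Gini impurity, one of the measures named above. The layout `features[s][m]` (convolution value g of candidate m for sample s), the 0/1 labels, and the midpoint thresholds are assumptions made for this illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(features, labels):
    """Return (weighted impurity, candidate index m, threshold) of the
    split that separates the samples most cleanly, or None if no
    candidate has two distinct values."""
    best = None
    n = len(labels)
    for m in range(len(features[0])):
        values = sorted(set(row[m] for row in features))
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for row, y in zip(features, labels) if row[m] < t]
            right = [y for row, y in zip(features, labels) if row[m] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, m, t)
    return best
```

A lower weighted impurity means a cleaner separation; replacing `gini` with an entropy function would give the information-gain variant.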

Based on the node threshold obtained from the separation accuracy calculation unit 24, the node branching unit 25 separates all the learning data into two groups as a node branch, and holds the type of convolution filter h_m and the node threshold for that branch in association with the node for use in constructing the decision tree.

Furthermore, the node branching unit 25 instructs the filter convolution unit 23 to perform further node branching on each of the branched nodes, has the learning data corresponding to each branched node allocated, and repeats this until the number of learning data belonging to the node under consideration falls to or below a predetermined threshold, or the separation accuracy of the learning data at that node falls to or below a predetermined threshold (that is, until no further improvement in separation accuracy can be expected). A node that can no longer be branched becomes a leaf node. According to the correct-answer or incorrect-answer labels of the learning data that finally remain at that node, a discrimination label is determined that indicates correct or incorrect as the discrimination result and, if correct, the type of the convolution filter h_m.
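The repeated branching with its two stopping conditions can be sketched as a recursion. This is an illustrative outline, not the device's actual control flow: `grow`, `split_fn`, `min_samples`, and `min_gain` are hypothetical names, the tree is a plain dictionary, and a leaf simply takes the majority label of the data that remain.

```python
def grow(samples, labels, split_fn, min_samples=2, min_gain=1e-6):
    """Recursively branch until a node holds too few samples or no split
    improves the separation enough.

    split_fn(samples, labels) must return (gain, feature, threshold) or
    None; it stands in for the separation accuracy calculation.
    """
    majority = max(sorted(set(labels)), key=labels.count)
    if len(labels) <= min_samples:          # stop: too few learning data
        return {"label": majority}
    split = split_fn(samples, labels)
    if split is None or split[0] <= min_gain:   # stop: no useful split
        return {"label": majority}
    _, feature, threshold = split
    below = [(s, y) for s, y in zip(samples, labels) if s[feature] < threshold]
    above = [(s, y) for s, y in zip(samples, labels) if s[feature] >= threshold]
    if not below or not above:
        return {"label": majority}

    def child(part):
        return grow([s for s, _ in part], [y for _, y in part],
                    split_fn, min_samples, min_gain)

    return {"feature": feature, "threshold": threshold,
            "below": child(below), "at_or_above": child(above)}
```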

Furthermore, for each further combination of the preset specific number of reference coordinate points P_k selected from the one or more reference coordinate points P_k (each yielding its own convolution value g), the node branching unit 25 likewise repeats branching based on the convolution filter h_m that most accurately separates all the learning data into two groups and the node threshold for that separation. According to the correct-answer or incorrect-answer labels of the learning data that finally remain at each node, it determines a discrimination label that indicates correct or incorrect as the discrimination result and, if correct, the type of the convolution filter h_m.

The combination of the specific number of reference coordinate points P_k (each yielding its own convolution value g) selected from the one or more reference coordinate points P_k may be set externally by an operator, but is preferably set automatically based on a predetermined selection criterion (for example, starting from an initial combination of the specific number of reference coordinate points P_k, selecting combinations that keep the specific number by substituting the nearest other reference coordinate point). When the feature amount pool unit 21 holds only one reference coordinate point P_k, the specific number used for classification by the decision tree is also one, and one decision tree is constructed. One decision tree is likewise constructed when the specific number is set to all of the reference coordinate points P_k held in advance in the feature amount pool unit 21.

In this way, as many decision trees are constructed as there are combinations of the specific number of reference coordinate points P_k (each yielding its own convolution value g) selected from the one or more reference coordinate points P_k.

Machine learning is performed so that the output label of the decision tree being constructed (the discrimination label of the final result) matches the correct-answer or incorrect-answer label attached in advance to the learning data. The final output label of the decision tree may be of several kinds (label 1, label 2, and so on), because even correct-answer (or incorrect-answer) labels can be classified further. Accordingly, when a simple two-class classification of correct or incorrect is wanted in constructing a decision tree by machine learning, the node branching unit 25 can construct the decision tree using only those nodes to which at least a predetermined number of learning data have been allocated among these multiple kinds of labels.

In this example, the node branching unit 25 instructs the filter convolution unit 23 to perform further node branching on each of the branched nodes and has the learning data corresponding to each branched node allocated, so that loop processing is executed for node branching in the decision tree. To avoid redundant processing, however, the convolution values g for all reference coordinate points P_k may instead be computed in a single batch, without the loop, and node branching then repeated on those values.

Also, for the multi-scale convolution filter obtained by further convolving convolution filters of different filter sizes, the filter coefficients of all types of multi-scale convolution filter may be computed in advance, so that the convolution value g is obtained without generating multi-resolution images.

The node branching unit 25 stores the finally constructed decision tree in the learning result storage unit 31.

The identification processing unit 3, for its part, includes a learning result storage unit 31, a multi-resolution image generation unit 33, and a filter convolution unit 33.

The learning result storage unit 31 stores the decision tree constructed by the node branching unit 25. The decision tree includes the information used as the feature amount pool during machine learning, namely the one or more reference coordinate points, the multiple types of convolution filters each composed of filter coefficients predetermined for each filter size, and the multiple predetermined filter sizes, as well as the node threshold for the branch at each node.

For the input image to be identified, the multi-resolution image generation unit 33 performs multiple resolution conversions according to the decision tree held in the learning result storage unit 31 (resolutions corresponding to the multiple filter sizes), generates multiple resolution images, and outputs them to the filter convolution unit 33. That is, the multi-resolution image generation unit 33 converts the input image into the same set of resolution images as the multi-resolution image generation unit 22 of the learning processing unit 2 and outputs them to the filter convolution unit 33.

For the multiple resolution images of the input image obtained from the multi-resolution image generation unit 33, the filter convolution unit 33 executes convolution filter processing according to the decision tree held in the learning result storage unit 31 (the individual reference coordinate points and the individual convolution filters), obtains the filtered pixel value for each of the filter sizes, and computes a value that combines these pixel values with normalization (the convolution value g).

The filter convolution unit 33 then follows the decision tree, branching at each node according to its node threshold, and on reaching a leaf node outputs the label assigned to that node as the identification result.
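Identification with a learned tree amounts to a walk from the root to a leaf. The sketch below assumes a hypothetical in-memory representation (`Node`, `Leaf`) and that each node tests one convolution value g against its threshold. The convention that values below the threshold go to one child is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: int                      # index of the convolution value g tested here
    threshold: float                  # node threshold learned for this branch
    below: Union["Node", Leaf]        # child taken when g < threshold
    at_or_above: Union["Node", Leaf]  # child taken otherwise


def classify(tree: Union[Node, Leaf], g: Sequence[float]) -> str:
    """Branch on each node threshold until a leaf is reached, then
    return the label assigned to that leaf."""
    node = tree
    while isinstance(node, Node):
        node = node.below if g[node.feature] < node.threshold else node.at_or_above
    return node.label
```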

(Example of learning process)
An example of the learning process performed by the learning processing unit 2 is described below in more detail with reference to FIGS. 2 and 3. The learning process example shown in FIG. 2 achieves multi-scaling, in which the filter size of the convolution filter is varied, by reducing the size of the target image rather than enlarging the filter, in order to reduce computational cost. Although the example executes loop processing for node branching in the decision tree, node branching may also be performed without the loop to avoid redundant processing, as noted above.

The learning processing unit 2 determines, for each of the input learning data f_1, f_2, ..., f_S (number of data: S), whether any unbranched node remains (step S1). When the data are first input, unbranched nodes naturally remain (step S1: Yes), so the process proceeds to step S2.

Next, using the multi-resolution image generation unit 22, the learning processing unit 2 refers to the feature amount pool unit 21 for each of the input learning data (step S2), performs multiple resolution conversions according to the feature amount pool (resolutions corresponding to the multiple filter sizes), and generates multiple resolution images for each learning datum.

For example, as shown in FIG. 3, the multi-resolution image generation unit 22 reduces each of the input learning data f_1, f_2, ..., f_S (number of data: S) to various image sizes. When three filter sizes N × N are provided in the feature amount pool unit 21, namely N_1 × N_1 (1×), N_2 × N_2 (0.5×), and N_3 × N_3 (0.25×), each learning datum is converted into three resolution images. The scaling is not limited to this example; for instance, the image may be reduced in steps of 1/√N_1.
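The three resolution images at 1×, 0.5×, and 0.25× can be sketched as a simple image pyramid. The 2 × 2 box averaging used here is an assumed resampling method chosen for brevity; this passage does not say how the images are actually reduced, and the function names are hypothetical.

```python
def downscale_half(img):
    """Halve width and height by averaging each 2 x 2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def build_pyramid(img, levels=3):
    """Return [1x, 0.5x, 0.25x, ...] versions of a grayscale image given
    as a list of rows."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downscale_half(pyramid[-1]))
    return pyramid
```

Running a fixed N × N filter on the 0.5× image covers roughly the same area as a 2N × 2N filter on the original, at about a quarter of the per-pixel cost, which is the saving the embodiment relies on.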

Next, the learning processing unit 2 executes a predetermined number of types of convolution filter processing on each learning datum belonging to the node and convolves the results further (step S3). More specifically, using the filter convolution unit 23, the learning processing unit 2 refers to the feature amount pool held in the feature amount pool unit 21 (the individual reference coordinate points and the individual convolution filters) for the multiple resolution images of each learning datum. For each combination of a reference coordinate point and a convolution filter with the same filter coefficients, it runs the convolution filter processing on the multiple resolution images, obtains the filtered pixel value for each of the filter sizes, and computes a value that combines these pixel values with normalization (the convolution value g).

For example, as shown in FIG. 3, for one learning datum and a combination of two reference coordinate points P_1, P_2 selected from the one or more reference coordinate points P_1, P_2, ..., P_k = (x_k, y_k) (k is an integer of 1 or more), as many convolution values g are obtained as there are convolution filters h_1, h_2, ..., h_m (m is an integer of 2 or more) for each reference coordinate point. FIG. 3 illustrates the convolution values g_1 and g_2 corresponding to the two reference coordinate points P_1 and P_2, respectively.

Consequently, for one learning datum f_S, each reference coordinate point P_k yields multiple convolution values g, each associated with one of the multiple types of convolution filter h_m, where each filter is convolved at the predetermined number of filter sizes N × N. The combination of the one or more reference coordinate points P_k, the convolution filters h_m, and the convolution values g associated with them is therefore expressed as the feature vector that characterizes that learning datum.

Then, as shown in FIG. 3, the multi-resolution image generation unit 22 and the filter convolution unit 23 perform the same processing on all the learning data f_1, f_2, ..., f_S.

Next, using the separation accuracy calculation unit 24, the learning processing unit 2 acquires the combination information of the one or more reference coordinate points P_k, the multiple types of convolution filter h_m, and the associated convolution values g for all the learning data, and determines, for the specific number of reference coordinate points P_k, the convolution filter h_m that most accurately separates all the learning data into two groups, together with the node threshold for that separation. The node branching unit 25 then branches the node as shown in FIG. 3 (step S4). At this node branch, the type of convolution filter h_m and the node threshold are held in association with the node for use in constructing the decision tree. The quality of a separation is judged with the same measures as in the prior art, such as the Gini coefficient or information gain.

Next, using the node branching unit 25, the learning processing unit 2 determines whether further node branching is possible for the branched node (step S6). If it is (step S6: Yes), the process returns to step S2: the filter convolution unit 23 is instructed to perform further node branching and the learning data corresponding to each branched node are allocated. This is repeated until the number of learning data belonging to the node under consideration falls to or below a predetermined threshold, or the separation accuracy of the learning data at that node falls to or below a predetermined threshold (that is, until no further improvement in separation accuracy can be expected) (step S6: No).

The learning processing unit 2 then determines again whether any unbranched node remains (step S1) and, while one does (step S1: Yes), repeats the processing of steps S2 to S6 until no unbranched node remains (step S1: No).

Finally, having repeated the node branching described above, the learning processing unit 2 uses the node branching unit 25 to determine a discrimination label for each node that can no longer be branched, according to the correct-answer or incorrect-answer labels of the learning data belonging to it. The discrimination label indicates correct or incorrect as the discrimination result and, where correct-answer (or incorrect-answer) labels are classified further, the type.

That is, as shown in FIG. 4, the first node 100, which separates input images according to whether feature A is larger or smaller than threshold A, branches into the second node 200 and the third node 300. The second node 200 and the third node 300, and likewise the fourth node 400 and the fifth node 500, repeat node branching as far as possible, and finally multiple kinds of labels (label 1, label 2, and so on) are assigned.

Normally, when a simple two-class classification of correct or incorrect is wanted in constructing a decision tree by machine learning, the node branching unit 25 constructs the decision tree using only those nodes to which at least a predetermined number of learning data have been allocated among these multiple kinds of labels.

A decision tree constructed in this way is highly versatile: besides face detection and face feature point detection, it can be used for a variety of object detection applications such as vehicle detection, vehicle feature point detection, or combinations of these.

For example, combinations of filter size and filter coefficients can also express the feature amounts of conventional techniques such as those in Non-Patent Documents 1 and 2. The reference coordinate point P_k = (x_k, y_k) to which a filter is applied may also be selected with feature point positions in mind, as in the technique of Non-Patent Document 2. In that case, one option is probabilistic sampling in which positions closer to a feature point are selected with higher probability.
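One way to realize sampling in which nearer positions are chosen more often is to draw reference points from a Gaussian centered on the feature point. The Gaussian, the `sigma` parameter, and the function name are assumptions for this sketch; the text only requires that the selection probability rise with proximity.

```python
import random


def sample_reference_points(feature_point, count, sigma, width, height, seed=0):
    """Draw `count` reference points around `feature_point`.

    Positions near the feature point are more likely because the offsets
    follow a Gaussian with standard deviation `sigma` (an assumed
    distribution). Samples that fall outside the image are redrawn.
    """
    rng = random.Random(seed)
    fx, fy = feature_point
    points = []
    while len(points) < count:
        x = round(rng.gauss(fx, sigma))
        y = round(rng.gauss(fy, sigma))
        if 0 <= x < width and 0 <= y < height:
            points.append((x, y))
    return points
```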

In particular, because the image data classification device 1 of this embodiment uses a decision tree constructed in this way, it can classify image data robustly and accurately even when the input image subject to, for example, face detection or face feature point detection contains noise or varied face orientations (deformation of the face image). This is because the convolution has a high-frequency noise removal effect and because the convolution values are based on reference coordinate points.

When the processing of the image data classification device 1 of this embodiment is applied to person recognition processing on image data, processing performance also improves, to the extent that classification accuracy improves, even in a configuration that first applies the processing of the image data classification device 1, then performs face detection based on the technique of Non-Patent Document 1, and then detects multiple coordinate points (face feature points) from the face image in the input image based on the technique of Non-Patent Document 2. However, as described below, the image data classification device 1 of this embodiment can be used to build an object detection device 10 with still better processing efficiency.

[Object detection device]
A face detection device, as an example of the object detection device 10 according to an embodiment of the present invention, is described below with reference to FIGS. 5 to 9.

(Device configuration)
FIG. 5 is a block diagram showing the schematic configuration of a face detection device as an example of the object detection device 10 according to an embodiment of the present invention. A face detection device is described here as a typical example of the object detection device 10, but note that, with suitably selected learning data, the device can be used widely for object detection from still images beyond face detection, such as person detection, person recognition, and detection of objects such as vehicles.

As shown in FIG. 5, the object detection device 10 includes the image data classification device 1 according to the present invention, a scanning window setting unit 11, a classification result determination unit 12, and a local coordinate system reference coordinate point update instruction unit 13.

The scanning window setting unit 11 is a functional unit that enables an input frame image (a still image such as one frame of a video) to be scanned in its entirety with scanning windows of various sizes. It cuts out the image at a specific scanning position in the input frame image with a scanning window of a given size (scanning window scale) and outputs it to the image data classification device 1 according to the present invention. Changes to the scanning window size and to the scanning position in the input frame image are directed by the classification result determination unit 12 described later. For example, FIG. 6 shows three scanning window scales S_1, S_2, and S_3 for an input frame image F. In the input frame image F illustrated at the center of the figure, broken lines indicate the regions where face detection labels 1 and 2 are expected to be discriminated at different scanning positions with scanning window scale S_2.
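Scanning a frame with windows of several sizes can be sketched as a generator of window positions. This is an exhaustive raster scan with an assumed stride (`stride_ratio`, a hypothetical parameter); in the device described here the size and position changes are directed by the classification result determination unit 12, which the sketch does not model.

```python
def scan_windows(frame_w, frame_h, window_sizes, stride_ratio=0.5):
    """Yield (x, y, size) for every square window of each size that fits
    inside the frame, stepping by a fraction of the window size."""
    for size in window_sizes:
        step = max(1, int(size * stride_ratio))
        for y in range(0, frame_h - size + 1, step):
            for x in range(0, frame_w - size + 1, step):
                yield (x, y, size)
```

Each yielded window would be cropped from the frame and passed to the classifier as its input image.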

In the image data classification device 1 according to the present invention, a decision tree learned for face detection is constructed, and the device outputs one of two class labels, "face" or "not a face". It is assumed that the feature amount pool holds the initial values P_1, P_2, P_3, P_4 of four reference coordinate points (face feature points) based on a predetermined average face, together with a large number of predetermined nearby reference coordinate points distributed by probabilistic sampling in which positions closer to the initial values P_1, P_2, P_3, P_4 are selected with higher probability (these are the reference coordinate points P_1', P_2', P_3', P_4' to which the initial values P_1, P_2, P_3, P_4 are respectively updated). Updates to the set values of the reference coordinate points are directed by the local coordinate system reference coordinate point update instruction unit 13 described later.

The filter convolution unit 23 in the image data classification device 1 also calculates, in addition to the convolution value g as the feature amount of an image feature in the decision tree, the difference between the convolution values of two specific updatable coordinate points among the multiple reference coordinate points (face feature points), hereinafter called the "convolution difference value" Δg, for all such combinations.

For example, in FIG. 7(a), for an input image f cut out by the scanning window, suppose that convolution values g₁ and g₂ are assigned to one selectable pair of coordinate points among the plurality of reference coordinate points (face feature points), g₃ and g₄ to another pair, and g₅ and g₆ to yet another pair. The difference between the convolution values of each pair (the convolution difference value Δg) is then also calculated, and serves as a facial feature amount used to classify the input image f as “face” or “not a face”.
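
The pairwise differences can be enumerated as in this minimal sketch. The sign convention (first point minus second) and the name `convolution_differences` are assumptions; the text only states that a difference is taken for each pair.

```python
from itertools import combinations
from typing import Dict, Tuple


def convolution_differences(g: Dict[int, float]) -> Dict[Tuple[int, int], float]:
    """Map each pair of point indices (i, j), i < j, to g[i] - g[j]."""
    return {(i, j): g[i] - g[j] for i, j in combinations(sorted(g), 2)}
```

With four reference coordinate points this yields six difference values per filter; with nine points it yields thirty-six.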

Further, regarding the updating of reference coordinate points according to the present invention, as shown for three example input images f_A, f_B, and f_C in FIG. 7(b), the updated reference coordinate points P₁′ and P₂′ (and likewise P₃′ and P₄′) are updated in local coordinate systems whose origins are the respective initial values P₁ and P₂ (and likewise P₃ and P₄).

The purpose is to reduce misalignment in the positional relationship of the reference coordinate points P₁′, P₂′, P₃′, P₄′ caused by individual differences in face shape, face orientation, and changes in facial expression in the input image subject to face detection. As a comparative example, shown for three input images f_A, f_B, and f_C in FIG. 7(c), if the reference coordinate points are updated in an absolute coordinate system, factors such as differences in the relative positions of the eyes and nose cause the positional relationship of the reference coordinate points P₁′, P₂′, P₃′, P₄′ to deviate more widely from one input image to another. For this reason, the reference coordinate points are updated based on local coordinate systems.

The separation accuracy calculation unit 24 of the image data classification device 1 then obtains from the filter convolution unit 23 information that includes, for all learning data, the one or more reference coordinate points P_k, the plural types of convolution filters h_m, the plural convolution values g associated with each of them, and all combinations of the convolution difference values Δg corresponding to two specific reference coordinate points P_k among the four reference coordinate points (face feature points). For all combinations of the convolution difference values Δg, the separation accuracy calculation unit 24 determines the convolution filter h_m that most accurately separates all the learning data into two, and the node threshold for that separation. The node branching unit 25 constructs the decision tree so that, using each node threshold, it outputs one of the two classification labels “face” and “not a face”.
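
The search for the best filter and node threshold can be pictured with the sketch below. The specification does not define the separation accuracy measure in this passage; plain classification accuracy over midpoint thresholds is an assumption, as are the name `best_split` and the data layout.

```python
from typing import Dict, List, Optional, Tuple


def best_split(features: Dict[str, List[float]],
               labels: List[int]) -> Optional[Tuple[str, float, float]]:
    """Return (feature name, threshold, accuracy) of the best two-way split.

    features maps a candidate (e.g. one filter and point pair) to one value
    per training sample; labels holds 1 for "face" and 0 for "not a face".
    """
    best = None
    n = len(labels)
    for name, values in features.items():
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(distinct, distinct[1:]):
            threshold = (low + high) / 2
            correct = sum((v > threshold) == bool(y)
                          for v, y in zip(values, labels))
            # Either side of the threshold may be the "face" side.
            accuracy = max(correct, n - correct) / n
            if best is None or accuracy > best[2]:
                best = (name, threshold, accuracy)
    return best
```

Applying this recursively to the two subsets produced by each split is one way a tree could be grown; an impurity measure such as entropy could be substituted for accuracy without changing the overall structure.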

Accordingly, for an input image f subject to face detection that is cut out by the scanning window and input, the image data classification device 1 according to the present invention uses the decision tree and branches at each node according to its node threshold. On reaching a leaf node, it outputs the face detection label assigned to that node, either “face” or “not a face”, together with the reference coordinate points P₁′, P₂′, P₃′, P₄′ finally updated and assigned to that node, to the classification result determination unit 12 as the identification result.
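
The traversal to a leaf can be sketched as follows. The `Node` and `Leaf` classes and the convention that values above the threshold follow the `high` branch are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Leaf:
    label: str      # "face" or "not a face"
    points: tuple   # reference coordinate points assigned to this leaf


@dataclass
class Node:
    feature: str    # which convolution value or difference value to test
    threshold: float
    low: Union["Node", Leaf]
    high: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf],
             feature_values: Dict[str, float]) -> Tuple[str, tuple]:
    """Follow node thresholds until a leaf is reached; return its label and points."""
    node = tree
    while isinstance(node, Node):
        if feature_values[node.feature] > node.threshold:
            node = node.high
        else:
            node = node.low
    return node.label, node.points
```

Returning the points alongside the label reflects the fact that the identification result carries both the face detection label and the updated reference coordinate points.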

The classification result determination unit 12 receives from the image data classification device 1 according to the present invention the face detection label, either “face” or “not a face”, together with the four finally updated and assigned reference coordinate points P₁′, P₂′, P₃′, P₄′, and holds them temporarily. Any or all of these four reference coordinate points P₁′, P₂′, P₃′, P₄′ may have the same values as the initial values of the reference coordinate points.

Next, if the temporarily held face detection label indicates “not a face”, the classification result determination unit 12 instructs the scanning window setting unit 11 to move the scanning window to the next scanning position or, if the window is already at the final scanning position, to switch to the scanning window of the next size (scanning window scale) at the initial scanning position of the input frame image. The face detection target image is then cut out of the input frame image, and the image data classification device 1 according to the present invention is instructed to perform the classification again.

If, on the other hand, the temporarily held face detection label indicates “face”, the classification result determination unit 12 instructs the local coordinate system reference coordinate point update instruction unit 13 to update the four reference coordinate points P₁′, P₂′, P₃′, P₄′.

For an input image whose temporarily held face detection label is “face”, the local coordinate system reference coordinate point update instruction unit 13 manages all possible combinations of the four reference coordinate points P₁′, P₂′, P₃′, P₄′ held in the feature amount pool unit 21 of the image data classification device 1 according to the present invention, and instructs the image data classification device 1 to update the set values of the reference coordinate points.

Although FIG. 1 shows the classification result determination unit 12 and the local coordinate system reference coordinate point update instruction unit 13 as separate functional units, the local coordinate system reference coordinate point update instruction unit 13 can be configured as part of the functions of the classification result determination unit 12. That is, the classification result determination unit 12 can be configured to execute in parallel a determination process that determines the presence or absence of an object in the scanning window image based on the classification result from the image data classification device 1, and a feature point selection process that selects the feature points serving as image features when the object is present.

For an input image whose temporarily held face detection label indicates “face”, the classification result determination unit 12 repeats the classification into “not a face” and “face” as many times as permitted. It then outputs externally the input image finally classified as “face”, with its face detection label attached, together with the corresponding four reference coordinate points P₁′, P₂′, P₃′, P₄′ as face feature points.

For an input frame image that is still judged “not a face” even after the scanning position and size (scanning window scale) of the scanning window have been changed via the scanning window setting unit 11, the classification result determination unit 12 attaches a face detection label indicating “not a face” and outputs it externally.

(Operation example)
FIG. 8 is an explanatory diagram of the operation of a face detection device, one example configured as the object detection device 10 of the present embodiment.

First, the object detection device 10 uses the scanning window setting unit 11 to cut out an image f from the input frame image F with a scanning window of a predetermined size at a predetermined position, and inputs it as the input image to the image data classification device 1 according to the present invention.

Then, as shown in FIG. 8, the image data classification device 1 first assigns to the input image f the initial values P₁, P₂, P₃, P₄ of the four reference coordinate points (face feature points) based on a predetermined average face, and classifies the image as “not a face” or “face” (step S11).

At this time, for an input image f classified as “not a face”, the object detection device 10 uses the classification result determination unit 12 to control the scanning window setting unit 11 so that the image of the next scanning window becomes the face detection target.

For an input image f classified as “face”, on the other hand, the classification result determination unit 12 instructs the image data classification device 1 according to the present invention, via the local coordinate system reference coordinate point update instruction unit 13, to update the four reference coordinate points P₁′, P₂′, P₃′, P₄′ (step S12). Whenever the classification result from the image data classification device 1 is judged to be “not a face”, processing returns to the input of the image of the next scanning window.

In this way, the classification result determination unit 12 has the image data classification device 1 classify the input image repeatedly while the reference coordinate points P₁′, P₂′, P₃′, P₄′ are updated. The distinction between “not a face” and “face” gradually becomes harder to make, and the process eventually converges to a state in which the classification is settled. The reference coordinate points P₁′, P₂′, P₃′, P₄′ updated for the input image finally classified as “face” are therefore highly accurate.
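
The repeated classify-and-update cycle for a single window can be sketched as below. The stopping rule (stop when the points no longer change, or after `max_rounds`) is an assumption, as are the function name and the `classify` callback signature; the default of 5 merely echoes the maximum regression count used in the experiment described later.

```python
from typing import Callable, Optional, Tuple

Points = tuple
Classifier = Callable[[object, Points], Tuple[str, Points]]


def detect_in_window(image: object, initial_points: Points,
                     classify: Classifier, max_rounds: int = 5) -> Optional[Points]:
    """Return refined points if the window stays classified as a face, else None.

    classify(image, points) is a hypothetical callback returning the label
    and the updated reference coordinate points.
    """
    points = initial_points
    for _ in range(max_rounds):
        label, new_points = classify(image, points)
        if label != "face":
            return None          # caller moves on to the next scanning window
        if new_points == points:
            break                # assumed convergence criterion
        points = new_points
    return points
```

A window rejected at any round is dropped immediately, so the cost of refinement is only paid for windows that keep being classified as faces.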

Accordingly, in the object detection device 10 of the present embodiment, the image data classification device 1 can solve in parallel the classification problem of “not a face” versus “face” and the regression problem of updating the four reference coordinate points (face feature points), that is, minimizing the variance of the face feature point displacements. This improves both processing efficiency and face detection accuracy.

That is, the object detection device 10 of the present embodiment is more efficient than serial processing in which face detection is first performed based on the technique of Non-Patent Document 1 and a plurality of coordinate points (face feature points) are then detected from the resulting face image based on the technique of Non-Patent Document 2.

The example above updates four reference coordinate points, but the number can be set arbitrarily, for example as few as two or as many as nine.

(Verification by experiment)
Without improved face detection accuracy, improved accuracy in detecting the face feature points that are useful for person recognition cannot be expected, and improving face classification accuracy is an effective way to improve face detection accuracy. A comparative experiment on face detection performance was therefore conducted between the object detection device 10 of the present embodiment, configured to update nine reference coordinate points, and the technique of Non-Patent Document 1 configured under the same conditions.

As learning data, 20,000 face images were extracted from two months of television video, and the image data classification device 1 in the object detection device 10 of the present embodiment was made to construct the decision tree. The maximum number of regressions of the object detection device 10 was limited to five, and the number of labels used during learning was limited so that the decision tree had at most 600 nodes.

For the experiment, multiple frame images from one day of broadcast video were used as the input frame images for face detection. Comparing the face detection performance of the object detection device 10 of the present embodiment with that of the technique of Non-Patent Document 1 gave the results shown in FIG. 9.

In FIG. 9, the “detection rate” is the proportion of the faces appearing in the input frame images that were detected, and the “false detection rate” is the proportion of errors among the detection results. The object detection device 10 of the present embodiment was confirmed to improve performance by 29.3% in detection rate and by 21.1% in false detection rate.
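
Read literally, the two metrics defined above correspond to the following ratios. This is a sketch of the definitions only; the function and argument names are hypothetical, and no values from FIG. 9 are reproduced.

```python
def detection_rate(detected_faces: int, faces_present: int) -> float:
    """Share of the faces appearing in the frames that were detected."""
    return detected_faces / faces_present


def false_detection_rate(false_detections: int, total_detections: int) -> float:
    """Share of the returned detections that are errors."""
    return false_detections / total_detections
```

Under these definitions the detection rate behaves like recall, and the false detection rate equals one minus precision.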

Analysis of these detection results showed that, with the technique of Non-Patent Document 1, missed detections caused by changes in face orientation and expression and false detections caused by complex backgrounds accounted for the differences from the object detection device 10 of the present embodiment. This confirmed that the image data classification device 1 according to the present invention is particularly effective in the object detection device 10 for detecting faces in camera video.

(Summary)
As described above, by using multi-scale convolution filters, the image data classification device 1 according to the present invention can capture the shapes and features of objects appearing in video more accurately than conventional techniques, and can thereby improve data classification accuracy.

The image data classification device 1 according to the present invention can be widely used for object detection from still images, such as face detection, person detection, person recognition, and the detection of objects such as vehicles.

Beyond its use in the decision-tree-based object detection device 10, the image data classification device 1 according to the present invention can also be used for regression with decision trees and for other decision-tree-based techniques such as random forests. A random forest is an ensemble learning technique that uses many decision trees, each of which estimates a label for the data, with the final estimated label determined by majority vote. The classifier that constitutes each decision tree in a random forest can therefore be replaced with the image data classification device 1 according to the present invention.
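
The majority vote described above can be sketched in a few lines. Each tree is represented here as a plain callable returning a label, which is an assumption for illustration; `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Iterable


def majority_vote(trees: Iterable[Callable[[object], str]], sample: object) -> str:
    """Return the label predicted by the largest number of trees.

    With equal counts, the label that was voted for first wins.
    """
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]
```

In the setting of the text, each callable would wrap one tree built by the image data classification device 1.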

Besides the decision-tree-based object detection device 10, the device can also be used with various boosting algorithms such as AdaBoost and Real AdaBoost. AdaBoost and Real AdaBoost are techniques that classify data by chaining many classifiers, and the image data classification device 1 according to the present invention can be used as each of these classifiers.
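
For orientation, the final decision rule of discrete AdaBoost combines weak classifiers by a weighted vote, as in the sketch below. The weights are taken as given (training is omitted), labels are encoded as +1 and -1, and all names are assumptions for illustration.

```python
from typing import Callable, List, Tuple

WeakClassifier = Tuple[float, Callable[[object], int]]  # (weight alpha, h(x) in {+1, -1})


def adaboost_predict(weak_classifiers: List[WeakClassifier], sample: object) -> int:
    """Return +1 or -1 from the sign of the weighted sum of weak predictions."""
    score = sum(alpha * h(sample) for alpha, h in weak_classifiers)
    return 1 if score >= 0 else -1
```

Here each weak classifier `h` would be one instance of the image data classification device 1, with +1 standing for “face” and -1 for “not a face”.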

The image data classification device 1 and the object detection device 10 can each be implemented by a computer, and the programs for causing the computer to realize each component are stored in the memory of that computer. Under the control of a central processing unit (CPU) or the like provided in the computer, a program describing the processing that realizes the function of each component is read from the memory as appropriate, so that the computer realizes the function of each component.

The image data classification device 1 and the object detection device 10 according to the present invention, and their programs, are not limited to the embodiments described above, and are limited only by the claims.

According to the present invention, image data can be classified more robustly and accurately while retaining versatility, and the labels of image data can be estimated accurately. The invention is therefore useful for applications that require data classification and for applications that detect objects.

DESCRIPTION OF SYMBOLS
1 Image data classification device
2 Learning processing unit
3 Identification processing unit
10 Object detection device
11 Scanning window setting unit
12 Classification result determination unit
13 Local coordinate system reference coordinate point update instruction unit
21 Feature amount pool unit
22 Multi-resolution image generation unit
23 Filter convolution unit
24 Separation accuracy calculation unit
25 Node branching unit
31 Learning result storage unit
32 Multi-resolution image generation unit
33 Filter convolution unit

Claims

An image data classification device for classifying image data, comprising:
a learning processing unit that learns and constructs a decision tree from learning data prepared in advance, using a multi-scale convolution filter; and
an identification processing unit that classifies an input image to be identified using the multi-scale convolution filter in accordance with the learned decision tree.

The image data classification device according to claim 1, wherein the learning processing unit comprises:
feature amount pool means for holding, as a feature amount pool, one or more predetermined reference coordinate points, a plurality of types of convolution filters composed of a plurality of types of filter coefficients predetermined for each filter size, and a plurality of types of predetermined filter sizes;
first convolution filter processing means for applying, to each of a plurality of input learning data, the multi-scale convolution filter processing using the plurality of types of convolution filters at the plurality of types of filter sizes in accordance with the feature amount pool, and obtaining, for each of the one or more reference coordinate points, a plurality of convolution values corresponding in number to the plurality of types of convolution filters;
separation accuracy calculation means for determining, based on combination information of the one or more reference coordinate points for all learning data, the plurality of types of convolution filters, and the convolution values associated with each of them, the type of convolution filter that most accurately separates into two all learning data subject to node branching for the one or more reference coordinate points, and the node threshold for this separation; and
node branching means for constructing the decision tree by separating all learning data into two as a node branch based on the node threshold, holding the type of convolution filter and the node threshold related to the node branch in association with the node, and performing repetitive control so as to further node-branch all learning data after the node branch.

The image data classification device according to claim 2, wherein the node branching means learns and constructs the decision tree by continuing the repetitive control until the number of learning data belonging to the node under separation determination is equal to or less than a predetermined threshold, or the separation accuracy of the learning data at that node is equal to or less than a predetermined threshold.

The image data classification device according to any one of claims 1 to 3, wherein the identification processing unit comprises:
learning result storage means for storing the decision tree constructed by the node branching means; and
second convolution filter processing means for classifying the input image to be identified using the multi-scale convolution filter in accordance with the learned decision tree.

An object detection device for detecting a predetermined object from an input frame image, comprising:
the image data classification device according to any one of claims 1 to 4; and
classification result determination means for executing in parallel, based on the classification result from the image data classification device, a determination process that determines the presence or absence of an object in the image of a predetermined scanning window on the input frame image, and a feature point selection process that selects the feature points serving as image features when the object is present.

The object detection device according to claim 5, comprising the image data classification device according to any one of claims 2 to 4, wherein, in the image data classification device:
the feature amount pool means has means for holding, as a feature amount pool, a plurality of reference coordinate points, a plurality of types of convolution filters composed of a plurality of types of filter coefficients predetermined for each filter size, and a plurality of types of predetermined filter sizes;
the convolution filter processing means has means for further obtaining the difference value of the convolution values between two specific updatable coordinate points among the plurality of reference coordinate points;
the separation accuracy calculation means has means for determining, based on all combinations of the difference values of the convolution values between two specific updatable coordinate points among the plurality of reference coordinate points, the type of convolution filter that most accurately separates into two all learning data subject to node branching for the plurality of reference coordinate points, and the node threshold for this separation; and
the node branching means has means for learning and constructing the decision tree by separating all learning data into two as a node branch based on the node threshold, holding the type of convolution filter and the node threshold related to the node branch in association with the node, and performing repetitive control so as to further node-branch all learning data after the node branch.

The object detection device according to claim 6, wherein the classification result determination means comprises reference coordinate point updating means that sets initial values for a predetermined number of reference coordinate points among the plurality of reference coordinate points in the feature amount pool means, and causes the image data classification device to perform updating so as to correct the positional deviation of the predetermined number of reference coordinate points using local coordinate systems, each having the initial value of the respective reference coordinate point as its origin.

A program for causing a computer to function as the image data classification device according to any one of claims 1 to 4.

A program for causing a computer to function as the object detection device according to any one of claims 5 to 7.