JP2017084006A

JP2017084006A - Image processor and method thereof

Info

Publication number: JP2017084006A
Application number: JP2015210007A
Authority: JP
Inventors: Shuhei Ogawa (小川 修平)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-10-26
Filing date: 2015-10-26
Publication date: 2017-05-18

Abstract

PROBLEM TO BE SOLVED: To reduce the computational cost of extracting features of an integrated region obtained by merging partial regions.
SOLUTION: A dividing unit 102 divides an image into partial regions. An extraction unit 103 extracts a first feature from each partial region. Based on the first features, an integration unit 104a merges partial regions to generate integrated regions. A calculation unit 104b calculates, as the integrated-region feature, at least a statistic of the first features of the partial regions included in each integrated region.
[Selection] Figure 1

Description

The present invention relates to image processing such as image recognition.

When subjects in an image are learned and recognized, methods are used that rely not only on information from local areas of the image but also on information from larger, reasonably coherent areas. For example, when an image is segmented into meaningful groups (hereinafter "semantic segmentation"), a recognition method that looks only at local areas can have difficulty discriminating accurately; part of a white wall, for instance, may be misclassified as "sky". Methods have therefore been used that perform recognition with information from areas wider than a local area.

One conceivable way to perform recognition using wider areas, as shown in Patent Documents 1 and 2, is to use integrated regions obtained by merging partial regions. However, extracting features directly from an integrated region, for example by scanning every pixel in the integrated region to compute its features, is computationally expensive.

Moreover, the ideal size and shape of an integrated region depend on the subject to be recognized, so it is difficult to prepare suitable integrated regions in advance. Patent Document 3 discloses a method that generates many integrated regions, for example at different scales, and uses them together for recognition. Such a method increases the computational cost of feature extraction even further.

Patent Document 1: JP 2013-027637 A
Patent Document 2: JP 2009-212750 A
Patent Document 3: US Patent Application Publication No. 2014/0037198

Non-Patent Document 1: R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC Superpixels Compared to State-of-the-art Superpixel Methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, 2012.
Non-Patent Document 2: T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.

An object of the present invention is to reduce the computational cost of extracting features of an integrated region obtained by merging partial regions.

As one means of achieving this object, the present invention has the following configuration.

Image processing according to the present invention does the following:
- It divides an image into partial regions.
- It extracts a first feature from each partial region.
- It merges the partial regions based on the first features to generate integrated regions.
- It calculates, as the integrated-region feature, at least a statistic of the first features of the partial regions included in each integrated region.

ADVANTAGE OF THE INVENTION: According to the present invention, the computational cost of extracting features of an integrated region obtained by merging partial regions can be reduced.

FIG. 1 is a block diagram showing an example configuration of the image processing apparatus according to Embodiment 1.
FIG. 2 is a flowchart illustrating image recognition processing performed by the image processing apparatus.
FIG. 3 is a diagram illustrating the process of determining the category of a partial region of interest.
FIG. 4 is a diagram showing example recognition results of the image recognition processing of the embodiment.
FIG. 5 is a block diagram showing an example configuration of a computer device that functions as the image processing apparatus of the embodiment.
FIG. 6 is a diagram showing features usable as integrated-region features.
FIG. 7 is a diagram illustrating variations of the method of calculating integrated-region features.
FIG. 8 is a block diagram showing an example configuration of the image processing apparatus according to Embodiment 2.
FIG. 9 is a flowchart illustrating image recognition processing according to Embodiment 2.
FIG. 10 is a block diagram showing an example configuration of the image processing apparatus according to Embodiment 3.
FIG. 11 is a flowchart illustrating object detection processing according to Embodiment 3.
FIG. 12 is a diagram illustrating object detection processing.
FIG. 13 is a block diagram showing an example configuration of the image processing apparatus according to Embodiment 4.
FIG. 14 is a flowchart illustrating main subject detection processing according to Embodiment 4.

Image processing apparatuses and image processing methods according to embodiments of the present invention are described in detail below with reference to the drawings. The embodiments do not limit the present invention as set forth in the claims. Not every combination of features described in the embodiments is necessarily essential to the solution provided by the present invention.

The present invention relates to image recognition processing, including:
- detecting subjects in an input image;
- segmenting the image into a region for each subject;
- image retrieval, which searches for similar images;
- scene classification, which determines the scene of an image.

The input image may be a still image or a moving image. A subject can be any kind of target. Examples include living things such as people and dogs, artifacts such as buildings and tools, and natural scenery such as mountains and the sky.

The following describes image recognition processing that determines the categories of the subjects in an input image and performs semantic segmentation of the image. The subject categories are C general categories, such as sky, human body, vegetation, building, car, and road.

[Apparatus configuration]
An example configuration of the image processing apparatus according to Embodiment 1 is described with reference to the block diagram of FIG. 1. In the image processing apparatus, an acquisition unit 101 acquires an image to be recognized. A dividing unit 102 divides the acquired image into regions. An extraction unit 103 extracts a feature (hereinafter the "first feature") from each of the resulting partial regions.

An integrated region generation unit 104 generates integrated regions, each obtained by merging partial regions, together with their features (hereinafter "integrated-region features"). The integrated region generation unit 104 includes two units:
- an integration unit 104a, which merges partial regions based on the first features;
- a calculation unit 104b, which calculates statistics characterizing each integrated region and outputs them as the integrated-region feature.

An integrated region recognition unit 105 recognizes the category of each integrated region based on its integrated-region feature. It outputs discrimination scores for the C categories, as described in detail later. A partial region recognition unit 106 recognizes the category of each partial region based on its first feature, and likewise outputs discrimination scores for the C categories.

A feature generation unit 107 generates a feature of each partial region (hereinafter the "second feature"). It builds this feature from two inputs:
- the discrimination scores of that partial region, received from the partial region recognition unit 106;
- the discrimination scores of the integrated regions containing that partial region, received from the integrated region recognition unit 105.

A category determination unit 108 recognizes the category of each partial region based on its second feature and outputs that category.

[Image recognition processing]
Image recognition processing performed by the image processing apparatus is described with reference to the flowchart of FIG. 2. The acquisition unit 101 acquires an image to be recognized from an imaging device such as a camera, or from a server device (S11). The acquired image is either a still image or a single frame of a moving image.

Next, the dividing unit 102 divides the acquired image into regions (S12). For example, a method such as the one described in Non-Patent Document 1 divides the image into superpixels (SPx), which are clusters of pixels with similar colors. Each SPx thus corresponds to a partial region.

Next, the extraction unit 103 extracts a first feature from each partial region (S13). Examples of extracted features include a color distribution histogram, the Local Binary Pattern (LBP) described in Non-Patent Document 2, region moments, and higher-order statistics. The first feature is formed by concatenating these different kinds of features. Each dimension is then normalized to absorb differences in scale between feature dimensions.
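The following minimal Python sketch makes the concatenation and per-dimension normalization concrete. The patent does not state which normalization is used. Here z-score standardization across all partial regions of the image is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def concat_features(*parts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate several per-region feature lists (e.g. color histogram,
    LBP histogram, moments) into one vector per partial region (SPx)."""
    n = len(parts[0])
    assert all(len(p) == n for p in parts), "each part needs one row per region"
    return [[v for p in parts for v in p[i]] for i in range(n)]


def normalize_per_dimension(features: List[List[float]],
                            eps: float = 1e-12) -> List[List[float]]:
    """Assumed normalization: z-score each dimension across all regions so
    that dimensions with large numeric ranges do not dominate distances."""
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    stds = [math.sqrt(sum((f[j] - means[j]) ** 2 for f in features) / n)
            for j in range(d)]
    # eps guards constant dimensions (std == 0), which map to 0.0
    return [[(f[j] - means[j]) / (stds[j] + eps) for j in range(d)]
            for f in features]
```

Calling `normalize_per_dimension(concat_features(color, lbp, moments))` yields one normalized first feature per superpixel.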

Next, an iterative process runs K times. Each iteration generates integrated regions of various sizes and shapes from the partial regions. It then recognizes the category of each integrated region from the integrated-region feature computed for it.

In each iteration, the integration unit 104a generates integrated regions by merging multiple partial regions based on their first features (S14). For example, the first features are clustered with a method such as the k-means algorithm. Partial regions that fall in the same cluster are then merged into one integrated region. An integrated region obtained by merging superpixels (the partial regions) is referred to here as a "Super-Superpixel (SSPx)".
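The integration in S14 can be sketched with a plain Lloyd's k-means over the first features, where each resulting cluster is one SSPx. This is an illustrative, hypothetical implementation. The function names, the random initialization, and the iteration limit are assumptions, and a production system would more likely use a library implementation. Because clustering happens in feature space, the members of one SSPx need not be spatially contiguous.

```python
import random
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two first features."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 50,
           seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means on per-superpixel first features.
    Returns one cluster label per superpixel (SPx)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each SPx to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no SPx changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def to_integrated_regions(labels: List[int]) -> Dict[int, List[int]]:
    """Group SPx indices by cluster: {SSPx id: [member SPx indices]}."""
    regions: Dict[int, List[int]] = {}
    for sp, lab in enumerate(labels):
        regions.setdefault(lab, []).append(sp)
    return regions
```

Calling `kmeans` with several values of k (for example 2, 4, 8) yields the differently sized integrated regions that the K iterations of S14 to S16 rely on.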

The calculation unit 104b calculates the integrated-region feature as a statistic of the first features of the partial regions contained in the integrated region (hereinafter "included regions") (S15). Specifically, each included region casts a vote for the codebook entry (Visual Word) nearest to its first feature. The calculation unit 104b outputs the frequency histogram of these votes (Bag-of-Features) as the integrated-region feature.
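The voting in S15 can be sketched as follows. The sketch assumes that a codebook has already been built and that distances are squared Euclidean. The L1 normalization of the histogram is an optional assumption not stated in the text. Note that only the precomputed first features of the included regions are accessed, never the pixels.

```python
from typing import Dict, List, Sequence


def nearest_word(feature: Sequence[float],
                 codebook: List[Sequence[float]]) -> int:
    """Index of the Visual Word closest (squared Euclidean) to a feature."""
    return min(range(len(codebook)),
               key=lambda w: sum((a - b) ** 2
                                 for a, b in zip(feature, codebook[w])))


def bof_histogram(ssp_members: List[int],
                  first_features: Dict[int, Sequence[float]],
                  codebook: List[Sequence[float]],
                  normalize: bool = True) -> List[float]:
    """Integrated-region feature: every included SPx votes for its nearest
    Visual Word; only per-SPx features are read, not pixels."""
    hist = [0.0] * len(codebook)
    for sp in ssp_members:
        hist[nearest_word(first_features[sp], codebook)] += 1.0
    if normalize and ssp_members:  # assumed L1 normalization
        total = sum(hist)
        hist = [h / total for h in hist]
    return hist
```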

The codebook is created in advance, as follows:
1. Training images covering the C categories are prepared.
2. The dividing unit 102 divides each training image into regions.
3. The extraction unit 103 extracts a feature from each partial region.
4. These features are clustered. The codebook is the set of cluster-center features (Visual Words).

The integrated region recognition unit 105 recognizes the category of each integrated region from its integrated-region feature. It outputs C category discrimination scores representing the result (S16). Steps S14 to S16 are repeated K times (a predetermined number), each time with a different value of k for the k-means algorithm. This iteration yields category discrimination scores for integrated regions of various sizes and shapes.

When the iteration ends, the partial region recognition unit 106 recognizes the category of each partial region from its first feature and outputs C category discrimination scores (S17). The feature generation unit 107 then generates the second feature by concatenating two sets of scores (S18):
- the category discrimination scores of the partial region, from the partial region recognition unit 106;
- the category discrimination scores of the integrated regions containing it, from the integrated region recognition unit 105.

Next, the category determination unit 108 recognizes the category of each partial region from its second feature and outputs that category as the category of the partial region (S19).

The integrated region recognition unit 105, the partial region recognition unit 106, and the category determination unit 108 each consist of support vector machine (SVM) classifiers. Each classifier is trained in advance to output the correct category for a given input. The input variables are, respectively, the integrated-region feature, the first feature, and the second feature. The target variable is the correct category.

An SVM is fundamentally a two-class classifier. One SVM is therefore trained per category, with that category as the positive examples and all other categories as the negative examples. This gives C SVMs for the C categories. As a result, the discrimination in steps S16, S17, and S19 produces C category discrimination scores for each integrated region or partial region. In step S19, the category with the highest of the C scores is output as the category of the partial region.
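The one-vs-rest scheme and the argmax in S19 can be sketched with linear decision functions. The weights in the test are hypothetical stand-ins for trained linear SVMs. Training is not shown, and the patent does not specify a kernel. Each model returns a signed score for its own category, and the largest score wins.

```python
from typing import List, Sequence, Tuple

LinearModel = Tuple[List[float], float]  # (weights w, bias b) of one category


def ovr_scores(x: Sequence[float], models: List[LinearModel]) -> List[float]:
    """C decision values w_c . x + b_c, one per category (one-vs-rest)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in models]


def predict_category(x: Sequence[float], models: List[LinearModel],
                     names: List[str]) -> Tuple[str, List[float]]:
    """Return the category with the highest score, plus all C scores."""
    scores = ovr_scores(x, models)
    best = max(range(len(scores)), key=scores.__getitem__)
    return names[best], scores
```

The full score list is returned as well, because in S16 and S17 it is the C scores themselves, not only the winning label, that feed the second feature.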

The process of determining the category of a partial region of interest is described with reference to FIG. 3. The dividing unit 102 divides the input image into regions, generating partial regions SP_1 to SP_N. In the following, the category determination is described taking SP_n, the n-th partial region, as the partial region of interest.

The extraction unit 103 extracts the first feature of each of the partial regions SP_1 to SP_N. The partial region recognition unit 106 calculates category likelihoods of the partial region SP_n from its first feature. A likelihood is obtained for each of the C categories.

Based on the first features, the integration unit 104a generates integrated regions SSPx by merging partial regions SPx. Assume, for example, that M integrated regions SSP_1 to SSP_M containing the partial region of interest SP_n are generated. For each generated SSPx, the calculation unit 104b calculates the integrated-region feature as a statistic of the first features of the partial regions SPx it contains. The integrated region recognition unit 105 calculates category likelihoods of each of SSP_1 to SSP_M from its integrated-region feature. As above, a likelihood is obtained for each of the C categories.

The feature generation unit 107 generates the second feature of SP_n by concatenating its category likelihoods with those of SSP_1 to SSP_M. The category determination unit 108 calculates category likelihoods from the second feature of SP_n. It outputs the category with the highest likelihood as the category of SP_n.
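The following is a minimal sketch of how the second feature might be assembled. It assumes each clustering run partitions the superpixels, so SP_n belongs to exactly one SSPx per run and M equals K. The data layout (per-run label lists and per-run score dictionaries) and the function name are hypothetical.

```python
from typing import Dict, List


def second_feature(sp: int, sp_scores: List[List[float]],
                   labelings: List[List[int]],
                   ssp_scores: List[Dict[int, List[float]]]) -> List[float]:
    """Concatenate the C scores of SP_n with the C scores of every
    integrated region containing it (one region per clustering run)."""
    feat = list(sp_scores[sp])
    for labels, scores in zip(labelings, ssp_scores):
        feat.extend(scores[labels[sp]])  # SSPx containing SP_n in this run
    return feat
```

Under these assumptions, the result has C(1 + M) dimensions, and the category determination unit 108 classifies this vector.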

Features of an integrated region SSPx are normally extracted from the pixels, edge information, and similar data within that region, which is computationally expensive. In the embodiment, by contrast, the integrated-region feature is derived from partial-region features that have already been computed, which reduces its computational cost. For example, the feature of the integrated region SSP_1 in FIG. 3 is a statistic of the first features of its included regions SP_n, SP_o, SP_p, and SP_q. Generating the integrated-region feature therefore stays inexpensive.
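The sketch below illustrates why working from per-partial-region quantities is cheap. A few sufficient statistics are stored once per superpixel. The mean and variance of any integrated region are then derived from them in time proportional to the number of member superpixels, not the number of pixels. This is an illustrative analogy using scalar pixel statistics, not the patent's Bag-of-Features feature, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

Stats = Tuple[int, float, float]  # (pixel count, sum, sum of squares)


def sp_stats(pixels: List[float]) -> Stats:
    """Computed once per superpixel, alongside first-feature extraction."""
    return len(pixels), sum(pixels), sum(p * p for p in pixels)


def merged_mean_var(stats: Dict[int, Stats],
                    members: List[int]) -> Tuple[float, float]:
    """Mean and population variance of an integrated region from its
    members' stats: O(#superpixels) work instead of O(#pixels)."""
    n = sum(stats[m][0] for m in members)
    s = sum(stats[m][1] for m in members)
    ss = sum(stats[m][2] for m in members)
    mean = s / n
    return mean, ss / n - mean * mean
```

For additive statistics such as these, the merged result equals a direct pixel scan. The sum-of-squares formula can lose precision for large values, and a numerically stabler pairwise merge is preferable in that case.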

When the subject category is determined from partial regions of the subject, correct determination becomes difficult if a partial region resembles a category other than the subject's. Ideally, partial regions would follow the shape of the subject. However, generating such partial regions is as difficult as solving the semantic segmentation problem itself.

In the embodiment, generating integrated regions of various shapes and sizes produces some appropriate and some inappropriate integrated regions. Generating many integrated regions of differing shapes and sizes, however, raises the likelihood that one appropriate to each subject in the image is among them. Combining the discrimination results of these diverse integrated regions therefore stabilizes recognition and makes a correct result more likely.

FIG. 4 shows example recognition results of the image recognition processing of the embodiment. The input image contains a person, a vehicle, and other subjects, and is segmented into partial regions. When the category of each partial region is determined, the vehicle's window glass and hood resemble the sky. They are therefore misclassified as "sky", as in partial regions 401 and 402.

Suppose instead that partial regions are merged into integrated regions of various shapes and sizes and the category of each integrated region is determined. Some integrated regions, such as 404 and 406, are misclassified, but others, such as 403 and 405, give correct results. Combining the results of these diverse integrated regions stabilizes recognition. A correct result then becomes more likely than when only the partial regions are considered.

[Configuration of information processing apparatus]
The block diagram of FIG. 5 shows an example configuration of a computer device that functions as the image processing apparatus of the embodiment. A CPU 201 executes programs stored in a ROM 203 or a storage unit 204, using a RAM 202 as work memory. It controls the components described below via a system bus 208. The storage unit 204 is a hard disk drive (HDD), a solid-state drive (SSD), flash memory, or similar storage. It stores an OS and a program that implements the image recognition processing described above.

A general-purpose interface 205 is a serial bus interface such as USB. An operation unit 211, such as a mouse and keyboard, is connected to it, as is a digital camera 212, which is one source of images to be recognized. Images to be recognized can also come from:
- the storage unit 204;
- a recording medium in a drive connected to the general-purpose interface 205;
- a server device connected to a network 213, or another source.

A video interface 206 is a video interface such as HDMI (registered trademark) or DisplayPort (trademark), to which a monitor 106 is connected. A network interface 207 connects to the wired or wireless network 213. User operations and the connection to the digital camera 212 may also go through the network interface 207.

Concatenating the category recognition result of a partial region with those of the integrated regions containing it thus improves the accuracy of category recognition for that partial region. In addition, the feature used to recognize an integrated region's category is a statistic of the features of its partial regions, which reduces the computational cost of feature extraction. In other words, the result is semantic segmentation with low computational cost and high accuracy.

[Modifications]
The above describes an example in which the integration unit 104a uses the k-means algorithm. Another clustering algorithm, such as Mean-Shift or spectral clustering, may be used instead. Beyond clustering, any algorithm that merges regions may be used.

The above also describes an example in which the integration unit 104a clusters the first features in feature space to merge partial regions. Two alternatives generate an integrated region by connecting adjacent partial regions:
- connect them when the similarity of their first features is equal to or greater than a predetermined threshold;
- connect them when the difference between pixel values near their shared boundary is smaller than a predetermined threshold.
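The threshold-based alternative can be sketched with a union-find structure over the superpixel adjacency graph. Cosine similarity and the threshold value are assumptions; the patent only requires some similarity measure compared against a predetermined threshold. The function names are hypothetical. Merging is transitive, so a chain of pairwise-similar neighbors ends up in one integrated region.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure between two first features."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def merge_adjacent(features: List[Sequence[float]],
                   adjacency: List[Tuple[int, int]],
                   threshold: float) -> List[int]:
    """Union adjacent SPx whose similarity >= threshold; returns a root
    label per SPx, and equal labels mean the same integrated region."""
    parent = list(range(len(features)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in adjacency:
        if cosine(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(features))]
```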

Multiple integrated regions may also be generated hierarchically. The above describes the integration unit 104a merging partial regions based on their color and LBP features together. Integrated regions may instead be generated with different features for merging, for example color features only or LBP features only.

FIG. 6 shows features usable as integrated-region features. The above describes one option: the calculation unit 104b generates a frequency histogram of the codebook-quantized features of the partial regions in the integrated region (FIG. 6(A)). Two other options may be used as the integrated-region feature instead:
- statistics of the features of the included regions, such as mean and variance, and higher-order statistics such as skewness and kurtosis (FIG. 6(B));
- a weighted linear sum of the features of the included regions (FIG. 6(C)).
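The sketch below computes the statistics of FIG. 6(B) and the weighted sum of FIG. 6(C), per feature dimension, over the included regions. Population moments and non-excess (Pearson) kurtosis are assumptions. The choice of weights for the linear sum is also an assumption, for example each region's share of the integrated region's pixel area. The function names are hypothetical.

```python
from typing import List, Sequence


def moment_features(feats: List[Sequence[float]]) -> List[float]:
    """Per-dimension mean, variance, skewness and kurtosis of the first
    features of the included regions, concatenated dimension by dimension."""
    n, d = len(feats), len(feats[0])
    out: List[float] = []
    for j in range(d):
        col = [f[j] for f in feats]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        sd = var ** 0.5
        # Constant dimensions get 0.0 instead of dividing by zero.
        skew = sum((x - mu) ** 3 for x in col) / n / sd ** 3 if sd > 0 else 0.0
        kurt = sum((x - mu) ** 4 for x in col) / n / var ** 2 if var > 0 else 0.0
        out += [mu, var, skew, kurt]
    return out


def weighted_sum(feats: List[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted linear sum of the included regions' first features."""
    d = len(feats[0])
    return [sum(w * f[j] for f, w in zip(feats, weights)) for j in range(d)]
```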

Variations of the method of calculating integrated-region features are described with reference to FIG. 7. The above describes an example in which the calculation unit 104b calculates the statistic from all included regions. As FIG. 7 shows, statistics may instead be calculated separately for two groups: partial regions 601 near the boundary of the integrated region of interest, and partial regions 602 away from the boundary.
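One way to realize the boundary/interior split is sketched below. The sketch assumes that a member superpixel counts as a boundary region when it is adjacent to at least one superpixel outside the integrated region. The per-group statistic is a plain mean, and an empty group is zero-filled. All names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def split_boundary(members: Set[int],
                   adjacency: List[Tuple[int, int]]) -> Tuple[Set[int], Set[int]]:
    """Split members into (boundary, interior): a member is on the boundary
    if it is adjacent to any superpixel outside the integrated region."""
    boundary: Set[int] = set()
    for i, j in adjacency:
        if (i in members) != (j in members):
            boundary.add(i if i in members else j)
    return boundary, members - boundary


def mean_vec(feats: List[Sequence[float]], idx: Set[int]) -> List[float]:
    """Mean first feature over idx; zero vector if idx is empty."""
    d = len(feats[0])
    if not idx:
        return [0.0] * d
    return [sum(feats[i][j] for i in idx) / len(idx) for j in range(d)]


def boundary_interior_feature(feats: List[Sequence[float]], members: Set[int],
                              adjacency: List[Tuple[int, int]]) -> List[float]:
    """Concatenate the boundary-group and interior-group statistics."""
    b, inner = split_boundary(members, adjacency)
    return mean_vec(feats, b) + mean_vec(feats, inner)
```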

Statistics may also be calculated from two sources and used together as the integrated-region feature of the integrated region of interest:
- the first features of its included regions 603;
- the first features of the included regions 604 of an adjacent integrated region.

Two further combinations may serve as the integrated-region feature of the integrated region of interest 605. The first combination pairs statistics with adjacent integrated regions:
- the statistic of 605 together with that of the integrated region 606 adjacent above it;
- the statistic of 605 together with that of the integrated region 607 adjacent below it.

The second combination uses three statistics:
- the statistic of the partial regions 609 touching its boundary;
- the statistic of the partial regions 608 not touching its boundary;
- the statistic of the partial regions 610 adjacent to (or near) the integrated region of interest.

As described above, the calculation unit 104b can calculate the integrated region feature amount in various ways; the calculation method is not limited to any single one.

The example above uses an SVM for the partial region recognition unit 106, the integrated region recognition unit 107, and the category recognition unit 109, but other classifiers can be used, such as logistic regression, neural networks, or random forests. The categories may also be determined by incorporating the category discrimination scores of the partial regions and the integrated regions into a conditional random field (CRF) framework.

The example above generates the second feature amount by concatenating the category discrimination result of a partial region, obtained from its first feature amount, with the category discrimination result of the integrated region containing that partial region. However, the second feature amount may instead be generated by concatenating the first feature amount with the integrated region feature amount. Alternatively, the category discrimination result of the integrated region, or basic statistics of the integrated region feature amount such as its mean, variance, skewness, and kurtosis, may be concatenated with the first feature amount of the corresponding partial region to generate the second feature amount.
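As an illustration of the last alternative, the sketch below appends the mean, variance, skewness, and kurtosis of an integrated region feature vector to a partial region's first feature amount. It assumes population moments computed over the vector's elements and uses excess kurtosis (subtracting 3); neither choice is specified in the source.

```python
import math


def basic_stats(values):
    """Population mean, variance, skewness and excess kurtosis of a vector."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return [mean, 0.0, 0.0, 0.0]  # higher moments undefined; use 0
    sd = math.sqrt(var)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return [mean, var, skew, kurt]


def second_feature(first_feature, integrated_feature):
    """Append basic statistics of the integrated region feature amount
    to the first feature amount of the corresponding partial region."""
    return list(first_feature) + basic_stats(integrated_feature)


print(second_feature([5.0], [0.0, 2.0]))
# [5.0, 1.0, 1.0, 0.0, -2.0]
```

The appended part has a fixed length of four regardless of the size of the integrated region feature, which keeps the second feature amount compact.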

An image processing apparatus and an image processing method according to the second embodiment of the present invention will now be described. In the second embodiment, components substantially the same as those of the first embodiment are given the same reference numerals, and their detailed description may be omitted.

The second embodiment describes applying the present invention to an image recognition task different from that of the first embodiment. In the image recognition processing of the second embodiment, a still image is input and the scene category of the input image is determined. The categories are C scene categories classified in advance by the user, such as mountain scenery, city scenery, and portraits.

A configuration example of the image processing apparatus according to the second embodiment will be described with reference to the block diagram of FIG. 8. The image processing apparatus of the second embodiment includes the same acquisition unit 101, dividing unit 102, extraction unit 103, and integrated region generation unit 104 as the first embodiment, and further includes a first determination unit 111 that determines the scene of each integrated region and a second determination unit 112 that determines the scene of the input image.

The image recognition processing of the second embodiment will be described with reference to the flowchart of FIG. 9. Steps S11 to S15 are the same as in the first embodiment. In the iterative processing, the first determination unit 111 takes the frequency histogram, which is the integrated region feature amount, as its input variable and determines the scene category of the corresponding integrated region using a classifier such as an SVM (S21). The SVM outputs a category discrimination score for each of the C scene categories, so C category discrimination scores are obtained for each integrated region.
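The frequency histogram can be built from the partial regions' first feature amounts without re-extracting features for each integrated region, which is where the cost saving comes from. A sketch assuming hard assignment of each first feature amount to its nearest codeword by squared Euclidean distance, with a codebook prepared in advance; the function names are hypothetical.

```python
def nearest_codeword(feature, codebook):
    """Index of the codeword closest to `feature` (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )


def assign_words(features, codebook):
    """Vote once per partial region; reused by every integrated region."""
    return {r: nearest_codeword(f, codebook) for r, f in features.items()}


def integrated_histogram(members, words, n_words):
    """Normalized frequency histogram of the member regions' codewords."""
    counts = [0] * n_words
    for r in members:
        counts[words[r]] += 1
    total = len(members)
    return [c / total for c in counts] if total else [0.0] * n_words


codebook = [[0.0, 0.0], [10.0, 10.0]]
features = {0: [1.0, 0.0], 1: [9.0, 9.0], 2: [0.0, 2.0]}
words = assign_words(features, codebook)  # {0: 0, 1: 1, 2: 0}
print(integrated_histogram({0, 1, 2}, words, len(codebook)))
```

Re-running `integrated_histogram` for another grouping of partial regions only re-counts the stored codeword indices.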

Steps S14, S15, and S21 are repeated K times while the shape and size of the integrated regions are varied. This iterative processing yields K sets of category discrimination scores for one input image.

When the iterative processing ends, the second determination unit 112 takes as its input variable the feature amount obtained by concatenating the K sets of category discrimination scores and determines the scene of the input image using a classifier such as an SVM (S22). The SVM calculates discrimination scores indicating which of the C scene categories the input image belongs to, so C category discrimination scores are obtained for the input image. The second determination unit 112 outputs the scene category with the highest of these C discrimination scores as the scene category of the input image.
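The two-stage scene classification can be sketched as follows. It assumes each of the K iterations contributes one C-dimensional score vector (how scores from several integrated regions in one iteration would be pooled is not stated here) and uses a hypothetical one-vs-rest linear scorer, w_c · x + b_c, in place of a trained SVM; the weights in the example are made up.

```python
def concat_scores(score_sets):
    """Concatenate K score vectors (each of length C) into one K*C vector."""
    return [s for scores in score_sets for s in scores]


def linear_scores(x, weights, biases):
    """One-vs-rest linear decision values w_c . x + b_c, one per category."""
    return [sum(w * v for w, v in zip(w_c, x)) + b for w_c, b in zip(weights, biases)]


def classify_scene(score_sets, weights, biases, categories):
    """Second-stage decision (cf. S22): pick the highest-scoring category."""
    x = concat_scores(score_sets)
    scores = linear_scores(x, weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best], scores


# K = 2 iterations, C = 2 categories; weights and biases are illustrative only.
score_sets = [[1.0, 0.0], [0.5, 0.2]]
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
biases = [0.0, 0.0]
print(classify_scene(score_sets, weights, biases, ["mountain", "city"]))
```

A linear scorer is used only because a linear SVM's decision function has exactly this form; any classifier returning C scores could be substituted.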

An image processing apparatus and an image processing method according to the third embodiment of the present invention will now be described. In the third embodiment, components substantially the same as those of the first and second embodiments are given the same reference numerals, and their detailed description may be omitted.

The third embodiment describes object detection processing in which an image is input and objects appearing in the input image are detected. The objects to be detected belong to C object categories designated in advance by the user, such as people and vehicles.

A configuration example of the image processing apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 10. The image processing apparatus of the third embodiment includes the same acquisition unit 101, dividing unit 102, extraction unit 103, and integrated region generation unit 104 as the first embodiment. It further includes an estimation unit 121 that estimates how object-like each integrated region is (hereinafter, "object likelihood"), a determination unit 122 that determines the integrated regions corresponding to objects, an extraction unit 123 that extracts the feature amount of a rectangular region surrounding an integrated region, and a detection unit 124 that detects objects based on the feature amount of that rectangular region.

The object detection processing of the third embodiment will be described with reference to the flowchart of FIG. 11. Steps S11 to S14 are the same as in the first embodiment. Step S15 is also substantially the same as in the first embodiment, except that in the third embodiment statistical values are calculated separately for the partial regions 601 located near the boundary of the integrated region and the partial regions 602 located away from the boundary, as shown in FIG. 7. This allows recognition that accounts for the missing parts and protrusions (burrs) in the object shape that tend to occur at the boundary of an integrated region, and stabilizes the detection result.

In the iterative processing, the estimation unit 121 takes the integrated region feature amount as its input variable and estimates the object likelihood of the corresponding integrated region using a classifier such as an SVM (S31). The classifier is trained in advance with integrated region feature amounts as input variables, using integrated regions that are objects as positive examples and integrated regions that are not objects as negative examples. Steps S13, S14, and S31 are repeated K times while the shape and size of the integrated regions are varied, so that object likelihoods are estimated for integrated regions of various shapes and sizes.

When the iterative processing ends, the determination unit 122 rejects integrated regions whose object likelihood is below a predetermined threshold as not corresponding to an object (S32). In other words, the integrated regions that correspond to objects are determined. This eliminates the subsequent processing for integrated regions estimated not to be objects.

Next, the extraction unit 123 extracts a second feature amount from the rectangular region (hereinafter, "enclosing region") that surrounds each integrated region determined to correspond to an object (S33). The second feature amount is, for example, a histogram of oriented gradients (HOG), a feature commonly used in object detection.
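A sketch of computing the rectangle around an integrated region from a label map in which each pixel stores the ID of its partial region. The (top, left, bottom, right) convention with exclusive bottom/right is an assumption, and `enclosing_rectangle` and `crop` are hypothetical helpers. The crop would then be passed to a HOG extractor (for example `skimage.feature.hog` from scikit-image), which is not shown here.

```python
def enclosing_rectangle(label_map, members):
    """(top, left, bottom, right), bottom/right exclusive, of the smallest
    axis-aligned rectangle covering every pixel of the member regions."""
    coords = [
        (y, x)
        for y, row in enumerate(label_map)
        for x, label in enumerate(row)
        if label in members
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys) + 1, max(xs) + 1


def crop(image, rect):
    """Cut the rectangle out of a row-major image (list of rows)."""
    top, left, bottom, right = rect
    return [row[left:right] for row in image[top:bottom]]


label_map = [
    [0, 0, 1],
    [0, 2, 1],
    [3, 3, 3],
]
rect = enclosing_rectangle(label_map, {1, 2})
print(rect)  # (0, 1, 2, 3)
```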

Next, the detection unit 124 takes the second feature amount as its input variable, detects the object category of the corresponding integrated region using a classifier such as an SVM, and outputs that category (S34). If the determination scores obtained in step S34 are low for all categories, the detection unit 124 determines that the integrated region does not correspond to an object.

The object detection processing will be described with reference to FIG. 12. The dividing unit 102 divides the input image into regions. The extraction unit 103 extracts the first feature amount for each partial region. Based on the first feature amounts, the integration unit 104a generates integrated regions SSPx by integrating partial regions SPx. Integrated regions of various shapes and sizes are generated; suppose, for example, that integrated regions SSP_1 to SSP_M are obtained.

The calculation unit 104b calculates, as the integrated region feature amount, statistical values of the feature amounts of the included partial regions. The estimation unit 121 estimates the object likelihood of each integrated region based on its integrated region feature amount, so object likelihoods are obtained for the integrated regions SSP_1 to SSP_M. The determination unit 122 rejects integrated regions whose object likelihood is below the threshold as not being objects.

The extraction unit 123 extracts a feature amount such as HOG from the rectangular region (enclosing region) surrounding each integrated region whose object likelihood is at or above the threshold. The detection unit 124 estimates the likelihood of each object category based on the feature amount of the enclosing region and outputs the object category with the maximum likelihood as the category of the object detected in the input image.

In this way, candidate object regions are first narrowed down using a feature extraction method with a low computational cost, and objects are then discriminated using feature amounts with a higher computational cost. This yields object detection processing that balances accuracy against the amount of computation.
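The cascade can be sketched with the two classifiers passed in as callables. This is a minimal illustration, not the patented implementation: `cheap_likelihood` stands for the estimation on integrated region feature amounts, `expensive_classify` for the HOG-based classifier returning per-category scores, and `threshold` and `min_score` are hypothetical parameters corresponding to the rejection in S32 and the "low score for all categories" check in S34.

```python
def cascade_detect(regions, cheap_likelihood, expensive_classify,
                   threshold, min_score):
    """Two-stage detection over candidate integrated regions.

    Stage 1 (cf. S31/S32): drop regions whose cheap object likelihood is
    below `threshold`, so no expensive feature is computed for them.
    Stage 2 (cf. S33/S34): classify the survivors; if even the best
    category score is below `min_score`, treat the region as not an object.
    """
    detections = []
    for region in regions:
        if cheap_likelihood(region) < threshold:
            continue
        scores = expensive_classify(region)  # {category: score}
        category = max(scores, key=scores.get)
        if scores[category] >= min_score:
            detections.append((region, category, scores[category]))
    return detections


calls = []


def fake_classifier(region):
    """Stand-in for the expensive stage; records which regions reach it."""
    calls.append(region)
    return {"a": {"person": 0.8, "car": 0.1},
            "c": {"person": 0.05, "car": 0.02}}[region]


likelihood = {"a": 0.9, "b": 0.1, "c": 0.6}
print(cascade_detect(["a", "b", "c"], likelihood.get, fake_classifier,
                     threshold=0.5, min_score=0.1))
print(calls)  # region "b" never reaches the expensive stage
```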

An image processing apparatus and an image processing method according to the fourth embodiment of the present invention will now be described. In the fourth embodiment, components substantially the same as those of the first to third embodiments are given the same reference numerals, and their detailed description may be omitted.

The fourth embodiment describes processing in which an image is input and the main subject is recognized in the input image. A configuration example of the image processing apparatus according to the fourth embodiment will be described with reference to the block diagram of FIG. 13. The image processing apparatus of the fourth embodiment includes the same acquisition unit 101, dividing unit 102, extraction unit 103, and integrated region generation unit 104 as the first embodiment. It further includes an estimation unit 131 that estimates how likely each integrated region is to be the main subject (hereinafter, "saliency") and a detection unit 132 that detects the main subject.

The main subject detection processing of the fourth embodiment will be described with reference to the flowchart of FIG. 14. Steps S11 to S14 are the same as in the first embodiment. Step S15 is also substantially the same as in the first embodiment, except that in the fourth embodiment statistical values are calculated from the partial regions 608, 609, and 610 shown in FIG. 7.

The partial regions 608 lie in the interior of the integrated region of interest: they are included in it and either do not touch its boundary at all or share a stretch of boundary with it that is shorter than a predetermined value. The partial regions 609 are boundary partial regions, which are included in the integrated region of interest and touch its boundary. The partial regions 610 are adjacent partial regions, which are not included in the integrated region of interest but touch its boundary.

That is, the calculation unit 104b calculates three statistical values: one from the interior partial regions, one from the interior and boundary partial regions, and one from the interior and boundary partial regions together with the adjacent partial regions. Because the statistics cover not only the integrated region of interest but also the partial regions adjacent to it, they reflect the relationship with the background, which is expected to improve the accuracy of main subject detection.
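A sketch of how the three groups and the three cumulative statistics might be computed from a pixel label map. It assumes 4-neighbour pixel adjacency, measures the boundary length of a partial region as the number of pixel edges it shares with the other side of the integrated region's boundary, ignores the image border, and uses an element-wise mean as the statistic. `min_len` plays the role of the predetermined value, and all names are hypothetical.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of feature vectors (None for an empty group)."""
    return [fmean(c) for c in zip(*vectors)] if vectors else None


def contact_lengths(label_map, members):
    """Pixel edges (4-neighbour) each region shares with the other side of
    the integrated region's boundary. The image border is not counted."""
    h, w = len(label_map), len(label_map[0])
    contact = {}
    for y in range(h):
        for x in range(w):
            a = label_map[y][x]
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right and down neighbours
                if ny < h and nx < w:
                    b = label_map[ny][nx]
                    if (a in members) != (b in members):
                        contact[a] = contact.get(a, 0) + 1
                        contact[b] = contact.get(b, 0) + 1
    return contact


def classify_regions(label_map, members, min_len=1):
    """Return (interior ~608, boundary ~609, adjacent ~610) region-ID sets."""
    contact = contact_lengths(label_map, members)
    interior = {r for r in members if contact.get(r, 0) < min_len}
    boundary = set(members) - interior
    adjacent = {r for r in contact if r not in members}
    return interior, boundary, adjacent


def cumulative_stats(label_map, members, features, min_len=1):
    """Statistics over interior, interior+boundary, interior+boundary+adjacent."""
    interior, boundary, adjacent = classify_regions(label_map, members, min_len)
    groups = [interior, interior | boundary, interior | boundary | adjacent]
    return [mean_vector([features[r] for r in sorted(g)]) for g in groups]


label_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
features = {0: [4.0], 1: [2.0], 2: [0.0]}
print(classify_regions(label_map, {1, 2}))            # ({2}, {1}, {0})
print(cumulative_stats(label_map, {1, 2}, features))  # [[0.0], [1.0], [2.0]]
```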

In the iterative processing, the estimation unit 131 estimates a saliency based on the similarity between the integrated region of interest and the surrounding integrated regions (S41). Values such as the Kullback-Leibler divergence or the histogram intersection between the feature amount of the integrated region of interest and the feature amounts of the surrounding integrated regions are used as the saliency; a higher saliency indicates a region more likely to be the main subject. Steps S13, S14, and S41 are repeated K times while the shape and size of the integrated regions are varied, so that saliency is estimated for integrated regions of various shapes and sizes.
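The two measures named above can be sketched on frequency histograms as follows. The sketch assumes the surrounding integrated regions are pooled by summing their histograms, adds a small `eps` to avoid log(0) in the Kullback-Leibler divergence, and reports 1 minus the histogram intersection so that, like the KL divergence, larger values mean the region differs more from its surroundings. These choices are assumptions, not taken from the source.

```python
import math


def normalize(hist):
    total = sum(hist)
    return [v / total for v in hist]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) between two histograms; `eps` smoothing avoids log(0)."""
    p = normalize([v + eps for v in p])
    q = normalize([v + eps for v in q])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def histogram_intersection(p, q):
    """Sum of bin-wise minima of two normalized histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(normalize(p), normalize(q)))


def saliency(region_hist, surrounding_hists, measure="kl"):
    """Saliency of one integrated region against its pooled surroundings."""
    surround = [sum(col) for col in zip(*surrounding_hists)]
    if measure == "kl":
        return kl_divergence(region_hist, surround)
    return 1.0 - histogram_intersection(region_hist, surround)


print(saliency([2, 2], [[1, 1], [3, 3]]))                  # 0.0 (same distribution)
print(saliency([1, 0], [[0, 1]], measure="intersection"))  # 1.0 (no overlap)
```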

When the iterative processing ends, the detection unit 132 calculates the saliency of each partial region by adding up, for each included region, the saliencies of the integrated regions of various shapes and sizes that contain it (S42). If many integrated regions with high saliency contain a given partial region, the summed saliency of that partial region also becomes high. The detection unit 132 outputs main subject information indicating the set of partial regions with high saliency as the main subject (S43).
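The accumulation in S42 and the selection in S43 can be sketched as below. It assumes the K iterations produce a list of (member partial region IDs, saliency) pairs; the rule for "high saliency", keeping regions whose total is at least `ratio` times the maximum, is a hypothetical choice.

```python
from collections import defaultdict


def accumulate_saliency(integrated_regions):
    """Sum, per partial region, the saliency of every integrated region that
    contains it (cf. S42). Input: iterable of (member_ids, saliency)."""
    totals = defaultdict(float)
    for members, s in integrated_regions:
        for r in members:
            totals[r] += s
    return dict(totals)


def main_subject(integrated_regions, ratio=0.5):
    """Partial regions whose summed saliency is at least `ratio` x the peak."""
    totals = accumulate_saliency(integrated_regions)
    peak = max(totals.values())
    return {r for r, t in totals.items() if t >= ratio * peak}


candidates = [({1, 2}, 0.9), ({2, 3}, 0.8), ({4}, 0.1)]
print(main_subject(candidates))
```

Region 2 appears in two salient integrated regions, so its total (1.7) dominates; region 1 (0.9) clears half of that peak, while regions 3 (0.8) and 4 (0.1) do not.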

[Other Embodiments]
The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

102 ... dividing unit, 103 ... extraction unit, 104a ... integration unit, 104b ... calculation unit

Claims

1. An image processing apparatus comprising:
dividing means for dividing an image into partial regions;
extraction means for extracting a first feature amount from each partial region;
integration means for integrating the partial regions based on the first feature amounts to generate an integrated region; and
calculation means for calculating, as an integrated region feature amount, at least a statistical value of the first feature amounts of the partial regions included in the integrated region.

2. The image processing apparatus according to claim 1, further comprising:
integrated region recognition means for recognizing the category of each integrated region based on its integrated region feature amount;
partial region recognition means for recognizing the category of each partial region based on its first feature amount;
generation means for concatenating the recognition results of the integrated region recognition means and the partial region recognition means to generate a second feature amount; and
determination means for determining the category of the corresponding partial region based on the second feature amount.

3. The image processing apparatus according to claim 2, wherein the processing of the integration means, the calculation means, and the integrated region recognition means is repeated a predetermined number of times while the shape or size of the integrated region is changed.

4. The image processing apparatus according to claim 1, further comprising:
first determination means for determining the scene category of each integrated region based on its integrated region feature amount; and
second determination means for determining the scene category of the image based on a feature amount obtained by concatenating the determination results of the first determination means.

5. The image processing apparatus according to claim 4, wherein the processing of the integration means, the calculation means, and the first determination means is repeated a predetermined number of times while the shape or size of the integrated region is changed.

6. The image processing apparatus according to claim 1, further comprising:
estimation means for estimating the object likelihood of each integrated region based on its integrated region feature amount;
determination means for determining, based on the object likelihood, the integrated regions that correspond to an object;
extraction means for extracting a second feature amount of a region enclosing an integrated region determined to correspond to the object; and
detection means for detecting the category of the object in the integrated region based on the second feature amount.

7. The image processing apparatus according to claim 6, wherein the calculation means calculates the statistical values separately for partial regions located near the boundary of the integrated region and partial regions located away from the boundary.

8. The image processing apparatus according to claim 6, wherein the processing of the integration means, the calculation means, and the estimation means is repeated a predetermined number of times while the shape or size of the integrated region is changed.

9. The image processing apparatus according to claim 1, further comprising:
estimation means for estimating the saliency of each integrated region based on its integrated region feature amount; and
detection means for adding up the saliency of the integrated regions for each partial region they include and detecting a set of partial regions corresponding to the main subject of the image.

10. The image processing apparatus according to claim 9, wherein the calculation means calculates the statistical values separately for partial regions located inside the integrated region, partial regions included in the integrated region and located at its boundary, and partial regions adjacent to the integrated region.

11. The image processing apparatus according to claim 9, wherein the processing of the integration means, the calculation means, and the estimation means is repeated a predetermined number of times while the shape or size of the integrated region is changed.

12. The image processing apparatus according to claim 1, wherein the statistical value is a frequency histogram representing the result of voting, based on the first feature amounts, for each feature of a codebook prepared in advance.

13. An image processing method comprising:
dividing an image into partial regions;
extracting a first feature amount from each partial region;
integrating the partial regions based on the first feature amounts to generate an integrated region; and
calculating, as an integrated region feature amount, at least a statistical value of the first feature amounts of the partial regions included in the integrated region.

14. A program for causing a computer to function as each means of the image processing apparatus according to claim 1.