JP3569641B2

JP3569641B2 - Apparatus and method for detecting an object in an image and a recording medium recording the method

Info

Publication number: JP3569641B2
Application number: JP04766199A
Authority: JP
Inventors: 悦郎藤田; 伸治安部; 利明杉村
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 1999-02-25
Filing date: 1999-02-25
Publication date: 2004-09-22
Anticipated expiration: 2019-02-25
Also published as: JP2000242782A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像中の物体を検出する検出装置、検出方法に係り、特に検出対象となる物体が写された複数の画像から検出対象物体の認識モデルを作成し、この認識モデルに基づいて、未知画像内に検出対象物体が含まれるか否かを判定し、含まれる場合にはその物体が写っている局所領域を検出する装置、方法に関するものである。
【０００２】
【従来の技術】
画像中の物体を認識し検出するための認識モデルの作成に関する装置は、近年多くの技術が提案されている。その中の一つに、検出対象となる物体が写された複数の画像から直接その物体の認識モデルを作成する装置がある。
【０００３】
この従来の装置は、検出対象物体が写された画像、あるいは検出対象物体が写された画像の、物体が含まれる局所矩形領域による部分画像等からなる事例集合に対してＫＬ（カルフーネン・レーヴ）変換等の画像処理方法を適用することで検出対象物体の２次元的なテンプレート画像を作成し、このテンプレート画像自身により検出対象物体の認識モデルを作成するというものであり、入力未知画像の多くの局所領域とこのテンプレート画像との照合計算に基づいて物体を検出するものである。例えば、次の文献がある。
【０００４】
参考文献１：Ｍ．ＴｕｒｋａｎｄＡ．Ｐｅｎｔｌａｎｄ，”ＥｉｇｅｎｆａｃｅｓｆｏｒＲｅｃｏｇｎｉｔｉｏｎ”，Ｊ．ＣｏｇｎｉｔｉｖｅＮｅｕｒｏｓｃｉｅｎｃｅ，Ｖｏｌ．３，Ｎｏ．１，ｐｐ．７１−８６（１９９１）
この参考文献１では画像から人間の顔を認識し検出するために、複数の人間の正面から撮影した顔画像の集合に対してＫＬ変換を適用することによって、未知画像から正面を向いた顔画像が検出可能な顔画像認識モデルを作成している。
【０００５】
【発明が解決しようとする課題】
画像から検出対象物体を検出するために、検出対象物体が写された画像、あるいは検出対象物体が写された画像の、物体が含まれる局所矩形領域の部分画像等の事例集合に対してＫＬ変換等の画像処理方法を適用することで対象物体の２次元的なテンプレート画像を認識モデルとして作成する従来の装置においては、未知画像内の物体を認識しその領域を検出する際、照合計算において、未知画像の各局所領域とテンプレート画像との画像全体に関する類似性を照合評価として測るため、局所領域内に検出対象物体が写っている場合であっても、検出対象物体に向きやオクルージョン等の見かけ上の変化が多少でもある場合には、画像全体での類似性が評価されないために検出対象物体を検出できないという問題がある。
【０００６】
本発明が解決しようとする課題は、未知画像内において検出対象物体が写されている場合で、向きやオクルージョン等の見かけ上の変化が多少ある場合に対しても、物体の検出が可能な認識モデルを作成し、この認識モデルに基づいて検出対象物体を検出する装置および方法を提案することにある。
【０００７】
【課題を解決するための手段】
上記の課題を解決するための本発明の画像中の物体の検出装置は、検出対象物体が写された複数の画像から作成する認識モデルに基づいて、入力された未知画像内に検出対象物体が含まれるか否かを判定し、含まれる場合にはその物体が写っている局所領域を検出する画像中の物体の検出装置であって、
前記認識モデルの作成手段は、
前記検出対象物体が写された複数の画像を入力する手段と、
前記検出対象物体以外が写された複数の画像を入力する手段と、
前記検出対象物体および検出対象物体以外の各入力画像から色、エッジ、テクスチャ、あるいはこれらの結合による特徴ベクトルを抽出する手段と、
前記抽出された検出対象物体および検出対象物体以外の特徴ベクトルによって、特徴空間の各軸を２等分して２の次元数乗個の方形領域に分割する手段と、
前記各方形領域について、含まれる特徴ベクトルが前記検出対象物体の要素のみである場合はその領域を正領域とし、含まれる特徴ベクトルが前記検出対象物体以外の要素のみである場合は負領域とし、前記検出対象物体および検出対象物体以外のいずれの要素も含まない場合は負領域とする、正負領域の分類と前記各方形領域の分割を繰り返す手段と、
前記正領域の要素が含まれる各方形領域の和集合領域を、前記検出対象物体の画像特徴の特徴付け領域として生成することで、前記検出対象物体の認識モデルを作成する手段とを備えたことを特徴とする。
【０００９】
また、上記の課題を解決するための本発明の画像中の物体の検出方法は、検出対象物体が写された複数の画像から作成する認識モデルに基づいて、入力された未知画像内に検出対象物体が含まれるか否かを判定し、含まれる場合にはその物体が写っている局所領域を検出する画像中の物体の検出方法であって、
前記認識モデルの作成は、
前記検出対象物体が写された複数の画像を入力し、
前記検出対象物体以外が写された複数の画像を入力し、
前記検出対象物体および検出対象物体以外の各入力画像から色、エッジ、テクスチャ、あるいはこれらの結合による特徴ベクトルを抽出し、
前記抽出された検出対象物体および検出対象物体以外の特徴ベクトルによって、特徴空間の各軸を２等分して２の次元数乗個の方形領域に分割し、
前記各方形領域について、含まれる特徴ベクトルが前記検出対象物体の要素のみである場合はその領域を正領域とし、含まれる特徴ベクトルが前記検出対象物体以外の要素のみである場合は負領域とし、前記検出対象物体および検出対象物体以外のいずれの要素も含まない場合は負領域とする、正負領域の分類と前記各方形領域の分割を繰り返し、
前記正領域の要素が含まれる各方形領域の和集合領域を、前記検出対象物体の画像特徴の特徴付け領域として生成することで、前記検出対象物体の認識モデルを作成することを特徴とする。
【００１１】
また、上記検出方法における処理手順をコンピュータに実行させるためのプログラムを、該コンピュータが読み取り可能な記録媒体に記録したことを特徴とする。
【００１２】
【発明の実施の形態】
以下に、本発明の実施形態について、図面を参照して説明する。
【００１３】
図１は、本発明における認識モデル作成のための装置構成である。この装置構成の手段１０１，１０２，１１１〜１１３は、マウス、イメージスキャナ、ディスプレイ、ＣＰＵ及びメモリ装置等からなるいわゆるコンピュータシステムを使用して実現されるが、その構成は周知であるので図示は省略する。
【００１４】
まず、画像入力手段１０１は、イメージスキャナ等を利用して、検出対象となる物体が写された複数の画像を入力し、メモリ等に格納する。ここで、検出対象物体が入力画像の局所的な矩形領域に含まれている場合には、マウス等でその矩形領域を指定し、新たに部分画像を作成した上で、この部分画像をメモリ等に格納する。
【００１５】
次に、画像入力手段１０２は、検出対象物体以外が写された複数の画像を入力し、メモリ等に格納する。
【００１６】
次いで、手段１１１〜１１３は、ＣＰＵ上で以下の処理を実行し、検出対象物体の認識モデルを処理系において自動的に作成する。
【００１７】
まず、特徴ベクトル抽出処理手段１１１は、画像入力手段１０１および１０２で入力された検出対象物体およびそれ以外が写された各入力画像から色、エッジ、テクスチャ、あるいはこれらの結合による特徴ベクトルを抽出する。
【００１８】
次いで、階層的分類処理手段１１２は、抽出された検出対象物体および検出対象物体以外の両者の特徴ベクトルの集合によって特徴空間を階層的に分類し、分類された特徴空間の局所領域内にはこの両集合の要素が同時に含まれないようにする。
【００１９】
最後に、認識モデル作成処理手段１１３は、検出対象物体の特徴ベクトルが含まれる特徴空間上の各局所領域に対してその和集合を生成し、検出対象物体の画像特徴の特徴付け領域として生成することで検出対象物体の認識モデルを作成する。
【００２０】
次に、本発明による認識モデル作成に関する具体的な処理例を図２、図３を参照して説明する。
【００２１】
まず、画像入力手段１０１により検出対象物体が写された複数の画像を入力する。ただし、検出対象物体が入力画像中のある局所矩形領域に含まれる場合には、検出対象物体が含まれ、かつ背景ができるだけ含まれないように外接矩形領域を指定し、新たに部分画像を作成した上で、この部分画像を入力する。
【００２２】
次に、画像入力手段１０２により検出対象物体以外が写された複数の画像を入力する。
【００２３】
次に、特徴ベクトル抽出処理手段１１１において、入力された検出対象物体および検出対象物体以外の各画像から、色、エッジ、テクスチャ、あるいはこれらの結合による特徴ベクトルを抽出する。
【００２４】
なお、特徴ベクトル抽出に関しては、例えば、「Ｙ−ＩＯｈｔａ，Ｔ．ＫａｎｄａａｎｄＴ．Ｓａｓａｋｉ，”Ｃｏｌｏｒｉｎｆｏｍａｔｉｏｎｆｏｒｒｅｇｉｏｎｓｅｇｍｅｎｔａｔｉｏｎ”，Ｃｏｍｐ．ａｎｄＩｍｇ．Ｐｒｏｃ．，１３：２２２−２４１（１９８０）」（参考文献２）や「水野陽一、小林亜樹、吉田俊之、酒井善則、エッジ方向特徴量による画像検索、信学技報、ＩＥ９８−６２（１９９８）」（参考文献３）において詳述されている。
【００２５】
例えば、色特徴ベクトルとしては、ＨＳＩ表色系の色相を１６分割した色ヒストグラム（次元数１６）を採用することができる。また、エッジ特徴ベクトルとしては、方向数が４のエッジ方向ヒストグラム（次元数４）を採用することができる。これは、まず、画像の各画素に対して、水平、垂直、４５°、１３５°の４方向に関するガウス方向微分の絶対値をそれぞれ求め、これら４つの絶対値の最大値がある閾値を満たす場合にその画素をエッジ点と判定し、さらに最大値を与える方向をその画素のエッジ方向と定める。次に、水平、垂直、４５°、１３５°の各方向について、画像内の対応するエッジ点の個数をカウントすることによりヒストグラムを作成する。なお、色及びエッジのヒストグラムの各成分は、画像の全画素数で割って正規化する。
【００２６】
次に、階層的分類処理手段１１２は、図２に示すブロック構成にされ、検出対象物体の各画像から抽出された特徴ベクトルの集合Ｓｐｏｓと、検出対象物体以外の各画像から抽出された特徴ベクトルの集合Ｓｎｅｇとによって、以下の方法により特徴空間Ｖを階層的に分類し、分類によってできる特徴空間Ｖの各局所領域中にＳｐｏｓの要素とＳｎｅｇの要素とが同時に含まれないようにする。以下では、特徴空間Ｖ内の検出対象物体の画像特徴が特徴付けられる領域を正領域、それ以外を負領域と呼ぶことにする。
【００２７】
まず、分割処理手段２０１では、特徴空間Ｖの各軸を２等分して特徴空間Ｖを２^ｄｉｍＶ（ｄｉｍＶは特徴空間Ｖの次元数、上記の色とエッジではｄｉｍＶ＝２０）個の方形領域に分割する。
【００２８】
次に、正負判定処理手段２０２では、図３に示すように、分割処理手段２０１によってできる特徴空間Ｖの各方形領域に対して、以下の要領で正領域か負領域かの判定を行なう。この判定は、
（ａ）方形領域が集合Ｓｐｏｓの要素のみを含むならば、その領域は正領域と定める。
【００２９】
（ｂ）方形領域が集合Ｓｎｅｇの要素のみを含むならば、その領域は負領域と定める。集合Ｓｐｏｓ、Ｓｎｅｇのいずれの要素も含まない場合も負領域と定める。
【００３０】
最後に、終了判定処理手段２０３では、特徴空間Ｖの方形領域がすべて正負いずれかに分類されているならば処理を終了する。正負に分類されていない方形領域Ｗ_１、…、Ｗｎ（集合ＳｐｏｓとＳｎｅｇの要素を同時に含む）がある場合には、方形領域Ｗ_１、…、Ｗｎに対し、分割処理手段２０１と正負判定処理手段２０２に戻ってその処理を行ない、正負いずれかに分類されるまで、方形領域Ｗ_１、…、Ｗｎを細分化していく。
【００３１】
最後に、認識モデル作成処理手段１１３において、集合Ｓｐｏｓの要素が含まれる各局所領域に対してその和集合領域を生成し、検出対象物体の画像特徴の特徴付け領域として生成することで、検出対象物体の認識モデルを作成する。
【００３２】
図４は、上述のようにして作成した認識モデルに基づく本発明の物体の検出装置の構成図である。この装置構成においても実際にはコンピュータシステム上で実現される。
【００３３】
まず、未知画像入力手段３０１は、カメラや画像記録媒体等から未知画像を入力する。探索ウインドウスキャン処理手段３１１は、入力手段３０１から入力された未知画像に対し、様々なスケールの探索ウインドウを画像全体にわたってスキャンして照合を行なうための局所矩形領域を多数選択する。
【００３４】
次に、特徴ベクトル抽出処理手段３１２は、探索ウインドウスキャン処理手段３１１で選択された各局所矩形領域から特徴ベクトルを抽出する。
【００３５】
最後に、認識モデル照合処理手段３１３は、抽出された各特徴ベクトルと上述の認識モデル作成装置を用いて作成し保存しておいた認識モデルとの照合計算によって、未知画像内に検出対象物体が写っているか否かを判定し、写っている場合にはその領域を検出する。
【００３６】
次に、本発明の認識モデルに基づいた未知画像からの物体の検出装置の具体的な処理例を図５を参照して説明する。
【００３７】
未知画像Ｘを入力し、探索ウインドウスキャン処理手段３１１において、図５のように様々なスケールの矩形状探索ウインドウを画像全体にわたってスキャンして照合を行なうための局所矩形領域を多数選択する。
【００３８】
次に、特徴ベクトル抽出処理手段３１２において、選択された各局所矩形領域から特徴ベクトルを抽出する。ただし、未知画像内の各局所矩形領域から抽出する特徴ベクトルは、検出対象物体の認識モデル作成に用いた特徴ベクトルと同じものである。
【００３９】
最後に、認識モデル照合処理手段３１３において、検出対象物体の認識モデルとの照合計算によって、未知画像Ｘに検出対象物体が含まれるか否かを判定し、含まれる場合にはその局所領域を検出する。すなわち、未知画像Ｘ内の選択された局所矩形領域から抽出した特徴ベクトルが一つでも、特徴空間のＲ領域、すなわち検出対象物体の画像特徴の特徴付け領域に含まれる場合には、この未知画像Ｘには検出対象物体が含まれると判定し、かつＲ領域に含まれる特徴ベクトルの抽出元の局所矩形領域内にこの物体は含まれるとしてその領域を検出する。
【００４０】
なお、本発明は、図１、図２及び図４に示した装置の一部又は全部をコンピュータを用いて機能させることができる。また、各図の手段をコンピュータプログラムで記載してそれを実行できるようにし、それをコンピュータが読み取り可能な記録媒体、例えば、ＦＤ（フロッピーディスク）や、ＭＯ、ＲＯＭ、メモリカード、ＣＤ、ＤＶＤ、リムーバブルディスクなどに記録して提供し、配布することが可能である。
【００４１】
【発明の効果】
以上説明したように、本発明によれば、検出対象となる物体および検出対象物体以外が写された複数の画像を与えることによって、処理系で、検出対象物体の色、エッジ、テクスチャ、あるいはこれらの結合に関する画像特徴を学習して検出対象物体の認識モデルを自動的に作成する。色、エッジ、テクスチャ等の画像特徴は、物体のローカルな画像特徴であるため、実際の処理において抽出が安定かつロバストに実現される。このため、未知画像からの物体の検出処理においても、未知画像内に検出対象物体が写されている場合で、かつ向きやオクルージョン等の見かけ上の変化が多少程度ある場合であっても、検出対象物体を含みかつ背景をほとんど含まない局所矩形領域が照合対象として選択された場合には、その領域からの所望の画像特徴の抽出が可能となるため、上述の方法により学習した認識モデルに基づいてこの領域内の物体の認識・同定が可能となる。故に、未知画像からの検出対象物体の領域の検出も可能となる。
【図面の簡単な説明】
【図１】本発明の認識モデル作成に係る処理構成図。
【図２】特徴空間を検出対象物体の画像特徴の特徴付け領域と、それ以外の領域とに分類する処理構成図。
【図３】特徴空間を検出対象物体の画像特徴の特徴付け領域と、それ以外の領域とに分類する例。
【図４】本発明の画像中の対象物体の検出に係る処理構成図。
【図５】未知画像に対象物体が含まれるか否かを判定して、含まれる場合にはその局所矩形領域を検出する具体的な処理例。
【符号の説明】
１０１…検出対象物体が写された画像入力手段
１０２…検出対象物体以外が写された画像入力手段
１１１…特徴ベクトル抽出処理手段
１１２…階層的分類処理手段
１１３…認識モデル作成処理手段
２０１…分割処理手段
２０２…正負判定処理手段
２０３…終了判定処理手段
３０１…未知画像入力手段
３１１…探索ウインドウスキャン処理手段
３１２…特徴ベクトル抽出処理手段
３１３…認識モデル照合処理手段
[0001]
[Technical field of the invention]
The present invention relates to an apparatus and a method for detecting an object in an image, and in particular to an apparatus and a method that create a recognition model of a detection target object from a plurality of images in which that object is captured, determine on the basis of this recognition model whether the detection target object is included in an unknown image, and, if so, detect the local region in which the object is captured.
[0002]
[Prior art]
In recent years, many techniques have been proposed for creating recognition models used to recognize and detect objects in images. One of them is an apparatus that creates a recognition model of a detection target object directly from a plurality of images in which that object is captured.
[0003]
This conventional apparatus applies an image processing method such as the KL (Karhunen-Loève) transform to a case set consisting of images in which the detection target object is captured, or of partial images cut out of such images by a local rectangular region containing the object. This yields a two-dimensional template image of the detection target object, and the template image itself serves as the recognition model. The object is detected by matching calculations between many local regions of the input unknown image and this template image. One example is the following reference.
[0004]
Reference 1: M. Turk and A. Pentland, "Eigenfaces for Recognition", J. Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86 (1991)
In Reference 1, in order to recognize and detect human faces in images, the KL transform is applied to a set of face images of a plurality of people photographed from the front, thereby creating a face image recognition model that can detect front-facing faces in an unknown image.
[0005]
[Problems to be solved by the invention]
In the conventional apparatus, a two-dimensional template image of the target object is created as the recognition model by applying an image processing method such as the KL transform to a case set of images in which the detection target object is captured, or of partial images of local rectangular regions containing the object. When such an apparatus recognizes an object in an unknown image and detects its region, the matching calculation measures the similarity between each local region of the unknown image and the template image over the image as a whole. Consequently, even when the detection target object does appear in a local region, any slight change in its appearance, such as a change in orientation or an occlusion, keeps the whole-image similarity from being recognized, and the detection target object cannot be detected.
[0006]
The problem to be solved by the present invention is to propose an apparatus and a method that create a recognition model capable of detecting the object even when the detection target object captured in an unknown image shows some change in appearance, such as a change in orientation or an occlusion, and that detect the detection target object on the basis of this recognition model.
[0007]
[Means for Solving the Problems]
To solve the above problem, the apparatus for detecting an object in an image according to the present invention determines, on the basis of a recognition model created from a plurality of images in which a detection target object is captured, whether the detection target object is included in an input unknown image and, if so, detects the local region in which the object is captured,
wherein the means for creating the recognition model comprises:
means for inputting a plurality of images in which the detection target object is captured;
means for inputting a plurality of images in which objects other than the detection target object are captured;
means for extracting, from each input image of the detection target object and of objects other than the detection target object, a feature vector based on color, edges, texture, or a combination of these;
means for dividing the feature space, according to the extracted feature vectors of the detection target object and of objects other than the detection target object, into 2 to the power of the number of dimensions rectangular regions by bisecting each axis of the feature space;
means for repeating the classification into positive and negative regions and the division of each rectangular region, in which each rectangular region is classified as a positive region if the feature vectors it contains are elements of the detection target object only, as a negative region if the feature vectors it contains are elements of objects other than the detection target object only, and as a negative region if it contains no element of either; and
means for creating the recognition model of the detection target object by generating the union of the rectangular regions that contain elements of the positive region as the characterization region of the image features of the detection target object.
[0009]
Likewise, to solve the above problem, the method for detecting an object in an image according to the present invention determines, on the basis of a recognition model created from a plurality of images in which a detection target object is captured, whether the detection target object is included in an input unknown image and, if so, detects the local region in which the object is captured,
wherein the creation of the recognition model comprises:
inputting a plurality of images in which the detection target object is captured;
inputting a plurality of images in which objects other than the detection target object are captured;
extracting, from each input image of the detection target object and of objects other than the detection target object, a feature vector based on color, edges, texture, or a combination of these;
dividing the feature space, according to the extracted feature vectors of the detection target object and of objects other than the detection target object, into 2 to the power of the number of dimensions rectangular regions by bisecting each axis of the feature space;
repeating the classification into positive and negative regions and the division of each rectangular region, in which each rectangular region is classified as a positive region if the feature vectors it contains are elements of the detection target object only, as a negative region if the feature vectors it contains are elements of objects other than the detection target object only, and as a negative region if it contains no element of either; and
creating the recognition model of the detection target object by generating the union of the rectangular regions that contain elements of the positive region as the characterization region of the image features of the detection target object.
[0011]
Also, a program for causing a computer to execute the processing procedure in the detection method is recorded on a recording medium readable by the computer.
[0012]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0013]
FIG. 1 shows the apparatus configuration for creating a recognition model according to the present invention. The means 101, 102, and 111 to 113 of this configuration are realized using a so-called computer system comprising a mouse, an image scanner, a display, a CPU, a memory device, and the like; since that configuration is well known, it is not illustrated.
[0014]
First, the image input means 101 uses an image scanner or the like to input a plurality of images in which the detection target object is captured, and stores them in a memory or the like. When the detection target object occupies a local rectangular region of an input image, that rectangular region is designated with a mouse or the like, a new partial image is created from it, and this partial image is stored in the memory or the like.
[0015]
Next, the image input means 102 inputs a plurality of images in which objects other than the detection target object are captured, and stores them in the memory or the like.
[0016]
Next, the means 111 to 113 execute the following processing on the CPU to automatically create a recognition model of the detection target object in the processing system.
[0017]
First, the feature vector extraction processing means 111 extracts a feature vector based on color, edges, texture, or a combination of these from each of the input images, supplied by the image input means 101 and 102, in which the detection target object or other objects are captured.
[0018]
Next, the hierarchical classification processing means 112 hierarchically classifies the feature space using the sets of extracted feature vectors of both the detection target object and the other objects, so that no local region of the classified feature space contains elements of both sets at the same time.
[0019]
Finally, the recognition model creation processing means 113 forms the union of the local regions of the feature space that contain feature vectors of the detection target object, and creates the recognition model of the detection target object by taking this union as the characterization region of the image features of the detection target object.
[0020]
Next, a specific example of the processing for creating a recognition model according to the present invention will be described with reference to FIGS. 2 and 3.
[0021]
First, a plurality of images in which the detection target object is captured are input through the image input means 101. When the detection target object occupies a local rectangular region of an input image, however, a circumscribed rectangular region is specified so that it contains the detection target object and as little background as possible, a new partial image is created, and this partial image is input.
[0022]
Next, a plurality of images in which objects other than the detection target object are captured are input through the image input means 102.
[0023]
Next, the feature vector extraction processing means 111 extracts a feature vector based on color, edges, texture, or a combination of these from each input image of the detection target object and of other objects.
[0024]
Feature vector extraction is described in detail in, for example, "Y-I Ohta, T. Kanda and T. Sasaki, "Color information for region segmentation", Comp. and Img. Proc., 13: 222-241 (1980)" (Reference 2) and "Yoichi Mizuno, Aki Kobayashi, Toshiyuki Yoshida, Yoshinori Sakai, Image Retrieval Using Edge Direction Features, IEICE Technical Report, IE98-62 (1998)" (Reference 3).
[0025]
For example, a color histogram obtained by dividing the hue of the HSI color system into 16 bins (16 dimensions) can be adopted as the color feature vector, and an edge direction histogram with four directions (4 dimensions) can be adopted as the edge feature vector. For the latter, the absolute values of the Gaussian directional derivatives in the four directions (horizontal, vertical, 45°, and 135°) are first computed at each pixel of the image. If the maximum of these four absolute values satisfies a certain threshold, the pixel is judged to be an edge point, and the direction giving the maximum is taken as the edge direction of that pixel. Next, a histogram is created by counting, for each of the four directions, the number of corresponding edge points in the image. Each component of the color and edge histograms is normalized by dividing by the total number of pixels in the image.
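The 20-dimensional color and edge feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the function names (`hue_histogram`, `edge_direction_histogram`, `feature_vector`) and the default threshold are hypothetical, the hue is computed with the usual hue-angle formula, and plain finite differences (`np.gradient`) stand in for the Gaussian directional derivatives. Which diagonal is labeled 45° and which 135° depends on the image coordinate convention.

```python
import numpy as np

def hue_histogram(rgb, bins=16):
    """Hue histogram, normalized by the total number of pixels."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Hue angle in [0, 2*pi); gray pixels (r == g == b) get hue 0 in this sketch.
    hue = np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b) % (2.0 * np.pi)
    idx = np.minimum((hue / (2.0 * np.pi) * bins).astype(int), bins - 1)
    return np.bincount(idx.ravel(), minlength=bins) / idx.size

def edge_direction_histogram(gray, threshold):
    """Four-direction edge histogram, normalized by the total number of pixels."""
    gray = np.asarray(gray, dtype=np.float64)
    # Assumption: finite differences replace the Gaussian directional derivatives.
    gy, gx = np.gradient(gray)
    s = np.sqrt(0.5)
    # Absolute derivative along x, along y, and along the two diagonals.
    responses = np.abs(np.stack([gx, gy, s * (gx + gy), s * (gx - gy)]))
    strongest = responses.max(axis=0)
    direction = responses.argmax(axis=0)
    is_edge = strongest >= threshold          # edge point test
    return np.bincount(direction[is_edge], minlength=4) / gray.size

def feature_vector(rgb, threshold=0.25):
    """Concatenate the 16 hue bins and the 4 edge-direction bins (20 dimensions)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb.mean(axis=2)                   # intensity component
    return np.concatenate([hue_histogram(rgb),
                           edge_direction_histogram(gray, threshold)])
```

Because every component is divided by the pixel count, each coordinate of the resulting vector lies between 0 and 1, which gives the feature space a natural bounding box for the subdivision step.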
[0026]
Next, the hierarchical classification processing means 112, which has the block configuration shown in FIG. 2, hierarchically classifies the feature space V using the set Spos of feature vectors extracted from the images of the detection target object and the set Sneg of feature vectors extracted from the images of other objects, by the following method, so that no local region of V produced by the classification contains elements of Spos and elements of Sneg at the same time. Hereinafter, a region of the feature space V that characterizes the image features of the detection target object is called a positive region, and any other region is called a negative region.
[0027]
First, the division processing means 201 bisects each axis of the feature space V, dividing V into 2^dimV rectangular regions (dimV is the number of dimensions of V; dimV = 20 for the color and edge features above).
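To make the size of this first division concrete, the short sketch below counts the cells for dimV = 20 and shows one way to identify the cell a vector falls into. The helper `cell_key` and its half-open convention (a value equal to the midpoint belongs to the upper half) are assumptions made for illustration only.

```python
import numpy as np

def cell_key(x, lo, hi):
    """One boolean per axis: True if x lies in the upper half of the box on that axis."""
    mid = (np.asarray(lo, dtype=float) + np.asarray(hi, dtype=float)) / 2.0
    return tuple(bool(v) for v in np.asarray(x, dtype=float) >= mid)

dim_v = 20
num_cells = 2 ** dim_v   # 1048576 cells after bisecting every axis once
```

With more than a million cells after a single division, it is natural for an implementation to store only the cells that actually contain feature vectors; cells containing no vector are negative by definition.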
[0028]
Next, as shown in FIG. 3, the positive/negative determination processing means 202 determines whether each rectangular region of the feature space V produced by the division processing means 201 is a positive region or a negative region, as follows:
(A) If the rectangular region contains only elements of the set Spos, it is defined as a positive region.
[0029]
(B) If the rectangular region contains only elements of the set Sneg, it is defined as a negative region. A region that contains no element of either Spos or Sneg is also defined as a negative region.
[0030]
Finally, the end determination processing means 203 terminates the processing if every rectangular region of the feature space V has been classified as positive or negative. If there remain rectangular regions W_1, ..., W_n that have not been classified (that is, regions containing elements of both Spos and Sneg), the processing returns to the division processing means 201 and the positive/negative determination processing means 202 for those regions, and W_1, ..., W_n are subdivided until every region is classified as positive or negative.
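The divide, classify, and repeat loop of means 201 to 203 can be sketched as a recursion. This is an illustrative sketch, not the patented implementation: `subdivide` and `build_positive_boxes` are hypothetical names, only cells that contain elements of Spos are visited (all other cells are negative under rule (B)), and the `max_depth` guard is an added assumption that treats a still-mixed cell as negative so that the recursion stops even if identical vectors occur in both sets.

```python
import numpy as np

def subdivide(pos, neg, lo, hi, depth, max_depth):
    """Bisect every axis of the box [lo, hi) and return the positive cells as (lo, hi) pairs."""
    mid = (lo + hi) / 2.0
    pos_keys, neg_keys = pos >= mid, neg >= mid   # upper/lower half per axis
    boxes = []
    for key in {tuple(k) for k in pos_keys}:      # cells holding Spos elements
        k = np.array(key)
        cell_lo, cell_hi = np.where(k, mid, lo), np.where(k, hi, mid)
        cell_pos = pos[(pos_keys == k).all(axis=1)]
        cell_neg = neg[(neg_keys == k).all(axis=1)]
        if len(cell_neg) == 0:
            boxes.append((cell_lo, cell_hi))      # rule (A): only Spos elements
        elif depth + 1 < max_depth:
            # Mixed cell: divide it again and classify its sub-cells.
            boxes += subdivide(cell_pos, cell_neg, cell_lo, cell_hi,
                               depth + 1, max_depth)
        # else: assumption, an unresolved mixed cell is left negative
    return boxes

def build_positive_boxes(s_pos, s_neg, lo, hi, max_depth=32):
    """Return the list of positive cells whose union forms the recognition model."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    pos = np.asarray(s_pos, dtype=float).reshape(-1, lo.size)
    neg = np.asarray(s_neg, dtype=float).reshape(-1, lo.size)
    return subdivide(pos, neg, lo, hi, 0, max_depth)
```

In a two-dimensional example with Spos = {(0.1, 0.1), (0.6, 0.6)} and Sneg = {(0.9, 0.9), (0.2, 0.8)} on the unit square, the first division yields one positive cell and one mixed cell, and a second division of the mixed cell separates (0.6, 0.6) from (0.9, 0.9).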
[0031]
Finally, the recognition model creation processing means 113 forms the union of the local regions that contain elements of the set Spos and takes it as the characterization region of the image features of the detection target object, thereby creating the recognition model of the detection target object.
[0032]
FIG. 4 is a configuration diagram of the object detection apparatus of the present invention, which is based on the recognition model created as described above. This apparatus configuration, too, is in practice realized on a computer system.
[0033]
First, the unknown image input means 301 inputs an unknown image from a camera, an image recording medium, or the like. The search window scan processing means 311 scans search windows of various scales over the entire unknown image input from the input means 301 and selects a large number of local rectangular regions to be matched.
[0034]
Next, the feature vector extraction processing means 312 extracts a feature vector from each local rectangular region selected by the search window scan processing means 311.
[0035]
Finally, the recognition model matching processing means 313 performs a matching calculation between each extracted feature vector and the recognition model created and stored beforehand with the recognition model creation apparatus described above, determines whether the detection target object appears in the unknown image, and, if it does, detects its region.
[0036]
Next, a specific processing example of the apparatus for detecting an object from an unknown image based on the recognition model of the present invention will be described with reference to FIG.
[0037]
An unknown image X is input, and the search window scan processing means 311 scans rectangular search windows of various scales over the entire image, as shown in FIG. 5, and selects a large number of local rectangular regions to be matched.
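A multi-scale window scan of this kind can be sketched as a generator of candidate rectangles. The text does not specify window sizes, aspect ratios, or step widths, so the square windows, the `sizes` list, and the `stride_ratio` parameter below are all assumptions chosen for illustration, as is the function name `search_windows`.

```python
def search_windows(width, height, sizes, stride_ratio=0.5):
    """Yield (left, top, w, h) for square windows of each size scanned over the image."""
    for size in sizes:
        if size > width or size > height:
            continue                              # window does not fit in the image
        step = max(1, int(size * stride_ratio))   # assumed step: a fraction of the size
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                yield (left, top, size, size)
```

For a 4 x 4 image with sizes 2 and 4 and the default stride ratio, the generator produces nine 2 x 2 windows and one 4 x 4 window. Each window would then be cropped and passed to the same feature extraction used when the model was built.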
[0038]
Next, the feature vector extraction processing means 312 extracts a feature vector from each selected local rectangular region. Note that the feature vector extracted from each local rectangular region of the unknown image is of the same kind as the feature vector used to create the recognition model of the detection target object.
[0039]
Finally, the recognition model matching processing means 313 determines, by a matching calculation against the recognition model of the detection target object, whether the unknown image X includes the detection target object and, if so, detects its local region. That is, if even one of the feature vectors extracted from the selected local rectangular regions of the unknown image X falls within the region R of the feature space, that is, within the characterization region of the image features of the detection target object, the unknown image X is judged to include the detection target object, and the local rectangular region from which the feature vector falling within R was extracted is detected as the region containing the object.
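The matching rule above reduces to a point-in-region test. The sketch below assumes, purely for illustration, that the region R is stored as a list of axis-aligned boxes `(lo, hi)` with a half-open convention `lo <= x < hi`, and that each candidate is a pair of a window and its feature vector; the names `in_region_r` and `detect` are hypothetical.

```python
import numpy as np

def in_region_r(x, boxes):
    """True if feature vector x falls inside at least one positive box of R."""
    return any(bool(np.all(x >= lo) and np.all(x < hi)) for lo, hi in boxes)

def detect(candidates, boxes):
    """candidates: iterable of (window, feature_vector) pairs.

    Returns (found, hits): found is True if any feature vector lies in R,
    and hits lists the windows whose feature vectors lie in R.
    """
    hits = [window for window, feature in candidates
            if in_region_r(np.asarray(feature, dtype=float), boxes)]
    return len(hits) > 0, hits
```

With the half-open convention, a vector lying exactly on the upper boundary of the feature space falls outside every box, so in this sketch the upper bound of the space should be chosen slightly above the largest possible feature value.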
[0040]
In the present invention, some or all of the apparatus shown in FIGS. 1, 2, and 4 can be made to function using a computer. The means in each figure can also be written as a computer program so that they can be executed, and the program can be recorded on a computer-readable recording medium such as an FD (floppy disk), MO, ROM, memory card, CD, DVD, or removable disk and then provided and distributed.
[0041]
[Effects of the invention]
As described above, according to the present invention, when a plurality of images in which the detection target object is captured and a plurality of images in which other objects are captured are supplied, the processing system learns image features of the detection target object relating to color, edges, texture, or a combination of these, and automatically creates a recognition model of the detection target object. Because image features such as color, edges, and texture are local image features of the object, they can be extracted stably and robustly in actual processing. Therefore, in detecting an object in an unknown image, even when the detection target object captured in the unknown image shows some change in appearance, such as a change in orientation or an occlusion, the desired image features can be extracted from a local rectangular region that contains the detection target object and almost no background, whenever such a region is selected for matching. The object in that region can then be recognized and identified on the basis of the recognition model learned by the method described above. Hence the region of the detection target object can be detected in the unknown image.
[Brief description of the drawings]
FIG. 1 is a processing configuration diagram for the recognition model creation of the present invention.
FIG. 2 is a processing configuration diagram for classifying the feature space into the characterization region of the image features of the detection target object and the remaining regions.
FIG. 3 is an example of classifying the feature space into the characterization region of the image features of the detection target object and the remaining regions.
FIG. 4 is a processing configuration diagram for the detection of a target object in an image according to the present invention.
FIG. 5 is a specific processing example of determining whether a target object is included in an unknown image and, if so, detecting its local rectangular region.
[Explanation of symbols]
101: image input means for images in which the detection target object is captured
102: image input means for images in which objects other than the detection target object are captured
111: feature vector extraction processing means
112: hierarchical classification processing means
113: recognition model creation processing means
201: division processing means
202: positive/negative determination processing means
203: end determination processing means
301: unknown image input means
311: search window scan processing means
312: feature vector extraction processing means
313: recognition model matching processing means

Claims

An apparatus for detecting an object in an image, which determines, on the basis of a recognition model created from a plurality of images in which a detection target object is captured, whether the detection target object is included in an input unknown image and, if so, detects the local region in which the object is captured,
wherein the means for creating the recognition model comprises:
means for inputting a plurality of images in which the detection target object is captured;
means for inputting a plurality of images in which objects other than the detection target object are captured;
means for extracting, from each input image of the detection target object and of objects other than the detection target object, a feature vector based on color, edges, texture, or a combination of these;
means for dividing the feature space, according to the extracted feature vectors of the detection target object and of objects other than the detection target object, into 2 to the power of the number of dimensions rectangular regions by bisecting each axis of the feature space;
means for repeating the classification into positive and negative regions and the division of each rectangular region, in which each rectangular region is classified as a positive region if the feature vectors it contains are elements of the detection target object only, as a negative region if the feature vectors it contains are elements of objects other than the detection target object only, and as a negative region if it contains no element of either; and
means for creating the recognition model of the detection target object by generating the union of the rectangular regions that contain elements of the positive region as the characterization region of the image features of the detection target object.

A method for detecting an object in an image, which determines, on the basis of a recognition model created from a plurality of images in which a detection target object is captured, whether the detection target object is included in an input unknown image and, if so, detects the local region in which the object is captured,
wherein the creation of the recognition model comprises:
inputting a plurality of images in which the detection target object is captured;
inputting a plurality of images in which objects other than the detection target object are captured;
extracting, from each input image of the detection target object and of objects other than the detection target object, a feature vector based on color, edges, texture, or a combination of these;
dividing the feature space, according to the extracted feature vectors of the detection target object and of objects other than the detection target object, into 2 to the power of the number of dimensions rectangular regions by bisecting each axis of the feature space;
repeating the classification into positive and negative regions and the division of each rectangular region, in which each rectangular region is classified as a positive region if the feature vectors it contains are elements of the detection target object only, as a negative region if the feature vectors it contains are elements of objects other than the detection target object only, and as a negative region if it contains no element of either; and
creating the recognition model of the detection target object by generating the union of the rectangular regions that contain elements of the positive region as the characterization region of the image features of the detection target object.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the processing procedure of the method for detecting an object in an image according to claim 2.