JP2013037522A

JP2013037522A - Object tracking program and object tracking device

Info

Publication number: JP2013037522A
Application number: JP2011172896A
Authority: JP
Inventors: Hiroyuki Abe; 啓之阿部
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2011-08-08
Filing date: 2011-08-08
Publication date: 2013-02-21

Abstract

[Problem] To appropriately track a main subject in frame images input in time series.
[Solution] A subject tracking program causes a computer to execute: element image generation processing that generates a plurality of element images based on color information and luminance information of each frame image input in time series; binarized element image generation processing that binarizes each of the element images to generate a plurality of binarized element images; AND operation processing that performs a logical AND on the binarized element images; specifying processing that specifies the position of the main subject in each frame image based on labeling processing applied to the binary AND image obtained by the AND operation; reduction processing that reduces the specified range based on a logical AND of the range specified by the specifying processing and a predetermined range; and morphological processing that dilates the range reduced by the reduction processing in the previous frame image to obtain the predetermined range.
[Selected drawing] FIG. 13

Description

The present invention relates to a subject tracking program and a subject tracking device.

A subject tracking technique is known that specifies the position of a main subject in each frame by searching, within frame images input in time series, for the position of an image having a high similarity to a template image (see Patent Document 1).

Patent Document 1: JP 2008-299834 A

In the prior art, the specified position includes background portions other than the main subject, so the position of the subject becomes unclear.

A subject tracking program according to the present invention causes a computer to execute: element image generation processing that generates a plurality of element images based on color information and luminance information of each frame image input in time series; binarized element image generation processing that binarizes each of the element images to generate a plurality of binarized element images; AND operation processing that performs a logical AND on the binarized element images; specifying processing that specifies the position of the main subject in each frame image based on labeling processing applied to the binary AND image obtained by the AND operation; reduction processing that reduces the specified range based on a logical AND of the range specified by the specifying processing and a predetermined range; and morphological processing that dilates the range reduced by the reduction processing in the previous frame image to obtain the predetermined range.
A subject tracking program according to the present invention causes a computer to execute: morphological processing that dilates a binarized image of the main subject in a template; search region determination processing that sets the range enveloping the dilated binarized image as a search region; similarity calculation processing that, while moving a search frame within the search region of each frame image input in time series, calculates the similarity between the image in the search frame at each search frame position and the template; and specifying processing that specifies the position of the search frame with the highest similarity as the position of the main subject.

According to the present invention, a main subject can be appropriately tracked in frame images input in time series.

FIG. 1 is a block diagram illustrating the configuration of the camera in the first embodiment.
FIG. 2 is a flowchart showing the processing of the control device in the first embodiment.
FIG. 3 is a diagram illustrating a binarized Y plane image.
FIG. 4 is a diagram illustrating a binarized Cb plane image.
FIG. 5 is a diagram illustrating a binarized Cr plane image.
FIG. 6 is a diagram illustrating the binarized image after the first AND operation.
FIG. 7 is a diagram illustrating a binarized image according to the prior art.
FIG. 8 is a diagram illustrating the binarized image after dilation processing.
FIG. 9 is a diagram illustrating the binarized image after the second AND operation.
FIG. 10 is a diagram illustrating the target frame displayed on a frame image.
FIG. 11 is a diagram illustrating a range including the main subject.
FIG. 12 is a diagram illustrating a template.
FIG. 13 is a diagram explaining template matching.
FIG. 14 is a diagram illustrating background noise.
FIG. 15 is a diagram illustrating the dilation mask D.
FIG. 16 is a diagram explaining background noise cutting.
FIG. 17 is a diagram explaining the search region defined by the frame enveloping the dilation mask D.
FIG. 18 is a diagram showing the position of the first template in the search region.
FIG. 19 is a diagram showing the position of the last template in the search region.
FIG. 20 is a diagram showing the amount of movement of the template in one matching operation.
FIG. 21 is a flowchart showing the processing of the control device in the second embodiment.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.
(First embodiment)
FIG. 1 is a block diagram illustrating the configuration of an embodiment in which a camera is used as the subject tracking device of the first embodiment. In the first embodiment, a subject is tracked using a labeling technique. The camera 100 includes an operation member 101, a lens 102, an image sensor 103, a control device 104, a memory card slot 105, and a monitor 106. The operation member 101 includes various input members operated by the user, such as a power button, a release switch, a zoom button, a cross key, an enter button, a playback button, and a delete button.

The lens 102 is composed of a plurality of optical lens groups, but is represented by a single lens in FIG. 1. The image sensor 103 is, for example, a CCD or CMOS image sensor, and acquires an image by capturing the subject image formed by the lens 102. The data of the acquired image (image data) is then output to the control device 104. The control device 104 compresses the image data acquired by the image sensor 103 into a predetermined image format, for example the JPEG format, generates an image file in a predetermined format such as Exif (Exchangeable Image File Format for Digital Still Cameras), and outputs it to the memory card slot 105.

The memory card slot 105 is a slot into which a memory card serving as a storage medium is inserted, and it writes and records the image file output from the control device 104 on the memory card. It also reads image files stored on the memory card based on instructions from the control device 104.

The monitor 106 is a liquid crystal monitor (rear monitor) mounted on the back of the camera 100. It displays images stored on the memory card, a setting menu for configuring the camera 100, and the like. In addition, the control device 104 acquires images in time series from the image sensor 103 and outputs them to the monitor 106. As a result, the image of each frame is displayed on the monitor 106 in order at predetermined time intervals. That is, the images are displayed on the monitor 106.

The control device 104 is composed of a CPU, memory, and other peripheral circuits, and functionally includes a calculation unit 104a described later. The memory constituting the control device 104 includes memory for storing programs and memory used as work memory and buffer memory, for example flash memory, RAM, and SDRAM.

FIG. 2 is a flowchart showing the processing of the camera 100 in the first embodiment. The processing shown in FIG. 2 is executed by the calculation unit 104a in the control device 104 as a program that starts when image input from the image sensor 103 begins.

In step S51 of FIG. 2, the calculation unit 104a of the control device 104 reads a frame image and proceeds to step S52. In step S52, the calculation unit 104a extracts an image within a predetermined range from the read frame image based on an instruction from the user, and proceeds to step S53.

In step S53, the calculation unit 104a calculates an image in YCbCr format from the extracted image and generates a Y component image (Y plane image), a Cr component image (Cr plane image), and a Cb component image (Cb plane image). Specifically, using the following equations (1) to (3), the target image expressed in the RGB color system is converted into a luminance image consisting of the luminance component (Y component) in the YCbCr color space and color difference images consisting of the color difference components (Cb component and Cr component).

That is, the calculation unit 104a generates, from the extracted image, a luminance image consisting of the Y component as the Y plane image using the following equation (1). The calculation unit 104a further generates, from the extracted image, a color difference image consisting of the Cb component and a color difference image consisting of the Cr component as the Cb plane image and the Cr plane image, respectively, using the following equations (2) and (3), and proceeds to step S54.
Y = 0.299R + 0.587G + 0.114B ... (1)
Cb = -0.169R - 0.332G + 0.500B ... (2)
Cr = 0.500R - 0.419G - 0.081B ... (3)
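The conversion of equations (1) to (3) can be sketched as follows. This is a minimal illustration only: the function name `rgb_to_ycbcr_planes` and the representation of an image as rows of (R, G, B) tuples are assumptions made for the example and do not appear in the specification.

```python
from typing import List, Tuple

Plane = List[List[float]]


def rgb_to_ycbcr_planes(
    rgb: List[List[Tuple[float, float, float]]]
) -> Tuple[Plane, Plane, Plane]:
    """Split an RGB image (rows of (R, G, B) tuples) into Y, Cb and Cr planes.

    The image layout is an assumption of this sketch; the coefficients are
    those of equations (1) to (3).
    """
    y_plane: Plane = []
    cb_plane: Plane = []
    cr_plane: Plane = []
    for row in rgb:
        y_row, cb_row, cr_row = [], [], []
        for r, g, b in row:
            y_row.append(0.299 * r + 0.587 * g + 0.114 * b)    # equation (1)
            cb_row.append(-0.169 * r - 0.332 * g + 0.500 * b)  # equation (2)
            cr_row.append(0.500 * r - 0.419 * g - 0.081 * b)   # equation (3)
        y_plane.append(y_row)
        cb_plane.append(cb_row)
        cr_plane.append(cr_row)
    return y_plane, cb_plane, cr_plane
```

Each returned plane has the same dimensions as the input, so the three planes can be processed pixel by pixel in the later steps.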

In step S54, the calculation unit 104a performs a binarization operation. For each of the Y plane image, the Cb plane image, and the Cr plane image, the calculation unit 104a calculates a binarized image using, for example, the mean value of that plane as the threshold, and proceeds to step S55. FIG. 3 illustrates a binarized Y plane image. FIG. 4 illustrates a binarized Cb plane image. FIG. 5 illustrates a binarized Cr plane image. The threshold used for binarization may instead be each plane's mean value plus or minus the standard deviation σ multiplied by a coefficient.
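The binarization of step S54 might be sketched as below. The function name `binarize`, the parameter `k` (the coefficient applied to σ), the use of the population standard deviation, and the polarity (values above the threshold become 1) are all assumptions of this example; the specification states only that the mean, optionally offset by a multiple of σ, serves as the threshold.

```python
from statistics import mean, pstdev
from typing import List


def binarize(plane: List[List[float]], k: float = 0.0) -> List[List[int]]:
    """Binarize one plane with threshold = mean + k * sigma.

    k = 0 uses the plain mean; a positive or negative k adds or subtracts
    a multiple of the standard deviation. Pixels above the threshold map
    to 1 (white), all others to 0 (an assumed polarity).
    """
    values = [v for row in plane for v in row]
    threshold = mean(values) + k * pstdev(values)
    return [[1 if v > threshold else 0 for v in row] for row in plane]
```

The same function would be applied separately to the Y, Cb, and Cr planes, each with its own mean.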

In step S55, the calculation unit 104a performs the first AND operation. Specifically, it performs a pixel-by-pixel logical AND of the binarized plane images and proceeds to step S56. FIG. 6 illustrates the binarized image obtained by the first AND operation on the binarized Y plane image, Cb plane image, and Cr plane image.
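A pixel-by-pixel AND of several binarized planes can be written compactly. The helper name `and_images` is hypothetical, and the sketch assumes all inputs are 0/1 images of identical size.

```python
from typing import List

Binary = List[List[int]]


def and_images(*images: Binary) -> Binary:
    """Return the pixel-wise logical AND of equally sized 0/1 images."""
    first = images[0]
    height, width = len(first), len(first[0])
    return [
        [int(all(img[y][x] for img in images)) for x in range(width)]
        for y in range(height)
    ]
```

A pixel stays white only where every plane is white, which is why the result in FIG. 6 contains fewer white pixels than any single plane.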

In step S56, the calculation unit 104a performs a known labeling process on the binarized image after the first AND operation and recognizes each labeled region composed of white pixels as an "island". In the selection process, from among the extracted labeled regions, the island that is closer to the subject is selected by a predetermined evaluation process.
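Labeling and selection might look like the following sketch. The specification does not state the connectivity used or the details of the "predetermined evaluation process", so this example assumes 4-connectivity and, purely as a stand-in criterion, selects the island with the most pixels. The names `label_islands` and `select_largest_island` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]


def label_islands(binary: Binary) -> Tuple[List[List[int]], int]:
    """Label 4-connected white regions; returns (label image, island count)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or labels[sy][sx] != 0:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one island
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def select_largest_island(binary: Binary) -> Binary:
    """Keep only the largest island (an assumed selection criterion)."""
    labels, count = label_islands(binary)
    if count == 0:
        return [[0] * len(row) for row in binary]
    sizes = [0] * (count + 1)
    for row in labels:
        for lab in row:
            sizes[lab] += 1
    best = max(range(1, count + 1), key=lambda i: sizes[i])
    return [[1 if lab == best else 0 for lab in row] for row in labels]
```

Other evaluation criteria, such as distance from the previous subject position, could replace the size comparison without changing the labeling step.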

In step S57, the calculation unit 104a determines whether the current frame is the first frame. If it is not the first frame (that is, it is the second or a later frame), the calculation unit 104a makes a negative determination in step S57 and proceeds to step S58. If it is the first frame, the calculation unit 104a makes a positive determination in step S57 and proceeds to step S59.

In step S58, the calculation unit 104a performs the second AND operation. Specifically, it performs a logical AND of the binarized image after the labeling and selection processing and the binarized image that was dilated in the previous frame (FIG. 8), and proceeds to step S59. FIG. 9 illustrates the binarized image after the second AND operation. In FIG. 9, the frame surrounding the cluster of white pixels is a target frame that indicates the range in which the main subject exists.

In step S59, the calculation unit 104a performs morphological dilation on the binarized image after the second AND operation. FIG. 8 illustrates the binarized image after dilation. When a positive determination is made in step S57, the calculation unit 104a performs morphological dilation on the binarized image of the first frame.
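Steps S57 to S59 can be illustrated together. In this sketch, `dilate` uses a square structuring element of side 2 * radius + 1, which is an assumption because the specification does not state the structuring element; `track_step` is a hypothetical helper name, and passing `None` as the previous dilated image stands for the first frame.

```python
from typing import List, Optional, Tuple

Binary = List[List[int]]


def dilate(binary: Binary, radius: int = 1) -> Binary:
    """Morphological dilation with an assumed square structuring element."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def track_step(selected: Binary,
               prev_dilated: Optional[Binary]) -> Tuple[Binary, Binary]:
    """Return (reduced range for this frame, dilated range for the next frame)."""
    if prev_dilated is None:
        # First frame (step S57 positive): no second AND is performed.
        reduced = selected
    else:
        # Second AND (step S58): keep only pixels inside the previous
        # frame's dilated range, which shrinks the specified range.
        reduced = [[a & b for a, b in zip(row_a, row_b)]
                   for row_a, row_b in zip(selected, prev_dilated)]
    return reduced, dilate(reduced)  # dilation (step S59)
```

The dilated image is kept for the next frame, so the subject may move by up to `radius` pixels between frames and still fall inside the retained range.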

In step S60, the calculation unit 104a displays the target frame on the frame image and proceeds to step S61. FIG. 10 illustrates the target frame 151 on a frame image. The target frame 151 corresponds to the envelope frame surrounding the white region that forms the "island" indicating the main subject in the binarized image. In the first embodiment, the range enclosed by the target frame 151 is the tracking result (the range in which the main subject exists).
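The envelope frame of the white region is its axis-aligned bounding box. A minimal sketch, assuming a hypothetical helper `envelope_frame` that returns inclusive pixel coordinates:

```python
from typing import List, Optional, Tuple


def envelope_frame(binary: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the white pixels, inclusive.

    Returns None when the image contains no white pixel.
    """
    ys = [y for y, row in enumerate(binary) if any(row)]
    xs = [x for row in binary for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

The returned rectangle is what would be drawn over the frame image as the target frame.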

In step S61, the calculation unit 104a determines whether the frames have ended. When image input has ended, the calculation unit 104a makes a positive determination in step S61 and ends the processing of FIG. 2. When image input has not ended, the calculation unit 104a makes a negative determination in step S61 and returns to step S51. On returning to step S51, it reads the next frame and repeats the processing described above.

According to the first embodiment described above, the following operational effects can be obtained.
(1) The camera 100 is equipped with a subject tracking program that causes the calculation unit 104a to execute: element image generation processing that generates a plurality of plane images Y, Cb, and Cr based on color information and luminance information of each frame image input in time series; binarization processing that binarizes each of the plane images Y, Cb, and Cr to generate a plurality of binarized plane images; first AND operation processing that performs a logical AND on the binarized plane images; labeling and selection processing that specifies the position of the main subject in each frame image based on labeling processing applied to the binary AND image obtained by the first AND operation; second AND operation processing that reduces the specified range based on a logical AND of the range specified by the labeling and selection processing and a predetermined range; and dilation processing that dilates the range reduced by the reduction processing in the previous frame image to obtain the predetermined range. The main subject can therefore be appropriately tracked in frame images input in time series.

A comparison between the binarized image after the second AND operation described above (FIG. 9) and the case where the second AND operation is not performed will now be described. FIG. 7 illustrates a binarized image according to the prior art, in which labeling and selection processing are applied to the binarized image after the first AND operation (FIG. 6). Comparing FIG. 7 with FIG. 9, the range in which the main subject exists (that is, the cluster of white pixels) is smaller in the binarized image after the second AND operation (FIG. 9), which shows that the main subject can be tracked accurately. Because the range in which the main subject exists can be made smaller, the likelihood of including background noise and other content besides the main subject is kept low.

(2) In the subject tracking program of (1) above, the element image generation processing generates the luminance image Y and the color difference images Cb and Cr, so element images suitable for tracking the main subject are obtained.

(Second embodiment)
In the second embodiment, the calculation unit 104a performs template matching on each frame image input from the image sensor 103 to specify the region of the frame image in which the main subject appears. The specified region is then tracked from frame to frame. Specifically, the processing is as follows.

FIG. 11 illustrates a frame image displayed on the monitor 106. For example, when the through image of the first frame is displayed on the monitor 106, the user operates the operation member 101 to designate, within the first frame, a range E that includes the subject to be tracked from frame to frame (referred to as the main subject A). The calculation unit 104a extracts the image within the range E designated by the user and stores it in memory as a template T. FIG. 12 illustrates the template T.

As illustrated in FIG. 13, the calculation unit 104a performs a matching operation between the template T and the image data of each frame input in time series from the image sensor 103 (referred to as the target image I). Specifically, it uses the template T to specify the position of the main subject in the target image I.

Because template matching techniques are well known, a detailed description is omitted; the similarity can be calculated, for example, by the residual sum shown in the following equation (4) or by the normalized correlation shown in the following equation (5). When the similarity is calculated by the residual sum of equation (4), a smaller calculated r indicates a higher similarity between the template T and the target image I. When the similarity is calculated by the normalized correlation of equation (5), a larger calculated r indicates a higher similarity between the template T and the target image I.
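Equations (4) and (5) are not reproduced in this text. As an illustration only, the two similarity measures can be sketched using commonly used forms, which are assumptions here and may differ from the equations in the original publication: the residual sum is taken as the sum of absolute differences, and the normalized correlation as the inner product of the two images divided by the product of their norms (without mean subtraction). The function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def residual_sum(template: Image, patch: Image) -> float:
    """Sum of absolute pixel differences; smaller means more similar."""
    return sum(abs(t - p)
               for t_row, p_row in zip(template, patch)
               for t, p in zip(t_row, p_row))


def normalized_correlation(template: Image, patch: Image) -> float:
    """Inner product divided by the product of norms; larger means more similar."""
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    numerator = sum(a * b for a, b in zip(t, p))
    denominator = math.sqrt(sum(a * a for a in t) * sum(b * b for b in p))
    return numerator / denominator if denominator else 0.0
```

With these forms, identical images give a residual sum of 0 and a normalized correlation of 1, matching the direction of similarity described in the text for each measure.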

When image input from the image sensor 103 starts, the calculation unit 104a treats the image of each frame as the target image I and sets a search frame B of the same size as the template T at a predetermined position on the target image I. The calculation unit 104a moves the search frame B within the target image I and, at each position, performs a matching operation between the image in the search frame B and the template T. As a result of the matching operation, the position of the search frame B with the highest similarity to the template T is specified as the subject position.

Note that the calculation unit 104a of the second embodiment performs template matching not on the entire target image I but on a predetermined range (referred to as the search region S2) that includes the range E from which the template T was extracted in the frame image. The search region S2 is made smaller than the target image I because treating the entire screen as the tracking target takes processing time and makes false matches against the background more likely. How the size of the search region S2 is determined is described later.
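Moving the search frame B over a restricted search region can be sketched as an exhaustive scan. The function name `match_template`, the one-pixel step, the region given as inclusive (left, top, right, bottom) coordinates, and the use of the sum of absolute differences as the similarity measure are all assumptions of this example.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def match_template(target: Image, template: Image,
                   region: Optional[Region] = None
                   ) -> Tuple[Optional[Tuple[int, int]], Optional[float]]:
    """Return ((x, y) of the best search frame position, its residual sum).

    The search frame has the template's size and must lie entirely inside
    the region; with region=None the whole target image is searched.
    """
    th, tw = len(template), len(template[0])
    if region is None:
        region = (0, 0, len(target[0]) - 1, len(target) - 1)
    left, top, right, bottom = region
    best_pos, best_score = None, None
    for y in range(top, bottom - th + 2):
        for x in range(left, right - tw + 2):
            score = sum(abs(template[j][i] - target[y + j][x + i])
                        for j in range(th) for i in range(tw))
            if best_score is None or score < best_score:  # smaller is better
                best_pos, best_score = (x, y), score
    return best_pos, best_score
```

Restricting `region` reduces the number of positions evaluated, which illustrates why a smaller search region shortens processing and excludes distant background candidates.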

With a template T that contains a large portion other than the main subject A, as in FIG. 12, background noise is easily mixed in as shown in FIG. 14, depending on the state of the target image I used in the matching operation, and matching accuracy often suffers. For this reason, a dilation mask D is set by applying known morphological (mathematical morphology) dilation to the main subject A, as shown in FIG. 15. That is, the template T is binarized and the binarized image is dilated to obtain the dilation mask D. As shown in FIG. 15, all pixels inside the dilation mask D are set to 1 and all pixels outside it are set to 0. Performing an AND operation between the dilation mask D and the target image I cuts the background noise contained in the target image I. FIG. 16 explains the background noise cut. As FIG. 16 shows, only the part of the target image I inside the dilation mask D (pixel value 1) remains, and the part outside it (pixel value 0), which contains the background noise, is erased.
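Applying the dilation mask D to an image amounts to keeping pixels where the mask is 1 and zeroing the rest. A minimal sketch, assuming a hypothetical helper `apply_mask` and a mask that has the same dimensions as the image and is already aligned with it:

```python
from typing import List


def apply_mask(image: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Keep pixels where mask == 1 and set all other pixels to 0."""
    return [[v if m == 1 else 0 for v, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]
```

Because pixels outside the mask become 0 in both the masked image and any identically masked template, they no longer contribute to a difference-based similarity measure.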

When the dilation mask D is used, the calculation unit 104a sets the frame that envelops the dilation mask D, centered on the main subject A, as the search region S2, as shown in FIG. 17. FIG. 18 shows the position of the first template T in the search region S2, and FIG. 19 shows the position of the last template T in the search region S2. The portions of the template T that lie outside the dilation mask D, that is, the lower right and upper left portions of the template T, can be excluded from the matching operation.

FIG. 20 shows the movement amounts X2 and Y2 of the template T in one matching operation. In the second embodiment, which uses the dilation mask D, the size of the search region S2 is uniquely determined. Experiments have also shown that the search region S2 obtained with the dilation mask D is often smaller than the search region used in conventional matching operations. Introducing the dilation mask D in this way makes it possible not only to reduce background noise but also to shorten the matching operation.

The calculation unit 104a executes the processing described above on each input frame image. Among the main subject positions specified by the matching operation, the calculation unit 104a takes the position with the highest similarity at the time of matching as the subject position. It then displays a target frame with this subject position as the tracking target, so that the subject can be tracked from frame to frame.

FIG. 21 is a flowchart showing the processing of the camera 100 in the second embodiment. The processing shown in FIG. 21 is executed by the calculation unit 104a of the control device 104 as a program that starts when image input from the image sensor 103 begins.

In step S11 of FIG. 21, the calculation unit 104a reads a frame image and proceeds to step S12. In step S12, the calculation unit 104a extracts an image within a predetermined range from the frame image based on an instruction from the user, and proceeds to step S13.

In step S13, the calculation unit 104a determines whether the current frame is the first frame. If it is not the first frame (that is, it is the second or a later frame), the calculation unit 104a makes a negative determination in step S13 and proceeds to step S14. If it is the first frame, the calculation unit 104a makes a positive determination in step S13 and proceeds to step S18.

In step S14, the calculation unit 104a performs a binarization operation. The calculation unit 104a calculates a binarized image by binarizing the image within the predetermined range, and proceeds to step S15. In step S15, the calculation unit 104a performs morphological dilation on the binarized image, sets the dilation mask D, and proceeds to step S16. In step S16, the calculation unit 104a sets the frame enveloping the dilation mask D as the search region S2 and proceeds to step S17. In step S17, the calculation unit 104a performs the matching operation and proceeds to step S18.

In step S18, the calculation unit 104a displays, on the frame image, a target frame that indicates the range in which the main subject exists, and proceeds to step S19. In step S19, the calculation unit 104a determines whether the frames have ended. When image input has ended, the calculation unit 104a makes a positive determination in step S19 and ends the processing of FIG. 21. When image input has not ended, the calculation unit 104a makes a negative determination in step S19 and returns to step S11. On returning to step S11, it reads the next frame and repeats the processing described above.

According to the second embodiment described above, the following operational effects can be obtained.
(1) The camera 100 is equipped with a subject tracking program that causes the control device 104 to execute: morphological processing that dilates the binarized image of the main subject A in the template T; search region determination processing that sets the range enveloping the dilated binarized image as the search region S2; similarity calculation processing that, while moving the search frame B within the search region S2 of each frame image input in time series, calculates the similarity between the image in the search frame B at each search frame position and the template T; and specifying processing that specifies the position of the search frame B with the highest similarity as the position of the main subject. The main subject can therefore be appropriately tracked in frame images input in time series.

(2) The subject tracking program of (1) above further includes dilation mask processing that extracts an image from the frame image based on the dilated binarized image. The similarity calculation processing moves the search frame B within the search area S2 of each frame image input in time series and calculates, at each search frame position, the similarity between the image in the search frame B after the dilation mask processing and the template T. Therefore, even if the frame image contains background noise other than the main subject A, that noise is cut and the main subject A can be tracked appropriately.
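The matching described in effects (1) and (2) can be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: the similarity measure is taken to be the sum of absolute differences (SAD, where a lower value means a higher similarity), masked-out pixels are set to 0, and the names `apply_mask`, `sad`, and `track` are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def apply_mask(frame: Image, mask: Image) -> Image:
    """Dilation mask processing (sketch): keep pixels under the mask and
    set every other pixel to 0 (the fill value is an assumption)."""
    return [[v if m else 0 for v, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]


def sad(image: Image, template: Image, top: int, left: int) -> int:
    """Sum of absolute differences between the template and the window whose
    top-left corner is (top, left). SAD is an assumed similarity measure."""
    return sum(abs(image[top + dy][left + dx] - t)
               for dy, trow in enumerate(template)
               for dx, t in enumerate(trow))


def track(frame: Image, template: Image, mask: Image, area: Box) -> Tuple[int, int]:
    """Move the search frame over every position inside the search area and
    return the top-left corner with the lowest SAD (highest similarity)."""
    masked = apply_mask(frame, mask)
    th, tw = len(template), len(template[0])
    top, left, bottom, right = area
    best_pos: Optional[Tuple[int, int]] = None
    best_score: Optional[int] = None
    for y in range(top, bottom - th + 2):
        for x in range(left, right - tw + 2):
            score = sad(masked, template, y, x)
            if best_score is None or score < best_score:
                best_pos, best_score = (y, x), score
    if best_pos is None:
        raise ValueError("search area is smaller than the template")
    return best_pos


# 6x6 frame: a 2x2 subject at rows 2-3, columns 3-4, on a dark background.
frame = [[10] * 6 for _ in range(6)]
frame[2][3], frame[2][4] = 200, 180
frame[3][3], frame[3][4] = 190, 210
frame[1][2] = 255  # bright background pixel inside the search area

# Mask covering rows 1-4, columns 2-5, except the noisy corner pixel.
mask = [[1 if 1 <= y <= 4 and 2 <= x <= 5 else 0 for x in range(6)]
        for y in range(6)]
mask[1][2] = 0

template = [[200, 180], [190, 210]]
position = track(frame, template, mask, (1, 2, 4, 5))
```

Restricting the loop to the search area bounds the number of candidate positions, and masking first means that pixels outside the dilated subject shape cannot contribute spurious matches.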

(Modification)
In the embodiments described above, the control device 104 of the camera 100 performs the subject tracking processing, but the subject tracking processing may instead be performed by a computer. A subject tracking device is configured by causing a computer to execute a program that performs the processing based on the flowcharts illustrated in FIGS. 2 and 21. When the program is loaded onto a computer for use, it is first loaded into the computer's data storage device and then executed.

The program may be loaded onto the computer by setting a storage medium that stores the program, such as a CD-ROM, in the computer, or it may be loaded via a communication line such as a network. When a communication line is used, the program is stored in advance in, for example, a storage device of a server (computer) connected to the communication line. The program can be supplied as a computer program product in various forms, such as on a storage medium or via a communication line.

The above description is merely an example, and the present invention is in no way limited to the configurations of the embodiments described above.

DESCRIPTION OF SYMBOLS
100 ... Camera
101 ... Operation member
104 ... Control device
104a ... Calculation unit
106 ... Monitor

Claims

A subject tracking program that causes a computer to execute:
element image generation processing that generates a plurality of element images based on color information and luminance information of each frame image input in time series;
binarized element image generation processing that binarizes each of the plurality of element images to generate a plurality of binarized element images;
AND operation processing that performs a logical AND operation on the plurality of binarized element images;
specifying processing that specifies the position of the main subject in each frame image based on labeling processing applied to the binary AND image obtained by the AND operation;
reduction processing that reduces the specified range based on a logical AND operation between the range specified by the specifying processing and a predetermined range; and
morphological processing that dilates the range reduced by the reduction processing in the previous frame image to obtain the predetermined range.

The subject tracking program according to claim 1, wherein
the element image generation processing generates a luminance image and a color difference image.

A subject tracking program that causes a computer to execute:
morphological processing that dilates the binarized image of the main subject in a template;
search area determination processing that sets the range enveloping the binarized image dilated by the morphological processing as a search area;
similarity calculation processing that moves a search frame within the search area of each frame image input in time series and calculates, at each search frame position, the similarity between the image in the search frame and the template; and
specifying processing that specifies the position of the search frame with the highest similarity as the position of the main subject.

The subject tracking program according to claim 3, further comprising
dilation mask processing that extracts an image from the frame image based on the dilated binarized image, wherein
the similarity calculation processing moves the search frame within the search area of each frame image input in time series and calculates, at each search frame position, the similarity between the image in the search frame after the dilation mask processing and the template.

A subject tracking device comprising execution means for executing the subject tracking program according to claim 1.
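For orientation only, the processing recited in the first program claim (binarized element images, AND operation, labeling, reduction against a range dilated from the previous frame) can be sketched as below. Everything specific here is an assumption made for illustration and is not taken from the claims: the range-based binarization rule, 4-connected labeling, choosing the largest labeled region as the main subject, treating ranges as pixel masks, and all function names.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]


def binarize_range(plane: Image, lo: int, hi: int) -> Image:
    """1 where lo <= value <= hi, else 0 (hypothetical binarization rule)."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in plane]


def logical_and(a: Image, b: Image) -> Image:
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def label(mask: Image) -> Tuple[Image, int]:
    """4-connected component labeling by breadth-first search."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def largest_region(mask: Image) -> Image:
    """Keep only the largest labeled region (assumed selection criterion)."""
    labels, count = label(mask)
    sizes = [0] * (count + 1)
    for row in labels:
        for v in row:
            if v:
                sizes[v] += 1
    best = max(range(1, count + 1), key=lambda k: sizes[k], default=0)
    return [[1 if v and v == best else 0 for v in row] for row in labels]


def dilate(mask: Image, radius: int = 1) -> Image:
    """Morphological dilation with a square structuring element (assumed)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def plane(base: int, value: int, pixels: List[Tuple[int, int]]) -> Image:
    """Build a 5x6 test plane filled with `base`, with `value` at `pixels`."""
    img = [[base] * 6 for _ in range(5)]
    for y, x in pixels:
        img[y][x] = value
    return img


subject = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)]
stray = [(4, 5)]
y_img = plane(50, 200, subject + stray + [(0, 5)])   # luminance element image
cb_img = plane(128, 90, subject + stray)             # color difference images
cr_img = plane(128, 170, subject + stray)

and_image = logical_and(
    logical_and(binarize_range(y_img, 150, 255), binarize_range(cb_img, 60, 110)),
    binarize_range(cr_img, 150, 200),
)
specified = largest_region(and_image)

# Range reduced in the previous frame, dilated to give the predetermined range.
prev_reduced = plane(0, 1, [(1, 1), (2, 1)])
predetermined = dilate(prev_reduced, radius=1)
reduced = logical_and(specified, predetermined)
```

In this toy input, the pixel that matches only in luminance is removed by the AND operation, the isolated stray pixel is discarded by the region selection, and the AND with the predetermined range trims the part of the subject region that lies outside the range carried over from the previous frame.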