JP2017229061A

JP2017229061A - Image processing apparatus, control method therefor, and imaging apparatus

Info

Publication number: JP2017229061A
Application number: JP2017084763A
Authority: JP
Inventors: 良介辻; Ryosuke Tsuji
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-06-21
Filing date: 2017-04-21
Publication date: 2017-12-28
Anticipated expiration: 2037-04-21
Also published as: JP7223079B2; JP2021176243A; JP6924064B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus capable of accurate region tracking, and a control method therefor.
SOLUTION: Based on a designated position, if distance information satisfying a reliability condition has been obtained for a region including the designated position, the image region from which the feature amount used for region tracking is extracted is specified using the distance information; otherwise, it is specified without using the distance information.
SELECTED DRAWING: Figure 9

Description

The present invention relates to an image processing apparatus, a control method therefor, and an imaging apparatus, and more particularly to a technique for tracking a specific region across images.

The movement of a region over time can be detected by searching one or more images captured after a time t for a region similar to a region in the image captured at time t. For example, detecting the movement of the region of a specific subject (e.g., a face region) during movie shooting makes it possible to keep that subject in focus, or to change the exposure conditions dynamically so that the subject stays properly exposed (Patent Document 1).

Patent Document 1: JP 2005-318554 A

A technique called matching is generally used to search for a region similar to a specific image region. In template matching, for example, the pixel pattern of an image region is set as the feature amount (template), a similarity (e.g., a correlation amount) is calculated at each position while the template is shifted relative to the search area of another image, and the position with the highest similarity is detected. If the similarity at the detected position is judged to be sufficiently high, an image region with the same pattern as the template is estimated to exist at that position.

The accuracy of a matching-based search depends heavily on how the feature amount used for matching is set. For example, when tracking the face region of a specific person, setting as the feature amount the pixel pattern of a region that contains only part of the face makes false detection likely, because the face contributes too few features. Conversely, setting as the feature amount a pixel pattern that contains the entire face but a large proportion of its surroundings (e.g., the background) increases the contribution of background similarity, which also makes false detection likely.

The present invention has been made in view of these problems of the prior art, and an object thereof is to provide an image processing apparatus capable of accurate region tracking, and a control method therefor.

The above object is achieved by an image processing apparatus comprising: specifying means for specifying, based on a designated position, an image region in an image from which a feature amount is to be extracted; extraction means for extracting the feature amount from the image region; and search means for searching a plurality of time-series images for a region similar to the image region using the feature amount, wherein the specifying means specifies the image region using distance information if distance information satisfying a reliability condition has been obtained for a region including the designated position, and specifies it without using distance information otherwise.
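The branching behaviour of the specifying means can be illustrated with a minimal sketch. It is not the embodiment's procedure: the reliability test (reliability at the designated pixel compared with a threshold), the region-growing rule (4-connected reliable pixels whose distance lies within a tolerance of the designated pixel's distance), the fixed-size fallback box, and the names `specify_region`, `min_reliability`, `depth_tolerance` and `default_size` are all hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def specify_region(pos: Tuple[int, int], size: Tuple[int, int],
                   distance: Optional[List[List[float]]],
                   reliability: Optional[List[List[float]]],
                   min_reliability: float = 0.5,   # hypothetical threshold
                   depth_tolerance: float = 0.3,   # hypothetical tolerance
                   default_size: int = 32) -> Box:  # hypothetical fallback size
    """Hypothetical specifying step: use distance information only if reliable."""
    x0, y0 = pos
    width, height = size
    usable = (distance is not None and reliability is not None
              and reliability[y0][x0] >= min_reliability)
    if not usable:
        # Without distance information: fixed-size box centred on pos, clipped.
        half = default_size // 2
        left, top = max(0, x0 - half), max(0, y0 - half)
        right, bottom = min(width, x0 + half), min(height, y0 + half)
        return (left, top, right - left, bottom - top)
    # With distance information: grow a 4-connected region of reliable pixels
    # whose distance is close to that of the designated pixel.
    ref = distance[y0][x0]
    seen = {(x0, y0)}
    queue = deque([(x0, y0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen
                    and reliability[ny][nx] >= min_reliability
                    and abs(distance[ny][nx] - ref) <= depth_tolerance):
                seen.add((nx, ny))
                queue.append((nx, ny))
    xs = [p[0] for p in seen]
    ys = [p[1] for p in seen]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

With a near object on a distant background, the distance-based branch returns the object's bounding box, while missing or unreliable distance data falls back to the fixed box.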

According to the present invention, it is possible to provide an image processing apparatus capable of accurate region tracking, and a control method therefor.

FIG. 1 is a block diagram showing an example functional configuration of a digital camera according to an embodiment.
FIG. 2 shows an example pixel arrangement of the image sensor in FIG. 1.
FIG. 3 is a block diagram showing an example functional configuration of the tracking unit in FIG. 1.
FIG. 4 relates to template matching in the first embodiment.
FIG. 5 relates to histogram matching in the first embodiment.
FIG. 6 relates to a method of acquiring the subject distance in the first embodiment.
FIG. 7 schematically shows a method of specifying the subject region in the first embodiment.
FIG. 8 is a flowchart of imaging processing in the first embodiment.
FIG. 9 is a flowchart of subject tracking processing in the first embodiment.
FIG. 10 is a flowchart of imaging processing in the second embodiment.
FIG. 11 is a flowchart of subject tracking processing in the second embodiment.
FIG. 12 schematically shows a method of determining whether to update the feature amount in the second embodiment.

Hereinafter, a digital camera as an example of an image processing apparatus according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention can also be implemented in electronic devices without an imaging function. Examples of electronic devices in which the present invention can be implemented include, but are not limited to, digital cameras, mobile phones, tablet terminals, game consoles, personal computers, navigation systems, home appliances, and robots.

● <First embodiment>
(Configuration of imaging apparatus)
FIG. 1 is a block diagram showing an example functional configuration of a digital camera 100 according to the first embodiment of the present invention. The digital camera 100 can capture and record moving images and still images. The functional blocks in the digital camera 100 are communicably connected to one another via a bus 160. The operation of the digital camera 100 is realized by a main control unit 151 (central processing unit) executing a program to control each functional block.

The digital camera 100 of the present embodiment can acquire distance information for a captured subject. The distance information may be, for example, a distance image in which each pixel value represents the distance of the corresponding subject. The distance information may be acquired by any method; in the present embodiment, it is acquired based on parallax images. The method of acquiring the parallax images is likewise not limited; in the present embodiment, they are acquired using an image sensor 141 having a plurality of photoelectric conversion elements that share one microlens. Alternatively, parallax images may be acquired by configuring the digital camera 100 as a multi-lens camera such as a stereo camera, or parallax image data captured by any method may be acquired from a storage medium or an external device.

The digital camera 100 also has a tracking unit 161 that implements a subject tracking function by continuously searching for a region similar to a designated subject region. The tracking unit 161 generates distance information from the parallax images and uses it in the search for the subject region. The configuration and operation of the tracking unit 161 are described in detail later.

The photographic lens 101 (lens unit) has a fixed first-group lens 102, a zoom lens 111, an aperture 103, a fixed third-group lens 121, a focus lens 131, a zoom motor 112, an aperture motor 104, and a focus motor 132. The fixed first-group lens 102, zoom lens 111, aperture 103, fixed third-group lens 121, and focus lens 131 constitute the imaging optical system. Although the lenses 102, 111, 121, and 131 are each illustrated as a single lens for convenience, each may consist of a plurality of lenses. The photographic lens 101 may also be configured as a detachable interchangeable lens.

The aperture control unit 105 controls the operation of the aperture motor 104, which drives the aperture 103, to change the aperture diameter of the aperture 103.
The zoom control unit 113 controls the operation of the zoom motor 112, which drives the zoom lens 111, to change the focal length (angle of view) of the photographic lens 101.

The focus control unit 133 calculates the defocus amount and defocus direction of the photographic lens 101 based on the phase difference between a pair of focus detection signals (A image and B image) obtained from the image sensor 141. It then converts the defocus amount and direction into a drive amount and drive direction for the focus motor 132 and, based on these, controls the focus motor 132 to drive the focus lens 131, thereby controlling the focus state of the photographic lens 101. In this way, the focus control unit 133 performs phase-difference-detection autofocus (AF). Alternatively, the focus control unit 133 may perform contrast-detection AF based on a contrast evaluation value obtained from the image signal output by the image sensor 141.
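The phase-difference step can be sketched by finding the relative shift between one line of the A image and the B image that minimizes their mean absolute difference, then scaling it into a defocus amount. This is an illustrative assumption, not the camera's algorithm: it uses integer shifts only (real AF typically interpolates to sub-pixel precision), and the conversion coefficient `k`, its sign convention, and the names `image_shift` and `defocus_from_shift` are hypothetical (in practice the coefficient depends on the optical system).

```python
from typing import Sequence, Tuple


def image_shift(a: Sequence[float], b: Sequence[float], max_shift: int) -> int:
    """Return the integer shift s minimizing mean |a[i] - b[i + s]| over the overlap."""
    n = len(a)  # a and b are assumed to have equal length
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)  # indices valid in both signals
        if hi <= lo:
            continue
        cost = sum(abs(a[i] - b[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def defocus_from_shift(shift: int, k: float) -> Tuple[float, int]:
    """Convert an image shift into (defocus amount, direction) via a hypothetical k."""
    d = k * shift
    return abs(d), (1 if d > 0 else -1 if d < 0 else 0)
```

For example, if the B line is the A line displaced by two pixels, `image_shift` returns 2, and `defocus_from_shift(2, 0.5)` gives `(1.0, 1)`.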

The subject image formed by the photographic lens 101 on the imaging surface of the image sensor 141 is converted into an electrical signal (image signal) by the photoelectric conversion elements of the pixels arranged in the image sensor 141. In the present embodiment, the image sensor 141 has m pixels horizontally and n pixels vertically (m and n each being plural) arranged in a matrix, and each pixel is provided with two photoelectric conversion elements (photoelectric conversion regions). Signal readout from the image sensor 141 is controlled by the sensor control unit 143 in accordance with instructions from the main control unit 151.

(Pixel arrangement of the image sensor 141)
FIG. 2 schematically shows an example pixel arrangement of the image sensor 141, representatively showing a region of 16 pixels, 4 horizontally by 4 vertically. Each pixel of the image sensor 141 is provided with one microlens 210 and two photoelectric conversion elements 201 and 202 that receive light through the microlens 210. In the example of FIG. 2, the two photoelectric conversion elements 201 and 202 are arranged horizontally, so each pixel divides the pupil region of the photographic lens 101 in the horizontal direction.

The image sensor 141 is also provided with a primary-color Bayer-array color filter whose repeating unit is four pixels (2 horizontally by 2 vertically). In this color filter, rows in which R (red) and G (green) repeat horizontally alternate with rows in which G and B (blue) repeat horizontally. A pixel 200R provided with an R (red) filter is called a red pixel, a pixel 200G provided with a G (green) filter a green pixel, and a pixel 200B provided with a B (blue) filter a blue pixel.
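The Bayer layout can be expressed as a lookup from pixel coordinates to filter color. The text does not state the phase of the pattern, so this sketch assumes that row 0 starts with R (an "RGGB" phase); the function name `bayer_color` is hypothetical. Both photoelectric conversion elements of a pixel share the same color filter.

```python
def bayer_color(x: int, y: int) -> str:
    """Filter color at pixel (x, y), assuming R G R G ... on even rows and G B G B ... on odd rows."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"
```

For instance, `[bayer_color(x, 0) for x in range(4)]` gives `['R', 'G', 'R', 'G']`, and the row below gives `['G', 'B', 'G', 'B']`.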

In the following description, the first photoelectric conversion element 201 may be called the A pixel, the second photoelectric conversion element 202 the B pixel, the signal read from the A pixel the A signal, and the signal read from the B pixel the B signal. An image composed of the A signals obtained from the pixels in a given region and an image composed of the B signals form a pair of parallax images. The digital camera 100 can therefore generate two parallax images from a single exposure. Adding the A and B signals of each pixel yields a signal equivalent to that of an ordinary pixel without a pupil division function. Hereinafter, this sum may be called the A+B signal, and an image composed of A+B signals a captured image.

Thus, three types of signal can be read out from one pixel: the output of the first photoelectric conversion element 201 (A signal), the output of the second photoelectric conversion element 202 (B signal), and the sum of the outputs of the first and second photoelectric conversion elements 201 and 202 (A+B signal). Instead of being read out, the A signal (B signal) may be obtained by subtracting the B signal (A signal) from the A+B signal.
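The relationship between the three signals can be shown with per-pixel arithmetic on small images. This is a minimal sketch that treats the A and B parallax images as 2-D lists of integers and ignores noise and saturation; the names `add_ab` and `subtract` are hypothetical.

```python
from typing import List

Image = List[List[int]]


def add_ab(a: Image, b: Image) -> Image:
    """A+B signal per pixel: equivalent to an ordinary (non pupil-divided) pixel."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract(ab: Image, other: Image) -> Image:
    """Recover one signal from the sum, e.g. A = (A+B) - B."""
    return [[s - po for s, po in zip(rs, ro)] for rs, ro in zip(ab, other)]
```

Reading out A+B and B, then computing `subtract(captured, b)`, yields the A image without a separate A readout, so one exposure still provides both parallax images and the captured image.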

The photoelectric conversion elements may instead be divided in the vertical direction, and pixels with different division directions may be mixed. The photoelectric conversion elements may also be divided in both the vertical and horizontal directions, or into three or more parts in the same direction.

Returning to FIG. 1, the image signal read from the image sensor 141 is supplied to the signal processing unit 142. The signal processing unit 142 applies signal processing such as noise reduction, A/D conversion, and automatic gain control to the image signal and outputs the result to the sensor control unit 143. The sensor control unit 143 stores the image signal received from the signal processing unit 142 in a RAM (random access memory) 154.

The image processing unit 152 applies predetermined image processing to the image data stored in the RAM 154. This includes, but is not limited to, so-called development processing such as white balance adjustment, color interpolation (demosaicing), and gamma correction, as well as signal format conversion, scaling, subject detection, and subject recognition. The image processing unit 152 can also generate information such as subject luminance for use in automatic exposure control (AE). The results of subject detection and subject recognition may be used in other image processing (e.g., white balance adjustment). When contrast-detection AF is performed, the image processing unit 152 may generate the AF evaluation value. The image processing unit 152 stores the processed image data in the RAM 154.

When recording image data stored in the RAM 154, the main control unit 151 generates a data file in the recording format, for example by adding a predetermined header to the processed image data. At this time, the main control unit 151 has the compression/decompression unit 153 encode the image data as necessary to reduce the amount of information. The main control unit 151 records the generated data file on a recording medium 157 such as a memory card.

When displaying image data stored in the RAM 154, the main control unit 151 has the image processing unit 152 scale the image data to fit the display size of the display unit 150, and then writes it to the area of the RAM 154 used as video memory (VRAM area).
The display unit 150 reads the display image data from the VRAM area of the RAM 154 and displays it on a display device such as an LCD or an organic EL display.

During moving-image shooting (in the shooting standby state or during moving-image recording), the digital camera 100 of the present embodiment makes the display unit 150 function as an electronic viewfinder (EVF) by displaying the captured moving image on it in real time. The moving image displayed while the display unit 150 functions as an EVF, and its frame images, are called a live view image or through image.
In addition, after capturing a still image, the digital camera 100 displays the just-captured still image on the display unit 150 for a fixed time so that the user can check the result. These display operations are also realized under the control of the main control unit 151.

The operation unit 156 comprises switches, buttons, keys, a touch panel, and the like with which the user inputs instructions to the digital camera 100. The main control unit 151 detects input made via the operation unit 156 through the bus 160 and controls each unit to perform the corresponding operation.

The main control unit 151 has one or more programmable processors such as a CPU or MPU, and realizes the functions of the digital camera 100 by, for example, loading a program stored in the storage unit 155 into the RAM 154 and executing it to control each unit. The main control unit 151 also executes AE processing that automatically determines the exposure conditions (shutter speed or accumulation time, aperture value, and sensitivity) based on subject luminance information, which can be obtained from, for example, the image processing unit 152. The main control unit 151 can also determine the exposure conditions with reference to the region of a specific subject, such as a person's face.

During moving-image shooting, the main control unit 151 keeps the aperture fixed and controls exposure through the electronic shutter speed (accumulation time) and the gain. The main control unit 151 notifies the sensor control unit 143 of the determined accumulation time and gain, and the sensor control unit 143 controls the operation of the image sensor 141 so that shooting is performed under the notified exposure conditions.

In the present embodiment, a single exposure yields three images in total: a pair of parallax images and a captured image. The image processing unit 152 processes each of them and writes it to the RAM 154. The tracking unit 161 obtains subject distance information from the pair of parallax images and uses it in subject tracking processing on the captured image. When subject tracking succeeds, the tracking unit 161 outputs information on the position of the subject region in the captured image and information on its reliability.

The subject tracking result can be used, for example, to set the focus detection area automatically, which realizes a tracking AF function for a specific subject region. AE processing can also be performed based on the luminance information of the focus detection area, and image processing (e.g., gamma correction or white balance adjustment) can be performed based on its pixel values. The main control unit 151 may also superimpose on the displayed image an indicator of the current subject region position (e.g., a rectangular frame surrounding the region).

The battery 159 is managed by the power management unit 158 and supplies power to the entire digital camera 100.
The storage unit 155 stores the programs executed by the main control unit 151, setting values needed to execute them, GUI data, user settings, and the like. For example, when a transition from the power-OFF state to the power-ON state is instructed through the operation unit 156, a program stored in the storage unit 155 is loaded into part of the RAM 154 and executed by the main control unit 151.

(Configuration and operation of the tracking unit)
FIG. 3 is a block diagram showing an example functional configuration of the tracking unit 161. The tracking unit 161 has a matching unit 1610, a feature extraction unit 1620, and a distance map generation unit 1630. The tracking unit 161 specifies, from a designated position, the image region to be tracked (subject region) and extracts a feature amount from it. Then, in each supplied captured image, it uses the extracted feature amount to search for a region highly similar to the subject region of the previous frame, which becomes the new subject region. The tracking unit 161 also acquires distance information from a pair of parallax images and uses it to specify the subject region.

The matching unit 1610 searches the supplied image for the subject region using the feature amount of the subject region supplied from the feature extraction unit 1620. The method of searching for a region based on image feature amounts is not particularly limited; here, the matching unit 1610 uses at least one of template matching and histogram matching.

Template matching and histogram matching are described below.
Template matching is a technique in which a pixel pattern is set as a template and the image is searched for the region most similar to it. As the similarity between the template and an image region, a correlation amount such as the sum of absolute differences between corresponding pixels can be used.

FIG. 4(a) schematically shows a template 301 and an example configuration 302 thereof. When template matching is performed, the feature extraction unit 1620 supplies the matching unit 1610 with information on the colors (hues) used for the template as the feature amount. Here, the template 301 is W pixels wide and H pixels high, and has been binarized by replacing pixels that match the feature amount and pixels that do not with two different fixed values. The matching unit 1610 performs pattern matching using the binarized template 301.
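The binarization can be sketched as a per-pixel membership test against the feature colors. This assumes hues have already been quantized to integer bins and that the two fixed values are 1 (match) and 0 (no match); both choices and the name `binarize_by_hue` are hypothetical. The same function would be applied to the search area so that template and search area share one encoding.

```python
from typing import Iterable, List


def binarize_by_hue(hue: List[List[int]], feature_hues: Iterable[int],
                    match_value: int = 1, other_value: int = 0) -> List[List[int]]:
    """Replace pixels whose quantized hue is a feature color with match_value, others with other_value."""
    targets = set(feature_hues)
    return [[match_value if h in targets else other_value for h in row] for row in hue]
```

For example, with feature hues `{3, 5}`, the hue patch `[[0, 3], [5, 3]]` becomes `[[0, 1], [1, 1]]`.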

Accordingly, expressing coordinates in the template 301 in the coordinate system shown in FIG. 4(a), the feature amount T(i, j) of the template 301 used for pattern matching can be expressed by Equation (1):

$$T(i, j) = \{T(0, 0), T(1, 0), \ldots, T(W-1, H-1)\} \qquad (1)$$

FIG. 4(b) shows an example of a search area 303 for the subject region and its configuration 305. The search area 303 is the range of the image within which pattern matching is performed, and may be all or part of the image. Coordinates in the search area 303 are denoted (x, y). The search area 303 is likewise binarized by replacing pixels that match the feature amount and pixels that do not with two different fixed values. A region 304 has the same size as the template 301 (W pixels horizontally, H pixels vertically) and is the region whose similarity to the template 301 is calculated.

Expressing coordinates in the region 304 in the coordinate system shown in FIG. 4(b), the feature amount S(i, j) of the region 304 used for pattern matching can be expressed by Equation (2):

$$S(i, j) = \{S(0, 0), S(1, 0), \ldots, S(W-1, H-1)\} \qquad (2)$$

As the evaluation value V(x, y) representing the similarity between the template 301 and the region 304, the matching unit 1610 calculates the sum of absolute differences (SAD) given by Equation (3):

$$V(x, y) = \sum_{j=0}^{H-1} \sum_{i=0}^{W-1} \left| T(i, j) - S(i, j) \right| \qquad (3)$$

Here, V(x, y) is the evaluation value when the top-left vertex of the region 304 is at coordinates (x, y).

The matching unit 1610 calculates the evaluation value V(x, y) at each position while shifting the region 304 one pixel at a time to the right, starting from the top-left of the search area 303; when x reaches (X-1)-(W-1), where X is the number of horizontal pixels of the search area 303, x is reset to 0 and the region is shifted down by one pixel. The coordinates (x, y) at which V(x, y) is smallest indicate the position of the region 304 whose pixel pattern is most similar to the template 301. The matching unit 1610 detects the region 304 with the minimum V(x, y) as the subject region present in the search area. When the reliability of the search result is low (for example, when the minimum of V(x, y) exceeds a threshold), it may be determined that the subject region was not found.
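The raster scan and Equation (3) can be sketched together. This is a minimal illustration on binarized 2-D lists, not the device's implementation; the optional `max_sad` threshold that turns a poor best match into "not found", and the names `sad` and `match_template`, are hypothetical. Ties are resolved in favor of the first position in scan order.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def sad(template: Grid, search: Grid, x: int, y: int) -> int:
    """Equation (3): sum of |T(i, j) - S(i, j)| with region 304's top-left at (x, y)."""
    h, w = len(template), len(template[0])
    return sum(abs(template[j][i] - search[y + j][x + i])
               for j in range(h) for i in range(w))


def match_template(template: Grid, search: Grid,
                   max_sad: Optional[int] = None) -> Optional[Tuple[int, int, int]]:
    """Raster-scan the search area; return (x, y, V) of the minimum SAD, or None."""
    h, w = len(template), len(template[0])
    rows, cols = len(search), len(search[0])  # cols corresponds to X
    best: Optional[Tuple[int, int, int]] = None
    for y in range(rows - h + 1):        # then move down one row
        for x in range(cols - w + 1):    # x = 0 .. (X-1)-(W-1)
            v = sad(template, search, x, y)
            if best is None or v < best[2]:
                best = (x, y, v)
    if best is None or (max_sad is not None and best[2] > max_sad):
        return None  # subject region treated as not found
    return best
```

A 2x2 block of ones placed at (2, 1) in a 5x5 zero image is found at `(2, 1, 0)` by an all-ones 2x2 template.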

Here, the example uses for pattern matching a template binarized according to whether or not each pixel has one of the colors corresponding to the feature quantity, but a multi-valued template graded according to each of the plurality of colors included in the feature quantity may also be used. Feature quantities based on lightness or saturation may also be used instead of color feature quantities. Likewise, although SAD was used as the similarity evaluation value, other evaluation values such as normalized cross-correlation (NCC) or zero-mean normalized cross-correlation (ZNCC) may be used.
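As an illustration of one of the alternative evaluation values mentioned above, a ZNCC score for two equal-size patches might be computed as below. This is a sketch assuming the patches are flattened into lists; the name `zncc` is hypothetical. Unlike SAD, a higher ZNCC means more similar, so a search using it would keep the maximum.

```python
import math
from typing import Sequence


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation of two flattened patches.

    Returns a value in [-1, 1]; 1.0 means identical up to gain and offset.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]  # remove each patch's mean brightness
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat patch: correlation undefined, report 0


print(zncc([1, 2, 3], [11, 12, 13]))  # 1.0 (a uniform brightness offset does not matter)
```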

Next, histogram matching will be described in detail.
FIG. 5(a) shows an example of a subject region 401 and its histogram 402. When histogram matching is performed, the feature extraction unit 1620 supplies the matching unit 1610 with information on the colors (hues) to be used for the color histogram as the feature quantity. If the number of bins of the color histogram is M (M is an integer of 2 or more), the color histogram p(m) 402 generated by the matching unit 1610 can be expressed by the following equation (4).
$$p(m) = \{p(0), p(1), \ldots, p(M-1)\} \tag{4}$$
Note that p(m) is a normalized histogram. This color histogram p(m) has bins only for the colors included in the feature quantity; that is, if the number of bins is M, the number of colors supplied as the feature quantity is also M.

FIG. 5(b) shows an example of a search region 403 for the subject region and a color histogram 405. With M bins, the color histogram q(m) 405 of the region 404 is expressed by the following equation (5).
$$q(m) = \{q(0), q(1), \ldots, q(M-1)\} \tag{5}$$
Note that q(m) is also a normalized histogram, and it likewise has bins only for the colors included in the feature quantity.

The tracking unit 161 can calculate, as an evaluation value D(x, y) of the similarity between the color histogram p(m) of the subject region 401 and the color histogram q(m) of the region 404, the Bhattacharyya coefficient shown in the following equation (6).

$$D(x, y) = \sum_{m=0}^{M-1} \sqrt{p(m)\, q(m)} \tag{6}$$

Here, D(x, y) represents the evaluation value at the coordinates (x, y) of the upper-left vertex of the region 404.

As in template matching, the matching unit 1610 calculates the evaluation value D(x, y) while shifting the region 404 within the search region 403. The coordinates (x, y) at which D(x, y) is largest indicate the position of the region 404 most similar to the subject region 401, and the matching unit 1610 detects that region 404 as the subject region present in the search region.
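The histogram construction and the Bhattacharyya search can be sketched together as follows. This assumes the image has already been quantized to hue indices, and it normalizes each histogram over the pixels whose color is one of the feature colors; that normalization choice, and the names `color_histogram`, `bhattacharyya` and `histogram_match`, are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def color_histogram(pixels: Sequence[int], colors: Sequence[int]) -> List[float]:
    """Normalized histogram with one bin per feature color (M = len(colors)).
    Pixels whose color is not a feature color are ignored (assumption)."""
    index = {c: m for m, c in enumerate(colors)}
    counts = [0] * len(colors)
    for c in pixels:
        if c in index:
            counts[index[c]] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * len(colors)


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Equation (6): sum over bins of sqrt(p(m) q(m)); 1.0 for identical histograms."""
    return sum(math.sqrt(pm * qm) for pm, qm in zip(p, q))


def histogram_match(p: Sequence[float], image: Sequence[Sequence[int]],
                    w: int, h: int, colors: Sequence[int]) -> Optional[Tuple[int, int, float]]:
    """Slide a w x h window over `image` and return (x, y, D) for the
    upper-left vertex with the largest D(x, y)."""
    best: Optional[Tuple[int, int, float]] = None
    for y in range(len(image) - h + 1):
        for x in range(len(image[0]) - w + 1):
            window = [image[y + j][x + i] for j in range(h) for i in range(w)]
            d = bhattacharyya(p, color_histogram(window, colors))
            if best is None or d > best[2]:  # larger coefficient = more similar
                best = (x, y, d)
    return best


feature_colors = [1, 2]
p = color_histogram([1, 1, 1, 2], feature_colors)  # [0.75, 0.25]
img = [[0, 0, 1, 1],
       [0, 0, 1, 2],
       [0, 0, 0, 0]]
print(histogram_match(p, img, 2, 2, feature_colors))  # (2, 0, 1.0)
```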

Here, color feature quantities were used for histogram matching, but hue or saturation feature quantities may be used instead. Likewise, although the Bhattacharyya coefficient was used as the similarity evaluation value, other evaluation values such as histogram intersection may be used.
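For comparison, the histogram intersection mentioned above reduces to a sum of bin-wise minima. A minimal sketch, assuming both histograms are normalized lists of equal length (the name `histogram_intersection` is illustrative):

```python
from typing import Sequence


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of bin-wise minima; in [0, 1] for normalized histograms.
    As with the Bhattacharyya coefficient, larger means more similar,
    so the search would still keep the maximum."""
    return sum(min(pm, qm) for pm, qm in zip(p, q))


print(histogram_intersection([0.75, 0.25], [0.5, 0.5]))  # 0.75
```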

The distance map generation unit 1630 calculates subject distances from a pair of parallax images and generates a distance map. A distance map is one form of distance information in which each pixel represents a subject distance; it is also called a depth map, depth image, or range image. The distance map may also be generated without using parallax images. For example, the subject distance of each pixel may be obtained by finding, for each pixel, the position of the focus lens 131 at which the contrast evaluation value peaks, and a distance image may be generated from those distances.

A method of calculating the subject distance will be described with reference to FIG. 6. In FIG. 6, if an A image 1151a and a B image 1151b are obtained, the focal length of the photographing lens 101 and the distance between the focus lens 131 and the imaging element 141 show that the light beams are refracted as indicated by the solid lines. The in-focus subject is therefore at position 1152a. Similarly, if a B image 1151c is obtained for the A image 1151a, the in-focus subject is at position 1152b, and if a B image 1151d is obtained, it is at position 1152c. In this way, for each pixel, the distance information of the subject at that pixel position can be calculated from the relative position between the A image containing the pixel and the corresponding B image.

For example, assume that the A image 1151a and the B image 1151d are obtained in FIG. 6. In this case, the distance 1153 from the pixel 1154 at the midpoint, corresponding to half the image shift amount, to the subject position 1152c, or a defocus amount corresponding to the distance 1153, is stored as the pixel value of the pixel 1154. By calculating the subject distance information for each pixel in this way, a distance map can be generated.

Alternatively, the distance map may be generated by dividing the image into small regions and calculating a defocus amount for each small region. In that case, an A image and a B image are generated from the pixels included in each small region, their phase difference (image shift amount) is detected by a correlation calculation, and the phase difference is converted into a defocus amount. Each pixel of the resulting distance map still indicates a subject distance, but all pixels within one small region indicate the same subject distance. The distance map generation unit 1630 supplies the generated distance map to the feature extraction unit 1620.

The distance map may be generated for the entire image, or only for the partial region designated for extracting the feature quantity.

The feature extraction unit 1620 extracts, from the subject region, a feature quantity used to track (search for) the subject region.
When subject tracking is performed, the user generally designates the position in the image to be tracked before tracking starts. For example, in the shooting standby state, the user can designate a position in the image displayed on the display unit 150 through the operation unit 156. The main control unit 151 acquires, for example, the coordinates of a tap operation if the display unit 150 is a touch display, or the coordinates of a position designated with a cursor that can be moved over the image by operating the operation unit 156. Information on the designated position is input from the main control unit 151 to the feature extraction unit 1620.

A method by which the feature extraction unit 1620 identifies the subject region from which the feature quantity is extracted will be described with reference to FIG. 7. FIG. 7(a) shows a captured image, in which the designated position 503 indicates coordinates within a person's face 501. It is also assumed that a house 502 in the background has color information similar to that of the face 501.

The feature extraction unit 1620 takes a predetermined region containing the designated position 503, for example a predetermined rectangular region centered on the designated position 503, as a provisional subject region, and generates a color histogram H_in of that region. The feature extraction unit 1620 also takes all regions other than the provisional subject region as a reference region and generates a color histogram H_out of the reference region. A color histogram represents the frequency of the colors contained in an image; here, as an example, pixel values are converted from the RGB color space to the HSV color space and a color histogram of the hue (H) is generated. Other types of color histograms may, however, be generated.

The feature extraction unit 1620 then calculates the information amount I(a) expressed by the following equation (7).
$$I(a) = -\log_2\left(\frac{H_{\mathrm{in}}(a)}{H_{\mathrm{out}}(a)}\right) \tag{7}$$
Here, a is an integer indicating the bin number. The value of I(a) becomes smaller as the number of pixels of the bin's color in the provisional subject region grows relative to the number of pixels of that color in the reference region. That is, the smaller I(a) is, the larger the share of the corresponding color in the provisional subject region compared with the reference region, and the more likely that color is characteristic of the provisional subject region. The feature extraction unit 1620 calculates I(a) for all bins.
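Equation (7) can be evaluated per bin as in the sketch below. The source does not say how empty bins are handled, so the small smoothing constant `eps` is an assumption added only to avoid division by zero and log of zero; the function name is likewise illustrative.

```python
import math
from typing import List, Sequence


def information_amount(h_in: Sequence[float], h_out: Sequence[float],
                       eps: float = 1e-6) -> List[float]:
    """Equation (7): I(a) = -log2(H_in(a) / H_out(a)) for every bin a.
    `eps` (assumed) guards against empty bins."""
    return [-math.log2((hi + eps) / (ho + eps)) for hi, ho in zip(h_in, h_out)]


# Bin 0 is common inside the provisional region, bin 1 mostly outside it.
print(information_amount([8, 1], [1, 8], eps=0.0))  # [-3.0, 3.0]
```

A negative I(a) marks a color that is more frequent in the provisional subject region than in the reference region, which is what makes it a candidate characteristic color.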

The feature extraction unit 1620 replaces each calculated information amount I(a) with a value within a specific range (for example, the 8-bit range 0 to 255), assigning larger values to smaller I(a). The feature extraction unit 1620 then replaces the value of each pixel in the captured image with the value assigned to the information amount I(a) of the bin corresponding to that pixel's color.
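A possible form of this replacement step is sketched below. The source only requires that smaller I(a) receive larger 8-bit values; the linear min-max mapping used here is an assumption, and pixels are assumed to be given directly as their hue-bin index a.

```python
from typing import List, Sequence


def subject_map(pixel_bins: Sequence[Sequence[int]],
                info: Sequence[float]) -> List[List[int]]:
    """Replace every pixel (given as its hue-bin index a) by an 8-bit score.
    Smaller I(a) -> larger score, via an assumed linear min-max mapping."""
    lo, hi = min(info), max(info)
    span = hi - lo
    scores = [255 if span == 0 else round(255 * (hi - v) / span) for v in info]
    return [[scores[a] for a in row] for row in pixel_bins]


print(subject_map([[0, 1], [2, 0]], [-3.0, 3.0, 0.0]))  # [[255, 0], [128, 255]]
```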

Through this processing, the feature extraction unit 1620 generates a subject map based on color information. FIG. 7(b) shows an example of a subject map, in which pixels closer to white are more likely to belong to the subject and pixels closer to black are less likely to. For convenience, the subject map is drawn as a binary image in FIG. 7(b), but it is actually a multi-tone image. Because part of the background house 502 has a color similar to that of the face 501, the color-based subject map does not sufficiently distinguish the face 501. The rectangular region 504 in FIG. 7(c) shows an example of a subject region finally set (updated) based on, for example, the region of the subject map whose pixel values are at or above a predetermined threshold.

If a feature quantity extracted from such a subject region is used, the face 501 is less likely to be tracked accurately. Therefore, in the present embodiment, the distance map generated by the distance map generation unit 1630 is used to improve the accuracy of the subject region set from color information. FIG. 7(d) shows the distance map generated for the captured image of FIG. 7(a), converted so that, taking the subject distance at the designated position 503 as the reference, pixels are whiter the smaller their difference in subject distance and blacker the larger it is. For convenience, FIG. 7(d) shows the distance map as a binary image, but it is actually a multi-tone image.

The feature extraction unit 1620 generates a subject map that takes distance information into account, for example by multiplying the values of corresponding pixels of the distance map and the color-based subject map. FIG. 7(e) shows an example of such a subject map (that is, one based on both color information and distance information). In the subject map of FIG. 7(e), the face 501 and the background house 502 are distinguished accurately. The rectangular region 505 in FIG. 7(f) shows an example of a subject region set based on, for example, the region of the subject map of FIG. 7(e) whose pixel values are at or above a predetermined threshold. The rectangular region 505 circumscribes the face 501 and contains very few background pixels. When a feature quantity extracted from such a subject region is used, the face 501 is more likely to be tracked accurately.
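The combination and rectangle-setting steps might look like the sketch below. It assumes the distance map has already been converted to a 0 to 1 similarity (1 meaning the same distance as the designated position), which is one possible reading of the white-to-black conversion of FIG. 7(d); the function name and the threshold value are illustrative.

```python
from typing import Optional, Sequence, Tuple


def subject_rectangle(color_map: Sequence[Sequence[float]],
                      dist_map: Sequence[Sequence[float]],
                      threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Multiply corresponding pixels of the color-based subject map (0-255) and
    a distance similarity map (assumed 0-1), keep pixels whose product is
    >= threshold, and return the circumscribing rectangle
    (x_min, y_min, x_max, y_max), or None if no pixel survives."""
    xs, ys = [], []
    for y, (crow, drow) in enumerate(zip(color_map, dist_map)):
        for x, (c, d) in enumerate(zip(crow, drow)):
            if c * d >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Left two columns: background with face-like color; right two: the face.
color = [[255, 255, 0, 255, 255]] * 2
same_distance = [[0.0, 0.0, 0.0, 1.0, 1.0]] * 2
print(subject_rectangle(color, same_distance, 128))  # (3, 0, 4, 1)
```

With a distance map of all ones (equivalent to using color alone) the rectangle would also swallow the background columns, which is the failure shown in FIG. 7(c).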

Thus, by referring to distance information in addition to the color information for the predetermined range containing the designated position, a more accurate subject region can be set, and a feature quantity suited to accurate tracking can be extracted.

At the time the position of the tracking target is designated, valid distance information (reliable enough to be referenced) may not yet be available for the designated position and its vicinity. For example, distance map generation may be performed only for a specific region (for example, the focus detection region) while the designated position lies outside it, or the designated position may be out of focus so that its distance information is unreliable.

Therefore, if distance information reliable enough to be referenced has been obtained for the vicinity of the designated position (the provisional subject region), the feature extraction unit 1620 sets the subject region by referring to the distance information in addition to the color information. If such distance information has not been obtained, the feature extraction unit 1620 sets the subject region based on the color information without referring to the distance information. Distance information reliable enough to be referenced may be, for example, distance information obtained while the provisional subject region is in focus or nearly in focus (that is, while the defocus amount is at or below a predetermined threshold), but is not limited to this.
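The reliability gate described above can be sketched as follows, using the example criterion from the text (defocus amount at or below a threshold). Representing "no distance information" as `None`, and the names `distance_is_reliable` and `choose_subject_map`, are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def distance_is_reliable(defocus: Optional[float], threshold: float) -> bool:
    """Reliable only if distance info exists for the provisional subject region
    and its defocus amount is at or below the threshold (in focus or nearly so)."""
    return defocus is not None and abs(defocus) <= threshold


def choose_subject_map(color_map: Sequence[Sequence[float]],
                       dist_map: Sequence[Sequence[float]],
                       defocus: Optional[float], threshold: float) -> List[List[float]]:
    """Color x distance map when distance is reliable, otherwise color map alone."""
    if distance_is_reliable(defocus, threshold):
        return [[c * d for c, d in zip(cr, dr)] for cr, dr in zip(color_map, dist_map)]
    return [list(row) for row in color_map]


print(choose_subject_map([[200, 200]], [[1.0, 0.0]], 0.2, 0.5))   # [[200.0, 0.0]]
print(choose_subject_map([[200, 200]], [[1.0, 0.0]], None, 0.5))  # [[200, 200]]
```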

(Processing flow of the imaging device)
With reference to the flowcharts of FIGS. 8 and 9, a moving image shooting operation with subject tracking processing performed by the digital camera 100 of the present embodiment will be described. The moving image shooting operation is executed during shooting standby and during moving image recording. Although details such as the resolution of the images (frames) handled differ between shooting standby and moving image recording, the processing related to subject tracking is basically the same, so the two cases are not distinguished below.

In S801, the main control unit 151 determines whether the power of the digital camera 100 is ON. If it is not, the process ends; if it is, the process proceeds to S802.
In S802, the main control unit 151 controls each unit to execute imaging processing for one frame, and the process proceeds to S803. Here, a pair of parallax images and a captured image for one screen are generated and stored in the RAM 154.

In S803, the main control unit 151 causes the tracking unit 161 to execute subject tracking processing, the details of which will be described later. Through this processing, the tracking unit 161 notifies the main control unit 151 of the position and size of the subject region, and the main control unit 151 sets the focus detection region based on the notified subject region.

In S804, the main control unit 151 causes the focus control unit 133 to execute focus detection processing. Among the pixels of the pair of parallax images that lie within the focus detection region, the focus control unit 133 takes those arranged in the same row, concatenates their A signals to generate an A image, and concatenates their B signals to generate a B image. The focus control unit 133 then computes the correlation between the A image and the B image while shifting their relative position, and takes the relative position at which the two are most similar as the phase difference (shift amount) between the A and B images. The focus control unit 133 further converts the phase difference into a defocus amount and a defocus direction.
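The phase-difference search of S804 can be sketched on 1-D A and B images as below. The mean absolute difference over the overlapping samples is used as the correlation measure (the source only says "correlation amount"), and `to_defocus` with coefficient `k` is a hypothetical stand-in for the optics-dependent conversion, not a value from the source.

```python
from typing import Sequence


def phase_difference(a_img: Sequence[float], b_img: Sequence[float], max_shift: int) -> int:
    """Return the shift s that best aligns A[i] with B[i + s] (B is A moved
    right by s pixels), scored by mean absolute difference over the overlap."""
    best_s, best_v = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(a_img[i], b_img[i + s]) for i in range(len(a_img))
                 if 0 <= i + s < len(b_img)]
        if not pairs:
            continue
        v = sum(abs(x - y) for x, y in pairs) / len(pairs)  # lower = more similar
        if v < best_v:
            best_s, best_v = s, v
    return best_s


def to_defocus(shift: int, k: float) -> float:
    # Hypothetical linear conversion; the real coefficient depends on the optics.
    return k * shift


a = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]
print(phase_difference(a, b, 3))  # 2
```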

In S805, the focus control unit 133 drives the focus motor 132 according to the lens drive amount and drive direction corresponding to the defocus amount and defocus direction obtained in S804, thereby moving the focus lens 131, and returns the process to S801.

Thereafter, the processing of S802 to S805 is repeated until the power switch is no longer determined to be ON in S801. The subject region is thus searched for in a plurality of time-series images, realizing the subject tracking function. Although FIG. 8 executes the subject tracking processing every frame, it may instead be performed every several frames to reduce processing load and power consumption.
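The S801 to S805 loop, including the option of tracking only every few frames, can be sketched with callbacks standing in for the camera units. All callback names and the `track_every` parameter are hypothetical; this shows only the control flow, not any real camera API.

```python
from typing import Any, Callable


def shooting_loop(power_on: Callable[[], bool], capture: Callable[[], Any],
                  track: Callable[[Any], Any], focus: Callable[[Any], None],
                  track_every: int = 1) -> int:
    """Run S801-S805 until power is off; return the number of frames processed.
    Tracking runs every `track_every` frames; other frames reuse the last region."""
    frame = 0
    region = None
    while power_on():                  # S801
        image = capture()              # S802: parallax images + captured image
        if frame % track_every == 0:
            region = track(image)      # S803: subject tracking
        focus(region)                  # S804-S805: focus detection and lens drive
        frame += 1
    return frame
```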

(Subject tracking processing)
Next, the subject tracking processing in S803 will be described in detail with reference to the flowchart of FIG. 9.
In S901, the tracking unit 161 determines whether an instruction to start subject tracking has been detected. If so, the process proceeds to S902; otherwise, it proceeds to S906. The start instruction may be, for example, an input designating the tracking position from the operation unit 156, and information on the designated position is notified from the main control unit 151. At this point, it is highly likely that no distance information has been obtained for the designated position, or that its distance information is unreliable because the designated position is out of focus. The processing here therefore differs from the processing performed after focus detection has been executed for the designated position.

In S902, the tracking unit 161 (feature extraction unit 1620) determines whether valid (highly reliable) distance information has been obtained for the designated position and its vicinity. If so, the process proceeds to S904; otherwise, it proceeds to S903.

In S903, the tracking unit 161 (feature extraction unit 1620) identifies the subject region from the designated position using only color information, as described above, extracts the feature quantity of the subject region, and advances the process to S905.

In S904, the tracking unit 161 (feature extraction unit 1620) identifies the subject region from the designated position using both color information and distance information, as described above, extracts the feature quantity (pixel pattern or histogram) of the subject region, and advances the process to S905.

In S905, the tracking unit 161 (matching unit 1610) performs matching on the search region of the captured image using the feature quantity extracted in S903 or S904, and searches for the region with the highest feature similarity. The tracking unit 161 notifies the main control unit 151 of the position and size of the found region as the tracking result, and ends the tracking processing.

In S906, on the other hand, the tracking unit 161 (feature extraction unit 1620) determines whether the most recently extracted feature quantity was extracted from a subject region identified using both color information and distance information. If so, the process proceeds to S905; otherwise, it proceeds to S907.

In S907, the tracking unit 161 (feature extraction unit 1620) determines whether valid distance information has been obtained for the subject region detected by the previous matching. If so, the process proceeds to S908; otherwise, it proceeds to S905.

In S908, as in S904, the tracking unit 161 (feature extraction unit 1620) re-identifies (updates) the subject region from the designated position using both color information and distance information, extracts the feature quantity of the updated subject region, and advances the process to S905. The feature quantity extracted in S908 may also take into account a feature quantity extracted earlier (for example, in the preceding S903).

In the matching executed in S905 during continued processing, the updated feature quantity is used if it was updated in S908; otherwise, the most recently extracted feature quantity continues to be used.

For example, even if focus detection has started for the subject region detected by the previous matching, the distance information cannot be considered highly reliable until the defocus amount is at or below a predetermined threshold. In such a case, processing proceeds in the order S901, S906, S907, S905.
Once the defocus amount of the tracked subject region is at or below the predetermined threshold, highly reliable distance information can be obtained for the subject region. In such a case, processing proceeds in the order S901, S906, S907, S908, S905.
Once the subject region can be identified using highly reliable distance information in addition to color information, the subject region and feature quantity are updated, and the updated feature quantity is used in subsequent tracking processing. In this case, processing proceeds in the order S901, S906, S905.
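The branching of FIG. 9 can be summarized as a small state machine. In this sketch, `extract(use_distance)` and `match(feature)` are hypothetical callbacks standing in for the feature extraction unit 1620 and the matching unit 1610, and `distance_reliable` stands for the S902 check (designated position) or the S907 check (previously detected region), depending on the branch. The `trace` list records the visited steps so the three sequences described above can be checked.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TrackerState:
    feature: Any = None
    used_distance: bool = False  # True once the feature came from color + distance


def track_step(state: TrackerState, start_requested: bool, distance_reliable: bool,
               extract: Callable[[bool], Any], match: Callable[[Any], Any],
               trace: List[str]) -> Any:
    """One pass through the FIG. 9 flow; returns the matching result."""
    trace.append("S901")
    if start_requested:
        trace.append("S902")
        trace.append("S904" if distance_reliable else "S903")
        state.feature = extract(distance_reliable)
        state.used_distance = distance_reliable
    else:
        trace.append("S906")
        if not state.used_distance:          # feature still color-only
            trace.append("S907")
            if distance_reliable:            # previous region now reliable
                trace.append("S908")
                state.feature = extract(True)
                state.used_distance = True
    trace.append("S905")
    return match(state.feature)
```

Once `used_distance` is set, later calls skip S907 and S908 and keep matching with the same feature quantity, which is the first embodiment's behavior that the second embodiment revisits.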

As described above, according to the present embodiment, when the image region (subject region) to be tracked is identified based on a designated position in the image, using distance information in addition to the color information of the image improves the accuracy of the subject region. This in turn improves the accuracy of tracking processing that uses the feature quantity extracted from the subject region.

If the distance information is not highly reliable, the subject region is identified based on color information until the reliability becomes high, and once highly reliable distance information becomes available, the subject region is re-identified (updated) using the distance information as well. Therefore, even if a position with no distance information, or with unreliable distance information, is designated as the tracking target, the accuracy of tracking processing can improve over time.

● <Second Embodiment>
In the first embodiment, once a feature quantity has been extracted from a subject region identified based on highly reliable distance information and color information, the feature quantity is not updated. This avoids the accumulation of drift and makes subject tracking robust to occlusion. On the other hand, if the luminance or hue of the subject changes after the feature quantity was extracted, for example because the environment in which the subject is located changes, subject tracking accuracy may decrease.

The present embodiment is therefore characterized in that, when the difference in distance information between the subject region and its surrounding region satisfies a predetermined condition, even a feature quantity extracted using highly reliable distance information is updated. Since the present embodiment can be implemented by the digital camera 100 having the configuration of FIG. 1, as in the first embodiment, the following mainly describes differences in operation from the first embodiment.

With reference to the flowchart of FIG. 10, a moving image shooting operation with subject tracking processing performed by the digital camera 100 of the present embodiment will be described.
S1001 to S1003 and S1005 to S1006 in FIG. 10 are the same as S801 to S805 in FIG. 8. The present embodiment differs from the first embodiment in that feature quantity update processing is performed in S1004 after the subject tracking processing in S1003.

Next, the feature amount update processing performed in S1004 of FIG. 10 will be described in detail with reference to the flowchart of FIG. 11.
In S1101, the tracking unit 161 (feature extraction unit 1620) uses the subject region found in the matching process (S905) and the obtained distance information to determine whether the difference in distance information between the subject region and its peripheral region is large.

FIGS. 12(a) and 12(c) schematically show two different captured images, and FIGS. 12(b) and 12(d) schematically show the distance maps generated for the captured images of FIGS. 12(a) and 12(c), respectively. In FIG. 12(a), a house 1202 stands as a background some distance behind a person 1201; in FIG. 12(c), another person 1206 stands in front of a person 1205.

The distance map of FIG. 12(b) renders each pixel's distance information relative to the distance information of the person 1201, the target of the tracking process: the smaller the difference, the whiter the pixel, and the larger the difference, the blacker. Likewise, the distance map of FIG. 12(d) renders each pixel's distance information relative to that of the person 1205, the tracking target in that image. For ease of illustration, FIGS. 12(b) and 12(d) depict the distance maps as binary images, but they are actually multi-level grayscale images. The reference distance information, i.e., the distance information corresponding to the subject region, may be, for example, the average of the distance information in the subject region or its most frequent value.
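The two options for the reference distance (average or most frequent value) and the grayscale rendering of FIGS. 12(b) and 12(d) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the distance map is a 2-D list of floats, the subject region is an axis-aligned rectangle `(x, y, w, h)`, and the mode is taken after quantizing to a hypothetical `bin_width`. The function names and the linear white-to-black ramp are illustrative choices.

```python
from collections import Counter
from statistics import mean


def reference_distance(dist_map, rect, method="mean", bin_width=1.0):
    """Reference distance of the subject region rect = (x, y, w, h).

    method="mean": average of the distance values inside the region.
    method="mode": most frequent value after quantizing to bin_width
    (quantization is an assumption; raw floats rarely repeat exactly).
    """
    x, y, w, h = rect
    values = [dist_map[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    if method == "mean":
        return mean(values)
    if method == "mode":
        bins = Counter(round(v / bin_width) for v in values)
        most_common_bin, _ = bins.most_common(1)[0]
        return most_common_bin * bin_width
    raise ValueError(f"unknown method: {method}")


def difference_map(dist_map, ref, max_diff):
    """Grayscale map: 255 (white) where a pixel's distance equals ref,
    falling linearly to 0 (black) at a difference of max_diff or more."""
    return [
        [round(255 * max(0.0, 1.0 - abs(d - ref) / max_diff)) for d in row]
        for row in dist_map
    ]
```

Either statistic works on the same region; the mode is less sensitive than the mean to a few background pixels that fall inside the subject rectangle.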

Area 1203 in FIG. 12(b) and area 1207 in FIG. 12(d) are the subject regions identified by the subject tracking processing in S1003, and areas 1204 and 1208 are the peripheral regions of areas 1203 and 1207, respectively. Here, the peripheral region is defined as a hollow region: the subject region is enlarged equally in the vertical and horizontal directions until its width and height are each three times those of the subject region, and the subject region itself is then excluded. This is only one example, and the peripheral region may be defined in other ways.
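The hollow peripheral region described above can be sketched as a list of pixel coordinates. This is an illustrative sketch only: it assumes rectangles are `(x, y, w, h)` in pixels and, as an added assumption not stated in the text, clips the enlarged rectangle to the image bounds. `peripheral_region` is a hypothetical name.

```python
def peripheral_region(rect, image_w, image_h, scale=3):
    """Pixel coordinates (row, col) of the hollow peripheral region of rect.

    rect = (x, y, w, h). The rectangle is enlarged equally on all sides so
    that its width and height become `scale` times the original, the result
    is clipped to the image (an assumption), and the subject rectangle
    itself is excluded.
    """
    x, y, w, h = rect
    pad_x = (scale - 1) * w // 2  # for scale=3 this is exactly w
    pad_y = (scale - 1) * h // 2
    x0, y0 = max(0, x - pad_x), max(0, y - pad_y)
    x1, y1 = min(image_w, x + w + pad_x), min(image_h, y + h + pad_y)
    coords = []
    for r in range(y0, y1):
        for c in range(x0, x1):
            inside_subject = x <= c < x + w and y <= r < y + h
            if not inside_subject:
                coords.append((r, c))
    return coords
```

For a 2 x 2 subject away from the image border, the enlarged box is 6 x 6, so the ring contains 36 - 4 = 32 pixels.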

The tracking unit 161 (feature extraction unit 1620) extracts, from the peripheral region, the pixels whose distance information is similar to that of the main subject region (i.e., differs from it by no more than a predetermined range), and determines whether the proportion of the peripheral region occupied by these pixels is equal to or greater than a predetermined threshold. If the proportion is equal to or greater than the threshold, the tracking unit 161 (feature extraction unit 1620) ends the feature amount update processing; otherwise, it advances the processing to S1102.
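The S1101 decision can be sketched as a ratio test over the peripheral pixels. This minimal sketch assumes the distance map is a 2-D list, the peripheral region is given as `(row, col)` coordinates, and `diff_range` and `ratio_threshold` are tuning parameters (the text says the threshold may be set experimentally). Treating an empty peripheral region as "not separable" is an added assumption; the function names are hypothetical.

```python
def similar_distance_ratio(dist_map, peripheral_coords, ref_distance, diff_range):
    """Share of peripheral pixels whose distance is within diff_range of
    the subject's reference distance."""
    if not peripheral_coords:
        return 1.0  # assumption: no peripheral pixels -> treat as not separable
    similar = sum(
        1 for r, c in peripheral_coords
        if abs(dist_map[r][c] - ref_distance) <= diff_range
    )
    return similar / len(peripheral_coords)


def proceed_to_s1102(ratio, ratio_threshold):
    """A small share of subject-like depth around the subject means the
    subject is separable from the background, so the update check continues
    to S1102; otherwise the update processing ends."""
    return ratio < ratio_threshold
```

In the FIG. 12(b) situation most ring pixels belong to the distant house, so the ratio is small and the check proceeds; in FIG. 12(d) the nearby person 1206 raises the ratio and the update is skipped.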

The rationale for the determination in S1101 is as follows. If only a small portion of the peripheral region has distance information similar to that of the main subject region (for example, less than the threshold), the tracked subject region and the background region can be considered clearly distinguishable. In that case, updating the feature amount from the captured image is expected to introduce little background influence into the updated feature amount.

Conversely, if a large portion of the peripheral region has distance information similar to that of the main subject region (for example, equal to or greater than the threshold), the tracked subject region is considered difficult to distinguish from the background region.

In the examples of FIGS. 12(b) and 12(d), the white areas are those whose distance information is similar to that of the main subject region. The threshold used in S1101 can be determined, for example, experimentally. Here, the proportion of the peripheral region occupied by areas whose distance information is similar to that of the main subject region (difference within a predetermined range) is judged to be below the predetermined threshold in the example of FIG. 12(b), and equal to or above it in the example of FIG. 12(d).

In S1102, the tracking unit 161 (feature extraction unit 1620) uses the evaluation value calculated in the matching process (Equation (3)) to determine whether the similarity between a new feature amount extracted from the subject region found by the matching process and the feature amount that was used to search for that region is low. Specifically, the feature extraction unit 1620 determines whether the new evaluation value calculated by the matching unit 1610 is higher than an update threshold or, when the evaluation value based on the Bhattacharyya coefficient (Equation (6)) is used, whether that value is lower than a separate update threshold.
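Equations (3) and (6) are defined earlier in the document and are not reproduced in this section, so the following is a sketch under stated assumptions: Equation (3) is taken to be a plain sum of absolute differences (S1103 describes it as "based on the sum of absolute differences"), and Equation (6) is taken to be the Bhattacharyya coefficient, the sum over bins of sqrt(p·q) for two normalized histograms. Under these assumptions a larger SAD or a smaller coefficient means lower similarity, matching the two threshold directions in S1102. All names are hypothetical.

```python
import math


def sad(template, patch):
    """Sum of absolute differences between two equally sized 2-D patterns.
    Larger means less similar (assumed form of Eq. (3))."""
    return sum(
        abs(t - p)
        for t_row, p_row in zip(template, patch)
        for t, p in zip(t_row, p_row)
    )


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms.
    1.0 means identical, 0.0 means disjoint (assumed form of Eq. (6))."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def similarity_is_low_sad(template_used, new_pattern, sad_update_threshold):
    # Template matching: an evaluation value above the threshold means "changed".
    return sad(template_used, new_pattern) > sad_update_threshold


def similarity_is_low_hist(hist_used, new_hist, bc_update_threshold):
    # Histogram matching: a coefficient below the threshold means "changed".
    return bhattacharyya(hist_used, new_hist) < bc_update_threshold
```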

If the feature amount extracted from the found subject region has low similarity to the feature amount used for the search, the subject region was found but its appearance has changed, so the need to update the feature amount is considered high. Conversely, if the extracted feature amount has high similarity to the one used for the search, the appearance of the subject region has changed little, and the need to update the feature amount is considered low.

Accordingly, if the similarity is determined to be low in S1102, the tracking unit 161 (feature extraction unit 1620) advances the processing to S1103; otherwise, it ends the feature amount update processing.

In S1103, as in S908, the tracking unit 161 (feature extraction unit 1620) updates the feature amount used for the matching process with the new feature amount extracted from the found subject region. The update method is not particularly limited. For example, the feature amount used so far for the matching process may be replaced entirely by the new feature amount, or the updated feature amount may be calculated from both the previous feature amount and the new one. For the evaluation value based on the sum of absolute differences (Equation (3)), for example, the updated feature amount can be obtained according to Equation (8).
T(i, j) = Tpre(i, j) × α + Tnow(i, j) × (1 - α), 0 ≤ α ≤ 1 (8)
Here, Tpre(i, j) is the feature amount used in the matching process, Tnow(i, j) is the new feature amount, and T(i, j) is the updated feature amount.
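Equation (8) is an element-wise weighted blend of the previous and new templates. A minimal sketch, assuming templates are equally sized 2-D lists of pixel values (`blend_template` is a hypothetical name):

```python
def blend_template(t_pre, t_now, alpha):
    """Eq. (8): T(i, j) = Tpre(i, j) * alpha + Tnow(i, j) * (1 - alpha).

    alpha = 1 keeps the previous template unchanged; alpha = 0 replaces it
    entirely with the newly extracted template.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [a * alpha + b * (1.0 - alpha) for a, b in zip(row_pre, row_now)]
        for row_pre, row_now in zip(t_pre, t_now)
    ]
```

For example, blending `[[0, 10]]` and `[[10, 0]]` with `alpha = 0.5` gives `[[5.0, 5.0]]`.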

For the evaluation value based on the Bhattacharyya coefficient (Equation (6)), the updated feature amount can be obtained according to Equation (9).
p(m) = ppre(m) × α + pnow(m) × (1 - α), 0 ≤ α ≤ 1 (9)
Here, ppre(m) is the feature amount used in the matching process, pnow(m) is the new feature amount, and p(m) is the updated feature amount.
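Equation (9) applies the same weighting per histogram bin m. A sketch assuming the histograms are lists of equal length (`blend_histogram` is a hypothetical name). One property worth noting: because the weights α and 1 - α sum to 1, blending two normalized histograms yields another normalized histogram, so the result can be fed straight back into a Bhattacharyya-based comparison.

```python
def blend_histogram(p_pre, p_now, alpha):
    """Eq. (9): p(m) = ppre(m) * alpha + pnow(m) * (1 - alpha) for every bin m."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(p_pre) != len(p_now):
        raise ValueError("histograms must have the same number of bins")
    return [a * alpha + b * (1.0 - alpha) for a, b in zip(p_pre, p_now)]
```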

In both Equations (8) and (9), α = 0 denotes an update that completely replaces the feature amount with the newly extracted one, and α = 1 denotes that the feature amount is not updated. The update degree α can be determined adaptively, for example according to the magnitude of the distance-information difference determined in S1101, the similarity determined in S1102, or both.

For example, provided the determination conditions in S1101 and S1102 are satisfied, the larger the difference in distance information and the lower the similarity, the smaller α can be made (increasing the contribution of the new feature amount) when calculating the updated feature amount. Conversely, under the same conditions, the smaller the difference in distance information and the higher the similarity, the larger α can be made (decreasing the contribution of the new feature amount).
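The text fixes only the direction of the adjustment, not a formula. The sketch below is one hypothetical linear rule that respects both directions. It assumes the S1101 "distance difference" is measured by the similar-depth ratio (a smaller ratio means a clearer depth gap), that similarity is measured by a Bhattacharyya coefficient, and that `alpha_min`/`alpha_max` are tuning bounds. None of these names or constants come from the source.

```python
def adaptive_alpha(ratio, ratio_threshold, bc, bc_threshold,
                   alpha_min=0.2, alpha_max=0.9):
    """Hypothetical linear rule for the update degree alpha, used only after
    S1101 (ratio < ratio_threshold) and S1102 (bc < bc_threshold) have passed.

    A clearer depth gap (small ratio) and a larger appearance change (small bc)
    both push alpha toward alpha_min, giving the new feature more weight.
    """
    separability = 1.0 - min(ratio / ratio_threshold, 1.0)  # 0..1, larger = clearer gap
    dissimilarity = 1.0 - min(bc / bc_threshold, 1.0)       # 0..1, larger = bigger change
    strength = (separability + dissimilarity) / 2.0         # 0..1
    return alpha_max - (alpha_max - alpha_min) * strength
```

A linear map is used here only for readability; any monotonic mapping with the same directions would fit the description equally well.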

Furthermore, when an operation that fixes the focus distance or exposure is detected (for example, operation of the shutter button included in the operation unit 156, corresponding to a shooting preparation instruction or a shooting start instruction), the subject tracking process is likely to be succeeding at that moment. Therefore, when such an operation is detected, the thresholds used for the determinations in S1101 and S1102 may be changed so that the feature amount is more readily updated with the new feature amount extracted from the subject region detected at that time.
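Relaxing the thresholds means moving each one in the direction that lets its check pass more easily. A sketch of that idea: the `UpdateThresholds` container, the `relax` amount, and the additive/multiplicative adjustments are all hypothetical. The directions assume the conventions above: S1101 passes when the ratio is below its threshold, and S1102 judges similarity low when the SAD exceeds its threshold or the Bhattacharyya coefficient falls below its threshold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UpdateThresholds:
    ratio_threshold: float  # S1101: continue only if similar-depth ratio < this
    sad_threshold: float    # S1102: similarity "low" if SAD > this
    bc_threshold: float     # S1102: similarity "low" if Bhattacharyya coefficient < this


def thresholds_for(base, focus_or_exposure_locked, relax=0.2):
    """Return thresholds that make an update more likely right after the
    focus distance or exposure has been fixed (e.g. shutter half-press)."""
    if not focus_or_exposure_locked:
        return base
    return replace(
        base,
        ratio_threshold=min(1.0, base.ratio_threshold + relax),
        sad_threshold=base.sad_threshold * (1.0 - relax),
        bc_threshold=min(1.0, base.bc_threshold + relax),
    )
```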

As described above, the present embodiment allows the feature amount to be updated when distance information makes it possible to extract the feature amount accurately from the subject region. Therefore, even when the appearance of the tracked subject region changes, the feature amount can be updated without degrading tracking accuracy, which further improves subject tracking performance.

(Other Embodiments)
The above embodiments describe subject tracking during shooting. However, if distance information can be acquired, similar subject tracking can also be performed during playback of a moving image. In this case, distance information recorded with the frames of the moving image may be acquired; alternatively, if each frame is recorded as a set of parallax images, distance information may be generated from the parallax images, and the parallax images may be combined to generate the moving-image frames for playback. The distance information may, of course, be acquired by other methods.

When subject tracking is performed during playback, the tracking result can be used, for example, to control how the moving image is displayed. For example, the display can be controlled so that the tracked subject region appears at the center of the screen, or so that the tracked subject region is scaled to a constant size. An indicator identifying the tracked subject region (for example, a rectangular frame circumscribing it) may also be superimposed on the display. These are merely examples, and the tracking result may be used for other purposes.
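The two display controls mentioned above (centering and constant size) can be combined into one scale-and-offset transform. This is a sketch under assumed conventions: the subject rectangle is `(x, y, w, h)` in frame pixels, "constant size" means a fixed on-screen height `target_h`, and the transform maps a frame pixel `(u, v)` to screen position `(u * scale + offset_x, v * scale + offset_y)`. The function name and parameters are hypothetical.

```python
def display_transform(subject_rect, screen_w, screen_h, target_h):
    """Scale and offset that draw the tracked subject at the screen center
    with a constant on-screen height target_h.

    A frame pixel (u, v) is drawn at (u * scale + offset_x, v * scale + offset_y).
    """
    x, y, w, h = subject_rect
    scale = target_h / h
    cx, cy = x + w / 2.0, y + h / 2.0  # subject center in frame coordinates
    offset_x = screen_w / 2.0 - cx * scale
    offset_y = screen_h / 2.0 - cy * scale
    return scale, offset_x, offset_y
```

For a subject at `(100, 50, 40, 80)` on a 640 x 480 screen with `target_h = 160`, the scale is 2.0 and the subject center (120, 90) lands at (320, 240).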

The superimposed indicator identifying the tracked subject region may take different forms depending on whether the subject region was identified with reference to the distance information or from color information alone. For example, when the subject region is identified from color information alone, its accuracy may be low, so an indicator with a fixed position and size is displayed. When the subject region is identified with reference to the distance information, the position and size of the indicator are changed dynamically to follow the position and size of the subject region.
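The indicator rule above reduces to a single branch on whether distance information was used. A minimal sketch: the `Rect` type and `indicator_frame` function are hypothetical, and where the fixed frame is placed is left to the caller because the text does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int


def indicator_frame(subject, used_distance_info, fixed_frame):
    """Frame to superimpose on the tracked subject.

    Region found with distance information: the frame follows the subject
    region's position and size. Region found from color alone: it may be
    inaccurate, so the caller-supplied frame of fixed position and size is used.
    """
    if used_distance_info:
        return Rect(subject.x, subject.y, subject.w, subject.h)
    return fixed_frame
```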

The present invention is applicable not only to moving images but also to the shooting and playback of any plurality of time-series images, such as those produced by continuous shooting or interval shooting.

The present invention can also be realized by processing in which a program implementing one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of those functions.

The above embodiments are merely specific examples intended to aid understanding of the present invention and are not intended to limit the present invention in any sense. All embodiments falling within the scope defined by the claims are encompassed by the present invention.

DESCRIPTION OF SYMBOLS
100 ... digital camera, 101 ... taking lens, 131 ... focus lens, 132 ... focus motor, 133 ... focus control unit, 141 ... image sensor, 151 ... main control unit, 152 ... image processing unit, 161 ... tracking unit

Claims

An image processing apparatus comprising:
specifying means for specifying, based on a designated position, an image region in an image from which a feature amount is to be extracted;
extracting means for extracting the feature amount from the image region; and
search means for searching a plurality of time-series images for a region corresponding to the image region using the feature amount,
wherein the specifying means specifies the image region using distance information if distance information satisfying a reliability condition is obtained for a region including the designated position, and specifies the image region without using the distance information otherwise.

The image processing apparatus according to claim 1, wherein the reliability condition is that a defocus amount of an area including the designated position is equal to or less than a threshold value.

An image processing apparatus comprising:
specifying means for specifying, based on a designated position, an image region in an image from which a feature amount is to be extracted;
extracting means for extracting the feature amount from the image region; and
search means for searching a plurality of time-series images for a region corresponding to the image region using the feature amount,
wherein the specifying means specifies the image region without using distance information until distance information satisfying a reliability condition is obtained for a region including the designated position, and specifies the image region using the distance information after such distance information is obtained.

The image processing apparatus according to claim 3, wherein the reliability condition is that a defocus amount for the region including the designated position is equal to or less than a threshold value.

The image processing apparatus according to claim 1, wherein the extraction means updates the feature amount when the image region changes from a state of being specified without using the distance information to a state of being specified using the distance information.

The image processing apparatus according to claim 5, wherein, when updating the feature amount, the extraction means generates the updated feature amount from the feature amount extracted from the image region specified using the distance information and a feature amount extracted in the past.

The image processing apparatus according to claim 1, wherein, when the distance information is not used, the specifying means specifies the image region using color information in the image.

An image processing apparatus wherein the specifying means specifies the image region based on a map, generated from color information, indicating the likelihood that each pixel belongs to the subject at the designated position.

The image processing apparatus according to any one of claims 1 to 7, wherein, when using the distance information, the specifying means specifies the image region using both color information in the image and the distance information.

The image processing apparatus according to claim 9, wherein the specifying means specifies the image region from a map, generated from color information, indicating the likelihood that each pixel belongs to the subject at the designated position, and a map, generated from the distance information, indicating the likelihood that each pixel belongs to the subject at the designated position.

The image processing apparatus according to claim 1, wherein the feature amount is a pixel pattern or a histogram of the image area.

An image processing apparatus comprising:
extraction means for extracting a feature amount from an image region; and
search means for searching a plurality of time-series images for a region similar to the image region using the feature amount,
wherein the extraction means updates the feature amount used by the search means for the search based on distance information in the region similar to the image region and distance information in a peripheral region of that region.

The image processing apparatus according to claim 12, wherein the extraction means determines whether to update the feature amount used by the search means for the search based on the proportion of the peripheral region occupied by areas whose distance information differs from the distance information in the region similar to the image region by an amount within a predetermined range.

The image processing apparatus according to claim 13, wherein the extraction means updates the feature amount used by the search means for the search when the proportion of the peripheral region occupied by areas whose distance information differs from the distance information in the region similar to the image region by an amount within a predetermined range is greater than or equal to a threshold value.

The image processing apparatus according to claim 14, wherein the extraction means changes the degree to which the feature amount used by the search means for the search is updated according to the proportion of the peripheral region occupied by areas whose distance information differs from the distance information in the region similar to the image region by an amount within a predetermined range.

An imaging apparatus comprising:
the image processing apparatus according to any one of claims 1 to 15; and
focus detection means for performing focus detection on an area including the region similar to the image region found by the search means.

The imaging apparatus according to claim 16, further comprising:
an image sensor capable of dividing the pupil region of the taking lens; and
generating means for generating the distance information from parallax images obtained from the image sensor.

A control method for an image processing apparatus, comprising:
a specifying step in which specifying means specifies, based on a designated position, an image region in an image from which a feature amount is to be extracted;
an extracting step in which extracting means extracts the feature amount from the image region; and
a search step of searching a plurality of time-series images for a region similar to the image region using the feature amount,
wherein, in the specifying step, the specifying means specifies the image region using distance information if distance information satisfying a reliability condition is obtained for a region including the designated position, and without using the distance information otherwise.

A control method for an image processing apparatus, comprising:
a specifying step in which specifying means specifies, based on a designated position, an image region in an image from which a feature amount is to be extracted;
an extracting step in which extracting means extracts the feature amount from the image region; and
a search step of searching a plurality of time-series images for a region similar to the image region using the feature amount,
wherein, in the specifying step, the specifying means specifies the image region without using distance information until distance information satisfying a reliability condition is obtained for a region including the designated position, and specifies the image region using the distance information after such distance information is obtained.

A control method for an image processing apparatus, comprising:
an extraction step in which extraction means extracts a feature amount from an image region; and
a search step of searching a plurality of time-series images for a region similar to the image region using the feature amount,
wherein, in the extraction step, the feature amount used for the search in the search step is updated based on distance information in the region similar to the image region and distance information in a peripheral region of that region.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 15.