JP2012015742A

JP2012015742A - Reproduction device and recording device

Info

Publication number: JP2012015742A
Application number: JP2010149647A
Authority: JP
Inventors: Hidefumi Takeda; 英史竹田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-06-30
Filing date: 2010-06-30
Publication date: 2012-01-19

Abstract

Problem: To enable faster processing and lower memory consumption in consumer imaging devices, so that representative images with high visibility can be extracted.
Solution: A reproduction device is provided with: face recognition information acquisition means for acquiring face recognition information representing the region in which a face corresponding to a subject of the image data exists; focus information acquisition means for acquiring focus information representing the in-focus region of the image data; region match detection means for detecting a match between the face recognition region acquired by the face recognition information acquisition means and the focus region acquired by the focus information acquisition means; and control means for controlling the display device to display, as a reproduced image, image data that the region match detection means has determined to be a match.
[Selected figure] Figure 2

Description

The present invention relates to a reproduction device and a recording device, and more particularly to a technique for automatically selecting a specific frame image from moving image data, or from a plurality of still images, recorded on a recording medium.

Conventionally, methods have been disclosed in which face detection is performed on image data recorded by an imaging device such as a digital camera or a video camera, and the detected face image is registered and displayed as a thumbnail representative image that summarizes the content (Patent Documents 1 to 4).

In the method disclosed in Patent Document 1, face images are extracted from frame images of a moving image by face recognition, and a feature amount is calculated from each face image. Faces belonging to the same person are then identified from the calculated feature amounts, and the one that appears most frequently is chosen as the representative image.
In the method disclosed in Patent Document 2, a representative image candidate is first determined from a predetermined section of a moving image. Face recognition is then performed by matching that candidate against a preset face template, and a representative image is searched for and output based on the recognition result.
In the method disclosed in Patent Document 3, person recognition is performed on frame images of a moving image, and a region containing the person is extracted as a partial image. The extracted partial image, the frame image it was taken from, and its address information on the recording medium are recorded in association with one another. During reproduction, the partial image is used as scene information to search for an arbitrary scene in the moving image.
In the method disclosed in Patent Document 4, it is determined whether a face region is present in each frame image of a moving image, and the similarity to the face in a preset image of a specific person is calculated. The moving image is then reproduced centering on that specific person, based on the similarity determination result.

Patent Document 1: JP 2007-174378 A
Patent Document 2: JP 2006-331271 A
Patent Document 3: JP 2004-297305 A
Patent Document 4: JP 2005-018451 A

Patent Documents 1 to 4 all provide methods for extracting representative images from moving images by face recognition in order to improve searchability during reproduction. However, when the subject moves, the face recognition used in Patent Documents 1 to 4 may still judge a region to be a face image even if subject blur has occurred or the image is out of focus because the focus is not on the desired region.

This problem is especially pronounced when shooting with consumer digital cameras or video cameras, for both still images and moving images. Consequently, the conventional techniques proposed in Patent Documents 1 to 4 cannot always obtain high-definition representative images with high visibility. With consumer imaging devices in particular, the determination process takes a long time, consumes a large amount of memory, and may still fail to yield high-definition representative images with high visibility.
In view of the above problems, an object of the present invention is to speed up processing, reduce memory consumption, and enable extraction of representative images with high visibility.

A reproduction device according to the present invention reproduces image data and displays it on a display device, and comprises: face recognition information acquisition means for acquiring face recognition information representing the region in which a face corresponding to a subject of the image data exists; focus information acquisition means for acquiring focus information representing the in-focus region of the image data; region match detection means for detecting a match between the face recognition region acquired by the face recognition information acquisition means and the focus region acquired by the focus information acquisition means; and control means for controlling the display device to display, as a reproduced image, image data that the region match detection means has determined to be a match.
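The four means recited above can be pictured as a simple filtering pipeline. The following Python sketch is illustrative only: the `Frame` record, the `select_frames` function, and the pluggable `regions_match` predicate are hypothetical names, and the region-match rule is deliberately left abstract because the claim itself does not fix one.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

# (x0, y0, x1, y1): upper-left start point and lower-right end point, in pixels
Rect = Tuple[int, int, int, int]


@dataclass
class Frame:
    frame_id: str
    # Hypothetical output of the face recognition information acquisition means
    face_regions: List[Rect] = field(default_factory=list)
    # Hypothetical output of the focus information acquisition means
    focus_region: Optional[Rect] = None


def select_frames(frames: Iterable[Frame],
                  regions_match: Callable[[List[Rect], Rect], bool]) -> List[str]:
    """Control means: pass on only the frames the region match detection means accepts."""
    selected: List[str] = []
    for frame in frames:
        if frame.focus_region is None or not frame.face_regions:
            continue  # nothing to compare, so the frame cannot be a match
        if regions_match(frame.face_regions, frame.focus_region):
            selected.append(frame.frame_id)  # would be sent to the display device
    return selected


# Usage with a trivial placeholder rule: some face region equals the focus region.
frames = [
    Frame("SCN01", [(10, 10, 50, 50)], (10, 10, 50, 50)),
    Frame("SCN02", [(10, 10, 50, 50)], (300, 300, 400, 400)),
    Frame("SCN03", [], (10, 10, 50, 50)),
]
print(select_frames(frames, lambda faces, focus: focus in faces))  # ['SCN01']
```

Keeping the predicate pluggable lets the same loop host any of the concrete match rules (overlap ratio or containment) without changing the control logic.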

According to the present invention, processing can be made faster and memory consumption reduced, and only representative images with high visibility can be extracted from a large amount of recorded image data. This makes it easy for the user to grasp the content intuitively and improves searchability.

FIG. 1 is a block diagram showing an example of the schematic configuration of the reproduction device of the embodiment.
FIG. 2 is a diagram explaining the extraction process for images in which the face image region and the focus region match.
FIG. 3 is a diagram explaining the extraction process for images in which the face image region and the focus region match.
FIG. 4 is a diagram explaining the extraction process for images in which the face image region and the focus region match.
FIG. 5 is a diagram explaining an example of the index screen displayed on the LCD.
FIG. 6 is a configuration diagram explaining a system configuration in which an external device is connected.
FIG. 7 is a diagram explaining the narrowing-down process when a plurality of matching images are detected.
FIG. 8 is a diagram explaining the process of generating face image recognition information as coordinate information.
FIG. 9 is a diagram explaining how face image recognition information is processed as bitmap information.
FIG. 10 is a flowchart explaining the procedure for extracting reproduced images whose regions match.
FIG. 11 is a flowchart explaining the extraction process when a plurality of face images exist.
FIG. 12 is a flowchart explaining the extraction process when a plurality of face images exist.
FIG. 13 is a diagram explaining the process of determining a match between the face image region and the focus region.

Next, the best mode for carrying out the invention will be described in detail with reference to the drawings.
"1. System configuration example of this embodiment"
FIG. 1 is a schematic block diagram of a reproduction device according to the present embodiment, using as an example a video camera system equipped with means for detecting images in which the face recognition region and the focus region match.
The function of each block of the imaging and reproduction system according to the present embodiment is described below.

The optical unit 101 includes lenses, optical elements, and drive mechanisms for the optical elements, for example a lens, an iris motor, an AF motor, and a flash (none shown), and forms an optical image.
The imaging unit 102 includes an image sensor such as a CCD or CMOS sensor, an AGC circuit, an A/D converter, and the like. It converts the optical image formed by the optical unit 101 into an electrical signal and A/D-converts it to generate captured image data. In addition, from the drive parameters used for focus control in the optical unit 101 and the imaging unit 102, focus position information is generated that indicates where in the captured image data the focus lies, that is, which region of the frame is in focus.

The focus position information is then embedded in the image data that has been compressed by the encoding process described later: as Exif information for still image data, and, for moving image data, as camera information metadata (which includes the focus position information) using a known method.

The face image recognition unit 103 detects face images in an input frame image.
This embodiment assumes that face detection is performed on each captured frame at the time of shooting and recording, and that the resulting face image information is stored as metadata in the image data by a recording method described later. Alternatively, face image information may be created by performing face detection on decoded image frames when they are read from the recording medium for reproduction; in that case the embodiment can also be applied to image data that does not already carry face image information as metadata. The face detection algorithm may use known techniques such as edge detection within the frame, detection of a specific hue such as skin color, and pattern matching of face shapes.

The encoding unit 104 applies moving image compression to the image data input from the imaging unit 102, reducing spatial and temporal redundancy by means of, for example, DCT and motion prediction as in MPEG.

The recording/reproducing unit 105 adds an error-correcting parity code such as ECC to the moving image data compressed by the encoding unit 104 and writes it to a large-capacity recording medium such as the magnetic disk 106 or the optical disk 107. The frame currently being shot is also displayed on the screen of the display output unit 109, such as an LCD panel.

Next, the procedure for reproducing moving image data recorded on the recording medium with the units described above is explained.
The recording/reproducing unit 105 reads the compressed moving image data written on a recording medium such as the magnetic disk 106 or the optical disk 107, and performs error correction using the parity code added during recording. The corrected moving image data is output to the decoding unit 108, which decodes it back into image data corresponding to the original input.

The data decoded by the decoding unit 108 is output to the display output unit 109. A control microcomputer 110 with a built-in program controls each block of the imaging system described above; processing commands to the blocks and the data flow between them pass over the system bus 111.

The system bus 111 carries data transfers and control commands between the processing blocks.
The semiconductor memory 112 stores the data needed for processing in each system block.
User operations such as recording and reproduction requests and power on/off are reported to the control microcomputer 110 via the external control 113, and the control microcomputer 110 controls the operation of the entire system in response to these notifications.

The external input/output interface 114 is connected to external devices or a network by a wired cable such as USB or IEEE 1394, or by a wireless link such as Bluetooth, and transmits and receives the image data held in the imaging device.
The externally connected device 115 is a device connected to the imaging device of this embodiment, such as a television that displays image data or a printer that prints it.
This concludes the basic system configuration of the reproduction device of this embodiment.

"2. Extraction of images in which the face image and the focus region match"
Next, a method according to this embodiment for extracting image frames in which the face image region and the focus region match is described with reference to FIGS. 2, 3, 4, and 10.
FIG. 2 illustrates reading the data of each scene from image data stored on a recording medium such as the optical disk 107, performing face image detection and focus detection, and extracting only the matching images.

As shown in FIG. 2, suppose that moving image data is recorded on the recording medium under the file names CLIP001.MOV and CLIP002.MOV, and that CLIP002.MOV is read to extract the reproduced images in which the face image and the focus region match.

Assume that the image data CLIP002.MOV read from the recording medium contains five representative scenes, SCN01 to SCN05, as shown in FIG. 2. A representative scene may be, for example, the first frame image at the start of recording by the imaging device, or frame images extracted automatically and at random at intervals of several seconds. All frame images may, of course, also be used.
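The ways of picking representative scenes mentioned above (the first frame, frames taken every few seconds, or every frame) can be sketched as frame-index selection. This Python sketch assumes a known frame count and frame rate; the function name, the `mode` values, and the 3-second default interval are illustrative assumptions, and the "random" mode draws one frame from each interval window, which is only one possible reading of "extracted at random at intervals of several seconds".

```python
import random
from typing import List, Optional


def representative_frames(total_frames: int, fps: float, mode: str = "interval",
                          interval_s: float = 3.0, seed: Optional[int] = None) -> List[int]:
    """Return indices of candidate representative frames (hypothetical helper)."""
    if total_frames <= 0:
        return []
    if mode == "first":
        return [0]                          # first frame at the start of recording
    if mode == "all":
        return list(range(total_frames))    # every frame is a candidate
    step = max(1, round(fps * interval_s))  # number of frames per interval window
    if mode == "interval":
        return list(range(0, total_frames, step))
    if mode == "random":
        rng = random.Random(seed)
        # one randomly chosen frame inside each window of `step` frames
        return [rng.randrange(start, min(start + step, total_frames))
                for start in range(0, total_frames, step)]
    raise ValueError(f"unknown mode: {mode}")


print(representative_frames(300, 30.0))  # [0, 90, 180, 270]
```

Fewer candidates mean fewer frames to decode and test, which is where the claimed speed and memory savings would come from; "all" trades that saving for completeness.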

For each representative scene, the decoding process described with reference to FIG. 1 is performed, and both the focus region information recorded as metadata in the image data and the face recognition information created by the method described later are acquired (focus information acquisition and face recognition information acquisition). In this embodiment, the focus information described in the metadata preferably holds, as coordinate values, the region of the captured frame that is in focus, the so-called in-focus region. Alternatively, it suffices for the metadata to contain camera information (for example, focal length, angle of view, and depth of field) from which region information equivalent to the in-focus region can be derived.

In the example of FIG. 2, to make the difference between the two kinds of region information easy to see, the focus region detected in each representative scene is drawn with a thick solid outline, and the face recognition region is drawn as a rectangle with a broken outline.

In this embodiment, to detect how much the face detection region and the focus region overlap in each representative scene, a region coincidence detection process is performed that measures the overlapping area as a proportion of the focus region. If this proportion is equal to or greater than a predetermined threshold, the image is judged to be a matching image.

Expressed as a formula: let $S_f$ [pixel²] be the area of the focus region and $S_o$ [pixel²] the area of the region where the face recognition region and the focus region overlap. With the threshold serving as the weighting coefficient set to 80%, a reproduced image whose $S_o$ satisfies

$$S_o \geq S_f \times 0.8 \qquad \text{(Formula 1)}$$

is judged to be a matching image.
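Formula (1) reduces to an axis-aligned rectangle intersection. A minimal Python sketch, assuming each region is given as (x0, y0, x1, y1) with the upper-left start point and lower-right end point, and that areas are computed as (x1 - x0) × (y1 - y0) as in the worked example for FIG. 13. The threshold is kept as an integer percentage so that the comparison is exact; the function names are hypothetical.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper-left to lower-right


def area(r: Rect) -> int:
    x0, y0, x1, y1 = r
    # Clamp at zero so an empty (non-overlapping) intersection has area 0
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_area(a: Rect, b: Rect) -> int:
    """So: area of the intersection of two rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def is_match(face: Rect, focus: Rect, threshold_pct: int = 80) -> bool:
    """Formula (1): So >= Sf * threshold, with the threshold as an integer percentage."""
    s_f = area(focus)
    s_o = overlap_area(face, focus)
    return s_f > 0 and s_o * 100 >= s_f * threshold_pct


print(is_match((0, 0, 100, 100), (0, 0, 100, 90)))    # True: focus lies fully inside the face
print(is_match((0, 0, 100, 100), (50, 0, 150, 100)))  # False: only 50% of the focus overlaps
```

Because the ratio is taken relative to the focus area rather than the face area, a small focus region lying inside a face always passes; that is exactly the FIG. 4 situation the description handles separately.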

FIG. 13 illustrates a concrete example of the overlap-based match determination described above.
FIG. 13(a) shows a reproduced image subject to the match determination, and FIG. 13(b) shows the coordinate values giving the positions of the focus region and the face recognition region in that image.

The reproduced image under evaluation is assumed to be image data of 1920 pixels horizontally by 1080 lines vertically. With the upper-left corner of the image as the origin, the upper-left and lower-right corner coordinates of the focus region and of the face recognition region are acquired when the reproduction data is read.

The face recognition region runs from As(100, 100) to Ae(500, 500), and the focus region from Bs(200, 200) to Be(550, 600). The area $S_f$ of the focus region is therefore

$$S_f = (550 - 200) \times (600 - 200) = 350 \times 400 = 14 \times 10^4 \ \text{[pixel}^2\text{]}$$

and the area $S_o$ of the overlap between the face recognition region and the focus region, shown hatched in FIG. 13, is

$$S_o = (500 - 200) \times (500 - 200) = 300 \times 300 = 9 \times 10^4 \ \text{[pixel}^2\text{]}$$

With the weighting coefficient for the match determination set to 80%,

$$9 \times 10^4 < 14 \times 10^4 \times 0.8 = 11.2 \times 10^4 \ \text{[pixel}^2\text{]}$$

so Formula (1) is not satisfied, and the reproduced image in FIG. 13 is judged not to be a matching image.
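The FIG. 13 arithmetic can be checked mechanically. In this self-contained Python snippet the face recognition region is taken to be the rectangle from As to Ae and the focus region the rectangle from Bs to Be, which is the assignment implied by computing the focus area from Bs and Be above.

```python
face = (100, 100, 500, 500)   # As(100, 100) to Ae(500, 500): face recognition region
focus = (200, 200, 550, 600)  # Bs(200, 200) to Be(550, 600): focus region

s_f = (focus[2] - focus[0]) * (focus[3] - focus[1])
overlap_w = max(0, min(face[2], focus[2]) - max(face[0], focus[0]))
overlap_h = max(0, min(face[3], focus[3]) - max(face[1], focus[1]))
s_o = overlap_w * overlap_h

print(s_f, s_o)               # 140000 90000
print(round(s_o / s_f, 3))    # 0.643
print(s_o * 100 >= s_f * 80)  # False: Formula (1) is not satisfied
```

In other words, only about 64% of the focus region lies inside the face region, short of the 80% threshold.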

In the example of FIG. 2, this determination finds that, of the five representative scenes, SCN01, SCN03, and SCN05 are images in which the face recognition region and the focus region match. The number of matching images extracted can be changed flexibly through the value chosen for the match threshold, so the device can adaptively choose between narrowing the selection down to high-definition representative images and offering many candidate images.

Depending on the scene being shot, cases such as those in FIGS. 3 and 4 can also arise, in addition to subjects like the one in FIG. 2.
FIG. 3(a) shows an example of a reproduced image in which several face recognition regions (broken lines) lie within the focus region (solid line), as in pan-focus shooting. In this example, five face recognition areas enclosed by broken lines are detected, all contained within the focus area enclosed by the solid line.

For the purposes of the present invention, such a reproduced image should also be judged a matching image, but the determination based on (Formula 1) cannot judge it as one, because the face regions together cover only a small fraction of the large focus region. Therefore, so that such cases are not missed, this embodiment also judges an image to be a matching image when all of the face recognition regions are contained within the focus region.

Specifically, the device determines whether every face recognition region is contained within the focus region. Using the coordinate information of the face recognition regions and of the focus region, it calculates whether the coordinates of each face recognition region lie within the focus region bounded by its four vertices.

FIG. 3(b) expresses the focus region and face recognition regions of FIG. 3(a) as coordinate values, taking the upper-left vertex of each rectangular area as its start point and the lower-right vertex as its end point. For example, suppose that a reproduced image with a horizontal resolution of 1280 pixels and 960 vertical lines, mapped with its upper-left vertex as the origin, is found to have the values shown in the legend of FIG. 3(b).

From FIG. 3(b), the focus region of this reproduced image corresponds to the rectangular area whose diagonal runs from FocusStart(10, 10) to FocusEnd(1270, 950).
Likewise, one face recognition area is expressed for each detected person as a rectangular area whose diagonal runs:
from Face1Start(20, 10) to Face1End(300, 310);
from Face2Start(320, 20) to Face2End(620, 320);
from Face3Start(700, 30) to Face3End(1100, 400);
from Face4Start(200, 310) to Face4End(350, 460);
from Face5Start(600, 300) to Face5End(760, 460).

The device then determines whether the horizontal and vertical components of each corner coordinate of the face recognition areas fall within the coordinates of the focus region's start point FocusStart and end point FocusEnd.
That is, letting FaceX(a, b) denote the start point or end point of a face recognition region, the face recognition region is judged to be contained within the focus region when both its start point and its end point satisfy Formulas (2) and (3) below.

$$\text{horizontal component of FocusStart} \leq a \leq \text{horizontal component of FocusEnd} \qquad \text{(Formula 2)}$$
$$\text{vertical component of FocusStart} \leq b \leq \text{vertical component of FocusEnd} \qquad \text{(Formula 3)}$$

In this example there are five face recognition areas, so the above conditions must be checked for every coordinate from Face1Start to Face5End to determine whether all the face recognition regions are contained within the focus area.
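Formulas (2) and (3) amount to a point-in-rectangle test applied to both corners of every face region; because both regions are axis-aligned rectangles, containing both corners is enough for the whole face region to lie inside the focus region. A minimal Python sketch with hypothetical function names, using the same (x0, y0, x1, y1) convention:

```python
from typing import Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1): start point, end point


def point_in_focus(a: int, b: int, focus: Rect) -> bool:
    fx0, fy0, fx1, fy1 = focus
    # Formula (2) on the horizontal component, Formula (3) on the vertical component
    return fx0 <= a <= fx1 and fy0 <= b <= fy1


def face_in_focus(face: Rect, focus: Rect) -> bool:
    # Both the start point and the end point of the face region must satisfy (2) and (3)
    return point_in_focus(face[0], face[1], focus) and point_in_focus(face[2], face[3], focus)


def all_faces_in_focus(faces: Sequence[Rect], focus: Rect) -> bool:
    # Every detected face (Face1 ... FaceN) must be contained; no faces means no match
    return len(faces) > 0 and all(face_in_focus(f, focus) for f in faces)
```

Treating an empty face list as "no match" is an assumption of this sketch; without it, `all()` over nothing would report a match for a frame with no faces.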

Evaluating these conditions gives:
Horizontal: 10 ≤ 20, 300, 320, 620, 700, 1100, 200, 350, 600, 760 ≤ 1270
Vertical: 10 ≤ 10, 310, 20, 320, 30, 400, 310, 460, 300, 460 ≤ 950
Every value lies within the range, so the image is judged to be a matching image.
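The FIG. 3(b) coordinates can be run through the same check. This self-contained Python snippet also illustrates why Formula (1) alone would reject this frame: even the plain sum of the five face areas, which is an upper bound on the area of their union inside the focus region, is far below 80% of the focus area.

```python
focus = (10, 10, 1270, 950)  # FocusStart to FocusEnd
faces = [
    (20, 10, 300, 310),     # Face1
    (320, 20, 620, 320),    # Face2
    (700, 30, 1100, 400),   # Face3
    (200, 310, 350, 460),   # Face4
    (600, 300, 760, 460),   # Face5
]

xs = [v for f in faces for v in (f[0], f[2])]  # horizontal components of start/end points
ys = [v for f in faces for v in (f[1], f[3])]  # vertical components of start/end points
contained = (all(focus[0] <= x <= focus[2] for x in xs)
             and all(focus[1] <= y <= focus[3] for y in ys))
print(contained)  # True: judged a matching image

s_f = (focus[2] - focus[0]) * (focus[3] - focus[1])
face_sum = sum((f[2] - f[0]) * (f[3] - f[1]) for f in faces)
print(s_f, face_sum)               # 1184400 370100
print(face_sum * 100 >= s_f * 80)  # False: Formula (1) would not be satisfied
```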

In the case of FIG. 4, on the other hand, the depth of field is shallow and the person is shot in close-up, so only part of the face (in FIG. 4, only the nose) is in focus.
When the focus region is contained within the face recognition region in this way, the determination by (Formula 1) yields a matching image, but depending on the size and position of the focus region, a blurred image in which the subject cannot be identified may be extracted.

However, some image data is intentionally soft, for example when soft focus is used as a shooting effect. In practice, therefore, a pre-process is performed before the (Formula 1) detection of image data in which the face recognition region and the focus region match, taking into account user settings and lens performance. For example, the minimum area that the focus region must occupy in the reproduced image is derived in advance and used to decide whether the determination is applicable. This makes it possible to prevent detection errors that occur when a person is shot in close-up.
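The pre-processing step above is only outlined in the text. One possible reading, sketched in Python: compute the share of the frame occupied by the focus region and apply Formula (1) only when it reaches a minimum. The `min_share_pct` value, the function name, and the example coordinates are hypothetical; per the description, the actual minimum would be derived from user settings or lens performance.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def formula1_applicable(focus: Rect, width: int, height: int, min_share_pct: int = 1) -> bool:
    """Return True if the focus region covers at least min_share_pct % of the frame.

    Frames below the minimum (for example a nose-sized focus region, as in FIG. 4)
    would be kept out of the Formula (1) test; min_share_pct is an assumed parameter.
    """
    fx0, fy0, fx1, fy1 = focus
    s_f = max(0, fx1 - fx0) * max(0, fy1 - fy0)
    return s_f * 100 >= width * height * min_share_pct


# Hypothetical nose-sized focus region: 100 x 60 = 6000 px^2, under 1% of 1920 x 1080
print(formula1_applicable((900, 500, 1000, 560), 1920, 1080))  # False
print(formula1_applicable((200, 200, 550, 600), 1920, 1080))   # True
```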

FIG. 10 is a flowchart of the matching image extraction process described above.
In S101, it is determined whether the extraction process has been completed up to the final frame of the target moving image data. If it has, the matching image extraction process ends.

If, on the other hand, the determination in S101 shows that frame images remain to be searched for matching images, the process proceeds to S102, where data is read from the recording medium as described with reference to FIG. 1 and the compressed image data is decoded. In S103, the focus region information of the image data being searched is acquired from the metadata in the image data or from a management information file. In S104, the face recognition region information for the image data is acquired from the face image recognition unit 103.

Next, in S105, the degree of coincidence between the focus area and the face recognition area is derived from the two pieces of region information acquired in S103 and S104, using the determination based on Formula (1).

Next, in S106, it is determined whether the result obtained in S105 is equal to or greater than a predetermined threshold. If it is, the frame is judged to be a matching image and the process proceeds to S107, where the reproduced image is displayed and output. If the result is smaller than the threshold, extraction continues with the next image frame.
The above is the flow of the process that extracts images in which the focus area and the face recognition area match.
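The extraction loop above can be sketched in Python as follows. Equation (1) is not reproduced in this section, so the coincidence measure used here is an assumption: the overlap area between a face rectangle and the focus rectangle divided by the focus area, following the claim wording "relative to the focus area". The `Frame` type, the rectangle convention, and the default threshold of 0.5 are hypothetical, and decoding (S102) and metadata reading (S103, S104) are treated as already done.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Rectangle as (x0, y0, x1, y1) with x1/y1 exclusive (hypothetical convention).
Rect = Tuple[int, int, int, int]


@dataclass
class Frame:
    index: int
    focus_area: Rect  # focus area info from metadata or a management file (S103)
    face_areas: List[Rect] = field(default_factory=list)  # face recognition areas (S104)


def area(r: Rect) -> int:
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)


def coincidence(focus: Rect, face: Rect) -> float:
    """Assumed form of equation (1): overlap area as a fraction of the focus area."""
    overlap = (max(focus[0], face[0]), max(focus[1], face[1]),
               min(focus[2], face[2]), min(focus[3], face[3]))
    focus_size = area(focus)
    return area(overlap) / focus_size if focus_size else 0.0


def extract_matching_frames(frames: Iterable[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices of frames judged to be matching images (S101 to S107)."""
    matches: List[int] = []
    for frame in frames:  # S101: repeat until the final frame has been processed
        # S105: best coincidence over all faces recognized in this frame
        degree = max((coincidence(frame.focus_area, f) for f in frame.face_areas),
                     default=0.0)
        if degree >= threshold:  # S106: compare with the predetermined threshold
            matches.append(frame.index)  # S107: this frame would be displayed
    return matches
```

Taking the maximum over all faces means a frame matches if any recognized face overlaps the focus area sufficiently; other aggregation rules would be equally consistent with the text.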

"3. Embodiments applying the present invention"
Next, embodiments that apply the extraction of frames in which the face image and the focus area match, according to the present invention, will be described with reference to FIGS. 5 and 6.
FIG. 5 shows the appearance of a disc camcorder that records and reproduces moving images according to this embodiment, together with an example of the index screen displayed on the camcorder's LCD.
As shown in FIG. 5, a menu screen is composed and output in which a plurality of thumbnail images, obtained by reducing the image data of the extracted matching images, are arranged. This menu can be displayed as a thumbnail index that visually summarizes the content on the recording medium. It is therefore unnecessary to expand and hold all detected matching images in memory, and the user can efficiently browse many images on a single screen.

In the example in FIG. 5, seven thumbnail images, THM001 to THM007, are displayed, and up to nine can be displayed on one screen. When there are more than nine thumbnail images, additional thumbnail index pages, between which the user can jump with a feed button or the like, may be created. The maximum number of thumbnails per screen is not limited to nine; it may be chosen based on a resolution that does not impair the user's visibility, the size of the display device, and the amount of thumbnail image data that can be held in memory.
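Splitting the thumbnails into index pages of at most nine entries can be sketched as below. The function name is hypothetical, and the page size is a parameter because, as noted above, nine is only an example.

```python
from typing import List, Sequence


def paginate(thumbnails: Sequence[str], per_page: int = 9) -> List[List[str]]:
    """Split thumbnail names into index pages of at most per_page entries."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    # Slice the sequence into consecutive chunks; the last page may be partial.
    return [list(thumbnails[i:i + per_page])
            for i in range(0, len(thumbnails), per_page)]
```

For the seven thumbnails of FIG. 5 this yields a single page; twenty thumbnails would yield pages of 9, 9, and 2 entries, navigated with the feed button.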

Furthermore, simply by selecting a desired thumbnail image from the thumbnail menu, the user can quickly start playback of the corresponding moving image content from that point, without waiting. The extracted matching image may be reduced using a known technique such as bicubic interpolation.

To enable the random access described above, each thumbnail image is associated, at the time it is generated, with the recording position in the moving image data of the matching image it was reduced from. For example, the file name on the recording medium and the time position at which the matching image was detected are held in the table format shown in FIG. 5, associated with a logical address in system memory.

In the case of FIG. 5, three video files, clip001.mov, clip002.mov, and clip003.mov, have already been recorded. The thumbnail image names THM001 to THM007 of the images in which the focus area and the face recognition area were found to match in each file are associated with their recording times. With this random access table linking each matching-image thumbnail to its recording position, the user selects a desired thumbnail, the corresponding recording time in the corresponding video file is looked up, and playback starts from that position.
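A minimal sketch of such a random access table, assuming a Python dictionary keyed by thumbnail name stands in for the logical addresses in system memory. The type and function names, and the time values in the usage example, are hypothetical; FIG. 5 does not give concrete times.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PlaybackPosition:
    file_name: str   # video file on the recording medium, e.g. "clip001.mov"
    time_sec: float  # time position at which the matching image was detected


def build_random_access_table(
        rows: Iterable[Tuple[str, str, float]]) -> Dict[str, PlaybackPosition]:
    """Map each thumbnail name to the recording position of its source frame."""
    table: Dict[str, PlaybackPosition] = {}
    for thumb, file_name, time_sec in rows:
        if thumb in table:
            raise ValueError(f"duplicate thumbnail name: {thumb}")
        table[thumb] = PlaybackPosition(file_name, time_sec)
    return table


def start_position(table: Dict[str, PlaybackPosition], thumb: str) -> PlaybackPosition:
    """Where playback should start when the user selects the given thumbnail."""
    return table[thumb]
```

For example, after `t = build_random_access_table([("THM003", "clip002.mov", 5.5)])`, selecting THM003 would seek clip002.mov to 5.5 s via `start_position(t, "THM003")`.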

FIG. 6 shows an embodiment in which an imaging device such as a video camera is connected to a printer or similar device via the external interface of FIG. 1, and the matching images are displayed on the printer's GUI as recommended images in a video print mode. If the printer has an optical disc drive, the same function can be achieved by removing the optical disc from the imaging device and loading it into the drive.

In FIG. 6, as in the embodiment described with reference to FIG. 5, the images in which the focus area and the face recognition area match are obtained by the playback function of the printer itself. The reduced thumbnail images are arranged on an operation or display screen of a display device such as an LCD and presented to the user as recommended print images. In this embodiment, the operation screen also includes, in addition to the thumbnails, a print-settings window with controls for the number of copies, print size, print execution, and a button for stepping through print candidate images. This saves the user the effort of searching for a frame suitable for printing.

The user operates this window with input means provided on the printer, such as button keys, a touch panel, or a jog dial, to select a candidate from the displayed thumbnail images of the matching images or to print a matching image. In the example print selection screen of FIG. 6, thumbnail images of three matching images, PRNT001 to PRNT003, are shown as print candidates.

On this print selection screen, the currently selected thumbnail PRNT002 is drawn with a thick border so that the user can see at a glance which image is selected; however, the method of highlighting the selected image is not limited to this.
In addition, the previous candidate PRNT001 and the next candidate PRNT003 are displayed to the left and right, so that when several matching images with similar composition are detected, the user can easily compare them and decide which to print.
Further, as shown in FIGS. 5 and 6, the focus area and face recognition area information of each matching image detected by the matching image detection of FIG. 2 is drawn over the corresponding thumbnail image by OSD (On Screen Display), so that the user can distinguish them even on a small display screen. Alternatively, a display mode showing the focus information and face recognition information may be provided as an optional playback function. In this way, even on a small, low-resolution screen, the user can intuitively see where the face is and where the image is in focus, which helps prevent operation mistakes and repeated operations.

Displaying the focus area and the face detection area with the OSD function in this way makes the user's decision easier even when it is hard to tell from a small thumbnail alone where to start playback or which image to print.

In the example of FIG. 5, it may be hard for the user to judge intuitively from the matching images alone whether to start playback from the scene with the person in focus in THM001 or from the scenes with the person in focus in THM002 or THM003.
This embodiment therefore draws each thumbnail image with its focus area and face recognition area superimposed, which further aids searching. The same effect is obtained in the embodiment of FIG. 6.

As in the example illustrated in this embodiment, the focus area can be displayed with edge enhancement, which can be realized in various ways. For example, edge detection can be applied to the image data inside the focus area using a known technique such as differentiation, the Laplacian, a Sobel filter, or the Canny detector, and the resulting image is output.
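As one of the options listed above, a Sobel gradient restricted to the focus area can be sketched in pure Python as follows. The grayscale list-of-lists image and the rectangle convention are assumptions for illustration; overlaying the result on the thumbnail is not shown.

```python
import math
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_in_area(img: Sequence[Sequence[float]], focus: Rect) -> List[List[float]]:
    """Sobel gradient magnitude inside the focus area; 0 elsewhere and on the image border."""
    h, w = len(img), len(img[0])
    x0, y0, x1, y1 = focus
    out = [[0.0] * w for _ in range(h)]
    # Skip the outermost pixels, where the 3x3 kernel would leave the image.
    for y in range(max(1, y0), min(h - 1, y1)):
        for x in range(max(1, x0), min(w - 1, x1)):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

Limiting the loop to the focus rectangle keeps the cost proportional to the focus area rather than the whole frame, which suits the small-thumbnail use described here.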

"4. Narrowing process when multiple matching images exist"
Next, the narrowing process performed when a plurality of frames in which the face image and the focus area match are detected within a predetermined playback section will be described with reference to FIGS. 7, 11, and 12.
When the face image recognition and focus area match detection described above are applied to moving image data, matching images may be detected in many consecutive frames if the scene is nearly static.
Moreover, in moving image data compressed with motion prediction and the DCT (discrete cosine transform), such as MPEG or H.264, a frame coded with intra-frame coding, called an I picture, is known to have the highest definition and the best visibility, compared with P pictures and B pictures, which are coded using inter-frame prediction. Accordingly, a coding type determination process is performed to determine how each picture in the moving image data was coded.

This embodiment exploits that property: a predetermined playback section for detecting matching images is set as shown in FIG. 7, and when several matching images are detected within that section, an I picture is preferentially selected as the matching image. The detection section for the representative image is not specifically prescribed, but it is preferably set to about one GOP (15 frame periods for an NTSC video signal, 12 frame periods for a PAL video signal).

In the example of FIG. 7, assume that the representative-scene detection section contains eight reproduced images. Among them, three frames, candidates 1 to 3, are detected as frames in which the face recognition area and the focus area match.

In the process described in "2. Extraction of matching images of the face image and focus area" and the flowchart of FIG. 10, all of these reproduced images would be displayed and output at this stage. In contrast, this method additionally performs, as a process for narrowing down the displayed images, intra-coded picture detection, forward-predicted picture detection, and intra macroblock count determination. An I picture in the coded image data is displayed with first priority. If no I picture is found, a P picture is given second priority. If neither an I picture nor a P picture is found among the candidates, the candidate frame with the largest number of intra macroblocks is displayed.

For example, for a standard-definition frame (720 pixels × 480 lines) encoded with MPEG-2, one macroblock corresponds to a 16 × 16 pixel raster block, so one frame contains (720 / 16) × (480 / 16) = 45 × 30 = 1350 macroblocks. Each macroblock has a coding mode (intra, inter, or skip). The output images are therefore narrowed down using, as the threshold, the condition that at least half of the 1350 macroblocks, that is, 675 macroblocks, are intra-coded. This avoids displaying many consecutive matching images with similar content, for example when a stationary subject is shot, which would otherwise make searching harder for the user.
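The macroblock arithmetic and the "half or more" threshold can be written out as a short sketch. The function names are hypothetical, and rounding partial macroblocks up is an assumption that only matters for frame sizes not divisible by 16.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def macroblock_count(width: int, height: int, mb_size: int = 16) -> int:
    """Macroblocks per frame, counting partial macroblocks as whole ones."""
    return ceil_div(width, mb_size) * ceil_div(height, mb_size)


def intra_threshold(total_mbs: int) -> int:
    """'Half or more' of the macroblocks, i.e. ceil(total / 2)."""
    return ceil_div(total_mbs, 2)


def mostly_intra(intra_mbs: int, width: int = 720, height: int = 480) -> bool:
    """True if at least half of the frame's macroblocks are intra-coded."""
    return intra_mbs >= intra_threshold(macroblock_count(width, height))
```

For 720 × 480 this gives 45 × 30 = 1350 macroblocks and a threshold of 675, matching the figures above.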

In FIG. 7, among the three candidate images, the reproduced image of candidate 2 is coded as an I picture, so it is displayed and output as the matching image.
The decoding result parameters of the compressed image mentioned above are stored in the status register of the decoding unit 108 in FIG. 1. For speed, it is desirable for a program built into the control microcomputer 110 to read that register and perform the narrowing determination. In this way, only the highest-quality image data among the detected matching images is presented to the user.

FIGS. 11 and 12 show flowcharts of the matching image narrowing process according to this method.
Steps S101 to S106 in FIG. 11 are basically the same as the matching image extraction process described with reference to FIG. 10, so they are given the same reference signs and their description is omitted.
In S117, the selected images are narrowed down. The process then proceeds to S118, where the reproduced image is displayed and output on the external connection device 115.

Next, an example of the selected-image narrowing process performed in S117 of FIG. 11 will be described with reference to the flowchart of FIG. 12.
In the narrowing process of this embodiment, first, in S121, the image data and image information extracted by the preceding matching image extraction are held temporarily in memory until the narrowing process is complete. After matching images for the whole narrowing playback section have been accumulated in memory, the narrowing process is performed; at this point, it is determined whether a plurality of consecutive matching images exist in the playback section (S122).

If the determination in S122 finds that the predetermined section does not contain multiple matching images, that is, there is only one matching image, that frame is taken as the matching image, the narrowing process ends, and the frame is displayed and output (S118 in FIG. 11).

If the determination in S122 finds a plurality of matching images, the process proceeds to S123, where it is determined whether all matching images have been examined. Through this determination, the image frame information accumulated in memory is searched until narrowing of the display image is complete. If all have been examined, the process proceeds to S130, where frame information other than the output image is discarded. Otherwise, the process proceeds to S124.

In S124, it is checked whether the image frame information being examined contains image data coded as an I picture. If an I picture is detected, the process proceeds to S125, where the reproduced I picture is determined at that point to be the output image. The process then proceeds to S130, where the other image frame information is discarded, and the narrowing process ends.

If no I picture is detected in S124, the process proceeds to S126, where image data coded as a P picture is looked for. If such data exists, the process proceeds to S127, where that reproduced P picture is determined at that point to be the output image. The process then proceeds to S130, where the other image frame information is discarded, and the narrowing process ends.

If no P picture is detected in S126, the process proceeds to S128, where it is determined whether the number of intra macroblocks in the coded picture (usually a B picture coded with bidirectional prediction) is equal to or greater than a predetermined threshold. If it is, the process proceeds to S129, where the reproduced image of that picture is determined at that point to be the output image. The process then proceeds to S130, where the other image frame information is discarded, and the narrowing process ends.

If the determination in S128 finds that the count is smaller than the predetermined threshold, the process returns to S123 to examine the next image frame information.
The above flow shows an example of the process that narrows down images in which the focus area and the face recognition area match.
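The priority order of the narrowing process (I picture first, then P picture, then a picture with enough intra macroblocks) can be sketched as follows. This is one interpretation, not the flowchart verbatim: it assumes the I- and P-picture checks scan all buffered candidates, and for the last step it combines the threshold of S128 with the "largest number of intra macroblocks" rule stated earlier. The `Candidate` type and the idea that picture type and intra count come from the decoder status register are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    frame_no: int
    picture_type: str  # "I", "P" or "B" (assumed to be read from decoder status)
    intra_mbs: int     # number of intra-coded macroblocks in the picture


def narrow(candidates: Sequence[Candidate],
           intra_threshold: int = 675) -> Optional[Candidate]:
    """Pick one output frame from the matching images buffered for one section."""
    if not candidates:
        return None
    if len(candidates) == 1:  # S122: only one matching image in the section
        return candidates[0]
    for wanted in ("I", "P"):  # S124/S125 first, then S126/S127
        for c in candidates:
            if c.picture_type == wanted:
                return c
    # S128/S129: among pictures meeting the intra threshold, take the one with
    # the most intra macroblocks; None if no picture qualifies.
    eligible = [c for c in candidates if c.intra_mbs >= intra_threshold]
    return max(eligible, key=lambda c: c.intra_mbs, default=None)
```

With three candidates as in FIG. 7 where only candidate 2 is an I picture, `narrow` returns candidate 2 regardless of the other frames' intra counts.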

"5. Face recognition information generation process"
Next, the process of generating face image recognition information according to this embodiment will be described with reference to FIGS. 8 and 9.
This embodiment does not specifically restrict whether the detection of images in which the face recognition area and the focus area match is performed when image data is recorded or when it is reproduced. However, if the face image recognition unit 103 of FIG. 1 performs face recognition at playback time, the reproduced image must first be buffered before it can be displayed. In view of memory efficiency and processing time, it is therefore desirable to perform face recognition at recording time and store the result as metadata in the image data.

This embodiment proposes two recording formats for face recognition information: representing the area as coordinate data, shown in FIG. 8, and representing it as bitmap data, shown in FIG. 9.
First, in the coordinate method, when two faces are recognized as two rectangular areas as in FIG. 8, the coordinates of two diagonally opposite corners of each rectangle (points A and B, and points C and D) are obtained. Their horizontal and vertical components are then recorded in the image data as coordinate information, one set per detected person.

In the example of FIG. 8, two sets are recorded: (xa, ya) and (xb, yb), and (xc, yc) and (xd, yd). This face recognition coordinate information is recorded as metadata in the image data at each recording time. When compressed video and audio data are time-division multiplexed in access units, as in the figure, the recording interval may be one picture (frame), one GOP, or a user-specified time interval.
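Serializing the coordinate sets into a compact byte string can be sketched with the standard `struct` module. The layout (a 1-byte face count followed by four big-endian 16-bit values per face) is purely hypothetical; the document does not specify one. Writing such bytes into an MPEG-2 user data area would additionally require avoiding start-code emulation, which this sketch ignores.

```python
import struct
from typing import List, Tuple

Corner = Tuple[int, int]
FaceRect = Tuple[Corner, Corner]  # two diagonal corners, e.g. (A, B)


def pack_faces(faces: List[FaceRect]) -> bytes:
    """Hypothetical layout: uint8 count, then xa, ya, xb, yb as big-endian uint16."""
    if len(faces) > 255:
        raise ValueError("too many faces for a 1-byte count")
    out = bytearray(struct.pack(">B", len(faces)))
    for (xa, ya), (xb, yb) in faces:
        out += struct.pack(">4H", xa, ya, xb, yb)
    return bytes(out)


def unpack_faces(data: bytes) -> List[FaceRect]:
    """Inverse of pack_faces."""
    (count,) = struct.unpack_from(">B", data, 0)
    faces: List[FaceRect] = []
    for i in range(count):
        xa, ya, xb, yb = struct.unpack_from(">4H", data, 1 + 8 * i)
        faces.append(((xa, ya), (xb, yb)))
    return faces
```

Two faces as in FIG. 8 take 1 + 2 × 8 = 17 bytes per recording interval under this layout.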

The metadata may be stored, for example, in private stream packet data defined by the MPEG-2 Systems standard, or in the user data area of a GOP header or picture header defined by the MPEG-2 Video standard. Similarly, the face recognition information may be converted into bitmap data as shown in FIG. 9 and recorded in the image data as ancillary information.

The bitmap is created over the reproduced image frame with face pixels set to 1 and all other pixels set to 0, so that the face recognition area is expressed as a bitmap. A bitmap holds more information than coordinates, but it can represent a free shape that fits a person's face rather than a rectangle, which improves the accuracy of matching image extraction. It is also easy to compute with: matching the face recognition area against the focus area only requires counting the bits whose value is 1. As a result, the playback device only needs to read the face area data at playback time and does not need to run face detection, which may require memory access over a large amount of image data, so processing is faster.
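Counting matching 1-bits can be sketched by packing each 0/1 bitmap into a Python integer and taking the population count of the bitwise AND. The focus area is assumed to be available as a bitmap of the same size, and measuring overlap relative to the focus area is the same assumption made for equation (1) elsewhere in this document; all names are hypothetical.

```python
from typing import Sequence


def to_bitmask(bitmap: Sequence[Sequence[int]]) -> int:
    """Pack a 0/1 bitmap (rows of pixels) into one integer, row-major."""
    mask = 0
    for row in bitmap:
        for bit in row:
            mask = (mask << 1) | (1 if bit else 0)
    return mask


def popcount(x: int) -> int:
    """Number of 1-bits in a non-negative integer."""
    return bin(x).count("1")


def overlap_ratio(face_mask: int, focus_mask: int) -> float:
    """Overlapping 1-bits as a fraction of the focus-area 1-bits (assumed measure)."""
    focus_bits = popcount(focus_mask)
    return popcount(face_mask & focus_mask) / focus_bits if focus_bits else 0.0
```

Both bitmaps must have the same dimensions so that each pixel lands on the same bit position in both masks.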

If the recognition information is recorded at short intervals, storing the bitmap as-is adds considerable overhead to the image data. The bitmap data may therefore be recorded after run-length compression based on zero runs. This reduces the share of the image data occupied by face recognition data, and the saved capacity can be given to the image data itself. The user can thus be provided with higher-quality image data with less information lost to compression.
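One simple run-length scheme for such a bitmap, chosen here for illustration and not necessarily the one intended by the document, stores alternating run lengths that always begin with a run of zeros (possibly of length 0). Because face bitmaps are mostly zeros, long zero runs collapse into single numbers.

```python
from itertools import groupby
from typing import List, Sequence


def rle_encode(bits: Sequence[int]) -> List[int]:
    """Alternating run lengths of 0s and 1s, always starting with a 0-run."""
    runs: List[int] = []
    for value, group in groupby(bits):
        if not runs and value == 1:
            runs.append(0)  # sequence starts with 1s: record an empty zero run
        runs.append(sum(1 for _ in group))
    return runs


def rle_decode(runs: Sequence[int]) -> List[int]:
    """Expand alternating run lengths back into a 0/1 sequence."""
    bits: List[int] = []
    value = 0
    for length in runs:
        bits.extend([value] * length)
        value ^= 1  # runs alternate between 0s and 1s
    return bits
```

A row of ten 0s, four 1s, and six 0s encodes as `[10, 4, 6]`; how the run lengths themselves are stored (fixed or variable width) would determine the actual byte savings.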

Further, although not illustrated in this embodiment, the area match detection between the face recognition area and the focus area may be performed at shooting time, and only an identifier such as flag information, marking the frame as a matching image, may be recorded as metadata in the image data. The device then only needs to check this flag to display the matching images, which can be expected to further improve content searchability.

Face recognition information need not be generated at recording time. If no face recognition information is present when the image data is reproduced, face image detection may be performed on the decoded image data to obtain the matching image detection flag. When the matching image detection flag is detected, one or more of the following are performed: menu screen display, notification to the user that the image is a recommended print image, and narrowing of the output images.

In embodiments of the present invention, the face recognition information is preferably recorded as metadata in the image data for efficient access to the recording medium; however, it may instead be held as management file data, separate from the image data, that describes the attributes of the image data.

The playback device may be controlled by a single piece of hardware, the control microcomputer 110, or several pieces of hardware may share the processing to control the device as a whole. Although the present invention has been described in detail based on preferred embodiments, it is not limited to these specific embodiments, and various forms within the gist of the invention are also included. Furthermore, each embodiment described above is merely one embodiment of the present invention, and the embodiments may be combined as appropriate.

In the embodiments above, the present invention was described as applied to a video camera, but it is not limited to this and can be applied to any device capable of reproducing moving images, such as personal computers, PDAs, mobile phones, portable image viewers, digital photo frames, game consoles, and music players.

(Other embodiments)
The present invention can also be realized by the following processing: software (a computer program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or any of various computer-readable storage media, and a computer (or CPU, MPU, etc.) of that system or apparatus reads and executes the program code. In this case, the program and the storage medium storing it constitute the present invention.

Description of reference signs: 101 optical unit, 102 imaging unit, 103 face image recognition unit, 104 encoding unit, 105 recording/reproducing unit, 106 magnetic disk, 107 optical disc, 108 decoding unit, 109 display output unit, 110 control microcomputer, 111 system bus, 112 semiconductor memory, 113 external control, 114 external input/output interface, 115 external connection device

Claims

A playback device for playing back image data and displaying it on a display device, comprising:
face recognition information acquisition means for acquiring face recognition information representing an area in which a face portion corresponding to a subject of the image data exists;
focus information acquisition means for acquiring focus information representing an in-focus area of the image data;
area coincidence detection means for detecting coincidence between the face recognition area acquired by the face recognition information acquisition means and the focus area acquired by the focus information acquisition means; and
control means for controlling the display device to display, as a playback image, the image data that the area coincidence detection means has determined to match.

The playback device according to claim 1, wherein the area coincidence detection means comprises:
a coincidence degree detection unit that detects the degree of overlap between the face recognition area and the focus area; and
a coincidence determination unit that determines that the image data is a playback image when the degree of coincidence detected by the coincidence degree detection unit exceeds a predetermined threshold relative to the focus area.

The playback device according to claim 2, wherein the coincidence degree detection unit determines that the image data is a playback image when the entire face recognition area is contained within the focus area.
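The threshold test of claim 2 and the containment test of claim 3 can be illustrated with a minimal Python sketch. It assumes both areas are axis-aligned rectangles given as (x, y, width, height) in pixels, and it reads "degree of overlap relative to the focus area" as intersection area divided by focus area. The `Rect` type, the function names, and the 0.5 default threshold are hypothetical choices, not values taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (assumed representation)."""
    x: int
    y: int
    w: int
    h: int

    def area(self) -> int:
        return max(self.w, 0) * max(self.h, 0)


def intersection_area(a: Rect, b: Rect) -> int:
    """Area of the overlap between two rectangles (0 if they are disjoint)."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.w, b.x + b.w)
    bottom = min(a.y + a.h, b.y + b.h)
    return max(right - left, 0) * max(bottom - top, 0)


def coincidence_degree(face: Rect, focus: Rect) -> float:
    """Overlap measured relative to the focus area (assumed normalization)."""
    focus_area = focus.area()
    return intersection_area(face, focus) / focus_area if focus_area else 0.0


def is_playback_candidate(face: Rect, focus: Rect, threshold: float = 0.5) -> bool:
    """Threshold test: the degree of coincidence exceeds a hypothetical threshold."""
    return coincidence_degree(face, focus) > threshold


def face_inside_focus(face: Rect, focus: Rect) -> bool:
    """Containment test: the whole face area lies within the focus area."""
    return (focus.x <= face.x and focus.y <= face.y
            and face.x + face.w <= focus.x + focus.w
            and face.y + face.h <= focus.y + focus.h)
```

With a 40 x 40 face box inside a 60 x 60 focus box, the containment test passes while the overlap ratio is only about 0.44, so under these assumptions the two criteria can select different frames.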

The playback device according to claim 1, further comprising:
matching image search means for searching moving image data or a plurality of still image data for image data in which the face recognition area and the focus area match; and
thumbnail image generation means for generating a thumbnail image by reducing the matching image data found by the matching image search means.
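A rough sketch of the search and reduction steps of claim 4 follows. It assumes grayscale frames stored as lists of pixel rows, boxes given as (x, y, width, height), and a caller-supplied match predicate (in the usage below, the face box lying fully inside the focus box). Reduction by averaging integer-sized pixel blocks is an assumed choice; the claim only states that the matching image is reduced.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale pixel rows (assumed format)
Box = Tuple[int, int, int, int]  # (x, y, w, h), assumed representation


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def search_matching_frames(frames: Sequence[Tuple[Image, Box, Box]],
                           match: Callable[[Box, Box], bool]) -> List[int]:
    """Return indices of frames whose (face, focus) boxes satisfy `match`."""
    return [i for i, (_, face, focus) in enumerate(frames) if match(face, focus)]


def make_thumbnail(img: Image, factor: int) -> Image:
    """Reduce an image by averaging factor x factor blocks; edge remainders are dropped."""
    rows = len(img) // factor
    cols = len(img[0]) // factor if img else 0
    thumb = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + dy][c * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb
```

For example, `search_matching_frames(frames, lambda face, focus: contains(focus, face))` returns the indices of frames to reduce with `make_thumbnail`.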

The playback device according to claim 4, further comprising association means for associating the thumbnail image and the matching image data with a logical address on a recording medium,
wherein the control means performs control to display, on the display device, a menu screen in which the thumbnail images are arranged.
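The association in claim 5 can be pictured as an index from each thumbnail to the logical address of its full-size matching image, plus a layout for the menu. The sketch below assumes that logical block addresses are plain integers and that the menu is a fixed grid filled row by row. `ThumbnailEntry` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ThumbnailEntry:
    """Links a thumbnail to its source image by logical address (hypothetical schema)."""
    thumbnail_lba: int   # logical block address where the thumbnail is stored
    image_lba: int       # logical block address of the matching full image


def build_index(entries: List[ThumbnailEntry]) -> Dict[int, int]:
    """Map each thumbnail address to the address of the image it represents."""
    return {e.thumbnail_lba: e.image_lba for e in entries}


def menu_layout(entries: List[ThumbnailEntry], columns: int,
                cell_w: int, cell_h: int) -> List[Tuple[int, int, int]]:
    """Place thumbnails row by row on a grid; returns (x, y, thumbnail_lba) tuples."""
    return [((i % columns) * cell_w, (i // columns) * cell_h, e.thumbnail_lba)
            for i, e in enumerate(entries)]
```

When the user picks a thumbnail on the menu, looking up its address in the index gives the location from which to play back the matching image.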

The playback device according to claim 4, further comprising display output means for displaying and outputting the thumbnail image generated by the thumbnail image generation means on a print selection screen,
wherein the control means controls the display output means to display the thumbnail image on the print selection screen and to notify the user that the image is a recommended print image.

The playback device according to claim 1, further comprising:
drawing means for converting the face recognition area and the focus area into image data and superimposing them on the matching image; and
OSD (On Screen Display) means for displaying and outputting the image superimposed by the drawing means on the display device.
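Claim 7 overlays the two areas on the matching image for on-screen display. A minimal sketch, assuming a grayscale frame stored as a list of rows and one-pixel rectangle outlines whose values (1 for the face area, 2 for the focus area) stand in for OSD colors. An actual OSD path may composite a separate overlay plane rather than editing a copy of the frame as done here.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, w, h), assumed representation


def draw_outline(img: Image, box: Box, value: int) -> None:
    """Draw a 1-pixel rectangle outline in place, clipped to the image bounds."""
    x, y, w, h = box
    if w <= 0 or h <= 0 or not img:
        return
    height, width = len(img), len(img[0])
    for cx in range(x, x + w):            # top and bottom edges
        for cy in (y, y + h - 1):
            if 0 <= cx < width and 0 <= cy < height:
                img[cy][cx] = value
    for cy in range(y, y + h):            # left and right edges
        for cx in (x, x + w - 1):
            if 0 <= cx < width and 0 <= cy < height:
                img[cy][cx] = value


def superimpose(frame: Image, face: Box, focus: Box,
                face_value: int = 1, focus_value: int = 2) -> Image:
    """Return a copy of the frame with the focus and face outlines overlaid."""
    out = [row[:] for row in frame]
    draw_outline(out, focus, focus_value)
    draw_outline(out, face, face_value)   # face drawn last so it stays visible
    return out
```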

The playback device according to claim 1, further comprising encoding method determination means for determining the encoding method of each picture in moving image data,
wherein, for moving image data encoded using motion prediction and compressed using DCT (discrete cosine transform), when the coincidence degree detection means detects a plurality of matching images within a predetermined playback section, the control means performs control to narrow down the playback image selected as the matching image.

The playback device according to claim 8, wherein the playback image narrowing process includes an intra-frame coded image detection process, a forward-predictive coded image detection process, and an intra-macroblock count determination process, and
the control means outputs the detected image data as the playback image when intra-frame coded image data is detected; outputs the detected image data as the playback image when no intra-frame coded image data is detected but forward-predictive coded image data is detected; and, when image data of neither coding type is detected, outputs as the playback image the image data having the largest number of intra macroblocks or image data whose number of intra macroblocks is equal to or greater than a predetermined threshold.
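The priority order in claim 9 (intra-coded picture first, then forward-predicted picture, then intra-macroblock count) can be expressed as a small selection function. This sketch assumes that each candidate's picture type and intra-macroblock count are already known from the bitstream, that the earliest picture of a given type wins when several qualify, and that the threshold variant falls back to the maximum count when no picture reaches the threshold. `Candidate` and `narrow_down` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A matching picture within one playback section (hypothetical fields)."""
    index: int              # position in playback order
    picture_type: str       # "I", "P" or "B"
    intra_mb_count: int     # number of intra-coded macroblocks


def narrow_down(cands: Sequence[Candidate],
                mb_threshold: Optional[int] = None) -> Optional[Candidate]:
    """Pick one playback image: I picture, else P picture, else by intra MB count."""
    if not cands:
        return None
    for wanted in ("I", "P"):
        for c in cands:                 # assumption: earliest picture of the type wins
            if c.picture_type == wanted:
                return c
    # Neither an I nor a P picture among the candidates: use intra macroblocks.
    if mb_threshold is not None:
        for c in cands:
            if c.intra_mb_count >= mb_threshold:
                return c
    return max(cands, key=lambda c: c.intra_mb_count)
```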

The playback device according to claim 1, wherein the face recognition area information is converted into coordinate-format data or bitmap-format data based on the result of face image recognition that recognizes the face area of a person present within the frame, and is recorded in the image data as supplementary information.

The playback device according to claim 10, wherein the bitmap-format data is run-length compressed.
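Claims 10 and 11 describe storing the face area either as coordinates or as a run-length compressed bitmap. The sketch below rasterizes (x, y, width, height) boxes into a flattened, row-major 0/1 mask and encodes it as (value, run length) pairs. The mask resolution and the pair format are assumptions, since the claims do not fix a data layout.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), assumed coordinate format


def boxes_to_bitmap(boxes: List[Box], width: int, height: int) -> List[int]:
    """Rasterize face boxes into a flattened, row-major 0/1 mask."""
    mask = [0] * (width * height)
    for x, y, w, h in boxes:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                mask[row * width + col] = 1
    return mask


def rle_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a bit sequence as (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Inverse of rle_encode."""
    return [value for value, length in runs for _ in range(length)]
```

A single 2 x 2 face box in a 4 x 4 mask encodes to five runs instead of sixteen entries; masks with large uniform regions benefit most from this scheme.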

The playback device according to claim 1, further comprising flag acquisition means for acquiring a matching image detection flag when the image data is read and played back,
wherein, when the flag acquisition means detects the matching image detection flag, the control means performs control to execute one or more of: a menu screen display process that displays the thumbnail images side by side on the display device; a notification process that notifies the user that the image is a recommended print image; and a narrowing-down process that narrows down the image data to be output.
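Claim 12 lets playback act on a flag stored at recording time instead of re-running detection. A minimal sketch of that dispatch, assuming the supplementary information is exposed as a key/value mapping with a hypothetical `matching_image_flag` key, and that each follow-up process (menu display, print recommendation, narrowing) is a callable registered under a name.

```python
from typing import Callable, List, Mapping


def handle_flagged(info: Mapping[str, object],
                   actions: Mapping[str, Callable[[], str]],
                   enabled: List[str]) -> List[str]:
    """Run the enabled follow-up processes, but only if the flag is present."""
    if not info.get("matching_image_flag"):
        return []
    # Run each enabled process that is registered, in the requested order.
    return [actions[name]() for name in enabled if name in actions]
```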

A recording device comprising:
face recognition information acquisition means for acquiring face recognition information corresponding to a subject;
focus information acquisition means for acquiring focus information of captured image data;
area coincidence detection means for performing, when image data is recorded, area coincidence detection between the face recognition area acquired by the face recognition information acquisition means and the focus area acquired by the focus information acquisition means; and
recording means for recording a matching image detection flag in the corresponding image data as supplementary information when the area coincidence detection means detects that the face recognition area and the focus area coincide.
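On the recording side (claim 13), the check runs once per captured frame and its outcome is written into the frame's supplementary information. The sketch assumes a hypothetical `RecordedFrame` container and uses full containment of a face box in the focus box as the coincidence criterion. Following the claim wording, it writes the hypothetical `matching_image_flag` key only when a match is found.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), assumed representation


@dataclass
class RecordedFrame:
    """Image payload plus supplementary information (hypothetical container)."""
    payload: bytes
    info: Dict[str, object] = field(default_factory=dict)


def face_in_focus(face: Box, focus: Box) -> bool:
    """Containment test used here as the coincidence criterion (assumption)."""
    fx, fy, fw, fh = focus
    x, y, w, h = face
    return fx <= x and fy <= y and x + w <= fx + fw and y + h <= fy + fh


def record(payload: bytes, faces: List[Box], focus: Box) -> RecordedFrame:
    """Store the frame; add the matching-image flag if any face is in focus."""
    frame = RecordedFrame(payload)
    frame.info["face_areas"] = list(faces)
    if any(face_in_focus(f, focus) for f in faces):
        frame.info["matching_image_flag"] = True
    return frame
```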

A playback method for playing back image data and displaying it on a display device, comprising:
a face recognition information acquisition step of acquiring face recognition information representing an area in which a face portion corresponding to a subject of the image data exists;
a focus information acquisition step of acquiring focus information representing an in-focus area of the image data;
an area coincidence detection step of detecting coincidence between the face recognition area acquired in the face recognition information acquisition step and the focus area acquired in the focus information acquisition step; and
a control step of controlling the display device to display, as a playback image, the image data determined to be a match in the area coincidence detection step.

A recording method comprising:
a face recognition information acquisition step of acquiring face recognition information corresponding to a subject;
a focus information acquisition step of acquiring focus information of captured image data;
an area coincidence detection step of performing, when image data is recorded, area coincidence detection between the face recognition area acquired in the face recognition information acquisition step and the focus area acquired in the focus information acquisition step; and
a recording step of recording a matching image detection flag in the corresponding image data as supplementary information when it is detected in the area coincidence detection step that the face recognition area and the focus area coincide.

A program for causing a computer to execute a process of playing back image data and displaying it on a display device, the program causing the computer to execute:
a face recognition information acquisition step of acquiring face recognition information representing an area in which a face portion corresponding to a subject of the image data exists;
a focus information acquisition step of acquiring focus information representing an in-focus area of the image data;
an area coincidence detection step of detecting coincidence between the face recognition area acquired in the face recognition information acquisition step and the focus area acquired in the focus information acquisition step; and
a control step of controlling the display device to display, as a playback image, the image data determined to be a match in the area coincidence detection step.

A program causing a computer to execute:
a face recognition information acquisition step of acquiring face recognition information corresponding to a subject;
a focus information acquisition step of acquiring focus information of captured image data;
an area coincidence detection step of performing, when image data is recorded, area coincidence detection between the face recognition area acquired in the face recognition information acquisition step and the focus area acquired in the focus information acquisition step; and
a recording step of recording a matching image detection flag in the corresponding image data as supplementary information when it is detected in the area coincidence detection step that the face recognition area and the focus area coincide.

A computer-readable storage medium storing the program according to claim 16 or 17.