JP2022071984A

JP2022071984A - Imaging device, control method, and program

Info

Publication number: JP2022071984A
Application number: JP2020181149A
Authority: JP
Inventors: 隆川上; Takashi Kawakami
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-10-29
Filing date: 2020-10-29
Publication date: 2022-05-17
Also published as: CN114430450A; CN114430450B; US20220141389A1; US12003857B2

Abstract

PROBLEM TO BE SOLVED: To make it less likely that a subject the user wants to image is missed.
SOLUTION: An image pickup apparatus includes image pickup means for imaging a subject, drive means for driving the imaging direction of the image pickup means, and detection means for detecting a first voice command for executing a specific process and a second voice command for starting detection of the first voice command. The drive means limits its drive range in response to the detection means detecting the second voice command.
[Selection diagram] FIG. 7

Description

The present invention relates to an image pickup apparatus capable of recognizing voice commands.

In recent years, image pickup apparatuses capable of imaging their surroundings have been proposed. Such an image pickup apparatus aims to capture an image of a scene the user wants by automatically imaging a subject in response to recognizing a voice command uttered by the user.

Patent Document 1 discloses an image pickup apparatus that has an automatic shooting mode in which pan/tilt and zoom are driven to automatically search for and shoot a subject, and that allows manual shooting by a voice command while in the automatic shooting mode.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-106694

In voice-command processing generally, in order to prevent erroneous operation and the like, a device that detects voice commands detects a voice command for starting detection of a command word (a trigger word) before it detects the voice command for executing a predetermined process (the command word). However, Patent Document 1 gives no consideration to the image pickup apparatus using a trigger word. If such voice-command processing were applied to the image pickup apparatus disclosed in Patent Document 1, that apparatus would be expected to detect the trigger word and the command word while automatically searching for a subject in the automatic shooting mode. Because the process to be performed by the subsequent voice command has not yet been identified at the time the trigger word is detected, the apparatus would conceivably continue automatic shooting. However, when the command word uttered by the user is a voice command instructing that a subject be imaged, the subject the user wants to image is presumably the subject being imaged by the image pickup apparatus at the time the user started uttering the trigger word. If the automatic search nevertheless continues after the trigger word is detected, the subject the user wants to image may have left the angle of view by the time the command word is detected, and that subject may be missed. Accordingly, an object of the present invention is to make it less likely that the subject the user wants to image is missed when an imaging instruction is input using a voice command.

To achieve the above object, an image pickup apparatus of the present invention includes image pickup means for imaging a subject, drive means for driving the imaging direction of the image pickup means, and detection means for detecting a first voice command for executing a specific process and a second voice command for starting detection of the first voice command, wherein the drive means limits its drive range in response to the detection means detecting the second voice command.
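As an illustration only, the idea of limiting the drive range when the second voice command (the trigger word) is detected can be sketched as follows. The class `PanTiltDrive`, the function `on_voice_event`, the event name `"trigger_word"`, the angle ranges, and the 5-degree margin are all hypothetical and are not taken from this publication; the sketch simply clamps later pan/tilt targets to a narrow window around the direction held at the moment of detection.

```python
from dataclasses import dataclass


@dataclass
class PanTiltDrive:
    """Hypothetical pan/tilt drive model; all angles are in degrees."""
    pan: float = 0.0
    tilt: float = 0.0
    pan_range: tuple = (-180.0, 180.0)   # assumed mechanical pan range
    tilt_range: tuple = (-45.0, 45.0)    # assumed mechanical tilt range

    def limit_range(self, margin: float) -> None:
        # Narrow the allowed range to +/- margin around the current direction,
        # never exceeding the range that was allowed before.
        self.pan_range = (max(self.pan_range[0], self.pan - margin),
                          min(self.pan_range[1], self.pan + margin))
        self.tilt_range = (max(self.tilt_range[0], self.tilt - margin),
                           min(self.tilt_range[1], self.tilt + margin))

    def move_to(self, pan: float, tilt: float) -> None:
        # Clamp the requested target to the currently allowed range.
        self.pan = min(max(pan, self.pan_range[0]), self.pan_range[1])
        self.tilt = min(max(tilt, self.tilt_range[0]), self.tilt_range[1])


def on_voice_event(drive: PanTiltDrive, event: str, margin: float = 5.0) -> None:
    # Assumed event name: "trigger_word" stands for detection of the
    # second voice command described in the text.
    if event == "trigger_word":
        drive.limit_range(margin)
```

With this sketch, a search routine that keeps calling `move_to` after the trigger word can no longer swing the lens barrel far from the subject that was in view when the user started speaking.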

According to the present invention, it is possible to make it less likely that the subject the user wants to image is missed when an imaging instruction is input using a voice command.

FIG. 1: (a) A diagram showing an example of the external appearance of the image pickup apparatus in the first embodiment. (b) A diagram for explaining the operation of the image pickup apparatus in the first embodiment.
FIG. 2: A diagram showing the configuration of the image pickup apparatus in the first embodiment.
FIG. 3: A diagram showing the configuration of the image pickup apparatus and an external device in the first embodiment.
FIG. 4: A diagram showing the configuration of the external device in the first embodiment.
FIG. 5: A flowchart showing the automatic shooting process in the first embodiment.
FIG. 6: A diagram for explaining area division within a captured image in the first embodiment.
FIG. 7: A flowchart showing the voice recognition process in the first embodiment.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiments described below are examples of means for realizing the present invention, and may be modified or changed as appropriate depending on the configuration of the apparatus to which the present invention is applied and on various conditions. The embodiments may also be combined as appropriate.

[First Embodiment]
<Configuration of image pickup apparatus>
FIG. 1 is a diagram showing the configuration of the image pickup apparatus according to the first embodiment.

The image pickup apparatus 101 shown in FIG. 1(a) is provided with operation members, including a power switch that can be operated to switch the power on and off. The operation members also include a touch panel.

The lens barrel 102 is a housing that contains an optical lens group and an image pickup element. The lens barrel 102 is attached to the image pickup apparatus 101. The tilt rotation unit 104 and the pan rotation unit 105 are rotation mechanisms capable of rotationally driving the lens barrel 102 with respect to the fixed portion 103. The tilt rotation unit 104 is, for example, a motor capable of rotating the lens barrel 102 in the pitch direction shown in FIG. 1(b). The pan rotation unit 105 is, for example, a motor capable of rotating the lens barrel 102 in the yaw direction shown in FIG. 1(b). With the tilt rotation unit 104 and the pan rotation unit 105, the lens barrel 102 can be rotationally driven about one or more axes. In the present embodiment, the Y axis shown in FIG. 1(b) is the rotation axis of the pan rotation unit 105, and the positive direction of the Z axis shown in FIG. 1(b) is the front direction of the image pickup apparatus 101.

The angular velocity meter 106 and the accelerometer 107 are, for example, a gyro sensor and an acceleration sensor, respectively, and are arranged in the fixed portion 103 of the image pickup apparatus 101. The image pickup apparatus 101 detects its own vibration based on the values measured by the angular velocity meter 106 and the accelerometer 107. Based on the detected vibration, the image pickup apparatus 101 rotationally drives the tilt rotation unit 104 and the pan rotation unit 105, and can thereby generate an image in which shake and tilt of the lens barrel 102 are corrected.

FIG. 2 is a block diagram showing the configuration of the image pickup apparatus 101 of the present embodiment.

The first control unit 223 includes a processor (for example, a CPU, GPU, microprocessor, or MPU), a memory (for example, DRAM or SRAM), and the like. The first control unit 223 executes various processes to control each block of the image pickup apparatus 101 and to control data transfer between the blocks. The first control unit 223 is an example of control means and determination means.

The non-volatile memory 216 is a memory in which data can be recorded and erased, and it records constants, programs, and the like for the operation of the first control unit 223.

The zoom unit 201 is an optical lens group constituting a zoom lens that changes the zoom magnification. The zoom drive control unit 202 is a control unit that drives and controls the optical lenses of the zoom unit 201. The focus unit 203 is an optical lens group that adjusts focus. The focus drive control unit 204 drives and controls the optical lenses of the focus unit 203. In the image pickup unit 206, the image pickup element receives light incident through the optical lens groups and outputs charge information corresponding to the amount of light to the image processing unit 207 as analog image data. The zoom unit 201, the zoom drive control unit 202, the focus unit 203, the focus drive control unit 204, and the image pickup unit 206 are included in the lens barrel 102.

The image processing unit 207 performs image processing such as distortion correction, white balance adjustment, and color interpolation on the image data input from the image pickup unit 206, and outputs digital image data. The digital image data output from the image processing unit 207 is converted by the image recording unit 208 into an image file format such as JPEG or a moving image file format such as MPEG. The converted digital image data is transmitted to the memory 215 and to the video output unit 217 described later. When the digital image data held in the memory 215 is to be recorded, the first control unit 223 outputs the digital image data to the recording/reproducing unit 220.

The lens barrel rotation drive unit 205 drives the tilt rotation unit 104 and the pan rotation unit 105 to drive the lens barrel 102 in the tilt direction and the pan direction. The lens barrel rotation drive unit 205 is an example of the drive means.

The device shake detection unit 209 is equipped with, for example, the angular velocity meter 106, which detects the angular velocity of the image pickup apparatus 101 about three axes, and the accelerometer 107, which detects the acceleration of the apparatus along three axes. Based on the signals detected by the angular velocity meter 106 and the accelerometer 107, the device shake detection unit 209 calculates the rotation angle of the apparatus, the shift amount of the apparatus, and the like.
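The text does not state how the rotation angle is computed from the sensor signals. One common approach, shown here only as a minimal sketch under that assumption, is to integrate the angular velocity samples over time. The function name `integrate_angle`, the units (degrees per second), and the fixed sampling interval are hypothetical.

```python
def integrate_angle(angular_velocities_dps, dt_s, initial_deg=0.0):
    """Estimate a rotation angle by rectangular integration of gyro samples.

    angular_velocities_dps: angular velocity samples about one axis, in deg/s
    dt_s: sampling interval in seconds (assumed constant)
    """
    angle = initial_deg
    for omega in angular_velocities_dps:
        angle += omega * dt_s  # accumulate omega * dt for each sample
    return angle
```

For example, 100 samples of 10 deg/s taken at 10 ms intervals integrate to roughly 10 degrees. A real implementation would also need to handle gyro bias and drift, which this sketch ignores.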

The voice input unit 213 has a plurality of microphones. The voice input unit 213 performs A/D conversion on the audio signals input from the microphones and outputs the result to the voice processing unit 214.

The voice processing unit 214 can detect the direction of a sound on the plane in which the plurality of microphones are installed. The detected sound direction can be used for the search and automatic imaging described later. The voice processing unit 214 can also recognize specific voice commands. In the present embodiment, there are two types of specific voice command: trigger words and command words. A trigger word is a command that serves as a trigger for starting recognition of a command word. For example, a trigger word is a command consisting of a specific keyword uttered by the user, such as "OK, camera". A command word is a command for instructing the image pickup apparatus 101 to perform a predetermined process. Examples of the predetermined process include a still image imaging process, a moving image imaging start process, a moving image imaging end process, a sleep process, a subject change process, and an automatic imaging process. A command word consists of a keyword that differs for each predetermined process, for example "take a still image" for the still image imaging process and "take a moving image" for the moving image imaging start process. These voice commands are recorded in advance in the memory 215 of the image pickup apparatus 101. In addition to the voice commands recorded in advance, the image pickup apparatus 101 may be configured so that the user can register voice commands for executing arbitrary processes.
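The two-stage recognition described above (a trigger word first, then a command word) can be sketched as a small state machine. This is an illustrative sketch, not the recognition method of the publication: the class `VoiceCommandRecognizer`, the action names such as `"capture_still"`, and the exact-match comparison on already-transcribed text are assumptions. The phrases themselves are the examples given in the text.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_TRIGGER = auto()   # only the trigger word is accepted
    WAIT_COMMAND = auto()   # a command word is accepted


TRIGGER_WORD = "ok, camera"
# Phrase -> hypothetical action name (the action names are assumptions).
COMMANDS = {
    "take a still image": "capture_still",
    "take a moving image": "start_movie",
}


class VoiceCommandRecognizer:
    def __init__(self):
        self.state = State.WAIT_TRIGGER

    def feed(self, phrase):
        """Feed one recognized phrase; return an action name or None."""
        text = phrase.strip().lower()
        if self.state is State.WAIT_TRIGGER:
            # Command words are ignored until the trigger word is heard.
            if text == TRIGGER_WORD:
                self.state = State.WAIT_COMMAND
            return None
        action = COMMANDS.get(text)
        if action is not None:
            # A command was accepted; require the trigger word again.
            self.state = State.WAIT_TRIGGER
        return action
```

In this sketch a command word spoken without a preceding trigger word is ignored, which is the erroneous-operation safeguard the trigger word is meant to provide. A timeout for returning from `WAIT_COMMAND` to `WAIT_TRIGGER` is omitted for brevity.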

The voice processing unit 214 also performs audio-related processing, such as optimization and encoding, on the input audio signal. The audio signal processed by the voice processing unit 214 is transmitted to the memory 215 by the first control unit 223. The memory 215 temporarily records the data input from the image recording unit 208 and the audio signal input from the voice processing unit 214. When this audio signal is to be recorded, the first control unit 223 outputs the audio signal from the memory 215 to the recording/reproducing unit 220.

The recording/reproducing unit 220 records image data, audio signals, other control data related to imaging, and the like on the recording medium 221. The recording medium 221 may be a recording medium built into the image pickup apparatus 101 or a removable recording medium. The recording medium 221 can record various data such as image data and audio signals. In the present embodiment, the recording medium 221 has a larger capacity than the non-volatile memory 216. For example, the recording medium 221 is a hard disk, an optical disk, a magneto-optical disk, a CD-R, a DVD-R, a magnetic tape, a non-volatile semiconductor memory, a flash memory, or the like.

The recording/reproducing unit 220 can also read (reproduce) the image data, audio signals, various data, and programs recorded on the recording medium 221. When reproducing image data and an audio signal recorded on the recording medium 221, the first control unit 223 operates as follows. The first control unit 223 outputs the image data and the audio signal read by the recording/reproducing unit 220 to the image processing unit 207 and the voice processing unit 214, respectively. The image processing unit 207 and the voice processing unit 214 decode the image data and the audio signal, respectively, and output the decoded signals to the video output unit 217 and the audio output unit 218, respectively.

The second control unit 211 controls the power supplied to the first control unit 223. For example, the second control unit 211 includes a processor (for example, a CPU, microprocessor, or MPU), a memory (for example, DRAM or SRAM), and the like. In the present embodiment, the second control unit 211 is provided separately from the first control unit 223, which controls the entire main system of the image pickup apparatus 101.

The first power supply unit 210 and the second power supply unit 212 supply electric power for operating the first control unit 223 and the second control unit 211, respectively. In the present embodiment, the power supplied by the first power supply unit 210 is larger than the power supplied by the second power supply unit 212, and the first power supply unit 210 and the second power supply unit 212 are selected according to the amount of power to be supplied. For example, the first power supply unit 210 is a switch for supplying power to the first control unit 223, and the second power supply unit 212 is a lithium battery or an alkaline dry battery. When the power switch provided on the image pickup apparatus 101 is pressed, power is first supplied to the second control unit 211 and then to the first control unit 223.

The image pickup apparatus 101 also has a sleep state. In the sleep state, the first control unit 223 controls the first power supply unit 210 so as to turn off the power supply to the first control unit 223. Even in the sleep state, in which no power is supplied to the first control unit 223, the second control unit 211 continues to operate and acquires information from the device shake detection unit 209 and the voice processing unit 214. Based on this input information, the second control unit 211 determines whether to activate the first control unit 223. When the second control unit 211 determines that the first control unit 223 is to be activated (that is, the sleep state is to be released), it controls the first power supply unit 210 to supply power to the first control unit 223.
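The wake-up determination made by the second control unit 211 can be pictured with a short sketch. The text only says that the determination uses information from the device shake detection unit 209 and the voice processing unit 214; the specific criteria below (a shake magnitude threshold and a set of wake phrases), the function name, and the parameter names are all assumptions made for illustration.

```python
def should_wake_main_controller(shake_magnitude, detected_phrase,
                                shake_threshold=1.0,
                                wake_phrases=("ok, camera",)):
    """Decide whether to power up the main (first) control unit.

    shake_magnitude: hypothetical scalar derived from the shake detector
    detected_phrase: phrase reported by the voice processor, or None
    Both criteria and both default values are illustrative assumptions.
    """
    # Wake when the apparatus has been moved noticeably.
    if shake_magnitude >= shake_threshold:
        return True
    # Wake when a registered wake phrase has been heard.
    if detected_phrase is not None:
        return detected_phrase.strip().lower() in wake_phrases
    return False
```

Keeping this check on a separate low-power controller is what lets the main controller stay unpowered until one of the inputs calls for it.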

The audio output unit 218 outputs an audio signal such as an electronic shutter sound from a speaker built into the image pickup apparatus 101, for example at the time of imaging. The LED control unit 224 controls an LED provided on the image pickup apparatus 101 so that it lights or blinks in a preset pattern, for example at the time of imaging.

The video output unit 217 consists of, for example, a video output terminal, and outputs an image signal for displaying video on a connected external display or the like. The audio output unit 218 and the video output unit 217 may be a single integrated terminal, for example an interface such as an HDMI (registered trademark) (High-Definition Multimedia Interface (registered trademark)) terminal.

The communication unit 222 is an interface for communication between the image pickup apparatus 101 and an external device. The communication unit 222 transmits and receives data such as audio signals and image data. When the communication unit 222 receives a control signal related to imaging, such as imaging start, imaging end, pan drive, tilt drive, or zoom drive, the first control unit 223 drives the image pickup apparatus 101 in accordance with that control signal. The communication unit 222 has wireless communication modules such as an infrared communication module, a Bluetooth (registered trademark) communication module, a wireless LAN communication module, Wireless USB, and a GPS receiver.

The subject detection unit 225 reads the image data output from the image processing unit 207 from the memory 215 and recognizes subjects such as persons and objects. For example, when recognizing a person, the subject detection unit 225 detects the subject's face. Patterns for determining a subject's face are registered in advance in the image pickup apparatus 101, and each pattern is given an identifier for distinguishing individual subjects. In the face detection process, the subject detection unit 225 detects a subject's face by detecting a portion of the captured image that matches a pattern for determining a subject's face. The subject detection unit 225 can also distinguish between a plurality of registered persons.

The subject detection unit 225 also calculates, at the same time, a reliability indicating how likely the detected region is to be a subject's face. The reliability is calculated from, for example, the size of the face region in the image and the degree of matching with the face pattern. By performing pattern matching on the subject's face in the image, the subject detection unit 225 can detect face information such as whether the detected face is smiling, whether the eyes are open, and the orientation of the face. The method of detecting face information is not limited to pattern matching; known techniques, such as methods using deep learning, can be used. The subject detection unit 225 is an example of the detection means.

In the object recognition process, the subject detection unit 225 can recognize an object by determining whether it matches a pattern registered in advance. Alternatively, the subject detection unit 225 can recognize an object by extracting a feature amount of the subject using histograms of hue, saturation, and the like in the captured image.

By the above methods, the first control unit 223 can detect a subject from captured image data by means of the subject detection unit 225.

<System configuration with external device>
FIG. 3 is a diagram showing a configuration example of a wireless communication system between the image pickup apparatus 101 and a smart device 301. The image pickup apparatus 101 is, for example, a digital camera. The smart device 301 is, for example, a smartphone including a Bluetooth communication module and a wireless LAN communication module.

In the present embodiment, the image pickup apparatus 101 and the smart device 301 can communicate via two communication paths. One is a communication path 302 using a wireless LAN conforming to, for example, the IEEE 802.11 standard series. The other is a communication path 303 that has a master-slave relationship, such as that between a control station and a subordinate station, for example Bluetooth Low Energy (hereinafter "BLE"). Wireless LAN and BLE are examples of communication methods. Other communication methods may be used as long as each communication device has two or more communication functions and, for example, one communication function that communicates in a control station/subordinate station relationship can control the other communication function. Without loss of generality, however, it is assumed that the first communication, such as wireless LAN, is capable of faster communication than the second communication, such as BLE, and that the second communication consumes less power than the first communication, has a shorter communicable range, or both.

<Configuration of external device>
The configuration of the smart device 301, which is an example of the external device, will be described with reference to FIG. 4. The smart device 301 is a so-called smartphone or mobile phone; that is, it is a portable terminal.

The smart device 301 has, for example, a wireless LAN control unit 401 for wireless LAN communication and a BLE control unit 402 for BLE communication, as well as a public line control unit 406 for public wireless communication. The smart device 301 also has a packet transmission/reception unit 403.

The wireless LAN control unit 401 performs RF control of the wireless LAN, communication processing, and various controls of wireless LAN communication conforming to the IEEE 802.11 standard series. The wireless LAN control unit 401 also performs protocol processing related to wireless LAN communication. The BLE control unit 402 performs RF control of BLE, communication processing, and various controls of BLE communication, and also performs protocol processing related to BLE communication. The public line control unit 406 performs RF control of public wireless communication, communication processing, and various controls of public wireless communication, and also performs protocol processing related to public wireless communication. The public wireless communication conforms to, for example, the IMT (International Multimedia Telecommunications) standard or the LTE (Long Term Evolution) standard. The packet transmission/reception unit 403 performs processing for executing at least one of transmission and reception of packets for wireless LAN communication, BLE communication, and public wireless communication. In the present embodiment, the smart device 301 is described as performing at least one of packet transmission and packet reception in communication, but communication formats other than packet switching, such as circuit switching, may be used.

The smart device 301 of the present embodiment further includes a control unit 411, a recording unit 404, a GPS receiving unit 405, a display unit 407, an operation unit 408, a voice input voice processing unit 409, and a power supply unit 410. The control unit 411 controls the entire smart device 301, for example by executing a program recorded in the recording unit 404. The recording unit 404 records, for example, the program executed by the control unit 411 and various information such as parameters required for communication. The various operations of the smart device 301 described later are realized by the control unit 411 executing the program recorded in the recording unit 404.

The power supply unit 410 supplies power to each unit of the smart device 301. The display unit 407 has, for example, a function of outputting visually recognizable information, as with an LCD or LEDs, and a function of outputting sound, as with a speaker, and displays and outputs various information. The operation unit 408 is, for example, a button or the like that accepts the user's operations on the smart device 301. The display unit 407 and the operation unit 408 may be configured as a common member such as a touch panel.

The voice input/voice processing unit 409 has, for example, a microphone, and performs voice recognition processing on the voice signal input from the microphone. The voice input/voice processing unit 409 can recognize, by voice, the user's operations on the image pickup device 101. In this voice recognition processing, the voice input/voice processing unit 409 uses a dedicated application to recognize a voice command uttered by the user from the voice signal input from the microphone. The smart device 301 can also register, in the image pickup device 101, for example via the communication path 302, a voice command for causing the voice processing unit 214 of the image pickup device 101 to execute a specific process.

The GPS (Global Positioning System) receiving unit 405 estimates the current position (longitude and latitude information) of the smart device 301 by analyzing GPS signals received from satellites. When the estimated current position is within a preset range (within a predetermined radius), such as the user's home, the smart device 301 notifies the image pickup device 101 of the current position information via the BLE control unit 402. The image pickup device 101 can use this current position as a parameter for the automatic imaging and automatic editing described later. When the position changes, for example when the current position leaves the preset range, the smart device 301 notifies the image pickup device 101 of movement information via the BLE control unit 402, and the image pickup device 101 uses it as a parameter for automatic imaging and automatic editing. The smart device 301 may also estimate its current position from information on surrounding wireless networks by using WPS (Wi-Fi Positioning System) or the like.
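The range check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the haversine distance, the function names, and the message labels `"current_position"` and `"movement"` are hypothetical and do not appear in the embodiment, which only states that the preset range is a predetermined radius.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (approximation)


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_preset_range(current, center, radius_m):
    """True when the (lat, lon) pair `current` lies within radius_m of `center`."""
    return distance_m(*current, *center) <= radius_m


def notification_for(was_inside, is_inside):
    """Hypothetical choice of what the smart device sends over BLE."""
    if is_inside:
        return "current_position"  # position is within the preset range
    if was_inside:
        return "movement"          # position has just left the preset range
    return None                    # outside before and after: nothing to report
```

For example, `notification_for(True, False)` yields `"movement"`, which corresponds to the case in which the smart device leaves the preset range.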

As described above, the image pickup device 101 and the smart device 301 exchange data with each other by communication using the wireless LAN control unit 401 and the BLE control unit 402. For example, the image pickup device 101 and the smart device 301 transmit and receive data such as voice signals and image data. The smart device 301 can also transmit, to the image pickup device 101, setting information related to imaging by the image pickup device 101, as well as control signals related to the imaging process of the image pickup device 101 and position information.

<Automatic imaging process>
The automatic imaging process is a process in which the first control unit 223 determines the timing of imaging and automatically images a subject. In this process, the first control unit 223 repeatedly determines a subject to be imaged and images it automatically, either when it determines that a good still image or moving image can be captured or when a certain amount of time has passed. As a result, the image pickup device 101 can record good scenes that appear unexpectedly in daily life, as well as subtle everyday changes, without the user having to capture images manually.

FIG. 5 is a flowchart of the automatic imaging process of the image pickup device 101 according to the present embodiment. The process of this flowchart starts when the user turns on the power switch of the image pickup device 101. In the present embodiment, a wireless connection has been established between the image pickup device 101 and the smart device 301, and the user can perform various operations on the image pickup device 101 from a dedicated application on the smart device 301. The processing of each step of the following flowchart is realized by the first control unit 223 controlling each unit of the image pickup device 101.

In step S501, the first control unit 223 determines whether the automatic imaging process is in a stopped state. Stopping of the automatic imaging process is explained later in the description of the voice recognition process. If the automatic imaging process is stopped, the first control unit 223 waits until the stop is released; that is, step S501 is repeated until the stop of the automatic imaging process is released. If the automatic imaging process is not stopped, the process of step S502 is executed.

In step S502, the first control unit 223 causes the image processing unit 207 to process the signal captured by the image pickup unit 206 and generate an image for subject recognition. The first control unit 223 then controls the subject detection unit 225 to perform subject recognition, such as person recognition and animal recognition, on the generated image. For example, patterns for determining subjects are held in advance, and the subject detection unit 225 determines a subject based on the degree of matching between a held pattern and a pattern contained in the image for subject recognition. Here, the subject detection unit 225 also performs personal identification of the subject. In addition to determining the subject, the first control unit 223 detects the position of the subject within the angle of view.

In step S503, the first control unit 223 calculates the image shake correction amount. Specifically, the first control unit 223 first calculates the absolute angle of the image pickup device based on the angular velocity and acceleration information acquired by the device shake detection unit 209. The first control unit 223 then obtains the anti-vibration angle by which the tilt rotation unit 104 and the pan rotation unit 105 are to be moved in the direction that cancels the absolute angle, and uses this angle as the image shake correction amount.
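The calculation in step S503 can be illustrated with the following Python sketch for a single axis. The embodiment does not state how the angular velocity and acceleration are combined into an absolute angle; the complementary filter used here, the weight `alpha`, and all function names are assumptions made purely for illustration.

```python
import math


def absolute_angle_deg(prev_angle_deg, gyro_rate_dps, accel_y, accel_z, dt, alpha=0.98):
    """Estimate the absolute angle of the device about one axis.

    Assumed technique: a complementary filter that blends the integrated
    gyro rate (responsive, but drifts) with the gravity direction from the
    accelerometer (noisy, but drift-free).
    """
    gyro_angle = prev_angle_deg + gyro_rate_dps * dt
    accel_angle = math.degrees(math.atan2(accel_y, accel_z))
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle


def shake_correction_deg(absolute_angle):
    """Anti-vibration angle: rotate by the amount that cancels the absolute angle."""
    return -absolute_angle
```

In this sketch, the value returned by `shake_correction_deg` plays the role of the image shake correction amount that is later combined with the pan and tilt angles of the search target.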

In step S504, the first control unit 223 performs the subject search process. The subject search process consists of the following processes.

(1) Area division
Area division will be described with reference to FIGS. 6(a) to 6(c). In FIGS. 6(a) to 6(c), areas on a spherical surface are divided with the position of the image pickup device as the origin O. In the example of FIG. 6(a), the areas are divided every 22.5 degrees in both the tilt direction and the pan direction. With the division shown in FIG. 6(a), the horizontal circumference becomes smaller as the tilt angle moves away from 0 degrees, and each area therefore becomes smaller. For this reason, as shown in FIG. 6(b), the image pickup device of the present embodiment sets the horizontal range of an area to be larger than 22.5 degrees when the tilt angle is 45 degrees or more.

Next, the areas within the angle of view of an image captured by the image pickup device 101 will be described with reference to FIGS. 6(c) and 6(d). The axis 1301 is the reference direction for the imaging direction of the image pickup device 101, and the area division is performed with respect to this direction. The axis 1301 is, for example, the imaging direction at the time the image pickup device 101 was started, or a direction defined in advance as the reference for the imaging direction. The area 1302 is the angle-of-view area of the image being captured by the image pickup unit 206. FIG. 6(d) is an example of a through image captured by the image pickup unit 206 in the area 1302. Within the angle of view of the through image of FIG. 6(d), the image is divided into areas 1303 to 1318 based on the area division shown in FIG. 6(c).
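The pan/tilt division described in (1) can be sketched as follows. The widened pan step of 45 degrees, the treatment of negative tilt angles by magnitude, and the function names are assumptions; the description only states that the horizontal range is set larger than 22.5 degrees when the tilt angle is 45 degrees or more.

```python
import math

BASE_STEP_DEG = 22.5   # division step in both directions, as in FIG. 6(a)
WIDE_TILT_DEG = 45.0   # tilt magnitude from which the pan step is widened


def tilt_band(tilt_deg):
    """Index of the 22.5-degree tilt band that contains tilt_deg."""
    return math.floor(tilt_deg / BASE_STEP_DEG)


def pan_step_deg(band, wide_step_deg=45.0):
    """Pan width of one area in the given tilt band (wide_step_deg is assumed)."""
    # Smallest tilt magnitude inside the band [band * 22.5, (band + 1) * 22.5).
    min_abs_tilt = band * BASE_STEP_DEG if band >= 0 else -(band + 1) * BASE_STEP_DEG
    return wide_step_deg if min_abs_tilt >= WIDE_TILT_DEG else BASE_STEP_DEG


def area_index(pan_deg, tilt_deg):
    """Map a direction, measured from the reference axis, to (tilt band, pan index)."""
    band = tilt_band(tilt_deg)
    pan_idx = math.floor((pan_deg % 360.0) / pan_step_deg(band))
    return band, pan_idx
```

With these assumptions, a direction of pan 50 degrees and tilt 10 degrees falls in area `(0, 2)`, whereas the same pan angle at tilt 50 degrees falls in the wider area `(2, 1)`.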

(2) Calculation of the importance level for each area
For each of the areas divided as described above, an importance level indicating the priority for subject search is calculated according to the subjects present in the area and the scene conditions of the area. The importance level based on the subject conditions is calculated from, for example, the number of subjects present in the area, the size of a subject's face, the orientation of the face, the certainty of the face detection, the subject's facial expression, and the result of personal identification of the subject. The importance level based on the scene conditions is calculated from, for example, general object recognition results, scene determination results (blue sky, backlight, evening scene, and so on), the level of sound coming from the direction of the area and voice recognition results, and motion detection information for the area. Here, the first control unit 223 drives the imaging direction so as to search the entire circumference around the image pickup device 101.

For example, when a subject's face has been registered, the first control unit 223 raises the importance level of an area in which the registered face is detected. The subject's face is recorded, for example, in the non-volatile memory 216 as a pattern for determining the subject. When the first control unit 223 has raised the importance level of an area in which a subject's face was detected, it returns the importance level of that area to its original value once a predetermined time has elapsed or a predetermined number of images have been captured.

(3) Determination of the search area
After determining the importance level of each area as described above, the first control unit 223 decides to search intensively in areas with a high importance level. The first control unit 223 then calculates the pan angle and the tilt angle required to image one of the areas with a high importance level.
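A minimal sketch of steps (2) and (3) is given below. The description lists the factors that contribute to the importance level but gives no formula, so the weighted sum, every weight value, and the `Area` fields are hypothetical and serve only to show the flow from per-area importance to a pan/tilt target.

```python
from dataclasses import dataclass


@dataclass
class Area:
    pan_center_deg: float
    tilt_center_deg: float
    subject_count: int = 0
    registered_face: bool = False   # a registered face was detected in this area
    sound_level: float = 0.0        # level of sound coming from this direction
    motion_detected: bool = False


def importance(area: Area) -> float:
    """Importance level of one area (all weights are illustrative assumptions)."""
    level = 10.0 * area.subject_count
    if area.registered_face:
        level += 50.0               # registered faces raise the level
    level += 5.0 * area.sound_level
    if area.motion_detected:
        level += 8.0
    return level


def next_target(areas):
    """Pan and tilt angles needed to image the area with the highest importance."""
    best = max(areas, key=importance)
    return best.pan_center_deg, best.tilt_center_deg
```

The temporary nature of the boost for a registered face (it is reverted after a predetermined time or number of captures) is not modeled here; it could be added by clearing `registered_face` when either condition is met.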

In step S505, the first control unit 223 performs pan driving and tilt driving. Specifically, the pan drive amount and the tilt drive amount are calculated based on the image shake correction amount and on the pan angle and tilt angle calculated in step S504. Based on the calculated drive amounts, the first control unit 223 then drives and controls the tilt rotation unit 104 and the pan rotation unit 105 through the lens barrel rotation drive unit 205. In the present embodiment, it is assumed that, as a result of the driving in step S505, the first control unit 223 detects a subject in an area with a high importance level and starts imaging that subject. The first control unit 223 then controls the lens barrel rotation drive unit 205 so as to track the subject, that is, to keep it within the angle of view.

In step S506, the first control unit 223 controls the zoom unit 201 to perform zoom driving. For example, the zoom is driven according to the state of the subject whose imaging was started in step S505. When the subject's face appears very small within the angle of view, the first control unit 223 zooms toward the telephoto side so that the face is captured at an appropriate (larger) size within the angle of view. Conversely, when the subject's face appears very large within the angle of view, the first control unit 223 zooms toward the wide-angle side so that the face is captured at an appropriate (smaller) size. Performing zoom control in this way maintains a state suitable for tracking the subject.
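The zoom decision of step S506 could be expressed as in the sketch below. The thresholds `low` and `high` and the use of face height relative to frame height are assumptions; the description only distinguishes faces that are "very small" or "very large" within the angle of view.

```python
def zoom_direction(face_height_px, frame_height_px, low=0.10, high=0.40):
    """Decide the zoom direction from the relative face size.

    `low` and `high` are hypothetical bounds on face height / frame height.
    """
    ratio = face_height_px / frame_height_px
    if ratio < low:
        return "tele"   # face too small: zoom toward the telephoto side
    if ratio > high:
        return "wide"   # face too large: zoom toward the wide-angle side
    return "hold"       # face already at a suitable size
```

Keeping a dead band between the two thresholds is a design choice made here to avoid repeated zooming in and out when the face size hovers near a single threshold.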

Steps S504 to S506 describe a method in which the subject search is performed by pan driving, tilt driving, and zoom driving; however, the subject search may instead be performed by an imaging system that uses a plurality of wide-angle lenses to capture all directions at once. In that case, performing image processing such as subject detection on all of the signals obtained by omnidirectional imaging as the input image would impose a heavy processing load. Therefore, part of the image obtained by omnidirectional imaging is cut out, and the subject search process is performed within the cut-out image range. In this configuration, the first control unit 223 calculates the importance level of each area in the same manner as described above, changes the cut-out position based on the importance level, and performs the automatic imaging determination described later. This enables high-speed subject search while suppressing the power consumed by image processing.

In step S507, the first control unit 223 determines whether to perform automatic imaging.

The determination of whether to perform automatic imaging is now described. The determination is made according to whether an imaging score exceeds a predetermined value. The imaging score is a parameter used for determining whether to perform automatic imaging, and points are added to it according to the subject detection status and the passage of time. Consider, as an example, a design in which automatic imaging is performed when the imaging score exceeds 2000 points. The imaging score has an initial value of 0 points, and points are added as time passes from the moment the automatic imaging mode is entered. For example, the imaging score increases at a rate such that it reaches 2000 points after 120 seconds. If 120 seconds elapse without any subject being detected, the score reaches 2000 points through the time-based addition alone, and imaging is performed. If a high-priority subject is detected while time is passing, 1000 points are added. A high-priority subject is, for example, a subject that the user has designated to be imaged preferentially from among the subjects whose faces are registered in the image pickup device 101. While a high-priority subject is being detected, the score reaches 2000 points more easily, and as a result the imaging frequency tends to increase.

In addition, for example, when a subject's smile is recognized, 800 points are added. These smile-based points are added even when the subject is not a high-priority subject. In the present embodiment, the number of points added for a smile is the same regardless of whether the subject has a high priority, but this is not a limitation. For example, the points added upon detecting the smile of a high-priority subject may be higher than the points added upon detecting the smile of a subject whose priority is not high. This makes it possible to perform imaging that better matches the user's intention. If the score exceeds 2000 points as a result of points added for changes in the subject's facial expression, such as expressions of joy, anger, sorrow, or pleasure, automatic imaging is performed. Even if the points added for a change in facial expression do not bring the score past 2000 points, the score reaches 2000 points in a shorter time through the subsequent time-based addition.
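The scoring rules described so far can be sketched as follows, using the example figures from the text (2000-point threshold, 120 seconds, 1000 points for a high-priority subject, 800 points for a smile). The function names are hypothetical, and treating the threshold as "reached or exceeded" is an assumption, chosen because the text states that imaging is performed when the score reaches 2000 points at 120 seconds.

```python
THRESHOLD = 2000       # score at which automatic imaging is performed
PERIOD_S = 120         # time for the time-based points alone to reach the threshold
PRIORITY_BONUS = 1000  # added when a high-priority subject is detected
SMILE_BONUS = 800      # added when a smile is recognized


def imaging_score(elapsed_s, priority_subject=False, smile=False):
    """Imaging score with linear time-based points plus detection bonuses."""
    score = THRESHOLD * min(elapsed_s, PERIOD_S) / PERIOD_S
    if priority_subject:
        score += PRIORITY_BONUS
    if smile:
        score += SMILE_BONUS
    return score


def should_capture(score):
    """Assumption: imaging is performed once the score reaches the threshold."""
    return score >= THRESHOLD
```

For instance, a smile alone adds 800 points, so under linear addition the remaining 1200 points accumulate after 72 seconds instead of 120 seconds.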

In the description above, the time-based points are added linearly with respect to time; for example, when the score is to reach 2000 points in 120 seconds, 2000/120 points are added every second. However, the addition is not limited to this. For example, no points may be added for the first 110 seconds of the 120 seconds, and 200 points per second may then be added during the 10 seconds from 110 to 120 seconds so that the score reaches 2000 points. This prevents the points added for a change in the subject's facial expression from bringing the score to the imaging threshold regardless of the subject's priority. With linear addition, a large number of points has often already accumulated through the passage of time, so even the points added when a low-priority subject begins to smile frequently bring the score to the imaging threshold, and the difference in priority is hardly reflected. On the other hand, lowering the points added for a change in facial expression would cause the device to miss the moments when the expression changes, so lowering those points is not a desirable remedy. Therefore, no points are added until 110 seconds. In this way, 110 seconds elapse without any time-based points being added for a low-priority subject. In contrast, 1000 points are added as soon as a high-priority subject is detected, so the score already stands at 1000 points even though no time-based points are added until 110 seconds. As a result, when points are added for a change in facial expression, a low-priority subject is less likely than a high-priority subject to reach the imaging threshold, and the difference in priority takes effect more readily. Although a change in facial expression is used as the example above, other possible criteria for adding points include a voice becoming louder or gestures becoming larger. For these as well, a similar difference in the point addition method may be provided so that the difference in priority takes effect more readily.
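The difference between the two time-based addition methods can be checked numerically with the sketch below. The function names are hypothetical, the bonus is assumed to be present from the start of the period, and the threshold is treated as "reached or exceeded"; the figures (2000 points, 120 seconds, ramp from 110 seconds, bonuses of 800 and 1000 points) are the example values from the text.

```python
def time_score_linear(t, period=120, threshold=2000):
    """Time-based points added linearly over the whole period."""
    return threshold * min(t, period) / period


def time_score_delayed(t, ramp_start=110, period=120, threshold=2000):
    """No time-based points until ramp_start, then a linear ramp to the threshold."""
    if t <= ramp_start:
        return 0.0
    return threshold * (min(t, period) - ramp_start) / (period - ramp_start)


def first_capture_second(bonus, time_score, threshold=2000, period=120):
    """First whole second at which bonus + time-based points reach the threshold."""
    for t in range(period + 1):
        if time_score(t) + bonus >= threshold:
            return t
    return None
```

Under these assumptions, a smiling low-priority subject (800 points) triggers imaging at 72 seconds with linear addition but only at 116 seconds with the delayed ramp, while a smiling high-priority subject (1800 points) triggers imaging at 111 seconds with the delayed ramp, so the high-priority subject is favored.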

Furthermore, even if the subject's behavior never brings the score past 2000 points, an image is always captured at 120 seconds through the passage of time, so it never happens that no image at all is captured for an extended period.

When a subject is detected partway through, the time at which the increase starts within the 120 seconds may be moved forward. For example, if a high-priority subject is detected at the 60-second mark, the resulting 1000 points do not bring the score past 2000 points; rather than leaving the score unchanged until 110 seconds, the linear increase may be started once 30 seconds have elapsed after the subject was detected. Alternatively, the linear increase may be started 20 seconds, rather than 10 seconds, before the 120-second mark. This raises the likelihood that a high-priority subject is imaged, making it easier to achieve imaging that matches the user's intention.

When automatic imaging is performed, the imaging score is reset to 0 points. Automatic imaging is not performed again until the score once more exceeds 2000 points.

This concludes the description of the determination of whether to perform automatic imaging. If the first control unit 223 determines, through the above determination, that automatic imaging is to be performed, the process of step S508 is executed. If the first control unit 223 determines that imaging is not to be performed, the process returns to step S501.

In step S508, the first control unit 223 executes the imaging process. The imaging process is, for example, still image capture or moving image capture.

This concludes the description of the automatic imaging process of the image pickup device 101 in the present embodiment. Through this process of automatically imaging a subject, the image pickup device 101 can capture still images and moving images of scenes the user wants, without an imaging instruction from the user.

The first control unit 223 may also configure the automatic imaging process so that it alternates between performing the subject search process and the imaging process for a predetermined time and performing a sleep process, which places the device in a sleep state, for a predetermined time. In the sleep process, the first control unit 223 controls the first power supply unit 210 so as to turn off the power supply to the first control unit 223 itself, while the second control unit 211 remains in operation. In this way, the first control unit 223 can automatically capture still images and moving images of scenes the user wants while suppressing power consumption. To distinguish it from the continuous imaging process described above, this form of the automatic imaging process is referred to as the intermittent imaging process.

<Voice recognition process>
FIG. 7 is a flowchart of the voice recognition process of the image pickup device 101 in the present embodiment. The process of this flowchart starts when the voice input/voice processing unit 409 detects that a voice signal has been input from the microphone. The process of this flowchart is executed in parallel with the automatic imaging process of FIG. 5. The present embodiment is described using, as an example, the case where this flowchart starts when the voice input/voice processing unit 409 detects, during execution of the automatic imaging process of FIG. 5, that a voice signal has been input from the microphone. This process is realized by the first control unit 223 executing a program recorded in the non-volatile memory 216.

In step S701, the first control unit 223 determines whether a trigger word has been detected. The trigger word is an activation command for starting voice command recognition, through which specific instructions are given to the image pickup device 101 by voice. To give an instruction to the image pickup device 101 by voice, the user must utter a command word after the trigger word and have the image pickup device 101 recognize that command word. If the trigger word is detected, the process of step S702 is executed. If the trigger word is not detected, the process of step S701 is repeated until it is detected.

In step S702, the first control unit 223 places the automatic imaging process in a stopped state. In the present embodiment, when the first control unit 223 detects the trigger word, it transitions to a state of waiting for a command word. The first control unit 223 also stops, from among the steps of the automatic imaging process described with the flowchart of FIG. 5, the subject search process (step S504), the drive process (steps S505 and S506), and the imaging process (step S508). The drive process here refers to, for example, pan driving, tilt driving, and zoom driving, and the imaging process refers to, for example, still image capture and moving image capture. On the other hand, in this step the first control unit 223 does not stop the image recognition process for subject detection and the like (step S502) and continues to execute it. The reason for stopping the automatic imaging process at the moment the trigger word is detected is as follows. The user is expected to utter the trigger word while seeing that the image pickup device 101 is facing the subject the user wants imaged, and at the timing at which the user wants the image captured. However, merely detecting the trigger word does not tell the first control unit 223 what instruction the command word will carry, so it cannot yet determine whether the imaging process should be executed. The first control unit 223 can determine that the imaging process should be executed only when it recognizes the command word that follows the trigger word. Consequently, if the automatic imaging process continued after the first control unit 223 detected the trigger word, the image pickup device 101 might no longer be facing the direction the user wants imaged by the time the command word is recognized. The first control unit 223 therefore stops the automatic imaging process in response to detecting the trigger word, thereby keeping the device facing the imaging direction it had at the timing at which the user wanted to capture an image. When the user then utters, after the trigger word, a command word instructing imaging, the first control unit 223 executes the imaging process and can capture the subject the user wants imaged. In other words, by stopping the drive process, the first control unit 223 can capture the subject the user wants imaged.

In step S703, the first control unit 223 outputs, from the speaker, a detection sound that informs the user that the trigger word has been detected.

In step S704, the first control unit 223 determines whether a command word has been detected following the trigger word. If a command word has been detected, the process of step S706 is executed; if not, the process of step S705 is executed.

In step S705, the first control unit 223 determines whether a predetermined time has elapsed since it detected the trigger word and entered the command-word waiting state. If the predetermined time has elapsed, the process returns to step S701: the first control unit 223 leaves the command-word waiting state and returns to the trigger-word waiting state. In this case, the first control unit 223 also resumes the automatic imaging process. If the predetermined time has not elapsed, the first control unit 223 repeats the process of step S704 until a command word is detected.
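The loop formed by steps S704 and S705 amounts to polling for a command word until a deadline passes. The sketch below illustrates that idea; the function name `wait_for_command_word`, the `poll_command` callback, and the injectable `clock` are assumptions made for this example, and the embodiment does not specify how the predetermined time is measured.

```python
import time
from typing import Callable, Optional


def wait_for_command_word(
    poll_command: Callable[[], Optional[str]],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
    poll_interval_s: float = 0.0,
) -> Optional[str]:
    """Return the detected command word, or None if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        command = poll_command()   # step S704: has a command word been detected?
        if command is not None:
            return command         # continue with step S706
        if clock() >= deadline:    # step S705: has the predetermined time elapsed?
            return None            # back to step S701; automatic imaging resumes
        if poll_interval_s > 0:
            time.sleep(poll_interval_s)
```

A caller that receives `None` would return to the trigger-word waiting state and resume automatic imaging; any other return value would be passed on to the command determination that starts at step S706.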

In step S706, the first control unit 223 determines whether the detected command word is a still image capture command. The still image capture command causes the imaging device 101 to capture and record one still image. If the command word is determined to be a still image capture command, the process of step S707 is executed; otherwise, the process of step S708 is executed.

In step S707, the first control unit 223 performs the still image capture process. Specifically, the image processing unit 207 converts the signal captured by the imaging unit 206 in accordance with a format such as JPEG, and the image recording unit 208 records the result on the recording medium 221.

In step S708, the first control unit 223 determines whether the detected command word is a subject change command. The subject change command is, for example, the phrase "capture another person". If the command word is determined to be a subject change command, the process of step S709 is executed; otherwise, the process of step S710 is executed.

In step S709, the first control unit 223 performs the subject change process. In this process, the first control unit 223 drives and controls the lens barrel 102 so as to capture a subject other than the one currently being captured. If no subject has been detected at the time the command word is detected, the first control unit 223 does not execute the process of this step.

In step S710, the first control unit 223 determines whether the detected command word is a moving image recording start command. The moving image recording start command causes the imaging device 101 to capture and record a moving image. If the command word is determined to be a moving image recording start command, the process of step S711 is executed; otherwise, the process of step S712 is executed.

In step S711, the first control unit 223 starts capturing a moving image using the imaging unit 206 and records the captured moving image data on the recording medium 221. While the moving image data is being recorded, the first control unit 223 does not search for a subject and keeps automatic imaging stopped.

In step S712, the first control unit 223 determines whether the detected command word is a moving image recording stop command. If the command word is determined to be a moving image recording stop command, the process of step S713 is executed; otherwise, the process of step S714 is executed.

In step S713, the first control unit 223 stops the capture of the subject by the imaging unit 206 and the recording of moving image data on the recording medium 221, and completes the recording of the moving image data.

In step S714, the first control unit 223 executes the process corresponding to any other command word. For example, the first control unit 223 processes a command word that causes pan drive and tilt drive in a direction specified by the user, or a command word that changes an imaging parameter of the imaging device 101, such as exposure compensation.
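The chain of determinations in steps S706 to S714 can be read as a dispatch on the type of the detected command word. The following sketch illustrates that reading. The `Command` enum, the `dispatch` function, and the returned strings are hypothetical names chosen for this example, and the step numbers in the comments assume that a positive determination in each check leads to the step described immediately after it.

```python
from enum import Enum, auto


class Command(Enum):
    """Hypothetical classification of a detected command word."""
    STILL_IMAGE = auto()
    CHANGE_SUBJECT = auto()
    START_MOVIE = auto()
    STOP_MOVIE = auto()
    OTHER = auto()


def dispatch(command: Command, subject_detected: bool) -> str:
    """Return the name of the process selected by steps S706 to S714."""
    if command is Command.STILL_IMAGE:      # S706 -> S707
        return "capture_still_image"
    if command is Command.CHANGE_SUBJECT:   # S708 -> S709
        # S709 is not executed when no subject was detected at command time.
        return "change_subject" if subject_detected else "no_op"
    if command is Command.START_MOVIE:      # S710 -> S711
        return "start_movie_recording"
    if command is Command.STOP_MOVIE:       # S712 -> S713
        return "stop_movie_recording"
    return "handle_other_command"           # S714
```

For example, `dispatch(Command.CHANGE_SUBJECT, subject_detected=False)` selects no process, which mirrors the condition attached to step S709.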

In steps S715 and S716, the first control unit 223 resumes the automatic imaging that was stopped in step S702. For example, the first control unit 223 restarts the process of the flowchart shown in FIG. 5 from step S502.

This concludes the description of the voice recognition process of the imaging device 101 in the present embodiment.

As described above, in the voice recognition process of the imaging device 101 in the present embodiment, the imaging device 101 stops the automatic imaging process in response to detecting the trigger word. Specifically, in step S702 of the flowchart shown in FIG. 7, the first control unit 223 stops the automatic imaging process in response to detecting the trigger word, and thereby keeps the device facing the imaging direction it had at the timing the user wanted to capture. As a result, when the user utters a command word instructing imaging after the trigger word, the first control unit 223 executes the imaging process at the angle of view in effect when the trigger word was received, and can therefore capture the subject the user wants to capture.

In step S702, instead of stopping the automatic imaging process, the first control unit 223 may control the lens barrel 102 so that its drive speed in the imaging direction becomes slower than the drive speed immediately before the trigger word was detected. In the present embodiment, taking into account factors such as the time required to detect the command word, the first control unit 223 controls the drive speed of the lens barrel 102 in the imaging direction to be slower than its drive speed during the automatic imaging process. If the command word turns out not to be an imaging instruction, the first control unit 223 restores the original drive speed and continues the automatic imaging process. In this way, the imaging device 101 can both make it easier to keep the direction the user wants to capture within the angle of view and continue changing the angle of view in the automatic imaging process.
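The slow-down variation of step S702 can be illustrated with a small helper that remembers the pre-trigger speed, reduces it, and restores it later. This is a sketch under assumptions: the class `DriveSpeedLimiter`, the unit of degrees per second, and the default factor of 0.25 are hypothetical, since the embodiment only requires the limited speed to be slower than the previous one.

```python
from typing import Optional


class DriveSpeedLimiter:
    """Hypothetical holder for the pan/tilt drive speed, in degrees per second."""

    def __init__(self, speed_dps: float) -> None:
        self.speed_dps = speed_dps
        self._saved_speed_dps: Optional[float] = None

    def on_trigger_word(self, slowdown_factor: float = 0.25) -> None:
        """Slow the drive instead of stopping it (variation of step S702)."""
        if not 0.0 < slowdown_factor < 1.0:
            raise ValueError("slowdown_factor must be between 0 and 1")
        if self._saved_speed_dps is None:   # remember the pre-trigger speed only once
            self._saved_speed_dps = self.speed_dps
        self.speed_dps = self._saved_speed_dps * slowdown_factor

    def on_non_imaging_command(self) -> None:
        """Restore the pre-trigger speed so that automatic imaging continues as before."""
        if self._saved_speed_dps is not None:
            self.speed_dps = self._saved_speed_dps
            self._saved_speed_dps = None
```

Saving the speed only on the first trigger keeps a repeated trigger word from compounding the slow-down.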

If a subject has been detected at the timing at which the trigger word is detected, the first control unit 223 may, in step S702, drive and control the lens barrel 102 so as to track that subject instead of stopping the drive process. The reason is that the subject the imaging device 100 has detected when the user utters the trigger word is considered to be the subject the user wants to capture, so continuing the drive to track that subject is more likely to yield the image the user wants. In this case, the first control unit 223 drives and controls the lens barrel 102 so as to track the subject that was detected when the trigger word was detected, that is, to keep it within the angle of view. In this way, the first control unit 223 may capture the subject the user wants to capture by driving the imaging direction so as to track the subject that was detected when the trigger word was detected. Likewise, if a subject was already being tracked when the trigger word was detected, the first control unit 223 may, in step S702, drive and control the lens barrel 102 so as to keep tracking that subject without stopping the drive process.
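One way to keep a detected subject within the angle of view is to pan by an amount proportional to the subject's horizontal offset from the frame center. The sketch below shows that idea only. The function `tracking_pan_step`, its parameters, the linear pixel-to-angle approximation, and the per-step clamp are all assumptions of this example; the embodiment does not describe how the tracking drive amount is computed.

```python
def tracking_pan_step(
    subject_x_px: float,
    frame_width_px: int,
    horizontal_fov_deg: float,
    max_step_deg: float,
) -> float:
    """Return a pan correction in degrees that moves the subject toward the frame center.

    Positive values pan to the right. The pixel offset is converted to an angle
    with a linear approximation, then clamped to +/- max_step_deg.
    """
    offset_px = subject_x_px - frame_width_px / 2.0
    desired_deg = offset_px / frame_width_px * horizontal_fov_deg
    return max(-max_step_deg, min(max_step_deg, desired_deg))
```

Clamping the step keeps each correction small, which is consistent with the aim of limiting the drive after the trigger word while still following the subject.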

If no subject has been detected at the timing at which the trigger word is detected, the first control unit 223 may, in step S702, drive and control the lens barrel 102 so as to execute the subject search process. For example, the first control unit 223 executes the subject search process within a range that includes the current imaging direction and is narrower than the search range of the automatic imaging process. In this way, the first control unit 223 may capture the subject the user wants to capture by searching for a subject near the current imaging direction in response to detecting the trigger word.
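A search range that includes the current imaging direction and is narrower than the full range can be modeled as a pan-angle window centered on the current direction. The following sketch assumes pan angles in degrees in the interval [0, 360) and a hypothetical half-width parameter; the function names and the window shape are illustrative and are not taken from the embodiment.

```python
from typing import Tuple


def limited_search_range(current_pan_deg: float, half_width_deg: float) -> Tuple[float, float]:
    """Return (start, end) pan angles of a window centered on the current direction.

    Angles are in degrees in [0, 360); the window may wrap past 0 degrees.
    """
    if not 0.0 < half_width_deg < 180.0:
        raise ValueError("half_width_deg must be between 0 and 180")
    start = (current_pan_deg - half_width_deg) % 360.0
    end = (current_pan_deg + half_width_deg) % 360.0
    return start, end


def in_search_range(pan_deg: float, search_range: Tuple[float, float]) -> bool:
    """Return True if pan_deg lies inside the (possibly wrapping) window."""
    start, end = search_range
    pan = pan_deg % 360.0
    if start <= end:
        return start <= pan <= end
    return pan >= start or pan <= end  # the window wraps around 0 degrees
```

With a current direction of 10 degrees and a half-width of 30 degrees, the window runs from 340 degrees through 0 to 40 degrees, so the wrap-around case has to be handled explicitly.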

In this way, the imaging device 101 limits the drive range in response to detecting the trigger word, not only by stopping the automatic imaging process but also, for example, by tracking the subject that was being captured when the trigger word was detected. This allows the imaging device 101 to capture the subject the user wants to capture.

In step S702, the first control unit 223 may stop executing still image capture and moving image capture. Reducing the load of the imaging process allows the first control unit 223 to respond more quickly to the command word that the user utters after the trigger word.

If it is determined in step S705 that the predetermined time has elapsed, the first control unit 223 may output, from the audio output unit 218, a sound indicating that the predetermined time has elapsed. This lets the user recognize that the imaging device 101 failed to detect a command word, which makes it easier to know when to give the voice instruction again.

When executing intermittent imaging as the automatic imaging process, the first control unit 223 may, in response to detecting the trigger word, defer the sleep process until the process instructed by the command word has been completed. In this case, the first control unit 223 executes the sleep process of the intermittent imaging in response to completion of the process instructed by the command word.
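Deferring the sleep process can be sketched as a pending flag that is honored once the commanded process finishes. The class `IntermittentImaging` and its method names are hypothetical and serve only to illustrate the ordering described above.

```python
class IntermittentImaging:
    """Hypothetical scheduler for the sleep phase of intermittent imaging."""

    def __init__(self) -> None:
        self.sleep_deferred = False
        self.asleep = False

    def request_sleep(self, command_in_progress: bool) -> None:
        """Enter sleep now, or defer it while a voice-commanded process is pending."""
        if command_in_progress:
            self.sleep_deferred = True
        else:
            self.asleep = True

    def on_command_completed(self) -> None:
        """Run the deferred sleep once the commanded process has finished."""
        if self.sleep_deferred:
            self.sleep_deferred = False
            self.asleep = True
```

In this sketch, `command_in_progress` would be true from the detection of the trigger word until the process instructed by the command word has completed.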

[Other embodiments]
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiment is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of components disclosed in the above embodiment. For example, some components may be removed from the full set of components shown in the embodiment. Components from different embodiments may also be combined as appropriate.

Claims

An imaging device comprising:
imaging means for capturing an image of a subject;
driving means for driving the imaging direction of the imaging means; and
detection means for detecting a first voice command for executing a specific process and a second voice command for initiating detection of the first voice command,
wherein, in response to the detection means detecting the second voice command, the driving means limits driving in the imaging direction.

The imaging device according to claim 1, wherein, when the detection means detects the first voice command for executing an imaging process while the driving means is limiting the drive, the imaging means captures an image of the subject.

The imaging device according to claim 1 or 2, further comprising:
determination means for automatically determining an imaging target; and
control means for controlling an automatic imaging process in which the driving means drives the imaging direction of the imaging means so that the imaging means captures the imaging target determined by the determination means.

The imaging device according to claim 3, wherein the determination means calculates the importance of each subject based on the detected face information of the subject, and determines the imaging target based on the importance.

The imaging device according to claim 4, wherein the importance of the subject is calculated based on a priority set by the user, the facial expression of the subject, the size of the subject's eyes, and the orientation of the subject's face.

The imaging device according to any one of claims 3 to 5, wherein the detection means detects the first voice command and the second voice command in parallel even during execution of the automatic imaging process.

The imaging device according to any one of claims 3 to 6, wherein, when the detection means detects the second voice command during execution of the automatic imaging process, the automatic imaging process is stopped.

The imaging device according to any one of claims 3 to 6, wherein, when the detection means detects the second voice command during execution of the automatic imaging process, the driving means limits driving in the imaging direction.

The imaging device according to any one of claims 3 to 8, wherein, in the automatic imaging process, the driving means drives the imaging direction of the imaging means using the entire circumference of the imaging device as the drive range.

The imaging device according to any one of claims 3 to 9, wherein, when limiting the drive, the driving means drives the imaging direction of the imaging means within a drive range narrower than the drive range used in the automatic imaging process.

The imaging device according to any one of claims 1 to 10, wherein, when limiting the drive, the driving means drives the imaging direction of the imaging means at a speed slower than the drive speed of the imaging means immediately before the detection means detected the second voice command.

The imaging device according to any one of claims 1 to 11, wherein, when limiting the drive, the driving means drives so that the subject being captured when the second voice command was detected stays within the angle of view.

The imaging device according to any one of claims 1 to 12, wherein, when limiting the drive, the driving means drives so as to search for a subject within the limited drive.

The image pickup apparatus according to any one of claims 1 to 10, wherein when the drive is limited, the drive means stops the drive of the image pickup means in the image pickup direction.

The imaging device according to any one of claims 1 to 14, further comprising a microphone, wherein the detection means detects the voice commands by analyzing an audio signal input from the microphone.

A control method comprising:
an imaging step of capturing an image of a subject;
a drive step of driving the imaging direction in the imaging step;
a detection step of detecting a first voice command for executing a specific process and a second voice command for initiating detection of the first voice command; and
a step of limiting the drive in the drive step in response to detection of the second voice command in the detection step.

A computer-readable program for causing a computer to function as each means of the imaging device according to any one of claims 1 to 15.