JP2019134204A

JP2019134204A - Imaging apparatus

Info

Publication number: JP2019134204A
Application number: JP2018012155A
Authority: JP
Inventors: 優成田; Masaru Narita
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2019-08-08

Abstract

[Problem]
To automatically determine a composition suited to the shooting location and capture images without the photographer having to concentrate on shooting.
[Solution]
An imaging apparatus has imaging means, storage means for storing a plurality of compositions, spatial information acquisition means for acquiring three-dimensional spatial information of a shooting location in advance of shooting, position and orientation information acquisition means for sequentially acquiring position and orientation information of the imaging apparatus, selection means for selecting a composition suitable for shooting from the plurality of compositions, and imaging control means for controlling the imaging means. The imaging control means performs shooting when it determines that the composition of the object scene captured by the imaging means substantially matches the composition selected by the selection means.
[Selected drawing] FIG. 1

Description

The present invention relates to a method for detecting a decisive moment using three-dimensional information about a field (the place where shooting occurs).

There has long been a need for automatic shooting at decisive moments without the photographer having to attend to the camera. For example, a parent who wants to photograph a decisive moment of their child in a soccer match has conventionally had to follow the child through the viewfinder, which makes it difficult to watch the match with the naked eye. A technique is therefore desired in which the camera predicts the decisive moment and shoots automatically while the photographer watches the match with the naked eye.

In automatic shooting, the composition, which is determined by the positional relationship between the camera, the subject, and the background, is important. For example, Patent Document 1 proposes a technique in which a desired composition is stored in advance in a moving robot camera so that the desired composition can be obtained even when the camera moves.

Meanwhile, techniques have recently been proposed that acquire accurate three-dimensional information of structures using a 3D laser scanner. This three-dimensional information is point cloud data or 3D model data that combines three-dimensional position information with color information such as RGB data, expressed relative to a point in space taken as the origin. Hereinafter, in this specification, this three-dimensional information is referred to as field information.

Patent Document 1: Japanese Patent No. 4834645

The automatic shooting technique proposed in Patent Document 1 stores three-dimensional position information of a subject and, from the relative positional relationship with the moving camera, calculates camera parameters that yield a desired composition. With this method, the desired composition is obtained when the position of the subject is fixed, but when the subject moves vigorously, as in a sport such as soccer, the subject leaves the frame and cannot be photographed.

In view of the above background and problems, an object of the present invention is to propose a technique that detects a decisive moment using field information and performs automatic shooting.

To solve the above problems, the present invention comprises imaging means, storage means for storing a plurality of compositions, spatial information acquisition means for acquiring three-dimensional spatial information of a shooting location in advance of shooting, position and orientation information acquisition means for sequentially acquiring position and orientation information of the imaging apparatus, selection means for selecting a composition suitable for shooting from the plurality of compositions, and imaging control means for controlling the imaging means, wherein the imaging control means performs shooting when it determines that the composition of the object scene captured by the imaging means substantially matches the composition selected by the selection means.

According to the present invention, by applying field information to detect a decisive moment, a desired composition can be determined and automatic shooting performed without the photographer having to concentrate on shooting.

FIG. 1 is a block diagram showing the configuration of the image processing apparatus of Example 1.
FIG. 2 is a diagram explaining the processing flow of Example 1.
FIG. 3 is a diagram explaining an example of a method for calculating the camera position and orientation.
FIG. 4 is a diagram explaining an example of images representing a decisive moment.
FIG. 5 is a diagram explaining a subject extraction method that uses field information.
FIG. 6 shows examples of compositions corresponding to Tables 1 to 3.

[Example 1]
The first embodiment of the present invention assumes a use case in which decisive moments are shot automatically by a soccer ball with a built-in camera (hereinafter referred to as a ball camera). The embodiment is not limited to ball cameras; it can be applied to any camera whose composition can change, such as an ordinary hand-held camera, a wearable camera, or a robot camera.

An imaging apparatus suitable for this embodiment is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an imaging apparatus suitable for Example 1. The imaging unit 101 consists of a lens and an image sensor and outputs the subject image as an electrical signal. The camera signal processing unit 102 forms a video signal from the electrical signal output by the imaging unit 101. The camera signal processing unit 102 includes an A/D conversion circuit, an auto gain control (AGC) circuit, and an auto white balance circuit (none of which are shown) and forms a digital signal. The three-dimensional spatial information holding unit 103 holds the field information.

The position and orientation calculation unit 104 calculates the position and orientation of the camera based on the output of the camera signal processing unit 102 and the three-dimensional spatial information obtained from the three-dimensional spatial information holding unit 103. The composition information holding unit 105 holds composition information representing decisive moments. This composition information is held in a form associated with the output of the position and orientation calculation unit 104. The composition information selection unit 106 selects composition information from the composition information holding unit 105 based on the output of the position and orientation calculation unit 104. The composition information detection unit 107 detects composition information from the image data output by the camera signal processing unit 102.

The decisive moment determination unit 108 determines whether the current moment is a decisive moment based on the composition information selected by the composition information selection unit 106 and the composition information obtained from the composition information detection unit 107. The image recording unit 109 records the image data output by the camera signal processing unit 102 when the decisive moment determination unit 108 determines that a decisive moment has occurred.

FIG. 2 shows the operation flow of the image processing apparatus according to the first embodiment of the present invention. First, in step S201, a captured image is acquired from the camera signal processing unit 102.

Next, in step S202, the field information is acquired from the three-dimensional spatial information holding unit 103. As described above, the field information is a set of point data (hereinafter referred to as a point cloud) in which, once an origin in space is fixed, position information (x, y, z) and color information (R, G, B) are uniquely determined for each point.

In this embodiment, the field information is a point cloud representing the ground on which soccer is played and surrounding structures such as the spectator stands. The field information can be acquired in advance using, for example, a laser scanner.
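As an illustration of the point cloud described above, the following minimal sketch represents each point as a position (x, y, z) plus a color (R, G, B). The class name `FieldPoint`, the helper `bounding_box`, and the sample coordinates are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldPoint:
    """One point of the field information (hypothetical layout)."""
    x: float  # position in world coordinates
    y: float
    z: float
    r: int    # color, 0-255 per channel
    g: int
    b: int


Corner = Tuple[float, float, float]


def bounding_box(points: List[FieldPoint]) -> Tuple[Corner, Corner]:
    """Return the (min, max) corners of the axis-aligned box enclosing the cloud."""
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    zs = [p.z for p in points]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Illustrative values only: two pitch corners and one point on a stand.
cloud = [
    FieldPoint(0.0, 0.0, 0.0, 30, 120, 40),
    FieldPoint(105.0, 68.0, 0.0, 30, 120, 40),
    FieldPoint(52.5, -5.0, 12.0, 128, 128, 128),
]
lo, hi = bounding_box(cloud)
```

A real scan would contain millions of points and would normally be stored in an array or a spatial index rather than a Python list.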

Next, in step S203, the position and orientation calculation unit 104 calculates the position and orientation of the camera using the captured image acquired in step S201 and the field information acquired in step S202. An example of a method for calculating the position and orientation of the camera is described with reference to FIG. 3. First, feature points are extracted from the captured image and their feature descriptors are computed. A known corner detection technique can be used to extract feature points from the captured image.

In FIG. 3(a), 301 denotes the captured image, 302 denotes a feature point extracted from the captured image, and 303 denotes the camera coordinate system. The coordinates of the extracted feature point are expressed in the camera coordinate system. Here, 302 is written as p = (u, v).

In FIG. 3(b), 304 denotes a feature point contained in the field information, 305 denotes the world coordinate system, and 306 denotes the point obtained by projecting 304 onto the image. 304 is expressed in the world coordinate system, and 306 is expressed in camera coordinates. Here, 304 is written as S = (x, y, z) and 306 as p' = (u', v').

Calculating the position and orientation of the camera amounts to finding the correspondence between the camera coordinate system and the world coordinate system. This correspondence is expressed by the camera extrinsic parameter matrix M. Hereinafter, this matrix M is referred to as the camera position and orientation change. The camera position and orientation change M is expressed by Equation 1 using a rotation matrix R and a translation vector t.

Here, the feature descriptors of 302 and 306 are matched by nearest neighbor search, and M is calculated so that the error between the coordinates p of 302 and the coordinates p' of 306 is minimized. The sum of squared distances, for example, can be used as the error measure. p' is obtained from S by Equations 2 and 3 below.

Here, (x', y', z') are the coordinates of S expressed in the camera coordinate system, and f is the focal length of the camera.
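Equations 1 to 3 are not reproduced in this text. Under the standard pinhole camera model, and assuming that M is the 3x4 matrix formed from R and t and that image coordinates are measured from the principal point, they would plausibly take the following form. This is a reconstruction from the surrounding definitions, not the published equations; the error measure E and the index i over matched feature pairs are likewise introduced here for illustration.

```latex
% Equation 1 (reconstructed): extrinsic parameter matrix
M = \begin{bmatrix} R & t \end{bmatrix}

% Equation 2 (reconstructed): world point S = (x, y, z) in camera coordinates
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
  = M \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}

% Equation 3 (reconstructed): perspective projection with focal length f
u' = f\,\frac{x'}{z'}, \qquad v' = f\,\frac{y'}{z'}

% Error measure: sum of squared distances over matched pairs (p_i, p'_i)
E(M) = \sum_i \left\| p_i - p'_i \right\|^2
```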

As described above, the camera position and orientation can be calculated from the field information and the captured image as a change in position and orientation from the origin of the field. If the initial position and orientation M_0 of the camera can be calculated, the current position and orientation M_n can be calculated efficiently by using the position and orientation M_(n-1) from one frame earlier (whose initial value is M_0). For example, by assuming that the change between M_(n-1) and M_n is small, the range over which the feature points 302 and 306 are matched can be narrowed.
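The error measure and the narrowing of the matching range can be sketched as follows. This is a minimal illustration under the pinhole model assumption; it evaluates the sum of squared distances for a candidate (R, t) but does not perform the minimization itself. The function names `project`, `reprojection_error`, and `candidates_near`, and the search radius, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def project(R: Matrix3, t: Vec3, f: float, S: Vec3) -> Vec2:
    """Project world point S to image coordinates p' = (u', v')."""
    # Camera coordinates (x', y', z') = R S + t.
    xc, yc, zc = (sum(R[i][k] * S[k] for k in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (f * xc / zc, f * yc / zc)


def reprojection_error(R: Matrix3, t: Vec3, f: float,
                       pairs: Sequence[Tuple[Vec2, Vec3]]) -> float:
    """Sum of squared distances between observed p and projected p'."""
    total = 0.0
    for p, S in pairs:
        u, v = project(R, t, f, S)
        total += (p[0] - u) ** 2 + (p[1] - v) ** 2
    return total


def candidates_near(predicted: Vec2, image_points: Sequence[Vec2],
                    radius: float) -> List[Vec2]:
    """Keep only image feature points close to where the previous pose predicts."""
    return [p for p in image_points if math.dist(p, predicted) <= radius]


I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pairs = [((0.5, 1.0), (1.0, 2.0, 4.0))]
err_true = reprojection_error(I3, (0.0, 0.0, 0.0), 2.0, pairs)
err_shifted = reprojection_error(I3, (1.0, 0.0, 0.0), 2.0, pairs)
near = candidates_near((0.5, 1.0), [(0.6, 1.0), (3.0, 3.0)], 0.2)
```

In practice the minimization over M would be carried out by an iterative solver, with the previous frame's pose (or GPS and gyro sensor data) as the starting point.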

To calculate the camera position and orientation faster and more accurately, information from GPS or a gyro sensor can be used as the initial value for calculating M. There may also be cases in which the captured image contains no feature points, for example when only the ground is visible. In such cases, artificial markers can be placed in the field in advance. If invisible markers are used, they can be placed without spoiling the appearance of the field.

Next, in step S204, it is determined whether the camera position and orientation calculated in step S203 differs from the result calculated one frame earlier. If the camera is stationary, its position and orientation do not change, so there is no difference. If the position and orientation of the camera have changed, the process proceeds to step S205. If they have not changed, the process proceeds to step S206.
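The branch in step S204 can be sketched as below. The specification only says that a difference is checked; the element-wise tolerance `tol` used here to absorb numerical noise is an assumption, as are the function names.

```python
from typing import Sequence

Pose = Sequence[Sequence[float]]  # 3x4 matrix [R | t] as nested sequences


def pose_changed(prev: Pose, curr: Pose, tol: float = 1e-6) -> bool:
    """True if any element of the pose matrix differs by more than tol (assumed threshold)."""
    return any(
        abs(a - b) > tol
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    )


def next_step(prev: Pose, curr: Pose) -> str:
    """S205 (reselect composition information) if the pose changed, else S206."""
    return "S205" if pose_changed(prev, curr) else "S206"


still = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
moved = ((1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
```

Skipping step S205 when the camera is stationary means the previously selected composition information is reused for detection and comparison.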

Next, in step S205, the composition information selection unit 106 selects composition information from the composition information holding unit 105 based on the camera position and orientation calculated in step S203. The composition information is held in association with camera positions and orientations and specifies, for each camera position and orientation, what kind of image is to be judged as representing a decisive moment.

An example of images representing a decisive moment is described with reference to FIG. 4. In FIG. 4(a), 401 denotes the soccer ground, and 402, 403, and 404 denote the position and direction of the ball camera at different times. At 402, a scene in which a player kicks the ball toward the goal is expected, so the moment the player kicks at the ball can be taken as the decisive moment, as shown for example in FIG. 4(b).

At 403, on the other hand, a scene in which the goalkeeper (hereinafter GK) stops a shot is expected, so the moment the GK dives toward the ball can be taken as the decisive moment, as shown for example in FIG. 4(c).

Characteristics of a ball camera such as the one in this embodiment include the short distance between the camera and the players, the fact that the players' gaze is always directed toward the camera, and the variety of shooting angles produced by the movement of the ball. Because of these characteristics, the footage is more dynamic than footage shot from outside the soccer ground, and decisive moments are no longer missed because they are hidden behind a player.

Next, examples of composition information for specifying images that represent a decisive moment are described using Tables 1 to 3. Table 1 assumes that the camera position and orientation 402 in FIG. 4(a) and the focal length are given as input. The type of subject, the position of the subject, and the size of the subject are used as composition information. These items are obtained by the composition information detection unit 107; how they are acquired is described later. For the camera position and orientation 402, a scene in which a player kicks the ball is assumed as one decisive moment. Accordingly, "person", for example, is selected as the type of subject. As a result, a scene in which no person appears as the subject is not judged to be a decisive moment.

The subject position means the region of the image in which the subject should appear. For example, the center of the screen is selected. As a result, a scene in which the subject appears at the edge of the screen is not judged to be a decisive moment. The subject size means the proportion of the screen occupied by the subject. For example, 70% or more of the screen is selected. As a result, a scene in which the subject is small and occupies less than 70% of the screen is not judged to be a decisive moment.

As described above, in Table 1 the image representing a decisive moment is specified by the type of subject, the position of the subject, and the size of the subject.
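A specification of the kind described for Table 1 (subject type, position, size) and the corresponding check can be sketched as follows. The contents of Table 1 are not reproduced in this text, so the field names, the normalized rectangle used to stand for "center of the screen", and the helper `matches` are all assumptions; only the values "person" and 70% come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompositionSpec:
    """Hypothetical composition entry with the three Table 1 items."""
    subject_type: str
    region: Tuple[float, float, float, float]  # normalized (left, top, right, bottom)
    min_size_ratio: float                      # fraction of the screen


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of the composition information detection unit."""
    subject_type: str
    center: Tuple[float, float]  # normalized subject position
    size_ratio: float            # fraction of the screen occupied


def matches(spec: CompositionSpec, det: Detection) -> bool:
    """True only if type, position, and size all satisfy the specification."""
    left, top, right, bottom = spec.region
    cx, cy = det.center
    return (
        det.subject_type == spec.subject_type
        and left <= cx <= right
        and top <= cy <= bottom
        and det.size_ratio >= spec.min_size_ratio
    )


# "person", screen center, 70% or more; the central rectangle bounds are assumed.
spec_402 = CompositionSpec("person", (0.25, 0.25, 0.75, 0.75), 0.70)
centered_large = Detection("person", (0.5, 0.5), 0.80)
at_edge = Detection("person", (0.05, 0.5), 0.80)
too_small = Detection("person", (0.5, 0.5), 0.40)
```

Each condition acts as a filter: a frame that fails any one item is not judged to be a decisive moment.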

An example of a composition corresponding to Table 1 is shown in FIG. 6(a). To specify the decisive moment more precisely than Table 1, or to reflect the user's intention more precisely, Table 2 adds the motion of the subject and the posture of the subject as composition information. These items are obtained by the composition information detection unit 107; how they are acquired is described later.

The motion of the subject means the movement of the subject across a plurality of consecutive frames. For example, to shoot a scene in which a player kicks the ball, the motion "kick" is specified. As a result, a scene in which no "kick" motion is present is not judged to be a decisive moment.

The posture of the subject means the subject's pose in a single frame. For example, at the moment of kicking the ball the player stands on the ground on one leg, so the posture "standing on one leg" is specified. As a result, a state in which both of the player's feet are on the ground is not judged to be a decisive moment.

As described above, in Table 2 the image representing a decisive moment is specified by the motion and posture of the subject in addition to the type, position, and size of the subject.

An example of a composition corresponding to Table 2 is shown in FIG. 6(b). Compared with FIG. 6(a), its distinguishing feature is that the motion and posture of the subject are specified. To specify the decisive moment still more precisely than Table 2, or to reflect the user's intention still more precisely, Table 3 adds the facial expression of the subject, the subject distance, and pressure information as composition information. These items are obtained by the composition information detection unit 107; how they are acquired is described later.

Regarding the facial expression of the subject, to shoot a scene in which a player kicks the ball while smiling, for example, the expression "smile" is specified. As a result, a scene in which the player is not smiling is not judged to be a decisive moment. Regarding the subject distance, a characteristic of the ball camera is that its distance to the player constantly increases and decreases. In a scene in which a player kicks the ball, the distance is zero at the moment of the kick and then increases as the ball moves away from the player.

To shoot the moment of a player's kick from the ball camera, there is an appropriate distance range that depends on the focal length, so that distance range is specified. As a result, a scene in which the distance to the player is too short or too long is not judged to be a decisive moment. Regarding pressure information, a large pressure is applied to the ball camera at the moment a player kicks the ball. In a scene in which a player kicks the ball, the pressure peaks at the moment of the kick and then decreases.

As with the subject distance, to shoot the moment of a player's kick from the ball camera there is an appropriate pressure range that depends on the focal length, so that pressure range is specified. As a result, a scene in which the pressure applied to the ball camera is too large or too small is not judged to be a decisive moment.

As described above, in Table 3 the image representing a decisive moment is specified by the facial expression of the subject, the subject distance, and pressure information in addition to the type, position, size, motion, and posture of the subject.

An example of a composition corresponding to Table 3 is shown in FIG. 6(c). Compared with FIG. 6(b), its distinguishing feature is that the facial expression of the subject is specified. Although not illustrated in FIG. 6(c), the subject distance and pressure information are also taken into account.

A single camera position and orientation may have more than one piece of composition information associated with it. For example, for the camera position and orientation 402, besides kicking the ball with a foot, a scene in which the ball is headed is also a possible decisive moment. In that case, separate candidates for the motion and posture of the subject are prepared. For example, because the ball is struck with the head, "header" is selected as the motion. Because the player may head the ball while jumping, "both feet off the ground" is selected as the posture. And because the pressure on the ball camera is expected to be lower than when the ball is kicked with a foot, a lower pressure is selected.

Furthermore, at the camera position and orientation 402, if the focal length is at the telephoto end, the camera may capture the GK. For example, a scene in which the GK is shouting instructions to the players is a possible decisive moment. In that case, "GK" is selected as the type of subject and "pointing" as the posture. Within the field that is a soccer ground, composition information representing decisive moments is scattered in countless places. For example, at the camera position and orientation 403 in FIG. 4(a), a scene in which the GK dives at a ball that is about to enter the goal and saves it is a possible decisive moment.

Accordingly, as composition information for the input camera position and orientation 403, the type of subject "GK", the motion "jump", and the posture "arms extended" are selected as shown in Table 5, and this is judged to be a decisive moment. At the camera position and orientation 404 in FIG. 4(a), a scene in which a player takes a throw-in is a possible decisive moment. Accordingly, as composition information for the input camera position and orientation 404, the type of subject "player" and the motion "throw" are selected, and this is judged to be a decisive moment.

In this way, by narrowing down the composition information scattered in countless places throughout the field on the basis of the field information and the camera position and orientation, decisive moments can be judged instantly. The composition information may also be associated with camera positions and orientations by machine learning and held in that form. If no composition information is associated with the camera position and orientation calculated in step S203, the process may either proceed to S203 without selecting composition information, or refer to the composition information associated with the closest camera position and orientation and proceed to S206.
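The selection in step S205, including the two fallbacks just described, can be sketched as a nearest-pose lookup. For brevity this sketch keys the stored entries on camera position only and ignores orientation and focal length, which is a simplification; the function name, the `match_radius` threshold, the labels, and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]
Entry = Tuple[Position, List[str]]  # stored camera position and its composition labels


def select_composition(store: List[Entry], camera_pos: Position,
                       match_radius: float, use_nearest: bool = True) -> List[str]:
    """Return the compositions associated with the camera position.

    An entry within match_radius counts as associated. Otherwise either the
    nearest entry is used (use_nearest=True) or nothing is selected.
    """
    if not store:
        return []
    nearest_pos, nearest_comps = min(store, key=lambda e: math.dist(e[0], camera_pos))
    if math.dist(nearest_pos, camera_pos) <= match_radius or use_nearest:
        return nearest_comps
    return []


# Illustrative entries loosely modelled on positions 402, 403, and 404.
store: List[Entry] = [
    ((85.0, 34.0, 0.1), ["kick", "header"]),
    ((100.0, 34.0, 1.0), ["gk_save"]),
    ((50.0, 0.0, 1.5), ["throw_in"]),
]
near_402 = select_composition(store, (84.0, 33.0, 0.1), 5.0)
far_strict = select_composition(store, (0.0, 60.0, 0.0), 5.0, use_nearest=False)
far_nearest = select_composition(store, (0.0, 60.0, 0.0), 5.0)
```

Because only the compositions attached to the current pose are compared against the detected composition information, the comparison stays small regardless of how many entries the field as a whole holds.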

Next, in step S206, the composition information detection unit 107 detects composition information from the captured image acquired in step S201. As described above, the composition information consists of the type, position, size, motion, and posture of the subject, together with the facial expression of the subject, the subject distance, and pressure information. For the type of subject, a known person detection technique can be used if the subject is a person, for example. By registering the faces of the participating players in advance, individuals can also be identified, for example to detect the GK. The position of the subject is calculated, for example, as the centroid of the bounding rectangle of the detected subject. The size of the subject is calculated, for example, as the ratio of the area of the subject's bounding rectangle to the area of the whole screen.
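The position and size calculations described above are simple enough to state directly. In this sketch the bounding rectangle is assumed to be given in pixels as (left, top, right, bottom); the function names and sample values are hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def subject_position(bbox: BBox) -> Tuple[float, float]:
    """Centroid of the subject's bounding rectangle."""
    left, top, right, bottom = bbox
    return ((left + right) / 2.0, (top + bottom) / 2.0)


def subject_size_ratio(bbox: BBox, image_width: int, image_height: int) -> float:
    """Area of the bounding rectangle divided by the area of the whole screen."""
    left, top, right, bottom = bbox
    return ((right - left) * (bottom - top)) / float(image_width * image_height)


bbox = (480.0, 108.0, 1440.0, 972.0)
center = subject_position(bbox)
ratio = subject_size_ratio(bbox, 1920, 1080)
```

With these sample values the subject is centered in a 1920x1080 frame and covers 40% of it, so it would fail a "70% or more" size condition.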

The motion and posture of the subject can be detected using a known gesture recognition technique. An example of such a technique is as follows. First, a range image is acquired for each frame. The range image can be obtained using a ranging sensor, or computed from captured images by a known stereo matching method.

Next, a human-shaped region is extracted from the acquired range image and classified into body parts pixel by pixel. A boosting learning algorithm such as AdaBoost can be used for the classification. Then, based on the body parts, the arrangement of the parts in three-dimensional space (the joint positions) is estimated so that kinematic constraints and temporal consistency are maintained. Gesture recognition can be performed by comparing the displacement of the three-dimensional joint positions with the motion pattern to be recognized.

The facial expression of the subject can be detected by a known expression recognition technique. In smile recognition, for example, a face region is first detected by face detection. Feature points are then detected in the face region, and the coordinates of the end points of both eyes and the mouth are determined. Features of a smile include narrowed eyes, lowered outer corners of the eyes, wrinkles around the mouth, and raised corners of the mouth. To capture these features, feature values are computed in the vicinity of the feature point coordinates, and the expression is identified from these feature values. A boosting learning algorithm such as AdaBoost can be used for the identification.

The subject distance can be detected with a distance measuring sensor, or from captured images by a known stereo matching method. The pressure applied to the ball camera can be detected with a pressure sensor. Here, using the field information allows the subject to be extracted with high accuracy and at high speed. For example, a background image can be generated by a known method from the field information obtained in step S202 and the camera position and orientation obtained in step S203.
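Regarding the stereo-based detection of the subject distance mentioned above, the standard relation for a rectified stereo pair, Z = f * B / d, may be sketched as follows. This is an illustrative sketch only: the function name and parameter names are hypothetical, and the embodiment does not specify which stereo matching method or camera geometry is used.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance Z (metres) for a rectified stereo pair: Z = f * B / d.

    focal_length_px: focal length in pixels (assumed known from calibration)
    baseline_m:      distance between the two viewpoints in metres
    disparity_px:    horizontal shift of the matched point in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite distance")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: f = 1000 px, baseline = 0.1 m, disparity = 20 px.
subject_distance_m = depth_from_disparity(1000.0, 0.1, 20.0)
```

A larger disparity corresponds to a nearer subject, so a subject close to the camera is measured with finer distance resolution than a distant one.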

FIG. 5 is a schematic diagram showing extraction of the main subject from a background image and a captured image. In FIG. 5, taking the difference between the background image (a) and the captured image (b) yields the difference image (c), in which a subject that is not present in the field information is extracted.
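The difference operation of FIG. 5 may be illustrated, purely as a sketch, by a per-pixel comparison of two grayscale images. The function name, the nested-list image representation, and the threshold value are assumptions introduced for illustration; the embodiment states only that a difference between the background image and the captured image is taken.

```python
def difference_mask(background, captured, threshold=30):
    """Binary mask that is 1 where the captured image departs from the background.

    Both images are lists of rows of grayscale values (0-255) of equal size.
    The threshold is a hypothetical value that suppresses small rendering or
    sensor differences between the generated background and the real image.
    """
    if len(background) != len(captured) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(background, captured)
    ):
        raise ValueError("background and captured images must have the same size")
    return [
        [1 if abs(c - b) > threshold else 0 for b, c in zip(b_row, c_row)]
        for b_row, c_row in zip(background, captured)
    ]


# (a) background generated from the field information, (b) captured image.
background = [[10, 10, 10], [10, 10, 10]]
captured = [[12, 200, 10], [10, 190, 9]]
mask = difference_mask(background, captured)  # (c) difference image
```

Pixels set to 1 in the mask correspond to content absent from the field information, that is, to candidate subject regions.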

Next, in step S207, the composition information narrowed down in step S205 is compared with the composition information detected in step S206. If the two match (or substantially match), the captured image is judged to capture a decisive moment, and the process proceeds to step S208. If they do not match, the process returns to step S203. Finally, in step S208, the image recording unit 109 records the captured image acquired in step S201.
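The control flow of steps S203 to S208 may be sketched as follows. This is a hedged illustration, not the embodiment's implementation: the dictionary representation of composition information, the key names, the per-key tolerances used to express a "substantial" match, and both function names are hypothetical.

```python
def substantially_matches(expected, detected, tolerances):
    """S207: compare selected composition information with detected information.

    Keys listed in `tolerances` are numeric and may differ by up to the given
    amount; all other keys must be equal. (Hypothetical matching rule.)
    """
    for key, want in expected.items():
        if key not in detected:
            return False
        got = detected[key]
        if key in tolerances:
            if abs(got - want) > tolerances[key]:
                return False
        elif got != want:
            return False
    return True


def wait_for_decisive_moment(frames, expected, tolerances):
    """Loop over (image, detected composition) pairs until S207 succeeds."""
    for image, detected in frames:  # each iteration stands for S203 to S206
        if substantially_matches(expected, detected, tolerances):
            return image  # S208: this image would be recorded
    return None  # no frame matched


expected = {"subject": "goalkeeper", "subject_distance_m": 5.0}
tolerances = {"subject_distance_m": 0.5}
frames = [
    ("img0", {"subject": "goalkeeper", "subject_distance_m": 7.0}),
    ("img1", {"subject": "goalkeeper", "subject_distance_m": 5.3}),
    ("img2", {"subject": "goalkeeper", "subject_distance_m": 5.0}),
]
recorded = wait_for_decisive_moment(frames, expected, tolerances)
```

In this sketch the first frame fails the distance tolerance and the loop continues, mirroring the return to step S203, while the second frame substantially matches and is passed to recording.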

In this way, a decisive moment can be determined and the image recorded. In this embodiment, an image is recorded in S208 only when the composition information matches in S207. Alternatively, images may be recorded continuously, each tagged in S207 according to whether the composition information matches, so that the images can be edited later based on the tags.
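The tagging variation described above, in which every image is recorded and the result of S207 is attached as a tag, may be sketched as follows. The record layout, the tag name "decisive", and the function names are assumptions made for illustration only.

```python
def tag_frames(frames, is_match):
    """Record every frame and attach the S207 result as a tag.

    frames:   iterable of (image, detected composition information) pairs
    is_match: callable standing in for the S207 comparison (hypothetical)
    """
    return [
        {"image": image, "decisive": bool(is_match(detected))}
        for image, detected in frames
    ]


def select_tagged(records):
    """Later editing step: keep only the images tagged as decisive moments."""
    return [record["image"] for record in records if record["decisive"]]


# Toy data: the detected information is reduced to a single score.
frames = [("a", 1), ("b", 5), ("c", 2)]
records = tag_frames(frames, lambda detected: detected >= 5)
highlights = select_tagged(records)
```

Unlike recording only on a match, this variation keeps all frames, so the matching criterion can be revisited after shooting at the cost of additional storage.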

As described in this embodiment, countless decisive moments exist within the field of a soccer ground, but once the position and orientation of the camera are determined, the decisive moments can be appropriately narrowed down.

As a result, a decisive moment can be captured by waiting for one of the narrowed-down decisive moments and recording the image when the captured image satisfies the conditions.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

The above description uses a ball camera and a soccer ground as an example of a combination of camera and shooting location, but the invention is not limited to this combination. The ball camera can be replaced with a wearable camera worn by a player or a camera held by a spectator. In the case of a ball camera, the position and orientation of the camera change with the movement of the ball.

In the case of a wearable camera, on the other hand, the position and orientation of the camera change with the movement of the player. For a camera held by a spectator, the position and orientation of the camera change with the spectator's shooting position and direction.

In each case, the composition information representing decisive moments can be narrowed down from the field information and the changes in camera position and orientation. For example, with a wearable camera, when the player is near the goal, a scene in which a defender comes to take the ball is assumed as a decisive moment. With a camera held by a spectator, when the camera is pointed toward the goal, a scene in which the goalkeeper (GK) stops a ball that is about to enter the goal is assumed as a decisive moment.
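The narrowing of composition information from the field information and the camera position and orientation may be illustrated by the following sketch. All names, the two-dimensional field coordinates, the landmark table, the distance limits, and the half field-of-view value are hypothetical; the embodiment does not prescribe a particular selection rule.

```python
import math


def facing(camera_pos, camera_yaw_deg, target_pos, half_fov_deg=30.0):
    """True when the target lies within the horizontal field of view."""
    dx, dy = target_pos[0] - camera_pos[0], target_pos[1] - camera_pos[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the angular difference into [-180, 180).
    diff = (bearing - camera_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_fov_deg


def select_compositions(candidates, camera_pos, camera_yaw_deg, landmarks):
    """Keep candidates whose landmark is both near the camera and in view."""
    selected = []
    for candidate in candidates:
        target = landmarks[candidate["landmark"]]
        near = math.dist(camera_pos, target) <= candidate["max_distance_m"]
        if near and facing(camera_pos, camera_yaw_deg, target):
            selected.append(candidate["name"])
    return selected


# Landmarks taken from the (hypothetical) three-dimensional field information.
landmarks = {"goal": (50.0, 0.0), "center": (0.0, 0.0)}
candidates = [
    {"name": "goalkeeper_save", "landmark": "goal", "max_distance_m": 40.0},
    {"name": "kickoff", "landmark": "center", "max_distance_m": 15.0},
]
toward_goal = select_compositions(candidates, (20.0, 0.0), 0.0, landmarks)
toward_center = select_compositions(candidates, (10.0, 0.0), 180.0, landmarks)
```

Here the same set of stored compositions yields different candidates depending on where the camera is and which way it points, which is the narrowing effect described above.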

By defining these decisive moments as composition information as described above, the effects of the present invention can be obtained. In addition, the soccer ground used as the shooting location can be replaced with another location, such as a home or a park.

For example, at home or in a park, the present invention can be used to capture decisive moments of subjects whose behavior is difficult to predict, such as children or pets.

As one example, a scene in which a cat yawns on a sofa at home is assumed as a decisive moment and is defined as composition information as described above.

Similarly, the moment at which a child slides down a slide in a park is assumed as a decisive moment and is defined as composition information as described above. If the ball camera described above is used as the photographing camera, the effects of the present invention can be obtained. As noted above, the ball camera may be replaced with a hand-held camera or a mobile robot camera.

As described above, the present invention does not limit the combination of camera and shooting location. For any camera whose composition can change, decisive moments can be captured automatically by using the field information and the camera position and orientation.

Description of Symbols
101 Imaging unit
102 Camera signal processing unit
103 Three-dimensional spatial information holding unit
104 Position and orientation calculation unit
105 Composition information holding unit
106 Composition information selection unit
107 Composition information detection unit
108 Decisive moment determination unit
109 Image recording unit

Claims

An imaging apparatus that performs imaging when the composition of an object scene captured by imaging means matches a predetermined composition, the imaging apparatus comprising:
storage means for storing a plurality of compositions;
spatial information acquisition means for acquiring three-dimensional spatial information of a shooting location in advance, before shooting;
position and orientation information acquisition means for sequentially acquiring position and orientation information of the imaging apparatus;
selection means for selecting, based on the spatial information and the position and orientation information, a composition suitable for shooting by the imaging apparatus from the plurality of compositions stored in the storage means; and
imaging control means for controlling the imaging means to perform imaging when it is determined that the composition of the object scene captured by the imaging means matches the composition selected by the selection means.

The imaging apparatus according to claim 1, wherein the position and orientation information includes the position, orientation, and focal length of the imaging apparatus at the shooting location.

The imaging apparatus according to claim 1, wherein the selection means selects a composition based on the three-dimensional spatial information of the shooting location and subject information.

The imaging apparatus according to claim 1, further comprising a pressure sensor,
wherein the selection means selects a composition based on pressure applied to the imaging apparatus.

The imaging apparatus according to claim 1, wherein, when the imaging control means determines that the composition of the object scene captured by the imaging means matches the composition selected by the selection means, the imaging control means controls the imaging means so that the captured image is tagged.