JP5608194B2

JP5608194B2 - Moving image processing apparatus and moving image processing program

Info

Publication number: JP5608194B2
Application number: JP2012163332A
Authority: JP
Inventors: 嶐一岡
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2012-07-24
Filing date: 2012-07-24
Publication date: 2014-10-15
Anticipated expiration: 2032-07-24
Also published as: JP2014021944A

Description

The present invention relates to a moving image processing apparatus and a moving image processing program capable of extracting, from an input moving image, a motion similar to the movement pattern of a reference moving image, and of obtaining the trajectory of that motion.

Methods have been proposed for recognizing (extracting) the motion of a person or object in a moving image captured by a video camera or the like, and for extracting the trajectory of that motion. For example, let a moving image showing the motion of an object be the reference moving image, and let the moving image in which a motion corresponding to the motion (pattern) of the object in the reference moving image is to be recognized, and its trajectory extracted (tracking), be the input moving image. In one proposed approach, the portion of the reference moving image that shows the motion of the object is extracted as a feature quantity, and the temporal (time-series) change of that feature quantity is matched against the input moving image using dynamic programming, thereby recognizing a target corresponding to the motion of the object in the reference moving image (see, for example, Non-Patent Documents 1, 2, 3 and 4). A method that recognizes a target corresponding to the motion of an object by using feature quantities in this way is called a feature-based method.

In addition, methods using a Kalman filter, a particle filter, or the like are known for extracting the trajectory (tracking) of a target once it has been recognized by a feature-based method (see, for example, Non-Patent Documents 5 and 6).

Non-Patent Document 1: Eli Shechtman, Michal Irani, "Space-Time Behavior-Based Correlation - OR - How to Tell If Two Underlying Motion Fields Are Similar without Computing Them?", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 29, No. 11, November 2007, p. 2045-2056

Non-Patent Document 2: 村井泰裕, 藤吉弘亘, 数井誠人, "Detection of abnormal human behavior in escalator scenes based on spatio-temporal features", IEICE Technical Report, The Institute of Electronics, Information and Communication Engineers, Vol. 108, No. 198, PRMU2008-87, September 2008, p. 247-254

Non-Patent Document 3: 岩田健司, 佐藤雄隆, 小林匠, 依田育士, 坂上勝彦, 大津展之, "Visual Framework for Video Surveillance with CHLAC", Proceedings of the 13th Image Sensing Symposium (SSII07), LD1-04, June 2007, p. 1-7

Non-Patent Document 4: Yan Ke, R. Sukthanker, M. Hebert, "Event Detection in Crowded Video", IEEE 11th International Conference on Computer Vision (ICCV 2007), Brasil, October 2007, p. 1-8

Non-Patent Document 5: Raphael Canals, Ali Ganoun, and Remy Leconge, "OCCLUSION-HANDLING FOR IMPROVED PARTICLE FILTERING-BASED TRACKING", 17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, Scotland, August 2009, p. 1107-1111

Non-Patent Document 6: Kenji Okuma, Ali Talefhani, Nando De Freitas, James J. Little, David G. Lowe, "A Boosted Particle Filter: Multitarget Detection and Tracking", Lecture Notes in Computer Vision, Springer, Vol. 3021/2004, June 30, 2004, p. 28-39

However, in an input moving image, feature portions may appear at multiple places within the image (more precisely, at multiple planar (spatial) positions of the frames that make up the moving image), or at arbitrary planar (spatial) positions, so it has been difficult to extract feature portions reliably with a feature-based method. Because the feature portions are hard to extract, recognizing (extracting) the target corresponding to the motion of the object is not easy either, and extracting the trajectory of the target from the temporal (time-series) change of the feature quantity has also been difficult. Consequently, trajectory extraction using the filters mentioned above has been difficult as well.

A further situation in which motion recognition becomes difficult is occlusion, in which another object lies in front of the moving object to be recognized and temporarily hides its motion. When the input moving image contains occlusion, the moving object to be recognized temporarily disappears from the time series of image planes making up the input moving image (its appearance is interrupted in time), so matching against the feature quantity of the reference moving image becomes impossible. Likewise, when occlusion occurs, it becomes difficult to extract the trajectory of the target even with conventional tracking methods.

Furthermore, when the motion of a target is recognized with a feature-based method, the number of motions to extract at each point in time is determined by setting in advance how many object motions the input moving image contains. When the number of object motions is not known beforehand, however, it has been difficult to determine the number of detection targets over time.

Trajectory extraction tends to run into similar difficulties. For example, when several moving objects appear in the input moving image and cross one another, trajectory extraction often fails even with a Kalman filter or a particle filter. These filters compute the existence probability of the target at each point of the image plane at each time, so when moving objects cross, it becomes difficult to compute an appropriate existence probability.

The present invention has been made in view of the above problems. Its object is to provide a moving image processing apparatus and a moving image processing program that recognize the motion of a target and extract its trajectory solely by matching the reference moving image against the input moving image, rather than detecting the target and extracting its trajectory from the temporal (time-series) change of a feature quantity corresponding to the detection target.

To solve the above problems, a moving image processing apparatus according to the present invention comprises: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖. Using continuous DP, the moving image processing means calculates the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t); calculates, again using continuous DP, an evaluation function S(x, y, T, t) by accumulating the local distance d that is minimal at each time τ as τ runs from 1 to T; and calculates the coordinate position (x*, y*) at time t for which the value of S(x, y, T, t) divided by the time T is less than or equal to a predetermined threshold h. In this way it obtains, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

A moving image processing apparatus according to the present invention may also comprise: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖, and e₁ = ξ(τ) − ξ(τ − 1), e₂ = η(τ) − η(τ − 1). Using continuous DP, the moving image processing means calculates the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t). The evaluation function S(x, y, T, t), obtained by accumulating the minimum value of the local distance d at each time τ as τ runs from 1 to T, is calculated with the initial condition S(x, y, 1, t) = 3d(x, y, 1, t) by the recurrence formula obtained from continuous DP

[formula not reproduced in the source text]

and the coordinate position (x*, y*) at time t for which the value of S(x, y, T, t) divided by 3T is less than or equal to a predetermined threshold h is calculated, taking as the basis a local area lying within the vertical and horizontal resolution range of the input moving image, according to

[formula not reproduced in the source text]

thereby obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

Furthermore, a moving image processing apparatus according to the present invention may comprise: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖; e₁ = ξ(τ) − ξ(τ − 1) and e₂ = η(τ) − η(τ − 1); and the deformation rate, in the vertical and horizontal resolution directions, of the trajectory of the object's motion in the reference moving image is set to α. Using continuous DP, the moving image processing means calculates the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t). The evaluation function S(x, y, T, t), obtained by accumulating the minimum value of the local distance d at each time τ as τ runs from 1 to T, is calculated with the initial condition S(x, y, 1, t) = 3d(x, y, 1, t) on the basis of the recurrence formula obtained from continuous DP

[formula not reproduced in the source text]

thereby obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

A moving image processing program according to the present invention, in turn, is a program for a moving image processing apparatus comprising: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖. The program causes the moving image processing means to realize: a local distance calculation function of calculating, using continuous DP, the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t); an evaluation function calculation function of calculating, using continuous DP, an evaluation function S(x, y, T, t) by accumulating the local distance d that is minimal at each time τ as τ runs from 1 to T; and a coordinate position calculation function of calculating the coordinate position (x*, y*) at time t for which the value of S(x, y, T, t) divided by the time T is less than or equal to a predetermined threshold h, thereby obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

A moving image processing program according to the present invention may also be a program for a moving image processing apparatus comprising: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖, and e₁ = ξ(τ) − ξ(τ − 1), e₂ = η(τ) − η(τ − 1). The program causes the moving image processing means to realize a local distance calculation function of calculating, using continuous DP, the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t); an evaluation function calculation function of calculating the evaluation function S(x, y, T, t), obtained by accumulating the minimum value of the local distance d at each time τ as τ runs from 1 to T, with the initial condition S(x, y, 1, t) = 3d(x, y, 1, t), by the recurrence formula obtained from continuous DP

[formula not reproduced in the source text]

and a coordinate position calculation function of calculating the coordinate position (x*, y*) at time t for which the value of S(x, y, T, t) divided by 3T is less than or equal to a predetermined threshold h, taking as the basis a local area lying within the vertical and horizontal resolution range of the input moving image, according to

[formula not reproduced in the source text]

thereby obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

Furthermore, a moving image processing program according to the present invention may be a program for a moving image processing apparatus comprising: recording means for recording (i) a reference moving image of duration T in which the motion of an object is captured, whose coordinate position at time τ is (ξ(τ), η(τ)) and whose luminance at that position is Z(ξ(τ), η(τ)), and (ii) an input moving image in which a motion similar to the motion of the object is to be recognized, whose luminance at time t and coordinate position (x, y) is f(x, y, t); and moving image processing means for obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image. The local distance d is the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖; e₁ = ξ(τ) − ξ(τ − 1) and e₂ = η(τ) − η(τ − 1); and the deformation rate, in the vertical and horizontal resolution directions, of the trajectory of the object's motion in the reference moving image is set to α. The program causes the moving image processing means to realize a local distance calculation function of calculating, using continuous DP, the minimum value of the local distance d at time τ from the recorded reference moving image Z(ξ(τ), η(τ)) and input moving image f(x, y, t); and a coordinate position calculation function in which the evaluation function S(x, y, T, t), obtained by accumulating the minimum value of the local distance d at each time τ as τ runs from 1 to T, is calculated with the initial condition S(x, y, 1, t) = 3d(x, y, 1, t) on the basis of the recurrence formula obtained from continuous DP

[formula not reproduced in the source text]

thereby obtaining, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the object captured in the reference moving image.

Here, luminance does not mean only a light/dark value indicating the simple brightness of the reference and input moving images; it also covers the brightness of each color component of each pixel. When the luminance is a simple light/dark value, the local distance d = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖ is the absolute value of the difference between the two light/dark values. When the brightness of each color of each pixel is taken into account, the local distance d(x, y, τ, t) is calculated as the sum of the differences of the R (red), G (green) and B (blue) components (or as their Euclidean distance). Furthermore, when the reference moving image may have any one of K colors, the local distance formula
d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖
becomes
d(x, y, τ, t) = min{‖Z_k(ξ(τ), η(τ)) − f(x, y, t)‖},
where k takes values in the range 1 ≤ k ≤ K and Z_k denotes the reference moving image of the k-th color.
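As a rough illustration of the local distance described above, the following Python sketch computes d for a grayscale value, for an RGB triple, and for a reference that may take any one of K colors. The function names `local_distance` and `local_distance_multi` and the `euclidean` flag are assumptions introduced for this example, not names from the patent. Taking the per-channel differences as absolute values is likewise an assumption.

```python
import math


def local_distance(z_ref, f_in, euclidean=False):
    """Distance between a reference luminance and an input luminance.

    Each argument is either a single number (grayscale) or a sequence of
    channel values such as (R, G, B).
    """
    z = list(z_ref) if isinstance(z_ref, (list, tuple)) else [z_ref]
    f = list(f_in) if isinstance(f_in, (list, tuple)) else [f_in]
    diffs = [a - b for a, b in zip(z, f)]
    if euclidean:
        return math.sqrt(sum(v * v for v in diffs))
    # Assumption: "sum of the differences" means sum of absolute differences.
    return float(sum(abs(v) for v in diffs))


def local_distance_multi(z_candidates, f_in, euclidean=False):
    """Minimum distance over K candidate reference colors Z_1 .. Z_K."""
    return min(local_distance(z, f_in, euclidean) for z in z_candidates)
```

With a single grayscale value the result is simply the absolute luminance difference; with K candidates the smallest of the K distances is returned, which mirrors the min over k in the formula above.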

A technique called continuous DP (continuous dynamic programming), which detects the portions of input data that resemble the data pattern of reference data, has long been known. In continuous DP, however, both the reference data and the input data had to be time-series data in two variables (time and amount of change) whose output varies over time, as with audio data. Spatio-temporal time-series data that represent the temporal change of a two-dimensional image plane, such as a moving image, consist of three variables (time plus vertical and horizontal coordinates), so it has been difficult to detect similar portions with continuous DP.

For this reason, in the moving image processing apparatus and moving image processing program according to the present invention, the reference moving image, which corresponds to the reference data, is expressed as Z(ξ(τ), η(τ)), a function of the single variable time τ, and the input moving image, which corresponds to the input data, is expressed as f(x, y, t), a function of the three variables time t and coordinate position (x, y), so that the input and reference moving images together involve four variables.

Expressing the input and reference moving images with four variables in total makes it possible for continuous DP to detect similar portions of moving images. Continuous DP is thereby extended to similarity detection in spatio-temporal data such as moving images, where it can be used as spatio-temporal continuous DP.

Accordingly, the local distance d used in continuous DP is expressed as the absolute value of the luminance of the reference moving image minus the luminance of the input moving image, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖. By accumulating, on the basis of continuous DP, the local distance d that is minimal at each time τ as τ runs from 1 to T, the evaluation function S(x, y, T, t) is calculated, which makes it possible to obtain, for each time t, the coordinate position (x*, y*) of the portion of the input moving image similar to the motion pattern of the target captured in the reference moving image.

Moreover, obtaining the coordinate position (x*, y*) for each time t with the predetermined threshold h as the criterion makes it possible to detect multiple portions of the input moving image that resemble the motion of the target in the reference moving image.
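The thresholding step can be sketched as follows. This is a minimal illustration, not the patented procedure: the dictionary layout of `scores`, the function name `detect`, and the `weight` parameter (1 when dividing by T, 3 when dividing by 3T) are all assumptions made for the example.

```python
def detect(scores, T, h, weight=1):
    """Return every (x, y) whose normalized score is at most h.

    scores maps (x, y) -> S(x, y, T, t) for one fixed input time t.
    T is the length of the reference moving image; weight * T is the
    normalizing divisor.
    """
    return sorted(p for p, s in scores.items() if s / (weight * T) <= h)
```

Because every position is compared against the same threshold, any number of positions can pass at a given time t, so the number of matching motions does not have to be fixed in advance.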

When the evaluation function S is obtained on the basis of the recurrence formula derived from continuous DP

[formula not reproduced in the source text]

dividing the evaluation function S(x, y, T, t) by 3T makes it possible to normalize time t with respect to time τ.
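The recurrence formula itself is not reproduced in this text, so the following Python sketch is only one plausible reading. It assumes the standard three-path continuous DP recurrence, whose weights add up to 3 per reference step; that choice is consistent with the initial condition S(x, y, 1, t) = 3d(x, y, 1, t), with the division by 3T, and with the three predecessor positions listed for trajectory extraction elsewhere in this document. The function name `make_evaluation_function`, the data layout, and the treatment of out-of-range positions as infinite cost are assumptions for illustration.

```python
from functools import lru_cache

INF = float("inf")


def make_evaluation_function(ref, video):
    """Build S(x, y, tau, t) for a reference trajectory and an input video.

    ref   : list of (xi, eta, z) for tau = 1..T (ref[tau - 1] holds time tau)
    video : video[t][y][x] gives the input luminance f(x, y, t), t from 0
    """
    height, width = len(video[0]), len(video[0][0])

    def step(tau):
        # (e1, e2): displacement of the reference point from tau - 1 to tau
        return (ref[tau - 1][0] - ref[tau - 2][0],
                ref[tau - 1][1] - ref[tau - 2][1])

    def d(x, y, tau, t):
        # Local distance; positions or times outside the video cost infinity
        if not (0 <= t < len(video) and 0 <= x < width and 0 <= y < height):
            return INF
        return abs(ref[tau - 1][2] - video[t][y][x])

    @lru_cache(maxsize=None)
    def S(x, y, tau, t):
        here = d(x, y, tau, t)
        if here == INF:
            return INF
        if tau == 1:
            return 3 * here  # initial condition S(x, y, 1, t) = 3 d(x, y, 1, t)
        e1, e2 = step(tau)
        px, py = x - e1, y - e2
        # Assumed path weights (2 + 1, 3, 3 + 3): each reference step costs 3
        candidates = [
            S(px, py, tau - 1, t - 2) + 2 * d(x, y, tau, t - 1) + here,
            S(px, py, tau - 1, t - 1) + 3 * here,
        ]
        if tau >= 3:
            f1, f2 = step(tau - 1)
            candidates.append(S(px - f1, py - f2, tau - 2, t - 1)
                              + 3 * d(px, py, tau - 1, t) + 3 * here)
        return min(candidates)

    return S
```

Under these assumptions every path from τ = 1 to τ = T carries a total weight of 3T, so S(x, y, T, t) / (3T) is an average local distance that can be compared with the threshold h regardless of how the input time axis was stretched.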

Furthermore, by setting the deformation rate, in the vertical and horizontal resolution directions, of the trajectory of the object's motion in the reference moving image to α and obtaining the evaluation function S from the recurrence formula

[formula not reproduced in the source text]

it becomes possible to detect, from the input moving image, portions resembling the motion of the target in the reference moving image while allowing for deformation in the vertical and horizontal resolution directions (spatial deformation).

また、上述した動画像処理装置において、前記評価関数Ｓ（ｘ，ｙ，Ｔ，ｔ）に基づいて求められた、前記閾値ｈ以下となる時間ｔの座標位置（ｘ^＊，ｙ^＊）を座標位置（ｘ，ｙ）として、
Ｂ＝
｛（ｘ−ｅ_１（τ），ｙ−ｅ_２（τ），τ−１，ｔ−２），
（ｘ−ｅ_１（τ），ｙ−ｅ_２（τ），τ−１，ｔ−１），
（ｘ−ｅ_１（τ）−ｅ_１（τ−１），ｙ−ｅ_２（τ）−ｅ_２（τ−１），τ−２，ｔ−１）｝と設定し、
時間τの初期値としてτ＝Ｔを設定して、前記動画像処理手段が、時間ｔにおける前記座標位置（ｘ，ｙ）に基づいて、

よりｘ^＊，ｙ^＊，ｔ^＊を算出し、算出されたｘ^＊，ｙ^＊，ｔ^＊を新たにｘ，ｙ，ｔに設定するとともに、τの値から１を減じて、τが１になるまでｘ^＊，ｙ^＊，ｔ^＊を繰り返し返し算出することにより、前記入力動画像において前記参照動画像で撮影された物体の動きのパターンに類似する部分の時間ｔ毎の座標位置（ｘ^＊，ｙ^＊）の軌跡を抽出するものであってもよい。 In the moving image processing apparatus described above, the coordinate position (x*, y*) at a time t at which the value obtained from the evaluation function S(x, y, T, t) is at or below the threshold h may be taken as the coordinate position (x, y), the set B may be defined as
B =
{(x − e_1(τ), y − e_2(τ), τ − 1, t − 2),
(x − e_1(τ), y − e_2(τ), τ − 1, t − 1),
(x − e_1(τ) − e_1(τ − 1), y − e_2(τ) − e_2(τ − 1), τ − 2, t − 1)},
and τ = T may be set as the initial value of time τ. The moving image processing means then calculates x*, y*, t* by the corresponding formula on the basis of the coordinate position (x, y) at time t, sets the calculated x*, y*, t* as the new x, y, t, subtracts 1 from τ, and repeats the calculation of x*, y*, t* until τ reaches 1, thereby extracting the trajectory of the coordinate positions (x*, y*), for each time t, of the portion of the input moving image that is similar to the motion pattern of the object captured in the reference moving image.

また、上述した動画像処理プログラムにおいて、前記座標位置算出機能において求められた時間ｔ毎の座標位置（ｘ^＊，ｙ^＊）を座標位置（ｘ，ｙ）として、
Ｂ＝
｛（ｘ−ｅ_１（τ），ｙ−ｅ_２（τ），τ−１，ｔ−２），
（ｘ−ｅ_１（τ），ｙ−ｅ_２（τ），τ−１，ｔ−１），
（ｘ−ｅ_１（τ）−ｅ_１（τ−１），ｙ−ｅ_２（τ）−ｅ_２（τ−１），τ−２，ｔ−１）｝と設定し、
時間τの初期値としてτ＝Ｔを設定して、前記動画像処理手段に、時間ｔにおける前記座標位置（ｘ，ｙ）に基づいて、

よりｘ^＊，ｙ^＊，ｔ^＊を算出させる軌跡算出機能を実行させ、算出されたｘ^＊，ｙ^＊，ｔ^＊を新たにｘ，ｙ，ｔに設定するとともに、τの値から１を減じて、τが１になるまで前記軌跡算出機能を繰り返し実行させることにより、前記入力動画像において前記参照動画像で撮影された物体の動きのパターンに類似する部分の時間ｔ毎の座標位置（ｘ^＊，ｙ^＊）の軌跡を抽出させるものであってもよい。 In the moving image processing program described above, the coordinate position (x*, y*) for each time t obtained by the coordinate position calculation function may be taken as the coordinate position (x, y), the set B may be defined as
B =
{(x − e_1(τ), y − e_2(τ), τ − 1, t − 2),
(x − e_1(τ), y − e_2(τ), τ − 1, t − 1),
(x − e_1(τ) − e_1(τ − 1), y − e_2(τ) − e_2(τ − 1), τ − 2, t − 1)},
and τ = T may be set as the initial value of time τ. The program then causes the moving image processing means to execute a trajectory calculation function that calculates x*, y*, t* by the corresponding formula on the basis of the coordinate position (x, y) at time t, sets the calculated x*, y*, t* as the new x, y, t, subtracts 1 from τ, and repeats the trajectory calculation function until τ reaches 1, thereby extracting the trajectory of the coordinate positions (x*, y*), for each time t, of the portion of the input moving image that is similar to the motion pattern of the object captured in the reference moving image.

上述した動画像処理装置および動画像処理プログラムを用いることにより、評価関数Ｓ（ｘ，ｙ，Ｔ，ｔ）に基づいて求められた、所定の閾値ｈ以下となる時間ｔの座標位置（ｘ^＊，ｙ^＊）を、τがＴから１まで戻しつつ、評価関数Ｓの値が最小になる（ｘ，ｙ，ｔ）を求めることにより、入力動画像において参照動画像の対象物の動きに類似する動きの検出部分を軌跡として求めることが可能となる。 With the moving image processing apparatus and moving image processing program described above, starting from the coordinate position (x*, y*) at a time t at which the value obtained from the evaluation function S(x, y, T, t) is at or below the predetermined threshold h, and finding the (x, y, t) that minimizes the value of the evaluation function S while stepping τ back from T to 1, the detected portion of the input moving image whose motion is similar to that of the object in the reference moving image can be obtained as a trajectory.

また、上述した動画像処理装置は、汎用性を持つ要素的な動きであってそれぞれ異なる動きを撮影した複数の参照動画像を前記記録手段に記録させ、前記動画像処理手段が、前記評価関数Ｓ（ｘ，ｙ，Ｔ，ｔ）に基づいて求められた、前記閾値ｈ以下となる時間ｔの座標位置（ｘ^＊，ｙ^＊）を当該時間ｔに応じて抽出することにより、前記入力動画像から前記要素的な動きの種類に対応する連続した動きを求めるものであってもよい。 The moving image processing apparatus described above may also record in the recording means a plurality of reference moving images, each capturing a different general-purpose elemental motion, and the moving image processing means may extract, according to the time t, the coordinate position (x*, y*) at each time t at which the value obtained from the evaluation function S(x, y, T, t) is at or below the threshold h, thereby obtaining from the input moving image a continuous motion corresponding to the types of elemental motion.

さらに、上述した動画像処理プログラムは、汎用性を持つ要素的な動きであってそれぞれ異なる動きを撮影した複数の参照動画像を前記記録手段に記録させ、前記動画像処理手段に、前記座標位置算出機能において求められた時間ｔ毎の座標位置（ｘ^＊，ｙ^＊）を当該時間ｔに応じて抽出する抽出機能を実行させて、当該抽出機能により抽出された時間ｔ毎の座標位置（ｘ^＊，ｙ^＊）より、前記入力動画像から前記要素的な動きの種類に対応する連続した動きを求めさせるものであってもよい。 Likewise, the moving image processing program described above may cause the recording means to record a plurality of reference moving images, each capturing a different general-purpose elemental motion, and may cause the moving image processing means to execute an extraction function that extracts, according to the time t, the coordinate position (x*, y*) for each time t obtained by the coordinate position calculation function, so that a continuous motion corresponding to the types of elemental motion is obtained from the input moving image on the basis of the coordinate positions (x*, y*) extracted for each time t.

入力動画像において参照動画像の対象物の動きに類似する部分を検出するためには、事前に検出対象となる対象物の動きを参照動画像に撮影する必要がある。しかしながら、参照動画像において予め対象物の動きを予測出ない場合は、１つの参照動画像で全ての動きを記録することが困難であった。例えば、町中での人物の多様な動きや、複雑なジェスチャなどは、その全ての動きを１つの参照動画像を用いて特定することが困難となる。 In order to detect a part of the input moving image that is similar to the movement of the object of the reference moving image, it is necessary to capture the movement of the object to be detected in advance in the reference moving image. However, if the motion of the object is not predicted in advance in the reference moving image, it is difficult to record all the motions with one reference moving image. For example, it is difficult to specify various movements of people in town, complicated gestures, and the like using one reference moving image.

本発明に係る動画像処理装置および動画像処理プログラムによれば、汎用性を持つ要素的な動きであってそれぞれ異なる動きを撮影した参照動画像を複数用意することにより、汎用的な動きのパターンを、参照動画像の動きを繋げ合わせることにより生成することができる。このため、町中での人物の多様な動きや、複雑なジェスチャなどであっても、要素的な動きの種類に対応する連続した動きと捉えて入力動画像から求めることが可能となる。 According to the moving image processing apparatus and the moving image processing program according to the present invention, by preparing a plurality of reference moving images that are elemental motions having general versatility, and each capturing different motions, a general motion pattern Can be generated by connecting the movements of the reference moving images. For this reason, even if there are various movements of people in the town, complicated gestures, etc., it is possible to obtain them from the input moving image by considering them as continuous movements corresponding to the types of elemental movements.

本発明に係る動画像処理装置および動画像処理プログラムでは、参照データに該当する参照動画像を時間τだけの変数であるＺ（ξ（τ），η（τ））で示すと共に、入力データに該当する入力動画像を時間ｔと座標位置（ｘ，ｙ）との３つの変数からなるｆ（ｘ，ｙ，ｔ）で示すことにより、入力動画像と参照動画像との変数の和を４変数にしたことを特徴とする。 The moving image processing apparatus and moving image processing program according to the present invention are characterized in that the reference moving image, which corresponds to the reference data, is represented by Z(ξ(τ), η(τ)), a function of the single variable time τ, and the input moving image, which corresponds to the input data, is represented by f(x, y, t), a function of the three variables time t and coordinate position (x, y), so that the input moving image and the reference moving image together involve four variables.

このように、入力動画像と参照動画像との変数の和を４変数で表すことにより、連続ＤＰにおいて、動画像の類似部分検出が可能となり、連続ＤＰを動画像のような時空間的なデータの類似判断検出へと拡張させて時空間連続ＤＰとして用いることが可能となる。 Representing the input moving image and the reference moving image with four variables in total in this way allows continuous DP to detect similar portions of moving images, so that continuous DP can be extended to similarity detection on spatio-temporal data such as moving images and used as spatio-temporal continuous DP.

（ａ）は、スポッティング点を説明するために示した、時間ｔと時間τとからなる関係図であり、（ｂ）は、傾斜制限に伴う係数（ウェイト）を説明するために示した、時間ｔと時間τとからなる関係図である。 (a) is a diagram of the relationship between time t and time τ, used to explain the spotting point; (b) is a diagram of the relationship between time t and time τ, used to explain the coefficients (weights) associated with the slope constraint.
（ａ）は、入力時系列パターンｆ（ｘ，ｙ，ｔ）を示した図であり、（ｂ）は、参照時系列パターンＺ（ξ（τ），η（τ））を示した図である。 (a) shows the input time-series pattern f(x, y, t); (b) shows the reference time-series pattern Z(ξ(τ), η(τ)).
少年が手を挙げながら口で何かを説明している状態を示した図である。 A diagram showing a boy explaining something aloud while raising his hand.
式９を説明するために用いた図であって、（ｘ，ｙ，τ，ｔ）からなる４次元空間においてＳ（ｘ，ｙ，τ，ｔ）を決定するためのパス（経路）を示した図である。 A diagram used to explain Equation 9, showing the paths for determining S(x, y, τ, t) in the four-dimensional space (x, y, τ, t).
式１０を説明するために用いた図であって、（ｘ，ｙ，τ，ｔ）からなる４次元空間においてＳ（ｘ，ｙ，τ，ｔ）を決定するためのパス（経路）を示した図である。 A diagram used to explain Equation 10, showing the paths for determining S(x, y, τ, t) in the four-dimensional space (x, y, τ, t).
式１１を説明するために用いた図であって、（ｘ，ｙ，τ，ｔ）からなる４次元空間においてＳ（ｘ，ｙ，τ，ｔ）を決定するためのパス（経路）を示した図である。 A diagram used to explain Equation 11, showing the paths for determining S(x, y, τ, t) in the four-dimensional space (x, y, τ, t).
（ａ）〜（ｃ）は、参照動画像における空間的変形を説明するための図である。 (a) to (c) are diagrams for explaining spatial deformation in the reference moving image.
（ａ）は、８つのプリミティブな動きを一例として示した図であり、（ｂ）は、（ａ）に示したｉ＝２の動きを行う参照動画像の一例を示した図であり、（ｃ）は、プリミティブな動きに基づいて、複数の対象物の任意の動きをトラッキングした様子を示した図である。 (a) shows eight primitive motions as an example; (b) shows an example of a reference moving image performing the motion i = 2 shown in (a); (c) shows arbitrary motions of a plurality of objects being tracked on the basis of the primitive motions.
（ａ）に示すように、時間ｔにスポッティングされた軌跡のτ＝１からτ＝Ｔまでの座標を時間ｔの画像上に全てプロットする様子を模式的に示した図であり、（ｂ）は、ｆ（ｘ，ｔ）で示される入力動画像の一例を示した図である。 (a) schematically shows all coordinates, from τ = 1 to τ = T, of a trajectory spotted at time t being plotted on the image at time t; (b) shows an example of an input moving image denoted f(x, t).
本実施の形態に係る動画像処理装置の概略構成を示したブロック図である。 A block diagram showing the schematic configuration of the moving image processing apparatus according to the present embodiment.
本実施の形態に係る動画像処理装置の処理内容を示したフローチャートである。 A flowchart showing the processing performed by the moving image processing apparatus according to the present embodiment.
参照動画像と、画面に映った人物が右手を上から下へと移動させる様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a person on screen moves the right hand from top to bottom, and an output moving image.
参照動画像と、画面に映った人物が奥側から手前側へと移動する様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a person on screen moves from the back toward the front, and an output moving image.
参照動画像と、画面に映った人物が手のひらで上から下へとＳの字を描くように移動する様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a person on screen moves a palm from top to bottom so as to trace the letter S, and an output moving image.
参照動画像と、画面に映った人物が両方の手のひらを上から下へと同時に移動させる様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a person on screen moves both palms from top to bottom at the same time, and an output moving image.
参照動画像と、画面に映った人物が、両方の手のひらを体の中心位置において交互に回転させて入れ替えながら下から上へと移動させる様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a person on screen moves both palms from bottom to top while rotating them alternately around each other at the center of the body, and an output moving image.
参照動画像と、画面に映ったボールが画面の右側から左側へと移動する様子を示した入力動画像と、出力動画像とを示した図である。 A diagram showing a reference moving image, an input moving image in which a ball on screen moves from the right side of the screen to the left side, and an output moving image.

以下、本発明に係る動画像処理装置の一例を、図面を用いて詳細に説明する。 Hereinafter, an example of a moving image processing apparatus according to the present invention will be described in detail with reference to the drawings.

動画像処理装置は、移動対象物（以下、移動物という）の動きを示した参照動画像に基づいて、入力動画像から参照動画像に示される移動物の動きに類似する動きを認識・抽出する（マッチングする）と共に、抽出された移動物の軌跡を抽出（トラッキング）することを特徴とする。 The moving image processing apparatus recognizes and extracts a movement similar to the movement of the moving object indicated in the reference moving image from the input moving image based on the reference moving image indicating the movement of the moving object (hereinafter referred to as the moving object). In addition to performing (matching), the trajectory of the extracted moving object is extracted (tracked).

ここで、参照動画像とは、移動物の動き（パターン）が撮影された有限の動画像である。一方で、入力動画像は、一般的な動画像であって、撮影される対象は特に限定されない。また、入力動画像は、有限の動画像には限定されず、連続的に映像が続くエンドレスな動画像により構成されるものであってもよい。このため、参照動画像は動画の始まりと終わりがあるのに対して、入力動画像は始めと終わりが必ずしもある必要がなく、予めいかなる切り出しも行われていない点で相違する。本実施の形態に係る入力動画像では、エンドレスな動画像を用いるものとする。 Here, the reference moving image is a finite moving image in which the movement (pattern) of the moving object is captured. On the other hand, the input moving image is a general moving image, and the subject to be photographed is not particularly limited. Further, the input moving image is not limited to a finite moving image, and may be configured by an endless moving image in which video continuously continues. For this reason, the reference moving image has the beginning and end of the moving image, whereas the input moving image does not necessarily have to have the beginning and end, and is different in that no clipping is performed in advance. In the input moving image according to the present embodiment, an endless moving image is used.

ここで、入力動画像や参照動画像は、動画像を構成する時間毎の画像平面を２次元空間として捉え、画像平面の時間変化を時系列的な視点で捉えることにより、時空間的なパターンにより構成されるものであると判断することができる。一方でこのような時空間的なパターンにより構成されるものではなく、単なる時系列のデータ同士のマッチングを行う連続ＤＰ（Dynamic Programming：ダイナミック・プログラミング）と呼ばれる技術が知られている。連続ＤＰは、例えば、『Ryuichi Oka, "Spotting Method for Classification of Real World Data", The Computer Journal, Vol.41, Issue 8, Oxford University Press, 1998年, p.559-565』などの文献に開示されている。 Here, an input moving image or a reference moving image can be regarded as a spatio-temporal pattern: the image plane at each time is treated as a two-dimensional space, and the change of the image plane over time is viewed as a time series. By contrast, a technique called continuous DP (Dynamic Programming) is known that does not handle such spatio-temporal patterns but simply matches time-series data against each other. Continuous DP is disclosed, for example, in Ryuichi Oka, "Spotting Method for Classification of Real World Data", The Computer Journal, Vol. 41, Issue 8, Oxford University Press, 1998, pp. 559-565.

［連続ＤＰ］ [Continuous DP]
連続ＤＰとは、参照データと入力データとを用意し、入力データから参照データのデータパターンに類似する部分を検出する手法である。ここで、参照データとして、例えば、時系列的な始点と終点と設けられた１次元的なデータが用いられ、入力データとして、時系列的な始点と終点とが設けられていない１次元的なデータが用いられる。例えば、参照データとして、特定の音を示した音声データ、例えば「東京」と発音された音声データを用いることができる。音声データは、振幅の時系列変化で示され、より詳細には、時系列毎に変化する周波数特性により音声データのパターンを示すことができる。このため連続ＤＰを用いることにより、始点と終点とが設けられた参照データ（始点と終点が設けられているため区間時系列パターンと呼ぶことができる）に類似する部分を、始点と終点とが設けられていない入力データ（すなわち事前にいかなる切り出しもされていない時系列パターン）の中から認識（抽出）することが可能となる。このように始点と終点のない時系列パターンの中から区間時系列パターンを認識し、かつその区間をも特定することを、連続ＤＰでは、スポッティング認識と呼ぶ。 Continuous DP is a technique in which reference data and input data are prepared and portions similar to the data pattern of the reference data are detected in the input data. As reference data, for example, one-dimensional data with a start point and an end point in time is used; as input data, one-dimensional data without a start point or end point in time is used. For example, speech data representing a specific sound, such as the spoken word "Tokyo", can be used as the reference data. Speech data is represented by the change of amplitude over time; more precisely, the pattern of the speech data can be represented by frequency characteristics that change over time. Using continuous DP, a portion similar to the reference data, which has a start point and an end point (and can therefore be called an interval time-series pattern), can be recognized (extracted) from input data that has no start point or end point (that is, a time-series pattern that has not been segmented in any way beforehand). In continuous DP, recognizing an interval time-series pattern within a time-series pattern that has no start or end point, and also identifying that interval, is called spotting recognition.

図１（ａ）は、スポッティング認識を説明するために示した、時間ｔと時間τとからなる関係図である。図１（ａ）において、横軸は入力データの時間軸ｔを示し、始点と終点のない入力データの時系列パターンが示されている。この入力データの時系列パターンを、時間ｔを用いて入力時系列パターンｇ（ｔ）と示す。一方で、図１（ａ）における縦軸は、参照データの時間軸τを示し、始点と終点とが設けられる区間時系列パターンが示されている。この参照データの区間時系列パターンを、時間τを用いて参照時系列パターンｆ（τ）として示すものとする。 FIG. 1A is a relational diagram composed of time t and time τ shown for explaining spotting recognition. In FIG. 1A, the horizontal axis represents a time axis t of input data, and a time series pattern of input data without a start point and an end point is shown. The time series pattern of this input data is indicated as input time series pattern g (t) using time t. On the other hand, the vertical axis in FIG. 1A indicates the time axis τ of the reference data, and shows a section time series pattern in which a start point and an end point are provided. The section time series pattern of the reference data is indicated as a reference time series pattern f (τ) using time τ.

また、参照時系列パターンｆ（τ）には、始点と終点とが設けられており、τの範囲は、１≦τ≦Ｔの範囲である。一方で、入力時系列パターンｇ（ｔ）は、始点と終点とが設けられていないため、ｔの範囲は、−∞≦ｔ≦＋∞の範囲となる。 The reference time series pattern f (τ) has a start point and an end point, and the range of τ is a range of 1 ≦ τ ≦ T. On the other hand, since the input time series pattern g (t) has no start point and no end point, the range of t is a range of −∞ ≦ t ≦ + ∞.

連続ＤＰのアルゴリズムでは、以下のような評価関数を用いて、時間毎に最適解を求めることによりスポッティング認識を行う。 In the continuous DP algorithm, spotting recognition is performed by obtaining an optimal solution at each time using the following evaluation function.

この関係式は、参照時系列パターンの各点と入力時系列パターンの各点との間に最適対応関係を与えるものである。ここで、関数ｒは、変数にτを与えることによりｔが求められる関数であり、ｔとτとを対応づける関数の集合を関数ｒ（τ）で示したものである。関数ｒの変数にＴを与えた場合には、ｒ（Ｔ）＝ｔが成立する。また、関数ｒの変数τとτ＋１とを代入した場合には、ｒ（τ）≦ｒ（τ＋１）の条件を満たすものとする。 This relational expression gives the optimal correspondence between each point of the reference time-series pattern and each point of the input time-series pattern. Here, r is a function that yields t when given τ, and r(τ) denotes the set of functions that associate t with τ. When T is given as the argument of r, r(T) = t holds. For the consecutive arguments τ and τ + 1, the condition r(τ) ≤ r(τ + 1) is satisfied.
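The evaluation function itself (Equation 1) is not reproduced in this text. From the surrounding description, in which D(t, T) is the minimum of the sum of the local distances d(r(τ), τ) for τ from 1 to T, with r(T) = t and r non-decreasing, it can plausibly be reconstructed as follows. This is a reconstruction under those stated conditions, not the original typesetting.

```latex
D(t, T) = \min_{r} \sum_{\tau = 1}^{T} d\bigl(r(\tau), \tau\bigr),
\qquad r(T) = t, \quad r(\tau) \le r(\tau + 1)
```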

Ｄ（ｔ，Ｔ）は、スポッティング認識が可能であるか否かの判断に用いられる、後述のＡ（ｔ）を求めるための値である。式１におけるｄ（ｒ（τ），τ）は、ｄ（ｔ，τ）で示すこともでき、局所距離と呼ばれる値を示している。局所距離ｄは、参照時系列パターンｆ（τ）と入力時系列パターンｇ（ｔ）との最短距離の絶対値を示しており、 D(t, T) is the value used to obtain A(t), described later, which is used to judge whether spotting recognition is possible. The term d(r(τ), τ) in Equation 1 can also be written d(t, τ) and is called the local distance. The local distance d is the absolute value of the shortest distance between the reference time-series pattern f(τ) and the input time-series pattern g(t), and is obtained by
ｄ（ｔ，τ）＝‖ｇ（ｔ）−ｆ（τ）‖ ・・・式２ d(t, τ) = ‖g(t) − f(τ)‖ ... Equation 2
により求められるものである。式１に示されるように、Ｄ（ｔ，Ｔ）は、局所距離ｄ（ｒ（τ），τ）における、τの１からＴまでの累積的な値の和の最小値により求められる。 As shown in Equation 1, D(t, T) is the minimum value of the sum of the local distances d(r(τ), τ) accumulated from τ = 1 to τ = T.

局所距離ｄ（ｒ（τ），τ）の累積的な値Ｄ（ｔ，τ）は、ｔおよびτの各時間における下記のアルゴリズムの漸化式により求めることができる。 The cumulative value D(t, τ) of the local distance d(r(τ), τ) can be obtained at each time t and τ by the following recurrence formula.

この漸化式におけるτの範囲は２≦τ≦Ｔとなる。 The range of τ in this recurrence formula is 2 ≤ τ ≤ T.
また、式３では、Ｄ（ｔ，τ）の値として、 In Equation 3, the value of D(t, τ) is taken to be the smallest of the following three expressions:
（１）Ｄ（ｔ−２，τ−１）＋２ｄ（ｔ−１，τ）＋ｄ（ｔ，τ）と、 (1) D(t − 2, τ − 1) + 2d(t − 1, τ) + d(t, τ),
（２）Ｄ（ｔ−１，τ−１）＋３ｄ（ｔ，τ）と、 (2) D(t − 1, τ − 1) + 3d(t, τ), and
（３）Ｄ（ｔ−１，τ−２）＋３ｄ（ｔ，τ−１）＋３ｄ（ｔ，τ）の３式のうち、 (3) D(t − 1, τ − 2) + 3d(t, τ − 1) + 3d(t, τ).
最小となる式を用いて値が求められる。但し、ｔ≦０の場合、または、τが１からＴの範囲に含まれない場合に、Ｄ（ｔ，τ）は、Ｄ（ｔ，τ）＝∞となる。 However, when t ≤ 0, or when τ is outside the range 1 to T, D(t, τ) = ∞.
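The three candidate expressions can be sketched in Python as follows. This is a minimal illustration for scalar samples, not the apparatus's implementation. The function names and sample data are hypothetical, and the boundary value D(t, 1) = 3d(t, 1) is an assumption, because the text only defines the recurrence for 2 ≤ τ ≤ T.

```python
import math

INF = math.inf


def local_distance(g_t, f_tau):
    # Equation 2 for scalar samples: d(t, tau) = |g(t) - f(tau)|
    return abs(g_t - f_tau)


def continuous_dp(g, f):
    """Return a table D with D[t][tau] = D(t, tau), using the 1-based indices of the text.

    Row 0 and column 0 are padding and stay at infinity (the t <= 0 and tau < 1 cases).
    """
    n, T = len(g), len(f)
    d = [[INF] * (T + 1) for _ in range(n + 1)]
    for t in range(1, n + 1):
        for tau in range(1, T + 1):
            d[t][tau] = local_distance(g[t - 1], f[tau - 1])

    D = [[INF] * (T + 1) for _ in range(n + 1)]
    for t in range(1, n + 1):
        # Assumed boundary condition (not stated in the text).
        D[t][1] = 3 * d[t][1]
        for tau in range(2, T + 1):
            # Expression (1); guarded so that t - 2 never becomes a negative index.
            p1 = D[t - 2][tau - 1] + 2 * d[t - 1][tau] + d[t][tau] if t >= 2 else INF
            # Expression (2)
            p2 = D[t - 1][tau - 1] + 3 * d[t][tau]
            # Expression (3)
            p3 = D[t - 1][tau - 2] + 3 * d[t][tau - 1] + 3 * d[t][tau]
            D[t][tau] = min(p1, p2, p3)
    return D


g = [5, 1, 2, 3, 5]   # hypothetical input samples g(1)..g(5)
f = [1, 2, 3]         # hypothetical reference samples f(1)..f(3)
D = continuous_dp(g, f)
print([D[t][len(f)] for t in range(1, len(g) + 1)])
```

In this toy input the reference 1, 2, 3 occurs exactly at t = 2..4, so D(4, 3) is 0 and the neighbouring times have larger values.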

ここで、（１）〜（３）に示す式は、図１（ｂ）に示すように、Ｄ（ｔ，τ）へ至る３つの経路を数式により示したものに該当する。図１（ｂ）は、横軸をｔ、縦軸をτで示している。図１（ｂ）のＡ点は、Ｄ（ｔ，τ）を意図している。一方で（１）式におけるＤ（ｔ−２，τ−１）は、図１（ｂ）のＢ１点が該当することになる。（１）式は、Ｂ１点からＢ２点を経てＡ点へと至る経路を数式で示したものに該当する。この経路において、Ｂ１点からＢ２点へと至る距離を、ｄ（ｔ−１，τ）に対して係数（ウェイト）として２を掛け合わせた２ｄ（ｔ−１，τ）に設定し、Ｂ２点からＡ点へと至る距離を、ｄ（ｔ，τ）に対して係数（ウェイト）として１を掛け合わせたｄ（ｔ，τ）に設定している。 Here, expressions (1) to (3) correspond to the three paths leading to D(t, τ), as shown in FIG. 1(b). In FIG. 1(b), the horizontal axis is t and the vertical axis is τ. Point A in FIG. 1(b) represents D(t, τ). D(t − 2, τ − 1) in expression (1) corresponds to point B1 in FIG. 1(b). Expression (1) describes the path from point B1 through point B2 to point A. On this path, the distance from point B1 to point B2 is set to 2d(t − 1, τ), that is, d(t − 1, τ) multiplied by a coefficient (weight) of 2, and the distance from point B2 to point A is set to d(t, τ), that is, d(t, τ) multiplied by a coefficient (weight) of 1.

また、（２）式におけるＤ（ｔ−１，τ−１）は、図１（ｂ）のＣ点に該当することになる。（２）式は、Ｃ点からＡ点へと至る経路を数式で示したものに該当する。この経路において、Ｃ点からＡ点へと至る距離を、ｄ（ｔ，τ）に対して係数（ウェイト）３を掛け合わせた３ｄ（ｔ，τ）に設定している。 D(t − 1, τ − 1) in expression (2) corresponds to point C in FIG. 1(b). Expression (2) describes the path from point C to point A. On this path, the distance from point C to point A is set to 3d(t, τ), that is, d(t, τ) multiplied by a coefficient (weight) of 3.

さらに、（３）式におけるＤ（ｔ−１，τ−２）は、図１（ｂ）のＤ１点に該当することになる。（３）式は、Ｄ１点からＤ２点を経てＡ点へと至る経路を数式で示したものに該当する。この経路において、Ｄ１点からＤ２点へと至る距離を、ｄ（ｔ，τ−１）に対して係数（ウェイト）として３を掛け合わせた３ｄ（ｔ，τ−１）に設定し、Ｄ２点からＡ点へと至る距離を、ｄ（ｔ，τ）に対して係数（ウェイト）３を掛け合わせた３ｄ（ｔ，τ）に設定している。 Further, D(t − 1, τ − 2) in expression (3) corresponds to point D1 in FIG. 1(b). Expression (3) describes the path from point D1 through point D2 to point A. On this path, the distance from point D1 to point D2 is set to 3d(t, τ − 1), that is, d(t, τ − 1) multiplied by a coefficient (weight) of 3, and the distance from point D2 to point A is set to 3d(t, τ), that is, d(t, τ) multiplied by a coefficient (weight) of 3.

各係数は、いずれの経路を経てＡ点に至る場合であっても、局所距離ｄの係数（ウェイト）が３の倍数になるように設定されたものである。例えば、Ｂ１点およびＣ点のように、τ−１の時の位置からＡ点へと移動する場合に設定される係数は３であり、Ｄ１のように、τ−２の時の位置からＡ点へと移動する場合に設定される係数は３＋３となる。この係数を連続ＤＰに用いられる（ｔ，τ）についての傾斜制限と呼ぶ。また、本実施の形態では、Ｄ（ｔ，τ）の値として、（１）から（３）の式のうち値が小さくなるものを抽出する構成としたが、Ｄ（ｔ，τ）を求める式は、必ずしも（１）から（３）の３つの式には限定されない。より多くの数式を用いて値が最小となる式を採用するものであってもよい。 The coefficients are chosen so that, whichever path is taken to reach point A, the coefficients (weights) of the local distance d sum to a multiple of 3. For example, when moving to point A from a position at τ − 1, such as point B1 or point C, the coefficients total 3, and when moving to point A from a position at τ − 2, such as point D1, the coefficients total 3 + 3. These coefficients are referred to as the slope constraint on (t, τ) used in continuous DP. In the present embodiment, D(t, τ) is taken as the smallest of expressions (1) to (3), but the expressions for obtaining D(t, τ) are not necessarily limited to these three; a larger number of expressions may be used, with the one giving the minimum value being adopted.

その後、上述の式３に基づく漸化式を求めることにより、以下のＡ（ｔ）を求める。 Then, A(t) below is obtained by evaluating the recurrence formula of Equation 3.
Ａ（ｔ）＝Ｄ（ｔ，Ｔ）／３Ｔ A(t) = D(t, T) / 3T

このＡ（ｔ）は、連続ＤＰの入力時系列パターンのパラメータである時間ｔにおける、各時間の出力を示している。この値の変化をモニタし、モニタした値が予め設定される閾値（ｈ）以下で最小となる時点でスポッティング認識がなされたものと判断する。閾値（ｈ）以下で最小となる時点ｔをスポッティング時点（spotting time point）と呼ぶ。スポッティング時点において、参照時系列パターンにおけるパターン、例えば、参照時系列パターンが「東京」という音声データであれば、その東京という音声パターンを認識したことになる。 A(t) is the output of continuous DP at each time t, the time parameter of the input time-series pattern. Changes in this value are monitored, and spotting recognition is judged to have occurred at the time when the monitored value reaches a minimum at or below a preset threshold (h). The time t at which the value is minimal at or below the threshold (h) is called the spotting time point. At the spotting time point, the pattern of the reference time-series pattern has been recognized; for example, if the reference time-series pattern is speech data for "Tokyo", the speech pattern "Tokyo" has been recognized.
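The monitoring step can be sketched as follows. The sketch assumes that "minimal at or below the threshold" means the smallest value within each consecutive run of times where A(t) ≤ h; that reading, the function names, and the sample values are assumptions for illustration. List indices here are 0-based positions in the sample list, not the patent's time values.

```python
import math


def normalized_output(D_T, T):
    """A(t) = D(t, T) / 3T for each entry of D_T, a list of D(t, T) values."""
    return [value / (3 * T) for value in D_T]


def spotting_times(A, h):
    """Return the index of the minimum of A inside each run where A <= h."""
    spots = []
    best = None  # index of the smallest value seen in the current run
    for t, a in enumerate(A):
        if a <= h:
            if best is None or a < A[best]:
                best = t
        elif best is not None:
            # The run below the threshold has ended; its minimum is a spotting time.
            spots.append(best)
            best = None
    if best is not None:
        spots.append(best)
    return spots


# Hypothetical D(t, T) values for a reference of length T = 3.
D_T = [math.inf, 21, 3, 0, 2, 30, 9, 1.5, 18]
A = normalized_output(D_T, T=3)
print(spotting_times(A, h=0.5))
```

Because a run is only closed once A(t) rises above h again, an online version would report each spotting time with a short delay.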

その後、スポッティング時点へと至る漸化式のバックトレースを、上述した式３などを用いて行うことにより、参照時系列パターンに対応する入力時系列中の区間（図１（ａ）においてSegmented intervalと表記される区間）を容易に検出することが可能となる。 Thereafter, by backtracing the recurrence that leads to the spotting time point, using Equation 3 and the like, the interval in the input time series that corresponds to the reference time-series pattern (the interval labeled "Segmented interval" in FIG. 1(a)) can be detected easily.
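One way to realize such a backtrace is to store, for every D(t, τ), the cell it was computed from, and to follow those links from the spotting time point back to τ = 1. The sketch below does this for scalar samples. The helper names, the sample data, and the boundary value D(t, 1) = 3d(t, 1) are assumptions and are not taken from the text.

```python
import math

INF = math.inf


def continuous_dp_with_origins(g, f):
    """Run the three-path recurrence and remember which cell each D(t, tau) came from."""
    n, T = len(g), len(f)

    def d(t, tau):
        # Equation 2 for scalar samples, with the 1-based indices of the text.
        return abs(g[t - 1] - f[tau - 1])

    D = [[INF] * (T + 1) for _ in range(n + 1)]
    origin = [[None] * (T + 1) for _ in range(n + 1)]
    for t in range(1, n + 1):
        D[t][1] = 3 * d(t, 1)  # assumed boundary condition, not stated in the text
        for tau in range(2, T + 1):
            paths = []
            if t >= 3:  # expression (1): from (t-2, tau-1)
                paths.append((D[t - 2][tau - 1] + 2 * d(t - 1, tau) + d(t, tau),
                              (t - 2, tau - 1)))
            if t >= 2:  # expression (2): from (t-1, tau-1)
                paths.append((D[t - 1][tau - 1] + 3 * d(t, tau), (t - 1, tau - 1)))
            if t >= 2 and tau >= 3:  # expression (3): from (t-1, tau-2)
                paths.append((D[t - 1][tau - 2] + 3 * d(t, tau - 1) + 3 * d(t, tau),
                              (t - 1, tau - 2)))
            finite = [p for p in paths if p[0] < INF]
            if finite:
                D[t][tau], origin[t][tau] = min(finite)
    return D, origin


def segment_start(origin, t_spot, T):
    """Follow stored origins from (t_spot, T) back to tau = 1 and return the start time.

    Assumes D(t_spot, T) is finite, so every cell on the way has a stored origin.
    """
    t, tau = t_spot, T
    while tau > 1:
        t, tau = origin[t][tau]
    return t


g = [5, 1, 2, 3, 5]   # hypothetical input samples g(1)..g(5)
f = [1, 2, 3]         # hypothetical reference samples f(1)..f(3)
D, origin = continuous_dp_with_origins(g, f)
start = segment_start(origin, t_spot=4, T=len(f))
print(start, 4)  # start and end of the detected interval
```

Storing origins costs one extra table of the same size as D; recomputing the minimum at each step of the backtrace would trade that memory for time.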

なお、上述したように、Ｄ（ｔ，τ）を求める数式（１）から（３）には係数（ウェイト）が設定されている。また、Ｄ（ｔ，Ｔ）におけるＴは、参照時系列パターンの終端時間（始点から終点までの時間）を示しており、参照時系列パターンに応じてＴの値が異なるので、係数３と参照時系列パターンの時間Ｔとを、求められたＤ（ｔ，Ｔ）から割ることにより、係数（ウェイト）が考慮されたスポッティング点（spotting point：図１（ａ）参照）を算出することが可能となる。なお、スポッティング点は、前述したように単にＡ（ｔ）により求められるのではなく、局所的に閾値（ｈ）以下で最小となるポイントを求めることにより決定されることになる。 As described above, coefficients (weights) are set in expressions (1) to (3) for obtaining D(t, τ). T in D(t, T) is the end time of the reference time-series pattern (the time from its start point to its end point), and its value differs from one reference time-series pattern to another. Dividing the obtained D(t, T) by the coefficient 3 and by the length T of the reference time-series pattern therefore makes it possible to calculate a spotting point (see FIG. 1(a)) that takes the coefficients (weights) into account. As noted above, the spotting point is not given by A(t) alone; it is determined by finding the point at which the value is locally minimal at or below the threshold (h).

なお、上述した連続ＤＰは、１次元的な時系列パターンからなる音声の識別分野などで利用されている。このように、従来から知られている連続ＤＰは、時系列パターンのマッチングを行うことを特徴としている。 Note that the above-described continuous DP is used in the field of voice identification composed of a one-dimensional time-series pattern. Thus, conventionally known continuous DP is characterized by matching time-series patterns.

音声のような１次元的な時系列パターンを用いてマッチングを行う場合には、連続ＤＰを好適に用いることができる。しかしながら、動画像における対象物の移動パターンの認識とその軌跡抽出を考える場合には、２次元的な平面から捉える側面とその時間変化という時間軸で捉える側面との２つの側面を同時に備える必要が生ずる。このように、動画像では２つの側面を備えているため、従来の特徴ベースの手法を用いて画像平面の特徴抽出を行う場合には、あるいは、その特徴の時間変化を用いて、参照時系列パターンを入力時系列パターンの中から抽出する場合には、時間と空間（画像平面）との間の相互作用（カップリング）が弱くなる傾向があった。また、画像平面を縦または横に走査して一つの軸を構成し、画像平面を１つの１次元のベクトル空間として扱う場合であっても、同様に、時間と空間（平面）との間の相互作用（カップリング）が弱くなる傾向があった。 When matching one-dimensional time-series patterns such as speech, continuous DP can be used to good effect. However, recognizing the movement pattern of an object in a moving image and extracting its trajectory requires handling two aspects at once: the aspect captured as a two-dimensional plane, and the aspect captured along the time axis as that plane changes over time. Because a moving image has these two aspects, the interaction (coupling) between time and space (the image plane) tended to be weak when features of the image plane were extracted with a conventional feature-based method, or when the temporal change of those features was used to extract a reference time-series pattern from an input time-series pattern. Likewise, even when the image plane was scanned vertically or horizontally to form a single axis and treated as a one-dimensional vector space, the interaction (coupling) between time and space (the plane) tended to be weak.

このため、本実施の形態に係る動画像処理装置では、上述した連続ＤＰに対して、動画像を構成する画像平面のパラメータを内部的に導入するアルゴリズムを考えることにより、動画像の抽出を行うことを特徴する。このように２次元的な画像平面が時間変化することにより構成される動画像の特徴から、本実施の形態に係る連続ＤＰを時空間連続ＤＰと呼ぶ。 For this reason, the moving image processing apparatus according to the present embodiment extracts moving images using an algorithm that introduces the parameters of the image plane constituting the moving image into the continuous DP described above. Because a moving image consists of a two-dimensional image plane that changes over time, the continuous DP according to the present embodiment is called spatio-temporal continuous DP.

［時空間連続ＤＰ］ [Spatio-temporal continuous DP]
上述した連続ＤＰでは、マッチングの対象とする時系列パターンが、音声のような１次元的に時間変化（時系列変化）することを特徴としていたが、時空間連続ＤＰでは、動画像のように、２次元空間を成す画像平面が時間変化（時系列変化）するものを対象とする点で相違する。このため、時空間連続ＤＰでは、連続ＤＰにおける時系列パターンを、１次元的に時系列変化するデータから、２次元的に時系列変化するデータ、具体的には動画像へと拡張することにより、時空間画像（時系列的に変化する画像）同士のスポッティング認識を実現させる。従って、時空間連続ＤＰでは、入力時系列パターンおよび参照時系列パターンとして、動画像（時系列的に変化する画像平面）からなるデータを用いる。 The continuous DP described above is characterized in that the time-series pattern to be matched changes over time (time-series change) in one dimension, as speech does. Spatio-temporal continuous DP differs in that it targets data such as moving images, in which an image plane forming a two-dimensional space changes over time. Spatio-temporal continuous DP therefore extends the time-series pattern of continuous DP from data that changes in one dimension over time to data that changes in two dimensions over time, specifically moving images, and thereby realizes spotting recognition between spatio-temporal images (images that change in time series). Accordingly, spatio-temporal continuous DP uses data consisting of moving images (image planes that change in time series) as both the input time-series pattern and the reference time-series pattern.

ここで、動画像は、画像平面の時系列変化により構成されるものであるため、２つの空間パラメータ（画像平面における２次元的なパラメータ）と１つの時間パラメータ（時系列変化を示す時間のパラメータ）とを備えている。このため、時空間連続ＤＰにおける入力時系列パターンは、ｆ（ｘ，ｙ，ｔ）で示すことができ、参照時系列パターンは、ｇ（ξ，η，τ）で表すことができる。ｆ（ｘ，ｙ，ｔ）において、ｘとｙとは、入力時系列パターンの画像平面の座標のパラメータを示し、ｔは、その時系列変化を示す時間を示している。一方で、ｇ（ξ，η，τ）において、ξとηとは、参照時系列パターンの画像平面の座標のパラメータを示し、τは、その時系列変化を示す時間を示している。このように、入力時系列パターンとして３変数、参照時系列パターンとして３変数の合計６変数が用いられるため、その要素間の局所距離は６変数によって、ｄ（ｘ，ｙ，ｔ，ξ，η，τ）で示されることになる。 Here, since a moving image consists of time-series changes of an image plane, it has two spatial parameters (the two-dimensional parameters of the image plane) and one time parameter (the time parameter indicating the time-series change). The input time-series pattern in spatio-temporal continuous DP can therefore be written f(x, y, t), and the reference time-series pattern g(ξ, η, τ). In f(x, y, t), x and y are the coordinates on the image plane of the input time-series pattern, and t is the time of its time-series change. Likewise, in g(ξ, η, τ), ξ and η are the coordinates on the image plane of the reference time-series pattern, and τ is the time of its time-series change. Since three variables are used for the input time-series pattern and three for the reference time-series pattern, six in total, the local distance between their elements is a function of six variables, d(x, y, t, ξ, η, τ).

However, if cumulative local distances are computed while the local distance still depends on six variables, the amount of computation becomes enormous, the processing load on the moving image processing apparatus increases, and rapid processing may become difficult. It is also practically difficult to extend continuous DP while the local distance depends on six variables. For this reason, the spatio-temporal continuous DP according to the present embodiment is characterized in that the number of variables of the local distance is reduced from six to four.

Specifically, the input time-series pattern f(x, y, t) keeps its three variables and is a moving image in which a two-dimensional image plane with light/dark levels changes in time series. The moving image of f(x, y, t) may contain any background and any number of moving objects. FIG. 2(a) schematically shows the input time-series pattern f(x, y, t). In FIG. 2(a), the image plane of f(x, y, t) at time t is shown as the xy plane. As an example, FIG. 2(a) shows the images of two moving objects, indicated by arrows β1 and β2, and also contains noise, indicated by β3.

The reference time-series pattern, on the other hand, is treated as a trajectory pattern (pixel trajectory pattern) in the three-dimensional space (ξ, η, τ) and is written Z(ξ(τ), η(τ)). Z(ξ(τ), η(τ)) refers to the coordinate point (ξ(τ), η(τ)) on the two-dimensional image plane at time τ, and Z is in general a light/dark value. That is, the reference time-series pattern is defined as a trajectory pattern that has one light/dark value in the three-dimensional space and has τ as its only parameter. Because the reference time-series pattern is an interval pattern with a start point and an end point, τ takes the values τ = 1, 2, ..., T. FIG. 2(b) shows the reference time-series pattern Z(ξ(τ), η(τ)). As τ changes from 1 to T, the corresponding coordinate point on the ξη plane (image plane) is specified as Z(ξ(τ), η(τ)). Defining the reference time-series pattern with τ as the only parameter makes it possible to represent a category of motion as a single moving image, as in continuous DP. Hereinafter, the moving image corresponding to the input time-series pattern is referred to as the input moving image f(x, y, t), and the moving image corresponding to the reference time-series pattern as the reference moving image Z(ξ(τ), η(τ)).
Spatio-temporal continuous DP can be described by the following algorithm.

Input moving image: f(x, y, t)
where 1 ≤ x ≤ M, 1 ≤ y ≤ N, and 1 ≤ t < ∞; M is the maximum value of the abscissa on the image plane of the input moving image, and N is the maximum value of the ordinate.

Reference moving image: Z(ξ(τ), η(τ))
where 1 ≤ ξ(τ) ≤ M, 1 ≤ η(τ) ≤ N, and 1 ≤ τ ≤ T.

The difference in the trajectory along ξ between time τ and time τ − 1 is denoted e₁(τ):
e₁(τ) = ξ(τ) − ξ(τ − 1),
and the difference in the trajectory along η between time τ and time τ − 1 is denoted e₂(τ):
e₂(τ) = η(τ) − η(τ − 1).
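As a small illustration of these definitions, the displacements e₁(τ) and e₂(τ) can be computed directly from a sampled trajectory. The Python sketch below assumes the trajectory is supplied as two lists of integer pixel coordinates indexed from τ = 1; the function name and the dictionaries keyed by τ are illustrative choices, not part of the described apparatus.

```python
def trajectory_displacements(xi, eta):
    """Return (e1, e2) as dictionaries keyed by tau = 2..T.

    xi[tau - 1] and eta[tau - 1] hold the reference coordinates at time tau
    (tau is 1-based, the lists are 0-based).
    """
    T = len(xi)
    e1 = {tau: xi[tau - 1] - xi[tau - 2] for tau in range(2, T + 1)}
    e2 = {tau: eta[tau - 1] - eta[tau - 2] for tau in range(2, T + 1)}
    return e1, e2


# A three-point trajectory: (1, 5) -> (2, 5) -> (4, 3)
e1, e2 = trajectory_displacements([1, 2, 4], [5, 5, 3])
print(e1, e2)
```

The displacements are defined only for τ ≥ 2, which matches the range over which the recurrence formula uses them.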

Further, as described above, expressing the reference moving image with the single variable τ makes it possible to define the local distance between the input moving image f(x, y, t) and the reference moving image Z(ξ(τ), η(τ)) with four variables, as follows:
d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖ ... Equation 4
Next, the cumulative local distance optimized over x, y and t is expressed by Equation 5 as the evaluation function S(x, y, T, t).

Here, p(x, y, τ) is a function that gives the value of x at time τ, q(x, y, τ) is a function that gives the value of y at time τ, and r(τ) is a function that gives the time t corresponding to time τ (it maps τ to t). In addition, r(τ) ≤ r(τ + 1) and r(T) = t.
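The local distance of Equation 4 can be sketched in a few lines of Python. This sketch assumes grey-level frames stored as nested lists, so the norm reduces to an absolute difference, and it returns infinity outside the image plane; the function name `local_distance` and the data layout are assumptions made for illustration.

```python
import math


def local_distance(z_value, frame, x, y):
    """d(x, y, tau, t) for one reference grey level and one input frame.

    z_value is Z(xi(tau), eta(tau)); frame[y - 1][x - 1] is f(x, y, t)
    with 1-based pixel coordinates.
    """
    n_rows, n_cols = len(frame), len(frame[0])
    if not (1 <= x <= n_cols and 1 <= y <= n_rows):
        return math.inf  # outside the image plane
    return abs(z_value - frame[y - 1][x - 1])


frame = [[0, 10, 20],
         [30, 40, 50]]
print(local_distance(45, frame, 2, 2))
```

For colour images the absolute difference would be replaced by a vector norm over the channels; the source leaves the choice of norm open.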

The method of obtaining S(x, y, T, t), the minimized value of this evaluation function, from a recurrence formula based on continuous DP is what is referred to as spatio-temporal continuous DP.

The recurrence formula for S(x, y, T, t) takes as its initial condition, for τ = 1,
S(x, y, 1, t) = 3d(x, y, 1, t) ... Equation 6
and is given by Equation 7, whose three candidates are discussed individually below as Equations 11, 9 and 10:
S(x, y, τ, t) = min {
  S(x − e₁(τ), y − e₂(τ), τ − 1, t − 2) + 2d(x, y, τ, t − 1) + d(x, y, τ, t),
  S(x − e₁(τ), y − e₂(τ), τ − 1, t − 1) + 3d(x, y, τ, t),
  S(x − e₁(τ) − e₁(τ − 1), y − e₂(τ) − e₂(τ − 1), τ − 2, t − 1) + 3d(x − e₁(τ), y − e₂(τ), τ − 1, t) + 3d(x, y, τ, t)
} ... Equation 7
In this recurrence formula, τ ranges over 2 ≤ τ ≤ T.
If (x, y) lies outside the M × N image plane, if t ≤ 0, or if τ is outside the range 1 to T, then
S(x, y, τ, t) = ∞ and d(x, y, τ, t) = ∞.
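To make the recurrence concrete, the following Python sketch evaluates it by brute force over a short grey-level clip. It is a minimal illustration under stated assumptions: the three candidates are taken from the expressions the text labels Equations 9, 10 and 11, the norm of Equation 4 is an absolute difference, and the function name `st_cdp`, the nested-list frames and the dictionary storage are all hypothetical choices rather than the apparatus's implementation.

```python
import math

INF = math.inf


def st_cdp(frames, xi, eta, z):
    """Return {(x, y, t): S(x, y, T, t)} for every pixel and frame.

    frames[t - 1][y - 1][x - 1] is f(x, y, t); xi, eta and z hold the
    reference trajectory and its grey level for tau = 1..T (1-based).
    """
    T, n_frames = len(xi), len(frames)
    N, M = len(frames[0]), len(frames[0][0])
    S = {}  # (x, y, tau, t) -> cumulative distance; a missing key means infinity

    def d(x, y, tau, t):
        # Local distance (Equation 4), infinite outside the valid ranges.
        if not (1 <= x <= M and 1 <= y <= N and 1 <= tau <= T and 1 <= t <= n_frames):
            return INF
        return abs(z[tau - 1] - frames[t - 1][y - 1][x - 1])

    def s(x, y, tau, t):
        return S.get((x, y, tau, t), INF)

    for t in range(1, n_frames + 1):
        for y in range(1, N + 1):
            for x in range(1, M + 1):
                S[(x, y, 1, t)] = 3 * d(x, y, 1, t)  # initial condition (Equation 6)
        for tau in range(2, T + 1):
            e1 = xi[tau - 1] - xi[tau - 2]
            e2 = eta[tau - 1] - eta[tau - 2]
            for y in range(1, N + 1):
                for x in range(1, M + 1):
                    px, py = x - e1, y - e2
                    here = d(x, y, tau, t)
                    # Input stretched in time (Equation 11)
                    stretch = s(px, py, tau - 1, t - 2) + 2 * d(x, y, tau, t - 1) + here
                    # One-to-one in time (Equation 9)
                    linear = s(px, py, tau - 1, t - 1) + 3 * here
                    # Input compressed in time (Equation 10); needs tau - 2 >= 1
                    compress = INF
                    if tau >= 3:
                        qx = px - (xi[tau - 2] - xi[tau - 3])
                        qy = py - (eta[tau - 2] - eta[tau - 3])
                        compress = (s(qx, qy, tau - 2, t - 1)
                                    + 3 * d(px, py, tau - 1, t) + 3 * here)
                    S[(x, y, tau, t)] = min(stretch, linear, compress)
    return {(x, y, t): v for (x, y, tau, t), v in S.items() if tau == T}


# A bright dot moving one pixel to the right per frame on a 1 x 5 image.
frames = [[[0, 255, 0, 0, 0]], [[0, 0, 255, 0, 0]], [[0, 0, 0, 255, 0]]]
result = st_cdp(frames, xi=[1, 2, 3], eta=[1, 1, 1], z=[255, 255, 255])
print(result[(4, 1, 3)])
```

Only the displacements of the reference trajectory enter the recurrence, so the dot is found at x = 4 even though the reference trajectory itself ends at ξ = 3. A practical implementation would keep only the frames t, t − 1 and t − 2 in memory rather than the whole history.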

The weight 3 that multiplies d(x, y, 1, t) for τ = 1 in Equation 6, and the weights that multiply the local distances d in Equation 7, are an example for the case in which there are three local transition candidates (three paths), as shown in FIG. 1(b). These values may change if the number of local transition candidates is not three, or if weights different from those shown in FIG. 1(b) are assigned to the paths.

Using the recurrence formula of Equation 7, spatio-temporal continuous DP yields the value S(x, y, T, t) at each time t. Here the range of (x, y) corresponds to the pixel range of the image plane of the input moving image: M is the number of pixels of the input moving image in the horizontal direction, and N is the number of pixels in the vertical direction.

Using the minimized evaluation function S(x, y, T, t), spotting recognition of the reference moving image in the input moving image is obtained by Equation 8.

The 3T in S(x, y, T, t)/3T on the right-hand side of Equation 8 is a constant that normalizes for the length of the reference time-series pattern in the time direction. It is obtained by summing all the weights applied to the local distance d in the recurrence formula of Equation 7 from τ = 1 to τ = T, which for Equation 7 gives 3T. This value holds when three local transition candidates (three paths) are set and their weights are set as shown in FIG. 1(b). Dividing the evaluation function S(x, y, T, t), which is a cumulative distance, by 3T normalizes it.
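The claim that the weights always add up to 3T can be checked with a short calculation. In the sketch below, each local transition is summarized by how far it advances τ and by the sum of the weights it applies to d, read off from the three expressions of the recurrence formula; the step names and the function `total_weight` are illustrative labels only.

```python
# (advance in tau, sum of weights on d) for each local transition
STEPS = {
    "stretch": (1, 2 + 1),   # weights 2 and 1, tau advances by 1
    "linear": (1, 3),        # weight 3, tau advances by 1
    "compress": (2, 3 + 3),  # weights 3 and 3, tau advances by 2
}


def total_weight(path):
    """Return (final tau, accumulated weight) for a sequence of step names."""
    tau, weight = 1, 3  # initial condition: S(x, y, 1, t) = 3 d(x, y, 1, t)
    for name in path:
        d_tau, d_weight = STEPS[name]
        tau += d_tau
        weight += d_weight
    return tau, weight


tau, weight = total_weight(["linear", "compress", "stretch"])
print(tau, weight, weight == 3 * tau)
```

Every transition contributes a weight of 3 per unit of τ, so any path that reaches τ = T has accumulated exactly 3T, which is why a single constant suffices for normalization.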

According to Equation 8, when the normalized cumulative value is at or below a preset threshold h and the coordinate point at which it is locally minimal in the (x, y) plane is (x*, y*), a motion similar to the time-series pattern of the reference moving image is judged to have been recognized at the coordinates (x*, y*) of the input moving image at that time t.

In Equation 8, the minimum over (x, y) is evaluated within a local area, that is, an area in which the value is at or below the threshold h (the local area lies within the horizontal and vertical resolution of the input moving image). The reason is that making the recognition decision per local area allows several similar image regions (more precisely, groups of similar pixels) to be detected at the same time. Without the threshold h, only one similar image region could be extracted per time step over the whole frame of the input moving image. With h as the criterion, spotting recognition can be carried out in each of several image regions, so several similar image regions can be extracted.
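The thresholded local-minimum test described here might be sketched as follows. The sketch assumes that "locally minimal" means no smaller value among the eight neighbouring pixels, which is one possible reading of the local area; the function name `spotting_points` and the grid layout are likewise assumptions.

```python
def spotting_points(s_grid, T, h):
    """Return [(x, y, value)] where S / 3T <= h and is a local minimum.

    s_grid[y - 1][x - 1] holds S(x, y, T, t) for one time t.
    """
    n_rows, n_cols = len(s_grid), len(s_grid[0])
    hits = []
    for j in range(n_rows):
        for i in range(n_cols):
            value = s_grid[j][i] / (3 * T)
            if value > h:
                continue  # fails the threshold test
            neighbours = [
                s_grid[jj][ii] / (3 * T)
                for jj in range(max(0, j - 1), min(n_rows, j + 2))
                for ii in range(max(0, i - 1), min(n_cols, i + 2))
                if (ii, jj) != (i, j)
            ]
            if all(value <= n for n in neighbours):
                hits.append((i + 1, j + 1, value))
    return hits


grid = [[60, 60, 60, 60, 60],
        [60, 6, 60, 60, 12],
        [60, 60, 60, 60, 60]]
print(spotting_points(grid, T=2, h=3))
```

Two separate regions are reported in this example, whereas a single global minimum over the frame would have returned only the first.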

Because several similar image regions can be extracted, groups of pixels near one spotting point may also be similar. For example, FIG. 3 shows a boy explaining something while raising his hand. In this situation, two motions, that of the raised hand and that of the mouth, are recognized and their trajectories are extracted. The hand motion and the mouth motion are different spotting points and form two separate regions on the image plane at the same time. If a moving image of the hand motion (the pixel trajectory of the hand motion) and a moving image of the mouth motion (the pixel trajectory of the mouth motion) are set in advance as reference moving images, then at the moment the hand and mouth motions end in the input moving image, the pixel groups similar to each motion are extracted as separate regions. Judging against the threshold h thus makes it possible to extract the hand motion with its trajectory and the mouth motion with its trajectory separately.

Spotting recognition of similar motions is achieved by the recurrence formula of spatio-temporal continuous DP shown in Equation 7, because this recurrence formula tolerates non-linear variation in time. The details are described with reference to FIGS. 4 to 6.

FIG. 4 shows the path taken when the second expression from the top of the recurrence formula of Equation 7,
S(x − e₁(τ), y − e₂(τ), τ − 1, t − 1) + 3d(x, y, τ, t)
(hereinafter, Equation 9), is selected as the expression that gives the minimum of the recurrence formula.

In FIG. 4, Equation 9 is viewed in the four-dimensional space of the variables (x, y, τ, t), and the path that determines S(x, y, τ, t) in this space is shown with white circles and arrows. In Equation 9, the candidate is the sum of the value of S at the point (x − e₁(τ), y − e₂(τ), τ − 1, t − 1) and the local distance d(x, y, τ, t) multiplied by the weight 3. Since the local distance is defined by Equation 4, d(x, y, τ, t) means that (ξ(τ), η(τ)) corresponds to (x, y, t). Similarly, d(x − e₁(τ), y − e₂(τ), τ − 1, t − 1) means that (ξ(τ − 1), η(τ − 1)) corresponds to (x − e₁(τ), y − e₂(τ), t − 1). In summary, t corresponds to τ, t − 1 corresponds to τ − 1, (ξ(τ), η(τ)) corresponds to (x, y, t), and (ξ(τ − 1), η(τ − 1)) corresponds to (x − e₁(τ), y − e₂(τ), t − 1). These correspondences can be called "linear" in space-time (the image plane with a time axis added). The same applies to the first and third expressions of Equation 7.

FIG. 5 shows the path taken when the third expression of the recurrence formula of Equation 7,
S(x − e₁(τ) − e₁(τ − 1), y − e₂(τ) − e₂(τ − 1), τ − 2, t − 1) + 3d(x − e₁(τ), y − e₂(τ), τ − 1, t) + 3d(x, y, τ, t)
(hereinafter, Equation 10), is selected as the expression that gives the minimum of the recurrence formula.

In FIG. 5, Equation 10 is viewed in the four-dimensional space of the variables (x, y, τ, t), and the path that determines S(x, y, τ, t) in this space is shown with white circles and arrows. In Equation 10, the path to S(x, y, τ, t) runs from the point (x − e₁(τ) − e₁(τ − 1), y − e₂(τ) − e₂(τ − 1), τ − 2, t − 1) through the point (x − e₁(τ), y − e₂(τ), τ − 1, t) to (x, y, τ, t). The three points (ξ(τ − 2), η(τ − 2)), (ξ(τ − 1), η(τ − 1)) and (ξ(τ), η(τ)) then correspond to (x − e₁(τ) − e₁(τ − 1), y − e₂(τ) − e₂(τ − 1), t − 1), (x − e₁(τ), y − e₂(τ), t) and (x, y, t), respectively. In this case the time parameter t of the input moving image is compressed to 1/2 relative to the time parameter τ of the reference moving image.

FIG. 6 shows the path taken when the top expression of the recurrence formula of Equation 7,
S(x − e₁(τ), y − e₂(τ), τ − 1, t − 2) + 2d(x, y, τ, t − 1) + d(x, y, τ, t)
(hereinafter, Equation 11), is selected as the expression that gives the minimum of the recurrence formula. Linear correspondences can be derived for Equation 11 in the same way as for Equations 9 and 10. In Equation 11, the time parameter t of the input moving image is stretched by a factor of 2 relative to the time parameter τ of the reference moving image.

In this way, local temporal stretching and compression between the two moving images, input and reference, accumulate to form a correspondence that is non-linear overall. Because the patterns of the input moving image and the reference moving image can correspond non-linearly, similar motions can be matched, and recognition of an object's motion and extraction of its trajectory can be achieved solely by matching the reference moving image against the input moving image, without a feature-based method.

With respect to (t, τ), a local deformation by a factor of 2 acts in Equation 11, by a factor of 1 in Equation 9, and by a factor of 1/2 in Equation 10. These local deformations, however, concern (t, τ) only and express nothing more than the non-linearity that arises from the non-linear correspondence in time. Non-linear variation specific to space (variation in the image plane rather than in time) is not taken into account. A recurrence formula that takes the non-linearity specific to space-time into account is described next.

FIGS. 7(a) to 7(c) illustrate spatial deformation (deformation in the image plane) of the reference moving image. FIG. 7(a) shows the basic pattern of the reference moving image as a trajectory from τ = 1 to τ = T on the image plane. FIGS. 7(b) and 7(c) show cases in which the spatial size (the size on the image plane) is changed: FIG. 7(b) is spatially reduced and FIG. 7(c) is spatially enlarged. When objects are matched with the spatio-temporal continuous DP of the present embodiment, such spatial deformation in the image plane occurs together with the temporal deformation described above. Equation 7 alone can handle temporal deformation but not spatial deformation. An algorithm that can also handle spatial deformation is therefore described.

For spatial deformation, as for temporal deformation, the cases considered are those in which the input moving image is scaled by 1/2, 1 or 2 relative to the reference moving image. These factors are chosen arbitrarily to mirror the temporal deformation; spatial enlargement and reduction are not necessarily limited to these ratios.

Let α be the coefficient that represents the spatial deformation; α takes one of the values {1/2, 1, 2}.
The recurrence formula for S(x, y, T, t) then takes as its initial condition, for τ = 1,
S(x, y, 1, t) = 3d(x, y, 1, t) ... Equation 6
and is given by Equation 12.
In this recurrence formula, τ ranges over 2 ≤ τ ≤ T.
If (x, y) lies outside the M × N image plane, if t ≤ 0, or if τ is outside the range 1 to T, then
S(x, y, τ, t) = ∞ and d(x, y, τ, t) = ∞.

Equation 12 differs from Equation 7 in that the parameter α ∈ {1/2, 1, 2} is introduced. This α is the coefficient of spatial deformation described above. On the right-hand side, the value of the recurrence is taken as the minimum over α = 1/2, 1 and 2. Looking at the terms in which α appears in Equation 12, for example S(x − α·e₁(τ), y − α·e₂(τ), τ − 1, t − 2), the size of the reference moving image is locally multiplied by α. Thus, when α is greater than 1, the spatial position (image-plane position) (x − α·e₁(τ), y − α·e₂(τ)) is referenced, which is a local enlargement of the reference moving image relative to the position (x − e₁(τ), y − e₂(τ)). When α is less than 1, the referenced position (x − α·e₁(τ), y − α·e₂(τ)) is a local reduction relative to (x − e₁(τ), y − e₂(τ)). Overall, matching proceeds by accumulating these local spatial enlargements and reductions. Such matching accommodates enlargement and reduction of spatial size as shown in FIGS. 7(a) to 7(c), while temporal stretching and compression are handled at the same time.
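The effect of α on the predecessor position can be illustrated with a short Python sketch. It assumes that the scaled displacement α·e is rounded to the nearest whole pixel, which the text does not specify, and the function name `scaled_predecessors` is hypothetical.

```python
ALPHAS = (0.5, 1, 2)  # the spatial deformation factors named in the text


def scaled_predecessors(x, y, e1, e2):
    """Map each alpha to the referenced position (x - alpha*e1, y - alpha*e2).

    Rounding to integer pixels is an assumption of this sketch; Python's
    round() sends exact halves to the nearest even integer.
    """
    return {a: (x - round(a * e1), y - round(a * e2)) for a in ALPHAS}


print(scaled_predecessors(10, 10, e1=2, e2=-4))
```

For α = 2 the referenced point lies twice as far from (x, y) as for α = 1, which corresponds to a locally enlarged trajectory, and for α = 1/2 it lies half as far. In the recurrence, the minimum would then be taken over these α values as well as over the three temporal candidates.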

[Reference moving images consisting of general-purpose elemental motions]
In the spatio-temporal continuous DP described so far, the pattern of one preset reference moving image is recognized in the input moving image. It is therefore a precondition that a reference moving image showing the motion of the object is set in advance. To match the motions of several kinds of objects, several reference moving images corresponding to those motions (a different reference moving image for each category) must be prepared in advance. To distinguish the reference moving images (categories), each is given a number i. The matching coordinate position (x_i, y_i, t) of the reference moving image with number i can be obtained by calculating the accumulated local distance S_i(x, y, T, t) and using Equation 13.
Here, h is the threshold used for match detection.

When the motion to be detected consists of a certain characteristic motion or a fixed motion, it is easy to prepare that motion as a reference moving image. However, for the varied movements of people in a town or for complex gestures, for example, it can be difficult to specify the whole motion with a single reference moving image (that is, as a single category). In such cases it is necessary to assume a group of reference moving images representing several concepts and to identify the motion of the object from combinations of them. As a typical example, the present embodiment is characterized in that several reference moving images of general-purpose elemental (primitive) motions are set and varied motions are extracted by concatenating their outputs. FIGS. 8(a) to 8(c) show an example.

FIG. 8(a) shows the case in which the motions indicated by arrows in eight directions are set as reference moving images of eight primitive motions. Specifically, eight radial reference moving images numbered i = 1 to 8 are set. FIG. 8(b) shows the reference moving image that performs the motion i = 2 of FIG. 8(a) as τ goes from 1 to T.
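Eight radial primitives of this kind could be generated as in the Python sketch below. The assignment of the numbers i = 1 to 8 to particular directions, the step of one pixel per unit of τ and the function name `primitive_trajectory` are assumptions made for illustration; the actual numbering is defined by FIG. 8(a).

```python
# Assumed numbering: i = 1 points along +x, then 45-degree steps towards +y.
DIRECTIONS = {
    1: (1, 0), 2: (1, 1), 3: (0, 1), 4: (-1, 1),
    5: (-1, 0), 6: (-1, -1), 7: (0, -1), 8: (1, -1),
}


def primitive_trajectory(i, T, start=(0, 0)):
    """Return (xi, eta) lists of length T for primitive motion i."""
    dx, dy = DIRECTIONS[i]
    xi = [start[0] + dx * k for k in range(T)]
    eta = [start[1] + dy * k for k in range(T)]
    return xi, eta


print(primitive_trajectory(1, 4, start=(5, 5)))
print(primitive_trajectory(6, 3, start=(5, 5)))
```

Because the recurrence formula uses only the displacements e₁(τ) and e₂(τ), the choice of start point does not change which input motions a primitive matches, as long as the coordinates stay inside the image plane.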

By extracting the motion of objects from the input moving image with spatio-temporal continuous DP using the eight reference moving images of FIG. 8(a) (extraction function), trajectories formed by concatenating the motions of the individual reference moving images can be extracted (tracked). FIG. 8(c) shows arbitrary motions of several objects tracked by treating the eight reference moving images of FIG. 8(a) as general-purpose primitives and combining the identification results of those primitives.

[Trajectory extraction]
With the spatio-temporal continuous DP described in the present embodiment, the motion pattern of an object in the reference moving image can be recognized as one category and the recognition result can be extracted. Spatio-temporal continuous DP also makes it possible to extract, as a trajectory, the pattern in the input moving image that approximately matches the pattern of the reference moving image.

Spatio-temporal continuous DP does not extract feature patterns. Instead, it is characterized in that it finds, on the basis of the accumulated local distance, the part of the input moving image that is optimally similar as a whole to the pattern of the reference moving image. The trajectory obtained by spatio-temporal continuous DP is therefore robust to occlusion. Occlusion here means that the motion of an object captured in the moving image is temporarily hidden by another object in the foreground. To verify robustness to occlusion for a given reference moving image pattern, it is preferable to perform trajectory extraction.

Trajectory extraction in the present embodiment uses a method called real-time tracking. As shown in FIG. 9(a), real-time tracking plots all the coordinates from τ = 1 to τ = T of a motion spotted at time t onto the image at time t. In real-time tracking, the trajectory can be extracted by tracing the path backwards from the spotted spatio-temporal point (x, y, t) using the algorithm below.

Let the spotting point to be traced be the spatio-temporal point (x, y, t), set τ = T as the initial value, and let
B = {(x − e₁(τ), y − e₂(τ), τ − 1, t − 2),
(x − e₁(τ), y − e₂(τ), τ − 1, t − 1),
(x − e₁(τ) − e₁(τ − 1), y − e₂(τ) − e₂(τ − 1), τ − 2, t − 1)}.

Then the point (x*, y*, τ*, t*) that satisfies Equation 14 is obtained. The obtained x*, y* and t* are substituted for x, y and t in Equation 14 and the value of τ is decreased by 1 to obtain a new (x*, y*, τ*, t*). This substitution is repeated until τ = 1, so that the spotting point at each τ is traced back from the spatio-temporal point (x, y, t) and the trajectory is extracted.
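The backward trace can be sketched in Python as follows. The sketch assumes that the selection step picks the element of B with the smallest cumulative distance S and that τ is then taken from the selected element; both are readings of the procedure rather than details confirmed by the text, and the function name `backtrack` and the dictionary `S` are hypothetical.

```python
import math


def backtrack(S, xi, eta, x, y, t):
    """Trace a spotted point (x, y, t) back from tau = T towards tau = 1.

    S maps (x, y, tau, t) to the cumulative distance; missing keys count
    as infinity. Returns the visited points in order of increasing tau.
    """
    tau = len(xi)
    path = [(x, y, tau, t)]
    while tau > 1:
        e1 = xi[tau - 1] - xi[tau - 2]
        e2 = eta[tau - 1] - eta[tau - 2]
        B = [(x - e1, y - e2, tau - 1, t - 2),
             (x - e1, y - e2, tau - 1, t - 1)]
        if tau >= 3:  # the third candidate needs tau - 2 >= 1
            B.append((x - e1 - (xi[tau - 2] - xi[tau - 3]),
                      y - e2 - (eta[tau - 2] - eta[tau - 3]),
                      tau - 2, t - 1))
        best = min(B, key=lambda point: S.get(point, math.inf))
        if S.get(best, math.inf) == math.inf:
            break  # no reachable predecessor; stop early
        x, y, tau, t = best
        path.append(best)
    return path[::-1]


S = {(2, 1, 1, 1): 0, (3, 1, 2, 2): 0, (4, 1, 3, 3): 0}
print(backtrack(S, [1, 2, 3], [1, 1, 1], x=4, y=1, t=3))
```

Plotting the (x, y) components of the returned points on the frame at time t gives the kind of overlay described for real-time tracking.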

The method above was described, as an example, for f(x, y, t) with the three variables x, y and t, but it is easily extended to a lower dimension, f(x, t), or to a higher dimension, f(x, y, z, t). For the lower-dimensional case f(x, t), a pattern such as that shown in FIG. 9(b) can be used as the pattern of the reference moving image, for example.

[Moving image processing apparatus]
Next, a moving image processing apparatus is described as an example of an apparatus that uses spatio-temporal continuous DP to extract, from an input moving image, the motion of an object similar to the pattern of a reference moving image. FIG. 10 is a block diagram showing the schematic configuration of the moving image processing apparatus. The moving image processing apparatus 1 can be configured as a general-purpose computer and, as shown in FIG. 10, broadly comprises a CPU (Central Processing Unit) (moving image processing means) 2, a ROM (Read Only Memory) 3, a RAM (Random Access Memory) 4, a recording unit (recording means) 5, a display unit 6, and an operation unit 7.

The CPU 2 extracts, from the input moving image, the motion of an object similar to the pattern of the reference moving image, and obtains the trajectory of the extracted motion. The CPU 2 matches the input moving image against the reference moving image in accordance with a processing program described later (for example, a program based on the flowchart shown in FIG. 11). The RAM 4 serves as the work area for processing by the CPU 2. The ROM 3 stores the programs for the matching process, and the CPU 2 performs the matching in accordance with the program read from the ROM 3. Although this embodiment is described with the matching programs stored in the ROM 3, these programs may instead be stored in the recording unit 5.

The recording unit 5 stores the moving image data used in the matching process. Specifically, it can store multiple sets of reference moving image data organized by category. Because the moving image processing apparatus 1 of this embodiment uses video captured by a video camera as the input moving image, as described later, the input moving image does not need to be recorded in the recording unit 5 in advance. The data of the input moving image at time t is, however, temporarily recorded in the recording unit 5 so that it can be matched against the reference moving image. Based on this temporarily recorded data at time t, the CPU 2 performs the motion recognition and trajectory extraction described later.

The recording unit 5 of this embodiment is a general hard disk. It is not limited to a hard disk, however. Any device that can store the moving image data used in the matching process in a form readable by the CPU 2, such as a flash memory or an SSD (Solid State Drive / Solid State Disk), may be used.

The display unit 6 displays the moving images recorded in the recording unit 5 so that the user can view them, and displays the result of the matching process by the CPU 2 as an output moving image. A general display device such as a liquid crystal display or a CRT display is used as the display unit 6.

The operation unit 7 is a device with which the user enters the data needed to operate the moving image processing apparatus 1 and carries out specific operations on it. It consists of a general keyboard, mouse, or the like.

The CPU 2 reads the program required for the computation from the ROM 3 and, in accordance with that program, uses spatiotemporal continuous DP to recognize, in the input moving image, the motion of an object similar to the pattern of the reference moving image and to extract its trajectory.

To recognize the motion of an object, the reference moving image is first recorded in the recording unit 5. As noted above, the input moving image is real-time video from the video camera used as-is, so it does not need to be recorded in the recording unit 5 in advance. Only the data of the input moving image needed for the matching process is temporarily recorded in the recording unit 5.

In the moving image processing apparatus 1 of this embodiment, a single continuous stroke drawn with a mouse is prepared as an example of the reference moving image. The motion in the reference moving image is then the single-stroke motion of the mouse cursor, represented by the coordinate point (ξ(τ), η(τ)) on the image plane of the reference moving image at each time τ (τ = 1, 2, ..., T). Here τ is the time from the start to the end of the reference moving image, which ends at time T. The brightness value (luminance) at that coordinate point is denoted Z(ξ(τ), η(τ)); as an example, this embodiment sets Z(ξ(τ), η(τ)) = 70. The reference moving image is not limited to a stroke drawn with a mouse. As long as the motion of an object can be recorded as a reference moving image, it may be created (captured) by any method.

Next, as an example of the input moving image, a video camera captures a moving person, object, or the like. In this embodiment, the values of the three colors R (red), G (green), and B (blue) are obtained from the captured video, only the G (green) component value g, which corresponds to the brightness value, is taken, and the pixel value at spatial position (x, y) at time t is defined as f(x, y, t) = g.
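The green-component extraction can be sketched as follows. This is a minimal illustration only: the function name `green_channel` and the frame layout (rows of (r, g, b) tuples) are assumptions made for the example, not details given in the source.

```python
def green_channel(frame_rgb):
    """Return the grid f(x, y, t) = g for one frame.

    frame_rgb is a list of rows, each a list of (r, g, b) tuples
    (an assumed in-memory layout, not one specified by the source).
    """
    return [[g for (_r, g, _b) in row] for row in frame_rgb]


# Tiny 2x2 example frame.
frame = [[(10, 70, 30), (0, 0, 0)],
         [(255, 128, 64), (5, 6, 7)]]
f_t = green_channel(frame)
```

Each later step then works on a single-channel grid per frame rather than on the full color image.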

Next, the CPU 2 uses spatiotemporal continuous DP to determine whether the input moving image contains a motion similar to the motion pattern of the reference moving image (identification and extraction), obtains its trajectory, and outputs, as the output moving image, the input moving image with the trajectory superimposed on it.

FIG. 11 is a flowchart showing the motion identification/extraction and trajectory extraction performed by the CPU 2. The CPU 2 first obtains the brightness value (luminance) of the pixel at each coordinate point (x, y) of the input moving image at time t (step S.1). The input moving image is represented by f(x, y, t). Because video captured in real time by the video camera is used as the input moving image, the time t is unbounded: 1 ≤ t < ∞. The coordinate position (x, y) of the input moving image satisfies 1 ≤ x ≤ M and 1 ≤ y ≤ N, as already described.

Next, the CPU 2 obtains, in the three-dimensional space of the reference moving image (the two dimensions of the image plane plus time), the trajectory of the pixels having the brightness value (luminance) of 70 mentioned above (step S.2). The coordinate point of the reference moving image at time τ is denoted by Z(ξ(τ), η(τ)), with 1 ≤ τ ≤ T. The coordinate position (ξ(τ), η(τ)) of the reference moving image satisfies 1 ≤ ξ(τ) ≤ M and 1 ≤ η(τ) ≤ N, as already described. The brightness value (luminance) is the same at every point on the trajectory.
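As a rough sketch of step S.2, the reference trajectory can be read off by locating the pixel of brightness 70 in each reference frame. The function name `reference_trajectory`, the nested-list frame layout, and the rule of taking the first matching pixel in row-major order are assumptions for illustration.

```python
def reference_trajectory(ref_frames, z=70):
    """Return {tau: (xi, eta)} with 1-based tau, xi, and eta.

    ref_frames[tau - 1] is a grid of brightness values for reference frame tau.
    Assumes each frame contains at least one pixel of brightness z and
    takes the first one found in row-major order.
    """
    trajectory = {}
    for tau, frame in enumerate(ref_frames, start=1):
        for row_index, row in enumerate(frame):
            if z in row:
                trajectory[tau] = (row.index(z) + 1, row_index + 1)
                break
        else:
            raise ValueError(f"no pixel of brightness {z} in frame {tau}")
    return trajectory


# Three 3x2 reference frames with the marked pixel moving right and down.
ref = [
    [[70, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 70, 0]],
    [[0, 0, 0], [0, 0, 70]],
]
points = reference_trajectory(ref)
```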

The CPU 2 then obtains, for ξ(τ) and η(τ) separately, the difference between the coordinates at time τ in the reference moving image and those one time step earlier, at τ−1, as follows (step S.3).
e₁(τ) = ξ(τ) − ξ(τ−1)
e₂(τ) = η(τ) − η(τ−1)
Here e₁(τ) is the displacement of the trajectory along the ξ coordinate, and e₂(τ) is the displacement along the η coordinate. The CPU 2 stores the obtained e₁(τ) and e₂(τ) in the RAM 4.
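The difference computation of step S.3 might look like the following sketch. The name `coordinate_differences` and the dictionary representation keyed by τ are hypothetical; the differences are defined only for τ = 2, ..., T, because τ = 1 has no predecessor.

```python
def coordinate_differences(trajectory):
    """Return (e1, e2) as dicts keyed by tau = 2..T.

    trajectory maps tau (1-based) to the reference point (xi, eta).
    """
    T = len(trajectory)
    e1 = {tau: trajectory[tau][0] - trajectory[tau - 1][0]
          for tau in range(2, T + 1)}
    e2 = {tau: trajectory[tau][1] - trajectory[tau - 1][1]
          for tau in range(2, T + 1)}
    return e1, e2


e1, e2 = coordinate_differences({1: (1, 1), 2: (2, 2), 3: (3, 2)})
```

Storing the differences once, as the source does in the RAM 4, avoids recomputing them for every pixel in the later recurrence.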

The CPU 2 then obtains, based on the following equation, the absolute value of the difference between the brightness value (luminance) of the pixel at each coordinate point (x, y) of the input moving image at time t and the brightness value at each point on the trajectory of the reference moving image (step S.4: local distance calculation function). The absolute difference obtained in this way is the local distance d.

d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖

The CPU 2 then allocates memory so that the local distance d(x, y, τ, t), used to compute distances at time τ, and the cumulative distance S(x, y, τ, t) at time t (which serves as the evaluation function) can be used as variables. It also initializes d(x, y, τ, t) = ∞ and S(x, y, τ, t) = ∞ whenever x lies outside the range 1 to M, y lies outside the range 1 to N, t ≤ 0, or τ lies outside the range 1 to T (step S.5).
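The local distance of step S.4 and the out-of-range convention of step S.5 can be illustrated with a small sketch. The function name `local_distance` and the nested-list frame are assumptions. Because the embodiment uses the same brightness (70) at every trajectory point, the sketch takes Z as a constant, so d does not depend on τ here.

```python
import math


def local_distance(frame, x, y, z_ref=70):
    """Return d = |Z - f(x, y, t)| for one input frame, with 1-based (x, y).

    Coordinates outside 1..M by 1..N yield infinity, mirroring the
    initialization in step S.5.
    """
    n_rows, m_cols = len(frame), len(frame[0])
    if not (1 <= x <= m_cols and 1 <= y <= n_rows):
        return math.inf
    return abs(z_ref - frame[y - 1][x - 1])


frame = [[70, 60, 90]]
inside = local_distance(frame, 2, 1)
outside = local_distance(frame, 4, 1)
```

Returning infinity for out-of-range reads means the recurrence never selects a path that leaves the image.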

The CPU 2 then sets the cumulative distance for τ = 1 to S(x, y, 1, t) = 3d(x, y, 1, t) and, at each time t, obtains the minimum cumulative distance S(x, y, τ, t) based on the following equation (step S.6: evaluation function calculation function).

S(x, y, τ, t) = min {
  S(x − e₁(τ), y − e₂(τ), τ−1, t−2) + 2d(x, y, τ, t−1) + d(x, y, τ, t),
  S(x − e₁(τ), y − e₂(τ), τ−1, t−1) + 3d(x, y, τ, t),
  S(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1) + 3d(x − e₁(τ), y − e₂(τ), τ−1, t) + 3d(x, y, τ, t)
} (Equation 7)

In this recurrence, τ ranges over 2 ≤ τ ≤ T.
Specifically, the CPU 2 replaces the cumulative distance S(x, y, 1, t) with the distance d(x, y, 1, t) for τ = 1 multiplied by the coefficient 3. The cumulative distance is then updated in the order τ = 1, 2, 3, ..., T. For each update, the CPU 2 evaluates the three expressions inside min{...} of Equation 7 and uses the one that yields the smallest value, computing S(x, y, τ, t) repeatedly in sequence.

The first expression inside min{...} of Equation 7,
S(x − e₁(τ), y − e₂(τ), τ−1, t−2) + 2d(x, y, τ, t−1) + d(x, y, τ, t) (Equation 11)
is evaluated as follows. From the x, y, τ, and t of the cumulative distance S(x, y, τ, t), the CPU 2 computes x − e₁(τ), y − e₂(τ), τ−1, and t−2, and from these obtains the cumulative distance S(x − e₁(τ), y − e₂(τ), τ−1, t−2). To this value it adds the distance d(x, y, τ, t−1) multiplied by the coefficient 2 and the distance d(x, y, τ, t).

The second expression inside min{...} of Equation 7,
S(x − e₁(τ), y − e₂(τ), τ−1, t−1) + 3d(x, y, τ, t) (Equation 9)
is evaluated as follows. From the x, y, τ, and t of the cumulative distance S(x, y, τ, t), the CPU 2 computes x − e₁(τ), y − e₂(τ), τ−1, and t−1, and from these obtains the cumulative distance S(x − e₁(τ), y − e₂(τ), τ−1, t−1). To this value it adds the distance d(x, y, τ, t) multiplied by the coefficient 3.

The third expression inside min{...} of Equation 7,
S(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1) + 3d(x − e₁(τ), y − e₂(τ), τ−1, t) + 3d(x, y, τ, t) (Equation 10)
is evaluated as follows. From the x, y, τ, and t of the cumulative distance S(x, y, τ, t), the CPU 2 computes τ−1, reads e₁(τ), e₁(τ−1), e₂(τ), and e₂(τ−1) from the RAM 4, and computes x − e₁(τ) − e₁(τ−1) and y − e₂(τ) − e₂(τ−1). It then computes τ−2 and t−1, and from these obtains the cumulative distance S(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1). To this value it adds the distance d(x − e₁(τ), y − e₂(τ), τ−1, t) multiplied by the coefficient 3 and the distance d(x, y, τ, t) multiplied by the coefficient 3.

After computing the values of Equations 9 to 11 in this way, the CPU 2 takes the smallest of the three and uses it to update the cumulative distance S(x, y, τ, t).
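One time step of the recurrence described by the three expressions above can be sketched as follows. This is an illustrative reading, not the patented implementation: the names `at` and `step`, the grid and dictionary layout, and the simplification that d is independent of τ (valid when every trajectory point has the same brightness, as in the embodiment) are all assumptions.

```python
import math

INF = math.inf


def at(grid, x, y):
    """Read grid[y-1][x-1] with 1-based (x, y); missing or out-of-range gives infinity."""
    if grid is None or not (1 <= y <= len(grid) and 1 <= x <= len(grid[0])):
        return INF
    return grid[y - 1][x - 1]


def step(d_now, d_prev, S_prev1, S_prev2, e1, e2, T):
    """Compute {tau: grid of S(x, y, tau, t)} for the current time t.

    d_now, d_prev: local-distance grids at t and t-1 (d_prev may be None).
    S_prev1, S_prev2: results of step() at t-1 and t-2 (may be empty dicts).
    e1, e2: coordinate differences keyed by tau = 2..T.
    """
    h, w = len(d_now), len(d_now[0])
    S = {1: [[3 * v for v in row] for row in d_now]}  # S(x, y, 1, t) = 3d
    for tau in range(2, T + 1):
        grid = [[INF] * w for _ in range(h)]
        for y in range(1, h + 1):
            for x in range(1, w + 1):
                px, py = x - e1[tau], y - e2[tau]
                d = at(d_now, x, y)
                # First expression: predecessor at (tau-1, t-2).
                a = at(S_prev2.get(tau - 1), px, py) + 2 * at(d_prev, x, y) + d
                # Second expression: predecessor at (tau-1, t-1).
                b = at(S_prev1.get(tau - 1), px, py) + 3 * d
                # Third expression: predecessor at (tau-2, t-1); needs tau >= 3.
                c = INF
                if tau >= 3:
                    qx, qy = px - e1[tau - 1], py - e2[tau - 1]
                    c = (at(S_prev1.get(tau - 2), qx, qy)
                         + 3 * at(d_now, px, py) + 3 * d)
                grid[y - 1][x - 1] = min(a, b, c)
        S[tau] = grid
    return S


# Reference moves one pixel right per step (T = 2); the object sits at
# x = 1 in the first input frame and at x = 2 in the second.
e1, e2 = {2: 1}, {2: 0}
d1, d2 = [[0, 5, 5]], [[5, 0, 5]]
S1 = step(d1, None, {}, {}, e1, e2, T=2)
S2 = step(d2, d1, S1, {}, e1, e2, T=2)
```

In the example, the cumulative distance at τ = T is zero at x = 2 in the second frame, the position where the matching motion ends.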

Next, the CPU 2 uses Equation 8 to judge whether a motion corresponding to the pattern of the reference moving image has been identified in the input moving image (step S.7: coordinate position calculation function, extraction function).

Equation 8 obtains the identification result from the cumulative distance S(x, y, T, t) at τ = T, computed at each time t in step S.6, divided by 3T. Using Equation 13, the set of image-plane points (x, y) for which S(x, y, T, t) divided by 3T is at or below a predetermined threshold h can be detected. These points form regions on the image plane, and each such region constitutes a local area. Within each local area, the CPU 2 finds the point (a single coordinate) at which S(x, y, T, t)/3T is smallest. The pattern shown in the reference moving image is identified at that coordinate point, which is the spotting point.
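The thresholding and per-area minimum of step S.7 can be sketched as below. The function name `spotting_points` is hypothetical, and treating a local area as a 4-connected region of below-threshold pixels is an assumption; the source says only that the points form regions on the image plane.

```python
import math


def spotting_points(S_T, T, h):
    """Return one (x, y, score) per local area, with 1-based coordinates.

    S_T is the grid of S(x, y, T, t) at the current time t.
    score = S / (3T); pixels with score <= h are grouped into
    4-connected regions (an assumed definition of "local area").
    """
    rows, cols = len(S_T), len(S_T[0])
    score = [[v / (3 * T) for v in row] for row in S_T]
    seen = [[False] * cols for _ in range(rows)]
    points = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or score[r][c] > h:
                continue
            seen[r][c] = True
            stack = [(r, c)]
            best = (score[r][c], c + 1, r + 1)
            while stack:  # flood fill over one local area
                i, j = stack.pop()
                best = min(best, (score[i][j], j + 1, i + 1))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and score[ni][nj] <= h):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            points.append((best[1], best[2], best[0]))
    return points


found = spotting_points([[math.inf, 6, 12, 60, 3]], T=2, h=2.5)
```

Because each region yields its own minimum, two separate matching motions in one frame produce two spotting points.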

The CPU 2 then obtains the trajectory extracted at the spotting point thus found (step S.8: trajectory calculation function).

First, the CPU 2 defines the spotting point at time t as (x, y, t) and, as the initial setting, sets the time parameter τ to T, the end time of the reference moving image. Next, the CPU 2 defines the set B of three candidate spatio-temporal points used for extracting the trajectory:
B = {(x − e₁(τ), y − e₂(τ), τ−1, t−2),
(x − e₁(τ), y − e₂(τ), τ−1, t−1),
(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1)}

The first candidate spatio-temporal point, (x − e₁(τ), y − e₂(τ), τ−1, t−2), is the coordinate point obtained from the x, y, τ, and t of the spotting point (x, y, t) by computing x − e₁(τ), y − e₂(τ), τ−1, and t−2.

The second candidate spatio-temporal point, (x − e₁(τ), y − e₂(τ), τ−1, t−1), is the coordinate point obtained from the x, y, τ, and t of the spotting point (x, y, t) by computing x − e₁(τ), y − e₂(τ), τ−1, and t−1.

The third candidate spatio-temporal point, (x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1), is the coordinate point obtained from the x, y, τ, and t of the spotting point (x, y, t) as follows: τ−1 is computed; e₁(τ), e₁(τ−1), e₂(τ), and e₂(τ−1) are read from the RAM 4; x − e₁(τ) − e₁(τ−1) and y − e₂(τ) − e₂(τ−1) are computed; and finally τ−2 and t−1 are computed.

The CPU 2 then substitutes each of the three candidate points given by B for (x, y, τ, t) in S(x, y, τ, t), evaluates them according to Equation 14, and selects the candidate point that gives the minimum value.

The candidate point selected here becomes the new spotting point (x*, y*, τ*, t*). Using this new spotting point, the CPU 2 again selects, by Equation 14, the candidate point that gives the minimum value and takes it as the next new spotting point (x*, y*, τ*, t*). The CPU 2 repeats this operation as τ goes from T down to 1.

By extracting all the spotting points obtained in this repetition down to τ = 1, the CPU 2 can extract, as a trajectory, the motion in the input moving image that corresponds to the pattern of the reference moving image.
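The backward trace of step S.8 can be sketched as follows, under the assumption that the cumulative distances are kept in a dictionary keyed by (x, y, τ, t), with missing keys read as infinity. The name `backtrack` and this storage layout are hypothetical. The third candidate is offered only when τ ≥ 3, since it refers to τ−2.

```python
import math


def backtrack(S, e1, e2, x, y, t, T):
    """Trace back from the spotting point (x, y) found at time t.

    S maps (x, y, tau, t) to a cumulative distance; absent keys count
    as infinity. Returns the visited points, starting at tau = T.
    """
    tau = T
    path = [(x, y, tau, t)]
    while tau > 1:
        px, py = x - e1[tau], y - e2[tau]
        candidates = [(px, py, tau - 1, t - 2), (px, py, tau - 1, t - 1)]
        if tau >= 3:
            candidates.append(
                (px - e1[tau - 1], py - e2[tau - 1], tau - 2, t - 1))
        # Pick the candidate with the smallest cumulative distance.
        x, y, tau, t = min(candidates, key=lambda p: S.get(p, math.inf))
        path.append((x, y, tau, t))
    return path


S = {(1, 1, 1, 1): 0, (2, 1, 2, 2): 0, (3, 1, 3, 3): 0, (2, 1, 2, 1): 9}
e1, e2 = {2: 1, 3: 1}, {2: 0, 3: 0}
path = backtrack(S, e1, e2, x=3, y=1, t=3, T=3)
```

The loop always terminates because every candidate lowers τ by at least 1.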

The extracted trajectory is (x*(T), y*(T)), (x*(T−1), y*(T−1)), ..., (x*(1), y*(1)). The CPU 2 creates and outputs the output moving image by superimposing these trajectory points on the input image at each time. Here (x*(T), y*(T)) is the trajectory point obtained in the first iteration of step S.8, and (x*(1), y*(1)) is the trajectory point obtained in the last iteration, when τ = 1.
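Superimposing trajectory points on a frame can be illustrated with a minimal sketch. The function name `overlay`, the single-pixel marks, and the default mark value 0 (black) are assumptions for the example; the source states only that the trajectory is superimposed on the input image.

```python
def overlay(frame, points, mark=0):
    """Return a copy of a grayscale frame with trajectory points marked.

    points are 1-based (x, y) pairs; points outside the frame are skipped.
    """
    out = [row[:] for row in frame]
    for x, y in points:
        if 1 <= y <= len(out) and 1 <= x <= len(out[0]):
            out[y - 1][x - 1] = mark
    return out


frame = [[200, 200, 200],
         [200, 200, 200]]
marked = overlay(frame, [(1, 1), (2, 2), (9, 9)])
```

The input frame is copied rather than modified, so the same frame can still be used for matching at later times.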

In this way, the CPU 2 extracts the motion of the input moving image that corresponds to the pattern of the reference moving image as a spotting point, and then obtains the trajectory of that spotting point by repeatedly evaluating Equation 14 from τ = T down to τ = 1.

FIGS. 12 to 17 each show a reference moving image, an input moving image, and the output moving image in which the CPU 2 has superimposed the trajectory on the input moving image. The upper part of each of FIGS. 12 to 17 shows the reference moving image, with an arrow indicating the movement of the mouse cursor as the time τ goes from 1 to T. The middle part shows the input moving image: the motion of a person or object captured in color at 30 fps by a fixed video camera. The lower part shows the output moving image; all results were obtained with real-time tracking.

In FIG. 12, the reference moving image is a cursor moving straight from top to bottom, and the input moving image shows a person moving the right hand from top to bottom. Following the flowchart described above, the CPU 2 identifies, frame by frame, the motion of the palm in the input moving image as a motion similar to the pattern of the reference moving image, and outputs the trajectory of the identified palm motion as the output moving image. The series of black dots in the output moving image in the lower part of FIG. 12 is the trajectory.

Thus, by recording a top-to-bottom pattern as the reference moving image, the CPU 2 can recognize, frame by frame, the palm motion that follows a top-to-bottom pattern in the input moving image as a similar motion, and extract the recognized palm motion as a trajectory.

In FIG. 13, by contrast, the reference moving image is again a cursor moving straight from top to bottom, while the input moving image shows a person moving from the back of the scene toward the front. Following the flowchart described above, the CPU 2 identifies, frame by frame, the person's movement in the input moving image as a motion similar to the pattern of the reference moving image, and outputs the identified movement as a trajectory in the output moving image.

Thus, although the same top-to-bottom pattern is recorded as the reference moving image in both FIG. 12 and FIG. 13, the CPU 2 recognizes the palm motion in FIG. 12 and the person's movement in FIG. 13 as motions similar to the pattern of the reference moving image, and can extract their trajectories.

In FIG. 14, the reference moving image is a cursor moving from top to bottom while tracing the letter S, and the input moving image shows a person's palm moving from top to bottom while tracing the letter S. Following the flowchart described above, the CPU 2 identifies, frame by frame, the motion of the person's palm in the input moving image as a motion similar to the S-shaped pattern of the reference moving image, and outputs the identified palm motion as a trajectory in the output moving image.

Thus, even when a complex pattern such as an S-shaped top-to-bottom movement is recorded as the reference moving image, the CPU 2 can recognize the palm motion as a motion similar to the S-shaped pattern of the reference moving image and extract its trajectory.

In FIG. 15, the reference moving image is a cursor moving straight from top to bottom, and the input moving image shows a person moving both palms from top to bottom at the same time. Following the flowchart described above, the CPU 2 extracts as spotting points all points (x, y) at which S(x, y, T, t)/3T is at or below the threshold h, and extracts the resulting trajectories frame by frame. The CPU 2 can therefore identify, frame by frame and separately, the motions of the right palm and the left palm in the input moving image as motions similar to the pattern of the reference moving image, and output the trajectories of the identified left and right palm motions separately in the output moving image.

Thus, by recording a top-to-bottom pattern as the reference moving image, the CPU 2 judges the multiple top-to-bottom patterns in the input moving image to be distinct motions, recognizes the motion of each palm separately as a similar motion, and outputs each as a trajectory in the output image.

In FIG. 16, the reference moving image is a cursor moving straight from bottom to top, and the input moving image shows a person moving both palms from bottom to top while alternately rotating them around each other at the center of the body. Following the flowchart described above, the CPU 2 identifies, frame by frame and separately, the motions of the left and right palms in the input moving image as motions similar to the pattern of the reference moving image, and outputs the trajectories of the identified palm motions separately as the output moving image. As the palms rotate alternately, the palm at the back is hidden by the palm in front, but the palm in front is always moving from bottom to top, so this movement pattern is extracted as the trajectory.

Thus, by recording a bottom-to-top pattern as the reference moving image, the CPU 2 can recognize the motion as similar to the pattern of the reference moving image and output its trajectory in the output image, even when bottom-to-top patterns occur alternately in the input moving image.

In FIG. 17, the reference moving image is a cursor moving straight from right to left. The input moving image shows a ball moving from the right side of the screen to the left, with a box placed at the center of the screen in front of the ball's path. Following the flowchart described above, the CPU 2 identifies, frame by frame, the motion of the ball in the input moving image as a motion similar to the pattern of the reference moving image, and outputs the trajectory of the identified right-to-left motion as the output moving image. While the ball passes behind the box, its motion cannot be observed in the input moving image, and occlusion occurs temporarily. Even so, because the CPU 2 extracts the ball's motion based on spatiotemporal continuous DP, it recognizes the ball by inferring the subsequent movement pattern from the state before the ball was hidden by the box (that is, by detecting the most likely positions). Unlike tracking with a Kalman filter or a particle filter, this requires no computation of existence probabilities or the like. The movement of the ball can still be recognized, and once the ball reappears at the left of the box, a trajectory that takes its earlier state into account can be output as the output moving image.

As described above, even when occlusion occurs in the input moving image, the CPU 2 can identify the movement of the object during the occlusion from the movement pattern before the object disappears from the screen and the movement pattern after it reappears, and can thus obtain a recognition result that is robust to occlusion.

As described above, in the moving image processing apparatus 1 according to the present invention, the CPU 2 specifies the coordinate points of the reference moving image as Z(ξ(τ), η(τ)) with the time τ as a parameter, so that the only parameter indicating a coordinate point of the reference moving image is the time τ. Treating the coordinate points of the reference moving image as a function of the one-dimensional variable τ alone makes it possible, as in continuous DP, to obtain the local distance d(x, y, τ, t) from four parameters: the parameter τ of the reference moving image Z(ξ(τ), η(τ)) and the parameters x, y, and t of the input moving image f(x, y, t). Therefore, by using moving images as both the reference time-series pattern and the input time-series pattern, spatiotemporal continuous DP, which extends continuous DP, can recognize and extract motion in the input moving image that is similar to the pattern in the reference moving image, and can also extract its trajectory.
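As a minimal sketch of the local distance described above, assume grayscale frames stored as nested Python lists indexed as frames[t][y][x], with 0-based times. The helper names reference_profile and local_distance are illustrative and do not appear in the specification.

```python
def reference_profile(ref_frames, trajectory):
    """Collapse the reference video to one luminance value per tau.

    trajectory[tau] = (xi, eta) is the tracked point at reference time tau,
    so the returned list plays the role of Z(xi(tau), eta(tau)).
    """
    return [ref_frames[tau][eta][xi] for tau, (xi, eta) in enumerate(trajectory)]


def local_distance(z, frames, x, y, tau, t):
    """d(x, y, tau, t) = |Z(xi(tau), eta(tau)) - f(x, y, t)| for grayscale input."""
    return abs(z[tau] - frames[t][y][x])


# Two 1x2 reference frames; the tracked point moves from x=1 to x=0.
ref_frames = [[[0, 200]], [[180, 0]]]
z = reference_profile(ref_frames, [(1, 0), (0, 0)])

# Two 1x2 input frames.
frames = [[[10, 20]], [[30, 190]]]
print(local_distance(z, frames, x=1, y=0, tau=1, t=1))
```

Because the reference is reduced to a list indexed by τ alone, the distance depends only on the four parameters x, y, τ, and t.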

Furthermore, spatiotemporal continuous DP matches the reference moving image against the input moving image not by extracting feature patterns but by finding spotting points based on the accumulated local distance. Therefore, the reference moving image and the input moving image can be matched even when the object temporarily disappears, as in occlusion (that is, when the presence of the object is interrupted in the time series). In addition, because the decision is made with a threshold value, matching points can be extracted for each local area. Therefore, even when the input moving image contains multiple movements similar to the pattern of the reference moving image, each can be identified separately and its trajectory extracted separately.
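The threshold test can be sketched as follows. This is an illustration only: it assumes the accumulated distance at τ = T is already available as S_T[t][y][x], normalizes it by 3T as in the claims, and treats a "local area" as a square window of hypothetical half-width radius, which this text does not specify.

```python
def spotting_points(S_T, T, h, radius=1):
    """Return (t, x, y) for points whose normalized score S/(3T) is <= h
    and that are the minimum of their local window (assumed definition)."""
    points = []
    for t, frame in enumerate(S_T):
        height, width = len(frame), len(frame[0])
        for y in range(height):
            for x in range(width):
                if frame[y][x] / (3 * T) > h:
                    continue  # not similar enough to the reference pattern
                window = [
                    frame[j][i]
                    for j in range(max(0, y - radius), min(height, y + radius + 1))
                    for i in range(max(0, x - radius), min(width, x + radius + 1))
                ]
                if frame[y][x] <= min(window):
                    points.append((t, x, y))
    return points


# One frame, one row: two well-separated low-score positions.
S_T = [[[90, 3, 90, 90, 6, 90]]]
print(spotting_points(S_T, T=3, h=1.0))
```

Taking one minimum per window is what lets two similar movements in different parts of the frame be reported as separate matches.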

Furthermore, in spatiotemporal continuous DP, when the accumulated value of the local distance at time τ is obtained, a path is selected from the immediately preceding accumulated-distance values S(x, y, τ, t) in time τ and time t, and the point at which the accumulated distance S(x, y, τ, t), accumulated as τ runs from 1 to T, is minimal is determined to be the spotting point. In matching the reference moving image against the input moving image, the distances are thus accumulated in a form that allows temporal and local expansion and contraction, so that a nonlinear similarity relation (correspondence) is formed as a whole. Because the patterns of the reference moving image and the input moving image form this nonlinear correspondence, they can be matched even though the resulting relation is one of similarity rather than exact agreement.
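The accumulation can be sketched as below, under stated assumptions. The initial condition S(x, y, 1, t) = 3d(x, y, 1, t) and the three predecessor states are taken from the claims. The recurrence itself is not reproduced in this text, so the path weights used here (2d + d, 3d, and 3d + 3d) are an assumption borrowed from the conventional continuous DP slope constraint, chosen so that every path accumulates a weight of 3 per reference step, in line with the normalization by 3T. Times are 0-based, and the function name accumulate is illustrative.

```python
import math

INF = math.inf


def accumulate(frames, traj, z):
    """Return S[tau][t][y][x] for grayscale frames[t][y][x].

    traj[tau] = (xi, eta) is the reference trajectory and z[tau] its luminance.
    """
    n_t, h, w = len(frames), len(frames[0]), len(frames[0][0])
    T = len(traj)
    # e[tau] = (xi(tau) - xi(tau-1), eta(tau) - eta(tau-1)); e[0] is unused.
    e = [(0, 0)] + [(traj[k][0] - traj[k - 1][0], traj[k][1] - traj[k - 1][1])
                    for k in range(1, T)]

    def d(x, y, tau, t):
        if 0 <= x < w and 0 <= y < h and 0 <= t < n_t:
            return abs(z[tau] - frames[t][y][x])
        return INF

    S = [[[[INF] * w for _ in range(h)] for _ in range(n_t)] for _ in range(T)]

    def get(x, y, tau, t):
        if tau < 0 or t < 0 or not (0 <= x < w and 0 <= y < h):
            return INF
        return S[tau][t][y][x]

    for t in range(n_t):
        for y in range(h):
            for x in range(w):
                S[0][t][y][x] = 3 * d(x, y, 0, t)  # initial condition

    for tau in range(1, T):
        e1, e2 = e[tau]
        p1, p2 = e[tau - 1]
        for t in range(n_t):
            for y in range(h):
                for x in range(w):
                    px, py = x - e1, y - e2
                    here = d(x, y, tau, t)
                    S[tau][t][y][x] = min(
                        # input slower than reference (assumed weights 2d + d)
                        get(px, py, tau - 1, t - 2) + 2 * d(x, y, tau, t - 1) + here,
                        # same speed (assumed weight 3d)
                        get(px, py, tau - 1, t - 1) + 3 * here,
                        # input faster than reference (assumed weights 3d + 3d)
                        get(px - p1, py - p2, tau - 2, t - 1)
                        + 3 * d(px, py, tau - 1, t) + 3 * here,
                    )
    return S


# A bright dot moves left by one pixel per frame, exactly like the reference.
frames = [[[0, 0, 0, 0, 255]], [[0, 0, 0, 255, 0]], [[0, 0, 255, 0, 0]]]
traj = [(4, 0), (3, 0), (2, 0)]
S = accumulate(frames, traj, [255, 255, 255])
print(S[2][2][0][2])
```

The accumulated value at the end point of the matching motion is zero, while positions that the dot never reaches accumulate a positive distance; the three candidate paths are what allow the input motion to be locally slower or faster than the reference.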

The moving image processing apparatus and the moving image processing program according to the present invention have been described above in detail with reference to the drawings; however, they are not limited to the examples shown in the embodiment described above.

For example, in the processing of the CPU 2 shown in FIG. 11, the case was described (step S.4) in which the CPU 2 obtains the local distance as the absolute value of the difference between the brightness value of the pixel at each coordinate point (x, y) of the input moving image at time t and the brightness value at each point of the trajectory of the reference moving image, that is, d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖. However, the method of calculating the local distance d(x, y, τ, t) is not limited to the absolute value of the brightness difference. When only brightness values are matched between the reference moving image and the input moving image, the absolute value of the brightness difference suffices; when the color of each pixel is matched, the local distance d(x, y, τ, t) can be calculated as the sum of the differences of the R (red), G (green), and B (blue) components (or as their Euclidean distance). Furthermore, when the color of the reference moving image may be any of K colors, the local distance can be obtained, with k in the range 1 ≤ k ≤ K, as d(x, y, τ, t) = min_k ‖Z_k(ξ(τ), η(τ)) − f(x, y, t)‖, where Z_k denotes the reference moving image of the k-th color.
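The color variants of the local distance can be sketched as follows. The function names are illustrative, pixels are assumed to be (R, G, B) tuples, and the "sum of the differences" is read here as a sum of absolute per-channel differences, which is an interpretation rather than something the text states.

```python
import math


def rgb_distance(ref_rgb, pix_rgb, euclidean=False):
    """Per-pixel color distance: sum of absolute channel differences,
    or the Euclidean distance when euclidean=True."""
    diffs = [abs(a - b) for a, b in zip(ref_rgb, pix_rgb)]
    if euclidean:
        return math.sqrt(sum(v * v for v in diffs))
    return sum(diffs)


def multi_color_distance(ref_colors, pix_rgb, euclidean=False):
    """min over k of ||Z_k - f||: the reference may be any of K colors."""
    return min(rgb_distance(c, pix_rgb, euclidean) for c in ref_colors)


print(rgb_distance((255, 0, 0), (250, 10, 0)))
print(multi_color_distance([(255, 0, 0), (0, 0, 255)], (0, 10, 250)))
```

Taking the minimum over the K reference colors means a pixel matches as soon as it is close to any one of the allowed colors.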

DESCRIPTION OF SYMBOLS
1 ... Moving image processing apparatus
2 ... CPU (moving image processing means)
3 ... ROM
4 ... RAM
5 ... Recording unit (recording means)
6 ... Display unit
7 ... Operation unit

Claims

A moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖, and
the moving image processing means:
calculates, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means;
calculates, by using the continuous DP, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, based on the local distance d that is the minimum value at the time τ; and
obtains the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image in the input moving image, by calculating the coordinate position (x*, y*) at time t at which the value obtained by dividing the calculated evaluation function S(x, y, T, t) by the time T is equal to or less than a predetermined threshold value h.

A moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖,
e₁ = ξ(τ) − ξ(τ−1) and e₂ = η(τ) − η(τ−1), and
the moving image processing means:
calculates, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means;
calculates, using the minimum value of the local distance d at time τ, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, from S(x, y, 1, t) = 3d(x, y, 1, t) and the following recurrence formula obtained by continuous DP:

and calculates, based on a local area within the range of the vertical and horizontal resolution of the input moving image, the coordinate position (x*, y*) at time t at which the value obtained by dividing the calculated evaluation function S(x, y, T, t) by 3T is equal to or less than a predetermined threshold value h, according to the following expression:

thereby obtaining the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image in the input moving image.

A moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖,
e₁ = ξ(τ) − ξ(τ−1) and e₂ = η(τ) − η(τ−1),
the deformation rate of the trajectory of the object in the reference moving image in the vertical and horizontal resolution directions is set to α, and
the moving image processing means:
calculates, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means; and
calculates, using the minimum value of the local distance d at time τ, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, from S(x, y, 1, t) = 3d(x, y, 1, t) and the following recurrence formula obtained by continuous DP:

The moving image processing apparatus according to claim 3, wherein
the coordinate position (x*, y*) at time t that is obtained based on the evaluation function S(x, y, T, t) and at which the value is equal to or less than the threshold value h is taken as the coordinate position (x, y),
a set B is defined as
B = {(x − e₁(τ), y − e₂(τ), τ−1, t−2),
(x − e₁(τ), y − e₂(τ), τ−1, t−1),
(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1)},
τ = T is set as the initial value of the time τ, and
the moving image processing means,
based on the coordinate position (x, y) at time t, calculates x*, y*, t* from

and, by newly setting the calculated x*, y*, t* as x, y, t, subtracting 1 from the value of τ, and repeating the calculation of x*, y*, t* until τ becomes 1,
extracts, from the input moving image, the trajectory of the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image.

The moving image processing apparatus according to claim 1, wherein
a plurality of reference moving images, each capturing a different general-purpose elemental motion, are recorded in the recording means, and
the moving image processing means obtains, from the input moving image, a continuous motion corresponding to the types of the elemental motions by extracting, in order of time t, the coordinate positions (x*, y*) at time t that are obtained based on the evaluation function S(x, y, T, t) and at which the value is equal to or less than the threshold value h.

A moving image processing program for a moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖,
the program causing the moving image processing means to realize:
a local distance calculation function of calculating, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means;
an evaluation function calculation function of calculating, by using the continuous DP, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, based on the local distance d that is the minimum value at the time τ; and
a coordinate position calculation function of obtaining the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image in the input moving image, by calculating the coordinate position (x*, y*) at time t at which the value obtained by dividing the calculated evaluation function S(x, y, T, t) by the time T is equal to or less than a predetermined threshold value h.

A moving image processing program for a moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖, and
e₁ = ξ(τ) − ξ(τ−1) and e₂ = η(τ) − η(τ−1),
the program causing the moving image processing means to realize:
a local distance calculation function of calculating, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means;
an evaluation function calculation function of calculating, using the minimum value of the local distance d at time τ, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, from S(x, y, 1, t) = 3d(x, y, 1, t) and the following recurrence formula obtained by continuous DP:

and a coordinate position calculation function of calculating, based on a local area within the range of the vertical and horizontal resolution of the input moving image, the coordinate position (x*, y*) at time t at which the value obtained by dividing the calculated evaluation function S(x, y, T, t) by 3T is equal to or less than a predetermined threshold value h, according to the following expression:

thereby obtaining the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image in the input moving image.

A moving image processing program for a moving image processing apparatus comprising:
recording means for recording a reference moving image of time length T in which the motion of an object is captured, the coordinate position at time τ in the reference moving image being denoted by (ξ(τ), η(τ)) and the luminance at that coordinate position being denoted by Z(ξ(τ), η(τ)), and an input moving image in which motion similar to the motion of the object is to be recognized, the luminance at time t in the input moving image being denoted by f(x, y, t) using the coordinate position (x, y); and
moving image processing means for obtaining, in the input moving image, the coordinate position (x*, y*) at each time t of a portion similar to the motion pattern of the object captured in the reference moving image,
wherein the absolute value of the value obtained by subtracting the luminance of the input moving image from the luminance of the reference moving image is defined as a local distance d, with d(x, y, τ, t) = ‖Z(ξ(τ), η(τ)) − f(x, y, t)‖,
e₁ = ξ(τ) − ξ(τ−1) and e₂ = η(τ) − η(τ−1), and
the deformation rate of the trajectory of the object in the reference moving image in the vertical and horizontal resolution directions is set to α,
the program causing the moving image processing means to realize:
a local distance calculation function of calculating, by using continuous DP, the minimum value of the local distance d at time τ based on the reference moving image Z(ξ(τ), η(τ)) and the input moving image f(x, y, t) recorded in the recording means; and
an evaluation function calculation function of calculating, using the minimum value of the local distance d at time τ, an evaluation function S(x, y, T, t) accumulated as the time τ runs from 1 to T, from S(x, y, 1, t) = 3d(x, y, 1, t) and the following recurrence formula obtained by continuous DP:

The moving image processing program according to claim 8, wherein
the coordinate position (x*, y*) at each time t obtained by the coordinate position calculation function is taken as the coordinate position (x, y),
a set B is defined as
B = {(x − e₁(τ), y − e₂(τ), τ−1, t−2),
(x − e₁(τ), y − e₂(τ), τ−1, t−1),
(x − e₁(τ) − e₁(τ−1), y − e₂(τ) − e₂(τ−1), τ−2, t−1)},
τ = T is set as the initial value of the time τ, and
the program causes the moving image processing means
to execute a trajectory calculation function of calculating, based on the coordinate position (x, y) at time t, x*, y*, t* from

and, by newly setting the calculated x*, y*, t* as x, y, t, subtracting 1 from the value of τ, and repeatedly executing the trajectory calculation function until τ becomes 1,
to extract, from the input moving image, the trajectory of the coordinate position (x*, y*) at each time t of the portion similar to the motion pattern of the object captured in the reference moving image.

The moving image processing program according to any one of claims 6 to 9, wherein
a plurality of reference moving images, each capturing a different general-purpose elemental motion, are recorded in the recording means,
the program causes the moving image processing means to execute an extraction function of extracting, in order of time t, the coordinate positions (x*, y*) at each time t obtained by the coordinate position calculation function, and
a continuous motion corresponding to the types of the elemental motions is obtained from the input moving image from the coordinate positions (x*, y*) at each time t extracted by the extraction function.