JP2007114885A

JP2007114885A - Classification method and apparatus based on image similarity

Info

Publication number: JP2007114885A
Application number: JP2005303452A
Authority: JP
Inventors: Atsushi Yoshimoto; 淳善本
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2005-10-18
Filing date: 2005-10-18
Publication date: 2007-05-10

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus that obtain, from captured images, information on changes such as the motion of an imaging target, and that classify the patterns of change in those images in parallel with image capture.
SOLUTION: The method classifies patterns of change in a plurality of images obtained by imaging target objects, using a plurality of imaging means capable of imaging an object that can move, a storage means that at least records the image information obtained by the imaging means, and a computing means that at least analyzes that image information. The imaging means capture the target continuously; the amount of change in the image information during each imaging unit time is obtained from the difference between successive images; and the change pattern within a predetermined target time up to the current time is classified as a variable (for example, using cluster analysis). The image information within the predetermined target time is thereby classified in nearly real time.
[Selected drawing] Figure 2

Description

The present invention relates to a method for classifying captured images on the basis of similarity, for example characteristic motions of relatively short duration that a person habitually repeats, and to an apparatus that carries out the method.

In recent years, mass production of electronic light-receiving elements has sharply reduced the price of imaging means such as video cameras. Inexpensive surveillance cameras are now easy to obtain, so it has become easy to install many of them and monitor company facilities and the like from many viewpoints.
However, the more imaging means are added, the more footage they produce, and at present neither the imaging means nor the captured data can be used to the full.
If each imaging means is paired one-to-one with a display means such as a monitor, the size and installation space of the display means also become a problem. It is also difficult for a single observer to watch as many as 50 or 100 screens.

One approach is, for example, to assign one display means to four imaging means, divide time into four slots, and automatically switch which imaging means supplies image information to the display means.
However, this discards three quarters of the footage, and an event that should be monitored may occur in the meantime, so the method can hardly be called thorough.

An alternative is to display only footage that contains motion and to disregard footage without motion.
However, if motion occurs simultaneously at, say, 50 imaging devices, the footage from all 50 must still be checked.

When several imaging means output highly similar image data, stopping any one of them causes no problem. However, judging this similarity has depended largely on human experience and subjectivity, and no conventional apparatus has performed it automatically.

Consider, for example, an audience listening to a speaker in a hall. Some listeners nod, some tilt their heads in doubt, and some remain motionless. The conventional way to classify these motions is to set up imaging means that can capture the whole audience and then have people judge and classify each listener's motion by hand. Likewise, to gauge how well a topic is being received, the speaker currently has no option but to watch the whole audience during the talk and judge from what is seen.
Even if an imaging device were installed for every listener, with current technology each listener would still end up being classified by hand.
If the speaker could learn how the proportions of nodding, head tilting and stillness have changed from the start of the talk to the present, and how closely the listeners' motions coincide, the speaker could adapt the content of the talk rationally and flexibly.

It is also well known that head movements and the like appear even when people talk without seeing the other party's face, for example on the telephone. During dialogue people tend to produce head movements unconsciously, and these movements are often idiosyncratic, differing from person to person. If the meaning of such non-verbal information about the motion of an imaging target, such as human head movement, can be understood, it can be put to use in many situations, for example as an aid to a man-machine interface.
Conventionally, however, such non-verbal information about the motion of an imaging target has been neither classified nor put to concrete, effective use, and has simply been discarded.

Accordingly, an object of the present invention is to provide a method that obtains, from captured images, information on changes such as the motion of an imaging target and classifies the patterns of change in those images by similarity, and an apparatus that carries out the method.

To solve the above problem, the method of the present invention for classifying images containing motion classifies patterns of change in a plurality of images obtained by imaging target objects, using a plurality of imaging means capable of imaging an object that can move, a storage means that at least records the image information obtained by the imaging means, and a computing means that at least analyzes that image information. The method is characterized in that the imaging means capture the target continuously; the amount of change in the image information during each imaging unit time is obtained from the difference between successive images; similarity is compared using the change pattern within a predetermined target time up to the current time as a variable; and motions with high similarity are classified as the same motion, so that the image information within the predetermined target time is classified by type of motion as needed.
One way of classifying by similarity is, for example, cluster analysis in which each feature is quantified and compared.

Here, the amount of change in the image information over time may be Fourier transformed, and the frequencies and intensities contained in the sequence of captured images may be used as variables for similarity classification, contributing to effective classification. Motions repeated several times in succession, and compound motions, can thereby be reduced to single or individual motions, which avoids classifying them as different motions merely because they could not be isolated.

For example, when cluster analysis is used for classification, the analysis may use variables that weight the timing at which changes occur in the captured image information, contributing to a classification suited to the purpose.

An image representing each group of classified similar motions may be output to the display means to aid visual confirmation.

Typical change patterns may be classified into four states, namely frequent periodic change, infrequent periodic change, aperiodic change and no change, and the result of this classification may be output to the display means immediately after classification so that the result can be put to use.

Similarity relationships among a plurality of change patterns may be defined, and the captured images may be displayed hierarchically on the display means according to those relationships so that the classification result can be put to use.

Examples of typical change patterns may be prepared in advance, and input image information classified as matching one of those examples may be output to the display means immediately so that the classification result can be put to use.

The apparatus of the present invention that performs this classification classifies patterns of change in a plurality of images obtained by imaging target objects, and comprises a plurality of imaging means capable of imaging an object that can move, a storage means that at least records the image information obtained by the imaging means, and a computing means that at least analyzes that image information. The apparatus is characterized by having imaging means that capture the target continuously; storage means that record the continuously captured image information; and computing means that calculate, from the difference between items of image information accumulated in the storage means, the amount of change in the image information during each imaging unit time, and that analyze the change pattern within a predetermined target time up to the current time as a variable, thereby classifying the image information within that target time in substantially real time. Analyzing the change pattern as a variable here means classifying according to similarity.
In the above, classifying in substantially real time means performing the classification immediately after the unit motion to be classified has completed.

According to the present invention, captured images can be classified from the changes between continuously captured images immediately after a motion occurs, so the classification result can be used at any time, for example to detect the appearance of a specific object or the occurrence of a specific event.

Embodiments of the present invention are described below with reference to the drawings.
In this embodiment, moving images captured simultaneously by a plurality of imaging means such as video cameras are the objects of classification. The image data output from the imaging means is input to and processed by a computer that has a computing means such as a CPU and a storage means such as a hard disk and is connected to a display means such as a CRT. The distance between the imaging means and the subject and the data transmission delay between the imaging means and the computer are assumed here to be practically negligible.

Whether several captured images are "similar" or "not similar" is judged as follows.
Suppose that imaging means α captures a moving image M and that imaging means β captures exactly the same moving image M one second later. Although the content is identical apart from the one-second offset, the two are judged here to be "not similar". Images are judged "similar" when different imaging means set at different positions capture the same subject and obtain similar moving images M. What matters for this purpose is therefore that the timing from the start to the end of a motion coincides and that the content of the motion itself is similar.

To simplify processing, the moving-image information is processed along the time axis.
Processing here means expressing as a single number the difference between successive frames, for example the sum over all pixels of a frame of the absolute value of the luminance difference.

If the image data is monochrome, the luminance at coordinates (x, y) of the frame captured by an imaging means at time t can be written $Y_t(x, y)$. If the image data is color, the red, green and blue intensities at coordinates (x, y) of the frame captured at time t can be written $R_t(x, y)$, $G_t(x, y)$ and $B_t(x, y)$, respectively.

The following relation generally holds between the luminance and the red, green and blue intensities.
$$Y_t(x, y) = C_r R_t(x, y) + C_g G_t(x, y) + C_b B_t(x, y)$$ (Equation 1)

Here Cr, Cg and Cb are correction coefficients for red, green and blue; the empirical values Cr = 0.298912, Cg = 0.586611 and Cb = 0.114478 are commonly used. It is also desirable to adjust Cr, Cg and Cb to suit the characteristics of the imaging means and conditions such as the light source of the scene being captured. With this relation, the method described below applies equally to monochrome and color images.
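By way of illustration only, the luminance conversion can be sketched as follows, on the assumption that Equation 1 is the usual weighted sum of the red, green and blue intensities with the coefficients Cr, Cg and Cb quoted above. The function names and the representation of a frame as rows of (r, g, b) tuples are hypothetical and do not come from the specification.

```python
# Empirical coefficients quoted in the text; names are illustrative only.
CR, CG, CB = 0.298912, 0.586611, 0.114478


def luminance(r, g, b, cr=CR, cg=CG, cb=CB):
    """Weighted sum of red, green and blue intensities (assumed form of Equation 1)."""
    return cr * r + cg * g + cb * b


def to_luminance_frame(rgb_frame):
    """Convert a frame given as rows of (r, g, b) tuples into rows of luminance values."""
    return [[luminance(r, g, b) for (r, g, b) in row] for row in rgb_frame]
```

Passing different values for `cr`, `cg` and `cb` corresponds to adjusting the coefficients to the imaging means or the light source, as the text suggests.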

If an image consists of h pixels vertically and w pixels horizontally, the value d(t) given by the following expression is the sum, over all pixels, of the absolute luminance difference between the image at time t and the image one frame interval Δt earlier, that is, at time (t − Δt). Here $Y_t(x, y)$ denotes the luminance at coordinates (x, y) at time t.
$$d(t) = \sum_{x=1}^{w} \sum_{y=1}^{h} \left| Y_t(x, y) - Y_{t-\Delta t}(x, y) \right|$$ (Equation 2)
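As a minimal sketch of the quantity d(t) described above, the following assumes that each frame is already a grid of luminance values stored as a list of rows. The function names are hypothetical.

```python
def frame_difference(prev, curr):
    """Sum of absolute luminance differences between two equally sized frames."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    return sum(
        abs(c - p)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    )


def motion_series(frames):
    """d(t) for every frame after the first, each compared with its predecessor."""
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]
```

A sequence of identical frames yields a series of zeros, which corresponds to the ideal static scene mentioned later in the description.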

FIG. 1 is a graph in which the per-frame values d(t) obtained from three imaging means α, β and γ set at different positions are plotted along the time axis.
It shows the motion that occurred during a comparison sampling period S that is much longer than the frame interval Δt. For an imaging device that captures 15 frames per second, for example, Δt = 1/15 ≈ 0.0667 seconds. The comparison sampling period S is arbitrary; 5 seconds, for example, may be used.

Choosing among the variables that can be read from this figure is important. As explained below, some matter for the present similarity classification and some do not.
For example, the total amounts of motion Sα (= Σd(t)), Sβ and Sγ recorded by the respective imaging means during the sampling period S satisfy Sβ < Sα = Sγ.
Judged by total motion alone, the image information from imaging means α equals that from imaging means γ, and both differ from the image information from imaging means β.
However, it is unlikely that α and γ captured the same subject. Judged instead by when the motions occur, it is α and β that are similar.
As this example shows, comparing the total motion Σd(t) between imaging means is of little importance, one reason being that Σd(t) depends on the distance between the imaging means and the subject. It is the coincidence of motion timing that should be emphasized.

Timing coincidence can be measured by various known methods; as one example, a binary motion/non-motion variable D is used here.
D is a binary number with s digits, the digits corresponding to d(1) through d(s). A digit is 1 if d(t) > 0 and 0 if d(t) = 0. For footage of a static scene, d(t) = 0 ideally, but small electrical noise between consecutive frames is common, so in practice a noise threshold N is set and the digit is 1 if d(t) ≥ N and 0 if d(t) < N, which removes the influence of small noise. In this way D is expressed as, for example, Dα = 01011011010101.
s is the number of frames used for comparison, given by the quotient of the sampling period S and the frame interval Δt. For example, for an imaging device that captures 15 frames per second and a sampling period of 5 seconds, s = S/Δt = 5/0.0667 ≈ 75 frames.
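The construction of the variable D can be sketched as follows. This is an illustration under the assumption that D is held as a string of '0' and '1' characters, one per frame; the function names are hypothetical.

```python
def frames_per_window(fps, window_seconds):
    """Number of frames s in a sampling period S at the given frame rate."""
    return round(fps * window_seconds)


def motion_bits(d_values, noise_threshold):
    """Binary string D: '1' where d(t) >= N (motion), '0' where d(t) < N."""
    return "".join("1" if d >= noise_threshold else "0" for d in d_values)
```

For example, `frames_per_window(15, 5)` gives the 75 frames mentioned in the text, and `motion_bits([0, 3, 12, 50, 4], 5)` gives `"00110"` because only the third and fourth values reach the threshold.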

Suppose, for example, that the motion/non-motion variable is Dα = 1100111111 for imaging means α, Dβ = 1100111111 for imaging means β and Dγ = 1111110011 for imaging means γ.
Whether the image information from α and β coincides in timing can then be judged from the Hamming distance between Dα and Dβ. The Hamming distance between Dα and Dβ is 0, and that between Dα and Dγ is 4. Even allowing for residual noise and other disturbances, the smaller the Hamming distance, the closer the timing.
Besides the Hamming distance, other measures such as the Levenshtein distance may also be used as appropriate to assess timing coincidence.
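The Hamming-distance comparison in this example can be checked with a short sketch. It assumes each D is stored as a string of equal length; the function name is hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


d_alpha = "1100111111"
d_beta = "1100111111"
d_gamma = "1111110011"

print(hamming_distance(d_alpha, d_beta))   # 0
print(hamming_distance(d_alpha, d_gamma))  # 4
```

The two printed values match the distances stated in the text: α and β coincide exactly, while α and γ differ at four frame positions.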

Turning to the content of the motion, the important part of the motion information is the pattern of change.
Suppose, for example, that imaging means α captures a rhythmic long-period oscillation, such as the arm movement alone of a person Q, while imaging means β captures both the arm movement of person Q and a rhythmic short-period oscillation of the neck. If the portion where motion occurs is regarded as a superposition of several sine waves with different periods and intensities, Fourier transforming that composite wave yields the frequencies and intensities contained in the moving image.
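As a rough sketch of how frequency and intensity might be extracted from d(t), the following applies a plain discrete Fourier transform to a window of samples. The specification does not state which transform implementation is used; the direct O(n²) form below is chosen only for clarity, and the function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of DFT bins k = 0 .. n//2 of a real-valued series, scaled by 1/n."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def dominant_frequency(samples, fps):
    """Frequency in Hz and magnitude of the strongest non-constant component.

    Requires at least two samples so that a non-DC bin exists.
    """
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * fps / len(samples), mags[k]
```

With 75 samples at 15 frames per second, a motion that repeats three times within the 5-second window appears as a peak at 0.6 Hz. A compound motion would show several peaks, one for each component period.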

Classification by similarity, for example cluster analysis, is then performed using as variables the timing coincidence and the period and intensity of the motion obtained as described above. In the case of FIG. 1, imaging means α and β lie close together and form one cluster, while imaging means γ lies farther away from that cluster.
Although this is a rough judgment, the images can thus be classified by concluding that α and β are capturing the same subject and γ a different one.
Because α and β are judged to be capturing the same subject, the information from either one of them suffices; the useful information here is therefore that from α (or β) and that from γ.
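The grouping of imaging means can be illustrated with a deliberately simplified stand-in for cluster analysis: single-linkage grouping on the Hamming distance between motion bit strings. The specification also names period and intensity as variables, which this sketch omits. The threshold `max_distance`, the function names and the camera labels are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_timing(patterns, max_distance):
    """Group cameras whose bit strings lie within max_distance of any group member."""
    names = list(patterns)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(patterns[a], patterns[b]) <= max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


patterns = {"alpha": "1100111111", "beta": "1100111111", "gamma": "1111110011"}
print(cluster_by_timing(patterns, 1))  # [['alpha', 'beta'], ['gamma']]
```

One camera per resulting group can then be kept as the representative source, in line with the observation that information from either α or β suffices.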

If the positions of the imaging means are fixed, accumulating the information obtained by this method reveals which imaging means tend to fall into the same cluster. Imaging means that tend to be classified into the same cluster can be regarded as capturing nearby regions of space, and those that tend to be classified into different clusters as capturing distant (different) regions.
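One possible way to accumulate this information is to count, over many sampling periods, how often each pair of imaging means lands in the same cluster. The sketch below assumes that each period's result is available as a list of clusters, each a list of camera names; this data layout and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_cluster_counts(history):
    """Count how often each pair of cameras shared a cluster across all windows."""
    counts = Counter()
    for clusters in history:
        for cluster in clusters:
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    return counts
```

Pairs with high counts relative to the number of windows would be candidates for imaging means that view nearby regions of space.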

For images that have already been captured, on the other hand, reducing the number to an arbitrary n and analyzing those is less computationally costly and more efficient than analyzing all m moving images, m being large in proportion to the number of imaging means. Assume that the similar images have been narrowed down to n by the means described above. n is the number of moving images selected by the user; m = 10 and n = 2, for example, means that two were chosen from ten moving-image sources.
The next step is to analyze each of the n moving images.

Known methods may be used as appropriate to analyze the moving images. A simple example is given here that reuses the variable d(t) already computed by the means described above. This approach keeps the amount of computation low; its aim is to limit computation rather than to maximize the accuracy of the motion analysis.
If, for example, the moving image from a single imaging means is to be analyzed in real time, the speed of present-day computers is ample. However, when many imaging means are used, or when the computing device itself is to be inexpensive and low in power consumption, giving priority to reducing computation is well worthwhile.

One example for motion analysis is colloquial Japanese spoken face to face, for instance between friends.
Everyday spoken Japanese relies on ambiguous sentence patterns more than the spoken forms of most other languages, with omitted subjects, omitted sentence endings and utterances timed to invite the listener to complete them, and so non-verbal behavior such as back-channeling occurs frequently and naturally during conversation.
Motions with very long periods are therefore disregarded here as either meaningless or requiring advanced processing to interpret. Examples are large, long, aperiodic motions such as tracing an imaginary map in the air with a fingertip while explaining something, and very-long-period motions such as slowly swiveling a chair from side to side out of boredom. Any such motion that lasts longer than the sampling period S is ignored automatically.

Given that the orientation of the imaging means, the orientation of the subject and the distance between them vary widely, the most reasonable approach is to measure the period with which d(t), which depends on the luminance difference from the previous frame, rises and falls.
Simply analyzing by the above method the values of d(t) from a particular imaging means α, over the interval from one sampling period S ago to the current time t, allows the motion to be classified into, for example, the following four classes.
(A): the most frequent periodic motion (change)
(B): other periodic motion (change)
(C): aperiodic motion (change)
(D): no motion (stillness)
Although there are individual differences, when the moving image from a particular imaging means α shows ordinary back-channeling, classes (A) to (D) can be roughly regarded as nodding (A), frequently occurring periodic motion other than nodding (B), aperiodic motion such as gestures (C) and no motion (D).

FIG. 2 is an explanatory diagram showing the main parts of the system configuration of this embodiment.
The moving images obtained from the m imaging devices are input to the moving-image receiving device, processed by the computing device, and each recorded in the storage device as d(t). Each d(t) may be displayed on the analysis-result display device.

At suitable intervals, preferably longer than the sampling period S, the computing device analyzes d(t) for each continuous stretch of motion and classifies it into the four classes (A) to (D). This means, for example, that the footage from a particular imaging device α capturing the motion of person Q has been classified into the four classes (A) to (D).
If person Q is moving now or has moved within the last sampling period S at most (that is, if Dα > 0), the motion is matched to determine which of classes (A) to (C) it most resembles. To do so, the similarity in period and intensity of the motion indicated by d(t) within the sampling period S is examined as described above. A known cluster analysis with period and intensity as variables may also be used. If the motion is judged most similar to class (A), for example, person Q can be judged to have nodded within the sampling period S.
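The matching step can be sketched as a nearest-template lookup. This is only one simple reading of "most similar": it assumes each motion class is summarized by a reference pair (period, intensity) and that similarity is Euclidean distance in that feature space. The template values, the labels "A" to "D" for the four classes, and the function names are all hypothetical, and in practice the two features would need comparable scales.

```python
import math


def nearest_class(feature, templates):
    """Label of the template closest to the feature vector (Euclidean distance)."""
    return min(templates, key=lambda label: math.dist(feature, templates[label]))


def classify_window(bits, feature, templates):
    """Return 'D' (no motion) if the window has no motion bit set, else the nearest class."""
    if "1" not in bits:
        return "D"
    return nearest_class(feature, templates)


# Hypothetical reference values: (period in seconds, intensity).
templates = {"A": (0.5, 40.0), "B": (1.5, 25.0), "C": (0.0, 60.0)}
print(classify_window("0011100", (0.6, 38.0), templates))  # A
print(classify_window("0000000", (0.0, 0.0), templates))   # D
```

The check for an all-zero bit string mirrors the condition Dα > 0 in the text: matching against classes A to C is attempted only when some motion was observed in the window.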

The classification results for similar moving images, obtained by comparing d(t) within the sampling period S back from the current time, and the analysis results for the continuous motion segments of all m series d(t) at suitable intervals, may be displayed on the analysis result display device.
In addition, the n mutually dissimilar moving images selected from all m imaging devices as a result of the similarity classification may be displayed on the moving image display device, in accordance with a switching instruction sent from the arithmetic device to the video switcher.
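One simple way to pick n mutually dissimilar moving images out of m is a greedy pass that keeps a camera only if its features are far enough from every camera already kept. This is a sketch of that idea and not the selection procedure of the embodiment; the feature tuples, the `threshold` value, and the function name `pick_dissimilar` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pick_dissimilar(
    features: Dict[str, Tuple[float, float]], threshold: float
) -> List[str]:
    """Greedily keep cameras whose features differ from all kept ones by more than threshold."""
    chosen: List[str] = []
    for cam, feat in features.items():
        if all(math.dist(feat, features[c]) > threshold for c in chosen):
            chosen.append(cam)
    return chosen


# Hypothetical (frequency, intensity) features for three cameras.
features = {
    "alpha": (4.0, 1.0),
    "beta": (4.0, 1.1),   # nearly the same motion as alpha
    "gamma": (1.0, 0.5),
}
selected = pick_dissimilar(features, threshold=0.5)
print(selected)  # ['alpha', 'gamma']
```

The resulting list is what would be handed to the video switcher, so that near-duplicate views are not shown side by side.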

By analyzing which of the previously defined motion classifications (a) to (d) the motions shown in the n moving images during the sampling period S up to the current time resemble, the motions of q persons (q being smaller than n) can be classified and presented immediately.
Once classified, the analysis result display device may immediately show how many of the q persons are nodding (a), how many are performing a periodic motion other than nodding (b), how many are performing an aperiodic motion (c), and how many are motionless (d), or alternatively the ratio of each of classifications (a) to (d).
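The per-class head count and ratios are a straightforward tally once each person has a label. The following minimal sketch assumes the labels are the strings "a" to "d"; the function name `tally` and the sample data are made up for the example.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def tally(labels: Sequence[str]) -> Dict[str, Tuple[int, float]]:
    """Map each class a to d to (number of persons, share of all q persons)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        cls: (counts[cls], counts[cls] / total if total else 0.0)
        for cls in "abcd"
    }


# Hypothetical labels for q = 8 persons.
labels = ["a", "a", "d", "b", "a", "c", "d", "a"]
summary = tally(labels)
for cls, (count, ratio) in summary.items():
    print(f"({cls}) {count} persons, {ratio:.1%}")
```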

The analysis of motion within a moving image can also be applied to identifying persons and identifying particular acts.
For example, the following d(t) patterns are recorded in the storage device in advance.
(a): a resident opening a window in the ordinary way
(b): a typical motion of a burglar breaking a window to get in
(c): any other motion
(d): no motion (motionless)
The arithmetic device determines which of classifications (a) to (d) the change in d(t) within the currently recorded sampling period S most resembles. If the result is motion information (b), a burglar breaking a window to get in, an output such as an alarm from an attached alarm device can be issued.
Recording d(t), which corresponds to the rhythm of a motion, and comparing it with typical motions in this way can be put to use in many situations.
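Comparing the current window of d(t) with stored patterns and raising an alarm on (b) might look like the sketch below. It assumes that stored patterns and the current window have the same length, uses plain Euclidean distance as the similarity measure, and treats anything farther than `max_distance` from every stored pattern as (c). The pattern values, the threshold, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical stored d(t) patterns, one value per imaging unit time.
STORED: Dict[str, Sequence[float]] = {
    "a": [0, 2, 4, 2, 0, 0],  # assumed: smooth, single movement
    "b": [0, 9, 0, 9, 0, 9],  # assumed: repeated sharp bursts
    "d": [0, 0, 0, 0, 0, 0],  # no motion
}


def classify_window(
    window: Sequence[float],
    stored: Dict[str, Sequence[float]],
    max_distance: float,
) -> str:
    """Return the nearest stored class, or 'c' if nothing is close enough."""
    nearest = min(stored, key=lambda name: math.dist(window, stored[name]))
    if math.dist(window, stored[nearest]) > max_distance:
        return "c"  # some other motion
    return nearest


def should_alarm(window: Sequence[float]) -> bool:
    """True when the current window is classified as pattern (b)."""
    return classify_window(window, STORED, max_distance=5.0) == "b"


current = [0, 8, 1, 9, 0, 8]
if should_alarm(current):
    print("alarm: window matches pattern (b)")
```

A distance on raw values is sensitive to timing shifts, so a real system might compare period and intensity features instead of the raw series.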

According to the present invention, captured images can be classified in nearly real time, so the classification status can be used as it becomes available, or for detection based on the classification. For example, a speaker can adjust a talk according to the classified reactions of the audience, or an alarm can be issued on detecting a specific event such as a break-in. Because it can be applied in such a wide variety of situations, the invention has broad uses and is industrially very useful.

A graph in which the per-frame information d(t) obtained from three imaging means α, β, γ set at different positions is arranged along the time axis.
An explanatory diagram showing the main part of the system configuration of this embodiment.

Claims

A classification method based on similarity of images, the method using
a plurality of imaging means capable of imaging a target object capable of movement,
storage means for recording at least image information obtained by the imaging means, and
arithmetic means for analyzing at least the image information obtained by the imaging means,
to classify patterns of change in a plurality of images obtained by imaging the target object, wherein
the target object is continuously imaged by the imaging means,
the amount of change in the image information that occurred during an imaging unit time is obtained from the difference between the continuously captured image information, and
the change pattern within a predetermined target time from the current time is used as a variable and classified according to similarity,
whereby the image information within the predetermined target time from the current time is classified in substantially real time.

The classification method based on similarity of images according to claim 1, wherein the amount of change in the image information is subjected to a Fourier transform, and the frequencies and intensities thereby found in the continuously captured image information are used as classification variables.

The classification method based on similarity of images according to claim 1 or 2, wherein classification is performed using a variable that weights the timing at which a change occurs in the captured image information.

The classification method based on similarity of images according to claim 1, wherein each characteristic image classified by similarity is output and displayed on a display means.

The classification method based on similarity of images according to claim 1, wherein typical change patterns are classified into four states, namely periodic changes that occur frequently, periodic changes that do not occur frequently, aperiodic changes, and no change, and
the classification status indicating the four states is output and displayed on the display means in substantially real time.

The classification method based on similarity of images according to claim 1, wherein similarity relationships between a plurality of change patterns are defined, and
the captured images are displayed hierarchically on the display means based on the similarity between the change patterns.

The classification method based on similarity of images according to any one of claims 1 to 6, wherein classification variables representing typical change patterns are prepared in advance, and
when image information classified into one of the typical change patterns is input, that image information is immediately output to the display means.

A classification device based on similarity of images, the device having
a plurality of imaging means capable of imaging a target object capable of movement,
storage means for recording at least image information obtained by the imaging means, and
arithmetic means for analyzing at least the image information obtained by the imaging means,
and classifying patterns of change in a plurality of images obtained by imaging the target object, the device comprising:
imaging means for continuously imaging the target object;
storage means for recording the image information continuously captured by the imaging means; and
arithmetic means for calculating, from the difference between the image information accumulated in the storage means, the amount of change in the image information that occurred during an imaging unit time, and
for analyzing the change pattern within a predetermined target time from the current time as a variable,
thereby classifying the image information within the predetermined target time from the current time in substantially real time.