JP2003168106A

JP2003168106A - Eye position tracking method, eye position tracking device, and program therefor

Info

Publication number: JP2003168106A
Application number: JP2001369076A
Authority: JP
Inventors: Shinjiro Kawato; 慎二郎川戸; Shinji Tetsuya; 信二鉄谷
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2001-12-03
Filing date: 2001-12-03
Publication date: 2003-06-13
Anticipated expiration: 2021-12-03
Also published as: JP3980341B2

Abstract

(57) [Abstract]

[Problem] To provide an eye position tracking method that can track eye positions robustly from general face images without unduly constraining the user.

[Solution] The method for tracking eye positions includes: step 242 of acquiring an image; step 244 of predicting the between-the-eyebrows position of the face image in the image; step 246 of searching, in the vicinity of the predicted between-the-eyebrows position, for the point that best matches the between-the-eyebrows image pattern; step 248 of predicting the positions of both eyes in the image from the position of the found point and the relative positions of both eyes, and searching, around each predicted region, for the center point of each of two regions that satisfy a predetermined condition; and step 254 of updating the between-the-eyebrows position, the relative positions of both eyes, and the between-the-eyebrows image pattern.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to man-machine interface technology, and more particularly to a method and apparatus for tracking the positions of a person's eyes so that the person's gaze direction can be detected without error when a computer or the like is operated by gaze.

[0002]

[Prior Art] With the recent development of computer technology, computer-based devices are used in every corner of people's lives. Without the skills to operate a computer, a person may not even be able to lead a satisfactory social life.

[0003] On the other hand, operating a computer requires conveying a person's intention to it. How to convey that intention efficiently, without error, and easily has been the subject of much research, and the results have been put to practical use. Such technology is generally called a man-machine interface.

[0004] If the goal is only to operate the computer, text-based commands can be given to it through, for example, a keyboard. That, however, requires the person to memorize a large number of commands together with their syntax and required parameters. The GUI (Graphical User Interface) was devised to address this and is now the mainstream.

[0005] With a GUI, a person generally uses a pointing device such as a mouse to point at an icon or the like displayed on the screen and performs a predetermined operation such as a click, double-click, or drag, thereby conveying his or her intention to the computer. As a result, there is no need to memorize many commands, and errors are less likely.

[0006] A GUI, however, requires operating a pointing device. A person with impaired hand movement, for example, therefore has difficulty operating even a computer that uses a GUI. In some applications the hands are not free, and using a pointing device such as a mouse is difficult.

[0007] Various techniques have therefore been proposed for conveying a person's intention to a computer by means of gaze. By consciously directing the gaze, a person can use it as a pointing device to give his or her intention to the computer. When a person moves the gaze unconsciously, the computer can also detect that movement and infer the person's intention.

[0008] The prerequisite technology is a device that images the human eyeball and estimates the gaze direction. An eye camera is generally used for this purpose. By estimating a person's gaze direction with an eye camera, the person's intention can be inferred and used to operate a computer.

[0009] When the gaze direction is estimated with an eye camera, the eyeball must be imaged at high resolution to improve estimation accuracy. This narrows the imaging range of the eye camera, and the depth of field is also shallow. As a result, a user of an eye camera must keep the face almost fixed, moving it neither forward, backward, left, nor right. Used alone, an eye camera is therefore very limited in how it can be applied. In addition, it is desirable that the system not be usable only by specific individuals, but be able to detect and track the eyes of a previously unseen face.

[0010] To solve these problems with the use of an eye camera, the inventor of the present application conceived that it is effective to first detect the position of the user's eyes with a wide-field-of-view camera and to use that information to control the pan, tilt, and focus of the eye camera. That is, the accuracy of gaze direction detection by the eye camera is increased by adopting a subsystem that finds and tracks the positions of the person's eyes by some method and successively outputs their direction and distance to the eye camera's controller.

[0011] When using a computer, the user is normally considered to be seated in a chair at a distance of about 50 to 100 cm from the eye camera. Detecting the eye positions as described above and using that information to control the eye camera is therefore expected to make gaze direction detection by the eye camera very accurate.

[0012] In the assumed situation, the range over which the face moves is quite limited, so the face can be imaged large enough to fill the height of the frame. The inventor therefore realized that the positions of both eyes can be detected by detecting eyelid movement in a face image captured at this size. People blink unconsciously, so by waiting for a natural blink the eye positions can be detected without the user being aware of it. In situations where an eye camera is used, it is also possible to obtain the user's cooperation.

[0013] A blink is not a continuous movement. Once the eye positions have been detected, they must therefore be tracked by some method. One approach is to prepare many face image templates and estimate the eye positions by matching the captured face image against them.

[0014] In that case, however, simple template matching cannot cope with changes in face orientation. Even if the template is updated, the tracking point inevitably drifts gradually away from the actual eye. These weaknesses of template matching must be overcome and the robustness of tracking improved.

[0015] As for the distance information supplied to the eye camera, once the direction of the eyes is known, sufficient accuracy can be obtained by measuring with, for example, an ultrasonic sensor. In the embodiment below, however, the distance to the user's face is estimated from binocular stereo images. The distance can of course be measured by various other methods, but because it is not directly related to the detection and tracking of eye positions in the present invention, distance measurement is not described in detail below.

[0016] Several eye position tracking methods already exist. One is disclosed in Japanese Unexamined Patent Publication No. 2000-163564. In that example, an image pattern centered on the eye is used as a template, and the eye position is tracked by so-called template matching. Another is disclosed in the paper titled "Development of a Real-Time Gaze Detection and Motion Recognition System," IEICE Technical Report PRMU99-151 (1999-11), pp. 9-14. In that example, face tracking is performed on face images obtained with two video cameras by stereo processing that uses images of facial feature regions, including both corners of the left and right eyes, together with their three-dimensional coordinates. The center position of each eyeball is then estimated from the face position obtained in this way and an offset vector from the midpoint of the two corners of each eye toward the eyeball.

[0017]

[Problems to be Solved by the Invention] However, the method disclosed in Japanese Unexamined Patent Publication No. 2000-163564 uses ordinary template matching and is therefore weak against rotation and scale change of the pattern. In the example published in the IEICE Technical Report, the eye positions are estimated solely from stereo processing of two images, so the distance from the cameras to the eyes is limited, and the patterns to be tracked must be registered in advance.

[0018] In short, it has not previously been possible to track the eye positions of a general user accurately without constraining the position of the user's face.

[0019] Accordingly, an object of the present invention is to provide an eye position tracking method, an apparatus, and a program therefor that can track eye positions robustly from general face images without unduly constraining the user.

[0020] Another object of the present invention is to provide an eye position tracking method, an apparatus, and a program therefor that can detect eye positions accurately and robustly from general face images even when the face rotates or moves, without unduly constraining the user.

[0021]

[Means for Solving the Problems] The invention according to a first aspect of the present application is a method, and a computer-executable program therefor, for tracking the positions of a subject's eyes in a series of images output by imaging means, in a computer to which the imaging means, which captures video images of the subject's face, is connected. The between-the-eyebrows position in a face image output by the imaging means, the relative positions of both eyes with respect to that position, and the between-the-eyebrows image pattern determined by that position are given in advance. The method includes: a first step of acquiring an image subsequent to said image; a step of predicting the between-the-eyebrows position of the face image in the subsequent image; a step of searching, in the vicinity of the predicted between-the-eyebrows position, for the point at the center of the region that best matches the between-the-eyebrows image pattern; a step of predicting the positions of both eyes in the subsequent image from the position of the found point and the relative positions of both eyes, and searching, around each predicted region, for the center point of each of two regions that satisfy a predetermined condition; a step of updating the between-the-eyebrows position, the relative positions of both eyes, and the between-the-eyebrows image pattern, by taking the midpoint of the two found center points as the new between-the-eyebrows position, taking the region determined by the new between-the-eyebrows position and the relative positions of the two center points as the new relative positions of both eyes with respect to the between-the-eyebrows position, and taking the image pattern of the region determined by the new between-the-eyebrows position as the new between-the-eyebrows image pattern; and a step of executing the processing from the first step on the image that follows the subsequent image.
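The update step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `update_tracking_state` and the representation of the relative eye positions as offsets from the midpoint are assumptions made for this example, and re-extraction of the image pattern around the new midpoint is omitted.

```python
from typing import Tuple

Point = Tuple[float, float]


def update_tracking_state(left_eye: Point, right_eye: Point):
    """Derive the new tracking state from the two eye centers that were found.

    Returns (midpoint, rel_left, rel_right), where midpoint is the new
    between-the-eyebrows position and rel_* are the eye positions expressed
    as offsets from that midpoint (an assumed representation).
    """
    mid = ((left_eye[0] + right_eye[0]) / 2.0,
           (left_eye[1] + right_eye[1]) / 2.0)
    rel_left = (left_eye[0] - mid[0], left_eye[1] - mid[1])
    rel_right = (right_eye[0] - mid[0], right_eye[1] - mid[1])
    return mid, rel_left, rel_right
```

Because the midpoint is recomputed from the detected eyes on every frame, the tracked point is re-anchored to the eyes rather than left to follow the template alone.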

[0022] Preferably, the step of predicting the between-the-eyebrows position includes a step of extrapolating the between-the-eyebrows position in the subsequent image from the position in the image output by the imaging means and the position in the image preceding that image.
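One simple way to extrapolate from two positions is linear (constant-velocity) extrapolation. The text only says that the position is extrapolated from the current and preceding frames, so the linear form and the name `predict_position` below are assumptions for illustration.

```python
def predict_position(prev, curr):
    """Linearly extrapolate the next position from the previous and current ones.

    prev and curr are (x, y) positions of the point between the eyebrows in
    frames t-1 and t. The prediction for frame t+1 assumes the displacement
    between frames stays the same: next = curr + (curr - prev).
    """
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])
```

For example, a point that moved from (100, 50) to (104, 49) would be predicted at (108, 48) in the next frame.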

[0023] More preferably, the method further includes a step of, in response to a failure to find said points in the step of searching for the center points of the two regions, extracting by a predetermined method the between-the-eyebrows position of the face image output by the imaging means, the relative positions of both eyes with respect to that position, and the between-the-eyebrows image pattern determined by that position, and resuming processing from the first step.

[0024] The step of searching for the center points of the two regions may include a step of searching, in a neighborhood centered on each of the positions determined by the relative positions of both eyes with respect to the found point, for the center of a region of predetermined shape whose average pixel value is darkest, and taking those centers as candidate points for the two eyes.
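The darkest-region search can be sketched as an exhaustive scan over a small neighborhood. The rectangular window, the parameter names, and the function name `darkest_window_center` are assumptions for this example; the text only requires a region of predetermined shape.

```python
def darkest_window_center(image, center, search_radius, half_w, half_h):
    """Find the center of the darkest window near a predicted eye position.

    image: grayscale image as a list of rows of pixel values.
    center: (x, y) predicted eye position.
    search_radius: how far from center, in pixels, to try window centers.
    half_w, half_h: half-size of the (assumed rectangular) window.
    Returns the (x, y) center with the lowest mean pixel value,
    or None if no window fits inside the image.
    """
    height, width = len(image), len(image[0])
    area = (2 * half_w + 1) * (2 * half_h + 1)
    best_mean, best_center = None, None
    cx0, cy0 = center
    for cy in range(cy0 - search_radius, cy0 + search_radius + 1):
        for cx in range(cx0 - search_radius, cx0 + search_radius + 1):
            # Skip windows that would extend past the image border.
            if (cx - half_w < 0 or cy - half_h < 0
                    or cx + half_w >= width or cy + half_h >= height):
                continue
            total = 0
            for y in range(cy - half_h, cy + half_h + 1):
                for x in range(cx - half_w, cx + half_w + 1):
                    total += image[y][x]
            mean = total / area
            if best_mean is None or mean < best_mean:
                best_mean, best_center = mean, (cx, cy)
    return best_center
```

Running this once around each of the two predicted eye positions yields the pair of candidate points.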

[0025] More preferably, the step of searching for the center points of the two regions further includes a step of determining whether the candidate points found in the step of searching for the center of the darkest region satisfy all of the following conditions, and treating the search as failed if any condition is not satisfied:
1) the distance between the candidate points is at least a predetermined minimum;
2) the distance between the candidate points is at most a predetermined maximum; and
3) the angle between the straight line connecting the candidate points and the scan-line direction satisfies a predetermined relationship.
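The three conditions can be checked with a few lines of geometry. In this sketch the "predetermined relationship" for the angle is assumed to be an upper bound on the tilt relative to the horizontal scan line; that choice, the parameter names, and the function name `eye_candidates_valid` are illustrative assumptions.

```python
import math


def eye_candidates_valid(left, right, min_dist, max_dist, max_angle_deg):
    """Check two eye candidate points against distance and angle limits.

    left, right: (x, y) candidate points.
    min_dist, max_dist: allowed range for the distance between them.
    max_angle_deg: assumed limit on the angle between the line joining the
    candidates and the horizontal scan-line direction.
    Returns False (search failed) if any condition is violated.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    dist = math.hypot(dx, dy)
    if dist < min_dist:      # condition 1
        return False
    if dist > max_dist:      # condition 2
        return False
    # Angle to the scan line, folded into the range 0 to 90 degrees.
    angle = abs(math.degrees(math.atan2(dy, dx)))
    if angle > 90.0:
        angle = 180.0 - angle
    return angle <= max_angle_deg   # condition 3
```

A caller would treat a `False` result as a tracking failure and fall back to re-detecting the eyes.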

[0026] The invention according to a second aspect of the present application is an apparatus for tracking the positions of a subject's eyes in a series of images output by imaging means, in a computer to which the imaging means, which captures video images of the subject's face, is connected. The between-the-eyebrows position in a face image output by the imaging means, the relative positions of both eyes with respect to that position, and the between-the-eyebrows image pattern determined by that position are given in advance. The apparatus includes: image acquisition means for acquiring an image subsequent to said image; prediction means for predicting the between-the-eyebrows position of the face image in the subsequent image; first search means for searching, in the vicinity of the predicted between-the-eyebrows position, for the point at the center of the region that best matches the between-the-eyebrows image pattern; second search means for predicting the positions of both eyes in the subsequent image from the position of the found point and the relative positions of both eyes, and searching, around each predicted region, for the center point of each of two regions that satisfy a predetermined condition; update means for updating the between-the-eyebrows position, the relative positions of both eyes, and the between-the-eyebrows image pattern, by taking the midpoint of the two found center points as the new between-the-eyebrows position, taking the region determined by the new between-the-eyebrows position and the relative positions of the two center points as the new relative positions of both eyes with respect to the between-the-eyebrows position, and taking the image pattern of the region determined by the new between-the-eyebrows position as the new between-the-eyebrows image pattern; and means for controlling the image acquisition means, the prediction means, the first and second search means, and the update means so that the processing from the image acquisition means onward is repeated for the image that follows the subsequent image.

[0027] Preferably, the prediction means includes means for extrapolating the between-the-eyebrows position in the subsequent image from the position in the image output by the imaging means and the position in the image preceding that image.

[0028] More preferably, the apparatus further includes means for, in response to the second search means failing to find said points, extracting by a predetermined device the between-the-eyebrows position of the face image output by the imaging means, the relative positions of both eyes with respect to that position, and the between-the-eyebrows image pattern determined by that position, and controlling the image acquisition means, the prediction means, the first and second search means, and the update means so that processing resumes from the image acquisition means.

[0029] The second search means may include means for searching, in a neighborhood centered on each of the positions determined by the relative positions of both eyes with respect to the found point, for the center of a region of predetermined shape whose average pixel value is darkest, and taking those centers as candidate points for the two eyes.

[0030] More preferably, the second search means further includes means for determining whether the candidate points found by the means for searching for the center of the darkest region satisfy all of the following conditions, and treating the search as failed if any condition is not satisfied:
1) the distance between the candidate points is at least a predetermined minimum;
2) the distance between the candidate points is at most a predetermined maximum; and
3) the angle between the straight line connecting the candidate points and the scan-line direction satisfies a predetermined relationship.

[0031]

[Embodiments of the Invention] [Hardware Configuration] FIG. 1 shows the external appearance of a computer system 20 that implements the eye position tracking method and apparatus according to the present invention. The computer system 20 detects and tracks eye positions, and a computer not shown in the figure controls the eye camera using the distance to the user's face and the eye direction information supplied by the computer system 20. In this specification, "subject" means a user whose eye positions the system detects and tracks, and may include not only humans but anything, such as an animal, that has something corresponding to "eyes".

[0032] Referring to FIG. 1, the computer system 20 includes: a computer 40 having a CD-ROM drive 50 for CD-ROMs (Compact Disc Read Only Memory) and an FD drive 52 for flexible disks (FD); a mouse 48, which is a first pointing device connected to the computer 40; a keyboard 46; a monitor 42; stereo video cameras 56 and 58 arranged between the monitor 42 and the keyboard 46; and an eye camera 54 controlled by the separate computer mentioned above.

[0033] FIG. 2 shows the internal configuration of the computer 40 together with the other components of the computer system 20. Referring to FIG. 2, the computer 40 includes a CPU (Central Processing Unit) 70, a memory 72 connected to the CPU 70, and a hard disk 74 connected to the CPU 70. The CPU 70 is connected to a network 80 via a network board (not shown), and the aforementioned computer 82, which controls the eye camera 54, is connected to the network 80. The configuration of the computer 82 is the same as that of the computer 40, so its details are not described here.

[0034] The eye position tracking method and apparatus according to the present invention, described in detail below, are implemented by the computer hardware described with reference to FIGS. 1 and 2 and by software executed by the CPU 70. This software is distributed on a storage medium such as a CD-ROM or over a network. When the software is installed on the computer 40, it is typically stored on the hard disk 74. When the user starts the software, or when it is started by the operating system (OS) of the computer 40, it is read from the hard disk 74 into the memory 72 and executed by the CPU 70.

[0035] The software may use functions originally built into the OS of the computer 40, or some functions of other software installed separately from the OS. The software distributed to implement the present invention may therefore lack those functions, provided it is designed on the assumption that they will be supplied by other software at run time. It may of course also be distributed as system software in which all necessary functions are built in.

[0036] [Principle] In the present invention, the method for detecting eye positions is based on the difference between frame images, and focuses in particular on the brightness difference caused by blinking. When a blink is to be detected from the inter-frame difference and the face is moving, pixels with large brightness changes arise in many areas besides the blinking part. The brightness change caused by eyelid movement must therefore be distinguished from that caused by movement of the whole face. Changes in mouth shape, eyebrow movement, and the like can also cause problems, but those brightness changes are rejected by condition tests after the eye candidate points have been extracted.
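The starting point, an inter-frame difference that marks pixels with a large brightness change, can be sketched as below. The fixed `threshold` and the function name `changed_pixels` are assumptions for this example; the text does not state how "large" is decided.

```python
def changed_pixels(prev_frame, curr_frame, threshold):
    """Return the set of (x, y) positions whose brightness changed strongly.

    prev_frame, curr_frame: grayscale frames of equal size, as lists of rows.
    threshold: assumed minimum absolute difference that counts as a change.
    """
    changed = set()
    for y, (row_prev, row_curr) in enumerate(zip(prev_frame, curr_frame)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(a - b) > threshold:
                changed.add((x, y))
    return changed
```

With a still face, the returned set would be dominated by the eyelids during a blink. With a moving face it also contains the edges of the face, which is why the motion has to be canceled first.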

[0037] Movement of the whole face can be divided into translation within the image plane, rotation within the image plane (tilting the head to the side), and rotation out of the image plane (turning the head, nodding). The frame rate is assumed to be fast enough that the movement between frames is small. Methods for canceling each of these movements are described in turn below.

[0038] Movement caused by translation of the face within the image plane is canceled as follows. Consider the case where a colored disc 90 in image f_{t-1}, shown in FIG. 3(A), moves to the right by a distance d and appears in the next frame as in image f_t, shown in FIG. 3(B). The disc carries a small white circular mark. The background need not be uniform, but it is not drawn in the figures.

[0039] Taking the difference between the images of FIG. 3(A) and FIG. 3(B) gives the image shown in FIG. 4. In FIG. 4, the hatched regions are where the pixel values differ greatly, and the white regions are where the difference is small. If the pixel positions with large differences in the difference image D of FIG. 4 are considered in terms of the original images of FIG. 3(A) or (B), they can be further classified into the regions F and B shown at the upper left and upper right of FIG. 5. F stands for foreground, that is, points on the disc, and B stands for background. Region B is the part that is visible in one image but not in the other. Region F is the part that is visible in both images and has corresponding pixels in each. In real images, however, it is not known which part is the moving object, so regions F and B cannot be told apart.

[0040] Now shift the pixels belonging to regions F and B on image f_t, shown at the upper right of FIG. 5, by −d and overlay them on image f_{t-1}. As shown at the lower left of FIG. 5, this produces a region M whose pixel values match and a region U for which nothing can be said either way. Likewise, shifting the pixels of regions F and B on image f_{t-1} by +d and overlaying them on image f_t gives the regions M and U shown at the lower right of FIG. 5. The movement amount ±d here is a two-dimensional vector.

[0041] If a shift ±d' different from the actual shift ±d is used, the number of pixels falling in region M should be smaller than with the actual shift ±d. Searching for the shift that maximizes the number of pixels in region M therefore yields the movement d of disc 90 between frames. When counting the pixels of region M, the two shifts must always be equal in magnitude and opposite in direction: if the pixels of image f_{t-1} are shifted by d, those of image f_t are shifted by −d.

[0042] Thus, among the pixels extracted by the inter-frame difference image, the shift ±d that maximizes the number of pixels belonging to region M is found, the images are shifted and overlaid, and the pixels whose values match in neither overlay are extracted. Those pixels are parts that moved in some way other than translation by d. If d is the movement of the whole face, the parts that moved otherwise come from blinking, changes in mouth shape, and the like. Canceling the movement of the whole face from the inter-frame difference image therefore makes it possible to extract the parts that move in a blink, that is, the eye positions.

[0043] Next, consider the turning motion. Referring to FIG. 6, suppose a face that is facing cameras 56 and 58 starts to turn to the left. At the right edge of the face (the left edge in FIG. 6) a previously hidden part appears, and at the left edge of the face (the right edge in FIG. 6) a previously visible part becomes hidden by self-occlusion. Because such parts are hidden in one of the two images used for the inter-frame difference, shifting the images and looking for matching pixel values is meaningless there, unlike in the case of translation.

[0044] Such parts, however, appear only at the periphery of the moving region. Their influence can therefore be removed by computing the inter-frame difference image (FIG. 4) and then trimming a fixed number of pixels from the left and right of the hatched area. The central part of the face, on the other hand, can be regarded as translating even during a turning motion, so the translation method described above (shifting the two images by ±d and looking for matching pixels) can be applied to it.

[0045] Tilting the head sideways and nodding can be regarded as translations, because the center of rotation lies off the face. In every case, however, the movement between frames is assumed to be small.

[0046] [Software control structure] The structure of the software that implements the eye detection and tracking processes based on the above principle is described below. As noted earlier, the software described below may assume, at run time, the use of functions provided by the OS or functions already installed on the computer, so the distributed software may not itself contain those functions. Even so, as long as the software carries information about where in the control structure such functions are used, it falls within the technical scope of the present invention.

[0047] In addition to the detection of eye candidate points already described, the processing below assumes that there are two eyes and tests whether an extracted pair of candidate points really is a pair of eyes, using conditions such as the following: blinks occur in both eyes simultaneously; the distance between the eyes lies within a certain range; the tilt angle of the face lies within a certain range; and the grayscale pattern near the midpoint of the eyes is roughly left-right symmetric.

[0048] Various thresholds are used in the processing below. Their values are determined empirically according to the application and the required accuracy.

[0049] Referring to FIG. 7, the overall structure of the software is as follows. First, the eye positions are extracted (step 100) using the procedure described in detail later with reference to FIG. 8 onward. In step 102, the extracted eye positions are sent to computer 82, which controls eye camera 54. In step 104, eye position tracking is performed. It is then determined whether tracking succeeded, that is, whether the eye positions were lost (step 106). If tracking failed, control returns to step 100 and processing restarts from the eye position extraction. If tracking succeeded, control returns to step 102, and information obtained by tracking, such as the direction of and distance to the eyes, is sent to computer 82. By repeating steps 100 to 106 in this way, information on the direction and distance of the user's eyes can be sent to computer 82 continuously.
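The control flow of FIG. 7 can be sketched as a simple loop. This is only an illustration under stated assumptions: `detect`, `track`, and `send` are hypothetical callables standing in for steps 100, 104/106, and 102, and returning `None` is an assumed way of signaling "eyes not found" or "tracking lost".

```python
def run(frames, detect, track, send):
    """Alternate between detection and tracking, one frame at a time.

    detect(frame)       -> eye state or None   (step 100, hypothetical)
    track(frame, state) -> eye state or None   (steps 104 and 106, hypothetical)
    send(state)                                 (step 102, hypothetical)
    """
    state = None
    for frame in frames:
        if state is None:
            state = detect(frame)        # no eyes known yet: (re)detect
        else:
            state = track(frame, state)  # eyes known: track; None means lost
        if state is not None:
            send(state)                  # report eye position to the controller
```

When tracking returns `None`, the next frame falls back to detection, which mirrors the return from step 106 to step 100.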

[0050] The eye positions may fail to be extracted in step 100 of FIG. 7. In that case, step 100 is executed again to extract the eye positions from the next frame, after which tracking is started again.

[0051] The eye position extraction performed in step 100 of FIG. 7 is described with reference to FIG. 8. This processing uses the principle described with reference to FIGS. 4 to 6. It assumes that a previous image has already been captured; at the very start of processing, the previous image may simply be set equal to the current image. First, in step 120, the current images from cameras 56 and 58 are captured. The processing below is actually performed on the image of camera 56, one of the two cameras; after the eyes are detected, the distance to the eyes is computed by stereo processing together with the image of camera 58. The roles of camera 56 and camera 58 may be swapped.

[0052] In step 122, the binarized inter-frame difference image D between the previous and current images is computed. Specifically, for each pixel the absolute value of the difference between the previous and current images is taken; the pixel of D is set to "1" if that value is at or above a predetermined threshold N and to "0" if it is below N. A video signal fluctuates to some extent even when imaging a stationary object, so a threshold N of this kind is needed. N can be thought of as the noise level.

[0053] In step 124, it is determined whether the number of pixels with value "1" in difference image D is at or above a predetermined threshold G. If the result is YES, control jumps to step 168 shown in FIG. 9, the eyes are deemed not detected, and processing ends. In this case motion was detected in too many pixels, so the motion is taken to exceed what is assumed and the next frame is awaited.

[0054] If the result in step 124 is NO, step 126 determines whether the number of pixels with value "1" in difference image D is at or below a threshold E. Threshold E is smaller than threshold G and is used to decide whether the face is substantially stationary. If the result in step 126 is YES, control jumps to step 152 shown in FIG. 9. In this case few pixels were extracted by the inter-frame difference, and the face is considered to be at rest.
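Steps 122 to 126 can be sketched as follows. This is a minimal illustration, assuming grayscale images stored as lists of rows of integers; the function names and the string labels returned by `classify_motion` are hypothetical and are not taken from the source.

```python
def binary_difference(prev, curr, n):
    """Step 122: 1 where |prev - curr| >= N (noise level), else 0."""
    return [[1 if abs(p - c) >= n else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def classify_motion(diff, g, e):
    """Steps 124 and 126: gate on the number of changed pixels (E < G)."""
    count = sum(sum(row) for row in diff)
    if count >= g:
        return "too_much"  # step 124 YES: give up on this frame (step 168)
    if count <= e:
        return "still"     # step 126 YES: face at rest, skip to step 152
    return "moving"        # otherwise continue with step 128
```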

[0055] If the result in step 126 is NO, control proceeds to step 128. In step 128, the pixels at both ends of each group of pixels with value "1" in difference image D are deleted. This accounts for the turning motion described with reference to FIG. 6.

[0056] More specifically, each scan line of difference image D is searched rightward from its left end and leftward from its right end for the first pixel with value "1". The values of k pixels starting from each such pixel are then rewritten to "0". This trims a fixed amount from the left and right of the region of "1" pixels in the image.
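The edge trimming of step 128 might look like the sketch below. It assumes that both end positions are located on the untrimmed scan line and that the k cleared pixels run inward from each end; the source does not spell out either detail, and the function name is hypothetical.

```python
def trim_scanline_edges(diff, k):
    """Clear k pixels inward from the first '1' found from each end of a row."""
    out = [row[:] for row in diff]
    for src, row in zip(diff, out):
        ones = [x for x, v in enumerate(src) if v == 1]
        if not ones:
            continue  # nothing moved on this scan line
        left, right = ones[0], ones[-1]
        for x in range(left, min(left + k, len(row))):
            row[x] = 0  # trim from the left end, moving right
        for x in range(max(right - k + 1, 0), right + 1):
            row[x] = 0  # trim from the right end, moving left
    return out
```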

[0057] In step 130, the pixels corresponding to value "1" in difference image D are extracted from the previous and current images. This yields the images shown in the middle row of FIG. 5. Then, in step 132, the movement amount ±d described above is computed. This processing is detailed later with reference to FIG. 10.

[0058] Once the movement amount ±d has been computed, step 150 (FIG. 9) deletes from difference image D the pixels corresponding to region M shown in the lower row of FIG. 5. Step 152 then applies isolated-point removal and smoothing to difference image D. Specifically, if four or more of the nine pixels formed by the pixel being processed and its eight neighbors have value "1", the pixel is set to "1"; otherwise it is set to "0". This sets isolated pixels to "0" and removes them from difference image D.
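The isolated-point removal and smoothing of step 152 can be sketched as a 3×3 vote. The handling of border pixels (neighbors outside the image count as "0") is an assumption, and the function name is hypothetical.

```python
def smooth_binary(diff, min_votes=4):
    """Set a pixel to 1 if at least min_votes of its 3x3 window are 1."""
    h, w = len(diff), len(diff[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # outside counts as 0
                        votes += diff[yy][xx]
            out[y][x] = 1 if votes >= min_votes else 0
    return out
```

A single isolated "1" collects only one vote and disappears, while the pixels of a solid 3×3 block each collect at least four votes and survive.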

[0059] In step 154, difference image D is labeled. This assigns a label to each region of mutually connected pixels with value "1". The center of each component obtained in this way is taken as an eye position candidate point (step 156).
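Steps 154 and 156 can be sketched with a breadth-first flood fill. Two points are assumptions, since the source does not state them: 8-connectivity is used for "connected", and the "center" of a component is taken to be its centroid. The function name is hypothetical.

```python
from collections import deque


def candidate_points(diff):
    """Label 8-connected regions of 1s and return their centroids as (x, y)."""
    h, w = len(diff), len(diff[0])
    seen = [[False] * w for _ in range(h)]
    centers = []
    for y0 in range(h):
        for x0 in range(w):
            if diff[y0][x0] != 1 or seen[y0][x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            sum_x = sum_y = n = 0
            while queue:
                y, x = queue.popleft()
                sum_x, sum_y, n = sum_x + x, sum_y + y, n + 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < h and 0 <= xx < w
                                and diff[yy][xx] == 1 and not seen[yy][xx]):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
            centers.append((sum_x / n, sum_y / n))
    return centers
```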

[0060] It is determined whether fewer than two candidate points were obtained (step 158). Since there are normally two eyes, if there are fewer than two candidate points the eyes are deemed not detected and processing proceeds to step 168. If there are two or more candidates, control proceeds to step 160.

[0061] In step 160, for each candidate point, the sum S of the absolute differences between the previous-image and current-image pixel values is computed over an a×b rectangle centered on the candidate point. The a×b rectangle has a size corresponding to the expected size of an eye. If a blink occurred between the previous and current images, one image shows the eye and the other shows the eyelid, so the absolute difference in pixel values should be large and the extent of the change should be close to the expected eye size. S is therefore computed for each candidate point, and the two points with the largest values are taken as the eye candidates (the candidate pair).
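Step 160 can be sketched as below, assuming candidate points given as integer (x, y) pixel coordinates and a rectangle that is a pixels wide and b pixels high. Pixels of the rectangle that fall outside the image are simply skipped, which is an assumption; the function names are hypothetical.

```python
def blink_score(prev, curr, cx, cy, a, b):
    """Sum S of |prev - curr| over an a-wide, b-high box centred on (cx, cy)."""
    h, w = len(prev), len(prev[0])
    total = 0
    for y in range(cy - b // 2, cy - b // 2 + b):
        for x in range(cx - a // 2, cx - a // 2 + a):
            if 0 <= y < h and 0 <= x < w:
                total += abs(prev[y][x] - curr[y][x])
    return total


def best_pair(prev, curr, points, a, b):
    """Return the two candidate points with the largest S."""
    ranked = sorted(points,
                    key=lambda p: blink_score(prev, curr, p[0], p[1], a, b),
                    reverse=True)
    return ranked[:2]
```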

[0062] In step 162, it is determined whether the two candidate points obtained above satisfy the conditions listed below. Unless all of the conditions are satisfied, the eyes are deemed not detected (step 168) and processing ends.

[0063] 1) The sum S of absolute differences obtained in step 160 is at or above a predetermined threshold F for both candidate points.

[0064] 2) The distance between the candidate points is at or above a predetermined value Lmin.
3) The distance between the candidate points is at or below a predetermined value Lmax.

[0065] 4) The angle between the straight line connecting the candidate points and the scan line direction is at or below a predetermined threshold of A degrees.

[0066] Threshold F is determined in advance, experimentally or statistically, and prevents parts other than the eyes from being extracted as eyes. Lmin and Lmax are the minimum and maximum values expected for the distance between the centers of the eyes; they are determined statistically in advance. An appropriate value for the threshold angle A is likewise determined experimentally; a value of about 30 to 45 degrees is normally used.
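Conditions 1) to 4) can be checked as in the sketch below. The parameter names are hypothetical, points are (x, y) tuples, and the angle is measured as the acute angle to the horizontal scan line, which is an interpretation of condition 4).

```python
import math


def is_plausible_eye_pair(p1, p2, s1, s2, f, l_min, l_max, a_deg):
    """Return True if the candidate pair passes conditions 1) to 4)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    # Acute angle between the connecting line and the scan line, 0 to 90 degrees.
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    return (s1 >= f and s2 >= f             # condition 1
            and l_min <= dist <= l_max      # conditions 2 and 3
            and angle <= a_deg)             # condition 4
```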

[0067] If all four conditions above are satisfied, step 164 further determines whether the grayscale pattern near the midpoint of the two eye candidate points is roughly left-right symmetric. If it is, the two candidate points are taken as the eye positions. If it is not, the eyes are deemed not found and processing ends.

[0068] The method for deciding whether the pattern near the midpoint of the two eye candidate points is roughly symmetric is as follows. Referring to FIG. 11, in a region 220 centered on the midpoint of the eyes (hereinafter called the "inter-brow point"), the grayscale pattern to the left and right of the inter-brow point should in general be roughly symmetric about the straight line that is perpendicular to the segment connecting the two eye candidate points and passes through the inter-brow point (the midpoint of the two candidates). First, the current image is rotated about the midpoint of the candidate points so that the two candidates line up horizontally along a scan line, and a region of g×h pixels centered on the midpoint (region 220) is cut out. As shown in the lower part of FIG. 11, the pixel values of the h pixels in each column of region 220 are summed, and each column sum is expressed as a percentage of the sum over all pixels, giving a projection profile like graph 222 in FIG. 11. In this profile, the totals of three adjacent columns at mirror-image positions about the midpoint are computed on the left and on the right. The sum of the differences between the left and right totals is computed over all sets of three adjacent columns, and if the values are all at or below a threshold p, the pattern near the midpoint of the two eye candidate points is judged to be roughly symmetric.

[0069] Specifically, referring to the lower part of FIG. 11, the total of the values of the three leftmost columns 224L and the total of the three rightmost columns 224R are computed, and their difference p_1 is obtained. Likewise, the total of the three columns 226L (second to fourth from the left) and the total of the three columns 226R (second to fourth from the right) are computed, and their difference p_2 is obtained. The total of the three columns 228L (third to fifth from the left) and the total of the three columns 228R (third to fifth from the right) are computed, and their difference p_3 is obtained. The values p_4 to p_{g-2} are computed in the same way, the sum of p_1 to p_{g-2} is taken, and that sum is compared with threshold p to decide whether the grayscale pattern near the midpoint of the two eye candidate points is roughly symmetric.
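The projection-profile test can be sketched as follows for a patch that has already been rotated and cut out (region 220). Taking the absolute value of each difference p_i is an assumption, since the source only says "difference"; the function name is hypothetical.

```python
def symmetry_score(region):
    """Sum of |left - right| over mirrored 3-column windows of the profile.

    region: h rows of g grayscale values, eyes already aligned horizontally.
    """
    g = len(region[0])
    col_sums = [sum(row[x] for row in region) for x in range(g)]
    total = sum(col_sums)
    if total == 0:
        return 0.0  # a completely black patch is trivially symmetric
    profile = [100.0 * c / total for c in col_sums]  # percent per column
    score = 0.0
    for i in range(g - 2):  # windows p_1 ... p_{g-2}
        left = sum(profile[i:i + 3])           # columns i .. i+2 from the left
        right = sum(profile[g - 3 - i:g - i])  # mirrored window from the right
        score += abs(left - right)
    return score
```

The patch would then be accepted as symmetric when `symmetry_score(region)` is at or below the threshold p.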

[0070] In the description above, the two eye candidate points are first rotated about their midpoint so that they line up along a horizontal scan line, and the symmetry of the grayscale pattern is then tested. The symmetry test is not limited to this method. For example, if the angle between the segment connecting the two eye candidate points and the horizontal scan line is known, the pixel values can be summed along the direction perpendicular to that segment, which achieves the same processing as shown in FIG. 11. Also, although totals are taken per column in the example above, the differences may instead be computed pixel by pixel between positions that are mirror images about the midpoint and then summed, with the pattern judged symmetric when the sum is smaller than a threshold.

[0071] Returning to step 132 of FIG. 8, the computation of the movement amount d is described in detail with reference to FIG. 10. Let the movement amount (a two-dimensional vector) be d = (x, y), and search the range x = −m, −m+1, …, m and y = −n, −n+1, …, n for the shift that gives the largest number of pixels belonging to region M as shown in the lower row of FIG. 5. First, the variable max is initialized to 0 (step 180). During the computation described below, this variable holds the largest pixel count of region M found so far among the shifts examined.

[0072] −m is assigned to variable x (step 182) and −n to variable y (step 184). What follows is repeated for each value of y.

[0073] First, the number M(d) of pixels belonging to region M for d = (x, y) is computed (step 186). It is then determined whether M(d) is greater than the variable max (step 188). If M(d) is less than or equal to max, nothing is done and control proceeds to step 192. If M(d) is greater than max, step 190 assigns M(d) to max and the current values of x and y to X and Y, respectively. Control then proceeds to step 192.

[0074] In step 192, 1 is added to y, and it is determined whether y now exceeds n (step 194). If y does not exceed n, control returns to step 186 and steps 186 to 194 are repeated for the new y. If y exceeds n, control proceeds to step 196.

[0075] In step 196, 1 is added to x. It is then determined whether x now exceeds m (step 198). If x does not exceed m, control returns to step 184 and steps 184 to 198 are repeated for the new x. If x exceeds m, control proceeds to step 200, where the value of max and the values X, Y of x and y at which it was obtained are output, and processing ends.

[0076] In this way, the movement amount d that maximizes the number of pixels in region M is obtained. Various other ways of computing d are conceivable. For example, the pixel counts of region M may be computed exhaustively in advance while varying x and y over the m×n range and stored in a table, and the cell holding the maximum value may then be looked up.
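The exhaustive search of FIG. 10 can be sketched as below. How M(d) is counted is an assumption here: a pixel p of the current image counts as matched when both p and p − d lie in the difference mask and the two pixel values differ by less than a tolerance `tol`. The source does not define the match criterion this precisely, and the function names are hypothetical.

```python
def matched_count(prev, curr, mask, dx, dy, tol):
    """M(d): masked pixels with curr[p] close to prev[p - d], for d = (dx, dy)."""
    h, w = len(mask), len(mask[0])
    count = 0
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # where this pixel came from in prev
            if (mask[y][x] and 0 <= sx < w and 0 <= sy < h and mask[sy][sx]
                    and abs(curr[y][x] - prev[sy][sx]) < tol):
                count += 1
    return count


def find_shift(prev, curr, mask, m, n, tol):
    """Return (max, X, Y) over x in [-m, m] and y in [-n, n]."""
    best, best_x, best_y = 0, 0, 0                 # step 180
    for x in range(-m, m + 1):                     # steps 182, 196, 198
        for y in range(-n, n + 1):                 # steps 184, 192, 194
            c = matched_count(prev, curr, mask, x, y, tol)  # step 186
            if c > best:                           # step 188
                best, best_x, best_y = c, x, y     # step 190
    return best, best_x, best_y                    # step 200
```

For a textured 3×3 patch moved two pixels to the right between frames, the search would be expected to return a shift of (2, 0).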

[0077] To track the eyes detected as described above from the next frame onward reliably and robustly, the following tracking method is implemented in software. In this embodiment the eyes are not tracked directly; instead, the pattern between the eyes (the inter-brow pattern) is tracked. The eyes are then searched for near the points whose positions relative to the tracked inter-brow point are the same as the eye positions in the previous frame.

[0078] The inter-brow pattern is relatively stable even when the facial expression changes. Moreover, the eyes and eyebrows cut in like wedges from both sides into the bright areas of the forehead and the bridge of the nose, forming a grayscale pattern that makes it easy to locate the position accurately by pattern matching.

[0079] As described below, the template must be updated when pattern matching is performed with an inter-brow template. In this embodiment, a rectangular pattern whose center is the midpoint of the two eyes is used as the template, and the template is updated with the rectangular pattern detected in the current frame.

[0080] The tracking process is described in detail below with reference to FIG. 12. It is assumed that the midpoint of the eyes extracted in the previous frame, the inter-brow pattern (s×t pixels), and the positions e_r and e_l of the right and left eyes relative to the midpoint (e_r = −e_l, both two-dimensional) have already been saved.

[0081] First, in step 240, the saved midpoint position, inter-brow pattern, and relative eye positions are loaded as initial values. The current image (current frame) is then captured (step 242).

[0082] In step 244, the midpoint of the eyes in the current image is predicted from the midpoints in the previous image and in the image before it. That is, if the inter-brow positions in the previous frame and the frame before it are X_{t-1} and X_{t-2} (both two-dimensional), the predicted position X_t in the current frame can be extrapolated as X_t = 2X_{t-1} − X_{t-2}. At the first detection, X_{-1} = X_0 is used. The midpoint is predicted in this way because, when the face is moving, its movement per frame can be regarded as roughly constant, so predicting the new midpoint before matching the inter-brow pattern makes the search efficient.
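The extrapolation of step 244 is a one-line constant-velocity prediction. In the sketch below, positions are (x, y) tuples and the function name is hypothetical; omitting the second argument corresponds to the first-detection case X_{-1} = X_0.

```python
def predict_midpoint(x_prev, x_prev2=None):
    """X_t = 2 * X_{t-1} - X_{t-2}; with no history, predict no movement."""
    if x_prev2 is None:
        x_prev2 = x_prev  # first detection: X_{-1} = X_0
    return (2 * x_prev[0] - x_prev2[0], 2 * x_prev[1] - x_prev2[1])
```

For example, a midpoint that moved from (10, 9) to (12, 8) is predicted to be at (14, 7) in the next frame.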

[0083] In step 246, a matching process searches the neighborhood of the midpoint predicted in step 244 for the inter-brow pattern that best matches the saved inter-brow pattern. Let X_t0 be the position of the best match. In step 248, the right and left eye positions are then searched for in neighborhoods (i×j pixels) centered on X_t0 + e_r and X_t0 + e_l, respectively. In the search, the point whose surrounding 5×5 pixels have the darkest average value is judged to be the eye.
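Steps 246 and 248 can be sketched as two small searches. The source does not name the similarity measure used for matching; the sum of absolute differences (SAD) below is a swapped-in assumption, as are the search radius parameter and the skipping of positions whose window would leave the image. The function names are hypothetical.

```python
def match_template(image, template, cx, cy, radius):
    """Centre (x, y) within radius of (cx, cy) with the smallest SAD."""
    t_h, t_w = len(template), len(template[0])
    h, w = len(image), len(image[0])
    best, best_pos = None, None
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            top, left = y - t_h // 2, x - t_w // 2
            if top < 0 or left < 0 or top + t_h > h or left + t_w > w:
                continue  # template would leave the image
            sad = sum(abs(image[top + j][left + i] - template[j][i])
                      for j in range(t_h) for i in range(t_w))
            if best is None or sad < best:
                best, best_pos = sad, (x, y)
    return best_pos


def darkest_point(image, cx, cy, i, j, win=5):
    """Point in the i-wide, j-high area around (cx, cy) with the darkest win x win mean."""
    h, w = len(image), len(image[0])
    r = win // 2
    best, best_pos = None, None
    for y in range(cy - j // 2, cy - j // 2 + j):
        for x in range(cx - i // 2, cx - i // 2 + i):
            if y - r < 0 or x - r < 0 or y + r >= h or x + r >= w:
                continue  # averaging window would leave the image
            mean = sum(image[yy][xx] for yy in range(y - r, y + r + 1)
                       for xx in range(x - r, x + r + 1)) / (win * win)
            if best is None or mean < best:
                best, best_pos = mean, (x, y)
    return best_pos
```

In use, `match_template` would be called around the predicted midpoint to obtain X_t0, and `darkest_point` around X_t0 + e_r and X_t0 + e_l.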

[0084] Next, it is determined whether the eye positions found by the search satisfy conditions 2), 3), and 4) used in step 162 of FIG. 9. If any one of the conditions is not satisfied, the eyes are taken to have been mistracked or lost, tracking is judged to have failed (step 252), and processing ends. In this case, control returns from step 106 of FIG. 7 to step 100, and processing is re-executed from the detection of the eye positions.

[0085] If it is determined that all the conditions are satisfied, tracking is judged to have succeeded. The midpoint position of both eyes obtained by tracking, the pattern between the eyebrows (s × t pixels) centered on that point, and the relative positions $e_r$ and $e_l$ of both eyes with respect to the midpoint are saved (updated), and the tracking process ends.
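The update on success can be written out as a short worked example. The sketch assumes `(row, col)` coordinates, an image stored as nested lists, and that s is the width and t the height of the saved pattern; the specification only gives the size as s × t pixels, and the function name is hypothetical. Image borders are not handled.

```python
def update_state(image, right_eye, left_eye, s, t):
    """Recompute midpoint, relative eye positions, and the s x t pattern.

    Assumes the pattern lies fully inside the image (no border handling).
    """
    mid = ((right_eye[0] + left_eye[0]) / 2, (right_eye[1] + left_eye[1]) / 2)
    e_r = (right_eye[0] - mid[0], right_eye[1] - mid[1])
    e_l = (left_eye[0] - mid[0], left_eye[1] - mid[1])
    # Crop t rows and s columns centered on the (rounded) midpoint.
    top = int(round(mid[0])) - t // 2
    left = int(round(mid[1])) - s // 2
    pattern = [row[left:left + s] for row in image[top:top + t]]
    return mid, e_r, e_l, pattern


image = [[r * 100 + c for c in range(100)] for r in range(100)]
mid, e_r, e_l, pattern = update_state(image, (50, 40), (52, 80), s=15, t=9)
print(mid, e_r, e_l, len(pattern), len(pattern[0]))
```

By construction $e_r = -e_l$, since both are measured from the midpoint of the two eyes.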

[0086] Once the positions of both eyes have been detected in the image from camera 56 in this manner, the distance to the eyes can be obtained by stereo processing together with the image from camera 58, and the parameters (distance, direction) for controlling the eye camera 54 can be determined by calculation. Well-known techniques can be applied to this processing, so a detailed description is not given here.

[0087] [Operation] The eye position detecting device according to the present invention, whose structure has been described above, operates as follows. The stereo cameras 56 and 58 shown in FIG. 1 each capture a face image of the user and output a video signal, which is supplied to the computer 40.

[0088] When the software for eye position detection and tracking is started, the eye positions are extracted from each of the video images obtained from cameras 56 and 58, as shown in FIG. 8. Specifically, the current image is captured (step 120), the difference image D is calculated (step 122), and the determinations of steps 124 and 126 are made. If the determination result in step 124 or 126 is "YES", the eye positions are regarded as undetectable, and the eye position extraction process is executed again on the next frame image.

[0089] If the determination results in steps 124 and 126 are both "NO", the pixels at both ends of each region of pixel value "1" are deleted in step 128. Then, after the movement amount d is calculated according to the method already described (step 132), the pixels corresponding to the region M, which is obtained by performing the shift and superposition shown in FIG. 5 using this movement amount d, are removed from the difference image D (step 150).

[0090] Subsequently, isolated-point removal and smoothing (step 152) and labeling (step 154) are performed, and the center of each resulting element is taken as an eye position candidate (step 156). If the number of candidates is less than two, extraction is judged to have failed and processing restarts from the beginning. If there are two or more candidates, the two points with the largest sum of absolute inter-image differences in their neighborhoods are selected as the candidate points (step 160). If these candidate points satisfy all four of the conditions described above (step 162) and the pattern near their midpoint is judged to be left-right symmetric (step 164), the two candidate points are taken as the eye positions and the eye position extraction process ends. If the test in step 162 or step 164 fails, the eye positions are regarded as not extractable and processing restarts from the beginning.
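The candidate selection of step 160 can be sketched as follows. The neighborhood half-width `half`, the nested-list representation of the absolute difference image, and the function name are assumptions made for illustration; the specification does not state the neighborhood size in this passage.

```python
def pick_two_candidates(abs_diff, candidates, half=3):
    """Pick the two candidates with the largest local sum of absolute differences.

    abs_diff   : 2D nested list of absolute inter-frame differences
    candidates : list of (row, col) centers produced by labeling
    Returns None when fewer than two candidates exist (extraction failed).
    """
    if len(candidates) < 2:
        return None
    h, w = len(abs_diff), len(abs_diff[0])

    def local_sum(point):
        r, c = point
        rows = range(max(r - half, 0), min(r + half, h - 1) + 1)
        cols = range(max(c - half, 0), min(c + half, w - 1) + 1)
        return sum(abs_diff[rr][cc] for rr in rows for cc in cols)

    ranked = sorted(candidates, key=local_sum, reverse=True)
    return ranked[0], ranked[1]
```

The returned pair would then go through the condition test of step 162 and the symmetry test of step 164 before being accepted as the eye positions.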

[0091] When the eye directions have thus been determined for both stereo cameras 56 and 58, that information is used to obtain the information needed to control the eye camera 54 (the direction of the eyes as seen from the eye camera 54 and the distance to the eyes). This information is transmitted to the computer 82 (step 102 in FIG. 7), and control proceeds to the eye position tracking process (step 104).

[0092] At the start of the tracking process, the midpoint position of both eyes obtained by the eye position extraction process (that is, the position between the eyebrows), the image pattern between the eyebrows, and the relative positions of both eyes as seen from the midpoint have already been saved. First, this saved information is read in as initial values (step 240). Next, the current image is captured (step 242), and the midpoint position of the eyes in the current image is predicted from the midpoint position (position between the eyebrows) in the previous image and the midpoint position in the image before that (step 244). This prediction is made, for example, by extrapolating from the midpoint positions in those two images. In the vicinity of the predicted midpoint position, a search is made for the center of the pattern that best matches the saved pattern between the eyebrows from the previous image, and the positions of both eyes are then searched for relative to the position showing the best match (step 248). If the eye positions obtained satisfy the predetermined conditions (YES in step 250), the midpoint of both eyes is calculated from those positions; that value, the pattern between the eyebrows centered on the midpoint, and the relative positions from the midpoint to both eyes are saved; and tracking ends for the time being. In this case, however, the next tracking process is carried out via the path from step 106 to step 102 in FIG. 7.
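The alternation between extraction and tracking described for FIG. 7 can be summarized as a small control loop. The sketch below is an assumption-laden outline: `extract`, `track`, and `send` are hypothetical callables standing in for steps 100, 104, and 102, each returning or consuming an opaque state object, and a failed frame is simply followed by extraction on the next frame.

```python
def run_pipeline(frames, extract, track, send):
    """Alternate between extraction (step 100) and tracking (step 104).

    extract(frame)      -> state, or None if the eyes could not be extracted
    track(frame, state) -> updated state, or None if tracking failed
    send(state)         -> forwards the result (step 102)
    """
    state = None
    history = []
    for frame in frames:
        if state is None:
            state = extract(frame)        # no valid state: (re)start extraction
        else:
            state = track(frame, state)   # valid state: keep tracking
        if state is not None:
            send(state)                   # success: pass the result on
        history.append(state)
    return history


# Toy stand-ins: extraction works on even frames, tracking fails on frame 4.
sent = []
history = run_pipeline(
    [1, 2, 3, 4, 5],
    extract=lambda f: f if f % 2 == 0 else None,
    track=lambda f, s: None if f == 4 else f,
    send=sent.append,
)
print(history, sent)
```

Keeping the state as `None` after any failure is what routes the next frame back to extraction, mirroring the path from step 106 to step 100.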

[0093] If tracking fails, processing follows the path from step 106 to step 100 in FIG. 7 and is executed again starting from the eye position extraction process.

[0094] In this way, the eye positions (directions) are extracted and tracked using the stereo cameras 56 and 58, and those values are used to calculate the parameters (eye direction, distance) for controlling the eye camera 54, so that the eye camera 54 can be controlled.

[0095] FIGS. 13 to 15 show screen displays for a specific example of the eye position extraction and tracking processing according to the present embodiment.

[0096] FIG. 13 shows an example of an input image. FIG. 14 shows the difference image between this image and the previous image. The pattern 270 cut out for judging left-right symmetry is displayed at the upper left of the screen, and the pattern between the eyebrows 272 of the input image is displayed at the lower left. In the difference image, two locations 274 and 276 are extracted as eye position candidates and displayed as white circles.

[0097] FIG. 15 shows an example in which the positions of both eyes 292 and 294 and the center position 290 are superimposed on the image. As shown in FIG. 15, the positions of both eyes are tracked correctly, centered on the position between the eyebrows.

[0098] The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated not by the above description but by the claims, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[Brief description of drawings]

FIG. 1 is an external view of a computer system 20 that implements the eye position tracking method and apparatus according to the present invention.

FIG. 2 is a block diagram of the computer system 20 shown in FIG. 1 and its peripheral devices.

FIG. 3 is a diagram illustrating the principle of eye position extraction in an embodiment of the present invention.

FIG. 4 is a diagram showing an example of a difference image.

FIG. 5 is a diagram illustrating the principle of eye position extraction in an embodiment of the present invention.

FIG. 6 is a diagram showing the principle of canceling the user's head-turning movement in an embodiment of the present invention.

FIG. 7 is a flowchart showing the overall control structure of the software that implements the eye position extraction method and apparatus in an embodiment of the present invention.

FIG. 8 is a flowchart of the software that implements the eye position extraction process in an embodiment of the present invention.

FIG. 9 is a flowchart of the software that implements the eye position extraction process in an embodiment of the present invention.

FIG. 10 is a flowchart of the software that implements the calculation of the movement amount d within the eye position extraction process in an embodiment of the present invention.

FIG. 11 is a diagram for explaining the process of judging left-right symmetry in the vicinity of the position between the eyebrows in an embodiment of the present invention.

FIG. 12 is a flowchart of the software that implements the eye tracking process in an embodiment of the present invention.

FIG. 13 is a diagram showing a display example of the current image in an embodiment of the present invention.

FIG. 14 is a diagram showing a display example of a difference image in an embodiment of the present invention.

FIG. 15 is a diagram showing a display example of eye position tracking in an embodiment of the present invention.

[Explanation of symbols]

20: computer system; 40, 82: computers; 42: monitor; 46: keyboard; 48: mouse; 54: eye camera; 56, 58: stereo cameras.

Continuation of front page. F-terms (reference): 5B057 AA20 CA12 CA16 CH08 CH11 CH12 DA07 DA08 DB02 DC09 DC34; 5L096 CA02 FA69 HA04 HA07 JA03 JA09

Claims

[Claims]

1. A method for tracking the positions of a subject's eyes in a series of images output by imaging means, in a computer to which the imaging means for capturing video images of the subject's face is connected, wherein a position between the eyebrows in a face image output by the imaging means, relative positions of both eyes with respect to the position between the eyebrows, and an image pattern between the eyebrows defined by the position between the eyebrows are given in advance, the method comprising: a first step of acquiring an image subsequent to said image; a step of predicting the position between the eyebrows of the face image in the subsequent image; a step of searching, in the vicinity of the predicted position between the eyebrows, for a point that is the center of the region that best matches the image pattern between the eyebrows; a step of predicting the positions of both eyes in the subsequent image based on the position of the searched point and the relative positions of both eyes, and searching, around the predicted regions, for points that are the centers of two regions each satisfying a predetermined condition; a step of updating the position between the eyebrows, the relative positions of both eyes, and the image pattern between the eyebrows, taking the midpoint of the two searched center points as the new position between the eyebrows, the positions of the two center points relative to the new position between the eyebrows as the new relative positions of both eyes with respect to the position between the eyebrows, and the image pattern of the region defined by the new position between the eyebrows as the new image pattern between the eyebrows; and a step of repeating the processing from the first step for an image further subsequent to the subsequent image.

2. The method according to claim 1, wherein the step of predicting the position between the eyebrows includes a step of extrapolating the position between the eyebrows in the subsequent image from the position between the eyebrows in the image output by the imaging means and the position between the eyebrows in the image preceding that image.

3. The method according to claim 1 or claim 2, further comprising a step of, in response to the points not being found in the step of searching for the points that are the centers of the two regions, extracting by a predetermined method the position between the eyebrows of the face image output by the imaging means, the relative positions of both eyes with respect to the position between the eyebrows, and the image pattern between the eyebrows defined by the position between the eyebrows, and restarting the processing from the first step.

4. The method according to any one of claims 1 to 3, wherein the step of searching for the points that are the centers of the two regions includes a step of searching, in the vicinity of each of the positions determined by the relative positions of both eyes with respect to the position of the searched point, for the center of a region of predetermined shape in which the average pixel value is darkest, and taking those centers as candidate points for both eyes.

5. The method according to claim 4, wherein the step of searching for the points that are the centers of the two regions further includes a step of determining whether the candidate points found in the step of searching for the center of the darkest region satisfy all of the following conditions: 1) the distance between the candidate points is greater than or equal to a predetermined minimum value; 2) the distance between the candidate points is less than or equal to a predetermined maximum value; and 3) the angle between the straight line connecting the candidate points and the scanning line direction satisfies a predetermined relationship; and treating the search as failed if any of the conditions is not satisfied.

6. An apparatus for tracking the positions of a subject's eyes in a series of images output by imaging means, in a computer to which the imaging means for capturing video images of the subject's face is connected, wherein a position between the eyebrows in a face image output by the imaging means, relative positions of both eyes with respect to the position between the eyebrows, and an image pattern between the eyebrows defined by the position between the eyebrows are given in advance, the apparatus comprising: image acquisition means for acquiring an image subsequent to said image; prediction means for predicting the position between the eyebrows of the face image in the subsequent image; first search means for searching, in the vicinity of the predicted position between the eyebrows, for a point that is the center of the region that best matches the image pattern between the eyebrows; second search means for predicting the positions of both eyes in the subsequent image based on the position of the searched point and the relative positions of both eyes, and searching, around the predicted regions, for points that are the centers of two regions each satisfying a predetermined condition; update means for updating the position between the eyebrows, the relative positions of both eyes, and the image pattern between the eyebrows, taking the midpoint of the two searched center points as the new position between the eyebrows, the positions of the two center points relative to the new position between the eyebrows as the new relative positions of both eyes with respect to the position between the eyebrows, and the image pattern of the region defined by the new position between the eyebrows as the new image pattern between the eyebrows; and means for controlling the image acquisition means, the prediction means, the first and second search means, and the update means so as to repeat the processing from the image acquisition means for an image further subsequent to the subsequent image.

7. The apparatus according to claim 6, wherein the prediction means includes means for extrapolating the position between the eyebrows in the subsequent image from the position between the eyebrows in the image output by the imaging means and the position between the eyebrows in the image preceding that image.

8. The apparatus according to claim 6 or claim 7, further comprising means for, in response to the second search means being unable to find the points, extracting by a predetermined method the position between the eyebrows of the face image output by the imaging means, the relative positions of both eyes with respect to the position between the eyebrows, and the image pattern between the eyebrows defined by the position between the eyebrows, and controlling the image acquisition means, the prediction means, the first and second search means, and the update means so as to restart the processing from the image acquisition means.

9. The apparatus according to claim 6, wherein the second search means includes means for searching, in the vicinity of each of the positions determined by the relative positions of both eyes with respect to the position of the searched point, for the center of a region of predetermined shape in which the average pixel value is darkest, and taking those centers as candidate points for both eyes.

10. The apparatus according to claim 9, further including means for determining whether the candidate points found by the means for searching for the center of the darkest region satisfy all of the following conditions: 1) the distance between the candidate points is greater than or equal to a predetermined minimum value; 2) the distance between the candidate points is less than or equal to a predetermined maximum value; and 3) the angle between the straight line connecting the candidate points and the scanning line direction satisfies a predetermined relationship; and for treating the search as failed if any of the conditions is not satisfied.

11. A computer-executable program for controlling a computer, to which imaging means for capturing video images of a subject's face is connected, so as to execute a method for tracking the positions of the subject's eyes in a series of images output by the imaging means, wherein a position between the eyebrows in a face image output by the imaging means, relative positions of both eyes with respect to the position between the eyebrows, and an image pattern between the eyebrows defined by the position between the eyebrows are given in advance, the method comprising: a first step of acquiring an image subsequent to said image; a step of predicting the position between the eyebrows of the face image in the subsequent image; a step of searching, in the vicinity of the predicted position between the eyebrows, for a point that is the center of the region that best matches the image pattern between the eyebrows; a step of predicting the positions of both eyes in the subsequent image based on the position of the searched point and the relative positions of both eyes, and searching, around the predicted regions, for points that are the centers of two regions each satisfying a predetermined condition; a step of updating the position between the eyebrows, the relative positions of both eyes, and the image pattern between the eyebrows, taking the midpoint of the two searched center points as the new position between the eyebrows, the positions of the two center points relative to the new position between the eyebrows as the new relative positions of both eyes with respect to the position between the eyebrows, and the image pattern of the region defined by the new position between the eyebrows as the new image pattern between the eyebrows; and a step of repeating the processing from the first step for an image further subsequent to the subsequent image.

12. The program according to claim 11, wherein the step of predicting the position between the eyebrows includes a step of extrapolating the position between the eyebrows in the subsequent image from the position between the eyebrows in the image output by the imaging means and the position between the eyebrows in the image preceding that image.

13. The program according to claim 11 or claim 12, wherein the method further comprises a step of, in response to the points not being found in the step of searching for the points that are the centers of the two regions, extracting by a predetermined method the position between the eyebrows of the face image output by the imaging means, the relative positions of both eyes with respect to the position between the eyebrows, and the image pattern between the eyebrows defined by the position between the eyebrows, and restarting the processing from the first step.

14. The program according to any one of claims 11 to 13, wherein the step of searching for the points that are the centers of the two regions includes a step of searching, in the vicinity of each of the positions determined by the relative positions of both eyes with respect to the position of the searched point, for the center of a region of predetermined shape in which the average pixel value is darkest, and taking those centers as candidate points for both eyes.

15. The program according to claim 14, wherein the step of searching for the points that are the centers of the two regions further includes a step of determining whether the candidate points found in the step of searching for the center of the darkest region satisfy all of the following conditions: 1) the distance between the candidate points is greater than or equal to a predetermined minimum value; 2) the distance between the candidate points is less than or equal to a predetermined maximum value; and 3) the angle between the straight line connecting the candidate points and the scanning line direction satisfies a predetermined relationship; and treating the search as failed if any of the conditions is not satisfied.