JP3823760B2

JP3823760B2 - Robot equipment

Info

Publication number: JP3823760B2
Application number: JP2001158402A
Authority: JP
Inventors: 晃井上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-05-28
Filing date: 2001-05-28
Publication date: 2006-09-20
Anticipated expiration: 2019-01-29
Also published as: JP2002056388A

Description

【０００１】
【発明の属する技術分野】
本発明は、映像中の人物を識別する技術に関し、特に、正面顔による人物識別技術、並びに人物を識別するロボット装置に関する。
【０００２】
【従来の技術】
顔の映像を用いて人物を識別する方式は、従来よりいくつか提案されている。最近の顔検出、識別技術の動向は、例えば文献（１）（赤松茂、“コンピュータによる顔の認識−サーベイ−”、電子情報通信学会論文誌、Vol.J80-D-II,No.8,pp.2031-2046,August 1997）に記載されている。一般に、顔識別システムは、画像中から顔を検出する処理と、顔パターンからの特徴抽出処理と、特徴量を辞書データと比較する人物識別処理と、を備えて構成されている。
【０００３】
顔画像の検出方式としては、文献（２）（小杉信、“個人識別のための多重ピラミッドを用いたシーン中の顔の探索・位置決め”、電子情報通信学会論文誌、Vol.J77-D-II,No.4,pp.672-681,April 1994）に記載されているように濃淡パターンを用いたテンプレートマッチングを行うものや、文献（３）（M.Turk, A.Pentland, “Face Recognition on Using Eigenfaces”, Proceedings of IEEE,CVPR91）に記載されているような、顔画像の固有ベクトル投影距離方式が知られている。
【０００４】
また、例えば特開平９−２５１５３４号公報には、目、鼻、口といった造作を検出し、その位置関係から正面顔濃淡パターンを切り出す方法も提案されている。
【０００５】
顔検出の代表的な例として、M.Turkらによる固有ベクトル投影距離方式について説明する。
【０００６】
あらかじめ多くの正面顔データ（数百枚）を用意する。それらの画素値を特徴ベクトルとして、固有値と固有ベクトルを求める。固有値の大きい順にｐ個の固有ベクトルＶn(n=1,..p)を求める。
【０００７】
テスト画像ｔを固有ベクトルＶnに投影すると、ｐ個の投影値が得られる。これらの投影値と、固有ベクトルＶn から、テスト画像を再構成することにより、再構成テスト画像ｔ’が得られる。
【０００８】
もしもｔが顔パターンに近ければ、再構成テスト画像ｔ’も顔パターンに近い画像が得られる。そこで次式（１）で与えられる距離尺度Ｄｔによって、顔であるかどうかを判断する。
【０００９】

【００１０】
顔識別の特徴量としては、目、鼻、口といった顔造作の幾何学的特徴を用いるものと、大局的な濃淡パターンの照合によるものとの２種類があるが、シーン中の顔パターンは、顔の向きや表情が変化すると造作の位置関係も変化するため、近時、後者の大局的な濃淡パターンを用いる方法が現在主流となっている。
【００１１】
顔画像の識別、照合方法としては例えば、上記文献（２）（小杉信、“個人識別のための多重ピラミッドを用いたシーン中の顔の探索・位置決め”、電子情報通信学会論文誌、Vol.J77-D-II,No.4,pp.672-681,April 1994）では、濃淡パターンを特徴ベクトルと考え、特徴ベクトル間の内積が大きいカテゴリを識別結果としている。また、上記文献（３）（M.Turk, A.Pentland, “Face Recognition on Using Eigenfaces”, Proceedings of IEEE,CVPR91）では、顔画像の固有ベクトルへの投影値を特徴ベクトルとし、それらのユークリッド距離の小さいカテゴリを識別結果としている。
【００１２】
また従来、画像認識機能を持ったロボット装置としては、例えば特願平１０−１５１５９１号に記載された装置がある。このロボット装置は、画像中から色情報を抽出し、色パターンに応じて動作を変化させることができる。しかしながら、人物を認識する機能手段は具備していない。
【００１３】
【発明が解決しようとする課題】
上記した従来のシステムは下記記載の問題点を有している。
【００１４】
第一の問題点は、家庭環境のように照明条件が一定でない環境では人物識別が出来ないということである。
【００１５】
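This is because it is difficult to detect faces in general environments. With template matching, for example, detection is difficult unless the face pattern in the image and the dictionary pattern have almost the same gray levels; if the lighting direction differs even slightly, or if the person differs from the person in the dictionary, detection is nearly impossible. The eigenvector projection distance method detects faces better than template matching, but it likewise fails when the lighting direction differs or when the image has a complex background.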
【００１６】
また、照明条件が一定でない環境で人物識別が出来ないもう一つの理由は、従来の特徴抽出方式と識別方式が、照明変動による特徴量の変動を吸収できないためである。
【００１７】
したがって、本発明は、上記問題点に鑑みてなされたものであって、その目的は、家庭環境のような一般環境において人物を識別できるロボット装置を提供することにある。
【００１８】
したがって、本発明の他の目的は、一般環境において安定して人物を識別できるロボット装置を提供することにある。
【００１９】
【課題を解決するための手段】
前記目的を達成する本発明に係るロボット装置は、人物識別装置として、画像を取得する映像取得手段と、画像中から人間の頭部を検出する頭部検出追跡手段と、検出された頭部の部分画像中から正面顔画像を取得する正面顔位置合わせ手段と、正面顔画像を特徴量に変換する顔特徴抽出手段と、識別辞書を用いて特徴量から人物を識別する顔識別手段と、識別辞書を保存する識別辞書記憶手段とを備えたことを特徴とする。そして頭部検出追跡手段において、１枚の画像から頭部を検出する単眼視頭部矩形座標検出手段と、対面距離値と頭部矩形座標値とから頭部の誤検出を取り除く対面距離評価手段とを備え、ロボットの動作を制御する全体制御部と、全体制御部の指示で音声を発話するスピーカと、全体制御部の指示でロボットを移動する移動手段と、前方の物体との距離を測定する対面距離センサと、タッチセンサと、マイクと、音声認識手段とを備えたことを特徴とする。
【００２０】
本発明において、前記全体制御部は、人物識別結果が得られたときに、人物毎に異なる音声で発話するよう制御する。
【００２１】
本発明において、前記全体制御部が、前記人物識別装置から前方物体との対面距離と方向を取得する手段と、人物識別結果を取得する手段と、前記対面距離がしきい値以上の場合には、前記前方物体に近づくように移動する手段と、前記対面距離がしきい値以下のときは、人物識別結果を人物毎に異なる音声で発話するように制御する手段と、を備える。
【００２２】
【発明の実施の形態】
次に、本発明の実施の形態について図面を参照して詳細に説明する。
【００２３】
図１は、本発明に係るロボット装置で用いられる人物識別装置の一実施の形態の構成を示す図である。図１を参照すると、本発明の一実施の形態をなす人物識別装置１４は、映像取得手段２と、対面距離センサ５と、人物検出識別手段１と、人物検出識別管理部１３と、辞書データ管理部１２と、を備えている。映像取得手段２は、右カメラ３と左カメラ４とを備え、それぞれのカメラ映像情報を取得する。対面距離センサ５は、カメラの光軸と同じ方向に設置されており、映像中の物体との対面距離を計測する。対面距離センサ５の例として、超音波センサ、赤外線センサなどがある。人物検出識別管理部１３は、人物検出識別手段１に対する動作開始命令と動作終了命令の送信と、辞書データ管理部１２への特徴データ送信、辞書作成命令の送信を行なう。
【００２４】
本発明の一実施の形態において用いられるカメラとしては、例えばビデオカメラ、デジタルＣＣＤカメラ等を含み、動きのある情景を、静止画像の連続として出力することのできる撮影デバイスを総称している。
【００２５】
人物検出識別手段１は、頭部検出追跡手段６と、正面顔位置合わせ手段７と、顔特徴抽出手段８と、顔識別手段９と、識別辞書記憶手段１０と、識別結果補正手段１１とを備えている。
【００２６】
人物検出識別手段１は、人物検出識別管理部１３から動作開始命令を受けると、辞書データ管理部１２から識別辞書記憶手段１０に識別用辞書をロードした後、動作を開始する。
【００２７】
図１８は、本発明の一実施の形態における人物検出識別手段１の処理を説明するための流れ図である。図１及び図１８を参照して、人物検出識別手段１の動作を説明する。
【００２８】
はじめに、頭部検出追跡手段６は、映像取得手段２からの画像情報と、対面距離センサ５の読み取り値をもとに、現在のフレームにおける人物の頭部の数と、頭部矩形座標を出力する(ステップＳ１)。
【００２９】
次に、検出した頭部数を評価する（ステップＳ２）。検出した頭部数が０の場合には、次のフレームの映像を入力して頭部検出を行い、検出数が１以上になるまで、ステップＳ１は継続される。
【００３０】
頭部検出数が１以上の時、正面顔位置合わせ手段７に検出結果が送信される。正面顔位置合わせ手段７では、顔領域の探索処理を行い（ステップＳ３）、正面顔領域が見つかったかどうかを判断する（ステップＳ４）。
【００３１】
正面顔が見つかると、顔中心部の矩形画像である正面顔画像を出力する。ステップＳ３とステップＳ４の処理は、頭部の誤検出を排除し、さらに人物がカメラの正面を向いている映像のみを抽出して後段の処理に送る事を目的としている。正面顔を発見できなかったときは、再びステップＳ１から処理を行う。
【００３２】
正面顔を発見した時は、次に顔特徴抽出手段８において、正面顔画像を特徴量データに変換する（ステップＳ５）。
【００３３】
顔特徴抽出手段８の一例は、図８に示すように、正面顔画像を左から右へ１ライン毎にスキャンし、上から下へ１ライン終われば次のラインをスキャンして１次元データを生成し（「ラスタスキャン」という）、それを特徴データとして用いるものである。その他、１次微分フィルタや２次微分フィルタを用いてフィルタリングし、エッジ情報を抽出したものを、ラスタスキャンして特徴データとする方法を用いてもよい。
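The raster scan described in this paragraph can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the function name `raster_scan` and the nested-list image representation are assumptions made for the example.

```python
def raster_scan(image):
    """Flatten a 2-D grayscale image (a list of rows) into a 1-D feature list.

    Pixels are read left to right within a row, and rows are read top to bottom.
    """
    return [pixel for row in image for pixel in row]


face = [
    [10, 20, 30],
    [40, 50, 60],
]
features = raster_scan(face)
print(features)  # [10, 20, 30, 40, 50, 60]
```

An edge-filtered image, as mentioned for the alternative feature, could be passed through the same function.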
【００３４】
次に、顔識別手段９において、識別辞書記憶手段１０の辞書データを参照して顔識別処理を行う（ステップＳ６）。
【００３５】
次に、識別結果補正手段１１において、過去ｍフレーム分（ｍは２以上の整数）の識別結果との統合処理を行い（ステップＳ７）、その結果を、人物検出識別管理部１３に出力する（ステップＳ８）。
【００３６】
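At this point, if the head detection and tracking means 6 (step S1) detected multiple head rectangles and not all of them have been processed (No branch of step S9), processing resumes from the frontal face alignment means 7 (step S3). The person detection and identification means 1 terminates when it receives a termination instruction from the person detection and identification management unit 13 (step S10). Until such an instruction arrives, processing continues again from step S1.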
【００３７】
図２は、図１の頭部検出追跡手段６の一実施例をなす頭部検出追跡手段２７の構成を示す図である。図２を参照すると、頭部検出追跡手段２７は、頭部検出手段２１と、頭部追跡手段２２と、頭部矩形座標記憶手段２３とを備えている。頭部検出手段２１は、頭部矩形座標検出手段２４と、左右画像照合手段２５と、対面距離統合手段３１と、対面距離評価手段２６とを備えている。
【００３８】
図１９は、頭部検出追跡手段２７の処理を説明するための流れ図である。図２及び図１９を参照して、本発明の一実施例をなす頭部検出追跡手段２７の動作を説明する。
【００３９】
右カメラの映像と左カメラの映像と、対面距離センサの読み取り値が頭部検出手段２１に入力される。頭部検出手段２１は、入力された情報から人物頭部の検出処理を行い、頭部矩形座標と頭部検出数を出力する（ステップＳ１０）。
【００４０】
頭部検出数が１以上の場合には、頭部検出数と頭部矩形座標を、頭部矩形座標記憶手段２３に保存した後に、出力する（ステップＳ１８）。
【００４１】
頭部検出数が０の場合には、頭部追跡手段２２において、前のフレームにおける頭部矩形情報を、頭部矩形記憶手段２３から取り出し、頭部追跡処理を行う（ステップＳ１９）。
【００４２】
頭部追跡に成功した場合には、追跡に成功した頭部数と頭部矩形座標を出力し、追跡に失敗した場合には検出数０を出力する（ステップＳ２０）。
【００４３】
次に、図１９のステップ１０の頭部検出手段２１の動作について詳細に説明する。頭部検出手段２１では、まず左右どちらか一方の映像を頭部矩形座標検出手段２４に入力し、仮頭部検出数と仮頭部矩形座標を得る（ステップＳ１１）。
【００４４】
図２に示した頭部矩形座標検出手段２４では右カメラ映像を用いている。次に、左右画像照合手段２５において、得られた頭部矩形座標と左右カメラの映像を用いて、ステレオ視の原理をもとに対面距離値を算出する（図１９のステップＳ１２）。
【００４５】
図６を参照して、図２の左右画像照合手段２５の動作を説明する。右カメラ画像において検出された頭部矩形を、頭部検出矩形５１とする。そして頭部検出矩形５１内の画像データを用いて、左カメラ画像の同じ検出座標位置の近傍を探索する。探索方法の一例はテンプレートマッチングである。右カメラ画像の濃淡値をＦＲ(x,y)、左カメラ画像の濃淡値をＦＬ(x,y)、矩形の横サイズをＴｗ、縦サイズをＴｈとすると、テンプレートの左上始点位置が、左カメラ画像の(sx,sy) にある時のマッチング距離Ｄtmは、次式（２）で表される。
【００４６】
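The image of equation (2) is missing from this text. A form consistent with the surrounding description (a Euclidean-type distance between the head rectangle in the right image and a same-size window in the left image) would be the following. The symbols $(r_x, r_y)$, denoting the upper-left corner of head detection rectangle 51 in the right camera image, are introduced here and do not appear in the original.

```latex
\[
D_{tm}(s_x, s_y) \;=\; \sum_{y=0}^{T_h-1} \sum_{x=0}^{T_w-1}
  \bigl( F_R(r_x + x,\; r_y + y) - F_L(s_x + x,\; s_y + y) \bigr)^2
  \tag{2}
\]
```

Whether the original takes a square root of this sum cannot be recovered from the text; the position $(s_x, s_y)$ that minimizes the distance is the same either way.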

【００４７】
上式（２）は、右カメラと左カメラの部分画像間のユークリッド距離を表している。Ｄtmが最も小さい時の左カメラ画像上の座標を、探索結果５２とする。探索結果５２が求まると、次に左右の矩形座標値を比較し、人物頭部への距離を算出する。
【００４８】
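An example of how to calculate the distance to an object is described with reference to FIG. 32. FIG. 32 is a top view of the left and right cameras capturing a single object 403. The right camera 401 and the left camera 402 are installed in parallel at a spacing C. Both cameras have the same angle of view θ. The horizontal length of each camera's imaging surface is e. In this state, the object 403 appears at coordinate Xr in the right camera image and at coordinate Xl in the left camera image. The maximum horizontal image size is W pixels. The facing distance Z from the camera imaging surface to the object 403 can then be calculated by the following equation (3).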

【００４９】
ここで、ｅは通常１ｃｍ未満の小さい値であることから、０として近似計算することもできる。以上のようにして、左右カメラ画像から対面距離を算出する。
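The image of equation (3) is missing from this text, so its exact form cannot be quoted. As a rough sketch only, the following computes the facing distance under an ideal pinhole model with the sensor width e approximated as 0, which the paragraph above allows. The function name, parameter names, and the use of the absolute disparity are assumptions for illustration.

```python
import math


def stereo_distance(c, theta_deg, width_px, x_right, x_left):
    """Estimate the distance Z to an object seen by two parallel cameras.

    c         : spacing C between the cameras (result is in the same unit)
    theta_deg : horizontal angle of view of each camera, in degrees
    width_px  : image width W in pixels
    x_right   : x coordinate of the object in the right image (Xr)
    x_left    : x coordinate of the object in the left image (Xl)

    Assumes an ideal pinhole model and neglects the sensor width e.
    """
    disparity = abs(x_left - x_right)
    if disparity == 0:
        return math.inf  # no measurable parallax: treat as infinitely far
    # Focal length expressed in pixels for the given angle of view.
    focal_px = width_px / (2.0 * math.tan(math.radians(theta_deg) / 2.0))
    return c * focal_px / disparity
```

With cameras 0.1 m apart, a 90 degree angle of view, and a 640 pixel wide image, a disparity of 32 pixels corresponds to roughly 1 m.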
【００５０】
そして、左右画像照合手段２５において、ステレオ視によって対面距離を算出した後、対面距離統合手段３１において、対面距離センサ３０の出力値をもとに、対面距離の統合処理を行う（図１９のステップＳ１３）。実験的に、超音波センサ等の距離センサは、距離が１ｍ未満の場合には非常に精度が高い。一方、１ｍ以上の遠い距離では誤差が大きくなる傾向にある。
【００５１】
ステレオ視により算出された距離値は、カメラの画角にもよるが概ね３ｍ程度まで有効であるが、距離が近すぎると、かえって誤差が大きくなる傾向にある。そこで、両者の距離値を統合する方法として、対面距離センサ３０の出力があるしきい値Ｔよりも小さい場合には、対面距離センサの値を採用し、しきい値Ｔよりも大きい場合にはステレオ視による距離値を採用するという方法が用いられる。
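The selection rule described in this paragraph can be sketched as follows. The function and parameter names are hypothetical, and the behaviour when the sensor reading equals the threshold T exactly is not specified in the text; this sketch falls back to the stereo value in that case.

```python
def fuse_distance(sensor_value, stereo_value, threshold):
    """Choose the facing distance estimate expected to be more reliable.

    The range sensor is trusted when its reading is below the threshold T;
    otherwise the stereo estimate is used.
    """
    if sensor_value < threshold:
        return sensor_value
    return stereo_value


# Example with an assumed threshold of 1.0 m.
print(fuse_distance(0.6, 0.9, 1.0))  # 0.6 (sensor reading is used)
print(fuse_distance(2.4, 2.1, 1.0))  # 2.1 (stereo estimate is used)
```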
【００５２】
対面距離を統合した後、対面距離評価手段２６において、対面距離値と画像中の頭部矩形座標値から、頭部の実際のサイズを算出する（図１９のステップＳ１４）。
【００５３】
算出結果が、人間の頭の大きさにほぼ一致すれば、本当に頭部を検出したと判定する。算出結果が実際の頭のサイズから著しくかけ離れる場合には、誤検出であると判断する（図１９ステップＳ１５）。例えば、頭部の横サイズが１２ｃｍプラスマイナス２ｃｍ以内で、かつ縦サイズが２０ｃｍプラスマイナス４ｃｍ以内の場合は頭部と見做し、それ以外の場合は頭部ではない、と判断する。
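The size check in this paragraph can be illustrated with the sketch below. The tolerances (12 cm plus or minus 2 cm wide, 20 cm plus or minus 4 cm tall) come from the text; the pixel-to-centimetre conversion assumes a pinhole camera with square pixels and a negligible sensor width, and all names are hypothetical.

```python
import math


def real_size_cm(size_px, distance_cm, theta_deg, image_width_px):
    """Convert a length in pixels to centimetres at the given facing distance."""
    # Width of the scene covered by the image at that distance.
    view_width_cm = 2.0 * distance_cm * math.tan(math.radians(theta_deg) / 2.0)
    return size_px * view_width_cm / image_width_px


def is_head(rect_w_px, rect_h_px, distance_cm, theta_deg, image_width_px):
    """Return True if the rectangle has roughly the size of a human head."""
    w_cm = real_size_cm(rect_w_px, distance_cm, theta_deg, image_width_px)
    h_cm = real_size_cm(rect_h_px, distance_cm, theta_deg, image_width_px)
    return abs(w_cm - 12.0) <= 2.0 and abs(h_cm - 20.0) <= 4.0
```

For example, at 100 cm with a 90 degree angle of view and a 640 pixel wide image, one pixel covers about 0.31 cm, so a 38 by 64 pixel rectangle passes the check while a 100 by 64 pixel rectangle does not.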
【００５４】
実際のサイズに合っている場合には、検出数を１増やす（図１９ステップＳ１６）。
【００５５】
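If any provisional head rectangle coordinates remain unevaluated (No branch of step S17 in FIG. 19), processing resumes from step S12. Once all provisional head rectangle coordinates have been evaluated (Yes branch of step S17 in FIG. 19), the head detection means 21 outputs the number of detected heads and the head rectangle coordinates.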
【００５６】
次に、図２に示した頭部矩形座標検出手段２４について説明する。図４は、頭部矩形座標検出手段２４の一実施例をなす頭部矩形座標検出手段４１の構成を示す図である。図４を参照すると、頭部矩形座標検出手段４１は、動き画素検出手段４２と、ノイズ除去手段４７と、人物数評価手段４３と、頭頂座標検出手段４４と、頭部下部座標検出手段４５と、側頭座標検出手段４６とを備えている。
【００５７】
図２０は、頭部矩形座標検出手段４１の処理を説明するための流れ図である。図２０、図４、及び図５を参照して、頭部矩形座標検出手段４１の動作について説明する。
【００５８】
まず動き画素検出手段４２において、画面内で動きのある画素群を検出する。入力画像データと、それより１つ前に入力された画像データとの差分をとり、差分画像ｇを生成する（ステップＳ２１）。
【００５９】
さらに過去ｍフレーム分（ｍは２以上の整数）の差分画像ｇを加算し平均をとる事によって、統合差分画像Ｇを得る（ステップＳ２２）。統合差分画像Ｇは、動きのない領域の画素値が０で、動きのある領域ほど画素値が大きい値を取る。
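Steps S21 and S22 can be sketched as follows. This is an illustration under assumptions: frames are nested lists of gray levels, the difference is taken as an absolute value (the text only says "difference"), and the function names are hypothetical.

```python
def difference_image(current, previous):
    """Per-pixel absolute difference g between two consecutive frames."""
    return [[abs(c - p) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]


def integrated_difference(frames, m):
    """Average the last m difference images to obtain the image G."""
    if m < 2 or len(frames) < m + 1:
        raise ValueError("need m >= 2 and at least m + 1 frames")
    recent = frames[-(m + 1):]
    diffs = [difference_image(recent[i + 1], recent[i]) for i in range(m)]
    height, width = len(diffs[0]), len(diffs[0][0])
    return [[sum(d[y][x] for d in diffs) / m for x in range(width)]
            for y in range(height)]


frames = [[[0, 0]], [[0, 10]], [[0, 30]]]  # three 1x2 frames
print(integrated_difference(frames, 2))    # [[0.0, 15.0]]
```

Static pixels stay at 0 while pixels that keep changing accumulate larger values, which matches the property of G stated above.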
【００６０】
統合差分画像Ｇは、ごま塩ノイズを多く含むので、ノイズ除去手段４７において、ノイズ除去処理を行う（ステップＳ２３）。ノイズ除去処理の例としては、膨張収縮処理や、メジアンフィルタ処理などがある。これらのノイズ除去処理は、画像処理の分野で一般的であり、当業者にとってよく知られている処理が用いられるので、その詳細な構成は省略する。
【００６１】
次に、図４の人物数評価手段４３において、画面内に何人の人間がいるのかを評価する。人物数評価手段４３の動作について説明する。図５は、統合差分画像Ｇの取得の例を説明するための図である。
【００６２】
はじめに、人物１人だけを検出する方法について説明する。統合差分画像Ｇ４８が得られたとすると、まず動き領域があるかどうかを判定する（図２０のステップＳ２４）。ここで動き領域とは、動きのある画素が占める領域を表わす。この動き領域がない、すなわち統合差分画像Ｇが全て０の場合には、人物数は０と判定する。それ以外の場合人物数は１とする。
【００６３】
次に、複数人物を検出する方法について説明する。統合差分画像Ｇ４９が得られたとすると、まず動き領域の有無を調べる（図２０のステップＳ２４）。動き領域がない場合は人物数０である。動き領域がある場合、統合差分画像Ｇを参照して、何人いるのかを判定する（図２０のステップＳ２５）。判定方法としては、例えば統合差分画像上部領域５０における動き領域幅の最大値が、あるしきい値よりも小さいときは１人、大きいときは２人、とする方法がある。人物数が２人のときは、人物が横に並んでいると仮定し、統合差分領域Ｇを、部分領域１と部分領域２に分割する。なお３人以上検出の場合も、分割数を増やすことで対応できる。頭部矩形を求める際には、部分領域１と部分領域２のそれぞれに対して、以下に述べる同じ処理（図２０のステップＳ２６からステップＳ２９まで）を繰り返せばよい。
【００６４】
次に、統合差分画像Ｇから頭部矩形座標を求める処理について説明する。各スキャンライン毎に動き領域幅４７を求める（図２０のステップＳ２６）。
【００６５】
動き領域幅４７は、各スキャンラインにおいて動き領域のｘ座標最大値と最小値の差分を表している。
【００６６】
次に頭頂座標検出手段４４によって頭頂のＹ座標を求める（図２０のステップＳ２７）。頭頂座標の求め方としては、動き領域のＹ座標の最小値を頭頂とする方法がある。
【００６７】
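Next, the head lower coordinate detection means 45 obtains the Y coordinate of the bottom edge of the head rectangle (step S28 in FIG. 20). One way to obtain it is to search downward (in the Y direction) from the top of the head for lines whose motion region width 47 is smaller than the average motion region width dm, and to take the largest Y coordinate among those lines as the bottom edge of the head rectangle.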
【００６８】
次に側頭座標検出手段４６によって、頭部矩形の左右のｘ座標を求める（図２０のステップＳ２９）。左右のｘ座標の求め方としては、頭頂から頭部下部までの範囲で最も動き領域幅４７が大きなラインにおける動き領域の左右端の座標を求める方法を用いてもよい。
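Steps S26 to S29 can be sketched for a single person as below. The input is assumed to be a binary motion mask (for example, the integrated difference image G after thresholding and noise removal); the mask format, the function name, and the fallback used when no line is narrower than the average are assumptions, not part of the original description.

```python
def head_rectangle(mask):
    """Return (left, top, right, bottom) of the head rectangle, or None.

    mask is a list of rows; a truthy value marks a moving pixel.
    """
    # Step S26: per-scanline extent (min x, max x) of the motion region.
    extents = {}
    for y, row in enumerate(mask):
        xs = [x for x, value in enumerate(row) if value]
        if xs:
            extents[y] = (min(xs), max(xs))
    if not extents:
        return None

    widths = {y: right - left for y, (left, right) in extents.items()}
    mean_width = sum(widths.values()) / len(widths)

    # Step S27: the smallest Y with motion is the top of the head.
    top = min(extents)

    # Step S28: among lines narrower than the mean width, take the largest Y.
    narrow = [y for y in widths if widths[y] < mean_width]
    bottom = max(narrow) if narrow else top  # fallback is an assumption

    # Step S29: the widest line between top and bottom gives left and right.
    candidates = [y for y in sorted(extents) if top <= y <= bottom]
    widest = max(candidates, key=lambda y: widths[y])
    left, right = extents[widest]
    return left, top, right, bottom


mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],  # head
    [0, 0, 1, 1, 1, 0, 0],  # head
    [1, 1, 1, 1, 1, 1, 1],  # shoulders
    [1, 1, 1, 1, 1, 1, 1],
]
print(head_rectangle(mask))  # (2, 1, 4, 2)
```

For several people, the same function would be applied to each partial region separately.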
【００６９】
人物数が２つ以上の場合には、図２０のステップＳ２６からステップＳ２９までの処理を部分領域毎に繰り返す。
【００７０】
次に、図２の頭部追跡手段２２の動作について、図７を参照して説明する。追跡処理は、頭部矩形座標検出に用いたカメラ画像（図２では右カメラ画像）に対して行なう。まず前フレームの頭部矩形座標５３と前フレームの頭部矩形画像５５を、頭部矩形記憶手段２３から読み出す。
【００７１】
次に、現フレームにおいて、前フレームの頭部矩形座標５３の近傍領域をテンプレートマッチングによって探索し、最も距離値の小さい所を追跡結果とする。
【００７２】
図３は、図１の頭部検出追跡手段６の他の実施例をなす頭部検出追跡手段３２の構成を示す図である。図３を参照すると、この頭部検出追跡手段３２は、頭部検出手段３３と、頭部矩形記憶手段２３と、頭部追跡手段２２とを備えている。図２に示した実施例との相違点としては、頭部検出手段３３が、頭部矩形座標検出手段２４と、対面距離評価手段２６とを持ち、単眼のカメラ３４と対面距離センサ３０の出力を用いて検出を行なっていることである。すなわち、左右のステレオ視による対面距離は考慮せず、対面距離センサの読み取り値のみを用いて頭部矩形の評価を行なうものである。
【００７３】
また、頭部検出追跡手段６のその他の実施例として、対面距離センサを用いずに、左右カメラのみの情報から対面距離を求め、頭部矩形を評価するという構成の頭部検出手段を用いてもよい。この構成の場合、頭部検出手段２１において、図２の対面距離統合手段３１を除いた構成となる。
【００７４】
図９は、図１の正面顔位置合わせ手段７の一実施例をなす正面顔位置合わせ手段６１の構成を示す図である。図９を参照すると、正面顔位置合わせ手段６１は、頭部矩形切り取り手段６２と、正面顔探索手段６３と、正面顔らしさ判定手段６５とを備えて構成されている。
【００７５】
正面顔らしさ判定手段６５は、濃度分散判定手段６６と、しきい値処理手段６７とを備えている。
【００７６】
図２１は、正面顔位置合わせ手段６１の処理を説明するための流れ図である。図９及び図２１を参照して、正面顔位置合わせ手段６１の動作について説明する。正面顔位置合わせ手段６１は、画像データと頭部矩形座標と対面距離が入力されると、正面顔有無フラグと正面顔画像データを出力する。入力された画像データは、頭部矩形切り取り手段６２において、頭部矩形による部分画像に切り取られる（ステップＳ４１）。この部分画像を「頭部矩形画像」と呼ぶ。
【００７７】
次に正面顔探索手段６３において、頭部矩形画像の中から、正面顔領域を探索し、正面顔画像と標準顔辞書とのパターン間距離又は類似度を出力する（ステップＳ４２）。
【００７８】
次に、正面顔らしさ判定手段６５において、正面顔画像が本当に正面顔であるかどうかを判断する（ステップＳ４３）。ここで正面顔であると判断されれば、正面顔有無フラグは「有り」となり、正面顔画像を出力する。正面顔ではないと判断すれば、正面顔有無フラグは「無し」となり、正面顔画像は出力しない。
【００７９】
正面顔らしさ判定手段６５は、濃度分散判定手段６６と、しきい値処理手段６７とを備えている。
【００８０】
濃度分散判定手段６６は、正面顔画像データの濃淡値の分散を求め、あるしきい値以下の場合には、正面顔ではないと判断する（図２１のステップＳ４４）。
【００８１】
濃度分散判定手段６６により、単調な壁のようなパターンを排除することができる。
【００８２】
The threshold processing means 67 judges frontal-face likeness by thresholding the inter-pattern distance or the similarity (step S45 in FIG. 21).
【００８３】
パターン間距離値の場合には、しきい値以上のときに正面顔ではないと判断する。類似度の場合は、しきい値以下のときに、正面顔でないと判断する。
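The two tests performed by the frontal-face likeness determination (steps S44 and S45) can be sketched as follows for the case where an inter-pattern distance is used. The function names and threshold values are hypothetical; only the direction of each comparison is taken from the text.

```python
def variance(pixels):
    """Population variance of a flat list of gray levels."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def is_frontal_face(pixels, distance, variance_threshold, distance_threshold):
    """Apply the gray-level variance test, then the distance test."""
    if variance(pixels) <= variance_threshold:
        return False  # nearly uniform pattern, e.g. a plain wall
    if distance >= distance_threshold:
        return False  # too far from the standard face pattern
    return True
```

When a similarity is used instead of a distance, the second comparison is reversed: a value at or below the threshold means the candidate is rejected.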
【００８４】
図１２は、正面顔位置合わせ手段６１の動作を模式的に示す説明図である。頭部矩形１５１が検出されているとすると、図２１のステップＳ４１によって、頭部矩形画像１５２が生成される。
【００８５】
次に図２１のステップＳ４２の顔中心部探索処理では、縮小頭部矩形画像１５３が生成された後に正面顔画像１５５が得られる。
【００８６】
なお、正面顔画像とは、図１２の正面顔画像１５５に示すような、顔の中心部分の画像であり、横方向は、両目を完全に含む程度で、縦方向は、眉毛から口全体を含む程度の領域の画像を意味する。
【００８７】
図１０は、正面顔探索手段６３の一実施例をなす正面顔探索手段７１の構成を示す図である。図１０を参照すると、この正面顔探索手段７１は、頭部矩形画像記憶手段８９と、頭部中間サイズ算出手段８８と、画像縮小手段９０と、中間サイズ記憶手段９１と、正面顔候補抽出手段７２と、中間縮小画像記憶手段７３と、コントラスト補正手段７４と、固有ベクトル投影距離算出手段７５と、標準顔辞書記憶手段７６と、記憶手段７７と、投影距離最小判定手段７８と、探索範囲終了判定手段７９と、多重解像度処理終了判定手段９２とを備えて構成されている。
【００８８】
固有ベクトル投影距離算出手段７５は、平均差分手段８２と、ベクトル投影値算出手段８３と、再構成演算手段８４と、投影距離計算手段８５とを備えて構成されている。
【００８９】
標準顔辞書記憶手段７６は、標準顔平均データ記憶手段８０と、標準顔固有ベクトルデータ記憶手段８１とを備えて構成されている。
【００９０】
記憶手段７７は、投影距離最小値記憶手段８６と、正面顔濃淡値記憶手段８７とを備えて構成されている。
【００９１】
図２２は、正面顔探索手段７１の処理を説明するための流れ図である。図１０及び図２２を参照して、正面顔探索手段７１の動作について説明する。頭部矩形画像データは、頭部矩形画像記憶手段８９に保持されている。はじめに頭部中間サイズ算出手段８８において、対面距離値と標準顔辞書データのサイズを参照して、頭部矩形画像の中間縮小サイズを計算する（ステップＳ１０１）。
【００９２】
頭部中間サイズ算出手段８８の処理例について説明する。中間縮小サイズは、図１２の縮小頭部矩形画像１５３の縦横サイズとして示されている。頭部矩形画像１５２の横サイズをＨｗ、縦サイズをＨｈとする。中間縮小サイズの横サイズをＭｗ、縦サイズをＭｈとする。また正面顔画像の横サイズをＦｗ、縦サイズをＦｈとする。Ｆｗ，Ｆｈは正面顔探索形状１５４の縦横サイズと同一であり、標準顔辞書に対して一意に決定される。なお、Ｈｈ，Ｈｗ，Ｍｈ，Ｍｗ、Ｆｈ，Ｆｗはすべて画素単位のピクセルサイズである。
【００９３】
標準顔辞書は、図１２の正面顔画像１５５に示す正面顔領域の濃淡値を特徴値として生成されたパターン認識用の辞書である。正面顔領域とは、横方向は両目を完全に含む程度で、縦方向は眉毛から口全体を含む程度の領域を意味する。正面顔領域は必ずしも矩形である必要はなく、楕円形など、両目、鼻、口を含む任意の連続領域で実現可能である。ただし、形状が矩形であれば処理が単純化されて高速化することができるので、実装形態として有効である。よって、以下では、正面顔領域を矩形であるものとして説明する。
【００９４】
正面顔領域の実際の縦横サイズを、ＲＦｈ、ＲＦｗとすると、男性の大人であれば、大体ＲＦｗ＝１０ｃｍ、ＲＦｈ＝１５ｃｍ程度で表わすことができる。一方、頭部矩形画像の実際の縦横サイズＲＨｈ、ＲＨｗは、対面距離Ｚが既知であるため、次式（４）によって、計算することができる。なお次式（４）の変数は、図３２に対応している。
【００９５】

【００９６】
撮像面の幅ｅは小さいので通常は無視して計算しても問題ない。
【００９７】
標準顔辞書を用いて頭部矩形画像を探索するためには、頭部矩形画像を標準顔辞書と同じ解像度に変換する必要がある。その変換後のサイズが中間縮小サイズＭｗ、Ｍｈである。Ｍｈ、Ｍｗは、次式（５）の関係式から求めることができる。
【００９８】

【００９９】
すなわち、頭部中間サイズ算出手段８８において、ＲＦｗ、ＲＦｈを１組指定することによって中間縮小サイズＭｗ、Ｍｈを１組算出することができる。しかし人間の正面顔の大きさは大人から子供、女性と男性で異なっている。そこで、ＲＦｗ、ＲＦｈを複数組用意し、それぞれに対応する中間縮小サイズを算出することも可能となっている。あらかじめ複数算出することにより、後段の正面顔探索処理を複数の中間縮小サイズで処理することができる。また複数の中間縮小サイズで探索処理することは、頭部矩形を複数の解像度で探索処理する行為と同じであると解釈できる。
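The images of equations (4) and (5) are missing from this text. The sketch below shows only the proportionality that the description implies: the reduced head rectangle must have the same pixels-per-centimetre resolution as the standard face dictionary, so Mw / RHw = Fw / RFw and Mh / RHh = Fh / RFh. The function names and the rounding to whole pixels are assumptions; the default face size of 10 cm by 15 cm is the adult male example from the text.

```python
def intermediate_size(rh_w_cm, rh_h_cm, fw_px, fh_px, rf_w_cm=10.0, rf_h_cm=15.0):
    """Return (Mw, Mh), the reduced size of the head rectangle in pixels.

    rh_w_cm, rh_h_cm : real size of the head rectangle (RHw, RHh)
    fw_px, fh_px     : size of the standard face pattern in pixels (Fw, Fh)
    rf_w_cm, rf_h_cm : assumed real size of the frontal face region (RFw, RFh)
    """
    mw = round(rh_w_cm * fw_px / rf_w_cm)
    mh = round(rh_h_cm * fh_px / rf_h_cm)
    return mw, mh


def intermediate_sizes(rh_w_cm, rh_h_cm, fw_px, fh_px, face_sizes_cm):
    """Compute one intermediate size per assumed real face size."""
    return [intermediate_size(rh_w_cm, rh_h_cm, fw_px, fh_px, w, h)
            for w, h in face_sizes_cm]


# A 20 cm x 30 cm head rectangle and a 20 x 30 pixel standard face pattern.
print(intermediate_sizes(20.0, 30.0, 20, 30, [(10.0, 15.0), (8.0, 12.0)]))
# [(40, 60), (50, 75)]
```

A smaller assumed face (a child, for instance) yields a larger intermediate image, which is how several face sizes translate into a multi-resolution search.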
【０１００】
頭部中間サイズ算出手段８８によって中間サイズが算出されると、中間サイズ記憶手段９１に中間サイズの情報が記憶される。
【０１０１】
次に最小パターン間距離値 Dmin を、通常得られるパターン間距離値に比べ十分大きな値に初期化する（図２２のステップＳ１０２）。
【０１０２】
中間縮小サイズ記憶手段９１の情報を１つ選択し、画像縮小手段９０において、頭部矩形画像を選択した中間縮小サイズに縮小し、縮小頭部矩形画像を得る（図２２のステップＳ１０３）。
【０１０３】
次に正面顔探索位置ＳＸ，ＳＹを０に初期化する（図２２のステップＳ１０４）。
【０１０４】
次に正面顔候補抽出手段７２において、探索位置ＳＸ，ＳＹにおける正面顔候補画像を抽出する（図２２のステップＳ１０５）。
【０１０５】
次に正面顔候補画像を明暗の調子を補正するため、コントラスト補正手段７４によってコントラストを補正する（図２２のステップＳ１０６）。
【０１０６】
コントラスト補正の具体的な方法の例について説明する。正面顔候補画像が、０からｖmaxまでの値をとるものとし、画素値の平均をμ、標準偏差をσとすると、元画像Ｖからコントラスト補正画像Ｖ’への変換式は、次式（６）で表わすことができる。
【０１０７】

【０１０８】
再び図１０及び図２２を参照すると、次に、固有ベクトル投影距離算出手段７５において、正面顔候補画像と標準顔パターンとの固有ベクトル投影距離Dを求める（ステップＳ１０７）。
【０１０９】
次に、投影距離最小判定手段７８において、DとDminとを比較する。このときDがDminよりも小さい値であれば、DminにDを代入して値を更新し、投影距離最小値記憶手段８６に記憶する。同時に正面顔候補画像を、正面顔濃淡値記憶手段８７に記憶する（ステップＳ１０８）。
【０１１０】
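Next, the search range end determination means 79 increments the search position SX, SY (step S109) and determines whether the entire head rectangle has been searched (step S110). If the search is not yet complete, processing repeats from step S105.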
【０１１１】
頭部矩形の探索範囲を全て探索し終わったら、多重解像度処理終了判定手段９２において、すべての中間縮小サイズで探索したか否かを判断する（ステップＳ１１１）。もし、探索していない中間縮小サイズがあれば、異なる中間縮小サイズを用いて再びステップＳ１０３から処理を開始する。すべての中間縮小サイズで探索が終了した時点で正面顔探索手段７１は終了する。
【０１１２】
次に、図１０の固有ベクトル投影距離算出手段７５の動作について説明する。
【０１１３】
標準顔辞書記憶手段７６には、標準顔平均データと、標準顔固有ベクトルデータが記憶されている。
【０１１４】
図３３に、特徴量の数がｐ個の時の、固有ベクトル投影距離算出用辞書の一例を示す。固有ベクトル投影距離算出用辞書は、１からｐ番めまでのｐ次元の固有ベクトルデータＥと、ｐ個の特徴量の平均値Ａveとからなる。特徴量がｐ個のとき、固有ベクトルはｐ番めまで存在するが、投影距離算出時には１からｍ番めまでを使用する。
【０１１５】
正面顔候補画像の画素値を、図８に示すようにラスタスキャンし、１次元の特徴データに変換する。このとき正面顔画像の縦横サイズの積Ｆｗ×Ｆｈは、辞書の特徴量と同じｐ個でなければならない。これをベクトルＸ：Ｘ１、Ｘ２，．．．Ｘｐとする。
【０１１６】
まず平均差分手段８２において、ベクトルＸから平均ベクトルＡveを差分する。これをベクトルＹとする。
【０１１７】

【０１１８】
次にベクトル投影値算出手段８３において、ベクトルＹをｍ個の固有ベクトルに投影し、その投影値Ｒ１．．Ｒｍを求める。投影値算出方法を、次式（８）に示す。
【０１１９】

【０１２０】
次に、再構成演算手段８４（図１０参照）において、投影値Ｒ１．．．Ｒｍと、ｍ個の固有ベクトルとを用いて元の特徴量Ｙを再構成し、その再構成ベクトルをＹ’とする。再構成の計算を次式（９）に示す。
【０１２１】

【０１２２】
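Next, the projection distance calculation means 85 obtains the Euclidean distance between Y and Y' according to the following equation (10). This yields the projection distance D onto the eigenvectors E.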
【０１２３】
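The images of equations (7) to (10) are missing from this text. Forms consistent with the description of the mean subtraction, projection, reconstruction, and distance steps would be the following, where $E_{ji}$ denotes the $i$-th component of the $j$-th eigenvector (one row of FIG. 33) and the eigenvectors are assumed to be orthonormal. The index names are chosen here for illustration.

```latex
\begin{align}
Y_i  &= X_i - \mathrm{Ave}_i, & i &= 1,\dots,p \tag{7}\\
R_j  &= \sum_{i=1}^{p} Y_i\, E_{ji}, & j &= 1,\dots,m \tag{8}\\
Y'_i &= \sum_{j=1}^{m} R_j\, E_{ji}, & i &= 1,\dots,p \tag{9}\\
D    &= \sqrt{\sum_{i=1}^{p} \bigl( Y_i - Y'_i \bigr)^2} \tag{10}
\end{align}
```

When $m = p$ the reconstruction is exact and $D = 0$; using only the first $m < p$ eigenvectors makes $D$ small for patterns that resemble the training faces and large for other patterns.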

【０１２４】
図１１は、図９の正面顔探索手段６３の他の実施例をなす正面顔探索手段１０１の構成を示す図である。図１１を参照すると、この正面顔探索手段１０１は、頭部矩形画像記憶手段８９と、頭部中間サイズ算出手段８８と、画像縮小手段９０と、中間サイズ記憶手段９１と、正面顔候補抽出手段７２と、中間縮小画像記憶手段７３と、コントラスト補正手段７４と、積和演算手段１０２と、標準顔辞書データ記憶手段１０４と、記憶手段１０５と、類似度最大判定手段１０３と、探索範囲終了判定手段７９と、多重解像度処理終了判定手段９２とを備えて構成されている。
【０１２５】
記憶手段１０５は、類似度最大値記憶手段１０６と、正面顔濃淡値記憶手段１０７とを備えている。
【０１２６】
図２３は、正面顔探索手段１０１の処理を説明するための流れ図である。図１１及び図２３を参照して、正面顔探索手段１０１の動作について説明する。頭部矩形画像データは、頭部矩形画像記憶手段８９に保持されている。はじめに頭部中間サイズ算出手段８８において、対面距離値と標準顔辞書データのサイズを参照して、頭部矩形画像の中間縮小サイズを計算し、中間サイズ記憶手段９１に記憶する（ステップＳ１２１）。
【０１２７】
中間縮小サイズの計算方法は正面顔探索手段７１と同一である。次に最大類似度Smaxを０に初期化する（ステップＳ１２２）。
【０１２８】
中間サイズ記憶手段９１の情報を１つ選択し、画像縮小手段９０において、頭部矩形画像を選択した中間縮小サイズに縮小し、縮小頭部矩形画像を得る（ステップＳ１２３）。
【０１２９】
次に正面顔探索位置ＳＸ、ＳＹを０に初期化する（ステップＳ１２４）。
【０１３０】
次に正面顔候補抽出手段７２において、探索位置ＳＸ、ＳＹにおける正面顔候補画像を抽出する（ステップＳ１２５）。
【０１３１】
次に正面顔候補画像を明暗の調子を補正するため、コントラスト補正手段７４によってコントラストを補正する（ステップＳ１２６）。
【０１３２】
次に積和演算手段１０２において、正面顔候補画像と標準顔パターンとの類似度Ｓを求める（ステップＳ１２７）。
【０１３３】
次に類似度最大値判定手段１０３において、ＳとＳmaxとを比較する。このときＳがＳmaxよりも大きい値であれば、ＳmaxにＳを代入して値を更新し、類似度最大値記憶手段１０６に記憶する。同時に正面顔候補画像を正面顔濃淡値記憶手段１０７に記憶する（ステップＳ１２８）。
【０１３４】
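Next, the search range end determination means 79 increments the search position SX, SY (step S129) and determines whether the entire head rectangle has been searched (step S130). If the search is not yet complete, processing repeats from step S125.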
【０１３５】
頭部矩形の探索範囲を全て探索し終わったら、多重解像度処理終了判定手段９２において、すべての中間縮小サイズで探索したかどうかを判断する。もし探索していない中間縮小サイズがあれば、異なる中間縮小サイズを用いて再びステップＳ１２３から処理を開始する。
【０１３６】
すべての中間縮小サイズで探索が終了した時点で正面顔探索手段１０１は終了する。
【０１３７】
次に、図１１に示した、標準顔パターンとの類似度を算出する積和演算手段１０２の動作について説明する。
【０１３８】
積和演算手段１０２は、正面顔かそれ以外かを判別する線形判別辞書を参照して類似度Ｓを算出する。図３４に、線形判別辞書の一例を示す。図３４には、ｑ個のクラスを判別する辞書が示されているが、標準顔辞書データ記憶手段１０４は、正面顔とそれ以外の２つのクラスを判別する辞書であり、ｑ＝２の場合に相当する。
【０１３９】
正面顔候補画像の画素値を、図８に示すようにラスタスキャンし、１次元の特徴データに変換する。このとき正面顔画像の縦横サイズの積Ｆｗ×Ｆｈは、辞書の特徴量と同じｐ個でなければならない。これをベクトルＸ：Ｘ１、Ｘ２，．．．Ｘｐとする。
【０１４０】
また標準顔辞書データとして、ｑ＝１のクラスが正面顔を、ｑ＝２のクラスがそれ以外を表わすものとして説明する。正面顔との類似度は、図３４のｑ＝１の行、すなわちクラス１の識別係数４２２のみを用いて、次式（１１）によって計算することができる。
【０１４１】

【０１４２】
積和演算手段１０２は、上式（１１）を演算することによって、類似度を算出する。
【０１４３】
図１３には、図１の顔識別手段９の一実施例をなす顔識別手段１１１と、図１の識別辞書記憶手段１０の一実施例をなす識別辞書記憶手段１１２と、図１の識別結果補正手段１１の一実施例をなす識別結果補正手段１１３とが示されている。図１３を参照すると、顔識別手段１１１は、特徴データ記憶手段１１５と、積和演算手段１１６と、最大類似度人物判定手段１１８と、しきい値処理手段１１７とを備えて構成されている。識別辞書記憶手段１１２は、登録人物識別用辞書記憶手段１１９を有する。識別結果補正手段１１３は、識別結果加重平均算出手段１１４を有する。
【０１４４】
図２４は、顔識別手段１１１と識別結果補正手段１１３の処理を説明するための流れ図である。図１３及び図２４を参照して、顔識別手段１１１と識別結果補正手段１１３の動作について説明する。
【０１４５】
特徴データが入力され、特徴データ記憶手段１１５に記憶される（ステップＳ５１）。
【０１４６】
次に、登録人物識別用辞書記憶手段１１９のデータを参照し、積和演算手段１１６において、登録された人物への類似度を、人物毎にそれぞれ算出する（ステップＳ５４）。
【０１４７】
類似度の算出方法は、標準顔パターンとの類似度を算出する積和演算手段１０２の動作と基本的に同じである。ただし、識別するクラス数は登録されている人物数となる。
【０１４８】
辞書としてｑ人分のデータが登録されている場合には、登録人物識別用辞書記憶手段１１９には、図３４に示す線形判別辞書と同じ数のデータが保持されることになる。そして積和演算手段１１６により、次式（１２）に示すように、ｑ個の類似度Ｓｉ：（ｉ＝１，．．．ｑ）が得られる。
【０１４９】
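The images of equations (11) and (12) are missing from this text. Forms consistent with the linear discriminant dictionary of FIG. 34 (constant terms $A_{0j}$ and multiplication terms $A_{ij}$) would be the following. Whether the constant term is included in the original equation (11) cannot be confirmed from the text, and the class index is written as $j$ here to avoid a clash with the feature index $i$, so $S_j$ corresponds to the similarity written as Si in the paragraph above.

```latex
\begin{align}
S   &= A_{01} + \sum_{i=1}^{p} A_{i1}\, X_i \tag{11}\\
S_j &= A_{0j} + \sum_{i=1}^{p} A_{ij}\, X_i, \qquad j = 1,\dots,q \tag{12}
\end{align}
```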

【０１５０】
このように、図３４に示す線形判別辞書による積和演算処理で求められた類似度の大きさによってパターンを識別する方法を、「線形判別辞書による類似度識別」と呼ぶ。
【０１５１】
再び図１３及び図２４を参照すると、次に、最大類似度人物判定手段１１８において、算出されたｑ個の類似度の中で最大値を求め、それに対応する人物を求める（ステップＳ５５）。すなわち、特徴データと最も似ていると判断される人物を求める。この時の類似度を「最大類似度」と呼ぶ。
【０１５２】
次に、しきい値処理手段１１７において、最大類似度をあらかじめ定められたしきい値と比較する（ステップＳ５６）。
【０１５３】
顔識別手段１１１は、最大類似度がしきい値よりも高いときは、確かに人物を識別したと判断し、そのＩＤ番号と最大類似度を出力する。最大類似度がしきい値よりも低いときは、登録されている人物（本人）ではない他人である可能性が高いので、「他人」という情報を出力する。
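Steps S54 to S56 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function names, the row-per-person layout of the coefficients, and the use of `None` to represent "other person" are assumptions, and the inclusion of the constant term in the similarity is assumed as well.

```python
def similarities(x, coeffs, consts):
    """Similarity of feature vector x to each registered person.

    coeffs[j] holds the multiplication terms for person j + 1,
    consts[j] holds the corresponding constant term.
    """
    return [a0 + sum(a * xi for a, xi in zip(row, x))
            for row, a0 in zip(coeffs, consts)]


def identify(x, coeffs, consts, threshold):
    """Return (person_id, max_similarity); person_id is None for 'other'."""
    scores = similarities(x, coeffs, consts)
    best = max(range(len(scores)), key=lambda j: scores[j])
    if scores[best] > threshold:
        return best + 1, scores[best]  # person IDs start at 1
    return None, scores[best]


coeffs = [[1.0, 0.0], [0.0, 1.0]]  # toy dictionary for two people
consts = [0.0, 0.5]
print(identify([1.0, 2.0], coeffs, consts, 2.0))  # (2, 2.5)
print(identify([1.0, 2.0], coeffs, consts, 3.0))  # (None, 2.5)
```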
【０１５４】
識別結果補正手段１１３は、識別結果を受け取った後、識別結果加重平均算出手段１１４によって、過去Ｎフレームにおける識別結果を統合する（ステップＳ５７）。識別結果加重平均算出手段１１４の動作の例として、過去Ｎフレームにおける識別人物ＩＤと類似度、あるいは他人判定結果を、以下のように加重平均する方法がある。
【０１５５】
ステップＡ１：過去Ｎフレームの中で、一定割合のフレーム数が「他人」の時は、「他人」とする。他人でないと判定された場合にはステップＡ２へ進む。
【０１５６】
ステップＡ２：過去Nフレームの中で、人物ｉという判定がＮｉフレームあるものとする（ｉ＝１．．．ｑ）。それぞれの人物の類似度加重平均値を次式（１３）で算出する。Ｓｉは人物ｉの類似度を表し、ＳＳｉは人物ｉの類似度加重平均値を表す。ＳＳｉの最も大きい人物ＩＤを、識別結果として出力する。
【０１５７】

【０１５８】
識別結果補正手段１１３は、上記のようにして統合された識別結果を出力する。
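The image of equation (13) is missing from this text, so the exact weighting is unknown. The sketch below illustrates steps A1 and A2 under an assumed weighting: each person's similarities are summed over the frames in which that person was chosen and divided by the total number of frames N. The function name, the `other_ratio` parameter and its default, and the use of `None` for "other person" are all hypothetical.

```python
def integrate_results(results, other_ratio=0.5):
    """Combine per-frame results into one decision.

    results is a list of (person_id, similarity) for the last N frames,
    with person_id set to None when the frame was judged as 'other'.
    Returns a person ID, or None for 'other'.
    """
    n = len(results)
    if n == 0:
        return None

    # Step A1: too many 'other' frames means the overall result is 'other'.
    others = sum(1 for pid, _ in results if pid is None)
    if others / n >= other_ratio:
        return None

    # Step A2: assumed weighting, sum of similarities divided by N.
    totals = {}
    for pid, sim in results:
        if pid is not None:
            totals[pid] = totals.get(pid, 0.0) + sim
    if not totals:
        return None
    return max(totals, key=lambda pid: totals[pid] / n)


history = [(1, 0.9), (2, 0.8), (1, 0.7), (None, 0.0)]
print(integrate_results(history))  # 1
```

Under this weighting, a person who is chosen in many frames with moderate similarity can outrank one chosen once with high similarity, which suppresses single-frame errors.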
【０１５９】
図１５は、図１の顔識別手段９の他の実施例をなす顔識別手段１３１の構成、図１の識別辞書記憶手段１０の他の実施例をなす識別辞書記憶手段１３２の構成を示す図である。図１５を参照すると、顔識別手段１３１は、特徴データ記憶手段１１５と、固有ベクトル他人判別手段１３３と、積和演算手段１１６と、最大類似度人物判定手段１１８と、しきい値処理手段１１７とを備えている。
【０１６０】
識別辞書記憶手段１３２は、他人判別用辞書記憶手段１３４と、登録人物識別用辞書記憶手段１１９とを備えている。
【０１６１】
図２４は、顔識別手段１３１の処理を説明するための流れ図である。図１５及び図２４を参照して、顔識別手段１３１の動作について説明する。特徴データが入力され、特徴データ記憶手段１１５に記憶される（ステップＳ５１）。
【０１６２】
次に、他人判別用辞書記憶手段１３４のデータを参照しながら、固有ベクトル他人判別手段１３３によって、登録されている人物群とのパターン間距離Ｄｈを求め（ステップＳ５２）、パターン間距離Ｄｈがしきい値よりも大きければ他人であると判定する（ステップＳ５３）。パターン間距離Ｄｈは固有ベクトル投影距離として算出する。すなわち、他人判別用辞書記憶手段には、登録されている全員の特徴データによって作成された固有ベクトル辞書が記憶されている。
【０１６３】
固有ベクトル辞書の例は図３３に示されており、また固有ベクトル投影距離の算出方法は、固有ベクトル投影距離算出手段７５の動作の説明において述べられている。
【０１６４】
入力された特徴データが、固有ベクトル他人判別手段１３３において他人と判断されれば、顔識別手段１３１は、積和演算手段１１６を経ることなく、直ちに「他人」を出力する。
【０１６５】
固有ベクトル他人判定手段１３３において「他人」と判定されなかった場合には、次に、登録人物識別用辞書記憶手段１１９のデータを参照し、積和演算手段１１６において、登録された人物への類似度を、人物毎にそれぞれ算出する（ステップＳ５４）。ｑ人の人物が登録されていればｑ個の類似度Ｓｉ：（ｉ＝１，．．．ｑ）を算出する。
【０１６６】
次に、最大類似度人物判定手段１１８において、算出されたｑ個の類似度の中で最大値を求め、それに対応する人物を求める（ステップＳ５５）。すなわち、特徴データと最も似ていると判断される人物を求める。この時の類似度を最大類似度と呼ぶ。次にしきい値処理手段１１７において、最大類似度をあらかじめ定められたしきい値と比較する（ステップＳ５６）。
【０１６７】
顔識別手段１３１は、最大類似度がしきい値よりも高いときは、確かに人物を識別したと判断し、そのＩＤ番号と最大類似度を出力する。最大類似度がしきい値よりも低いときは、登録されている人物ではない他人である可能性が高いので、「他人」という情報を出力する。
【０１６８】
図１４は、図１の辞書データ管理部１２の一実施例をなす辞書データ管理部１２１の構成を示す図である。図１４を参照すると、辞書データ管理部１２１は、個人特徴データ記憶手段１２２と、識別辞書生成手段１２３と、セレクタ１２４とを備えて構成されている。
【０１６９】
識別辞書生成手段１２３は、線形判別辞書作成手段１２６と固有ベクトル辞書作成手段１２７とを備えて構成されている。個人特徴データ記憶手段１２２内には、人物別特徴データ領域１２５が、登録された人数分存在する。
【０１７０】
辞書データ管理部１２１は、特徴データと人物ＩＤ番号とが入力されると、セレクタ１２４によって人物ＩＤ毎に振り分けられ、個人特徴データ記憶手段１２２内の人物ＩＤ番号に対応する領域に入力特徴データを記憶する。また、新しい人物ＩＤの追加命令があれば、個人特徴データ記憶手段１２２内に、新しい人物別特徴データ領域１２５を確保し、新しい人物ＩＤ番号を割り当てる。また、既存の人物ＩＤの削除命令があれば、個人特徴データ記憶手段１２２内の該当するＩＤの人物別特徴データ領域１２５を破棄する。
【０１７１】
また辞書データ管理部１２１は、識別辞書作成命令を受けると、識別辞書生成手段１２３は、個人特徴データ記憶手段１２２のデータを用いて、線形判別辞書である登録人物識別用識別辞書と固有ベクトル辞書である他人判別用辞書を生成する。登録人物識別用識別辞書は、線形判別辞書作成手段１２６において作成される。他人判別用辞書は、固有ベクトル辞書作成手段１２７で作成される。
【０１７２】
なお、図１の顔識別手段９が、図１３に示した顔識別手段１１１の構成を持つ場合は、他人判別用辞書は不要であるため、識別辞書生成手段１２３は固有ベクトル辞書作成手段１２７を持たない構成としてもよい。
【０１７３】
図２５は、図１の辞書データ管理部１２への登録処理を説明するための流れ図である。図１、図１４及び図２５を参照して、辞書データ管理部１２１がカメラの前にいる新しい人物を登録する時の動作について説明する。
【０１７４】
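The person detection and identification management unit 13 specifies a new person ID and instructs the dictionary data management unit 121 to register a new person (step S61).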
【０１７５】
辞書データ管理部１２１は、個人特徴データ記憶手段１２２内に、指定されたＩＤに対応する人物別特徴データ領域１２５を確保する（ステップＳ６２）。
【０１７６】
人物検出識別管理部１３は、人物検出識別手段１内の顔特徴抽出手段８から指定枚数分の特徴データを取得し、辞書データ管理部１２１に送付する。
【０１７７】
辞書データ管理部１２１は、新しいＩＤに対応する人物別特徴データ領域１２５に、取得したデータを保存する（ステップＳ６３）。
【０１７８】
指定枚数分の取得が完了したら、人物検出識別管理部１３は、辞書データ管理部１２１に対して識別辞書の作成を指示する（ステップＳ６４）。
【０１７９】
辞書データ管理部１２１は、作成指示を受けると、識別辞書生成手段１２３の線形判別辞書作成手段１２６によって、登録人物用識別辞書を作成する（ステップＳ６５）。
【０１８０】
次に、固有ベクトル辞書作成手段１２７によって、他人判別用辞書を作成する（ステップＳ６６）。
【０１８１】
そして、作成された辞書を人物検出識別部の識別辞書記憶手段１０に出力し記憶させる（ステップＳ６７）。以上の処理により、新規人物の登録処理は終了する。
【０１８２】
図２９は、図１４の線形判別辞書作成手段１２６の一実施例をなす線形判別辞書作成手段３１１の構成を示す図である。図２９を参照すると、線形判別辞書作成手段３１１は、特徴量Ｘ記憶手段３１２と、分散共分散行列Ｃｘｘ算出手段３１３と、逆行列変換手段３１４と、行列乗算手段３１５と、目的変数Ｙ記憶手段３１７と、目的変数Ｙ生成手段３１８と、共分散行列Ｃｘｙ算出手段３１９と、係数記憶手段３２０と、定数項算出手段３１６とを備えて構成されている。
【０１８３】
図３１及び図３４を参照して、線形判別辞書の作成方法について説明する。図３１には、１人あたりｎ枚で、人物１から人物ｑまでｑ人分の個人特徴データＸ３４１が示されている。また、個人特徴データＸ３４１の特徴数はｐ個である。図３１の１行が、１枚分の特徴データを示している。個人特徴データＸ３４１は、図１４において個人特徴データ記憶手段１２２に記憶されており、１人分の個人特徴データがそれぞれ人物別特徴データ領域１２５に記憶されている。
【０１８４】
目的変数Ｙ３４２は、１つの特徴データについて１つ作成され、識別する人物数分の要素を持つベクトルである。すなわち、図３１において、人物ＩＤは１からｑまでの値をとるので、目的変数ＹはＹ１からＹｑまで存在する。目的変数Ｙ３４２の値は、０か１の２値であり、特徴データが属する人物のベクトル要素が１で、その他は０である。すなわち人物２の特徴データであれば、Ｙ２要素だけが１で他は０となる。
【０１８５】
図３４は、線形判別辞書の形式の一例を示す図である。線形判別辞書は、定数項４２１と、乗算項４２５の２種類の係数からなる。乗算項からなるマトリクスをＡｉｊ（ｉ＝１，．．．ｐ、ｊ＝１，．．．．ｑ）、定数項からなるベクトルをＡ０ｊ（ｊ＝１，．．．．ｑ）とすると、マトリクスＡｉｊは、次式（１４）から求められる。
【０１８６】

【０１８７】
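In equation (14) above, Cxx is the variance-covariance matrix computed from all of the personal feature data X341. This variance-covariance matrix Cxx is calculated by the following equation (15).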
【０１８８】
個人特徴データＸの要素を、ｘｉｊ：（ｉ＝１．．．．Ｎ、ｊ＝１，．．．ｐ）で表わす。Ｎは全データ数で、ｐは特徴数である。図３１に示す例では、１人につきｎ枚のデータがあることから、Ｎ＝ｎｑである。ｘ￣は、ｘの平均値を表わす。
【０１８９】

【０１９０】
Ｃｘｙは、個人特徴データＸと目的変数Ｙとの共分散行列である。共分散行列Ｃｘｙは、次式（１６）に従って算出される。ｘ￣、ｙ￣は、ｘ、ｙの平均値を表わす。
【０１９１】

【０１９２】
また、定数項Ａ０ｊは、次式（１７）に従って算出される。
【０１９３】

【０１９４】
はじめにＣｘｘを算出し、その逆行列を求める。次にＣｘｙを求める。最後にこれらの行列を乗算して乗算項マトリクスＡｉｊを求め、最後に定数項Ａ０ｊを求める。
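The images of equations (14) to (17) are missing from this text. Forms consistent with the description (a least-squares fit of the 0/1 target variables Y to the features X) would be the following, where $i$ runs over the $N$ data items in equations (15) and (16), and $\bar{x}_j$, $\bar{y}_k$ are the means of feature $j$ and target element $k$. The normalization by $1/N$ is an assumption; any common factor cancels in equation (14).

```latex
\begin{align}
A &= C_{xx}^{-1}\, C_{xy} \tag{14}\\
(C_{xx})_{jk} &= \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),
  & j,k &= 1,\dots,p \tag{15}\\
(C_{xy})_{jk} &= \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(y_{ik} - \bar{y}_k),
  & j &= 1,\dots,p,\; k = 1,\dots,q \tag{16}\\
A_{0j} &= \bar{y}_j - \sum_{i=1}^{p} A_{ij}\, \bar{x}_i,
  & j &= 1,\dots,q \tag{17}
\end{align}
```

Here $A$ is the $p \times q$ matrix of multiplication terms $A_{ij}$, so each column holds the coefficients for one registered person.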
【０１９５】
図２９は、線形判別辞書作成手段３１１の処理を説明するための図である。図２９を参照して、線形判別辞書作成手段３１１の動作を説明する。入力された個人特徴データ群は、特徴量Ｘ記憶手段３１２に記憶される。
【０１９６】
入力された個人特徴データ群と人物ＩＤを用いて、目的変数Ｙ生成手段３１８によって目的変数Ｙが生成される。生成した目的変数Ｙは、目的変数Ｙ記憶手段３１７に記憶される。
【０１９７】
次に、分散共分散行列Ｃｘｘ算出手段３１３において、分散共分散行列Ｃｘｘを算出する。
【０１９８】
次に、逆行列変換手段３１４において、分散共分散行列Ｃｘｘの逆行列を算出する。
【０１９９】
次に、共分散行列Ｃｘｙ算出手段３１９において、共分散行列Ｃｘｙを算出する。
【０２００】
次に、行列乗算手段３１５において、乗算項Ａｉｊを算出し、係数記憶手段３２０に乗算項Ａｉｊデータを記憶する。
【０２０１】
次に、定数項算出手段３１６において、定数項Ａ０ｊを算出し、係数記憶手段３２０に記憶する。
【０２０２】
最後に、係数記憶手段３２０のデータを出力して終了する。
【０２０３】
なお、図３１では、０と１の２値データを示したが、線形判別辞書作成手段３１１においては、０と１００等の他の２値データを用いることも可能である。
【０２０４】
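FIG. 30 shows an eigenvector dictionary creation means 331, which is one example of the eigenvector dictionary creation means 127 of FIG. 14. The eigenvector dictionary creation means 331 comprises a feature storage means 332, a feature mean calculation means 333, a variance-covariance matrix calculation means 334, an eigenvector calculation means 335, and a coefficient storage means 336.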
【０２０５】
図３１と図３３とを参照して、固有ベクトル辞書の作成方法について説明する。個人特徴データＸの要素を、ｘｉｊ：（ｉ＝１．．．．Ｎ、ｊ＝１，．．．ｐ）で表わす。
【０２０６】
はじめにＸの特徴量ごとの平均値を求める。次に、Ｘの分散共分散行列Ｃｘｘを、上式（１５）を用いて算出する。分散共分散行列Ｃｘｘの固有ベクトルを求める。固有ベクトルを求める方法は、当業者によって広く知られており、本発明とは直接関係しないことから、その詳細は省略する。
【０２０７】
以上の操作により、図３３に示すような形式の固有ベクトル辞書が得られる。固有ベクトルは特徴量の数（ｐ個）だけ得られる。図３３では、１行分が１つの固有ベクトルを表わしている。
【０２０８】
図３０は、固有ベクトル辞書作成手段３３１の構成及び処理を説明するための図である。図３０を参照して、固有ベクトル辞書作成手段３３１の動作について説明する。
【０２０９】
個人特徴データが入力されると、特徴量記憶手段３３２に記憶される。
【０２１０】
次に、特徴量平均算出手段３３３において、特徴量の平均値を求め、係数記憶手段３３６に記憶する。次に分散共分散行列算出手段３３４において、分散共分散行列Ｃｘｘを算出する。
【０２１１】
次に、固有ベクトル算出手段３３５において、分散共分散行列Ｃｘｘの固有ベクトルを算出し、係数記憶手段３３６に記憶する。最後に、係数記憶手段３３６のデータを出力して終了する。
【０２１２】
図１６は、本発明に係るロボット装置の一実施の形態の構成を示す図である。図１６を参照すると、このロボット装置２０１は、ＣＣＤカメラ２０２と、人物検出識別手段２０３と、辞書データ管理部２０４と、人物検出識別管理部２０５と、全体制御部２０６と、スピーカ２０７と、ロボット移動手段２０８とを備えている。ロボット移動手段２０８は、モータ２０９と、車輪２１０とを備えている。
【０２１３】
人物検出識別手段２０３は、ＣＣＤカメラ２０２からのステレオ映像を基に、人物検出と識別を行なっている。人物検出識別管理部２０５は、人物検出識別手段２０３との情報のやり取り、全体制御部２０６との情報のやり取り、辞書データ管理部２０４との情報のやり取りを行なっている。スピーカ２０７は全体制御部２０６に接続され、全体制御部２０６の指示で発話することができる。また全体制御部２０６は、ロボット移動手段２０８に移動指示を送る。ロボット移動手段２０８はモータ２０９と複数の車輪２１０を持ち、ロボットを自由な方向に移動させることができる。
【０２１４】
図２６は、本発明に係るロボット装置の一実施の形態の処理を説明するための図である。図１６及び図２６を参照して、本発明の一実施の形態のロボット装置２０１の動作について説明する。ロボット装置２０１は、人物を検出すると、人物の方向に移動して近づいていき、予め定められた所定の距離以内に近づいたら人物識別を行なう。
【０２１５】
人物検出識別管理部２０５は、人物検出識別手段２０３内の頭部検出追跡手段から、検出した頭部の矩形情報と、対面距離値を取得し、全体制御部２０６に送信する（ステップＳ７１）。
【０２１６】
全体制御部２０６では対面距離値を参照し、対面距離がしきい値よりも近いかどうかを判定する（ステップＳ７３）。距離がしきい値よりも遠いときは、ロボット移動手段２０８に指令して、人物の方向に前進する（ステップＳ７２）。
【０２１７】
人物の存在する概略的な方向は、画像内の頭部矩形座標から類推することができる。ステップＳ７１からステップＳ７３を繰り返し、対面距離がしきい値よりも近くなったら、全体制御部２０６は、人物検出識別管理部２０５から人物識別結果を取得する（ステップＳ７４）。
【０２１８】
次に、人物別の音声をスピーカ２０７から発声し、人物識別したことを対話者に知らせる（ステップＳ７５）。
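The control flow of steps S71 to S75 can be sketched as a simple loop. The four callables stand in for the person detection and identification management unit, the robot moving means, and the speaker; their names and signatures, as well as the `max_steps` safety limit, are assumptions added for this illustration.

```python
def approach_and_greet(get_head, move_toward, get_identity, speak,
                       distance_threshold, max_steps=100):
    """Approach a detected person, then identify and greet them.

    get_head()        -> (head_rectangle, facing_distance)
    move_toward(rect) -> moves the robot toward the head rectangle
    get_identity()    -> identification result for the person
    speak(person)     -> utters the voice assigned to that person
    """
    for _ in range(max_steps):
        rect, distance = get_head()        # step S71
        if distance < distance_threshold:  # step S73
            person = get_identity()        # step S74
            speak(person)                  # step S75
            return person
        move_toward(rect)                  # step S72
    return None  # gave up without getting close enough


# Simulated run: the person is 3.0, then 2.0, then 0.5 units away.
distances = [3.0, 2.0, 0.5]
log = []
result = approach_and_greet(
    get_head=lambda: ((0, 0, 10, 10), distances.pop(0)),
    move_toward=lambda rect: log.append("move"),
    get_identity=lambda: "person-1",
    speak=lambda person: log.append("hello " + person),
    distance_threshold=1.0,
)
print(result, log)  # person-1 ['move', 'move', 'hello person-1']
```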
【０２１９】
図１７は、本発明のロボット装置の一実施例の構成を示す図である。図１７を参照すると、ロボット装置２２１は、ＣＣＤカメラ２０２と、対面距離センサ２２３と、タッチセンサ２２２と、人物検出識別手段２２４と、辞書データ管理部２０４と、人物検出識別管理部２０５と、全体制御部２２７と、マイク２２５と、音声認識手段２２６と、スピーカ２０７と、ロボット移動手段２０８とを備えて構成されている。ロボット移動手段２０８は、モータ２０９と、車輪２１０とを備えている。
【０２２０】
人物検出識別手段２２４は、ＣＣＤカメラ２０２からのステレオ映像と、対面距離センサ２２３との情報を元に、人物検出と識別を行なっている。人物検出識別管理部２０５は、人物検出識別手段２２４との情報のやり取り、全体制御部２２７との情報のやり取り、辞書データ管理部２０４との情報のやり取りを行なっている。スピーカ２０７は全体制御部２２７に接続され、全体制御部２２７の指示で発話することができる。また全体制御部２２７は、ロボット移動手段２０８に移動指示を送る。ロボット移動手段２０８はモータ２０９と複数の車輪２１０を持ち、ロボットを自由な方向に移動させることができる。タッチセンサ２２２は、全体制御部２２７に接続されており、外部から物体の接触の有無と、接触の強さを検出する。マイク２２５は音声認識手段２２６に接続され、音声認識手段２２６は全体制御部２２７に接続されている。音声認識手段２２６は、マイク２２５からの音声データから、人の言葉を自動認識し、認識結果を、全体制御部に送信する。
【０２２１】
図２７は、本発明の一実施例のロボット装置の処理を説明するための流れ図である。図１７及び図２７を参照して、本発明の一実施例のロボット装置２２１の動作例について説明する。
【０２２２】
ロボット装置２２１は、対面している人物の検出識別を行ない、対話者の反応によって識別辞書を更新することによって、人物画像の逐次学習を行なう。
【０２２３】
全体制御部２２７は、人物検出識別管理部２０５から人物識別結果を取得する（ステップＳ８１）。
【０２２４】
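Next, an action specific to each identified person is performed (step S82). A specific action here refers to any behavior such as speaking the person's name or moving the wheels in a particular direction depending on the person.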
【０２２５】
次に、対話者の反応がセンスされるのを待つ（ステップＳ８３）。対話者の反応が得られたときは、次にその反応がポジティブなものかネガティブなものかを判定する（ステップＳ８４）。ポジティブな反応とは、「Ｙｅｓ」という意味であり、例えば音声認識で，「はい」を認識した時などがある。
【０２２６】
ネガティブな反応とは、「Ｎｏ」という意味であり、例えば音声認識で，「いいえ」を認識した時などがある。
【０２２７】
また、例えば、あらかじめタッチセンサを１度押下すると「はい」、２度押下すると「いいえ」という規則を予め全体制御部で決めておくことで、タッチセンサを用いて対話者の反応を取得することができる。
【０２２８】
ステップＳ８４がネガティブな反応の場合は、もう一度ステップＳ８１から動作を繰り返す。ポジティブな反応の場合には、識別に使用した顔特徴データを辞書データ管理部２０４に入力する（ステップＳ８５）。
【０２２９】
そして識別辞書を作成し更新する（ステップＳ８６）。
【０２３０】
終了指示があれば終了し、なければ、再びステップＳ８１から動作を繰り返す（ステップＳ８７）。
【０２３１】
図２８は、本発明のロボット装置の他の実施例の処理を説明するための流れ図である。図１７及び図２８を参照して、本発明の他の実施例をなすロボット装置２２１の動作の例について説明する。ロボット装置２２１は、対面している人物の検出識別を行ない、対話者の命令によって新しい人物の画像を取得し、新しい人物辞書を登録する。
【０２３２】
全体制御部２２７は、人物検出識別管理部２０５から人物識別結果を取得する（ステップＳ９１）。
【０２３３】
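Next, an action specific to each identified person is performed (step S92). A specific action here refers to any behavior such as speaking the person's name or moving the wheels in a particular direction depending on the person.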
【０２３４】
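Next, the robot waits until a command from the interlocutor is sensed (step S93). When a command event is obtained, it determines whether the event is a registration command (step S94). One example of a registration command event is the speech recognizer recognizing the word "touroku" ("register"). Alternatively, by defining in advance in the overall control unit a rule that a single tap on the touch sensor is a registration event, a registration command event can be generated using the touch sensor.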
【０２３５】
登録命令が来たときは、登録処理を行い、新しい人物の登録を行なう（ステップＳ９５）。登録処理ステップＳ９５の一例は、図２５に示されており、既に説明済みである。
【０２３６】
登録処理が終了すると、再びステップＳ９１から処理を開始する。登録命令イベントが来なかった場合は、終了判定を行なう（ステップＳ９６）。
【０２３７】
終了命令があれば終了し、終了命令がない場合は、再びステップＳ９１から処理を開始する。
【０２３８】
図３５は、本発明の関連発明に係る人物検出識別システムの一実施の形態の構成を示す図である。図３５を参照すると、本発明の関連発明に係る人物検出識別システムは、映像取得手段２と、対面距離センサ５と、人物検出識別手段５０１と、辞書データ管理部１２と、人物検出識別管理部５０２と、記録媒体５０３とを備えている。記録媒体５０３には、人物検出識別処理プログラムを記録しており、磁気ディスク、半導体メモリ、CD-ROMその他の記録媒体であってよい。
【０２３９】
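The person detection and identification processing program is read from the recording medium 503 into the person detection and identification means 501 and the person detection and identification management unit 502, and executes the same processing as that performed by the person detection and identification means 1 and the person detection and identification management unit 13 in the embodiment described above with reference to FIG. 1.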
【０２４０】
【発明の効果】
以上説明したように、本発明によれば、次のような効果を奏する。
【０２４１】
本発明のロボット装置で用いられる人物識別装置は、例えば家庭環境のように、照明条件の変動が激しい環境下においても、極めて高い識別率で人物を識別することができる、という効果を奏する。
【０２４２】
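The reasons are as follows. First, the present invention performs detection and identification using not only the gray levels of the image but also motion and facing distance information. Second, the face alignment means detects the frontal face accurately. Third, the face identification means performs similarity identification using a linear discriminant dictionary.
【０２４３】
In addition, the apparatus or system is configured to learn by itself through dialogue with a person (the user), which raises identification accuracy.
【０２４４】
Furthermore, according to the robot apparatus of the present invention, even if the input image at one location is too poor for identification, the apparatus can subsequently move by itself and acquire a better image.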
【図面の簡単な説明】
【図１】本発明のロボット装置で用いられる人物識別装置の一実施の形態の構成を示すブロック図である。
【図２】本発明のロボット装置で用いられる人物識別装置における頭部検出追跡手段の一実施例の構成を示すブロック図である。
【図３】本発明のロボット装置で用いられる人物識別装置における頭部検出追跡手段の別の実施例の構成を示すブロック図である。
【図４】本発明のロボット装置で用いられる人物識別装置における単眼視頭部矩形座標検出手段の一実施例の構成を示すブロック図である。
【図５】本発明のロボット装置で用いられる人物識別装置における頭部検出処理を説明するための図である。
【図６】本発明のロボット装置で用いられる人物識別装置における左右画像照合処理を説明するための図である。
【図７】本発明のロボット装置で用いられる人物識別装置における頭部追跡処理を説明するための図である。
【図８】本発明のロボット装置で用いられる人物識別装置における正面顔画像を特徴データに変換する際のラスタスキャンの説明図である。
【図９】本発明のロボット装置で用いられる人物識別装置における正面顔位置合わせ手段の一実施例の構成を示すブロック図である。
【図１０】本発明のロボット装置で用いられる人物識別装置における正面顔探索手段の一実施例の構成を示すブロック図である。
【図１１】本発明のロボット装置で用いられる人物識別装置における正面顔探索手段の他の実施例の構成を示すブロック図である。
【図１２】本発明のロボット装置で用いられる人物識別装置における頭部検出処理と顔位置合わせ処理を説明するための図である。
【図１３】本発明のロボット装置で用いられる人物識別装置における顔識別手段と識別辞書記憶手段と識別結果補正手段の各実施例の構成を示すブロック図である。
【図１４】本発明のロボット装置で用いられる人物識別装置における辞書データ管理部の一実施例の構成を示すブロック図である。
【図１５】本発明のロボット装置で用いられる人物識別装置における顔識別手段と識別辞書記憶手段と識別結果補正手段の各実施例の構成を示すブロック図である。
【図１６】本発明のロボット装置の一実施例の構成を示す図である。
【図１７】本発明のロボット装置の他の実施例の構成を示す図である。
【図１８】本発明のロボット装置で用いられる人物識別装置における人物検出処理と識別処理を説明するための流れ図である。
【図１９】本発明のロボット装置で用いられる人物識別装置における頭部検出追跡処理を説明するための流れ図である。
【図２０】本発明のロボット装置で用いられる人物識別装置における頭部検出処理を説明するための流れ図である。
【図２１】本発明のロボット装置で用いられる人物識別装置における正面顔位置合わせ処理を説明するための流れ図である。
【図２２】本発明のロボット装置で用いられる人物識別装置における正面顔探索処理を説明するための流れ図である。
【図２３】本発明のロボット装置で用いられる人物識別装置における正面顔探索処理を説明するための流れ図である。
【図２４】本発明のロボット装置で用いられる人物識別装置における顔識別処理を説明するための流れ図である。
【図２５】本発明のロボット装置で用いられる人物識別装置における人物辞書登録処理を説明するための流れ図である。
【図２６】本発明のロボット装置における行動制御を説明するための流れ図である。
【図２７】本発明のロボット装置における行動制御を説明するための流れ図である。
【図２８】本発明のロボット装置における行動制御を説明するための流れ図である。
【図２９】本発明のロボット装置で用いられる人物識別装置における線形判別辞書作成手段の一実施例の構成を示すブロック図である。
【図３０】本発明のロボット装置で用いられる人物識別装置における固有ベクトル辞書作成手段の一実施例の構成を示すブロック図である。
【図３１】本発明のロボット装置で用いられる人物識別処理に用いる線形判別辞書作成方法を説明するための図である。
【図３２】本発明のロボット装置で用いられる人物識別処理に用いるステレオ視による対面距離算出方法を説明するための図である。
【図３３】本発明のロボット装置で用いられる人物識別処理に用いる固有ベクトル投影距離辞書の説明図である。
【図３４】本発明のロボット装置で用いられる人物識別処理に用いる線形判別辞書の説明図である。
【図３５】本発明のロボット装置で用いられる人物識別装置の他の実施例の構成を示すブロック図である。
【符号の説明】
１人物検出識別手段
２映像取得手段
３右カメラ
４左カメラ
５対面距離センサ
６頭部検出追跡手段
７正面顔位置合わせ手段
８顔特徴量抽出手段
９顔識別手段
１０識別辞書記憶手段
１１識別結果補正手段
１２辞書データ管理部
１３人物検出識別管理部
１４人物識別装置
２１頭部検出手段
２２頭部追跡手段
２３頭部矩形記憶手段
２４単眼視頭部矩形座標検出手段
２５左右画像照合手段
２６対面距離評価手段
２７頭部検出追跡手段
２８右カメラ
２９左カメラ
３０対面距離センサ
３１対面距離統合手段
３２頭部検出追跡手段
３３頭部検出手段
３４ＣＣＤカメラ
４１頭部矩形座標検出手段
４２動き画素検出手段
４３人物数評価手段
４４頭頂座標検出手段
４５頭部下部座標検出手段
４６側頭座標検出手段
４７動き領域幅
４８統合差分画像Ｇ
４９統合差分画像Ｇ
５０統合差分画像上部領域
５１頭部検出矩形
５２探索結果
５３前フレームの頭部矩形座標
５４サーチ軌跡
５５前フレームの頭部矩形画像
６１正面顔位置合わせ手段
６２頭部矩形切り取り手段
６３正面顔探索手段
６４標準顔辞書デ−タ記憶手段
６５正面顔らしさ判定手段
６６濃度分散判定手段
６７しきい値処理手段
７１正面顔探索手段
７２正面顔候補抽出手段
７３中間縮小画像記憶手段
７４コントラスト補正手段
７５固有ベクトル投影距離算出手段
７６標準顔辞書記憶手段
７７記憶手段
７８投影距離最小判定手段
７９探索範囲終了判定手段
８０標準顔平均データ記憶手段
８１標準顔固有ベクトルデータ記憶手段
８２平均差分手段
８３ベクトル投影値算出手段
８４再構成演算手段
８５投影距離計算手段
８６投影距離最小値記憶手段
８７正面顔濃淡値記憶手段
８８頭部中間サイズ算出手段
８９頭部矩形画像記憶手段
９０画像縮小手段
９１中間サイズ記憶手段
９２多重解像度処理終了判定手段
１０１正面顔探索手段
１０２積和演算手段
１０３類似度最大判定手段
１０４標準顔辞書データ記憶部
１０５記憶手段
１０６類似度最大値記憶手段
１０７正面顔濃淡値記憶手段
１１１顔識別手段
１１２識別辞書記憶手段
１１３識別結果補正手段
１１４識別結果加重平均算出手段
１１５特徴データ記憶手段
１１６積和演算手段
１１７しきい値処理手段
１１８最大類似度人物判定手段
１１９登録人物用識別辞書記憶手段
１２１辞書データ管理部
１２２個人特徴データ管理部
１２３識別辞書生成手段
１２４セレクタ
１２５人物別特徴データ
１２６線形判別辞書作成手段
１２７固有ベクトル辞書作成手段
１３１顔識別手段
１３２識別辞書記憶手段
１３３他人判別手段
１３４他人判別用辞書記憶手段
１５１頭部矩形
１５２頭部矩形画像
１５３縮小頭部矩形画像
１５４正面顔探索形状
１５５正面顔画像
２０１ロボット装置
２０２ＣＣＤカメラ
２０３人物検出識別手段
２０４辞書データ管理部
２０５人物検出識別管理部
２０６全体制御部
２０７スピーカ
２０８ロボット移動手段
２０９モータ
２１０車輪
２２１ロボット装置
２２２タッチセンサ
２２３対面距離センサ
２２４人物検出識別手段
２２５マイク
２２６音声認識手段
２２７全体制御部
３１１線形判別辞書作成手段
３１２特徴量Ｘ記憶手段
３１３分散共分散行列Ｃｘｘ算出手段
３１４逆行列変換手段
３１５行列乗算手段
３１６定数項算出手段
３１７目的変数Ｙ記憶手段
３１８目的変数Ｙ生成手段
３１９共分散行列Ｃｘｙ算出手段
３２０係数記憶手段
３３１固有ベクトル辞書作成手段
３３２特徴量記憶手段
３３３特徴量平均算出手段
３３４分散共分散行列算出手段
３３５固有ベクトル算出手段
３３６係数記憶手段
３４１個人特徴データＸ
３４２目的変数Ｙ
３４３特徴データ平均値
３４４目的変数平均値
４０１右カメラ
４０２左カメラ
４０３対象物
４１１１番目の固有ベクトル
４１２２番目の固有ベクトル
４１３Ｐ番目の固有ベクトル
４１４平均値
４２１定数項
４２２クラス１の識別係数
４２３クラス２の識別係数
４２４クラスｑの識別係数
４２５乗算項
５０１人物検出識別処理部
５０２人物検出識別管理部
５０３記録媒体

[0001]
[Technical field to which the invention belongs]
The present invention relates to a technique for identifying a person in video, and more particularly to a person identification technique based on the frontal face and to a robot apparatus that identifies persons.
[0002]
[Prior art]
Several methods for identifying a person using facial images have been proposed. Recent trends in face detection and identification technologies are described in, for example, Reference (1) (Shigeru Akamatsu, “Face Recognition by Computer-Survey”, IEICE Transactions, Vol. J80-D-II, No. 8, pp.2031-2046, August 1997). In general, a face identification system includes a process for detecting a face from an image, a feature extraction process from a face pattern, and a person identification process for comparing feature quantities with dictionary data.
[0003]
As a detection method of face images, reference (2) (Shin Kosugi, “Finding and locating faces in scenes using multiple pyramids for personal identification”, IEICE Transactions, Vol. J77-D- II, No.4, pp.672-681, April 1994), which performs template matching using shading patterns, and literature (3) (M. Turk, A. Pentland, “Face Recognition”). on Eigenfaces ”, Proceedings of IEEE, CVPR91), a face image eigenvector projection distance method is known.
[0004]
In addition, Japanese Patent Laid-Open No. 9-251534, for example, proposes a method of detecting facial parts such as the eyes, nose, and mouth and cutting out a front face gray-level pattern based on their positional relationship.
[0005]
As a typical example of face detection, the eigenvector projection distance method by M. Turk et al. will be described.
[0006]
A large number (several hundred) of front face images are prepared in advance. Using their pixel values as feature vectors, eigenvalues and eigenvectors are obtained. The p eigenvectors Vn (n = 1, ..., p) are taken in descending order of eigenvalue.
[0007]
When the test image t is projected onto the eigenvector Vn, p projection values are obtained. A reconstructed test image t ′ is obtained by reconstructing the test image from these projection values and the eigenvector Vn.
[0008]
If t is close to a face pattern, the reconstructed test image t ′ is also an image close to a face pattern. Therefore, whether or not t is a face is determined using the distance measure Dt given by the following equation (1).
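The projection and reconstruction steps of paragraphs [0006] to [0008] can be sketched as follows. This is a minimal illustration, not the patent's equation (1), which is not reproduced in this text: it assumes that Dt is the squared Euclidean distance between t and t ′, that the eigenvectors are orthonormal, and that a mean face is subtracted before projection. The function name `projection_distance` and its parameters are hypothetical.

```python
def projection_distance(t, mean, eigvecs):
    """Squared distance between test image t and its reconstruction t'.

    t, mean : flat lists of pixel values of equal length
    eigvecs : list of p orthonormal eigenvectors Vn (assumption)
    """
    # Assumption: the mean face is subtracted before projecting.
    centered = [ti - mi for ti, mi in zip(t, mean)]
    # One projection value per eigenvector Vn (p values in total).
    coeffs = [sum(c * v for c, v in zip(centered, vec)) for vec in eigvecs]
    # Reconstruct t' from the projection values and the eigenvectors.
    recon = list(mean)
    for a, vec in zip(coeffs, eigvecs):
        for i, v in enumerate(vec):
            recon[i] += a * v
    # Assumed distance measure: squared Euclidean distance between t and t'.
    return sum((ti - ri) ** 2 for ti, ri in zip(t, recon))
```

A small value indicates that t lies close to the subspace spanned by the face eigenvectors; the decision threshold is not specified in the text and would have to be chosen separately.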
[0009]

[0010]
There are two types of features for face identification: those that use geometric features of facial parts such as the eyes, nose, and mouth, and those that rely on matching of global gray-level patterns. Because the positional relationship of the facial parts in a face pattern in a scene changes as the face orientation and facial expression change, the latter method, which uses global gray-level patterns, is currently the mainstream.
[0011]
As an example of face image identification and collation methods, the above Reference (2) (Shin Kosugi, “Finding and locating faces in scenes using multiple pyramids for personal identification”, IEICE Transactions, Vol. J77-D-II, No. 4, pp. 672-681, April 1994) treats the gray-level pattern as a feature vector and takes the category with the largest inner product between feature vectors as the identification result. In the above Reference (3) (M. Turk, A. Pentland, “Face Recognition Using Eigenfaces”, Proceedings of IEEE, CVPR91), the projection values of the face image onto the eigenvectors are used as a feature vector, and the category with the smallest Euclidean distance is taken as the identification result.
[0012]
A conventional robot apparatus having an image recognition function is described, for example, in Japanese Patent Application No. 10-151591. This robot apparatus can extract color information from an image and change its operation according to the color pattern. However, it has no means for recognizing a person.
[0013]
[Problems to be solved by the invention]
The above-described conventional system has the following problems.
[0014]
The first problem is that a person cannot be identified in an environment where lighting conditions are not constant, such as a home environment.
[0015]
This is because it is difficult to detect a face in a general environment. For example, with the template matching method, detection is difficult unless the face pattern in the image and the dictionary pattern have almost the same gray values, and a face is almost undetectable when the illumination direction is slightly shifted or the person differs from the person in the dictionary. The eigenvector projection distance method has higher detection performance than template matching, but it likewise fails on images with a different illumination direction or a complicated background.
[0016]
Another reason why the person cannot be identified in an environment where the lighting conditions are not constant is that the conventional feature extraction method and the identification method cannot absorb the variation of the feature amount due to the illumination variation.
[0017]
Accordingly, the present invention has been made in view of the above problems, and an object thereof is to provide a robot apparatus that can identify a person in a general environment such as a home environment.
[0018]
Another object of the present invention is to provide a robot apparatus that can stably identify a person in a general environment.
[0019]
[Means for Solving the Problems]
The robot apparatus according to the present invention that achieves the above-described object includes, as a person identification device, video acquisition means for acquiring an image, head detection and tracking means for detecting a human head from the image, front face alignment means for acquiring a front face image from the partial image of the detected head, face feature extraction means for converting the front face image into a feature quantity, face identification means for identifying a person from the feature quantity using an identification dictionary, and identification dictionary storage means for storing the identification dictionary. The head detection and tracking means includes monocular head rectangular coordinate detection means for detecting a head from a single image, and facing distance evaluation means for removing erroneous head detections based on the facing distance value and the head rectangular coordinate values. The robot apparatus further includes an overall control unit that controls the operation of the robot, a speaker that outputs speech according to instructions from the overall control unit, moving means that moves the robot according to instructions from the overall control unit, a facing distance sensor that measures the distance to an object in front, a touch sensor, a microphone, and voice recognition means.
[0020]
In the present invention, when a person identification result is obtained, the overall control unit performs control so that a different voice is uttered for each person.
[0021]
In the present invention, the overall control unit includes means for obtaining, from the person identification device, the facing distance and direction to an object in front and the person identification result; means for moving the robot so as to approach the object in front when the facing distance is equal to or greater than a threshold value; and means for performing control so that, when the facing distance is less than or equal to the threshold value, the person identification result is spoken with a different voice for each person.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described in detail with reference to the drawings.
[0023]
FIG. 1 is a diagram showing a configuration of an embodiment of a person identification device used in a robot apparatus according to the present invention. Referring to FIG. 1, a person identification device 14 according to an embodiment of the present invention includes a video acquisition unit 2, a facing distance sensor 5, a person detection / identification unit 1, a person detection / identification management unit 13, and a dictionary data management unit 12. The video acquisition means 2 includes a right camera 3 and a left camera 4 and acquires the video information of each camera. The facing distance sensor 5 is installed in the same direction as the optical axis of the camera and measures the facing distance to an object in the video. Examples of the facing distance sensor 5 include an ultrasonic sensor and an infrared sensor. The person detection / identification management unit 13 transmits an operation start command and an operation end command to the person detection / identification unit 1, and transmits feature data and a dictionary creation command to the dictionary data management unit 12.
[0024]
The term "camera" as used in the embodiments of the present invention collectively refers to imaging devices that can output a moving scene as a sequence of still images, such as video cameras and digital CCD cameras.
[0025]
The person detection and identification unit 1 includes a head detection and tracking unit 6, a front face alignment unit 7, a face feature extraction unit 8, a face identification unit 9, an identification dictionary storage unit 10, and an identification result correction unit 11.
[0026]
Upon receiving the operation start command from the person detection / identification management unit 13, the person detection / identification unit 1 starts the operation after loading the identification dictionary from the dictionary data management unit 12 into the identification dictionary storage unit 10.
[0027]
FIG. 18 is a flowchart for explaining the processing of the person detection / identification means 1 according to the embodiment of the present invention. The operation of the person detection / identification means 1 will be described with reference to FIGS.
[0028]
First, the head detection and tracking unit 6 outputs the number of human heads in the current frame and the head rectangular coordinates, based on the image information from the video acquisition unit 2 and the reading of the facing distance sensor 5 (step S1).
[0029]
Next, the number of detected heads is evaluated (step S2). If the number of detected heads is 0, the next frame image is input and head detection is performed again; step S1 is repeated until the number of detections becomes 1 or more.
[0030]
When the number of detected heads is 1 or more, the detection result is transmitted to the front face alignment means 7. The front face alignment means 7 performs a face area search process (step S3) and determines whether a front face area has been found (step S4).
[0031]
When a front face is found, a front face image that is a rectangular image at the center of the face is output. The processes in steps S3 and S4 are intended to eliminate erroneous detection of the head, and to extract only the video in which the person is facing the front of the camera and send it to the subsequent process. If the front face cannot be found, the process is repeated from step S1.
[0032]
When a front face is found, the face feature extraction means 8 converts the front face image into feature amount data (step S5).
[0033]
As shown in FIG. 8, one example of the face feature extraction means 8 scans the front face image line by line from left to right, moving down to the next line when one line ends, to generate one-dimensional data (this is called a "raster scan"), which is used as the feature data. Alternatively, the image may first be filtered with a first-order or second-order differential filter to extract edge information, and the result may be raster-scanned to obtain the feature data.
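The raster scan and the optional differential filtering can be sketched as follows. This is an illustrative sketch only: the image is assumed to be a list of rows of gray values, the function names are hypothetical, and the simple horizontal difference stands in for whichever first-order differential filter is actually used.

```python
def raster_scan(image):
    """Flatten a 2-D image (list of rows) into one-dimensional feature data,
    left to right within a line, lines taken from top to bottom."""
    return [pixel for row in image for pixel in row]


def horizontal_gradient(image):
    """First-order difference along each line (assumed filter choice).
    The last column has no right neighbour and is set to 0."""
    return [
        [row[x + 1] - row[x] for x in range(len(row) - 1)] + [0]
        for row in image
    ]


def edge_features(image):
    """Edge information extracted first, then raster-scanned."""
    return raster_scan(horizontal_gradient(image))
```

For a front face image of Fw by Fh pixels, both variants produce a feature vector of length Fw × Fh.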
[0034]
Next, the face identification means 9 performs face identification processing with reference to the dictionary data in the identification dictionary storage means 10 (step S6).
[0035]
Next, the identification result correction unit 11 integrates the result with the identification results for the past m frames (m is an integer of 2 or more) (step S7), and outputs the result to the person detection / identification management unit 13 (step S8).
[0036]
At this time, when a plurality of head rectangles have been detected by the head detection / tracking means 6 (step S1) and not all of them have been processed (No branch in step S9), processing is performed again from the front face alignment means 7 (step S3). The person detection / identification unit 1 ends upon receiving an end instruction from the person detection / identification management unit 13 (step S10). Until an end instruction is given, the process is repeated from step S1.
[0037]
FIG. 2 is a diagram showing the configuration of the head detection / tracking means 27 constituting an embodiment of the head detection / tracking means 6 of FIG. Referring to FIG. 2, the head detection tracking unit 27 includes a head detection unit 21, a head tracking unit 22, and a head rectangular coordinate storage unit 23. The head detecting unit 21 includes a head rectangular coordinate detecting unit 24, a left / right image collating unit 25, a facing distance integrating unit 31, and a facing distance evaluating unit 26.
[0038]
FIG. 19 is a flowchart for explaining the processing of the head detection / tracking means 27. With reference to FIGS. 2 and 19, the operation of the head detection and tracking means 27 according to an embodiment of the present invention will be described.
[0039]
The right camera image, the left camera image, and the reading value of the facing distance sensor are input to the head detecting means 21. The head detection means 21 performs a human head detection process from the input information, and outputs head rectangular coordinates and the number of head detections (step S10).
[0040]
When the number of detected heads is 1 or more, the number of detected heads and the head rectangular coordinates are stored in the head rectangular coordinate storage means 23 and then output (step S18).
[0041]
If the number of detected heads is 0, the head tracking means 22 takes out the head rectangle information in the previous frame from the head rectangle storage means 23 and performs head tracking processing (step S19).
[0042]
If head tracking is successful, the number of heads and head rectangular coordinates that have been successfully tracked are output, and if tracking fails, the number of detections is 0 (step S20).
[0043]
Next, the operation of the head detecting means 21 in step S10 of FIG. 19 will be described in detail. The head detecting means 21 first inputs either the left or the right video to the head rectangular coordinate detecting means 24 to obtain the provisional number of detected heads and the provisional head rectangular coordinates (step S11).
[0044]
The head rectangular coordinate detection means 24 shown in FIG. 2 uses a right camera image. Next, the left and right image collating means 25 calculates the facing distance value based on the principle of stereo vision using the obtained head rectangular coordinates and the left and right camera images (step S12 in FIG. 19).
[0045]
With reference to FIG. 6, the operation of the left and right image matching means 25 in FIG. 2 will be described. A head rectangle detected in the right camera image is set as a head detection rectangle 51. Then, the vicinity of the same coordinate position in the left camera image is searched using the image data in the head detection rectangle 51. An example of the search method is template matching. If the gray value of the right camera image is FR (x, y), the gray value of the left camera image is FL (x, y), the horizontal size of the rectangle is Tw, and the vertical size is Th, then the matching distance Dtm when the upper left starting point of the template is at (sx, sy) in the left camera image is expressed by the following equation (2).
[0046]

[0047]
The above equation (2) represents the Euclidean distance between the partial images of the right camera and the left camera. The coordinates on the left camera image at which Dtm is smallest are taken as the search result 52. Once the search result 52 is obtained, the left and right rectangular coordinate values are compared to calculate the distance to the person's head.
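The search described for equation (2) can be sketched as follows. Since the equation itself is not reproduced in this text, the sketch assumes Dtm is the sum of squared gray-value differences over the Tw × Th rectangle; the function names, the `(x0, y0, tw, th)` rectangle layout, and the `search_radius` parameter are hypothetical.

```python
def matching_distance(right, left, rect, sx, sy):
    """Assumed form of Dtm: sum of squared differences between the head
    rectangle in the right image and a same-sized patch of the left image
    whose upper-left corner is at (sx, sy)."""
    x0, y0, tw, th = rect
    return sum(
        (right[y0 + y][x0 + x] - left[sy + y][sx + x]) ** 2
        for y in range(th)
        for x in range(tw)
    )


def search_left_image(right, left, rect, search_radius):
    """Search the neighbourhood of the same coordinates in the left image
    and return (Dtm, sx, sy) for the position with the smallest Dtm."""
    x0, y0, tw, th = rect
    height, width = len(left), len(left[0])
    best = None
    for sy in range(max(0, y0 - search_radius),
                    min(height - th, y0 + search_radius) + 1):
        for sx in range(max(0, x0 - search_radius),
                        min(width - tw, x0 + search_radius) + 1):
            d = matching_distance(right, left, rect, sx, sy)
            if best is None or d < best[0]:
                best = (d, sx, sy)
    return best
```

The exhaustive search is kept for clarity; a real-time implementation would typically restrict the search to a horizontal band, since parallel cameras shift the object mainly along x.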
[0048]
With reference to FIG. 32, an example of a method for calculating the distance to the object will be described. FIG. 32 is a top view of a situation in which a single target object 403 is captured using the left and right cameras. The right camera 401 and the left camera 402 are installed in parallel with an interval C between them. The angle of view of the cameras is θ and is the same for both. Let e be the lateral length of the imaging surface of the camera. In this state, the object 403 appears at the coordinate Xr in the right camera image and at the coordinate Xl in the left camera image. The maximum horizontal size of the image is W pixels. The facing distance Z from the camera imaging surface to the target object 403 can then be calculated by the following equation (3).

[0049]
Here, since e is a small value usually less than 1 cm, it can be approximated as 0. As described above, the facing distance is calculated from the left and right camera images.
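As a rough illustration of the stereo calculation, the sketch below uses the standard parallel-camera pinhole relation with e approximated as 0. It is not a transcription of equation (3), which is not reproduced in this text; the function name and parameter names are hypothetical.

```python
import math


def facing_distance(xr, xl, baseline, fov_deg, width_px):
    """Distance Z to an object seen at x = xr in the right image and
    x = xl in the left image (pixels), for parallel cameras.

    baseline : camera interval C (Z is returned in the same unit)
    fov_deg  : horizontal angle of view theta, in degrees
    width_px : horizontal image size W, in pixels
    """
    disparity = xl - xr  # positive for an object in front of both cameras
    if disparity <= 0:
        return None  # no valid depth (mismatch or object at infinity)
    # Focal length expressed in pixels, derived from the angle of view.
    focal_px = (width_px / 2) / math.tan(math.radians(fov_deg) / 2)
    return baseline * focal_px / disparity
```

For example, with C = 0.1 m, θ = 90 degrees, and W = 640 pixels, a disparity of 32 pixels corresponds to a distance of about 1 m.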
[0050]
After the left and right image collating means 25 calculates the facing distance by stereo vision, the facing distance integrating means 31 integrates it with the output value of the facing distance sensor 30 (step S13 in FIG. 19). Experimentally, distance sensors such as ultrasonic sensors are very accurate at distances of less than 1 m, whereas their error tends to increase at distances of 1 m or more.
[0051]
The distance value calculated by stereo vision is effective up to about 3 m, depending on the angle of view of the camera, but its error tends to increase when the object is too close. Therefore, as a method of integrating the two distance values, the value of the facing distance sensor 30 is adopted when its output is smaller than a certain threshold value T, and the distance value obtained by stereo vision is adopted when the output is larger than the threshold value T.
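The integration rule can be sketched in a few lines. The default threshold of 1.0 m is an assumption suggested by the accuracy remarks above, not a value given for T; the function name is hypothetical.

```python
def integrate_distance(sensor_m, stereo_m, threshold_m=1.0):
    """Choose between the facing distance sensor and stereo vision.

    The sensor reading is adopted when it is below the threshold T,
    otherwise the stereo value is adopted. The text does not say what
    happens when the reading equals T; here stereo is used (assumption).
    """
    if sensor_m < threshold_m:
        return sensor_m
    return stereo_m
```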
[0052]
After integrating the face-to-face distances, the face-to-face distance evaluation means 26 calculates the actual size of the head from the face-to-face distance value and the head rectangular coordinate value in the image (step S14 in FIG. 19).
[0053]
If the calculation result substantially matches the size of the human head, it is determined that the head is really detected. If the calculation result is significantly different from the actual head size, it is determined that it is a false detection (step S15 in FIG. 19). For example, when the horizontal size of the head is within 12 cm plus or minus 2 cm and the vertical size is within 20 cm plus or minus 4 cm, it is regarded as the head, and otherwise it is determined that it is not the head.
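The size check can be sketched as follows, using the 12 cm ± 2 cm and 20 cm ± 4 cm ranges given above. The conversion from pixels to centimetres assumes a pinhole camera with square pixels and a negligible imaging-surface width e; the function names and parameters are hypothetical.

```python
import math


def real_size_cm(size_px, distance_cm, fov_deg, width_px):
    """Convert a length in pixels to centimetres at the given distance."""
    # Width of the scene covered by the full image at this distance.
    scene_width_cm = 2 * distance_cm * math.tan(math.radians(fov_deg) / 2)
    return size_px * scene_width_cm / width_px


def is_head(rect_w_px, rect_h_px, distance_cm, fov_deg, width_px):
    """True when the rectangle's real size matches a human head."""
    w = real_size_cm(rect_w_px, distance_cm, fov_deg, width_px)
    h = real_size_cm(rect_h_px, distance_cm, fov_deg, width_px)
    return abs(w - 12.0) <= 2.0 and abs(h - 20.0) <= 4.0
```

With a 90 degree angle of view and a 640 pixel wide image, a 38 × 64 pixel rectangle at 100 cm corresponds to roughly 11.9 cm × 20 cm and is accepted, whereas a 100 pixel wide rectangle at the same distance is rejected.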
[0054]
If it matches the actual size, the number of detections is increased by 1 (step S16 in FIG. 19).
[0055]
When provisional head rectangular coordinates that have not yet been evaluated remain (No branch in step S17 in FIG. 19), the process is performed again from step S12. The head detecting means 21 outputs the number of detected heads and the head rectangular coordinates when all the provisional head rectangular coordinates have been evaluated (Yes branch in step S17 in FIG. 19).
[0056]
Next, the head rectangular coordinate detection means 24 shown in FIG. 2 will be described. FIG. 4 is a diagram showing the configuration of head rectangular coordinate detecting means 41, which constitutes an embodiment of the head rectangular coordinate detecting means 24. Referring to FIG. 4, the head rectangular coordinate detection means 41 includes moving pixel detection means 42, noise removal means 47, person number evaluation means 43, head top coordinate detection means 44, head lower coordinate detection means 45, and temporal coordinate detection means 46.
[0057]
FIG. 20 is a flowchart for explaining the processing of the head rectangular coordinate detection means 41. The operation of the head rectangular coordinate detection means 41 will be described with reference to FIGS. 20, 4, and 5.
[0058]
First, the moving pixel detection means 42 detects a pixel group that moves within the screen. The difference between the input image data and the image data input immediately before is taken to generate a difference image g (step S21).
[0059]
Furthermore, the difference image g of past m frames (m is an integer of 2 or more) is added and averaged to obtain an integrated difference image G (step S22). The integrated difference image G has a pixel value of 0 in a non-motion area, and a larger pixel value in a movement area.
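Steps S21 and S22 can be sketched as follows. The sketch assumes that the difference is taken as an absolute value per pixel, which is consistent with G being 0 in non-motion areas and larger in moving areas; the function names are hypothetical.

```python
def difference_image(current, previous):
    """Difference image g between the current and the preceding frame
    (absolute difference per pixel, an assumption)."""
    return [
        [abs(c - p) for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def integrated_difference(diff_images):
    """Integrated difference image G: pixel-wise average of the
    difference images of the past m frames."""
    m = len(diff_images)
    rows, cols = len(diff_images[0]), len(diff_images[0][0])
    return [
        [sum(g[y][x] for g in diff_images) / m for x in range(cols)]
        for y in range(rows)
    ]
```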
[0060]
Since the integrated difference image G contains a large amount of salt-and-pepper noise, the noise removal unit 47 performs noise removal processing (step S23). Examples of noise removal processing include dilation / erosion processing and median filter processing. These noise removal processes are common in the field of image processing, and processes well known to those skilled in the art are used.
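As one example of the noise removal mentioned above, a 3 × 3 median filter can be sketched as follows. The window size and the handling of border pixels (left unchanged) are assumptions made for brevity; the function name is hypothetical.

```python
from statistics import median


def median_filter3(image):
    """Apply a 3x3 median filter to a 2-D image (list of rows).
    Border pixels are copied unchanged (simplifying assumption)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out
```

An isolated bright pixel surrounded by zeros is replaced by 0, which is the behaviour wanted for isolated noise pixels in G.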
[0061]
Next, the number-of-persons evaluation means 43 in FIG. 4 evaluates how many people are on the screen. The operation of the person number evaluation means 43 will be described. FIG. 5 is a diagram for explaining an example of acquisition of the integrated difference image G.
[0062]
First, a method for detecting only one person will be described. If the integrated difference image G48 is obtained, it is first determined whether or not there is a motion region (step S24 in FIG. 20). Here, the motion area represents an area occupied by a moving pixel. If there is no motion area, that is, if the integrated difference image G is all zero, the number of persons is determined to be zero. Otherwise, the number of persons is 1.
[0063]
Next, a method for detecting a plurality of persons will be described. When the integrated difference image G49 is obtained, the presence or absence of a motion region is checked first (step S24 in FIG. 20). When there is no motion region, the number of persons is zero. If there is a motion region, the integrated difference image G is examined to determine how many people are present (step S25 in FIG. 20). As a determination method, for example, the number of persons is determined to be one when the maximum motion region width in the upper region 50 of the integrated difference image is smaller than a certain threshold value, and two when it is larger. When the number of persons is two, the persons are assumed to be standing side by side, and the integrated difference image G is divided into a partial area 1 and a partial area 2. Three or more people can be handled in the same way by increasing the number of divisions. To obtain the head rectangles, the processing described below (from step S26 to step S29 in FIG. 20) is repeated for each of the partial area 1 and the partial area 2.
[0064]
Next, processing for obtaining head rectangular coordinates from the integrated difference image G will be described. A motion region width 47 is obtained for each scan line (step S26 in FIG. 20).
[0065]
The motion region width 47 represents the difference between the maximum and minimum x-coordinate values of the motion region in each scan line.
[0066]
Next, the Y coordinate of the top of the head is obtained by the head top coordinate detection means 44 (step S27 in FIG. 20). As a method of obtaining the top-of-head coordinate, there is a method in which the minimum Y coordinate of the motion region is taken as the top of the head.
[0067]
Next, the Y coordinate of the bottom of the head rectangle is obtained by the head lower coordinate detecting means 45 (step S28 in FIG. 20). As a method for obtaining the bottom coordinate of the head rectangle, a method may be used in which the lines are searched downward (in the Y direction) from the top of the head to find lines whose motion region width 47 is smaller than the average value dm of the motion region width, and the largest Y coordinate among those lines is taken as the bottom of the head rectangle.
[0068]
Next, the left and right x-coordinates of the head rectangle are obtained by the temporal coordinate detection means 46 (step S29 in FIG. 20). As a method of obtaining the left and right x-coordinates, a method of obtaining the coordinates of the left and right ends of the motion region in the line having the largest motion region width 47 in the range from the top of the head to the lower part of the head may be used.
[0069]
When the number of persons is two or more, the processing from step S26 to step S29 in FIG. 20 is repeated for each partial region.
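Steps S26 to S29 can be sketched as follows for a single partial region. The sketch treats any pixel of G greater than 0 as moving, and interprets the bottom-edge rule as "walk down from the top of the head while the line width stays below the average width dm", which is one possible reading of the description rather than a confirmed one. All function names are hypothetical.

```python
def motion_spans(G):
    """Per scan line: (width, x_min, x_max) of the motion region,
    or None when the line contains no moving pixel (step S26)."""
    spans = []
    for row in G:
        xs = [x for x, v in enumerate(row) if v > 0]
        spans.append((xs[-1] - xs[0], xs[0], xs[-1]) if xs else None)
    return spans


def head_rectangle(G):
    """Return (left, top, right, bottom) of the head rectangle, or None."""
    spans = motion_spans(G)
    moving = [y for y, s in enumerate(spans) if s is not None]
    if not moving:
        return None
    top = moving[0]  # smallest Y of the motion region (step S27)
    dm = sum(spans[y][0] for y in moving) / len(moving)  # average width
    # Assumed reading of step S28: the head ends at the last line, going
    # down from the top, whose width is still smaller than dm.
    bottom = top
    for y in range(top, len(G)):
        if spans[y] is None or spans[y][0] >= dm:
            break
        bottom = y
    # Step S29: left/right ends of the widest line between top and bottom.
    widest = max(range(top, bottom + 1), key=lambda y: spans[y][0])
    _, left, right = spans[widest]
    return left, top, right, bottom
```

When two or more persons are detected, the same function would be applied to each partial region in turn.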
[0070]
Next, the operation of the head tracking means 22 in FIG. 2 will be described with reference to FIG. The tracking process is performed on the camera image (right camera image in FIG. 2) used for head rectangular coordinate detection. First, the head rectangular coordinate 53 of the previous frame and the head rectangular image 55 of the previous frame are read from the head rectangular storage means 23.
[0071]
Next, in the current frame, the vicinity region of the head rectangular coordinate 53 of the previous frame is searched by template matching, and the place with the smallest distance value is set as the tracking result.
[0072]
FIG. 3 is a diagram showing the configuration of the head detection / tracking means 32 according to another embodiment of the head detection / tracking means 6 of FIG. Referring to FIG. 3, the head detection and tracking unit 32 includes a head detection unit 33, a head rectangle storage unit 23, and a head tracking unit 22. This embodiment differs from the embodiment shown in FIG. 2 in that the head detecting means 33 has head rectangular coordinate detecting means 24 and facing distance evaluating means 26, and performs detection using the outputs of the monocular camera 34 and the facing distance sensor 30. That is, the head rectangle is evaluated using only the reading of the facing distance sensor, without considering the facing distance obtained by left and right stereo vision.
[0073]
As another example of the head detection and tracking means 6, head detection means may be configured to obtain the facing distance from the information of the left and right cameras alone and to evaluate the head rectangle without using the facing distance sensor. In this configuration, the head detection unit 21 is configured without the facing distance integration unit 31 of FIG.
[0074]
FIG. 9 is a diagram showing the configuration of the front face alignment means 61 constituting one embodiment of the front face alignment means 7 of FIG. Referring to FIG. 9, the front face alignment unit 61 includes a head rectangular cutout unit 62, a front face search unit 63, and a front face appearance determination unit 65.
[0075]
The front face appearance determination unit 65 includes a density dispersion determination unit 66 and a threshold processing unit 67.
[0076]
FIG. 21 is a flowchart for explaining the processing of the front face alignment means 61. The operation of the front face alignment means 61 will be described with reference to FIGS. When the image data, the head rectangular coordinates, and the facing distance are input, the front face alignment unit 61 outputs a front face presence / absence flag and front face image data. The input image data is cut into a partial image of the head rectangle by the head rectangle cutting means 62 (step S41). This partial image is referred to as a “head rectangular image”.
[0077]
Next, the front face search means 63 searches the front face area from the head rectangular image, and outputs the inter-pattern distance or similarity between the front face image and the standard face dictionary (step S42).
[0078]
Next, the front face likelihood determination means 65 determines whether or not the front face image is really a front face (step S43). If it is determined that the face is a front face, the front face presence / absence flag is “present” and a front face image is output. If it is determined that the face is not a front face, the front face presence / absence flag is “none” and no front face image is output.
[0079]
The front face appearance determination unit 65 includes a density dispersion determination unit 66 and a threshold processing unit 67.
[0080]
The density dispersion determining unit 66 obtains the variance of the gray value of the front face image data, and determines that the face is not a front face if it is below a certain threshold value (step S44 in FIG. 21).
[0081]
The density dispersion determining means 66 can eliminate a monotonous wall-like pattern.
[0082]
The threshold processing unit 67 determines the likelihood of a front face by thresholding the inter-pattern distance or the similarity (step S45 in FIG. 21).
[0083]
In the case of the inter-pattern distance value, it is determined that the face is not a front face when it is equal to or greater than the threshold value. In the case of similarity, it is determined that the face is not a front face when it is below the threshold.
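The two checks of steps S44 and S45 can be sketched as follows for the inter-pattern distance case. The thresholds are parameters because no values are given in the text, and treating a variance exactly equal to its threshold as "not a front face" is an assumption; the function name is hypothetical.

```python
from statistics import pvariance


def is_front_face(gray_values, pattern_distance, var_threshold, dist_threshold):
    """Front face likelihood determination (distance variant).

    gray_values      : flat list of gray values of the candidate image
    pattern_distance : inter-pattern distance to the standard face dictionary
    """
    # Step S44: reject monotonous patterns such as a plain wall.
    if pvariance(gray_values) <= var_threshold:
        return False
    # Step S45: reject when the distance is at or above the threshold.
    if pattern_distance >= dist_threshold:
        return False
    return True
```

When a similarity is used instead of a distance, the second comparison is reversed, so that a value below the threshold leads to rejection.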
[0084]
FIG. 12 is an explanatory view schematically showing the operation of the front face alignment means 61. If the head rectangle 151 is detected, a head rectangle image 152 is generated in step S41 of FIG.
[0085]
Next, in the face center portion search process in step S42 of FIG. 21, the front face image 155 is obtained after the reduced head rectangular image 153 is generated.
[0086]
The front face image is an image of the central part of the face, as shown in the front face image 155 of FIG. 12. It is an image of a region that completely includes both eyes in the horizontal direction and extends from the eyebrows to the entire mouth in the vertical direction.
[0087]
FIG. 10 is a diagram showing the configuration of the front face searching means 71 that constitutes one embodiment of the front face searching means 63. Referring to FIG. 10, this front face search means 71 includes head rectangular image storage means 89, head intermediate size calculation means 88, image reduction means 90, intermediate size storage means 91, front face candidate extraction means 72, intermediate reduced image storage means 73, contrast correction means 74, eigenvector projection distance calculation means 75, standard face dictionary storage means 76, storage means 77, projection distance minimum determination means 78, search range end determination means 79, and multi-resolution processing end determination means 92.
[0088]
The eigenvector projection distance calculation means 75 includes an average difference means 82, a vector projection value calculation means 83, a reconstruction calculation means 84, and a projection distance calculation means 85.
[0089]
The standard face dictionary storage unit 76 includes a standard face average data storage unit 80 and a standard face eigenvector data storage unit 81.
[0090]
The storage unit 77 includes a projection distance minimum value storage unit 86 and a front face gray value storage unit 87.
[0091]
FIG. 22 is a flowchart for explaining the process of the front face searching means 71. The operation of the front face searching means 71 will be described with reference to FIGS. The head rectangular image data is held in the head rectangular image storage unit 89. First, the head intermediate size calculating means 88 calculates the intermediate reduced size of the head rectangular image with reference to the face distance value and the size of the standard face dictionary data (step S101).
[0092]
A processing example of the head intermediate size calculating unit 88 will be described. The intermediate reduced size is shown as the vertical and horizontal size of the reduced head rectangular image 153 in FIG. The horizontal size of the head rectangular image 152 is Hw, and the vertical size is Hh. The horizontal size of the intermediate reduced size is Mw, and the vertical size is Mh. The front face image has a horizontal size Fw and a vertical size Fh. Fw and Fh are the same as the horizontal and vertical sizes of the front face search shape 154, and are uniquely determined by the standard face dictionary. Hh, Hw, Mh, Mw, Fh, and Fw are all sizes in units of pixels.
[0093]
The standard face dictionary is a pattern recognition dictionary generated by using the gray values of the front face area shown in the front face image 155 of FIG. 12 as feature values. The front face area is a region that completely includes both eyes in the horizontal direction and extends from the eyebrows to the entire mouth in the vertical direction. The front face area is not necessarily rectangular, and can be any continuous region that includes both eyes, the nose, and the mouth, such as an ellipse. However, a rectangular shape simplifies and speeds up the processing and is therefore effective as an implementation form. In the following description, the front face area is assumed to be rectangular.
[0094]
Let the actual vertical and horizontal sizes of the front face area be RFh and RFw. For an adult male, these are approximately RFw = 10 cm and RFh = 15 cm. Since the facing distance Z is known, the actual vertical and horizontal sizes RHh and RHw of the head rectangular image can be calculated by the following equation (4). The variables of the following equation (4) correspond to FIG.
[0095]

[0096]
Since the width e of the image pickup surface is small, it can normally be ignored in the calculation without any problem.
[0097]
In order to search the head rectangular image using the standard face dictionary, the head rectangular image must be converted to the same resolution as the standard face dictionary. The size after conversion is the intermediate reduced size Mw, Mh. Mh and Mw can be obtained from the relational expression of the following expression (5).
[0098]

[0099]
That is, the head intermediate size calculating means 88 can calculate one set of intermediate reduced sizes Mw and Mh by designating one set of RFw and RFh. However, the size of the human front face varies between adults and children and between women and men. Therefore, a plurality of sets of RFw and RFh can be prepared, and an intermediate reduced size corresponding to each set can be calculated. By calculating a plurality of sizes in advance, the subsequent front face search process can be performed with a plurality of intermediate reduced sizes. Searching with a plurality of intermediate reduced sizes can also be interpreted as searching the head rectangle at a plurality of resolutions.
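As an illustrative sketch only, the calculation of several intermediate reduced sizes could be written in Python as follows. Expression (5) is not reproduced in this text, so the sketch assumes the proportional form Mw = Fw × RHw / RFw and Mh = Fh × RHh / RFh, which follows from requiring that a face of real width RFw span Fw pixels after reduction. The function name `intermediate_sizes` and the example values are hypothetical.

```python
def intermediate_sizes(face_px, head_real, face_real_sets):
    """Return one (Mw, Mh) pair per candidate real face size (RFw, RFh).

    face_px        -- (Fw, Fh): face size of the standard face dictionary, in pixels
    head_real      -- (RHw, RHh): real size covered by the head rectangle
    face_real_sets -- iterable of (RFw, RFh) in the same unit as head_real
    """
    fw, fh = face_px
    rhw, rhh = head_real
    sizes = []
    for rfw, rfh in face_real_sets:
        # Assumed form of expression (5): Mw / Fw = RHw / RFw (likewise for height).
        mw = round(fw * rhw / rfw)
        mh = round(fh * rhh / rfh)
        sizes.append((mw, mh))
    return sizes


# Example: 16 x 24 pixel dictionary, head rectangle covering 30 cm x 36 cm,
# candidate faces for an adult (10 x 15 cm) and a child (8 x 12 cm).
sizes = intermediate_sizes((16, 24), (30.0, 36.0), [(10.0, 15.0), (8.0, 12.0)])
```

Each returned pair corresponds to one resolution at which the head rectangle is later searched.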
[0100]
When the intermediate size is calculated by the head intermediate size calculation unit 88, the intermediate size information is stored in the intermediate size storage unit 91.
[0101]
Next, the minimum inter-pattern distance value Dmin is initialized to a value sufficiently larger than the normally obtained inter-pattern distance value (step S102 in FIG. 22).
[0102]
One piece of information in the intermediate size storage unit 91 is selected, and the image reduction unit 90 reduces the head rectangular image to the selected intermediate reduced size to obtain a reduced head rectangular image (step S103 in FIG. 22).
[0103]
Next, the front face search positions SX and SY are initialized to 0 (step S104 in FIG. 22).
[0104]
Next, the front face candidate extracting means 72 extracts front face candidate images at the search positions SX and SY (step S105 in FIG. 22).
[0105]
Next, in order to correct the tone of the front face candidate image, the contrast is corrected by the contrast correcting means 74 (step S106 in FIG. 22).
[0106]
An example of a specific method for contrast correction will be described. Assuming that the front face candidate image takes values from 0 to vmax, that the average of the pixel values is μ, and that the standard deviation is σ, the conversion from the original image V to the contrast-corrected image V′ is given by the following formula (6).
[0107]

[0108]
Referring to FIGS. 10 and 22 again, next, the eigenvector projection distance calculation means 75 obtains the eigenvector projection distance D between the front face candidate image and the standard face pattern (step S107).
[0109]
Next, the projection distance minimum judging means 78 compares D and Dmin. At this time, if D is a value smaller than Dmin, D is substituted for Dmin, the value is updated, and stored in the projection distance minimum value storage means 86. At the same time, the front face candidate image is stored in the front face gray value storage means 87 (step S108).
[0110]
Next, the search range end determination means 79 increments the search positions SX and SY (step S109), and determines whether or not the entire head rectangle has been searched (step S110). If the search has not been completed yet, the process is repeated from step S105.
[0111]
When the search for the entire search range of the head rectangle has been completed, the multi-resolution processing end determination unit 92 determines whether or not the search has been performed with all intermediate reduced sizes (step S111). If there is an intermediate reduction size that has not been searched, the process starts again from step S103 using a different intermediate reduction size. The front face searching means 71 ends when the search is completed for all intermediate reduced sizes.
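The search procedure of steps S102 to S111 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `reduce_image` is a hypothetical nearest-neighbour stand-in for the image reduction unit 90, the distance calculation is passed in as a callable, the search position is advanced one pixel at a time, and the contrast correction of step S106 is omitted because formula (6) is not reproduced in this text.

```python
import numpy as np


def reduce_image(img, size):
    """Nearest-neighbour reduction to (width, height); hypothetical helper."""
    w, h = size
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]


def search_front_face(head_img, sizes, face_size, distance):
    """Return (Dmin, best candidate) over all sizes and search positions."""
    fw, fh = face_size
    d_min, best = np.inf, None                           # S102
    for size in sizes:                                   # S103, repeated via S111
        reduced = reduce_image(head_img, size)
        for sy in range(reduced.shape[0] - fh + 1):      # S104, S109, S110
            for sx in range(reduced.shape[1] - fw + 1):
                cand = reduced[sy:sy + fh, sx:sx + fw]   # S105
                d = distance(cand)                       # S107
                if d < d_min:                            # S108
                    d_min, best = d, cand.copy()
    return d_min, best


# Toy example: find a bright 2 x 2 block in an 8 x 8 image at two resolutions.
template = np.ones((2, 2))
head = np.zeros((8, 8))
head[3:5, 4:6] = 1.0
d_min, best = search_front_face(
    head, [(8, 8), (4, 4)], (2, 2),
    lambda c: float(np.sum((c - template) ** 2)),
)
```

In the embodiment the callable would be the eigenvector projection distance described next; a squared difference to a template is used here only to keep the example self-contained.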
[0112]
Next, the operation of the eigenvector projection distance calculation means 75 in FIG. 10 will be described.
[0113]
The standard face dictionary storage means 76 stores standard face average data and standard face eigenvector data.
[0114]
FIG. 33 shows an example of an eigenvector projection distance calculation dictionary when the number of feature quantities is p. The eigenvector projection distance calculation dictionary includes p-dimensional eigenvector data E from the 1st to the p-th and an average value Ave of p feature amounts. When there are p feature quantities, eigenvectors exist up to the p-th, but 1 to m-th are used when calculating the projection distance.
[0115]
The pixel values of the front face candidate image are raster scanned as shown in FIG. 8 and converted into one-dimensional feature data. At this time, the product Fw × Fh of the vertical and horizontal sizes of the front face image must equal the number of feature quantities p of the dictionary. Let this feature data be the vector X: X1, X2, ..., Xp.
[0116]
First, the average difference means 82 subtracts the average vector Ave from the vector X. The result is the vector Y.
[0117]
$$Y_i = X_i - \mathrm{Ave}_i \qquad (i = 1, \ldots, p) \tag{7}$$
[0118]
Next, in the vector projection value calculation means 83, the vector Y is projected onto the m eigenvectors to obtain the projection values R1, ..., Rm. The projection value calculation method is shown in the following equation (8).
[0119]
$$R_j = E_j \cdot Y = \sum_{i=1}^{p} E_{ji}\, Y_i \qquad (j = 1, \ldots, m) \tag{8}$$
[0120]
Next, in the reconstruction calculation means 84 (see FIG. 10), the original feature quantity Y is reconstructed using the projection values R1, ..., Rm and the m eigenvectors, and the reconstructed vector is denoted Y′. The reconstruction calculation is shown in the following equation (9).
[0121]
$$Y' = \sum_{j=1}^{m} R_j\, E_j \tag{9}$$
[0122]
Next, the projection distance calculation means 85 calculates the Euclidean distance value between Y and Y ′ according to the following equation (10). Thereby, the projection distance D to the eigenvector E is calculated.
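The four operations above (average difference, projection, reconstruction, and distance) can be sketched with NumPy as follows. The sketch assumes that expressions (7) to (10) take their standard form and that the eigenvectors are stored one per row with unit length; the function name `projection_distance` is hypothetical.

```python
import numpy as np


def projection_distance(x, ave, eigvecs, m):
    """Eigenvector projection distance D of feature vector x.

    x       -- (p,) raster-scanned candidate image
    ave     -- (p,) average vector Ave of the dictionary
    eigvecs -- (p, p) array, one unit-length eigenvector per row
    m       -- number of leading eigenvectors used
    """
    e = eigvecs[:m]
    y = x - ave                               # average difference, (7)
    r = e @ y                                 # projection values R1..Rm, (8)
    y_rec = e.T @ r                           # reconstruction Y', (9)
    return float(np.linalg.norm(y - y_rec))   # Euclidean distance, (10)


# Example: with the identity as eigenvectors and m = 2, only the third
# component of x cannot be reconstructed, so D equals its magnitude.
d = projection_distance(np.array([1.0, 2.0, 3.0]), np.zeros(3), np.eye(3), m=2)
```

A small D means the candidate is well represented by the leading eigenvectors, that is, it resembles the patterns from which the dictionary was built.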
[0123]
$$D = \lVert Y - Y' \rVert = \sqrt{\sum_{i=1}^{p} \left( Y_i - Y'_i \right)^2} \tag{10}$$
[0124]
FIG. 11 is a diagram showing the configuration of the front face searching means 101 as another embodiment of the front face searching means 63 of FIG. Referring to FIG. 11, this front face searching means 101 includes a head rectangular image storage means 89, a head intermediate size calculation means 88, an image reduction means 90, an intermediate size storage means 91, a front face candidate extraction means 72, an intermediate reduced image storage means 73, a contrast correction means 74, a product-sum operation means 102, a standard face dictionary data storage means 104, a storage means 105, a similarity maximum value determination means 103, a search range end determination means 79, and a multi-resolution processing end determination means 92.
[0125]
The storage unit 105 includes a similarity maximum value storage unit 106 and a front face gray value storage unit 107.
[0126]
FIG. 23 is a flowchart for explaining the processing of the front face search means 101. The operation of the front face searching unit 101 will be described with reference to FIGS. The head rectangular image data is held in the head rectangular image storage unit 89. First, the head intermediate size calculation means 88 calculates the intermediate reduced size of the head rectangular image with reference to the face-to-face distance value and the standard face dictionary data size, and stores it in the intermediate size storage means 91 (step S121).
[0127]
The calculation method of the intermediate reduction size is the same as that of the front face search means 71. Next, the maximum similarity Smax is initialized to 0 (step S122).
[0128]
One piece of information in the intermediate size storage unit 91 is selected, and the image reduction unit 90 reduces the head rectangular image to the selected intermediate reduction size to obtain a reduced head rectangular image (step S123).
[0129]
Next, the front face search positions SX and SY are initialized to 0 (step S124).
[0130]
Next, the front face candidate extracting means 72 extracts front face candidate images at the search positions SX and SY (step S125).
[0131]
Next, in order to correct the tone of the front face candidate image, the contrast is corrected by the contrast correcting means 74 (step S126).
[0132]
Next, the product-sum operation unit 102 obtains the similarity S between the front face candidate image and the standard face pattern (step S127).
[0133]
Next, the similarity maximum value determination means 103 compares S and Smax. At this time, if S is a value larger than Smax, S is substituted into Smax, the value is updated, and stored in the similarity maximum value storage unit 106. At the same time, the front face candidate image is stored in the front face gray value storage means 107 (step S128).
[0134]
Next, the search range end determination means 79 increments the search positions SX and SY (step S129), and determines whether or not the entire head rectangle has been searched (step S130). If the search has not been completed yet, the process is repeated from step S125.
[0135]
When the search for the entire search range of the head rectangle has been completed, the multi-resolution processing end determination unit 92 determines whether or not the search has been performed with all intermediate reduction sizes. If there is an intermediate reduction size that has not been searched, the process starts again from step S123 using a different intermediate reduction size.
[0136]
When the search is completed for all intermediate reduction sizes, the front face search means 101 ends.
[0137]
Next, the operation of the product-sum operation unit 102 for calculating the similarity with the standard face pattern shown in FIG. 11 will be described.
[0138]
The product-sum operation unit 102 calculates the similarity S with reference to a linear discrimination dictionary that discriminates whether or not a pattern is a front face. FIG. 34 shows an example of the linear discrimination dictionary. FIG. 34 shows a dictionary that discriminates q classes; the dictionary held in the standard face dictionary data storage means 104 corresponds to the case q = 2, that is, a dictionary that discriminates two classes: the front face and everything else.
[0139]
The pixel values of the front face candidate image are raster scanned as shown in FIG. 8 and converted into one-dimensional feature data. At this time, the product Fw × Fh of the vertical and horizontal sizes of the front face image must equal the number of feature quantities p of the dictionary. Let this feature data be the vector X: X1, X2, ..., Xp.
[0140]
In the standard face dictionary data, the class with q = 1 represents the front face, and the class with q = 2 represents everything else. The similarity with the front face can be calculated by the following equation (11) using only the q = 1 row of FIG. 34, that is, the class 1 identification coefficient 422.
[0141]

[0142]
The product-sum operation unit 102 calculates the similarity by calculating the above equation (11).
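A minimal Python sketch of this product-sum operation is given below. Expression (11) is not reproduced in this text, so the sketch assumes the usual linear form, S = A01 + Σ Ai1·Xi, built from the constant term and the multiplication terms of class 1. The function name `front_face_similarity` and the coefficient values are hypothetical.

```python
import numpy as np


def front_face_similarity(candidate, a0, a):
    """Similarity S of a candidate image to the front face class.

    candidate -- (Fh, Fw) gray-value image
    a0        -- constant term of class 1
    a         -- (p,) multiplication terms of class 1, with p = Fw * Fh
    """
    # Raster scan: rows are concatenated into one feature vector X.
    x = np.asarray(candidate, dtype=float).reshape(-1)
    # Assumed form of expression (11): constant term plus product-sum.
    return float(a0 + np.dot(a, x))


s = front_face_similarity([[1, 2], [3, 4]], a0=-1.0,
                          a=np.array([1.0, 0.0, 0.0, 1.0]))
```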
[0143]
FIG. 13 shows a face identification unit 111 that is an embodiment of the face identification unit 9 of FIG. 1, an identification dictionary storage unit 112 that is an embodiment of the identification dictionary storage unit 10 of FIG. 1, and an identification result correction unit 113 that is an embodiment of the identification result correction unit 11 of FIG. 1. Referring to FIG. 13, the face identification unit 111 includes a feature data storage unit 115, a product-sum operation unit 116, a maximum similarity person determination unit 118, and a threshold processing unit 117. The identification dictionary storage unit 112 includes a registered person identification dictionary storage unit 119. The identification result correction unit 113 includes an identification result weighted average calculation unit 114.
[0144]
FIG. 24 is a flowchart for explaining the processing of the face identifying unit 111 and the identification result correcting unit 113. With reference to FIGS. 13 and 24, operations of the face identifying unit 111 and the identification result correcting unit 113 will be described.
[0145]
Feature data is input and stored in the feature data storage means 115 (step S51).
[0146]
Next, referring to the data in the registered person identification dictionary storage means 119, the product-sum operation means 116 calculates the similarity to the registered person for each person (step S54).
[0147]
The method of calculating the similarity is basically the same as the operation of the product-sum operation unit 102 that calculates the similarity with the standard face pattern. However, the number of classes to be identified is the number of registered persons.
[0148]
When data for q people is registered as a dictionary, the registered person identification dictionary storage means 119 holds the same number of data as the linear discrimination dictionary shown in FIG. 34. Then, the product-sum operation means 116 obtains q similarities Si (i = 1, ..., q) as shown in the following equation (12).
[0149]

[0150]
A method of identifying a pattern based on the magnitude of similarity obtained by the product-sum operation processing using the linear discrimination dictionary shown in FIG. 34 is referred to as “similarity discrimination using the linear discrimination dictionary”.
[0151]
Referring to FIGS. 13 and 24 again, next, the maximum similarity person determination unit 118 obtains the maximum value among the calculated q similarity degrees, and obtains the person corresponding to the maximum value (step S55). That is, the person who is judged to be most similar to the feature data is obtained. The similarity at this time is called “maximum similarity”.
[0152]
Next, the threshold processing means 117 compares the maximum similarity with a predetermined threshold (step S56).
[0153]
When the maximum similarity is higher than the threshold value, the face identification unit 111 determines that the person has been identified, and outputs the ID number and the maximum similarity. When the maximum similarity is lower than the threshold value, there is a high possibility that the person is not a registered person, so the information “other person” is output.
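Steps S54 to S56 can be sketched in Python as follows. The sketch assumes that expression (12) has the linear form Si = A0i + Σ Aji·Xj, represents “other person” by `None`, and treats a similarity exactly equal to the threshold as “other person”, a case the text leaves open. The function name `identify` is hypothetical.

```python
import numpy as np


def identify(x, a0, a, threshold):
    """Return (person_id, maximum similarity); person_id is None for "other person".

    x  -- (p,) feature vector
    a0 -- (q,) constant terms, one per registered person
    a  -- (p, q) multiplication terms
    """
    s = a0 + x @ a                        # q similarities, assumed form of (12)
    best = int(np.argmax(s))              # step S55: maximum similarity person
    if s[best] > threshold:               # step S56: threshold processing
        return best + 1, float(s[best])   # person IDs are counted from 1 here
    return None, float(s[best])


# Toy dictionary for two registered persons and two features.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
a0 = np.zeros(2)
known = identify(np.array([0.2, 0.9]), a0, a, threshold=0.5)
unknown = identify(np.array([0.2, 0.3]), a0, a, threshold=0.5)
```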
[0154]
After receiving the identification result, the identification result correcting unit 113 integrates the identification results of the past N frames using the identification result weighted average calculating means 114 (step S57). As an example of the operation of the identification result weighted average calculating means 114, the identified person IDs and similarities, or the “other person” determinations, of the past N frames can be combined by weighted averaging as follows.
[0155]
Step A1: If at least a certain proportion of the past N frames were determined to be “other person”, the result is determined to be “other person”. Otherwise, the process proceeds to step A2.
[0156]
Step A2: Let Ni be the number of frames in the past N frames that were determined to be person i (i = 1, ..., q). The similarity weighted average value of each person is calculated by the following equation (13). Si represents the similarity of person i, and SSi represents the similarity weighted average value of person i. The person ID with the largest SSi is output as the identification result.
[0157]

[0158]
The identification result correcting unit 113 outputs the identification result integrated as described above.
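Steps A1 and A2 can be illustrated with the following Python sketch. Expression (13) is not reproduced in this text, so the sketch assumes SSi is the sum of the similarities over the Ni frames attributed to person i, divided by N. The function name `integrate_results` and the default ratio of 0.5 are hypothetical choices.

```python
from collections import defaultdict


def integrate_results(frames, other_ratio=0.5):
    """Integrate per-frame results; returns a person ID or None for "other person".

    frames -- list of (person_id or None, similarity) for the past N frames
    """
    n = len(frames)
    others = sum(1 for pid, _ in frames if pid is None)
    if others >= other_ratio * n:            # step A1
        return None
    totals = defaultdict(float)
    for pid, sim in frames:                  # step A2: accumulate Si per person
        if pid is not None:
            totals[pid] += sim
    # Assumed form of (13): SSi = (sum of Si over the Ni frames of person i) / N.
    return max(totals, key=lambda pid: totals[pid] / n)


result = integrate_results([(1, 0.9), (2, 0.6), (2, 0.7), (None, 0.0), (1, 0.8)])
rejected = integrate_results([(None, 0.0), (None, 0.0), (1, 0.9)])
```

Averaging over several frames in this way suppresses isolated misidentifications in single frames.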
[0159]
FIG. 15 is a diagram showing a configuration of a face identification unit 131 as another embodiment of the face identification unit 9 of FIG. 1 and a configuration of an identification dictionary storage unit 132 as another embodiment of the identification dictionary storage unit 10 of FIG. 1. Referring to FIG. 15, the face identification unit 131 includes a feature data storage unit 115, an eigenvector / other person determination unit 133, a product-sum operation unit 116, a maximum similarity person determination unit 118, and a threshold processing unit 117.
[0160]
The identification dictionary storage unit 132 includes an other person discrimination dictionary storage unit 134 and a registered person identification dictionary storage unit 119.
[0161]
FIG. 24 is a flowchart for explaining the processing of the face identification unit 131. The operation of the face identification unit 131 will be described with reference to FIGS. Feature data is input and stored in the feature data storage means 115 (step S51).
[0162]
Next, the eigenvector / other person determination means 133 obtains the inter-pattern distance Dh to the registered person group while referring to the data in the other person discrimination dictionary storage means 134 (step S52). If the inter-pattern distance Dh is larger than the threshold value, it is determined that the person is another person (step S53). The inter-pattern distance Dh is calculated as an eigenvector projection distance. That is, an eigenvector dictionary created from the feature data of all registered persons is stored in the other person discrimination dictionary storage means 134.
[0163]
An example of the eigenvector dictionary is shown in FIG. 33, and the eigenvector projection distance calculation method is described in the explanation of the operation of the eigenvector projection distance calculation means 75.
[0164]
If the input feature data is determined to be another person by the eigenvector / other person determination means 133, the face identification means 131 immediately outputs “other person” without going through the product-sum operation means 116.
[0165]
If the eigenvector / other person determination means 133 does not determine “other person”, the product-sum operation means 116 refers to the data in the registered person identification dictionary storage means 119 and calculates the similarity to each registered person (step S54). If q persons are registered, q similarities Si (i = 1, ..., q) are calculated.
[0166]
Next, the maximum similarity person determination unit 118 obtains the maximum value among the calculated q similarity degrees, and obtains a person corresponding to the maximum value (step S55). That is, the person who is judged to be most similar to the feature data is obtained. The similarity at this time is called the maximum similarity. Next, the threshold processing means 117 compares the maximum similarity with a predetermined threshold (step S56).
[0167]
When the maximum similarity is higher than the threshold value, the face identification unit 131 determines that the person has been identified, and outputs the ID number and the maximum similarity. When the maximum similarity is lower than the threshold value, there is a high possibility that the person is not a registered person, so the information “other person” is output.
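The two-stage flow of the face identification unit 131 (steps S52 to S56) can be sketched in Python as follows. As before, the standard forms of the eigenvector projection distance and of the linear similarity are assumed, “other person” is represented by `None`, and the function name and all threshold values are hypothetical.

```python
import numpy as np


def identify_with_rejection(x, ave, eigvecs, m, dh_threshold, a0, a, s_threshold):
    """Return a person ID (counted from 1) or None for "other person"."""
    e = eigvecs[:m]
    y = x - ave
    dh = np.linalg.norm(y - e.T @ (e @ y))    # step S52: projection distance Dh
    if dh > dh_threshold:                     # step S53: reject without scoring
        return None
    s = a0 + x @ a                            # step S54: q similarities
    best = int(np.argmax(s))                  # step S55
    return best + 1 if s[best] > s_threshold else None   # step S56


# Toy dictionaries: three features, the first two spanned by the eigenvectors.
eigvecs = np.eye(3)
ave = np.zeros(3)
a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
a0 = np.zeros(2)
accepted = identify_with_rejection(np.array([0.2, 0.9, 0.0]), ave, eigvecs, 2,
                                   1.0, a0, a, 0.5)
rejected = identify_with_rejection(np.array([0.2, 0.9, 5.0]), ave, eigvecs, 2,
                                   1.0, a0, a, 0.5)
```

The second input scores well against person 2 but lies far from the subspace of registered faces, so it is rejected before the product-sum operation is reached.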
[0168]
FIG. 14 is a diagram illustrating a configuration of the dictionary data management unit 121 which is an embodiment of the dictionary data management unit 12 of FIG. Referring to FIG. 14, the dictionary data management unit 121 includes a personal feature data storage unit 122, an identification dictionary generation unit 123, and a selector 124.
[0169]
The identification dictionary generation unit 123 includes a linear discrimination dictionary creation unit 126 and an eigenvector dictionary creation unit 127. In the personal feature data storage unit 122, there are personal feature data regions 125 for the number of registered persons.
[0170]
When the feature data and the person ID number are input, the dictionary data management unit 121 uses the selector 124 to store the input feature data in the area corresponding to the person ID number in the personal feature data storage unit 122. If there is an instruction to add a new person ID, a new person-specific feature data area 125 is secured in the personal feature data storage unit 122 and a new person ID number is assigned. If there is an instruction to delete an existing person ID, the person-specific feature data area 125 of the corresponding ID in the personal feature data storage unit 122 is discarded.
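The storage behaviour described above can be sketched as a small Python class. This is only an illustration: the class and method names are hypothetical, and the choice never to reuse a deleted ID number is an assumption not stated in the text.

```python
class PersonalFeatureStore:
    """Per-person feature data areas, keyed by person ID number."""

    def __init__(self):
        self._areas = {}      # person ID -> list of feature vectors
        self._next_id = 1     # assumption: IDs are never reused

    def add_person(self):
        """Secure a new feature data area and return its new person ID."""
        pid = self._next_id
        self._next_id += 1
        self._areas[pid] = []
        return pid

    def delete_person(self, pid):
        """Discard the feature data area of an existing person ID."""
        del self._areas[pid]

    def store(self, pid, feature):
        """Selector: route the feature data to the area of the given ID."""
        self._areas[pid].append(list(feature))

    def features(self, pid):
        return list(self._areas[pid])

    def person_ids(self):
        return sorted(self._areas)


store = PersonalFeatureStore()
first = store.add_person()
second = store.add_person()
store.store(first, [0.1, 0.2])
store.store(first, [0.3, 0.4])
store.delete_person(second)
```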
[0171]
When the dictionary data management unit 121 receives the identification dictionary creation command, the identification dictionary generation unit 123 uses the data of the personal feature data storage unit 122 to generate a registered person identification dictionary, which is a linear discrimination dictionary, and an other person discrimination dictionary, which is an eigenvector dictionary. The registered person identification dictionary is created by the linear discrimination dictionary creating means 126. The other person discrimination dictionary is created by the eigenvector dictionary creating means 127.
[0172]
When the face identification unit 9 in FIG. 1 has the configuration of the face identification unit 111 shown in FIG. 13, the other person discrimination dictionary is unnecessary, and therefore the identification dictionary generation unit 123 may be configured without the eigenvector dictionary creation unit 127.
[0173]
FIG. 25 is a flowchart for explaining the registration process in the dictionary data management unit 12 of FIG. The operation when the dictionary data management unit 121 registers a new person in front of the camera will be described with reference to FIGS.
[0174]
The person detection / identification management unit 13 designates a new person ID and instructs the dictionary data management unit 121 to register a new person (step S61).
[0175]
The dictionary data management unit 121 secures a personal feature data area 125 corresponding to the specified ID in the personal feature data storage unit 122 (step S62).
[0176]
The person detection / identification management unit 13 acquires the specified number of feature data from the face feature extraction unit 8 in the person detection / identification unit 1 and sends it to the dictionary data management unit 121.
[0177]
The dictionary data management unit 121 stores the acquired data in the feature data area 125 for each person corresponding to the new ID (step S63).
[0178]
When the acquisition for the designated number is completed, the person detection / identification management unit 13 instructs the dictionary data management unit 121 to create an identification dictionary (step S64).
[0179]
Upon receiving the creation instruction, the dictionary data management unit 121 creates a registered person identification dictionary by the linear discrimination dictionary creation unit 126 of the identification dictionary generation unit 123 (step S65).
[0180]
Next, the other person discrimination dictionary is created by the eigenvector dictionary creating means 127 (step S66).
[0181]
Then, the created dictionary is output and stored in the identification dictionary storage means 10 of the person detection / identification unit (step S67). With the above processing, the new person registration processing is completed.
[0182]
FIG. 29 is a diagram showing the configuration of the linear discrimination dictionary creation means 311 that constitutes an embodiment of the linear discrimination dictionary creation means 126 of FIG. Referring to FIG. 29, the linear discrimination dictionary creation means 311 includes a feature amount X storage means 312, a variance-covariance matrix Cxx calculation means 313, an inverse matrix conversion means 314, a matrix multiplication means 315, an objective variable Y storage means 317, an objective variable Y generation means 318, a covariance matrix Cxy calculation means 319, a coefficient storage means 320, and a constant term calculation means 316.
[0183]
A method for creating a linear discrimination dictionary will be described with reference to FIGS. 31 and 34. FIG. 31 shows personal feature data X 341 for q persons, from person 1 to person q, with n items of data per person. The number of features of the personal feature data X 341 is p. One row in FIG. 31 shows the feature data for one image. The personal feature data X 341 is stored in the personal feature data storage unit 122 in FIG. 14, and the personal feature data for each person is stored in the corresponding personal feature data area 125.
[0184]
The objective variable Y342 is a vector that is created for each feature data and has elements for the number of persons to be identified. That is, in FIG. 31, since the person ID takes a value from 1 to q, the objective variable Y exists from Y1 to Yq. The value of the objective variable Y342 is a binary value of 0 or 1, the vector element of the person to which the feature data belongs is 1, and the others are 0. That is, in the case of the feature data of the person 2, only the Y2 element is 1 and the others are 0.
[0185]
FIG. 34 is a diagram illustrating an example of the format of the linear discrimination dictionary. The linear discrimination dictionary is composed of two types of coefficients, a constant term 421 and a multiplication term 425. If the matrix consisting of multiplication terms is Aij (i = 1, ..., p; j = 1, ..., q) and the vector consisting of constant terms is A0j (j = 1, ..., q), then the matrix Aij is obtained from the following equation (14).
[0186]
$$A = C_{xx}^{-1}\, C_{xy} \tag{14}$$
[0187]
In the above equation (14), Cxx is the variance-covariance matrix calculated from all data of the personal feature data X 341. This variance-covariance matrix Cxx is calculated by the following equation (15).
[0188]
Elements of the personal feature data X are represented by xij (i = 1, ..., N; j = 1, ..., p). N is the total number of data and p is the number of features. In the example shown in FIG. 31, since there are n pieces of data per person, N = nq. x̄ represents the average value of x.
[0189]

[0190]
Cxy is a covariance matrix of the personal feature data X and the objective variable Y. The covariance matrix Cxy is calculated according to the following equation (16). x̄ and ȳ represent the average values of x and y.
[0191]

[0192]
The constant term A0j is calculated according to the following equation (17).
[0193]

[0194]
First, Cxx is calculated and its inverse matrix is obtained. Next, Cxy is obtained. These matrices are then multiplied to obtain the multiplication term matrix Aij, and finally the constant term A0j is obtained.
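The creation procedure can be sketched with NumPy as follows. Expressions (15) to (17) are not reproduced in this text, so the sketch assumes 1/N normalisation for both covariance matrices (the factor cancels in the product) and the usual least-squares constant term A0j = ȳj − Σ x̄i·Aij. It also assumes Cxx is invertible, which requires more data items than features. The function name `make_linear_dictionary` is hypothetical.

```python
import numpy as np


def make_linear_dictionary(x, person_ids, q):
    """Return (a0, a): constant terms (q,) and multiplication terms (p, q).

    x          -- (N, p) personal feature data, one row per image
    person_ids -- length-N sequence of person IDs in 1..q
    """
    n = x.shape[0]
    # Objective variable Y: 1 in the column of the owning person, 0 elsewhere.
    y = np.zeros((n, q))
    y[np.arange(n), np.asarray(person_ids) - 1] = 1.0
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    cxx = xc.T @ xc / n               # variance-covariance matrix (assumed 1/N)
    cxy = xc.T @ yc / n               # covariance of X and Y (assumed 1/N)
    a = np.linalg.solve(cxx, cxy)     # equals inverse(Cxx) @ Cxy
    a0 = y_mean - x_mean @ a          # assumed form of the constant term
    return a0, a


# Toy data: the first feature alone separates person 1 from person 2.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
a0, a = make_linear_dictionary(x, [1, 1, 2, 2], q=2)
scores = a0 + np.array([1.0, 0.0]) @ a
```

`np.linalg.solve` is used in place of an explicit inverse followed by a multiplication; the result is the same matrix, computed in one step.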
[0195]
FIG. 29 is a diagram for explaining the processing of the linear discrimination dictionary creation means 311. With reference to FIG. 29, the operation of the linear discrimination dictionary creation means 311 will be described. The input personal feature data group is stored in the feature amount X storage unit 312.
[0196]
The objective variable Y is generated by the objective variable Y generation means 318 using the inputted personal feature data group and person ID. The generated objective variable Y is stored in the objective variable Y storage means 317.
[0197]
Next, the variance-covariance matrix Cxx calculation means 313 calculates the variance-covariance matrix Cxx.
[0198]
Next, an inverse matrix conversion unit 314 calculates an inverse matrix of the variance-covariance matrix Cxx.
[0199]
Next, the covariance matrix Cxy calculation means 319 calculates the covariance matrix Cxy.
[0200]
Next, the matrix multiplication unit 315 calculates the multiplication term Aij, and stores the multiplication term Aij data in the coefficient storage unit 320.
[0201]
Next, the constant term calculation unit 316 calculates the constant term A0j and stores it in the coefficient storage unit 320.
[0202]
Finally, the data of the coefficient storage means 320 is output and the process ends.
[0203]
In FIG. 31, binary data of 0 and 1 is shown. However, in the linear discriminating dictionary creation means 311, other binary data such as 0 and 100 can be used.
[0204]
FIG. 30 is a diagram showing an eigenvector dictionary creating unit 331 that constitutes an embodiment of the eigenvector dictionary creating unit 127 of FIG. The eigenvector dictionary creation unit 331 includes a feature amount storage unit 332, a feature amount average calculation unit 333, a variance covariance matrix calculation unit 334, an eigenvector calculation unit 335, and a coefficient storage unit 336.
[0205]
A method for creating an eigenvector dictionary will be described with reference to FIGS. 31 and 33. Elements of the personal feature data X are represented by xij (i = 1, ..., N; j = 1, ..., p).
[0206]
First, an average value for each feature amount of X is obtained. Next, the variance covariance matrix Cxx of X is calculated using the above equation (15). The eigenvector of the variance-covariance matrix Cxx is obtained. Methods for obtaining eigenvectors are widely known by those skilled in the art and are not directly related to the present invention, and therefore, details thereof are omitted.
[0207]
Through the above operation, an eigenvector dictionary having a format as shown in FIG. 33 is obtained. Eigenvectors are obtained by the number of feature quantities (p). In FIG. 33, one row represents one eigenvector.
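As a minimal sketch, the eigenvector dictionary could be built with NumPy as shown below. The 1/N normalisation of Cxx and the ordering of the rows by descending eigenvalue are assumptions, and the function name `make_eigenvector_dictionary` is hypothetical; `numpy.linalg.eigh` is used because Cxx is symmetric.

```python
import numpy as np


def make_eigenvector_dictionary(x):
    """Return (ave, eigvecs) for personal feature data x of shape (N, p).

    ave     -- (p,) average of each feature amount
    eigvecs -- (p, p) array, one unit-length eigenvector per row,
               sorted by descending eigenvalue (assumed ordering)
    """
    ave = x.mean(axis=0)
    xc = x - ave
    cxx = xc.T @ xc / x.shape[0]          # variance-covariance matrix (assumed 1/N)
    eigvals, vecs = np.linalg.eigh(cxx)   # ascending eigenvalues, vectors in columns
    order = np.argsort(eigvals)[::-1]
    return ave, vecs[:, order].T


# Toy data with most of its variance along the first feature.
x = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ave, eigvecs = make_eigenvector_dictionary(x)
```

The sign of each eigenvector is arbitrary, which does not affect the projection distance.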
[0208]
FIG. 30 is a diagram for explaining the configuration and processing of the eigenvector dictionary creating unit 331. The operation of the eigenvector dictionary creation unit 331 will be described with reference to FIG.
[0209]
When personal feature data is input, it is stored in the feature amount storage means 332.
[0210]
Next, in the feature quantity average calculation means 333, an average value of the feature quantities is obtained and stored in the coefficient storage means 336. Next, the variance / covariance matrix calculation means 334 calculates the variance / covariance matrix Cxx.
[0211]
Next, the eigenvector calculation means 335 calculates the eigenvector of the variance-covariance matrix Cxx and stores it in the coefficient storage means 336. Finally, the data of the coefficient storage means 336 is output and the process ends.
[0212]
FIG. 16 is a diagram showing a configuration of an embodiment of a robot apparatus according to the present invention. Referring to FIG. 16, the robot apparatus 201 includes a CCD camera 202, a person detection / identification unit 203, a dictionary data management unit 204, a person detection / identification management unit 205, an overall control unit 206, a speaker 207, and a robot moving means 208. The robot moving means 208 includes a motor 209 and wheels 210.
[0213]
The person detection / identification means 203 performs person detection and identification based on the stereo video from the CCD camera 202. The person detection / identification management unit 205 exchanges information with the person detection / identification means 203, exchanges information with the overall control unit 206, and exchanges information with the dictionary data management unit 204. The speaker 207 is connected to the overall control unit 206 and can speak in response to an instruction from the overall control unit 206. The overall control unit 206 also sends a movement instruction to the robot moving means 208. The robot moving means 208 has a motor 209 and a plurality of wheels 210, and can move the robot in any direction.
[0214]
FIG. 26 is a diagram for explaining the process of the embodiment of the robot apparatus according to the present invention. The operation of the robot apparatus 201 of one embodiment of the present invention will be described with reference to FIGS. 16 and 26. When the robot apparatus 201 detects a person, it moves toward the person and, once it has approached to within a predetermined distance, performs person identification.
[0215]
The person detection / identification management unit 205 acquires the detected rectangle information of the head and the facing distance value from the head detection / tracking means in the person detection / identification means 203, and transmits them to the overall control section 206 (step S71).
[0216]
The overall control unit 206 refers to the facing distance value and determines whether the facing distance is shorter than a threshold value (step S73). When the distance is longer than the threshold value, the robot moving means 208 is instructed to move forward in the direction of the person (step S72).
[0217]
The approximate direction in which the person is located can be inferred from the head rectangle coordinates in the image. Steps S71 to S73 are repeated, and when the facing distance becomes shorter than the threshold value, the overall control unit 206 acquires a person identification result from the person detection / identification management unit 205 (step S74).
[0218]
Next, a voice specific to the identified person is output from the speaker 207 to inform the interlocutor that he or she has been identified (step S75).
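Steps S71 to S75 can be sketched as a simple control loop. The following is a minimal sketch under stated assumptions: the callback names (`observe`, `move_forward`, `identify`, `speak`), the `HeadObservation` type, the image width, the field of view, and the linear pixel-to-angle mapping are all hypothetical and are not taken from the specification, which only states that the direction can be inferred from the head rectangle coordinates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HeadObservation:
    rect: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    distance: float                  # facing distance (unit is an assumption)


def bearing_from_rect(rect, image_width, horizontal_fov_deg):
    """Approximate horizontal direction of the head in degrees.

    Assumption: a linear mapping from the rectangle centre to an angle,
    with 0 meaning straight ahead and positive meaning to the right.
    """
    left, _, right, _ = rect
    centre_x = (left + right) / 2.0
    return (centre_x / image_width - 0.5) * horizontal_fov_deg


def approach_and_greet(observe, move_forward, identify, speak, threshold,
                       image_width=320, horizontal_fov_deg=60.0,
                       max_steps=100):
    """Approach the detected person, then identify and greet them."""
    for _ in range(max_steps):
        obs = observe()                      # step S71
        if obs is None:                      # no head detected this frame
            continue
        if obs.distance < threshold:         # step S73
            person = identify()              # step S74
            speak(person)                    # step S75
            return person
        # Step S72: still too far away, so advance toward the person.
        move_forward(bearing_from_rect(obs.rect, image_width,
                                       horizontal_fov_deg))
    return None
```

The `max_steps` bound is an addition for the sketch so that the loop terminates even if the person is never reached.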
[0219]
FIG. 17 is a diagram showing the configuration of another embodiment of the robot apparatus of the present invention. Referring to FIG. 17, the robot apparatus 221 includes a CCD camera 202, a face-to-face distance sensor 223, a touch sensor 222, a person detection / identification unit 224, a dictionary data management unit 204, a person detection / identification management unit 205, an overall control unit 227, a microphone 225, a voice recognition unit 226, a speaker 207, and a robot moving unit 208. The robot moving unit 208 includes a motor 209 and wheels 210.
[0220]
The person detection / identification means 224 performs person detection and identification based on the stereo video from the CCD camera 202 and information from the face-to-face distance sensor 223. The person detection / identification management unit 205 exchanges information with the person detection / identification means 224, with the overall control unit 227, and with the dictionary data management unit 204. The speaker 207 is connected to the overall control unit 227 and can speak in response to an instruction from the overall control unit 227. The overall control unit 227 also sends movement instructions to the robot moving means 208. The robot moving means 208 has a motor 209 and a plurality of wheels 210, and can move the robot in any direction. The touch sensor 222 is connected to the overall control unit 227 and detects whether an external object is in contact with it and how strong the contact is. The microphone 225 is connected to the voice recognition unit 226, and the voice recognition unit 226 is connected to the overall control unit 227. The voice recognition unit 226 automatically recognizes a person's spoken words from the voice data from the microphone 225 and transmits the recognition result to the overall control unit 227.
[0221]
FIG. 27 is a flowchart for explaining processing of the robot apparatus according to the embodiment of the present invention. An operation example of the robot apparatus 221 according to the embodiment of the present invention will be described with reference to FIGS.
[0222]
The robot device 221 detects and identifies the person who is facing, and updates the identification dictionary according to the reaction of the interlocutor to sequentially learn the person images.
[0223]
The overall control unit 227 acquires the person identification result from the person detection / identification management unit 205 (step S81).
[0224]
Next, an operation specific to the identified person is performed (step S82). A specific operation here means any action that differs from person to person, such as speaking the person's name or moving the wheels in a direction determined for that person.
[0225]
Next, the robot waits until a reaction from the interlocutor is sensed (step S83). When a reaction is obtained, it is determined whether the reaction is positive or negative (step S84). A positive reaction is a reply meaning “Yes”, for example, the case where “Yes” is recognized by voice recognition.
[0226]
A negative reaction is a reply meaning “No”, for example, the case where “No” is recognized by voice recognition.
[0227]
The reaction of the interlocutor can also be acquired using the touch sensor, for example, by defining in advance a rule that pressing the touch sensor once means “Yes” and pressing it twice means “No”.
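The positive / negative decision of step S84 can be sketched as a small classifier over the two input channels described above. This is a hedged illustration only: the function name, the string constants, and the choice to let a recognized word take precedence over the touch sensor are assumptions, since the specification does not state how the two channels are combined.

```python
POSITIVE = "positive"
NEGATIVE = "negative"


def classify_reaction(recognized_word=None, touch_presses=0):
    """Return POSITIVE, NEGATIVE, or None if no reaction was understood.

    recognized_word: output of voice recognition, or None if nothing heard.
    touch_presses:   number of presses sensed on the touch sensor.
    """
    # Voice channel: "Yes" is positive, "No" is negative.
    if recognized_word is not None:
        word = recognized_word.strip().lower()
        if word == "yes":
            return POSITIVE
        if word == "no":
            return NEGATIVE
    # Touch channel (rule fixed in advance): once = "Yes", twice = "No".
    if touch_presses == 1:
        return POSITIVE
    if touch_presses == 2:
        return NEGATIVE
    return None
```

Returning `None` for anything else lets the caller keep waiting in step S83 instead of guessing.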
[0228]
If the reaction in step S84 is negative, the operation is repeated from step S81. If the reaction is positive, the facial feature data used for the identification is input to the dictionary data management unit 204 (step S85).
[0229]
Then, an identification dictionary is created and updated (step S86).
[0230]
If there is an end instruction, the process ends. If not, the operation is repeated from step S81 (step S87).
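The flow of steps S81 to S87 can be sketched as the loop below. It is a simplified sketch under assumptions: `FeatureDictionary` is a hypothetical stand-in for the dictionary data management unit 204, the callbacks are placeholder names, and `rebuild` merely counts updates where the real unit would regenerate the identification dictionary.

```python
class FeatureDictionary:
    """Hypothetical stand-in for the dictionary data management unit 204."""

    def __init__(self):
        self.features = {}   # person -> list of stored feature vectors
        self.version = 0     # incremented each time the dictionary is rebuilt

    def add_features(self, person, features):
        self.features.setdefault(person, []).append(features)

    def rebuild(self):
        # A real system would recompute the identification dictionary here.
        self.version += 1


def learning_loop(identify, act_for, wait_for_reaction, dictionary,
                  should_stop, max_rounds=1000):
    """Update the dictionary whenever the interlocutor confirms a result."""
    updates = 0
    for _ in range(max_rounds):
        person, features = identify()              # step S81
        act_for(person)                            # step S82
        reaction = wait_for_reaction()             # step S83
        if reaction != "positive":                 # step S84, negative
            continue                               # back to step S81
        dictionary.add_features(person, features)  # step S85
        dictionary.rebuild()                       # step S86
        updates += 1
        if should_stop():                          # step S87
            break
    return updates
```

Only confirmed identifications reach the dictionary, so a wrong guess that the interlocutor rejects does not contaminate the stored feature data.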
[0231]
FIG. 28 is a flowchart for explaining the processing of another embodiment of the robot apparatus of the present invention. An example of the operation of the robot apparatus 221 according to another embodiment of the present invention will be described with reference to FIGS. The robot apparatus 221 detects and identifies the person facing it, acquires images of a new person in response to a command from the interlocutor, and registers a new person dictionary.
[0232]
The overall control unit 227 acquires a person identification result from the person detection / identification management unit 205 (step S91).
[0233]
Next, an operation specific to the identified person is performed (step S92). A specific operation here means any action that differs from person to person, such as speaking the person's name or moving the wheels in a direction determined for that person.
[0234]
Next, the robot waits until a command from the interlocutor is sensed (step S93). When a command event is obtained, it is determined whether or not the command event is a registration command (step S94). A registration command event is, for example, the event that the word “Toroku” (Japanese for “register”) is recognized by voice recognition. Alternatively, if a rule that a single tap on the touch sensor constitutes a registration event is defined in advance in the overall control unit, a registration command event can be issued using the touch sensor.
[0235]
When a registration command is received, registration processing is performed, and a new person is registered (step S95). An example of the registration processing step S95 is shown in FIG. 25 and has already been described.
[0236]
When the registration process ends, the process starts again from step S91. If no registration command is received, an end determination is made (step S96).
[0237]
If there is an end instruction, the process ends. If there is no end instruction, the process starts again from step S91.
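Steps S91 to S96 can be sketched in the same style. This is a minimal sketch assuming hypothetical callback names; `register_new_person` stands in for the registration processing of step S95, whose internals are described elsewhere in the specification and are not reproduced here.

```python
def is_registration_command(recognized_word=None, touch_presses=0):
    """Step S94: the word "Toroku" or a single tap counts as a registration command."""
    if recognized_word is not None:
        if recognized_word.strip().lower() == "toroku":
            return True
    return touch_presses == 1


def registration_loop(identify, act_for, wait_for_command,
                      register_new_person, should_stop, max_rounds=1000):
    """Register a new person whenever the interlocutor commands it."""
    registered = 0
    for _ in range(max_rounds):
        person = identify()                         # step S91
        act_for(person)                             # step S92
        word, presses = wait_for_command()          # step S93
        if is_registration_command(word, presses):  # step S94
            register_new_person()                   # step S95
            registered += 1
            continue                                # back to step S91
        if should_stop():                           # step S96
            break
    return registered
```

As in the flowchart description, the end determination is reached only when the sensed command is not a registration command.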
[0238]
FIG. 35 is a diagram showing a configuration of an embodiment of a person detection and identification system according to the related invention of the present invention. Referring to FIG. 35, the person detection and identification system according to the related invention of the present invention includes a video acquisition unit 2, a face-to-face distance sensor 5, a person detection and identification unit 501, a dictionary data management unit 12, a person detection and identification management unit 502, and a recording medium 503. The recording medium 503 records a person detection / identification processing program, and may be a magnetic disk, a semiconductor memory, a CD-ROM, or another recording medium.
[0239]
The person detection / identification processing program is read from the recording medium 503 into the person detection / identification unit 501 and the person detection / identification management unit 502, which then execute the same processing as that performed by the person detection / identification unit 6 and the person detection / identification management unit 13 in the embodiment described above.
[0240]
[Effects of the invention]
As described above, according to the present invention, the following effects can be obtained.
[0241]
The person identification device used in the robot apparatus of the present invention has an effect that a person can be identified with an extremely high identification rate even in an environment where the lighting conditions fluctuate significantly, such as a home environment.
[0242]
The reason is as follows. In the present invention, detection and identification use not only the gray values of the image but also motion and facing-distance information. In addition, the face alignment means detects the front face with high accuracy. Furthermore, the face identification means identifies persons by similarity using a linear discrimination dictionary.
[0243]
A further reason is that the present invention is configured so that the apparatus or system learns by itself through dialogue with a person (user), thereby raising the identification accuracy.
[0244]
Further, according to the robot apparatus of the present invention, even if the input image at a certain place is poor and the person cannot be identified, a good image can be obtained afterwards through the apparatus's own actions.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an embodiment of a person identification device used in a robot apparatus of the present invention.
FIG. 2 is a block diagram showing a configuration of an embodiment of a head detection tracking means in the person identification device used in the robot apparatus of the present invention.
FIG. 3 is a block diagram showing a configuration of another embodiment of the head detection tracking means in the person identification device used in the robot apparatus of the present invention.
FIG. 4 is a block diagram showing a configuration of an embodiment of a monocular head rectangular coordinate detection means in the person identification device used in the robot apparatus of the present invention.
FIG. 5 is a diagram for explaining head detection processing in the person identification device used in the robot apparatus of the present invention.
FIG. 6 is a diagram for explaining left and right image collation processing in the person identification device used in the robot apparatus of the present invention.
FIG. 7 is a diagram for explaining head tracking processing in the person identification device used in the robot apparatus of the present invention.
FIG. 8 is an explanatory diagram of a raster scan when converting a front face image into feature data in the person identification device used in the robot apparatus of the present invention.
FIG. 9 is a block diagram showing a configuration of an embodiment of a front face alignment means in the person identification device used in the robot apparatus of the present invention.
FIG. 10 is a block diagram showing a configuration of an embodiment of a front face search means in the person identification device used in the robot apparatus of the present invention.
FIG. 11 is a block diagram showing a configuration of another embodiment of the front face search means in the person identification device used in the robot apparatus of the present invention.
FIG. 12 is a diagram for explaining head detection processing and face alignment processing in the person identification device used in the robot apparatus of the present invention.
FIG. 13 is a block diagram showing the configuration of each embodiment of face identification means, identification dictionary storage means, and identification result correction means in the person identification device used in the robot apparatus of the present invention.
FIG. 14 is a block diagram showing a configuration of an example of a dictionary data management unit in the person identification device used in the robot apparatus of the present invention.
FIG. 15 is a block diagram showing the configuration of each embodiment of face identification means, identification dictionary storage means, and identification result correction means in the person identification device used in the robot apparatus of the present invention.
FIG. 16 is a diagram showing a configuration of an embodiment of a robot apparatus according to the present invention.
FIG. 17 is a diagram showing a configuration of another embodiment of the robot apparatus of the present invention.
FIG. 18 is a flowchart for explaining person detection processing and identification processing in the person identification device used in the robot apparatus of the present invention.
FIG. 19 is a flowchart for explaining head detection tracking processing in the person identification device used in the robot apparatus of the present invention.
FIG. 20 is a flowchart for explaining head detection processing in the person identification device used in the robot apparatus of the present invention.
FIG. 21 is a flowchart for explaining front face alignment processing in the person identification device used in the robot apparatus of the present invention.
FIG. 22 is a flowchart for explaining a front face search process in the person identification device used in the robot apparatus of the present invention.
FIG. 23 is a flowchart for explaining a front face search process in the person identification device used in the robot apparatus of the present invention.
FIG. 24 is a flowchart for explaining face identification processing in the person identification device used in the robot apparatus of the present invention.
FIG. 25 is a flowchart for explaining a person dictionary registration process in the person identification device used in the robot apparatus of the present invention.
FIG. 26 is a flowchart for explaining behavior control in the robot apparatus of the present invention.
FIG. 27 is a flowchart for explaining behavior control in the robot apparatus of the present invention.
FIG. 28 is a flowchart for explaining behavior control in the robot apparatus of the present invention.
FIG. 29 is a block diagram showing a configuration of an embodiment of a linear discrimination dictionary creating means in the person identification device used in the robot apparatus of the present invention.
FIG. 30 is a block diagram showing a configuration of an example of eigenvector dictionary creating means in the person identification apparatus used in the robot apparatus of the present invention.
FIG. 31 is a diagram for explaining a linear discriminant dictionary creation method used for person identification processing used in the robot apparatus of the present invention.
FIG. 32 is a diagram for explaining a method for calculating a face-to-face distance by stereo vision used in the person identification process used in the robot apparatus of the present invention.
FIG. 33 is an explanatory diagram of an eigenvector projection distance dictionary used for person identification processing used in the robot apparatus of the present invention.
FIG. 34 is an explanatory diagram of a linear discrimination dictionary used for person identification processing used in the robot apparatus of the present invention.
FIG. 35 is a block diagram showing the configuration of another embodiment of the person identification device used in the robot apparatus of the present invention.
[Explanation of symbols]
1 Person detection and identification means
2 Video acquisition means
3 Right camera
4 Left camera
5 Face-to-face distance sensor
6 Head detection and tracking means
7 Front face alignment means
8 Facial feature extraction means
9 Face identification means
10 Identification dictionary storage means
11 Identification result correction means
12 Dictionary data management unit
13 Person detection identification management unit
14 Person identification device
21 Head detection means
22 Head tracking means
23 Head rectangle storage means
24 Monocular head rectangular coordinate detection means
25 Left and right image matching means
26 Face-to-face distance evaluation means
27 Head detection tracking means
28 Right camera
29 Left camera
30 Face-to-face distance sensor
31 Face-to-face distance integration means
32 Head detection and tracking means
33 Head detection means
34 CCD camera
41 Head rectangular coordinate detection means
42 Moving pixel detection means
43 Number of persons evaluation means
44 Head coordinate detection means
45 Head lower coordinate detection means
46 Temporal coordinate detection means
47 Movement area width
48 Integrated difference image G
49 Integrated difference image G
50 Integrated differential image upper area
51 Head detection rectangle
52 Search results
53 Front frame head rectangle coordinates
54 Search Trajectory
55 Head rectangular image of previous frame
61 Front face alignment means
62 Head rectangular cutting means
63 Front face search means
64 Standard face dictionary data storage means
65 Front face appearance determination means
66 Density dispersion determination means
67 Threshold processing means
71 Front face search means
72 Front face candidate extraction means
73 Intermediate reduced image storage means
74 Contrast correction means
75 Eigenvector Projection Distance Calculation Means
76 Standard face dictionary storage means
77 Memory means
78 Projection distance minimum judging means
79 Search range end determination means
80 Standard face average data storage means
81 Standard face eigenvector data storage means
82 Mean difference means
83 Vector projection value calculation means
84 Reconstruction calculation means
85 Projection distance calculation means
86 Projection distance minimum value storage means
87 Front face gray value storage means
88 Head intermediate size calculation means
89 Head rectangular image storage means
90 Image reduction means
91 Intermediate size storage means
92 Multi-resolution processing end judging means
101 Front face search means
102 Product-sum operation means
103 Maximum similarity determination means
104 Standard face dictionary data storage
105 Storage means
106 Similarity maximum value storage means
107 Front face gray value storage means
111 Face identification means
112 Identification dictionary storage means
113 Identification result correction means
114 Identification result weighted average calculation means
115 Feature data storage means
116 Product-sum operation means
117 Threshold processing means
118 Maximum similarity person determination means
119 Registered person identification dictionary storage means
121 Dictionary data management unit
122 Person-specific feature data management unit
123 Identification dictionary generating means
124 Selector
125 Characteristic data by person
126 Linear discriminant dictionary creation means
127 Eigenvector Dictionary Creation Means
131 Face identification means
132 Identification dictionary storage means
133 Other person discrimination means
134 Other person discrimination dictionary storage means
151 Head rectangle
152 Head rectangular image
153 Reduced head rectangular image
154 Front face search shape
155 Front face image
201 Robot device
202 CCD camera
203 Person detection identification means
204 Dictionary data management unit
205 Person detection identification management unit
206 Overall control unit
207 Speaker
208 Robot moving means
209 Motor
210 Wheels
221 Robot device
222 Touch sensor
223 Face-to-face distance sensor
224 Person detection identification means
225 Microphone
226 Voice recognition means
227 Overall control unit
311 Linear discriminant dictionary creation means
312 Feature amount X storage means
313 Covariance matrix Cxx calculation means
314 Inverse matrix conversion means
315 Matrix multiplication means
316 Constant term calculation means
317 Objective variable Y storage means
318 Objective variable Y generation means
319 Covariance matrix Cxy calculation means
320 Coefficient storage means
331 Eigenvector dictionary creation means
332 Feature quantity storage means
333 Feature quantity average calculation means
334 Covariance matrix calculation means
335 Eigenvector calculation means
336 Coefficient storage means
341 Personal feature data X
342 Objective variable Y
343 Average value of feature data
344 Average value of objective variable
401 Right camera
402 Left camera
403 Object
411 1st eigenvector
412 Second eigenvector
413 Pth eigenvector
414 Average
421 Constant term
422 Class 1 identification coefficient
423 Class 2 identification coefficient
424 Class q identification coefficient
425 Multiplication term
501 Person detection identification processing unit
502 Person detection identification management unit
503 Recording medium

Claims

Front face alignment means for obtaining a front face image from the obtained image information;
Face identifying means for identifying a person using an identification dictionary from the front face image;
An identification result correcting unit that corrects an identification result output from the face identifying unit using information of past N frames (N is an integer of 2 or more);
The robot apparatus according to claim 1, wherein the identification result correcting unit performs a weighted average of the identification results output from the face identifying unit for the past N frames (N is an integer of 2 or more).

Head detection and tracking means for detecting a human head from the acquired image information;
A front face alignment means for obtaining a front face image from the detected partial image of the head;
Facial feature extraction means for converting a front face image into a feature quantity;
Face identification means for identifying a person from a feature quantity using an identification dictionary;
An identification dictionary storage means for storing the identification dictionary;
An identification result correcting unit that corrects an identification result output from the face identifying unit using information of past N frames (N is an integer of 2 or more);
The robot apparatus according to claim 1, wherein the identification result correcting unit performs a weighted average of the identification results output from the face identifying unit for the past N frames (N is an integer of 2 or more).

The robot apparatus according to claim 1 or 2,
An overall control unit for controlling the operation of the robot;
Moving means for moving the robot in accordance with instructions from the overall control unit;
A robot apparatus comprising:

In the person identification device of the robot apparatus according to any one of claims 1 to 3,
The face identification means is
A sum-of-products operation means for calculating the similarity to the registered person from the linear discrimination dictionary and the feature data;
Maximum similarity person determination means for obtaining a maximum value of similarity to a registered person;
A threshold value processing means for comparing the maximum value of the similarity with a predetermined threshold value to determine whether or not another person;
A robot apparatus comprising:

In the person identification device of the robot apparatus according to any one of claims 1 to 3,
The face identification means is
Eigenvector other person discrimination means for calculating an eigenvector projection distance from the eigenvector dictionary and feature data for discriminating others and comparing distance values;
A product-sum operation means for calculating the similarity to the registered person from the linear discrimination dictionary and the feature data; a maximum similarity person determination means for obtaining the maximum value of the similarity to the registered person;
A threshold value processing means for comparing the maximum value of the similarity with a predetermined threshold value to determine whether or not another person;
A robot apparatus comprising:

Video acquisition means for acquiring images;
Head detection tracking means for detecting a human head from the acquired image information;
A front face alignment means for obtaining a front face image from the detected partial image of the head;
Facial feature extraction means for converting a front face image into a feature quantity;
Face identification means for identifying a person from a feature quantity using an identification dictionary;
An identification dictionary storage means for storing the identification dictionary;
Identification result correction means for weighted average of the identification results output from the face identification means for the past N frames (N is an integer of 2 or more);
A person identification device including:
An overall control unit for controlling the operation of the robot;
A voice output means for speaking a voice in response to an instruction from the overall control unit;
Moving means for moving the robot in accordance with instructions from the overall control unit;
A robot apparatus comprising:

The robot apparatus according to claim 6, wherein when the person identification result is obtained, the overall control unit performs control so that a different voice is spoken for each person.

Means for acquiring a facing distance and direction of a front object from the person identification device;
Means for obtaining a person identification result;
If the facing distance is greater than or equal to a threshold value, means for moving to approach the front object;
When the face-to-face distance is less than or equal to a threshold value, means for controlling the person identification result to be uttered with a different voice for each person
The robot apparatus according to claim 6, further comprising: