JP4938748B2

JP4938748B2 - Image recognition apparatus and program

Info

Publication number: JP4938748B2
Application number: JP2008285697A
Authority: JP
Inventors: 祐一岩舘; 光裕芥川
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2008-11-06
Filing date: 2008-11-06
Publication date: 2012-05-23
Anticipated expiration: 2028-11-06
Also published as: JP2010113530A

Description

The present invention relates to a technique for estimating the posture of a palm from video captured by a camera, and more particularly to an image recognition apparatus and program for estimating the posture of a palm.

Augmented reality is a technology in which camera video is input to a computer and a virtual object generated by computer graphics (CG) or the like is composited into that video. Conventional augmented reality systems have used a marker, estimating its position and posture and compositing a virtual object onto the marker (see, for example, Non-Patent Document 1).

A technique that estimates the shape and posture of a hand using two cameras instead of a marker is also known (see, for example, Non-Patent Document 2). This document also discloses compositing a virtual object onto the estimated palm instead of onto a marker.

Another known technique uses a single camera instead of a marker: palm images photographed from many directions serve as reference image planes, the feature points of these reference image planes are learned in advance, and the posture of the palm is estimated by comparing them with the feature points of the image input to the system (see, for example, Non-Patent Document 3).

In addition to these techniques, techniques are known for estimating the number of fingers in a two-dimensional image and for estimating the direction of a single finger in the image (see, for example, Patent Document 1 and Patent Document 2).

Non-Patent Document 1: Hirokazu Kato, "Development of ARToolKit, an Augmented Reality System Construction Tool", IEICE Technical Report, Vol. 101, No. 652, PRMU 2001-232, February 2002, pp. 79-86
Non-Patent Document 2: Makiko Saito, Yoichi Sato, Hideki Koike, "Perceptual Glove: Real-time Input of Hand Shape and Posture Based on Multi-viewpoint Images and Its Application", Transactions of the Information Processing Society of Japan, Vol. 43, No. 1, January 2002, pp. 185-194
Non-Patent Document 3: Satoshi Kato, Yusuke Kondo, Jiro Katto, "HandyAR: Development of the Augmented Reality System HandyAR with the Hand as an Interface", FIT2005, I-045, September 2005, pp. 111-112
Patent Document 1: Japanese Patent No. 3863809
Patent Document 2: JP-A-8-76912

FIG. 9 shows a typical example of a conventional technique that captures video using a marker and estimates the position of an object in the video. In FIG. 9, a marker 101, for example a planar pattern having the four vertices of a square, is attached to an object 103. A camera 102 images the object 103. A posture and position estimation apparatus 100 includes an image input unit 111 that inputs the image captured by the camera 102, a marker detection unit 112 that detects the position of the marker 101 in the captured image, and a position/posture calculation unit 113 that calculates the position and posture of the marker 101. Because the position/posture calculation unit 113 receives the position information of the marker 101 in the captured image, it can determine the position and posture of the object in actual three-dimensional space, and the result can further be applied to CG that composites a virtual object. With this technique, however, the marker appears in the image captured by the camera, which impairs the sense of reality.

Techniques that acquire images from many directions with one or two cameras instead of using a marker end up generating a plurality of reference image planes from palm images photographed from many directions and analyzing or learning them in advance. The processing load is therefore large, and the system configuration becomes large in practice.

With the techniques disclosed in Patent Document 1 or Patent Document 2, even if it is possible to estimate the number of fingers in a two-dimensional image or the direction of a single finger in the image, the information needed to estimate the three-dimensional posture of the palm is insufficient, so applying them to that estimation is not easy.

In view of the above problems, an object of the present invention is to provide an image recognition apparatus and program that estimate the posture of a palm from video captured by a single camera, without using a marker and without prior learning.

An image recognition apparatus according to the present invention estimates, from a moving image alone, the three-dimensional posture of a palm included in the moving image, and comprises: a fingertip image coordinate acquisition unit that applies skin color determination processing to the palm image portion in an image frame of the moving image to extract a silhouette image of the palm, and applies minimum circumscribed polygon processing and circular extraction filter processing to the palm image portion of the silhouette image to acquire the image coordinates of the fingertips; a plane projective transformation matrix generation unit that sets world coordinates for the moving image and a reference image plane located on one plane of the world coordinates, and generates a plane projection matrix of the fingertip image coordinates with respect to the image coordinates of each fingertip on the reference image plane; and a position and posture calculation unit that generates a perspective projection matrix for the reference image plane from the plane projection matrix, and generates posture information of the palm in the world coordinates from the image of the palm included in the moving image.

The image recognition apparatus of the present invention further comprises: a fingertip type determination unit that compares the distances between the image coordinates of the feature points corresponding to the extracted fingertips to identify the type of each fingertip; and a wrist image coordinate acquisition unit that, within the palm image portion of the palm silhouette image, moves from the centroid of the fingers along the direction of the line segment connecting the middle finger and that centroid, counts the silhouette image components orthogonal to the line segment, and tracks the decreasing tendency of this count to determine the image coordinates of a feature point near the wrist. The plane projective transformation matrix generation unit generates a plane projection matrix of the image coordinates of the feature point near the wrist with respect to the reference coordinates of each fingertip and the reference coordinates of the wrist on the reference image plane located on one plane of the world coordinates.

In the image recognition apparatus of the present invention, the plane projective transformation matrix generation unit sets a virtual square on the reference image plane, and the position and posture calculation unit applies a projective transformation to the four vertices of the square using the plane projective transformation matrix, generates the perspective projection matrix for the reference image plane based on the image coordinates of the four vertices obtained by this transformation, and estimates the posture of the palm.

In the image recognition apparatus of the present invention, the fingertip image coordinate acquisition unit has means for determining whether the fingertip image coordinates acquired from the image frame being processed were acquired as the image coordinates of five fingertips and, when it determines that they were not, takes another image frame of the moving image as the processing target.

Furthermore, the present invention is also characterized as a program that causes a computer, configured as an image recognition apparatus that estimates from a moving image alone the three-dimensional posture of a palm included in the moving image, to execute the steps of: applying skin color determination processing to the palm image portion in an image frame of the moving image to extract a silhouette image of the palm, and applying minimum circumscribed polygon processing and circular extraction filter processing to the palm image portion of the silhouette image to acquire the image coordinates of the fingertips; setting world coordinates for the moving image and a reference image plane located on one plane of the world coordinates, and generating a plane projection matrix of the fingertip image coordinates with respect to the reference coordinates of each fingertip on the reference image plane; and generating a perspective projection matrix for the reference image plane from the plane projection matrix, and generating posture information of the palm in the world coordinates from the image of the palm included in the moving image.

According to the present invention, the posture can be estimated by extracting the fingertips of the palm and determining the fingertip types, instead of using a marker.

Furthermore, applications can be expected in various fields, such as manipulating objects in a virtual space using the estimated palm position and posture information, or electronic games.

An image recognition apparatus according to an embodiment of the present invention is described below. Like components are denoted by the same reference numerals.

FIG. 1 shows an example system using the image recognition apparatus of an embodiment of the present invention. In FIG. 1, a camera 2 images a subject that includes a physical object 3 (not skin-colored) placed in defined world coordinates and a palm 4 located above it, and the image recognition apparatus 1 of this embodiment estimates the three-dimensional position and posture of the palm from the image captured by the camera 2. The presence or absence of the physical object 3 is irrelevant to the present invention, and the object may be regarded as part of the background. However, because the image recognition apparatus of this embodiment performs skin color extraction to detect the palm, as described later, the background and the physical object are assumed to have colors other than skin color.

The image recognition apparatus 1 includes an image input unit 11, a fingertip image coordinate acquisition unit 12, a fingertip type determination unit 13, a wrist image coordinate acquisition unit 14, a plane projective transformation matrix generation unit 15, a position and posture calculation unit 16, and a reference image plane generation unit 17.

The image input unit 11 inputs the moving image captured by the camera 2 and sends it to the fingertip image coordinate acquisition unit 12 frame by frame.

The fingertip image coordinate acquisition unit 12 applies the skin color determination processing described later to the palm image portion in an image frame of the moving image to extract a silhouette image of the palm, applies circular extraction filter processing to the palm image portion of the silhouette image to acquire the image coordinates of the fingertips, and sends the palm image portion and the fingertip image coordinate information to the fingertip type determination unit 13 and the wrist image coordinate acquisition unit 14. As described later, the fingertip image coordinate acquisition unit 12 obtains the image coordinates of the vertices of a polygon circumscribing the palm silhouette image and applies a circular extraction filter at those vertex coordinates to exclude vertices other than fingertips, thereby acquiring the fingertip image coordinates from the palm image portion.

The fingertip type determination unit 13 uses the input palm image portion and its fingertip image coordinate information to compare the distances between the image coordinates of the feature points corresponding to the extracted fingertips, identifies the type of each fingertip, associates the result with the fingertip images of the input palm image portion, and sends it to the plane projective transformation matrix generation unit 15. In other words, for the fingertip image coordinates extracted from the palm silhouette image, the fingertip type determination unit 13 identifies the finger type based on the distances between those image coordinates.

The wrist image coordinate acquisition unit 14 uses the input palm image portion and the image coordinate information of its fingertip positions to move from the centroid of the fingers along the direction of the line segment connecting the middle finger and that centroid, while counting the silhouette image components orthogonal to the line segment. It tracks the decreasing tendency of this count to determine a feature point near the wrist, acquires the image coordinates of that feature point, associates them with the wrist image of the input palm image portion, and sends them to the plane projective transformation matrix generation unit 15. The wrist image coordinate acquisition unit 14 also has a function of determining whether the fingertip image coordinates acquired from the image frame being processed were acquired as the image coordinates of five fingertips and, if not, of taking another image frame of the moving image as the processing target. The "another image frame" here may be the frame immediately following the frame being processed, a frame a predetermined number of frames later, or, depending on the application, an arbitrarily chosen frame.

The plane projective transformation matrix generation unit 15 receives the palm image portion, the fingertip coordinates for each finger type, and preferably the image coordinates near the wrist, and sets world coordinates for the moving image and a reference image plane located on one plane of the world coordinates. The reference image plane is supplied from the reference image plane generation unit 17 and is obtained by photographing, from the front, a palm located on one plane of the world coordinates. With respect to the reference coordinates of the fingertips on this reference image plane, the unit calculates a plane projection matrix of the fingertip image coordinates (and the wrist image coordinates) and sends it to the position and posture calculation unit 16. As described later, the plane projective transformation matrix generation unit 15 can calculate the plane projection matrix of the fingertip image coordinates with respect to the reference image plane from the palm image portion and the fingertip coordinates for each finger type alone; the image coordinates near the wrist are additionally used to improve the accuracy of the calculated plane projection matrix parameters.

The position and posture calculation unit 16 generates a perspective projection matrix for the reference image plane from the plane projection matrix, generates posture information of the palm in world coordinates from the image of the palm included in the moving image, and outputs it externally. The output palm posture information can be used, together with the moving image captured by the camera 2, to generate natural composite video such as CG.

The reference image plane generation unit 17 sends a reference image plane of a palm, obtained by photographing the palm from the front with any camera, to the plane projective transformation matrix generation unit 15. In particular, the plane projective transformation matrix generation unit 15 sets a virtual square on this reference image plane, and the position and posture calculation unit 16 applies a projective transformation to the four vertices of this square using the plane projective transformation matrix to generate the perspective projection matrix, and estimates the posture of the "palm" imaged by the camera 2 based on the image coordinates of the four vertices obtained by the projective transformation. Accordingly, the "palm" in the image sent by the reference image plane generation unit 17 may be any representative palm and need not be the same palm as the one imaged by the camera 2.

FIG. 2 shows the processing environment in the example system using the image recognition apparatus of the embodiment. The subject "palm" 4 imaged by the camera 2 includes a palm portion 4a and a wrist vicinity portion 4b, and the wrist is distinguished from the sleeve 5 of the clothing. The image recognition apparatus 1 of this embodiment sets world coordinates (Xw, Yw, Zw) for the subject "palm" 4 and sets the reference image plane RS generated by the reference image plane generation unit 17 on the (Xw, Yw) plane of the world coordinates. The camera 2, meanwhile, is defined in camera coordinates (Xc, Yc, Zc) and has image coordinates CS (u, v) for the images it captures. The image recognition apparatus 1 therefore uses the transformation matrix between the reference image plane RS in world coordinates (Xw, Yw, Zw) and the image coordinates (u, v), which can be expressed by the plane projective transformation matrix described later, to determine the transformation matrix between the arbitrarily changing subject "palm" 4 in world coordinates (Xw, Yw, Zw) and the reference image plane RS, which can be expressed by the perspective projection matrix described later. It thereby realizes the projective transformation between the subject "palm" 4 in world coordinates (Xw, Yw, Zw) and the image coordinates (u, v), and estimates the three-dimensional posture of the "palm" imaged by the camera 2.

The association between image data and image coordinates, and between image data and fingertip type, can be implemented in any format. For example, it can be implemented in a format that attaches, to each pixel of the image, auxiliary information consisting of coordinate data and fingertip type information (which can be expressed in 3 bits because there are five fingers).

Next, the operation of the image recognition apparatus of the embodiment is described in more detail.

FIG. 3 is a flowchart showing the operation of the image recognition apparatus of the embodiment. In step S1, the image input unit 11 inputs the moving image captured by the camera 2 and sends it to the fingertip image coordinate acquisition unit 12 frame by frame.

In step S2, the fingertip image coordinate acquisition unit 12 first applies threshold processing for the skin color region in a color space in order to extract the palm region from one image frame of the camera video. It then obtains the circumscribed rectangle of the extracted palm region. Finally, it applies a circular extraction filter near the vertices of that circumscribed rectangle so that only fingertips are acquired. The operation of step S2 is described in detail later as "Process 1".

In step S3, the fingertip image coordinate acquisition unit 12 determines whether the image coordinates of five fingertips were acquired for the image frame being processed. If they were not, processing returns to step S2 for, for example, the next image frame; if they were, processing proceeds to step S4 for the current image frame. The image coordinates of five fingertips are judged to have been acquired when the acquired fingertip image coordinates are each separated by at least a predetermined distance and there are exactly five of them. The result of this determination may be recorded as a flag that can be identified per image frame. For example, when the image recognition apparatus of this embodiment is used to record the captured moving image as a video file, referring to this flag value immediately shows whether the fingertip image coordinates were acquired for a given frame.
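The check described for step S3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the `min_dist` parameter are hypothetical, and the "predetermined distance" is left to the caller.

```python
import math
from itertools import combinations


def has_five_fingertips(points, min_dist):
    """Return True when exactly five candidates exist and every pair
    of candidates is at least min_dist pixels apart."""
    if len(points) != 5:
        return False
    return all(math.dist(p, q) >= min_dist for p, q in combinations(points, 2))


def first_usable_frame(candidates_per_frame, min_dist):
    """Return the index of the first frame that passes the check, or None.

    candidates_per_frame is a list with one list of (u, v) fingertip
    candidates per image frame (an assumed data layout).
    """
    for index, candidates in enumerate(candidates_per_frame):
        if has_five_fingertips(candidates, min_dist):
            return index
    return None
```

Scanning frames in order corresponds to the "next image frame" option mentioned in the text; skipping a fixed number of frames would only change the iteration step.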

In step S4, the fingertip type determination unit 13 identifies which finger each fingertip coordinate acquired in step S2 belongs to, based on the relative sizes of the distances between fingertips. The operation of step S4 is described in detail later as "Process 2".

In step S5, the wrist image coordinate acquisition unit 14 searches along the straight line running from the image coordinates of the middle fingertip obtained in step S2 toward the centroid image coordinates of the five fingers, measuring the width of the skin color region in the direction orthogonal to that line. It estimates the wrist portion from the change in width near the wrist and acquires the image coordinates near the estimated wrist. Because step S5 serves only to improve the accuracy of the calculated parameters of the plane projection matrix described later, it does not necessarily have to be performed. Step S5 may also be performed before step S4 or in parallel with it. The operation of step S5 is described in detail later as "Process 3".
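The width search of step S5 might look like the sketch below. Several things here are assumptions rather than details from the text: the silhouette is a list of rows of 0/1 values indexed as `mask[v][u]`, the search starts at the centroid and continues away from the middle fingertip, and the wrist is declared where the width first drops to `ratio` times the widest value seen so far. The stopping rule in particular is hypothetical.

```python
import math


def width_at(mask, point, perp):
    """Count connected silhouette pixels through `point` along direction `perp`."""
    height, width = len(mask), len(mask[0])

    def inside(u, v):
        return 0 <= v < height and 0 <= u < width and mask[v][u] == 1

    if not inside(round(point[0]), round(point[1])):
        return 0
    count = 1
    for sign in (1, -1):
        k = 1
        while inside(round(point[0] + sign * k * perp[0]),
                     round(point[1] + sign * k * perp[1])):
            count += 1
            k += 1
    return count


def find_wrist(mask, middle_tip, centroid, ratio=0.7, max_steps=1000):
    """Walk from the centroid away from the middle fingertip and return the
    first (u, v) where the orthogonal width falls to ratio * widest width."""
    du, dv = centroid[0] - middle_tip[0], centroid[1] - middle_tip[1]
    norm = math.hypot(du, dv)
    if norm == 0:
        return None
    d = (du / norm, dv / norm)
    perp = (-d[1], d[0])          # direction orthogonal to the search line
    widest = 0
    for t in range(max_steps):
        p = (centroid[0] + t * d[0], centroid[1] + t * d[1])
        w = width_at(mask, p, perp)
        if w == 0:                # walked off the silhouette without a drop
            return None
        widest = max(widest, w)
        if w <= ratio * widest:
            return (round(p[0]), round(p[1]))
    return None
```

On a silhouette whose palm is 7 pixels wide and whose wrist is 3 pixels wide, the search stops at the first wrist row.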

In step S6, the plane projective transformation matrix generation unit 15 solves the simultaneous equations formed from the fingertip image coordinates (and the wrist image coordinates) obtained in steps S2, S4, and S5 and the reference coordinates of the fingertips and wrist on the palm reference image plane supplied from the reference image plane generation unit 17, and thereby determines each parameter of the plane projective transformation matrix of the fingertip image coordinates (and the wrist image coordinates) with respect to the reference image plane. The operation of step S6 is described in detail later as "Process 4".
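The simultaneous equations of step S6 can be illustrated with a small solver. This sketch makes simplifying assumptions that the text does not state: it uses exactly four point correspondences and fixes the bottom-right matrix entry to 1, which gives an 8x8 linear system solved by Gaussian elimination. With five fingertips plus the wrist the system is overdetermined and would normally be solved in the least-squares sense instead. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(ref_pts, img_pts):
    """Return the 3x3 matrix H mapping reference (x, y) to image (u, v).

    Each correspondence contributes two equations derived from
    u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(ref_pts, img_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, pt):
    """Map a reference-plane point into the image with H."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Once H is known, `apply_homography` is also what maps the four vertices of the virtual square on the reference image plane into the camera image.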

In step S7, the position and posture calculation unit 16 transforms the image coordinates of the four vertices of the square virtually placed on the reference image plane of the palm with the plane projective transformation matrix obtained in step S6. From the four image coordinates obtained by this transformation and the central projection matrix obtained in advance by camera calibration, it constructs the equations of two straight lines representing one pair of opposite sides of the square in space, and derives the x vector that defines the plane from the cross product of the normals of those two lines. It then constructs the equations of the other two straight lines, representing the remaining pair of opposite sides, and derives the y vector that defines the plane from the cross product of their normals. The x and y vectors obtained here are orthogonal, and the position and posture calculation unit 16 derives the z vector from their cross product, which determines the posture of the palm. From the x, y, and z vectors and the central projection matrix obtained in advance by camera calibration, the position vector of the camera 2 relative to the reference image plane RS of the palm can be obtained. The operation of step S7 is described in detail later as "Process 5".
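One way to read the cross-product construction of step S7 is sketched below. It rests on two assumptions that go beyond the text: the "central projection matrix" is taken to be a 3x3 intrinsic matrix K whose last row is (0, 0, 1), and the "normal" of each line is taken to be the normal of the plane through the camera centre and the imaged side, computed as K^T l for the image line l. Two parallel sides then share the direction given by the cross product of their plane normals. The sign-fixing step and all function names are hypothetical additions.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def mat_vec(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def transpose(m):
    return [[m[c][r] for c in range(3)] for r in range(3)]


def plane_normal(K, p, q):
    """Normal of the plane through the camera centre and the image line p-q."""
    line = cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))
    return mat_vec(transpose(K), line)


def side_direction(K, a0, a1, b0, b1):
    """Unit 3D direction shared by two parallel sides imaged as a0-a1 and b0-b1."""
    d = normalize(cross(plane_normal(K, a0, a1), plane_normal(K, b0, b1)))
    kd = mat_vec(K, d)
    # Image-space motion of a0 when moving along d; flip d if it points away from a1.
    du, dv = kd[0] - a0[0] * kd[2], kd[1] - a0[1] * kd[2]
    if du * (a1[0] - a0[0]) + dv * (a1[1] - a0[1]) < 0:
        d = tuple(-c for c in d)
    return d


def palm_axes(K, p0, p1, p2, p3):
    """Return unit x, y, z axes of the square p0-p1-p2-p3 in camera coordinates."""
    x = side_direction(K, p0, p1, p3, p2)   # sides p0-p1 and p3-p2
    y = side_direction(K, p0, p3, p1, p2)   # sides p0-p3 and p1-p2
    z = normalize(cross(x, y))
    return x, y, z
```

With noisy image points the recovered x and y are only approximately orthogonal, so a practical implementation would re-orthogonalize them before forming a rotation matrix.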

Each process is described in more detail below, beginning with Process 1.

[処理１：指先の画像座標の取得]
（１）肌色領域の抽出
指先画像座標取得部１２は、まず、図４（ａ）に例示するカメラ２で撮影した画像フレームのＲＧＢ値を、ＨＱＶ表色系に変換し、ＨＱＶ空間内での閾値処理により肌色領域を抽出する。抽出の結果、図４（ｂ）のような手のひらのシルエット画像が得られる。 [Process 1: Acquisition of fingertip image coordinates]
(1) Extraction of skin color region First, the fingertip image coordinate acquisition unit 12 converts the RGB value of the image frame photographed by the camera 2 illustrated in FIG. 4A to the HQV color system, and in the HQV space. A skin color region is extracted by threshold processing. As a result of extraction, a palm silhouette image as shown in FIG. 4B is obtained.
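The threshold step can be illustrated with a minimal sketch. The text names an "HQV" color system without defining it; the sketch assumes an HSV conversion via Python's standard `colorsys` module. The threshold values and the function names `is_skin` and `silhouette` are hypothetical and are not taken from the patent.

```python
import colorsys

# Hypothetical skin-tone thresholds in HSV space (all components in [0, 1]).
# The text gives no numeric values; these are illustrative assumptions only.
H_MAX = 50 / 360          # hue upper bound (reddish to yellowish tones)
S_MIN, S_MAX = 0.15, 0.75  # saturation range
V_MIN = 0.35               # minimum brightness

def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed skin range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h <= H_MAX and S_MIN <= s <= S_MAX and v >= V_MIN

def silhouette(frame):
    """Map a frame (rows of (r, g, b) tuples) to a binary silhouette (rows of 0/1)."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in frame]
```

In practice the thresholds would have to be tuned to the camera and lighting, which the text does not discuss.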

(2) Acquisition of fingertip image coordinates
Next, the fingertip image coordinate acquisition unit 12 obtains the minimum circumscribed polygon of the silhouette image of FIG. 4(b). To obtain the minimum circumscribed polygon of a given geometric shape in a binary image, it is preferable to use a minimum circumscribed polygon function known from computational geometry (see, for example, Nara Institute of Science and Technology OpenCV Programming Book Production Team, "OpenCV Programming Book", Mainichi Communications, September 2007).

As an example, this minimum circumscribed polygon can be obtained using the Graham's Scan algorithm as follows.
(1) Represent the outer edge of the given geometric shape by a plurality of sampling points at a predetermined interval.
(2) Among the sampling points, define the point with the smallest v coordinate as P0.
(3) Label the other sampling points P1, P2, P3, ... in ascending (or descending) order of their angle as seen from P0. When several sampling points lie at substantially the same angle, select the farthest one.
(4) First, connect the selected sampling points P0, P1, and P2 with straight lines.
(5) Next, if the angle (interior angle) formed by connecting the sampling points P1, P2, and P3 is 180 degrees or more, connect P1 and P3 directly without using P2 as a vertex.
(6) Repeat (5) for each subsequent sampling point until all sampling points have been connected by straight lines.
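Steps (2) to (6) can be sketched in a few lines of Python. This is a generic Graham scan over (u, v) points, not the OpenCV function the text refers to. The names `cross` and `graham_scan` are hypothetical. Points at the same angle are ordered nearest first and the nearer ones are discarded by the turn test, which has the same effect as selecting the farthest point in step (3).

```python
import math

def cross(o, a, b):
    """Z component of (a - o) x (b - o), treating (u, v) as plain Cartesian axes."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    """Return the convex hull of (u, v) points, starting at the point with minimum v."""
    pts = sorted(set(points), key=lambda p: (p[1], p[0]))
    if len(pts) < 3:
        return pts
    p0 = pts[0]  # step (2): the point with the smallest v coordinate
    # Step (3): order the remaining points by angle seen from P0 (ties: nearest first).
    rest = sorted(
        pts[1:],
        key=lambda p: (math.atan2(p[1] - p0[1], p[0] - p0[0]),
                       (p[0] - p0[0]) ** 2 + (p[1] - p0[1]) ** 2),
    )
    hull = [p0]
    for p in rest:
        # Step (5): drop the previous point while the turn at it is not strictly convex.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)  # steps (4) and (6): connect to the next sampling point
    return hull
```

For a unit-spaced square with interior and edge points, for example `[(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (0, 1)]`, only the four corners remain as hull vertices.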

FIG. 5(a) shows an example of the result of the minimum circumscribed polygon processing (for ease of understanding, the polygon is drawn on the palm in the actual image).

The circles in FIG. 5(a) indicate the image coordinates of the vertices of the circumscribed polygon. Because the minimum circumscribed polygon is obtained for the silhouette image of FIG. 4(b), the boundary between the clothing sleeve and the skin is also recognized as a vertex of the circumscribed polygon. The vertices that correspond to fingertips therefore need to be identified. In general, a fingertip is rounded and can be locally approximated by a circle, so the fingertip image coordinate acquisition unit 12 applies a circular extraction filter in the vicinity of each vertex of the circumscribed polygon. Specifically, it applies a circular pattern of a predetermined size near each vertex and detects five positions that fall within a predetermined frequency inside the circle. If five positions cannot be detected, another predetermined circular pattern is applied. This operation is repeated over a plurality of predetermined circular patterns until five positions are detected; if they cannot be detected, processing returns to Process 1 for another image frame. FIG. 5(b) shows an example of the output image of the circular extraction filter. The fingertip image coordinate acquisition unit 12 takes the maximum output of the circular extraction filter in the vicinity of each vertex as the filter output value of that vertex. That is, it selects the position of the maximum filter output obtained for each vertex and uses that position as the image coordinates of the corresponding fingertip. FIG. 5(c) shows a detection example (for ease of understanding, drawn on the palm in the actual image).

Any known technique can be used for the circular extraction filter (see, for example, Ikeya, Tomiyama, and Iwadate, "A study on moving object extraction in multi-view video and its CG representation", IEICE Technical Committee on Image Engineering, Vol. 105, No. 611, February 2006, pp. 165-170).

Next, Process 2 is described.

[Process 2: Association of fingertip image coordinates with fingers]
Process 1 yields five image coordinates corresponding to fingertips. Process 2 associates them with the actual fingers.

The fingertip type determination unit 13 arranges the five image coordinates in an array in an arbitrary order along the circumscribed polygon, and holds them in a predetermined format in which coordinate data and fingertip type information (which can be expressed in 3 bits, since there are five fingers) are attached as supplementary information to each pixel data of the image. As illustrated in FIG. 6, the fingertip image coordinates are held as an array F1 to F5. First, the straight-line distance on the image is calculated for every pair of the five image coordinates. The fingertips of the thumb and the little finger are taken to be at the two ends of the line with the largest distance. In FIG. 6, for example, the candidates for the thumb and the little finger are F3 and F4. Next, the distance between F3 and F2 and the distance between F4 and F5 are calculated and compared. Taking the larger of the two to be the distance between the thumb and the index finger identifies F4 as the thumb. Once the thumb has been identified, the remaining fingers are assigned in order along the array. This association with the actual fingers is used in acquiring the image coordinates of the wrist.
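The distance comparison can be sketched as follows. The sketch assumes that the five fingertips are listed in their order along the circumscribed polygon and that the two thumb and little-finger candidates are neighbors in that cyclic order, as F3 and F4 are in FIG. 6. The name `label_fingertips` and the dictionary output format are hypothetical.

```python
import math
from itertools import combinations

FINGERS = ["thumb", "index", "middle", "ring", "little"]

def label_fingertips(tips):
    """Label five fingertip points given in their order along the hull.

    Returns a dict mapping each finger name to its (u, v) point.
    """
    n = len(tips)
    assert n == 5, "exactly five fingertip coordinates are expected"

    def dist(i, j):
        return math.dist(tips[i], tips[j])

    # The farthest pair is taken to be the thumb and the little finger.
    a, b = max(combinations(range(n), 2), key=lambda ij: dist(*ij))
    # Assumption: the candidates are cyclic neighbours; order them so that b follows a.
    if (a + 1) % n != b:
        a, b = b, a
    assert (a + 1) % n == b, "thumb/little-finger candidates are not adjacent"
    # Compare each candidate with its other neighbour; the larger gap is thumb to index.
    if dist(a, (a - 1) % n) > dist(b, (b + 1) % n):
        thumb, step = a, -1   # walk through the array away from b
    else:
        thumb, step = b, +1   # walk through the array away from a
    return {name: tips[(thumb + k * step) % n] for k, name in enumerate(FINGERS)}
```

With an array arranged like F1 to F5 in FIG. 6 (middle, ring, little, thumb, index), the fourth entry is labeled as the thumb.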

Next, Process 3 is described.

[Process 3: Acquisition of wrist image coordinates]
Using the result of Process 2, the wrist image coordinate acquisition unit 14 estimates the wrist position from the image coordinates of the "middle finger" and the image coordinates of the "center of gravity" of the five fingers. As shown in FIG. 7, the direction from the image coordinates of the "middle finger" toward the image coordinates of the "center of gravity" is the first search direction ("search direction 1" in the figure), and the direction orthogonal to the first search direction within the palm silhouette image is the second search direction ("search direction 2" in the figure). To keep the search time as short as possible, it is preferable to start the search at the point "A", which lies beyond the "center of gravity" in the first search direction at a distance equal to the distance between the "middle finger" and the "center of gravity".

Moving one pixel at a time from the starting point "A" in the first search direction, the number of pixels belonging to the palm silhouette image region on the straight line in the second search direction is counted at each position. As the search proceeds from "A" in the first search direction, the count in the second search direction gradually decreases; past the wrist, the count stops decreasing and becomes almost constant. Based on this property, the boundaries of the palm silhouette image in the second search direction, at the point in the first search direction where the count starts to become constant, are determined as the wrist position. This yields two feature points on the wrist (the two circles in FIG. 7). Which wrist feature point is the left one and which is the right one can be determined from their distances to the fingertip feature points of the little finger and the thumb.
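The width-counting search can be illustrated with a simplified sketch. It assumes the hand is upright in the image, so that the first search direction is the downward row direction and the second search direction runs along each row; a general implementation would step along an arbitrary direction instead. The function name `find_wrist` and the parameters `stable_rows` and `tol` are hypothetical.

```python
def find_wrist(silhouette, start_row, stable_rows=3, tol=0):
    """Scan rows downward from start_row and return the two wrist boundary points.

    silhouette: rows of 0/1 values. Each row stands in for a line in the second
    search direction; increasing row index stands in for the first search direction.
    Returns ((u_left, v), (u_right, v)), or None if the width never becomes constant.
    """
    widths = [sum(row) for row in silhouette]  # silhouette pixel count per row
    for v in range(start_row, len(widths) - stable_rows):
        window = widths[v:v + stable_rows + 1]
        # The count has "become constant" when the following rows match this one.
        if widths[v] > 0 and max(window) - min(window) <= tol:
            cols = [u for u, px in enumerate(silhouette[v]) if px]
            return (cols[0], v), (cols[-1], v)
    return None
```

On noisy silhouettes a non-zero `tol` would probably be needed, since the text only requires the count to become "almost constant".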

Next, Process 4 is described with reference again to FIG. 2.

[Process 4: Generation of the plane projective transformation matrix]
In general, when the three-dimensional position of a point Q in space is given in world coordinates (Xw, Yw, Zw) and in camera coordinates (Xc, Yc, Zc), the relationship between the two is expressed by equation (1).

F is a 3 × 4 matrix representing the posture of the camera 2. It is composed of a 3 × 3 rotation matrix R and a three-dimensional translation vector T from the camera optical principal point Op to the world coordinate origin, as in equation (2).

When the position of the point Q is expressed in camera coordinates (Xc, Yc, Zc), the image coordinates (u, v) of the projection of Q onto the image captured by the camera 2 are given by equation (3).

Here, w is the distance from the camera optical principal point Op to the point Q. P is a 3 × 3 matrix representing the central projection of the camera 2 and is expressed by equation (4).
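The equation images are not reproduced in this text. Based only on the definitions given above (F is a 3 × 4 matrix built from R and T, P is a 3 × 3 matrix, and w is a scale factor), equations (1) to (4) presumably have the following form. The element names r_nm are assumed notation; P_nm follows the notation used later in the text for the elements of P.

```latex
\begin{align}
\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix}
  &= F \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} \tag{1} \\
F &= \begin{pmatrix} R & T \end{pmatrix}
   = \begin{pmatrix}
       r_{11} & r_{12} & r_{13} & T_x \\
       r_{21} & r_{22} & r_{23} & T_y \\
       r_{31} & r_{32} & r_{33} & T_z
     \end{pmatrix} \tag{2} \\
w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  &= P \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} \tag{3} \\
P &= \begin{pmatrix}
       P_{11} & P_{12} & P_{13} \\
       P_{21} & P_{22} & P_{23} \\
       P_{31} & P_{32} & P_{33}
     \end{pmatrix} \tag{4}
\end{align}
```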

This background is now applied to the present embodiment. In this embodiment, the reference image plane RS is assumed to be a planar object and is made to coincide with the (Xw, Yw) plane of the world coordinates. First, substituting equation (1) into equation (3) gives equation (5), which projects an arbitrary point expressed in world coordinates onto the camera image.

H is a 3 × 4 matrix obtained as the product of the matrices of equation (2) and equation (4). For a point on the (Xw, Yw) plane of the world coordinates, that is, on the reference image plane RS, Zw = 0, so the third column of the matrix H in equation (5) can be deleted, which gives equation (6).
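As a reconstruction under the same assumptions (the equation images are missing from this text), equations (5) and (6) would take the following form, with H' keeping the column indices 1, 2, and 4 of H after the third column is deleted.

```latex
\begin{align}
w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  &= H \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix},
  \qquad H = P F \tag{5} \\
w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  &= \begin{pmatrix}
       h_{11} & h_{12} & h_{14} \\
       h_{21} & h_{22} & h_{24} \\
       h_{31} & h_{32} & h_{34}
     \end{pmatrix}
     \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}
   = H' \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix} \tag{6}
\end{align}
```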

In equation (6), h_nm (n = 1 to 3, m = 1 to 4) denotes the element in row n and column m of the matrix H'; the column index m follows the numbering of H, whose third column has been deleted. The 3 × 3 matrix composed of the h_nm in equation (6) is the plane projective transformation matrix H' to be obtained. In this embodiment, substituting into equation (6) the reference coordinates (Xw, Yw) of a fingertip or the wrist on the reference image plane RS and the image coordinates (u, v) of the corresponding point on the camera image obtained in Processes 1 to 3 gives two equations in H' per point. The five fingertip feature points therefore give 10 equations, and adding the two wrist feature points gives 14 equations for a total of seven points. Equation (6) has nine unknowns, but fixing one of them to 1 still yields the displacement relative to the reference image plane RS, so in practice h_34 = 1 is set and the remaining eight are treated as unknowns. The least-squares solution for H' is then obtained by solving the 10 or 14 simultaneous equations for the eight unknowns using a generalized inverse matrix.
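The least-squares step can be sketched as follows. Each point pair contributes the two linear equations obtained from equation (6) with h_34 = 1. The text specifies a generalized inverse; this sketch instead solves the equivalent normal equations by Gauss-Jordan elimination, which assumes the equations have full column rank. All function names are hypothetical.

```python
def solve_normal_equations(A, b):
    """Least-squares solution of A x = b via (A^T A) x = A^T b and Gauss-Jordan."""
    n = len(A[0])
    # Build the augmented normal-equation matrix [A^T A | A^T b].
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * bi for r, bi in zip(A, b))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def estimate_plane_projection(ref_pts, img_pts):
    """Estimate the 3x3 matrix H' (h34 fixed to 1) from four or more point pairs."""
    A, b = [], []
    for (X, Y), (u, v) in zip(ref_pts, img_pts):
        # u * (h31 X + h32 Y + 1) = h11 X + h12 Y + h14, and likewise for v.
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve_normal_equations(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_plane_projection(H, X, Y):
    """Map a reference-plane point (X, Y) to image coordinates (u, v)."""
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    return ((H[0][0] * X + H[0][1] * Y + H[0][2]) / w,
            (H[1][0] * X + H[1][1] * Y + H[1][2]) / w)
```

With seven point pairs (five fingertips and two wrist points) the system has 14 equations in eight unknowns, matching the count given in the text.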

In this way, the plane projective transformation matrix generation unit 15 solves the simultaneous equations formed from the fingertip image coordinates (and the wrist image coordinates) and the reference coordinates of the fingertips and wrist on the reference image plane of the palm supplied from the reference image plane generation unit 17, and determines each parameter h_nm of the plane projective transformation matrix (equation (6)) that relates the fingertip image coordinates (and the wrist image coordinates) to the reference image plane RS.

Next, Process 5 is described with reference again to FIG. 2.

[Process 5: Calculation of the palm posture and position]
The plane projection matrix H' represents the posture and position of the palm relative to the reference image plane RS, but it is not easy to calculate the posture of the palm in world coordinates directly from H'. The position and orientation calculation unit 16 therefore places a virtual square on the reference image plane RS of the palm supplied from the reference image plane generation unit 17, as shown in FIG. 8(a), first applies the projective transformation H' to its four vertices, and then obtains the perspective projection matrix between the corresponding four vertices on the palm in world coordinates and the coordinates of their projections on the camera image.

Specifically, if the vertices of the square on the reference image plane RS are (Xw(i), Yw(i)), i = 1 to 4, the image coordinates (u(i), v(i)) of their projections can be calculated from equation (6) as in equation (7).

FIG. 8(b) shows the four vertices projected onto the image. The equation of the straight line L1 on the image from (u(1), v(1)) to (u(2), v(2)) is given by equation (8).

Here, the coefficients a_1, b_1, and c_1 can be calculated from (u(1), v(1)) and (u(2), v(2)). Substituting equation (4) into equation (3) to express the relationship between the camera coordinates and the image coordinates of the image frame gives equation (9).

Solving equation (9) for u and v and substituting into equation (8) gives equation (10).

Equation (10) is the equation of the plane containing the triangle formed by (u(1), v(1)), (u(2), v(2)), and the camera coordinate origin, and the coefficients of Xc, Yc, and Zc represent the normal of that plane. Since the P_nm are obtained by camera calibration, which defines the origin of the camera coordinates, the normal vector can be calculated.
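Equations (8) to (10) are not reproduced in this text. From the description, they can be reconstructed as follows; equation (10) is obtained by multiplying equation (8) by w and substituting the three relations of equation (9). The layout is an assumption, but the coefficients follow directly from the definitions above.

```latex
\begin{gather}
a_1 u + b_1 v + c_1 = 0 \tag{8} \\
\begin{aligned}
w u &= P_{11} X_c + P_{12} Y_c + P_{13} Z_c \\
w v &= P_{21} X_c + P_{22} Y_c + P_{23} Z_c \\
w   &= P_{31} X_c + P_{32} Y_c + P_{33} Z_c
\end{aligned} \tag{9} \\
\begin{aligned}
 & (a_1 P_{11} + b_1 P_{21} + c_1 P_{31})\, X_c
 + (a_1 P_{12} + b_1 P_{22} + c_1 P_{32})\, Y_c \\
 & \quad + (a_1 P_{13} + b_1 P_{23} + c_1 P_{33})\, Z_c = 0
\end{aligned} \tag{10}
\end{gather}
```

In matrix form, the normal vector of the plane is the product of the transpose of P with the coefficient vector (a_1, b_1, c_1).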

A normal vector is calculated in the same way for the straight line L2 defined by (u(3), v(3)) and (u(4), v(4)).

That is, the equation of the straight line L2 on the image from (u(3), v(3)) to (u(4), v(4)) is given by equation (11).

The coefficients a_2, b_2, and c_2 can be calculated from (u(3), v(3)) and (u(4), v(4)).

Accordingly, solving equation (9) for u and v and substituting into equation (11) gives equation (12).

The three-dimensional vector obtained as the cross product of the two normal vectors given by the coefficients of equations (10) and (12) is an in-plane vector of the virtual square, that is, of the reference image plane RS, in three-dimensional space, and serves as the x vector.

The same calculation as for equations (10) and (12) is carried out, using equation (9), for the straight line L3 from (u(1), v(1)) to (u(4), v(4)) and the straight line L4 from (u(2), v(2)) to (u(3), v(3)). The cross product of the two resulting normal vectors gives the other in-plane vector of the virtual square, that is, of the reference image plane, in three-dimensional space (the y vector).

Because the x vector and the y vector are orthogonal within the reference image plane RS, their cross product gives the z vector, which is orthogonal to the reference image plane RS. These three vectors represent the posture of the palm and form the row vectors of the rotation matrix R in equation (2).
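The derivation of the x, y, and z vectors can be sketched as follows. The sketch assumes a 3 × 3 projection matrix P given as nested lists and the four projected corners in the order (1), (2), (3), (4) used in the text. It computes each image line as the cross product of its two endpoints in homogeneous coordinates and each plane normal as the transpose of P applied to the line coefficients. The function names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)

def image_line(p, q):
    """Coefficients (a, b, c) of the line a*u + b*v + c = 0 through image points p, q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def plane_normal(P, line):
    """Normal of the plane through the camera origin and an image line: P^T (a, b, c)."""
    return tuple(sum(P[k][j] * line[k] for k in range(3)) for j in range(3))

def palm_axes(P, corners):
    """Unit x, y, z vectors (camera coordinates) from four projected square corners."""
    c1, c2, c3, c4 = corners

    def normal(p, q):
        return plane_normal(P, image_line(p, q))

    x = normalize(cross(normal(c1, c2), normal(c3, c4)))  # lines L1 and L2
    y = normalize(cross(normal(c1, c4), normal(c2, c3)))  # lines L3 and L4
    z = normalize(cross(x, y))
    return x, y, z
```

With measured image points the computed x and y vectors will generally be only approximately orthogonal, so a practical implementation might re-orthogonalize them; the text does not address this.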

The position of the camera is (Tx, Ty, Tz) in equation (2). Using equations (1) to (4), the transformation from world coordinates to image coordinates is written as equation (13).

Here, the matrix P can be determined in advance, for example by camera calibration that defines the origin of the camera coordinates, and the rotation matrix R has already been obtained as the x, y, and z vectors relative to the reference image plane RS. Substituting, for the four vertices of the virtual square, the coordinates (Xw(i), Yw(i)) on the reference image plane and the image coordinates (u(i), v(i)) of their projections as known values therefore gives eight equations. Solving these eight simultaneous equations with (Tx, Ty, Tz) as the unknowns gives the position (Tx, Ty, Tz) of the camera relative to the reference image plane. The perspective projection matrix of equation (13) is thereby determined, and the posture of the palm in world coordinates can be calculated from the palm image in camera coordinates as a displacement relative to the posture and position of the palm on the reference image plane RS.
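Equation (13) is not reproduced in this text. Assuming it combines equations (1) to (4) for points with Zw = 0, it can be written as below. The auxiliary vectors m and q are introduced here only to show how the eight equations become linear in the unknowns; they do not appear in the original.

```latex
\begin{gather}
w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = P \left( R \begin{pmatrix} X_w \\ Y_w \\ 0 \end{pmatrix} + T \right),
  \qquad T = \begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix} \tag{13} \\
m^{(i)} = P R \begin{pmatrix} X_w(i) \\ Y_w(i) \\ 0 \end{pmatrix},
  \qquad q = P\,T \notag \\
q_1 - u(i)\, q_3 = u(i)\, m^{(i)}_3 - m^{(i)}_1, \qquad
q_2 - v(i)\, q_3 = v(i)\, m^{(i)}_3 - m^{(i)}_2,
  \qquad i = 1, \dots, 4 \notag \\
T = P^{-1} q \notag
\end{gather}
```

The four vertices give eight linear equations in the three components of q, which can be solved in the least-squares sense before recovering T.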

Through Processes 1 to 5 above, the posture and position of the palm in three-dimensional space can be obtained from the image of the palm captured by the camera 2.

As one aspect of the present invention, the image recognition apparatus 1 can be configured as a computer. A program for realizing the functions of the image input unit 11, fingertip image coordinate acquisition unit 12, fingertip type determination unit 13, wrist image coordinate acquisition unit 14, plane projective transformation matrix generation unit 15, position and orientation calculation unit 16, and reference image plane generation unit 17 described above is stored in a storage unit (not shown) provided inside or outside the computer. The coordinate data of the 11 points in total (fingertips, wrist, and square) on the reference image plane used by the reference image plane generation unit 17, together with the world coordinate settings, can also be stored in this storage unit. Such a storage unit can be realized by an external storage device such as an external hard disk, or by an internal storage device such as ROM or RAM. The control unit that executes the program can be realized by a central processing unit (CPU) or the like. That is, the CPU reads from the storage unit, as appropriate, the program describing the processing for realizing the function of each component, and thereby realizes each device on the computer. The function of any of these means may also be realized wholly or partly in hardware.

In the embodiment described above, the program describing the processing for realizing the functions of the image recognition apparatus 1 can also be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording device, or a semiconductor memory.

The program describing this processing can be distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM. It can also be distributed by storing it in the storage area of a server on a network, such as an IP network, and transferring it from the server to other computers over the network.

A computer that executes such a program can, for example, first store the program recorded on a portable recording medium, or the program transferred from a server, in its own storage unit. As another embodiment, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program each time the program is transferred from the server. The program in this aspect includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

The image recognition apparatus 1 of the above embodiment has been described as a representative example that receives and processes images captured by a camera, but it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of the present invention. For example, instead of receiving a moving image through the camera 2, the image recognition apparatus 1 can also estimate the posture of the "palm" in a video file obtained via a network or a storage medium by setting an arbitrary initial value for the matrix P between the captured image and the world coordinates, provided that the matrix P is known. The image recognition apparatus 1 can also be applied to a camera-equipped mobile phone. Accordingly, the present invention should not be construed as limited by the embodiment described above, but only by the claims.

According to the present invention, the posture is estimated using fingertip extraction from the palm and determination of the fingertip types instead of a marker, so no marker appears in the image captured by the camera, which helps produce natural composite video with CG or the like. For example, the invention can be used in broadcast programs, or as a simple virtual effects device for individuals creating video content. Because the posture of the palm can be estimated from a captured image in which no marker appears, it can also be used as an interface for computer interaction using the palm. Applications in various other fields can also be expected, such as manipulating objects in a virtual space using the estimated palm position and posture, or electronic games, and the invention is useful for any application that uses the estimated position and posture of the palm.

A diagram showing an example system using the image recognition apparatus of an embodiment of the present invention.
A diagram showing the processing environment in the example system using the image recognition apparatus of the embodiment of the present invention.
A flowchart showing the operation of the image recognition apparatus of the embodiment of the present invention.
An explanatory diagram of the operation of the image recognition apparatus of the embodiment of the present invention.
An explanatory diagram of the operation of the image recognition apparatus of the embodiment of the present invention.
An explanatory diagram of the operation of the image recognition apparatus of the embodiment of the present invention.
An explanatory diagram of the operation of the image recognition apparatus of the embodiment of the present invention.
An explanatory diagram of the operation of the image recognition apparatus of the embodiment of the present invention.
A diagram showing an example system using a conventional image recognition apparatus.

Explanation of symbols

1 Image recognition apparatus
2 Camera
3 Physical object
4 Palm
4a Palm portion
4b Portion near the wrist
5 Clothing sleeve
11 Image input unit
12 Fingertip image coordinate acquisition unit
13 Fingertip type determination unit
14 Wrist image coordinate acquisition unit
15 Plane projective transformation matrix generation unit
16 Position and orientation calculation unit
17 Reference image plane generation unit
100 Posture and position estimation device
101 Marker
102 Camera
103 Object
111 Image input unit
112 Marker detection unit
113 Position and orientation calculation unit

Claims

An image recognition apparatus for estimating the three-dimensional posture of a palm included in a moving image from the moving image alone, comprising:
a fingertip image coordinate acquisition unit that applies skin color determination processing to the palm image portion in an image frame of the moving image to extract a palm silhouette image, and applies minimum circumscribed polygon processing and circular extraction filter processing to the palm image portion in the silhouette image to acquire the image coordinates of the fingertips;
a plane projective transformation matrix generation unit that sets world coordinates for the moving image and a reference image plane located on one plane of the world coordinates, and generates a plane projection matrix of the image coordinates of the fingertips with respect to the reference coordinates of the fingertips on the reference image plane; and
a position and orientation calculation unit that generates a perspective projection matrix for the reference image plane from the plane projection matrix, and generates posture information of the palm in the world coordinates from the image of the palm included in the moving image.

The image recognition apparatus according to claim 1, further comprising:
a fingertip type determination unit that compares the distances between the image coordinates of the feature points corresponding to the extracted fingertips to identify the type of each fingertip; and
a wrist image coordinate acquisition unit that, within the palm image portion of the palm silhouette image, counts the silhouette image components orthogonal to the line segment connecting the middle finger and the center of gravity of the fingers while moving from the center of gravity along the direction of that line segment, tracks the trend of the count, and determines the image coordinates of the feature points near the wrist,
wherein the plane projection transformation matrix generation unit generates the plane projection matrix of the image coordinates of each fingertip and of the feature points near the wrist with respect to the reference coordinates of each fingertip and the reference coordinates of the wrist on the reference image plane located on one plane of the world coordinates.
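The wrist search in the claim above can be sketched as follows. This is only an illustrative sketch under several assumptions that the claim does not state: the silhouette is a binary mask (1 = palm), the walk proceeds from the center of gravity away from the middle finger in one-pixel steps, and the wrist is taken to be the first cross-section whose width has dropped below the widest one seen so far and has stopped shrinking. The names `run_across` and `find_wrist` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Mask = List[List[int]]
Point = Tuple[float, float]


def run_across(mask: Mask, x: float, y: float,
               px: float, py: float) -> Optional[Tuple[Point, Point, int]]:
    """Contiguous silhouette run through (x, y) along direction +/-(px, py).

    Returns both end points and the pixel count, or None if (x, y) is
    outside the silhouette.
    """
    height, width = len(mask), len(mask[0])

    def filled(qx: float, qy: float) -> bool:
        xi, yi = int(round(qx)), int(round(qy))
        return 0 <= yi < height and 0 <= xi < width and mask[yi][xi] == 1

    if not filled(x, y):
        return None
    reach = []
    for sign in (-1.0, 1.0):
        k = 0
        while filled(x + sign * (k + 1) * px, y + sign * (k + 1) * py):
            k += 1
        reach.append(k)
    end_a = (x - reach[0] * px, y - reach[0] * py)
    end_b = (x + reach[1] * px, y + reach[1] * py)
    return end_a, end_b, reach[0] + reach[1] + 1


def find_wrist(mask: Mask, middle_tip: Point, centroid: Point,
               max_steps: int = 1000) -> Optional[Tuple[Point, Point]]:
    """Walk from the centroid away from the middle fingertip and return the
    two edge points of the cross-section assumed to be the wrist."""
    dx, dy = centroid[0] - middle_tip[0], centroid[1] - middle_tip[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("middle fingertip and centroid coincide")
    dx, dy = dx / norm, dy / norm
    px, py = -dy, dx  # direction orthogonal to the walking direction
    widest = 0
    prev = None
    for step in range(max_steps):
        cur = run_across(mask, centroid[0] + step * dx,
                         centroid[1] + step * dy, px, py)
        if cur is None:
            break  # left the silhouette without finding a wrist
        # Assumed trend rule: narrower than the widest section so far,
        # and the width is no longer decreasing.
        if prev is not None and prev[2] < widest and cur[2] >= prev[2]:
            return prev[0], prev[1]
        widest = max(widest, cur[2])
        prev = cur
    return None
```

The two returned edge points could serve as the feature points near the wrist that are added to the fingertip correspondences. A practical version would likely smooth the width sequence before testing the trend, since a noisy silhouette can trigger the rule too early.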

The image recognition apparatus according to claim 1, wherein the plane projection transformation matrix generation unit sets a virtual square on the reference image plane, and the position and orientation calculation unit projectively transforms the four vertices of the square using the plane projection transformation matrix, generates the perspective projection matrix for the reference image plane based on the image coordinates of the four vertices obtained by the projective transformation, and estimates the posture of the palm.
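The virtual-square step in the claim above can be illustrated with a short sketch. It projects the four vertices of a square on the reference image plane with the plane projection matrix H, then recovers a rotation and translation. Two points here are assumptions rather than content of the claim: the camera is modeled as a pinhole with a known focal length and principal point (`focal`, `cx`, `cy`), and the pose is obtained by decomposing H directly instead of re-estimating it from the four projected vertices. This is a substitution chosen for brevity, valid because four points in general position determine the same H. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def square_vertices(side: float) -> List[Point]:
    """Vertices of a virtual square centered at the reference-plane origin."""
    s = side / 2.0
    return [(-s, -s), (s, -s), (s, s), (-s, s)]


def project(h: Matrix, p: Point) -> Point:
    """Projective transformation of a reference-plane point by H."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def pose_from_homography(h: Matrix, focal: float, cx: float,
                         cy: float) -> Tuple[Matrix, List[float]]:
    """Recover (R, t) from H ~ K [r1 r2 t], assuming known intrinsics K."""

    def back_project(j: int) -> List[float]:
        # Column j of K^-1 H for K = [[f, 0, cx], [0, f, cy], [0, 0, 1]].
        a, b, c = h[0][j], h[1][j], h[2][j]
        return [(a - cx * c) / focal, (b - cy * c) / focal, c]

    c1, c2, c3 = back_project(0), back_project(1), back_project(2)
    n1 = math.sqrt(sum(v * v for v in c1))
    n2 = math.sqrt(sum(v * v for v in c2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("degenerate homography")
    # Choose the sign that places the plane in front of the camera.
    sign = -1.0 if c3[2] < 0.0 else 1.0
    r1 = [sign * v / n1 for v in c1]
    r2 = [sign * v / n2 for v in c2]
    r3 = [r1[1] * r2[2] - r1[2] * r2[1],
          r1[2] * r2[0] - r1[0] * r2[2],
          r1[0] * r2[1] - r1[1] * r2[0]]
    t = [sign * v * 2.0 / (n1 + n2) for v in c3]
    rotation = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return rotation, t
```

Calling `[project(h, v) for v in square_vertices(1.0)]` gives the image coordinates of the four vertices. With noisy input, `r1` and `r2` will not be exactly orthogonal, so a real system would re-orthogonalize the rotation before using it to place a virtual object on the palm.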

The image recognition apparatus according to claim 1, wherein the fingertip image coordinate acquisition unit includes means for determining whether the image coordinates of the fingertips acquired from the image frame being processed have been acquired as the image coordinates of five fingertips, and, when it is determined that they have not, takes another image frame in the moving image as the processing target.

A program for causing a computer, configured as an image recognition apparatus that estimates the three-dimensional posture of a palm included in a moving image from only the moving image, to execute:
a step of applying skin color determination processing to the palm image portion in an image frame of the moving image to extract a palm silhouette image, and applying minimum circumscribed polygon processing and circular extraction filter processing to the palm image portion in the silhouette image to acquire the image coordinates of the fingertips;
a step of setting world coordinates of the moving image and a reference image plane located on one plane of the world coordinates, and generating a plane projection matrix of the image coordinates of the fingertips with respect to the reference coordinates of the fingertips on the reference image plane; and
a step of generating a perspective projection matrix for the reference image plane from the plane projection matrix, and generating posture information of the palm in the world coordinates from the image of the palm included in the moving image.