JP2020013348A - Gesture detection device, gesture detection method, and gesture detection control program

Info

Publication number: JP2020013348A
Application number: JP2018135291A
Authority: JP
Inventors: Benitez Gibran (ヒブランベニテス); Munenori Ukita (浮田 宗伯); Yoshiyuki Tsuda (津田 佳行)
Original assignee: Denso Corp; Toyota Gakuen
Current assignee: Denso Corp; Toyota Gakuen
Priority date: 2018-07-18
Filing date: 2018-07-18
Publication date: 2020-01-23
Anticipated expiration: 2038-07-18
Also published as: JP7163649B2

Abstract

PROBLEM TO BE SOLVED: To provide a gesture detection device, a gesture detection method, and a gesture detection control program that can recognize a gesture intended by an operator without adding operating effort for the sake of gesture detection. SOLUTION: A gesture detection device, which detects a gesture when an input operation is performed on an operation target based on a gesture of an operator, includes: a gesture section extraction unit 120 that, based on feature amount data extracted from imaging data of the operator captured by an imaging device 10, takes the earlier of a pair of mutually similar gestures as the gesture immediately before the start of the input operation and the later of the pair as the gesture immediately after its end; and a gesture identification unit 130 that identifies, as an input gesture, a gesture performed between the gesture immediately before the start and the gesture immediately after the end. SELECTED DRAWING: Figure 1

Description

The present invention relates to a gesture detection device, a gesture detection method, and a gesture detection control program for detecting a gesture used for an input operation.

As a conventional gesture detection device, the device described in Patent Document 1, for example, is known. The gesture detection device (hand pattern switch device) of Patent Document 1 reliably distinguishes whether an operator's gesture was made intentionally or inadvertently, and thereby suppresses erroneous recognition. Specifically, when the operator holds the fingers in a preset shape and keeps them stationary for a predetermined time within the imaging region of a camera, the detection device recognizes that an intentional gesture operation by the operator has started. The detection device then executes detection processing for the gesture operation based on the shape and movement of the fingers.

Patent Document 1: JP 2004-145723 A

However, in Patent Document 1, for the detection device to recognize the start of an intentional gesture operation, the operator must hold the fingers stationary in a preset shape for a predetermined time, which makes the gesture operation cumbersome.

In view of the above problem, an object of the present invention is to provide a gesture detection device, a gesture detection method, and a gesture detection control program that can recognize a gesture as one intended by the operator without adding operating effort for the sake of gesture detection.

To achieve the above object, the present invention employs the following technical means.

A first invention is a gesture detection device that detects a gesture when an input operation is performed on an operation target based on a gesture of an operator, the device comprising:
a gesture section extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a gesture section by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
a gesture identification unit (130) that identifies a gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.

A second invention is a gesture detection method for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the method comprising:
extracting a gesture section, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
identifying a gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.

A third invention is a gesture detection control program for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the program causing a computer to function as:
a gesture section extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a gesture section by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
a gesture identification unit (130) that identifies a gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.

In general, an operator takes similar postures at the start and at the end of a gesture. That is, when starting a given gesture, the operator first holds up the hand in a given finger shape (the basic shape of the gesture) and, after performing the gesture, finishes it while maintaining the same finger shape as at the beginning.

Therefore, according to the present invention, capturing a pair of mutually similar gestures makes it possible to extract the gesture immediately before the start and the gesture immediately after the end, and thus to extract the gesture section. Identifying the gesture performed between these two gestures as the input gesture then makes it possible to reliably recognize it as a gesture intended by the operator rather than, for example, one made inadvertently. Consequently, unlike the prior art, the input gesture can be recognized without adding operating effort for the sake of gesture detection, and erroneous recognition can in turn be suppressed.

The reference numerals in parentheses for the above means indicate correspondence with the specific means described in the embodiments below.

FIG. 1 is a block diagram showing the gesture detection device.
FIG. 2 is an explanatory diagram showing how candidates for the start and end of a gesture are extracted.
FIG. 3 is an explanatory diagram showing how the similarity between gesture start/end candidates is calculated.
FIG. 4 is an explanatory diagram showing how the start and end of a gesture are determined.

Hereinafter, a plurality of modes for carrying out the present invention will be described with reference to the drawings. In each mode, parts corresponding to matters described in a preceding mode may be given the same reference numerals, and redundant description may be omitted. Where only part of a configuration is described in a mode, the other modes described earlier can be applied to the remaining parts of the configuration. Beyond the combinations of parts that each embodiment explicitly states to be combinable, embodiments may also be partially combined with one another even where this is not stated, provided the combination poses no particular problem.

(First Embodiment)
The gesture detection device 100 of the first embodiment will be described with reference to FIGS. 1 to 4. The gesture detection device 100 of this embodiment is mounted on a vehicle and detects gestures when input operations on various vehicle devices are performed based on the movement (gesture) of a specific part of the body of the driver (operator).

Examples of the vehicle devices include an air conditioner that air-conditions the vehicle interior, a car navigation device that displays the current position of the vehicle or guidance to a destination, and an audio device that plays television broadcasts, radio broadcasts, CDs/DVDs, and the like. These vehicle devices correspond to the operation target of the present invention. The vehicle devices are not limited to the above and may also include a head-up display device, a room lamp device, a rear-seat sunshade device, a power seat device, a glove box opening/closing device, and the like.

As shown in FIG. 1, the gesture detection device 100 detects gestures for input operations based on imaging data of a specific part of the driver's body captured by the imaging device 10. The vehicle devices are operated based on the detected gesture, and the result of the operation (operating state, etc.) is displayed on the information device 20.

The specific part of the driver's body can be, for example, the fingers, palm, or arm of the driver's hand. In this embodiment, gestures made mainly with the fingers (pointing gesture, circle gesture, wave gesture, flick gesture, etc.) are used as the gestures for input operations.

Through the gestures for input operations, the vehicle devices are operated as follows: for the air conditioner, the set temperature, the air volume of the conditioned air, and the like are changed; for the car navigation device, the map is enlarged or reduced, a destination is set, and the like; and for the audio device, the television or radio station is changed, a track is selected, the volume is changed, and the like.

The imaging device 10 continuously captures (over time) the movement of the specific part of the driver's body and outputs the imaging data to the gesture detection device 100 (hand detection unit 110).

As the imaging device 10, a camera that forms a luminance image of an object, a distance image sensor that forms a distance image, or a combination of these can be used. Examples of the camera include a near-infrared camera that captures near-infrared light and a visible-light camera that captures visible light.

Examples of the distance image sensor include a stereo camera, which captures images simultaneously with multiple cameras and measures depth information from the parallax, and a ToF (Time of Flight) camera, which measures depth from the time it takes light from a light source to be reflected by the object and return. In this embodiment, a camera is used as the imaging device 10.

The information device 20 is a display unit that displays the operating state and the like of the vehicle devices, and includes an air-conditioning display unit, a car navigation display unit, an audio display unit, and the like. The information device 20 is formed by, for example, a liquid crystal display or an organic EL display.

The gesture detection device 100 is, for example, a computer including a CPU, ROM, RAM, and the like, and includes a hand detection unit 110, a gesture section extraction unit 120, a gesture identification unit 130, and the like.

The hand detection unit 110 identifies, from the imaging data (images) captured by the imaging device 10, the region where the driver's hand is present, and outputs the data of the identified region to the gesture section extraction unit 120. Methods for detecting a hand are described in, for example, the following literature.
・M. Kolsch and M. Turk: Robust Hand Detection. FGR, 2004
・Wei Fan, Li Chen, Wei Liu, Yuan He, Jun Sun, Shutao Li: Monocular Vision Based Relative Depth Estimation for Hand Gesture Recognition. MVA 2013

The gesture section extraction unit 120 extracts the section of a gesture that the driver made intentionally for an input operation, and includes a start/end candidate extraction unit 121, a similarity calculation unit 122, a start/end determination unit 123, and the like (described in detail later).

The gesture identification unit 130 identifies the gesture in the section extracted by the gesture section extraction unit 120 as a gesture that the driver made intentionally for an input operation. The gesture identification unit 130 outputs the identified gesture to the vehicle devices and the information device 20.

The gesture detection device 100 of this embodiment is configured as described above. Its operation and effects are described below with additional reference to FIGS. 2 to 4.

The imaging data captured by the imaging device 10 is output to the hand detection unit 110. The hand detection unit 110 identifies the region of the imaging data where the hand is present and outputs the data of the identified region to the start/end candidate extraction unit 121 of the gesture section extraction unit 120.

As shown in FIG. 2, the start/end candidate extraction unit 121 extracts, from the frames of imaging data output by the hand detection unit 110, candidates for the frames at the start and at the end of a gesture. Here, the frame at the start of a gesture is a frame showing the driver's state when beginning the gesture. For example, if the gesture to be performed is a flick gesture, the start of the gesture corresponds to the state in which the driver first holds up a hand with a finger raised. Likewise, the end of the gesture corresponds to the state in which, on finishing the flick gesture, the hand posture with the finger raised is maintained just as at the start.

Using three or more consecutive imaging frames (imaging data), the start/end candidate extraction unit 121 extracts a static feature amount (feature amount data) based on a single frame and a dynamic feature amount (feature amount data) based on multiple frames, and combines them into a single feature amount. Preferably, HOG (Histogram of Oriented Gradients) can be used as the static feature amount and HoOF (Histogram of Optical Flow) as the dynamic feature amount.
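The combination of a single-frame feature with a multi-frame feature can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: `orientation_histogram` is a heavily simplified HOG-like descriptor (one histogram over the whole frame, without cells or blocks), and `motion_energy` is a stand-in for HoOF that uses plain frame differences rather than optical flow. All function names, the 8-bin setting, and the toy frames are assumptions made for this example.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]  # grayscale image: rows of pixel intensities
Vector = List[float]


def orientation_histogram(frame: Frame, bins: int = 8) -> Vector:
    """Simplified HOG-like static feature: a magnitude-weighted histogram
    of unsigned gradient orientations over the interior of one frame."""
    hist = [0.0] * bins
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def motion_energy(window: Sequence[Frame]) -> Vector:
    """Stand-in dynamic feature (NOT optical flow): mean absolute intensity
    change between each pair of consecutive frames in the window."""
    energies = []
    for prev, curr in zip(window, window[1:]):
        diffs = [abs(c - p)
                 for row_p, row_c in zip(prev, curr)
                 for p, c in zip(row_p, row_c)]
        energies.append(sum(diffs) / len(diffs))
    return energies


def combined_feature(window: Sequence[Frame], bins: int = 8) -> Vector:
    """Concatenate the static feature of the middle frame with the
    dynamic feature computed over the whole window."""
    if len(window) < 3:
        raise ValueError("need at least three consecutive frames")
    middle = window[len(window) // 2]
    return orientation_histogram(middle, bins) + motion_energy(window)


# Toy 4x4 frames: a uniform frame and a frame with a vertical edge.
flat = [[0.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
feature = combined_feature([flat, edge, flat])
print(len(feature))  # orientation bins plus one motion value per frame pair
```

In a practical system the two helper functions would be replaced by a real HOG and optical-flow histogram implementation; only the windowing and concatenation step is the point of the sketch.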

The database of the start/end candidate extraction unit 121 stores, in advance, reference data obtained by learning the criteria for classifying feature amounts at the start and end of a gesture from feature amounts at other times. The start/end candidate extraction unit 121 classifies the feature amounts extracted from the actual imaging frames using the reference data and extracts, as candidates, those that satisfy the criteria for start or end feature amounts. Machine learning techniques can be used for the learning and classification; preferably, Random Forest can be used. In FIG. 2, the portions of the consecutive imaging frames marked 1, 2, 3, and 4 are the multiple portions (frames) extracted as candidates for the start and end of a gesture.
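The candidate extraction step can be pictured as a filter over per-frame feature vectors. The sketch below assumes a pre-trained scoring function is available; `boundary_score` stands in for whatever classifier was learned (for example, the positive-class probability of a Random Forest), and `toy_boundary_score`, the threshold, and the sample feature values are purely hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def extract_candidates(
    frame_features: Sequence[Vector],
    boundary_score: Callable[[Vector], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of frames whose feature vector is scored as a
    gesture start/end pose by the (pre-trained) scoring function."""
    return [
        index
        for index, feature in enumerate(frame_features)
        if boundary_score(feature) >= threshold
    ]


def toy_boundary_score(feature: Vector) -> float:
    """Illustration only: treats low motion (last element of the vector)
    as evidence of a held start/end pose. A trained classifier would be
    used here instead."""
    return 1.0 - min(feature[-1], 1.0)


# Hypothetical per-frame features: [static value, motion value].
features = [[0.2, 0.05], [0.2, 0.9], [0.3, 0.1], [0.1, 0.8], [0.2, 0.0]]
candidates = extract_candidates(features, toy_boundary_score, threshold=0.8)
print(candidates)
```

Passing the classifier in as a callable keeps the sketch independent of any particular machine learning library.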

The similarity calculation unit 122 forms every possible pair from the multiple candidates extracted by the start/end candidate extraction unit 121, extracts static feature amounts (feature amount data) from the imaging frames of the candidates in each pair, and uses the extracted feature amounts to calculate a similarity for each pair. FIG. 3 shows that the possible pairs are candidates 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4. The similarity calculation unit 122 calculates the similarity between the two candidates for each of these pairs.
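Forming all pairs and scoring them might look like the following sketch. The description does not specify the similarity measure, so cosine similarity is used here as an assumption; the candidate labels 1 to 4 follow FIG. 3, while the feature values are invented for illustration.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def pairwise_similarities(
    candidates: Dict[int, Vector]
) -> Dict[Tuple[int, int], float]:
    """Map every unordered pair of candidate labels to the similarity of
    their static feature vectors."""
    return {
        (i, j): cosine_similarity(candidates[i], candidates[j])
        for i, j in combinations(sorted(candidates), 2)
    }


# Hypothetical static feature vectors for candidates 1 to 4.
static_features = {
    1: [1.0, 0.0, 0.2],
    2: [0.9, 0.1, 0.2],
    3: [0.0, 1.0, 0.5],
    4: [0.3, 0.3, 0.9],
}
similarities = pairwise_similarities(static_features)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

With four candidates the dictionary holds the six pairs listed for FIG. 3.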

The start/end determination unit 123 extracts, from the similarities calculated by the similarity calculation unit 122, the pair with the highest similarity, and determines the earlier member of that pair as the frame of the gesture immediately before the start and the later member as the frame of the gesture immediately after the end. The start/end determination unit 123 then determines the frames between the frame immediately before the start and the frame immediately after the end as the section of the gesture that the driver made for the input operation. FIG. 4 shows that, among the pairs in FIG. 3, the pair of candidates 1 and 2 has the highest similarity, and the span between candidates 1 and 2 has been determined as the gesture section.
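The selection rule described here (highest similarity wins, earlier frame is the start, later frame is the end) can be sketched as below. The function name, the use of frame indices as dictionary keys, and the sample similarity values are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def determine_gesture_section(
    similarities: Dict[Tuple[int, int], float]
) -> Tuple[int, int, List[int]]:
    """Pick the most similar pair of candidate frames.

    Keys are pairs of candidate frame indices. Returns the frame just
    before the start, the frame just after the end, and the indices of
    the frames lying strictly between them (the gesture section)."""
    if not similarities:
        raise ValueError("at least one candidate pair is required")
    first, second = max(similarities, key=similarities.get)
    start_frame, end_frame = min(first, second), max(first, second)
    section = list(range(start_frame + 1, end_frame))
    return start_frame, end_frame, section


# Hypothetical similarities between candidate frames 12, 40, and 75.
example = {(12, 40): 0.99, (12, 75): 0.09, (40, 75): 0.19}
start, end, section = determine_gesture_section(example)
print(start, end, len(section))
```

Note that taking only the single best pair assumes one gesture per analysed window; handling several gestures in one stream would need an additional rule that the description does not give.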

The gesture identification unit 130 then extracts static and dynamic feature amounts for all imaging frames in the gesture section determined by the start/end determination unit 123. The database of the gesture identification unit 130 stores, in advance, gesture reference data indicating the criteria for classifying the feature amounts of each gesture motion. Based on the gesture reference data, the gesture identification unit 130 identifies (classifies) which gesture motion the feature amounts of the imaging frames in the gesture section belong to.
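As a rough illustration of classifying a gesture section against stored reference data, the sketch below averages the per-frame feature vectors of the section and picks the nearest reference vector. Nearest-centroid matching is a deliberately simple substitute chosen for this example; the description does not state which classifier the gesture identification unit 130 uses, and the reference vectors and feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify_section(
    section_features: Sequence[Vector],
    reference: Dict[str, Vector],
) -> str:
    """Return the label of the reference vector closest (Euclidean
    distance) to the mean feature vector of the gesture section."""
    if not section_features:
        raise ValueError("gesture section contains no frames")
    summary = mean_vector(section_features)
    return min(reference, key=lambda label: math.dist(summary, reference[label]))


# Hypothetical gesture reference data and section features.
gesture_reference = {"flick": [0.8, 0.2], "circle": [0.2, 0.8]}
section_features = [[0.7, 0.3], [0.9, 0.1], [0.8, 0.3]]
label = classify_section(section_features, gesture_reference)
print(label)
```

Averaging discards the temporal order of the frames, so a sequence-aware classifier would likely be preferable in practice; the sketch only shows where the reference data enters the decision.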

The gesture identification unit 130 outputs the identified gesture to the vehicle devices and to each information device 20, whereby the input operation corresponding to the gesture is performed on the vehicle devices and the resulting operating state and the like are displayed on each information device 20.

In general, a driver takes similar postures at the start and at the end of a gesture. That is, when starting a given gesture, the driver first holds up the hand in a given finger shape (the basic shape of the gesture) and, after performing the gesture, finishes it while maintaining the same finger shape as at the beginning.

Therefore, according to this embodiment, by capturing a pair of mutually similar gestures, the gesture section extraction unit 120 can extract the gesture immediately before the start and the gesture immediately after the end, and thus extract the gesture section. By identifying the gesture performed between these two gestures as the input gesture, the gesture identification unit 130 can reliably recognize it as a gesture intended by the driver rather than, for example, one made inadvertently. Consequently, unlike the prior art, the input gesture can be recognized without adding operating effort for the sake of gesture detection, and erroneous recognition can in turn be suppressed.

Furthermore, in this embodiment, a start/end candidate extraction unit 121 that extracts candidates for multiple pairs of mutually similar gestures is provided at the stage before the similarity calculation unit 122 performs its processing. Because the candidates are extracted in advance, the similarity calculation unit 122 can calculate the similarity for each pair of candidates in an orderly manner, which reduces the computational load.
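The size of this saving can be estimated by counting pairs. With n candidates, the number of pairs to be compared is given below; n = 4 yields the six pairs of FIG. 3. The comparison value n = 300 is a hypothetical figure (for example, every frame of a 10-second clip at 30 frames per second, if no candidates were pre-selected) and does not come from the description.

```latex
\[
P(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad P(4) = 6,
\qquad P(300) = 44850 .
\]
```

Because the count grows quadratically with n, narrowing the frames to a handful of candidates before the similarity calculation reduces the number of comparisons by several orders of magnitude in this hypothetical case.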

(Other Embodiments)
Although the first embodiment has been described as including the hand detection unit 110, the hand detection unit 110 may be omitted and the imaging data captured by the imaging device 10 may be output to the gesture section extraction unit 120.

Although the first embodiment has been described as including the start/end candidate extraction unit 121, the start/end candidate extraction unit 121 may be omitted if computational load is not a concern.

The operator is not limited to the driver and may be a front-seat passenger. In this case, when the front-seat passenger makes a gesture, the gesture detection device 100 likewise detects it, allowing the passenger to operate the vehicle devices.

The processing executed by the gesture detection device 100 described in the first embodiment corresponds to the "gesture detection method" of the present invention. In addition, although the first embodiment has been described as using the gesture detection device 100 for gesture detection, gesture detection is also possible by, for example, having a control program stored in a storage medium such as a center server or the cloud cause a computer to function as the units 110, 120, and 130.

Reference Signs List
10 imaging device
100 gesture detection device
120 gesture section extraction unit
121 start/end candidate extraction unit
122 similarity calculation unit
123 start/end determination unit
130 gesture identification unit

Claims

1. A gesture detection device that detects a gesture when an input operation is performed on an operation target based on a gesture of an operator, the gesture detection device comprising:
a gesture section extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a section of the gesture by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.

2. The gesture detection device according to claim 1, further comprising:
a similarity calculation unit (122) that calculates a similarity for each of a plurality of pairs of mutually similar gestures; and
a start/end determination unit (123) that extracts, from the similarities calculated by the similarity calculation unit, the pair of gestures having the highest similarity as the pair of gestures, and determines the gesture immediately before the start and the gesture immediately after the end.

3. The gesture detection device according to claim 2, wherein the gesture section extraction unit includes a candidate extraction unit (121) that extracts in advance candidates for the plurality of pairs of mutually similar gestures.

4. A gesture detection method for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the gesture detection method comprising:
extracting a section of the gesture, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
identifying the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.

5. A gesture detection control program for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the program causing a computer to function as:
a gesture section extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a section of the gesture by taking, among the gestures, the earlier of a pair of mutually similar gestures as a gesture immediately before the start and the later of the pair as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture.