JP2005301693A

JP2005301693A - Video editing system

Info

Publication number: JP2005301693A
Application number: JP2004117169A
Authority: JP
Inventors: Hideki Koike; 英樹小池; Yasuhito Nakanishi; 泰人中西; Yoko Ishii; 陽子石井; Yoichi Sato; 洋一佐藤; Kenji Oka; 兼司岡
Original assignee: Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency
Priority date: 2004-04-12
Filing date: 2004-04-12
Publication date: 2005-10-27

Abstract

PROBLEM TO BE SOLVED: To provide a video editing system that can be operated physically through hand and finger gestures.
SOLUTION: On a video editing screen projected onto a screen 150 by a projector 130 connected to an image projection computer 110, a user 160 performs each video editing operation with hand and finger gestures. A camera 140 connected to an image processing computer 120 captures the gestures, and the image processing computer 120 recognizes the positions of the hand and fingers. The image projection computer 110 recognizes the meaning of each gesture from the positions of the hand and fingers of the user 160, performs the editing processing corresponding to the gesture, and projects the updated video editing screen onto the screen 150 again. This interface solves the problems of conventional video editing systems operated with a mouse or similar device.
[Selected drawing] Figure 1

Description

The present invention relates to a video editing system that uses gesture recognition.

In recent years, with the spread of digital video cameras and digital still cameras, it has become possible for individuals to save videos and still images they have shot to a PC and edit them personally. When recording equipment was analog, video editing was done on dedicated equipment fitted with input devices bearing rows of dials and buttons, such as jog shuttles and sliders (hereinafter referred to as "analog input devices"). Because these input devices are designed so that all operating information is always visible, they appear complicated at first glance and are considered difficult for beginners to handle. With repeated use, however, the user learns how to operate them through physical sensation, and intuitive operation becomes possible. They therefore have the characteristic that operation approaches real time as the user's skill increases.
Recently, software for editing video data on a PC, such as that of Non-Patent Documents 1, 2, and 3, has come into wide use. The reasons are that editing can be done without the dedicated equipment mentioned above, and that a user who is accustomed to a mouse can start using the software immediately.
However, using a mouse on a GUI application such as the conventional video editing software described above, for work that used to be done with analog input devices, raises the following problems.

(1) Physical actions of the kind possible with analog input devices cannot be performed, so the user cannot physically grasp quantities such as the amount of an operation. As a result, neither physical experience nor skill is reflected in the operation.
(2) With analog input devices, the user can operate several controls simultaneously with multiple fingers while seeing in real time how the input is reflected in the output. With mouse input, however, pointing is generally limited to a single location, so multiple operations cannot be performed at the same time.
(3) Conventional GUI applications often use pull-down menus, which makes it difficult to keep all the information needed for operation visible at all times.
(4) A mouse is not necessarily an appropriate input device when a large display is used as the monitor. Moving the mouse cursor from one edge of a large display to the other requires repeating the relative mouse movement many times. The cursor speed can be increased to address this, but fine mouse movements then become difficult.

One way to solve these problems is to use a large display and to build, as its input device, an interface that retains the strengths of analog input devices.
The inventors have been conducting research on EnhancedDesk, a desk-type real-world-oriented interface (for example, Non-Patent Document 4). EnhancedDesk can take the positions and gestures of the hands and fingers, as well as the positions of objects, as input to the system. Applications built on it include a drawing tool operated with both hands and a system that supports small-group meetings (for example, Non-Patent Documents 5 and 6). By using the hands and fingers as the input means, these systems allow physical operation: the user can operate the system more intuitively while physically feeling the quantities being specified. Because the position of each individual fingertip is recognized, up to 10 locations can be pointed at simultaneously, so multiple pieces of digital information can be manipulated at the same time. Furthermore, because a desk is used as the projection surface of the projector, the system provides a large-display environment. For these reasons, EnhancedDesk appears to be one effective environment for performing digitally, on a large display, work that has conventionally been done with analog input devices.
However, no video editing system has been developed that solves the problems of conventional video editing software (for example, Non-Patent Documents 1 to 3) and allows video editing work to be directed by gestures, so the above problems remain unsolved.

Non-Patent Document 1: Adobe, Premiere. http://www.adobe.com/products/premiere/
Non-Patent Document 2: Apple, Final Cut Pro. http://www.apple.com/finalcutpro/
Non-Patent Document 3: Apple, iMovie. http://www.apple.com/imovie/
Non-Patent Document 4: 小林貴訓 et al., "Real-time fingertip recognition interface using infrared images for EnhancedDesk" (in Japanese), Japan Society for Software Science and Technology, WISS 1999, pp. 49-54, 1999.
Non-Patent Document 5: 陳欣蕾 et al., "Drawing system with two-handed direct manipulation on a desk-type interface" (in Japanese), Japan Society for Software Science and Technology, WISS 2001, pp. 179-184, 2001.
Non-Patent Document 6: 永嶋慎一郎 et al., "EnhancedTable: a meeting support system built around paper" (in Japanese), Japan Society for Software Science and Technology, WISS 2002, pp. 111-116, 2002.

An object of the present invention is to provide a video editing system that retains the strengths of analog input devices, namely intuitive operation and the reflection of skill in working efficiency, and that additionally allows multiple simultaneous operations, thereby solving the problems of conventional video editing software.

To solve the above problems, the present invention is a video editing system that displays a video editing screen and performs video editing operations by capturing, with a camera, the gestures a user makes on that screen. The system comprises: video editing display means for displaying the video editing screen; image input means for inputting an image captured by the camera; finger recognition means for recognizing the positions of the user's hand and fingers from the input image; gesture recognition means for recognizing the meaning of the user's gesture from the positions of the user's hand and fingers; and video editing means for performing the video editing corresponding to the gesture. The result of the video editing is displayed as the video editing screen.
The video editing system may further comprise initial screen storage means for storing an image taken before the user's hand appears. In that case, the finger recognition means compares the input image with the image from the initial screen storage means to obtain the region of the user's hand, obtains the positions of the user's fingertips by circular template matching, and thereby recognizes the positions of the user's hand and fingers.
The circular template matching in the finger recognition means may be performed on a square region centered on the center of the user's hand.
The video editing display means may arrange a plurality of videos in a horizontal row on the video editing screen and, when playing them, play them so that the playback position of each adjacent video lags by a fixed time going from right to left.
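The staggered playback described above can be illustrated with a small calculation. The following is a minimal sketch, assuming the rightmost video is the reference and each video to its left lags its right-hand neighbour by the same delay; the function name `staggered_positions`, its parameters, and the clamping at zero are illustrative assumptions, not details taken from the patent.

```python
def staggered_positions(rightmost_position_s, num_videos, delay_s):
    """Return playback positions in seconds, listed from left to right.

    Assumption: the rightmost video plays at rightmost_position_s and each
    video further to the left lags its right-hand neighbour by delay_s.
    Positions are clamped at 0 so no video starts before its first frame.
    """
    positions = []
    for index_from_right in range(num_videos):
        lagged = rightmost_position_s - index_from_right * delay_s
        positions.append(max(0.0, lagged))
    positions.reverse()  # collected right to left, returned left to right
    return positions


if __name__ == "__main__":
    # Four videos in a row, 2 s apart, rightmost at 10 s.
    print(staggered_positions(10.0, 4, 2.0))
```

With these inputs the leftmost video is 6 s behind the rightmost one, so a viewer scanning the row from right to left sees progressively earlier moments of the same material.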

The gesture recognition means may recognize that the user has grabbed the image or video displayed at a position when the user's hand, at that same place on the editing screen, changes from an open state to a closed state.
The gesture recognition means may recognize that the user has selected the image or video displayed at a position when the user's hand is at that place on the editing screen with one finger extended.
The gesture recognition means may recognize, when each of the user's two hands is held in place on the editing screen with one finger extended, that the user has selected the images or videos displayed at the positions of the two hands together with the images or videos displayed between them. Furthermore, when the user brings both hands together after selecting images or videos, the gesture recognition means may recognize that the selected images or videos have been joined into one.
The gesture recognition means may recognize, when the user's hand moves up, down, left, or right while remaining open, the direction, speed, and distance of the movement as a quantity for video editing.
A program that causes a computer system to implement the functions of any of the video editing systems described above is also part of the present invention.

The video editing system of the present invention uses a finger recognition system for input to the screen, so the user can perform each video editing operation by gesture. The present invention enables the input based on physical action and the multiple simultaneous operations that conventional analog input devices offered, and thereby solves the problems of conventional video editing software.

Embodiments of the video editing system of the present invention are described in detail below.
<1. System configuration>
First, the system configuration of the video editing system of this embodiment is described with reference to the system configuration diagram shown in FIG. 1.
As shown in FIG. 1, the video editing system of this embodiment connects an image processing computer 120, which performs finger recognition, to an image projection computer 110, which performs video editing processing and image projection onto a screen 150. A projector 130 is connected to the image projection computer 110, and a camera 140 is connected to the image processing computer 120. The projector 130 and the camera 140 are installed above the screen 150. The image acquisition area of the camera 140 and the projection range of the projector 130 are approximately the same size. This embodiment is described using the example of projecting onto the screen 150 with the projector 130, but the screen may instead be displayed on an ordinary large display or the like without using the projector 130.
The user 160 performs each video editing operation by hand and finger gestures on the image projected from the projector 130 onto the screen 150. The image of the screen 150 captured by the camera 140 is sent to the image processing computer 120, which recognizes the positions of the hand and fingers. Sending the recognized positions to the image projection computer 110 realizes interaction between the hand and the objects (images, videos, and so on) projected by the projector 130. That is, the image projection computer 110 recognizes the gesture of the user 160 from the hand and finger positions sent by the image processing computer 120, performs the video editing processing corresponding to that gesture, and projects the edited screen from the projector 130 onto the screen 150 again.

A "gesture recognition program" is installed on the image projection computer 110 and performs gesture recognition processing, which recognizes the meaning of the gestures of the user 160 from the hand and finger positions sent by the image processing computer 120. A "video editing application" is also installed there and performs video editing processing in response to the operations of the user 160.
The image processing computer 120, for its part, carries a "finger recognition program" and an "image processing library". Using the library, the finger recognition program performs finger recognition processing, which recognizes the positions (on the screen 150) of the hand and fingers of the user 160 as captured by the camera 140. The inventors previously used hardware such as image processing boards for the image processing that recognizes fingertips. The advantage of hardware image processing is that even a low-specification computer can perform advanced image processing. However, it requires one processing board per computer, so adding computers for image processing requires an equal number of additional boards, which limits expandability. Because computer processing speeds have risen dramatically in recent years, image processing no longer needs to depend on hardware, and the drawbacks of using hardware have instead become conspicuous.
In this embodiment, therefore, an image processing library that performs image processing in software, for example OpenCV (Intel Open Computer Vision Library), is installed on the image processing computer 120. This makes the system convenient and highly expandable. Moreover, because the library is released as open source, it should be possible to build a more flexible system.

The camera 140 is, for example, an IEEE1394 camera. An infrared camera or a CCD camera could also be used, but because the input/output terminals of those cameras are analog, image data captured as a digital signal must be converted to an analog signal, taken into the computer, and converted back to a digital signal. Special hardware is also required to capture the input from the camera. This embodiment avoids these problems by using an IEEE1394 camera. IEEE1394 is a new interface standard suited to computer environments. It is good at real-time data transfer and inexpensive, so it is easy to adopt and is now in wide use. Using an IEEE1394 camera equipped with this interface should make the system more general-purpose.
Because this embodiment aims at real-time processing, finger recognition must be fast. When the hand region is recognized from the image of the camera 140, it must therefore be easy to take the difference between the screen 150 and the hand. For this reason, a plain white table is used as the screen 150.
The system configuration diagram of FIG. 1 shows two computers, the image processing computer 120 and the image projection computer 110, but the configuration is not limited to this. For example, multiple image processing computers or image projection computers may be used as needed, or a single computer may perform both image processing and image projection.

<2. Finger recognition>
Next, the flow of finger recognition processing in the video editing system of this embodiment is described. As stated above, in finger recognition processing the finger recognition program installed on the image processing computer 120 uses the installed image processing library (in this embodiment, for example, the OpenCV library mentioned above) to recognize the positions (on the screen 150) of the hand and fingers of the user 160 as captured by the camera 140.
Before finger recognition processing is started, the initial screen of the video editing application is projected from the projector 130 onto the screen 150.
When finger recognition processing is started, an image of the screen 150 with the initial screen projected on it is first acquired from the camera 140 and stored as the initial image. From then on, the hand region is obtained from the difference between the image the camera 140 captures in real time and the stored initial image. When the screen of the video editing application subsequently changes, for example because the user 160 has edited a video by gesture, the screen after the change is stored as the new initial image and the hand region is again obtained by the method described above.
Obtaining the hand region makes it possible to recognize which object (image, video, and so on) on the video editing screen projected on the screen 150 the hand of the user 160 is placed over.
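The differencing step described above can be sketched in a few lines. This is a minimal illustration in plain Python on small grayscale images stored as lists of rows, not the OpenCV-based implementation of the embodiment; the function names, the threshold of 30, and the use of the mask centroid as a rough hand position are all assumptions made for the example.

```python
def hand_mask(frame, initial, threshold=30):
    """Return a binary mask: 1 where the current frame differs from the
    stored initial image by more than threshold, otherwise 0.
    Both images are lists of rows of grayscale values (0-255)."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, init_row)]
        for frame_row, init_row in zip(frame, initial)
    ]


def mask_centroid(mask):
    """Return the (row, col) centroid of the mask pixels, or None if the
    mask is empty. Used here only as a rough hand position (an assumption)."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return None
    count = len(points)
    return (sum(r for r, _ in points) / count, sum(c for _, c in points) / count)


if __name__ == "__main__":
    initial = [[200] * 4 for _ in range(4)]   # bright, empty table
    frame = [row[:] for row in initial]
    for r in (1, 2):
        for c in (1, 2):
            frame[r][c] = 80                  # darker "hand" pixels
    print(mask_centroid(hand_mask(frame, initial)))
```

When the projected screen changes after an edit, the same functions would simply be called with the newly stored initial image, which mirrors the re-capture step in the text.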

This embodiment uses gestures such as an open hand, a closed hand, and a hand with one finger extended. The fingertips of the user 160 on the screen 150 must therefore be recognized.
Fingertip recognition uses template matching with a circular template, based on the fact that the contour of a fingertip is close to a circle. Template matching gives reliable results but is known to be computationally expensive. To raise the processing speed, this embodiment first finds the center of the palm and performs template matching only within a square region of 60 pixels × 60 pixels centered on that point. The value of 60 pixels was set empirically as a size that contains the whole hand in the image captured from the camera; another suitable value may be used. In addition, because the template is a circle, template matching, which is normally weak against rotation, can be performed without difficulty.
Finger recognition processing recognizes the positions of the hand and fingers of the user 160 in the manner described above.
Because this embodiment uses an IEEE1394 camera and OpenCV, the processing speed of finger recognition is improved compared with the earlier approach of using an infrared camera or CCD camera and performing image processing in hardware.
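The restricted search described above can be sketched as follows. This is a simplified illustration in plain Python on a binary hand mask, assuming a filled disc as the circular template and a simple pixel-agreement score; the template radius of 3, the scoring rule, and all function names are assumptions for the example, and only the 60-pixel window centered on the palm comes from the text.

```python
def disc_template(radius):
    """Binary filled disc inside a (2*radius+1)-pixel square."""
    size = 2 * radius + 1
    return [
        [1 if (r - radius) ** 2 + (c - radius) ** 2 <= radius ** 2 else 0
         for c in range(size)]
        for r in range(size)
    ]


def match_score(mask, template, top, left):
    """Fraction of template pixels that agree with the mask at this offset."""
    size = len(template)
    agree = sum(
        1
        for r in range(size)
        for c in range(size)
        if mask[top + r][left + c] == template[r][c]
    )
    return agree / (size * size)


def best_fingertip(mask, palm_center, radius=3, window=60):
    """Search only a window x window square centred on palm_center and
    return (score, (row, col)) of the best disc match, or None."""
    template = disc_template(radius)
    size = 2 * radius + 1
    height, width = len(mask), len(mask[0])
    half = window // 2
    palm_row, palm_col = palm_center
    top_min, top_max = max(0, palm_row - half), min(height - size, palm_row + half - size)
    left_min, left_max = max(0, palm_col - half), min(width - size, palm_col + half - size)
    best = None
    for top in range(top_min, top_max + 1):
        for left in range(left_min, left_max + 1):
            score = match_score(mask, template, top, left)
            if best is None or score > best[0]:
                best = (score, (top + radius, left + radius))
    return best
```

Limiting the loops to the window means the cost depends on the window size rather than on the full camera resolution, which is the speed-up the text is after. On a real hand mask several local maxima (one per fingertip) would be kept rather than a single best match.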

<3. Video editing system using finger recognition>
The video editing system of this embodiment uses the user's hands and fingers as the input device in place of an analog input device. Because the system can be operated without an intermediary such as a mouse, the user should be able to operate it more intuitively. As libraries, the video editing system of this embodiment uses, for example, JMF (Java(R) Media Framework) 2.1.1 and QuickTime for Java(R), which are well suited to handling video files.

(3-1. Gesture recognition)
The user 160 operates the video editing system of this embodiment by making hand and finger gestures on the screen of the video editing application projected on the screen 150. FIG. 2 shows an example of the screen of the video editing application. As shown in FIG. 2, the video editing screen of this embodiment displays a number of objects (images, videos, and so on), and the user makes a gesture directly above the desired object. The screen layout of the video editing application shown in FIG. 2 is described in detail later.

Gesture recognition processing is performed on the image projection computer 110. As described above, once the finger recognition system of the image processing computer 120 has recognized the positions of the user's hand and fingers, the image projection computer 110 receives that position information and recognizes the meaning of the user's gesture through gesture recognition processing. This embodiment recognizes the four meanings shown in FIG. 3 (a) to (d).
(a) Over an object projected on the screen, changing the hand from open (311) to closed (312): recognized as grabbing that object.
(b) Extending one finger over an object projected on the screen: recognized as selecting that object. When this gesture is made with both hands, everything from the object selected by the left-hand finger to the object selected by the right-hand finger is recognized as selected for the operation.
(c) Moving the open hand up, down, left, or right: recognized as a gesture that passes the amount of hand movement (direction, distance, speed, and so on) to the system.
(d) Making the selection of (b) with each hand (in the figure, the left hand 341 selects object 343 and the right hand 342 selects object 345) and then bringing the two hands together from a separated position: recognized as combining the objects under and between the hands (343, 344, 345) into one (347). The object 346 shown in the figure is not selected.
The gesture recognition program of this embodiment recognizes an open hand, a closed hand, or a hand with one finger extended as a gesture once that state has been held for a fixed time (for example, 0.5 seconds). The gestures (a) to (d) above are examples; other gestures can be given meanings in the same way.
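The hold-time rule and the open-to-closed "grab" transition can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the posture labels ("open", "closed", "one_finger"), the class name, and the event names are hypothetical, and the check that the hand stays over the same object is omitted for brevity. Only the 0.5-second hold and the meanings of gestures (a) and (b) come from the text.

```python
class DwellRecognizer:
    """Confirms a hand posture once it has been held for dwell_s seconds,
    then maps confirmed postures to gesture events."""

    def __init__(self, dwell_s=0.5):
        self.dwell_s = dwell_s
        self.current = None     # posture currently being observed
        self.since = None       # time at which it was first observed
        self.confirmed = None   # last posture held for at least dwell_s

    def update(self, t, posture):
        """Feed one (time, posture) observation; return an event or None."""
        if posture != self.current:
            # Posture changed: restart the hold timer.
            self.current, self.since = posture, t
            return None
        if t - self.since < self.dwell_s or posture == self.confirmed:
            return None
        previous, self.confirmed = self.confirmed, posture
        if posture == "closed" and previous == "open":
            return "grab"       # gesture (a): open hand, then closed hand
        if posture == "one_finger":
            return "select"     # gesture (b): one finger extended
        return None


if __name__ == "__main__":
    recognizer = DwellRecognizer()
    samples = [(0.0, "open"), (0.25, "open"), (0.5, "open"),
               (0.75, "closed"), (1.0, "closed"), (1.25, "closed")]
    print([recognizer.update(t, p) for t, p in samples])
```

Requiring the hold before confirming a posture filters out brief misclassifications while the hand is moving between states, at the cost of a fixed delay before each gesture takes effect.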

(3-2. Video editing with gestures)
In building a video editing system that edits video using finger recognition, this embodiment took the following five main functions from among the operations performed in conventional general-purpose PC video editing systems and implemented them so that they incorporate the characteristics of analog input devices.
(A) Selection
(B) Movement
(C) Fast-forwarding and rewinding a video file
(D) Cutting a video
(E) Joining videos
That is, with the video editing system of this embodiment, the user can use the video editing functions (A) to (E) through combinations of the four gestures (a) to (d) described above.
The video editing functions (A) to (E) above are examples; other functions can be implemented in the same way.
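As an illustration of how selection (A) and joining (E) might operate on a row of clips, the sketch below models each clip as a dictionary with a name and a list of frames. This data model, the function names, and the way the merged clip is named are assumptions made for the example; the patent does not specify the internal representation. The clip labels follow the objects 343 to 346 of FIG. 3 (d).

```python
def select_range(left_index, right_index):
    """Indices selected when one finger of each hand points at a clip:
    both end clips plus everything between them (two-handed gesture (b))."""
    low, high = sorted((left_index, right_index))
    return list(range(low, high + 1))


def merge_selected(clips, selected):
    """Replace the contiguous selected clips with a single clip whose
    frames are the concatenation of theirs (gesture (d))."""
    low, high = selected[0], selected[-1]
    chosen = clips[low:high + 1]
    merged = {
        "name": "+".join(clip["name"] for clip in chosen),  # hypothetical naming
        "frames": [frame for clip in chosen for frame in clip["frames"]],
    }
    return clips[:low] + [merged] + clips[high + 1:]


if __name__ == "__main__":
    clips = [
        {"name": "343", "frames": [1, 2]},
        {"name": "344", "frames": [3]},
        {"name": "345", "frames": [4, 5]},
        {"name": "346", "frames": [6]},
    ]
    selected = select_range(0, 2)      # left hand on 343, right hand on 345
    for clip in merge_selected(clips, selected):
        print(clip["name"], clip["frames"])
```

The unselected clip 346 passes through unchanged, matching the figure description in which only the objects under and between the two hands are combined.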

The operations (A) to (E) above are described below with reference to the screen of the video editing application shown in FIG. 2 and to FIGS. 4 to 8, which show examples of operations on that screen. The example task is "cutting the required parts out of a source video file and joining the cut-out videos to create a new video file". The video editing processing of (A) to (E) itself and the screen display processing itself, as performed by the video editing application of this embodiment, are the same as in conventional video editing software.

図２は、本実施形態の動画編集アプリケーションの画面構成の一例である。画面２００は、スクリーン（図１に示したスクリーン１５０）に投影される画面である。ユーザは、画面２００上で手指のジェスチャを行なう。画面２００の各エリア２１０〜２８０には、ユーザがその上で操作を行なうためのオブジェクト（画像や動画など）を表示する。
画面右上のエリア２２０は、本実施形態の動画編集システムで編集される元となる動画ファイルを表示するエリア（以降「ｃｌｉｐＤｏｃｋ」と呼ぶ）である。元となる動画ファイル（例えばｍｏｖファイルなど）は、例えば、あらかじめ画像投影用計算機１１０の特定のディレクトリ内に記憶しておき、それをアプリケーション起動時に自動的に読み出してｃｌｉｐＤｏｃｋ２２０に表示するようにするとよい。ユーザはスクリーンに投影されたｃｌｉｐＤｏｃｋ２２０から編集したい動画ファイルを選択する。ｃｌｉｐＤｏｃｋ２２０の下にあるエリア２３０には、ｃｌｉｐＤｏｃｋ２２０に表示されている動画ファイルの再生時間を表示する。
画面中央のエリア２５０は、動画ファイルを再生表示するエリア（以降「ｆｌｏｗＶｉｅｗｅｒ」と呼ぶ）であり、ユーザはここで動画ファイルを閲覧して必要な部分の切り取りなどの操作や、動画の早送りや巻戻しなどを行なう。ｆｌｏｗＶｉｅｗｅｒ２５０の上部のエリア２４０（以降「ｔｉｍｅｌｉｎｅＶｉｅｗｅｒ２４０」と呼ぶ）には、再生表示している動画ファイルの再生位置（タイムライン）などを表示する。 FIG. 2 is an example of a screen configuration of the moving image editing application of the present embodiment. The screen 200 is a screen projected on the screen (screen 150 shown in FIG. 1). The user performs a finger gesture on the screen 200. In each area 210 to 280 of the screen 200, an object (an image, a moving image, etc.) for a user to perform an operation is displayed.
An area 220 in the upper right of the screen is an area (hereinafter referred to as “clipDock”) that displays the source video files to be edited by the video editing system of this embodiment. The source video files (for example, mov files) may, for example, be stored in advance in a specific directory on the image projection computer 110, read automatically when the application starts, and displayed in the clipDock 220. The user selects the video file to be edited from the clipDock 220 projected on the screen. The area 230 below the clipDock 220 displays the playback time of the video files displayed in the clipDock 220.
An area 250 at the center of the screen is an area for playing back and displaying a video file (hereinafter referred to as “flowViewer”). Here the user views the video file and performs operations such as cutting out the necessary parts, fast-forwarding, and rewinding. The area 240 above the flowViewer 250 (hereinafter referred to as “timelineViewer 240”) displays information such as the playback position (timeline) of the video file being played.

画面左のエリア２１０は、ユーザが操作中の画像を表示するエリアであり、以降「ｉｎ／ｏｕｔＶｉｅｗｅｒ」と呼ぶ。例えば切り取りの操作の場合に、ユーザが指定した切り取りの始点の画像を、終点を指定するまでｉｎ／ｏｕｔＶｉｅｗｅｒ２１０に表示して、ユーザに分かりやすいようにしている。また、画面中央下のエリア２６０は、切り取った動画ファイルをサムネイル等で表示するエリアであり、以降「ｃｌｉｐＶｉｅｗｅｒ」と呼ぶ。実際には、例えば、切り取りの操作が行なわれると、切り取られた部分を新たな動画ファイルとして別のディレクトリ（ｃｌｉｐＶｉｅｗｅｒ用に用意したディレクトリ）に格納する。
ｃｌｉｐＶｉｅｗｅｒ２６０の下にあるエリア２７０は、ｃｌｉｐＶｉｅｗｅｒ２６０に表示された動画ファイルの中からユーザが加工（例えば連結など）の対象として指定したものを表示するエリアであり、以降「ｃｏｍｐｉｌｅＶｉｅｗｅｒ」と呼ぶ。実際には、例えば、上述のｃｌｉｐＶｉｅｗｅｒ２６０に表示している動画ファイル（切り取った動画ファイル）を記憶しているディレクトリ（ｃｌｉｐＶｉｅｗｅｒ用のディレクトリ）から、別のディレクトリ（ｃｏｍｐｉｌｅＶｉｅｗｅｒ用のディレクトリ）へのファイルの移動を行なう。
画面下のエリア２８０には、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０にある動画ファイルの合計再生時間を表示する。 An area 210 on the left of the screen displays the image the user is currently operating on, and is hereinafter referred to as “in/outViewer”. For example, during a cut operation, the image at the cut start point specified by the user is displayed in the in/outViewer 210 until the end point is specified, to make the operation easy for the user to follow. An area 260 at the bottom center of the screen displays the cut-out video files as thumbnails or the like, and is hereinafter referred to as “clipViewer”. In practice, for example, when a cut operation is performed, the cut-out portion is stored as a new video file in a separate directory (a directory prepared for the clipViewer).
The area 270 below the clipViewer 260 displays those video files in the clipViewer 260 that the user has designated as targets of processing (for example, concatenation), and is hereinafter referred to as “compileViewer”. In practice, for example, the file is moved from the directory storing the cut-out video files displayed in the clipViewer 260 (the clipViewer directory) to a separate directory (the compileViewer directory).
The area 280 at the bottom of the screen displays the total playback time of the video files in the compileViewer 270.

（１）動画ファイルの選択
まず、撮影などで得た動画ファイルから、編集したいものを選択する。上述したように、撮影された動画ファイルは、あらかじめ、画像投影用計算機１１０に記憶されており、画面２００の右上のｃｌｉｐＤｏｃｋ２２０には、その動画ファイルを表示している。
ユーザがｃｌｉｐＤｏｃｋ２２０内で指を一本出した状態のまま一定時間（例えば、０．５秒）静止するジェスチャを行うと、指の下に表示されている動画ファイルが選択され、後述するｆｌｏｗＶｉｅｗｅｒ２５０での再生が開始される。この時、再生しているファイルがどれであるか分かりやすくするため、ｃｌｉｐＤｏｃｋ２２０内に表示されている画像の表示方法を変更するとよい。例えば、選択された画像を実際の色で表示し、それ以外のファイルを赤みがかった画像で表示する。 (1) Selection of a video file
First, the user selects the file to be edited from the video files obtained by shooting or other means. As described above, the captured video files are stored in advance on the image projection computer 110 and are displayed in the clipDock 220 at the upper right of the screen 200.
When the user holds one extended finger still inside the clipDock 220 for a certain time (for example, 0.5 seconds), the video file displayed under the finger is selected and playback begins in the flowViewer 250 described later. At this point, the way the images in the clipDock 220 are displayed may be changed to make it clear which file is being played. For example, the selected image is displayed in its actual colors and the other files are displayed as reddish images.

（２）動画ファイルの再生
動画編集の作業の中で、撮影した動画ファイルから必要な部分を切り出す、切り取りの作業がある。この作業は、動画ファイルの中で必要な始点と終点を指定することにより行われる。その際にユーザは前後の場面との比較を行うことで場面を特定する。しかし、従来の一般的な動画編集ソフトウェアでは動画ファイルを再生表示する画面が一つであることが多く、マウスによりタイムラインを移動させながら動画ファイルの再生・巻戻し・早送りを何度も繰り返すことで場面の特定を行っている。
そこで、本実施形態では、編集対象としている動画ファイルを同一画面内に複数枚（本実施形態では例えば７枚とする）横に並べた状態で動画ファイルの再生を行うｆｌｏｗＶｉｅｗｅｒ２５０を備えている。
ｆｌｏｗＶｉｅｗｅｒ２５０での再生の様子を、図４に示す。ｆｌｏｗＶｉｅｗｅｒ２５０には４１０〜４７０まで７つの画像で同じ動画ファイルを再生表示する。ここで、本実施形態では、隣り合う画像が右から左に行くにつれ再生位置が一定時間（例えば100msec）ずつ遅れた状態で再生する。これによりユーザには、動画ファイルが右から左に流れていくかのように見える。そのためユーザは動画ファイルの流れを一度に見ることができ、前後の場面との比較を自然に行うことで、従来のタイムラインによる方法に比べ、ユーザの求めている場面を素早く選択することが可能となる。また、ｆｌｏｗＶｉｅｗｅｒ２５０の上部には、各々のエリアについて、動画ファイルの現在の再生位置をタイムライン等で表示するｔｉｍｅｌｉｎｅＶｉｅｗｅｒ２４０を備えている。 (2) Playback of a video file
One of the tasks in video editing is cutting, in which the necessary parts are cut out of the captured video file. This is done by specifying the required start point and end point within the video file. In doing so, the user identifies a scene by comparing it with the preceding and following scenes. However, conventional general video editing software often has only a single screen for playing back a video file, so the user identifies the scene by repeatedly playing, rewinding, and fast-forwarding the video file while moving the timeline with the mouse.
Therefore, this embodiment provides the flowViewer 250, which plays back the video file being edited as multiple copies (for example, seven in this embodiment) arranged side by side on the same screen.
FIG. 4 shows playback in the flowViewer 250. The flowViewer 250 plays back and displays the same video file in seven images, 410 to 470. In this embodiment, each adjacent image, going from right to left, is played with its playback position delayed by a further fixed time (for example, 100 msec). To the user, the video file therefore appears to flow from right to left. The user can thus see the flow of the video file at a glance and naturally compare a scene with those before and after it, which makes it possible to select the desired scene more quickly than with the conventional timeline method. In addition, above the flowViewer 250 there is the timelineViewer 240, which displays the current playback position of the video file for each area on a timeline or the like.
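The staggered playback of the flowViewer can be illustrated with a short sketch. The function name `tile_positions` and the clamping at the start of the file are assumptions made for this example; the seven tiles and the 100 msec delay are the example values from the text.

```python
from typing import List


def tile_positions(head_ms: int, tiles: int = 7, delay_ms: int = 100) -> List[int]:
    """Return playback positions in left-to-right order (hypothetical helper).

    The rightmost tile plays at head_ms; each tile further to the left lags
    by one more delay_ms, so the video appears to flow from right to left.
    """
    # Tile i, counted from the right, lags by i * delay_ms.
    # Clamping at 0 (an assumption) keeps positions inside the file.
    from_right = [max(0, head_ms - i * delay_ms) for i in range(tiles)]
    return from_right[::-1]
```

For example, with the rightmost tile at 1000 msec, the seven tiles would show positions 400, 500, ..., 1000 msec from left to right.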

（３）再生速度の制御
ユーザが動画ファイルの再生の制御（早送り／巻戻しなど）を行なうにあたって、従来の動画編集ソフトウェアに実装されているスライドバーを用いたインタフェースでは、動画の再生位置が一目でわかるという利点がある一方、スライドバーが画面比率と比べ非常に小さいためポインティングが難しく、マウスを用いて正確に場面の位置を探しにくいという問題点が挙げられる。その問題点を補うため、キーボードで数値を入力する方法などがとられるが、動画ファイルの再生位置を数値入力により決定することは直感的とは言えず、非常に不自然なインタフェースであると言える。
その解決法の１つとして、マウスの移動量を動画の表示速度や表示サイズに対応させる方法がある(例えば、http://www.yugop.com/)。この方法では、マウスを左右に動かすことにより、オブジェクトを対応する方向へ移動させるが、その際に画面中央から離れるにつれオブジェクトの移動速度も速くなる。マウスの移動距離とオブジェクトの移動速度が対応しているため、ユーザにとって理解し易い操作方法であると言える。
本実施形態でも、手の移動方向や移動距離により、動画ファイルの再生を制御する。手を開いた状態で手をスライドさせる（図３（ｃ）に示すジェスチャ）と、その移動方向や移動距離により、再生されている動画の再生速度や、表示サイズを変更する。 (3) Control of playback speed
When the user controls playback of a video file (fast forward, rewind, and so on), the slide-bar interface implemented in conventional video editing software has the advantage that the playback position can be seen at a glance. On the other hand, the slide bar is very small relative to the screen, which makes pointing difficult and makes it hard to locate a scene accurately with the mouse. To compensate, methods such as entering numerical values from the keyboard are used, but determining the playback position of a video file by numerical input is not intuitive and makes for a very unnatural interface.
One solution is to map the amount of mouse movement to the display speed and display size of the video (for example, http://www.yugop.com/). In this method, moving the mouse left or right moves the object in the corresponding direction, and the object moves faster the further the mouse is from the center of the screen. Because the distance the mouse travels corresponds to the speed of the object, the operation is easy for the user to understand.
In this embodiment as well, playback of the video file is controlled by the direction and distance of hand movement. When the user slides an open hand (the gesture shown in FIG. 3(c)), the playback speed and display size of the video being played are changed according to the direction and distance of the movement.

本実施形態では、動画ファイルの再生のコントロール（早送り／巻戻し）として二種類のインタフェースを構築した。一つは手の移動した位置により決定する方法である。この方法では、スクリーンに投影された画面の中心を原点とし、手の座標を用いて再生スピードを変更する。例えば画面の中心から右を早送り、中心から左を巻戻しとして、手を画面の一番端へスライドさせることで動画ファイルの早送り、巻戻しを高速で行ない、手が画面の中心に近づくにつれそのスピードを遅くする。
もう一つは、スライドさせた手の移動速度により再生スピードを変更する方法である。これは手をスライドさせた速度により、動画ファイルの再生スピードが変化する。具体的には手を右方向へスライドさせた場合、再生速度が上がり、左方向へスライドさせた場合には再生速度が下がる。上昇率、下降率ともに手をスライドさせる速度により増減し、より速くスライドさせた場合には、一度に再生速度が５段階変化するなどの制御を行なう。これにより、ユーザは場面の位置を素早く、かつ身体的に変更することが可能である。
手を右方向にスライドさせることにより再生速度を上げ、左方向にスライドさせることにより再生速度を下げる場合において、図５（ａ）〜（ｃ）に、再生速度を上げるジェスチャを示す。図５（ａ）に示すように、まず、スクリーン上に投影された画面のｆｌｏｗＶｉｅｗｅｒ２５０の上で左手５１０を開いた状態にする。次に（ｂ）（ｃ）に示すように、左手５１０を開いたまま右方向へスライドさせると、ｆｌｏｗＶｉｅｗｅｒ２５０で再生されている動画ファイルの再生速度を上げることができる。 In this embodiment, two types of interface were built for controlling playback (fast forward / rewind) of a video file. The first determines the speed from the position to which the hand has moved. In this method, the center of the projected screen image is taken as the origin, and the playback speed is changed using the coordinates of the hand. For example, with the right of center assigned to fast forward and the left of center to rewind, sliding the hand to the very edge of the screen fast-forwards or rewinds the video file at high speed, and the speed decreases as the hand approaches the center of the screen.
The second changes the playback speed according to the speed at which the hand is slid. Specifically, sliding the hand to the right raises the playback speed, and sliding it to the left lowers it. The size of the increase or decrease depends on how fast the hand is slid; for example, a faster slide changes the playback speed by five steps at once. This lets the user change the scene position quickly and physically.
FIGS. 5(a) to 5(c) show the gesture for raising the playback speed in the case where sliding the hand to the right raises the playback speed and sliding it to the left lowers it. As shown in FIG. 5(a), the user first holds the left hand 510 open over the flowViewer 250 of the screen image projected on the screen. Next, as shown in (b) and (c), sliding the left hand 510 to the right while keeping it open raises the playback speed of the video file being played in the flowViewer 250.
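The two playback-control interfaces described above might be sketched as follows. Both function names and the constants `max_speed` and `px_per_step` are assumptions chosen for illustration; the text only specifies that the screen center is the origin, that right means faster or forward, and that a fast slide can change the speed by several steps (five in its example).

```python
def speed_from_position(x: float, width: float, max_speed: float = 8.0) -> float:
    """Position-based control: signed playback speed from the hand's x coordinate.

    The screen center is the origin; right of center is fast forward (positive),
    left of center is rewind (negative), and the edges give the highest speed.
    """
    half = width / 2.0
    offset = (x - half) / half              # -1.0 at the left edge, +1.0 at the right edge
    offset = max(-1.0, min(1.0, offset))    # ignore positions outside the screen
    return offset * max_speed


def speed_step_from_velocity(vx: float, px_per_step: float = 200.0,
                             max_steps: int = 5) -> int:
    """Velocity-based control: number of speed levels to add (+) or remove (-).

    vx is the horizontal hand velocity in pixels per second; faster slides
    change the level by more steps, capped at max_steps.
    """
    steps = min(int(abs(vx) // px_per_step), max_steps)
    return steps if vx > 0 else -steps
```

A linear mapping is used here for simplicity; the source does not state which mapping between distance or velocity and speed the embodiment uses.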

また、開いた手を上下にスライドさせるジェスチャにより、ｆｌｏｗＶｉｅｗｅｒ２５０に並んだ動画ファイルの表示サイズを変更する。例えば、同時に表示する画像の数の初期値を図４に示したように７つとして、開いた手を上にスライドさせる程、画像の表示サイズを小さくし、同時に表示する画像の数を増やす。逆に下にスライドさせる程、表示サイズを大きくし、同時に表示する画像の数を減らす。これは、場面の選択候補を増やしたい場合や、場面を大きく表示したい場合に便利である。
このように、ｆｌｏｗＶｉｅｗｅｒ２５０で隣の場面と同時に比較しながら、身体的な操作（ジェスチャ）により再生・早送り・巻戻しを行うことができるため、従来のスライドバーや数値入力よりも、ユーザが求める場面の位置を感覚的に探しやすい。 In addition, the gesture of sliding an open hand up or down changes the display size of the video files lined up in the flowViewer 250. For example, with the initial number of simultaneously displayed images set to seven as shown in FIG. 4, the further the open hand is slid up, the smaller the display size and the more images displayed at the same time. Conversely, the further it is slid down, the larger the display size and the fewer images displayed at the same time. This is convenient when the user wants more candidate scenes to choose from or wants to see a scene at a larger size.
In this way, the user can play, fast-forward, and rewind by physical operations (gestures) while comparing adjacent scenes side by side in the flowViewer 250, which makes it easier to find the desired scene position intuitively than with a conventional slide bar or numerical input.
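The relation between vertical hand movement, the number of tiles, and the tile size can be sketched as below. The function names, the `px_per_tile` scale, and the lower and upper limits are assumptions for illustration; only the initial value of seven tiles comes from the text.

```python
def tile_count(dy_up: float, base: int = 7, px_per_tile: float = 80.0,
               lo: int = 1, hi: int = 15) -> int:
    """Number of tiles to show after sliding the open hand vertically.

    dy_up is the upward displacement in pixels: sliding up (positive) adds
    tiles, sliding down (negative) removes them. The result is clamped to
    the range [lo, hi], which is an assumed limit.
    """
    count = base + int(dy_up / px_per_tile)
    return max(lo, min(hi, count))


def tile_width(view_width: float, count: int) -> float:
    """Width of each tile when `count` tiles share the viewer's width equally."""
    return view_width / count
```

More tiles means each one is narrower, which matches the described trade-off between the number of candidate scenes and the display size.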

（４）動画ファイルの切り取り
ユーザがｆｌｏｗＶｉｅｗｅｒ２５０上で「つかむ」ジェスチャ（上述の図３（ａ）のジェスチャ）を行うと、手の真下にある場面で切り取りの始点と終点を決定することができる。切り取りの始点が指定されると、図２に示す画面２００のｉｎ／ｏｕｔＶｉｅｗｅｒ２１０に、その場面の画像が表示され、終点が指定されると、自動的に、もとの動画ファイルから始点から終点まで動画の切り取りを行なう。切り取られた動画は画面下部のｃｌｉｐＶｉｅｗｅｒ２６０に表示される。
ユーザが切り取りの始点を指定するジェスチャを、図６（ａ）〜（ｃ）に示す。ここでは右手６２０で始点を指定する操作を行なっている。まず、（ａ）に示すように、ｆｌｏｗＶｉｅｗｅｒ２５０上で始点の場面が表示されている上に、右手６２０を開いた状態で置き、次に、（ｂ）に示すように手を閉じて、「つかむ」ジェスチャを行なう。これで、切り取りの始点が指定され、（ｃ）に示すように、ｉｎ／ｏｕｔＶｉｅｗｅｒ２１０に始点の画像が表示される。次に、同様の「つかむ」ジェスチャで、終点を指定する。 (4) Cutting a video file
When the user performs the “grab” gesture (the gesture of FIG. 3(a) described above) on the flowViewer 250, the scene directly under the hand is set as the start point or end point of the cut. When the start point is specified, the image of that scene is displayed in the in/outViewer 210 of the screen 200 shown in FIG. 2; when the end point is specified, the video from the start point to the end point is automatically cut out of the original video file. The cut-out video is displayed in the clipViewer 260 at the bottom of the screen.
FIGS. 6(a) to 6(c) show the gesture with which the user specifies the start point of the cut. Here the start point is specified with the right hand 620. First, as shown in (a), the right hand 620 is placed open over the place on the flowViewer 250 where the start-point scene is displayed; next, as shown in (b), the hand is closed to perform the “grab” gesture. This specifies the start point of the cut, and the start-point image is displayed in the in/outViewer 210 as shown in (c). The end point is then specified with the same “grab” gesture.
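The two-step cut (first grab sets the start point, second grab sets the end point) behaves like a small state machine, sketched below. The class `CutSelector` is hypothetical, and sorting the two positions so that the start never follows the end is an assumption; the source does not say how a reversed pair is handled.

```python
from typing import Optional, Tuple


class CutSelector:
    """Hypothetical helper: two successive grabs mark the in and out points."""

    def __init__(self) -> None:
        # Start point in milliseconds, or None while no cut is in progress.
        self.in_ms: Optional[int] = None

    def grab(self, position_ms: int) -> Optional[Tuple[int, int]]:
        """Handle one grab at the playback position under the hand.

        Returns None after the first grab (start point stored, to be shown in
        the in/outViewer) and a (start, end) pair after the second grab.
        """
        if self.in_ms is None:
            self.in_ms = position_ms
            return None
        # Assumption: order the pair so the range is always start <= end.
        start, end = sorted((self.in_ms, position_ms))
        self.in_ms = None  # ready for the next cut
        return (start, end)
```

The returned range would then be handed to whatever routine extracts that span from the source file into the clipViewer directory.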

（５）切り取った動画ファイルの連結と再生
次に、切り取った複数の動画ファイルを連結して、１つの動画ファイルを作成する。
ユーザはまず、ｃｌｉｐＶｉｅｗｅｒ２６０内の画像上で、選択のジェスチャ（図３（ｂ）に示すジェスチャ）を行なって、連結したい動画ファイルを選択し、それをｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に移動させる。また、逆のジェスチャを行うことでｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に表示された動画ファイルをｃｌｉｐＶｉｅｗｅｒ２６０へ戻すことも可能である。
また、図７（ａ）〜（ｃ）に示すように、両手により選択のジェスチャを行った場合、選択された両ファイルとその間のファイルを全て移動することができる。この場合、まず図７の（ａ）に示すようにｃｌｉｐＶｉｅｗｅｒ２６０で右手７２０と左手７１０により１つずつ画像を選択（図３（ｂ）に示すジェスチャ）する。次に、（ｂ）に示すように両手の指を出したままｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に移動させる。すると、ｃｌｉｐＶｉｅｗｅｒ２６０に表示されていた動画ファイルのうち、ユーザの両手の指で選択された２つのファイルと、その２つのファイルの間に表示されていた１つのファイルの、計３つのファイルが、図７（ｃ）に示すように、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に表示され、ｃｌｉｐＶｉｅｗｅｒ２６０から削除される。
上記のジェスチャを繰り返し行なって、ユーザは切り取った動画ファイルの中から、連結したい動画ファイルのみを選択することができる。
画面右下にはｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に表示された動画ファイルの合計再生時間２８０が自動的に表示される。 (5) Concatenation and playback of cut-out video files
Next, the multiple cut-out video files are concatenated to create a single video file.
First, the user performs the selection gesture (the gesture shown in FIG. 3(b)) on an image in the clipViewer 260 to select a video file to be concatenated, and moves it to the compileViewer 270. Performing the reverse gesture returns a video file displayed in the compileViewer 270 to the clipViewer 260.
Also, as shown in FIGS. 7(a) to 7(c), when the selection gesture is performed with both hands, the two selected files and all the files between them can be moved together. In this case, as shown in FIG. 7(a), the user first selects one image each with the right hand 720 and the left hand 710 in the clipViewer 260 (the gesture shown in FIG. 3(b)). Next, as shown in (b), the user moves both hands to the compileViewer 270 with the fingers still extended. Then, of the video files that were displayed in the clipViewer 260, a total of three files (the two files selected with the fingers of both hands and the one file displayed between them) are displayed in the compileViewer 270 and removed from the clipViewer 260, as shown in FIG. 7(c).
By repeating the above gestures, the user can pick out only the video files to be concatenated from among the cut-out video files.
The total playback time 280 of the video files displayed in the compileViewer 270 is automatically displayed at the lower right of the screen.
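The two-handed range move can be sketched with plain lists standing in for the contents of the clipViewer and the compileViewer. The function name `move_range` and the use of list indices for the two selected files are assumptions for this example.

```python
from typing import List, Tuple


def move_range(src: List[str], dst: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Move src[i], src[j], and everything between them to the end of dst.

    i and j are the indices of the files under the two fingers, in either
    order. New lists are returned; the inputs are left unmodified.
    """
    lo, hi = sorted((i, j))
    moved = src[lo:hi + 1]
    return src[:lo] + src[hi + 1:], dst + moved
```

With five clips and fingers on the second and fourth, the three clips from the second to the fourth are moved and the first and fifth remain, which mirrors the three-file case of FIG. 7.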

次に、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に移動した動画ファイルを連結させる。ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０上で、両手の指を一本ずつ出して、それぞれ始まりとなる動画ファイルと終わりとなる動画ファイルを選択し、それらの指を近付けるジェスチャ（図３（ｄ）に示すジェスチャ）を行なう。そうすると、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に格納されている動画ファイルが連結され、１つの動画ファイルを作成することができる。
ここで、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に表示されたファイルで作られる動画を、実際に連結を行なう前に再生して、確認することもできる。この操作のジェスチャを図８（ａ）〜（ｃ）に示す。まず（ａ）に示すように左手８１０と右手８２０の指で、それぞれ再生の始まりとなる動画ファイルと終わりとなる動画ファイルを選択する（両手で選択された動画ファイルとその間の動画ファイルが選択されたことになる）。次に、そのジャスチャのまま両手を画面の中央８３０に移動させる（ｂ）。すると、（ｃ）に示すように、画面中央８３０に、（ａ）で選択された動画ファイルを、ｃｏｍｐｉｌｅＶｉｅｗｅｒ２７０に表示されている順に（例えば、左から順に）再生表示する。このようにして、出来上がる動画のプレビューを事前に確認することができる。 Next, the video files moved to the compileViewer 270 are concatenated. On the compileViewer 270, the user extends one finger of each hand, selects the first and the last video file respectively, and performs the gesture of bringing those fingers together (the gesture shown in FIG. 3(d)). The video files stored in the compileViewer 270 are then concatenated, creating a single video file.
The video that would be made from the files displayed in the compileViewer 270 can also be played back and checked before the concatenation is actually performed. The gesture for this operation is shown in FIGS. 8(a) to 8(c). First, as shown in (a), the user selects the video file where playback is to begin and the one where it is to end with the fingers of the left hand 810 and the right hand 820, respectively (this selects the video files chosen with both hands and the video files between them). Next, the user moves both hands to the center 830 of the screen while holding the gesture (b). Then, as shown in (c), the video files selected in (a) are played back at the center 830 of the screen in the order in which they are displayed in the compileViewer 270 (for example, from left to right). In this way, the resulting video can be previewed in advance.
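The join gesture and the preview order can be sketched as follows. The distance threshold, the `Clip` tuple layout, and all function names are assumptions introduced for illustration; the source only states that the fingers are brought close together and that clips play in their displayed order.

```python
import math
from typing import List, Tuple

Clip = Tuple[str, float]  # (file name, duration in seconds); hypothetical layout


def fingers_joined(left: Tuple[float, float], right: Tuple[float, float],
                   threshold_px: float = 60.0) -> bool:
    """True when the two fingertips are within threshold_px of each other."""
    return math.hypot(left[0] - right[0], left[1] - right[1]) <= threshold_px


def preview_playlist(clips: List[Clip], first: int, last: int) -> List[str]:
    """File names from the two selected clips and those between, in display order."""
    lo, hi = sorted((first, last))
    return [name for name, _ in clips[lo:hi + 1]]


def total_seconds(clips: List[Clip]) -> float:
    """Total playback time of the given clips, as shown in area 280."""
    return sum(duration for _, duration in clips)
```

Playing the names returned by `preview_playlist` one after another gives the preview; actually writing the joined file is left to the underlying editing routine.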

本実施形態の動画編集システムを用いれば、上述のように、ユーザは動画ファイルの編集を身体的な感覚で行なうことができる。
なお、上記は一例であり、同様にして、上記の（ａ）〜（ｄ）の４つのジェスチャを組み合わせた操作で、上記の（Ａ）〜（Ｅ）の５つ以外の動画編集機能を実現させることができる。さらに、（ａ）〜（ｄ）に示した以外のジェスチャを認識させるようにすれば、さらに多くの動画編集機能を実現させることが可能である。 If the moving image editing system of the present embodiment is used, the user can edit the moving image file with a physical sense as described above.
Note that the above is an example; in the same way, video editing functions other than the five functions (A) to (E) can be realized by operations combining the four gestures (a) to (d). Furthermore, if gestures other than those shown in (a) to (d) are recognized, still more video editing functions can be realized.

本実施形態の動画編集システムのシステム構成図である。 FIG. 1 is a system configuration diagram of the video editing system of the present embodiment.
スクリーンに投影する画面の構成例を示した図である。 FIG. 2 is a diagram showing a configuration example of the screen image projected on the screen.
本実施形態で認識するジェスチャを示した図である。 FIG. 3 is a diagram showing the gestures recognized in the present embodiment.
ｆｌｏｗＶｉｅｗｅｒの表示の一例を示した図である。 FIG. 4 is a diagram showing an example of the flowViewer display.
動画ファイルの再生速度を上げるジェスチャを示した図である。 FIG. 5 is a diagram showing the gesture for raising the playback speed of a video file.
動画ファイルの切り取りの始点を指定するジェスチャを示した図である。 FIG. 6 is a diagram showing the gesture for specifying the start point of a cut in a video file.
複数の動画ファイルを一度に選択して移動させるジェスチャを示した図である。 FIG. 7 is a diagram showing the gesture for selecting and moving multiple video files at once.
複数の動画ファイルを１つにまとめるジェスチャを示した図である。 FIG. 8 is a diagram showing the gesture for combining multiple video files into one.

Claims

A video editing system that displays a video editing screen and captures the user's gestures on it with a camera to perform video editing operations, comprising:
video editing display means for displaying the video editing screen;
image input means for inputting an image captured by the camera;
finger recognition means for recognizing the position of the user's fingers from the input image;
gesture recognition means for recognizing the meaning of the user's gesture from the position of the user's fingers; and
video editing means for performing video editing corresponding to the gesture and displaying the result of the video editing as the video editing screen.

The video editing system according to claim 1,
further comprising initial screen storage means for storing an image captured before the user's fingers appear,
wherein the finger recognition means obtains the region of the user's hand and fingers by comparing the input image with the image from the initial screen storage means, obtains the positions of the user's fingertips by circular template matching, and thereby recognizes the position of the user's fingers.

The video editing system according to claim 2,
wherein the circular template matching in the finger recognition means is performed on a square region centered on the center position of the user's hand.

The video editing system according to any one of claims 1 to 3,
wherein the video editing display means arranges a plurality of videos in a horizontal row on the video editing screen and, when playing back the videos, plays them so that the playback position of each adjacent video is delayed by a certain time going from right to left.

The video editing system according to any one of claims 1 to 4,
wherein the gesture recognition means recognizes that a video has been grabbed when the position of the user's fingers changes, at the same place on the editing screen, from a state in which the hand is open to a state in which the hand is closed.

The video editing system according to any one of claims 1 to 5,
wherein the gesture recognition means recognizes that the user has selected the image or video displayed at that position when the position of the user's fingers is in a state in which one finger is extended at the same place on the editing screen.

The video editing system according to any one of claims 1 to 6,
wherein the gesture recognition means recognizes that the user has selected the images or videos displayed at the position of each hand, together with the images or videos displayed between them, when the fingers of both of the user's hands are each in a state in which one finger is extended at the same place on the editing screen.

The video editing system according to claim 7,
wherein the gesture recognition means recognizes that the selected images or videos are to be joined into one when the user brings both hands close together after selecting the images or videos.

The video editing system according to any one of claims 1 to 8,
wherein the gesture recognition means recognizes the moving direction, moving speed, and moving distance as video editing quantities when the position of the user's fingers moves up, down, left, or right with the hand open.

A program for causing a computer system to implement the functions of the video editing system according to any one of claims 1 to 9.