JP2020071394A

JP2020071394A - Information processing device, information processing method, and program

Info

Publication number: JP2020071394A
Application number: JP2018205860A
Authority: JP
Inventors: Takashi Oya (大矢 崇)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2020-05-07
Anticipated expiration: 2038-10-31
Also published as: JP7175715B2

Abstract

PROBLEM TO BE SOLVED: To reduce a feeling of discomfort. SOLUTION: An information processing apparatus supplies a second image to an image display device that includes imaging means for acquiring a first image capturing a real space and display means for displaying the second image generated using the first image. The apparatus has: first generation means for generating a background image in which the region of a specific object included in the first image is filled in using image features obtained from the surroundings of that region in the first image; second generation means for generating the second image by compositing, at a position in the background image based on a result of estimating the position of the specific object after a predetermined time, an image depicting the object; and display position control means for shifting the position at which the second image supplied to the display means is displayed, based on a delay time attributable to at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. SELECTED DRAWING: FIG. 2

Description

The present invention relates to virtual reality and mixed reality systems.

Virtual reality (VR) and mixed reality (MR) systems are used in the design and manufacturing fields to shorten the prototyping period and reduce its cost. Using design (shape and styling) data created with a CAD (computer-aided design) system, these systems make it possible to evaluate ease of assembly and maintainability without building a physical prototype. These systems use a head-mounted display (HMD). An HMD is worn directly on the head so that it covers the entire field of view, and it displays video that lets the user experience virtual reality from a viewpoint corresponding to the user's position and orientation. When the position or orientation of the HMD changes because the user moves, generating the video to be displayed on the HMD takes time, so the HMD displays video from before the user moved. Because the displayed video then differs from the video the user expects, the user experiences discomfort with the video (discomfort caused by perceived latency). To mitigate this perceived latency, Patent Document 1 shifts the displayed image within the screen by an amount corresponding to the change in orientation during the delay time.

Patent Document 1: Japanese Patent Laid-Open No. 2004-109994

Patent Document 1 merely shifts the image captured before the position and orientation changed abruptly; even when a moving body is present in the real space, it does not estimate in which direction the moving body moves. Consequently, as the position and orientation of the HMD change, the moving body may be displayed in the direction opposite to the direction in which its correct display position moves. In other words, because the method disclosed in Patent Document 1 does not take into account the motion of moving bodies in the real space, it cannot mitigate the perceived discomfort of video that contains a moving body. The present invention has been made in view of this problem. Its object is to reduce the discomfort that arises when, with a moving body present in the scene, the moving body is not displayed where it should be after an abrupt change in the position and orientation of the HMD worn by the user.

An information processing apparatus according to the present invention that solves the above problem supplies a second image to an image display device that includes imaging means for acquiring a first image capturing a real space and display means for displaying the second image generated using the first image. The apparatus has: first generation means for generating a background image in which the region of a specific object included in the first image is filled in using image features obtained from the surroundings of that region in the first image; second generation means for generating the second image by compositing, at a position in the background image based on a result of estimating the position of the specific object after a predetermined time, an image depicting the object; and display position control means for shifting the position at which the second image supplied to the display means is displayed, based on a delay time attributable to at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device.

According to the present invention, even when a moving body is present in the scene, it is possible to reduce the discomfort caused by the moving body not being displayed where it should be when the position and orientation of the HMD worn by the user change abruptly.

FIG. 1 is a diagram showing a display example of a delayed image that can occur in a virtual reality space.
FIG. 2 is a diagram showing display examples of images generated by the information processing apparatus.
FIG. 3 is a diagram showing an example of a coordinate system in which object prediction is performed.
FIG. 4 is a block diagram showing an example of the functional configuration of an information processing system.
FIG. 5 is a flowchart showing processing executed by the information processing apparatus.
FIG. 6 is a sequence diagram showing processing executed by the information processing system.
FIG. 7 is a block diagram showing an example of the functional configuration of an information processing system.
FIG. 8 is a flowchart showing processing executed by the information processing apparatus.
FIG. 9 is a sequence diagram showing processing executed by the information processing system.
FIG. 10 is a block diagram showing an example of the functional configuration of an information processing system.
FIG. 11 is a block diagram showing an example of the hardware configuration of the information processing apparatus.

(First embodiment)
Hereinafter, preferred embodiments to which the present invention is applied are described in detail with reference to the accompanying drawings.

A head-mounted display (hereinafter, HMD) incorporates displays for both eyes and headphones for both ears. By wearing the HMD on the head, the user can view still images, video, and the like shown on the displays and listen to speech, music, and the like output from the headphones. This embodiment describes, in particular, an example of an HMD used in a mixed reality system (MR system). In this case, the HMD composites an image of the real space captured by the imaging unit in the HMD with an image of a virtual space, and outputs the result to the display unit in the HMD as video that lets the user experience mixed reality. In addition, a gyro sensor, an acceleration sensor, or the like built into or externally attached to the HMD can measure position information of the head of the user wearing the HMD and posture information such as the rotation angle and tilt of the head. Posture information here is the result of measuring the relative motion of the HMD with a posture sensor such as a gyro sensor. The images or video described below are examples of images generated by the information processing apparatus. Some or all of a generated image is presented to the user wearing the HMD. This embodiment describes a method of reducing perceived discomfort by observing the moving body 110 and placing it at a predicted position. In this embodiment, the moving body is observed with a stereo camera, and an image is generated in which the two-dimensional moving body region extracted from the captured image is displayed at the predicted position.

First, FIG. 1 is used to describe an example in which video containing a specific object (for example, an object that is moving or may move) in the real space is delayed in the virtual reality space. FIG. 1(A) shows the positional relationship in the real space among the HMD, the specific object (moving body 110), and a stationary object 120 at a given time t. Here, the specific object is taken to be the hand of the HMD wearer, as a representative example of an object that may move. FIG. 1(B) shows the positional relationship at the subsequent time t + Δt, following the state of FIG. 1(A). The time Δt is the delay time, defined as the difference between the time at which the image was captured and the time at which the position of the HMD was measured immediately before the image to be displayed on the HMD is generated (for example, 100 msec).

FIG. 1(B) shows a state in which, during the Δt seconds, the HMD has rotated to the right by ΔΘ and the moving body has moved to the right by Δh relative to the stationary object 120. The change ΔΘ in the position and orientation of the HMD can be measured with a position and orientation sensor built into the HMD. FIGS. 1(C) and 1(D) show ideal, delay-free images at time t and time t + Δt as seen from the viewpoint of the HMD. FIG. 1(E) shows an image 1020 in which a delay has occurred in the virtual space rendering process. The image 1020 is the result of applying image processing to the image 1000 of time t, but it is displayed late by the time the image processing took, irrespective of the motion of the HMD. Compared with the ideal display position of the moving body in the delay-free FIG. 1(D), FIG. 1(E) depicts the moving body at a position displaced horizontally by the length of arrow 150. The object 130 is CG obtained by rendering a CG model so that it is aligned with the real space. The HMD wearer may feel discomfort with the virtual space video in proportion to the length of this arrow. The dotted lines running through FIGS. 1(D), 1(E), and 1(F) indicate the field of view in which the image 1000 was visible, and show that it is displayed displaced by ΔX in accordance with the motion of the HMD.

FIG. 1(F), on the other hand, shows the image obtained when the video generated at time t is shifted horizontally by an amount ΔX that reflects the rotation ΔΘ of the HMD. The shifting process is described in detail later. The region shown in black in image 1030 is the part that was not captured and is therefore displayed in black after the shift. The stationary object 120 in FIG. 1(F) is displayed at the same position on the HMD screen as the stationary object 120 in FIG. 1(D). The moving body, however, is offset by the length of arrow 160 from its position in real time (time t + Δt), that is, from its ideal display position in FIG. 1(D), and the user perceives the moving body as lagging by that amount. This happens because the captured image 1000 obtained at time t is shifted by ΔX while the distance Δh that the moving body travelled during the Δt seconds is ignored. Note that arrow 160 is longer than arrow 150, which is the difference between the position of the moving body in FIG. 1(E), where no image shift is applied, and the ideal display position in FIG. 1(D). The shift may therefore cause the HMD wearer even greater discomfort. The case in which the HMD rotates to the right and the wearer's hand translates to the right is one example. That is, even if the image is displayed with the change in the position or orientation of the HMD taken into account, the motion of the specific object cannot be predicted and additional processing time is required, so the video displayed may feel unnatural to the HMD wearer. Furthermore, in a video see-through HMD for providing mixed reality, the time for the HMD-mounted camera to capture and expose the real space and the time to transmit the image information are added before the display video is generated as described above. The processing time required for image generation therefore becomes even longer, and the discomfort caused by changes in the orientation of the HMD or of the moving body may increase further.
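The relationship between the rotation ΔΘ and the shift ΔX is not spelled out in this part of the description. As a rough illustration only, the sketch below assumes an ideal pinhole camera, a pure rotation about the vertical axis, and a distant point near the image centre; the function names and the field-of-view parameter are hypothetical and do not come from the patent.

```python
import math


def focal_length_px(image_width_px: int, horizontal_fov_deg: float) -> float:
    """Focal length in pixels of an assumed ideal pinhole camera."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)


def shift_for_rotation(delta_theta_deg: float,
                       image_width_px: int,
                       horizontal_fov_deg: float) -> float:
    """Magnitude of the horizontal shift (pixels) for a yaw of delta_theta_deg.

    Assumption: a distant static point at the image centre moves by
    f * tan(delta_theta) when the camera yaws by delta_theta.
    """
    f = focal_length_px(image_width_px, horizontal_fov_deg)
    return f * math.tan(math.radians(delta_theta_deg))
```

Under these assumptions the value is a magnitude; the image content itself moves in the direction opposite to the rotation.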

FIG. 2 shows display examples of images generated by the information processing apparatus according to the present invention. In FIGS. 2(a) and 2(b), the capture time is t and the display time is t + Δt. As in FIG. 1, at time t the hand 210, which is a moving body, and the object 220 are observed from the HMD 200. At time t + Δt, the positional relationship among the HMD, the object, and the hand is as shown by the HMD 201, the hand 211, and the object 220. The HMD has rotated so that its line of sight points to the right. The hand has moved from position 210 to position 211 between time t and time t + Δt.

At time t, the HMD acquires an image 2000 captured by its attached imaging device. The image 2000 includes the hand 230 of the HMD wearer. Next, as shown in FIG. 2(d), a two-dimensional moving body region 232 is detected from the image 2000. Here, a skin-color region is extracted from the captured image. In this embodiment, all the color information of the skin-color region obtained by imaging skin in advance is recorded and held in a table. The color information may be expressed in the three primary colors RGB, or as YCbCr luminance and chrominance information. Moving body detection may use color detection, edge detection, inter-frame differencing, superpixel-based region segmentation, pattern matching, or learning-based object detection; the present invention is not limited to any particular technique. For example, the technique of Yuji Yamauchi, Takayoshi Yamashita, and Hironobu Fujiyoshi, "[Survey Paper] Human Detection by Statistical Learning Methods", IEICE Technical Report on Pattern Recognition and Media (PRMU), pp. 113-126 (2012) may be used. After the region is detected, in FIG. 2(e) the position of the moving body at time t + Δt is predicted by a technique described later, yielding the predicted moving body position 234.
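The table-based skin-color extraction described above could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes images are lists of rows of (R, G, B) tuples, uses the full-range BT.601 RGB-to-CbCr conversion, and quantizes chrominance with a hypothetical bin size `step`; all function names are made up for the example.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance (Cb, Cr) of an 8-bit RGB pixel, full-range BT.601."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def chroma_bin(pixel, step=8):
    """Quantize a pixel's chrominance into a coarse table key."""
    cb, cr = rgb_to_cbcr(*pixel)
    return int(cb) // step, int(cr) // step


def build_skin_table(sample_pixels, step=8):
    """Record the chrominance bins of skin pixels captured in advance."""
    return {chroma_bin(p, step) for p in sample_pixels}


def detect_region(image, table, step=8):
    """Return a boolean mask that is True where the pixel's bin is in the table."""
    return [[chroma_bin(px, step) in table for px in row] for row in image]
```

Ignoring luminance and using coarse bins is a design choice of this sketch that makes the lookup somewhat tolerant of lighting changes; the description itself only states that the recorded color information is held in a table.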

After the moving body region is detected, a background region 241 is obtained as shown in FIG. 2(f). Next, in FIG. 2(g), the moving body region is filled in based on the image features of the background region, producing a background image 2040 that contains no moving body. In this embodiment, the average color of the area surrounding the moving body region is obtained, and the moving body region is filled with that color. Background filling can also be done by filling with a color similar to the surrounding area or by filling from past images; the present invention is not limited to any particular technique. For example, the technique of Mori et al. (Shohei Mori, Ryosuke Ichikari, Fumihisa Shibata, Asako Kimura, Hideyuki Tamura, "Technical framework and issues of diminished reality: on techniques for visually concealing, erasing, and seeing through objects that exist in the real world", Transactions of the Virtual Reality Society of Japan) may be used. Furthermore, based on CG model data, the object 130 is rendered at the corresponding predetermined position of the real space in the background image.
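The average-color fill of this embodiment might look like the sketch below. It assumes the same list-of-rows image representation with (R, G, B) tuples and a boolean mask of the moving body region, and it takes "surrounding area" to mean the unmasked pixels that are 4-neighbours of the mask; that interpretation and the function name are assumptions made for the example.

```python
def fill_with_border_mean(image, mask):
    """Fill masked pixels with the mean color of the pixels bordering the mask."""
    h, w = len(image), len(image[0])
    border = []
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            # Keep unmasked pixels that touch at least one masked pixel.
            if any(0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                   for ny, nx in neighbours):
                border.append(image[y][x])
    if not border:
        # Nothing to fill from (empty or full mask): return an unchanged copy.
        return [row[:] for row in image]
    mean = tuple(round(sum(p[c] for p in border) / len(border)) for c in range(3))
    return [[mean if mask[y][x] else image[y][x] for x in range(w)]
            for y in range(h)]
```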

The image that would actually be captured at time t + Δt is the image 2060 of FIG. 2(i), but the most recent captured image available to the system is the image 2000 of FIG. 2(c). The image 2070 for display, referred to as the composite image, is therefore generated from the image of FIG. 2(c). As in Patent Document 1, the image 2040 of FIG. 2(g) is translated by ΔX in the direction opposite to the rotation, in accordance with the viewpoint rotation angle ΔΘ. This produces the image 2050 of FIG. 2(h). Next, the moving body region 232 is composited at the predicted moving body position on FIG. 2(h), producing the image 2070 shown in FIG. 2(j). Finally, the composite image 2070 is displayed on the HMD. As a result of this processing, a composite image 2070 close to the ideal image 2060 can be displayed.
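The two steps above (shift the filled background, then paste the moving body region at its predicted position) can be illustrated with the following sketch. It assumes list-of-rows images of (R, G, B) tuples, an integer pixel shift `dx` (negative means left), and a hypothetical `offset` giving how far the moving body region is displaced in the output image; none of these names come from the patent.

```python
BLACK = (0, 0, 0)


def shift_horizontal(image, dx):
    """Shift an image by dx pixels; pixels with no source data become black."""
    w = len(image[0])
    shifted = []
    for row in image:
        new_row = [BLACK] * w
        for x, px in enumerate(row):
            nx = x + dx
            if 0 <= nx < w:
                new_row[nx] = px
        shifted.append(new_row)
    return shifted


def paste_region(background, source, mask, offset):
    """Copy the masked pixels of source onto background, displaced by offset."""
    h, w = len(background), len(background[0])
    ox, oy = offset
    out = [row[:] for row in background]
    for y, mask_row in enumerate(mask):
        for x, inside in enumerate(mask_row):
            ny, nx = y + oy, x + ox
            if inside and 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = source[y][x]
    return out
```

In this sketch a rightward rotation of the HMD corresponds to a negative `dx`, and the black pixels play the role of the uncaptured region shown in image 1030.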

Alternatively, the following method may be used. FIG. 2(k) shows an image 2080 obtained by compositing an image 2020, in which the moving body 235 is depicted at the predicted position, with the background image 2040, in which the moving body region has been filled in. FIG. 2(l) shows an image 2090 obtained by shifting the image 2080 in accordance with the delay time and the change in the position and orientation of the HMD. In this case, the image 2090 is displayed on the HMD. This method also uses the motion of the moving body for prediction, so an image with little perceived discomfort can be displayed.

An example of the moving body prediction method of the present invention is described with reference to FIG. 3. The moving body 301 indicates the position of the moving body at time t in FIG. 2, and the moving body 302 indicates its position at time t + Δt in FIG. 2. The coordinates of 301 and 302 in FIG. 3(a) are used to predict Δh of FIG. 2(b). The coordinates of the moving body are obtained from the image. For example, the coordinates of the moving body 301 at time t are obtained from the image 2000 of FIG. 2(c) as the coordinates of the centroid of the region of the moving body 230, in an image coordinate system whose origin is the lower-left corner of the image. Moving body prediction is performed with a prediction filter. Examples of prediction filters include the α-β tracker and the Kalman filter, but the present invention is not limited to any particular prediction filter. Based on an observation model and a motion model, the prediction filter predicts and estimates the current observed position of the moving body from its past observed positions. As shown in FIG. 3(a), the motion model handled by the filter holds position, velocity, and acceleration in image coordinates as internal state, and the observation model treats the position of the moving body in the image as observable. In this case, the position (x', y') after Δt is predicted from the observation (x, y). Alternatively, as shown in FIG. 3(b), a motion model may be used in which the direction (θ, φ) in which the object is seen from the HMD is observable and the direction and its displacement are updated internally.
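As an illustration of the kind of prediction filter mentioned above, here is a minimal one-axis α-β tracker. It is a sketch under simplifying assumptions: the state holds only position and velocity (the motion model in the description also carries acceleration, which would call for an α-β-γ or Kalman filter), the gains are placeholder values, and the class and method names are hypothetical. One tracker per image axis would be used for (x, y).

```python
from dataclasses import dataclass


@dataclass
class AlphaBetaTracker:
    """One-dimensional alpha-beta filter over image coordinates."""
    alpha: float = 0.85   # position correction gain (placeholder value)
    beta: float = 0.005   # velocity correction gain (placeholder value)
    x: float = 0.0        # estimated position (pixels)
    v: float = 0.0        # estimated velocity (pixels per second)

    def update(self, z: float, dt: float) -> None:
        """Fold in an observed position z taken dt seconds after the last one."""
        x_pred = self.x + self.v * dt
        residual = z - x_pred
        self.x = x_pred + self.alpha * residual
        self.v = self.v + (self.beta / dt) * residual

    def predict(self, horizon: float) -> float:
        """Extrapolate the position horizon seconds ahead (for example, Δt)."""
        return self.x + self.v * horizon
```

Calling `update` once per captured frame with the centroid coordinate and then `predict(0.1)` would give the estimated coordinate 100 msec ahead.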

FIG. 4 is a block diagram showing an example of the functional configuration of the information processing system. It shows a configuration in which all processing is performed inside the HMD. The HMD 200 has an imaging unit 410, an imaging optical system correction unit 430, a display optical system correction unit 440, and a display unit 420. The information processing apparatus 401 mounted in the HMD 200 has the following functional configuration: an image acquisition unit 441, a moving body region detection unit 433, a moving body position prediction unit 434, a posture acquisition unit 431, a position and orientation estimation unit 432, a background image generation unit 435, a CG image generation unit 437, a holding unit 438, a composite image generation unit 439, and a display position control unit 442.

The imaging unit 410 has an imaging optical system 411 and an imaging sensor 412, and outputs captured images to the information processing apparatus 401. In this embodiment, the imaging unit 410 is specifically a color camera, and the images obtained are color images. The imaging unit 410 may be a stereo camera.

The image acquisition unit 441 acquires, as needed, images (first images) in which the imaging unit 410 has captured a moving body in the real space. The acquired image is sent to the imaging optical system correction unit 430. The imaging optical system correction unit 430 corrects various aberrations of the captured image, including those affecting color. The display unit 420 has a display optical system 421 and a display panel 422, and displays the corrected image. Correction for display is performed by the display optical system correction unit 440. This is the inverse of the processing performed by the imaging optical system correction unit 430.

The moving body region detection unit 433 takes the captured image (first image) as input and detects the moving body region in the image based on predetermined image features that indicate a moving body. For example, when the moving body is a hand, the moving body region is extracted based on the image features of a hand; specifically, the skin-color region is extracted. The moving body position prediction unit 434 predicts the position of the moving body after a predetermined time using a prediction filter. In this embodiment, Δt is a fixed value and the space in which the motion model moves is confined to the captured image, so the moving body position prediction unit 434 does not need the position and orientation of the HMD as input. Because Δt is very short, about 100 msec, the approximation that the moving body moves within the captured image holds. Using a fixed value is expected to shorten the processing time and reduce the discomfort of the video. The background image generation unit 435 takes the moving body region detection result as input and generates a background image by filling in the moving body region of the captured image using the image features of the background region. Specifically, the skin-color region is filled with the color of the background region contained in the captured image.

The posture acquisition unit 431 consists of a gyro sensor and an accelerometer, and acquires the result of measuring relative posture information of the HMD. The posture information acquired here is used to predict the position of the specific object. The posture acquisition unit 431 preferably uses a low-latency posture sensor that can acquire posture data at 100 Hz or higher. The posture acquisition unit 431 sends the measured posture information to the delay acquisition unit 1450 and the CG image generation unit 437. In addition, immediately before the composite image generation unit 439 generates an image, the posture acquisition unit 1300 sends the latest posture information measured so far to the composite image generation unit 439. The posture acquisition unit 431 also holds posture measurement times in association with posture measurement values, and when given a posture measurement time it returns the corresponding posture measurement value.
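The time-indexed posture store described above could be sketched as follows. The class and method names are hypothetical, the posture is reduced to a single yaw angle in degrees for brevity, and returning the sample nearest in time is an assumption of this sketch; the description only says that the value corresponding to a given measurement time is returned.

```python
import bisect


class PostureLog:
    """Posture measurements stored with their measurement times."""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, t, value):
        """Append a measurement; timestamps are assumed to be non-decreasing."""
        self._times.append(t)
        self._values.append(value)

    def lookup(self, t):
        """Return the measurement whose timestamp is closest to t."""
        if not self._times:
            raise LookupError("no posture measurements recorded")
        i = bisect.bisect_left(self._times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._times)]
        best = min(candidates, key=lambda j: abs(self._times[j] - t))
        return self._values[best]

    def latest(self):
        """Return the most recent measurement."""
        if not self._values:
            raise LookupError("no posture measurements recorded")
        return self._values[-1]
```

With such a log, the rotation during the delay can be taken as `log.latest() - log.lookup(capture_time)`, where `capture_time` is the time the first image was captured.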

The position and orientation estimation unit 432 estimates position and orientation information indicating the absolute position and orientation of the HMD. The position and orientation information represents the absolute three-dimensional position and orientation of the HMD in the real space, and is used to superimpose a CG image depicting a CG model in registration with the real space. The position and orientation of the HMD are expressed in the world coordinate system that is set in the marker calibration step at system setup. As the estimation method, a method using indices (markers) or Simultaneous Localization And Mapping (SLAM) can be used. SLAM is a technique for estimating the three-dimensional position and orientation of the imaging device in the real space from markers appearing in the captured image. Accuracy can also be improved by additionally using a posture sensor (not shown), or the values may be obtained directly from an external position and orientation sensor. The present invention is not limited to a particular method of acquiring the HMD position and orientation. The position and orientation estimation unit 432 also predicts the position and orientation of the HMD at time t + Δt from the acquired position and orientation. A prediction filter similar to the one used for predicting the moving body can be used for this prediction.
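As a rough illustration of predicting the HMD pose at time t + Δt, the sketch below uses plain constant-velocity extrapolation from the two most recent measurements. This is a simplified stand-in for the prediction filter, not the filter itself; the function name `predict_pose` and the six-element pose tuple are assumptions, and angle wrap-around is ignored.

```python
def predict_pose(prev, curr, dt_prev, delta_t):
    """Linearly extrapolate a pose delta_t seconds past the current sample.

    prev, curr: pose tuples (x, y, z, yaw, pitch, roll) measured dt_prev
    seconds apart (hypothetical layout).
    """
    if dt_prev <= 0:
        raise ValueError("dt_prev must be positive")
    # Per-component velocity is (curr - prev) / dt_prev; advance it by delta_t.
    return tuple(c + (c - p) / dt_prev * delta_t for p, c in zip(prev, curr))
```

A filter that also smooths measurement noise would normally replace the raw finite difference used here.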

The CG image generation unit 437 takes as input the HMD position and orientation information obtained from the holding unit 438 and the posture acquisition unit 431, and generates a CG image depicting the hand, which is a moving body, at the position where it is seen from the position and orientation of the HMD at time t + Δt. Further, based on the CG model data held in the holding unit 438, the CG image generation unit 437 generates a CG image in which the object 130 is rendered at the corresponding predetermined position of the real space in the background image. In other words, the CG model is also depicted at the position where it is seen from the position and orientation of the HMD at time t + Δt.

The holding unit 438 holds the CG model data. It also holds a log of position and orientation measurement results and the images acquired by the imaging unit 410.

Based on the result of estimating the position, after a predetermined time, of a specific object included in the background image (first image), the composite image generation unit 439 generates a composite image (second image) by combining the background image with a CG image in which the moving body is depicted at the position in the background image given by the estimation result. Alternatively, the unit takes as input the moving body position prediction result based on the prediction filter, the HMD position and orientation prediction result, the background image, and the CG image. In that case, it calculates the image shift amount from the HMD position and orientation prediction result, shifts the background image, and then superimposes the CG image and the moving body region to generate image 2070 in FIG. 2(j). Regarding the order in which the moving body and the CG image are superimposed, when the distance from the HMD to the moving body is known, the drawing order is changed according to whether the moving body is in front of or behind the CG. The image generated by the composite image generation unit 439 is displayed on the display unit 420 via the display optical system correction unit 440.
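The superimposition and its distance-dependent drawing order can be sketched as below. Images are modelled as nested lists with `None` marking transparent pixels. The function names, the layer representation, and the default of drawing the moving body on top when distances are unknown are all assumptions made for illustration.

```python
def paste(canvas, patch, top, left):
    """Overwrite canvas with the non-None pixels of patch, clipping at edges."""
    height, width = len(canvas), len(canvas[0])
    for r, row in enumerate(patch):
        for c, pixel in enumerate(row):
            y, x = top + r, left + c
            if pixel is not None and 0 <= y < height and 0 <= x < width:
                canvas[y][x] = pixel


def compose(background, cg_layer, body_patch, body_pos,
            body_dist=None, cg_dist=None):
    """Combine background, CG layer, and moving-body patch into one image."""
    canvas = [row[:] for row in background]  # leave the input untouched
    layers = [(cg_layer, (0, 0)), (body_patch, body_pos)]
    # When both distances are known and the CG is nearer, draw the CG last.
    if body_dist is not None and cg_dist is not None and cg_dist < body_dist:
        layers.reverse()
    for patch, (top, left) in layers:
        paste(canvas, patch, top, left)
    return canvas
```

Here `body_pos` is the predicted (row, column) of the moving body's top-left corner in the composite image.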

The display position control unit 442 shifts the position at which the composite image (second image) supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. The shifted image is displayed as shown in FIG. 2(l). Here, a fixed value Δt is used as the delay time. When the composite image generation unit 439 combines the already shifted background image with the CG image, this processing is omitted.
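One way to turn the delay time and the orientation change into a display shift is sketched below, assuming a pinhole camera model, a known focal length in pixels, and angular rates taken from the gyro sensor. The function name, the parameters, and the sign conventions (positive yaw turns the head right, positive pitch tilts it up, image y grows downward) are assumptions, not details given in the text.

```python
import math


def display_shift(yaw_rate, pitch_rate, delay, focal_px):
    """Return the (dx, dy) pixel shift compensating rotation during delay.

    yaw_rate, pitch_rate: angular velocities in rad/s (hypothetical inputs).
    delay: delay time in seconds, e.g. the fixed value used in the embodiment.
    focal_px: focal length expressed in pixels (assumed known).
    """
    yaw_change = yaw_rate * delay
    pitch_change = pitch_rate * delay
    # Turning right moves scene content left; tilting up moves it down.
    dx = -focal_px * math.tan(yaw_change)
    dy = focal_px * math.tan(pitch_change)
    return round(dx), round(dy)
```

For example, with a focal length of 1000 pixels and a delay of 20 ms, a yaw rate of 1 rad/s yields a shift of about 20 pixels.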

FIG. 5 is a flowchart showing the processing executed by the information processing device. The flow of processing performed by the information processing device 401 is briefly described with reference to FIG. 5. In the following, the flowchart is assumed to be realized by the CPU executing a control program. Each step is denoted by a leading S, and the word "step" is omitted. The processing shown in the flowchart of FIG. 5 is executed by the CPU 901 of the computer shown in FIG. 11 in accordance with a computer program stored in the external storage device 906 or the like. The information processing device 401 does not necessarily have to perform every step described in this flowchart, and a plurality of processes may be executed in parallel, as in the sequence diagram described later.

In S500, the information processing device 401 performs initialization. Specifically, the CG image generation unit 1700 reads, from the model data storage unit 1750, the CG model data used to render a three-dimensional CG model as a CG image at a predetermined position.

In S501, the image acquisition unit 441 determines whether the imaging process of the imaging unit 410 has completed. If imaging has completed and a new image is available, a captured image of a specific object in the real space is acquired and the process proceeds to S502. If imaging has not completed, the process returns to S501. The captured image corresponds to image 2000 in FIG. 2(c). In S502, the imaging optical system correction unit 430 corrects various aberrations, including color aberration, in the captured image acquired from the imaging unit 401.

In S503, the posture acquisition unit 431 acquires the posture information measured by the posture measurement sensor. The position and orientation estimation unit 432 may instead acquire the result of estimating, from the captured image, the position and orientation information of the HMD at the time of imaging. In S504, the position and orientation estimation unit 432 estimates the position and orientation of the HMD at time t + Δt from the position and orientation acquired in S503 and past position and orientation information.

In S505, the moving body region detection unit 433 acquires, from the captured image (first image), an object region indicating the object and a background region from which the object region has been separated. The object region is region 232 in FIG. 2(d). The background region is image 2030 in FIG. 2(f).

In S506, the moving body position prediction unit 434 estimates the position of the object after a predetermined time (after Δt) based on the object region in the captured image (first image). For example, in image 2020 of FIG. 2(e), the change from the position of the moving body 233 to the position of the moving body 234 Δt seconds later is predicted. The position of the moving body at time t + Δt is predicted using the prediction-filter method described above. The predetermined time here is the delay time Δt. In this embodiment, a fixed value of Δt prepared in advance is used.

In S507, the CG image generation unit 437 takes as input the HMD position and orientation information obtained from the holding unit 438 and the posture acquisition unit 431, and generates a CG image that depicts the moving body at the position predicted in S506 and the CG model at a predetermined position registered with the real space. Image 2080 in FIG. 2(k) depicts the image of the hand 235, extracted from image 2000, at the predicted position, and the object 130, which is a CG model, on top of the object 220.

In S508, the background image generation unit 435 generates a background image by complementing the object region using image features (color and luminance) of the background region obtained from the periphery of the object region in the captured image (first generation). Image 2040 in FIG. 2(g) is a background image in which region 242 has been complemented using the image features of the background. For example, the object region is filled with the color of the background region. S507 and S508 may be performed in the reverse order.
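The fill operation given as an example for S508 can be sketched as follows for a grayscale image. The choice of averaging the background pixels that directly border the object region (4-neighbourhood) is an assumption; the text only says that the region is filled with the color of the background region. The function name is hypothetical.

```python
def fill_object_region(image, mask):
    """Return a copy of image with masked pixels set to the mean value of
    the unmasked pixels that touch the mask (4-neighbourhood)."""
    height, width = len(image), len(image[0])
    border = []
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                continue
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(0 <= ny < height and 0 <= nx < width and mask[ny][nx]
                   for ny, nx in neighbours):
                border.append(image[y][x])
    if not border:
        # Nothing to fill from (empty mask or mask covers the whole image).
        return [row[:] for row in image]
    fill = sum(border) / len(border)
    return [[fill if mask[y][x] else image[y][x] for x in range(width)]
            for y in range(height)]
```

A single flat fill value is the simplest possible complement; a textured background would need a more elaborate inpainting method.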

In S509, based on the result of estimating the position, after a predetermined time, of a specific object included in the background image (first image), the composite image generation unit 439 generates a composite image (second image) by combining the background image with a CG image in which the moving body is depicted at the position in the background image given by the estimation result. Image 2080 in FIG. 2(k) corresponds to the image generated in S509. The moving body is depicted by compositing the moving body region at the position in the background image that corresponds to the position predicted in S506 (the estimation result).

In S510, the display optical system correction unit 440 converts the composite image into an image suited to the display optical system. In S511, the display position control unit 442 shifts the position at which the composite image (second image) supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. S510 and S511 may be performed in the reverse order.

In S512, the information processing device 401 determines whether the user has given an end instruction. If so, the processing ends. If not, the process returns to S501.

Alternatively, the processing from S508 onward may be performed as follows. In S508, the background image generation unit 435 complements the object region using image features (color and luminance) of the background region, and generates a background image that has been shifted based on the change in the position and orientation of the HMD. Image 2050 in FIG. 2(h) is a background image in which region 242 has been complemented using the image features of the background and which has further been shifted according to the change in the position and orientation of the HMD. In S509, based on the moving body position prediction result and the HMD position and orientation prediction result, the composite image generation unit 439 generates a composite image (second image) with the shifted background image as the background and the second image, which depicts the object at the estimated position, as the foreground. In S510, the display optical system correction unit 440 converts the composite image into an image suited to the display optical system. In S511, the display position control unit 442 displays the composite image at its position without shifting it. In S512, the information processing device 401 determines whether the user has given an end instruction. If so, the processing ends. If not, the process returns to S501.

FIG. 6 is a sequence diagram showing the processing executed by the information processing system. In FIG. 6, the processes of (a) capture, (b) position and orientation acquisition, (c) CG image generation, (d) moving body region detection, (e) display image generation, and (f) display run concurrently. Each of these processes is executed repeatedly. Description of how each process is started and terminated is omitted.

In the capture process of FIG. 6(a), the imaging unit 410 captures the first image in S600, which corresponds to S501 in FIG. 5. In S601, the imaging optical system correction unit 430 corrects the first image, which corresponds to S502 in FIG. 5.

In the position and orientation acquisition process of FIG. 6(b), in S610 the posture acquisition unit 431 detects the position and orientation of the HMD at time t from the captured image, which corresponds to S503 in FIG. 5. Next, in S611, the position and orientation estimation unit 432 predicts the position and orientation of the HMD at time t + Δt from the detected position and orientation, which corresponds to S504 in FIG. 5. In this embodiment, Δt is a fixed value.

In the CG image generation process of FIG. 6(c), in S620 the CG image generation unit 437 generates a CG image based on the HMD position and orientation information measured in S610, which corresponds to S507 in FIG. 5.

In the moving body region detection process of FIG. 6(d), in S630 the moving body region detection unit 433 detects the moving body region from the captured image, which corresponds to S505 in FIG. 5. In S631, the moving body position prediction unit 434 predicts the moving body position at time t + Δt, which corresponds to S506 in FIG. 5. Because this embodiment adopts the in-screen prediction of FIG. 3(a), the HMD position and orientation information is not needed for this step.

In the background and display image generation process of FIG. 6(e), the background image generation unit 435 receives the moving body region detection result of S630 and, in S640, generates a background image in which the moving body region has been filled in. This corresponds to S508 in FIG. 5. Then, in S643, the composite image generation unit 439 performs the composite image generation process, which corresponds to S509 in FIG. 5. This process receives the results of the moving body position prediction S631, the HMD position and orientation prediction S611, and the CG image generation S620, and generates a composite image to be displayed on the HMD according to the latest position and orientation information.

FIG. 6(f) is the display process. In S650, the display optical system correction unit 440 receives the composite image generated in S643 and corrects it, which corresponds to S510 in FIG. 5. In S652, the display position control unit 442 shifts the position at which the composite image (second image) supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. In S553, the display unit 420 performs the image display processing in sequence and displays the image. This corresponds to S511 in FIG. 5.

As described above, according to this embodiment, in time-warp image generation of the image-shift type, which aims to reduce the perceived latency of the HMD, taking the motion of the moving body into account reduces the sense of strangeness felt by the HMD user.

(Modification 1)
As a modification of the present invention, prediction accuracy can be improved by taking the coordinates used for predicting the moving body in the θφ space, which represents the orientation of the moving body as seen from the HMD. Prediction accuracy can also be improved by making the delay time Δt from capture to display variable. When prediction is performed based on a motion model, setting Δt from the measured delay between actual capture and display is more accurate, so the sense of strangeness in the video can be reduced further. FIG. 7 shows an example of the functional configuration of this modification. The following description focuses on the differences from FIG. 4.

In this modification, a delay measurement unit 641 is provided to measure the difference Δt between the capture time and the display time. The delay measurement unit 641 has an internal timer. The delay time Δt is determined from the difference between the time at which the imaging optical system correction unit 630 acquires the captured image (first time) and the time at which the display optical system correction unit 640 completes transmission of the display image (second time). The delay time Δt is attached to the captured image and used for prediction by the moving body position prediction unit 634 and the HMD position and orientation estimation unit 632. The position and orientation acquisition unit 631 is identical to the unit 431 of the first embodiment. The position and orientation acquisition unit 631 transmits the measurement time of the latest posture measured so far to the delay acquisition unit 641.
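The delay measurement can be sketched as a small timer object that records the first time (capture) and the second time (end of display transmission) and keeps their difference as Δt. The class name `DelayMeter`, its methods, and the use of an initial delay value before the first measurement are assumptions made for illustration.

```python
import time


class DelayMeter:
    """Hypothetical helper measuring the capture-to-display delay."""

    def __init__(self, initial_delay, clock=time.monotonic):
        self._clock = clock          # injectable so the class can be tested
        self._capture_time = None
        self.delay = initial_delay   # used until the first measurement exists

    def mark_capture(self):
        """Record the first time: the captured image has been acquired."""
        self._capture_time = self._clock()

    def mark_display(self):
        """Record the second time and update the measured delay."""
        if self._capture_time is None:
            raise RuntimeError("mark_capture() must be called first")
        self.delay = self._clock() - self._capture_time
        self._capture_time = None
        return self.delay
```

Because the delay of a frame is only known after that frame has been displayed, a predictor using this value necessarily relies on the delay measured for an earlier frame.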

The moving body region detection unit 633 detects the moving body position (direction) in the θφ space shown in FIG. 3(b). This indicates in which direction, relative to the HMD, the moving body is seen. The moving body position prediction unit 634 predicts the moving body position (direction) in the θφ space. The background image generation unit 635 is identical to the unit 435. The predicted background image generation unit 636 calculates the shift amount of the background image at time t + Δt based on the HMD position and orientation prediction, and shifts the image. Similarly, the CG image generation unit 637 generates a CG model image as seen from the predicted position and orientation of the HMD at time t + Δt. The composite image generation unit 639 generates an image combining the predicted background image, the CG image, and the moving body. If distance measurement shows that the moving body is in front of the CG, the moving body is superimposed last. This is the case, for example, when the moving body is the hand of the HMD wearer.
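A possible mapping between image coordinates and a direction relative to the HMD is sketched below. The exact definition of the θφ space in FIG. 3(b) is not reproduced in this section, so the pinhole model, the principal point (cx, cy), the focal length in pixels, and the function names are all assumptions.

```python
import math


def pixel_to_angles(u, v, cx, cy, focal_px):
    """Convert a pixel position to (theta, phi) in radians.

    theta is the horizontal angle and phi the vertical angle of the viewing
    ray relative to the optical axis (assumed convention).
    """
    theta = math.atan2(u - cx, focal_px)
    phi = math.atan2(v - cy, focal_px)
    return theta, phi


def angles_to_pixel(theta, phi, cx, cy, focal_px):
    """Inverse mapping: place a predicted direction back into the image."""
    return cx + focal_px * math.tan(theta), cy + focal_px * math.tan(phi)
```

Under these assumptions, a prediction made on (theta, phi) can be converted back to pixel coordinates for compositing.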

FIG. 8 is a flowchart showing the processing executed by the information processing device. In S500, the information processing device 401 performs initialization. In S501, the imaging optical system correction unit 630 determines whether the imaging process of the imaging unit 410 has completed. If imaging has completed and a new image is available, a captured image of a specific object in the real space is acquired and the process proceeds to S800. If imaging has not completed, the process returns to S501. In S800, the delay measurement unit 641 measures the difference Δt between the capture time and the display time.

In S502, the imaging optical system correction unit 630 corrects various aberrations, including color aberration, in the captured image acquired from the imaging unit 401. In S503, the position and orientation acquisition unit 631 acquires, from the captured image, the posture information of the HMD at the time of imaging. In S504, the position and orientation estimation unit 632 estimates the position and orientation of the HMD at time t + Δt from the position and orientation acquired in S503 and past position and orientation information. In S505, the moving body region detection unit 633 acquires, from the captured image (first image), an object region indicating the object and a background region from which the object region has been separated. In S506, the moving body position prediction unit 634 estimates the position of the object after the delay time based on the object region in the captured image (first image).

In S507, the CG image generation unit 437 takes as input the HMD position and orientation information obtained from the holding unit 438 and the posture acquisition unit 431, and generates a CG image that depicts the moving body at the position predicted in S506 and the CG model at a predetermined position corresponding to the real space. In S508, the background image generation unit 635 generates a background image by complementing the object region using image features (color and luminance) of the background region. In S509, based on the result of estimating the position, after a predetermined time, of a specific object included in the background image (first image), the composite image generation unit 639 generates a composite image (second image) by combining the background image with a CG image in which the moving body is depicted at the position in the background image given by the estimation result.

In S510, the display optical system correction unit 640 converts the composite image into an image suited to the display optical system. In S511, the display position control unit 642 shifts the position at which the composite image (second image) supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. In S801, the delay measurement unit 641 records the delay time taken by the current pass. The delay time is held in the holding unit 638. In S512, the information processing device 601 determines whether the user has given an end instruction. If so, the processing ends. If not, the process returns to S501.

FIG. 9 is a sequence diagram showing the processing executed by the information processing system. As in FIG. 6, a plurality of processes run in parallel. The following description focuses on the parts that differ from FIG. 6. To measure the delay time Δt between capture and display, the imaging optical system correction unit 630 acquires the capture time in S702 of the capture process (a). In S703, the delay measurement unit 641 performs the delay measurement by combining this with the display time acquired in S751. In the position and orientation acquisition process (b), the position and orientation prediction process is performed in S711. In S720 of the CG image generation process (c), the position and orientation prediction result is received and a CG image as seen from the predicted position and orientation is generated. In S731 of the moving body region detection process (d), the moving body position in the θφ space is predicted based on the HMD position and orientation measurement result. In S742 of the background and display image generation process (e), the background image at time t + Δt is generated using the result of the position and orientation prediction process S711.

In S750 of the display process (f), the display position control unit 642 and the display optical system correction unit 640 convert the composite image into an image suited to the display optical system. In S752, the display position control unit 642 shifts the position at which the composite image (second image) supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time from imaging to display and on the amount of change in the position or orientation of the image display device. In S751, the delay measurement unit 641 acquires the display time. In S753, the display unit displays the image.

(Modification 2)
As another modification of the present invention, a configuration in which the HMD and the control device are separate, as shown in FIG. 10, is also possible. The following description focuses on the differences from FIG. 6. The system consists of an HMD 800 and a control device 850, which communicate with each other. The HMD 800 has a data transmission unit 840 and a data reception unit 841. The control device is, for example, a commercially available PC, and has a data reception unit 832 and a data transmission unit 853. The data transmission unit 840 of the HMD bundles the captured image, the delay measurement result, and the posture information and transmits them to the control device 850, where the reception unit 852 receives them. Conversely, the data transmission unit 853 of the control device 850 transmits the composite image generated by the composite image generation unit 862 to the HMD 800, where the data reception unit 841 receives it. The communication between the HMD 800 and the control device 850 can use USB, IEEE 1394, or a LAN, and is not limited to a specific communication method. The software configuration of this modification is that of Modification 1 with the capture process (a) and the display process (f) placed on the HMD side, and the position and orientation acquisition process (b), the CG generation process (c), the moving body region detection process (d), and the background and display image generation process (e) placed on the control device side. The modification can be realized by having the two sides communicate with each other.
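Bundling the captured image, the delay measurement result, and the posture information into one message could look like the sketch below. The wire layout (a little-endian header holding the delay, three posture angles, and the image length, followed by the raw image bytes) and the function names are purely hypothetical; the text does not define a message format.

```python
import struct

# Hypothetical header: delay (double), three posture angles (doubles),
# image length in bytes (unsigned 32-bit), little-endian without padding.
_HEADER = struct.Struct("<d3dI")


def pack_frame(delay, posture, image_bytes):
    """Serialize one HMD-to-controller message."""
    return _HEADER.pack(delay, *posture, len(image_bytes)) + image_bytes


def unpack_frame(payload):
    """Parse a message produced by pack_frame."""
    delay, a, b, c, size = _HEADER.unpack_from(payload)
    image = payload[_HEADER.size:_HEADER.size + size]
    if len(image) != size:
        raise ValueError("truncated frame")
    return delay, (a, b, c), image
```

Sending the three items in one message keeps the delay and posture values tied to the exact frame they describe, whichever transport is chosen.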

（その他の実施例）
図１１は、上記の実施形態の情報処理装置を実現するためのハードウェアを示す模式図である。ＣＰＵ９０１は、ＲＡＭ９０７やＲＯＭ２０２に格納されているコンピュータプログラムやデータを使ってコンピュータ全体の制御を行う。また、ＣＰＵはそれと共に以下の各実施形態で情報処理装置が行うものとして説明する各処理を実行する。ＲＡＭ９０７は、外部記憶装置９０６や記憶媒体ドライブ９０５からロードされたコンピュータプログラムやデータを一時的に記憶する。またＲＡＭ９０７は、外部から受信したデータを一時的に記憶するためのエリアを有する。更に、ＲＡＭ９０７は、ＣＰＵ９０１が各処理を実行する際に用いるワークエリアも有する。即ち、ＲＡＭ９０７は、各種エリアを適宜提供することができる。また、ＲＯＭ９０２には、コンピュータの設定データやブートプログラムなどが格納されている。キーボード９０９、マウス９０８は、操作入力装置の一例としてのものであり、コンピュータのユーザが操作することで、各種の指示をＣＰＵ９０１に対して入力することができる。表示部９０４は、ＣＲＴや液晶画面などにより構成されており、ＣＰＵ９０１による処理結果を画像や文字などで表示することができる。例えば、表示部９０４には、撮像装置４１０によって撮像された現実空間の画像と仮想画像とを合成した合成画像を表示することができる。外部記憶装置９０６は、ハードディスクドライブ装置に代表される大容量情報記憶装置である。外部記憶装置９０６には、ＯＳ（オペレーティングシステム）や、情報処理装置が行う各処理をＣＰＵ９０１に実行させるためのプログラムやデータが格納されている。外部記憶装置９０６に保存されているコンピュータプログラムやデータは、ＣＰＵ９０１による制御に従って適宜ＲＡＭ９０７にロードされる。ＣＰＵ９０１はこのロードされたプログラムやデータを用いて処理を実行することで、情報処理装置が行う各処理を実行することになる。記憶媒体ドライブ９０５は、ＣＤ−ＲＯＭやＤＶＤ−ＲＯＭなどの記憶媒体に記録されたプログラムやデータを読み出したり、係る記憶媒体にコンピュータプログラムやデータを書込んだりする。尚、外部記憶装置９０６に保存されているものとして説明したプログラムやデータの一部若しくは全部をこの記憶媒体に記録しておいても良い。記憶媒体ドライブ９０５が記憶媒体から読み出したコンピュータプログラムやデータは、外部記憶装置９０６やＲＡＭ９０７に対して出力される。Ｉ／Ｆ９０３は、撮像装置４１０を接続するためのアナログビデオポートあるいはＩＥＥＥ１３９４等のデジタル入出力ポートにより構成される。Ｉ／Ｆ９０３を介して受信したデータは、ＲＡＭ９０７や外部記憶装置９０６に入力される。バス９１０は、上述の各構成部をバス信号によって繋げるものである。 (Other embodiments)
FIG. 11 is a schematic diagram showing hardware for realizing the information processing apparatus of the above embodiments. The CPU 901 controls the entire computer using computer programs and data stored in the RAM 907 and the ROM 902. The CPU 901 also executes each process described in the embodiments as being performed by the information processing apparatus. The RAM 907 temporarily stores computer programs and data loaded from the external storage device 906 or the storage medium drive 905. The RAM 907 also has an area for temporarily storing data received from outside, and a work area used when the CPU 901 executes each process. That is, the RAM 907 can provide various areas as needed. The ROM 902 stores computer setting data, a boot program, and the like. The keyboard 909 and the mouse 908 are examples of operation input devices; by operating them, the user of the computer can input various instructions to the CPU 901. The display unit 904 is composed of a CRT, a liquid crystal screen, or the like, and can display the processing results of the CPU 901 as images or text. For example, the display unit 904 can display a composite image obtained by combining the image of the physical space captured by the imaging device 410 with a virtual image. The external storage device 906 is a large-capacity information storage device typified by a hard disk drive. It stores an OS (operating system) as well as the programs and data that cause the CPU 901 to execute each process performed by the information processing apparatus. The computer programs and data stored in the external storage device 906 are loaded into the RAM 907 as needed under the control of the CPU 901. By processing with the loaded programs and data, the CPU 901 carries out each process performed by the information processing apparatus.
The storage medium drive 905 reads programs and data recorded on a storage medium such as a CD-ROM or DVD-ROM, and writes computer programs and data to such a storage medium. Some or all of the programs and data described as being stored in the external storage device 906 may instead be recorded on this storage medium. The computer programs and data read from the storage medium by the storage medium drive 905 are output to the external storage device 906 or the RAM 907. The I/F 903 is composed of an analog video port for connecting the imaging device 410 or a digital input/output port such as IEEE 1394. Data received via the I/F 903 is input to the RAM 907 or the external storage device 906. The bus 910 connects the above-described components by bus signals.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

400 Information processing system
100 Information processing apparatus
200 Head-mounted display

Claims

1. An information processing apparatus that supplies a second image to an image display device including an image capturing unit that acquires a first image by capturing a physical space and a display unit that displays the second image generated using the first image, the information processing apparatus comprising:
a first generation unit configured to generate a background image in which a region of a specific object included in the first image is complemented using image features obtained from the periphery of the region in the first image;
a second generation unit configured to generate the second image by compositing an image depicting the object at a position in the background image, the position being based on a result of estimating the position of the specific object included in the first image after a predetermined time; and
a display position control unit configured to shift the position at which the second image supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time in the processing from the image capture to the display, and on the amount of change in the position or orientation of the image display device.

2. The information processing apparatus according to claim 1, wherein the predetermined time is the difference between a first time at which the image capturing unit captures an image and a second time at which the position or orientation of the image display device is measured.

3. The information processing apparatus according to claim 1, wherein the first generation unit generates the region having a predetermined image feature from the first image.

4. The information processing apparatus according to claim 3, wherein the first generation unit generates the region having an image feature of a hand from the first image.

5. The information processing apparatus, wherein the object is a hand, and
the first generation unit generates the background image by complementing a skin-color region in the first image with the color of a background region included in the first image.

6. The information processing apparatus according to any one of claims 1 to 5, wherein the second generation unit generates the second image by compositing an image, obtained by extracting the region of the specific object generated by the first generation unit, at the position in the background image based on the estimation result.

7. The information processing apparatus according to claim 1, wherein the image display device is a head-mounted display.

8. An information processing apparatus that supplies a second image to an image display device including an image capturing unit that acquires a first image by capturing a physical space and a display unit that displays the second image generated using the first image, the information processing apparatus comprising:
a first generation unit configured to generate a background image by complementing a region of a specific object included in the first image using image features obtained from the periphery of the region in the first image, and by shifting the first image based on a delay time caused by at least part of the processing time in the processing from the image capture to the display and on the amount of change in the position or orientation of the image display device; and
a second generation unit configured to generate the second image by combining the background image generated by the first generation unit with an image depicting the object at a position in the background image, the position being based on a result of estimating the position of the specific object included in the first image after a predetermined time.

9. A program for causing a computer to function as each unit of the information processing apparatus according to claim 1.

10. An information processing method for supplying a second image to an image display device including an image capturing unit that acquires a first image by capturing a physical space and a display unit that displays the second image generated using the first image, the information processing method comprising:
a first generation step of generating a background image in which a region of a specific object included in the first image is complemented using image features obtained from the periphery of the region in the first image;
a second generation step of generating the second image by compositing an image depicting the object at a position in the background image, the position being based on a result of estimating the position of the specific object included in the first image after a predetermined time; and
a display position control step of shifting the position at which the second image supplied to the display unit is displayed, based on a delay time caused by at least part of the processing time in the processing from the image capture to the display, and on the amount of change in the position or orientation of the image display device.
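
For illustration only, the three steps recited in the method claim (background complementing, compositing the object at its estimated position, and shifting the display position) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the nearest-pixel fill, the linear extrapolation of the object position, the conversion from angular velocity to pixels, and all function and parameter names are hypothetical choices made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale pixels, row-major
Mask = List[List[bool]]   # True where the specific object (e.g. a hand) is


def complement_background(image: Image, mask: Mask) -> Image:
    """Fill object pixels from the nearest non-object pixel on the same row
    (assumed stand-in for complementing with peripheral image features)."""
    out = [row[:] for row in image]
    for y, row in enumerate(image):
        free = [x for x in range(len(row)) if not mask[y][x]]
        for x in range(len(row)):
            if mask[y][x] and free:
                nearest = min(free, key=lambda fx: abs(fx - x))
                out[y][x] = row[nearest]
    return out


def predict_offset(prev_pos: Tuple[float, float], curr_pos: Tuple[float, float],
                   frame_interval: float, delay: float) -> Tuple[int, int]:
    """Assumed linear extrapolation of the object motion over `delay` seconds."""
    vx = (curr_pos[0] - prev_pos[0]) / frame_interval
    vy = (curr_pos[1] - prev_pos[1]) / frame_interval
    return round(vx * delay), round(vy * delay)


def composite(background: Image, image: Image, mask: Mask,
              offset: Tuple[int, int]) -> Image:
    """Draw the extracted object onto the background at its estimated position."""
    dx, dy = offset
    h, w = len(image), len(image[0])
    out = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx
            if mask[y][x] and 0 <= ty < h and 0 <= tx < w:
                out[ty][tx] = image[y][x]
    return out


def display_shift_pixels(angular_velocity_deg_s: float, delay_s: float,
                         pixels_per_degree: float) -> int:
    """Pose change during the delay, converted to pixels (sign convention assumed)."""
    return round(angular_velocity_deg_s * delay_s * pixels_per_degree)


def shift_display(image: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the whole image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out
```

A typical call order would be `complement_background`, then `predict_offset` and `composite`, then `shift_display` with a shift obtained from `display_shift_pixels`. The `delay` passed to `predict_offset` corresponds to the predetermined time, which claim 2 defines as the difference between the capture time and the time at which the position or orientation is measured.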