JP2004145448A

JP2004145448A - Terminal device, server device, and image processing method

Info

Publication number: JP2004145448A
Application number: JP2002307170A
Authority: JP
Inventors: Norio Mihara; 三原　功雄; Shunichi Numazaki; 沼崎　俊一; Takahiro Harashima; 原島　高広; Kunihisa Kishikawa; 岸川　晋久; Miwako Doi; 土井　美和子
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-10-22
Filing date: 2002-10-22
Publication date: 2004-05-20

Abstract

PROBLEM TO BE SOLVED: To provide a terminal device that can process images (including video and still pictures), for example by compositing an advertising image or another desired image into them, in accordance with the three-dimensional features of the multiple objects in the images, such as their three-dimensional shapes and the three-dimensional relationship between their positions.

SOLUTION: A first image containing a plurality of objects is obtained, together with depth information for each of the regions (each consisting of one or more pixels) that make up the first image. Based on the first image and the depth information, at least the three-dimensional shapes of the objects and the three-dimensional relationship between their positions are extracted as feature information, and the first image is processed according to this feature information. As a result, an advertising image or an image of a desired virtual object can be composited into images (including video and still pictures) based on the three-dimensional features of the objects in them, such as their shapes and the relationship between their positions.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
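【Technical Field of the Invention】
The present invention relates to an image processing method for processing a captured image in a terminal device such as a mobile phone, for example by compositing a desired image into it. In particular, it relates to a terminal device and a communication system in which images generated by this image processing method are exchanged among a plurality of terminal devices, such as mobile phones, for communication.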
【０００２】
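【Description of the Related Art】
In recent years, camera devices capable of capturing moving images (video cameras and the like) have become widespread. This has made it easy to shoot and view familiar subjects, such as one's children growing up or events like school sports days. Cameras are now also commonly built into mobile phones and PDAs, so users can casually shoot video on a street corner and immediately send it to someone as an e-mail attachment. As a result, using video as a means of communication is becoming common (see, for example, Non-Patent Documents 1 and 2). Networks are also rapidly moving to broadband, and real-time video communication such as videophones has already been realized (see, for example, Non-Patent Document 3).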
【０００３】
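As described above, a culture in which ordinary people casually use moving images for communication is taking hold. However, conventional video cameras mostly use whatever the camera captures simply as raw material, and special effects are rarely applied to it. This is because, unlike the human eye, a conventional camera captures only the color information of the subject, in two dimensions. Whereas the human eye can distinguish the subjects in view, their front-to-back order, and their three-dimensional shapes, what a camera captures can be distinguished only by color.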
【０００４】
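For this reason, it has been very difficult to process an image captured by a camera so as to, for example, extract only the person and separate it from the background, obtain the front-to-back order and positional relationship of several people, or detect changes in the shape of an object's surface. There have been attempts to infer such information from color alone. For example, a technique has been disclosed that separates objects from the background in an image and uses only those objects for image communication (see, for example, Patent Document 1). There, parts such as faces are separated, and the face parts of several people in conversation are displayed over a separately prepared background, giving the impression of communicating in cyberspace. However, because this approach relies on color information, which is inherently unrelated to three-dimensional information, it has many limitations: subjects are hard to distinguish precisely, and the actual front-to-back order of objects is not captured. It is therefore difficult to perform reliably.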
【０００５】
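Consequently, in current video-based communication, applying special effects such as image compositing to captured video using information such as subject separation, positional relationships including front-to-back order, and three-dimensional shape has not been considered. Most special effects in current use merely overlay text or decorative frames on the video (see, for example, Non-Patent Document 4).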
【０００６】
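On the other hand, if such person-to-person video communication could be extended to present advertisements within the video, various business opportunities would open up. As described above, however, adding advertisements to conventional images raises the question of how to superimpose them. Simply overlaying an advertisement on a conventional image is often obtrusive. For example, if an advertisement is superimposed on one's own face or the other party's face during a videophone conversation using head-and-shoulders images, it is not only distracting but also gives the user a bad impression of the advertisement, defeating its purpose. For this reason, adding advertisements to such images has not been attempted.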
【０００７】
【Patent Document 1】
Japanese Patent Application Laid-Open No. 2001-188910 (paragraphs [0158] to [0170], FIG. 24)
【０００８】
【Non-Patent Document 1】
J-Phone WWW page
http://www.j-phone.com/movie-shamail/
【０００９】
【Non-Patent Document 2】
NTT DoCoMo WWW page
http://www.nttdocomo.co.jp/p_s/imode/ishot/index.html
【００１０】
【Non-Patent Document 3】
NTT DoCoMo FOMA WWW page
http://foma.nttdocomo.co.jp/
【００１１】
【Non-Patent Document 4】
ATLUS WWW page
http://www.atlus.co.jp/am/printclub/
【００１２】
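【Problems to Be Solved by the Invention】
Thus, conventionally, when a captured image (moving or still) was processed, for example by compositing another desired image into it, the three-dimensional shapes and positional relationships of the multiple subjects in the captured image were ignored, and processing was limited to simple compositing on the two-dimensional image plane.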
【００１３】
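If, however, the multiple subjects in the captured image could be distinguished and information such as their three-dimensional shapes and positional relationships could be used, this three-dimensional information would allow richer special effects: for example, a computer graphics (CG) character flying around a person in the captured video, or changing only the background.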
【００１４】
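Furthermore, if the background and the person can be distinguished, an advertisement image can be composited into the background so that it does not cover the person. New ways of presenting advertisements that exploit the three-dimensional shape of subjects also become possible, such as a CG character carrying an advertisement flying around a person in the captured video.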
【００１５】
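In view of the above problems, an object of the present invention is to provide an image processing method, and an image processing apparatus, communication terminal device, and server device using it, that can process captured images (including moving and still images), for example by compositing advertisement images or other desired images, in accordance with three-dimensional features of the multiple subjects in the images, such as their individual three-dimensional shapes and their three-dimensional positional relationships, which was impossible with the prior art.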
【００１６】
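【Means for Solving the Problems】
(1) The present invention is a terminal device that is one of a plurality of mutually communicable terminal devices, comprising: first acquisition means for acquiring a first image containing a plurality of subjects; second acquisition means for acquiring depth information corresponding to each of a plurality of regions that constitute the first image, each region consisting of one or more pixels; extraction means for extracting, as feature information, at least the three-dimensional shapes of the subjects and the three-dimensional positional relationship among them, based on the first image and the depth information; and generation means for generating a second image by processing the first image based on the feature information extracted by the extraction means. With this configuration, the captured first image (moving or still) can be processed, for example by compositing a desired image, in accordance with the three-dimensional features of the subjects in it, such as their individual three-dimensional shapes and three-dimensional positional relationships.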
【００１７】
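For example, the generation means generates the second image by compositing an image of an object into the first image. In doing so, with the position of the object set along the depth direction of the first image, it composites the object's image based on at least one of the three-dimensional positional relationship between the object and the subjects and the three-dimensional shapes of the subjects.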
【００１８】
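Alternatively, the generation means generates the second image by compositing an image of an object into the first image while controlling the object's motion so that it conforms to the three-dimensional shape of the subject.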
【００１９】
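Alternatively, the extraction means extracts the depth-direction position of a subject in the first image and the background region behind that subject, and the generation means generates the second image by compositing an image of an object into the first image such that the object lies between the subject and the background region.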
【００２０】
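Alternatively, the extraction means extracts a first depth-direction position at which a first subject, one of the subjects in the first image, is located, and a second depth-direction position at which a second subject, another of the subjects located behind the first, is located; the generation means generates the second image by compositing an image of an object into the first image such that the object lies between the first and second positions.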
【００２１】
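Alternatively, the extraction means extracts a third position as the three-dimensional position of a subject in the first image, and the generation means generates the second image by compositing an image of an object into the first image. Based on the motions of the object and the subject, the generation means determines that the object has collided with the subject when the object reaches the third position, controls the object's motion and appearance to correspond to the collision, and composites the object's image into the first image; when it determines that a collision has occurred, it also composites an effect representing the collision into the first image.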
【００２２】
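Alternatively, the extraction means extracts a third position as the three-dimensional position of a subject in the first image, and the generation means generates the second image by compositing an image of an object into the first image, controlling the object's motion and appearance based on the subject's third position and a fourth position, which is the three-dimensional position of the object in the first image.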
【００２３】
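Alternatively, the extraction means extracts from the first image an image of each subject and an image of the background region behind them, and the generation means generates the second image by processing at least one of the background-region image and the subject images.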
【００２４】
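(2) The second acquisition means comprises means for acquiring a third image that corresponds to the first image and whose pixel values each include depth information, and means for extracting from the third image the depth information corresponding to each region of the first image. The extraction means extracts at least the three-dimensional shape of the subjects and their three-dimensional positional relationship from the first image and the depth information corresponding to each of its regions.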
【００２５】
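(3) The second acquisition means comprises: light-emitting means for emitting light toward the subject; first light-receiving means for receiving a first quantity of light that includes the emitted light reflected from the subject; second light-receiving means for receiving, while the light-emitting means is not emitting, a second quantity of light that does not include the reflected light; means for generating a third image whose pixel values each include depth information, by subtracting the second quantity of light from the first to extract the reflected-light component; and means for calculating from the third image the depth information corresponding to each region of the first image. The extraction means extracts at least the three-dimensional shape of the subjects and their three-dimensional positional relationship from the first image and the depth information corresponding to each of its regions.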
【００２６】
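(4) The generation means generates the second image by compositing into the first image an additional image representing an effect, such as an impact, or an object. The device further comprises storage means for storing plural kinds of additional images to be composited into the first image, and means for selecting, from those stored, the additional image to be composited. This lets the user freely choose a desired additional image and add special effects of his or her liking to the first image.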
【００２７】
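(5) The generation means generates the second image by compositing into the first image an additional image representing an effect, such as an impact, or an object. The device further comprises storage means for storing plural kinds of additional images; means for extracting from the first image, as scene features, at least one of the objects appearing in it, its overall color mood, and its composition; and means for selecting, based on the scene features, the additional image to be composited from those stored. This makes it possible to add special effects suited to the objects, color mood, and composition of the first image.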
【００２８】
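(6) The present invention is also the server device in a communication system composed of a plurality of mutually communicable terminal devices and a server device communicably connected to them. The server device comprises: receiving means for receiving, from one of the terminal devices, a first image containing a plurality of subjects and depth information corresponding to each of a plurality of regions (each of one or more pixels) constituting the first image; extraction means for extracting at least the three-dimensional shapes of the subjects and their three-dimensional positional relationship based on the received first image and depth information; generation means for generating a second image by compositing an advertisement image into the first image based on at least one of the extracted three-dimensional shapes and positional relationship; and transmission means for transmitting the second image to another of the terminal devices. With this configuration, an advertisement image can be composited into an image exchanged between terminal devices (the first image, moving or still) in accordance with the three-dimensional features of the subjects in it, such as their individual shapes and positional relationships, so that it does not obstruct the subjects. Advertisement images can therefore be composited efficiently, improving advertising effectiveness.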
【００２９】
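Alternatively, the server device may consist of: receiving means for receiving, from one of the terminal devices, a first image containing at least a plurality of subjects, together with feature information including at least the three-dimensional shapes of the subjects and their three-dimensional positional relationship, extracted from depth information corresponding to each of a plurality of regions (each of one or more pixels) constituting the first image; generation means for generating a second image by compositing an advertisement image into the first image based on the received feature information; and transmission means for transmitting the second image to another of the terminal devices.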
【００３０】
【Embodiments of the Invention】
Embodiments of the present invention will now be described with reference to the drawings.
【００３１】
(First Embodiment)
First, a first embodiment of the present invention will be described.
【００３２】
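<Overall Configuration>
FIG. 1 shows an example configuration of the main part of a terminal device according to the first embodiment, which is one of a plurality of mutually communicable terminal devices.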
【００３３】
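The image communication device shown in FIG. 1 comprises: an image acquisition unit 1 that acquires an image of a subject and depth information for that image; a feature extraction unit 2 that extracts three-dimensional features of the subjects in the image based on that depth information; a processing unit 3 that, based on the extracted features, applies special effects such as image compositing to the acquired image while taking three-dimensional elements into account; an image presentation unit 4 that presents the images produced by the processing unit 3 and images received by the communication unit 5; and a communication unit 5 that transmits and receives images processed by the processing unit 3.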
【００３４】
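Each of a plurality of terminal devices configured as in FIG. 1 has its processing unit 3 process the images it acquires, then sends the resulting images to, and receives such images from, the other terminal devices, enabling communication among their users.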
【００３５】
<Image Acquisition Unit>
First, the image acquisition unit 1 will be described.
【００３６】
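The image acquisition unit 1 acquires, for example, a color image of the subject together with depth information for that image, thereby capturing the subject as a color image that includes depth information reflecting the subject's three-dimensional shape and its distance from the image acquisition unit 1 (here called a depth color image).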
【００３７】
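Although acquisition of a color image is described here as an example, the image may instead be a grayscale (so-called black-and-white) image. In that case, the image acquisition unit 1 captures the subject as a depth grayscale image reflecting its three-dimensional shape and its distance from the image acquisition unit 1.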
【００３８】
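A color image is normally made up of units called pixels, each storing color information, for example in RGB, as shown in FIG. 6(b); the image is formed by arranging these pixels in a two-dimensional array. For example, a VGA (Video Graphics Array) color image is a two-dimensional array of 640 pixels along the x (horizontal) axis and 480 pixels along the y (vertical) axis, each pixel storing the color at its position, for example in RGB format. FIG. 6(b) shows an example of the color information stored in each pixel (e.g., (R, G, B) = (r1, g1, b1)) of a color image 5 pixels wide along x and 8 pixels high along y.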
【００３９】
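In a depth color image, each pixel carries, in addition to color information such as the RGB values above, depth information d at that position (for example, the distance from the image acquisition unit 1 to the subject). Here, the depth d for the pixel is appended to (R, G, B), so each pixel of the depth color image is stored in the form (R, G, B, d).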
【００４０】
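When a grayscale image is used instead of a color image, its pixel values are gray levels. In that case, each pixel of the depth grayscale image stores the gray level plus the depth information d for that pixel.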
【００４１】
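Although each pixel of the depth color image is described here as holding depth information d, this is not a limitation; a region of several pixels may be defined in advance as a unit region, and depth information d may be associated with, or held for, each unit region.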
【００４２】
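Even though depth information is added to the color information in each pixel, a depth color image displayed without using that depth information looks exactly like an ordinary color image. Using the per-pixel depth information, the color image can also be displayed three-dimensionally.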
【００４３】
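FIG. 2 compares an ordinary grayscale image with a depth grayscale image. FIG. 2(a) is an ordinary grayscale image, in which each pixel on a two-dimensional grid stores a gray level. FIG. 2(b) shows an example of a depth grayscale image rendered three-dimensionally using its depth information (viewed from a point below and in front, so that the three-dimensional shape is easy to see). Thus, unlike conventional grayscale and color images, depth grayscale and depth color images contain, as depth information, the distance d from the image acquisition unit 1 to the subject.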
【００４４】
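Ordinary color and grayscale images whose pixels are associated with depth information are also referred to here as color images including depth information and grayscale images including depth information, respectively; the depth color image and the depth grayscale image are specific examples of these.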
【００４５】
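The configuration of the image acquisition unit 1 will now be described in detail with reference to FIG. 3, its functional block diagram. It broadly comprises an image information acquisition unit 101 that acquires, in real time, image information including the subject's color; a depth information acquisition unit 102 that acquires the subject's depth information in real time; an imaging operation control unit 104; a depth color image generation unit 113; and an output unit 114.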
【００４６】
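The image information acquisition unit 101 comprises: a natural-light image capturing unit 103 that captures the subject (for example, an object) under natural light (including illumination) to obtain a natural-light image, i.e., a color image of the object including the background; a natural-light image storage unit 105 that stores the captured natural-light image; a natural-light image adjustment unit 106 that reads the natural-light image from the storage unit 105 and adjusts brightness, contrast, and the like as needed; and an image information output unit 107 that outputs the adjusted natural-light image to the processing unit 3 in a data format suited to processing there. The adjusted natural-light image is also output to the depth color image generation unit 113.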
【００４７】
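The natural-light image capturing unit 103 is, for example, a CCD or CMOS image sensor, and acquires a flat color image of the subject (for example, an object), including the background, in real time. This captures color, one of the object's attributes. Because color images are captured continuously, the object's motion can be determined from changes between successive frames, and the object's image can be captured as video.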
【００４８】
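The imaging operation control unit 104 generates control signals that control the natural-light image capturing unit 103 and the reflected-light image capturing unit 108.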
【００４９】
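The depth information acquisition unit 102, in turn, comprises: a reflected-light image capturing unit 108 that, in accordance with control signals from the imaging operation control unit 104, captures a reflected-light image of subjects such as the object and things around it; a reflected-light image storage unit 109 that stores the captured reflected-light image; a reflected-light image correction unit 110 that reads the reflected-light image from the storage unit 109 and applies various corrections; a parameter storage unit 111 that stores the parameters used for these corrections; and a depth information computation unit 112 that analyzes the corrected reflected-light image and computes the object's depth information, which is output to the depth color image generation unit 113.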
【００５０】
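The reflected-light image capturing unit 108 will now be described in detail with reference to FIG. 4, its functional block diagram. It comprises a light-emitting unit 114, a light-receiving optical system 115, a reflected-light extraction unit 116, and a timing control unit 117.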
【００５１】
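The light-emitting unit 114 generates light whose intensity varies over time according to a timing signal generated by the timing control unit 117. This light illuminates whatever is in front of the light-emitting unit 114, in FIG. 4 the user's head (an example of an object). The light used here is near-infrared, but light in other wavelength ranges, such as visible light, can also be used.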
【００５２】
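Reflected light, for example from the user's head, is collected by the light-receiving optical system 115, which consists of lenses and the like, and forms an image on the light-receiving surface of the reflected-light extraction unit 116. The optical system 115 has a filter that passes near-infrared light and blocks ambient light at other wavelengths, such as visible and far-infrared light.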
【００５３】
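The reflected-light extraction unit 116 extracts the spatial intensity distribution of the reflected light forming this image. This distribution can be treated as an image formed by the reflected light: the reflected-light image (silhouette image). To achieve this, the reflected-light extraction unit 116 comprises a first light-receiving unit 119, a second light-receiving unit 120, and a difference computation unit 121. The two light-receiving units receive light at different times: the timing control unit 117 makes the light-emitting unit 114 emit while the first light-receiving unit 119 is receiving and not emit while the second light-receiving unit 120 is receiving. The first light-receiving unit 119 therefore receives the emitted light reflected by the object plus other natural light (ambient light such as sunlight and room lighting), while the second light-receiving unit 120 receives only natural light. Because the two reception times differ but are close together, any change in natural light between them can be ignored.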
【００５４】
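Therefore, taking the difference between the images received by the first light-receiving unit 119 and the second light-receiving unit 120 extracts only the component of the emitted light reflected by the object, producing the reflected-light image. The difference computation unit 121 computes and outputs this difference.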
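The subtraction performed by the difference computation unit 121 can be sketched in Python as follows. This is an illustration only: frames are assumed to be nested lists of integer intensities captured with the emitter on (first light-receiving unit 119) and off (second light-receiving unit 120), and clamping negative differences to zero is an added assumption to absorb sensor noise.

```python
def reflected_light_image(lit_frame, unlit_frame):
    """Return lit_frame - unlit_frame pixel by pixel, clamped at zero.

    lit_frame:   emitter on  (reflected light + ambient light)
    unlit_frame: emitter off (ambient light only)
    """
    if len(lit_frame) != len(unlit_frame):
        raise ValueError("frames must have the same height")
    result = []
    for lit_row, unlit_row in zip(lit_frame, unlit_frame):
        if len(lit_row) != len(unlit_row):
            raise ValueError("frames must have the same width")
        # Ambient light is present in both frames and cancels out; only the
        # emitted light reflected by the subject remains.
        result.append([max(a - b, 0) for a, b in zip(lit_row, unlit_row)])
    return result


lit = [[120, 200], [90, 40]]
unlit = [[100, 60], [85, 45]]
print(reflected_light_image(lit, unlit))  # [[20, 140], [5, 0]]
```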
【００５５】
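The light reflected from an object decreases sharply as the distance between the object and the reflected-light image capturing unit 108 increases. If the object's surface scatters light uniformly, the amount of light received per pixel of the reflected-light image falls in inverse proportion to the square of the distance to the object. That is, denoting by Q(i, j) the value of the pixel at coordinates (i, j) in the reflected-light image and by d the distance to the object at that pixel,
$$Q(i, j) = \frac{K}{d^{2}} \qquad (1)$$
where K is a coefficient adjusted so that, for example, Q(i, j) = 255 when d = 0.5 m. The distance is obtained by solving equation (1) for d.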
【００５６】
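Thus each pixel of the reflected-light image contains, as depth information, a reflected-light intensity that can be converted into the distance from the reflected-light image capturing unit 108 to the subject.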
【００５７】
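Because the reflected intensity can be converted into the distance from the reflected-light image capturing unit 108 to the object, the object's three-dimensional shape can be determined. Moreover, light reflected from the background is small enough to be almost negligible, so the reflected-light image shows the object and nearby subjects with the background cut out.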
【００５８】
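FIG. 6(a) shows an example of a reflected-light image. For simplicity, it shows a 5 × 8 pixel portion of a 256 × 256 pixel reflected-light image. Each pixel value is the intensity of the reflected light.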
【００５９】
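FIG. 5 shows a three-dimensional rendering of a human hand, used as the subject, obtained from the depth information in a reflected-light image captured by the reflected-light image capturing unit 108.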
【００６０】
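A reflected-light image can be obtained, for example, by an apparatus with the following components (1) to (3):
(1) timing signal generation means for generating a temporally constant or time-varying pulse signal or modulated signal;
(2) light-emitting means for emitting light whose intensity varies according to the signal from the timing signal generation means; and
(3) reflected-light extraction means, formed by an array of detectors that separate the emitted light reflected by the object from ambient (natural) light in synchronization with the signal from the timing signal generation means, for detecting the reflected-light image of the object.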
【００６１】
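These components of the reflected-light image capturing unit 108 are described in further detail in Japanese Patent Application Laid-Open No. H10-177449, filed by the same applicant.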
【００６２】
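Because the reflected-light image capturing unit 108 captures reflected-light images of the subjects continuously, the object's motion can be determined from changes between successive frames, and the object's image can be captured as video.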
【００６３】
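As described later, the depth color image is generated by combining the natural-light image (a flat color image of the object including the background) with depth information computed from the object's reflected-light image. Because the pixels of the two images must correspond, the reflected-light image capturing unit 108 and the natural-light image capturing unit 103 are preferably placed close together. For the same reason, the natural-light image capturing unit 103, the reflected-light image capturing unit 108, and, if needed, the imaging operation control unit 104 are preferably integrated on a single chip.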
【００６４】
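The natural-light image and the reflected-light image must also be put in correspondence, for example pixel to pixel or region to region (each region consisting of several pixels). Suppose the natural-light image (color image) acquired by the image information acquisition unit 101 is as shown in FIG. 6(b), and the reflected-light image acquired by the depth information acquisition unit 102 is as shown in FIG. 6(a). Their pixels then correspond one to one. Denoting each pixel in FIGS. 6(a) and 6(b) by its coordinates (i, j) (i = 1 to 5, j = 1 to 8), pixel P2(i, j) of the reflected-light image in FIG. 6(a) corresponds to pixel P1(i, j) of the natural-light image in FIG. 6(b); for example, P2(5, 8) corresponds to P1(5, 8). Accordingly, the value of pixel P3(i, j) of the depth color image contains the color information of P1(i, j) and the distance (depth information) d calculated by the depth information computation unit 112 from P2(i, j).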
【００６５】
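Although pixel-to-pixel correspondence is shown here, other arrangements are possible when the two images differ in size or resolution. The natural-light image may be divided into unit regions of several pixels, each associated with one pixel of the reflected-light image; conversely, the reflected-light image may be divided into unit regions, each associated with one pixel of the natural-light image; or both images may be divided into unit regions and correspondence established between their unit regions.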
【００６６】
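The imaging operation control unit 104 outputs control signals to the natural-light image capturing unit 103 and the reflected-light image capturing unit 108 so that, for example when the user issues a capture instruction, the natural-light image and the reflected-light image are acquired almost simultaneously.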
【００６７】
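Next, the correction performed by the reflected-light image correction unit 110 in FIG. 3 will be described. The correction takes into account the object's color, reflection characteristics, and the like. The intensity of light reflected from an object depends on factors other than its distance from the reflected-light image capturing unit 108, so a distance (and hence a three-dimensional shape) computed naively from the reflected-light image may be inaccurate. For example, a black surface reflects less light, and a surface with a large specular component produces strong reflections where its normal points nearly toward the light source.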
【００６８】
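Therefore, before distance information is derived from the reflected intensity, the reflected-light image correction unit 110 corrects the object's reflected-light image. It refers to parameters, stored in advance in the parameter storage unit 111, concerning surface color and reflection characteristics, and to the natural-light image stored in the natural-light image storage unit 105, using the values of the natural-light pixels or unit regions that correspond to the pixels or unit regions of the reflected-light image. For example, if the natural-light pixel corresponding to a given reflected-light pixel is "black", the reflected-light pixel value is corrected using the parameter for light reflected from a black subject.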
【００６９】
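Because each pixel value of the reflected-light image is corrected in this way, the accuracy of the depth information calculated downstream by the depth information computation unit 112 improves.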
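A hypothetical sketch of such a correction follows. It assumes (the source does not say this) that a darker surface scales the received intensity by a relative reflectance factor, so dividing by that factor restores the intensity a reference surface would give at the same distance. The two-entry table stands in for the parameters in parameter storage unit 111; its values and the color classifier are invented for illustration.

```python
# Hypothetical relative reflectances standing in for parameter storage
# unit 111; the values are placeholders, not measured data.
REFLECTANCE = {"black": 0.25, "other": 1.0}


def color_class(rgb):
    """Very coarse classification of a natural-light pixel (assumption)."""
    return "black" if max(rgb) < 50 else "other"


def correct_reflected_image(reflected, natural):
    """Divide each reflected intensity by the reflectance of the color of
    the corresponding natural-light pixel (1:1 pixel correspondence)."""
    return [
        [q / REFLECTANCE[color_class(rgb)] for q, rgb in zip(q_row, rgb_row)]
        for q_row, rgb_row in zip(reflected, natural)
    ]


reflected = [[40, 160]]                      # same distance, dark vs. light
natural = [[(10, 12, 8), (200, 180, 170)]]
print(correct_reflected_image(reflected, natural))  # [[160.0, 160.0]]
```

Under this assumed model, a dark and a light surface at the same distance end up with the same corrected value, which is what equation (1) needs.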
【００７０】
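For each pixel of the reflected-light image corrected by the reflected-light image correction unit 110, the depth information computation unit 112 obtains the corresponding distance d from, for example, the following equation (2), where Q is the pixel value, i.e., the amount of light received (reflected intensity).
【００７１】
$$d = \left(\frac{K}{Q}\right)^{1/2} \qquad (2)$$
That is, as described above, the intensity of light reflected from an object can be converted into the distance d from the reflected-light image capturing unit 108 to the object, and this distance d is taken as the depth information.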
【００７２】
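Although the distance d is obtained here for each pixel of the reflected-light image, it may instead be calculated for each unit region, for example from a representative pixel of the region or from the average of the pixel values in it.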
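The unit-region variant can be sketched as follows, using the average received intensity of each block as its representative value before applying equation (2). The square block size and the handling of edge blocks smaller than a full block are assumptions for illustration.

```python
import math


def block_depths(reflected, block, k):
    """Depth per block x block unit region, computed with equation (2) from
    the mean received intensity of the region (edge blocks may be smaller)."""
    h, w = len(reflected), len(reflected[0])
    out = []
    for top in range(0, h, block):
        row = []
        for left in range(0, w, block):
            vals = [reflected[y][x]
                    for y in range(top, min(top + block, h))
                    for x in range(left, min(left + block, w))]
            mean_q = sum(vals) / len(vals)
            row.append(math.sqrt(k / mean_q) if mean_q > 0 else None)
        out.append(row)
    return out


image = [[255, 255, 0, 0],
         [255, 255, 0, 0]]
print(block_depths(image, 2, 63.75))  # [[0.5, None]]
```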
【００７３】
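The depth color image generation unit 113 generates the depth color image from the natural-light image output by the natural-light image adjustment unit 106 and the depth information for each pixel (or unit region) of the reflected-light image calculated by the depth information computation unit 112.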
【００７４】
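That is, each pixel value of the depth color image is generated by appending, to the value of each pixel of the natural-light image (for example, a color image like that of FIG. 6(b)), the depth information calculated from the corresponding reflected-light pixel. For example, since pixel P2(5, 8) of the reflected-light image in FIG. 6(a) corresponds to pixel P1(5, 8) of the natural-light image in FIG. 6(b), the value of pixel P3(5, 8) of the depth color image is generated by appending to the value of P1(5, 8) the distance (depth information) d calculated by the depth information computation unit 112 from P2(5, 8).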
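A minimal sketch of this per-pixel merge, assuming the one-to-one pixel correspondence of FIG. 6 and images stored as lists of rows. The helper `pixel()` maps the 1-based coordinates P(i, j) used in the text (i along x, j along y) to list indices; both function names are hypothetical.

```python
def make_depth_color_image(natural, depth):
    """Append each pixel's depth d to its (R, G, B) tuple -> (R, G, B, d).

    Assumes the 1:1 pixel correspondence of FIG. 6; images are lists of rows.
    """
    if len(natural) != len(depth) or any(
            len(a) != len(b) for a, b in zip(natural, depth)):
        raise ValueError("natural-light and depth images must be aligned 1:1")
    return [[(*rgb, d) for rgb, d in zip(rgb_row, d_row)]
            for rgb_row, d_row in zip(natural, depth)]


def pixel(image, i, j):
    """P(i, j) with 1-based i along x and j along y, as in FIG. 6."""
    return image[j - 1][i - 1]


natural = [[(255, 0, 0), (0, 255, 0)],
           [(0, 0, 255), (10, 10, 10)]]
depth = [[0.5, 0.6],
         [0.7, 2.0]]
p3 = make_depth_color_image(natural, depth)
print(pixel(p3, 2, 1))  # (0, 255, 0, 0.6)
```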
【００７５】
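When the natural-light image has been divided into unit regions of several pixels, each associated with a reflected-light pixel, each pixel value of the depth color image is generated by associating with each unit region of the natural-light image the depth d calculated from the corresponding reflected-light pixel (for example, by appending that d to the value of every pixel in the unit region).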
【００７６】
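Conversely, when the reflected-light image has been divided into unit regions, each associated with a natural-light pixel, each pixel value of the depth color image is generated by appending to the value of each natural-light pixel the depth calculated from the corresponding unit region of the reflected-light image.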
【００７７】
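When both images have been divided into unit regions with correspondence established between them, each pixel value of the depth color image is generated by associating with each unit region of the natural-light image the depth d calculated from the corresponding unit region of the reflected-light image (for example, by appending that d to the value of every pixel in the unit region).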
【００７８】
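The depth color image generated in this way is sent to the output unit 114, which converts it as needed to match the destination's protocol and data format, and outputs it to the feature extraction unit 2.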
【００７９】
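The configuration of the image acquisition unit 1 described above is merely an example. In particular, depth information need not be acquired from a reflected-light image. It may instead be obtained by stereo matching, which computes depth from the disparity between images captured from multiple viewpoints, or by a so-called laser range finder, which projects striped laser light and measures depth from the distortion of the stripes. Any other method that yields depth information, and thus a depth color image as described above, may also be used.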
【００８０】
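Although the image acquisition unit 1 described above generates, in the depth color image generation unit 113, an image whose pixels each contain depth information (the depth color image), it may instead simply associate each natural-light pixel with its depth information and hold or store that correspondence.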
【００８１】
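<Feature Extraction Unit>
Next, the feature extraction unit 2 will be described. Its input here is the depth color image obtained by the image acquisition unit 1, whose pixel values include depth information.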
【００８２】
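The feature extraction unit 2 extracts three-dimensional features of the subjects based on the depth information contained in the depth color image acquired by the image acquisition unit 1.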
【００８３】
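The three-dimensional features analyzed by the feature extraction unit 2 will be described concretely with reference to the depth color image of FIG. 7. FIG. 7 shows the depth color image displayed without its depth information, so it looks like an ordinary color image (in FIG. 7, a grayscale image). In the center of the scene is the upper body of a person sitting on a chair, with the right arm raised and the index finger pointing up. A can of juice stands in front of the person.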
【００８４】
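FIG. 8 schematically shows the positional relationship of the objects in this scene. The can is closest to the image acquisition unit 1 (see FIG. 12), the person is behind it (see FIG. 10), and the background is farther still (see FIG. 9). Looking more closely at the depth within the person, the right arm is closer to the image acquisition unit 1 than the torso (see FIG. 11). In other words, differences in depth allow the scene to be divided into several regions, as shown in FIGS. 9 to 12.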
【００８５】
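By analyzing the depth information in the depth color image in this way, features can be obtained such as which subjects (for example, objects) are present in the scene (i.e., into which regions it can be divided), the relief of the scene in the depth direction, the three-dimensional positional relationship of the objects in the scene (each part of an object also being treated as an object), and the three-dimensional shape of each object.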
【００８６】
Feature analysis techniques will now be described.
【００８７】
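The simplest analysis obtains only the three-dimensional shape of the scene without identifying the imaged objects. For the scene of FIG. 7, for example, the overall relief in the depth direction is extracted: the lower center of the screen (actually the can) has the nearest depth value, the lower left (actually the person's right arm) is next, most of the center of the screen (actually the person) comes after that, and the rest (actually the background) is very far away.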
【００８８】
If necessary, the feature extraction unit 2 can also perform more complex analyses.
【００８９】
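FIG. 13 schematically shows the depth information in the depth color image of FIG. 7: the scene viewed from above, with only the portions for which depth information exists drawn as solid lines. A depth color image provides depth only for surfaces visible from the capture direction; there is no data for hidden parts, in this example the back sides of the can and the person and the area behind the person's right arm. The three-dimensional positional relationship of the subjects is obtained from such depth information.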
【００９０】
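The simplest method is to separate foreground from background by the depth d from the image acquisition unit 1 alone. A threshold TH is set; parts nearer than TH are regarded as foreground and farther parts as background (see FIG. 14). This separates what appears in the scene from the background, making it possible to determine whether any object (i.e., foreground) is present in the captured scene.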
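Thresholding the depth image against TH can be sketched as below. The function names are illustrative, and treating pixels with no depth value (no reflected light) as background is an assumption consistent with the observation above that background reflection is negligible.

```python
def foreground_mask(depth, th):
    """True where depth is nearer than threshold TH (foreground).

    Pixels without a depth value (None) are treated as background.
    """
    return [[d is not None and d < th for d in row] for row in depth]


def has_foreground(mask):
    """Whether any object (foreground pixel) is present in the scene."""
    return any(any(row) for row in mask)


depth = [[0.6, 3.0],
         [None, 0.9]]
mask = foreground_mask(depth, th=1.5)
print(mask)                  # [[True, False], [False, True]]
print(has_foreground(mask))  # True
```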
【００９１】
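Several thresholds may also be used. For example, with thresholds x1, x2, and x3, the scene in the depth color image can be divided into four regions: nearer than x1, between x1 and x2, between x2 and x3, and beyond x3. For each region, whether an object is present, and what it is, can then be identified.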
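The four-way split by thresholds x1 < x2 < x3 amounts to counting how many thresholds each depth value has passed. The sketch below does this with Python's standard `bisect` module; assigning a value exactly equal to a threshold to the farther layer, and unknown depths to the farthest layer, are assumptions the text does not specify.

```python
import bisect


def depth_layer(d, thresholds):
    """Layer index for depth d given ascending thresholds [x1, x2, x3]:
    0 = nearer than x1, 1 = x1..x2, 2 = x2..x3, 3 = beyond x3.
    A value equal to a threshold goes to the farther layer; None -> farthest.
    """
    if d is None:
        return len(thresholds)
    return bisect.bisect_right(thresholds, d)


def layer_image(depth, thresholds):
    """Label every pixel of a depth image with its layer index."""
    return [[depth_layer(d, thresholds) for d in row] for row in depth]


print(layer_image([[0.5, 1.0, 1.5, 5.0, None]], [0.8, 1.2, 2.0]))
# [[0, 1, 2, 3, 3]]
```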
【００９２】
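Furthermore, by detecting depth discontinuities called jump edges (the dotted lines in FIG. 15), finer positional relationships within the scene can be recognized, such as the "background", the "person", the "person's right arm", and the "can", as shown in FIG. 15.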
【００９３】
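Various methods can be used to detect jump edges. For example, they can be found by applying edge-detection filtering (typically convolution with the Sobel operator) to an image containing only the depth component of the depth color image (data in which only depth values are arranged in a two-dimensional array, like a reflected-light image; hereinafter called a depth image).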
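Jump-edge detection with the Sobel operator can be sketched as follows. It computes the horizontal and vertical Sobel responses of the depth image and marks pixels whose gradient magnitude exceeds a threshold; the threshold value and leaving border pixels unmarked are illustrative choices, not values from the source.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def jump_edges(depth, threshold):
    """Mark interior pixels whose Sobel gradient magnitude in depth exceeds
    threshold; border pixels are left unmarked."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = depth[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            edges[y][x] = math.hypot(gx, gy) > threshold
    return edges


# A near surface (depth 1) meets a far one (depth 3) between columns 2 and 3.
depth = [[1, 1, 1, 3, 3],
         [1, 1, 1, 3, 3],
         [1, 1, 1, 3, 3]]
print(jump_edges(depth, 4.0)[1])  # [False, False, True, True, False]
```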
【００９４】
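Furthermore, pattern matching, which registers object features in advance and searches the image for regions similar to them, can be used to recognize what each of the objects found above is. If, for example, the "can", the "person", and that person's "right hand" can be recognized in the scene of FIG. 7 together with their positional relationships, more complex recognition becomes possible: there is a "can" at a certain position, behind it is a "person", and that person is raising the "right hand".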
【００９５】
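The analysis methods above for extracting three-dimensional features from a depth color image are merely examples; various other analysis, image processing, and image recognition techniques can be combined for the same purpose.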
【００９６】
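The feature extraction unit 2 extracts three-dimensional features of the natural-light image from the depth color image (the natural-light image plus the depth information obtained from each pixel value of the reflected-light image). These features include, for example, the three-dimensional shape of each subject (including surface relief) and the positional relationships among subjects, both within the image plane and in the depth direction (mainly front-to-back order). From these, the foreground containing subjects can be separated from the background according to each subject's depth position (using predetermined thresholds), the scene can be segmented more finely, and what each subject is can be recognized by pattern matching or similar techniques.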
【００９７】
<Processing Unit>
Next, the processing unit 3 will be described.
【００９８】
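Based on the three-dimensional features extracted by the feature extraction unit 2, the processing unit 3 applies (adds) special effects to the color image acquired by the image acquisition unit 1 as a natural-light image (or to the depth color image), taking into account three-dimensional features of its subjects such as their shapes and positional relationships.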
【００９９】
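Specifically, special effects are added by compositing an image rendered with CG (computer graphics), here also called an additional image, into the color image. The three-dimensional features extracted by the feature extraction unit 2, such as the relief of the scene, which objects are present (into which regions the image can be divided), their positional relationships, and each object's three-dimensional shape, are used to determine front-to-back relationships and collisions between the virtual object and the scene; the virtual object is deformed as needed and composited into the color image.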
【０１００】
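As an example, consider compositing a virtual object, a "ball" given as 3D (three-dimensional) CG data, into the color image of FIG. 7. FIG. 16 shows the positional relationship, mainly in the depth direction, of the objects in that image; as described above, such three-dimensional features of the scene are obtained from the feature extraction unit 2.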
【０１０１】
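Now consider an effect in which the "ball" moves from right to left across the screen at depth position C in FIG. 16. Since the three-dimensional position and shape of the virtual "ball" are known, comparing them with the three-dimensional features of the scene (the depth value at each scene position) determines whether the "ball" is in front of or behind each object in the scene.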
【０１０２】
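Thus, as shown in FIG. 17, the "ball" can be composited so that it passes in front of the background (depth position D in FIG. 16) but behind the person (depth position B). In this way, a virtual object can be added to the color image three-dimensionally, based on the features extracted by the feature extraction unit 2.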
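The front-to-back decision reduces to a per-pixel depth test, in the manner of a z-buffer. In the sketch below (an illustration under simplifying assumptions), the virtual object is given as a list of (x, y, value) pixels all at a single depth z, and pixel values are string labels for readability; each sprite pixel is drawn only where it is nearer than the scene, so nearer subjects hide it. Pixels with no scene depth are treated as infinitely far.

```python
def composite_virtual_object(color, depth, sprite, z):
    """Draw sprite pixels (x, y, value) at depth z only where they are nearer
    than the scene, so that nearer subjects occlude the virtual object."""
    out = [row[:] for row in color]
    for x, y, value in sprite:
        scene_d = depth[y][x]
        if scene_d is None or z < scene_d:
            out[y][x] = value
    return out


# One-row scene: background (depth 5.0), person (1.0), background (5.0).
color = [["bg", "person", "bg"]]
depth = [[5.0, 1.0, 5.0]]
ball = [(0, 0, "ball"), (1, 0, "ball"), (2, 0, "ball")]
print(composite_virtual_object(color, depth, ball, z=2.0))
# [['ball', 'person', 'ball']]   behind the person, in front of the background
print(composite_virtual_object(color, depth, ball, z=0.5))
# [['ball', 'ball', 'ball']]
```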
【０１０３】
Similarly, if the "ball" is composited so that it moves from right to left at depth position A, the effect is that it passes behind the can and in front of the person.
【０１０４】
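Going further, collisions between a virtual object and objects in the color image can also be determined. As FIG. 16 makes clear, the position of each object in the color image can be specified three-dimensionally, from its position in the image plane and in the depth direction. More precise special effects can therefore be applied based on each object's three-dimensional position. One of these is a "collision" effect.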
【０１０５】
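Consider an effect in which the virtual "ball" moves in from the right of the screen at depth position B in FIG. 16. The features extracted by the feature extraction unit 2 show that the person is at depth position B, so the "ball" will collide with the person on reaching the person's position. An effect can therefore be added in which the "ball" hits the person and bounces off, as shown in FIG. 18. The rebound direction can also be calculated from the object's three-dimensional shape. The effect is even stronger if, at the moment of collision, a "star", one way of depicting a collision, is drawn at the point of impact on the object (the person's cheek in FIG. 18).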
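One possible way to implement the collision and rebound is sketched below; the specification gives no formulas, so this is not its method. The ball collides when its far side reaches the scene depth at its pixel, and its velocity is then reflected about the surface normal estimated from depth differences, v' = v − 2(v·n)n. Treating x, y, and depth in the same units is a simplifying assumption; a real implementation would convert pixels to metric coordinates using the camera geometry. The returned flag marks where a "star" effect could be triggered.

```python
import math


def surface_normal(depth, x, y):
    """Unit normal of the depth surface z = depth[y][x] at an interior pixel,
    pointing toward the camera, estimated with central differences."""
    ddx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
    ddy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
    nx, ny, nz = ddx, ddy, -1.0
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)


def step_ball(pos, vel, depth, radius):
    """Advance the ball by one time step. If its far side reaches the scene
    surface, reflect the velocity: v' = v - 2 (v . n) n.
    Returns (new_pos, new_vel, collided)."""
    x, y, z = (p + v for p, v in zip(pos, vel))
    px, py = int(round(x)), int(round(y))
    scene_d = depth[py][px]
    if scene_d is not None and z + radius >= scene_d:
        n = surface_normal(depth, px, py)
        dot = sum(v * c for v, c in zip(vel, n))
        vel = tuple(v - 2.0 * dot * c for v, c in zip(vel, n))
        return (x, y, z), vel, True
    return (x, y, z), tuple(vel), False


# Person occupies columns 0-1 at depth 2.0; background at 10.0.
depth = [[2.0, 2.0, 10.0, 10.0] for _ in range(3)]
pos, vel = (3, 1, 2.0), (-1, 0, 0)                      # enters from the right
pos, vel, hit = step_ball(pos, vel, depth, radius=0.5)  # no contact yet
pos, vel, hit = step_ball(pos, vel, depth, radius=0.5)  # reaches the person
print(hit, [round(v, 3) for v in vel])                  # True [0.882, 0.0, -0.471]
```

The ball hits the person's right-hand edge, where the depth jumps, so it bounces back to the right and slightly toward the camera.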
【０１０６】
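The three-dimensional position of each object also enables effects in which a virtual object moves in three dimensions. FIG. 19 shows an example in which two virtual "balls" travel from the front of the color image into the depth of the scene. The features extracted by the feature extraction unit 2 show that the can is in front and the person is behind it. Combining this information with the motion of the "balls" produces an effect in which one "ball" hits the nearby can and bounces back while the other hits the more distant person and bounces back, as shown in FIG. 19.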
【０１０７】
Effects can also be added in which a virtual "ball" circles the person's head, as shown in FIG. 20, or circles a finger, as shown in FIG. 21.
【０１０８】
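Furthermore, the three-dimensional shape (relief) of objects in the color image allows effects such as pouring a viscous liquid like paint over the person from above, as shown in FIG. 22. Because the person's nose protrudes farther than the rest of the face, as in FIG. 22, the paint can be made to flow around the nose.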
【０１０９】
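Since the three-dimensional positional relationships of the objects are also known, the motion and shape of the virtual paint can be controlled so that it does not flow onto the fingers of the right arm or onto the can.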
【０１１０】
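Various images can also be composited at desired depth positions in the color image, for example a virtual "character" sitting on the can or on the person's shoulder, or a virtual "dragonfly" perching on a fingertip.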
【０１１１】
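Furthermore, time-varying three-dimensional features in a moving image, such as hand movements, can drive effects. As shown in FIG. 23, a virtual "ball" in the center of the screen can be made to move when the user moves a hand toward it; the "ball" is given motion matching the hand's movement (in this case, the hand hits the "ball" and sends it flying).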
【０１１２】
Compositing virtual-object images has been described above, but special effects are not limited to this. For example, the background can be erased and replaced with another background or with a CG-generated one.
【０１１３】
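It is also possible to leave the foreground objects (the person and the can in FIG. 7) unchanged while converting only the background to black-and-white or sepia, or conversely to apply a mosaic to the person only. Special effects may also deform all or part of the color image; for example, the person's face can be inflated like a balloon or shrunk.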
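Recoloring only the background can be sketched by combining a depth threshold, as used by the feature extraction unit, with a per-pixel color conversion. The sketch below converts background pixels to grayscale using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); a sepia tone would substitute a different 3×3 color matrix. The threshold and function name are illustrative.

```python
def gray_background(color, depth, th):
    """Keep pixels nearer than th (foreground) unchanged; convert the rest
    (background, including pixels with no depth) to grayscale."""
    out = []
    for rgb_row, d_row in zip(color, depth):
        row = []
        for (r, g, b), d in zip(rgb_row, d_row):
            if d is not None and d < th:   # foreground: keep original color
                row.append((r, g, b))
            else:                          # background: BT.601 luma
                y = int(round(0.299 * r + 0.587 * g + 0.114 * b))
                row.append((y, y, y))
        out.append(row)
    return out


color = [[(255, 0, 0), (10, 20, 30)]]
depth = [[5.0, 0.8]]                         # background, then foreground
print(gray_background(color, depth, th=1.5))  # [[(76, 76, 76), (10, 20, 30)]]
```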
【０１１４】
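As described above, the processing unit 3 adds a variety of special effects that take three-dimensional elements into account to the color image acquired as a natural-light image by the image acquisition unit 1, based on the three-dimensional features extracted by the feature extraction unit 2.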
【０１１５】
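For example, when a virtual object is composited into a color image, its depth position in the image can be specified, so its image is composited as if the object actually existed at that depth in three-dimensional space. This holds whether the virtual-object image or the color image is a moving image.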
【０１１６】
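Also, because collisions between objects in the color image and a virtual object can be determined from their motions and positions, the virtual object's response to the collision can be rendered, and the collision can be indicated visually (for example, by compositing a virtual "star" representing it).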
【０１１７】
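Also, when a virtual object is composited into a color image, its motion and shape can be controlled to follow the three-dimensional shape (for example, the relief) of objects in the image, so its image is composited as if it actually existed in three-dimensional space. This holds whether the virtual-object image or the color image is a moving image.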
【０１１８】
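In addition, because the subject objects and the background have been extracted as three-dimensional features, each can be processed independently, for example converted to black-and-white or sepia, mosaicked, or deformed.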
【０１１９】
The uses of features and the special effects described in this embodiment are merely examples.
【０１２０】
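Virtual objects such as a "ball" have been used here for convenience, but many others are possible, such as characters, vehicles, and buildings. Virtual objects also include 3D CG data created from real photographs and CG data created from hand-drawn marks or characters entered by the user. Although the virtual objects in this embodiment have been treated as rigid, they may deform freely according to their type, and the virtual-object image may also be a moving image.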
【０１２１】
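An input unit through which the user can enter hand-drawn marks, characters, vehicles, buildings, or other drawings may be added to the configuration of FIG. 1. A further processing unit is then needed to convert these drawings into CG or 3D CG data that the processing unit 3 can handle.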
【０１２２】
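As described above, when compositing a virtual-object image into a depth color image (or a color image without depth information), the processing unit 3 positions the virtual object along the depth direction of the image and composites it based on at least one of the three-dimensional positional relationship between the virtual object and the subjects and the three-dimensional shapes of the subjects.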
【０１２３】
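For example, the virtual object's motion is controlled to follow the three-dimensional shape of a subject, and its image is composited into the depth color image.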
【０１２４】
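Alternatively, the feature extraction unit 2 extracts the depth position of a subject in the depth color image and the background region behind it, and the processing unit 3 composites the virtual-object image so that the virtual object lies between the subject and the background region.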
【０１２５】
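Alternatively, the feature extraction unit 2 extracts a first depth position at which a first subject in the depth color image is located and a second depth position at which a second subject, located behind the first, is located; the processing unit 3 composites the virtual-object image so that the virtual object lies between the first and second positions.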
【０１２６】
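Alternatively, the feature extraction unit 2 extracts a third position as the three-dimensional position of a subject in the depth color image. Based on the motions of the virtual object and the subject, the processing unit 3 determines that the virtual object has collided with the subject when it reaches the third position, controls the virtual object's motion and appearance to correspond to the collision, and composites its image into the depth color image. When a collision is determined, an effect representing it (for example, a "star") may also be composited.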
【０１２７】
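Alternatively, the feature extraction unit 2 extracts a third position as the three-dimensional position of a subject in the depth color image, and the processing unit 3 controls the virtual object's motion and appearance based on the subject's third position and a fourth position, the virtual object's three-dimensional position in the depth color image, and composites its image into the depth color image.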
【０１２８】
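Alternatively, the feature extraction unit 2 extracts from the depth color image an image of each subject and an image of the background region behind them, and the processing unit 3 processes at least one of these, for example by separately converting the subjects or the background to black-and-white or sepia, mosaicking them, or deforming them.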
【０１２９】
<Image Presentation Unit>
Next, the image presentation unit 4 will be described.
【０１３０】
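The image presentation unit 4 presents to the user the color image to which the processing unit 3 has applied special effects (hereinafter a special-effect image), as well as special-effect images received by the communication unit 5.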
【０１３１】
Specifically, the image presentation unit 4 is a display device that shows the special-effect images generated by the processing unit 3 and those received by the communication unit 5.
【０１３２】
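As shown in FIG. 24, the display area can also be divided into an area for special-effect images generated by the processing unit 3 and an area for those received by the communication unit 5, so that both are presented at once. In FIG. 24, for example, area A1 shows the special-effect image received by the communication unit 5 and area A2 shows the one generated by the processing unit 3.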
【０１３３】
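The image presentation unit 4 can also use the depth information to turn a special-effect image generated by the processing unit 3 into a three-dimensional model and present it as a 3D scene. Once it is a 3D scene, it can be viewed from other viewpoints or stereoscopically.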
【０１３４】
<Communication Unit>
Finally, the communication unit 5 will be described.
【０１３５】
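The communication unit 5 transmits special-effect images generated by the processing unit 3 to other terminal devices and receives similar special-effect images sent from other terminal devices (for example, ones configured as in FIG. 1).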
【０１３６】
The communication unit 5 may use either wired or wireless communication. The wireless case is described first.
【０１３７】
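In this case, the communication unit 5 communicates with external devices such as other terminals using a wireless scheme of the kind used for mobile phones, such as PDC (Personal Digital Cellular), CDMA (Code Division Multiple Access), or PHS, and thereby sends special-effect images to and receives them from external devices. The scheme is not limited to mobile-phone communication; wireless LANs conforming to IEEE 802.11a/b/g, Bluetooth (trademark), infrared communication, RF communication, and other wireless schemes can also be used.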
【０１３８】
Next, the wired case is described.
【０１３９】
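In this case, the communication unit 5 has an interface such as USB or IEEE 1394 and uses it to exchange special-effect images with connected external devices, for example sending them to a PC connected via USB. Other means such as serial communication, the public telephone network, or optical fiber can also be used. Special-effect images may also be exchanged with further devices through an external device's own communication means; for example, a USB-connected PC (personal computer) could relay them over its Internet connection to other devices on the Internet.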
【０１４０】
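<Communication Between Terminal Devices>
With a plurality of the terminal devices of this embodiment, each device can apply special effects as described to color images acquired in real time, and the resulting special-effect images can be used for communication among the devices.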
【０１４１】
Communication systems built from the terminal devices of this embodiment will now be described through several concrete examples.
【０１４２】
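Camera-equipped mobile phones are now widespread. In addition to conventional phone functions, they let users take photos and videos for fun, communicate by sending captured material as e-mail attachments, and hold videophone calls, in which each party's camera image is sent to the other in real time so that both can talk while seeing each other.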
【０１４３】
The communication system of this embodiment can be realized, for example, as a replacement for such systems: a new mobile-phone system capable of the image communication described above.
【０１４４】
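For example, a user photographs himself or herself, scenery, or things of interest with a mobile phone serving as the terminal device of this embodiment, and obtains a special-effect image (which may of course be a moving image) generated using the corresponding depth information. The user can enjoy it as the phone's standby screen, or attach the image (or video) to an e-mail as a means of communication. During real-time image communication such as a videophone call, special effects can also be added as described above to convey one's feelings or to make the conversation more entertaining.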
【０１４５】
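A sequence of operations will now be described concretely using a videophone example, with reference to the flowcharts of FIGS. 33 and 34. Suppose persons A and B each hold a mobile phone configured as in FIG. 1, and each is capturing his or her own face. With no special effects applied, the two phones are connected via their communication units 5: A's face is presented through B's image presentation unit 4, and B's face is presented to A, realizing a videophone. At some point, A decides to add an effect in which butterflies fly around A's face, and captures A's face with the phone. The image acquisition unit 1 acquires an image including depth information, i.e., a depth color image (step S1), and the feature extraction unit 2 extracts three-dimensional features of the face image based on the depth information and the like (step S2). When A selects an image of flying butterflies, the processing unit 3 composites it into the face image, based on the extracted features, so that the butterflies appear to fly around the face, generating a special-effect image (step S3). This image is displayed on the image presentation unit 4 (step S4) and, on A's instruction, sent to B (step S5). When B's phone receives it at its communication unit 5 (step S6), it displays it on its image presentation unit 4 (step S7), so B sees A's face with the effect. In this way, an element of entertainment absent from conventional videophones can be added to communication.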
【０１４６】
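This embodiment is not limited to mobile phones. In another example, it has long been common to connect a USB camera to a personal computer (PC), keep it shooting scenery or the like, connect the PC to the Internet, and publish the live images. Such services are usually called live cameras or fixed-point observation cameras, and they show a wide variety of things: the inside of one's room, one's pets, how crowded one's shop is, street scenes in Shibuya, and so on.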
【０１４７】
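In such usage, connecting the terminal device of this embodiment to the PC in place of the USB camera allows special effects to be added to the live images. Suppose, for example, that a store publishes images of its products on the Internet through a live camera that pans across several items. When a so-called "top pick" product comes into view, a virtual object representing "top pick" can be made to circle it, as in FIG. 20; when a new product is shown, a virtual object representing "new product" can circle it instead. Live-camera images can thus be made entertaining while also conveying additional information such as "top pick" or "new product".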
【０１４８】
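(First Modification of the First Embodiment)
FIG. 25 shows an example configuration of the main part of a terminal device according to the first modification.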
【０１４９】
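The terminal device of FIG. 25 differs from that of FIG. 1 in that its feature extraction unit 2 extracts three-dimensional features of the subjects not only from the depth information in the depth color image acquired by the image acquisition unit 1, as before, but also from the depth information in depth color images received by the communication unit 5.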
【０１５０】
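The processing unit 3 of FIG. 25 applies special effects, as described above and based on the features extracted by the feature extraction unit 2, to depth color images (or color images) acquired by the image acquisition unit 1 or received by the communication unit 5, and to special-effect images that include depth information.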
【０１５１】
Because special-effect images are generated from color images associated with depth information or from depth color images, they too are associated with (contain) depth information.
【０１５２】
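With this configuration, special effects can be applied not only to depth color images obtained by the image acquisition unit 1 but also to depth color images and special-effect images received by the communication unit 5.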
【０１５３】
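The operation of a terminal device configured as in FIG. 25 will now be described concretely, using the earlier mobile-phone example, with reference to the flowchart of FIG. 35.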
【０１５４】
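Suppose two people each hold a mobile phone configured as in FIG. 25 (with the functions of the first modification) and are communicating by videophone. In the first embodiment, special effects could be applied only to the image one was capturing oneself. In the first modification, in addition, the communication unit 5 receives an image containing depth information from the other party (step S11), the image is displayed (step S12), and its three-dimensional features are extracted based on its depth information and the like (step S13). The processing unit 3 on the receiving side can therefore add special effects to the received image as well (step S14). The resulting special-effect image is presented on the image presentation unit 4 and can also be sent back to the other party (step S16).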
【０１５５】
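Thus, for example, while both parties are talking by videophone with their own faces on screen, one can play a prank on the other's face. In TV comedy sketches, a metal washtub is often dropped on someone's head for laughs, as an expression of anger or as a penalty. Likewise, when one is annoyed during a conversation, one can apply an effect that drops a "washtub" onto the other party's face image, enjoy it oneself, and then send it to the other party to amuse them while making a point. This enables a new kind of image communication.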
【０１５６】
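(Second Modification of the First Embodiment)
FIG. 26 shows an example configuration of the main part of a terminal device according to the second modification.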
【０１５７】
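The terminal device of FIG. 26 differs from that of FIG. 1 in that it adds an additional-image storage unit 7, which stores multiple pieces of additional-image data such as virtual objects (for example, CG data or 3D CG data), and a special-effect selection unit 6 for selecting a desired additional image or special effect from among them.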
【０１５８】
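The special-effect selection unit 6 refers to the additional images stored in the additional-image storage unit 7 and presents to the user the additional images and kinds of special effects available. The user chooses which to add, and this choice is passed to the processing unit 3, which applies the corresponding special effect. The special-effect selection unit 6 may also pick an additional image or special effect at random instead of asking the user.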
【０１５９】
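With this modification, the user can choose the virtual object or special effect to apply to the acquired color image and generate a special-effect color image suited to his or her taste.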
【０１６０】
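(Third Modification of the First Embodiment)
FIG. 27 shows an example configuration of the main part of a terminal device according to the third modification, a further variation of the second modification. The terminal device of FIG. 27 differs from that of FIG. 26 in that it adds a scene analysis unit 8 that analyzes features of the scene shown in the depth color image (or color image) acquired by the image acquisition unit 1.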
【０１６１】
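The scene analysis unit 8 analyzes features of the scene shown in the depth color image (or color image) acquired by the image acquisition unit 1. Scene features include what appears in the scene, its overall color mood, and its composition. For example, if an object in the scene is a person, the unit determines this and passes the information to the special-effect selection unit 6, which then limits the choice to special effects suitable for a person. Likewise, if the subject is tall and narrow, adding (e.g., compositing) a virtual object that orbits horizontally around it is effective; if it is very wide, a virtual object orbiting vertically may be better, or a different effect without orbiting motion may be more effective. The shape or color of additional images such as virtual objects may also be varied according to the overall color tone of the scene; for example, compositing a red virtual object into a scene dominated by red would likely have little impact. The motion of the virtual object may also be varied according to the composition of the image (scene) itself, such as the arrangement of objects in it.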
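The selection rules in this paragraph could be encoded along the following lines. Everything concrete here is an assumption for illustration: the foreground mask is taken as given (for example, from a depth threshold), a width-to-height ratio of 2.0 is arbitrarily used as the cut-off for "very wide", and the contrasting color is chosen as the palette entry farthest, in RGB, from the scene's mean color.

```python
def bounding_box(mask):
    """(x0, y0, x1, y1) of the True pixels in a foreground mask, or None."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def choose_orbit(mask, wide_ratio=2.0):
    """Very wide subject -> orbit vertically; otherwise orbit horizontally."""
    box = bounding_box(mask)
    if box is None:
        return "none"
    x0, y0, x1, y1 = box
    w, h = x1 - x0 + 1, y1 - y0 + 1
    if w / h >= wide_ratio:
        return "vertical"
    return "horizontal"


def choose_color(color, palette):
    """Palette color farthest (squared RGB distance) from the scene's mean color."""
    pixels = [p for row in color for p in row]
    mean = [sum(p[k] for p in pixels) / len(pixels) for k in range(3)]
    return max(palette,
               key=lambda c: sum((c[k] - mean[k]) ** 2 for k in range(3)))


tall = [[False, True, False]] * 3
print(choose_orbit(tall))                                          # horizontal
print(choose_orbit([[True, True, True, True]]))                    # vertical
print(choose_color([[(200, 0, 0)] * 2], [(255, 0, 0), (0, 0, 255)]))  # (0, 0, 255)
```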
【０１６２】
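Thus, with the configuration of FIG. 27, the scene analysis unit 8 analyzes scene features so that parameters such as the motion, color, and shape of the virtual object to be added can be adjusted, or a special effect suited to the scene can be selected, resulting in more effective special effects.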
【０１６３】
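(Fourth Modification of the First Embodiment)
A terminal device according to the fourth modification takes any of the configurations of FIG. 1, the first modification (FIG. 25), the second modification (FIG. 26), or the third modification (FIG. 27), and adds a selection unit 9 that can select the images communicated by the communication unit 5.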
【０１６４】
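FIG. 28 shows an example configuration of the main part of a terminal device according to the fourth modification, in which the selection unit 9 is added to the configuration of the first modification shown in FIG. 25.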
【０１６５】
As shown in FIG. 28, the selection unit 9 makes it possible to select the data exchanged with other terminal devices via the communication unit 5.
【０１６６】
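The benefit of the fourth modification will be explained using the example from the first modification. Suppose two people each hold a mobile phone with the functions of FIG. 28 and are talking by videophone, each showing his or her own face. During the call they use various special effects, but they can choose, for example, not to send the other party effects that play pranks on that party's face. Thus, although special-effect images are normally always sent, when one is annoyed one can drop a "washtub" onto the other party's face image and keep that particular special-effect image for one's own enjoyment without sending it.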
【０１６７】
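This is even more effective in communication among many people (one-to-many or many-to-many), not just one-to-one. For example, suppose person D is simultaneously communicating with three people, A, B, and C, using videophones having the configuration shown in FIG. 28. At some point, D is offended by something A says or does and chooses to apply the special effect of dropping a "washtub". However, thinking that sending it to A would start a quarrel, D sends it only to B and C to convey D's feelings. Such selection of recipients becomes possible.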
【０１６８】
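Selection on the transmitting side has been described above, but selection can likewise be made on the receiving side, for example so as not to receive special-effect images from a particular person, or not to receive images to which a particular special effect has been applied.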
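The sending-side and receiving-side selection described for the selection unit 9 can be sketched as a small rule table, assuming each frame carries the sender's name and the names of the special effects applied to it. The `Frame` and `ShareRules` types and the effect name `"washtub"` are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Frame:
    sender: str
    effects: Set[str]  # names of the special effects applied to this frame


@dataclass
class ShareRules:
    # Sending side: effect name -> peers who must not receive frames carrying it.
    withhold: Dict[str, Set[str]] = field(default_factory=dict)
    # Receiving side: senders and effects the user has chosen to refuse.
    blocked_senders: Set[str] = field(default_factory=set)
    blocked_effects: Set[str] = field(default_factory=set)

    def recipients(self, frame: Frame, peers: List[str]) -> List[str]:
        """Peers this frame may be sent to (the local screen always shows it)."""
        excluded: Set[str] = set()
        for effect in frame.effects:
            excluded |= self.withhold.get(effect, set())
        return [peer for peer in peers if peer not in excluded]

    def accept(self, frame: Frame) -> bool:
        """Whether an incoming frame should be displayed."""
        if frame.sender in self.blocked_senders:
            return False
        return not (frame.effects & self.blocked_effects)


# Usage: D drops a washtub on A's image but shares it only with B and C.
rules = ShareRules(withhold={"washtub": {"A"}})
print(rules.recipients(Frame("D", {"washtub"}), ["A", "B", "C"]))  # prints: ['B', 'C']
```

Keying the sending rules by effect name means ordinary frames still go to everyone; only frames carrying a withheld effect are filtered.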
【０１６９】
(Fifth Modification of the First Embodiment)
A sound effect suited to a color image with special effects can also be added to that image.
【０１７０】
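A sound effect addition unit, which adds a suitable sound effect to a color image with special effects, can be further added to any of the configuration of FIG. 1, the first modification of FIG. 25, the second modification of FIG. 26, or the third modification of FIG. 27.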
【０１７１】
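This makes it possible to attach sound effects to a color image with special effects and thereby further enhance the special effects. For example, the first embodiment described a special effect in which a virtual object hits a person and bounces off, and an image of a virtual object such as a "star" is composited to indicate the collision. If the sound effect addition unit (not shown) is added to the configuration of FIG. 1, a collision sound can be added at the moment of collision, enhancing that special effect.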
【０１７２】
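As described above, according to the first embodiment, information in the depth direction (depth information) of the image to be captured (here, for example, a natural light image) can be acquired together with that image, by acquiring a reflected light image that contains depth information at each pixel. Using this depth information, three-dimensional features such as the three-dimensional shapes of the imaging targets in the natural light image and their three-dimensional positional relationships can be extracted. Based on these features, special effects that exploit the three-dimensional features of the imaging targets can be added, for example by compositing images (additional images) representing effect expressions such as impacts, or representing virtual objects.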
【０１７３】
(Second Embodiment)
Next, a second embodiment of the present invention will be described.
【０１７４】
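<Overall configuration>
FIG. 29 shows the schematic overall configuration of a communication system according to the second embodiment. As shown in FIG. 29, this is a wireless communication system consisting of a plurality of terminal devices having, for example, the configuration shown in FIG. 1 (here, two: a first terminal 201 and a second terminal 202), a base station device (BS) 211 wirelessly connected to the first terminal 201, a base station device (BS) 212 wirelessly connected to the second terminal 202, and a predetermined communication network connecting the base station devices 211 and 212 to a server device 200. The server device 200 is communicably connected to the first terminal 201 and the second terminal 202 via the base stations 211 and 212 and the network.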
【０１７５】
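As described above, the first terminal 201 and the second terminal 202 can carry out image communication by exchanging images with or without special effects. Since each has the configuration shown in FIG. 1, each terminal can acquire, in addition to a color image, the depth information corresponding to that color image; that is, here, it acquires a depth color image as a color image containing depth information.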
【０１７６】
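When image communication is carried out between the first terminal 201 and the second terminal 202, the communication unit 5 of each terminal transmits, together with the image (with or without special effects), the depth information corresponding to that image. Here, as before, the depth information is assumed to be contained in the transmitted image itself.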
【０１７７】
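In addition, when images with special effects are exchanged between the first terminal 201 and the second terminal 202, the communication unit 5 of each terminal also transmits the three-dimensional features extracted by that terminal's feature extraction unit 2, together with the special-effect image containing the depth information.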
【０１７８】
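The server device 200 is installed and operated by a provider offering a service that enables image communication between the first terminal 201 and the second terminal 202. During such image communication, the images, depth information, and other data transmitted from the first terminal 201 and the second terminal 202 are always received by the server device 200 and forwarded through it to the other terminal.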
【０１７９】
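The second embodiment is characterized in that the server device 200 composites an additional image, here an advertisement image, into the images transmitted from the first terminal 201 and the second terminal 202, based on the depth information corresponding to those images.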
【０１８０】
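FIG. 30 shows a configuration example of the server device 200, which comprises a receiving unit 10, a feature extraction unit 2, a processing unit 3, and a transmitting unit 11. In FIG. 30, parts identical to those in FIG. 1 bear the same reference numerals, and only the differences are described. In place of the image acquisition unit 1, the image presentation unit 4, and the communication unit 5 of the terminal device in FIG. 1, the server device 200 of FIG. 30 has a receiving unit 10 and a transmitting unit 11. The receiving unit 10 receives images (with or without special effects, containing depth information) sent from the first terminal 201 or the second terminal 202. The processing unit 3 of the server device 200 applies special effects, such as compositing an advertisement image, to each image received by the receiving unit 10, and the transmitting unit 11 sends the resulting image with the advertisement to the first terminal 201 or the second terminal 202 that is the destination of the received image.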
【０１８１】
Next, the processing operation of the server device 200 will be described with reference to the flowchart shown in FIG. 36.
【０１８２】
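When the receiving unit 10 of the server device 200 receives an image containing depth information (with or without special effects) transmitted from the first terminal 201 or the second terminal 202 (step S21), the feature extraction unit 2 of the server device 200 extracts the three-dimensional features in that image from the received image and its corresponding depth information, in the same manner as the feature extraction unit 2 of the terminal device in FIG. 1 (step S22).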
【０１８３】
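The processing unit 3 of the server device 200 applies special effects to the received image, for example by compositing an advertisement image, based on the image received by the receiving unit 10 and the extracted three-dimensional features (step S23). The resulting image with the advertisement is transmitted from the transmitting unit 11 to the first terminal 201 or the second terminal 202, whichever was the original destination of the received image (step S24).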
【０１８４】
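The processing unit 3 of the server device 200 will now be described. The image received by the receiving unit 10 is either a special-effect image to which the first or second terminal 201, 202 has already applied special effects, or an ordinary image without special effects. Taking into account the three-dimensional features obtained from that image by the feature extraction unit 2, the processing unit 3 applies a further special effect to it, for example by compositing an arbitrary advertisement image.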
【０１８５】
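Specifically, an image with an advertisement is generated by further compositing into the image an advertisement rendered with CG (computer graphics) or the like. The feature extraction unit 2 supplies three-dimensional features such as the three-dimensional shape of the imaging targets (objects) in the image (for example, surface relief), what objects exist in the scene (that is, into which regions the scene can be divided), the positional relationship of the objects in the scene, and the solid shape of each object. Based on these features, and in the same way as the processing unit 3 of FIG. 1, the processing unit determines the positional relationships (such as front-to-back order) and collisions between the advertisement image and the objects in the received image, and composites the advertisement accordingly.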
【０１８６】
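Suppose the image shown in FIG. 7 is received by the receiving unit 10. Taking as an example the compositing of an advertisement (image) into this image (hereinafter, the received image), the processing operation of the processing unit 3 of the server device 200 is described below.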
【０１８７】
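The simplest way to add an advertisement is to composite it into the background portion so that it does not overlap the foreground of the received image. FIG. 31 shows the received image of FIG. 7 with an advertisement image composited into it. As described above, and as shown in FIG. 14, the three-dimensional features extracted by the feature extraction unit 2 make it easy to distinguish the foreground from the background, and this three-dimensional information is used to place the advertisement. The advertisement can therefore always be presented in the background without covering the person in the foreground, even when the person moves, so that it is shown continuously without disrupting the communication.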
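Assuming the depth color image is held as rows of (R, G, B, d) pixels, with d the distance from the camera, and that a single distance threshold separates foreground from background (a simplification of the feature extraction described earlier), background-only advertisement placement reduces to a per-pixel depth test. The function and parameter names below are hypothetical.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, int, int, float]  # (R, G, B, d), d = distance from the camera


def composite_ad_on_background(image: List[List[Pixel]],
                               ad: List[List[Optional[RGB]]],
                               top: int, left: int,
                               background_depth: float) -> List[List[Pixel]]:
    """Paste `ad` with its top-left corner at (top, left), on background pixels only.

    Pixels nearer than `background_depth` (e.g. the person in the foreground)
    are never overwritten, so the ad stays off the person however they move.
    `None` entries in `ad` are transparent. The input image is not modified.
    """
    out = [row[:] for row in image]
    for dy, ad_row in enumerate(ad):
        for dx, color in enumerate(ad_row):
            y, x = top + dy, left + dx
            if color is None or not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue
            depth = out[y][x][3]
            if depth >= background_depth:  # far enough away to count as background
                out[y][x] = (color[0], color[1], color[2], depth)
    return out


# Usage: the first pixel is the person (d = 0.8), the rest is a wall (d = 3.0).
frame = [[(0, 0, 0, 0.8), (0, 0, 0, 3.0), (0, 0, 0, 3.0)]]
ad = [[(255, 255, 255), (255, 255, 255), None]]
print(composite_ad_on_background(frame, ad, 0, 0, background_depth=2.0))
# prints: [[(0, 0, 0, 0.8), (255, 255, 255, 3.0), (0, 0, 0, 3.0)]]
```

Because the test is repeated for every frame, a person who moves in front of the ad region simply occludes it in that frame.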
【０１８８】
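It is also effective to attach an advertisement to a virtual object that the processing unit 3 of a terminal device has already added to (composited into) the received image. For example, suppose the received image already carries the special effect described with FIG. 17, in which a virtual "sphere" moves behind the person. The processing unit 3 of the server device 200 pastes the advertisement onto the "sphere" itself or, as shown in FIG. 32, composites the advertisement so that it hangs from the "sphere". The advertisement then moves around the screen along with the "sphere". In other words, the advertising effect is embedded in the entertainment effect created by the terminal-side special effect already contained in the received image. This makes it possible to present advertisements unobtrusively, as part of the entertainment, without giving the user a bad impression.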
【０１８９】
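The virtual object may itself be an advertisement; in that case, the processing unit 3 of the server device 200 composites a virtual object integrated with the advertisement into the received image. An example of such a virtual object is a CG image of a newly released product.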
【０１９０】
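The advertisement-adding methods described above are merely examples and are not limiting; advertisements can be added freely using the three-dimensional features extracted by the feature extraction unit 2 of the server device 200. For example, FIG. 22 shows a special effect in which paint falls onto a person's head; instead of paint, a product package, or text showing a product name and release date, may fall. FIG. 18 shows an example in which a collision with a virtual object is detected and a virtual object such as a star is attached at that moment; an advertisement may be presented inside that star.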
【０１９１】
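Two or more different advertisements can also be displayed on the screen, and when the received image is a moving image, the content of the advertisement may change from scene to scene. For example, if the person in the scene is a woman, an advertisement aimed at women may be added; if a man, an advertisement aimed at men.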
【０１９２】
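Furthermore, since the first terminal 201 and the second terminal 202 already extract three-dimensional features with their own feature extraction units 2, the server device 200 no longer needs a feature extraction unit 2 if this extracted feature information is always transmitted together with the image (with or without special effects, containing depth information). In that case, as shown in FIG. 37, the server device 200 composites an advertisement image into the image received by the receiving unit 10 (the received image) in the same manner as above, based on the feature information for that image, which is also received by the receiving unit 10 (steps S31 to S32).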
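The two server flows (FIG. 36, where the server extracts the features itself, and FIG. 37, where the terminal sends them along with the image) can be sketched as one relay function that runs feature extraction only when no features arrived. The callables stand in for the feature extraction unit 2, the processing unit 3, and the transmitting unit 11; their names and the `Message` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Message:
    destination: str                # terminal the image was originally addressed to
    image: Any                      # depth color image, with or without terminal-side effects
    features: Optional[Any] = None  # 3-D features, if the terminal sent them (FIG. 37 case)


def relay_with_ad(msg: Message,
                  extract: Callable[[Any], Any],
                  add_ad: Callable[[Any, Any], Any],
                  send: Callable[[str, Any], None]) -> None:
    """Receive -> (extract) -> composite ad -> forward to the original destination."""
    # FIG. 36, S22: extract features on the server only if none were received.
    features = msg.features if msg.features is not None else extract(msg.image)
    # S23 (FIG. 36) / S32 (FIG. 37): composite the ad using the 3-D features.
    with_ad = add_ad(msg.image, features)
    # S24 (FIG. 36): forward the image with the ad to its original destination.
    send(msg.destination, with_ad)


# Usage with trivial stand-ins for the real units.
outbox = []
relay_with_ad(Message("terminal-202", "img"),
              extract=lambda image: "features",
              add_ad=lambda image, feats: f"{image}+ad[{feats}]",
              send=lambda dest, image: outbox.append((dest, image)))
print(outbox)  # prints: [('terminal-202', 'img+ad[features]')]
```

Injecting the units as callables keeps the relay logic independent of how features are actually extracted or how the advertisement is rendered.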
【０１９３】
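With the communication system using the server device 200 of the second embodiment, in addition to the image communication described in the first embodiment, images in which advertisements have been effectively added to the transmitted images can be provided. Advertisements can be placed where they do not interfere with communication between users, or built into existing virtual objects, which gives the advertisements entertainment value and presents them without adversely affecting users.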
【０１９４】
The above embodiments and their modifications can be implemented in appropriate combinations.
【０１９５】
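The processing in the embodiments of the present invention can also be implemented as a computer-executable program and provided on a computer-readable storage medium. For example, in the terminal device of FIG. 1, part of the image acquisition unit 1 (for example, the natural light image capturing unit 103, the reflected light image capturing unit 108, and the imaging operation control unit 104) is preferably housed in a single integrated circuit, as described above; at least the remaining part of the image acquisition unit 1, the feature extraction unit 2, and the processing unit 3 can be implemented in software. The terminal device of the above embodiment can also be realized by installing on it a program that causes it to perform the processing operations shown in FIGS. 33 to 35.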
【０１９６】
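The storage medium may use any storage format, such as a magnetic disk, floppy disk, hard disk, optical disc (CD-ROM, CD-R, DVD, etc.), magneto-optical disk (MO, etc.), or semiconductor memory, as long as it can store the program and can be read by a computer or embedded system.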
【０１９７】
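Part of each process for realizing the present embodiment may also be executed by the OS (operating system) running on the computer, or by MW (middleware) such as database management software or network software, based on the instructions of the program installed from the storage medium into the computer or embedded system.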
【０１９８】
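Furthermore, the storage medium is not limited to a medium independent of the computer or embedded system; it also includes a storage medium that stores, permanently or temporarily, a program downloaded via a LAN, the Internet, or the like.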
【０１９９】
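The storage medium is not limited to a single medium: the case where the processing of the present embodiment is executed from a plurality of media is also covered by the storage medium of the present invention, and the media may have any configuration.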
【０２００】
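The computer or embedded system in the present invention executes each process of the present embodiment based on the program stored in the storage medium, and may have any configuration, such as a single device like a personal computer or microcomputer, or a system of several devices connected over a network.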
【０２０１】
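The term "computer" in the present invention is not limited to a personal computer; it also includes arithmetic processing units and microcomputers contained in information processing equipment, and refers collectively to equipment and devices that can realize the functions of the present invention by means of a program.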
【０２０２】
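The present invention is not limited to the first and second embodiments described above and may be modified in various ways in practice without departing from its gist. Moreover, the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, if some constituent elements are removed from all those shown in an embodiment and at least one of the problems described in the section on problems to be solved can still be solved, and at least one of the effects described in the section on effects of the invention can still be obtained, the configuration without those elements can be extracted as an invention.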
【０２０３】
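[Effects of the Invention]
As described above, according to the present invention, processing such as compositing an advertisement image or an image of a desired virtual object can be performed based on three-dimensional features, such as the three-dimensional shape and positional relationship, of each of a plurality of imaging targets in a captured image (including moving and still images).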
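[Brief Description of the Drawings]
[FIG. 1] A diagram schematically showing a configuration example of the main part of a terminal device according to the first embodiment of the present invention.
[FIG. 2] A diagram for explaining a normal monochrome image and a depth monochrome image.
[FIG. 3] A diagram schematically showing a configuration example of the image acquisition unit.
[FIG. 4] A diagram showing a configuration example of the reflected light image capturing unit.
[FIG. 5] A diagram showing a three-dimensional representation of a human hand, obtained from the depth information contained in a reflected light image of the hand captured by the reflected light image capturing unit.
[FIG. 6] A diagram showing a specific example of a reflected light image ((a)) and a natural light image ((b)) acquired by the image acquisition unit.
[FIG. 7] A diagram showing, as a monochrome image, an example of a depth color image processed by the feature extraction unit.
[FIG. 8] A diagram schematically showing the three-dimensional positional relationship of the objects in the depth color image of FIG. 7.
[FIG. 9] A diagram showing the background portion of the depth color image of FIG. 7.
[FIG. 10] A diagram showing the person portion of the depth color image of FIG. 7.
[FIG. 11] A diagram showing the arm portion of the depth color image of FIG. 7.
[FIG. 12] A diagram showing the can portion of the depth color image of FIG. 7.
[FIG. 13] A diagram showing a positional relationship in the depth direction of the objects in the depth color image of FIG. 7.
[FIG. 14] A diagram showing the positional relationship in the depth direction and the foreground and background in the depth color image of FIG. 7.
[FIG. 15] A diagram for explaining a method of determining the detailed positional relationship of the objects in the depth color image of FIG. 7 from discontinuities in the depth direction (jump edges).
[FIG. 16] A diagram for explaining a method of determining the depth-direction position of each object in the depth color image of FIG. 7.
[FIG. 17] A diagram for explaining an example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 18] A diagram for explaining another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 19] A diagram for explaining yet another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 20] A diagram for explaining yet another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 21] A diagram for explaining yet another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 22] A diagram for explaining yet another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 23] A diagram for explaining yet another example of a special-effect image obtained by adding a special effect to the depth color image of FIG. 7.
[FIG. 24] A diagram showing an example of screen division for displaying images on the image presentation unit.
[FIG. 25] A diagram schematically showing a configuration example of the main part of a terminal device according to a first modification of the first embodiment of the present invention.
[FIG. 26] A diagram schematically showing a configuration example of the main part of a terminal device according to a second modification of the first embodiment of the present invention.
[FIG. 27] A diagram schematically showing a configuration example of the main part of a terminal device according to a third modification of the first embodiment of the present invention.
[FIG. 28] A diagram schematically showing a configuration example of the main part of a terminal device according to a fourth modification of the first embodiment of the present invention.
[FIG. 29] A diagram schematically showing the overall configuration of a communication system according to the second embodiment of the present invention.
[FIG. 30] A diagram schematically showing a configuration example of the main part of the server device.
[FIG. 31] A diagram for explaining an example of an image with an advertisement.
[FIG. 32] A diagram for explaining another example of an image with an advertisement.
[FIG. 33] A flowchart for explaining the processing operation of the terminal device.
[FIG. 34] A flowchart for explaining the processing operation of the terminal device.
[FIG. 35] A flowchart for explaining the processing operation of the terminal device.
[FIG. 36] A flowchart for explaining the processing operation of the server device.
[FIG. 37] A flowchart for explaining the processing operation of the server device.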
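[Description of Reference Numerals]
1 ... image acquisition unit
2 ... feature extraction unit
3 ... processing unit
4 ... image presentation unit
5 ... communication unit
6 ... special effect selection unit
7 ... additional image storage unit
8 ... scene analysis unit
9 ... selection unit
10 ... receiving unit
11 ... transmitting unit
[0001]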
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an image processing method for processing an image, for example by combining a desired image with an image captured by a terminal device such as a mobile phone. More particularly, it relates to a terminal device and a communication system that carry out communication by transmitting and receiving images generated with this image processing method among a plurality of terminal devices such as mobile phones.
[0002]
[Prior art]
In recent years, camera devices capable of capturing moving images (such as video cameras) have become widespread. This has made it easy to film and view familiar subjects, such as a child growing up or events like school sports days. Recently, mobile phones and PDAs are also often equipped with cameras, so that one can casually shoot a moving image on a street corner and immediately send it to someone as an e-mail attachment. As a result, using video images as a means of communication is becoming common (see, for example, Non-Patent Documents 1 and 2). In addition, network broadband has advanced rapidly, and real-time video communication such as videophone has already been realized (see, for example, Non-Patent Document 3).
[0003]
As described above, a culture in which ordinary people readily use moving images for communication is spreading. However, most conventional video cameras simply use what the camera captures as raw material and rarely apply any special effects to it. This is because, unlike the human eye, a conventional camera captures only the color information of the target, in two dimensions. While the human eye can distinguish the objects in view, their front-to-back order, their three-dimensional shapes, and so on, a camera can distinguish the captured objects only by their color information.
[0004]
For this reason, it has been very difficult to process an image captured by a camera, for example to extract only the person and separate it from the background, to determine the front-to-back order and positional relationship of several people, or to detect changes in the shape of an object's surface. There have been attempts to obtain such information indirectly from color information alone. For example, a technique has been disclosed that separates an object from the background in an image and carries out image communication using only that object (see, for example, Patent Document 1). There, parts such as faces are separated, and the faces of several people in a conversation are displayed on a separately prepared background, giving the impression of communicating in cyberspace. However, because this relies on color information, which is inherently unrelated to three-dimensional information, there are many limitations: it is difficult to separate objects accurately, and the actual front-to-back order of the objects is unknown. Stable operation is therefore difficult.
[0005]
Consequently, in current video communication, applying special effects such as image compositing to captured video using information such as the distinction between targets, their positional relationships (such as front-to-back order), and their three-dimensional shapes has not been considered. At present, most special effects consist of overwriting characters or overlaying decorative frames on video images (see, for example, Non-Patent Document 4).
[0006]
On the other hand, if advertisements could be presented within video communication between individuals, various business opportunities would open up. However, as described above, adding an advertisement to a conventional image raises the problem of how to superimpose it. Simply superimposing an advertisement onto a conventional image often gets in the way. For example, if an advertisement is superimposed on one's own face or the other party's face during a videophone call using bust-up images, it is not only annoying but also leaves a bad impression, which defeats the purpose of advertising. For these reasons, no attempt has conventionally been made to add advertisements to such images.
[0007]
[Patent Document 1]
JP 2001-188910 A (paragraphs [0158] to [0170], FIG. 24)
[0008]
[Non-patent document 1]
J-Phone's WWW page
http://www.j-phone.com/movie-shamail/
[0009]
[Non-patent document 2]
NTT DoCoMo's WWW page
http://www.nttdocomo.co.jp/p_s/imode/ishot/index.html
[0010]
[Non-Patent Document 3]
NTT DoCoMo FOMA WWW page
http://forma.nttdocomo.co.jp/
[0011]
[Non-patent document 4]
ATLUS WWW page
http://www.atlus.co.jp/am/printclub/
[0012]
[Problems to be solved by the invention]
As described above, when an image is processed by combining another desired image with a captured image (including moving and still images), the conventional approach ignores the three-dimensional shapes and positional relationships of the plurality of imaging targets in the captured image. It can therefore perform only simple processing, such as combining images on a two-dimensional plane.
[0013]
If, however, the plurality of imaging targets in a captured image can be distinguished and information such as their three-dimensional shapes and positional relationships can be used, more varied special effects that exploit this three-dimensional information become possible, such as having a computer graphics (CG) character fly around a person in a captured moving image, or changing only the background portion.
[0014]
Further, if the background and the person can be distinguished, an advertisement image can be composited into the background portion so that it does not cover the person. A new way of presenting advertisements that makes good use of the targets' three-dimensional shapes, such as a CG character carrying an advertisement flying around a person in a captured moving image, can also be realized.
[0015]
The present invention has been made in view of the above problems. Its object is to provide an image processing method capable of processing, such as compositing an advertisement image or a desired image, in accordance with three-dimensional features such as the three-dimensional shapes and three-dimensional positional relationships of a plurality of imaging targets in a captured image (including moving and still images), which was impossible in the related art, together with an image processing apparatus, a communication terminal apparatus, and a server apparatus using the method.
[0016]
[Means for Solving the Problems]
(1) The present invention provides a terminal device, being one of a plurality of terminal devices capable of communicating with each other, comprising: first acquisition means for acquiring a first image containing a plurality of imaging targets; second acquisition means for acquiring depth information corresponding to each of a plurality of regions of the first image, each region consisting of one or more pixels; extraction means for extracting, from the first image and the depth information, at least the three-dimensional shapes of the plurality of imaging targets and the three-dimensional positional relationship among them as feature information; and generation means for processing the first image based on the feature information extracted by the extraction means to generate a second image. This makes it possible to process the captured first image (including moving and still images), for example by compositing a desired image, in accordance with three-dimensional features such as the three-dimensional shape of each imaging target and their three-dimensional positional relationship.
[0017]
For example, the generation means generates the second image by compositing an image of an object into the first image. In doing so, it determines the position of the object in the depth direction of the first image and composites the image of the object into the first image based on at least one of the three-dimensional positional relationship between the object and the plurality of imaging targets at that position and the three-dimensional shapes of the plurality of imaging targets.
[0018]
The generation means may also generate the second image by compositing an image of the object into the first image while controlling the movement of the object so that it conforms to the three-dimensional shape of the imaging target.
[0019]
The extraction means may extract the depth position at which an imaging target is located in the first image and the background region behind that target. The generation means then generates the second image by compositing the image of an object into the first image such that the object lies between the imaging target and the background region.
[0020]
Further, the extraction means may extract a first depth position, at which a first imaging target (one of the plurality of imaging targets in the first image) is located, and a second depth position behind it, at which a second imaging target (another of the plurality of imaging targets) is located. The generation means then generates the second image by compositing the image of an object into the first image such that the object lies between the first position and the second position.
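A minimal sketch of placing an object between two extracted depth positions, assuming pixels are (R, G, B, d) tuples with d the distance from the camera. The object is given a depth between the first and second positions, and each object pixel is drawn only where it is nearer than what the scene already shows there. This per-pixel z-test is a standard compositing technique chosen for illustration, not necessarily the patent's implementation; the names are hypothetical.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, int, int, float]  # (R, G, B, d), d = distance from the camera


def depth_between(near: float, far: float, t: float = 0.5) -> float:
    """A depth strictly between two extracted positions (0 < t < 1)."""
    if not (0.0 < t < 1.0 and near < far):
        raise ValueError("need near < far and 0 < t < 1")
    return near + t * (far - near)


def composite_with_depth(image: List[List[Pixel]], sprite: List[List[Optional[RGB]]],
                         top: int, left: int, obj_depth: float) -> List[List[Pixel]]:
    """Draw `sprite` at depth `obj_depth`, hidden wherever the scene is nearer."""
    out = [row[:] for row in image]
    for dy, sprite_row in enumerate(sprite):
        for dx, color in enumerate(sprite_row):
            y, x = top + dy, left + dx
            if color is None or not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue
            if obj_depth < out[y][x][3]:  # z-test: object is in front of the scene here
                out[y][x] = (color[0], color[1], color[2], obj_depth)
    return out


# Usage: person at d = 1.0, background at d = 4.0; the ball goes in between.
scene = [[(200, 150, 120, 1.0), (30, 30, 90, 4.0)]]
ball_depth = depth_between(1.0, 4.0)
print(composite_with_depth(scene, [[(255, 0, 0), (255, 0, 0)]], 0, 0, ball_depth))
# prints: [[(200, 150, 120, 1.0), (255, 0, 0, 2.5)]]
```

The person's pixel keeps its color because it is nearer than the ball, while the background pixel is overwritten, which is exactly the "between the two positions" ordering.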
[0021]
Further, the extraction means may extract a third position, the three-dimensional position of an imaging target in the first image, and the generation means generates the second image by compositing the image of an object into the first image. Based on the movement of the object and of the imaging target, the generation means determines that the object has collided with the imaging target when the object reaches the third position. It then controls the motion and expression of the object to correspond to the collision when compositing the object's image into the first image and, when a collision is determined, also composites an effect expression of the collision into the first image.
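The collision rule above can be sketched as a per-frame update, assuming 3-D positions in a common coordinate frame and treating "reaching the third position" as coming within a fixed radius of it. Reversing the velocity and returning a `"star"` effect are illustrative simplifications of "controlling the motion and expression to correspond to the collision"; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class VirtualObject:
    pos: Vec3
    vel: Vec3


def step(obj: VirtualObject, target: Vec3, radius: float, dt: float = 1.0) -> List[str]:
    """Advance one frame; on collision, bounce back and request a 'star' effect."""
    obj.pos = tuple(p + v * dt for p, v in zip(obj.pos, obj.vel))
    dist_sq = sum((a - b) ** 2 for a, b in zip(obj.pos, target))
    if dist_sq <= radius ** 2:                 # reached the target's 3-D position
        obj.vel = tuple(-v for v in obj.vel)   # simplest reaction: reverse direction
        return ["star"]                        # effect expression to composite now
    return []


# Usage: a ball flies toward a person standing at depth 2.0.
ball = VirtualObject(pos=(0.0, 0.0, 5.0), vel=(0.0, 0.0, -1.0))
person = (0.0, 0.0, 2.0)
print([step(ball, person, radius=0.5) for _ in range(3)])  # prints: [[], [], ['star']]
```

Returning the requested effects instead of drawing them keeps collision detection separate from rendering; the same hook could also trigger the collision sound of the fifth modification.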
[0022]
Further, the extraction means may extract a third position, the three-dimensional position of an imaging target in the first image, and the generation means generates the second image by compositing the image of an object into the first image. In doing so, it controls the motion and expression of the object based on the third position of the imaging target and a fourth position, the three-dimensional position of the object in the first image.
[0023]
Further, the extraction means may extract, from the first image, the image of each of the plurality of imaging targets and the image of the background region behind them, and the generation means generates the second image by processing at least one of the background region image and the images of the imaging targets.
[0024]
(2) The second acquisition means comprises means for acquiring a third image that corresponds to the first image and contains depth information in each pixel value, and means for extracting, from the third image, the depth information corresponding to each region of the first image. The extraction means extracts at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship among them from the first image and the depth information corresponding to each region of the first image.
[0025]
(3) The second acquisition means comprises: light emitting means for emitting light toward the imaging target; first light receiving means for receiving a first light amount, which includes the light emitted by the light emitting means and reflected by the imaging target; second light receiving means for receiving a second light amount, which does not include that reflected light, while the light emitting means is not emitting; means for generating a third image containing depth information in each pixel value by subtracting the second light amount from the first, thereby extracting the reflected-light component of the first light amount; and means for calculating, from the third image, the depth information corresponding to each region of the first image. The extraction means extracts at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship among them based on the first image and the depth information corresponding to each of its regions.
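The subtraction step in (3) can be sketched per pixel, assuming the two light amounts are available as equally sized integer arrays captured with the emitter on and off. Clamping negative differences to zero (for example, from sensor noise or motion between the two exposures) is an assumption. This passage does not specify how the reflected intensity is then converted into depth, so the sketch stops at the reflected-light component.

```python
from typing import List

Image = List[List[int]]


def reflected_component(lit: Image, unlit: Image) -> Image:
    """Per-pixel (emitter on) - (emitter off): removes ambient light, keeps reflection.

    Negative differences are clamped to 0 (assumption: treat them as noise).
    """
    if len(lit) != len(unlit) or any(len(a) != len(b) for a, b in zip(lit, unlit)):
        raise ValueError("both exposures must have the same size")
    return [[max(a - b, 0) for a, b in zip(row_on, row_off)]
            for row_on, row_off in zip(lit, unlit)]


print(reflected_component([[120, 40], [30, 30]], [[20, 45], [30, 10]]))
# prints: [[100, 0], [0, 20]]
```

Because ambient light (natural and illumination light) appears in both exposures, the difference isolates the light contributed by the emitter alone.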
[0026]
(4) When the generation means generates the second image by compositing into the first image an additional image representing an effect expression such as an impact, or representing an object, the device may further comprise storage means for storing a plurality of types of additional images to be composited into the first image, and means for selecting, from the additional images stored in the storage means, the additional image to be composited into the first image. The user can then freely choose a desired additional image and add to the first image a special effect that suits the user's taste.
[0027]
(5) When the generation means generates the second image by compositing into the first image an additional image representing an effect expression such as an impact, or representing an object, the device may further comprise: storage means for storing a plurality of types of additional images to be composited into the first image; means for extracting from the first image, as features of the scene, at least one of the objects appearing in it, its color atmosphere, and its composition; and means for selecting, based on the scene features, the additional image to be composited into the first image from the stored additional images. This makes it possible to add special effects suited to the objects, color atmosphere, composition, and so on of the first image.
[0028]
(6) The present invention also provides a server device in a communication system that includes a plurality of terminal devices capable of communicating with each other and the server device, which is communicably connected to them. The server device comprises: receiving means for receiving, from one of the terminal devices, a first image containing a plurality of imaging targets and depth information corresponding to each of a plurality of regions of the first image, each region consisting of one or more pixels; extraction means for extracting, from the received first image and depth information, at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship among them; generation means for generating a second image by compositing an advertisement image into the first image based on at least one of the three-dimensional shapes and three-dimensional positional relationship acquired by the extraction means; and transmitting means for transmitting the second image to another of the terminal devices. An advertisement image can thus be composited into the image exchanged between terminal devices (the first image, including moving and still images) in accordance with three-dimensional features such as the shapes and positional relationship of the imaging targets in it, so that the advertisement does not obstruct those targets. The advertisement image can therefore be composited effectively, improving the advertising effect.
[0029]
Alternatively, the server device comprises: receiving means for receiving, from one of the terminal devices, at least a first image containing a plurality of imaging targets together with feature information extracted from depth information corresponding to each of a plurality of regions of the first image (each region consisting of one or more pixels), the feature information including at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship among them; generation means for generating a second image by compositing an advertisement image into the first image based on the received feature information; and transmitting means for transmitting the second image to another of the terminal devices.
[0030]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0031]
(1st Embodiment)
First, a first embodiment of the present invention will be described.
[0032]
<Overall configuration>
FIG. 1 shows a configuration example of the main part of a terminal device according to the first embodiment, which is one of a plurality of terminal devices capable of communicating with each other.
[0033]
The image communication device shown in FIG. 1 comprises: an image acquisition unit 1 that acquires an image of the imaging target and the depth information of that image; a feature extraction unit 2 that extracts three-dimensional features of the imaging targets in the image, such as their shapes, based on the depth information of the image acquired by the image acquisition unit 1; a processing unit 3 that, based on the features extracted by the feature extraction unit 2, applies special effects such as image compositing that takes three-dimensional elements into account to the image acquired by the image acquisition unit 1; an image presentation unit 4 that presents the image resulting from the processing by the processing unit 3, or an image received via the communication unit 5; and a communication unit 5 that transmits images to and receives images from other terminal devices.
[0034]
Among a plurality of terminal devices having the configuration shown in FIG. 1, each terminal sends the image obtained by processing its acquired image with the processing unit 3 to other terminal devices and receives such images from them, which enables communication between the users of the terminal devices.
[0035]
<Image acquisition unit>
First, the image acquisition unit 1 will be described.
[0036]
The image acquisition unit 1 acquires, for example, a color image as the image of the imaging target together with the depth information of that image, and outputs the result as a color image that includes depth information reflecting the three-dimensional shape of the captured object and the distance from the image acquisition unit 1 to the object (referred to here as a depth color image).
[0037]
Here, the case where a color image is acquired as the image of the imaging target is described as an example, but the invention is not limited to this; a grayscale image (a so-called black-and-white image) may be used instead. In that case, the image acquisition unit 1 acquires the imaging target as a depth monochrome image reflecting its three-dimensional shape and its distance from the image acquisition unit 1.
[0038]
In general, a color image consists of units called pixels, each storing color information such as RGB values, as shown in FIG. 6B, and the pixels are arranged in a two-dimensional array, vertically and horizontally. For example, a VGA (Video Graphics Array) size color image is a two-dimensional array of 640 pixels in the x-axis (horizontal) direction by 480 pixels in the y-axis (vertical) direction, and each pixel stores color information, for example in RGB format. FIG. 6B shows the color information (for example, (R, G, B) = (r1, g1, b1)) stored in each pixel of a color image of 5 pixels in the x-axis direction by 8 pixels in the y-axis direction.
[0039]
In a depth color image, each pixel is associated not only with color information such as the RGB values described above, but also with depth information d at that position (for example, the distance from the image acquisition unit 1 to the imaging target). This association can be made, for example, by appending the depth information d of each pixel to its (R, G, B) values, giving (R, G, B, d).
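Assuming a color image and a same-sized depth map stored as nested lists, forming the (R, G, B, d) pixels described above is a per-pixel zip, and discarding d gives back an ordinary color image. The helper names `to_depth_color` and `drop_depth` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBD = Tuple[int, int, int, float]


def to_depth_color(color: List[List[RGB]], depth: List[List[float]]) -> List[List[RGBD]]:
    """Attach the depth value d of each pixel to its (R, G, B) triple."""
    if len(color) != len(depth) or any(len(c) != len(d) for c, d in zip(color, depth)):
        raise ValueError("color image and depth map must have the same size")
    return [[(r, g, b, d) for (r, g, b), d in zip(c_row, d_row)]
            for c_row, d_row in zip(color, depth)]


def drop_depth(image: List[List[RGBD]]) -> List[List[RGB]]:
    """Ignore d: what remains is an ordinary color image."""
    return [[(r, g, b) for r, g, b, _ in row] for row in image]


rgbd = to_depth_color([[(10, 20, 30), (40, 50, 60)]], [[0.5, 2.0]])
print(rgbd)  # prints: [[(10, 20, 30, 0.5), (40, 50, 60, 2.0)]]
```

Keeping d inside each pixel tuple means any code that reads only the first three channels treats the depth color image as a normal color image.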
[0040]
When a monochrome image is used instead of a color image, each pixel value represents a gray level. In that case, each pixel of the depth monochrome image stores the depth information d for that pixel in addition to its gray level.
[0041]
The description here assumes that each pixel of the depth color image holds depth information d. However, the invention is not limited to this; a region consisting of several pixels may be defined in advance as a unit region, and the depth information d may be associated with, or held for, each unit region.
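For the unit-region variant, one simple way to derive a per-region value from per-pixel depths is to average each k-by-k block. The averaging rule, the square block shape, and the name `block_depth` are assumptions, since the text only states that depth is associated with each predefined region.

```python
from typing import List


def block_depth(depth: List[List[float]], k: int) -> List[List[float]]:
    """Mean depth of each k-by-k unit region (regions at the edges may be smaller)."""
    if k <= 0 or not depth or not depth[0]:
        raise ValueError("need k > 0 and a non-empty depth map")
    height, width = len(depth), len(depth[0])
    regions = []
    for top in range(0, height, k):
        row = []
        for left in range(0, width, k):
            values = [depth[y][x]
                      for y in range(top, min(top + k, height))
                      for x in range(left, min(left + k, width))]
            row.append(sum(values) / len(values))
        regions.append(row)
    return regions


print(block_depth([[1, 1, 3, 3], [1, 1, 3, 5]], k=2))  # prints: [[1.0, 3.5]]
```

A median or minimum could be substituted for the mean if edge pixels straddling two objects should not blur the region's depth.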
[0042]
Even though each pixel of the depth color image carries depth information in addition to color information such as RGB, a depth color image displayed without using that depth information looks exactly like a normal color image. Using the depth information of each pixel, the color image can be displayed three-dimensionally.
[0043]
FIG. 2 compares a normal monochrome image with a depth monochrome image. FIG. 2A shows a normal monochrome image, in which each pixel of a two-dimensional array stores a gray level. FIG. 2B shows an example of a depth monochrome image, rendered three-dimensionally using the depth information (viewed from slightly below the front to make the three-dimensional representation easier to see). Depth monochrome images and depth color images thus differ from conventional monochrome and color images in that they contain, as depth information, the distance d from the image acquisition unit 1 to the imaging target.
[0044]
Here, images in which depth information is associated with each pixel of a normal color image or a normal monochrome image are also referred to as a color image including depth information and a monochrome image including depth information, respectively. The depth color image and the depth monochrome image are specific examples of these, respectively.
[0045]
Next, the configuration of the image acquisition unit 1 will be described in detail with reference to FIG. 3, which is a functional block diagram of the image acquisition unit 1. The image acquisition unit 1 mainly comprises an image information acquisition unit 101 that acquires, in real time, image information including color information of the shooting target; a depth information acquisition unit 102 that acquires depth information of the shooting target in real time; an imaging operation control unit 104; a depth color image generation unit 113; and an output unit 114.
[0046]
The image information acquisition unit 101 includes: a natural light image capturing unit 103 that images the shooting target (for example, an object) under natural light (including illumination light) to acquire a natural light image, that is, a color image of the object including the background; a natural light image storage unit 105 that stores the natural light image captured by the natural light image capturing unit 103; a natural light image adjustment unit 106 that reads out the natural light image stored in the natural light image storage unit 105 and adjusts its brightness and contrast as necessary; and an image information output unit 107 that outputs the natural light image adjusted by the natural light image adjustment unit 106 to the processing unit 3 in a data format suitable for processing there. The natural light image adjusted by the natural light image adjustment unit 106 is also output to the depth color image generation unit 113.
[0047]
The natural light image capturing unit 103 is, for example, a CCD or CMOS image sensor, and acquires a color planar image of the shooting target (for example, an object) including the background in real time. This makes it possible to obtain color as one of the attributes of the object. In addition, since color images are captured continuously, motion information of the object can be grasped from changes between successive frames, and the image of the object can be handled as a moving image.
[0048]
The imaging operation control unit 104 generates a control signal for controlling the operations of the natural light image imaging unit 103 and the reflected light image imaging unit 108.
[0049]
The depth information acquisition unit 102, on the other hand, includes: a reflected light image capturing unit 108 that, in accordance with a control signal from the imaging operation control unit 104, captures a reflected light image of the shooting target, such as the object and subjects around it; a reflected light image storage unit 109 that stores the reflected light image captured by the reflected light image capturing unit 108; a reflected light image correction unit 110 that reads out the reflected light image stored in the reflected light image storage unit 109 and applies various corrections; a parameter storage unit 111 that stores parameters for correcting the reflected light image; and a depth information calculation unit 112 that analyzes the reflected light image corrected by the reflected light image correction unit 110 and calculates depth information of the object. The depth information calculated by the depth information calculation unit 112 is output to the depth color image generation unit 113.
[0050]
The reflected light image capturing unit 108 will now be described in detail with reference to FIG. 4, which is a functional block diagram of the reflected light image capturing unit 108. The reflected light image capturing unit 108 includes a light emitting unit 114, a light receiving optical system 115, a reflected light extraction unit 116, and a timing control unit 117.
[0051]
The light emitting unit 114 emits light whose intensity varies over time in accordance with a timing signal generated by the timing control unit 117. This light is directed at the object in front of the light emitting unit 114, for example the user's head (an example of an object) in FIG. The light used here is near-infrared light, but it is not limited to this; light in other wavelength ranges, such as visible light, can also be used.
[0052]
The reflected light, for example from the user's head, is condensed by the light receiving optical system 115, composed of a lens or the like, and forms an image on the light receiving surface of the reflected light extraction unit 116. The light receiving optical system 115 is provided with a filter that passes near-infrared light; this filter blocks external light other than near-infrared light, such as visible light and far-infrared light.
[0053]
The reflected light extraction unit 116 extracts the spatial intensity distribution of the reflected light forming this image. This intensity distribution can be regarded as an image formed by the reflected light, namely the reflected light image (silhouette image). To achieve this, the reflected light extraction unit 116 includes a first light receiving unit 119, a second light receiving unit 120, and a difference calculation unit 121. The first light receiving unit 119 and the second light receiving unit 120 receive light at different timings. The timing control unit 117 controls the operation timing so that the light emitting unit 114 emits light while the first light receiving unit 119 is receiving and does not emit light while the second light receiving unit 120 is receiving. As a result, the first light receiving unit 119 receives the light from the light emitting unit 114 reflected by the object together with natural light (that is, external light such as sunlight and illumination light), whereas the second light receiving unit 120 receives only natural light. Because the two reception timings, although different, are close together, any change in natural light between them can be ignored.
[0054]
Therefore, by taking the difference between the image received by the first light receiving unit 119 and the image received by the second light receiving unit 120, only the component of the light from the light emitting unit 114 that is reflected by the object is extracted, and a reflected light image can be generated. The difference calculation unit 121 calculates and outputs this difference between the images received by the first light receiving unit 119 and the second light receiving unit 120.
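The difference operation can be sketched in a few lines. This is a minimal illustration, not the device's implementation: the two frames are assumed to be plain 2-D lists of intensities, `reflected_light_image` is a hypothetical name, and clamping negative differences (sensor noise) to zero is an added assumption.

```python
def reflected_light_image(lit, unlit):
    """Hypothetical sketch of the difference calculation unit 121.

    lit   -- frame from the first light receiving unit 119
             (emitter on: reflected emitter light + natural light)
    unlit -- frame from the second light receiving unit 120
             (emitter off: natural light only)
    Negative differences (noise) are clamped to 0 -- an assumption.
    """
    return [
        [max(a - b, 0) for a, b in zip(row_lit, row_unlit)]
        for row_lit, row_unlit in zip(lit, unlit)
    ]


lit = [[120, 80], [200, 60]]
unlit = [[100, 75], [40, 62]]
print(reflected_light_image(lit, unlit))  # [[20, 5], [160, 0]]
```

The natural-light component common to both frames cancels, leaving only the emitter's reflection, which is why the two exposures must be close in time.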
[0055]
The reflected light from the object decreases sharply as the distance between the object and the reflected light image capturing unit 108 increases. When the object surface scatters light uniformly, the amount of light received per pixel of the reflected light image decreases in inverse proportion to the square of the distance d to the object. That is, if the pixel value at coordinates (i, j) of the reflected light image is Q(i, j), then

$$Q(i, j) = \frac{K}{d^{2}} \tag{1}$$

Here, K is a coefficient adjusted so that, for example, Q(i, j) = 255 when d = 0.5 m. Solving equation (1) for d gives the distance.
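As a worked example of equation (1) with the calibration stated above (Q = 255 at d = 0.5 m), the coefficient and the value expected at 1 m follow directly; the 1 m case is an illustrative choice, not a figure from the source:

```latex
\begin{aligned}
K &= Q\,d^{2} = 255 \times (0.5\ \mathrm{m})^{2} = 63.75\ \mathrm{m^{2}} \\
Q\big|_{d = 1\ \mathrm{m}} &= \frac{63.75}{1^{2}} = 63.75 = \tfrac{1}{4} \times 255 \\
d &= \sqrt{K / Q}, \qquad Q = 255 \;\Rightarrow\; d = \sqrt{63.75 / 255} = \sqrt{0.25} = 0.5\ \mathrm{m}
\end{aligned}
```

Doubling the distance therefore reduces the received intensity to a quarter.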
[0056]
As described above, each pixel of the reflected light image holds, as its pixel value, the intensity of the reflected light, which can be converted into the distance from the reflected light image capturing unit 108 to the imaging target; that is, each pixel contains depth information.
[0057]
Since the intensity of the reflected light from the object can be converted into the distance from the reflected light image capturing unit 108 to the object, the three-dimensional shape of the object can be grasped. Moreover, the reflected light from the background is almost negligible, so a reflected light image of the object and its surroundings can be obtained with the background cut away.
[0058]
FIG. 6A shows an example of a reflected light image. For simplicity, FIG. 6A shows a 5 × 8 pixel portion of a 256 × 256 pixel reflected light image. The value of each pixel is the intensity of the reflected light.
[0059]
FIG. 5 shows a three-dimensional rendering of a hand, obtained from the depth information contained in the reflected light image captured by the reflected light image capturing unit 108 with a human hand as the imaging target.
[0060]
The reflected light image can be obtained, for example, by an apparatus having the following configurations (1) to (3).
(1) Timing signal generating means for generating a pulse signal or a modulation signal that is constant or changes over time.
(2) Light emitting means for emitting light whose intensity changes based on the signal generated by the timing signal generating means.
(3) Reflected light extracting means that, in synchronization with the signal from the timing signal generating means, separates the reflected light of the light emitted from the light emitting means from external light (natural light) and detects a reflected light image.
[0061]
The above components of the reflected light image capturing unit 108 are described in more detail in JP-A-10-177449, filed by the same applicant as the present application.
[0062]
Since the reflected light image capturing unit 108 captures reflected light images of the object continuously, motion information of the object can be grasped from changes between successive frames, and the image of the object can thus also be handled as a moving image.
[0063]
As described later, a depth color image is generated by combining the natural light image (a color planar image of the object including the background) with the depth information calculated from the reflected light image of the object. The pixels of the reflected light image must therefore be made to correspond to the pixels of the natural light image, so the reflected light image capturing unit 108 and the natural light image capturing unit 103 are preferably arranged close to each other. From this viewpoint, the natural light image capturing unit 103, the reflected light image capturing unit 108, and the imaging operation control unit 104 are preferably integrated on a single chip as needed.
[0064]
In addition, a correspondence must be established between the natural light image and the reflected light image, for example pixel by pixel or per region of a plurality of pixels. Suppose, for example, that the natural light image (color image) acquired by the image information acquisition unit 101 is as shown in FIG. 6B and the reflected light image acquired by the depth information acquisition unit 102 is as shown in FIG. 6A. In this case, each pixel of the color image corresponds one-to-one to a pixel of the reflected light image. When each pixel in FIGS. 6A and 6B is denoted by its coordinates (i, j) (where i = 1 to 5, j = 1 to 8), the pixel P2(i, j) of the reflected light image in FIG. 6A corresponds to the pixel P1(i, j) of the natural light image in FIG. 6B. For example, the pixel P2(5, 8) in FIG. 6A corresponds to the pixel P1(5, 8) in FIG. 6B. Accordingly, the pixel value of the pixel P3(i, j) of the depth color image contains the color information of the pixel P1(i, j) of the natural light image and the distance (depth information) d calculated by the depth information calculation unit 112 from the pixel P2(i, j) of the reflected light image.
[0065]
The correspondence is not limited to one between pixels of the natural light image and pixels of the reflected light image. The natural light image may be divided into a plurality of unit regions each consisting of a plurality of pixels, with each unit region associated with a pixel of the reflected light image. Conversely, the reflected light image may be divided into a plurality of unit regions each consisting of a plurality of pixels, with each unit region associated with a pixel of the natural light image. Further, both the reflected light image and the natural light image may be divided into unit regions of a plurality of pixels, and their unit regions may be associated with each other.
[0066]
In addition, when an imaging instruction is input by the user, for example, the imaging operation control unit 104 outputs a control signal to the natural light image capturing unit 103 and the reflected light image capturing unit 108 so that they acquire a natural light image and a reflected light image almost simultaneously.
[0067]
Next, the correction performed by the reflected light image correction unit 110 shown in FIG. 3 will be described. The correction takes into account the color and the reflection characteristics of the object. The intensity of the reflected light from the object depends on factors other than the distance between the object and the reflected light image capturing unit 108, so a distance (that is, a three-dimensional shape) obtained directly from the reflected light image may be inaccurate. For example, when the object surface is black, the intensity of the reflected light decreases. When the object surface has a large specular reflection component, strong reflected light is produced where the surface normal is close to the direction of the light source.
[0068]
Therefore, before obtaining distance information from the reflected light intensity of the object, the reflected light image correction unit 110 corrects the reflected light image of the object by referring to parameters relating to the surface color and reflection characteristics of the object, stored in advance in the parameter storage unit 111, and to the natural light image stored in the natural light image storage unit 105. The correction is based on the pixel value of the pixel (or unit region) of the natural light image that corresponds to each pixel (or unit region) of the reflected light image. For example, if the color information of the natural light image pixel corresponding to a certain pixel of the reflected light image is “black”, the pixel value of the reflected light image is corrected using the parameter for reflected light received from a black imaging target.
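The color-dependent correction can be sketched as a per-pixel rescaling. The color classes, the reflectance values, and the darkness threshold below are hypothetical stand-ins for the parameters held in the parameter storage unit 111; only the “black surface” case is modeled, and specular reflection is not handled.

```python
# Hypothetical contents of the parameter storage unit 111: relative
# reflectance per coarse color class (1.0 = reference surface).
REFLECTANCE = {"black": 0.25, "other": 1.0}


def color_class(rgb, dark_threshold=40):
    """Classify a natural-light pixel as 'black' if all channels are dark."""
    return "black" if max(rgb) < dark_threshold else "other"


def correct_reflected(q_image, rgb_image):
    """Divide each reflected-light value by the reflectance of the color class
    of the corresponding natural-light pixel, so that a dark surface is not
    mistaken for a distant one."""
    return [
        [q / REFLECTANCE[color_class(rgb)] for q, rgb in zip(q_row, rgb_row)]
        for q_row, rgb_row in zip(q_image, rgb_image)
    ]


q = [[16.0, 64.0]]
rgb = [[(10, 12, 8), (200, 180, 150)]]
print(correct_reflected(q, rgb))  # [[64.0, 64.0]]
```

After correction the black and the light surface yield the same value, so equation (2) assigns them the same distance.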
[0069]
By correcting each pixel value of the reflected light image obtained by the reflected light image capturing unit 108 in this way, the accuracy of the depth information calculated by the subsequent depth information calculation unit 112 can be improved.
[0070]
For each pixel of the reflected light image corrected by the reflected light image correction unit 110, the depth information calculation unit 112 obtains the distance d corresponding to that pixel from, for example, the following equation (2), where Q is the amount of received light (reflected light intensity) given by the pixel value.
[0071]
$$d = \sqrt{\frac{K}{Q}} \tag{2}$$
That is, as described above, since the intensity value of the reflected light from the object can be converted into the distance d from the reflected light image capturing unit 108 to the object, this distance d is obtained as depth information.
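Applying equation (2) to a whole image is a per-pixel operation. The sketch below assumes the calibration K = 255 × 0.5² from equation (1) and, as an added convention, maps pixels with no reflected light (Q = 0, e.g. background) to infinity; `depth_map` is a hypothetical name.

```python
import math

K = 255 * 0.5 ** 2  # calibration from equation (1): Q = 255 at d = 0.5 m


def depth_map(q_image, k=K):
    """Equation (2), d = sqrt(K / Q), for every pixel of the reflected-light
    image. Pixels with Q <= 0 (no reflected light) are mapped to math.inf."""
    return [[math.sqrt(k / q) if q > 0 else math.inf for q in row] for row in q_image]


print(depth_map([[255, 63.75, 0]]))  # [[0.5, 1.0, inf]]
```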
[0072]
Here, the distance d is obtained as depth information for each pixel of the reflected light image, but the present invention is not limited to this case. For example, the reflected light image may be divided into unit regions of a plurality of pixels, and the distance d may be calculated for each unit region from the average of the pixel values in that region.
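The unit-region variant, in which one distance is computed from the average pixel value of a block of several pixels, might look as follows. Square blocks whose size divides the image dimensions, and the calibration K = 63.75 from equation (1), are assumptions made for the sketch.

```python
import math


def unit_region_depth(q_image, block, k=63.75):
    """Split the reflected-light image into block x block unit regions and
    compute one distance per region from the mean reflected-light value,
    using d = sqrt(K / Q) (equation (2)). Assumes the image dimensions are
    multiples of `block`."""
    rows, cols = len(q_image), len(q_image[0])
    result = []
    for top in range(0, rows, block):
        out_row = []
        for left in range(0, cols, block):
            values = [q_image[r][c]
                      for r in range(top, top + block)
                      for c in range(left, left + block)]
            mean_q = sum(values) / len(values)
            out_row.append(math.sqrt(k / mean_q) if mean_q > 0 else math.inf)
        result.append(out_row)
    return result


q = [[250, 260, 60, 68],
     [255, 255, 64, 63]]
print(unit_region_depth(q, 2))  # [[0.5, 1.0]]
```

Averaging trades spatial resolution for lower noise in each distance estimate.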
[0073]
The depth color image generation unit 113 generates a depth color image from the natural light image output from the natural light image adjustment unit 106 and the depth information of each pixel (or each unit region) of the reflected light image calculated by the depth information calculation unit 112.
[0074]
That is, each pixel value of the depth color image is generated, for example, by adding to the pixel value of each pixel of the color image shown in FIG. 6B the depth information calculated from the corresponding pixel of the reflected light image. For example, the pixel P2(5, 8) of the reflected light image in FIG. 6A corresponds to the pixel P1(5, 8) of the natural light image in FIG. 6B. Therefore, the pixel value of the pixel P3(5, 8) of the depth color image is generated by adding the distance (depth information) d, calculated by the depth information calculation unit 112 from the pixel P2(5, 8) of the reflected light image, to the pixel value of the pixel P1(5, 8) of the natural light image.
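For the one-to-one case, forming P3(i, j) from P1(i, j) and the distance derived from P2(i, j) is a simple per-pixel merge into an (R, G, B, d) tuple. The sketch also shows that dropping d yields an ordinary color image for display; the function names are hypothetical, and the depth map is assumed to have been computed already (for example with equation (2)).

```python
def make_depth_color_image(rgb_image, depth_image):
    """One-to-one correspondence: P3(i, j) = (R, G, B, d), where (R, G, B)
    comes from P1(i, j) of the natural-light image and d is the distance
    computed from P2(i, j) of the reflected-light image."""
    return [
        [(*rgb, d) for rgb, d in zip(rgb_row, d_row)]
        for rgb_row, d_row in zip(rgb_image, depth_image)
    ]


def as_color_image(depth_color_image):
    """Ignoring d gives back an ordinary color image for display."""
    return [[pixel[:3] for pixel in row] for row in depth_color_image]


rgb = [[(255, 0, 0), (0, 0, 255)]]
depth = [[0.5, 1.0]]
rgbd = make_depth_color_image(rgb, depth)
print(rgbd)                  # [[(255, 0, 0, 0.5), (0, 0, 255, 1.0)]]
print(as_color_image(rgbd))  # [[(255, 0, 0), (0, 0, 255)]]
```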
[0075]
When the natural light image is divided into unit regions each consisting of a plurality of pixels and each unit region is associated with a pixel of the reflected light image, each unit region of the natural light image is associated with the depth information d calculated from the corresponding pixel of the reflected light image (for example, that d is added to the pixel value of every pixel in the unit region), and each pixel value of the depth color image is generated in this way.
[0076]
Conversely, when the reflected light image is divided into unit regions each consisting of a plurality of pixels and each unit region is associated with a pixel of the natural light image, each pixel value of the depth color image is generated by adding to the pixel value of each pixel of the natural light image the depth information calculated from the corresponding unit region of the reflected light image.
[0077]
Further, when both the reflected light image and the natural light image are divided into unit regions each consisting of a plurality of pixels and their unit regions are associated with each other, each unit region of the natural light image is associated with the depth information d calculated from the corresponding unit region of the reflected light image (for example, that d is added to the pixel value of every pixel in the unit region), and each pixel value of the depth color image is generated in this way.
[0078]
The depth color image generated in this way is sent to the output unit 114, converted as necessary to conform to the protocol or data format of the output destination, and output to the feature extraction unit 2.
[0079]
The configuration of the image acquisition unit 1 described above is merely an example and is not limiting. In particular, depth information need not be acquired from a reflected light image as described above. For example, depth information may be calculated from the parallax between images captured from a plurality of viewpoints, obtained by a stereo matching method, or measured by the method known as a laser range finder, which irradiates stripe-shaped laser light and uses the distortion of its shape. Depth information may also be obtained by methods other than these, using any device capable of producing the depth color image described above.
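For the parallax-based alternative, the standard relation for a rectified two-camera pair is Z = f · B / disparity. The sketch below is that textbook relation, not something specified in this document; the focal length (in pixels), baseline, and disparity values are hypothetical, and finding the disparity itself (stereo matching) is not shown.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Rectified stereo: Z = f * B / disparity.
    Returns depth in the units of the baseline; zero disparity means the
    point is effectively at infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_mm / disparity_px


print(depth_from_disparity(42, focal_px=700, baseline_mm=60))  # 1000.0
```

Depth resolution falls off with distance, since distant points produce small disparities.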
[0080]
Further, although the image acquisition unit 1 described above uses the depth color image generation unit 113 to generate an image that includes depth information in each pixel (a depth color image), it is not limited to this case; the depth information corresponding to each pixel may simply be associated with that pixel, and the correspondence retained or stored.
[0081]
<Feature extraction unit>
Next, the feature extraction unit 2 will be described. The object of processing here is the depth color image, whose pixel values include depth information, obtained by the image acquisition unit 1.
[0082]
The feature extraction unit 2 extracts three-dimensional features of the shooting target based on the depth information contained in the depth color image acquired by the image acquisition unit 1.
[0083]
The three-dimensional features analyzed by the feature extraction unit 2 will now be described concretely using the depth color image shown in FIG. 7. FIG. 7 shows the image displayed without using the depth information contained in the depth color image, so it looks the same as a normal color image (shown in black and white in FIG. 7). FIG. 7 shows the upper body of a person sitting on a chair in the center of the scene. The person has raised the right arm and is pointing the index finger upward. A can of juice is placed in front of the person.
[0084]
FIG. 8 schematically shows the positional relationship between the objects in this scene. As shown in FIG. 8, the can is closest to the image acquisition unit 1 (see FIG. 12), the person is beyond the can (see FIG. 10), and the background lies farther still (see FIG. 9). Examining the depth information within the person in more detail, the right arm is closer to the image acquisition unit 1 than the torso (see FIG. 11). That is, the scene can be divided into several regions according to depth, as shown in FIGS. 9 to 12.
[0085]
By analyzing the depth information contained in the depth color image acquired by the image acquisition unit 1 in this way, it is possible to determine what shooting targets (for example, objects) exist in the scene (that is, into what regions the scene can be divided), the relief of the scene in the depth direction, the three-dimensional positional relationship of the objects in the scene (the parts of a single object are also each regarded as objects), the three-dimensional shape of each object, and so on.
[0086]
Now, a feature analysis method will be described.
[0087]
The simplest analysis obtains only the three-dimensional shape, without identifying what the objects captured in the scene are. For example, for the scene shown in FIG. 7, it extracts the relief of the whole scene in the depth direction: the lower center of the screen (actually the can) has the nearest depth values, the lower left of the screen (actually the person's right arm) is next nearest, a large central area of the screen (actually the person) follows, and the remainder (actually the background) is very far away.
[0088]
Further, the feature extraction unit 2 can perform more complicated analysis if necessary.
[0089]
FIG. 13 schematically shows the depth information contained in the depth color image shown in FIG. 7. FIG. 13 shows the scene of FIG. 7 viewed from above, with only the parts for which depth information is obtained drawn as solid lines. As this shows, a depth color image yields depth information only for the parts visible from the shooting direction (there is no data for parts that cannot be seen from the shooting direction; in this example, for instance, the back sides of the can and the person, and the part of the person hidden behind the right arm). The three-dimensional positional relationship is obtained from this depth information.
[0090]
The simplest method is to distinguish foreground from background based on the depth information d, that is, the distance from the image acquisition unit 1. A threshold TH is chosen; parts whose depth is nearer than TH are regarded as foreground, and parts farther than TH as background (see FIG. 14). This distinguishes the objects appearing in the scene from the background, making it possible to determine whether any object (that is, a part corresponding to the foreground) exists in the scene being photographed.
[0091]
A plurality of thresholds may also be prepared. For example, using a first threshold x1, a second threshold x2, and a third threshold x3, the scene appearing in a depth color image can be divided into four regions: nearer than x1, between x1 and x2, between x2 and x3, and beyond x3. Whether an object exists, and what it is, may then be identified for each region.
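Both thresholding schemes can be sketched on a depth map stored as a 2-D list. The threshold values and the band numbering are illustrative assumptions; `bisect.bisect_right` returns how many of the sorted thresholds x1 < x2 < x3 are less than or equal to d, which is exactly the band index.

```python
import bisect


def foreground_mask(depth, th):
    """Single threshold TH: True (foreground) where d < TH, else background."""
    return [[d < th for d in row] for row in depth]


def depth_bands(depth, thresholds):
    """Sorted thresholds x1 < x2 < x3 split the scene into bands 0..3:
    0: d < x1, 1: x1 <= d < x2, 2: x2 <= d < x3, 3: d >= x3."""
    return [[bisect.bisect_right(thresholds, d) for d in row] for row in depth]


scene = [[0.4, 0.9, 1.6, 5.0]]  # hypothetical distances in meters
print(foreground_mask(scene, th=2.0))       # [[True, True, True, False]]
print(depth_bands(scene, [0.6, 1.2, 2.0]))  # [[0, 1, 2, 3]]
```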
[0092]
Further, by detecting discontinuities in the depth direction, called jump edges (the dotted lines in FIG. 15), a more detailed positional relationship in the scene can be recognized, such as the “background”, the “person”, the “person's right arm”, and the “can”, as shown in FIG. 15.
[0093]
Various methods can be used to detect jump edges. For example, they can be obtained by applying an edge detection filter (typically, convolution with a Sobel operator or the like) to data consisting of only the depth part of the depth color image (depth information arranged in a two-dimensional array, like a reflected light image).
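Jump-edge detection with a Sobel operator can be sketched on the depth channel alone. This minimal version uses the common |Gx| + |Gy| approximation of the gradient magnitude, skips border pixels, and leaves the choice of `min_step` (a hypothetical parameter) to the caller; a real implementation would typically use an image-processing library instead of nested loops.

```python
def sobel_magnitude(depth):
    """Apply the 3x3 Sobel operator to a depth map and return |Gx| + |Gy|
    for interior pixels; border pixels are left at 0."""
    gx_k = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    gy_k = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [[depth[r + i - 1][c + j - 1] for j in range(3)] for i in range(3)]
            gx = sum(gx_k[i][j] * window[i][j] for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * window[i][j] for i in range(3) for j in range(3))
            out[r][c] = abs(gx) + abs(gy)
    return out


def jump_edges(depth, min_step):
    """Mark pixels whose depth gradient exceeds min_step as jump edges."""
    return [[g > min_step for g in row] for row in sobel_magnitude(depth)]


# Person at 1.0 m (left) against background at 3.0 m (right).
depth = [[1.0, 1.0, 3.0, 3.0] for _ in range(3)]
print(sobel_magnitude(depth)[1])      # [0.0, 8.0, 8.0, 0.0]
print(jump_edges(depth, 4.0)[1])      # [False, True, True, False]
```

A depth step is marked on the pixels at both sides of the discontinuity; thinning or non-maximum suppression can narrow it to one pixel if needed.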
[0094]
There is also a method called pattern matching, in which features of an object are registered in advance and parts of the image similar to those features are searched for. Using such a method, it is possible to recognize what each of the objects obtained above is. For example, if the shooting targets “can” and “person” and the person's “right hand” can be recognized in the scene shown in FIG. 7, together with their positional relationships, then, as can be seen from FIG., more complex recognition becomes possible, such as that the “can” is in front, the “person” is behind the “can”, and the “right hand” is raised.
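The document does not specify a matching criterion, so the sketch below uses one common choice, exhaustive search with the sum of absolute differences (SAD), on a small grayscale image. The data and function name are illustrative; the same search could equally be run on the depth channel with a registered depth template.

```python
def match_template(image, template):
    """Exhaustive template matching: return the (row, col) of the top-left
    corner where the sum of absolute differences to the template is smallest."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            sad = sum(abs(image[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            if best is None or sad < best:
                best, best_pos = sad, (r, c)
    return best_pos


image = [[0, 0, 0, 0],
         [0, 9, 8, 0],
         [0, 7, 9, 0]]
template = [[9, 8], [7, 9]]
print(match_template(image, template))  # (1, 1)
```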
[0095]
The analysis methods described above for extracting three-dimensional features from a depth color image are merely examples, and the present invention is not limited to them; feature extraction can also be realized by combining various other analysis, image processing, and image recognition techniques.
[0096]
The feature extraction unit 2 extracts three-dimensional features of the natural light image from the depth color image (that is, from each pixel value of the natural light image and the depth information obtained from the reflected light image). These three-dimensional features include, for example, the three-dimensional shape of each shooting target in the natural light image (including surface unevenness) and the positional relationship between a plurality of shooting targets (both in the image plane of the natural light image and in its depth direction, mainly front-to-back). From these, the image can be further divided into regions according to the depth position of each shooting target, for example into a foreground part, where a shooting target exists, and a background part (based on a predetermined threshold), and each shooting target can be recognized by pattern matching or the like.
[0097]
<Processing unit>
Next, the processing unit 3 will be described.
[0098]
Based on the three-dimensional features extracted by the feature extraction unit 2, the processing unit 3 applies (adds) special effects to the color image (or depth color image), which is the natural light image acquired by the image acquisition unit 1, taking into account three-dimensional features such as the three-dimensional shape and positional relationship of the objects.
[0099]
Specifically, a special effect is added by compositing an image rendered by CG (computer graphics), here also called an additional image, into the color image. In doing so, the three-dimensional features extracted by the feature extraction unit 2 are used: depth information such as the relief of the color image (scene), what objects exist in the color image (into what regions it can be divided), the positional relationship of the objects, the three-dimensional shape of each object, and so on. This makes it possible to determine the relationship between a virtual object and the scene, such as whether they collide, and to deform the virtual object as necessary before compositing it into the color image.
[0100]
Consider, as an example, compositing a virtual object, a “sphere” given as 3D (three-dimensional) CG data, into the color image (scene) shown in FIG. 7. FIG. 16 mainly shows the positional relationship in the depth direction of the objects captured in the color image of FIG. 7; as described above, such three-dimensional features of the scene are obtained from the feature extraction unit 2.
[0101]
Now consider a special effect in which the “sphere” moves from right to left across the screen at depth position C in FIG. 16. Since the three-dimensional position at which the virtual “sphere” is placed and its shape are known, the front-to-back relationship between the “sphere” and each object in the scene can be determined by comparing its three-dimensional position with the three-dimensional features of the scene (the depth value at each position of the scene).
[0102]
Thus, as shown in FIG. 17, the “sphere” can be composited so that it passes in front of the background (depth position D in FIG. 16) but behind the person (depth position B). In this way, a virtual object can be added three-dimensionally to a color image based on the features extracted by the feature extraction unit 2.
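The front-to-back decision amounts to a per-pixel depth test, as in a z-buffer. The sketch uses string labels in place of colors and made-up depths consistent with the ordering described for FIG. 16 (can 1.0 m, person 2.0 m, background 4.0 m); the sphere is treated as having a single depth per pixel, ignoring its own curvature.

```python
def composite(scene, scene_depth, obj, obj_depth):
    """Per-pixel depth test: a virtual-object pixel replaces the scene pixel
    only where the object is present (depth not None) and is nearer to the
    camera than the scene surface at that pixel."""
    return [
        [o if od is not None and od < sd else s
         for s, sd, o, od in zip(s_row, sd_row, o_row, od_row)]
        for s_row, sd_row, o_row, od_row in zip(scene, scene_depth, obj, obj_depth)
    ]


# Hypothetical scanline: background (4.0 m), person (2.0 m), background (4.0 m).
scene = [["bg", "person", "bg"]]
scene_depth = [[4.0, 2.0, 4.0]]
ball = [["ball", "ball", "ball"]]

# Sphere at 3.0 m, between person and background (like position C).
print(composite(scene, scene_depth, ball, [[3.0, 3.0, 3.0]]))
# [['ball', 'person', 'ball']]

# Sphere at 1.5 m, between can (1.0 m) and person (like position A).
print(composite([["person", "can"]], [[2.0, 1.0]], [["ball", "ball"]], [[1.5, 1.5]]))
# [['ball', 'can']]
```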
[0103]
Similarly, if the image of the virtual “sphere” is composited at depth position A in the color image so that it moves from right to left, a special effect is obtained in which the “sphere” passes behind the can and in front of the person.
[0104]
Going a step further, collisions between a virtual object and the objects in the color image can also be determined. As is clear from FIG. 16, the position of each object in the color image can be specified three-dimensionally, in the image plane and in the depth direction. More precise special effects can therefore be added based on the three-dimensional position specified for each object in the color image, one of which is a “collision” effect.
[0105]
Now consider a special effect in which the virtual “sphere” moves in from the right side of the screen at depth position B in FIG. 16. Since the features extracted by the feature extraction unit 2 show that the person is at depth position B, it can be determined that the “sphere” will collide with the person when it reaches the person's position. Therefore, as shown in FIG. 18, a special effect can be added in which the “sphere” collides with the person and bounces off. The direction in which it bounces after the collision can be calculated from the three-dimensional shape of the object. It is even more effective to add, at the moment of collision, an effect such as a “star”, a common expression of “collision”, at the point of impact on the object in the color image (the person's cheek in FIG. 18), as shown in FIG. 18.
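A minimal collision-and-bounce sketch follows, assuming the sphere moves along one scanline at constant depth z and that a collision occurs at the first column whose scene surface is at or in front of z; the sphere's radius is ignored. The bounce uses the standard mirror reflection v' = v - 2(v·n)n, with the unit surface normal n assumed to be given (in practice it would be estimated from the object's three-dimensional shape). Depth values and names are hypothetical.

```python
def first_collision(scene_depth_row, z, start_col, step):
    """Advance a virtual sphere along one scanline at constant depth z,
    `step` columns per frame (negative = leftwards). Return the first column
    whose scene surface is at or in front of z, or None if it leaves the image."""
    col = start_col
    while 0 <= col < len(scene_depth_row):
        if scene_depth_row[col] <= z:
            return col
        col += step
    return None


def reflect(v, n):
    """Bounce direction: mirror velocity v about unit normal n, v - 2 (v . n) n."""
    dot = sum(vi * ni for vi, ni in zip(v, n))
    return tuple(vi - 2 * dot * ni for vi, ni in zip(v, n))


row = [4.0, 4.0, 2.0, 2.0, 4.0, 4.0]  # person at 2.0 m in columns 2-3
print(first_collision(row, z=2.0, start_col=5, step=-1))  # 3
print(reflect((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))         # (1.0, 0.0, 0.0)
```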
[0106]
Other special effects are also possible based on the three-dimensional position identified for each object in the color image, for example, effects in which a virtual object moves in three dimensions. FIG. 19 shows an example in which two virtual “spheres” move from the front of the color image into the depth of the image. According to the features extracted by the feature extraction unit 2, the can is nearest to the camera and the person is behind the can. Combining this information with the motion of the “spheres” yields the effect shown in FIG. 19, in which one of the two “spheres” bounces off the nearby can and the other bounces off the more distant person.
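For spheres flying into the scene, the moment each one should bounce follows directly from the depth map. A minimal sketch, assuming each sphere travels straight along the line of sight of one pixel at a constant, hypothetical `speed`, and that smaller depth means nearer to the camera; `impact_times` is an illustrative name.

```python
def impact_times(depth, launches, speed):
    """Time at which each virtual object, moving straight away from the camera,
    reaches the scene surface recorded in the depth map.

    depth    -- rows of scene depths (smaller = nearer, assumed)
    launches -- (x, y, z0): pixel position and starting depth of each object
    speed    -- distance travelled into the scene per unit time
    """
    return [(depth[y][x] - z0) / speed for x, y, z0 in launches]


# Column 0 shows the near can (depth 1.0), column 1 the person (depth 3.0).
depth = [[1.0, 3.0]]
times = impact_times(depth, [(0, 0, 0.0), (1, 0, 0.0)], speed=0.5)
print(times)  # → [2.0, 6.0]: the sphere aimed at the can bounces first
```

At each computed time the sphere's direction would be reversed, which produces the two different bounce depths of FIG. 19.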
[0107]
In addition, a special effect in which the virtual “sphere” circles around a person's head, as shown in FIG. 20, or around a finger, as shown in FIG. 21, can also be added.
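Circling an object combines a parametric orbit with the same depth test used for occlusion: the sphere is hidden on the part of the orbit that lies behind the head. The sketch assumes a horizontal circular orbit with a hypothetical center `(cx, cy, cz)` and `radius`, and a depth map where smaller values are nearer; both function names are illustrative.

```python
import math


def orbit_position(cx, cy, cz, radius, angle):
    """Point on a horizontal circle (in the x-z plane) around (cx, cy, cz)."""
    return (cx + radius * math.cos(angle), cy, cz + radius * math.sin(angle))


def is_visible(point, depth):
    """A point is drawn only if it lies in front of the scene surface at its pixel."""
    x, y, z = point
    col, row = int(round(x)), int(round(y))
    if not (0 <= row < len(depth) and 0 <= col < len(depth[0])):
        return False
    return z < depth[row][col]


# Head spans columns 1-3 at depth 2.0 in front of a background at depth 5.0.
depth = [[5.0, 2.0, 2.0, 2.0, 5.0]]
for angle in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
    p = orbit_position(2.0, 0.0, 2.5, 2.0, angle)
    print(round(p[0], 2), round(p[2], 2), is_visible(p, depth))
```

Stepping `angle` frame by frame animates the orbit; the same functions apply to a fingertip, as in FIG. 21, with a smaller radius.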
[0108]
Furthermore, by using the three-dimensional shape of the objects in the color image, that is, their three-dimensional relief, special effects such as virtual paint flowing down a person's face can also be applied. As shown in FIG. 22, because the person's nose protrudes further than the rest of the face, the paint can be rendered flowing around the nose.
[0109]
In addition, because the three-dimensional positional relationships among the objects in the color image are known, the motion and shape of the virtual paint can be controlled so that it does not flow onto the fingers of the right hand or onto the can.
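One simple way to make paint avoid a protrusion such as the nose is to let a droplet move down one row at a time and refuse any cell that sticks out toward the camera by more than a threshold. The sketch below is a hypothetical rule, not the patent's: `bump` is an assumed threshold, smaller depth is assumed nearer to the camera, straight down is preferred over the diagonals, and `blocked` lists cells (for example the fingers or the can) that the paint must not enter.

```python
def paint_path(depth, start, blocked=frozenset(), bump=0.5):
    """Trace a paint droplet flowing down the image from `start` = (x, y)."""
    x, y = start
    path = [(x, y)]
    while y + 1 < len(depth):
        here = depth[y][x]
        options = []
        for dx in (0, -1, 1):            # prefer straight down, then diagonals
            nx, ny = x + dx, y + 1
            if not 0 <= nx < len(depth[0]):
                continue                 # off the image
            if (nx, ny) in blocked:
                continue                 # e.g. fingers or the can
            if here - depth[ny][nx] > bump:
                continue                 # cell protrudes toward the camera: go around
            options.append((nx, ny))
        if not options:
            break                        # nowhere to go: the paint pools here
        x, y = options[0]
        path.append((x, y))
    return path


face = [[2.0, 2.0, 2.0],
        [2.0, 1.0, 2.0],   # the nose protrudes at column 1
        [2.0, 2.0, 2.0]]
print(paint_path(face, (1, 0)))                    # → [(1, 0), (0, 1), (0, 2)]
print(paint_path(face, (1, 0), blocked={(0, 1)}))  # → [(1, 0), (2, 1), (2, 2)]
```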
[0110]
Furthermore, various images can be synthesized at desired positions in the depth direction of the color image, such as a virtual “character” sitting on the can or on a person's shoulder, or a virtual “dragonfly” perched on a fingertip.
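To seat a character on top of an object, it is enough to know where the object's top edge is and how deep it lies. A minimal sketch, assuming the object (for example the can) is available as a boolean mask from the feature extraction step; `perch_point` is a hypothetical name, and the character would then be drawn with its base at that row and at that depth.

```python
def perch_point(mask, depth):
    """Where a virtual character could sit on top of an object.

    mask  -- rows of booleans marking the object's pixels (e.g. the can)
    depth -- rows of scene depths, same shape
    Returns (x, y, z) of the middle of the object's top edge, or None.
    """
    for y, row in enumerate(mask):
        xs = [x for x, inside in enumerate(row) if inside]
        if xs:
            x = xs[len(xs) // 2]
            return (x, y, depth[y][x])
    return None


can_mask = [[False, False, False, False],
            [False, True, True, True],
            [False, True, True, True]]
depth = [[5.0, 5.0, 5.0, 5.0],
         [5.0, 1.0, 1.2, 1.0],
         [5.0, 1.0, 1.0, 1.0]]
print(perch_point(can_mask, depth))  # → (2, 1, 1.2)
```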
[0111]
In addition, as shown in FIG. 23, a virtual “sphere” can be placed in the center of the screen and made to move when the hand moves. Using time-varying three-dimensional features such as the motion of the hand, the “sphere” is given a motion that follows the hand (in this example, the “sphere” is struck by the hand and knocked away).
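The hand-hits-sphere interaction can be sketched with two consecutive 3D hand positions. The rule here is an assumption chosen for brevity: the hand's velocity is estimated by finite differences, and when the hand comes within a hypothetical `radius` of the sphere, the sphere simply takes on that velocity. `update_sphere` is an illustrative name.

```python
import math


def update_sphere(sphere_pos, sphere_vel, hand_prev, hand_now, radius, dt):
    """Advance the sphere by one frame; on contact it adopts the hand's velocity."""
    hand_vel = tuple((b - a) / dt for a, b in zip(hand_prev, hand_now))
    if math.dist(hand_now, sphere_pos) <= radius:
        sphere_vel = hand_vel  # crude "hit": momentum transfer is not modeled
    new_pos = tuple(p + v * dt for p, v in zip(sphere_pos, sphere_vel))
    return new_pos, sphere_vel


pos, vel = update_sphere((0.0, 0.0, 2.0), (0.0, 0.0, 0.0),
                         (-1.0, 0.0, 2.0), (-0.5, 0.0, 2.0), radius=0.6, dt=1.0)
print(pos, vel)  # → (0.5, 0.0, 2.0) (0.5, 0.0, 0.0)
```

`math.dist` requires Python 3.8 or later.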
[0112]
In the above, the special effect of combining the image of the virtual object with the color image has been described as an example, but the special effect is not limited to this. For example, a special effect by image synthesis such as erasing the background part and replacing it with another background, or replacing it with a background created by CG is also possible.
[0113]
Special effects are also possible that leave the foreground objects (in the example of FIG. 7, the person and the can) unchanged while converting only the background to black and white or sepia; conversely, only the person may be mosaicked. A special effect may also be implemented by deforming all or part of the color image, for example by inflating and shrinking a person's face like a balloon.
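Separating foreground from background by depth reduces to a per-pixel threshold test. The sketch below assumes that every pixel farther than a hypothetical `threshold` belongs to the background and tones only those pixels with the widely used sepia coefficients (0.393, 0.769, 0.189 and so on); the text specifies neither the threshold rule nor the coefficients.

```python
def sepia_background(color, depth, threshold):
    """Tone pixels farther than `threshold` in sepia; keep the foreground as is."""
    out = []
    for crow, drow in zip(color, depth):
        new_row = []
        for (r, g, b), d in zip(crow, drow):
            if d > threshold:  # background pixel
                r, g, b = (min(255, int(0.393 * r + 0.769 * g + 0.189 * b)),
                           min(255, int(0.349 * r + 0.686 * g + 0.168 * b)),
                           min(255, int(0.272 * r + 0.534 * g + 0.131 * b)))
            new_row.append((r, g, b))
        out.append(new_row)
    return out


image = [[(100, 100, 100), (10, 20, 30)]]
depth = [[5.0, 1.0]]  # left pixel: background, right pixel: person
print(sepia_background(image, depth, threshold=3.0))  # → [[(135, 120, 93), (10, 20, 30)]]
```

Swapping the inner conversion for a grayscale or mosaic operation, or inverting the comparison to target the person instead, gives the other variants mentioned above.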
[0114]
As described above, the processing unit 3 adds various special effects that take three-dimensional elements into account to the color image acquired by the image acquisition unit 1 as a natural-light image, based on the three-dimensional features extracted by the feature extraction unit 2.
[0115]
As one example, when a virtual object is synthesized with a color image, the position in the depth direction within the color image can be determined, so the image of the virtual object is combined with the color image such that the virtual object appears to actually exist at that depth in three-dimensional space. This applies equally when the image of the virtual object is a moving image and when the color image is a moving image.
[0116]
In addition, because a collision between the virtual object and an object in the color image can be determined from the motion and position of each, the motion of the virtual object in response to the collision can be rendered, or a display indicating that a collision has occurred can be shown (for example, by synthesizing the image of a virtual object such as a “star” that signals the collision).
[0117]
In addition, when a virtual object is synthesized with a color image, its motion and shape can be controlled to match the three-dimensional shape of the object in the color image (for example, its relief), and the image of the virtual object is then combined with the color image so that the virtual object appears to exist there. Again, this applies equally whether the image of the virtual object or the color image is a moving image.
[0118]
In addition, because the imaging targets and the background in the color image are extracted as three-dimensional features, they can each be processed separately, for example converted to black and white or sepia, mosaicked, or deformed.
[0119]
Note that the use of the features and the special effects described in the present embodiment are merely examples, and the present invention is not limited thereto.
[0120]
For convenience, a virtual object such as a “sphere” has been used in this description, but the present invention is not limited to this. Various virtual objects are conceivable, such as characters, vehicles, and buildings. Virtual objects also include 3D CG data created from actual photographs and CG data of handwritten marks or characters input by the user. In this embodiment the virtual object is described as not being deformed, but the invention is not limited to this: the virtual object may be deformed freely according to its type. The image of the virtual object may also be a moving image.
[0121]
An input unit through which the user can enter handwritten marks, characters, vehicles, buildings, and other pictures may be added to the configuration shown in FIG. In that case, a processing unit that converts the handwritten input into CG data or 3D CG data is also required so that the processing unit 3 can handle it.
[0122]
As described above, when combining the image of the virtual object with the depth color image (which may also be a color image that does not include depth information), the processing unit 3 determines the position of the virtual object in the depth direction of the depth color image and synthesizes the image of the virtual object with the depth color image based on at least one of the three-dimensional positional relationship between the virtual object and the imaging target and the three-dimensional shape of the imaging target.
[0123]
For example, control is performed so that the movement of the virtual object matches the three-dimensional shape of the photographed object, and the image of the virtual object is synthesized with the depth color image.
[0124]
In addition, the feature extraction unit 2 extracts the position in the depth direction at which the imaging target exists in the depth color image, as well as the background region behind the imaging target, and the processing unit 3 synthesizes the image of the virtual object with the depth color image so that the virtual object lies between the imaging target and the background region.
[0125]
In addition, the feature extraction unit 2 extracts a first position in the depth direction at which a first imaging target (one of a plurality of imaging targets in the depth color image) exists, and a second position in the depth direction, behind the first imaging target, at which a second imaging target (another of the plurality of imaging targets) exists. The processing unit 3 then synthesizes the image of the virtual object with the depth color image so that the virtual object lies between the first position and the second position.
[0126]
Further, the feature extraction unit 2 extracts a third position, the three-dimensional position at which the imaging target exists in the depth color image. Based on the motion of the virtual object and the motion of the imaging target, the processing unit 3 determines that the virtual object has collided with the imaging target when the virtual object reaches the third position, controls the motion or appearance of the virtual object so that it corresponds to the collision, and combines the image of the virtual object with the depth color image. When a collision is determined, an image expressing the collision (for example, a “star”) may additionally be combined with the depth color image.
[0127]
Further, the feature extraction unit 2 extracts the third position, the three-dimensional position at which the imaging target exists in the depth color image, and the processing unit 3 controls the motion and appearance of the virtual object based on this third position and on a fourth position, the three-dimensional position of the virtual object in the depth color image, and synthesizes the image of the virtual object with the depth color image.
[0128]
Further, the feature extraction unit 2 extracts from the depth color image the image of each of the plurality of imaging targets and the image of the background region behind them, and the processing unit 3 processes at least one of the image of the background region and the images of the plurality of imaging targets. For example, the imaging targets and the background are separately converted to black and white or sepia, mosaicked, or deformed.
[0129]
<Image presentation unit>
Next, the image presentation unit 4 will be described.
[0130]
The image presentation unit 4 presents to the user the color image to which the processing unit 3 has applied a special effect (hereinafter, an image with a special effect), as well as images with special effects received by the communication unit 5.
[0131]
Specifically, the image presentation unit 4 is a display device that displays both the image with a special effect generated by the processing unit 3 and the image with a special effect received by the communication unit 5.
[0132]
As shown in FIG. 24, the display area of the display device can also be divided into an area that shows the image with a special effect generated by the processing unit 3 and an area that shows the image with a special effect received by the communication unit 5, so that both are presented at the same time. In FIG. 24, for example, the image received by the communication unit 5 is displayed in area A1, and the image generated by the processing unit 3 is displayed in area A2.
[0133]
Further, the image presenting unit 4 can also convert the image with the special effect generated by the processing unit 3 into a three-dimensional model by using the depth information and present it as a 3D (three-dimensional) scene. Once a 3D scene has been created, it is possible to change the position of the viewpoint and view the image or perform stereoscopic viewing.
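The text does not say how the depth image is turned into a 3D model. One common approach, shown here purely as an assumption, is to back-project each pixel through a pinhole camera model with focal lengths `fx`, `fy` and principal point `cx`, `cy`; these intrinsics are hypothetical parameters, and depth is taken to be the distance along the optical axis.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into a list of (x, y, z) points (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid measurement at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


print(depth_to_points([[2.0, 0.0], [1.0, 1.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0))
# → [(0.0, 0.0, 2.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
```

Coloring each point with the corresponding pixel of the image with a special effect yields a point cloud that can be re-rendered from a new viewpoint or as a stereo pair.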
[0134]
<Communication unit>
Finally, the communication unit 5 will be described.
[0135]
The communication unit 5 transmits the image with a special effect generated by the processing unit 3 to another terminal device, and receives a similar image with a special effect from another terminal device (for example, one having a configuration similar to that of FIG. 1).
[0136]
The communication unit 5 may be a wired communication unit or a wireless communication unit. First, the case using wireless communication means will be described.
[0137]
In this case, communication with external devices such as other terminal devices uses a wireless system employed by mobile phones, such as PDC (Personal Digital Cellular), CDMA (Code-Division Multiple Access), or PHS. Images with special effects are thereby transmitted to and received from the external device. The communication means is not limited to mobile-phone communication; a wireless LAN as defined by IEEE 802.11a/b/g, Bluetooth (trademark), infrared communication, RF communication, or another wireless method can also be used.
[0138]
Next, the case where the communication unit 5 is based on wired communication means will be described.
[0139]
In this case, the communication unit 5 has an interface such as USB or IEEE 1394 and communicates through it with a connected external device, transmitting images with special effects to, and receiving them from, that device. For example, an image with a special effect is sent to a PC connected by USB. The communication means is not limited to these; serial communication, an ordinary telephone network, optical fiber, or other methods can also be used. Images with special effects may also be exchanged with other devices via the communication unit of a connected external device, for example with devices on the Internet through the Internet connection of a USB-connected PC (personal computer).
[0140]
<Communication between terminal devices>
When a plurality of the terminal devices of this embodiment are used, each terminal applies special effects, as described above, to color images acquired in real time to generate images with special effects, and these images can be exchanged among the terminals.
[0141]
Next, a communication system realized with the terminal device of this embodiment is described using several specific examples.
[0142]
Camera-equipped mobile phones are now in wide use. In addition to the functions of conventional mobile phones, they let users take and enjoy moving images and communicate with others by attaching those images to e-mail. A videophone function has also been put into practical use: the user's image captured by the camera is transmitted to the other party in real time while the other party's image, captured by their own camera-equipped phone, is received in real time, so that both parties communicate while seeing each other.
[0143]
The communication system according to this embodiment can be realized, for example, by replacing such camera-equipped mobile phones with the terminal device of this embodiment, yielding a new type of mobile phone system capable of the image communication described above.
[0144]
For example, a user photographs himself or herself, scenery, or an object of interest with a mobile phone that is a terminal device of this embodiment forming part of this new mobile phone system, and obtains an image with a special effect generated using the depth information corresponding to the image (which may, of course, be a moving image). The user can enjoy this image as the phone's standby screen, or attach it to an e-mail and send it to someone as a means of communication. During real-time image communication such as a videophone call, special effects can likewise be added to convey one's emotions to the other party and to make the communication more entertaining.
[0145]
A concrete sequence is now described, taking a videophone as an example, with reference to the flowcharts in FIGS. 33 and 34. Suppose two persons A and B each have a mobile phone with the configuration shown in FIG. and each photographs his or her own face. With no special effect applied, the two phones are connected through their communication units 5; the image of A's face is presented to B through B's image presentation unit 4, and the image of B's face is presented to A, realizing an image-based videophone. Now suppose A wants to add a special effect, such as butterflies flying around A's face, and photographs that face with the phone. The image acquisition unit 1 acquires an image that includes depth information (that is, a depth color image) (step S1). The feature extraction unit 2 extracts three-dimensional features of the face image based on the depth information and other cues (step S2). When A selects the flying-butterfly effect, the processing unit 3 synthesizes an image of butterflies flying around the face, based on the extracted three-dimensional features, to generate an image with a special effect (step S3). This image is displayed on the image presentation unit 4 (step S4) and, on A's transmission instruction, is sent to B (step S5). When B's phone receives the image with the special effect through its communication unit 5 (step S6), it displays the image on its image presentation unit 4 (step S7), so that B can see A's face with the special effect. This adds an element of entertainment to communication that conventional videophones could not offer.
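The sequence of steps S1 to S7 can be summarized as a small simulation. This is only a sketch: the `Phone` class, its method names, the single-pixel images, and the use of the nearest depth as the “face depth” feature are hypothetical stand-ins for the image acquisition unit 1, feature extraction unit 2, processing unit 3, image presentation unit 4, and communication unit 5, and a Python list plays the role of the network.

```python
from dataclasses import dataclass, field


@dataclass
class DepthColorImage:
    color: list                                   # rows of RGB tuples
    depth: list                                   # rows of depths, same shape
    effects: list = field(default_factory=list)   # effects applied so far


class Phone:
    """Toy terminal; each method stands in for one unit of the device."""

    def __init__(self, name):
        self.name = name
        self.screen = []  # what the image presentation unit 4 has shown

    def acquire(self, color, depth):
        # S1: image acquisition unit 1 returns a depth color image.
        return DepthColorImage(color, depth)

    def extract_features(self, img):
        # S2: feature extraction unit 2; here just the nearest depth,
        # standing in for the depth of the face.
        return {"face_depth": min(min(row) for row in img.depth)}

    def apply_effect(self, img, features, effect):
        # S3: processing unit 3 records which effect was placed at which depth.
        img.effects.append((effect, features["face_depth"]))
        return img

    def present(self, img):
        # S4 on the sender, S7 on the receiver.
        self.screen.append(img)

    def send(self, img, network, to):
        # S5: communication unit 5 transmits the image with the special effect.
        network.append((to, img))

    def receive(self, network):
        # S6: take every message addressed to this phone, then display it (S7).
        mine = [img for to, img in network if to == self.name]
        network[:] = [(to, img) for to, img in network if to != self.name]
        for img in mine:
            self.present(img)


network = []
a, b = Phone("A"), Phone("B")
img = a.acquire([[(200, 180, 160)]], [[0.6]])
img = a.apply_effect(img, a.extract_features(img), "butterflies")
a.present(img)
a.send(img, network, "B")
b.receive(network)
print(b.screen[0].effects)  # → [('butterflies', 0.6)]
```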
[0146]
This embodiment is not limited to mobile phones; another example follows. Conventionally, a USB camera connected to a personal computer (PC) has been used to capture scenery and the like continuously, with the PC connected to the Internet so that the live images are published. Such services are often called live cameras or fixed-point observation cameras, and the subjects vary widely, for example the inside of one's room, pets, customers in one's shop, or street scenes in Shibuya.
[0147]
In this kind of use, connecting the terminal device of this embodiment to the PC in place of the USB camera makes it possible to add special effects to such images. Suppose, for example, that a store selling certain products publishes images of them on the Internet through a live camera, panning the camera across several products. When a featured “top pick” product is in view, a special effect can be applied in which a virtual object representing “top pick” circles the product, as shown in FIG. When a new product is in view, a virtual object representing “new product” can circle it in the same way. Images from a live camera can thus be made entertaining while also conveying additional information such as “top pick” or “new product”.
[0148]
(First Modification of First Embodiment)
FIG. 25 illustrates a configuration example of a main part of a terminal device according to a first modification.
[0149]
The terminal device in FIG. 25 differs from that in FIG. 1 in that its feature extraction unit 2 extracts the three-dimensional features of the imaging target not only from the depth color image acquired by the image acquisition unit 1, using the depth information it contains, but also from depth color images received by the communication unit 5, using the depth information contained in them.
[0150]
Based on the features extracted by the feature extraction unit 2, the processing unit 3 in FIG. 25 applies special effects, in the same manner as described above, both to the depth color image (or color image) acquired by the image acquisition unit 1 and to the depth color images or images with special effects, including depth information, received by the communication unit 5.
[0151]
Because an image with a special effect is generated from a color image or depth color image associated with depth information, the image with a special effect also carries (includes) that depth information.
[0152]
With such a configuration, it is possible to apply a special effect not only to the depth color image obtained by the image acquisition unit 1 but also to a depth color image or an image with a special effect received by the communication unit 5.
[0153]
Next, the operation of the terminal device having the configuration shown in FIG. 25 will be specifically described with reference to the flowchart shown in FIG.
[0154]
Suppose two persons each have a mobile phone with the functions of the first modification shown in FIG. 25 and are communicating by videophone. In the first embodiment, a special effect could be applied to the image the user was capturing. In the first modification, in addition, the communication unit 5 receives an image containing depth information transmitted from the other party (step S11) and displays it (step S12), and the three-dimensional features of the received image are extracted based on its depth information and other cues (step S13). The processing unit 3 on the receiving side can therefore add a special effect to the received image (step S14). The resulting image with a special effect can be presented on the image presentation unit 4 and also sent back to the other party (step S16).
[0155]
Thus, for example, while communicating by videophone with each showing his or her own face, both parties can play pranks on each other's face image. In TV comedy sketches, a “tub” is often dropped on a performer's head as a gag expressing anger or as a penalty game. In the same way, when something in the conversation annoys you, you can apply a special effect that drops a “tub” onto the other party's face image and enjoy watching it, or send it to the other party as a lighthearted way of making your point. This enables new forms of image communication.
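The receiving-side processing of steps S13 and S14, applied to the “tub” prank, could look roughly like the sketch below. It assumes that the received image carries its depth map (as stated in [0151]) and that the person's head is the region nearer than a hypothetical threshold `near`; the tub is recorded as landing on the first image row that contains such a pixel. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ReceivedImage:
    color: list
    depth: list                                   # travels with the image
    effects: list = field(default_factory=list)


def head_top(depth, near):
    """S13 (receiving side): first row, from the top, with a pixel nearer than `near`."""
    for y, row in enumerate(depth):
        if any(d < near for d in row):
            return y
    return None


def drop_tub(img, near=1.0):
    """S14 (receiving side): record a 'tub' that falls and stops on the head."""
    y = head_top(img.depth, near)
    if y is not None:
        img.effects.append(("tub", y))
    return img


received = ReceivedImage(
    color=[[(0, 0, 0)] * 3] * 3,
    depth=[[3.0, 3.0, 3.0],
           [3.0, 0.8, 3.0],    # top of the other party's head
           [0.9, 0.8, 0.9]],
)
print(drop_tub(received).effects)  # → [('tub', 1)]
```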
[0156]
(Second Modification of First Embodiment)
FIG. 26 illustrates a configuration example of a main part of a terminal device according to a second modification.
[0157]
The terminal device in FIG. 26 differs from that in FIG. 1 in that it adds an additional image storage unit 7, which stores multiple pieces of additional image data such as virtual objects (for example, CG data and 3D CG data), and a special effect selection unit 6 for selecting a desired additional image or special effect from the additional image storage unit 7.
[0158]
The special effect selection unit 6 refers to the additional images stored in the additional image storage unit 7 and presents to the user the types of additional images and special effects that can be added. The user selects the additional image or special effect to add, and this choice is passed to the processing unit 3, which adds a special effect accordingly. Instead of letting the user choose, the special effect selection unit 6 may also select the additional image or special effect at random.
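The selection logic, user choice with a random fallback, fits in a few lines. A sketch only: `select_effect` is a hypothetical name, the stored additional images are reduced to a list of names, and an optional `random.Random` instance is accepted so that the random choice can be made reproducible.

```python
import random


def select_effect(available, user_choice=None, rng=None):
    """Pick the effect to hand to the processing unit.

    available   -- names of effects/additional images held in storage unit 7
    user_choice -- the user's selection, or None to pick at random
    rng         -- optional random.Random instance (for reproducibility)
    """
    if not available:
        raise ValueError("no additional images are stored")
    if user_choice is not None:
        if user_choice not in available:
            raise ValueError(f"unknown effect: {user_choice}")
        return user_choice
    rng = rng or random.Random()
    return rng.choice(available)


stored = ["butterflies", "stars", "tub"]
print(select_effect(stored, "stars"))                # → stars
print(select_effect(stored, rng=random.Random(42)))  # one of the stored effects
```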
[0159]
With this modification, the user can choose the virtual object or special effect to combine with the acquired color image and thereby generate a color image with a special effect that matches his or her preferences.
[0160]
(Third Modification of First Embodiment)
FIG. 27 shows a configuration example of the main part of a terminal device according to a third modification, which further extends the second modification. The terminal device in FIG. 27 differs from that in FIG. 26 in that it adds, to the configuration of FIG. 26, a scene analysis unit 8 that analyzes the features of the scene appearing in the depth color image (or color image) acquired by the image acquisition unit 1.
[0161]
The scene analysis unit 8 analyzes the features of the scene shown in the depth color image (or color image) acquired by the image acquisition unit 1. Scene features include what appears in the scene, the overall color tone of the scene, and the composition of the scene. For example, if the object in the scene is a person, the unit determines that it is a person and passes this information to the special effect selection unit 6, which then restricts the available special effects to those suited to a person. If the object in the scene is vertically elongated, it is effective to add (for example, synthesize) an image of a virtual object that revolves horizontally around it; in other cases it may be better to add a virtual object that revolves vertically, or to choose a different special effect that does not involve a revolving virtual object at all. It may also be better to vary the shape and color of additional images such as virtual objects according to the overall color tone of the scene: for example, a red virtual object synthesized into a scene that is predominantly red has little impact. Furthermore, the motion of the virtual object may be changed according to the composition of the image (scene) itself, such as the arrangement of objects moving in the scene.
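Two of the scene features mentioned above, the elongation of the main object and the dominant color, can be computed cheaply and used to filter candidate effects. The rules below are illustrative assumptions, not the patent's: the object mask is assumed non-empty, "tall" means the mask's bounding box is higher than wide, the dominant color is the RGB channel with the largest mean, and each effect is described by a hypothetical `color` and `orbit` field.

```python
def analyze_scene(mask, color):
    """Toy scene features: is the main object tall, and which RGB channel dominates."""
    ys = [y for y, row in enumerate(mask) for inside in row if inside]
    xs = [x for row in mask for x, inside in enumerate(row) if inside]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    pixels = [p for row in color for p in row]
    mean = [sum(p[i] for p in pixels) / len(pixels) for i in range(3)]
    dominant = "rgb"[mean.index(max(mean))]
    return {"tall": height > width, "dominant": dominant}


def suitable_effects(effects, scene):
    """Filter candidate effects with the hypothetical rules described above."""
    chosen = []
    for e in effects:
        if e["color"] == scene["dominant"]:
            continue  # e.g. a red object in a mostly red scene has little impact
        if scene["tall"] and e["orbit"] == "vertical":
            continue  # tall objects: prefer a horizontal orbit
        if not scene["tall"] and e["orbit"] == "horizontal":
            continue
        chosen.append(e["name"])
    return chosen


mask = [[False, True, False]] * 3       # a tall, thin object
color = [[(200, 50, 50)] * 3] * 3       # a mostly red scene
effects = [
    {"name": "red ring", "color": "r", "orbit": "horizontal"},
    {"name": "blue ring", "color": "b", "orbit": "horizontal"},
    {"name": "blue loop", "color": "b", "orbit": "vertical"},
    {"name": "sparkle", "color": "g", "orbit": "none"},
]
print(suitable_effects(effects, analyze_scene(mask, color)))  # → ['blue ring', 'sparkle']
```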
[0162]
Thus, with the configuration of FIG. 27, the scene analysis unit 8 analyzes the features of the scene, and more effective special effects can be added according to those features, for example by changing parameters such as the motion, color, and shape of the virtual object to be added, or by selecting the type of special effect that suits the scene.
[0163]
(Fourth Modification of First Embodiment)
The terminal device according to the fourth modification adds a selection unit 9, which selects the images to be communicated by the communication unit 5, to any of the configurations of FIG. 1, the first modification (FIG. 25), the second modification (FIG. 26), or the third modification (FIG. 27).
[0164]
FIG. 28 shows a configuration example of the main part of a terminal device according to the fourth modification, in which the selection unit 9 is added to the configuration of the first modification shown in FIG. 25.
[0165]
As shown in FIG. 28, by adding the selection unit 9, it becomes possible to select data to be exchanged with another terminal device via the communication unit 5.
[0166]
The benefit of the fourth modification is explained using the same example as in the first modification. Suppose two persons each have a mobile phone with the functions of the fourth modification shown in FIG. 28 and are communicating by videophone, each showing his or her own face. Various special effects are used during the conversation, but a special effect that plays a prank on the other party's face is not transmitted to that party. In other words, images with special effects are normally always sent to the other party, but when something annoys you, you can apply a special effect that drops a “tub” onto the other party's face image and keep that image to yourself, enjoying it without sending it.
[0167]
This is even more effective in communication among many people (one-to-many or many-to-many), not just one-to-one. Suppose, for example, that person D is communicating simultaneously with persons A, B, and C using videophones with the configuration shown in FIG. At one point D is annoyed by A's behavior and chooses a special effect that drops a “tub”. Since sending it to A would likely start a quarrel, D can select the recipients so that the image goes only to B and C, conveying D's feelings to them.
[0168]
Transmission-side selection has been described above, but the same kind of selection can be applied on the receiving side, for example to refuse images with special effects from specific people, or images to which a specific special effect has been applied.
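Both directions of the selection unit 9 reduce to simple filters. In this sketch the data structures are assumptions: `withheld` maps an effect name to the set of people it must not be sent to, and the receiving side keeps sets of blocked senders and blocked effects. The function names are hypothetical.

```python
def recipients_for(effect, recipients, withheld):
    """Transmit-side selection: drop recipients who must not get this effect.

    withheld -- maps an effect name to the set of recipients it is withheld from
    """
    blocked = withheld.get(effect, set())
    return [r for r in recipients if r not in blocked]


def accept(sender, effect, blocked_senders, blocked_effects):
    """Receive-side selection: refuse images from some senders or with some effects."""
    return sender not in blocked_senders and effect not in blocked_effects


# Person D drops a "tub" because of A's behavior, but shares it only with B and C.
print(recipients_for("tub", ["A", "B", "C"], {"tub": {"A"}}))    # → ['B', 'C']
print(recipients_for("stars", ["A", "B", "C"], {"tub": {"A"}}))  # → ['A', 'B', 'C']
```

In the one-to-one case, an empty recipient list simply means the image is shown only on the user's own screen.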
[0169]
(Fifth Modification of First Embodiment)
A sound effect suitable for the image can be added to the color image with special effects.
[0170]
A sound effect adding unit that adds a sound effect suited to the color image with special effects can be added to any of the configurations of FIG. 1, the first modification (FIG. 25), the second modification (FIG. 26), or the third modification (FIG. 27).
[0171]
This allows a sound effect to be added to the color image with a special effect, further enhancing the effect. For example, the first embodiment described a special effect in which a virtual object bounces off a person and a virtual object such as a “star” is synthesized at the moment of collision. If the sound effect adding unit (not shown) is added to this configuration, a collision sound can be played at the moment of collision, heightening the impact of the special effect.
[0172]
As described above, according to the first embodiment, by acquiring, together with the image of the imaging target (here, a natural-light image), a reflected-light image that holds depth information for each pixel, information in the depth direction of the natural-light image (depth information) can be obtained. Using this depth information, three-dimensional features of the imaging targets in the natural-light image, such as their three-dimensional shapes and three-dimensional positional relationships, are extracted. By synthesizing additional images (virtual objects, or images expressing effects such as an impact) based on these features, special effects that exploit the three-dimensional features can be added.
[0173]
(Second embodiment)
Next, a second embodiment of the present invention will be described.
[0174]
<Overall configuration>
FIG. 29 shows the overall schematic configuration of the communication system according to the second embodiment. As shown in FIG. 29, this communication system is a wireless communication system. It includes, for example, a plurality of terminal devices (for example, a first terminal 201 and a second terminal 202), a base station (BS) 211 wirelessly connected to the first terminal 201, a base station (BS) 212 wirelessly connected to the second terminal 202, a server device 200, and a predetermined communication network (network) that connects the base stations 211 and 212 and the server device 200. The server device 200 is communicably connected to the first terminal 201 and the second terminal 202 via the base stations 211 and 212 and the network.
[0175]
As in the first embodiment, the first terminal 201 and the second terminal 202 can perform image communication by exchanging images with and without special effects. Because each of them has the configuration shown in FIG. 1, each terminal can acquire, in addition to the color image, depth information corresponding to that color image. Here, this takes the form of a depth color image, that is, a color image that includes depth information.
[0176]
When image communication is performed between the first terminal 201 and the second terminal 202, the communication unit 5 of each terminal transmits the image, with or without special effects, together with the depth information corresponding to that image. As above, the transmitted image itself is assumed to contain the depth information.
[0177]
When an image with special effects is exchanged between the first terminal 201 and the second terminal 202, the communication unit 5 of each terminal transmits, together with the image with special effects (which includes depth information), the three-dimensional features extracted by the feature extraction unit 2 of that terminal.
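To make the transmitted content concrete, the following sketch bundles a color image, its per-pixel depth map, a special-effect flag, and the optional three-dimensional features into a single message. It is only an illustration under assumptions: the field names, the JSON encoding, and keeping depth in a separate aligned array (rather than inside the image itself, as the text assumes) are choices made for this example and are not specified in this document.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class ImageMessage:
    sender: str                      # hypothetical terminal identifiers
    receiver: str
    rgb: List[List[List[int]]]       # H x W x 3 color values
    depth: List[List[float]]         # H x W depth values aligned with rgb
    special_effect: bool = False     # True if the sending terminal applied an effect
    features: Optional[dict] = None  # 3-D features, sent with special-effect images

    def __post_init__(self):
        # The depth map must be aligned pixel-for-pixel with the color image.
        if len(self.depth) != len(self.rgb) or any(
                len(d) != len(c) for d, c in zip(self.depth, self.rgb)):
            raise ValueError("depth map must match the color image size")

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def from_bytes(data: bytes) -> "ImageMessage":
        return ImageMessage(**json.loads(data.decode("utf-8")))
```

A real terminal would use a compact binary image format rather than JSON lists. The point of the sketch is only that the image, its depth, and (for special-effect images) the features travel together.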
[0178]
The server device 200 is installed and operated by a provider that offers a service enabling image communication between the first terminal 201 and the second terminal 202. Whenever image communication is performed between the first terminal 201 and the second terminal 202, the image and its depth information are always received first by the server device 200 and are then forwarded via the server device 200 to the partner terminal.
[0179]
The second embodiment is characterized in that the server device 200 synthesizes an advertisement image, as an additional image, into an image transmitted from the first terminal 201 or the second terminal 202, based on the depth information corresponding to that image.
[0180]
FIG. 30 shows an example configuration of the server device 200, which includes a receiving unit 10, a feature extraction unit 2, a processing unit 3, and a transmission unit 11. In FIG. 30, parts identical to those in FIG. 1 are denoted by the same reference numerals, and only the differences are described. Specifically, in place of the image acquisition unit 1, the image presentation unit 4, and the communication unit 5 of the terminal device in FIG. 1, the server device 200 shown in FIG. 30 has two units. The receiving unit 10 receives an image transmitted from the first terminal 201 or the second terminal 202 (an image with special effects including depth information, or an image without special effects). The transmission unit 11 transmits the image with advertisement to whichever of the first terminal 201 and the second terminal 202 is the destination of the received image. This image with advertisement is generated by the processing unit 3 of the server device 200, which applies a special effect such as combining an advertisement image with the image received by the receiving unit 10.
[0181]
Next, the processing operation of the server device 200 will be described with reference to the flowchart shown in FIG. 36.
[0182]
The receiving unit 10 of the server device 200 receives an image including depth information (an image with or without special effects) transmitted from the first terminal 201 or the second terminal 202 (step S21). In the same manner as the feature extraction unit 2 of the terminal device of FIG. 1, the feature extraction unit 2 of the server device 200 extracts three-dimensional features from the image received by the receiving unit 10 and the depth information corresponding to that image (step S22).
[0183]
Based on the image received by the receiving unit 10 and the extracted three-dimensional features, the processing unit 3 of the server device 200 applies a special effect to the received image, for example by combining an advertisement image with it (step S23). The transmission unit 11 then transmits the resulting image with advertisement to whichever of the first terminal 201 and the second terminal 202 is the destination of the received image (step S24).
[0184]
The processing unit 3 of the server device 200 will now be described. The processing unit 3 takes into account the three-dimensional features that the feature extraction unit 2 obtains from the image received by the receiving unit 10. The received image may be either an image to which the first terminal 201 or the second terminal 202 has already applied a special effect, or an ordinary image without special effects. Based on these features, the processing unit 3 applies a special effect by further combining an arbitrary advertisement image with the received image.
[0185]
Specifically, an image with advertisement is generated by further combining an advertisement expressed by CG (computer graphics) or the like with the received image. The feature extraction unit 2 extracts three-dimensional features such as:
- the three-dimensional shape (for example, the surface unevenness) of the imaging targets (objects) in the image;
- which objects exist in the scene, that is, into which regions the scene can be divided;
- the positional relationship of the objects in the scene;
- the three-dimensional shape of each object.
Based on these features, and in the same manner as the processing unit 3 in FIG. 1, the processing unit 3 determines the front-to-back relationship, the positional relationship, collisions, and so on between the advertisement image and the objects in the received image, and synthesizes the advertisement accordingly.
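Determining the front-to-back relationship between an inserted image and the scene amounts, in the simplest case, to a per-pixel depth test (z-buffering). The sketch below composites a sprite, such as a CG advertisement or a virtual object, that carries its own depth into a scene that has a depth map. A sprite pixel is drawn only where it is nearer than the scene. It is a minimal illustration under assumptions: depth is a distance (smaller means nearer), `None` marks transparent sprite pixels, and the function and parameter names are hypothetical rather than the document's own.

```python
def composite_with_depth(scene_rgb, scene_depth, sprite_rgb, sprite_depth, top, left):
    """Paste a sprite into a scene using a per-pixel depth test.

    Depth is assumed to be a distance (smaller = nearer). Sprite pixels equal
    to None are transparent. Returns new (rgb, depth) images; inputs are not
    modified.
    """
    out_rgb = [row[:] for row in scene_rgb]
    out_depth = [row[:] for row in scene_depth]
    h, w = len(scene_rgb), len(scene_rgb[0])
    for dy, row in enumerate(sprite_rgb):
        for dx, color in enumerate(row):
            y, x = top + dy, left + dx
            if color is None or not (0 <= y < h and 0 <= x < w):
                continue  # transparent pixel or outside the frame
            d = sprite_depth[dy][dx]
            if d < out_depth[y][x]:  # sprite is in front of the scene here
                out_rgb[y][x] = color
                out_depth[y][x] = d
    return out_rgb, out_depth
```

For example, consider an advertisement placed at depth 3 across a row whose middle pixel is a person at depth 1 and whose other pixels are background at depth 5. The advertisement appears on the background pixels and is hidden behind the person.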
[0186]
The processing operation of the processing unit 3 will now be described using an example in which the image shown in FIG. 7 is received by the receiving unit 10 and an advertisement (image) is combined with that image (hereinafter, the received image).
[0187]
The simplest way to add an advertisement is to combine it with the background so that it does not overlap the foreground of the received image. FIG. 31 shows an advertisement image combined with the received image of FIG. 7. As shown in FIG. 14, the foreground and the background can easily be distinguished from the three-dimensional features extracted by the feature extraction unit 2, so the advertisement is added using this three-dimensional information. In this way, even if the person in the foreground moves, the advertisement can always be presented in the background without covering the person, and therefore without interfering with the communication.
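As a simplified stand-in for the foreground/background separation of FIG. 14, the sketch below marks as foreground every pixel nearer than a depth threshold. It then searches for a rectangle in which the advertisement covers no foreground pixel. Running it on every frame keeps the advertisement off the person as the person moves. The fixed threshold, the row-by-row search order, and the function names are assumptions for illustration only. The document separates foreground and background from its extracted three-dimensional features, not from a single threshold.

```python
def foreground_mask(depth, threshold):
    """True where a pixel is nearer than `threshold` (depth assumed to be a distance)."""
    return [[d < threshold for d in row] for row in depth]


def find_background_slot(mask, ad_h, ad_w):
    """Return (top, left) of the first ad_h x ad_w window with no foreground pixels.

    Windows are scanned row by row. Returns None if the advertisement cannot
    be placed without covering the foreground.
    """
    h, w = len(mask), len(mask[0])
    for top in range(h - ad_h + 1):
        for left in range(w - ad_w + 1):
            if not any(mask[y][x]
                       for y in range(top, top + ad_h)
                       for x in range(left, left + ad_w)):
                return top, left
    return None
```

If no slot is found, a server could skip the advertisement for that frame or shrink it. This choice is a design decision that the document leaves open.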
[0188]
It is also effective to add an advertisement to a virtual object that the processing unit 3 of a terminal device has already added (combined) into the received image. For example, suppose that the special effect described with reference to FIG. 17, in which a virtual "sphere" moves behind the person, has already been applied to the received image. The processing unit 3 of the server device 200 then pastes an advertisement onto the virtual "sphere" itself, or synthesizes the advertisement so that it hangs from the "sphere" as shown in FIG. 32. The advertisement then moves around the screen along with the "sphere". In other words, the advertisement becomes part of the entertainment provided by the special effect that the terminal has already added to the received image. This makes it possible to present an advertisement casually, while providing entertainment and without leaving a bad impression on the user.
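Hanging an advertisement from the moving "sphere" can be sketched as deriving the banner's per-frame pose from the sphere's trajectory. Because the banner inherits the sphere's depth, the same depth test against the scene hides the banner whenever the sphere passes behind the person. The `Pose` type, the fixed 2-D offset, and the single-anchor-pixel visibility check are simplifying assumptions for illustration. A real implementation would test every banner pixel.

```python
from typing import NamedTuple


class Pose(NamedTuple):
    x: int        # column of the anchor point in the frame
    y: int        # row of the anchor point
    depth: float  # distance of the object (smaller = nearer, assumed)


def banner_poses(sphere_track, offset=(0, 2)):
    """Hang a banner at a fixed 2-D offset from the sphere in every frame.

    The banner inherits the sphere's depth, so it is occluded exactly when
    the sphere is.
    """
    dx, dy = offset
    return [Pose(p.x + dx, p.y + dy, p.depth) for p in sphere_track]


def banner_visible(pose, scene_depth):
    """True if the banner's anchor pixel is inside the frame and nearer than the scene."""
    h, w = len(scene_depth), len(scene_depth[0])
    if not (0 <= pose.y < h and 0 <= pose.x < w):
        return False
    return pose.depth < scene_depth[pose.y][pose.x]
```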
[0189]
Alternatively, the virtual object itself may serve as the advertisement, in which case the processing unit 3 of the server device 200 combines the advertisement-integrated virtual object with the received image. An example of such a virtual object is a CG image of a newly released product.
[0190]
The methods of adding an advertisement described above are merely examples, and the present invention is not limited to them. Advertisements can be added freely using the three-dimensional features extracted by the feature extraction unit 2 of the server device 200. For example, FIG. 22 shows a special effect in which paint falls on a person's head. Instead of paint, a package of a product, or characters showing a product name, a release date, or the like, may fall. Likewise, FIG. 18 illustrates an example in which a collision with a virtual object is detected and a virtual object such as a star is attached at that moment. An advertisement may be included in the star and presented.
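A falling product package needs to know when it lands on the person's head. The sketch below illustrates the idea in 2-D: an object drops down one image column, one row per frame, and the landing frame is the first frame where the next row down belongs to the foreground region obtained from the depth information. The constant one-row-per-frame motion and the use of a plain foreground mask, ignoring the object's own depth, are assumptions for illustration. This is not the collision method of the document's claims, which compares three-dimensional positions.

```python
def first_contact(mask, column, start_row=0):
    """Drop an object straight down `column`, one row per frame.

    `mask` is a foreground mask (True = person). Returns (frame, row) for the
    last free row before the object would enter a foreground pixel, i.e. the
    moment it lands on the head. Returns None if it falls out of the frame.
    """
    for frame, row in enumerate(range(start_row, len(mask))):
        if row + 1 < len(mask) and mask[row + 1][column]:
            return frame, row
    return None
```

At the returned frame, the package sprite could be swapped for a "landed" version, and the collision could also trigger a sound cue.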
[0191]
Two or more different advertisements may also be displayed on the screen. When the received image is a moving image, the content of the advertisement may change from scene to scene. For example, an advertisement aimed at women may be added when the person in the scene is a woman, and an advertisement aimed at men when the person is a man.
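Changing the advertisement from scene to scene can be sketched as a lookup from a per-scene label to an advertisement, emitting a change only when the chosen advertisement differs from the previous one. The catalogue contents, the label strings, and the fallback advertisement are hypothetical. The document gives only the women/men example and does not say how the person's attributes are recognized.

```python
# Hypothetical catalogue; only the women/men example comes from the text.
AD_CATALOGUE = {
    "female": "ad_for_women.png",
    "male": "ad_for_men.png",
}
DEFAULT_AD = "generic_ad.png"


def ad_schedule(scene_labels):
    """Return (scene_index, ad) at each point where the advertisement should change.

    `scene_labels` holds one attribute label per scene (None if unknown).
    """
    schedule = []
    current = None
    for i, label in enumerate(scene_labels):
        ad = AD_CATALOGUE.get(label, DEFAULT_AD)
        if ad != current:
            schedule.append((i, ad))
            current = ad
    return schedule
```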
[0192]
Because the first terminal 201 and the second terminal 202 already extract three-dimensional features with their own feature extraction units 2, the server device 200 does not need the feature extraction unit 2 if the extracted three-dimensional feature information is always transmitted together with the image (an image including depth information, with or without special effects). In that case, as shown in FIG. 37, the server device 200 synthesizes an advertisement image with the image received by the receiving unit 10 (the received image) in the same manner as before. It does so based on the feature information corresponding to that image, which the receiving unit 10 also receives (steps S31 to S32).
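The two server flows differ only in where the three-dimensional features come from. The sketch below expresses both in one handler: if no features arrive with the image, the server extracts them itself (FIG. 36, step S22). Otherwise it reuses the terminal's features (FIG. 37). In both cases it then synthesizes the advertisement and forwards the result. The handler name and the injected `extract`, `compose_ad`, and `send` callables are hypothetical placeholders for the feature extraction unit 2, the processing unit 3, and the transmission unit 11. They are not APIs defined in this document.

```python
def handle_upload(image, depth, destination, features, extract, compose_ad, send):
    """Server-side handling of one received image (reception itself, step S21, is assumed done).

    extract(image, depth) -> features  : stands in for feature extraction unit 2
    compose_ad(image, features) -> img : stands in for processing unit 3
    send(destination, img)             : stands in for transmission unit 11
    """
    if features is None:
        # FIG. 36 flow: the server extracts the 3-D features itself (step S22).
        features = extract(image, depth)
    # FIG. 37 flow: features sent by the terminal are reused as-is, so the
    # server needs no feature extraction of its own.
    with_ad = compose_ad(image, features)  # synthesize the advertisement
    send(destination, with_ad)             # forward to the partner terminal
    return with_ad
```

Passing the units in as callables keeps the sketch independent of any particular feature or compositing implementation, such as the depth-test compositing described for the processing unit 3.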
[0193]
With the communication system using the server device 200 according to the second embodiment, it becomes possible, in addition to the image communication described in the first embodiment, to provide images in which advertisements are effectively added to the images being transmitted. Because the advertisement is presented in a position that does not interfere with communication between users, or is incorporated into an existing virtual object, it can be presented in an entertaining way without leaving a bad impression on users.
[0194]
The above embodiments and their modifications can be implemented in appropriate combinations.
[0195]
The processing in the embodiments of the present invention can be realized by a computer-executable program, and the program can be provided on a computer-readable storage medium. For example, in the configuration of the terminal device in FIG. 1, part of the image acquisition unit 1 (for example, the natural light image capturing unit 103, the reflected light image capturing unit 108, and the image capturing operation control unit 104) is desirably integrated on a single chip. At least the other components, such as the remainder of the image acquisition unit 1, the feature extraction unit 2, and the processing unit 3, can be realized as software. Further, the present invention can be realized by installing in the terminal device a program that causes the terminal device of the above embodiment to perform the processing operations shown in FIGS. 33 to 35.
[0196]
Any storage medium that can store a program and that is readable by a computer or an embedded system may be used, such as a magnetic disk, a floppy disk, a hard disk, an optical disk (CD-ROM, CD-R, DVD, etc.), a magneto-optical disk (MO, etc.), or a semiconductor memory. The storage format may be any form.
[0197]
Part of each process for realizing the present embodiment may be executed by an OS (operating system) running on the computer, by database management software, or by MW (middleware) such as network software. This execution is based on the instructions of the program installed from the storage medium into the computer or the embedded system.
[0198]
Further, the storage medium is not limited to a medium independent of a computer or an embedded system, but also includes a storage medium in which a program transmitted via a LAN or the Internet is downloaded and stored or temporarily stored.
[0199]
Further, the storage medium is not limited to a single medium. The case where the processing of the present embodiment is executed from a plurality of media is also included in the storage medium of the present invention, and the media may have any configuration.
[0200]
Further, the computer or embedded system according to the present invention executes each process of the present embodiment based on a program stored in a storage medium. It may have any configuration, such as a single apparatus like a personal computer or a microcomputer, or a system in which a plurality of apparatuses are connected over a network.
[0201]
Further, the computer in the present invention is not limited to a personal computer. It also includes arithmetic processing units, microcomputers, and the like contained in information processing equipment, and is a general term for equipment and devices capable of realizing the functions of the present invention by means of a program.
[0202]
Further, the present invention is not limited to the first and second embodiments described above and can be modified in various ways at the implementation stage without departing from its gist. Furthermore, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, suppose some components are deleted from the full set shown in an embodiment. If (at least one of) the problems described in the section on the problems to be solved by the invention can still be solved, and (at least one of) the effects described in the section on the effects of the invention can still be obtained, then the configuration without those components can be extracted as an invention.
[0203]
【The invention's effect】
As described above, according to the present invention, processing such as combining an advertisement image or an image of a desired virtual object can be performed on a captured image (a moving image or a still image). This processing is based on three-dimensional features, such as the three-dimensional shapes and positional relationships, of each of the plurality of imaging targets in the image.
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a configuration example of a main part of a terminal device according to a first embodiment of the present invention.
FIG. 2 is a diagram for explaining a normal monochrome image and a depth monochrome image.
FIG. 3 is a diagram schematically illustrating a configuration example of an image acquisition unit.
FIG. 4 is a diagram illustrating a configuration example of a reflected light image capturing unit.
FIG. 5 is a diagram illustrating a three-dimensional image of a hand, obtained from the depth information included in a reflected light image captured by the reflected light image capturing unit with a human hand as the capturing target.
FIG. 6 is a diagram showing a specific example of a reflected light image (FIG. 6A) and a natural light image (FIG. 6B) acquired by an image acquiring unit.
FIG. 7 is a diagram showing, in black and white, an example of a depth color image to be processed by the feature extraction unit.
FIG. 8 is a diagram schematically illustrating the three-dimensional positional relationship of the objects in the depth color image of FIG. 7.
FIG. 9 is a diagram showing the background portion in the depth color image of FIG. 7.
FIG. 10 is a diagram showing the person portion in the depth color image of FIG. 7.
FIG. 11 is a diagram showing the arm portion in the depth color image of FIG. 7.
FIG. 12 is a diagram showing the can portion in the depth color image of FIG. 7.
FIG. 13 is a diagram showing a relationship in the depth direction among the objects in the depth color image of FIG. 7.
FIG. 14 is a diagram showing the positional relationship in the depth direction and the foreground/background in the depth color image of FIG. 7.
FIG. 15 is a view for explaining a method of obtaining the detailed positional relationship of objects in the depth color image from discontinuities (jump edges) in the depth color image.
FIG. 16 is a view for explaining a method of specifying the position in the depth direction of each object in the depth color image of FIG. 7.
FIG. 17 is a view for explaining an example of a special effect-added image showing a result obtained by adding a special effect to the depth color image of FIG. 7.
FIG. 18 is a view for explaining another example of the special effect-added image showing a result of adding a special effect to the depth color image of FIG. 7.
FIG. 19 is a view for explaining still another example of the special effect-added image showing the result of adding the special effect to the depth color image of FIG. 7.
FIG. 20 is a view for explaining still another example of the special effect-added image showing the result of adding the special effect to the depth color image of FIG. 7.
FIG. 21 is a view for explaining still another example of the special effect-added image showing a result obtained by adding a special effect to the depth color image of FIG. 7.
FIG. 22 is a view for explaining still another example of the special effect-added image showing the result of adding the special effect to the depth color image of FIG. 7.
FIG. 23 is a view for explaining still another example of the special effect-added image showing the result of adding the special effect to the depth color image of FIG. 7.
FIG. 24 is a diagram showing an example of screen division for displaying an image on a screen presentation unit.
FIG. 25 is a diagram schematically showing a configuration example of a main part of a terminal device according to a first modification of the first embodiment of the present invention.
FIG. 26 is a diagram schematically illustrating a configuration example of a main part of a terminal device according to a second modification of the first embodiment of the present invention.
FIG. 27 is a diagram schematically showing a configuration example of a main part of a terminal device according to a third modification of the first embodiment of the present invention.
FIG. 28 is a view schematically showing a configuration example of a main part of a terminal device according to a fourth modification of the first embodiment of the present invention.
FIG. 29 is a diagram schematically showing an overall configuration of a communication system according to a second embodiment of the present invention.
FIG. 30 is a diagram schematically illustrating a configuration example of a main part of a server device.
FIG. 31 is a diagram illustrating an example of an image with an advertisement.
FIG. 32 is a view for explaining another example of an image with advertisement.
FIG. 33 is a flowchart illustrating a processing operation of the terminal device.
FIG. 34 is a flowchart illustrating a processing operation of the terminal device.
FIG. 35 is a flowchart illustrating the processing operation of the terminal device.
FIG. 36 is a flowchart for explaining the processing operation of the server device.
FIG. 37 is a flowchart illustrating the processing operation of the server device.
[Explanation of symbols]
1 ... Image acquisition unit
2 ... Feature extraction unit
3 ... Processing unit
4 ... Image presentation unit
5 ... Communication unit
6 ... Special effect selection unit
7 ... Additional image storage unit
8 ... Scene analysis unit
9 ... Selection unit
10 ... Receiving unit
11 ... Transmission unit

Claims

A terminal device that is one of a plurality of terminal devices capable of communicating with each other, the terminal device comprising:
First acquisition means for acquiring a first image including a plurality of imaging targets;
Second acquisition means for acquiring depth information corresponding to each of a plurality of regions that constitute the first image, each region consisting of one or more pixels;
Extracting means for extracting, as feature information, at least the three-dimensional shapes of the plurality of imaging targets and the three-dimensional positional relationship of the plurality of imaging targets, based on the first image and the depth information; and
Generating means for processing the first image to generate a second image based on the feature information extracted by the extracting means.

The terminal device according to claim 1, wherein the generating means generates the second image by combining an image of an object with the first image, and in doing so determines the position of the object in the depth direction within the first image based on at least one of the three-dimensional positional relationship between the object and the plurality of imaging targets and the three-dimensional shapes of the plurality of imaging targets, and combines the image of the object with the first image accordingly.

The terminal device according to claim 1, wherein the generating means generates the second image by combining an image of an object with the first image, and in doing so controls the movement of the object so that it conforms to the three-dimensional shape of the imaging target and combines the image of the object with the first image.

The terminal device according to claim 1, wherein the extraction means extracts the position in the depth direction at which the imaging target exists in the first image and a background region behind the imaging target; and
the generating means generates the second image by combining an image of an object with the first image such that the object exists between the imaging target and the background region.

The terminal device according to claim 1, wherein the extraction means extracts a first position in the depth direction at which a first imaging target, one of the plurality of imaging targets in the first image, exists, and a second position in the depth direction at which a second imaging target, another of the plurality of imaging targets located behind the first imaging target, exists; and
the generating means generates the second image by combining an image of an object with the first image such that the object exists between the first position and the second position.

The terminal device according to claim 1, wherein the extraction means extracts a third position, which is the three-dimensional position at which the imaging target exists in the first image; and
the generating means generates the second image by combining an image of an object with the first image, and in doing so, based on the motion of the object and the motion of the imaging target, determines that the object has collided with the imaging target when the object reaches the third position, controls the motion or expression of the object so as to correspond to the collision, and combines the image of the object with the first image.

The terminal device according to claim 6, wherein, upon determining that the object has collided with the imaging target, the generating means combines an effect expression of the collision with the first image.

The terminal device according to claim 1, wherein the extraction means extracts a third position, which is the three-dimensional position at which the imaging target exists in the first image; and
the generating means generates the second image by combining an image of an object with the first image, and in doing so controls the movement and expression of the object based on the third position of the imaging target and a fourth position, which is the three-dimensional position of the object in the first image, and combines the image of the object with the first image.

The terminal device according to claim 1, wherein the extraction means extracts, from the first image, the respective images of the plurality of imaging targets and an image of the background region behind them; and
the generating means generates the second image by processing at least one of the image of the background region and the respective images of the plurality of imaging targets.

The terminal device according to claim 1, wherein the second acquisition means comprises:
Means for acquiring a third image that corresponds to the first image and includes depth information in each pixel value; and
Means for extracting, from the third image, depth information corresponding to each region of the first image;
and the extraction means extracts at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship of the plurality of imaging targets from the first image and the depth information corresponding to each region of the first image.

The terminal device according to claim 1, wherein the second acquisition means comprises:
Light emitting means for emitting light toward the imaging target;
First light receiving means for receiving, while the light emitting means is emitting light, a first light amount that includes the light reflected from the imaging target;
Second light receiving means for receiving, while the light emitting means is not emitting light, a second light amount that does not include the reflected light;
Means for subtracting the second light amount from the first light amount to extract the reflected-light component and generate a third image including depth information in each pixel value; and
Means for calculating, from the third image, depth information corresponding to each region of the first image;
and the extraction means extracts at least the three-dimensional shapes of the imaging targets and the three-dimensional positional relationship of the plurality of imaging targets from the first image and the depth information corresponding to each region of the first image.

The terminal device according to claim 10, wherein the third image is obtained at the same time as the first image.

The terminal device according to claim 1, wherein the first image acquired by the first acquisition means is a color image.

The terminal device according to claim 1, wherein the generating means generates the second image by combining with the first image an additional image that expresses an effect such as an impact, or an object, the terminal device further comprising:
Storage means for storing a plurality of types of additional images to be combined with the first image by the generating means; and
Means for selecting the additional image to be combined with the first image from the plurality of types of additional images stored in the storage means.

The terminal device according to claim 1, wherein the generating means generates the second image by combining with the first image an additional image that expresses an effect such as an impact, or an object, the terminal device further comprising:
Storage means for storing a plurality of types of additional images to be combined with the first image by the generating means;
Means for extracting from the first image, as features of the scene, at least one of the objects shown in the first image, the color atmosphere of the image, and the composition of the image; and
Means for selecting the additional image to be combined with the first image from the plurality of types of additional images stored in the storage means, based on the features of the scene.

The terminal device according to claim 1, further comprising a sound effect adding unit that adds a sound effect suitable for the second image.

The terminal device according to claim 1, further comprising:
Display means for displaying the second image; and
Communication means for transmitting and receiving at least the second image to and from another of the plurality of terminal devices.

A server device in a communication system including a plurality of terminal devices capable of communicating with each other and the server device communicably connected to the plurality of terminal devices, the server device comprising:
Receiving means for receiving a first image that is transmitted from one of the plurality of terminal devices and includes a plurality of imaging targets, together with depth information corresponding to each of a plurality of regions that constitute the first image, each region consisting of one or more pixels;
Extraction means for extracting, based on the first image and the depth information received by the receiving means, at least the three-dimensional shapes of the plurality of imaging targets and the three-dimensional positional relationship between the plurality of imaging targets;
Generating means for generating a second image by combining an advertisement image with the first image based on at least one of the three-dimensional shapes and the three-dimensional positional relationship of the plurality of imaging targets obtained by the extraction means; and
Transmitting means for transmitting the second image to another of the plurality of terminal devices.

A server device in a communication system including at least a plurality of terminal devices capable of communicating with each other and the server device communicably connected to the plurality of terminal devices, the server device comprising:
Receiving means for receiving a first image that is transmitted from one of the plurality of terminal devices and includes at least a plurality of imaging targets, together with feature information including at least the three-dimensional shapes of the plurality of imaging targets and the three-dimensional positional relationship of the plurality of imaging targets, the feature information having been extracted from depth information corresponding to each of a plurality of regions that constitute the first image, each region consisting of one or more pixels;
Generating means for generating a second image by combining an advertisement image with the first image based on the feature information received by the receiving means; and
Transmitting means for transmitting the second image to another of the plurality of terminal devices.

An image processing method comprising: acquiring a first image including a plurality of imaging targets; acquiring depth information corresponding to each of a plurality of regions that constitute the first image, each region consisting of one or more pixels;
extracting, based on the first image and the depth information, at least the three-dimensional shapes of the plurality of imaging targets and the three-dimensional positional relationship between the plurality of imaging targets as feature information; and processing the first image based on the feature information.