JP2013030898A

JP2013030898A - Image transmission method, image transmission apparatus, image sending apparatus, image receiving apparatus, image sending program, and image receiving program

Info

Publication number: JP2013030898A
Application number: JP2011164283A
Authority: JP
Inventors: Shiori Sugimoto; 志織杉本; Shinya Shimizu; 信哉志水; Nobuhiko Matsuura; 宣彦松浦
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2011-07-27
Filing date: 2011-07-27
Publication date: 2013-02-07
Anticipated expiration: 2031-07-27
Also published as: JP5749595B2

Abstract

[Problem] To reduce the total number of pixels in the transmitted data while maintaining the quality of the synthesized image.
[Solution] The method comprises the steps of: downsampling multi-view images based on a reduction ratio and outputting low-resolution images; encoding the low-resolution images; encoding a high-resolution depth map; decoding the low-resolution image coded data; decoding the high-resolution depth map coded data; setting inter-view correspondences by finding, for each pixel, its corresponding point in another view at fractional-pixel precision, using the per-pixel depth information obtained from the high-resolution multi-view depth map; obtaining the inter-view correspondences by projecting each three-dimensional projection point onto another view; generating high-resolution images from the low-resolution images by upsampling while referring to fractional-pixel values of other views based on the inter-view correspondences; and synthesizing an image at an arbitrary viewpoint from the high-resolution images and the high-resolution depth map.
[Selected drawing] FIG. 1

Description

The present invention relates to an image transmission method, an image transmission apparatus, an image sending apparatus, an image receiving apparatus, an image sending program, and an image receiving program.

Free-viewpoint images, which let the viewer freely control the viewpoint, are attracting attention as one of the next generation of image media. A free-viewpoint image is produced by capturing the target scene from various positions and angles with many imaging devices to acquire the ray information of the scene, and then reconstructing from it the ray information at an arbitrary viewpoint, so that images seen from various viewpoints can be generated.

Generating such images is not easy, because acquiring all the ray information in a scene by capture alone would require an enormous number of densely installed imaging devices. In practice, the ray information that was not captured must be synthesized, using some interpolation method, from the ray information obtained by a small number of sparsely arranged imaging devices.

One such interpolation-based synthesis method is Depth Image Based Rendering (DIBR), which synthesizes a virtual-viewpoint image from multi-view images and scene depth information estimated from them. The depth information is the distance from the camera to the subject at each pixel of the multi-view images. When transmitting a free-viewpoint image, it is effective to estimate the depth information on the sending side, describe it as multi-view grayscale images (a depth map), and transmit it. This reduces the amount of computation on the receiving side, and it also allows more accurate estimation because the depth is estimated from multi-view images on which no coding distortion has yet been superimposed.

Image data consisting of such multi-view images and a multi-view depth map carries an enormous amount of information, so a more efficient coding scheme is essential, and various schemes are being studied. Besides reducing the code amount, however, there is a second requirement: it has been reported that, because of the upper limits on decoder throughput and memory capacity, the total number of pixels of the image data must be held to roughly a few times that of an ordinary single-view image. To reduce the total number of pixels, the multi-view depth map is therefore generally downsampled (see, for example, Non-Patent Document 1).
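As a rough illustration of the pixel budget described above, the following sketch counts the pixels of N views of images plus N depth maps and expresses the total as a multiple of one full-resolution view. The function name, its parameters, and the three-view 1920x1080 example are illustrative assumptions, not values taken from this document.

```python
def total_pixel_ratio(width, height, num_views, image_scale=1.0, depth_scale=1.0):
    """Return the total pixel count as a multiple of one full-resolution view.

    image_scale and depth_scale are per-axis scale factors (1.0 = no reduction).
    """
    single_view = width * height
    image_pixels = num_views * int(width * image_scale) * int(height * image_scale)
    depth_pixels = num_views * int(width * depth_scale) * int(height * depth_scale)
    return (image_pixels + depth_pixels) / single_view


# Hypothetical example: three 1920x1080 views.
full = total_pixel_ratio(1920, 1080, 3)                          # nothing reduced
depth_half = total_pixel_ratio(1920, 1080, 3, depth_scale=0.5)   # depth maps halved per axis
image_half = total_pixel_ratio(1920, 1080, 3, image_scale=0.5)   # images halved per axis
```

Under these assumed numbers, halving either the depth maps or the images along each axis removes the same number of pixels, so the two options differ only in which information is degraded.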

Non-Patent Document 1: Yea S, Vetro A, "VIEW SYNTHESIS PREDICTION FOR RATE-OVERHEAD REDUCTION IN FTV", 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (2008).

However, downsampling the multi-view depth map loses three-dimensional information and impairs the accuracy of the pixel correspondences between views, so the image quality drops markedly when an image at a virtual viewpoint is synthesized. Conversely, if the resolution of the depth map is kept and the multi-view images are downsampled instead, this problem does not arise because the three-dimensional information is preserved accurately; but it is easy to infer that the quality of the synthesized image will drop all the same, because the quality of the images used for synthesis is itself degraded.

The present invention has been made in view of these circumstances, and its object is to provide a free-viewpoint image transmission method, an image transmission apparatus, an image sending apparatus, an image receiving apparatus, an image sending program, and an image receiving program that can reduce the total number of pixels in the transmitted data while maintaining the quality of the synthesized image.

The present invention is an image transmission method for a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, the method comprising: a reduction ratio setting step of setting a reduction ratio for downsampling the high-resolution multi-view images; a downsampling step of downsampling the high-resolution multi-view images based on the reduction ratio and outputting low-resolution multi-view images; an image encoding step of encoding the low-resolution multi-view images obtained in the downsampling step and outputting low-resolution multi-view image coded data; a depth map encoding step of encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map coded data; an image decoding step of decoding the low-resolution multi-view image coded data obtained in the image encoding step; a depth map decoding step of decoding the high-resolution multi-view depth map coded data obtained in the depth map encoding step; an inter-view correspondence setting step of setting inter-view correspondences by finding, for each pixel, its corresponding point in another view at fractional-pixel precision, using the per-pixel depth information obtained from the high-resolution multi-view depth map obtained in the depth map decoding step; an upsampling step of generating high-resolution multi-view images from the low-resolution multi-view images obtained in the image decoding step by upsampling while referring to fractional-pixel values of other views based on the inter-view correspondences; and a free-viewpoint image synthesis step of synthesizing an image at an arbitrary viewpoint from the high-resolution multi-view images obtained in the upsampling step and the high-resolution multi-view depth map.

The present invention further comprises a first filter selection step of selecting a filter from a predetermined group of filters used for downsampling, and a second filter selection step of selecting a filter from a predetermined group of filters used for upsampling, wherein the downsampling step downsamples the high-resolution multi-view images using the filter selected in the first filter selection step, and the upsampling step upsamples the low-resolution multi-view images using the filter selected in the second filter selection step.

The present invention further comprises an additional information encoding step of encoding information identifying the filter selected in the first filter selection step and outputting it as additional information coded data, and an additional information decoding step of decoding the additional information coded data and outputting the information identifying the filter, wherein the second filter selection step selects the filter used for upsampling based on the information identifying the filter.

The present invention further comprises a first filter setting step of setting the filter that gives the highest image restoration efficiency, and a second filter setting step of setting the filter that gives the highest image restoration efficiency, wherein the downsampling step downsamples the high-resolution multi-view images using the filter set in the first filter setting step, and the upsampling step upsamples the low-resolution multi-view images using the filter set in the second filter setting step.

The present invention further comprises an additional information encoding step of encoding identification information of the filter set in the first filter setting step and outputting additional information coded data, and an additional information decoding step of decoding the additional information coded data and outputting the identification information of the filter, wherein the second filter setting step sets the filter used for the upsampling based on the identification information of the filter.

The present invention is an image transmission apparatus for a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, the apparatus comprising: reduction ratio setting means for setting a reduction ratio for downsampling the high-resolution multi-view images; downsampling means for downsampling the high-resolution multi-view images based on the reduction ratio and outputting low-resolution multi-view images; image encoding means for encoding the low-resolution multi-view images obtained by the downsampling means and outputting low-resolution multi-view image coded data; depth map encoding means for encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map coded data; image decoding means for decoding the low-resolution multi-view image coded data obtained by the image encoding means; depth map decoding means for decoding the high-resolution multi-view depth map coded data obtained by the depth map encoding means; inter-view correspondence setting means for setting inter-view correspondences by finding, for each pixel, its corresponding point in another view at fractional-pixel precision, using the per-pixel depth information obtained from the high-resolution multi-view depth map obtained by the depth map decoding means; upsampling means for generating high-resolution multi-view images from the low-resolution multi-view images obtained by the image decoding means by upsampling while referring to fractional-pixel values of other views based on the inter-view correspondences; and free-viewpoint image synthesis means for synthesizing an image at an arbitrary viewpoint from the high-resolution multi-view images obtained by the upsampling means and the high-resolution multi-view depth map.

The present invention further comprises first filter selection means for selecting a filter from a predetermined group of filters used for downsampling, and second filter selection means for selecting a filter from a predetermined group of filters used for upsampling, wherein the downsampling means downsamples the high-resolution multi-view images using the filter selected by the first filter selection means, and the upsampling means upsamples the low-resolution multi-view images using the filter selected by the second filter selection means.

The present invention further comprises first filter setting means for setting the filter that gives the highest image restoration efficiency, and second filter setting means for setting the filter that gives the highest image restoration efficiency, wherein the downsampling means downsamples the high-resolution multi-view images using the filter set by the first filter setting means, and the upsampling means upsamples the low-resolution multi-view images using the filter set by the second filter setting means.

The present invention is an image sending apparatus that sends a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, the apparatus comprising: reduction ratio setting means for setting a reduction ratio for downsampling the high-resolution multi-view images; downsampling means for downsampling the high-resolution multi-view images based on the reduction ratio and outputting low-resolution multi-view images; image encoding means for encoding the low-resolution multi-view images obtained by the downsampling means and outputting low-resolution multi-view image coded data; and depth map encoding means for encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map coded data.

The present invention is an image receiving apparatus that receives free-viewpoint image coded data sent from an image sending apparatus that sends a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, the image receiving apparatus comprising: image decoding means for decoding the low-resolution multi-view image coded data obtained by the image encoding means; depth map decoding means for decoding the high-resolution multi-view depth map coded data obtained by the depth map encoding means; inter-view correspondence setting means for setting inter-view correspondences by finding, for each pixel, its corresponding point in another view at fractional-pixel precision, using the per-pixel depth information obtained from the high-resolution multi-view depth map obtained by the depth map decoding means; upsampling means for generating high-resolution multi-view images from the low-resolution multi-view images obtained by the image decoding means by upsampling while referring to fractional-pixel values of other views based on the inter-view correspondences; and free-viewpoint image synthesis means for synthesizing an image at an arbitrary viewpoint from the high-resolution multi-view images obtained by the upsampling means and the high-resolution multi-view depth map.

The present invention is an image sending program that causes a computer on an image sending apparatus, which sends a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, to perform image sending processing, the program causing the computer to execute: a reduction ratio setting step of setting a reduction ratio for downsampling the high-resolution multi-view images; a downsampling step of downsampling the high-resolution multi-view images based on the reduction ratio and outputting low-resolution multi-view images; an image encoding step of encoding the low-resolution multi-view images obtained in the downsampling step and outputting low-resolution multi-view image coded data; and a depth map encoding step of encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map coded data.

The present invention is an image receiving program that causes a computer on an image receiving apparatus, which receives free-viewpoint image coded data sent from an image sending apparatus that sends a free-viewpoint image consisting of high-resolution multi-view images of the same scene captured by a plurality of cameras and a high-resolution multi-view depth map that describes the per-pixel depth information of those images as grayscale images, to perform image receiving processing, the program causing the computer to execute: an image decoding step of decoding the low-resolution multi-view image coded data obtained in the image encoding step; a depth map decoding step of decoding the high-resolution multi-view depth map coded data obtained in the depth map encoding step; an inter-view correspondence setting step of setting inter-view correspondences by finding, for each pixel, its corresponding point in another view at fractional-pixel precision, using the per-pixel depth information obtained from the high-resolution multi-view depth map obtained in the depth map decoding step; an upsampling step of generating high-resolution multi-view images from the low-resolution multi-view images obtained in the image decoding step by upsampling while referring to fractional-pixel values of other views based on the inter-view correspondences; and a free-viewpoint image synthesis step of synthesizing an image at an arbitrary viewpoint from the high-resolution multi-view images obtained in the upsampling step and the high-resolution multi-view depth map.

According to the present invention, the high-resolution multi-view images are downsampled and sent as low-resolution multi-view images together with the high-resolution depth map. After reception, the views refer to one another's pixel information using the three-dimensional information obtained from the high-resolution depth map, so that the low-resolution multi-view images can be upsampled and used for free-viewpoint image synthesis. As a result, the total number of pixels in the transmitted data can be reduced while the quality of the synthesized image is maintained.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the image sending apparatus 200 shown in FIG. 1.
FIG. 3 is a flowchart showing the processing operation of the image receiving apparatus 300 shown in FIG. 1.
FIG. 4 is a block diagram showing the configuration of the second embodiment of the present invention.
FIG. 5 is a flowchart showing the processing operation of the image sending apparatus 200 shown in FIG. 4.
FIG. 6 is a flowchart showing the processing operation of the image receiving apparatus 300 shown in FIG. 4.
FIG. 7 is a block diagram showing the configuration of the third embodiment of the present invention.
FIG. 8 is a flowchart showing the processing operation of the image sending apparatus 200 shown in FIG. 7.
FIG. 9 is a flowchart showing the processing operation of the image receiving apparatus 300 shown in FIG. 7.

<First Embodiment>
An image transmission apparatus according to the first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of this embodiment. The image transmission apparatus 100 consists of an image sending apparatus 200 and an image receiving apparatus 300. The image sending apparatus 200 comprises a multi-view image input unit 201, a multi-view depth map input unit 202, a reduction ratio setting unit 203, a downsampling unit 204, a depth map encoding unit 205, an image encoding unit 206, and a multiplexing unit 207.

The multi-view image input unit 201 inputs high-resolution multi-view images, that is, images of the scene captured from multiple viewpoints for generating a free-viewpoint image. The multi-view depth map input unit 202 inputs a high-resolution multi-view depth map. The reduction ratio setting unit 203 determines the reduction ratio of the multi-view images. The downsampling unit 204 reduces the input high-resolution multi-view images by the set reduction ratio and generates low-resolution multi-view images. The depth map encoding unit 205 encodes the input high-resolution multi-view depth map and generates high-resolution multi-view depth map coded data. The image encoding unit 206 encodes the low-resolution multi-view images and generates low-resolution multi-view image coded data. The multiplexing unit 207 multiplexes the low-resolution multi-view image coded data and the high-resolution multi-view depth map coded data and generates free-viewpoint image coded data.

The image receiving apparatus 300 comprises a coded data input unit 301, a demultiplexing unit 302, an image decoding unit 303, a depth map decoding unit 304, a three-dimensional projection point setting unit 305, an inter-view correspondence setting unit 306, an upsampling unit 307, a virtual viewpoint setting unit 308, and a free-viewpoint image synthesis unit 309.

The coded data input unit 301 inputs the free-viewpoint image coded data. The demultiplexing unit 302 demultiplexes the input coded data into the low-resolution multi-view image coded data and the high-resolution multi-view depth map coded data. The image decoding unit 303 decodes the low-resolution multi-view image coded data into low-resolution multi-view images. The depth map decoding unit 304 decodes the high-resolution multi-view depth map coded data into a high-resolution multi-view depth map. The three-dimensional projection point setting unit 305 calculates, for each pixel of the high-resolution multi-view images, its projection point in three-dimensional space. The inter-view correspondence setting unit 306 uses the three-dimensional projection point of each pixel of each view to obtain the inter-view pixel correspondences of the high-resolution multi-view images at fractional-pixel precision. The upsampling unit 307 upsamples the low-resolution multi-view images using the obtained inter-view pixel correspondences to produce high-resolution multi-view images. The virtual viewpoint setting unit 308 sets the virtual viewpoint for which the free-viewpoint image is generated. The free-viewpoint image synthesis unit 309 uses the high-resolution multi-view images and the high-resolution depth map to synthesize, as the free-viewpoint image, the image of the scene observed from the set virtual viewpoint.
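The roles of the three-dimensional projection point setting unit 305 and the inter-view correspondence setting unit 306 can be sketched as follows. This is a minimal illustration that assumes a pinhole camera model with known intrinsics and extrinsics, and a depth value already converted to a metric distance along the optical axis; the camera dictionary layout, function names, and example numbers are hypothetical and are not taken from this document.

```python
def backproject(u, v, depth, cam):
    """Lift pixel (u, v) with depth Z to a 3-D point in world coordinates."""
    # Point in the camera coordinate system (assumed pinhole model).
    pc = [(u - cam["cx"]) / cam["fx"] * depth,
          (v - cam["cy"]) / cam["fy"] * depth,
          depth]
    # Assumed convention: p_cam = R @ p_world + t, so p_world = R^T @ (p_cam - t).
    R, t = cam["R"], cam["t"]
    d = [pc[i] - t[i] for i in range(3)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]


def project(pw, cam):
    """Project a world point into a camera; the result is generally fractional."""
    R, t = cam["R"], cam["t"]
    pc = [sum(R[i][k] * pw[k] for k in range(3)) + t[i] for i in range(3)]
    u = cam["fx"] * pc[0] / pc[2] + cam["cx"]
    v = cam["fy"] * pc[1] / pc[2] + cam["cy"]
    return u, v


def correspond(u, v, depth, src_cam, dst_cam):
    """Corresponding point of a source-view pixel in another view."""
    return project(backproject(u, v, depth, src_cam), dst_cam)


# Hypothetical pair of parallel cameras 0.1 units apart along the x axis.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam_a = {"fx": 1000.0, "fy": 1000.0, "cx": 320.0, "cy": 240.0, "R": I3, "t": [0.0, 0.0, 0.0]}
cam_b = {"fx": 1000.0, "fy": 1000.0, "cx": 320.0, "cy": 240.0, "R": I3, "t": [-0.1, 0.0, 0.0]}
u_b, v_b = correspond(400.0, 240.0, 3.0, cam_a, cam_b)
```

In this example the integer pixel (400, 240) of the first camera lands between pixel centres of the second camera, which is why the correspondence has to be kept at fractional-pixel precision.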

Next, the processing operation of the image sending apparatus 200 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart of the processing operation by which the image sending apparatus 200 shown in FIG. 1 transmits the data needed for free-viewpoint image synthesis: multi-view images of the same scene captured from a plurality of viewpoints, and a multi-view depth map recording the depth information of the scene at each viewpoint.

First, the multi-view image input unit 201 inputs the high-resolution multi-view images, and the multi-view depth map input unit 202 inputs the high-resolution multi-view depth map (step S1). Next, the reduction ratio setting unit 203 sets the reduction ratio of the multi-view images (step S2). The same reduction ratio may be used vertically and horizontally, or independent reduction ratios may be set for the vertical and horizontal directions. The value of the reduction ratio may be fixed, or an appropriate value may be calculated from the resolution of the multi-view images, the number of views, and so on. If the image receiving apparatus 300 cannot obtain the value by itself, the sending side may encode and transmit it.
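One conceivable way to calculate the reduction ratio from the resolution and the number of views is to fit the images into whatever pixel budget remains after the full-resolution depth maps are accounted for. The rule below is a hypothetical example of such a calculation, not the method prescribed by this document; the function name, the budget parameter, and the use of a single ratio for both axes are assumptions, and rounding to integer image dimensions is ignored.

```python
import math


def choose_reduction_ratio(width, height, num_views, max_total_pixels):
    """Pick one per-axis scale s in (0, 1] for the images so that
    num_views * (s*s*W*H + W*H) <= max_total_pixels,
    with the depth maps kept at full resolution."""
    full = width * height
    # Pixel budget left per view for the image after its full-size depth map.
    remaining = max_total_pixels / num_views - full
    if remaining <= 0:
        raise ValueError("budget is too small even for the full-resolution depth maps")
    return min(1.0, math.sqrt(remaining / full))


# Hypothetical budget: four times one 1920x1080 view, shared by three views.
s = choose_reduction_ratio(1920, 1080, 3, 4 * 1920 * 1080)
```

A receiver that knows the same rule and the same budget could recompute the value; otherwise the ratio would have to be transmitted, as the paragraph above notes.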

Next, the downsampling unit 204 reduces the high-resolution multi-view images at the set reduction ratio to generate low-resolution multi-view images (step S3). Then the depth map encoding unit 205 encodes the high-resolution multi-view depth maps into high-resolution multi-view depth map code data, and the image encoding unit 206 encodes the low-resolution multi-view images into low-resolution multi-view image code data (step S4). Any encoding method may be used.
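The reduction of step S3 can be illustrated with a minimal sketch. It assumes integer reduction factors and a simple box (block-average) filter, neither of which is prescribed by the description; the function name `downsample` and the list-of-rows image layout are hypothetical choices made for illustration only.

```python
def downsample(image, factor_y, factor_x):
    """Reduce a 2-D image (list of rows) by averaging factor_y x factor_x blocks.

    Assumption: integer factors and a box filter; the description allows any
    reduction ratio, set jointly or independently for the two directions.
    """
    height, width = len(image), len(image[0])
    if height % factor_y or width % factor_x:
        raise ValueError("image size must be a multiple of the factors")
    result = []
    for block_y in range(0, height, factor_y):
        row = []
        for block_x in range(0, width, factor_x):
            block = [image[y][x]
                     for y in range(block_y, block_y + factor_y)
                     for x in range(block_x, block_x + factor_x)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result
```

Passing different values for `factor_y` and `factor_x` corresponds to setting independent vertical and horizontal reduction ratios in step S2.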

Next, the multiplexing unit 207 multiplexes the low-resolution multi-view image code data and the high-resolution multi-view depth map code data into free-viewpoint image code data (step S5) and outputs the multiplexed code data (step S6). The output may be transmitted over a network or recorded on a recording medium.
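The description does not specify a container format for the multiplexed code data. As one possible illustration, the sketch below prefixes the two code streams with their byte lengths so that a receiver can split them again. The header layout and the names `multiplex` and `demultiplex` are assumptions, not part of the described apparatus.

```python
import struct

# Hypothetical header: two big-endian unsigned 32-bit lengths.
_HEADER = ">II"


def multiplex(image_code: bytes, depth_code: bytes) -> bytes:
    """Combine image and depth-map code data into one byte stream (cf. step S5)."""
    header = struct.pack(_HEADER, len(image_code), len(depth_code))
    return header + image_code + depth_code


def demultiplex(stream: bytes):
    """Split a stream produced by multiplex() back into its two parts."""
    header_size = struct.calcsize(_HEADER)
    image_len, depth_len = struct.unpack_from(_HEADER, stream, 0)
    if len(stream) != header_size + image_len + depth_len:
        raise ValueError("stream length does not match its header")
    image_code = stream[header_size:header_size + image_len]
    depth_code = stream[header_size + image_len:]
    return image_code, depth_code
```

Additional side information, such as a reduction ratio or camera parameters, could be carried in further length-prefixed fields of the same kind.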

Next, the processing operation of the image receiving apparatus 300 shown in FIG. 1 is described with reference to FIG. 3. FIG. 3 is a flowchart of that operation. It shows the processing in which the image receiving apparatus 300 receives the low-resolution multi-view images and high-resolution multi-view depth maps transmitted by the image sending apparatus 200, upsamples the multi-view images using the high-resolution multi-view depth maps, and then synthesizes a free-viewpoint image from them.

First, the code data input unit 301 receives the free-viewpoint image code data (step S11). The input may arrive over a network or be read from a recording medium. The code data is passed to the demultiplexing unit 302, which demultiplexes it into low-resolution multi-view image code data and high-resolution multi-view depth map code data (step S12). The image decoding unit 303 decodes the low-resolution multi-view image code data and outputs low-resolution multi-view images, and the depth map decoding unit 304 decodes the high-resolution multi-view depth map code data and outputs high-resolution multi-view depth maps (step S13). The decoding method corresponds to the encoding method used by the sending apparatus.

Next, the three-dimensional projection point setting unit 305 calculates, for each pixel of the high-resolution multi-view images, its projection point in three-dimensional space, based on the depth information obtained from the high-resolution multi-view depth maps and the camera parameters of each viewpoint (step S14). It suffices that the receiving side obtains the same camera parameter values as the sending side; they may be estimated from the images, or encoded and transmitted by the sending side.

Next, the inter-viewpoint corresponding point setting unit 306 obtains the inter-viewpoint pixel correspondence of the high-resolution multi-view images with fractional-pixel accuracy (step S15). For each pixel, the position (with fractional-pixel accuracy) at which its three-dimensional projection point falls on the image plane of another viewpoint's camera is taken as the position of its corresponding point in that viewpoint.
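Steps S14 and S15 can be sketched as a back-projection followed by a re-projection. The sketch assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and extrinsics `R`, `t` such that X_cam = R X_world + t, and it assumes that the depth value is the distance along the optical axis. The description does not fix any of these conventions, and all names here are hypothetical.

```python
def back_project(u, v, depth, cam):
    """Step S14 (sketch): pixel (u, v) with its depth -> 3-D point in world coordinates."""
    x_cam = [(u - cam["cx"]) / cam["fx"] * depth,
             (v - cam["cy"]) / cam["fy"] * depth,
             depth]
    r, t = cam["R"], cam["t"]
    diff = [x_cam[i] - t[i] for i in range(3)]
    # X_world = R^T (X_cam - t)
    return [sum(r[j][i] * diff[j] for j in range(3)) for i in range(3)]


def project(point, cam):
    """Project a world point onto the image plane of cam; the result is fractional."""
    r, t = cam["R"], cam["t"]
    x_cam = [sum(r[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = cam["fx"] * x_cam[0] / x_cam[2] + cam["cx"]
    v = cam["fy"] * x_cam[1] / x_cam[2] + cam["cy"]
    return u, v


def corresponding_point(u, v, depth, src_cam, dst_cam):
    """Step S15 (sketch): fractional position in dst_cam of pixel (u, v) of src_cam."""
    return project(back_project(u, v, depth, src_cam), dst_cam)
```

The returned coordinates are expressed on the high-resolution pixel grid. Referring to a low-resolution image additionally requires scaling by the reduction ratio, which is omitted here because it depends on the pixel-center convention of the downsampling filter.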

Next, the upsampling unit 307 upsamples the low-resolution multi-view images using the inter-viewpoint pixel correspondence obtained in step S15 and outputs high-resolution multi-view images (step S16). This upsampling process is expressed by the following function U_v.

Here, HR_v is the high-resolution image at viewpoint v, LR_k is the low-resolution image at viewpoint k, and f_{k,l} is the function obtained in step S15 that gives, for each pixel of the high-resolution image at viewpoint k, the corresponding position with fractional-pixel accuracy on the low-resolution image at viewpoint l.

Specifically, the upsampling is performed by referring to the pixel values at the fractional pixel positions in the other viewpoints given by the inter-viewpoint pixel correspondence obtained in step S15. The upsampling process u_v in this case is expressed by the following equation.

Here, n is the number of viewpoints of the multi-view images, and C_l(p_k) is the corresponding point, in the low-resolution image LR_l at viewpoint l, of pixel p_k of the high-resolution image at viewpoint k. The pixel value LR_l[C_l(p_v)] at a fractional pixel position is obtained from the values of the surrounding integer pixels by any interpolation method, for example linear interpolation from the four neighboring pixels. Alternatively, the high-resolution depth map at that viewpoint may be consulted: where depth values differ greatly between neighboring pixels, as at object boundaries, interpolation may use only those neighboring pixels whose depth values lie within a certain range of the depth value of the nearest pixel. The depth value for the pixel of the upsampling target viewpoint may be used in place of the depth value of the nearest pixel. However, if the depth values representing the same three-dimensional position differ between viewpoints, the depth value for the pixel of the upsampling target viewpoint must first be converted into a depth value at the viewpoint where the interpolation is performed.
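The two interpolation options described above can be sketched as follows. The first function is ordinary bilinear interpolation from the four neighboring pixels. The second restricts the interpolation to neighbors whose depth is close to that of the nearest pixel. For simplicity the sketch assumes a depth array sampled on the same grid as the image being interpolated; the function names and the threshold parameter `max_depth_diff` are hypothetical.

```python
import math


def _neighbors(img, x, y):
    """Return the four integer neighbours of (x, y) as (px, py, bilinear weight)."""
    height, width = len(img), len(img[0])
    x = min(max(x, 0.0), width - 1.0)   # clamp to the image area
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    return [(x0, y0, (1 - fx) * (1 - fy)),
            (x1, y0, fx * (1 - fy)),
            (x0, y1, (1 - fx) * fy),
            (x1, y1, fx * fy)]


def bilinear(img, x, y):
    """Pixel value at fractional position (x, y) by bilinear interpolation."""
    return sum(w * img[py][px] for px, py, w in _neighbors(img, x, y))


def depth_aware_bilinear(img, depth, x, y, max_depth_diff):
    """Interpolate using only neighbours whose depth is near the nearest pixel's depth."""
    nbrs = _neighbors(img, x, y)
    # The nearest integer pixel is the neighbour with the largest bilinear weight.
    nx, ny, _ = max(nbrs, key=lambda n: n[2])
    ref_depth = depth[ny][nx]
    kept = [(px, py, w) for px, py, w in nbrs
            if abs(depth[py][px] - ref_depth) <= max_depth_diff]
    total = sum(w for _, _, w in kept)  # > 0 because the nearest pixel is always kept
    return sum(w * img[py][px] for px, py, w in kept) / total
```

Renormalizing by the sum of the kept weights keeps the result within the range of the contributing pixels, so that values from the far side of an object boundary do not bleed into the interpolated pixel.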

Any function may be used as u_v. For example, the weighted average expressed by the following equation may be used.

If all weights w_k(p_v) are set to 1, this reduces to a simple average. The weight w_k(p_v) may instead increase as the distance between the upsampling target viewpoint v and viewpoint k decreases, or increase as the distance between C_k(p_v) and its nearest integer pixel position decreases, or be a combination of these two weights.
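A minimal sketch of such a weighted average follows. It assumes that viewpoints are indexed by their position along a one-dimensional camera array, that the two weights are combined by multiplication, and that each weight has the form 1 / (1 + distance). These specific choices and all function names are illustrative assumptions; the description only requires weights that grow as the respective distance shrinks.

```python
import math


def view_weight(target_view, source_view):
    """Larger when the source viewpoint is closer to the upsampling target viewpoint."""
    return 1.0 / (1.0 + abs(target_view - source_view))


def position_weight(x, y):
    """Larger when the fractional position (x, y) is closer to an integer pixel."""
    return 1.0 / (1.0 + math.hypot(x - round(x), y - round(y)))


def upsample_pixel(target_view, samples):
    """Weighted average over viewpoints for one high-resolution pixel.

    samples: list of (view_index, x, y, value), where (x, y) is the fractional
    corresponding position in that view's low-resolution image and value is
    the pixel value already interpolated at that position.
    """
    weights = [view_weight(target_view, k) * position_weight(x, y)
               for k, x, y, _ in samples]
    total = sum(weights)
    return sum(w * value for w, (_, _, _, value) in zip(weights, samples)) / total
```

Replacing both weight functions with a constant 1 gives the simple average mentioned above.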

The weights may also be determined so that the resulting high-resolution image minimizes the error measured on the low-resolution image.

Here, argmin returns the value of the parameter written beneath it that minimizes the given function. D_v denotes the downsampling process of step S3, and p_v' denotes the integer pixel position on the low-resolution image corresponding to p_v. The squared difference is used as the error here, but other error measures, such as the absolute difference, may be used. The above equation considers only the error at viewpoint v, but the sum, maximum, or variance of the errors in the low-resolution images of all viewpoints when the high-resolution images are obtained may be considered instead. In addition to the error, an evaluation value of how natural the high-resolution image is may be added to the error measure; the total variation norm, for example, can serve as such an evaluation value.
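The minimization described here is not reproduced as a formula in this text. One form consistent with the definitions of argmin, D_v, and p_v' given above is offered below purely as a reading aid. It is an assumption about the shape of the expression, not the notation of the original publication; HR_v is understood to depend on the weights through the weighted average.

```latex
\{\hat{w}_k(p_v)\}_{k=1}^{n}
  = \operatorname*{argmin}_{\{w_k(p_v)\}_{k=1}^{n}}
    \Bigl( D_v(HR_v)[p_v'] - LR_v[p_v'] \Bigr)^{2}
```

Under this reading, the weights are chosen so that downsampling the reconstructed high-resolution image again reproduces the received low-resolution pixel as closely as possible.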

Super-resolution may also be used as an upsampling technique that minimizes the errors in the low-resolution images of all viewpoints while taking the naturalness of the high-resolution images into account. Details of the super-resolution technique are given in: Farsiu S, Robinson MD, Elad M, Milanfar P, "Fast and robust multiframe super resolution", IEEE Transactions on Image Processing, Vol. 13, No. 10 (2004).

Next, the virtual viewpoint setting unit 308 sets the virtual viewpoint from which the free-viewpoint image is to be generated (step S17). The viewpoint may be given by external input, such as a user setting, or may be a fixed value. The free-viewpoint image synthesis unit 309 then uses the high-resolution multi-view images and the high-resolution depth maps to synthesize, as the free-viewpoint image, an image of the scene observed from the set virtual viewpoint (step S18). Details of the method are given in: Mori Y, Fukushima N, Fujii T, Tanimoto M, "View generation with 3D warping using depth information for FTV", Signal Processing: Image Communication, Vol. 24, No. 1-2 (2009). The three-dimensional projection points of each viewpoint obtained in step S14 can be reused here to reduce the amount of computation. Finally, the free-viewpoint image synthesis unit 309 outputs the synthesized image (step S19).

<Second Embodiment>
Next, an image transmission apparatus according to a second embodiment of the present invention is described. FIG. 4 is a block diagram showing the configuration of the image transmission apparatus according to the second embodiment. In FIG. 4, parts identical to those of the image transmission apparatus shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. The image transmission apparatus shown in FIG. 4 differs from that shown in FIG. 1 in two respects: a filter selection unit 208, which selects the filter used for downsampling, is provided in place of the reduction ratio setting unit 203, and a filter selection unit 310, which selects the filter used for upsampling, is provided ahead of the upsampling unit 307.

Next, the processing operation of the image sending apparatus 200 shown in FIG. 4 is described with reference to FIG. 5. FIG. 5 is a flowchart of that operation. In FIG. 5, steps identical to those of the processing operation shown in FIG. 2 are denoted by the same reference symbols. The processing operation shown in FIG. 5 differs from that shown in FIG. 2 in that it includes a step of selecting the filter used by the downsampling unit 204.

First, the multi-view image input unit 201 receives high-resolution multi-view images, and the multi-view depth map input unit 202 receives high-resolution multi-view depth maps (step S1). Next, the filter selection unit 208 selects the filter used to downsample the multi-view images from among filters commonly used for downsampling (step S2a). An appropriate filter may be selected according to the image content and the number of viewpoints so that the restoration efficiency on the receiving side is as high as possible, or the filter may be selected by external input such as user input. One conceivable selection method is to perform downsampling and upsampling with each of a set of filters designed in advance, compare the results with the original images, and select the filter with the highest restoration rate. Information indicating which filter was selected may be encoded and transmitted as additional information. The downsampling unit 204 then reduces the high-resolution multi-view images using the selected downsampling filter to generate low-resolution multi-view images (step S3).
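The round-trip selection method mentioned above can be sketched on a one-dimensional signal. The sketch assumes an even-length signal, a reduction by a factor of two, mean squared error as the restoration measure, and two toy candidate filters. It also uses a plain single-signal upsampler rather than the inter-viewpoint upsampling of step S16. All names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def decimate(signal):
    """Keep every second sample."""
    return signal[::2]


def pair_average(signal):
    """Average each pair of adjacent samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def repeat(signal):
    """Upsample by repeating every sample twice."""
    return [s for s in signal for _ in range(2)]


# Candidate (downsampling, upsampling) filter pairs, designed in advance.
CANDIDATES = {
    "decimate": (decimate, repeat),
    "pair_average": (pair_average, repeat),
}


def select_filter(signal, candidates=CANDIDATES):
    """Return (name, error) of the pair that best restores the original signal."""
    best_name, best_error = None, float("inf")
    for name, (down, up) in candidates.items():
        error = mse(signal, up(down(signal)))
        if error < best_error:
            best_name, best_error = name, error
    return best_name, best_error
```

The selected name is the kind of information that could be encoded and sent as additional information so that the receiving side can choose the matching upsampling filter.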

Next, the depth map encoding unit 205 encodes the high-resolution multi-view depth maps into high-resolution multi-view depth map code data, and the image encoding unit 206 encodes the low-resolution multi-view images into low-resolution multi-view image code data (step S4). The multiplexing unit 207 then multiplexes the low-resolution multi-view image code data and the high-resolution multi-view depth map code data into free-viewpoint image code data (step S5) and outputs the multiplexed code data (step S6).

Next, the processing operation of the image receiving apparatus 300 shown in FIG. 4 is described with reference to FIG. 6. FIG. 6 is a flowchart of that operation. In FIG. 6, steps identical to those of the processing operation shown in FIG. 3 are denoted by the same reference symbols. The processing operation shown in FIG. 6 differs from that shown in FIG. 3 in that it includes a step of selecting the filter used by the upsampling unit 307.

First, the code data input unit 301 receives the free-viewpoint image code data (step S11). The code data is passed to the demultiplexing unit 302, which demultiplexes it into low-resolution multi-view image code data and high-resolution multi-view depth map code data (step S12). The image decoding unit 303 decodes the low-resolution multi-view image code data and outputs low-resolution multi-view images, and the depth map decoding unit 304 decodes the high-resolution multi-view depth map code data and outputs high-resolution multi-view depth maps (step S13).

Next, the three-dimensional projection point setting unit 305 calculates, for each pixel of the high-resolution multi-view images, its projection point in three-dimensional space, based on the depth information obtained from the high-resolution multi-view depth maps and the camera parameters of each viewpoint (step S14). The inter-viewpoint corresponding point setting unit 306 then obtains the inter-viewpoint pixel correspondence of the high-resolution multi-view images with fractional-pixel accuracy (step S15).

Next, the filter selection unit 310 selects the filter used to upsample the multi-view images (step S15a). This filter must be paired with the downsampling filter used on the sending side, but it may be selected according to the image content and the number of viewpoints, or by external input such as user input. The sending side may also encode information indicating which filter it used and transmit it as additional information.

Next, the upsampling unit 307 upsamples the low-resolution multi-view images by referring to the pixel values of other viewpoints on the basis of the inter-viewpoint pixel correspondence, producing high-resolution multi-view images (step S16). The selected filter is used as the upsampling filter.

Next, the virtual viewpoint setting unit 308 sets the virtual viewpoint from which the free-viewpoint image is to be generated (step S17). The free-viewpoint image synthesis unit 309 then uses the high-resolution multi-view images and the high-resolution depth maps to synthesize, as the free-viewpoint image, an image of the scene observed from the set virtual viewpoint (step S18). Finally, the free-viewpoint image synthesis unit 309 outputs the synthesized image (step S19).

<Third Embodiment>
Next, an image transmission apparatus according to a third embodiment of the present invention is described. FIG. 7 is a block diagram showing the configuration of the image transmission apparatus according to the third embodiment. In FIG. 7, parts identical to those of the image transmission apparatus shown in FIG. 4 are denoted by the same reference numerals, and their description is omitted. The image transmission apparatus shown in FIG. 7 differs from that shown in FIG. 4 in that the image sending apparatus 200 has, in place of the filter selection unit 208, a filter setting unit 209 that sets the filter used for downsampling and an additional information encoding unit 210 that encodes the filter. In addition, the image receiving apparatus 300 has an additional information decoding unit 311 and a filter setting unit 312 in place of the filter selection unit 310.

Next, the processing operation of the image sending apparatus 200 shown in FIG. 7 is described with reference to FIG. 8. FIG. 8 is a flowchart of that operation. In FIG. 8, steps identical to those of the processing operation shown in FIG. 5 are denoted by the same reference symbols. The processing operation shown in FIG. 8 differs from that shown in FIG. 5 in that, instead of selecting the filter used by the downsampling unit 204, an arbitrary filter is set, encoded, and transmitted.

First, the multi-view image input unit 201 receives high-resolution multi-view images, and the multi-view depth map input unit 202 receives high-resolution multi-view depth maps (step S1). Next, the filter setting unit 209 sets the filters used for downsampling the multi-view images on the sending side and for upsampling them on the receiving side (step S2b). The filters should be set, according to the number of viewpoints and the characteristics of each viewpoint's image and depth map, so that the restoration efficiency on the receiving side is highest. For example, for multi-view images captured by a camera array of N cameras horizontally by 1 vertically, a downsampling filter that reduces each image to 1/N horizontally and leaves it unchanged vertically may be used so that the sampling efficiency over all viewpoints combined is high. Alternatively, based on the three-dimensional information obtained from the depth maps, the downsampling rate for each partial region of an image may be determined according to the number of viewpoints from which that region can be observed. The result is a filter under which, for example, a region observable from only one viewpoint is not downsampled, while a region observable from all viewpoints is downsampled to 1/N. The upsampling filter may be set on the basis of the same interpolation method as the downsampling filter, or on a different criterion. For example, the upsampling filter may be determined by minimizing the mean squared error for images downsampled with a given downsampling filter. The same filter may be set for all viewpoints, or a filter may be set for each viewpoint. The downsampling unit 204 then reduces the high-resolution multi-view images using the set downsampling filter to generate low-resolution multi-view images (step S3).

Next, the additional information encoding unit 210 encodes the set upsampling filter to generate filter code data (step S3a). The depth map encoding unit 205 then encodes the high-resolution multi-view depth maps into high-resolution multi-view depth map code data, and the image encoding unit 206 encodes the low-resolution multi-view images into low-resolution multi-view image code data (step S4). The multiplexing unit 207 multiplexes the low-resolution multi-view image code data, the high-resolution multi-view depth map code data, and the filter code data to generate free-viewpoint image code data (step S5) and outputs the multiplexed code data (step S6).

Next, the processing operation of the image receiving apparatus 300 shown in FIG. 7 is described with reference to FIG. 9. FIG. 9 is a flowchart of that operation. In FIG. 9, steps identical to those of the processing operation shown in FIG. 6 are denoted by the same reference symbols. The processing operation shown in FIG. 9 differs from that shown in FIG. 6 in that, instead of selecting the filter used by the upsampling unit 307, the filter code data received together with the image data is decoded and used for upsampling.

First, the code data input unit 301 receives the free-viewpoint image code data (step S11). The code data is passed to the demultiplexing unit 302, which demultiplexes it into low-resolution multi-view image code data, high-resolution multi-view depth map code data, and filter code data (step S12). The image decoding unit 303 decodes the low-resolution multi-view image code data and outputs low-resolution multi-view images, and the depth map decoding unit 304 decodes the high-resolution multi-view depth map code data and outputs high-resolution multi-view depth maps (step S13).

Next, the additional information decoding unit 311 decodes the filter code data to generate the upsampling filter (step S15b). The filter setting unit 312 sets the decoded upsampling filter. The upsampling unit 307 then upsamples the low-resolution multi-view images by referring to the pixel values of other viewpoints on the basis of the inter-viewpoint pixel correspondence, producing high-resolution multi-view images (step S16).

The above description deals with the case where the output is a free-viewpoint image. However, the apparatus may also be used as a multi-view image transmission apparatus by outputting the upsampled multi-view images without performing free-viewpoint image synthesis. It may also be used as a free-viewpoint video transmission apparatus by applying the processing to each frame of a moving image.

The above description deals with the case where the upsampling filter is encoded and transmitted and the receiving side uses it for upsampling. Instead, the downsampling filter may be encoded and transmitted, and the receiving side may use it to perform an upsampling process such as that of Equation (4), or to perform super-resolution processing. When an upsampling filter is used internally in the super-resolution processing, it may be derived from the filter coefficients of the downsampling filter, or both the downsampling filter and the upsampling filter may be encoded and transmitted.

In the examples described above, the image receiving apparatus 300 upsamples the multi-view images independently. Alternatively, the image sending apparatus 200 may specify the parameters used for upsampling and encode and transmit them. Because the image sending apparatus 200 has access to the high-resolution multi-view images before downsampling, it can apply trial upsampling to the downsampled multi-view images, compare the image quality, and set appropriate upsampling parameters.

In the above description, three-dimensional projection points are calculated from the high-resolution depth maps and used to obtain the inter-viewpoint corresponding points. As an alternative, homography matrices may be computed in advance. Let a point v_ω lying on a plane be expressed as v_1 and v_2 in the coordinate systems of the first and second cameras, respectively, let n be the normal vector of the plane, and let d be the distance from the plane to the second camera. Then, using the projective transformation matrices A_1 and A_2 of the two cameras, the image coordinates m_1 and m_2 satisfy
$$ s\,m_1 = H\,m_2 \qquad (5) $$
$$ H = A_1^{-1}\left(R + \frac{t\,n^{T}}{d}\right)A_2 \qquad (6) $$
where s is a scale factor and R and t are the rotation and translation between the two camera coordinate systems. This transformation applies to every point on the same plane, and the 3×3 matrix H is called a homography matrix.

If the multi-view depth map is an 8-bit (256-level) grayscale image, there are likewise 256 depth levels. Precomputing the 256 homography matrices from one viewpoint to another, one per depth level, can therefore be expected to reduce the amount of computation substantially, especially for high-resolution images.
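The per-level lookup table can be sketched for a deliberately simple special case. The sketch assumes two identical, parallel cameras separated horizontally by a baseline and fronto-parallel depth planes, in which case the plane-induced homography reduces to a horizontal shift by the disparity (focal length × baseline / depth). It also assumes an inverse-depth quantization between `z_near` and `z_far` for the 8-bit levels. Neither assumption comes from the description, the general case requires the full matrix H for each level, and all names are hypothetical.

```python
def depth_level_to_z(level, z_near, z_far):
    """Map an 8-bit depth level to a depth value (assumed inverse-depth quantization)."""
    inv_z = (level / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def build_homography_table(focal_px, baseline, z_near, z_far):
    """Precompute one 3x3 homography per depth level (256 in total)."""
    table = []
    for level in range(256):
        disparity = focal_px * baseline / depth_level_to_z(level, z_near, z_far)
        # Parallel cameras, fronto-parallel plane: pure horizontal shift.
        table.append([[1.0, 0.0, -disparity],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
    return table


def apply_homography(h, x, y):
    """Map pixel (x, y) through homography h using homogeneous coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w
```

At run time, each pixel's corresponding point is then obtained with a single table lookup by depth level followed by `apply_homography`, in place of a back-projection and re-projection for every pixel.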

As described above, in order to reduce the total number of pixels of the free-viewpoint image data, which consists of multi-view images and multi-view depth maps, while maintaining the quality of the synthesized image, the total number of pixels is reduced by downsampling the multi-view images, and the image quality is recovered by upsampling in the image receiving apparatus using the multi-view depth maps. Because the multi-view depth maps are kept at high resolution and high accuracy, the inter-viewpoint pixel correspondence of the multi-view images can be obtained with fractional-pixel accuracy, and this correspondence makes accurate upsampling possible by referring to the pixels of other viewpoints.

A program for realizing the functions of the image sending apparatus 200 and the image receiving apparatus 300 in FIGS. 1, 4, and 7 may be recorded on a computer-readable recording medium, and the free-viewpoint image transmission processing may be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold the program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the spirit and scope of the present invention.

The present invention is applicable to uses in which it is essential to transmit images while reducing the total number of pixels of the transmission data and maintaining the quality of the synthesized image.

201: multi-viewpoint image input unit; 202: multi-view depth map input unit; 203: reduction ratio setting unit; 204: downsampling unit; 205: depth map encoding unit; 206: image encoding unit; 207: multiplexing unit; 208: filter selection unit; 209: filter setting unit; 210: additional information encoding unit; 301: code data input unit; 302: demultiplexing unit; 303: image decoding unit; 304: depth map decoding unit; 305: three-dimensional projection point setting unit; 306: inter-viewpoint correspondence setting unit; 307: upsampling unit; 308: virtual viewpoint setting unit; 309: free-viewpoint image synthesis unit; 310: filter selection unit; 311: additional information decoding unit; 312: filter setting unit

Claims

An image transmission method of a free-viewpoint image composed of a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-viewpoint depth map describing depth information for each pixel as a grayscale image,
A reduction ratio setting step for setting a reduction ratio when downsampling the high-resolution multi-viewpoint image;
A downsampling step of downsampling the high-resolution multi-viewpoint image based on the reduction ratio and outputting a low-resolution multi-viewpoint image;
An image encoding step of encoding the low resolution multi-view image obtained in the down-sampling step and outputting low-resolution multi-view image code data;
A depth map encoding step for encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map code data;
An image decoding step of decoding the low-resolution multi-view image code data obtained by the image encoding step;
A depth map decoding step of decoding the high-resolution multi-view depth map code data obtained in the depth map encoding step;
An inter-viewpoint correspondence setting step of setting the correspondence between viewpoints by obtaining, for each pixel, the corresponding point at another viewpoint with decimal (fractional) pixel accuracy, using the depth information of each pixel obtained from the high-resolution multi-view depth map obtained in the depth map decoding step;
An upsampling step of generating a high-resolution multi-viewpoint image from the low-resolution multi-viewpoint image obtained in the image decoding step, by performing upsampling while referring to decimal-pixel values of other viewpoints based on the correspondence between viewpoints; and
A free-viewpoint image synthesis step of synthesizing an image of an arbitrary viewpoint from the high-resolution multi-viewpoint image obtained in the upsampling step and the high-resolution multi-view depth map; the image transmission method comprising the above steps.

A first filter selection step of selecting a filter from a predetermined filter group used for downsampling;
A second filter selection step of selecting a filter from a predetermined group of filters used for upsampling,
The down-sampling step down-samples the high-resolution multi-viewpoint image using the filter selected in the first filter selection step;
The image transmission method according to claim 1, wherein the up-sampling step up-samples the low-resolution multi-viewpoint image using the filter selected in the second filter selection step.

An additional information encoding step of encoding information for identifying the filter selected in the first filter selection step and outputting the information as additional information code data;
An additional information decoding step of decoding the additional information code data and outputting information for identifying the filter;
The image transmission method according to claim 2, wherein the second filter selection step selects a filter used for upsampling based on information for identifying the filter.

A first filter setting step for setting a filter with the highest image restoration efficiency;
A second filter setting step for setting a filter with the highest image restoration efficiency,
The downsampling step downsamples the high-resolution multi-viewpoint image using the filter set in the first filter setting step,
The image transmission method according to claim 1, wherein the up-sampling step up-samples the low-resolution multi-viewpoint image using the filter set in the second filter setting step.

An additional information encoding step of encoding identification information of the filter set in the first filter setting step and outputting additional information code data;
An additional information decoding step of decoding the additional information code data and outputting filter identification information;
The image transmission method according to claim 4, wherein the second filter setting step sets a filter used for the upsampling based on identification information of the filter.

An image transmission device for a free-viewpoint image composed of a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-view depth map describing depth information for each pixel as a grayscale image,
Reduction ratio setting means for setting a reduction ratio when down-sampling the high-resolution multi-viewpoint image;
Down-sampling means for down-sampling the high-resolution multi-view image based on the reduction ratio and outputting a low-resolution multi-view image;
Image encoding means for encoding the low-resolution multi-view image obtained by the down-sampling means and outputting low-resolution multi-view image code data;
Depth map encoding means for encoding the high resolution multi-view depth map and outputting high-resolution multi-view depth map code data;
Image decoding means for decoding low-resolution multi-view image code data obtained by the image encoding means;
Depth map decoding means for decoding high resolution multi-view depth map code data obtained by the depth map encoding means;
Inter-viewpoint correspondence setting means for setting the correspondence between viewpoints by obtaining, for each pixel, the corresponding point at another viewpoint with decimal (fractional) pixel accuracy, using the depth information of each pixel obtained from the high-resolution multi-view depth map obtained by the depth map decoding means;
Up-sampling means for generating a high-resolution multi-viewpoint image from the low-resolution multi-viewpoint image obtained by the image decoding means, by performing upsampling while referring to decimal-pixel values of other viewpoints based on the correspondence between viewpoints; and
Free-viewpoint image synthesizing means for synthesizing an image of an arbitrary viewpoint from the high-resolution multi-viewpoint image obtained by the up-sampling means and the high-resolution multi-view depth map; the image transmission apparatus comprising the above means.

First filter selection means for selecting a filter from a predetermined filter group used for downsampling;
A second filter selection means for selecting a filter from a predetermined filter group used for upsampling;
The down-sampling means down-samples the high-resolution multi-viewpoint image using the filter selected by the first filter selection means;
The image transmission apparatus according to claim 6, wherein the up-sampling means up-samples the low-resolution multi-viewpoint image using the filter selected by the second filter selection means.

First filter setting means for setting a filter having the highest image restoration efficiency;
A second filter setting means for setting a filter with the highest image restoration efficiency,
The down-sampling means down-samples the high-resolution multi-viewpoint image using the filter set by the first filter setting means,
The image transmission apparatus according to claim 6, wherein the up-sampling means up-samples the low-resolution multi-viewpoint image using the filter set by the second filter setting means.

An image sending apparatus that sends a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-view depth map describing depth information for each pixel as a grayscale image,
Reduction ratio setting means for setting a reduction ratio when down-sampling the high-resolution multi-viewpoint image;
Down-sampling means for down-sampling the high-resolution multi-viewpoint image based on the reduction ratio and outputting a low-resolution multi-viewpoint image;
Image encoding means for encoding the low-resolution multi-viewpoint image obtained by the down-sampling means and outputting low-resolution multi-viewpoint image code data; and
Depth map encoding means for encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map code data; the image sending apparatus comprising the above means.

An image receiving apparatus that receives free-viewpoint image code data sent from an image sending apparatus that sends a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-view depth map describing depth information for each pixel as a grayscale image,
Image decoding means for decoding the low-resolution multi-viewpoint image code data obtained by the image encoding means;
Depth map decoding means for decoding the high-resolution multi-view depth map code data obtained by the depth map encoding means;
Inter-viewpoint correspondence setting means for setting the correspondence between viewpoints by obtaining, for each pixel, the corresponding point at another viewpoint with decimal (fractional) pixel accuracy, using the depth information of each pixel obtained from the high-resolution multi-view depth map obtained by the depth map decoding means;
Up-sampling means for generating a high-resolution multi-viewpoint image from the low-resolution multi-viewpoint image obtained by the image decoding means, by performing upsampling while referring to decimal-pixel values of other viewpoints based on the correspondence between viewpoints; and
Free-viewpoint image synthesizing means for synthesizing an image of an arbitrary viewpoint from the high-resolution multi-viewpoint image obtained by the up-sampling means and the high-resolution multi-view depth map; the image receiving apparatus comprising the above means.

An image sending program for causing a computer on an image sending apparatus, which sends a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-view depth map describing depth information for each pixel as a grayscale image, to perform image sending processing, the program causing the computer to execute:
A reduction ratio setting step for setting a reduction ratio when downsampling the high-resolution multi-viewpoint image;
A downsampling step of downsampling the high-resolution multi-viewpoint image based on the reduction ratio and outputting a low-resolution multi-viewpoint image;
An image encoding step of encoding the low-resolution multi-viewpoint image obtained in the downsampling step and outputting low-resolution multi-viewpoint image code data; and
A depth map encoding step of encoding the high-resolution multi-view depth map and outputting high-resolution multi-view depth map code data.

An image receiving program for causing a computer on an image receiving apparatus, which receives free-viewpoint image code data sent from an image sending apparatus that sends a high-resolution multi-viewpoint image in which the same scene is captured by a plurality of cameras and a high-resolution multi-view depth map describing depth information for each pixel as a grayscale image, to perform image receiving processing, the program causing the computer to execute:
An image decoding step of decoding the low-resolution multi-viewpoint image code data obtained in the image encoding step;
A depth map decoding step of decoding the high-resolution multi-view depth map code data obtained in the depth map encoding step;
An inter-viewpoint correspondence setting step of setting the correspondence between viewpoints by obtaining, for each pixel, the corresponding point at another viewpoint with decimal (fractional) pixel accuracy, using the depth information of each pixel obtained from the high-resolution multi-view depth map obtained in the depth map decoding step;
An upsampling step of generating a high-resolution multi-viewpoint image from the low-resolution multi-viewpoint image obtained in the image decoding step, by performing upsampling while referring to decimal-pixel values of other viewpoints based on the correspondence between viewpoints; and
A free-viewpoint image synthesis step of synthesizing an image of an arbitrary viewpoint from the high-resolution multi-viewpoint image obtained in the upsampling step and the high-resolution multi-view depth map.