JP2002369152A

JP2002369152A - Image processing apparatus, image processing method, image processing program, and computer-readable storage medium storing image processing program

Info

Publication number: JP2002369152A
Application number: JP2001171379A
Authority: JP
Inventors: Mitsuru Maeda; 充前田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-06-06
Filing date: 2001-06-06
Publication date: 2002-12-20

Abstract

(57) [Summary]

[Problem] When decoding object-encoded data, to reproduce audio and video without any positional discrepancy between them.

[Solution] An image processing apparatus that decodes and reproduces encoded audio data and encoded video data, comprising: video decoding units 4 and 5 that decode video encoded data to generate video data; audio decoding units 6 and 7 that decode audio encoded data to generate audio data; a position determining unit 9 that obtains the position of a video object based on the input video encoded data; and a mixer 10 that, based on the position information determined by the position determining unit 9, controls how the audio data decoded by the audio decoding units 6 and 7 is reproduced on the left-channel and right-channel sound reproducing devices 11 and 12.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image processing apparatus and method for inputting and decoding encoded image data composed of audio encoded data and video encoded data. In particular, it relates to an image processing apparatus that inputs, decodes, and reproduces image data encoded in units of objects, to an image processing method, to an image processing program, and to a computer-readable storage medium storing the image processing program.

[0002]

2. Description of the Related Art

In recent years, the MPEG (Moving Picture Experts Group)-4 coding scheme has been studied as a moving picture coding scheme, and its international standardization is underway. In conventional moving picture coding, typified by MPEG-2, the coding unit was a frame or a field. In contrast, to enable reuse and editing of content, MPEG-4 encodes video data and audio data as objects. Furthermore, the objects contained in the video data are also encoded independently, and each of them can be handled as an object. Details are given in, for example, "All about MPEG-4" edited by Sukeichi Miki (三木弼一), published by Kogyo Chosakai, and in the international standard ISO/IEC 14496-2.

[0003] In the MPEG-4 coding scheme, the coding target is handled in units of objects, so the shape of each object in the image must be known at both encoding and decoding. In addition, to represent an object such as glass, through which objects behind it are visible, information on how transparent the object is becomes necessary. The information on the shape of an object and on its transparency is collectively called shape information, and the coding of shape information is called shape coding. Encoded data of the Core profile or higher can handle such arbitrary shapes. The shape definition distinguishes the inside of the object from the outside, and the inside is processed by texture coding consisting of motion compensation and DCT transform coding, as in MPEG-1 and MPEG-2.

[0004] A profile defines the tool set used to realize an assumed application.

[0005] To describe the arrangement of objects and similar information, BIFS (Binary Format for Scene) coding is adopted as the system-level coding. BIFS data is encoded data that describes a scene: the size of the entire screen, the arrangement of the objects, the playback timing, and so on.

[0006] The functional configuration of an apparatus that reproduces moving image data encoded by MPEG-4 is described below with reference to FIG. 14.

[0007] In FIG. 14, reference numeral 1001 denotes an MPEG-4 encoded data input unit, which inputs data encoded by MPEG-4. Reference numeral 1002 denotes a separator, which separates the multiplexed MPEG-4 encoded data into encoded data relating to the system, encoded data relating to the video objects, and encoded data relating to the audio objects. Reference numeral 1003 denotes a system decoding unit, which decodes the encoded data relating to the system. Reference numerals 1004 and 1005 denote video decoding units, each of which decodes a video object. Reference numerals 1006 and 1007 denote audio decoding units, which decode the audio objects corresponding to the left and right channels, respectively. Reference numeral 1008 denotes an image synthesizing unit, which controls and composites the video objects decoded by the video decoding units 1004 and 1005 based on the output of the system decoding unit 1003. Reference numeral 1009 denotes a mixer, which combines the audio data decoded by the audio decoding units 1006 and 1007. Reference numeral 1010 denotes a display device (monitor) that displays the composited image, and reference numerals 1011 and 1012 denote audio devices that reproduce the mixed sound; they are placed on the left and right to realize a stereo effect.

[0008] The MPEG-4 encoded data input from the encoded data input unit 1001 is separated by the separator 1002 into the respective encoded data, each of which is input to the corresponding decoding unit. FIG. 2 shows a configuration example of the objects to be reproduced.

[0009] In FIG. 2, reference numeral 1100 denotes the entire screen, which is composed of a female object (VO1) 1101 and a male object (VO2) 1102. The voice of the female object 1101 is the female audio object, and the voice of the male object 1102 is the male audio object.

[0010] In FIG. 14, the system decoding unit 1003 decodes the BIFS encoded data and, with regard to video, decodes the size of the screen 1100, the arrangement of the objects, and the synchronization between the audio objects and the video objects. The arrangement of each object is expressed as the positional relationship between the upper-left corner of the object and the upper-left corner of the screen.

[0011] The video decoding unit 1004 decodes the image data of the female object 1101, and the video decoding unit 1005 decodes the image data of the male object 1102. The audio decoding unit 1006 decodes the audio object of the female object 1101, and the audio decoding unit 1007 decodes the audio object of the male object 1102. The image synthesizing unit 1008 places the female object 1101 and the male object 1102 according to the object positions output from the system decoding unit 1003. Likewise, the mixer 1009 mixes the audio data of the female object 1101 and of the male object 1102 according to the object positions output from the system decoding unit 1003, and adjusts the volume balance for output from the left and right audio devices 1011 and 1012. The monitor 1010 displays the output of the image synthesizing unit 1008, and the audio devices 1011 and 1012 reproduce the audio data.

[0012]

Problems to Be Solved by the Invention

In the MPEG-4 coding scheme a video object is defined as a VideoObject, and it is defined with a size that contains the coding target from the first frame to the last. In each frame, the coding target is represented by a circumscribed rectangle that contains it, and the position and size of that rectangle are defined per frame. The position is given by the vop_horizontal_mc_spatial_ref code in the main scanning direction and by the vop_vertical_mc_spatial_ref code in the sub-scanning direction. The size is given by the vop_width code in the main scanning direction and by the vop_height code in the sub-scanning direction.
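As a rough illustration of the per-frame rectangle described above, the four header codes can be held in a small record. This is only a sketch: the class name `VopRect`, the helper `centre_x`, and the sample coordinates are assumptions introduced here, not part of the MPEG-4 syntax or of the source text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VopRect:
    """Circumscribed rectangle of one VOP, named after the header codes in the text."""

    vop_horizontal_mc_spatial_ref: int  # position, main scanning direction
    vop_vertical_mc_spatial_ref: int    # position, sub-scanning direction
    vop_width: int                      # size, main scanning direction
    vop_height: int                     # size, sub-scanning direction

    def centre_x(self) -> float:
        # Hypothetical helper: horizontal midpoint of the rectangle.
        return self.vop_horizontal_mc_spatial_ref + self.vop_width / 2


# Made-up values: the rectangle moves between frames while the enclosing
# video object keeps a single, fixed placement in the scene description.
first_frame = VopRect(0, 40, 80, 200)
last_frame = VopRect(240, 40, 80, 200)
print(first_frame.centre_x(), last_frame.centre_x())
```

The point of the record is that these values change from frame to frame, whereas the object position carried by the system code does not.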

[0013] When objects are composited, the position of an object handled by the system code is expressed as the positional relationship between the upper-left corner of the object and the screen. However, even if the coding target moves within the object, the audio data is reproduced using only the position handled by the system code. A mismatch therefore arises between where the audio object is reproduced and where the video object appears, which feels unnatural.

[0014] FIGS. 13(a) and 13(b) show an example.

[0015] FIG. 13(a) is the first frame and FIG. 13(b) is the last frame. The woman 1200, who is the coding target in FIG. 13(a), moves from the left edge of the screen to the right edge. Reference numeral 1210 denotes the object, and the thin-line rectangle 1211 denotes the circumscribed rectangle of the VOP (Video Object Plane). Conventionally, even when the woman moves within the screen in this way, her voice does not move with her, which feels unnatural. In particular, MPEG-4 coding envisions the reuse of content, so video objects are rearranged frequently, and there has been a pressing need to remove this unnatural impression.

[0016]

SUMMARY OF THE INVENTION

The present invention has been made in view of the above conventional example. Its object is to provide an image processing apparatus, an image processing method, an image processing program, and a computer-readable storage medium storing the image processing program, which reproduce audio and video without any positional discrepancy when decoding object-encoded data.

[0017] Another object of the present invention is to provide an image processing apparatus, an image processing method, an image processing program, and a computer-readable storage medium storing the image processing program, which change the position from which the associated audio is emitted as a video object moves, thereby removing the unnatural impression in video and audio reproduction.

[0018] Another object of the present invention is to provide an image processing apparatus, an image processing method, an image processing program, and a computer-readable storage medium storing the image processing program, which change the playback volume of the associated audio as a video object moves, thereby removing the unnatural impression in video and audio reproduction.

[0019]

Means for Solving the Problems

To achieve the above objects, an image processing apparatus according to the present invention has the following configuration. That is, it is an image processing apparatus that decodes and reproduces encoded audio data and encoded video data, comprising: audio decoding means for decoding audio encoded data to generate audio data; video decoding means for decoding video encoded data to generate video data; sound source position acquiring means for obtaining the position of a video object based on the video encoded data; and audio reproduction control means for controlling the reproduction of the audio data decoded by the audio decoding means, based on the position information acquired by the sound source position acquiring means.

[0020] To achieve the above objects, an image processing apparatus according to the present invention has the following configuration. That is, it is an image processing apparatus that decodes and reproduces encoded audio data and encoded video data, comprising: audio decoding means for decoding audio encoded data to generate audio data; video decoding means for decoding video encoded data to generate video data; sound source distance calculating means for obtaining, based on the video encoded data, the distance to the sound source of the audio included in the video; and audio reproduction control means for controlling the playback volume of the audio data decoded by the audio decoding means, based on the distance information calculated by the sound source distance calculating means.

[0021] To achieve the above objects, an image processing method according to the present invention comprises the following steps. That is, it is an image processing method in an image processing apparatus that decodes and reproduces encoded audio data and encoded video data, comprising: an audio decoding step of decoding audio encoded data to generate audio data; a video decoding step of decoding video encoded data to generate video data; a sound source position acquiring step of obtaining the position of a video object based on the video encoded data; and an audio reproduction control step of controlling the reproduction of the audio data decoded in the audio decoding step, based on the position information acquired in the sound source position acquiring step.

[0022] To achieve the above objects, an image processing method according to the present invention comprises the following steps. That is, it is an image processing method in an image processing apparatus that decodes and reproduces encoded audio data and encoded video data, comprising: an audio decoding step of decoding audio encoded data to generate audio data; a video decoding step of decoding video encoded data to generate video data; a sound source distance calculating step of obtaining, based on the video encoded data, the distance to the sound source of the audio included in the video; and an audio reproduction control step of controlling the playback volume of the audio data decoded in the audio decoding step, based on the distance information calculated in the sound source distance calculating step.

[0023]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0024] [Embodiment 1] FIG. 1 is a block diagram showing the configuration of a moving image reproducing apparatus according to Embodiment 1 of the present invention. In this embodiment, MPEG-4 encoded moving image data is input, and the video and audio contained in the encoded data are reproduced.

[0025] In FIG. 1, reference numeral 1 denotes an MPEG-4 encoded data input unit, which inputs MPEG-4 encoded data. Reference numeral 2 denotes a separator, which separates the MPEG-4 encoded data input from the MPEG-4 encoded data input unit 1 and supplies the parts to the subsequent units. Reference numeral 3 denotes a system decoding unit, which receives and decodes the BIFS encoded data (encoded by the MPEG-4 coding scheme) separated by the separator 2. Reference numerals 4 and 5 denote video decoding units, which decode the video objects separated by the separator 2 in units of frames (VOPs). Reference numerals 6 and 7 denote audio decoding units, which decode the audio objects separated by the separator 2 in units of time.

[0026] The case of reproducing two objects is described here, using the objects shown in FIG. 2 as an example.

[0027] Reference numeral 8 denotes an image synthesizing unit, which composites the video data decoded and reproduced by the video decoding units 4 and 5 according to the output of the system decoding unit 3. Reference numeral 9 denotes a position determining unit, which determines the position of each video object serving as a sound source, based on the outputs of the system decoding unit 3 and the video decoding units 4 and 5. Reference numeral 10 denotes a mixer, which mixes the audio data associated with the two objects according to the output of the position determining unit 9. Reference numerals 11 and 12 denote audio devices, which consist of speakers or the like and reproduce sound from audio signals. Reference numeral 13 denotes a display device (monitor) that displays images.

[0028] The operation of the above configuration is described next.

[0029] The MPEG-4 encoded data input unit 1 inputs an elementary stream conforming to the MPEG-4 coding scheme. The separator 2 separates the elementary stream input from the MPEG-4 encoded data input unit 1 into BIFS encoded data relating to the system and into the video encoded data and audio encoded data of each object. Of these, the BIFS encoded data is input to the system decoding unit 3, the encoded video data for the object 1101 in FIG. 2 is input to the video decoding unit 4, the encoded video data for the object 1102 is input to the video decoding unit 5, the encoded audio data for the object 1101 is input to the audio decoding unit 6, and the encoded audio data for the object 1102 is input to the audio decoding unit 7.
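The routing performed by the separator 2 can be pictured as a lookup from stream kind and object number to decoding unit. The following is a minimal sketch under stated assumptions: the tuple layout `(kind, object_id, payload)`, the string labels, and the function name `route_streams` are all hypothetical and do not reflect the actual MPEG-4 systems syntax.

```python
# Destination of each separated stream, following the routing in the text.
# Keys are (kind, object_id); the labels are illustrative names only.
ROUTES = {
    ("bifs", None): "system decoding unit 3",
    ("video", 1101): "video decoding unit 4",
    ("video", 1102): "video decoding unit 5",
    ("audio", 1101): "audio decoding unit 6",
    ("audio", 1102): "audio decoding unit 7",
}


def route_streams(streams):
    """Group payloads by destination unit.

    `streams` is an iterable of (kind, object_id, payload) tuples; this
    layout is an assumption made for the sketch.
    """
    routed = {}
    for kind, object_id, payload in streams:
        unit = ROUTES[(kind, object_id)]  # KeyError for an unknown stream
        routed.setdefault(unit, []).append(payload)
    return routed


demo = route_streams([
    ("bifs", None, b"scene"),
    ("video", 1101, b"v1"),
    ("audio", 1102, b"a2"),
])
print(demo)
```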

[0030] The system decoding unit 3 decodes the BIFS code and sets the screen size according to pixelWidth and pixelHeight described in the DecSpecificInfo descriptor. The screen size is expressed by the size in the main scanning direction (Image_x) and the size in the sub-scanning direction (Image_y). The system decoding unit 3 also decodes the position of each object. Here, the position of the object 1101 is (VO1_loc_x) in the main scanning direction and (VO1_loc_y) in the sub-scanning direction, and the position of the object 1102 is (VO2_loc_x) in the main scanning direction and (VO2_loc_y) in the sub-scanning direction.

[0031] The video decoding unit 4 decodes the encoded data of each VOP of the object 1101, obtains the size of the VOP and its relative position (for the VOP of the object 1101, the position (VOP1_loc_x) in the main scanning direction and the position (VOP1_loc_y) in the sub-scanning direction), and decodes the image data. Similarly, the video decoding unit 5 decodes the encoded data of each VOP of the object 1102, obtains the size of the VOP and its relative position (for the VOP of the object 1102, the position (VOP2_loc_x) in the main scanning direction and the position (VOP2_loc_y) in the sub-scanning direction), and decodes the image data.

[0032] FIG. 3 is a block diagram showing the detailed configuration of the video decoding unit 4 (5).

[0033] In FIG. 3, the MPEG-4 video data input unit 51 inputs the MPEG-4 video encoded data separated by the separator 2. The separator 52 separates the input MPEG-4 video encoded data into the encoded data of the header, of the shape information, and of the texture, and supplies each to the subsequent units. The header decoding unit 53 decodes the position and size of the VOP and sets in each unit the information needed to decode the video encoded data. The shape decoding unit 54 receives and decodes the encoded shape information separated by the separator 52. The texture decoding unit 55 decodes the encoded data relating to the texture of the object and reproduces the image data.

[0034] In this configuration, the MPEG-4 video data input unit 51 inputs video encoded data conforming to the MPEG-4 coding scheme. The separator 52 separates the input video encoded data into the encoded data of the headers of the Visual Object Sequence layer, the Visual Object layer, the Video Object Layer layer, and the Video Object Plane layer, the encoded data relating to shape information, and the encoded data relating to texture. Of these, the encoded header data is input to the header decoding unit 53, the encoded shape data is input to the shape decoding unit 54, and the encoded texture data is input to the texture decoding unit 55.

[0035] The header decoding unit 53 decodes the information that is indispensable for decoding a VOP, such as its size and position, and sets in each unit the information needed to decode the video object. Of this, the information on the size and position of the VOP is output to the position determining unit 9. The shape decoding unit 54 decodes the binary image data representing the shape information of each VOP. A pixel value of "1" indicates the inside of the object, and a value of "0" indicates the outside. The decoded result is output to the image synthesizing unit 8. The texture decoding unit 55 decodes the texture of the object in each VOP to obtain image data, which is also output to the image synthesizing unit 8.
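The text does not spell out how the image synthesizing unit 8 combines the binary shape data with the texture. One plausible reading is that texture pixels are copied onto the screen only where the shape value is 1. The sketch below illustrates that reading with plain nested lists; the function name `composite` and its arguments are assumptions introduced for illustration.

```python
def composite(canvas, texture, mask, left, top):
    """Return a copy of `canvas` with `texture` pasted at (left, top).

    Only pixels whose `mask` value is 1 (inside the object) are copied;
    pixels with mask 0 (outside) leave the canvas unchanged. All images
    are lists of rows, and `texture` and `mask` have the same shape.
    """
    out = [row[:] for row in canvas]
    for y, (texture_row, mask_row) in enumerate(zip(texture, mask)):
        for x, (pixel, inside) in enumerate(zip(texture_row, mask_row)):
            cy, cx = top + y, left + x
            # Skip pixels outside the object or outside the canvas.
            if inside == 1 and 0 <= cy < len(out) and 0 <= cx < len(out[0]):
                out[cy][cx] = pixel
    return out


canvas = [[0, 0, 0], [0, 0, 0]]
result = composite(canvas, texture=[[5, 6]], mask=[[1, 0]], left=1, top=0)
print(result)
```

Returning a new canvas rather than modifying the input keeps the sketch free of side effects, which makes it easier to composite several objects in sequence and compare intermediate results.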

[0036] The above operation is performed in the same way in the video decoding units 4 and 5; only the object whose video encoded data is processed differs.

[0037] Returning to FIG. 1, the audio decoding unit 6 decodes the audio data to be reproduced during the display interval of the video for the object 1101 in FIG. 2. Similarly, the audio decoding unit 7 decodes the audio data to be reproduced during the display interval of the video for the object 1102 in FIG. 2.

[0038] The position determining unit 9 receives the object positions from the system decoding unit 3, the position of the VOP of the object 1101 from the video decoding unit 4, and the position of the VOP of the object 1102 from the video decoding unit 5, and from this information determines the parameters used for audio mixing. The position of the sound source of the object 1101 is given by (VO1_loc_x + VOP1_loc_x). Since the size of the entire screen in the main scanning direction is Image_x, the mixing parameter P1 for the audio of the object 1101 is

P1 = (VO1_loc_x + VOP1_loc_x) / Image_x ... (1)

Similarly, the mixing parameter P2 for the audio of the object 1102 is

P2 = (VO2_loc_x + VOP2_loc_x) / Image_x ... (2)

These results are input to the mixer 10.
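Equations (1) and (2) have the same form, so they can be sketched as a single function. The function name `mixing_parameter` and the sample numbers are assumptions made for illustration; the argument names mirror the symbols in the text.

```python
def mixing_parameter(vo_loc_x, vop_loc_x, image_x):
    """Mixing parameter P of equations (1) and (2).

    vo_loc_x  : object position from the system code (e.g. VO1_loc_x)
    vop_loc_x : VOP position relative to the object (e.g. VOP1_loc_x)
    image_x   : screen size in the main scanning direction (Image_x)
    """
    if image_x <= 0:
        raise ValueError("image_x must be positive")
    return (vo_loc_x + vop_loc_x) / image_x


# Made-up values: an object placed at x = 0 whose VOP sits 80 pixels in,
# on a 320-pixel-wide screen.
p1 = mixing_parameter(0, 80, 320)
print(p1)  # 0.25
```

Because the VOP position is read for every frame, P changes as the coding target moves inside the object even though the object position from the system code stays fixed.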

[0039] The mixer 10 mixes the decoded data from the audio decoding units 6 and 7 according to the mixing parameters P1 and P2. In the following, the magnitude of the audio data for the object 1101 is conceptually denoted A1, and the magnitude of the audio data for the object 1102 is conceptually denoted A2. When audio data of magnitude A1 is reproduced in stereo, the magnitude reproduced by the left audio device 11 is conceptually defined as A1L and the magnitude reproduced by the right audio device 12 as A1R. A2L and A2R are defined in the same way for the audio data of magnitude A2.

[0040] First, the method of calculating A1L and A1R is described. Here, the final loudness of the left sound output from the audio device 11 is conceptually defined as AL, and the final loudness of the right sound as AR.

[0041] First, the left-right balance is calculated for the object 1101 to obtain A1R and A1L, using the following equations.

[0042]

A1L = A1 × P1 ... (3)
A1R = A1 × (1 − P1) ... (4)

Similarly, A2L and A2R are obtained by

A2L = A2 × P2 ... (5)
A2R = A2 × (1 − P2) ... (6)

Accordingly, the loudnesses AR and AL of the sound reproduced by the audio devices 11 and 12 are given by the following equations.

[0043]

AR = A1R + A2R ... (7)
AL = A1L + A2L ... (8)

The data mixed by the mixer 10 in this way is input to the audio devices 11 and 12, respectively, and reproduced. The display device 13 displays the decoded video data output from the image synthesizing unit 8, and the audio devices 11 and 12 reproduce the audio decoded by the audio decoding units 6 and 7.
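Equations (3) to (8) amount to a linear pan followed by a per-channel sum. The sketch below implements them exactly as written, treating each source as an (amplitude, P) pair; the function name `mix_stereo` and the sample values are assumptions. Note that, taken literally, equations (3) and (4) give a source with P = 0 no contribution to AL and its full amplitude to AR.

```python
def mix_stereo(sources):
    """Return (AL, AR) for an iterable of (amplitude, p) pairs.

    Per source:  left = A * p          (equations (3) and (5))
                 right = A * (1 - p)   (equations (4) and (6))
    Channels are then summed over all sources (equations (7) and (8)).
    """
    pairs = list(sources)  # allow generators; we iterate twice
    al = sum(a * p for a, p in pairs)
    ar = sum(a * (1 - p) for a, p in pairs)
    return al, ar


# Made-up amplitudes and parameters for two objects.
al, ar = mix_stereo([(1.0, 0.25), (2.0, 0.75)])
print(al, ar)  # 1.75 1.25
```

A useful property of this linear form is that AL + AR always equals A1 + A2, so moving an object changes only the balance between the channels and not the total level.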

[0044] FIGS. 4(a) and 4(b) show an example of the result of compositing in this way.

[0045] FIG. 4(a) shows the first frame and corresponds to the example of FIG. 2. FIG. 4(b) shows the final frame. In the first frame of FIG. 4(a), the voice of the woman of the object 1101 is reproduced mainly by the left audio device 11 and is heard mainly from the left side of the screen. In the final frame 1100 of FIG. 4(b), the voice of the woman of the object 1103 is reproduced mainly by the right audio device 12 and is heard mainly from the right side of the screen.

[0046] The processing up to reproduction described above is now explained with reference to FIG. 5.

[0047] FIG. 5 is a flowchart showing the moving image reproduction processing in the moving image reproducing apparatus according to the first embodiment.

[0048] First, in step S101, MPEG-4 encoded data is input from the MPEG-4 encoded data input unit 1. Next, in step S102, the system encoded data separated by the separator 2 is decoded by the system decoding unit 3 to obtain the position information of each object. The process then proceeds to step S103, where the variable n that counts frames is initialized to "1". In step S104, the value of the variable n is compared with the number of frames to determine whether reproduction of the moving image has finished. If the final frame has been reached, the processing ends; otherwise the process proceeds to step S105 and continues.

[0049] In step S105, the encoded data of one frame (the n-th frame, a VOP) of each video object is decoded to obtain the position and size information, shape information, and texture data of that VOP. This is done by decoding, in the system decoding unit 3, the system encoded data separated by the separator 2, and by decoding the n-th frame of each video object in the video decoding units 4 and 5. Next, in step S106, audio data for a duration equal to one frame interval is decoded from the encoded data of the audio object for the n-th frame. In step S107, the mixing parameters for the mixer 10 are obtained using equation (1) or (2) above. In step S108, the image data of the video objects decoded by the video decoding units 4 and 5 are composited by the image synthesizing unit 8 on the basis of the shape information and texture information, and are placed on the screen in accordance with the system code decoded by the system decoding unit 3. At the same time, the audio data decoded by the audio decoding units 6 and 7 is reproduced in accordance with equations (3) to (8) above. The image and audio are thus displayed and output in step S109. The process then proceeds to step S110, where "1" is added to the variable n, and returns to step S104 to determine whether the processing has finished.
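The per-frame loop of FIG. 5 (steps S103 to S110) can be sketched as follows. This is only an outline under stated assumptions: the four callables (`decode_video`, `decode_audio`, `mixing_params`, `present`) are hypothetical stand-ins for the decoding units, the position determining unit 9, and the output devices; steps S101 and S102 are assumed to have already run; and the termination test `n <= num_frames` is one reading of the comparison in step S104.

```python
def play(num_frames, decode_video, decode_audio, mixing_params, present):
    """Run the frame loop of FIG. 5 and return the number of frames presented."""
    n = 1                                  # S103: initialize the frame counter
    while n <= num_frames:                 # S104: stop once all frames are done
        vops = decode_video(n)             # S105: decode the n-th VOP(s)
        audio = decode_audio(n)            # S106: decode one frame interval of audio
        params = mixing_params(vops)       # S107: derive the mixing parameters
        present(n, vops, audio, params)    # S108-S109: composite, mix, and output
        n += 1                             # S110: advance to the next frame
    return n - 1


# Stub callables that only record what would be presented.
log = []
frames_played = play(
    3,
    lambda n: f"vop{n}",
    lambda n: f"aud{n}",
    lambda vops: 0.5,
    lambda n, v, a, p: log.append((n, v, a, p)),
)
```

Since steps S105 and S106 do not depend on each other in this outline, they are natural candidates for running in parallel.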

[0050] Through this series of processing, the position of each object can be determined more faithfully and the audio can be moved to follow that position, so the audio is reproduced without any sense of mismatch with the displayed video. In addition, because the audio reproduction position is updated frame by frame, smooth audio reproduction and movement that follow the video display can be achieved.

[0051] Although the input encoded data is MPEG-4 encoded data in the first embodiment, the present invention is not limited to this. For example, data in which each frame is encoded with the JPEG 2000 coding scheme using its ROI function may of course be used.

[0052] The present invention is also not limited to the processing procedure of the first embodiment. For example, steps S105 and S106 may be processed in parallel, and any other steps that can be processed in parallel may of course be parallelized as well. Although this embodiment has been described for the case of two objects, it can be realized in exactly the same way with one object, and three or more objects can be handled simply by adding decoding units.

[0053] [Second Embodiment] As a second embodiment of the present invention, a case is described in which the configuration of the moving image processing apparatus shown in FIG. 1 is used but the video decoding unit 4 or 5 has a different configuration.

[0054] FIG. 6 is a block diagram showing the configuration of the video decoding unit 4 (5) according to the second embodiment of the present invention. Components that are the same as in the first embodiment (FIG. 3) are given the same reference numerals, and their detailed description is omitted.

[0055] In FIG. 6, reference numeral 151 denotes a shape centroid calculation unit, which calculates the centroid position of the object from the shape information decoded by the shape decoding unit 54. Reference numeral 152 denotes an object position determination unit, which determines the position of the object from the decoding result of the system code decoded by the header decoding unit 53 and the centroid position calculated by the shape centroid calculation unit 151.

[0056] As in the first embodiment, the MPEG-4 video data input unit 51 receives the video encoded data, compliant with the MPEG-4 coding scheme, separated by the separator 2. The separator 52 separates the input video encoded data into the encoded data of each header, the encoded data relating to shape information, and the encoded data relating to texture. The header encoded data is input to the header decoding unit 53, the shape encoded data to the shape decoding unit 54, and the texture encoded data to the texture decoding unit 55.

[0057] The header decoding unit 53 decodes the information essential for decoding a VOP, such as its size and position, sets the information needed for decoding in each unit, and outputs the information on the size and position of the VOP to the object position determination unit 152. The shape decoding unit 54 decodes the binary image data representing the shape information of each VOP. The decoded result is output to the image synthesizing unit 8 and is also input to the shape centroid calculation unit 151. The shape centroid calculation unit 151 obtains the centroid (coordinate O_x in the main scanning direction and coordinate O_y in the sub-scanning direction) of the region whose shape information is "1" and inputs the centroid position to the object position determination unit 152.

[0058] The object position determination unit 152 reads the position of the object's VOP in the main scanning direction (VOP1_loc_x) from the output of the header decoding unit 53 and derives from it a new object position (VOP1_loc_p_x), which is given by equation (9) below.

[0059]
VOP1_loc_p_x = VOP1_loc_x + O_x ... (9)
The texture decoding unit 55 decodes the texture of the object in each VOP to obtain image data. This decoded image data is also output to the image synthesizing unit 8.

[0060] The position determining unit 9 calculates the mixing parameter P1 using this new object position (VOP1_loc_p_x) in place of (VOP1_loc_x) in equation (1) above.
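The centroid calculation and equation (9) can be sketched as below. This is an illustrative sketch under several assumptions that the text does not state: the binary shape information is held as a list of rows of 0/1 values, coordinates are measured in pixels from the upper-left corner of the VOP, and the centroid is kept as a non-integer value. The function names `shape_centroid` and `object_position_x` are hypothetical.

```python
def shape_centroid(mask):
    """Return (O_x, O_y), the mean coordinates of the pixels whose value is 1."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):          # y: sub-scanning direction
        for x, value in enumerate(row):     # x: main scanning direction
            if value == 1:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("shape mask contains no object pixels")
    return sum_x / count, sum_y / count


def object_position_x(vop_loc_x, mask):
    """Equation (9): VOP1_loc_p_x = VOP1_loc_x + O_x."""
    o_x, _ = shape_centroid(mask)
    return vop_loc_x + o_x


# A 2x3 mask whose object pixels occupy the two right-hand columns.
mask = [[0, 1, 1],
        [0, 1, 1]]
position = object_position_x(10, mask)
```

Using the centroid rather than the VOP origin means that an object sitting at the right edge of a wide bounding rectangle is located by where its pixels actually are.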

[0061] Through this series of processing, the position of each object can be determined more faithfully even when the VOP has an arbitrary shape without a circumscribed rectangle. As a result, the position from which the corresponding audio appears to originate can be changed according to the position of the video object, so the audio is reproduced without any sense of mismatch with the video. In addition, because the audio reproduction position is updated frame by frame, the audio reproduction position can move smoothly to follow the movement of the video object.

[0062] [Third Embodiment] FIG. 7 is a block diagram showing the configuration of a moving image reproducing apparatus according to a third embodiment of the present invention. Components that are the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted. In the third embodiment, MPEG-4 encoded data is input and video and audio are reproduced. The third embodiment is described using the object configuration of FIG. 2 as an example, for the case in which the male video object approaches the viewer.

[0063] FIGS. 8(a) and 8(b) illustrate this situation. FIG. 8(a) shows the first frame and FIG. 8(b) the final frame. Here, the male object 1102 to be encoded has moved toward the foreground by the final frame. If the loudness of the sound associated with the man did not change even though he moves in this way, a person watching the screen would find it unnatural.

[0064] In FIG. 7, reference numerals 201 and 202 denote video decoding units, which decode the video objects separated by the separator 2 frame by frame (VOP by VOP). Reference numerals 203 and 204 denote volume control units, which adjust the volume of the audio data decoded by the audio decoding units 6 and 7 according to the outputs of the video decoding units 201 and 202, respectively.

[0065] The operation of the moving image reproducing apparatus configured as above is described below.

[0066] As in the first embodiment, the MPEG-4 encoded data input unit 1 receives elementary streams compliant with the MPEG-4 coding scheme. The separator 2 separates the elementary streams and inputs the BIFS encoded data to the system decoding unit 3, the video encoded data for the object 1101 to the video decoding unit 201, the video encoded data for the object 1102 to the video decoding unit 202, the audio encoded data for the object 1101 to the audio decoding unit 6, and the audio encoded data for the object 1102 to the audio decoding unit 7.

[0067] The system decoding unit 3 decodes the BIFS code and sets the screen size. It also decodes the position of each object. The video decoding unit 201 decodes the encoded data of each VOP of the object 1101, obtains the size and relative position of the VOP, and decodes the image data. The video decoding unit 202 likewise decodes the encoded data of each VOP of the object 1102, obtains the size and relative position of the VOP, and decodes the image data.

[0068] FIG. 9 is a block diagram showing the configuration of the video decoding unit 201 (202) according to the third embodiment. In FIG. 9, components that are the same as in FIG. 3 of the first embodiment are given the same reference numerals, and their detailed description is omitted.

[0069] In FIG. 9, reference numeral 251 denotes a shape size calculation unit, which calculates the size of the object from the shape information decoded by the shape decoding unit 54. Reference numeral 252 denotes an object position determination unit, which determines the distance of the object from the output of the header decoding unit 53 and the shape information calculated by the shape size calculation unit 251.

[0070] As in the first embodiment, the MPEG-4 video data input unit 51 receives the video encoded data, compliant with the MPEG-4 coding scheme, separated by the separator 2. The separator 52 separates the input video encoded data into the encoded data of each header, the encoded data relating to shape information, and the encoded data relating to texture. The header encoded data is input to the header decoding unit 53, the shape encoded data to the shape decoding unit 54, and the texture encoded data to the texture decoding unit 55.

[0071] The header decoding unit 53 decodes the information essential for decoding a VOP, such as its size and position, sets the information needed for decoding in each unit, and outputs the information on the size and position of the VOP to the position determining unit 9. The shape decoding unit 54 decodes the shape information of each VOP, outputs it to the image synthesizing unit 8, and also supplies it to the shape size calculation unit 251. The shape size calculation unit 251 counts the number of pixels whose shape information is "1" to obtain the size Sn of the shape, and inputs the size Sn to the object distance determination unit 252.

[0072] The object position determination unit 252 compares the size Sn in the current frame with the size Sn-1 in the preceding frame and obtains the distance parameter D1 according to equation (10) below.

[0073]
D1 = (Sn / Sn-1) × α ... (10)
The distance parameter D1 obtained in this way is compared with a threshold Td, and if it is smaller than the threshold Td, D1 is set to 1. The distance parameter D1 is output to the volume control unit 203. Here, α is a predetermined value. The texture decoding unit 55 decodes the texture of the object in each VOP to obtain its image data. This image data is also output to the image synthesizing unit 8.
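The size count and equation (10) can be sketched as follows. This is a minimal sketch that follows the text literally: any value of D1 below the threshold Td is replaced by 1. The function names, the list-of-rows mask representation, the example values of α and Td, and the error raised for a zero previous size are all assumptions made for illustration; the text specifies none of them.

```python
def shape_size(mask):
    """Count the pixels whose shape information is 1 (the size Sn)."""
    return sum(1 for row in mask for value in row if value == 1)


def distance_parameter(s_n, s_prev, alpha, t_d):
    """Equation (10): D = (Sn / Sn-1) * alpha, set to 1 when below Td."""
    if s_prev <= 0:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("previous-frame size must be positive")
    d = (s_n / s_prev) * alpha
    if d < t_d:
        d = 1.0
    return d


# Hypothetical values: alpha = 1.0 and Td = 1.05.
grown = distance_parameter(200, 100, 1.0, 1.05)      # object doubled in area
unchanged = distance_parameter(101, 100, 1.0, 1.05)  # small change, below Td
```

Under this literal reading the threshold suppresses small frame-to-frame size changes, so the volume only changes when the object grows appreciably.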

[0074] The video decoding unit 202 according to the third embodiment performs the same processing: it compares the size Sn in the current frame with the size Sn-1 in the preceding frame and obtains the distance parameter D2 according to the following equation.

[0075]
D2 = (Sn / Sn-1) × α ... (11)
The distance parameter D2 obtained in this way is compared with the threshold Td, and if it is smaller than the threshold Td, D2 is set to 1. The distance parameter D2 is output to the volume control unit 204.

[0076] Returning to FIG. 7, the audio decoding unit 6 decodes the audio data to be reproduced during the video display interval for the object 1101 of FIG. 2. Similarly, the audio decoding unit 7 decodes the audio data to be reproduced during the video display interval for the object 1102. As described above, the volume control unit 203 receives the distance parameter D1 of the object from the video decoding unit 201 and the audio data of loudness A1 from the audio decoding unit 6. The volume control unit 203 then calculates the volume control parameter Mn from the input distance parameter D1 and the volume control parameter Mn-1 of the preceding frame according to equation (12) below. The volume control parameter M1 for the first frame is set to "1".

[0077]
Mn = D1 × Mn-1 ... (12)
A1 is adjusted using this volume control parameter Mn, and the adjusted value A1m is obtained from equation (13) below.

[0078]
A1m = A1 × Mn ... (13)
The audio data adjusted in this way is reproduced by the audio devices 11 and 12 in the same manner as in the first embodiment.
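The recurrence in equations (12) and (13) can be sketched as below. This is an illustrative sketch, not the implementation of the volume control unit 203: the function names are hypothetical, and the distance parameters are assumed to be supplied as a list covering frames 2, 3, and so on, with M1 fixed at 1 as stated in the text.

```python
def volume_parameters(distance_params):
    """Return [M1, M2, ...] given the distance parameters for frames 2, 3, ...

    M1 = 1, and equation (12) gives Mn = Dn * Mn-1 for each later frame.
    """
    gains = [1.0]
    for d in distance_params:
        gains.append(d * gains[-1])
    return gains


def adjust_loudness(a, m_n):
    """Equation (13): A1m = A1 * Mn."""
    return a * m_n


# The object doubles in size, holds, then halves (illustrative values only).
gains = volume_parameters([2.0, 1.0, 0.5])
adjusted = adjust_loudness(0.5, gains[1])
```

Because each Mn multiplies the previous one, the gain at any frame is the product of all distance parameters up to that frame.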

[0079] FIGS. 10(a) and 10(b) show the result composited and displayed in this manner.

[0080] FIG. 10(a) shows the first frame and FIG. 10(b) the final frame. In the first frame, the voice of the man of the object 1102 is reproduced quietly. In the final frame, the man has approached, as indicated by the object 1102a, so his voice is reproduced correspondingly louder.

[0081] The processing up to reproduction described above is now explained with reference to FIG. 11.

[0082] FIG. 11 is a flowchart showing the moving image reproduction processing in the moving image reproducing apparatus according to the third embodiment of the present invention.

[0083] First, in step S201, MPEG-4 encoded data is input from the MPEG-4 encoded data input unit 1. In step S202, the system encoded data separated by the separator 2 is decoded by the system decoding unit 3 to obtain the position information of the objects. The process then proceeds to step S203, where the variable n that counts frames is initialized to "1". In step S204, the value of the variable n is compared with the number of frames to determine whether reproduction of the image has finished; if it has, that is, if the final frame has been reached, the processing ends. Otherwise, the process proceeds to step S205 and continues.

[0084] In step S205, the video decoding unit 201 (202) decodes the encoded data of one frame (VOP) of the video object to obtain the position and size information, shape information, and texture data of that VOP. Next, in step S206, audio data for a duration equal to one frame interval is decoded from the encoded data of the audio object. In step S207, the size is calculated from the shape information of the video object. In step S208, the volume control parameter corresponding to the video object is obtained using equation (12) above.

[0085] Next, in step S209, the image data of the video objects is composited on the basis of the shape information and texture information and is placed and displayed on the screen in accordance with the system code decoded by the system decoding unit 3. At the same time, each piece of audio data is reproduced in accordance with equation (13) above. The image and audio are thus reproduced and displayed in step S210. The process then proceeds to step S211, where "1" is added to the variable n, and returns to step S204 to determine whether the processing has finished.

[0086] Through this processing, the reproduction volume of the audio data associated with a video object is adjusted on the basis of the distance estimated from the size of that object, so video and audio can be reproduced with a sense of perspective.

[0087] Although the input is MPEG-4 encoded data in the third embodiment, the present invention is not limited to this. For example, data in which each frame is encoded with the JPEG 2000 coding scheme using its ROI function may of course be used. The present invention is also not limited to the processing procedure shown in the third embodiment. For example, steps S205 and S206 may be processed in parallel, and any other steps that can be processed in parallel may be parallelized as well.

[0088] Although the third embodiment has been described for the case of two objects, exactly the same applies with one object, and three or more objects can be handled simply by adding decoding units.

[0089] In the third embodiment, the volume control parameter Mn is calculated by comparing the size in the preceding frame with the size in the current frame. The present invention is not limited to this, however, and Mn may instead be obtained by comparison with the screen size (pixelWidth × pixelHeight), as in equation (14).

[0090]
Mn = Sn / (pixelWidth × pixelHeight) ... (14)
Furthermore, the maximum VOP size may be used in place of the screen size (pixelWidth × pixelHeight).
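Equation (14) can be sketched in a few lines. This is a minimal illustration in which the function name is hypothetical and the screen dimensions in the example are arbitrary; the check for a non-positive screen area is an assumption that the text does not address.

```python
def volume_parameter_from_screen(s_n, pixel_width, pixel_height):
    """Equation (14): Mn = Sn / (pixelWidth * pixelHeight)."""
    area = pixel_width * pixel_height
    if area <= 0:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("screen area must be positive")
    return s_n / area


# An object covering 19200 pixels of a 640x480 screen (illustrative values).
m_n = volume_parameter_from_screen(19200, 640, 480)
```

Unlike equation (12), this form needs no state from the preceding frame, since the gain depends only on the fraction of the screen the object currently covers.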

[0091] Although the size of the object is obtained from the shape information in the third embodiment, the decoded values of the vop_width and vop_height codes, which represent the size of the VOP, may of course be used instead. Also, although a separate volume control function is provided in the third embodiment for the sake of explanation, the same processing may of course be performed in the mixer 10.

[0092] [Fourth Embodiment] FIG. 12 is a block diagram showing the configuration of a moving image reproducing apparatus according to a fourth embodiment of the present invention. Components that are the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted. In the fourth embodiment, MPEG-4 video encoded data and audio encoded data are input, and video and audio are reproduced.

[0093] In FIG. 12, reference numeral 501 denotes an MPEG-4 encoded data input unit, which receives video encoded data and audio encoded data. Reference numeral 502 denotes a separator, which separates the encoded data input from the encoded data input unit 501 and supplies it to the subsequent units. Reference numeral 503 denotes an image synthesizing unit, which composites an image at a predetermined image size (Image_x, Image_y). Reference numeral 504 denotes a position determining unit, which determines the position of the sound source (video object) on the basis of the position information input from the video decoding units 4 and 5.

[0094] The operation of the above configuration is described below.

[0095] As in the first embodiment, the MPEG-4 encoded data input unit 501 receives video encoded data and audio encoded data compliant with the MPEG-4 coding scheme. The separator 502 inputs the video encoded data for the object 1101 of FIG. 2 to the video decoding unit 4, the video encoded data for the object 1102 to the video decoding unit 5, the audio encoded data for the object 1101 to the audio decoding unit 6, and the audio encoded data for the object 1102 to the audio decoding unit 7.

[0096] The video decoding unit 4 decodes the encoded data of each VOP of the object 1101, obtains the size and relative position of the VOP, and decodes the image data. The video decoding unit 5 likewise decodes the encoded data of each VOP of the object 1102, obtains the size and relative position of the VOP, and decodes the image data.

[0097] The audio decoding unit 6 decodes the audio data to be reproduced during the video display interval for the object 1101. Similarly, the audio decoding unit 7 decodes the audio data to be reproduced during the video display interval for the object 1102. The position determining unit 504 receives the VOP position information of the object 1101 from the video decoding unit 4 and the VOP position information of the object 1102 from the video decoding unit 5. In this case, the upper-left corners used as the reference for each object's position are assumed to coincide with each other and with the origin of the screen. The position determining unit 504 determines the audio mixing parameters P1 and P2 from this position information. The position of the sound source of the object 1101 is represented by (VOP1_loc_x). Since the size of the entire screen in the main scanning direction is (Image_x), the mixing parameter P1 for the audio of the object 1101 is
P1 = VOP1_loc_x / Image_x ... (15)
Similarly, the mixing parameter P2 for the audio of the object 1102 is
P2 = VOP2_loc_x / Image_x ... (16)
These results are input to the mixer 10.
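Equations (15) and (16) reduce to a single normalization, sketched below. The function name `mixing_parameter` is hypothetical, the example coordinates are arbitrary, and the check for a non-positive screen width is an assumption made for this sketch.

```python
def mixing_parameter(vop_loc_x, image_x):
    """Equations (15) and (16): P = VOP_loc_x / Image_x."""
    if image_x <= 0:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("screen width must be positive")
    return vop_loc_x / image_x


# Two objects on a screen 640 pixels wide (illustrative values only).
p1 = mixing_parameter(160, 640)
p2 = mixing_parameter(480, 640)
```

As long as the object position lies within the screen, the resulting parameter falls between 0 and 1, which is the range that equations (3) to (6) split between the two channels.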

[0098] As in the first embodiment, the mixer 10 mixes the audio data input from the audio decoding units 6 and 7 in accordance with these mixing parameters P1 and P2. The audio data mixed in this way are input to the audio devices 11 and 12, respectively. The display device 13 displays the output from the image synthesizing unit 503, and the audio devices 11 and 12 reproduce the sound.

【００９９】このような処理により、システムに関する
ＢＩＦＳ符号を用いない場合でも、オブジェクトの位置
をより忠実に判定することができ、ビデオオブジェクト
の移動に伴う違和感のないオーディオの再生ができる。
またフレーム単位で、ビデオ移動に伴ってその関連する
オーディオの発生源となる位置を更新するので、ビデオ
移動に伴った滑らかな音源の移動を実現することも可能
になる。
[0099] With this processing, the position of an object can be determined more faithfully even when the system-related BIFS code is not used, and audio can be reproduced without any sense of incongruity as the video object moves. In addition, because the position serving as the source of the associated audio is updated frame by frame as the video object moves, the sound source can be made to move smoothly along with the video.

【０１００】また上記実施の形態では、音響装置１１，
１２を左右に配置したが、本発明はこれに限定されるも
のでなく、音響装置１１，１２を上下にも設ければ、副
走査方向でのオブジェクトの位置による音響再生の制御
が行なえることは明らかである。
[0100] In the above embodiments, the sound devices 11 and 12 are arranged on the left and right, but the present invention is not limited to this arrangement. If sound devices are also provided above and below, sound reproduction can clearly be controlled according to the position of the object in the sub-scanning direction as well.
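Extending the same ratio to the sub-scanning direction could look like the sketch below. The vertical position `loc_y` and screen height `image_y` are hypothetical analogues of the horizontal quantities, and the four-speaker layout with bilinear weighting is an assumption rather than something the description specifies.

```python
def pan_gains_2d(loc_x, loc_y, image_x, image_y):
    """Return gains for four hypothetical speakers from an object's position.

    px and py use the same position-to-screen-size ratio as the horizontal
    mixing parameter. Bilinear weighting across speakers is an assumption.
    """
    px = loc_x / image_x   # 0 = left edge, 1 = right edge
    py = loc_y / image_y   # 0 = top edge, 1 = bottom edge
    return {
        "top_left": (1 - px) * (1 - py),
        "top_right": px * (1 - py),
        "bottom_left": (1 - px) * py,
        "bottom_right": px * py,
    }


# Hypothetical 352 x 288 screen with an object in the upper-left quadrant.
gains = pan_gains_2d(88, 72, 352, 288)
```

The four gains always sum to 1, so the overall level stays constant while the balance shifts toward the speaker nearest the object.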

【０１０１】なお本発明は、複数の機器（例えばホスト
コンピュータ、インターフェース機器、ビデオカメラ、
ビデオカセットレコーダ、ディスプレイなど）から構成
されるシステムに適用しても、一つの機器からなる装置
（例えば、ビデオカメラ、ビデオカセットレコーダな
ど）に適用しても良い。
[0101] The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a video camera, a video cassette recorder, and a display) or to an apparatus consisting of a single device (for example, a video camera or a video cassette recorder).

【０１０２】また本発明の目的は、前述した実施の形態
の機能を実現するソフトウェアのプログラムコードを記
録した記憶媒体（又は記録媒体）をシステム或は装置に
供給し、そのシステム或は装置のコンピュータ（又はＣ
ＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコー
ドを読み出し実行することによっても達成される。この
場合、記憶媒体から読み出されたプログラムコード自体
が前述した実施形態の機能を実現することになり、その
プログラムコードを記憶した記憶媒体は本発明を構成す
ることになる。また、コンピュータが読み出したプログ
ラムコードを実行することにより、前述した実施形態の
機能が実現されるだけでなく、そのプログラムコードの
指示に基づき、コンピュータ上で稼働しているオペレー
ティングシステム（ＯＳ）などが実際の処理の一部又は
全部を行い、その処理によって前述した実施形態の機能
が実現される場合も含まれる。
[0102] The object of the present invention can also be achieved by supplying a system or apparatus with a storage medium (or recording medium) on which the program code of software implementing the functions of the above-described embodiments is recorded, and having a computer (or CPU or MPU) of that system or apparatus read out and execute the program code stored in the storage medium. In this case, the program code read from the storage medium itself implements the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. The invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

【０１０３】更に、記憶媒体から読み出されたプログラ
ムコードが、コンピュータに挿入された機能拡張カード
やコンピュータに接続された機能拡張ユニットに備わる
メモリに書き込まれた後、そのプログラムコードの指示
に基づき、その機能拡張カードや機能拡張ユニットに備
わるＣＰＵなどが実際の処理の一部又は全部を行い、そ
の処理によって前述した実施形態の機能が実現される場
合も含まれる。本発明を簡単にするために各実施の形態
ではオブジェクトが１つの場合について述べたが、オブ
ジェクトごとに同様の処理を行うことで複数のオブジェ
クトに対応することは明らかである。
[0103] The invention further covers the case where the program code read from the storage medium is written into a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like on the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing. To keep the description simple, each embodiment has dealt with the case of a single object, but it is clear that a plurality of objects can be handled by performing the same processing for each object.

【０１０４】以上の説明したように本実施の形態によれ
ば、ＭＰＥＧ−４のようなビデオオブジェクトとオーデ
ィオオブジェクトを含むオブジェクト符号化データを復
号する場合に、ビデオオブジェクトの位置を考慮して、
それに関連したオーディオデータを復号して再生するこ
とにより、ビデオの移動に対して違和感の無い動画像の
再生を行なうことができる。
[0104] As described above, according to the present embodiments, when object-encoded data containing video objects and audio objects, such as MPEG-4 data, is decoded, the audio data associated with a video object is decoded and reproduced with the position of that video object taken into account. A moving image can therefore be reproduced without any sense of incongruity as the video moves.

【０１０５】更には、これらオブジェクトを組み合わせ
て新しい画像を作る際にも、３次元空間を定義すること
なく、簡易に遠近感や立体感が出せるという効果があ
る。
[0105] Furthermore, when these objects are combined to create a new image, a sense of perspective and depth can easily be produced without defining a three-dimensional space.

【０１０６】[0106]

【発明の効果】以上説明したように本発明によれば、オ
ブジェクト符号化された符号化データを復号する際、オ
ーディオとビデオとを位置的な差異をなくして再生する
ことができる。
[Effects of the Invention] As described above, according to the present invention, when object-encoded data is decoded, audio and video can be reproduced with no positional discrepancy between them.

【０１０７】また本発明によれば、ビデオオブジェクト
の移動に伴って、それに関連するオーディオの発生位置
を変更して、ビデオとオーディオの再生における違和感
をなくすことができる。
[0107] Further, according to the present invention, the position from which the associated audio is generated can be changed as the video object moves, eliminating any sense of incongruity in the reproduction of video and audio.

【０１０８】また本発明によれば、ビデオオブジェクト
の移動に伴って、それに関連するオーディオの再生音量
を変更して、ビデオとオーディオの再生における違和感
をなくすことができる。
[0108] Further, according to the present invention, the reproduction volume of the associated audio can be changed as the video object moves, eliminating any sense of incongruity in the reproduction of video and audio.

[Brief description of the drawings]

【図１】本発明の実施の形態１に係る動画像再生装置の
機能構成を示すブロック図である。FIG. 1 is a block diagram showing a functional configuration of a moving image playback device according to Embodiment 1 of the present invention.

【図２】本実施の形態１におけるオブジェクトの合成例
を説明する図である。FIG. 2 is a diagram illustrating an example of combining objects according to Embodiment 1 of the present invention.

【図３】本発明の実施の形態１に係るビデオ復号部の構
成を示すブロック図である。FIG. 3 is a block diagram illustrating a configuration of a video decoding unit according to Embodiment 1 of the present invention.

【図４】本発明の実施の形態１に係るオブジェクトの合
成例を説明する図である。FIG. 4 is a diagram illustrating an example of combining objects according to the first embodiment of the present invention.

【図５】本発明の実施の形態１に係る動画像再生装置に
おける復号処理を説明するフローチャートである。FIG. 5 is a flowchart illustrating a decoding process in the moving image playback device according to Embodiment 1 of the present invention.

【図６】本発明の実施の形態２に係るビデオ復号部の機
能構成を示すブロック図である。FIG. 6 is a block diagram showing a functional configuration of a video decoding unit according to Embodiment 2 of the present invention.

【図７】本発明の実施の形態３に係る動画像再生装置の
機能構成を示すブロック図である。FIG. 7 is a block diagram showing a functional configuration of a moving image reproducing apparatus according to Embodiment 3 of the present invention.

【図８】本発明の実施の形態３に係るオブジェクトの一
例を説明する図である。FIG. 8 is a diagram illustrating an example of an object according to Embodiment 3 of the present invention.

【図９】本発明の実施の形態３に係るビデオ復号部の機
能構成を示すブロック図である。FIG. 9 is a block diagram showing a functional configuration of a video decoding unit according to Embodiment 3 of the present invention.

【図１０】本発明の実施の形態３におけるオブジェクト
の合成例を説明する図である。FIG. 10 is a diagram illustrating an example of combining objects according to the third embodiment of the present invention.

【図１１】本発明の実施の形態３に係る動画像再生装置
における復号処理を説明するフローチャートである。FIG. 11 is a flowchart illustrating a decoding process in the video playback device according to Embodiment 3 of the present invention.

【図１２】本発明の実施の形態４に係る動画像再生装置
の機能構成を示すブロック図である。FIG. 12 is a block diagram showing a functional configuration of a moving image reproducing apparatus according to Embodiment 4 of the present invention.

【図１３】オブジェクトの表示例を説明する図である。FIG. 13 is a diagram illustrating a display example of an object.

【図１４】従来のＭＰＥＧ−４符号化データの動画像再
生装置の構成を示すブロック図である。FIG. 14 is a block diagram showing a configuration of a conventional moving picture reproducing apparatus for MPEG-4 encoded data.

Claims

[Claims]

1. An image processing apparatus for decoding and reproducing encoded audio data and encoded video data, comprising: audio decoding means for decoding audio encoded data to generate audio data; video decoding means for decoding video encoded data to generate video data; sound source position obtaining means for determining the position of a video object based on the video encoded data; and audio reproduction control means for controlling reproduction of the audio data decoded by the audio decoding means, based on the position information obtained by the sound source position obtaining means.

2. The image processing apparatus according to claim 1, wherein the sound source position obtaining means comprises: shape information extracting means for extracting shape information of a video object from the video encoded data; and determining means for determining the position of the sound source by determining the position of the video object from the shape information extracted by the shape information extracting means.

3. The image processing apparatus according to claim 1, wherein the audio reproduction control means adjusts the reproduction volume balance of the left and right channels of the decoded audio data based on the ratio between the position of the video object obtained by the sound source position obtaining means and the screen size.

4. The image processing apparatus according to claim 1, wherein the audio reproduction control unit adjusts a reproduction volume balance between left and right channels of the audio data decoded by the audio decoding unit.

5. An image processing apparatus for decoding and reproducing encoded audio data and encoded video data, comprising: audio decoding means for decoding audio encoded data to generate audio data; video decoding means for decoding video encoded data to generate video data; sound source distance calculating means for calculating the distance to a sound source of audio included in the video, based on the video encoded data; and audio reproduction control means for controlling the reproduction volume of the audio data decoded by the audio decoding means, based on the distance information calculated by the sound source distance calculating means.

6. The image processing apparatus according to claim 5, wherein the sound source distance calculating means comprises: shape information extracting means for extracting shape information of a video object from the video encoded data; and means for calculating, from the shape information extracted by the shape information extracting means, the distance to the sound source included in the video object.

7. The image processing apparatus according to claim 6, wherein the sound source distance calculating means obtains the distance based on the size ratio between the video object of a previous frame extracted by the shape information extracting means and the video object of the current frame.

8. The image processing apparatus according to claim 1, wherein the video encoded data is data encoded by the MPEG-4 encoding method.

9. An image processing method in an image processing apparatus for decoding and reproducing encoded audio data and encoded video data, comprising: an audio decoding step of decoding audio encoded data to generate audio data; a video decoding step of decoding video encoded data to generate video data; a sound source position obtaining step of determining the position of a video object based on the video encoded data; and an audio reproduction control step of controlling reproduction of the audio data decoded in the audio decoding step, based on the position information obtained in the sound source position obtaining step.

10. The image processing method according to claim 9, wherein the sound source position obtaining step includes: a shape information extracting step of extracting shape information of a video object from the video encoded data; and a determining step of determining the position of the sound source by determining the position of the video object from the shape information extracted in the shape information extracting step.

11. The image processing method according to claim 9, wherein, in the audio reproduction control step, the reproduction volume balance of the left and right channels of the decoded audio data is adjusted based on the ratio between the position of the video object obtained in the sound source position obtaining step and the screen size.

12. The image processing method according to claim 9, wherein in the audio reproduction control step, a reproduction volume balance of left and right channels of the audio data decoded in the audio decoding step is adjusted.

13. An image processing method in an image processing apparatus for decoding and reproducing encoded audio data and encoded video data, comprising: an audio decoding step of decoding audio encoded data to generate audio data; a video decoding step of decoding video encoded data to generate video data; a sound source distance calculating step of calculating the distance to a sound source of audio included in the video, based on the video encoded data; and an audio reproduction control step of controlling the reproduction volume of the audio data decoded in the audio decoding step, based on the distance information calculated in the sound source distance calculating step.

14. The image processing method according to claim 13, wherein the sound source distance calculating step includes: a shape information extracting step of extracting shape information of a video object from the video encoded data; and a step of calculating, from the shape information extracted in the shape information extracting step, the distance to the sound source included in the video object.

15. The image processing method according to claim 14, wherein, in the sound source distance calculating step, the distance is obtained based on the size ratio between the video object of a previous frame extracted in the shape information extracting step and the video object of the current frame.

16. The image processing method according to any one of claims 9 to 15, wherein the video encoded data is data encoded by the MPEG-4 encoding method.

17. An image processing program for executing the image processing method according to claim 9.

18. A computer-readable storage medium storing a program for executing the image processing method according to claim 9.