JP2004048632A

JP2004048632A - Video encoding method and video decoding method

Info

Publication number: JP2004048632A
Application number: JP2002269295A
Authority: JP
Inventors: Toshiyuki Kondo; 敏志近藤; Shinya Sumino; 眞也角野; Makoto Hagai; 誠羽飼; Seishi Abe; 清史安倍
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-05-16
Filing date: 2002-09-13
Publication date: 2004-02-12

Abstract

【Problem】 To greatly improve coding efficiency in direct mode for interlaced video, compared with the conventional method, with very small overhead.
【Solution】 When block a of field B31 is processed in direct mode, the motion vector A that was used when processing block b of the backward reference field P41 is taken as the reference motion vector. The scaling coefficients are switched according to whether the reference motion vector A refers to field P11 or field P12. When a predetermined relationship holds among the scaling coefficients, only some of the coefficients are described as related information.
【Selected Drawing】 FIG. 7

Description

【０００１】
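【Technical Field of the Invention】
The present invention relates to a moving picture coding method and a moving picture decoding method that perform coding and decoding using inter-frame predictive coding, to a moving picture coding apparatus and a moving picture decoding apparatus, to a program for implementing them in software, and to a recording medium storing the program.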
【０００２】
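【Prior Art】
Moving picture coding generally compresses the amount of information by exploiting the spatial and temporal redundancy of a moving picture. Inter-frame predictive coding is used to exploit the temporal redundancy. When a frame is coded with inter-frame predictive coding, a frame that precedes or follows it in display order is used as a reference frame. The amount of motion from the reference frame is detected, and the information is compressed by removing the spatial redundancy from the difference between the motion-compensated frame and the frame being coded.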
【０００３】
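In moving picture coding schemes such as MPEG-1, MPEG-2, MPEG-4 and H.263, a frame coded without inter-picture prediction, that is, by intra-picture coding, is called an I picture. Here, a picture means one unit of coding and covers both frames and fields. A picture coded by inter-picture prediction with reference to a picture preceding it in display order is called a P picture, and a picture coded by inter-picture prediction with reference to already processed pictures both preceding and following it in display order is called a B picture. FIG. 19 shows the prediction relationships among frames in these coding schemes. In FIG. 19, each vertical line represents one frame, and the frame type (I, P or B) is shown at the lower right of each frame. Each arrow indicates that the frame at the head of the arrow is coded by inter-frame prediction using the frame at the tail of the arrow as a reference frame. For example, the second frame from the start, a B frame, is coded using the first frame (an I frame) and the fourth frame (a P frame) as reference pictures.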
【０００４】
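In MPEG-4, a coding mode called direct mode can be selected when coding a B picture (see, for example, Non-Patent Document 1). The conventional inter-frame prediction method in direct mode is described with reference to FIG. 20 (see, for example, Patent Document 1). Suppose block a of frame B3 is coded in direct mode. In this case, the motion vector A of block b, which is located at the same position as block a in frame P4, the backward reference frame of frame B3, is used.
Motion vector A is the motion vector that was used when block b was coded or decoded, and it refers to frame P1. Block a is motion-compensated from the reference frames P1 and P4 using motion vectors calculated from motion vector A by a predetermined method. The motion vectors used to code block a are motion vector B for frame P1 and motion vector C for frame P4. Let MV be the magnitude of motion vector A, MVf the magnitude of motion vector B, and MVb the magnitude of motion vector C. MVf and MVb are then given by Equations 1 and 2.
(Equation 1)  MVf = N × MV / D
(Equation 2)  MVb = −M × MV / D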
【０００５】
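Here, N, M and D are one set of values defined per frame, referred to below as scaling coefficients. The values of the scaling coefficients are determined at coding time. As one example, they can be set from the temporal distances between frames: if D is set to the temporal distance from frame P1 to P4, N to the temporal distance from frame P1 to B3, and M to the temporal distance from frame B3 to P4, then MVf and MVb are motion vectors parallel to MV.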
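As a minimal sketch of Equations 1 and 2, the scaling can be written as below. The function name `scale_direct_mode` is hypothetical, and applying the scaling to each (x, y) component of the vector is an assumption; the text only states the relation for the vector magnitudes.

```python
from fractions import Fraction

def scale_direct_mode(mv, n, m, d):
    """Apply Equations 1 and 2 to each component of reference vector mv."""
    mvf = tuple(Fraction(n * c, d) for c in mv)    # forward vector (toward P1)
    mvb = tuple(Fraction(-m * c, d) for c in mv)   # backward vector (toward P4)
    return mvf, mvb

# Frames P1, B2, B3, P4 spaced one frame apart: D = 3, N = 2, M = 1.
mvf, mvb = scale_direct_mode((6, -3), n=2, m=1, d=3)
```

Exact fractions are used here only to keep the arithmetic visible; a real codec would work in fixed-point motion vector units.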
【０００６】
【Non-Patent Document 1】
INTERNATIONAL STANDARD ISO/IEC 14496-2:1999/Amd.1:2000(E)
【０００７】
【Patent Document 1】
Japanese Unexamined Patent Application Publication No. H11-75191
【０００８】
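【Problem to Be Solved by the Invention】
Consider applying the above conventional method to interlaced pictures. An interlaced picture is one in which each frame consists of two fields. When coding or decoding interlaced pictures, a frame can be processed as a frame, as two fields, or with a frame or field structure chosen block by block within the frame.
When interlaced video is processed field by field, whether the first field or the second field serves as the reference picture in direct mode differs from block to block, even for blocks within the same field. This is because, for example, the forward reference field in direct mode is the field that the backward reference field refers to. Since the forward reference field in direct mode differs from block to block, the optimal values of the scaling coefficients for the motion vector also differ. Consequently, defining only one set of scaling coefficients per frame or field being coded or decoded lowers the coding efficiency.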
【０００９】
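The present invention has been made in view of the above problem, and its object is to provide a motion vector calculation method, a moving picture coding method and a moving picture decoding method that can greatly improve coding efficiency in direct mode for interlaced video, compared with the conventional method, with very small overhead.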
【００１０】
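【Means for Solving the Problem】
To achieve the above object, the moving picture coding method according to the present invention codes a moving picture consisting of a sequence of pictures and outputs the resulting bitstream, and comprises: a motion vector calculation step of calculating a motion vector for each block of a picture; a motion vector prediction step of generating a predicted motion vector for the current block by scaling the calculated motion vector, used as a reference motion vector, with coefficients; and a coefficient output step of outputting the coefficients together with the bitstream.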
【００１１】
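The moving picture decoding method according to the present invention decodes a bitstream output by the moving picture coding method according to claim 1, and comprises: a motion vector calculation step of extracting the coefficients from the bitstream and calculating the motion vector of the current block by scaling with the extracted coefficients; and a decoding step of decoding the current block using the calculated motion vector.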
【００１２】
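The moving picture coding apparatus according to the present invention codes a moving picture consisting of a sequence of pictures and outputs the resulting bitstream, and comprises: motion vector calculation means for calculating a motion vector for each block of a picture; motion vector prediction means for generating a predicted motion vector for the current block by scaling the calculated motion vector, used as a reference motion vector, with coefficients; and coefficient output means for outputting the coefficients together with the bitstream.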
【００１３】
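The moving picture decoding apparatus according to the present invention decodes a bitstream output by the moving picture coding apparatus according to claim 9, and comprises: motion vector calculation means for extracting the coefficients from the bitstream and calculating the motion vector of the current block by scaling with the extracted coefficients; and decoding means for decoding the current block using the calculated motion vector.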
【００１４】
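Furthermore, the present invention can be realized as a program that causes a computer to execute the steps of the above moving picture coding and decoding methods, and the program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as a communication network.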
【００１５】
【Embodiments of the Invention】
Embodiments of the present invention are described with reference to the drawings.
【００１６】
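(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of one embodiment of a moving picture coding apparatus that uses the moving picture coding method according to the present invention.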
【００１７】
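The moving picture coding apparatus comprises a frame memory 101, a difference calculation unit 102, a prediction error coding unit 103, a bitstream generation unit 104, a prediction error decoding unit 105, an addition unit 106, a frame memory 107, a motion vector detection unit 108, a mode selection unit 109, a coding control unit 110, switches 111 to 115, and a motion vector storage unit 116.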
【００１８】
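The frame memory 101 stores the moving picture, which is input picture by picture in display order. The coding control unit 110 reorders the pictures stored in the frame memory 101 into coding order. The coding control unit 110 also controls the storing of motion vectors in the motion vector storage unit 116. In addition, the coding control unit 110 determines the scaling coefficients, described in detail later, and outputs them to the bitstream generation unit 104 and the mode selection unit 109.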
【００１９】
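The motion vector detection unit 108 uses already coded and decoded picture data as a reference picture and detects the motion vector that indicates the position predicted to be optimal within a search area in that picture. The mode selection unit 109 determines the coding mode of a macroblock using the motion vector detected by the motion vector detection unit 108. The difference calculation unit 102 calculates the difference between the picture data read from the frame memory 101 and the reference picture data input from the mode selection unit 109, and generates prediction error picture data.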
【００２０】
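The prediction error coding unit 103 applies coding processes such as frequency transform and quantization to the input prediction error picture data and generates coded data. The bitstream generation unit 104 applies variable-length coding and the like to the input coded data and generates a bitstream by adding the motion vector information, the coding mode information and other related information input from the mode selection unit 109.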
【００２１】
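The prediction error decoding unit 105 applies decoding processes such as inverse quantization and inverse frequency transform to the input coded data and generates decoded difference picture data. The addition unit 106 adds the decoded difference picture data input from the prediction error decoding unit 105 to the reference picture data input from the mode selection unit 109 and generates decoded picture data. The frame memory 107 stores the generated decoded picture data.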
【００２２】
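Next, the operation of the moving picture coding apparatus configured as above is described.
FIG. 2 illustrates the order of pictures in the frame memory 101: (a) shows the input order and (b) the reordered order. Each vertical line represents a picture. In the symbol at the lower right of each picture, the leading letter gives the picture type (I, P or B) and the following digits give the picture number in display order. A P picture uses as reference pictures the three nearest I or P pictures that precede it in display order, and a B picture uses the three nearest preceding I or P pictures and the one nearest following I or P picture in display order.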
【００２３】
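Input pictures are fed to the frame memory 101 picture by picture in display order, for example as shown in FIG. 2(a). When a picture is input to the frame memory 101, the coding control unit 110 decides which picture type (I, P or B) it will be coded as, and controls switches 113 to 115 according to that type. A common way of deciding the picture type is, for example, to assign picture types periodically.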
【００２４】
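Once the picture types are decided, the coding control unit 110 reorders the pictures in the frame memory 101 into coding order, for example as shown in FIG. 2(b). The reordering follows the reference relationships of inter-picture predictive coding: a picture that is used as a reference picture is placed so that it is coded before any picture that refers to it.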
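The reordering rule can be sketched as follows. This is an illustration only: the function name `to_coding_order` is hypothetical, pictures are represented by labels such as "B2", and it is assumed that each B picture needs only the next I or P picture in display order as its backward reference, as stated for FIG. 2.

```python
def to_coding_order(display_order):
    """Move each I/P picture ahead of the B pictures that refer to it."""
    coded, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the backward reference
        else:
            coded.append(pic)          # I or P picture is coded first
            coded.extend(pending_b)    # then the B pictures that precede it
            pending_b.clear()
    coded.extend(pending_b)            # trailing B pictures, if any
    return coded

order = to_coding_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"])
```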
【００２５】
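Each reordered picture is read from the frame memory 101 in units of macroblocks, for example groups of 16 horizontal by 16 vertical pixels. Motion compensation is performed in units of blocks, for example groups of 8 horizontal by 8 vertical pixels.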
【００２６】
The subsequent operation is described separately for the case where the picture to be coded is a P picture and the case where it is a B picture.
【００２７】
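<P picture>
A P picture is coded by inter-picture prediction using forward reference. For example, when picture P13 in the example of FIG. 2(a) is coded, the reference pictures are pictures P10, P7 and P4. These reference pictures have already been coded, and their decoded picture data is stored in the frame memory 107.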
【００２８】
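When coding a P picture, the coding control unit 110 turns switches 113 to 115 on. As a result, the macroblocks of picture P13 read from the frame memory 101 are input to the motion vector detection unit 108, the mode selection unit 109 and the difference calculation unit 102.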
【００２９】
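The motion vector detection unit 108 detects a motion vector for each block in the macroblock, using the decoded picture data stored in the frame memory 107 as reference pictures, and outputs the detected motion vector to the mode selection unit 109.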
【００３０】
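The mode selection unit 109 determines the coding mode of the macroblock using the motion vector detected by the motion vector detection unit 108. The coding mode indicates how the macroblock is coded. For a P picture, for example, the mode can be selected from intra-picture coding, inter-picture predictive coding using a motion vector, and inter-picture predictive coding without a motion vector (the motion is treated as zero, or a vector is selected from the motion vectors of surrounding blocks). In general, the mode that gives the smaller coding error for a small number of bits is selected.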
【００３１】
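The mode selection unit 109 outputs the determined coding mode to the bitstream generation unit 104. If the picture being coded is used as a reference picture when other pictures are coded, and the coding mode determined by the mode selection unit 109 is inter-picture predictive coding, the motion vector used for that coding and information indicating the reference picture number are stored in the motion vector storage unit 116. In this case the mode selection unit 109 also outputs the motion vector and the information indicating the reference picture number to the bitstream generation unit 104.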
【００３２】
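The mode selection unit 109 outputs reference picture data based on the determined coding mode to the difference calculation unit 102 and the addition unit 106. When intra-picture coding is selected, no reference picture data is output. When intra-picture coding is selected, the mode selection unit 109 connects switch 111 to side a and switch 112 to side c; when inter-picture predictive coding is selected, it connects switch 111 to side b and switch 112 to side d.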
【００３３】
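The following describes the case where the mode selection unit 109 has selected inter-picture predictive coding.
The difference calculation unit 102 calculates the difference between the picture data of the macroblock of picture P13 read from the frame memory 101 and the reference picture data input from the mode selection unit 109, generates prediction error picture data, and outputs it to the prediction error coding unit 103. The prediction error coding unit 103 applies coding processes such as frequency transform and quantization to this data, generates coded data, and outputs it to the bitstream generation unit 104 and the prediction error decoding unit 105. The frequency transform and quantization can be performed, for example, in units of 8 horizontal by 8 vertical pixels.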
【００３４】
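The bitstream generation unit 104 applies variable-length coding and the like to the input coded data, adds the motion vector information, the information indicating the reference picture number, the coding mode information and other related information input from the mode selection unit 109, and generates and outputs a bitstream. The reference picture number is described in detail later.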
【００３５】
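Meanwhile, the prediction error decoding unit 105 applies decoding processes such as inverse quantization and inverse frequency transform to the input coded data, generates decoded difference picture data, and outputs it to the addition unit 106. The addition unit 106 adds this decoded difference picture data to the reference picture data input from the mode selection unit 109, generates decoded picture data, and stores it in the frame memory 107.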
【００３６】
The remaining macroblocks of picture P13 are coded in the same way. In the example of FIG. 2(a), once all macroblocks of picture P13 have been processed, picture B11 is coded next.
【００３７】
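<B picture>
A B picture is coded by inter-picture prediction using references in two directions. For example, when picture B11 in the example of FIG. 2(a) is coded, the reference pictures that precede it in display order are pictures P10, P7 and P4, and the reference picture that follows it in display order is picture P13.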
【００３８】
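Here it is assumed that B pictures are not used as reference pictures when other pictures are coded. Accordingly, when coding a B picture, the coding control unit 110 turns switch 113 on and switches 114 and 115 off. As a result, the macroblocks of picture B11 read from the frame memory 101 are input to the motion vector detection unit 108, the mode selection unit 109 and the difference calculation unit 102.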
【００３９】
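The motion vector detection unit 108 detects a forward motion vector and a backward motion vector for each block in the macroblock, using the decoded picture data of pictures P10, P7 and P4 stored in the frame memory 107 as forward reference pictures and the decoded picture data of picture P13 as the backward reference picture. It outputs the detected motion vectors to the mode selection unit 109.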
【００４０】
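The mode selection unit 109 determines the coding mode of the macroblock using the motion vectors detected by the motion vector detection unit 108. For a B picture, the mode can be selected from, for example, intra-picture coding, inter-picture predictive coding using the forward motion vector, inter-picture predictive coding using the backward motion vector, inter-picture predictive coding using both motion vectors, and direct mode.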
【００４１】
The case of coding in direct mode is described here.
FIG. 3 illustrates the motion vectors in direct mode. Assume that the block to be coded is block a of picture B11.
【００４２】
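When block a is coded in direct mode, the motion vector of the block located at the same position as block a in the backward reference picture is used. That is, as shown in FIG. 3, motion vector c of block b in picture P13 is used. Motion vector c is the motion vector that was used when block b was coded, and it refers to picture P10. Motion vector c is stored in the motion vector storage unit 116.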
【００４３】
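Using the scaling coefficients determined by the coding control unit 110, the mode selection unit 109 generates from motion vector c a motion vector d based on picture P10 and a motion vector e based on picture P13. Block a is predicted in two directions from the reference pictures P10 and P13 using the two motion vectors d and e generated from motion vector c. The selection of scaling coefficients according to whether the first field or the second field is used as the reference picture is described in detail later.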
【００４４】
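The mode selection unit 109 outputs the determined coding mode to the bitstream generation unit 104. It also outputs reference picture data based on the determined coding mode to the difference calculation unit 102 and the addition unit 106. When intra-picture coding is selected, no reference picture data is output. When intra-picture coding is selected, the mode selection unit 109 connects switch 111 to side a and switch 112 to side c; when inter-picture predictive coding is selected, it connects switch 111 to side b and switch 112 to side d.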
【００４５】
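The following describes the case where the mode selection unit 109 has selected inter-picture predictive coding.
The difference calculation unit 102 calculates the difference between the picture data of the macroblock of picture B11 read from the frame memory 101 and the reference picture data input from the mode selection unit 109, generates prediction error picture data, and outputs it to the prediction error coding unit 103.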
【００４６】
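The prediction error coding unit 103 applies coding processes such as frequency transform and quantization to the input prediction error picture data, generates coded data, and outputs it to the bitstream generation unit 104. The bitstream generation unit 104 applies variable-length coding and the like to this coded data, adds the motion vector information, the coding mode information and other related information input from the mode selection unit 109, and generates and outputs a bitstream. The related information of the picture includes the scaling coefficients determined by the coding control unit 110 for use in direct mode. For macroblocks coded in direct mode, no motion vector information is added to the bitstream.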
【００４７】
The remaining macroblocks of picture B11 are coded in the same way. In the example of FIG. 2(a), once all macroblocks of picture B11 have been processed, picture B12 is coded next.
【００４８】
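FIG. 6 is a conceptual diagram of the coded picture signal format produced by the moving picture coding apparatus. Picture is the coded signal for one frame, Header is the related coded signal placed at the head of the frame, Block1 is the coded signal of a block coded in direct mode, Block2 is the coded signal of a block coded by interpolation prediction other than direct mode, RIdx1 and RIdx2 are relative indices, and MV1 and MV2 are motion vectors. The figure shows the case where the Header, which carries the related information, is contained in the same bitstream, but the related information may be contained in another bitstream. The Header includes the scaling coefficients determined by the coding control unit 110 for use in direct mode. An interpolation prediction block Block2 carries two relative indices, RIdx1 and RIdx2 in this order, to indicate the two reference frames used for interpolation. A relative index is the same as the reference picture number described above. Which of RIdx1 and RIdx2 is used can be determined from PredType: when reference in two directions is indicated, both RIdx1 and RIdx2 are used; when reference in one direction is indicated, RIdx1 or RIdx2 is used; and when direct mode is indicated, neither is used. RIdx1, which indicates the first reference frame, is called the first relative index, and RIdx2, which indicates the second reference frame, is called the second relative index. Whether a frame is the first or the second reference frame is determined by the data position in the coded stream.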
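The PredType rule for the relative indices can be illustrated with a small data model. This is a sketch only: the class `Block`, the function `indices_in_stream` and the string values of `pred_type` are hypothetical, and the mapping of one-direction prediction to a particular index (forward to RIdx1, backward to RIdx2) is an assumption, since the text only says that RIdx1 or RIdx2 is used.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Block:
    pred_type: str                 # "direct", "forward", "backward" or "bidirectional"
    ridx1: Optional[int] = None    # first relative index (RIdx1)
    ridx2: Optional[int] = None    # second relative index (RIdx2)

def indices_in_stream(block):
    """Return the relative indices a block carries, in stream order."""
    if block.pred_type == "direct":
        return []                              # neither index is used
    if block.pred_type == "bidirectional":
        return [block.ridx1, block.ridx2]      # both, RIdx1 first
    if block.pred_type == "forward":
        return [block.ridx1]                   # assumption: forward uses RIdx1
    return [block.ridx2]                       # assumption: backward uses RIdx2
```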
【００４９】
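FIG. 4 is a block diagram showing the configuration of one embodiment of a moving picture decoding apparatus that uses the moving picture decoding method according to the present invention.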
【００５０】
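The moving picture decoding apparatus comprises a bitstream analysis unit 701, a prediction error decoding unit 702, a mode decoding unit 703, a motion compensation decoding unit 705, a motion vector storage unit 706, a frame memory 707, an addition unit 708, and switches 709 and 710.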
【００５１】
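The bitstream analysis unit 701 extracts various data from the input bitstream, such as the coding mode information, the motion vector information and the scaling coefficients. The prediction error decoding unit 702 decodes the input prediction error coded data and generates prediction error picture data. The mode decoding unit 703 refers to the coding mode information extracted from the bitstream and controls switches 709 and 710.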
【００５２】
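The motion compensation decoding unit 705 decodes the reference picture number and the motion vector information and, based on the decoded reference picture number and motion vector, obtains motion-compensated picture data from the frame memory 707. The motion vector storage unit 706 stores motion vectors.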
【００５３】
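The addition unit 708 adds the prediction error picture data input from the prediction error decoding unit 702 to the motion-compensated picture data input from the motion compensation decoding unit 705 and generates decoded picture data. The frame memory 707 stores the generated decoded picture data.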
【００５４】
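Next, the operation of the moving picture decoding apparatus configured as above is described.
FIG. 5 illustrates the order of pictures: (a) shows the order of pictures in the input bitstream and (b) the order in which pictures are output as the output picture. It is assumed that a P picture has been coded using as reference pictures the three nearest I or P pictures that precede it in display order, and that a B picture has been coded using the three nearest preceding I or P pictures and the one nearest following I or P picture in display order.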
【００５５】
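The bitstream is input to the bitstream analysis unit 701 in the picture order shown in FIG. 5(a). The bitstream analysis unit 701 extracts various data from the input bitstream, such as the coding mode information, the motion vector information and the scaling coefficients. It outputs the extracted coding mode information to the mode decoding unit 703, and the motion vector information and the scaling coefficients to the motion compensation decoding unit 705. It also outputs the extracted prediction error coded data to the prediction error decoding unit 702.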
【００５６】
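The mode decoding unit 703 refers to the coding mode information extracted from the bitstream and controls switches 709 and 710. When the coding mode is intra-picture coding, it connects switch 709 to side a and switch 710 to side c; when the coding mode is inter-picture predictive coding, it connects switch 709 to side b and switch 710 to side d. The mode decoding unit 703 also outputs the coding mode information to the motion compensation decoding unit 705.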
【００５７】
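The following describes the case where the coding mode is inter-picture predictive coding.
The prediction error decoding unit 702 decodes the input prediction error coded data, generates prediction error picture data, and outputs it to the addition unit 708.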
【００５８】
The subsequent operation is described separately for the case where the picture to be decoded is a P picture and the case where it is a B picture.
【００５９】
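<P picture>
The motion compensation decoding unit 705 decodes the input motion vector information. Based on the decoded reference picture number and motion vector, it obtains motion-compensated picture data (a block) from the frame memory 707 and outputs it to the addition unit 708.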
【００６０】
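When the picture being decoded is used as a reference picture in decoding other pictures, the motion compensation decoding unit 705 stores the motion vector and the reference picture number in the motion vector storage unit 706. Since P pictures are used as reference pictures here, the motion vectors and reference picture numbers obtained when decoding picture P13 are stored in the motion vector storage unit 706. The storing of motion vectors in the motion vector storage unit 706 is controlled by the related information in the bitstream.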
【００６１】
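The addition unit 708 adds the prediction error picture data input from the prediction error decoding unit 702 to the motion-compensated picture data input from the motion compensation decoding unit 705, generates decoded picture data, and stores it in the frame memory 707.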
【００６２】
The remaining macroblocks of picture P13 are decoded in the same way. In the example of FIG. 5(a), once all macroblocks of picture P13 have been processed, picture B11 is decoded next.
【００６３】
<B picture>
The case where the coding mode extracted by the mode decoding unit 703 is direct mode is described here. Assume that block a of picture B11 shown in FIG. 3 is the block to be decoded.
【００６４】
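When block a is decoded in direct mode, the motion vector of the block located at the same position as block a in the backward reference picture is used. That is, as shown in FIG. 3, motion vector c of block b in picture P13 is used. Motion vector c is the motion vector that was used when block b was coded, and it refers to picture P10.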
【００６５】
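Using the scaling coefficients input from the bitstream analysis unit 701, the motion compensation decoding unit 705 generates from motion vector c a motion vector d based on picture P10 and a motion vector e based on picture P13. Block a is predicted in two directions from the reference pictures P10 and P13 using the two motion vectors d and e generated from motion vector c. The selection of scaling coefficients according to whether the first field or the second field is used as the reference picture is described in detail later.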
【００６６】
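Based on the generated motion vectors, the motion compensation decoding unit 705 obtains motion-compensated picture data (blocks) from the frame memory 707 and outputs it to the addition unit 708. The addition unit 708 adds the motion-compensated picture data to the prediction error picture data input from the prediction error decoding unit 702, generates decoded picture data, and stores it in the frame memory 707.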
【００６７】
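The remaining macroblocks of picture B11 are decoded in the same way. In the example of FIG. 5(a), once all macroblocks of picture B11 have been processed, picture B12 is decoded next. The pictures decoded in this way are output one after another from the frame memory 707 as the output picture, as shown in FIG. 5(b).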
【００６８】
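As described above, the scaling coefficients determined by the coding control unit 110 for use in direct mode are included in the related information of the picture when the bitstream is generated and output; at decoding time the scaling coefficients are extracted from the related information and used to generate the two motion vectors of the current block from the reference motion vector. The decoder therefore does not need to derive the scaling coefficients from, for example, the temporal distances between fields, which reduces the processing load and allows efficient processing.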
【００６９】
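Next, the generation of the two motion vectors of the current block from the reference motion vector using the scaling coefficients determined by the coding control unit 110, and the selection of scaling coefficients according to whether the first field or the second field is used as the reference picture, are described in detail.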
【００７０】
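Whether each picture is coded with a frame structure or a field structure is decided adaptively by the coding control unit 110. One way to decide is, for example, to compute the variance of the pixel values in the picture for the frame structure and for the field structure and select the structure with the smaller variance. It is also possible to choose the frame or field structure block by block within each picture, but the case of switching between frame and field structure picture by picture is described here.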
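One possible reading of the variance criterion is sketched below. The text does not say how the field-structure variance is formed, so averaging the variances of the two fields is an assumption made for this example, as are the function name `choose_structure` and the representation of a frame as a list of pixel rows.

```python
from statistics import pvariance

def choose_structure(frame_rows):
    """Pick 'field' when splitting into fields lowers the pixel variance."""
    frame_var = pvariance([p for row in frame_rows for p in row])
    top = [p for row in frame_rows[0::2] for p in row]      # first field (even rows)
    bottom = [p for row in frame_rows[1::2] for p in row]   # second field (odd rows)
    field_var = (pvariance(top) + pvariance(bottom)) / 2    # assumed combination
    return "field" if field_var < frame_var else "frame"

# Alternating rows differ strongly, as with motion between the two fields.
interlaced = [[0, 0], [100, 100], [0, 0], [100, 100]]
static = [[10, 20], [10, 20], [10, 20], [10, 20]]
```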
【００７１】
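FIG. 7 shows the temporal arrangement of frames when a moving picture is coded or decoded. In FIG. 7, frames P1 and P4 are processed as P pictures and frames B2 and B3 as B pictures. Each frame can be treated as two fields: frame P1 as fields P11 and P12, frame B2 as fields B21 and B22, frame B3 as fields B31 and B32, and frame P4 as fields P41 and P42. Each frame is coded and decoded adaptively in either the frame structure or the field structure.
In FIGS. 7 to 11, coding and decoding are performed in the units labelled by the upper row of picture symbols. In FIG. 7, for example, all pictures are processed field by field.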
【００７２】
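Assume that the current picture is field B31, that is, frame B3 is processed with the field structure. Field B31 uses field P11 or P12 as its forward reference picture and field P41 or P42 as its backward reference picture. These reference pictures have already been coded or decoded. Frames P1 and P4 are assumed to have been processed field by field.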
【００７３】
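Consider processing block a of field B31 in direct mode. In this case, the motion vector of block b is used, where block b is located at the same position as block a in field P41, the backward reference field that has the same parity (the value indicating first field or second field) as the field containing block a, that is, a first field. This motion vector is called the reference motion vector below.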
【００７４】
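First, as in FIG. 7(a), consider the case where block b was processed using motion vector A and motion vector A refers to field P11. Block a is motion-compensated from the forward reference field P11 (the field pointed to by reference motion vector A) and the backward reference field P41 (the field containing block b), using motion vectors calculated from reference motion vector A by a predetermined method. Let the motion vectors used to process block a be motion vector B for field P11 and motion vector C for field P41. With MV1 the magnitude of motion vector A, MVf1 the magnitude of motion vector B, and MVb1 the magnitude of motion vector C, MVf1 and MVb1 are given by Equations 3 and 4.
(Equation 3)  MVf1 = N1 × MV1 / D1
(Equation 4)  MVb1 = −M1 × MV1 / D1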
【００７５】
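The values N1, M1 and D1 are called scaling coefficients below. The scaling coefficients are values set per field. They can be set, for example, from the temporal distances between fields: if D1 is set to the temporal distance from field P11 to P41, N1 to the temporal distance from field P11 to B31, and M1 to the temporal distance from field B31 to P41, then MVf1 and MVb1 are motion vectors parallel to MV1. The values of the scaling coefficients are set at coding time and are described, as related information or the like, in the bitstream or in information attached to the bitstream. At decoding time, the scaling coefficients are obtained from the bitstream or from the attached information, and when a block coded in direct mode is decoded, MVf1 and MVb1 are calculated with Equations 3 and 4.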
【００７６】
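Next, as in FIG. 7(b), consider the case where block b was processed using motion vector D and motion vector D refers to field P12. Block a is motion-compensated from the forward reference field P12 (the field pointed to by motion vector D) and the backward reference field P41 (the field containing block b), using motion vectors calculated from reference motion vector D by a predetermined method. Let the motion vectors used to process block a be motion vector E for field P12 and motion vector F for field P41. With MV2 the magnitude of motion vector D, MVf2 the magnitude of motion vector E, and MVb2 the magnitude of motion vector F, MVf2 and MVb2 are given by Equations 5 and 6.
(Equation 5)  MVf2 = N2 × MV2 / D2
(Equation 6)  MVb2 = −M2 × MV2 / D2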
【００７７】
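Here the scaling coefficients (N2, M2, D2) are values set per picture. They can be set, for example, from the temporal distances between fields: if D2 is set to the temporal distance from field P12 to P41, N2 to the temporal distance from field P12 to B31, and M2 to the temporal distance from field B31 to P41, then MVf2 and MVb2 are motion vectors parallel to MV2. The values of the scaling coefficients are set at coding time and are described, as related information or the like, in the bitstream or in information attached to the bitstream. At decoding time, these values are obtained from the bitstream or from the attached information, and MVf2 and MVb2 are calculated with Equations 5 and 6.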
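The temporal-distance choice of coefficients can be made concrete with a small sketch. The field time indices below are an assumption (equally spaced fields for frames P1, B2, B3, P4, counted from P11 = 0), and the function name `coeffs_from_times` is hypothetical.

```python
def coeffs_from_times(t_forward, t_current, t_backward):
    """Return (N, M, D) as temporal distances between three fields."""
    n = t_current - t_forward     # forward reference to current field
    m = t_backward - t_current    # current field to backward reference
    d = t_backward - t_forward    # forward reference to backward reference
    return n, m, d

# Assumed field times: P11, P12, B21, B22, B31, B32, P41, P42 = 0..7
FIELD_TIME = {"P11": 0, "P12": 1, "B31": 4, "P41": 6}

# Reference vector points to P11 (FIG. 7(a)) or to P12 (FIG. 7(b)).
first_set = coeffs_from_times(FIELD_TIME["P11"], FIELD_TIME["B31"], FIELD_TIME["P41"])
second_set = coeffs_from_times(FIELD_TIME["P12"], FIELD_TIME["B31"], FIELD_TIME["P41"])
```

Under these assumed times the two sets come out as (4, 2, 6) and (3, 2, 5), which is why a single per-field set cannot be optimal for both reference fields.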
【００７８】
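As a way of describing the scaling coefficients (N1, M1, D1) and (N2, M2, D2) in the bitstream or in information attached to the bitstream, instead of describing both sets as above, only one set may be described and the other set derived from the described one. For example, when the scaling coefficients are determined from the temporal distances between fields as above, the following relationships hold between (N1, M1, D1) and (N2, M2, D2):
(Equation 7)  N2 = N1 − 1
(Equation 8)  M2 = M1
(Equation 9)  D2 = D1 − 1
The scaling coefficients (N2, M2, D2) can therefore be derived from the scaling coefficients (N1, M1, D1).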
【００７９】
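The above description covers the case where fields P11 and P12 in FIG. 7 are used as forward reference pictures and fields P41 and P42 in FIG. 7 as backward reference pictures, but there may be more reference pictures. For example, as shown in FIG. 11, when block a of field B61 is processed, fields P11, P12, P41 and P42 may be used as forward reference pictures and fields P71 and P72 as backward reference pictures. FIG. 11(a) shows the case where the reference motion vector A used to process block a in direct mode refers to the first field of frame P1, and FIG. 11(b) the case where it refers to the second field of frame P1. Together with the cases shown in FIG. 7, the reference motion vector can then refer to four different fields, so there are also four sets of scaling coefficients. However, the scaling coefficients for two fields belonging to the same frame can easily be derived from one another, so not all scaling coefficients need to be described as related information. Moreover, when the interval between reference pictures (here, P pictures) is constant or can be detected by other means, only one set of scaling coefficients may be described and the other sets derived from it.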
【００８０】
The above description covers processing in direct mode a block belonging to field B31 in FIG. 7, but a block belonging to field B32, the second field of frame B3, can be processed in the same way. This case is described below.
【００８１】
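The processing is described with reference to FIG. 8. When block a belonging to field B32 is processed in direct mode, the motion vector of block b, which belongs to picture P42 and is located at the same position as block a, is the reference motion vector. FIG. 8(a) shows the case where the reference motion vector refers to field P11, and FIG. 8(b) the case where it refers to field P12. The processing in these cases is almost the same as described above, so the description is omitted. Note, however, that the values of the scaling coefficients in this case generally differ from those used when processing field B31 in FIG. 7.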
【００８２】
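When a second field is processed, the first field of the same frame can be used as its reference picture. Therefore, when block b was processed with reference to field P41, the processing differs from the above. This is described with reference to FIG. 9.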
【００８３】
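FIG. 9 shows the case where block b was processed using motion vector G and motion vector G refers to field P41. Block a is motion-compensated from field P41 (the field pointed to by motion vector G), which is a backward reference field, and field P42 (the field containing block b), which is also a backward reference field, using motion vectors calculated from reference motion vector G by a predetermined method. Let the motion vectors used to code block a be motion vector H for field P41 and motion vector I for field P42. With MV3 the magnitude of motion vector G, MVf3 the magnitude of motion vector H, and MVb3 the magnitude of motion vector I, MVf3 and MVb3 are given by Equations 10 and 11.
(Equation 10)  MVf3 = −N3 × MV3 / D3
(Equation 11)  MVb3 = −M3 × MV3 / D3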
【００８４】
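The values N3, M3 and D3 can be set, for example, from the temporal distances between fields: if D3 is set to the temporal distance from field P41 to P42, N3 to the temporal distance from field B32 to P41, and M3 to the temporal distance from field B32 to P42, then MVf3 and MVb3 are motion vectors parallel to MV3. The values N3, M3 and D3 are set per field at coding time and are described, as related information or the like, in the bitstream or in information attached to the bitstream. At decoding time, these values are obtained from the bitstream or from the attached information, and MVf3 and MVb3 are calculated with Equations 10 and 11.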
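A short sketch of Equations 10 and 11 may help show why both results carry a minus sign. It assumes equally spaced fields (B32 = 5, P41 = 6, P42 = 7) and measures both N3 and M3 from the field being processed, B32; the function name `scale_same_frame_reference` and the per-component scaling are assumptions for illustration.

```python
from fractions import Fraction

def scale_same_frame_reference(mv, n3, m3, d3):
    """Equations 10 and 11: both references lie after the current field."""
    mvf = tuple(Fraction(-n3 * c, d3) for c in mv)   # vector toward P41
    mvb = tuple(Fraction(-m3 * c, d3) for c in mv)   # vector toward P42
    return mvf, mvb

# Assumed distances: D3 = P41 to P42 = 1, N3 = B32 to P41 = 1, M3 = B32 to P42 = 2.
# Vector G runs from P42 back to P41, so both derived vectors point the other way.
mvh, mvi = scale_same_frame_reference((2, -1), n3=1, m3=2, d3=1)
```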
【００８５】
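Next, FIG. 12 shows a flowchart of the processing in the moving picture coding method that uses the above motion vector calculation method. The coding method corresponding to the motion vector calculation method described with FIG. 7 is explained here. First, in S601, scaling coefficients are determined per picture (frame or field) and described as related information. The scaling coefficients may be determined from the time intervals between reference fields as above, or by another method. Either all scaling coefficients may be described, or only some may be described and the rest derived from the described ones. In S602, it is decided whether the current block is processed in direct mode. If not, the block is processed in S606 according to the other mode. If so, S603 determines whether the reference motion vector is a motion vector for the first field or for the second field. This can be determined from the value of the reference picture number. If the reference motion vector refers to the first field, it is scaled in S604 using the scaling coefficients for the first field (N1, M1, D1). If it refers to the second field, it is scaled in S605 using the scaling coefficients for the second field (N2, M2, D2). Then, in S607, motion compensation is performed using the motion vectors obtained in S604 or S605.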
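Steps S603 to S605 amount to choosing a coefficient set and then scaling, which can be sketched as follows. The function name `direct_mode_vectors` and the boolean flag standing in for the reference picture number check are hypothetical; the backward vector is negated for both sets, in line with Equation 4, and scaling is applied per component as an assumption.

```python
from fractions import Fraction

def direct_mode_vectors(ref_mv, ref_is_first_field, coeffs_first, coeffs_second):
    """S603 to S605: pick the coefficient set, then scale the reference vector."""
    n, m, d = coeffs_first if ref_is_first_field else coeffs_second
    mvf = tuple(Fraction(n * c, d) for c in ref_mv)    # forward vector
    mvb = tuple(Fraction(-m * c, d) for c in ref_mv)   # backward vector
    return mvf, mvb

first, second = (4, 2, 6), (3, 2, 5)   # example sets, assumed from FIG. 7 distances
to_p11 = direct_mode_vectors((30, -15), True, first, second)
to_p12 = direct_mode_vectors((30, -15), False, first, second)
```

The resulting vectors would then be passed to motion compensation (S607), which is outside the scope of this sketch.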
【００８６】
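Next, FIG. 13 shows a flowchart of the processing in the moving picture decoding method that uses the above motion vector calculation method. The decoding method corresponding to the motion vector calculation method described with FIG. 7 is explained here. First, in S701, the scaling coefficients are obtained from the related information. If, for example, only some scaling coefficients are described as related information, the others are calculated from the described ones by a predetermined method. In S702, it is decided whether the current block is processed in direct mode. If not, the block is processed in S706 according to the other mode. If so, S703 determines whether the reference motion vector is a motion vector for the first field or for the second field. This can be determined from the value of the reference picture number. If the reference motion vector refers to the first field, it is scaled in S704 using the scaling coefficients for the first field (N1, M1, D1). If it refers to the second field, it is scaled in S705 using the scaling coefficients for the second field (N2, M2, D2). Then, in S707, motion compensation is performed using the motion vectors obtained in S704 or S705.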
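Step S701 can be sketched as below for the case where the second coefficient set may be omitted from the related information. The dictionary layout with keys "first" and "second" and the function name `load_coeff_sets` are purely hypothetical; the derivation uses the relationships N2 = N1 − 1, M2 = M1, D2 = D1 − 1 given for temporal-distance coefficients.

```python
def load_coeff_sets(header):
    """S701: read (N1, M1, D1) and read or derive (N2, M2, D2)."""
    first = tuple(header["first"])
    second = header.get("second")
    if second is None:
        n1, m1, d1 = first
        second = (n1 - 1, m1, d1 - 1)   # derive the omitted set
    return first, tuple(second)

derived = load_coeff_sets({"first": (4, 2, 6)})
explicit = load_coeff_sets({"first": (4, 2, 6), "second": [3, 2, 5]})
```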
【００８７】
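As described above, in the motion vector calculation method, moving picture coding method and moving picture decoding method of the present invention, when the reference motion vector used in direct mode corresponds to the field structure, the scaling coefficients used in direct mode are switched according to the field that the reference motion vector points to. These scaling coefficients are set per processed field, described as related information at coding time, and obtained from the related information at decoding time. Either all scaling coefficients may be described, or only some may be described and the rest derived from the described ones by a predetermined method.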
【００８８】
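By using the motion vector calculation method, moving picture coding method and moving picture decoding method of the present invention in this way, the optimal scaling coefficients can be used according to the field referred to in direct mode, which improves coding efficiency. The overhead of describing the scaling coefficients is almost negligible. Furthermore, when only some scaling coefficients are described as related information and the others are calculated from them, the code amount of the related information for the scaling coefficients is exactly the same as in the conventional example.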
【００８９】
In this embodiment the scaling coefficients are set from temporal distances, but they may be determined by another method.
【００９０】
In this embodiment each frame is coded and decoded adaptively using either the frame structure or the field structure, but the same processing can be applied, with the same effect, when the frame or field structure is chosen adaptively block by block.
【００９１】
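FIG. 10 shows the arrangement of frames in display order when a moving picture is coded or decoded. In FIG. 10, solid lines indicate pictures with the frame structure and broken lines indicate pictures with the field structure. For example, frame P1 shown with a solid line is the combination of fields P11 and P12 shown with broken lines. Frames P1 and P4 are processed as P pictures and frames B2 and B3 as B pictures. Each frame can be treated as two fields: frame P1 as fields P11 and P12, frame B2 as fields B21 and B22, frame B3 as fields B31 and B32, and frame P4 as fields P41 and P42. Each frame is coded and decoded adaptively in either the frame structure or the field structure.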
【００９２】
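Assume that the current picture is frame B3, that is, frame B3 is processed with the frame structure. Frame B3 uses frame P1 as its forward reference picture and frame P4 as its backward reference picture. These reference pictures have already been coded or decoded. Frame P4 is assumed to have been processed field by field, so when frame P4 is referred to, the fields P41 and P42 processed with the field structure are combined and referred to as a frame.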
【００９３】
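Consider processing block a of frame B3 in direct mode. In this case, the motion vector that was used to process the block located at the same position as block a in frame P4, the backward reference picture, is used. However, frame P4 is treated here as fields P41 and P42 with the field structure. The motion vector of block b belonging to field P41, the first field of frame P4, is therefore taken as the reference motion vector.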
【００９４】
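First, as in FIG. 10(a), consider the case where block b was processed using motion vector A and motion vector A refers to field P11. Block a is motion-compensated from the forward reference frame P1 (the frame containing the field pointed to by motion vector A) and the backward reference frame P4 (the frame containing the field to which block b belongs), using motion vectors calculated from reference motion vector A by a predetermined method. Let the motion vectors used to code block a be motion vector B for frame P1 and motion vector C for frame P4. With MV4 the magnitude of motion vector A, MVf4 the magnitude of motion vector B, and MVb4 the magnitude of motion vector C, MVf4 and MVb4 are given by Equations 12 and 13.
(Equation 12)  MVf4 = N4 × MV4 / D4
(Equation 13)  MVb4 = −M4 × MV4 / D4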
【００９５】
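The values of the scaling coefficients (N4, M4, D4) can be set, as one example, from the temporal distances between pictures: if D4 is set to the temporal distance from field P11 to P41, N4 to the temporal distance from frame P1 to frame B3, and M4 to the temporal distance from frame B3 to frame P4, then MVf4 and MVb4 are motion vectors parallel to MV4. The values of (N4, M4, D4) are set per frame at coding time and are described, as related information of frame B3, in the bitstream or in information attached to the bitstream. At decoding time, these values are obtained from the bitstream or from the attached information, and MVf4 and MVb4 are calculated with Equations 12 and 13.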
【００９６】
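Next, as in FIG. 10(b), consider the case where block b was processed using motion vector D and motion vector D refers to field P12. Block a is motion-compensated from the forward reference frame P1 (the frame containing the field pointed to by motion vector D) and the backward reference frame P4 (the frame containing the field to which block b belongs), using motion vectors calculated from reference motion vector D by a predetermined method. Let the motion vectors used to code block a be motion vector E for frame P1 and motion vector F for frame P4. With MV5 the magnitude of motion vector D, MVf5 the magnitude of motion vector E, and MVb5 the magnitude of motion vector F, MVf5 and MVb5 are given by Equations 14 and 15.
(Equation 14)  MVf5 = N5 × MV5 / D5
(Equation 15)  MVb5 = −M5 × MV5 / D5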
【００９７】
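The values of the scaling coefficients (N5, M5, D5) can be set, as one example, from the temporal distances between pictures: if D5 is set to the temporal distance from field P12 to field P41, N5 to the temporal distance from frame P1 to frame B3, and M5 to the temporal distance from frame B3 to frame P4, then MVf5 and MVb5 are motion vectors parallel to MV5. The values of (N5, M5, D5) are set per picture at coding time and are described, as related information or the like, in the bitstream or in information attached to the bitstream. At decoding time, these values are obtained from the bitstream or from the attached information, and MVf5 and MVb5 are calculated with Equations 14 and 15.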
【００９８】
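As a way of describing the scaling coefficients (N4, M4, D4) and (N5, M5, D5) in the bitstream or in information attached to the bitstream, instead of describing both sets as above, only one set may be described and the other set derived from the described one. For example, when the scaling coefficients are determined from temporal distances as above, the following relationships hold:
(Equation 16)  N5 = N4
(Equation 17)  M5 = M4
(Equation 18)  D5 = D4 − 1
The scaling coefficients (N5, M5, D5) can therefore be derived from the scaling coefficients (N4, M4, D4).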
【００９９】
The first embodiment described, in addition to the reference relationships of FIG. 7, the use of multiple reference frames as shown in FIG. 11. The same is possible in the second embodiment.
【０１００】
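The first embodiment also described the moving picture coding method and moving picture decoding method that use the motion vector calculation method explained with FIG. 7. The coding and decoding methods that use the motion vector calculation method explained with FIG. 10 are the same as those described in the first embodiment, so their description is omitted.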
【０１０１】
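By using the motion vector calculation method, moving picture coding method and moving picture decoding method of the present invention in this way, the optimal scaling coefficients can be used according to the field referred to in direct mode, which improves coding efficiency. The overhead of describing the scaling coefficients is almost negligible. Furthermore, when only some scaling coefficients are described as related information and the others are calculated from them, the code amount of the related information for the scaling coefficients is exactly the same as in the conventional example.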
【０１０２】
In this embodiment the scaling coefficients are set from temporal distances, but they may be determined by another method.
【０１０３】
In this embodiment each frame is coded and decoded adaptively using either the frame structure or the field structure, but the same processing can be applied, with the same effect, when the frame or field structure is chosen adaptively block by block.
【０１０４】
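In this embodiment a P picture is processed with reference to pictures in one direction (forward) and a B picture with reference to pictures in two directions (forward and backward). The same effect is obtained if a P picture is processed with reference to pictures in one direction (backward), or if a B picture is processed with reference to two forward pictures or two backward pictures.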
【０１０５】
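Next, the determination of the scaling coefficients in the coding control unit 110 is described.
When the two motion vectors of the current block are generated from the reference motion vector using the scaling coefficients as above, a division by a scaling coefficient is required. For this reason, the scaling coefficients may also be determined by the following methods.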
【０１０６】
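(Method 1)
With the scaling coefficients N1, M1, D1, N2, M2 and D2 in Equations 3 to 6, generating the two motion vectors of the current block from the reference motion vector requires the divisions N1/D1, M1/D1, N2/D2 and M2/D2. Each ratio is approximated so that its denominator becomes a power of two. When N1/D1 is 4/6 as in FIG. 7(a), for example, fixing the denominator at 16 gives a numerator of 10.66... The fractional part of the numerator is then discarded to obtain 10/16, that is, 10/2^4 (where ^ denotes exponentiation). With a power-of-two denominator, the division can be performed by a shift operation. For example, 10 in binary is "1010", and shifting it by four places gives ".1010". The calculation of 10/16 is performed in this way.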
【０１０７】
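N2/D2, which has a different denominator, is also brought to the same power-of-two denominator as N1/D1. When N2/D2 is 3/5 as in FIG. 7(b), for example, fixing the denominator at 16 gives a numerator of 9.6. The fractional part of the numerator is then discarded to obtain 9/16, that is, 9/2^4.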
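Method 1 can be illustrated with integer arithmetic only. The function names `approx_numerator` and `scale_component` are hypothetical, and applying the shift to a single integer vector component is an assumption about how the approximated ratio would be used.

```python
def approx_numerator(num, den, shift=4):
    """Numerator k such that k / 2**shift approximates num / den (truncated)."""
    return (num << shift) // den

def scale_component(mv, k, shift=4):
    """Multiply by k / 2**shift using a shift instead of a division."""
    return (mv * k) >> shift

k1 = approx_numerator(4, 6)   # N1/D1 = 4/6 with denominator 16
k2 = approx_numerator(3, 5)   # N2/D2 = 3/5 with denominator 16
scaled = scale_component(32, k1)
```

Note that Python's `>>` rounds toward negative infinity for negative values; a hardware implementation may round differently.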
【０１０８】
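(Method 2)
N1/D1 is obtained as above, giving 10/2^4. For N2/D2, the denominator is set to the same power of two as for N1/D1, and the numerator is obtained by subtracting a predetermined value (here, 1) from 10, giving 9/2^4.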
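Method 2 replaces the second approximation with a fixed offset, which can be sketched as follows. The function name `method2_numerators` and the parameter `step` (the predetermined value, 1 in the text's example) are hypothetical.

```python
def method2_numerators(n1, d1, shift=4, step=1):
    """Approximate N1/D1, then derive the second numerator by subtraction."""
    k1 = (n1 << shift) // d1   # as in Method 1, e.g. 4/6 -> 10/16
    k2 = k1 - step             # second numerator shares the denominator
    return k1, k2

k1, k2 = method2_numerators(4, 6)
```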
【０１０９】
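(Method 3)
N1/D1 is obtained as above. For N2/D2, on the other hand, the denominator is not forced to the same power of two as for N1/D1; instead, it is changed to the power-of-two denominator that approximates the original N2/D2 as closely as possible.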
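One way to read Method 3 is as a search over power-of-two denominators. The text gives no search procedure, so the loop below, the bound `max_shift` and the function name `best_pow2_approx` are assumptions; numerators are truncated as in Method 1, and ties go to the smaller denominator.

```python
from fractions import Fraction

def best_pow2_approx(num, den, max_shift=8):
    """Return (k, shift) so that k / 2**shift is closest to num / den."""
    target = Fraction(num, den)
    best = None
    for shift in range(max_shift + 1):
        k = (num << shift) // den                 # truncated numerator
        err = target - Fraction(k, 1 << shift)    # never negative after truncation
        if best is None or err < best[0]:
            best = (err, k, shift)
    return best[1], best[2]

k, shift = best_pow2_approx(3, 5, max_shift=6)
```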
【０１１０】
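Because each scaling coefficient ratio is approximated so that its denominator is a power of two, the division by a scaling coefficient can be performed by a shift operation even on, for example, an image processing DSP (signal processing LSI) that has no division function.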
【０１１１】
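(Embodiment 2)
Furthermore, by recording a program that implements the picture coding method or picture decoding method of the above embodiments on a storage medium such as a flexible disk, the processing described in the above embodiments can easily be carried out on an independent computer system.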
【０１１２】
FIG. 14 illustrates the case where the picture coding method or picture decoding method of Embodiment 1 is carried out by a computer system using a flexible disk that stores it.
【０１１３】
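FIG. 14(b) shows the external appearance of the flexible disk seen from the front, its cross-sectional structure, and the flexible disk itself, and FIG. 14(a) shows an example of the physical format of the flexible disk, which is the recording medium body. The flexible disk FD is housed in a case F. On the surface of the disk, a plurality of tracks Tr are formed concentrically from the outer circumference toward the inner circumference, and each track is divided in the angular direction into 16 sectors Se. In a flexible disk storing the above program, the picture coding method as the program is therefore recorded in an area allocated on the flexible disk FD.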
【０１１４】
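FIG. 14(c) shows the configuration for recording the program on and reproducing it from the flexible disk FD. To record the program on the flexible disk FD, the picture coding method or picture decoding method as the program is written from the computer system Cs via a flexible disk drive. To build the picture coding method in the computer system from the program on the flexible disk, the program is read from the flexible disk by the flexible disk drive and transferred to the computer system.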
【０１１５】
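The above description uses a flexible disk as the recording medium, but an optical disk can be used in the same way. The recording medium is not limited to these; anything that can record a program, such as an IC card or a ROM cassette, can be used in the same way.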
【０１１６】
FIGS. 15 to 18 illustrate devices that perform the coding or decoding processing described in the above embodiment, and a system that uses these devices.
【０１１７】
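FIG. 15 is a block diagram showing the overall configuration of a content supply system ex100 that realizes a content distribution service. The area in which the communication service is provided is divided into cells of a desired size, and base stations ex107 to ex110, which are fixed radio stations, are installed in the respective cells. In this content supply system ex100, for example, a computer ex111, a PDA (personal digital assistant) ex112, a camera ex113 and a mobile phone ex114 are connected to the Internet ex101 via an Internet service provider ex102 and a telephone network ex104. The content supply system ex100 is not limited to the combination shown in FIG. 15, however, and any of these may be combined and connected. The devices may also be connected directly to the telephone network ex104 without going through the base stations ex107 to ex110.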
【０１１８】
The camera ex113 is a device capable of shooting moving pictures, such as a digital video camera. The mobile phone may be a mobile phone of the PDC (Personal Digital Communications), CDMA (Code Division Multiple Access), W-CDMA (Wideband-Code Division Multiple Access) or GSM (Global System for Mobile Communications) type, or a PHS (Personal Handyphone System) terminal; any of these may be used.
【０１１９】
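The streaming server ex103 is connected from the camera ex113 through the base station ex109 and the telephone network ex104, which enables live distribution and the like based on coded data that the user sends with the camera ex113. The captured data may be coded by the camera ex113 or by a server or the like that transmits the data. Moving picture data shot with a camera ex116 may also be sent to the streaming server ex103 via the computer ex111. The camera ex116 is a device such as a digital camera that can shoot still and moving pictures. In this case the moving picture data may be coded by either the camera ex116 or the computer ex111. The coding is performed in an LSI ex117 included in the computer ex111 or the camera ex116. Software for picture coding and decoding may be stored on any storage medium (CD-ROM, flexible disk, hard disk and so on) that is readable by the computer ex111 or the like. Moving picture data may also be sent by a camera-equipped mobile phone ex115. This moving picture data is data coded by an LSI included in the mobile phone ex115.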
【０１２０】
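FIG. 16 shows an example of the mobile phone ex115. The mobile phone ex115 has an antenna ex201 for sending and receiving radio waves to and from the base station ex110; a camera unit ex203, such as a CCD camera, capable of shooting video and still pictures; a display unit ex202, such as a liquid crystal display, that displays decoded data such as video shot by the camera unit ex203 and video received by the antenna ex201; a main body ex204 consisting of a set of operation keys; an audio output unit ex208, such as a speaker, for outputting audio; an audio input unit ex205, such as a microphone, for inputting audio; a storage medium ex207 for storing coded or decoded data such as shot moving or still picture data, received e-mail data, and moving or still picture data; and a slot ex206 that allows the storage medium ex207 to be attached to the mobile phone ex115. The storage medium ex207, such as an SD card, houses in a plastic case a flash memory element, a kind of EEPROM (Electrically Erasable and Programmable Read Only Memory), which is an electrically rewritable and erasable nonvolatile memory.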
【０１２１】
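In this content supply system ex100, content that a user is shooting with the camera ex113, the camera ex116 or the like (for example, video of a live music performance) is coded as in the above embodiment and sent to the streaming server ex103, and the streaming server ex103 streams the content data to clients that request it. The clients include the computer ex111, the PDA ex112, the camera ex113 and the mobile phone ex114, which can decode the coded data. In this way the content supply system ex100 lets clients receive and reproduce coded data, and by letting clients receive, decode and reproduce the data in real time it also makes personal broadcasting possible.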
【０１２２】
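The mobile phone ex115 is further described with reference to FIG. 17. In the mobile phone ex115, a power supply circuit unit ex310, an operation input control unit ex304, a picture coding unit ex312, a camera interface unit ex303, an LCD (Liquid Crystal Display) control unit ex302, a picture decoding unit ex309, a multiplexing/demultiplexing unit ex308, a recording/reproducing unit ex307, a modem circuit unit ex306 and an audio processing unit ex305 are connected to one another via a synchronous bus ex313 and to a main control unit ex311, which controls each part of the display unit ex202 and the main body ex204 as a whole. When the call-end and power key is turned on by the user, the power supply circuit unit ex310 supplies power from a battery pack to each unit and thereby starts the camera-equipped digital mobile phone ex115 into an operable state. Under the control of the main control unit ex311, which consists of a CPU, ROM, RAM and the like, the mobile phone ex115 in voice call mode converts the audio signal collected by the audio input unit ex205 into digital audio data in the audio processing unit ex305, applies spread spectrum processing to it in the modem circuit unit ex306, applies digital-to-analog conversion and frequency conversion in a transmitting/receiving circuit unit ex301, and then transmits it via the antenna ex201. In voice call mode the mobile phone ex115 also amplifies the data received by the antenna ex201, applies frequency conversion and analog-to-digital conversion, applies inverse spread spectrum processing in the modem circuit unit ex306, converts the result into analog audio data in the audio processing unit ex305, and outputs it via the audio output unit ex208. When an e-mail is sent in data communication mode, the text data of the e-mail entered with the operation keys of the main body ex204 is sent to the main control unit ex311 via the operation input control unit ex304. The main control unit ex311 has the text data spread-spectrum processed in the modem circuit unit ex306 and digital-to-analog converted and frequency converted in the transmitting/receiving circuit unit ex301, and then transmits it to the base station ex110 via the antenna ex201.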
【０１２３】
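When picture data is sent in data communication mode, the picture data captured by the camera unit ex203 is supplied to the picture coding unit ex312 via the camera interface unit ex303. When picture data is not sent, the picture data captured by the camera unit ex203 can also be displayed directly on the display unit ex202 via the camera interface unit ex303 and the LCD control unit ex302.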
【０１２４】
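The picture coding unit ex312 compresses and codes the picture data supplied from the camera unit ex203 using the coding method described in the above embodiment, converts it into coded picture data, and sends it to the multiplexing/demultiplexing unit ex308. At the same time, the mobile phone ex115 sends the audio collected by the audio input unit ex205 during shooting by the camera unit ex203 to the multiplexing/demultiplexing unit ex308 as digital audio data via the audio processing unit ex305.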
【０１２５】
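The multiplexing/demultiplexing unit ex308 multiplexes the coded picture data supplied from the picture coding unit ex312 and the audio data supplied from the audio processing unit ex305 by a predetermined method. The resulting multiplexed data is spread-spectrum processed in the modem circuit unit ex306, digital-to-analog converted and frequency converted in the transmitting/receiving circuit unit ex301, and then transmitted via the antenna ex201.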
【０１２６】
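When data of a moving picture file linked to a web page or the like is received in data communication mode, the data received from the base station ex110 via the antenna ex201 is inverse spread-spectrum processed in the modem circuit unit ex306, and the resulting multiplexed data is sent to the multiplexing/demultiplexing unit ex308.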
【０１２７】
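To decode the multiplexed data received via the antenna ex201, the multiplexing/demultiplexing unit ex308 separates the multiplexed data into coded picture data and audio data, and supplies the coded picture data to the picture decoding unit ex309 and the audio data to the audio processing unit ex305 via the synchronous bus ex313.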
【０１２８】
次に、画像復号化部ｅｘ３０９は、符号化画像データを上記実施の形態で示した符号化方法に対応した復号化方法で復号することにより再生動画像データを生成し、これをＬＣＤ制御部ｅｘ３０２を介して表示部ｅｘ２０２に供給し、これにより、例えばホームページにリンクされた動画像ファイルに含まれる動画データが表示される。このとき同時に音声処理部ｅｘ３０５は、音声データをアナログ音声データに変換した後、これを音声出力部ｅｘ２０８に供給し、これにより、例えばホームページにリンクされた動画像ファイルに含まる音声データが再生される。
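The invention is not limited to the above system example. Digital broadcasting by satellite and terrestrial waves has recently attracted attention, and as shown in FIG. 18, at least either the encoding method or the decoding method of the above embodiment can be incorporated into a digital broadcasting system. Specifically, at a broadcasting station ex409, an encoded bit stream of video information is transmitted by radio waves to a communication or broadcasting satellite ex410. The broadcasting satellite ex410 that receives it emits broadcast radio waves, which are received by a home antenna ex406 equipped for satellite broadcast reception, and a device such as a television (receiver) ex401 or a set-top box (STB) ex407 decodes the encoded bit stream and reproduces it. The image decoding apparatus described in the above embodiment can also be implemented in a reproducing apparatus ex403 that reads and decodes an encoded bit stream recorded on a storage medium ex402 such as a CD or DVD. In this case, the reproduced video signal is displayed on a monitor ex404. Another conceivable configuration is to mount the image decoding apparatus in the set-top box ex407 connected to a cable ex405 for cable television or to the antenna ex406 for satellite/terrestrial broadcasting, and to reproduce the output on a television monitor ex408. The image decoding apparatus may be incorporated in the television instead of the set-top box. A car ex412 having an antenna ex411 can also receive a signal from the satellite ex410, the base station ex107, or the like, and reproduce a moving picture on a display device such as a car navigation system ex413 mounted in the car ex412.
Furthermore, an image signal can be encoded by the image encoding apparatus described in the above embodiment and recorded on a recording medium. Specific examples include a recorder ex420, such as a DVD recorder that records image signals on a DVD disc ex421 or a disk recorder that records them on a hard disk. The image signal can also be recorded on an SD card ex422.
If the recorder ex420 includes the image decoding apparatus described in the above embodiment, the image signal recorded on the DVD disc ex421 or the SD card ex422 can be reproduced and displayed on the monitor ex408.
The car navigation system ex413 may, for example, have the configuration of the mobile phone ex115 shown in FIG. 17 with the camera unit ex203, the camera interface unit ex303, and the image encoding unit ex312 removed. The same applies to the computer ex111, the television (receiver) ex401, and the like.
For a terminal such as the mobile phone ex114, three implementation forms are conceivable: a transmitting/receiving terminal having both an encoder and a decoder, a transmitting terminal having only an encoder, and a receiving terminal having only a decoder.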
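[0129]
Thus, by implementing the encoding method and the decoding method described in this specification, any of the apparatuses and systems described in this embodiment can be realized.
[0130]
[Effects of the Invention]
As is clear from the above description, the moving picture encoding method according to the present invention is a method for encoding a moving picture composed of a sequence of pictures and outputting the resulting code string, and is characterized by including: a motion vector calculating step of calculating a motion vector for each block constituting a picture; a motion vector predicting step of predicting and generating a motion vector of a block to be processed by performing scaling processing using a coefficient, with the calculated motion vector as a reference motion vector; and a coefficient outputting step of outputting the coefficient together with the code string.
[0131]
As a result, the code string is generated and output with the scaling coefficients for use in the direct mode included in the picture-related information or the like, so the scaling coefficients can be extracted from the related information at decoding time.
[0132]
The moving picture decoding method according to the present invention is a method for decoding the code string output by the moving picture encoding method according to claim 1, and is characterized by including: a motion vector calculating step of extracting the coefficient from the code string and calculating the motion vector of the block to be processed by performing scaling processing using the extracted coefficient; and a decoding step of decoding the block to be processed using the calculated motion vector.
[0133]
As a result, the two motion vectors of the block to be processed can be generated from the reference motion vector using the scaling coefficients extracted from the code string. The scaling coefficients therefore need not be derived at decoding time from, for example, the temporal distances between fields, which reduces the processing load and allows efficient processing.
[0134]
Moreover, by using the moving picture encoding method and the moving picture decoding method of the present invention, the optimum scaling coefficients can be used according to the field referred to in the direct mode, which improves coding efficiency. The overhead of describing the scaling coefficients is almost negligible. Furthermore, if only some of the scaling coefficients are described as related information and the other scaling coefficients are calculated from the described ones, the code amount of the related information for describing the scaling coefficients is exactly the same as in the conventional example.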
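As an illustration of describing only some of the scaling coefficients, the following Python sketch derives the coefficient set for the second reference field from the set written for the first reference field. The relationship used here (the second field lies one field period after the first, so N and D each shrink by one while M is unchanged) is an assumption chosen for the example, as is the function name; the source states only that a predetermined relationship may exist.

```python
def derive_second_field_set(first_field_set):
    """Derive (N, M, D) for the second reference field from the first-field set.

    Assumed relationship (not stated in the source): the second field is one
    field period later than the first, so N and D shrink by one and M is unchanged.
    """
    n, m, d = first_field_set
    return (n - 1, m, d - 1)


# Only the first-field set would be written in the related information;
# the decoder recomputes the second-field set from it.
written_set = (4, 2, 6)
derived_set = derive_second_field_set(written_set)
print(written_set, derived_set)
```

Under this assumption the related information carries one coefficient set, the same amount as when a single set is used per picture.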
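[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of an embodiment of the moving picture encoding apparatus according to the present invention.
[FIG. 2] An explanatory diagram showing the order of pictures in the frame memory: (a) the input order and (b) the rearranged order.
[FIG. 3] An explanatory diagram showing motion vectors in the direct mode.
[FIG. 4] A conceptual diagram of the coded image signal format produced by the moving picture encoding apparatus.
[FIG. 5] A block diagram showing the configuration of an embodiment of the moving picture decoding apparatus according to the present invention.
[FIG. 6] An explanatory diagram showing the order of pictures: (a) the order of pictures in the input code string and (b) the order of pictures output as the output image.
[FIG. 7] A schematic diagram for explaining the embodiment of the present invention.
[FIG. 8] A schematic diagram for explaining the embodiment of the present invention.
[FIG. 9] A schematic diagram for explaining the embodiment of the present invention.
[FIG. 10] A schematic diagram for explaining the embodiment of the present invention.
[FIG. 11] A schematic diagram for explaining the embodiment of the present invention.
[FIG. 12] A flowchart for explaining the embodiment of the present invention.
[FIG. 13] A flowchart for explaining the embodiment of the present invention.
[FIG. 14] An explanatory diagram of a recording medium that stores a program for realizing the moving picture encoding method and the moving picture decoding method of Embodiment 1 by a computer system: (a) an example of the physical format of a flexible disk, which is the recording medium body; (b) the external appearance of the flexible disk seen from the front, its cross-sectional structure, and the flexible disk itself; (c) a configuration for recording and reproducing the program on the flexible disk FD.
[FIG. 15] A block diagram showing the overall configuration of a content supply system that realizes a content distribution service.
[FIG. 16] A diagram showing an example of a mobile phone.
[FIG. 17] A block diagram showing the internal configuration of the mobile phone.
[FIG. 18] A block diagram showing the overall configuration of a digital broadcasting system.
[FIG. 19] A schematic diagram for explaining a conventional example.
[FIG. 20] A schematic diagram for explaining a conventional example.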
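[Explanation of Reference Numerals]
101, 107, 707 Frame memory
102 Difference calculation unit
103 Prediction error encoding unit
104 Code string generation unit
105 Prediction error decoding unit
106 Addition operation unit
108 Motion vector detection unit
109 Mode selection unit
110 Encoding control unit
111 to 115, 709, 710 Switch
116 Motion vector storage unit
701 Code string analysis unit
702 Prediction error decoding unit
703 Mode decoding unit
705 Motion compensation decoding unit
706 Motion vector storage unit
708 Addition operation unit
P1, B2, B3, B4 Frame
P11, P12, B21, B21, B31, B32, B41, B42 Field
[0001]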
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a moving picture coding method and a moving picture decoding method that perform encoding and decoding using inter-frame predictive coding, to a moving picture coding apparatus and a moving picture decoding apparatus, and to a program for implementing them in software and a recording medium storing the program.
[0002]
[Prior art]
In video coding, the amount of information is generally compressed using the spatial and temporal redundancy of a video. Here, as a method of utilizing the redundancy in the time direction, inter-frame prediction coding is used. In the inter-frame predictive coding, when a certain frame is coded, a frame located forward or backward in display time order is set as a reference frame. Then, the amount of motion from the reference frame is detected, and the information amount is compressed by removing the redundancy in the spatial direction from the difference value between the frame on which motion compensation has been performed and the frame to be encoded.
[0003]
In moving picture coding systems such as MPEG-1, MPEG-2, MPEG-4, and H.263, a frame that is not subjected to inter-picture predictive coding, that is, a frame subjected to intra-picture coding, is called an I picture. Here, a picture means a single coding unit that covers both frames and fields. A picture that is inter-picture predictively coded with reference to a picture located forward in display time order is called a P picture, and a picture that is inter-picture predictively coded with reference to already processed pictures located forward and backward in display time order is called a B picture. FIG. 19 shows the prediction relationship between frames in the above moving picture coding systems. In FIG. 19, each vertical line represents one frame, and the frame type (I, P, or B) is shown at the lower right of each frame. An arrow in FIG. 19 indicates that the frame at the end of the arrow is inter-frame predictively coded using the frame at the start of the arrow as a reference frame. For example, the second frame from the top, a B frame, is coded using the first frame (an I frame) and the fourth frame (a P frame) as reference images.
[0004]
In the MPEG-4 system, an encoding mode called a direct mode can be selected in encoding a B picture (for example, see Non-Patent Document 1). A conventional inter-frame prediction method in the direct mode will be described with reference to FIG. Now, assume that the block a of the frame B3 is encoded in the direct mode. In this case, the motion vector A of the block b located at the same position as the block a in the frame P4 which is the backward reference frame of the frame B3 is used.
The motion vector A is a motion vector used when the block b is encoded or decoded, and refers to the frame P1. The block a performs motion compensation from the frames P1 and P4, which are reference frames, using a motion vector calculated by a predetermined method from the motion vector A. In this case, the motion vector used to encode the block a is a motion vector B for the frame P1 and a motion vector C for the frame P4. At this time, if the magnitude of the motion vector A is MV, the magnitude of the motion vector B is MVf, and the magnitude of the motion vector C is MVb, MVf and MVb are obtained by Equations 1 and 2, respectively.
(Equation 1) MVf = N × MV / D
(Equation 2) MVb = −M × MV / D
[0005]
Here, N, M, and D are a set of values determined for each frame, and these values are hereinafter referred to as scaling coefficients. The values of the scaling coefficients may be determined at the time of encoding; as an example, they can be set using the temporal distances between frames. If D is set to the temporal distance between the frames P1 and P4, N to the temporal distance between the frames P1 and B3, and M to the temporal distance between the frames B3 and P4, then MVf and MVb are motion vectors parallel to MV.
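Equations 1 and 2 can be illustrated with a short Python sketch. The function name, the tuple representation of a vector, and the sample vector (6, -3) are assumptions made for this example; the coefficient values follow the temporal-distance setting described above, taking the frame numbers P1, B3, and P4 as display times.

```python
def direct_mode_vectors(mv, n, m, d):
    """Scale the reference vector MV into forward (MVf) and backward (MVb) vectors.

    Applies Equation 1 (MVf = N x MV / D) and Equation 2 (MVb = -M x MV / D)
    to each component of the vector.
    """
    mvx, mvy = mv
    mvf = (n * mvx / d, n * mvy / d)
    mvb = (-m * mvx / d, -m * mvy / d)
    return mvf, mvb


# Temporal distances used as scaling coefficients (assumed unit: one frame):
# D = P4 - P1 = 3, N = B3 - P1 = 2, M = P4 - B3 = 1.
mvf, mvb = direct_mode_vectors((6, -3), n=2, m=1, d=3)
print(mvf, mvb)
```

With these values MVf points in the same direction as MV and MVb in the opposite direction, so both remain parallel to MV.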
[0006]
[Non-patent document 1]
INTERNATIONAL STANDARD ISO / IEC14496-2: 1999 / amd. 1: 2000 (E)
[0007]
[Patent Document 1]
JP-A-11-75191
[0008]
[Problems to be solved by the invention]
Consider the case where the above conventional method is applied to an interlaced image. Here, an interlaced image is an image in which one frame is composed of two fields. In encoding and decoding an interlaced image, one frame can be processed as a frame, processed as two fields, or processed with a frame structure or a field structure selected for each block in the frame.
When interlaced video is processed on a field basis, even when processing blocks in the same field, the use of the first field or the second field as a reference picture in the direct mode differs depending on the block. This is because, for example, the forward reference field in the direct mode is a field referred to by the backward reference field. As described above, since the forward reference field in the direct mode differs depending on the block, the optimum value of the scaling coefficient for the motion vector also differs. Therefore, if only one set of scaling coefficients is determined for each frame or field to be encoded or decoded, there is a problem that the encoding efficiency is reduced.
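The following Python sketch shows why a single coefficient set per field is not enough. It assumes that fields are spaced one field period apart and uses hypothetical display times for fields named P11 through P42; neither the timing table nor the function name comes from the source. Two blocks of the same field B31 obtain different temporal distances depending on which field their reference motion vector refers to.

```python
# Hypothetical display times, in field periods, for the fields of four frames.
FIELD_TIME = {"P11": 0, "P12": 1, "B21": 2, "B22": 3,
              "B31": 4, "B32": 5, "P41": 6, "P42": 7}


def scaling_coefficients(current, backward_ref, forward_ref):
    """Return (N, M, D) as temporal distances between the three fields."""
    d = FIELD_TIME[backward_ref] - FIELD_TIME[forward_ref]
    n = FIELD_TIME[current] - FIELD_TIME[forward_ref]
    m = FIELD_TIME[backward_ref] - FIELD_TIME[current]
    return n, m, d


# Two blocks of field B31 whose co-located blocks in P41 refer to different fields.
coeffs_first = scaling_coefficients("B31", "P41", "P11")
coeffs_second = scaling_coefficients("B31", "P41", "P12")
print(coeffs_first, coeffs_second)
```

Because the two sets differ, applying one fixed set to every block of B31 would scale some reference motion vectors by the wrong ratio.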
[0009]
The present invention has been made in view of the above problems, and its object is to provide a motion vector calculation method, a moving picture encoding method, and a moving picture decoding method that, in the direct mode for interlaced video, can greatly improve coding efficiency over the conventional method with very small overhead.
[0010]
[Means for Solving the Problems]
In order to achieve the above object, a moving picture coding method according to the present invention is a method for coding a moving picture composed of a picture sequence and outputting the obtained code string, and includes: a motion vector calculating step of calculating a motion vector for each block constituting a picture; a motion vector predicting step of predicting and generating a motion vector of a processing target block by performing scaling processing using a coefficient, with the calculated motion vector as a reference motion vector; and a coefficient outputting step of outputting the coefficient together with the code string.
[0011]
A moving picture decoding method according to the present invention is a method for decoding a code string output by the moving picture coding method according to claim 1, and includes: a motion vector calculating step of extracting the coefficient from the code string and calculating a motion vector of the processing target block by performing scaling processing using the extracted coefficient; and a decoding step of decoding the processing target block using the calculated motion vector.
[0012]
Further, a moving picture coding apparatus according to the present invention is an apparatus that codes a moving picture composed of a picture sequence and outputs the obtained code string, and includes: motion vector calculating means for calculating a motion vector for each block constituting a picture; motion vector predicting means for predicting and generating a motion vector of a processing target block by performing scaling processing using a coefficient, with the calculated motion vector as a reference motion vector; and coefficient output means for outputting the coefficient together with the code string.
[0013]
Also, a moving picture decoding apparatus according to the present invention is an apparatus for decoding a code string output by the moving picture coding apparatus according to claim 9, and includes: motion vector calculating means for extracting the coefficient from the code string and calculating a motion vector of the processing target block by performing scaling processing using the extracted coefficient; and decoding means for decoding the processing target block using the calculated motion vector.
[0014]
Further, the present invention can also be realized as a program that causes a computer to execute the steps of the moving picture coding method and the moving picture decoding method, and the program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as a communication network.
[0015]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described with reference to the drawings.
[0016]
(Embodiment 1)
FIG. 1 is a block diagram showing a configuration of an embodiment of a moving picture coding apparatus using a moving picture coding method according to the present invention.
[0017]
The moving picture coding apparatus includes a frame memory 101, a difference calculation unit 102, a prediction error coding unit 103, a code string generation unit 104, a prediction error decoding unit 105, an addition calculation unit 106, a frame memory 107, a motion vector detection unit 108, a mode selection unit 109, an encoding control unit 110, switches 111 to 115, and a motion vector storage unit 116.
[0018]
The frame memory 101 stores moving images input in units of pictures in order of display time. The encoding control unit 110 rearranges the pictures stored in the frame memory 101 in the order in which the encoding is performed. Further, the encoding control unit 110 controls the operation of storing the motion vector in the motion vector storage unit 116. Further, encoding control section 110 determines a scaling coefficient, which will be described in detail later, and outputs the result to code sequence generation section 104 and mode selection section 109.
[0019]
Using decoded image data of already encoded pictures as a reference picture, the motion vector detection unit 108 detects a motion vector indicating the position predicted to be optimal within a search area in that picture. The mode selection unit 109 determines a macroblock encoding mode using the motion vector detected by the motion vector detection unit 108. The difference calculation unit 102 calculates a difference between the image data read from the frame memory 101 and the reference image data input from the mode selection unit 109, and generates prediction error image data.
[0020]
The prediction error encoding unit 103 performs encoding processing such as frequency conversion and quantization on the input prediction error image data to generate encoded data. The code sequence generation unit 104 performs variable length coding and the like on the input encoded data, and adds the motion vector information, the encoding mode information, and other related information input from the mode selection unit 109 to generate a code string.
[0021]
The prediction error decoding unit 105 performs decoding processing such as inverse quantization and inverse frequency transform on the input encoded data, and generates decoded differential image data. The addition operation unit 106 adds the decoded difference image data input from the prediction error decoding unit 105 and the reference image data input from the mode selection unit 109 to generate decoded image data. The frame memory 107 stores the generated decoded image data.
[0022]
Next, the operation of the moving picture coding apparatus configured as described above will be described.
FIG. 2 is an explanatory diagram showing the order of pictures in the frame memory 101: (a) the input order and (b) the rearranged order. Here, each vertical line represents a picture. In the symbol shown at the lower right of each picture, the leading letter indicates the picture type (I, P, or B) and the following number indicates the picture number in display time order. A P picture uses three neighboring I or P pictures located forward in display time order as reference pictures, and a B picture uses three neighboring I or P pictures located forward in display time order and one neighboring I or P picture located backward in display time order as reference pictures.
[0023]
The input image is input to the frame memory 101 in picture units in display time order, for example, as shown in FIG. When a picture is input to the frame memory 101, the encoding control unit 110 determines the picture type (I, P, or B picture) with which the input picture is to be encoded, and controls the switches 113 to 115 according to the determined picture type. The picture type is generally determined by, for example, assigning picture types periodically.
[0024]
After determining the picture type, the encoding control unit 110 rearranges the pictures input to the frame memory 101 into the order in which they are to be encoded, for example, as shown in FIG. The rearrangement into coding order is based on the reference relationships in inter-picture predictive coding: pictures are rearranged so that a picture used as a reference picture is coded before the pictures that refer to it.
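The rearrangement can be sketched in Python as follows. This is a minimal illustration, assuming that pictures are identified by strings such as "P13" whose first letter is the picture type and that each B picture waits only for the next I or P picture in display order; the function name is hypothetical and the actual order in FIG. 2 is not reproduced here.

```python
def to_coding_order(display_order):
    """Reorder pictures so each I/P picture precedes the B pictures that refer to it."""
    coding_order = []
    pending_b = []  # B pictures waiting for their backward reference picture
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)
        else:
            # An I or P picture is coded first, then the B pictures that were waiting for it.
            coding_order.append(picture)
            coding_order.extend(pending_b)
            pending_b = []
    # Trailing B pictures have no backward reference in this list; emit them last.
    coding_order.extend(pending_b)
    return coding_order


order = to_coding_order(["P10", "B11", "B12", "P13", "B14", "B15", "P16"])
print(order)
```

In this sketch the picture P13 is placed before B11 and B12, which matches the description that B11 is encoded after all macroblocks of P13 have been processed.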
[0025]
Each of the pictures rearranged in the frame memory 101 is read, for example, in units of macroblocks divided into groups of 16 × 16 pixels. The motion compensation is performed, for example, in units of blocks divided into groups of horizontal 8 × vertical 8 pixels.
[0026]
The subsequent operation will be described separately for a case where the picture to be encoded is a P picture and a case where it is a B picture.
[0027]
<In case of P picture>
For P pictures, inter-picture predictive coding using forward reference is performed. For example, when the encoding process of the picture P13 is performed in the example shown in FIG. 2A, the reference pictures are the pictures P10, P7, and P4. These reference pictures have already been encoded, and decoded image data is stored in the frame memory 107.
[0028]
When encoding a P picture, the encoding control unit 110 controls each switch so that the switches 113 to 115 are turned on. Thereby, the macroblock of the picture P13 read from the frame memory 101 is input to the motion vector detecting unit 108, the mode selecting unit 109, and the difference calculating unit 102.
[0029]
The motion vector detection unit 108 detects a motion vector for each block in the macroblock using the decoded image data stored in the frame memory 107 as a reference picture. The motion vector detection unit 108 outputs the detected motion vector to the mode selection unit 109.
[0030]
The mode selection unit 109 determines the macroblock encoding mode using the motion vector detected by the motion vector detection unit 108. Here, the encoding mode indicates the method by which a macroblock is encoded. For a P picture, for example, the encoding method can be selected from intra-picture coding, inter-picture predictive coding using a motion vector, and inter-picture predictive coding not using a motion vector (treating the motion as 0 or selecting from the motion vectors of surrounding blocks). In determining the encoding mode, a method that yields a smaller encoding error with a smaller bit amount is generally selected.
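The source does not specify how the encoding error and the bit amount are weighed against each other. One common way to combine them, shown here only as an assumed illustration, is a weighted cost of the form error + lam × bits; the function name, the weight lam, the mode names, and the sample numbers are all hypothetical.

```python
def select_mode(candidates, lam=1.0):
    """Pick the mode with the smallest cost = error + lam * bits.

    candidates maps a mode name to a pair (encoding error, bit amount).
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])


# Hypothetical measurements for one macroblock of a P picture.
candidates = {
    "intra": (100, 400),        # cost 500
    "inter_mv": (120, 150),     # cost 270
    "inter_no_mv": (300, 20),   # cost 320
}
best = select_mode(candidates)
print(best)
```

A larger lam favors modes that spend fewer bits, while a smaller lam favors modes with a smaller encoding error.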
[0031]
The mode selection unit 109 outputs the determined encoding mode to the code sequence generation unit 104. At this time, if the picture being coded is to be used as a reference picture when coding another picture, and the encoding mode determined by the mode selection unit 109 is inter-picture predictive coding, information indicating the motion vector and the reference picture number used in that inter-picture predictive coding is stored in the motion vector storage unit 116. In this case, the mode selection unit 109 also outputs the information indicating the motion vector and the reference picture number to the code sequence generation unit 104.
[0032]
The mode selection unit 109 outputs reference image data based on the determined encoding mode to the difference calculation unit 102 and the addition calculation unit 106. When the mode selection unit 109 selects intra-picture encoding, the reference image data is not output. When intra-picture encoding is selected, the mode selection unit 109 connects the switch 111 to the a side and the switch 112 to the c side; when inter-picture predictive encoding is selected, it connects the switch 111 to the b side and the switch 112 to the d side.
[0033]
Hereinafter, a case where the inter-picture prediction coding is selected by the mode selection unit 109 will be described.
The difference calculation unit 102 calculates the difference between the image data of the macroblock of the picture P13 read from the frame memory 101 and the reference image data input from the mode selection unit 109, generates prediction error image data, and outputs it to the prediction error encoding unit 103. The prediction error encoding unit 103 performs encoding processing such as frequency conversion and quantization on the input prediction error image data to generate encoded data, and outputs the encoded data to the code sequence generation unit 104 and the prediction error decoding unit 105. Here, the frequency conversion and quantization can be performed, for example, in units of horizontal 8 × vertical 8 pixels.
[0034]
The code sequence generation unit 104 performs variable length coding and the like on the input encoded data, adds the motion vector information, the information indicating the reference picture number, the encoding mode information, and other related information input from the mode selection unit 109, and generates and outputs a code string. Details of the reference picture number will be described later.
[0035]
Meanwhile, the prediction error decoding unit 105 performs decoding processing such as inverse quantization and inverse frequency conversion on the input encoded data to generate decoded difference image data, and outputs it to the addition operation unit 106. The addition operation unit 106 adds the decoded difference image data and the reference image data input from the mode selection unit 109 to generate decoded image data, and stores it in the frame memory 107.
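The encoder's local decoding loop described above can be sketched in Python. This is a simplified illustration under stated assumptions: the frequency conversion is omitted, quantization is a uniform division by a hypothetical step size, and a block is a flat list of sample values. The comments map each line to the unit it stands in for.

```python
def quantize(values, step):
    """Uniform quantization (stands in for frequency conversion plus quantization)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse of quantize, up to the rounding loss."""
    return [level * step for level in levels]


def encode_block(source, reference, step):
    residual = [s - r for s, r in zip(source, reference)]   # difference calculation unit 102
    levels = quantize(residual, step)                        # prediction error encoding unit 103
    decoded_residual = dequantize(levels, step)              # prediction error decoding unit 105
    reconstructed = [r + e for r, e in zip(reference, decoded_residual)]  # addition operation unit 106
    return levels, reconstructed


levels, reconstructed = encode_block(
    source=[105, 97, 108, 100], reference=[100, 100, 100, 100], step=4)
print(levels, reconstructed)
```

The reconstructed block differs slightly from the source block because of the quantization loss. Storing the reconstructed data rather than the source lets the encoder predict from the same reference data that a decoder can produce.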
[0036]
Thereafter, by the same processing, the encoding processing is also performed on the remaining macroblocks of the picture P13. Then, in the example shown in FIG. 2A, when the processing is completed for all the macroblocks of the picture P13, the encoding processing of the picture B11 is performed next.
[0037]
<In the case of B picture>
For B pictures, inter-picture predictive coding using bidirectional reference is performed. For example, when the picture B11 is encoded in the example shown in FIG. 2A, the reference pictures located forward in display time order are the pictures P10, P7, and P4, and the reference picture located backward in display time order is the picture P13.
[0038]
Here, a case is considered where a B picture is not used as a reference picture when encoding another picture. Therefore, when encoding a B picture, the encoding control unit 110 controls the switches so that the switch 113 is turned on and the switches 114 to 115 are turned off. Thereby, the macroblock of the picture B11 read from the frame memory 101 is input to the motion vector detection unit 108, the mode selection unit 109, and the difference calculation unit 102.
[0039]
The motion vector detection unit 108 uses the decoded image data of the pictures P10, P7, and P4 stored in the frame memory 107 as forward reference pictures and the decoded image data of the picture P13 as a backward reference picture, and detects a forward motion vector and a backward motion vector for each block in the macroblock. The motion vector detection unit 108 outputs the detected motion vectors to the mode selection unit 109.
[0040]
The mode selection unit 109 determines the macroblock encoding mode using the motion vectors detected by the motion vector detection unit 108. Here, the encoding mode of a B picture can be selected from, for example, intra-picture coding, inter-picture predictive coding using a forward motion vector, inter-picture predictive coding using a backward motion vector, inter-picture predictive coding using bidirectional motion vectors, and the direct mode.
[0041]
Here, a case where encoding is performed in the direct mode will be described.
FIG. 3 is an explanatory diagram showing a motion vector in the direct mode. Here, it is assumed that the encoding target block is the block a of the picture B11.
[0042]
When encoding the block a in the direct mode, the motion vector of the block located at the same position as the block a in the backward reference picture is used. That is, as shown in FIG. 3, the motion vector c of the block b of the picture P13 is used. The motion vector c is a motion vector used when the block b is encoded, and refers to the picture P10. Note that the motion vector c is stored in the motion vector storage unit 116.
[0043]
The mode selection unit 109 generates a motion vector d based on the picture P10 and a motion vector e based on the picture P13 from the motion vector c using the scaling coefficient determined by the encoding control unit 110. In block a, bidirectional prediction is performed from pictures P10 and P13, which are reference pictures, using two motion vectors d and e generated from motion vector c. The selection of the scaling coefficient depending on whether the first field or the second field is used as the reference picture will be described later in detail.
[0044]
The mode selection unit 109 outputs the determined encoding mode to the code sequence generation unit 104. The mode selection unit 109 outputs reference image data based on the determined encoding mode to the difference calculation unit 102 and the addition calculation unit 106. When the mode selection unit 109 selects intra-picture encoding, the reference image data is not output. When intra-picture encoding is selected, the mode selection unit 109 connects the switch 111 to the a side and the switch 112 to the c side; when inter-picture predictive encoding is selected, it connects the switch 111 to the b side and the switch 112 to the d side.
[0045]
Hereinafter, a case where the inter-picture prediction coding is selected by the mode selection unit 109 will be described.
The difference calculation unit 102 calculates the difference between the image data of the macroblock of the picture B11 read from the frame memory 101 and the reference image data input from the mode selection unit 109, generates prediction error image data, and outputs it to the prediction error encoding unit 103.
[0046]
The prediction error encoding unit 103 performs encoding processing such as frequency conversion and quantization on the input prediction error image data to generate encoded data, and outputs it to the code sequence generation unit 104. The code sequence generation unit 104 performs variable length coding and the like on the input encoded data, adds the motion vector information, the encoding mode information, and other related information input from the mode selection unit 109, and generates and outputs a code string. The picture-related information includes the scaling coefficients determined by the encoding control unit 110 for use in the direct mode. For a macroblock encoded in the direct mode, no motion vector information is added to the code string.
[0047]
Thereafter, the remaining macroblocks of the picture B11 are encoded by the same processing. Then, in the example shown in FIG. 2A, when the processing is completed for all the macroblocks of the picture B11, the encoding processing of the picture B12 is performed next.
[0048]
FIG. 6 is a conceptual diagram of the coded image signal format produced by the moving picture coding apparatus. Picture is the coded signal for one frame, Header is the related coded signal included at the head of the frame, Block1 is the coded signal of a block in the direct mode, Block2 is the coded signal of a block using interpolation prediction other than the direct mode, RIdx1 and RIdx2 are relative indexes, and MV1 and MV2 are motion vectors. Here, the Header, which is the related information, is shown as part of the same code string, but the related information may instead be carried in another code string. The Header includes the scaling coefficients determined by the encoding control unit 110 for use in the direct mode. The interpolation prediction block Block2 carries two relative indexes, RIdx1 and RIdx2, in this order in the coded signal to indicate the two reference frames used for interpolation. A relative index is the same as the reference picture number described above. Which of the relative indexes RIdx1 and RIdx2 are used can be determined from PredType: when PredType indicates that pictures are referred to in two directions, both RIdx1 and RIdx2 are used; when it indicates that a picture is referred to in one direction, either RIdx1 or RIdx2 is used; and when it indicates the direct mode, neither RIdx1 nor RIdx2 is used. The relative index RIdx1, which indicates the first reference frame, is called the first relative index, and the relative index RIdx2, which indicates the second reference frame, is called the second relative index. Whether a frame is the first reference frame or the second reference frame is determined by the data position in the coded stream.
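The dependence of the relative indexes on PredType can be illustrated with a small Python lookup. The PredType labels used here ("bidirectional", "single_first", "single_second", "direct") are hypothetical names chosen for the example; the source does not give the actual PredType values or say which index corresponds to which single direction.

```python
def relative_indexes_present(pred_type):
    """Return the names of the relative indexes carried by a block of this type."""
    table = {
        "bidirectional": ("RIdx1", "RIdx2"),  # two reference frames
        "single_first": ("RIdx1",),           # one reference frame, first index
        "single_second": ("RIdx2",),          # one reference frame, second index
        "direct": (),                         # direct mode carries no relative index
    }
    return table[pred_type]


for pred_type in ("bidirectional", "single_first", "single_second", "direct"):
    print(pred_type, relative_indexes_present(pred_type))
```

A direct mode block therefore adds neither relative indexes nor motion vectors to the code string, which is where the saving over explicit bidirectional prediction comes from.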
[0049]
FIG. 4 is a block diagram showing a configuration of an embodiment of a video decoding device using the video decoding method according to the present invention.
[0050]
The video decoding device includes a code sequence analysis unit 701, a prediction error decoding unit 702, a mode decoding unit 703, a motion compensation decoding unit 705, a motion vector storage unit 706, a frame memory 707, an addition operation unit 708, and switches 709 and 710.
[0051]
The code string analysis unit 701 extracts various data such as coding mode information, motion vector information, and scaling coefficients from the input code string. The prediction error decoding unit 702 decodes the input prediction error coded data to generate prediction error image data. The mode decoding unit 703 controls the switches 709 and 710 with reference to the information on the encoding mode extracted from the code string.
[0052]
The motion compensation decoding unit 705 performs a decoding process of the information of the reference picture number and the motion vector, and acquires the motion compensation image data from the frame memory 707 based on the decoded reference picture number and the motion vector. The motion vector storage unit 706 stores a motion vector.
[0053]
The addition operation unit 708 adds the prediction error coded data input from the prediction error decoding unit 702 and the motion compensated image data input from the motion compensation decoding unit 705 to generate decoded image data. The frame memory 707 stores the generated decoded image data.
[0054]
Next, the operation of the moving picture decoding apparatus configured as described above will be described.
FIG. 5 is an explanatory diagram showing the order of pictures: (a) the order of pictures in the input code string and (b) the order of pictures output as the output image. Here, it is assumed that a P picture has been encoded using three neighboring I or P pictures located forward in display time order as reference pictures, and that a B picture has been encoded using three neighboring I or P pictures located forward in display time order and one neighboring I or P picture located backward in display time order as reference pictures.
[0055]
The code sequence is input to the code sequence analysis unit 701 in picture order as shown in FIG. The code string analysis unit 701 extracts various data such as coding mode information, motion vector information, and scaling coefficients from the input code string. The code sequence analysis unit 701 outputs the extracted encoding mode information to the mode decoding unit 703, and outputs the motion vector information and the scaling coefficient to the motion compensation decoding unit 705. Further, the code sequence analysis unit 701 outputs the extracted prediction error encoded data to the prediction error decoding unit 702.
[0056]
The mode decoding unit 703 controls the switches 709 and 710 with reference to the encoding mode information extracted from the code string. When the encoding mode is intra-picture encoding, the switch 709 is connected to the a side and the switch 710 is connected to the c side. When the encoding mode is inter-picture prediction encoding, the switch 709 is connected to the b side and the switch 710 is connected to the d side. The mode decoding unit 703 also outputs the encoding mode information to the motion compensation decoding unit 705.
[0057]
Hereinafter, the case where the encoding mode is the inter-picture prediction encoding will be described.
The prediction error decoding unit 702 decodes the input prediction error encoded data, generates prediction error image data, and outputs the data to the addition operation unit 708.
[0058]
The subsequent operation will be described separately for the case where the picture to be decoded is a P picture and the case where it is a B picture.
[0059]
<In case of P picture>
The motion compensation decoding unit 705, to which the motion vector information is input, decodes the motion vector information. The motion compensation decoding unit 705 then acquires the motion compensation image data (block) from the frame memory 707 based on the decoded reference picture number and motion vector, and outputs the motion compensation image data to the addition operation unit 708.
[0060]
Also, when the picture to be decoded is used as a reference picture when decoding another picture, the motion compensation decoding unit 705 stores the motion vector and the reference picture number in the motion vector storage unit 706. Here, since the P picture is used as a reference picture, the motion vector and the reference picture number obtained when decoding the picture P13 are stored in the motion vector storage unit 706. Note that the storage of the motion vector in the motion vector storage unit 706 is controlled by the related information of the code string.
[0061]
The addition operation unit 708 adds the prediction error image data input from the prediction error decoding unit 702 and the motion compensation image data input from the motion compensation decoding unit 705 to generate decoded image data, and stores it in the frame memory 707.
[0062]
Thereafter, by the same process, the decoding process is also performed on the remaining macroblocks of the picture P13. Then, in the example shown in FIG. 6A, when the processing is completed for all the macroblocks of the picture P13, the decoding processing of the picture B11 is performed next.
[0063]
<In the case of B picture>
Here, a case where the encoding mode extracted by mode decoding section 703 is the direct mode will be described. It is assumed that the block a of the picture B11 shown in FIG. 3 is a block to be decoded.
[0064]
When decoding the block a in the direct mode, the motion vector of the block at the same position as the block a in the backward reference picture is used. That is, as shown in FIG. 3, the motion vector c of the block b of the picture P13 is used. The motion vector c is a motion vector used when the block b is encoded, and refers to the picture P10.
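The lookup of the co-located block's motion vector can be pictured as a small store keyed by picture and block position. The following is only a minimal sketch in Python, not the apparatus itself: the class name `MotionVectorStore`, the block coordinates, and the vector value (8, -4) are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Optional, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) displacement
BlockPos = Tuple[int, int]      # hypothetical block coordinates


class MotionVectorStore:
    """Hypothetical stand-in for the motion vector storage unit 706."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, BlockPos], Tuple[MotionVector, str]] = {}

    def put(self, picture: str, pos: BlockPos, mv: MotionVector, ref_picture: str) -> None:
        # Keep the motion vector together with the picture it refers to.
        self._store[(picture, pos)] = (mv, ref_picture)

    def colocated(self, backward_ref: str, pos: BlockPos) -> Optional[Tuple[MotionVector, str]]:
        # Direct mode reads the vector of the block at the same position
        # in the backward reference picture.
        return self._store.get((backward_ref, pos))


store = MotionVectorStore()
# While decoding picture P13, block b (assumed position (2, 3)) used a vector referring to P10.
store.put("P13", (2, 3), mv=(8, -4), ref_picture="P10")

# Later, block a of picture B11 at the same position is decoded in the direct mode.
reference = store.colocated("P13", (2, 3))
print(reference)
```

Keeping the reference picture alongside the vector matters because the later field-dependent selection of scaling coefficients depends on which picture the reference motion vector points to.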
[0065]
The motion compensation decoding unit 705 generates a motion vector d based on the picture P10 and a motion vector e based on the picture P13 from the motion vector c using the scaling coefficient input from the code sequence analysis unit 701. In block a, bidirectional prediction is performed from pictures P10 and P13, which are reference pictures, using two motion vectors d and e generated from motion vector c. The selection of the scaling coefficient depending on whether the first field or the second field is used as the reference picture will be described later in detail.
[0066]
The motion compensation decoding unit 705 acquires motion compensation image data (block) from the frame memory 707 based on the generated motion vectors, and outputs the data to the addition operation unit 708. The addition operation unit 708 adds the motion compensated image data and the prediction error image data input from the prediction error decoding unit 702, generates decoded image data, and stores the decoded image data in the frame memory 707.
[0067]
Thereafter, by the same processing, the decoding processing is also performed on the remaining macro blocks of the picture B11. Then, in the example shown in FIG. 6A, when the processing is completed for all the macroblocks of the picture B11, the decoding processing of the picture B12 is performed next. The pictures decoded as described above are sequentially output from the frame memory 707 as output images as shown in FIG.
[0068]
As described above, at the time of encoding, the scaling coefficients determined by the encoding control unit 110 for use in the direct mode are included in the related information of the picture, and the code string is generated and output. At the time of decoding, the scaling coefficients are extracted from the related information and used to generate the two motion vectors of the processing target block from the reference motion vector. This eliminates the need to derive the scaling coefficients at decoding time from, for example, the temporal distance between fields, which reduces the processing load and allows efficient processing.
[0069]
Next, the generation of the two motion vectors of the processing target block from the reference motion vector using the scaling coefficients determined by the encoding control unit 110, and the selection of the scaling coefficients according to whether the first field or the second field is used as the reference picture, will be described in detail.
[0070]
Whether each picture is coded in the frame structure or in the field structure is determined adaptively by the coding control unit 110. The choice between the frame structure and the field structure is made, for example, by obtaining the variance of the pixel values in the picture for the frame structure and for the field structure and selecting the structure with the smaller variance. A method of switching between the frame structure and the field structure in units of blocks within each picture is also conceivable. Here, the case where the frame structure or the field structure is switched in units of pictures will be described.
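The text does not specify how the variance is computed, so the following Python sketch shows only one possible reading: the picture is cut into groups of rows, once in frame order and once with the rows regrouped by field, and the structure with the smaller sum of per-group variances is chosen. The function names, the group height of 4 rows, and the test pictures are all assumptions made for illustration.

```python
from statistics import pvariance
from typing import List

Picture = List[List[int]]  # rows of sample values


def block_variance_sum(rows: Picture, block_rows: int) -> float:
    """Sum of per-block variances, taking `block_rows` consecutive rows per block."""
    total = 0.0
    for start in range(0, len(rows), block_rows):
        samples = [p for row in rows[start:start + block_rows] for p in row]
        total += pvariance(samples)
    return total


def choose_structure(picture: Picture, block_rows: int = 4) -> str:
    """Return "frame" or "field", whichever gives the smaller variance sum."""
    frame_cost = block_variance_sum(picture, block_rows)
    # Field structure: first-field rows (even lines) followed by second-field rows (odd lines).
    field_rows = picture[0::2] + picture[1::2]
    field_cost = block_variance_sum(field_rows, block_rows)
    return "field" if field_cost < frame_cost else "frame"


# Alternating lines that differ strongly, as with motion between the two fields.
moving = [[0] * 8 if y % 2 == 0 else [100] * 8 for y in range(8)]
# A smooth vertical gradient, as in a static scene.
static = [[y] * 8 for y in range(8)]

print(choose_structure(moving), choose_structure(static))
```

Under this reading, content with strong line-to-line differences favors the field structure, while smooth content favors the frame structure.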
[0071]
FIG. 7 shows a temporal arrangement of each frame when encoding or decoding a moving image. In FIG. 7, frames P1 and P4 are processed as P pictures, and frames B2 and B3 are processed as B pictures. Also, one frame can be treated as two fields. For example, the frame P1 can be treated as fields P11 and P12, the frame B2 as fields B21 and B22, the frame B3 as fields B31 and B32, and the frame P4 as fields P41 and P42. Further, each frame is adaptively encoded and decoded in either a frame structure or a field structure.
In FIGS. 7 to 11, the encoding and decoding processes are performed in units of the upper symbols among the symbols indicating pictures. For example, in FIG. 7, all pictures are processed on a field basis.
[0072]
It is assumed that the current picture to be processed is the field B31. That is, the frame B3 is processed in a field structure. In the field B31, the field P11 or P12 is used as a forward reference picture, and the field P41 or P42 is used as a backward reference picture. These reference pictures have already been encoded or decoded. It is assumed that the frames P1 and P4 are processed on a field-by-field basis.
[0073]
Now, consider a case where block a of field B31 is processed in the direct mode. In this case, the motion vector of block b is used, where block b is located at the same position as block a in the field P41. The field P41 is the backward reference picture and has the same parity (a value indicating whether a field is the first field or the second field) as the field to which block a belongs, that is, it is a first field. Hereinafter, this motion vector is referred to as a reference motion vector.
[0074]
Here, first, the case where the block b is processed using the motion vector A and the motion vector A refers to the field P11, as shown in FIG. 7A, will be described. In this case, the block a is motion-compensated from the field P11 (the field pointed to by the reference motion vector A), which is the forward reference field, and the field P41 (the field to which the block b belongs), which is the backward reference field, using motion vectors calculated from the reference motion vector A by a predetermined method. The motion vectors used when processing the block a are the motion vector B for the field P11 and the motion vector C for the field P41. At this time, assuming that the magnitude of the motion vector A is MV1, the magnitude of the motion vector B is MVf1, and the magnitude of the motion vector C is MVb1, MVf1 and MVb1 are obtained by Equations 3 and 4, respectively.
(Equation 3) MVf1 = N1 × MV1 / D1
(Equation 4) MVb1 = −M1 × MV1 / D1
[0075]
Hereinafter, the values N1, M1, and D1 are referred to as scaling coefficients. The scaling coefficients are assumed to be values set for each field. They can be set, for example, from the temporal distances between fields: if D1 is set to the temporal distance from the field P11 to P41, N1 to the temporal distance from the field P11 to B31, and M1 to the temporal distance from the field B31 to P41, then MVf1 and MVb1 are motion vectors parallel to MV1. The values of the scaling coefficients are set at the time of encoding and are described as related information or the like in the code string, or as additional information of the code string. At the time of decoding, the scaling coefficients are obtained from the code string or from the information attached to it, and when a block coded in the direct mode is decoded, MVf1 and MVb1 are calculated using Equations 3 and 4.
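As a worked illustration of Equations 3 and 4, the following Python sketch scales a reference motion vector with the coefficients N1 = 4, M1 = 2, D1 = 6, which correspond to the field distances of FIG. 7A (N1/D1 = 4/6). Applying the equations to each vector component, the function name `scale_direct`, and the sample vector (6, -12) are assumptions for illustration only.

```python
from fractions import Fraction
from typing import Tuple


def scale_direct(mv: Tuple[int, int], n: int, m: int, d: int):
    """Apply Equations 3 and 4 to each component of the reference motion vector."""
    forward = tuple(Fraction(n * c, d) for c in mv)    # MVf = N x MV / D
    backward = tuple(Fraction(-m * c, d) for c in mv)  # MVb = -M x MV / D
    return forward, backward


# Field distances: P11 to B31 = 4, B31 to P41 = 2, P11 to P41 = 6.
mvf1, mvb1 = scale_direct((6, -12), n=4, m=2, d=6)
print(mvf1, mvb1)
```

The forward vector keeps the direction of the reference vector and the backward vector points the opposite way, so both stay parallel to it.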
[0076]
Next, the case where the block b is processed using the motion vector D and the motion vector D refers to the field P12, as shown in FIG. 7B, will be described. In this case, the block a is motion-compensated from the field P12 (the field pointed to by the motion vector D), which is the forward reference field, and the field P41 (the field to which the block b belongs), which is the backward reference field, using motion vectors calculated from the reference motion vector D by a predetermined method. The motion vectors used when processing the block a are the motion vector E for the field P12 and the motion vector F for the field P41. At this time, if the magnitude of the motion vector D is MV2, the magnitude of the motion vector E is MVf2, and the magnitude of the motion vector F is MVb2, MVf2 and MVb2 are obtained by Equations 5 and 6, respectively.
(Equation 5) MVf2 = N2 × MV2 / D2
(Equation 6) MVb2 = −M2 × MV2 / D2
[0077]
Here, it is assumed that the scaling coefficients (N2, M2, D2) are values set in picture units. The values of the scaling coefficients (N2, M2, D2) can be set, for example, from the temporal distances between fields: if D2 is set to the temporal distance from the field P12 to P41, N2 to the temporal distance from the field P12 to B31, and M2 to the temporal distance from the field B31 to P41, then MVf2 and MVb2 are motion vectors parallel to MV2. The values of the scaling coefficients are set at the time of encoding and are described as related information or the like in the code string, or as additional information of the code string. At the time of decoding, these values are obtained from the code string or from the information attached to it, and MVf2 and MVb2 are calculated using Equations 5 and 6.
[0078]
Here, as a method of describing the scaling coefficients (N1, M1, D1) and (N2, M2, D2) in the code string or as additional information of the code string, besides describing both sets as described above, only one set may be described and the other set derived from it. For example, when the scaling coefficients are determined from the temporal distances between fields as described above, the following relationships hold between (N1, M1, D1) and (N2, M2, D2):
(Equation 7) N2 = N1 − 1
(Equation 8) M2 = M1
(Equation 9) D2 = D1 − 1
Therefore, the scaling coefficients (N2, M2, D2) can be obtained from the scaling coefficients (N1, M1, D1).
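The derivation of the second set from the first by Equations 7 to 9 can be sketched in Python as follows. The type name `ScalingCoeffs` and the function name are hypothetical; the values (4, 2, 6) are the FIG. 7A field distances.

```python
from typing import NamedTuple


class ScalingCoeffs(NamedTuple):
    n: int  # distance from the forward reference field to the current field
    m: int  # distance from the current field to the backward reference field
    d: int  # distance spanned by the reference motion vector


def second_field_coeffs(first: ScalingCoeffs) -> ScalingCoeffs:
    """Equations 7 to 9: N2 = N1 - 1, M2 = M1, D2 = D1 - 1."""
    return ScalingCoeffs(n=first.n - 1, m=first.m, d=first.d - 1)


coeffs_p11 = ScalingCoeffs(n=4, m=2, d=6)     # set described in the code string
coeffs_p12 = second_field_coeffs(coeffs_p11)  # derived set, not transmitted
print(coeffs_p12)
```

The derived set gives N2/D2 = 3/5, so only one set needs to be carried in the related information when the coefficients follow the field distances.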
[0079]
In the above description, the fields P11 and P12 in FIG. 7 are used as the forward reference pictures and the fields P41 and P42 in FIG. 7 are used as the backward reference pictures; however, the number of reference pictures may be further increased. For example, as shown in FIG. 11, when processing block a of field B61, fields P11, P12, P41, and P42 are used as forward reference pictures, and P71 and P72 are used as backward reference pictures. FIG. 11A shows a case where the reference motion vector A used when processing the block a in the direct mode refers to the first field of the frame P1, and FIG. 11B shows a case where the reference motion vector A refers to the second field of the frame P1. In such a case, when combined with the case shown in FIG. 7, there are four fields that the reference motion vector can refer to, and therefore four sets of scaling coefficients. However, since the scaling coefficients of fields belonging to the same frame can easily be derived from one another, it is not necessary to describe all the scaling coefficients as related information. If the interval between reference pictures (here, P pictures) is constant or can be detected by another method, only one set of scaling coefficients needs to be described, and the other scaling coefficients can be calculated from the described set.
[0080]
In the above description, the case where a block belonging to the field B31 in FIG. 7 is processed in the direct mode has been described. A block belonging to the field B32, which is the second field of the frame B3, can be processed in the same way. This case will be described below.
[0081]
The state of the processing will be described with reference to FIG. When processing the block a belonging to the field B32 in the direct mode, the motion vector of the block b belonging to the picture P42 and located at the same position as the block a becomes the reference motion vector. FIG. 8A shows a case where the reference motion vector refers to the field P11, and FIG. 8B shows a case where the reference motion vector refers to the field P12. Since the outline of the processing in these cases is almost the same as in the above-described case, the description is omitted here. However, the value of the scaling coefficient in this case is generally different from the scaling coefficient used when processing the field B31 in FIG.
[0082]
Here, when processing the second field, it is possible to refer to the first field in the same frame as the reference picture. Therefore, when the block b is processed with reference to the field P41, the processing is different from the above processing. This will be described with reference to FIG.
[0083]
FIG. 9 illustrates a case where the block b is processed using the motion vector G, and the motion vector G refers to the field P41. In this case, the block a is motion-compensated from the field P41 (the field pointed to by the motion vector G) and the field P42 (the field to which the block b belongs), both of which are backward reference fields, using motion vectors calculated from the reference motion vector G by a predetermined method. The motion vectors used when processing the block a are the motion vector H for the field P41 and the motion vector I for the field P42. At this time, assuming that the magnitude of the motion vector G is MV3, the magnitude of the motion vector H is MVf3, and the magnitude of the motion vector I is MVb3, MVf3 and MVb3 are obtained by Equations 10 and 11, respectively.
(Equation 10) MVf3 = −N3 × MV3 / D3
(Equation 11) MVb3 = −M3 × MV3 / D3
[0084]
Here, the values of N3, M3, and D3 can be set, for example, from the temporal distances between fields: if D3 is set to the temporal distance from the field P41 to P42, N3 to the temporal distance from the field B32 to P41, and M3 to the temporal distance from the field B32 to P42, then MVf3 and MVb3 are motion vectors parallel to MV3. The values of N3, M3, and D3 are set on a field basis at the time of encoding and are described as related information or the like in the code string, or as additional information of the code string. At the time of decoding, these values are obtained from the code string or from the information attached to it, and MVf3 and MVb3 are calculated using Equations 10 and 11.
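Equations 10 and 11 differ from the earlier pairs in that both derived vectors carry a negative sign, since both reference fields lie after the current field. The Python sketch below illustrates this under stated assumptions: the distances are counted in field units for the arrangement of FIG. 7 (P41 to P42 = 1, B32 to P41 = 1), M3 is taken as the distance from the current field B32 to P42 (= 2), and the function name and sample vector are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def scale_same_frame(mv: Tuple[int, int], n: int, m: int, d: int):
    """Equations 10 and 11: both derived vectors take a negative sign."""
    mvf = tuple(Fraction(-n * c, d) for c in mv)  # MVf3 = -N3 x MV3 / D3
    mvb = tuple(Fraction(-m * c, d) for c in mv)  # MVb3 = -M3 x MV3 / D3
    return mvf, mvb


# Assumed distances in field units: N3 = 1, M3 = 2, D3 = 1.
mvf3, mvb3 = scale_same_frame((2, -4), n=1, m=2, d=1)
print(mvf3, mvb3)
```

Both results point opposite to the reference vector, which runs from P42 back to P41, and the vector toward P42 is twice as long as the vector toward P41.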
[0085]
Next, FIG. 12 shows a flowchart of a processing method in the moving picture coding method using the above-described motion vector calculation method. Here, a moving image coding method corresponding to the motion vector calculation method described with reference to FIG. 7 will be described. First, in S601, a scaling coefficient is determined for each picture (frame or field), and is described as related information. Here, the method of determining the scaling coefficient may be determined by the time interval between the reference fields as described above, or may be determined by another method. Further, as the method of describing the scaling coefficients, all the scaling coefficients may be described, or a part of the scaling coefficients may be described, and the remaining scaling coefficients may be obtained from the described scaling coefficients. Then, in S602, it is determined whether to process the processing target block in the direct mode. If the processing is not to be performed in the direct mode, in S606, the processing target block is processed by processing according to another mode. When processing is performed in the direct mode, it is determined in step S603 whether the reference motion vector is a motion vector for the first field or a motion vector for the second field. This determination can be made using the value of the reference picture number. If the reference motion vector is a motion vector that refers to the first field, in step S604, scaling of the reference motion vector is performed using the scaling coefficient (N1, M1, D1) for the first field. If the reference motion vector is a motion vector referring to the second field, in step S605, the scaling of the reference motion vector is performed using the scaling coefficient (N2, M2, D2) for the second field. Then, in S607, motion compensation is performed using the motion vector obtained in S604 or S605.
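Step S601 leaves open whether all scaling coefficients or only some of them are described. The Python sketch below shows one way this choice could be represented; the dictionary layout, the key names `first_field` and `second_field`, and the function name are hypothetical, and the partial form is allowed only when the second set follows from the first by Equations 7 to 9.

```python
from typing import Dict, Tuple

Coeffs = Tuple[int, int, int]  # (N, M, D)


def describe_scaling_coeffs(first: Coeffs, second: Coeffs, partial: bool) -> Dict[str, Coeffs]:
    """S601: build the related information carrying the scaling coefficients.

    With partial=True only the first-field set is written; this is valid only
    when the second set follows from it by Equations 7 to 9.
    """
    if partial:
        derived = (first[0] - 1, first[1], first[2] - 1)
        if derived != second:
            raise ValueError("second set cannot be derived; describe both sets")
        return {"first_field": first}
    return {"first_field": first, "second_field": second}


full = describe_scaling_coeffs((4, 2, 6), (3, 2, 5), partial=False)
short = describe_scaling_coeffs((4, 2, 6), (3, 2, 5), partial=True)
print(full, short)
```

The check in the partial branch reflects the condition stated in the text: omitting a set is safe only when a predetermined relationship lets the other side recover it.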
[0086]
Next, FIG. 13 shows a flowchart of a processing method in the video decoding method using the above-described motion vector calculation method. Here, a moving image decoding method corresponding to the motion vector calculation method described with reference to FIG. 7 will be described. First, in S701, a scaling coefficient is obtained from related information. Then, for example, when only some of the scaling coefficients are described as related information, other scaling coefficients are calculated from the described scaling coefficients by a predetermined method. Then, in S702, it is determined whether to process the processing target block in the direct mode. If the processing is not to be performed in the direct mode, in S706, the processing target block is processed by processing according to another mode. When processing is performed in the direct mode, it is determined in step S703 whether the reference motion vector is a motion vector for the first field or a motion vector for the second field. This determination can be made using the value of the reference picture number. If the reference motion vector is a motion vector that refers to the first field, in step S704, scaling of the reference motion vector is performed using the scaling coefficient (N1, M1, D1) for the first field. If the reference motion vector is a motion vector that refers to the second field, in step S705, the scaling of the reference motion vector is performed using the scaling coefficient (N2, M2, D2) for the second field. Then, in S707, motion compensation is performed using the motion vector obtained in S704 or S705.
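Steps S701 and S703 to S705 can be sketched in Python as follows. This is an illustrative outline rather than the decoder itself: the dictionary keys, function names, and sample vectors are hypothetical, the missing second set is derived by Equations 7 to 9, and the backward vector is given a negative sign for both sets, as in Equation 4 (an assumption here).

```python
from fractions import Fraction
from typing import Dict, Tuple

Coeffs = Tuple[int, int, int]  # (N, M, D)


def read_scaling_coeffs(related_info: Dict[str, Coeffs]) -> Dict[int, Coeffs]:
    """S701: take the described sets; derive the second one if it is absent."""
    first = related_info["first_field"]
    second = related_info.get("second_field")
    if second is None:
        second = (first[0] - 1, first[1], first[2] - 1)  # Equations 7 to 9
    return {1: first, 2: second}


def direct_mode_vectors(ref_mv: Tuple[int, int], ref_field: int, coeffs: Dict[int, Coeffs]):
    """S703 to S705: pick the set by the referenced field, then scale."""
    n, m, d = coeffs[ref_field]
    forward = tuple(Fraction(n * c, d) for c in ref_mv)
    backward = tuple(Fraction(-m * c, d) for c in ref_mv)
    return forward, backward


coeffs = read_scaling_coeffs({"first_field": (4, 2, 6)})
print(direct_mode_vectors((6, -12), ref_field=1, coeffs=coeffs))
print(direct_mode_vectors((5, -10), ref_field=2, coeffs=coeffs))
```

In a real decoder the `ref_field` argument would come from the reference picture number, as the text notes for S703; here it is passed in directly to keep the sketch short.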
[0087]
As described above, in the motion vector calculation method, the moving image encoding method, and the moving image decoding method of the present invention, when the reference motion vector used in the direct mode is a motion vector corresponding to a field structure, the scaling coefficient used in the direct mode is switched depending on the field that the reference motion vector refers to. This scaling coefficient is set for each processing field, described as related information at the time of encoding, and acquired from the related information at the time of decoding. As a method of describing the scaling coefficients, all of them may be described, or only some may be described and the remainder obtained from the described coefficients by a predetermined method.
[0088]
As described above, by using the motion vector calculation method, the moving image coding method, and the moving image decoding method of the present invention, it is possible to use an optimal scaling coefficient according to the field referred to in the direct mode, and coding efficiency can be improved. The overhead for describing the scaling coefficients is almost negligible. Furthermore, if only some of the scaling coefficients are described as related information and the others are calculated from them, the code amount of the related information for describing the scaling coefficients is exactly the same as in the conventional example.
[0089]
Although the case has been described with the present embodiment where the scaling coefficient is set from the temporal distance, this may be determined by another method.
[0090]
Also, in the present embodiment, each frame is described as being adaptively encoded and decoded using either the frame structure or the field structure. Even if encoding and decoding are adaptively performed using any of the structures, the encoding and decoding can be performed by the same processing as in the present invention, and the same effects can be obtained.
[0091]
FIG. 10 shows the order of display time of each frame when encoding or decoding a moving image. In FIG. 10, a solid line indicates a picture having a frame structure, and a broken line indicates a picture having a field structure. For example, a frame P1 indicated by a solid line is obtained by combining fields P11 and P12 indicated by a broken line. The frames P1 and P4 are processed as P pictures, and the frames B2 and B3 are processed as B pictures. Also, one frame can be treated as two fields. For example, the frame P1 can be treated as fields P11 and P12, the frame B2 as fields B21 and B22, the frame B3 as fields B31 and B32, and the frame P4 as fields P41 and P42. Here, it is assumed that each frame is adaptively encoded and decoded in either a frame structure or a field structure.
[0092]
It is assumed that the current processing target is the frame B3. That is, the frame B3 is processed in a frame structure. Also, frame B3 uses frame P1 as a forward reference picture and frame P4 as a backward reference picture. These reference pictures have already been encoded or decoded. Here, it is assumed that the frame P4 is processed in field units. That is, when referring to the frame P4, P41 and P42 processed as the field structure are combined and referred to as the frame structure.
[0093]
Now, consider a case where block a of frame B3 is processed in the direct mode. In this case, the motion vector used when processing the block located at the same position as the block a in the frame P4 that is the backward reference picture is used. However, here, the frame P4 is treated as fields P41 and P42 as a field structure. Therefore, here, the motion vector of the block b belonging to the field P41, which is the first field of the frame P4, is set as the reference motion vector.
[0094]
Here, first, the case where the block b is processed using the motion vector A and the motion vector A refers to the field P11, as shown in FIG. 10A, will be described. In this case, the block a is motion-compensated from the frame P1 (the frame including the field pointed to by the motion vector A), which is the forward reference frame, and the frame P4 (the frame including the field to which the block b belongs), which is the backward reference frame, using motion vectors calculated from the reference motion vector A by a predetermined method. The motion vectors used when processing the block a are the motion vector B for the frame P1 and the motion vector C for the frame P4. At this time, assuming that the magnitude of the motion vector A is MV4, the magnitude of the motion vector B is MVf4, and the magnitude of the motion vector C is MVb4, MVf4 and MVb4 are obtained by Equations 12 and 13, respectively.
(Equation 12) MVf4 = N4 × MV4 / D4
(Equation 13) MVb4 = −M4 × MV4 / D4
[0095]
Here, the values of the scaling coefficients (N4, M4, D4) can be set, for example, from the temporal distances between pictures: if D4 is set to the temporal distance from the field P11 to P41, N4 to the temporal distance from the frame P1 to the frame B3, and M4 to the temporal distance from the frame B3 to the frame P4, then MVf4 and MVb4 are motion vectors parallel to MV4. The values of the scaling coefficients (N4, M4, D4) are set on a frame basis at the time of encoding and are described as related information of the frame B3 in the code string, or as additional information of the code string. At the time of decoding, these values are obtained from the code string or from the information attached to it, and MVf4 and MVb4 are calculated using Equations 12 and 13.
[0096]
Next, the case where the block b is processed using the motion vector D and the motion vector D refers to the field P12, as shown in FIG. 10B, will be described. In this case, the block a is motion-compensated from the frame P1 (the frame including the field pointed to by the motion vector D), which is the forward reference frame, and the frame P4 (the frame including the field to which the block b belongs), which is the backward reference frame, using motion vectors calculated from the reference motion vector D by a predetermined method. The motion vectors used when processing the block a are the motion vector E for the frame P1 and the motion vector F for the frame P4. At this time, assuming that the magnitude of the motion vector D is MV5, the magnitude of the motion vector E is MVf5, and the magnitude of the motion vector F is MVb5, MVf5 and MVb5 are obtained by Equations 14 and 15, respectively.
(Equation 14) MVf5 = N5 × MV5 / D5
(Equation 15) MVb5 = −M5 × MV5 / D5
[0097]
Here, the values of the scaling coefficients (N5, M5, D5) can be set, for example, from the temporal distances between pictures: if D5 is set to the temporal distance from the field P12 to the field P41, N5 to the temporal distance from the frame P1 to the frame B3, and M5 to the temporal distance from the frame B3 to the frame P4, then MVf5 and MVb5 are motion vectors parallel to MV5. The values of the scaling coefficients (N5, M5, D5) are set in picture units at the time of encoding and are described as related information or the like in the code string, or as additional information of the code string. At the time of decoding, these values are obtained from the code string or from the information attached to it, and MVf5 and MVb5 are calculated using Equations 14 and 15.
[0098]
Here, as a method of describing the scaling coefficients (N4, M4, D4) and (N5, M5, D5) in the code string or as additional information of the code string, besides describing both sets as described above, only one set may be described and the other set derived from it. For example, when the scaling coefficients are determined from the temporal distances between pictures as described above, the following relationships hold:
(Equation 16) N5 = N4
(Equation 17) M5 = M4
(Equation 18) D5 = D4 − 1
Therefore, the scaling coefficients (N5, M5, D5) can be obtained from the scaling coefficients (N4, M4, D4).
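For the frame-structure case, the scaling of Equations 12 to 15 together with the derivation of Equations 16 to 18 can be sketched in Python as follows. The function name, the sample vectors, and the choice of counting all distances in field units (frame P1 to B3 = 4, frame B3 to P4 = 2, field P11 to P41 = 6) are assumptions for illustration.

```python
from fractions import Fraction
from typing import Tuple

Coeffs = Tuple[int, int, int]  # (N, M, D)


def frame_direct_vectors(ref_mv: Tuple[int, int], refers_to_second_field: bool, first: Coeffs):
    """Scale per Equations 12 to 15, deriving (N5, M5, D5) by Equations 16 to 18."""
    n, m, d = first
    if refers_to_second_field:
        d = d - 1  # Equation 18; N and M are unchanged (Equations 16 and 17)
    forward = tuple(Fraction(n * c, d) for c in ref_mv)
    backward = tuple(Fraction(-m * c, d) for c in ref_mv)
    return forward, backward


# Reference vector pointing to field P11, then one pointing to field P12.
print(frame_direct_vectors((6, -12), False, (4, 2, 6)))
print(frame_direct_vectors((5, -10), True, (4, 2, 6)))
```

Unlike the field-structure case, only the denominator changes between the two sets, because the current picture and both reference pictures are treated as frames and their mutual distances do not depend on which field the reference vector points to.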
[0099]
In the first embodiment, a case has been described in which a plurality of reference frames are used as shown in FIG. 11 in addition to the reference relationships shown in FIG. This is also possible in the second embodiment.
[0100]
In the first embodiment, the moving picture coding method and the moving picture decoding method using the motion vector calculation method described with reference to FIG. 7 have been described. The moving picture coding method and the moving picture decoding method using the motion vector calculation method described with reference to FIG. 10 are the same as those described in the first embodiment, and their description is therefore omitted.
[0101]
As described above, by using the motion vector calculation method, the moving image coding method, and the moving image decoding method of the present invention, it is possible to use an optimal scaling coefficient according to the field referred to in the direct mode, and coding efficiency can be improved. The overhead for describing the scaling coefficients is almost negligible. Furthermore, if only some of the scaling coefficients are described as related information and the others are calculated from them, the code amount of the related information for describing the scaling coefficients is exactly the same as in the conventional example.
[0102]
Although the case has been described with the present embodiment where the scaling coefficient is set from the temporal distance, this may be determined by another method.
[0103]
Also, in the present embodiment, each frame is described as being adaptively encoded and decoded using either the frame structure or the field structure. Even if encoding and decoding are adaptively performed using any of the structures, the encoding and decoding can be performed by the same processing as in the present invention, and the same effects can be obtained.
[0104]
In the present embodiment, the P picture has been described as a picture processed with reference to pictures in one direction (forward), and the B picture has been described as a picture processed with reference to pictures in two directions (forward and backward). The same effect can be obtained even if the P picture is processed with reference to pictures in the backward direction only, and the B picture is processed with reference to pictures in two forward directions or two backward directions.
[0105]
Next, determination of the scaling coefficient in the encoding control unit 110 will be described.
When the two motion vectors of the processing target block are generated from the reference motion vector using the scaling coefficients as described above, a division operation on the scaling coefficients is performed. For this reason, the scaling coefficients can be determined by one of the following methods.
[0106]
(Method 1)
When the two motion vectors of the processing target block are generated from the reference motion vector using the scaling coefficients N1, M1, D1, N2, M2, and D2 shown in Equations 3, 4, 5, and 6, the divisions N1/D1, M1/D1, N2/D2, and M2/D2 are performed. In this method, each quotient is approximated so that its denominator is a power of 2. For example, when N1/D1 is 4/6 as shown in FIG. 7A and the denominator is fixed to 16, the numerator becomes 10.66 (4/6 × 16). Discarding the fractional part of the numerator 10.66 gives 10/16, that is, 10/2^4 (where ^ denotes a power). When the denominator is a power of 2, the division can be performed by a shift operation. For example, 10 is "1010" in binary, and shifting it right by four digits gives "0.1010", which is the value of 10/16.
[0107]
N2/D2, which has a different denominator, is likewise converted to the same power-of-2 denominator as N1/D1. For example, when N2/D2 is 3/5 as shown in FIG. 7B and the denominator is fixed to 16, the numerator becomes 9.6. Discarding the fractional part of the numerator 9.6 gives 9/16, that is, 9/2^4.
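As an illustration of Method 1, the following Python sketch reproduces the two worked values above. The function name `pow2_numerator` and the default shift of 4 (denominator 16) are assumptions chosen for this example and do not appear in the specification.

```python
def pow2_numerator(num: int, den: int, shift: int = 4) -> int:
    """Numerator k such that k / 2**shift approximates num / den.

    The fractional part is discarded, as in Method 1.
    shift = 4 corresponds to the fixed denominator 16 used in the text.
    """
    return (num << shift) // den  # (num * 2**shift) / den, rounded down


k1 = pow2_numerator(4, 6)  # N1/D1 = 4/6: 10.66... becomes 10
k2 = pow2_numerator(3, 5)  # N2/D2 = 3/5: 9.6 becomes 9
print(f"N1/D1 ~ {k1}/16, N2/D2 ~ {k2}/16")
```

Both coefficients share the denominator 2^4, so only the numerators differ.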
[0108]
(Method 2)
For N1/D1, 10/2^4 is obtained in the same manner as described above, and N2/D2 uses the same power-of-2 denominator as N1/D1. The numerator for N2/D2 is then obtained by subtracting a predetermined value (here, 1) from the numerator 10, giving 9/2^4.
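A minimal sketch of Method 2 follows, assuming the same fixed denominator of 16. The function name `method2_numerators` and its parameters are hypothetical and serve only to illustrate the idea that the second numerator is derived from the first rather than computed by its own division.

```python
def method2_numerators(n1: int, d1: int, offset: int = 1, shift: int = 4):
    """Return (k1, k2), numerators over the shared denominator 2**shift.

    k1 approximates n1/d1 with the fractional part discarded.
    k2 is derived from k1 by subtracting a predetermined value (offset),
    so N2/D2 itself never has to be divided.
    """
    k1 = (n1 << shift) // d1
    k2 = k1 - offset
    return k1, k2


k1, k2 = method2_numerators(4, 6)  # N1/D1 = 4/6, predetermined value 1
print(f"N1/D1 ~ {k1}/16, N2/D2 ~ {k2}/16")
```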
[0109]
(Method 3)
N1/D1 is obtained in the same manner as described above. N2/D2, however, does not share the same power-of-2 denominator as N1/D1; instead, its denominator is changed to whichever power of 2 makes the approximation as close as possible to the original N2/D2.
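The text does not state how the closest power-of-2 denominator is found, so the following Python sketch is only one possible reading of Method 3. It assumes a bounded search over shifts 0 to `max_shift`, rounding of the numerator to nearest, and a preference for the smaller shift on ties. The function name and all parameters are hypothetical.

```python
from fractions import Fraction


def closest_pow2_fraction(num: int, den: int, max_shift: int = 8):
    """Return (k, shift) such that k / 2**shift is closest to num / den.

    Assumptions for this sketch: shifts 0..max_shift are searched, the
    numerator is rounded to nearest, and ties keep the smaller shift.
    """
    target = Fraction(num, den)
    best = None
    for shift in range(max_shift + 1):
        # round(num * 2**shift / den) using integer arithmetic only
        k = ((num << (shift + 1)) + den) // (2 * den)
        err = abs(Fraction(k, 1 << shift) - target)
        if best is None or err < best[0]:
            best = (err, k, shift)
    return best[1], best[2]


k2, shift2 = closest_pow2_fraction(3, 5)  # N2/D2 = 3/5
print(f"N2/D2 ~ {k2}/2^{shift2}")
```

With the default search range, 3/5 is approximated by 77/2^7, which is closer to the original value than the 9/2^4 obtained when the denominator is shared with N1/D1.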
[0110]
As described above, the scaling coefficients are approximated so that the denominator is a power of 2. Therefore, even an image processing DSP (signal processing LSI) that has no division function, for example, can perform the division of the scaling coefficients by a shift operation.
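To show how such a coefficient might be applied without a divider, the following Python sketch scales a motion vector by k/2^shift using one multiplication and one right shift per component. The function name `scale_mv` and the sample vector are hypothetical, and the handling of negative components is an assumption, since the text does not specify it.

```python
def scale_mv(mv, k: int, shift: int):
    """Scale a motion vector (x, y) by k / 2**shift using multiply and shift.

    Python's >> on negative integers rounds toward minus infinity. How a
    real codec rounds negative components is not stated in the text, so
    this behaviour is an assumption of the sketch.
    """
    x, y = mv
    return (x * k) >> shift, (y * k) >> shift


reference_mv = (8, -4)                  # hypothetical reference motion vector
scaled = scale_mv(reference_mv, 10, 4)  # scaled by 10/16
print(scaled)
```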
[0111]
(Embodiment 2)
Further, by recording a program for realizing the image encoding method or the image decoding method described in each of the above embodiments on a storage medium such as a flexible disk, the processing described in each of the above embodiments can easily be carried out on an independent computer system.
[0112]
FIG. 14 is an explanatory diagram of a case where the image encoding method or the image decoding method according to the first embodiment is implemented by a computer system using a flexible disk that stores a program for the image encoding method or the image decoding method.
[0113]
FIG. 14B shows the appearance of the flexible disk as viewed from the front, its cross-sectional structure, and the flexible disk itself, and FIG. 14A shows an example of the physical format of the flexible disk, which is the recording medium body. The flexible disk FD is contained in the case F. A plurality of tracks Tr are formed concentrically on the surface of the disk from the outer circumference toward the inner circumference, and each track is divided into 16 sectors Se in the angular direction. Therefore, in the flexible disk storing the program, the image encoding method as the program is recorded in an area allocated on the flexible disk FD.
[0114]
FIG. 14C shows a configuration for recording and reproducing the program on the flexible disk FD. When the above program is recorded on the flexible disk FD, the image encoding method or the image decoding method as the above program is written from the computer system Cs via the flexible disk drive. When the image encoding method is constructed in a computer system using a program in a flexible disk, the program is read from the flexible disk by a flexible disk drive and transferred to the computer system.
[0115]
In the above description, the description has been made using a flexible disk as a recording medium. However, the same description can be made using an optical disk. Further, the recording medium is not limited to this, and the present invention can be similarly implemented as long as the program can be recorded, such as an IC card or a ROM cassette.
[0116]
FIGS. 15 to 18 are diagrams illustrating a device that performs the encoding process or the decoding process described in the above embodiment, and a system using the device.
[0117]
FIG. 15 is a block diagram illustrating an overall configuration of a content supply system ex100 that realizes a content distribution service. A communication service providing area is divided into desired sizes, and base stations ex107 to ex110, which are fixed wireless stations, are installed in each cell. In the content supply system ex100, for example, a computer ex111, a PDA (personal digital assistant) ex112, a camera ex113, and a mobile phone ex114 are connected to the Internet ex101 via an Internet service provider ex102 and a telephone network ex104. However, the content supply system ex100 is not limited to the combination as shown in FIG. 15, and may be connected in any combination. Further, the mobile station may be directly connected to the telephone network ex104 without going through the base stations ex107 to ex110 which are fixed wireless stations.
[0118]
The camera ex113 is a device such as a digital video camera capable of shooting moving images. The mobile phone may be a mobile phone of the PDC (Personal Digital Communications) system, the CDMA (Code Division Multiple Access) system, the W-CDMA (Wideband-Code Division Multiple Access) system, or the GSM (Global System for Mobile Communications) system, or may be a PHS (Personal Handyphone System) or the like.
[0119]
The streaming server ex103 is connected to the camera ex113 via the base station ex109 and the telephone network ex104, and enables live distribution and the like based on encoded data that the user transmits using the camera ex113. The captured data may be encoded by the camera ex113 or by a server or the like that performs the data transmission processing. Moving image data captured by the camera ex116 may also be transmitted to the streaming server ex103 via the computer ex111. The camera ex116 is a device such as a digital camera that can shoot still images and moving images. In this case, the moving image data may be encoded by either the camera ex116 or the computer ex111, and the encoding is performed by the LSI ex117 included in the computer ex111 or the camera ex116. The image encoding/decoding software may be stored on any storage medium (CD-ROM, flexible disk, hard disk, or the like) readable by the computer ex111 or the like. Further, the moving image data may be transmitted by the camera-equipped mobile phone ex115. The moving image data in this case is data encoded by the LSI included in the mobile phone ex115.
[0120]
FIG. 16 is a diagram illustrating an example of the mobile phone ex115. The mobile phone ex115 includes: an antenna ex201 for transmitting and receiving radio waves to and from the base station ex110; a camera unit ex203, such as a CCD camera, capable of capturing video and still images; a display unit ex202, such as a liquid crystal display, for displaying decoded data such as video captured by the camera unit ex203 and video received by the antenna ex201; a main unit ex204 including operation keys; an audio output unit ex208, such as a speaker, for outputting audio; an audio input unit ex205, such as a microphone, for inputting audio; a storage medium ex207 for storing encoded or decoded data, such as data of captured moving images or still images, received mail data, and moving image or still image data; and a slot unit ex206 that allows the storage medium ex207 to be attached to the mobile phone ex115. The storage medium ex207 houses, in a plastic case such as that of an SD card, a flash memory element, which is a kind of EEPROM (Electrically Erasable and Programmable Read Only Memory), a nonvolatile memory that can be electrically rewritten and erased.
[0121]
In the content supply system ex100, content captured by the user with the camera ex113, the camera ex116, or the like (for example, video of a live music performance) is encoded as in the above-described embodiment and transmitted to the streaming server ex103, and the streaming server ex103 stream-distributes the content data to clients that request it. Examples of the client include a computer ex111, a PDA ex112, a camera ex113, a mobile phone ex114, and the like that can decode the encoded data. In this way, the content supply system ex100 allows the client to receive and reproduce the encoded data, and, because the client can receive, decode, and reproduce the data in real time, it also makes personal broadcasting possible.
[0122]
Further, the mobile phone ex115 will be described with reference to FIG. 17. In the mobile phone ex115, a power supply circuit unit ex310, an operation input control unit ex304, an image encoding unit ex312, a camera interface unit ex303, an LCD (Liquid Crystal Display) control unit ex302, an image decoding unit ex309, a demultiplexing unit ex308, a recording/reproducing unit ex307, a modulation/demodulation circuit unit ex306, and an audio processing unit ex305 are connected via a synchronous bus ex313 to a main control unit ex311 that collectively controls the display unit ex202 and the main unit ex204. When the call-end and power key is turned on by a user operation, the power supply circuit unit ex310 supplies power from the battery pack to each unit, thereby activating the camera-equipped digital mobile phone ex115 into an operable state. Under the control of the main control unit ex311, which includes a CPU, a ROM, a RAM, and the like, the mobile phone ex115 converts the audio signal collected by the audio input unit ex205 in the voice call mode into digital audio data in the audio processing unit ex305, performs spread spectrum processing on it in the modulation/demodulation circuit unit ex306, performs digital-to-analog conversion and frequency conversion in the transmission/reception circuit unit ex301, and then transmits it via the antenna ex201. In the voice call mode, the mobile phone ex115 also amplifies the data received by the antenna ex201, performs frequency conversion and analog-to-digital conversion, performs spectrum despreading in the modulation/demodulation circuit unit ex306, converts the result into analog audio data in the audio processing unit ex305, and then outputs it via the audio output unit ex208.
Further, when an e-mail is transmitted in the data communication mode, text data of the e-mail input by operating the operation keys of the main body ex204 is sent to the main control unit ex311 via the operation input control unit ex304. The main control unit ex311 performs spread spectrum processing on the text data in the modulation / demodulation circuit unit ex306, performs digital / analog conversion processing and frequency conversion processing in the transmission / reception circuit unit ex301, and transmits the data to the base station ex110 via the antenna ex201.
[0123]
When transmitting image data in the data communication mode, the image data captured by the camera unit ex203 is supplied to the image encoding unit ex312 via the camera interface unit ex303. When image data is not transmitted, image data captured by the camera unit ex203 can be directly displayed on the display unit ex202 via the camera interface unit ex303 and the LCD control unit ex302.
[0124]
The image encoding unit ex312 converts the image data supplied from the camera unit ex203 into encoded image data by compression-encoding it with the encoding method described in the above embodiment, and sends the encoded image data to the demultiplexing unit ex308. At the same time, the mobile phone ex115 sends the audio collected by the audio input unit ex205 during imaging by the camera unit ex203 to the demultiplexing unit ex308 as digital audio data via the audio processing unit ex305.
[0125]
The demultiplexing unit ex308 multiplexes the encoded image data supplied from the image encoding unit ex312 and the audio data supplied from the audio processing unit ex305 by a predetermined method. The resulting multiplexed data is subjected to spread spectrum processing in the modulation/demodulation circuit unit ex306 and to digital-to-analog conversion and frequency conversion in the transmission/reception circuit unit ex301, and is then transmitted via the antenna ex201.
[0126]
When data of a moving image file linked to a homepage or the like is received in the data communication mode, the data received from the base station ex110 via the antenna ex201 is subjected to spectrum despreading in the modulation/demodulation circuit unit ex306, and the resulting multiplexed data is sent to the demultiplexing unit ex308.
[0127]
To decode the multiplexed data received via the antenna ex201, the demultiplexing unit ex308 separates the multiplexed data into encoded image data and audio data, and, via the synchronous bus ex313, supplies the encoded image data to the image decoding unit ex309 and the audio data to the audio processing unit ex305.
[0128]
Next, the image decoding unit ex309 generates reproduced moving image data by decoding the encoded image data with a decoding method corresponding to the encoding method described in the above embodiment, and supplies it to the display unit ex202 via the LCD control unit ex302, whereby, for example, the moving image data included in a moving image file linked to a homepage is displayed. At the same time, the audio processing unit ex305 converts the audio data into analog audio data and supplies it to the audio output unit ex208, whereby, for example, the audio data included in the moving image file linked to the homepage is reproduced.
It should be noted that the present invention is not limited to the example of the system described above. Digital broadcasting via satellite and terrestrial waves has recently attracted attention, and, as shown in FIG. 18, at least one of the image encoding device and the image decoding device of the above embodiment can also be incorporated into a digital broadcasting system. Specifically, at the broadcasting station ex409, an encoded bit stream of video information is transmitted by radio waves to a communication or broadcasting satellite ex410. The broadcasting satellite ex410 that receives it transmits radio waves for broadcasting; these are received by a home antenna ex406 equipped for satellite broadcast reception, and a device such as a television (receiver) ex401 or a set-top box (STB) ex407 decodes the encoded bit stream and reproduces it. The image decoding device described in the above embodiment can also be mounted on a playback device ex403 that reads and decodes an encoded bit stream recorded on a storage medium ex402, such as a CD or DVD. In this case, the reproduced video signal is displayed on the monitor ex404. A configuration is also conceivable in which the image decoding device is mounted in a set-top box ex407 connected to a cable ex405 for cable television or to an antenna ex406 for satellite/terrestrial broadcasting, and the video is reproduced on a monitor ex408 of the television. In this case, the image decoding device may be incorporated in the television instead of the set-top box. It is also possible for a car ex412 having an antenna ex411 to receive a signal from the satellite ex410, the base station ex107, or the like, and to reproduce the moving image on a display device such as the car navigation system ex413 in the car ex412.
Further, an image signal can be encoded by the image encoding device described in the above embodiment and recorded on a recording medium. As specific examples, there are a recorder ex420 such as a DVD recorder for recording an image signal on a DVD disk ex421 and a disk recorder for recording on a hard disk. Furthermore, it can be recorded on the SD card ex422.
If the recorder ex420 includes the image decoding device described in the above embodiment, the image signal recorded on the DVD disc ex421 or the SD card ex422 can be reproduced and displayed on the monitor ex408.
The car navigation system ex413 may have, for example, the configuration shown in FIG. 17 without the camera unit ex203, the camera interface unit ex303, and the image encoding unit ex312. The same applies to the computer ex111 and the television (receiver) ex401.
In addition, three implementation forms are conceivable for terminals such as the mobile phone ex114: a transmitting/receiving terminal having both an encoder and a decoder, a transmitting terminal having only an encoder, and a receiving terminal having only a decoder.
[0129]
As described above, by implementing the encoding method and the decoding method described in this specification, it becomes possible to realize any of the apparatuses and systems described in this embodiment.
[0130]
[Effects of the Invention]
As is clear from the above description, the moving picture coding method according to the present invention is a method for coding a moving picture composed of a picture sequence and outputting the obtained code string, and includes: a motion vector calculating step of calculating a motion vector for each block constituting a picture; a motion vector predicting step of predicting and generating a motion vector of a processing target block by scaling the calculated motion vector, used as a reference motion vector, with a coefficient; and a coefficient output step of outputting the coefficient together with the code string.
[0131]
As a result, since the code string is generated and output including the scaling coefficient for use in the direct mode in the related information of the picture, the scaling coefficient can be extracted from the related information at the time of decoding.
[0132]
A moving picture decoding method according to the present invention is a method for decoding a code string output by the moving picture coding method according to claim 1, and includes: a motion vector calculating step of extracting the coefficient from the code string and calculating a motion vector of the processing target block by scaling using the extracted coefficient; and a decoding step of decoding the processing target block using the calculated motion vector.
[0133]
This makes it possible to generate the two motion vectors of the processing target block from the reference motion vector using the scaling coefficients extracted from the code string, which reduces the processing load and enables efficient processing.
[0134]
Further, by using the moving picture coding method and the moving picture decoding method of the present invention, an optimal scaling coefficient can be used according to the field referred to in the direct mode, and the coding efficiency can be improved. The overhead for describing the scaling coefficients is almost negligible. Furthermore, if only some of the scaling coefficients are described as related information and the others are calculated from the described ones, the code amount of the related information for describing the scaling coefficients is exactly the same as in the conventional example.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an embodiment of a video encoding device according to the present invention.
FIGS. 2A and 2B are explanatory diagrams showing the order of pictures in a frame memory; FIG. 2A is an explanatory diagram showing an input order, and FIG. 2B is an explanatory diagram showing a rearranged order.
FIG. 3 is an explanatory diagram showing a motion vector in a direct mode.
FIG. 4 is a conceptual diagram of an image encoded signal format by the moving image encoding device.
FIG. 5 is a block diagram showing a configuration of an embodiment of a video decoding device according to the present invention.
FIGS. 6A and 6B are explanatory diagrams showing the order of pictures; FIG. 6A shows the order of pictures in an input code string, and FIG. 6B shows the order of pictures output as an output image.
FIG. 7 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 8 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 9 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 10 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 11 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 12 is a flowchart illustrating an embodiment of the present invention.
FIG. 13 is a flowchart illustrating an embodiment of the present invention.
FIGS. 14A to 14C are explanatory diagrams of a recording medium storing a program for realizing the moving image encoding method and the moving image decoding method according to the first embodiment by a computer system; FIG. 14A is an explanatory diagram showing an example of the physical format of the flexible disk that is the recording medium body, FIG. 14B is an explanatory diagram showing the appearance of the flexible disk as viewed from the front, its cross-sectional structure, and the flexible disk itself, and FIG. 14C is an explanatory diagram showing a configuration for recording and reproducing the program on the flexible disk FD.
FIG. 15 is a block diagram illustrating an overall configuration of a content supply system that realizes a content distribution service.
FIG. 16 illustrates an example of a mobile phone.
FIG. 17 is a block diagram showing an internal configuration of a mobile phone.
FIG. 18 is a block diagram illustrating an overall configuration of a digital broadcasting system.
FIG. 19 is a schematic diagram for explaining a conventional example.
FIG. 20 is a schematic diagram for explaining a conventional example.
[Explanation of symbols]
101, 107, 707 Frame memory
102 Difference calculation unit
103 Prediction error encoder
104 Code string generation unit
105 Prediction error decoding unit
106 Addition operation unit
108 Motion vector detector
109 Mode selector
110 Encoding control unit
111-115, 709, 710 Switch
116 Motion vector storage unit
701 Code string analysis unit
702 Prediction error decoding unit
703 Mode decoding unit
705 Motion compensation decoding unit
706 Motion vector storage unit
708 Addition operation unit
P1, B2, B3, B4 Frames
P11, P12, B21, B22, B31, B32, B41, B42 Fields

Claims

1. A moving picture coding method for coding a moving picture as a sequence of pictures having a frame structure or a field structure and outputting the obtained code string, the method comprising:
a motion vector calculation step of calculating a motion vector for each block constituting the picture;
a motion vector prediction step of predicting and generating a motion vector of a processing target block by scaling the calculated motion vector, used as a reference motion vector, with a coefficient; and
a coefficient output step of outputting the coefficient together with the code string.

2. The moving picture coding method according to claim 1, wherein, in the coefficient output step, the coefficient is included in information related to the picture and output.

3. The moving picture coding method according to claim 2, wherein, in the motion vector prediction step, a plurality of coefficients are generated for one coding target picture, and
in the coefficient output step, only some of the plurality of generated coefficients are included in the related information and output.

4. The moving picture coding method according to claim 1, wherein, in the motion vector prediction step, one coefficient corresponding to the picture referred to by the processing target block is selected from among a plurality of coefficients held in advance, and the motion vector is generated using the selected coefficient.

5. The moving picture coding method according to claim 1, wherein, in the motion vector prediction step, the coefficient is calculated based on the temporal distance between the picture to which the processing target block belongs and a reference picture.

6. The moving picture coding method according to claim 1, wherein, in the motion vector prediction step, a ratio of an integer to a power of 2 is generated as the coefficient.

7. A moving picture decoding method for decoding a code string output by the moving picture coding method according to claim 1, the method comprising:
a motion vector calculation step of extracting the coefficient from the code string and calculating the motion vector of the processing target block by scaling using the extracted coefficient; and
a decoding step of decoding the processing target block using the calculated motion vector.

8. The moving picture decoding method according to claim 7, wherein the code string includes only some of a plurality of coefficients for the blocks constituting one picture, and
the motion vector calculation step includes extracting those coefficients from the code string and determining the remaining coefficients among the plurality of coefficients from the extracted coefficients by a predetermined method.

9. A moving picture coding apparatus that codes a moving picture composed of a picture sequence and outputs the obtained code string, the apparatus comprising:
a motion vector calculation unit that calculates a motion vector for each block constituting the picture;
a motion vector prediction unit that predicts and generates a motion vector of a processing target block by scaling the calculated motion vector, used as a reference motion vector, with a coefficient; and
a coefficient output unit that outputs the coefficient together with the code string.

10. A moving picture decoding apparatus for decoding a code string output by the moving picture coding apparatus according to claim 9, the apparatus comprising:
a motion vector calculation unit that extracts the coefficient from the code string and calculates the motion vector of the processing target block by scaling using the extracted coefficient; and
a decoding unit that decodes the processing target block using the calculated motion vector.

11. A program for coding a moving picture composed of a picture sequence and outputting the obtained code string,
the program causing a computer to execute the steps included in the moving picture coding method according to claim 1.

12. A program for decoding a code string output by the moving picture coding method according to claim 1,
the program causing a computer to execute the steps included in the moving picture decoding method according to claim 7 or 8.

13. A computer-readable recording medium on which is recorded a code string obtained by coding a moving picture composed of a picture sequence,
wherein the code string includes a coefficient used to predict and generate a motion vector of a processing target block by scaling a motion vector already calculated for a block constituting a picture, used as a reference motion vector.

14. The recording medium according to claim 13, wherein the coefficient is included in related information of the picture to which the block related to the coefficient belongs.

15. The recording medium according to claim 13, wherein the code string includes only some of a plurality of coefficients of the blocks constituting one picture.

16. The recording medium according to claim 13, wherein the coefficient is a ratio of an integer to a power of 2.