JP2004040512A

JP2004040512A - Image encoding method and image decoding method

Info

Publication number: JP2004040512A
Application number: JP2002195304A
Authority: JP
Inventors: Makoto Hagai; 誠羽飼; Shinya Sumino; 眞也角野; Toshiyuki Kondo; 敏志近藤; Seishi Abe; 清史安倍
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-07-03
Filing date: 2002-07-03
Publication date: 2004-02-05

Abstract

[Problem] To improve the coding efficiency of the direct mode in a coding scheme that selects any two reference frames from a plurality of reference frames and performs inter-frame prediction by interpolation prediction.
[Solution] A direct-mode scaling coefficient table storing a plurality of direct-mode scaling coefficients is provided, and a direct-mode scaling coefficient is selected according to the first relative index value of the direct mode. Direct-mode motion vectors can thus be generated in consideration of the display time difference between the frame to be coded and the reference frame, which improves coding efficiency in the direct mode.
[Selected drawing] FIG. 1

Description

【０００１】
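[Technical Field of the Invention]
The present invention relates to methods for encoding and decoding an image signal, and to a recording medium on which a program for implementing those methods in software is recorded.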
【０００２】
【従来の技術】
近年、マルチメディアアプリケーションの発展に伴い、画像・音声・テキストなど、あらゆるメディアの情報を統一的に扱うことが一般的になってきた。この時、全てのメディアをディジタル化することにより統一的にメディアを扱うことが可能になる。しかしながら、ディジタル化された画像は膨大なデータ量を持つため、蓄積・伝送のためには、画像の情報圧縮技術が不可欠である。一方で、圧縮した画像データを相互運用するためには、圧縮技術の標準化も重要である。画像圧縮技術の標準規格としては、ＩＴＵ−Ｔ（国際電気通信連合　電気通信標準化部門）のＨ．２６１、Ｈ．２６３、ＩＳＯ（国際標準化機構）のＭＰＥＧ（Ｍｏｖｉｎｇ　Ｐｉｃｔｕｒｅ　Ｅｘｐｅｒｔｓ　Ｇｒｏｕｐ）−１、ＭＰＥＧ−２、ＭＰＥＧ−４などがある。また、ＩＴＵでは、現在、最新の画像符号化規格としてＨ．２６４が標準化中であり、標準化過程におけるドラフト案はＨ．２６Ｌと呼ばれる。
【０００３】
ＭＰＥＧ−１，２，４，Ｈ．２６３などの動画像符号化方式に共通の技術として動き補償を伴うフレーム間予測がある。これらの動画像符号化方式の動き補償では、入力画像のフレームを所定のサイズの矩形（以降、ブロックと呼ぶ）に分割し、各ブロック毎にフレーム間の動きを示す動きベクトルから予測画素を生成する。
【０００４】
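To explain inter-frame prediction with motion compensation, (1) the concept of the B-picture is described first; because a B-picture is a picture that may contain interpolation-predicted blocks, (2) interpolation prediction is described next. Because the reference frames used in interpolation prediction and other prediction modes are uniquely identified, (3) the frame numbers and relative indices used for identification, and the way they are assigned to reference frames, are then described. Next, (4) the short-term and long-term frame buffers are described, followed by (5) the direct mode, a mode in which inter-frame prediction is performed by pixel interpolation. Finally, (6) a conventional image encoding apparatus and (7) a conventional image decoding apparatus are described.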
【０００５】
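(1) The B-picture is described with reference to FIG. 16, a conceptual diagram of a B-picture. Frm is the B-picture to be coded, and Ref1, Ref2, and Ref3 are already-coded frames that can be used as reference frames for inter-frame prediction. Block Blk1 is inter-frame predicted from reference blocks RefBlk1 and RefBlk2, and block Blk2 is inter-frame predicted from reference blocks RefBlk21 and RefBlk22.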
【０００６】
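(2) FIG. 17 illustrates interpolation prediction. RefBlk1 and RefBlk2 are the two reference blocks used for interpolation prediction, and PredBlk is the prediction block obtained by the interpolation. The block size is assumed here to be 4×4 pixels. Let X1(i) be a pixel value of RefBlk1, X2(i) a pixel value of RefBlk2, and P(i) a pixel value of PredBlk. P(i) is obtained from the following linear prediction formula.
【０００７】
P(i) = A·X1(i) + B·X2(i) + C
A, B, and C are linear prediction coefficients. In some schemes, such as MPEG-1 and MPEG-2, only the average (A = 1/2, B = 1/2, C = 0 in the formula above) is used; in others, the linear prediction coefficients are set explicitly, stored in the coded image signal, and transmitted from the image encoding apparatus to the image decoding apparatus.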
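The linear prediction formula can be sketched as follows. This is a minimal illustration, not the codec's implementation: the function name `interpolate_block`, the flat-list block representation, and the floating-point arithmetic are assumptions, since the source does not specify rounding or clipping.

```python
def interpolate_block(x1, x2, a=0.5, b=0.5, c=0.0):
    """Return P(i) = a*X1(i) + b*X2(i) + c for every pixel i.

    x1, x2: pixel values of RefBlk1 and RefBlk2 as flat sequences
            (16 values for a 4x4 block).
    The defaults a = b = 1/2, c = 0 give the plain average.
    """
    if len(x1) != len(x2):
        raise ValueError("reference blocks must have the same size")
    return [a * p + b * q + c for p, q in zip(x1, x2)]


# Two flat 4x4 reference blocks (illustrative values only).
ref_blk1 = [100] * 16
ref_blk2 = [120] * 16
pred_blk = interpolate_block(ref_blk1, ref_blk2)
```

Passing explicit `a`, `b`, and `c` corresponds to the case where the encoder transmits the coefficients in the coded signal.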
【０００８】
複数の参照フレームから画素補間によりフレーム間予測されるブロックを「補間予測ブロック」と呼ぶ。Ｂピクチャは、補間予測ブロックをピクチャ内に含むことが可能なピクチャである。また、Ｂピクチャでは、図１６で示すブロックＢｌｋ１とブロックＢｌｋ２のように、補間予測ブロック毎に異なる参照フレームを選択することができる。
【０００９】
なお、補間予測ブロックを含まず、１枚の参照フレームからフレーム間予測を行うブロックをピクチャ内に含むことが可能なピクチャをＰピクチャと呼び、フレーム間予測を行わない面内予測ブロックのみから構成されるピクチャをＩピクチャと呼ぶ。
【００１０】
Ｈ．２６ＬではＢピクチャのブロックに対し最大２枚の参照フレームを使用する。そこで、２枚の参照フレームを区別するため、各参照フレームを第１参照フレーム、第２参照フレームと呼ぶ。第１参照フレーム、第２参照フレーム、それぞれに対する動きベクトルを第１動きベクトル、第２動きベクトルと呼ぶ。図１６のＢｌｋ１を例にとると、第１参照フレームはＲｅｆ１、第２参照フレームはＲｅｆ３、第１動きベクトルはＭＶ１、第２動きベクトルはＭＶ２となる。また、第１参照フレームのみからの予測を第１参照フレーム予測、第２参照フレームのみからの予測を第２参照フレーム予測と呼ぶ。なお、参照フレーム１枚からフレーム間予測されたブロックでは、参照フレームや動きベクトルを区別する必要はない。ただし、本文書中では、説明の都合上、参照フレーム１枚からフレーム間予測されたブロックの参照フレーム・動きベクトルを、第１参照フレーム、第１動きベクトルと呼ぶ。
【００１１】
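(3) FIG. 18 illustrates frame numbers and relative indices. Both are numbers that uniquely identify a reference frame stored in the multi-frame buffer MFrmBuf. In H.26L, a value that increases by 1 for each frame stored in memory as a reference picture is assigned as the frame number. A relative index, by contrast, is carried inside each block, as shown in FIG. 19, and indicates the reference frame used for inter-frame prediction of that block.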
【００１２】
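FIG. 19 is a conceptual diagram of the coded image signal format produced by a conventional image encoding apparatus. Picture is the coded signal for one frame, Header is the coded header signal at the start of the frame, Block1 is the coded signal of a block coded in direct mode, Block2 is the coded signal of a block coded by interpolation prediction other than direct mode, RIdx1 and RIdx2 are relative indices, and MV1 and MV2 are motion vectors. The interpolation-predicted block Block2 carries the two relative indices RIdx1 and RIdx2, in this order, to indicate the two reference frames used for interpolation. Which of RIdx1 and RIdx2 is used can be determined from PredType. The relative index RIdx1 indicating the first reference frame is called the first relative index, and the relative index RIdx2 indicating the second reference frame is called the second relative index. Which frame is the first reference frame and which is the second is determined by data position in the coded stream.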
【００１３】
以下、第１相対インデックス、第２相対インデックスの付与方法について図１８を用いて説明する。
【００１４】
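The first relative index is assigned as follows. Reference frames whose display time precedes that of the frame to be coded are given values starting from 0, in order of increasing distance from the frame to be coded. Once every reference frame with an earlier display time has a value, reference frames whose display time follows that of the frame to be coded are given the subsequent values, again in order of increasing distance from the frame to be coded.
【００１５】
The second relative index is assigned in the mirrored way. Reference frames whose display time follows that of the frame to be coded are given values starting from 0, nearest first; reference frames with an earlier display time are then given the subsequent values, nearest first. For example, when the first relative index RIdx1 in FIG. 21(a) is 0 and the second relative index RIdx2 is 1, the first reference frame is the B-picture with frame number 14 and the second reference frame is the B-picture with frame number 13, as shown in FIG. 20.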
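The default assignment of the first and second relative indices can be sketched as below. This is only an illustration under stated assumptions: frames are identified by a display time, the function name `assign_relative_indices` is hypothetical, and frames in the long-term frame buffer are not modeled.

```python
def assign_relative_indices(target_time, ref_times):
    """Map each reference frame's display time to its relative indices.

    Returns (first, second): two dicts {display_time: index}.
    First index: past frames nearest-first, then future frames nearest-first.
    Second index: future frames nearest-first, then past frames nearest-first.
    """
    past = sorted((t for t in ref_times if t < target_time), reverse=True)
    future = sorted(t for t in ref_times if t > target_time)
    first = {t: i for i, t in enumerate(past + future)}
    second = {t: i for i, t in enumerate(future + past)}
    return first, second


# Frame to be coded at display time 10; reference frames at 4, 7, 9, 11, 13.
first, second = assign_relative_indices(10, [4, 7, 9, 11, 13])
```

The nearest past frame receives first relative index 0 and the nearest future frame receives second relative index 0, so the most frequently chosen references get the smallest values.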
【００１６】
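The relative indices in a block are represented by variable-length codewords, with shorter codes assigned to smaller values. Because the frame nearest the frame to be coded is usually selected as the reference frame for inter-frame prediction, assigning relative index values in order of proximity to the frame to be coded, as described above, raises coding efficiency.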
【００１７】
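(4) FIG. 20 is a conceptual diagram of the short-term and long-term frame buffers. In MPEG-1 and MPEG-2, the multi-frame buffer MFrmBuf is defined as a FIFO (first-in, first-out) buffer: when the buffer is full, the reference frame stored earliest is discarded and the new reference frame is stored. Consider a scene in which an object crosses in front of a stationary background. The background is hidden by the foreground object for a while and then reappears. If the frame in which the background was coded before being hidden could be used as a reference frame when the hidden background first reappears, coding efficiency would improve. However, holding reference frame image data takes a very large amount of memory, and with a practical memory size MFrmBuf can store only a few frames. In such a scene, therefore, the background frame has often already been discarded from MFrmBuf by the time the background reappears, so it cannot be used for prediction and efficient inter-frame prediction is not possible.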
【００１８】
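H.26L therefore defines, in addition to the FIFO buffer, a buffer called the long-term frame buffer LTMem, which can hold specific frames for a long period. As shown in FIG. 20, H.26L virtually divides the multi-frame buffer MFrmBuf into a short-term frame buffer STMem and a long-term frame buffer LTMem; the conventional FIFO buffer is called the short-term frame buffer STMem to distinguish it from the long-term one. A reference frame stored in LTMem is never discarded from LTMem without an explicit control instruction to the buffer. In the scene described above, storing the background frame in LTMem makes that frame available for prediction when the hidden background reappears, which greatly improves coding efficiency for such scenes. Moving a reference frame between LTMem and STMem, or discarding a reference frame from LTMem, is done by issuing a control instruction to the buffer. When the image encoding apparatus moves or discards a reference frame in LTMem, it stores a buffer control signal RPSL describing that operation in the coded signal, as shown in FIG. 21. Using the RPSL in the coded signal, the image decoding apparatus can apply the same control to the long-term frame buffer LTMem of its own multi-frame buffer MFrmBuf.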
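The split between a FIFO short-term buffer and an explicitly controlled long-term buffer can be sketched as follows. The class and method names are hypothetical and frames are represented by plain labels; the sketch only shows the eviction behavior described in the text, not the RPSL syntax.

```python
from collections import deque


class MultiFrameBuffer:
    """Toy model of MFrmBuf split into STMem (FIFO) and LTMem."""

    def __init__(self, short_capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.short_term = deque(maxlen=short_capacity)
        self.long_term = []

    def store(self, frame):
        """Store a newly coded reference frame in the short-term buffer."""
        self.short_term.append(frame)

    def move_to_long_term(self, frame):
        """Control instruction: move a frame from STMem to LTMem."""
        self.short_term.remove(frame)
        self.long_term.append(frame)

    def discard_long_term(self, frame):
        """Control instruction: explicitly discard a frame from LTMem."""
        self.long_term.remove(frame)


buf = MultiFrameBuffer(short_capacity=2)
buf.store("background")
buf.move_to_long_term("background")   # keep the background for later reuse
for frame in ("f1", "f2", "f3"):
    buf.store(frame)                  # "f1" is evicted when "f3" arrives
```

The background frame survives any number of later `store` calls and leaves the buffer only through `discard_long_term`, which mirrors the rule that LTMem contents change only on an explicit control instruction.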
【００１９】
図２０の例では、図１９に示すように短期フレームバッファＳＴＭｅｍに参照フレームＲｅｆ１，Ｒｅｆ２，Ｒｅｆ３を、長期フレームバッファＬＴＭｅｍに参照フレームＬＴＲｅｆ１，ＬＴＲｅｆ２を格納している。相対インデックス値は、長期フレームバッファＬＴＭｅｍの参照フレームに対しても割り当てられる。そのため、短期フレームバッファＳＴＭｅｍの参照フレームの場合と同様に、画像符号化信号中において、ブロックの動き補償で参照フレームとして選択された長期フレームバッファＬＴＭｅｍのフレームを相対インデックスの値で示すことができる。
【００２０】
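(5) FIG. 21 illustrates the direct mode of a conventional image encoding apparatus. In direct mode, the reference frames and motion vectors for the block to be coded are determined, by the method described below, from the motion vector and reference frame that were used when a reference frame was coded, and inter-frame prediction is performed by pixel interpolation. Frm is the B-picture to be coded, and Ref1, Ref2, Ref3, and Ref4 are already-coded frames in the multi-frame buffer MFrmBuf. RIdx1 is the first relative index, RIdx2 the second relative index, MV01 the first motion vector, and MV02 the second motion vector.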
【００２１】
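Block Blk0 is the block coded in direct mode, block Blk00 is the block in reference frame Ref3 located at the same position as the block to be coded Blk0, RefBlk01 is a reference block in reference frame Ref1, and RefBlk02 is a reference block in reference frame Ref3.
【００２２】
Motion vector MV0 is the first motion vector used when block Blk00 was coded, and it refers to frame Ref1. The first motion vector MV01 and second motion vector MV02 used to predict the block to be coded Blk0 are calculated by the following formulas.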
【００２３】
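MV01 = N1 × MV0 / D, MV02 = N2 × MV0 / D
In these formulas, N1, N2, and D are the values used to calculate the direct-mode motion vectors; in this description the set of values (N1, N2, D) is called the direct-mode scaling coefficient. In the case of FIG. 21, the direct-mode scaling coefficient can be set to N1 = 2, N2 = −1, D = 3. Motion vector MV0 is called the scaling motion vector. Assuming that the object containing the block to be coded moves at a constant rate within the picture, the first motion vector MV01 and the second motion vector MV02 are obtained by internally dividing MV0 according to the display time difference between frame Frm and the first reference frame Ref1 and the display time difference between frame Frm and the second reference frame Ref3.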
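The scaling of MV0 into the two direct-mode motion vectors can be sketched as below. The function name `direct_mode_vectors` is hypothetical, and exact fractions are used because the source does not state how the division is rounded; a real codec would define an integer rounding rule.

```python
from fractions import Fraction


def direct_mode_vectors(mv0, scaling):
    """Return (MV01, MV02) with MV01 = N1*MV0/D and MV02 = N2*MV0/D.

    mv0:     scaling motion vector as an (x, y) tuple.
    scaling: direct-mode scaling coefficient (N1, N2, D).
    """
    n1, n2, d = scaling
    if d == 0:
        raise ValueError("D must be non-zero")
    mv01 = tuple(Fraction(n1 * c, d) for c in mv0)
    mv02 = tuple(Fraction(n2 * c, d) for c in mv0)
    return mv01, mv02


# MV0 spans three frame intervals; (N1, N2, D) = (2, -1, 3) as in the text.
mv01, mv02 = direct_mode_vectors((6, -3), (2, -1, 3))
```

With these inputs the first vector is two thirds of MV0 and the second vector is one third of MV0 in the opposite direction, which is the internal division described in the text.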
【００２４】
In direct mode, moreover, the direct-mode scaling coefficient transmitted for each frame is used in common by all blocks contained in that frame.
【００２５】
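When FIG. 21 is related to FIG. 18, Frm in FIG. 21 corresponds to the central B-picture (drawn with a dotted line) in FIG. 18, Ref3 in FIG. 21 to the picture with frame number 13 in FIG. 18, Ref1 in FIG. 21 to the picture with frame number 12, and Ref2 in FIG. 21 to the B-picture with frame number 11. The non-reference frames drawn with dotted lines in FIG. 21, other than Frm, are never referenced by other pictures and are therefore not stored in the multi-frame buffer MFrmBuf. Accordingly, as the pictures in FIG. 18 show, no relative index for referencing such a frame is assigned.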
【００２６】
(6) FIG. 22 is a block diagram showing the configuration of a conventional image encoding apparatus, which is described below.
画像符号化装置は、ブロックに分割された画像信号Ｉｍｇを入力し、ブロック毎に処理を行うとする。減算器Ｓｕｂは、画像信号Ｉｍｇから予測画像信号Ｐｒｅｄを減算し、残差信号Ｒｅｓとして出力する。　画像符号化手段ＩｍｇＥｎｃは、残差信号Ｒｅｓを入力し、ＤＣＴ変換・量子化などの画像符号化処理を行い、量子化済ＤＣＴ係数などを含む残差符号化信号ＥＲｅｓを出力する。画像復号手段ＩｍｇＤｅｃは、残差符号化信号ＥＲｅｓを入力し、逆量子化・逆ＤＣＴ変換などの画像復号処理を行い、残差復号信号ＤＲｅｓを出力する。加算器Ａｄｄは、残差復号信号ＤＲｅｓと予測画像信号Ｐｒｅｄを加算し、再構成画像信号Ｒｅｃｏｎとして出力する。再構成画像信号Ｒｅｃｏｎで、以降のフレーム間予測で参照される可能性がある信号は、マルチフレームバッファＭＦｒｍＢｕｆに格納する。マルチフレームバッファＭＦｒｍＢｕｆのメモリ量は有限なため、マルチフレームバッファＭＦｒｍＢｕｆ内で以降のフレーム間予測に使用されないフレームのデータはマルチフレームバッファＭＦｒｍＢｕｆから除去することができる。
【００２７】
動き推定手段ＭＥは、マルチフレームバッファＭＦｒｍＢｕｆに格納された参照フレームＲｅｆＦｒｍｓを入力し動き推定を行い、面内予測、第１参照フレーム予測、第２参照フレーム予測、補間予測による予測の中から所定の方法で最適な予測種別を選択し（ピクチャ種別により選択できる予測種別は異なる）、ブロックに対する第１動きベクトルｅＭＶ１、第２動きベクトルｅＭＶ２、第１相対インデックスｅＲＩｄｘ１、第２相対インデックスｅＲＩｄｘ２を出力する。
【００２８】
動き推定手段ＭＥにおける予測種別の選択方法には、例えば、各予測種別による予測誤差が最小となる予測種別を選択する方法がある。選択された予測種別が面内予測の場合には、動きベクトル、相対インデックスは出力されず、第１参照フレーム予測の場合には、第１相対インデックス、第１動きベクトルのみ出力され、第２参照フレーム予測の場合には、第２相対インデックス、第２動きベクトルのみ出力され、補間予測の場合には、第１、第２相対インデックス、第１，第２動きベクトルが出力される。
【００２９】
上記で示したように、Ｈ．２６Ｌではダイレクトモード時の第２参照フレームとして第２相対インデックスｒＲＩｄｘ２が０の参照フレームが使用される。よって、値０の第２相対インデックスｒＲＩｄｘ２は動きベクトル・フレーム番号バッファＭＶＦＮＢｕｆとダイレクトモード用ベクトル・相対インデックス生成手段ＧｅｎＭＶＲＩｄｘとに入力される。
【００３０】
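The motion vector and frame number buffer MVFNBuf stores the scaling vector rMV and the frame number of the frame that rMV refers to. Because the reference frame containing the scaling vector rMV is the one indicated by the second relative index rRIdx2, MVFNBuf takes the second relative index rRIdx2 with value 0 as input and outputs the scaling vector rMV and the first relative index rRIdx1 indicating the reference frame of rMV.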
【００３１】
ダイレクトモード用ベクトル・相対インデックス生成手段ＧｅｎＭＶＲＩｄｘは、ダイレクトモード用スケーリング係数ＳＰ、スケーリング用ベクトルｒＭＶ、第１相対インデックスｒＲＩｄｘ１を入力し、上記に説明したダイレクトモードの処理により、ダイレクトモード時の第１動きベクトルｓＭＶ１、第２動きベクトルｓＭＶ２、第１相対インデックスｓＲＩｄｘ１、第２相対インデックスｓＲＩｄｘ２を出力する。
【００３２】
予測種別選択手段ＰｒｅｄＳｅｌは、画像信号Ｉｍｇと、参照フレームＲｅｆＦｒｍｓと、「ダイレクトモード」の参照ブロックの位置を示す相対インデックスｓＲＩｄｘ１，ｓＲＩｄｘ２・動きベクトルｓＭＶ１，ｓＭＶ２と、「ダイレクトモード以外」の予測時に使用する参照ブロックの位置を示す相対インデックスｅＲＩｄｘ１，ｅＲＩｄｘ２・動きベクトルｅＭＶ１，ｅＭＶ２を入力する。そして、ブロックの予測にダイレクトモードを使用すべきか決定され、決定された予測種別が予測種別ＰｒｅｄＴｙｐｅとして可変長符号化手段ＶＬＣ０に出力される。
【００３３】
予測種別選択手段ＰｒｅｄＳｅｌにおける予測種別の選択は、例えば、入力画素に対する「ダイレクトモード時」の予測誤差と「ダイレクトモード以外の予測時」の予測誤差とで、予測誤差の小さい方を選択することで行う。
【００３４】
よって、予測種別ＰｒｅｄＴｙｐｅには、動き推定手段ＭＥで選択される面内予測、第１参照フレーム予測、第２参照フレーム予測、ダイレクトモード以外の補間予測に加えて、ダイレクトモードが加わることになる。
【００３５】
そして、予測種別ＰｒｅｄＴｙｐｅがダイレクトモードを示す場合には、スイッチＳＷ１２は”１”側に切り替わり、相対インデックスｓＲＩｄｘ１，ｓＲＩｄｘ２、動きベクトルｓＭＶ１，ｓＭＶ２が相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２として使用される。
【００３６】
一方、予測種別ＰｒｅｄＴｙｐｅがダイレクトモード以外を示す場合には、スイッチＳＷ１２は”０”側に切り替わり、相対インデックスｅＲＩｄｘ１，ｅＲＩｄｘ２、動きベクトルｅＭＶ１，ｅＭＶ２が相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２として使用される。
【００３７】
また、ダイレクトモード時には、符号化済フレームのブロックを符号化した際の第１動きベクトルｓＭＶ１がスケーリング用ベクトルとして使用される。そして、その第１動きベクトルｓＭＶ１に参照されるフレームが、ダイレクトモードの一方の参照フレームとして使用される。
【００３８】
従って、符号化した第１相対インデックスＲＩｄｘ１、第１動きベクトルＭＶ１の中で、符号化したフレーム以降のフレームでダイレクトモードで使用される可能性がある第１相対インデックスＲＩｄｘ１、第１動きベクトルＭＶ１は動きベクトル・フレーム番号バッファＭＶＦＮＢｕｆに格納される。
【００３９】
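After the prediction type PredType is determined, the first relative index RIdx1 and the first motion vector MV1 are input to the multi-frame buffer MFrmBuf, and the reference block RefBlk1 corresponding to them is output to the pixel interpolation means Pol. When the prediction type requires two reference blocks, MFrmBuf additionally outputs to Pol the reference block RefBlk2 corresponding to the second relative index RIdx2 and the second motion vector MV2.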
【００４０】
補間予測時には、画素補間手段Ｐｏｌは２個の参照ブロックＲｅｆＢｌｋ１，ＲｅｆＢｌｋ２に対応する位置の画素値を補間し、補間ブロックＲｅｆＰｏｌを出力する。
【００４１】
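When the prediction type PredType indicates interpolation prediction, switch SW11 is set to the "0" side and the interpolated block RefPol is used as the prediction image signal Pred.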
【００４２】
第１参照フレーム予測時には、マルチフレームバッファＭＦｒｍＢｕｆは第１相対インデックスＲＩｄｘ１と第１動きベクトルＭＶ１とに対応する参照ブロックＲｅｆＢｌｋを出力する。
【００４３】
第２参照フレーム予測時には、マルチフレームバッファＭＦｒｍＢｕｆは第２相対インデックスＲＩｄｘ２と第２動きベクトルＭＶ２とに対応する参照ブロックＲｅｆＢｌｋを出力する。
【００４４】
図示していないが、面内予測時には面内予測結果の画素からなるブロックＲｅｆＢｌｋがマルチフレームバッファＭＦｒｍＢｕｆから出力される。
【００４５】
これら予測種別ＰｒｅｄＴｙｐｅが補間予測以外の予測方法を示す場合には、スイッチＳＷ１１は”１”側に切り替わり、参照ブロックＲｅｆＢｌｋが予測画像信号Ｐｒｅｄとして使用される。
【００４６】
可変長符号化手段ＶＬＣ０は、残差符号化信号ＥＲｅｓ、相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２、ダイレクトモード用スケーリング係数ＳＰ、予測種別ＰｒｅｄＴｙｐｅを可変長符号化し、符号化信号ＢｉｔＳｔｒｍ０として出力する。
【００４７】
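In the conceptual diagram of the coded image signal format of the conventional image encoding apparatus in FIG. 21, Block1 is an example of a block coded in direct mode; this block carries no relative index or motion vector information in the coded signal. Block2, on the other hand, is an example of a block coded by interpolation prediction other than direct mode; this block can carry the relative indices RIdx1 and RIdx2 and the motion vectors MV1 and MV2 in the coded image signal.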
【００４８】
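(7) FIG. 23 is a block diagram showing the configuration of a conventional image decoding apparatus. Units and signals that operate in the same way as in the block diagram of the conventional image encoding apparatus in FIG. 22 are given the same symbols, and their description is omitted.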
【００４９】
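The variable-length decoding means VLD0 receives the coded image signal BitStrm0, performs variable-length decoding, and outputs the coded residual signal ERes, the motion vectors MV1 and MV2, the relative indices RIdx1 and RIdx2, the direct-mode scaling coefficient SP, and the prediction type PredType.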
【００５０】
画像復号手段ＩｍｇＤｅｃは、残差符号化信号ＥＲｅｓを入力し、逆量子化・逆ＤＣＴ変換などの画像復号処理を行い、残差復号信号ＤＲｅｓを出力する。
【００５１】
加算器Ａｄｄは、残差復号信号ＤＲｅｓと予測画像信号Ｐｒｅｄを加算し、復号画像信号ＤＩｍｇとして画像復号装置外に出力する。
【００５２】
マルチフレームバッファＭＦｒｍＢｕｆは、フレーム間予測のために復号画像信号ＤＩｍｇをバッファに格納する。
【００５３】
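The motion vector and frame number buffer MVFNBuf stores the scaling vector rMV and the information rRIdx1 identifying the frame that the scaling vector refers to. MVFNBuf takes the second relative index rRIdx2 with value 0 as input and outputs the scaling vector rMV and the first relative index rRIdx1 identifying the frame that the scaling vector refers to.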
【００５４】
The direct-mode vector and relative index generation means GenMVRIdx performs the same processing as GenMVRIdx of FIG. 22.
【００５５】
予測種別ＰｒｅｄＴｙｐｅがダイレクトモードを示す場合、スイッチＳＷ２２は”１”側に切り替わる。そして、相対インデックスｓＲＩｄｘ１，ｓＲＩｄｘ２、動きベクトルｓＭＶ１，ｓＭＶ２が相対インデックスｎＲＩｄｘ１，ｎＲＩｄｘ２、動きベクトルｎＭＶ１，ｎＭＶ２として使用される。
【００５６】
予測種別ＰｒｅｄＴｙｐｅがダイレクトモード以外を示す場合、スイッチＳＷ２２は”０”側に切り替わる。そして、相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２が相対インデックスｎＲＩｄｘ１，ｎＲＩｄｘ２、動きベクトルｎＭＶ１，ｎＭＶ２として使用される。
【００５７】
補間予測時は、マルチフレームバッファＭＦｒｍＢｕｆは第１相対インデックスｎＲＩｄｘ１と第１動きベクトルｎＭＶ１に対応する参照ブロックＲｅｆＢｌｋ１と、第２相対インデックスｎＲＩｄｘ２と第２動きベクトルｎＭＶ２に対応するＲｅｆＢｌｋ２とを出力する。そして、画素補間手段Ｐｏｌは２個の参照ブロックＲｅｆＢｌｋ１，ＲｅｆＢｌｋ２に対応する画素値を補間ブロックＲｅｆＰｏｌとして出力する。
【００５８】
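In first-reference-frame prediction, the multi-frame buffer MFrmBuf outputs the reference block RefBlk corresponding to the first relative index nRIdx1 and the first motion vector nMV1.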
【００５９】
第２参照フレーム予測時には、マルチフレームバッファＭＦｒｍＢｕｆは第２相対インデックスｎＲＩｄｘ２と第２動きベクトルｎＭＶ２に対応する参照ブロックＲｅｆＢｌｋを出力する。
【００６０】
図示していないが、面内予測時には面内予測結果の画素からなるブロックＲｅｆＢｌｋがマルチフレームバッファＭＦｒｍＢｕｆから出力される。
【００６１】
予測種別ＰｒｅｄＴｙｐｅが補間予測を示す場合には、スイッチＳＷ２１は”０”側に切り替わり、補間ブロックＲｅｆＰｏｌ　が予測画像信号Ｐｒｅｄとして使用される。
【００６２】
予測種別ＰｒｅｄＴｙｐｅが補間予測以外の予測方法を示す場合には、スイッチＳＷ２１は”１”側に切り替わり、参照ブロックＲｅｆＢｌｋが予測画像信号Ｐｒｅｄとして使用される。
【００６３】
復号した第１相対インデックスｎＲＩｄｘ１、第１動きベクトルｎＭＶ１の中で、復号したフレーム以降のフレームでダイレクトモードで使用される可能性がある第１相対インデックスｎＲＩｄｘ１、第１動きベクトルｎＭＶ１は動きベクトル・フレーム番号バッファＭＶＦＮＢｕｆに格納される。
【００６４】
以上、説明した処理により画像復号装置は画像符号化信号ＢｉｔＳｔｒｍ０を復号し、画像復号信号ＤＩｍｇとして出力する。
【００６５】
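[Problem to Be Solved by the Invention]
Assuming that the object containing the block to be coded moves at a constant rate within the picture, the direct-mode scaling coefficient can be determined from the ratio between the display time difference from the frame to be coded to the first reference frame and the display time difference from the frame to be coded to the second reference frame. In that case, in FIG. 20, the scaling coefficient for block Blk0 is (N1, N2, D) = (2, −1, 3) and the scaling coefficient for block Blk1 is (N1, N2, D) = (5, −1, 6).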
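One way to obtain such coefficients from display times is sketched below. The function name and the concrete display times (Frm at 0, Ref1 at −2, Ref2 at −5, Ref3 at +1) are assumptions chosen so that the result reproduces the two coefficients quoted in the text; the source itself gives only the resulting values.

```python
def scaling_from_display_times(t_target, t_first_ref, t_second_ref):
    """Derive (N1, N2, D) from display times under constant motion.

    N1 and N2 are the signed display time differences from the frame to
    be coded to the first and second reference frames; D = N1 - N2 is
    the span covered by the scaling motion vector.
    """
    n1 = t_target - t_first_ref
    n2 = t_target - t_second_ref
    d = n1 - n2
    if d == 0:
        raise ValueError("first and second reference frames coincide")
    return n1, n2, d


# Hypothetical display times: Frm = 0, Ref1 = -2, Ref2 = -5, Ref3 = +1.
blk0 = scaling_from_display_times(0, -2, 1)   # first reference Ref1
blk1 = scaling_from_display_times(0, -5, 1)   # first reference Ref2
```

Because the two blocks refer to different first reference frames, they need different coefficients, which is the situation the single per-frame coefficient cannot express.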
【００６６】
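In H.26L, however, only one direct-mode scaling coefficient can be transmitted per frame, so the same coefficient must be used for every direct-mode block in the frame. In the case of FIG. 20, for example, if (N1, N2, D) = (2, −1, 3) is used as the common scaling coefficient for frame Frm, appropriate direct-mode motion vectors can be generated for block Blk0 but not for block Blk1, and coding efficiency deteriorates.
【００６７】
The object of the present invention is therefore to provide means for selecting the scaling coefficient for a direct-mode block according to the reference frame used, and thereby to improve the coding efficiency of the direct mode.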
【００６８】
【課題を解決するための手段】
第１の発明は、
マルチフレームバッファに格納されている複数の符号化済フレームから、符号化対象フレーム上のブロックを動き補償により求めるときに参照する第１の参照フレームと第２の参照フレームとを選択するために、前記符号化済フレームに対して付与された第１相対インデックスと第２相対インデックスとを用いて、前記第１または第２少なくとも一方の参照フレームを選択する第一のステップと、
前記第１または第２少なくとも一方の参照フレーム上の動き補償により得られたブロックから画素補間により予測画像を生成する第二のステップと、
前記予測誤差を符号化し、予測誤差の符号化信号を含む画像符号化信号を出力する第三のステップと
を有する画像符号化方法における第二のステップにおいて、
前記第２の参照フレーム内で、前記符号化対象フレーム上の所定のブロックと同じ位置のブロックの動き補償で使用した動きベクトルが参照するフレームを前記第１の参照フレームとし、
第１相対インデックスの値と対応付けられたスケーリング係数を１個以上格納したスケーリング係数表を備え、前記第１相対インデックスに対応するスケーリング係数を前記スケーリング係数表から選択し、前記動きベクトルと前記スケーリング係数から前記第１の参照フレームに対する動きベクトルと前記第２の参照フレームに対する動きベクトルとを算出し、
前記第１の参照フレームに対する動きベクトルから得られるブロックと前記第２の参照フレームに対する動きベクトルから得られるブロックとから画素補間により予測画像を生成し、
前記第三のステップにおいて、
画像符号化信号中に前記スケーリング係数表を含めて出力することにより、
第１の参照フレーム毎にスケーリング係数をダイレクトモード用スケーリング係数表から選択でき、符号化対象フレームと参照フレームの表示時刻差を考慮したダイレクトモード用動きベクトル生成ができるため、ダイレクトモードの符号化効率を改善できる。
【００６９】
第２の発明は、
予測誤差の符号化信号を含む画像符号化信号を入力する第一のステップと、
マルチフレームバッファに格納されている複数の復号化済フレームから、復号化対象フレーム上のブロックを動き補償により求めるときに参照する第１の参照フレームと第２の参照フレームとを選択するために、前記復号化済フレームに対して付与された第１相対インデックスと第２相対インデックスとを用いて、前記第１または第２少なくとも一方の参照フレームを選択する第二ステップと、
前記第１または第２少なくとも一方の参照フレーム上の動き補償により得られたブロックから画素補間により予測画像を生成する第三のステップと、
前記予測画像と復号した予測誤差からフレームの復号画像を生成する第四のステップと、
フレーム間予測に使用される可能性があるフレームの復号画像をマルチフレームバッファに格納する第五のステップと
を有する画像復号方法における第一のステップにおいて、
画像符号化信号中のスケーリング係数表を復号し、
前記第二のステップにおいて、
前記復号化済フレームのうち前記復号化対象フレームより表示順が後で前記第２相対インデックスが最小の参照フレームを前記第２の参照フレームとして選択し、
前記第三のステップにおいて、
前記第２の参照フレーム内で、前記復号化対象フレーム上の所定のブロックと同じ位置のブロックの動き補償で使用した動きベクトルが参照するフレームを前記第１の参照フレームとし、前記第１相対インデックスに対応するスケーリング係数を前記スケーリング係数表から選択し、前記動きベクトルと前記スケーリング係数から前記第１の参照フレームに対する動きベクトルと前記第２の参照フレームに対する動きベクトルとを算出し、前記第１の参照フレームに対する動きベクトルから得られるブロックと前記第２の参照フレームに対する動きベクトルから得られるブロックとから画素補間により予測画像を生成することにより、
第１の参照フレーム毎にスケーリング係数をダイレクトモード用スケーリング係数表から選択でき、符号化対象フレームと参照フレームの表示時刻差を考慮したダイレクトモード用動きベクトル生成ができるため、ダイレクトモードの符号化効率を改善できる。
【００７０】
第３の発明は、
マルチフレームバッファに格納されている複数の符号化済フレームから、符号化対象フレーム上のブロックを動き補償により求めるときに参照する第１の参照フレームと第２の参照フレームとを選択するために、前記符号化済フレームに対して付与された第１相対インデックスと第２相対インデックスとを用いて、前記第１または第２少なくとも一方の参照フレームを選択する第一のステップと、
前記第１または第２少なくとも一方の参照フレーム上の動き補償により得られたブロックから画素補間により予測画像を生成する第二のステップと、
前記予測誤差を符号化し、予測誤差の符号化信号を含む画像符号化信号を出力する第三のステップとを有する画像符号化方法における第二のステップにおいて、
前記第２の参照フレーム内で、前記符号化対象フレーム上の所定のブロックと同じ位置のブロックの動き補償で使用した動きベクトルが参照するフレームを前記第１の参照フレームとし、
第１相対インデックスの値と対応付けられたスケーリング係数、および、画素補間あるいは第１の参照フレームからの予測あるいは第２の参照フレームからの予測を示す予測方法種別を１個以上格納したスケーリング係数・予測方法表を備え、
前記第１相対インデックスに対応するスケーリング係数と予測方法種別を前記スケーリング係数・予測方法表から選択し、前記動きベクトルと前記スケーリング係数から前記第１の参照フレームに対する動きベクトルと前記第２の参照フレームに対する動きベクトルとを算出し、
予測方法種別が画素補間の場合には、前記第１の参照フレームに対する動きベクトルから得られるブロックと前記第２の参照フレームに対する動きベクトルから得られるブロックとから画素補間により予測画像を生成し、予測方法種別が第１の参照フレームからの予測の場合には、前記第１の参照フレームに対する動きベクトルから得られるブロックを予測画像とし、予測方法種別が第２の参照フレームからの予測の場合には、前記第２の参照フレームに対する動きベクトルから得られるブロックを予測画像とし、
前記第三のステップにおいて、
画像符号化信号中に前記スケーリング係数・予測方法表を含めて出力することにより、
補間予測のみではなく、参照フレーム１枚のみを使用したダイレクトモードを使用できるため、符号化効率を改善できる。
【００７１】
第４の発明は、
予測誤差の符号化信号を含む画像符号化信号を入力する第一のステップと、
マルチフレームバッファに格納されている複数の復号化済フレームから、復号化対象フレーム上のブロックを動き補償により求めるときに参照する第１の参照フレームと第２の参照フレームとを選択するために、前記復号化済フレームに対して付与された第１相対インデックスと第２相対インデックスとを用いて、前記第１または第２少なくとも一方の参照フレームを選択する第二ステップと、
前記第１または第２少なくとも一方の参照フレーム上の動き補償により得られたブロックから画素補間により予測画像を生成する第三のステップと、
前記予測画像と復号した予測誤差からフレームの復号画像を生成する第四のステップと、
フレーム間予測に使用される可能性があるフレームの復号画像をマルチフレームバッファに格納する第五のステップと
を有する画像復号方法における第一のステップにおいて、
画像符号化信号中のスケーリング係数・予測方法表を復号し、
前記第二のステップにおいて、
前記復号化済フレームのうち前記復号化対象フレームより表示順が後で前記第２相対インデックスが最小の参照フレームを前記第２の参照フレームとして選択し、
前記第三のステップにおいて、
前記第２の参照フレーム内で、前記復号化対象フレーム上の所定のブロックと同じ位置のブロックの動き補償で使用した動きベクトルが参照するフレームを前記第１の参照フレームとし、
前記第１相対インデックスに対応するスケーリング係数と予測方法種別を前記スケーリング係数・予測方法表から選択し、前記動きベクトルと前記スケーリング係数から前記第１の参照フレームに対する動きベクトルと前記第２の参照フレームに対する動きベクトルとを算出し、
予測方法種別が画素補間の場合には、前記第１の参照フレームに対する動きベクトルから得られるブロックと前記第２の参照フレームに対する動きベクトルから得られるブロックとから画素補間により予測画像を生成し、予測方法種別が第１の参照フレームからの予測の場合には、前記第１の参照フレームに対する動きベクトルから得られるブロックを予測画像とし、予測方法種別が第２の参照フレームからの予測の場合には、前記第２の参照フレームに対する動きベクトルから得られるブロックを予測画像とすることにより、
補間予測のみではなく、参照フレーム１枚のみを使用したダイレクトモードを使用できるため、符号化効率を改善できる。
【００７２】
【発明の実施の形態】
以下、本発明の具体的な実施の形態について、図面を参照しながら説明する。
（実施の形態１）
図１は、実施の形態１の画像符号化装置のブロック図である。図２２における従来の画像符号化装置のブロック図と同じ動作をするユニットおよび同じ動作の信号は同じ記号を付し、説明を省略する。
【００７３】
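In H.26L, only one direct-mode scaling coefficient can be transmitted per frame, so the direct-mode scaling coefficient best suited to each block cannot be selected, and coding efficiency suffers. The present invention therefore provides a direct-mode scaling coefficient table storing a plurality of direct-mode scaling coefficients, and selects a coefficient from that table according to the first relative index of the direct mode, thereby improving the coding efficiency of the direct mode.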
【００７４】
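FIG. 2 shows a first example of the direct-mode scaling coefficient table used when selecting direct-mode coefficients. RIdx1 is the first relative index in direct mode, and each set of values N1, N2, D is a scaling coefficient. The scaling coefficient in each row of the table is the coefficient for the RIdx1 value at the start of that row.
【００７５】
Assuming that the object containing the block to be coded moves at a constant rate within the picture, the direct-mode scaling coefficient can be determined from the ratio between the display time difference from the frame to be coded to the first reference frame and the display time difference from the frame to be coded to the second reference frame.
【００７６】
Taking FIG. 20 as an example, the scaling coefficient is (N1, N2, D) = (2, −1, 3) when the first relative index RIdx1 is 0, as for block Blk0, and (N1, N2, D) = (5, −1, 6) when RIdx1 is 1, as for block Blk1. Determining the scaling coefficients for RIdx1 = 2 and above in the same way yields the scaling coefficients of the table in FIG. 2.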
【００７７】
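The direct-mode scaling coefficient table buffer SPTableBuf in FIG. 1 stores the direct-mode scaling coefficient table. The direct-mode scaling coefficient selection means SPSel receives the first relative index rRIdx1 obtained from the motion vector and frame number buffer MVFNBuf. It searches the first column of the table for the row satisfying RIdx1 = rRIdx1 and outputs the scaling coefficient (N1, N2, D) of that row as the direct-mode scaling coefficient SP.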
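The selection performed by SPSel, and its effect compared with a single per-frame coefficient, can be sketched as follows. The table holds only the two rows quoted in the text; the function names and the sample vector component are assumptions for illustration.

```python
from fractions import Fraction

# Direct-mode scaling coefficient table: RIdx1 -> (N1, N2, D).
SCALING_TABLE = {0: (2, -1, 3), 1: (5, -1, 6)}


def select_scaling(table, r_ridx1):
    """SPSel: return the row whose RIdx1 equals r_ridx1."""
    return table[r_ridx1]


def first_vector_x(mv0_x, scaling):
    """Horizontal component of the first direct-mode vector, N1*MV0/D."""
    n1, _n2, d = scaling
    return Fraction(n1 * mv0_x, d)


# A block whose scaling vector refers to the frame with RIdx1 = 1.
mv0_x = 12
per_block = first_vector_x(mv0_x, select_scaling(SCALING_TABLE, 1))
per_frame = first_vector_x(mv0_x, SCALING_TABLE[0])  # one coefficient per frame
```

Here the table-based selection yields 10 while reusing the coefficient for RIdx1 = 0 yields 8, which illustrates the mismatch that a per-frame coefficient produces for blocks referring to a more distant frame.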
【００７８】
画像符号化装置で使用したダイレクトモード用スケーリング係数表は、符号化信号に格納し画像復号装置に伝送することにより、画像復号装置でも同じダイレクトモード用スケーリング係数を使用することができる。そこで、ダイレクトモード用スケーリング係数表バッファＳＰＴａｂｌｅＢｕｆからダイレクトモード用スケーリング係数表ＳＰｓを取り出し、可変長符号化手段ＶＬＣ４により、残差符号化信号ＥＲｅｓ、予測種別ＰｒｅｄＴｙｐｅ、相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２とともに、可変長符号化を行い、符号化信号ＢｉｔＳｔｒｍ４として出力する。
【００７９】
ここで、ダイレクトモード用スケーリング係数表を画像符号化信号ＢｉｔＳｔｒｍ４中に符号化する方法について説明する。本実施の形態では、ダイレクトモード用スケーリング係数表の符号化方法として異なる４つの方法を示す。
【００８０】
第１のダイレクトモード用スケーリング係数表の符号化方法は、Ｎ１，Ｎ２，Ｄの３つのパラメータとも画像符号化信号ＢｉｔＳｔｒｍ４中に符号化する方法である。図３は、この方法による画像符号化信号フォーマット例である。図２１の従来の画像符号化装置の画像符号化信号フォーマットに対しヘッダＨｅａｄｅｒ部のみ異なるため、Ｈｅａｄｅｒ部のみ図示する。Ｎはダイレクトモード用係数表に含まれるダイレクトモード用スケーリング係数の個数を示す。
【００８１】
Ｎ１（ｒ），Ｎ２（ｒ），Ｄ（ｒ）は第１相対インデックスＲＩｄｘ１＝ｒに対するダイレクトモード用スケーリング係数を示す。Ｎ個分のダイレクトモード用スケーリング係数がＨｅａｄｅｒ部に格納されており、画像復号装置はこのＨｅａｄｅｒ部を解析することで第１相対インデックスＲＩｄｘ１に対するダイレクトモード用スケーリング係数を取得することができる。
【００８２】
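The second method of coding the direct-mode scaling coefficient table defines a relation that holds among D, N1, and N2 and stores only two of the three parameters in the coded image signal BitStrm4. FIG. 4 shows an example of the coded image signal format for this method.
【００８３】
Under the assumption that the object in the picture containing the block to be coded moves at a constant rate, the first and second motion vectors of the direct mode are the result of internally dividing the scaling vector by the ratio between the display time difference from the frame to be coded to the first reference frame and that from the frame to be coded to the second reference frame. If the frame rate and the display time interval of the reference frames are constant, the relation D = N1 − N2 holds for internal division. By fixing the relation D = N1 − N2 between the image encoding apparatus and the image decoding apparatus in advance, N1, N2, and D can all be determined even if any one of them is not transmitted to the image decoding apparatus. FIG. 4 shows the coded image signal format when D = N1 − N2 is assumed to hold and N1 is omitted. Formats that omit N2 or D can be defined in the same way.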
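Recovering the omitted parameter on the decoding side from D = N1 − N2 can be sketched as below. The function name `complete_scaling` and the keyword-argument interface are hypothetical; only the relation itself comes from the text.

```python
def complete_scaling(n1=None, n2=None, d=None):
    """Fill in the one omitted parameter using D = N1 - N2."""
    missing = [v for v in (n1, n2, d) if v is None]
    if len(missing) != 1:
        raise ValueError("exactly one of n1, n2, d must be omitted")
    if n1 is None:
        n1 = d + n2          # N1 omitted, as in the FIG. 4 format
    elif n2 is None:
        n2 = n1 - d
    else:
        d = n1 - n2
    return n1, n2, d


blk0 = complete_scaling(n2=-1, d=3)   # N1 not transmitted
blk1 = complete_scaling(n1=5, d=6)    # N2 not transmitted
```

Each table row then costs two coded values instead of three, at the price of both sides agreeing on the relation in advance.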
【００８４】
第３のダイレクトモード用スケーリング係数表の符号化方法は、Ｎ１，Ｎ２，Ｄのいずれかの値を表中の全ダイレクトモード用スケーリング係数で共通にする方法である。
【００８５】
ここでは、ダイレクトモードの第１動きベクトル、第２動きベクトルは、スケーリング用ベクトルを、符号化対象フレームと第１参照フレームの表示時刻差と符号化対象フレームと第２参照フレームの表示時刻差との比で内分した結果になると考える。
【００８６】
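When the second reference frame used for direct mode is always the same reference frame within a frame, the display time difference between the frame to be coded and the second reference frame is constant within that frame. N2, which is used to calculate the second motion vector, can therefore be fixed to a constant value C, and direct-mode scaling coefficients expressing internal division by display time difference can be generated by varying N1 and D. In the case of FIG. 20, for example, C can be −1. FIG. 5 shows an example of the coded image signal format for this method.
【００８７】
The Header stores only one value of N2, which is used in common by direct-mode scaling coefficients 1 to N.
【００８８】
Similarly, when the first reference frame used for direct mode is always the same reference frame within a frame, only the single N1 value common to all direct-mode scaling coefficients may be stored in the Header, together with N pairs of N2 and D.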
【００８９】
同様に、ダイレクトモードに使用する第１参照フレームと第２参照フレームとの間隔がフレーム内で常に一定であるとき、Ｄに関しても全てのダイレクトモード用スケーリング係数に共通に使用される値のみＨｅａｄｅｒ部に格納し、Ｎ１，Ｎ２の組Ｎ個をＨｅａｄｅｒ部に格納するようにしてもよい。
【００９０】
第４のダイレクトモード用スケーリング係数表の符号化方法は、Ｎ１，Ｎ２，Ｄ生成のための計算式を既定する方法である。
【００９１】
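For example, if the time interval between reference frames is constant and the first and second motion vectors are the result of internally dividing the scaling vector, D, N1, and N2 can be calculated by the following formulas.
D(i) = K·(i + 1), N2(i) = a, N1(i) = D(i) + N2(i)
(i is the value of the first relative index; K is the frame interval of the reference frames, for example the interval between Ref1 and Ref2 in FIG. 20; a is the signed interval from the second reference frame to the frame to be coded, for example between Frm and Ref3 in FIG. 20, which is −1 when N2 = −1 as in the example above. The third formula follows from the relation D = N1 − N2.)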
【００９２】
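FIG. 6 shows an example of the coded image signal format for this method. Only K and a are coded in the coded signal. Because the image encoding apparatus and the image decoding apparatus both use the formulas above, the same direct-mode scaling coefficients are generated on the encoding side and the decoding side.
This concludes the description of the methods for coding the direct-mode scaling coefficient table.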
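The fourth method, in which both sides regenerate the table from K and a, can be sketched as follows. One point is an assumption rather than a quotation: the sketch computes N1 as D + N2 and treats a as a signed value, so that the result agrees with the relation D = N1 − N2 stated for the second method and with the coefficients (2, −1, 3) and (5, −1, 6) given for FIG. 20. The function name is hypothetical.

```python
def generate_scaling_table(k, a, count):
    """Rebuild the direct-mode scaling table from the two coded values.

    k:     frame interval between successive reference frames.
    a:     signed value used for N2 (assumed negative for a second
           reference frame displayed after the frame to be coded).
    count: number of table rows (first relative index 0 .. count-1).
    """
    table = {}
    for i in range(count):
        d = k * (i + 1)
        n2 = a
        n1 = d + n2          # chosen so that D = N1 - N2 holds
        table[i] = (n1, n2, d)
    return table


# K = 3 and a = -1 reproduce the two coefficients quoted for FIG. 20.
table = generate_scaling_table(k=3, a=-1, count=2)
```

Because the encoder and decoder run the same function on the same two values, no per-row coefficient needs to be transmitted.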
【００９３】
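Next, consider the case where a direct-mode motion vector refers to a frame stored in the long-term frame buffer, as shown in FIG. 19. FIG. 7 shows an example of the direct-mode scaling coefficient table for this case. Here, one common direct-mode scaling coefficient is set for all reference frames in the long-term frame buffer. For frames in the long-term frame buffer, the coded image signal format stores only this common coefficient. Because frames in the long-term frame buffer serve as reference frames for a long time, they are assumed to be images with almost no motion, so assigning the same coefficient to all of them causes no problem. The amount of code is reduced because only one direct-mode scaling coefficient needs to be sent for all frames in the long-term frame buffer.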
【００９４】
本実施の形態では、Ｎ１，Ｎ２，Ｄの値は整数値であり、各整数値に対し可変長符号語を割り当てることができる。しかし、Ｎ１，Ｎ２，Ｄの値の範囲を制限することにより、より効率的な可変長符号語の割り当てができる。図８はダイレクトモード用スケーリング係数に対する符号語表の例である。（Ａ）はＮ１に対する可変長符号語表、（Ｂ）はＮ２に対する可変長符号語表、（Ｃ）はＤに対する可変長符号語表である。
【００９５】
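Assuming that the object in the picture containing the block to be coded moves at a constant rate, the scaling vector can be internally divided to give the first and second motion vectors. Internal division is achieved when D and N1 are positive and N2 is negative, so restricting D and N1 to positive values and N2 to negative values causes no problem.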
【００９６】
そこで、図８で示すように、Ｄ，Ｎ１は正値のみ、Ｎ２は負値のみ可変長符号語を割り当てるようにしてもよい。使用しない値に可変長符号語を割り当てなくてもすむため符号化効率が高くなる。なお、Ｎ１の値は正値としたが０以上の値として、０にも可変長符号語を割り当ててもよい。同様にＮ２の値は０以下の値として、０にも可変長符号語を割り当ててもよい。また、本例では各Ｎ１，Ｎ２，Ｄに１つの可変長符号語を割り当てたが、Ｎ１，Ｎ２，Ｄ値の組み合わせに対して１つの可変長符号語を割り当ててもよい。
【００９７】
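As described above, the image encoding apparatus of this embodiment selects the direct-mode scaling coefficient corresponding to the first relative index of the direct mode from the direct-mode scaling coefficient table. Direct-mode motion vectors can thus be generated in consideration of the display time difference between the frame to be coded and the reference frame, which improves the coding efficiency of the direct mode.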
【００９８】
（実施の形態２）
図９は、実施の形態２の画像復号装置のブロック図である。図２３における従来の画像復号装置の構成を示すブロック図と同じ動作をするユニットおよび同じ動作の信号は同じ記号を付し、説明を省略する。
【００９９】
可変長復号手段ＶＬＤ４は、画像符号化信号ＢｉｔＳｔｒｍ４を入力し、可変長復号を行い、残差符号化信号ＥＲｅｓ、予測種別ＰｒｅｄＴｙｐｅ、相対インデックスＲＩｄｘ１，ＲＩｄｘ２、動きベクトルＭＶ１，ＭＶ２、ダイレクトモード用スケーリング係数表ＳＰｓを出力する。ダイレクトモード用スケーリング係数表ＳＰｓは、ダイレクトモード用スケーリング係数表バッファＳＰＴａｂｌｅＢｕｆに格納する。ダイレクトモード用スケーリング係数選択手段ＳＰＳｅｌは、実施の形態１で示した図１の画像符号化装置のダイレクトモード用スケーリング係数選択手段ＳＰＳｅｌと同じ処理を行うとする。
【０１００】
画像符号化信号中からダイレクトモード係数表ＳＰｓを復号する場合には、実施の形態１で説明したダイレクトモード用スケーリング係数表の符号化方法に対応した復号を行う。
【０１０１】
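For example, when a relation among N1, N2, and D is defined and one parameter is omitted, as in the second method of coding the direct-mode scaling coefficient table described in Embodiment 1, the unknown parameter is calculated from that relation.
【０１０２】
When one of the parameters is common to all direct-mode scaling coefficients and only one copy of it is stored in the coded image signal, as in the coded image signal format of FIG. 5, the common parameter is decoded and used for all direct-mode scaling coefficients.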
【０１０３】
図６の画像符号化信号フォーマットのように、所定の計算式によりパラメータ値を計算できる場合には、その所定の計算式を使用して得られたパラメータ値を全ダイレクトモード用スケーリング係数に使用する。
【０１０４】
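Next, two methods are described for handling the case where the direct-mode scaling coefficient table contains no coefficient for the relative index rRIdx.
【０１０５】
The first method is described first. FIG. 10 shows the first direct-mode scaling coefficient selection flow of this embodiment, where rRIdx is the first relative index in direct mode. Step F11 checks whether the table contains a direct-mode scaling coefficient for RIdx1 = rRIdx. If it does, step F13 selects the coefficient corresponding to rRIdx. If it does not, step F12 selects the coefficient with the largest RIdx1 value in the table. The selection process then ends. The coefficient with the largest RIdx1 is chosen because, for an index missing from the table, the nearest coefficient in the table is the one with the largest RIdx1 value. This follows from the default assignment described in the prior art: first relative index values are assigned from 0 upward, nearest first, to reference frames whose display time precedes that of the frame to be coded.
【０１０６】
The second method is described next. FIG. 11 shows the second direct-mode scaling coefficient selection flow of this embodiment. It differs from FIG. 10 in that, when the table contains no direct-mode scaling coefficient for RIdx1 = rRIdx, step F22 uses a predefined direct-mode scaling coefficient.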
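The two selection flows for a missing table entry can be sketched as below. The function names are hypothetical, and the predefined coefficient `(1, -1, 2)` is an illustrative placeholder only, since the source does not state which value is predefined.

```python
def select_with_max_fallback(table, r_ridx):
    """First flow (FIG. 10): fall back to the row with the largest RIdx1."""
    if r_ridx in table:            # F11: is there a row for r_ridx?
        return table[r_ridx]       # F13: use it
    return table[max(table)]       # F12: use the row with the largest RIdx1


def select_with_default_fallback(table, r_ridx, default):
    """Second flow (FIG. 11): fall back to a predefined coefficient."""
    return table.get(r_ridx, default)   # F22 when the row is missing


table = {0: (2, -1, 3), 1: (5, -1, 6)}
PREDEFINED = (1, -1, 2)            # hypothetical placeholder value

found = select_with_max_fallback(table, 1)
nearest = select_with_max_fallback(table, 4)
fallback = select_with_default_fallback(table, 4, PREDEFINED)
```

The first flow approximates a distant reference frame by the most distant one the table knows about, while the second keeps decoder behavior fixed regardless of the table contents.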
【０１０７】
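As described above, this embodiment can correctly decode a coded image signal produced by an image encoding apparatus using the image encoding method of the present invention described in Embodiment 1. In addition, because a plurality of direct-mode scaling coefficients are used, direct-mode motion vectors can be generated in consideration of the display time difference between the frame to be coded and the reference frame even when there are several candidate reference frames, which improves coding efficiency.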
【０１０８】
（実施の形態３）
図１２は、実施の形態３の画像符号化装置のブロック図である。図１における実施の形態１の画像符号化装置のブロック図と同じ動作をするユニットおよび同じ動作の信号は同じ記号を付し、説明を省略する。
【０１０９】
従来のダイレクトモードでは予測方法として補間予測のみしか使用できなかったが、本実施の形態ではダイレクトモードとして１枚の参照フレームからのフレーム予測も使用することができる。例えば、図２０において、Ｒｅｆ１とＦｒｍの間でシーンチェンジが発生すると、Ｒｅｆ１，Ｒｅｆ２とＦｒｍの間に相関がなくなり、フレーム間予測の効果がなくなる。この場合、相関がない参照フレームＲｅｆ１，Ｒｅｆ２を使用した補間予測よりも、Ｒｅｆ３のみを参照フレームとする予測の方が予測効率は高くなる。
【０１１０】
従って、１枚の参照フレームのみを使用する第１参照フレーム予測や第２参照フレーム予測をダイレクトモードとして使用すれば符号化効率を上げることができる。
【０１１１】
ダイレクトモード用スケーリング係数・予測方法表バッファＳＰＰＲｅｄＴａｂｌｅＢｕｆに格納されたダイレクトモード用スケーリング係数・予測方法表について、図１３を用いて説明する。図２と図１３の表の違いは、図１３の表には予測方法が追加されていることである。
【０１１２】
ダイレクトモード用スケーリング係数・予測方法選択手段ＳＰＰｒｅｄＳｅｌは、第１相対インデックスｒＲＩｄｘ１に対応するダイレクトモード用スケーリング係数ＳＰとダイレクトモード予測方法ＤＰｒｅｄの組を、ダイレクトモード用スケーリング係数・予測方法表バッファＳＰＰＲｅｄＴａｂｌｅＢｕｆから選択し出力する。
【０１１３】
ダイレクトモード予測方法ＤＰｒｅｄが第１参照フレーム予測を示す場合には、ダイレクトモード用ベクトル・相対インデックス生成手段ＧｅｎＭＶＲｅｆＩｄｘ２は、第１相対インデックスｓＲＩｄｘ１、第１動きベクトルｓＭＶ１のみを出力する。
【０１１４】
ダイレクトモード予測方法ＤＰｒｅｄが第２参照フレーム予測を示す場合には、ダイレクトモード用ベクトル・相対インデックス生成手段ＧｅｎＭＶＲｅｆＩｄｘ２は、第２相対インデックスｓＲＩｄｘ２、第２動きベクトルｓＭＶ２のみを出力する。
【０１１５】
ダイレクトモード選択時で、ダイレクトモード予測方法ＤＰｒｅｄが第１参照フレーム予測を示す場合には、予測種別選択手段ＰｒｅｄＳｅｌは、スイッチＳＷ１１を”１”側に切り替え、第１相対インデックスＲＩｄｘ１、第１参照ベクトルＭＶ１が示す参照ブロックＲｅｆＢｌｋ１を予測に使用する。ダイレクトモード選択時で、ダイレクトモード予測方法ＤＰｒｅｄが第２参照フレーム予測を示す場合には、予測種別選択手段ＰｒｅｄＳｅｌは、スイッチＳＷ１１を”１”側に切り替え、第２相対インデックスＲＩｄｘ２、第２動きベクトルＭＶ２が示す参照ブロックＲｅｆＢｌｋ２を予測に使用する。
【０１１６】
ダイレクトモード予測方法ＤＰｒｅｄが補間予測を示す場合には、スイッチＳＷ１１を”０”側に切り替え、第１相対インデックスＲＩｄｘ１、第１参照ベクトルＭＶ１が示す参照ブロックＲｅｆＢｌｋ１と、第２相対インデックスＲＩｄｘ２、第２参照ベクトルＭＶ２が示す参照ブロックＲｅｆＢｌｋ２を補間予測に使用する。
【０１１７】
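So that the image encoding apparatus and the image decoding apparatus can use the same direct-mode scaling coefficient and prediction method table SPPreds, the table is stored in the coded signal. By using the values of the direct-mode scaling coefficients N1 and N2 as described below, the direct-mode prediction method DPred can be signaled to the image decoding apparatus without coding the prediction methods in the table, that is, with the coded image signal format of Embodiment 1 unchanged.
【０１１８】
When a direct-mode scaling coefficient whose N2 is 0 is selected from the table of FIG. 13, first-reference-frame prediction is defined to be used instead of interpolation prediction. If the scaling vector is MV, the first motion vector is calculated as (N1 × MV) / D.
【０１１９】
Similarly, when a direct-mode scaling coefficient whose N1 is 0 is selected, second-reference-frame prediction is defined to be used, and the second motion vector is calculated as (N2 × MV) / D.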
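Inferring the direct-mode prediction method from a zero coefficient can be sketched as follows. The function name, the string labels, and the use of exact fractions are assumptions; the case where both N1 and N2 are 0 is not covered by the source and is rejected here for that reason.

```python
from fractions import Fraction


def direct_prediction(scaling, mv):
    """Return (method, first_vector, second_vector) for a direct-mode block.

    N2 == 0 selects first-reference-frame prediction, N1 == 0 selects
    second-reference-frame prediction, otherwise interpolation is used.
    """
    n1, n2, d = scaling
    if n1 == 0 and n2 == 0:
        raise ValueError("N1 and N2 both zero: not defined in the source")

    def scale(n):
        return tuple(Fraction(n * c, d) for c in mv)

    if n2 == 0:
        return "first_reference", scale(n1), None
    if n1 == 0:
        return "second_reference", None, scale(n2)
    return "interpolation", scale(n1), scale(n2)


single = direct_prediction((1, 0, 1), (4, 2))
both = direct_prediction((2, -1, 3), (6, -3))
```

Since the method is carried by the coefficient values themselves, the table needs no extra field and the coded signal format of Embodiment 1 can be reused.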
【０１２０】
可変長符号化手段ＶＬＣ５は、残差符号化信号ＥＲｅｓ、予測種別ＰｒｅｄＴｙｐｅ、相対インデックスＲｅｆＩｄｘ１，ＲｅｆＩｄｘ２、動きベクトルＭＶ１，ＭＶ２、ダイレクトモード用スケーリング係数・予測方法表ＳＰＰｒｅｄｓを可変長符号化し、符号化信号ＢｉｔＳｔｒｍ５として出力する。
【０１２１】
以上のように本実施の形態によれば、シーンチェンジなどにより符号化対象フレームと相関が高い参照フレームが１枚しかなくなり補間予測の効果がなくなった場合でも、相関が高い参照フレームのみを使用したダイレクトモードを使用できるため、符号化効率を改善できる。
【０１２２】
（実施の形態４）
図１４は、実施の形態４の画像復号装置のブロック図である。図９における実施の形態２の画像復号装置のブロック図と同じ動作をするユニットおよび同じ動作の信号は同じ記号を付し、説明を省略する。
【０１２３】
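The variable-length decoding means VLD5 receives the coded image signal BitStrm5, performs variable-length decoding, and outputs the coded residual signal ERes, the prediction type PredType, the relative indices RIdx1 and RIdx2, the motion vectors MV1 and MV2, and the direct-mode scaling coefficient and prediction method table SPPreds.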
【０１２４】
ダイレクトモード用スケーリング係数・予測方法表ＳＰＰｒｅｄｓは、ダイレクトモード用スケーリング係数・予測方法表バッファＳＰＰＲｅｄＴａｂｌｅＢｕｆに格納する。ダイレクトモード用スケーリング係数・予測方法選択手段ＳＰＰｒｅｄＳｅｌは、第１相対インデックスｒＲＩｄｘ１に対応するダイレクトモード用スケーリング係数ＳＰとダイレクトモード予測方法ＤＰｒｅｄの組を、ダイレクトモード用スケーリング係数・予測方法表バッファＳＰＰＲｅｄＴａｂｌｅＢｕｆから選択し出力する。
【０１２５】
ダイレクトモード予測方法ＤＰｒｅｄが第１参照フレーム予測を示す場合には、ダイレクトモード用ベクトル・相対インデックス生成手段ＧｅｎＭＶＲｅｆＩｄｘ２は、第１相対インデックスｓＲＩｄｘ１、第１動きベクトルｓＭＶ１のみを出力する。
【０１２６】
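When the direct-mode prediction method DPred indicates second-reference-frame prediction, the direct-mode vector and relative index generation means GenMVRefIdx2 outputs only the second relative index sRIdx2 and the second motion vector sMV2.
【０１２７】
When direct mode is selected and DPred indicates first-reference-frame prediction, switch SW23 is set to the "1" side and the reference block RefBlk indicated by the first relative index nRIdx1 and the first motion vector nMV1 is used for prediction.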
【０１２８】
ダイレクトモード選択時で、ダイレクトモード予測方法ＤＰｒｅｄが第２参照フレーム予測を示す場合には、スイッチＳＷ２３を”１”側に切り替え、第２相対インデックスｎＲＩｄｘ２、第２動きベクトルｎＭＶ２が示す参照ブロックＲｅｆＢｌｋを予測に使用する。
【０１２９】
ダイレクトモード予測方法ＤＰｒｅｄが補間予測を示す場合には、スイッチＳＷ２３を”０”側に切り替え、第１相対インデックスｎＲＩｄｘ１、第１参照ベクトルｎＭＶ１が示す参照ブロックＲｅｆＢｌｋ１と、第２相対インデックスｎＲＩｄｘ２、第２参照ベクトルｎＭＶ２が示す参照ブロックＲｅｆＢｌｋ２を補間予測に使用する。
【０１３０】
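As described above, the image decoding apparatus of this embodiment decodes the direct-mode scaling coefficient and prediction method table in the coded signal and uses the direct-mode scaling coefficient from that table that corresponds to the value of the direct-mode relative index. It can therefore correctly decode a coded image signal produced by the image encoding apparatus described in Embodiment 3.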
【０１３１】
（実施の形態５）
さらに、上記各実施の形態で示した画像符号化方法または画像復号方法の構成を実現するためのプログラムを、フレキシブルディスク等の記憶媒体に記録するようにすることにより、上記各実施の形態で示した処理を、独立したコンピュータシステムにおいて簡単に実施することが可能となる。
【０１３２】
図１５は、実施の形態１〜実施の形態４の画像符号化方法および画像復号方法をコンピュータシステムにより実現するためのプログラムを格納するための記憶媒体についての説明図である。
【０１３３】
図１５（ｂ）は、フレキシブルディスクの正面からみた外観、断面構造、及びフレキシブルディスクを示し、図１５（ａ）は、記録媒体本体であるフレキシブルディスクの物理フォーマットの例を示している。フレキシブルディスクＦＤはケースＦ内に内蔵され、該ディスクの表面には、同心円状に外周からは内周に向かって複数のトラックＴｒが形成され、各トラックは角度方向に１６のセクタＳｅに分割されている。従って、上記プログラムを格納したフレキシブルディスクでは、上記フレキシブルディスクＦＤ上に割り当てられた領域に、上記プログラムとしての画像符号化方法が記録されている。
【０１３４】
また、図１５（ｃ）は、フレキシブルディスクＦＤに上記プログラムの記録再生を行うための構成を示す。上記プログラムをフレキシブルディスクＦＤに記録する場合は、コンピュータシステムＣｓから上記プログラムとしての画像符号化方法または画像復号化方法をフレキシブルディスクドライブＦＤＤを介して書き込む。また、フレキシブルディスク内のプログラムにより上記画像符号化方法をコンピュータシステム中に構築する場合は、フレキシブルディスクドライブによりプログラムをフレキシブルディスクから読み出し、コンピュータシステムに転送する。
【０１３５】
なお、上記説明では、記録媒体としてフレキシブルディスクを用いて説明を行ったが、光ディスクを用いても同様に行うことができる。また、記録媒体はこれに限らず、ＩＣカード、ＲＯＭカセット等、プログラムを記録できるものであれば同様に実施することができる。
【０１３６】
さらにここで、上記実施の形態で示した画像符号化装置または画像復号化装置の応用例とそれを用いたシステムを説明する。
【０１３７】
図２４は、コンテンツ配信サービスを実現するコンテンツ供給システムｅｘ１００の全体構成を示すブロック図である。通信サービスの提供エリアを所望の大きさに分割し、各セル内にそれぞれ固定無線局である基地局ｅｘ１０７〜ｅｘ１１０が設置されている。
【０１３８】
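In this content supply system ex100, devices such as a computer ex111, a PDA (personal digital assistant) ex112, a camera ex113, a mobile phone ex114, and a camera-equipped mobile phone ex115 are connected to the Internet ex101 via, for example, an Internet service provider ex102, a telephone network ex104, and base stations ex107 to ex110.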
【０１３９】
しかし、コンテンツ供給システムｅｘ１００は図２４のような組合せに限定されず、いずれかを組み合わせて接続するようにしてもよい。また、固定無線局である基地局ｅｘ１０７〜ｅｘ１１０を介さずに、各機器が電話網ｅｘ１０４に直接接続されてもよい。
【０１４０】
カメラｅｘ１１３はデジタルビデオカメラ等の動画撮影が可能な機器である。また、携帯電話は、ＰＤＣ（Ｐｅｒｓｏｎａｌ　Ｄｉｇｉｔａｌ　Ｃｏｍｍｕｎｉｃａｔｉｏｎｓ）方式、ＣＤＭＡ（Ｃｏｄｅ　Ｄｉｖｉｓｉｏｎ　Ｍｕｌｔｉｐｌｅ　Ａｃｃｅｓｓ）方式、Ｗ−ＣＤＭＡ（Ｗｉｄｅｂａｎｄ−Ｃｏｄｅ　Ｄｉｖｉｓｉｏｎ　Ｍｕｌｔｉｐｌｅ　Ａｃｃｅｓｓ）方式、若しくはＧＳＭ（Ｇｌｏｂａｌ　Ｓｙｓｔｅｍ　ｆｏｒ　Ｍｏｂｉｌｅ　Ｃｏｍｍｕｎｉｃａｔｉｏｎｓ）方式の携帯電話機、またはＰＨＳ（Ｐｅｒｓｏｎａｌ　Ｈａｎｄｙｐｈｏｎｅ　Ｓｙｓｔｅｍ）等であり、いずれでも構わない。
【０１４１】
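The streaming server ex103 is connected to the camera ex113 through the base station ex109 and the telephone network ex104, which enables, for example, live distribution based on coded data that a user transmits with the camera ex113. The captured data may be coded by the camera ex113 or by a server or the like that handles data transmission. Moving picture data captured by a camera ex116 may also be transmitted to the streaming server ex103 via the computer ex111. The camera ex116 is a device such as a digital camera that can capture still and moving pictures. In this case the moving picture data may be coded by either the camera ex116 or the computer ex111, and the coding is performed by an LSI ex117 included in the computer ex111 or the camera ex116. The software for image encoding and decoding may be stored on any storage medium (a CD-ROM, flexible disk, hard disk, or the like) that is a recording medium readable by the computer ex111 or similar. Moving picture data may also be transmitted by the camera-equipped mobile phone ex115; that data is coded by an LSI included in the mobile phone ex115.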
【０１４２】
このコンテンツ供給システムｅｘ１００では、ユーザがカメラｅｘ１１３、カメラｅｘ１１６等で撮影しているコンテンツ（例えば、音楽ライブを撮影した映像等）を上記実施の形態同様に符号化処理してストリーミングサーバｅｘ１０３に送信する一方で、ストリーミングサーバｅｘ１０３は要求のあったクライアントに対して上記コンテンツデータをストリーム配信する。クライアントとしては、上記符号化処理されたデータを復号化することが可能な、コンピュータｅｘ１１１、ＰＤＡｅｘ１１２、カメラｅｘ１１３、携帯電話ｅｘ１１４等がある。このようにすることでコンテンツ供給システムｅｘ１００は、符号化されたデータをクライアントにおいて受信して再生することができ、さらにクライアントにおいてリアルタイムで受信して復号化し、再生することにより、個人放送をも実現可能になるシステムである。
【０１４３】
このシステムを構成する各機器の符号化、復号化には上記各実施の形態で示した画像符号化装置あるいは画像復号化装置を用いるようにすればよい。
【０１４４】
その一例として携帯電話について説明する。
図２５は、上記実施の形態で説明した画像符号化装置および画像復号化装置を用いた携帯電話ｅｘ１１５を示す図である。携帯電話ｅｘ１１５は、基地局ｅｘ１１０との間で電波を送受信するためのアンテナｅｘ２０１、ＣＣＤカメラ等の映像、静止画を撮ることが可能なカメラ部ｅｘ２０３、カメラ部ｅｘ２０３で撮影した映像、アンテナｅｘ２０１で受信した映像等が復号化されたデータを表示する液晶ディスプレイ等の表示部ｅｘ２０２、操作キーｅｘ２０４群から構成される本体部、音声出力をするためのスピーカ等の音声出力部ｅｘ２０８、音声入力をするためのマイク等の音声入力部ｅｘ２０５、撮影した動画もしくは静止画のデータ、受信したメールのデータ、動画のデータもしくは静止画のデータ等、符号化されたデータまたは復号化されたデータを保存するための記憶メディアｅｘ２０７、携帯電話ｅｘ１１５に記憶メディアｅｘ２０７を装着可能とするためのスロット部ｅｘ２０６を有している。記憶メディアｅｘ２０７はＳＤカード等のプラスチックケース内に電気的に書換えや消去が可能な不揮発性メモリであるＥＥＰＲＯＭ（Ｅｌｅｃｔｒｉｃａｌｌｙ　Ｅｒａｓａｂｌｅ　ａｎｄ　Ｐｒｏｇｒａｍｍａｂｌｅ　Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）の一種であるフラッシュメモリ素子を格納したものである。
【０１４５】
さらに、携帯電話ｅｘ１１５について図２６を用いて説明する。携帯電話ｅｘ１１５は表示部ｅｘ２０２及び操作キーｅｘ２０４を備えた本体部の各部を統括的に制御するようになされた主制御部ｅｘ３１１に対して、電源回路部ｅｘ３１０、操作入力制御部ｅｘ３０４、画像符号化部ｅｘ３１２、カメラインターフェース部ｅｘ３０３、ＬＣＤ（Ｌｉｑｕｉｄ　Ｃｒｙｓｔａｌ　Ｄｉｓｐｌａｙ）制御部ｅｘ３０２、画像復号化部ｅｘ３０９、多重分離部ｅｘ３０８、記録再生部ｅｘ３０７、変復調回路部ｅｘ３０６及び音声処理部ｅｘ３０５が同期バスｅｘ３１３を介して互いに接続されている。
【０１４６】
電源回路部ｅｘ３１０は、ユーザの操作により終話及び電源キーがオン状態にされると、バッテリパックから各部に対して電力を供給することによりカメラ付ディジタル携帯電話ｅｘ１１５を動作可能な状態に起動する。
【０１４７】
携帯電話ｅｘ１１５は、ＣＰＵ、ＲＯＭ及びＲＡＭ等でなる主制御部ｅｘ３１１の制御に基づいて、音声通話モード時に音声入力部ｅｘ２０５で集音した音声信号を音声処理部ｅｘ３０５によってディジタル音声データに変換し、これを変復調回路部ｅｘ３０６でスペクトラム拡散処理し、送受信回路部ｅｘ３０１でディジタルアナログ変換処理及び周波数変換処理を施した後にアンテナｅｘ２０１を介して送信する。また携帯電話機ｅｘ１１５は、音声通話モード時にアンテナｅｘ２０１で受信した受信信号を増幅して周波数変換処理及びアナログディジタル変換処理を施し、変復調回路部ｅｘ３０６でスペクトラム逆拡散処理し、音声処理部ｅｘ３０５によってアナログ音声信号に変換した後、これを音声出力部ｅｘ２０８を介して出力する。
【０１４８】
さらに、データ通信モード時に電子メールを送信する場合、本体部の操作キーｅｘ２０４の操作によって入力された電子メールのテキストデータは操作入力制御部ｅｘ３０４を介して主制御部ｅｘ３１１に送出される。主制御部ｅｘ３１１は、テキストデータを変復調回路部ｅｘ３０６でスペクトラム拡散処理し、送受信回路部ｅｘ３０１でディジタルアナログ変換処理及び周波数変換処理を施した後にアンテナｅｘ２０１を介して基地局ｅｘ１１０へ送信する。
【０１４９】
データ通信モード時に画像データを送信する場合、カメラ部ｅｘ２０３で撮像された画像データをカメラインターフェース部ｅｘ３０３を介して画像符号化部ｅｘ３１２に供給する。また、画像データを送信しない場合には、カメラ部ｅｘ２０３で撮像した画像データをカメラインターフェース部ｅｘ３０３及びＬＣＤ制御部ｅｘ３０２を介して表示部ｅｘ２０２に直接表示することも可能である。
【０１５０】
画像符号化部ｅｘ３１２は、本願発明で説明した画像符号化装置を備えた構成であり、カメラ部ｅｘ２０３から供給された画像データを上記実施の形態で示した画像符号化装置に用いた符号化方法によって圧縮符号化することにより符号化画像データに変換し、これを多重分離部ｅｘ３０８に送出する。また、このとき同時に携帯電話機ｅｘ１１５は、カメラ部ｅｘ２０３で撮像中に音声入力部ｅｘ２０５で集音した音声を音声処理部ｅｘ３０５を介してディジタルの音声データとして多重分離部ｅｘ３０８に送出する。
【０１５１】
多重分離部ｅｘ３０８は、画像符号化部ｅｘ３１２から供給された符号化画像データと音声処理部ｅｘ３０５から供給された音声データとを所定の方式で多重化し、その結果得られる多重化データを変復調回路部ｅｘ３０６でスペクトラム拡散処理し、送受信回路部ｅｘ３０１でディジタルアナログ変換処理及び周波数変換処理を施した後にアンテナｅｘ２０１を介して送信する。
【０１５２】
データ通信モード時にホームページ等にリンクされた動画像ファイルのデータを受信する場合、アンテナｅｘ２０１を介して基地局ｅｘ１１０から受信した受信信号を変復調回路部ｅｘ３０６でスペクトラム逆拡散処理し、その結果得られる多重化データを多重分離部ｅｘ３０８に送出する。
【０１５３】
また、アンテナｅｘ２０１を介して受信された多重化データを復号化するには、多重分離部ｅｘ３０８は、多重化データを分離することにより符号化画像データと音声データとに分け、同期バスｅｘ３１３を介して当該符号化画像データを画像復号化部ｅｘ３０９に供給すると共に当該音声データを音声処理部ｅｘ３０５に供給する。
【０１５４】
次に、画像復号化部ｅｘ３０９は、本願発明で説明した画像復号化装置を備えた構成であり、符号化画像データを上記実施の形態で示した符号化方法に対応した復号化方法で復号することにより再生動画像データを生成し、これをＬＣＤ制御部ｅｘ３０２を介して表示部ｅｘ２０２に供給し、これにより、例えばホームページにリンクされた動画像ファイルに含まれる動画データが表示される。このとき同時に音声処理部ｅｘ３０５は、音声データをアナログ音声信号に変換した後、これを音声出力部ｅｘ２０８に供給し、これにより、例えばホームページにリンクされた動画像ファイルに含まる音声データが再生される。
【０１５５】
なお、上記システムの例に限られず、最近は衛星、地上波によるディジタル放送が話題となっており、図２７に示すようにディジタル放送用システムにも上記実施の形態の少なくとも画像符号化装置または画像復号化装置のいずれかを組み込むことができる。具体的には、放送局ｅｘ４０９では映像情報の符号化ビットストリームが電波を介して通信または放送衛星ｅｘ４１０に伝送される。これを受けた放送衛星ｅｘ４１０は、放送用の電波を発信し、この電波を衛星放送受信設備をもつ家庭のアンテナｅｘ４０６で受信し、テレビ（受信機）ｅｘ４０１またはセットトップボックス（ＳＴＢ）ｅｘ４０７などの装置により符号化ビットストリームを復号化してこれを再生する。また、記録媒体である蓄積メディアｅｘ４０２に記録した符号化ビットストリームを読み取り、復号化する再生装置ｅｘ４０３にも上記実施の形態で示した画像復号化装置を実装することが可能である。この場合、再生された映像信号はモニタｅｘ４０４に表示される。また、ケーブルテレビ用のケーブルｅｘ４０５または衛星／地上波放送のアンテナｅｘ４０６に接続されたセットトップボックスｅｘ４０７内に画像復号化装置を実装し、これをテレビのモニタｅｘ４０８で再生する構成も考えられる。このときセットトップボックスではなく、テレビ内に画像符号化装置を組み込んでも良い。また、アンテナｅｘ４１１を有する車ｅｘ４１２で衛星ｅｘ４１０からまたは基地局ｅｘ１０７等から信号を受信し、車ｅｘ４１２が有するカーナビゲーションｅｘ４１３等の表示装置に動画を再生することも可能である。
【０１５６】
なお、カーナビゲーションｅｘ４１３の構成は例えば図２６に示す構成のうち、カメラ部ｅｘ２０３とカメラインターフェース部ｅｘ３０３を除いた構成が考えられ、同様なことがコンピュータｅｘ１１１やテレビ（受信機）ｅｘ４０１等でも考えられる。また、上記携帯電話ｅｘ１１４等の端末は、符号化器・復号化器を両方持つ送受信型の端末の他に、符号化器のみの送信端末、復号化器のみの受信端末の３通りの実装形式が考えられる。
【０１５７】
このように、上記実施の形態で示した画像符号化装置、画像復号化装置を上述したいずれの機器・システムに用いることは可能であり、そうすることで、上記実施の形態で説明した効果を得ることができる。
【０１５８】
【発明の効果】
以上、詳細に説明したように、本発明の画像符号化方法・画像復号方法は複数のダイレクトモード用スケーリング係数を格納したダイレクトモード用スケーリング係数表を備え、ダイレクトモードの第１相対インデックス値に応じてダイレクトモード係数を選択使用することにより、符号化対象フレームと参照フレームの表示時刻差を考慮したダイレクトモード用動きベクトル生成ができるため、ダイレクトモード時の符号化効率を改善することができる。
【図面の簡単な説明】
【図１】実施の形態１の画像符号化装置のブロック図
【図２】実施の形態１のダイレクトモード用スケーリング係数表の第１例の図
【図３】実施の形態１の画像符号化信号フォーマットの第１例の図
【図４】実施の形態１の画像符号化信号フォーマットの第２例の図
【図５】実施の形態１の画像符号化信号フォーマットの第３例の図
【図６】実施の形態１の画像符号化信号フォーマットの第４例の図
【図７】実施の形態１のダイレクトモード用スケーリング係数表の第２例の図
【図８】実施の形態１のダイレクトモード用スケーリング係数に対する符号表の図
【図９】実施の形態２の画像復号装置のブロック図
【図１０】実施の形態２の第１の方法によるダイレクトモード用スケーリング係数選択フローチャート
【図１１】実施の形態２の第２の方法によるダイレクトモード用スケーリング係数選択フローチャート
【図１２】実施の形態３の画像復号装置のブロック図
【図１３】実施の形態３のダイレクトモード用スケーリング係数・予測方法表の図
【図１４】実施の形態４の画像復号装置のブロック図
【図１５】実施の形態１〜実施の形態４の画像符号化方法および画像復号方法をコンピュータシステムにより実現するためのプログラムを格納するための記憶媒体についての説明図
【図１６】Ｂピクチャの概念図
【図１７】補間予測の説明図
【図１８】フレーム番号と相対インデックスの説明図
【図１９】短期フレームバッファと長期フレームバッファの概念図
【図２０】従来の画像符号化装置のダイレクトモードの説明図
【図２１】従来の画像符号化装置の画像符号化信号フォーマットの概念図
【図２２】従来の画像符号化装置の構成を示すブロック図
【図２３】従来の画像復号装置の構成を示すブロック図
【図２４】コンテンツ供給システムの全体構成を示すブロック図
【図２５】携帯電話の外観を示す図
【図２６】携帯電話の構成を示す図
【図２７】本実施の形態で示した画像符号化装置または画像復号化装置の応用例を示す図
【符号の説明】
ＩｍｇＥｎｃ　画像符号化手段
ＩｍｇＤｅｃ　画像復号化手段
Ａｄｄ　　加算器
Ｓｕｂ　　減算器
ＭＦｒｍＢｕｆ　マルチフレームバッファ
ＭＥ　動き推定手段
ＶＬＣ０，　ＶＬＣ４，ＶＬＣ５　可変長符号化手段
ＶＬＤ０，　ＶＬＤ４，ＶＬＤ５　可変長復号手段
ＭＶＢｕｆ　動きベクトルバッファ
Ｐｏｌ　画素補間手段
ＧｅｎＭＶＲＩｄｘダイレクトモード用ベクトル・相対インデックス生成手段
ＭＶＦＮＢｕｆ　動きベクトル・フレーム番号バッファ
ＭＶＢｕｆ　動きベクトルバッファ
ＳＰＴａｂｌｅ　ダイレクトモード用スケーリング係数表バッファ
ＳＰＳｅｌ　ダイレクトモード用スケーリング係数選択手段
ＰｒｅｄＳｅｌ　予測種別選択手段
ＳＰＰｒｅｄＴａｂｌｅ　ダイレクトモード用スケーリング係数・予測方法表バッファ
ＳＰＰｒｅｄＳｅｌ　ダイレクトモード用スケーリング係数・予測方法選択手段
ＳＷ１１〜ＳＷ１３，．ＳＷ２１〜ＳＷ２３　スイッチ
Ｃｓ　コンピュータ・システム
ＦＤ　フレキシブルディスク
ＦＤＤ　フレキシブルドライブ
[0001]
[Technical field of the invention]
The present invention relates to a method for encoding and decoding an image signal, and a recording medium on which a program for executing the method by software is recorded.
[0002]
[Prior art]
In recent years, with the development of multimedia applications, it has become common to handle information from all kinds of media, such as images, sound, and text, in a unified manner. Digitizing all media makes such unified handling possible. However, since a digitized image has a huge amount of data, image information compression techniques are indispensable for storage and transmission. At the same time, standardization of compression technology is also important so that compressed image data can be interoperated. Standards for image compression technology include H.261 and H.263 of ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) and MPEG (Moving Picture Experts Group)-1, MPEG-2, and MPEG-4 of ISO (International Organization for Standardization). In addition, the ITU is currently standardizing H.264 as the latest image coding standard, and the draft in the standardization process is called H.26L.
[0003]
A technique common to moving picture coding methods such as MPEG-1, 2, 4 and H.263 is inter-frame prediction with motion compensation. In the motion compensation of these moving picture coding methods, a frame of the input image is divided into rectangles of a predetermined size (hereinafter referred to as blocks), and prediction pixels are generated for each block from a motion vector indicating the motion between frames.
[0004]
Hereinafter, in order to describe inter-frame prediction with motion compensation, (1) the concept of a B picture will be described first. Since a B picture is a picture that can include an interpolation prediction block, (2) interpolation prediction will be described next. Then, since a reference frame used for interpolation prediction or the like must be uniquely identified, (3) the frame number and relative index used for identification, and the method of assigning them to reference frames, will be described. Next, (4) the short-term frame buffer and the long-term frame buffer will be described, followed by (5) the direct mode, in which inter-frame prediction is performed by pixel interpolation. Finally, (6) a conventional image encoding device and (7) a conventional image decoding device will be described.
[0005]
(1) The B picture will be described with reference to FIG. 16. FIG. 16 is a conceptual diagram of a B picture. Frm indicates a B picture to be encoded, and Ref1, Ref2, and Ref3 indicate encoded frames that can be used as reference frames for inter-frame prediction. Block Blk1 is a block inter-frame predicted from reference blocks RefBlk1 and RefBlk2, and block Blk2 is a block inter-frame predicted from reference blocks RefBlk21 and RefBlk22.
[0006]
(2) FIG. 17 is an explanatory diagram of interpolation prediction. RefBlk1 and RefBlk2 indicate the two reference blocks used for interpolation prediction, and PredBlk indicates the prediction block obtained by the interpolation processing. Here, the block size is assumed to be 4 × 4 pixels. X1(i) is a pixel value of RefBlk1, X2(i) is a pixel value of RefBlk2, and P(i) is a pixel value of PredBlk. The pixel value P(i) can be obtained by the following linear prediction equation.
[0007]
P(i) = A × X1(i) + B × X2(i) + C
A, B, and C are linear prediction coefficients. In some cases only the average value is used (A = 1/2, B = 1/2, and C = 0 in the above equation), as in MPEG-1 and MPEG-2. In other cases the linear prediction coefficients are set in advance, and their values are stored in the image coded signal and transmitted from the image encoding device to the image decoding device.
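The linear prediction equation above can be illustrated with a minimal sketch. The function name `interpolate_block`, the flat 16-element list standing in for a 4 × 4 block, and the rounding and clipping to an 8-bit range are assumptions made for illustration only; the text does not specify them.

```python
def interpolate_block(ref_blk1, ref_blk2, a=0.5, b=0.5, c=0.0):
    """Apply P(i) = A*X1(i) + B*X2(i) + C to every pixel of two reference blocks.

    The default coefficients give the plain average (A = B = 1/2, C = 0).
    """
    if len(ref_blk1) != len(ref_blk2):
        raise ValueError("reference blocks must have the same number of pixels")
    pred_blk = []
    for x1, x2 in zip(ref_blk1, ref_blk2):
        value = a * x1 + b * x2 + c
        # Assumption: round to the nearest integer and clip to the 8-bit range.
        pred_blk.append(max(0, min(255, int(value + 0.5))))
    return pred_blk


# A 4 x 4 block is represented here as a flat list of 16 pixel values.
ref_blk1 = [100] * 16
ref_blk2 = [120] * 16
pred_blk = interpolate_block(ref_blk1, ref_blk2)
```

With the default coefficients each predicted pixel is the average of the two reference pixels; passing other values for `a`, `b`, and `c` corresponds to the case where the coefficients are carried in the coded signal.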
[0008]
A block that is inter-frame predicted by pixel interpolation from a plurality of reference frames is referred to as an "interpolation prediction block". A B picture is a picture that can include interpolation prediction blocks. In a B picture, a different reference frame can be selected for each interpolation prediction block, as with the blocks Blk1 and Blk2 shown in FIG. 16.
[0009]
A picture that does not include an interpolation prediction block but can include blocks that perform inter-frame prediction from one reference frame is called a P picture, and a picture consisting only of intra-frame prediction blocks, which do not perform inter-frame prediction, is called an I picture.
[0010]
In H.26L, a maximum of two reference frames are used for a block of a B picture. To distinguish the two, they are referred to as the first reference frame and the second reference frame, and the motion vectors for them are called the first motion vector and the second motion vector, respectively. Taking Blk1 in FIG. 16 as an example, the first reference frame is Ref1, the second reference frame is Ref3, the first motion vector is MV1, and the second motion vector is MV2. In addition, prediction from only the first reference frame is referred to as first reference frame prediction, and prediction from only the second reference frame is referred to as second reference frame prediction. Note that for a block inter-frame predicted from a single reference frame there is no need to distinguish reference frames or motion vectors. In this description, however, the reference frame and motion vector of such a block are referred to as the first reference frame and the first motion vector for convenience.
[0011]
(3) FIG. 18 is an explanatory diagram of the frame number and the relative index. The frame number and the relative index are numbers for uniquely identifying a reference frame stored in the multi-frame buffer MFrmBuf. In H.26L, a value that increases by 1 for each frame stored in the memory as a reference image is assigned as the frame number. The relative index, on the other hand, is carried in the block in the coded signal and is used to indicate the reference frame used for inter-frame prediction of that block.
[0012]
FIG. 21 is a conceptual diagram of the image coded signal format produced by a conventional image encoding device. Picture is the coded signal for one frame, Header is the coded header signal included at the beginning of the frame, Block1 is the coded signal of a block in the direct mode, Block2 is the coded signal of a block using interpolation prediction other than the direct mode, RIdx1 and RIdx2 indicate relative indexes, and MV1 and MV2 indicate motion vectors. The interpolation prediction block Block2 has the two relative indexes RIdx1 and RIdx2, in this order, in the coded signal to indicate the two reference frames used for interpolation. Whether the relative index RIdx1 or RIdx2 is used can be determined from PredType. The relative index RIdx1 indicating the first reference frame is referred to as the first relative index, and the relative index RIdx2 indicating the second reference frame is referred to as the second relative index. The first reference frame and the second reference frame are determined by the data position in the coded stream.
[0013]
Hereinafter, a method of assigning the first relative index and the second relative index will be described with reference to FIG. 18.
[0014]
For the first relative index, values starting from 0 are first assigned to the reference frames whose display time is earlier than that of the encoding target frame, in order of closeness to the encoding target frame. Once all such reference frames have been assigned a value, the subsequent values are assigned to the reference frames whose display time is later than that of the encoding target frame, again in order of closeness to the encoding target frame.
[0015]
When the first relative index RIdx1 is 0 and the second relative index RIdx2 is 1 in FIG. 21A, as shown in FIG. 20, the first reference frame is the B picture of frame number 14, and the second reference frame is the B picture of frame number 13. For the second relative index, values starting from 0 are first assigned to the reference frames whose display time is later than that of the encoding target frame, in order of closeness to the encoding target frame. Once all such reference frames have been assigned a value, the subsequent values are assigned to the reference frames whose display time is earlier than that of the encoding target frame, again in order of closeness to the encoding target frame.
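The two assignment rules can be sketched as follows. This is only an illustration: the function name `assign_relative_indexes` and the use of integer display times as frame identifiers are assumptions, and the example values are hypothetical rather than taken from the figures.

```python
def assign_relative_indexes(target_time, ref_times):
    """Return two dicts mapping each reference display time to its relative index.

    First relative index: earlier frames first (nearest first), then later frames.
    Second relative index: later frames first (nearest first), then earlier frames.
    """
    # Earlier frames, ordered from the one closest to the target frame.
    earlier = sorted((t for t in ref_times if t < target_time), reverse=True)
    # Later frames, ordered from the one closest to the target frame.
    later = sorted(t for t in ref_times if t > target_time)
    first = {t: i for i, t in enumerate(earlier + later)}
    second = {t: i for i, t in enumerate(later + earlier)}
    return first, second


# Hypothetical display times: target frame at 5, reference frames around it.
first_idx, second_idx = assign_relative_indexes(5, [1, 3, 4, 6, 8])
```

In this example the frame displayed at time 4 receives first relative index 0 and the frame displayed at time 6 receives second relative index 0, so the nearest frame on each side gets the smallest value.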
[0016]
The relative index in the block is represented by a variable-length codeword, and the smaller the value, the shorter the codeword assigned to it. Normally, the frame closest to the encoding target frame is selected as the reference frame for inter-frame prediction. Therefore, assigning relative index values in order of closeness to the encoding target frame, as described above, increases coding efficiency.
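The effect of giving short codewords to small values can be seen with any variable-length code of that kind. The sketch below uses an order-0 Exp-Golomb code purely as an example; the text does not state which codeword table is used for the relative index, so the choice of code and the function name `exp_golomb` are assumptions.

```python
def exp_golomb(value):
    """Return the order-0 Exp-Golomb codeword for a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = bin(value + 1)[2:]            # binary representation of value + 1
    return "0" * (len(bits) - 1) + bits  # prefix of zeros, one fewer than the bit count


# Codeword lengths grow with the relative index value.
codewords = [exp_golomb(v) for v in range(4)]
lengths = [len(c) for c in codewords]
```

Relative index 0 costs one bit while index 3 costs five, which is why assigning index 0 to the most frequently chosen reference frame reduces the coded size.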
[0017]
(4) FIG. 19 is a conceptual diagram of the short-term frame buffer and the long-term frame buffer. In MPEG-1 and MPEG-2, the multi-frame buffer MFrmBuf is by default a FIFO (First-In First-Out) buffer: if there is no free space in the buffer, the reference frame that was stored first is discarded and the new reference frame is stored. Here, consider a scene in which an object crosses in front of a stationary background. In this case, the background is temporarily hidden by the foreground object and then appears on the screen again. If the background frame coded before it was hidden by the object can be used as a reference frame when the hidden background reappears, coding efficiency improves. However, the amount of memory required to hold the image data of reference frames is enormous, and given practical memory sizes, the maximum number of frames that can be stored in the multi-frame buffer MFrmBuf is only a few. Therefore, in such a scene, by the time the background reappears on the screen, the background frame has often already been discarded from the multi-frame buffer MFrmBuf and cannot be used for prediction, so efficient inter-frame prediction cannot be performed.
[0018]
Therefore, in H.26L, a buffer called the long-term frame buffer LTMem, which can hold a specific frame for a long time, is defined in the multi-frame buffer MFrmBuf in addition to the FIFO buffer. As shown in the conceptual diagram of the frame buffers, in H.26L the multi-frame buffer MFrmBuf is virtually divided into a short-term frame buffer STMem and a long-term frame buffer LTMem. The conventional FIFO buffer is called the short-term frame buffer STMem to distinguish it from the long-term frame buffer. A reference frame once stored in the long-term frame buffer LTMem is not discarded from it unless a control instruction is issued to the buffer. In the scene described above, storing the background frame in the long-term frame buffer LTMem allows that frame to be used for prediction when the hidden background reappears. With this mechanism, coding efficiency can be greatly increased for such a scene. To move a reference frame between the long-term frame buffer LTMem and the short-term frame buffer STMem, or to discard a reference frame from the long-term frame buffer LTMem, a control instruction is issued to the buffer. When performing control such as moving or discarding a reference frame in the long-term frame buffer LTMem, the image encoding device stores a buffer control signal RPSL indicating the control content in the coded signal. The image decoding device can perform the same control as the image encoding device on the long-term frame buffer LTMem of the multi-frame buffer MFrmBuf by using the buffer control signal RPSL in the coded signal.
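The behaviour of the two buffers can be sketched as a small data structure. The class and method names below (`MultiFrameBuffer`, `move_to_long_term`, and so on) are hypothetical, frames are represented only by their frame numbers, and the explicit method calls stand in for the buffer control signal; this is not the actual buffer management of any standard.

```python
from collections import deque


class MultiFrameBuffer:
    """Short-term FIFO buffer plus a long-term buffer changed only by explicit control."""

    def __init__(self, short_term_capacity):
        self.capacity = short_term_capacity
        self.short_term = deque()   # oldest frame on the left
        self.long_term = {}         # slot -> frame number

    def store(self, frame_number):
        # FIFO rule: when full, discard the frame that was stored first.
        if len(self.short_term) == self.capacity:
            self.short_term.popleft()
        self.short_term.append(frame_number)

    def move_to_long_term(self, frame_number, slot):
        # Explicit control instruction: keep this frame until told otherwise.
        self.short_term.remove(frame_number)
        self.long_term[slot] = frame_number

    def discard_long_term(self, slot):
        # Explicit control instruction: release a long-term frame.
        del self.long_term[slot]

    def reference_frames(self):
        return list(self.short_term) + list(self.long_term.values())


buf = MultiFrameBuffer(short_term_capacity=3)
buf.store(0)                       # background frame
buf.move_to_long_term(0, slot=0)   # protect it from FIFO eviction
for n in range(1, 6):              # frames in which the background is hidden
    buf.store(n)
available = buf.reference_frames()
```

After five more frames have passed through the three-frame FIFO, frame 0 is still available as a reference because it was moved to the long-term buffer.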
[0019]
In the example of FIG. 20, as shown in FIG. 19, the reference frames Ref1, Ref2, and Ref3 are stored in the short-term frame buffer STMem, and the reference frames LTRef1 and LTRef2 are stored in the long-term frame buffer LTMem. The relative index value is also assigned to the reference frame of the long-term frame buffer LTMem. Therefore, similarly to the case of the reference frame of the short-term frame buffer STMem, the frame of the long-term frame buffer LTMem selected as the reference frame by the motion compensation of the block can be indicated by the value of the relative index in the image coded signal.
[0020]
(5) FIG. 20 is an explanatory diagram of the direct mode of the conventional image encoding device. The direct mode is a mode in which the reference frames and motion vectors for the current block are determined, by the method described below, from the motion vector and reference frame used when the reference frame was encoded, and inter-frame prediction is performed by pixel interpolation. Frm indicates a B picture to be encoded, and Ref1, Ref2, Ref3, and Ref4 indicate encoded frames in the multi-frame buffer MFrmBuf. RIdx1 indicates a first relative index, RIdx2 indicates a second relative index, MV01 indicates a first motion vector, and MV02 indicates a second motion vector.
[0021]
Block Blk0 is the block to be encoded in the direct mode, block Blk00 is the block located at the same position as the encoding target block Blk0 in the reference frame Ref3, RefBlk01 is a reference block included in the reference frame Ref1, and RefBlk02 is a reference block included in the reference frame Ref3.
[0022]
The motion vector MV0 is the first motion vector used when the block Blk00 was encoded, and it refers to the frame Ref1. The first motion vector MV01 and the second motion vector MV02 used for prediction of the current block Blk0 are calculated by the following equations.
[0023]
MV01 = N1 × MV0 / D, MV02 = N2 × MV0 / D
In the above equations, N1, N2, and D are values used when calculating the direct mode motion vectors, and in this detailed description the set of values N1, N2, and D is referred to as the direct mode scaling coefficient. In the case of FIG. 20, the direct mode scaling coefficient may be N1 = 2, N2 = −1, and D = 3. The motion vector MV0 is called the scaling motion vector. Assuming that the motion within the screen of the object containing the encoding target block is constant, the first motion vector MV01 and the second motion vector MV02 can be obtained by internally dividing the motion vector MV0 according to the display time difference between the frame Frm and the first reference frame Ref1 and the display time difference between the frame Frm and the second reference frame Ref3.
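The scaling equations can be worked through with a short sketch. The function name `direct_mode_vectors`, the tuple representation of a motion vector, and the sample scaling vector (6, −3) are assumptions for illustration. The result is left as an unrounded quotient because the text does not state how a real codec rounds to its motion vector precision.

```python
def direct_mode_vectors(mv0, n1, n2, d):
    """Compute MV01 = N1 * MV0 / D and MV02 = N2 * MV0 / D per vector component."""
    if d == 0:
        raise ValueError("D must be non-zero")
    mv01 = tuple(n1 * c / d for c in mv0)
    mv02 = tuple(n2 * c / d for c in mv0)
    return mv01, mv02


# Hypothetical scaling motion vector MV0 = (6, -3) with (N1, N2, D) = (2, -1, 3).
mv01, mv02 = direct_mode_vectors((6, -3), n1=2, n2=-1, d=3)
```

Here MV01 is two thirds of MV0 and MV02 is minus one third of MV0, which matches a target frame lying two frame intervals after the first reference frame and one interval before the second.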
[0024]
In the direct mode, the direct mode scaling coefficient transmitted for each frame is used in common for all blocks included in the frame.
[0025]
When FIG. 20 is related to FIG. 18, Frm in FIG. 20 corresponds to the B picture (dotted picture) in the center of FIG. 18, Ref3 in FIG. 20 corresponds to the picture of frame number 13 in FIG. 18, Ref1 corresponds to the picture of frame number 12 in FIG. 18, and Ref2 in FIG. 20 corresponds to the B picture of frame number 11 in FIG. 18. The non-reference frames indicated by dotted lines in FIG. 20, other than Frm, are not stored in the multi-frame buffer MFrmBuf because they are not referenced from other pictures. Therefore, unlike the pictures shown in FIG. 18, no relative index for referring to those frames is assigned.
[0026]
(6) FIG. 22 is a block diagram showing the configuration of a conventional image encoding device. Hereinafter, the image encoding device will be described.
It is assumed that the image encoding device receives the image signal Img divided into blocks and performs processing for each block. The subtractor Sub subtracts the predicted image signal Pred from the image signal Img and outputs the result as a residual signal Res. The image encoding means ImgEnc receives the residual signal Res, performs image encoding processing such as DCT transform and quantization, and outputs a residual coded signal ERes including quantized DCT coefficients and the like. The image decoding means ImgDec receives the residual coded signal ERes, performs image decoding processing such as inverse quantization and inverse DCT transform, and outputs a residual decoded signal DRes. The adder Add adds the residual decoded signal DRes and the predicted image signal Pred, and outputs the result as a reconstructed image signal Recon. Of the reconstructed image signal Recon, the signals that may be referred to in subsequent inter-frame prediction are stored in the multi-frame buffer MFrmBuf. Since the memory capacity of the multi-frame buffer MFrmBuf is finite, the data of frames that will not be used for subsequent inter-frame prediction is removed from the multi-frame buffer MFrmBuf.
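The signal flow through Sub, ImgEnc, ImgDec, and Add can be sketched as follows. To keep the example short, plain scalar quantization with a hypothetical step size `QSTEP` stands in for the DCT transform and quantization; this substitution, the function name `encode_block`, and the sample pixel values are assumptions and not the processing of the actual device.

```python
QSTEP = 8  # hypothetical quantization step size


def encode_block(img, pred, qstep=QSTEP):
    """Return (ERes, Recon) for one block, following the Sub/ImgEnc/ImgDec/Add chain."""
    res = [i - p for i, p in zip(img, pred)]       # Sub: Res = Img - Pred
    eres = [round(r / qstep) for r in res]         # ImgEnc stand-in: quantize
    dres = [q * qstep for q in eres]               # ImgDec stand-in: dequantize
    recon = [d + p for d, p in zip(dres, pred)]    # Add: Recon = DRes + Pred
    return eres, recon


img = [101, 110, 90, 130]
pred = [96, 100, 95, 100]
eres, recon = encode_block(img, pred)
```

The reconstructed block differs slightly from the input because quantization is lossy. Storing `recon` rather than `img` in the frame buffer keeps the encoder's reference frames identical to those the decoder can rebuild.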
[0027]
The motion estimation means ME receives the reference frames RefFrms stored in the multi-frame buffer MFrmBuf, performs motion estimation, selects the optimal prediction type from among intra prediction, first reference frame prediction, second reference frame prediction, and interpolation prediction by a predetermined method (the selectable prediction types differ depending on the picture type), and outputs the first motion vector eMV1, the second motion vector eMV2, the first relative index eRIdx1, and the second relative index eRIdx2 for the block.
[0028]
One method of selecting the prediction type in the motion estimation means ME is, for example, to select the prediction type with the smallest prediction error. If the selected prediction type is intra prediction, no motion vector or relative index is output. If it is first reference frame prediction, only the first relative index and the first motion vector are output; if it is second reference frame prediction, only the second relative index and the second motion vector are output; and if it is interpolation prediction, the first and second relative indexes and the first and second motion vectors are output.
[0029]
As described above, in H.26L, the reference frame whose second relative index rRIdx2 is 0 is used as the second reference frame in the direct mode. Therefore, the second relative index rRIdx2 with the value 0 is input to the motion vector / frame number buffer MVFNBuf and to the direct mode vector / relative index generating means GenMVRIdx.
[0030]
The motion vector / frame number buffer MVFNBuf stores the scaling vector rMV and a frame number indicating the frame referred to by the scaling vector rMV. Since the reference frame containing the scaling vector rMV is the reference frame indicated by the second relative index rRIdx2, the buffer receives the second relative index rRIdx2 with the value 0 and outputs the scaling vector rMV and the first relative index rRIdx1 indicating the reference frame of the scaling vector rMV.
[0031]
The direct mode vector / relative index generation unit GenMVRIdx receives the direct mode scaling coefficient SP, the scaling vector rMV, and the first relative index rRIdx1, and, by the direct mode processing described above, outputs the first motion vector sMV1, the second motion vector sMV2, the first relative index sRIdx1, and the second relative index sRIdx2 for the direct mode.
[0032]
The prediction type selection means PredSel receives the image signal Img, the reference frames RefFrms, the relative indexes sRIdx1 and sRIdx2 and motion vectors sMV1 and sMV2 indicating the position of the reference block in the "direct mode", and the relative indexes eRIdx1 and eRIdx2 and motion vectors eMV1 and eMV2 indicating the position of the reference block used for prediction "other than the direct mode". It then determines whether the direct mode should be used for predicting the block, and outputs the determined prediction type to the variable length coding unit VLC0 as the prediction type PredType.
[0033]
The prediction type selection means PredSel selects the prediction type, for example, by choosing whichever of the "direct mode" prediction and the "non-direct mode" prediction gives the smaller prediction error for the input pixels.
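A selection by minimum prediction error might look like the sketch below. The sum of absolute differences (SAD) is used as the error measure only as an assumption, since the text does not name one, and the function names and sample blocks are hypothetical.

```python
def sad(block, pred):
    """Sum of absolute differences between an input block and a predicted block."""
    return sum(abs(a - b) for a, b in zip(block, pred))


def select_prediction_type(img_block, candidates):
    """Return the name of the candidate prediction with the smallest SAD.

    candidates maps a prediction type name to its predicted block.
    """
    return min(candidates, key=lambda name: sad(img_block, candidates[name]))


img_block = [10, 20, 30, 40]
candidates = {
    "direct": [10, 21, 30, 41],          # prediction from direct-mode vectors
    "interpolation": [12, 22, 32, 42],   # prediction from estimated vectors
}
pred_type = select_prediction_type(img_block, candidates)
```

A practical encoder would usually also weigh the number of bits each mode costs, which favours the direct mode because it carries no motion vectors or relative indexes. That refinement is outside what the text describes.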
[0034]
Therefore, the direct mode is added to the prediction type PredType in addition to the intra prediction, the first reference frame prediction, the second reference frame prediction, and the interpolation prediction other than the direct mode selected by the motion estimation unit ME.
[0035]
When the prediction type PredType indicates the direct mode, the switch SW12 is switched to the “1” side, and the relative indexes sRIdx1 and sRIdx2 and the motion vectors sMV1 and sMV2 are used as the relative indexes RIdx1 and RIdx2 and the motion vectors MV1 and MV2.
[0036]
On the other hand, when the prediction type PredType indicates a mode other than the direct mode, the switch SW12 is switched to the “0” side, and the relative indexes eRIdx1 and eRIdx2 and the motion vectors eMV1 and eMV2 are used as the relative indexes RIdx1 and RIdx2 and the motion vectors MV1 and MV2.
[0037]
Further, in the direct mode, the first motion vector sMV1 when the block of the coded frame is coded is used as the scaling vector. Then, the frame referred to by the first motion vector sMV1 is used as one reference frame in the direct mode.
[0038]
Accordingly, of the encoded first relative indexes RIdx1 and first motion vectors MV1, those that may be used in the direct mode in frames after the encoded frame are stored in the motion vector / frame number buffer MVFNBuf.
[0039]
After the prediction type PredType is determined, the first relative index RIdx1 and the first motion vector MV1 are input to the multi-frame buffer MFrmBuf, and the reference block RefBlk1 corresponding to them is output to the pixel interpolation means Pol. When two reference blocks are required depending on the prediction type, a reference block RefBlk2 corresponding to the second relative index RIdx2 and the second motion vector MV2 is further output from the multi-frame buffer MFrmBuf to the pixel interpolation means Pol.
[0040]
At the time of interpolation prediction, the pixel interpolation means Pol interpolates pixel values at positions corresponding to the two reference blocks RefBlk1 and RefBlk2, and outputs an interpolation block RefPol.
[0041]
When the prediction type PredType indicates interpolation prediction, the switch SW11 is switched to "1", and the interpolation block RefPol is used as the prediction image signal Pred.
[0042]
At the time of the first reference frame prediction, the multi-frame buffer MFrmBuf outputs a reference block RefBlk corresponding to the first relative index RIdx1 and the first motion vector MV1.
[0043]
At the time of the second reference frame prediction, the multi-frame buffer MFrmBuf outputs a reference block RefBlk corresponding to the second relative index RIdx2 and the second motion vector MV2.
[0044]
Although not shown, at the time of intra prediction, a block RefBlk composed of pixels of the intra prediction result is output from the multi-frame buffer MFrmBuf.
[0045]
When the prediction type PredType indicates a prediction method other than the interpolation prediction, the switch SW11 is switched to the “0” side, and the reference block RefBlk is used as the prediction image signal Pred.
[0046]
The variable length coding unit VLC0 performs variable length coding on the residual coded signal ERes, the relative indexes RIdx1 and RIdx2, the motion vectors MV1 and MV2, the direct mode scaling coefficient SP, and the prediction type PredType, and outputs the coded signal BitStrm0.
[0047]
In the conceptual diagram of the image coded signal format of the conventional image encoding device in FIG. 21, Block1 is an example of a block coded in the direct mode. This block has no relative index or motion vector information in the coded signal. On the other hand, Block2 is an example of a block coded by interpolation prediction other than the direct mode. In this block, the relative indexes RIdx1 and RIdx2 and the motion vectors MV1 and MV2 are included in the coded signal.
[0048]
(7) FIG. 23 is a block diagram showing the configuration of a conventional image decoding device. Units and signals that operate in the same way as in the block diagram of the conventional image encoding device are denoted by the same symbols, and their description is omitted.
[0049]
The variable-length decoding means VLD0 receives the coded image signal BitStrm0, performs variable-length decoding, and outputs the residual coded signal ERes, the motion vectors MV1 and MV2, the relative indexes RIdx1 and RIdx2, the direct mode scaling coefficient SP, and the prediction type PredType.
[0050]
The image decoding means ImgDec receives the residual coded signal ERes, performs image decoding processing such as inverse quantization / inverse DCT transform, and outputs a residual decoded signal DRes.
[0051]
The adder Add adds the residual decoded signal DRes and the predicted image signal Pred, and outputs the result as a decoded image signal DImg outside the image decoding device.
[0052]
The multi-frame buffer MFrmBuf stores the decoded image signal DImg in the buffer for inter-frame prediction.
[0053]
The motion vector / frame number buffer MVFNBuf stores a scaling vector rMV and information rRIdx1 for identifying the frame referred to by the scaling vector. The motion vector / frame number buffer MVFNBuf receives the second relative index rRIdx2 with the value 0, and outputs the scaling vector rMV and the first relative index rRIdx1 for identifying the frame referred to by the scaling vector.
[0054]
The direct mode vector / relative index generation unit GenMVRIdx performs the same processing as the direct mode vector / relative index generation unit GenMVRIdx of the conventional image encoding device.
[0055]
When the prediction type PredType indicates the direct mode, the switch SW22 switches to “1”. Then, the relative indexes sRIdx1 and sRIdx2 and the motion vectors sMV1 and sMV2 are used as the relative indexes nRIdx1 and nRIdx2 and the motion vectors nMV1 and nMV2.
[0056]
When the prediction type PredType indicates a mode other than the direct mode, the switch SW22 switches to the “0” side. Then, the relative indexes RIdx1 and RIdx2 and the motion vectors MV1 and MV2 are used as the relative indexes nRIdx1 and nRIdx2 and the motion vectors nMV1 and nMV2.
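The role of the switch SW22 in the two cases above can be sketched as a simple selection. The `MotionParams` tuple, the function name `switch_sw22`, the string values used for the prediction type, and the sample numbers are all hypothetical.

```python
from typing import NamedTuple, Tuple


class MotionParams(NamedTuple):
    ridx1: int
    ridx2: int
    mv1: Tuple[int, int]
    mv2: Tuple[int, int]


def switch_sw22(pred_type: str, direct: MotionParams, decoded: MotionParams) -> MotionParams:
    """Select the parameters passed on as nRIdx1, nRIdx2, nMV1, nMV2."""
    # "1" side: values generated for the direct mode (sRIdx1, sRIdx2, sMV1, sMV2).
    # "0" side: values decoded from the coded signal (RIdx1, RIdx2, MV1, MV2).
    return direct if pred_type == "direct" else decoded


derived = MotionParams(0, 0, (4, -2), (-2, 1))
decoded = MotionParams(1, 0, (3, 0), (-1, 0))
chosen_direct = switch_sw22("direct", derived, decoded)
chosen_other = switch_sw22("interpolation", derived, decoded)
```

Because a direct mode block carries no relative index or motion vector in the coded signal, the decoder has nothing to place on the “0” side for such a block and must regenerate the values itself.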
[0057]
At the time of interpolation prediction, the multi-frame buffer MFrmBuf outputs the reference block RefBlk1 corresponding to the first relative index nRIdx1 and the first motion vector nMV1, and the reference block RefBlk2 corresponding to the second relative index nRIdx2 and the second motion vector nMV2. Then, the pixel interpolation means Pol interpolates the pixel values at corresponding positions of the two reference blocks RefBlk1 and RefBlk2 and outputs the result as the interpolation block RefPol.
[0058]
At the time of the first reference frame prediction, the multi-frame buffer MFrmBuf outputs the reference block RefBlk corresponding to the first relative index nRIdx1 and the first motion vector nMV1.
[0059]
At the time of the second reference frame prediction, the multi-frame buffer MFrmBuf outputs the reference block RefBlk corresponding to the second relative index nRIdx2 and the second motion vector nMV2.
[0060]
Although not shown, at the time of intra prediction, a block RefBlk composed of pixels of the intra prediction result is output from the multi-frame buffer MFrmBuf.
[0061]
When the prediction type PredType indicates the interpolation prediction, the switch SW21 switches to the “0” side, and the interpolation block RefPol is used as the prediction image signal Pred.
[0062]
When the prediction type PredType indicates a prediction method other than the interpolation prediction, the switch SW21 is switched to “1”, and the reference block RefBlk is used as the predicted image signal Pred.
[0063]
Of the decoded first relative indexes nRIdx1 and first motion vectors nMV1, those that may be used in the direct mode in frames after the decoded frame are stored in the motion vector / frame number buffer MVFNBuf.
[0064]
As described above, the image decoding device decodes the coded image signal BitStrm0 and outputs the result as the decoded image signal DImg.
[0065]
[Problems to be solved by the invention]
Assuming that the motion of the object containing the encoding target block is constant within the screen, the direct mode scaling coefficient can be determined from the ratio of the display time difference between the encoding target frame and the first reference frame to the display time difference between the encoding target frame and the second reference frame. In this case, the scaling coefficient is (N1, N2, D) = (2, −1, 3) for the block Blk0 in FIG. 20 and (N1, N2, D) = (5, −1, 6) for the block Blk1.
[0066]
However, in H.26L, only one direct mode scaling coefficient can be transmitted per frame. Therefore, the same direct mode scaling coefficient must be used for all direct mode blocks in a frame. As a result, in the case of FIG. 20, for example, when (N1, N2, D) = (2, −1, 3) is used as the common scaling coefficient for the frame Frm, an appropriate direct mode motion vector can be generated for the block Blk0, but not for the block Blk1, and the coding efficiency is degraded.
[0067]
Therefore, an object of the present invention is to provide means for selecting a scaling coefficient for a block in a direct mode according to a reference frame to be used, and to improve coding efficiency in the direct mode.
[0068]
[Means for Solving the Problems]
The first invention is
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on an encoding target frame by motion compensation from a plurality of encoded frames stored in the multi-frame buffer, A first step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the encoded frame;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A third step of encoding the prediction error and outputting an image encoded signal including the encoded signal of the prediction error;
In the second step of the image encoding method having
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient table storing at least one scaling coefficient associated with a value of the first relative index is provided; a scaling coefficient corresponding to the first relative index is selected from the scaling coefficient table; and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient;
Generating a predicted image by pixel interpolation from a block obtained from a motion vector for the first reference frame and a block obtained from a motion vector for the second reference frame;
In the third step,
By including the scaling coefficient table in the image coded signal and outputting it,
a scaling coefficient can be selected from the direct mode scaling coefficient table for each first reference frame, and a direct mode motion vector can be generated in consideration of the display time difference between the encoding target frame and the reference frame, so that the coding efficiency of the direct mode can be improved.
[0069]
The second invention is
A first step of inputting an image encoded signal including an encoded signal of a prediction error,
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on a decoding target frame by motion compensation from a plurality of decoded frames stored in the multi-frame buffer, A second step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the decoded frame;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing a decoded image of a frame that may be used for inter-frame prediction in a multi-frame buffer;
In the first step of the image decoding method having
The scaling coefficient table in the image encoded signal is decoded,
In the second step,
Selecting, as the second reference frame, a reference frame having the smallest second relative index in the display order of the decoded frames after the decoding target frame;
In the third step,
In the second reference frame, a frame referred to by a motion vector used for motion compensation of a block at the same position as a predetermined block on the decoding target frame is set as the first reference frame; a scaling coefficient corresponding to the first relative index is selected from the scaling coefficient table; a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient; and a predicted image is generated by pixel interpolation from a block obtained from the motion vector for the first reference frame and a block obtained from the motion vector for the second reference frame. By doing so,
a scaling coefficient can be selected from the direct mode scaling coefficient table for each first reference frame, and a direct mode motion vector can be generated in consideration of the display time difference between the encoding target frame and the reference frame, so that the coding efficiency of the direct mode can be improved.
[0070]
The third invention is
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on an encoding target frame by motion compensation from a plurality of encoded frames stored in the multi-frame buffer, A first step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the encoded frame;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A third step of encoding the prediction error and outputting an image encoded signal including the encoded signal of the prediction error;
In the second step of the image encoding method having
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient / prediction method table is provided that stores at least one set of a scaling coefficient associated with a value of the first relative index and a prediction method type indicating pixel interpolation, prediction from the first reference frame, or prediction from the second reference frame,
A scaling coefficient and a prediction method type corresponding to the first relative index are selected from the scaling coefficient / prediction method table, and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a prediction image is generated by pixel interpolation from a block obtained from the motion vector for the first reference frame and a block obtained from the motion vector for the second reference frame; when the prediction method type is prediction from the first reference frame, a block obtained from the motion vector for the first reference frame is used as the prediction image; and when the prediction method type is prediction from the second reference frame, a block obtained from the motion vector for the second reference frame is used as the prediction image,
In the third step,
By including the scaling coefficient / prediction method table in the image coded signal and outputting it,
not only interpolation prediction but also a direct mode using only one reference frame can be used, so that the coding efficiency can be improved.
[0071]
The fourth invention is
A first step of inputting an image encoded signal including an encoded signal of a prediction error,
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on a decoding target frame by motion compensation from a plurality of decoded frames stored in the multi-frame buffer, A second step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the decoded frame;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing a decoded image of a frame that may be used for inter-frame prediction in a multi-frame buffer;
In the first step of the image decoding method having
The scaling coefficient / prediction method table in the image encoded signal is decoded,
In the second step,
Selecting, as the second reference frame, a reference frame having the smallest second relative index in the display order of the decoded frames after the decoding target frame;
In the third step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the decoding target frame is set as the first reference frame,
A scaling coefficient and a prediction method type corresponding to the first relative index are selected from the scaling coefficient / prediction method table, and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a prediction image is generated by pixel interpolation from a block obtained from the motion vector for the first reference frame and a block obtained from the motion vector for the second reference frame; when the prediction method type is prediction from the first reference frame, a block obtained from the motion vector for the first reference frame is used as the prediction image; and when the prediction method type is prediction from the second reference frame, a block obtained from the motion vector for the second reference frame is used as the prediction image. By doing so,
not only interpolation prediction but also a direct mode using only one reference frame can be used, so that the coding efficiency can be improved.
[0072]
[Best Mode for Carrying Out the Invention]
Hereinafter, specific embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram of an image encoding device according to the first embodiment. Units performing the same operations and signals having the same operations as those in the block diagram of the conventional image encoding device in FIG. 22 are denoted by the same reference numerals, and description thereof will be omitted.
[0073]
In H.26L, since only one direct mode scaling coefficient can be transmitted per frame, the optimal direct mode scaling coefficient for each block cannot be selected, and the encoding efficiency deteriorates. Therefore, in the present invention, a direct mode scaling coefficient table storing a plurality of direct mode scaling coefficients is provided, and a direct mode scaling coefficient is selected from the table according to the first relative index of the direct mode and used, thereby improving the coding efficiency of the direct mode.
[0074]
FIG. 2 is a first example of a direct mode scaling coefficient table used when selecting a direct mode coefficient. RIdx1 indicates a first relative index in the direct mode, and a set of values of N1, N2, and D indicates a scaling coefficient. The scaling coefficient of each row in the table indicates a coefficient for the value of RIdx1 at the head of the row.
[0075]
Assuming that the motion of the object containing the encoding target block is constant within the screen, the direct mode scaling coefficient can be determined from the ratio of the display time difference between the encoding target frame and the first reference frame to the display time difference between the encoding target frame and the second reference frame.
[0076]
In the example of FIG. 20, when the first relative index RIdx1 is 0 as in the block Blk0, the scaling coefficient is (N1, N2, D) = (2, −1, 3), and when the first relative index RIdx1 is 1 as in the block Blk1, the scaling coefficient is (N1, N2, D) = (5, −1, 6). FIG. 2 shows the result of determining the scaling coefficients in the same way for RIdx1 values of 2 and above.
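The derivation of a scaling coefficient from display time differences can be sketched as follows. This is only an illustration under stated assumptions: the display times are hypothetical integer frame positions (not read from FIG. 20), the first reference frame is earlier and the second reference frame later than the encoding target frame, and the function name scaling_coefficient is introduced here.

```python
from typing import Tuple


def scaling_coefficient(t_cur: int, t_ref1: int, t_ref2: int) -> Tuple[int, int, int]:
    """Return (N1, N2, D) from display times, assuming constant motion.

    Assumed convention: N1 and N2 are the signed display time differences
    from each reference frame to the encoding target frame, and D is the
    span of the scaling vector, so that D = N1 - N2.
    """
    n1 = t_cur - t_ref1   # positive when the first reference frame is earlier
    n2 = t_cur - t_ref2   # negative when the second reference frame is later
    d = t_ref2 - t_ref1   # equals n1 - n2
    return n1, n2, d


# Hypothetical display times chosen so that the results match the
# coefficients quoted in the text for Blk0 and Blk1.
blk0 = scaling_coefficient(t_cur=2, t_ref1=0, t_ref2=3)
blk1 = scaling_coefficient(t_cur=2, t_ref1=-3, t_ref2=3)
```

With these assumed display times, blk0 and blk1 evaluate to the two coefficients quoted above for RIdx1 = 0 and RIdx1 = 1.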
[0077]
The direct mode scaling coefficient table buffer SPTableBuf in FIG. 1 stores the direct mode scaling coefficient table. The direct mode scaling coefficient selection means SPSel receives the first relative index rRIdx1 obtained from the motion vector / frame number buffer MVFNBuf. It then searches the first column of the direct mode scaling coefficient table for the row satisfying RIdx1 = rRIdx1 and outputs the scaling coefficient (N1, N2, D) of that row as the direct mode scaling coefficient SP.
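The selection performed by SPSel amounts to a keyed lookup, as the following minimal sketch shows. The dictionary representation and the function name sp_sel are assumptions made for illustration; only the two rows quoted in the text for FIG. 20 are included.

```python
from typing import Dict, Tuple

Coefficient = Tuple[int, int, int]  # (N1, N2, D)

# Table contents keyed by RIdx1. Only the two rows stated in the text are
# shown; further rows would be added in the same way.
SP_TABLE: Dict[int, Coefficient] = {
    0: (2, -1, 3),
    1: (5, -1, 6),
}


def sp_sel(table: Dict[int, Coefficient], r_ridx1: int) -> Coefficient:
    """Return the row whose RIdx1 equals r_ridx1 (raises KeyError if absent)."""
    return table[r_ridx1]


sp = sp_sel(SP_TABLE, 1)
```

The case where no row exists for the given index is not handled in this sketch and simply raises KeyError.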
[0078]
By storing the direct mode scaling coefficient table used in the image encoding device in the encoded signal and transmitting it to the image decoding device, the same direct mode scaling coefficients can be used in the image decoding device. To this end, the direct mode scaling coefficient table SPs is read from the direct mode scaling coefficient table buffer SPTableBuf, variable length coded together with the residual encoded signal ERes, the prediction type PredType, the relative indexes RIdx1 and RIdx2, and the motion vectors MV1 and MV2, and output as the coded signal BitStrm4.
[0079]
Here, a method of encoding the direct mode scaling coefficient table into the image encoded signal BitStrm4 will be described. In the present embodiment, four different methods are shown as methods for encoding the direct mode scaling coefficient table.
[0080]
The first encoding method of the direct mode scaling coefficient table encodes all three parameters N1, N2, and D in the image encoded signal BitStrm4. FIG. 3 shows an example of an image coded signal format according to this method. Since only the Header part differs from the image encoding signal format of the conventional image encoding device in FIG. 21, only the Header part is shown. N indicates the number of direct mode scaling coefficients included in the direct mode scaling coefficient table.
[0081]
N1 (r), N2 (r), and D (r) indicate the direct mode scaling coefficient for the first relative index RIdx1 = r. The N direct mode scaling coefficients are stored in the Header part, and the image decoding apparatus can obtain the direct mode scaling coefficient for the first relative index RIdx1 by analyzing the Header part.
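A minimal sketch of the first method follows. It flattens the table into a sequence of integers and parses it back; the ordering of N, N1 (r), N2 (r), D (r) within the sequence and the function names are assumptions, and the variable length coding of each integer is left out.

```python
from typing import List, Sequence, Tuple

Coefficient = Tuple[int, int, int]  # (N1, N2, D)


def write_header(table: Sequence[Coefficient]) -> List[int]:
    """Flatten the table as N followed by N1(r), N2(r), D(r) for r = 0..N-1."""
    symbols = [len(table)]
    for n1, n2, d in table:
        symbols.extend((n1, n2, d))
    return symbols


def read_header(symbols: Sequence[int]) -> List[Coefficient]:
    """Inverse of write_header: rebuild the table indexed by RIdx1 = r."""
    n = symbols[0]
    return [
        (symbols[1 + 3 * r], symbols[2 + 3 * r], symbols[3 + 3 * r])
        for r in range(n)
    ]


table = [(2, -1, 3), (5, -1, 6)]
symbols = write_header(table)
restored = read_header(symbols)
```

Because both sides agree on the layout, the decoder recovers the same table that the encoder used.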
[0082]
The second encoding method of the direct mode scaling coefficient table defines a relational expression that holds among D, N1, and N2 and stores only two of the three parameters D, N1, and N2 in the image encoded signal BitStrm4. FIG. 4 shows an example of an image coded signal format according to this method.
[0083]
Under the assumption that the motion of the object containing the encoding target block is constant within the screen, the first motion vector and the second motion vector in the direct mode are obtained by internally dividing the scaling vector by the ratio of the display time difference between the encoding target frame and the first reference frame to the display time difference between the encoding target frame and the second reference frame. Assuming that the frame rate and the display time interval of the reference frames are constant, the relation D = N1 − N2 holds in the case of internal division. Therefore, by defining the relation D = N1 − N2 in both the image encoding device and the image decoding device, N1, N2, and D can all be determined even if any one of them is not transmitted to the image decoding device. FIG. 4 shows the image coded signal format when the relation D = N1 − N2 holds and N1 is omitted. An image coded signal format in which N2 or D is omitted can be defined in the same way.
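The recovery of an omitted parameter from the relation D = N1 − N2 can be sketched as follows. The function name complete_coefficient and the use of None to mark the omitted parameter are assumptions made for illustration.

```python
from typing import Optional, Tuple


def complete_coefficient(n1: Optional[int] = None,
                         n2: Optional[int] = None,
                         d: Optional[int] = None) -> Tuple[int, int, int]:
    """Fill in the one omitted parameter using D = N1 - N2."""
    if [n1, n2, d].count(None) != 1:
        raise ValueError("exactly one of N1, N2, D must be omitted")
    if n1 is None:
        n1 = d + n2      # N1 = D + N2
    elif n2 is None:
        n2 = n1 - d      # N2 = N1 - D
    else:
        d = n1 - n2      # D = N1 - N2
    return n1, n2, d


# N1 omitted, as in the format of FIG. 4.
coeff = complete_coefficient(n2=-1, d=3)
```

Here coeff evaluates to the coefficient quoted for the block Blk0, so only N2 and D would need to be transmitted in that case.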
[0084]
The third encoding method of the direct mode scaling coefficient table is a method in which one of the values of N1, N2, and D is common to all direct mode scaling coefficients in the table.
[0085]
Here, the first motion vector and the second motion vector in the direct mode are considered to be obtained by internally dividing the scaling vector by the ratio of the display time difference between the encoding target frame and the first reference frame to the display time difference between the encoding target frame and the second reference frame.
[0086]
When the second reference frame used in the direct mode is always the same reference frame within the frame, the display time difference between the encoding target frame and the second reference frame is always constant within the frame. Therefore, N2, which is used in calculating the second motion vector, is fixed to a constant value C, and direct mode scaling coefficients representing internal division according to the display time difference can be generated by varying N1 and D. For example, in the case of FIG. 20, C can be set to −1. FIG. 5 shows an example of an image coded signal format according to this method.
[0087]
The header section stores only one value of N2 commonly used for the direct mode scaling coefficients 1 to N.
[0088]
Similarly, when the first reference frame used in the direct mode is always the same reference frame within the frame, only one value of N1 commonly used for all the direct mode scaling coefficients may be stored in the Header section, together with N sets of N2 and D.
[0089]
Similarly, when the interval between the first reference frame and the second reference frame used in the direct mode is always constant within the frame, only one value of D commonly used for all the direct mode scaling coefficients may be stored in the Header section, together with N sets of N1 and N2.
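The third method can be sketched as a decoder-side expansion of one common value plus N pairs of the remaining parameters. The function name expand_common and the convention that each pair lists the remaining two parameters in the order N1, N2, D are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

ORDER = ("N1", "N2", "D")


def expand_common(common_name: str, common_value: int,
                  pairs: Sequence[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Rebuild (N1, N2, D) rows from one shared parameter and N pairs.

    Each pair holds the two parameters that are not shared, in the order
    they appear in (N1, N2, D).
    """
    if common_name not in ORDER:
        raise ValueError("common_name must be 'N1', 'N2' or 'D'")
    pos = ORDER.index(common_name)
    table = []
    for pair in pairs:
        values = list(pair)
        values.insert(pos, common_value)  # put the shared value in its slot
        table.append((values[0], values[1], values[2]))
    return table


# N2 fixed to C = -1; the Header then carries only (N1, D) pairs.
table = expand_common("N2", -1, [(2, 3), (5, 6)])
```

The same function covers the cases where N1 or D is the shared parameter, by changing common_name.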
[0090]
The fourth encoding method of the direct mode scaling coefficient table is a method of defining a calculation formula for generating N1, N2, and D.
[0091]
For example, if the frame time interval of the reference frames is constant and the first motion vector and the second motion vector are the results of internally dividing the scaling vector, D, N1, and N2 can be calculated by the following equations.
D (i) = K · (i + 1), N2 (i) = a, N1 (i) = D (i) − N2 (i)
(Here, i is the value indicated by the first relative index, K is the frame interval of the reference frames (for example, the interval between Ref1 and Ref2 in FIG. 20), and a is the interval between the encoding target frame and the second reference frame (for example, the interval between Frm and Ref3).)
[0092]
FIG. 6 shows an example of an image coded signal format according to this method. Only K and a are encoded in the encoded signal. By using the above equation for both the image encoding device and the image decoding device, the same direct mode scaling coefficient can be generated on the encoding side and the decoding side.
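As a rough sketch, the table can be regenerated on either side from K and a alone. The code below applies the three formulas exactly as they are written above; the function name generate_table and the values K = 3 and a = 1 are hypothetical, and how the sign of a relates to the N2 values used elsewhere in this description is not addressed by the sketch.

```python
from typing import List, Tuple


def generate_table(k: int, a: int, n: int) -> List[Tuple[int, int, int]]:
    """Generate (N1, N2, D) for i = 0..n-1 from the formulas as written:

        D(i)  = K * (i + 1)
        N2(i) = a
        N1(i) = D(i) - N2(i)
    """
    table = []
    for i in range(n):
        d = k * (i + 1)
        n2 = a
        n1 = d - n2
        table.append((n1, n2, d))
    return table


# Encoder and decoder both call the same function with the transmitted K and a.
encoder_table = generate_table(k=3, a=1, n=4)
decoder_table = generate_table(k=3, a=1, n=4)
```

Since both sides evaluate the same formulas with the same K and a, encoder_table and decoder_table are identical without any coefficient being transmitted individually.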
The above is the description of the encoding method of the direct mode scaling coefficient table.
[0093]
Next, consider a case where the motion vector in the direct mode refers to a frame stored in the long-term frame buffer as shown in FIG. FIG. 7 is an example of a direct mode scaling coefficient table. Here, a common direct mode scaling coefficient is set for all reference frames included in the long-term frame buffer. For the frames included in the long-term frame buffer, only the direct mode scaling coefficient commonly used for all of them is stored in the image coded signal format. Because frames in the long-term frame buffer serve as reference frames for a long time, they are assumed to be images with little motion, so assigning the same coefficient to all of them causes no problem. Only one direct mode scaling coefficient needs to be sent for all the frames included in the long-term frame buffer, so the code amount can be reduced.
[0094]
In the present embodiment, the values of N1, N2, and D are integer values, and a variable length codeword can be assigned to each integer value. However, by limiting the range of the values of N1, N2, and D, more efficient variable-length codeword allocation can be performed. FIG. 8 is an example of a codeword table for a direct mode scaling coefficient. (A) is a variable length codeword table for N1, (B) is a variable length codeword table for N2, and (C) is a variable length codeword table for D.
[0095]
Assuming that the motion of the object containing the encoding target block is constant within the screen, the scaling vector can be internally divided into the first motion vector and the second motion vector. Internal division is possible if D and N1 are positive values and N2 is a negative value. Therefore, restricting D and N1 to positive values and N2 to negative values causes no problem.
[0096]
Therefore, as shown in FIG. 8, variable length codewords may be assigned only to positive values for D and N1 and only to negative values for N2. Since no variable length codeword needs to be assigned to values that are not used, coding efficiency increases. The range of N1 may instead be taken as 0 or more, so that a variable length codeword is also assigned to 0. Similarly, the range of N2 may be taken as 0 or less, so that a variable length codeword is also assigned to 0. In this example, a separate variable length codeword is assigned to each of N1, N2, and D, but a single variable length codeword may be assigned to a combination of N1, N2, and D values.
[0097]
As described above, the image encoding apparatus according to the present embodiment selects the direct mode scaling coefficient corresponding to the first relative index of the direct mode from the direct mode scaling coefficient table and uses it. A direct mode motion vector can thus be generated in consideration of the display time difference between the encoding target frame and the reference frame, so that the encoding efficiency in the direct mode can be improved.
[0098]
(Embodiment 2)
FIG. 9 is a block diagram of an image decoding device according to the second embodiment. Units performing the same operations and signals having the same operations as those in the block diagram showing the configuration of the conventional image decoding apparatus in FIG. 23 are denoted by the same reference numerals, and description thereof will be omitted.
[0099]
The variable length decoding means VLD4 receives the coded image signal BitStrm4, performs variable length decoding, and outputs the residual coded signal ERes, the prediction type PredType, the relative indexes RIdx1 and RIdx2, the motion vectors MV1 and MV2, and the direct mode scaling coefficient table SPs. The direct mode scaling coefficient table SPs is stored in the direct mode scaling coefficient table buffer SPTableBuf. The direct mode scaling coefficient selection unit SPSel performs the same processing as the direct mode scaling coefficient selection unit SPSel of the image encoding apparatus of FIG. 1 shown in the first embodiment.
[0100]
When the direct mode scaling coefficient table SPs is decoded from the image coded signal, decoding corresponding to the encoding method of the direct mode scaling coefficient table described in the first embodiment is performed.
[0101]
For example, for the second encoding method of the direct mode scaling coefficient table described in the first embodiment, in which a relational expression is defined among N1, N2, and D and one parameter is omitted, the unknown parameter among N1, N2, and D is calculated from the relational expression.
[0102]
When one of the parameters is made common to all the direct mode scaling coefficients and only the one common parameter is stored in the image coded signal, as in the image coded signal format of FIG., that parameter is decoded and used for all the direct mode scaling coefficients.
[0103]
When the parameter value can be calculated by a predetermined calculation formula as in the image coded signal format of FIG. 6, the parameter value obtained by using the predetermined calculation formula is used for all direct mode scaling coefficients.
[0104]
Next, two methods will be described for processing when the direct mode scaling coefficient corresponding to the relative index rRIdx does not exist in the direct mode scaling coefficient table.
[0105]
First, the first method will be described. FIG. 10 is a flowchart of the first method of selecting a direct mode scaling coefficient in the present embodiment. rRIdx is the first relative index in the direct mode. In process F11, it is checked whether a direct mode scaling coefficient for RIdx1 = rRIdx exists in the direct mode scaling coefficient table. If it exists, the direct mode scaling coefficient corresponding to rRIdx is selected in process F13. If it does not exist, the direct mode scaling coefficient with the maximum value of RIdx1 in the table is selected in process F12. The selection of the direct mode scaling coefficient is then complete. The coefficient with the largest RIdx1 is selected because, for a value of rRIdx that is not in the table, the closest direct mode scaling coefficient in the table is the one with the largest RIdx1 value. As described in the related art, first relative index values are assigned starting from 0 to the reference frames whose display time is earlier than that of the encoding target frame, in order of closeness to the encoding target frame.
[0106]
Next, the second method will be described. FIG. 11 is a flowchart of the second method of selecting a direct mode scaling coefficient in the present embodiment. The difference from FIG. 10 is that, if the direct mode scaling coefficient for RIdx1 = rRIdx does not exist in the direct mode scaling coefficient table, a default direct mode scaling coefficient is used in process F22.
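The two fallback methods can be sketched side by side. The function names and the dictionary representation of the table are assumptions, and the default coefficient passed to the second function is a hypothetical value, since the text does not state what the default is.

```python
from typing import Dict, Tuple

Coefficient = Tuple[int, int, int]  # (N1, N2, D)


def select_with_max_fallback(table: Dict[int, Coefficient],
                             r_ridx: int) -> Coefficient:
    """First method: use the row for r_ridx if present (F11 -> F13),
    otherwise the row with the largest RIdx1 in the table (F12)."""
    if r_ridx in table:
        return table[r_ridx]
    return table[max(table)]


def select_with_default_fallback(table: Dict[int, Coefficient],
                                 r_ridx: int,
                                 default: Coefficient) -> Coefficient:
    """Second method: use the row for r_ridx if present,
    otherwise a default coefficient (F22)."""
    return table.get(r_ridx, default)


table = {0: (2, -1, 3), 1: (5, -1, 6)}
by_max = select_with_max_fallback(table, 4)
by_default = select_with_default_fallback(table, 4, default=(1, -1, 2))  # hypothetical default
```

Both functions return the tabulated row unchanged when rRIdx is present; they differ only in what is returned for a missing index.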
[0107]
As described above, according to the present embodiment, an image coded signal encoded by the image encoding device using the image encoding method of the present invention described in Embodiment 1 can be correctly decoded. Also, by using a plurality of direct mode scaling coefficients, a direct mode motion vector can be generated in consideration of the display time difference between the encoding target frame and the reference frame even when there are a plurality of reference frame candidates, so that the coding efficiency can be improved.
[0108]
(Embodiment 3)
FIG. 12 is a block diagram of an image encoding device according to the third embodiment. Units performing the same operations and signals having the same operations as those in the block diagram of the image encoding apparatus according to the first embodiment in FIG.
[0109]
In the conventional direct mode, only interpolation prediction can be used as a prediction method. In the present embodiment, frame prediction from one reference frame can be used in the direct mode. For example, in FIG. 20, when a scene change occurs between Ref1 and Frm, there is no correlation between Ref1, Ref2 and Frm, and the effect of inter-frame prediction is lost. In this case, the prediction efficiency is higher in the prediction using only Ref3 as the reference frame than in the interpolation prediction using the reference frames Ref1 and Ref2 having no correlation.
[0110]
Therefore, if the first reference frame prediction or the second reference frame prediction using only one reference frame is used as the direct mode, the coding efficiency can be improved.
[0111]
The direct mode scaling coefficient / prediction method table stored in the direct mode scaling coefficient / prediction method table buffer SPPRedTableBuf will be described with reference to FIG. The table of FIG. 13 differs from the table of FIG. 2 in that a prediction method is added.
[0112]
The direct mode scaling coefficient / prediction method selecting means SPPPredSel selects a set of the direct mode scaling coefficient SP and the direct mode prediction method DPred corresponding to the first relative index rRIdx1 from the direct mode scaling coefficient / prediction method table buffer SPPRedTableBuf. And output.
[0113]
When the direct mode prediction method DPred indicates the first reference frame prediction, the direct mode vector / relative index generation unit GenMVRefIdx2 outputs only the first relative index sRIdx1 and the first motion vector sMV1.
[0114]
When the direct mode prediction method DPred indicates the second reference frame prediction, the direct mode vector / relative index generation unit GenMVRefIdx2 outputs only the second relative index sRIdx2 and the second motion vector sMV2.
[0115]
When the direct mode is selected and the direct mode prediction method DPred indicates the first reference frame prediction, the prediction type selection unit PredSel switches the switch SW11 to the “1” side, and the reference block RefBlk1 indicated by the first relative index RIdx1 and the first motion vector MV1 is used for prediction. When the direct mode is selected and the direct mode prediction method DPred indicates the second reference frame prediction, the prediction type selection unit PredSel switches the switch SW11 to the “1” side, and the reference block RefBlk2 indicated by the second relative index RIdx2 and the second motion vector MV2 is used for prediction.
[0116]
When the direct mode prediction method DPred indicates interpolation prediction, the switch SW11 is switched to the “0” side, and the reference block RefBlk1 indicated by the first relative index RIdx1 and the first motion vector MV1 and the reference block RefBlk2 indicated by the second relative index RIdx2 and the second motion vector MV2 are used for interpolation prediction.
[0117]
The direct mode scaling coefficient / prediction method table SPPreds is stored in the coded signal so that the image coding apparatus and the image decoding apparatus can use a common direct mode scaling coefficient / prediction method table SPPreds. At this time, by using the values of the direct mode scaling coefficients N1 and N2 as described below, the direct mode prediction method DPred can be signaled to the image decoding apparatus without encoding the prediction method in the table, that is, while keeping the image coded signal format shown in the first embodiment.
[0118]
When a direct mode scaling coefficient in which N2 is 0 is selected from the table of FIG. 13, it is defined that the first reference frame prediction is used instead of the interpolation prediction. At this time, if the motion vector to be scaled is MV, the first motion vector can be calculated as (N1 × MV) / D.
[0119]
Similarly, when a direct mode scaling coefficient in which N1 is 0 is selected, it is defined that the second reference frame prediction is used, and the second motion vector can be calculated as (N2 × MV) / D.
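The rule in the two preceding paragraphs can be sketched in a few lines. The function names below are hypothetical, the vector toward the second reference frame is taken to be (N2 × MV) / D, and the use of floor division for the rounding is an assumption, since the text gives only the formula and not the rounding rule.

```python
from enum import Enum
from typing import Optional, Tuple

Vector = Tuple[int, int]  # (horizontal, vertical) components


class DPred(Enum):
    INTERPOLATION = "interpolation prediction"
    FIRST_REF = "first reference frame prediction"
    SECOND_REF = "second reference frame prediction"


def infer_dpred(n1: int, n2: int) -> DPred:
    """Derive the prediction method from N1 and N2 without coding it explicitly."""
    if n1 == 0 and n2 == 0:
        # ASSUMPTION: the text does not define this case, so reject it.
        raise ValueError("N1 and N2 must not both be 0")
    if n2 == 0:
        return DPred.FIRST_REF
    if n1 == 0:
        return DPred.SECOND_REF
    return DPred.INTERPOLATION


def scale_vector(mv: Vector, n: int, d: int) -> Vector:
    """Compute (N x MV) / D per component. ASSUMPTION: floor division."""
    if d == 0:
        raise ValueError("D must be non-zero")
    return (n * mv[0]) // d, (n * mv[1]) // d


def direct_mode_vectors(
    mv: Vector, n1: int, n2: int, d: int
) -> Tuple[DPred, Optional[Vector], Optional[Vector]]:
    """Return the prediction method and the vectors that are actually needed."""
    dpred = infer_dpred(n1, n2)
    mv1 = scale_vector(mv, n1, d) if dpred is not DPred.SECOND_REF else None
    mv2 = scale_vector(mv, n2, d) if dpred is not DPred.FIRST_REF else None
    return dpred, mv1, mv2
```

With MV = (8, -4), N1 = 1, N2 = 0 and D = 2, the sketch selects the first reference frame prediction and returns (4, -2) as the only vector. Because the method is inferred from the coefficients, no extra field is needed in the coded signal.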
[0120]
The variable length coding unit VLC5 performs variable length coding on the residual coded signal ERes, the prediction type PredType, the relative indexes RefIdx1 and RefIdx2, the motion vectors MV1 and MV2, and the direct mode scaling coefficient / prediction method table SPPreds, and outputs the result as the coded signal BitStrm5.
[0121]
As described above, according to the present embodiment, even when a scene change or the like leaves only one reference frame with a high correlation to the encoding target frame and the effect of interpolation prediction is lost, the direct mode can still be used with only the highly correlated reference frame, so that coding efficiency can be improved.
[0122]
(Embodiment 4)
FIG. 14 is a block diagram of an image decoding device according to the fourth embodiment. Units performing the same operations and signals having the same functions as in the block diagram of the image decoding apparatus according to the second embodiment in FIG. 9 are denoted by the same symbols, and their description is omitted.
[0123]
The variable length decoding unit VLD5 receives the coded image signal BitStrm5, performs variable length decoding, and outputs the residual coded signal ERes, the prediction type PredType, the relative indexes RIdx1 and RIdx2, the motion vectors MV1 and MV2, and the direct mode scaling coefficient / prediction method table SPPreds.
[0124]
The direct mode scaling coefficient / prediction method table SPPreds is stored in the direct mode scaling coefficient / prediction method table buffer SPPredTableBuf. The direct mode scaling coefficient / prediction method selecting means SPPredSel selects, from the direct mode scaling coefficient / prediction method table buffer SPPredTableBuf, the pair of the direct mode scaling coefficient SP and the direct mode prediction method DPred corresponding to the first relative index rRIdx1, and outputs it.
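The selection step can be pictured as a table lookup keyed by the first relative index. The sketch below is illustrative only: the entry layout, the names `SPPredEntry` and `select_sp_pred`, and the table values are hypothetical and are not taken from FIG. 13.

```python
from typing import List, NamedTuple


class SPPredEntry(NamedTuple):
    """One table row: scaling coefficients plus the prediction method."""
    n1: int
    n2: int
    d: int
    dpred: str


def select_sp_pred(table: List[SPPredEntry], first_rel_idx: int) -> SPPredEntry:
    """Pick the entry that corresponds to the first relative index."""
    if not 0 <= first_rel_idx < len(table):
        raise IndexError(f"no table entry for first relative index {first_rel_idx}")
    return table[first_rel_idx]


# HYPOTHETICAL values for illustration only; the real table is carried in the
# coded signal and decoded into the table buffer.
EXAMPLE_TABLE = [
    SPPredEntry(n1=1, n2=-1, d=2, dpred="interpolation"),
    SPPredEntry(n1=1, n2=0, d=1, dpred="first"),
    SPPredEntry(n1=0, n2=1, d=1, dpred="second"),
]

entry = select_sp_pred(EXAMPLE_TABLE, 1)
```

Because the encoder and the decoder hold the same table, the first relative index alone is enough to recover both the scaling coefficient and the prediction method on the decoding side.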
[0125]
When the direct mode prediction method DPred indicates the first reference frame prediction, the direct mode vector / relative index generation unit GenMVRefIdx2 outputs only the first relative index sRIdx1 and the first motion vector sMV1.
[0126]
When the direct mode prediction method DPred indicates the second reference frame prediction, the direct mode vector / relative index generation unit GenMVRefIdx2 outputs only the second relative index number sRIdx2 and the second motion vector sMV2.
[0127]
When the direct mode is selected and the direct mode prediction method DPred indicates the first reference frame prediction, the switch SW23 is switched to the “1” side, and the reference block RefBlk indicated by the first relative index nRIdx1 and the first motion vector nMV1 is used for prediction.
[0128]
When the direct mode is selected and the direct mode prediction method DPred indicates the second reference frame prediction, the switch SW23 is switched to the “1” side, and the reference block RefBlk indicated by the second relative index nRIdx2 and the second motion vector nMV2 is used for prediction.
[0129]
When the direct mode prediction method DPred indicates the interpolation prediction, the switch SW23 is switched to the “0” side, and the reference block RefBlk1 indicated by the first relative index nRIdx1 and the first motion vector nMV1 and the reference block RefBlk2 indicated by the second relative index nRIdx2 and the second motion vector nMV2 are used for interpolation prediction.
[0130]
As described above, the image decoding apparatus according to the present embodiment decodes the direct mode scaling coefficient / prediction method table in the encoded signal and uses the direct mode scaling coefficient selected from that table according to the direct mode relative index value, so that the image encoded signal produced by the image encoding device described in the third embodiment can be correctly decoded.
[0131]
(Embodiment 5)
Furthermore, by recording a program for realizing the image encoding method or the image decoding method described in each of the above embodiments on a storage medium such as a flexible disk, the processing described in each of the above embodiments can be easily carried out in an independent computer system.
[0132]
FIG. 15 is an explanatory diagram of a storage medium for storing a program for realizing the image encoding method and the image decoding method of Embodiments 1 to 4 by a computer system.
[0133]
FIG. 15B shows the appearance of the flexible disk as viewed from the front, its cross-sectional structure, and the flexible disk itself, and FIG. 15A shows an example of the physical format of the flexible disk, which is the recording medium body. The flexible disk FD is housed in the case F; a plurality of tracks Tr are formed concentrically on the surface of the disk from the outer circumference toward the inner circumference, and each track is divided into 16 sectors Se in the angular direction. Therefore, in the flexible disk storing the program, the image encoding method as the program is recorded in an area allocated on the flexible disk FD.
[0134]
FIG. 15C shows a configuration for recording and reproducing the program on the flexible disk FD. When the above program is recorded on the flexible disk FD, the image coding method or the image decoding method as the above program is written from the computer system Cs via the flexible disk drive FDD. When the image encoding method is constructed in a computer system using a program in a flexible disk, the program is read from the flexible disk by a flexible disk drive and transferred to the computer system.
[0135]
In the above description, the description has been made using a flexible disk as a recording medium. However, the same description can be made using an optical disk. Further, the recording medium is not limited to this, and the present invention can be similarly implemented as long as the program can be recorded, such as an IC card or a ROM cassette.
[0136]
Further, here, an application example of the image encoding device or the image decoding device described in the above embodiment and a system using the same will be described.
[0137]
FIG. 24 is a block diagram illustrating an overall configuration of a content supply system ex100 that realizes a content distribution service. A communication service providing area is divided into desired sizes, and base stations ex107 to ex110, which are fixed wireless stations, are installed in each cell.
[0138]
In the content supply system ex100, for example, devices such as a computer ex111, a PDA (personal digital assistant) ex112, a camera ex113, a mobile phone ex114, and a camera-equipped mobile phone ex115 are connected to the Internet ex101 via an Internet service provider ex102, a telephone network ex104, and base stations ex107 to ex110.
[0139]
However, the content supply system ex100 is not limited to the combination as shown in FIG. 24, and may be connected in any combination. Further, each device may be directly connected to the telephone network ex104 without going through the base stations ex107 to ex110 which are fixed wireless stations.
[0140]
The camera ex113 is a device such as a digital video camera capable of shooting moving images. The mobile phone may be a mobile phone of the PDC (Personal Digital Communications) system, the CDMA (Code Division Multiple Access) system, the W-CDMA (Wideband-Code Division Multiple Access) system, or the GSM (Global System for Mobile Communications) system, or a PHS (Personal Handyphone System) or the like.
[0141]
The streaming server ex103 is connected to the camera ex113 via the base station ex109 and the telephone network ex104, and enables live distribution and the like based on encoded data transmitted by the user using the camera ex113. The encoding process of the captured data may be performed by the camera ex113, or may be performed by a server or the like that performs the data transmission process. Also, moving image data captured by the camera ex116 may be transmitted to the streaming server ex103 via the computer ex111. The camera ex116 is a device such as a digital camera that can shoot still images and moving images. In this case, encoding of the moving image data may be performed by either the camera ex116 or the computer ex111. The encoding process is performed in the LSI ex117 included in the computer ex111 or the camera ex116. The image encoding / decoding software may be stored on any storage medium (CD-ROM, flexible disk, hard disk, or the like) that is a recording medium readable by the computer ex111 or the like. Further, the moving image data may be transmitted by the camera-equipped mobile phone ex115. The moving image data at this time is data encoded by the LSI included in the mobile phone ex115.
[0142]
In the content supply system ex100, content (for example, video of a live music performance) captured by the user with the camera ex113, the camera ex116, or the like is encoded as in the above-described embodiments and transmitted to the streaming server ex103, and the streaming server ex103 stream-distributes the content data to requesting clients. Examples of the client include the computer ex111, the PDA ex112, the camera ex113, and the mobile phone ex114, which can decode the encoded data. In this way, the content supply system ex100 allows the client to receive and reproduce the encoded data, and further allows the client to receive, decode, and reproduce the data in real time, thereby making personal broadcasting possible.
[0143]
The encoding and decoding of each device constituting this system may be performed using the image encoding device or the image decoding device described in each of the above embodiments.
[0144]
A mobile phone will be described as an example.
FIG. 25 is a diagram illustrating a mobile phone ex115 that uses the image encoding device and the image decoding device described in the above embodiments. The mobile phone ex115 includes an antenna ex201 for transmitting and receiving radio waves to and from the base station ex110; a camera unit ex203, such as a CCD camera, capable of capturing video and still images; a display unit ex202, such as a liquid crystal display, for displaying data obtained by decoding video captured by the camera unit ex203, video received by the antenna ex201, and the like; a main body including a group of operation keys ex204; an audio output unit ex208, such as a speaker, for outputting audio; an audio input unit ex205, such as a microphone, for inputting audio; a storage medium ex207 for storing encoded or decoded data such as captured moving image or still image data, received e-mail data, and moving image or still image data; and a slot unit ex206 that allows the storage medium ex207 to be attached to the mobile phone ex115. The storage medium ex207 houses, in a plastic case such as an SD card, a flash memory element, which is a kind of EEPROM (Electrically Erasable and Programmable Read Only Memory), a nonvolatile memory that can be electrically rewritten and erased.
[0145]
Further, the mobile phone ex115 will be described with reference to FIG. 26. In the mobile phone ex115, a power supply circuit unit ex310, an operation input control unit ex304, an image encoding unit ex312, a camera interface unit ex303, an LCD (Liquid Crystal Display) control unit ex302, an image decoding unit ex309, a demultiplexing unit ex308, a recording / reproducing unit ex307, a modulation / demodulation circuit unit ex306, and an audio processing unit ex305 are connected via a synchronous bus ex313 to one another and to a main control unit ex311, which performs overall control of the units of the main body including the display unit ex202 and the operation keys ex204.
[0146]
When the call-end / power key is turned on by a user operation, the power supply circuit unit ex310 supplies power from the battery pack to each unit, thereby starting up the camera-equipped digital mobile phone ex115 into an operable state.
[0147]
In the voice call mode, under the control of the main control unit ex311, which includes a CPU, a ROM, a RAM, and the like, the mobile phone ex115 converts the audio signal collected by the audio input unit ex205 into digital audio data in the audio processing unit ex305, performs spread spectrum processing on it in the modulation / demodulation circuit unit ex306, performs digital-to-analog conversion and frequency conversion in the transmission / reception circuit unit ex301, and then transmits it via the antenna ex201. Also in the voice call mode, the mobile phone ex115 amplifies the signal received by the antenna ex201, performs frequency conversion and analog-to-digital conversion, performs spectrum despreading in the modulation / demodulation circuit unit ex306, converts the result into an analog audio signal, and then outputs it via the audio output unit ex208.
[0148]
Further, when an e-mail is transmitted in the data communication mode, text data of the e-mail input by operating the operation key ex204 of the main body is sent to the main control unit ex311 via the operation input control unit ex304. The main control unit ex311 performs spread spectrum processing on the text data in the modulation / demodulation circuit unit ex306, performs digital / analog conversion processing and frequency conversion processing in the transmission / reception circuit unit ex301, and transmits the data to the base station ex110 via the antenna ex201.
[0149]
When transmitting image data in the data communication mode, the image data captured by the camera unit ex203 is supplied to the image encoding unit ex312 via the camera interface unit ex303. When image data is not transmitted, image data captured by the camera unit ex203 can be directly displayed on the display unit ex202 via the camera interface unit ex303 and the LCD control unit ex302.
[0150]
The image encoding unit ex312 includes the image encoding device described in the present invention. It converts the image data supplied from the camera unit ex203 into encoded image data by compression-encoding it with the encoding method used in the image encoding device described in the above embodiments, and sends the encoded image data to the demultiplexing unit ex308. At this time, the mobile phone ex115 simultaneously sends the audio collected by the audio input unit ex205 during imaging by the camera unit ex203 to the demultiplexing unit ex308 as digital audio data via the audio processing unit ex305.
[0151]
The demultiplexing unit ex308 multiplexes the encoded image data supplied from the image encoding unit ex312 and the audio data supplied from the audio processing unit ex305 by a predetermined method. The resulting multiplexed data is subjected to spread spectrum processing in the modulation / demodulation circuit unit ex306 and to digital-to-analog conversion and frequency conversion in the transmission / reception circuit unit ex301, and is then transmitted via the antenna ex201.
[0152]
When receiving data of a moving image file linked to a homepage or the like in the data communication mode, the modulation / demodulation circuit unit ex306 performs spectrum despreading on the signal received from the base station ex110 via the antenna ex201, and sends the resulting multiplexed data to the demultiplexing unit ex308.
[0153]
To decode the multiplexed data received via the antenna ex201, the demultiplexing unit ex308 separates the multiplexed data into encoded image data and audio data, and, via the synchronous bus ex313, supplies the encoded image data to the image decoding unit ex309 and the audio data to the audio processing unit ex305.
[0154]
Next, the image decoding unit ex309 includes the image decoding device described in the present invention, and decodes the encoded image data by a decoding method corresponding to the encoding method described in the above embodiments. The reproduced moving image data thus generated is supplied to the display unit ex202 via the LCD control unit ex302, whereby, for example, moving image data included in a moving image file linked to a homepage is displayed. At this time, the audio processing unit ex305 simultaneously converts the audio data into an analog audio signal and supplies it to the audio output unit ex208, whereby, for example, the audio data included in the moving image file linked to the homepage is reproduced.
[0155]
It should be noted that the present invention is not limited to the example of the system described above. Digital broadcasting by satellite and terrestrial waves has recently attracted attention, and as shown in FIG. 27, at least either the image encoding device or the image decoding device of the above embodiments can also be incorporated into a digital broadcasting system. Specifically, at the broadcasting station ex409, an encoded bit stream of video information is transmitted to a communication or broadcasting satellite ex410 via radio waves. The broadcasting satellite ex410 that receives it transmits a radio wave for broadcasting, a home antenna ex406 with satellite broadcast receiving equipment receives this radio wave, and a device such as a television (receiver) ex401 or a set-top box (STB) ex407 decodes the encoded bit stream and reproduces it. Further, the image decoding device described in the above embodiments can also be mounted on a playback device ex403 that reads and decodes an encoded bit stream recorded on a storage medium ex402, which is a recording medium. In this case, the reproduced video signal is displayed on the monitor ex404. A configuration is also conceivable in which the image decoding device is mounted in a set-top box ex407 connected to a cable ex405 for cable television or an antenna ex406 for satellite / terrestrial broadcasting, and the video is reproduced on a monitor ex408 of the television. At this time, the image decoding device may be incorporated in the television instead of the set-top box. Further, a car ex412 having an antenna ex411 can receive a signal from the satellite ex410, the base station ex107, or the like, and reproduce the moving image on a display device such as the car navigation system ex413 included in the car ex412.
[0156]
Note that the car navigation system ex413 can be configured, for example, as the configuration illustrated in FIG. 26 without the camera unit ex203 and the camera interface unit ex303, and the same applies to the computer ex111 and the television (receiver) ex401. In addition, three implementation forms are conceivable for a terminal such as the mobile phone ex114: a transmitting / receiving terminal having both an encoder and a decoder, a transmitting terminal having only an encoder, and a receiving terminal having only a decoder.
[0157]
As described above, the image encoding device and the image decoding device described in the above embodiments can be used in any of the devices and systems described above, and by doing so, the effects described in the above embodiments can be obtained.
[0158]
[Effects of the invention]
As described above in detail, the image encoding method and image decoding method of the present invention are provided with a direct mode scaling coefficient table storing a plurality of direct mode scaling coefficients, and select and use a direct mode scaling coefficient according to the first relative index value of the direct mode. A direct mode motion vector can thus be generated in consideration of the display time difference between the encoding target frame and the reference frame, so that the encoding efficiency in the direct mode can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram of an image encoding device according to a first embodiment.
FIG. 2 is a diagram of a first example of a direct mode scaling coefficient table according to the first embodiment;
FIG. 3 is a diagram of a first example of an image coded signal format according to the first embodiment;
FIG. 4 is a diagram of a second example of an image coded signal format according to the first embodiment;
FIG. 5 is a diagram illustrating a third example of an image coded signal format according to the first embodiment;
FIG. 6 is a diagram illustrating a fourth example of an image coded signal format according to the first embodiment;
FIG. 7 is a diagram of a second example of a direct mode scaling coefficient table according to the first embodiment;
FIG. 8 is a diagram of a code table for scaling coefficients for direct mode according to the first embodiment.
FIG. 9 is a block diagram of an image decoding device according to a second embodiment.
FIG. 10 is a flowchart for selecting a scaling coefficient for direct mode according to the first method of the second embodiment.
FIG. 11 is a flowchart for selecting a scaling coefficient for direct mode according to a second method of the second embodiment.
FIG. 12 is a block diagram of an image encoding apparatus according to a third embodiment.
FIG. 13 is a diagram of a direct mode scaling coefficient / prediction method table according to the third embodiment;
FIG. 14 is a block diagram of an image decoding apparatus according to a fourth embodiment.
FIG. 15 is a diagram illustrating a storage medium for storing a program for realizing the image encoding method and the image decoding method according to the first to fourth embodiments by a computer system.
FIG. 16 is a conceptual diagram of a B picture.
FIG. 17 is an explanatory diagram of interpolation prediction.
FIG. 18 is an explanatory diagram of a frame number and a relative index.
FIG. 19 is a conceptual diagram of a short-term frame buffer and a long-term frame buffer.
FIG. 20 is an explanatory diagram of a direct mode of a conventional image encoding device.
FIG. 21 is a conceptual diagram of an image encoded signal format of a conventional image encoding device.
FIG. 22 is a block diagram illustrating a configuration of a conventional image encoding device.
FIG. 23 is a block diagram showing a configuration of a conventional image decoding device.
FIG. 24 is a block diagram showing the overall configuration of a content supply system.
FIG. 25 illustrates an appearance of a mobile phone.
FIG. 26 illustrates a configuration of a mobile phone.
FIG. 27 is a diagram illustrating an application example of the image encoding device or the image decoding device described in this embodiment.
[Explanation of symbols]
ImgEnc image coding means
ImgDec image decoding means
Add adder
Sub subtractor
MFrmBuf multi-frame buffer
ME motion estimation means
VLC0, VLC4, VLC5 Variable length coding means
VLD0, VLD4, VLD5 Variable length decoding means
MVBuf motion vector buffer
Pol pixel interpolation means
GenMVRdx Direct mode vector / relative index generation means
MVFNBuf motion vector frame number buffer
SPTable Direct mode scaling coefficient table buffer
SPSel Direct mode scaling coefficient selection means
PredSel prediction type selection means
SPPredTable Scaling coefficient / prediction method table buffer for direct mode
SPPredSel Direct mode scaling coefficient / prediction method selection means
SW11 to SW13, SW21 to SW23 switches
Cs computer system
FD flexible disk
FDD flexible disk drive

Claims

In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on an encoding target frame by motion compensation from a plurality of encoded frames stored in the multi-frame buffer, A first step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the encoded frame;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A third step of encoding a prediction error, which is the difference between the input encoding target frame and the predicted image, and outputting an image encoded signal including an encoded signal of the prediction error; wherein, in the second step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient table is provided that stores one or more scaling coefficients, each associated with a value of the first relative index, for use in calculating a motion vector for the first reference frame and a motion vector for the second reference frame by scaling the motion vector; a scaling coefficient corresponding to the first relative index is selected from the scaling coefficient table; and the motion vector for the first reference frame and the motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient,
A predicted image is generated by pixel interpolation from a block obtained from a motion vector for the first reference frame and a block obtained from a motion vector for the second reference frame, and in the third step,
An image encoding method, wherein the image encoding signal is output including the scaling coefficient table.

A first step of inputting an image encoded signal including an encoded signal of a prediction error,
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on a decoding target frame by motion compensation from a plurality of decoded frames stored in the multi-frame buffer, A second step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the decoded frame;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing, in the multi-frame buffer, a decoded image of a frame that may be used for inter-frame prediction; wherein the image decoding method
decodes the scaling coefficient table in the image encoded signal; in the second step,
among the decoded frames, a frame that is later in display order than the decoding target frame and has the smallest second relative index is selected as the second reference frame; and in the third step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the decoding target frame is set as the first reference frame,
A scaling coefficient corresponding to the first relative index, for use in calculating a motion vector for the first reference frame and a motion vector for the second reference frame by scaling the motion vector, is selected from the scaling coefficient table; the motion vector for the first reference frame and the motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient; the image decoding method being characterized by generating a predicted image by pixel interpolation from a block obtained from the motion vector for the first reference frame and a block obtained from the motion vector for the second reference frame.

In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on an encoding target frame by motion compensation from a plurality of encoded frames stored in the multi-frame buffer, A first step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the encoded frame;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A third step of encoding a prediction error, which is the difference between the input encoding target frame and the predicted image, and outputting an image encoded signal including an encoded signal of the prediction error; wherein, in the second step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient / prediction method table is provided that stores one or more pairs of a scaling coefficient, associated with a value of the first relative index and used when calculating a motion vector for the first reference frame and a motion vector for the second reference frame by scaling the motion vector, and a prediction method type indicating pixel interpolation, prediction from the first reference frame, or prediction from the second reference frame;
A scaling coefficient and a prediction method type corresponding to the first relative index are selected from the scaling coefficient / prediction method table, and the motion vector for the first reference frame and the motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a predicted image is generated by pixel interpolation from a block obtained from a motion vector for the first reference frame and a block obtained from a motion vector for the second reference frame,
When the prediction method type is prediction from a first reference frame, a block obtained from a motion vector for the first reference frame is set as a prediction image,
When the prediction method type is prediction from a second reference frame, a block obtained from a motion vector for the second reference frame is used as a prediction image, and in the third step,
An image encoding method characterized by outputting the above-mentioned scaling coefficient / prediction method table in an image encoded signal.

A first step of inputting an image encoded signal including an encoded signal of a prediction error,
In order to select a first reference frame and a second reference frame to be referred to when obtaining a block on a decoding target frame by motion compensation from a plurality of decoded frames stored in the multi-frame buffer, A second step of selecting the first or second at least one reference frame using a first relative index and a second relative index given to the decoded frame;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on the first or second at least one reference frame;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing, in the multi-frame buffer, a decoded image of a frame that may be used for inter-frame prediction; wherein the image decoding method
decodes the scaling coefficient / prediction method table in the image encoded signal; in the second step,
among the decoded frames, a frame that is later in display order than the decoding target frame and has the smallest second relative index is selected as the second reference frame; and in the third step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the decoding target frame is set as the first reference frame,
A scaling coefficient corresponding to the first relative index, for use in calculating a motion vector for the first reference frame and a motion vector for the second reference frame by scaling the motion vector, and a prediction method type are selected from the scaling coefficient / prediction method table, and the motion vector for the first reference frame and the motion vector for the second reference frame are calculated from the motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a predicted image is generated by pixel interpolation from a block obtained from a motion vector for the first reference frame and a block obtained from a motion vector for the second reference frame,
When the prediction method type is prediction from a first reference frame, a block obtained from a motion vector for the first reference frame is set as a prediction image,
When the prediction method type is prediction from a second reference frame, a block obtained from a motion vector for the second reference frame is used as a prediction image.
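
The three prediction method types named in this claim can be illustrated with a minimal Python sketch. The names `PredictionType` and `predict_block` are hypothetical, blocks are modeled as lists of rows of integer samples, and pixel interpolation is assumed here to be a rounded average of the two blocks; the claim itself does not fix the interpolation formula.

```python
from enum import Enum


class PredictionType(Enum):
    """Hypothetical labels for the three prediction method types in the claim."""
    PIXEL_INTERPOLATION = 0
    FIRST_REFERENCE = 1
    SECOND_REFERENCE = 2


def predict_block(pred_type, block1, block2):
    """Build the predicted block from two motion-compensated blocks.

    block1 comes from the first reference frame, block2 from the second.
    """
    if pred_type is PredictionType.FIRST_REFERENCE:
        return [row[:] for row in block1]
    if pred_type is PredictionType.SECOND_REFERENCE:
        return [row[:] for row in block2]
    # Assumption: interpolation is the average of the two samples, rounded up at .5.
    return [
        [(a + b + 1) >> 1 for a, b in zip(row1, row2)]
        for row1, row2 in zip(block1, block2)
    ]
```

With this sketch, a table entry whose type is prediction from one reference frame simply bypasses the averaging, which is the behavior the last three clauses of the claim describe.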

An image encoding apparatus comprising:
A difference device that receives an image signal, takes the difference between the image signal and a predicted image, and outputs the result as a residual signal;
An image encoding unit that performs image encoding processing on the residual signal and outputs the result as a residual encoded signal;
Image decoding means for decoding the residual encoded signal and outputting the result as a residual decoded signal;
An adder that adds the residual decoded signal and the predicted image and outputs a reconstructed image;
A multi-frame buffer that stores the reconstructed image;
A scaling coefficient table storing at least one scaling coefficient associated with a value of the first relative index;
Direct mode scaling coefficient selection means for taking, as a first reference frame, the frame referred to by the motion vector of the block located at the same position as the encoding target block in a selected reference frame, taking the selected reference frame as a second reference frame, and selecting from the scaling coefficient table the scaling coefficient corresponding to the first relative index, which is the relative index for the first reference frame;
A direct mode vector / relative index generation unit that generates, by a predetermined method, motion vectors for the first reference frame and the second reference frame from the scaling coefficient and the motion vector;
A pixel interpolation unit that performs pixel interpolation on the two reference blocks referred to by the motion vector for the first reference frame and the motion vector for the second reference frame, and outputs the result as the predicted image; and
A variable length encoding unit that performs variable length encoding of the prediction error and the scaling coefficient table and outputs the result as an image encoded signal.
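
As a rough illustration of the scaling coefficient selection and direct mode vector generation in this apparatus, the Python sketch below looks up a coefficient by first relative index and derives two motion vectors from the co-located block's motion vector. The claim only says "a predetermined method", so the formula used here is an assumption borrowed from conventional temporal direct mode: the vector for the first reference frame is the scaled co-located vector, and the vector for the second reference frame is that result minus the co-located vector. The table values and the names `SCALING_TABLE` and `direct_mode_vectors` are hypothetical.

```python
from fractions import Fraction

# Hypothetical table: first relative index -> scaling coefficient.
SCALING_TABLE = {0: Fraction(1, 2), 1: Fraction(1, 3)}


def direct_mode_vectors(colocated_mv, first_relative_index, table=SCALING_TABLE):
    """Return (mv_first, mv_second) for a direct mode block.

    colocated_mv is the (x, y) motion vector of the block at the same
    position in the second reference frame.
    """
    coefficient = table[first_relative_index]
    # Assumed rule: scale each component, rounding to the nearest integer
    # (Python's round() resolves exact ties to the even integer).
    mv_first = tuple(round(coefficient * c) for c in colocated_mv)
    # Assumed rule: the second vector points the opposite way in time.
    mv_second = tuple(f - c for f, c in zip(mv_first, colocated_mv))
    return mv_first, mv_second
```

Keeping one coefficient per first relative index is what lets the generated vectors reflect the display time difference between the target frame and each candidate reference frame, since each index can carry its own ratio.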

An image decoding apparatus comprising:
Variable length decoding means for receiving an image encoded signal, performing variable length decoding, and outputting a residual encoded signal and a scaling coefficient table;
Image decoding means for decoding the residual encoded signal and outputting a residual decoded signal;
An adder that adds the residual decoded signal and a predicted image signal and outputs a decoded image;
A multi-frame buffer that stores the decoded image;
Direct mode scaling coefficient selection means for taking, as a first reference frame, the frame referred to by the motion vector of the block located at the same position as the encoding target block in a selected reference frame, taking the selected reference frame as a second reference frame, and selecting from the scaling coefficient table the scaling coefficient corresponding to the first relative index, which is the relative index for the first reference frame;
Direct mode vector / relative index generation means for generating, by a predetermined method, motion vectors for the first reference frame and the second reference frame from the scaling coefficient and the motion vector; and
Pixel interpolation means for performing pixel interpolation on the block referred to by the motion vector for the first reference frame and the block referred to by the motion vector for the second reference frame, and outputting the result as the predicted image signal.
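
The adder and multi-frame buffer of this decoding apparatus can be sketched in Python as follows. This is a simplified model under stated assumptions: frames are lists of rows of integer samples, the sum is clipped to an 8-bit range, and the buffer discards its oldest frame when full. Neither the clipping nor the eviction policy is stated in the claim, and the names `reconstruct` and `MultiFrameBuffer` are hypothetical.

```python
from collections import deque


def reconstruct(predicted, residual, max_value=255):
    """Adder stage: sum predicted and residual samples, clipped to [0, max_value]."""
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


class MultiFrameBuffer:
    """Holds the most recent decoded frames; the oldest is dropped when full."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)

    def store(self, frame):
        self._frames.append(frame)

    def frames(self):
        return list(self._frames)
```

A decoder loop built on this sketch would call `reconstruct` once per frame and then `store` the result so that later frames can select it as a reference.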

A storage medium storing a program for performing image encoding by a computer, wherein the program causes the computer to execute:
A first step of using a first relative index and a second relative index assigned to the encoded frames to select at least one of a first reference frame and a second reference frame, these being the frames referred to when a block on an encoding target frame is obtained by motion compensation from a plurality of encoded frames stored in a multi-frame buffer;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on at least one of the first and second reference frames;
A third step of encoding the prediction error and outputting an image encoded signal including the encoded signal of the prediction error, wherein, in the second step of this image encoding method,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient table storing at least one scaling coefficient associated with a value of the first relative index is provided, a scaling coefficient corresponding to the first relative index is selected from the scaling coefficient table, and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from said motion vector and the scaling coefficient;
Generating a predicted image by pixel interpolation from a block obtained from a motion vector for the first reference frame and a block obtained from a motion vector for the second reference frame;
And in the third step,
An image encoded signal including the scaling coefficient table is output.
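
The claims state that the scaling coefficient table is carried in the encoded signal but do not give its bitstream syntax. The Python sketch below shows one way such a table could be serialized, assuming each coefficient is stored as a non-negative integer (for example a fixed-point value) and written with unsigned Exp-Golomb codes, preceded by an entry count. The choice of Exp-Golomb coding, the layout, and the function names are all assumptions made for illustration, not the syntax of this patent.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = bin(value + 1)[2:]
    # Prefix of zeros, one fewer than the length of the binary of value + 1.
    return "0" * (len(bits) - 1) + bits


def ue_decode(bitstring, pos=0):
    """Decode one unsigned Exp-Golomb code; return (value, next position)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end


def encode_table(coefficients):
    """Write the entry count followed by each coefficient."""
    return ue_encode(len(coefficients)) + "".join(ue_encode(c) for c in coefficients)


def decode_table(bitstring):
    """Inverse of encode_table."""
    count, pos = ue_decode(bitstring)
    coefficients = []
    for _ in range(count):
        value, pos = ue_decode(bitstring, pos)
        coefficients.append(value)
    return coefficients
```

Because the decoder rebuilds the same table from the signal, encoder and decoder derive identical direct mode vectors without transmitting them per block.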

A storage medium storing a program for performing image decoding by a computer, wherein the program causes the computer to execute:
A first step of inputting an image encoded signal including an encoded signal of a prediction error,
A second step of using a first relative index and a second relative index assigned to the decoded frames to select at least one of a first reference frame and a second reference frame, these being the frames referred to when a block on a decoding target frame is obtained by motion compensation from a plurality of decoded frames stored in a multi-frame buffer;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on at least one of the first and second reference frames;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing, in the multi-frame buffer, a decoded image of a frame that may be used for inter-frame prediction, wherein, in this image decoding method,
The scaling coefficient table in the image encoded signal is decoded,
In the second step,
Among the decoded frames that are later in display order than the decoding target frame, the reference frame having the smallest second relative index is selected as the second reference frame;
In the third step,
In the second reference frame, the frame referred to by the motion vector used for motion compensation of the block located at the same position as a predetermined block on the decoding target frame is set as the first reference frame, a scaling coefficient corresponding to the first relative index is selected from the scaling coefficient table, a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from said motion vector and the scaling coefficient, and a predicted image is generated by pixel interpolation from the block obtained from the motion vector for the first reference frame and the block obtained from the motion vector for the second reference frame.
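
The second-step rule in this claim (among decoded frames later in display order than the target, take the one with the smallest second relative index) can be sketched in Python. The `DecodedFrame` record and the function name are hypothetical, and raising an error when no later frame exists is an assumption; the claim does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedFrame:
    """Hypothetical record for a frame held in the multi-frame buffer."""
    display_order: int
    second_relative_index: int


def select_second_reference(frames, target_display_order):
    """Pick the second reference frame for a direct mode block."""
    later = [f for f in frames if f.display_order > target_display_order]
    if not later:
        # Assumption: treat a missing candidate as an error.
        raise ValueError("no decoded frame follows the target in display order")
    return min(later, key=lambda f: f.second_relative_index)
```

For example, with decoded frames at display positions 1, 5 and 7 and a target at position 3, only the frames at 5 and 7 are candidates, and the one with the smaller second relative index is chosen.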

A computer-readable storage medium storing a program for performing image encoding, wherein the program causes a computer to execute:
A first step of using a first relative index and a second relative index assigned to the encoded frames to select at least one of a first reference frame and a second reference frame, these being the frames referred to when a block on an encoding target frame is obtained by motion compensation from a plurality of encoded frames stored in a multi-frame buffer;
A second step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on at least one of the first and second reference frames;
A third step of encoding the prediction error and outputting an image encoded signal including the encoded signal of the prediction error, wherein, in the second step of this image encoding method,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the encoding target frame is set as the first reference frame,
A scaling coefficient / prediction method table is provided that stores one or more entries, each consisting of a scaling coefficient associated with a value of the first relative index and a prediction method type indicating pixel interpolation, prediction from the first reference frame, or prediction from the second reference frame,
A scaling coefficient and a prediction method type corresponding to the first relative index are selected from the scaling coefficient / prediction method table, and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from said motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a predicted image is generated by pixel interpolation from the block obtained from the motion vector for the first reference frame and the block obtained from the motion vector for the second reference frame; when the prediction method type is prediction from the first reference frame, the block obtained from the motion vector for the first reference frame is used as the predicted image; and when the prediction method type is prediction from the second reference frame, the block obtained from the motion vector for the second reference frame is used as the predicted image,
And in the third step,
An image encoded signal including the scaling coefficient / prediction method table is output.

A storage medium storing a program for performing image decoding by a computer, wherein the program causes the computer to execute:
A first step of inputting an image encoded signal including an encoded signal of a prediction error,
A second step of using a first relative index and a second relative index assigned to the decoded frames to select at least one of a first reference frame and a second reference frame, these being the frames referred to when a block on a decoding target frame is obtained by motion compensation from a plurality of decoded frames stored in a multi-frame buffer;
A third step of generating a predicted image by pixel interpolation from a block obtained by motion compensation on at least one of the first and second reference frames;
A fourth step of generating a decoded image of a frame from the predicted image and the decoded prediction error,
A fifth step of storing, in the multi-frame buffer, a decoded image of a frame that may be used for inter-frame prediction, wherein, in this image decoding method,
The scaling coefficient / prediction method table in the image encoded signal is decoded,
In the second step,
Among the decoded frames that are later in display order than the decoding target frame, the reference frame having the smallest second relative index is selected as the second reference frame;
In the third step,
In the second reference frame, a frame referred to by a motion vector used in motion compensation of a block at the same position as a predetermined block on the decoding target frame is set as the first reference frame,
A scaling coefficient and a prediction method type corresponding to the first relative index are selected from the scaling coefficient / prediction method table, and a motion vector for the first reference frame and a motion vector for the second reference frame are calculated from said motion vector and the scaling coefficient,
When the prediction method type is pixel interpolation, a predicted image is generated by pixel interpolation from the block obtained from the motion vector for the first reference frame and the block obtained from the motion vector for the second reference frame; when the prediction method type is prediction from the first reference frame, the block obtained from the motion vector for the first reference frame is used as the predicted image; and when the prediction method type is prediction from the second reference frame, the block obtained from the motion vector for the second reference frame is used as the predicted image.