JP2004180269A

JP2004180269A - Moving image encoding method, moving image decoding method, moving image encoding device, moving image decoding device, moving image encoding program, and moving image decoding program

Info

Publication number: JP2004180269A
Application number: JP2003207598A
Authority: JP
Inventors: Chunsen Bun; チュンセンブン; Satoru Adachi; 悟安達; Sadaatsu Kato; 禎篤加藤; Minoru Eito; 稔栄藤; Thiow Keng Tan; ティオケンタン
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2002-10-03
Filing date: 2003-08-14
Publication date: 2004-06-24

Abstract

【課題】逆方向フレーム間予測を用いる際に適切な時間間隔での復号画像出力を得ることが可能な動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像符号化プログラム及び動画像復号プログラムを提供する。
【解決手段】動画像処理システムは、動画像符号化装置１と、動画像復号装置２とを備えて構成される。符号化装置１は、動画像データＤ０を符号化した符号化データＤ１に加えて、逆方向予測により生じ得る最大遅延時間を出力する。また、復号装置２は、符号化装置１からの符号化データＤ１に加えて、逆方向予測により生じ得る最大遅延時間を入力する。そして、入力した最大遅延時間を参照しつつ、符号化データＤ１を復号して動画像データＤ２を生成する。
【選択図】図１
Kind Code: A1
[Problem to be solved] To provide a moving image encoding method, a moving image decoding method, a moving image encoding device, a moving image decoding device, a moving image encoding program, and a moving image decoding program capable of obtaining decoded image output at appropriate time intervals when backward inter-frame prediction is used.
[Solution] A moving image processing system includes a moving image encoding device 1 and a moving image decoding device 2. The encoding device 1 outputs, in addition to the encoded data D1 obtained by encoding the moving image data D0, the maximum delay time that can be incurred by backward prediction. The decoding device 2 receives, in addition to the encoded data D1 from the encoding device 1, the maximum delay time that can be incurred by backward prediction. It then decodes the encoded data D1 with reference to the received maximum delay time to generate moving image data D2.
[Selected drawing] FIG. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像処理システム、動画像符号化プログラム、及び動画像復号プログラムに関するものである。
【０００２】
【従来の技術】
動画像信号の伝送や蓄積、再生を行うために、動画像信号の符号化技術が用いられる。そのような技術として、ITU-T Recommendation H.263（以下H.263と呼ぶ）やISO/IEC International Standard 14496-2（MPEG-4 Visual、以下MPEG-4と呼ぶ）などの国際標準化動画像符号化方式が知られている。また、より新しい符号化方式として、ITU-TとISO/IECとの合同国際標準化が予定されている動画像符号化方式、ITU-T Recommendation H.264、ISO/IEC International Standard 14496-10（Joint Final Committee Draft of Joint Video Specification、以下H.26Lと呼ぶ）が知られている。これらの動画像符号化方式に用いられている一般的な符号化技術については、例えば非特許文献１（小野文孝、渡辺裕共著、「国際標準画像符号化の基礎技術」）に記載がある。
【０００３】
動画像信号は時間的に少しずつ変化する一枚ずつの画像（フレーム）が連続して構成されたものであることから、一般的にこれらの動画像符号化方式においては、符号化対象として入力されたフレーム（現フレーム）に対して、他のフレーム（参照フレーム）との間でフレーム間予測を行って動画像信号における時間的な冗長度を削減する。この場合フレーム間予測は、現フレームとの変化がより小さい参照フレームとの間で行うことによって、より大きく冗長度を削減し符号化効率を高めることができる。
【０００４】
このため、図６に示すように、現フレームＡ１に対する参照フレームとしては、現フレームＡ１より時間的に前のフレームＡ０だけでなく、時間的に後のフレームＡ２を用いる場合もある。前のフレームを用いる場合を順方向予測、後のフレームを用いる場合を逆方向予測と呼ぶ。またこのとき、両方の予測が任意に選択されるか、もしくは同時に用いられる場合を双方向予測と呼ぶ。
【０００５】
一般的にこのような双方向予測が用いられる場合には、図６に示した例のように、時間的に前のフレームの１つが順方向予測の参照フレームとして、また時間的に後のフレームの１つが逆方向予測の参照フレームとして、現フレームに先んじて予めそれぞれ保持される。
【０００６】
図７は、図６に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。例えばMPEG-4の復号においては、現フレームＡ１を双方向フレーム間予測により復号する場合には、まず現フレームＡ１の復号化に先んじて、現フレームＡ１より時間的に前のフレームの１つであるフレームＡ０、および時間的に後のフレームの１つであるフレームＡ２が、フレーム間予測を用いないフレーム内予測により復号されたフレーム、もしくは順方向フレーム間予測により復号されたフレームとして復号され、それらが参照フレームとして保持される。その後に現フレームＡ１が、保持されたこれら２つのフレームＡ０、Ａ２を用いて双方向予測により復号される（図７（ａ））。
【０００７】
したがってこの場合、時間的に後の参照フレームＡ２と現フレームＡ１との復号時間の順序は、それぞれの復号画像の出力時間の順序と逆転することとなる。なお、これらのフレームＡ０、Ａ１、Ａ２には、それぞれ出力時間情報０、１、２が関連づけられており、この情報にしたがって各フレームの時間的な前後関係を知ることができる。このため、それぞれの復号画像は正しい順序にて出力される（図７（ｂ））。MPEG-4では、出力時間情報は絶対値として記述されている。
【０００８】
また、近年の動画像符号化方式では、図８に示すように、このようなフレーム間予測において、より現フレームとの変化の小さいフレームからの予測が可能となるように、順方向、逆方向それぞれの参照フレームを１つだけではなく、複数用いることのできるものがある。図８においては、現フレームＢ２に対する参照フレームとして、現フレームＢ２より時間的に前の２つのフレームＢ０、Ｂ１、及び時間的に後の２つのフレームＢ３、Ｂ４を用いる例を示している。
【０００９】
図９は、図８に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。例えば、H.26Lの復号においては、予め定められた参照フレーム数上限までの範囲で参照フレームを複数保持しておくことができ、フレーム間予測を行う場合には、それらの中から最適なものが任意に指示されて用いられる。この場合、現フレームＢ２を双方向予測フレームとして復号する場合には、まず現フレームＢ２の復号に先んじて参照フレームが復号されるが、この参照フレームとして、現フレームＢ２より時間的に前のフレームが複数（例えば２つのフレームＢ０、Ｂ１）、また時間的に後のフレームが複数（例えば２つのフレームＢ３、Ｂ４）、それぞれ復号されて、参照フレームとして保持される。現フレームＢ２では、それらのフレームＢ０、Ｂ１、Ｂ３、Ｂ４のなかから予測に用いるフレームが任意に指示されて予測を行うことができる（図９（ａ））。
【００１０】
したがってこの場合、時間的に後の複数の参照フレームＢ３、Ｂ４と現フレームＢ２との復号時間の順序が、出力時間の順序と逆転することとなる。なお、これらのフレームＢ０〜Ｂ４はそれぞれ出力時間情報もしくは出力順序情報０〜４が関連づけられており、この情報にしたがって各フレームの時間的な前後関係を知ることができる。このため、それぞれの復号画像は正しい順序にて出力される（図９（ｂ））。出力時間情報は、絶対値として記述されることが多い。また、出力順序は、フレーム間隔が一定の場合に用いられる。
【００１１】
時間的に後のフレームを予測フレームとして用いた逆方向予測による復号を行う場合には、現フレームの復号に先んじて、時間的に後のフレームの復号が完了しており、予測フレームとして用いることができる必要がある。この場合に現フレームには復号画像が得られるまでに、逆方向予測が用いられないフレームの場合と比較して、遅延が生じることとなる。
【００１２】
これについて、図１０を参照しつつ以下に具体的に説明する。尚、図１０は図６及び図７に示した例に対応している。まず、各フレームＡ０〜Ａ２の符号化データがフレーム間予測を行うために必要な順序にて復号され、その間隔はフレームレートに準じた一定の時間間隔であると仮定し、また復号処理に必要となる時間はフレーム間予測が用いられるか否かやフレーム間予測の方向の如何にかかわらず各フレームＡ０〜Ａ２について無視できると仮定する（図１０（ａ））。実際には、各フレームＡ０〜Ａ２の復号間隔は一定である必要はなく、各フレームＡ０〜Ａ２の符号化ビット量変動などの要因により変化し得るが、平均的には一定と見なすことができる。また復号処理に必要となる時間もゼロではないが、各フレームＡ０〜Ａ２の間で大きな差がなければ、以下の説明において大きな問題とはならない。
【００１３】
ここで、逆方向予測を行うことによる遅延や他のフレームとの間で復号時間と出力時間の順序の逆転がないフレームＡ０（以下、逆方向予測未関連フレームと呼ぶ）において復号画像が得られる時間を、その復号画像に関連づけられた出力時間として、復号画像の出力を行うものとする。すると、後に続くフレームが逆方向予測フレームＡ１であった場合に、この復号画像は時間的に後となるフレームＡ２よりも後に復号されることとなるため、復号画像が得られるまでには遅延が生じる。
【００１４】
このため、逆方向予測未関連フレームＡ０において復号画像が得られる時間を出力時間の基準としてしまうと、逆方向予測フレームＡ１における復号画像をそれに関連づけられた出力時間までに得ることができない（図１０（ｂ））。すなわち、逆方向予測未関連フレームＡ０の復号画像と逆方向予測フレームＡ１の復号画像との出力時間間隔が、本来の間隔よりも逆方向予測を行う際に必要な遅延時間だけ開いてしまうこととなり、不自然な動画像出力となってしまう。
【００１５】
したがって、動画像符号化において逆方向フレーム間予測が用いられる場合には、図１０（ｃ）に示すように、逆方向予測未関連フレームＡ０においても予め逆方向予測を行う際に必要となる遅延時間だけ復号画像の出力時間を遅延させておき、逆方向予測フレームＡ１との出力時間間隔を正しく扱うことができるようにする必要がある。
【００１６】
従来では逆方向フレーム間予測は、予測の選択肢が増えることとなるため計算量が増大し簡易な機器では実現が難しいこと、またテレビ会議といった双方向での対話がなされる実時間通信においては遅延時間の増加が望ましくないことから、テレビ放送やその蓄積など、高ビットレートでの符号化がなされ、常にテレビ放送信号と同じ３０フレーム／秒の固定フレームレートが用いられる条件での動画像符号化において用いられてきた。
【００１７】
この場合には、例えばMPEG-4のように時間的に後のフレームの１つを逆方向予測の参照フレームとして用いる符号化において、逆方向予測を行う際に必要となる遅延時間は一定である。例えば、上記のように３０フレーム／秒のフレームレートが用いられる場合、遅延時間は各フレームの時間間隔、すなわち１／３０秒となる。したがって、逆方向予測未関連フレームにおいて復号画像の出力時間を遅延させるべき時間は、一律に１／３０秒とすることができる。
【００１８】
【非特許文献１】
小野文孝、渡辺裕共著、「国際標準画像符号化の基礎技術」、コロナ社、1998年3月20日
【００１９】
【発明が解決しようとする課題】
しかしながら近年では、計算機能力の向上とともに映像サービスの多様化が進んでいることに伴い、インターネットや移動通信における映像配信など、遅延が許容され、かつ低ビットレートでの符号化が求められる動画像符号化が用いられるようになってきている。低ビットレートでの符号化を実現するためには、３０フレーム／秒よりも小さなフレームレートが用いられたり、また符号化ビットレートを制御するためにフレームレートが動的に変更される可変フレームレートが用いられたりする。
【００２０】
このような動画像符号化において、より符号化効率を高めるために上述した逆方向予測を用いた場合、逆方向予測による遅延時間は従来のように１／３０秒とはならない。また、可変フレームレートが用いられる場合には、フレームレートは一定とはならない。例えば、一時的に小さなフレームレートが用いられた場合には、そこでの各フレームの時間間隔は大きくなるため、逆方向予測未関連フレームにおいて復号画像の出力時間を遅延させるべき時間が一意には決まらない。このため、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との出力時間間隔を正しく扱うことができなくなってしまう。
【００２１】
このとき、予め逆方向予測を行う際に生じ得る遅延時間を大きく見込み、常にこの遅延時間だけ逆方向予測未関連フレームの復号画像の出力時間を遅延させることにより、逆方向予測フレームの復号画像との出力時間間隔を正しく扱うこともできる。しかしながらこの場合、実際の逆方向予測における遅延時間に関わらず、常に復号画像の出力時間に大きな遅延が付加されることとなってしまう。
【００２２】
また、H.26Lのように、逆方向予測において複数の参照フレームが用いられる場合には、現フレームの復号に先んじて、時間的に後のフレームであるそれらの参照フレームすべての復号が完了している必要がある。このため、逆方向予測を行う際に必要となる遅延時間は、さらに増大することになる。
【００２３】
またこの場合、逆方向予測において用いられる参照フレーム数は、現フレームよりも以前に復号された、現フレームよりも時間的に後となるフレームの数として一意に決まるため、予め定められた参照フレーム数上限までの範囲で、参照フレーム数を任意に変化させることができてしまう。
【００２４】
例えば、参照フレーム数上限が４であれば、図８に示したように逆方向予測において用いられる参照フレーム数は２でも良いが、図１１（ａ）に示すように、これを１としても良いし、あるいは図１１（ｂ）に示すように、３としても良い。このように参照フレーム数を変化させることが可能であるため、逆方向予測を行う際に必要な遅延時間は大きく変化し得ることになる。これにより、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との出力時間間隔を正しく扱うことができなくなってしまう。
【００２５】
このとき、逆方向予測において用いることのできる最大の参照フレーム数は、参照フレーム数上限よりも大きくなることはないことから、この参照フレーム数上限に応じた遅延時間が逆方向予測を行う際に生じ得る最大の遅延時間となる。したがって、常にこの遅延時間だけ逆方向予測未関連フレームの復号画像の出力時間を遅延させることにより、逆方向予測フレームの復号画像との出力時間間隔を正しく扱うこともできる。
【００２６】
しかしながら、この場合、実際に逆方向予測フレームにおいて用いられる参照フレーム数に関わらず、常に復号画像の出力時間に大きな遅延が付加されることとなってしまう。また上述のような可変フレームレートが用いられている場合には、最大の参照フレーム数が一意に決まっても、最大の遅延時間を一意に決めることはできなくなってしまう。
【００２７】
このように従来では、動画像符号化において逆方向予測を用いる場合には、固定フレームレートが用いられることが明らかである場合を除き、逆方向予測を行う際に必要な遅延時間を一意に決めることができない。このため、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との出力時間間隔を正しく扱うことができなくなり、不自然な動画像出力となってしまうという問題があった。
【００２８】
また、逆方向予測において複数の参照フレームが用いられる場合にも、参照フレーム数が変化し得ることから、遅延時間が変化し得る。したがって、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との時間間隔を正しく扱うことができなくなってしまうという問題があった。また、これに対処するために常に最大の遅延時間を想定した場合には、常に復号画像の出力時間に大きな遅延が付加されてしまう問題があった。
【００２９】
本発明は、以上の問題点を解決するためになされたものであり、逆方向フレーム間予測を用いる際に適切な時間間隔での復号画像出力を得ることが可能な動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像符号化プログラム、及び動画像復号プログラムを提供することを目的とする。
【００３０】
【課題を解決するための手段】
このような目的を達成するために、本発明に係る動画像符号化方法は、他のフレームとの間でフレーム間予測を行う動画像符号化方法であって、逆方向予測により生じ得る最大遅延時間を出力することを特徴とする。
【００３１】
同様に、本発明に係る動画像符号化装置は、他のフレームとの間でフレーム間予測を行う動画像符号化装置であって、逆方向予測により生じ得る最大遅延時間を出力することを特徴とする。
【００３２】
このように、本発明に係る動画像符号化方法及び装置においては、連続するフレームによって構成された動画像を符号化して出力する際に、符号化データに加えて、逆方向予測に伴う最大遅延時間を出力することとしている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００３３】
また、本発明に係る動画像符号化プログラムは、他のフレームとの間でフレーム間予測を行う動画像符号化をコンピュータに実行させるための動画像符号化プログラムであって、逆方向予測により生じ得る最大遅延時間を出力する処理をコンピュータに実行させることを特徴とする。
【００３４】
このように、本発明に係る動画像符号化プログラムにおいては、動画像を符号化して出力する際に、符号化データに加えて、最大遅延時間を出力する処理をコンピュータに実行させることとしている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００３５】
本発明に係る動画像復号方法は、他のフレームとの間でフレーム間予測を行う動画像復号方法であって、逆方向予測により生じ得る最大遅延時間を入力することを特徴とする。
【００３６】
同様に、本発明に係る動画像復号装置は、他のフレームとの間でフレーム間予測を行う動画像復号装置であって、逆方向予測により生じ得る最大遅延時間を入力することを特徴とする。
【００３７】
このように、本発明に係る動画像復号方法及び装置においては、入力された符号化データを復号して動画像を生成する際に、符号化データに加えて、逆方向予測に伴う最大遅延時間を入力することとしている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００３８】
また、本発明に係る動画像復号プログラムは、他のフレームとの間でフレーム間予測を行う動画像復号をコンピュータに実行させるための動画像復号プログラムであって、逆方向予測により生じ得る最大遅延時間を入力する処理をコンピュータに実行させることを特徴とする。
【００３９】
このように、本発明に係る動画像復号プログラムにおいては、符号化データを復号して動画像を生成する際に、符号化データに加えて、最大遅延時間を入力する処理をコンピュータに実行させることとしている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００４０】
また、動画像符号化方法は、符号化対象となるフレームを入力する入力ステップと、フレームを所定の方法で符号化する符号化ステップと、フレームの表示時間、符号化時間、及び逆方向予測により生じ得る遅延時間からフレームの最大遅延時間を求める最大遅延時間計算ステップとを備えることを特徴とする。
【００４１】
同様に、動画像符号化装置は、符号化対象となるフレームを入力する入力手段と、フレームを所定の方法で符号化する符号化手段と、フレームの表示時間、符号化時間、及び逆方向予測により生じ得る遅延時間からフレームの最大遅延時間を求める最大遅延時間計算手段とを備えることを特徴とする。
【００４２】
同様に、動画像符号化プログラムは、符号化対象となるフレームを入力する入力処理と、フレームを所定の方法で符号化する符号化処理と、フレームの表示時間、符号化時間、及び逆方向予測により生じ得る遅延時間からフレームの最大遅延時間を求める最大遅延時間計算処理とをコンピュータに実行させることを特徴とする。
【００４３】
このように、本発明に係る動画像符号化方法、装置、及びプログラムにおいては、動画像を符号化する際に、フレームの最大遅延時間を求めている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００４４】
また、動画像復号方法は、所定の方法で符号化されたフレームの符号化データ、フレームの復号時間、及び最大遅延時間を含む画像データを入力する入力ステップと、符号化データを復号し、再生画像を生成する復号ステップと、復号時間及び最大遅延時間に基づいて、フレームを表示するための出力時間を求める画像出力時間計算ステップとを備えることを特徴とする。
【００４５】
同様に、動画像復号装置は、所定の方法で符号化されたフレームの符号化データ、フレームの復号時間、及び最大遅延時間を含む画像データを入力する入力手段と、符号化データを復号し、再生画像を生成する復号手段と、復号時間及び最大遅延時間に基づいて、フレームを表示するための出力時間を求める画像出力時間計算手段とを備えることを特徴とする。
【００４６】
同様に、動画像復号プログラムは、所定の方法で符号化されたフレームの符号化データ、フレームの復号時間、及び最大遅延時間を含む画像データを入力する入力処理と、符号化データを復号し、再生画像を生成する復号処理と、復号時間及び最大遅延時間に基づいて、フレームを表示するための出力時間を求める画像出力時間計算処理とをコンピュータに実行させることを特徴とする。
【００４７】
このように、本発明に係る動画像復号方法、装置、及びプログラムにおいては、符号化データを復号して動画像を生成する際に、最大遅延時間に基づいてフレームを表示するための出力時間を求めている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００４８】
動画像符号化方法、符号化装置、及び符号化プログラムにおいて出力される最大遅延時間としては、逆方向フレーム間予測を行うフレームの発生時間から、逆方向予測の参照フレームとして用いることのできる時間的に最も後のフレームの発生時間までの時間差を最大遅延時間とすることが好ましい。
【００４９】
また、最大遅延時間の適用に関しては、最大遅延時間が符号化データ全体に適用される情報として出力されることを特徴としてもよい。あるいは、最大遅延時間が各フレームに適用される情報として出力されることを特徴としてもよい。あるいは、最大遅延時間がこの最大遅延時間が通知されるフレーム及びそのフレームよりも時間的に後の各フレームに適用される情報として任意に出力されることを特徴としてもよい。
【００５０】
また、動画像復号方法、復号装置、及び復号プログラムにおいて入力される最大遅延時間については、他のフレームとの間で復号時間と出力時間の順序の逆転がないフレームにおける、復号時間とそのフレームに関連づけられた復号画像出力時間との時間差を最大遅延時間とすることが好ましい。あるいはさらに、その最大遅延時間に基づいて以降の復号画像出力時間の基準を設定することが好ましい。
【００５１】
また、最大遅延時間の適用に関しては、最大遅延時間が符号化データ全体に適用される情報として入力されることを特徴としてもよい。あるいは、最大遅延時間が各フレームに適用される情報として入力されることを特徴としてもよい。あるいは、最大遅延時間がこの最大遅延時間が通知されるフレーム及びそのフレームよりも時間的に後の各フレームに適用される情報として任意に入力されることを特徴としてもよい。
【００５２】
本発明に係る動画像処理システムは、動画像の符号化装置と復号装置とを含んで構成された動画像処理システムであって、符号化装置は、上記した動画像符号化装置からなり、復号装置は、上記した動画像復号装置からなることを特徴とする。
【００５３】
このように、動画像処理システムは、逆方向予測に伴う最大遅延時間をそれぞれ出力及び入力する動画像符号化装置及び動画像復号装置を用いて構成されている。これにより、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能な動画像処理システムが実現される。
【００５４】
【発明の実施の形態】
以下、図面とともに本発明による動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像符号化プログラム、及び動画像復号プログラムの好適な実施形態について詳細に説明する。なお、図面の説明においては同一要素には同一符号を付し、重複する説明を省略する。
【００５５】
まず、本発明における動画像の符号化及び復号の概略について説明する。図１は、本発明による動画像符号化装置、動画像復号装置、及び動画像処理システムの概略構成を示すブロック図である。本動画像処理システムは、動画像符号化装置１と、動画像復号装置２とを備えて構成されている。以下、動画像符号化装置１、動画像復号装置２、及び動画像処理システムについて、それらにおいて実行される動画像符号化方法、及び動画像復号方法とともに説明する。
【００５６】
動画像符号化装置１は、動画像の伝送や蓄積、再生を行うために、画像（フレーム）が連続して構成された動画像データＤ０を符号化して、符号化データＤ１として出力する装置である。また、動画像復号装置２は、入力された符号化データＤ１を復号して、フレームが連続して構成された復号後の動画像データＤ２を生成する装置である。また、動画像符号化装置１と動画像復号装置２との間は、符号化データＤ１等の必要なデータを伝送するため、有線または無線の所定のデータ伝送路によって接続されている。
【００５７】
動画像符号化装置１で行われる動画像の符号化においては、上述したように、符号化対象として入力された動画像データＤ０のフレームに対して、参照フレームとなる他のフレームとの間でフレーム間予測を行って、動画像データにおける冗長度を削減する。図１に示した動画像処理システムにおいては、動画像符号化装置１は、このフレーム間予測について、時間的に後のフレームからの逆方向フレーム間予測を行う。さらに、この動画像符号化装置１は、符号化データＤ１に加えて、逆方向予測により生じ得る最大遅延時間を出力する。
【００５８】
また、このような動画像符号化装置１に対応して、動画像復号装置２は、動画像符号化装置１からの符号化データＤ１に加えて、逆方向予測により生じ得る最大遅延時間を入力する。そして、入力された最大遅延時間を参照しつつ、符号化データＤ１を復号して動画像データＤ２を生成する。
【００５９】
このように、逆方向フレーム間予測に対して、最大遅延時間を出力する動画像符号化装置１及び動画像符号化方法、最大遅延時間を入力する動画像復号装置２及び動画像復号方法、及びそれらの装置１、２を備える動画像処理システムによれば、逆方向フレーム間予測を用いてフレーム間予測を行う場合に、適切な時間間隔での復号画像出力を得ることが可能となる。
【００６０】
ここで、動画像符号化において出力される最大遅延時間については、例えば、逆方向フレーム間予測を行うフレームの発生時間から、逆方向予測の参照フレームとして用いることのできる時間的に最も後のフレームの発生時間までの時間差を最大遅延時間とすることができる。
【００６１】
また、動画像復号において入力される最大遅延時間については、例えば、逆方向フレーム間予測を行うことによる遅延、及び他のフレームとの間で復号時間と出力時間の順序の逆転がないフレームにおける、復号時間（以下、Ｔｒとする）と当該フレームに関連づけられた復号画像出力時間（以下、Ｔｏとする）との時間差を最大遅延時間（以下、dpb＿output＿delayとする）とすることができる。この場合、その最大遅延時間に基づいて以降の復号画像出力時間の基準を設定することが好ましい。
【００６２】
また、最大遅延時間の適用については、符号化データ全体に適用する方法、または各フレームに適用する方法がある。あるいは、最大遅延時間の情報が通知された以降の各フレーム、すなわち、最大遅延時間が通知されるフレーム及びそのフレームよりも時間的に後の各フレームに適用する方法がある。これらの最大遅延時間の出力、入力、及び適用等については具体的には後述する。
【００６３】
上記した動画像符号化装置１において実行される動画像符号化方法に対応する処理は、動画像符号化をコンピュータに実行させるための動画像符号化プログラムによって実現可能である。また、動画像復号装置２において実行される動画像復号方法に対応する処理は、動画像復号をコンピュータに実行させるための動画像復号プログラムによって実現可能である。
【００６４】
例えば、動画像符号化装置１は、動画像符号化の処理動作に必要な各ソフトウェアプログラムなどが記憶されるＲＯＭと、プログラム実行中に一時的にデータが記憶されるＲＡＭとが接続されたＣＰＵによって構成することができる。このような構成において、ＣＰＵによって所定の動画像符号化プログラムを実行することにより、動画像符号化装置１を実現することができる。
【００６５】
同様に、動画像復号装置２は、動画像復号の処理動作に必要な各ソフトウェアプログラムなどが記憶されるＲＯＭと、プログラム実行中に一時的なデータが記憶されるＲＡＭとが接続されたＣＰＵによって構成することができる。このような構成において、ＣＰＵによって所定の動画像復号プログラムを実行することにより、動画像復号装置２を実現することができる。
【００６６】
また、動画像符号化または動画像復号のための各処理をＣＰＵによって実行させるための上記したプログラムは、コンピュータ読取可能な記録媒体に記録して頒布することが可能である。このような記録媒体には、例えば、ハードディスク及びフロッピーディスクなどの磁気媒体、ＣＤ−ＲＯＭ及びＤＶＤ−ＲＯＭなどの光学媒体、フロプティカルディスクなどの磁気光学媒体、あるいは、プログラム命令を実行または格納するように特別に配置された、例えばＲＡＭ、ＲＯＭ、及び半導体不揮発性メモリなどのハードウェアデバイスなどが含まれる。
【００６７】
以下、図１に示した動画像符号化装置、動画像復号装置、それらを備える動画像処理システム、及び対応する動画像符号化方法、動画像復号方法について、具体的な実施形態とともに説明する。以下の説明では、動画像の符号化及び復号について、H.26Lをもとにして実現することとして説明を行い、動画像符号化における動作について特に触れていない部分については、H.26Lの動作に準じるものとする。ただし、本発明は、H.26Lに限られるものではない。
【００６８】
（第１実施形態）
まず、本発明の第１実施形態について説明する。本実施形態では、固定フレームレートによる符号化がなされる場合の実施の形態を示す。本実施形態による符号化においては、逆方向予測に用いる最大参照フレーム数を決定し、この最大参照フレーム数と符号化に用いるフレームレートとにより最大遅延時間を算出し出力する。また本実施形態による復号においては、逆方向予測未関連フレームの復号の際にその復号画像の出力時間を、入力された最大遅延時間だけ遅延させる。またその出力時間への遅延時間を、以降のすべてに一様に適用し、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との出力時間間隔が、本来の間隔から変化してしまうことを防ぐ。
【００６９】
符号化においては、まず、用いられる参照フレーム数上限が予め定められていることからこれを越えない範囲で、逆方向予測に用いる最大参照フレーム数を決定する。次に、これも予め定められた符号化に用いるフレームレートに基づき、最大遅延時間を、逆方向予測に用いる最大参照フレーム数に応じた１つ又は複数のフレームの時間間隔として算出する。
【００７０】
図２は、双方向予測を行った場合でのフレームの符号化の一例について示す図である。ここで、この図２においては、現フレームＦ２に対する参照フレームとして、現フレームＦ２より時間的に前の２つのフレームＦ０、Ｆ１、及び時間的に後の２つのフレームＦ３、Ｆ４を用いる例を示している。
【００７１】
図２に示すように、逆方向予測に用いる最大参照フレーム数を２、フレームレートを１５フレーム／秒とすれば、１つのフレームの時間間隔は１／１５秒である。したがって、この場合、最大遅延時間は２×（１／１５）＝２／１５秒となる。
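The arithmetic in the preceding paragraph can be sketched as follows. This is a minimal illustration that assumes a fixed frame rate, as in this embodiment; the function name `max_delay_seconds` and its parameter names are hypothetical and do not come from the source.

```python
from fractions import Fraction


def max_delay_seconds(max_backward_refs: int, frame_rate: int) -> Fraction:
    """Return the maximum delay in seconds as an exact fraction.

    Assumes a fixed frame rate, so every frame interval equals 1/frame_rate.
    """
    if max_backward_refs < 0:
        raise ValueError("max_backward_refs must be non-negative")
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    frame_interval = Fraction(1, frame_rate)
    # One frame interval of delay per backward reference frame.
    return max_backward_refs * frame_interval


# Values from the FIG. 2 example: 2 backward reference frames at 15 frames/s.
delay = max_delay_seconds(2, 15)
print(delay)  # 2/15
```

Exact fractions are used here only to avoid floating-point rounding when the result is later converted to clock ticks.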
【００７２】
符号化においては、以降は最大遅延時間を超える遅延時間が必要となる逆方向予測が行われることのないように、各フレームの符号化を制御する。具体的には、逆方向予測に用いる参照フレーム、すなわち現フレームよりも時間的に後となるフレームが、逆方向予測に用いる最大参照フレーム数を越えて現フレームよりも先に符号化され出力されることがないように、各フレームの符号化の順序を制御する。
【００７３】
図３は、本実施形態において用いられる動画像符号化装置の構成の一例を示すブロック図である。図３に示す動画像符号化装置１は、フレーム（画像）を所定の方法で符号化する符号化器１０と、符号化装置１の各部の動作を制御する制御器（ＣＰＵ）１５と、入力端子１ａ及び符号化器１０の間に設けられたフレームメモリ１１と、出力端子１ｂ及び符号化器１０の間に設けられた多重化器１２とを備える。また、制御器１５は、その機能として、最大遅延時間を求める最大遅延時間計算部１６を有している。また、符号化器１０には、出力バッファ１３が設けられている。
【００７４】
本符号化装置１における動画像符号化では、入力端子１ｃより、映像を符号化するための条件が入力される。この条件の入力では、一般には、キーボードなどの入力装置によって符号化条件が選択または入力される。本実施形態においては具体的には、符号化条件として、符号化対象となるフレームの大きさ、フレームレート、ビットレートに加えて、その映像の予測参照構造（逆方向予測を行うかどうか）、一時的に格納され参照フレームとして用いられるフレームの枚数（出力バッファ１３の容量に対応）、逆方向予測に用いられる参照フレームの枚数が入力される。これらの条件は、時間とともに変化させるように設定しても良い。入力端子１ｃから入力された符号化条件は制御器１５に格納される。
【００７５】
符号化処理が開始されると、制御器１５は符号化条件を符号化器１０に送り、符号化条件がセットされる。一方、入力端子１ａより符号化対象となるフレームが入力され、フレームメモリ１１を経由して符号化器１０に送られて符号化される。フレームメモリ１１内には、逆方向予測を行うにあたってフレームの順番が入れ替わるため、入力フレームが一時的に格納される。例えば、図２に示した例では、フレームＦ２は、フレームＦ３、Ｆ４よりも先に入力端子１ａから入力されるが、フレームＦ３、Ｆ４よりも後に符号化されるため、一時的にフレームメモリ１１に格納される。
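The reordering described in this paragraph can be illustrated with a short sketch. It assumes that each frame simply lists the later frames it uses as backward references, and it emits a coding order in which every reference precedes the frame that uses it. The function `coding_order` and the `backward_refs` mapping are hypothetical names introduced for illustration; the source does not specify how the controller derives the order.

```python
def coding_order(num_frames: int, backward_refs: dict[int, list[int]]) -> list[int]:
    """Return frame indices in an order where backward references come first.

    backward_refs maps a frame index (in display order) to the later frames
    it uses as backward-prediction references.
    """
    order: list[int] = []
    emitted: set[int] = set()

    def emit(n: int) -> None:
        if n in emitted:
            return
        for ref in backward_refs.get(n, []):
            if not n < ref < num_frames:
                raise ValueError(f"frame {n}: invalid backward reference {ref}")
            emit(ref)  # a reference must be coded before the frame using it
        emitted.add(n)
        order.append(n)

    for n in range(num_frames):
        emit(n)
    return order


# FIG. 2 example: F2 uses F3 and F4 as backward references.
order = coding_order(5, {2: [3, 4]})
print(order)  # [0, 1, 3, 4, 2]
```

In this sketch F2 arrives before F3 and F4 but is emitted after them, which is why it has to wait in the frame memory.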
【００７６】
符号化器１０は、H.26Lのアルゴリズムに基づいてフレームを符号化する。そして、符号化されたデータは、多重化器１２を経由して、他の関連情報と多重化されて出力端子１ｂより出力される。また、予測に用いられるフレームは符号化器１０において再生され、次のフレームを符号化するための参照フレームとしてバッファ１３に格納される。
【００７７】
本実施形態においては、制御器１５の最大遅延時間計算部１６において、入力端子１ｃより入力される逆方向予測に用いる参照フレーム枚数及びフレームレートに基づき、最大遅延時間dpb＿output＿delayを算出する。そして、最大遅延時間は、多重化器１２にて、符号化された画像データに付加される。また、各フレームの符号化データには、それを識別するための表示順番を示す識別子（Ｎ）が合わせて付加される。
【００７８】
なお、当然ながら、逆方向予測を行わない場合、そのために用いられる参照フレーム枚数がゼロとなるので、dpb＿output＿delayの値がゼロとなる。
【００７９】
本実施形態では、この最大遅延時間を符号化において出力し、また復号において入力するために、H.26Lにおける符号化データシンタックスにおいて、最大遅延時間を通知するシンタックスを追加するものとする。ここでは符号化データ全体に適用される情報を通知するシンタックスであるシーケンスパラメータセット（Sequence Parameter Set）の中に、新たなシンタックスを追加する。
【００８０】
この最大遅延時間を通知するシンタックスとして、dpb＿output＿delayを定義する。ここでは、dpb＿output＿delayは、H.26Lにおいて時間を示す他のシンタックスに用いられる時間単位と同じものを使うこととして、９０ｋＨｚの時間単位にて最大遅延時間を示すものとする。また、その時間単位にて表される数値を、３２ビットの符号無し固定長符号にて符号化して伝送するものとする。例えば、上記のように最大遅延時間が２／１５秒である場合には、dpb＿output＿delayは（２／１５）×９００００＝１２０００となる。
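As a rough sketch of the representation described above, the following converts a delay in seconds to 90 kHz ticks and packs it into a 32-bit unsigned fixed-length field. The helper names (`delay_to_ticks`, `pack_delay`, `unpack_delay`) are hypothetical, and the big-endian byte order is an assumption: the source states only the time unit and the field width, not the bit-level layout within the bitstream.

```python
import struct
from fractions import Fraction

CLOCK_HZ = 90_000  # 90 kHz time unit stated in the text


def delay_to_ticks(delay_seconds: Fraction) -> int:
    """Express a delay in 90 kHz ticks, checking it fits in 32 unsigned bits."""
    ticks = Fraction(delay_seconds) * CLOCK_HZ
    if ticks.denominator != 1:
        raise ValueError("delay is not a whole number of 90 kHz ticks")
    if not 0 <= ticks < 2**32:
        raise ValueError("delay does not fit in a 32-bit unsigned field")
    return int(ticks)


def pack_delay(ticks: int) -> bytes:
    # Assumption: big-endian (most significant byte first) byte order.
    return struct.pack(">I", ticks)


def unpack_delay(data: bytes) -> int:
    return struct.unpack(">I", data)[0]


ticks = delay_to_ticks(Fraction(2, 15))
encoded = pack_delay(ticks)
print(ticks)          # 12000
print(encoded.hex())  # 00002ee0
```

A 32-bit field at 90 kHz can represent delays of up to roughly 13 hours, far more than any delay caused by backward prediction.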
【００８１】
復号においては、dpb＿output＿delayにより通知された最大遅延時間を復号し、これを用いて復号画像の出力時間を遅延する。
【００８２】
図４は、本実施形態において用いられる動画像復号装置の構成の一例を示すブロック図である。図４に示す動画像復号装置２は、符号化データを復号し、再生画像を生成する復号器２０と、復号装置２の各部の動作を制御する制御器（ＣＰＵ）２５と、入力端子２ａ及び復号器２０の間に設けられた入力バッファ２１と、出力端子２ｂ及び復号器２０の間に設けられた出力バッファ２２とを備える。また、制御器２５は、その機能として、フレームを表示するための出力時間を求める画像出力時間計算部２６を有している。
【００８３】
本復号装置２における動画像復号では、入力端子２ａより、復号対象となるデータが入力される。このデータには、図３に示した符号化装置１を用いて符号化された各フレームの符号化データや、最大遅延時間dpb＿output＿delay、各フレームの表示順番を示す識別子（Ｎ）が多重化されている。
【００８４】
入力されたデータは、入力バッファ２１に格納される。制御器２５の指示により、復号する時刻になれば、１フレーム分のデータが入力バッファ２１より復号器２０に入力され、H.26Lのアルゴリズムにしたがって復号される。このように再生されたフレームは出力バッファ２２に格納される。出力バッファ２２にあるフレームはライン２３を経由して復号器２０にフィードバックし、次のフレームを復号するための参照フレームとして用いられる。
【００８５】
一方、復号器２０において復号された最大遅延時間dpb＿output＿delay、フレームレート、及び各フレームの識別子（Ｎ）が制御器２５に入力される。そして、制御器２５の画像出力時間計算部２６において、これらのデータより、各フレームの出力時間が、下記の式にしたがって計算される。
Ｔｏ（Ｎ）＝dpb＿output＿delay＋Ｎ×フレーム間隔
ここで、フレーム間隔はフレームレートより求められる。
【００８６】
図２に示した例に合わせてdpb＿output＿delayを２／１５秒、フレーム間隔を１／１５秒とすると、上記の式により、
Ｎ＝０、Ｔｏ（０）＝２／１５
Ｎ＝１、Ｔｏ（１）＝３／１５
Ｎ＝２、Ｔｏ（２）＝４／１５
Ｎ＝３、Ｔｏ（３）＝５／１５
となる。このように、制御器２５において求められた出力時間Ｔｏ（ｎ）にしたがい、出力バッファ２２にあるフレームは、図５（ｂ）に示す各フレームＦ０、Ｆ１、Ｆ２、Ｆ３のように、一定間隔で出力端子２ｂに出力される。また、図示していないが、出力端子２ｂはモニタなどの表示装置に接続される。
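The output-time calculation of the first embodiment can be sketched as below. This is an illustration under the stated fixed-frame-rate assumption; the function name `output_time` and the variable names are hypothetical, and times are measured from the decoding time of the first frame.

```python
from fractions import Fraction


def output_time(dpb_output_delay: Fraction, n: int, frame_interval: Fraction) -> Fraction:
    """Output time of the frame with display-order identifier n."""
    return dpb_output_delay + n * frame_interval


frame_rate = 15
frame_interval = Fraction(1, frame_rate)  # derived from the frame rate
dpb_output_delay = Fraction(2, 15)        # maximum delay from the FIG. 2 example

times = [output_time(dpb_output_delay, n, frame_interval) for n in range(4)]
for n, t in enumerate(times):
    # Fractions print in lowest terms, so 3/15 appears as 1/5.
    print(f"N={n}: To = {t} s")
```

Because the same delay is added to every frame, consecutive output times always differ by exactly one frame interval.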
【００８７】
図５は、図２に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。復号においては、各フレームの符号化データがフレーム間予測を行うために必要な順序にて復号され、その間隔はフレームレートに準じた一定の時間間隔であると仮定し、また復号処理に必要となる時間はフレーム間予測が用いられるか否かやフレーム間予測の方向の如何にかかわらず各フレームについて無視できると仮定する。この場合に、逆方向予測フレームにおいて逆方向予測を行う際に必要となる最大の遅延時間は、逆方向予測に用いる最大参照フレーム数に応じたフレームの時間間隔に等しい。この時間が、最大遅延時間としてdpb＿output＿delayにより通知されていることになる。したがって復号画像を出力するにあたり、出力時間をこの最大遅延時間だけ遅延させるものとする。
【００８８】
実際には、各フレームの復号間隔は一定とはならず、各フレームの符号化ビット量変動などの要因により変化し得る。また各フレームの復号処理に必要となる時間は、逆方向予測フレームであるか否かの他、各フレームの符号化ビット量などに応じて変化し得る。
【００８９】
したがって、出力時間を遅延させる場合には、図５に示すように、逆方向予測を行うことによる遅延や他のフレームとの間で復号時間と出力時間の順序の逆転がない逆方向予測未関連フレームＦ０において、復号画像が得られる時間を基準とする。すなわち、この復号画像が得られる時間からdpb＿output＿delayにより通知された最大遅延時間だけ遅延させた時間を、この復号画像に関連づけられた出力時間に等しい時間であることとして、復号画像出力における基準時間とする。以降の復号画像Ｆ１〜Ｆ４の出力は、この基準時間が各復号画像に関連づけられた出力時間と同じ時間となったときに出力することとする。
【００９０】
例えば、上記のように最大遅延時間が２／１５秒である場合には、逆方向予測未関連フレームにおいて復号画像が得られる時間から２／１５秒だけ遅延させた時間をこの復号画像に関連づけられた出力時間に等しい時間であることとして、以降の復号画像出力における基準時間とする。
【００９１】
なお、場合によっては符号化あるいは復号の処理を簡略化するために、あえて最大遅延時間を通知しないことも考えられる。このような場合のために、最大遅延時間を通知するためのシンタックスは、それより前にシンタックスの有無を指示するフラグが通知されることとして、省略可能としてもよい。
【００９２】
最大遅延時間の通知が省略される場合には、符号化においては、例えば逆方向予測は用いないこととして予め規定しておくこととしても良いし、あるいはまた参照フレーム数上限を越えない範囲で、逆方向予測に用いる参照フレーム数を任意に変動できることとしても良い。
【００９３】
復号においては、例えば符号化における規定と一致させて逆方向予測は用いない、したがって逆方向予測を行うことにより必要となる遅延は発生しないものとしても良いし、あるいはまた参照フレーム数上限を越えない範囲で、逆方向予測に用いる参照フレーム数は任意に変動し、したがって遅延時間は大きく変動し得るものとしても良い。この場合に復号においては、想定される最大の遅延時間を想定した処理を常に行うこととしても良いし、あるいはまた復号画像の出力時間間隔の変動を許容することとして、各フレームの遅延時間を考慮しない簡略化した処理を行うこととしても良い。
【００９４】
本実施形態の説明はH.26Lをもとにして実現したものとして説明したが、本発明を適用することのできる動画像符号化方式はH.26Lに限定されるものではなく、逆方向フレーム間予測を用いる様々な動画像符号化方式に適用することが可能である。
【００９５】
また、本実施形態においては、最大遅延時間を通知するためのシンタックスとしてシーケンスパラメータセットの中に固定長符号によるシンタックスを追加するものとしたが、むろんこれを通知するための符号やシンタックス、あるいは最大遅延時間を表現するための時間単位はこれらに限られるものではない。固定長符号に代わり可変長符号を用いることとしても良いし、また符号化データ全体に適用されるための情報を通知することのできる様々なシンタックスにおいて通知するものとすることができる。
【００９６】
例えば、H.26Lにおいては、補助拡張情報メッセージ（Supplemental Enhancement Information Message）の中にシンタックスを追加することとしても良い。また他の動画像符号化方式を用いる場合には、当該符号化方式において符号化データ全体に適用されるための情報を通知するためのシンタックスを用いることができ、またH.263を用いた通信において制御情報の通知のために利用されるITU-T Recommendation H.245のように、動画像符号化方式による符号化データの外部において通知することとしても良い。
【００９７】
（第２実施形態）
次に、本発明の第２実施形態について説明する。本実施形態では、可変フレームレートによる符号化がなされる場合の実施の形態を示す。本実施形態による符号化および復号における動作は、基本的に第１実施形態と同様である。本実施形態では可変フレームレートが用いられることから、符号化においては第１実施形態における動作に加えて、フレームレートが低下した場合に、予め算出された最大遅延時間を越える遅延時間が必要となる逆方向予測が行われないように動作し、フレームレートが変化する場合においても、逆方向予測未関連フレームの復号画像と逆方向予測フレームの復号画像との出力時間間隔が、本来の間隔から変化してしまうことを防ぐ。
【００９８】
符号化においては、まず用いられる参照フレーム数上限は予め定められていることからこれを越えない範囲で、逆方向予測に用いる最大参照フレーム数を決定する。次に符号化ビットレートの制御において予め定められる目標フレームレートに基づいた最大フレーム時間間隔を決定し、最大遅延時間を、逆方向予測に用いる最大参照フレーム数と最大フレーム時間間隔に応じた１つ又は複数のフレームの時間間隔として算出する。
【００９９】
符号化においては、以降は最大遅延時間を超える遅延時間が必要となる逆方向予測が行われることのないように、各フレームの符号化を制御する。具体的には、逆方向予測に用いる参照フレーム、すなわち現フレームよりも時間的に後となるフレームが、逆方向予測に用いる最大参照フレーム数を越えて現フレームよりも先に符号化され出力されることがないように、各フレームの符号化の順序を制御する。
【０１００】
またそれとともに、符号化ビットレート制御により符号化フレームレートが一時的に小さくなり、その場合のフレーム時間間隔が最大フレーム時間間隔より大きくなってしまった場合には、そこでのフレームの符号化に逆方向予測を用いないように各フレームの符号化を制御する。
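The control rule of the second embodiment can be sketched as follows. The maximum delay is computed from the maximum frame interval rather than from a fixed frame rate, and backward prediction is disabled for any frame whose interval exceeds that maximum. All names here are hypothetical, and the value chosen for the maximum frame interval (1/10 s) is an assumed example; the source does not say how it is derived from the target frame rate.

```python
from fractions import Fraction


def max_delay(max_backward_refs: int, max_frame_interval: Fraction) -> Fraction:
    """Maximum delay when frame intervals may vary up to max_frame_interval."""
    return max_backward_refs * max_frame_interval


def may_use_backward_prediction(frame_interval: Fraction,
                                max_frame_interval: Fraction) -> bool:
    """Backward prediction is allowed only while the interval stays in bounds."""
    return frame_interval <= max_frame_interval


MAX_BACKWARD_REFS = 2
MAX_FRAME_INTERVAL = Fraction(1, 10)  # assumed example value

print(max_delay(MAX_BACKWARD_REFS, MAX_FRAME_INTERVAL))                   # 1/5
print(may_use_backward_prediction(Fraction(1, 15), MAX_FRAME_INTERVAL))  # True
print(may_use_backward_prediction(Fraction(1, 5), MAX_FRAME_INTERVAL))   # False
```

With this rule the delay actually needed never exceeds the value signalled in advance, even when rate control temporarily lowers the frame rate.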
【０１０１】
本実施形態において、この最大遅延時間を符号化において出力し、また復号において入力するために、符号化データシンタックスにおいて、最大遅延時間を通知するシンタックスdpb＿output＿delayを追加すること、ならびにその定義については、第１実施形態におけるものと同じである。
【０１０２】
また、本実施形態において、復号においては、dpb＿output＿delayにより通知された最大遅延時間を復号し、これを用いて復号画像の出力時間を遅延する。この処理についても、第１実施形態におけるものと同じである。
【０１０３】
（第３実施形態）
次に、本発明の第３実施形態について説明する。本実施形態では、最大遅延時間が各フレームについて任意に通知されて柔軟に変更できる場合の実施の形態を示す。本実施形態による符号化および復号における動作は、基本的に第１実施形態もしくは第２実施形態と同様である。
【０１０４】
本実施形態においては、第１実施形態において定義された、最大遅延時間を通知するシンタックスdpb＿output＿delayを、符号化データ全体に適用される情報を通知するシンタックスではなく、各フレームに適用される情報を通知するシンタックスであるピクチャパラメータセット（Picture Parameter Set）の中に追加するものとする。ここでは、dpb＿output＿delayは、第１実施形態における場合と同様に、９０ｋＨｚの時間単位にて最大遅延時間を示すものとし、その時間単位にて表される数値を、３２ビットの符号無し固定長符号にて符号化して伝送するものとする。
【０１０５】
符号化における最大遅延時間の算出、ならびに復号における最大遅延時間を用いた復号画像の出力時間の遅延については、第１実施形態におけるものと同様である。また、本実施形態において用いられる動画像符号化装置、及び動画像復号装置の構成は、第１実施形態に関して図３、図４に示したものと同様である。
【０１０６】
本実施形態における各フレームの最大遅延時間dpb＿output＿delayの求め方について説明する。図３に示す符号化装置１において、制御器１５では、逆方向予測による遅延時間（Ｄ）を、第１実施形態で説明したような方法で求め、各フレームの符号化時間Ｔｒ（ｎ）を決定する。次に、フレームメモリ１１より各フレームの表示時間Ｔｉｎ（ｎ）が入力されると、そのフレームのdpb＿output＿delay（ｎ）は下記のように求められる。
dpb＿output＿delay（ｎ）＝Ｔｉｎ（ｎ）＋Ｄ−Ｔｒ（ｎ）
このdpb＿output＿delayの値は、該当するフレームに関連付けられ、多重化器１２にて多重化される。
【０１０７】
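本実施形態では、各フレームを符号化するための時間Ｔｒ（ｎ）も一緒に符号化される。図２を例として考えると、Ｄ＝２／１５秒、Ｔｉｎ（ｎ）＝０、１／１５、２／１５、３／１５、４／１５（ｎ＝０、１、２、３、４）である。符号化の順序が変わるため、Ｔｒ（ｎ）＝０、１／１５、４／１５、２／１５、３／１５（ｎ＝０、１、２、３、４）となる。ここで、各フレームのdpb＿output＿delay（ｎ）は、
ｎ＝０、dpb＿output＿delay（０）＝０＋２／１５−０＝２／１５
ｎ＝１、dpb＿output＿delay（１）＝１／１５＋２／１５−１／１５＝２／１５
ｎ＝２、dpb＿output＿delay（２）＝２／１５＋２／１５−４／１５＝０
ｎ＝３、dpb＿output＿delay（３）＝３／１５＋２／１５−２／１５＝３／１５
ｎ＝４、dpb＿output＿delay（４）＝４／１５＋２／１５−３／１５＝３／１５
となる。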
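The encoder-side calculation dpb_output_delay(n) = Tin(n) + D - Tr(n) can be checked with a small sketch using the FIG. 2 values given in this paragraph. The function name `per_frame_delays` and the variable names are hypothetical; only the formula and the numeric values come from the text.

```python
from fractions import Fraction as F


def per_frame_delays(t_in: list, t_r: list, d: F) -> list:
    """Apply dpb_output_delay(n) = Tin(n) + D - Tr(n) to every frame."""
    if len(t_in) != len(t_r):
        raise ValueError("t_in and t_r must have the same length")
    return [ti + d - tr for ti, tr in zip(t_in, t_r)]


D = F(2, 15)                                                # delay due to backward prediction
T_IN = [F(n, 15) for n in range(5)]                         # display times of F0..F4
T_R = [F(0), F(1, 15), F(4, 15), F(2, 15), F(3, 15)]        # coding times of F0..F4

delays = per_frame_delays(T_IN, T_R, D)
# Fractions print in lowest terms, so 3/15 appears as 1/5.
print([str(x) for x in delays])
```

The frame coded last (F2) gets a delay of zero, while the frames coded ahead of their display position (F3 and F4) get the largest delay.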
【０１０８】
一方、図４に示す復号装置２では、復号器２０より、各フレームのdpb＿output＿delay（ｎ）及びＴｒ（ｎ）が制御器２５に送られ、下記の式に基づいて各フレームの出力時間Ｔｏ（ｎ）が求められる。
Ｔｏ（ｎ）＝Ｔｒ（ｎ）＋dpb＿output＿delay（ｎ）
図２を例として考えると、上記の定義より、各フレームについて、Ｔｒ（ｎ）＝０、１／１５、４／１５、２／１５、３／１５（ｎ＝０、１、２、３、４）、dpb＿output＿delay（ｎ）＝２／１５、２／１５、０、３／１５、３／１５（ｎ＝０、１、２、３、４）であるから、
ｎ＝０、Ｔｏ（０）＝０＋２／１５＝２／１５
ｎ＝１、Ｔｏ（１）＝１／１５＋２／１５＝３／１５
ｎ＝２、Ｔｏ（２）＝４／１５＋０＝４／１５
ｎ＝３、Ｔｏ（３）＝２／１５＋３／１５＝５／１５
ｎ＝４、Ｔｏ（４）＝３／１５＋３／１５＝６／１５
となる。
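The decoder-side calculation To(n) = Tr(n) + dpb_output_delay(n) can be sketched in the same way. The function name `output_times` and the variable names are hypothetical; the numeric values are those of the FIG. 2 example above.

```python
from fractions import Fraction as F


def output_times(t_r: list, delays: list) -> list:
    """Apply To(n) = Tr(n) + dpb_output_delay(n) to every frame."""
    if len(t_r) != len(delays):
        raise ValueError("t_r and delays must have the same length")
    return [tr + d for tr, d in zip(t_r, delays)]


T_R = [F(0), F(1, 15), F(4, 15), F(2, 15), F(3, 15)]      # decoding times of F0..F4
DELAYS = [F(2, 15), F(2, 15), F(0), F(3, 15), F(3, 15)]   # per-frame dpb_output_delay

t_o = output_times(T_R, DELAYS)
gaps = [b - a for a, b in zip(t_o, t_o[1:])]
print([str(x) for x in gaps])  # ['1/15', '1/15', '1/15', '1/15']
```

Although the decoding times are out of display order, the resulting output times increase by exactly one frame interval per frame.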
【０１０９】
すなわち、全ての画像は２／１５秒遅れて、一定の間隔を保ちながらモニタに表示される。なお、当然ながら、逆方向予測を行わない場合、そのために用いられる参照フレーム枚数がゼロとなるので、dpb＿output＿delay（ｎ）の値はゼロとなる。
【０１１０】
最大遅延時間は、逆方向予測未関連フレームにおいて復号画像が得られる時間から、復号画像出力における基準時間を定義するものであることから、逆方向予測未関連フレームについてのみ通知されれば良い。したがって例えば、最大遅延時間を通知するためのシンタックスは、それ以前にシンタックスの有無を指示するフラグが通知されることとして、省略可能としてもよい。逆方向予測未関連フレームにおいても任意に省略されることとしても良く、最大遅延時間の通知が省略される場合には、それ以前に通知された最大遅延時間が適用されるものとすれば良い。
【０１１１】
また、本実施形態における各フレームに対するシンタックスは、第１実施形態において定義されたような、符号化データ全体に対するシンタックスと同時に用いられることとしても良い。この場合、各フレームに対するシンタックスは、上述のようにそれ以前にシンタックスの有無を指示するフラグが通知されることとして、省略可能とする。符号化データ全体に対するシンタックスにおいて通知された最大遅延時間は、各フレームに対するシンタックスにおいて最大遅延時間が通知されるまで適用されるものとし、各フレームに対するシンタックスにより更新された後は、これに基づいて遅延させた時間が以降のすべての復号画像出力における基準時間となることとする。
【０１１２】
本実施形態の説明はH.26Lをもとにして実現したものとして説明したが、本発明を適用することのできる動画像符号化方式はH.26Lに限定されるものではなく、逆方向フレーム間予測を用いる様々な動画像符号化方式に適用することが可能である。
【０１１３】
また、本実施形態においては、最大遅延時間を通知するためのシンタックスとしてピクチャパラメータセットの中に固定長符号によるシンタックスを追加するものとしたが、むろんこれを通知するための符号やシンタックス、あるいは最大遅延時間を表現するための時間単位はこれらに限られるものではない。固定長符号に代わり可変長符号を用いることとしても良いし、また各フレームに適用されるための情報を通知することのできる様々なシンタックスにおいて通知するものとすることができる。
【０１１４】
例えば、H.26Lにおいては補助拡張情報メッセージ（Supplemental Enhancement Information Message）の中にシンタックスを追加することとしても良い。また他の動画像符号化方式を用いる場合には、当該符号化方式において各フレームに適用されるための情報を通知するためのシンタックスを用いることができ、またH.263を用いた通信において制御情報の通知のために利用されるITU-T Recommendation H.245のように、動画像符号化方式による符号化データの外部において通知することとしても良い。
【０１１５】
【発明の効果】
本発明による動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像処理システム、動画像符号化プログラム、及び動画像復号プログラムは、以上詳細に説明したように、次のような効果を得る。すなわち、連続するフレームによって構成された動画像に対して、逆方向フレーム間予測を行って符号化して出力する際に、逆方向予測に伴う最大遅延時間を出力する動画像符号化方法、符号化装置、符号化プログラム、最大遅延時間を入力する動画像復号方法、復号装置、復号プログラム、及びそれらを用いた動画像処理システムによれば、逆方向フレーム間予測を用いる際に、適切な時間間隔での復号画像出力を得ることが可能となる。
【０１１６】
特に、従来技術と異なり、出力時間は絶対値ではなくて、復号時間Ｔｒからの相対値を用いるために、フレームレートが可変の場合においても、少ないビット数で正確に最大遅延時間dpb＿output＿delayの値を記述し、伝送することができる効果がある。また、復号時間Ｔｒがずれる場合、もしくは受信されない場合においても、対応する画像が復号完了時点からdpb＿output＿delay分遅延されてから出力されるので、画像は正しい間隔で出力できる利点がある。
【図面の簡単な説明】
【図１】動画像符号化装置、動画像復号装置、及び動画像処理システムの概略構成を示すブロック図である。
【図２】双方向予測を行った場合でのフレームの符号化の一例について示す図である。
【図３】動画像符号化装置の構成の一例を示すブロック図である。
【図４】動画像復号装置の構成の一例を示すブロック図である。
【図５】図２に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。
【図６】双方向予測を行った場合でのフレームの符号化について示す図である。
【図７】図６に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。
【図８】双方向予測を行った場合でのフレームの符号化について示す図である。
【図９】図８に示した双方向予測を行った場合でのフレームの（ａ）復号、及び（ｂ）出力について示す図である。
【図１０】双方向予測を行った場合でのフレームの（ａ）復号、（ｂ）出力、及び（ｃ）遅延させた出力について示す図である。
【図１１】双方向予測を行った場合でのフレームの符号化について示す図である。
【符号の説明】
１…動画像符号化装置、１ａ、１ｃ…入力端子、１ｂ…出力端子、１０…符号化器、１１…フレームメモリ、１２…多重化器、１３…バッファ、１５…制御器、１６…最大遅延時間計算部、２…動画像復号装置、２ａ…入力端子、２ｂ…出力端子、２０…復号器、２１…入力バッファ、２２…出力バッファ、２３…ライン、２５…制御器、２６…画像出力時間計算部。
Ｄ０…動画像データ、Ｄ１…符号化データ、Ｄ２…復号後の動画像データ、Ｆ０〜Ｆ４…フレーム。
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a moving picture coding method, a moving picture decoding method, a moving picture coding apparatus, a moving picture decoding apparatus, a moving picture processing system, a moving picture coding program, and a moving picture decoding program.
[0002]
[Prior art]
Moving image signal encoding techniques are used to transmit, store, and reproduce moving image signals. Known techniques of this kind include internationally standardized moving picture coding methods such as ITU-T Recommendation H.263 (hereinafter referred to as H.263) and ISO/IEC International Standard 14496-2 (MPEG-4 Visual, hereinafter referred to as MPEG-4). A newer method is also known: the moving picture coding method scheduled for joint international standardization by ITU-T and ISO/IEC, namely ITU-T Recommendation H.264, ISO/IEC International Standard 14496-10 (Joint Final Committee Draft of Joint Video Specification, hereinafter referred to as H.26L). General coding techniques used in these moving picture coding methods are described in, for example, Non-Patent Document 1 (Fumitaka Ono and Hiroshi Watanabe, "Basic Technology of International Standard Image Coding").
[0003]
A moving image signal consists of a sequence of individual images (frames) that change little by little over time. These moving picture coding methods therefore generally perform inter-frame prediction between the frame input as the encoding target (the current frame) and another frame (a reference frame) to reduce temporal redundancy in the moving image signal. Performing the inter-frame prediction against a reference frame that differs less from the current frame reduces the redundancy further and increases the coding efficiency.
[0004]
For this reason, as shown in FIG. 6, not only the frame A0 temporally before the current frame A1 but also the frame A2 temporally after the current frame A1 may be used as a reference frame for the current frame A1. The case of using the previous frame is called forward prediction, and the case of using the subsequent frame is called backward prediction. At this time, a case where both predictions are arbitrarily selected or used at the same time is called bidirectional prediction.
[0005]
Generally, when such bidirectional prediction is used, as in the example shown in FIG. 6, one of the temporally preceding frames is held in advance of the current frame as the reference frame for forward prediction, and one of the temporally subsequent frames is held in advance as the reference frame for backward prediction.
[0006]
FIG. 7 is a diagram showing (a) decoding and (b) output of frames when the bidirectional prediction shown in FIG. 6 is performed. For example, in MPEG-4 decoding, when the current frame A1 is decoded by bidirectional inter-frame prediction, the frame A0, which is one of the frames temporally before the current frame A1, and the frame A2, which is one of the frames temporally after it, are first decoded ahead of the current frame A1, either as frames decoded by intra-frame prediction, which uses no inter-frame prediction, or as frames decoded by forward inter-frame prediction, and they are held as reference frames. The current frame A1 is then decoded by bidirectional prediction using these two held frames A0 and A2 (FIG. 7A).
[0007]
Therefore, in this case, the decoding order of the temporally later reference frame A2 and the current frame A1 is the reverse of the order in which their decoded images are output. Output time information 0, 1, and 2 is associated with the frames A0, A1, and A2, respectively, and the temporal order of the frames can be determined from this information. Therefore, each decoded image is output in the correct order (FIG. 7B). In MPEG-4, the output time information is described as an absolute value.
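The relationship between decoding order and output order described above can be illustrated with a minimal sketch. The `Frame` type and its field names are hypothetical; the frame names and output time values are those of the FIG. 7 example.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    name: str
    output_time: int  # output time information associated with the frame


# Decoding order in the FIG. 7 example: A2 is decoded before A1.
decode_order = [Frame("A0", 0), Frame("A2", 2), Frame("A1", 1)]

# The associated output time information restores the display order.
output_order = sorted(decode_order, key=lambda f: f.output_time)
print([f.name for f in output_order])  # ['A0', 'A1', 'A2']
```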
[0008]
In addition, some recent moving picture coding methods allow not just one but a plurality of reference frames in each of the forward and backward directions, as shown in FIG. 8, so that prediction can be made from a frame that differs less from the current frame. FIG. 8 shows an example in which two frames B0 and B1 temporally earlier than the current frame B2 and two frames B3 and B4 temporally later than the current frame B2 are used as reference frames for the current frame B2.
[0009]
FIG. 9 is a diagram showing (a) decoding and (b) output of frames when the bidirectional prediction shown in FIG. 8 is performed. For example, in H.26L decoding, a plurality of reference frames can be held, up to a predetermined upper limit on the number of reference frames, and when inter-frame prediction is performed, the most suitable of them is arbitrarily designated and used. In this case, when the current frame B2 is decoded as a bidirectional prediction frame, the reference frames are first decoded ahead of the current frame B2: a plurality of frames temporally earlier than the current frame B2 (for example, two frames B0 and B1) and a plurality of frames temporally later (for example, two frames B3 and B4) are each decoded and held as reference frames. For the current frame B2, the frame used for prediction can be arbitrarily designated from among the frames B0, B1, B3, and B4 (FIG. 9A).
[0010]
Therefore, in this case, the decoding order of the temporally later reference frames B3 and B4 and the current frame B2 is the reverse of their output order. Output time information or output order information 0 to 4 is associated with the frames B0 to B4, respectively, and the temporal order of the frames can be determined from this information. Therefore, the respective decoded images are output in the correct order (FIG. 9B). The output time information is often described as an absolute value. The output order is used when the frame interval is constant.
[0011]
When decoding is performed by backward prediction using a temporally later frame as the prediction frame, decoding of that temporally later frame must be complete, so that it can be used as a prediction frame, before the current frame is decoded. In this case, obtaining the decoded image of the current frame is delayed compared with a frame for which backward prediction is not used.
[0012]
This will be specifically described below with reference to FIG. FIG. 10 corresponds to the example shown in FIGS. First, the encoded data of each of the frames A0 to A2 is decoded in the order necessary for performing inter-frame prediction. The decoding interval is assumed to be a fixed time interval determined by the frame rate, and the time required for the decoding process is assumed to be negligible for each of the frames A0 to A2, regardless of whether inter-frame prediction is used and regardless of its direction (FIG. 10A). In practice, the decoding intervals of the frames A0 to A2 need not be constant and may vary due to factors such as fluctuations in the amount of coded bits of each frame, but they can be regarded as constant on average. Further, although the time required for the decoding process is not zero, it does not cause a significant problem in the following description as long as it does not differ greatly among the frames A0 to A2.
[0013]
Here, consider a frame A0 that is not delayed by backward prediction and whose decoding time and output time are not reversed in order relative to other frames (hereinafter referred to as a backward prediction unrelated frame). Assume that the time at which the decoded image of the frame A0 is obtained is taken as the output time associated with that decoded image, and that the decoded image is output at that time. Then, when the succeeding frame is the backward prediction frame A1, it is decoded after the temporally later frame A2, so a delay occurs before its decoded image is obtained.
[0014]
For this reason, if the time at which the decoded image is obtained in the backward prediction unrelated frame A0 is used as the reference for the output time, the decoded image of the backward prediction frame A1 cannot be obtained by the output time associated with it (FIG. 10B). That is, the output time interval between the decoded image of the backward prediction unrelated frame A0 and the decoded image of the backward prediction frame A1 becomes longer than the original interval by the delay time necessary for performing the backward prediction. This results in an unnatural moving image output.
[0015]
Therefore, when backward inter-frame prediction is used in video coding, as shown in FIG. 10C, the output time of the decoded image must be delayed in advance, also for the backward prediction unrelated frame A0, by the delay time required for performing backward prediction, so that the output time interval between it and the backward prediction frame A1 is handled correctly.
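The effect of this output delay on the frames A0 to A2 can be checked with a small Python sketch. It is an illustration only: the 1/30 second frame interval, the instantaneous decoding, and the names schedule and late_frames are assumptions introduced here, not part of the described method.

```python
from fractions import Fraction

FRAME_INTERVAL = Fraction(1, 30)  # assumed fixed frame interval in seconds

# Decoding order for the FIG. 10 example: A1 is backward-predicted from A2,
# so A2 is decoded first. Decoding itself is treated as instantaneous.
decode_order = ["A0", "A2", "A1"]
display_index = {"A0": 0, "A1": 1, "A2": 2}


def schedule(output_delay):
    """Map each frame to (decode_time, output_time) for a given output delay."""
    result = {}
    for slot, name in enumerate(decode_order):
        decode_time = slot * FRAME_INTERVAL
        output_time = output_delay + display_index[name] * FRAME_INTERVAL
        result[name] = (decode_time, output_time)
    return result


def late_frames(output_delay):
    """Frames whose decoded image is not yet available at their output time."""
    return [n for n, (dec, out) in schedule(output_delay).items() if dec > out]


print(late_frames(Fraction(0)))     # without delay, A1 misses its output time
print(late_frames(FRAME_INTERVAL))  # with a one-frame delay, no frame is late
```

With no delay the backward prediction frame A1 is decoded one interval after its nominal output time; delaying every output by one frame interval removes the problem while keeping the output spacing uniform.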
[0016]
Conventionally, backward inter-frame prediction has required a large amount of computation because of the increased number of prediction options, making it difficult to implement with simple equipment, and the increase in delay time is undesirable in real-time communication involving bidirectional conversation, such as videoconferencing. Backward prediction has therefore been used in moving image encoding at high bit rates, such as television broadcasting and its storage, under the condition that the same fixed frame rate of 30 frames/second as the television broadcast signal is always used.
[0017]
In this case, in encoding that uses one temporally later frame as the reference frame for backward prediction, as in MPEG-4, for example, the delay time required for performing backward prediction is constant. For example, when a frame rate of 30 frames/second is used as described above, the delay time is the time interval of one frame, that is, 1/30 second. Therefore, the time by which the output time of the decoded image of a backward prediction unrelated frame is delayed can be uniformly set to 1/30 second.
[0018]
[Non-patent document 1]
Fumitaka Ono and Hiroshi Watanabe, "Basic Technology of International Standard Image Coding", Corona, March 20, 1998
[0019]
[Problems to be solved by the invention]
However, in recent years, as computational capabilities have improved and video services have diversified, moving image encoding has come to be used in applications that can tolerate delay and require low bit rate encoding, such as video distribution over the Internet and mobile communications. To achieve low bit rate encoding, a frame rate of less than 30 frames/second is used, or a variable frame rate is used in which the frame rate is changed dynamically to control the encoding bit rate.
[0020]
In such video coding, when the above-described backward prediction is used to further increase the coding efficiency, the delay time due to the backward prediction is not necessarily 1/30 second as in the related art. Moreover, when a variable frame rate is used, the frame rate is not constant. For example, when a low frame rate is temporarily used, the time interval between frames becomes large, so the time by which the output time of the decoded image of a backward prediction unrelated frame should be delayed cannot be uniquely determined. For this reason, it becomes impossible to correctly handle the output time interval between the decoded image of the backward prediction unrelated frame and the decoded image of the backward prediction frame.
[0021]
At this time, a delay time that may occur when performing backward prediction can be assumed in advance, and by always delaying the output time of the decoded image of the backward prediction unrelated frame by that delay time, the output time interval between it and the decoded image of the backward prediction frame can be handled correctly. However, in this case, a large delay is always added to the output time of the decoded image regardless of the delay time in the actual backward prediction.
[0022]
Further, when a plurality of reference frames are used in backward prediction as in H.26L, decoding of all the reference frames that are temporally later than the current frame must be completed before the current frame is decoded. For this reason, the delay time required for performing the backward prediction further increases.
[0023]
In this case, the number of reference frames used in backward prediction is given by the number of frames that are decoded before the current frame and are temporally later than it, and this number can be changed arbitrarily within the range up to the predetermined upper limit on the number of reference frames.
[0024]
For example, if the upper limit of the number of reference frames is four, the number of reference frames used in backward prediction may be two as shown in FIG. 8, but may be one as shown in FIG. Alternatively, as shown in FIG. Since the number of reference frames can be changed in this way, the delay time required when performing backward prediction can greatly change. This makes it impossible to correctly handle the output time interval between the decoded image of the backward prediction unrelated frame and the decoded image of the backward prediction frame.
[0025]
At this time, since the number of reference frames that can be used in backward prediction never exceeds the upper limit of the number of reference frames, the delay time corresponding to that upper limit is the maximum delay time that can occur when performing backward prediction. Therefore, by always delaying the output time of the decoded image of the backward prediction unrelated frame by this delay time, the output time interval between it and the decoded image of the backward prediction frame can be handled correctly.
[0026]
However, in this case, a large delay is always added to the output time of the decoded image regardless of the number of reference frames actually used in the backward prediction frame. Further, when the above-described variable frame rate is used, the maximum delay time cannot be uniquely determined even if the maximum number of reference frames is uniquely determined.
[0027]
As described above, conventionally, when backward prediction is used in video coding, the delay time required for performing backward prediction cannot be uniquely determined unless it is clear that a fixed frame rate is used. For this reason, the output time interval between the decoded image of the backward prediction unrelated frame and the decoded image of the backward prediction frame cannot be handled correctly, resulting in an unnatural moving image output.
[0028]
Also, when a plurality of reference frames are used in backward prediction, the number of reference frames can change, so that the delay time can change. Therefore, there has been a problem that the time interval between the decoded image of the backward prediction unrelated frame and the decoded image of the backward prediction frame cannot be correctly handled. Further, if a maximum delay time is always assumed to cope with this, there is a problem that a large delay is always added to the output time of the decoded image.
[0029]
The present invention has been made to solve the above problems, and an object thereof is to provide a moving image encoding method, a moving image decoding method, a moving image encoding device, a moving image decoding device, a moving image encoding program, and a moving image decoding program capable of obtaining decoded image outputs at appropriate time intervals when backward inter-frame prediction is used.
[0030]
[Means for Solving the Problems]
In order to achieve such an object, a moving picture coding method according to the present invention is a moving picture coding method that performs inter-frame prediction with another frame, and is characterized by outputting a maximum delay time that can be caused by backward prediction.
[0031]
Similarly, the video encoding device according to the present invention is a video encoding device that performs inter-frame prediction with another frame, and is characterized by outputting a maximum delay time that can be caused by backward prediction.
[0032]
As described above, in the moving picture coding method and the moving picture coding apparatus according to the present invention, when a moving picture composed of continuous frames is coded and output, the maximum delay time associated with backward prediction is output in addition to the coded data. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0033]
Further, the moving picture coding program according to the present invention is a moving picture coding program for causing a computer to execute moving picture coding that performs inter-frame prediction with another frame, and is characterized by causing the computer to execute a process of outputting a maximum delay time that can be caused by backward prediction.
[0034]
As described above, in the moving picture coding program according to the present invention, when a moving picture is coded and output, the computer is caused to output the maximum delay time in addition to the coded data. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0035]
A moving picture decoding method according to the present invention is a moving picture decoding method for performing inter-frame prediction with another frame, and is characterized in that a maximum delay time that can be generated by backward prediction is input.
[0036]
Similarly, a video decoding device according to the present invention is a video decoding device that performs inter-frame prediction with another frame, and is characterized in that a maximum delay time that can be caused by backward prediction is input.
[0037]
As described above, in the moving picture decoding method and apparatus according to the present invention, when the input coded data is decoded to generate a moving picture, the maximum delay time associated with backward prediction is input in addition to the coded data. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0038]
Further, the moving picture decoding program according to the present invention is a moving picture decoding program for causing a computer to execute moving picture decoding that performs inter-frame prediction with another frame, and is characterized by causing the computer to execute a process of inputting a maximum delay time that can be caused by backward prediction.
[0039]
As described above, in the moving picture decoding program according to the present invention, when the coded data is decoded to generate a moving picture, the computer is caused to execute a process of inputting the maximum delay time in addition to the coded data. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0040]
Further, the moving picture coding method is characterized by comprising an input step of inputting a frame to be coded, a coding step of coding the frame by a predetermined method, and a maximum delay time calculation step of calculating the maximum delay time of the frame from the display time of the frame, its coding time, and the delay time that can be caused by backward prediction.
[0041]
Similarly, the moving image encoding apparatus is characterized by comprising input means for inputting a frame to be encoded, encoding means for encoding the frame by a predetermined method, and maximum delay time calculating means for calculating the maximum delay time of the frame from the display time of the frame, its encoding time, and the delay time that can be caused by backward prediction.
[0042]
Similarly, the video encoding program is characterized by causing a computer to execute an input process of inputting a frame to be encoded, an encoding process of encoding the frame by a predetermined method, and a maximum delay time calculating process of calculating the maximum delay time of the frame from the display time of the frame, its encoding time, and the delay time that can be caused by backward prediction.
[0043]
As described above, in the moving image encoding method, apparatus, and program according to the present invention, the maximum delay time of a frame is obtained when a moving image is encoded. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0044]
Also, the moving image decoding method is characterized by comprising an input step of inputting image data including encoded data of a frame encoded by a predetermined method, the decoding time of the frame, and a maximum delay time; a decoding step of decoding the encoded data to generate a reproduced image; and an image output time calculating step of obtaining an output time for displaying the frame based on the decoding time and the maximum delay time.
[0045]
Similarly, the moving picture decoding apparatus is characterized by comprising input means for inputting image data including coded data of a frame coded by a predetermined method, the decoding time of the frame, and a maximum delay time; decoding means for decoding the coded data to generate a reproduced image; and image output time calculating means for obtaining an output time for displaying the frame based on the decoding time and the maximum delay time.
[0046]
Similarly, the moving image decoding program is characterized by causing a computer to execute an input process of inputting image data including encoded data of a frame encoded by a predetermined method, the decoding time of the frame, and a maximum delay time; a decoding process of decoding the encoded data to generate a reproduced image; and an image output time calculation process of obtaining an output time for displaying the frame based on the decoding time and the maximum delay time.
[0047]
As described above, in the moving image decoding method, apparatus, and program according to the present invention, when encoded data is decoded to generate a moving image, the output time for displaying a frame is obtained based on the maximum delay time. This makes it possible to obtain decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0048]
Regarding the maximum delay time output in the moving image encoding method, the encoding device, and the encoding program, it is preferable that the maximum delay time be the time difference from the occurrence time of a frame on which backward inter-frame prediction is performed to the occurrence time of the temporally last frame that can be used as a reference frame for backward prediction.
[0049]
As for the application of the maximum delay time, the maximum delay time may be output as information applied to the entire encoded data. Alternatively, the maximum delay time may be output as information applied to each frame. Alternatively, the maximum delay time may be arbitrarily output as information applied to the frame notified of the maximum delay time and each frame temporally subsequent to the frame.
[0050]
Further, regarding the maximum delay time input in the video decoding method, the decoding device, and the decoding program, it is preferable that the maximum delay time be the time difference between the decoding time and the decoded image output time associated with the frame, in a frame whose decoding time and output time are not reversed in order relative to other frames. It is also preferable to set the reference for subsequent decoded image output times based on the maximum delay time.
[0051]
Further, regarding the application of the maximum delay time, the maximum delay time may be input as information applied to the entire encoded data. Alternatively, the maximum delay time may be input as information applied to each frame. Alternatively, the maximum delay time may be arbitrarily input as information applied to a frame notified of the maximum delay time and each frame temporally subsequent to the frame.
[0052]
A moving image processing system according to the present invention is a moving image processing system comprising a moving image encoding device and a moving image decoding device, characterized in that the encoding device is the moving image encoding device described above and the decoding device is the moving image decoding device described above.
[0053]
As described above, the moving image processing system is configured using the moving image encoding device and the moving image decoding device that respectively output and input the maximum delay time associated with backward prediction. This realizes a moving image processing system capable of obtaining decoded image outputs at appropriate time intervals when using backward inter-frame prediction.
[0054]
[Best Mode for Carrying Out the Invention]
Hereinafter, preferred embodiments of a moving image encoding method, a moving image decoding method, a moving image encoding device, a moving image decoding device, a moving image encoding program, and a moving image decoding program according to the present invention will be described in detail with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference symbols, and redundant description is omitted.
[0055]
First, an outline of encoding and decoding of a moving image in the present invention will be described. FIG. 1 is a block diagram showing a schematic configuration of a video encoding device, a video decoding device, and a video processing system according to the present invention. The moving image processing system includes a moving image encoding device 1 and a moving image decoding device 2. Hereinafter, the moving picture coding apparatus 1, the moving picture decoding apparatus 2, and the moving picture processing system will be described together with the moving picture coding method and the moving picture decoding method executed therein.
[0056]
The moving image encoding apparatus 1 is an apparatus that, in order to transmit, store, and reproduce moving images, encodes moving image data D0 composed of consecutive images (frames) and outputs it as encoded data D1. The video decoding device 2 is a device that decodes the input encoded data D1 and generates decoded moving image data D2 composed of consecutive frames. The video encoding device 1 and the video decoding device 2 are connected by a predetermined wired or wireless data transmission path for transmitting necessary data such as the encoded data D1.
[0057]
In the encoding of a moving image performed by the moving image encoding device 1, as described above, inter-frame prediction is performed between a frame of the moving image data D0 input as an encoding target and another frame serving as a reference frame, which reduces redundancy in the moving image data. In the video processing system shown in FIG. 1, the video encoding device 1 performs, as this inter-frame prediction, backward inter-frame prediction from a temporally later frame. Further, the moving picture coding apparatus 1 outputs a maximum delay time that can be caused by backward prediction, in addition to the coded data D1.
[0058]
In addition, corresponding to such a moving picture coding apparatus 1, the moving picture decoding apparatus 2 inputs the maximum delay time that can be caused by backward prediction, in addition to the coded data D1 from the moving picture coding apparatus 1. It then decodes the encoded data D1 with reference to the input maximum delay time and generates the moving image data D2.
[0059]
As described above, with the video encoding device 1 and video encoding method that output the maximum delay time for backward inter-frame prediction, the video decoding device 2 and video decoding method that input the maximum delay time, and the moving image processing system including these devices 1 and 2, decoded image outputs can be obtained at appropriate time intervals when inter-frame prediction is performed using backward inter-frame prediction.
[0060]
Here, regarding the maximum delay time output in video encoding, for example, the time difference from the occurrence time of a frame on which backward inter-frame prediction is performed to the occurrence time of the temporally latest frame that can be used as a reference frame for backward prediction can be taken as the maximum delay time.
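This definition can be written out as a short Python sketch. It is only an illustration: the timestamps, the frame names, and the function max_backward_delay are hypothetical, chosen to show that the definition still applies when the frame interval is not constant.

```python
from fractions import Fraction


def max_backward_delay(generation_time, backward_refs):
    """Largest gap from a backward-predicted frame to its latest usable reference.

    generation_time maps a frame name to its occurrence time in seconds.
    backward_refs maps a frame name to the temporally later frames it may use.
    """
    delays = [
        max(generation_time[ref] for ref in refs) - generation_time[frame]
        for frame, refs in backward_refs.items()
        if refs
    ]
    return max(delays, default=Fraction(0))


# Hypothetical occurrence times with an irregular frame interval.
generation_time = {
    "F0": Fraction(0),
    "F1": Fraction(1, 15),
    "F2": Fraction(2, 15),
    "F3": Fraction(4, 15),
    "F4": Fraction(5, 15),
}
# Only F2 uses backward prediction here, from F3 and F4 (an assumption).
backward_refs = {"F0": [], "F1": [], "F2": ["F3", "F4"]}

print(max_backward_delay(generation_time, backward_refs))  # 1/5
```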
[0061]
In addition, regarding the maximum delay time input in video decoding, for example, in a frame that is not delayed by backward inter-frame prediction and whose decoding time and output time are not reversed in order relative to other frames, the time difference between the decoding time (hereinafter referred to as Tr) and the decoded image output time (hereinafter referred to as To) associated with that frame can be taken as the maximum delay time (hereinafter referred to as dpb_output_delay). In this case, it is preferable to set the reference for subsequent decoded image output times based on the maximum delay time.
[0062]
The maximum delay time can be applied to the entire coded data or to each frame. Alternatively, it can be applied to each frame after the information of the maximum delay time is notified, that is, to the frame for which the maximum delay time is notified and each frame temporally later than that frame. The output, input, application, and the like of the maximum delay time will be specifically described later.
[0063]
The processing corresponding to the moving picture coding method executed in the moving picture coding apparatus 1 described above can be realized by a moving picture coding program for causing a computer to execute moving picture coding. The processing corresponding to the moving picture decoding method executed in the moving picture decoding device 2 can be realized by a moving picture decoding program for causing a computer to execute moving picture decoding.
[0064]
For example, the moving picture coding apparatus 1 can be configured by a CPU to which are connected a ROM storing the software programs necessary for the moving picture coding processing and a RAM in which data is temporarily stored during execution of the programs. In such a configuration, the moving image encoding device 1 can be realized by executing a predetermined moving image encoding program on the CPU.
[0065]
Similarly, the moving picture decoding device 2 can be configured by a CPU to which are connected a ROM storing the software programs necessary for the moving picture decoding processing and a RAM in which data is temporarily stored during execution of the programs. In such a configuration, the moving image decoding device 2 can be realized by executing a predetermined moving image decoding program on the CPU.
[0066]
Further, the above-described program for causing the CPU to execute each process for moving image encoding or moving image decoding can be recorded on a computer-readable recording medium and distributed. Such recording media include, for example, magnetic media such as hard disks and floppy disks, optical media such as CD-ROMs and DVD-ROMs, magneto-optical media such as floptical disks, and hardware devices specially arranged to store and execute program instructions, such as RAM, ROM, and semiconductor nonvolatile memory.
[0067]
Hereinafter, the moving picture coding apparatus, the moving picture decoding apparatus, and the moving picture processing system including them shown in FIG. 1, together with the corresponding moving picture coding method and moving picture decoding method, will be described using specific embodiments. In the following description, encoding and decoding of moving images are assumed to be realized based on H.26L, and matters not specifically mentioned are assumed to conform to H.26L. However, the present invention is not limited to H.26L.
[0068]
(1st Embodiment)
First, a first embodiment of the present invention will be described. In the present embodiment, a case in which encoding is performed at a fixed frame rate will be described. In the encoding according to the present embodiment, the maximum number of reference frames used for backward prediction is determined, and the maximum delay time is calculated from this maximum number of reference frames and the frame rate used for encoding, and is output. In the decoding according to the present embodiment, the output time of the decoded image is delayed by the input maximum delay time when a backward prediction unrelated frame is decoded. This delay of the output time is then applied uniformly to all subsequent frames, which prevents the output time interval between the decoded image of the backward prediction unrelated frame and the decoded image of the backward prediction frame from changing from the original interval.
[0069]
In encoding, first, since the upper limit of the number of reference frames to be used is predetermined, the maximum number of reference frames to be used for backward prediction is determined within a range not exceeding the upper limit. Next, the maximum delay time is calculated as a time interval of one or more frames corresponding to the maximum number of reference frames used for backward prediction, also based on a predetermined frame rate used for encoding.
[0070]
FIG. 2 is a diagram illustrating an example of encoding of frames when bidirectional prediction is performed. Here, FIG. 2 shows an example in which two frames F0 and F1 temporally earlier than the current frame F2 and two frames F3 and F4 temporally later than the current frame F2 are used as reference frames for the current frame F2.
[0071]
As shown in FIG. 2, if the maximum number of reference frames used for backward prediction is 2 and the frame rate is 15 frames / sec, the time interval of one frame is 1/15 second. Therefore, in this case, the maximum delay time is 2 × (1/15) = 2/15 seconds.
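The arithmetic above can be reproduced exactly with rational numbers. The following Python sketch is illustrative only, and the function name max_delay_seconds is an assumption introduced here.

```python
from fractions import Fraction


def max_delay_seconds(max_backward_refs, frame_rate):
    """Maximum delay = number of backward reference frames x one frame interval."""
    frame_interval = Fraction(1, frame_rate)  # seconds per frame at a fixed rate
    return max_backward_refs * frame_interval


delay = max_delay_seconds(2, 15)
print(delay)  # 2/15
```

Using Fraction instead of floating point keeps values such as 2/15 exact, which matters when the result is later converted to an integer number of clock ticks.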
[0072]
In the encoding, the encoding of each frame is controlled so that backward prediction requiring a delay time exceeding the maximum delay time is not performed thereafter. Specifically, the encoding order of the frames is controlled so that the number of reference frames used for backward prediction, that is, frames temporally later than the current frame that are encoded and output before the current frame, does not exceed the maximum number of reference frames used for backward prediction.
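One way to check this constraint on a candidate encoding order is sketched below in Python. It is a simplified illustration, not the control procedure of the embodiment: frames are represented by their display indices, every earlier-encoded later frame is counted as a backward reference, and the function names are assumptions.

```python
def backward_ref_counts(encoding_order):
    """For each frame, count frames encoded earlier that are displayed later."""
    counts = {}
    for position, frame in enumerate(encoding_order):
        earlier = encoding_order[:position]
        counts[frame] = sum(1 for other in earlier if other > frame)
    return counts


def order_is_allowed(encoding_order, max_backward_refs):
    """True if no frame needs more backward references than the maximum."""
    counts = backward_ref_counts(encoding_order)
    return all(count <= max_backward_refs for count in counts.values())


# Display indices of F0, F1, F3, F4, F2 in the encoding order of FIG. 2.
fig2_order = [0, 1, 3, 4, 2]

print(backward_ref_counts(fig2_order))
print(order_is_allowed(fig2_order, 2))
print(order_is_allowed(fig2_order, 1))
```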
[0073]
FIG. 3 is a block diagram illustrating an example of a configuration of a moving image encoding device used in the present embodiment. The moving image encoding device 1 shown in FIG. 3 includes an encoder 10 that encodes a frame (image) by a predetermined method, a controller (CPU) 15 that controls the operation of each unit of the encoding device 1, a frame memory 11 provided between the input terminal 1a and the encoder 10, and a multiplexer 12 provided between the output terminal 1b and the encoder 10. The controller 15 has, as one of its functions, a maximum delay time calculation unit 16 for obtaining the maximum delay time. Further, the encoder 10 is provided with an output buffer 13.
[0074]
In moving picture coding in the encoding device 1, the conditions for encoding the video are input from the input terminal 1c. In general, the encoding conditions are selected or entered with an input device such as a keyboard. In the present embodiment, specifically, in addition to the size, frame rate, and bit rate of the frames to be encoded, the prediction reference structure of the video (whether backward prediction is performed), the number of frames temporarily stored and used as reference frames (corresponding to the capacity of the output buffer 13), and the number of reference frames used for backward prediction are input. These conditions may be set to change over time. The encoding conditions input from the input terminal 1c are stored in the controller 15.
[0075]
When the encoding process is started, the controller 15 sends the encoding conditions to the encoder 10, and the encoding conditions are set. Meanwhile, a frame to be encoded is input from the input terminal 1a, sent to the encoder 10 via the frame memory 11, and encoded. The input frame is temporarily stored in the frame memory 11 because the order of the frames is changed in order to perform backward prediction. For example, in the example shown in FIG. 2, the frame F2 is input from the input terminal 1a before the frames F3 and F4 but is encoded after them, so it is temporarily stored in the frame memory 11.
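The role of the frame memory 11 in this reordering can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: frames are identified by their input (display) index, backward_refs is a hypothetical table giving how many later frames each frame references, and a held frame is assumed not to be used as a backward reference by another held frame.

```python
from collections import deque


def encoding_order(num_frames, backward_refs):
    """Reorder input frames so each frame follows its backward reference frames.

    backward_refs maps a display index to the number of later frames it uses.
    """
    memory = deque()  # frames waiting in the frame memory: (index, release_at)
    order = []
    for index in range(num_frames):
        wait = backward_refs.get(index, 0)
        if wait:
            # Hold this frame until its last backward reference has been encoded.
            memory.append((index, index + wait))
            continue
        order.append(index)
        while memory and memory[0][1] <= index:
            order.append(memory.popleft()[0])
    # Flush frames still held when the input ends.
    order.extend(index for index, _ in memory)
    return order


# FIG. 2 example: F2 uses the two later frames F3 and F4.
print(encoding_order(5, {2: 2}))
```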
[0076]
The encoder 10 encodes the frame based on the H.26L algorithm. Then, the encoded data is multiplexed with other related information via the multiplexer 12, and output from the output terminal 1b. The frame used for prediction is reproduced by the encoder 10 and stored in the buffer 13 as a reference frame for encoding the next frame.
[0077]
In the present embodiment, the maximum delay time calculation unit 16 of the controller 15 calculates the maximum delay time dpb_output_delay based on the number of reference frames used for backward prediction and the frame rate, both input from the input terminal 1c. The maximum delay time is then added to the encoded image data by the multiplexer 12. Further, an identifier (N) indicating the display order, which identifies each frame, is added to the encoded data of each frame.
[0078]
Note that, of course, when backward prediction is not performed, the number of reference frames used for that becomes zero, and the value of dpb_output_delay becomes zero.
[0079]
In the present embodiment, in order to output the maximum delay time in encoding and to input the same in decoding, a syntax for notifying the maximum delay time is added to the encoded data syntax in H.26L. Here, a new syntax is added to a sequence parameter set, which is a syntax for notifying information applied to the entire encoded data.
[0080]
dpb_output_delay is defined as the syntax element for notifying the maximum delay time. Here, dpb_output_delay uses the same time unit as the other syntax elements indicating time in H.26L, and indicates the maximum delay time in units of a 90 kHz clock. The numerical value expressed in this time unit is assumed to be encoded and transmitted as a 32-bit unsigned fixed-length code. For example, when the maximum delay time is 2/15 seconds as described above, dpb_output_delay is (2/15) × 90000 = 12000.
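The conversion to 90 kHz units and the 32-bit unsigned fixed-length representation can be sketched in Python as follows. This is an illustration only: the big-endian byte order, the rounding of non-integer tick counts, and the function names are assumptions, since the text specifies only the clock unit and the code length.

```python
import struct
from fractions import Fraction

CLOCK_HZ = 90_000  # 90 kHz time unit stated in the text


def to_ticks(delay_seconds):
    """Convert a delay in seconds to 90 kHz ticks (rounding is an assumption)."""
    ticks = round(Fraction(delay_seconds) * CLOCK_HZ)
    if not 0 <= ticks < 2**32:
        raise ValueError("delay does not fit in a 32-bit unsigned value")
    return ticks


def encode_u32(ticks):
    """Pack the value as a 32-bit unsigned integer (byte order assumed)."""
    return struct.pack(">I", ticks)


def decode_u32(data):
    """Recover the 32-bit unsigned value packed by encode_u32."""
    return struct.unpack(">I", data)[0]


dpb_output_delay = to_ticks(Fraction(2, 15))
print(dpb_output_delay)  # 12000
payload = encode_u32(dpb_output_delay)
print(len(payload), decode_u32(payload))
```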
[0081]
In the decoding, the maximum delay time notified by dpb_output_delay is decoded, and the output time of the decoded image is delayed using this.
[0082]
FIG. 4 is a block diagram illustrating an example of the configuration of a video decoding device used in the present embodiment. The video decoding device 2 illustrated in FIG. 4 comprises a decoder 20 that decodes encoded data to generate a reproduced image, a controller (CPU) 25 that controls the operation of each unit of the decoding device 2, an input buffer 21 provided between the input terminal 2a and the decoder 20, and an output buffer 22 provided between the output terminal 2b and the decoder 20. The controller 25 has, as one of its functions, an image output time calculation unit 26 that obtains the output time at which each frame is displayed.
[0083]
In moving image decoding in the decoding device 2, the data to be decoded is input from the input terminal 2a. This data is a multiplex of the encoded data of each frame encoded by the encoding device 1 shown in FIG. 3, the maximum delay time dpb_output_delay, and the identifier (N) indicating the display order of each frame.
[0084]
The input data is stored in the input buffer 21. At the time of decoding according to an instruction from the controller 25, data for one frame is input to the decoder 20 from the input buffer 21 and decoded according to the H.26L algorithm. The frame reproduced in this manner is stored in the output buffer 22. The frame in output buffer 22 is fed back to decoder 20 via line 23 and is used as a reference frame for decoding the next frame.
[0085]
On the other hand, the maximum delay time dpb_output_delay, the frame rate, and the identifier (N) of each frame decoded by the decoder 20 are input to the controller 25. Then, the image output time calculator 26 of the controller 25 calculates the output time of each frame from these data according to the following equation.
To (n) = dpb_output_delay + N × frame interval
Here, the frame interval is obtained from the frame rate.
[0086]
Following the example shown in the figure, with dpb_output_delay equal to 2/15 seconds and a frame interval of 1/15 seconds, the equation gives:
N = 0, To (0) = 2/15
N = 1, To (1) = 3/15
N = 2, To (2) = 4/15
N = 3, To (3) = 5/15
In this way, according to the output times To (n) obtained by the controller 25, the frames in the output buffer 22 are output to the output terminal 2b at regular intervals, as with the frames F0, F1, F2, and F3 shown in the figure. Although not shown, the output terminal 2b is connected to a display device such as a monitor.
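The output time equation of the first embodiment can be checked with a short Python sketch. The function name `output_time` is hypothetical; the sketch assumes a fixed frame rate, with the frame interval taken as the reciprocal of the frame rate as stated in the text.

```python
from fractions import Fraction


def output_time(dpb_output_delay: Fraction, display_index: int, frame_rate: int) -> Fraction:
    """To(n) = dpb_output_delay + N x frame interval, in seconds."""
    frame_interval = Fraction(1, frame_rate)  # obtained from the frame rate
    return dpb_output_delay + display_index * frame_interval


# Example values from the text: delay of 2/15 s, 15 frames per second.
times = [output_time(Fraction(2, 15), n, 15) for n in range(4)]
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
```

With these inputs the output times are 2/15, 3/15, 4/15, and 5/15 seconds, and every entry of `intervals` equals the frame interval of 1/15 second.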
[0087]
FIG. 5 is a diagram showing (a) decoding and (b) output of frames when the bidirectional prediction shown in FIG. 2 is performed. In decoding, the encoded data of each frame is decoded in the order required for inter-frame prediction, and the interval between decoded frames is assumed to be a fixed time interval determined by the frame rate. The time required for the decoding process is assumed to be negligible for every frame, regardless of whether inter-frame prediction is used and regardless of its direction. In this case, the maximum delay time required for backward prediction in a backward-predicted frame equals the frame time interval multiplied by the maximum number of reference frames used for backward prediction. This time is notified by dpb_output_delay as the maximum delay time. Therefore, when a decoded image is output, its output time is delayed by the maximum delay time.
[0088]
Actually, the decoding interval of each frame is not constant, and may vary due to factors such as fluctuations in the amount of coded bits in each frame. Also, the time required for the decoding process of each frame may vary depending on whether or not the frame is a backward prediction frame, the amount of encoded bits of each frame, and the like.
[0089]
Therefore, in delaying the output time, the time at which the decoded image is obtained is used as the reference for a frame such as frame F0 in FIG. 5, that is, a frame that incurs no delay due to backward prediction and whose decoding time and output time are not reversed in order relative to other frames. Specifically, the time obtained by delaying the time at which this decoded image is obtained by the maximum delay time notified by dpb_output_delay is treated as equal to the output time associated with this decoded image, and is set as the reference time for decoded image output. Each of the subsequent decoded images F1 to F4 is output when this reference time reaches the output time associated with that decoded image.
[0090]
For example, when the maximum delay time is 2/15 seconds as described above, the time 2/15 seconds after the decoded image of a frame unrelated to backward prediction is obtained is treated as equal to the output time associated with that decoded image, and serves as the reference time for subsequent decoded image output.
[0091]
In some cases, the maximum delay time may not be notified, in order to simplify the encoding or decoding process. To allow for such cases, the syntax for notifying the maximum delay time may be made omissible by notifying, before it, a flag indicating the presence or absence of that syntax.
[0092]
When the notification of the maximum delay time is omitted, the encoding may, for example, stipulate in advance that backward prediction is not used, or may allow the number of reference frames used for backward prediction to vary arbitrarily within a range not exceeding the upper limit on the number of reference frames.
[0093]
In decoding, correspondingly, backward prediction may be unused in accordance with the rule set in encoding, in which case no delay due to backward prediction arises; or the number of reference frames used for backward prediction may vary arbitrarily within the range not exceeding the upper limit, in which case the delay time may fluctuate greatly. In the latter case, the decoder may always perform processing that assumes the largest delay time expected, or it may perform simplified processing that tolerates fluctuation in the output time interval of the decoded images rather than accounting for the delay time of each frame.
[0094]
Although the present embodiment has been described as being realized on the basis of H.26L, the moving picture coding method to which the present invention can be applied is not limited to H.26L; the invention can be applied to various moving picture coding methods that use backward inter-frame prediction.
[0095]
Further, in the present embodiment, a syntax using a fixed-length code is added to the sequence parameter set as the syntax for notifying the maximum delay time, but the code and syntax used for this notification, and the time unit for expressing the maximum delay time, are of course not limited to these. A variable-length code may be used instead of the fixed-length code, and the notification may be made in any of various syntaxes capable of notifying information applied to the entire encoded data.
[0096]
For example, in H.26L, the syntax may be added to a supplemental enhancement information message (Supplemental Enhancement Information Message). When another moving picture coding method is used, a syntax of that method for notifying information applied to the entire encoded data may be used. Alternatively, as with ITU-T Recommendation H.245, which is used for notification of control information in communication using H.263, the notification may be made outside the encoded data of the moving picture coding method.
[0097]
(Second embodiment)
Next, a second embodiment of the present invention will be described. This embodiment covers encoding at a variable frame rate. The encoding and decoding operations are basically the same as those in the first embodiment. Because a variable frame rate is used, the encoding additionally ensures that, when the frame rate decreases, backward prediction that would require a delay time exceeding the previously calculated maximum delay time is not performed. As a result, even when the frame rate changes, the output time interval between the decoded image of a frame unrelated to backward prediction and the decoded image of a backward-predicted frame does not deviate from the original interval.
[0098]
In encoding, the upper limit on the number of reference frames is determined in advance, so the maximum number of reference frames used for backward prediction is first determined within a range not exceeding that upper limit. Next, the maximum frame time interval is determined from the target frame rate predetermined for controlling the encoding bit rate, and the maximum delay time is calculated, from the maximum number of reference frames used for backward prediction and the maximum frame time interval, as the time interval of one or more frames.
[0099]
In the encoding, the encoding of each frame is thereafter controlled so that backward prediction requiring a delay time exceeding this maximum delay time is not performed. Specifically, the encoding order of the frames is controlled so that the number of reference frames used for backward prediction, that is, frames temporally later than the current frame that are encoded and output before it, does not exceed the maximum number of reference frames used for backward prediction.
[0100]
At the same time, if the encoding frame rate temporarily decreases because of encoding bit rate control and the resulting frame time interval becomes larger than the maximum frame time interval, the encoding of each frame is controlled so that backward prediction is not used in encoding that frame.
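The encoder-side rules of the second embodiment can be sketched in Python as follows. This is a simplified illustration, not the encoder's actual control logic: both function names are hypothetical, and the sketch assumes the maximum delay time is the product of the maximum number of backward reference frames and the maximum frame time interval.

```python
from fractions import Fraction


def max_delay(max_backward_refs: int, max_frame_interval: Fraction) -> Fraction:
    """Maximum delay as the time interval of max_backward_refs frames.

    Assumption: each frame interval is bounded by max_frame_interval,
    which is derived from the target frame rate used for rate control.
    """
    return max_backward_refs * max_frame_interval


def backward_prediction_allowed(frame_interval: Fraction,
                                max_frame_interval: Fraction) -> bool:
    """Disallow backward prediction when the frame rate has dropped so far
    that the current frame interval exceeds the maximum frame interval."""
    return frame_interval <= max_frame_interval


max_interval = Fraction(1, 15)            # target frame rate of 15 fps
delay = max_delay(2, max_interval)
normal = backward_prediction_allowed(Fraction(1, 15), max_interval)
slowed = backward_prediction_allowed(Fraction(1, 10), max_interval)
```

Here `normal` is true because the frame interval is within the limit, while `slowed` is false because a drop to 10 frames per second would require a delay beyond the precomputed maximum.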
[0101]
In the present embodiment, so that the maximum delay time can be output in encoding and input in decoding, the syntax dpb_output_delay for notifying the maximum delay time is added to the encoded data syntax; its definition is the same as in the first embodiment.
[0102]
In the present embodiment, in the decoding, the maximum delay time notified by dpb_output_delay is decoded, and the output time of the decoded image is delayed using the maximum delay time. This processing is the same as that in the first embodiment.
[0103]
(Third embodiment)
Next, a third embodiment of the present invention will be described. In the present embodiment, an embodiment in which the maximum delay time is arbitrarily notified for each frame and can be flexibly changed will be described. The operations in encoding and decoding according to the present embodiment are basically the same as those in the first embodiment or the second embodiment.
[0104]
In the present embodiment, the syntax dpb_output_delay for notifying the maximum delay time, defined in the first embodiment, is added not to the syntax for notifying information applied to the entire encoded data but to the picture parameter set (Picture Parameter Set), which is the syntax for notifying information applied to each frame. As in the first embodiment, dpb_output_delay indicates the maximum delay time in 90 kHz time units, and the value expressed in that unit is encoded as a 32-bit unsigned fixed-length code and transmitted.
[0105]
The calculation of the maximum delay time in the encoding and the delay of the output time of the decoded image using the maximum delay time in the decoding are the same as those in the first embodiment. The configurations of the video encoding device and the video decoding device used in the present embodiment are the same as those shown in FIGS. 3 and 4 in the first embodiment.
[0106]
A method of obtaining the maximum delay time dpb_output_delay of each frame according to the present embodiment will now be described. In the encoding device 1 shown in FIG. 3, the controller 15 obtains the delay time (D) due to backward prediction by the method described in the first embodiment and determines the encoding time Tr (n) of each frame. Next, when the display time Tin (n) of each frame is input from the frame memory 11, dpb_output_delay (n) of the frame is obtained as follows.
dpb_output_delay (n) = Tin (n) + D−Tr (n)
The value of dpb_output_delay is associated with the corresponding frame and multiplexed by the multiplexer 12.
[0107]
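In the present embodiment, the time Tr (n) at which each frame is encoded is also encoded. Taking FIG. 2 as an example, D = 2/15 seconds and Tin (n) = 0, 1/15, 2/15, 3/15, 4/15 (n = 0, 1, 2, 3, 4). Since the encoding order changes, Tr (n) = 0, 1/15, 4/15, 2/15, 3/15 (n = 0, 1, 2, 3, 4). The dpb_output_delay (n) of each frame is then
n = 0, dpb_output_delay (0) = 0 + 2/15 − 0 = 2/15
n = 1, dpb_output_delay (1) = 1/15 + 2/15 − 1/15 = 2/15
n = 2, dpb_output_delay (2) = 2/15 + 2/15 − 4/15 = 0
n = 3, dpb_output_delay (3) = 3/15 + 2/15 − 2/15 = 3/15
n = 4, dpb_output_delay (4) = 4/15 + 2/15 − 3/15 = 3/15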
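The per-frame relation dpb_output_delay (n) = Tin (n) + D − Tr (n) can be evaluated for the FIG. 2 values with a short Python sketch. The function name `per_frame_delays` and the variable names are hypothetical; the numeric inputs are the ones given in the text.

```python
from fractions import Fraction


def per_frame_delays(t_in, t_r, d):
    """dpb_output_delay(n) = Tin(n) + D - Tr(n) for every frame n."""
    return [tin + d - tr for tin, tr in zip(t_in, t_r)]


D = Fraction(2, 15)                                   # delay due to backward prediction
T_IN = [Fraction(n, 15) for n in range(5)]            # display times Tin(n)
T_R = [Fraction(k, 15) for k in (0, 1, 4, 2, 3)]      # encoding times Tr(n)

delays = per_frame_delays(T_IN, T_R, D)
```

The result is 2/15, 2/15, 0, 3/15, 3/15 for n = 0 to 4, which agrees with the dpb_output_delay (n) values used on the decoding side.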
[0108]
On the other hand, in the decoding device 2 shown in FIG. 4, dpb_output_delay (n) and Tr (n) of each frame are sent from the decoder 20 to the controller 25, and the output time To (n) of each frame is obtained from the following equation.
To (n) = Tr (n) + dpb_output_delay (n)
Taking FIG. 2 as an example, from the above definitions, Tr (n) = 0, 1/15, 4/15, 2/15, 3/15 and dpb_output_delay (n) = 2/15, 2/15, 0, 3/15, 3/15 (n = 0, 1, 2, 3, 4), so that
n = 0, To (0) = 0 + 2/15 = 2/15
n = 1, To (1) = 1/15 + 2/15 = 3/15
n = 2, To (2) = 4/15 + 0 = 4/15
n = 3, To (3) = 2/15 + 3/15 = 5/15
n = 4, To (4) = 3/15 + 3/15 = 6/15
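On the decoding side, the output time of each frame is the sum of its time Tr (n) and its per-frame delay. The following Python sketch, with the hypothetical function name `output_times`, applies this to the FIG. 2 values from the text and checks that the resulting output interval is constant.

```python
from fractions import Fraction


def output_times(t_r, delays):
    """To(n) = Tr(n) + dpb_output_delay(n) for every frame n."""
    return [tr + delay for tr, delay in zip(t_r, delays)]


T_R = [Fraction(k, 15) for k in (0, 1, 4, 2, 3)]       # Tr(n), n = 0..4
DELAYS = [Fraction(k, 15) for k in (2, 2, 0, 3, 3)]    # dpb_output_delay(n)

times = output_times(T_R, DELAYS)
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
```

Although the frames are decoded out of display order, `times` increases with n from 2/15 to 6/15 seconds and every entry of `intervals` is 1/15 second.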
[0109]
That is, every image is delayed by 2/15 seconds and displayed on the monitor at a constant interval. Note that when backward prediction is not performed, the number of reference frames used for backward prediction is zero, so the value of dpb_output_delay (n) is zero.
[0110]
Since the maximum delay time defines the reference time for decoded image output relative to the time at which the decoded image is obtained in a frame unrelated to backward prediction, it need only be notified for such frames. Therefore, for example, the syntax for notifying the maximum delay time may be made omissible by notifying, before it, a flag indicating the presence or absence of that syntax. The notification may also be omitted at will even for a frame unrelated to backward prediction; when it is omitted, the most recently notified maximum delay time is applied.
[0111]
Further, the per-frame syntax of the present embodiment may be used together with the syntax for the entire encoded data defined in the first embodiment. In this case, the per-frame syntax can be made omissible by notifying, before it, a flag indicating its presence or absence, as described above. The maximum delay time notified in the syntax for the entire encoded data is applied until a maximum delay time is notified in the per-frame syntax; once it has been updated by the per-frame syntax, the time delayed on the basis of the updated value becomes the reference time for all subsequent decoded image output.
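The precedence between the sequence-level value and optional per-frame values described above can be sketched in Python. This is an illustrative model only: the function name `effective_delays` is hypothetical, and an omitted per-frame syntax is represented by `None`.

```python
from fractions import Fraction
from typing import Iterable, List, Optional


def effective_delays(sequence_delay: Fraction,
                     per_frame: Iterable[Optional[Fraction]]) -> List[Fraction]:
    """Return the maximum delay time in force for each frame.

    The sequence-level value applies until a per-frame value is notified;
    after that, the most recently notified value stays in force.
    """
    current = sequence_delay
    result = []
    for notified in per_frame:
        if notified is not None:   # per-frame syntax present (flag set)
            current = notified
        result.append(current)
    return result


# Frames 0 and 1 omit the syntax, frame 2 updates it, frame 3 omits it again.
in_force = effective_delays(Fraction(2, 15),
                            [None, None, Fraction(3, 15), None])
```

In this example the first two frames use the sequence-level delay of 2/15 seconds, and the value of 3/15 seconds notified at the third frame remains in force for the fourth.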
[0112]
Although the present embodiment has been described as being realized on the basis of H.26L, the moving picture coding method to which the present invention can be applied is not limited to H.26L; the invention can be applied to various moving picture coding methods that use backward inter-frame prediction.
[0113]
Further, in the present embodiment, a syntax using a fixed-length code is added to the picture parameter set as the syntax for notifying the maximum delay time, but the code and syntax used for this notification, and the time unit for expressing the maximum delay time, are of course not limited to these. A variable-length code may be used instead of the fixed-length code, and the notification may be made in any of various syntaxes capable of notifying information applied to each frame.
[0114]
For example, in H.26L, the syntax may be added to a supplemental enhancement information message (Supplemental Enhancement Information Message). When another moving picture coding method is used, a syntax of that method for notifying information applied to each frame may be used. Alternatively, as with ITU-T Recommendation H.245, which is used for notification of control information in communication using H.263, the notification may be made outside the encoded data of the moving picture coding method.
[0115]
[Effect of the invention]
As described in detail above, the moving picture coding method, moving picture decoding method, moving picture coding apparatus, moving picture decoding apparatus, moving picture processing system, moving picture coding program, and moving picture decoding program according to the present invention provide the following effects. That is, when a moving image consisting of a sequence of frames is encoded using backward inter-frame prediction and output, the moving picture coding method, apparatus, and program that output the maximum delay time associated with backward prediction, the moving picture decoding method, apparatus, and program that input the maximum delay time, and the moving picture processing system using them make it possible to obtain decoded image output at appropriate time intervals when backward inter-frame prediction is used.
[0116]
In particular, unlike the related art, the output time is expressed not as an absolute value but as a value relative to the decoding time Tr. Therefore, even when the frame rate is variable, the value of the maximum delay time dpb_output_delay can be described accurately and transmitted with a small number of bits. In addition, even when the decoding time Tr is shifted or not received, the corresponding image is output after a delay of dpb_output_delay from the completion of decoding, so the images can still be output at the correct interval.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a video encoding device, a video decoding device, and a video processing system.
FIG. 2 is a diagram illustrating an example of encoding of a frame when bidirectional prediction is performed.
FIG. 3 is a block diagram illustrating an example of a configuration of a video encoding device.
FIG. 4 is a block diagram illustrating an example of a configuration of a video decoding device.
FIG. 5 is a diagram showing (a) decoding and (b) output of a frame when the bidirectional prediction shown in FIG. 2 is performed.
FIG. 6 is a diagram illustrating encoding of a frame when bidirectional prediction is performed.
FIG. 7 is a diagram illustrating (a) decoding and (b) output of a frame when the bidirectional prediction illustrated in FIG. 6 is performed.
FIG. 8 is a diagram illustrating encoding of a frame when bidirectional prediction is performed.
FIG. 9 is a diagram showing (a) decoding and (b) output of a frame when the bidirectional prediction shown in FIG. 8 is performed.
FIG. 10 is a diagram illustrating (a) decoding, (b) output, and (c) delayed output of a frame when bidirectional prediction is performed.
FIG. 11 is a diagram illustrating encoding of a frame when bidirectional prediction is performed.
[Explanation of symbols]
1: video encoding device, 1a, 1c: input terminals, 1b: output terminal, 10: encoder, 11: frame memory, 12: multiplexer, 13: buffer, 15: controller, 16: maximum delay time calculation unit, 2: video decoding device, 2a: input terminal, 2b: output terminal, 20: decoder, 21: input buffer, 22: output buffer, 23: line, 25: controller, 26: image output time calculation unit.
D0: moving image data, D1: encoded data, D2: decoded moving image data, F0 to F4: frames.

Claims

A moving image encoding method for performing inter-frame prediction with other frames,
A moving picture coding method characterized by outputting a maximum delay time that can be generated by backward prediction.

A video decoding method for performing inter-frame prediction with other frames,
A moving picture decoding method characterized by inputting a maximum delay time that can be generated by backward prediction.

A video encoding device that performs inter-frame prediction with other frames,
A moving picture coding apparatus for outputting a maximum delay time that can be generated by backward prediction.

The moving picture coding apparatus according to claim 3, wherein the maximum delay time is the time difference from the occurrence time of a frame subjected to backward inter-frame prediction to the occurrence time of the temporally latest frame that can be used as a reference frame for backward prediction.

The moving picture encoding apparatus according to claim 3, wherein the maximum delay time is output as information applied to entire encoded data.

The moving picture encoding apparatus according to claim 3, wherein the maximum delay time is output as information applied to each frame.

The moving picture encoding apparatus according to claim 3, wherein the maximum delay time is optionally output as information applied to the frame for which the maximum delay time is notified and to each frame temporally subsequent to that frame.

A video decoding device that performs inter-frame prediction with another frame,
A moving picture decoding apparatus characterized by inputting a maximum delay time that can be generated by backward prediction.

The moving picture decoding device according to claim 8, wherein the maximum delay time is the time difference between the decoding time of a frame whose decoding time and output time are not reversed in order relative to other frames and the decoded image output time associated with that frame.

The moving picture decoding apparatus according to claim 8, wherein the maximum delay time is input as information applied to the entire encoded data.

The moving picture decoding apparatus according to claim 8, wherein the maximum delay time is input as information applied to each frame.

The moving picture decoding apparatus according to claim 8, wherein the maximum delay time is optionally input as information applied to the frame for which the maximum delay time is notified and to each frame temporally subsequent to that frame.

A moving image encoding program for causing a computer to execute moving image encoding that performs inter-frame prediction with other frames,
A moving picture coding program for causing a computer to execute a process of outputting a maximum delay time that can be generated by backward prediction.

A moving image decoding program for causing a computer to execute moving image decoding that performs inter-frame prediction with other frames,
A moving picture decoding program for causing a computer to execute a process of inputting a maximum delay time that can occur due to backward prediction.