JP4762486B2 - Multi-resolution video encoding and decoding

Info

Publication number: JP4762486B2
Application number: JP2003309278A
Authority: JP
Inventors: Thomas W. Holcomb; Shankar Regunathan; Chih-Lung Bruce Lin; Sridhar Srinivasan
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2002-09-04
Filing date: 2003-09-01
Publication date: 2011-08-31
Anticipated expiration: 2023-09-01
Also published as: JP2004266794A

Description

The present invention relates to multi-resolution video coding and decoding. For example, a video encoder adaptively changes the video frame size to reduce blocking artifacts at low bit rates.

Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
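The bit rate figure above follows from simple multiplication. The sketch below works one case; the 176×144 frame size is an assumption chosen only because it falls in the "tens of thousands of pixels" range, and is not a size named in the text.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel=24):
    """Return the uncompressed bit rate in bits per second."""
    pixels_per_frame = width * height
    return pixels_per_frame * bits_per_pixel * frames_per_second

# Assumed example: 176x144 pixels (25,344 per frame), 15 frames per second.
rate = raw_bit_rate(176, 144, 15)
print(rate)  # 9123840
```

Even this small frame size exceeds the 5 million bits/second mentioned above.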

Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, in which case the quality of the video does not suffer but the reduction in bit rate is limited by the complexity of the video. Or, compression can be lossy, in which case the quality of the video suffers but the reduction in bit rate is more dramatic than with lossless compression. Decompression reverses compression.

In general, video compression techniques include intra-frame compression and inter-frame compression. Intra-frame compression techniques compress individual frames, typically called I-frames, key frames, or reference frames. Inter-frame compression techniques compress frames with reference to preceding and/or following frames; such frames are typically called predicted frames, P-frames, or B-frames.

Many intra-frame and inter-frame compression techniques are block-based. A video frame is split into blocks for encoding. For example, an I-frame is split into 8×8 blocks and the blocks are compressed. Or, a P-frame is split into 16×16 macroblocks (e.g., four 8×8 luminance blocks and two 8×8 chrominance blocks) and the macroblocks are compressed. Different implementations can use different block configurations.
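As a rough illustration of block-based partitioning, the sketch below splits a single plane of samples into square blocks in raster order. The function name and the nested-list frame representation are assumptions made for this example, and the frame dimensions are assumed to be multiples of the block size.

```python
def split_into_blocks(frame, size):
    """Split a 2-D list of samples into size x size blocks, in raster order.

    Assumes the frame dimensions are multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            blocks.append(block)
    return blocks

# A 32x16 luminance plane whose sample value encodes its position.
frame = [[x + 32 * y for x in range(32)] for y in range(16)]
blocks_8x8 = split_into_blocks(frame, 8)    # as for an I-frame
macroblocks = split_into_blocks(frame, 16)  # luminance part of 16x16 macroblocks
print(len(blocks_8x8), len(macroblocks))  # 8 2
```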

Patent Document 1: US Pat. No. 6,499,060
Patent Document 2: US Pat. No. 6,418,166

Standard video encoders suffer a dramatic degradation in performance when the target bit rate falls below a certain threshold. In block-based video compression and decompression, quantization and other lossy processing stages commonly introduce blocking artifacts, i.e., distortion that appears as perceptible discontinuities between blocks. At low bit rates, high-frequency information for the blocks of I-frames can be heavily distorted or completely lost. Similarly, high-frequency information for the residuals of blocks of P-frames (the parts of the P-frames not predicted by motion estimation or other prediction) can be heavily distorted or completely lost. As a result, significant blocking artifacts can appear in "low-pass" regions and cause a substantial drop in the quality of the reconstructed video.

Some previous encoders attempt to reduce the perceptibility of blocking artifacts by processing reconstructed frames with a deblocking filter. The deblocking filter smooths the boundaries between blocks. While a deblocking filter can improve perceived video quality, it has several disadvantages. For example, the filter operates only on reconstructed output at the decoder. Therefore, even when in-loop deblocking is used, the effect of deblocking cannot be factored into the motion estimation, motion compensation, or transform coding process for the current frame. On the other hand, smoothing of the current frame by a post-processing filter (i.e., out-of-loop deblocking) can be too extreme, and the smoothing process introduces unnecessary computational complexity.

Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are well-developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.

In summary, the detailed description is directed to various techniques and tools for multi-resolution video coding. For example, a video encoder adaptively changes the video frame size to reduce blocking artifacts at low bit rates. In doing so, the encoder reduces blocking artifacts but may increase blurring, which is less perceptible and less objectionable than blocking artifacts. The various techniques and tools can be used in combination or independently.

In one aspect, a video encoder encodes video at any of multiple spatial resolutions. The encoder encodes at least one frame in a sequence of multiple video frames at a first spatial resolution, and encodes at least one other frame at a second spatial resolution. The second spatial resolution differs from the first spatial resolution, and the encoder chooses the second spatial resolution from a set of multiple spatial resolutions to reduce blocking artifacts in the sequence of video frames.

In another aspect, an encoder encodes a first part of a frame at a first spatial resolution and encodes a second part of the frame at a second spatial resolution. The second spatial resolution differs from the first spatial resolution.

In another aspect, a video encoder includes in a bitstream a first code indicating a first spatial resolution for a first frame encoded at the first spatial resolution, and includes in the bitstream a second code indicating a second spatial resolution for a second frame encoded at the second spatial resolution. The second spatial resolution differs from the first spatial resolution, and the encoder chooses the second spatial resolution from a set of multiple spatial resolutions to reduce blocking artifacts in the sequence of video frames.

In another aspect, an encoder includes in a bitstream a first signal indicating a first spatial resolution for a first part of a frame, and includes in the bitstream a second signal indicating a second spatial resolution for a second part of the frame. The second spatial resolution differs from the first spatial resolution.

In another aspect, a decoder receives a multi-resolution signal in a sequence header for a video sequence of multiple encoded frames. The multi-resolution signal indicates whether the frames are encoded at multiple spatial resolutions. If they are, the decoder decodes a first encoded frame at a first spatial resolution and decodes a second encoded frame at a second spatial resolution.
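One way to picture the sequence-level signal and the per-frame codes is the sketch below. The bit layout is entirely hypothetical: a 1-bit sequence flag followed by a 2-bit code per frame, with made-up code values, since the text here does not specify the actual syntax.

```python
# Hypothetical 2-bit codes mapped to (horizontal, vertical) downsampling factors.
RESOLUTION_CODES = {
    0b00: (1, 1),  # full resolution
    0b01: (2, 1),  # horizontal resolution halved
    0b10: (1, 2),  # vertical resolution halved
    0b11: (2, 2),  # both halved
}

def write_stream(multires, frame_codes):
    """Return a list of bits: a 1-bit sequence flag, then a 2-bit code per frame."""
    bits = [1 if multires else 0]
    if multires:
        for code in frame_codes:
            bits += [(code >> 1) & 1, code & 1]
    return bits

def read_stream(bits, frame_count):
    """Return the (horizontal, vertical) factors the decoder uses for each frame."""
    if bits[0] == 0:
        # Flag off: every frame is coded at full resolution, no per-frame codes.
        return [(1, 1)] * frame_count
    factors = []
    for i in range(frame_count):
        code = (bits[1 + 2 * i] << 1) | bits[2 + 2 * i]
        factors.append(RESOLUTION_CODES[code])
    return factors

bits = write_stream(True, [0b00, 0b11])
print(read_stream(bits, 2))  # [(1, 1), (2, 2)]
```

With the flag off, no per-frame bits are spent, which is one plausible reason to signal at the sequence level first.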

In another aspect, a decoder decodes a first part of an encoded frame at a first spatial resolution and decodes a second part of the encoded frame at a second spatial resolution. The second spatial resolution differs from the first spatial resolution.

In another aspect, an encoder or decoder receives pixel data for a video image and adaptively changes the spatial resolution of the video image, which includes computing resampled pixel data using a six-tap downsampling filter or a ten-tap upsampling filter.

Additional features and advantages will be made apparent from the following detailed description of various embodiments, which proceeds with reference to the accompanying drawings.

Detailed embodiments of the present invention are directed to multi-resolution video coding and decoding. For example, a video encoder adaptively changes the video frame size to reduce blocking artifacts at low bit rates. In doing so, the encoder reduces blocking artifacts but may increase blurring, which is less perceptible and less objectionable than blocking artifacts.

In some embodiments, an encoder uses multi-resolution coding techniques and tools to encode input frames at different spatial resolutions. For example, in one implementation, the encoder encodes a frame at the full original resolution, at a resolution downsampled by a factor of 2 in the horizontal direction, at a resolution downsampled by a factor of 2 in the vertical direction, or at a resolution downsampled by a factor of 2 in both the horizontal and vertical directions. Alternatively, the encoder decreases or increases the resolution of the coded frame by some other factor relative to the original resolution, decreases or increases it by some factor relative to a current resolution, or sets the resolution using some other technique. A decoder decodes the encoded frames using corresponding techniques.
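The four resolution choices in that implementation can be enumerated as below. This is a minimal sketch; the function name and the 320×240 source size are assumptions for illustration, and even frame dimensions are assumed.

```python
def coded_size(width, height, halve_horizontal, halve_vertical):
    """Return the (width, height) at which a frame is coded."""
    coded_width = width // 2 if halve_horizontal else width
    coded_height = height // 2 if halve_vertical else height
    return coded_width, coded_height

# Full resolution, horizontal halved, vertical halved, both halved.
modes = [(False, False), (True, False), (False, True), (True, True)]
sizes = [coded_size(320, 240, h, v) for h, v in modes]
print(sizes)  # [(320, 240), (160, 240), (320, 120), (160, 120)]
```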

In some embodiments, the encoder chooses the spatial resolution for frames on a frame-by-frame basis or on some other basis. A decoder performs the corresponding adjustment.

In some embodiments, the encoder chooses the spatial resolution by evaluating certain criteria (e.g., bit rate, frame content, etc.).
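The text does not give a concrete selection rule, so the following is only a hypothetical heuristic showing how a bit-rate criterion might drive the choice. The bits-per-pixel measure and both thresholds are invented for this sketch and are not taken from the source.

```python
def choose_factors(bits_per_second, width, height, frames_per_second,
                   low=0.05, very_low=0.02):
    """Pick (horizontal, vertical) downsampling factors from a bit budget.

    `low` and `very_low` are made-up thresholds in bits per pixel.
    """
    bits_per_pixel = bits_per_second / (width * height * frames_per_second)
    if bits_per_pixel < very_low:
        return (2, 2)  # very tight budget: halve both dimensions
    if bits_per_pixel < low:
        return (2, 1)  # tight budget: halve the horizontal dimension only
    return (1, 1)      # enough bits: keep full resolution

print(choose_factors(1_000_000, 320, 240, 30))  # (1, 1)
print(choose_factors(100_000, 320, 240, 30))    # (2, 1)
print(choose_factors(30_000, 320, 240, 30))     # (2, 2)
```

A real encoder could also weigh frame content, for example preferring full resolution for frames with fine detail.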

The various techniques and tools can be used in combination or independently. Different embodiments implement one or more of the described techniques and tools. Different techniques and tools can be used in combination, independently, or with other techniques and tools.

I. Computing Environment
FIG. 1 illustrates a generalized example of a suitable computing environment (100) in which the described embodiments may be implemented. The computing environment (100) is not intended to suggest any limitation as to the scope of use or functionality of the invention, as the present invention may be implemented in diverse general-purpose or special-purpose computing environments.

With reference to FIG. 1, the computing environment (100) includes at least one processing unit (110) and memory (120). In FIG. 1, this most basic configuration (130) is included within a dashed line. The processing unit (110) executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (120) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (120) stores software (180) implementing multi-resolution coding and/or decoding techniques.

A computing environment may have additional features. For example, the computing environment (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (100), and coordinates the activities of the components of the computing environment (100).

The storage (140) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium that can be used to store information and that can be accessed within the computing environment (100). The storage (140) stores instructions for the software (180) implementing the multi-resolution coding and/or decoding techniques.

The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, a network adapter, or another device that provides input to the computing environment (100). For video, the input device(s) (150) may be a TV tuner card, camera video interface, or similar device that accepts video input in analog or digital form, or a CD-ROM/DVD reader that provides video input to the computing environment. The output device(s) (160) may be a display, printer, speaker, CD/DVD writer, network adapter, or another device that provides output from the computing environment (100).

The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

The invention can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, within the computing environment (100), computer-readable media include memory (120), storage (140), communication media, and combinations of any of the above.

The invention can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

For the sake of presentation, the detailed description uses terms like "set," "choose," "encode," and "decode" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

II. Video Encoder and Decoder Examples
The techniques and tools in the various embodiments can be implemented in a video encoder and/or decoder. Video encoders and decoders may contain different modules, and the different modules may relate to and communicate with one another in many different ways. The modules and relationships described below are exemplary.

Depending on implementation and the type of compression desired, modules of the encoder or decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders or decoders with different modules and/or other configurations of modules perform one or more of the described techniques.

The example encoder and decoder are block-based and use a 4:2:0 macroblock format, with each macroblock including four 8×8 luminance blocks (at times treated as one 16×16 macroblock) and two 8×8 chrominance blocks. Alternatively, the encoder and decoder are object-based, use a different macroblock or block format, or perform operations on sets of pixels of different size or configuration than 8×8 blocks and 16×16 macroblocks.
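The sample counts implied by the 4:2:0 macroblock format can be checked with a few lines of arithmetic. The helper names below are assumptions for this sketch, as is the choice to round partial macroblocks up at frame edges.

```python
def macroblock_layout():
    """Return (luminance blocks, chrominance blocks, total samples) per macroblock."""
    luma_blocks = 4    # four 8x8 luminance blocks cover a 16x16 pixel area
    chroma_blocks = 2  # one 8x8 block for each of the two chrominance planes
    samples = (luma_blocks + chroma_blocks) * 8 * 8
    return luma_blocks, chroma_blocks, samples

def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks, rounding partial macroblocks up (an assumption)."""
    return ((width + 15) // 16) * ((height + 15) // 16)

luma, chroma, samples = macroblock_layout()
print(samples)                          # 384
print(samples / (16 * 16))              # 1.5
print(macroblocks_per_frame(320, 240))  # 300
```

The 1.5 samples per pixel reflects that each chrominance plane is subsampled by 2 in both directions.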

A. Example Video Encoder
The encoder receives a sequence of video frames including a current frame and produces compressed video information as output. The encoder compresses predicted frames and key frames. Many of the components of the encoder are used for compressing both key frames and predicted frames. The exact operations performed by those components can vary depending on the type of information being compressed.

A predicted frame (also called a P-frame, a B-frame for bi-directional prediction, or an inter-coded frame) is represented in terms of prediction (or difference) from one or more other frames. A prediction residual is the difference between what was predicted and the original frame. In contrast, a key frame (also called an I-frame or intra-coded frame) is compressed without reference to other frames.

Referring to FIG. 2, in some embodiments, an encoder (200) encoding a current frame (205) includes a resolution converter (210) for multi-resolution encoding. The resolution converter (210) receives the current frame (205) as input and outputs multi-resolution parameters (215) as well as the converted frame. If the current frame (205) is a predicted frame, a resolution converter (230) receives as input the reference frame (225) for the current frame (205) and outputs a converted reference frame.

The resolution converters (210) and (230) communicate with the other encoder modules (240), and in turn the other encoder modules (240) produce output (245) (e.g., pixel block data, motion vectors, residuals, etc.) based in part on multi-resolution coding information (e.g., the multi-resolution parameters (215)) provided by the resolution converters (210) and (230).

The other encoder modules (240) may include, for example, a motion estimator, a motion compensator, a frequency transformer, a quantizer, a frame store, and an entropy encoder.

If the current frame (205) is a forward-predicted frame, a motion estimator estimates the motion of macroblocks or other sets of pixels of the current frame (205) with respect to a reference frame (225). In this case the reference frame is the reconstructed previous frame buffered in the frame store. In alternative embodiments, the reference frame (225) is a later frame, or the current frame (205) is bi-directionally predicted. A motion compensator applies the motion information to the reconstructed previous frame to form a motion-compensated current frame. The prediction is rarely perfect, however, and the difference between the motion-compensated current frame and the original current frame (205) is the prediction residual. Alternatively, the motion estimator and motion compensator apply another type of motion estimation/compensation.
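Motion estimation and the prediction residual can be sketched with a full-search block match. The source does not name a search method or cost function, so the sum of absolute differences (SAD), the 4×4 block size, and the ±2 pixel search range are all assumptions chosen to keep the example small.

```python
def sad(current, reference, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(current[cy + j][cx + i] - reference[ry + j][rx + i])
        for j in range(size) for i in range(size)
    )

def estimate_motion(current, reference, cx, cy, size=4, search=2):
    """Full search over +/-search pixels; returns ((dx, dy), residual block)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, reference, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    _, dx, dy = best
    # Residual: original block minus the motion-compensated prediction.
    residual = [
        [current[cy + j][cx + i] - reference[cy + dy + j][cx + dx + i]
         for i in range(size)]
        for j in range(size)
    ]
    return (dx, dy), residual

# The current frame is the reference shifted right by one pixel, so the
# block at (4, 4) is found one pixel to the left in the reference.
reference = [[(7 * x + 13 * y) % 50 for x in range(12)] for y in range(12)]
current = [[reference[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]
vector, residual = estimate_motion(current, reference, 4, 4)
print(vector)  # (-1, 0)
```

Here the match is exact, so the residual block is all zeros; with real video the residual carries whatever the prediction missed.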

A frequency transformer converts spatial domain video information into frequency domain (i.e., spectral) data. For block-based video frames, the frequency transformer applies a discrete cosine transform (DCT) to blocks of pixel data or prediction residual data, producing blocks of DCT coefficients. Alternatively, the frequency transformer applies another conventional frequency transform such as a Fourier transform, or uses wavelet or sub-band analysis.
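For reference, a direct (unoptimized) form of the 2-D DCT on a square block is sketched below. The orthonormal DCT-II scaling is an assumption; the text does not say which normalization the frequency transformer uses, and practical codecs use fast integer approximations instead of this O(N^4) loop.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) form)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

# A flat 8x8 block has all of its energy in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0
```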

A quantizer then quantizes the blocks of spectral data coefficients. The quantizer applies uniform scalar quantization to the spectral data with a step size that varies on a frame-by-frame basis or other basis. Alternatively, the quantizer applies another type of quantization to the spectral data coefficients, for example, a non-uniform, vector, or non-adaptive quantization, or directly quantizes spatial domain data in an encoder system that does not use frequency transformations. In addition to adaptive quantization, the encoder (200) can use frame dropping, adaptive filtering, or other techniques for rate control.
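Uniform scalar quantization amounts to dividing each coefficient by the step size and rounding. The sketch below is a minimal version with no dead zone or other refinements, which is an assumption; the coefficient values are made up.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[level * step for level in row] for row in levels]

coeffs = [[800.0, 13.0], [-7.0, 2.0]]
levels = quantize(coeffs, step=8)
print(levels)                 # [[100, 2], [-1, 0]]
print(dequantize(levels, 8))  # [[800, 16], [-8, 0]]
```

A larger step size yields smaller levels and more zeros, lowering the bit rate at the cost of larger reconstruction error, which is where the loss in lossy compression comes from.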

When a reconstructed current frame is needed for subsequent motion estimation/compensation, modules of the encoder (200) reconstruct the current frame (205), typically by performing the inverse of the techniques used to encode the frame. A frame store buffers the reconstructed current frame for use in predicting the next frame.

The entropy coder compresses the output of the quantizer as well as certain side information (e.g., motion information, quantization step size, etc.). Typical entropy coding techniques include arithmetic coding, differential coding, Huffman coding, run length coding, LZ coding, dictionary coding, and combinations or variations of the above. The entropy coder typically uses different coding techniques for different kinds of information, and can choose from among multiple code tables within a particular coding technique.
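Of the techniques listed, run length coding is the simplest to show. The sketch below encodes a sequence of quantized levels as (zero run, value) pairs with an end marker; the pair format and the "EOB" marker are assumptions for illustration and not the codec's actual symbol set.

```python
def run_length_encode(levels):
    """Encode a 1-D sequence as (zero_run, value) pairs plus an end marker."""
    pairs, run = [], 0
    for value in levels:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs

def run_length_decode(pairs, length):
    """Invert run_length_encode, padding with zeros up to `length`."""
    out = []
    for item in pairs:
        if item == "EOB":
            break
        run, value = item
        out += [0] * run + [value]
    return out + [0] * (length - len(out))

levels = [100, 2, 0, 0, -1, 0, 0, 0]
pairs = run_length_encode(levels)
print(pairs)  # [(0, 100), (0, 2), (2, -1), 'EOB']
```

In a full coder the pairs would then be mapped to variable-length codes, for example from a Huffman table.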

For additional detail about the other encoder modules (240) in some embodiments, see U.S. patent application Ser. No. 09/849,502, entitled "DYNAMIC FILTERING FOR LOSSY COMPRESSION," filed May 3, 2001; U.S. patent application Ser. No. 09/201,278, entitled "EFFICIENT MOTION VECTOR CODING FOR VIDEO COMPRESSION," filed Nov. 30, 1998; Patent Document 1 (US Pat. No. 6,499,060); and Patent Document 2 (US Pat. No. 6,418,166). The disclosure of each is incorporated herein by reference.

B. Example Video Decoder
Referring to FIG. 3, a decoder (300) receives information for a compressed sequence of video frames and produces output including reconstructed frames. The decoder (300) decompresses predicted frames and key frames. Many of the components of the decoder (300) are used for decompressing both key frames and predicted frames. The exact operations performed by those components can vary depending on the type of information being decompressed.

In some embodiments, the decoder (300) reconstructing a current frame (305) includes a resolution converter (310) for multi-resolution decoding. The resolution converter (310) takes a decoded frame (315) as input and outputs the reconstructed current frame (305).

カレント・フレーム（３０５）が予測フレームである場合、リゾルーション変換器（３３０）は入力として、マルチ・リゾルーション・パラメータ（３１５）およびカレント・フレーム（３０５）のための参照フレーム（３２５）を受信する。リゾルーション変換器（３３０）は、参照フレーム情報を他の復号器モジュール（３４０）に出力する。他の復号器モジュール（３４０）は参照フレーム情報を、符号器から受信された動きベクトル、残差など（３４５）と共に使用して、カレント・フレーム（３０５）を復号化する。 If the current frame (305) is a predicted frame, the resolution converter (330) receives as input the multi-resolution parameters (315) and the reference frame (325) for the current frame (305). The resolution converter (330) outputs reference frame information to the other decoder modules (340). The other decoder modules (340) use the reference frame information together with the motion vectors, residuals, etc. (345) received from the encoder to decode the current frame (305).

他の復号器モジュール３４０には、例えば、バッファ、エントロピー復号器、動き補償器、フレーム・ストア、逆量子化器および逆周波数変換器を含むことができる。 Other decoder modules 340 may include, for example, buffers, entropy decoders, motion compensators, frame stores, inverse quantizers, and inverse frequency converters.

バッファは、圧縮されたビデオ・シーケンスについての情報（３４５）を受信し、受信された情報をエントロピー復号器で使用可能にする。バッファは通常、情報を、経時的にかなり一定であるレートで受信し、帯域幅または伝送における短期変動を平滑化するためのジッタ・バッファを含む。バッファは、再生バッファおよび他のバッファも含むことができる。別法として、バッファは情報を、変化するレートで受信する。 The buffer receives information (345) about the compressed video sequence and makes the received information available to the entropy decoder. The buffer typically includes a jitter buffer for receiving information at a rate that is fairly constant over time and smoothing short-term variations in bandwidth or transmission. Buffers can also include playback buffers and other buffers. Alternatively, the buffer receives information at a changing rate.

エントロピー復号器は、エントロピー符号化された量子化データ、ならびに、エントロピー符号化された副情報（例えば、動き情報、量子化ステップ・サイズなど）を復号化し、通常は、符号器において実行されたエントロピー符号化の逆を適用する。エントロピー復号器はしばしば、異なる種類の情報について異なる復号化テクニックを使用し、特定の復号化テクニック内の多数のコード・テーブルの中から選択することができる。 The entropy decoder decodes the entropy-coded quantized data as well as entropy-coded side information (e.g., motion information, quantization step size), typically applying the inverse of the entropy encoding performed in the encoder. The entropy decoder often uses different decoding techniques for different kinds of information and can choose from among multiple code tables within a particular decoding technique.

再構築されるフレームが前方予測フレームである場合、動き補償器は動き情報を参照フレームに適用して、再構築中であるフレームの予測を形成する。例えば、動き補償器はマクロブロック動きベクトルを使用して、参照フレームにおけるマクロブロックを見つけ出す。フレーム・バッファは、以前の再構築されたフレームを、参照フレームとして使用するために格納する。別法として、動き補償器は別のタイプの動き補償を適用する。動き補償器による予測はめったに完全ではなく、そのため復号器はまた予測残差をも再構築する。 If the frame to be reconstructed is a forward predicted frame, the motion compensator applies motion information to the reference frame to form a prediction of the frame that is being reconstructed. For example, the motion compensator uses the macroblock motion vector to find the macroblock in the reference frame. The frame buffer stores the previous reconstructed frame for use as a reference frame. Alternatively, the motion compensator applies another type of motion compensation. The prediction by the motion compensator is rarely complete, so the decoder also reconstructs the prediction residual.
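By way of illustration only, the motion-compensated prediction and residual addition described above can be sketched as follows. This is not the decoder's actual implementation: the names `predict_macroblock` and `reconstruct_block`, the list-of-rows frame layout, the integer-pixel `(dy, dx)` motion vector convention, the edge clamping, and the 8-bit clipping are all assumptions made for the example.

```python
def predict_macroblock(ref, top, left, mv, size=16):
    """Copy a size x size block from the reference frame, displaced by mv.

    ref : reference frame as a list of rows of integer samples (assumed layout)
    mv  : (dy, dx) integer-pixel motion vector (hypothetical convention)
    Coordinates that fall outside the frame are clamped to the nearest edge.
    """
    height, width = len(ref), len(ref[0])
    dy, dx = mv
    block = []
    for r in range(size):
        y = min(max(top + r + dy, 0), height - 1)
        row = []
        for c in range(size):
            x = min(max(left + c + dx, 0), width - 1)
            row.append(ref[y][x])
        block.append(row)
    return block


def reconstruct_block(prediction, residual):
    """Add the decoded prediction residual to the prediction, clipping to 0..255."""
    return [[min(max(p + e, 0), 255) for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, residual)]
```

The second function reflects the point made in the text that prediction is rarely perfect, so the residual is reconstructed and added back.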

復号器が、再構築されたフレームを後続の動き補償のために必要とするとき、フレーム・ストアは、再構築されたフレームを、次のフレームの予測において使用するためにバッファリングする。 When the decoder needs a reconstructed frame for subsequent motion compensation, the frame store buffers the reconstructed frame for use in predicting the next frame.

逆量子化器は、エントロピー復号化されたデータを逆量子化する。一般には、逆量子化器は、フレーム毎のベースで、あるいは他のベースで変わるステップ・サイズを使用して、エントロピー復号化されたデータに対して一様のスカラ逆量子化を適用する。別法として、逆量子化器は、非一様、ベクトル、または非適応逆量子化などの別のタイプの逆量子化をデータに適用したり、あるいは空間領域データを、逆周波数変換を使用しない復号器システムにおいて直接逆量子化したり、する。 The inverse quantizer inverse quantizes the entropy-decoded data. In general, the inverse quantizer applies uniform scalar inverse quantization to the entropy-decoded data, using a step size that varies on a frame-by-frame basis or on some other basis. Alternatively, the inverse quantizer applies another type of inverse quantization to the data, such as non-uniform, vector, or non-adaptive inverse quantization, or directly inverse quantizes spatial domain data in a decoder system that does not use inverse frequency transforms.
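As a minimal sketch of uniform scalar inverse quantization, assuming integer quantization levels and a single step size per call, the reconstruction is a multiplication by the step size. The function names are hypothetical, and the forward quantizer is included only to show where the loss arises.

```python
def quantize(coeffs, step_size):
    """Forward uniform scalar quantizer (nearest level); shown for illustration."""
    return [int(round(c / step_size)) for c in coeffs]


def dequantize(levels, step_size):
    """Uniform scalar inverse quantization: scale each level by the step size."""
    return [level * step_size for level in levels]
```

With a step size of 8, for example, the coefficients 103, -37 and 5 map to levels 13, -5 and 1 and are reconstructed as 104, -40 and 8, so a larger step size gives a coarser reconstruction.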

逆周波数変換器は、量子化された周波数領域データを空間領域ビデオ情報に変換する。ブロック・ベースのビデオ・フレームでは、逆周波数変換器は、逆ＤＣＴ（ＩＤＣＴ）をＤＣＴ係数のブロックに適用し、それぞれキー・フレームまたは予測フレームのためのピクセル・データまたは予測残差データを生成する。別法として、周波数変換器は、フーリエ変換などの別の従来の逆周波数変換を適用し、あるいはウェーブレットまたはサブ・バンド統合（synthesis）を使用する。 The inverse frequency transformer converts the quantized frequency domain data into spatial domain video information. For block-based video frames, the inverse frequency transformer applies an inverse DCT (IDCT) to blocks of DCT coefficients, producing pixel data or prediction residual data for key frames or predicted frames, respectively. Alternatively, the inverse frequency transformer applies another conventional inverse frequency transform, such as a Fourier transform, or uses wavelet or sub-band synthesis.
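For illustration, a floating-point inverse DCT can be sketched as below. The orthonormal scaling and the separable row-then-column order are assumptions of this example; a practical decoder would more likely use a fast integer approximation, and nothing here is taken from the described implementation.

```python
import math


def _scale(k, n):
    # Orthonormal DCT scaling factor for frequency index k of an n-point transform.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Orthonormal forward DCT (type II), included so the inverse can be checked."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d: rebuild n samples from n DCT coefficients."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def idct_2d(block):
    """Separable 2-D inverse DCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A block whose only non-zero coefficient is the DC term reconstructs to a flat block, which is why coarse quantization of the higher-frequency coefficients tends to show up as visible block structure.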

いくつかの実施形態における他の復号器モジュール（３４０）についての追加の詳細については、２００１年５月３日出願の「DYNAMIC FILTERING FOR LOSSY COMPRESSION」という名称の米国特許出願第０９／８４９，５０２号、１９９８年１１月３０日出願の「EFFICIENT MOTION VECTOR CODING FOR VIDEO COMPRESSION」という名称の米国特許出願第０９／２０１，２７８号、特許文献１および特許文献２を参照されたい。 For additional details on the other decoder modules (340) in some embodiments, see U.S. patent application Ser. No. 09/849,502, entitled “DYNAMIC FILTERING FOR LOSSY COMPRESSION”, filed May 3, 2001; U.S. patent application Ser. No. 09/201,278, entitled “EFFICIENT MOTION VECTOR CODING FOR VIDEO COMPRESSION”, filed Nov. 30, 1998; Patent Document 1; and Patent Document 2.

ＩＩＩ．マルチ・リゾルーション・ビデオ符号化および復号化 III. Multi-Resolution Video Encoding and Decoding
マルチ・リゾルーション符号化では、符号器が入力フレームを異なる空間リゾルーションで符号化する。符号器はフレームのための空間リゾルーションを、フレーム毎のベースで、あるいは他のベースで選択する。いくつかの実施形態では、符号器は空間リゾルーションを、以下の観測に基づいて選択する。 In multi-resolution encoding, an encoder encodes input frames at different spatial resolutions. The encoder selects the spatial resolution for a frame on a frame-by-frame basis or on some other basis. In some embodiments, the encoder selects the spatial resolution based on the following observations.
１．ビットレートが減るにつれて、より低い空間リゾルーションでの符号化での利点が増す。 1. As the bit rate decreases, the benefit of encoding at a lower spatial resolution increases.
２．量子化ステップ・サイズが増すにつれて、より低い空間リゾルーションでの符号化での利点が増す。 2. As the quantization step size increases, the benefit of encoding at a lower spatial resolution increases.
３．ダウン・サンプリングが高い周波数情報を廃棄するので、ダウン・サンプリングは時として、知覚的に重要な高い周波数コンテンツ（例えば、「強いエッジ」、テキストなど）を有するフレームには、あまり適していない。 3. Because down-sampling discards high-frequency information, down-sampling is sometimes not well suited to frames with perceptually important high-frequency content (e.g., “strong edges”, text).
４．フレームが低域特性を有する場合、あるいはフレームがノイズのような高い周波数コンテンツを有する場合、ダウン・サンプリングが適切である可能性がある。 4. Down-sampling may be appropriate if the frame has low-pass characteristics, or if the frame has noise-like high-frequency content.

いくつかの実施形態では、符号器はカレント・フレームのビットレート、量子化器ステップ・サイズおよび高い周波数エネルギーについての方向づけ（orientation）／大きさを使用して、空間リゾルーションを選択する。例えば、カレント・フレームの水平の高い周波数成分の大きさが大きいが垂直の高い周波数成分の大きさが小さい場合、符号器は垂直ダウン・サンプリングを選択する。他の実施形態では、符号器は参照フレームからの情報を（カレント・フレームからの情報の代わりに、あるいはこれと組み合わせて）使用して、空間リゾルーションを選択する。別法として、符号器は上の基準のいくつかまたはすべてを省略するか、上の基準のいくつかの代わりに他の基準を使用するか、あるいは追加の基準を使用して空間リゾルーションを選択することができる。 In some embodiments, the encoder uses the bit rate, the quantizer step size, and the orientation/magnitude of the high-frequency energy of the current frame to select the spatial resolution. For example, if the magnitude of the horizontal high-frequency component of the current frame is large but the magnitude of the vertical high-frequency component is small, the encoder selects vertical down-sampling. In other embodiments, the encoder uses information from the reference frame (instead of or in combination with information from the current frame) to select the spatial resolution. Alternatively, the encoder may omit some or all of the above criteria, substitute other criteria for some of the above criteria, or use additional criteria to select the spatial resolution.
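One way such a selection could be sketched is shown below. Everything specific in it is an assumption made for illustration: the use of mean absolute sample differences as a proxy for high-frequency magnitude, the function names, and both threshold values are hypothetical and are not taken from the described embodiments. The bit rate criterion is omitted for brevity.

```python
def high_freq_energy(frame):
    """Return (horizontal, vertical) mean absolute first differences.

    Differences between horizontally adjacent samples respond to horizontal
    high frequencies; differences between vertically adjacent samples respond
    to vertical ones. The frame is a list of rows, at least 2 x 2 samples.
    """
    h, w = len(frame), len(frame[0])
    horiz = sum(abs(frame[y][x + 1] - frame[y][x])
                for y in range(h) for x in range(w - 1))
    vert = sum(abs(frame[y + 1][x] - frame[y][x])
               for y in range(h - 1) for x in range(w))
    return horiz / (h * (w - 1)), vert / ((h - 1) * w)


def choose_scaling(frame, quant_step, step_threshold=16, energy_threshold=8.0):
    """Return (hscale, vscale); 0 = maximum resolution, 1 = half resolution.

    Both thresholds are hypothetical tuning values.
    """
    if quant_step < step_threshold:
        return (0, 0)  # fine quantization: keep maximum resolution
    horiz, vert = high_freq_energy(frame)
    hscale = 1 if horiz < energy_threshold else 0
    vscale = 1 if vert < energy_threshold else 0
    return (hscale, vscale)
```

For a frame of vertical stripes, the horizontal measure is large and the vertical measure is zero, so at a coarse step size the sketch returns `(0, 1)`, i.e. vertical down-sampling only, in line with the example in the text.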

符号器がカレント・フレームのための空間リゾルーションを選択した後、符号器は、これを符号化する前に、元のフレームを所望のリゾルーションへ再サンプリングする。カレント・フレームが予測フレームである場合、符号器はまた予測フレームのための参照フレームも、カレント・フレームの新しいリゾルーションに合致するように再サンプリングする。次いで、符号器はリゾルーションの選択を復号器に送信する。一実装では、６タップ・フィルタがダウン・サンプリングのために使用され、１０タップ・フィルタがアップ・サンプリングのために使用され、これらのフィルタは、組み合わせることによって、再構築されたビデオの品質を高めるように設計される。別法として、他のフィルタが使用される。 After the encoder selects the spatial resolution for the current frame, the encoder resamples the original frame to the desired resolution before encoding it. If the current frame is a predicted frame, the encoder also resamples the reference frame for the predicted frame to match the new resolution of the current frame. The encoder then transmits the choice of resolution to the decoder. In one implementation, a 6-tap filter is used for down-sampling and a 10-tap filter is used for up-sampling, with the two filters designed jointly to increase the quality of the reconstructed video. Alternatively, other filters are used.

図４は、フレームのマルチ・リゾルーション符号化のためのテクニック（４００）を示す。図２の符号器（２００）などの符号器は、フレームのためのリゾルーションを設定する（４１０）。例えば、符号器は、上に挙げた基準または他の基準を考慮する。 FIG. 4 shows a technique (400) for multi-resolution encoding of frames. An encoder, such as the encoder (200) of FIG. 2, sets up a resolution for the frame (410). For example, the encoder considers the criteria listed above or other criteria.

次いで、符号器は、フレーム（４２０）をそのリゾルーションで符号化する。符号化が完了した場合（４３０）、符号器は終了する。そうでない場合、符号器が次のフレームのためのリゾルーションを設定し（４１０）、符号化を継続する。別法として、符号器はリゾルーションをフレーム・レベル以外のあるレベルに設定する。 The encoder then encodes the frame (420) with that resolution. If the encoding is complete (430), the encoder ends. Otherwise, the encoder sets a resolution for the next frame (410) and continues encoding. Alternatively, the encoder sets the resolution to some level other than the frame level.

いくつかの実施形態では、符号器は予測フレームならびにイントラ・フレームを符号化する。図６は、イントラ・フレームおよび予測フレームのマルチ・リゾルーション符号化のためのテクニック（６００）を示す。 In some embodiments, the encoder encodes predicted frames as well as intra frames. FIG. 6 shows a technique (600) for multi-resolution encoding of intra frames and predicted frames.

最初に、符号器は、符号化されるカレント・フレームがＩフレームであるかＰフレームであるかをチェックする（６１０）。カレント・フレームがＩフレームである場合、符号器はカレント・フレームのためのリゾルーションを設定する（６２０）。フレームがＰフレームである場合、符号器は、カレント・フレームのためのリゾルーションを設定する（６２０）前に、参照フレームのためのリゾルーションを設定する（６３０）。 Initially, the encoder checks whether the current frame to be encoded is an I frame or a P frame (610). If the current frame is an I frame, the encoder sets a resolution for the current frame (620). If the frame is a P-frame, the encoder sets a resolution for the reference frame (630) before setting a resolution for the current frame (620).

カレント・フレームのためのリゾルーションを設定した後（６２０）、符号器はカレント・フレームをそのリゾルーションで符号化する（６４０）。符号化が完了した場合（６５０）、符号器は終了する。そうでない場合、符号器は符号化を継続する。 After setting the resolution for the current frame (620), the encoder encodes the current frame with that resolution (640). If the encoding is complete (650), the encoder ends. Otherwise, the encoder continues to encode.

いくつかの実装では、符号器は、以下のリゾルーションのうち１つで選択的にフレームを符号化する。すなわち、１）最大リゾルーション、２）水平方向において２のファクタでダウン・サンプリングされたリゾルーション、３）垂直方向において２のファクタでダウン・サンプリングされたリゾルーション、または４）水平方向および垂直方向において２のファクタでダウン・サンプリングされたリゾルーション、である。別法として、符号器は、リゾルーションをある他のファクタ（例えば、２のべき乗ではないもの）により減らすかあるいは増したり、追加の使用可能なリゾルーションを有したり、あるいはある他のテクニックを使用してリゾルーションを設定したり、する。符号器は各フレームのためのリゾルーションを、元のイメージ・サイズに対して設定する。別法として、符号器はフレームのためのリゾルーションを、以前のフレームのリゾルーションまたは以前のリゾルーション設定に対して設定し、すなわち、符号器は、以前のリゾルーションに対してリゾルーションを漸進的に（progressively）変更する。 In some implementations, the encoder selectively encodes a frame at one of the following resolutions: 1) maximum resolution, 2) resolution down-sampled by a factor of 2 in the horizontal direction, 3) resolution down-sampled by a factor of 2 in the vertical direction, or 4) resolution down-sampled by a factor of 2 in both the horizontal and vertical directions. Alternatively, the encoder decreases or increases the resolution by some other factor (e.g., one that is not a power of 2), has additional resolutions available, or sets the resolution using some other technique. The encoder sets the resolution for each frame relative to the original image size. Alternatively, the encoder sets the resolution for a frame relative to the resolution of the previous frame or to the previous resolution setting; in other words, the encoder changes the resolution progressively relative to previous resolutions.

復号器は、符号化されたフレームを復号化し、必要な場合、フレームを表示前にアップ・サンプリングする。符号化されたフレームのリゾルーションのように、復号化されたフレームのリゾルーションを多数の異なる方法において調整することができる。 The decoder decodes the encoded frame and, if necessary, up-samples the frame before displaying it. Like the resolution of the encoded frame, the resolution of the decoded frame can be adjusted in a number of different ways.

図５は、フレームのマルチ・リゾルーション復号化のためのテクニック（５００）を示す。図３の復号器（３００）などの復号器は、フレームのためのリゾルーションを設定する（５１０）。例えば、復号器はリゾルーション情報を符号器から得る。 FIG. 5 shows a technique (500) for multi-resolution decoding of a frame. A decoder, such as the decoder (300) of FIG. 3, sets up a resolution for the frame (510). For example, the decoder obtains resolution information from the encoder.

次いで、復号器は、フレームをそのリゾルーションで復号化する（５２０）。復号化が完了した場合（５３０）、復号器は終了する。そうでない場合、復号器は次のフレームのためのリゾルーションを設定し（５１０）、復号化を継続する。別法として、復号器はリゾルーションをフレーム・レベル以外のあるレベルに設定する。 The decoder then decodes the frame with the resolution (520). If decoding is complete (530), the decoder ends. Otherwise, the decoder sets a resolution for the next frame (510) and continues decoding. Alternatively, the decoder sets the resolution to some level other than the frame level.

いくつかの実施形態では、復号器は予測フレームならびにイントラ・フレームを復号化する。図７は、イントラ・フレームおよび予測フレームのマルチ・リゾルーション復号化のためのテクニック（７００）を示す。 In some embodiments, the decoder decodes prediction frames as well as intra frames. FIG. 7 shows a technique (700) for multi-resolution decoding of intra frames and predicted frames.

最初に、復号器は、復号化されるカレント・フレームがＩフレームであるかＰフレームであるかをチェックする（７１０）。カレント・フレームがＩフレームである場合、復号器はカレント・フレームのためのリゾルーションを設定する（７２０）。フレームがＰフレームである場合、復号器は、カレント・フレームのためのリゾルーションを設定する（７２０）前に、参照フレームのためのリゾルーションを設定する（７３０）。 Initially, the decoder checks 710 whether the current frame to be decoded is an I frame or a P frame. If the current frame is an I frame, the decoder sets a resolution for the current frame (720). If the frame is a P frame, the decoder sets the resolution for the reference frame (730) before setting the resolution for the current frame (720).

カレント・フレームのためのリゾルーションを設定した後（７２０）、復号器はカレント・フレームをそのリゾルーションで復号化する（７４０）。復号化が完了した場合（７５０）、復号器は終了する。そうでない場合、復号器は復号化を継続する。 After setting the resolution for the current frame (720), the decoder decodes the current frame at that resolution (740). If decoding is complete (750), the decoder ends. Otherwise, the decoder continues decoding.

復号器は通常、符号器で使用されたリゾルーション、例えば、上述のリゾルーションのうち１つでフレームを復号化する。別法として、復号器で使用可能なリゾルーションは、符号器で使用されたものと厳密に同じではない。 The decoder typically decodes the frame with the resolution used by the encoder, eg, one of the above-described resolutions. Alternatively, the resolution available at the decoder is not exactly the same as that used at the encoder.

Ａ．シグナリング A. Signaling
マルチ・リゾルーション符号化されたフレームを復号化するための十分な情報を復号器に提供するために、符号器はビット・ストリーム・シグナリングを使用する。例えば、符号器は、フレームのシーケンスがマルチ・リゾルーション符号化を使用して符号化されるかどうかを示し、かつ／または、シーケンス内で符号化されたフレームのリゾルーションを示すシグナルを、１つまたは複数のフラグまたはコードの形式で送信することができる。別法として、符号器はマルチ・リゾルーション符号化を、シーケンス・レベル以外のあるレベルで可能／不可能にし、かつ／またはリゾルーションをフレーム・レベル以外のあるレベルで設定する。 To provide the decoder with sufficient information to decode multi-resolution encoded frames, the encoder uses bit stream signaling. For example, the encoder can send signals, in the form of one or more flags or codes, that indicate whether a sequence of frames is encoded using multi-resolution encoding and/or indicate the resolution of the encoded frames within the sequence. Alternatively, the encoder enables/disables multi-resolution encoding at some level other than the sequence level and/or sets the resolution at some level other than the frame level.

図８は、フレームのシーケンスをマルチ・リゾルーション符号化により符号化する際に、シグナルを送信するためのテクニック（８００）を示す。符号器は次のシーケンス・ヘッダを入力として取り（８１０）、マルチ・リゾルーション符号化がこのシーケンスに対して可能にするべきであるかどうかを決定する（８２０）。 FIG. 8 shows a technique (800) for transmitting a signal when encoding a sequence of frames with multi-resolution encoding. The encoder takes the next sequence header as input (810) and determines whether multi-resolution encoding should be enabled for this sequence (820).

符号器が、マルチ・リゾルーション符号化をこのシーケンスに対して使用中でない場合、符号器は、そのシーケンスのシグナルをそれに応じて設定し（８３０）、そのシーケンス内のフレームを符号化する（８４０）。 If the encoder is not using multi-resolution encoding for this sequence, the encoder sets the signal for that sequence accordingly (830) and encodes the frames in that sequence (840). ).

符号器がマルチ・リゾルーション符号化を使用中である場合、符号器はシーケンスのシグナルをそれに応じて設定する（８５０）。次いで、符号器はそのシーケンス内のフレームを、そのフレームの水平および／または垂直リゾルーション用のスケーリング・ファクタを示すシグナルに従って、（例えば、図４または図６を参照して上述したように）符号化する。 If the encoder is using multi-resolution encoding, the encoder sets the signal for the sequence accordingly (850). The encoder then encodes the frames in the sequence (e.g., as described above with reference to FIG. 4 or FIG. 6) in accordance with a signal indicating the scaling factor for the horizontal and/or vertical resolution of the frame.

符号化が完了した場合（８７０）、符号器は終了する。そうでない場合、符号器は次のシーケンスを符号化する。 If the encoding is complete (870), the encoder ends. Otherwise, the encoder encodes the next sequence.

いくつかの実施形態では、符号器は、マルチ・リゾルーション符号化がフレームのシーケンスのために可能にされるかどうかを示す１ビットを送信する。次いで、そのシーケンス内のフレームについて、ＩフレームおよびＰフレームの各々について指定されたフィールドにおけるコードが、最大リゾルーション・フレームに対するそのフレームのリゾルーション用のスケーリング・ファクタを指定する。一実装では、そのコードは固定長コードである。表１は、スケーリング・ファクタが、ＲＥＳＰＩＣＦＬＣというラベルが付けられたフィールドにおいてどのように符号化されるかを示している。 In some embodiments, the encoder transmits one bit that indicates whether multi-resolution encoding is enabled for the sequence of frames. Then, for the frames in the sequence, the code in the field specified for each of the I and P frames specifies the scaling factor for the resolution of that frame relative to the maximum resolution frame. In one implementation, the code is a fixed length code. Table 1 shows how the scaling factor is encoded in the field labeled RESPIC FLC.

別法として、符号器は、フレーム・リゾルーションへの調整をシグナルで送るもう１つの方法（例えば、可変長コード）を使用する。実装および可能なリゾルーションの数に応じて、符号器は追加のシグナル・コードまたはより少ない数のシグナル・コードを使用することができ、あるいは水平および垂直リゾルーションについて異なるコードを使用することができる。さらに、可能なリゾルーションの相対的確率に応じて、符号器はコードの長さを調整することができる（例えば、より短いコードを、最も可能性の高いリゾルーションに割り当てる）。さらに、符号器はシグナルを他の目的のために使用することができる。例えば、符号器はシグナル（例えば、固定または可変長コード）を使用して、複数のフィルタが使用可能である状況においてどのフィルタが再サンプリングのために使用されるべきであるかを示すことができる。符号器はこのようなシグナルを使用して、使用可能な、あらかじめ定義されたフィルタまたはカスタム・フィルタのうちどちらを再サンプリングにおいて使用するべきであるかを示すことができる。 Alternatively, the encoder uses another method (eg, a variable length code) that signals an adjustment to frame resolution. Depending on the implementation and the number of possible resolutions, the encoder can use additional signal codes or fewer signal codes, or can use different codes for horizontal and vertical resolution. . Further, depending on the relative probability of possible resolution, the encoder can adjust the length of the code (eg, assign a shorter code to the most likely resolution). In addition, the encoder can use the signal for other purposes. For example, the encoder can use a signal (eg, a fixed or variable length code) to indicate which filter should be used for resampling in situations where multiple filters are available. . The encoder can use such a signal to indicate whether an available predefined filter or a custom filter should be used in resampling.
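The idea of assigning shorter codes to more probable resolutions can be illustrated with a standard Huffman construction. This is only a sketch: the symbol names and the probabilities in the usage note are hypothetical, and the text does not say that a Huffman code is what any implementation uses.

```python
import heapq
from itertools import count


def build_prefix_code(probabilities):
    """Build a Huffman prefix code; returns {symbol: bit string}.

    probabilities: {symbol: probability}, with at least two symbols.
    """
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable groups, prefixing their codes with 0 / 1.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {sym: "0" + bits for sym, bits in group0.items()}
        merged.update({sym: "1" + bits for sym, bits in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]
```

With assumed probabilities of 0.6 for maximum resolution, 0.2 for horizontal half resolution, and 0.1 each for vertical and for combined half resolution, the construction yields code lengths of 1, 2, 3 and 3 bits respectively, compared with 2 bits each for a fixed-length code over four resolutions.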

シグナルを送信することによって、符号器は復号器に、マルチ・リゾルーション符号化されたフレームを復号化するために有用な情報を提供する。復号器はシグナルを構文解析して、符号化されたフレームがどのように復号化されるべきであるかを決定する。例えば、復号器は、符号器によって送信されたコードを解釈して、フレームのシーケンスがマルチ・リゾルーション符号化を使用して符号化されているかどうかを決定し、かつ／または、シーケンス内の符号化されたフレームのリゾルーションを決定することができる。 By transmitting the signal, the encoder provides the decoder with useful information for decoding the multi-resolution encoded frame. The decoder parses the signal to determine how the encoded frame should be decoded. For example, the decoder interprets the code transmitted by the encoder to determine whether the sequence of frames is encoded using multi-resolution encoding and / or the code in the sequence The resolution of the normalized frames can be determined.

図９は、符号化されたフレームのシーケンスをマルチ・リゾルーション復号化により復号化するとき、シグナルを受信および解釈するためのテクニック（９００）を示す。復号器は次の符号化されたシーケンス・ヘッダを入力として取り（９１０）、このシーケンスに関連付けられたシグナルをチェックして、符号器がマルチ・リゾルーション符号化をこのシーケンスのために使用したかどうかを決定する（９２０）。 FIG. 9 shows a technique (900) for receiving and interpreting signals when decoding a sequence of encoded frames with multi-resolution decoding. The decoder takes the next encoded sequence header as input (910) and checks the signal associated with the sequence to determine whether the encoder used multi-resolution encoding for the sequence (920).

符号器がマルチ・リゾルーション符号化を使用しなかった場合、復号器は、そのシーケンス内のフレームを復号化する（９３０）。他方では、符号器がマルチ・リゾルーション符号化を使用した場合、復号器は、そのフレーム用の水平および／または垂直リゾルーションのスケーリング・ファクタを示すシグナルを構文解析する（９４０）。次いで、復号器は、そのフレームをそれに応じて（例えば、図５または図７を参照して上述したように）復号化する（９５０）。 If the encoder did not use multi-resolution encoding, the decoder decodes (930) the frames in the sequence. On the other hand, if the encoder used multi-resolution encoding, the decoder parses a signal indicating the horizontal and / or vertical resolution scaling factors for that frame (940). The decoder then decodes (950) the frame accordingly (eg, as described above with reference to FIG. 5 or FIG. 7).

復号化が完了した場合（９６０）、復号器は終了する。そうでない場合、復号器は次のシーケンスを復号化する。 If the decoding is complete (960), the decoder ends. Otherwise, the decoder decodes the next sequence.
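The signaling round trip of FIGS. 8 and 9 can be sketched as follows. The bit layout is purely hypothetical: the sketch assumes a one-bit sequence flag followed by a two-bit code per frame formed from `hscale` and `vscale`, and it leaves out the frame data that a real bit stream would interleave with the frame codes. It makes no claim about the actual code assignment of Table 1.

```python
def write_signals(frame_scalings, multires_enabled):
    """Return a bit string: a sequence-level flag, then one 2-bit code per frame.

    frame_scalings: list of (hscale, vscale), each 0 (maximum) or 1 (half).
    The 2-bit mapping is an assumed, illustrative assignment.
    """
    bits = ["1" if multires_enabled else "0"]
    if multires_enabled:
        for hscale, vscale in frame_scalings:
            bits.append(f"{hscale}{vscale}")
    return "".join(bits)


def parse_signals(bits, frame_count):
    """Recover the per-frame (hscale, vscale) pairs written by write_signals."""
    if bits[0] != "1":
        # Multi-resolution coding disabled: every frame is at maximum resolution.
        return [(0, 0)] * frame_count
    scalings = []
    pos = 1
    for _ in range(frame_count):
        scalings.append((int(bits[pos]), int(bits[pos + 1])))
        pos += 2
    return scalings
```

When the sequence flag is 0, no per-frame codes are written at all, which mirrors the branch in which the decoder decodes the frames of the sequence without parsing any scaling factors.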

Ｂ．ダウン・サンプリングおよびアップ・サンプリング B. Down-Sampling and Up-Sampling
以下のセクションでは、いくつかの実装におけるアップ・サンプリングおよびダウン・サンプリングのプロセスを説明する。他の実装は、異なるアップ・サンプリング、ダウン・サンプリングまたはフィルタリング・テクニックを使用する。例えば、代替実施形態では、符号器は、非線形フィルタまたは空間的に変化するフィルタ・バンクを使用して、フレームを符号化することができる。 The following sections describe the up-sampling and down-sampling processes in some implementations. Other implementations use different up-sampling, down-sampling, or filtering techniques. For example, in alternative embodiments, an encoder may use non-linear filters or spatially varying filter banks to encode frames.

表２は、符号器および／または復号器によって、フレームのダウン・サンプリング／アップ・サンプリングのために使用される変数定義を示している。これらの定義は、ダウン・サンプリングおよびアップ・サンプリングの実施例のための擬似コードについての以下の説明で使用される。 Table 2 shows the variable definitions used by the encoder and / or decoder for down-sampling / up-sampling of the frame. These definitions will be used in the following description of pseudo code for the down-sampling and up-sampling embodiments.

表２：いくつかの実装におけるダウン・サンプリング／アップ・サンプリングのための変数定義 Table 2: Variable definitions for down-sampling/up-sampling in some implementations
Ｎ_ｕ＝アップ・サンプリングされた（最大リゾルーション）ライン内のサンプルの数 N_u = number of samples in an up-sampled (maximum-resolution) line
Ｎ_ｄ＝ダウン・サンプリングされた（半リゾルーション）ライン内のサンプルの数 N_d = number of samples in a down-sampled (half-resolution) line
Ｘ_ｕ［ｎ］＝位置ｎでアップ・サンプリングされたサンプル値、ただしｎ＝０，１，２…Ｎ_ｕ−１ X_u[n] = up-sampled sample value at position n, where n = 0, 1, 2, ..., N_u - 1
Ｘ_ｄ［ｎ］＝位置ｎでダウン・サンプリングされたサンプル値、ただしｎ＝０，１，２…Ｎ_ｄ−１ X_d[n] = down-sampled sample value at position n, where n = 0, 1, 2, ..., N_d - 1

「ライン」という用語は、Ｙ、ＣｒまたはＣｂのコンポーネント平面における水平−行または垂直−列におけるサンプル群を指す。以下の実施例では、アップ・サンプリングおよびダウン・サンプリング・オペレーションが行および列の両方について等しく、したがって、１次元のラインのサンプルを使用して例示される。垂直および水平のアップ・サンプリングまたはダウン・サンプリングが実行される場合、水平ラインが最初に再サンプリングされ、後に垂直ラインが続く。別法として、水平および垂直フィルタリングが同時にピクセルのブロック上で、異なるフィルタを使用して実施される。 The term “line” refers to a group of samples in horizontal-row or vertical-column in the Y, Cr or Cb component plane. In the following example, up-sampling and down-sampling operations are equal for both rows and columns, and are therefore illustrated using a one-dimensional line sample. When vertical and horizontal up-sampling or down-sampling is performed, the horizontal line is resampled first, followed by the vertical line. Alternatively, horizontal and vertical filtering is performed simultaneously on the block of pixels using different filters.

表３は、ルミナンス線の再サンプリングのための擬似コードを示し、表４は、クロミナンス・ラインの再サンプリングのための擬似コードを示している。 Table 3 shows the pseudo code for luminance line resampling, and Table 4 shows the pseudo code for chrominance line resampling.

表３：いくつかの実装におけるルミナンス線の再サンプリングのための擬似コード Table 3: Pseudocode for resampling of luminance lines in some implementations
N_d = N_u / 2    (where N_u is the number of samples in a maximum-resolution luminance line)
if ((N_d & 15) != 0)
    N_d = N_d + 16 - (N_d & 15)

表４：いくつかの実施態様におけるクロミナンス線の再サンプリングのための擬似コード Table 4: Pseudocode for resampling of chrominance lines in some implementations
N_d = N_u / 2    (where N_u is the number of samples in a maximum-resolution chrominance line)
if ((N_d & 7) != 0)
    N_d = N_d + 8 - (N_d & 7)

再サンプリングは、ダウン・サンプリングされたラインについてのサンプルの数を設定する。次いで、（４：２：０または類似のマクロブロックにより動作する符号器では）再サンプリングがラインにおけるサンプルの数を調整し、このラインがルミナンス・ライン用のマクロブロックの倍数（すなわち、１６の倍数）、またはクロミナンス・ライン用のブロックの倍数（すなわち、８の倍数）であるように、する。 The resampling sets the number of samples for the down-sampled line. Then (for encoders that operate on 4:2:0 or similar macroblocks), the resampling adjusts the number of samples in the line so that it is a macroblock multiple (i.e., a multiple of 16) for luminance lines or a block multiple (i.e., a multiple of 8) for chrominance lines.

１．ダウン・サンプリング・フィルタ 1. Down-Sampling Filter
ラインのダウン・サンプリングは、表５の擬似コードに従って出力を生成する。 Down-sampling of a line produces output according to the pseudocode in Table 5.

表５：いくつかの実装におけるラインのダウン・サンプリングのための擬似コード Table 5: Pseudocode for down-sampling of a line in some implementations
if (N_d != (N_u / 2))
{
    for (i = N_u; i < N_d * 2; i++)
        x_u[i] = x_u[N_u - 1]
}
downsamplefilter_line(x_u[])
for (i = 0; i < N_d; i++)
    x_d[i] = x_u[i * 2]
ｄｏｗｎｓａｍｐｌｅｆｉｌｔｅｒ＿ｌｉｎｅ（）において使用される６タップ・フィルタのためのコードを、図１０に示す。 The code for the 6-tap filter used in downsamplefilter_line() is shown in FIG. 10.

図１０では、イメージが水平方向においてフィルタリングされるとき、ＲＮＤ＿ＤＯＷＮが値６４に設定され、イメージが垂直方向においてフィルタリングされるとき、値６３に設定される。 In FIG. 10, RND_DOWN is set to the value 64 when the image is filtered in the horizontal direction, and set to the value 63 when the image is filtered in the vertical direction.
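The padding, filtering, and decimation steps of Table 5 can be sketched in runnable form as below. The filter taps of FIG. 10 are not reproduced in this text, so the sketch substitutes a 6-tap binomial kernel (1, 5, 10, 10, 5, 1) with a 5-bit normalization as a stand-in; the kernel, the shift, the tap alignment, and the edge clamping are all assumptions. The `rnd` parameter plays the role that RND_DOWN plays for the actual filter, but its default of 16 is tied to the stand-in kernel's scale, not to the values 64 and 63 given above.

```python
KERNEL = (1, 5, 10, 10, 5, 1)  # stand-in binomial low-pass taps (sum = 32), not FIG. 10
SHIFT = 5                      # log2 of the tap sum


def downsample_line(x_u, n_d, rnd=16):
    """Down-sample one non-empty line to n_d samples, following Table 5."""
    line = list(x_u)
    # If n_d was rounded up, extend the line to 2 * n_d by repeating its last sample.
    line.extend([line[-1]] * (2 * n_d - len(line)))
    last = len(line) - 1
    filtered = []
    for i in range(len(line)):
        acc = 0
        for k, tap in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), last)  # clamp tap positions at the line ends
            acc += tap * line[j]
        filtered.append((acc + rnd) >> SHIFT)
    # Keep every second filtered sample.
    return [filtered[2 * i] for i in range(n_d)]
```

Because the stand-in taps sum to 32, a constant line stays constant after filtering, which is a convenient sanity check for any kernel substituted here.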

２．アップ・サンプリング・フィルタ 2. Up-Sampling Filter
ラインのアップ・サンプリングは、表６の擬似コードに従って出力を生成する。 Up-sampling of a line produces output according to the pseudocode in Table 6.

表６：いくつかの実装におけるラインのアップ・サンプリングのための擬似コード Table 6: Pseudocode for up-sampling of a line in some implementations
/* place each down-sampled sample at an even position and a zero after it */
for (i = 0; i < N_d; i++)
{
    x_u[i * 2] = x_d[i]
    x_u[i * 2 + 1] = 0
}
upsamplefilter_line(x_u[])

ｕｐｓａｍｐｌｅｆｉｌｔｅｒ＿ｌｉｎｅ（）において使用される１０タップ・フィルタのためのコード例を、図１１に示す。図１１では、イメージが水平方向においてフィルタリングされるとき、ＲＮＤ＿ＵＰが１５に設定され、イメージが垂直方向においてフィルタリングされるとき、１６に設定される。 Example code for the 10-tap filter used in upsamplefilter_line() is shown in FIG. 11. In FIG. 11, RND_UP is set to 15 when the image is filtered in the horizontal direction and to 16 when the image is filtered in the vertical direction.
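Up-sampling by zero insertion followed by an interpolation filter can be sketched as below. Since the 10-tap filter of FIG. 11 is not reproduced in this text, the sketch uses a much simpler 3-tap kernel (1, 2, 1) / 2, which amounts to linear interpolation; the kernel, the edge handling, and the default rounding value are assumptions and do not correspond to RND_UP or to the actual filter.

```python
def upsample_line(x_d, rnd=1):
    """Up-sample one line to twice its length: zero insertion, then filtering."""
    n_u = 2 * len(x_d)
    # Zero insertion: down-sampled samples go to even positions, zeros to odd ones.
    x_u = [0] * n_u
    for i, value in enumerate(x_d):
        x_u[2 * i] = value

    def sample(k):
        if k < 0:
            return 0               # virtual (zero) position before the line start
        if k >= n_u:
            return x_u[n_u - 2]    # repeat the last real sample past the line end
        return x_u[k]

    # Stand-in kernel (1, 2, 1) / 2: even outputs keep their value, odd outputs
    # become the rounded mean of their two neighbours.
    return [(sample(n - 1) + 2 * x_u[n] + sample(n + 1) + rnd) >> 1
            for n in range(n_u)]
```

For example, the down-sampled line 10, 20, 30 becomes 10, 15, 20, 25, 30, 30. A longer kernel such as the 10-tap filter mentioned above would follow the same two-step structure with different taps.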

他のフィルタのペアも再サンプリングのために使用することができる。フィルタのペアを、ビデオのコンテンツおよび／または目標ビットレートに調整することができる。符号器は、フィルタの選択を副情報として復号器に送信することができる。 Other filter pairs can also be used for resampling. The filter pair can be adjusted to the video content and / or target bit rate. The encoder can send the filter selection as side information to the decoder.

Ｃ．新しいフレーム・ディメンションの計算 C. Calculating New Frame Dimensions
表７の擬似コードは、符号器が、ダウン・サンプリングされたフレームのための新しいフレーム・ディメンション（dimensions：大きさ、寸法）を計算する方法を例示している。 The pseudocode in Table 7 illustrates how the encoder calculates new frame dimensions for a down-sampled frame.

表７：ダウン・サンプリング後に新しいフレーム・ディメンションを計算するための擬似コード Table 7: Pseudocode for calculating new frame dimensions after down-sampling
Ｘ＝水平ディメンション、元のリゾルーションにおけるサンプルの数 X = horizontal dimension, number of samples at the original resolution
Ｙ＝垂直ディメンション、元のリゾルーションにおけるサンプルの数 Y = vertical dimension, number of samples at the original resolution
ｘ＝新しい水平リゾルーション x = new horizontal resolution
ｙ＝新しい垂直リゾルーション y = new vertical resolution
ｈｓｃａｌｅ＝水平スケーリング・ファクタ（０＝最大リゾルーション、１＝半リゾルーション） hscale = horizontal scaling factor (0 = maximum resolution, 1 = half resolution)
ｖｓｃａｌｅ＝垂直スケーリング・ファクタ（０＝最大リゾルーション、１＝半リゾルーション） vscale = vertical scaling factor (0 = maximum resolution, 1 = half resolution)
x = X
y = Y
if (hscale == 1)
{
    x = X / 2
    if ((x & 15) != 0)
        x = x + 16 - (x & 15)
}
if (vscale == 1)
{
    y = Y / 2
    if ((y & 15) != 0)
        y = y + 16 - (y & 15)
}

表７に示すテクニックを使用した実装では、符号器が新しいフレーム・ディメンションを計算し、これは、元の・ディメンションを２のファクタでダウン・サンプリングすること、および次いで、新しい・ディメンションがマクロブロック・サイズの整数の倍数（１６の倍数）であるように、端数を切り上げる。クロミナンス・ラインについて、符号器は、ディメンションを、ブロック・サイズの整数の倍数（８の倍数）となるように切り上げる。新しいディメンションの切り上げにより、ダウン・サンプリングされたフレームを、４：２：０または類似のマクロブロック・フォーマットを使用するビデオ符号器／復号器によって、符号化することができる。 In an implementation using the technique shown in Table 7, the encoder calculates new frame dimensions by down-sampling the original dimensions by a factor of 2 and then rounding up so that the new dimensions are an integer multiple of the macroblock size (a multiple of 16). For chrominance lines, the encoder rounds the dimensions up to an integer multiple of the block size (a multiple of 8). Rounding up the new dimensions allows the down-sampled frame to be encoded by video encoders/decoders that use a 4:2:0 or similar macroblock format.
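The calculation of Table 7 can be written as a short runnable sketch. The function name and the `block` parameter are introduced here for illustration; the rounding is done with integer arithmetic rather than the bit masks of the table, which gives the same result for the block sizes 16 and 8 used in the text.

```python
def new_frame_dimensions(width, height, hscale, vscale, block=16):
    """Return (x, y) after optional halving, rounded up to a multiple of block.

    hscale / vscale: 0 = maximum resolution, 1 = half resolution (as in Table 7).
    block: 16 for luminance dimensions, 8 for chrominance dimensions.
    """
    def half_rounded_up(n):
        n //= 2
        return (n + block - 1) // block * block

    x = half_rounded_up(width) if hscale == 1 else width
    y = half_rounded_up(height) if vscale == 1 else height
    return x, y
```

For a 720 x 480 frame down-sampled in both directions, the halved width of 360 is not a multiple of 16 and is rounded up to 368, while the halved height of 240 is kept, giving 368 x 240.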

Ｄ．他に取りうる方法（alternatives） D. Alternatives
上述の様々な選択肢（alternatives）と共に、あるいはこれらに加えて、符号器および復号器は以下のように動作することができる。 In conjunction with or in addition to the various alternatives described above, the encoder and decoder may operate as follows.

マルチ・リゾルーション・フレームワークを、個々のフレームまたは一連のフレームについてのいくつかのレベルのダウン・サンプリングに拡張することができる。いくつかのレベルのダウン・サンプリングを使用することにより、符号器が高リゾルーション・フレームを比較的低いビットレートで符号化するとき、再構築されたフレームの品質を改善することができる。 The multi-resolution framework can be extended to several levels of down-sampling for individual frames or series of frames. By using several levels of down-sampling, the quality of the reconstructed frame can be improved when the encoder encodes a high resolution frame at a relatively low bit rate.

符号器は、元のリゾルーションに関係したリゾルーションを２のファクタで調整することによって達成されるリゾルーション以外のリゾルーションにフレームを再サンプリングするマルチレート・フィルタリング・テクニックを使用することができる。例えば、分数比サンプリング（fractional-rate sampling）は、複雑さが増すという代償を払って、より順調なトレードオフを、高い周波数の細部の保持とブロッキング・アーティファクトの低減との間で提供することができる。 The encoder can use multi-rate filtering techniques that resample a frame to resolutions other than those reached by adjusting the resolution by a factor of 2 relative to the original resolution. For example, fractional-rate sampling can provide a smoother trade-off between preserving high-frequency detail and reducing blocking artifacts, at the cost of increased complexity.

符号器は、異なるレベルの再サンプリングをフレームの異なる部分に適用することができる。例えば、符号器は、フレームのうち高い周波数コンテンツがほとんどない領域を、ダウン・サンプリングされたリゾルーションで符号化することができると同時に、符号器は、フレームのうち強い高い周波数コンテンツを有するエリアを元のリゾルーションで符号化することができる。さらに、符号器は異なるフィルタを、フレームの異なる部分を再サンプリングするために、あるいは垂直および水平再サンプリングのために適用することができる。符号器はシグナリングを使用して、異なる再サンプリングレベル、および／または、フレームの異なる部分を再サンプリングするために使用された異なるフィルタを示すことができる。 The encoder can apply different levels of resampling to different parts of the frame. For example, an encoder can encode an area of a frame that has little high frequency content with down-sampled resolution, while an encoder can identify areas of the frame that have strong high frequency content. It can be encoded with the original resolution. In addition, the encoder can apply different filters to resample different portions of the frame, or for vertical and horizontal resampling. The encoder may use signaling to indicate different resampling levels and / or different filters used to resample different portions of the frame.

本発明の原理を、様々な上述の実施形態を参照して説明し、例示したが、上述の実施形態を構成および詳細において、このような原理から逸脱することなく修正できることは理解されよう。本明細書で説明したプログラム、プロセスまたは方法は、特に断らない限り、いかなる特定のタイプのコンピューティング環境にも関係せず、あるいは限定されないことを理解されたい。様々なタイプの汎用または専用コンピューティング環境は、本明細書で説明した教示によるオペレーションと共に使用することができ、あるいはこれを実行することができる。ソフトウェアにおいて示した上述の実施形態の要素をハードウェアに実装することができ、その逆も可能である。 Although the principles of the present invention have been described and illustrated with reference to various above-described embodiments, it will be understood that the above-described embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless specifically stated otherwise. Various types of general purpose or special purpose computing environments may be used in conjunction with, or may perform, the operations according to the teachings described herein. Elements of the above-described embodiments shown in software can be implemented in hardware and vice versa.

In view of the many possible embodiments to which the principles of the invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the appended claims and their equivalents.

A block diagram of a suitable computing environment in which the described embodiments can be implemented.
A block diagram of a video encoder in which the described embodiments can be implemented.
A block diagram of a video decoder in which the described embodiments can be implemented.
A flowchart showing a generalized technique for multi-resolution encoding of frames.
A flowchart showing a generalized technique for multi-resolution decoding of frames.
A flowchart showing a technique for multi-resolution encoding of intra frames and predicted frames.
A flowchart showing a technique for multi-resolution decoding of intra frames and predicted frames.
A flowchart showing a technique for sending signals when encoding a sequence of frames with multi-resolution encoding.
A flowchart showing a technique for receiving and interpreting signals when decoding a sequence of encoded frames with multi-resolution decoding.
A pseudo-code listing for a down-sampling filter in one implementation.
A pseudo-code listing for an up-sampling filter in one implementation.

Explanation of symbols

200 encoder
205 current frame
210 resolution converter
215 multi-resolution parameters
225 reference frame
230 resolution converter
240 other encoder modules
245 motion vectors, residuals, etc.
300 decoder
305 current frame
310 resolution converter
315 multi-resolution parameters, decoded frame
325 reference frame
330 resolution converter
340 other decoder modules
345 motion vectors, residuals, etc.

Claims

A method for decoding a bitstream of a sequence of video frames, comprising:
receiving first information about the sequence, the first information indicating whether second information is present in the bitstream;
determining, based on the first information, that the second information is present in the bitstream and, in response to that determination, receiving, for each of a plurality of frames of the sequence encoded based on a spatial resolution scaling factor, the second information at a frame level in the bitstream, the second information indicating one or more spatial resolution scaling factors for the frame; and
multi-spatial resolution decoding the plurality of frames using the scaling factors;
wherein the plurality of frames includes at least one I frame and at least one P frame, each of the at least one P frame has a reference I frame among the at least one I frame, and, for each of the at least one P frame, the one or more spatial resolution scaling factors for the P frame are constrained to be the same as the one or more spatial resolution scaling factors for the reference I frame of the P frame.

The method of claim 1, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a bit rate criterion.

The method of claim 1, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a high frequency content criterion.

The method of claim 1, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a quantization step size criterion.

The method of claim 1, wherein the second information is a fixed length code.

The method of claim 5, wherein the fixed length code is a 2-bit code that represents four possible states of the scaling factor of the one or more spatial resolutions.

The method of claim 1, wherein the second information is a variable length code.

The method of claim 1, wherein the first information is signaled in a sequence header.

The method of claim 8, wherein the first information is a 1-bit code in the sequence header.
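The two-level signaling recited above (a 1-bit code in the sequence header, followed by a 2-bit fixed length code per frame when that bit is set) might be read roughly as in the following sketch. The mapping from 2-bit codes to (horizontal, vertical) scaling factors, the toy bitstream layout containing only the signaling bits, and the names `BitReader`, `SCALING_BY_CODE` and `read_scaling_factors` are assumptions for illustration. The claims only require that the code distinguish four states.

```python
# Hypothetical mapping of the 2-bit frame-level code to
# (horizontal, vertical) scaling factors.
SCALING_BY_CODE = {
    0b00: (1.0, 1.0),  # full resolution in both directions
    0b01: (0.5, 1.0),  # half horizontal resolution
    0b10: (1.0, 0.5),  # half vertical resolution
    0b11: (0.5, 0.5),  # half resolution in both directions
}


class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, count):
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_scaling_factors(reader, frame_count):
    """Return one (horizontal, vertical) factor pair per frame."""
    # First information: 1-bit flag in the (toy) sequence header.
    second_info_present = reader.read(1) == 1
    factors = []
    for _ in range(frame_count):
        # Second information: 2-bit frame-level code, read only if signaled.
        code = reader.read(2) if second_info_present else 0b00
        factors.append(SCALING_BY_CODE[code])
    return factors
```

When the sequence-level bit is 0, no frame-level bits are consumed and every frame is treated as full resolution, so sequences that never change resolution pay only one bit of overhead.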

The method of claim 1, further comprising receiving third information in the bitstream, the third information indicating a selected resampling filter.

The method of claim 1, wherein the one or more spatial resolution scaling factors comprise a vertical spatial resolution scaling factor and a horizontal spatial resolution scaling factor.

The method of claim 11, wherein the scaling factor of the vertical spatial resolution is different from the scaling factor of the horizontal spatial resolution.

The method of claim 11, wherein the vertical spatial resolution scaling factor is selected from a set of vertical spatial resolutions including full resolution and half resolution.

The method of claim 11, wherein the horizontal spatial resolution scaling factor is selected from a set of horizontal spatial resolutions including full resolution and half resolution.

The method of claim 1, further comprising:
decoding the plurality of frames with multi-spatial resolution decoding according to the spatial resolution scaling factors indicated by the second information; and
displaying the plurality of frames.

The method of claim 15, wherein the decoding with the multi-spatial resolution decoding comprises:
decoding a current frame of the plurality of frames, the current frame having been encoded at a reduced spatial resolution; and
after decoding the current frame, up-sampling the current frame, wherein the up-sampling produces a full-resolution decoded frame.

The method of claim 16, wherein the up-sampling comprises applying a 10-tap filter to the decoded current frame.

The method of claim 16, wherein the displayed current frame with the reduced spatial resolution comprises reduced blocking artifacts.

A method of encoding a bitstream of a sequence of video frames, the bitstream having multiple levels, the method comprising:
outputting first information about the sequence, the first information indicating whether second information is present in the bitstream;
in response to determining that the second information is present in the bitstream as indicated by the first information, outputting, for each of a plurality of frames of the sequence encoded based on a spatial resolution scaling factor, the second information at a frame level in the bitstream, the second information indicating one or more spatial resolution scaling factors for the frame; and
outputting multi-spatial resolution encoded data for the plurality of frames using the scaling factors;
wherein the plurality of frames includes at least one I frame and at least one P frame, each of the at least one P frame has a reference I frame among the at least one I frame, and, for each of the at least one P frame, the one or more spatial resolution scaling factors for the P frame are constrained to be the same as the one or more spatial resolution scaling factors for the reference I frame of the P frame.
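The constraint that each P frame shares the scaling factors of its reference I frame can be illustrated with a short sketch. It assumes, purely for illustration, that the reference I frame of a P frame is the most recent preceding I frame. The function name `assign_scaling_factors` and the `choose_for_i_frame` callback are hypothetical.

```python
def assign_scaling_factors(frame_types, choose_for_i_frame):
    """Return one scaling-factor pair per frame.

    frame_types: sequence of "I" or "P".
    choose_for_i_frame: hypothetical callback that picks the
    (horizontal, vertical) factors for the I frame at a given index.
    """
    factors = []
    current = None
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I":
            # A new I frame is the only point where the resolution may change.
            current = choose_for_i_frame(index)
        elif current is None:
            raise ValueError("P frame without a preceding reference I frame")
        # P frames reuse the factors of their reference I frame.
        factors.append(current)
    return factors
```

Under this assumption the resolution can change only at I frames, so a P frame is always predicted from a reference at the same resolution.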

The method of claim 19, wherein the second information is a fixed length code.

The method of claim 20, wherein the fixed length code is a 2-bit code that represents four possible states of the scaling factor of the one or more spatial resolutions.

The method of claim 19, wherein the second information is a variable length code.

The method of claim 19, wherein the first information is signaled in a sequence header.

The method of claim 23, wherein the first information is a 1-bit code in the sequence header.

The method of claim 19, further comprising outputting third information in the bitstream that is indicative of a selected resampling filter.

The method of claim 19, wherein the one or more spatial resolution scaling factors comprise a vertical spatial resolution scaling factor and a horizontal spatial resolution scaling factor.

The method of claim 26, wherein the vertical spatial resolution scaling factor is different from the horizontal spatial resolution scaling factor.

The method of claim 26, wherein the vertical spatial resolution scaling factor is selected from a set of vertical spatial resolutions including a full resolution and a half resolution.

The method of claim 26, wherein the horizontal spatial resolution scaling factor is selected from a set of horizontal spatial resolutions including a full resolution and a half resolution.

The method of claim 19, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a bit rate criterion.

The method of claim 19, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a high frequency content criterion.

The method of claim 19, wherein the scaling factor of the one or more spatial resolutions is adaptively determined based at least in part on a quantization step size criterion.

The method of claim 19, further comprising encoding the plurality of frames with multi-spatial resolution encoding according to a scaling factor of the spatial resolution indicated by the second information.

The method of claim 33, further comprising down-sampling a current frame of the plurality of frames, wherein the down-sampling produces a reduced-resolution frame.

The method of claim 34, wherein the down-sampling comprises applying a 6-tap filter to the current frame.

The method of claim 34, wherein the down-sampling comprises down-sampling in a horizontal direction prior to down-sampling in a vertical direction.
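The ordering recited above, horizontal down-sampling followed by vertical down-sampling with a 6-tap filter, can be sketched as a separable operation. The tap values in `PLACEHOLDER_TAPS`, the edge clamping, and the rounding are placeholders chosen for this example only. They are not the filter coefficients of the described implementation, which are not given in this section.

```python
# Placeholder 6-tap low-pass kernel (assumption, not the patent's coefficients).
PLACEHOLDER_TAPS = (1, 2, 5, 5, 2, 1)


def downsample_line(line, taps=PLACEHOLDER_TAPS):
    """Filter a line and keep every second sample, clamping at the edges."""
    half = len(taps) // 2
    norm = sum(taps)
    out = []
    for center in range(0, len(line), 2):
        acc = 0
        for k, tap in enumerate(taps):
            # Taps span center-2 .. center+3 for a 6-tap kernel.
            idx = min(max(center + k - half + 1, 0), len(line) - 1)
            acc += tap * line[idx]
        out.append(round(acc / norm))
    return out


def downsample_frame(frame, taps=PLACEHOLDER_TAPS):
    """Halve both dimensions: horizontal pass first, then vertical pass."""
    horizontal = [downsample_line(row, taps) for row in frame]
    columns = [downsample_line(list(col), taps) for col in zip(*horizontal)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*columns)]
```

Running the horizontal pass first means the vertical pass operates on rows that are already half as wide, which roughly halves the work of the second pass.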

The method of claim 19, wherein a current frame of the plurality of frames includes a plurality of lines, and the multi-spatial resolution encoding for the current frame comprises adjusting the number of samples in each of the plurality of lines to be a multiple of a macroblock.
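The adjustment of a line length to a macroblock multiple can be illustrated as below. The 16-sample macroblock size and the choice to round up rather than down are assumptions for this sketch, and `reduced_dimension` is a hypothetical helper name.

```python
MACROBLOCK_SIZE = 16  # assumed macroblock width/height in samples


def reduced_dimension(full_dimension, scaling_factor,
                      macroblock_size=MACROBLOCK_SIZE):
    """Scale a frame dimension, then round up to a macroblock multiple."""
    scaled = int(full_dimension * scaling_factor)
    # Ceiling division, then multiply back to get the padded size.
    return -(-scaled // macroblock_size) * macroblock_size
```

For example, a 360-sample dimension at half resolution gives 180 samples, which this sketch pads to 192 so that the reduced frame divides evenly into macroblocks.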

A system comprising:
means for receiving first information in a bitstream of a sequence of video frames, the first information indicating whether second information is present in the bitstream;
means for receiving, if the first information indicates that the second information is present in the bitstream, for each of a plurality of frames of the sequence, second information at a frame level in the bitstream, the second information indicating one or more spatial resolution scaling factors for the frame; and
means for multi-spatial resolution decoding the plurality of frames using the scaling factors;
wherein the plurality of frames includes at least one I frame and at least one P frame, each of the at least one P frame has a reference I frame among the at least one I frame, and, for each of the at least one P frame, the one or more spatial resolution scaling factors for the P frame are constrained to be the same as the one or more spatial resolution scaling factors for the reference I frame of the P frame.

A system comprising:
means for outputting first information in a bitstream of a sequence of video frames, the first information indicating whether second information is present in the bitstream;
means for outputting, if the first information indicates that the second information is present in the bitstream, for each of a plurality of frames of the sequence, second information at a frame level in the bitstream, the second information indicating one or more spatial resolution scaling factors for the frame; and
means for outputting multi-spatial resolution encoded data for the plurality of frames using the scaling factors;
wherein the plurality of frames includes at least one I frame and at least one P frame, each of the at least one P frame has a reference I frame among the at least one I frame, and, for each of the at least one P frame, the one or more spatial resolution scaling factors for the P frame are constrained to be the same as the one or more spatial resolution scaling factors for the reference I frame of the P frame.