HK1103501B - Multi-layer video coding and decoding methods and multi-layer video encoder and decoder - Google Patents
Description
Technical Field
Apparatuses and methods consistent with the present invention relate to a multi-layer video coding algorithm, and more particularly, to a multi-layer video coding algorithm designed to code a predetermined resolution layer using a plurality of coding algorithms.
Background
With the development of information and communication technologies, including the Internet, video communication has increased along with text and voice communication. Conventional text-based communication cannot satisfy the various demands of users, so multimedia services capable of providing various types of information such as text, pictures, and music have increased. Since the amount of multimedia data is generally large, multimedia data requires a large-capacity storage medium and a wide transmission bandwidth. For example, a 24-bit true-color image with a 640 × 480 resolution requires 640 × 480 × 24 bits per frame, i.e., approximately 7.37 megabits (Mbits) of data per frame. When such video is transmitted at a rate of 30 frames per second, a bandwidth of 221 megabits per second (Mbits/sec) is required, and storing a 90-minute movie at this rate requires about 1,200 gigabits (Gbits) of space. Therefore, a compression encoding method is necessary for transmitting multimedia data including text, video, and audio.
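These figures can be checked directly; the sketch below (plain Python, values taken from the text) reproduces the per-frame size, the 30 fps bandwidth, and the 90-minute storage requirement.

```python
# Back-of-the-envelope figures for raw 24-bit 640x480 video.
bits_per_frame = 640 * 480 * 24
mbits_per_frame = bits_per_frame / 1e6          # ~7.37 Mbits per frame
bandwidth_mbps = mbits_per_frame * 30           # ~221 Mbits/sec at 30 fps
storage_gbits = bandwidth_mbps * 90 * 60 / 1e3  # ~1,194 Gbits for 90 minutes

print(round(mbits_per_frame, 2), round(bandwidth_mbps), round(storage_gbits))
```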
The basic principle of multimedia data compression is to remove data redundancy. Video data may be compressed by removing three kinds of redundancy: spatial redundancy, in which the same color or object is repeated within an image; temporal redundancy, in which there is little variation between adjacent frames in a moving image, or the same sound is repeated in audio; and psychovisual redundancy, which takes into account the limits of human vision, such as its reduced sensitivity to high frequencies.
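As a minimal illustration of removing temporal redundancy, the hypothetical sketch below (sample values invented for the example) stores one frame plus its difference from a neighboring frame; because adjacent frames are nearly identical, the difference is mostly zeros and compresses cheaply.

```python
# Two nearly identical "frames" (rows of pixel values, invented for illustration).
frame1 = [10, 10, 12, 200, 200, 12]
frame2 = [10, 11, 12, 201, 200, 12]

# Temporal redundancy removal: store frame1 plus the small residual.
residual = [b - a for a, b in zip(frame1, frame2)]
print(residual)  # mostly zeros -> cheap to encode

# Lossless reconstruction of frame2 from the reference plus residual.
reconstructed = [a + r for a, r in zip(frame1, residual)]
```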
Fig. 1 illustrates an environment in which video compression is applied.
The video data is compressed by a video encoder 110. Currently known video compression algorithms based on the Discrete Cosine Transform (DCT) include MPEG-2, MPEG-4, H.263, and H.264. In recent years, active research has also been conducted on wavelet-based scalable video coding. The compressed video data is transmitted to the video decoder 130 via the network 120. The video decoder 130 decodes the compressed video data to reconstruct the original video data.
The video encoder 110 compresses the original video data so that it does not exceed the available bandwidth of the network 120 over which it is sent to the video decoder 130. However, the available bandwidth may vary depending on the type of network 120. For example, the available communication bandwidth of Ethernet differs from that of a Wireless Local Area Network (WLAN), and cellular communication networks may have very narrow bandwidths. Accordingly, active research has been conducted on methods for generating video streams at various bit rates from the same compressed video data, in particular on scalable video coding methods.
Scalable video coding is a video compression technique that allows video data to provide scalability. Scalability is the ability to generate video sequences at different resolutions, frame rates, and qualities from the same compressed bitstream. Temporal scalability may be provided using Motion-Compensated Temporal Filtering (MCTF), Unconstrained MCTF (UMCTF), or Successive Temporal Approximation and Referencing (STAR) algorithms. Spatial scalability is achieved by wavelet transform algorithms or multi-layer coding, both of which have been actively studied in recent years. Signal-to-Noise Ratio (SNR) scalability can be obtained using Embedded Zerotree Wavelet (EZW) coding, Set Partitioning in Hierarchical Trees (SPIHT), Embedded ZeroBlock Coding (EZBC), or Embedded Block Coding with Optimized Truncation (EBCOT).
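A rough sketch of the idea shared by the embedded coders listed above (EZW, SPIHT, EZBC, EBCOT): coefficients are transmitted most-significant bit-plane first, so truncating the stream at any point still yields a valid, coarser reconstruction. The `truncate_bitplanes` helper below is illustrative only, not any of these actual algorithms.

```python
def truncate_bitplanes(coeffs, planes_kept, total_planes=8):
    """Keep only the most significant bit-planes of each coefficient."""
    step = 1 << (total_planes - planes_kept)  # discard the low bit-planes
    return [(c // step) * step for c in coeffs]

coeffs = [200, 57, 13, 3]
print(truncate_bitplanes(coeffs, 4))  # coarse: keeps 4 of 8 bit-planes
print(truncate_bitplanes(coeffs, 8))  # full stream: identical to the input
```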
Figs. 2 and 3 illustrate examples of multi-layered bitstream structures.
Referring to fig. 2, the multi-layer video encoder encodes each layer using MPEG-4 Advanced Video Coding (AVC), which provides the highest coding efficiency currently available. The MPEG-4 AVC algorithm removes temporal redundancy between frames, transforms the resulting frames using the DCT, and quantizes the transform coefficients.
Referring to fig. 2, each layer differs from the others in at least one of resolution, frame rate, and bit rate. In the AVC scheme, a base layer frame having the lowest resolution, the lowest frame rate, and the lowest bit rate is encoded first, and then an enhancement layer is encoded using the encoded base layer frame. The AVC-based multi-layer video coding scheme encodes each layer using AVC-based techniques, providing high coding efficiency. In particular, the intra prediction and deblocking techniques used in the AVC algorithm effectively remove most of the artifacts caused by block-based coding. Also, each layer is optimized for rate-distortion. However, the generated bitstream does not have flexible scalability. That is, it is difficult to provide Fine Granularity Scalability (FGS) and combined scalability using a bitstream generated by multi-layer AVC video coding, because the different types of scalability are interdependent. When video data is encoded into multiple layers, the multi-layer encoding scheme shown in fig. 2 performs AVC encoding on all layers.
Referring to fig. 3, after a base layer having the lowest resolution, lowest frame rate, and lowest bit rate is encoded using AVC, a layer having the highest resolution, highest frame rate, and highest quality is encoded using the encoded base layer through wavelet encoding.
The coding scheme shown in fig. 3 can provide a bitstream with full scalability, since the layer with the highest resolution, the highest frame rate, and the highest quality is coded using wavelet coding. Also, since the lowest resolution layer is encoded using AVC, the video decoder can reconstruct a video frame having satisfactory quality at the lowest resolution.
Disclosure of Invention
Technical problem
Although the bitstream shown in fig. 2 is rate-distortion optimized for each layer, it has poor scalability. The bitstream shown in fig. 3 has good scalability but low video quality, because all layers except the lowest resolution AVC-coded layer are reconstructed from a single wavelet-coded layer.
Technical scheme
The present invention provides multi-layered video encoding and decoding methods, and multi-layered video encoders and decoders, capable of providing higher encoding efficiency and scalability.
According to an aspect of the present invention, there is provided a multi-layered video encoding method including: encoding a video frame having a predetermined resolution using a first video encoding scheme; encoding the video frame having the same resolution as the predetermined resolution using a second video encoding scheme with reference to a frame encoded by a first video encoding scheme; and generating a bitstream containing the frames encoded by the first and second video coding schemes.
According to another aspect of the present invention, there is provided a multi-layered video encoding method including: generating a lower resolution video frame by downsampling the video frame; encoding the lower resolution video frame; encoding the video frame using the encoded lower resolution video frame as a reference; and generating a bitstream containing the encoded lower resolution video frames and the video frames, wherein encoding the lower resolution video frames comprises: encoding the lower resolution video frame using a first video encoding scheme; and encoding the lower resolution video frame using the second video encoding scheme with reference to the lower resolution frame encoded by the first video encoding scheme.
According to still another aspect of the present invention, there is provided a multi-layered video encoding method including: (a) encoding a video frame having a predetermined resolution using a first video encoding scheme; (b) encoding the video frame having the same resolution as the predetermined resolution using a second video encoding scheme with reference to a frame encoded by a first video encoding scheme; and (c) generating a bitstream containing encoded frames for all resolution layers, wherein steps (a) and (b) are performed recursively for all resolution layers in order from a lower resolution layer to a higher resolution layer.
According to still another aspect of the present invention, there is provided a multi-layer video encoder including: a down-sampler to down-sample the higher resolution video frame to generate a lower resolution video frame; a lower resolution video encoding unit encoding a lower resolution video frame; a higher resolution video encoding unit that encodes a higher resolution video frame using the encoded lower resolution video frame as a reference; and a bitstream generator that generates a bitstream containing the encoded lower resolution frames and the encoded higher resolution video frames, wherein the lower resolution video encoding unit encodes the lower resolution video frames using the first video encoding scheme, and encodes the lower resolution video frames using the second video encoding scheme using the lower resolution frames encoded using the first video encoding scheme, thereby generating the encoded lower resolution frames.
According to still another aspect of the present invention, there is provided a multi-layer decoding method including: extracting frames encoded by a first video encoding scheme and frames encoded by a second video encoding scheme from a bitstream; decoding the frame encoded by the first video encoding scheme using a first video decoding scheme to reconstruct a first frame; and decoding the frame encoded by the second video coding scheme with reference to the reconstructed first frame using the second video decoding scheme at the same resolution as the reconstructed first frame to reconstruct the second frame.
According to still another aspect of the present invention, there is provided a multi-layer decoding method including: extracting frames encoded by a first video encoding scheme and frames encoded by a second video encoding scheme from a bitstream; decoding the frames encoded by the first video encoding scheme using a first video decoding scheme to reconstruct the first frames; decoding the frames encoded by the second video coding scheme using a second video decoding scheme at the same resolution as the reconstructed first frames to reconstruct second frames; and adding the reconstructed second frame to the reconstructed first frame to reconstruct the video frame.
According to another aspect of the present invention, there is provided a multi-layer video decoding method including: extracting an encoded lower resolution layer frame and an encoded higher resolution layer frame from a bitstream; decoding the encoded lower resolution layer frame to reconstruct the lower resolution layer frame; and decoding the encoded higher resolution layer frame with reference to the reconstructed lower resolution layer frame to reconstruct the higher resolution layer frame, wherein the encoded lower resolution layer frame includes a frame encoded by the first video coding scheme and a frame encoded by the second video coding scheme, and wherein decoding the lower resolution layer frame includes: decoding the frame encoded by the first video encoding scheme using a first video decoding scheme to reconstruct a first frame; the frame encoded by the second video encoding scheme is decoded using a second video decoding scheme with reference to the reconstructed first frame to reconstruct a second frame.
According to another aspect of the present invention, there is provided a multi-layer video decoding method including: extracting an encoded lower resolution layer frame and an encoded higher resolution layer frame from a bitstream; decoding the encoded lower resolution layer frame to reconstruct the lower resolution layer frame; and decoding the encoded higher resolution layer frame with reference to the reconstructed lower resolution layer frame to reconstruct the higher resolution layer frame, wherein the encoded lower resolution layer frame comprises a frame encoded by the first video coding scheme and a frame encoded by the second video coding scheme, and wherein decoding the lower resolution layer frame comprises: decoding the frame encoded by the first video encoding scheme using a first video decoding scheme to reconstruct a first frame; decoding the frames encoded by the second video encoding scheme using a second video decoding scheme to reconstruct second frames; and adding the reconstructed second frame to the reconstructed first frame to reconstruct the lower resolution layer video frame.
According to another aspect of the present invention, there is provided a multi-layer video decoding method including: extracting an encoded lower resolution layer frame and an encoded higher resolution layer frame from a bitstream; and decoding the encoded lower resolution layer frames and the encoded higher resolution layer frames to reconstruct video frames, wherein the encoded frames of each resolution layer include frames encoded by the first video coding scheme and frames encoded by the second video coding scheme, the method comprising: decoding the frames encoded by the first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct first frames; and decoding the frames encoded by the second video coding scheme for the resolution layers using the second video decoding scheme with reference to the reconstructed first frames to reconstruct second frames, and wherein the decoding of the frames encoded by the first video coding scheme and the decoding of the frames encoded by the second video coding scheme are recursively performed for all resolution layers in order from a lower resolution layer to a higher resolution layer.
According to another aspect of the present invention, there is provided a multi-layer video decoding method including: extracting an encoded lower resolution layer frame and an encoded higher resolution layer frame from a bitstream; and decoding the encoded lower resolution layer frames and the encoded higher resolution layer frames to reconstruct video frames, wherein the encoded video frames of each resolution layer include frames encoded by the first video coding scheme and frames encoded by the second video coding scheme, the method comprising: decoding the frames encoded by the first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct first frames; decoding the frames encoded by the second video coding scheme for the resolution layer using a second video decoding scheme to reconstruct second frames; and adding the reconstructed second frame to the reconstructed first frame to thereby reconstruct the video frame in the resolution layer, wherein decoding the frame encoded by the first video coding scheme, decoding the frame encoded by the second video coding scheme, and adding to reconstruct the video frame are recursively performed for all resolution layers in order from the lower resolution layer to the higher resolution layer.
According to another aspect of the present invention, there is provided a multi-layer video decoder including: a bitstream interpreter that interprets a bitstream to extract encoded lower resolution layer frames and encoded higher resolution layer frames; a lower resolution video decoding unit that decodes the encoded lower resolution layer frame; and a higher resolution video decoding unit that decodes the encoded higher resolution layer frame using the reconstructed lower resolution layer frame as a reference, wherein the lower resolution video decoding unit decodes the frame encoded by the first video encoding scheme using the first video decoding scheme to reconstruct a first frame, and decodes the frame encoded by the second video encoding scheme using the second video decoding scheme with reference to the first frame, thereby reconstructing the lower resolution layer frame.
Drawings
The above and other aspects of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 illustrates an environment in which video compression is applied;
fig. 2 and 3 show examples of multi-layer video bitstream structures;
fig. 4 illustrates a structure of a multi-layered video bitstream according to an exemplary embodiment of the present invention;
fig. 5 is a block diagram of a multi-layer video encoder according to an exemplary embodiment of the present invention;
fig. 6 is a flowchart illustrating a multi-layered video encoding process according to an exemplary embodiment of the present invention;
fig. 7 and 8 illustrate a detailed multi-layer video encoding process according to an exemplary embodiment of the present invention;
fig. 9 illustrates a process of allocating a bit rate to each layer in a multi-layer video encoding process according to an exemplary embodiment of the present invention;
fig. 10 and 11 illustrate a structure of a multi-layered video bitstream according to an exemplary embodiment of the present invention;
fig. 12 is a block diagram of a multi-layer video decoder according to an exemplary embodiment of the present invention; and
fig. 13 is a flowchart illustrating a multi-layer video decoding process according to an exemplary embodiment of the present invention.
Detailed Description
Aspects of the present invention and methods of accomplishing the same may be understood more readily by reference to the following description of exemplary embodiments and the accompanying drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art, and the present invention will only be defined by the appended claims. Like reference numerals correspond to like elements throughout the specification.
Exemplary embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Fig. 4 illustrates a structure of a multi-layered video bitstream according to an exemplary embodiment of the present invention.
Referring to fig. 4, a bitstream generated by multi-layer video encoding has two layers for each resolution. One layer is coded using Advanced Video Coding (AVC) and the other is coded using wavelet coding. In this specification, AVC coding or an AVC layer always refers to coding or a layer that employs the Discrete Cosine Transform (DCT) and quantization of the AVC algorithm, while wavelet coding or a wavelet layer refers to coding or a layer that employs the wavelet transform and embedded quantization. In order to generate a bitstream having temporal scalability, the AVC coding scheme and the wavelet coding scheme each employ the MCTF, UMCTF, or STAR algorithm, which provide temporal scalability.
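As a sketch of what temporal filtering contributes, the fragment below applies a Haar-style MCTF step without motion compensation (a simplification of the actual MCTF/UMCTF/STAR algorithms): each frame pair becomes a low-pass average frame and a high-pass difference frame, and decoding only the low-pass frames yields the half-frame-rate sequence.

```python
def mctf_pair(f0, f1):
    """Haar-style temporal filtering of a frame pair (no motion compensation)."""
    low = [(a + b) / 2 for a, b in zip(f0, f1)]    # temporal low-pass frame
    high = [(b - a) / 2 for a, b in zip(f0, f1)]   # temporal high-pass frame
    return low, high

def inverse_mctf_pair(low, high):
    """Perfectly invert the filtering to recover both original frames."""
    f0 = [l - h for l, h in zip(low, high)]
    f1 = [l + h for l, h in zip(low, high)]
    return f0, f1

low, high = mctf_pair([4, 8], [6, 8])
print(low, high)  # low-pass frame alone gives the half frame rate version
```

Keeping only `low` provides temporal scalability; keeping both frames allows exact reconstruction.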
The AVC layer of each resolution guarantees efficiency at the spatio-temporal-quality level, while the wavelet layer guarantees Fine Granularity Scalability (FGS). The pre-decoder simply truncates a portion of the wavelet layer bitstream to produce a bitstream having a quality between the AVC layer quality and the wavelet layer quality. The same truncation scheme is applied to the other layers.
For example, the pre-decoder may generate a bitstream having QCIF resolution and 32 to 64 kilobits per second (kbps) quality from the bitstream shown in fig. 4. To achieve this, the pre-decoder truncates all CIF and SD resolution layers and all or part of each QCIF resolution wavelet layer.
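The pre-decoding step described above can be sketched as follows; the layer sizes are hypothetical and the dictionary layout is only a stand-in for the actual bitstream syntax.

```python
# Hypothetical per-layer sizes in kbits/sec; not the real bitstream format.
bitstream = {
    ("QCIF", "avc"): 32, ("QCIF", "wavelet"): 32,
    ("CIF", "avc"): 256, ("CIF", "wavelet"): 494,
    ("SD", "avc"): 1000, ("SD", "wavelet"): 2000,
}

def predecode(stream, resolution, target_kbps):
    """Drop the other resolutions entirely; truncate wavelet data to budget."""
    kept, budget = {}, target_kbps
    for (res, layer), size in stream.items():
        if res != resolution or budget <= 0:
            continue                            # truncated away entirely
        kept[(res, layer)] = min(size, budget)  # partial wavelet truncation
        budget -= size
    return kept

# Serving QCIF at 48 kbps: keep the full AVC layer, half the wavelet layer.
print(predecode(bitstream, "QCIF", 48))
```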
An example of a video encoder generating a multi-layer bitstream according to an exemplary embodiment of the present invention is shown in fig. 5. For convenience of explanation, it is assumed that the video encoder has encoding units for two resolution layers.
Fig. 5 is a block diagram of a multi-layered video encoder according to an exemplary embodiment of the present invention.
Referring to fig. 5, the multi-layer video encoder includes a downsampler 550, an AVC encoding unit 510 and a wavelet encoding unit 520 that encode low resolution layer video frames, an AVC encoding unit 530 and a wavelet encoding unit 540 that encode high resolution layer video frames, and a bitstream generator 560 that generates a bitstream.
More specifically, the downsampler 550 downsamples the video frame to generate a low resolution video frame.
The multi-layer video encoder has two coding units, i.e., an AVC coding unit and a wavelet coding unit, for each resolution layer. That is, the multi-layer video encoder includes an AVC encoding unit 510 and a wavelet encoding unit 520 for encoding low resolution layer video frames, and an AVC encoding unit 530 and a wavelet encoding unit 540 for encoding high resolution layer video frames.
The bitstream generator 560 generates a bitstream containing the encoded low and high resolution layer frames.
A process for generating a bitstream will now be described.
First, the downsampler 550 downsamples the video frame 500 to generate a low resolution video frame having half the resolution of the video frame. The low-resolution video frame is transmitted to the AVC coding unit 510 and the wavelet coding unit 520 for the low-resolution layer, while the video frame 500 is transmitted to the AVC coding unit 530 and the wavelet coding unit 540 for the high-resolution layer.
The AVC encoding unit 510 for a low resolution layer includes a temporal filter 511 that removes temporal redundancy existing in a low resolution frame, a DCT transformer 512 that performs DCT on the low resolution frame in which the temporal redundancy has been removed, and a quantizer 513 that quantizes the DCT-transformed low resolution frame. AVC encoded low resolution layer frames are provided for performing wavelet encoding of a low resolution layer.
The wavelet encoding unit 520 for the low resolution layer includes a temporal filter 521 for removing temporal redundancy in the low resolution frames using AVC-encoded low resolution layer frames, a wavelet transformer 522 for performing wavelet transform on the low resolution frames, and a quantizer 523 for quantizing the wavelet-transformed low resolution frames. Wavelet-coded low-resolution layer frames are provided for performing AVC coding of a high-resolution layer.
The AVC encoding unit 530 for a high resolution layer includes a temporal filter 531 for removing temporal redundancy existing in the high resolution frames 500 using wavelet-encoded low resolution layer frames, a DCT transformer 532 for performing DCT on the high resolution frames in which the temporal redundancy has been removed, and a quantizer 533 for quantizing the DCT-transformed high resolution frames. AVC-coded high-resolution layer frames are provided for performing wavelet coding of the high-resolution layer.
The wavelet encoding unit 540 for a high resolution layer includes a temporal filter 541 that removes temporal redundancy existing in the high resolution frames 500 using AVC-encoded high resolution layer frames, a wavelet transformer 542 that performs wavelet transform on the high resolution frames, and a quantizer 543 that quantizes the wavelet-transformed high resolution frames. The wavelet-encoded high resolution layer frames are then passed to the bitstream generator 560.
The bitstream generator 560 generates a bitstream containing AVC-coded and wavelet-coded low resolution layer frames and AVC-coded and wavelet-coded high resolution layer frames. The bitstream contains information about the encoded frames, including sequence headers (headers), group-of-pictures (GOP) headers, header information for frame headers, and other information such as motion vectors obtained during temporal filtering.
The bitstream is pre-decoded by a pre-decoder (not shown) and transmitted to a multi-layer video decoder. For example, a pre-decoder may truncate a high resolution layer of a bitstream to produce a bitstream containing only encoded low resolution layer frames for use in devices with small display screens, such as cellular phones or Personal Digital Assistants (PDAs). The pre-decoder may also truncate a portion of the bit stream to produce a bit stream having a low bit rate when network conditions are poor. Meanwhile, when the required frame rate is low, the pre-decoder truncates some frames of the bitstream to generate a bitstream having a low frame rate.
Fig. 6 is a flowchart illustrating a multi-layer video encoding process.
Referring to fig. 6, a video frame is input to a multi-layered video encoder in operation S610, and the multi-layer video encoder downsamples the input video frame to a lower resolution in operation S620. The multi-layer video encoder uses an MPEG downsampler to downsample the input video frame, because the MPEG downsampler produces a smoother downsampled version of the low resolution image than the wavelet downsamplers currently available. However, any other filter capable of obtaining a downsampled version of the image may be used to perform the downsampling. To obtain a bitstream having three resolution layers, the multi-layer video encoder downsamples the input video frame by factors of 2 and 4 to generate half and quarter resolution frames. To obtain a bitstream with four resolution layers, the multi-layer video encoder downsamples the input video frame by factors of 2, 4, and 8 to generate half, quarter, and one-eighth resolution frames.
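A minimal stand-in for this downsampling step: 2 × 2 average pooling halves each dimension, and applying it twice gives the quarter-resolution frame. The actual encoder uses an MPEG downsampling filter; plain averaging is used here only to show how the factor-2 and factor-4 frames are produced.

```python
def downsample_2x(frame):
    """Halve each dimension of a 2D frame by 2x2 average pooling."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

frame = [[10, 20, 30, 40],
         [10, 20, 30, 40],
         [50, 60, 70, 80],
         [50, 60, 70, 80]]
half = downsample_2x(frame)      # factor-2 (half resolution) frame
quarter = downsample_2x(half)    # factor-4 (quarter resolution) frame
print(half, quarter)
```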
In operation S630, the multi-layer video encoder performs AVC encoding on the low resolution video frame. In operation S640, the encoder performs wavelet coding on the low resolution video frame using the AVC-coded low resolution video frame. For example, after performing AVC encoding to produce AVC-encoded video frames having a QCIF resolution, a 15 hertz (Hz) frame rate, and a bit rate of 32 kilobits per second (kbps), the encoder performs wavelet encoding to generate wavelet-encoded frames having the same resolution and frame rate as the AVC-encoded video frames and a bit rate of 64 kbps, using the AVC-encoded frames as a reference.
After encoding the low resolution frames, the multi-layer video encoder encodes the high resolution video frames using the encoded low resolution frames.
More specifically, in operation S650, the encoder performs AVC encoding on the high resolution video frame. In operation S660, the encoder performs wavelet coding on the high-resolution video frame using the AVC-coded high-resolution video frame. For example, after performing AVC encoding to produce AVC-encoded video frames having a CIF resolution, a 30 Hz frame rate, and a 256 kbps bit rate, the encoder performs wavelet encoding to generate wavelet-encoded frames having a CIF resolution, a 30 Hz frame rate, and a 750 kbps bit rate, using the AVC-encoded and wavelet-encoded QCIF resolution video frames and the AVC-encoded CIF frames as references. Once video encoding is performed on all resolution layers, the multi-layer video encoder generates a bitstream using the encoded video frames in operation S670.
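The order of operations S630 through S670 can be sketched structurally as below; `encode_avc` and `encode_wavelet` are placeholder stubs (not a real codec API) that only record which layer references which.

```python
def encode_avc(frame, reference=None):
    """Stub for AVC encoding; records the frame and its reference."""
    return ("avc", frame, reference)

def encode_wavelet(frame, reference):
    """Stub for wavelet encoding using the AVC result as reference."""
    return ("wavelet", frame, reference)

def encode_multilayer(layers):
    """Encode layers in order from lower to higher resolution (fig. 6)."""
    bitstream, reference = [], None
    for frame in layers:                 # e.g. ["qcif_frame", "cif_frame"]
        avc = encode_avc(frame, reference)
        wav = encode_wavelet(frame, avc)  # wavelet references the AVC layer
        bitstream += [avc, wav]
        reference = wav                   # next resolution references this layer
    return bitstream

stream = encode_multilayer(["qcif_frame", "cif_frame"])
print([tag for tag, _, _ in stream])
```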
Fig. 7 and 8 illustrate an example of a specific multi-layer video encoding process according to an exemplary embodiment of the present invention. Although fig. 7 and 8 show that video encoding is performed on two resolution layers, video encoding may be performed on three or more resolution layers in the same manner.
An exemplary embodiment of the present invention shown in fig. 7 is described first.
The multi-layer video encoder downsamples the video frame 700 to generate the low-resolution video frame 710, and then performs AVC encoding on the low-resolution video frame 710 to generate an AVC-encoded low-resolution layer frame to be included in the bitstream.
The multi-layer video encoder then decodes the AVC-encoded low resolution layer frames to obtain decoded frames 720, and compares the decoded frames 720 with the low resolution video frames 710 to obtain low resolution residual frames (residual frames) 730.
The encoder performs wavelet encoding on the low resolution residual frames 730 to generate wavelet encoded low resolution layer frames, then decodes the wavelet encoded low resolution layer frames to obtain decoded frames 740, and then the decoded frames 740 are added to the decoded frames 720 to obtain decoded low resolution layer video frames 750.
The encoder upsamples the decoded low resolution layer video frame 750 to a higher resolution and compares the upsampled version of the frame 760 to the video frame 700 to obtain a high resolution layer frame 770. AVC encoding is performed on the high resolution layer frame 770 to generate an AVC-encoded high resolution layer frame to be included in the bitstream. The AVC-encoded high resolution layer frame is decoded to obtain a decoded frame 780, and the decoded frame 780 is compared to the high resolution layer frame 770, thereby obtaining a high resolution residual frame 790.
Wavelet coding is then performed on the high resolution residual frames 790 to obtain wavelet coded high resolution layer frames to be included in the bitstream.
The multi-layer video encoder finally generates a bitstream containing AVC-coded and wavelet-coded low-resolution layer frames and AVC-coded and wavelet-coded high-resolution layer frames.
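The residual chain of fig. 7 can be sketched numerically as below, with coarse quantization standing in for AVC coding and finer quantization for wavelet coding (both gross simplifications of the actual transforms): each stage encodes the residual left by the previous stage, so adding the decoded stages back together approaches the original frame.

```python
def lossy(frame, step):
    """Quantize/dequantize round trip: a toy stand-in for a lossy codec."""
    return [round(v / step) * step for v in frame]

original = [107, 53, 201, 9]                 # invented sample values

avc_decoded = lossy(original, 16)            # decoded base layer (frames 720)
residual = [o - d for o, d in zip(original, avc_decoded)]   # frames 730
wavelet_decoded = lossy(residual, 4)         # decoded residual (frames 740)
reconstruction = [a + w                      # frames 750: base + refinement
                  for a, w in zip(avc_decoded, wavelet_decoded)]

error_base = sum(abs(o - d) for o, d in zip(original, avc_decoded))
error_both = sum(abs(o - r) for o, r in zip(original, reconstruction))
print(error_base, error_both)                # the second layer refines the first
```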
Next, referring to fig. 8, the multi-layer video encoder down-samples the high resolution video frame to generate a low resolution video frame and performs AVC encoding on the low resolution video frame to generate an AVC-encoded low resolution layer video frame, and then performs wavelet encoding on the low resolution video frame using the AVC-encoded low resolution layer video frame.
More specifically, the (N-1)th and (N+1)th low resolution video frames 811 and 813 are used to encode the Nth low resolution video frame 812. While the original low resolution video frames 811 and 813 are used as references in open-loop video encoding, the frames reconstructed by decoding the AVC-encoded low resolution video frames are used as references in closed-loop video encoding.
After completing AVC encoding of the low resolution layer, the multi-layer video encoder performs wavelet encoding on the low resolution layer. The encoder may encode the Nth low resolution video frame 822 using the (N-1)th and (N+1)th low resolution video frames 821 and 823, or using frames reconstructed by decoding the AVC-encoded frames.
After completing video encoding of the low resolution layer, the encoder performs video encoding on the high resolution layer.
AVC encoding of the Nth high resolution layer video frame 842 may be performed using the (N-1)th and (N+1)th high resolution layer video frames 841 and 843, or using a frame reconstructed by decoding the Nth low resolution video frame 822. Before it can serve as a reference, the reconstructed frame is upsampled to generate a video frame 832.
The encoder then performs wavelet coding on the Nth high resolution layer video frame 852 using the (N-1)th and (N+1)th high resolution layer video frames 851 and 853, or using frames reconstructed by decoding the Nth high resolution layer video frame 842.
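The open-loop/closed-loop distinction above can be illustrated with a minimal sketch. The `toy_codec` quantizer and the bidirectional-average predictor below are illustrative assumptions, not AVC itself; the point is only that a closed-loop encoder computes its residual against the references the decoder will actually reconstruct, while an open-loop encoder uses the original neighbouring frames.

```python
# Sketch of open-loop vs. closed-loop reference selection when predicting
# frame N from its (N-1)th and (N+1)th neighbours. Frames are flat lists
# of pixel values; the codec and predictor are toy stand-ins.

def toy_codec(frame, step=8):
    """Stand-in for AVC encode + decode: quantize, then reconstruct."""
    return [round(x / step) * step for x in frame]

def predict(prev, nxt):
    # Bidirectional prediction: average of the two reference frames.
    return [(a + b) / 2 for a, b in zip(prev, nxt)]

def residual(frame, prev, nxt, closed_loop):
    if closed_loop:
        # Use the references as the decoder will see them.
        prev, nxt = toy_codec(prev), toy_codec(nxt)
    pred = predict(prev, nxt)
    return [x - p for x, p in zip(frame, pred)]
```

The closed-loop residual differs from the open-loop one exactly by the reconstruction error of the references, which is why closed-loop coding avoids drift between encoder and decoder.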
The multi-layer video encoding process shown in fig. 7 performs inter-layer referencing after temporal filtering, while the video encoding process shown in fig. 8 performs inter-layer referencing during temporal filtering. The encoding process shown in fig. 7 can provide better coding efficiency than the process shown in fig. 8 when the video contains a large amount of motion, because the spatial relationship between frames is then closer than the temporal relationship between them. Conversely, when the video contains little motion, the latter can exhibit higher coding efficiency than the former, because the temporal relationship between frames is then closer than the spatial relationship between them.
A process of allocating a bit rate to each layer will now be described.
Fig. 9 illustrates a process of allocating a bit rate to each layer in a multi-layer video encoding process according to an exemplary embodiment of the present invention. For convenience of explanation, it is assumed that the multi-layer video encoder supports three different resolution layers, i.e., QCIF, CIF, and SD layers.
The scalability requirements for video coding are as follows: the QCIF layer 930 has a frame rate of 15 Hz and a bit rate of 96 to 192 kbps; the CIF layer 920 has a frame rate of 7.5 to 30 Hz and a bit rate of 192 to 768 kbps; and the SD layer 910 has a frame rate of 15 to 60 Hz and a bit rate of 768 to 3072 kbps.
First, video encoding of the QCIF layer 930 will be described. The multi-layer video encoder performs AVC encoding of the QCIF frames to produce AVC-encoded QCIF layer frames having a 96kbps bit rate and a 15Hz frame rate. The encoder then performs wavelet coding on the QCIF frames using the AVC-coded frames to generate wavelet-coded QCIF layer frames having a bit rate of 192kbps and a frame rate of 15 Hz.
Video encoding of the CIF layer 920 will be described next.
The encoder performs AVC encoding on the CIF frames to generate AVC-encoded CIF layer frames having a maximum frame rate of 30Hz available for the CIF layer 920. In order to reconstruct video frames having a bit rate of 192kbps and a frame rate of 7.5Hz, QCIF layer frames which are AVC-coded and wavelet-coded and a part of CIF layer frames which are AVC-coded are required.
The encoder then performs wavelet coding on the CIF frames to generate wavelet-coded CIF layer frames having the maximum frame rate of 30 Hz allowed by the CIF layer 920. In order to reconstruct video frames having a bit rate of 384 to 768 kbps, the AVC-coded and wavelet-coded QCIF layer frames, the AVC-coded CIF layer frames, and a portion of the wavelet-coded CIF layer frames are required.
Finally, video encoding of the SD layer 910 will be described.
The encoder performs AVC encoding on the SD frames to generate AVC-encoded SD layer frames having a maximum frame rate of 60Hz available to SD layer 910. In order to reconstruct a video frame having a bit rate of 768kbps and a frame rate of 15Hz, QCIF layer frames which are AVC-coded and wavelet-coded, CIF layer frames which are AVC-coded and wavelet-coded, and a part of SD layer frames which are AVC-coded are required.
The encoder then performs wavelet encoding on the SD frames to generate wavelet-encoded SD layer frames having the maximum frame rate of 60 Hz allowed by the SD layer 910. In order to reconstruct video frames having a bit rate of 1536 to 3072 kbps, the AVC-coded and wavelet-coded QCIF layer frames, the AVC-coded and wavelet-coded CIF layer frames, the AVC-coded SD layer frames, and a portion of the wavelet-coded SD layer frames are required.
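The layer-selection logic implied by fig. 9 can be sketched as follows: to serve a target resolution, a predecoder keeps both coded representations of every lower layer, the AVC-coded frames of the target layer, and, at target bit rates above what the AVC pass alone provides, a portion of the target layer's wavelet-coded frames. The per-layer AVC rate thresholds follow the example above; the selection rule itself is an illustrative assumption, not a claimed procedure.

```python
# Which coded frame sets are needed to reconstruct video at a target
# resolution layer and bit rate, per the QCIF/CIF/SD example in the text.

LAYERS = ["QCIF", "CIF", "SD"]  # ordered from low to high resolution
# Bit rate reachable with the AVC part of each layer alone (from fig. 9).
AVC_MAX_KBPS = {"QCIF": 96, "CIF": 192, "SD": 768}

def required_parts(target_layer, target_kbps):
    parts = []
    for layer in LAYERS:
        if layer == target_layer:
            parts.append((layer, "AVC"))
            if target_kbps > AVC_MAX_KBPS[layer]:
                # Higher target rates also need part of the wavelet frames.
                parts.append((layer, "wavelet (partial)"))
            break
        # Every lower layer contributes both coded representations.
        parts.append((layer, "AVC"))
        parts.append((layer, "wavelet"))
    return parts
```

For example, `required_parts("CIF", 192)` yields the AVC- and wavelet-coded QCIF frames plus the AVC-coded CIF frames, matching the 192 kbps, 7.5 Hz case described above.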
Multi-layer video coding may be implemented in various other ways. Fig. 10 and 11 illustrate the structure of a multi-layered video bitstream according to other exemplary embodiments of the present invention.
Unlike the bitstream shown in fig. 4, the bitstream shown in fig. 10 has an SD layer encoded using only wavelet coding, because video frames at a lower bit rate, for example 1.5 Mbps, are easily reconstructed from a wavelet-coded bitstream having a high resolution and a sufficiently high bit rate, for example 3.0 Mbps.
Fig. 12 is a block diagram of a multi-layer video decoder according to an exemplary embodiment of the present invention. For ease of explanation, it is assumed that the video decoder reconstructs video frames from a bitstream having two resolution layers.
Referring to fig. 12, the multi-layer video decoder includes a bitstream interpreter 1250, an AVC decoding unit 1210 and a wavelet decoding unit 1220 that decode encoded low resolution layer video frames, and an AVC decoding unit 1230 and a wavelet decoding unit 1240 that decode encoded high resolution layer video frames.
The bitstream interpreter 1250 extracts the encoded high and low resolution layer frames from the input bitstream. The encoded low resolution layer frames are composed of AVC-encoded low resolution layer frames and wavelet-encoded low resolution layer frames, and the encoded high resolution layer frames are composed of AVC-encoded high resolution layer frames and wavelet-encoded high resolution layer frames.
The AVC decoding unit 1210 for the low resolution layer includes an inverse quantizer 1211 that inversely quantizes the AVC-encoded low resolution layer frames, an inverse DCT transformer 1212 that performs an inverse DCT on the inversely quantized frames, and an inverse temporal filter 1213 that performs inverse temporal filtering on the inverse-DCT-transformed frames.
The wavelet decoding unit 1220 for a low resolution layer includes an inverse quantizer 1221 that inversely quantizes wavelet-coded low resolution layer frames using video frames reconstructed by the AVC decoding unit 1210, an inverse wavelet transformer 1222 that performs inverse wavelet transform on the inversely quantized frames, and an inverse temporal filter 1223 that performs inverse temporal filtering on the inversely wavelet-transformed frames.
The AVC decoding unit 1230 for a high resolution layer includes an inverse quantizer 1231 that inversely quantizes AVC-encoded high resolution layer frames using video frames reconstructed by the wavelet decoding unit 1220 for a low resolution layer, an inverse DCT transformer 1232 that performs inverse DCT on the inversely quantized frames, and an inverse temporal filter 1233 that performs inverse temporal filtering on the inversely DCT-transformed frames.
The wavelet decoding unit 1240 for a high resolution layer includes an inverse quantizer 1241 that inversely quantizes wavelet-coded high resolution layer frames using video frames reconstructed by the AVC decoding unit 1230, an inverse wavelet transformer 1242 that performs inverse wavelet transform on the inversely quantized frames, and an inverse temporal filter 1243 that performs inverse temporal filtering on the inversely wavelet-transformed frames.
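The wiring of the four decoding units can be sketched structurally as follows. The three stages of each unit are placeholders that merely log their names (an assumption for illustration, not the actual inverse transforms); only the chaining, in which each unit consumes the previous unit's reconstruction as its reference, mirrors the description of fig. 12.

```python
# Structural sketch of the fig. 12 decoder: four units, each a three-stage
# chain (inverse quantizer -> inverse transform -> inverse temporal filter),
# wired in series from the low resolution AVC unit to the high resolution
# wavelet unit. Stage bodies are placeholders that record the dataflow.

def make_unit(name, log):
    def unit(frames, reference=None):
        log.append(f"{name}: inverse quantize")
        log.append(f"{name}: inverse transform")
        log.append(f"{name}: inverse temporal filter")
        # A real unit would combine `frames` with `reference`; here the
        # data is passed through unchanged to expose only the wiring.
        return frames
    return unit

def decode_bitstream(encoded, log):
    avc_low = make_unit("AVC-low (1210)", log)
    wav_low = make_unit("wavelet-low (1220)", log)
    avc_high = make_unit("AVC-high (1230)", log)
    wav_high = make_unit("wavelet-high (1240)", log)
    r1 = avc_low(encoded["avc_low"])
    r2 = wav_low(encoded["wav_low"], reference=r1)
    r3 = avc_high(encoded["avc_high"], reference=r2)
    return wav_high(encoded["wav_high"], reference=r3)
```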
The term "unit" as used herein refers to, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), that performs certain tasks. A unit may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors. Thus, a unit may include, by way of example, components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided for in the components and units may be combined into fewer components and units or further separated into additional components and units. Further, the components and units may be implemented such that they execute on one or more computers in a communication system.
Fig. 13 is a flowchart illustrating a multi-layer video decoding process according to an exemplary embodiment of the present invention.
Referring to fig. 13, when a bitstream is fed to a multi-layer video decoder, the multi-layer video decoder interprets the bitstream and extracts encoded high and low resolution frames from the bitstream in operation S1310.
After extracting the encoded frames, AVC decoding is performed on AVC-encoded low resolution layer frames among the encoded frames to decode the low resolution AVC layer in operation S1320. Video frames reconstructed by decoding the low resolution AVC layer are used to decode the low resolution wavelet layer.
In operation S1330, the decoder decodes the low resolution wavelet layer using the video frames reconstructed by decoding the low resolution AVC layer. That is, wavelet decoding is performed on the wavelet-coded low resolution layer frames among the coded frames, using the video frames reconstructed by decoding the low resolution AVC layer, in order to decode the low resolution wavelet layer. The video frames reconstructed by decoding the low resolution wavelet layer are provided for decoding the high resolution AVC layer.
In operation S1340, the decoder decodes the high resolution AVC layer using the video frames reconstructed by decoding the low resolution wavelet layer. That is, AVC decoding is performed on the AVC-coded high resolution layer frames among the coded frames, using the video frames reconstructed by decoding the low resolution wavelet layer, in order to decode the high resolution AVC layer. The video frames reconstructed by decoding the high resolution AVC layer are provided for decoding the high resolution wavelet layer.
In operation S1350, the decoder decodes the high resolution wavelet layer using the video frames reconstructed by decoding the high resolution AVC layer. That is, wavelet decoding is performed on the wavelet-coded high resolution layer frames among the coded frames, using the video frames reconstructed by decoding the high resolution AVC layer, in order to decode the high resolution wavelet layer.
After the decoding of all layers is completed, the multi-layer video decoder generates a video signal using the reconstructed video frames and then displays it through a display device in operation S1360.
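Operations S1320 through S1350 generalize to any number of resolution layers: for each layer, from lowest to highest, the first-scheme (AVC) frames are decoded using the previous layer's reconstruction as a reference, and the second-scheme (wavelet) frames are then decoded using the AVC result. The sketch below captures only this ordering; the `decode_avc` and `decode_wavelet` callables are placeholders for the actual decoding schemes.

```python
# The recursive, low-to-high decoding order of fig. 13, generalized to an
# arbitrary number of resolution layers.

def decode_all_layers(encoded_layers, decode_avc, decode_wavelet):
    """encoded_layers: low-to-high list of (avc_frames, wavelet_frames).
    decode_avc / decode_wavelet: callables (frames, reference) -> frames."""
    reference = None
    for avc_frames, wavelet_frames in encoded_layers:
        reference = decode_avc(avc_frames, reference)          # S1320 / S1340
        reference = decode_wavelet(wavelet_frames, reference)  # S1330 / S1350
    return reference  # reconstruction of the highest resolution layer
```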
Industrial applicability
As described above, the encoding and decoding methods according to exemplary embodiments of the present invention allow a predetermined resolution layer to be encoded/decoded using a plurality of different video encoding schemes, thereby providing excellent scalability and encoding efficiency.
While the present invention has been shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. For example, although one resolution layer is described as including an AVC layer and a wavelet layer, the resolution layer may also be composed of two layers using other coding algorithms. Also, although it is described above that one resolution layer is encoded using two video encoding schemes, the resolution layer may be encoded using three or more video encoding schemes.
Claims (6)
1. A multi-layered video encoding method, comprising:
encoding a video frame having a predetermined resolution using a first video encoding scheme;
encoding a video frame having a resolution identical to the predetermined resolution using a second video encoding scheme with reference to the video frame encoded by the first video encoding scheme; and
generating a bitstream containing encoded video frames for all resolution layers,
wherein the encoding using the first video coding scheme and the encoding using the second video coding scheme are recursively performed for all resolution layers in order from a lower resolution layer to a higher resolution layer,
wherein the first video coding scheme is based on Advanced Video Coding (AVC) and the second video coding scheme is based on wavelet coding.
2. The method of claim 1, wherein the first and second encoding schemes are performed at the same frame rate.
3. The method of claim 1, wherein encoding the video frame using a second video encoding scheme comprises:
decoding the video frame encoded by the first video encoding scheme;
obtaining a residual frame between the video frame and a decoded video frame; and
encoding the residual frame using a second video encoding scheme.
4. The method of claim 1, wherein encoding the video frame using a second video encoding scheme comprises:
decoding the video frame encoded by the first video encoding scheme; and
encoding the video frame using a second video encoding scheme,
wherein the decoded video frame is used as a reference for temporal filtering performed during encoding of the video frame using the second video encoding scheme.
5. A multi-layered video decoding method, comprising: extracting encoded lower resolution layer frames and encoded higher resolution layer frames from a bitstream, and decoding the encoded lower resolution layer frames and the encoded higher resolution layer frames to reconstruct video frames, wherein the encoded frames of each resolution layer include frames encoded with a first video encoding scheme and frames encoded with a second video encoding scheme, the decoding comprising:
decoding the frames encoded by the first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct first frames; and
decoding the frame encoded by the second video coding scheme for the predetermined resolution layer using the second video decoding scheme with reference to the reconstructed first frame to reconstruct a second frame, and
wherein decoding frames encoded by the first video coding scheme and decoding frames encoded by the second video coding scheme are performed recursively for all resolution layers in order from a lower resolution layer to a higher resolution layer,
wherein the first video coding scheme and the first video decoding scheme are based on Advanced Video Coding (AVC), and the second video coding scheme and the second video decoding scheme are based on wavelet coding.
6. A multi-layered video decoding method, comprising: extracting encoded lower resolution layer frames and encoded higher resolution layer frames from a bitstream, and decoding the encoded lower resolution layer frames and the encoded higher resolution layer frames to reconstruct video frames, wherein the encoded video frames of each resolution layer comprise frames encoded by a first video coding scheme and frames encoded by a second video coding scheme, the decoding comprising:
decoding the frames encoded by the first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct first frames;
decoding the frames encoded by the second video coding scheme for the predetermined resolution layer using a second video decoding scheme to reconstruct second frames; and
adding the reconstructed second frame to the reconstructed first frame, thereby reconstructing a video frame in the predetermined resolution layer,
wherein decoding frames encoded by the first video coding scheme, decoding frames encoded by the second video coding scheme, and adding to reconstruct video frames are performed recursively for all resolution layers in order from a lower resolution layer to a higher resolution layer,
wherein the first video coding scheme and the first video decoding scheme are based on Advanced Video Coding (AVC), and the second video coding scheme and the second video decoding scheme are based on wavelet coding.
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US60734304P | 2004-09-07 | 2004-09-07 | |
| US60/607,343 | 2004-09-07 | ||
| KR10-2004-0090991 | 2004-11-09 | ||
| KR1020040090991A KR100679018B1 (en) | 2004-09-07 | 2004-11-09 | Multilayer video coding and decoding method, video encoder and decoder |
| PCT/KR2005/002654 WO2006028330A1 (en) | 2004-09-07 | 2005-08-13 | Multi-layer video coding and decoding methods and multi-layer video encoder and decoder |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1103501A1 HK1103501A1 (en) | 2007-12-21 |
| HK1103501B true HK1103501B (en) | 2013-11-15 |