Inter-layer prediction method for video signal (HK1129013B)
Description
1. Field of the invention
The present invention relates to a method for inter-layer prediction in encoding/decoding a video signal.
2. Background of the invention
A Scalable Video Codec (SVC) encodes video into a sequence of pictures of the highest image quality while ensuring that a portion of the encoded picture sequence (specifically, a partial sequence of frames intermittently selected from the entire sequence of frames) can be decoded and used to represent the video at a lower image quality.
Although it is possible to represent low image quality video by receiving and processing portions of a picture sequence encoded according to a scalable scheme, there is still a problem in that image quality is significantly degraded if a bit rate is lowered. One solution to this problem is to provide a low bit rate auxiliary picture sequence, e.g. a picture sequence with a small screen size and/or a low frame rate, as at least one layer in the hierarchical structure.
When two such sequences are provided, the lower (auxiliary) picture sequence is referred to as the base layer, and the upper (main) picture sequence is referred to as the enhanced layer or enhancement layer. The video signals of the base layer and the enhancement layer are redundant because the same video signal source is encoded into two layers. To improve the coding efficiency of the enhancement layer, the video signal of the enhancement layer is coded using the coded information (motion information or texture information) of the base layer.
Although a single video source 1 may be encoded into multiple layers with different transmission rates as shown in fig. 1a, multiple video sources 2b in different scanning modes carrying the same content 2a may also be encoded into respective layers as shown in fig. 1b. In the latter case, too, an encoder encoding the upper layer can improve the coding gain by performing inter-layer prediction using the encoded information of the lower layer, because the two sources 2b provide the same content 2a.
Therefore, it is desirable to provide an inter-layer prediction method that takes into account the scanning mode of a video signal when different sources are encoded into respective layers. Interlaced video can be encoded into even and odd fields, or into pairs of even-field and odd-field macroblocks within a frame. Accordingly, the picture type used for coding an interlaced video signal must also be considered in inter-layer prediction.
In general, the enhancement layer provides pictures of a resolution higher than that of the base layer. Accordingly, if the pictures of the layers have different resolutions when different sources are encoded into respective layers, interpolation is also required to increase picture resolution (i.e., picture size). Since the coding rate increases as the base layer picture used in inter-layer prediction more closely resembles the enhancement layer picture to be predictively coded, an interpolation method that takes into account the scanning modes of the video signals of the layers needs to be provided.
3. Summary of the invention
It is an object of the present invention to provide a method of performing inter-layer prediction in a situation where at least one of two layers has an interlaced video signal component.
It is another object of the present invention to provide a method of performing inter-layer motion prediction, according to picture type, for layers whose pictures have different spatial resolutions (spatial scalability).
It is a further object of the present invention to provide a method of performing inter-layer texture prediction for layers whose pictures have different spatial and/or temporal resolutions.
An inter-layer motion prediction method according to the present invention includes: setting motion-related information of an intra-mode macroblock from motion-related information of an inter-mode macroblock, the intra-mode and inter-mode macroblocks being two vertically adjacent macroblocks of a base layer; and then deriving motion information for inter-layer motion prediction based on the two vertically adjacent macroblocks.
Another inter-layer motion prediction method according to the present invention includes: setting an intra-mode macroblock, which is one of two vertically adjacent intra-mode and inter-mode macroblocks of a base layer, to an inter-mode block having zero motion-related information; and then deriving motion information for inter-layer motion prediction based on the two vertically adjacent macroblocks.
Another inter-layer motion prediction method according to the present invention includes: deriving motion information for a single macroblock from motion information for a pair of vertically adjacent frame macroblocks of the base layer; and using the derived motion information as motion information for a field macroblock in the current layer or prediction information for respective motion information for pairs of field macroblocks in the current layer.
Another inter-layer motion prediction method according to the present invention includes deriving motion information of each of two macroblocks from motion information of a single field macroblock of a base layer or motion information of a single field macroblock selected from a pair of vertically adjacent field macroblocks of the base layer; and using the derived respective motion information as prediction information for the respective motion information for the frame macroblocks of the current layer.
An inter-layer motion prediction method for layers having pictures of different resolutions according to the present invention includes: converting a picture of a lower layer into a frame picture of the same resolution by selectively using a prediction method that converts to frame macroblocks according to the type of the picture and the type of the macroblocks in the picture; up-sampling the frame picture to the resolution of an upper layer; and then applying an inter-layer prediction method suited to the type of the frame macroblocks in the upsampled frame picture and the macroblock type in the upper layer picture.
Another inter-layer motion prediction method for layers of a picture having different resolutions according to the present invention includes: identifying types of pictures of lower and upper layers and/or types of macroblocks included in the pictures; applying a method of predicting a pair of frame macroblocks from a single field macroblock to a lower layer picture according to the identified result to construct a virtual picture having the same aspect ratio as that of an upper layer picture; up-sampling the virtual picture; this upsampled virtual picture is then used to apply inter-layer motion prediction to the upper layer.
Another inter-layer motion prediction method for layers of a picture having different resolutions according to the present invention includes: identifying types of pictures of lower and upper layers and/or types of macroblocks included in the pictures; applying a method of predicting a pair of frame macroblocks from a single field macroblock to a lower layer picture according to the identified result to construct a virtual picture having the same aspect ratio as that of an upper layer picture; and applying inter-layer motion prediction to an upper layer picture using the constructed virtual picture.
Another inter-layer motion prediction method for layers having pictures of different resolutions according to the present invention includes: identifying the types of the lower layer and upper layer pictures; copying motion information of blocks in the lower layer picture to construct a virtual picture if the type of the lower layer picture is field and the type of the upper layer picture is progressive; up-sampling the virtual picture; and applying a frame-macroblock-to-frame-macroblock motion prediction method between the upsampled virtual picture and the upper layer picture.
Another inter-layer motion prediction method for layers of a picture having different resolutions according to the present invention includes: identifying the types of the lower layer and the upper layer pictures; copying motion information of a block of a lower layer to construct a virtual picture if the type of the lower layer picture is field and the type of the upper layer picture is progressive; and applying inter-layer motion prediction to the upper layer picture using the virtual picture.
In an embodiment of the present invention, a partition mode, a reference index, and a motion vector are sequentially predicted in inter-layer motion prediction.
In another embodiment of the present invention, the reference index, the motion vector, and the partition mode are sequentially predicted.
In another embodiment of the present invention, motion information of a pair of field macroblocks of a virtual base layer to be used for inter-layer motion prediction is derived from motion information of a pair of frame macroblocks of the base layer.
In another embodiment of the present invention, motion information of a field macroblock in an even or odd field picture of a virtual base layer to be used for inter-layer motion prediction is derived from motion information of a pair of frame macroblocks of the base layer.
In another embodiment of the present invention, a macroblock is selected from a field macroblock pair of a base layer, and motion information of a frame macroblock pair of a virtual base layer to be used for inter-layer motion prediction is derived from the motion information of the selected macroblock.
In another embodiment of the present invention, motion information of a pair of frame macroblocks of a virtual base layer to be used for inter-layer motion prediction is derived from motion information of field macroblocks in even or odd field pictures of the base layer.
In another embodiment of the present invention, information of field macroblocks in even or odd field pictures of a base layer is copied to additionally construct a virtual field macroblock, and motion information of a pair of frame macroblocks of the virtual base layer to be used for inter-layer motion prediction is derived from motion information of the pair of field macroblocks constructed in this way.
An inter-layer texture prediction method according to the present invention includes: constructing a field macroblock pair from a vertically adjacent frame macroblock pair of the base layer; and using the respective texture information of the constructed pair of field macroblocks as respective texture prediction information of the pair of field macroblocks of the current layer.
Another inter-layer texture prediction method according to the present invention includes: constructing a single field macroblock from a vertically adjacent pair of frame macroblocks of the base layer; and using the constructed texture information of the single field macroblock as texture prediction information of the field macroblock of the current layer.
Another inter-layer texture prediction method according to the present invention includes: constructing a pair of frame macroblocks from a single field macroblock or a pair of vertically adjacent field macroblocks of the base layer; and using the respective texture information of the constructed pair of frame macroblocks as respective texture prediction information of the pair of frame macroblocks of the current layer.
Another inter-layer texture prediction method according to the present invention includes: constructing N pairs of frame macroblocks from a pair of vertically adjacent field macroblocks of a base layer, where N is an integer greater than 1; and using the respective texture information of the constructed N pairs of frame macroblocks as respective texture prediction information of N pairs of frame macroblocks located at different temporal positions in the current layer.
Another inter-layer texture prediction method according to the present invention includes: dividing each frame of the lower layer into a plurality of field pictures to allow the lower layer to have the same temporal resolution as the upper layer; upsampling each of the separated field pictures in a vertical direction to expand each of the separated field pictures in the vertical direction; each upsampled field picture is then used for inter-layer texture prediction for each frame of the upper layer.
Another inter-layer texture prediction method according to the present invention includes: up-sampling each field picture of the lower layer in a vertical direction to expand each field picture in the vertical direction; and using each upsampled field picture for inter-layer texture prediction of each frame of the upper layer.
Another inter-layer texture prediction method according to the present invention includes: dividing each frame of an upper layer into a plurality of field pictures; down-sampling the lower layer picture to downsize the lower layer picture in a vertical direction; the down-sampled picture is then used for inter-layer texture prediction of the separated field picture of the upper layer.
A method of encoding a video signal using inter-layer prediction according to the present invention includes: determining whether texture information of each of 2N blocks constructed by alternately selecting lines of the 2N blocks in an arbitrary picture of the base layer and then arranging the selected lines in a selected order is used or texture information of each of 2N blocks constructed by interpolating one block selected from the 2N blocks of the base layer is used in inter-layer texture prediction; and incorporating information indicative of the determination into the encoded information.
A method of decoding a video signal using inter-layer prediction according to the present invention includes: checking whether specific indication information is included in the received signal; and determining, based on the checked result, whether to use texture information of each of the 2N blocks constructed by alternately selecting lines of the 2N blocks in an arbitrary picture of the base layer and then arranging the selected lines in a selected order, or texture information of each of the 2N blocks constructed by interpolating one block selected from the 2N blocks of the base layer, in the inter-layer texture prediction.
In an embodiment of the present invention, each frame of an upper layer or a lower layer is divided into two field pictures.
In an embodiment of the present invention, if the specific indication information is not included in the received signal, the case is treated the same as receiving a signal whose indication information is set to 0, and the blocks whose respective texture information is to be used for inter-layer prediction are determined accordingly.
A method of using a video signal of a base layer for inter-layer texture prediction according to the present invention includes: dividing an interlaced video signal of the base layer into even and odd field components; enlarging the even and odd field components in the vertical and/or horizontal direction, respectively; and then combining the enlarged even and odd field components for inter-layer texture prediction.
Another method of using a video signal of a base layer for inter-layer texture prediction according to the present invention includes: dividing the progressive video signal of the base layer into an even line group and an odd line group; enlarging the even and odd line groups individually in the vertical and/or horizontal direction; and combining the enlarged even and odd line groups for use in inter-layer texture prediction.
Another method of using a video signal of a base layer for inter-layer texture prediction according to the present invention includes: enlarging the interlaced video signal of the base layer in the vertical and/or horizontal direction to have the same resolution as the progressive video signal of the upper layer; and performing inter-layer texture prediction of the video signal of the upper layer based on the enlarged video signal.
Another method of using a video signal of a base layer for inter-layer texture prediction according to the present invention includes: enlarging the progressive video signal of the base layer in the vertical and/or horizontal direction to have the same resolution as the interlaced video signal of the upper layer; and performing inter-layer texture prediction of the video signal of the upper layer based on the enlarged video signal.
In one embodiment of the invention, the video signal separation and enlargement are performed at the macroblock level (or on a macroblock basis).
In another embodiment of the invention, the video signal separation and the enlargement are performed at the picture level.
In another embodiment of the present invention, video signal separation and enlargement are performed if picture formats of two layers to which inter-layer texture prediction is to be applied are different, i.e., if one layer includes a progressive picture and the other layer includes an interlaced picture.
In another embodiment of the present invention, if pictures of two layers to which inter-layer texture prediction is to be applied are both interlaced, video signal separation and enlargement are performed.
4. Brief description of the drawings
FIGS. 1a and 1b illustrate methods of encoding a video source into multiple layers;
fig. 2a and 2b schematically illustrate the configuration of a video signal encoding apparatus to which an inter-layer prediction method according to the present invention is applied;
fig. 2c and 2d show types of picture sequences for encoding interlaced video signals;
FIGS. 3a and 3b schematically illustrate a process in which a base layer picture is constructed for inter-layer texture prediction and deblocking filtering is performed, according to an embodiment of the present invention;
fig. 4a to 4f schematically illustrate a process in which motion information of a field macroblock of a virtual base layer to be used for inter-layer motion prediction of the field macroblock in an MBAFF frame is derived using motion information of a frame macroblock according to an embodiment of the present invention;
fig. 4g schematically shows a procedure in which texture information of a frame macroblock pair is used for texture prediction of a field macroblock pair in an MBAFF frame according to an embodiment of the present invention;
FIG. 4h illustrates a method of transforming a pair of frame macroblocks into a pair of field macroblocks in accordance with an embodiment of the present invention;
FIGS. 5a and 5b illustrate a reference index and motion information derivation procedure according to another embodiment of the present invention;
fig. 6a to 6c schematically illustrate a procedure in which motion information of a field macroblock in a virtual base layer is derived using motion information of a frame macroblock according to an embodiment of the present invention;
fig. 6d schematically shows a procedure in which the texture information of a pair of frame macroblocks is used for texture prediction of a field macroblock in a field picture according to an embodiment of the present invention;
FIGS. 7a and 7b illustrate a reference index and motion information derivation procedure according to another embodiment of the present invention;
fig. 8a to 8c schematically illustrate a procedure in which motion information of a frame macroblock pair of a virtual base layer to be used for inter-layer motion prediction is derived using motion information of a field macroblock in an MBAFF frame according to an embodiment of the present invention;
fig. 8d schematically shows a procedure in which texture information of a pair of field macroblocks in an MBAFF frame is used for texture prediction of a pair of frame macroblocks, according to an embodiment of the present invention;
FIG. 8e illustrates a method of transforming a pair of field macroblocks into a pair of frame macroblocks in accordance with an embodiment of the present invention;
fig. 8f and 8g schematically show a procedure for using texture information of a field macroblock pair in an MBAFF frame for inter-layer prediction of a frame macroblock pair when only one macroblock of the field macroblock pair is in inter mode, according to an embodiment of the present invention;
fig. 8h schematically shows a procedure in which texture information of a pair of field macroblocks in an MBAFF frame is used for texture prediction of a plurality of pairs of frame macroblocks, according to an embodiment of the present invention;
FIGS. 9a and 9b illustrate a reference index and motion information derivation procedure according to another embodiment of the present invention;
fig. 10a to 10c schematically illustrate a procedure in which motion information of a frame macroblock of a virtual base layer to be used for inter-layer motion prediction is derived using motion information of a field macroblock in a field picture according to an embodiment of the present invention;
fig. 10d schematically shows a procedure in which texture information of field macroblocks in a field picture is used for texture prediction of a pair of frame macroblocks, according to an embodiment of the present invention;
fig. 11 illustrates a reference index and motion information derivation procedure according to another embodiment of the present invention;
fig. 12a and 12b schematically illustrate a procedure in which motion information of a frame macroblock of a virtual base layer to be used for inter-layer motion prediction is derived using motion information of a field macroblock in a field picture according to another embodiment of the present invention;
fig. 13a to 13d schematically illustrate a procedure in which motion information of a field macroblock of a virtual base layer to be used for inter-layer motion prediction is derived using motion information of the field macroblock according to an embodiment of the present invention, respectively, according to the type of picture;
fig. 14a to 14k illustrate methods of performing inter-layer motion prediction when spatial resolutions of layers are different according to types of pictures, respectively, according to various embodiments of the present invention;
fig. 15a and 15b schematically illustrate procedures for using pictures of a base layer having different spatial resolutions for inter-layer texture prediction when an enhancement layer is progressive and the base layer is interlaced, according to an embodiment of the present invention;
fig. 16a and 16b schematically illustrate a procedure in which a pair of macroblocks in a picture is divided into macroblocks and the divided macroblocks are enlarged in order to use the picture of a base layer for inter-layer texture prediction according to an embodiment of the present invention;
fig. 17a and 17b schematically illustrate a procedure for using pictures of a base layer having different spatial resolutions for inter-layer texture prediction when an enhancement layer is interlaced and the base layer is progressive according to an embodiment of the present invention;
fig. 18 schematically illustrates a procedure for using pictures of a base layer having different spatial resolutions for inter-layer prediction when both the enhancement layer and the base layer are interlaced according to an embodiment of the present invention;
fig. 19a illustrates a procedure for applying inter-layer prediction when an enhancement layer is a progressive frame sequence and picture types and temporal resolutions of two layers are different according to an embodiment of the present invention;
FIG. 19b shows a procedure for applying inter-layer prediction when the enhancement layer is a progressive frame sequence and the two layers have different picture types and the same temporal resolution, according to an embodiment of the present invention;
fig. 20 illustrates a procedure of applying inter-layer prediction when a base layer is a progressive frame sequence and picture types and temporal resolutions of two layers are different according to an embodiment of the present invention; and
fig. 21 illustrates a procedure of applying inter-layer prediction when a base layer is a progressive frame sequence and the two layers have different picture types and the same temporal resolution according to an embodiment of the present invention.
5. Modes for carrying out the invention
Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
Fig. 2a schematically shows the building blocks of a video signal encoding apparatus to which the inter-layer prediction method according to the present invention is applied. Although the apparatus of fig. 2a is implemented to encode an input video signal into two layers, the principles of the present invention described below are also applicable to the inter-layer processes when a video signal is encoded into three or more layers.
The inter-layer prediction method according to the present invention is performed at the Enhancement Layer (EL) encoder 20 in the apparatus of fig. 2a. The EL encoder 20 receives the encoded information (motion information and texture information) from the Base Layer (BL) encoder 21 and performs inter-layer texture prediction or motion prediction based on the received information. If necessary, the received information is decoded first and the prediction is performed based on the decoded information. Of course, in the present invention, the input video signal may also be coded using an already encoded video source 3 of the base layer, as shown in fig. 2b. In this case, too, the inter-layer prediction method described below is applicable.
In the case of fig. 2a, there are two methods by which the BL encoder 21 may encode an interlaced video signal (or by which the encoded video source 3 of fig. 2b may have been encoded). In one method, the interlaced video signal is simply encoded field by field into a sequence of field pictures, as shown in fig. 2c; in the other, the video is encoded into a sequence of frames, each frame being constructed of macroblock pairs carrying the components of the two (even and odd) fields, as shown in fig. 2d. The upper macroblock of such a macroblock pair is referred to as the "top macroblock" and the lower one as the "bottom macroblock". If the top macroblock consists of even (or odd) field image components, the bottom macroblock consists of odd (or even) field image components. A frame constructed in this manner is referred to as a macroblock adaptive frame/field (MBAFF) frame. An MBAFF frame may include not only macroblock pairs each consisting of an odd-field and an even-field macroblock, but also macroblock pairs each consisting of two frame macroblocks.
Accordingly, when a macroblock in a picture has an interlaced image component, it may be a macroblock in a field picture or a macroblock in a frame. Each macroblock having an interlaced image component is called a field macroblock, and each macroblock having a progressive (sequentially scanned) image component is called a frame macroblock.
Therefore, the inter-layer prediction method must be determined by identifying whether each of the macroblock to be encoded at the EL encoder 20 and the base layer macroblock to be used in its inter-layer prediction is of the frame macroblock type or the field macroblock type. If it is a field macroblock, it must further be identified whether it is a field macroblock in a field picture or a field macroblock in an MBAFF frame.
The method will be described separately for each such case. Before the description, it is assumed that the resolution of the current layer is equal to that of the base layer, i.e., that SpatialScalabilityType() is 0. The case where the resolution of the current layer is higher than that of the base layer will be described later. In the following description and drawings, the terms "top" and "even" (or "odd"), and the terms "bottom" and "odd" (or "even"), are used interchangeably.
In order to perform inter-layer prediction using the base layer to encode or decode the enhancement layer, the base layer needs to be decoded first. Therefore, the base layer decoding is first described as follows.
In decoding the base layer, not only base layer motion information such as a partition mode, a reference index, and a motion vector, but also texture of the base layer is decoded.
When the texture of the base layer is decoded for inter-layer texture prediction, not all of the image sample data of the base layer is decoded; this reduces the load on the decoder. The image sample data of intra-mode macroblocks is decoded, whereas for inter-mode macroblocks only the residual data, i.e., the error data between image samples, is decoded, without motion compensation using an adjacent picture.
Further, base layer texture decoding for inter-layer texture prediction is performed not on a macroblock-by-macroblock basis but on a picture-by-picture basis to construct a base layer picture temporally coincident with an enhancement layer picture. The base layer picture is constructed from image sample data reconstructed from the intra mode macroblock and residual data decoded from the inter mode macroblock as described above.
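By way of illustration only, the picture-level construction just described can be sketched as follows. The tuple representation of macroblocks and the intra mask are assumptions of this example, not part of the described method; the mask merely records the intra regions to which, as explained below, deblocking is later restricted.

```python
import numpy as np

# A minimal sketch, assuming 16x16 luma macroblocks given as (row, col, mode,
# data) tuples; the layout and naming are illustrative, not part of any codec.
# Intra macroblocks contribute reconstructed image samples, inter macroblocks
# contribute only their decoded residual (no motion compensation is run).

def build_base_picture(macroblocks, height, width):
    picture = np.zeros((height, width))
    intra_mask = np.zeros((height, width), dtype=bool)
    for row, col, mode, data in macroblocks:
        picture[row:row + 16, col:col + 16] = data  # samples or residual
        # Remember which regions hold intra samples; deblocking is later
        # applied only to those regions.
        intra_mask[row:row + 16, col:col + 16] = (mode == "intra")
    return picture, intra_mask
```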
Intra-mode or inter-mode prediction, including motion compensation, and transform operations such as the DCT and quantization are performed on an image block basis, e.g., on a 16x16 macroblock basis or on a 4x4 subblock basis. This causes blocking artifacts at block boundaries that distort the image. Deblocking filtering is applied to reduce these blocking artifacts. The deblocking filter smoothes the edges of image blocks to improve the quality of video frames.
Whether deblocking filtering is applied to reduce blocking distortion depends on the boundary strength of the image blocks and on the gradient of the pixels around the boundary. The strength or degree of the deblocking filter is determined by the quantization parameter, the intra or inter mode, the image block partition mode indicating the block size, the motion vector, the pixel values before deblocking filtering, and the like.
In inter-layer prediction, the deblocking filter is applied to intra-mode macroblocks of a base layer picture that serve as the basis for texture prediction of intra-mode (intraBL or inter-layer intra mode) macroblocks of the enhancement layer.
When the two layers to be encoded by the inter-layer prediction method are both encoded as field picture sequences as shown in fig. 2c, both layers can be treated as being in frame format, so that the encoding/decoding process, including deblocking filtering, can easily be derived from the encoding/decoding process for the frame format.
The method of performing deblocking filtering according to an embodiment of the present invention will now be described for the cases where the picture format of the base layer differs from that of the enhancement layer, i.e., the case where the enhancement layer is in frame (or progressive) format and the base layer is in field (or interlaced) format, the case where the enhancement layer is in field format and the base layer is in frame format, and the case where both layers are in field format but one is encoded as a field picture sequence and the other as MBAFF frames, as shown in figs. 2c and 2d.
Fig. 3a and 3b schematically illustrate a process in which a base layer picture is constructed to perform deblocking filtering for inter-layer texture prediction according to an embodiment of the present invention.
Fig. 3a shows an embodiment in which the enhancement layer is in frame format and the base layer is in field format, while fig. 3b shows an embodiment in which the enhancement layer is in field format and the base layer is in frame format.
In these embodiments, for inter-layer texture prediction, the textures of the inter-mode and intra-mode macroblocks of the base layer are decoded to construct a base layer picture containing image sample data and residual data. A deblocking filter is applied to the constructed picture to reduce blocking artifacts, and the picture is then up-sampled according to the ratio of the resolution (or screen size) of the enhancement layer to that of the base layer.
The first method (method 1) in figs. 3a and 3b divides the base layer picture into two field pictures before deblocking. In this method, when an enhancement layer is coded using a base layer encoded in a different picture format, the base layer picture is divided into an even-line field picture and an odd-line field picture, and the two field pictures are deblocked (i.e., subjected to deblocking filtering) and upsampled. The two pictures are then spliced into a single picture, and inter-layer texture prediction is performed based on this single picture.
The first method includes the following three steps.
In the separation step (step 1), the base layer picture is divided into a top field picture consisting of its even lines and a bottom field picture consisting of its odd lines. The base layer picture is a video picture containing residual data (inter-mode data) and image sample data (intra-mode data) reconstructed from the data stream of the base layer by motion compensation.
In the deblocking step (step 2), the field picture separated in the separating step is deblocked by a deblocking filter. Here, a conventional deblocking filter may be used as the deblocking filter.
When the resolution of the enhancement layer differs from that of the base layer, each deblocked field picture is upsampled according to the ratio of the resolution of the enhancement layer to that of the base layer.
In the splicing step (step 3), the upsampled top field picture and the upsampled bottom field picture are interlaced in an alternating manner to be spliced into a single picture. Thereafter, texture prediction of the enhancement layer is performed based on this single picture.
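A non-normative sketch of the three steps of method 1, assuming a 2-D luma array; deblock() is a placeholder for the conventional deblocking filter, and simple line repetition stands in for the actual upsampling operation:

```python
import numpy as np

def deblock(field):
    return field  # placeholder: apply the conventional deblocking filter here

def method1_base_picture(picture, factor=1):
    top = picture[0::2, :]       # step 1: even lines -> top field picture
    bottom = picture[1::2, :]    # step 1: odd lines -> bottom field picture
    top = np.repeat(deblock(top), factor, axis=0)        # step 2
    bottom = np.repeat(deblock(bottom), factor, axis=0)  # step 2
    spliced = np.empty((top.shape[0] + bottom.shape[0], picture.shape[1]),
                       dtype=picture.dtype)
    spliced[0::2, :] = top       # step 3: interleave the upsampled fields
    spliced[1::2, :] = bottom
    return spliced
```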
In the second method (method 2) in figs. 3a and 3b, when an enhancement layer is created using a base layer encoded in a different picture format, the base layer picture is not divided into two field pictures but is directly deblocked and upsampled, and inter-layer texture prediction is performed based on the resulting picture.
In this second method, a base layer picture corresponding to an enhancement layer picture to be encoded by inter-layer texture prediction is not divided into top and bottom field pictures but is immediately deblocked and then upsampled. Thereafter, texture prediction of the enhancement layer is performed based on this upsampled picture.
The deblocking filter applied to a base layer picture constructed for inter-layer texture prediction is applied only to regions containing image sample data decoded from intra-mode macroblocks, and not to regions containing residual data.
In the case where the base layer in fig. 3a is coded in field format, i.e., coded as a sequence of field pictures as shown in fig. 2c or as MBAFF frames as shown in fig. 2d, applying the second method requires a procedure that alternately interlaces the lines of the top and bottom field pictures to combine them into a single picture (in the case of fig. 2c), or alternately interlaces the lines of the top and bottom macroblocks of each field macroblock pair to combine them into a single picture (in the case of fig. 2d). This procedure will be described in detail with reference to figs. 8d and 8e. The top and bottom field pictures or top and bottom macroblocks to be interlaced are field pictures or macroblocks containing residual data (inter-mode data) and image sample data (intra-mode data) reconstructed by motion compensation.
Further, there are cases where the top and bottom macroblocks of a base layer field macroblock pair in an MBAFF frame as shown in fig. 2d are of different modes and the intra-mode block is selected from them for inter-layer texture prediction of a macroblock pair of the enhancement layer (the case of fig. 8g described later), where no frame (picture) of a base layer encoded into field macroblock pairs in MBAFF frames as shown in fig. 2d temporally coincides with the enhancement layer picture (the case of fig. 8h described later), or where the texture of an enhancement layer macroblock pair is predicted from a base layer field macroblock of a field picture as shown in fig. 2c (the case of fig. 10d described later). In these cases, a selected one of the field macroblocks is up-sampled into a provisional macroblock pair ("841" in fig. 8g and "852" in fig. 8h) or into two provisional macroblocks ("1021" in fig. 10d), and the deblocking filter is applied to the intra-mode macroblocks among them.
Inter-layer texture prediction, described in the various embodiments below, is performed based on the deblocked base layer picture described in the embodiments of figs. 3a and 3b.
The inter-layer prediction method will now be described separately for each case classified according to the macroblock type in the current layer to be coded and the macroblock type of the base layer to be used for inter-layer prediction of the macroblock of the current layer. In the present description, it is assumed that the spatial resolution of the current layer is equal to the spatial resolution of the base layer as described above.
I. Case of frame MB -> field MB in MBAFF frame
In this case, a macroblock in the current layer (EL) is encoded as a field macroblock in an MBAFF frame, and the macroblocks in the base layer to be used for its inter-layer prediction are encoded as frame macroblocks. The video signal components contained in the top and bottom macroblocks of the base layer are the same as those contained in the co-located pair of macroblocks in the current layer. The two vertically adjacent (top and bottom) macroblocks will be referred to as a macroblock pair, and the term "pair" will be used in the following description for a pair of vertically adjacent blocks. First, inter-layer motion prediction is described as follows.
The EL encoder 20 uses the macroblock division mode obtained by merging the macroblock pair 410 of the base layer into a single macroblock (by compressing it to half its size in the vertical direction) as the division mode of the current macroblock. Fig. 4a shows a detailed example of this process. As shown, the corresponding macroblock pair 410 of the base layer is first merged into a single macroblock (S41), and the division pattern of the merged macroblock is copied to a second macroblock to construct the macroblock pair 411 (S42). The respective division modes of the macroblock pair 411 are then applied to the macroblock pair 412 of the virtual base layer (S43).
However, when the corresponding macroblock pair 410 is merged into a single macroblock, a partition pattern not allowed by the macroblock partition modes may be produced. To prevent this, the EL encoder 20 determines the division mode according to the following rules, which are sketched in code after this list.
1) The top and bottom two 8x8 blocks ("B8_0" and "B8_2" in fig. 4a) in the macroblock pair of the base layer are merged into a single 8x8 block. If neither of the two 8x8 blocks is subdivided, the merged block is divided into two 8x4 blocks, whereas if either of them is subdivided, the merged block is divided into four 4x4 blocks ("401" in fig. 4a).
2) An 8x16 block of the base layer is reduced to an 8x8 block, a 16x8 block is reduced to two side-by-side 8x4 blocks, and a 16x16 block is reduced to a 16x8 block.
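A minimal sketch of the two merging rules above, assuming partitions are represented as (width, height) tuples; the function names are illustrative only:

```python
def merge_8x8_pair(upper_parts, lower_parts):
    """Rule 1: merge two vertically corresponding 8x8 blocks (e.g. B8_0 and
    B8_2 of a base-layer macroblock) into a single 8x8 block; each input is
    the block's partition list, "subdivided" meaning more than one entry."""
    if len(upper_parts) == 1 and len(lower_parts) == 1:
        return [(8, 4), (8, 4)]  # neither block subdivided: two 8x4 blocks
    return [(4, 4)] * 4          # at least one subdivided: four 4x4 blocks

def compress_large_partition(size):
    """Rule 2: vertical halving of the larger base-layer partitions."""
    return {(8, 16): [(8, 8)],
            (16, 8): [(8, 4), (8, 4)],   # two side-by-side 8x4 blocks
            (16, 16): [(16, 8)]}[size]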
If at least one macroblock of the corresponding macroblock pair is intra-mode encoded, the EL encoder 20 first performs the following process before the merging process.
If only one of the two macroblocks is in intra mode, the motion information of the inter macroblock, such as the macroblock division mode, the reference index, and the motion vector, is copied to the intra macroblock as shown in fig. 4b; or the intra macroblock is treated as a 16x16 inter macroblock having a zero motion vector and a zero reference index as shown in fig. 4c; or, as shown in fig. 4d, the reference index of the intra macroblock is set by copying the reference index of the inter macroblock, and a zero motion vector is assigned to it. The merging process described above is then performed, followed by the reference index and motion vector derivation procedures described below.
EL encoder 20 performs the following process to derive the reference index of current macroblock pair 412 from the reference index of corresponding macroblock pair 410.
If the two blocks of the base layer 8x8 block pair corresponding to the current 8x8 block have been subdivided into the same number of parts, the reference index of either block (top or bottom) of the pair is determined as the reference index of the current 8x8 block. Otherwise, the reference index of the block of the 8x8 block pair that has been subdivided into fewer parts is determined as the reference index of the current 8x8 block.
In another embodiment of the present invention, the smaller one of the reference indexes set for the base layer 8 × 8 block pair corresponding to the current 8 × 8 block is determined as the reference index of the current 8 × 8 block. This determination in the example of fig. 4e can be expressed as follows:
the reference index of the current B8_0 is min (reference index of B8_0 of the base top frame MB, reference index of B8_2 of the base top frame MB)
The reference index of the current B8_1 is min (reference index of B8_1 of the base top frame MB, reference index of B8_3 of the base top frame MB)
The reference index of the current B8_2 is min (the reference index of B8_0 of the base frame MB, the reference index of B8_2 of the base frame MB), and
the reference index of the current B8_3 is min (the reference index of B8_1 of the base frame MB, the reference index of B8_3 of the base frame MB).
The above reference index derivation procedure is applied to both the top and bottom field macroblocks. The reference index of each 8x8 block determined in this way is multiplied by 2, and the result is taken as its final reference index. The reason for this multiplication is that, at decoding, the field sequence contains twice as many pictures as the frame sequence, because each picture is divided into even and odd fields. Depending on the decoding algorithm, the final reference index of the bottom field macroblock may instead be determined by multiplying its reference index by 2 and then adding 1.
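The reference index derivation just described, including the final doubling (and the optional addition of 1 for the bottom field macroblock), can be sketched as follows; the list representation of the 8x8 block indices is an assumption of the example:

```python
def derive_field_ref_idx(top_mb, bottom_mb, bottom_field=False):
    """top_mb/bottom_mb: reference indices [B8_0, B8_1, B8_2, B8_3] of the
    base layer frame macroblock pair."""
    ref = [min(top_mb[0], top_mb[2]),        # current B8_0
           min(top_mb[1], top_mb[3]),        # current B8_1
           min(bottom_mb[0], bottom_mb[2]),  # current B8_2
           min(bottom_mb[1], bottom_mb[3])]  # current B8_3
    # Doubled because the field sequence has twice as many pictures as the
    # frame sequence; optionally add 1 for the bottom field macroblock.
    return [2 * r + (1 if bottom_field else 0) for r in ref]
```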
The following is the procedure by which the EL encoder 20 derives the motion vectors of the macroblock pair of the virtual base layer.
The motion vector is determined on a 4x4 block basis, for which the corresponding 4x8 block of the base layer is identified, as shown in fig. 4f. If the corresponding 4x8 block has been subdivided, the motion vector of its top or bottom 4x4 block is determined as the motion vector of the current 4x4 block. Otherwise, the motion vector of the corresponding 4x8 block is determined as the motion vector of the current 4x4 block. The determined motion vector is used as the final motion vector of the current 4x4 block after its vertical component is divided by 2. The reason for this division is that the picture components contained in two frame macroblocks correspond to the picture components of one field macroblock, so that the field picture is reduced to half size in the vertical direction.
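A minimal sketch of this motion vector derivation, under the same illustrative conventions:

```python
def derive_field_mv(region_mvs):
    """region_mvs: [(x, y)] if the corresponding 4x8 region of the base layer
    is a single partition, or the two vectors of its top and bottom 4x4
    blocks if it has been subdivided."""
    mvx, mvy = region_mvs[0]  # the top (or, by choice, bottom) 4x4 vector
    return (mvx, mvy // 2)    # halve the vertical component
```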
Once the motion information of the pair of field macroblocks 412 of the virtual base layer is determined in this manner, the motion information is used for inter-layer motion prediction of the pair of target field macroblocks 413 of the enhancement layer. Also, in the following description, once motion information of a macroblock or a macroblock pair of a virtual base layer is determined, the motion information is used for inter-layer motion prediction of a corresponding macroblock or a corresponding macroblock pair of a current layer. In the following description, it is assumed that the process is applied even if motion information of a macroblock or a macroblock pair of a virtual base layer is not mentioned to be used for inter-layer motion prediction of a corresponding macroblock or a corresponding macroblock pair of a current layer.
Figs. 5a and 5b schematically show how the motion information of a pair of field macroblocks 500 of the virtual base layer to be used for inter-layer prediction is derived from the motion information of the pair of base layer frame macroblocks corresponding to the current macroblock pair, according to another embodiment of the present invention. In this embodiment, as shown in the drawing, the reference index of the top or bottom 8x8 block of the top macroblock of the base layer frame macroblock pair is used as the reference index of the top 8x8 block of each macroblock of the field macroblock pair 500, and the reference index of the top or bottom 8x8 block of the bottom macroblock is used as the reference index of the bottom 8x8 block of each macroblock of the pair 500. As for the motion vectors, the motion vector of the topmost 4x4 block of the top macroblock of the base layer frame macroblock pair is shared by the topmost 4x4 block of each macroblock of the field macroblock pair 500; the motion vector of the third 4x4 block of the top macroblock is shared by the second 4x4 block of each macroblock of the pair 500; the motion vector of the topmost 4x4 block of the bottom macroblock is shared by the third 4x4 block of each macroblock of the pair 500; and the motion vector of the third 4x4 block of the bottom macroblock is shared by the fourth 4x4 block of each macroblock of the pair 500.
As shown in fig. 5a, the top 4x4 block 501 and the bottom 4x4 block 502 of an 8x8 block of the field macroblock pair 500 constructed for inter-layer prediction use the motion vectors of 4x4 blocks belonging to different 8x8 blocks 511 and 512 of the base layer. These motion vectors may refer to different reference pictures; that is, the different 8x8 blocks 511 and 512 may have different reference indices. Accordingly, in this case, to construct the macroblock pair 500 of the virtual base layer, the EL encoder 20 uses the motion vector of the corresponding 4x4 block 503, already selected for the top 4x4 block 501, also as the motion vector of the second 4x4 block 502, as shown in fig. 5b (521).
In the embodiment described with reference to figs. 4a to 4f, in order to construct the motion information of the virtual base layer used to predict the motion information of the current macroblock pair, the EL encoder 20 sequentially derives the partition mode, the reference indices, and the motion vectors from the motion information of the corresponding macroblock pair of the base layer. In the embodiment described with reference to figs. 5a and 5b, however, the EL encoder 20 first derives the reference indices and motion vectors of the macroblock pair of the virtual base layer from the motion information of the corresponding macroblock pair of the base layer, and then finally determines the partition mode of the macroblock pair of the virtual base layer based on the derived values. When the partition mode is determined, 4x4 block units having the same derived motion vector and reference index are combined; if the combined block shape is an allowed partition mode, the partition mode is set to the combined shape, and otherwise the pre-combination mode is kept.
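The "derive first, combine later" order can be sketched as follows; only the simplest combined shape is illustrated, and the representation of the 4x4 units is an assumption of the example:

```python
def decide_partition(units, pre_combination_mode):
    """units: flat list of (ref_idx, mv) pairs for the sixteen 4x4 units of
    one macroblock. Only the all-agree case is shown; a full combiner would
    also test 16x8, 8x16, 8x8, and so on against the allowed modes."""
    if len(set(units)) == 1:
        return [(16, 16)]        # everything agrees: a single partition
    return pre_combination_mode  # fall back to the pre-combination mode
```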
In the above-described embodiment, if both macroblocks of the corresponding macroblock pair 410 of the base layer are in intra mode, only base intra prediction is performed on the current macroblock pair 413. In this case, motion prediction is not performed and, of course, no macroblock pair of the virtual base layer is constructed for texture prediction. If only one macroblock of the corresponding macroblock pair 410 of the base layer is in intra mode, the motion information of the inter macroblock is copied to the intra macroblock as shown in fig. 4b, or the motion vector and reference index of the intra macroblock are set to 0 as shown in fig. 4c, or the reference index of the intra macroblock is set by copying that of the inter macroblock and its motion vector is set to 0 as shown in fig. 4d. The motion information of the macroblock pair of the virtual base layer is then derived as described above.
After constructing a macroblock pair of the virtual base layer for inter-layer motion prediction as described above, the EL encoder 20 predicts and encodes motion information of the current field macroblock pair 413 using motion information of the constructed macroblock pair.
Inter-layer texture prediction will now be described. Fig. 4g shows an example inter-layer texture prediction method for the case "frame MB -> field MB in MBAFF frame". The EL encoder 20 identifies the block modes of the corresponding pair of frame macroblocks 410 of the base layer. If both macroblocks of the corresponding pair 410 are in intra mode or both are in inter mode, the EL encoder 20 converts the corresponding macroblock pair 410 of the base layer into a pair of temporary field macroblocks 421 in order to perform base intra prediction of the current field macroblock pair 413 (when both frame macroblocks 410 are in intra mode) or residual prediction (when both are in inter mode), in the manner described below. When both macroblocks of the corresponding pair 410 are in intra mode, the temporary field macroblock pair 421 contains data that has been deblocked (i.e., subjected to deblocking filtering) after completion of intra-mode decoding, as previously described. The same holds for the temporary macroblock pairs derived from base layer macroblocks for texture prediction in the descriptions of the various embodiments below.
However, when only one of the two macroblocks is in inter mode, i.e., when the two macroblocks have different modes, inter-layer texture prediction is not performed. A macroblock of the base layer pair 410 used for inter-layer texture prediction carries unencoded original image data (or decoded image data) if it is in intra mode, and carries encoded residual data (or decoded residual data) if it is in inter mode. The same holds for base layer macroblock pairs in the following descriptions of texture prediction.
Fig. 4h illustrates a method of converting a pair of frame macroblocks into a pair of field macroblocks to be used for inter-layer texture prediction. As shown, the even lines of the frame macroblock pair A and B are sequentially selected to construct the top field macroblock A', and the odd lines of the pair A and B are sequentially selected to construct the bottom field macroblock B'. Each field macroblock is filled first with the even (or odd) lines of the top block A (A_even or A_odd) and then with the even (or odd) lines of the bottom block B (B_even or B_odd).
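A sketch of the fig. 4h conversion, assuming the macroblock pair is given as two 16x16 arrays:

```python
import numpy as np

def frame_pair_to_field_pair(A, B):
    """A, B: 16x16 arrays, a vertically adjacent frame macroblock pair.
    A' takes the even lines of A then of B; B' takes the odd lines."""
    a_prime = np.vstack([A[0::2, :], B[0::2, :]])  # A_even then B_even
    b_prime = np.vstack([A[1::2, :], B[1::2, :]])  # A_odd then B_odd
    return a_prime, b_prime
```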
II. Case of frame MB -> field MB in field picture
In this case, a macroblock in the current layer is a field macroblock encoded in a field picture, and the macroblocks in the base layer to be used for its inter-layer prediction are encoded as frame macroblocks. The video signal components contained in the macroblock pair of the base layer are the same as those contained in the co-located macroblock in the even or odd field of the current layer. First, inter-layer motion prediction is described as follows.
The EL encoder 20 uses the macroblock division pattern obtained by merging the macroblock pair of the base layer into a single macroblock (by compressing it to half its size in the vertical direction) as the division pattern of the even or odd macroblock of the virtual base layer. Fig. 6a shows a detailed example of this process. As shown, the corresponding macroblock pair 610 of the base layer is first merged into a single macroblock 611 (S61), and the partition mode obtained by the merging is applied to the macroblock of the virtual base layer to be used for inter-layer motion prediction of the current macroblock 613 (S62). The merging rules are the same as in case I. The processing when at least one macroblock of the corresponding macroblock pair 610 is intra-mode coded is also the same as in case I.
The procedure for deriving the reference indices and motion vectors is also performed in the same manner as described for case I. In case I, the same derivation procedure is applied to both the top and bottom macroblocks, since one frame carries both the even and odd macroblocks of a pair. The present case II differs from case I in that the derivation procedure is applied to only one field macroblock, as shown in figs. 6b and 6c, because the current field picture to be coded contains only one macroblock corresponding to the base layer macroblock pair 610.
In the above embodiment, in order to predict motion information of a macroblock of a virtual base layer, the EL encoder 20 sequentially derives a partition mode, a reference index, and a motion vector of the macroblock based on motion information of a corresponding macroblock pair of the base layer.
In another embodiment of the present invention, the EL encoder 20 first derives the reference indices and motion vectors of the macroblock of the virtual base layer from the motion information of the corresponding macroblock pair of the base layer, and then finally determines the block mode of the macroblock of the virtual base layer based on the derived values. Figs. 7a and 7b schematically illustrate the derivation of the reference indices and motion vectors of a field macroblock of the virtual base layer. The derivation operations are similar to those of case I described with reference to figs. 5a and 5b, except that the motion information of only a single (top or bottom) field macroblock is derived from the motion information of the base layer macroblock pair.
When the division mode is finally determined, 4 × 4 block units having the same derived motion vector and reference index are combined, and the division mode is set to this combined mode if the combined block mode is an allowable division mode, and otherwise set to a pre-combination mode.
In the above-described embodiment, if both macroblocks of the corresponding macroblock pair of the base layer are in intra mode, motion prediction is not performed and no motion information of the virtual base layer macroblock is constructed, whereas if only one of the two macroblocks is in intra mode, motion prediction is performed as previously described for this case.
Inter-layer texture prediction will now be described. Fig. 6d shows an example inter-layer texture prediction method for the case "frame MB -> field MB in field picture". The EL encoder 20 identifies the block modes of the corresponding macroblock pair 610 of the base layer. If both macroblocks of the pair are in intra mode or both are in inter mode, the EL encoder 20 constructs a temporary field macroblock 621 from the frame macroblock pair 610. If the current macroblock 613 belongs to an even field picture, the EL encoder 20 constructs the temporary field macroblock 621 from the even lines of the corresponding macroblock pair 610; if the current macroblock 613 belongs to an odd field picture, it constructs the temporary field macroblock 621 from the odd lines of the corresponding macroblock pair 610. The construction method is similar to that of constructing the single field macroblock A' or B' in fig. 4h.
Once the temporary field macroblock 621 is constructed, the EL encoder 20 performs base intra prediction of the current field macroblock 613 (when both macroblocks in the corresponding macroblock pair 610 are intra mode) or performs residual prediction thereof (when both macroblocks in the corresponding macroblock pair 610 are inter mode) based on texture information in the field macroblock 621.
If only one macroblock of the corresponding macroblock pair 610 is inter mode, the EL encoder 20 does not perform inter-layer texture prediction.
III. Case of field MB in MBAFF frame -> frame MB
In this case, the macroblock of the current layer is coded as a frame macroblock, and the base layer macroblock to be used for inter-layer prediction of that frame macroblock is a field macroblock coded in an MBAFF frame. The video signal components included in a field macroblock of the base layer are the same as those included in a co-located macroblock pair of the current layer. First, inter-layer motion prediction is described as follows.
The EL encoder 20 uses the macroblock partition mode obtained by extending the top or bottom macroblock of the base layer macroblock pair (to twice its size in the vertical direction) as the partition mode of the macroblock pair of the virtual base layer. Fig. 8a shows a detailed example of this process. Although the top field macroblock is selected in the following description and drawings, the same applies when the bottom field macroblock is selected.
As shown in fig. 8a, the top field macroblock of the corresponding macroblock pair 810 of the base layer is extended to twice its size to construct two macroblocks 811 (S81), and the partition pattern obtained by the extension is applied to the macroblock pair 812 of the virtual base layer (S82).
However, when the corresponding field macroblock is extended to twice its size in the vertical direction, a partition (or pattern) that is not an allowed macroblock partition mode may be generated. To prevent this, the EL encoder 20 determines the partition mode from the extended partitions according to the following rules.
1) The 4 × 4, 8 × 4, and 16 × 8 blocks of the base layer are determined, after extension, as 4 × 8, 8 × 8, and 16 × 16 blocks, respectively, obtained by doubling them in the vertical direction.
2) The 4 × 8, 8 × 8, and 16 × 16 blocks of the base layer are each determined, after extension, as two blocks of the original size stacked top and bottom. As shown in fig. 8a, the 8 × 8 block B8_0 of the base layer is determined as two 8 × 8 blocks (801). The 8 × 8 block B8_0 is not set to a single 8 × 16 block after extension because the extended block adjacent to it on the left or right side may not be an 8 × 16 partition block, and no macroblock partition mode supports such a combination. A code sketch of these two rules is given below.
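In this Python sketch (identifiers are ours, not from the patent; dimensions are width × height in pixels), each base-layer partition is mapped according to rules 1) and 2):

```python
# Rule 1): partitions whose doubled height still fits a legal shape are
# simply doubled vertically. Rule 2): the remaining shapes become two
# stacked blocks of the original size.

def extend_partition(width, height):
    """Return the list of partition sizes that one base-layer partition of
    (width, height) maps to after the vertical extension."""
    if (width, height) in ((4, 4), (8, 4), (16, 8)):    # rule 1)
        return [(width, height * 2)]                     # -> 4x8, 8x8, 16x16
    if (width, height) in ((4, 8), (8, 8), (16, 16)):   # rule 2)
        return [(width, height), (width, height)]        # top + bottom copies
    raise ValueError("not a valid macroblock partition")
```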
If one macroblock of the corresponding macroblock pair 810 is intra-mode coded, the EL encoder 20 selects the inter-mode field macroblock (top or bottom) rather than the intra-mode one, and performs the above extension process on it to determine the partition mode of the macroblock pair 812 of the virtual base layer.
If both macroblocks in the corresponding macroblock pair 810 are intra mode, the EL encoder 20 performs only inter-layer texture prediction without performing the partition mode determination through the above extension process and the reference index and motion vector derivation process described below.
To derive the reference indices of the macroblock pair of the virtual base layer from those of the corresponding field macroblock, the EL encoder 20 determines the reference index of the corresponding 8 × 8 block B8_0 of the base layer as the reference index of each of the two stacked 8 × 8 blocks, as shown in fig. 8b, and divides each determined reference index by 2 to obtain the final reference index. The reason for this division is that the reference picture list of a field macroblock is built from pictures separated into even and odd fields, so the reference picture numbers must be halved to apply to the frame sequence.
When deriving the motion vectors of the frame macroblock pair 812 of the virtual base layer, the EL encoder 20 determines the motion vector of the corresponding 4 × 4 block of the base layer as the motion vector of the co-located 4 × 8 block in the macroblock pair 812, as shown in fig. 8c, and multiplies its vertical component by 2 to obtain the final motion vector. The reason for this multiplication is that the picture component contained in one field macroblock corresponds to that of two frame macroblocks, so the frame picture is twice the field in the vertical direction.
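A minimal sketch of these two scaling steps, with illustrative function names:

```python
# Reference indices are halved because field reference lists enumerate twice
# as many pictures (even and odd fields); vertical motion components are
# doubled because the frame is twice as tall as one field.

def derive_ref_idx(field_ref_idx):
    """Reference index of the base-layer 8x8 field block, copied to both
    stacked 8x8 frame blocks and halved."""
    return field_ref_idx // 2

def derive_mv(field_mv):
    """field_mv: (mv_x, mv_y) of the base-layer 4x4 block; the result
    applies to the co-located 4x8 frame block."""
    mv_x, mv_y = field_mv
    return (mv_x, mv_y * 2)
```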
In the above-described embodiment, in order to predict motion information of a macroblock pair of a virtual base layer, the EL encoder 20 sequentially derives a partition mode, a reference index, and a motion vector of a macroblock based on motion information of a corresponding field macroblock of the base layer.
In another embodiment of the present invention, when deriving motion information of a macroblock pair of the virtual base layer to be used for inter-layer prediction of the current macroblock pair, the EL encoder 20 first obtains the reference indices and motion vectors of the macroblock pair of the virtual base layer based on the motion information of the corresponding field macroblock of the base layer, and then finally determines the block mode of each macroblock of the pair based on the obtained values, as shown in fig. 9a. To finally determine the partition mode, 4 × 4 block units having the same derived motion vector and reference index are combined, and the partition mode is set to this combined mode if the combined block mode is an allowable partition mode, and otherwise to the pre-combination mode.
The following is a more detailed description of the embodiment of fig. 9a. As shown, an inter-mode field macroblock of the base layer is selected, and the reference indices and motion vectors of the frame macroblock pair of the virtual base layer, to be used for motion prediction of the current macroblock pair, are derived from the motion vectors and reference indices of the selected macroblock. If both macroblocks are inter mode, either the top or the bottom macroblock is selected (901 or 902) and its motion vector and reference index information is used. To derive the reference indices, the value of the top 8 × 8 block of the selected macroblock is copied to the reference indices of the top and bottom 8 × 8 blocks of the top macroblock of the virtual base layer, and the value of the bottom 8 × 8 block of the selected macroblock is copied to the reference indices of the top and bottom 8 × 8 blocks of the bottom macroblock. To derive the motion vectors, the value of each 4 × 4 block of the selected macroblock is shared as the motion vector of the corresponding pair of vertically adjacent 4 × 4 blocks in the macroblock pair of the virtual base layer.

In another embodiment of the present invention, unlike the embodiment of fig. 9a, the motion information of both macroblocks of the corresponding base layer pair may be mixed and used to derive the motion vectors and reference indices of the frame macroblock pair of the virtual base layer. Fig. 9b shows the derivation procedure according to this embodiment. A detailed description of the copying relationships for the reference indices and motion vectors of the subblocks (8 × 8 and 4 × 4 blocks) in the macroblock pair of the virtual base layer is omitted here because it can be intuitively understood from the description of the motion information derivation procedure above and the illustration of fig. 9b.
However, since the embodiment of fig. 9b uses the motion information of both macroblocks of the base layer field macroblock pair, if one macroblock of the pair is intra mode, its motion information is first derived from that of the other, inter-mode macroblock. This can be done by copying the motion vector and reference index of the inter-mode macroblock to the intra-mode macroblock as in fig. 4b, by treating the intra-mode macroblock as an inter-mode macroblock with a zero motion vector and a reference index of 0 as in fig. 4c, or by copying the reference index of the inter-mode macroblock to the intra-mode macroblock and setting its motion vector to 0 as in fig. 4d; the motion vector and reference index information of the macroblock pair of the virtual base layer is then derived as shown in fig. 9b. Once this information is derived, the block mode of the macroblock pair is determined based on it as previously described.
On the other hand, if both macroblocks in the corresponding field macroblock pair of the base layer are intra mode, no motion prediction is performed.
The inter-layer texture prediction will now be described. Fig. 8d shows an example inter-layer texture prediction method for the case of "field MB in MBAFF frame -> frame MB". The EL encoder 20 identifies the block mode of the corresponding field macroblock pair 810 of the base layer. If both macroblocks of the pair 810 are intra mode or both are inter mode, the EL encoder 20 converts the field macroblock pair 810 of the base layer into a temporary frame macroblock pair 821 to perform either base intra prediction of the current frame macroblock pair 813 (when both macroblocks of the pair 810 are intra mode) or residual prediction thereof in the manner described below (when both are inter mode). When both macroblocks of the pair 810 are intra mode, the pair 810 contains decoded data, and a deblocking filter is applied to the frame macroblock pair 821 as previously described. Fig. 8e shows a method for converting a field macroblock pair into a frame macroblock pair. As shown, lines of the field macroblock pair A and B are alternately selected from the top of each macroblock (A -> B -> A -> B -> ...) and arranged in the selected order from the top to construct a frame macroblock pair A' and B'. Because the lines are rearranged in this manner, the top frame macroblock A' is constructed from the lines of the upper half of the field macroblocks A and B, and the bottom frame macroblock B' is constructed from the lines of the lower half.
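A minimal sketch of the fig. 8e conversion, assuming each macroblock is a list of 16 sample rows (names are ours):

```python
# Field-to-frame conversion: lines of field macroblocks A and B are
# interleaved (A -> B -> A -> B -> ...), and the 32 interleaved lines are
# split into the top and bottom frame macroblocks A' and B'.

def field_pair_to_frame_pair(field_a, field_b):
    """field_a, field_b: lists of 16 rows each. Returns (A', B')."""
    interleaved = []
    for line_a, line_b in zip(field_a, field_b):
        interleaved.append(line_a)
        interleaved.append(line_b)
    return interleaved[:16], interleaved[16:]   # upper half, lower half
```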
On the other hand, if only one macroblock of the corresponding field macroblock pair 810 of the base layer is an inter mode, one block is selected from the macroblock pair 810 of the base layer according to the block mode of the current frame macroblock pair 813, and the selected block is used for inter-layer texture prediction. Alternatively, each method described below may be applied to perform inter-layer prediction before determining the block mode of the current frame macroblock pair 813, and then the block mode of the macroblock pair 813 may be determined.
Fig. 8f and 8g show examples in which one block is selected to perform inter-layer prediction. In case the current pair of frame macroblocks 813 is inter-mode coded (or inter-mode prediction is performed), as shown in fig. 8f, an inter-mode block 810a is selected from the pair of field macroblocks 810 of the base layer, and the selected block is up-sampled in the vertical direction to create two corresponding macroblocks 831. These two macroblocks 831 are then used for residual prediction of the current frame macroblock pair 813. In case the current pair of frame macroblocks 813 is not inter-mode coded (or its intra-mode prediction is performed), as shown in fig. 8g, an intra-mode block 810b is selected from the pair of field macroblocks 810 of the base layer, and the selected block is up-sampled in the vertical direction to create two corresponding macroblocks 841. After applying the deblocking filter to these two macroblocks 841, these two macroblocks 841 are used for intra base prediction of current frame macroblock pair 813.
The method shown in fig. 8f and 8g, in which one block is selected and upsampled to create a macroblock pair to be used for inter-layer texture prediction, can also be applied when the layers have different picture rates. When the picture rate of the enhanced layer is higher than that of the base layer, some pictures in the picture sequence of the enhanced layer may not have temporally corresponding pictures in the base layer. Inter-layer texture prediction of a pair of frame macroblocks included in an enhancement layer picture having no temporally corresponding picture in the base layer can be performed using one macroblock of a pair of spatially co-located field macroblocks in a temporally preceding picture in the base layer.
Fig. 8h is an example of the method in the case where the picture rate of the enhanced layer is twice the picture rate of the base layer.
As shown, the picture rate of the enhancement layer is twice that of the base layer. Thus, one of every two pictures of the enhancement layer, for example the picture with Picture Order Count (POC) 'n2', has no picture with the same POC in the base layer; here, the same POC indicates temporal coincidence.
When there is no temporally coincident picture in the base layer (e.g., when the current POC is n2), the bottom field macroblock 802 of the spatially co-located field macroblock pair in the temporally preceding base layer picture (i.e., the picture whose POC is lower than the current POC by 1) is vertically upsampled to create a temporary macroblock pair 852 (S82), and inter-layer texture prediction of the current macroblock pair 815 is then performed using this temporary macroblock pair 852. When there is a temporally coincident picture in the base layer (e.g., when the current POC is n1), the top field macroblock 801 of the spatially co-located field macroblock pair in that picture is vertically upsampled to create a temporary macroblock pair 851 (S82), and inter-layer texture prediction of the current macroblock pair 814 is then performed using this temporary macroblock pair 851. When the temporary macroblock pair 851 or 852 created by upsampling contains data decoded from an intra-mode macroblock, a deblocking filter is applied to the pair before it is used for inter-layer texture prediction.
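A rough sketch of this selection rule, following the fig. 8h example (the POC bookkeeping and data layout are assumptions of ours):

```python
# With the enhancement layer at twice the base-layer picture rate, an
# enhancement picture with a coincident base picture uses the top field MB
# of that picture (e.g. POC n1); one without uses the bottom field MB of
# the temporally preceding base picture (e.g. POC n2 uses POC n2 - 1).

def select_base_field_mb(current_poc, base_pictures):
    """base_pictures: dict mapping base-layer POC ->
    (top_field_mb, bottom_field_mb). Returns the field MB that is then
    vertically upsampled into a temporary macroblock pair."""
    if current_poc in base_pictures:             # temporally coincident
        return base_pictures[current_poc][0]     # top field MB
    return base_pictures[current_poc - 1][1]     # bottom field MB, prev pic
```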
In another embodiment of the present invention, when there is a temporally coincident picture in the base layer (when the current POC in the example of fig. 8h is n1), the frame macroblock pair may be created from the field macroblock pair according to the embodiment of fig. 8d, rather than by the method of fig. 8h, and then used for inter-layer texture prediction. Also, when the current picture has no temporally coincident picture in the base layer (when the current POC in the example of fig. 8h is n2), inter-layer texture prediction may be performed as in fig. 8h, or may simply not be performed for the macroblocks of the current picture.
Accordingly, an embodiment of the present invention allocates a flag 'field_base_flag' to indicate whether inter-layer texture prediction is performed according to the method of fig. 8d or the method of fig. 8h, and includes this flag in the coding information. For example, the flag is set to '0' when texture prediction has been performed according to the method of fig. 8d and to '1' when it has been performed according to the method of fig. 8h. The flag may be defined in the sequence parameter set of the enhancement layer transmitted to the decoder, the sequence parameter set in scalable extension, the picture parameter set in scalable extension, the slice header in scalable extension, the macroblock layer, or the macroblock layer in scalable extension.
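Purely as an illustration (the actual syntax placement and parsing are bitstream-specific), a decoder might branch on the flag as follows:

```python
# Values follow the example in the text: 0 -> fig. 8d method, 1 -> fig. 8h.

def prediction_source(field_base_flag):
    if field_base_flag == 0:
        return "convert field MB pair to frame MB pair (fig. 8d)"
    return "vertically upsample one field MB (fig. 8h)"
```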
IV. Case of field MB in field picture -> frame MB
In this case, the macroblock of the current layer (EL) is coded as a frame macroblock, and the base layer (BL) macroblock to be used for inter-layer prediction of that frame macroblock is a field macroblock coded in a field picture. The video signal components included in a field macroblock of the base layer are the same as those included in a co-located macroblock pair of the current layer. First, inter-layer motion prediction is described as follows.
The EL encoder 20 uses the partition pattern obtained by extending a macroblock of an even or odd field of the base layer (to twice its size in the vertical direction) as the partition pattern of the macroblocks of the virtual base layer. Fig. 10a shows a detailed example of this process. The procedure of fig. 10a differs from that of case III, in which the top or bottom field macroblock of the MBAFF frame is selected, in that the spatially co-located field macroblock 1010 of the even or odd field is used directly; it is similar to case III in that the co-located field macroblock 1010 is extended and the partition pattern of the two macroblocks obtained by the extension is applied to the macroblock pair 1012 of the virtual base layer. When the corresponding field macroblock 1010 is extended to twice its size in the vertical direction, a partition (or pattern) that is not an allowed macroblock partition mode may be generated. To prevent this, the EL encoder 20 determines the partition mode from the extended partitions according to the same rules 1) and 2) given in case III.
If the corresponding macroblock has been coded in intra mode, the EL encoder 20 performs only inter-layer texture prediction, without the partition mode determination through the above extension process and without the reference index and motion vector derivation described below. That is, inter-layer motion prediction is not performed.
The reference index and motion vector derivation procedure is also similar to that of case III above, with the following difference. In case III, one of the top and bottom macroblocks is selected and applied to the derivation procedure, because the corresponding base layer macroblocks are carried as an even/odd macroblock pair in the frame. In the present case IV, there is only one base layer macroblock corresponding to the current macroblock to be coded, so the motion information of the macroblock pair 1012 of the virtual base layer is derived from the motion information of the corresponding field macroblock without a macroblock selection procedure, as shown in figs. 10b and 10c, and the derived motion information is used for inter-layer motion prediction of the current macroblock pair 1013.
Fig. 11 schematically illustrates the derivation of the reference indices and motion vectors of a macroblock pair of the virtual base layer according to another embodiment of the present invention. In this case, the motion information of the macroblock pair of the virtual base layer is derived from the motion information of an even or odd field macroblock of the base layer, unlike the case described above with reference to fig. 9a, although the same derivation operation as in fig. 9a applies. However, the process of mixing and using the motion information of a macroblock pair shown in fig. 9b is not applicable in this case IV, because the corresponding field of the base layer carries no top and bottom macroblock pair.
In the embodiment described with reference to figs. 10a to 10c, in order to predict motion information of a macroblock pair of the virtual base layer, the EL encoder 20 sequentially derives the partition mode, the reference indices, and the motion vectors based on the motion information of the corresponding field macroblock of the base layer. In the other embodiment of fig. 11, the EL encoder 20 first derives the reference indices and motion vectors of the macroblock pair of the virtual base layer based on the motion information of the corresponding field macroblock of the base layer, and then finally determines the partition mode of the pair based on the derived values. To determine the partition mode, 4 × 4 block units having the same derived motion vector and reference index are combined, and the partition mode is set to this combined mode if the combined block mode is an allowable partition mode, and otherwise to the pre-combination mode.
When texture prediction is performed in the above-described embodiment, if a corresponding field macroblock of the base layer is an intra mode, intra base prediction encoding is performed on a current macroblock. If the corresponding field macroblock is inter mode, and if the current macroblock has been encoded in inter mode, inter-layer residual prediction coding is performed. Here, of course, the field macroblock used in prediction is used for texture prediction after it is up-sampled in the vertical direction.
In another embodiment of the present invention, a virtual macroblock is created from field macroblocks included in odd or even fields to construct a macroblock pair, and then motion information of the macroblock pair of the virtual base layer is derived from the constructed macroblock pair. Fig. 12a and 12b show an example of this embodiment.
In this embodiment, the reference indices and motion vectors of the corresponding even (or odd) field macroblock of the base layer are copied (1201 and 1202) to create a virtual odd (or even) field macroblock, constructing macroblock pair 1211, and the motion information of the constructed macroblock pair 1211 is mixed to derive the motion information of macroblock pair 1212 of the virtual base layer (1203 and 1204). In an example method of mixing and using motion information, as shown in figs. 12a and 12b, the reference index of the top 8 × 8 block of the corresponding top macroblock is applied to the top 8 × 8 block of the top macroblock of the macroblock pair 1212 of the virtual base layer, the reference index of the bottom 8 × 8 block is applied to the top 8 × 8 block of the bottom macroblock, the reference index of the top 8 × 8 block of the corresponding bottom macroblock is applied to the bottom 8 × 8 block of the top macroblock of the macroblock pair 1212 of the virtual base layer, and the reference index of the bottom 8 × 8 block is applied to the bottom 8 × 8 block of the bottom macroblock (1203). Motion vectors are applied according to the reference indices (1204). The description of this process is omitted here because it can be intuitively understood from figs. 12a and 12b.
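A sketch of this reference-index mixing (vertical relationships only; the left/right 8 × 8 split is omitted, and all names are ours):

```python
# The virtual pair 1211 is two copies of the corresponding field macroblock;
# its 8x8 reference indices are redistributed across the derived pair 1212.

def mix_ref_indices(field_mb_refs):
    """field_mb_refs: (top_8x8_ref, bottom_8x8_ref) of the corresponding
    even (or odd) field macroblock. Returns (top MB, bottom MB) of the
    virtual-base-layer pair 1212, each as (top 8x8 ref, bottom 8x8 ref)."""
    top8, bottom8 = field_mb_refs
    virtual_top = (top8, bottom8)       # pair 1211: copy of the field MB
    virtual_bottom = (top8, bottom8)    # pair 1211: second copy
    top_mb = (virtual_top[0], virtual_bottom[0])      # top 8x8s of both
    bottom_mb = (virtual_top[1], virtual_bottom[1])   # bottom 8x8s of both
    return top_mb, bottom_mb
```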
In the embodiment shown in fig. 12a and 12b, the partition mode of the macroblock pair 1212 of the virtual base layer is determined based on the derived reference index and the motion vector using the same method as described above.
The inter-layer texture prediction will now be described. Fig. 10d shows an example inter-layer texture prediction method for the case of "field MB in field picture -> frame MB". The EL encoder 20 first upsamples the corresponding field macroblock 1010 of the base layer to create two temporary macroblocks 1021. If the corresponding field macroblock 1010 is intra mode, the EL encoder 20 applies a deblocking filter to the two created temporary macroblocks 1021 and then performs base intra prediction of the current frame macroblock pair 1013 based on them. If the corresponding field macroblock 1010 is inter mode, the EL encoder 20 performs residual prediction of the current frame macroblock pair 1013 based on the two created temporary macroblocks 1021.
V. Case of field MB -> field MB
This case is subdivided into the following four cases because the field macroblocks are divided into field macroblocks included in the field picture and field macroblocks included in the MBAFF frame.
i) Case where the base and enhancement layers are MBAFF frames
This situation is shown in fig. 13a. As shown, the motion information (partition mode, reference index, and motion vector) of the corresponding macroblock pair of the base layer is used as the motion information of the macroblock pair of the virtual base layer by copying it directly. Here, motion information is copied between macroblocks having the same parity: the motion information of the even field macroblock is copied to the even field macroblock, and that of the odd field macroblock to the odd field macroblock, to construct the macroblocks of the virtual base layer used for motion prediction of the macroblocks of the current layer.
The known method of inter-layer texture prediction between frame macroblocks is applied when performing texture prediction.
ii) case where the base layer includes field pictures and the enhancement layer includes MBAFF frames
This situation is shown in fig. 13b. As shown in the drawing, the motion information (partition mode, reference index, and motion vector) of the corresponding field macroblock of the base layer is used as the motion information of each macroblock of the pair of the virtual base layer by copying it directly to each of them. Here, the same-parity copy rule does not apply, since the motion information of a single field macroblock is used for both the top and bottom field macroblocks.
When performing texture prediction, intra-base prediction (when a corresponding block of the base layer is intra-mode) or residual prediction (when a corresponding block of the base layer is inter-mode) is applied between the enhancement layer and the base layer macroblocks having the same (even or odd) field properties.
iii) case where the base layer includes MBAFF frames and the enhancement layer includes field pictures
This situation is shown in fig. 13c. As shown in the drawing, the field macroblock having the same parity as the current field macroblock is selected from the corresponding base layer macroblock pair, and its motion information (partition mode, reference index, and motion vector) is used as the motion information of the field macroblock of the virtual base layer by copying it directly.
When performing texture prediction, intra-base prediction (when a corresponding block of the base layer is intra-mode) or residual prediction (when a corresponding block of the base layer is inter-mode) is applied between the enhancement layer and the base layer macroblocks having the same (even or odd) field properties.
iv) case where the base layer and the enhancement layer are field pictures
This situation is shown in fig. 13d. As shown in the drawing, the motion information (partition mode, reference index, and motion vector) of the corresponding field macroblock of the base layer is used as the motion information of the field macroblock of the virtual base layer by directly copying it. Also in this case, motion information is copied between macroblocks having the same parity.
The known method of inter-layer texture prediction between frame macroblocks is applied when performing texture prediction.
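The four sub-cases above can be summarized in one dispatch; the following sketch is illustrative only, with top fields assumed to be the even-parity ones:

```python
# Case V motion-information copying: parity is respected whenever both
# sides actually distinguish top/bottom field macroblocks (MBAFF pairs).

def copy_motion_case_v(base, enh_is_mbaff, base_is_mbaff, parity=0):
    """base: dict with keys 'top'/'bottom' (MBAFF frame) or 'field' (field
    picture), each holding (partition, ref_idx, mv). parity: 0 = even/top,
    1 = odd/bottom. Returns motion info for the virtual-base-layer MB (pair)."""
    if base_is_mbaff and enh_is_mbaff:            # i) same-parity copy
        return {"top": base["top"], "bottom": base["bottom"]}
    if (not base_is_mbaff) and enh_is_mbaff:      # ii) one field MB -> both
        return {"top": base["field"], "bottom": base["field"]}
    if base_is_mbaff and not enh_is_mbaff:        # iii) pick same parity
        return base["top"] if parity == 0 else base["bottom"]
    return base["field"]                          # iv) same-parity copy
```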
The above description of inter-layer prediction is given for the case where the base layer and the enhancement layer have the same resolution. The following description explains how to identify the picture type (progressive frame, MBAFF frame, or interlaced field) of each layer and/or the type of macroblock in the picture when the resolution of the enhancement layer is higher than that of the base layer (i.e., when SpatialScalabilityType() is greater than 0), and how to apply an inter-layer prediction method according to the identified types. Inter-layer motion prediction is described first.
M_A) Base layer (progressive frame) -> enhancement layer (MBAFF frame)
Fig. 14a shows a processing method for this case. As shown, first, motion information of all macroblocks of a corresponding frame in a base layer is copied to create a virtual frame. Upsampling is then performed. In the upsampling, interpolation is performed using texture information of a base layer picture at an interpolation rate that allows the resolution (or picture size) of the picture to be equal to the resolution of the current layer. Further, the motion information of each macroblock of the picture enlarged by interpolation is constructed based on the motion information of each macroblock of the virtual frame. One of a variety of known methods may be used for this construction. The pictures of the temporary base layer constructed in this way have the same resolution as the pictures of the current (enhancement) layer. Accordingly, the above-described inter-layer motion prediction may be applied in this case.
In this case (fig. 14a), the macroblocks in the pictures in the base layer and the current layer are frame macroblocks and field macroblocks in MBAFF frames, because the base layer comprises frames and the current layer comprises MBAFF frames. Accordingly, the method of case I described above is applied to perform inter-layer motion prediction. However, not only field macroblock pairs, but frame macroblock pairs may also be included in the same MBAFF frame as described above. Accordingly, when the type of a current layer macroblock pair corresponding to a macroblock pair in a picture of a temporary base layer has been identified as a frame macroblock type instead of a field macroblock type, a known method of motion prediction including a simple copy of motion information between frame macroblocks (frame-to-frame prediction method) is applied.
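A high-level sketch of the motion path of this case (texture interpolation elided; all helper names are hypothetical):

```python
# M_A): copy the base-layer motion field into a virtual frame, scale it to
# the enhancement resolution, then predict each enhancement MB pair with the
# case I method (field pair) or plain frame-to-frame copying (frame pair).

def scale_motion_field(motion, ratio):
    """motion: 2-D grid of (mv_x, mv_y); ratio: enhancement/base size ratio."""
    return [[(x * ratio, y * ratio) for (x, y) in row] for row in motion]

def temporary_base_motion(base_frame_motion, ratio):
    virtual_frame = [row[:] for row in base_frame_motion]  # straight copy
    return scale_motion_field(virtual_frame, ratio)

def predict_pair(scaled_motion_for_pair, pair_is_field_coded):
    method = "case_I_field_pair" if pair_is_field_coded else "frame_to_frame"
    return method, scaled_motion_for_pair
```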
M_B) Base layer (progressive frame) -> enhancement layer (interlaced field)
Fig. 14b shows a processing method for this case. As shown, first, motion information of all macroblocks of a corresponding frame in a base layer is copied to create a virtual frame. Upsampling is then performed. In this upsampling, interpolation is performed at an interpolation rate that allows the resolution of the picture to be equal to the resolution of the current layer, using texture information of the base layer picture. Further, the motion information of each macroblock of the picture enlarged by interpolation is constructed based on the motion information of each macroblock of the virtual frame created.
The method of case II described above is applied to perform inter-layer motion prediction because each macroblock of a picture of the temporary base layer constructed in this way is a frame macroblock, and each macroblock of the current layer is a field macroblock in a field picture.
M_C) Base layer (MBAFF frame) -> enhancement layer (progressive frame)
Fig. 14c shows a processing method for this case. As shown, first, the corresponding MBAFF frame of the base layer is converted into a progressive frame. The method of case III above applies to the conversion of the field macroblock pairs of the MBAFF frame into the progressive frame, and the known frame-to-frame prediction method applies to the conversion of its frame macroblock pairs. Of course, when the method of case III is applied to this case, the virtual frame and the motion information of each of its macroblocks are created from the data obtained by inter-layer prediction, without coding the difference between the predicted data and the data of the layer actually being coded.
Once a virtual frame is obtained, upsampling is performed on the virtual frame. In this upsampling, interpolation is performed at an interpolation rate that makes the resolution of the base layer equal to that of the current layer. Further, the motion information of each macroblock of the enlarged picture is constructed based on the motion information of each macroblock of the virtual frame using one of a variety of known methods. Here, the known frame-MB-to-frame-MB inter-layer motion prediction method is applied, because each macroblock of the picture of the temporary base layer constructed in this way is a frame macroblock and each macroblock of the current layer is also a frame macroblock.
M_D) Base layer (interlaced field) -> enhancement layer (progressive frame)
Fig. 14d shows one approach to this case. In this case, the picture type is the same as the type of the macroblocks in the picture. As shown, first, the corresponding field of the base layer is converted into a progressive frame; the converted frame has the same aspect ratio as the picture of the current layer. The upsampling procedure and the method of case IV above are applied to the conversion of the interlaced field into the progressive frame. Of course, when the method of case IV is applied to this case, the texture data of the virtual frame and the motion information of each of its macroblocks are created from the data obtained by inter-layer prediction, without coding the difference between the predicted data and the data of the layer actually being coded.
Once a virtual frame is obtained, upsampling is performed on the virtual frame. In this upsampling, interpolation is performed so that the resolution of the virtual frame becomes equal to that of the current layer. In addition, the motion information of each macroblock of the interpolated picture is constructed based on the motion information of each macroblock of the virtual frame using one of a number of known methods. Here, the known frame-MB-to-frame-MB inter-layer motion prediction method is applied, because each macroblock of the picture of the temporary base layer constructed in this way is a frame macroblock and each macroblock of the current layer is also a frame macroblock.
Fig. 14e shows a processing method for the above case M_D) according to another embodiment of the invention. As shown, this embodiment converts the corresponding odd or even field into a progressive frame. For this conversion of the interlaced field, the upsampling procedure and the method of case IV described above are applied, as in fig. 14d. Once the virtual frame is obtained, one of a variety of known methods for motion prediction between pictures having the same aspect ratio is applied to it, performing motion prediction between the picture of the temporary layer and the picture of the current layer so as to predictively code the motion information of each macroblock of the progressive picture of the current layer.
The method shown in fig. 14e differs from the method of fig. 14d in that no temporary prediction signal is generated.
Fig. 14f shows a processing method for the above case M_D) according to another embodiment of the invention. As shown, this embodiment copies the motion information of all macroblocks of the corresponding field of the base layer to create a virtual picture. Upsampling is then performed. In this upsampling, texture information of a picture of the base layer is used, and different interpolation rates are used for vertical and horizontal interpolation so that the enlarged picture has the same size (or resolution) as the picture of the current layer. Further, one of a variety of known prediction methods, such as Extended Spatial Scalability (ESS), may be applied to the virtual picture to construct the various syntax information and motion information of the enlarged picture. The motion vectors constructed in this process are scaled according to the magnification ratio. Once the upsampled picture of the temporary base layer is constructed, it is used to perform inter-layer motion prediction of each macroblock in the picture of the current layer so as to encode the motion information of each macroblock of that picture. Here, the known frame-MB-to-frame-MB inter-layer motion prediction method is applied.
Fig. 14g shows a processing method for the above case M_D) according to another embodiment of the invention. As shown, this embodiment first copies the motion information of all macroblocks of the corresponding field of the base layer to create a virtual picture. Thereafter, interpolation is performed using texture information of a picture of the base layer, at different rates for the vertical and horizontal directions. The texture information created by this operation is used for inter-layer texture prediction. In addition, the motion information in the virtual picture is used to perform inter-layer motion prediction of each macroblock in the picture of the current layer. Here, motion-prediction coding of the picture of the current layer is performed using one of various known methods, for example Extended Spatial Scalability (ESS) as defined in the Joint Scalable Video Model (JSVM).
The method shown in fig. 14g differs from the method of fig. 14f in that no temporary prediction signal is generated.
M_E) Base layer (MBAFF frame) -> enhancement layer (MBAFF frame)
Fig. 14h shows a processing method for this case. As shown, first, the corresponding MBAFF frame of the base layer is converted into a progressive frame. For this conversion, the method of case III above applies to the field macroblock pairs of the MBAFF frame, and the frame-to-frame prediction method applies to its frame macroblock pairs. Of course, when the method of case III is applied to this case, the virtual frame and the motion information of each of its macroblocks are created from the data obtained by inter-layer prediction, without coding the difference between the predicted data and the data of the layer actually being coded.
Once a virtual frame is obtained, upsampling is performed on the virtual frame. In this up-sampling, interpolation is performed at an interpolation rate that allows the resolution of the base layer to be equal to the resolution of the current layer. Further, motion information of each macro block of the enlarged picture is constructed based on the motion information of each macro block of the virtual frame using one of a variety of known methods. The method of case I described above is applied to perform inter-layer motion prediction because each macroblock of a picture of the temporary base layer constructed in this way is a frame macroblock, and each macroblock of the current layer is a field macroblock in the MBAFF frame. However, not only field macroblock pairs, but frame macroblock pairs may also be included in the same MBAFF frame as described above. Accordingly, when a current layer macroblock pair corresponding to a macroblock pair in a picture of a temporary base layer is a frame macroblock instead of a field macroblock, a known method of motion prediction including a copy of motion information between frame macroblocks (frame-to-frame prediction method) is applied.
M_F) Base layer (MBAFF frame) -> enhancement layer (interlaced field)
Fig. 14i shows a processing method for this case. As shown, first, the corresponding MBAFF frame of the base layer is converted into a progressive frame. For this conversion, the method of case III above applies to the field macroblock pairs of the MBAFF frame, and the frame-to-frame prediction method applies to its frame macroblock pairs. Also in this case, when the method of case III is applied, the virtual frame and the motion information of each of its macroblocks are created from the data obtained by inter-layer prediction, without coding the difference between the predicted data and the data of the layer actually being coded.
Once the virtual frame is acquired, interpolation is performed on the virtual frame at an interpolation rate that allows a resolution equal to that of the current layer. Further, motion information of each macroblock of the enlarged picture is constructed based on the motion information of each macroblock of the virtual frame using one of a variety of known methods. The method of case II described above is applied to perform inter-layer motion prediction because each macroblock of a picture of the temporary base layer constructed in this way is a frame macroblock, and each macroblock of the current layer is a field macroblock in an even or odd field.
M_G) Base layer (interlaced field) -> enhancement layer (MBAFF frame)
Fig. 14j shows a processing method for this case. As shown, first, the interlaced field of the base layer is converted into a progressive frame. The upsampling procedure and the method of case IV above are applied to this conversion. Also in this case, when the method of case IV is applied, the virtual frame and the motion information of each of its macroblocks are created from the data obtained by inter-layer prediction, without coding the difference between the predicted data and the data of the layer actually being coded.
Once the virtual frame is obtained, upsampling is performed on the virtual frame to allow a resolution equal to that of the current layer. In addition, motion information for each macroblock of the enlarged picture is constructed using one of a number of known methods. The method of case I described above is applied to perform inter-layer motion prediction because each macroblock of a picture of the temporary base layer constructed in this way is a frame macroblock, and each macroblock of the current layer is a field macroblock in the MBAFF frame. However, not only field macroblock pairs, but frame macroblock pairs may also be included in the same MBAFF frame as described above. Therefore, when the current layer macroblock pair corresponding to the macroblock pair in the picture of the temporary base layer includes a frame macroblock instead of a field macroblock, a known method of motion prediction between frame macroblocks (frame-to-frame prediction method) is applied instead of the prediction method of the above case I.
M_H) Base layer (interlaced field) -> enhancement layer (interlaced field)
Fig. 14k shows a processing method for this case. As shown, first, the motion information of all macroblocks of the corresponding field in the base layer is copied to create a virtual field, and upsampling is then performed on the virtual field. The upsampling is performed at a rate that makes the resolution of the base layer equal to that of the current layer. Further, the motion information of each macroblock of the enlarged picture is constructed based on the motion information of each macroblock of the created virtual field using one of a variety of known methods. The method of sub-case iv) of case V above is applied to perform inter-layer motion prediction, because each macroblock of the picture of the temporary base layer constructed in this way is a field macroblock in a field picture, and each macroblock of the current layer is also a field macroblock in a field picture.
Although in the description of the embodiments of figs. 14a to 14k the upsampling is performed using texture information of the virtual field or frame of the temporary layer rather than texture information of a picture of the base layer, texture information of a base layer picture may also be used for the upsampling. Further, when only the motion information of the temporary-layer picture is needed for the inter-layer motion prediction performed in a subsequent stage, the interpolation of texture information may be omitted from the above upsampling process.
On the other hand, although the description of texture prediction has been given for the case where the base layer and the enhancement layer have the same spatial resolution, the two layers may have different spatial resolutions as described above. When the resolution of the enhancement layer is higher than that of the base layer, a base layer picture having the same resolution as the enhancement layer picture is first created, and a texture prediction method corresponding to one of the above cases I-V is then selected for each macroblock in that picture to perform prediction coding. A procedure for making the resolution of the base layer picture equal to that of the enhancement layer picture will now be described in detail.
When two layers for inter-layer prediction are considered, the combined number of picture formats (progressive and interlaced formats) for coding between the two layers is 4, because there are two video signal scanning methods, one is progressive and the other is interlaced. Therefore, a method of increasing the resolution of the base layer picture to perform inter-layer texture prediction will be described separately for each of these four cases.
T_A) Case where the enhancement layer is progressive and the base layer is interlaced
Fig. 15a illustrates an embodiment of a method of using a base layer picture for inter-layer texture prediction in this case. As shown, the base layer picture 1501 that temporally corresponds to the picture 1500 of the current (enhancement) layer includes even and odd fields that are output at different times. Therefore, the EL encoder 20 first divides the picture of the base layer into even and odd fields (S151). An intra-mode macroblock of the base layer picture 1501 carries original image data not yet encoded (or image data that has already been decoded), and an inter-mode macroblock carries encoded residual data (or decoded residual data) for residual prediction. The same holds for base layer macroblocks and pictures wherever texture prediction is described below.
After dividing the corresponding picture 1501 into field components, the EL encoder 20 interpolates the separated fields 1501a and 1501b in the vertical and/or horizontal directions to create enlarged even and odd pictures 1502a and 1502b (S152). The interpolation uses one of a number of known methods, such as 6-tap filtering and bilinear filtering. The vertical and horizontal ratios for increasing the resolution (i.e., size) of the picture by interpolation equal the vertical and horizontal ratios of the size of the enhancement layer picture 1500 to the size of the base layer picture 1501, and the two ratios may be equal to each other. For example, if the resolution ratio between the enhancement layer and the base layer is 2, interpolation is performed on the separated even and odd fields 1501a and 1501b to create one more pixel between every two pixels of each field in the vertical and horizontal directions.
Once the interpolation is completed, the enlarged even and odd fields 1502a and 1502b are combined to construct a picture 1503 (S153). In this combination, lines of the enlarged even and odd fields 1502a and 1502b are alternately selected (1502a -> 1502b -> 1502a -> ...) and arranged in the selected order to construct the combined picture 1503. Here, the block mode of each macroblock in the combined picture 1503 is determined; for example, it is set equal to the block mode of the macroblock in the base layer picture 1501 that contains a region with the same image component. This determination method is applicable to any enlarged picture described below. Since the combined picture 1503 constructed in this way has the same spatial resolution as the current picture 1500 of the enhancement layer, texture prediction (e.g., frame-to-frame inter-macroblock texture prediction) of the macroblocks in the current progressive picture 1500 is performed based on the corresponding macroblocks of the combined picture 1503 (S154).
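For the 2x-resolution example above, the separate-interpolate-recombine path could be sketched as follows (a naive line-doubling interpolator stands in for the 6-tap or bilinear filters, and only vertical processing is shown):

```python
# T_A) texture path: split the base picture into its two fields (S151),
# interpolate each field to the enhancement size (S152), and re-interleave
# the enlarged fields line by line (S153).

def split_fields(picture):                # picture: list of sample rows
    return picture[0::2], picture[1::2]   # even field, odd field

def interpolate_x2_vertical(field):
    """Nearest-neighbour stand-in for a real interpolation filter."""
    out = []
    for row in field:
        out.append(row)
        out.append(row[:])                # duplicated line
    return out

def upsample_for_texture_prediction(base_picture):
    even, odd = split_fields(base_picture)          # S151
    even_up = interpolate_x2_vertical(even)         # S152
    odd_up = interpolate_x2_vertical(odd)
    combined = []
    for e_row, o_row in zip(even_up, odd_up):       # S153: interleave
        combined.append(e_row)
        combined.append(o_row)
    return combined
```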
Fig. 15b illustrates a method of using a base layer picture in inter-layer texture prediction according to another embodiment of the present invention. As shown, this embodiment does not separate the base layer picture on the basis of field properties (parity), but directly performs interpolation of the base layer picture including even and odd fields output at different times in the vertical and/or horizontal directions (S155) to construct an enlarged picture having the same resolution (i.e., size) as that of the enhancement layer picture. The enlarged picture constructed in this way is used to perform inter-layer texture prediction of the current progressive picture of the enhanced layer (S156).
Fig. 15a shows, at the picture level, the procedure of separating a picture into its even and odd fields and interpolating them. However, the EL encoder 20 can achieve the same result by performing the procedure of fig. 15a at the macroblock level. More specifically, when the base layer having even and odd fields has been MBAFF coded, a vertically adjacent macroblock pair in picture 1501 that is co-located with the macroblock pair currently undergoing texture prediction coding in the enhancement layer picture can carry the video signals of the even and odd field components as in fig. 16a or 16b. Fig. 16a shows the frame MB pair mode, in which even and odd field components are interleaved in each of the pair of macroblocks A and B, and fig. 16b shows the field MB pair mode, in which each of the pair of macroblocks A and B includes video lines having the same field attribute.
In the case of fig. 16a, to apply the method shown in fig. 15a, the even lines of the pair of macroblocks A and B are selected to construct an even field block A', and the odd lines are selected to construct an odd field block B', thereby separating a macroblock pair in which even and odd field components are interleaved into two blocks A' and B' having even and odd field components, respectively. Interpolation is performed on each of the two blocks A' and B' separated in this manner to construct enlarged blocks. Texture prediction is performed using the data in the area of an enlarged block corresponding to a macroblock of intra_BL (intra base layer) or residual_prediction mode in the enhancement layer picture currently undergoing texture prediction coding. Although not shown in fig. 16a, the enlarged even and odd pictures 1502a and 1502b of fig. 15a can be constructed by repeating these operations for every macroblock pair and combining the individually enlarged blocks according to their field attributes.
In the case where the macroblock pair is divided based on the field attribute to construct each macroblock as in fig. 16b, the above-described separation procedure is a process of simply copying each macroblock from the macroblock pair to construct two separated macroblocks. The subsequent procedure is similar to the procedure described with reference to fig. 16a.
T_B) Case where the enhancement layer is interlaced and the base layer is progressive
Fig. 17a illustrates an embodiment of a method of using a base layer picture for inter-layer texture prediction in this case. As shown, the EL encoder 20 first constructs two pictures from the base layer picture 1701 corresponding to the current layer picture 1700 (S171). In one example method, the even lines of the picture 1701 are selected to construct one picture 1701a, and its odd lines are selected to construct the other picture 1701b. The EL encoder 20 then interpolates the two pictures 1701a and 1701b thus constructed in the vertical and/or horizontal directions to create two enlarged pictures 1702a and 1702b (S172). The interpolation uses one of a number of known methods, such as 6-tap filtering and bilinear filtering, as in case T_A). The ratios for increasing the resolution are also the same as those described in case T_A).
Once the interpolation is completed, the two enlarged pictures 1702a and 1702b are combined to construct a picture 1703 (S173). In this combination, lines of the two enlarged pictures 1702a and 1702b are alternately selected (1702a -> 1702b -> 1702a -> ...) and arranged in the selected order to construct the combined picture 1703. Since the combined picture 1703 constructed in this way has the same spatial resolution as the current picture 1700 of the enhancement layer, texture prediction (e.g., inter-frame macroblock texture prediction or the texture prediction described with reference to fig. 4g) of the macroblocks in the current interlaced picture 1700 is performed based on the corresponding macroblocks of the combined picture 1703 (S174).
Fig. 17b illustrates a method of using a base layer picture in inter-layer texture prediction according to another embodiment of the present invention. As shown, this embodiment does not divide the base layer picture into two pictures, but directly performs interpolation of the base layer picture in the vertical and/or horizontal direction (S175) to construct an enlarged picture having the same resolution (i.e., size) as the resolution (i.e., size) of the enhancement layer picture. The enlarged picture constructed in this way is used to perform inter-layer texture prediction of the current interlaced picture of the enhanced layer (S176).
Although the description of fig. 17a is also given at the picture level, the EL encoder 20 may perform the picture separation process at the macroblock level, as described in case T_A) above. When a vertically adjacent macroblock pair is treated like the single picture 1701, the macroblock-level procedure is similar to the separation and interpolation procedure shown in fig. 17a. A detailed description of the procedure is omitted here because it can be intuitively understood from fig. 17a.
T_C) Case where both the enhancement layer and the base layer are interlaced
Fig. 18 illustrates an embodiment of a method of using a base layer picture for inter-layer texture prediction in this case. As shown, the EL encoder 20 divides the base layer picture 1801 that temporally corresponds to the current layer picture 1800 into even and odd fields in the same manner as in case T_A) (S181). The EL encoder 20 then interpolates the separated fields 1801a and 1801b in the vertical and/or horizontal directions to create enlarged even and odd pictures 1802a and 1802b (S182), and combines the enlarged even and odd fields 1802a and 1802b to construct a picture 1803 (S183). The EL encoder 20 then performs inter-layer texture prediction (e.g., frame-to-frame inter-macroblock texture prediction or the texture prediction described with reference to fig. 4g) of the macroblocks (MBAFF-coded frame macroblock pairs) in the current interlaced picture 1800 based on the corresponding macroblocks of the combined picture 1803 (S184).
Although both layers have the same picture format, the EL encoder 20 separates the base layer picture 1801 on the basis of field parity (S181), enlarges the separated fields individually (S182), and then combines the enlarged fields (S183). The reason is that if the picture 1801, which combines even and odd fields, were interpolated directly while the video signals of its even and odd fields differ greatly, the enlarged picture could contain distorted images (e.g., images with stretched boundaries) compared to the interlaced picture 1800 of the enhancement layer with its interleaved even and odd fields. Accordingly, even when both layers are interlaced, the EL encoder 20 according to the present invention separates the base layer picture into its two fields on the basis of field parity, enlarges the two fields individually, and then combines the enlarged fields.
Of course, the method shown in fig. 18 may not always be used when the pictures of both layers are interlaced, but may instead be selectively used according to the video characteristics of the pictures.
Fig. 18 shows, at the picture level, the procedure of separating and enlarging a picture having even and odd fields on the basis of the field property according to the present invention. However, as described in case T_A) above, the EL encoder 20 may achieve the same result by performing the procedure of fig. 18 at the macroblock level. This comprises the macroblock-based separation and interpolation process described with reference to figs. 16a and 16b (specifically, dividing a frame macroblock into a block of even lines and a block of odd lines and individually enlarging the separated blocks) and the subsequent combination and inter-layer texture prediction process (specifically, alternately selecting lines of the enlarged blocks to construct a pair of enlarged blocks and performing texture prediction of a frame macroblock pair of the current layer using the constructed pair of enlarged blocks).
T_D) Case where both the enhancement and base layers are progressive
In this case, the base layer picture is enlarged to the same size as the enhancement layer picture, and the enlarged picture is used for inter-layer texture prediction of the current enhancement layer picture having the same picture format.
Although the embodiments of texture prediction described above assume that the base layer and the enhancement layer have the same temporal resolution, the two layers may have different temporal resolutions, i.e., different picture rates. Even when the layers have the same temporal resolution, if their pictures are of different picture scanning types, pictures with the same POC (i.e., pictures that temporally correspond to each other) may contain video signals having different output times. The inter-layer texture prediction method in this case will now be described. In the following description, it is first assumed that the two layers have the same spatial resolution. If the two layers have different spatial resolutions, the methods described below are applied after each picture of the base layer is up-sampled, as described above, to make its spatial resolution equal to that of the enhancement layer.
a) Case where the enhancement layer includes progressive frames, the base layer includes MBAFF frames, and the temporal resolution of the enhancement layer is twice that of the base layer
Fig. 19a shows an inter-layer texture prediction method for this case. As shown, each MBAFF frame of the base layer includes even and odd fields having different output times, and thus the EL encoder 20 divides each MBAFF frame into its even and odd fields (S191): the even field component (the even lines) of each MBAFF frame becomes the even field, and the odd field component (the odd lines) becomes the odd field. After dividing the MBAFF frame into two fields in this way, the EL encoder 20 interpolates each field in the vertical direction to double its vertical resolution (S192). The interpolation uses one of a number of known methods, such as 6-tap filtering, bilinear filtering, or sample line zero padding. Once the interpolation is completed, each frame of the enhancement layer has a temporally coincident picture in the base layer, and thus the EL encoder 20 performs known inter-layer texture prediction (e.g., frame-to-frame inter-macroblock prediction) on the macroblocks of each frame of the enhancement layer (S193).
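The field separation (S191) and the doubling of the vertical resolution (S192) can be sketched as follows; linear interpolation between neighbouring lines stands in for the 6-tap or other filters an actual codec would use, and all names are illustrative:

```python
import numpy as np

def mbaff_frame_to_fields(frame: np.ndarray):
    """Split an MBAFF frame into its even and odd fields (cf. S191)."""
    return frame[0::2, :], frame[1::2, :]

def upsample_field_2x(field: np.ndarray) -> np.ndarray:
    """Double the vertical resolution of a field (cf. S192)."""
    height, width = field.shape
    out = np.empty((2 * height, width), dtype=np.float64)
    out[0::2, :] = field                                   # original lines
    out[1:-1:2, :] = (field[:-1, :] + field[1:, :]) / 2.0  # interpolated lines
    out[-1, :] = field[-1, :]                              # repeat the last line
    return out
```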
The above procedure may also be applied to inter-layer motion prediction. Here, when the MBAFF frame is divided into two fields, the EL encoder 20 copies the motion information of each macroblock of a field macroblock pair in the MBAFF frame to the macroblock of the field having the same field attribute (parity), and uses it for inter-layer motion prediction. In this way, even when there is no temporally coincident picture in the base layer (at times t1, t3, ...), a temporally coincident picture can be created by the above method so that inter-layer motion prediction can be performed.
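A minimal sketch of this parity-based copy, assuming each macroblock's motion information is held in a dict with 'ref_idx' and 'mv' keys and that the top field macroblock carries the even parity (both assumptions are illustrative, not taken from the specification):

```python
def copy_pair_motion_by_parity(top_field_mb: dict, bottom_field_mb: dict) -> dict:
    """Copy the motion information of an MBAFF field macroblock pair to
    the macroblocks of the separated fields with the same parity."""
    return {
        "even_field_mb": dict(top_field_mb),    # same parity as the top field
        "odd_field_mb": dict(bottom_field_mb),  # same parity as the bottom field
    }
```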
The above method is directly applicable not only when the temporal resolution of one of the two layers is twice that of the other layer, as in the example of fig. 19a, but also when it is N times (three or more times) higher. For example, when the temporal resolution is three times as high, one of the two separated fields may additionally be copied to construct and use three fields; when it is four times as high, each of the two separated fields may be copied once more to construct and use four fields. It is obvious that, for any temporal resolution difference, a person skilled in the art can perform inter-layer prediction by simply applying the principles of the present invention, without any inventive effort. Therefore, any method for prediction between layers having different temporal resolutions that is not described in the present specification naturally falls within the scope of the present invention. The same holds for the other cases described below.
If the base layer has been coded in a picture adaptive frame/field (PAFF) manner rather than as MBAFF frames, the two layers may have the same temporal resolution, as in fig. 19b. In this case, therefore, the process of dividing a frame into two fields is unnecessary, and inter-layer texture prediction is performed after constructing a picture temporally corresponding to the current layer picture by directly interpolating each base layer picture.
b) Case where the enhancement layer includes MBAFF frames, the base layer includes progressive frames, and the temporal resolution of the enhancement layer is half that of the base layer
Fig. 20 shows an inter-layer texture prediction method for this case. As shown, each MBAFF frame of the enhancement layer includes even and odd fields having different output times, and thus the EL encoder 20 divides each MBAFF frame into its even and odd fields (S201): the even field component (the even lines) of each MBAFF frame becomes the even field, and the odd field component (the odd lines) becomes the odd field. The EL encoder 20 also sub-samples each frame of the base layer in the vertical direction to construct a picture with half the vertical resolution (S202). The sub-sampling may use line sub-sampling or one of various other known down-sampling methods; in the example of fig. 20, the EL encoder 20 selects the even lines of each picture with an even picture index (pictures at t0, t2, t4, ...) and the odd lines of each picture with an odd picture index (pictures at t1, t3, ...) to obtain the vertically halved pictures. The frame separation (S201) and the sub-sampling (S202) may also be performed in reverse order.
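The parity-dependent line selection of step S202 can be sketched as follows, assuming each frame is a NumPy array and the picture index counts t0, t1, t2, ... (names are illustrative):

```python
def subsample_base_frame(frame, picture_index: int):
    """Vertically halve a progressive base layer frame (cf. S202).

    Even-indexed pictures keep their even lines, odd-indexed pictures
    keep their odd lines, so each sub-sampled picture matches the
    parity of the enhancement layer field it is compared against.
    """
    parity = picture_index % 2
    return frame[parity::2, :]
```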
Upon completion of the two processes S201 and S202, each field 2001 separated from a frame of the enhancement layer has, in the base layer, a picture that temporally coincides with it and has the same spatial resolution, so the EL encoder 20 performs known inter-layer texture prediction (e.g., frame-to-frame inter-macroblock prediction) on the macroblocks of each field (S203).
The above procedure can also be applied to inter-layer motion prediction. Here, when a reduced-size picture is obtained from each frame of the base layer through sub-sampling (S202), the EL encoder 20 may derive the motion information of the corresponding macroblock from the motion information of each macroblock of a vertically adjacent macroblock pair according to a suitable method (e.g., a method of adopting the motion information of a block that is not completely divided), and then use the derived motion information for inter-layer motion prediction.
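One plausible, purely illustrative reading of "a block that is not completely divided" is to take over the motion information of whichever macroblock of the pair has the coarser partitioning; the dict layout below ('partitions', plus whatever motion fields the macroblock carries) is an assumption, not part of the specification:

```python
def derive_subsampled_mb_motion(top_mb: dict, bottom_mb: dict) -> dict:
    """Pick motion information for the macroblock corresponding to a
    vertically adjacent macroblock pair after sub-sampling (cf. S202)."""
    # Prefer the less finely partitioned macroblock of the pair.
    if len(top_mb["partitions"]) <= len(bottom_mb["partitions"]):
        return dict(top_mb)
    return dict(bottom_mb)
```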
In this case, the picture of the enhancement layer is PAFF-coded for transmission because inter-layer prediction is performed on each field picture 2001 separated from the MBAFF frame.
c) Case where the enhancement layer includes MBAFF frames, the base layer includes progressive frames, and both layers have the same temporal resolution
Fig. 21 shows an inter-layer texture prediction method for this case. As shown, each MBAFF frame of the enhancement layer includes even and odd fields having different output times, and thus the EL encoder 20 divides each MBAFF frame into its even and odd fields (S211): the even field component (the even lines) of each MBAFF frame becomes the even field, and the odd field component (the odd lines) becomes the odd field. The EL encoder 20 also sub-samples each frame of the base layer in the vertical direction to construct a picture with half the vertical resolution (S212). The sub-sampling may use line sub-sampling or one of various other known down-sampling methods. The frame separation (S211) and the sub-sampling (S212) may also be performed in reverse order.
Instead of dividing each MBAFF frame into two fields, the EL encoder 20 may also construct only one field picture (e.g., the even field picture) from it. This is because the two layers have the same temporal resolution, so only one (not both) of the two field pictures separated from a frame has, in the base layer, a corresponding frame that can be used for inter-layer prediction.
Upon completion of the two processes S211 and S212, the EL encoder 20 performs known inter-layer texture prediction (e.g., frame-to-frame inter-macroblock prediction), based on the corresponding sub-sampled pictures in the base layer, on only the even (or odd) fields among the fields separated from the frames of the enhancement layer (S213).
Also in this case, inter-layer motion prediction may be performed, in the same manner as described in case b), on the separated fields of the enhancement layer for which inter-layer texture prediction is performed.
Although the above description is given in terms of the inter-layer prediction operations performed by the EL encoder 20 of fig. 2a or 2b, all of these descriptions apply equally to an EL decoder that receives decoded information from the base layer and decodes the enhancement layer stream. In the encoding and decoding procedures, the inter-layer prediction operations described above (including the operations for separating, enlarging, and combining video signals in pictures or macroblocks) are performed in the same manner; only the operations after inter-layer prediction differ. An example of this difference: after performing motion and texture prediction, the encoder codes the predicted information, or the difference between the predicted information and the actual information, for transmission to the decoder, whereas the decoder obtains the actual motion and texture information of the current macroblock by directly applying the information obtained from the same inter-layer motion and texture prediction as performed at the encoder, or by additionally using the actually received difference information. The details and principles of the invention described above from the encoding point of view thus apply directly to a decoder that decodes a received two-layer data stream.
However, when the EL encoder transmits an enhancement layer having MBAFF frames in a PAFF manner after separating it into field sequences and performing inter-layer prediction as described with reference to figs. 20 and 21, the decoder does not perform the above-described procedure of dividing MBAFF frames into field pictures on the currently received layer.
In addition, the decoder decodes from the received signal a flag 'field_base_flag' identifying whether the EL encoder 20 has performed inter-layer texture prediction between macroblocks as shown in fig. 8d or as shown in fig. 8h. Based on the decoded flag value, the decoder determines whether prediction between macroblocks was performed as in fig. 8d or as in fig. 8h, and acquires the texture prediction information accordingly. If the flag 'field_base_flag' is not received, the EL decoder assumes that a flag with the value "0" has been received; that is, it assumes that texture prediction between macroblocks was performed according to the method of fig. 8d, and obtains the prediction information of the current macroblock pair to reconstruct the current macroblock or macroblock pair.
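The default rule for the absent flag can be sketched as follows; the function is a non-normative illustration in which `raw_flag` is the decoded flag value, or None when the flag is absent from the bitstream:

```python
def effective_field_base_flag(raw_flag):
    """Apply the inference rule for field_base_flag.

    An absent flag is treated as 0, i.e. texture prediction between
    macroblocks as shown in fig. 8d; a value of 1 selects fig. 8h.
    """
    return 0 if raw_flag is None else raw_flag
```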
At least one of the above-described embodiments of the present invention enables inter-layer prediction even when video signal sources of different formats (or modes) are used. Accordingly, when a plurality of layers are coded, the data coding rate can be increased regardless of the picture type of the video signal, such as an interlaced signal, a progressive signal, an MBAFF-frame picture, or a field picture. Further, when one of the two layers is an interlaced video signal source, the image of the picture used for prediction can be constructed to be more similar to the original image to be prediction-coded, thereby improving the data coding rate.
Although the present invention has been described with reference to preferred embodiments, it will be apparent to those skilled in the art that various improvements, modifications, substitutions, and additions may be made therein without departing from the scope and spirit of the invention. Thus, it is intended that the present invention cover the modifications, alterations, substitutions and additions of the invention provided they come within the scope of the appended claims and their equivalents.
Claims (8)
1. A method for decoding a video signal, the method comprising the steps of:
deriving virtual position information for a corresponding pair of field macroblocks corresponding to the pair of field macroblocks in the enhancement layer;
deriving position information for a reference frame macroblock pair in a base layer based on the virtual position information for the corresponding pair of field macroblocks;
predicting motion information of the pair of field macroblocks in the enhancement layer from motion information of the reference frame macroblock pair according to the position information of the reference frame macroblock pair; and
decoding the pair of field macroblocks in the enhancement layer using the motion information of the pair of field macroblocks in the enhancement layer,
wherein the motion information of the field macroblock pair and the reference frame macroblock pair comprises a reference index and a motion vector, the field macroblock pair is two vertically adjacent field macroblocks, and the reference frame macroblock pair is two vertically adjacent frame macroblocks.
2. The method of claim 1, wherein the respective pair of field macroblocks is comprised of top field macroblocks and bottom field macroblocks, an upper portion of the top field macroblocks being comprised of even lines of top macroblocks of the pair of reference frame macroblocks, a lower portion of the top field macroblocks being comprised of even lines of bottom macroblocks of the pair of reference frame macroblocks, an upper portion of the bottom field macroblocks being comprised of odd lines of top macroblocks of the pair of reference frame macroblocks, and a lower portion of the bottom field macroblocks being comprised of odd lines of bottom macroblocks of the pair of reference frame macroblocks.
3. The method of claim 1, wherein when the pair of field macroblocks are predicted based on intra-coded macroblocks, motion vectors of the pair of field macroblocks are predicted to be zero,
wherein the intra-coded macroblock is included in the reference frame macroblock pair and the motion vector of the field macroblock pair is predicted in units of 4x4 blocks.
4. The method of claim 1, wherein the pair of field macroblocks in the enhancement layer is included in a macroblock adaptive frame/field (MBAFF) frame of the enhancement layer, the MBAFF frame being a frame that includes macroblocks adaptively decoded as frame macroblocks or field macroblocks.
5. An apparatus for decoding a video signal, the apparatus comprising:
a decoding unit for deriving virtual position information of a corresponding pair of field macroblocks corresponding to the pair of field macroblocks in the enhancement layer; deriving position information of a reference frame macroblock pair in the base layer based on the virtual position information of the corresponding pair of field macroblocks; predicting motion information of the pair of field macroblocks in the enhancement layer from the motion information of the reference frame macroblock pair according to the position information of the reference frame macroblock pair; and decoding the pair of field macroblocks in the enhancement layer using the motion information of the pair of field macroblocks in the enhancement layer,
wherein the motion information of the field macroblock pair and the reference frame macroblock pair includes a reference index and a motion vector, the field macroblock pair is two vertically adjacent field macroblocks, and the reference frame macroblock pair is two vertically adjacent frame macroblocks.
6. The apparatus of claim 5, wherein the respective pair of field macroblocks is comprised of top field macroblocks and bottom field macroblocks, an upper portion of the top field macroblocks being comprised of even lines of top macroblocks of the pair of reference frame macroblocks, a lower portion of the top field macroblocks being comprised of even lines of bottom macroblocks of the pair of reference frame macroblocks, an upper portion of the bottom field macroblocks being comprised of odd lines of top macroblocks of the pair of reference frame macroblocks, and a lower portion of the bottom field macroblocks being comprised of odd lines of bottom macroblocks of the pair of reference frame macroblocks.
7. The apparatus of claim 5, wherein when the pair of field macroblocks are predicted based on intra-coded macroblocks, motion vectors of the pair of field macroblocks are predicted to be zero,
wherein the intra-coded macroblock is included in the reference frame macroblock pair and the motion vector of the field macroblock pair is predicted in units of 4x4 blocks.
8. The apparatus of claim 5, wherein the pair of field macroblocks in the enhancement layer is included in a macroblock adaptive frame/field (MBAFF) frame of the enhancement layer, the MBAFF frame being a frame that includes macroblocks adaptively decoded as frame macroblocks or field macroblocks.
Applications Claiming Priority (29)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US75700906P | 2006-01-09 | 2006-01-09 | |
| US60/757,009 | 2006-01-09 | ||
| US75823506P | 2006-01-12 | 2006-01-12 | |
| US60/758,235 | 2006-01-12 | ||
| US77693506P | 2006-02-28 | 2006-02-28 | |
| US60/776,935 | 2006-02-28 | ||
| US78339506P | 2006-03-20 | 2006-03-20 | |
| US60/783,395 | 2006-03-20 | ||
| US78674106P | 2006-03-29 | 2006-03-29 | |
| US60/786,741 | 2006-03-29 | ||
| US78749606P | 2006-03-31 | 2006-03-31 | |
| US60/787,496 | 2006-03-31 | ||
| US81634006P | 2006-06-26 | 2006-06-26 | |
| US60/816,340 | 2006-06-26 | ||
| US83060006P | 2006-07-14 | 2006-07-14 | |
| US60/830,600 | 2006-07-14 | ||
| KR10-2006-0111893 | 2006-11-13 | ||
| KR10-2006-0111895 | 2006-11-13 | ||
| KR1020060111897A KR20070074453A (en) | 2006-01-09 | 2006-11-13 | Method for encoding and decoding video signal |
| KR1020060111895A KR20070074452A (en) | 2006-01-09 | 2006-11-13 | Inter-layer prediction method when encoding / decoding video signal |
| KR10-2006-0111894 | 2006-11-13 | ||
| KR10-2006-0111897 | 2006-11-13 | ||
| KR1020060111893A KR20070075257A (en) | 2006-01-12 | 2006-11-13 | Inter-layer motion prediction method during encoding / decoding of video signal |
| KR1020060111894A KR20070074451A (en) | 2006-01-09 | 2006-11-13 | How to use video signal of base layer for inter-layer prediction |
| KR10-2007-0001582 | 2007-01-05 | ||
| KR10-2007-0001587 | 2007-01-05 | ||
| KR1020070001587A KR20070075293A (en) | 2006-01-12 | 2007-01-05 | Inter-layer motion prediction method during encoding / decoding of video signal |
| KR1020070001582A KR20070095180A (en) | 2006-03-20 | 2007-01-05 | Inter-layer prediction method according to picture type during encoding / decoding of video signal |
| PCT/KR2007/000140 WO2007081133A1 (en) | 2006-01-09 | 2007-01-09 | Inter-layer prediction method for video signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1129013A1 HK1129013A1 (en) | 2009-11-13 |
| HK1129013B true HK1129013B (en) | 2011-10-07 |