
HK1118663B - Method and apparatus for weighted prediction for scalable video coding


Info

Publication number
HK1118663B
Authority
HK
Hong Kong
Prior art keywords
reference picture
enhancement layer
lower layer
picture
layer
Prior art date
Application number
HK08112464.4A
Other languages
Chinese (zh)
Other versions
HK1118663A1 (en)
Inventor
Peng Yin
Jill MacDonald Boyce
Purvin Bibhas Pandit
Original Assignee
Thomson Licensing Trading S.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing Trading S.A.
Priority claimed from PCT/US2006/019510 external-priority patent/WO2007018669A1/en
Publication of HK1118663A1 publication Critical patent/HK1118663A1/en
Publication of HK1118663B publication Critical patent/HK1118663B/en


Description

Scalable video decoding method and apparatus for weighted prediction
Reference to related applications
This application claims the benefit of U.S. Provisional Application Ser. No. 60/701,464, entitled "METHOD AND APPARATUS FOR WEIGHTED PREDICTION FOR SCALABLE VIDEO CODING", filed July 21, 2005, which is hereby incorporated by reference in its entirety.
Technical Field
The present invention relates generally to video encoding and decoding and, more particularly, to methods and apparatus for weighted prediction for scalable video encoding and decoding.
Background
The International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) H.264 standard (hereinafter the "MPEG-4 AVC/H.264 standard" or simply the "H.264 standard") is the first international video coding standard to include a Weighted Prediction (WP) tool. Weighted prediction was adopted to improve coding efficiency. The Scalable Video Coding (SVC) standard, developed as an amendment of the H.264 standard, also adopts weighted prediction. However, the SVC standard does not explicitly specify the relationship between the weights in the base layer and those in its enhancement layers.
Weighted prediction is supported in the Main, Extended, and High profiles of the H.264 standard. The use of WP is indicated in the picture parameter set for P and SP slices using the weighted_pred_flag field, and for B slices using the weighted_bipred_idc field. There are two WP modes: explicit mode and implicit mode. Explicit mode is supported in P, SP, and B slices. Implicit mode is supported only in B slices.
A single weighting factor and offset are associated with each reference picture index for each color component in each slice. In explicit mode, these WP parameters may be coded in the slice header. In implicit mode, these parameters are derived based on the relative distances of the current picture from its reference pictures.
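For illustration, the explicit-mode weighting of a single unidirectionally predicted luma sample under the H.264 standard can be sketched in C as follows; the function names are illustrative, while the rounding and clipping follow the standard's weighted sample prediction equations:

/* Clip1: clip to the 8-bit sample range. */
static inline int clip1(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

/* Explicit weighted prediction for one unidirectionally predicted luma
 * sample: w and offset come from pred_weight_table() for the reference
 * picture index in use; logWD is luma_log2_weight_denom. */
static int weighted_sample(int ref_sample, int w, int offset, int logWD)
{
    if (logWD >= 1)
        return clip1(((ref_sample * w + (1 << (logWD - 1))) >> logWD) + offset);
    return clip1(ref_sample * w + offset);
}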
For each macroblock or macroblock partition, the weighting parameters applied are based on the reference picture index (or indices in the case of bi-prediction) of the current macroblock or macroblock partition. The reference picture index is either coded in the bitstream or can be derived, for example, for macroblocks in skip or direct mode. Since the reference picture index is already available from other required bitstream fields, signaling which weighting parameters to apply through the reference picture index is bit-rate efficient compared with requiring a dedicated weighting parameter index in the bitstream.
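In implicit mode no parameters are transmitted; the weights for a bi-predicted macroblock are derived from picture order count (POC) distances. A minimal C sketch of this derivation follows, omitting the long-term-reference and out-of-range fallbacks with which the standard reverts to the default weights:

#include <stdlib.h>

static int clip3(int lo, int hi, int x) { return x < lo ? lo : (x > hi ? hi : x); }

/* Implicit weights for bi-prediction: poc_cur is the POC of the current
 * picture, poc0/poc1 the POCs of the list-0/list-1 references. Produces
 * w0 + w1 == 64, used with a log weight denominator of 5. */
static void implicit_weights(int poc_cur, int poc0, int poc1, int *w0, int *w1)
{
    if (poc0 == poc1) {                 /* equal distances: default weights */
        *w0 = *w1 = 32;
        return;
    }
    int tb = clip3(-128, 127, poc_cur - poc0);
    int td = clip3(-128, 127, poc1 - poc0);
    int tx = (16384 + abs(td / 2)) / td;
    int dist_scale = clip3(-1024, 1023, (tb * tx + 32) >> 6);
    *w1 = dist_scale >> 2;
    *w0 = 64 - *w1;
}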
A number of different scalability techniques have been extensively studied and standardized, including SNR scalability, spatial scalability, temporal scalability, and fine-granularity scalability, either in the scalability profiles of the MPEG-2 and H.264 standards or in the scalable amendment of the H.264 standard currently under development.
For spatial, temporal, and SNR scalability, a large degree of inter-layer prediction is incorporated. Intra and inter macroblocks can be predicted using the corresponding signals of previous layers. Moreover, the motion description of each layer can be used for the prediction of the motion description of subsequent enhancement layers. These techniques fall into three categories: inter-layer intra texture prediction, inter-layer motion prediction, and inter-layer residue prediction.
In Joint Scalable Video Model (JSVM) 2.0, an enhancement layer macroblock can exploit inter-layer prediction using scalable base layer motion data, using either "BASE_LAYER_MODE" or "QPEL_REFINEMENT_MODE", as in the case of dyadic (two-layer) spatial scalability. When inter-layer motion prediction is used, the motion vectors of the corresponding (upsampled) macroblocks in the previous layer, including their reference picture indices and the associated weighting parameters, are used for motion prediction. If the enhancement layer and its previous layer have different values of pred_weight_table(), different sets of weighting parameters must be stored for the same reference picture in the enhancement layer.
Disclosure of Invention
These and other drawbacks and disadvantages of the prior art are addressed by the present invention, which is directed to methods and apparatus for weighted prediction for scalable video encoding and decoding.
According to an aspect of the present invention, there is provided a scalable video decoder. The scalable video decoder includes a decoder for decoding a block in an enhancement layer of a picture by applying to an enhancement layer reference picture the same weighting parameter that is applied to a lower layer reference picture for decoding a block in a lower layer of the picture. The block in the enhancement layer corresponds to the block in the lower layer, and the enhancement layer reference picture corresponds to the lower layer reference picture.
According to another aspect of the present invention, there is provided a scalable video decoding method. The method comprises the following steps: a block in an enhancement layer of a picture is decoded by applying to an enhancement layer reference picture the same weighting parameter that is applied to a lower layer reference picture for decoding a block in a lower layer of the picture. The block in the enhancement layer corresponds to the block in the lower layer, and the enhancement layer reference picture corresponds to the lower layer reference picture.
According to still another aspect of the present invention, there is provided a storage medium having scalable video signal data encoded thereon, the scalable video signal data including a block encoded in an enhancement layer of a picture, the block being generated by applying the same weighting parameter to an enhancement layer reference picture as that applied to a lower layer reference picture for encoding a block in a lower layer of the picture. The block in the enhancement layer corresponds to the block in the lower layer, and the enhancement layer reference picture corresponds to the lower layer reference picture.
These and other aspects, features and advantages of the present invention will become apparent from the detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
Drawings
The invention may be better understood in light of the following exemplary figures in which:
FIG. 1 illustrates a block diagram of an exemplary Joint Scalable Video Model (JSVM) 2.0 encoder to which the principles of the present invention may be applied;
FIG. 2 shows a block diagram of an exemplary decoder to which the principles of the present invention may be applied;
FIG. 3 is a flow diagram of an exemplary method for scalable video encoding of an image block using weighted prediction, according to an exemplary embodiment of the present invention;
FIG. 4 is a flowchart of an exemplary method for scalable video decoding of image blocks using weighted prediction according to an exemplary embodiment of the present invention;
FIG. 5 is a flowchart of an exemplary method for decoding level_idc and profile_idc syntax, according to an exemplary embodiment of the present invention;
fig. 6 is a flowchart of an exemplary method for decoding a weighted prediction constraint of an enhancement layer according to an exemplary embodiment of the present invention.
Detailed Description
The present invention relates to scalable video encoding and decoding methods and apparatuses for weighted prediction.
In accordance with the principles of the present invention, methods and apparatus are disclosed for reusing base layer weighting parameters for enhancement layer weighted prediction. Advantageously, embodiments in accordance with the present principles reduce the cost and/or complexity of encoders and decoders. Furthermore, embodiments in accordance with the present principles may also save bits at very low bit rates.
The present description illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Further, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor ("DSP") hardware, read-only memory ("ROM") for storing software, random access memory ("RAM"), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, all switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that: the functions provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
According to an embodiment of the invention, a method and apparatus are disclosed for reusing base layer weighting parameters for the enhancement layer. Since the base layer is simply a downsampled version of the enhancement layer, it is beneficial for the enhancement layer and the base layer to share the same set of weighting parameters for the same reference picture.
Moreover, the present principles provide other advantages/features. One advantage/feature is that only one set of weighting parameters needs to be stored for each enhancement layer, which saves memory. Furthermore, when inter-layer motion prediction is used, the decoder needs to know which set of weighting parameters to use; a look-up table can store the necessary information.
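As a minimal sketch of such a look-up table (the layout and names below are assumptions for illustration, not anything prescribed by the JSVM software), the weighting parameters can be indexed per layer, prediction list, and reference picture index:

#define MAX_LAYERS 8
#define MAX_REF_IDX 16

typedef struct {
    int luma_weight, luma_offset;
    int chroma_weight[2], chroma_offset[2];
} WPParams;

/* One parameter set per (layer, list, reference index). With base layer
 * reuse, each enhancement layer entry simply repeats the set inherited
 * from the layer below it, so a single table lookup suffices. */
static WPParams wp_table[MAX_LAYERS][2][MAX_REF_IDX];

static const WPParams *lookup_wp(int layer, int list, int ref_idx)
{
    return &wp_table[layer][list][ref_idx];
}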
Another advantage/feature is reduced encoder and decoder complexity. For the decoder, embodiments of the present invention may reduce the complexity of parsing and of the table lookup needed to determine the correct set of weighting parameters. For the encoder, embodiments of the present principles may reduce the complexity of running different weighting parameter estimation algorithms and deciding among them. When the update step is used and the prediction weights are taken into account, having multiple weighting parameters for the same reference picture index also makes the derivation of motion information in the inverse update step at the decoder and in the update step at the encoder more complex.
Yet another advantage/feature appears at very low bit rates: embodiments of the present principles may also gain slightly in coding efficiency, since the weighting parameters are not explicitly transmitted in the slice header of the enhancement layer.
Turning to fig. 1, an exemplary Joint Scalable Video Model version 2.0 (hereinafter "JSVM2.0") encoder to which the present invention may be applied is indicated generally by the reference numeral 100. The JSVM2.0 encoder 100 uses three spatial layers and motion compensated temporal filtering. The JSVM encoder 100 includes a two-dimensional (2D) decimator 104, a 2D decimator 106, and a motion compensated temporal filtering (MCTF) module 108, each of which has an input for receiving video signal data 102.
An output of the 2D decimator 106 is connected in signal communication with an input of an MCTF module 110. A first output of the MCTF module 110 is connected in signal communication with an input of a motion encoder 112, and a second output of the MCTF module 110 is connected in signal communication with an input of a prediction module 116. A first output of the motion encoder 112 is connected in signal communication with a first input of a multiplexer 114. A second output of the motion encoder 112 is connected in signal communication with a first input of a motion encoder 124. A first output of the prediction module 116 is connected in signal communication with an input of a spatial transformer 118. An output of the spatial transformer 118 is connected in signal communication with a second input of the multiplexer 114. A second output of the prediction module 116 is connected in signal communication with an input of an interpolator 120. An output of the interpolator 120 is connected in signal communication with a first input of a prediction module 122. A first output of the prediction module 122 is connected in signal communication with an input of a spatial transformer 126. An output of the spatial transformer 126 is connected in signal communication with a second input of the multiplexer 114. A second output of the prediction module 122 is connected in signal communication with an input of an interpolator 130. An output of the interpolator 130 is connected in signal communication with a first input of a prediction module 134. An output of the prediction module 134 is connected in signal communication with an input of a spatial transformer 136. An output of the spatial transformer 136 is connected in signal communication with a second input of the multiplexer 114.
An output of the 2D decimator 104 is connected in signal communication with an input of an MCTF module 128. A first output of the MCTF module 128 is connected in signal communication with a second input of the motion encoder 124. A first output of the motion encoder 124 is connected in signal communication with a first input of the multiplexer 114. A second output of the motion encoder 124 is connected in signal communication with a first input of a motion encoder 132. A second output of the MCTF module 128 is connected in signal communication with a second input of the prediction module 122.
A first output of the MCTF module 108 is connected in signal communication with a second input of the motion encoder 132. An output of the motion encoder 132 is connected in signal communication with a first input of the multiplexer 114. A second output of the MCTF module 108 is connected in signal communication with a second input of the prediction module 134. The output of the multiplexer 114 provides the output bitstream 138.
For each spatial layer, a motion compensated temporal decomposition is performed. This decomposition provides temporal scalability. Motion information from lower spatial layers can be used for motion prediction of higher layers. For texture coding, spatial prediction between successive spatial layers can be applied to remove redundancy. The residual signal resulting from intra-layer prediction or motion compensated inter-layer prediction is transform coded. A quality base layer residual provides the minimum reconstruction quality at each spatial layer. If no inter-layer prediction is applied, this quality base layer can be encoded as an H.264-standard-compliant stream. For quality scalability, quality enhancement layers are additionally encoded. These enhancement layers can be chosen to provide coarse-grain or fine-grain quality (SNR) scalability.
Turning to fig. 2, an exemplary scalable video decoder to which the present invention may be applied is indicated generally by the reference numeral 200. An input of the demultiplexer 202 is available as an input to the scalable video decoder 200, for receiving a scalable bitstream. A first output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 204. A first output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a prediction module 206. An output of the prediction module 206 is connected in signal communication with a first input of an inverse MCTF module 208.
A second output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a Motion Vector (MV) decoder 210. An output of the MV decoder 210 is connected in signal communication with a second input of the inverse MCTF module 208.
A second output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 212. A first output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of a prediction module 214. A first output of the prediction module 214 is connected in signal communication with an input of an interpolation module 216. An output of the interpolation module 216 is connected in signal communication with a second input of the prediction module 206. A second output of the prediction module 214 is connected in signal communication with a first input of an inverse MCTF module 218.
A second output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of an MV decoder 220. A first output of the MV decoder 220 is connected in signal communication with a second input of the MV decoder 210. A second output of the MV decoder 220 is connected in signal communication with a second input of the inverse MCTF module 218.
A third output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 222. A first output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of a prediction module 224. A first output of the prediction module 224 is connected in signal communication with an input of an interpolation module 226. An output of the interpolation module 226 is connected in signal communication with a second input of the prediction module 214.
A second output of the prediction module 224 is connected in signal communication with a first input of an inverse MCTF module 228. A second output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of an MV decoder 230. A first output of the MV decoder 230 is connected in signal communication with a second input of the MV decoder 220. A second output of the MV decoder 230 is connected in signal communication with a second input of the inverse MCTF module 228.
An output of the inverse MCTF module 228 is available as an output of the decoder 200, for outputting a layer 0 signal. An output of the inverse MCTF module 218 is available as an output of the decoder 200, for outputting a layer 1 signal. An output of the inverse MCTF module 208 is available as an output of the decoder 200, for outputting a layer 2 signal.
In a first exemplary embodiment in accordance with the present principles, no new syntax is used. In this first exemplary embodiment, the enhancement layer reuses the base layer weights. The first exemplary embodiment may be implemented, for example, as a profile or level constraint. The requirement may also be signaled in the sequence or picture parameter set.
In a second exemplary embodiment in accordance with the present principles, a syntax element, base_pred_weight_table_flag, is introduced in the slice header syntax of the scalable extension as shown in Table 1, so that the encoder can adaptively select, on a slice basis, which mode is used for weighted prediction (a decoder-side parsing sketch is given after the table). When base_pred_weight_table_flag is not present, it shall be inferred to be equal to 0. When base_pred_weight_table_flag is equal to 1, the enhancement layer reuses pred_weight_table() from its previous layer.
Table 1 shows a syntax for weighted prediction scalable video coding.
TABLE 1
slice_header_in_scalable_extension(){ C Descriptor
first_mb_in_slice 2 ue(v)
slice_type 2 ue(v)
pic_parameter_set_id 2 ue(v)
if(slice_type==PR){
num_mbs_in_slice_minus1 2 ue(v)
luma_chroma_sep_flag 2 u(1)
}
frame_num 2 u(v)
if(!frame_mbs_only_flag){
field_pic_flag 2 u(1)
if(field_pic_flag)
bottom_field_flag 2 u(1)
}
if(nal_unit_type==21)
idr_pic_id 2 ue(v)
if(pic_order_cnt_type==0){
pic_order_cnt_lsb 2 u(v)
if(pic_order_present_flag&&!field_pic_flag)
delta_pic_order_cnt_bottom 2 se(v)
}
if(pic_order_cnt_type==1&&!delta_pic_order_always_zero_flag){
delta_pic_order_cnt[0] 2 se(v)
if(pic_order_present_flag&&!field_pic_flag)
delta_pic_order_cnt[1] 2 se(v)
}
if(slice_type!=PR){
if(redundant_pic_cnt_present_flag)
redundant_pic_cnt 2 ue(v)
if(slice_type==EB)
direct_spatial_mv_pred_flag 2 u(1)
key_picture_flag 2 u(1)
decomposition_stages 2 ue(v)
base_id_plus1 2 ue(v)
if(base_id_plus1!=0){
adaptive_prediction_flag 2 u(1)
}
if(slice_type==EP||slice_type==EB){
num_ref_idx_active_override_flag 2 u(1)
if(num_ref_idx_active_override_flag){
num_ref_idx_l0_active_minus1 2 ue(v)
if(slice_type==EB)
num_ref_idx_l1_active_minus1 2 ue(v)
}
}
ref_pic_list_reordering() 2
for(decLvl=temporal_level;decLvl<decomposition_stages;decLvl++){
num_ref_idx_update_l0_active[decLvl+1] 2 ue(v)
num_ref_idx_update_l1_active[decLvl+1] 2 ue(v)
}
if((weighted_pred_flag&&slice_type==EP)||(weighted_bipred_idc==1&&slice_type==EB))
{
if((base_id_plus1!=0)&&(adaptive_prediction_flag==1))
base_pred_weight_table_flag 2 u(1)
if(base_pred_weight_table_flag==0)
pred_weight_table() 2
}
if(nal_ref_idc!=0)
dec_ref_pic_marking() 2
if(entropy_coding_mode_flag&&slice_type!=EI)
cabac_init_idc 2 ue(v)
}
slice_qp_delta 2 se(v)
if(deblocking_filter_control_present_flag){
disable_deblocking_filter_idc 2 ue(v)
if(disable_deblocking_filter_idc!=1){
slice_alpha_c0_offset_div2 2 se(v)
slice_beta_offset_div2 2 se(v)
}
}
if(slice_type!=PR)
if(num_slice_groups_minus1>0&&slice_group_map_type>=3&&slice_group_map_type<=5)
slice_group_change_cycle 2 u(v)
if(slice_type!=PR &&extended_spatial_scalability>0){
if(chroma_format_idc>0){
base_chroma_phase_x_plus1 2 u(2)
base_chroma_phase_y_plus1 2 u(2)
}
if(extended_spatial_scalability==2){
scaled_base_left_offset 2 se(v)
scaled_base_top_offset 2 se(v)
scaled_base_right_offset 2 se(v)
scaled_base_bottom_offset 2 se(v)
}
}
SpatialScalabilityType=spatial_scalability_type()
}
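The weighted-prediction branch of Table 1 can be read as in the following C sketch, where read_u1() and read_pred_weight_table() are hypothetical bitstream helpers rather than JSVM functions:

int read_u1(void);                  /* reads one u(1) bit         */
void read_pred_weight_table(void);  /* parses pred_weight_table() */

/* Mirrors the conditions guarding base_pred_weight_table_flag in
 * slice_header_in_scalable_extension() of Table 1. */
void parse_wp_part_of_slice_header(int weighted_pred_flag,
                                   int weighted_bipred_idc,
                                   int is_EP_slice, int is_EB_slice,
                                   int base_id_plus1,
                                   int adaptive_prediction_flag)
{
    int base_pred_weight_table_flag = 0;    /* inferred 0 when absent */
    if ((weighted_pred_flag && is_EP_slice) ||
        (weighted_bipred_idc == 1 && is_EB_slice)) {
        if (base_id_plus1 != 0 && adaptive_prediction_flag == 1)
            base_pred_weight_table_flag = read_u1();
        if (base_pred_weight_table_flag == 0)
            read_pred_weight_table();  /* explicit weights in the header */
        /* otherwise the weights are inherited from the lower layer */
    }
}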
For the decoder, when the enhancement layer reuses the weights from the base layer, a remapping of pred_weight_table() from the base (or previous) layer to pred_weight_table() in the current enhancement layer is performed. This process is used in the following cases: in the first case, the same reference picture index in the base layer and the enhancement layer indicates different reference pictures; in the second case, a reference picture used in the enhancement layer has no corresponding matching picture in the base layer. For the first case, the picture order count (POC) is used to map the weighting parameters from the base layer to the correct reference picture index in the enhancement layer. If multiple weighting parameters are used in the base layer, the weighting parameter with the smallest reference picture index is preferably, but not necessarily, mapped first. For the second case, weighted prediction is assumed not to be in use (as if the weighted_prediction_flag field were set to 0) for an enhancement layer reference picture that is not available in the base layer. The remapping of pred_weight_table() from the base (previous) layer to pred_weight_table() in the current enhancement layer is derived as follows. This process is referred to as the inheritance process for pred_weight_table(). Specifically, this inheritance process is invoked when base_pred_weight_table_flag is equal to 1. The outputs of this process are as follows:
- luma_weight_LX[] (with X being 0 or 1)
- luma_offset_LX[] (with X being 0 or 1)
- chroma_weight_LX[] (with X being 0 or 1)
- chroma_offset_LX[] (with X being 0 or 1)
- luma_log2_weight_denom
- chroma_log2_weight_denom
The derivation process for the base picture is invoked with basePic as its output. With X being replaced by either 0 or 1, the following applies:
- Let base_luma_weight_LX[] be the value of the syntax element luma_weight_LX[] of the base picture basePic.
- Let base_luma_offset_LX[] be the value of the syntax element luma_offset_LX[] of the base picture basePic.
- Let base_chroma_weight_LX[] be the value of the syntax element chroma_weight_LX[] of the base picture basePic.
- Let base_chroma_offset_LX[] be the value of the syntax element chroma_offset_LX[] of the base picture basePic.
- Let base_luma_log2_weight_denom be the value of the syntax element luma_log2_weight_denom of the base picture basePic.
- Let base_chroma_log2_weight_denom be the value of the syntax element chroma_log2_weight_denom of the base picture basePic.
- Let BaseRefPicListX be the reference index list RefPicListX of the base picture basePic.
For each reference index refIdxLX in the reference index list RefPicListX of the current slice (looping from 0 to num_ref_idx_lX_active_minus1), its associated weighting parameters in the current slice are inherited as follows:
- Let refPic be the picture referred to by refIdxLX.
- If there is a picture satisfying all of the following conditions, let refPicBase (the reference picture in the corresponding base layer) be that picture:
- The syntax element dependency_id of the picture refPicBase equals the variable DependencyIdBase of the picture refPic.
- The syntax element quality_level of the picture refPicBase equals the variable QualityLevelBase of the picture refPic.
- The syntax element fragment_order of the picture refPicBase equals the variable FragmentOrderBase of the picture refPic.
- The value of PicOrderCnt(refPic) equals the value of PicOrderCnt(refPicBase).
- There is an available reference index baseRefIdxLX equal to the lowest-valued index in the corresponding base layer reference index list BaseRefPicListX that refers to refPicBase.
- If refPicBase is found to be present, the following applies:
- baseRefIdxLX is marked as unavailable for subsequent steps of the process.
luma_log2_weight_denom=
base_luma_log2_weight_denom (1)
chroma_log2_weight_denom=
base_chroma_log2_weight_denom (2)
luma_weight_LX[refIdxLX]=
base_luma_weight_LX[baseRefIdxLX] (3)
luma_offset_LX[refIdxLX]=
base_luma_offset_LX[baseRefIdxLX] (4)
chroma_weight_LX[refIdxLX][0]=
base_chroma_weight_LX[baseRefIdxLX][0] (5)
chroma_offset_LX[refIdxLX][0]=
base_chroma_offset_LX[baseRefIdxLX][0] (6)
chroma_weight_LX[refIdxLX][1]=
base_chroma_weight_LX[baseRefIdxLX][1] (7)
chroma_offset_LX[refIdxLX][1]=
base_chroma_offset_LX[baseRefIdxLX][1] (8)
- Otherwise,
luma_log2_weight_denom=
base_luma_log2_weight_denom (9)
chroma_log2_weight_denom=
base_chroma_log2_weight_denom (10)
luma_weight_LX[refIdxLX]=
1<<luma_log2_weight_denom (11)
luma_offset_LX[refIdxLX]=0 (12)
chroma_weight_LX[refIdxLX][0]=
1<<chroma_log2_weight_denom (13)
chroma_offset_LX[refIdxLX][0]=0 (14)
chroma_weight_LX[refIdxLX][1]=
1<<chroma_log2_weight_denom (15)
chroma_offset_LX[refIdxLX][1]=0 (16)
The following is an exemplary method of implementing the inheritance process:
for(baseRefIdxLX=0;baseRefIdxLX<=
base_num_ref_idx_lX_active_minus1;baseRefIdxLX++)
base_ref_avail[baseRefIdxLX]=1
for(refIdxLX=0;refIdxLX<=num_ref_idx_lX_active_minus1;
refIdxLX++){
base_weights_avail_flag[refIdxLX]=0
for(baseRefIdxLX=0;baseRefIdxLX<=
base_num_ref_idx_lX_active_minus1;baseRefIdxLX++){
if(base_ref_avail[baseRefIdxLX] &&
(PicOrderCnt(RefPicListX[refIdxLX])==
PicOrderCnt(BaseRefPicListX[baseRefIdxLX]))){
apply equations(1)to(8)
base_ref_avail[baseRefIdxLX]=0
base_weights_avail_flag[refIdxLX]=1
break;
}
}
if(base_weights_avail_flag[refIdxLX]==0) {
apply equations(9)to(16)
}
} (17)
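A self-contained C rendering of pseudocode (17) is sketched below, assuming for illustration that the POC and weighting parameters of each reference picture are gathered into small structs (these types, and the bound MAX_REFS, are assumptions, not part of the SVC syntax); apart from that, it follows the pseudocode step for step:

#include <stdbool.h>

#define MAX_REFS 16   /* assumed upper bound on num_bl and num_el */

typedef struct {
    int luma_weight, luma_offset;
    int chroma_weight[2], chroma_offset[2];
} WPParams;

typedef struct { int poc; WPParams wp; } RefEntry;

/* Inherit the base layer weights into the enhancement layer list by POC
 * matching; unmatched enhancement layer references receive the default
 * weights of equations (9) to (16). */
void inherit_pred_weight_table(RefEntry el[], int num_el,
                               RefEntry bl[], int num_bl,
                               int luma_log2_wd, int chroma_log2_wd)
{
    bool base_ref_avail[MAX_REFS];
    for (int b = 0; b < num_bl; b++)
        base_ref_avail[b] = true;

    for (int r = 0; r < num_el; r++) {
        bool matched = false;
        for (int b = 0; b < num_bl; b++) {
            if (base_ref_avail[b] && el[r].poc == bl[b].poc) {
                el[r].wp = bl[b].wp;        /* equations (1) to (8)      */
                base_ref_avail[b] = false;  /* each base index used once */
                matched = true;
                break;
            }
        }
        if (!matched) {                     /* equations (9) to (16) */
            el[r].wp.luma_weight = 1 << luma_log2_wd;
            el[r].wp.luma_offset = 0;
            for (int c = 0; c < 2; c++) {
                el[r].wp.chroma_weight[c] = 1 << chroma_log2_wd;
                el[r].wp.chroma_offset[c] = 0;
            }
        }
    }
}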
If the enhancement layer picture has the same slice partitioning as the base layer picture, the remapping of pred_weight_table() from the base (lower) layer to pred_weight_table() in the current enhancement layer can be performed on a slice basis. However, if the enhancement layer and the base layer have different slice partitionings, the remapping of pred_weight_table() from the base (lower) layer to pred_weight_table() in the current enhancement layer needs to be performed on a macroblock basis. For example, when the base layer and the enhancement layer both have the same partitioning into two slices, the inheritance process described above can be invoked once per slice. Conversely, if the base layer is partitioned into two slices and the enhancement layer into three, the inheritance process is invoked on a macroblock basis.
Turning to fig. 3, an exemplary method for scalable video encoding of an image block using weighted prediction is indicated generally by the reference numeral 300.
A start block 305 begins encoding the current enhancement layer (EL) picture, and passes control to a decision block 310. The decision block 310 determines whether a base layer (BL) picture is present for the current EL picture. If so, control is passed to a function block 315. Otherwise, control is passed to a function block 350.
The function block 315 obtains the weights from the BL picture, and passes control to a function block 320. The function block 320 remaps pred_weight_table() of the BL to pred_weight_table() of the enhancement layer, and passes control to a function block 325. The function block 325 sets base_pred_weight_table_flag equal to true, and passes control to a function block 330. The function block 330 weights the reference pictures using the obtained weights, and passes control to a function block 335. The function block 335 writes base_pred_weight_table_flag into the slice header, and passes control to a decision block 340. The decision block 340 determines whether base_pred_weight_table_flag is true. If so, control is passed to a function block 345. Otherwise, control is passed to a function block 360.
The function block 350 computes the weights for the EL picture, and passes control to a function block 355. The function block 355 sets base_pred_weight_table_flag to false, and passes control to the function block 330.
The function block 345 encodes the EL picture using the weighted reference picture, and passes control to an end block 365.
The function block 360 writes the weights into the slice header, and passes control to the function block 345.
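The encoding flow of fig. 3 condenses into the following C sketch; every helper stands in for the correspondingly numbered block of the flowchart and is a placeholder, not an actual JSVM routine:

void obtain_weights_from_bl(void);            /* block 315 */
void remap_pred_weight_table_bl_to_el(void);  /* block 320 */
void compute_el_weights(void);                /* block 350 */
void weight_reference_pictures(void);         /* block 330 */
void write_flag_to_slice_header(int flag);    /* block 335 */
void write_weights_to_slice_header(void);     /* block 360 */
void encode_with_weighted_references(void);   /* block 345 */

void encode_el_picture(int has_bl_picture)
{
    int base_pred_weight_table_flag;
    if (has_bl_picture) {
        obtain_weights_from_bl();
        remap_pred_weight_table_bl_to_el();
        base_pred_weight_table_flag = 1;      /* block 325 */
    } else {
        compute_el_weights();
        base_pred_weight_table_flag = 0;      /* block 355 */
    }
    weight_reference_pictures();
    write_flag_to_slice_header(base_pred_weight_table_flag);
    if (!base_pred_weight_table_flag)         /* decision block 340 */
        write_weights_to_slice_header();
    encode_with_weighted_references();
}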
Turning to fig. 4, an exemplary method for scalable video decoding of an image block using weighted prediction is indicated generally by the reference numeral 400.
A start block 405 begins decoding the current enhancement layer (EL) picture, and passes control to a function block 410. The function block 410 parses base_pred_weight_table_flag from the slice header, and passes control to a decision block 415. The decision block 415 determines whether base_pred_weight_table_flag is equal to 1. If so, control is passed to a function block 420. Otherwise, control is passed to a function block 435.
The function block 420 copies the weights from the corresponding base layer (BL) picture to the EL picture, and passes control to a function block 425. The function block 425 remaps pred_weight_table() of the BL picture to pred_weight_table() of the EL picture, and passes control to a function block 430. The function block 430 weights the reference pictures using the obtained weights, and passes control to an end block 440.
The function block 435 parses the weighting parameters, and passes control to a function block 430.
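Analogously, the decoding flow of fig. 4 condenses to the sketch below, again with placeholder helpers named after the flowchart blocks:

int  parse_base_pred_weight_table_flag(void); /* block 410 */
void copy_weights_from_bl(void);              /* block 420 */
void remap_pred_weight_table_bl_to_el(void);  /* block 425 */
void parse_weighting_parameters(void);        /* block 435 */
void apply_obtained_weights(void);            /* block 430 */

void decode_el_picture(void)
{
    if (parse_base_pred_weight_table_flag() == 1) {  /* decision 415 */
        copy_weights_from_bl();
        remap_pred_weight_table_bl_to_el();
    } else {
        parse_weighting_parameters();
    }
    apply_obtained_weights();
}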
Turning to fig. 5, an exemplary method for decoding level_idc and profile_idc syntax is indicated generally by the reference numeral 500.
A start block 505 passes control to a function block 510. The function block 510 parses the level_idc and profile_idc syntax, and passes control to a function block 515. The function block 515 determines the weighted prediction constraint for the enhancement layer based on the parsed syntax, and passes control to an end block 520.
Turning to fig. 6, an exemplary method for decoding a weighted prediction constraint for an enhancement layer is indicated generally by the reference numeral 600.
A start block 605 passes control to a function block 610. The function block 610 parses the syntax for weighted prediction of the enhancement layer, and passes control to an end block 615.
A description will now be given of some of the many attendant advantages/features of the present invention, some of which have been mentioned above. For example, one advantage/feature is a scalable video encoder that includes an encoder for encoding a block in an enhancement layer of a picture by applying to an enhancement layer reference picture the same weighting parameter that is applied to a lower layer reference picture used for encoding the block in the lower layer of the picture, wherein the block in the enhancement layer corresponds to the block in the lower layer and the enhancement layer reference picture corresponds to the lower layer reference picture. Another advantage/feature is the scalable video encoder as described above, wherein the encoder encodes the block in the enhancement layer by selecting between an explicit weighting parameter mode and an implicit weighting parameter mode. Yet another advantage/feature is the scalable video encoder as described above, wherein the encoder imposes the constraint that, when a block in the enhancement layer corresponds to a block in the lower layer and the enhancement layer reference picture corresponds to the lower layer reference picture, the same weighting parameter as that applied to the particular lower layer reference picture is always applied to the enhancement layer reference picture. Moreover, another advantage/feature is the scalable video encoder having the constraint as described above, wherein the constraint is defined as a profile or level constraint, or is signaled in a sequence or picture parameter set. Also, another advantage/feature is the scalable video encoder as described above, wherein the encoder adds a syntax element in the slice header for a slice in the enhancement layer, to selectively apply either the same weighting parameters or different weighting parameters to the enhancement layer reference picture. Also, another advantage/feature is the scalable video encoder as described above, wherein the encoder performs a remapping of the pred_weight_table() syntax from the lower layer to the pred_weight_table() syntax of the enhancement layer. Also, another advantage/feature is the scalable video encoder having the remapping as described above, wherein the encoder uses the picture order count to remap the weighting parameters from the lower layer to the corresponding reference picture indices in the enhancement layer. Also, another advantage/feature is the scalable video encoder having the remapping using the picture order count as described above, wherein the weighting parameter having the smallest reference picture index is remapped first. Also, another advantage/feature is the scalable video encoder having the remapping as described above, wherein the encoder sets a weighted_prediction_flag field to 0 for a reference picture used in the enhancement layer that is not available in the lower layer. Also, another advantage/feature is the scalable video encoder having the remapping as described above, wherein, when a reference picture used in the enhancement layer lacks a match in the lower layer, the encoder sends the weighting parameters for the reference picture index corresponding to that reference picture in the slice header.
Also, another advantage/feature is the scalable video encoder having the remapping as described above, wherein the encoder performs the remapping on a slice basis when the picture has the same slice partitioning in the enhancement layer and the lower layer, and performs the remapping on a macroblock basis when the picture has a different slice partitioning in the enhancement layer relative to the lower layer. Also, another advantage/feature is the scalable video encoder as described above, wherein the encoder performs the remapping of pred_weight_table() from the lower layer to pred_weight_table() of the enhancement layer when the encoder applies to the enhancement layer reference picture the same weighting parameters as those applied to the particular lower layer reference picture. Also, another advantage/feature is the scalable video encoder as described above, wherein the encoder skips weighting parameter estimation when it applies to the enhancement layer reference picture the same weighting parameters as those applied to the particular lower layer reference picture. Additionally, another advantage/feature is the scalable video encoder as described above, wherein the encoder stores only one set of weighting parameters for each reference picture index when it applies to the enhancement layer reference picture the same weighting parameters as those applied to the particular lower layer reference picture. Also, another advantage/feature is the scalable video encoder as described above, wherein the encoder estimates the weighting parameters when it applies different weighting parameters or when the enhancement layer has no lower layer.
These and other features and advantages of the present invention may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.
Most preferably, the teachings of the present invention are implemented as a combination of hardware and software. Further, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. The machine is preferably implemented on a computer platform having hardware such as one or more central processing units ("CPU"), a random access memory ("RAM"), and input/output ("I/O") interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such variations and modifications are intended to be included herein within the scope of the present invention as set forth in the appended claims.

Claims (24)

1. A scalable video decoding apparatus, comprising:
a decoder (200) for decoding a block in an enhancement layer of a picture by applying to an enhancement layer reference picture the same weighting parameter as is applied to a lower layer reference picture used for decoding a block in a lower layer of the picture, and
the decoder (200) performs a remapping of the reference picture weighting table syntax from the lower layer to the reference picture weighting table syntax of the enhancement layer when the enhancement layer reference picture does not correspond to the lower layer reference picture, wherein the block in the enhancement layer corresponds to the block in the lower layer.
2. The apparatus of claim 1, wherein said decoder (200) decodes the block in the enhancement layer by determining whether to use an explicit weighting parameter mode or an implicit weighting parameter mode.
3. The apparatus of claim 1, wherein said decoder (200) obeys the following constraints imposed by the respective encoder: when a block in the enhancement layer corresponds to a block in the lower layer, and the enhancement layer reference picture corresponds to the lower layer reference picture, the same weighting parameter as that applied to the lower layer reference picture is always applied to the enhancement layer reference picture.
4. The apparatus of claim 3, wherein the constraint is defined as a profile and/or level constraint and/or is signaled in a sequence or picture parameter set.
5. The apparatus of claim 1, wherein said decoder (200) evaluates a syntax element in the slice header for a slice in the enhancement layer to determine whether to apply the same weighting parameter to the enhancement layer reference picture or to use a different weighting parameter.
6. The apparatus of claim 1, wherein said decoder (200) uses the picture order number to remap the weighting parameters from the lower layer to a corresponding reference picture index in the enhancement layer.
7. The apparatus of claim 6, wherein the weighting parameter having the smallest reference picture index is remapped first.
8. The apparatus of claim 1, wherein said decoder (200) reads a weighted_prediction_flag field set to 0 for reference pictures used for the enhancement layer that are not available in the lower layer.
9. The apparatus of claim 1, wherein said decoder (200) receives a weighting parameter in a slice header for a reference picture index corresponding to a reference picture used in the enhancement layer when the reference picture used in the enhancement layer lacks a match in the lower layer.
10. The apparatus of claim 1, wherein said decoder (200) performs remapping on a slice basis when said picture has a same slice partition in the enhancement layer and in the lower layer, and said decoder (200) performs remapping on a macroblock basis when said picture has a different slice partition in the enhancement layer relative to the lower layer.
11. The apparatus of claim 1, wherein said decoder (200) stores only one set of weighting parameters per reference picture index when said decoder applies to an enhancement layer reference picture the same weighting parameter as that applied to a lower layer reference picture.
12. The apparatus of claim 1, wherein said decoder (200) parses the weighting parameter from the slice header when said decoder applies a different weighting parameter to the enhancement layer reference picture than to the lower layer reference picture.
13. A method of scalable video decoding, comprising:
decoding (420) a block in an enhancement layer of a picture by applying to an enhancement layer reference picture the same weighting parameter that was applied to a lower layer reference picture for decoding a block in a lower layer of the picture, and
when the enhancement layer reference picture does not correspond to the lower layer reference picture, performing a remapping of the reference picture weighting table syntax from the lower layer to the reference picture weighting table syntax of the enhancement layer, wherein a block in the enhancement layer corresponds to a block in the lower layer.
14. The method of claim 13, wherein said decoding step (420) decodes the block in the enhancement layer by determining whether to use an explicit weighting parameter mode or an implicit weighting parameter mode.
15. The method of claim 13, wherein said decoding step (420) comprises complying with the following constraints imposed by the respective encoder: when a block in the enhancement layer corresponds to a block in the lower layer, and the enhancement layer reference picture corresponds to the lower layer reference picture, the same weighting parameter as that applied to the lower layer reference picture is always applied to the enhancement layer reference picture.
16. The method of claim 15, wherein the constraint is defined as a profile and/or level constraint and/or is signaled in a sequence or picture parameter set (510).
17. The method of claim 13, wherein the decoding step comprises: for a slice in the enhancement layer, evaluating (410) a syntax element in the slice header to determine whether to apply the same weighting parameter to the enhancement layer reference picture or to use a different weighting parameter.
18. The method of claim 13, wherein the performing step uses a picture order number to remap weighting parameters from a lower layer to a corresponding reference picture index in an enhancement layer.
19. The method of claim 18, wherein the weighting parameter with the smallest reference picture index is remapped first.
20. The method of claim 13, wherein the decoding step comprises: reading a weighted_prediction_flag field set to 0 for a reference picture used for the enhancement layer that is not available in the lower layer.
21. The method of claim 13, wherein the decoding step comprises: when a reference picture for use in the enhancement layer lacks a match in the lower layer, a weighting parameter for a reference picture index corresponding to the reference picture for use in the enhancement layer is received (435) in the slice header.
22. The method of claim 13, wherein the remapping is performed on a slice basis when the picture has the same slice division on the enhancement layer and the lower layer, and the remapping step is performed on a macroblock basis when the picture has a different slice division in the enhancement layer relative to the lower layer.
23. The method of claim 13, wherein, when said decoding step applies to the enhancement layer reference picture the same weighting parameters as those applied to the lower layer reference picture, said decoding step comprises storing only one set of weighting parameters for each reference picture index.
24. The method of claim 13, wherein when said decoding step applies a different weighting parameter to the enhancement layer reference picture than to the lower layer reference picture, said decoding step comprises parsing (435) the weighting parameter from the slice header.
HK08112464.4A 2005-07-21 2006-05-19 Method and apparatus for weighted prediction for scalable video coding HK1118663B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US70146405P 2005-07-21 2005-07-21
US60/701,464 2005-07-21
PCT/US2006/019510 WO2007018669A1 (en) 2005-07-21 2006-05-19 Method and apparatus for weighted prediction for scalable video coding

Publications (2)

Publication Number Publication Date
HK1118663A1 (en) 2009-02-13
HK1118663B (en) 2011-12-02

