
HK1161788B - Conversion operations in scalable video encoding and decoding - Google Patents


Info

Publication number
HK1161788B
HK1161788B (application HK11113315.8A)
Authority
HK
Hong Kong
Prior art keywords
video
base layer
chroma
sample
tool
Prior art date
Application number
HK11113315.8A
Other languages
Chinese (zh)
Other versions
HK1161788A1 (en)
Inventor
S. Sun
S. Regunathan
C. Tu
C-L. Lin
Original Assignee
Microsoft Technology Licensing, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/197,922 external-priority patent/US9571856B2/en
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of HK1161788A1 publication Critical patent/HK1161788A1/en
Publication of HK1161788B publication Critical patent/HK1161788B/en

Description

Conversion operations in scalable video encoding and decoding
Background
Engineers use compression (also known as encoding or transcoding) to reduce the bit rate of digital video. Compression reduces the cost of storing and transmitting video by converting it to a lower bit rate form. Decompression (also known as decoding) reconstructs a version of the original video from the compressed form. A "codec" is an encoder/decoder system.
When a video encoder converts video into a lower bit rate form, the video encoder reduces the quality of the compressed video to reduce the bit rate. By selectively removing details from the video, the encoder makes the video simpler and easier to compress, but the compressed video is less faithful to the original video. In addition to this basic quality/bitrate tradeoff, the bitrate of a video depends on the content (e.g., complexity) of the video and the format of the video.
Video information is organized according to different formats for different devices and applications. The attributes of a video format may include color space, chroma sampling rate, sample depth, spatial resolution, and temporal resolution. In general, quality and bit rate vary directly with spatial resolution (e.g., the amount of detail in a picture) and temporal resolution (e.g., the number of pictures per second): higher spatial resolution or higher temporal resolution leads to higher quality, but also to a higher bit rate.
In video coding and decoding applications, common color spaces include YUV and YCbCr. Y represents the luminance component of the video, and U and V, or Cb and Cr, represent the color (chrominance) components of the video. In addition to YUV and YCbCr, many other color spaces organize video into one luminance channel and multiple chrominance channels.
The chroma sampling rate refers to the sampling of the chroma channels relative to the luma channel of the video. For example, in the YUV color space, one chroma sampling rate is 4:4:4, meaning that for each Y sample there is a corresponding U sample and V sample. However, the human eye is more sensitive to changes in brightness than to variations in color, and encoders have been developed to take advantage of this fact. Another chroma sampling rate is 4:2:2, meaning that a single U sample and a single V sample correspond to two horizontally adjacent Y samples. Chroma sampling at a lower resolution, such as 4:2:2 or 4:2:0, results in fewer samples and typically requires fewer bits to encode than chroma sampling at a higher resolution, such as 4:4:4. Due to the popularity of 4:2:0 chroma sampling, some video encoders accept video in 4:2:0 format but do not accept source formats with higher chroma resolution.
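The savings from lower-resolution chroma sampling follow from simple sample-count arithmetic. The sketch below (illustrative only, not part of the patent disclosure) counts the average number of samples per pixel for the formats named above:

```python
def samples_per_pixel(chroma_format):
    """Average samples per pixel for common YCbCr chroma formats.

    4:4:4 -> 3.0 samples/pixel (full-resolution U and V for every pixel)
    4:2:2 -> 2.0 samples/pixel (U and V at half horizontal resolution)
    4:2:0 -> 1.5 samples/pixel (U and V at half resolution both ways)
    """
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[chroma_format]
    return 1.0 + 2 * chroma_fraction  # one luma channel + two chroma channels

# Total samples for one 1920x1080 picture:
pixels = 1920 * 1080
print(samples_per_pixel("4:4:4") * pixels)  # 6220800.0
print(samples_per_pixel("4:2:0") * pixels)  # 3110400.0
```

The 4:2:0 picture carries half as many samples as the 4:4:4 picture, which is why 4:2:0 typically requires fewer bits to encode.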
Each picture element ("pixel") of a video picture includes one or more samples, and each sample is represented digitally by one or more bits. Studios and content producers often use video with 10 bits per sample or 12 bits per sample to represent sample values more precisely, with finer gradations of brightness or color. Using a higher sample depth allows greater precision in sample values, or allows a wider color gamut to be captured. For example, a 12-bit sample value has more possible values than a 10-bit or 8-bit sample value. As a tradeoff for this higher quality, higher sample depths tend to increase the bit rate for encoding and decoding applications. By convention, many encoders accept video with 8-bit samples.
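The count of representable values grows exponentially with sample depth; the comparison above can be checked directly (illustrative only):

```python
# Number of distinct values a sample of each bit depth can take.
for bits in (8, 10, 12):
    print(bits, "bits ->", 2 ** bits, "possible values")
# 8 bits -> 256, 10 bits -> 1024, 12 bits -> 4096
```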
Scalable video encoding and decoding facilitates the delivery of video to devices with different capabilities. A typical scalable video encoder splits video into a base layer and one or more enhancement layers. The base layer alone provides video for reconstruction at lower resolutions and enhancement layers may be added to provide additional information that will improve the video quality. Some scalable encoders and decoders rely on temporal scalability of video. Other common scalable coding/decoding schemes involve scalability to the spatial resolution or overall coding quality of the video.
Scalable video codecs that support temporal scalability, spatial scalability, and/or overall coding quality scalability provide many options for base and enhancement layers. While these types of scalability provide acceptable performance in many cases, they do not have the benefits and advantages of the techniques and tools described below.
SUMMARY
In general, the detailed description presents techniques and tools for conversion operations between modules in a scalable video encoding tool or a scalable video decoding tool. For example, when the base layer video has a low sample depth and/or low color fidelity, the conversion operations help improve the efficiency of encoding inter-layer residual video with a higher sample depth and/or higher color fidelity.
In accordance with a first aspect of the techniques and tools described herein, a tool, such as a scalable video encoding tool or a scalable video decoding tool, receives base layer video after the base layer video is reconstructed. The reconstructed base layer video has sample values with a first sample depth (e.g., 8 bits per sample). The tool filters the reconstructed base layer video using an adaptive low-pass filter and upsamples the sample values to a second sample depth (e.g., 10 bits per sample). The tool may also perform inverse tone mapping on the filtered and upsampled results. The adaptive low-pass filter, which may be adapted to remove coding artifacts or dither values in the reconstructed base layer video, may be adjusted according to a filter strength parameter signaled by the encoding tool to the decoding tool.
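A minimal sketch of this first aspect, assuming a simple [1, 2, 1]/4 low-pass kernel and a left shift for the 8-bit to 10-bit upsampling (the filter taps, the strength parameterization, and the upsampling rule are assumptions for illustration, not taken from the patent):

```python
def filter_and_upsample(samples_8bit, filter_strength=1):
    """Hypothetical sketch: adaptive low-pass filtering plus 8->10 bit upsampling.

    filter_strength=0 passes sample values through unfiltered; higher
    strengths reapply the [1, 2, 1]/4 kernel, smoothing more aggressively.
    """
    values = list(samples_8bit)
    for _ in range(filter_strength):
        padded = [values[0]] + values + [values[-1]]  # replicate edge samples
        values = [(padded[i - 1] + 2 * padded[i] + padded[i + 1] + 2) // 4
                  for i in range(1, len(padded) - 1)]
    # Upsample the sample depth from 8 bits to 10 bits (multiply by 2^(10-8)).
    return [v << 2 for v in values]

print(filter_and_upsample([100, 104, 200, 96], filter_strength=1))
```

Note how the filtering smooths the jump at the third sample before the depth increase, which is the kind of artifact/dither suppression the first aspect describes.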
In accordance with a second aspect of the techniques and tools described herein, a tool, such as a scalable video encoding tool or a scalable video decoding tool, receives the base layer video after reconstructing the base layer video. The reconstructed base layer video has one luma channel and multiple chroma channels with a first chroma sampling rate (e.g., 4:2:0). The tool scales each chroma channel to a second chroma sampling rate (e.g., 4:2:2). The scaling uses a type of chroma upsampling represented by one or more chroma scaling parameters signaled by the encoding tool to the decoding tool. For example, the chroma scaling parameter indicates a choice between linear interpolation and cubic interpolation for chroma upsampling.
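A minimal sketch of chroma upsampling with a signaled choice between linear and cubic interpolation; the 4-tap (-1, 9, 9, -1)/16 cubic kernel used here is one common choice and is an assumption for illustration, not taken from the patent:

```python
def upsample_chroma(row, method="linear"):
    """Hypothetical sketch: double the sample count of one chroma row.

    'linear' averages the two neighboring samples; 'cubic' applies a
    4-tap (-1, 9, 9, -1)/16 interpolation kernel. The method argument
    stands in for the signaled chroma scaling parameter.
    """
    out = []
    n = len(row)
    for i in range(n):
        out.append(row[i])  # existing chroma sample is kept as-is
        a = row[max(i - 1, 0)]          # neighbors, edges replicated
        b = row[i]
        c = row[min(i + 1, n - 1)]
        d = row[min(i + 2, n - 1)]
        if method == "linear":
            out.append((b + c + 1) // 2)
        else:  # cubic
            val = (-a + 9 * b + 9 * c - d + 8) >> 4
            out.append(min(max(val, 0), 255))  # clip to the 8-bit range
    return out

print(upsample_chroma([10, 20, 30, 40], "linear"))
```

Around smooth gradients the two methods agree closely; near sharp chroma edges they differ, which is why letting the encoder signal the choice can reduce the energy left in the inter-layer residual.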
According to a third aspect of the techniques and tools described herein, a tool, such as a scalable video encoding tool, receives inter-layer residual video having sample values selected from a first set of sample values. The encoding tool converts the sample values to a second set of sample values, mapping the sample values between the first and second sets according to one or more set remapping parameters. The encoding tool signals the set remapping parameters to the scalable video decoding tool. The decoding tool receives the inter-layer residual video (having sample values selected from the second set of sample values) and performs inverse remapping according to the one or more set remapping parameters to map the sample values between the second and first sets of sample values.
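The remapping and inverse remapping of this third aspect can be sketched with a simple scale-and-offset parameterization (the scale and offset below stand in for the signaled set remapping parameters; the actual parameterization is an assumption):

```python
def remap_residual(residuals, scale=2, offset=128):
    """Hypothetical sketch: map signed inter-layer residual values into
    the [0, 255] range that an 8-bit enhancement layer encoder expects.
    scale stretches a small dynamic range; offset centers it.
    """
    return [min(max(r * scale + offset, 0), 255) for r in residuals]

def inverse_remap_residual(mapped, scale=2, offset=128):
    """Inverse remapping performed by the decoding tool using the same
    signaled parameters."""
    return [(m - offset) // scale for m in mapped]

residuals = [-5, 0, 3, 12]
mapped = remap_residual(residuals)
print(mapped)                          # values handed to the enhancement encoder
print(inverse_remap_residual(mapped))  # round-trips to the original residuals
```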
The foregoing and other objects, features and advantages will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures. This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Brief Description of Drawings
FIG. 1 is a block diagram of a suitable computing environment in which several of the described techniques and tools may be implemented.
Fig. 2 is a block diagram of a scalable video coding tool in which several of the described techniques may be implemented.
Fig. 3 is a block diagram of a scalable video decoding tool in which several of the described techniques may be implemented.
Fig. 4 is a flow diagram of a generalized technique for upsampling sample values of a base layer video to a higher sample depth and adaptively filtering the video during scalable video encoding or decoding.
Fig. 5 is a diagram illustrating an example adaptive low-pass filtering option for sample values of base layer video during scalable video encoding or decoding.
Fig. 6 is a flow diagram of a generalized technique for scaling the chroma channel of base layer video to a higher chroma sampling rate during scalable video encoding or decoding.
Fig. 7 is a diagram illustrating example chroma sampling rate scaling options for a chroma channel of base layer video during scalable video encoding or decoding.
Fig. 8 is a flow diagram of a generalized technique for remapping inter-layer residual video during scalable video encoding or decoding.
Fig. 9 is a diagram illustrating an example remapping of sample values of inter-layer residual video during scalable video coding.
Figs. 10a and 10b are flow diagrams of techniques for scalable video coding with sample depth upsampling and adaptive filtering of base layer video, scaling of chroma channels of the base layer video, and remapping of sample values of inter-layer residual video.
Figs. 11a and 11b are flow diagrams of techniques for scalable video decoding corresponding to the scalable video coding of Figs. 10a and 10b.
Detailed Description
The present application relates to techniques and tools for conversion operations between modules in a scalable video coding tool or a scalable video decoding tool. In particular, when scalable video coding and decoding uses base layer video with low sample depth and/or low color fidelity, the conversion operations help improve the efficiency of coding inter-layer residual video for video with higher sample depth and/or higher color fidelity.
For example, many existing video codecs are suitable for video in a 4:2:0 YCbCr format with 8-bit samples. However, video content for high quality entertainment applications may have higher sample depth or color fidelity and may use a wider color gamut. To encode such content, a pre-processor reduces image fidelity to 8-bit 4:2:0 YCbCr video before the base layer video encoder encodes the content. Some display devices are suitable for samples having a higher bit depth (e.g., 10 bits per sample) or a wider color gamut. To provide high fidelity video to such display systems, some scalable video codecs use an 8-bit 4:2:0 YCbCr encoder for the base layer version of the video and one or more enhancement layers of inter-layer residual video to represent the difference between the base layer version and the original video. The techniques and tools described herein help scalable video encoding and decoding tools convert video from a lower resolution format (e.g., 4:2:0 YCbCr video with 8-bit samples in a limited color gamut) to a higher resolution format (e.g., 4:2:2 YCbCr video with 10-bit samples in a wider color gamut) in a manner that makes compression of the inter-layer residual video more efficient.
One aspect of the conversion operations involves inverse scaling of the reconstructed base layer video to reverse the sample depth scaling performed prior to encoding. The inverse scaling combines adaptive low-pass filtering with sample depth upsampling to reach the higher sample depth. In many cases, the filtering and upsampling process reduces artifacts (e.g., blocking artifacts, or more generally, quantization noise) while also increasing the sample depth. Subsequent inverse tone mapping (e.g., from one color gamut to another) may be performed at the same sample depth or at a higher sample depth. This approach reduces the energy in the inter-layer residual video by bringing the reconstructed base layer video closer to the input video, and thus helps make compression of the inter-layer residual video more efficient.
Another aspect of the conversion operations involves inverse scaling of the reconstructed base layer video to reverse the chroma sampling rate scaling performed prior to encoding. The inverse scaling uses an adaptive upsampling process to restore the higher chroma sampling rate. For example, when upsampling sample values to a higher chroma sampling rate in a chroma channel, the encoding tool or the decoding tool switches between linear interpolation and cubic interpolation. By adapting the chroma upsampling, the coding tool can reduce the energy in the inter-layer residual video and make compression of the inter-layer residual video more efficient.
A third aspect of the conversion operations involves remapping and inverse remapping of the inter-layer residual video. In some cases, the differences between the input video and the reconstructed base layer video exceed the dynamic range of the encoder and decoder for the enhancement layer video. In other cases, these differences have such a small dynamic range that, when encoded with the enhancement layer encoder, they are not preserved even at the highest quality allowed. To address such problems, the scalable video coding tool remaps the inter-layer residual video according to remapping parameters and encodes the remapped inter-layer residual video. The corresponding scalable video decoding tool decodes the remapped inter-layer residual video and inversely remaps the inter-layer residual video. By adapting the remapping parameters, the encoding tool can adjust the dynamic range of the inter-layer residual video for efficient encoding by the enhancement layer encoder.
Various alternatives to the implementations described herein are possible. The techniques described with reference to the flowcharts may be changed by changing the ordering of the stages shown in the flowcharts, by splitting, repeating, or omitting certain stages, etc. Different aspects of the conversion operation can be combined or used separately. Different embodiments implement one or more of the described techniques and tools.
Some of the techniques and tools described herein address one or more of the problems noted in the background. Generally, the presented techniques/tools do not solve all these problems. Rather, given the constraints and tradeoffs of encoding time, encoding resources, decoding time, decoding resources, available bit rate, and/or quality, the presented techniques/tools improve the performance of a particular implementation or scenario.
I. Computing environment
FIG. 1 illustrates a generalized example of a suitable computing environment (100) in which several of the described techniques and tools may be implemented. The computing environment (100) is not intended to suggest any limitation as to scope of use or functionality, as the techniques and tools may be implemented in diverse general-purpose or special-purpose computing environments.
Referring to fig. 1, a computing environment (100) includes at least one processing unit (110) and memory (120). In fig. 1, this most basic configuration (130) is included within the dashed line. The processing unit (110) executes computer-executable instructions and may be a real or virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (120) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (120) stores software (180) that implements one or more of the conversion operations for scalable video encoding and/or decoding.
The computing environment may have other features. For example, the computing environment (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (100) and coordinates activities of the components of the computing environment (100).
Storage (140) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (100). The storage (140) stores instructions of software (180) for implementing the conversion operation.
The input device (150) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (100). For audio or video encoding, the input device (150) may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment (100). The output device (160) may be a display, a printer, a CD writer, or another device that provides output from the computing environment (100).
The communication connection (170) allows communication with another computing entity over a communication medium. Communication media conveys information such as computer-executable instructions in a modulated data signal, audio or video input or output, or other data. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Various techniques and tools may be described in the general context of computer-readable media. Computer readable media can be any available media that can be accessed within a computing environment. By way of example, and not limitation, with respect to computing environment (100), computer-readable media may comprise memory (120), storage (140), communication media, and combinations of any of the above.
The techniques and tools may be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or separated between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed in local or distributed computing environments.
For the sake of presentation, this detailed description uses terms like "select" and "reconstruct" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.
II. Generalized encoding tool
Fig. 2 is a block diagram of a generalized scalable video coding tool (200) in conjunction with which certain techniques described may be implemented. The encoding tool (200) receives a sequence of video pictures including an input picture (205) and generates a base layer bitstream (295) and one or more enhancement layer bitstreams (298). For the base layer, the format of the output bitstream may be Windows Media Video format, SMPTE 421-M format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, or H.264), or other format. For the enhancement layer, the format of the output bitstream may be the same as the base layer bitstream or another format.
The tool (200) processes video pictures. The term "picture" generally refers to source, encoded, or reconstructed image data. For progressive video, a picture is a progressive video frame. For interlaced video, a picture may refer to an interlaced video frame, the top field of a frame, or the bottom field of a frame, depending on the context.
The input picture (205) has a sample depth, chroma sampling rate, and/or spatial resolution higher than the resolution accepted by the base layer encoder (220). For example, the base layer encoder (220) is configured to encode video pictures with 8-bit samples and a 4:2:0 chroma sampling rate, while the input picture (205) has 10-bit samples and a 4:2:2 chroma sampling rate, or has another format with a higher resolution than 8-bit 4:2:0. Alternatively, the base layer encoder (220) accepts 10-bit samples, 12-bit samples, or samples with some other sample depth, or the base layer encoder (220) accepts 4:2:2 video, 4:4:4 video, or video with some other chroma sampling rate.
The encoding tool (200) includes a first scaler (210) that accepts an input video picture (205) and outputs base layer video to the base layer encoder (220). The first scaler (210) may downsample or otherwise scale the input video picture (205), e.g., to reduce sample depth, spatial resolution, and/or chroma sampling resolution. For sample depth downsampling, the scaler (210) may crop the least significant x bits of a sample, use tone mapping to map sample values at one sample depth (e.g., 10 bits per sample) to another sample depth (e.g., 8 bits per sample), or use another mechanism. For chroma sub-sampling, the scaler (210) may use sample dropping, low-pass filtering, or another mechanism. The scaler (210) may selectively add a dither signal to improve the perceived quality of the base layer video when it is played back on its own. Alternatively, for one or more of these attributes of the input video picture (205), the first scaler (210) does not change the input video picture (205) at all.
Generally, tone mapping is a technique that maps one set of colors to another set of colors. Tone mapping may use a simple linear function, a piecewise linear function, table lookup operations, or other operators to map sample values. For example, tone mapping maps a set of 2^30 possible color values (three 10-bit samples per pixel) to an arbitrary subset of the 2^24 possible values (three 8-bit samples per pixel). This arbitrary subset may represent colors in the same color gamut but omit possible colors, or it may have colors in a smaller color gamut with finer gradations, or it may redistribute colors arbitrarily.
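A minimal sketch of such a mapping, using a crude piecewise linear curve chosen only for illustration (the description above allows linear functions, piecewise linear functions, or table lookups; these particular breakpoints are an assumption):

```python
def tone_map_10_to_8(sample):
    """Hypothetical sketch of piecewise linear tone mapping, 10-bit -> 8-bit.

    The lower half of the 10-bit range keeps ordinary 10->8 bit scaling;
    the upper half is compressed more aggressively, trading away bright
    gradations to keep dark-region precision.
    """
    if sample < 512:                    # lower half of the 10-bit range
        return sample >> 2              # plain 4:1 depth reduction
    return 128 + ((sample - 512) >> 3)  # compress the upper half 8:1

print(tone_map_10_to_8(400), tone_map_10_to_8(1023))
```

Note that the output here never exceeds 191, i.e., this particular curve maps into a subset of the 8-bit values, illustrating how tone mapping can redistribute or omit possible colors.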
For example, in some coding cases, the scaler (210) accepts studio-quality video with high sample depth and high chroma sampling rate, filters and downsamples the video, adds a dither signal, and outputs base layer video with lower sample depth and lower chroma sampling rate. In other coding cases, the scaler (210) accepts video that has been downsampled in terms of sample depth and combined with a dither signal, and then downsamples the chroma sampling rate of the video to produce base layer video. In still other coding cases, the scaler (210) accepts video with high sample depth and high chroma sampling rate to which dither signal has been added, and then downsamples the video to produce base layer video with lower sample depth and lower chroma sampling rate.
The base layer encoder (220) encodes the base layer video and outputs a base layer bitstream (295). In addition, the base layer encoder (220) makes the reconstructed base layer video available as input to the inverse scaler (230). As part of encoding, the base layer encoder (220) typically produces a reconstructed version of the input picture (205). For example, the base layer encoder (220) decodes and buffers reconstructed base layer pictures for use in later motion compensation. In this manner, the reconstructed version can be obtained from the base layer encoder (220) for further processing in scalable coding. (Alternatively, a base layer decoder (not shown) in the encoding tool (200) decodes the base layer bitstream (295) to produce the reconstructed base layer video.)
If the reconstructed base layer video has a different sample depth, spatial resolution, chroma sampling rate, etc. than the input video picture (205) due to scaling, the inverse scaler (230) may upsample or otherwise inverse scale the reconstructed base layer video so that it has a higher sample depth, spatial resolution, chroma sampling rate, etc. (e.g., the same sample depth, spatial resolution, chroma sampling rate, etc. as the input video picture (205)). The inverse scaler (230) may also adaptively filter the reconstructed base layer video to remove certain types of artifacts (e.g., blocking artifacts, dither signals). For example, the inverse scaler (230) filters the reconstructed base layer video using an adaptive low-pass filter as it upsamples the sample values of the reconstructed base layer video to a higher sample depth, and then the inverse scaler (230) upsamples the chroma channels of the reconstructed base layer video to the chroma sampling rate of the input video picture (205). In addition, to compensate for tone mapping during the scaling process, the inverse scaler (230) may perform inverse tone mapping (e.g., from one color gamut to another color gamut) at the same sample depth or at a higher sample depth. Details of the inverse scaling operations for the reconstructed base layer video in example implementations are described below. Alternatively, the inverse scaler (230) inverse scales the reconstructed base layer video using another mechanism, e.g., sample value repetition for chroma upsampling.
The steps of scaling and encoding the input video typically result in some data loss between the input video and the reconstructed base layer video. In general, the inter-layer residual video represents some differences (but not necessarily all differences) between the reconstructed base layer video and the input video. In the tool (200) of fig. 2, a differentiator subtracts samples of the reconstructed base layer video from corresponding samples of the input video to generate this inter-layer residual video. The input video may be additionally filtered before the differentiator.
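The differentiator's operation can be sketched per sample (illustrative only):

```python
def inter_layer_residual(input_samples, reconstructed_base):
    """Sketch: per-sample difference between the (optionally filtered)
    input video and the inverse-scaled reconstructed base layer video.
    The resulting residual is what the enhancement layer path encodes.
    """
    return [a - b for a, b in zip(input_samples, reconstructed_base)]

print(inter_layer_residual([402, 408, 600], [400, 410, 596]))  # [2, -2, 4]
```

The small signed values here are typical: the closer the inverse scaler brings the reconstructed base layer to the input, the less energy remains for the enhancement layer encoder.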
The second scaler (250) scales the inter-layer residual video for input to the enhancement layer video encoder (260). For example, the second scaler (250) remaps sample values of the inter-layer residual video so that the sample values have a distribution that facilitates efficient compression with the enhancement layer video encoder (260). Details of the scaling operations for inter-layer residual video in example implementations are described below. Alternatively, the second scaler (250) scales the inter-layer residual video using another mechanism.
The enhancement layer encoder (260) compresses the inter-layer residual video and generates an enhancement layer bitstream (298). A "picture" of the inter-layer residual video at a given time represents the difference between an input video picture and the corresponding reconstructed base layer picture, but it is still encoded as a picture by the enhancement layer video encoder (260). The enhancement layer bitstream (298) may also include parameters for the adaptive low-pass filtering and upsampling performed by the inverse scaler (230) and parameters for the remapping performed by the second scaler (250).
Although fig. 2 shows a single enhancement layer encoder (260), the inter-layer residual video itself may be split into multi-layer residual video for encoding with separate residual encoders. For example, the decomposer uses wavelet decomposition or another suitable decomposition mechanism to split the inter-layer residual video into a chroma high-pass residual layer and a sample depth residual layer, which are then encoded by a chroma high-pass encoder and a sample depth residual encoder, respectively, to generate two separate enhancement layer bitstreams.
A controller (not shown) receives inputs from various modules of the tool (200) during the encoding process and evaluates intermediate results. The controller works with modules such as the inverse scaler (230) and the second scaler (250), as well as modules within the base layer encoder (220) and the enhancement layer encoder (260), to set and change encoding parameters during encoding. The tree of coding parameter decisions to be evaluated, and the timing of the corresponding encoding decisions, varies from implementation to implementation. In some embodiments, the controller also receives input from an encoding session wizard interface, from another encoder application interface, or from another source to designate video to be encoded using particular rules.
The relationships shown between modules within the tool (200) represent general information flow; other relationships are not shown for the sake of simplicity. In particular, fig. 2 generally does not show the side information for the inverse scaler (230) and the second scaler (250). Such side information, once finalized, is signaled in the output bitstream or a side channel. Particular embodiments of scalable video coding tools typically use a variant or supplemented version of the tool (200). Depending on the implementation and the type of compression desired, modules can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with similar modules. In alternative embodiments, scalable video coding tools with different modules and/or other configurations of modules perform one or more of the described techniques.
III. Generalized decoding tool
Fig. 3 is a block diagram of a generalized scalable video decoding tool (300) in connection with which certain described techniques may be implemented. The decoding tool (300) receives one or more bitstreams of compressed video information, including bitstreams of different layers, and generates reconstructed video (395). For base layer video, the format of the base layer bitstream (305) may be Windows Media Video format, SMPTE 421-M format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, or H.264), or another format. For inter-layer residual video, the format of the enhancement layer bitstream (308) may be the same as the base layer bitstream (305), or it may be another format.
The decoding tool (300) includes a base layer decoder (320) that receives the base layer bitstream (305) and outputs reconstructed base layer video to a first inverse scaler (330). If the reconstructed base layer video has a different sample depth, spatial resolution, chroma sampling rate, etc. than the output video picture (due to scaling during encoding), then the first inverse scaler (330) upsamples or otherwise inverse scales the reconstructed base layer video so that it has a higher sample depth, spatial resolution, chroma sampling rate, etc. (e.g., the same sample depth, spatial resolution, chroma sampling rate, etc. as the output video (395)). The first inverse scaler (330) may also adaptively filter the reconstructed base layer video to remove certain types of artifacts (e.g., blocking artifacts, dither signals). For example, the first inverse scaler (330) filters the reconstructed base layer video using an adaptive low-pass filter as it upsamples the sample values of the reconstructed base layer video to a higher sample depth, and then upsamples the chroma channels of the reconstructed base layer video to a higher chroma sampling rate. The inverse scaler (330) may also perform inverse tone mapping at the same sample depth or a higher sample depth. Details of the inverse scaling operations for the reconstructed base layer video in example implementations are described below. The enhancement layer bitstream (308) may include parameters that control the adaptive low-pass filtering and upsampling performed by the first inverse scaler (330). Alternatively, the first inverse scaler (330) inverse scales the reconstructed base layer video using another mechanism.
The decoding tool (300) further comprises an enhancement layer decoder (340) that receives the enhancement layer bitstream (308) and outputs decoded inter-layer residual video to a second inverse scaler (350). The second inverse scaler (350) inverse scales the inter-layer residual video. For example, the second inverse scaler (350) remaps sample values of the inter-layer residual video to reverse the mapping performed during encoding. Details of inverse scaling operations for inter-layer residual video in an example implementation are described below. The enhancement layer bitstream (308) may include parameters that control the remapping operations of the second inverse scaler (350). Alternatively, the second inverse scaler (350) uses another mechanism to inverse scale the inter-layer residual video.
Although fig. 3 shows a single enhancement layer decoder (340), the inter-layer residual video itself may be split into multiple layers (signaled as multiple enhancement layer bitstreams) for decoding with separate enhancement layer decoders.
In some cases, one or more of the enhancement layer bitstreams are not present. This may occur, for example, if the bit stream is corrupted during transmission or on the storage medium. Alternatively, for certain types of playback devices or certain decoding scenarios, the enhancement layer bitstream is selectively dropped by the transmitter or by the decoding tool (300) in order to reduce the bit rate or reduce the decoding complexity.
The decoding tool (300) combines the reconstructed base layer video output by the first inverse scaler (330) with the reconstructed inter-layer residual video output from the second inverse scaler (350), if any, to produce a reconstructed video (395) for output. If the layers of the inter-layer residual video are separated during the encoding process by wavelet decomposition or another mechanism, the decoding tool (300) may combine the reconstructed residual layers by using wavelet synthesis or another mechanism before combining the generated inter-layer residual video with the reconstructed base layer video.
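The combination step described above can be sketched as follows. This is an illustrative sketch, not code from the patent; treating pictures as flat lists of sample values and clipping the sum to the valid output range are simplifying assumptions.

```python
def combine_layers(base, residual, bit_depth=8):
    """Add decoded inter-layer residual sample values to the inverse-scaled
    base layer sample values, clipping to the valid output sample range."""
    max_val = (1 << bit_depth) - 1
    return [min(max(b + r, 0), max_val) for b, r in zip(base, residual)]
```

For example, `combine_layers([100, 250, 5], [10, 20, -30])` yields `[110, 255, 0]`, with the second and third results clipped to the 8-bit range.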
The relationships shown between modules within the decoding tool (300) represent general information flow in the decoding tool (300); other relationships are not shown for simplicity. Particular embodiments of video decoding tools typically use a variant or complementary version of generalized decoding tools. Depending on the implementation and type of decompression desired, modules of the decoding tool may be added, omitted, split into multiple modules, combined with other modules, and/or replaced with similar modules. In alternative embodiments, decoding tools with different modules and/or other configurations of modules perform one or more of the described techniques.
IV. Adaptive filtering and upsampling of reconstructed base layer video
In some embodiments, the scalable video coding tool and the decoding tool perform inverse scaling on the reconstructed base layer video using a combination of adaptive low-pass filtering and upsampling, after the sample depth of the input video was scaled down before base layer encoding. The filtering and upsampling process can reduce image artifacts while also increasing the sample depth. Subsequent inverse tone mapping (e.g., from one color gamut to another) may optionally be performed at the same sample depth or at a higher sample depth. This approach helps deal with coding errors and artifacts (e.g., blocking artifacts, banding artifacts or, more generally, quantization noise) in the reconstructed base layer video. By adapting the filtering so that the reconstructed base layer video more closely approximates the input video, the scalable video coding tool can reduce the energy of the inter-layer residual video and thereby improve compression efficiency.
Adaptive low-pass filtering and upsampling have advantages over other methods of inverse scaling. For example, one method of restoring sample depth in reconstructed base layer video is to map sample values to higher sample depths through direct pixel-by-pixel mapping. Despite its simplicity, with this approach coding errors or banding artifacts caused by the limited sample depth of the base layer video easily propagate into the inter-layer residual video. Adaptive filtering and upsampling can help remove such artifacts.
Adaptive filtering and upsampling may also help improve quality in another way. In some encoding cases, the encoding tool adds a dither signal during the pre-processing of the base layer video, and then encodes the base layer video using the added dither signal. While the dither signal improves perceptual quality when the base layer video alone is played, the dither signal may add energy to the inter-layer residual video in scalable video coding. Thus, the encoding and decoding tools use low pass filters adapted to remove the added dither signal. The adaptive low-pass filter may remove compression artifacts such as blocking artifacts and banding artifacts simultaneously to reduce the energy of the inter-layer residual video.
A. Techniques for adaptive filtering and upsampling
Fig. 4 shows a generalized technique (400) for adaptive low-pass filtering and upsampling of reconstructed base layer video. A tool such as the scalable video coding tool (200) of fig. 2, the scalable video decoding tool (300) of fig. 3, or another tool performs the technique (400).
Initially, the tool receives (410) reconstructed base layer video with sample values having a first sample depth. For example, the reconstructed base layer video has 8-bit samples. Alternatively, the samples of the reconstructed base layer video have some other sample depth.
In some implementations, the tool also obtains one or more filter strength parameters of the low-pass filter. For example, during the encoding process, the encoding tool selects a filter strength parameter (e.g., after evaluating different values of the filter strength parameter, or after estimating which values of the filter strength parameter will provide good performance). Later, the encoding tool signals the filter strength parameter as side information in the enhancement layer bitstream or side information sent out-of-band. During decoding, the decoding tool parses the filter strength parameters from the enhancement layer bitstream (or side channel) and adjusts the low pass filter. Example filter strength parameters are presented below. Alternatively, the tool also uses other filter strength parameters. The filter strength parameters may vary picture by picture and channel by channel or according to some other way.
The tool filters (420) the base layer video using an adaptive low-pass filter and upsamples (430) sample values of the base layer video to a second sample depth that is higher than the first sample depth. For example, the tool performs filtering and upsampling using a filter implemented in one of the examples below to remove artifacts (e.g., blocking artifacts, dither signals) or smooth them while also restoring the sample depth to a higher level. Alternatively, the tool performs filtering and upsampling using another filter.
The tool performs the technique (400) picture by picture for the reconstructed base layer video, using a sliding window within a picture, or in some other way. Although fig. 4 shows filtering (420) before upsampling (430), in practice, the filtering (420) and upsampling (430) may be performed together sample by sample in a sliding window, or in some other order. Before or after the filtering and upsampling, the tool may perform inverse tone mapping (not shown in fig. 4) on the sample values of the reconstructed base layer video, compensating for tone mapping performed as part of the scaling prior to base layer encoding.
Fig. 10a and 10b illustrate a technique (1000) of scalable video coding including filtering and upsampling as shown in fig. 4. Fig. 11a and 11b illustrate a technique (1100) for scalable video decoding that includes filtering and upsampling as shown in fig. 4. Alternatively, the technique (400) is used in some other way in a scalable video encoding and/or decoding process.
B. Example implementation of adaptive filtering and upsampling
Example implementations use an adaptive filter that combines low-pass filtering and upsampling. This adaptive filtering can reduce artifacts and naturally bring the sample depth to a higher level. Subsequent tone mapping may then be done within the same sample depth, or to a higher sample depth.
While adaptive filtering has a flexible design, in general, it has two integrated components: low-pass filtering and sample depth upsampling. For example, for a current sample value s(x, y) at location (x, y) in a picture of reconstructed base layer video, the combined filtering and upsampling may be represented as follows:

s'(x, y) = (Σ_{-R ≤ i ≤ R} Σ_{-R ≤ j ≤ R} w(i, j) · s(x + i, y + j) · 2^(BD-8)) / N (1)

In this equation, w(i, j) represents a 2D low-pass filter with a normalization factor N, and R represents the filtering range. BD denotes a target sample depth, which is greater than or equal to the sample depth of the base layer video, shown as 8 in equation (1). Thus, s'(x, y) represents the filtered sample value with sample depth BD.
The 2D filter may be implemented as a 2D window or as a combination of 1D filters along one or more axes. Fig. 5 shows the axes along four directions: horizontal, vertical, top left to bottom right, and bottom left to top right, each direction including the current sample position (501). Sample values at positions that do not fall on one of these axes are given no weight (w(i, j) = 0). Sample values at positions falling on one of these axes are given full weight (w(i, j) = 1) and are counted towards the normalization factor. Alternatively, the filter uses another shape, for example, a shape suitable for smoothing different types of artifacts.
The size value R indicates the extent of possible filtering using the filter. In one implementation, R is 0, 1, or 2, so sample positions up to +/-2 horizontally and vertically with respect to the current position (x, y) may be considered. For an example sample value (500), fig. 5 shows a window (510) when R = 1 and a window (520) when R = 2.
In implementations that adapt to local complexity, the filter uses a threshold to exclude certain locations within the window. Without loss of generality, the following rule shows how the threshold adaptively changes which sample positions contribute to filtering in a 1D horizontal window. The position offset m represents the extent of similar sample values within the 1D window away from the current position (x, y). For example, the offset m is set to one less than the minimum absolute value of i that satisfies the following constraint:

|s(x + i, y) - s(x, y)| > T (2)

where -R ≤ i ≤ R. The threshold T is a filter threshold control parameter. In fig. 5, consider the sequence of sample values 16, 19, 20, 18, 17 in a 1D window, where the current sample value s(x, y) is 20. If T is 2, the offset value m is 1, since at offset +2, |17 - 20| > 2. If no value of i satisfies the constraint in equation (2), then m is R. For simplicity, the adaptive filter is symmetric; the same offset m is used in each direction. Alternatively, different offset values are used in different directions away from the current position.
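For illustration, the offset rule of equation (2) for a 1D horizontal window can be sketched as follows. This is a hypothetical helper, and it assumes the window lies entirely within the picture.

```python
def find_offset(samples, center, R, T):
    """Return the position offset m for the sample at index `center`:
    one less than the smallest |i| (1 <= |i| <= R) whose sample value
    differs from the center value by more than threshold T, or R if no
    position in the window violates the similarity constraint."""
    for d in range(1, R + 1):
        for i in (-d, d):
            if abs(samples[center + i] - samples[center]) > T:
                return d - 1
    return R
```

For the sequence 16, 19, 20, 18, 17 with current value 20, `find_offset([16, 19, 20, 18, 17], 2, 2, 2)` returns 1, matching the example above.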
In filtering, sample values at positions within the offset m from the current position (x, y) are weighted, and other sample values within the 1D window are not:

w(i, j) = 1 if |i| ≤ m, and w(i, j) = 0 otherwise (3)

where, for filtering within a 1D horizontal window, j = 0 and -R ≤ i ≤ R.
Similarly, for the adaptive threshold rule along a 1D vertical window, j varies in the range -R ≤ j ≤ R when finding the position offset m. For an adaptive threshold rule along a diagonal 1D window, both i and j vary in the range -R to R, with i equal to j (for one diagonal shown in fig. 5) or i equal to -j (for the other diagonal shown in fig. 5), to find the position offset m. For a 2D window, Euclidean distances from the current position to positions with i and j in the range -R to R may be considered to find the position offset m.
Regardless of whether an adaptive threshold rule is applied, the normalization factor N is determined when the values of w(i, j) are set. In some implementations, for simplicity, each tap coefficient w(i, j) is 0 or 1; when there is any non-zero w(i, j) value for i ≠ 0 or j ≠ 0, w(0, 0) is set to 0 so that the current sample does not contribute to the filtered result. The normalization factor N is then simply a count of the positions where w(i, j) = 1. More generally, different positions in w(i, j) may have different tap values, e.g., to provide a larger weight at the current position, to implement a bilinear or bicubic filter, to implement a de-ringing filter or other filter instead of a low-pass filter, or to smooth different types of artifacts; in such cases, the tap values of the positions contributing to the filtering are summed to determine the normalization factor N.
For the combined filtering and upsampling adaptation represented in equation (1), the strength of the filtering can be effectively controlled by setting the parameter values T and R. Fig. 5 shows the result of filtering the position s(x, y) in an example sample value (500) for different values of R and T. In general, to reduce the energy of the inter-layer residual video and thereby facilitate compression, the encoding tool adjusts one or more of the strength parameters used for filtering. Increasing R increases the possible window size for filtering, potentially resulting in stronger filtering. Increasing T tends to result in more locations contributing to the filtering, since more sample values will satisfy the similarity constraint, again tending to result in stronger filtering. For example, the threshold strength parameter T is set to 1 and the range R is one of {0, 1, 2}. When R is 0, there is no low-pass filtering. Alternatively, the strength parameter T and the range R have other possible values, or the encoding tool and the decoding tool adapt the filtering by changing another parameter. For example, the encoder adapts the weighting mechanism and the normalization factor.
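Putting the pieces together, the combined threshold-adaptive filtering and sample depth upsampling can be sketched for one row of sample values as follows. The border handling (shrinking the window at picture edges), the truncating integer division, and the 8-bit input assumption are illustrative choices, not requirements from the text.

```python
def filter_and_upsample_row(row, R, T, bd=10):
    """Combined threshold-adaptive low-pass filtering (1D horizontal window)
    and sample depth upsampling from 8-bit to bd-bit sample values."""
    n = len(row)
    out = []
    for x in range(n):
        # Offset m: one less than the smallest |i| whose sample value
        # differs from row[x] by more than threshold T; m = R if none does.
        m = R
        for d in range(1, R + 1):
            if ((x - d >= 0 and abs(row[x - d] - row[x]) > T) or
                    (x + d < n and abs(row[x + d] - row[x]) > T)):
                m = d - 1
                break
        # Unit tap weights within +/-m; the current sample is excluded
        # whenever any neighbor contributes (w(0, 0) = 0 in that case).
        taps = [x + i for i in range(-m, m + 1) if i != 0 and 0 <= x + i < n]
        if not taps:            # m == 0: only the current sample remains
            taps = [x]
        total = sum(row[p] for p in taps)
        # Scale to the target sample depth, then normalize by the tap count.
        out.append((total << (bd - 8)) // len(taps))
    return out
```

With R = 0 the function performs no low-pass filtering and reduces to plain sample depth upsampling of each sample value.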
The encoding tool signals the filter strength control parameters in the bitstream so that the corresponding decoding tool can apply the same filter strength parameters in the inverse scaling process on the reconstructed base layer video. For example, the enhancement layer bitstream includes a filter strength control parameter.
Depending on the implementation, the encoding and decoding tools may change the filter strength parameters per channel, per picture, or according to some other manner. In some implementations, the encoding and decoding tools may selectively disable filtering in certain regions (e.g., depending on local image complexity). In some implementations, the scalable video coding tools and decoding tools use the same filter strength parameters for the luma channel and the chroma channels of a given picture.
In addition to low pass filtering and sample depth upsampling, the encoding and decoding tools may also perform spatial upsampling. If the spatial resolution of the base layer video is lower than that of the inter-layer residual video, the encoding tool or the decoding tool may increase the spatial resolution using a spatial interpolation filter (e.g., a low pass filter).
V. Adaptive chroma upsampling of reconstructed base layer video
In some embodiments, the scalable video coding tools and decoding tools perform chroma upsampling on the reconstructed base layer video if the chroma sampling rate of the reconstructed base layer video is below the higher-fidelity level. By adapting the chroma upsampling filter so that the reconstructed base layer video more closely approximates the input video, the encoding tools can reduce the energy of the inter-layer residual video and thereby improve compression efficiency.
For example, scalable video coding tools select between linear interpolation and cubic interpolation in chroma upsampling for a given chroma channel of a picture of the base layer video. The coding tool selects the interpolation type that causes the reconstructed base layer video to more closely match the input video. The coding tool signals the selection in the bitstream and the corresponding decoding tool uses the same type of interpolation for chroma upsampling of a given chroma channel of the picture.
A. Techniques for adaptive chroma upsampling
Fig. 6 shows a generalized technique (600) for adaptive chroma upsampling of reconstructed base layer video. A tool such as the scalable video coding tool (200) of fig. 2, the scalable video decoding tool (300) of fig. 3, or another tool performs the technique (600).
First, the tool receives (610) reconstructed base layer video with a chroma channel having a first chroma sampling rate. For example, the reconstructed base layer video has a chroma sampling rate of 4:2:0 or 4:2:2. Alternatively, the reconstructed base layer video has another chroma sampling rate.
The tool then obtains (620) one or more chroma scaling parameters. For example, during the encoding process, the encoding tool selects the chroma scaling parameters (e.g., after evaluating different values of the chroma scaling parameters, or after estimating which values of the chroma scaling parameters will provide good performance). Later, the encoding tool signals the chroma scaling parameters as side information in the enhancement layer bitstream or side information sent out-of-band. In the decoding process, the decoding tool receives the chroma scaling parameters from the enhancement layer bitstream (or side channel) and adjusts the chroma upsampling accordingly. Example chroma scaling parameters will be presented below. Alternatively, the tool also uses other chroma scaling parameters. The chroma scaling parameters may vary picture by picture and channel by channel or according to some other way.
The tool scales (630) sample values of the chroma channel to a second chroma sampling rate that is higher than the first chroma sampling rate. For example, the tool scales the sample values of the chroma channel to convert the chroma sampling rate of the reconstructed base layer video from 4:2:0 to 4:2:2, from 4:2:2 to 4:4:4, or from 4:2:0 to 4:4:4. Alternatively, the tool scales the sample values of the chroma channel to another chroma sampling rate.
The chroma scaling parameter indicates the type of chroma upsampling to be used in the scaling (630) operation. For example, the chroma scaling parameter indicates whether the scaling uses linear interpolation with a first predefined filter or cubic interpolation with a second predefined filter. Alternatively, the chroma scaling parameter explicitly represents the filter coefficients of a filter to be used in chroma upsampling, the filter size of the filter, and/or another property of the filter, or the chroma scaling parameter represents switching between other types of interpolation. Alternatively, the chroma scaling parameter may otherwise indicate the type of chroma upsampling in terms of the strength of the chroma scaling and/or the mechanism used in the chroma scaling.
The tool performs the technique (600) picture by picture for the reconstructed base layer video, using a sliding window within a picture, or in some other way. Although fig. 6 shows chroma upsampling as separate from other filtering and upsampling operations, the different filtering and upsampling operations may be performed in combination. Before or after chroma upsampling, the tool may perform inverse tone mapping (not shown in fig. 6) on the sample values of the reconstructed base layer video, compensating for tone mapping performed as part of the scaling prior to base layer encoding.
Fig. 10a and 10b illustrate techniques (1000) for scalable video coding including chroma upsampling as shown in fig. 6. Fig. 11a and 11b illustrate a technique (1100) for scalable video decoding including chroma upsampling as shown in fig. 6. Alternatively, the technique (600) is used in some other way in a scalable video encoding and/or decoding process.
B. Example implementation of adaptive chroma upsampling
An example implementation of chroma upsampling switches between linear interpolation and cubic interpolation. In general, linear interpolation tends to smooth out high-frequency patterns in the processed sample values, which helps when high-frequency energy has been added to the base layer video and should be removed. In contrast, cubic interpolation tends to preserve or even emphasize high-frequency patterns, which helps when chroma sample values have been smoothed in the base layer video.
Linear interpolation uses the following filters to determine two new chroma sample values s_{t+1/4}(x, y) and s_{t+3/4}(x, y) between two chroma sample values s_t(x, y) and s_{t+1}(x, y) of the reconstructed base layer video:

s_{t+1/4}(x, y) = (3·s_t(x, y) + s_{t+1}(x, y)) >> 2 (4)

s_{t+3/4}(x, y) = (s_t(x, y) + 3·s_{t+1}(x, y)) >> 2 (5)

These correspond to filters with coefficients {3, 1}/4 and {1, 3}/4, respectively. Alternatively, linear interpolation uses filters with other coefficients.
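A sketch of the linear interpolation of equations (4) and (5), as a hypothetical helper assuming integer chroma sample values:

```python
def linear_chroma_upsample(s, t):
    """New chroma sample values at positions t + 1/4 and t + 3/4 between
    existing chroma samples s[t] and s[t + 1], per equations (4) and (5)."""
    quarter = (3 * s[t] + s[t + 1]) >> 2        # coefficients {3, 1}/4
    three_quarter = (s[t] + 3 * s[t + 1]) >> 2  # coefficients {1, 3}/4
    return quarter, three_quarter
```

For example, between chroma samples 100 and 120, the helper produces the intermediate values 105 and 115.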
Cubic interpolation uses the following filters to determine two new chroma sample values s_{t+1/4}(x, y) and s_{t+3/4}(x, y) between two chroma sample values s_t(x, y) and s_{t+1}(x, y) of the reconstructed base layer video:

s_{t+1/4}(x, y) = (-3·s_{t-1}(x, y) + 28·s_t(x, y) + 9·s_{t+1}(x, y) - 2·s_{t+2}(x, y)) >> 5 (6)

s_{t+3/4}(x, y) = (-2·s_{t-1}(x, y) + 9·s_t(x, y) + 28·s_{t+1}(x, y) - 3·s_{t+2}(x, y)) >> 5 (7)

These correspond to filters with coefficients {-3, 28, 9, -2}/32 and {-2, 9, 28, -3}/32, respectively. Alternatively, cubic interpolation uses filters with other coefficients. Depending on the implementation, the results of the cubic interpolation may be clipped so that the output values fall within the expected sample depth range.
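A corresponding sketch of the cubic interpolation of equations (6) and (7); the 8-bit clipping range is an assumption for illustration:

```python
def cubic_chroma_upsample(s, t):
    """New chroma sample values at positions t + 1/4 and t + 3/4, per
    equations (6) and (7); s[t - 1] and s[t + 2] must exist."""
    quarter = (-3 * s[t - 1] + 28 * s[t] + 9 * s[t + 1] - 2 * s[t + 2]) >> 5
    three_quarter = (-2 * s[t - 1] + 9 * s[t] + 28 * s[t + 1] - 3 * s[t + 2]) >> 5
    clip = lambda v: min(max(v, 0), 255)  # assumed 8-bit output range
    return clip(quarter), clip(three_quarter)
```

On a step edge such as 100, 100, 120, 120, the cubic filter yields 104 and 115 for the two new positions, staying closer to the nearer neighbor than linear interpolation does.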
Fig. 7 shows the result of linear interpolation using the filters of equations (4) and (5) for a set of chroma sample values. Fig. 7 also shows the results of cubic interpolation using the filters of equations (6) and (7) for the chroma sample values. The encoding tool and the decoding tool perform vertical interpolation when upsampling from 4:2:0 to 4:2:2. The encoding tool and the decoding tool perform horizontal interpolation when upsampling from 4:2:2 to 4:4:4. When upsampling from 4:2:0 to 4:4:4, the encoding and decoding tools may perform separable vertical and horizontal interpolation or perform 2D filtering. The type of interpolation (e.g., linear or cubic) may be the same or different horizontally and vertically.
The encoding tool and the decoding tool select the interpolation type for chroma upsampling per chroma channel per picture. Alternatively, the encoding and decoding tools switch in some other way, e.g., using the same type of interpolation for both chroma channels of a picture but potentially switching the type of interpolation from picture to picture. The selection of the type of interpolation used for chroma upsampling may be made independently of the type of chroma downsampling used in the encoding process. After chroma upsampling, the chroma sample values usually differ from the original chroma sample values due to compression, different filtering, etc., but the positions of the chroma sample values should be the same in the reconstructed base layer video and the input video.
After determining which type of chroma upsampling to use, the encoding tool signals the selected chroma scaling parameters in the enhancement layer bitstream or another bitstream. The decoding tool parses the chroma scaling parameters from the bitstream and uses them to select which type of chroma upsampling to perform.
The encoding and decoding tools may perform chroma upsampling in combination with adaptive low-pass filtering, sample depth upsampling, and/or inverse tone mapping. Alternatively, they may perform chroma upsampling independently. For example, the encoding tool and the decoding tool may perform chroma upsampling independently after low-pass filtering and sample depth upsampling, but before inverse tone mapping. Alternatively, the encoding and decoding tools independently perform chroma upsampling after low-pass filtering, sample depth upsampling, and inverse tone mapping.
VI. Adaptive remapping for inter-layer residual video
In some embodiments, scalable video coding tools and decoding tools perform sample value remapping on sample values of inter-layer residual video. In the remapping, the encoding tool scales the inter-layer residual video values by an appropriate factor that the encoding tool selects. The encoding tool signals the scaling factor to the corresponding decoding tool. Reversing the remapping, the decoding tool inverse scales the inter-layer residual video values according to the scaling factor and then combines the inverse scaled inter-layer residual video with the reconstructed base layer video. The scaling and inverse scaling allow inter-layer residual video with many different dynamic ranges to be efficiently encoded with a given enhancement layer codec.
For example, a typical enhancement layer video encoder is most effectively adapted to 8-bit values having a dynamic range of 256 (+/-128 around midpoint 128, for a range of 0 … 255). If the dynamic range of the inter-layer residual video is much greater than 256, skewed with respect to the midpoint 128, or much less than 256, the compression efficiency of the enhancement layer encoder may suffer. As such, the encoding tool maps the sample values of the inter-layer residual video to a target dynamic range of 256 (+/-128 around midpoint 128) for encoding; after decoding, the decoding tool maps the sample values of the inter-layer residual video back to the initial dynamic range.
Remapping sample values of inter-layer residual video is suitable for many encoding and decoding scenarios. For example, due to tone mapping, the difference between the input video and the reconstructed base layer video may exceed the dynamic range of the enhancement layer encoder. For example, if 10-bit input video (with a wide color gamut) is tone mapped to 8-bit video for base layer encoding (with a more limited color gamut), the difference between the 10-bit input video and the 10-bit reconstructed base layer video (after inverse tone mapping) often exceeds the dynamic range that can be efficiently encoded with 8-bit samples. In other cases, due to low-quality/low-bitrate encoding of the base layer video, the difference between the input video and the reconstructed base layer video results in inter-layer residual video with a large dynamic range that the enhancement layer encoder cannot encode efficiently.
In other cases, the difference between the input video and the reconstructed base layer video is much smaller than the dynamic range of the enhancement layer encoder. Since the enhancement layer encoder is not suited to encoding content with such a small dynamic range, quality suffers even when the inter-layer residual video is encoded at the highest quality allowed. For example, an enhancement layer encoder adapted to encode sample values with a dynamic range of 256 may have difficulty encoding inter-layer residual video whose sample values all have magnitude less than 5, i.e., a dynamic range of 9.
A. Techniques for remapping inter-layer residual video
Fig. 8 shows a generalized technique (800) that includes remapping sample values of inter-layer residual video. A tool such as the scalable video coding tool (200) of fig. 2, the scalable video decoding tool (300) of fig. 3, or another tool performs the technique (800).
Initially, the tool receives (810) inter-layer residual video having sample values in a first sample value set. For a remapping operation in scalable video coding, the first set of sample values is an initial set of sample values of the inter-layer residual video. For example, the inter-layer residual video first has 10-bit sample values with an initial set of sample values in the range -277 … 301, or -4 … 3, or -491 … 563. For an inverse remapping operation in scalable video decoding, the first set of sample values is a target set of sample values produced by remapping during scalable video encoding by an encoding tool.
The tool then obtains (820) one or more set remapping parameters. For example, in a scalable video coding process, the coding tool selects a set remapping parameter (e.g., after evaluating different values of the set remapping parameter, or after estimating which values of the set remapping parameter will provide good performance). Later, the encoding tool signals the set of remapping parameters as side information in the enhancement layer bitstream or side information sent out-of-band. In a scalable video decoding process, the decoding tool receives the set remapping parameters from the enhancement layer bitstream (or side channel) and adjusts the inverse remapping accordingly. Example set remapping parameters are presented below. Alternatively, the tool also uses other set remapping parameters. The set remapping parameters may vary picture by picture and channel by channel or according to some other way.
The tool maps (830) sample values from a first set of sample values to a second set of sample values. For example, for remapping in scalable video coding, the coding tool maps sample values from an initial set of sample values to a target set of sample values used in enhancement layer encoding/decoding. Alternatively, for inverse remapping in scalable video decoding, the decoding tool maps sample values from a target set of sample values used in enhancement layer encoding/decoding back to an initial set of sample values.
The tool performs the technique (800) picture by picture for the inter-layer residual video, or in some other manner. Although fig. 8 shows sample value remapping as separate from other operations, other operations may be performed in conjunction with sample value remapping.
Fig. 10a and 10b illustrate a technique (1000) of scalable video coding that includes sample value remapping as shown in fig. 8. Fig. 11a and 11b illustrate a technique (1100) for scalable video decoding that includes sample value remapping as shown in fig. 8. Alternatively, the technique (800) is used in some other way in a scalable video encoding and/or decoding process.
B. Example implementation of remapping for inter-layer residual video
In an example implementation, remapping sample values of the inter-layer residual video allows the dynamic range of the inter-layer residual video to be adjusted prior to enhancement layer encoding, and the adjustment to be reversed after enhancement layer decoding. In many encoding cases, adjusting the dynamic range of the inter-layer residual video increases the efficiency of enhancement layer encoding.
In the example implementation, the encoding tool determines whether to perform sample value remapping for the sample values of a picture of inter-layer residual video. The encoding tool makes this determination independently for the respective luma and chroma channels of the picture. For a picture, the encoding tool signals an on/off flag in the enhancement layer bitstream indicating whether sample value remapping is used for at least one channel.
When sample values are remapped for a channel, the encoding tool determines which parameters to use for the sample value remapping. In general, the encoding tool selects the parameters such that the dynamic range of the inter-layer residual video matches the dynamic range of the enhancement layer codec. For example, if the dynamic range of the enhancement layer codec is 256 (+/-128 around midpoint 128) and the initial dynamic range of the inter-layer residual video is 380 (-190 … 189 around midpoint 0), the encoding tool selects remapping parameters that reduce 380 to the target 256 and shift the range of sample values so that it has the midpoint of the target range.
Fig. 9 shows two examples of sample value remapping before encoding the inter-layer residual video. In the first example, the dynamic range of the sample values is 8 (the range -4 … 3), which is too small to be encoded efficiently. The encoding tool maps the sample values to the larger range 0 … 224. The center of the range is also shifted in the remapping. In the second example, the dynamic range of the sample values is 1054 (the range -491 … 563, having a midpoint of 36). The encoding tool maps the sample values to the smaller range 0 … 255. In the remapping, the encoding tool shifts the center of the range to 128.
When determining the remapping parameters, the encoding tool evaluates the sample values of the inter-layer residual video. For example, the encoding tool finds the highest and lowest values and from them determines the dynamic range of the inter-layer residual video. The ratio between the target dynamic range and the initial dynamic range of the inter-layer residual video generally indicates a possible scaling for the remapping; however, the encoding tool may select a more aggressive scaling that still keeps the sample values within the target dynamic range. The encoding tool may apply a "ceiling and floor" function to the sample values of the inter-layer residual video in order to mask outlier values that would otherwise mislead the encoding tool about the distribution of sample values. For example, for the second example of Fig. 9, if 99% of the values lie between -300 and 450, the encoding tool clips outliers such as -491 and 563, so that the dynamic range is 750 rather than 1054 and the scaling is less aggressive.
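A minimal sketch of this kind of "ceiling and floor" range measurement (the 99% keep fraction, the function name `effective_range`, and the symmetric tail-splitting rule are illustrative assumptions, not details from the patent):

```python
def effective_range(samples, keep_fraction=0.99):
    """Measure the dynamic range of inter-layer residual sample values,
    clipping the most extreme (1 - keep_fraction) of the values so that
    outliers do not force an overly aggressive scaling."""
    ordered = sorted(samples)
    n = len(ordered)
    cut = int(n * (1.0 - keep_fraction) / 2.0)  # samples to drop at each tail
    lo, hi = ordered[cut], ordered[n - 1 - cut]
    return lo, hi, hi - lo                      # floor, ceiling, effective range
```

With a distribution like the second example of Fig. 9, keeping 99% of the values yields the range -300 … 450 (dynamic range 750) instead of the full -491 … 563 (dynamic range 1054).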
The parameterization and scaling operations in encoder-side range remapping depend on the implementation. In general, for a given decoder-side range remapping scheme, the encoding tool may use any of a number of different encoder-side range remapping schemes that conform to the decoding scheme. The example implementation uses three parameters, Scale, Shift and Norm, that indicate how sample value remapping is performed. The encoding tool may use different Scale, Shift and Norm parameters for each channel of a picture of the inter-layer residual video. For a given initial sample value s(x, y) of a channel, the encoding tool calculates the remapped sample value s_r(x, y) as follows.

s_r(x, y) = nint((s(x, y) * 2^Norm) / Scale) + Shift    (8)

where the ratio 2^Norm / Scale indicates the overall dynamic range scaling, Shift indicates the offset of the center of the range, and the operator nint(x) returns the integer value nearest to the floating-point value x. The remapping operation may also include a rounding offset (not shown). For the first example of Fig. 9, the parameter values are Scale = 1, Shift = 128 and Norm = 5. For the second example, the parameter values are Scale = 33, Shift = 119 and Norm = 3.
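The forward remapping of equation (8) can be sketched directly; here nint is implemented as round-to-nearest via floor(x + 0.5) (a sketch only — a real encoder would operate on whole pictures):

```python
import math

def remap(s, scale, shift, norm):
    """Forward sample value remapping per equation (8):
    s_r(x, y) = nint((s(x, y) * 2^Norm) / Scale) + Shift,
    where nint rounds to the nearest integer."""
    return int(math.floor((s * (1 << norm)) / scale + 0.5)) + shift
```

With Scale = 1, Shift = 128, Norm = 5 this maps the range -4 … 3 of the first example to 0 … 224; with Scale = 33, Shift = 119, Norm = 3 it maps -491 … 563 to 0 … 255 and the midpoint 36 to 128, matching the two examples of Fig. 9.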
At the picture level in the enhancement layer bitstream, the encoding tool signals the 1-bit syntax element RES_SCALING_PRESENT. If RES_SCALING_PRESENT is zero, no residual remapping parameters are present in the bitstream, and the default values Scale = 1, Shift = 128 and Norm = 0 apply. Applying these default values in the remapping changes sample values from the initial range -128 … 127 to the target range 0 … 255, and the inverse remapping changes the sample values back to the initial range.
If RES_SCALING_PRESENT is 1, residual remapping parameters are present in the bitstream. The bitstream includes the parameters shown in the following table.
Parameter   Bits   Semantics
SCALE_Y     8      Scale parameter for the Y channel of the picture, 1 ≤ Scale ≤ 256.
SHIFT_Y     8      Shift parameter for the Y channel of the picture, 0 ≤ Shift ≤ 255.
NORM_Y      3      Norm parameter for the Y channel of the picture, 0 ≤ Norm ≤ 7.
SCALE_U     8      Scale parameter for the U channel of the picture, 1 ≤ Scale ≤ 256.
SHIFT_U     8      Shift parameter for the U channel of the picture, 0 ≤ Shift ≤ 255.
NORM_U      3      Norm parameter for the U channel of the picture, 0 ≤ Norm ≤ 7.
SCALE_V     8      Scale parameter for the V channel of the picture, 1 ≤ Scale ≤ 256.
SHIFT_V     8      Shift parameter for the V channel of the picture, 0 ≤ Shift ≤ 255.
NORM_V      3      Norm parameter for the V channel of the picture, 0 ≤ Norm ≤ 7.
Table 1: Example sample value remapping parameters
At the decoder side, for each picture of the inter-layer residual video, the decoding tool receives the 1-bit on/off flag and, when present, the Scale, Shift and Norm parameters signaled by the encoding tool for each channel of the picture. To reconstruct the sample values s'(x, y) of a channel from the remapped sample values s_r(x, y), the decoding tool performs inverse scaling as follows.
s'(x, y) = ((s_r(x, y) - Shift) * Scale) >> Norm    (9)
Low complexity is particularly valuable at the decoder side, and the operation according to equation (9) is division-free.
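Equation (9) translates directly into integer code; the arithmetic right shift by Norm takes the place of a division:

```python
def inverse_remap(s_r, scale, shift, norm):
    """Reconstruct a residual sample s'(x, y) from a remapped sample
    s_r(x, y) per equation (9), using only a subtract, a multiply and
    an arithmetic right shift -- no division."""
    return ((s_r - shift) * scale) >> norm
```

For the first example of Fig. 9 (Scale = 1, Shift = 128, Norm = 5), the remapped values 0 and 224 reconstruct to -4 and 3; the default parameters (Scale = 1, Shift = 128, Norm = 0) map 0 … 255 back to -128 … 127.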
In some implementations, the sample values of skipped macroblocks or skipped channels in the inter-layer residual video are zero. If the enhancement layer decoder simply set those sample values to zero, the inverse remapping would change them to non-zero values. Therefore, the enhancement layer decoder sets the sample values s_r(x, y) of skipped macroblocks or skipped channels to Shift, so that they are zero after inverse remapping: s'(x, y) = ((Shift - Shift) * Scale) >> Norm = 0.
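The skipped-block rule can be checked in a few lines (the parameter values are the illustrative ones from the second example of Fig. 9):

```python
def inverse_remap(s_r, scale, shift, norm):
    # Inverse remapping per equation (9).
    return ((s_r - shift) * scale) >> norm

SCALE, SHIFT, NORM = 33, 119, 3        # illustrative remapping parameters
skipped = [SHIFT] * 8                  # fill skipped samples with Shift, not 0
residuals = [inverse_remap(v, SCALE, SHIFT, NORM) for v in skipped]
```

Every entry of `residuals` is zero, as the skipped macroblock or channel requires.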
In some implementations, the enhancement layer encoder and decoder perform motion compensation on the inter-layer residual video. If one or more channels of the current picture have different remapping parameters than the corresponding channels of a reconstructed reference picture, the enhancement layer encoder and decoder may adjust the affected channels of the reconstructed picture. For example, the encoding and decoding tools inverse-map the sample values of the affected channels of the reconstructed picture to their original dynamic range using the remapping parameters of those channels, and then remap the sample values using the remapping parameters of the corresponding channels of the current picture. If the second remapping yields sample values outside the target dynamic range, those sample values are clipped. The enhancement layer encoder and decoder then perform motion compensation, relative to the reconstructed picture, for blocks, macroblocks, etc. of the current picture.
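A sketch of this per-sample adjustment, assuming the equation (8)/(9) remapping and a 0 … 255 target range (the helper name `rescale_reference` is illustrative, not from the patent):

```python
import math

def remap(s, scale, shift, norm):
    # Forward remapping per equation (8).
    return int(math.floor((s * (1 << norm)) / scale + 0.5)) + shift

def inverse_remap(s_r, scale, shift, norm):
    # Inverse remapping per equation (9).
    return ((s_r - shift) * scale) >> norm

def rescale_reference(s_r, ref_params, cur_params, lo=0, hi=255):
    """Bring a reference-picture sample into the current picture's
    remapping: inverse-map with the reference parameters, remap with
    the current parameters, and clip to the target dynamic range."""
    s = inverse_remap(s_r, *ref_params)
    return max(lo, min(hi, remap(s, *cur_params)))
```

For instance, the sample 224 stored under (Scale = 1, Shift = 128, Norm = 5) represents the residual 3; re-remapped under (1, 128, 0) it becomes 131, ready for motion compensation against the current picture.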
The previous examples use Scale, Shift and Norm parameters with particular ranges of values. Alternatively, the encoding and decoding tools use parameters with different value ranges (e.g., a larger range), or parameters that allow different levels of precision in scaling and inverse scaling. Alternatively, the encoding and decoding tools use other parameters for remapping the sample values. The remapping operations in equations (8) and (9) use linear scaling. Alternatively, the remapping operations follow other linear mapping rules, or are implemented with a look-up table or other non-linear rule.
In the previous examples, the range remapping uses different scaling factors for the luma and chroma channels of the inter-layer residual video. Alternatively, the range remapping uses the same scaling factor for the luma and chroma channels of the inter-layer residual video. Similarly, the encoding tool may signal the on/off flag on a channel-by-channel basis, a slice-by-slice basis, or some other basis, rather than on a picture-by-picture basis.
VII. Combined Implementation
Fig. 10a and 10b illustrate an example technique (1000) for scalable video coding using adaptive low-pass filtering, sample depth upsampling, chroma upsampling, and residual remapping. An encoding tool, such as the encoding tool (200) shown in fig. 2, or another encoding tool, performs the technique (1000). Generally, in fig. 10a and 10b, operations performed with a base layer encoder or an enhancement layer encoder are shown separately from the other operations performed as part of scalable video coding.
Initially, the encoding tool scales (1010) the input video to generate a base layer video. The base layer encoder encodes (1020) the base layer video, resulting in encoded data that the base layer encoder signals in the base layer bitstream. The base layer encoder also reconstructs (1022) the base layer video.
The coding tool selects (1030) one of a plurality of filter strength parameters of an adaptive low-pass filter, filters (1032) the reconstructed base layer video using the adaptive low-pass filter, and upsamples (1034) sample values of the reconstructed base layer video to a higher sample depth. For example, to evaluate different values of the filter strength parameter, the encoding tool filters the reconstructed base layer video with a low-pass filter adjusted according to a given value, and then checks the results of the filtering/upsampling. After the encoding tool finds an acceptable value, the encoding tool signals (1036) a filter strength parameter in the enhancement layer bitstream. The encoding tool may also perform inverse tone mapping on the upsampled values of the reconstructed base layer video.
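The adaptive low-pass filter is characterized elsewhere in the document (see the claims) as varying its normalization factor with the number of similar neighbors. The following is a hedged sketch of one such threshold-based variant; the 3×3 kernel, the averaging rule, and the rounding are illustrative assumptions rather than the patent's exact filter:

```python
def adaptive_lowpass(img, threshold):
    """3x3 low-pass filter whose normalization factor at each position
    depends on how many neighboring sample values are within `threshold`
    of the current sample; dissimilar neighbors are excluded, which
    smooths noise while preserving strong edges."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            cur = img[y][x]
            total, count = cur, 1
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        if abs(img[ny][nx] - cur) <= threshold:
                            total += img[ny][nx]
                            count += 1
            out[y][x] = (total + count // 2) // count  # normalize by count
    return out
```

The similarity threshold here plays the role of a filter strength parameter: a larger threshold admits more neighbors and filters more strongly.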
The coding tool also selects (1040) one or more chroma scaling parameters for adaptive chroma upsampling and scales (1042) the chroma channel to a higher chroma sampling rate. For example, to evaluate different values of the chroma scaling parameter, the coding tool performs chroma upsampling as indicated by the value and then checks the result. After the coding tool finds an acceptable value, the coding tool signals (1044) the chroma scaling parameters in the enhancement layer bitstream.
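As one minimal, illustrative instance of chroma upsampling (not the patent's filter), nearest-neighbor replication doubles a chroma plane in each dimension, e.g. taking 4:2:0 chroma toward 4:4:4; an adaptive scheme would choose among such filters according to the signaled chroma scaling parameters:

```python
def upsample_chroma_2x(chroma):
    """Double a chroma plane in each dimension by sample replication --
    the simplest filter an adaptive chroma upsampling scheme might select."""
    out = []
    for row in chroma:
        wide = [v for v in row for _ in (0, 1)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat each row vertically
    return out
```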
The coding tool determines (1050) the inter-layer residual video as a sample-to-sample difference between the reconstructed base layer video and the input video, and then remaps the sample values of the inter-layer residual video. The encoding tool selects (1060) one or more set remapping parameters and then maps (1062) the sample values of the inter-layer residual video from one set of sample values to another set of sample values according to the set remapping parameters. For example, to evaluate different values of the set remapping parameters, the encoding tool performs the mapping as indicated by the values and then checks the results. After the coding tool finds an acceptable value, the coding tool signals (1064) the set of remapping parameters in the enhancement layer bitstream.
Finally, the enhancement layer encoder encodes (1070) the inter-layer residual video, resulting in encoded data signaled in the enhancement layer bitstream. The encoding tool repeats the adaptive encoding picture by picture (1000).
Fig. 11a and 11b illustrate an example technique (1100) for scalable video decoding using adaptive low-pass filtering, sample depth upsampling, chroma upsampling, and residual remapping. A decoding tool, such as the decoding tool shown in fig. 3, or another decoding tool, performs the technique (1100). Generally, in fig. 11a and 11b, operations performed with a base layer decoder or an enhancement layer decoder are shown separately from the other operations performed as part of scalable video decoding.
The base layer decoder receives (1110) encoded data of the base layer video in the base layer bitstream and decodes (1112) the base layer video. The decoding tool parses (1130) the one or more filter strength parameters from the enhancement layer bitstream and uses the filter strength parameters to adjust the adaptive low-pass filter. The decoding tool filters (1132) the reconstructed base layer video using an adaptive low-pass filter and upsamples (1134) the sample values of the base layer video to a higher sample depth. The encoding and decoding tools perform the same filtering (1032, 1132) and upsampling (1034, 1134) operations on the reconstructed base layer video. The decoding tool may also perform inverse tone mapping on the upsampled values of the reconstructed base layer video.
The decoding tool parses (1140) the one or more chroma scaling parameters from the enhancement layer bitstream. The decoding tool scales (1142) the chroma channel of the reconstructed base layer video to a higher chroma sampling rate using a chroma upsampling represented by a chroma scaling parameter. The encoding and decoding tools perform the same chroma upsampling (1042, 1142) operation on the reconstructed base layer video.
Independently, the enhancement layer decoder receives (1150) encoded data of the inter-layer residual video in the enhancement layer bitstream and decodes (1152) the inter-layer residual video. The decoding tool parses (1160) one or more set remapping parameters from the enhancement layer bitstream and then maps (1162) sample values of the inter-layer residual video from one set of sample values to another set of sample values according to the set remapping parameters. The decoding tool performs remapping (1162) operations that are the inverse of the remapping operations (1062) performed by the encoding tool.
Finally, the decoding tool combines (1170) the remapped inter-layer residual video with the filtered/upsampled base layer video, resulting in a reconstructed version of the input video. The decoding tool repeats the adaptive decoding picture by picture (1100).
VIII. Alternatives
Many of the examples described herein relate to adaptive behavior (e.g., for filtering, chroma upsampling or sample value remapping) represented by parameters that are signaled as side information. Alternatively, the encoding and decoding tools adapt filtering, chroma upsampling, and/or sample value remapping based on context information available to the encoding and decoding tools without explicitly signaling parameters as side information.
Having described and illustrated the principles of the invention with reference to various embodiments, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment unless otherwise specified. Various types of general purpose or special purpose computing environments may be used or operations may be performed in accordance with the teachings described herein. Elements of embodiments shown in software may be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as fall within the scope and spirit of the appended claims and their equivalents.

Claims (9)

1. A method of using a scalable video processing tool, the method comprising:
receiving the base layer video after reconstructing the base layer video, wherein the reconstructed base layer video has a plurality of sample values with a first sample depth;
filtering the reconstructed base layer video using an adaptive low-pass filter; and
upsampling the plurality of sample values of the reconstructed base layer video to a second sample depth that is higher than the first sample depth;
wherein the filtering comprises changing a normalization factor of the adaptive low-pass filter at a location depending on how many neighboring sample values around the location are within a threshold of similarity to a current sample value at the location.
2. The method of claim 1, wherein the adaptive low-pass filter is adjusted based on one or more filter strength parameters signaled as side information.
3. The method of claim 1, wherein the reconstructed base layer video comprises a picture, and wherein the adaptive low-pass filter is adapted to remove artifacts or jitter.
4. The method of claim 1, further comprising inverse tone mapping the plurality of sample values of the reconstructed base layer video after the upsampling.
5. The method of claim 1, wherein the reconstructed base layer video has a luma channel and a plurality of chroma channels, each of the plurality of chroma channels having a first chroma sampling rate, the method further comprising:
scaling each of the plurality of chroma channels to a second chroma sampling rate different from the first chroma sampling rate based at least in part on one or more chroma scaling parameters signaled as side information.
6. The method of claim 2, wherein the one or more filter strength parameters comprise a kernel size of the adaptive low-pass filter and/or a threshold value of similarity for comparing a current sample value to adjacent sample values.
7. The method of claim 1, wherein the reconstructed base layer video comprises a picture having a first spatial resolution, and wherein the method further comprises upsampling the picture to a second spatial resolution different from the first spatial resolution.
8. The method of claim 1, wherein the method further comprises, during encoding:
scaling an input video to generate the base layer video;
encoding the base layer video with a base layer video encoder to generate at least a portion of a base layer bitstream and reconstruct the base layer video;
determining an inter-layer residual video from the input video and the reconstructed base layer video after the filtering and the upsampling;
encoding the inter-layer residual video with an enhancement layer video encoder to generate at least a portion of an enhancement layer bitstream; and
outputting the at least a portion of the base layer bitstream and the at least a portion of the enhancement layer bitstream.
9. The method of claim 1, wherein the method further comprises, during decoding:
receiving at least a portion of a base layer bitstream and at least a portion of an enhancement layer bitstream;
decoding the base layer video with a base layer video decoder using the at least a portion of the base layer bitstream to produce the reconstructed base layer video;
decoding an inter-layer residual video with an enhancement layer video decoder using the at least a portion of the enhancement layer bitstream; and
combining the inter-layer residual video and the reconstructed base layer video after the filtering and the upsampling.
HK11113315.8A 2008-08-25 2009-08-14 Conversion operations in scalable video encoding and decoding HK1161788B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/197,922 US9571856B2 (en) 2008-08-25 2008-08-25 Conversion operations in scalable video encoding and decoding
US12/197,922 2008-08-25
PCT/US2009/053896 WO2010027634A2 (en) 2008-08-25 2009-08-14 Conversion operations in scalable video encoding and decoding

Publications (2)

Publication Number Publication Date
HK1161788A1 HK1161788A1 (en) 2012-08-03
HK1161788B true HK1161788B (en) 2014-02-14
