HK1160318A

HK1160318A - Reduced dc gain mismatch and dc leakage in overlap transform processing

Info

Publication number: HK1160318A
Application number: HK12100545.6A
Authority: HK
Inventors: D. Schonberg; S. L. Regunathan; S. Sun; G. J. Sullivan; Z. Zhou; S. Srinivasan
Original assignee: Microsoft Technology Licensing, LLC
Priority date: 2008-10-10
Filing date: 2009-10-09
Publication date: 2012-08-10

Description

Reduced DC gain mismatch and DC leakage in overlap transform processing

Background

Transform coding is a compression technique used in many audio, image and video compression systems. Uncompressed digital images and video are typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional (2D) grid. This is conventionally referred to as a spatial-domain representation of the image or video. For example, a typical format for a rectangular image consists of three two-dimensional arrays of 8-bit color samples. Each sample is a number representing the value of a color component at a spatial location in the grid, where each color component represents an amplitude along an axis of a color space such as RGB or YUV. The individual samples in one of these arrays may be referred to as pixels. (In other common usage, the term pixel refers to an n-tuple of n spatially co-located color component samples, e.g., a 3-tuple grouping of the R, G, and B color component values at a given spatial location; herein, however, the term may alternatively be used to refer to a scalar-valued sample.) Various image and video systems may use different color, spatial, and temporal sampling resolutions. Similarly, digital audio is typically represented as a time-sampled audio signal stream. For example, a typical audio format consists of a stream of 16-bit audio signal amplitude samples representing the amplitude of the audio signal at regularly spaced time instants.

Uncompressed digital audio, image, and video signals can consume a large amount of storage and transmission capacity. Transform coding may be used with other coding techniques to reduce the amount of data required to represent such digital audio, images and video, for example by transforming a spatial-domain (or time-domain) representation of a signal into a frequency-domain (or other similar transform-domain) representation, to enable a subsequent reduction in the amount of data required to represent the signal. The reduction in the amount of data is typically achieved by applying a process called quantization, by selectively discarding certain frequency components of the transform-domain representation (or a combination of both), followed by applying entropy coding techniques such as adaptive Huffman coding or adaptive arithmetic coding. The quantization process may be selectively applied based on the estimated perceptual sensitivities of the frequency components or based on other criteria. For a given output bit rate, properly applied transform coding typically results in much less perceptible degradation of the digital signal than reducing the color sample fidelity or spatial resolution of an image or video directly in the spatial domain, or reducing the sample fidelity of audio in the time domain.

More specifically, typical block transform-based encoding techniques divide the uncompressed pixels of a digital image into fixed-size two-dimensional blocks (X_1, ..., X_n).

A linear transform that performs a space-frequency analysis is applied to a given block, which converts the spatial-domain samples within the block into a set of frequency (or transform) coefficients that generally represent the strength of the digital signal in the corresponding frequency bands over the block interval. For compression, the transform coefficients may be quantized (i.e., reduced in precision, such as by dropping the least significant bits of the coefficient values or mapping values in a higher-precision number set to a lower precision), and also entropy or variable-length encoded into a compressed data stream. At decoding time, the transform coefficients are inverse quantized and inverse transformed back into the spatial domain to reconstruct a close approximation of the original color/spatial sampled image/video signal (the reconstructed blocks).
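The quantization round trip described above can be sketched as follows. This is a minimal illustration only, assuming a uniform scalar quantizer; the function names `quantize` and `dequantize` and the step size of 8 are hypothetical and are not taken from any particular codec.

```python
def quantize(coeffs, step):
    """Map each transform coefficient to a lower-precision integer index."""
    # Round to nearest, ties away from zero; int() truncates toward zero.
    return [int(c / step + (0.5 if c >= 0 else -0.5)) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct approximate coefficient values from the indices."""
    return [q * step for q in indices]


coeffs = [103, -14, 5, -2]        # illustrative transform coefficients
indices = quantize(coeffs, 8)     # [13, -2, 1, 0]
recon = dequantize(indices, 8)    # [104, -16, 8, 0]
```

The reconstruction error of each coefficient is bounded by half the step size, which is why a larger step gives a smaller bitstream at lower fidelity.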

The ability to exploit the correlation of samples in a block and thereby maximize compression capability is a major requirement in transform design. In many block transform-based coding applications, the transform should be reversible to support both lossy and lossless compression, depending on the quantization operation applied in the transform domain. For example, without quantization, coding that uses a reversible transform enables exact reproduction of the input data when the corresponding decoding is applied. However, the requirement of reversibility in these applications constrains the choice of transforms that can be used in designing the coding technique. The implementation complexity of the transform is another important design constraint. Therefore, the transform design is typically chosen such that applying the forward and inverse transforms involves only multiplications by small integers and other simple mathematical operations such as additions, subtractions and shift operations (which implement multiplication or division by powers of 2, such as 4, 8, 16, 32, etc.), so that a fast integer implementation with minimal dynamic range expansion can be obtained.
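A reversible integer transform built only from additions, subtractions and shifts can be sketched as follows. This is the generic two-point lifting (sum/difference) construction, shown purely as an illustration of reversibility; it is not one of the operators defined in this description, and the function names are hypothetical.

```python
def forward_lift(a, b):
    """Reversible 2-point transform using only subtract, add, and shift."""
    d = a - b            # difference (high-pass) term
    s = b + (d >> 1)     # approximate average (low-pass); >> rounds down
    return s, d


def inverse_lift(s, d):
    """Undo forward_lift exactly by reversing the steps in opposite order."""
    b = s - (d >> 1)     # same shifted value is subtracted back out
    a = d + b
    return a, b


s, d = forward_lift(7, 2)     # (4, 5)
a, b = inverse_lift(s, d)     # (7, 2)
```

Because each lifting step adds a quantity that the inverse can recompute and subtract, the rounding in the shift does not prevent exact reconstruction.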

Many image and video compression standards, such as JPEG (ITU-T T.81 | ISO/IEC 10918-1) and MPEG-2 (ITU-T H.262 | ISO/IEC 13818-2), utilize transforms based on the Discrete Cosine Transform (DCT). The DCT is known to have advantageous energy compaction characteristics, but it also has drawbacks in many implementations. The DCT is described in "Discrete Cosine Transform" by N. Ahmed, T. Natarajan and K. R. Rao, IEEE Transactions on Computers, C-23 (January 1974), pages 90-93.

In compressing still images (or intra-coded frames in video sequences), many common standards, such as JPEG and MPEG-2, divide an array representing an image into blocks of 8x8 samples and apply a block transform to each such image block. The transform coefficients in a given block in these designs are affected only by the sample values within the block region. In image and video coding, quantization of the data in these independently constructed blocks can lead to discontinuities at block boundaries and thus create visually annoying artifacts known as blocking artifacts or blockiness. Similarly for audio data, when non-overlapping blocks are independently transform coded, the quantization error will produce discontinuities at block boundaries in the signal upon reconstruction of the audio signal at the decoder. For audio, a periodic clicking effect may be heard.

Techniques for mitigating blocking artifacts include using deblocking filters to smooth signal values across inter-block edge boundaries. These techniques are not without their drawbacks. For example, deblocking techniques may require significant computational implementation resources.

Another approach is to reduce blocking artifacts by using a lapped transform as described in "Signal Processing with Lapped Transforms" by H. Malvar, Artech House, Norwood MA, 1992. In general, a lapped transform is a transform that has an input region that spans some neighboring samples in neighboring blocks in addition to the samples in the current block. Likewise, on the reconstruction side, the inverse lapped transform affects some of the decoded samples in the neighboring blocks as well as the samples of the current block. Thus, even if quantization is present, the inverse transform can maintain continuity across block boundaries, thereby resulting in reduced blockiness. Another advantage of the lapped transform is that it can exploit cross-block correlation, which results in better compression capability. In some lapped transform implementations, overlapping blocks of samples are processed in the forward and inverse transforms. In other implementations, the overlap process is separate from the transform process; for encoding, the overlap process is performed across block boundaries prior to the forward transform performed on non-overlapping blocks, while for decoding, the inverse transform is performed on non-overlapping blocks and then the overlap process is performed across block boundaries.

For the case of two-dimensional data, in general, the two-dimensional lapped transform is a function of the current block, of selected elements of the blocks to the left of, above, to the right of, and below the current block, and possibly of the blocks to the above left, above right, below left, and below right of the current block. The number of samples in neighboring blocks used to compute the lapped transform for the current block is referred to as the amount of overlap or support.
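The set of neighboring blocks that can contribute to the lapped transform of a given block can be sketched as follows. This is an illustrative helper only; the function name `lapped_support_blocks` and the (row, column) block indexing are assumptions made for the example.

```python
def lapped_support_blocks(row, col, rows, cols, include_diagonals=True):
    """List the neighbor blocks whose samples may feed the lapped
    transform of block (row, col) in a rows x cols grid of blocks."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # above, below, left, right
    if include_diagonals:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # the four diagonal blocks
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < rows and 0 <= col + dc < cols]


interior = lapped_support_blocks(1, 1, 3, 3)   # all 8 neighbors exist
corner = lapped_support_blocks(0, 0, 3, 3)     # [(1, 0), (0, 1), (1, 1)]
```

A block at a corner or edge of the grid has fewer neighbors than an interior block, which is why separate treatment of edges and corners is needed.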

Summary

In summary, the detailed description relates to various techniques for digital media compression and decompression. For example, various techniques are applied to address DC gain mismatch and/or DC leakage phenomena in overlap processing operations during encoding and/or decoding.

According to one aspect of the disclosed technology, a digital media decoding device performs an inverse lapped transform when decoding digital media. The digital media decoding device performs an inverse frequency transform on the digital media. The device then applies a plurality of overlap operators to the results of the inverse frequency transform. A first of the plurality of overlap operators is an interior overlap operator, and a second of the plurality of overlap operators is an edge or corner overlap operator. Each of the plurality of overlap operators is characterized by a substantially equivalent DC gain. This reduces DC gain mismatch between the operators.

In corresponding encoding, a digital media encoding device performs a lapped transform when encoding digital media. In pre-processing, the device applies a plurality of overlap operators to digital media data samples or to results from earlier stages of encoding such digital media data samples. A first of the plurality of overlap operators is an interior overlap operator, and a second of the plurality of overlap operators is an edge or corner overlap operator. Again, each of the plurality of overlap operators is characterized by a substantially equivalent DC gain, which reduces DC gain mismatch between the operators and thereby improves compression performance. The digital media encoding device performs a frequency transform on the results of the overlap pre-processing. In addition to reducing DC gain mismatch, the plurality of overlap operators exhibit reduced DC leakage in many cases.

In accordance with another aspect of the disclosed technology, a digital media decoding device receives information in an encoded bitstream indicating a selected tile boundary option, wherein the selected tile boundary option indicates one of hard tile boundary processing for overlap operators and soft tile boundary processing for overlap operators. Based at least in part on the selected tile boundary option, the digital media decoding device performs inverse overlap processing. For example, soft tile boundary processing is characterized by overlap processing across tile boundaries, while hard tile boundary processing is characterized by the absence of such overlap processing across tile boundaries. In some implementations, the inverse overlap processing may include applying overlap operators designed to have reduced DC gain mismatch and/or DC leakage.

In a corresponding encoding, the digital media encoding device selects between using hard tile boundary processing for the overlap operator and soft tile boundary processing for the overlap operator. The digital media encoding device performs an overlap process according to the selected tile boundary option. The apparatus also signals information indicating the selected tile boundary option in the encoded bitstream. In some implementations, this allows switching between a first mode (hard tiles) that typically has lower compression efficiency but no dependencies between tiles and a second mode (soft tiles) that typically has higher compression efficiency but dependencies between tiles.

The above summary is only a brief overview and is not intended to describe all features of the invention presented herein. The foregoing and other objects, features and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

Brief Description of Drawings

Fig. 1 is a flow chart of an encoder including a lapped transform using reversible overlap operators.

Fig. 2 is a flow chart of a decoder including a corresponding inverse lapped transform.

Fig. 3 is an illustration of a block arrangement in an example implementation, depicting a layout of a 4x4 interior overlap operator, a 4x1 edge overlap operator, and a 2x2 corner overlap operator in the example implementation for use in a first stage of the lapped transform and for full-resolution channels in a second stage of the lapped transform. The depicted overlap operators are also used in the corresponding stages of the inverse lapped transform in the example implementation.

Fig. 4 is an illustration of a block arrangement in an example implementation, depicting a layout of a 2x2 internal overlap operator, a 2x1 edge overlap operator, and a 1x1 corner overlap operator in an example implementation for use in a second stage lapped transform for a 4:2:2 downsampled chroma channel. The depicted overlap operator is also used in the corresponding stage of the inverse lapped transform in the example implementation.

Fig. 5 is an illustration of a block arrangement in an example implementation, depicting a layout of a 2x2 internal overlap operator, a 2x1 edge overlap operator, and a 1x1 corner overlap operator in an example implementation for use in a second stage lapped transform for a 4:2:0 downsampled chroma channel. The depicted overlap operator is also used in the corresponding stage of the inverse lapped transform in the example implementation.

Fig. 6A is a flow diagram depicting an example method for selecting and signaling hard or soft tile boundaries for overlap processing.

Fig. 6B is a flow diagram depicting an example method for receiving a selected hard or soft tile boundary indicator for inverse overlap processing.

Fig. 7A is a flow diagram depicting an example method for performing a lapped transform using overlap operators with reduced DC gain mismatch and reduced DC leakage.

Fig. 7B is a flow diagram depicting an example method for performing an inverse lapped transform using overlap operators with reduced DC gain mismatch and reduced DC leakage.

Fig. 8 is a block diagram of a suitable computing environment for implementing the techniques described herein.

Detailed Description

The following description relates to digital media compression or decompression systems, i.e., encoders or decoders, that utilize a forward/inverse lapped transform design that addresses DC gain mismatch and/or DC leakage phenomena. For purposes of illustration, an embodiment of a compression/decompression system incorporating these techniques is an image or video compression/decompression system. Alternatively, the techniques described herein may be incorporated into compression or decompression systems, i.e., encoders or decoders, for other two-dimensional data or other media data. The techniques described herein do not require the digital media compression system to encode compressed digital media data in a particular encoding format.

Example implementations presented below illustrate solutions to the DC gain mismatch and/or DC leakage problem in image encoding and decoding. For example, these solutions can be incorporated into the JPEG XR image coding standard (ITU-T T.832| ISO/IEC 29199-2). Additionally, various choices for the first and second example implementations use operators referenced and/or defined in the JPEG XR standard.

These implementations also refer to various ways of solving the DC leakage problem in the 4x4 operator for overlap processing in image encoding and decoding, as described in U.S. patent application No. 12/165,474, filed on June 30, 2008.

1. Encoder/decoder

A representative but generalized and simplified data encoder and decoder are shown and described as follows.

Figs. 1 and 2 are generalized illustrations of the processes employed in a representative two-dimensional data encoder 100 and a corresponding decoder 200. The encoder 100 and decoder 200 include lapped transform processing using techniques that address DC gain mismatch and/or DC leakage. These figures present a generalized or simplified illustration of the use and application of the techniques described herein in compression and decompression systems incorporating two-dimensional data encoders and decoders. In alternative encoders and decoders based on these reduced DC gain mismatch and DC leakage techniques, more or fewer processes than shown in the representative encoder and decoder may be used for two-dimensional data compression. For example, some encoders/decoders may also include color conversion, any of a variety of color format processing, scalable coding, and so forth. The compression and decompression systems (encoders and decoders) can provide lossless and/or lossy two-dimensional data compression, depending on the application of quantization, which can be based on one or more quantization control parameters that control the degree of fidelity loss in the encoded representation over a wide range of selectable fidelities, from perfectly lossless to very coarse (high compression ratio) representations.

The two-dimensional data encoder 100 produces a compressed bitstream 120, the compressed bitstream 120 being a more compact representation (for typical input) of the two-dimensional data 110 provided as input to the encoder 100. For example, the two-dimensional data input may be an image, a frame of a video sequence, or other data having two dimensions, commonly referred to as an image. In the region arrangement stage 130, the encoder 100 organizes the input data into blocks for later processing. For example, the blocks are 4x4 sample blocks, 2x4 sample blocks, or 2x2 sample blocks. Alternatively, the blocks have other sizes.

In the first stage 140 of the forward overlap process, the encoder applies an overlap operator to the input data blocks. In an exemplary embodiment, the forward overlap operator (shown as shaded block 142) is four lapped transform operators. The encoder 100 then performs a block transform 150 on each block.

The encoder 100 separates the DC coefficients from the AC coefficients of the respective blocks for subsequent encoding. The encoder 100 performs an additional stage of overlap processing and forward frequency transformation on the DC coefficients. The additional stage of overlap processing may be skipped, and it may use the same overlap operator as the first stage or a different overlap operator. For example, in some implementations, the encoder 100 may select whether to perform the additional stage of overlap processing, and signal this decision in the encoded data for use by the decoder in deciding which inverse overlap processing to perform. The encoder quantizes 170 the results of the additional lapped transform of the DC coefficients, and the encoder quantizes the AC coefficients. The encoder then entropy encodes 180 the coefficients and packetizes the entropy-encoded information for signaling in the bitstream 120, along with side information indicating the encoding decisions that the decoder 200 will use in decoding.

Referring to Fig. 2, as a brief overview, the decoder 200 performs the inverse process. At the decoder side, the transform coefficient bits are extracted 210 from their respective packets in the compressed bitstream 205, from which the DC and AC coefficients themselves are decoded 220 and dequantized 230. The DC coefficients 240 are regenerated by applying an inverse transform, and the plane of DC coefficients is "inverse overlapped" in post-transform filtering using a suitable operator applied across the DC block edges. The blocks of samples are then regenerated by applying an inverse transform 250 to the DC coefficients and to the AC coefficients 242 decoded from the bitstream. Finally, the block edges in the resulting image planes are inverse overlap filtered 260. The second stage of the inverse overlap process may use the same overlap operator as the first stage of the inverse overlap process or a different overlap operator. This produces a reconstructed two-dimensional data output 290.

2. General examples of the techniques

This section contains a general example of techniques to improve the performance of lapped transform coding and corresponding decoding.

Fig. 6A depicts an example method 600 for selecting and signaling hard or soft tile boundaries for overlap processing. An encoder such as that explained with reference to fig. 1 performs this technique. Alternatively, another encoder performs the technique.

At 610, the encoder selects one of a hard tile boundary and a soft tile boundary for use in performing overlap processing during lapped transform coding. For example, the selection may be based on the number of tiles present in the image to be encoded, the desired computational complexity of decoding, the desired output quality, user settings, whether individual decoding of individual tiles (without inter-tile dependencies) is a desired application feature, or another factor. In general, using hard tile boundaries in an image results in fewer inter-tile dependencies because the edges of the tiles are treated as image boundaries for overlap processing. This facilitates decoding of individual tiles (as opposed to the entire image) at a possible cost in compression efficiency (since more bits may be needed to mitigate blocking artifacts that could otherwise be avoided with overlap processing, and since overlap operations generally also improve the compression characteristic of the transform known as coding gain). On the other hand, using soft tile boundaries permits overlap processing across tile boundaries, but creates dependencies between tiles for decoding. After making the tile boundary decision, the encoder performs the lapped transform process accordingly.

At 620, the encoder signals the selected tile boundary decision in the encoded bitstream. For example, the selected tile boundary decision may be signaled as a separate syntax element (e.g., a single bit in the picture header indicating whether a hard or soft tile boundary is selected for overlap processing of a given picture). The selected tile boundary may also be signaled in conjunction or association with other syntax elements, and may be signaled on a basis other than picture-by-picture.

Fig. 6B depicts an example method 630 for receiving a selected hard or soft tile boundary indicator. A decoder such as that explained with reference to fig. 2 performs this technique. Alternatively, another decoder performs the technique.

At 640, the decoder receives information in the encoded bitstream indicating the selected tile boundary option for the inverse overlap process. The selected tile boundary option indicates one of a hard tile boundary option and a soft tile boundary option for the inverse overlap process. For example, the selected tile boundary option may be received as a separate syntax element or a jointly encoded syntax element, and this information may be received in the picture header of the picture or at some other syntax level.

At 650, the decoder performs inverse transform decoding based at least in part on the selected tile boundary option. For example, the decoder performs, selectively performs, or does not perform inverse overlap processing across tile boundaries, depending on whether soft or hard tiles are used.
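The signaling described in methods 600 and 630 can be sketched as a one-bit flag that the encoder writes and the decoder reads back. This is a minimal sketch under stated assumptions: the header layout, the constant `HARD_TILES_BIT`, and all function names are hypothetical and do not reproduce the syntax of JPEG XR or any other standard.

```python
HARD_TILES_BIT = 0x01  # hypothetical bit position, not taken from any standard


def select_hard_tiles(need_independent_tiles):
    """Encoder-side choice (step 610): prefer hard tile boundaries when
    tiles must be decodable without their neighbors."""
    return bool(need_independent_tiles)


def write_flags(hard_tiles):
    """Step 620: signal the decision as a single bit in a header byte."""
    return bytes([HARD_TILES_BIT if hard_tiles else 0])


def read_flags(header):
    """Step 640: recover the selected tile boundary option."""
    return bool(header[0] & HARD_TILES_BIT)


def overlap_crosses_tile_boundaries(hard_tiles):
    """Step 650: inverse overlap processing crosses tile boundaries
    only when soft tile boundaries were selected."""
    return not hard_tiles


header = write_flags(select_hard_tiles(need_independent_tiles=True))
hard = read_flags(header)
```

A real encoder would weigh the other factors listed at 610 as well; the single-criterion choice here is only for brevity.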

The tile boundary decision indicator may also be combined with the selected overlap mode, such that an overlap mode option indicates a combination of overlap stages and the hard/soft tiling decision. For example, one value of the overlap mode option may indicate that no overlap is applied (e.g., neither the first nor the second overlap stage is applied in a hierarchical lapped transform scheme having two stages), another value of the overlap mode option may indicate that only the first of the two overlap stages is applied with soft tiling, another value of the overlap mode option may indicate that both overlap stages are applied with soft tiling, another value of the overlap mode option may indicate that only the first of the two overlap stages is applied with hard tiling, and yet another value of the overlap mode option may indicate that both overlap stages are applied with hard tiling. This overlap mode option may be selected, used and signaled in the encoded bitstream when performing lapped transform coding. The encoded bitstream is received by a corresponding decoder, and the selected overlap mode option is decoded and used when performing inverse lapped transform decoding.
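The five combinations listed above can be sketched as a lookup table. The numeric code values, the `OverlapMode` type, and the function names are hypothetical; the text does not assign specific values to the options.

```python
from typing import NamedTuple


class OverlapMode(NamedTuple):
    stages: int        # number of overlap stages applied: 0, 1, or 2
    hard_tiles: bool   # True means no overlap across tile boundaries


# Hypothetical code assignment, in the order the options are listed above.
OVERLAP_MODES = {
    0: OverlapMode(stages=0, hard_tiles=False),  # no overlap applied
    1: OverlapMode(stages=1, hard_tiles=False),  # first stage only, soft tiles
    2: OverlapMode(stages=2, hard_tiles=False),  # both stages, soft tiles
    3: OverlapMode(stages=1, hard_tiles=True),   # first stage only, hard tiles
    4: OverlapMode(stages=2, hard_tiles=True),   # both stages, hard tiles
}


def encode_overlap_mode(stages, hard_tiles):
    """Return the code for a combination; tiling is irrelevant with no overlap."""
    if stages == 0:
        return 0
    for code, mode in OVERLAP_MODES.items():
        if mode == OverlapMode(stages, hard_tiles):
            return code
    raise ValueError("unsupported overlap mode")


def decode_overlap_mode(code):
    """Return the (stages, hard_tiles) combination for a received code."""
    return OVERLAP_MODES[code]
```

For example, `decode_overlap_mode(encode_overlap_mode(2, True))` yields `OverlapMode(stages=2, hard_tiles=True)`.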

Fig. 7A depicts an example method 700 for using overlap operators with reduced DC gain mismatch and reduced DC leakage in an overlap transform during encoding. An encoder such as that explained with reference to fig. 1 performs this technique. Alternatively, another encoder performs the technique.

At 710, an encoder encodes digital media, wherein the encoding includes a lapped transform process using a plurality of overlap operators with reduced DC gain mismatch and reduced DC leakage. In particular, the first overlap operator (for the inner region) has a first DC gain, while the second overlap operator (for the edge or corner region) has a second DC gain that is substantially equivalent to the first DC gain. Examples of the internal overlap operator, edge overlap operator and corner overlap operator with substantially equivalent DC gain are provided below for blocks of luma samples and chroma samples at different resolutions.

At 720, the encoder generates an encoded bitstream. For example, the bitstream conforms to the JPEG XR format or another format.

Fig. 7B depicts an example method 730 for using overlap operators with reduced DC gain mismatch and reduced DC leakage in an inverse lapped transform during decoding. A decoder such as that explained with reference to Fig. 2 performs this technique. Alternatively, another decoder performs the technique.

At 740, the decoder receives the encoded bitstream. For example, the bitstream conforms to the JPEG XR format or another format.

At 750, the decoder performs an inverse lapped transform process using a plurality of overlap operators with reduced DC gain mismatch and reduced DC leakage. In particular, the first overlap operator (for the inner region) has a first DC gain, while the second overlap operator (for the edge or corner region) has a second DC gain that is substantially equivalent to the first DC gain. Examples of the internal overlap operator, edge overlap operator and corner overlap operator with substantially equivalent DC gain are provided below for blocks of luma samples and chroma samples at different resolutions.

The term "substantially equivalent DC gain" does not mean that the overlap operators must have exactly the same DC gain. Rather, the difference in DC gain should be small enough to reduce (or even eliminate) undesirable artifacts due to DC gain differences between overlap operators, taking into account operational constraints and intended applications. The target level of DC gain similarity may vary depending on the target implementation complexity versus the required quality, as well as other factors. As an example, the sets of overlap operators presented below have substantially equivalent DC gains within the respective sets.
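One way to make the notion of DC gain concrete is to measure the average response of an operator to a constant input and compare operators against a tolerance. The sketch below assumes operators expressed as plain matrices; the matrices, the tolerance of 0.01, and the function names are made up for illustration and are not the operators of the example implementations.

```python
def dc_gain(matrix):
    """Average output of a linear operator for an all-ones (DC) input,
    which equals the mean of the row sums."""
    return sum(sum(row) for row in matrix) / len(matrix)


def substantially_equivalent(gain_a, gain_b, tolerance=0.01):
    """Hypothetical criterion; an acceptable tolerance is application dependent."""
    return abs(gain_a - gain_b) <= tolerance


# Illustrative operators only.
interior = [[0.75, 0.25, 0.0, 0.0],
            [0.25, 0.75, 0.0, 0.0],
            [0.0, 0.0, 0.75, 0.25],
            [0.0, 0.0, 0.25, 0.75]]
edge_matched = [[0.75, 0.25],
                [0.25, 0.75]]
edge_mismatched = [[1.25, 0.0],
                   [0.0, 1.25]]

matched = substantially_equivalent(dc_gain(interior), dc_gain(edge_matched))
mismatched = substantially_equivalent(dc_gain(interior), dc_gain(edge_mismatched))
```

Here `matched` is true and `mismatched` is false: the second edge operator scales a flat region by 1.25 while the interior operator leaves it at 1.0, which is the kind of difference the term is meant to exclude.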

3. Example implementation

These example implementations include lapped transform techniques.

3.1 theoretical basis

As described above, the lapped transform (also referred to as an overlap transform) is conceptually similar to the block transform. A conventional block transform has the following two steps:

1. The input data is divided into block regions (each region consisting of a contiguous run of samples in a one-dimensional series, or a rectangular block-shaped array of samples in a two-dimensional data set such as an image), and

2. A transform process is applied to each block to analyze/decompose its frequency content.

For compression applications, the output of the block transform is quantized and entropy coding is applied to the result. These steps are sometimes combined with other operations such as prediction processes and probability estimation processes. During decoding, the decoder applies an inverse or quasi-inverse operation to each of these processing stages.

A well-known phenomenon associated with block transforms is the generation of blocking artifacts, which are perceptual discontinuities that may occur in reconstructed approximations resulting from the decoding process. A well-known method of mitigating block artifacts is to use a lapped transform. As described above, in the lapped transform, data blocks forming an input to the transform process overlap each other. One problem that arises when using lapped transforms is how to deal with signal edges. At the edges of the input data sample set (such as image edges), some data that was originally part of the lapped transform input to the encoder does not exist. The system designer must determine how to handle these edge cases.

In image coding techniques, a key concept is the decomposition of a larger image into multiple tiles. A tile is a data set that corresponds exactly or approximately to a particular (typically rectangular) spatial region of a picture. By segmenting an image into tiles and encoding each tile individually, it is possible to access a portion of the image and decode that portion without decoding the entire image. The ability to access a particular area of an image may be particularly useful when the image is large. However, segmenting a large image into smaller tile regions creates more edges in the picture, i.e., edges that separate tiles from other tiles in the image. At each of these edges, a tile boundary artifact may occur that is directly analogous to a blocking artifact.

In existing versions of the JPEG XR standard, the overlap operation, when enabled, is applied across image tile boundaries as well as across individual block boundaries of the block transform. While this approach may have advantages in some cases, the result of this design choice is that tile-based image encoding and retrieval has more computational complexity when accessing tile-aligned image regions than if no overlap operation were applied across image tile boundaries. This extra computational cost is due to the data dependency across neighboring tiles introduced by the overlap process of the transform regions, which has the following consequences: in order to decode a particular tile region, the data stored for spatially adjacent tiles must also be accessed and at least partially decoded.

In contrast, in the example implementations presented herein, tile boundaries are categorized into one of two types, as follows:

"hard" tile boundaries, such that the edges of the tiles are treated in the same way as the image edges, so there is no overlap of transform blocks applied across tile boundaries, or

"soft" tile boundaries, where the overlap of transform blocks is applied across tile boundaries.

In existing designs such as JPEG 1 and JPEG 2000, all tile boundaries are treated as hard tile boundaries. In contrast, in existing versions of the JPEG XR standard, the tile boundaries are treated as soft tile boundaries. Each type of boundary processing is useful in some situations. In the example implementations presented herein, the encoder may choose, at its discretion, to use hard or soft tile boundaries, and it signals this choice along with the compressed representation of the image data. Further, in the example implementations presented herein, even when hard tile boundaries are used, at least some edge samples in a given tile that are near the tile boundary are overlap processed with edge operators in order to reduce the DC gain mismatch that could otherwise cause problems due to the application of interior overlap operators within the tile.
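The difference between the two boundary types can be sketched in one dimension by listing the block boundaries at which an overlap operator straddles both sides. This is an illustrative helper that assumes the tile size is a multiple of the block size; the function name and parameters are hypothetical.

```python
def straddling_overlap_positions(length, block, tile, hard_tiles):
    """Return the block-boundary positions (sample indices) at which an
    overlap operator spans samples on both sides of the boundary."""
    positions = []
    for pos in range(block, length, block):
        on_tile_boundary = (pos % tile == 0)
        if hard_tiles and on_tile_boundary:
            # Hard tiles: the tile edge is treated like an image edge, so
            # nothing straddles it; edge operators act within each tile.
            continue
        positions.append(pos)
    return positions


soft = straddling_overlap_positions(16, block=4, tile=8, hard_tiles=False)
hard = straddling_overlap_positions(16, block=4, tile=8, hard_tiles=True)
```

With 16 samples, 4-sample blocks and 8-sample tiles, `soft` is `[4, 8, 12]` and `hard` is `[4, 12]`: only the boundary at sample 8, which is also a tile boundary, is affected by the choice.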

Example implementations also use lifting-based transform operations. A phenomenon known as "DC leakage" can sometimes be a problem in such transforms. If a transform exhibits DC leakage, the output of the forward transform may contain significant non-DC coefficients for a constant-valued input signal (for which, in theory, the DC coefficient should be the only non-zero coefficient). DC leakage can lead to loss of compression efficiency and to perceptible artifacts in the decoded output (such as ripples or distortion in smooth regions).

In large images, some loss of compression performance and some perceptible artifacts that appear only near the outer boundaries of the image may be acceptable in a compression coding design. However, because tile segmentation with hard tile boundaries creates many more regions that exhibit this phenomenon, careful design of the edge-region processing becomes more important when the design includes hard tile boundary support.

The problems that arise from support for small tiles in an image encoder can also arise at the system level, in higher-level support for large "meta-images" that are assembled into a larger data set from a plurality of smaller images, each encoded separately. In this use case, too, careful handling of the image edges becomes more important.

Thus, it is desirable to have a hard tile boundary coding option so that each image tile can be processed independently. One solution is to process the tile boundaries the same as the overall image boundary. However, in some cases, transform designs have undesirably high DC leakage phenomena in operations used near image edges. In addition, and even more importantly, there is a DC gain difference between the output of the edge processing operation and the processing of the inner region of the image.

Several observations help to account for DC gain mismatch and DC leakage in lapped transforms near image boundaries. First, in existing implementations, the overlap operators suffer from DC gain mismatch between the interior operator and the corner and edge operators, resulting in reduced compression efficiency and significant visual artifacts. Second, in existing implementations, the edge, corner, and chroma interior operators also suffer from DC leakage, which likewise results in reduced compression efficiency and significant visual artifacts. In addition, in existing implementations, the DC leakage characteristics of the processing of the interior of the image are also problematic for 4:2:0 and 4:2:2 chroma sampling.

3.2 Overview of Example Implementations

Example implementations use a hierarchical lapped transform. The transform has four stages:

1. a first-stage overlap operation;

2. a first-stage core transform;

3. a second-stage overlap operation; and

4. a second-stage core transform.

The overlap stages operate on a grid that is staggered with respect to the grid of the core transform. That is, the core transform operates on 4x4 blocks arranged in a grid aligned with the top-left pixel of the image, whereas the overlap stage operates on a grid of the same block size that is offset from the top-left pixel by 2 pixels horizontally and 2 pixels vertically. The corresponding inverse lapped transform also has four stages: a first-stage inverse core transform, a first-stage inverse overlap operation, a second-stage inverse core transform, and a second-stage inverse overlap operation.

Unfortunately, as previously implemented, the overlap operators suffer from DC gain mismatch and DC leakage. The edge overlap operators (the operators that act on the 2-pixel-wide border created by the grid offset of the overlap stage) are particularly problematic. Because hard tiling uses these edge operators extensively in the interior of the image (this being the most natural way to support hard tiling), the artifacts caused by these operators become far more noticeable. Even without hard tiling, the previous overlap operators produced an unacceptable amount of visible artifacts in the image.

In the example implementation presented herein, the lapped transform is implemented by 6 different operators. They are grouped into two sets of three operators, a full resolution set and a chroma set. The three full resolution overlap operators are:

Overlap4x4 operator: interior overlap operation;

Overlap4x1 operator: edge overlap operation;

CornerOverlap2x2 operator: corner overlap operation.

Fig. 3 depicts a diagram 300 of the layout of these operators for a region of 2 blocks x 2 blocks of samples. The dots represent samples, and the central horizontal and vertical dashed lines indicate the block boundaries for transform coding and decoding. The rounded boxes represent the region of support of each overlap operator. The rounded box 310 around the central 4x4 sample region represents the interior Overlap4x4 operator, the rounded boxes (e.g., 320) around the 2x2 sample regions at the four corners represent the CornerOverlap2x2 operator, and the rounded boxes (e.g., 330) around the other eight 4x1 (or 1x4) sample regions represent the Overlap4x1 operator. These three full resolution operators are used for all channels in the first-stage lapped transform and for the full resolution channels in the second-stage lapped transform (i.e., luma, and chroma in the 4:4:4 sampling mode).
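The layout described for Fig. 3 can be sketched as a small enumeration. This is only an illustration, not part of the described implementations: the function names are hypothetical, the image dimensions are assumed to be multiples of 4 and at least 8, and the 2-pixel border follows from the grid offset of the overlap stage.

```python
def overlap_regions(width, height):
    """List (operator, x, y, w, h) regions for one full resolution overlap stage.

    Hypothetical helper: assumes width and height are multiples of 4, >= 8.
    """
    if width < 8 or height < 8 or width % 4 or height % 4:
        raise ValueError("expected dimensions that are multiples of 4 and >= 8")
    regions = []
    # Four 2x2 corner regions.
    for x in (0, width - 2):
        for y in (0, height - 2):
            regions.append(("CornerOverlap2x2", x, y, 2, 2))
    # Edge regions in the 2-pixel border: 4x1 rows on the top and bottom ...
    for x in range(2, width - 2, 4):
        for y in (0, 1, height - 2, height - 1):
            regions.append(("Overlap4x1", x, y, 4, 1))
    # ... and 1x4 columns on the left and right.
    for y in range(2, height - 2, 4):
        for x in (0, 1, width - 2, width - 1):
            regions.append(("Overlap4x1", x, y, 1, 4))
    # Interior 4x4 regions on the grid offset by (2, 2).
    for y in range(2, height - 2, 4):
        for x in range(2, width - 2, 4):
            regions.append(("Overlap4x4", x, y, 4, 4))
    return regions


def count_by_operator(regions):
    """Count how many regions each operator covers."""
    counts = {}
    for name, *_ in regions:
        counts[name] = counts.get(name, 0) + 1
    return counts


# A 2-block x 2-block region (8x8 samples), as in Fig. 3.
fig3_counts = count_by_operator(overlap_regions(8, 8))
```

For the 8x8 case this yields one interior region, eight edge regions, and four corner regions, and together the regions cover every sample exactly once.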

The three chroma operators are:

Overlap2x2 operator: interior overlap operation;

Overlap2x1 operator: edge overlap operation;

CornerOverlap1x1 operator: corner overlap operation.

These chroma operators are used for the second-stage lapped transform of the downsampled (4:2:2 or 4:2:0) chroma channels.

Fig. 4 shows a diagram 400 of the layout of these chroma operators in the second-stage lapped transform of a chroma channel with 4:2:2 downsampling, for a 2x2 arrangement of blocks. The dots represent samples, and the central horizontal and vertical dashed lines indicate the block boundaries for transform coding and decoding. The rounded boxes (and circles) represent the region of support of each overlap operator. The rounded box 410 around the central 2x2 sample region represents the interior Overlap2x2 operator, the rounded boxes (e.g., 420) around the 1x1 sample regions at the four corners represent the CornerOverlap1x1 operator, and the rounded boxes (e.g., 430) around the other eight 2x1 (or 1x2) sample regions represent the Overlap2x1 operator.

Fig. 5 shows a diagram 500 of the layout of these chroma operators in the second-stage lapped transform of a chroma channel with 4:2:0 downsampling, for a 2x2 arrangement of blocks. The dots represent samples, and the dashed lines indicate the block boundaries for transform coding and decoding. The rounded boxes (and circles) represent the region of support of each overlap operator. The rounded box 510 around the central 2x2 sample region represents the interior Overlap2x2 operator, the rounded boxes (e.g., 520) around the 1x1 sample regions at the four corners represent the CornerOverlap1x1 operator, and the rounded boxes (e.g., 530) around the other four 2x1 (or 1x2) sample regions represent the Overlap2x1 operator.

Two problems arise when designing these operators: DC gain mismatch and DC leakage. Ideally, in each of these two operator sets (full resolution and chroma), the DC gain factors of all three operators should be equal. When the three DC gains are not matched, a problem of DC gain mismatch occurs.

DC leakage is the phenomenon in which transforming a perfectly flat input produces AC coefficients in addition to the DC coefficient. DC leakage is measured by the magnitude of the AC coefficients produced.

As a result of these two problems, a completely flat image coded using only its DC coefficients (ignoring all higher-frequency coefficients) will not have a completely flat reconstruction. The difference in the reconstructed pixel values can be seen as the sum of two terms, (a) the scaled DC gain mismatch and (b) the DC leakage:

Difference = scaled gain mismatch + DC leakage.

One solution to the DC leakage problem in the Overlap4x4 operator is presented in U.S. patent application No. 12/165,474.

In the example implementations presented herein, a number of solutions are proposed to reduce or even eliminate DC gain mismatch and DC leakage for the remaining five operators (those other than the full resolution Overlap4x4 operator). For these five operators, the magnitude of the DC gain mismatch is typically much larger than the magnitude of the DC leakage, although the perceptible artifacts caused by DC leakage are generally more objectionable. The main problem to be solved is therefore the DC gain mismatch among the overlap operators; a secondary problem is the DC leakage within the overlap operators.

The next section describes the syntax and semantics of several solutions for signaling the soft/hard tile boundary decision. The sections after that describe new overlap operators designed to eliminate the artifacts caused by the existing overlap operators. Besides hard tiling scenarios, these new overlap operators are also useful in "stitching" scenarios, where a single large image is broken up into smaller images that are compressed independently and later "stitched" together by displaying them adjacent to one another according to their original arrangement.

3.3 Overview of Syntax for Hard/Soft Tiling Decisions

Example implementations support both hard and soft tile boundaries. Soft tile boundaries are more efficient when a large number of tiles of a tiled image are processed together, and they mitigate blocking artifacts. Hard tile boundaries allow more efficient processing of a limited number of tiles in an image. Two categories of options for the syntax and semantics of the hard/soft tile boundary decision are described.

Tiling with syntax changes. The first category of options changes the syntax of the existing implementation. The first option is to add a new syntax element indicating whether hard or soft tiling is used. For example, the new syntax element is a flag in the image header.

The second option is to add values to the syntax element (OVERLAP_MODE) that already signals overlap mode information. In the existing implementation, OVERLAP_MODE is a 2-bit syntax element with the following three possible values and corresponding interpretations.

OVERLAP_MODE = 0 means that no overlap stage is applied.

OVERLAP_MODE = 1 means that only the first of the two overlap stages is applied.

OVERLAP_MODE = 2 means that both overlap stages are applied.

The second option changes OVERLAP_MODE to a 3-bit syntax element. The values above are then interpreted as soft tiling modes (at least for OVERLAP_MODE = 1 and OVERLAP_MODE = 2, since OVERLAP_MODE = 0 involves no overlap processing and so has hard tile boundaries by default), and the following additional values and interpretations are added.

OVERLAP_MODE = 3 means that only the first overlap stage is applied, and the overlap processing is performed with hard tiling.

OVERLAP_MODE = 4 means that both overlap stages are applied, and the overlap processing is performed with hard tiling.
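As an illustration of the second option, the five values can be read as a lookup from the 3-bit element to a pair (number of overlap stages, hard tiling). The function name and the return convention below are hypothetical and are not taken from any specification; values 5 to 7 are treated as unassigned, which is also an assumption.

```python
def interpret_overlap_mode(mode):
    """Map a 3-bit OVERLAP_MODE value to (overlap_stages, hard_tiling).

    Hypothetical sketch of the extended interpretation described above.
    """
    table = {
        0: (0, True),   # no overlap processing, so tile edges are hard by default
        1: (1, False),  # first stage only, soft tiling
        2: (2, False),  # both stages, soft tiling
        3: (1, True),   # first stage only, hard tiling
        4: (2, True),   # both stages, hard tiling
    }
    if mode not in table:
        raise ValueError(f"unassigned OVERLAP_MODE value: {mode}")
    return table[mode]


stages, hard = interpret_overlap_mode(4)
```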

A decoder that does not support the 3-bit OVERLAP _ MODE syntax element cannot successfully decode a picture using the syntax element. Thus, adding new bits will result in the inability of conventional decoders to decode pictures.

Backward compatible syntax. To allow legacy decoders to handle new bitstreams (even if decoding produces significant artifacts), a sub-version number syntax element (labeled RESERVED_C in the JPEG XR specification) may be used. For example, a bitstream containing the changes presented in U.S. patent application No. 12/165,474 has a sub-version value of 1, and otherwise has a sub-version value of 0. The next least significant bit of the sub-version number may be used to signal hard or soft tiling. In this scheme, soft tiling is signaled by setting the second least significant bit to zero, and hard tiling is signaled by setting it to one. A legacy decoder that ignores the sub-version value will successfully decode both hard-tiled and soft-tiled images, but hard-tiled images may show significant artifacts because the decoder applies soft tile boundary overlap processing by default. More advanced decoders will be able to respond to the value of the sub-version number and correctly decode the image without artifacts when hard tiling is used.

The hard tiling setting is signaled as follows:

HardTilingTrueFlag = (RESERVED_C >> 1) & 1.
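A minimal sketch of how a decoder might read the two low-order bits of the sub-version field follows. The function names are hypothetical, and treating the least significant bit as the marker for the changes of U.S. patent application No. 12/165,474 is an interpretation of the paragraph above, not a quoted rule.

```python
def hard_tiling_true_flag(reserved_c):
    """Second least significant bit of RESERVED_C: 1 = hard tiling, 0 = soft."""
    return (reserved_c >> 1) & 1


def has_earlier_changes(reserved_c):
    """Least significant bit (assumed here to mark the 12/165,474 changes)."""
    return reserved_c & 1


# Example: sub-version 0b11 carries both the earlier changes and hard tiling.
flags = (has_earlier_changes(0b11), hard_tiling_true_flag(0b11))
```

A legacy decoder that never inspects RESERVED_C would simply behave as if both bits were zero.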

3.4 Overview of Existing Overlap Operators

At the decoder, the existing Overlap4x4 operator is designed to scale the DC value by 1/(s^2), where s ≈ 0.8272. In other words, the Overlap4x4 operator amplifies DC by a factor of approximately 1.4614. (This is the Overlap4x4 operator as described in U.S. patent application No. 12/165,474.) The existing Overlap4x1 operator is designed to scale the DC value by 1/s. In other words, the Overlap4x1 operator amplifies DC by a factor of approximately 1.2089. The existing corner operator CornerOverlap2x2 is null (no operation is performed), and therefore the corresponding DC value has an implicit scaling of 1.0. Several observations about these existing operators should be made. First, the scaling in the existing operators generally ensures that the basis functions of the inverse overlap are smooth, which provides more coding gain as well as fewer visible artifacts. Second, the actual scaling performed by the operators differs slightly from the design values because of the integer implementation. Third, the corresponding encoder overlap operator applies the inverse of the scaling applied by the decoder overlap operator. Fourth, this scaling down at the encoder has some implications for lossless coding: even a completely flat image needs to produce AC coefficients (and thus some compression efficiency is lost). Fifth, the rounding used in the integer implementation has a slight effect on some of the analytical and empirical results. In particular, many quantities, such as the scaling gain mismatch and the DC leakage ratio, are only approximately linear.

In the following description of the overlap operators, the DC leakage and DC gain mismatch are calculated for each operator. The method used below for calculating DC leakage gives approximately the same results as a theoretical analysis based on the matrix coefficients, and is simpler and faster. To show the range of the DC leakage and DC gain mismatch, a DC scaling gain ratio and a DC leakage ratio are derived for each overlap operator by considering an input block in which all pixel values are equal to 1000000 (10^6). This input is called X₁.

The existing Overlap4x4 operator for full resolution. For the example existing Overlap4x4 operator, given the input X₁, each output pixel has a value of 1461540 or 1461547. Thus, the ScalingGainRatio (scaling gain ratio) of the Overlap4x4 operator is approximately 1.461543. The difference between the output values (|1461540 - 1461547| = 7) is due to DC leakage. The DCLeakageRatio (DC leakage ratio) of the existing Overlap4x4 operator is 7/10^6.
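The two figures of merit used throughout this section can be computed from the outputs for the flat input X₁ as sketched below. The helper name is hypothetical, and taking the gain as the mean of the distinct output values is an assumption; the text only gives the gain as an approximate value.

```python
X1 = 1_000_000  # flat test input, every sample equal to 10^6


def dc_stats(output_values, flat_value=X1):
    """Return (scaling_gain_ratio, dc_leakage_ratio) for a flat input.

    Assumption: gain is the mean output divided by the input value, and
    leakage is the spread (max - min) of the outputs divided by the input.
    """
    gain = sum(output_values) / len(output_values) / flat_value
    leakage = (max(output_values) - min(output_values)) / flat_value
    return gain, leakage


# The two distinct output values quoted for the existing Overlap4x4 operator.
gain_4x4, leakage_4x4 = dc_stats([1461540, 1461547])
```

With the quoted values this gives a gain of about 1.46154 and a leakage ratio of 7/10^6, in line with the figures above.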

For an 8-bit sample image, the maximum range is +/-128/(scaling gain ratio). Thus, the maximum DC leakage of the 4x4 operator on an 8-bit image is (7/10^6) x 128/(scaling gain ratio) = 0.000896/(scaling gain ratio) ≈ 0.000614. The maximum DC leakage on a 16-bit image is 256 x 0.000614 ≈ 0.157.

The existing Overlap4x1 operator for full resolution. For the example existing Overlap4x1 operator, given the input X₁, each output pixel has a value of 1206726 or 1205078. The scaling gain ratio of Overlap4x1 is approximately 1.2058. If the inputs to the inverse Overlap4x1 operator all have the same value, the scaling stage of that operator, acting on a pair of input values, can be represented as a matrix in which x = 3/32 and y = 3/16. The scaling for one input is 1 + xy + x(2 + xy) = 19771/16384 ≈ 1.206726, while the scaling for the other input is y + 1 + xy = 617/512 ≈ 1.205078. The DC leakage ratio is 19771/16384 - 617/512 = 27/16384 ≈ 0.001648. Equivalently, the difference between the output values (1206726 - 1205078 = 1648) is due to DC leakage, giving a DC leakage ratio of 1648/10^6. This DC leakage is small, but still significant.
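The two scaling expressions can be checked with exact rational arithmetic. The sketch below assumes that the scaling stage is a product of three lifting steps (a step with x, a step with y, then the step with x again); this factorization is inferred from the two row sums quoted above and is not stated explicitly in the text. All names are illustrative.

```python
from fractions import Fraction


def matmul2(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scaling_matrix(x, y):
    """Assumed scaling stage: [1 x; 0 1] * [1 0; y 1] * [1 x; 0 1]."""
    upper = [[1, x], [0, 1]]
    lower = [[1, 0], [y, 1]]
    return matmul2(upper, matmul2(lower, upper))


def dc_row_gains(x, y):
    """Row sums of the matrix, i.e. the scaling of each output for flat input."""
    m = scaling_matrix(x, y)
    return sum(m[0]), sum(m[1])


# Existing Overlap4x1 parameters.
g0, g1 = dc_row_gains(Fraction(3, 32), Fraction(3, 16))
leakage = g0 - g1
print(g0, g1, leakage)  # 19771/16384 617/512 27/16384
```

The same function reproduces the chroma figures given later in this section when called with (1/4, 1/2) or (1/8, 1/4).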

For an 8-bit image, the maximum DC leakage is 127/(scaling gain ratio) x (1648/10^6), or approximately 0.1788/(scaling gain ratio) ≈ 0.12247. For a 16-bit sample image, the maximum DC leakage may be +/-46/(scaling gain ratio) ≈ 31.35.

The existing CornerOverlap2x2 operator for full resolution. In existing implementations, there is no overlap operator for corners. The scaling gain ratio is 1.0 and the DC leakage ratio is 0.0.

DC gain mismatch effects for existing full resolution operators. In existing implementations, the main problem in terms of magnitude (if not in terms of perceptual significance) is DC gain mismatch, and its worst-case effect can be quantified theoretically as follows. For an 8-bit sample image, the worst-case scaling gain mismatch at the edges would be (1.4615 - 1.205) x 128/(scaling gain ratio) ≈ 22.48. For an 8-bit sample image, the worst-case scaling gain mismatch at the corners would be (1.4615 - 1.0) x 128/(scaling gain ratio) ≈ 59/(scaling gain ratio) ≈ 40.46. For a 16-bit sample image, the worst-case scaling gain mismatch at the edges and corners would be 256 times as large: 22.4 x 256 ≈ 5,734 and 40.46 x 256 ≈ 10,358, respectively. The total worst-case difference can be approximated as the sum of the scaled gain mismatch and the DC leakage.

Existing Overlap2x2 chroma operator. For the example existing Overlap2x2 operator, if the inputs to the inverse Overlap2x2 operator all have the same value, the scaling stage of that operator, acting on a pair of input values, can be represented as a matrix in which x = 1/4 and y = 1/2. The scaling for one input is 1 + xy + x(2 + xy) = 53/32 = 1.65625, while the scaling for the other input is y + 1 + xy = 13/8 = 1.625.

The DC leakage ratio is 53/32 - 13/8 = 1/32 = 0.03125.

Existing Overlap2x1 chroma operator. For the example existing Overlap2x1 operator, if the inputs to the inverse Overlap2x1 operator all have the same value, the scaling stage of that operator, acting on a pair of input values, can be represented as a matrix in which x = 1/8 and y = 1/4. The scaling for one input is 1 + xy + x(2 + xy) = 329/256 = 1.28515625, and the scaling for the other input is y + 1 + xy = 41/32 = 1.28125. The DC leakage ratio is 329/256 - 41/32 = 1/256 ≈ 0.003906.

Existing CornerOverlap1x1 chroma operator. In the existing implementation, there is no 1x1 overlap operator for corners. The scaling gain ratio is 1.0 and the DC leakage ratio is 0.0.

3.5 Solutions for the Overlap4x1 Operator

Three solutions for the Overlap4x1 operator are presented, one of which has multiple variants. The application of the existing Overlap4x1 operator to four pixels a, b, c, d is denoted Overlap4x1(a, b, c, d).

Solution 1 for the Overlap4x1 operator. Since the DC gain of the existing Overlap4x1 operator is approximately the square root of the DC gain of the Overlap4x4 operator, the first solution is to apply the existing Overlap4x1 operator twice. Overlap4x1Solution1(a, b, c, d) is specified as:

1. Overlap4x1(a, b, c, d);

2. Overlap4x1(d, c, b, a).

By reversing the pixel order between the two applications of the operator, the DC leakage introduced by the first application can be offset by the second. Repeating the same order would instead cause the leakage to accumulate. Unfortunately, this solution still suffers from significant DC gain mismatch.
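The effect of reversing the order can be illustrated with a simplified model that keeps only the scaling stage, written as three lifting steps on one pair of values with the existing parameters x = 3/32 and y = 3/16. This is a sketch under that assumption: the real Overlap4x1 operator contains further steps and integer rounding, so the numbers below will not match the measured outputs quoted for Solution 1.

```python
from fractions import Fraction

X, Y = Fraction(3, 32), Fraction(3, 16)


def scale_pair(u, v, x=X, y=Y):
    """Simplified scaling stage: three lifting steps on the pair (u, v)."""
    u += x * v
    v += y * u
    u += x * v
    return u, v


def spread(pair):
    """Difference between the two outputs, used here as the leakage measure."""
    return abs(pair[0] - pair[1])


once = scale_pair(1, 1)                       # one application, flat input
same_order = scale_pair(*once)                # second pass, same order
u, v = once
reversed_order = scale_pair(v, u)[::-1]       # second pass, roles swapped

spreads = (spread(reversed_order), spread(once), spread(same_order))
```

In this model the spread after the reversed second pass is smaller than after a single pass, while repeating the same order roughly doubles it, which is consistent with the behavior described above.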

If Overlap4x1Solution1 takes the input X₁, each output sample has a value of 1454061, 1453548, 1454852, or 1454340. The scaling gain ratio is approximately 1.4542. The DC leakage ratio is (1454852 - 1453548)/10^6 = 1304/10^6. For an 8-bit sample image, the worst-case scaling gain mismatch is (1.4615 - 1.4542) x 128/(scaling gain ratio) = 0.9344/(scaling gain ratio) ≈ 0.64. The worst-case DC leakage may be 127/(scaling gain ratio) x 1304/10^6 ≈ 0.16561/(scaling gain ratio). For a 16-bit sample image, the worst-case scaling gain mismatch is 0.9344 x 256/(scaling gain ratio) = 239.2064/(scaling gain ratio) ≈ 163. The worst-case DC leakage may be 256 x 0.16561/(scaling gain ratio) ≈ 42.39/(scaling gain ratio) ≈ 28.76. In terms of complexity, this solution is roughly twice as complex as the existing Overlap4x1 operator.

Solution 2 for the Overlap4x1 operator. Solution 2 is similar in spirit to Solution 1: it applies a single operator twice to the same samples. For this solution, however, the scaling stage of the existing Overlap4x1 operator is modified by adding lifting steps. As before, Overlap4x1Solution2(a, b, c, d) is specified as:

1. Overlap4x1Altered(a, b, c, d);

2. Overlap4x1Altered(d, c, b, a).

These modifications use 6 additional lifting steps in the scaling stage, where each lifting step consists of one addition and one shift.

The idea behind Solution 2 is to change x and y so that both the DC leakage and the DC scaling gain mismatch are minimized. (As used herein, "minimize" means to reduce to an acceptable operating level.) The condition for minimizing DC leakage is that the two scalings be equal:

1 + xy + x(2 + xy) = y + 1 + xy.

Solving for y yields y = 2x/(1 - x^2). In this case, the scaling gain ratio of one stage of the Overlap4x1Altered operator can be calculated as:

1 + xy + y = (1 + x)/(1 - x).

Let the square root of the DC gain of the Overlap4x4 operator be denoted k. The value of x that minimizes the scaling gain mismatch is then given by:

(1 + x)/(1 - x) = k,

or:

x = (k - 1)/(k + 1).

Once the value of x that minimizes the scaling gain mismatch has been determined from this equation, the earlier equation y = 2x/(1 - x^2) can be used to determine the value of y (to minimize DC leakage). In this case, the scaling gain ratio of the Overlap4x4 operator is approximately 1.461543. Thus, k ≈ 1.20894, and the DC scaling gain of one stage must approach this value to minimize the DC scaling gain mismatch. Accordingly, the value of x should approach 0.094589554, and the value of y should be selected according to the earlier equation.
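The target values of x and y can be computed numerically as sketched below. The closed forms used here follow from the zero-leakage condition 1 + xy + x(2 + xy) = y + 1 + xy, which gives y = 2x/(1 - x^2) and a one-stage gain of (1 + x)/(1 - x); setting that gain equal to k gives x = (k - 1)/(k + 1). The function name is hypothetical.

```python
import math


def ideal_lifting_parameters(target_gain):
    """Return (k, x, y) so that two leakage-free stages reach target_gain.

    k is the per-stage gain (square root of the target), x solves
    (1 + x)/(1 - x) = k, and y satisfies the zero-leakage condition.
    """
    k = math.sqrt(target_gain)
    x = (k - 1) / (k + 1)
    y = 2 * x / (1 - x * x)
    return k, x, y


k, x, y = ideal_lifting_parameters(1.461543)
one_stage_gain = 1 + x * y + y
residual_leakage = y - x * (2 + x * y)
```

For the Overlap4x4 gain of 1.461543 this gives x close to 0.0945896, in agreement with the value quoted above; the variants that follow approximate these ideal values with sums of powers of two.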

In practice, x may be approximated by a value that can be implemented with binary lifting steps; the value of y may then be computed from that x and, in turn, approximated by a value that can be implemented with binary lifting steps. Experimental results show that minimizing DC leakage is more important than minimizing DC scaling gain mismatch. Therefore, the approximation of y for the chosen value of x should be more accurate than the initial approximation of x.

Solution 2a for the Overlap4x1 operator. x = 3/32 and y = 775/4096 = 3/16 + 1/512 - 1/4096 are selected. This solution has minimal DC leakage. Its DC gain does not exactly match the target, so the solution still has a small amount of scaling gain mismatch. Compared to the existing Overlap4x1 operator, no additional lifting step is needed to implement x, and the implementation of y uses two additional lifting steps.

Solution 2b for the Overlap4x1 operator. x = 97/1024 = 3/32 + 1/1024 and y = 49/256 = 3/16 + 1/256 are selected. This solution has minimal DC leakage. It still has a small amount of scaling gain mismatch, but less than Solution 2a for the Overlap4x1 operator. Compared to the existing Overlap4x1 operator, implementing x takes one additional lifting step and implementing y takes one additional lifting step. Since the scaling stage applies x twice and y once, Solution 2b uses 3 additional lifting steps in total.

Solution 2c for the Overlap4x1 operator. x = 775/8192 = 3/32 + 1/1024 - 1/8192 and y = 391/2048 = 3/16 + 1/256 - 1/2048 are selected. Of the three variants, this solution has the least scaling gain mismatch. Implementing x takes two additional lifting steps, and implementing y takes two additional lifting steps. Since the scaling stage applies x twice and y once, this solution uses 6 additional lifting steps in total.
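The three variants can be compared with exact fractions, as sketched below. The model is an assumption: it considers only the scaling stage, takes the two per-output scalings as 1 + xy + x(2 + xy) and 1 + xy + y, and approximates the gain of the two-pass operator by the product of the two scalings. Rounding and the remaining steps of the operator are ignored, and the names are illustrative.

```python
from fractions import Fraction as F

TARGET_GAIN = 1.461543  # scaling gain ratio quoted for Overlap4x4

VARIANTS = {
    "2a": (F(3, 32), F(775, 4096)),
    "2b": (F(97, 1024), F(49, 256)),
    "2c": (F(775, 8192), F(391, 2048)),
}


def one_stage_scalings(x, y):
    """The two per-output scalings of the scaling stage for flat input."""
    return 1 + x * y + x * (2 + x * y), 1 + x * y + y


def summarize(x, y):
    """Return (leakage, approximate two-pass gain, gain mismatch)."""
    g0, g1 = one_stage_scalings(x, y)
    leakage = abs(g0 - g1)
    two_pass_gain = float(g0 * g1)
    return leakage, two_pass_gain, abs(two_pass_gain - TARGET_GAIN)


summary = {name: summarize(x, y) for name, (x, y) in VARIANTS.items()}
for name, (leakage, gain, mismatch) in summary.items():
    print(name, float(leakage), round(gain, 6), round(mismatch, 6))
```

Under this model the gain mismatch shrinks from variant 2a to 2b to 2c, and the two-pass gain of variant 2c comes out close to 1.46163, with a leakage well below 2^-16.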

The modification to Overlap4x1 has the advantage of further reducing both DC gain mismatch and DC leakage. Because the approach of Solution 1 is preserved, all of its benefits are also preserved. If the operator takes the input X₁, the output pixels all have a value of 1461631. The scaling gain ratio is close to 1.4616. The DC leakage ratio is close to 0 (much smaller than 2^-16).

For an 8-bit sample image, the worst-case scaling gain mismatch is (1.461543 - 1.461631) x 128/(scaling gain ratio), where the gain difference has a magnitude of 0.000088. The worst-case DC leakage is 0. For a 16-bit sample image, the worst-case scaling gain mismatch is 0.000088 x 256/(scaling gain ratio) = 0.022528/(scaling gain ratio). The worst-case DC leakage is 0.

In terms of complexity, for Solution 2c for the Overlap4x1 operator, the six additional lifting steps make Overlap4x1Altered about 1.5 times as complex as Overlap4x1. Thus, Overlap4x1Solution2 as a whole is approximately 3 times as complex as the existing Overlap4x1 operator.

Solution 3 for the Overlap4x1 operator. For this solution, the scaling stage of the existing Overlap4x1 operator is replaced by the scaling stage of the existing Overlap4x4 operator to form the new operator Overlap4x1Solution3. This is a single-step solution, so no operator has to be applied twice. The modification ensures that the DC gain and DC leakage of Overlap4x1Solution3 are approximately the same as those of the Overlap4x4 operator. Some small differences remain, however, because of rounding in the operations outside the scaling stage.

If the Overlap4x1Solution3 operator takes the input X₁, the output pixels have the values 1461552, 1461547, 1461540, and 1461535. The scaling gain ratio is close to 1.4615435. The DC leakage ratio is (1461552 - 1461535)/10^6 = 17/10^6.

For an 8-bit sample image, the worst-case scaling gain mismatch is (1.461543 - 1.4615435) x 128/(scaling gain ratio), which has a magnitude of 0.000064/(scaling gain ratio). The worst-case DC leakage is (1461552 - 1461535)/10^6 x 128/(scaling gain ratio) = 0.002176/(scaling gain ratio).

For a 16-bit sample image, the worst-case scaling gain mismatch is 0.000064 x 256/(scaling gain ratio) = 0.016384/(scaling gain ratio). The worst-case DC leakage is 0.002176 x 256/(scaling gain ratio) = 0.557056/(scaling gain ratio) < 1/2.

For example, in one prior implementation, the operations shown in the table below are used for the overlapping post-filtering of the edge 4x1 sample block.

With Solution 3 for the Overlap4x1 edge operator, the scaling is changed, as shown in the table below.

OverlapPostFilter4(a, b, c, d) {
    a += d;
    b += c;
    d -= ((a + 1) >> 1);
    c -= ((b + 1) >> 1);
    /* scaling */
    InvScale(a, d);
    InvScale(b, c);
    a += ((d * 3 + 4) >> 3);
    b += ((c * 3 + 4) >> 3);
    d -= (a >> 1);
    c -= (b >> 1);
    a += d;
    b += c;
    d *= -1;
    c *= -1;
    /* rotation */
    InvRotate(c, d);
    d += ((a + 1) >> 1);
    c += ((b + 1) >> 1);
    a -= d;
    b -= c;
}

where InvScale() and InvRotate() are defined as follows.

InvScale(a, b) {
    a += b;
    b = (a >> 1) - b;
    a += (b * 3 + 0) >> 3;
    b += (a * 3 + 0) >> 4;
    b += (a >> 7);
    b -= (a >> 10);
}

InvRotate(a, b) {
    a -= ((b + 1) >> 1);
    b += ((a + 1) >> 1);
}
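The table above can be exercised on the flat input X₁ with a direct transcription into Python, as sketched below. The function names are illustrative, and the in-place updates of the table are modeled by returning the updated values. Python's right shift on negative integers rounds toward negative infinity, which is assumed here to match the arithmetic shift intended by the table.

```python
def inv_scale(a, b):
    """Transcription of InvScale from the table above."""
    a += b
    b = (a >> 1) - b
    a += (b * 3 + 0) >> 3
    b += (a * 3 + 0) >> 4
    b += a >> 7
    b -= a >> 10
    return a, b


def inv_rotate(a, b):
    """Transcription of InvRotate from the table above."""
    a -= (b + 1) >> 1
    b += (a + 1) >> 1
    return a, b


def overlap_post_filter4(a, b, c, d):
    """Transcription of OverlapPostFilter4 (Solution 3 scaling)."""
    a += d
    b += c
    d -= (a + 1) >> 1
    c -= (b + 1) >> 1
    a, d = inv_scale(a, d)
    b, c = inv_scale(b, c)
    a += (d * 3 + 4) >> 3
    b += (c * 3 + 4) >> 3
    d -= a >> 1
    c -= b >> 1
    a += d
    b += c
    d *= -1
    c *= -1
    c, d = inv_rotate(c, d)
    d += (a + 1) >> 1
    c += (b + 1) >> 1
    a -= d
    b -= c
    return a, b, c, d


X1 = 1_000_000
flat_outputs = overlap_post_filter4(X1, X1, X1, X1)
print(flat_outputs)  # (1461552, 1461547, 1461540, 1461535)
```

The four outputs agree with the values quoted for Overlap4x1Solution3, so the spread of 17 corresponds to the stated DC leakage ratio of 17/10^6.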

Replacing the old scaling stage of the Overlap4x1 operator with the new scaling stage adds two steps. This raises the complexity to roughly 1.2 times that of the existing Overlap4x1 edge operator.

3.6 Solutions for the CornerOverlap2x2 Operator

Two possible solutions for the CornerOverlap2x2 operator are presented.

Solution 1 for the CornerOverlap2x2 operator. Solution 1 is to form a new Overlap6x1 operator for corners, merging the corner pixels with the adjacent edge pixels. One reason this solution is undesirable is that it requires a complex design to eliminate DC gain mismatch and DC leakage. In addition, if these operators are oriented horizontally, the solution breaks down for images whose width is less than 3 macroblocks. Similarly, if these operators are oriented vertically, the solution is not suitable for 4:2:0 images with a height of less than 3 macroblocks or 4:2:2 images with a height of less than 2 macroblocks.

Solution 2 for the CornerOverlap2x2 operator. Solution 2 is to apply the same Overlap4x1 operator that is applied to the edges to the four pixels of the corner, taken in raster scan order. Since both operators act on 4 pixels, the benefit of this solution is that the corner operator has exactly the same DC gain and DC leakage characteristics as the edge operator. The solution is expressed as: CornerOverlap2x2(a, b, c, d) = the chosen Overlap4x1 solution applied to (top left, top right, bottom left, bottom right).

In one particular implementation, the CornerOverlap2x2 operator is applied in the same pixel order at every corner. This has the advantage of allowing a uniform implementation across all corners. A disadvantage of this implementation is that the rotation may introduce some small error. For Overlap4x1Solution3, the error will only be in rounding. Alternatively, a rotated ordering can be used at each corner.

3.7 Solutions for the Chroma Overlap2x2 Operator

In existing implementations, the main problem with the Overlap2x2 operator is DC leakage. This section presents a redesign of the scaling stage of the operator that reduces DC leakage. In the following sections, the Overlap2x1 and CornerOverlap1x1 operators are redesigned to match the gain of the new Overlap2x2 operator.

The DC leakage of the new Overlap2x2 operator at the decoder can be estimated by setting all inputs of the operator to the same value. In that case, the effect of the operator on a pair of input values is to scale one input by 1 + xy + y and the other by x(2 + xy) + 1 + xy. For no DC leakage, the condition is:

1 + xy + y = x(2 + xy) + 1 + xy,

that is,

y = x(2 + xy),

and solving for y yields:

y = 2x/(1 - x^2).

The amount of DC leakage can be quantified as:

y - x(2 + xy).

The existing Overlap2x2 operator sets x = 1/4 and y = 1/2, so the two scalings are 1 + xy + y = 13/8 and x(2 + xy) + 1 + xy = 53/32; the difference between them, 1/32, is the DC leakage. The complexity of the existing chroma Overlap2x2 operator is low because these values of x and y can be implemented with very simple binary lifting steps.

While general-purpose solutions may vary x and y to mitigate DC leakage, these solutions typically have greater complexity than the existing Overlap2x2 operator for chroma. Thus, a solution has been developed to reduce DC leakage while limiting the increase in complexity.

In one possible solution, the value x = 1/4 is retained and the value of y is adjusted so that DC leakage is reduced. Note that the lifting step with y occurs only once, and therefore this solution has lower complexity than one that changes x while retaining the existing value of y.

Solution 1 for the chroma Overlap2x2 operator. The value of y that eliminates DC leakage is y = 2(1/4)/(1 - 1/16) = 8/15. The DC leakage in this solution is zero. However, y = 8/15 cannot be implemented with binary lifting steps. This solution has high complexity because it requires a division operation.

Solution 2a for the chroma Overlap2x2 operator. The value of y is set to 17/32. This solution can be implemented with binary lifting steps as 1/2 + 1/32 (without a multiplier). The DC leakage is 1/512.

Solution 2b for the chroma Overlap2x2 operator. The value of y is set to 273/512. This solution can be implemented with binary lifting steps as 1/2 + 1/32 + 1/512 (without a multiplier). The DC leakage is 1/8192.

Solution 2c for the chroma Overlap2x2 operator. The value of y is set to 4369/8192. This solution can be implemented with binary lifting steps as 1/2 + 1/32 + 1/512 + 1/8192 (without a multiplier). The DC leakage is 1/131072, i.e., 1/(2^17).


Thus, DC leakage has been reduced from 1/32 (0.03125) to 1/(2^17), which is below 16-bit precision. The DC scaling gain ratio in this case is 1 + xy + y = 54613/32768 ≈ 1.666656494.
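The leakage figures quoted for the candidate values of y can be reproduced with exact rational arithmetic. The following is a sketch under the assumption that leakage is measured as |y - x(2 + xy)| with x fixed at 1/4, as in the derivation above; the dictionary `CANDIDATES` and the helper names are illustrative only.

```python
from fractions import Fraction

X = Fraction(1, 4)  # x is kept at 1/4 in all of these solutions

def leakage(y, x=X):
    """Magnitude of the DC leakage, |y - x*(2 + x*y)|."""
    return abs(y - x * (2 + x * y))

def gain(y, x=X):
    """DC scaling gain ratio 1 + x*y + y."""
    return 1 + x * y + y

CANDIDATES = {
    "existing": Fraction(1, 2),
    "1": Fraction(8, 15),
    "2a": Fraction(1, 2) + Fraction(1, 32),
    "2b": Fraction(1, 2) + Fraction(1, 32) + Fraction(1, 512),
    "2c": Fraction(1, 2) + Fraction(1, 32) + Fraction(1, 512) + Fraction(1, 8192),
}

for name, y in CANDIDATES.items():
    print(name, y, leakage(y), float(gain(y)))
```

Each additional shift-and-add term in y divides the leakage by 16, which is why three extra lifting steps suffice to go from 1/32 to 1/(2^17).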

For example, in one prior implementation, the post-overlap filtering of the inner 2x2 blocks of chroma samples uses the operations shown in the table below.

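OverlapPostFilter2x2(a, b, c, d) {
    a += d;
    b += c;
    d -= ((a + 1) >> 1);
    c -= ((b + 1) >> 1);
    /* scaling stage: x = 1/4, y = 1/2, x = 1/4 */
    b += ((a + 2) >> 2);
    a += ((b + 1) >> 1);
    b += ((a + 2) >> 2);
    d += ((a + 1) >> 1);
    c += ((b + 1) >> 1);
    a -= d;
    b -= c;
}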

With solution 2c for the Overlap2x2 operator for chroma, binary lifting steps with factors 1/32, 1/512, and 1/8192 are added, as shown in the table below. Right shifts (>>) by 5, 9, and 13 bits correspond to division by 32, 512, and 8192, respectively.

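OverlapPostFilter2x2(a, b, c, d) {
    a += d;
    b += c;
    d -= ((a + 1) >> 1);
    c -= ((b + 1) >> 1);
    /* scaling stage: x = 1/4, y = 1/2 + 1/32 + 1/512 + 1/8192, x = 1/4 */
    b += ((a + 2) >> 2);
    a += ((b + 1) >> 1);
    a += (b >> 5);
    a += (b >> 9);
    a += (b >> 13);
    b += ((a + 2) >> 2);
    d += ((a + 1) >> 1);
    c += ((b + 1) >> 1);
    a -= d;
    b -= c;
}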
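The effect of the added lifting steps can be seen by running both listings on a flat 2x2 block. The sketch below is a direct Python transcription of the two listings using integer shifts; the function names and the test value 4096 are illustrative choices, not part of the original description.

```python
def post_filter_2x2_old(a, b, c, d):
    """Transcription of the prior OverlapPostFilter2x2 listing."""
    a += d
    b += c
    d -= (a + 1) >> 1
    c -= (b + 1) >> 1
    b += (a + 2) >> 2
    a += (b + 1) >> 1
    b += (a + 2) >> 2
    d += (a + 1) >> 1
    c += (b + 1) >> 1
    a -= d
    b -= c
    return a, b, c, d

def post_filter_2x2_new(a, b, c, d):
    """Transcription of the listing with the extra 1/32, 1/512, 1/8192 steps."""
    a += d
    b += c
    d -= (a + 1) >> 1
    c -= (b + 1) >> 1
    b += (a + 2) >> 2
    a += (b + 1) >> 1
    a += b >> 5
    a += b >> 9
    a += b >> 13
    b += (a + 2) >> 2
    d += (a + 1) >> 1
    c += (b + 1) >> 1
    a -= d
    b -= c
    return a, b, c, d

flat = (4096, 4096, 4096, 4096)
old_out = post_filter_2x2_old(*flat)
new_out = post_filter_2x2_new(*flat)
print(old_out, max(old_out) - min(old_out))
print(new_out, max(new_out) - min(new_out))
```

For this flat input the prior listing spreads the four outputs by 4096/32 = 128, while the modified listing leaves only a one-unit rounding difference.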

3.8 Solutions for the Overlap2x1 operator

Similar to the solution for the Overlap4x1 operator, 3 solutions for the Overlap2x1 operator are presented, one of which has multiple variants.

Solution 1 for the Overlap2x1 operator. Since the DC gain of the existing Overlap2x1 operator is approximately the square root of the DC gain of the existing Overlap2x2 operator, the first solution is to apply the existing Overlap2x1 operator twice. Overlap2x1Solution1(a, b) is defined as:

1. Overlap2x1(a, b);

2. Overlap2x1(b, a).

By reversing the order of the arguments between the two applications of the operator, the DC leakage introduced by the first stage can be compensated for by the second stage. Repeating the same ordering would instead cause the leakage to accumulate. Unfortunately, this solution still suffers from significant DC gain mismatch. In terms of complexity, this solution has roughly twice the complexity of the existing Overlap2x1 operator.
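The difference between the two orderings can be illustrated in a rounding-free model. This sketch assumes the existing Overlap2x1 scaling uses x = 1/8 and y = 1/4, as read from the OverlapPostFilter2 listing in this section; the function names are hypothetical.

```python
from fractions import Fraction

X, Y = Fraction(1, 8), Fraction(1, 4)  # factors of the existing Overlap2x1 listing

def overlap2x1(a, b):
    """Idealized lifting steps without rounding: b += X*a; a += Y*b; b += X*a."""
    b = b + X * a
    a = a + Y * b
    b = b + X * a
    return a, b

def solution1(a, b):
    """Apply the operator twice, swapping the argument roles the second time."""
    a, b = overlap2x1(a, b)
    b, a = overlap2x1(b, a)
    return a, b

def same_order_twice(a, b):
    """Apply the operator twice with the same argument order."""
    a, b = overlap2x1(a, b)
    a, b = overlap2x1(a, b)
    return a, b

one = Fraction(1)
single_gap = abs(overlap2x1(one, one)[0] - overlap2x1(one, one)[1])
swapped_gap = abs(solution1(one, one)[0] - solution1(one, one)[1])
repeated_gap = abs(same_order_twice(one, one)[0] - same_order_twice(one, one)[1])
print(single_gap, swapped_gap, repeated_gap)
```

In this idealized model the swapped order leaves a smaller gap between the two outputs than a single application does, whereas repeating the same order leaves a larger one.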

Solution 2 for the Overlap2x1 operator. Solution 2 is similar in spirit to solution 1 for the Overlap2x1 operator: it involves applying a single operator twice to the same samples. For this solution, however, the scaling of the existing Overlap2x1 operator is modified by adding additional lifting steps. As before, Overlap2x1Solution2(a, b) is defined as:

1. Overlap2x1Altered(a, b);

2. Overlap2x1Altered(b, a).

The idea behind solution 2 for the Overlap2x1 operator is to change x and y such that both DC leakage and DC scaling gain mismatch are minimized. The condition for minimizing DC leakage is as follows:

1 + xy + x(2 + xy) = y + 1 + xy

Solving for y yields y = 2x/(1 - x²). In this case, the scaling gain ratio of one stage of the Overlap2x1Altered operator can be calculated as:

1 + xy + y = (1 + x)/(1 - x)

Let the square root of the gain of the Overlap2x2 operator be denoted as k. Thus, the value of x that minimizes the DC scaling gain mismatch is given by:

(1 + x)/(1 - x) = k

or:

x = (k - 1)/(k + 1)

Once the value of x that minimizes the scaling gain mismatch has been determined using this equation, the earlier equation y = 2x/(1 - x²) can be used to determine the value of y (to minimize DC leakage). In this case, the value of k follows from the scaling gain ratio of the Overlap4x4 operator, and the DC scaling gain must approach this value to minimize the DC scaling gain mismatch. Thus, the value of x must approach 0.127015153, and the value of y should be selected according to the earlier equation.
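The target value of x can be reproduced numerically. Substituting the zero-leakage condition y = 2x/(1 - x²) into the per-stage gain 1 + xy + y gives (1 + x)/(1 - x); that closed form is derived here rather than quoted. The sketch also assumes that the target gain is the Overlap2x2 solution 2c gain of 54613/32768, which is an assumption made for illustration.

```python
import math

# Assumption: k is the square root of the Overlap2x2 solution 2c gain.
GAIN_2X2 = 54613 / 32768
k = math.sqrt(GAIN_2X2)

# Setting the per-stage gain (1 + x)/(1 - x) equal to k and solving for x.
x = (k - 1) / (k + 1)

# Zero-leakage choice of y for that x.
y = 2 * x / (1 - x * x)

stage_gain = 1 + x * y + y
stage_leakage = y - x * (2 + x * y)
print(k, x, y, stage_gain, stage_leakage)
```

Under that assumption the computed x agrees with the value 0.127015153 quoted in the text.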

In practice, x may be approximated by a value that can be achieved using binary lifting steps; the value of y may then be selected according to the earlier equation and, in turn, approximated by a value that can be achieved using binary lifting steps. For this operator, experimental results show that the goal of minimizing DC leakage is much more important than the goal of minimizing DC scaling gain mismatch. Therefore, the approximation of y for a given value of x should be much more accurate than the initial approximation of x.

Solution 2a for the Overlap2x1 operator. Here x = 1/8 and y = 65/256 = 1/4 + 1/256. This solution has minimal DC leakage, but it still has a small amount of DC scaling gain mismatch. Compared to the existing Overlap2x1 operator, no additional lifting step is needed to implement x, while the implementation of y uses one additional lifting step.

Solution 2b for the Overlap2x1 operator. Here x = 65/512 = 1/8 + 1/512 and y = 33/128 = 1/4 + 1/128 are selected. This solution has minimal DC leakage. It still has a small amount of DC scaling gain mismatch, but less than solution 2a for the Overlap2x1 operator. Compared to the existing Overlap2x1 operator, there is one additional lifting step for implementing x, and the implementation of y requires one additional lifting step. Since the scaling stage applies x twice and y once, there are 3 additional lifting steps in solution 2b for the Overlap2x1 operator.

Solution 2c for the Overlap2x1 operator. Here x = 2081/16384 and y = 33/128. This solution has the least amount of DC scaling gain mismatch. There are two additional lifting steps in implementing x and one additional lifting step in implementing y. Since the scaling stage applies x twice and y once, there are 5 additional lifting steps in solution 2c for the Overlap2x1 operator. In terms of complexity, the five additional lifting steps make Overlap2x1Altered about 1.5 times as complex as the existing Overlap2x1 operator. Thus, the overall solution Overlap2x1Solution2 is approximately 3 times as complex as the existing Overlap2x1 operator.
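The three variants can be compared with exact arithmetic. This is a sketch under two assumptions that are not stated in the text: the per-stage gain is measured as 1 + xy + y, and the target k is the square root of the Overlap2x2 solution 2c gain 54613/32768. The names `stage`, `VARIANTS` and `mismatch` are illustrative.

```python
import math
from fractions import Fraction

def stage(x, y):
    """Return (gain, leakage) of one rounding-free stage for equal inputs."""
    gain = 1 + x * y + y           # factor applied to the first sample
    leakage = y - x * (2 + x * y)  # difference between the two outputs
    return gain, leakage

# Assumption: target per-stage gain.
K = math.sqrt(54613 / 32768)

VARIANTS = {
    "2a": (Fraction(1, 8), Fraction(65, 256)),
    "2b": (Fraction(65, 512), Fraction(33, 128)),
    "2c": (Fraction(2081, 16384), Fraction(33, 128)),
}

mismatch = {}
for name, (x, y) in VARIANTS.items():
    gain, leakage = stage(x, y)
    mismatch[name] = abs(float(gain) - K)
    print(name, float(gain), float(leakage), mismatch[name])
```

Under these assumptions the gain mismatch shrinks from 2a to 2b to 2c, matching the ordering described in the text.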

Solution 3 for the Overlap2x1 operator. In this solution, the scaling stage of the existing Overlap2x1 operator is replaced by the scaling stage of the Overlap2x2 operator to form the new operator Overlap2x1Solution3. This is a single-step solution, so no operator has to be repeated. These modifications ensure that the DC gain and DC leakage of Overlap2x1Solution3 are approximately the same as those of the Overlap2x2 operator. However, some small differences remain due to the rounding effects of operations outside the scaling stage.

For example, in one prior implementation, the operations shown in the table below are used for the overlap post-filtering of the edge 2x1 chroma sample block.

OverlapPostFilter2(a, b) {
    b += ((a + 4) >> 3);
    a += ((b + 2) >> 2);
    b += ((a + 4) >> 3);
}

With solution 3 for the Overlap2x1 operator for chroma, binary lifting steps with factors 1/32, 1/512, and 1/8192 are added, as shown in the table below. Right shifts (>>) by 5, 9, and 13 bits correspond to division by 32, 512, and 8192, respectively.

OverlapPostFilter2(a, b) {
    b += ((a + 2) >> 2);
    a += ((b + 1) >> 1);
    a += (b >> 5);
    a += (b >> 9);
    a += (b >> 13);
    b += ((a + 2) >> 2);
}

Replacing the old scaling stage of the Overlap2x1 operator with the new scaling stage adds three additional lifting steps. This increases the complexity to roughly 1.2 times that of the existing Overlap2x1 operator for edges.
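The two OverlapPostFilter2 listings can be compared on a flat pair of samples. The sketch below transcribes both listings into Python with integer shifts; the function names and the test value 8192 are illustrative.

```python
def post_filter_2_old(a, b):
    """Transcription of the prior OverlapPostFilter2 listing (x = 1/8, y = 1/4)."""
    b += (a + 4) >> 3
    a += (b + 2) >> 2
    b += (a + 4) >> 3
    return a, b

def post_filter_2_new(a, b):
    """Transcription of the listing that reuses the Overlap2x2 scaling stage."""
    b += (a + 2) >> 2
    a += (b + 1) >> 1
    a += b >> 5
    a += b >> 9
    a += b >> 13
    b += (a + 2) >> 2
    return a, b

old_pair = post_filter_2_old(8192, 8192)
new_pair = post_filter_2_new(8192, 8192)
print(old_pair, old_pair[1] - old_pair[0])
print(new_pair, new_pair[1] - new_pair[0])
```

For this input the prior listing produces two different outputs (a gap of 8192/256 = 32), while the modified listing produces equal outputs with a gain of about 13653/8192, close to the Overlap2x2 gain of 54613/32768.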

3.9 Solutions for the CornerOverlap1x1 operator

The CornerOverlap1x1 operator presents a particular design challenge because it only works on a single sample. DC leakage is therefore not a problem, while DC gain mismatch is a significant problem. Several possible solutions are described below.

Strictly speaking, the CornerOverlap1x1 operator is not an operator in the same sense as the other overlap operators. A true "1x1" operator would have no region of support outside a single sample. On the other hand, many of the example 1x1 operator solutions below involve prediction between a single sample and a predictor determined from at least one sample outside the 1x1 region. For simplicity, these corner operators are nonetheless referred to as CornerOverlap1x1 operators.

DC scaling is another way in which many of the example 1x1 operator solutions presented below differ from other overlap operators. For the previous 4x4, 4x1, 2x2, and 2x1 operators, the scaling results directly from the overlap operation. The sample prediction operations described below for some example CornerOverlap1x1 operators do not result in scaling in the same manner. Although this may appear to depart from the goal of having the same DC gain for all operators in the set, in practice a similar scaling results most of the time for these 1x1 corner operators, as described below.

Solution 1 for the CornerOverlap1x1 operator. Similar to solution 1 for the CornerOverlap2x2 operator, one solution is to devise a new Overlap3x1 operator. This solution has similar drawbacks: a minimum image size requirement (3 macroblocks along the direction of the operator) and the possibility of additional DC gain mismatch and DC leakage problems.

Solution 2 for the CornerOverlap1x1 operator. This solution exploits the fact that corner samples are likely to be highly correlated with their neighbors. Importantly, this operator operates on the mean values of the original image blocks, given its use in overlap processing of the DC coefficients. Thus, before the 1x1 corner overlap operator is applied, the values of the corner sample and its neighbors have a high probability of being similar. If the DC gain mismatch is eliminated, the corner sample value will also be very similar to its neighboring samples after the lapped transform.

Therefore, this solution uses a scheme based on prediction of the corner sample. Consider the top-left corner as an example, with the samples labeled as follows:

A B . . .
C D . . .
. . . . .

The top-left sample is labeled A, the sample to the right of A is labeled B, and the sample below A is labeled C. The prediction scheme is implemented as follows, assuming in-place computation is used.

1. Before applying the second stage overlap operator:

a. Value(A) = Value(A) - ((Value(B) + Value(C) + 1) >> 1).

2. After applying the second stage overlap operator:

a. Value(A) = Value(A) + ((Value(B) + Value(C) + 1) >> 1).

Since the residual value of A after step 1 is very likely small, the operation is roughly equivalent to applying the same gain to the corner sample. (While B and C may have significant values, the difference value at A is typically zero or near zero. After step 1, scaling is applied to the adjacent B and C samples, e.g., by applying an edge overlap operator. When the scaled B and C are added back to the difference value at A in step 2, the resulting value of A has in most cases been effectively scaled to the same degree as the B and C values.) It is straightforward to apply this solution to the other three corners.
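The two prediction steps can be sketched around a stand-in for the edge overlap operator. In the sketch below, `scale_edge` is a hypothetical placeholder with a gain of about 5/3 chosen only for illustration; it is not the edge operator of the specification, and the sample values are made up.

```python
def scale_edge(v):
    # Hypothetical stand-in for the edge overlap operator (gain of about 5/3).
    return v * 5 // 3

def corner_solution2(a, b, c):
    """Predict corner A from B and C before the overlap, then undo afterwards."""
    a -= (b + c + 1) >> 1          # step 1: A becomes a small residual
    b, c = scale_edge(b), scale_edge(c)  # second stage overlap on the neighbors
    a += (b + c + 1) >> 1          # step 2: add the scaled predictor back
    return a, b, c

a_out, b_out, c_out = corner_solution2(100, 102, 98)
print(a_out, b_out, c_out)
```

With these sample values the residual after step 1 is zero, so the corner ends up within one unit of 100 * 5/3, i.e., it is scaled to about the same degree as its neighbors.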

Solution 2 for the CornerOverlap1x1 operator has advantages with respect to rotational symmetry. One drawback is that this solution requires the picture to have a width of at least 2 macroblocks and a height of at least 2 macroblocks (the height requirement applies only to 4:2:0, not 4:2:2); otherwise the prediction scheme becomes impractical.

Solution 3 for the CornerOverlap1x1 operator. A second scheme, similar to that of solution 2, is presented for the CornerOverlap1x1 operator. Some implementations may not perform each stage of the lapped transform all at once, but rather interleave the stages. This may make it difficult to access both B and C simultaneously before or after the lapped transform. The following steps are proposed for this solution:

1. Before applying the second stage overlap operator to B:

a. Value(A) = Value(A) - (Value(B) >> 1).

2. Before applying the second stage overlap operator to C:

a. Value(A) = Value(A) - (Value(C) >> 1).

3. After applying the second stage overlap operator to B:

a. Value(A) = Value(A) + (Value(B) >> 1).

4. After applying the second stage overlap operator to C:

a. Value(A) = Value(A) + (Value(C) >> 1).

The ordering of the steps only has to be such that 1 occurs before 3 and 2 before 4. Any reordering is acceptable as long as both conditions are met. Solution3 for the CornerOverlap1x1 operator allows decoupling from the timing problem in various implementations. Unfortunately, this solution may also have some rounding issues that are not present in solution 2. In addition, this solution shares the same properties as solution2 for the CornerOverlap1x1 operator.
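The ordering constraint can be illustrated by running the four steps in two valid orders. As a sketch only: `scale_edge` is a hypothetical stand-in for the edge overlap operator (gain of about 5/3), and the step labels and sample values are made up for the example.

```python
def scale_edge(v):
    # Hypothetical stand-in for the edge overlap operator (gain of about 5/3).
    return v * 5 // 3

def corner_solution3(a, b, c, order):
    """Run the prediction steps and the overlaps on B and C in the given order."""
    s = {"A": a, "B": b, "C": c}
    steps = {
        "1": lambda: s.update(A=s["A"] - (s["B"] >> 1)),
        "2": lambda: s.update(A=s["A"] - (s["C"] >> 1)),
        "overlap_B": lambda: s.update(B=scale_edge(s["B"])),
        "overlap_C": lambda: s.update(C=scale_edge(s["C"])),
        "3": lambda: s.update(A=s["A"] + (s["B"] >> 1)),
        "4": lambda: s.update(A=s["A"] + (s["C"] >> 1)),
    }
    for name in order:
        steps[name]()
    return s["A"]

batch = ["1", "2", "overlap_B", "overlap_C", "3", "4"]
interleaved = ["1", "overlap_B", "3", "2", "overlap_C", "4"]
a_batch = corner_solution3(100, 102, 98, batch)
a_interleaved = corner_solution3(100, 102, 98, interleaved)
print(a_batch, a_interleaved)
```

Both orders give the same corner value because each step only adds or subtracts a term that depends on B or C alone. Halving B and C separately can round differently from halving their sum, which is one way the rounding issues mentioned above can arise.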

Solution 4 for the CornerOverlap1x1 operator. In this solution, solution 3 is modified to operate only in the horizontal direction, as follows:

1. Before applying the second stage overlap operator to B:

a. Value(A) = Value(A) - Value(B).

2. After applying the second stage overlap operator to B:

a. Value(A) = Value(A) + Value(B).

The advantage of this solution is that it involves fewer memory accesses. The complexity is low because it interacts with half as many samples as solutions 2 and 3 for the CornerOverlap1x1 operator. The disadvantage of this solution is that it lacks symmetry and therefore has a rotation problem. This problem is mitigated by the fact that downsampling introduces more loss than the asymmetry does. As with other solutions for the CornerOverlap1x1 operator, in solution 4 the image (or hard tile) must have a width of at least 2 macroblocks. One advantage is that all downsampled chroma operations are similar. Similar to solution 2 (and solutions 3, 5, 6, and 7), the neighboring samples used to calculate the difference at A are expected to have values very similar to A. After step 1, the adjacent samples are scaled as part of the edge overlap process and then added back to the difference at corner A in step 2. Because the difference at A is expected to be zero or close to zero, the corner value is scaled to approximately the same extent as the neighboring samples.

Solution 5 for the CornerOverlap1x1 operator. In this solution, solution 3 is modified to operate only in the vertical direction, as follows:

1. Before applying the second stage overlap operator to C:

a. Value(A) = Value(A) - Value(C).

2. After applying the second stage overlap operator to C:

a. Value(A) = Value(A) + Value(C).

The advantage of this solution is that it involves fewer memory accesses. The complexity is low because it interacts with half as many samples as solutions 2 and 3 for the CornerOverlap1x1 operator. Like solution 4, this solution has the disadvantage that it lacks symmetry and therefore has a rotation problem. This problem is mitigated by the fact that downsampling introduces more loss than the asymmetry does. As with other solutions for the CornerOverlap1x1 operator, in solution 5 a 4:2:0 image must have a height of at least 2 macroblocks. This solution does not incur other size requirements. Thus, the 4:2:0 and 4:2:2 operations do not parallel each other, which is another drawback of this solution.

Solution 6 for the CornerOverlap1x1 operator. This solution for the CornerOverlap1x1 operator is a combination of solutions 4 and 5. Specifically, syntax elements are added to the bitstream to specify whether the prediction for the CornerOverlap1x1 operator is horizontal or vertical. This solution has the advantage of supporting rotation. The addition of syntax elements is a disadvantage of this solution.

Solution 7 for the CornerOverlap1x1 operator. This final solution is similar to solution 4, except that instead of using the value of B as a predictor, the value of D is used as a predictor for A. The benefit of using D as a predictor is rotational symmetry. However, a disadvantage of using D is that it may be a worse match than B or C.
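Solutions 4, 5, and 7 share the same two-step pattern and differ only in which neighbor serves as the predictor (B, C, or D). A minimal sketch of that shared pattern follows; `scale_edge` is a hypothetical stand-in for the overlap operator applied to the predictor sample, and the sample values are illustrative.

```python
def scale_edge(v):
    # Hypothetical stand-in for the overlap operator applied to the predictor.
    return v * 5 // 3

def corner_with_single_predictor(a, p, overlap=scale_edge):
    """Predict A from one neighbor p (B, C, or D), overlap p, then add it back."""
    a -= p          # step 1: before the overlap operator is applied to p
    p = overlap(p)  # second stage overlap on the predictor sample
    a += p          # step 2: after the overlap operator has been applied to p
    return a, p

from_b = corner_with_single_predictor(100, 102)  # solution 4 style, predictor B
from_c = corner_with_single_predictor(100, 98)   # solution 5 style, predictor C
print(from_b, from_c)
```

The closer the predictor is to A before the overlap, the closer the corner's effective scaling is to that of its neighbor, which is why a poorly matched predictor such as D can be a drawback.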

4. Computing environment

The overlap processing invention described above may be implemented on any of a variety of devices in which digital media signal processing is performed (e.g., digital media encoding and/or decoding devices), including computers, image and video recording, transmitting and receiving devices, portable video players, digital media players, video conferencing devices, and the like. The overlap processing invention can be implemented in hardware circuitry, and in digital media processing software executing in a computer or other computing environment, such as that shown in FIG. 8.

FIG. 8 illustrates a general example of a suitable computing environment (800) in which described embodiments may be implemented. The computing environment (800) is not intended to suggest any limitation as to scope of use or functionality of the invention, as the present invention may be implemented in diverse general-purpose or special-purpose computing environments.

Referring to FIG. 8, the computing environment (800) includes at least one processing unit (810) and memory (820). In FIG. 8, this most basic configuration (830) is included within a dashed line. The processing unit (810) executes computer-executable instructions and may be a real or virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (820) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (820) stores software (880) implementing the encoder/decoder with multiple tile boundary options for overlap processing and/or overlap operators with reduced DC gain mismatch.

The computing environment may have additional features. For example, the computing environment (800) includes storage (840), one or more input devices (850), one or more output devices (860), and one or more communication connections (870). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (800). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (800) and coordinates activities of the components of the computing environment (800).

The storage (840) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (800). The storage (840) stores the software (880) implementing the encoder/decoder with multiple tile boundary options for overlap processing and/or overlap operators with reduced DC gain mismatch.

The input device (850) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (800). For audio, the input device (850) may be a sound card or similar device that accepts audio input in analog or digital form, or a CD-ROM reader that provides audio samples to the computing environment. For images or video, the input device (850) may be a super-sequential, TV tuner, or other device that provides input video in analog or digital form. The output device (860) may be a display, a printer, a speaker, a CD-writer, or another device that provides output from the computing environment (800).

The communication connection (870) allows communication with another computing entity over a communication medium. The communication medium conveys information such as compressed audio or video information, or other data, in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired or wireless techniques implemented with an electrical, optical, Radio Frequency (RF), infrared, acoustic, or other carrier.

The digital media processing techniques herein may be described in the general context of computer-readable media. Computer readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with respect to computing environment (800), computer-readable media comprise memory (820), storage (840), and combinations of both. The computer readable medium is a tangible medium. The computer readable medium does not include a modulated data signal.

The digital media processing techniques herein may be described in the general context of computer-executable instructions, such as those included in program modules, being executed on a target real or virtual processor in a computing environment. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions of program modules may be executed within a local or distributed computing environment.

For the purposes of this description, the detailed description uses terms such as "determine," "generate," "adjust," and "apply" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary from implementation to implementation.

Having described and illustrated the principles of the present invention in the detailed description and the accompanying drawings, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment unless specifically stated otherwise. Various types of general purpose or specialized computing environments may be used or operations may be performed in accordance with the principles described herein. Elements of embodiments shown in software may be implemented in hardware and vice versa.

In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims

1. A method of performing an inverse lapped transform (260) for decoding digital media using a digital media decoding device (800), the method comprising:

performing, with the digital media decoding device, an inverse frequency transform (250) on digital media; and

applying, with the digital media decoding device, a plurality of overlap operators to a result of the inverse frequency transform, the plurality of overlap operators including at least a first overlap operator and a second overlap operator, wherein the first overlap operator is an interior overlap operator, and wherein the second overlap operator is an edge or corner overlap operator, each of the plurality of overlap operators characterized by a substantially equivalent DC gain (750).

2. The method of claim 1, wherein as part of the applying:

the first overlap operator is applied to an interior 4x4 sample region (310) of an image and/or tile;

the second overlap operator is an edge overlap operator applied to an edge 4x1 sample region (330) of the image and/or tile; and

a third overlap operator of the plurality of overlap operators is applied to a corner 2x2 sample region (320) of the image and/or tile.

3. The method of claim 2, wherein the first overlap operator includes a scaling stage, and wherein the second overlap operator uses the scaling stage of the first overlap operator.

4. The method of claim 2, wherein the inverse lapped transform is a hierarchical inverse transform having a plurality of stages, and wherein the first, second, and third overlap operators are applied to samples of at least one channel in a first stage of the plurality of stages.

5. The method of claim 2, wherein the inverse lapped transform is a hierarchical inverse transform having a plurality of stages, and wherein the first, second, and third overlap operators are applied to samples of each of a plurality of full resolution channels in a second stage of the plurality of stages.

6. The method of claim 2, wherein the first overlap operator includes a first scaling that has a given scaling stage, wherein the second overlap operator includes a second scaling that has the given scaling stage, and wherein the given scaling stage is:

a += b; b = (a >> 1) - b; a += (b x 3 + 0) >> 3; b += (a x 3 + 0) >> 4; b += (a >> 7); b -= (a >> 10).

7. The method of claim 2, wherein the third overlap operator applied to the corner 2x2 sample region is applied using the second overlap operator after reordering the top-left, top-right, bottom-left, and bottom-right samples in the corner 2x2 sample region into an intermediate 4x1 sample region.

8. The method of claim 1, wherein the inverse lapped transform is a hierarchical inverse transform having a plurality of stages, wherein as part of the applying:

the first overlap operator is applied to an interior 2x2 sample region (410; 510) of an image and/or tile;

the second overlap operator is an edge overlap operator applied to an edge 2x1 sample region (430; 530) of the image and/or tile; and

a third overlap operator of the plurality of overlap operators is applied to a corner 1x1 sample region (420; 520) of the image and/or tile; and

wherein the first, second, and third overlap operators are applied to samples of a downsampled chroma channel in a first stage of the plurality of stages.

9. The method of claim 8, wherein the first overlap operator and the second overlap operator include binary lifting steps with factors of 1/32, 1/512, and 1/8192.

10. The method of claim 8, wherein the third overlap operator is applied to a sample value at position A using a sample value at horizontally adjacent position B, and wherein the third overlap operator is implemented by:

before an overlap operator is applied to the sample value at position B, adjusting the sample value at position A by subtracting the sample value at position B from the sample value at position A; and

after the overlap operator has been applied to the sample value at position B, adjusting the sample value at position A by adding the sample value at position B to the sample value at position A.

11. A method of decoding digital media using a digital media decoding device (800), the method comprising:

with the digital media decoding device:

receiving, in an encoded bitstream, information indicating a selected tile boundary option (640), wherein the selected tile boundary option indicates one of hard tile boundary processing for overlap operators and soft tile boundary processing for overlap operators; and

performing inverse overlap processing (650) based at least in part on the selected tile boundary option.

12. The method of claim 11, further comprising:

when the selected tile boundary option indicates the soft tile boundary processing for overlap operators, performing the inverse overlap processing with inverse overlap operations across tile boundaries; and

when the selected tile boundary option indicates the hard tile boundary processing for overlap operators, performing the inverse overlap processing without inverse overlap operations across tile boundaries, wherein the inverse overlap processing still includes inverse overlap operations on edge samples on at least one side of respective tile boundaries to reduce DC gain mismatch.

13. The method of claim 11, wherein the selected tile boundary option is signaled in a picture header.

14. The method of claim 11, wherein:

when the selected tile boundary option indicates the soft tile boundary processing for the overlap operator, the inverse overlap processing for the current tile comprises at least partially decoding at least one spatially adjacent tile; and

when the selected tile boundary option indicates the hard tile boundary processing for overlap operators, the inverse overlap processing for the current tile is independent of decoding spatially neighboring tiles.

15. A computer-readable medium storing computer-executable instructions for causing a digital media decoding device programmed thereby to perform the method of any one of claims 1-14.