
HK1188352B - Enhancement methods for sampled and multiplexed image and video data - Google Patents


Info

Publication number
HK1188352B
HK1188352B (application number HK14101213.3A)
Authority
HK
Hong Kong
Prior art keywords
video
samples
image
processing
video data
Prior art date
Application number
HK14101213.3A
Other languages
Chinese (zh)
Other versions
HK1188352A (en)
Inventor
Athanasios Leontaris
Alexandros Tourapis
Peshala V. Pahalawatta
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation
Publication of HK1188352A
Publication of HK1188352B

Description

Enhancement methods for sampled and multiplexed image and video data
Cross Reference to Related Applications
This application claims priority from U.S. Provisional Patent Application No. 61/365,743, filed on July 19, 2010, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to methods and systems for processing image and video data. More particularly, embodiments of the invention relate to enhancement methods for sampled and multiplexed image and video data.
Background
Images and multidimensional data are typically sampled in a rectangular raster scan pattern. Throughout this specification, it will be understood by those of ordinary skill in the art that the term "sampling" may refer to either down-sampling or up-sampling. Downsampling an image or video frame reduces its resolution or number of pixels, whereas upsampling increases its resolution or number of pixels. Sampling may include a process of selecting all or a portion of the samples, with or without filtering, and may be performed according to different sampling intervals or sampling rates and according to linear or non-linear functions. Further, it will be understood by those of ordinary skill in the art that the term "resampling" may refer to restoring a sampled image or video frame to its state prior to sampling.
The image or video data is sampled for a number of reasons, including, but not limited to: (a) storage is easier, since the number of samples is smaller; (b) the amount of computation is reduced, since, for example, a Fourier transform is computed faster, or motion estimation and compensation is performed on, e.g., a quarter of the original sample size rather than on the raw data; (c) appropriately performed sampling (e.g., preceded by low-pass filtering) may also increase the compression rate of the data, apart from the fact that the sample size itself is smaller; (d) sampling is helpful in representing and classifying data, for example for pattern recognition or matching. Judicious sampling identifies those samples that are most critical in the representation of the image or video data, so that not only is the amount of computation reduced, but the success rate of a particular algorithm may also benefit greatly; (e) it may be desirable to sample the data in a pattern that makes it easy to subsequently recover lost or discarded samples.
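The down-sampling and up-sampling operations defined above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, frames are plain lists of rows, and nearest-neighbor repetition without any filtering is assumed, so the recovered frame loses the discarded rows' detail.

```python
# Hypothetical sketch of vertical 2:1 downsampling and nearest-neighbor
# upsampling. No pre- or post-filtering is applied (an assumption for
# simplicity; a practical system would typically low-pass filter first).

def downsample_vertical(frame, factor=2):
    """Keep every `factor`-th row of the frame."""
    return [row[:] for row in frame[::factor]]

def upsample_vertical(frame, factor=2):
    """Restore the original row count by repeating each retained row."""
    out = []
    for row in frame:
        out.extend([row[:] for _ in range(factor)])
    return out

frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
small = downsample_vertical(frame)      # rows 0 and 2 survive
restored = upsample_vertical(small)     # same size as frame, detail lost
```

Note how `restored` has the original dimensions but rows 1 and 3 now merely repeat their neighbors, which is why the choice of sampling pattern matters for later interpolation.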
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments of the disclosure and, together with the description of the example embodiments, serve to explain the principles and implementations of the disclosure.
Fig. 1 shows a sampling pattern that is sub-sampled at a ratio of 2 in the vertical dimension.
Fig. 2 shows a sampling pattern that is sub-sampled at a ratio of 2 in the horizontal dimension.
Fig. 3 shows a sampling pattern that is sub-sampled in two dimensions at a ratio of 2.
Fig. 4 shows a quincunx sampling pattern.
Fig. 5 shows a checkerboard interleaved arrangement, referred to as CB, for transmitting stereoscopic material.
Fig. 6 shows an example of the rearrangement of the quincunx samples, referred to as a "quadrant-quincunx" or a (height/2) × (width/2) quincunx sampling pattern.
Fig. 7 shows an alternative arrangement of quincunx samples, called "side by side quincunx", where the views are packed side by side and the rows of "even" quincunx samples alternate with the rows of "odd" quincunx samples.
Fig. 8 illustrates multiplexing samples using column interleaving.
Fig. 9 shows multiplexing of samples using a side-by-side-column arrangement.
Fig. 10 shows multiplexing of samples using top-bottom row interleaving.
Fig. 11 shows multiplexing of samples using row interleaving.
FIG. 12 shows an example of samples to be deblocked in a single row divided by vertical block edges.
Fig. 13 shows an example where samples to be deblocked do not necessarily belong to the same view. This may lead to cross-view contamination. In the example of the figure, the two hatched types represent two different views.
Fig. 14 illustrates the deblocking problem with respect to quadrant-quincunx video data.
Fig. 15 illustrates the deblocking problem with respect to the vertical edges of a side-by-side-quincunx arrangement.
Figure 16 illustrates the deblocking problem with respect to the horizontal edges of a side-by-side-quincunx arrangement.
Figure 17 illustrates the side-by-side problem.
Fig. 18 illustrates the up-down-row problem.
Fig. 19A shows a schematic diagram of performing processing of an image or video sample according to an embodiment of the present disclosure.
Fig. 19B shows an example of deblocking quincunx sampled and packed content with full demultiplexing and interpolation. Lower case letters denote the interpolated values, and y' represents the deblocked sample y.
FIG. 20 shows an example of analysis guided deblocking.
Fig. 21 shows the deblocking strategy of a bilinear interpolation filter on quincunx sampled and packed data. Italicized letters indicate the values that must be interpolated.
Fig. 22 shows a deblocking strategy with respect to packed quincunx sampled video data.
Fig. 23 shows an example of diagonal deblocking.
Fig. 24 shows joint deblocking and demultiplexing for interpolation using bilinear filters.
Fig. 25 shows joint de-multiplexing interpolation and deblocking using the overcomplete de-noising principle. The unshaded example is interpolated. Deblocking affects all the hatched samples near the block boundary.
Fig. 26 shows an example of recovery of lost samples using an overcomplete denoising method.
FIG. 27 shows an example of deblocking for pre-processing a full resolution view.
Fig. 28 shows an example of in-loop (in-loop) filtering for quincunx sampled video data in a video encoder.
Fig. 29 shows an example of in-loop filtering for quincunx sampled video data in a video decoder.
Fig. 30 shows an example of loop filtering for sample data in a resolution scalable 3D stereoscopic video encoder.
Fig. 31 shows an example of loop filtering for sampled video data in a resolution scalable 3D stereoscopic video decoder.
Fig. 32 shows an example of loop filtering for sampled video data in a resolution scalable 3D stereoscopic video encoder.
Fig. 33 shows an example of loop filtering for sampled video data in a resolution scalable 3D stereoscopic video decoder.
Fig. 34 shows an example of loop filtering for sampled video data in a resolution scalable 3D stereoscopic video encoder.
Fig. 35 shows an example of loop filtering for sampled video data in a resolution scalable 3D stereoscopic video decoder.
Fig. 36 shows an example of enhancement layer deblocking using samples from the base layer.
FIG. 37 shows an example of out-of-loop post-processing.
Fig. 38 shows another example where input video samples are processed separately in two different branches and then combined together.
FIG. 39 illustrates another example in which acceleration of processing operations is discussed.
Fig. 40 shows an example of loop filtering for sample data in a resolution scalable video decoder with multiple enhancement layers.
Fig. 41 illustrates an example of loop filtering for sampled video data in a resolution scalable video decoder with multiple enhancement layers.
Detailed Description
The present disclosure relates to enhancement methods for sampled and multiplexed image and video data.
According to a first aspect, there is provided a method of processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising: demultiplexing the composite sampled image or video data into a plurality of component pictures; processing each component picture separately; and sampling and multiplexing the separately processed component pictures together.
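The demultiplex, process-separately, and remultiplex flow of the first aspect can be sketched for a checkerboard composite as follows. This is a minimal sketch under stated assumptions: the composite is assumed to be checkerboard interleaved, and `process` is a hypothetical stand-in (a constant offset) for whatever per-component filtering is applied; none of the names come from the disclosure.

```python
# Sketch of the first aspect for a checkerboard (CB) composite:
# demultiplex into per-view component pictures, process each view
# separately, then multiplex the processed views back together.

def demux_cb(cb):
    """Split a CB picture into its two view component pictures."""
    h, w = len(cb), len(cb[0])
    view0 = [[cb[r][c] for c in range(w) if (r + c) % 2 == 0] for r in range(h)]
    view1 = [[cb[r][c] for c in range(w) if (r + c) % 2 == 1] for r in range(h)]
    return view0, view1

def remux_cb(view0, view1, w):
    """Re-interleave two processed views into one CB picture of width w."""
    out = []
    for r, (row0, row1) in enumerate(zip(view0, view1)):
        it0, it1 = iter(row0), iter(row1)
        out.append([next(it0) if (r + c) % 2 == 0 else next(it1)
                    for c in range(w)])
    return out

def process(view):                      # hypothetical per-view operation
    return [[x + 100 for x in row] for row in view]

cb = [[0, 1, 0, 1],
      [1, 0, 1, 0]]
v0, v1 = demux_cb(cb)
result = remux_cb(process(v0), process(v1), w=4)
```

The key property is that `process` never sees samples of the other view, so no cross-view mixing can occur.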
According to a second aspect, there is provided a method of processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components or categories, the method comprising: each element of the composite sampled image or video data is processed by taking into account the image or video component or category to which it relates, thereby distinguishing between processing of composite data relating to one image or video component or category and processing of composite data relating to another image or video component or category.
According to a third aspect, there is provided a method of processing composite sampled image or video data, the composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, comprising: demultiplexing the composite sampled image or video data into a plurality of component pictures while processing the sampled image or video data, wherein the processing is selected from deblocking, denoising, deblurring, deringing, and filtering.
According to a fourth aspect, there is provided a method of processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video categories, the method comprising: providing an initial block of existing samples of the same category; applying a transform to the samples of the initial block; estimating transform coefficients of a twice-sized block using the transform coefficients of the initial block, the twice-sized block containing the existing samples of the initial block together with missing samples of the same category; adjusting the estimated transform coefficients of the twice-sized block; and applying an inverse transform to the samples of the twice-sized block.
According to a fifth aspect, there is provided a method for processing image or video data, comprising: separately pre-processing image or video components of an image or video to be interleaved or multiplexed; separately sampling the pre-processed image or video components; interleaving or multiplexing the sampled pre-processed image or video components to form a composite image or video; and processing the composite image or video.
According to a sixth aspect, there is provided a method for processing composite sampled image or video data for a scalable video coding system, the scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising: demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures; replacing missing samples of each enhancement layer component picture with samples from the base layer; after the replacement, processing each enhancement layer component picture separately; and sampling and multiplexing together the separately processed enhancement layer component pictures.
According to a seventh aspect, there is provided a method for processing composite sampled image or video data for a scalable video coding system, the scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising: demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures; encoding each enhancement layer component picture separately using a prediction from the base layer; processing each enhancement layer component picture after encoding separately; and sampling and multiplexing the separately processed component pictures together.
According to an eighth aspect, there is provided a method for processing video samples of an image, comprising: performing two separate sets of operations on the video samples, a first set of operations comprising upsampling the video samples followed by processing the upsampled video samples to provide a first output, and a second set of operations comprising processing the video samples followed by upsampling the processed video samples to provide a second output; and combining the first output with the second output.
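The two-branch structure of the eighth aspect can be sketched as follows. Everything concrete here is an illustrative assumption: a 1-D signal, a nearest-neighbor upsampler, a 3-tap smoother as the "processing", and an equal-weight average as the combining step; the disclosure does not prescribe these particular operators.

```python
# Sketch of the eighth aspect: branch one upsamples then processes,
# branch two processes then upsamples, and the two outputs are combined
# (here by simple averaging).

def upsample(x):
    """2x nearest-neighbor upsampling of a 1-D signal."""
    out = []
    for v in x:
        out += [v, v]
    return out

def smooth(x):
    """3-tap averaging filter with edge clamping (stand-in 'processing')."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def two_branch(x):
    first = smooth(upsample(x))          # upsample, then process
    second = upsample(smooth(x))         # process, then upsample
    return [(a + b) / 2.0 for a, b in zip(first, second)]

y = two_branch([0.0, 3.0, 0.0])
```

Because filtering and upsampling do not commute, the two branches generally produce different outputs, and combining them can trade off the sharpness of one against the smoothness of the other.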
According to a ninth aspect, there is provided a method for increasing the computational speed of processing operations on samples of a composite image or video arrangement, comprising: demultiplexing the samples of the composite image or video arrangement into individual samples constituting components of the composite image or video arrangement; processing each component separately; and multiplexing the separately processed components together.
Most signal processing algorithms, including denoising, deblocking and enhancement filtering, operate on regular raster scan grids. Some examples of sampling arrangements will be briefly described. In a rectangular raster scan grid, it may be desirable to retain all pixels horizontally and one half of the pixels vertically. If one dimension is considered more important than the others, more of the original information may be retained in that dimension by reducing the sampling rate in the other dimension. Such a sampling pattern (10) is shown in Fig. 1. In other applications, horizontal sampling may be preferred over vertical sampling (the horizontal sampling pattern (20) is shown in Fig. 2). For many applications, however, it may be desirable to retain an equal amount of information in both dimensions. This can be achieved by employing equal sampling rates in both dimensions, as in the quarter-sampled image (30) shown in Fig. 3.
An alternative sampling pattern is the quincunx sampling pattern depicted in Fig. 4, which shows a quincunx sampled image (40). Each missing sample is equidistant from its nearest existing samples, which may be beneficial for interpolating the missing samples. Further improvements to the interpolation process are described in reference [9]. In practice, the horizontal and vertical dimensions are not equally important, and it may be desirable to maintain a higher resolution in the vertical dimension than in the horizontal dimension. Although the sampling pattern of Fig. 1 retains the same number of samples as the pattern in Fig. 4, interpolation is not as efficient as with quincunx sampling. Quincunx sampling preserves a larger portion of the signal characteristics and frequency information and lends itself to more efficient interpolation of the missing samples. Quincunx sampling has been used to sample the left and right views (stereo pair) of a 3D stereoscopic image sequence and pack them together into a checkerboard (CB) arrangement. The resulting CB picture contains information from both views and is compatible with some currently available 3D displays and systems based on Digital Light Processing (DLP) technology [see references 1 and 2, the entire contents of which are incorporated herein by reference].
Such a sampling and interleaving arrangement, also referred to as "CB", and the resulting CB picture (50) are shown in Fig. 5. "CB" requires quincunx sampling and checkerboard interleaving (throughout this disclosure, the terms "interleaving" and "multiplexing" may be used interchangeably). Today, with the use of, e.g., Blu-ray discs having the capacity required for storing the compressed signals, transmission of CB formatted content to CB compatible displays, such as DLP (Digital Light Processing) displays, is possible. Compression is facilitated by any codec supported by the Blu-ray specification, such as the H.264/MPEG-4 AVC video coding standard [see reference 4, incorporated herein by reference in its entirety] and the SMPTE VC-1 coding standard [see reference 5, incorporated herein by reference in its entirety]. Thus, compression, transmission, playback, and display of CB formatted content may be accomplished using commercially available off-the-shelf equipment. Better compression performance can also be achieved by employing methods suited to CB formatted content [see reference 7, which is incorporated herein by reference in its entirety].
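The CB arrangement just described can be illustrated with a small sketch: two quincunx sampled views share one composite picture. It assumes, purely for illustration, that the left view retains the "even" quincunx positions (where (r + c) is even) and the right view the "odd" ones; the function name is hypothetical, not from any standard.

```python
# Illustrative checkerboard (CB) interleaving of a quincunx sampled
# stereo pair: at each pixel, the retained sample comes from the left
# view on "even" positions and from the right view on "odd" positions.

def checkerboard_interleave(left, right):
    h, w = len(left), len(left[0])
    return [[left[r][c] if (r + c) % 2 == 0 else right[r][c]
             for c in range(w)] for r in range(h)]

L = [["L"] * 4 for _ in range(4)]   # full-resolution left view
R = [["R"] * 4 for _ in range(4)]   # full-resolution right view
cb = checkerboard_interleave(L, R)  # rows alternate L R L R / R L R L
```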
The quincunx sampled 3D video data may be sampled and multiplexed together in arrangements other than the checkerboard arrangement. These include side-by-side packing formats (horizontal or vertical side-by-side-quincunx), and a quadrant-based approach where each quadrant of the image carries the odd or even row samples of one view [see reference 9, which is incorporated herein by reference in its entirety]. One example of a quadrant-based packing format (60) is depicted in Fig. 6. In this case, the quincunx pattern samples of each view of Fig. 4 are divided into two categories and packed together to form two sub-images, both corresponding to the same view. One sub-image contains the samples with "even" indices, the "upper" quincunx pattern samples, shown in a dark color, while the other sub-image contains the samples with "odd" indices, the "lower" samples, shown in a light color. With this format, the samples in each sub-image essentially correspond to rectangular pattern samples of the original view. This is not the only possible alternative packing format. In fact, the sampling pattern of Fig. 4 may be defined as a 1 × 1 quincunx sampling pattern, where each 2 × 2 pixel group in the 3D CB picture is a set of four sub-images. In the same manner, the sampling pattern of Fig. 6 is defined as a (height/2) × (width/2) quincunx sampling pattern, which results in a total of 4 sub-images. Intermediate values of N and M in the generalized (height/N) × (width/M) pattern yield a set of N × M sub-images or, equivalently, (N × M)/4 sets of four sub-images. Since this arrangement involves quincunx sampling and quadrant-image based interleaving, it will be referred to hereinafter as "quadrant-quincunx".
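The splitting of one view's quincunx samples into "even" and "odd" sub-images can be sketched as follows. As assumptions for illustration only: the view's quincunx positions are taken to be those where (r + c) is even, the "even" sub-image collects samples from even rows and the "odd" sub-image those from odd rows, and the input is the full-resolution view from which the quincunx positions are picked.

```python
# Sketch of the quadrant-quincunx repacking of Fig. 6 for one view:
# quincunx samples from even rows form one (h/2) x (w/2) sub-image,
# those from odd rows form the other.

def quadrant_quincunx(view):
    h, w = len(view), len(view[0])
    # even rows carry quincunx samples at even columns...
    even = [[view[r][c] for c in range(0, w, 2)] for r in range(0, h, 2)]
    # ...odd rows carry quincunx samples at odd columns
    odd = [[view[r][c] for c in range(1, w, 2)] for r in range(1, h, 2)]
    return even, odd

view = [[10, 11, 12, 13],
        [20, 21, 22, 23],
        [30, 31, 32, 33],
        [40, 41, 42, 43]]
even_sub, odd_sub = quadrant_quincunx(view)
```

Within each sub-image the retained samples lie on a regular rectangular grid, which is exactly why conventional raster-grid algorithms can be applied to the sub-images directly.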
An alternative arrangement of quincunx samples is referred to as "side-by-side-quincunx". The samples are quincunx based and are interleaved side by side. For 3D stereoscopic content, the views are packed side by side, and rows of "even" quincunx samples alternate with rows of "odd" quincunx samples. This arrangement, a possible quincunx sample arrangement (70), is shown in Fig. 7. Both the "quadrant-quincunx" and the "side-by-side-quincunx" arrangements differ from the "CB" arrangement in that the views are packed separately rather than interleaved with each other. The "quadrant-quincunx" arrangement differs from the "side-by-side-quincunx" arrangement in that all of its samples are aligned in the vertical direction when considered on a full resolution grid. In contrast, the samples in a "side-by-side-quincunx" arrangement are vertically aligned only in every other row: vertically neighboring samples are horizontally offset by one pixel and are therefore slightly misaligned.
The sampling and interleaving need not be limited to a quincunx pattern. Different arrangements are also possible. One such arrangement (80), with column sampling and column interleaving, is depicted in Fig. 8. Another arrangement (90), referred to as "side-by-side-column", employing column sampling and side-by-side interleaving, is shown in Fig. 9. An arrangement (100), also referred to as "top-bottom-row" or "over-under", using row sampling and top-bottom interleaving, is shown in Fig. 10. Finally, an arrangement (110), also referred to as "row interleaving" or "row-to-row", applying row sampling and row interleaving, is shown in Fig. 11.
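Two of these column-based arrangements can be sketched as follows, assuming each view has already been column-subsampled by a factor of two: column interleaving (as in Fig. 8) alternates the retained columns of the two views, while side-by-side-column packing (as in Fig. 9) places the subsampled views next to each other. Names and data are illustrative only.

```python
# Column interleaving vs. side-by-side-column packing of a stereo pair
# whose views are assumed already column-subsampled by 2.

def column_interleave(left, right):
    """Alternate retained columns: L0 R0 L1 R1 ..."""
    out = []
    for lrow, rrow in zip(left, right):
        row = []
        for lv, rv in zip(lrow, rrow):
            row += [lv, rv]
        out.append(row)
    return out

def side_by_side_column(left, right):
    """Pack the subsampled views next to each other: L0 L1 R0 R1 ..."""
    return [lrow + rrow for lrow, rrow in zip(left, right)]

L = [["L0", "L1"], ["L2", "L3"]]   # retained left-view columns
R = [["R0", "R1"], ["R2", "R3"]]   # retained right-view columns
ci = column_interleave(L, R)
sbs = side_by_side_column(L, R)
```

Both composites hold the same samples; what differs, as the later deblocking discussion shows, is where those samples sit relative to each other and to block edges.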
All of the sampling arrangements mentioned above produce signal characteristics that differ considerably from those of the original image or video data. Although 3D stereoscopic data has been discussed above, similar conclusions can be drawn when considering, for example, interlaced signals, which are characterized by row sampling and row interleaving in the temporal domain. Another application is the processing of single-view data sampled from an original source (e.g., down-sampled by a factor of two). Algorithms such as denoising, deblocking, enhancement, general low-pass or high-pass filtering, Wiener filters, etc., must therefore be modified to ensure the desired results.
The case of deblocking for quincunx sampled pictures will now be discussed.
Sample interleaved video data (such as the CB pictures of Fig. 5) may be compressed using a variety of video codecs [see reference 3, which is incorporated herein by reference in its entirety], including the ITU-T H.264/ISO MPEG-4 AVC video coding standard [see reference 4, which is incorporated herein by reference in its entirety], MPEG-2, the SMPTE VC-1 video coding standard [see reference 5, which is incorporated herein by reference in its entirety], and the like. Many video coding standards include deblocking and other filtering mechanisms, which may be "in-loop" or "out-of-loop". In-loop processing generally refers to processing applied to samples that are subsequently used to predict other samples; the process must be mirrored at both the encoder and the decoder so that the final result is the same. Out-of-loop processing refers to processing applied to samples before they are encoded, or after they have left the coding chain and are being sent to a display or some other subsequent processing stage. The former is commonly referred to as pre-processing, while the latter is referred to as post-processing. The intermediate "processing" step includes any operation that may affect the samples (including encoding and reconstruction). The in-loop deblocking stage can be critical for many applications because it can remove blocking artifacts, which result from the block-based prediction (e.g., intra prediction or block-based hybrid motion-compensated prediction) and the block-based transform and quantization processes underlying most coding standards.
In inter prediction, block motion compensation is used to predict each block in a current picture from rectangular blocks in one or more reference pictures. The prediction block need not be co-located with the current block, and may be derived using prediction models including translational, affine and perspective models. The reference pictures may come from the past or the future in display order, and the prediction blocks may even originate from the same picture. In intra prediction, a block in the current picture is predicted only from already reconstructed or decoded samples of the current picture. The prediction may be as simple as, e.g., copying available sample values from above, or may use a prediction model to predict the current block from samples in, e.g., a distant region of the same picture.
Intra prediction and inter prediction form the backbone of most practical and commercial video codecs. Under coarse quantization of the prediction residual, the block-based nature of the prediction process can produce visual artifacts along block boundaries, known as blocking artifacts. Blocking artifacts appear as visual discontinuities along horizontal and/or vertical block boundaries. Apart from the visual degradation, these artifacts may also reduce the prediction efficiency of block-based motion compensation. Video codecs may therefore employ in-loop deblocking to suppress these artifacts and improve coding efficiency. Deblocking may also be part of a post-processing system, which may itself be located in a display. Typically, today's deblocking algorithms use adaptive filters that suppress visual discontinuities between samples at block boundaries. Note that, in addition to blocking artifacts, the coding process that applies quantization to the prediction residual gives rise to several other artifact types, including ringing, noise, and blurring artifacts, among others. The discussion of the deblocking problem thus extends to the application of deblurring, denoising, and deringing algorithms to sample multiplexed video content.
An example block boundary is shown in Fig. 12. Let p_i denote the pixels to the left of the block boundary (120) and q_i the pixels to the right of the block boundary (120). Although the depicted boundary is vertical, similar considerations apply to horizontal block boundaries (130). The H.264/MPEG-4 AVC deblocking algorithm takes as input the eight pixels that span the block boundary and updates them based on a number of parameters, including: coding mode, coding parameters, intra prediction direction, energy of the residual and motion parameters of neighboring blocks, filtering strength, quantization parameters used for coding a block, input constraints such as slice structure, component type and chroma sub-sampling structure, average chroma value, transform type, coding type (e.g., field coding or frame coding), and, in the case where a constituent picture is coded in an enhancement layer, information from a base layer, etc. The values p'_i and q'_i after deblocking are functions of the input values from both sides of the block boundary (the objective is to suppress the discontinuity: the boundary p values are therefore mixed with the adjacent q values). Deblocking can also be performed in the frequency domain. One such technique is based on the principles of overcomplete denoising [see reference 11, the entire contents of which are incorporated herein by reference]. In the overcomplete denoising method, samples are transformed by an overcomplete (redundant) transform and then adjusted using a predefined threshold. The thresholded transform coefficients are then inverse transformed and combined (since they yield multiple redundant/overcomplete estimates) to produce the final denoised samples. To perform deblocking, these methods have been adapted to consider only values close to block boundaries.
These methods may also modify their thresholds according to neighboring block patterns, QP, motion of the block, and original and quantized transform coefficients, etc.
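A deliberately simplified spatial deblocking sketch in the spirit of Fig. 12 follows. It only illustrates the core idea that the deblocked values p'_i and q'_i mix p and q values across the edge; the `strength` parameter is a hypothetical stand-in for the mode/QP-driven adaptation described above, and this is not the actual H.264/MPEG-4 AVC filter.

```python
# Toy one-row deblocker: p = [p3, p2, p1, p0] sits left of a vertical
# block edge, q = [q0, q1, q2, q3] sits right of it. Only the two
# samples nearest the edge are blended toward each other.

def deblock_row(p, q, strength=0.5):
    """Return (p', q') with the boundary discontinuity reduced."""
    p_new, q_new = p[:], q[:]
    delta = (q[0] - p[-1]) * strength / 2.0
    p_new[-1] = p[-1] + delta          # p0 moves toward q0
    q_new[0] = q[0] - delta            # q0 moves toward p0
    return p_new, q_new

p_out, q_out = deblock_row([10, 10, 10, 10], [30, 30, 30, 30])
```

With `strength=0.5` the step of 20 across the edge shrinks to 10, while samples away from the edge are untouched; a real codec would additionally gate the filter on the parameters listed above.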
Deblocking, when performed as an in-loop processing step, can benefit both compression efficiency and visual quality; when applied as a post-processing step, it benefits visual quality. However, applying traditional deblocking algorithms (such as those mentioned above) directly to sample interleaved video content can lead to undesirable results. Consider, for example, an H.264/MPEG-4 AVC deblocker operating on the pixels of Fig. 12. For the case of sampled and interleaved 3D video content, Fig. 13 shows that these pixels now belong to different views. The H.264/AVC deblocker will compute a new deblocked value from all the p values and q values. If the stereo disparity between the views is high, this process can lead to highly undesirable cross-view contamination: pixels from view 0 (left) may contaminate pixels from view 1 (right), and vice versa. Similar conclusions can be drawn for other conventional deblocking schemes that do not distinguish the pixels of each view, such as the one used by the SMPTE VC-1 codec or the overcomplete techniques. Samples from one view will be contaminated if care is not taken to exclude samples of the other view when deblocking. For example, aliasing may be created, and thresholding may null true content, leaving artifacts. Although these problems significantly affect 1 × 1 quincunx sampled and packed 3D data (CB pictures), a similar problem occurs when considering general (height/N) × (width/M) quincunx sampled and packed 3D data.
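The cross-view contamination just described can be avoided by filtering each view's samples against their own neighbors only. The sketch below assumes a single checkerboard-interleaved row (samples alternate between the two views) and uses an illustrative 3-tap smoother in place of a real deblocking filter; all names are hypothetical.

```python
# View-aware filtering of one checkerboard-interleaved row: even-indexed
# samples belong to one view, odd-indexed samples to the other, and each
# view is filtered independently, so no cross-view mixing occurs.

def smooth(x):
    """3-tap averaging filter with edge clamping (illustrative)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def view_aware_filter_row(row):
    view0 = row[0::2]                  # samples of one view
    view1 = row[1::2]                  # samples of the other view
    out = row[:]
    out[0::2] = smooth(view0)
    out[1::2] = smooth(view1)
    return out

# A flat view-0 signal (all 10.0) stays flat even though the interleaved
# view-1 samples (all 90.0) are very different, mimicking high disparity.
row = [10.0, 90.0, 10.0, 90.0, 10.0, 90.0]
filtered = view_aware_filter_row(row)
```

A view-agnostic filter applied to the same row would pull the 10.0 samples toward 90.0 and vice versa, which is precisely the contamination the disclosed methods seek to prevent.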
Without loss of generality, consider now an example of (height/2) × (width/2) quincunx sampled and packed 3D data. The direct application of deblocking to the "even" and "odd" sub-images (140, 150) of the same view is shown in Fig. 14. In contrast to the 1 × 1 3D data packing of Fig. 5, this situation looks very different in the sub-image domain. All samples in a sub-image belong to the same view, so existing algorithms can be applied directly without considering samples of different views. However, if the full resolution grid also shown in Fig. 14 is considered, the process is not as simple. Since deblocking is done independently, the imaginary block boundaries (shown in dashed lines) of one sub-image (160) of a view do not coincide with the imaginary block boundaries of the other sub-image (170) of the same view. Furthermore, since these regions may not be coded in a similar manner, the deblocking strength may differ for each sub-image. The inconsistent deblocking strength may give rise to seesaw artifacts along the imaginary boundaries in the full resolution image of Fig. 14.
Similar problems also affect deblocking applied to "side-by-side-quincunx" data, as well as the "quadrant-quincunx" data shown in fig. 15; for vertical edges the situation is similar, and seesaw-type artifacts may again be created. However, when considering deblocking of horizontal edges, there is a different and more serious problem, depicted in fig. 16. Each deblocked "row" (180) comprises samples that are shifted by one pixel in the horizontal direction from one row to the next. Filtering these samples without regard to their spatial position in the full resolution grid results in visual degradation.
These problems also affect data that are sampled and interleaved "side-by-side-column", or data that are interleaved column-by-column only. The sample positions and full resolution positions for deblocking, or general filtering, are shown in fig. 17. Although processing in the vertical direction remains unchanged when considering the full resolution content, this does not apply to the horizontal direction, where the data are now far apart when considering the full resolution grid (190). In addition, it can be observed that the distances of the data from the actual block edge (200) differ in the horizontal direction. A similar problem is observed for the case of "top-bottom-row" sampling and interleaving, as depicted in fig. 18. Here, the samples in the horizontal direction correspond to the positions of the full resolution data; in contrast, samples in the vertical direction have larger distances and are also not equidistant from the block edge. In both cases, the sampled data have a different aspect ratio than the original data: 2:1 or 1:2 instead of 1:1. Applying existing H.264 deblocking algorithms directly to such content can result in poor compression efficiency and subjective quality. It should also be noted that the problem of fig. 18 is very similar to that encountered when processing the top or bottom fields of an interlaced picture. Better performance is possible when the deblocking, and in general the processing, algorithm takes into account the position of the samples in the full resolution grid.
As also mentioned above, deblocking is only one of the possible processing methods that can be applied to sample multiplexed data. Denoising methods designed for full resolution data will also encounter problems when applied to sampled data. The above-mentioned overcomplete denoising method is one such example. However, there are other equally important operations whose efficiency is reduced when they are applied in a straightforward manner to sampled data. For example, consider the problem of fractional pixel interpolation of CB pictures. Fractional pixel interpolation is an in-loop processing step that underlies high-performance motion compensated prediction. Although the samples are represented with pixel precision, real motion typically has fractional pixel components. Fractional pixel interpolation improves prediction efficiency by estimating these values. The following simple example illustrates a problem that may arise. Assume that there are two adjacent pixels a and b in the CB picture of fig. 5. Obviously, each pixel belongs to a different view. However, half-pixel interpolation of missing samples located between a and b will take into account two pixels, resulting in cross-view contamination.
Several embodiments of the present disclosure relate to various methods for processing sample interleaved image or video data.
(a) According to a first embodiment, full demultiplexing and interpolation of the missing samples is provided, followed by processing using available existing methods originally developed for full resolution rectangularly sampled image or video data, and resampling with the original sampling pattern to recover the final processed samples. During the sampling process, additional filtering may be performed. For example, instead of taking the value of a sample at a sampling position during the resampling stage, dynamic (on-the-fly) filtering may be performed such that the retained value is some (e.g., linear) combination (filtering) of the sample at the sampling position with other neighboring samples that may not necessarily belong to sampling positions. An overview of this process is shown in fig. 19A. The sampled video input (210) is demultiplexed (220) to generate component representations (e.g., separate left and right pictures). Each component representation is then interpolated (230), followed by standard processing (240) of each representation. After processing, the component representations are sampled (250) and multiplexed or packed (260) together. In this disclosure, the terms multiplexing and packing are used interchangeably. A diagram of this process for the case of deblocking (as the processing algorithm), sampling (e.g., quincunx), and multiplexing is shown in fig. 19B. As shown in fig. 19B, the interpolation (270, 280) in each component view or layer is independent of the others.
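The demultiplex, interpolate, process, resample, repack chain of fig. 19A can be sketched end to end on a tiny checkerboard picture. This is a minimal sketch, not the patent's implementation: the quincunx parity rule and the simple horizontal-neighbour gap filling are illustrative assumptions, and the "processing" step is omitted (identity) for brevity.

```python
# Minimal sketch of the demultiplex -> interpolate -> (process) -> resample ->
# repack chain of fig. 19A for a checkerboard (CB) interleaved picture.
# Assumption: view = (row + col) % 2; gaps are filled by bilinear
# (horizontal-neighbour) interpolation within the same view.

def demux(cb):
    """Split a CB picture into two full-size view grids, None at missing sites."""
    h, w = len(cb), len(cb[0])
    views = [[[None] * w for _ in range(h)] for _ in range(2)]
    for r in range(h):
        for c in range(w):
            views[(r + c) % 2][r][c] = cb[r][c]
    return views

def interpolate(view):
    """Fill missing samples with the average of available horizontal neighbours."""
    h, w = len(view), len(view[0])
    out = [row[:] for row in view]
    for r in range(h):
        for c in range(w):
            if out[r][c] is None:
                nbrs = [view[r][c + d] for d in (-1, 1)
                        if 0 <= c + d < w and view[r][c + d] is not None]
                out[r][c] = sum(nbrs) / len(nbrs)
    return out

def repack(views):
    """Resample each processed view on its original quincunx lattice and pack."""
    h, w = len(views[0]), len(views[0][0])
    return [[views[(r + c) % 2][r][c] for c in range(w)] for r in range(h)]

cb = [[10, 20, 10, 20],
      [20, 10, 20, 10]]
full = [interpolate(v) for v in demux(cb)]       # per-view full resolution
assert repack(full) == cb                        # existing samples survive
```

With a real processing step (e.g., deblocking) inserted between `interpolate` and `repack`, the retained samples would generally change, which is the point of the pipeline.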
As shown in fig. 20, performance can be further improved by analyzing the input sampled picture (e.g., the "CB" picture) and optionally the interpolated full resolution content (e.g., left and right views). The information that can be extracted by the analysis modules (290, 300, 310) of fig. 20 includes, but is not limited to: (a) an estimate of stereo disparity, which may be used to control the contribution of samples from different views when processing samples of a current view. As an example, for low disparity it makes sense to consider samples from the other view, since any potential contamination is small. (b) Edge and frequency characteristics, which again may be used to guide the filtering process. For example, for highly textured areas, limited deblocking or no deblocking at all may be used. Estimation or knowledge of the horizontal and vertical correlations may also be useful in guiding the design of the interpolation filter (as in the interleaved-sampling cases of figs. 17 and 18). (c) Average luminance and chrominance values, which may also be used to direct the filtering. For example, high brightness may mask artifacts, and subsequent filtering operations can take this into account.
(b) Further embodiments of the present disclosure relate to processing sampled inputs dynamically, without the need to fully resample, demultiplex, or interpolate the sampled inputs into full resolution image or video data. These embodiments modify the processing algorithm so that it is aware of the position of each sample of the sampled picture in the full resolution grid. For 3D stereoscopic content, this means that the processor distinguishes samples in different views and/or locations. In the case of interlaced content, the processor exploits the fact that vertical neighbors in the top or bottom field have twice the spatial distance of horizontal neighbors. For interleaved image or video data, the processor distinguishes data belonging to one category from data belonging to a different category, and may choose to avoid contamination between these categories depending on the purpose of the processing algorithm. When sampling the data, the processor utilizes knowledge of the actual position of each sample of the sampled image or video data in the full resolution grid. It may be the case that adjacent samples in the full resolution grid are located at distant positions in the sampled picture, or vice versa; such knowledge may also be exploited. While in some embodiments full interpolation of the sampled input data is avoided, in other embodiments limited interpolation is allowed when the processing gain greatly exceeds the increase in computational and memory complexity. For example, in the case of deblocking, missing samples adjacent to block boundaries may be interpolated and considered at the processing stage. Some specific embodiments are considered below.
Consider the case of deblocking quincunx sampled and interleaved 3D stereo data (CB). For proper deblocking, the problem of cross-view contamination should be addressed: the deblocked samples of one view should not be contaminated by samples of the other view. As with fig. 20, this restriction can be relaxed by requiring that samples from the same view be assigned a greater weight (when combined) than samples of the other view, or that a similarity measure between the left and right view samples be calculated before they are considered for deblocking. Since coarse quantization may actually have contaminated both views before the deblocking process, it can even make sense to consider all samples, with different weights. These weights may be modified based on coding parameters and statistics such as quantization parameters, percentage of non-zero transform coefficients, coding mode, location of blocks, type of component, and chroma sub-sampling type.
(b1) Fig. 21 shows an example of this embodiment in the case of deblocking the "CB" data. The simplest solution is to perform deblocking by treating p2, p0, q1 and q3 as consecutive pixels that span the boundary between the p-values and the q-values. The remaining pixels p3, p1, q0 and q2 are processed separately in a similar manner. Although this first approach avoids cross-view contamination, deblocking may not be as efficient because the number of samples is now smaller, and at the same time the samples have different distances from the block boundary: for example, sample p3 is further away from the boundary than sample q2. For better results this should be taken into account during filtering. When not interpolating, this can be addressed by applying different weights to each sample during filtering or deblocking. For example, for samples p2, p0, q2 and q3 the following weights, depending on their distance from the block boundary (considering each pair), may be employed: 7/12, 3/4, 1/4 and 5/12. Improved performance may be obtained by interpolation that dynamically generates all or some of the missing samples (e.g., the missing samples closer to the edge) and uses them in conjunction with the existing samples when performing deblocking. In the case of fig. 21, the missing samples can be interpolated as follows:
In this example, a simple bilinear filter has been used to interpolate the missing samples, without loss of generality. It should be noted that when interpolating a missing sample in order to perform deblocking at a block boundary, only samples belonging to the same side of the block boundary as the position to be interpolated may be considered. Now, with a total of eight available samples, the deblocking process is performed on four consecutive samples on each side of the boundary, for example with the algorithm used in H.264. The samples at the interpolated positions are not updated by the deblocking. The values p'3, p'1, q'0 and q'2 are used to update only the samples p3, p1, q0 and q2 that belong to the current view.
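The idea of weighting same-view boundary samples by their distance from the block edge can be sketched generically. This is an illustrative sketch under stated assumptions, not the exact weights given in the text: weights are taken inversely proportional to the distance from the edge, and each sample is pulled halfway toward the resulting cross-edge blend.

```python
# Illustrative distance-weighted boundary filter for same-view CB samples
# (a sketch; the weights here are an assumption, not the text's 7/12, 3/4,
# 1/4, 5/12 values). Each surviving same-view sample is blended with a
# cross-edge average whose weights fall off with distance from the boundary.

def deblock_same_view(samples, positions):
    """samples[i] sits at signed distance positions[i] from the block edge
    (negative = p side, positive = q side). Returns the filtered values."""
    weights = [1.0 / abs(d) for d in positions]
    blended = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    # Pull each sample halfway toward the cross-edge blend.
    return [0.5 * s + 0.5 * blended for s in samples]

# Same-view samples p2, p0, q1, q3 at distances 3, 1, 2, 4 from the edge:
vals = deblock_same_view([100, 90, 60, 50], [-3, -1, 2, 4])
assert max(vals) - min(vals) < 100 - 50   # the step across the edge shrinks
```

Only same-view samples enter the blend, so cross-view contamination is avoided while the step across the boundary is still reduced.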
Various embodiments may use more complex interpolation methods to estimate the missing samples, which may include edge adaptive filtering [see reference 8, the entire contents of which are incorporated herein by reference], separable and non-separable filters with longer support, or prediction from future or past pictures, among others. In contrast to conventional interpolation and demultiplexing, care must be taken to avoid considering samples from one side of the boundary when interpolating samples on the other side of the boundary. Complexity may be reduced by interpolating only the one or two samples closest to the block boundary.
(b2) Various embodiments address deblocking of generic (height/N)×(width/M) quincunx sampled and packed 3D data. Without loss of generality, consider the example of (height/2)×(width/2) quincunx sampled and packed 3D data of fig. 14 ("quadrant-quincunx"). For optimal deblocking performance, deblocking of a given block edge in the "even" sub-image is combined with deblocking of the same edge in the "odd" sub-image.
One possible embodiment of this strategy is depicted in fig. 22. For reference, the position of the samples of each sub-image in the original full resolution grid is shown. Before deblocking the "even" sub-image samples, the "A" samples are interpolated using the adjacent "even" and "odd" samples of the same view. Similarly, prior to deblocking the "odd" sub-image samples, the "B" samples are interpolated using adjacent "odd" and "even" samples of the same view. The interpolation can be as simple as bilinear, or as complex as content- and edge-adaptive, or even inpainting. Deblocking of the "even" sub-image then takes place by also taking into account the value of sample "A" and by adjusting the deblocking filter values to account for the fact that the distance of a sample from a block boundary in a sub-image is not necessarily equal to the corresponding distance in the full resolution grid.
For example, let x1 and x2 be two "even" sub-image samples located on the block boundary, and let A1 denote the interpolated sample between these two samples that also lies on the same horizontal line. Assume that it is desired to deblock the two samples by simply setting them equal to their average, i.e., each pixel gets a weight of one half: x'1 = x'2 = (x1 + x2)/2. The same applies to "odd" sub-image samples that lie on the same boundary, above and below the same horizontal line. This may produce seesaw artifacts, as mentioned earlier. A more appropriate deblocking process modifies the filter weights to account for the full resolution distances, and preferably also considers the "A" samples or "B" samples for the "even" or "odd" sub-image samples, respectively. The final result is: x'1 = x'2 = (2×A1 + x1 + 3×x2)/6.
(b3) Further embodiments address filtering (e.g., deblocking) for the "side-by-side-column" arrangement previously depicted in fig. 17. As discussed previously, the samples in the sample-interleaved image have a different aspect ratio: the horizontal dimension is sub-sampled by a factor of 2, while all samples are retained in the vertical direction. This observation leads to modifying the filter lengths so that they take this difference into account. Assuming that the original content was obtained using progressive scanning before sampling, filtering in the horizontal direction can consider half the number of consecutive samples compared to filtering in the vertical direction. Otherwise, too much blurring may be introduced and a large amount of high frequency information may be lost in the horizontal direction. Furthermore, as previously indicated, the distance of a sample from a full resolution block edge is not the same on either side of the edge. Let x1 and x2 be two adjacent horizontal samples located on the block boundary of the sample-interleaved image (left side of fig. 17). For the case of deblocking, for example, sample x2 must be given additional weight. Thus, for a simple bilinear filter, one can derive x'1 = x'2 = (x1 + 2×x2)/3. Similar to the previous paragraph, the missing full resolution sample A between these two samples may be interpolated to further improve the filtering efficiency. It should also be noted that when samples of the other view are processed, it will be x1 that is assigned the larger weight. By simply replacing "horizontal" with "vertical" (and vice versa) in the above discussion, the embodiments described in this paragraph also apply to the "top-bottom-row" case depicted in fig. 18. Note that in the case of, for example, interlaced content, it would be desirable to use the original deblocking strategy without reducing the number of filter taps.
Alternatively, for the interlaced case, it may be desirable to limit the filtering range equally in both the vertical and horizontal directions. In general, consideration should be given not only to the sampling and multiplexing applied to the input content, but also to how the input content was originally introduced (e.g., progressive or interleaved, etc.).
The methods described so far require some limited or more extensive interpolation of the missing samples to increase the efficiency of the deblocking processing step; when interpolation is required or desired, a separable processing step may be applied first in one direction and then in the other (a "separable" operation).
(b4) However, if the direction of the processing step (e.g., deblocking) is modified to be other than vertical or horizontal (e.g., diagonal), interpolation of the missing samples can be avoided. A visualization for the "CB" arrangement is shown in fig. 23. To avoid contamination from samples of the other view and to avoid interpolation of missing samples, deblocking may be applied along the diagonal samples p10, p21, p32, p43, p54, p65, p76 and p87 (e.g., using the H.264 algorithm, which requires four samples on each side of the block edge), or along the diagonal samples p70, p61, p52, p43, p34, p25, p16 and p07, or alternatively first to the first set of diagonal samples and then to the second set, to avoid biasing the deblocked samples in a single direction. As described above, the decision as to which direction to apply filtering and how many samples to use for interpolation may be fixed or may be signaled. Such a decision can be facilitated by a pre-analysis that determines the orientation and spatial characteristics of the source signal. Furthermore, the present embodiment for processing (e.g., deblocking) sampled data along alternative directions may be applicable to any operation, including the processing of samples that have been sampled according to their original spatial position and resolution.
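The property that makes diagonal processing attractive can be checked directly: along a main diagonal of a quincunx lattice, every sample has the same parity, i.e., belongs to one view. The sketch below assumes the indexing convention p[row][col] with view = (row + col) % 2 (an assumption consistent with fig. 23's pXY naming, not a quote of the patent).

```python
# Sketch of selecting a one-view diagonal run (p10, p21, ..., fig. 23 style)
# so that deblocking never mixes views and needs no interpolation.
# Assumption: quincunx parity rule view = (row + col) % 2.

def diagonal_run(start_row, start_col, length, grid):
    """Samples along the main diagonal starting at (start_row, start_col)."""
    return [grid[start_row + k][start_col + k] for k in range(length)]

grid = [[(r, c) for c in range(8)] for r in range(8)]
run = diagonal_run(1, 0, 7, grid)   # p10, p21, ..., p76
# Every sample on this diagonal has the same quincunx parity (one view):
assert len({(r + c) % 2 for (r, c) in run}) == 1
```

A standard 1-D deblocking filter can then be run over such a diagonal without any risk of cross-view contamination.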
(b5) Various embodiments contemplate interpolation of sample-interleaved data for motion compensated prediction. This embodiment is also based on considering the distances of samples in the full resolution grid and the category (e.g., view) to which the current sample belongs. Sometimes, a block to be predicted contains data belonging to more than one category. For certain motion parameters, the prediction block has data that matches the category of the data in the current block; for other motion parameters, this will not hold. For example, reference may be made to the CB picture shown in fig. 24: in case the current block is to be matched with an interpolated block in the reference picture buffer, it can be observed in fig. 24 that there is a view-order correspondence only every third pixel, so that pixels from view 0 are predicted from pixels of view 0, and similarly for pixels of view 1. When the data are not properly aligned, this results in a loss of motion compensation efficiency due to cross-category contamination. In this embodiment, such a problem is solved by interpolating the missing data of each category while considering only samples of the same category. Then, during the formation of the prediction block, existing data or interpolated data are selected so that the data categories of the current block and the prediction block are accurately aligned.
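The class-aligned prediction idea can be sketched for one pixel. This is an illustrative sketch under assumptions (quincunx parity rule, integer motion, and a simple two-tap same-view interpolator), not the patent's interpolation filter: when the motion vector pairs the current pixel with a reference sample of the other view, a value interpolated from same-view neighbours is substituted.

```python
# Sketch of class-aligned motion-compensated prediction for CB content.
# Assumptions: view = (row + col) % 2; integer-pel motion; missing same-view
# values are estimated from the two same-view horizontal neighbours.

def parity(r, c):
    return (r + c) % 2

def predict(ref, r, c, mv_r, mv_c):
    rr, cc = r + mv_r, c + mv_c
    if parity(r, c) == parity(rr, cc):
        return ref[rr][cc]                       # classes already aligned
    # Otherwise interpolate from the same-view horizontal neighbours.
    return (ref[rr][cc - 1] + ref[rr][cc + 1]) / 2

# Reference where view 0 carries value 10 and view 1 carries value 99:
ref = [[10 if (r + c) % 2 == 0 else 99 for c in range(6)] for r in range(4)]
# mv = (0, 1) misaligns the parity, yet the prediction for this view-1
# pixel is still built purely from view-1 samples:
assert predict(ref, 1, 2, 0, 1) == 99
```

Without the parity check, the same motion vector would fetch the view-0 value 10 as the predictor for a view-1 pixel, which is the cross-category contamination described above.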
(c) In practical systems, processing is performed before or after demultiplexing or downsampling. In the following embodiments, algorithms that combine processing and demultiplexing, or processing and sampling, in a single joint step will be shown. Demultiplexing includes interpolating missing samples from neighboring samples, preferably of the same category. Embodiments of the present disclosure are directed to modifying interpolation filters or filter coefficients so that they simultaneously perform the function of the processing step (e.g., deblocking). Similarly, when sampling data, a delta function or low-pass filter may be used to account for aliasing. Embodiments involving joint deblocking and demultiplexing will now be described.
Deblocking is a sample processing stage that is typically implemented as separate processing modules (e.g., H.264/MPEG-4AVC and VC-1). Deblocking may be applied before a demultiplexing operation to obtain full resolution samples or after a sampling process (e.g., quincunx) to obtain data of each category (e.g., forming CB pictures) from full resolution data. Deblocking can be combined with demultiplexing or with sampling. Although the demultiplexing or sampling operations may be quite complex when best quality is targeted, applicants will show examples illustrating how deblocking may be combined with these operations. These combinations benefit from reduced computation and memory access compared to separate applications of these operations. Furthermore, since the resampling operation utilizes knowledge about block boundaries, joint application of deblocking and resampling (to or from the sampled signal) may also benefit quality. Joint deblocking and demultiplexing involves modifying the demultiplexing/interpolation filter as samples near or at block boundaries are processed. The filter support can also be extended to account for more samples from the other side of the boundary, all with the goal of minimizing signal transition inconsistencies across block boundaries.
(c1) An embodiment for the case of a bilinear interpolation filter and a CB picture is shown in fig. 24. The samples highlighted in dark color are already available, and it is desired to interpolate the missing samples and simultaneously deblock the signal. Conventional interpolation using a bilinear filter estimates the sample at position (3, 3) as the average of its four available neighbors. The existing value of the sample at (3, 3) corresponds to the other layer/view (view 1), while the four averaged sample values belong to view 0. If deblocking is to be performed in the same single step, the location of the interpolated sample may be taken into account. Since it is close (in this case, adjacent) to the block boundary, the filtering operation is modified to perform interpolation and deblocking jointly, with a weighting factor α biasing the result appropriately. The same processing may also be applied to the other view, as well as to pixels of the same view on the other side of the block boundary.
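A hedged sketch of this joint operation: the plain bilinear estimate is blended with the nearest same-view sample across the block boundary, controlled by a factor α. The specific blending form and the value of α are assumptions for illustration; the text's exact modified-filter expression is not reproduced here.

```python
# Sketch of joint interpolation and deblocking near a vertical block boundary.
# Assumption: the bilinear estimate from four same-view neighbours is blended
# (factor alpha, illustrative) with the nearest same-view sample across the
# boundary, smoothing the transition in the same step as the interpolation.

def joint_interp_deblock(left_up, left_down, right_up, right_down,
                         across_boundary, alpha=0.25):
    bilinear = (left_up + left_down + right_up + right_down) / 4.0
    return (1.0 - alpha) * bilinear + alpha * across_boundary

# Four same-view neighbours on one side, one same-view sample across the edge:
val = joint_interp_deblock(80, 80, 80, 80, 40)
assert 40 < val < 80   # the estimate is biased toward the cross-edge sample
```

With α = 0 the operation degenerates to plain bilinear interpolation; increasing α strengthens the deblocking effect at the boundary.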
(c2) As previously shown in figs. 17 and 18, similar embodiments may also be defined for non-quincunx sampled content, such as "side-by-side-column" and "top-bottom-row". Assume that it is desired to perform deblocking and interpolation of the missing sample A of fig. 17 at the same time. Using the previously introduced notation and assuming a simple bilinear filter, the interpolated sample is derived as A = (x1 + x2)/2. The sample after the combined processing can be derived accordingly; furthermore, α is imposed for exactly the same reasons as set forth in the preceding paragraph. The strategies described in these two paragraphs are also applicable to more complex interpolation schemes that may be combined with processing operations, such as edge adaptive filtering, Lanczos filtering, etc.
Joint deblocking and demultiplexing can be performed not only in the pixel domain but also in the frequency domain. A frequency-based approach to deblocking is mentioned in reference 11, the entire contents of which are incorporated herein by reference. The same overcomplete denoising principle has also been used previously to recover missing regions in images. Overcomplete denoising is based on applying multiple redundant shifted versions of a transform to the same data, thresholding the transform coefficients, inverse transforming the coefficients, and combining the resulting values to produce the final processed samples.
(c3) Reference will be made to fig. 25. For the case of the same view, two 4×4 overlapping "blocks" (320, 330) of existing samples of the same category are highlighted with different hatching, both marked with an "x". The purpose is to interpolate and deblock the missing samples simultaneously. First, a transform is applied to each overlapping block. The transform may be M×1, 1×N or diagonal, and may also be of size M×N, assuming different aspect ratios. In the depicted case, the transform is a 4×4 integer Discrete Cosine Transform (DCT) or a 4×4 Hadamard transform. Then, transform coefficients of size 2M×2N, corresponding to the same supported spatial region as that of the M×N transform, are estimated using, for example, the M×N transform coefficients. In the depicted case, the 4×4 transform coefficients are converted to their equivalent in an 8×8 transform (e.g., an integer DCT). Thus, the frequency information of an 8×8 block (340, 350) delimited by a 4×4 "block" (320, 330) of existing samples is estimated. It should be noted that the 4×4 blocks (e.g., 320) and 8×8 blocks (e.g., 340) have the same spatial extent in the full grid. A 4×4 block includes only existing samples of one class, while an 8×8 block includes both existing samples and missing samples of the same class. As explained below, these new transform coefficients may be adjusted, for example by thresholding, which sets a coefficient to 0 if it is above or below (typically below) some threshold. Thresholding is applied to perform denoising and/or deblocking, etc. The modified (e.g., thresholded) transform coefficients are then inverse transformed to obtain an estimate for each of the 8×8 samples (existing and missing). It should be noted that estimates of the existing samples are obtained for the case where it is desirable to update such values, to ensure more consistent visual quality and to avoid artifacts.
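The transform/threshold/inverse/combine cycle at the heart of this scheme can be shown in one dimension. This is a simplified sketch of the overcomplete principle (a 1-D 4-point Hadamard transform over overlapping windows), not the 2-D 4×4-to-8×8 DCT scheme of fig. 25.

```python
# Simplified 1-D sketch of the overcomplete shifted-transform idea: take
# overlapping 4-sample windows, transform (4-point Hadamard), zero small
# coefficients, inverse-transform, and average the per-sample estimates.
# Illustration of the principle only, not the patent's 2-D DCT scheme.

H = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]

def hadamard4(x):
    """Orthonormal 4-point Hadamard transform (self-inverse with 1/2 scaling)."""
    return [sum(H[i][j] * x[j] for j in range(4)) / 2.0 for i in range(4)]

def denoise(signal, thresh):
    sums = [0.0] * len(signal)
    counts = [0] * len(signal)
    for s in range(len(signal) - 3):           # every shifted window
        coeffs = hadamard4(signal[s:s + 4])
        kept = [c if abs(c) >= thresh else 0.0 for c in coeffs]
        block = hadamard4(kept)                # symmetric H: same op inverts
        for k in range(4):
            sums[s + k] += block[k]
            counts[s + k] += 1
    return [sums[i] / counts[i] for i in range(len(signal))]

out = denoise([10, 10, 11, 10, 10, 10], thresh=2.0)
assert all(abs(v - 10) < 1 for v in out)       # the small ripple is suppressed
```

In the scheme of fig. 25 the same cycle runs in 2-D with multiple threshold parameter sets, and the estimate kept for each pixel depends on its proximity to a block boundary.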
For example, given a sampling pattern, filtering optimized for display purposes may have been used to generate the existing samples, but this does not necessarily yield the same visual quality when recovering the missing samples in a full resolution grid. Each shifted overlapping "block" will yield a different estimate for each missing sample. These estimates may be combined in a linear or non-linear manner to produce a final estimate. To perform deblocking simultaneously, the transform coefficients are adjusted according to multiple sets of parameters, inverse transforms are applied, and an estimate for each pixel corresponding to each of the multiple parameter sets is retained for each shifted transform. This differs from conventional overcomplete denoising, which produces a single estimate for each sample from each shifted transform. Then, depending on whether the sample is close to a block boundary, the appropriate estimate corresponding to the appropriate parameter set is used. This estimate is then combined with the final estimates from the remaining overlapped/shifted transforms. The parameters used to adjust the coefficients depend on quantization parameters, coding mode, spatial characteristics, component and chroma sub-sampling type, and proximity to block boundaries, among other factors.
FIG. 26 shows a high level diagram that describes the operation on each block. Such an embodiment may also be applied to joint de-noising and de-multiplexing if the modification of the transform coefficients of the shifted block uses a single set of thresholding parameters (as opposed to the multiple sets used above) and applies processing to all samples (as opposed to applying processing primarily to block boundaries).
Deblocking and other pre-processing, post-processing, and in-loop processing operations (such as denoising) may be parameterized with the following parameters in embodiments in accordance with the present disclosure: (a) coding modes used to code blocks that overlap with processed samples, (b) motion of those blocks, which may be translational, affine, or some higher order motion model, (c) spatial statistics including variance, edge information, frequency characteristics that can be collected using a large number of transforms (e.g., fourier transforms, integer discrete cosine transforms, wavelet transforms, hadamard transforms, etc.), block size used to code samples, signaled filtering strength, filter direction, filter type (low-pass or high-pass, bilinear or Lanczos, etc.), quantization parameters used to code blocks, block prediction residuals, input constraints (such as slice structure, component type (luma or chroma) and chroma subsampling structure), transform type, and coding type (e.g., interlace or frame), etc.
Typically, it is desirable to signal instructions to the processing module. The instructions may refer to all of a picture or to a part/region of it. These instructions include, but are not limited to, the following operations: (a) turning processing operations on and off for designated regions; (b) controlling the type of filter used during processing (low- or high-pass, Lanczos, bilinear, filter orientation, etc.); (c) controlling the strength of the filtering (e.g., by a factor); (d) signaling the filtering strategy (directly processing the sampled picture, or upsampling the sampled picture to, e.g., full resolution views, processing, and then resampling, or even using the overcomplete denoising principle). Since the processing instructions may be signaled on a region basis, different processing algorithms may also be selected for different regions. The criteria for such a decision would be distortion-dependent: since the encoder has access to the original reference content, it can test all available methods and signal to the processing module the best method for each region.
For 3D stereoscopic content that employs side-by-side or top-bottom interleaving, the filtering operation of one view may be used to derive filtering parameters or thresholds that may direct the filtering operation of another view. This is possible since the views are no longer interleaved on a pixel basis, as with the quincunx interleaving pattern. A "quadrant-quincunx" picture with sufficiently large quadrants may also benefit from such a technique. Parameters such as deblocking filter strength, type, coefficients, and direction, as well as various other filtering thresholds or coefficients estimated for one view may be applied to another view. This applies to all ((a), (b) and (c)) embodiments described above.
The embodiments described above for processing sample interleaved data have several applications. Detailed descriptions of specific methods and configurations follow.
(a) Preprocessing. The content is typically pre-processed, for example before compression. The pre-processing may include spatio-temporal filtering involving block-based motion compensated prediction. Prior to encoding, full resolution content (e.g., left and right views) may be temporally filtered, then sampled and interleaved (e.g., in "CB" format). Conventionally, processing such as deblocking or spatial filtering would be performed on the full resolution content prior to the sampling and interleaving processes. In embodiments according to the present disclosure, the processing is performed after sampling and interleaving, using the methods described above. The benefit of doing so at this stage is that fewer samples have to be processed (for example, about half in the case of CB stereo data packing), resulting in a significant reduction in computational and memory complexity. A diagram of this embodiment is shown in fig. 27.
(b) In-loop processing/filtering for hybrid video compression. In-loop filtering may be applied during the encoding process after the pictures are generated and before the pictures are stored in the reference buffer. In h.264/MPEG-4AVC, pictures are generated by adding a prediction residual to a prediction picture. A picture may undergo deblocking before it is stored in a reference picture buffer so that it can be used as a motion compensated reference for a subsequently encoded picture. At this stage of the encoding process, the teachings of the present disclosure may be applied. Fig. 28 and 29 show diagrams of an encoder and decoder, respectively, of the method.
(c) In-loop processing/filtering for resolution scalable video coding systems. In such systems, a low resolution (sample interleaved) version of the full resolution content is encoded in a so-called Base Layer (BL) bitstream (e.g., using an H.264 or VC-1 codec). For example, CB pictures sampled and interleaved using a quincunx pattern retain only half the samples of each of the full-resolution left and right views. Thus, CB pictures can be compressed (e.g., using an H.264 codec) in the BL bitstream. However, the base layer is not limited to CB pictures or to 3D stereoscopic content. For example, a sampled version of a single view (only the left view or only the right view) may be compressed, or, in the case of general stereo content, one of the following possible arrangements may be used: "CB", "quadrant-quincunx", "side-by-side quincunx", "side-by-side", "line-by-line", or "top-bottom", etc. Such a system can be applied to transmit conventional single-view content in addition to 3D content. The base layer may comprise a sampled version of the single-view source. Full-resolution content may include single or multiple views.
To recover the lost resolution (samples), a prediction of the lost samples from the base layer bitstream can be performed, and then the prediction residual of the lost samples is encoded in a second bitstream comprising one or more Enhancement Layers (ELs) using a certain sampling pattern. Another option is to perform prediction of full resolution samples using samples encoded in the base layer bitstream. This is particularly useful when the filtering/processing stage is performed prior to the sampling operation that generates the BL samples. In these cases, the samples in the BL do not necessarily retain their original values as in the full resolution content. It should be noted that in a system with multiple enhancement layers, the prediction of samples encoded in one layer may depend not only on samples encoded in the base layer, but also on samples encoded in a higher priority enhancement layer. An EL is considered to have a higher priority than another second EL if it can be decoded without the need for information stored in the second EL bitstream. It is also desirable to perform in-loop filtering on the input samples and the output samples before and after EL is predicted from BL. The filtering process may include low-pass or high-pass filtering, deblocking, deinterleaving, deringing, denoising, and other enhancement operations, among others. As far as the base layer is concerned, these filtering operations are out-of-loop since the base layer operations are not affected. However, when considering the entire scalable coding system, these operations are in-loop and are needed to correctly decode the full resolution content. While the base layer encoder and decoder may optionally perform deblocking, denoising, deringing, etc., it is also possible and often desirable to perform loop processing (e.g., deblocking) at one or more enhancement layers since there may be coding artifacts that are not related to coding artifacts in the base layer. 
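The prediction-plus-residual mechanism can be sketched in one dimension (hypothetical helper names; the predictor here is a simple neighbour average standing in for whatever upsampling filter the system uses). The base layer carries the even-indexed samples; the enhancement layer carries only the residual of the predicted odd-indexed samples, and the decoder reproduces the same prediction to reconstruct the full-resolution row:

```python
import numpy as np

def predict_missing(bl_row):
    """Predict the dropped (odd-indexed) samples from the base-layer
    (even-indexed) samples by averaging horizontal neighbours."""
    right = np.roll(bl_row, -1)
    right[-1] = bl_row[-1]            # clamp at the right edge
    return 0.5 * (bl_row + right)

def encode_el(full_row, bl_row):
    """Enhancement layer carries only the prediction residual."""
    return full_row[1::2] - predict_missing(bl_row)

def decode_el(bl_row, el_residual):
    """Reconstruct full resolution from BL samples plus EL residual."""
    out = np.empty(2 * bl_row.size)
    out[0::2] = bl_row
    out[1::2] = predict_missing(bl_row) + el_residual
    return out
```

Because encoder and decoder form the identical prediction, the reconstruction is exact whenever the residual is transmitted losslessly.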
The scalable coding strategy described above can be implemented using the scalable video coding (annex G) or multi-view video coding (annex H) extensions of the h.264/MPEG-4AVC video coding standard. The layered strategy can also be implemented using the basic (annex A) specification of H.264/AVC, MPEG-2, or VC-1, etc., by defining enhancement layer frames that are specified as being disposable (not used for motion compensated prediction), which is done for temporal scalability purposes.
Processing or filtering algorithms such as deblocking include two separate steps: (a) derivation of parameters for guiding the filtering process; and (b) applying a treatment to the sample using the parameter derived in the previous step. In general, parameters used for processing at the enhancement layer may be highly correlated with those in the base layer. Thus, operational efficiency benefits if the base layer information is (re) used to initialize and direct operations at the enhancement layer.
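As a toy illustration of step (a) feeding step (b) across layers (the strength rule below is invented for illustration and is not the H.264 boundary-strength table): the enhancement layer initialises its filter strength from the co-located base-layer value and applies at most a small refinement, rather than re-deriving the parameter from scratch:

```python
def bl_boundary_strength(is_intra, has_coeffs):
    """Toy base-layer rule: intra edges filter hardest, coded inter
    edges moderately, uncoded edges not at all."""
    if is_intra:
        return 4
    return 2 if has_coeffs else 0

def el_boundary_strength(bl_strength, refinement=0):
    """Initialise the EL strength from the BL value and clip an
    optional signalled refinement to the valid range [0, 4]."""
    return max(0, min(4, bl_strength + refinement))
```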
The enhancement layer may contain full resolution content or samples that are not encoded in the base layer. Fig. 30 and 31 show diagrams of base layer and enhancement layer encoding and decoding systems using quincunx pattern sampling in the enhancement layer. The same diagram also applies to the case where spatial scalability for single view transmission is used. This is not limited to the transmission of stereoscopic content. Further, any of the available sampling and interleaving strategies may be used. In the same figure, applicant shows the application of the in-loop filtering step (360, 390). It should be noted that when the EL comprises full resolution samples and the target is a 3D stereoscopic video transmission, then the EL frames comprise full resolution views, e.g. side by side. Therefore, the EL frame is larger than the BL frame multiplexed in one possible arrangement (CB, parallel arrangement, etc.). Such an embodiment may be implemented using the SVC extension part of H.264/MPEG-4AVC (annex G) and modifying the prediction and coding process of the EL. As shown in fig. 30 and 31, the method according to the present disclosure (e.g., a quincunx loop filter) may be applied to an in-loop filter (360, 390) of one or more enhancement layers of a scalable video coding system. Furthermore, such methods may also be applied before (370, 400) and/or after (380, 410) the base-to-enhancement layer predictor. Those skilled in the art will appreciate that the teachings of the present disclosure with reference to fig. 30 and 31 are applicable to content that is not limited to 3D CBs. As an example, they may also be applicable to any type of 3D multiplexing, such as side-by-side content. In addition, spatial scalability can be achieved using such strategies: the BL can encode half of the original signal at that resolution and the EL can encode the other half (e.g., using column subsampling). Thus, applications also include traditional 2D video transmission.
Further embodiments of resolution scalable stereoscopic video coding will now be described. The deblocking process can be an in-loop process of the base layer and the enhancement layer. In such embodiments, unlike the above-described system, two separate internal reference picture buffers (420, 430) are maintained in the enhancement layer for each view. Each individual buffer receives input from a loop filter (440, 450), such as a deblocking filter. However, the prediction signal from the base layer to the enhancement layer is obtained by sampling and interleaving the stored views. This architecture includes several sampling steps, multiplexing steps and demultiplexing steps. See, in particular, the blocks "demux and upsampler" (460) and "loop filter" (470) indicated by the dashed lines along the paths (480, 490) of the prediction signal. Thus, in certain other embodiments, these steps may be combined as discussed in section (c) above. An encoder for such a system is shown in fig. 32, and a decoder for such a system is shown in fig. 33.
Still other systems, of which encoders and decoders are shown in fig. 34 and 35, maintain two separate picture buffers and perform prediction in the full resolution domain, followed by sampling of the predicted views. Loop filtering (e.g., deblocking) can be applied at the enhancement layer and during prediction of the enhancement layer picture from the picture of the base layer. In particular, one or more of the one or more enhancement layers of fig. 34 (encoding side) and fig. 35 (decoding side) include two samplers (500, 510) and a multiplexer (520), among other components. In another embodiment of the present disclosure, the enhancement layer on the encoding side or the decoding side does not use a sampler. Thus, while the system of fig. 32/33 sub-samples and multiplexes the frames to create a single frame that is then used to predict the sampled frame, in the system of fig. 34/35, the full resolution frame is used to predict the full resolution input frame without multiplexing or sub-sampling.
Another embodiment of the present disclosure, whose encoder and decoder are shown in fig. 40 and 41, supports two separate enhancement layers, each of which can use a single reference picture buffer (530, 540) and perform prediction in the full resolution domain. Thus, each EL encodes a full resolution version of each view. Loop filtering (e.g., deblocking) can be applied at the enhancement layer and during prediction of the enhancement layer picture from the picture of the base layer.
The deblocking step can have different implementations depending on the nature and format of the data being encoded in the enhancement layer. In one embodiment, the enhancement layer encodes samples for each view that are not encoded in the base layer representation (keeping in mind that the sampling process as part of the generation of the base layer results in fewer samples being encoded than full resolution samples). The combination of samples encoded in the base layer and the enhancement layer then produces a full resolution view. As previously mentioned, deblocking in the base layer includes some interpolation to recover the lost samples for best deblocking performance. Deblocking in the enhancement layer can also be achieved in the same manner. However, the following embodiments may be provided: the missing samples are not interpolated in the enhancement layer, but merely replaced with samples from the base layer, so that all full resolution samples can be used before deblocking is applied in the enhancement layer. This is shown in fig. 36 for the exemplary case of quincunx sampling and interleaving. In this case, a conventional deblocking algorithm may be used if all samples are replaced, or one of the sample-aware (e.g., quincunx-specific) algorithms described in this disclosure may be used if only the sample values closest to the boundary are used.
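This substitution step can be sketched as a single masked select (hypothetical names): wherever the enhancement layer did not encode a sample, the co-located base-layer sample is copied in, so that a conventional full-resolution deblocking filter can run without any interpolation:

```python
import numpy as np

def fill_missing_from_bl(el_view, bl_view, el_mask):
    """el_mask is True where the EL sample exists; elsewhere the
    co-located base-layer sample is substituted."""
    return np.where(el_mask, el_view, bl_view)
```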
In another embodiment, the enhancement layer also encodes the full resolution left and right views using the prediction from the base layer. In this case, deblocking in the enhancement layer can use any known deblocking algorithm designed for progressive or interlaced (including macroblock-based and picture-based interlaced coding as employed in h.264) image or video data.
In different embodiments, the scheme of fig. 32 or fig. 34 may be used to transmit interlaced content. Using the notation in those figures, the top field can be assigned to V0 and the bottom field to V1. The base layer may encode either of the two fields, a sub-sampled version of either of the two fields, or a sub-sampled version of a combination of the two fields.
(d) Out-of-loop post-processing. The implementation may differ according to whether the final signal is at full resolution or retains a certain sub-sampling (e.g., quincunx) pattern, and also according to the type of display. If the display expects its input in a sampled format (e.g., "CB"), there are two main options: apply post-processing directly to, for example, the quincunx-sampled fields; or demultiplex the two views to their full resolution, apply post-processing to the full resolution samples, and then resample the processed views into the sampled picture to be fed to the display. If the display receives its input in a format different from CB, including but not limited to a side-by-side format, a line-by-line format, or full resolution views, one option is to perform deblocking in the sampled pictures (e.g., the CB fields) before demultiplexing the content into one of the possible output formats. Another option is to perform post-processing as part of (in conjunction with) the demultiplexing process, as discussed previously in section (c) of this disclosure. A general implementation is depicted in fig. 37.
Fig. 38 illustrates another embodiment of the present disclosure that performs joint processing (e.g., "deblocking") and upsampling (e.g., by a factor of 2 for each dimension). As shown in fig. 38, a set of N input video samples undergoes two distinct operations. In the upper branch of fig. 38, after upsampling (550), overcomplete deblocking (560) is performed. In the lower branch, first the overcomplete deblocking (570) is done, followed by upsampling (580). The 4N samples processed in each branch are then combined together (590). Conventional overcomplete deblocking may be used to perform deblocking. Any upsampling operation may be used. According to the embodiment of fig. 38, the overcomplete operation may be utilized before or after the upsampling operation. The samples from each branch are then combined in the pixel (image) or frequency domain. The combination may be linear (e.g., some weighted sum), but may also be non-linear: one of the two may be selected based on some metric or constraint or segmentation. For example, textured regions may be more biased toward one another than flat regions. The following examples may also be provided: frequency domain processing takes into account a number of different frequency decomposition domains, where processing is performed in each domain and the domains are later combined. Although the embodiment of fig. 38 has been shown with reference to overcomplete deblocking, it may also be used with other types of processing, generally, such as denoising or filtering. Further, the input video sample of fig. 38 may be any type of video sample, and is not limited to a sample of a synthesized picture.
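A sketch of the two-branch structure of fig. 38 (illustrative stand-ins: nearest-neighbour upsampling and a 3x3 box filter in place of overcomplete deblocking; the weight w would in practice be chosen per region or per metric as described above):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling by 2 in each dimension."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def smooth(x):
    """3x3 box filter (stand-in for overcomplete deblocking)."""
    h, w = x.shape
    p = np.pad(x.astype(float), 1, mode='edge')
    return sum(p[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

def two_branch_process(x, w=0.5):
    a = smooth(upsample2x(x))        # upper branch: upsample, then filter
    b = upsample2x(smooth(x))        # lower branch: filter, then upsample
    return w * a + (1.0 - w) * b     # linear (weighted) combination
```

A non-linear variant would replace the weighted sum with a per-region selection between `a` and `b` based on, e.g., a texture/flatness measure.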
The teachings of the present disclosure may be employed to increase the computational speed of processing operations such as deblocking, denoising, and filtering. Fig. 39 shows a schematic diagram in which, to speed up deblocking for example, full resolution samples are first sampled in, for example, a quincunx pattern. The composite video samples (600) are then demultiplexed into component representations (610, 620) (e.g., "left" and "right" pictures, or any type of content, such as odd and even indexed samples), deblocking (and optionally upsampling) is performed on each component representation (630, 640), and the processed representations are then multiplexed (650) (e.g., by linear or non-linear combination) to obtain a deblocked full resolution picture.
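The demultiplex-process-remultiplex shortcut of fig. 39 can be sketched as follows (hypothetical names; `f` stands for any per-component operation such as deblocking). The identity round trip in the usage below shows that demultiplexing and remultiplexing are lossless, so any quality/speed differences come only from running `f` on the smaller component pictures:

```python
import numpy as np

def demux_cb(cb):
    """Split a checkerboard composite into two half-width components."""
    h, w = cb.shape
    v0 = np.empty((h, w // 2), dtype=cb.dtype)
    v1 = np.empty((h, w // 2), dtype=cb.dtype)
    for r in range(h):
        v0[r] = cb[r, r % 2::2]
        v1[r] = cb[r, (r + 1) % 2::2]
    return v0, v1

def process_composite(cb, f):
    """Demultiplex, process each component with f, remultiplex."""
    v0, v1 = demux_cb(cb)
    p0, p1 = f(v0), f(v1)
    out = np.empty_like(cb)
    for r in range(cb.shape[0]):
        out[r, r % 2::2] = p0[r]
        out[r, (r + 1) % 2::2] = p1[r]
    return out
```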
In summary, the present disclosure contemplates, in several embodiments, data enhancement or processing systems and methods, such as in-loop processing (part of the encoding/decoding process) or out-of-loop processing (pre-processing or post-processing stages, such as deblocking and denoising), for data that has been sampled and multiplexed using various methods. These systems and methods can be applied to existing codecs (encoders and decoders), but can also be extended to future encoders and decoders by providing modifications to core components. Applications include Blu-ray video encoders and players, set-top boxes, software encoders and players, as well as more bandwidth-constrained playback and download solutions. Other applications include BD video encoders, players, and video discs created in a suitable format, or even content and systems for other applications such as broadcast, satellite, and IPTV systems.
The methods and systems described in this disclosure may be implemented in hardware, software, firmware, or a combination thereof. Features described as blocks, modules, or components may be implemented together (e.g., in a logic device such as an integrated logic device) or separately (e.g., as separately connected logic devices). The software portion of the methods of the present disclosure may include a computer readable medium comprising instructions that when executed perform, at least in part, the methods. The computer-readable medium may include, for example, Random Access Memory (RAM) and/or Read Only Memory (ROM). The instructions may be executed by a processor, such as a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or a Field Programmable Gate Array (FPGA).
Accordingly, embodiments of the invention may relate to one or more of the following example embodiments.
Thus, the present invention may be implemented in any of the forms described herein, including but not limited to the following Enumerated Example Embodiments (EEEs) that describe the structure, features, and functionality of certain portions of the present invention:
Eee1. a method for processing composite sampled image or video data, said composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising:
demultiplexing the composite sampled image or video data into a plurality of component pictures;
processing each component picture separately; and
the separately processed component pictures are sampled and multiplexed together.
Eee2. the method according to enumerated example embodiment 1, further comprising interpolating each component picture separately.
Eee3. the method according to enumerated example embodiments 1 or 2, wherein the processing is selected from deblocking, denoising, deblurring, deringing, and filtering.
Eee4. the method of enumerated example embodiments 1-3, wherein the plurality of component pictures are a left component view and a right component view of a three-dimensional (3D) stereoscopic image or video data.
Eee5. the method of enumerated example embodiments 1-4, wherein the processing of each component picture is based on an analysis of parameters related to the composite sampled image or video data and at least one of the plurality of component pictures.
Eee6. The method of enumerated example embodiment 5, wherein the parameter comprises one or more of: an estimate of stereo disparity, edge and frequency characteristics, average luminance values, temporal characteristics such as motion information, coding mode, intra prediction direction, energy of residual, quantization parameters, average chrominance values, and information from the base layer if the constituent pictures are coded in the enhancement layer.
Eee7. a method for processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components or categories, the method comprising:
each element of the composite sampled image or video data is processed by taking into account the image or video component or category to which it relates, thereby distinguishing between processing of composite data relating to one image or video component or category and processing of composite data relating to another image or video component or category.
Eee8. the method of enumerated example embodiment 7, wherein when the composite image or video data is sampled together, knowledge of a particular image or video component or category related to the composite sampled image or video data is provided.
Eee9. the method of enumerated example embodiment 8, wherein the knowledge includes spatial relationships between data of the composite image or video and/or spatial relationships between data of the image or video components or categories.
Eee10. the method according to enumerated example embodiments 7 to 9, wherein the processing includes interpolation.
Eee11. the method according to enumerated example embodiment 10, wherein the interpolation is performed partially, fully, or not at all.
Eee12. the method of enumerated example embodiment 11, wherein the processing includes deblocking, and wherein the decision as to whether to perform interpolation on data is dependent on a distance of the data from an edge of the block.
Eee13. the method according to enumerated example embodiments 10 to 12, wherein the interpolation is selected from bilinear filtering, edge adaptive filtering, separable filtering, non-separable filtering, and prediction from past and/or future pictures.
Eee14. the method according to enumerated example embodiments 7 to 13, wherein processing includes deblocking, denoising, deblurring, deringing, or filtering.
Eee15. the method according to enumerated example embodiments 7 to 14, wherein distinguishing between processing of data relating to different components or classes is performed by assigning different weights to the data relating to the different components or classes.
Eee16. the method according to enumerated example embodiments 7 to 14, wherein distinguishing between processing of data relating to different components or classes is performed by a similarity measure between data relating to one component or class and data relating to another component or class.
Eee17. the method according to enumerated example embodiment 7, wherein the processing includes: deblocking the data relating to an individual image or component prior to interpolating the block-edge data relating to the individual image or component.
Eee18. the method according to enumerated example embodiment 17, further comprising: the deblocking filtering values are adjusted according to distances from block edges in the image or constituent pictures.
Eee19. the method according to enumerated example embodiment 7, wherein the processing comprises filtering, and wherein the filtering distinguishes one video component or class from another video component or class by providing different filter weights.
Eee20. the method of enumerated example embodiment 19, wherein the different filter weights correspond to different sampling rates, patterns, and/or orientations between the data of the one video component or category and the data of the other video component or category.
Eee21. the method according to enumerated example embodiments 19 or 20, wherein the processing further includes interpolation.
Eee22. the method of enumerated example embodiment 7, wherein the composite sampled image or video data is arranged along a rectangular grid, and wherein the processing is performed in a non-horizontal and non-vertical direction.
Eee23. the method of enumerated example embodiment 22, wherein the direction comprises a diagonal direction.
Eee24. the method according to enumerated example embodiments 22 or 23, wherein the processing continues according to fixed or signaled parameters.
Eee25. the method according to enumerated example embodiments 22-24, wherein the processing is selected from deblocking, denoising, deblurring, deringing, and filtering.
Eee26. the method according to enumerated example embodiments 22-24, wherein a pre-analysis is performed prior to the processing to determine orientation and/or spatial features of the image or video data.
Eee27. the method according to enumerated example embodiment 7, wherein the synthesized sampled image or video data is processed for motion compensated prediction to obtain a predicted block from a current block, and wherein the predicted block is obtained by considering, for each sample data position in the current block, samples in a motion compensated reference that belong to the same class as the samples of the sample data position.
Eee28. a method for processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising:
demultiplexing the composite sampled image or video data into a plurality of component pictures while processing the sampled image or video data, wherein processing is selected from deblocking, denoising, deblurring, deringing, and filtering.
Eee29. the method according to enumerated example embodiment 28, wherein the demultiplexing and simultaneous processing is performed by weighted interpolation, wherein different weights are applied to different adjacent samples.
Eee30. the method according to enumerated example embodiment 29, wherein the processing is deblocking and the adjacent samples to be used for interpolation are located on both sides of a block edge, and wherein a first weight is applied to adjacent samples on one side of the block edge and a second weight is applied to adjacent samples on the other side of the block edge.
Eee31. the method according to enumerated example embodiment 30, wherein the first weight is greater than the second weight when there are fewer adjacent samples on the one side than adjacent samples on the other side.
Eee32. the method according to enumerated example embodiments 29 to 31, wherein the weight depends on a parameter selected from: quantization parameters of neighboring blocks, coding modes and motion parameters of the blocks, and spatial features of the image or video.
Eee33. the method according to enumerated example embodiments 29 to 32, wherein the weights are calculated prior to processing.
Eee34. a method for processing composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video categories, the method comprising:
providing an initial block of existing samples of the same class;
applying a transform to samples of the initial block;
estimating transform coefficients of a twice-sized block using the transform coefficients of the initial block, the twice-sized block including the same existing samples of the initial block together with missing samples of the same class as the existing samples;
adjusting the estimated transform coefficients of the twice-sized block; and
applying an inverse transform to samples of the twice-sized block.
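A sketch of one possible realisation of the above method (illustrative only; the coefficient-estimation and adjustment rules are left open by this disclosure, and the zero-padding/scaling rule below is one common choice): the N x N block of existing same-class samples is transformed with an orthonormal DCT, its coefficients are placed in the low-frequency corner of a 2N x 2N spectrum, scaled to preserve amplitude, and inverse transformed to estimate the twice-sized block:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1.0 / np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def estimate_twice_sized_block(block):
    """Estimate a 2N x 2N block from an N x N block of existing samples
    by zero-padding its DCT spectrum (scaled by 2 to keep amplitude)."""
    n = block.shape[0]
    d_n, d_2n = dct_matrix(n), dct_matrix(2 * n)
    coef = d_n @ block @ d_n.T              # forward transform
    big = np.zeros((2 * n, 2 * n))
    big[:n, :n] = 2.0 * coef                # estimated/adjusted coefficients
    return d_2n.T @ big @ d_2n              # inverse transform
```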
Eee35. a method for processing image or video data, comprising:
separately pre-processing image or video components of an image or video to be interleaved or multiplexed;
separately sampling the pre-processed image or video components;
interleaving or multiplexing the sampled pre-processed image or video components to form a composite image or video; and
and processing the composite image or video.
Eee36. the method of enumerated example embodiment 35, wherein processing the composite image or video is selected from deblocking, denoising, deblurring, deringing, and filtering the composite image or video.
Eee37. the method of enumerated example embodiments 35 or 36, wherein processing the composite image or video is based on motion information derived from the image or video components during pre-processing.
Eee38. the method of any one of enumerated example embodiments 1-37, wherein the method is used to process the image or video data prior to encoding.
Eee39. the method of any one of enumerated example embodiments 1-37, wherein the method is for processing the image or video data after encoding.
Eee40. the method of any one of enumerated example embodiments 1-37, wherein the method is to process the image or video data while encoding the image or video data.
Eee41. the method of any one of enumerated example embodiments 1-37, wherein the method is to process the image or video data while decoding the image or video data.
Eee42. The method of enumerated example embodiment 40, wherein the method is used to process multiplexed images or video data for a scalable video coding system comprising a base layer and one or more enhancement layers, the method being applied to in-loop filters of one or more of the one or more enhancement layers at the encoder side.
Eee43. The method of enumerated example embodiment 42, wherein the scalable video coding system comprises a base layer to enhancement layer predictor on the encoder side, the method being further applied before and/or after the base layer to enhancement layer predictor.
Eee44. The method of enumerated example embodiment 41, wherein the method is for processing multiplexed images or video data for a scalable video coding system comprising a base layer and one or more enhancement layers, the method being applied to in-loop filters of one or more of the one or more enhancement layers at a decoder side.
Eee45. The method of enumerated example embodiment 44, wherein the scalable video coding system comprises a base layer to enhancement layer predictor at the decoder side, the method being further applied out-of-loop before and/or after the base layer to enhancement layer predictor.
Eee46. The method of enumerated example embodiment 40, wherein the method is for processing multiplexed image or video data for a scalable video coding system comprising a base layer and one or more enhancement layers, wherein at least one of the one or more enhancement layers comprises a plurality of in-loop dedicated reference picture buffers on an encoder side, one in-loop dedicated reference picture buffer for each image or video component of the multiplexed image or video data.
Eee47. The method of enumerated example embodiment 46, wherein the scalable video coding system comprises, at the encoder side, a prediction signal path between the base layer and one or more of the one or more enhancement layers, the prediction signal path comprising a demultiplexer and an upsampler and/or a loop filter between a reference picture buffer of the base layer and each dedicated reference picture buffer of the one or more enhancement layers.
Eee48. The method of enumerated example embodiment 47, wherein the demultiplexer and upsampler of the prediction signal path and the loop filter are combined together.
Eee49. The method of enumerated example embodiment 41, wherein the method is for processing multiplexed image or video data for a scalable video coding system comprising a base layer and one or more enhancement layers, wherein at least one of the one or more enhancement layers comprises a plurality of in-loop dedicated reference picture buffers on a decoder side, one in-loop dedicated reference picture buffer for each image or video component of the multiplexed image or video data.
Eee50. The method of enumerated example embodiment 49, wherein the scalable video coding system comprises, at the decoder side, a prediction signal path between the base layer and one or more of the one or more enhancement layers, the prediction signal path comprising a demultiplexer and an upsampler and/or a loop filter between a reference picture buffer of the base layer and each dedicated reference picture buffer of the one or more enhancement layers.
Eee51. The method of enumerated example embodiment 47, wherein the demultiplexer and upsampler of the prediction signal path and the loop filter are combined together.
Eee52. The method of any one of enumerated example embodiments 46-51, wherein the video content is interleaved video content.
Eee53. a method for processing composite sampled image or video data for a scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising:
demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures;
replacing missing samples of each enhancement layer component picture with samples from the base layer;
processing each enhancement layer component picture separately after the replacement; and
the separately processed component pictures are sampled and multiplexed together.
Eee54. a method for processing composite sampled image or video data for a scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data comprising multiplexed image or video data relating to a plurality of image or video components, the method comprising:
demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures;
encoding each enhancement layer component picture separately using a prediction from the base layer;
processing each enhancement layer component picture after encoding separately; and
the separately processed component pictures are sampled and multiplexed together.
Eee55. the method according to enumerated example embodiment 54, wherein the processing step includes a deblocking step.
Eee56. the method according to enumerated example embodiment 41, wherein the method is used for processing the multiplexed image or video data of a scalable video coding system comprising a base layer and one or more enhancement layers, the method being applied as a post-processing filter at the decoder side of the base layer.
Eee57. a method for processing video samples of an image, comprising:
performing two different sets of operations on the video sample, a first set of operations comprising upsampling the video sample followed by processing the upsampled video sample to provide a first output, and a second set of operations comprising processing the video sample followed by upsampling the processed video sample to provide a second output; and
combining the first output with the second output.
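The two operation orders of EEE57 and their combination can be sketched as below. The nearest-neighbor 2x upsampler, the two-tap smoothing stand-in for "processing", and the fixed blending weight are all illustrative assumptions; the embodiment leaves each open (and EEE60 allows nonlinear or frequency-domain combining instead).

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbor 2x upsampling (an assumed, illustrative filter)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def smooth(img):
    """Stand-in 'processing' step: horizontal two-tap averaging."""
    padded = np.pad(img, ((0, 0), (1, 0)), mode="edge")
    return 0.5 * (padded[:, :-1] + padded[:, 1:])

def dual_path(samples, weight=0.5):
    """Run both operation orders on the same samples and blend the outputs.

    `weight` sets the linear combination; a real system might choose it
    per region, per frequency band, or adaptively."""
    path_a = smooth(upsample2x(samples))   # first set: upsample, then process
    path_b = upsample2x(smooth(samples))   # second set: process, then upsample
    return weight * path_a + (1.0 - weight) * path_b
```

Processing before upsampling operates on fewer samples (cheaper, but the upsampler spreads any processing artifacts), while upsampling first lets the processing act at the output resolution; blending the two outputs trades off these behaviors.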
Eee58. The method according to enumerated example embodiment 57, wherein the processing is deblocking, denoising, deblurring, deringing, or filtering.
Eee59. The method of enumerated example embodiment 58, wherein the deblocking comprises overcomplete deblocking.
Eee60. The method according to enumerated example embodiment 58 or 59, wherein the combining is performed by linear and/or nonlinear combining in the image and/or frequency domain.
Eee61. The method of enumerated example embodiment 60, wherein the frequency domain processing includes considering different frequency decomposition domains, processing being performed within each of the domains and the domains being combined later.
Eee62. A method for increasing the computational speed of processing operations on samples of a composite image or video arrangement, comprising:
demultiplexing samples of the composite image or video arrangement into individual samples constituting components of the composite image or video arrangement;
processing each component separately; and
multiplexing the separately processed components together.
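The demultiplex-process-multiplex flow of EEE62 reduces to a few lines for a column-interleaved arrangement, which is assumed here purely for illustration; checkerboard or row interleaving works the same way with different index masks.

```python
import numpy as np

def demux_process_mux(composite, process):
    """Split a column-interleaved composite frame into its two constituent
    component pictures, run `process` on each half-width picture
    independently, and re-interleave the results."""
    comp0 = composite[:, 0::2]                 # demultiplex: even columns
    comp1 = composite[:, 1::2]                 # demultiplex: odd columns
    out = np.empty_like(composite, dtype=float)
    out[:, 0::2] = process(comp0)              # process each component
    out[:, 1::2] = process(comp1)              # separately
    return out                                 # multiplexed result
```

Because each call to `process` sees only the samples of one component, filters cannot smear detail across component boundaries, and the two half-size pictures can be processed independently (e.g., in parallel), which is the source of the speedup.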
Eee63. The method according to enumerated example embodiment 62, wherein the processing is selected from deblocking, denoising, deblurring, deringing, and filtering.
Eee64. An encoder for encoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 42-43, or 46-48.
Eee65. An apparatus for encoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 42-43, or 46-48.
Eee66. A system for encoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 42-43, or 46-48.
Eee67. A decoder for decoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 44-45, 49-51, or 56.
Eee68. An apparatus for decoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 44-45, 49-51, or 56.
Eee69. A system for decoding a video signal according to the method recited in one or more of enumerated example embodiments 1-37, 44-45, 49-51, or 56.
Eee70. A computer-readable medium containing a set of instructions that cause a computer to perform the method according to one or more of enumerated example embodiments 1-62.
Eee71. Use of the method recited in one or more of enumerated example embodiments 1-37, 42-43, or 46-48 for encoding a video signal.
Eee72. Use of the method recited in one or more of enumerated example embodiments 1-37, 44-45, 49-51, or 56 for decoding a video signal.
All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated herein by reference as if each reference were individually incorporated by reference in its entirety. It is to be understood that this disclosure is not limited to a particular method or system, which can, of course, vary. It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the content clearly dictates otherwise. Unless defined to the contrary, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The above examples are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use embodiments of the enhancement methods for sample multiplexed image and video data of the present disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure. Variations of the above-described ways to implement the present disclosure may be used by those skilled in the art of video and are intended to fall within the scope of the appended claims.
Various embodiments of the present disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Accordingly, other embodiments are within the scope of the following claims.
List of references
[1] D. C. Hutchison, "Introducing DLP 3-D TV", http://www.dlp.com/downloads/Introducing DLP3D HDTVWhitepaper.pdf
[2] McCormick, H. W. New, and D. C. Hutchison, "Implementation of Stereoscopic and Dual-view Images on a Microdisplay High Definition Television," 3DTV-CON '08, Istanbul, Turkey, May 2008, pp. 33-36.
[3] A. Tourapis, W. Husak, A. Leontaris, and D. Ruhoff, "Encoder Optimization of Stereoscopic Video Delivery Systems," U.S. Provisional Application No. 61/082,220, filed July 20, 2008.
[4] Advanced video coding for generic audiovisual services, http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=T-REC-H.264, March 2009.
[5] SMPTE 421M, "VC-1 Compressed Video Bitstream Format and Decoding Process", April 2006.
[6] A. Tourapis, A. Leontaris, and P. Pahalawatta, "Encoding and Decoding Architecture of Checkerboard Multiplexed Image Data," PCT patent application PCT/US2009/056940, filed September 15, 2009.
[7] P. Pahalawatta, A. Tourapis, and A. Leontaris, "Reconstruction of De-Interleaved Views, Using Adaptive Interpolation Based on Disparity Between the Views for Up-Sampling," PCT patent application PCT/US2009/069132, filed December 22, 2009.
[8] A. Tourapis, W. Husak, P. Pahalawatta, and A. Leontaris, "Codecs and Devices for Interleaved and/or Multiplexed Image Data," PCT patent application PCT/US2010/022445, filed January 28, 2010.
[9] A. Tourapis, A. Leontaris, P. Pahalawatta, and K. Stec, "Directed Interpolation and Data Post-Processing," PCT patent application PCT/US2010/031762, filed April 20, 2010.
[10] O. G. Guleryuz, "A nonlinear loop filter for quantization noise removal in hybrid video compression," in Proc. Int. Conference on Image Processing, vol. 2, pp. 333-336, September 2005.
[11] O. G. Guleryuz, "Iterated Denoising for Image Recovery," in Proc. Data Compression Conference, Snowbird, UT, April 2002.

Claims (10)

1. A method for processing composite sampled image or video data formed by combining together in a first block two or more sub-sampled image or video data belonging to one of a plurality of image or video classes by means of multiplexing, the method comprising:
sampling the composite sampled image or video data to extract one of the two or more sub-sampled image or video data from the composite sampled image or video data, wherein a first set of samples of the extracted sub-sampled image or video data, referred to as existing samples, is retained and a second set of samples of the extracted sub-sampled image or video data, referred to as missing samples, is discarded;
storing the existing samples in an initial block of existing samples of the same class, the initial block including half the number of rows in the first block and half the number of columns in the first block; and
applying a transform to samples of the initial block;
characterized in that the method further comprises:
estimating transform coefficients of a double-sized block using transform coefficients of the initial block, the double-sized block including twice the number of rows in the initial block and twice the number of columns in the initial block, the double-sized block containing the same existing samples as the initial block and missing samples of the same class as the existing samples;
adjusting the estimated transform coefficients of the twice-sized block by setting each of the estimated transform coefficients to 0 either, in one case, if the estimated transform coefficient exceeds a threshold or, in another case, if the estimated transform coefficient is less than a threshold; and
applying an inverse transform to samples of the twice-sized block.
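The transform-domain sequence of claim 1 can be sketched numerically as follows. This is an illustrative sketch, not the claimed method as such: it assumes an orthonormal DCT-II as the transform, zero-padding of the spectrum (with a DC-preserving gain) as the coefficient estimator, and the "set to 0 if less than a threshold" variant of the adjusting step; the claim requires only some estimate of the double-sized block's coefficients followed by thresholding and an inverse transform.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)     # DC row normalization
    m[1:] *= np.sqrt(2.0 / n)    # AC row normalization
    return m

def estimate_double_block(existing, threshold):
    """From an n x n initial block of existing samples, estimate a 2n x 2n
    block covering both the existing samples and the co-located missing
    samples of the same class."""
    n = existing.shape[0]
    t_n, t_2n = dct_matrix(n), dct_matrix(2 * n)

    # Apply the transform to the samples of the initial block.
    coeff = t_n @ existing @ t_n.T

    # Estimate the 2n x 2n coefficients: place the n x n spectrum in the
    # low-frequency corner, scaled so the DC level is preserved (assumed
    # estimator; the claim does not fix a particular one).
    est = np.zeros((2 * n, 2 * n))
    est[:n, :n] = 2.0 * coeff

    # Adjust by hard thresholding (zero coefficients below the threshold),
    # always keeping DC.
    small = np.abs(est) < threshold
    small[0, 0] = False
    est[small] = 0.0

    # Apply the inverse transform to obtain the twice-sized sample block.
    return t_2n.T @ est @ t_2n
```

The thresholding step is what gives the method its denoising character: coefficients too small to be reliable estimates are suppressed before the samples are reconstructed at the doubled block size.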
2. A method for processing composite sampled image or video data for a scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data formed by combining together by multiplexing two or more sub-sampled image or video data belonging to one of a plurality of image or video categories, the method comprising:
demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures;
replacing missing samples of each enhancement layer component picture with samples from the base layer;
processing each enhancement layer component picture separately after the replacement; and
sampling and multiplexing the separately processed enhancement layer component pictures together.
3. A method for processing composite sampled image or video data for a scalable video coding system having a base layer and one or more enhancement layers, the composite sampled image or video data formed by combining together by multiplexing two or more sub-sampled image or video data belonging to one of a plurality of image or video categories, the method comprising:
demultiplexing the composite sampled image or video data of one or more of the one or more enhancement layers into a plurality of enhancement layer component pictures;
encoding each enhancement layer component picture separately using a prediction from the base layer;
processing each enhancement layer component picture separately after the encoding; and
sampling and multiplexing the separately processed enhancement layer component pictures together.
4. A system for processing image and video data, comprising means for performing, implementing or controlling a method according to one of claims 1 to 3.
5. A video encoder, comprising:
at least one processor; and
a computer-readable storage medium comprising instructions tangibly stored therein, wherein the instructions:
cause, control, program or configure the processor to perform, implement or control a method according to one of claims 1 to 3; and
output the video signal that has been encoded accordingly.
6. A video decoder, comprising:
at least one processor; and
a computer-readable storage medium comprising instructions tangibly stored therein, wherein the instructions:
cause, control, program or configure the processor to perform, implement or control a method according to one of claims 1 to 3; or
decode an encoded input video signal, wherein the encoded input video signal comprises the video signal output of the video encoder of claim 5.
7. A video apparatus comprising:
at least one processor; and
a computer-readable storage medium comprising instructions tangibly stored therein, wherein the instructions implement one or more functions comprising:
cause, control, program or configure the processor to perform, implement or control a method according to one of claims 1 to 3; or configuring, programming or controlling a video encoder according to claim 5; and
outputting the video signal that has been encoded accordingly; or decoding an encoded input video signal, wherein the encoded input video signal comprises a video signal output of the video encoder.
8. A video apparatus comprising:
the video encoder of claim 5; or
The video decoder of claim 6.
9. A computer system, comprising:
at least one processor; and
a computer-readable storage medium comprising instructions tangibly stored therein, wherein the instructions implement one or more functions comprising:
cause, control, program or configure the processor to perform, implement or control a method according to one of claims 1 to 3; or configuring, programming or controlling a video encoder according to claim 5; and
outputting the video signal that has been encoded accordingly; or decoding an encoded input video signal, wherein the encoded input video signal comprises a video signal output of the video encoder; or configuring, programming or controlling a video apparatus according to claim 7 or claim 8.
10. Use of a computer, the use comprising one or more of:
performing, implementing or controlling a method according to one of claims 1 to 3; or configuring, programming or controlling a video encoder according to claim 5; and
outputting the video signal that has been encoded accordingly; or decoding an encoded input video signal, wherein the encoded input video signal comprises a video signal output of the video encoder; or to configure, program or control one or more of: the video device of claim 7 or claim 8, or the computer system of claim 9.
HK14101213.3A 2010-07-19 2011-07-19 Enhancement methods for sampled and multiplexed image and video data HK1188352B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US61/365,743 2010-07-19

Publications (2)

Publication Number Publication Date
HK1188352A HK1188352A (en) 2014-04-25
HK1188352B true HK1188352B (en) 2019-06-06
