HK1199785B

HK1199785B - Coding method and decoding method, coder and decoder

Info

Publication number: HK1199785B
Application number: HK15100178.7A
Authority: HK
Inventors: Matthias Narroschke; Hans-Georg Musmann
Original assignee: Matthias Narroschke; Hans-Georg Musmann
Priority date: 2006-01-09
Filing date: 2015-01-08
Publication date: 2018-07-06

Description

Encoding method and decoding method, encoder and decoder

This patent application is a divisional application of the patent application with an international filing date of 22/12/2006, national application number 200680050791.2, entitled "adaptive coding of prediction error in hybrid video coding".

Technical Field

The invention relates to encoding and decoding methods, an encoder and a decoder that use adaptive coding of prediction errors.

Background

The latest standardized video coding methods are based on hybrid coding. Hybrid coding combines a coding step in the time domain with a coding step in the spatial domain. First, the temporal redundancy of the video signal is reduced by block-based motion compensated prediction between the image block to be encoded and a reference block in a previously transmitted image, determined by a motion vector. The remaining prediction error samples are arranged in blocks and transformed into the frequency domain, resulting in blocks of coefficients. These coefficients are quantized and scanned according to a fixed, well-known zig-zag scanning scheme, which starts with the coefficient representing the DC value. In the usual representation, this coefficient is located among the low-frequency coefficients in the upper left corner of the block. The zig-zag scan produces a one-dimensional array of coefficients, which is entropy encoded by a subsequent encoder. This encoder is optimized for coefficient arrays whose energy decreases along the array. Since the order of the coefficients within a block is predetermined and fixed, the zig-zag scan produces such an array of decreasing energy if the prediction error samples are correlated. The subsequent encoding steps can then be optimized for this case. For this purpose, the latest standard, H.264/AVC, provides context-based adaptive binary arithmetic coding (CABAC) and context-adaptive variable length coding (CAVLC). However, the coding efficiency of the transform is high only if the prediction error samples are correlated. For samples that are only marginally correlated in the spatial domain, the transform is less efficient.
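
The fixed zig-zag scan described above can be sketched for an N × N block as follows. This is a minimal illustration, assuming a square block stored as a list of rows; the function names are hypothetical. For N = 4 it walks the anti-diagonals starting at the DC coefficient, so the energy-compacted coefficients of a correlated block end up at the front of the array.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order.

    The scan walks the anti-diagonals r + c = s, starting at the DC
    position (0, 0) and alternating direction on every diagonal.
    """
    order = []
    for s in range(2 * n - 1):
        first_row = max(0, s - n + 1)
        last_row = min(s, n - 1)
        if s % 2 == 0:
            rows = range(last_row, first_row - 1, -1)  # move up and to the right
        else:
            rows = range(first_row, last_row + 1)      # move down and to the left
        for r in rows:
            order.append((r, s - r))
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients into a 1-D array."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Quantized 4 x 4 coefficient block with its energy concentrated near DC.
coeffs = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(coeffs))  # [9, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```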

Disclosure of Invention

It is an object of the present invention to provide encoding and decoding methods, corresponding encoders and decoders, data signals, and corresponding systems and semantics for encoding and decoding video signals that are more efficient than those of the prior art.

According to one aspect of the present invention, a method of encoding a video signal based on hybrid coding is provided. The method comprises the following steps: reducing temporal redundancy by block-based motion compensated prediction to establish a prediction error signal; and deciding whether to transform the prediction error signal into the frequency domain or to keep the prediction error signal in the spatial domain for encoding.

According to a corresponding aspect of the present invention, an encoder is provided that is adapted to apply hybrid coding to a video signal. The encoder comprises means for reducing temporal redundancy by block-based motion compensated prediction in order to establish a prediction error signal, and means for deciding whether to transform the prediction error signal into the frequency domain or to keep the prediction error signal in the spatial domain. According to this aspect of the invention, a concept and corresponding apparatus, signal and semantics are provided for adaptively deciding whether to process the prediction error signal in the frequency domain or in the spatial domain. If the prediction error samples are only weakly correlated, encoding the samples directly may be more efficient and result in a lower data rate than encoding the coefficients in the frequency domain. The present invention therefore provides an adaptive decision procedure and an adaptive control means that make this decision. The decision whether to apply the frequency domain transform or to keep the prediction error signal in the spatial domain takes the prediction error signal into account. The subsequent encoding scheme may be the same as in the frequency domain, or it may be specifically adapted to the requirements of samples in the spatial domain.

According to another aspect of the invention, the decision step of the method of encoding a video signal is based on a cost function. In general, the decision whether to use coefficients in the frequency domain or samples in the spatial domain may be based on various types of decision mechanisms. The decision may be made at once for all samples within a particular portion of the video signal or, for example, for a particular number of blocks, macroblocks or slices. The decision may be based on a cost function, such as a Lagrangian function. The cost is calculated both for coding in the frequency domain and for coding in the spatial domain, and the coding with the lower cost is chosen.

According to another aspect of the invention, the cost function comprises the rate-distortion cost of encoding in the spatial domain and of encoding in the frequency domain. According to another aspect of the invention, the rate-distortion cost may be calculated by weighting the required rate with the Lagrangian parameter and adding the resulting distortion. The distortion measure may be the mean square quantization error or the mean absolute quantization error.
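
The cost comparison can be sketched as follows, assuming that the reconstructions of both paths and their rates in bits are already known. The rates, the reconstructions and the Lagrangian value in the usage lines are hypothetical, as is the rule that ties go to the frequency domain; the function names are illustrative only.

```python
def _differences(original, reconstructed):
    """Flatten two equally sized 2-D blocks into a list of sample differences."""
    return [o - r
            for row_o, row_r in zip(original, reconstructed)
            for o, r in zip(row_o, row_r)]


def mse(original, reconstructed):
    """Mean square quantization error."""
    d = _differences(original, reconstructed)
    return sum(x * x for x in d) / len(d)


def mae(original, reconstructed):
    """Mean absolute quantization error."""
    d = _differences(original, reconstructed)
    return sum(abs(x) for x in d) / len(d)


def rd_cost(rate_bits, distortion, lagrangian):
    """Rate-distortion cost C = L * R + D."""
    return lagrangian * rate_bits + distortion


def choose_domain(cost_frequency, cost_spatial):
    """Pick the domain with the lower cost (ties go to the frequency domain)."""
    return "frequency" if cost_frequency <= cost_spatial else "spatial"


original = [[4, 0], [0, 0]]
rec_frequency = [[2, 1], [1, 1]]  # hypothetical reconstruction, frequency path
rec_spatial = [[4, 0], [0, 0]]    # hypothetical reconstruction, spatial path
lam = 2.0
cost_f = rd_cost(10, mse(original, rec_frequency), lam)  # 10 bits: hypothetical
cost_s = rd_cost(14, mse(original, rec_spatial), lam)    # 14 bits: hypothetical
print(choose_domain(cost_f, cost_s))
```

Swapping `mse` for `mae` changes only the distortion term; the comparison itself is unchanged.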

According to another aspect of the invention, the samples in the spatial domain may be encoded using substantially the same method as the coefficients in the frequency domain. These methods may include CABAC or CAVLC encoding methods. Accordingly, little or no change in the encoding scheme is required if the adaptive control means decides to switch between the frequency and spatial domains. However, different coding schemes may also be employed for the coefficients in the two domains.

According to another aspect of the present invention, a method of encoding a video signal based on hybrid coding is provided. According to this aspect, temporal redundancy is reduced by block-based motion compensated prediction, and the samples of the prediction error signal within a prediction error block are provided in the spatial domain. The samples within the prediction error block are scanned to provide an array of samples in a particular order. The scanning scheme is derived from a prediction error image or a prediction image. This takes into account that the zig-zag scan used in the prior art for the frequency domain may not be the most efficient scanning order for the spatial domain. Thus, an adaptive scanning scheme is provided that takes into account the distribution and magnitude of the samples in the spatial domain, in particular the most likely positions of the samples with the largest magnitudes and of the samples that are most likely to be zero. The coding gain in the frequency domain is mainly based on the fact that the low-frequency components have large magnitudes while most of the high-frequency coefficients are zero, which allows very efficient entropy coding schemes such as CABAC or CAVLC to be applied. In the spatial domain, however, the sample with the largest magnitude may be located anywhere within the block. Since the prediction error is usually largest at the edges of moving objects, the prediction image or the prediction error image can be used to establish the most efficient scanning order.

According to one aspect of the invention, the gradient of the prediction image can be used to identify samples that are likely to have large magnitudes. The samples are scanned in order of decreasing gradient magnitude within the prediction image. The same scanning order is then applied to the prediction error image, i.e. to the prediction error samples in the spatial domain.

Further, according to still another aspect of the present invention, the scanning scheme may be based on the prediction error image of the reference block addressed by the motion vector. The samples are scanned in order of decreasing prediction error magnitude.

According to one aspect of the invention, the scanning scheme is derived from a linear combination of the gradient of the prediction image and the prediction error image of the reference block addressed by the motion vector.

According to one aspect of the invention, the entropy coding scheme (e.g. CABAC) uses probabilities that are determined separately for the coefficients in the frequency domain and for the samples in the spatial domain. Coding schemes well known in the art may thus be adapted, at least slightly, to provide the most efficient coding for the spatial domain. Accordingly, the adaptively controlled switching mechanism, which selects coding in the spatial or the frequency domain, may also switch the subsequent encoding steps for the samples or coefficients of the respective domain.

According to one aspect of the present invention, a method of encoding a video signal is provided that comprises quantizing the prediction error samples in the spatial domain with a quantizer optimized either for a subjectively weighted quantization error or for the mean square quantization error. According to this aspect of the invention, the quantizer for the samples in the spatial domain may be adapted to take into account a subjectively optimal visual impression of the picture. The representative values and decision thresholds of the quantizer may be adapted to the corresponding subjective or statistical properties of the prediction error signal.

In addition, the present invention relates to decoding methods and decoding apparatus corresponding to the aspects described above. According to one aspect of the present invention, a decoder is provided that comprises adaptive control means for adaptively determining whether an input stream of an encoded video signal represents the prediction error signal of the encoded video signal in the spatial domain or in the frequency domain. The decoder according to this aspect is thus adapted to determine, for the incoming data stream, whether the prediction error signal is encoded in the frequency domain or in the spatial domain. In addition, the decoder provides corresponding decoding means for each of the two domains.

Further according to an aspect of the invention, the decoder comprises scan control means for providing a scan order based on the prediction signal or the prediction error signal. The scan control unit according to this aspect of the invention is adapted to obtain the required information about the scan order in which the samples of the arriving block have been scanned during encoding of the video signal. In addition, the decoder may include all means to inverse quantize and inverse transform the coefficients in the frequency domain or to inverse quantize the samples in the spatial domain. The decoder may also include a mechanism to provide motion compensation and decoding. Basically, the decoder may be configured to provide all means for implementing the method steps corresponding to the encoding steps described above.

According to yet another aspect of the present invention, a data signal representing an encoded video signal is provided, in which the prediction error signal is encoded partly in the spatial domain and partly in the frequency domain. This aspect of the invention relates to an encoded video signal that results from the encoding scheme described above.

Further, according to still another aspect of the present invention, the data signal may include side information indicating the domain in which a slice, a macroblock or a block is encoded, in particular whether the slice, macroblock or block is encoded in the spatial domain or in the frequency domain. Since the adaptive control of the present invention yields a prediction error signal that is coded either in the spatial domain or in the frequency domain, corresponding information must be included in the coded video signal. The present invention therefore also provides information indicating the domain in which a specific portion, such as a slice, a macroblock or a block, has been encoded.

In addition, this aspect of the invention also takes into account the possibility that an entire macroblock or an entire block may be encoded in only one of the two domains. Thus, if an entire macroblock is coded, for example, in the spatial domain, this may be represented by a single flag or the like. In addition, even the entire slice may be encoded only in the frequency or spatial domain and a corresponding indicator for the entire slice may be included in the data stream. This may result in a reduced data rate and a more efficient coding scheme for the side information.

According to another aspect of the present invention, there is provided a method of encoding a video signal using hybrid encoding, the method including: reducing temporal redundancy by block-based motion compensated prediction to create a prediction error signal; deciding whether to transform the prediction error signal into the frequency domain or to keep the prediction error signal in the spatial domain for encoding, wherein flags are provided to indicate whether all blocks of a current slice are encoded in the frequency domain, wherein further flags are provided for the blocks of the slice if not all blocks of the slice are encoded in the frequency domain, each of the further flags indicating whether the blocks of the slice are encoded in the spatial domain or in the frequency domain, and wherein the further flags indicating whether the blocks of the slice are encoded in the spatial domain or in the frequency domain, respectively, are encoded using CABAC, conditioned on the flags of top-side and left-side neighboring blocks that have been encoded.
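
The signaling described above can be sketched as follows. The sketch assumes that a further flag of 1 means "spatial domain", that a slice flag of 1 means "all blocks in the frequency domain", and that neighbors outside the slice count as 0; the context index for the CABAC engine is taken as the sum of the left and top flags. These conventions and the function names are hypothetical illustrations; the actual binary arithmetic coder is not shown.

```python
def domain_flag_context(flags, row, col):
    """Context index (0, 1 or 2) for the domain flag of block (row, col).

    Only the left and top neighbors are used; both precede the current
    block in raster order, so they are already coded.
    """
    left = flags[row][col - 1] if col > 0 else 0
    top = flags[row - 1][col] if row > 0 else 0
    return left + top


def slice_side_info(block_domains):
    """Return (slice_flag, [(flag, context), ...]) for one slice in raster order.

    `block_domains` is a 2-D list of "frequency" / "spatial" decisions.
    """
    flags = [[1 if d == "spatial" else 0 for d in row] for row in block_domains]
    if all(f == 0 for row in flags for f in row):
        return 1, []  # a single flag covers the whole slice
    coded = []
    for r, row in enumerate(flags):
        for c, flag in enumerate(row):
            # Each (flag, context) pair would be passed to a CABAC engine.
            coded.append((flag, domain_flag_context(flags, r, c)))
    return 0, coded


slice_flag, block_flags = slice_side_info([["frequency", "spatial"],
                                           ["spatial", "spatial"]])
print(slice_flag, block_flags)
```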

According to another aspect of the present invention, there is provided a method of decoding a video signal using hybrid decoding, the method including: receiving an encoded video signal comprising encoded video data, the encoded video data comprising encoded frequency domain data and/or encoded spatial domain data; decoding the received encoded video data; performing an inverse transform of the video data from the frequency domain into the spatial domain, or skipping the inverse transform of the video data from the frequency domain into the spatial domain, wherein a flag indicating whether all blocks of a current slice are encoded in the frequency domain is read from the encoded video signal and, depending on the value of the flag, further flags are read for the blocks of the slice, each of the further flags indicating whether the respective block of the slice is encoded in the spatial domain or in the frequency domain, wherein the further flags are decoded using CABAC conditioned on the flags of the already decoded top-side and left-side neighboring blocks, and wherein an inverse transform from the frequency domain to the spatial domain is performed on the current slice or its blocks depending on the values of the flag and the further flags.
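
The parsing order on the decoder side can be sketched as follows. `read_slice_flag` and `decode_bin(context)` are hypothetical callbacks standing in for the bitstream reader and the CABAC decoding engine; the flag conventions (slice flag 1 means all blocks are in the frequency domain, further flag 1 means spatial domain) are assumptions of this sketch.

```python
def parse_block_domains(read_slice_flag, decode_bin, rows, cols):
    """Return a rows x cols grid of domain flags (0 = frequency, 1 = spatial)."""
    flags = [[0] * cols for _ in range(rows)]
    if read_slice_flag() == 1:
        return flags  # all blocks of the slice are in the frequency domain
    for r in range(rows):
        for c in range(cols):
            # Context from the already decoded left and top neighbors.
            left = flags[r][c - 1] if c > 0 else 0
            top = flags[r - 1][c] if r > 0 else 0
            flags[r][c] = decode_bin(left + top)
    return flags


def needs_inverse_transform(flags, row, col):
    """The inverse transform is applied only to frequency-domain blocks."""
    return flags[row][col] == 0


# Stub standing in for a CABAC decoding engine: returns pre-recorded bins.
bins = iter([0, 1, 1, 1])
contexts_used = []

def stub_decode_bin(ctx):
    contexts_used.append(ctx)
    return next(bins)

flags = parse_block_domains(lambda: 0, stub_decode_bin, 2, 2)
print(flags, contexts_used)
```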

According to another aspect of the present invention, there is provided an encoder for encoding a video signal using hybrid encoding, the encoder including: means for reducing temporal redundancy by block-based motion compensated prediction in order to establish a prediction error signal; adaptive control means for deciding whether to transform the prediction error signal into the frequency domain or to maintain the prediction error signal in the spatial domain; and encoding means adapted to encode the prediction error signal transformed into the frequency domain or kept in the spatial domain, wherein flags are provided to indicate whether all blocks of a current slice are encoded in the frequency domain, wherein further flags are provided for the blocks of the slice if not all blocks of the slice are encoded in the frequency domain, each of the further flags indicating whether the block of the slice is encoded in the spatial domain or in the frequency domain, and wherein the further flags indicating whether the block of the slice is encoded in the spatial domain or in the frequency domain, respectively, are encoded using CABAC, conditioned on the flags of already encoded top-side and left-side neighboring blocks.

According to another aspect of the present invention, there is provided a decoder for decoding a video signal encoded by using hybrid encoding, the decoder including: receiving means configured to receive an encoded video signal comprising encoded frequency domain data and/or encoded spatial domain data; and adaptive control means for adaptively determining whether the received encoded video data represents a prediction error signal of the encoded video signal in the spatial domain or in the frequency domain, wherein a flag indicating whether all blocks of a current slice are encoded in the frequency domain is read from the encoded video signal and, depending on the value of the flag, further flags are read for the blocks of the slice, each of the further flags indicating whether the respective block of the slice is encoded in the spatial domain or in the frequency domain, wherein the further flags are decoded using CABAC conditioned on the flags of the already decoded top-side and left-side neighboring blocks, and wherein an inverse transform from the frequency domain to the spatial domain is performed on the current slice or its blocks depending on the values of the flag and the further flags.

Drawings

These aspects of the invention are illustrated by the preferred embodiments described below in conjunction with the drawings.

FIG. 1 shows a simplified block diagram of an encoder implementing the present invention;

FIG. 2 shows a simplified block diagram of a decoder implementing the present invention;

FIG. 3 illustrates a prior art scanning scheme;

FIG. 4 illustrates a scanning scheme of the present invention;

FIG. 5 shows the parameters of the optimized quantizer of the present invention; and

FIG. 6 shows a simplified representation of the measured average absolute reproduction error of a picture element for subjectively weighted quantization in the frequency domain (FIG. 6(a)) and in the spatial domain (FIG. 6(b)).

Detailed Description

Fig. 1 shows a simplified block diagram of an encoder according to the present invention. The input signal 101 undergoes motion estimation, on the basis of which motion compensated prediction is performed to provide a prediction signal 104, which is subtracted from the input signal 101. The resulting prediction error signal 105 is transformed into the frequency domain by block 106 and quantized into frequency-dependent coefficients by the optimal quantizer 107. The output signal 120 of the quantizer 107 is passed to an entropy coder 113, which provides an output signal 116 to be transmitted, stored, etc. The quantized prediction error signal 120 is also used for the next prediction step of the motion compensated prediction block 103 by means of an inverse quantization block 110 and an inverse transform block 111. The inverse quantized, inverse DCT transformed prediction error signal is added to the prediction signal and passed to a frame memory 122, which stores previous images for the motion compensated prediction block 103 and the motion estimation block 102. In addition to the prior art, the present invention employs an adaptive control means 115 to switch between the frequency domain and the spatial domain for coding the prediction error signal 105. The adaptive control means 115 generates the signals and parameters that control the adaptive switching between the frequency and spatial domains. The adaptive control information signal 121 is provided to two switches that switch between positions A and B. If the prediction error is coded in the frequency domain, both switches are in position A; if the spatial domain is used, both switches are in position B. The side information signal 121, i.e. which domain has been used for encoding the picture, is also passed to the entropy encoder 113, so that this information is included in the data stream.

In parallel to the frequency transform, the prediction error signal 105 is passed via another path to the quantizer 109, which provides an optimal quantization of the prediction error signal 105 in the spatial domain. The quantized prediction error signal 124 in the spatial domain may be passed to the second inverse quantization block 122 and from there to the motion compensated prediction block 103. In addition, a scan control block 114 receives the motion vector 123 and the inverse quantized prediction error signal 118 or the prediction signal 104 via connection 119. Block 117 encodes the motion information.
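
The two switch positions can be illustrated with a small sketch of both coding paths for one block. It assumes a floating-point orthonormal 4 × 4 DCT-II and a plain uniform rounding quantizer with step size `step` in both paths; these stand in for blocks 106/107 and 109 and are neither the optimized quantizers of the invention nor the integer transform of H.264/AVC. All names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis function."""
    m = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        m.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def quantize(block, step):
    return [[round(v / step) for v in row] for row in block]


def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]


def code_prediction_error(error, position, step):
    """Return (levels, reconstructed_error) for switch position "A" or "B"."""
    if position == "A":    # frequency domain: transform, then quantize
        c = dct_matrix(len(error))
        coeffs = matmul(matmul(c, error), transpose(c))
        levels = quantize(coeffs, step)
        rec = matmul(matmul(transpose(c), dequantize(levels, step)), c)
    elif position == "B":  # spatial domain: quantize the samples directly
        levels = quantize(error, step)
        rec = dequantize(levels, step)
    else:
        raise ValueError("position must be 'A' or 'B'")
    return levels, rec


spike = [[0, 0, 0, 0], [0, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
levels_a, _ = code_prediction_error(spike, "A", step=1.0)
levels_b, rec_b = code_prediction_error(spike, "B", step=1.0)
nonzero = lambda lv: sum(1 for row in lv for q in row if q != 0)
print(nonzero(levels_a), nonzero(levels_b))  # 16 1
```

An isolated error sample, as may occur at the edge of a moving object, needs a single nonzero level in position B but spreads to all sixteen coefficients in position A; this is the kind of block for which the adaptive control means 115 would select the spatial domain.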

The adaptive control block 115 decides whether a block is encoded in the frequency domain or in the spatial domain, and it generates corresponding side information to indicate this domain. The decision made by the adaptive control means is based on the rate-distortion costs of coding in the spatial domain and in the frequency domain; the domain with the lower rate-distortion cost is selected for encoding. For example, the rate-distortion cost C is calculated by weighting the required rate R with the Lagrangian parameter L and adding the resulting distortion D: C = L × R + D. The mean square quantization error may be used as the distortion measure, but other measures, such as the mean absolute quantization error, may also be applied. As the Lagrangian parameter, the value L = 0.85 × 2^((QP-12)/3) commonly used in H.264/AVC encoder control can be used. Other methods for determining the rate-distortion cost are also possible.
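
The Lagrangian parameter given above and the resulting cost can be computed as follows. This is a minimal sketch; the rates and distortions in the usage lines are hypothetical numbers chosen only to show the comparison.

```python
def lagrangian_from_qp(qp):
    """Lagrangian parameter L = 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def rd_cost(rate_bits, distortion, qp):
    """Rate-distortion cost C = L * R + D with L derived from QP."""
    return lagrangian_from_qp(qp) * rate_bits + distortion


for qp in (12, 24, 36):
    print(qp, lagrangian_from_qp(qp))

# Hypothetical block at QP 24: the spatial path needs fewer bits but
# produces more distortion than the frequency path.
cost_frequency = rd_cost(40, 30.0, 24)
cost_spatial = rd_cost(32, 45.0, 24)
print("spatial" if cost_spatial < cost_frequency else "frequency")
```

Because L doubles for every increase of QP by 3, the rate term dominates at coarse quantization, so at high QP the domain that needs fewer bits tends to win.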

Alternatively, the adaptive control 115 may control the choice of coding method in other ways, for example based on the prediction signal, on the correlation of the prediction error, or on the domain in which the prediction error at the motion compensated position of the already transmitted frame was encoded.
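
A correlation-based decision could be sketched as follows. The measure used here, the normalized covariance of horizontally and vertically adjacent samples, and the threshold of 0.5 are assumptions chosen for illustration; the text does not specify either. Blocks at least two samples wide or high are assumed.

```python
def lag1_correlation(block):
    """Normalized correlation of horizontally and vertically adjacent samples."""
    samples = [v for row in block for v in row]
    mean = sum(samples) / len(samples)
    var = sum((v - mean) ** 2 for v in samples) / len(samples)
    if var == 0:
        return 1.0  # flat block: treated as fully correlated (assumption)
    rows, cols = len(block), len(block[0])
    pairs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                pairs.append((block[r][c], block[r][c + 1]))  # horizontal neighbor
            if r + 1 < rows:
                pairs.append((block[r][c], block[r + 1][c]))  # vertical neighbor
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return cov / var


def domain_from_correlation(block, threshold=0.5):
    """Frequency domain for correlated errors, spatial domain otherwise."""
    return "frequency" if lag1_correlation(block) >= threshold else "spatial"


ramp = [[r + c for c in range(4)] for r in range(4)]  # smooth prediction error
spike = [[8 if (r, c) == (1, 1) else 0 for c in range(4)] for r in range(4)]
print(domain_from_correlation(ramp), domain_from_correlation(spike))
```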

Fig. 2 shows a simplified block diagram of the architecture of a decoder according to aspects of the present invention. The encoded video data is input to two entropy decoding blocks 201 and 202. The entropy decoding block 202 decodes motion compensation information such as motion vectors. The entropy decoding block 201 applies the inverse of the coding mechanism used in the encoder, for example CABAC or CAVLC decoding. If the encoder employs different coding schemes for the coefficients in the frequency domain and the samples in the spatial domain, the corresponding decoding mechanisms are used in the entropy decoding block. The entropy decoding block 201 generates the signals that switch between positions A and B, selecting either the inverse quantization path for the spatial domain (position B, inverse quantization block 206) or the path for the frequency domain (position A, inverse quantization block 203 and inverse transform block 204). If the prediction error was coded in the frequency domain, the inverse quantization block 203 and the inverse transform block 204 apply the corresponding inverse operations. Since the samples in the spatial domain are arranged in a specific order according to the scanning mechanism of the present invention, the scan control unit 205 provides the entropy decoding block 201 with the correct sample order. If the encoding has been done in the spatial domain, the inverse transform block 204 and the inverse quantization block 203 are bypassed, and inverse quantization is performed in block 206 instead. The switching between the frequency domain and the spatial domain (i.e. between positions A and B) is controlled by side information that is sent in the bitstream and decoded by the entropy decoding block 201.

The inverse quantized signal in the spatial domain, or the inverse quantized and inverse transformed signal in the frequency domain, is added to the motion compensated prediction picture to provide the decoded video signal 210. Motion compensation is performed in block 209 based on previously decoded video data (the previous picture) and the motion vectors. The scan control unit 205 uses the prediction image 208, or the prediction error signal 207 in combination with the motion vector 212, to determine the correct scan order of the samples. The scanning mechanism may also be based on both pictures, the prediction error picture and the prediction picture. As explained for the encoding mechanism with reference to Fig. 1, the scan order during encoding may be based on a combination of the prediction error information 207 and the motion vectors; the motion vectors are therefore passed to the scan control unit 205 via path 212. In addition, consistent with Fig. 1, a frame memory 211 stores the previously decoded pictures required for prediction.
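
Because the scan order is derived from data that the decoder also has (the prediction image or the previously decoded prediction error), the decoder can rebuild the block without any transmitted scan information. A minimal sketch of this inverse scan, assuming the scan order is already available as a list of (row, col) positions; the names and the example order are hypothetical.

```python
def forward_scan(block, scan_order):
    """Encoder side: read the block samples in the given order."""
    return [block[r][c] for r, c in scan_order]


def inverse_scan(samples, scan_order, n):
    """Decoder side: put received samples back at their block positions.

    Samples that were not transmitted (e.g. trailing zeros) stay zero.
    """
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(samples, scan_order):
        block[r][c] = value
    return block


# Hypothetical scan order, as the scan control unit might derive it.
order = [(0, 1), (1, 1), (1, 0), (0, 0)]
received = [7, -2]  # trailing zeros not transmitted
print(inverse_scan(received, order, 2))  # [[0, 7], [0, -2]]
```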

Fig. 3 shows a simplified diagram illustrating the prior-art zig-zag scanning sequence. The coefficients resulting from the transformation into the frequency domain (e.g. DCT) are arranged in 4 × 4 blocks and read in the predetermined order shown in Fig. 3, such that the coefficient representing the lowest frequency is placed at the first (leftmost) position of the one-dimensional array. The further toward the lower right of the block a coefficient lies, the higher its frequency and the later it appears in the array. Since the energy of a block to be encoded is usually concentrated in the low-frequency coefficients, the high-frequency coefficients, or at least the vast majority of them, are zero. This can be exploited to reduce the data to be transmitted, for example by replacing long runs of zeros with a simple indication of the number of zeros.
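
Replacing runs of zeros by their length can be sketched as a generic (run, level) representation of the scanned array. This is a simplified illustration assuming the array length is known to the decoder; it is not the actual CAVLC or CABAC syntax, and the function names are hypothetical.

```python
def run_level_pairs(scanned):
    """Convert a scanned 1-D array into (zero_run, level) pairs.

    Trailing zeros after the last nonzero value are not coded at all.
    """
    last = max((i for i, v in enumerate(scanned) if v != 0), default=-1)
    pairs, run = [], 0
    for v in scanned[:last + 1]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


def expand_run_level(pairs, length):
    """Inverse operation: rebuild the array of the given length."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out


scanned = [9, 3, 0, 0, -1] + [0] * 11
print(run_level_pairs(scanned))  # [(0, 9), (0, 3), (2, -1)]
```

Sixteen values are thus represented by three pairs; the better the scan concentrates nonzero values at the front, the fewer pairs remain.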

Fig. 4 illustrates a simplified exemplary embodiment of a scanning mechanism according to an aspect of the present invention. Fig. 4(a) shows the gradient magnitudes within the prediction image of a block: the value at each position represents the gradient of the prediction image of the current block at that position. The gradient is a vector with two components, representing the gradient in the horizontal and vertical directions. Each component may be determined as the difference of two adjacent samples or with the well-known Sobel operator, which takes six neighboring samples into account. The gradient magnitude is the magnitude of this vector. The samples are scanned in order of decreasing gradient magnitude within the block, as indicated by the dashed line; if two positions have the same magnitude, a fixed, predetermined order is applied. Once the scan order has been established from the gradients of the prediction image, the same scan order is applied to the quantized prediction error samples shown in Fig. 4(b). If the quantized spatial-domain samples of the block in Fig. 4(b) are arranged in a one-dimensional array according to this scan order, as shown on the left side of Fig. 4(b), the samples with large values typically appear first in the array, i.e. on the left, while the positions on the right are filled with zeros.
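
The gradient-controlled scan can be sketched as follows, using the simple difference of two adjacent samples rather than the Sobel operator. The sketch assumes blocks of at least 2 × 2 samples, uses the difference to the right (lower) neighbor and, in the last column (row), to the left (upper) neighbor, and breaks ties in raster order as the fixed predetermined order. These details and all names are assumptions.

```python
import math


def gradient_magnitudes(pred):
    """Gradient magnitude of the prediction block at every position."""
    rows, cols = len(pred), len(pred[0])
    mags = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Horizontal component: difference to the right neighbor
            # (to the left neighbor in the last column).
            gx = pred[r][c + 1] - pred[r][c] if c + 1 < cols else pred[r][c] - pred[r][c - 1]
            # Vertical component: difference to the lower neighbor
            # (to the upper neighbor in the last row).
            gy = pred[r + 1][c] - pred[r][c] if r + 1 < rows else pred[r][c] - pred[r - 1][c]
            mags[r][c] = math.hypot(gx, gy)
    return mags


def gradient_scan_order(pred):
    """Positions sorted by decreasing gradient magnitude."""
    mags = gradient_magnitudes(pred)
    positions = [(r, c) for r in range(len(pred)) for c in range(len(pred[0]))]
    # sorted() is stable, so equal magnitudes keep raster order (assumed tie rule).
    return sorted(positions, key=lambda p: -mags[p[0]][p[1]])


prediction = [[0, 0, 10, 10] for _ in range(4)]  # vertical edge after column 1
error = [[0, 5, 0, 0], [0, -4, 0, 0], [0, 6, 0, 0], [0, 3, 0, 0]]
order = gradient_scan_order(prediction)
print([error[r][c] for r, c in order])
```

The prediction error concentrated along the edge of the prediction image ends up at the front of the array, followed by a run of zeros, which is the arrangement shown in Fig. 4(b).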

Instead of a gradient-controlled scan, other scans may be applied, such as a predetermined scan, a scan controlled by the quantized prediction error of the already transmitted frame in combination with the motion vector, or a combination of these (scan control is performed by block 114 or 205, as described with reference to Figs. 1 and 2). If the scan is controlled by the prediction error signal in combination with the motion vector, the samples are scanned in order of decreasing magnitude of the quantized prediction error samples of the block to which the motion vector of the current block refers.

If the motion vector points to a fractional sample position, interpolation may be used to determine the required quantized prediction error samples.

This may be the same interpolation method as the one used on the reference picture to generate the prediction samples.
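
The scan controlled by the quantized prediction error of the previously transmitted frame can be sketched as follows. For fractional motion vectors the sketch uses bilinear interpolation with clamping at the frame border; this is a simplifying assumption (H.264/AVC, for example, uses a longer interpolation filter for luma). Positions are given as column x and row y, and all names are hypothetical.

```python
import math


def sample_at(frame, x, y):
    """Bilinearly interpolated value of `frame` at column x, row y (clamped)."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


def referenced_error_block(prev_error, bx, by, mv, n):
    """n x n block of the previous quantized prediction error addressed by mv."""
    mvx, mvy = mv
    return [[sample_at(prev_error, bx + c + mvx, by + r + mvy) for c in range(n)]
            for r in range(n)]


def error_scan_order(prev_error, bx, by, mv, n):
    """Positions sorted by decreasing magnitude of the referenced error."""
    ref = referenced_error_block(prev_error, bx, by, mv, n)
    positions = [(r, c) for r in range(n) for c in range(n)]
    return sorted(positions, key=lambda p: -abs(ref[p[0]][p[1]]))


prev_error = [[0, 0, 7, 0], [0, 3, -9, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(error_scan_order(prev_error, 0, 0, (1, 0), 2))
```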

If the scan is controlled by a combination of the prediction image and the prediction error image together with the motion vector, a linear combination of the gradient magnitudes and of the magnitudes of the quantized prediction error samples of the block to which the motion vector of the current block refers is calculated, and the samples are scanned according to the values of these linear combinations. In addition, the scan determination method may be signaled for portions of the sequence (e.g. for each frame, for each slice or for a group of blocks). With the usual standard processing, the motion vectors are already taken into account when the prediction image is determined.
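
The linear combination can be sketched as follows, assuming that the gradient magnitudes of the prediction block and the referenced quantized prediction error block have already been computed. The weights `w_grad` and `w_err`, the default of 1.0 for both, and the scan in order of decreasing combined value are assumptions of this sketch.

```python
def combined_scan_order(grad_mags, ref_error, w_grad=1.0, w_err=1.0):
    """Positions sorted by decreasing w_grad * |gradient| + w_err * |error|."""
    rows, cols = len(grad_mags), len(grad_mags[0])

    def score(p):
        r, c = p
        return w_grad * grad_mags[r][c] + w_err * abs(ref_error[r][c])

    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Stable sort: equal scores keep raster order.
    return sorted(positions, key=lambda p: -score(p))


grad = [[0, 4], [1, 0]]
err = [[0, -1], [5, 0]]
print(combined_scan_order(grad, err))
```

Setting one weight to zero recovers the purely gradient-controlled or the purely error-controlled scan.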

According to another aspect of the invention, the scanning order may also be based on prediction error pictures in combination with motion vectors. In addition, a combination of the gradient principle and the prediction error picture as described above is also conceivable.

Fig. 5 shows a simplified view explaining the definition of an optimal quantizer according to aspects of the present invention. The three parameters a, b and c modify the quantizer. According to the H.264/AVC standard, a rate-distortion optimized quantizer is applied to the coefficients, with two different distortion measures: the first is the mean square quantization error and the second is the subjectively weighted quantization error. Analogously, two quantizers were developed for the prediction error samples. Since the distribution of the prediction error is close to a Laplacian distribution, a scalar dead-zone uniform threshold quantizer is used in the case of mean square quantization error optimization. Fig. 5 illustrates the parameters a, b and c of quantization and dequantization.
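
A generic scalar dead-zone uniform threshold quantizer can be sketched with a step size and a rounding offset. How these two values correspond to the parameters a, b and c of Fig. 5 is not stated in the text, so the parameter names and the example values below are hypothetical.

```python
import math


def dz_quantize(x, step, offset):
    """Dead-zone uniform threshold quantizer; returns an integer level.

    Inputs with |x| < (1 - offset) * step map to level 0, so for
    offset < 0.5 the zero interval (the dead zone) is wider than the others.
    """
    level = math.floor(abs(x) / step + offset)
    return level if x >= 0 else -level


def dz_dequantize(level, step):
    """Uniform reconstruction at integer multiples of the step size."""
    return level * step


step, offset = 10.0, 1 / 6  # hypothetical values
for x in (8, 9, -25):
    q = dz_quantize(x, step, offset)
    print(x, q, dz_dequantize(q, step))
```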

Table 1 shows values of the parameters a, b and c that are advantageously used for the QPs (quantization parameters) commonly used in the H.264/AVC coding scheme. These parameters are optimal for mean square quantization error optimization. This is, of course, only one example; other parameters may be used for other applications.

TABLE 1

For subjectively weighted quantization error optimization, a non-uniform quantizer is proposed with representative values r_i and -r_i and decision thresholds between adjacent representative values; its values are also given in Table 1. If large prediction errors occur at edges, visual masking can be exploited: a large quantization error may be allowed at edges, whereas only a small quantization error is allowed where the image signal is flat. H.264/AVC can employ more QPs than the four shown in Table 1; it supports 52 different QPs, so Table 1 must be extended accordingly. The basic idea for determining suitable representative values r_i and -r_i is explained below with reference to Fig. 6.
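
Such a non-uniform quantizer can be sketched as a nearest-neighbor mapping onto the representative values. The sketch assumes that 0 is also a reconstruction value and that the decision thresholds lie midway between adjacent representative values (ties go to the smaller value); the levels in the example are hypothetical and are not the values of Table 1.

```python
def nonuniform_quantize(x, levels):
    """Map x to the nearest value in {0, +-r_1, +-r_2, ...}.

    With nearest-value mapping, the decision thresholds lie midway
    between adjacent representative values.
    """
    candidates = [0] + sorted(levels)
    magnitude = min(candidates, key=lambda r: abs(abs(x) - r))
    return magnitude if x >= 0 else -magnitude


levels = [3, 8, 16, 30]  # hypothetical r_1..r_4, NOT the values of Table 1
for x in (1, 4, -20, 25):
    print(x, nonuniform_quantize(x, levels))
```

The growing spacing between the representative values lets large prediction errors, which typically occur at edges where visual masking hides them, be quantized more coarsely than small ones.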

Fig. 6 shows a simplified representation of the measured average absolute reproduction error of a picture element for subjectively weighted quantization in the frequency domain (Fig. 6(a)) and in the spatial domain (Fig. 6(b)). The measured average absolute reproduction error of subjectively weighted quantization in the frequency domain is shown as a function of the absolute value of the prediction error. For subjectively weighted quantization in the spatial domain, the representative values r_i are adjusted so that the average absolute reproduction error is the same for quantization in the frequency domain and in the spatial domain, with respect to the quantization intervals in the spatial domain. By way of example only, the values r_1, r_2, r_3, r_4 for QP = 26 given in Table 1 are also shown in Fig. 6(b). As a rule of thumb, the representative values r_i approximately double when QP is increased by 6. The quantizer design may also exploit other properties of the visual system. Moreover, quantizers may be used that produce quantization errors with characteristics different from those of the H.264/AVC quantizer.
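
Written as a formula, the rule of thumb is an exponential dependence on QP, analogous to the H.264/AVC quantizer step size, which doubles for every increase of 6 in QP. Taking QP = 26 as the reference, it approximates (but does not replace) the tabulated values:

```latex
r_i(\mathrm{QP}) \approx r_i(26)\cdot 2^{(\mathrm{QP}-26)/6},
\qquad\text{e.g. } r_i(32) \approx 2\,r_i(26),\quad r_i(38) \approx 4\,r_i(26),\quad r_i(20) \approx \tfrac{1}{2}\,r_i(26).
```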

Entropy coding of quantized samples in the spatial domain

According to one aspect of the present invention, entropy coding in the spatial domain may be based on the same methods as those used for the quantized coefficients in the frequency domain. For the H.264/AVC standard, the two preferred entropy coding methods are CABAC and CAVLC. According to this aspect of the present invention, however, these methods are applied to the quantized samples in the spatial domain instead of to the quantized coefficients in the frequency domain. As described above, the scanning order may be changed so as to provide the same data reduction as in the frequency domain. As proposed above, the scan in the spatial domain can be controlled by the magnitude of the gradient of the prediction image signal at the same spatial position. According to this principle, the samples to be coded are arranged in decreasing order of gradient magnitude, as already explained in connection with Figs. 4(a) and 4(b). Other scanning mechanisms may also be applied, as described above. Furthermore, according to other aspects of the present invention, separate codes can be used for the spatial domain; in the case of CABAC, this means separate probability models. The codes and, in the case of CABAC, the initialization of the probability models can be derived from the statistics of the quantized samples. Context modeling in the spatial domain can be done in the same way as in the frequency domain.
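
The gradient-controlled scan and the matching decoder-side rearrangement can be sketched as follows. The gradient operator (sum of absolute central differences, one-sided at block borders) and the raster-order tie-breaking are assumptions made for illustration; the actual principle is the one explained with Figs. 4(a) and 4(b). Under these assumptions the order depends only on the prediction signal, so the decoder can derive the same order without extra side information.

```python
from typing import List, Sequence, Tuple

Block = List[List[float]]
Position = Tuple[int, int]


def gradient_magnitude(img: Block) -> Block:
    """|dx| + |dy| from central differences, one-sided at the block borders (assumed operator)."""
    h, w = len(img), len(img[0])
    return [[abs(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)])
             + abs(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x])
             for x in range(w)] for y in range(h)]


def gradient_scan_order(pred: Block) -> List[Position]:
    """Positions sorted by decreasing gradient of the prediction signal.
    sorted() is stable, so equal gradients keep raster order on both sides."""
    grad = gradient_magnitude(pred)
    positions = [(y, x) for y in range(len(pred)) for x in range(len(pred[0]))]
    return sorted(positions, key=lambda p: -grad[p[0]][p[1]])


def scan(block: Block, order: Sequence[Position]) -> List[float]:
    """Encoder side: 2-D quantized prediction error samples -> 1-D array."""
    return [block[y][x] for y, x in order]


def inverse_scan(samples: Sequence[float], order: Sequence[Position], h: int, w: int) -> Block:
    """Decoder side: put the received 1-D samples back at their 2-D positions."""
    out: Block = [[0] * w for _ in range(h)]
    for value, (y, x) in zip(samples, order):
        out[y][x] = value
    return out
```

The one-dimensional array produced by `scan` is what the subsequent CABAC or CAVLC stage would receive. Large samples, which tend to coincide with edges in the prediction image, come first.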

Encoding of side information

The adaptive control means described in connection with Fig. 1 generates information about the domain in which a block is coded. The block size may be 4 × 4 or 8 × 8 picture elements, depending on the transform size. Of course, other block sizes independent of the transform size may also be used in accordance with various aspects of the present invention. According to one aspect of the invention, the side information comprises a specific flag indicating whether the coding scheme has been adaptively changed during encoding. If, for example, all blocks of a slice are coded in the frequency domain, this may be indicated by a specific bit in the coded video data signal. This aspect of the invention may also be applied to the blocks of a macroblock, which may be coded in both domains or in only one domain. In addition, the idea according to this aspect of the invention may be applied at the macroblock level, with information in the data stream indicating whether at least one block of the macroblock is coded in the spatial domain.

Accordingly, a flag Slice_FD_SD_coding_flag may be used to indicate whether all blocks of the current slice are coded in the frequency domain or whether at least one block is coded in the spatial domain. This flag may be coded with a single bit. If at least one block of the slice is coded in the spatial domain, a flag MB_FD_coding_flag may indicate, for each macroblock of the current slice, whether all blocks of that macroblock are coded in the frequency domain or whether at least one block is coded in the spatial domain. This flag may be coded conditional on the flags of the already coded neighboring blocks above and to the left. If at least one block of a macroblock is coded in the spatial domain, a flag FD_or_SD_Flag may indicate, for each block of the macroblock to be coded, whether the current block is coded in the frequency domain or in the spatial domain. This flag may likewise be coded conditional on the flags of the already coded neighboring blocks above and to the left. Alternatively, the side information may be coded conditional on the prediction signal, or on the prediction error signal in combination with the motion vector.
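
The flag hierarchy and the neighbor-conditioned coding can be sketched on the encoder side as follows. The flag polarity (1 = at least one spatial-domain block), the data layout and the context rule are assumptions. The context rule mirrors a scheme H.264/AVC uses for several CABAC flags such as mb_skip_flag, where the context increment is the sum of conditions from the left and top neighbors. The binary arithmetic coder itself is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FD, SD = 0, 1  # hypothetical coding of the domain decision


@dataclass
class DomainFlags:
    slice_fd_sd_coding_flag: int
    mb_fd_coding_flag: List[int] = field(default_factory=list)
    # one list of per-block flags per macroblock, or None if the macroblock is all FD
    fd_or_sd_flag: List[Optional[List[int]]] = field(default_factory=list)


def derive_domain_flags(slice_domains: List[List[int]]) -> DomainFlags:
    """slice_domains[m][b] is the domain (FD or SD) chosen for block b of macroblock m."""
    slice_flag = int(any(d == SD for mb in slice_domains for d in mb))
    flags = DomainFlags(slice_flag)
    if slice_flag:  # per-macroblock flags only if some block uses the spatial domain
        for mb in slice_domains:
            mb_flag = int(any(d == SD for d in mb))
            flags.mb_fd_coding_flag.append(mb_flag)
            # per-block flags only for macroblocks that contain a spatial-domain block
            flags.fd_or_sd_flag.append(list(mb) if mb_flag else None)
    return flags


def ctx_idx_inc(left: Optional[int], top: Optional[int]) -> int:
    """Context increment 0..2 from the flags of the left and top neighbors;
    an unavailable neighbor (None) counts as 0 (assumed)."""
    return (left or 0) + (top or 0)
```

In a slice coded entirely in the frequency domain, only the single slice-level bit is produced. This is where the hierarchy saves side information.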

Syntax and semantics

According to this aspect of the invention, an exemplary syntax and semantics are introduced that allow aspects of the invention to be incorporated into the H.264/AVC coding scheme. Accordingly, a flag Slice_FD_SD_coding_flag may be introduced in slice_header, as shown in Table 2. The flag MB_FD_SD_coding_flag may be transmitted in each macroblock_layer, as shown in Table 3. In residual_block_cabac, a flag FD_or_SD_Flag may indicate whether the current block is coded in the frequency domain or in the spatial domain, as shown in Table 4. Similar schemes can be applied to prediction error coding in other video coding algorithms.
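
On the decoder side the three flags are read hierarchically: block-level flags are present only where the higher-level flags announce spatial-domain blocks. In this minimal sketch, `read_flag` is a hypothetical stand-in for the CABAC decoding of one flag (context selection omitted), and the polarity 1 = spatial domain is assumed.

```python
from typing import Callable, List

FD, SD = 0, 1  # hypothetical coding of the domain decision


def parse_block_domains(read_flag: Callable[[str], int],
                        num_mbs: int, blocks_per_mb: int) -> List[List[int]]:
    """Hierarchical parsing of the domain flags for one slice.
    read_flag(name) stands in for the entropy decoder and returns 0 or 1."""
    if read_flag("Slice_FD_SD_coding_flag") == 0:
        # all blocks of the slice are in the frequency domain
        return [[FD] * blocks_per_mb for _ in range(num_mbs)]
    domains = []
    for _ in range(num_mbs):
        if read_flag("MB_FD_SD_coding_flag") == 0:
            domains.append([FD] * blocks_per_mb)  # whole macroblock in the frequency domain
        else:
            # one FD_or_SD_Flag per block; 1 = spatial domain (assumed polarity)
            domains.append([SD if read_flag("FD_or_SD_Flag") else FD
                            for _ in range(blocks_per_mb)])
    return domains
```

The returned per-block domains tell the decoder for which blocks the inverse transform is performed and for which it is skipped.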

TABLE 2

TABLE 3

TABLE 4

The invention provides the following inventive concepts:

1. a method of encoding a video signal using hybrid encoding, comprising:

reducing temporal redundancy by block-based motion compensated prediction to establish a prediction error signal;

deciding whether to transform the prediction error signal into the frequency domain or to keep it in the spatial domain for encoding.

2. The method according to inventive concept 1, wherein the step of deciding is based on a cost function.

3. The method according to inventive concept 1 or 2, wherein the cost function comprises rate-distortion costs of encoding in the spatial domain and encoding in the frequency domain.

4. The method according to inventive concept 3, wherein the rate-distortion cost is calculated by weighting the required rate (R) and the resulting distortion (D) by Lagrangian parameters.

5. The method according to inventive concept 4, wherein the distortion measure is a mean square quantization error or a mean absolute quantization error.

6. The method according to any of the preceding inventive concepts, wherein the samples in the spatial domain are encoded by the same method as the coefficients in the frequency domain.

7. The method according to inventive concept 6, wherein the encoding of the coefficients is based on CABAC or CAVLC.

8. Method for encoding a video signal using hybrid coding, wherein temporal redundancy is reduced by block-based motion compensated prediction, the method comprising:

providing samples of a prediction error signal within a prediction error block in the spatial domain;

scanning the samples within the prediction error block to provide an array of samples in a particular order, wherein the scanning scheme is derived from the prediction error image or the prediction image.

9. The method according to inventive concept 8, wherein the scanning scheme is derived from a gradient of the prediction image.

10. The method according to inventive concept 8, wherein the scanning scheme is based on a motion vector combined with a prediction error image of the reference block.

11. The method according to inventive concept 8, wherein the scanning scheme is derived from a linear combination of a gradient of the prediction image and a prediction error image of the reference block combined with the motion vector.

12. The method according to any of the preceding inventive concepts, wherein for the spatial domain, a specific coding with separate probability models for CABAC is employed.

13. The method according to any of the preceding inventive concepts, wherein for the spatial domain a specific coding for CAVLC is employed.

14. The method according to any of the preceding inventive concepts, further comprising quantizing the prediction error samples in the spatial domain by a quantizer optimized for the subjectively weighted quantization error or for the mean square quantization error.

15. A data signal representing an encoded video signal comprising coding information of a prediction error signal partially encoded in the spatial domain and partially encoded in the frequency domain.

16. The data signal according to inventive concept 15, comprising information about the domain in which a slice, a macroblock or a block is encoded, in particular information on whether a slice, a macroblock or a block is encoded in the spatial domain or in the frequency domain.

17. The data signal according to inventive concept 16, comprising slice_fd_sd_coding_flag, mb_fd_coding_flag and/or fd_or_sd_flag information related to coding for a slice, macroblock or block, respectively.

18. A method of decoding a video signal using hybrid coding, comprising:

efficiently decoding the encoded video data in the frequency domain or in the spatial domain according to the encoding scheme used to encode the video signal data.

19. The decoding method according to inventive concept 18, wherein the prediction error signal samples received as a one-dimensional array are assigned to two-dimensional positions, the two-dimensional positions being determined on the basis of the previously received prediction error signal or prediction image.

20. An encoder for encoding a video signal using hybrid encoding, comprising:

means for reducing temporal redundancy by block-based motion compensated prediction in order to establish a prediction error signal; and

adaptive control means for determining whether to convert the prediction error signal into the frequency domain or to hold the prediction error signal in the spatial domain.

21. Decoder for decoding a video signal using hybrid coding, comprising adaptive control means (201), said adaptive control means (201) being arranged for adaptively deciding whether an input stream of the encoded video signal represents a prediction error signal of the encoded video signal in the spatial domain or in the frequency domain.

22. The decoder according to inventive concept 21, further comprising scan control means for providing a scan order based on the prediction signal or the prediction error signal or a linear combination of both.
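
The rate-distortion decision of inventive concepts 2 to 5 can be sketched with the usual Lagrangian cost J = D + λ·R. This minimal sketch assumes that the distortion D and rate R for each domain are measured by actually coding the block both ways. The choice of λ and the tie rule (ties go to the frequency domain) are assumptions, not taken from the text.

```python
from typing import Sequence


def mean_square_error(orig: Sequence[float], recon: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)


def mean_absolute_error(orig: Sequence[float], recon: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(orig, recon)) / len(orig)


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_domain(d_fd: float, r_fd: float, d_sd: float, r_sd: float, lam: float) -> str:
    """Keep the block in the spatial domain only if that is strictly cheaper (tie -> frequency)."""
    return "spatial" if rd_cost(d_sd, r_sd, lam) < rd_cost(d_fd, r_fd, lam) else "frequency"
```

For example, with D_FD = 10, R_FD = 40 bits, D_SD = 12, R_SD = 30 bits and λ = 0.5, the costs are 30 and 27, so the spatial domain is chosen. With λ = 0.1 the costs become 14 and 15, and the frequency domain wins.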

Claims

1. A method of encoding a video signal using hybrid encoding, comprising:

reducing temporal redundancy by block-based motion compensated prediction to create a prediction error signal;

deciding whether to transform the prediction error signal into the frequency domain or to keep the prediction error signal in the spatial domain for encoding,

wherein a flag is provided to indicate whether all blocks of a current slice are encoded in the frequency domain, wherein, if not all blocks of the slice are encoded in the frequency domain, further flags are provided for the blocks of the slice, each of the further flags indicating whether the respective block of the slice is encoded in the spatial domain or in the frequency domain, and

wherein the further flags indicating whether the block of the slice is encoded in the spatial domain or the frequency domain, respectively, are encoded using CABAC conditional on flags of top-side and left-side neighboring blocks that have been encoded.

2. The method of claim 1, wherein the samples in the spatial domain are encoded by the same method as the coefficients in the frequency domain.

3. The method of claim 2, wherein the encoding of the coefficients is performed according to CABAC or CAVLC.

4. The method according to claim 1, wherein the block size for which the method switches between encoding in the frequency domain and encoding in the spatial domain corresponds to the size of the transform performed when encoding in the frequency domain is selected.

5. The method according to claim 1, wherein the block size for which the method switches between encoding in the frequency domain and encoding in the spatial domain corresponds to the size of the transform performed when encoding in the frequency domain is selected, and is 4 x 4.

6. The method of claim 1, wherein the prediction error samples are provided in blocks and the elements of each block are scanned according to a scanning order, and the scanning order depends on whether the samples have been transformed into the frequency domain or have been kept in the spatial domain.

7. The method of claim 1, wherein the flag indicating whether all blocks of the current slice are coded in the frequency domain is coded by a single bit.

8. A method of decoding a video signal using hybrid decoding, comprising:

receiving an encoded video signal comprising encoded video data, the encoded video data comprising encoded frequency domain data and/or encoded spatial domain data;

decoding the received encoded video data;

performing an inverse transform of the video data from a frequency domain into a spatial domain, or skipping an inverse transform of the video data from the frequency domain into the spatial domain,

wherein the method further comprises:

reading from the encoded video signal a flag indicating whether all blocks of a current slice are encoded in the frequency domain, and

reading, depending on the value of the flag, further flags for the blocks of the slice, each of the further flags indicating whether the respective block of the slice is encoded in the spatial domain or in the frequency domain,

wherein the further flags indicating whether the blocks of the slice are coded in the spatial or frequency domain, respectively, are decoded using CABAC conditional on flags of top-side and left-side neighboring blocks that have been decoded, and

wherein an inverse transform from the frequency domain to the spatial domain is performed on the current slice or blocks thereof depending on the values of the flag and the further flags.

9. The method of claim 8, wherein the samples in the spatial domain are decoded by the same method as the coefficients in the frequency domain.

10. The method of claim 9, wherein the decoding of the coefficients is performed according to CABAC or CAVLC.

11. The method of claim 8, wherein the block size for which the method switches between decoding in the frequency domain and decoding in the spatial domain corresponds to the size of the inverse transform performed when decoding in the frequency domain is selected.

12. The method of claim 8, wherein the block size for which the method switches between decoding in the frequency domain and decoding in the spatial domain corresponds to the size of the inverse transform performed when decoding in the frequency domain is selected, and is 4 x 4.

13. The method of claim 8, wherein the transmitted prediction error samples or coefficients of the video are received in a scan order and rearranged into blocks in an order that depends on whether the samples have been transformed into the frequency domain or have been kept in the spatial domain.

14. The method of claim 8, wherein the flag indicating whether all blocks of the current slice are coded in the frequency domain is contained in a single bit.

15. An encoder for encoding a video signal using hybrid encoding, comprising:

means for reducing temporal redundancy by block-based motion compensated prediction in order to establish a prediction error signal;

adaptive control means for deciding whether to transform the prediction error signal into the frequency domain or to maintain the prediction error signal in the spatial domain; and

encoding means adapted to encode said prediction error signal transformed into the frequency domain or kept in the spatial domain,

16. Encoder according to claim 15, adapted to perform the method according to any of claims 2 to 7.

17. A decoder for decoding a video signal encoded by using hybrid encoding, comprising:

receiving means configured to receive an encoded video signal comprising encoded frequency domain data and/or encoded spatial domain data; and

adaptive control means (201) for adaptively determining whether the received encoded video data represents a prediction error signal in the spatial domain or in the frequency domain of the encoded video signal,

wherein,

18. The decoder of claim 17, adapted to perform the method of any of claims 9 to 14.