HK1159913B - Image encoding device and method and image decoding device and method
Publication number: HK1159913B (application number HK12100070.9A); authority: HK (Hong Kong)
Description
The present application is a divisional application of an invention patent application with application number 200680025140.8 (PCT/JP2006/312159), an application date of June 16, 2006 (delivery date: January 10, 2008), and the invention title "image encoding device, image decoding device, image encoding method, image decoding method, image encoding program, image decoding program, computer-readable recording medium on which the image encoding program is recorded, and computer-readable recording medium on which the image decoding program is recorded".
Technical Field
The present invention relates to a digital image signal encoding device, a digital image signal decoding device, a digital image signal encoding method, and a digital image signal decoding method used in image compression coding techniques, compressed image data transmission techniques, and the like.
Background
Currently, international standard video coding methods such as MPEG and ITU-T H.26x mainly use a standardized input signal format called the 4:2:0 format. In the 4:2:0 format, a color moving picture signal such as RGB is converted into a luminance component (Y) and two color difference components (Cb, Cr), and the number of samples of each color difference component is reduced to half that of the luminance component both horizontally and vertically. Since human visual sensitivity to the color difference components is lower than to the luminance component, the conventional international standard video coding methods reduce the amount of original information to be encoded by downsampling the color difference components in this way before encoding. On the other hand, with the recent increase in resolution and gray scale of video displays, methods have been studied that encode the color difference components with the same number of samples as the luminance component, without downsampling. The format in which the numbers of samples of the luminance component and the color difference components are exactly the same is called the 4:4:4 format. The MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard (hereinafter, AVC) provides a "High 4:4:4 profile" as a coding scheme that takes the 4:4:4 format as input. As shown in fig. 10, whereas the conventional 4:2:0 format presupposes downsampling of the color difference components and is therefore limited to color space definitions such as Y, Cb, Cr, the 4:4:4 format has no difference in sampling ratio between the color components, so R, G, B can be used directly, or other color space definitions besides Y, Cb, Cr can be used. In a video coding method using the 4:2:0 format, the color space is fixed to Y, Cb, Cr, so the type of color space need not be considered in the encoding process; in the AVC High 4:4:4 profile, however, the color space definition affects the encoding process itself. Moreover, because the current High 4:4:4 profile is designed for compatibility with the other profiles that target the 4:2:0 format defined in the Y, Cb, Cr space, it cannot be said to be a design that optimizes the compression efficiency of the 4:4:4 format.
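To make the sampling difference concrete, the following sketch computes the number of samples per 16 × 16 macroblock under each format (a hypothetical helper for illustration; the patent defines no such function):

```python
# Samples per 16x16 macroblock for each chroma format (illustrative only).
def samples_per_macroblock(chroma_format: str) -> int:
    luma = 16 * 16                    # luminance (or one of 3 equal) components
    if chroma_format == "4:2:0":
        chroma = 2 * (8 * 8)          # Cb and Cr downsampled 2:1 both ways
    elif chroma_format == "4:4:4":
        chroma = 2 * (16 * 16)        # color difference kept at full resolution
    else:
        raise ValueError(chroma_format)
    return luma + chroma

print(samples_per_macroblock("4:2:0"))  # 384
print(samples_per_macroblock("4:4:4"))  # 768: twice the samples to encode
```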
Non-patent document 1: MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 specification
For example, in the AVC high profile for encoding the 4:2:0 format, in a macroblock region composed of 16 × 16 luminance pixels, the corresponding color difference components are 8 × 8 pixel blocks for both Cb and Cr. Intra-macroblock coding under this profile uses spatial (intra) prediction based on surrounding sample values in the same picture, with separate intra prediction mode sets for the luminance component and the color difference components. For the luminance component, the intra prediction mode with the highest prediction efficiency is selected from the 9 modes of fig. 3; for the color difference components, the mode with the highest prediction efficiency is selected from the 4 modes of fig. 9, commonly for Cb and Cr (Cb and Cr cannot use different prediction modes). In the motion compensation prediction of this profile, the block size information serving as the unit of motion compensation prediction, the reference image information used for prediction, and the motion vector information of each block are multiplexed only for the luminance component, and the color difference components are motion-compensated using the same information as the luminance component. Such a design is justified only under the premise of the 4:2:0 format's color space definition, in which the luminance component carries almost all the information expressing the structure (texture) of the image, while the contribution of the color difference components is small. However, the current High 4:4:4 profile merely extends the 4:2:0 color difference intra prediction modes, even though the block size of the color difference signal per macroblock has been extended to 16 × 16 pixels; also, as in the 4:2:0 format, one component is regarded as the luminance component, only its information is multiplexed, and motion compensation prediction for all 3 components uses a common inter prediction mode, reference image information, and motion vector information. In the 4:4:4 format, where the respective color components contribute equally to expressing the structure of the image signal, such a prediction method cannot be said to be always optimal.
Disclosure of Invention
The object of the present invention is therefore to provide an encoding device, a decoding device, an encoding method, a decoding method, programs for executing them, and recording media recording those programs that, unlike the conventional technique, improve optimality when encoding moving picture signals in which, as in the 4:4:4 format, there is no difference in sampling ratio between the color components.
An image encoding device of the present invention includes: a prediction image generating unit that generates prediction images corresponding to a plurality of prediction modes, each indicating a prediction image generation method; a prediction mode determining unit that evaluates the prediction efficiency of the prediction images output from the prediction image generating unit to determine a predetermined prediction mode; and an encoding unit that performs variable length coding on the output of the prediction mode determining unit. The prediction mode determining unit determines, based on a predetermined control signal, whether to use a prediction mode common to the color components constituting the input image signal or an individual prediction mode for each color component, and multiplexes the information of this control signal into the bitstream; when a common prediction mode is used, it multiplexes common prediction mode information into the bitstream, and when a common prediction mode is not used, it multiplexes prediction mode information for each color component into the bitstream.
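As a rough illustration of the signaling behavior described above, the sketch below writes either one common prediction mode or one mode per color component, always preceded by the control signal (the writer class and syntax element names are invented for illustration, not the patent's syntax):

```python
# Hedged sketch of the prediction-mode multiplexing rule described above.
class BitstreamWriter:
    def __init__(self):
        self.elements = []                 # symbolic "bitstream" for clarity
    def put(self, name, value):
        self.elements.append((name, value))

def multiplex_prediction_modes(bits, common, modes):
    bits.put("prediction_mode_common_flag", common)   # the control signal
    if common:
        bits.put("common_pred_mode", modes[0])        # shared by C0, C1, C2
    else:
        for i, mode in enumerate(modes):              # individual per component
            bits.put(f"pred_mode_C{i}", mode)

bits = BitstreamWriter()
multiplex_prediction_modes(bits, common=False, modes=[2, 0, 1])
print(bits.elements)   # [('prediction_mode_common_flag', False), ...]
```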
With the image encoding device, image decoding device, image encoding method, image decoding method, programs for executing them, and recording media storing those programs of the present invention, when encoding uses a variety of color spaces and is not limited to a fixed color space such as Y, Cb, Cr, the intra prediction mode information and inter prediction mode information used for each color component can be selected flexibly, so that optimal encoding processing can be performed even under a variety of color space definitions.
Drawings
Fig. 1 is an explanatory diagram showing a configuration of a video encoding device according to embodiment 1.
Fig. 2 is an explanatory diagram showing a configuration of the video decoding apparatus according to embodiment 1.
Fig. 3 is an explanatory diagram for explaining a predicted image generation method in the intra 4 × 4 prediction mode evaluated by the spatial prediction unit 2 in fig. 1.
Fig. 4 is an explanatory diagram for explaining a predicted image generation method in the intra 16 × 16 prediction mode evaluated by the spatial prediction unit 2 in fig. 1.
Fig. 5 is a flowchart illustrating a procedure of an intra prediction mode determination process performed in the video encoding apparatus of fig. 1.
Fig. 6 is an explanatory diagram showing a data array of a video bit stream output from the video encoding apparatus according to embodiment 1.
Fig. 7 is a flowchart illustrating a procedure of an intra prediction decoding process performed by the video decoding apparatus of fig. 2.
Fig. 8 is an explanatory diagram showing another data array format of the video bit stream output from the video encoding apparatus according to embodiment 1.
Fig. 9 is an explanatory diagram for explaining a predicted image generation method in the intra prediction mode corresponding to color difference components in the AVC standard.
Fig. 10 is an explanatory diagram illustrating the conventional and current macroblock formats.
Fig. 11 is an explanatory diagram showing a configuration of the video encoding device according to embodiment 2.
Fig. 12 is an explanatory diagram showing a configuration of the video decoding apparatus according to embodiment 2.
Fig. 13 is an explanatory diagram for explaining a predicted image generation method in the intra 8 × 8 prediction mode evaluated by the spatial prediction unit 2 in fig. 11.
Fig. 14 is a flowchart illustrating a procedure of an intra-coding mode determination process executed by the video encoding apparatus of fig. 11.
Fig. 15 is an explanatory diagram showing a data array of a video bit stream output from the video encoding apparatus according to embodiment 2.
Fig. 16 is an explanatory diagram showing another data array of a video bit stream output from the video encoding apparatus according to embodiment 2.
Fig. 17 is a flowchart illustrating a procedure for executing the intra prediction decoding process in the video decoding apparatus of fig. 12.
Fig. 18 is an explanatory diagram for explaining parameters of the intra prediction mode encoding processing of the C0 component in embodiment 3.
Fig. 19 is an explanatory diagram for explaining parameters of the intra prediction mode encoding processing of the component C1 in embodiment 3.
Fig. 20 is an explanatory diagram for explaining parameters of the intra prediction mode encoding processing of the C2 component in embodiment 3.
Fig. 21 is a flowchart showing the flow of the intra prediction mode encoding process according to embodiment 3.
Fig. 22 is a flowchart showing another flow of the intra prediction mode encoding process according to embodiment 3.
Fig. 23 is a flowchart showing the flow of the intra prediction mode decoding process according to embodiment 3.
Fig. 24 is an explanatory diagram showing another data array of a video bit stream output from the video encoding apparatus according to embodiment 4.
Fig. 25 is a flowchart showing another flow of the intra prediction mode encoding process according to embodiment 5.
Fig. 26 is an explanatory diagram showing, in table form, the rule for setting the predicted value in embodiment 5.
Fig. 27 is a flowchart showing an encoding procedure of embodiment 6.
Fig. 28 is an explanatory diagram showing a binary sequence structure of CurrIntraPredMode in embodiment 6.
Fig. 29 is an explanatory diagram showing another binary sequence structure of CurrIntraPredMode in embodiment 6.
Fig. 30 is an explanatory diagram showing a configuration of the video encoding device according to embodiment 7.
Fig. 31 is an explanatory diagram showing a configuration of a video decoding device according to embodiment 7.
Fig. 32 is an explanatory diagram showing a unit of a macroblock.
Fig. 33 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 7.
Fig. 34 is an explanatory diagram showing a data array of a video stream output from the video encoding apparatus according to embodiment 7.
Fig. 35 is a flowchart showing the flow of processing executed in the variable length decoding section 25 of embodiment 7.
Fig. 36 is an explanatory diagram showing another data array of a video stream output from the video encoding apparatus according to embodiment 7.
Fig. 37 is an explanatory diagram showing another data array of a video stream output from the video encoding apparatus according to embodiment 7.
Fig. 38 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 8.
Fig. 39 is an explanatory diagram showing a data array of a macroblock-level bit stream in embodiment 8.
Fig. 40 is a flowchart showing the flow of inter-prediction image generation processing according to embodiment 8.
Fig. 41 is an explanatory diagram showing another data array of a macroblock-level bit stream in embodiment 8.
Fig. 42 is an explanatory diagram showing another data array of the bit stream of the macroblock level of embodiment 8.
Fig. 43 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 9.
Fig. 44 is a flowchart showing the flow of inter-prediction image generation processing according to embodiment 9.
Fig. 45 is an explanatory diagram showing the configuration of the motion vector encoding unit.
Fig. 46 is an explanatory diagram showing an operation of the motion vector encoding unit.
Fig. 47 is an explanatory diagram showing the configuration of the motion vector decoding unit.
Fig. 48 is an explanatory diagram showing an example of the bitstream syntax.
Fig. 49 is an explanatory diagram showing a structure of macroblock encoded data according to embodiment 11.
Fig. 50 is an explanatory diagram showing a detailed configuration of encoded data of Cn component header (header) information in fig. 49 of embodiment 11.
Fig. 51 is an explanatory diagram showing another configuration of macroblock encoded data according to embodiment 11.
Fig. 52 is an explanatory diagram showing the structure of a bit stream according to embodiment 11.
Fig. 53 is an explanatory diagram showing the structure of a slice in embodiment 11.
Fig. 54 is an explanatory diagram showing an internal configuration related to the arithmetic coding process of the variable-length coding unit 11 according to embodiment 12.
Fig. 55 is a flowchart showing the flow of arithmetic coding processing by the variable-length coding unit 11 of embodiment 12.
Fig. 56 is an explanatory diagram showing a detailed flow of the processing of step S162 in fig. 55 of embodiment 12.
Fig. 57 is an explanatory diagram showing the concept of a context model (ctx).
Fig. 58 is an explanatory diagram showing an example of a context model relating to motion vectors of macroblocks.
Fig. 59 is an explanatory diagram showing an internal configuration related to the arithmetic decoding processing of the variable length decoding unit 25 of embodiment 12.
Fig. 60 is a flowchart showing the flow of arithmetic decoding processing by the variable-length decoding section 25 of embodiment 12.
Fig. 61 is an explanatory diagram showing the context model 11f of embodiment 12.
Fig. 62 is a diagram illustrating a difference in the mode of the current macroblock in embodiment 12.
Fig. 63 is an explanatory diagram showing the configuration of the encoding device and the decoding device according to embodiment 13.
Fig. 64 is an explanatory diagram showing a configuration of the video encoding device according to embodiment 13.
Fig. 65 is an explanatory diagram showing a configuration of the video decoding apparatus according to embodiment 13.
Fig. 66 is an explanatory diagram showing a common encoding process in embodiment 14.
Fig. 67 is an explanatory diagram showing the independent encoding process of embodiment 14.
Fig. 68 is an explanatory diagram showing a temporal dynamic prediction reference relationship between pictures (pictures) in the encoding apparatus and the decoding apparatus of embodiment 14.
Fig. 69 is an explanatory diagram showing an example of the structure of a bitstream to be decoded generated by the encoding device of embodiment 14 and input to the decoding device of embodiment 14.
Fig. 70 is an explanatory diagram showing a bit stream structure of slice data in the case of the common encoding process and the independent encoding process.
Fig. 71 is an explanatory diagram showing a schematic configuration of the coding apparatus of embodiment 14.
Fig. 72 is an explanatory diagram showing a case where the processing delay on the encoding device side is reduced.
Fig. 73 is an explanatory diagram showing an internal configuration of the first image encoding unit.
Fig. 74 is an explanatory diagram showing an internal configuration of the second image encoding unit.
Fig. 75 is an explanatory diagram showing a schematic configuration of the decoding device of embodiment 14.
Fig. 76 is an explanatory diagram showing an internal configuration of the first image decoding section.
Fig. 77 is an explanatory diagram showing an internal configuration of the second image decoding section.
Fig. 78 is an explanatory diagram showing an internal configuration of the first image encoding unit incorporating the color space conversion processing.
Fig. 79 is an explanatory diagram showing another internal configuration of the first image encoding unit incorporating the color space conversion processing.
Fig. 80 is an explanatory diagram showing an internal configuration of the first image decoding unit incorporating the inverse color space conversion processing.
Fig. 81 is an explanatory diagram showing another internal configuration of the first image decoding unit incorporating the inverse color space conversion processing.
Fig. 82 is an explanatory diagram showing the structure of encoded data of macroblock header information included in a conventional YUV 4:2:0 format bit stream.
Fig. 83 is an explanatory diagram showing an internal configuration of the prediction unit 461 of the first image decoding unit that ensures compatibility with a conventional YUV 4:2:0 format bit stream.
Fig. 84 is an explanatory diagram showing the structure of a bit stream of multiplexed encoded data according to embodiment 15.
Fig. 85 is an explanatory diagram showing the picture coding type information used when the picture data in an access unit indicated by an AUD NAL unit is encoded.
Fig. 86 is an explanatory diagram showing a structure of a bit stream of multiplexed encoded data according to embodiment 15.
Description of the symbols
1: input image signal; 2: spatial prediction unit; 3: subtractor; 4: prediction difference signal; 5: encoding mode determining unit; 6: encoding mode; 7: predicted image; 8: orthogonal transform unit; 9: quantization unit; 10: quantized transform coefficient; 11: variable length encoding unit; 11a: context model determining unit; 11b: binarization unit; 11c: occurrence probability generating unit; 11d: encoding unit; 11e: encoded value; 11f: context model; 11g: occurrence probability information memory; 11h: occurrence probability state; 12: inverse quantization unit; 13: inverse orthogonal transform unit; 14: local decoded prediction difference signal; 15: local decoded image (provisional decoded image); 16: memory; 17: transmission buffer; 18: adder; 19: encoding control unit; 20: weighting coefficient; 21: quantization parameter; 22: video stream; 23: intra prediction mode sharing identification flag; 24: deblocking filter control flag; 25: variable length decoding unit; 25a: decoding unit; 25b: bin restoration value; 26: deblocking filter; 27: decoded image; 28: intra encoding mode; 29: basic intra prediction mode; 30: extended intra prediction mode; 31: extended intra prediction mode table indication flag; 32: transform block size identification flag; 33: intra encoding mode sharing identification flag; 34: intra encoding mode; 35: intra prediction mode; 36: intra prediction mode indicator; 102: motion compensation prediction unit; 106: macroblock type/sub-macroblock type; 123: inter prediction mode sharing identification flag; 123b: motion vector sharing identification flag; 123c: macroblock header sharing identification flag; 128: basic macroblock type; 128b: macroblock type; 129: basic sub-macroblock type; 129b: sub-macroblock type; 130: extended macroblock type; 131: extended sub-macroblock type; 132: basic reference image identification number; 132b: reference image identification number; 133: basic motion vector information; 134: extended reference image identification number; 135: extended motion vector information; 136: profile information; 137: motion vector; 138, 138a, 138b, 138c: skip indication information; 139a, 139b, 139c: header information; 140a, 140b, 140c: transform coefficient data; 141: intra prediction mode; 142: transform coefficient validity indication information; 143: occurrence probability state parameter sharing identification flag; 144: intra color difference prediction mode; 111: motion vector prediction unit; 112: differential motion vector calculation unit; 113: differential motion vector variable length encoding unit; 250: motion vector decoding unit; 251: differential motion vector variable length decoding unit; 252: motion vector prediction unit; 253: motion vector calculation unit; 301: color space conversion unit; 302: converted image signal; 303: encoding device; 304: color space conversion method identification information; 305: bit stream; 306: decoding device; 307: decoded image; 308: inverse color space conversion unit; 310: conversion unit; 311: color space conversion method identification information; 312: inverse conversion unit; 422a, 422b0, 422b1, 422b2, 422c: video stream; 423: common encoding/independent encoding identification signal; 427a, 427b: decoded image; 461: prediction unit; 462: deblocking filter; 463: prediction overhead information; 464: transform block size designation flag; 465: color space conversion unit; 466: inverse color space conversion unit; 467: signaling information; 501, 601: switch; 502: color component separation unit; 503a: first image encoding unit; 503b0, 503b1, 503b2: second image encoding units; 504: multiplexing unit; 602: color component determination unit; 603a: first image decoding unit; 603b0, 603b1, 603b2: second image decoding units; 610: upper header analysis unit; 4611a, 4611b, 4611c: switching unit; 4612: luminance signal intra prediction unit; 4613: color difference signal intra prediction unit; 4614: luminance signal inter prediction unit; 4615: color difference signal inter prediction unit
Detailed Description
Embodiment 1
In embodiment 1, an encoding device that performs encoding closed within a frame in units of rectangular regions (macroblocks) of 16 × 16 pixels, into which a video frame input in the 4:4:4 format is equally divided, and a corresponding decoding device are described. The encoding device and the decoding device are based on the encoding scheme adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added.
Fig. 1 shows a configuration of a video encoding device according to embodiment 1, and fig. 2 shows a configuration of a video decoding device according to embodiment 1. In fig. 2, elements denoted by the same reference numerals as those of the components of the coding apparatus of fig. 1 denote the same elements.
Hereinafter, the operations of the entire encoding apparatus and decoding apparatus, and the characteristic operations of embodiment 1, that is, the intra-prediction mode determination process and the intra-prediction decoding process will be described with reference to these drawings.
1. Outline of operation of encoding device
In the encoding device of fig. 1, each frame of the input video signal 1 is input in the 4:4:4 format. As shown in fig. 10, the input video frame is supplied to the encoding device in macroblock units, in which the 3 color components are divided into blocks of the same 16 × 16 pixel size and grouped together.
First, the spatial prediction unit 2 performs intra prediction processing for each color component in units of macroblocks, using the local decoded image 15 stored in the memory 16. Three planes of memory are prepared, one per color component (this embodiment is described with 3 planes, but the number may be changed as appropriate according to the design). The intra prediction modes include the intra 4 × 4 prediction mode, which performs spatial prediction using surrounding pixels in units of the 4 pixel × 4 line blocks shown in fig. 3, and the intra 16 × 16 prediction mode, which performs spatial prediction using surrounding pixels in units of the 16 pixel × 16 line macroblocks shown in fig. 4.
(a) Intra 4 × 4 prediction mode
The 16 × 16 pixel block of the luminance signal in a macroblock is divided into 16 blocks of 4 × 4 pixels each, and one of the 9 modes shown in fig. 3 is selected in units of 4 × 4 pixel blocks. The pixels of the surrounding blocks (upper, upper left, upper right, and left) that have already been encoded, locally decoded, and stored in the memory 16 are used for predicted image generation.
Intra4x4_pred_mode = 0: use the adjacent upper pixels as the predicted image.
Intra4x4_pred_mode = 1: use the adjacent left pixels as the predicted image.
Intra4x4_pred_mode = 2: use the average value of the adjacent 8 pixels as the predicted image.
Intra4x4_pred_mode = 3: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a right 45-degree edge).
Intra4x4_pred_mode = 4: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 45-degree edge).
Intra4x4_pred_mode = 5: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 22.5-degree edge).
Intra4x4_pred_mode = 6: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 67.5-degree edge).
Intra4x4_pred_mode = 7: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a right 22.5-degree edge).
Intra4x4_pred_mode = 8: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 112.5-degree edge).
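As a concrete illustration of the three simplest of these modes, the following sketch generates a 4 × 4 predicted block from the reconstructed neighboring pixels (a minimal sketch assuming unclipped 8-bit samples; the directional weighted-average modes 3 to 8 are omitted, and the function name is invented for illustration):

```python
import numpy as np

def predict_4x4(mode, up, left):
    """Minimal sketch of Intra4x4_pred_mode 0-2.
    up:   the 4 reconstructed pixels directly above the block
    left: the 4 reconstructed pixels directly to its left
    """
    if mode == 0:                       # vertical: copy the upper pixels down
        return np.tile(up, (4, 1))
    if mode == 1:                       # horizontal: copy the left pixels across
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                       # DC: average of the 8 adjacent pixels
        dc = (up.sum() + left.sum() + 4) // 8   # +4 rounds to nearest
        return np.full((4, 4), dc)
    raise NotImplementedError("directional modes 3-8 omitted in this sketch")

up = np.array([100, 102, 104, 106]); left = np.array([98, 99, 101, 103])
print(predict_4x4(2, up, left))         # 4x4 block filled with the DC value
```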
When the intra 4 × 4 prediction mode is selected, 16 pieces of mode information are required per macroblock. Therefore, to reduce the code amount of the mode information itself, the mode information is predictively encoded from the mode information of adjacent blocks, exploiting the high correlation between them.
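One concrete realization of such neighbor-based prediction is the AVC rule, which predicts each block's mode as the smaller of the left and upper neighbors' modes and signals only a one-bit flag plus, on a miss, a corrected index; a sketch of that idea (variable names invented):

```python
def encode_intra4x4_mode(curr_mode, left_mode, up_mode):
    """Sketch of predictive coding of the 4x4 mode index, following the
    AVC-style rule of predicting the minimum of the neighboring modes."""
    predicted = min(left_mode, up_mode)
    if curr_mode == predicted:
        return {"prev_pred_mode_flag": 1}           # 1 bit, prediction hit
    rem = curr_mode if curr_mode < predicted else curr_mode - 1
    return {"prev_pred_mode_flag": 0, "rem_mode": rem}   # flag + short index
```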
(b) Intra 16 × 16 prediction mode
This mode predicts the 16 × 16 pixel block corresponding to the macroblock size at one time; one of the 4 modes shown in fig. 4 is selected per macroblock. As in the intra 4 × 4 prediction mode, the pixels of the surrounding macroblocks (upper, upper left, and left) that have already been encoded, locally decoded, and stored in the memory 16 are used for predicted image generation.
Intra16x16_pred_mode = 0: use the lowermost 16 pixels of the upper macroblock as the predicted image.
Intra16x16_pred_mode = 1: use the rightmost 16 pixels of the left macroblock as the predicted image.
Intra16x16_pred_mode = 2: use as the predicted image the average value of the 32 pixels in total consisting of the lowermost 16 pixels of the upper macroblock (part A in fig. 4) and the rightmost 16 pixels of the left macroblock (part B in fig. 4).
Intra16x16_pred_mode = 3: obtain the predicted image by a predetermined calculation process (weighted addition according to the positions of the pixel to be predicted and the pixels used) employing 31 pixels in total: the lower-right corner pixel of the upper-left macroblock, the lowermost 15 pixels of the upper macroblock (excluding the blank pixel), and the rightmost 15 pixels of the left macroblock (excluding the blank pixel).
The video encoding device of embodiment 1 is characterized in that the intra prediction processing method for the 3 color components is switched according to the intra prediction mode sharing identification flag 23. This point is explained in detail in section 2 below.
The spatial prediction unit 2 executes the prediction processing for all the modes shown in fig. 3 and fig. 4, or for a predetermined subset of them, and the subtractor 3 obtains the prediction difference signal 4. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determining unit 5, and the prediction mode that gives the optimum prediction efficiency for the macroblock to be predicted is output as the encoding mode 6. Here, the encoding mode 6 includes the information for deciding whether the intra 4 × 4 prediction mode or the intra 16 × 16 prediction mode is used (corresponding to the intra encoding mode of fig. 6), together with the prediction mode information (the above Intra4x4_pred_mode or Intra16x16_pred_mode) used for each prediction unit region. The prediction unit region corresponds to a 4 × 4 pixel block in the case of the intra 4 × 4 prediction mode, and to a 16 × 16 pixel block in the case of the intra 16 × 16 prediction mode. In selecting the encoding mode 6, the weighting coefficient 20 for each encoding mode, determined by the judgment of the encoding control unit 19, is also taken into account. The optimal prediction difference signal 4 obtained with the encoding mode 6 in the encoding mode determining unit 5 is output to the orthogonal transform unit 8. The orthogonal transform unit 8 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as orthogonal transform coefficients. The quantization unit 9 quantizes the input orthogonal transform coefficients based on the quantization parameter 21 determined by the encoding control unit 19, and outputs them to the variable length encoding unit 11 as the quantized transform coefficient 10. The quantized transform coefficient 10 is entropy-coded in the variable length encoding unit 11 by Huffman coding, arithmetic coding, or the like. The quantized transform coefficient 10 is also restored to the local decoded prediction difference signal 14 through the inverse quantization unit 12 and the inverse orthogonal transform unit 13, and is added in the adder 18 to the predicted image 7 generated according to the encoding mode 6, thereby generating the local decoded image 15. The local decoded image 15 is stored in the memory 16 for use in subsequent intra prediction processing. The deblocking filter control flag 24, which indicates whether or not to apply the deblocking filter to the macroblock, is also input to the variable length encoding unit 11 (since the prediction processing in the spatial prediction unit 2 stores and uses the pixel data before deblocking filtering in the memory 16, the deblocking filtering itself is not needed for the encoding processing; on the decoding device side, however, the deblocking filter is applied according to the instruction of the deblocking filter control flag 24 to obtain the final decoded image).
The intra prediction mode sharing identification flag 23, the quantized transform coefficient 10, the encoding mode 6, and the quantization parameter 21 input to the variable length encoding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax) and output to the transmission buffer 17. The transmission buffer 17 smooths the bit stream according to the bandwidth of the transmission path connected to the encoding device or the reading speed of the recording medium, and outputs it as the video stream 22. Feedback is also output to the encoding control unit 19 according to the accumulation state of the bit stream in the transmission buffer 17, to control the amount of code generated in the encoding of subsequent video frames.
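The loop described above is the classic hybrid-coding structure: the encoder carries its own decoder so that prediction references match on both sides. A compressed, runnable sketch of the data flow (all kernels below are simplified stand-ins, not the actual orthogonal transform or quantizer):

```python
import numpy as np

def transform(d):      return d.astype(np.int32)   # stand-in for unit 8
def inv_transform(c):  return c                    # stand-in for unit 13
def quantize(c, qp):   return c // qp              # stand-in for unit 9
def dequantize(c, qp): return c * qp               # stand-in for unit 12

def encode_macroblock(mb, predicted, qp, memory):
    diff = mb.astype(np.int32) - predicted          # subtractor 3
    coeff = quantize(transform(diff), qp)           # to entropy coding (unit 11)
    # Local decoding: reconstruct exactly what the decoder will see, so that
    # subsequent intra prediction uses identical reference pixels (memory 16).
    recon = predicted + inv_transform(dequantize(coeff, qp))   # adder 18
    memory.append(recon)
    return coeff

memory = []
mb = np.full((16, 16), 120)
predicted = np.full((16, 16), 118)
coeff = encode_macroblock(mb, predicted, qp=2, memory=memory)
```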
2. Intra prediction mode determination processing in encoding device
The intra prediction mode determination process, which is a feature of the encoding device of embodiment 1, is described in detail below. This process is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the spatial prediction unit 2 and the encoding mode determining unit 5 in the encoding device of fig. 1. Fig. 5 is a flowchart showing the flow of this process. Hereinafter, the image data of the 3 color components constituting a macroblock are denoted C0, C1, and C2.
First, the encoding mode determining unit 5 receives the intra prediction mode sharing identification flag 23 and, based on its value, determines whether the intra prediction mode is used in common for C0, C1, and C2 (step S1 in fig. 5). If it is used in common, the process proceeds to step S2 and beyond; otherwise, it proceeds to step S5 and beyond.
When the intra prediction mode is used in common for C0, C1, and C2, the encoding mode determining unit 5 notifies the spatial prediction unit 2 of all the selectable intra 4 × 4 prediction modes, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimum intra 4 × 4 prediction mode common to C0, C1, and C2 (step S2). Next, the encoding mode determining unit 5 notifies the spatial prediction unit 2 of all the selectable intra 16 × 16 prediction modes, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimum intra 16 × 16 prediction mode common to C0, C1, and C2 (step S3). Finally, the encoding mode determining unit 5 selects the mode with the best prediction efficiency from the modes obtained in steps S2 and S3 (step S4), and the process ends.
When the intra prediction mode is not shared among C0, C1, and C2 but the optimum mode is selected for each of them, the encoding mode determining unit 5 notifies the spatial prediction unit 2 of all the intra 4 × 4 prediction modes selectable for the Ci (0 ≤ i < 3) component, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimum intra 4 × 4 prediction mode for the Ci (0 ≤ i < 3) component (step S6). Likewise, the optimum intra 16 × 16 prediction mode is selected (step S7). Finally, in step S8, the optimum intra prediction mode for the Ci (0 ≤ i < 3) component is determined.
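Steps S1 to S8 amount to the control flow sketched below, where the cost of each candidate comes from the prediction efficiency evaluation described next (the dictionary layout is an assumption made for illustration):

```python
def decide_intra_modes(common_flag, costs):
    """Sketch of the fig. 5 flow. costs[kind][comp] = (cost, mode), with kind
    'intra4x4' or 'intra16x16' and comp 'C0' | 'C1' | 'C2' | 'common'."""
    kinds = ("intra4x4", "intra16x16")
    if common_flag:                       # S1 yes -> S2..S4: one mode for all
        return [min(costs[k]["common"] for k in kinds)[1]] * 3
    # S1 no -> S5..S8: choose independently for each color component Ci
    return [min(costs[k][c] for k in kinds)[1] for c in ("C0", "C1", "C2")]

costs = {"intra4x4":   {"common": (120, 3), "C0": (90, 2), "C1": (70, 0), "C2": (75, 1)},
         "intra16x16": {"common": (100, 1), "C0": (95, 0), "C1": (60, 2), "C2": (80, 3)}}
print(decide_intra_modes(False, costs))   # [2, 2, 1]: one mode per component
```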
As a criterion for the prediction efficiency evaluation of prediction modes performed in the spatial prediction unit 2, for example, the rate-distortion cost given by

Jm = Dm + λ·Rm   (λ: a positive number)

can be used. Here, Dm is the encoding distortion or the prediction error amount when the intra prediction mode m is applied. The encoding distortion is obtained by applying the intra prediction mode m to obtain the prediction error, decoding the video from the result of transforming and quantizing that prediction error, and measuring the error with respect to the signal before encoding. The prediction error amount is obtained by taking the difference between the predicted image and the signal before encoding when the intra prediction mode m is applied and quantifying the magnitude of that difference; for example, the sum of absolute differences (SAD) can be used. Rm is the generated code amount when the intra prediction mode m is applied. That is, Jm is a value defining the trade-off between the code amount and the degree of deterioration when the intra prediction mode m is applied, and the intra prediction mode m giving the smallest Jm is the optimal solution.
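A minimal sketch of this selection rule, using SAD for Dm and an externally supplied bit-count estimate for Rm (both choices are illustrative; the text allows coding distortion for Dm as well):

```python
import numpy as np

def rd_cost(original, predicted, est_bits, lam):
    """Jm = Dm + lambda * Rm, with Dm measured as the sum of absolute differences."""
    sad = int(np.abs(original.astype(int) - predicted.astype(int)).sum())
    return sad + lam * est_bits

def choose_mode(original, candidates, lam=5.0):
    """candidates: list of (mode_id, predicted_block, estimated_bits)."""
    return min(candidates, key=lambda c: rd_cost(original, c[1], c[2], lam))[0]
```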
When the encoding device performs the processing from step S2 onward, one set of intra prediction mode information is assigned to a macroblock containing the 3 color components; when it performs the processing from step S5 onward, intra prediction mode information is assigned to each color component. Since the intra prediction mode information assigned to the macroblock therefore differs, the intra prediction mode sharing identification flag 23 must be multiplexed into the bit stream so that the decoding device can recognize whether the encoding device performed the procedure from S2 onward or from S5 onward. Fig. 6 shows the data array of such a bit stream.
This figure shows the data array of the bit stream at the macroblock level. The intra encoding mode 28 is information for determining whether the mode is intra 4 × 4 or intra 16 × 16. The basic intra prediction mode 29 indicates the common intra prediction mode information when the intra prediction mode sharing identification flag 23 indicates "common to C0, C1, and C2", and indicates the intra prediction mode information for C0 when it does not. The extended intra prediction mode 30 is multiplexed only when the flag indicates "not common to C0, C1, and C2", and indicates the intra prediction mode information for C1 and C2. The quantization parameter 21 and the quantized transform coefficient 10 are then multiplexed. The encoding mode 6 in fig. 1 is a general term for the intra encoding mode 28 and the intra prediction modes (basic and extended). (The deblocking filter control flag 24 input to the variable length encoding unit 11 is omitted from fig. 6, since it is not a component necessary for explaining the features of embodiment 1.)
In the 4:2:0 format adopted in the conventional video coding standards, the definition of the color space is fixed to Y, Cb, Cr, but in the 4:4:4 format a variety of color spaces besides Y, Cb, Cr can be used. By configuring the intra prediction mode information as in fig. 6, optimal encoding processing can be performed even when the definition of the color space of the input video signal 1 varies. For example, when the color space is defined in RGB, the structure (texture) of the video remains equally in each of the R, G, and B components, so using common intra prediction mode information reduces the redundancy of the intra prediction mode information itself and improves coding efficiency. On the other hand, when the color space is defined in Y, Cb, Cr, the texture structure of the video is concentrated in Y, so a common intra prediction mode does not necessarily give the optimal result. In that case, optimal coding efficiency can be obtained by appropriately using the extended intra prediction mode 30.
3. Outline of operation of decoding device
The decoding device of fig. 2 receives the video stream 22 conforming to the array of fig. 6, output from the encoding device of fig. 1, performs decoding processing in units of macroblocks in which the 3 color components have the same size (4:4:4 format), and restores each video frame.
First, the variable length decoding unit 25 decodes the video stream 22 according to a predetermined rule (syntax) and extracts information such as the intra prediction mode sharing identification flag 23, the quantized transform coefficient 10, the encoding mode 6, and the quantization parameter 21. The quantized transform coefficient 10 is input, together with the quantization parameter 21, to the inverse quantization unit 12, where inverse quantization is performed; the output is input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. On the other hand, the spatial prediction unit 2 is given the encoding mode 6 and the intra prediction mode sharing identification flag 23, and obtains the predicted image 7 according to this information. The specific procedure for obtaining the predicted image 7 is described later. The adder 18 adds the local decoded prediction difference signal 14 and the predicted image 7 to obtain the provisional decoded image 15 (this is exactly the same signal as the local decoded image 15 in the encoding device). The provisional decoded image 15 is written back to the memory 16 for use in the intra prediction of subsequent macroblocks. Three planes of memory are prepared, one per color component (this embodiment is described with 3 planes, but the number may be changed as appropriate according to the design). Further, the deblocking filter 26 is applied to the provisional decoded image 15 according to the instruction of the deblocking filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
4. Intra-prediction decoding process of decoding apparatus
The intra prediction image generation process, which is a feature of the decoding apparatus of embodiment 1, will be described in detail. This processing is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the variable length decoding unit 25 and the spatial prediction unit 2 in the decoding device of fig. 2. Fig. 7 is a flowchart showing the flow of the present process.
Steps S10 to S14 in the flowchart of fig. 7 are executed in the variable length decoding unit 25. The video stream 22 input to the variable length decoding unit 25 conforms to the data array of fig. 6. In step S10, the intra encoding mode 28 in the data of fig. 6 is decoded first, and the intra prediction mode sharing identification flag 23 is decoded next (step S11). Further, the basic intra prediction mode 29 is decoded (step S12). In step S13, whether the intra prediction mode is used in common for C0, C1, and C2 is determined using the intra prediction mode sharing identification flag 23; in the common case, the basic intra prediction mode 29 is used for all of C0, C1, and C2, and otherwise the basic intra prediction mode 29 is used as the mode of C0 and the extended intra prediction mode 30 is decoded (step S14) to obtain the mode information of C1 and C2. Since the encoding mode 6 of each color component is determined through this procedure, it is input to the spatial prediction unit 2, and the intra predicted image of each color component is obtained following steps S15 to S17. The procedure for obtaining the intra predicted image follows figs. 3 and 4 and is the same as the processing performed in the encoding device of fig. 1.
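The branch structure of steps S10 to S14 can be summarized as follows (a sketch with an invented reader interface; it returns the per-component mode information that the spatial prediction unit 2 then consumes):

```python
def decode_mode_info(read):
    """read(name) is an assumed parsing helper returning the next syntax element."""
    intra_coding_mode = read("intra_coding_mode")        # S10: 4x4 or 16x16
    common_flag       = read("pred_mode_common_flag")    # S11
    basic_mode        = read("basic_intra_pred_mode")    # S12
    if common_flag:                                      # S13: one mode for all
        return intra_coding_mode, [basic_mode] * 3
    return intra_coding_mode, [basic_mode,               # S14: C1, C2 follow
                               read("ext_intra_pred_mode_C1"),
                               read("ext_intra_pred_mode_C2")]

stream = iter([("intra_coding_mode", "4x4"), ("pred_mode_common_flag", 0),
               ("basic_intra_pred_mode", 2), ("ext_intra_pred_mode_C1", 5),
               ("ext_intra_pred_mode_C2", 7)])
read = lambda name: next(stream)[1]       # toy reader for demonstration
print(decode_mode_info(read))             # ('4x4', [2, 5, 7])
```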
Fig. 8 shows a variation of the bit stream data array of fig. 6. In fig. 8, the intra prediction mode sharing identification flag 23 is multiplexed not as a macroblock-level flag but as a flag in an upper data layer such as the slice, picture, or sequence, and an extended intra prediction mode table indication flag 31 is provided so that one of a plurality of code tables defining the codes of the extended intra prediction mode 30 can be selected. With this arrangement, when sufficient prediction efficiency can be secured by switching at the slice level or above, overhead bits can be reduced because the intra prediction mode sharing identification flag 23 need not be multiplexed one by one at the macroblock level. In addition, by providing the extended intra prediction mode table indication flag 31 for the extended intra prediction mode 30, a prediction mode definition specialized for the C1 and C2 components can be selected instead of the same definition as the basic intra prediction mode 29, enabling encoding processing adapted to the definition of the color space. For example, in AVC coding of the 4:2:0 format, an intra prediction mode set different from that of luminance (Y) is defined for the color difference components (Cb, Cr). In the 4:2:0 format, the color difference signal in a macroblock is 8 pixels × 8 lines, and one of the 4 modes shown in fig. 9 is selected per macroblock for the prediction processing. Although there are 2 color difference signals, Cb and Cr, the same mode is used for both. Intra_chroma_pred_mode = 0 is DC prediction, which is analogous to intra 16 × 16 prediction mode 2 of fig. 4, except that in DC prediction the 8 × 8 block is divided into four 4 × 4 blocks and the positions of the pixels whose average is taken are changed for each block. In the figure, for a block labeled "a + x, a or x", the average uses the 8 pixels of a and x when both the pixels a and the pixels x are usable, the 4 pixels of a when only a is usable, and the 4 pixels of x when only x is usable; when neither a nor x is usable, the value 128 is used as the predicted image 7. For a block labeled "b or x", the average uses the 4 pixels of b when b is usable, and the 4 pixels of x when only x is usable.
In this way, when the set of intra prediction modes needs to be changed according to the properties of the color components, the structure shown in the syntax of fig. 8 can achieve better coding efficiency.
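The availability-dependent averaging of the chroma DC prediction described above can be sketched per 4 × 4 sub-block as follows (neighbor groups are passed as 4-pixel lists or None; the function names are invented):

```python
def dc_both(a, x):
    """Block of type 'a + x, a or x': average whatever is available, else 128."""
    pixels = (a or []) + (x or [])
    if not pixels:
        return 128                              # neither neighbor available
    return (sum(pixels) + len(pixels) // 2) // len(pixels)   # rounded mean

def dc_either(b, x):
    """Block of type 'b or x': prefer the 4 pixels of b, fall back to x."""
    group = b if b is not None else x
    return 128 if group is None else (sum(group) + 2) // 4

print(dc_both([100] * 4, [104] * 4))   # both usable -> mean of 8 pixels: 102
print(dc_both(None, None))             # neither usable -> 128
print(dc_either(None, [96] * 4))       # b missing -> x's 4 pixels: 96
```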
Embodiment 2
In embodiment 2, another encoding device that performs encoding closed within a frame in units of rectangular regions (macroblocks) of 16 × 16 pixels, into which a video frame input in the 4:4:4 format is equally divided, and a corresponding decoding device are described. As in embodiment 1, the encoding device and the decoding device are based on the encoding scheme adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added.
Fig. 11 shows the configuration of the video encoding device according to embodiment 2, and fig. 12 shows the configuration of the video decoding device according to embodiment 2. In fig. 11, elements denoted by the same reference numerals as the components of the encoding device of fig. 1 are the same elements. In fig. 12, elements denoted by the same reference numerals as the components of the encoding device of fig. 11 are the same elements. In fig. 11, 32 is the transform block size identification flag and 33 is the intra encoding mode sharing identification flag.
Hereinafter, the operations of the entire encoding apparatus and decoding apparatus of embodiment 2, and the intra-encoding/prediction mode determination processing and intra-prediction decoding processing, which are characteristic operations of embodiment 2, will be described with reference to these drawings.
1. Outline of operation of encoding device
In the encoding device of fig. 11, each video frame of the input video signal 1 is in the 4:4:4 format and, as shown in fig. 10, is input to the encoding device in macroblock units, in which the 3 color components are divided into blocks of the same size and grouped together.
The spatial prediction unit 2 performs intra prediction processing for each color component in units of macroblocks, using the local decoded image 15 stored in the memory 16. The intra prediction modes include the intra 4 × 4 prediction mode, which performs spatial prediction using surrounding pixels in units of the 4 pixel × 4 line blocks shown in fig. 3; the intra 8 × 8 prediction mode, which performs spatial prediction using surrounding pixels in units of the 8 pixel × 8 line blocks shown in fig. 13; and the intra 16 × 16 prediction mode, which performs spatial prediction using surrounding pixels in units of the 16 pixel × 16 line macroblocks shown in fig. 4. In the encoding device of embodiment 2, the intra 4 × 4 prediction mode and the intra 8 × 8 prediction mode are switched according to the state of the transform block size identification flag 32. Which intra prediction mode, 4 × 4, 8 × 8, or 16 × 16, is used to encode a given macroblock is expressed by the intra encoding mode, as in fig. 6. The encoding device of embodiment 2 provides two kinds of intra encoding modes: the intra N × N prediction encoding mode (N = 4 or 8), which encodes using either the intra 4 × 4 prediction mode or the intra 8 × 8 prediction mode, and the intra 16 × 16 prediction encoding mode, which encodes using the intra 16 × 16 prediction mode. They are described separately below.
(a) Intra N × N prediction encoding mode
In this mode, encoding proceeds while selectively switching between the intra 4 × 4 prediction mode, in which the 16 × 16 luminance pixel block within a macroblock is divided into 16 blocks of 4 × 4 pixels and a prediction mode is selected for each 4 × 4 pixel block, and the intra 8 × 8 prediction mode, in which it is divided into 4 blocks of 8 × 8 pixels and a prediction mode is selected for each 8 × 8 pixel block. Switching between the intra 4 × 4 and intra 8 × 8 prediction modes follows the state of the transform block size identification flag 32; this point is described later. In the intra 4 × 4 prediction mode, as described in embodiment 1, one of the 9 modes shown in fig. 3 is selected in units of 4 × 4 pixel blocks, and the pixels of the surrounding blocks (upper, upper left, upper right, and left) that have already been encoded, locally decoded, and stored in the memory 16 are used for predicted image generation.
On the other hand, in the intra 8 × 8 prediction mode, one of the 9 modes shown in fig. 13 is selected in units of 8 × 8 pixel blocks. As is clear from a comparison with fig. 3, these are the prediction methods of the intra 4 × 4 prediction mode adapted to 8 × 8 pixel blocks.
Intra8x8_pred_mode = 0: use the adjacent upper pixels as the predicted image.
Intra8x8_pred_mode = 1: use the adjacent left pixels as the predicted image.
Intra8x8_pred_mode = 2: use the average value of the adjacent 8 pixels as the predicted image.
Intra8x8_pred_mode = 3: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a right 45-degree edge).
Intra8x8_pred_mode = 4: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 45-degree edge).
Intra8x8_pred_mode = 5: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 22.5-degree edge).
Intra8x8_pred_mode = 6: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 67.5-degree edge).
Intra8x8_pred_mode = 7: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a right 22.5-degree edge).
Intra8x8_pred_mode = 8: obtain a weighted average for every 2 to 3 pixels from the adjacent pixels and use it as the predicted image (corresponding to a left 112.5-degree edge).
When the intra 4 × 4 prediction mode is selected, 16 pieces of mode information are required per macroblock. Therefore, to reduce the code amount of the mode information itself, the mode information is predictively encoded from the mode information of adjacent blocks, exploiting the high correlation between them. Similarly, when the intra 8 × 8 prediction mode is selected, the prediction mode is predictively encoded from the mode information of adjacent blocks, again exploiting the high correlation with adjacent blocks.
(b) Intra 16 × 16 prediction mode
This mode predicts the 16 × 16 pixel block corresponding to the macroblock size at one time; one of the 4 modes shown in fig. 4 is selected per macroblock. As in the intra 4 × 4 prediction mode, the pixels of the surrounding macroblocks (upper, upper left, and left) that have already been encoded, locally decoded, and stored in the memory 16 are used for predicted image generation. The mode types are as described with fig. 4 in embodiment 1. In the intra 16 × 16 prediction encoding mode, the transform block size is always 4 × 4. In this case, the 16 DC components (direct current components, i.e., average values) of the individual 4 × 4 blocks are first gathered and transformed as one 4 × 4 block, and the remaining alternating current components other than DC are then transformed per 4 × 4 block.
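A sketch of that two-stage arrangement follows. For simplicity a 4 × 4 Hadamard-type kernel is used for both stages; in AVC the first stage is actually an integer DCT approximation and only the second-stage DC transform is Hadamard, so treat the kernels as stand-ins:

```python
import numpy as np

H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])     # 4x4 Hadamard-type kernel (stand-in)

def intra16_transform(residual16):
    """Transform a 16x16 residual as 16 4x4 blocks, then re-transform the
    16 collected DC coefficients as one additional 4x4 block."""
    blocks, dc = {}, np.zeros((4, 4))
    for by in range(4):
        for bx in range(4):
            b = residual16[4*by:4*by+4, 4*bx:4*bx+4]
            c = H @ b @ H.T                  # first-stage 4x4 transform
            dc[by, bx] = c[0, 0]             # gather the DC coefficient
            blocks[(by, bx)] = c             # AC coefficients remain per block
    return H @ dc @ H.T, blocks              # second-stage transform of the DCs

dc_coeff, ac_blocks = intra16_transform(np.arange(256.).reshape(16, 16))
```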
The video encoding device of embodiment 2 is characterized in that the intra prediction/transform/encoding method for the 3 color components is switched according to the intra encoding mode sharing identification flag 33. This point is explained in detail in section 2 below.
The spatial prediction unit 2 evaluates the intra prediction modes for the 3 input color component signals according to the instruction of the intra encoding mode sharing identification flag 33. The intra encoding mode sharing identification flag 33 indicates whether an intra encoding mode is assigned individually to each of the 3 input color components or the same intra encoding mode is assigned to all 3 components. The background is as follows.
In the 4:4:4 format, in addition to the Y, Cb, Cr color space conventionally used for encoding, RGB can be used directly. In the Y, Cb, Cr color space, components dependent on the texture structure of the video are removed from the Cb and Cr signals, so the optimal intra encoding method is likely to differ between the Y component and the two components Cb and Cr (in fact, in coding schemes targeting the 4:2:0 format of AVC/H.264, such as the high profile, the designs of the intra prediction modes used for the Y component and for the Cb and Cr components differ). On the other hand, when encoding is performed in the RGB color space, the texture structure is not removed from the color components as it is with Y, Cb, Cr, so the correlation between signal components in the same space is high, and coding efficiency may be improved by making the intra encoding mode commonly selectable. This depends not only on the definition of the color space: even when a specific color space is used, the optimal mode is influenced by the properties of the video, so it is desirable that the encoding scheme itself can adapt appropriately to the properties of the video signal. Therefore, in this embodiment, the intra encoding mode sharing identification flag 33 is provided, and the encoding device is configured so that 4:4:4 format video can be encoded flexibly.
In the spatial prediction unit 2, according to the state of the intra encoding mode sharing identification flag 33 set as described above, prediction processing is executed for each color component over all the intra prediction modes shown in figs. 3, 4, and 13, or over a predetermined subset, and the subtractor 3 obtains the prediction difference signal 4. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determining unit 5, and the intra prediction mode giving the optimum prediction efficiency for the target macroblock is selected from the prediction processing executed in the spatial prediction unit 2. Here, when intra N × N prediction is selected, the intra N × N prediction encoding mode is output as the encoding mode 6; if the prediction mode is intra 4 × 4 prediction, the transform block size identification flag 32 is set to "transform at 4 × 4 block size", and if it is intra 8 × 8 prediction, the flag is set to "transform at 8 × 8 block size". Various methods are conceivable for determining the transform block size identification flag 32, but in the encoding device of embodiment 2, since the flag determines the block size used when transforming the residual obtained by intra N × N prediction, the basic method is to decide the optimum intra N × N prediction mode in the encoding mode determining unit 5 and then match the flag to that value of N. For example, suppose the intra 4 × 4 prediction mode is used but the transform block size is set to 8 × 8 pixels: then, in the prediction difference signal 4 obtained as the result of prediction, the spatial continuity of the prediction signal is likely to be broken at 4 × 4 block boundaries, unnecessary high-frequency components are generated, and the signal-power concentration effect of the transform weakens. Setting the transform block size to 4 × 4 pixels, matching the prediction mode, avoids this problem.
When intra 16 × 16 prediction is selected by the coding mode determination unit 5, an intra 16 × 16 prediction coding mode is output as the coding mode 6. In selecting the coding mode 6, the weighting coefficient 20 for each coding mode, determined by the coding control unit 19, is also taken into account.
The prediction difference signal 4 obtained according to the coding mode 6 is output to the orthogonal transformation section 8. The orthogonal transformation unit 8 transforms the input prediction difference signal and outputs the transformed prediction difference signal to the quantization unit 9 as an orthogonal transformation coefficient. The quantization unit 9 quantizes the input orthogonal transform coefficient based on the quantization parameter 21 determined by the encoding control unit 19, and outputs the quantized orthogonal transform coefficient to the variable length encoding unit 11 as the quantized transform coefficient 10.
When the transform block size is 4 × 4 block units, the prediction difference signal 4 input to the orthogonal transform unit 8 is divided into 4 × 4 block units, orthogonal transformed, and quantized by the quantization unit 9. When the transform block size is 8 × 8 block units, the prediction difference signal 4 input to the orthogonal transform unit 8 is divided into 8 × 8 block units, orthogonal transformed, and quantized by the quantization unit 9.
The quantized transform coefficient 10 is subjected to average information amount (entropy) encoding by means of Huffman coding or arithmetic coding in the variable length encoding unit 11. The quantized transform coefficient 10 is also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse orthogonal transform unit 13, using the block size given by the transform block size identification flag 32 and the like, and is added to the predicted image 7 generated according to the coding mode 6 by the adder 18, thereby generating the local decoded image 15. The local decoded image 15 is used in subsequent intra prediction processing, and is therefore stored in the memory 16. Further, a deblocking filter control flag 24 indicating whether or not to perform deblocking filtering on the macroblock is input to the variable length encoding unit 11 (in the prediction processing performed by the spatial prediction unit 2, the pixel data stored in the memory 16 before deblocking filtering is used, so the deblocking filtering process itself is not necessary for encoding, but the decoding apparatus performs deblocking filtering in accordance with the instruction of the deblocking filter control flag 24 to obtain the final decoded image).
The intra-coding mode sharing flag 33, the quantized transform coefficient 10, the coding mode 6, and the quantization parameter 21 input to the variable length coding unit 11 are aligned and shaped into a bit stream according to a predetermined rule (syntax), and transmitted to the transmission buffer 17. The transmission buffer 17 smoothes the bit stream in accordance with the bandwidth of the transmission path connected to the encoding apparatus and the reading speed of the recording medium, and outputs the bit stream as a video stream 22. In addition, the feedback information is output to the encoding control unit 19 in accordance with the status of bit stream accumulation in the transmission buffer 17, and the amount of generated code in encoding of the subsequent video frame is controlled.
2. Intra-coding mode/prediction mode determination processing in coding apparatus
The determination processing of the intra coding mode and the intra prediction mode, which is a feature of the coding apparatus of embodiment 2, will be described in detail. This processing is performed in units of macroblocks in which the 3 color components are aggregated, and is mainly executed by the spatial prediction unit 2 and the encoding mode determination unit 5 in the encoding device of fig. 11. Fig. 14 is a flowchart showing the flow of this processing. Hereinafter, the image data of 3 color components constituting a block are referred to as C0, C1, and C2.
First, the coding mode determination unit 5 receives the intra-coding mode sharing flag 33 and determines, based on its value, whether or not a common intra-coding mode is used for C0, C1, and C2 (step S20 in fig. 14). If the mode is shared, the process proceeds to step S21 and onward; if not, to step S22 and onward.
When the intra-coding mode is used in common for C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all the selectable intra-prediction modes (intra N × N prediction, intra 16 × 16 prediction), and the spatial prediction unit 2 evaluates all the prediction efficiencies thereof and selects the intra-coding mode and the intra-prediction mode that are optimal for all the components (step S21).
On the other hand, when the optimal mode is selected for each of C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all the intra-prediction modes (intra N × N prediction, intra 16 × 16 prediction) selectable for the component Ci (0 ≤ i < 3), and the spatial prediction unit 2 evaluates all their prediction efficiencies and selects the intra-coding mode and intra-prediction mode that are optimal for the component Ci (step S23).
In the above-described steps S21 and S23, when the spatial prediction unit 2 selects the intra 4 × 4 prediction mode as the mode that yields the optimal prediction efficiency, the transform block size flag 32 is set to "transform of 4 × 4 block size", and when the spatial prediction unit 2 selects the intra 8 × 8 prediction mode as the mode that yields the optimal prediction efficiency, the transform block size flag 32 is set to "transform of 8 × 8 block size".
As a criterion for the prediction efficiency evaluation of the prediction modes performed in the spatial prediction unit 2, for example, the rate-distortion cost given by the following formula can be used:
Jm = Dm + λ·Rm (λ: a positive number)
Here, Dm is the coding distortion or the prediction error amount when the intra prediction mode m is applied. The coding distortion is obtained by applying the intra prediction mode m to obtain a prediction error, decoding the video from the result of transforming and quantizing that prediction error, and measuring the error with respect to the signal before encoding. The prediction error amount is obtained by taking the difference between the predicted image and the signal before encoding when the intra prediction mode m is applied and quantifying the magnitude of that difference; for example, the sum of absolute differences (SAD) can be used. Rm is the amount of code generated when the intra prediction mode m is applied. That is, Jm is a value that expresses the trade-off between the amount of code and the degree of deterioration when the intra prediction mode m is applied, and the intra prediction mode m that minimizes Jm gives the optimal solution.
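A minimal sketch of such a mode decision in C is shown below; computeDistortion() and computeRate() are hypothetical helpers standing in for the measurement of Dm and Rm, and are not part of the described apparatus.

    extern double computeDistortion(int m);   /* hypothetical: returns Dm for mode m */
    extern double computeRate(int m);         /* hypothetical: returns Rm for mode m */

    /* Select the intra prediction mode m that minimizes Jm = Dm + lambda * Rm. */
    int selectBestIntraPredMode(int numModes, double lambda)
    {
        int    bestMode = 0;
        double bestCost = -1.0;
        for (int m = 0; m < numModes; m++) {
            double Jm = computeDistortion(m) + lambda * computeRate(m);
            if (bestCost < 0.0 || Jm < bestCost) {
                bestCost = Jm;                /* smallest Jm found so far */
                bestMode = m;
            }
        }
        return bestMode;
    }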
When the encoding apparatus performs the processing of step S21 and onward, one piece of intra-prediction mode information is assigned to a macroblock for the 3 color components together. On the other hand, when the processing of step S22 and onward is performed, intra-prediction mode information is assigned to each color component (3 pieces in total). Since the intra-prediction mode information assigned to the macroblock therefore differs, the intra-coding mode sharing flag 33 must be multiplexed into the bit stream so that the decoding apparatus can recognize whether the encoding apparatus used the procedure from S21 onward or the procedure from S22 onward. Fig. 15 shows the data array of such a bit stream.
In fig. 15, the intra coding modes 0 (34a), 1 (34b), and 2 (34c) multiplexed into the bitstream at the macroblock level indicate the coding modes 6 corresponding to the C0, C1, and C2 components, respectively. When the intra coding mode is the intra N × N prediction coding mode, the transform block size identification flag 32 and the intra-prediction mode information are multiplexed into the bitstream. On the other hand, when the intra coding mode is the intra 16 × 16 prediction coding mode, the intra-prediction mode information is coded as part of the intra coding mode information, and the transform block size identification flag 32 and the intra-prediction mode information are not multiplexed into the bitstream. When the intra-coding mode sharing flag 33 indicates "common to C0, C1, and C2", the intra coding modes 1 (34b) and 2 (34c), the transform block size identification flags 1 (32b) and 2 (32c), and the intra prediction modes 1 (35b) and 2 (35c) are not multiplexed into the bitstream (the broken-line circles in fig. 15 indicate the corresponding branches). In this case, the intra coding mode 0 (34a), the transform block size identification flag 0 (32a), and the intra prediction mode 0 (35a) each function as coding information common to all color components. Fig. 15 shows an example in which the intra-coding mode sharing flag 33 is multiplexed into the bit stream at a level higher than the macroblock, such as the slice, picture, or sequence. In particular, for a use case like the one described in embodiment 2, since the color space often does not change within a sequence, the purpose can be achieved by multiplexing the intra-coding mode sharing flag 33 at the sequence level.
In embodiment 2, the intra-coding mode sharing flag 33 is used in the sense of "whether or not the mode is common to all components", but depending on the color space definition of the input video signal 1 it may instead be used in the sense of "whether or not the mode is common to a specific 2 components such as C1 and C2" (in the case of Y, Cb, Cr and the like, there is a high possibility that Cb and Cr can share the mode). Further, when the sharing range of the intra-coding mode sharing flag 33 is limited to the intra-coding mode only and the intra N × N prediction mode is used, the transform block size and the N × N prediction mode may be selected independently for each color component (fig. 16). With the syntax structure of fig. 16, for video with complicated texture that requires N × N prediction, the coding mode information can be shared while the prediction method is changed for each color component, improving prediction efficiency.
Further, if the value of the intra-coding mode sharing flag 33 is known in advance to both the encoding apparatus and the decoding apparatus by some means, it need not be included in the video bit stream. In that case, for example, the encoding apparatus may be configured to encode with the intra-coding mode sharing flag 33 fixed to an arbitrary value, or the flag may be transmitted separately from the video bit stream.
3. Outline of operation of decoding device
The decoding apparatus of fig. 12 receives the video stream 22 corresponding to the array of fig. 15 output from the encoding apparatus of fig. 11, and decodes 3 color components in units of macroblocks of the same size (4:4:4 format) to restore each video frame.
First, the variable length decoding unit 25 decodes the video stream 22 in accordance with a predetermined rule (syntax) and extracts information such as the intra-coding mode sharing flag 33, the quantized transform coefficient 10, the coding mode 6, and the quantization parameter 21. The quantized transform coefficient 10 is input together with the quantization parameter 21 to the inverse quantization unit 12, where inverse quantization is performed. Its output is input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. On the other hand, the spatial prediction unit 2 receives the coding mode 6 and the intra-coding mode sharing flag 33 and obtains the predicted image 7 based on this information. The specific procedure for obtaining the predicted image 7 is described later. The adder 18 adds the local decoded prediction difference signal 14 and the predicted image 7 to obtain the provisional decoded image 15 (this is the same signal as the local decoded image 15 in the encoding apparatus). The provisional decoded image 15 is written back to the memory 16 for use in the intra prediction of subsequent macroblocks. The memory 16 is prepared with 3 planes, one for each color component. In addition, the deblocking filter 26 is applied to the provisional decoded image 15 in accordance with the instruction of the deblocking filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
4. Intra-prediction decoding process of decoding apparatus
The intra prediction image generation process, which is a feature of the decoding apparatus of embodiment 2, will be described in detail. This processing is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the variable length decoding unit 25 and the spatial prediction unit 2 in the decoding device of fig. 12. Fig. 17 is a flowchart showing the flow of the present process.
S25 to S38 in the flowchart of fig. 17 are executed in the variable length decoding unit 25. The video stream 22 input to the variable length decoding unit 25 conforms to the data array of fig. 15. In step S25, the intra coding mode 0 (34a) (corresponding to C0) in the data of fig. 15 is decoded first. If the intra coding mode 0 (34a) is "intra N × N prediction", the transform block size identification flag 0 (32a) and the intra prediction mode 0 (35a) are also decoded (steps S26 and S27). Next, when it is determined from the state of the intra-coding mode sharing flag 33 that the intra-coding and prediction modes are common to all color components, the intra coding mode 0 (34a), the transform block size identification flag 0 (32a), and the intra prediction mode 0 (35a) are set as the coding information used for the C1 and C2 components (steps S29 and S30). Fig. 17 illustrates the processing in units of macroblocks; the intra-coding mode sharing flag 33 used in step S29 has already been read out from the bit stream 22 by the variable length decoding unit 25 at a layer above the slice level before the processing of fig. 17 starts.
When it is determined in step S29 of fig. 17 that the intra-coding/prediction mode information is encoded for each color component, the intra-coding/prediction mode information for the C1 and C2 components is decoded in the processing of the subsequent steps S31 to S38. The encoding mode 6 of each color component is determined through the above processing procedure, and the determined encoding mode is input to the spatial prediction unit 2, so that an intra-prediction image of each color component is obtained in accordance with steps S39 to S41. The process of obtaining the intra-prediction image corresponds to the steps in fig. 3, 4, and 13, and is the same as the process performed by the encoding device in fig. 11.
As described above, if the value of the intra-coding mode sharing flag 33 is known in advance to both the encoding apparatus and the decoding apparatus by some means, the decoding apparatus may operate with a predetermined value instead of parsing it from the video bit stream, or the value may be transmitted separately from the video bit stream.
In the 4:2:0 format adopted in the conventional video encoding standard, the definition of the color space is fixed to Y, Cb, and Cr, but in the 4:4:4 format, not only Y, Cb, and Cr, but also various color spaces can be used. By configuring the coding information of the intra-prediction macroblock as shown in fig. 15 and 16, it is possible to perform an optimal coding process in accordance with the color space definition of the input video signal 1 and the properties of the video signal, and to perform a video decoding/playback process by uniquely interpreting a bit stream obtained as a result of such a coding process.
Example 3
In embodiment 3, another configuration example of the encoding apparatus of fig. 11 and the decoding apparatus of fig. 12 is shown. As in embodiment 1, this encoding apparatus and decoding apparatus are based on the encoding method adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added. The video encoding apparatus of embodiment 3 differs from the encoding apparatus of embodiment 2 described with reference to fig. 11 only in the variable length encoding unit 11. The video decoding apparatus of embodiment 3 differs from the decoding apparatus of embodiment 2 described with reference to fig. 12 only in the variable length decoding unit 25. The other operations are the same as in embodiment 2, and only the differences are described here.
1. Coding step of intra-prediction mode information in coding device
In the encoding apparatus of embodiment 2, the data array on the bit stream was shown for the intra N × N prediction mode information, but the encoding procedure itself was not specifically shown. In the present embodiment, a specific method for that encoding step is shown. The present embodiment is characterized in particular by the following: considering that the values of the intra N × N prediction modes can be highly correlated between color components, average information amount (entropy) encoding that exploits this correlation of values between color components is applied to the intra N × N prediction modes obtained for the respective color components.
In the following description, a bit stream array of the format of fig. 16 is assumed. For simplification of description, it is assumed that the intra coding mode sharing flag 33 is set to share the intra coding mode among C0, C1, and C2, the intra coding mode is an intra nxn prediction mode, and the transform block sizes 0 to 2 are 4 × 4 blocks. In this case, all of the intra prediction modes 0 to 2(35a to 35c) are intra 4 × 4 prediction modes. In fig. 18 to 20, the current macroblock to be encoded is assumed to be X. In addition, it is assumed that the macroblock in the left neighborhood is macroblock a and the macroblock immediately above is macroblock B.
Fig. 18 to 20 are diagrams illustrating the encoding procedure for each color component of C0, C1, and C2. Fig. 21 and 22 show flowcharts of steps.
Fig. 18 shows the case of the C0 component of the macroblock X. Here, the 4 × 4 block to be encoded is referred to as block X, and the 4 × 4 blocks to the left of and above block X are referred to as block A and block B, respectively. In the macroblock X, there are 2 cases depending on the position of the 4 × 4 block to be encoded. Case 1 is the case where the 4 × 4 blocks to the left of and above the 4 × 4 block to be encoded belong to a macroblock other than the current macroblock X, that is, to macroblock A or macroblock B. Case 2 is the case where those 4 × 4 blocks belong to the inside of the current macroblock X, that is, to macroblock X. In either case, one intra 4 × 4 prediction mode is assigned to each 4 × 4 block X within macroblock X, and this is called CurrIntraPredMode. The intra 4 × 4 prediction mode of block A is IntraPredModeA, and that of block B is IntraPredModeB. IntraPredModeA and IntraPredModeB are both information already encoded by the time block X is encoded. When encoding the intra 4 × 4 prediction mode of a given block X, these parameters are first assigned (step S50 in fig. 21).
Next, the prediction value predCurrIntraPredMode corresponding to CurrIntraPredMode of block X is determined by the following equation (step S51).
predCurrIntraPredMode = Min(IntraPredModeA, IntraPredModeB)
Next, CurrIntraPredMode of the C0 component is encoded. Here, if CurrIntraPredMode == predCurrIntraPredMode, a 1-bit flag (prev_intra_pred_mode_flag) indicating that the mode equals the prediction value is encoded. If CurrIntraPredMode != predCurrIntraPredMode, prev_intra_pred_mode_flag is set to 0 and CurrIntraPredMode is compared with predCurrIntraPredMode: when CurrIntraPredMode is smaller, CurrIntraPredMode is encoded as it is as rem_intra_pred_mode; when it is larger, CurrIntraPredMode - 1 is encoded (step S52).
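The C0-component step can be sketched as follows (a non-normative illustration of the logic just described; emitBit() and emitValue() are hypothetical bitstream-writing helpers):

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C0 component, steps S50-S52. */
    void encodeIntra4x4ModeC0(int CurrIntraPredMode,
                              int IntraPredModeA, int IntraPredModeB)
    {
        int predCurrIntraPredMode = MIN(IntraPredModeA, IntraPredModeB);

        if (CurrIntraPredMode == predCurrIntraPredMode) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
        } else {
            emitBit(0);               /* prev_intra_pred_mode_flag = 0            */
            if (CurrIntraPredMode < predCurrIntraPredMode)
                emitValue(CurrIntraPredMode);          /* rem_intra_pred_mode     */
            else
                emitValue(CurrIntraPredMode - 1);      /* rem_intra_pred_mode     */
        }
    }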
Next, referring to fig. 19, the encoding procedure for the C1 component is shown. First, as for the C0 component, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set in accordance with the position of the block X (step S53).
Next, the predictor candidate 1, predCurrIntraPredMode1, corresponding to CurrIntraPredMode of block X is determined by the following equation (step S54).
predCurrIntraPredMode1=Min(IntraPredModeA,IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in the C0 component, this predCurrIntraPredMode1 is used as it is as predCurrIntraPredMode for block X of the C1 component. The reason is as follows. If prev_intra_pred_mode_flag = 1 was adopted at the same block position in the C0 component, it indicates that the correlation between prediction modes is high in the nearby image region for the C0 component. In the case of RGB signals and the like, in which the correlation of the texture structure between the C0 and C1 components has not been completely removed, the correlation between nearby image regions may likewise be high in the C1 component, as in the C0 component. Therefore, the prediction value for the C1 component is determined so as not to depend on the intra 4 × 4 prediction mode of the C0 component.
On the other hand, in the C0 component, when prev_intra_pred_mode_flag was 0, that is, when rem_intra_pred_mode was encoded (step S55), CurrIntraPredMode of the C0 component is taken as predictor candidate 2 (step S56). That is,
predCurrIntraPredMode2=CurrIntraPredMode_C0
The background for using this as a predictor candidate is as follows. If rem_intra_pred_mode was encoded for the C0 component, it indicates that the correlation of intra prediction between nearby image regions is low in the C0 component. In that case, the correlation between nearby image regions is expected to be low for the C1 component as well, and the intra prediction mode at the same block position in a different color component may then provide a better prediction value.
The prediction value of CurrIntraPredMode of block X of the C1 component is finally determined as one of predCurrIntraPredMode1 and predCurrIntraPredMode2 (step S57). Which value is used is additionally encoded with a 1-bit flag (pred_flag). However, pred_flag is encoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is encoded), predCurrIntraPredMode1 is used as the prediction value.
If the above steps are expressed as a procedure, they are as follows:
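A non-normative C sketch of this procedure is given below (emitBit() and emitValue() are hypothetical bitstream writers, and the ordering of the syntax elements is an assumption of the sketch):

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C1 component, steps S54-S58. prevFlagC0 is the prev_intra_pred_mode_flag
       coded at the same block position in C0; CurrIntraPredMode_C0 is the C0
       mode there (predictor candidate 2). */
    void encodeIntra4x4ModeC1(int CurrIntraPredMode,
                              int IntraPredModeA, int IntraPredModeB,
                              int prevFlagC0, int CurrIntraPredMode_C0)
    {
        int predCand1 = MIN(IntraPredModeA, IntraPredModeB);

        if (prevFlagC0 == 0 && CurrIntraPredMode == CurrIntraPredMode_C0) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
            emitBit(1);               /* pred_flag = 1: candidate 2 used          */
        } else if (CurrIntraPredMode == predCand1) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
            if (prevFlagC0 == 0)
                emitBit(0);           /* pred_flag = 0: candidate 1 used          */
        } else {
            emitBit(0);               /* no match: candidate 1 is the predictor   */
            emitValue(CurrIntraPredMode < predCand1
                          ? CurrIntraPredMode
                          : CurrIntraPredMode - 1);    /* rem_intra_pred_mode     */
        }
    }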
As a result, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode are encoded into the coded data (step S58).
Next, referring to fig. 20, the encoding procedure for the C2 component is shown. First, as for the C0 and C1 components, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set in accordance with the position of the block X (step S59).
Next, the predictor candidate 1, predCurrIntraPredMode1, corresponding to CurrIntraPredMode of block X is determined by the following equation (step S60).
predCurrIntraPredMode1=Min(IntraPredModeA,IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in both the C0 and C1 components, this predCurrIntraPredMode1 is used as it is as predCurrIntraPredMode for block X of the C2 component. The reason is as follows. If prev_intra_pred_mode_flag = 1 was adopted at the same block position in both the C0 and C1 components, it indicates that the correlation between prediction modes is high in the nearby image regions for the C0 and C1 components. In the case of RGB signals and the like, in which the correlation of the texture structure among the C0, C1, and C2 components has not been completely removed, the correlation between nearby image regions may likewise be high in the C2 component, as in the C0 and C1 components. Therefore, the prediction value for the C2 component is determined so as not to depend on the intra 4 × 4 prediction modes of the C0 and C1 components.
On the other hand, when prev_intra_pred_mode_flag was 0 in the C0 or C1 component, that is, when rem_intra_pred_mode was encoded there (step S61), CurrIntraPredMode of the C0 or C1 component is taken as predictor candidate 2 (step S62).
The background for using this as a predictor candidate is as follows. If rem_intra_pred_mode was encoded for the C0 or C1 component, it indicates that the correlation of intra prediction between nearby image regions is low in that component. In that case, the correlation between nearby image regions is expected to be low for the C2 component as well, and the intra prediction mode at the same block position in a different color component may then provide a better prediction value. Following this line of reasoning, when rem_intra_pred_mode was encoded for both the C0 and C1 components, the current intra prediction modes of both C0 and C1 are conceivable as prediction value candidates, but here the current intra prediction mode of the C1 component is used as the prediction value. The reason is that when a YUV color space is input, C0 is highly likely to be treated as luminance and C1/C2 as color difference, and in that case C1 is considered closer to the prediction mode of C2 than C0 is. When an RGB color space is input, the choice between C0 and C1 is not considered a major factor, and using the C1 component for the prediction value is generally appropriate (the C0 component may also be used, depending on the design).
The prediction value of CurrIntraPredMode of block X of the C2 component is finally determined as one of predCurrIntraPredMode1 and predCurrIntraPredMode2 (step S63). Which value is used is additionally encoded with a 1-bit flag (pred_flag). If the above steps are expressed as a procedure, they are as follows:
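A corresponding non-normative sketch for the C2 component (taking predictor candidate 2 from the C1 component, as described above; emitBit() and emitValue() remain hypothetical writers):

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C2 component, steps S60-S64. prevFlagC0/prevFlagC1 are the flags coded at
       the same block position in C0 and C1; CurrIntraPredMode_C1 is candidate 2. */
    void encodeIntra4x4ModeC2(int CurrIntraPredMode,
                              int IntraPredModeA, int IntraPredModeB,
                              int prevFlagC0, int prevFlagC1,
                              int CurrIntraPredMode_C1)
    {
        int predCand1 = MIN(IntraPredModeA, IntraPredModeB);
        int bothPrev  = (prevFlagC0 == 1) && (prevFlagC1 == 1);

        if (!bothPrev && CurrIntraPredMode == CurrIntraPredMode_C1) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
            emitBit(1);               /* pred_flag = 1: candidate 2 used          */
        } else if (CurrIntraPredMode == predCand1) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
            if (!bothPrev)
                emitBit(0);           /* pred_flag = 0: candidate 1 used          */
        } else {
            emitBit(0);               /* prev_intra_pred_mode_flag = 0            */
            emitValue(CurrIntraPredMode < predCand1
                          ? CurrIntraPredMode
                          : CurrIntraPredMode - 1);    /* rem_intra_pred_mode     */
        }
    }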
As a result, prev _ intra _ pred _ mode _ flag, pred _ flag, rem _ intra _ pred _ mode are encoded into encoded data (step S64).
The above encoding steps can be defined similarly for the intra 8 × 8 prediction mode. By encoding the intra N × N prediction mode in such a procedure, the correlation with the prediction modes selected in the other color components can be exploited, the amount of code of the prediction mode itself can be reduced, and encoding efficiency can be improved.
Figs. 21 and 22 differ in whether the encoding of the intra prediction modes within an MB is performed separately for each color component or all together. In the case of fig. 21, the modes of the color components are encoded in units of 4 × 4 blocks, and arrays of 16 of them are multiplexed into the bit stream (step S65). In the case of fig. 22, the 16 4 × 4 blocks of each color component are encoded collectively, and the results are multiplexed into the bit stream separately for each color component (steps S66, S67, S68).
In the above steps, pred_flag was defined as information that is valid only when prev_intra_pred_mode_flag is 1, but it may also be defined to be valid when prev_intra_pred_mode_flag is 0. That is, taking the C1 component as an example, encoding may be performed by the following procedure.
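One possible form of this modified procedure, sketched non-normatively (the rule for choosing between the candidates and the bit ordering are assumptions of this sketch):

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C1 component: pred_flag is encoded whenever the co-located C0 block was
       coded with rem_intra_pred_mode, even if prev_intra_pred_mode_flag is 0. */
    void encodeIntra4x4ModeC1var(int CurrIntraPredMode,
                                 int IntraPredModeA, int IntraPredModeB,
                                 int prevFlagC0, int CurrIntraPredMode_C0)
    {
        int pred = MIN(IntraPredModeA, IntraPredModeB);   /* candidate 1          */

        if (prevFlagC0 == 0) {
            /* assumed selection rule: use candidate 2 when it matches exactly   */
            int useCand2 = (CurrIntraPredMode == CurrIntraPredMode_C0);
            emitBit(useCand2);        /* pred_flag                                */
            if (useCand2)
                pred = CurrIntraPredMode_C0;              /* candidate 2          */
        }
        if (CurrIntraPredMode == pred) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
        } else {
            emitBit(0);               /* prev_intra_pred_mode_flag = 0            */
            emitValue(CurrIntraPredMode < pred ? CurrIntraPredMode
                                               : CurrIntraPredMode - 1);
        }
    }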
In this method, pred_flag is always encoded whenever rem_intra_pred_mode was encoded for the intra prediction mode of the block at the same position in the C0 component, so a more accurate prediction value can be used even when prev_intra_pred_mode_flag is 0, and an improvement in encoding efficiency can be expected. Furthermore, pred_flag may be encoded regardless of whether rem_intra_pred_mode was encoded for the intra prediction mode of the block at the same position in the C0 component. In this case, the intra prediction mode of the C0 component is always used as a prediction value candidate.
That is, the formula in this case is as follows.
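Sketched in the same non-normative style, with pred_flag now encoded unconditionally:

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C1 component: pred_flag is always encoded, so the C0 mode is always a
       prediction value candidate. */
    void encodeIntra4x4ModeC1always(int CurrIntraPredMode,
                                    int IntraPredModeA, int IntraPredModeB,
                                    int CurrIntraPredMode_C0)
    {
        int useCand2 = (CurrIntraPredMode == CurrIntraPredMode_C0); /* assumed rule */
        int pred     = useCand2 ? CurrIntraPredMode_C0
                                : MIN(IntraPredModeA, IntraPredModeB);

        emitBit(useCand2);            /* pred_flag                                */
        if (CurrIntraPredMode == pred) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
        } else {
            emitBit(0);               /* prev_intra_pred_mode_flag = 0            */
            emitValue(CurrIntraPredMode < pred ? CurrIntraPredMode
                                               : CurrIntraPredMode - 1);
        }
    }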
In addition, pred_flag may be set not in units of 4 × 4 blocks but in units of macroblocks or sequences. When the choice between prediction value candidates 1 and 2 is made common to all 4 × 4 blocks within a macroblock, the overhead information transmitted as pred_flag can be further reduced. Which of prediction value candidate 1 and 2 to use may also be determined in units of sequences in accordance with the input color space definition. In this case, pred_flag need not be transmitted for each macroblock, and the overhead information is reduced further.
2. Decoding step of intra-prediction mode information of decoding device
In the decoding apparatus of embodiment 2, the data array on the bit stream was shown for the intra N × N prediction mode information, but the decoding procedure itself was not specifically shown. In embodiment 3, a specific method for that decoding step is shown. Embodiment 3 is characterized in particular by the following: considering that the values of the intra N × N prediction modes can be highly correlated between color components, a bit stream on which average information amount (entropy) encoding exploiting this correlation of values between color components was performed is decoded for the intra N × N prediction modes of the respective color components.
In the following description, a bit stream array of the format of fig. 16 is assumed. Since the description is limited to the decoding procedure of the intra prediction mode, it is assumed that the value of the intra-coding mode sharing flag 33 in the bit stream is set so that the intra coding mode is common to C0, C1, and C2. In addition, the intra coding mode is assumed to be the intra N × N prediction mode, and the transform block sizes 0 to 2 are assumed to be 4 × 4 blocks. In this case, all of the intra prediction modes 0 to 2 (35a to 35c) are intra 4 × 4 prediction modes. The relationships of figs. 18 to 20 are used for the decoding apparatus just as for the encoding apparatus. In the decoding apparatus, the current macroblock to be decoded is assumed to be X, the macroblock to its left is macroblock A, and the macroblock immediately above is macroblock B. Fig. 23 shows a flowchart of the decoding step. In fig. 23, steps given the same reference numerals as in figs. 21 and 22 indicate the same processing as in the encoding apparatus.
Fig. 18 shows the situation for the C0 component of macroblock X. Here, the 4 × 4 block to be decoded is called block X, and the 4 × 4 blocks to its left and above are called block A and block B, respectively. In macroblock X there are 2 cases depending on the position of the 4 × 4 block to be decoded. Case 1 is the case where the 4 × 4 blocks to the left of and above the 4 × 4 block to be decoded belong to a macroblock other than the current macroblock X, that is, to macroblock A or macroblock B. Case 2 is the case where those 4 × 4 blocks belong to the inside of the current macroblock X. In either case, one intra 4 × 4 prediction mode is assigned to each 4 × 4 block X within macroblock X, and this is called CurrIntraPredMode. The intra 4 × 4 prediction mode of block A is IntraPredModeA, and that of block B is IntraPredModeB. IntraPredModeA and IntraPredModeB are both information already decoded by the time block X is decoded. When decoding the intra 4 × 4 prediction mode of a given block X, these parameters are first assigned (step S50).
Next, the prediction value predCurrIntraPredMode corresponding to CurrIntraPredMode of block X is determined by the following equation (step S51).
predCurrIntraPredMode = Min(IntraPredModeA, IntraPredModeB)
Next, the 1-bit flag (prev_intra_pred_mode_flag) indicating whether or not CurrIntraPredMode == predCurrIntraPredMode is decoded. prev_intra_pred_mode_flag = 1 means CurrIntraPredMode == predCurrIntraPredMode. Otherwise (prev_intra_pred_mode_flag = 0), the information of rem_intra_pred_mode is decoded from the bit stream. rem_intra_pred_mode is compared with predCurrIntraPredMode: when rem_intra_pred_mode is smaller, CurrIntraPredMode is set to rem_intra_pred_mode; when it is equal to or larger, CurrIntraPredMode is set to rem_intra_pred_mode + 1 (step S65).
These steps can be summarized as follows.
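A non-normative C sketch of this C0 decoding step (readBit() and readValue() are hypothetical bitstream-reading helpers):

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C0 component, steps S50, S51, S65. */
    int decodeIntra4x4ModeC0(int IntraPredModeA, int IntraPredModeB)
    {
        int predCurrIntraPredMode = MIN(IntraPredModeA, IntraPredModeB);

        if (readBit())                /* prev_intra_pred_mode_flag = 1            */
            return predCurrIntraPredMode;

        {
            int rem = readValue();    /* rem_intra_pred_mode                      */
            return (rem < predCurrIntraPredMode) ? rem : rem + 1;
        }
    }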
Next, referring to fig. 19, the decoding procedure for the C1 component is shown. First, as for the C0 component, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set in accordance with the position of the block X (step S53).
Next, the predictor candidate 1, predCurrIntraPredMode1, corresponding to CurrIntraPredMode of block X is determined by the following equation (step S54).
predCurrIntraPredMode1=Min(IntraPredModeA,IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in the C0 component, this predCurrIntraPredMode1 is used as it is as predCurrIntraPredMode for block X of the C1 component. This is for the same reason as explained for the encoding apparatus.
On the other hand, in the C0 component, when prev_intra_pred_mode_flag was 0, that is, when rem_intra_pred_mode was decoded (step S55), CurrIntraPredMode of the C0 component is taken as predictor candidate 2 (step S56). That is,
predCurrIntraPredMode2=CurrIntraPredMode_C0
The reason why this is used as a predictor candidate is the same as that explained in the encoding apparatus.
The prediction value of CurrIntraPredMode of block X of the C1 component is finally determined as one of predCurrIntraPredMode1 and predCurrIntraPredMode2 (step S57). Which value is used is determined by decoding a 1-bit flag (pred_flag). However, pred_flag is decoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is decoded), predCurrIntraPredMode1 is used as the prediction value.
After predictor candidate 1, predictor candidate 2, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode are obtained, CurrIntraPredMode is decoded by the following procedure (step S66).
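A non-normative sketch of this C1 decoding step, mirroring the encoder-side sketch (hypothetical readBit()/readValue() readers; the syntax-element ordering is an assumption):

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C1 component, step S66. */
    int decodeIntra4x4ModeC1(int IntraPredModeA, int IntraPredModeB,
                             int prevFlagC0, int CurrIntraPredMode_C0)
    {
        int predCand1 = MIN(IntraPredModeA, IntraPredModeB);

        if (readBit()) {              /* prev_intra_pred_mode_flag = 1            */
            if (prevFlagC0 == 0 && readBit())        /* pred_flag = 1             */
                return CurrIntraPredMode_C0;         /* candidate 2               */
            return predCand1;                        /* candidate 1               */
        }
        {
            int rem = readValue();    /* rem_intra_pred_mode vs. candidate 1      */
            return (rem < predCand1) ? rem : rem + 1;
        }
    }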
Next, the decoding procedure for the C2 component is shown in fig. 20. First, as for the C0 and C1 components, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set in accordance with the position of the block X (step S59).
Next, the predictor candidate 1, predCurrIntraPredMode1, corresponding to CurrIntraPredMode of block X is determined by the following equation (step S60).
predCurrIntraPredMode1=Min(IntraPredModeA,IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in both the C0 and C1 components, this predCurrIntraPredMode1 is used as it is as predCurrIntraPredMode for block X of the C2 component. This is for the same reason as explained for the encoding apparatus.
On the other hand, when prev_intra_pred_mode_flag was 0 in the C0 or C1 component, that is, when rem_intra_pred_mode was decoded there (step S61), CurrIntraPredMode of the C0 or C1 component is taken as predictor candidate 2 (step S62).
The reason why this is used as a predictor candidate is the same as that explained in the encoding apparatus.
The prediction value of CurrIntraPredMode of block X of the C2 component is finally determined as one of predCurrIntraPredMode1 and predCurrIntraPredMode2 (step S63). Which value is used is determined by decoding a 1-bit flag (pred_flag). However, pred_flag is decoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is decoded), predCurrIntraPredMode1 is used as the prediction value.
After predictor candidate 1, predictor candidate 2, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode are obtained, CurrIntraPredMode is decoded by the following procedure (step S71).
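The corresponding non-normative sketch for the C2 component (candidate 2 taken from the C1 component, as on the encoder side):

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* C2 component, step S71. */
    int decodeIntra4x4ModeC2(int IntraPredModeA, int IntraPredModeB,
                             int prevFlagC0, int prevFlagC1,
                             int CurrIntraPredMode_C1)
    {
        int predCand1 = MIN(IntraPredModeA, IntraPredModeB);
        int bothPrev  = (prevFlagC0 == 1) && (prevFlagC1 == 1);

        if (readBit()) {              /* prev_intra_pred_mode_flag = 1            */
            if (!bothPrev && readBit())              /* pred_flag = 1             */
                return CurrIntraPredMode_C1;         /* candidate 2               */
            return predCand1;                        /* candidate 1               */
        }
        {
            int rem = readValue();    /* rem_intra_pred_mode vs. candidate 1      */
            return (rem < predCand1) ? rem : rem + 1;
        }
    }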
The above decoding steps can be defined similarly for the intra 8 × 8 prediction mode. By decoding the intra N × N prediction mode in such a procedure, the correlation with the prediction modes selected in the other color components can be exploited, and a bit stream in which the amount of code of the prediction mode itself has been reduced can be decoded.
In the above steps, pred_flag is information that is decoded only when prev_intra_pred_mode_flag is 1, but it may also be decoded in the case where prev_intra_pred_mode_flag is 0.
That is, taking the C1 component as an example, decoding may be performed by the following procedure.
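A non-normative sketch of this variant on the decoder side (pred_flag is read whenever the co-located C0 block was decoded with rem_intra_pred_mode; the bit ordering matches the encoder-side sketch and is an assumption):

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    int decodeIntra4x4ModeC1var(int IntraPredModeA, int IntraPredModeB,
                                int prevFlagC0, int CurrIntraPredMode_C0)
    {
        int pred = MIN(IntraPredModeA, IntraPredModeB);  /* candidate 1           */

        if (prevFlagC0 == 0 && readBit())                /* pred_flag = 1         */
            pred = CurrIntraPredMode_C0;                 /* candidate 2           */
        if (readBit())                /* prev_intra_pred_mode_flag = 1            */
            return pred;
        {
            int rem = readValue();    /* rem_intra_pred_mode                      */
            return (rem < pred) ? rem : rem + 1;
        }
    }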
The effect of this method is as described for the corresponding encoding procedure on the encoding apparatus side. Furthermore, pred_flag may be decoded regardless of whether rem_intra_pred_mode was decoded for the intra prediction mode of the block at the same position in the C0 component. In this case, the intra prediction mode of the C0 component is always used as a prediction value candidate.
That is, the procedure in this case is as follows.
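Sketched non-normatively, with pred_flag now decoded unconditionally:

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    int decodeIntra4x4ModeC1always(int IntraPredModeA, int IntraPredModeB,
                                   int CurrIntraPredMode_C0)
    {
        int pred = readBit()          /* pred_flag                                */
                 ? CurrIntraPredMode_C0                          /* candidate 2   */
                 : MIN(IntraPredModeA, IntraPredModeB);          /* candidate 1   */

        if (readBit())                /* prev_intra_pred_mode_flag = 1            */
            return pred;
        {
            int rem = readValue();    /* rem_intra_pred_mode                      */
            return (rem < pred) ? rem : rem + 1;
        }
    }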
In addition, pred_flag may be set not in units of 4 × 4 blocks but in units of macroblocks or sequences. When the choice between prediction value candidates 1 and 2 is made common to all 4 × 4 blocks within a macroblock, the overhead information decoded as pred_flag is reduced. Which of prediction value candidate 1 and 2 to use may also be determined in units of sequences in accordance with the input color space definition. In this case, pred_flag need not be transmitted for each macroblock, and the overhead information is reduced further.
Example 4
A bit stream of the form of fig. 16 was explained in embodiment 2. In embodiment 2, when the intra coding mode indicates "intra N × N prediction", the intra prediction mode of each of the color components C0, C1, and C2 is recognized as an intra 4 × 4 prediction mode or an intra 8 × 8 prediction mode in accordance with the values of the transform block size identification flags 0 to 2 (32a to 32c). In embodiment 4, this bit stream array is changed so that, for the C1 and C2 components, intra prediction mode indication flags 1 and 2 (36a, 36b) are transmitted at the sequence level, as shown in fig. 24. The intra prediction mode indication flag is valid when the intra N × N prediction mode is selected as the intra coding mode and the transform block size identification flag indicates the 4 × 4 transform, that is, when the intra 4 × 4 prediction mode is used; according to its value, the following 2 states can be switched.
State 1: for the C1 or C2 component, the intra 4 × 4 prediction mode to be used is selected individually from the 9 modes of fig. 3 and encoded.
State 2: for the C1 or C2 component, the intra 4 × 4 prediction mode is limited to DC prediction, that is, intra4x4_pred_mode = 2 in fig. 3, and no intra prediction mode information is encoded.
For example, when encoding is performed in a color space such as Y, Cb, Cr, a 4 × 4 block corresponds to an extremely small image area in high-resolution video such as HDTV or above. In that case, particularly for components such as Cb and Cr that do not carry the texture structure of the image, fixing the prediction mode information itself to a single mode and not transmitting it, since it would only be overhead, is more efficient than keeping open the option of selecting among 9 prediction modes. By adopting such a bit stream arrangement, optimal encoding can be performed in accordance with the properties of the input color space and the characteristics of the video.
The decoding apparatus that receives a bit stream of the format of fig. 24 is configured so that the variable length decoding unit 25 decodes the intra prediction mode indication flags (36a, 36b) and identifies from their values whether the bit stream was encoded in state 1 or state 2. It is thereby determined, for the C1 and C2 components, whether the intra 4 × 4 prediction mode is decoded from the bit stream or DC prediction, that is, intra4x4_pred_mode = 2 in fig. 3, is applied fixedly.
In embodiment 4, state 2 limits the C1 and C2 components to intra4x4_pred_mode = 2, but the prediction mode information may be fixed to some other single prediction mode instead. In state 2 it may also be determined that the C1 and C2 components use the same intra 4 × 4 prediction mode as C0. In this case as well, since it is not necessary to encode the intra 4 × 4 prediction mode for the C1 and C2 components, overhead bits can be reduced.
Example 5
In embodiment 5, another configuration example of the encoding apparatus of fig. 11 and the decoding apparatus of fig. 12 is shown. As in the other embodiments described above, the encoding apparatus and decoding apparatus of embodiment 5 are based on the encoding method adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added. The video encoding apparatus of embodiment 5 differs from the encoding apparatus of fig. 11 described in embodiments 2 and 3 only in the variable length encoding unit 11. The video decoding apparatus of embodiment 5 differs from the decoding apparatus of fig. 12 described in embodiments 2 and 3 only in the variable length decoding unit 25. The other operations are the same as in embodiments 2 and 3, and only the differences are described here.
1. Coding step of intra-prediction mode information in coding device
In the encoding apparatus of embodiment 3, a specific encoding method for the intra N × N prediction mode information in a bit stream of the format of fig. 16 was shown for the variable length encoding unit 11. In embodiment 5, another specific method for that encoding step is shown. Embodiment 5 focuses on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image as a pattern, and is characterized by a method of adaptive prediction within the nearby pixel region of the same color component. In the following description, a bit stream array of the format of fig. 16 is assumed. In embodiment 5, the intra N × N prediction mode information of each of the components C0, C1, and C2 is encoded independently for each color component, the encoding method of the C0 component is applied likewise to C1 and C2, and for simplicity only the C0 component is described. The value of the intra-coding mode sharing flag 33 is set so that the intra coding mode is common to C0, C1, and C2, the intra coding mode is the intra N × N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) indicate 4 × 4 blocks. In this case, all of the intra prediction modes 0 to 2 (35a to 35c) are intra 4 × 4 prediction modes. Fig. 18 is used to explain the procedure for encoding the intra N × N prediction mode information of the C0 component. In fig. 18, the current macroblock to be encoded is X, the macroblock to its left is macroblock A, and the macroblock immediately above is macroblock B. Fig. 25 is a flowchart of the encoding procedure.
In embodiment 3, the prediction value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode assigned to each 4 × 4 block X in fig. 18 was uniquely set to the smaller of IntraPredModeA and IntraPredModeB. This is the method adopted in the current AVC/H.264 standard: the larger the value of the intra N × N prediction mode, the more complex the predicted-image generation method becomes, with pixel interpolation reflecting the directionality of the image pattern, because small values are assigned to modes with high suitability for general image patterns. In contrast, when the bit rate is relatively high, the increment in distortion influences mode selection more than the increment in the amount of code assigned to the prediction mode, so the smaller of IntraPredModeA and IntraPredModeB cannot be said to be optimal. Based on this observation, in embodiment 5 the accuracy of the prediction value is improved by adapting its setting to the states of IntraPredModeA and IntraPredModeB, as described below. In this step, predCurrIntraPredMode is determined from the states of IntraPredModeA and IntraPredModeB as the value judged best able to estimate CurrIntraPredMode in view of the image pattern (steps S73, S74, and S75).
(1) When both IntraPredModeA and IntraPredModeB are in the range 0 to 2, MIN(IntraPredModeA, IntraPredModeB) is set as predCurrIntraPredMode.
(2) When either IntraPredModeA or IntraPredModeB is 3 or more and the prediction directions of IntraPredModeA and IntraPredModeB are completely different (for example, IntraPredModeA is 3 and IntraPredModeB is 4), DC prediction (intra4x4_pred_mode = 2) is set as predCurrIntraPredMode.
(3) When either IntraPredModeA or IntraPredModeB is 3 or more and the prediction directions are the same (for example, IntraPredModeA is 3 and IntraPredModeB is 7, both predicting from the upper right), the prediction mode with pixel interpolation (7 in this example) is set as predCurrIntraPredMode.
As in embodiment 3, preparatory processing such as the assignment of IntraPredModeA and IntraPredModeB is performed in advance (steps S50, S53, S59). As a result, predCurrIntraPredMode is derived uniquely from the values of IntraPredModeA and IntraPredModeB. Fig. 26 tabulates this rule for setting the prediction value. In fig. 26, the framed portions are the cases that do not follow the conventional rule MIN(IntraPredModeA, IntraPredModeB), where a better prediction value is determined from the continuity of the image pattern. In step (1) above, the table of class 0 is used; in steps (2) and (3), the table of class 1 is used.
After predCurrIntraPredMode has been determined as above, the remaining encoding steps of the C0 component explained in embodiment 3 are performed, completing the encoding (steps S52, S58, S64).
Namely, as follows:
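A non-normative sketch of this step, with predTable[][] standing in for the prediction value table of fig. 26 (the table contents themselves are not reproduced here):

    extern void emitBit(int b);       /* hypothetical writer: a single bit        */
    extern void emitValue(int v);     /* hypothetical writer: rem_intra_pred_mode */
    extern const int predTable[9][9]; /* stand-in for the table of fig. 26        */

    /* C0 component of embodiment 5: table-based prediction value, then the
       prev/rem coding of embodiment 3 (steps S52, S58, S64). */
    void encodeIntra4x4ModeTable(int CurrIntraPredMode,
                                 int IntraPredModeA, int IntraPredModeB)
    {
        int pred = predTable[IntraPredModeA][IntraPredModeB];

        if (CurrIntraPredMode == pred) {
            emitBit(1);               /* prev_intra_pred_mode_flag = 1            */
        } else {
            emitBit(0);               /* prev_intra_pred_mode_flag = 0            */
            emitValue(CurrIntraPredMode < pred ? CurrIntraPredMode
                                               : CurrIntraPredMode - 1);
        }
    }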
The above encoding steps can be defined similarly for the intra 8 × 8 prediction mode. By encoding the intra N × N prediction mode in such a procedure, the correlation of prediction modes within the nearby pixel region of the same color component can be exploited more effectively, the amount of code of the prediction mode itself can be reduced, and encoding efficiency can be improved.
2. Decoding step of intra-prediction mode information of decoding device
For the decoding apparatus of embodiment 3, one specific decoding procedure for the intra N × N prediction mode information in a bit stream of the format of fig. 16 was shown for the variable length decoding unit 25. In embodiment 5, another specific method for that decoding step is shown. Embodiment 5 focuses on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image as a pattern, and is characterized in that a bit stream encoded with adaptive prediction within the nearby pixel region of the same color component is decoded.
In the following description, a bit stream array of the format of fig. 16 is assumed. For simplicity, the value of the intra-coding mode sharing flag 33 in the bit stream is assumed to be set so that the intra coding mode is common to C0, C1, and C2. In addition, the intra coding mode is the intra N × N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) indicate 4 × 4 blocks. In this case, all of the intra prediction modes 0 to 2 (35a to 35c) are intra 4 × 4 prediction modes. As for the encoding apparatus, only the C0 component is described, using the relationships of fig. 18 (C1 and C2 are decoded by the equivalent procedure independently of C0). In the decoding apparatus, the current macroblock to be decoded is X, the macroblock to its left is macroblock A, and the macroblock immediately above is macroblock B.
In embodiment 3, as also described for the encoding apparatus, the smaller of IntraPredModeA and IntraPredModeB was uniquely assigned in fig. 18 as the prediction value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode assigned to each 4 × 4 block X. In the decoding apparatus of embodiment 5, by contrast, predCurrIntraPredMode is determined using the table of fig. 26 in exactly the same procedure as shown for the encoding step. Since IntraPredModeA and IntraPredModeB are already decoded and known, exactly the same processing as in the encoding step can be performed.
The subsequent steps are equivalent to the decoding step of the C0 component described in example 3. The summary is as follows.
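The corresponding non-normative decoder-side sketch, using the same predTable[][] stand-in for the table of fig. 26:

    extern int readBit(void);         /* hypothetical reader: a single bit        */
    extern int readValue(void);       /* hypothetical reader: rem_intra_pred_mode */
    extern const int predTable[9][9]; /* stand-in for the table of fig. 26        */

    int decodeIntra4x4ModeTable(int IntraPredModeA, int IntraPredModeB)
    {
        int pred = predTable[IntraPredModeA][IntraPredModeB];

        if (readBit())                /* prev_intra_pred_mode_flag = 1            */
            return pred;
        {
            int rem = readValue();    /* rem_intra_pred_mode                      */
            return (rem < pred) ? rem : rem + 1;
        }
    }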
The above decoding steps can be defined similarly for the intra 8 × 8 prediction mode. By decoding the intra N × N prediction mode in such a procedure, the correlation of prediction modes within the nearby pixel region of the same color component can be exploited more effectively, and an encoded bit stream in which the amount of code of the prediction mode itself has been reduced can be decoded.
In the above example, predCurrIntraPredMode is encoded and decoded using the table of fig. 26 in fixed form, but it may instead be encoded and decoded while successively updating predCurrIntraPredMode, with the table of fig. 26 as the initial values, to the intra prediction mode that occurs most frequently for given IntraPredModeA and IntraPredModeB. For example, in the combination "class = 0, IntraPredModeA = 0, IntraPredModeB = 0, predCurrIntraPredMode = 0" of fig. 26, the above embodiment always sets predCurrIntraPredMode to 0 when IntraPredModeA = 0 and IntraPredModeB = 0. However, since the video signal itself is non-stationary, there is no guarantee that this combination is always optimal for the content of the video. In the worst case, the possibility that predCurrIntraPredMode fails to match the actual mode in almost all cases throughout the video is not zero. Therefore, for example, the frequency of the CurrIntraPredMode values that occur when IntraPredModeA = 0 and IntraPredModeB = 0 is counted, and every time encoding or decoding of a CurrIntraPredMode finishes, predCurrIntraPredMode for those states of IntraPredModeA and IntraPredModeB is updated to the prediction mode with the highest occurrence frequency. With this configuration, the prediction value used for encoding and decoding of CurrIntraPredMode can be set to the optimal value for the video content.
Example 6
In embodiment 6, another configuration example of the encoding apparatus of fig. 11 and the decoding apparatus of fig. 12 is shown. As in the other embodiments described above, the encoding apparatus and decoding apparatus of embodiment 6 are based on the encoding method adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added. The video encoding apparatus of embodiment 6 differs from the encoding apparatus of fig. 11 described in embodiments 2, 3, and 5 only in the variable length encoding unit 11. The video decoding apparatus of embodiment 6 differs from the decoding apparatus of fig. 12 described in embodiments 2, 3, and 5 only in the variable length decoding unit 25. The other operations are the same as in embodiments 2, 3, and 5, and only the differences are described here.
1. Coding step of intra-prediction mode information in coding device
Specific encoding methods for the intra N × N prediction mode information in a bit stream of the format of fig. 16 were described for the encoding apparatuses of embodiments 3 and 5. In embodiment 6, another specific method for that encoding step is shown. Embodiment 6 focuses on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image as a pattern, and is characterized by a method of adaptive arithmetic coding within the nearby pixel region of the same color component. In the following description, a bit stream array of the format of fig. 16 is assumed. In embodiment 6, the intra N × N prediction mode information of each of C0, C1, and C2 is encoded independently for each color component, the encoding method of C0 is applied likewise to C1 and C2, and for simplicity only the C0 component is described. The value of the intra-coding mode sharing flag 33 is set so that the intra coding mode is common to C0, C1, and C2, the intra coding mode is the intra N × N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) indicate 4 × 4 blocks. In this case, all of the intra prediction modes 0 to 2 (35a to 35c) are intra 4 × 4 prediction modes. Fig. 18 is used to explain the procedure for encoding the intra N × N prediction mode information of the C0 component. In fig. 18, the current macroblock to be encoded is X, the macroblock to its left is macroblock A, and the macroblock immediately above is macroblock B. Fig. 27 is a flowchart of the encoding step.
In embodiments 3 and 5, the prediction value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode assigned to each 4 × 4 block X in fig. 18 was determined uniquely from IntraPredModeA and IntraPredModeB; if the two were equal, prev_intra_pred_mode_flag was set to 1 and encoding of the intra 4 × 4 prediction mode for block X ended there, and if they differed, the mode was transmitted by rem_intra_pred_mode. In the present embodiment, CurrIntraPredMode is arithmetically encoded directly, using the states of IntraPredModeA and IntraPredModeB. In doing so, an encoding step following the context-adaptive binary arithmetic coding adopted in the AVC/H.264 standard is used.
First, the CurrIntraPredMode to be encoded is binarized in the form of fig. 28 (step S76). The 1st bin of the binary sequence is a code classifying whether CurrIntraPredMode is vertical-direction prediction or horizontal-direction prediction (see fig. 3). In this example, DC prediction (intra4x4_pred_mode = 2) is classified as horizontal-direction prediction, but it may also be classified as vertical-direction prediction. The 2nd bin provides a terminate bit for the prediction mode value considered to have the highest occurrence frequency in each of the vertical and horizontal directions. The 3rd and subsequent bins are constructed so that codes terminate in order of decreasing occurrence frequency among the remaining prediction mode values (the 2nd and subsequent bins of the binary sequence structure of fig. 28 are preferably set in accordance with the symbol occurrence probabilities in actual image data encoding).
Each bin of the binary sequence is arithmetically encoded while the (0, 1) occurrence probability table to be used is selected in turn. In encoding the 1st bin, the context used for arithmetic coding is determined as follows (step S78).
Context A (C_A): a flag intra_pred_direction_flag, which represents in binary whether an intra prediction mode is vertical-direction prediction or horizontal-direction prediction, is defined for IntraPredModeA and IntraPredModeB, and the following 4 states are used as the context value.
C_A = (intra_pred_direction_flag for IntraPredModeA == 1) + (intra_pred_direction_flag for IntraPredModeB == 1)
Here, in fig. 3, intra_pred_direction_flag is classified as vertical-direction prediction (0) when intra4×4_pred_mode takes the value 0, 3, 5, or 7, and is classified as horizontal-direction prediction (1) when it takes the value 1, 2, 4, 6, or 8. For the 4 states of C_A, the conditional probability of CurrIntraPredMode given IntraPredModeA and IntraPredModeB is determined in advance, and an initial (0, 1) occurrence probability table is assigned on that basis. By constructing the context in this way, the conditional occurrence probability of the 1st bin can be estimated more accurately, and the efficiency of arithmetic coding can be improved. The occurrence probability table of the 1st bin is selected according to the value of C_A, and arithmetic coding is performed. The occurrence probability table is then updated with the coded value (step S79).
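For reference, the context derivation described above can be written compactly. The following Python sketch is purely illustrative: it assumes the AVC intra 4×4 mode numbering and the vertical/horizontal classification given above, and all function and variable names are hypothetical rather than part of the specification.

# Illustrative sketch of the context value C_A for the 1st bin.
# Assumption: modes 0, 3, 5, 7 are vertical-direction prediction (flag 0);
# modes 1, 2, 4, 6, 8 (including DC = 2) are horizontal-direction (flag 1).
VERTICAL_MODES = {0, 3, 5, 7}

def intra_pred_direction_flag(mode):
    # 0 for vertical-direction modes, 1 for horizontal-direction modes
    return 0 if mode in VERTICAL_MODES else 1

def context_a(intra_pred_mode_a, intra_pred_mode_b):
    # C_A as in the formula above: sum of the two neighbor flags
    return (intra_pred_direction_flag(intra_pred_mode_a)
            + intra_pred_direction_flag(intra_pred_mode_b))

# Example: left neighbor horizontal (mode 1), upper neighbor DC (mode 2)
assert context_a(1, 2) == 2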
For the 2nd and subsequent bins, initial (0, 1) occurrence probability tables determined in accordance with the occurrence probability of each prediction mode value are assigned in advance (step S80). Then, binary arithmetic coding and update of the occurrence probability table are performed in the same manner as for the 1st bin (step S81).
The above-described encoding step can be defined similarly for the intra 8×8 prediction mode. By encoding the intra N×N prediction mode in such a procedure, the correlation of prediction modes in neighboring pixel regions of the same color component can be better utilized, and adaptive arithmetic coding can be applied to the encoding of the prediction mode information, so that the encoding efficiency can be improved.
2. Decoding step of intra-prediction mode information of decoding device
One specific decoding procedure of the intra N×N prediction mode information in the variable length decoding unit 25 for a bit stream of the format of fig. 16 has been described for the decoding apparatuses of embodiments 3 and 5. In embodiment 6, another specific method of the decoding step is shown. Embodiment 6 focuses particularly on the fact that the intra N×N prediction mode value reflects the structure of the image as a picture pattern, and is characterized in that it decodes a bit stream that was encoded by adaptive arithmetic coding within neighboring pixel regions of the same color component.
In the following description, a bit stream array of the format of fig. 16 is assumed. For simplicity of description, the value of the intra encoding mode sharing identification flag 33 is set so that the intra encoding mode is shared among C0, C1, and C2; the intra encoding mode is the intra N×N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) specify 4×4 blocks. In this case, the intra prediction modes 0 to 2 (35a to 35c) are all intra 4×4 prediction modes. As in the encoding apparatus, only the C0 component is described for the decoding apparatus as well, using the relationships of fig. 18 (C1 and C2 are decoded by the same procedure, independently of C0). In the decoding apparatus, the current macroblock to be decoded is X, the macroblock to its left is macroblock A, and the macroblock immediately above is macroblock B.
In embodiments 3 and 5, as described for the encoding apparatus, in fig. 18 the smaller value of IntraPredModeA and IntraPredModeB is uniquely assigned as the prediction value predCurrIntraPredMode for the intra 4×4 prediction mode CurrIntraPredMode assigned to each 4×4 block X; prev_intra_pred_mode_flag is decoded, and when its value is 1, predCurrIntraPredMode is adopted as CurrIntraPredMode, while when it is 0, rem_intra_pred_mode is decoded to restore the intra 4×4 prediction mode of block X. In contrast, in the decoding apparatus of embodiment 6, CurrIntraPredMode is directly arithmetically decoded using the states of IntraPredModeA and IntraPredModeB. The decoding step follows the context-adaptive binary arithmetic decoding adopted in the AVC/H.264 specification.
CurrIntraPredMode of the decoding target is assumed to have been encoded as a binary sequence in the form of fig. 28, and this sequence is subjected to binary arithmetic decoding from its left end. As described in the encoding step of embodiment 6, the 1st bin of the binary sequence is a code classifying whether CurrIntraPredMode is vertical-direction prediction or horizontal-direction prediction (see fig. 3). The 2nd and subsequent bins are coded so that the prediction mode values Terminate in descending order of occurrence frequency. The reason for this code structure is as explained for the encoding step.
In the decoding process, first, for decoding the 1st bin, the same context C_A as used in the encoding step is determined. The occurrence probability table of the 1st bin is selected according to the value of C_A, and arithmetic decoding is performed to restore the 1st bin. The occurrence probability table is then updated with the decoded value.
For the 2nd and subsequent bins, initial (0, 1) occurrence probability tables determined in accordance with the occurrence probability of each prediction mode value are assigned in advance. Then, binary arithmetic decoding and update of the occurrence probability table are performed in the same manner as for the 1st bin. Since the binary sequence of fig. 28 is configured so that each prediction mode value can be identified uniquely, CurrIntraPredMode is decoded as soon as the required number of bins has been restored.
The above-described decoding step can be defined similarly for the intra 8×8 prediction mode. By decoding the intra N×N prediction mode in such a procedure, an encoded bit stream in which the code amount of the prediction mode information itself has been reduced by arithmetic coding exploiting the correlation of prediction modes in neighboring pixel regions of the same color component can be decoded.
In the above example, other variations of the table of fig. 28 are also conceivable. For example, the method of constructing a binary sequence as shown in fig. 29 may be adopted. Here, the following context B is used for the 1st bin.
Context B (C_B): a flag intra_dc_direction_flag, which represents in binary whether or not an intra prediction mode is DC prediction, is defined for IntraPredModeA and IntraPredModeB, and the following 4 states are used as the context value.
C_B = (intra_dc_direction_flag for IntraPredModeA == 1) + (intra_dc_direction_flag for IntraPredModeB == 1)
Here, in fig. 3, intra_dc_direction_flag is set to 1 when intra4×4_pred_mode takes the value 2, and to 0 when it takes any other value. For the 4 states of C_B, the conditional probability of CurrIntraPredMode given IntraPredModeA and IntraPredModeB is determined in advance, and an initial occurrence probability table for the value (0, 1) of the 1st bin is assigned on that basis. In the method of fig. 29, the 1st bin is 0 when CurrIntraPredMode is DC prediction and 1 when it is not DC prediction. For the 2nd bin, the context A (C_A) described above is used. By constructing the contexts in this way, the conditional occurrence probabilities of both the 1st bin and the 2nd bin can be estimated more accurately, and the efficiency of arithmetic coding can be improved.
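The alternative context can be sketched in the same illustrative style, again with hypothetical names; only the DC/non-DC classification of the neighboring modes is used for the 1st bin.

def intra_dc_direction_flag(mode):
    # 1 if the mode is DC prediction (intra4x4_pred_mode == 2), else 0
    return 1 if mode == 2 else 0

def context_b(mode_a, mode_b):
    # C_B for the 1st bin of the fig. 29 binary sequence
    return intra_dc_direction_flag(mode_a) + intra_dc_direction_flag(mode_b)

# Example: left neighbor is DC (mode 2), upper neighbor is vertical (mode 0)
assert context_b(2, 0) == 1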
Example 7
In embodiment 7, an encoding device and a corresponding decoding device for performing encoding by inter prediction in units in which a video frame input in the 4:4:4 format is equally divided into rectangular regions (macroblocks) of 16 × 16 pixels will be described. The encoding device and the decoding device are based on the encoding method adopted by MPEG-4AVC (ISO/IEC 14496-10)/ITU-T H.264 standard (hereinafter referred to as AVC) which is non-patent document 1, and have unique features added thereto.
Fig. 30 shows a configuration of a video encoding device according to embodiment 7, and fig. 31 shows a configuration of a video decoding device according to embodiment 7. In fig. 31, elements denoted by the same reference numerals as those of the components of the coding apparatus of fig. 30 represent the same elements.
Hereinafter, with reference to these drawings, the overall operations of the encoding apparatus and decoding apparatus of embodiment 7, and the inter prediction mode determination processing and motion compensation prediction decoding processing, which are the characteristic operations of embodiment 7, will be described.
1. Outline of operation of encoding device
In the encoding apparatus of fig. 30, each video frame of the input video signal 1 is in the 4:4:4 format, and is input to the encoding apparatus in units of macroblocks in which the 3 color components are divided into blocks of the same size and grouped together.
First, the motion compensation prediction unit 102 selects a reference image of 1 frame from among the motion compensation prediction reference image data of 1 frame or more stored in the memory 16, and performs motion compensation prediction processing for each color component in macroblock units. Memories are prepared for the 3 color components, one for each (3 planes are described in the present embodiment, but the number may be changed as appropriate according to the design). As shown in fig. 32(a) to (d), 7 kinds of block sizes are prepared for motion compensation prediction: first, any one of 16×16, 16×8, 8×16, and 8×8 can be selected in macroblock units. Further, when 8×8 is selected, any one of 8×8, 8×4, 4×8, and 4×4 can be selected for each 8×8 block, as shown in fig. 32(e) to (h). The selected size information is output as a macroblock type for the size in macroblock units and as a sub-macroblock type for the size in 8×8 block units. The identification number of the reference image selected for each block and the motion vector information are also output.
The video encoding apparatus of embodiment 7 is characterized in that the motion compensation prediction processing method for the 3 color components is switched in accordance with the inter prediction mode sharing identification flag 123. This point is described in detail in section 2 below.
The motion compensation prediction unit 102 performs motion compensation prediction processing for all the block sizes and sub-block sizes shown in fig. 32, all the motion vectors 137 within a predetermined search range, and the 1 or more selectable reference images, and the prediction difference signal 4 is obtained by the subtractor 3 from a motion vector 137 and 1 reference image. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determination unit 5, and the macroblock type/sub-macroblock type 106, motion vector 137, and reference image identification number that give the optimal prediction efficiency for the macroblock to be predicted are output from among the prediction processing performed in the motion compensation prediction unit 102. In selecting the macroblock type/sub-macroblock type 106, the weighting coefficient 20 for each encoding mode determined by the judgment of the encoding control unit 19 is also taken into account. The prediction difference signal 4 obtained by motion compensation prediction based on the selected type, motion vector 137, and reference image is output to the orthogonal transform unit 8. The orthogonal transform unit 8 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as orthogonal transform coefficients. The quantization unit 9 quantizes the input orthogonal transform coefficients based on the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length encoding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 are entropy-coded in the variable length encoding unit 11 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse orthogonal transform unit 13, and are added in the adder 18 to the predicted image 7 generated from the selected macroblock type/sub-macroblock type 106, motion vector 137, and reference image, whereby the local decoded image 15 is generated. The local decoded image 15 is stored in the memory 16 for use in subsequent motion compensation prediction processing. The deblocking filter control flag 24, which indicates whether or not to apply the deblocking filter to the macroblock, is also input to the variable length encoding unit 11 (since the prediction processing performed by the motion compensation prediction unit 102 stores and uses the pixel data before the deblocking filter is applied in the memory 16, the deblocking filter processing itself is not necessary in the encoding processing, but on the decoding apparatus side the deblocking filter is applied in accordance with the instruction of the deblocking filter control flag 24 to obtain the final decoded image).
The inter-prediction mode sharing flag 123, the quantized transform coefficient 10, the macroblock type/sub-macroblock type 106, the motion vector 137, the reference picture identification number, and the quantization parameter 21 input to the variable length coding unit 11 are aligned and shaped into a bit stream according to a predetermined rule (syntax), and are transmitted to the transmission buffer 17. The transmission buffer 17 smoothes the bit stream in accordance with the bandwidth of the transmission path connected to the encoding apparatus and the reading speed of the recording medium, and outputs the bit stream as a video stream 22. In addition, the feedback information is output to the encoding control unit 19 in accordance with the status of bit stream accumulation in the transmission buffer 17, and the amount of generated code in encoding of the subsequent video frame is controlled.
2. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding apparatus of embodiment 7, will be described in detail. In the following description, the inter prediction mode refers to a block size which is a unit of the motion compensation prediction, that is, a macroblock type/sub-macroblock type, and the inter prediction mode determination processing refers to processing for selecting a macroblock type/sub-macroblock type, a motion vector, and a reference picture. This processing is performed in units of macroblocks in which the 3 color components are aggregated, and is mainly executed by the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 33 is a flowchart showing the flow of this process. Hereinafter, the image data of 3 color components constituting a block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the inter prediction mode sharing identification flag 123 and determines, based on its value, whether or not a common inter prediction mode is used for C0, C1, and C2 (step S100 in fig. 33). If it is used in common, the processing proceeds to step S101 and after; otherwise, it proceeds to step S102 and after.
When the inter prediction mode, motion vector 137, and reference image are used in common for C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the selectable inter prediction modes, motion vector search ranges, and reference images, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies and selects the inter prediction mode, motion vector, and reference image that are optimal in common for C0, C1, and C2 (step S101).
When the inter prediction mode, motion vector 137, and reference image are not used in common for C0, C1, and C2 but the optimal mode is selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, motion vector search ranges, and reference images selectable for the Ci (0 ≤ i < 3) component, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies and selects the inter prediction mode, motion vector 137, and reference image that are optimal for the Ci (0 ≤ i < 3) component (steps S102, S103, and S104).
As a criterion for evaluating the prediction efficiency of the prediction modes in the motion compensation prediction unit 102, for example, the rate-distortion cost given by the following formula is used.
J(m, v, r) = D(m, v, r) + λR(m, v, r)   (λ: a positive number)
Here, D(m, v, r) is the coding distortion or prediction error amount when the inter prediction mode m, a motion vector v within the predetermined range, and the reference image r are applied. The coding distortion is obtained by applying the inter prediction mode m, motion vector v, and reference image r to obtain a prediction error, decoding the video from the result of transforming and quantizing that prediction error, and measuring the error with respect to the signal before encoding. The prediction error amount is obtained by taking the difference between the predicted image and the signal before encoding when the inter prediction mode m, motion vector v, and reference image r are applied and quantifying the magnitude of that difference; for example, the sum of absolute differences (SAD) can be used. R(m, v, r) is the generated code amount when the inter prediction mode m, motion vector v, and reference image r are applied. That is, J(m, v, r) is a value defining the trade-off between the code amount and the degree of deterioration when the inter prediction mode m, motion vector v, and reference image r are applied, and the inter prediction mode m, motion vector v, and reference image r that give the smallest J(m, v, r) give the optimal solution.
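As an illustration of this criterion, the following Python sketch evaluates candidate (m, v, r) triples with SAD as the distortion term D and an estimated bit count as R; the data layout and all names are assumptions made for the example, not part of the encoding apparatus.

def sad(block, prediction):
    # Sum of absolute differences between equally sized pixel sequences
    return sum(abs(a - b) for a, b in zip(block, prediction))

def select_best(source, candidates, lam):
    # candidates: iterable of (mode m, vector v, reference r,
    #                          predicted pixels, estimated bits R)
    best, best_cost = None, float("inf")
    for m, v, r, pred, bits in candidates:
        cost = sad(source, pred) + lam * bits  # J(m, v, r) = D + lambda * R
        if cost < best_cost:
            best, best_cost = (m, v, r), cost
    return best, best_cost

source = [10, 12, 11, 13]
candidates = [("16x16", (0, 0), 0, [10, 12, 11, 13], 6),
              ("8x8", (1, 0), 0, [9, 12, 11, 13], 4)]
print(select_best(source, candidates, lam=0.8))  # picks the cheaper trade-off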
When the encoding apparatus performs the processing from step S101 onward, one set of inter prediction mode, motion vector 137, and reference image information is assigned to the macroblock containing the 3 color components. On the other hand, when it performs the processing from step S102 onward, an inter prediction mode, motion vector 137, and reference image are assigned to each color component. Since the information on the inter prediction mode, motion vector 137, and reference image assigned to the macroblock therefore differs between the two cases, the inter prediction mode sharing identification flag 123 must be multiplexed into the bit stream so that the decoding apparatus can identify whether the encoding apparatus performed the processing from S101 onward or the processing from S102 onward. Fig. 34 shows the data array of such a bit stream.
Fig. 34 shows the data array of the bit stream at the macroblock level. The macroblock type indicates intra or inter, and in the inter mode includes the block size serving as the unit of motion compensation. The sub-macroblock type is multiplexed only when the 8×8 block size is selected as the macroblock type, and contains the block size information of each 8×8 block. The basic macroblock type 128 and the basic sub-macroblock type 129 indicate the common macroblock type and common sub-macroblock type when the inter prediction mode sharing identification flag 123 indicates "common to C0, C1, and C2", and otherwise indicate the macroblock type and sub-macroblock type corresponding to C0. The extended macroblock type 130 and the extended sub-macroblock type 131 are multiplexed for C1 and C2, respectively, only when the inter prediction mode sharing identification flag 123 indicates that the mode is not "common to C0, C1, and C2", and indicate the macroblock type and sub-macroblock type corresponding to C1 and C2.
The reference image identification number is information for specifying the reference image selected for each block of 8×8 block size or larger serving as the unit of motion compensation. In an inter frame, the selectable reference image is 1 frame, so 1 reference image identification number is multiplexed for each block. For the motion vector information, 1 set of motion vector information is multiplexed for each block serving as the unit of motion compensation. As many reference image identification numbers and sets of motion vector information as there are blocks serving as units of motion compensation contained in the macroblock must be multiplexed. The basic reference image identification number 132 and the basic motion vector information 133 indicate the common reference image identification number and common motion vector information when the inter prediction mode sharing identification flag 123 indicates "common to C0, C1, and C2", and otherwise indicate the reference image identification number and motion vector information corresponding to C0. The extended reference image identification number 134 and the extended motion vector information 135 are multiplexed for C1 and C2, respectively, only when the inter prediction mode sharing identification flag 123 indicates that the mode is not "common to C0, C1, and C2", and indicate the reference image identification number and motion vector information corresponding to C1 and C2.
Then, the quantization parameter 21 and the quantized transform coefficients 10 are multiplexed (fig. 34 does not include the deblocking filter control flag 24 that is input to the variable length encoding unit 11 in fig. 30; it is omitted because it is not a component necessary for describing the features of embodiment 7).
In the 4:2:0 format adopted in conventional video coding standards, the definition of the color space was fixed to Y, Cb, Cr, but in the 4:4:4 format a variety of color spaces can be used besides Y, Cb, Cr. By configuring the inter prediction mode information as in fig. 34, optimal encoding processing can be performed even when the color space definition of the input video signal 1 varies. For example, when the color space is defined in RGB, the structure of the video texture remains equally in each of the R, G, and B components, so using common inter prediction mode information and common motion vector information reduces the redundancy of the inter prediction mode information itself and improves the encoding efficiency. In contrast, in a region containing no red at all (the R component is 0), for example, the inter prediction mode and motion vector information optimal for the R component should differ from those optimal for the G and B components. Therefore, optimal encoding efficiency can be obtained by appropriately using the extended inter prediction mode, extended reference image identification information, and extended motion vector information.
3. Outline of operation of decoding device
The decoding apparatus of fig. 31 receives the video stream 22 corresponding to the array of fig. 34 output from the encoding apparatus of fig. 30, and decodes 3 color components in units of macroblocks of the same size (4:4:4 format) to restore each video frame.
First, the variable length decoding unit 25 decodes the input video stream 22 in accordance with a predetermined rule (syntax), and extracts information such as the inter prediction mode sharing identification flag 123, quantized transform coefficients 10, macroblock type/sub-macroblock type 106, reference image identification number, motion vector information, and quantization parameter 21. The quantized transform coefficients 10 are input together with the quantization parameter 21 to the inverse quantization unit 12, where inverse quantization processing is performed. The output is then input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. On the other hand, the macroblock type/sub-macroblock type 106, inter prediction mode sharing identification flag 123, motion vector 137, and reference image identification number are input to the motion compensation prediction unit 102, which obtains the predicted image 7 in accordance with this information. The specific procedure for obtaining the predicted image 7 is described later. The adder 18 adds the local decoded prediction difference signal 14 and the predicted image 7 to obtain the provisional decoded image 15 (this is exactly the same signal as the local decoded image 15 in the encoding apparatus). The provisional decoded image 15 is written back to the memory 16 for use in the motion compensation prediction of subsequent macroblocks. Memories of 3 planes are prepared, one for each color component (3 planes are described in the present embodiment, but the number may be changed as appropriate according to the design). Further, the deblocking filter 26 is applied to the provisional decoded image 15 in accordance with the instruction of the deblocking filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
4. Inter-prediction decoding process of decoding device
The inter-prediction image generation process, which is a feature of the decoding apparatus of embodiment 7, will be described in detail. This processing is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 35 is a flowchart showing the flow of the present process.
The video stream 22 input to the variable length decoding unit 25 conforms to the data array of fig. 34. First, the inter prediction mode sharing identification flag 123 in the data of fig. 34 is decoded (step S110). Further, the basic macroblock type 128 and the basic sub-macroblock type 129 are decoded (step S111). In step S112, it is determined from the inter prediction mode sharing identification flag 123 whether or not the inter prediction mode is used in common for C0, C1, and C2. If it is used in common (Yes in step S112), the basic macroblock type 128 and the basic sub-macroblock type 129 are used for all of C0, C1, and C2; otherwise (No in step S112), the basic macroblock type 128 and the basic sub-macroblock type 129 are used as the mode of C0, and the extended macroblock type 130 and the extended sub-macroblock type 131 are decoded for C1 and C2, respectively (step S113), to obtain the inter prediction mode information of C1 and C2. Next, the basic reference image identification number 132 and the basic motion vector information 133 are decoded (step S114). When the inter prediction mode sharing identification flag 123 indicates "common to C0, C1, and C2" (Yes in step S115), the basic reference image identification number 132 and the basic motion vector information 133 are used for all of C0, C1, and C2; otherwise (No in step S115), the basic reference image identification number 132 and the basic motion vector information 133 are used as the information of C0, and the extended reference image identification number 134 and the extended motion vector information 135 are decoded for C1 and C2, respectively (step S116). Through the above processing, the macroblock type/sub-macroblock type 106, reference image identification number, and motion vector information of each color component are determined, and these are input to the motion compensation prediction unit 102 to obtain the motion compensation predicted image of each color component.
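The branching of fig. 35 amounts to distributing one set of decoded header fields to one or to all three components. The following Python sketch shows only that distribution step; the flag and field names are illustrative, and the actual parsing of the bit stream is omitted.

def assign_inter_header(common_flag, base, extended_c1=None, extended_c2=None):
    # common_flag: inter prediction mode sharing identification flag 123
    # base: basic macroblock type / reference number / motion vector info
    # extended_c1, extended_c2: extended information, decoded only when
    # the flag does not indicate "common to C0, C1, C2"
    if common_flag:
        return {"C0": base, "C1": base, "C2": base}
    return {"C0": base, "C1": extended_c1, "C2": extended_c2}

# Shared case: the basic information is used for all three components
print(assign_inter_header(True, ("16x16", 0, (1, -2))))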
Fig. 36 shows a modification of the bit stream data array of fig. 34. In fig. 36, the inter-prediction mode sharing identification flag 123 is not multiplexed as a flag at the macroblock level but as a flag located in an upper data layer such as a slice, a picture, or a sequence. Thus, when sufficient prediction efficiency can be ensured by switching in the upper layer above the slice, the inter-prediction mode sharing flag 123 does not need to be multiplexed one by one at the macroblock level, and overhead bits can be reduced.
In fig. 34 and fig. 36, the inter prediction mode sharing identification flag 123 is multiplexed in the macroblock or in an upper data layer such as a slice, picture, or sequence; alternatively, when encoding in the 4:4:4 format, a different inter prediction mode and different motion vector information may always be used for each component, without multiplexing the inter prediction mode sharing identification flag 123. Fig. 37 shows the data array of the bit stream in that case. In fig. 37, the inter prediction mode sharing identification flag 123 is not present; instead, the arrangement specification information 136 indicating that an input image of the 4:4:4 format is to be handled is multiplexed in an upper data layer such as the sequence, and the extended macroblock type 130, extended sub-macroblock type 131, extended reference image identification number 134, and extended motion vector information 135 are multiplexed in accordance with the decoding result of this arrangement specification information.
Example 8
While embodiment 7 allows the macroblock type, sub-macroblock type, motion vector, and reference image to differ for each color component, embodiment 8 describes a video encoding apparatus and a video decoding apparatus characterized in that the macroblock type/sub-macroblock type and the reference image are made common to all components and only the motion vector may differ for each component. The video encoding apparatus and video decoding apparatus of embodiment 8 are the same as fig. 30 and fig. 31 of embodiment 7, except that the motion vector sharing identification flag 123b is used in place of the inter prediction mode sharing identification flag 123.
1. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding device of embodiment 8, will be described centering on a process different from embodiment 7.
This processing is performed in units of macroblocks in which the 3 color components are aggregated, and is mainly executed by the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 38 is a flowchart showing the flow of this process. Hereinafter, the image data of 3 color components constituting a block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the motion vector sharing identification flag 123b and determines, based on its value, whether or not a common motion vector 137 is used for C0, C1, and C2 (step S120 in fig. 38). If it is used in common, the processing proceeds to step S121 and after; otherwise, it proceeds to step S122 and after.
When the motion vector 137 is commonly used for C0, C1, and C2, the encoding mode determining unit 5 notifies the motion compensation predicting unit 102 of all selectable inter-prediction modes, motion vector search ranges, and reference images, and the motion compensation predicting unit 102 evaluates all prediction efficiencies thereof and selects an optimal inter-prediction mode, motion vector 137, and reference image that are commonly used for C0, C1, and C2 (step S121).
When the motion vector 137 is not used in common for C0, C1, and C2 but the optimal motion vector is selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the selectable inter prediction modes, motion vector search ranges, and reference images, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies, selects the inter prediction mode and reference image that are optimal in common for the C0, C1, and C2 components (step S122), and further selects the motion vector optimal for the Ci (0 ≤ i < 3) component (steps S123, S124, and S125).
The motion vector commonization identification flag 123b must be multiplexed into the bit stream so as to be able to be identified on the decoding apparatus side. Fig. 39 shows a data array of such a bit stream.
Fig. 39 shows the data array of the bit stream at the macroblock level. The macroblock type 128b, sub-macroblock type 129b, and reference image identification number 132b are common to C0, C1, and C2. The basic motion vector information 133 indicates the common motion vector information when the motion vector sharing identification flag 123b indicates "common to C0, C1, and C2", and otherwise indicates the motion vector information corresponding to C0. The extended motion vector information 135 is multiplexed for C1 and C2, respectively, only when the motion vector sharing identification flag 123b indicates that the motion vector is not "common to C0, C1, and C2", and indicates the motion vector information corresponding to C1 and C2. The macroblock type/sub-macroblock type 106 in fig. 30 and fig. 31 is a collective term for the macroblock type 128b and sub-macroblock type 129b in fig. 39.
2. Inter-prediction decoding process of decoding device
The decoding apparatus of embodiment 8 receives the video stream 22 corresponding to the array of fig. 39 output from the encoding apparatus of embodiment 8, decodes 3 color components in units of macroblocks of the same size (4:4:4 format), and restores each video frame.
The inter-prediction image generation process, which is a feature of the decoding apparatus of embodiment 8, will be described in detail mainly on the difference from embodiment 7. This processing is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 40 is a flowchart showing a flow of the process executed by the variable length decoding unit 25 in the present process.
The video stream 22 input to the variable length decoding unit 25 conforms to the data array of fig. 39. In step S126, the macroblock type 128b and sub-macroblock type 129b common to C0, C1, and C2 are first decoded. The block size serving as the unit of motion compensation is determined by the decoded macroblock type 128b and sub-macroblock type 129b, so the reference image identification number 132b common to C0, C1, and C2 is then decoded for each block serving as the unit of motion compensation (step S127). In step S128, the motion vector sharing identification flag 123b is decoded. Next, the basic motion vector information 133 is decoded for each block serving as the unit of motion compensation (step S129). In step S130, it is determined from the motion vector sharing identification flag 123b whether or not the motion vector 137 is used in common for C0, C1, and C2. If it is used in common (Yes in step S130), the basic motion vector information 133 is used for all of C0, C1, and C2; otherwise (No in step S130), the basic motion vector information 133 is used as the information of C0, and the extended motion vector information 135 is decoded for C1 and C2, respectively (step S131). Through the above processing, the macroblock type/sub-macroblock type 106, reference image identification number, and motion vector information of each color component are determined, and these are input to the motion compensation prediction unit 102 to obtain the motion compensation predicted image of each color component.
Fig. 41 shows a variation of the bit stream data array of fig. 39. In fig. 41, the motion vector sharing identification flag 123b is multiplexed not as a macroblock-level flag but as a flag located in an upper data layer such as a slice, picture, or sequence. Thus, when sufficient prediction efficiency can be ensured by switching in an upper layer at or above the slice level, the motion vector sharing identification flag 123b need not be multiplexed at the macroblock level one macroblock at a time, and overhead bits can be reduced.
In fig. 39 and fig. 41, the motion vector sharing identification flag 123b is multiplexed in the macroblock or in an upper data layer such as a slice, picture, or sequence; alternatively, when encoding in the 4:4:4 format, different motion vector information may always be used for each component, without multiplexing the motion vector sharing identification flag 123b. Fig. 42 shows the data array of the bit stream in that case. In fig. 42, the motion vector sharing identification flag 123b is not present; instead, the arrangement specification information 136 indicating that an input image of the 4:4:4 format is to be handled is multiplexed in an upper data layer such as the sequence, and the extended motion vector information 135 is multiplexed in accordance with the decoding result of this arrangement specification information.
In embodiment 8, the macroblock type/sub-macroblock type 106 and the reference image are made common to all color components, and only the motion vector 137 may differ for each color component. Thus, when sufficient prediction efficiency is obtained by adapting only the motion vector 137 to each color component, the macroblock type/sub-macroblock type 106 and the reference image identification number need not be multiplexed for each color component, and overhead bits can be reduced.
Example 9
In embodiment 7, whether the macroblock type/sub-macroblock type 106, motion vector 137, and reference image are used in common for the 3 components or made different for each color component can be switched by the inter prediction mode sharing identification flag 123 or the arrangement specification information 136. In embodiment 9, a 4:4:4 format image such as one in the Y, Cb, Cr space is assumed, and switching is made possible between different modes for the luminance component (Y) and the color difference components (Cb, Cr) (in this case, a common mode is used for the 2 color difference components). That is, embodiment 9 describes a video encoding apparatus and a video decoding apparatus characterized in that switching is possible among three cases: common to the 3 components, different for each component, and different between the luminance component and the color difference components. The video encoding apparatus and video decoding apparatus of embodiment 9 have the same configurations as fig. 30 and fig. 31 of embodiment 7.
1. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding apparatus of embodiment 9, will be described centering on a process different from embodiment 7.
This processing is performed in units of macroblocks in which the 3 color components are aggregated, and is mainly executed by the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 43 is a flowchart showing the flow of this processing. Hereinafter, the image data of 3 color components constituting a block are referred to as C0, C1, and C2.
First, the encoding mode determining unit 5 receives the inter prediction mode sharing flag 123, and determines whether or not the common inter prediction mode, the common motion vector 137, and the common reference picture are used for C0, C1, and C2 based on the value (step S132 in fig. 43). If the sharing is performed, the process proceeds to step S133 and thereafter, otherwise, the process proceeds to step S134 and thereafter or step S137 and thereafter.
When the inter prediction mode, the motion vector 137, and the reference image are used in common for C0, C1, and C2, the encoding mode determining unit 5 notifies the motion compensation predicting unit 102 of all selectable inter prediction modes, motion vector search ranges, and reference images, and the motion compensation predicting unit 102 evaluates all prediction efficiencies thereof and selects an optimal inter prediction mode, motion vector 137, and reference image in common for C0, C1, and C2 (step S133).
When the inter prediction mode, motion vector 137, and reference image are not used in common for C0, C1, and C2 but the optimal mode is selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, motion vector search ranges, and reference images selectable for the Ci (0 ≤ i < 3) component, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies and selects the inter prediction mode, motion vector 137, and reference image that are optimal for the Ci (0 ≤ i < 3) component (steps S134, S135, and S136).
When the inter prediction mode, motion vector 137, and reference image are used in common for C1 and C2 while the optimal modes are selected separately for C0 (corresponding to the luminance component) and for C1 and C2 (corresponding to the color difference components), the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, motion vector search ranges, and reference images selectable for the C0 component, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies and selects the inter prediction mode, motion vector 137, and reference image optimal for the C0 component (step S137). The encoding mode determination unit 5 further notifies the motion compensation prediction unit 102 of all the inter prediction modes, motion vector search ranges, and reference images selectable for the C1 and C2 components, and the motion compensation prediction unit 102 evaluates all of their prediction efficiencies and selects the inter prediction mode, motion vector 137, and reference image that are optimal in common for the C1 and C2 components (step S138).
The data array of the bit stream output by the encoding apparatus of embodiment 9 is the same as fig. 34, except that when the inter prediction mode sharing identification flag 123 indicates "common to C1 and C2", the extended macroblock type 130, extended sub-macroblock type 131, extended reference image identification number 134, and extended motion vector information 135 are information common to C1 and C2.
2. Inter-prediction decoding process of decoding device
The decoding apparatus of embodiment 9 receives the video stream 22 corresponding to the array of fig. 34 output from the encoding apparatus of embodiment 9, decodes 3 color components in units of macroblocks of the same size (4:4:4 format), and restores each video frame.
The inter-prediction image generation process, which is a feature of the decoding apparatus of embodiment 9, will be described in detail mainly on the difference from embodiment 7. This processing is performed in units of macroblocks in which the 3 color components are grouped, and is mainly executed by the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 44 is a flowchart showing the flow of the present process.
The video stream 22 input to the variable length decoding unit 25 conforms to the data array of fig. 34. First, the inter prediction mode sharing identification flag 123 in the data of fig. 34 is decoded (step S140). Further, the basic macroblock type 128 and the basic sub-macroblock type 129 are decoded (step S141). In step S142, it is determined from the inter prediction mode sharing identification flag 123 whether or not the inter prediction mode is used in common for C0, C1, and C2. If it is used in common, the basic macroblock type 128 and the basic sub-macroblock type 129 are used for all of C0, C1, and C2; otherwise, the basic macroblock type 128 and the basic sub-macroblock type 129 are used as the mode of C0. Further, when the mode is common to C1 and C2, the extended macroblock type 130 and the extended sub-macroblock type 131 common to the C1 and C2 components are decoded (step S143). When different modes are used for C0, C1, and C2, the extended macroblock type 130 and the extended sub-macroblock type 131 are decoded for C1 and C2, respectively (steps S144, S145, and S146), to obtain the mode information of C1 and C2. Next, the basic reference image identification number 132 and the basic motion vector information 133 are decoded (step S147). When the inter prediction mode sharing identification flag 123 indicates "common to C0, C1, and C2", the basic reference image identification number 132 and the basic motion vector information 133 are used for all of C0, C1, and C2; otherwise, they are used as the information of C0. Further, when the mode is common to C1 and C2, the extended reference image identification number 134 and the extended motion vector information 135 common to the C1 and C2 components are decoded (step S149). When different modes are used for C0, C1, and C2, the extended reference image identification number 134 and the extended motion vector information 135 are decoded for C1 and C2, respectively (steps S150, S151, and S152). Through the above processing, the macroblock type/sub-macroblock type 106, reference image identification number, and motion vector information of each color component are determined, and these are output to the motion compensation prediction unit 102 to obtain the motion compensation predicted image of each color component.
In the case of the bit stream data array of fig. 36 as well, when the inter prediction mode sharing identification flag 123 indicates "common to C1 and C2", the extended macroblock type 130, extended sub-macroblock type 131, extended reference image identification number 134, and extended motion vector information 135 are information common to C1 and C2, and a video encoding apparatus and a video decoding apparatus that input and output a video stream conforming to the data array shown in fig. 36 operate in the same way as in the case of fig. 34.
In embodiment 9, the macroblock type/sub-macroblock type 106, motion vector 137, and reference image can each be made different for each color component; alternatively, the macroblock type/sub-macroblock type 106 and the reference image may be made common to all components, and only the motion vector 137 may be switched among the three cases of common to the 3 components, different for each component, or common to C1 and C2 with an optimal vector selected separately for C0. The data array of the bit stream in this case conforms to fig. 39 or fig. 41; in this case as well, when the inter prediction mode sharing identification flag 123 indicates "common to C1 and C2", the extended motion vector information 135 is common to C1 and C2.
Example 10
In embodiment 10, a method of encoding an input motion vector 137 in the variable length encoding unit 11 of the encoding apparatus described in embodiment 7 and multiplexing the encoded motion vector into a bit stream, and a method of decoding a motion vector 137 from a bit stream in the variable length decoding unit 25 of a corresponding decoding apparatus will be described.
Fig. 45 shows the configuration of a motion vector encoding unit that encodes the motion vector 137 and forms part of the variable length encoding unit 11 of the encoding apparatus shown in fig. 30.
A method of multiplexing the motion vector 137 of 3 color components (C0, C1, C2) into a bitstream in the order of C0, C1, C2 is explained.
Let the motion vector 137 of C0 be mv0. The motion vector prediction unit 111 obtains the prediction vector (mvp0) of the motion vector 137 of C0. As shown in fig. 46, the motion vectors (mvA0, mvB0, mvC0) of the blocks (A, B, and C in fig. 46) adjacent to the block in which the motion vector (mv0) to be encoded is located are acquired from the memory. The motion vectors 137 of A, B, and C have already been multiplexed into the bit stream. The median of mvA0, mvB0, and mvC0 is calculated as mvp0. The calculated prediction vector mvp0 and the motion vector mv0 to be encoded are input to the differential motion vector calculation unit 112, which calculates the differential vector (mvd0) between mv0 and mvp0. The calculated mvd0 is input to the differential motion vector variable length encoding unit 113 and entropy-coded by means such as Huffman coding or arithmetic coding.
Next, the motion vector (mv1) of C1 is encoded. The motion vector prediction unit 111 obtains the prediction vector (mvp1) of the motion vector 137 of C1. As shown in fig. 46, the motion vectors (mvA1, mvB1, mvC1) of the blocks adjacent to the block in which the motion vector (mv1) to be encoded is located, and the motion vector (mv0) of the C0 block at the same position as the block of mv1, are acquired from the memory. The motion vectors 137 of A, B, and C have already been multiplexed into the bit stream. The median of mvA1, mvB1, mvC1, and mv0 is calculated as mvp1. The calculated prediction vector mvp1 and the motion vector mv1 to be encoded are input to the differential motion vector calculation unit 112, which calculates the differential vector (mvd1 = mv1 - mvp1) between mv1 and mvp1. The calculated mvd1 is input to the differential motion vector variable length encoding unit 113 and entropy-coded by means such as Huffman coding or arithmetic coding.
Next, the motion vector (mv2) of C2 is encoded. The motion vector prediction unit 111 obtains the prediction vector (mvp2) of the motion vector 137 of C2. As shown in fig. 46, the motion vectors (mvA2, mvB2, mvC2) of the blocks adjacent to the block in which the motion vector (mv2) to be encoded is located, and the motion vectors (mv0, mv1) of the C0 and C1 blocks at the same position as the block of mv2, are acquired from the memory. The median of mvA2, mvB2, mvC2, mv0, and mv1 is calculated as mvp2. The calculated prediction vector mvp2 and the motion vector mv2 to be encoded are input to the differential motion vector calculation unit 112, which calculates the differential vector (mvd2 = mv2 - mvp2) between mv2 and mvp2. The calculated mvd2 is input to the differential motion vector variable length encoding unit 113 and entropy-coded by means such as Huffman coding or arithmetic coding.
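The prediction- and differential-vector computation for the three components can be illustrated as follows. This Python sketch assumes two-dimensional integer vectors and a component-wise median; for an even number of candidates it picks the upper middle value, which is one possible convention and is not specified above. All names are illustrative.

def median(values):
    s = sorted(values)
    return s[len(s) // 2]  # upper middle value when the count is even

def predict_mv(candidates):
    # Component-wise median of the candidate motion vectors (x, y)
    return (median([v[0] for v in candidates]),
            median([v[1] for v in candidates]))

# C2 example: three same-component neighbors plus the co-located mv0, mv1
mvA2, mvB2, mvC2, mv0, mv1 = (4, 0), (6, -1), (5, 0), (5, -1), (5, 0)
mvp2 = predict_mv([mvA2, mvB2, mvC2, mv0, mv1])
mv2 = (5, -1)
mvd2 = (mv2[0] - mvp2[0], mv2[1] - mvp2[1])  # differential vector to encode
print(mvp2, mvd2)  # (5, 0) and (0, -1)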
Fig. 47 shows a configuration of a motion vector decoding unit 250 that decodes a motion vector 137, which is a part of the variable length decoding unit 25 of the decoding apparatus shown in fig. 31.
In the motion vector decoding section 250, the motion vectors 137 of 3 color components multiplexed in the video stream 22 are decoded in the order of C0, C1, and C2.
In the differential motion vector variable length decoding unit 251, differential motion vectors (mvd0, mvd1, mvd2) of 3 color components (C0, C1, C2) multiplexed in the video stream 22 are extracted and variable length decoded.
The motion vector prediction unit 252 calculates the prediction vectors (mvp0, mvp1, mvp2) of the motion vectors 137 of C0, C1, and C2. The prediction vectors are calculated in the same way as in the motion vector prediction unit 111 of the encoding apparatus.
Next, the motion vector calculation unit 253 adds the differential motion vector to the prediction vector to calculate the motion vector (mvi = mvdi + mvpi, i = 0, 1, 2). The calculated motion vectors 137 are stored in the memory 16 for use as prediction vector candidates.
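The restoration on the decoding side is the exact inverse of the differential computation in the encoder sketch above (names again illustrative):

def restore_mv(mvd, mvp):
    # mvi = mvdi + mvpi, applied component-wise to (x, y) vectors
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])

assert restore_mv((0, -1), (5, 0)) == (5, -1)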
According to embodiment 10, when a motion vector is encoded and decoded, the motion vectors of the adjacent blocks of the same color component surrounding the block in which the motion vector to be encoded is located, and the motion vectors of the blocks of the other color components at the same position as that block, are used as prediction vector candidates. Therefore, even when the motion vector of an adjacent block within the same color component has no continuity, as in a boundary region of an object, the motion vectors of the blocks at the same position in the other color components can serve as prediction vector candidates, which has the effect of improving the prediction efficiency of the motion vector and reducing its code amount.
Example 11
In embodiment 11, another example of an encoding apparatus and a decoding apparatus derived from the encoding apparatus and decoding apparatus described in embodiment 7 is described. The encoding apparatus and decoding apparatus of embodiment 11 are characterized in that they determine, in accordance with a predetermined control signal, whether the C0, C1, and C2 components are encoded according to individual header information, and multiplex the information of that control signal into the video stream 22. They are also provided with a unit that multiplexes the header information needed to decode the C0, C1, and C2 components into the video stream 22 in accordance with that control signal, and a unit that, in accordance with that control signal, efficiently encodes skip (or not-coded) macroblocks when there is no motion vector or transform coefficient information to transmit.
In conventional MPEG video coding systems including AVC, the case where there is no coded information to transmit for the macroblock to be encoded is signaled specially, whereby the code amount of that macroblock is kept to a minimum and highly efficient coding is achieved. For example, suppose that when a certain macroblock is encoded, the image data at exactly the same position on the reference image used for motion compensation prediction is used as the predicted image (that is, the motion vector is 0), the obtained prediction error signal is transformed and quantized, and as a result all the quantized transform coefficients in the macroblock become 0. In this case, the amplitude of the prediction error signal obtained by inverse quantization on the decoding side is 0, and there is no transform coefficient data to transmit to the decoding apparatus. Further, combining this with the assumption that the motion vector is 0, a special macroblock type of "motion vector 0, no transform coefficient data" can be defined. In AVC, a macroblock is treated as a skip macroblock when the condition "the 16×16 prediction of fig. 32(a) is used, and the prediction values used for encoding the motion vector (corresponding to the prediction vectors mvp0, mvp1, and mvp2 above) are equal to the actual motion vector" is satisfied and there is no transform coefficient data to transmit. Such macroblocks have conventionally been called skip macroblocks or not-coded macroblocks, and special signaling ensures that no redundant information is transmitted for them. In AVC, skip macroblocks are signaled by one of the following 2 methods, selected in accordance with the variable length coding scheme used.
Method 1: the number of consecutive skip macroblocks within a slice (the RUN length) is counted, and the RUN length is variable-length encoded.
Method 2: for each macroblock, a flag indicating whether or not it is a skip macroblock is encoded.
Fig. 48 shows the bit stream syntax of each method. Fig. 48(a) shows the case where adaptive Huffman coding is used as the variable length coding scheme (method 1), and fig. 48(b) shows the case where adaptive arithmetic coding is used (method 2). In method 1, skip macroblocks are signaled by mb_skip_run; in method 2, by mb_skip_flag. MB(n) denotes the coded data of the n-th macroblock (which is not skipped). Note that mb_skip_run and mb_skip_flag are assigned in units of macroblocks in which the C0, C1, and C2 components are grouped together.
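The difference between the two signaling methods can be sketched as follows. The element names mb_skip_run and mb_skip_flag follow the syntax elements cited above; everything else (the list-based output and the "MB(n)" placeholder for coded macroblock data) is a simplification made for illustration.

def encode_method1(skip_flags):
    # Method 1: a RUN length precedes each non-skipped macroblock,
    # with a trailing run at the end of the slice
    out, run = [], 0
    for skipped in skip_flags:
        if skipped:
            run += 1
        else:
            out.append(("mb_skip_run", run))
            out.append("MB(n)")
            run = 0
    out.append(("mb_skip_run", run))
    return out

def encode_method2(skip_flags):
    # Method 2: one mb_skip_flag per macroblock
    out = []
    for skipped in skip_flags:
        out.append(("mb_skip_flag", int(skipped)))
        if not skipped:
            out.append("MB(n)")
    return out

print(encode_method1([True, True, False, False]))
print(encode_method2([True, True, False, False]))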
In contrast, the encoding device and the decoding device according to embodiment 11 change, in accordance with the state of the control signal, i.e., the signal corresponding to the inter prediction mode sharing flag 123 described in embodiment 7, whether header information including motion vectors and the like is multiplexed for each of the components C0, C1, and C2, and signal skip macroblocks separately for each of the components C0, C1, and C2. Fig. 49 and 50 show specific examples of the bit stream syntax.
Fig. 49 shows the structure of macroblock coded data inputted to the decoding device of embodiment 11, and fig. 50 shows the detailed structure of coded data of Cn component header information in fig. 49. Hereinafter, in order to explain the effect of the bit stream structure, the operation of the decoding apparatus that receives the bit stream and restores the video signal will be mainly explained. Fig. 31 is referred to in the description of the operation of the decoding apparatus.
The inter prediction mode sharing flag 123 of embodiment 7 is extended in definition and shown here as the macroblock header sharing identification flag 123c. The macroblock header sharing identification flag 123c indicates whether only the C0 component header information 139a is multiplexed as header information commonly used for the C1 and C2 components as well, or whether, with the C0 component header information 139a regarded as basic macroblock header information, the C1 component header information 139b and the C2 component header information 139c are each multiplexed as extension header information. The macroblock header sharing identification flag 123c is extracted and decoded from the video stream 22 by the variable length decoding unit 25. When the flag indicates that only the C0 component header information 139a is multiplexed as header information commonly used for the C1 and C2 components, all the C0, C1, and C2 components in the macroblock are decoded using the C0 component header information 139a; when the flag indicates that the C1 component header information 139b and the C2 component header information 139c are individually multiplexed as extension header information, each of the C0, C1, and C2 components in the macroblock is decoded using its own header information 139a to 139c. This point will be described in further detail below in terms of processing in units of macroblocks.
1. Case of multiplexing only C0 component header information
When the macroblock header sharing identification flag 123c indicates that only the C0 component header information 139a is multiplexed as header information commonly used for the C1 and C2 components, all the components C0, C1, and C2 of the macroblock are decoded based on the various kinds of macroblock header information included in the C0 component header information 139a. In this case, since the C0 component skip instruction information 138a and the C0 component header information 139a are applied commonly to the C1 and C2 components, the skip instruction information (138b, 138c) and the header information (139b, 139c) for the C1 and C2 components are not multiplexed into the bitstream.
The variable length decoding unit 25 first decodes and evaluates the C0 component skip instruction information 138a. When the C0 component skip instruction information 138a indicates "skip", the C0 component header information 139a is regarded as not encoded, and the transform coefficient validity indication information 142 in the C0 component header information 139a is regarded as 0 (no encoded transform coefficients at all). Thus, all of the C0 to C2 component transform coefficient data (140a to 140c) are regarded as not encoded, and all the quantized transform coefficients 10 in the macroblock are output as 0. Further, according to the definition of the skip macroblock, the motion vectors 137 of all the components C0, C1, and C2 are set to the same value and output.
When the C0 component skip instruction information 138a does not indicate "skip", the C0 component header information 139a is present and is decoded. In the C0 component header information 139a, if the macroblock type 128b indicates intra coding, the intra prediction mode 141, the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not 0) the quantization parameter are decoded. If the transform coefficient validity indication information 142 is not 0, the C0, C1, and C2 component transform coefficient data (140a to 140c) are decoded and output as the quantized transform coefficients 10; if it is 0, all of the C0 to C2 component transform coefficient data (140a to 140c) are set to 0, and all the quantized transform coefficients 10 in the macroblock are output as 0. If the macroblock type 128b indicates inter coding, the sub-macroblock type 129b is decoded as necessary, and further the reference picture identification number 132b, the motion vector information 133b, the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not 0) the quantization parameter 21 are decoded. Again, if the transform coefficient validity indication information 142 is not 0, the C0, C1, and C2 component transform coefficient data (140a to 140c) are decoded and output as the quantized transform coefficients 10; if it is 0, all of them are set to 0 and all the quantized transform coefficients 10 in the macroblock are output as 0. Using the output from the variable length decoding unit 25 obtained by the above operation, the macroblock is decoded in accordance with a predetermined processing procedure, as in embodiment 7.
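A simplified, runnable sketch of this case-1 decoding flow follows. The token stream and key names are illustrative assumptions (the reference numerals of the text appear in the names and comments), not the normative syntax.

```python
def decode_mb_shared_header(tokens):
    """tokens: list of (name, value) pairs for one macroblock."""
    t = dict(tokens)
    if t["skip_138a"]:                       # C0 skip indication covers C0-C2
        return {"coeffs": [0, 0, 0],         # all quantized coefficients 10 = 0
                "mv": t.get("skip_mv", (0, 0))}  # same vector for all components
    out = {"mb_type": t["mb_type_128b"]}
    if t["mb_type_128b"] == "intra":
        out["intra_mode"] = t["intra_mode_141"]
    else:
        out["ref_id"] = t["ref_id_132b"]
        out["mv"] = t["mv_133b"]
    if t["coeff_valid_142"]:                 # transform coefficient validity info
        out["qp"] = t["qp_21"]
        out["coeffs"] = t["coeffs_140a_140c"]    # decoded for C0, C1 and C2
    else:
        out["coeffs"] = [0, 0, 0]
    return out

mb = decode_mb_shared_header([("skip_138a", False), ("mb_type_128b", "inter"),
                              ("ref_id_132b", 0), ("mv_133b", (1, -2)),
                              ("coeff_valid_142", 1), ("qp_21", 28),
                              ("coeffs_140a_140c", [[3], [1], [0]])])
print(mb)
```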
2. Multiplexing of corresponding header information for C0, C1, and C2 components
When the macroblock header sharing identification flag 123c indicates that the C1 component header information 139b and the C2 component header information 139c are multiplexed as extension header information separately from the C0 component header information 139a, each component image is decoded based on the macroblock header information included in the corresponding header information (139a to 139c) for each of the components C0, C1, and C2. In this case, skip instruction information (138b, 138c) and header information (139b, 139c) for the C1 and C2 components are multiplexed into the bitstream.
The variable length decoding unit 25 first decodes and evaluates the C0 component skip instruction information 138a. When the C0 component skip instruction information 138a indicates "skip", the C0 component header information 139a is regarded as not encoded, and the transform coefficient validity indication information 142 in the C0 component header information 139a is regarded as 0 (no encoded transform coefficients at all). Thus, the C0 component transform coefficient data 140a is regarded as not encoded, and all the quantized transform coefficients of the C0 component are set to 0 (that is, the relationship between the C0 component skip instruction information 138a and the transform coefficient validity indication information 142 changes depending on the value of the macroblock header sharing identification flag 123c). Further, the motion vector 137 of the C0 component is set and output according to the definition for the case where the C0 component is skipped.
When the C0 component skip instruction information 138a does not indicate "skip", the C0 component header information 139a is present and is decoded. In the C0 component header information 139a, if the macroblock type 128b indicates intra coding, the intra prediction mode 141 (a mode of spatial pixel prediction that uses pixels near the prediction target pixel in the frame as the prediction value), the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not 0) the quantization parameter 21 are decoded. If the transform coefficient validity indication information is not 0, the C0 component transform coefficient data is decoded and output as the quantized transform coefficients 10; if it is 0, all the C0 component transform coefficient data are set to 0. If the macroblock type indicates inter coding, the sub-macroblock type is decoded as necessary, and further the reference picture identification number, the motion vector information, the transform coefficient validity indication information, and (if the transform coefficient validity indication information is not 0) the quantization parameter are decoded. Again, if the transform coefficient validity indication information is not 0, the C0 component transform coefficient data is decoded and output as the quantized transform coefficients 10; if it is 0, all the C0 component transform coefficient data are set to 0. The same processing steps are also performed for C1 and C2.
Using the output from the variable length decoding unit 25 obtained by the above operation, each of the components C0, C1, and C2 in the macroblock is decoded in accordance with a predetermined processing procedure, as in embodiment 7.
The above description has focused on the operation of the decoding apparatus; configuring the bit stream in this manner yields the following effects. First, in conventional AVC only 1 set of header information is available per macroblock, and all the components C0, C1, and C2 must be determined jointly and encoded in accordance with that header information. When a signal component equivalent to the luminance signal, which conveys the content of the image signal, is contained equally in the 3 color components as in the 4:4:4 format, the signal characteristics of each component may vary due to noise or the like in the input video signal, so encoding all the components C0, C1, and C2 collectively is not necessarily optimal. With the bit stream structure of fig. 49 and 50 of embodiment 11, the encoding device can, based on the macroblock header sharing identification flag 123c, select and encode for each of the components C0 to C2 an optimal encoding mode (macroblock type including the intra/inter coding type), motion vector, and the like corresponding to the signal characteristics of that component, thereby improving the encoding efficiency.
Further, in the conventional technique, coding is performed in units of macroblocks in which all the components C0 to C2 are grouped, and skip is determined on the condition that no coding information is present for any of the components. In embodiment 11, by contrast, the presence or absence of coding information can be determined by the skip instruction information 138 for each component, so that when only one component is skipped and the other components are not, it is not necessary to treat all the components as not skipped, and the code amount can be allocated more efficiently. In the encoding apparatus, the variable length encoding unit 11 determines the value of the skip instruction information 138 based on the quantized transform coefficient data 10, the motion vector 137, the reference picture identification number 132b, and the macroblock type/sub-macroblock type 106, in accordance with the definition of a skip macroblock defined identically in both the encoding apparatus and the decoding apparatus as described above.
The bit stream processed by the encoding apparatus and the decoding apparatus according to embodiment 11 may also have the structure shown in fig. 51. In this example, the skip instruction information (138), the header information (139a to 139c), and the transform coefficient data (140a to 140c) of the components C0, C1, and C2 are each arranged collectively. The states of C0, C1, and C2 in the skip instruction information 138 may then be arranged as individual 1-bit code symbols, or the 8 joint states may be encoded collectively as 1 code symbol. When the correlation of the skip state between the color components is high, grouping the code symbols and appropriately defining the context model of the arithmetic coding (described in embodiment 12) can improve the coding efficiency of the skip instruction information 138 itself.
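A minimal sketch of the two arrangements for the skip instruction information 138 mentioned above: three separate 1-bit symbols, or the 8 joint states of (C0, C1, C2) packed into a single symbol 0..7. The bit packing order is an assumption.

```python
def skip_bits(c0, c1, c2):
    """Three individual binary symbols, one per color component."""
    return [int(c0), int(c1), int(c2)]

def skip_joint_symbol(c0, c1, c2):
    """One symbol in 0..7; useful when the skip states are highly correlated."""
    return (int(c0) << 2) | (int(c1) << 1) | int(c2)

assert skip_bits(True, True, False) == [1, 1, 0]
assert skip_joint_symbol(True, True, False) == 6
```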
The macroblock header sharing identification flag 123c may be multiplexed into the bit stream in units of an arbitrary data layer such as macroblock, slice, picture, or sequence. When there is a consistent difference in signal properties between the color components of the input signal, multiplexing the macroblock header sharing identification flag 123c in units of a sequence enables efficient encoding with less overhead information. Multiplexing it in units of a picture allows the header to be shared for I pictures, where the macroblock type has few variations, and an individual header to be used per color component for P and B pictures, where the macroblock type has many variations; this can be expected to improve the balance between encoding efficiency and calculation load. Switching at the picture layer is also desirable from the viewpoint of encoding control of a video signal whose signal properties change from picture to picture, as at a scene change. If the macroblock header sharing identification flag 123c is multiplexed in units of macroblocks, the code amount per macroblock increases, but whether header information is shared can be controlled according to the signal state of each color component in units of macroblocks, so that an encoding apparatus that tracks local signal fluctuations of the image better and has improved compression efficiency can be configured.
Since the coding type is switched at the slice level in accordance with the picture type in AVC, the following method is also conceivable: the macroblock header sharing identification flag 123c is multiplexed for each slice, and when the flag indicates "common to C0, C1, and C2", the bit stream is configured so that the slice includes the coded information of all 3 color components, whereas when the flag indicates "individual for C0, C1, and C2", the bit stream is configured so that 1 slice includes the information of 1 color component. Fig. 52 shows this case. In fig. 52, the macroblock header sharing identification flag 123c has the meaning of slice configuration identification information indicating whether "the current slice includes the coded information of all 3 color components" or "the current slice includes the coded information of a specific color component". Of course, such slice configuration identification information may be prepared separately from the macroblock header sharing identification flag 123c. When a slice is identified as "including the coded information of a specific color component", the identification includes "which of C0, C1, and C2". When switching in slice units in this way between using one macroblock header in common for the C0, C1, and C2 components and multiplexing individual macroblock headers for each of the C0, C1, and C2 components (C0 slice, C1 slice, and C2 slice), and the 2 types of slices are mixed within 1 picture, the following restriction is placed: the C0 slice, C1 slice, and C2 slice are always multiplexed into the bitstream as a group of data encoding the macroblocks at the same positions in the picture. That is, the value of first_mb_in_slice included in the slice header, which indicates the position of the first macroblock of the slice within the picture, is always the same in a set of C0, C1, and C2 slices, and the number of macroblocks included in each slice of the set is the same. Fig. 53 shows this case. By placing such restrictions on the structure of the bit stream, the encoding device can adaptively select and encode, according to the properties of the local signals in the picture, whichever of a slice mixing C0, C1, and C2 and a group of C0, C1, and C2 slices gives the higher encoding efficiency, and the decoding device can receive the bit stream encoded efficiently in this way and reproduce the video signal. For example, if the bit stream 22 input to the decoding apparatus of fig. 31 has such a configuration, the variable length decoding section 25 decodes the slice configuration identification information from the bit stream each time slice data is input, and identifies which of the slices of fig. 52 is to be decoded. When it is determined from the slice configuration identification information that the encoded data is organized as a group of C0, C1, and C2 slices, the state of the inter prediction mode sharing flag 123 (or the macroblock header sharing identification flag 123c) can be set to "use an individual inter prediction mode (or macroblock header) for C0, C1, and C2" and the decoding operation performed accordingly. Since the value of first_mb_in_slice and the number of macroblocks in each slice of a set are guaranteed to be equal, decoding can be performed without causing any overlap or gap on the picture between the mixed slices and the C0, C1, and C2 slices.
In addition, since such a restriction may reduce the encoding efficiency when the signal properties of the C0, C1, and C2 slices differ greatly, identification information may be added at the picture level or the sequence level to select whether or not slices having different values of slice configuration identification information are permitted to be mixed within a picture.
Example 12
In this embodiment 12, another encoding apparatus and decoding apparatus derived from the encoding apparatus and decoding apparatus described in embodiment 11 will be described. The encoding device and the decoding device according to embodiment 12 are characterized in that, when the components C0, C1, and C2 in a macroblock are encoded by an adaptive arithmetic coding method, whether the symbol occurrence probability used for arithmetic coding and its learning process are shared among all components or separated per component is adaptively switched in accordance with an instruction signal multiplexed in the bit stream.
The encoding apparatus of embodiment 12 differs from that of embodiment 11 only in the processing of the variable length encoding unit 11 in fig. 30, and the decoding apparatus differs only in the processing of the variable length decoding unit 25 in fig. 31; the other operations are the same as in embodiment 11. The arithmetic encoding and decoding processes, which are the inventive point of embodiment 12, are explained in detail below.
1. Encoding process
Fig. 54 shows an internal configuration related to arithmetic coding processing in the variable length coding unit 11, and fig. 55 and 56 show an operation flow thereof.
The variable length coding unit 11 of embodiment 12 includes: a context model determining unit 11a that determines a context model (described later) defined for each data type, such as the motion vector 137, the reference picture identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10, which are the data to be encoded; a binarization unit 11b that converts multi-valued data into binary data in accordance with a binarization rule determined for each type of encoding target data; an occurrence probability generating unit 11c that generates the occurrence probability of each bin value (0 or 1) after binarization; an encoding unit 11d that performs arithmetic encoding based on the generated occurrence probability; and a memory 11g that stores occurrence probability information. The input to the context model determining unit 11a is the various data input to the variable length coding unit 11 as encoding target data, such as the motion vector 137, the reference picture identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10; the output of the encoding unit 11d corresponds to the macroblock-related information of the video stream 22.
(1) Context model determination processing (step S160 in FIG. 55)
The context model is the result of modeling the dependency on other information that causes variation in the occurrence probability of the information source symbol; by switching the occurrence probability state in accordance with this dependency, it is possible to encode the symbol more adaptively to its actual occurrence probability. Fig. 57 shows the concept of the context model (ctx). In fig. 57 the information source symbol is binary, but it may also be multilevel. The selection branches 0 to 2 of ctx in fig. 57 are defined on the assumption that the state of the occurrence probability of the information source symbol using that ctx changes according to the situation. In the video coding of embodiment 12, the ctx value is switched according to the dependency between the coded data in a certain macroblock and the coded data in the neighboring macroblocks. For example, fig. 58 shows a context model relating to the motion vector of a macroblock, disclosed in D. Marpe et al., "Video Compression Using Context-Based Adaptive Arithmetic Coding", International Conference on Image Processing 2001. In fig. 58, the motion vector of block C is the encoding target (to be precise, the prediction difference value mvd_k(C), obtained by predicting the motion vector of block C from its neighborhood, is coded), and ctx_mvd(C, k) represents the context model. mvd_k(A) represents the motion vector prediction difference value of block A, and mvd_k(B) that of block B; both are used in the definition of the switching evaluation value e_k(C) of the context model. Generally, when the dispersion of the nearby motion vectors is small, mvd_k(C) tends to be small, and conversely, when e_k(C) is large, mvd_k(C) also tends to be large. It is therefore desirable to define the evaluation value as e_k(C) = |mvd_k(A)| + |mvd_k(B)| and to adapt the occurrence probability of mvd_k(C) based on it. The set of occurrence probability variations selected in this way is the context model; in this case there are 3 occurrence probability changes.
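A sketch of this context selection follows, using the evaluation value e_k(C) = |mvd_k(A)| + |mvd_k(B)| reconstructed above. The thresholds 3 and 32 follow the CABAC design of AVC and are an assumption here, not values stated in this text.

```python
def ctx_mvd(mvd_a, mvd_b):
    """Pick one of 3 occurrence probability changes for coding mvd_k(C)."""
    e = abs(mvd_a) + abs(mvd_b)  # dispersion of neighboring prediction differences
    if e < 3:
        return 0                 # neighbors small -> mvd_k(C) likely small
    if e > 32:
        return 2                 # neighbors large -> mvd_k(C) likely large
    return 1

assert ctx_mvd(0, 1) == 0 and ctx_mvd(10, 10) == 1 and ctx_mvd(40, 5) == 2
```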
Context models are likewise defined in advance for the other encoding target data, such as the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10, and are shared by the encoding apparatus and the decoding apparatus. The context model determining unit 11a selects the model predetermined for the type of the data to be encoded (which occurrence probability variation within the context model is selected corresponds to the occurrence probability generation process of (3) below).
(2) Binarization processing (step S161 in FIG. 55)
The binarization unit 11b binarizes the encoding target data and determines the context model corresponding to each bin (binary position) of the resulting binary sequence. The binarization rule converts each encoded data into a variable-length binary sequence in accordance with the approximate distribution of its values. Binarizing and then encoding in bin units, instead of arithmetically encoding the originally multi-valued encoding target directly, has the following advantages: the number of divisions of the probability number line can be reduced, computation can be simplified, and the context model can be slimmed down.
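As an illustration of such a rule, the following sketch shows a truncated unary binarization, which maps a multi-valued symbol to a variable-length binary sequence whose bin positions can each be given their own context model. The specific rule is an assumption; the actual tables are fixed per data type.

```python
def unary_binarize(value, max_value):
    """value -> '1' * value followed by a terminating '0' (omitted at the max)."""
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins

assert unary_binarize(0, 9) == [0]
assert unary_binarize(3, 9) == [1, 1, 1, 0]
assert unary_binarize(9, 9) == [1] * 9   # truncated: no terminating zero
```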
(3) Occurrence probability generation processing (step S162 in fig. 55; step S162 is detailed in fig. 56)
In the above processes (1) and (2), the binarization of the multi-valued encoding target data and the setting of the context model applied to each bin are completed, and the preparation for encoding is done. Next, the occurrence probability state used for arithmetic coding is generated in the occurrence probability generating unit 11c. Since each context model contains occurrence probability variations for each of the values 0/1, the processing refers to the context model 11f determined in step S160, as shown in fig. 54. The evaluation value for selecting the occurrence probability, such as e_k(C) shown in fig. 58, is determined, and according to it, which occurrence probability variation to use for the current coding is chosen from the selection branches of the referenced context model (step S162a in fig. 56). Furthermore, the variable length coding unit 11 of embodiment 12 includes an occurrence probability information memory 11g and a mechanism for storing, per color component, the occurrence probability states 11h updated sequentially in the course of encoding. The occurrence probability generating unit 11c selects, according to the value of the occurrence probability state parameter sharing identification flag 143, whether the occurrence probability state used for the current coding is taken from the data held for the C0 component and shared with C1 and C2, or from the data held individually for each of the C0 to C2 color components, and thus determines the occurrence probability state 11h actually used for the coding (steps S162b to S162d in fig. 56).
Since the same selection must be possible in the decoding apparatus, the occurrence probability state parameter sharing identification flag 143 must be multiplexed into the bit stream. This configuration has the following effects. Taking fig. 58 as an example, when the macroblock header sharing identification flag 123c indicates that the C0 component header information 139a is also used for the other components, and the macroblock type 128b indicates the 16 × 16 prediction mode, only one e_k(C) of fig. 58 is determined per macroblock. In this case the occurrence probability state prepared for the C0 component is always used. On the other hand, when the macroblock header sharing identification flag 123c indicates that the header information (139a to 139c) corresponding to each component is used, and the macroblock type 128b indicates the 16 × 16 prediction mode for each of C0, C1, and C2, e_k(C) of fig. 58 can have 3 variations per macroblock. For each variation, the encoding unit 11d at the subsequent stage can choose between 2 options: either the occurrence probability state 11h prepared for the C0 component is used and updated in common, or the occurrence probability states 11h prepared individually for each color component are used and updated. With the former option, when the components C0, C1, and C2 have roughly the same motion vector distribution, sharing the use and updating of the occurrence probability state 11h increases the number of learning opportunities, so the occurrence probability of the motion vector may be learned better. With the latter option, when the components C0, C1, and C2 have dissimilar motion vector distributions, using and updating the occurrence probability states 11h individually reduces the mismatches caused by learning, so the occurrence probability of the motion vector may be learned better. Since the video signal is non-stationary, the efficiency of arithmetic coding can be improved by performing such adaptive control.
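A minimal sketch of the selection in steps S162b to S162d follows: depending on the occurrence probability state parameter sharing identification flag 143, either the C0 state is used (and updated) for all three components, or each component keeps its own state in the memory 11g. The simple 0/1 counter standing in for the occurrence probability state is an assumption; real CABAC uses a finite state machine.

```python
class OccurrenceProbabilityState:
    def __init__(self):
        self.counts = [1, 1]                  # Laplace-style initial counts

    def probability_of(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1           # learning step after each bin

states = [OccurrenceProbabilityState() for _ in range(3)]  # memory 11g, per C0-C2

def select_state(component, shared_flag_143):
    """Return the occurrence probability state 11h to use for this coding."""
    return states[0] if shared_flag_143 else states[component]

s = select_state(2, shared_flag_143=True)     # C2 reuses the C0 state
```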
(4) Encoding process
From (3), the occurrence probabilities of the 0/1 values on the probability number line required for the arithmetic coding process are obtained, so arithmetic coding is performed in the encoding unit 11d in accordance with the process described in the conventional example (step S163 in fig. 55). The actual code value (0 or 1) 11e is fed back to the occurrence probability generating unit 11c, and the 0/1 occurrence frequencies are counted to update the occurrence probability state 11h that was used (step S164). For example, suppose that after 100 bins have been encoded using a certain occurrence probability state 11h, the 0/1 occurrence probabilities of that occurrence probability variation are 0.25 and 0.75. If a 1 is then encoded using the same occurrence probability variation, the occurrence frequency of 1 is updated, and the 0/1 occurrence probabilities become 0.247 and 0.752. With this mechanism, efficient encoding adapted to the actual occurrence probability can be performed. The code value 11e is the output of the variable length encoding unit 11 and is output from the encoding apparatus as the video stream 22.
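The update example above can be checked numerically, assuming the occurrence probability is maintained as simple frequency counts over the bins coded so far (a sketch; the counts 25/75 are implied by the 0.25/0.75 starting point).

```python
count0, count1 = 25, 75     # 100 bins coded with P(0) = 0.25, P(1) = 0.75
count1 += 1                 # the actual code value 11e was a 1
total = count0 + count1
print(f"{count0 / total:.4f} {count1 / total:.4f}")
# -> 0.2475 0.7525, i.e. roughly the 0.247 / 0.752 figures in the text
```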
Fig. 59 shows an internal configuration of the arithmetic decoding process of the variable length decoding unit 25, and fig. 60 shows an operation flow thereof.
The variable length decoding unit 25 of embodiment 12 includes: a context model determining unit 11a that determines, for each type of decoding target data such as the motion vector 137, the reference picture identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10, a context model defined in common with the encoding apparatus; a binarization unit 11b that outputs the binarization rule determined by the type of the decoding target data; an occurrence probability generating unit 11c that generates the occurrence probability of each bin (0 or 1) in accordance with the binarization rule and the context model; a decoding unit 25a that performs arithmetic decoding based on the generated occurrence probability and decodes data such as the motion vector 137, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10 from the resulting binary sequence and the binarization rule; and a memory 11g that stores occurrence probability information. 11a to 11c and 11g are the same as the corresponding internal components of the variable length coding unit 11 in fig. 54.
(5) Context model determination processing, binarization processing, and occurrence probability generation processing
These processes are the same as processes (1) to (3) on the encoding device side. Although not shown in the drawings, the occurrence probability state parameter sharing identification flag 143 is extracted from the video stream 22 in advance.
(6) Arithmetic decoding process
Since the occurrence probability of the bin to be decoded is determined by the processes up to (5), the value of the bin is restored in the decoding unit 25a in accordance with the predetermined arithmetic decoding process (step S166 in fig. 60). The restored value 25b of the bin is fed back to the occurrence probability generating unit 11c, and the 0/1 occurrence frequencies are counted to update the occurrence probability state 11h that was used (step S164). Each time the restored value of a bin is determined, the decoding unit 25a checks the accumulated binary sequence against the patterns determined by the binarization rule, and outputs the data value indicated by the matching pattern as the decoded data value (step S167). As long as the decoded data is not yet determined, the process returns to step S166 and decoding continues.
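A sketch of this loop follows: bins are decoded one at a time and accumulated until the binary sequence matches a pattern of the binarization rule. The pattern table assumes the truncated unary binarization sketched earlier (here with maximum value 3); `decode_bin` stands in for the decoding unit 25a.

```python
patterns = {(0,): 0, (1, 0): 1, (1, 1, 0): 2, (1, 1, 1): 3}  # values 0..3

def decode_value(decode_bin):
    """decode_bin() restores one bin per call (the role of unit 25a)."""
    bins = []
    while True:
        bins.append(decode_bin())           # step S166: restore one bin
        value = patterns.get(tuple(bins))   # step S167: compare with patterns
        if value is not None:
            return value                    # decoded data value determined

stream = iter([1, 1, 0])
print(decode_value(lambda: next(stream)))   # -> 2
```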
With the encoding and decoding devices configured as above, including the arithmetic encoding and arithmetic decoding processes, more efficient encoding can be performed when the encoding information of each color component is arithmetically encoded adaptively in accordance with the macroblock header sharing identification flag 123c.
Although not shown in the drawings, the unit in which the occurrence probability state parameter sharing identification flag 143 is multiplexed may be any of macroblock, slice, picture, and sequence units. When sufficient coding efficiency can be ensured by multiplexing the flag in an upper data layer such as slice, picture, or sequence and switching only at layers above the slice, overhead bits can be reduced because the occurrence probability state parameter sharing identification flag 143 need not be multiplexed at the macroblock level one by one.
The occurrence probability state parameter sharing identification flag 143 may also be information determined inside the decoding apparatus based on related information included in a bit stream separate from it.
In embodiment 12, when the macroblock header sharing identification flag 123c is arithmetically encoded in units of macroblocks, the context model 11f uses the model shown in fig. 61. In fig. 61, the value of the macroblock header sharing identification flag 123c of macroblock X is denoted IDC_X. When the macroblock header sharing identification flag 123c of macroblock C is encoded, the value IDC_A of the flag in macroblock A and the value IDC_B of the flag in macroblock B are referred to, and the following 3 states are obtained from the formula in the figure.
Value 0: both A and B are in the mode "use a common macroblock header for C0, C1, and C2".
Value 1: one of A and B is in the mode "use a common macroblock header for C0, C1, and C2", and the other is in the mode "use individual macroblock headers for C0, C1, and C2".
Value 2: both A and B are in the mode "use individual macroblock headers for C0, C1, and C2".
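A sketch of this 3-state context follows, assuming IDC takes 0 for the "common macroblock header" mode and 1 for the "individual macroblock headers" mode; the sum then reproduces the three values listed above (the exact formula in fig. 61 is not reproduced in this text).

```python
def ctx_header_sharing(idc_a, idc_b):
    return idc_a + idc_b   # 0: both common, 1: mixed, 2: both individual

assert ctx_header_sharing(0, 0) == 0
assert ctx_header_sharing(0, 1) == ctx_header_sharing(1, 0) == 1
assert ctx_header_sharing(1, 1) == 2
```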
By encoding the macroblock header sharing identification flag 123c in this way, arithmetic encoding suited to the encoding state of the neighboring macroblocks can be performed, and encoding efficiency can be improved. It is apparent from the operation description of the decoding apparatus of embodiment 12 that the context model is defined by the same procedure on both the encoding side and the decoding side, and arithmetic decoding is performed accordingly.
In embodiment 12, the header information of fig. 50 included in the macroblock header (macroblock type, sub-macroblock type, intra prediction mode, reference picture identification number, motion vector, transform coefficient validity indication information, and quantization parameter) is arithmetically encoded using a context model defined for each information type; as shown in fig. 62, any of these context models may be defined with reference to the corresponding information of macroblocks A and B for the current macroblock C. Here, as shown in fig. 62(a), when macroblock C is in the mode "use a common macroblock header for C0, C1, and C2" but macroblock B is in the mode "use individual macroblock headers for C0, C1, and C2", the information of some specific color component among C0, C1, and C2 must be used as the reference information in the context model definition.
For example, when C0, C1, and C2 correspond to the R, G, and B color components, a conceivable method is to select the G component, which has characteristics closest to the luminance signal conventionally used for encoding, as the signal that best represents the structure of the image. This is because, even in the mode of "use a common macroblock header for C0, C1, and C2", the macroblock header information is often determined and encoded with reference to the G component.
In the opposite case, as shown in fig. 62(b), when macroblock C is in the mode "use individual macroblock headers for C0, C1, and C2" and macroblock B is in the mode "use a common macroblock header for C0, C1, and C2", macroblock C needs to encode and decode the header information of 3 color components; in this case, as the reference information for the context definition of the header information of each color component, the header information common to the 3 components in macroblock B is used, with the same value, for all 3 components. When the macroblock header sharing identification flags 123c of macroblocks A, B, and C all indicate the same value, the corresponding reference information of each component is always present and is therefore used.
From the above description of the operation of the decoding apparatus of embodiment 12, it is likewise apparent that the context model is defined by the same procedure on both the encoding side and the decoding side, and arithmetic decoding is performed accordingly. After determining which component's information the context model refers to, the occurrence probability state is updated according to the state of the occurrence probability state parameter sharing identification flag 143 for that context model.
In embodiment 12, the transform coefficient data of the C0, C1, and C2 components are also arithmetically encoded according to the occurrence probability distribution of the encoding target data. The coded data of the 3 components are always included in the bit stream, regardless of whether the macroblock header is shared. In embodiment 12, since intra prediction and inter prediction are performed on the color space of the encoded input signal and a prediction difference signal is obtained, the distribution of the transform coefficient data produced by integer-transforming the prediction difference signal is considered to have the same occurrence probability distribution regardless of the surrounding state, i.e., regardless of whether the macroblock header is shared as in fig. 62. Therefore, in embodiment 12, a common context model is defined for each of the components C0, C1, and C2 and used for encoding and decoding, regardless of whether the macroblock header is shared.
Here too, as is apparent from the above description of the operation of the decoding apparatus of embodiment 12, the context model is defined by the same procedure on both the encoding side and the decoding side, and arithmetic decoding is performed accordingly. After determining which component's information the context model refers to, the occurrence probability state is updated according to the state of the occurrence probability state parameter sharing identification flag 143 for that context model.
Example 13
In embodiment 13, another encoding device and decoding device derived from the encoding devices and decoding devices described in embodiments 7 to 12 will be described. The encoding device and the decoding device according to embodiment 13 are characterized as follows: the encoding device performs color space conversion processing at the input stage of the encoding device described in embodiments 7 to 12, converting the color space of the captured video signal input to the encoding device into an arbitrary color space suitable for encoding, and multiplexes into the bit stream information specifying the inverse conversion processing for returning to the color space at the time of image capture on the decoding side; the decoding device is configured to extract the information specifying the inverse conversion processing from the bit stream, obtain a decoded image by the decoding device described in embodiments 7 to 12, and then perform the inverse color space conversion based on that information.
Fig. 63 shows the configurations of the encoding device and the decoding device according to example 13. An encoding device and a decoding device according to embodiment 13 will be described with reference to fig. 63.
The encoding device of embodiment 13 consists of the encoding device 303 of embodiments 7 to 12 with a color space conversion unit 301 added at its preceding stage. The color space conversion unit 301 includes 1 or more color space conversion processes, selects the color space conversion process to be used according to the nature of the input video signal, system settings, and the like, and performs that color space conversion process on the input video signal. The resulting converted video signal 302 is sent to the encoding device 303, and information identifying the color space conversion process used is output to the encoding device 303 as color space conversion method identification information 304. The encoding device 303 compression-encodes the converted video signal 302 as the encoding target signal by the methods shown in embodiments 7 to 12, multiplexes the color space conversion method identification information 304 into the resulting bit stream 305, and transmits it to a transmission path or outputs it to a recording device that records it on a recording medium.
Here, as color space conversion methods to be prepared, for example, the following are available:

Conversion from RGB to YUV used in the existing standard:

C0 = Y = 0.299 × R + 0.587 × G + 0.114 × B
C1 = U = -0.169 × R - 0.3316 × G + 0.500 × B
C2 = V = 0.500 × R - 0.4186 × G - 0.0813 × B

Prediction between color components:

C0 = G' = G
C1 = B' = B - f(G) (where f(G) is the result of filtering the G component)
C2 = R' = R - f(G)

Conversion from RGB to YCoCg:

C0 = Y = R/4 + G/2 + B/4
C1 = Co = R/2 - B/2
C2 = Cg = -R/4 + G/2 - B/4
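A sketch of the three candidate transforms listed above, applied to one RGB sample, follows. f(G) stands for the unspecified filtering of the G component; the identity function is used here as a placeholder assumption.

```python
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.3316 * g + 0.500 * b
    v = 0.500 * r - 0.4186 * g - 0.0813 * b
    return y, u, v

def inter_component_prediction(r, g, b, f=lambda x: x):
    # C0 = G' = G, C1 = B' = B - f(G), C2 = R' = R - f(G)
    return g, b - f(g), r - f(g)

def rgb_to_ycocg(r, g, b):
    y = r / 4 + g / 2 + b / 4
    co = r / 2 - b / 2
    cg = -r / 4 + g / 2 - b / 4
    return y, co, cg

print(rgb_to_yuv(255, 128, 64))
print(inter_component_prediction(255, 128, 64))
print(rgb_to_ycocg(255, 128, 64))
```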
the input to the color space conversion section 301 is not necessarily limited to RGB, and the conversion processing is not limited to the above 3 kinds.
The decoding apparatus of embodiment 13 consists of the decoding apparatus 306 of embodiments 7 to 12 with an inverse color space conversion unit 308 added at its subsequent stage. The decoding apparatus 306 receives the bit stream 305 as input, extracts the color space conversion method identification information 304 from the bit stream 305, and outputs the decoded image 307 obtained by the operation of the decoding apparatus described in embodiments 7 to 12. The inverse color space conversion unit 308 has an inverse conversion process for each color space conversion method selectable by the color space conversion unit 301, identifies the conversion performed by the color space conversion unit 301 based on the color space conversion method identification information 304 output from the decoding apparatus 306, applies the inverse conversion to the decoded image 307, and thereby restores the color space of the input video signal given to the encoding apparatus of embodiment 13.
According to the encoding device and the decoding device of embodiment 13, the optimal color space conversion processing is applied to the video signal to be encoded at the stage preceding encoding and following decoding, and the correlation contained in the image signal composed of 3 color components is removed before encoding, so that encoding can be performed with reduced redundancy and the compression efficiency can be improved. In conventional standard encoding systems such as MPEG, the color space of the signal to be encoded is limited to the single type YUV; by providing the color space conversion unit 301 and the inverse color space conversion unit 308 and including the color space conversion method identification information 304 in the bit stream 305, the restriction on the color space of the video signal to be encoded is removed, and the video signal can be encoded using the optimal conversion among a variety of means for removing the correlation between color components.
In embodiment 13, the color space conversion unit 301 and the inverse color space conversion unit 308 have been described on the premise that they always operate; however, in order to ensure compatibility with existing specifications, information instructing that these processing units not operate may instead be encoded in an upper layer such as the sequence.
Further, the color space conversion unit 301 and the inverse color space conversion unit 308 of embodiment 13 may be combined with the internal configurations of the encoding devices and decoding devices of embodiments 7 to 12 so that the color space conversion is performed at the level of the prediction difference signal. Fig. 64 shows an encoding device configured in this manner, and fig. 65 the corresponding decoding device. In the encoding device of fig. 64, a transform unit 310 is provided in place of the orthogonal transform unit 8, and an inverse transform unit 312 in place of the inverse orthogonal transform unit 13. The decoding apparatus of fig. 65 likewise includes the inverse transform unit 312 instead of the inverse orthogonal transform unit 13.
The transform unit 310 first selects, for the prediction difference signal 4 of the C0, C1, and C2 components output from the encoding mode determining unit 5, an optimal transform process from among a plurality of color space transform processes such as those of the color space conversion unit 301, and performs the color space transform. It then performs a transform equivalent to that of the orthogonal transform unit 8 on the result of the color space transform. The color space conversion method identification information 311 indicating which transform was selected is sent to the variable length coding unit 11, multiplexed into the bit stream, and output as the video stream 22. The inverse transform unit 312 first performs the inverse transform corresponding to the inverse orthogonal transform unit 13, and then performs the inverse color space transform using the color space transform process specified by the color space conversion method identification information 311.
In the decoding apparatus, the variable length decoding unit 25 extracts the color space conversion method identification information 311 from the bit stream and passes it to the inverse transform unit 312, which performs the same processing as the inverse transform unit 312 in the encoding apparatus. With this configuration, when the correlation remaining between the color components can be sufficiently removed in the prediction difference region, the removal can be performed as part of the encoding process, which has the effect of improving the encoding efficiency. However, when individual macroblock headers are used for the C0, C1, and C2 components, the prediction method may differ per component, as when the C0 component uses intra prediction and the C1 component uses inter prediction, so the correlation in the region of the prediction difference signal 4 is difficult to maintain. Therefore, when individual macroblock headers are used for the C0, C1, and C2 components, the transform unit 310 and the inverse transform unit 312 may skip the color space transform, and whether the color space transform is performed in the region of the prediction difference signal 4 may be multiplexed into the bit stream as identification information. The color space conversion method identification information 311 may be switched in units of any of sequence, picture, slice, and macroblock.
In the configurations of the encoding device and the decoding device of fig. 64 and 65, the signal definition domain of the encoding target signal differs for the transform coefficient data of the C0, C1, and C2 components depending on the color space conversion method identification information 311. Therefore, the distribution of the transform coefficient data is generally considered to have a different occurrence probability distribution for each state of the color space conversion method identification information 311. Accordingly, when the encoding device and the decoding device are configured as in fig. 64 and 65, encoding and decoding are performed for each of the components C0, C1, and C2 using context models with separate occurrence probability states for each state of the color space conversion method identification information 311.
Here too, as is apparent from the above description of the operation of the decoding apparatus of embodiment 12, the context model is defined by the same procedure on both the encoding side and the decoding side, and arithmetic decoding is performed accordingly. After determining which component's information the context model refers to, the occurrence probability state is updated according to the state of the occurrence probability state parameter sharing identification flag 143 for that context model.
Example 14
In this embodiment 14, the specific device configuration is further described for the encoding device and the decoding device described in the above embodiments.
In the above embodiments, the operations of the encoding device and the decoding device were described with reference to, for example, fig. 1, 2, 30, and 31. In these figures the following operation was explained: an input video signal composed of 3 color components is input to the encoding device as a whole, and is encoded while selecting inside the device whether the 3 color components are encoded with a common prediction mode or macroblock header or with individual ones; the resulting bit stream is input to the decoding device, which performs decoding while selecting, based on a flag decoded from the bit stream (for example, the intra prediction mode sharing identification flag 23 or the inter prediction mode sharing identification flag 123), whether the 3 color components were encoded with a common prediction mode or macroblock header or with individual ones, to obtain the playback video. It has already been stated that this flag may be encoded and decoded in units of an arbitrary data layer such as macroblock, slice, picture, or sequence; in embodiment 14, the configuration and operation of an apparatus that encodes the 3 color component signals either with a common macroblock header or with individual macroblock headers, and decodes them accordingly, will be described with reference to specific drawings. In the following, unless otherwise specified, "1 frame" is regarded as a data unit of 1 frame or 1 field.
The macroblock header of embodiment 14 includes the macroblock overhead information other than the transform coefficient data: the transform block size identification flag as in fig. 15; encoding and prediction mode information such as the macroblock type/sub-macroblock type and the intra prediction mode as in fig. 50; motion prediction information such as the reference image identification number and the motion vector; the transform coefficient validity indication information; and the quantization parameter for the transform coefficients.
In the following, the process of encoding the 3 color component signals of 1 frame with a common macroblock header is called "common encoding process", and the process of encoding the 3 color component signals of 1 frame with separate independent macroblock headers is called "independent encoding process". Similarly, the process of decoding frame image data from a bit stream in which the 3 color component signals of 1 frame are encoded with a common macroblock header is called "common decoding process", and the process of decoding frame image data from a bit stream in which the 3 color component signals of 1 frame are encoded with separate independent macroblock headers is called "independent decoding process". In the common encoding process of embodiment 14, as shown in fig. 66, the input video signal of 1 frame is divided into macroblocks in which the 3 color components are grouped. In the independent encoding process, on the other hand, as shown in fig. 67, the input video signal of 1 frame is first separated into the 3 color components, which are then divided into macroblocks composed of a single color component. That is, a macroblock subjected to the common encoding process contains samples of the 3 color components C0, C1, and C2, whereas a macroblock subjected to the independent encoding process contains samples of only one of the C0, C1, and C2 components.
Fig. 68 is an explanatory diagram showing the temporal motion prediction reference relationship between pictures in the encoding device and the decoding device of embodiment 14. In this example, the data unit indicated by the thick vertical line is a picture, and the relationship between picture and access unit is indicated by the dashed line. In the case of the common encoding/decoding process, 1 picture is data representing a video signal of 1 frame in which 3 color components are mixed; in the case of the independent encoding/decoding process, 1 picture is a video signal of 1 frame of one color component. The access unit is the minimum data unit to which a time stamp for synchronizing the video signal with audio, sound information, and the like is attached; in the case of the common encoding/decoding process, 1 access unit contains the data of 1 picture (427a in fig. 68). In the case of the independent encoding/decoding process, on the other hand, 1 access unit contains 3 pictures (427b in fig. 68). This is because, in the independent encoding/decoding process, the pictures of all 3 color components at the same display time must be assembled before a playback video signal of 1 frame is obtained. The number attached above each picture indicates its encoding/decoding order in the temporal direction (frame_num of AVC). The arrows between pictures in fig. 68 indicate the reference direction of motion prediction. That is, in the independent encoding/decoding process, neither motion prediction reference between the pictures contained in the same access unit nor motion prediction reference between different color components is performed; the pictures of each of the color components C0, C1, and C2 are encoded and decoded while restricting prediction reference to signals of the same color component. With such a configuration, in the independent encoding/decoding process of embodiment 14, each color component can be encoded and decoded without depending at all on the encoding/decoding processing of the other color components, and parallel processing becomes easy.
AVC defines the IDR (instantaneous decoder refresh) picture, which is intra-coded and resets the contents of the reference picture memory used for motion compensation prediction. Since an IDR picture can be decoded without depending on any other picture, it is used as a random access point. In the case of the common encoding process, 1 access unit = 1 picture; but in the case of the independent encoding process, 1 access unit is composed of a plurality of pictures, so when a certain color component picture is an IDR picture, the remaining color component pictures are also made IDR pictures, an IDR access unit is defined, and the random access function is secured.
Hereinafter, in embodiment 14, identification information indicating whether encoding is performed by the common encoding process or encoding is performed by the independent encoding process is referred to as a common encoding/independent encoding identification signal.
Fig. 69 is an explanatory diagram showing an example of the structure of the bit stream generated by the encoding apparatus of embodiment 14 and input to and decoded by the decoding apparatus of embodiment 14. Fig. 69 shows the bit stream structure from the sequence level to the frame level. First, the common encoding/independent encoding identification signal 423 is multiplexed into an upper-level header at the sequence level (in the case of AVC, the sequence parameter set or the like). Each frame is encoded in units of access units. AUD denotes the Access Unit Delimiter NAL unit, which in AVC is the unique NAL unit used to identify the boundary of an access unit. When the common encoding/independent encoding identification signal 423 indicates "picture encoding by the common encoding process", the access unit contains the encoded data of 1 picture. The picture at this time is, as described above, data representing a video signal of 1 frame in which 3 color components are mixed. In this case, the encoded data of the i-th access unit is configured as the set of slice data Slice(i, j), where j is the index of the slice data within 1 picture.
On the other hand, when the common encoding/independent encoding identification signal 423 indicates "picture encoding by the independent encoding process", 1 picture is a 1-frame video signal of 1 color component. In this case, the encoded data of the p-th access unit is configured as a set of slice data Slice(q,p,r) of the q-th picture within the access unit, where r is the index of the slice data within 1 picture. When the video signal is composed of 3 color components such as RGB, q takes 3 values. Further, when, for example, additional data such as transparency information for alpha blending is encoded and decoded as the same access unit in addition to the video signal composed of the 3 primary colors, or when a video signal composed of 4 or more color components (for example, the YMCK used in color printing) is encoded and decoded, q is set to take 4 or more values. Since the encoding device and the decoding device of embodiment 14 encode each color component constituting the video signal completely independently when the independent encoding process is selected, the number of color components can in principle be changed freely without changing the encoding and decoding processing. Thus, even if the signal format for the color representation of video signals is changed in the future, there is the effect that the independent encoding process of embodiment 14 can accommodate it.
In order to realize such a configuration, in embodiment 14 the common encoding/independent encoding identification signal 423 is expressed in the form of "the number of pictures that are contained in 1 access unit and are each encoded independently, without mutual motion prediction reference". In this case, the common encoding/independent encoding identification signal 423 can be expressed by the number of values taken by the above parameter q, and this number is hereinafter referred to as num_pictures_in_au. That is, num_pictures_in_au = 1 indicates the "common encoding process", and num_pictures_in_au = 3 indicates the "independent encoding process" of embodiment 14. When there are 4 or more color components, a value of num_pictures_in_au > 3 may be set. By such signaling, if the decoding device decodes and refers to num_pictures_in_au, it can not only distinguish encoded data produced by the common encoding process from encoded data produced by the independent encoding process, but can also know how many single-color-component pictures are present in 1 access unit; therefore, the common encoding process and the independent encoding process can be handled seamlessly within the bit stream, while future extensions of the color representation of video signals can also be accommodated.
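Purely as an illustrative sketch (the function name and dictionary keys below are assumptions, not syntax defined by this embodiment), the decoder-side interpretation of num_pictures_in_au could look as follows:

```python
# Minimal sketch: deriving the decoding configuration from num_pictures_in_au.
# The returned keys are illustrative names, not normative syntax elements.

def interpret_num_pictures_in_au(num_pictures_in_au: int) -> dict:
    if num_pictures_in_au == 1:
        # One picture mixing all 3 color components -> common decoding process.
        return {"process": "common", "pictures_per_access_unit": 1}
    # One single-color-component picture per component -> independent decoding
    # process; values above 3 would cover alpha planes, YMCK, and the like.
    return {"process": "independent",
            "pictures_per_access_unit": num_pictures_in_au}

print(interpret_num_pictures_in_au(1))  # common encoding process
print(interpret_num_pictures_in_au(3))  # independent encoding process
```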
Fig. 70 is an explanatory diagram showing the bit stream structure of slice data in the common encoding process and in the independent encoding process. In a bit stream encoded by the independent encoding process, in order to achieve effects described later, a color component identification flag (color_channel_idc) is given to the header region at the head of the slice data so that the decoding device can identify to which color component picture in the access unit a received slice belongs. Slices having the same value of color_channel_idc are grouped together; that is, between slices with different color_channel_idc values, no dependency whatsoever in encoding or decoding (for example, motion prediction reference, CABAC context modeling, or occurrence probability learning) is permitted. By defining things in this way, the independence of each picture within an access unit in the independent encoding process is secured. In addition, frame_num (the encoding/decoding processing order of the picture to which the slice belongs) multiplexed into each slice header takes the same value for all the color component pictures within 1 access unit.
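The independence rule can be pictured as keeping one completely separate decoding state per value of color_channel_idc. The following sketch uses assumed class and field names purely to make the rule concrete:

```python
# Sketch of the rule that slices with different color_channel_idc values share
# no encoding/decoding state: no motion prediction references across channels,
# and no shared CABAC context models or occurrence probability estimates.

class ChannelState:
    def __init__(self):
        self.reference_pictures = []  # motion prediction refs of this channel only
        self.cabac_contexts = {}      # context models learned on this channel only

channel_states: dict[int, ChannelState] = {}

def state_for(color_channel_idc: int) -> ChannelState:
    # Slices with equal color_channel_idc map onto one shared state object;
    # slices with different values can never observe each other's state.
    return channel_states.setdefault(color_channel_idc, ChannelState())
```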
Fig. 71 is an explanatory diagram showing the schematic configuration of the encoding device of embodiment 14. In fig. 71, the common encoding process is executed by the first image encoding unit 503a, and the independent encoding process is executed by the second image encoding units 503b0, 503b1, and 503b2 (3 units, one prepared for each color component). The input video signal 1 is supplied by a switch (SW) 501 to one of the first image encoding unit 503a and the second image encoding units 503b0 to 503b2. The switch 501 is driven by the common encoding/independent encoding identification signal 423 and supplies the input video signal 1 to the designated path. In embodiment 14, when the input video signal is in the 4:4:4 format, the common encoding/independent encoding identification signal (num_pictures_in_au) 423 is multiplexed into the sequence parameter set, so that the common encoding process or the independent encoding process is selected in units of sequences. This is conceptually the same as the inter prediction mode sharing identification flag 123 described in embodiment 7 and the macroblock header sharing identification flag 123 described in embodiment 11. When the common encoding process is used, the common decoding process must be executed on the decoding device side, and when the independent encoding process is used, the independent decoding process must be executed on the decoding device side; therefore, the common encoding/independent encoding identification signal 423 must be multiplexed into the bit stream as information specifying which was used. For this purpose, the common encoding/independent encoding identification signal 423 is input to the multiplexing unit 504. The multiplexing unit of this signal may be any layer higher than the picture, such as a GOP (group of pictures) unit composed of several picture groups within the sequence.
In order to execute the common encoding process, the first image encoding unit 503a divides the input video signal 1 into macroblocks of the format of fig. 66, in which the samples of the 3 color components are assembled, and executes the encoding process in units of such macroblocks. The encoding process in the first image encoding unit 503a will be described later. When the independent encoding process is selected, the input video signal 1 is separated in the color component separating unit 502 into 1-frame data of C0, C1, and C2, which are supplied to the corresponding second image encoding units 503b0 to 503b2. In the second image encoding units 503b0 to 503b2, the 1-frame signal separated for each color component is divided into macroblocks of the format shown in fig. 67, and the encoding process is executed in units of such macroblocks. The encoding process in the second image encoding units will be described later.
A video signal of 1 picture composed of the 3 color components is input to the first image encoding unit 503a, and the encoded data is output as video stream 422a. Video signals of 1 picture each composed of a single color component are input to the second image encoding units 503b0 to 503b2, and the encoded data are output as video streams 422b0 to 422b2. These video streams are multiplexed and output by the multiplexing unit 504 in accordance with the state of the common encoding/independent encoding identification signal 423.
In the multiplexing of the video stream 422c, in the case of the independent encoding process, the multiplexing order and transmission order of the slice data in the bit stream may be interleaved between the pictures (color components) within an access unit (fig. 72). In this case, the decoding device must identify to which color component within the access unit each received slice belongs. For this purpose, the color component identification flag multiplexed into the header region at the head of the slice data, as shown in fig. 70, is used.
With such a configuration, when the encoding device encodes the pictures of the 3 color components by parallel processing using 3 independent second image encoding units 503b0 to 503b2, as in the encoding device of fig. 71, each unit can send out encoded data as soon as the slice data of its own picture is ready, without waiting for the completion of the encoded data of the pictures of the other color components. In AVC, 1 picture can be divided into a plurality of slices and encoded, and the slice data length and the number of macroblocks contained in a slice can be changed flexibly in accordance with the encoding conditions. Since the independence of the decoding process is ensured between slices that are adjacent in the picture space, neighboring contexts such as intra prediction and arithmetic coding cannot be used across slice boundaries; therefore, the longer the slice data length, the higher the encoding efficiency. On the other hand, when an error is mixed into the bit stream during transmission or recording, the shorter the slice data length, the earlier decoding recovers from the error, and quality degradation is more easily suppressed. If the length and structure of slices and the order of the color components were fixed without multiplexing the color component identification flag, the conditions for generating the bit stream would be fixed in the encoding device, and it would be impossible to respond flexibly to the various encoding requirements.
Further, if the bit stream can be configured as in fig. 72, the transmission buffer size required during transmission, that is, the processing delay on the encoding device side, can be reduced. Fig. 72 illustrates this. If multiplexing of slice data across pictures is not permitted, the encoding device must buffer the encoded data of the other pictures until the encoding of the picture of a particular color component is completed. This means that a delay at the picture level occurs. On the other hand, as shown at the bottom of fig. 72, if interleaving is permitted at the slice level, the image encoding unit of each color component can output its encoded data to the multiplexing unit in units of slice data, and the delay can be suppressed.
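The delay difference can be made concrete with a small sketch. Here, `encoded_slices` is an assumed iterator yielding (color_component, slice_bytes) pairs in the order in which the three parallel encoders finish them:

```python
# Sketch contrasting the two multiplexing policies discussed above.

def multiplex_slice_interleaved(encoded_slices, out):
    # Interleaving permitted: every slice is forwarded the moment it is ready,
    # so no color component waits for the others (bottom of fig. 72).
    for _color_component, slice_bytes in encoded_slices:
        out.write(slice_bytes)

def multiplex_picture_sequential(encoded_slices, out, num_components=3):
    # Interleaving not permitted: all slices must be buffered and reordered
    # per color component, costing picture-level buffering and delay.
    per_component = {c: [] for c in range(num_components)}
    for color_component, slice_bytes in encoded_slices:
        per_component[color_component].append(slice_bytes)
    for c in range(num_components):
        out.writelines(per_component[c])
```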
Further, within 1 color component picture, the slice data contained therein may be transmitted in the raster scan order of the macroblocks, or may be configured so that interleaved transmission is possible even within 1 picture.
The operations of the first and second image encoding units are described in detail below.
Outline of the operation of the first image encoding unit
Fig. 73 shows the internal configuration of the first image encoding unit 503a. In fig. 73, the input video signal 1 is input in the 4:4:4 format, in units of macroblocks of the format of fig. 66 in which the 3 color components are assembled.
First, the prediction unit 461 selects a reference image from among the motion compensation prediction reference image data stored in the memory 16a, and performs motion compensation prediction processing in units of macroblocks. The memory 16a stores a plurality of sets of reference image data, each composed of the 3 color components, over a plurality of points in time, and the prediction unit 461 selects the optimal reference image from these in units of macroblocks and performs motion prediction. The reference image data in the memory 16a may be stored plane-sequentially, separated per color component, or the samples of the color components may be stored point-sequentially. For the block size, as shown in figs. 32(a) to (d), any of the sizes 16 × 16, 16 × 8, 8 × 16, and 8 × 8 can be selected in units of macroblocks; further, when 8 × 8 is selected, any of the sizes 8 × 8, 8 × 4, 4 × 8, and 4 × 4 can be selected for each 8 × 8 block, as shown in figs. 32(e) to (h).
The prediction unit 461 executes the motion compensation prediction process for each macroblock over all or part of the block sizes/sub-block sizes of fig. 32, motion vectors within a predetermined search range, and 1 or more usable reference images, and obtains the prediction difference signal 4 for each block serving as a motion compensation prediction unit by means of the motion vector information, the reference image identification number 463 used for prediction, and the subtractor 3. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determining unit 5, and from among the prediction processes executed in the prediction unit 461, the macroblock type/sub-macroblock type 106, the motion vector information, and the reference image identification number 463 that give the optimal prediction efficiency for the macroblock to be predicted are output. All items of macroblock header information, such as the macroblock type, sub-macroblock type, reference image index, and motion vector, are determined as header information common to the 3 color components, used for encoding, and multiplexed into the bit stream. In evaluating the optimality of the prediction efficiency, with the aim of suppressing the amount of computation, the prediction error amount for only a predetermined color component (for example, the G component of RGB, or the Y component of YUV) may be evaluated; alternatively, the prediction error amounts for all the color components may be evaluated comprehensively, which increases the amount of computation but yields the optimal prediction performance. In the final selection of the macroblock type/sub-macroblock type 106, the weighting coefficient 20 for each type determined by the judgment of the encoding control unit 19 may further be used.
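The two evaluation policies just mentioned (a single predetermined component versus all components) can be sketched as follows, assuming NumPy arrays of shape (3, height, width) holding the C0 to C2 samples and using SAD as the error measure purely for illustration:

```python
# Sketch of the prediction-error evaluation policies described above.

import numpy as np

def prediction_cost(block, prediction, evaluate_all_components=False,
                    primary_component=0):
    diff = np.abs(block.astype(np.int32) - prediction.astype(np.int32))
    if evaluate_all_components:
        # Comprehensive evaluation: more computation, optimal prediction.
        return int(diff.sum())
    # Reduced evaluation: e.g. the G component of RGB or the Y component of YUV.
    return int(diff[primary_component].sum())
```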
Intra prediction is also performed in the prediction unit 461. When intra prediction is executed, intra prediction mode information is output to the output signal 463. Hereinafter, when intra prediction and motion compensation prediction are not particularly distinguished, the output signal 463, combining the intra prediction mode information, the motion vector information, and the reference image identification number, is referred to as prediction overhead information. For intra prediction as well, either the prediction error amount for only a predetermined color component may be evaluated, or the prediction error amounts for all the color components may be evaluated comprehensively. Finally, the encoding mode determining unit 5 evaluates whether to use intra prediction or inter prediction in terms of prediction efficiency or encoding efficiency, and selects the macroblock type.
The prediction difference signal 4, obtained from the selected macroblock type/sub-macroblock type 106 and from intra prediction or motion compensation prediction based on the prediction overhead information 463, is output to the transform unit 310. The transform unit 310 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as transform coefficients. At this time, the size of the block serving as the unit of the transform may be selected from 4 × 4 and 8 × 8. When the transform block size is selectable, the block size selected at the time of encoding is reflected in the value of the transform block size designation flag 464, and that flag is multiplexed into the bit stream. The quantization unit 9 quantizes the input transform coefficients on the basis of the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length encoding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 contain the information of the 3 color components and are entropy-coded in the variable length encoding unit 11 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse transform unit 312, and by adding this in the adder 18 to the predicted image 7, generated on the basis of the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463, the local decoded image 15 is generated. After undergoing the block distortion removal process in the deblocking filter 462, the local decoded image 15 is stored in the memory 16a to be used in subsequent motion compensation prediction processes. The deblocking filter control flag 24, indicating whether or not the deblocking filter is applied to the macroblock, is also input to the variable length encoding unit 11.
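As a sketch of the selectable transform block size and its flag, the following uses a floating-point DCT-II as a stand-in for the integer transform actually employed; the helper names are assumptions:

```python
# Sketch of the 4x4 / 8x8 transform selection and the flag multiplexed into
# the bit stream. dct2d is an illustrative orthonormal DCT-II, not the
# normative integer transform.

import numpy as np

def dct2d(block):
    n = block.shape[0]
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    c = np.sqrt(np.where(np.arange(n) == 0, 1.0 / n, 2.0 / n))[:, None]
    m = c * np.cos(np.pi * (2 * i + 1) * k / (2 * n))  # orthonormal DCT matrix
    return m @ block @ m.T

def transform_residual(residual16, transform_block_size):
    """Transform a 16x16 residual in 4x4 or 8x8 blocks; return the flag value
    that is multiplexed into the bit stream together with the coefficients."""
    assert transform_block_size in (4, 8)
    transform_block_size_flag = 1 if transform_block_size == 8 else 0
    s = transform_block_size
    coefficients = [dct2d(residual16[y:y + s, x:x + s])
                    for y in range(0, 16, s) for x in range(0, 16, s)]
    return transform_block_size_flag, coefficients
```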
The quantized transform coefficients 10, macroblock type/sub-macroblock type 106, prediction overhead information 463, and quantization parameter 21 input to the variable length encoding unit 11 are arranged and shaped into a bit stream in accordance with a predetermined rule (syntax), and sent to the transmission buffer 17 as NAL-unitized encoded data in units of slice data obtained by bundling 1 or more macroblocks of the format of fig. 66. The transmission buffer 17 smooths the bit stream in accordance with the band of the transmission path to which the encoding device is connected and the reading speed of the recording medium, and outputs it as video stream 422a. In addition, it outputs feedback information to the encoding control unit 19 in accordance with the accumulation state of the bit stream in the transmission buffer 17, and controls the amount of code generated in the encoding of subsequent video frames.
Since the output of the first image encoding unit 503a consists of slices in which the 3 components are assembled and corresponds directly to the code amount of an assembled access unit, the transmission buffer 17 may also be placed as it is within the multiplexing unit 504.
In the first image encoding unit 503a of embodiment 14, it can be identified from the common encoding/independent encoding identification signal 423 that all the slice data in the sequence are mixed C0/C1/C2 slices (that is, slices in which the information of the 3 color components is mixed); accordingly, the color component identification flag is not multiplexed into the slice header.
Outline of the operation of the second image encoding units
Fig. 74 shows the internal configuration of the second image encoding unit 503b0 (503b1, 503b2). In fig. 74, the input video signal 1 is input in units of macroblocks of the format of fig. 67, composed of the samples of a single color component.
First, the prediction unit 461 selects a reference image from among the motion compensation prediction reference image data stored in the memory 16b, and performs motion compensation prediction processing in units of macroblocks. The memory 16b can store a plurality of items of reference image data, each composed of a single color component, over a plurality of points in time, and the prediction unit 461 selects the optimal reference image from these in units of macroblocks and performs motion prediction. The memory 16b may also be shared with the memory 16a as a unit assembling the 3 color components. As shown in figs. 32(a) to (d), any of the block sizes 16 × 16, 16 × 8, 8 × 16, and 8 × 8 can be selected in units of macroblocks; further, when 8 × 8 is selected, any of the sizes 8 × 8, 8 × 4, 4 × 8, and 4 × 4 can be selected for each 8 × 8 block, as shown in figs. 32(e) to (h).
The prediction unit 461 executes the motion compensation prediction process for each macroblock over all or part of the block sizes/sub-block sizes of fig. 32, motion vectors within a predetermined search range, and 1 or more usable reference images, and obtains the prediction difference signal 4 for each block serving as a motion compensation prediction unit by means of the motion vector information, the reference image identification number 463 used for prediction, and the subtractor 3. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determining unit 5, and from among the prediction processes executed in the prediction unit 461, the macroblock type/sub-macroblock type 106, the motion vector information, and the reference image identification number 463 that give the optimal prediction efficiency for the macroblock to be predicted are output. All items of macroblock header information, such as the macroblock type, sub-macroblock type, reference image index, and motion vector, are determined as header information for the signal of the single color component of the input video signal 1, used for encoding, and multiplexed into the bit stream. In evaluating the optimality of the prediction efficiency, only the prediction error amount for the single color component to be encoded is evaluated. In the final selection of the macroblock type/sub-macroblock type 106, the weighting coefficient 20 for each type determined by the judgment of the encoding control unit 19 may further be used.
Intra prediction is also performed in the prediction unit 461. When intra prediction is executed, intra prediction mode information is output to the output signal 463. Hereinafter, when intra prediction and motion compensation prediction are not particularly distinguished, the output signal 463, combining the intra prediction mode information, the motion vector information, and the reference image identification number, is referred to as prediction overhead information. For intra prediction as well, only the prediction error amount for the single color component to be encoded is evaluated. Finally, whether to use intra prediction or inter prediction is evaluated in terms of prediction efficiency or encoding efficiency, and the macroblock type is selected.
The prediction difference signal 4, obtained from the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463, is output to the transform unit 310. The transform unit 310 transforms the input prediction difference signal 4 of the single color component and outputs it to the quantization unit 9 as transform coefficients. At this time, the size of the block serving as the unit of the transform may be selected from 4 × 4 and 8 × 8. When selectable, the block size selected at the time of encoding is reflected in the value of the transform block size designation flag 464, and that flag is multiplexed into the bit stream. The quantization unit 9 quantizes the input transform coefficients on the basis of the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length encoding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 contain the information of the single color component and are entropy-coded in the variable length encoding unit 11 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse transform unit 312, and by adding this in the adder 18 to the predicted image 7, generated on the basis of the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463, the local decoded image 15 is generated. After undergoing the block distortion removal process in the deblocking filter 462, the local decoded image 15 is stored in the memory 16b to be used in subsequent motion compensation prediction processes. The deblocking filter control flag 24, indicating whether or not the deblocking filter is applied to the macroblock, is also input to the variable length encoding unit 11.
The quantized transform coefficients 10, macroblock type/sub-macroblock type 106, prediction overhead information 463, and quantization parameter 21 input to the variable length encoding unit 11 are arranged and shaped into a bit stream in accordance with a predetermined rule (syntax), and sent to the transmission buffer 17 as NAL-unitized encoded data in units of slice data obtained by bundling 1 or more macroblocks of the format of fig. 67. The transmission buffer 17 smooths the bit stream in accordance with the band of the transmission path to which the encoding device is connected and the reading speed of the recording medium, and outputs it as video stream 422b0 (422b1, 422b2). In addition, it outputs feedback information to the encoding control unit 19 in accordance with the accumulation state of the bit stream in the transmission buffer 17, and controls the amount of code generated in the encoding of subsequent video frames.
The outputs of the second image encoding units 503b0 to 503b2 are slices composed of the data of a single color component. When code amount control in units of aggregated access units is required, a common transmission buffer in units of multiplexed slices of all the color components may be provided within the multiplexing unit 504, and feedback may be given to the encoding control unit 19 of each color component on the basis of the occupancy of that buffer. At this time, the encoding control may be performed using only the total amount of generated information of all the color components, or may additionally use the state of the transmission buffer 17 of each color component. When the encoding control is performed using only the total amount of generated information of all the color components, the function corresponding to the transmission buffers 17 may be realized by the common transmission buffer within the multiplexing unit 504, and the transmission buffers 17 may be omitted.
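As a sketch of the feedback loop just described (the function and parameter names are assumptions; this embodiment only states that control may be driven by buffer occupancy), the quantization parameter of each color component encoder could be adjusted as follows:

```python
# Sketch: adjust the quantization parameter from the occupancy of the common
# transmission buffer so that the aggregate code amount per access unit stays
# controlled across the three color-component encoders.

def update_quantization_parameter(qp, buffer_bits, buffer_capacity,
                                  target_fullness=0.5, step=1,
                                  qp_min=0, qp_max=51):
    fullness = buffer_bits / buffer_capacity
    if fullness > target_fullness:
        return min(qp + step, qp_max)   # coarser quantization, fewer bits
    if fullness < target_fullness:
        return max(qp - step, qp_min)   # spend spare capacity on quality
    return qp
```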
In the second image encoding units 503b0 to 503b2 of embodiment 14, it can be identified from the common encoding/independent encoding identification signal 423 that all the slice data in the sequence are single-color-component slices (that is, C0 slices, C1 slices, or C2 slices); accordingly, the color component identification flag is always multiplexed into the slice header so that the decoding device can identify to which picture in the access unit a slice corresponds. For this reason, each of the second image encoding units 503b0 to 503b2 can send out its output from the transmission buffer 17 at the point where data for 1 slice has accumulated, without accumulating a full picture.
The common encoding/independent encoding identification signal (num_pictures_in_au) expresses simultaneously the information for distinguishing encoded data produced by the common encoding process from encoded data produced by the independent encoding process (common encoding identification information) and the information indicating how many single-color-component pictures are present within one access unit (the number of color components); however, these 2 pieces of information may also be encoded as independent items of information.
The first image encoding unit 503a and the second image encoding units 503b0 to 503b2 differ only in whether the macroblock header information is treated as information common to the 3 components or as information of a single color component, and in the bit stream structure of the slice data. Most of the basic processing blocks of figs. 73 and 74, such as the prediction units, the transform and inverse transform units, the quantization and inverse quantization units, and the deblocking filters, differ only in whether they process the information of the 3 color components together or handle the information of a single color component alone, and can be realized as functional blocks common to the first image encoding unit 503a and the second image encoding units 503b0 to 503b2. Therefore, not only the completely independent encoding processing units of fig. 71, but various encoding device configurations can be realized by appropriately combining the basic components of figs. 73 and 74. Further, if the memory 16a of the first image encoding unit 503a is arranged plane-sequentially, the configuration of the reference image memory can be made common to the first image encoding unit 503a and the second image encoding units 503b0 to 503b2.
Although not shown in the figures, the encoding device of embodiment 14 assumes the existence of a virtual stream buffer (coded picture buffer) that buffers the video stream 422c matching the arrays of figs. 69 and 70, and a virtual frame memory (decoded picture buffer) that buffers the decoded images 427a and 427b, and generates the video stream 422c so that no overflow or underflow of the coded picture buffer and no failure of the decoded picture buffer occur. This control is performed mainly by the encoding control unit 19. This ensures that, when the decoding device decodes the video stream 422c in accordance with the operations of the coded picture buffer and the decoded picture buffer (the virtual buffer model), no failure occurs in the decoding device. The virtual buffer model is defined below.
The operations of the coded picture buffer are performed in units of access units. As described above, when the common decoding process is performed, 1 access unit contains the encoded data of 1 picture, and when the independent decoding process is performed, 1 access unit contains the encoded data of as many pictures as there are color components (3 pictures in the case of 3 components). The operations defined for the coded picture buffer are the times at which the first bit and the last bit of an access unit are input to the coded picture buffer and the time at which the bits of the access unit are read out from the coded picture buffer. It is stipulated that the readout from the coded picture buffer is performed instantaneously, with all the bits of the access unit read out from the coded picture buffer at the same moment. When the bits of an access unit are read out from the coded picture buffer, they are input to the upper header analyzing unit and, as described above, decoded in the first image decoding unit or the second image decoding units and output as a color video frame bundled in units of access units. In the definition of the virtual buffer model, the process from reading the bits out of the coded picture buffer to outputting the color video frame of the access unit is assumed to be performed instantaneously. The color video frame constituted in units of access units is input to the decoded picture buffer, and its output time from the decoded picture buffer is computed. The output time from the decoded picture buffer is the value obtained by adding a predetermined delay time to the readout time from the coded picture buffer. This delay time can be multiplexed into the bit stream to control the decoding device. When the delay time is 0, that is, when the output time from the decoded picture buffer equals the readout time from the coded picture buffer, the color video frame is output from the decoded picture buffer at the same moment it is input to the decoded picture buffer. Otherwise, that is, when the output time from the decoded picture buffer is later than the readout time from the coded picture buffer, the color video frame is held in the decoded picture buffer until its output time arrives. As described above, the operation of the decoded picture buffer is defined in units of access units.
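The buffer rules above can be checked with a small simulation. The sketch below is a simplified model (names assumed) with constant-rate delivery into the coded picture buffer and the instantaneous, access-unit-wise removal defined by the model:

```python
# Sketch of a coded-picture-buffer check under the virtual buffer model:
# bits arrive at a constant rate; all bits of an access unit leave the buffer
# instantaneously at that access unit's readout time.

def coded_picture_buffer_ok(access_units, bitrate, buffer_size):
    """access_units: (readout_time_seconds, bits) pairs sorted by time."""
    occupancy, t = 0.0, 0.0
    for readout_time, bits in access_units:
        occupancy += (readout_time - t) * bitrate
        t = readout_time
        if occupancy > buffer_size:   # overflow of the coded picture buffer
            return False
        if occupancy < bits:          # underflow: readout before bits arrive
            return False
        occupancy -= bits             # instantaneous readout, per the model
    return True

# Decoded-picture-buffer side: output time = readout time + signaled delay;
# a delay of 0 means the frame is output the instant it enters the buffer.
def dpb_output_time(readout_time, delay):
    return readout_time + delay

print(coded_picture_buffer_ok([(0.04, 300_000), (0.08, 200_000)],
                              bitrate=10_000_000, buffer_size=1_000_000))
```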
Fig. 75 is an explanatory diagram showing the schematic configuration of the decoding device of embodiment 14. In fig. 75, the common decoding process is executed by the first image decoding unit 603a, and the independent decoding process is executed by the color component determining unit 602 and the second image decoding units 603b0, 603b1, and 603b2 (3 units, one prepared for each color component).
The upper header analyzing unit 610 divides the video stream 422c into NAL units, decodes upper-level header information such as the sequence parameter set and the picture parameter set as it is, and stores it in a predetermined memory area that the first image decoding unit 603a, the color component determining unit 602, and the second image decoding units 603b0 to 603b2 in the decoding device can refer to. The common encoding/independent encoding identification signal 423 (num_pictures_in_au) multiplexed in units of sequences is decoded and held as part of the upper-level header information.
The decoded num_pictures_in_au is supplied to the switch (SW) 601; if num_pictures_in_au = 1, the switch 601 supplies the slice NAL units of each picture to the first image decoding unit 603a, and if num_pictures_in_au = 3, it supplies them to the color component determining unit 602. That is, if num_pictures_in_au = 1, the common decoding process is performed by the first image decoding unit 603a, and if num_pictures_in_au = 3, the independent decoding process is performed by the 3 second image decoding units 603b0 to 603b2. The detailed operation of the first and second image decoding units will be described later.
The color component determining unit 602 identifies, from the value of the color component identification flag shown in fig. 70, to which color component picture in the current access unit a slice NAL unit corresponds, and distributes and supplies it to the appropriate one of the second image decoding units 603b0 to 603b2. With such a configuration of the decoding device, the following effect is obtained: even when a bit stream in which slices are interleaved within an access unit, as in fig. 72, is received, it can easily be determined which slice belongs to which color component picture, and the stream can be decoded correctly.
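A sketch of this dispatch step (the slice object and decoder interface are assumed names, introduced only for illustration):

```python
# Sketch of the color component determining unit: route each slice NAL unit,
# by its color_channel_idc, to the decoder of that color component, so that
# even a slice-interleaved access unit (fig. 72) decodes correctly.

def dispatch_slice(slice_nal, second_image_decoders):
    decoder = second_image_decoders[slice_nal.color_channel_idc]  # 0:C0 1:C1 2:C2
    decoder.decode_slice(slice_nal)
```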
Outline of the operation of the first image decoding unit
Fig. 76 shows the internal configuration of the first image decoding unit 603a. The first image decoding unit 603a receives the video stream 422c, output by the encoding device of fig. 71 and matching the arrays of figs. 69 and 70, after it has been divided into NAL units by the upper header analyzing unit 610, in units of mixed C0/C1/C2 slices, performs the decoding process in units of the macroblocks of fig. 66 composed of the samples of the 3 color components, and restores the output video frame.
The variable length decoding unit 25 receives the video stream 422c divided into NAL units, interprets it in accordance with a predetermined rule (syntax), and extracts the quantized transform coefficients 10 of the 3 components and the macroblock header information used in common for the 3 components (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, and quantization parameter 21). The quantized transform coefficients 10, together with the quantization parameter 21, are input to the inverse quantization unit 12, which performs the same processing as in the first image encoding unit 503a, and the inverse quantization process is performed. That output is then input to the inverse transform unit 312, which performs the same processing as in the first image encoding unit 503a, and restored to the local decoded prediction difference signal 14 (if the transform block size designation flag 464 is present in the video stream 422c, it is referred to during the inverse quantization and inverse transform processes). The prediction unit 461, on the other hand, includes, of the processing of the prediction unit 461 in the first image encoding unit 503a, only the process of generating the predicted image 7 by referring to the prediction overhead information 463; the macroblock type/sub-macroblock type 106 and the prediction overhead information 463 are input to the prediction unit 461, and the predicted image 7 of the 3 components is obtained. When the macroblock type indicates intra prediction, the predicted image 7 of the 3 components is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information; when it indicates inter prediction, the predicted image 7 of the 3 components is obtained from the prediction overhead information 463 in accordance with the motion vector and the reference image index. The local decoded prediction difference signal 14 and the predicted image 7 are added in the adder 18 to obtain the tentative decoded image (local decoded image) 15 of the 3 components. Since the tentative decoded image 15 is used for the motion compensation prediction of subsequent macroblocks, after the block distortion removal process is applied to the tentative decoded image samples of the 3 components in the deblocking filter 462, which performs the same processing as in the first image encoding unit 503a, it is output as decoded image 427a and stored in the memory 16a. At this time, the deblocking filter process is applied to the tentative decoded image 15 on the basis of the instruction of the deblocking filter control flag 24 interpreted by the variable length decoding unit 25. The memory 16a stores a plurality of sets of reference image data, each composed of the 3 color components, over a plurality of points in time; the prediction unit 461 selects from these, in units of macroblocks, the reference image indicated by the reference image index extracted from the bit stream, and generates the predicted image. The reference image data in the memory 16a may be stored plane-sequentially, separated per color component, or the samples of the color components may be stored point-sequentially. The decoded image 427a contains the 3 color components and becomes, as it is, a color video frame constituting the access unit 427a0 in the common decoding process.
Outline of the operation of the second image decoding units
Fig. 77 shows the internal configuration of the second image decoding units 603b0 to 603b2. Each of the second image decoding units 603b0 to 603b2 receives the video stream 422c, output by the encoding device of fig. 71 and matching the arrays of figs. 69 and 70, after it has been divided into NAL units by the upper header analyzing unit 610 and sorted into C0, C1, or C2 slices by the color component determining unit 602, performs the decoding process in units of the macroblocks of fig. 67 composed of the samples of a single color component, and restores the output video frame.
The variable length decoding unit 25 receives the video stream 422c, interprets it in accordance with a predetermined rule (syntax), and extracts the quantized transform coefficients 10 of the single color component and the macroblock header information applied to the single color component (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, and quantization parameter 21). The quantized transform coefficients 10, together with the quantization parameter 21, are input to the inverse quantization unit 12, which performs the same processing as in the second image encoding units 503b0 to 503b2, and the inverse quantization process is performed. That output is then input to the inverse transform unit 312, which performs the same processing as in the second image encoding unit 503b0 (503b1, 503b2), and restored to the local decoded prediction difference signal 14 (if the transform block size designation flag 464 is present in the video stream 422c, it is referred to during the inverse quantization and inverse transform processes). The prediction unit 461, on the other hand, includes, of the processing of the prediction unit 461 in the second image encoding unit 503b0 (503b1, 503b2), only the process of generating the predicted image 7 by referring to the prediction overhead information 463; the macroblock type/sub-macroblock type 106 and the prediction overhead information 463 are input to the prediction unit 461, and the predicted image 7 of the single color component is obtained. When the macroblock type indicates intra prediction, the predicted image 7 of the single color component is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information; when it indicates inter prediction, the predicted image 7 of the single color component is obtained from the prediction overhead information 463 in accordance with the motion vector and the reference image index. The local decoded prediction difference signal 14 and the predicted image 7 are added in the adder 18 to obtain the tentative decoded image 15 of the single-color-component macroblock. Since the tentative decoded image 15 is used for the motion compensation prediction of subsequent macroblocks, after the block distortion removal process is applied to the tentative decoded image samples of the single color component in the deblocking filter 26, which performs the same processing as in the second image encoding unit 503b0 (503b1, 503b2), it is output as decoded image 427b and stored in the memory 16b. At this time, the deblocking filter process is applied to the tentative decoded image 15 on the basis of the instruction of the deblocking filter control flag 24 interpreted by the variable length decoding unit 25. The decoded image 427b contains only the samples of a single color component, and a color video frame is constituted by bundling, in units of the access unit 427b0, the decoded images 427b output by the respective second image decoding units 603b0 to 603b2 in fig. 75.
As is clear from the above, the first image decoding unit 603a and the second image decoding units 603b0 to 603b2 differ only in whether the macroblock header information is treated as information common to the 3 components or as information of a single color component, and in the bit stream structure of the slice data. Most of the basic decoding processing blocks of figs. 76 and 77, such as motion compensation prediction, inverse transform, and inverse quantization, can be realized as functional blocks common to the first image decoding unit 603a and the second image decoding units 603b0 to 603b2. Therefore, not only the completely independent decoding processing units of fig. 75, but various decoding device configurations can be realized by appropriately combining the basic components of figs. 76 and 77. Further, if the memory 16a of the first image decoding unit 603a is arranged plane-sequentially, the configurations of the memory 16a and the memory 16b can be made common to the first image decoding unit 603a and the second image decoding units 603b0 to 603b2.
As another form of the encoding device of fig. 71, an encoding device may be configured that always fixes the common encoding/independent encoding identification signal 423 to "independent encoding process" and encodes all the frames independently, without using the first image encoding unit 503a at all. Likewise, as another form of the decoding device of fig. 75, in a usage form premised on the common encoding/independent encoding identification signal 423 being always fixed to "independent encoding process", a decoding device may be configured that omits the switch 601 and the first image decoding unit 603a and performs only the independent decoding process.
The common encoding/independent encoding identification signal (num_pictures_in_au) includes the information for distinguishing encoded data produced by the common encoding process from encoded data produced by the independent encoding process (common encoding identification information) and the information indicating how many single-color-component pictures are present within one access unit (the number of color components).
Further, the first image decoding unit 603a may be provided with a function for decoding bit streams of conventional AVC profiles that encode the YUV 4:2:0 format with the 3 components bundled. In that case, the upper header analyzing unit 610 determines, by referring to the profile identifier decoded from the video stream 422c, in which format the bit stream is encoded, and transmits the determination result to the switch 601 and the first image decoding unit 603a as part of the information on the signal line of the common encoding/independent encoding identification signal 423; a decoding device that ensures compatibility with bit streams of the conventional YUV 4:2:0 format can thereby be configured.
In the first image encoding unit 503a of embodiment 14, the information of the 3 color components is mixed within the slice data and exactly the same intra/inter prediction processing is applied to the 3 color components; consequently, correlation between the color component signals may remain in the prediction difference signal space. As a process for removing this, a color space conversion process such as that described in embodiment 13 may, for example, be applied to the prediction difference signal. Figs. 78 and 79 show examples of the first image encoding unit 503a having such a configuration. Fig. 78 is an example in which the color space conversion process is performed at the pixel level before the transform process: the color space conversion unit 465 is placed before the transform unit 310, and the inverse color space conversion unit 466 is placed after the inverse transform unit 312. Fig. 79 is an example in which the color space conversion process is performed on the coefficient data obtained after the transform process, while appropriately selecting the frequency components to be processed: the color space conversion unit 465 is placed after the transform unit 310, and the inverse color space conversion unit 466 is placed before the inverse transform unit 312. Limiting the frequency components subjected to the color space conversion has the effect of suppressing the propagation of high-frequency noise components contained in a specific color component into other color components containing little noise. When the frequency components subject to the color space conversion process are adaptively selectable, signaling information 467 for the decoding side to judge the selection made at encoding is multiplexed into the bit stream.
In the color space conversion process, a plurality of conversion methods may be switched in units of macroblocks in accordance with the properties of the image signal to be encoded, or whether or not to convert may be determined in units of macroblocks. The types of selectable conversion methods may also be designated at the sequence level or the like, with the method to be used designated in units of pictures, slices, macroblocks, or the like. It may also be made selectable whether the conversion is performed before or after the orthogonal transform. When these adaptive encoding processes are performed, the encoding mode determining unit 5 can evaluate the encoding efficiency of all the selectable options and select the one with the highest encoding efficiency. When these adaptive encoding processes are carried out, the signaling information 467 for the decoding side to judge the selection made at encoding is multiplexed into the bit stream. Such signaling may be designated at a level different from the macroblock, such as the slice, picture, GOP, or sequence.
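As one concrete possibility for such a residual color space conversion, the reversible YCoCg-R lifting transform is sketched below; it is chosen here only as an illustration of a selectable conversion method, not as the conversion mandated by this embodiment:

```python
# Sketch: a reversible residual color space conversion (YCoCg-R lifting),
# applied per residual sample before the transform (the fig. 78 placement).

def rgb_to_ycocg_r(r, g, b):
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Exact round trip on a sample residual triple:
assert ycocg_r_to_rgb(*rgb_to_ycocg_r(-7, 12, 3)) == (-7, 12, 3)
```

Because the lifting steps use integer shifts, the conversion is exactly invertible, which is a desirable property when it is inserted into the prediction difference path.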
Figs. 80 and 81 show decoding devices corresponding to the encoding devices of figs. 78 and 79. Fig. 80 is a decoding device that decodes a bit stream encoded by the encoding device of fig. 78, with the color space conversion performed before the transform process. The variable length decoding unit 25 decodes from the bit stream the signaling information 467, consisting of information selecting whether or not conversion is performed in the inverse color space conversion unit 466 and information selecting the conversion method executable in the inverse color space conversion unit 466, and supplies it to the inverse color space conversion unit 466. The decoding device of fig. 80 performs, in the inverse color space conversion unit 466, the inverse color space conversion process on the prediction difference signal after the inverse transform, on the basis of this information. Fig. 81 is a decoding device that decodes a bit stream encoded by the encoding device of fig. 79, with the color space conversion performed after the transform process on frequency components selected as the processing target. The variable length decoding unit decodes from the bit stream the signaling information 467, which is identification information consisting of information selecting whether or not conversion is performed in the inverse color space conversion unit 466, information selecting the conversion method executed in the inverse color space conversion unit, information specifying the frequency components in which the color space conversion is performed, and the like, and supplies it to the inverse color space conversion unit 466. The decoding device of fig. 81 performs, in the inverse color space conversion unit 466, the inverse color space conversion process on the transform coefficient data after inverse quantization, on the basis of this information.
In the decoding devices of figs. 80 and 81 as well, the first image decoding unit 603a may be provided with a function for decoding bit streams of conventional AVC profiles that encode the YUV 4:2:0 format with the 3 components bundled; the upper header analyzing unit 610 determines, by referring to the profile identifier decoded from the video stream 422c, in which format the bit stream is encoded, and transmits the determination result to the switch 601 and the first image decoding unit 603a as part of the information on the signal line of the common encoding/independent encoding identification signal 423, whereby a decoding device that ensures compatibility with bit streams of the conventional YUV 4:2:0 format can be configured.
Fig. 82 shows the structure of the encoded data of the macroblock header information contained in a bit stream of the conventional YUV 4:2:0 format. It differs from the Cn component header information shown in fig. 50 only in that, when the macroblock type is intra prediction, the encoded data of an intra color difference prediction mode 144 is included. When the macroblock type is inter prediction, the structure of the encoded data of the macroblock header information is the same as the Cn component header information shown in fig. 50, but the motion vector of the color difference components is generated, using the reference image identification number and the motion vector information contained in the macroblock header, by a method different from that for the luminance component.
The operation of a decoding device that ensures compatibility with bit streams of the conventional YUV 4:2:0 format will now be described. As stated above, the first image decoding unit 603a is assumed to have a function for decoding bit streams of the conventional YUV 4:2:0 format. The internal configuration of the first image decoding unit is the same as in fig. 76.
The operation of the variable length decoding unit 25 of a first image decoding unit having the function of decoding bit streams of the conventional YUV 4:2:0 format will be described. When the video stream 422c is input to the variable length decoding unit, it decodes the color difference format indication flag. The color difference format indication flag is contained in the sequence parameter header of the video stream 422c and indicates whether the input video format is 4:4:4, 4:2:2, 4:2:0, or 4:0:0. The decoding process for the macroblock header information of the video stream 422c is switched according to the value of the color difference format indication flag. When the macroblock type indicates intra prediction and the color difference format indication flag indicates 4:2:0 or 4:2:2, the intra color difference prediction mode 144 is decoded from the bit stream. When the color difference format indication flag indicates 4:4:4, decoding of the intra color difference prediction mode 144 is skipped. When the color difference format indication flag indicates 4:0:0, the input video signal is of a format composed of the luminance signal alone (the 4:0:0 format), so decoding of the intra color difference prediction mode 144 is likewise skipped. The decoding process for the macroblock header information other than the intra color difference prediction mode 144 is the same as in the variable length decoding unit of a first image decoding unit 603a that does not have the function of decoding bit streams of the conventional YUV 4:2:0 format. By the above, when the video stream 422c is input to the variable length decoding unit 25, the color difference format indication flag (not shown), the quantized transform coefficients 10 of the 3 components, and the macroblock header information (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, quantization parameter 21) are extracted. The color difference format indication flag (not shown) and the prediction overhead information 463 are input to the prediction unit 461 in order to obtain the predicted image 7 of the 3 components.
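The header-parsing switch can be sketched as below; `reader` and its methods are assumed helpers, and the chroma_format_idc values follow AVC (0 = 4:0:0, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4):

```python
# Sketch of switching the intra macroblock header parse on the color
# difference format indication flag.

def parse_intra_macroblock_header(reader, chroma_format_idc):
    header = {"intra_pred_mode": reader.read_intra_pred_mode()}
    if chroma_format_idc in (1, 2):              # 4:2:0 or 4:2:2
        header["intra_chroma_pred_mode"] = reader.read_ue()
    # 4:4:4 (3) and 4:0:0 (0): decoding of the intra color difference
    # prediction mode is skipped.
    return header
```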
Fig. 83 shows the internal configuration of the prediction unit 461 of the first image decoding unit that ensures compatibility with bit streams of the conventional YUV 4:2:0 format; its operation is described below.
The switching unit 4611a judges the macroblock type and, when the macroblock type indicates intra prediction, the switching unit 4611b judges the value of the color difference format indication flag. When the value of the color difference format indication flag indicates 4:2:0 or 4:2:2, the predicted image 7 of the 3 components is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information and the intra color difference prediction mode information. Of the 3 components, the predicted image of the luminance signal is generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information. The predicted images of the 2 color difference signal components are generated in the color difference signal intra prediction unit 4613, which performs processing different from that for the luminance component, in accordance with the intra color difference prediction mode information. When the value of the color difference format indication flag indicates 4:4:4, the predicted images of all 3 components are generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information. When the value of the color difference format indication flag indicates 4:0:0, the 4:0:0 format is composed of the luminance signal alone (1 component), so only the predicted image of the luminance signal is generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information.
When the switching unit 4611a finds that the macroblock type indicates inter prediction, the switching unit 4611c judges the value of the color difference format indication flag. When the value of the color difference format indication flag indicates 4:2:0 or 4:2:2, the predicted image of the luminance signal is generated in the luminance signal inter prediction unit 4614 from the motion vector and the reference image index, on the basis of the prediction overhead information 463, in accordance with the predicted image generation method for luminance signals prescribed by the AVC standard. For the predicted images of the 2 color difference signal components, in the color difference signal inter prediction unit 4615 the motion vector obtained from the prediction overhead information 463 is scaled in accordance with the color difference format to generate a color difference motion vector, and the predicted images are generated from the reference image indicated by the reference image index obtained from the prediction overhead information 463, on the basis of that color difference motion vector, in accordance with the method prescribed by the AVC standard. When the value of the color difference format indication flag indicates 4:0:0, the 4:0:0 format is composed of the luminance signal alone (1 component), so only the predicted image of the luminance signal is generated in the luminance signal inter prediction unit 4614 in accordance with the motion vector and the reference image index.
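The scaling step can be illustrated as follows. The function converts one luminance motion vector component (quarter-sample luma units) into a displacement in color difference samples; it is a geometric sketch rather than the full AVC derivation:

```python
# Sketch: scaling a luminance motion vector to the color difference sampling
# grid according to the color difference format.

def chroma_displacement(mv_quarter_pel, chroma_format_idc, horizontal):
    luma_samples = mv_quarter_pel / 4.0          # displacement in luma samples
    if chroma_format_idc == 1:                   # 4:2:0: halved on both axes
        return luma_samples / 2.0
    if chroma_format_idc == 2 and horizontal:    # 4:2:2: halved horizontally
        return luma_samples / 2.0
    return luma_samples                          # 4:4:4, or vertical in 4:2:2

print(chroma_displacement(6, 1, horizontal=True))   # 0.75 chroma samples
```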
As described above, a means for generating a predicted image of the color difference signals of the conventional YUV 4:2:0 format is provided, and the means used to generate the predicted images of the 3 components is switched in accordance with the value of the color difference format indication flag decoded from the bit stream; a decoding device that ensures compatibility with bit streams of the conventional YUV 4:2:0 format can thereby be configured.
Further, if information indicating that the bit stream is one that can be decoded even by a decoding device that does not support the color space conversion process, such as the decoding device of fig. 75, is given to the video stream 422c supplied to the decoding devices of figs. 80 and 81, in units of the sequence parameter set or the like, then any of the decoding devices of figs. 80, 81, and 75 can decode the bit stream in accordance with its own decoding capability, and the effect of easily securing the compatibility of the bit stream is obtained.
Embodiment 15
Embodiment 15 describes another embodiment of the encoding apparatus and the decoding apparatus of embodiment 14, such as those shown in fig. 71 and 75, that differs only in the configuration of the bit stream to be input and output. The encoding apparatus of embodiment 15 multiplexes the encoded data with the bit stream configuration shown in fig. 84.
In the configuration of fig. 69, the AUD NAL unit includes information such as primary_pic_type as one of its elements. Fig. 85 shows the picture coding type information that applies when the picture data in the access unit that begins with the AUD NAL unit is coded.
For example, when primary_pic_type is 0, it indicates that the picture is entirely intra-coded. When primary_pic_type is 1, it indicates that intra-coded slices and slices that can be motion-compensation predicted using only one reference picture list may be mixed in the picture. Since primary_pic_type is information that defines the encoding modes with which one picture can be encoded, the encoding apparatus can, by operating on this information, perform encoding suited to various conditions such as the nature of the input video signal and random access functionality. In embodiment 14 described above, there is only one primary_pic_type per access unit, so when independent encoding processing is performed, primary_pic_type is common to the 3 color component pictures in the access unit. In embodiment 15, when each color component picture is independently encoded, primary_pic_type for the remaining color component pictures is additionally inserted into the AUD NAL unit of fig. 69 in accordance with the value of num_pictures_in_au, or, as in the bit stream configuration of fig. 84, the encoded data of each color component picture starts from a NAL unit (Color Channel Delimiter, CCD) indicating the start of a color component picture, and this CCD NAL unit is configured to include primary_pic_type of the corresponding picture. In this configuration, since the encoded data of each color component picture is multiplexed together one picture at a time, the color component identification flag (color_channel_idc) described in embodiment 14 is included not in the slice header but in the CCD NAL unit. Aggregating the information of the color component identification flag, which would otherwise need to be multiplexed into each slice, into data in picture units has the effect of reducing overhead information. Further, since it suffices to detect the CCD NAL unit, which is configured as a bit sequence, and verify color_channel_idc only once per color component picture, the head of each color component picture can be found quickly without performing variable-length decoding processing; therefore, on the decoding apparatus side, color_channel_idc in the slice header need not be verified slice by slice in order to separate the NAL units to be decoded for each color component, and data can be supplied smoothly to the second picture decoding unit.
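The fixed-position property that makes color_channel_idc cheap to find can be shown with a small sketch. The NAL unit type code for the Color Channel Delimiter and the bit packing of its payload below are assumptions made for illustration only; neither value is specified in the text above.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define NAL_TYPE_CCD 0x18 /* assumed type code for the Color Channel Delimiter */

/* A NAL unit after start-code demultiplexing (hypothetical layout). */
typedef struct {
    uint8_t type;           /* NAL unit type */
    const uint8_t *payload; /* payload bytes */
    size_t size;
} NalUnit;

/* Scan the NAL unit sequence for CCD units. Because color_channel_idc
 * and primary_pic_type sit at fixed bit positions in the CCD payload
 * (assumed here: top 2 bits and bottom 3 bits of the first byte), the
 * head of each color component picture is found without any
 * variable-length decoding of slice headers. */
static void route_color_components(const NalUnit *nals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (nals[i].type != NAL_TYPE_CCD || nals[i].size < 1)
            continue;
        int color_channel_idc = nals[i].payload[0] >> 6;
        int primary_pic_type  = nals[i].payload[0] & 0x07;
        printf("color component %d starts at NAL unit %zu (primary_pic_type %d)\n",
               color_channel_idc, i, primary_pic_type);
        /* ...subsequent slice NAL units up to the next delimiter would be
         * handed to the decoding unit for this color component... */
    }
}

int main(void)
{
    const uint8_t ccd0 = 0x00, ccd1 = 0x40, ccd2 = 0x80; /* idc = 0, 1, 2 */
    NalUnit nals[] = {
        { NAL_TYPE_CCD, &ccd0, 1 },
        { NAL_TYPE_CCD, &ccd1, 1 },
        { NAL_TYPE_CCD, &ccd2, 1 },
    };
    route_color_components(nals, 3);
    return 0;
}
```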
On the other hand, with such a configuration, the effect of reducing the buffer size and the processing delay of the encoding apparatus, described with reference to fig. 72 of embodiment 14, is diminished. It is therefore also possible to signal at a higher level (the sequence or GOP) whether the color component identification flag is multiplexed in units of slices or in units of color component pictures. With such a bit stream structure, the encoding apparatus can be implemented flexibly according to its form of use.
Further, as another embodiment, the encoded data may be multiplexed with the bit stream configuration shown in fig. 86. In fig. 86, color_channel_idc and primary_pic_type, which are included in the CCD NAL unit in fig. 84, are included in each AUD. In the bit stream structure of embodiment 15, one (color component) picture is included in one access unit also in the case of independent encoding processing. Such a configuration likewise has the effect of reducing overhead information by aggregating the information of the color component identification flag into data in picture units. Moreover, since it suffices to detect the AUD NAL unit, configured as a bit sequence, and verify color_channel_idc only once per color component picture, the head of each color component picture can be found quickly without variable-length decoding processing, so on the decoding apparatus side color_channel_idc in the slice header need not be verified slice by slice in order to separate the NAL units to be decoded for each color component, and data can be supplied smoothly to the second picture decoding unit. On the other hand, since the image of one frame or one field is constituted by 3 access units, it is necessary to specify that the 3 access units are image data of the same time. For this reason, in the bit stream structure of fig. 86, the AUD may further be configured to carry a sequence number (the encoding and decoding order in the time direction, or the like) of each picture. With this configuration, the decoding apparatus can verify the decoding and display order of each picture, the color component attribute, whether the picture is an IDR picture, and the like without decoding any slice data at all, and can efficiently perform editing and special playback at the bit stream level.
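Grouping the 3 access units of one frame by such a sequence number can be checked without touching any slice data, as the following sketch shows; the extended AUD field names here are hypothetical.

```c
#include <stdio.h>

/* Hypothetical extended AUD contents under the fig. 86 style structure. */
typedef struct {
    int seq;      /* sequence number (decoding order in the time direction) */
    int color_id; /* color component attribute, e.g. color_channel_idc 0..2 */
} AccessUnitDelimiter;

/* Three access units constitute the image of the same time exactly when
 * their sequence numbers match; no slice data needs to be decoded. */
static int same_picture(const AccessUnitDelimiter *a,
                        const AccessUnitDelimiter *b,
                        const AccessUnitDelimiter *c)
{
    return a->seq == b->seq && b->seq == c->seq;
}

int main(void)
{
    AccessUnitDelimiter c0 = { 7, 0 }, c1 = { 7, 1 }, c2 = { 7, 2 };
    printf("same frame: %s\n", same_picture(&c0, &c1, &c2) ? "yes" : "no");
    return 0;
}
```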
In the bit stream structures of fig. 69, 84, and 86, information for specifying the number of slice NAL units included in one color component image may be stored in the area of the AUD or CCD.
In all of the above embodiments, the transform process and the inverse transform process may use a transform that guarantees orthogonality, such as the DCT, or, unlike a strictly orthogonal transform such as the DCT, may use a transform such as that of AVC which approximates orthogonality in combination with the quantization and inverse quantization processes. In addition, the prediction error signal may be encoded as pixel-level information without performing any transform at all.
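A concrete instance of a transform that only approximates orthogonality is the 4x4 integer transform of AVC. The self-contained program below computes H * H^T for that transform: the product is diagonal but not the identity, and the per-coefficient scale factors on the diagonal are the part absorbed into the quantization and inverse quantization processes.

```c
#include <stdio.h>

int main(void)
{
    /* Forward 4x4 integer transform matrix of AVC. */
    const int H[4][4] = { { 1,  1,  1,  1 },
                          { 2,  1, -1, -2 },
                          { 1, -1, -1,  1 },
                          { 1, -2,  2, -1 } };

    /* G = H * H^T: off-diagonal entries come out zero (rows are mutually
     * orthogonal), but the diagonal is (4, 10, 4, 10) rather than all ones. */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int g = 0;
            for (int k = 0; k < 4; k++)
                g += H[i][k] * H[j][k];
            printf("%4d", g);
        }
        printf("\n");
    }
    return 0;
}
```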
The present invention can be applied to a digital image signal encoding device and a digital image signal decoding device used in an image compression encoding technique, a compressed image data transmission technique, or the like.
Claims (4)
1. An image decoding apparatus for decoding a color image signal by receiving as input a bit stream generated by compression-encoding a color image composed of a plurality of color components in units of a predetermined region, the apparatus comprising:
an upper header analyzing unit that extracts common encoding independent encoding identification information from the bit stream;
a decoding unit that decodes an inter prediction mode, a motion vector, and a reference image number for each of the regions from the bit stream, and decodes a prediction error signal of the region;
a predicted image generation unit that generates a predicted image based on the decoded inter prediction mode, motion vector, and reference image number; and
an adding unit that adds the decoded prediction error signal and the predicted image to generate a decoded image,
wherein the decoding unit decodes an inter prediction mode, a motion vector, and a reference image number that are used in common for all the color components when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a prediction method common to the color components, and decodes an inter prediction mode, a motion vector, and a reference image number for each color component in units of the region when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a separate prediction method for each color component,
and the predicted image generation unit generates a predicted image for each color component using the decoded inter prediction mode, motion vector, and reference image number.
2. An image encoding device that generates a bit stream by performing compression encoding on a color image composed of a plurality of color components in units of a predetermined region, the image encoding device comprising:
a multiplexing unit that multiplexes common encoding independent encoding identification information indicating: whether the region serving as the encoding unit is encoded by a prediction method common to the color components or by a separate prediction method for each color component; and
an encoding unit that specifies an inter prediction mode, a motion vector, and a reference image number that are used in common for all the color components when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a prediction method common to the color components, specifies an inter prediction mode, a motion vector, and a reference image number individually for each color component when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a separate prediction method for each color component, and performs compression encoding on a prediction error signal obtained from the specified inter prediction mode, motion vector, and reference image number.
3. An image decoding method for decoding a color image signal by receiving as input a bit stream generated by compression-encoding a color image composed of a plurality of color components in units of a predetermined region, the image decoding method comprising:
an upper header analyzing step of extracting common encoding independent encoding identification information from the bit stream;
a decoding step of decoding an inter prediction mode, a motion vector, and a reference image number for each of the regions from the bit stream, and decoding a prediction error signal of the region;
a predicted image generation step of generating a predicted image based on the decoded inter prediction mode, motion vector, and reference image number; and
an addition step of adding the decoded prediction error signal and the predicted image to generate a decoded image,
in the decoding step, when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a prediction method common to the color components, the inter prediction mode, the motion vector, and the reference image number that are used in common for all the color components are decoded, and when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a separate prediction method for each color component, the inter prediction mode, the motion vector, and the reference image number are decoded in units of the region for each color component,
and the predicted image generation step generates a predicted image for each color component using the decoded inter prediction mode, motion vector, and reference image number.
4. An image encoding method for generating a bit stream by performing compression encoding on a color image composed of a plurality of color components in units of a predetermined region, the method comprising:
a multiplexing step of multiplexing common encoding independent encoding identification information indicating: whether the region serving as the encoding unit is encoded by a prediction method common to the color components or by a separate prediction method for each color component; and
an encoding step of specifying an inter prediction mode, a motion vector, and a reference image number that are used in common for all the color components when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a prediction method common to the color components, specifying the inter prediction mode, the motion vector, and the reference image number individually for each color component when the common encoding independent encoding identification information indicates that the region serving as the encoding unit is encoded by a separate prediction method for each color component, and performing compression encoding on a prediction error signal obtained from the specified inter prediction mode, motion vector, and reference image number.
Applications Claiming Priority (10)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2005212601 | 2005-07-22 | ||
| JP2005-212601 | 2005-07-22 | ||
| JP2005-294768 | 2005-10-07 | ||
| JP2005-294767 | 2005-10-07 | ||
| JP2005294768 | 2005-10-07 | ||
| JP2005294767 | 2005-10-07 | ||
| JP2005377638 | 2005-12-28 | ||
| JP2005-377638 | 2005-12-28 | ||
| JP2006085210 | 2006-03-27 | ||
| JP2006-085210 | 2006-03-27 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1159913A1 HK1159913A1 (en) | 2012-08-03 |
| HK1159913B true HK1159913B (en) | 2013-07-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN102281448B (en) | Image encoder and image decoder, image encoding method and image decoding method | |
| CN101889449B (en) | Image encoding device and image decoding device | |
| RU2502216C2 (en) | Image encoder and image decoder, image encoding method and image decoding method | |
| US8488889B2 (en) | Image encoder and image decoder, image encoding method and image decoding method, image encoding program and image decoding program, and computer readable recording medium recorded with image encoding program and computer readable recording medium recorded with image decoding program | |
| US20090123066A1 (en) | Image encoding device, image decoding device, image encoding method, image decoding method, image encoding program, image decoding program, computer readable recording medium having image encoding program recorded therein, | |
| US20080165849A1 (en) | Image encoder and image decoder, image encoding method and image decoding method, image encoding program and image decoding program, and computer readable recording medium recorded with image encoding program and computer readable recording medium recorded with image decoding program | |
| US20080130990A1 (en) | Image encoder and image decoder, image encoding method and image decoding method, image encoding program and image decoding program, and computer readable recording medium recorded with image encoding program and computer readable recording medium recorded with image decoding program | |
| US20080130988A1 (en) | Image encoder and image decoder, image encoding method and image decoding method, image encoding program and image decoding program, and computer readable recording medium recorded with image encoding program and computer readable recording medium recorded with image decoding program | |
| HK1159913B (en) | Image encoding device and method and image decoding device and method | |
| HK1162791B (en) | Image encoding device and method thereof, image decoding device and method thereof | |
| HK1148148B (en) | Image encoder and image decoder |