HK1225202A1 - Image decoding device, image coding device, and coded data - Google Patents
Description
Technical Field
The present invention relates to an image decoding apparatus that decodes scalable-coded image data, an image encoding apparatus that scalable-codes image data, and encoded data referred to by such an image decoding apparatus.
Background
As scalable coding techniques, the following are known: spatial scalability, in which resolution differs between layers; SNR scalability, in which image quality differs between layers; bit depth scalability, in which bit depth differs between layers; view scalability, in which images of a plurality of viewpoints are coded; depth scalability, in which images for viewing (texture) and depth information (depth) are coded together; and 3D scalability, in which view scalability and depth scalability are combined. In scalable coding, images corresponding to different resolutions, image qualities, bit depths, viewpoints, and depths are coded while being distinguished by identifiers called layer IDs.
Non-patent document 1 describes a scalable technique that extends HEVC/H.265. In non-patent document 1, the layer whose layer ID is 0 is referred to as the base layer, and, for backward compatibility, it is required that the encoded data of the base layer be decodable by a non-scalable decoder (for example, a decoder of the HEVC/H.265 main standard). This is because, when only the base layer is extracted, it can then be reproduced simply with a non-scalable decoder. Accordingly, even in scalable-coded image data, the layer whose layer ID is 0 uses the same syntax structure and the same tools as the main standard. The layer ID is encoded as nuh_layer_id in the NAL unit header.
In scalable coding, inter-layer dependency is utilized, for example by predicting an image of a certain view (a certain layer) from an image of another view (another layer), which makes coding more efficient than coding each layer independently. The inter-layer dependencies (their presence or absence) are coded in the Video Parameter Set (VPS). However, when a moving picture from a viewpoint completely different from a certain viewpoint is encoded as an accompanying moving picture, the benefit of using the dependency relationship is small because there is little correlation between the two images, and decoding becomes easier without the dependency. Therefore, in such a case, a certain layer may be encoded as a layer that is independent of other layers (hereinafter referred to as an independent layer).
Non-patent document 1 discloses methods of changing the syntax structure according to the layer ID as part of the data structure of the scalable extension of HEVC, namely: (1) a technique of encoding profile information and the like of a plurality of layers in the VPS, and omitting the profile information and the like encoded in the VPS from the Sequence Parameter Set (SPS) of extension layers whose layer ID is other than 0; (2) a technique of encoding representation information, such as the image sizes of multiple layers, in the VPS, and omitting the representation information encoded in the VPS from the SPS of extension layers whose layer ID is other than 0; (3) a technique of omitting the encoding of a scaling table in the SPS and Picture Parameter Set (PPS) of extension layers whose layer ID is other than 0 by predicting it from the scaling table of another layer; and (4) a technique of encoding, when the layer ID is not 0, the POC LSB that is not encoded in IDR and BLA pictures whose layer ID is 0, so that the POC can be specified in IDR and BLA pictures. The structures of the encoded data of the SPS, PPS, and slice header when these techniques are used are shown in fig. 44 and 45.
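As an illustrative sketch of technique (2), the following fragment reads the representation information from the SPS only when the layer ID is 0, and otherwise reuses values signalled once in the VPS. All names and the bitstream-reader interface are assumptions introduced for illustration, not the actual syntax of non-patent document 1.

```python
class BitReader:
    """Minimal MSB-first bit reader with unsigned Exp-Golomb (ue(v)) decoding."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def read_bit(self) -> int:
        byte, off = divmod(self.pos, 8)
        self.pos += 1
        return (self.data[byte] >> (7 - off)) & 1

    def read_bits(self, n: int) -> int:
        v = 0
        for _ in range(n):
            v = (v << 1) | self.read_bit()
        return v

    def read_ue(self) -> int:
        # ue(v): count leading zero bits, then read that many suffix bits
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.read_bits(zeros)


def decode_sps_representation_info(bs, nuh_layer_id, vps_rep_info):
    """Technique (2) sketch: read picture width/height from the SPS only for
    the base layer (layer ID 0); extension layers reuse the representation
    information coded once in the VPS (here a hypothetical dict)."""
    if nuh_layer_id == 0:
        width = bs.read_ue()   # e.g. pic_width_in_luma_samples
        height = bs.read_ue()  # e.g. pic_height_in_luma_samples
        return width, height
    return vps_rep_info[nuh_layer_id]  # omitted from the SPS when layer ID != 0
```

The same gating pattern applies to techniques (1) and (3): a syntax element is read only when the layer ID is 0, and otherwise inherited or predicted from another layer.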
Documents of the prior art
Non-patent document
Non-patent document 1: "Scalable High Efficiency Video Coding Draft 3", JCTVC-N1008, 14th Meeting: Vienna, AT, 25 July - 2 Aug. 2013
Disclosure of Invention
Technical problem to be solved by the invention
However, non-patent document 1 has the following problem: even for an independent layer, when its layer ID is other than 0, the syntax structures of the SPS, PPS, and slice header change, so that when the independent layer is extracted, it cannot be decoded by a non-scalable decoder; alternatively, to extract the independent layer and decode it with a non-scalable decoder, the SPS, PPS, and slice header must be rewritten.
The present invention has been made in view of the above problems, and a main object of the present invention is to provide an image decoding apparatus that decodes scalable-coded image data, which can extract independent layers without rewriting syntax, and can reproduce the independent layers by a non-scalable decoder.
Means for solving the problems
In order to solve the above problem, an image decoding device according to an aspect of the present invention is an image decoding device that decodes a scalable-coded image, and includes: a header decoding unit that decodes a first flag; and POC information decoding means for decoding slice_pic_order_cnt_lsb as one piece of POC information, wherein the POC information decoding means decodes slice_pic_order_cnt_lsb from a slice header when the first flag indicates a first value and the layer ID is greater than 0, or when the NAL unit type does not indicate an IDR picture, and does not decode slice_pic_order_cnt_lsb otherwise.
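The presence condition stated above can be sketched as follows. This is a minimal illustration: the flag name is hypothetical, and the NAL unit type values 19 (IDR_W_DLP/IDR_W_RADL) and 20 (IDR_N_LP) for IDR pictures are taken from the HEVC specification as an assumption.

```python
IDR_W_RADL, IDR_N_LP = 19, 20  # HEVC NAL unit type values for IDR pictures

def has_slice_poc_lsb(first_flag: bool, nuh_layer_id: int, nal_unit_type: int) -> bool:
    """slice_pic_order_cnt_lsb is present in the slice header when the first
    flag indicates its first value and the layer ID is greater than 0, or
    when the NAL unit type is not an IDR picture; otherwise it is absent."""
    is_idr = nal_unit_type in (IDR_W_RADL, IDR_N_LP)
    return bool(first_flag and nuh_layer_id > 0) or not is_idr
```

Note that for an IDR picture of the base layer (layer ID 0) the element stays absent, so the base-layer slice header keeps the non-scalable syntax structure.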
Effects of the invention
The image decoding apparatus according to an aspect of the present invention has the following advantages: the independent layer can be extracted without rewriting syntax and reproduced by the non-scalable decoder.
Drawings
Fig. 1 is a schematic diagram showing a configuration of an image transmission system according to an embodiment of the present invention.
Fig. 2 is a diagram showing a hierarchical structure of encoded data according to the present embodiment.
Fig. 3 is a conceptual diagram illustrating an example of a reference picture list.
Fig. 4 is a conceptual diagram illustrating an example of a reference picture.
Fig. 5 is a schematic diagram showing the configuration of the image decoding device according to the present embodiment.
Fig. 6 is a schematic diagram showing the configuration of the inter prediction parameter decoding unit according to the present embodiment.
Fig. 7 is a schematic diagram showing the configuration of the merged prediction parameter deriving unit according to the present embodiment.
Fig. 8 is a schematic diagram showing the configuration of the AMVP prediction parameter derivation unit according to the present embodiment.
Fig. 9 is a conceptual diagram illustrating an example of the vector candidates.
Fig. 10 is a schematic diagram showing the configuration of the inter-predicted image generation unit according to the present embodiment.
Fig. 11 is a schematic diagram showing the configuration of a NAL unit according to the embodiment of the present invention.
Fig. 12 is a diagram showing the structure of coded data of NAL units according to the embodiment of the present invention.
Fig. 13 is a diagram showing a relationship between a value of a NAL unit type and a type of a NAL unit according to an embodiment of the present invention.
Fig. 14 shows a configuration of encoded data of a VPS according to an embodiment of the present invention.
Fig. 15 is a diagram showing a configuration of encoded data of a VPS extension according to an embodiment of the present invention.
Fig. 16 is a diagram showing the structure of encoded data for the SPS, PPS, and slice header in the embodiment of the present invention.
Fig. 17 is a diagram showing a structure of a random access picture according to the embodiment of the present invention.
Fig. 18 is a functional block diagram showing a schematic configuration of an image decoding device according to an embodiment of the present invention.
Fig. 19 is a functional block diagram showing a schematic configuration of a header decoding unit according to an embodiment of the present invention.
Fig. 20 is a functional block diagram showing a schematic structure of a NAL unit header decoding section according to an embodiment of the present invention.
Fig. 21 is a functional block diagram showing a schematic configuration of a header decoding unit according to an embodiment different from the header decoding unit shown in fig. 19.
Fig. 22 is a functional block diagram showing a schematic configuration of each decoding unit constituting the header decoding unit of fig. 21.
Fig. 23 is a functional block diagram showing a schematic configuration of a scaling table decoding unit included in the SPS decoding unit and the PPS decoding unit of fig. 22.
Fig. 24 is a schematic diagram showing the structure of a picture structure according to the present embodiment.
Fig. 25 is a schematic diagram showing the configuration of the image coding device according to the present embodiment.
Fig. 26 is a block diagram showing the configuration of a picture encoding unit according to the present embodiment.
Fig. 27 is a schematic diagram showing the configuration of the inter prediction parameter encoding unit according to the present embodiment.
Fig. 28 is a diagram showing the configuration of encoded data decoded by the representation information decoding unit included in the header decoding unit of fig. 19.
Fig. 29 is a schematic diagram showing the configuration of a POC information decoding unit according to an embodiment of the present invention.
Fig. 30 is a diagram showing an operation of the POC information decoding unit according to the embodiment of the present invention.
Fig. 31 is a functional block diagram showing a schematic configuration of the reference picture management unit according to the present embodiment.
Fig. 32 is a diagram showing an example of a reference picture set and a reference picture list, (a) is a diagram in which pictures constituting a moving image are arranged in display order, (b) is a diagram showing an example of RPS information applied to a target picture, (c) is a diagram showing an example of a current RPS derived when the RPS information shown in (b) is used when the POC of the target picture is 0, and (d) and (e) are diagrams showing an example of a reference picture list generated from a reference picture contained in the current RPS.
Fig. 33 shows a reference picture list correction example, where (a) shows an L0 reference list before correction, (b) shows RPL correction information, and (c) shows an L0 reference list after correction.
Fig. 34 is a diagram illustrating a part of an SPS syntax table used in SPS decoding in the header decoding unit and the reference picture information decoding unit of the image decoding apparatus.
Fig. 35 is a diagram illustrating syntax tables of short-term reference picture sets used in SPS decoding and slice header decoding by the header decoding unit and the reference picture information decoding unit of the image decoding apparatus.
Fig. 36 is a diagram illustrating a part of a slice header syntax table used in slice header decoding in the header decoding section and the reference picture information decoding section of the image decoding apparatus.
Fig. 37 is a diagram illustrating a part of a slice header syntax table used in slice header decoding in the header decoding section and the reference picture information decoding section of the image decoding apparatus.
Fig. 38 is a diagram illustrating syntax tables of reference list sorting information used in slice header decoding in the header decoding section and the reference picture information decoding section of the image decoding apparatus.
Fig. 39 is a diagram illustrating a syntax table of reference list sorting information used in slice header decoding in the above-described image decoding apparatus.
Fig. 40 is a schematic diagram showing the configuration of a POC information encoding unit according to an embodiment of the present invention.
Fig. 41 is a diagram showing the configuration of encoded data for the VPS extension and SPS in modification 1 of the embodiment of the present invention.
Fig. 42 is a diagram showing a structure of encoded data for the PPS and the slice header in variation 1 of the embodiment of the present invention.
Fig. 43 is a diagram showing the structure of encoded data for the SPS, PPS, and slice header in variation 2 of the embodiment of the present invention.
Fig. 44 is a diagram showing a structure of conventional SPS encoded data.
Fig. 45 is a diagram showing a structure of encoded data for a conventional PPS and slice header.
Fig. 46 is a diagram of CS restriction X1 related to inter-layer POC alignment for explaining an embodiment of the present invention.
Fig. 47 is a diagram of CS restriction X2 related to inter-layer POC alignment for explaining an embodiment of the present invention.
Fig. 48 is a diagram for explaining access unit restriction according to the embodiment of the present invention.
Detailed Description
(first embodiment)
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
Fig. 1 is a schematic diagram showing the configuration of an image transmission system 5 according to the present embodiment.
The image transmission system 5 is a system that transmits a code obtained by encoding a plurality of layer images and displays an image obtained by decoding the transmitted code. The image transmission system 5 includes an image encoding device 2, a network 3, an image decoding device 1, and an image display device 4.
The image encoding device 2 receives a signal T (input image #10) representing a plurality of layer images (also referred to as texture images). A layer image is an image viewed or captured at a certain resolution and a certain viewpoint. In view scalable coding, in which a three-dimensional image is coded using a plurality of layer images, each of the layer images is referred to as a viewpoint image. Here, a viewpoint corresponds to the position or observation point of an imaging device. For example, a plurality of viewpoint images are images captured by left and right imaging devices aimed at the same subject. The image encoding device 2 encodes each of these signals to generate encoded data #1 (encoded data). The details of the encoded data #1 will be described later. A viewpoint image is a two-dimensional image (planar image) observed at a certain viewpoint, and is represented by, for example, luminance values or color signal values of pixels arranged in a two-dimensional plane. Hereinafter, one viewpoint image, or the signal representing it, is referred to as a picture. In spatial scalable coding using a plurality of layer images, the layer images consist of a base layer image with a lower resolution and extension layer images with higher resolutions. In SNR scalable coding using a plurality of layer images, the layer images consist of a base layer image with lower image quality and extension layer images with higher image quality. View scalable coding, spatial scalable coding, and SNR scalable coding may be combined arbitrarily.
The network 3 transmits the encoded data #1 generated by the image encoding device 2 to the image decoding device 1. The network 3 is the Internet, a wide area network (WAN), a local area network (LAN), or a combination of these. The network 3 is not necessarily limited to a bidirectional communication network, and may be a unidirectional or bidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting. The network 3 may also be replaced by a storage medium such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc) on which the encoded data #1 is recorded.
The image decoding device 1 decodes the encoded data #1 transmitted through the network 3, and generates a plurality of individually decoded layer images Td (decoded viewpoint images Td, decoded image #2).
The image display device 4 displays all or some of the plurality of decoded layer images Td (decoded image #2) generated by the image decoding device 1. For example, in view scalable coding, a three-dimensional image (stereoscopic image) or a free-viewpoint image is displayed when all of the images are displayed, and a two-dimensional image is displayed when only some of them are displayed. The image display device 4 includes a display device such as a liquid crystal display or an organic EL (electroluminescence) display. In spatial scalable coding and SNR scalable coding, when the image decoding device 1 and the image display device 4 have high processing capability, a high-quality extension layer image is displayed, and when they have only lower processing capability, a base layer image, which requires neither the processing capability nor the display capability needed for the extension layer, is displayed.
< Structure of encoded data #1 >
Before describing the image encoding device 2 and the image decoding device 1 of the present embodiment in detail, a data structure of encoded data #1 generated by the image encoding device 2 and decoded by the image decoding device 1 will be described.
(NAL Unit layer)
Fig. 11 is a diagram showing the hierarchical structure of data in the encoded data #1. The encoded data #1 is encoded in units called NAL (Network Abstraction Layer) units.
The NAL is a layer provided to abstract the communication between the VCL (Video Coding Layer), which performs the moving picture coding process, and the lower systems that transmit and store the encoded data.

The VCL is the layer in which the image encoding process is performed; encoding takes place in the VCL. The lower systems referred to here correspond to the H.264/AVC and HEVC file formats and to the MPEG-2 system. In the examples shown below, the lower system corresponds to the decoding processes in the target layer and the reference layer. In the NAL, the bitstream generated by the VCL is divided into units called NAL units and transmitted to the lower system that is its destination.
Fig. 12(a) shows the syntax table of a NAL (Network Abstraction Layer) unit. A NAL unit includes encoded data encoded in the VCL and a header (NAL unit header) for ensuring that the encoded data is correctly delivered to the lower system that is its destination. The NAL unit header is represented by, for example, the syntax shown in fig. 12(b). The NAL unit header contains: "nal_unit_type", indicating the type of the encoded data stored in the NAL unit; "nuh_temporal_id_plus1", indicating the identifier (temporal identifier) of the sublayer to which the stored encoded data belongs; and "nuh_layer_id" (or "nuh_reserved_zero_6bits"), indicating the identifier (layer identifier) of the layer to which the stored encoded data belongs.
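The fields just listed occupy a fixed two-byte header, which can be parsed as in the following sketch. The field widths follow the HEVC NAL unit header layout: a 1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, and 3-bit nuh_temporal_id_plus1.

```python
def parse_nal_unit_header(hdr: bytes) -> dict:
    """Parse the 2-byte HEVC NAL unit header into its four fields."""
    v = int.from_bytes(hdr[:2], "big")
    return {
        "forbidden_zero_bit":    (v >> 15) & 0x1,
        "nal_unit_type":         (v >> 9) & 0x3F,
        "nuh_layer_id":          (v >> 3) & 0x3F,  # layer identifier
        "nuh_temporal_id_plus1": v & 0x7,          # temporal identifier + 1
    }
```

For example, the bytes 0x40 0x01 decode to nal_unit_type 32 (a VPS), nuh_layer_id 0, and nuh_temporal_id_plus1 1.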
The data of a NAL unit includes parameter sets, SEI, slices, and the like, which are described later.
Fig. 13 is a diagram showing the relationship between the value of the NAL unit type and the category of the NAL unit. As shown in fig. 13, NAL units with NAL unit type values from 0 to 15, indicated by SYNA101, are slices of non-RAP (random access picture) pictures. NAL units with NAL unit type values from 16 to 21, indicated by SYNA102, are slices of RAP (random access picture) pictures. RAP pictures are broadly divided into BLA pictures, IDR pictures, and CRA pictures; BLA pictures are further classified into BLA_W_LP, BLA_W_DLP, and BLA_N_LP, and IDR pictures into IDR_W_DLP and IDR_N_LP. Besides RAP pictures, there are LP pictures, TSA pictures, STSA pictures, TRAIL pictures, and the like, which will be described later.
(Access Unit)
A set of NAL units grouped according to a specific classification rule is called an access unit. When the number of layers is 1, an access unit is the set of NAL units constituting one picture. When the number of layers is greater than 1, an access unit is the set of NAL units constituting the pictures of a plurality of layers at the same (output) time. To indicate the boundary between access units, the encoded data may include a NAL unit called an access unit delimiter. The access unit delimiter is included between the set of NAL units constituting one access unit and the set of NAL units constituting another access unit in the encoded data. The NAL unit type value of the access unit delimiter (AUD_NUT) is, for example, 35.
(POC restriction of Access Unit)
In the related art, as a restriction (consistency condition) on the bitstream, there is a restriction that the picture order counts POC (PicOrderCntVal) of all pictures contained in the same access unit are the same.

In the present embodiment, this restriction (consistency condition) on the bitstream is relaxed. Specifically, the consistency condition uses the restriction that the POCs of all pictures of dependent layers included in the same access unit are the same. That is, the following CS restriction X1 is used.
(CS restriction X1) When there is a dependency between a layer A and a layer B (when direct_dependency_flag[layer ID of layer A][layer ID of layer B] != 0), the POC of the picture must be the same for layer A and layer B contained in the same access unit.
Under CS restriction X1, conversely, if there is no dependency between a certain layer A and a certain layer B, the POCs of layer A and layer B contained in the same access unit may differ. Pictures at the same (output) time having the same POC across layers are said to have "POC aligned between layers". To achieve POC alignment between layers, the following processes are performed: a process using a POC reset flag, by which a picture other than an IDR picture can initialize the POC to 0; and a process by which even an IDR picture can take a POC other than 0, by including the POC lower bits in the slice header of the IDR picture.
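CS restriction X1 can be checked mechanically, as in the following sketch. The data-structure layout is an assumption for illustration: direct_dependency_flag is a square matrix of 0/1 flags, and access_unit_poc maps each layer ID present in the access unit to the POC of that layer's picture.

```python
def check_cs_restriction_x1(direct_dependency_flag, access_unit_poc) -> bool:
    """For every pair of layers (a, b) with a dependency
    (direct_dependency_flag[a][b] != 0), the pictures of both layers
    in the same access unit must share one POC."""
    for a, row in enumerate(direct_dependency_flag):
        for b, dep in enumerate(row):
            if dep and a in access_unit_poc and b in access_unit_poc:
                if access_unit_poc[a] != access_unit_poc[b]:
                    return False  # dependent layers disagree on POC
    return True
```

With the dependencies of fig. 46 (layer 1 depends on layer 0, layer 3 on layer 2), the independent layer 4 may carry any POC without violating the restriction.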
As the consistency condition, CS restriction X2 may be used in addition to CS restriction X1.

(CS restriction X2) When layer A and layer B are both layers having a dependency relationship (when NumDirectRefLayers[layer ID of layer A] > 0 and NumDirectRefLayers[layer ID of layer B] > 0), the POC of the picture must be the same for layer A and layer B contained in the same access unit.

CS restrictions X1 and X2 may be given other expressions as long as the meaning is the same. For example, CS restrictions X1 and X2 may also be expressed as follows.
(CS restriction X1') The POC of a picture of a reference layer and that of a picture of its referenced layer contained in the same access unit must be the same.

(CS restriction X2') Pictures of layers having a dependency (NumDirectRefLayers[layer ID] > 0) must have the same POC within the same access unit.

CS restrictions X1 and X2 may also be combined and expressed as the following CS restriction X3.

(CS restriction X3) For a layer that can be either a reference layer or a referenced layer, all pictures belonging to a layer specified in the layer dependency information (direct_dependency_type) must have the same POC within the same access unit.
The dependencies indicated by CS restriction X1 and CS restriction X2 are those indicated by the layer dependency information (direct_dependency_type) of the VPS, as described later. Whether a layer is independent is represented by the number of dependent layers NumDirectRefLayers[] and the independent layer flag IndependentLayerFlag[], as described later.
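The two variables named above can be derived from the VPS dependency flags roughly as follows. This is a sketch under the assumption that a layer may directly depend only on layers with smaller layer IDs; the exact derivation in the standard text may differ.

```python
def derive_layer_dependency_vars(direct_dependency_flag):
    """Derive NumDirectRefLayers[] and IndependentLayerFlag[] (names follow
    the text) from the matrix of direct_dependency_flag values in the VPS."""
    n = len(direct_dependency_flag)
    # count direct reference layers among lower layer IDs only
    num_direct_ref_layers = [sum(direct_dependency_flag[i][:i]) for i in range(n)]
    # a layer with no direct reference layers is an independent layer
    independent_layer_flag = [int(num_direct_ref_layers[i] == 0) for i in range(n)]
    return num_direct_ref_layers, independent_layer_flag
```

Applied to the dependencies of fig. 46, layers 0, 2, and 4 come out as independent layers, matching the figure's layer 4 being free of the POC restriction.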
Fig. 46 is a diagram for explaining CS restriction X1. In the figure, an access unit is represented by a dashed line enclosing a set of pictures, and the number in each picture represents its POC. As shown, layer 0 and layer 1 (direct_dependency_flag[1][0] != 0), and layer 2 and layer 3 (direct_dependency_flag[3][2] != 0), which are in dependency relationships, have the same POC in pictures included in the same access unit. In contrast, layer 4 (NumDirectRefLayers[4] == 0, IndependentLayerFlag[4] != 0), which is an independent layer, may have a POC different from those of layers 0 to 3.
Fig. 47 is a diagram for explaining CS restriction X1 + CS restriction X2 (CS restriction X3). As shown, layer 1 and layer 3, which are not independent layers, have the same POC in pictures contained in the same access unit. That is, the layer group of layers 0 and 1 and the layer group of layers 2 and 3, which have no dependency relationship with each other, have the same POC in the pictures of the same access unit. In addition, since pictures in a dependency relationship have the same POC by CS restriction X1, layers 0 to 3 end up having the same POC within the same access unit. In contrast, layer 4, which is an independent layer, may have a POC different from those of layers 0 to 3.
According to the bitstream restriction of (CS restriction X1) or (CS restriction X1 + CS restriction X2) shown in the present embodiment, no restriction concerning POC is imposed on an independent layer. Therefore, it is not necessary to encode POC information (POC lower bits) in the slice header in order to maintain POC alignment between layers. Consequently, in an independent layer, even a picture with an IDR NAL unit type needs no syntax change (no change in syntax structure relative to encoded data whose layer ID is 0), such as including the POC lower bits. In this case, the independent layer can be decoded using a non-scalable decoder capable of decoding encoded data having the syntax structure used when the layer ID is 0.
When the target layer is not an independent layer, in the process of specifying the picture to be referred to by the target picture from among the pictures in the DPB (decoded picture buffer 12), a determination that selects a reference picture whose POC is the same as that of the target picture is required, as described later in (S205) and (S206) of fig. 15. Therefore, when the target layer is not an independent layer, POC alignment between layers (at least between the target layer and the reference layer) is required. When CS restriction X1 is satisfied, the above inter-layer picture specifying process works smoothly because the picture of the reference layer and the target picture maintain POC alignment between layers. Furthermore, when CS restriction X1 + CS restriction X2 is satisfied, pictures of all layers other than independent layers have the same POC in the same access unit, which has the further effect of making it easy to identify pictures at the same time.
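The POC-based reference picture selection described here can be sketched as follows. Modeling the DPB as a list of (layer_id, poc, picture) tuples is an assumption made purely for illustration.

```python
def find_inter_layer_reference(dpb, ref_layer_id, target_poc):
    """From the DPB, pick the picture of the reference layer whose POC
    equals the POC of the target picture; returns None when POC alignment
    between the layers is not maintained."""
    for layer_id, poc, picture in dpb:
        if layer_id == ref_layer_id and poc == target_poc:
            return picture
    return None
```

This is why non-independent layers need POC alignment: if the reference layer's picture in the same access unit carried a different POC, the lookup above would fail to locate it.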
(POC restriction of Access Unit and Access Unit delimiter)
When the consistency condition concerning POC within an access unit is relaxed, that is, when unequal POCs are allowed within the same access unit, there is no problem in decoding images from the encoded data; however, when the decoded images are displayed, POC can no longer be used to identify pictures at the same display (output) time across layers, and synchronized playback may become difficult. In practice, when the encoded data is stored in a container such as MPEG-2 TS or MP4, time information is attached to each picture in the container, so no problem arises; but when the encoded data is not stored in a container, the inability to obtain display synchronization from POC becomes a problem. Therefore, in such a case, it is preferable to set the following consistency conditions concerning access units.
(CS restriction AUD1) All pictures belonging to the same access unit have the same POC. Alternatively, an access unit delimiter is present as the boundary of the access unit.

CS restriction AUD1 may also be given a different expression with the same meaning. For example, the following CS restriction AUD2 may be set.

(CS restriction AUD2) All pictures belonging to the same access unit have the same POC. Alternatively, when the pictures belonging to the same access unit do not all have the same POC, the access unit is preceded by an access unit delimiter.
In addition, under CS restriction AUD2, the access unit delimiter may be absent immediately before a NAL unit (EOS_NUT) indicating end of stream, which signals the interruption and termination of a coded video sequence (CVS), even when the pictures belonging to the same access unit do not all have the same POC before that NAL unit.
In order to make the boundaries of access units even clearer, the following CS restriction AUD3 may be set.

(CS restriction AUD3) All pictures belonging to the same access unit have the same POC. Alternatively, when the pictures belonging to the same access unit do not all have the same POC, an access unit delimiter is present both before and after the access unit.
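A conformance check in the spirit of CS restriction AUD2 can be sketched as follows. Representing each access unit as a dict with the POCs of its pictures and a flag recording whether an access unit delimiter precedes it is an assumption for illustration.

```python
AUD_NUT = 35  # NAL unit type value of the access unit delimiter

def check_cs_restriction_aud2(access_units) -> bool:
    """CS restriction AUD2 sketch: an access unit whose pictures do not all
    share the same POC must be preceded by an access unit delimiter."""
    for au in access_units:
        same_poc = len(set(au["pocs"])) <= 1
        if not same_poc and not au["has_aud"]:
            return False  # differing POCs without a preceding AUD
    return True
```

CS restriction AUD1 would instead accept either condition globally, and CS restriction AUD3 would additionally require a delimiter after such an access unit.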
Further, since the above problem is particularly an issue with the byte stream format (the format described in Annex B of HEVC, which adds a start code prefix and additional padding), which is used when the data is not stored in a container, the consistency conditions concerning access units may be imposed only on the byte stream format.
Fig. 48 is a diagram for explaining the consistency conditions concerning access units according to the present embodiment. Fig. 48(a) shows the case where all pictures belonging to the same access unit have the same POC; the AUD (access unit delimiter) in the figure, shown by a dotted line, is unnecessary. Fig. 48(b) shows the case where pictures belonging to the same access unit have different POCs, so an AUD is necessary. Fig. 48(c) shows the case where pictures belonging to the same access unit have different POCs in encoded data containing an EOS discontinuity; under CS restriction AUD2, an AUD is not required before the EOS even when pictures belonging to the same access unit have different POCs.
In addition, when pictures with different POCs may exist in the same access unit anywhere within a piece of encoded data (CVS) composed of consecutive access units, it is preferable that all access units have an attached AUD (normally, an AUD preceding the access unit). That is, as shown in fig. 48(d), when a plurality of access units (here, access unit 0 and access unit 1) exist in the CVS, and POC alignment between layers is maintained in one access unit (access unit 1) but not in another (access unit 0), an AUD is also attached to the access unit that maintains POC alignment between layers. This condition is imposed as a consistency restriction by the following CS restriction AUD4.
(CS restriction AUD4) Within a given CVS, all pictures belonging to the same access unit have the same POC. Alternatively, when, in a given CVS, the pictures belonging to the same access unit do not all have the same POC, an access unit delimiter is present before each access unit of that CVS.
Here, a CVS (coded video sequence) is a unit of encoded data comprising a plurality of access units. A CVS consists of NAL units starting from an IRAP picture. An IRAP picture is a picture whose NAL unit type is one of the IDR, CRA, and BLA types.
(video parameter set)
Fig. 14 is a diagram showing the structure of the coded data of the VPS (Video Parameter Set) according to the embodiment of the present invention. The meanings of some of the syntax elements are as follows. The VPS is a parameter set for specifying parameters common to a plurality of layers. The parameter set is referred to from the encoded data (compressed data) using its ID (video_parameter_set_id).
video_parameter_set_id (SYNA401 of fig. 14) is an identifier for identifying each VPS.
vps_temporal_id_nesting_flag (SYNA402 of fig. 14) is a flag related to inter prediction in pictures that refer to the VPS, and indicates whether or not a constraint is added.
vps_max_num_sub_layers_minus1 (SYNA403 of fig. 14) is a syntax element for calculating the upper limit value MaxNumLayers of the number of layers, covering temporal scalability in addition to the other kinds of scalability, in hierarchical coded data that includes at least a base layer. The upper limit value is given by MaxNumLayers = vps_max_num_sub_layers_minus1 + 1. When the layered encoded data consists of only the base layer, vps_max_num_sub_layers_minus1 is 0.
vps_extension_flag (SYNA404 of fig. 14) is a flag indicating whether the VPS further contains a VPS extension.
vps_extension_data_flag (SYNA405 of fig. 14) is the VPS extension body, and is illustrated in detail in fig. 15.
Note that, in the present specification, where a "flag indicating whether or not XX" is described, 1 means that XX holds and 0 means that XX does not hold, and in logical negation, logical product, and the like, 1 is treated as true and 0 as false (the same applies hereinafter). However, in actual apparatuses and methods, other values may be used as the true value and the false value.
Fig. 15 is a diagram showing a configuration of encoded data of a VPS extension according to an embodiment of the present invention. The meaning of a part of the syntax elements is as follows.
dimension_id_len_minus1 (SYNA501 of fig. 15) indicates the number num_dimensions of dimension_id entries contained in each scalable category: num_dimensions = dimension_id_len_minus1[i] + 1. For example, num_dimensions is the number of views to be decoded when the scalable category is depth or view.
The dimension identifier dimension_id (SYNA502 of fig. 15) is information indicating the category of the picture for each scalable category.
The dependency layer information direct_dependency_flag[i][j] (SYNA503 of fig. 15) is a flag indicating whether there is a dependency between the target layer i and the reference layer j.
In SYNA504 of fig. 15, the part denoted by "…" is information that differs for each layer or scalable category (details are described later).
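As an illustration of how such fixed-length VPS fields are read, the following sketch decodes the first few syntax elements named above from a bitstream. The BitReader class and the exact field widths are assumptions for illustration; the actual bit widths and field order follow the coded-data definition of fig. 14.

```python
# Hypothetical sketch of decoding fixed-length VPS fields. Field widths
# (4-bit ID, 1-bit flag, 3-bit count) are illustrative assumptions.

class BitReader:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def u(self, n):
        """Read n bits as an unsigned integer (fixed-length code)."""
        val = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return val

def decode_vps_header(reader):
    vps = {}
    vps["video_parameter_set_id"] = reader.u(4)           # SYNA401
    vps["vps_temporal_id_nesting_flag"] = reader.u(1)     # SYNA402
    vps["vps_max_num_sub_layers_minus1"] = reader.u(3)    # SYNA403
    # Upper limit on the number of layers, as given in the text above
    vps["MaxNumLayers"] = vps["vps_max_num_sub_layers_minus1"] + 1
    return vps
```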
Fig. 2 is a diagram showing the hierarchical structure of data in the encoded data #1. The encoded data #1 illustratively contains a sequence and the plurality of pictures constituting the sequence. Figs. 2(a) to 2(f) are diagrams respectively showing the sequence layer defining a sequence SEQ, the picture layer defining a picture PICT, the slice layer defining a slice S, the slice data layer defining slice data, the coding tree layer defining the coding tree units included in the slice data, and the coding unit layer defining the coding units (CUs) included in a coding tree.
(sequence layer)
In the sequence layer, a set of data to be referred to by the image decoding apparatus 1 in order to decode the sequence SEQ to be processed (hereinafter also referred to as the target sequence) is defined. As shown in fig. 2(a), the sequence SEQ contains: a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), pictures PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information). Here, the value shown after # indicates the layer ID. Fig. 2 shows an example in which coded data of #0 and #1, that is, layer ID 0 and layer ID 1, exists, but the types and number of layers are not limited to these.
The video parameter set VPS defines a set of encoding parameters common to a plurality of moving images, and a set of a plurality of layers included in a moving image and encoding parameters associated with each layer in a moving image composed of a plurality of layers.
In the sequence parameter set SPS, a set of encoding parameters to be referred to by the image decoding apparatus 1 is defined in order to decode a target sequence. For example, the width and height of the picture are specified.
In the picture parameter set PPS, a set of encoding parameters to be referred to by the image decoding apparatus 1 in order to decode each picture in the target sequence is defined. For example, it contains: a reference value of the quantization width used for decoding the picture (pic_init_qp_minus26), a flag indicating whether weighted prediction is applied (weighted_pred_flag), and a scaling table (quantization matrix). A plurality of PPSs may be present; in that case, one of the plurality of PPSs is selected by each picture in the target sequence.
(Picture layer)
In the picture layer, a set of data to be referred to by the image decoding apparatus 1 in order to decode the picture PICT to be processed (hereinafter also referred to as the target picture) is defined. As shown in fig. 2(b), the picture PICT includes the slices S0 to S(NS-1), where NS is the total number of slices included in the picture PICT.
In the following description, the subscripts are omitted when it is not necessary to distinguish the slices S0 to S(NS-1). The same applies to other subscripted data included in the encoded data #1 described below.
(slice layer)
In the slice layer, a set of data to be referred to by the image decoding apparatus 1 in order to decode the slice S to be processed (also referred to as the target slice) is defined. As shown in fig. 2(c), the slice S includes a slice header SH and slice data SDATA.
The slice header SH contains the coding parameter set to be referred to by the image decoding apparatus 1 in order to determine the decoding method of the target slice. The slice type specification information (slice_type) specifying the slice type is an example of the coding parameters included in the slice header SH.
As slice types that can be specified by the slice type specification information, there are: (1) I slices using only intra prediction at the time of encoding, (2) P slices using unidirectional prediction or intra prediction at the time of encoding, and (3) B slices using unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.
The slice header SH may contain a reference (pic_parameter_set_id) to the picture parameter set PPS included in the sequence layer.
(slice data layer)
In the slice data layer, a set of data to be referred to by the image decoding apparatus 1 in order to decode the slice data SDATA to be processed is defined. As shown in fig. 2(d), the slice data SDATA includes a coding tree block CTB (also called a coding tree unit CTU). A CTB is a fixed-size (e.g., 64 × 64) block constituting a slice, and is also sometimes called a largest coding unit (LCU).
(coding tree layer)
As shown in fig. 2(e), the coding tree layer defines a set of data to be referred to by the image decoding apparatus 1 in order to decode the coding tree block to be processed. The coding tree unit is partitioned by recursive quadtree partitioning. The nodes of the tree structure obtained by the recursive quadtree division are referred to as a coding tree. An intermediate node of the quadtree is a coded quadtree (CQT), and the CTU itself is specified as the topmost CQT. The CQT contains a split flag (split_flag); if split_flag is 1, the CQT is split into four CQTs. If split_flag is 0, the CQT contains a coding unit (CU: Coded Unit) as an end node. The coding unit CU is the basic unit of the coding process.
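The recursive quadtree division described above can be sketched as follows. The read_split_flag callback is a hypothetical stand-in for decoding split_flag from the encoded data, and the minimum CU size is an illustrative parameter.

```python
# Minimal sketch of recursive CQT parsing. read_split_flag stands in for
# entropy decoding of split_flag; leaf nodes become coding units (CUs).

def parse_cqt(x, y, size, read_split_flag, min_cu_size=8):
    """Return the list of (x, y, size) coding units inside one coding tree block."""
    if size > min_cu_size and read_split_flag():
        half = size // 2  # split_flag == 1: divide into four CQTs
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += parse_cqt(x + dx, y + dy, half, read_split_flag, min_cu_size)
        return cus
    return [(x, y, size)]  # split_flag == 0: this node is a coding unit
```

For a 64 × 64 CTB whose root split_flag is 1 and whose four children are not split further, this yields four 32 × 32 coding units.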
(coding unit layer)
As shown in fig. 2 (f), the coding unit layer defines a set of data to be referred to by the image decoding apparatus 1 for the coding unit to be decoded. Specifically, the coding unit is composed of a CU header CUH, a prediction tree, a transform tree, and a CU header CUF. In the CU header CUH, a coding unit is specified as a unit using intra prediction, a unit using inter prediction, or the like. The coding unit becomes the root of a Prediction Tree (PT) and a Transformation Tree (TT). The CU header CUF is included between the prediction tree and the transformation tree or after the transformation tree.
In the prediction tree, the coding unit is divided into one or more prediction blocks, and the position and size of each prediction block are specified. In other words, a prediction block is one or more non-overlapping regions constituting the coding unit. The prediction tree contains the one or more prediction blocks obtained by the above division.
The prediction process is performed for each of the prediction blocks. Hereinafter, a prediction block, which is a unit of prediction, is also referred to as a Prediction Unit (PU).
The types of division in the prediction tree are roughly two: the case of intra prediction and the case of inter prediction. Intra prediction is prediction within the same picture, and inter prediction is prediction processing performed between mutually different pictures (for example, between display times or between layer images).
In the case of intra prediction, the partitioning method has 2N × 2N (the same size as the coding unit), and N × N.
In the case of inter prediction, the partitioning method is one of 2N × 2N (the same size as the coding unit), 2N × N, 2N × nU, 2N × nD, N × 2N, nL × 2N, nR × 2N, and N × N, encoded by part_mode in the encoded data. Here, 2N × nU indicates that the 2N × 2N coding unit is divided into two regions, 2N × 0.5N and 2N × 1.5N, in this order from the top. 2N × nD indicates that the 2N × 2N coding unit is divided into two regions, 2N × 1.5N and 2N × 0.5N, in this order from the top. nL × 2N indicates that the 2N × 2N coding unit is divided into two regions, 0.5N × 2N and 1.5N × 2N, in this order from the left. nR × 2N indicates that the 2N × 2N coding unit is divided into two regions, 1.5N × 2N and 0.5N × 2N, in this order from the left. Since the number of divisions is 1, 2, or 4, the number of PUs included in a CU is one to four. These PUs are expressed as PU0, PU1, PU2, and PU3 in this order.
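The partition modes listed above can be tabulated as PU rectangles. The function below is an illustrative sketch; the function name and the (x, y, w, h) representation are assumptions, not part of the coded-data syntax.

```python
# Sketch of the inter-prediction partitioning described above: each part_mode
# splits a 2Nx2N coding unit into the listed PU rectangles (x, y, w, h).

def inter_pu_partitions(part_mode, cu_size):
    n = cu_size // 2   # N, where the CU is 2Nx2N
    q = cu_size // 4   # 0.5N, used by the asymmetric modes
    s = cu_size
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, n), (0, n, s, n)],
        "Nx2N":  [(0, 0, n, s), (n, 0, n, s)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # 2Nx0.5N on top
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # 2Nx0.5N at bottom
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # 0.5Nx2N on the left
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # 0.5Nx2N on the right
    }
    return table[part_mode]
```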
In the transform tree, the coding unit is divided into one or more transform blocks, and the position and size of each transform block are specified. In other words, a transform block is one or more non-overlapping regions constituting the coding unit. The transform tree contains the one or more transform blocks obtained by the above division.
The divisions of the transform tree are: a division that assigns a region of the same size as the coding unit as a transform block, and a division by recursive quadtree division, like the division of the tree block described above.
The conversion process is performed for each of the conversion blocks. Hereinafter, a transform block, which is a unit of transform, is also referred to as a Transform Unit (TU).
(prediction parameters)
The prediction image of a prediction unit is derived from the prediction parameters attached to the prediction unit. The prediction parameters include prediction parameters for intra prediction or prediction parameters for inter prediction. The prediction parameters of inter prediction (inter prediction parameters) are described below. The inter prediction parameters are composed of prediction list utilization flags predFlagL0 and predFlagL1, reference picture indices refIdxL0 and refIdxL1, and vectors mvL0 and mvL1. The prediction list utilization flags predFlagL0 and predFlagL1 are flags indicating whether or not the reference picture lists called the L0 reference list and the L1 reference list are used, respectively; the corresponding reference picture list is used when the value is 1. The case of using two reference picture lists, that is, predFlagL0 = 1 and predFlagL1 = 1, corresponds to bi-prediction, and the case of using one reference picture list, that is, (predFlagL0, predFlagL1) = (1, 0) or (predFlagL0, predFlagL1) = (0, 1), corresponds to uni-prediction. The information of the prediction list utilization flags can also be expressed by the inter prediction flag inter_pred_idx described later. Usually, the prediction list utilization flags are used in the prediction image generation unit and the prediction parameter memory described later, and the inter prediction flag inter_pred_idx is used when decoding from the encoded data the information of which reference picture lists are used.
Syntax elements for deriving the inter prediction parameters included in the encoded data are, for example: the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.
(an example of a reference Picture List)
Next, an example of the reference picture list will be described. The reference picture list is a list of reference pictures stored in the decoded picture buffer 12. Fig. 3 is a conceptual diagram illustrating an example of a reference picture list. In the reference picture list 601, the five rectangles arranged in a row represent reference pictures. The codes P1, P2, Q0, P3, and P4, shown in order from left to right, indicate the respective reference pictures. The P of P1 and the like represents the viewpoint P, and the Q of Q0 represents a viewpoint Q different from the viewpoint P. The subscripts of P and Q represent the picture order count POC. The downward arrow directly below refIdxLX indicates that the reference picture index refIdxLX is the index referring to the reference picture Q0 in the decoded picture buffer 12.
(example of reference Picture)
Next, an example of a reference picture used when deriving a vector is explained. Fig. 4 is a conceptual diagram illustrating an example of a reference picture. In fig. 4, the horizontal axis represents display time and the vertical axis represents viewpoint. Each of the rectangles in 2 rows in the vertical direction and 3 columns in the horizontal direction (6 in total) shown in fig. 4 represents a picture. Of the 6 rectangles, the rectangle in the second column from the left in the lower row represents a picture to be decoded (target picture), and the remaining five rectangles each represent a reference picture. The reference picture Q0 indicated by an arrow directed upward from the target picture is a picture with a different view point at the same display time as the target picture. In the displacement prediction using the target picture as a standard, the reference picture Q0 is used. The reference picture P1 indicated by an arrow toward the left side from the target picture is a past picture at the same viewpoint as the target picture. The reference picture P2 indicated by an arrow from the subject picture toward the right is a future picture at the same viewpoint as the subject picture. In motion prediction using a target picture as a standard, a reference picture P1 or P2 is used.
(random Access Picture)
The structure of the random access pictures (RAP) handled in the present embodiment will be explained. Fig. 17 is a diagram illustrating the structure of random access pictures. RAP pictures have three categories: IDR (Instantaneous Decoding Refresh), CRA (Clean Random Access), and BLA (Broken Link Access). Whether or not a certain NAL unit is a NAL unit containing a slice of a RAP picture is identified according to the NAL unit type. The NAL unit types IDR_W_LP, IDR_N_LP, CRA, BLA_W_LP, BLA_W_DLP, and BLA_N_LP correspond respectively to the IDR_W_LP picture, IDR_N_LP picture, CRA picture, BLA_W_LP picture, BLA_W_DLP picture, and BLA_N_LP picture described later. That is, NAL units containing slices of the above pictures have the above NAL unit types.
Fig. 17(a) shows a case where no picture other than the first picture is a RAP picture. The letters in each box indicate the name of the picture, and the numeral indicates the POC (the same applies below). The display order is from left to right in the figure. The pictures IDR0, A1, A2, B4, B5, and B6 are decoded in the order IDR0, B4, A1, A2, B6, B5. Figs. 17(b) to 17(g) show cases where the picture indicated by B4 in fig. 17(a) is changed to a RAP picture.
Fig. 17(b) shows an example in which an IDR picture (specifically, an IDR_W_LP picture) is inserted. In this example, decoding is performed in the order IDR0, IDR'0, A1, A2, B2, B1. To distinguish the two IDR pictures, the picture earlier in time (and earlier in decoding order) is referred to as IDR0, and the later picture as IDR'0. All RAP pictures, including IDR pictures, are prohibited from referring to other pictures. Reference to other pictures is prohibited by restricting the slices of a RAP picture to intra slices (I_SLICE), as described later (in the embodiment described later, this restriction is relaxed for layers other than layer ID 0). Thus, a RAP picture itself can be decoded independently, without relying on the decoding of other pictures. Further, at the decoding time point of an IDR picture, the reference picture set (RPS) described later is initialized. Therefore, prediction using pictures decoded before the IDR picture, for example prediction from B2 to IDR0, is prohibited. Picture A3 has a display time (POC) before that of the RAP picture (here IDR'0), but is decoded after the RAP picture. Such pictures, which are decoded after a RAP picture and reproduced before it, are referred to as leading pictures (LP pictures). Pictures other than RAP pictures and LP pictures are decoded and reproduced after the RAP picture, and are generally called TRAIL pictures. IDR_W_LP is an abbreviation of Instantaneous Decoding Refresh With Leading Picture, and an LP picture such as picture A3 may be present. Picture A2 refers to IDR0 and the picture of POC 4 in the example of fig. 17(a), but in the case of an IDR picture, since the RPS is initialized at the time point of decoding IDR'0, reference from A2 to pictures preceding IDR'0 (such as IDR0) is prohibited. In addition, at the time point when an IDR picture is decoded, the POC is initialized.
As described above, an IDR picture is a picture with the following restrictions.
The POC is initialized at the picture decoding time point.
The RPS is initialized at the picture decoding time point.
Reference to other pictures is prohibited.
Reference from pictures following the IDR in decoding order to pictures preceding the IDR in decoding order is prohibited.
RASL pictures (described later) are prohibited.
RADL pictures (described later) may be present (in the case of an IDR_W_LP picture).
Fig. 17(c) shows an example in which an IDR picture (specifically, an IDR_N_LP picture) is inserted. IDR_N_LP is an abbreviation of Instantaneous Decoding Refresh No Leading Picture, and the presence of LP pictures is prohibited. Therefore, the presence of the A3 picture of fig. 17(b) is prohibited. Thus, the A3 picture needs to be decoded before the IDR'0 picture, referring to the IDR0 picture instead of the IDR'0 picture.
Fig. 17(d) shows an example in which a CRA picture is inserted. In this example, decoding is performed in the order IDR0, CRA4, A1, A2, B6, B5. In a CRA picture, unlike an IDR picture, initialization of the RPS is not performed. Thus, it is not necessary to prohibit reference from pictures following the RAP (here, the CRA) in decoding order to pictures preceding the RAP in decoding order (reference from A2 to pictures preceding CRA4 is permitted). However, when decoding starts from the CRA picture, which is a RAP picture, pictures following the CRA in display order must be decodable, so reference from pictures following the RAP (CRA) in display order to pictures preceding the RAP (CRA) in decoding order is prohibited (for example, reference from B6 to IDR0). Further, in a CRA, the POC is not initialized.
As described above, a CRA picture is a picture with the following restrictions.
The POC is not initialized at the picture decoding time point.
The RPS is not initialized at the picture decoding time point.
Reference to other pictures is prohibited.
Reference from pictures following the CRA in display order to pictures preceding the CRA in decoding order is prohibited.
RADL pictures and RASL pictures may be present.
Figs. 17(e) to 17(g) are examples of BLA pictures. A BLA picture is a RAP picture used when a sequence is reconstructed so as to start from a CRA picture, by editing encoded data that includes the CRA picture, and it has the following restrictions.
The POC is initialized at the picture decoding time point.
Reference to other pictures is prohibited.
Reference from pictures following the BLA in display order to pictures preceding the BLA in decoding order is prohibited.
RASL pictures (described later) may be present (in the case of a BLA_W_LP picture).
RADL pictures (described later) may be present (in the case of BLA_W_LP and BLA_W_DLP pictures).
For example, the case where decoding of the sequence is started from the position of the CRA4 picture in fig. 17(d) will be described.
Fig. 17(e) is an example using a BLA picture (specifically, a BLA_W_LP picture). BLA_W_LP is an abbreviation of Broken Link Access With Leading Picture, and LP pictures are allowed to be present. When the CRA4 picture is replaced with a BLA_W_LP picture, the A2 and A3 pictures, which are LP pictures of the BLA picture, may be absent from the encoded data. Moreover, since the A2 picture refers to a picture decoded before the BLA_W_LP picture, that reference picture is not present in encoded data edited so as to start from the BLA_W_LP picture. In a BLA_W_LP picture, such an undecodable LP picture is handled as a RASL (Random Access Skipped Leading) picture, and is neither decoded nor displayed. The A3 picture, on the other hand, is an LP picture that can be decoded, and such a picture is called a RADL (Random Access Decodable Leading) picture. RASL pictures and RADL pictures are identified according to the NAL unit types RASL_NUT and RADL_NUT.
Fig. 17(f) is an example using a BLA picture (specifically, a BLA_W_DLP picture). BLA_W_DLP is an abbreviation of Broken Link Access With Decodable Leading Picture, and only LP pictures that can be decoded are allowed. Therefore, in a BLA_W_DLP picture, unlike fig. 17(e), the A2 picture, an LP picture that cannot be decoded (RASL), is not allowed to exist in the encoded data, while the A3 picture, a decodable LP picture (RADL), is allowed to exist in the encoded data.
Fig. 17(g) is an example using a BLA picture (specifically, a BLA_N_LP picture). BLA_N_LP is an abbreviation of Broken Link Access No Leading Picture, and no LP pictures are allowed. Therefore, in a BLA_N_LP picture, unlike figs. 17(e) and 17(f), neither the A2 picture (RASL) nor the A3 picture (RADL) is allowed to exist in the encoded data.
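The restrictions on the RAP picture types described above can be summarized in a small lookup table. This is an illustrative summary of the text, not a normative definition; the field names are assumptions.

```python
# Summary of the RAP picture constraints described above.
# init_poc: whether the POC is initialized at the picture decoding time point;
# radl / rasl: whether RADL / RASL leading pictures may be present.

RAP_PROPERTIES = {
    "IDR_W_LP":  {"init_poc": True,  "radl": True,  "rasl": False},
    "IDR_N_LP":  {"init_poc": True,  "radl": False, "rasl": False},
    "CRA":       {"init_poc": False, "radl": True,  "rasl": True},
    "BLA_W_LP":  {"init_poc": True,  "radl": True,  "rasl": True},
    "BLA_W_DLP": {"init_poc": True,  "radl": True,  "rasl": False},
    "BLA_N_LP":  {"init_poc": True,  "radl": False, "rasl": False},
}

def leading_pictures_allowed(nal_unit_type):
    """True when the RAP type may be followed by any leading picture."""
    p = RAP_PROPERTIES[nal_unit_type]
    return p["radl"] or p["rasl"]
```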
(inter prediction flag and prediction list utilization flag)
The inter prediction flag and the prediction list utilization flags can be mutually converted as follows using the relationship between the flags predFlagL0 and predFlagL1. Therefore, either the prediction list utilization flags or the inter prediction flag may be used as the inter prediction parameter. In addition, in the following, determinations using the prediction list utilization flags may be replaced with the inter prediction flag; conversely, determinations using the inter prediction flag may be replaced with the prediction list utilization flags.
inter prediction flag = (predFlagL1 << 1) + predFlagL0
predFlagL0 = inter prediction flag & 1
predFlagL1 = inter prediction flag >> 1
Here, >> is a right shift and << is a left shift.
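Written as code, the conversions above are as follows; this is a direct transcription of the three formulas, with illustrative function names.

```python
# The inter prediction flag packs the two prediction list utilization flags
# into one value: bit 0 is predFlagL0, bit 1 is predFlagL1.

def to_inter_pred_flag(pred_flag_l0, pred_flag_l1):
    return (pred_flag_l1 << 1) + pred_flag_l0

def to_pred_flags(inter_pred_flag):
    pred_flag_l0 = inter_pred_flag & 1   # low bit: L0 reference list used
    pred_flag_l1 = inter_pred_flag >> 1  # high bit: L1 reference list used
    return pred_flag_l0, pred_flag_l1
```

For example, (predFlagL0, predFlagL1) = (1, 1), i.e., bi-prediction, corresponds to the value 3, and (0, 1), i.e., L1 uni-prediction, to the value 2.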
(Merge predict and AMVP predict)
The decoding (encoding) method of the prediction parameters has two modes, the merge prediction (merge) mode and the AMVP (Adaptive Motion Vector Prediction) mode, and the merge flag merge_flag is a flag for identifying these modes. In both the merge prediction mode and the AMVP mode, the prediction parameters of the target PU are derived using the prediction parameters of already processed blocks. The merge prediction mode is a mode in which the prediction list utilization flag predFlagLX (or the inter prediction flag inter_pred_idx), the reference picture index refIdxLX, and the vector mvLX are not included in the encoded data and already derived prediction parameters are used as they are, while the AMVP mode is a mode in which the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, and the vector mvLX are included in the encoded data. The vector mvLX is encoded as a prediction vector index mvp_LX_idx indicating the prediction vector and a difference vector mvdLX.
The inter prediction flag inter_pred_idc is data indicating the types and number of reference pictures, and takes one of the values Pred_L0, Pred_L1, and Pred_Bi. Pred_L0 and Pred_L1 indicate that reference pictures stored in the reference picture lists called the L0 reference list and the L1 reference list, respectively, are used, and both indicate that a single reference picture is used (uni-prediction). Prediction using the L0 reference list and the L1 reference list is referred to as L0 prediction and L1 prediction, respectively. Pred_Bi indicates that two reference pictures are used (bi-prediction), one stored in the L0 reference list and one in the L1 reference list. The prediction vector index mvp_LX_idx is an index indicating a prediction vector, and the reference picture index refIdxLX is an index indicating a reference picture stored in a reference picture list. LX is a notation used when L0 prediction and L1 prediction are not distinguished; replacing LX with L0 or L1 distinguishes the parameters for the L0 reference list from those for the L1 reference list. For example, refIdxL0 is the reference picture index used for L0 prediction, refIdxL1 is the reference picture index used for L1 prediction, and refIdx (refIdxLX) is the notation used when refIdxL0 and refIdxL1 are not distinguished.
The merge index merge_idx is an index indicating which of the prediction parameter candidates (merge candidates) derived from already processed blocks is used as the prediction parameter of the decoding target block.
(motion vector and Displacement vector)
The vector mvLX includes motion vectors and displacement vectors (disparity vectors). A motion vector is a vector indicating the positional shift between the position of a block in a picture of a certain layer at a certain display time and the position of the corresponding block in a picture of the same layer at a different display time (for example, an adjacent display time). A displacement vector is a vector indicating the positional shift between the position of a block in a picture of a certain layer at a certain display time and the position of the corresponding block in a picture of a different layer at the same display time. The pictures of different layers may be pictures of different viewpoints or pictures of different resolutions. In particular, a displacement vector corresponding to pictures of different viewpoints is referred to as a disparity vector. In the following description, when motion vectors and displacement vectors are not distinguished, they are simply referred to as the vector mvLX. The prediction vector and the difference vector related to the vector mvLX are referred to as the prediction vector mvpLX and the difference vector mvdLX, respectively. Whether the vector mvLX and the difference vector mvdLX are motion vectors or displacement vectors is determined using the reference picture index refIdxLX attached to the vector.
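The determination described in the last sentence can be sketched as follows: the reference picture selected by refIdxLX is examined, and the vector is treated as a displacement vector when that picture belongs to a different layer at the same display time. The Picture record and the function name are illustrative assumptions.

```python
# Sketch of distinguishing motion vectors from displacement vectors via the
# reference picture selected by refIdxLX, as described in the text above.

from collections import namedtuple

Picture = namedtuple("Picture", ["poc", "layer_id"])

def vector_kind(current, ref_list, ref_idx_lx):
    ref = ref_list[ref_idx_lx]  # reference picture selected by refIdxLX
    if ref.layer_id != current.layer_id and ref.poc == current.poc:
        return "displacement"   # different layer, same display time
    return "motion"             # same layer, different display time
```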
(Note attached to the coded data Structure)
The syntax included in a NAL unit may be changed depending on whether or not the layer ID is 0, but this does not apply to NAL units of the independent layer. Since the structure of the syntax included in a NAL unit of the independent layer is therefore the same whether the layer ID is 0 or not, no (or hardly any) data substitution is required, and the independent layer can be decoded using a non-scalable decoder that can directly decode encoded data with the syntax structure of layer ID 0. In addition, the effect is achieved that the process of extracting data decodable by the non-scalable decoder becomes easy. Strictly speaking, in order to enable decoding by a non-scalable decoder, a process of replacing the layer ID of the independent layer with 0 is necessary; however, since the layer ID is a fixed-length code in the NAL unit header, whose position in the coded data is clear, it is easily replaced, and the processing amount of this replacement is negligible compared with changing other syntax.
In addition, if the non-scalable decoder ignores the check of the layer ID, the encoded data can be decoded without replacing the layer ID with 0. Since the layer ID is a number equal to or greater than 0, the determination (branching) other than whether or not the layer ID is 0 may be a determination (branching) of whether or not the layer ID is greater than 0 (hereinafter, the same applies).
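The fixed-length layer-ID replacement discussed above can be sketched assuming the two-byte NAL unit header layout (a 1-bit forbidden_zero_bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id, and a 3-bit nuh_temporal_id_plus1), in which the layer ID spans the last bit of the first byte and the first five bits of the second byte. The function names are illustrative.

```python
# Sketch of rewriting the fixed-length layer ID in a 2-byte NAL unit header
# to 0, assuming the bit layout described in the lead-in above.

def clear_layer_id(nal_header):
    """Return a copy of the 2-byte NAL unit header with the layer ID set to 0."""
    b0, b1 = nal_header[0], nal_header[1]
    b0 &= 0xFE  # clear the top bit of the layer ID (lowest bit of byte 0)
    b1 &= 0x07  # clear the low 5 bits of the layer ID, keep temporal_id_plus1
    return bytes([b0, b1])

def layer_id(nal_header):
    """Extract the 6-bit layer ID from the 2-byte NAL unit header."""
    return ((nal_header[0] & 0x01) << 5) | (nal_header[1] >> 3)
```

Because only two fixed bit positions are masked, the rewrite touches nothing else in the coded data, which is why the text above treats its processing cost as negligible.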
Since the SPS and PPS are relatively easy to rewrite compared with the slice header, the syntax structure of the slice header in the case of the independent layer may be made the same as that in the case of layer ID 0, while the syntax structures of the SPS and PPS in the independent layer may differ from those in the case of layer ID 0. In this case, in order to enable decoding by the non-scalable decoder, replacement processing must be performed on the SPS and PPS of the independent layer, but no replacement processing is needed for the pictures (slice headers) of the independent layer. In the modification described later, the structure of the encoded data of the independent layer and its decoding and encoding may be applied only to the slice header.
(construction of image decoding apparatus)
The configuration of the image decoding apparatus 1 of the present embodiment will be described. Fig. 18 is a schematic diagram showing the configuration of the image decoding device 1 according to the present embodiment. The image decoding device 1 includes a header decoding unit 10, a picture decoding unit 11, a decoded picture buffer 12, and a reference picture management unit 13. The image decoding apparatus 1 can perform random access decoding processing, which will be described later, that starts decoding from a picture at a specific time in an image including a plurality of layers.
[ head decoding unit 10]
The header decoding unit 10 decodes information for decoding in NAL unit units, sequence units, picture units, or slice units from the encoded data #1 supplied from the image encoding device 2. The decoded information is output to the picture decoding unit 11 and the reference picture management unit 13.
The header decoding unit 10 analyzes the VPS, SPS, and PPS included in the encoded data #1 based on a predetermined syntax definition, and decodes information for decoding in sequence units. For example, information related to the number of layers is decoded from the VPS. In addition, when the representation information exists in the VPS, the information associated with the image size of the decoded image is decoded from the VPS, and when the representation information exists in the SPS, the information associated with the image size of the decoded image is decoded from the SPS.
The header decoding unit 10 analyzes the slice header included in the encoded data #1 based on a predetermined syntax definition, and decodes information for decoding in slice units. For example, the slice type is decoded from the slice header.
As shown in fig. 19, the header decoding unit 10 includes: the NAL unit header decoding unit 211, the dependent layer information decoding unit 2101, the profile level information decoding unit 2102, the representation information decoding unit 2103, the scaling table decoding unit 2104, the POC information decoding unit 2105, and the reference picture information decoding unit 218.
In addition, instead of providing a decoding unit for each type of header information (dependent layer information, profile level information, representation information, scaling table, and the like), the header decoding unit may be configured such that a decoding unit is provided for each type of header. In that case, as shown in fig. 21, the header decoding unit may be a header decoding unit 10' including a NAL unit header decoding unit 211, a VPS decoding unit 212, an SPS decoding unit 213, a PPS decoding unit 214, and a slice header decoding unit 215.
Each of the decoding units of the VPS decoding unit 212, the SPS decoding unit 213, the PPS decoding unit 214, and the slice header decoding unit 215 may have a configuration as shown in fig. 22.
That is, the VPS decoding unit 212 may include the dependent layer information decoding unit 2101, the profile level information decoding unit 2102, and the representation information decoding unit 2103; the SPS decoding unit 213 may include the profile level information decoding unit 2102, the representation information decoding unit 2103, and the scaling table decoding unit 2104; the PPS decoding unit 214 may include the scaling table decoding unit 2104; and the slice header decoding unit 215 may include the POC information decoding unit 2105. The scaling table decoding unit 2104 may have a configuration as shown in fig. 23, for example.
In this case, when a plurality of header decoding units (for example, the VPS decoding unit 212 and the SPS decoding unit 213) share the same mechanism X (for example, the profile level information decoding unit 2102), one header decoding unit may include the mechanism X internally while the other header decoding units use that mechanism X. Alternatively, the mechanism X may be provided outside each header decoding unit, and each header decoding unit may use it. For example, both the VPS decoding unit 212 and the SPS decoding unit 213 may include the profile level information decoding unit 2102, or one of the VPS decoding unit 212 and the SPS decoding unit 213 may include the profile level information decoding unit 2102 internally while the other does not. Further, the profile level information decoding unit 2102 may be provided outside the VPS decoding unit 212 and the SPS decoding unit 213, and be used by both.
[ NAL unit header decoding section 211]
Fig. 20 is a functional block diagram showing a schematic structure of the NAL unit header decoding section 211. As shown in fig. 20, the NAL unit header decoding portion 211 is configured to include a layer ID decoding portion 2111 and a NAL unit type decoding portion 2112.
The layer ID decoding section 2111 decodes a layer ID (the layer ID included in the NAL unit header) from the encoded data. The NAL unit type decoding section 2112 decodes the NAL unit type from the encoded data. The layer ID is, for example, 6 bits of information taking values from 0 to 63, and indicates the base layer when the layer ID is 0. To support backward compatibility, in which a portion of scalable-coded data can also be decoded by a non-scalable decoder, the base layer can be decoded by a non-scalable decoder. The NAL unit type is, for example, 6 bits of information taking values from 0 to 63, and indicates the category of data contained in the NAL unit. As described later, the NAL unit type distinguishes categories of data such as parameter sets (VPS, SPS, PPS), RAP pictures (IDR pictures, CRA pictures, BLA pictures), non-RAP pictures such as LP pictures, SEI, and the like.
[ dependent layer information decoding unit 2101]
The dependent layer information decoding unit 2101 decodes the dependent layer information from the VPS and the VPS extension included in the encoded data based on a predetermined syntax definition. For example, the syntax shown in fig. 14 is decoded from the VPS, and the syntax shown in fig. 15 is decoded from the VPS extension. The VPS extension is decoded when the flag vps_extension_flag is 1. In this specification, the structure of encoded data (syntax table) and the meaning and restrictions (semantics) of the syntax elements constituting the structure of encoded data are referred to as a coded data structure. The coded data structure is an important technical element associated with random accessibility and memory size when the encoded data is decoded in an image decoding apparatus and with guaranteeing the same behavior between different image decoding apparatuses, and also affects the coding efficiency of the encoded data.
The dependent layer information decoding unit 2101 decodes dependent layer information (direct_dependency_flag[][]) of each layer from the encoded data. The dependent layer information decoding unit 2101 derives an independent layer flag IndependentLayerFlag[] for each layer.
The dependent layer information decoding unit 2101 derives the number of dependent layers NumDirectRefLayers[i] of the target layer i using the flag direct_dependency_flag[i][j] indicating whether or not there is a dependency between the target layer i and the reference layer j (0 ≦ j < i). Specifically, for the target layer i, the number of reference layers of index j from 0 to i-1 for which direct_dependency_flag[i][j] is other than 0 is counted. The dependent layer information decoding unit 2101 may set IndependentLayerFlag[i] to 1 if the target layer i has no dependent layer (that is, if NumDirectRefLayers[i] == 0 is true), and set IndependentLayerFlag[i] to 0 if the target layer has a dependent layer (that is, if NumDirectRefLayers[i] == 0 is false). In addition, when IndependentLayerFlag[i] is absent, IndependentLayerFlag[i] is inferred to be 1, which indicates an independent layer.
Further, the dependent layer information decoding unit 2101 determines whether or not the layer indicated by nuh_layer_id is an independent layer (IndependentLayerFlag[nuh_layer_id] != 0) based on the derived dependent layer information (IndependentLayerFlag[]) of the layer. In addition, instead of the flag IndependentLayerFlag[i] indicating whether or not a certain layer i is an independent layer, a flag DependentLayerFlag[i] indicating whether or not a certain layer i is a dependent layer may be used. In this case, the determination (branch) of "whether or not a layer is an independent layer (IndependentLayerFlag[nuh_layer_id] != 0)" is entirely replaced with the determination (branch) of "whether or not a layer is not a dependent layer (DependentLayerFlag[nuh_layer_id] == 0)".
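As a rough illustration, the derivation of NumDirectRefLayers[i] and IndependentLayerFlag[i] described above can be sketched in Python as follows (function and variable names are illustrative and not part of the standard):

```python
def derive_layer_dependency(direct_dependency_flag):
    """Derive NumDirectRefLayers and IndependentLayerFlag for each layer
    from the pairwise dependency flags direct_dependency_flag[i][j]."""
    num_layers = len(direct_dependency_flag)
    num_direct_ref_layers = [0] * num_layers
    independent_layer_flag = [0] * num_layers
    for i in range(num_layers):
        # Count the reference layers j (0 <= j < i) on which layer i depends.
        num_direct_ref_layers[i] = sum(
            1 for j in range(i) if direct_dependency_flag[i][j] != 0
        )
        # A layer with no dependent (reference) layers is an independent layer.
        independent_layer_flag[i] = 1 if num_direct_ref_layers[i] == 0 else 0
    return num_direct_ref_layers, independent_layer_flag
```

For example, three layers in which only layer 1 depends on layer 0 yield NumDirectRefLayers = [0, 1, 0] and IndependentLayerFlag = [1, 0, 1].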
In addition, in scalable coding, tool extension using inter-layer dependencies is performed in many cases, and tools are not extended for intra-layer dependencies. That is, independent layers that do not exploit dependencies between layers (independent layers that only exploit dependencies within layers) are generated using tools that are also available to non-scalable decoders. Thus, if the base layer and syntax structure are the same, the independent layer can also be decoded with a non-scalable decoder.
[ profile level information decoding unit 2102]
The profile level information decoding unit 2102 decodes profile level information of each layer from the VPS.
When decoding the SPS, the profile level information decoding unit 2102 also decodes the profile level information from the SPS when the layer indicated by the layer ID (nuh_layer_id) of the SPS is an independent layer (for example, when the value of IndependentLayerFlag[nuh_layer_id] is true), or when the layer ID of the SPS is 0 (see fig. 16(a)). Specifically, on receiving from the NAL unit header decoding unit 211 the layer ID included in the NAL unit header whose NAL unit type is SPS, the profile level information is also decoded from the SPS when the dependent layer information decoded from the VPS indicates that the layer indicated by the layer ID is an independent layer, or when the layer ID is 0. By this processing, it is possible to prevent the bit length of the SPS from increasing in a case where sharing of the SPS cannot be used (for example, because of a difference in resolution).
[ representation information decoding unit 2103]
The representation information decoding unit 2103 decodes the syntax of fig. 28(a) from the VPS and the syntax of fig. 28(b) from the SPS. Specifically, the representation information decoding unit 2103 decodes rep_format() included in the VPS, and decodes representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8. Further, when the layer indicated by the layer ID (nuh_layer_id) of the SPS is not an independent layer (for example, when the value of IndependentLayerFlag[nuh_layer_id] is false), the SPS includes a representation information update flag update_rep_format_flag, and the representation information decoding unit 2103 decodes this flag from the SPS. When update_rep_format_flag is not included in the SPS, update_rep_format_flag is inferred to be 0. When update_rep_format_flag is 1, the representation information decoding unit 2103 further decodes representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8. When update_rep_format_flag is 0, the representation information already decoded in rep_format() of the VPS is used as the representation information for the target layer.
The above-described determination of "whether or not the layer indicated by the layer ID is a layer other than an independent layer" may be replaced with the determination of "whether or not the layer indicated by the layer ID is a layer other than an independent layer and the layer ID is larger than 0 (the layer ID is other than 0)". Since the layer whose layer ID is 0 is generally an independent layer, when the former is true (when the layer indicated by the layer ID is a layer other than an independent layer), the layer ID is also larger than 0. Therefore, although the latter additional determination is not essential, in order to clearly exclude the representation information update flag update_rep_format_flag in the case of the base layer (layer ID of 0), the determination regarding the layer ID may be performed in addition to the determination regarding the independent layer (the same applies hereinafter).
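The selection of representation information described above can be sketched as follows. This is a minimal illustration using dict-based inputs; the handling of independent layers, whose SPS is assumed to carry its own representation information directly as in the main standard, follows the text above:

```python
def resolve_rep_format(vps_rep_format, sps_fields, independent_layer):
    """Select the representation information to use for a layer.

    vps_rep_format:    fields decoded from rep_format() of the VPS.
    sps_fields:        fields decoded from the SPS; may contain
                       'update_rep_format_flag' and 'rep_format'.
    independent_layer: True when IndependentLayerFlag[nuh_layer_id] != 0.
    """
    if independent_layer:
        # An independent layer's SPS carries its representation
        # information directly (assumption, as in the main standard).
        return sps_fields["rep_format"]
    # update_rep_format_flag is inferred to be 0 when absent from the SPS.
    if sps_fields.get("update_rep_format_flag", 0) == 1:
        # Flag is 1: the SPS overrides the rep_format() of the VPS.
        return sps_fields["rep_format"]
    # Flag is 0: the representation information of the VPS is used.
    return vps_rep_format
```

When the flag is absent or 0, the dependent layer simply inherits the VPS rep_format(), which is what keeps the SPS short when the representation does not change between layers.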
[ scaling table decoding unit 2104]
When the layer indicated by the layer ID is a layer other than an independent layer (for example, when the value of IndependentLayerFlag[nuh_layer_id] is 0, that is, !IndependentLayerFlag[nuh_layer_id] is true), the scaling table decoding unit 2104 decodes the scaling table prediction flag sps_infer_scaling_list_flag from the SPS (see fig. 16(a)). When sps_infer_scaling_list_flag is not 0, sps_scaling_list_ref_layer_id is decoded. When sps_infer_scaling_list_flag is 0, sps_scaling_list_data_present_flag and scaling_list_data() are decoded, and the scaling table is thereby decoded.
Similarly, when the layer indicated by the layer ID is a layer other than an independent layer, the scaling table decoding unit 2104 decodes pps_infer_scaling_list_flag from the PPS (see fig. 16(b)). When pps_infer_scaling_list_flag is not 0, pps_scaling_list_ref_layer_id is decoded. When pps_infer_scaling_list_flag is 0, pps_scaling_list_data_present_flag and scaling_list_data() are decoded, and the scaling table is thereby decoded.
Specifically, when decoding the SPS or PPS, the scaling table decoding unit 2104 receives from the NAL unit header decoding unit 211 the layer ID included in the NAL unit header of the SPS or PPS, and when the dependent layer information decoded from the VPS indicates that the layer indicated by the layer ID is other than an independent layer, decodes the flags sps_infer_scaling_list_flag and pps_infer_scaling_list_flag indicating whether or not to predict the scaling table.
This processing can prevent the bit lengths of the SPS and PPS from increasing when sharing of the SPS/PPS cannot be used.
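The SPS/PPS decoding flow for the scaling table can be sketched as below. The BitSource class and its u1()/ue() read methods are stand-ins for a real bitstream reader, used only to make the control flow concrete; the field names follow the syntax elements in the text with the sps_/pps_ prefix dropped:

```python
class BitSource:
    """Trivial stand-in for a bitstream reader (illustration only)."""
    def __init__(self, values):
        self.values = list(values)
    def u1(self):   # one-bit flag
        return self.values.pop(0)
    def ue(self):   # unsigned Exp-Golomb value
        return self.values.pop(0)

def decode_scaling_list_info(reader, independent_layer):
    """Decode flow for the scaling table syntax of the SPS/PPS (cf. fig. 16)."""
    info = {}
    if not independent_layer:
        # infer_scaling_list_flag is present only for dependent layers.
        info["infer_scaling_list_flag"] = reader.u1()
    else:
        info["infer_scaling_list_flag"] = 0
    if info["infer_scaling_list_flag"]:
        # The scaling table is predicted (copied) from a reference layer.
        info["scaling_list_ref_layer_id"] = reader.ue()
    else:
        info["scaling_list_data_present_flag"] = reader.u1()
        if info["scaling_list_data_present_flag"]:
            info["scaling_list_data"] = "scaling_list_data()"  # parsed here
    return info
```

For an independent layer the prediction flag is never read from the bitstream, so the parameter set contains only the same scaling-table syntax a non-scalable decoder expects.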
[ POC information decoding unit 2105]
Fig. 29 is a functional block diagram showing a schematic configuration of the POC information decoding unit 2105 (POC derivation unit). As shown in fig. 29, the POC information decoding unit 2105 includes a POC lower bit maximum value decoding unit 21051, a POC lower bit decoding unit 21052, a POC upper bit deriving unit 21053, a POC adding unit 21054, and a POC resetting unit 21055. The POC information decoding unit 2105 derives the POC from the upper bits PicOrderCntMsb of the POC and the lower bits pic_order_cnt_lsb of the POC, and outputs the POC to the picture decoding unit 11 and the reference picture managing unit 13.
The POC lower bit maximum value decoding unit 21051 decodes, from the coded data, the POC lower bit maximum value MaxPicOrderCntLsb of the decoding target picture. Specifically, the syntax element log2_max_pic_order_cnt_lsb_minus4, encoded as the value obtained by subtracting the constant 4 from the base-2 logarithm of the POC lower bit maximum value MaxPicOrderCntLsb, is decoded from the encoded data of the PPS, which is a parameter set applied to the target picture, and the POC lower bit maximum value MaxPicOrderCntLsb is derived by the following expression.
MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4)
Furthermore, MaxPicOrderCntLsb determines the split between the upper bits PicOrderCntMsb and the lower bits pic_order_cnt_lsb of the POC. For example, when MaxPicOrderCntLsb is 16 (log2_max_pic_order_cnt_lsb_minus4 = 0), the lower 4 bits, representing values 0 to 15, are carried by pic_order_cnt_lsb, and the bits above them are represented by PicOrderCntMsb.
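The relationship between MaxPicOrderCntLsb and the MSB/LSB split can be shown with a short sketch (helper names are illustrative):

```python
def max_pic_order_cnt_lsb(log2_max_pic_order_cnt_lsb_minus4):
    # MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4)
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)

def split_poc(poc, max_lsb):
    """Split a POC value into (PicOrderCntMsb, pic_order_cnt_lsb)."""
    lsb = poc & (max_lsb - 1)   # max_lsb is a power of two
    return poc - lsb, lsb
```

For MaxPicOrderCntLsb = 16, a POC of 18 splits into PicOrderCntMsb = 16 and pic_order_cnt_lsb = 2.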
The POC lower bit decoding unit 21052 decodes, from the coded data, the POC lower bits pic_order_cnt_lsb of the target picture when a) the layer indicated by the layer ID of the picture is a layer other than an independent layer (for example, when the value of IndependentLayerFlag[nuh_layer_id] is 0), or b) the NAL unit type of the picture is other than IDR (other than IDR_W_RADL and IDR_N_LP). Specifically, pic_order_cnt_lsb included in the slice header of the target picture is decoded.
The POC higher-order bit derivation unit 21053 derives POC higher-order bits PicOrderCntMsb which are higher-order bits of the POC of the target picture. Specifically, when the NAL unit type of the target picture input from the NAL unit header decoding unit 211 indicates a RAP picture required for initialization of POC (in the case of BLA or IDR), the POC upper bit PicOrderCntMsb is initialized to 0 by the following expression.
PicOrderCntMsb=0
The initialization timing is the time point of the first slice of the decoding target picture (the slice whose slice address included in the slice header is 0, or the first slice of the picture input to the image decoding apparatus).
For the other NAL unit types, the POC upper bits PicOrderCntMsb are derived by the following expression using the POC lower bit maximum value MaxPicOrderCntLsb decoded by the POC lower bit maximum value decoding unit 21051 and the temporary variables prevPicOrderCntLsb and prevPicOrderCntMsb described later.
if ((pic_order_cnt_lsb < prevPicOrderCntLsb) &&
((prevPicOrderCntLsb - pic_order_cnt_lsb) >= (MaxPicOrderCntLsb / 2)))
PicOrderCntMsb = prevPicOrderCntMsb + MaxPicOrderCntLsb
else if ((pic_order_cnt_lsb > prevPicOrderCntLsb) &&
((pic_order_cnt_lsb - prevPicOrderCntLsb) > (MaxPicOrderCntLsb / 2)))
PicOrderCntMsb = prevPicOrderCntMsb - MaxPicOrderCntLsb
else
PicOrderCntMsb = prevPicOrderCntMsb
That is, when pic_order_cnt_lsb is smaller than prevPicOrderCntLsb and the difference between prevPicOrderCntLsb and pic_order_cnt_lsb is greater than or equal to half of MaxPicOrderCntLsb, the sum of prevPicOrderCntMsb and MaxPicOrderCntLsb is set as PicOrderCntMsb. When pic_order_cnt_lsb is larger than prevPicOrderCntLsb and the difference between pic_order_cnt_lsb and prevPicOrderCntLsb is larger than half of MaxPicOrderCntLsb, the value obtained by subtracting MaxPicOrderCntLsb from prevPicOrderCntMsb is set as PicOrderCntMsb. In the other cases, prevPicOrderCntMsb is set as PicOrderCntMsb.
The temporary variables prevPicOrderCntLsb and prevPicOrderCntMsb are derived by the POC upper bit derivation unit 21053 through the following procedure. Letting prevTid0Pic denote the preceding reference picture in decoding order whose TemporalId is 0, and setting the POC (PicOrderCntVal) of the picture prevTid0Pic as prevPicOrderCnt, prevPicOrderCntLsb and prevPicOrderCntMsb are derived by the following equations.
prevPicOrderCntLsb = prevPicOrderCnt & (MaxPicOrderCntLsb - 1)
prevPicOrderCntMsb = prevPicOrderCnt - prevPicOrderCntLsb
Fig. 30 is a diagram illustrating an operation of the POC information decoding unit 2105. Fig. 30 shows the following example: with MaxPicOrderCntLsb = 16, pictures with POC = 15, 18, 24, 11, 32 are decoded in order from left to right in the drawing. Here, when the picture at the right end (the POC = 32 picture) is the target picture, the preceding picture in decoding order with TemporalId = 0 at the time the target picture is decoded is the picture with POC = 24, so the POC information decoding unit 2105 sets the picture with POC = 24 as the picture prevTid0Pic. From the POC lower bits and POC upper bits of the picture prevTid0Pic, prevPicOrderCntLsb and prevPicOrderCntMsb are derived as 8 and 16, respectively. Since pic_order_cnt_lsb of the target picture is 0, the derived prevPicOrderCntLsb is 8, and half of MaxPicOrderCntLsb is 8, the above-described case where pic_order_cnt_lsb is smaller than prevPicOrderCntLsb and the difference between prevPicOrderCntLsb and pic_order_cnt_lsb is greater than or equal to half of MaxPicOrderCntLsb holds, and the POC information decoding unit 2105 sets the sum of prevPicOrderCntMsb and MaxPicOrderCntLsb as PicOrderCntMsb. That is, PicOrderCntMsb of the target picture is derived as 32 (= 16 + 16).
The POC adding unit 21054 adds the POC lower bit pic _ order _ cnt _ lsb decoded by the POC lower bit decoding unit 21052 and the POC upper bit derived by the POC upper bit deriving unit 21053, and derives POC (picordercntval) by the following equation.
PicOrderCntVal=PicOrderCntMsb+pic_order_cnt_lsb
In the example of fig. 30, since PicOrderCntMsb is 32 and pic _ order _ cnt _ lsb is 0, PicOrderCntVal, which is POC of the target picture, is derived as 32.
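The PicOrderCntMsb derivation above, together with the example of fig. 30, can be transcribed directly into Python (a sketch; Python integer division stands in for the exact arithmetic of the pseudocode):

```python
def derive_pic_order_cnt_msb(pic_order_cnt_lsb, prev_lsb, prev_msb, max_lsb):
    """Transcription of the PicOrderCntMsb derivation pseudocode."""
    if pic_order_cnt_lsb < prev_lsb and prev_lsb - pic_order_cnt_lsb >= max_lsb // 2:
        return prev_msb + max_lsb   # the LSB counter wrapped around upward
    if pic_order_cnt_lsb > prev_lsb and pic_order_cnt_lsb - prev_lsb > max_lsb // 2:
        return prev_msb - max_lsb   # the LSB counter wrapped around downward
    return prev_msb

# Example of fig. 30: MaxPicOrderCntLsb = 16, prevTid0Pic has POC = 24,
# so prevPicOrderCntLsb = 8 and prevPicOrderCntMsb = 16; the target
# picture has pic_order_cnt_lsb = 0.
pic_order_cnt_msb = derive_pic_order_cnt_msb(0, 8, 16, 16)
pic_order_cnt_val = pic_order_cnt_msb + 0  # PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb
```

Running the fig. 30 values through the function reproduces the derivation in the text: pic_order_cnt_msb is 32, so PicOrderCntVal of the target picture is 32.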
When the POC _ reset _ flag is 1, the POC resetting unit 21055 subtracts PicOrderCntVal derived by the POC adding unit 21054 from PicOrderCntVal of each reference picture of the same access unit stored in the DPB, and sets PicOrderCntVal of the target picture to 0.
The POC information decoding unit 2105 decodes slice_pic_order_cnt_lsb from the slice header when a) the layer indicated by the layer ID of the picture is a layer other than an independent layer, or b) the NAL unit type of the picture is other than IDR (other than IDR_W_RADL and IDR_N_LP; see fig. 16(c)). By this processing, even if the NAL unit type is IDR, when the layer indicated by the layer ID is a layer other than an independent layer, slice_pic_order_cnt_lsb can be decoded, and the occurrence of POC alignment problems can be prevented.
[ reference picture information decoding unit 218]
The reference picture information decoding unit 218 is a component of the header decoding unit 10, and decodes information related to a reference picture from the encoded data # 1. The information related to the reference picture includes: reference picture set information (hereinafter, referred to as RPS information), and reference picture list modification information (hereinafter, referred to as RPL modification information).
Reference Picture Set (RPS) represents a set of pictures that can be used as reference pictures in a target picture or in a picture following the target picture in decoding order. The RPS information is information decoded from the SPS and slice header, and is information for deriving a reference picture set at the time of decoding each picture.
The Reference Picture List (RPL) is a candidate list of reference pictures to be referred to when performing motion compensation prediction. There may also be more than two reference picture lists. In the present embodiment, the L0 reference picture list (L0 reference list) and the L1 reference picture list (L1 reference list) are set to be used. The RPL correction information is information decoded from the SPS or slice header, and indicates the order of reference pictures in the reference picture list.
In the motion compensation prediction, a reference picture recorded at the position of the reference picture index (refIdx) on the reference picture list is used. For example, in the case where the value of refIdx is 0, the position of 0 of the reference image list, i.e., the reference picture at the beginning of the reference image list is used for motion compensated prediction.
The decoding processing of the RPS information and the RPL correction information by the reference picture information decoding unit 218 is an important processing in the present embodiment, and therefore, will be described in detail later.
Here, an example of a reference picture set and a reference picture list is explained with reference to fig. 32. Fig. 32(a) shows the pictures constituting a moving picture arranged in display order, and the numerals in the figure indicate the POC corresponding to the respective pictures. As described later in the description of the decoded picture buffer, POC is assigned to each picture in ascending output order. The picture with POC = 9 denoted as "curr" is the current decoding target picture.
Fig. 32(b) shows an example of RPS information applied to the target picture. The reference picture set (current RPS) of the target picture is derived based on the RPS information. The RPS information includes long-term RPS information and short-term RPS information. The long-term RPS information directly indicates the POC of a picture contained in the current RPS. In the example shown in fig. 32(b), the long-term RPS information indicates that the picture with POC = 1 is included in the current RPS. In the short-term RPS information, the pictures included in the current RPS are recorded as differences from the POC of the target picture. The short-term RPS information indicated as "Before, dPoc = 1" in the figure indicates that a picture whose POC is 1 smaller than that of the target picture is included in the current RPS. Similarly, "Before, dPoc = 4" in the figure indicates a picture whose POC is 4 smaller, and "After, dPoc = 1" indicates that a picture whose POC is 1 larger is included in the current RPS. Here, "Before" indicates a picture ahead of the target picture, that is, a picture earlier in display order than the target picture, and "After" indicates a picture behind the target picture, that is, a picture later in display order than the target picture.
Fig. 32(c) shows an example of the current RPS derived when the RPS information illustrated in fig. 32(b) is applied with the POC of the target picture being 9. The picture with POC = 1 indicated by the long-term RPS information is included. In addition, the picture indicated by the short-term RPS information whose POC is 1 smaller than that of the target picture (POC = 9), that is, the picture with POC = 8, is included. Likewise, the pictures with POC = 5 and POC = 10 indicated by the short-term RPS information are included.
Fig. 32(d) and (e) show examples of reference picture lists generated from the reference pictures contained in the current RPS. Each element of the reference picture list is assigned an index (reference picture index; idx in the figure). Fig. 32(d) shows an example of the L0 reference list. The reference pictures contained in the current RPS, with POC of 5, 8, 10, and 1, are contained in the L0 reference list in this order. Fig. 32(e) shows an example of the L1 reference list. The L1 reference list contains the reference pictures of the current RPS with POC of 10, 5, and 8 in this order. As the example of the L1 reference list shows, not all reference pictures (pictures that can be referred to) contained in the current RPS need to be contained in the reference picture list. However, the number of elements of the reference picture list is at most the number of reference pictures contained in the current RPS; in other words, the length of the reference picture list is less than or equal to the number of pictures that can be referenced from the current picture.
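The derivation of the current RPS illustrated in fig. 32(b) and (c) can be sketched as follows (function and parameter names are illustrative):

```python
def derive_current_rps(target_poc, long_term_pocs, before_deltas, after_deltas):
    """Collect the POCs of the current RPS from the long-term and
    short-term RPS information (cf. fig. 32(b))."""
    rps = set(long_term_pocs)                        # long-term: POC given directly
    rps |= {target_poc - d for d in before_deltas}   # "Before, dPoc = d"
    rps |= {target_poc + d for d in after_deltas}    # "After, dPoc = d"
    return rps

# Target picture POC = 9 with long-term {1}, Before dPoc = 1 and 4,
# and After dPoc = 1, as in fig. 32(b):
current_rps = derive_current_rps(9, [1], [1, 4], [1])
```

This yields the current RPS {1, 5, 8, 10} of fig. 32(c).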
Next, an example of reference picture list correction is explained with reference to fig. 33. Fig. 33 illustrates the corrected reference picture list (fig. 33(c)) obtained when the RPL correction information (fig. 33(b)) is applied to a specific reference picture list (fig. 33(a)). The pre-correction L0 reference list shown in fig. 33(a) is the same as the L0 reference list illustrated in fig. 32(d). The RPL correction information shown in fig. 33(b) is a list whose elements are values of the reference picture index, and contains the values 0, 2, 1, and 3 in order from the beginning. The RPL correction information indicates that the reference pictures indicated by the reference picture indices 0, 2, 1, and 3 of the pre-correction reference list become, in this order, the reference pictures of the post-correction L0 reference list. Fig. 33(c) shows the corrected L0 reference list, which contains the pictures with POC of 5, 10, 8, and 1 in this order.
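The list correction of fig. 33 amounts to indexing the pre-correction list with the RPL correction information, as in this short sketch (names are illustrative):

```python
def modify_reference_list(ref_list, rpl_modification):
    """Entry k of the corrected list is the picture stored at reference
    picture index rpl_modification[k] of the pre-correction list."""
    return [ref_list[idx] for idx in rpl_modification]

# Pre-correction L0 list (POCs 5, 8, 10, 1; fig. 33(a)) reordered with
# the RPL correction information 0, 2, 1, 3 (fig. 33(b)):
l0_after = modify_reference_list([5, 8, 10, 1], [0, 2, 1, 3])
```

l0_after is [5, 10, 8, 1], matching the corrected L0 reference list of fig. 33(c).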
(moving Picture decoding processing flow)
The flow of the image decoding apparatus 1 generating the decoded image #2 from the input encoded data #1 is as follows.
(S11) the header decoding unit 10 decodes the VPS and SPS from the encoded data # 1.
(S12) the header decoding unit 10 decodes the PPS from the coded data # 1.
(S13) the pictures indicated by the coded data #1 are set as the target pictures in order. The processing of S14 to S17 is performed for each object picture.
(S14) The header decoding unit 10 decodes the slice header of each slice included in the target picture from the coded data #1. The reference picture information decoding unit 218 included in the header decoding unit 10 decodes RPS information from the slice header and outputs the RPS information to the reference picture set setting unit 131 included in the reference picture management unit 13. The reference picture information decoding unit 218 also decodes the RPL correction information from the slice header and outputs it to the reference picture list deriving unit 132.
(S15) The reference picture set setting unit 131 generates the reference picture set RPS to be applied to the target picture based on the RPS information, the POC of each local decoded image recorded in the decoded picture buffer 12, and its location information in the memory, and outputs the reference picture set RPS to the reference picture list derivation unit 132.
(S16) The reference picture list derivation unit 132 generates the reference picture list RPL based on the reference picture set RPS and the RPL correction information, and outputs the reference picture list RPL to the picture decoding unit 11.
(S17) The picture decoding unit 11 creates a local decoded image of the target picture from the coded data #1 based on the slice data of each slice included in the target picture and the reference picture list RPL, and records it in the decoded picture buffer in association with the POC of the target picture. The local decoded image recorded in the decoded picture buffer is output to the outside as the decoded image #2 at an appropriate timing determined based on the POC.
[ decoded picture buffer 12]
The decoded picture buffer 12 stores a local decoded image of each picture decoded by the picture decoding unit in association with a layer ID and POC (picture order information) of the picture. The decoded picture buffer 12 determines the POC of the output target at a predetermined output timing. Then, the local decoded image corresponding to the POC is output to the outside as one of the pictures constituting the decoded image # 2.
Fig. 24 is a conceptual diagram illustrating the configuration of the decoded picture buffer. The boxes marked with numbers in the figure represent the local decoded images, and the numbers represent the POC. As shown in fig. 24, the local decoded images of the plurality of layers are recorded in association with the layer ID and POC. Further, a view ID (view_id) and a depth flag (depth_flag) corresponding to the layer ID are also recorded in association with each local decoded image.
[ reference Picture managing section 13]
Fig. 31 is a schematic diagram showing the configuration of the reference picture management unit 13 according to the present embodiment. The reference picture management unit 13 includes a reference picture set setting unit 131 and a reference picture list derivation unit 132.
The reference picture set setting unit 131 constructs a reference picture set RPS based on the RPS information decoded by the reference picture information decoding unit 218 and the information of the local decoded image, the layer ID, and the POC recorded in the decoded picture buffer 12, and outputs the reference picture set RPS to the reference picture list deriving unit 132. The reference picture set setting unit 131 is described in detail later.
The reference picture list derivation unit 132 generates a reference picture list RPL based on the RPL correction information decoded by the reference picture information decoding unit 218 and the reference picture set RPS input from the reference picture set setting unit 131, and outputs the reference picture list RPL to the picture decoding unit 11. The reference picture list derivation unit 132 is described in detail later.
(detailed description of reference Picture information decoding processing)
The decoding processing of the RPS information and the RPL correction information in the processing of S14 in the decoding flow will be described in detail.
(RPS information decoding processing)
The RPS information is information decoded from the SPS or slice header in order to construct the reference picture set. The RPS information includes the following information.
SPS short-term RPS information: short-term reference picture set information included in SPS
SPS long-term RP information: long-term reference picture information contained in SPS
SH short-term RPS information: short-term reference picture set information contained in slice header
SH long-term RP information: long-term reference picture information contained in slice header
(1.SPS short-term RPS information)
The SPS short-term RPS information includes information of a plurality of short-term reference picture sets that can be utilized from each picture that refers to the SPS. Further, the short-term reference picture set is a set of pictures that can be reference pictures (short-term reference pictures) specified by a relative position to a target picture (for example, POC difference from the target picture).
Decoding of SPS short-term RPS information is explained with reference to fig. 34. Fig. 34 illustrates a part of the SPS syntax table used in the SPS decoding in the header decoding section 10 and the reference picture information decoding section 218. Part (A) of fig. 34 corresponds to SPS short-term RPS information. The SPS short-term RPS information includes: the number of short-term reference picture sets (num_short_term_ref_pic_sets) included in the SPS, and information of each short-term reference picture set (short_term_ref_pic_set(i)).
The short-term reference picture set information is explained with reference to fig. 35. Fig. 35 illustrates syntax tables of short-term reference picture sets used at the time of SPS decoding in the header decoding section 10 and the reference picture information decoding section 218, and at the time of slice header decoding.
The short-term reference picture set information includes: the number of short-term reference pictures whose display order is earlier than that of the target picture (num_negative_pics), and the number of short-term reference pictures whose display order is later than that of the target picture (num_positive_pics). In the following, a short-term reference picture whose display order is earlier than that of the target picture is referred to as a front short-term reference picture, and a short-term reference picture whose display order is later than that of the target picture is referred to as a rear short-term reference picture.
In addition, the short-term reference picture set information includes, for each front short-term reference picture: the absolute value of the POC difference with respect to the target picture (delta_poc_s0_minus1[i]), and the possibility of use as a reference picture of the target picture (used_by_curr_pic_s0_flag[i]). Similarly, it includes, for each rear short-term reference picture: the absolute value of the POC difference with respect to the target picture (delta_poc_s1_minus1[i]), and the possibility of use as a reference picture of the target picture (used_by_curr_pic_s1_flag[i]).
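As an illustration only, the syntax elements above can be mirrored by a simple container. The following is a minimal Python sketch with hypothetical names (the class `ShortTermRPS` and its fields are not part of any standard; reference pictures are described only by their POC deltas and use flags):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ShortTermRPS:
    """Hypothetical container mirroring the short_term_ref_pic_set elements."""
    delta_poc_s0_minus1: List[int]       # one entry per front short-term reference picture
    used_by_curr_pic_s0_flag: List[int]  # 1: usable as a reference of the target picture
    delta_poc_s1_minus1: List[int]       # one entry per rear short-term reference picture
    used_by_curr_pic_s1_flag: List[int]

    @property
    def num_negative_pics(self) -> int:  # front pictures (earlier in display order)
        return len(self.delta_poc_s0_minus1)

    @property
    def num_positive_pics(self) -> int:  # rear pictures (later in display order)
        return len(self.delta_poc_s1_minus1)
```

In this sketch, the picture counts are derived from the list lengths rather than carried as separate syntax elements, which keeps the container internally consistent.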
(2.SPS Long-term RP information)
The SPS long-term RP information includes information of a plurality of long-term reference pictures that can be utilized from each picture that refers to the SPS. Further, a long-term reference picture is a picture specified by an absolute position (e.g., POC) within a sequence.
Referring again to fig. 34, decoding of SPS long-term RP information is explained. Part (B) of fig. 34 corresponds to SPS long-term RP information. The SPS long-term RP information includes: information indicating the presence or absence of long-term reference pictures transmitted with the SPS (long_term_ref_pics_present_flag), the number of long-term reference pictures included in the SPS (num_long_term_ref_pics_sps), and information of each long-term reference picture. The information of a long-term reference picture includes: the POC of the reference picture (lt_ref_pic_poc_lsb_sps[i]), and the possibility of use as a reference picture of the target picture (used_by_curr_pic_lt_sps_flag[i]).
The POC of the reference picture may be the value of the POC associated with the reference picture, or may be the LSB (Least Significant Bit) of the POC, that is, the remainder obtained by dividing the POC by a predetermined power of 2.
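The LSB derivation described above can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical 8-bit LSB width as the default (in practice the width is signaled separately):

```python
def poc_lsb(poc: int, log2_max_pic_order_cnt_lsb: int = 8) -> int:
    # The LSB of a POC is the remainder of dividing the POC by a power of two.
    # The 8-bit default width here is purely illustrative.
    return poc % (1 << log2_max_pic_order_cnt_lsb)
```

For example, with an 8-bit width, a POC of 260 wraps around to an LSB of 4.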
(3.SH short-term RPS information)
The SH short-term RPS information includes information of a single short-term reference picture set that can be utilized from a picture that references the slice header.
Decoding of SH short-term RPS information is explained with reference to fig. 36. Fig. 36 illustrates a part of the slice header syntax table used in the slice header decoding in the header decoding unit 10 and the reference picture information decoding unit 218. Part (A) of fig. 36 corresponds to SH short-term RPS information. The SH short-term RPS information includes a flag (short_term_ref_pic_set_sps_flag) indicating whether a short-term reference picture set is selected from among the short-term reference picture sets whose decoding is completed with the SPS, or is explicitly included within the slice header. In the case of selecting from the sets whose decoding is completed with the SPS, an identifier (short_term_ref_pic_set_idx) of one decoded short-term reference picture set is included. In the case of being explicitly included in the slice header, information equivalent to the syntax table (short_term_ref_pic_set(idx)) described with reference to fig. 35 is included in the SH short-term RPS information.
(4.SH Long-term RP information)
SH long-term RP information contains information of long-term reference pictures that can be utilized from pictures that refer to slice headers.
The decoding of SH long-term RP information is explained with reference to fig. 36 again. Part (B) of fig. 36 corresponds to SH long-term RP information. SH long-term RP information is included in the slice header only when a long-term reference picture can be utilized in the target picture (long_term_ref_pics_present_flag). In the case where one or more long-term reference pictures have been decoded with the SPS (num_long_term_ref_pics_sps > 0), the number of reference pictures that can be referred to in the target picture among the long-term reference pictures decoded with the SPS (num_long_term_sps) is included in the SH long-term RP information. In addition, the number of long-term reference pictures explicitly transmitted in the slice header (num_long_term_pics) is included in the SH long-term RP information. Information (lt_idx_sps[i]) identifying the long-term reference pictures selected from among the long-term reference pictures transmitted with the SPS is also included in the SH long-term RP information. Further, for each of the num_long_term_pics long-term reference pictures explicitly included in the slice header, the information includes: the POC of the reference picture (poc_lsb_lt[i]), and the possibility of use as a reference picture of the target picture (used_by_curr_pic_lt_flag[i]).
(RPL correction information decoding processing)
The RPL correction information is information that is decoded from the SPS or the slice header and used to construct a reference picture list RPL. The RPL correction information includes SPS list correction information and SH list correction information.
(SPS List correction information)
The SPS list correction information is information included in the SPS and relates to restrictions on reference picture list correction. The SPS list correction information is explained with reference to fig. 34 again. Part (C) of fig. 34 corresponds to SPS list correction information. The SPS list correction information includes: a flag (restricted_ref_pic_lists_flag) indicating whether or not the reference picture lists are common to all slices included in a picture, and a flag (lists_modification_present_flag) indicating whether or not list-sorting related information exists within the slice header.
(SH List correction information)
The SH list modification information is information included in the slice header, and includes update information of the length of a reference picture list (reference list length) applied to the target picture, and ordering information of the reference picture list (reference list ordering information). The SH list modification information is explained with reference to fig. 37. Fig. 37 illustrates a part of the slice header syntax table used in the slice header decoding in the header decoding unit 10 and the reference picture information decoding unit 218. Part (C) of fig. 37 corresponds to SH list correction information.
The reference list length update information includes a flag (num_ref_idx_active_override_flag) indicating whether or not the list length is updated. When the flag indicates an update, the information further includes: information indicating the updated reference list length of the L0 reference list (num_ref_idx_l0_active_minus1), and information indicating the updated reference list length of the L1 reference list (num_ref_idx_l1_active_minus1).
Information contained in the slice header as reference list sorting information will be described with reference to fig. 38. Fig. 38 illustrates syntax tables of reference list sorting information used in slice header decoding in the header decoding section 10 and the reference picture information decoding section 218.
The reference list sorting information includes an L0 reference list sorting presence flag (ref_pic_list_modification_flag_l0). In the case where the value of the flag is 1 (in the case where there is a sorting of the L0 reference list) and NumPocTotalCurr is greater than 2, the L0 reference list sorting order (list_entry_l0[i]) is included in the reference list sorting information. Here, NumPocTotalCurr is a variable representing the number of reference pictures available for the current picture. Therefore, the L0 reference list sorting order is included within the slice header only when there is a sorting of the L0 reference list and the number of reference pictures available for the current picture is greater than 2.
Similarly, in the case where the target slice is a B slice, that is, in the case where the L1 reference list is available for the target picture, the L1 reference list sorting presence flag (ref_pic_list_modification_flag_l1) is included in the reference list sorting information. In the case where the value of the flag is 1 and NumPocTotalCurr is greater than 2, the L1 reference list sorting order (list_entry_l1[i]) is included in the reference list sorting information. In other words, the L1 reference list sorting order is included within the slice header only when there is a sorting of the L1 reference list and the number of reference pictures available for the current picture is greater than 2.
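The transmission condition shared by the L0 and L1 sorting orders can be expressed compactly. The following Python sketch uses hypothetical function and argument names to restate the condition described above:

```python
def list_entry_present(modification_flag: int, num_poc_total_curr: int) -> bool:
    # The sorting order (list_entry_lX) is transmitted only when reordering is
    # signaled for the list AND more than two reference pictures are available.
    return bool(modification_flag) and num_poc_total_curr > 2
```

When exactly two reference pictures are available and reordering is signaled, the order is not transmitted; the later construction steps infer the swap instead.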
(detailed description of reference Picture set derivation processing)
The process of S15 in the moving image decoding flow, that is, the reference picture set derivation process by the reference picture set setting section will be described in detail.
As described above, the reference picture set setting unit 131 generates the reference picture set RPS used for decoding the target picture based on the RPS information and the information recorded in the decoded picture buffer 12.
The reference picture set RPS is a set of pictures (pictures that can be referred to) that can be used as reference images in decoding, among target pictures or pictures subsequent to the target pictures in decoding order. The reference picture set is divided into the following two subsets according to the kinds of pictures that can be referred to.
List ListCurr that the current picture can refer to: a list of pictures on the decoded picture buffer that can be referred to by the target picture
List ListFoll that subsequent pictures can refer to: a list of pictures on the decoded picture buffer which, although not referred to by the target picture, can be referred to by pictures following the target picture in decoding order
In addition, the number of pictures included in the list that can be referred to by the current picture is referred to as the number of pictures NumCurrList that can be referred to by the current picture. The variable NumPocTotalCurr described with reference to fig. 38 above is the same as NumCurrList.
The list that the current picture can refer to is further made up of three partial lists.
Current picture long-term referenceable list ListLtCurr: pictures that can be referred to by the current picture, specified by the SPS long-term RP information or the SH long-term RP information
Current picture short-term front referenceable list ListStCurrBefore: pictures that can be referred to by the current picture, specified by the SPS short-term RPS information or the SH short-term RPS information, whose display order is earlier than that of the target picture
Current picture short-term rear referenceable list ListStCurrAfter: pictures that can be referred to by the current picture, specified by the SPS short-term RPS information or the SH short-term RPS information, whose display order is later than that of the target picture
The list that subsequent pictures can refer to is made up of two partial lists.
Subsequent picture long-term referenceable list ListLtFoll: pictures that can be referred to by subsequent pictures, specified by the SPS long-term RP information or the SH long-term RP information
Subsequent picture short-term referenceable list ListStFoll: pictures that can be referred to by subsequent pictures, specified by the SPS short-term RPS information or the SH short-term RPS information
In the case where the NAL unit type is other than IDR, the reference picture set setting section 131 generates the reference picture set RPS in the following procedure, in the order of: the current picture short-term front referenceable list ListStCurrBefore, the current picture short-term rear referenceable list ListStCurrAfter, the current picture long-term referenceable list ListLtCurr, the subsequent picture short-term referenceable list ListStFoll, and the subsequent picture long-term referenceable list ListLtFoll. In addition, a variable NumPocTotalCurr representing the number of pictures that the current picture can refer to is derived. Each of the above-described referenceable lists is set to be empty before starting the following processing. In the case where the NAL unit type is IDR, the reference picture set setting section 131 derives the reference picture set RPS as empty.
(S201) A single short-term reference picture set used for decoding of the target picture is specified based on the SPS short-term RPS information and the SH short-term RPS information. Specifically, when the value of short_term_ref_pic_set_sps_flag included in the SH short-term RPS information is 0, the short-term RPS explicitly transmitted with the slice header and included in the SH short-term RPS information is selected. Otherwise (when the value of short_term_ref_pic_set_sps_flag is 1), the short-term RPS indicated by short_term_ref_pic_set_idx included in the SH short-term RPS information is selected from among the plurality of short-term RPSs included in the SPS short-term RPS information.
(S202) the value of each POC of the reference pictures included in the selected short-term RPS is derived, and the position of the local decoded image recorded in the decoded picture buffer 12 in association with the POC value is detected and derived as the recording position in the decoded picture buffer of the reference picture.
In the case where the reference picture is a front short-term reference picture, the POC value of the reference picture is derived by subtracting the value of "delta_poc_s0_minus1[i] + 1" from the POC value of the target picture. On the other hand, in the case where the reference picture is a rear short-term reference picture, the POC value of the reference picture is derived by adding the value of "delta_poc_s1_minus1[i] + 1" to the POC value of the target picture.
(S203) The front reference pictures included in the short-term RPS are confirmed in the order of transmission, and a front reference picture is appended to the current picture short-term front referenceable list ListStCurrBefore if the value of the associated used_by_curr_pic_s0_flag[i] is 1. Otherwise (the value of used_by_curr_pic_s0_flag[i] is 0), the front reference picture is appended to the subsequent picture short-term referenceable list ListStFoll.
(S204) The rear reference pictures included in the short-term RPS are confirmed in the order of transmission, and a rear reference picture is appended to the current picture short-term rear referenceable list ListStCurrAfter when the value of the associated used_by_curr_pic_s1_flag[i] is 1. Otherwise (the value of used_by_curr_pic_s1_flag[i] is 0), the rear reference picture is appended to the subsequent picture short-term referenceable list ListStFoll.
(S205) Based on the SPS long-term RP information and the SH long-term RP information, a long-term reference picture set used for decoding of the target picture is specified. Specifically, num_long_term_sps reference pictures having the same layer ID as the target picture are selected from the reference pictures included in the SPS long-term RP information and sequentially added to the long-term reference picture set. The selected reference pictures are the reference pictures indicated by lt_idx_sps[i]. Next, num_long_term_pics reference pictures included in the SH long-term RP information are sequentially added to the long-term reference picture set. When the layer ID of the target picture is other than 0, a reference picture having a POC equal to the POC of the target picture is additionally added to the long-term reference picture set from among the pictures having a layer ID different from that of the target picture.
(S206) The POC value of each of the reference pictures included in the long-term reference picture set is derived. For a reference picture having the same layer ID as the target picture, the position of the local decoded image recorded in the decoded picture buffer 12 in association with that POC value is detected and derived as the recording position of the reference picture in the decoded picture buffer. For a reference picture having a layer ID different from that of the target picture, the position of the local decoded image recorded in association with that layer ID and the POC of the target picture is detected and derived as the recording position of the reference picture in the decoded picture buffer.
For a reference picture having the same layer ID as the target picture, the POC of the long-term reference picture is derived directly from the associated decoded value of poc_lsb_lt[i] or lt_ref_pic_poc_lsb_sps[i]. For a reference picture having a layer ID different from that of the target picture, the POC of the target picture is set.
(S207) The reference pictures included in the long-term reference picture set are sequentially confirmed, and in the case where the value of the associated used_by_curr_pic_lt_flag[i] or used_by_curr_pic_lt_sps_flag[i] is 1, the long-term reference picture is appended to the current picture long-term referenceable list ListLtCurr. Otherwise (the value of used_by_curr_pic_lt_flag[i] or used_by_curr_pic_lt_sps_flag[i] is 0), the long-term reference picture is appended to the subsequent picture long-term referenceable list ListLtFoll.
(S208) The value of the variable NumPocTotalCurr is set to the total number of reference pictures that can be referred to from the current picture. That is, the value of the variable NumPocTotalCurr is set to the sum of the numbers of elements of the three lists ListStCurrBefore, ListStCurrAfter, and ListLtCurr.
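Steps S202 to S204 and S208 above can be sketched in simplified form. The following Python illustration uses hypothetical names, represents reference pictures by their POC values only, and omits the long-term and inter-layer handling of S205 to S207:

```python
def derive_short_term_lists(curr_poc, rps):
    """Classify short-term reference pictures (sketch of S202-S204).

    `rps` is a hypothetical dict holding delta_poc_s0_minus1 /
    used_by_curr_pic_s0_flag for front pictures and the s1 counterparts
    for rear pictures.
    """
    st_curr_before, st_curr_after, st_foll = [], [], []
    for d, used in zip(rps["delta_poc_s0_minus1"],
                       rps["used_by_curr_pic_s0_flag"]):
        poc = curr_poc - (d + 1)              # front picture: subtract delta + 1
        (st_curr_before if used else st_foll).append(poc)
    for d, used in zip(rps["delta_poc_s1_minus1"],
                       rps["used_by_curr_pic_s1_flag"]):
        poc = curr_poc + (d + 1)              # rear picture: add delta + 1
        (st_curr_after if used else st_foll).append(poc)
    return st_curr_before, st_curr_after, st_foll

def num_poc_total_curr(st_curr_before, st_curr_after, lt_curr):
    # S208: total number of pictures the current picture can refer to
    return len(st_curr_before) + len(st_curr_after) + len(lt_curr)
```

For a target picture with POC 10 and a front picture at delta 0 marked unusable at delta 1, the front picture with POC 9 lands in ListStCurrBefore and the one with POC 8 in ListStFoll.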
(details of reference Picture List construction processing)
The processing of S16 in the above decoding flow, that is, the reference picture list construction processing will be described in detail with reference to fig. 1. As described above, the reference picture list derivation unit 132 generates the reference picture list RPL based on the reference picture set RPS and the RPL correction information.
The reference picture list is composed of two lists, an L0 reference list and an L1 reference list. First, a flow of constructing the L0 reference list will be described. The L0 reference list is constructed according to the flow shown in S301 to S307 below.
(S301) a temporary L0 reference list is generated and initialized to an empty list.
(S302) The reference pictures included in the current picture short-term front referenceable list ListStCurrBefore are sequentially added to the temporary L0 reference list.
(S303) The reference pictures included in the current picture short-term rear referenceable list ListStCurrAfter are sequentially added to the temporary L0 reference list.
(S304) The reference pictures included in the current picture long-term referenceable list ListLtCurr are sequentially added to the temporary L0 reference list.
(S305) When the reference picture list is corrected (when the value of lists_modification_present_flag included in the RPL correction information is 1), the following processes of S306a to S306c are executed. Otherwise (when the value of lists_modification_present_flag is 0), the process of S307 is executed.
(S306a) When the correction of the L0 reference list is valid (when the value of ref_pic_list_modification_flag_l0 included in the RPL correction information is 1) and the number of pictures NumCurrList that can be referred to by the current picture is equal to 2, S306b is executed. Otherwise, S306c is executed.
(S306b) The value of list_entry_l0[i] included in the RPL correction information is set according to the following equations, and then S306c is executed.
list_entry_l0[0]=1
list_entry_l0[1]=0
(S306c) The elements of the temporary L0 reference list are sorted based on the value of the reference list sorting order list_entry_l0[i] to form the L0 reference list. The element RefPicList0[rIdx] of the L0 reference list corresponding to the reference picture index rIdx is derived as follows. Here, RefPicListTemp0[i] represents the i-th element of the temporary L0 reference list.
RefPicList0[rIdx]=RefPicListTemp0[list_entry_l0[rIdx]]
According to the above equation, by referring to the value recorded at the position indicated by the reference picture index rIdx in the reference list sorting order list_entry_l0[i], the reference picture recorded at the position of that value in the temporary L0 reference list is stored as the reference picture at position rIdx of the L0 reference list.
(S307) The temporary L0 reference list is made the L0 reference list.
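The flow of S301 to S307 above can be summarized in a short sketch. This is a simplified Python illustration with hypothetical names, in which reference pictures are represented by their POC values:

```python
def build_l0_reference_list(st_curr_before, st_curr_after, lt_curr,
                            lists_modification_present=0,
                            ref_pic_list_modification_flag_l0=0,
                            list_entry_l0=None):
    # S301-S304: build the temporary L0 reference list in the prescribed order
    temp = list(st_curr_before) + list(st_curr_after) + list(lt_curr)
    if not lists_modification_present:
        return temp                             # S305 -> S307: use temp as-is
    if ref_pic_list_modification_flag_l0 and len(temp) == 2:
        list_entry_l0 = [1, 0]                  # S306a/S306b: swap is inferred
    if list_entry_l0 is None:
        list_entry_l0 = list(range(len(temp)))  # identity ordering
    # S306c: RefPicList0[rIdx] = RefPicListTemp0[list_entry_l0[rIdx]]
    return [temp[list_entry_l0[r]] for r in range(len(temp))]
```

The L1 list would be built by the same function with the L1 inputs substituted, mirroring the replacement rule stated for S301 to S307.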
An L1 reference list is then constructed. The L1 reference list can be constructed in the same flow as the L0 reference list. In the flow of constructing the L0 reference list (S301 to S307), the L0 reference picture, the L0 reference list, the temporary L0 reference list, and the list _ entry _ L0 may be replaced with the L1 reference picture, the L1 reference list, the temporary L1 reference list, and the list _ entry _ L1, respectively.
In the above description, an example in which the reference list sorting order is omitted when the number of pictures that can be referred to by the current picture is 2 was described with reference to fig. 38, but the present invention is not limited thereto. The sorting order may also be omitted in the case where the number of pictures that can be referred to by the current picture is 1. Specifically, in the decoding process of the SH list correction information in the reference picture information decoding unit 218, the reference list sorting information is analyzed based on the syntax table shown in fig. 39. Fig. 39 illustrates a syntax table of reference list sorting information used when a slice header is decoded.
[ Picture decoding section 11]
The picture decoding unit 11 generates a local decoded image of each picture based on the encoded data #1, the header information input by the header decoding unit 10, the reference picture recorded in the decoded picture buffer 12, and the reference picture list input by the reference picture list deriving unit 132, and records the local decoded image in the decoded picture buffer 12.
Fig. 5 is a schematic diagram showing the configuration of the picture decoding unit 11 according to the present embodiment. The picture decoding unit 11 includes an entropy decoding unit 301, a prediction parameter decoding unit 302, a prediction parameter memory (prediction parameter storage unit) 307, a predicted image generation unit 308, an inverse quantization/inverse DCT unit 311, and an addition unit 312.
The prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The predicted image generator 308 includes an inter-predicted image generator 309 and an intra-predicted image generator 310.
The entropy decoding unit 301 performs entropy decoding on the encoded data #1 input from the outside, separates and decodes each code (syntax element). The separated codes include prediction information for generating a prediction image, residual information for generating a difference image, and the like.
The entropy decoding unit 301 outputs a part of the separated codes to the prediction parameter decoding unit 302. The separated codes include, for example: the prediction mode PredMode, the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX. Which codes are decoded is controlled based on an instruction from the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs the quantized coefficients to the inverse quantization/inverse DCT unit 311. The quantized coefficients are coefficients obtained by performing a DCT (Discrete Cosine Transform) on the residual signal and quantizing the result in the encoding process.
The inter prediction parameter decoding unit 303 decodes the inter prediction parameter with reference to the prediction parameter stored in the prediction parameter memory 307 based on the code input from the entropy decoding unit 301.
The inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameters to the predicted image generation unit 308, and stores the parameters in the prediction parameter memory 307. The inter prediction parameter decoding unit 303 will be described in detail later.
The intra-prediction parameter decoding unit 304 generates intra-prediction parameters with reference to the prediction parameters stored in the prediction parameter memory 307, based on the code input from the entropy decoding unit 301. The intra prediction parameter is information required to generate a predicted image of a block to be decoded by using intra prediction, and is, for example, an intra prediction mode IntraPredMode.
The intra prediction parameter decoding unit 304 decodes the depth intra prediction mode dmm_mode from the input code. The intra prediction parameter decoding unit 304 generates an intra prediction mode IntraPredMode from the following expression using the depth intra prediction mode dmm_mode.
IntraPredMode=dmm_mode+35
In the case where the depth intra prediction mode dmm_mode is 0 or 1, that is, MODE_DMM_WFULL or MODE_DMM_WFULLDELTA, the intra prediction parameter decoding section 304 decodes the Wedgelet mode index wedge_full_tab_idx from the input code.
When the depth intra prediction mode dmm_mode is MODE_DMM_WFULLDELTA or MODE_DMM_CPREDTEXDELTA, the intra prediction parameter decoding unit 304 decodes the DC1 absolute value, the DC1 sign, the DC2 absolute value, and the DC2 sign from the input code. The intra prediction parameter decoding unit 304 generates the quantization offset DC1 DmmQuantOffsetDC1 and the quantization offset DC2 DmmQuantOffsetDC2 from the DC1 absolute value, the DC1 sign, the DC2 absolute value, and the DC2 sign by the following equations.
DmmQuantOffsetDC1=(1-2*dmm_dc_1_sign_flag)*dmm_dc_1_abs
DmmQuantOffsetDC2=(1-2*dmm_dc_2_sign_flag)*dmm_dc_2_abs
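The two equations above, together with the expression IntraPredMode = dmm_mode + 35, can be illustrated as follows. The function names are hypothetical and this is a sketch of the arithmetic only, not the actual decoder implementation:

```python
def dmm_quant_offset(sign_flag: int, abs_value: int) -> int:
    # DmmQuantOffsetDC = (1 - 2 * sign_flag) * abs:
    # sign_flag == 1 negates the absolute value, sign_flag == 0 keeps it positive
    return (1 - 2 * sign_flag) * abs_value

def intra_pred_mode_from_dmm(dmm_mode: int) -> int:
    # IntraPredMode = dmm_mode + 35: depth modes are numbered after mode 34
    return dmm_mode + 35
```

So a sign flag of 1 with absolute value 5 yields an offset of -5, and dmm_mode 0 maps to IntraPredMode 35.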
The intra prediction parameter decoding unit 304 uses the generated intra prediction mode IntraPredMode, the decoded Wedgelet mode index wedge_full_tab_idx, DeltaEnd, the quantization offset DC1 DmmQuantOffsetDC1, and the quantization offset DC2 DmmQuantOffsetDC2 as intra prediction parameters.
The intra-prediction parameter decoding unit 304 outputs the intra-prediction parameters to the predicted image generation unit 308, and stores the parameters in the prediction parameter memory 307.
The prediction parameter memory 307 stores the prediction parameters at predetermined positions for each picture and block to be decoded. Specifically, the prediction parameter memory 307 stores: the inter prediction parameters decoded by the inter prediction parameter decoding unit 303, the intra prediction parameters decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301. The stored inter prediction parameters include, for example: the prediction list use flag predFlagLX (the inter prediction flag inter_pred_idx), the reference picture index refIdxLX, and the vector mvLX.
The prediction mode predMode input from the entropy decoding unit 301 is input to the prediction image generation unit 308, and the prediction parameters are input from the prediction parameter decoding unit 302. The predicted image generator 308 reads out the reference picture from the decoded picture buffer 12. The predicted image generation unit 308 generates a predicted image block P (predicted image) using the input prediction parameters and the read reference picture in a prediction mode indicated by the prediction mode predMode.
Here, when the prediction mode predMode indicates the inter prediction mode, the inter-prediction image generation unit 309 generates the predicted picture block P by inter prediction, using the inter prediction parameters input from the inter-prediction parameter decoding unit 303 and the read reference picture. The predicted picture block P corresponds to a PU. As described above, a PU corresponds to a part of a picture composed of a plurality of pixels that is the unit of the prediction process, that is, a decoding target block on which the prediction process is performed at one time.
For the reference picture list (the L0 reference list or the L1 reference list) whose prediction list use flag predFlagLX is 1, the inter-prediction image generator 309 reads out from the decoded picture buffer 12 the reference picture block located at the position indicated by the vector mvLX relative to the decoding target block, in the reference picture indicated by the reference picture index refIdxLX. The inter-prediction image generator 309 generates the predicted picture block P by performing prediction on the read reference picture block, and outputs the generated predicted picture block P to the adder 312.
When the prediction mode predMode indicates the intra prediction mode, the intra-prediction image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra-prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra-prediction image generator 310 reads out, from the decoded picture buffer 12, a reference picture block within a predetermined range from the decoding target block in the decoding target picture, among the already decoded blocks. The predetermined range is, for example, any of the adjacent blocks to the left, upper left, above, and upper right when the decoding target block moves sequentially in the so-called raster scan order, and differs depending on the intra prediction mode. The raster scan order is an order in which, in each picture, the rows are traversed sequentially from the top to the bottom, each row being traversed from the left end to the right end.
The intra-prediction image generator 310 generates a prediction picture block using the read reference picture block and the input prediction parameters.
The intra-prediction image generator 310 outputs the generated prediction picture block P to the adder 312.
The inverse quantization/inverse DCT unit 311 inversely quantizes the quantization coefficient input from the entropy decoding unit 301 to obtain a DCT coefficient. The inverse quantization/inverse DCT unit 311 performs inverse DCT (inverse discrete cosine transform) on the obtained DCT coefficient to calculate a decoded residual signal. The inverse quantization/inverse DCT unit 311 outputs the calculated decoded residual signal to the addition unit 312.
The adder 312 adds, for each pixel, the predicted picture block P input from the inter-prediction image generator 309 or the intra-prediction image generator 310 and the signal value of the decoded residual signal input from the inverse quantization/inverse DCT unit 311 to generate a reference picture block. The adder 312 stores the generated reference picture block in the decoded picture buffer 12, and outputs to the outside the decoded layer image Td in which the generated reference picture blocks are integrated for each picture.
(configuration of inter-frame prediction parameter decoding section)
Next, the configuration of the inter prediction parameter decoding unit 303 will be described.
Fig. 6 is a schematic diagram showing the configuration of the inter prediction parameter decoding unit 303 according to the present embodiment. The inter prediction parameter decoding unit 303 includes: the inter prediction parameter decoding control unit 3031, the AMVP prediction parameter derivation unit 3032, the addition unit 3035, and the merged prediction parameter derivation unit 3036.
The inter prediction parameter decoding control unit 3031 instructs the entropy decoding unit 301 to decode codes (syntax elements) associated with inter prediction, and extracts, for example, the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX from the codes (syntax elements) included in the encoded data.
The inter prediction parameter decoding control unit 3031 first extracts the merge flag merge_flag. Here, "extracting" a certain syntax element means that the inter-prediction parameter decoding control unit 3031 instructs the entropy decoding unit 301 to decode that syntax element, that is, reads the syntax element from the encoded data. When the value indicated by the merge flag is 1, that is, when the merge prediction mode is indicated, the inter-prediction parameter decoding control unit 3031 extracts the merge index merge_idx as a prediction parameter for merge prediction. The inter prediction parameter decoding control unit 3031 outputs the extracted merge index merge_idx to the merge prediction parameter derivation unit 3036.
When the merge flag merge_flag is 0, that is, when the AMVP prediction mode is indicated, the inter prediction parameter decoding control unit 3031 extracts the AMVP prediction parameters from the encoded data using the entropy decoding unit 301. Examples of AMVP prediction parameters include: the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the vector index mvp_LX_idx, and the difference vector mvdLX. The inter prediction parameter decoding control unit 3031 outputs the prediction list use flag predFlagLX derived from the extracted inter prediction flag inter_pred_idc and the reference picture index refIdxLX to the AMVP prediction parameter derivation unit 3032 and the predicted image generation unit 308 (fig. 5), and stores them in the prediction parameter memory 307 (fig. 5). The inter-prediction parameter decoding control unit 3031 outputs the extracted vector index mvp_LX_idx to the AMVP prediction parameter derivation unit 3032. The inter-prediction parameter decoding control unit 3031 outputs the extracted difference vector mvdLX to the addition unit 3035.
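The branch between merge prediction and AMVP described above can be summarized in a short sketch. This is an illustrative Python model, not the actual decoder: the `decode` callback stands in for the entropy decoding unit 301, and the syntax-element names follow the text (merge_flag, merge_idx, inter_pred_idx, refIdxLX, mvp_LX_idx, mvdLX).

```python
# Illustrative sketch of the inter prediction parameter extraction branch
# (hypothetical model; the real decoder reads these from entropy-coded data).
def extract_inter_params(decode):
    """decode(name) plays the role of the entropy decoding unit 301."""
    params = {"merge_flag": decode("merge_flag")}
    if params["merge_flag"] == 1:
        # merge prediction mode: only the merge index is extracted
        params["merge_idx"] = decode("merge_idx")
    else:
        # AMVP mode: inter prediction flag, reference picture index,
        # prediction vector index, and difference vector
        for name in ("inter_pred_idx", "refIdxLX", "mvp_LX_idx", "mvdLX"):
            params[name] = decode(name)
    return params

stream = {"merge_flag": 0, "inter_pred_idx": 1, "refIdxLX": 0,
          "mvp_LX_idx": 1, "mvdLX": (3, -2)}
params = extract_inter_params(stream.__getitem__)
```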
Fig. 7 is a schematic diagram showing the configuration of the merged prediction parameter deriving unit 3036 according to the present embodiment. The merged prediction parameter deriving unit 3036 includes: a merge candidate derivation unit 30361 and a merge candidate selection unit 30362. The merge candidate derivation unit 30361 includes: the merge candidate storage unit 303611, the extended merge candidate derivation unit 303612, the basic merge candidate derivation unit 303613, and the MPI candidate derivation unit 303614.
The merge candidate storage unit 303611 stores the merge candidates input from the extended merge candidate derivation unit 303612 and the basic merge candidate derivation unit 303613. A merge candidate consists of the prediction list use flag predFlagLX, the vector mvLX, and the reference picture index refIdxLX. The merge candidate storage unit 303611 assigns indices to the stored merge candidates according to a predetermined rule. For example, it assigns "0" as the index of the merge candidate input from the extended merge candidate derivation unit 303612 or the MPI candidate derivation unit 303614.
When the layer of the target block is a depth layer and motion parameter inheritance can be used, that is, when the depth flag depth_flag and the motion parameter inheritance flag use_mpi_flag are both 1, the MPI candidate derivation unit 303614 derives a merge candidate using the motion compensation parameters of a layer different from the target layer. As the layer different from the target layer, for example, a picture of a texture layer having the same view_id and the same POC as the target depth picture is used.
The MPI candidate derivation unit 303614 reads, from the prediction parameter memory 307, the prediction parameters of a block (also referred to as a corresponding block) having the same coordinates as the target block in a picture of a layer different from the target layer.
When the corresponding block is smaller in size than the target block, the MPI candidate derivation unit 303614 reads the split flag split_flag of the CTU having the same coordinates as the target block in the corresponding texture picture and the prediction parameters of the plurality of blocks included in that CTU.
When the size of the corresponding block is larger than the target block, the MPI candidate derivation unit 303614 reads the prediction parameter of the corresponding block.
The MPI candidate derivation unit 303614 outputs the read prediction parameters to the merge candidate storage unit 303611 as merge candidates. When the split flag split_flag of the CTU is also read, the split information is also included in the merge candidates.
The extended merge candidate derivation unit 303612 includes: the displacement vector acquisition unit 3036122, the inter-layer merge candidate derivation unit 3036121, and the inter-layer displacement merge candidate derivation unit 3036123.
If the layer of the target block is not a depth layer or motion parameter inheritance cannot be used, that is, if either the depth flag depth_flag or the motion parameter inheritance flag use_mpi_flag is 0, the extended merge candidate derivation unit 303612 derives merge candidates. The extended merge candidate derivation unit 303612 may also derive merge candidates when both the depth flag depth_flag and the motion parameter inheritance flag use_mpi_flag are 1. In that case, the merge candidate storage unit 303611 assigns different indices to the merge candidates derived by the extended merge candidate derivation unit 303612 and the MPI candidate derivation unit 303614.
The displacement vector acquisition unit 3036122 first acquires a displacement vector in order from a plurality of candidate blocks adjacent to the decoding target block (for example, the blocks adjacent to the left, above, and at the upper right). Specifically, one of the candidate blocks is selected, and whether the vector of the selected candidate block is a motion vector or a displacement vector is determined using the reference picture index refIdxLX of the candidate block; when the candidate block has a displacement vector, that vector is taken as the displacement vector. When the candidate block has no displacement vector, the next candidate block is scanned in order. When no adjacent block has a displacement vector, the displacement vector acquisition unit 3036122 attempts to acquire the displacement vector of the block at the position corresponding to the target block among the blocks included in a reference picture at a different time in display order. When no displacement vector can be acquired, the displacement vector acquisition unit 3036122 sets a zero vector as the displacement vector. The displacement vector acquisition unit 3036122 outputs the displacement vector to the inter-layer merge candidate derivation unit 3036121 and the inter-layer displacement merge candidate derivation unit 3036123.
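The scan order described above (spatial candidates first, then the temporal candidate, then a zero-vector fallback) can be modeled as a simple search. In this hypothetical sketch, each candidate is a (vector, is_displacement) pair, where the boolean stands in for the check against the candidate's reference picture index refIdxLX.

```python
def acquire_displacement_vector(spatial_candidates, temporal_candidate=None):
    """Return the first displacement vector found among the spatial
    candidates, then the temporal candidate, falling back to the
    zero vector when none is available."""
    for vec, is_displacement in spatial_candidates:
        if is_displacement:
            return vec
    if temporal_candidate is not None:
        vec, is_displacement = temporal_candidate
        if is_displacement:
            return vec
    return (0, 0)  # zero vector fallback

# Left / above / upper-right neighbours carry only motion vectors here,
# so the temporal candidate's displacement vector is selected.
dv = acquire_displacement_vector(
    [((5, 1), False), ((2, 0), False), ((7, 3), False)],
    temporal_candidate=((4, -1), True))
```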
The interlayer merging candidate derivation unit 3036121 receives the displacement vector from the displacement vector acquisition unit 3036122. The inter-layer merge candidate derivation unit 3036121 selects a block indicated by the displacement vector input from the displacement vector acquisition unit 3036122 from a picture having the same POC as the decoding target picture of another layer (for example, the base layer or the base view), and reads the prediction parameter, which is the motion vector of the block, from the prediction parameter memory 307. More specifically, when the center point of the target block is set as the starting point, the prediction parameter read by the inter-layer merge candidate derivation unit 3036121 is a prediction parameter of a block including a coordinate obtained by adding the coordinate of the starting point and the displacement vector.
When the coordinates of the target block are (xP, yP), the displacement vector is (mvDisp[0], mvDisp[1]), and the width and height of the target block are nPSW and nPSH, the coordinates (xRef, yRef) of the reference block are derived by the following equations:
xRef=Clip3(0,PicWidthInSamplesL-1,xP+((nPSW-1)>>1)+((mvDisp[0]+2)>>2))
yRef=Clip3(0,PicHeightInSamplesL-1,yP+((nPSH-1)>>1)+((mvDisp[1]+2)>>2))
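The reference-block coordinate derivation above can be written directly in code. The sketch below follows the two equations term by term: the center offset ((nPSW-1)>>1) is added to the block position, the displacement vector is rounded from quarter-pel to integer precision by ((mv+2)>>2), and the result is clipped to the picture bounds. The picture dimensions used in the example are illustrative.

```python
def clip3(lo, hi, v):
    """Clip3(lo, hi, v) as used in the derivation equations."""
    return max(lo, min(hi, v))

def reference_block_coords(xP, yP, nPSW, nPSH, mvDisp,
                           PicWidthInSamplesL, PicHeightInSamplesL):
    # Center of the target block plus the displacement vector,
    # rounded from quarter-pel to integer precision.
    xRef = clip3(0, PicWidthInSamplesL - 1,
                 xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))
    yRef = clip3(0, PicHeightInSamplesL - 1,
                 yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))
    return xRef, yRef

# 16x16 block at (64, 32) with quarter-pel displacement (10, -6)
coords = reference_block_coords(64, 32, 16, 16, (10, -6), 1920, 1080)
```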
The interlayer displacement combination candidate derivation unit 3036123 receives the displacement vector from the displacement vector acquisition unit 3036122. The inter-layer displacement merge candidate derivation unit 3036123 outputs the input displacement vector and the reference picture index refIdxLX (e.g., an index of a base layer image having the same POC as the decoding target picture) of the layer image pointed to by the displacement vector to the merge candidate storage unit 303611 as a merge candidate. This merging candidate is an inter-layer candidate for displacement prediction (inter-view candidate) and is also referred to as an inter-layer merging candidate (displacement prediction).
The basic merge candidate derivation unit 303613 includes: the spatial merge candidate derivation unit 3036131, the temporal merge candidate derivation unit 3036132, the combined merge candidate derivation unit 3036133, and the zero merge candidate derivation unit 3036134.
According to a predetermined rule, the spatial merge candidate derivation unit 3036131 reads the prediction parameters (prediction list use flag predFlagLX, vector mvLX, reference picture index refIdxLX) stored in the prediction parameter memory 307, and derives the read prediction parameters as merge candidates. The read prediction parameters are prediction parameters for each block (for example, all or a part of blocks connected to the lower left end, upper left end, and upper right end of the decoding target block, respectively) within a range predetermined by the decoding target block. The derived merge candidates are stored in the merge candidate storage unit 303611.
The temporal merge candidate derivation unit 3036132 reads the prediction parameters of the block in the reference image including the coordinates of the lower right of the decoding target block from the prediction parameter memory 307 as merge candidates. The reference image may be specified by, for example, the reference picture index refIdxLX specified in the slice header, or may be specified by the smallest one of the reference picture indexes refIdxLX of blocks adjacent to the decoding target block. The derived merge candidates are stored in the merge candidate storage unit 303611.
The combined merge candidate derivation unit 3036133 derives combined merge candidates by combining the vectors and reference picture indices of two different merge candidates, already derived and stored in the merge candidate storage unit 303611, as the L0 and L1 vectors respectively. The derived merge candidates are stored in the merge candidate storage unit 303611.
The zero merge candidate derivation unit 3036134 derives a merge candidate whose reference picture index refIdxLX is 0 and whose vector mvLX has both X and Y components equal to 0. The derived merge candidate is stored in the merge candidate storage unit 303611.
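The way the basic merge candidate derivation units fill the list can be sketched as follows. This is a hypothetical model, not the actual derivation: it appends spatial, temporal, and combined candidates in the order described above, and the assumption that zero merge candidates pad the list up to a fixed maximum size is illustrative.

```python
def build_merge_list(spatial, temporal, combined, max_candidates=5):
    """Each candidate is a (mvLX, refIdxLX) tuple; order follows the text:
    spatial, temporal, combined, then zero merge candidates as padding."""
    merge_list = []
    for cand in list(spatial) + list(temporal) + list(combined):
        if len(merge_list) < max_candidates:
            merge_list.append(cand)
    while len(merge_list) < max_candidates:
        merge_list.append(((0, 0), 0))  # zero merge candidate
    return merge_list

lst = build_merge_list(spatial=[((2, 1), 0), ((0, 3), 1)],
                       temporal=[((1, 1), 0)], combined=[])
```

A merge index merge_idx then simply selects one entry of this list as the inter prediction parameters of the target PU.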
The merge candidate selecting unit 30362 selects, as the inter prediction parameters of the target PU, the merge candidate assigned the index corresponding to the merge index merge_idx input from the inter prediction parameter decoding control unit 3031, from among the merge candidates stored in the merge candidate storage unit 303611. The merge candidate selecting unit 30362 stores the selected merge candidate in the prediction parameter memory 307 (fig. 5), and outputs the merge candidate to the predicted image generation unit 308 (fig. 5). When the merge candidate selector 30362 selects the merge candidate derived by the MPI candidate derivation unit 303614 and the merge candidate includes the split flag split_flag, the plurality of prediction parameters corresponding to the blocks split by the split flag split_flag are stored in the prediction parameter memory 307 and output to the predicted image generation unit 308.
Fig. 8 is a schematic diagram showing the configuration of the AMVP prediction parameter derivation unit 3032 according to this embodiment. The AMVP prediction parameter derivation unit 3032 includes a vector candidate derivation unit 3033 and a prediction vector selection unit 3034. The vector candidate derivation unit 3033 reads the vector (motion vector or displacement vector) stored in the prediction parameter memory 307 (fig. 5) as a vector candidate based on the reference picture index refIdx. The read-out vector is a vector associated with each block within a range predetermined by the decoding target block (for example, all or a part of blocks respectively adjoining the lower left end, upper left end, and upper right end of the decoding target block).
The vector predictor selector 3034 selects, as the prediction vector mvpLX, the vector candidate indicated by the vector index mvp_LX_idx input from the inter-prediction parameter decoding controller 3031, from among the vector candidates read by the vector candidate derivation unit 3033. The prediction vector selection unit 3034 outputs the selected prediction vector mvpLX to the addition unit 3035.
Fig. 9 is a conceptual diagram illustrating an example of vector candidates. The prediction vector list 602 shown in fig. 9 is a list including a plurality of vector candidates derived by the vector candidate derivation unit 3033. In the prediction vector list 602, the five rectangles arranged in a horizontal row represent regions each indicating a prediction vector. The downward arrow directly below the second rectangle from the left end, labeled mvp_LX_idx, and mvpLX below it, indicate that the vector index mvp_LX_idx is an index referring to the vector mvpLX in the prediction parameter memory 307.
A vector candidate is generated based on the vector of a block for which decoding processing has been completed and which lies in a range predetermined with respect to the decoding target block (for example, an adjacent block). The adjacent blocks include blocks spatially adjacent to the target block, such as the left block and the upper block, and blocks temporally adjacent to the target block, such as a block at the same position as the target block but with a different display time.
The adder 3035 adds the prediction vector mvpLX input from the vector predictor selector 3034 and the difference vector mvdLX input from the inter-prediction parameter decoding control unit 3031 to calculate the vector mvLX. The adder 3035 outputs the calculated vector mvLX to the prediction image generator 308 (fig. 5).
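The addition performed by the adder 3035 is a per-component sum of the prediction vector and the difference vector; a minimal sketch:

```python
def add_vectors(mvpLX, mvdLX):
    # vector mvLX = prediction vector mvpLX + difference vector mvdLX,
    # computed independently for the horizontal and vertical components
    return (mvpLX[0] + mvdLX[0], mvpLX[1] + mvdLX[1])

mvLX = add_vectors((4, -2), (1, 3))
```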
(configuration of inter-frame prediction parameter decoding control section)
Next, the configuration of the inter prediction parameter decoding control unit 3031 will be described. As shown in fig. 10, the inter prediction parameter decoding control unit 3031 includes: the merge index decoding unit 30312, the vector candidate index decoding unit 30313, and a split mode decoding unit, a merge flag decoding unit, an inter prediction flag decoding unit, a reference picture index decoding unit, and a vector difference decoding unit, which are not shown. The split mode decoding unit, the merge flag decoding unit, the merge index decoding unit, the inter prediction flag decoding unit, the reference picture index decoding unit, the vector candidate index decoding unit 30313, and the vector difference decoding unit decode the split mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX, respectively.
The additional prediction flag decoding unit 30311 includes an additional prediction flag determination unit 30314. The additional prediction flag determination unit 30314 determines whether the additional prediction flag xpred_flag is included in the encoded data (that is, whether the additional prediction flag is to be read from the encoded data and decoded). When the additional prediction flag determination unit 30314 determines that the additional prediction flag is included in the encoded data, the additional prediction flag decoding unit 30311 notifies the entropy decoding unit 301 of the decoding of the additional prediction flag, and the entropy decoding unit 301 extracts the syntax element corresponding to the additional prediction flag from the encoded data. On the other hand, when the additional prediction flag determination unit 30314 determines that the flag is not included in the encoded data, a value indicating that additional prediction is performed (here, 1) is derived (inferred) as the additional prediction flag. The additional prediction flag determination unit 30314 will be described later.
(Displacement vector acquiring section)
The displacement vector acquisition unit reads, with reference to the prediction parameter memory 307, the prediction flag predFlagLX, the reference picture index refIdxLX, and the vector mvLX of each block adjacent to the target PU. The displacement vector acquisition unit sequentially reads the prediction parameters of the blocks adjacent to the target PU, and determines, from the reference picture index of each adjacent block, whether the adjacent block has a displacement vector. When an adjacent block has a displacement vector, that displacement vector is output. When no displacement vector exists in the prediction parameters of the adjacent blocks, a zero vector is output as the displacement vector.
(inter-prediction image generator 309)
Fig. 10 is a schematic diagram showing the configuration of the inter-prediction image generator 309 according to the present embodiment. The inter-prediction image generator 309 includes a motion displacement compensator 3091, a residual predictor 3092, an illumination compensator 3093, and a weight predictor 3094.
(motion displacement compensation)
The motion displacement compensation unit 3091 generates a motion displacement compensated image by reading out, from the reference picture memory 306, the block at the position shifted by the vector mvLX from the position of the target block in the reference picture specified by the reference picture index refIdxLX, based on the prediction list use flag predFlagLX, the reference picture index refIdxLX, and the vector mvLX input from the inter-prediction parameter decoding unit 303. Here, when the vector mvLX is not an integer vector, a filter called a motion compensation filter (or displacement compensation filter) for generating pixels at fractional positions is applied to generate the motion displacement compensated image. In general, the above process is called motion compensation when the vector mvLX is a motion vector, and displacement compensation when it is a displacement vector; here the two are collectively referred to as motion displacement compensation. Hereinafter, the motion displacement compensated image of L0 prediction is referred to as predSamplesL0, and that of L1 prediction as predSamplesL1; when the two are not distinguished, they are referred to as predSamplesLX. An example in which residual prediction and illumination compensation are further performed on the motion displacement compensated image predSamplesLX obtained by the motion displacement compensation unit 3091 will be described below, and these output images are also referred to as motion displacement compensated images predSamplesLX. In the following residual prediction and illumination compensation, when the input image and the output image are distinguished, the input image is expressed as predSamplesLX and the output image as predSamplesLX'.
(residual prediction)
When the residual prediction implementation flag resPredFlag is 1, the residual prediction unit 3092 performs residual prediction on the input motion displacement compensated image predSamplesLX. When the residual prediction implementation flag resPredFlag is 0, the input motion displacement compensated image predSamplesLX is output as-is. Residual prediction is performed by adding the residual refResSamples of a reference layer (first layer image), which differs from the target layer (second layer image) that is the target of predicted image generation, to the motion displacement compensated image predSamplesLX, which is the image predicted for the target layer. That is, on the assumption that a residual similar to that of the reference layer also arises in the target layer, the already derived residual of the reference layer is used as an estimate of the residual of the target layer. In the base layer (base view), only images of the same layer serve as reference images.
(illuminance Compensation)
When the illumination compensation flag ic_enable_flag is 1, the illumination compensation unit 3093 performs illumination compensation on the input motion displacement compensated image predSamplesLX. When the illumination compensation flag ic_enable_flag is 0, the input motion displacement compensated image predSamplesLX is output as-is. The motion displacement compensated image predSamplesLX input to the illumination compensation unit 3093 is the output image of the motion displacement compensation unit 3091 when residual prediction is off, and the output image of the residual prediction unit 3092 when residual prediction is on.
(weight prediction)
The weight prediction unit 3094 generates the predicted picture block P (predicted image) by multiplying the input motion displacement image predSamplesLX by weight coefficients. When residual prediction and illumination compensation are performed, the input motion displacement image predSamplesLX is the image on which those processes have been performed. When one of the reference list use flags (predFlagL0 or predFlagL1) is 1 (uni-prediction) and weight prediction is not used, the processing of the following equation, which matches the input motion displacement image predSamplesLX (LX is L0 or L1) to the pixel bit depth, is performed.
predSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesLX[x][y]+offset1)>>shift1)
Here, shift1 = 14 - bitDepth, and offset1 = 1 << (shift1 - 1).
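For a bit depth of 8, shift1 = 6 and offset1 = 32, so the expression reduces the 14-bit intermediate predSamplesLX samples to 8-bit output with rounding. A minimal sketch of this default (non-weighted) uni-prediction step:

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def default_uni_pred(pred_lx, bit_depth):
    """Match a 14-bit intermediate sample to the output bit depth
    (weight prediction not used, uni-prediction case)."""
    shift1 = 14 - bit_depth
    offset1 = 1 << (shift1 - 1)
    return clip3(0, (1 << bit_depth) - 1, (pred_lx + offset1) >> shift1)

out = default_uni_pred(1000, 8)  # intermediate sample 1000, 8-bit output
```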
In addition, when both reference list use flags (predFlagL0 and predFlagL1) are 1 (in the case of bi-prediction) and weight prediction is not used, the processing of the following equation, which averages the input motion displacement images predSamplesL0 and predSamplesL1 and matches the result to the pixel bit depth, is performed.
predSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]+predSamplesL1[x][y]+offset2)>>shift2)
Here, shift2 = 15 - bitDepth, and offset2 = 1 << (shift2 - 1).
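In the bi-prediction case with bit depth 8, shift2 = 7 and offset2 = 64, so the two intermediate images are averaged with rounding in a single shift. A sketch under the same assumptions as the uni-prediction example:

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def default_bi_pred(pred_l0, pred_l1, bit_depth):
    """Average the L0 and L1 intermediate samples and match them to the
    output bit depth (weight prediction not used, bi-prediction case)."""
    shift2 = 15 - bit_depth
    offset2 = 1 << (shift2 - 1)
    return clip3(0, (1 << bit_depth) - 1,
                 (pred_l0 + pred_l1 + offset2) >> shift2)

out = default_bi_pred(1000, 1064, 8)  # averages two intermediate samples
```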
Further, when performing weight prediction in uni-prediction, the weight prediction unit 3094 derives the weight prediction coefficient w0 and the offset o0, and performs the processing of the following equation.
predSamples[x][y]=Clip3(0,(1<<bitDepth)-1,((predSamplesLX[x][y]*w0+(1<<(log2WD-1)))>>log2WD)+o0)
Here, log2WD is a variable indicating a predetermined shift amount.
Further, when performing weight prediction in bi-prediction, the weight prediction unit 3094 derives the weight prediction coefficients w0 and w1 and the offsets o0 and o1, and performs the processing of the following equation.
predSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]*w0+predSamplesL1[x][y]*w1+((o0+o1+1)<<log2WD))>>(log2WD+1))
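Both weighted-prediction equations can be checked numerically. The sketch below is a model of the equations only, not of the actual weight derivation; the values log2WD = 6 and w = 64 (an effective weight of 1) are illustrative assumptions.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def weighted_uni_pred(pred_lx, w0, o0, log2wd, bit_depth):
    # ((predSamplesLX*w0 + 2^(log2WD-1)) >> log2WD) + o0, then clip
    return clip3(0, (1 << bit_depth) - 1,
                 ((pred_lx * w0 + (1 << (log2wd - 1))) >> log2wd) + o0)

def weighted_bi_pred(pred_l0, pred_l1, w0, w1, o0, o1, log2wd, bit_depth):
    # (predL0*w0 + predL1*w1 + ((o0+o1+1)<<log2WD)) >> (log2WD+1), then clip
    return clip3(0, (1 << bit_depth) - 1,
                 (pred_l0 * w0 + pred_l1 * w1
                  + ((o0 + o1 + 1) << log2wd)) >> (log2wd + 1))

uni = weighted_uni_pred(100, w0=64, o0=0, log2wd=6, bit_depth=8)
bi = weighted_bi_pred(100, 100, 64, 64, 0, 0, log2wd=6, bit_depth=8)
```

With unit weights and zero offsets, both expressions reproduce the input sample, which is a quick sanity check on the rounding terms.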
[ image encoding device ]
The image coding device 2 of the present embodiment will be described below with reference to fig. 25.
(outline of image coding apparatus)
Roughly, the image encoding device 2 is a device that generates and outputs encoded data #1 by encoding an input image # 10.
(Structure of image encoding device)
A configuration example of the image coding device 2 of the present embodiment will be described. Fig. 25 is a schematic diagram showing the configuration of the image coding device 2 according to the present embodiment. The image encoding device 2 includes: the header encoding unit 10E, the picture encoding unit 21, the decoded picture buffer 12, and the reference picture determining unit 13E are configured. The image encoding device 2 can perform random access decoding processing, which will be described later, that starts decoding from a picture at a specific time in an image including a plurality of layers.
[ head encoding part 10E ]
The header encoding unit 10E generates information used for decoding NAL unit headers, SPS, PPS, slice headers, and the like in NAL unit units, sequence units, picture units, or slice units based on the input image #10, encodes the information, and outputs the encoded information.
The header encoding unit 10E analyzes the VPS and SPS included in the encoded data #1 based on a predetermined syntax definition, and encodes information used for decoding in sequence units. For example, information related to the number of layers is encoded as VPS, and information related to the picture size of the decoded picture is encoded as SPS.
The header encoding unit 10E analyzes the slice header included in the encoded data #1 based on a predetermined syntax definition, and encodes information used for decoding in slice units. For example, the slice type is encoded from the slice header.
When the layer indicated by the layer ID is not an independent layer, the header encoding unit 10E changes the syntax and encodes the changed syntax.
The header encoding unit 10E may encode an additional syntax instead of the modified syntax.
Alternatively, the header encoding unit 10E may skip the encoding of the syntax instead of changing the syntax. That is, the syntax may be encoded only when the layer indicated by the layer ID is an independent layer or the layer ID is 0.
The head coding unit 10E includes: NAL unit header encoding section 211E, dependent layer information encoding section, profile level information encoding section, profile information encoding section, scaling table encoding section, POC information encoding section 2105E, and reference picture information encoding section 218E.
[ NAL unit header encoding section 211E ]
The NAL unit header encoding section 211E is configured to include a layer ID encoding section and a NAL unit type encoding section.
A layer ID encoding section encodes the layer ID, and a NAL unit type encoding section encodes the NAL unit type.
[ dependent layer information encoding section ]
The dependent layer information encoding unit encodes the dependent layer information of each layer into the VPS and the VPS extension based on a predetermined syntax definition, and encodes the representation information of each layer.
[ profile level information encoding section ]
The profile level information encoding unit encodes the profile and level information of each layer and includes it in the VPS. The profile information encoding unit encodes the profile information of the independent layers and of the layer having layer ID 0, and includes it in the SPS.
[ representation information encoding section ]
The representation information encoding unit encodes the syntax of fig. 28(a) and includes it in the VPS, and encodes the syntax of fig. 28(b) and includes it in the SPS.
Specifically, the representation information encoding unit encodes rep_format() and includes it in the VPS, encoding representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8 into the VPS.
In addition, when the layer indicated by the layer ID (nuh_layer_id) of the SPS is not an independent layer, the representation information encoding unit encodes the representation information update flag update_rep_format_flag and includes it in the SPS. Further, when the encoded update_rep_format_flag is 1, representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8 is encoded and included in the SPS.
[ scaling table encoding section ]
The scaling table encoding unit encodes sps_infer_scaling_list_flag when the layer indicated by the layer ID (nuh_layer_id) of the SPS is not an independent layer, and, when sps_infer_scaling_list_flag is other than 0, encodes sps_scaling_list_ref_layer_id and includes it in the SPS. When sps_infer_scaling_list_flag is 0, sps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the SPS. Similarly, the scaling table encoding unit encodes pps_infer_scaling_list_flag when the layer indicated by the layer ID (nuh_layer_id) of the PPS is not an independent layer, and, when pps_infer_scaling_list_flag is other than 0, encodes pps_scaling_list_ref_layer_id. When pps_infer_scaling_list_flag is 0, pps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the PPS.
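The conditional encoding of the scaling-list syntax can be sketched as follows. This hypothetical model covers the SPS case only and returns the ordered list of syntax elements that would be written; independent_layer stands in for the layer-ID check described in the text, and the element names follow the syntax above.

```python
def sps_scaling_list_syntax(independent_layer, infer_flag):
    """Return the SPS scaling-list syntax elements to encode, in order."""
    elements = []
    if not independent_layer:
        elements.append("sps_infer_scaling_list_flag")
        if infer_flag:
            # inherit the scaling list from the layer named by the ref id
            elements.append("sps_scaling_list_ref_layer_id")
            return elements
    # independent layer, or inference disabled: encode the list itself
    elements += ["sps_scaling_list_data_present_flag", "scaling_list_data()"]
    return elements

dep = sps_scaling_list_syntax(independent_layer=False, infer_flag=True)
indep = sps_scaling_list_syntax(independent_layer=True, infer_flag=False)
```

The PPS case would follow the same pattern with the pps_-prefixed element names.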
[ reference Picture determining section 13E ]
The reference picture determining unit 13E includes therein: a reference picture information encoding unit 218E, a reference picture set decision unit 24, and a reference picture list decision unit 25.
The reference picture set decision unit 24 decides and outputs a reference picture set RPS for encoding and local decoding of the picture to be encoded, based on the input picture #10 and the local decoded picture recorded in the decoded picture buffer 12.
The reference picture list determining unit 25 determines and outputs a reference picture list RPL used for encoding and local decoding of the encoding target picture based on the input image #10 and the reference picture set.
[ reference picture information encoding section 218E ]
The reference picture information encoding unit 218E is included in the header encoding unit 10E, and performs reference picture information encoding processing based on the reference picture set RPS and the reference picture list RPL to generate the RPS information and the RPL correction information included in the SPS and slice header.
(corresponding relationship with image decoding apparatus)
The image encoding device 2 includes a configuration corresponding to each configuration of the image decoding device 1. Here, correspondence means a relationship in which the same processing is performed or the opposite processing is performed.
For example, the reference picture information decoding process of the reference picture information decoding unit 218 included in the image decoding apparatus 1 corresponds to the reference picture information encoding process of the reference picture information encoding unit 218E included in the image encoding apparatus 2. More specifically, the reference picture information decoding unit 218 generates RPS information and RPL correction information from syntax values decoded from the SPS and slice header. Conversely, the reference picture information encoding unit 218E encodes the input RPS information and RPL correction information as syntax values of the SPS and slice header.
For example, in the image decoding apparatus 1, the process of decoding the syntax value from the bit string is made to correspond to the process of encoding the bit string from the syntax value in the image encoding apparatus 2 as a reverse process.
(flow of treatment)
The flow of the image encoding device 2 generating the output encoded data #1 from the input image #10 is as follows.
(S21) the following processes of S22 to S29 are performed on each picture (target picture) constituting the input image # 10.
(S22) the reference picture set decision unit 24 decides the reference picture set RPS based on the target picture in the input image #10 and the local decoded image recorded in the decoded picture buffer 12, and outputs the reference picture set RPS to the reference picture list decision unit 25. RPS information necessary for generating the reference picture set RPS is derived and output to the reference picture information encoding unit 218E.
(S23) the reference picture list decision unit 25 derives the reference picture list RPL based on the target picture in the input image #10 and the input reference picture set RPS, and outputs the reference picture list RPL to the picture encoding unit 21 and the picture decoding unit 11. Further, RPL correction information necessary for generating the reference picture list RPL is derived and output to the reference picture information encoding unit 218E.
(S24) the reference picture information encoding unit 218E generates RPS information and RPL correction information for inclusion in the SPS or slice header, based on the reference picture set RPS and the reference picture list RPL.
(S25) The header encoding unit 10E generates and outputs an SPS to be applied to the target picture, based on the input image #10 and the RPS information and RPL correction information generated by the reference picture determining unit 13E.
(S26) the header encoding unit 10E generates and outputs a PPS to be applied to the target picture, based on the input picture # 10.
(S27) The header encoding unit 10E encodes the slice header of each slice constituting the target picture based on the input image #10 and the RPS information and RPL correction information generated by the reference picture determining unit 13E, outputs the result to the outside as a part of the coded data #1, and also outputs it to the picture decoding unit 11.
(S28) The picture coding unit 21 generates the slice data of each slice constituting the target picture based on the input image #10, and outputs the slice data to the outside as a part of the coded data #1.
(S29) the picture encoding unit 21 generates a local decoded image of the target picture, and records the local decoded image in the decoded picture buffer in association with the layer ID and POC of the target picture.
[ POC information encoding section 2105E ]
Fig. 40 is a functional block diagram showing a schematic configuration of the POC information encoding unit 2105E. As shown in fig. 40, the POC information encoding unit 2105E includes: the POC setting unit 21056, the POC lower bit maximum value encoding unit 21051E, and the POC lower bit encoding unit 21052E. The POC information encoding unit 2105E splits the POC into the POC upper bits PicOrderCntMsb and the POC lower bits pic_order_cnt_lsb and encodes them.
The POC setting unit 21056 sets a common time TIME for all pictures at the same time point. Further, the POC setting unit 21056 sets the POC of the target picture based on the common TIME of the target picture. Specifically, when the picture of the target layer is a RAP picture (BLA or IDR) at which the POC is reset, the POC is set to 0, and the TIME at that point is set to the variable TIME_BASE. TIME_BASE is recorded by the POC setting unit 21056.
In the case where the picture of the target layer is not a RAP picture at which the POC is reset, a value obtained by subtracting TIME_BASE from TIME is set as the POC.
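The derivation performed by the POC setting unit can be sketched as follows (a minimal Python illustration; the class and method names are hypothetical, and TIME is represented as an integer tick count):

```python
class PocSetter:
    """Sketch of the POC setting unit: the POC of each picture is derived
    from a time TIME common to all pictures at the same time point."""

    def __init__(self):
        self.time_base = 0  # TIME_BASE recorded at the last POC-resetting RAP

    def set_poc(self, time: int, is_poc_resetting_rap: bool) -> int:
        if is_poc_resetting_rap:      # RAP picture (BLA or IDR): POC is reset
            self.time_base = time     # record the current TIME as TIME_BASE
            return 0
        return time - self.time_base  # otherwise POC = TIME - TIME_BASE
```

Because TIME is shared across layers, pictures of different layers at the same time point receive the same POC value under this scheme.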
The POC lower bit maximum value encoding unit 21051E sets a POC lower bit maximum value MaxPicOrderCntLsb common to all layers, and encodes the set POC lower bit maximum value MaxPicOrderCntLsb into the coded data #1. Specifically, a value obtained by subtracting the constant 4 from the base-2 logarithm of the POC lower bit maximum value MaxPicOrderCntLsb is encoded as log2_max_pic_order_cnt_lsb_minus4.
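The mapping between MaxPicOrderCntLsb and the coded syntax element can be sketched as follows (function names are illustrative; the value range assumes the HEVC constraint that the exponent lies in [4, 16]):

```python
def encode_log2_max_poc_lsb(max_pic_order_cnt_lsb: int) -> int:
    """Return log2_max_pic_order_cnt_lsb_minus4 for a given MaxPicOrderCntLsb,
    which must be a power of two with exponent in [4, 16]."""
    log2 = max_pic_order_cnt_lsb.bit_length() - 1
    if (1 << log2) != max_pic_order_cnt_lsb or not 4 <= log2 <= 16:
        raise ValueError("MaxPicOrderCntLsb must be 2**k with 4 <= k <= 16")
    return log2 - 4

def decode_max_poc_lsb(log2_max_pic_order_cnt_lsb_minus4: int) -> int:
    """Inverse mapping performed on the decoding side."""
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)
```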
By updating the POC upper bits at the same timing for pictures of a plurality of layers having the same (output) time, a common display time POC can be assigned to pictures of the plurality of layers at the same time point. Thereby, the following effect is achieved: in reference picture management in which a picture of a layer different from the target layer is used as a reference picture in the reference picture list, in synchronized playback of a plurality of layers such as three-dimensional image playback, or in display timing management using pictures, the POC can be used to manage pictures at the same time point, which facilitates reference picture search and synchronization.
The POC lower bit encoding unit 21052E encodes the POC lower bits pic_order_cnt_lsb of the target picture from the POC input from the POC setting unit 21056. Specifically, the POC lower bits pic_order_cnt_lsb are obtained as the remainder of the input POC divided by the POC lower bit maximum value MaxPicOrderCntLsb, i.e., POC % MaxPicOrderCntLsb (or POC & (MaxPicOrderCntLsb - 1)). The POC lower bit encoding unit 21052E encodes pic_order_cnt_lsb in the slice header of the target picture in a) the case where the layer indicated by the layer ID is a layer other than an independent layer, or b) the case where the NAL unit type is other than IDR (other than IDR_W_RADL and IDR_N_LP).
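The split into upper and lower bits can be sketched as follows (a Python illustration; since MaxPicOrderCntLsb is a power of two, the modulo form and the bitmask form give the same result):

```python
def split_poc(poc: int, max_poc_lsb: int) -> tuple:
    """Split a POC into (PicOrderCntMsb, pic_order_cnt_lsb)."""
    lsb = poc % max_poc_lsb  # equivalently: poc & (max_poc_lsb - 1)
    assert lsb == poc & (max_poc_lsb - 1)
    return poc - lsb, lsb
```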
According to the encoding device including the POC setting unit 21056, which sets a common TIME for all pictures at the same time point in all layers, and the POC lower bit maximum value encoding unit 21051E, which sets a POC lower bit maximum value MaxPicOrderCntLsb common to all layers, encoded data having the aforementioned POC lower bits pic_order_cnt_lsb is generated.
(construction of Picture coding section 21)
Next, the configuration of the picture coding unit 21 according to the present embodiment will be described. Fig. 26 is a block diagram showing the configuration of the picture encoding unit 21 according to the present embodiment. The picture encoding unit 21 includes: the predicted image generating unit 101, the subtracting unit 102, the DCT/quantizing unit 103, the entropy encoding unit 104, the inverse quantizing/inverse DCT unit 105, the adding unit 106, the prediction parameter memory 108, the encoding parameter determining unit 110, and the prediction parameter encoding unit 111. The prediction parameter encoding unit 111 includes an inter-prediction parameter encoding unit 112 and an intra-prediction parameter encoding unit 113.
For each picture of each view of a layer image T input from the outside, the predicted image generation unit 101 generates a predicted picture block P for each block, that is, for each region into which the picture is divided. Here, the predicted image generation unit 101 reads out a reference picture block from the decoded picture buffer 12 based on the prediction parameter input from the prediction parameter encoding unit 111. The prediction parameter input from the prediction parameter encoding unit 111 is, for example, a motion vector or a displacement vector. The predicted image generation unit 101 reads the reference picture block located at the position indicated by the motion vector or displacement vector, with the block to be encoded as the starting point. The predicted image generation unit 101 generates the predicted picture block P from the read reference picture block by using one of a plurality of prediction methods. The predicted image generation unit 101 outputs the generated predicted picture block P to the subtraction unit 102. Since the predicted image generation unit 101 operates in the same manner as the predicted image generation unit 308 described above, a detailed description of the generation of the predicted picture block P is omitted.
To select a prediction method, the predicted image generation unit 101 selects, for example, the prediction method that minimizes an error value based on the difference between the signal value of each pixel of the block included in the layer image and the signal value of each corresponding pixel of the predicted picture block P. The method of selecting the prediction method is not limited to this.
When the picture to be encoded is a base view picture, the plurality of prediction methods are intra prediction, motion prediction, and merge prediction. Motion prediction is, among the inter predictions described above, prediction across display time points. Merge prediction is prediction that uses the same reference picture block and prediction parameters as an already-encoded block within a predetermined range from the block to be encoded. When the picture to be encoded is a non-base view picture, the plurality of prediction methods are intra prediction, motion prediction, merge prediction, and displacement prediction. Displacement prediction (disparity prediction) is, among the inter predictions described above, prediction between different layer images (different viewpoint images). Further, for motion prediction, merge prediction, and displacement prediction, there are variants in which additional prediction (residual prediction and illumination compensation) is performed and variants in which it is not.
When intra prediction is selected, the predicted image generator 101 outputs a prediction mode predMode indicating an intra prediction mode used when generating the predicted picture block P to the prediction parameter encoder 111.
When motion prediction is selected, the predicted image generator 101 stores a motion vector mvLX used when generating the predicted picture block P in the prediction parameter memory 108, and outputs the motion vector mvLX to the inter-prediction parameter encoder 112. The motion vector mvLX represents a vector from the position of the encoding target block to the position of the reference picture block at the time of generating the prediction picture block P. The information indicating the motion vector mvLX may include information indicating a reference picture (e.g., a reference picture index refIdxLX and a picture order number POC), or may be information indicating a prediction parameter. The predicted image generator 101 outputs a prediction mode predMode indicating an inter prediction mode to the prediction parameter encoder 111.
When displacement prediction is selected, the predicted image generation unit 101 stores the displacement vector dvLX used when generating the predicted picture block P in the prediction parameter memory 108, and outputs it to the inter-prediction parameter encoding unit 112. The displacement vector dvLX represents the vector from the position of the encoding target block to the position of the reference picture block used when generating the predicted picture block P. The information indicating the displacement vector dvLX may include information indicating a reference picture (for example, the reference picture index refIdxLX and the view ID view_id), and may be information indicating a prediction parameter. The predicted image generation unit 101 also outputs a prediction mode predMode indicating the inter prediction mode to the prediction parameter encoding unit 111.
When merge prediction is selected, the predicted image generator 101 outputs a merge index merge_idx indicating the selected reference picture block to the inter prediction parameter encoder 112. The predicted image generator 101 outputs a prediction mode predMode indicating the merge prediction mode to the prediction parameter encoder 111.
In the above-described motion prediction, displacement prediction, and merge prediction, the predicted image generator 101 performs residual prediction in the residual prediction unit 3092 included in the predicted image generator 101 as described above when residual prediction is performed as additional prediction, and performs illumination compensation prediction in the illumination compensation unit 3093 included in the predicted image generator 101 as described above when illumination compensation is performed as additional prediction.
The subtraction unit 102 subtracts the signal value of the predicted picture block P input from the predicted image generation unit 101 from the signal value of the block corresponding to the layer image T input from the outside, for each pixel, and generates a residual signal. The subtraction unit 102 outputs the generated residual signal to the DCT/quantization unit 103 and the coding parameter determination unit 110.
The DCT/quantization unit 103 performs DCT on the residual signal input from the subtraction unit 102 to calculate a DCT coefficient. The DCT/quantization unit 103 quantizes the calculated DCT coefficient to obtain a quantization coefficient. The DCT/quantization unit 103 outputs the obtained quantization coefficient to the entropy coding unit 104 and the inverse quantization/inverse DCT unit 105.
The entropy coding unit 104 receives the quantized coefficients from the DCT/quantization unit 103, and receives the coding parameters from the coding parameter determination unit 110. The input coding parameters include, for example: the reference picture index refIdxLX, the vector index mvp_LX_idx, the difference vector mvdLX, the prediction mode predMode, and the merge index merge_idx.
The entropy coding unit 104 entropy codes the input quantized coefficient and the coding parameter to generate coded data #1, and outputs the generated coded data #1 to the outside.
The inverse quantization/inverse DCT unit 105 inversely quantizes the quantization coefficient input from the DCT/quantization unit 103 to obtain a DCT coefficient. The inverse quantization/inverse DCT unit 105 performs inverse DCT on the obtained DCT coefficient to calculate a coded residual signal. The inverse quantization/inverse DCT unit 105 outputs the calculated encoded residual signal to the adder 106.
The adder 106 adds, for each pixel, the signal value of the prediction picture block P input from the prediction image generator 101 and the signal value of the encoded residual signal input from the inverse quantization/inverse DCT unit 105 to generate a reference picture block. The adder 106 stores the generated reference picture block in the decoded picture buffer 12.
The prediction parameter memory 108 stores the prediction parameters generated by the prediction parameter encoding unit 111 in a predetermined position in each of a picture to be encoded and each block.
The encoding parameter determination unit 110 selects one of a plurality of sets of encoding parameters. The encoding parameters are the prediction parameters described above and parameters to be encoded that are generated in relation to these prediction parameters. The predicted image generation unit 101 generates the predicted picture block P using each of these sets of encoding parameters.
The encoding parameter determination unit 110 calculates, for each of the plurality of sets, a cost value indicating the amount of information and the encoding error. The cost value is, for example, the sum of the code amount and the value obtained by multiplying the square error by the coefficient λ. The code amount is the information amount of the encoded data #1 obtained by entropy encoding the quantization error and the encoding parameters. The square error is the sum over pixels of the squared values of the residual signal calculated by the subtraction unit 102. The coefficient λ is a preset real number larger than zero. The encoding parameter determination unit 110 selects the set of encoding parameters for which the calculated cost value is smallest. Thus, the entropy encoding unit 104 outputs the selected set of encoding parameters to the outside as the encoded data #1, and does not output the unselected sets of encoding parameters.
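The cost computation described above can be sketched as follows (a minimal Python illustration; the function names and the representation of a candidate as a (code amount, residual) pair are assumptions):

```python
def rd_cost(code_amount: int, residual: list, lam: float) -> float:
    """Cost value = code amount + lambda * square error, where the square
    error is the sum over pixels of the squared residual signal."""
    square_error = sum(r * r for r in residual)
    return code_amount + lam * square_error

def select_best(candidates: list, lam: float):
    """Select the (code_amount, residual) candidate with the smallest cost."""
    return min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))
```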
The prediction parameter encoding unit 111 derives a prediction parameter used when generating a prediction picture based on the parameter input from the prediction image generating unit 101, encodes the derived prediction parameter, and generates a set of encoding parameters. The prediction parameter encoding unit 111 outputs the generated set of encoding parameters to the entropy encoding unit 104.
The prediction parameter encoding unit 111 stores, in the prediction parameter memory 108, the prediction parameters corresponding to the set selected by the encoding parameter determination unit 110 among the generated sets of encoding parameters.
The prediction parameter encoding unit 111 operates the inter-prediction parameter encoding unit 112 when the prediction mode predMode input from the predicted image generation unit 101 indicates the inter-prediction mode. When the prediction mode predMode indicates the intra prediction mode, the prediction parameter encoding unit 111 operates the intra prediction parameter encoding unit 113.
The inter-prediction parameter encoding unit 112 derives the inter-prediction parameters based on the prediction parameters input from the encoding parameter determination unit 110. The inter prediction parameter encoding unit 112 includes the same configuration as that of the inter prediction parameter decoding unit 303 (see fig. 5 and the like) for deriving the inter prediction parameters, as a configuration for deriving the inter prediction parameters. The structure of the inter prediction parameter encoding unit 112 is as described later.
The intra prediction parameter encoding unit 113 determines the intra prediction mode IntraPredMode indicated by the prediction mode predMode input from the encoding parameter determination unit 110 as a set of intra prediction parameters.
(construction of inter-frame prediction parameter coding section)
Next, the configuration of the inter prediction parameter encoding unit 112 will be described. The inter prediction parameter encoding unit 112 corresponds to the inter prediction parameter decoding unit 303.
Fig. 27 is a schematic diagram showing the configuration of the inter prediction parameter encoding unit 112 according to the present embodiment.
The inter prediction parameter encoding unit 112 includes: the inter-prediction parameter encoding control unit 1031, the merged prediction parameter derivation unit 1121, the AMVP prediction parameter derivation unit 1122, the subtraction unit 1123, and the prediction parameter integration unit 1126.
The merged prediction parameter derivation unit 1121 has the same configuration as the merged prediction parameter derivation unit 3036 (see fig. 7) described above.
The inter prediction parameter coding control unit 1031 instructs the entropy coding unit 104 to code the codes (syntax elements) associated with inter prediction that are included in the coded data #1, for example, the division mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idx, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.
When the prediction mode predMode input from the predicted image generation unit 101 indicates the merge prediction mode, the merge index merge_idx is input from the encoding parameter determination unit 110 to the merged prediction parameter derivation unit 1121. The merge index merge_idx is also output to the prediction parameter integration unit 1126. The merged prediction parameter derivation unit 1121 reads out, from the prediction parameter memory 108, the reference picture index refIdxLX and the vector mvLX of the reference block indicated by the merge index merge_idx among the merge candidates. A merge candidate is a reference block within a predetermined range from the encoding target block (for example, a reference block adjoining the lower left end, upper left end, or upper right end of the encoding target block) for which the encoding process has already been completed.
The AMVP prediction parameter derivation unit 1122 has the same configuration as the AMVP prediction parameter derivation unit 3032 (see fig. 8) described above.
When the prediction mode predMode input from the predicted image generation unit 101 indicates the inter prediction mode, the AMVP prediction parameter derivation unit 1122 receives the vector mvLX from the encoding parameter determination unit 110. The AMVP prediction parameter derivation unit 1122 derives a prediction vector mvpLX based on the input vector mvLX, and outputs the derived prediction vector mvpLX to the subtraction unit 1123. The reference picture index refIdxLX and the vector index mvp_LX_idx are output to the prediction parameter integration unit 1126.
The subtraction unit 1123 subtracts the prediction vector mvpLX input by the AMVP prediction parameter derivation unit 1122 from the vector mvLX input by the encoding parameter determination unit 110, and generates a difference vector mvdLX. The difference vector mvdLX is output to the prediction parameter integration unit 1126.
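The derivation of the difference vector can be sketched as follows (a trivial Python illustration; vectors are represented as (x, y) tuples):

```python
def derive_mvd(mv: tuple, mvp: tuple) -> tuple:
    """Difference vector mvdLX = mvLX - mvpLX, computed component-wise."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])
```

The decoding side reconstructs the vector by the inverse operation, mvLX = mvpLX + mvdLX.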
When the prediction mode predMode input from the predicted image generation unit 101 indicates the merge prediction mode, the prediction parameter integration unit 1126 outputs the merge index merge_idx input from the encoding parameter determination unit 110 to the entropy encoding unit 104.
When the prediction mode predMode input from the predicted image generation unit 101 indicates the inter prediction mode, the prediction parameter integration unit 1126 performs the following processing.
The prediction parameter integration unit 1126 integrates the reference picture index refIdxLX and the vector index mvp_LX_idx input from the coding parameter determination unit 110, and the difference vector mvdLX input from the subtraction unit 1123. The prediction parameter integration unit 1126 outputs the integrated code to the entropy encoding unit 104.
(modification 1 of decoding device 1 and encoding device 2)
The encoding apparatus 2 may encode a flag vps_syntax_change_by_layer_id_flag indicating whether or not the syntax structure can be changed, and include the encoded flag in the VPS. The encoding apparatus 2 may change the syntax structure of an independent layer only when vps_syntax_change_by_layer_id_flag is 1 and the layer ID is larger than 0. The encoding apparatus 2 may also encode, for each layer i, a flag vps_syntax_change_by_layer_id_flag[i] indicating whether or not the syntax structure of the layer i can be changed. In this case, the flag vps_syntax_change_by_layer_id_flag[i] may be encoded only when the independent layer flag IndependentLayerFlag[i] indicates that the layer is independent (when NumDirectRefLayers[i] is 0).
In this case, the header decoding unit 10 of the decoding apparatus 1 decodes the flag vps_syntax_change_by_layer_id_flag indicating whether or not the syntax structure can be changed from the VPS or the like. Further, in the case where the flag vps_syntax_change_by_layer_id_flag is encoded for each layer, the decoding apparatus 1 decodes the flag vps_syntax_change_by_layer_id_flag[i] of each layer i in order. The flag vps_syntax_change_by_layer_id_flag[i] may be decoded only in the case where the independent layer flag IndependentLayerFlag[i] indicates that the layer is independent (in the case where NumDirectRefLayers[i] is 0). The profile level decoding unit 2102, the representation information decoding unit 2103, and the scaling list decoding unit 2104 provided inside the header decoding unit 10 of the decoding apparatus 1 execute the following processing.
[Profile level decoding unit 2102]
The profile level decoding unit 2102 decodes the profile level information of each layer from the VPS. When decoding the SPS, the profile level decoding unit 2102 decodes the profile level information from the SPS when the flag vps_syntax_change_by_layer_id_flag is 0 or when the layer ID of the SPS is 0, regardless of whether the layer indicated by the layer ID is an independent layer (see (b) of fig. 41).
[Representation information decoding unit 2103]
The representation information decoding unit 2103 decodes the syntax of fig. 41(a) from the VPS and the syntax of fig. 28(b) from the SPS.
When the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID of the SPS is larger than 0, the representation information update flag update_rep_format_flag is included in the SPS, and the representation information decoding unit 2103 decodes this flag from the SPS. In the case where update_rep_format_flag is not included in the SPS, update_rep_format_flag is inferred to be 0. When update_rep_format_flag is 1, the representation information decoding unit 2103 further decodes representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8. In the case where update_rep_format_flag is 0, the representation information already decoded from rep_format() of the VPS is used as the representation information for the target layer.
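The presence condition and the inference rule for update_rep_format_flag can be sketched as follows (a Python illustration; read_flag stands in for the actual bitstream read and is an assumption):

```python
def decode_update_rep_format_flag(vps_syntax_change_flag: int,
                                  sps_layer_id: int, read_flag) -> int:
    """Read update_rep_format_flag from the SPS only when it is present;
    when it is absent, infer its value to be 0."""
    if vps_syntax_change_flag != 0 and sps_layer_id > 0:
        return read_flag()  # flag is present in the SPS
    return 0                # not present: inferred to be 0
```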
[Scaling list decoding unit 2104]
When the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), the scaling list decoding unit 2104 decodes the sps_infer_scaling_list_flag from the SPS (see fig. 41(b)). If the sps_infer_scaling_list_flag is other than 0, the sps_scaling_list_ref_layer_id is decoded. In the case where the sps_infer_scaling_list_flag is 0, the sps_scaling_list_data_present_flag and scaling_list_data() are decoded, and thereby the scaling list is decoded.
When the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), the scaling list decoding unit 2104 decodes the pps_infer_scaling_list_flag from the PPS (see fig. 42(a)). When pps_infer_scaling_list_flag is other than 0, the pps_scaling_list_ref_layer_id is decoded. In the case where pps_infer_scaling_list_flag is 0, the pps_scaling_list_data_present_flag and scaling_list_data() are decoded, and thereby the scaling list is decoded.
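The three-way branch that determines where a layer's scaling list comes from can be sketched as follows (a Python illustration of the decoded-flag logic; the function name and return representation are assumptions):

```python
def scaling_list_source(infer_flag: int, ref_layer_id: int,
                        data_present_flag: int) -> tuple:
    """Decide the scaling list source from the decoded SPS/PPS flags:
    inherit from a reference layer, read explicit data, or use defaults."""
    if infer_flag != 0:                   # sps/pps infer_scaling_list flag set
        return ("inherit", ref_layer_id)  # use the reference layer's list
    if data_present_flag != 0:            # scaling_list_data() is present
        return ("explicit", None)
    return ("default", None)              # fall back to the default lists
```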
[ POC information decoding unit 2105]
The POC information decoding unit 2105 decodes slice_pic_order_cnt_lsb from the slice header when a) the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), or b) the NAL unit type does not indicate an IDR picture (when nal_unit_type is neither IDR_W_RADL nor IDR_N_LP) (see (b) of fig. 42).
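The presence condition a) or b) can be expressed directly (a Python sketch; the NAL unit type constants assume the HEVC numbering, IDR_W_RADL = 19 and IDR_N_LP = 20):

```python
IDR_W_RADL, IDR_N_LP = 19, 20  # HEVC NAL unit type values for IDR pictures

def slice_poc_lsb_present(vps_syntax_change_flag: int, layer_id: int,
                          nal_unit_type: int) -> bool:
    """slice_pic_order_cnt_lsb is decoded when the syntax-change flag is set
    and the layer is not the base layer, or the picture is not an IDR."""
    return ((vps_syntax_change_flag != 0 and layer_id > 0)
            or nal_unit_type not in (IDR_W_RADL, IDR_N_LP))
```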
[Representation information encoding unit]
In addition, the representation information encoding unit encodes the representation information update flag update_rep_format_flag and includes it in the SPS when the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID of the SPS is larger than 0. Furthermore, when the encoded representation information update flag update_rep_format_flag is 1, representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8 is encoded and included in the SPS.
[Scaling list encoding unit]
The scaling list encoding unit encodes the sps_infer_scaling_list_flag when the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), and, when the sps_infer_scaling_list_flag is other than 0, encodes the sps_scaling_list_ref_layer_id and includes it in the SPS. In the case where the sps_infer_scaling_list_flag is 0, the sps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the SPS. Similarly, the scaling list encoding unit encodes the pps_infer_scaling_list_flag when the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), and, when the pps_infer_scaling_list_flag is other than 0, encodes the pps_scaling_list_ref_layer_id and includes it in the PPS. In the case where the pps_infer_scaling_list_flag is 0, the pps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the PPS.
[ POC information encoding section 2105E ]
The POC information encoding unit 2105E encodes slice_pic_order_cnt_lsb in the slice header when a) the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (when the layer ID is other than 0), or b) the NAL unit type does not indicate an IDR picture (when nal_unit_type is neither IDR_W_RADL nor IDR_N_LP).
In addition, when a list vps_syntax_change_by_layer_id_flag[] indicating for each layer whether or not the syntax structure can be changed is encoded, the above determinations of whether or not to decode each piece of information when processing the SPS, PPS, and slice header are made in accordance with the flag vps_syntax_change_by_layer_id[nuh_layer_id] corresponding to the layer ID (nuh_layer_id) of the respective SPS, PPS, or slice header. In this case, since vps_syntax_change_by_layer_id[nuh_layer_id] already carries per-layer information, in the above description, the determination of whether the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID is larger than 0 (the layer ID is other than 0) is replaced with the determination of whether vps_syntax_change_by_layer_id[nuh_layer_id] is other than 0.
Note that the name of the flag indicating whether or not the syntax structure can be changed is not limited to vps_syntax_change_by_layer_id_flag. For example, a name such as syntax_change_enable_flag may be used. The flag may also have the opposite polarity; that is, instead of a flag indicating that syntax change is permitted, a flag indicating that syntax change is not permitted (e.g., syntax_change_disable_flag) may be used. In this case, the determination of "whether or not the flag vps_syntax_change_by_layer_id_flag is other than 0" is replaced with the determination of "whether or not the flag is 0", and the determination of "whether or not the flag vps_syntax_change_by_layer_id_flag is 0" is replaced with the determination of "whether or not the flag is other than 0".
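The presence condition described above can be sketched as follows. This is an illustrative Python sketch, not part of the standard or of the described apparatus; the helper name and the flag_is_disable parameter (which models the inverted-polarity variant such as a hypothetical syntax_change_disable_flag) are assumptions for illustration.

```python
# Illustrative sketch (not normative): decide whether slice_pic_order_cnt_lsb
# is present in a slice header, per the conditions described above.
IDR_W_RADL = 19  # NAL unit type values as defined in HEVC
IDR_N_LP = 20

def has_slice_poc_lsb(syntax_change_flag, layer_id, nal_unit_type,
                      flag_is_disable=False):
    """Return True if slice_pic_order_cnt_lsb is coded in the slice header.

    flag_is_disable models the inverted-polarity variant: the test
    'flag is other than 0' simply becomes 'flag is 0'.
    """
    change_allowed = (syntax_change_flag == 0) if flag_is_disable \
        else (syntax_change_flag != 0)
    is_idr = nal_unit_type in (IDR_W_RADL, IDR_N_LP)
    # a) syntax change permitted and layer ID non-zero, or b) not an IDR picture
    return (change_allowed and layer_id > 0) or not is_idr
```

For an IDR picture of the base layer (layer ID 0), the function returns False, so the POC LSB is omitted, matching the non-scalable HEVC base-layer behavior.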
(matters attached to modification 1)
The encoding apparatus 2 may also signal vps_syntax_change_by_layer_id_flag of each layer to the decoding apparatus 1 in the VPS.
(modification 2 of decoding device 1 and encoding device 2)
The plurality of layers constituting the moving image may be divided into: a layer that allows syntax changes and a layer that does not allow syntax changes.
For example, a layer whose layer ID is smaller than a predetermined value (for example, the value LAYER_ID_FOR_SYNTAX_CHANGE), or a layer whose layer ID falls within a specific range (for example, a value equal to or greater than 1 and smaller than LAYER_ID_FOR_SYNTAX_CHANGE), may be used as a layer for which syntax change is permitted, and a layer whose layer ID is equal to or greater than the predetermined value may be used as a layer for which syntax change is not permitted.
In this case, the encoding device 2 may change the syntax structure of each layer having a layer ID smaller than a predetermined threshold (for example, the value LAYER_ID_FOR_SYNTAX_CHANGE). The encoding device 2 may encode the value of LAYER_ID_FOR_SYNTAX_CHANGE and include the encoded value in the SPS and/or PPS.
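The threshold-based partition of layers can be sketched minimally as follows; the concrete threshold value is an assumption chosen only for illustration.

```python
# Illustrative sketch: partition layers into those for which syntax change is
# permitted and those for which it is not, based on a threshold.
LAYER_ID_FOR_SYNTAX_CHANGE = 4  # example threshold value (an assumption)

def syntax_change_allowed(layer_id):
    # Layers below the threshold may use a changed syntax structure;
    # layers at or above it must keep the base-layer syntax structure.
    return layer_id < LAYER_ID_FOR_SYNTAX_CHANGE
```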
In this case, the profile/level decoding unit 2102, the representation information decoding unit 2103, the scaling table decoding unit 2104, and the POC information decoding unit 2105 provided inside the header decoding unit 10 of the decoding device 1 are configured to execute the following processing.
[ profile/level decoding unit 2102 ]
The profile/level decoding unit 2102 decodes the profile/level information of each layer from the VPS. When decoding the SPS, the profile/level decoding unit 2102 decodes the profile/level information from the SPS (see fig. 43 (a)) when the layer ID of the SPS (the nuh_layer_id included in the NAL unit header whose NAL unit type is SPS) is equal to or greater than the value LAYER_ID_FOR_SYNTAX_CHANGE, or when the layer ID of the SPS is 0, regardless of whether or not the layer indicated by the layer ID is an independent layer.
[ representation information decoding unit 2103 ]
When the layer ID (nuh_layer_id) of the SPS is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID of the SPS is larger than 0, the representation information decoding unit 2103 decodes the representation information update flag update_rep_format_flag from the SPS. In the case where update_rep_format_flag is not included in the SPS, update_rep_format_flag is inferred to be 0. When update_rep_format_flag is 1, the representation information decoding unit 2103 further decodes representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8. In the case where update_rep_format_flag is 0, the representation information already decoded in rep_format() of the VPS is used as the representation information for the target layer.
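The decoding flow above can be sketched as follows; this is an illustrative Python sketch, where the read_flag and read_rep_format callables stand in for bitstream parsing and are assumptions, not part of the described apparatus.

```python
# Illustrative sketch (not normative): representation information decoding,
# including the inference of update_rep_format_flag to 0 when absent.
def decode_rep_format(sps_layer_id, threshold, read_flag, read_rep_format,
                      vps_rep_format):
    """Return the representation information for the target layer.

    read_flag / read_rep_format stand in for bitstream reads; vps_rep_format
    is the rep_format() already decoded from the VPS.
    """
    update_rep_format_flag = 0  # inferred to be 0 when absent from the SPS
    if 0 < sps_layer_id < threshold:
        update_rep_format_flag = read_flag()
    if update_rep_format_flag == 1:
        # chroma_format_idc, pic_width_in_luma_samples, etc. from the SPS
        return read_rep_format()
    return vps_rep_format  # reuse the representation information of the VPS
```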
[ scaling table decoding unit 2104 ]
When the layer ID (nuh_layer_id) of the SPS is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID is larger than 0, the scaling table decoding unit 2104 decodes sps_infer_scaling_list_flag from the SPS (see fig. 41 (b)). If sps_infer_scaling_list_flag is not 0, sps_scaling_list_ref_layer_id is decoded. In the case where sps_infer_scaling_list_flag is 0, sps_scaling_list_data_present_flag and scaling_list_data() are decoded, and the scaling table is thereby decoded.
When the layer ID (nuh_layer_id) of the PPS is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID is larger than 0 (when the layer ID is other than 0), the scaling table decoding unit 2104 decodes pps_infer_scaling_list_flag from the PPS (see fig. 42 (a)). If pps_infer_scaling_list_flag is not 0, pps_scaling_list_ref_layer_id is decoded. In the case where pps_infer_scaling_list_flag is 0, pps_scaling_list_data_present_flag and scaling_list_data() are decoded, and the scaling table is thereby decoded.
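The SPS scaling-table decoding path can be sketched as follows; the read_* callables stand in for bitstream parsing and are assumptions for illustration. The PPS path is analogous with the pps_-prefixed syntax elements.

```python
# Illustrative sketch (not normative): SPS scaling-table decoding, either
# inheriting the table of a reference layer or parsing scaling_list_data().
def decode_sps_scaling_list(layer_id, threshold, read_bit, read_ref_layer_id,
                            read_scaling_list_data, scaling_lists_by_layer):
    """Return the scaling table for the target layer, or None if the flag
    is not present under the stated condition.

    scaling_lists_by_layer maps layer IDs to already-decoded scaling tables.
    """
    if not (0 < layer_id < threshold):
        return None  # condition not met; sps_infer_scaling_list_flag absent
    if read_bit() != 0:  # sps_infer_scaling_list_flag
        ref_layer_id = read_ref_layer_id()  # sps_scaling_list_ref_layer_id
        return scaling_lists_by_layer[ref_layer_id]  # inherit from ref layer
    if read_bit() != 0:  # sps_scaling_list_data_present_flag
        return read_scaling_list_data()  # parse scaling_list_data()
    return "default_scaling_list"  # placeholder for the default tables
```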
[ POC information decoding unit 2105]
The POC information decoding unit 2105 decodes slice_pic_order_cnt_lsb from the slice header (see fig. 42 (b)) when a) the layer ID (nuh_layer_id) of the slice header is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID is larger than 0 (when the layer ID is other than 0), or b) the NAL unit type does not indicate an IDR picture (when nal_unit_type is neither IDR_W_RADL nor IDR_N_LP).
[ representation information encoding section ]
In addition, the representation information encoding unit encodes the representation information update flag update_rep_format_flag and includes it in the SPS when the flag vps_syntax_change_by_layer_id_flag is other than 0 and the layer ID of the SPS is larger than 0. Furthermore, when the encoded representation information update flag update_rep_format_flag is 1, representation information such as chroma_format_idc, separate_colour_plane_flag, pic_width_in_luma_samples, pic_height_in_luma_samples, bit_depth_luma_minus8, and bit_depth_chroma_minus8 is encoded and included in the SPS.
[ scaling table encoding section ]
The scaling table encoding unit encodes sps_infer_scaling_list_flag when the layer ID (nuh_layer_id) of the SPS is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID of the SPS is larger than 0 (when the layer ID is other than 0), and, when sps_infer_scaling_list_flag is other than 0, encodes sps_scaling_list_ref_layer_id and includes the encoded sps_scaling_list_ref_layer_id in the SPS. In the case where sps_infer_scaling_list_flag is 0, sps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the SPS. Similarly, the scaling table encoding unit encodes pps_infer_scaling_list_flag when the layer ID (nuh_layer_id) of the PPS is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID of the PPS is larger than 0 (when the layer ID is other than 0), and, when pps_infer_scaling_list_flag is other than 0, encodes pps_scaling_list_ref_layer_id and includes the encoded pps_scaling_list_ref_layer_id in the PPS. In the case where pps_infer_scaling_list_flag is 0, pps_scaling_list_data_present_flag and scaling_list_data() are encoded and included in the PPS.
[ POC information encoding section 2105E ]
The POC information encoding unit 2105E encodes slice_pic_order_cnt_lsb in the slice header when a) the layer ID (nuh_layer_id) of the picture is smaller than the value LAYER_ID_FOR_SYNTAX_CHANGE and the layer ID of the picture is larger than 0 (when the layer ID is other than 0), or b) the NAL unit type of the picture does not indicate an IDR picture (when nal_unit_type is neither IDR_W_RADL nor IDR_N_LP).
In addition, a part of the image encoding device 2 and the image decoding device 1 in the above-described embodiments, for example, the entropy decoding unit 301, the prediction parameter decoding unit 302, the predicted image generating unit 101, the DCT/quantizing unit 103, the entropy encoding unit 104, the inverse quantization/inverse DCT unit 105, the encoding parameter determining unit 110, the prediction parameter encoding unit 111, the predicted image generating unit 308, and the inverse quantization/inverse DCT unit 311, may be implemented by a computer. In this case, the control functions may be realized by recording a program for realizing them on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" referred to here is a computer system incorporated in either the image encoding device 2 or the image decoding device 1, and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Further, the "computer-readable recording medium" may include: a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line; and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may be a program for realizing a part of the above-described functions, or a program that realizes the above-described functions in combination with a program already recorded in the computer system.
In addition, a part or all of the image encoding device 2 and the image decoding device 1 in the above-described embodiments may be implemented as an integrated circuit such as an LSI (Large Scale Integration) circuit. Each functional block of the image encoding device 2 and the image decoding device 1 may be implemented as an individual processor, or some or all of them may be integrated into a single processor. The method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. If a circuit-integration technique replacing LSI emerges with the advancement of semiconductor technology, an integrated circuit based on that technique may also be used.
(conclusion)
An image decoding device according to the 1st aspect of the present invention is an image decoding device (image decoding device 1) for decoding a scalable-coded image, the image decoding device including: a layer ID decoding unit (layer ID decoding unit 2111) that decodes the layer ID included in a NAL unit header; a dependent layer information decoding unit (dependent layer information decoding unit 2101) that decodes dependent layer information indicating whether or not there is a dependency relationship between layers, and determines, based on the dependent layer information, whether or not the layer indicated by the layer ID is an independent layer having no dependency relationship; and a profile/level information decoding unit (profile/level information decoding unit 2102) that decodes profile/level information from a video parameter set, wherein the profile/level information decoding unit decodes the profile/level information from the sequence parameter set even when it determines that the layer indicated by the layer ID is the independent layer.
According to the above configuration, the following effects can be achieved: the above-described image decoding apparatus can extract the independent layer without rewriting syntax and reproduce it by a non-scalable decoder.
According to the 1st aspect of the present invention, in the image decoding device according to the 2nd aspect of the present invention, the level information decoding unit may decode the level information from the sequence parameter set only when the layer ID is 0 or the value of a specific flag related to syntax change (the flag vps_syntax_change_by_layer_id_flag) is 0, when it is determined that the layer indicated by the layer ID is the independent layer.
According to the 1 st aspect of the present invention, in the image decoding device according to the 3 rd aspect of the present invention, the level information decoding unit may decode the level information from the sequence parameter set only when the layer ID is 0 or when the layer ID is a value within a specific range, when it is determined that the layer indicated by the layer ID is the independent layer.
An image encoding device according to the 4th aspect of the present invention is an image encoding device (image encoding device 2) for encoding a scalable-coded image, the image encoding device including: a layer ID encoding unit that encodes a layer ID included in a NAL unit header; a dependent layer information encoding unit that encodes dependent layer information indicating whether or not there is an inter-layer dependency relationship; and a level information encoding unit that encodes level information included in each of a video parameter set and a sequence parameter set, wherein the level information encoding unit encodes the level information included in the sequence parameter set when determining that the layer indicated by the layer ID is the independent layer.
The encoded data according to the 5th aspect of the present invention is encoded data including one or more NAL units, each NAL unit including a NAL unit header and NAL unit data, wherein each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type specifying the type of the NAL unit including the NAL unit header, wherein level information and dependent layer information indicating whether or not there is a dependency relationship between layers are included in a NAL unit whose NAL unit type is a video parameter set, and level information is included in a NAL unit whose NAL unit type is a sequence parameter set and whose layer ID indicates an independent layer.
An image decoding device according to the 6th aspect of the present invention is an image decoding device that decodes a scalable-coded image, and includes: a layer ID decoding unit that decodes a layer ID included in a NAL unit header; a dependent layer information decoding unit that decodes dependent layer information indicating whether or not there is a dependency relationship between layers, and determines, based on the dependent layer information, whether or not the layer indicated by the layer ID is an independent layer having no dependency relationship; and a POC information decoding unit that decodes POC information from the slice header, wherein the POC information decoding unit decodes the POC information when the NAL unit type does not indicate an IDR picture, or when it is determined that the layer indicated by the layer ID is a layer other than the independent layer.
According to the structure, the following effects can be achieved: the above-described image decoding apparatus can extract the independent layer without rewriting syntax and reproduce it by a non-scalable decoder.
According to the 6th aspect of the present invention, in the image decoding device according to the 7th aspect of the present invention, the POC information decoding unit may decode the POC information only when the value of a specific flag associated with the layer ID (for example, the flag vps_syntax_change_by_layer_id_flag) is not 0, when the NAL unit type indicates an IDR picture.
According to the 6th aspect of the present invention, in the image decoding device according to the 8th aspect of the present invention, the POC information decoding unit may decode the POC information from the slice header only when the layer ID is a value within a specific range (for example, a value equal to or greater than 1 and smaller than LAYER_ID_FOR_SYNTAX_CHANGE), when the NAL unit type indicates an IDR picture.
An image encoding device according to the 9th aspect of the present invention is an image encoding device for encoding a scalable-coded image, and includes: a layer ID encoding unit that encodes a layer ID included in a NAL unit header; a dependent layer information encoding unit that encodes dependent layer information indicating whether or not there is an inter-layer dependency relationship; and a POC information encoding unit, wherein the POC information encoding unit encodes POC information in the slice header when the NAL unit type is other than an IDR picture, or when it is determined that the layer indicated by the layer ID is other than the above-mentioned independent layer.
The encoded data according to the 10th aspect of the present invention is encoded data including one or more NAL units, wherein each of the one or more NAL unit headers included in the encoded data contains a layer ID and a NAL unit type specifying the type of the NAL unit containing the NAL unit header, and the slice header includes POC information only when the NAL unit type is other than an IDR picture or when the layer indicated by the layer ID is other than the above-described independent layer.
An image decoding device according to claim 11 of the present invention is an image decoding device that decodes a scalable-coded image, and includes: a layer ID decoding unit that decodes a layer ID included in the NAL unit header; a dependent layer information decoding unit that decodes dependent layer information indicating whether or not there is a dependency relationship between layers, and determines whether or not a layer indicated by the layer ID is an independent layer having no dependency relationship based on the dependent layer information; a representation information decoding unit that decodes representation information from the video parameter set; the representation information decoding unit decodes the representation information update flag from the sequence parameter set when determining that the layer indicated by the layer ID is not the independent layer, and decodes the representation information when the representation information update flag is other than 0.
According to the structure, the following effects are achieved: the above-described image decoding apparatus can extract the independent layer without rewriting syntax and reproduce it by a non-scalable decoder.
According to the 11th aspect of the present invention, in the image decoding device according to the 12th aspect of the present invention, the representation information decoding unit may decode the representation information update flag from the sequence parameter set only when the value of a specific flag related to the layer ID (for example, the flag vps_syntax_change_by_layer_id_flag) is not 0.
According to the 11th aspect of the present invention, in the image decoding device according to the 13th aspect of the present invention, the representation information decoding unit may decode the representation information update flag from the sequence parameter set only when the layer ID has a value within a specific range (for example, a value equal to or greater than 1 and smaller than LAYER_ID_FOR_SYNTAX_CHANGE).
An image encoding device according to the 14th aspect of the present invention is an image encoding device for encoding a scalable-coded image, and includes: a layer ID encoding unit that encodes a layer ID included in a NAL unit header; a dependent layer information encoding unit that encodes dependent layer information indicating whether or not there is an inter-layer dependency relationship; and a representation information encoding unit that encodes representation information included in the video parameter set, wherein the representation information encoding unit encodes a representation information update flag included in the sequence parameter set when determining that the layer indicated by the layer ID is not the independent layer.
The encoded data according to the 15th aspect of the present invention is encoded data including one or more NAL units, each NAL unit including a NAL unit header and NAL unit data, wherein each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type specifying the type of the NAL unit including the NAL unit header, wherein the representation information is included in a NAL unit whose NAL unit type is a video parameter set, and the representation information update flag is included in a NAL unit whose NAL unit type is a sequence parameter set and whose layer ID indicates a layer other than an independent layer.
An image decoding device according to claim 16 of the present invention is an image decoding device that decodes a scalable-coded image, and includes: a layer ID decoding unit that decodes a layer ID included in a NAL unit header having a NAL unit type of a sequence parameter set; a dependent layer information decoding unit that decodes dependent layer information indicating whether or not there is a dependency relationship between layers, and determines whether or not a layer indicated by the layer ID is an independent layer having no dependency relationship based on the dependent layer information; a scaling table decoding unit for decoding the scaling table; the scaling table decoding unit decodes a scaling table prediction flag from the sequence parameter set and the picture parameter set when determining that the layer indicated by the layer ID is other than the independent layer, and decodes the scaling table when the scaling table prediction flag is 0.
According to the structure, the following effects are achieved: the above-described image decoding apparatus can extract the independent layer without rewriting syntax and reproduce it by a non-scalable decoder.
According to the 16th aspect of the present invention, in the image decoding device according to the 17th aspect of the present invention, the scaling table decoding unit may decode the scaling table prediction flag from the sequence parameter set and the picture parameter set only when the value of a specific flag related to the layer ID (for example, the flag vps_syntax_change_by_layer_id_flag) is not 0.
According to the 16th aspect of the present invention, in the image decoding device according to the 18th aspect of the present invention, the scaling table decoding unit may decode the scaling table prediction flag from the sequence parameter set only when the layer ID is a value within a specific range (for example, a value equal to or greater than 1 and smaller than LAYER_ID_FOR_SYNTAX_CHANGE).
An image encoding device according to the 19th aspect of the present invention is an image encoding device for encoding a scalable-coded image, and includes: a layer ID encoding unit that encodes a layer ID included in a NAL unit header; a dependent layer information encoding unit that encodes dependent layer information indicating whether or not there is an inter-layer dependency relationship; and a scaling table encoding unit that encodes a scaling table, wherein the scaling table encoding unit encodes a scaling table prediction flag for the sequence parameter set and the picture parameter set when determining that the layer indicated by the layer ID is other than the independent layer.
The encoded data according to the 20th aspect of the present invention is encoded data including one or more NAL units, wherein each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type that specifies the type of the NAL unit including the NAL unit header, wherein a scaling table prediction flag is included in a NAL unit whose NAL unit type is a sequence parameter set and whose layer ID indicates a layer other than an independent layer, and in a NAL unit whose NAL unit type is a picture parameter set and whose layer ID indicates a layer other than an independent layer.
The encoded data according to the 21st aspect of the present invention is encoded data constituted by an access unit including one or more NAL units, each NAL unit being constituted by a NAL unit header and NAL unit data, wherein each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type that specifies the type of the NAL unit including the NAL unit header, wherein dependent layer information indicating whether or not there is a dependency relationship between layers is included in a NAL unit whose NAL unit type is a video parameter set, POC information is included in the slice header in a NAL unit whose NAL unit type is a picture, and all pictures included in the access unit that belong to a layer defined as a reference layer or a layer defined as a referenced layer in the dependent layer information have the same POC.
The encoded data according to the 22nd aspect of the present invention is encoded data constituted by an access unit including one or more NAL units, each NAL unit being constituted by a NAL unit header and NAL unit data, wherein each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type that specifies the type of the NAL unit including the NAL unit header, wherein dependent layer information indicating whether or not there is an inter-layer dependency relationship is included in a NAL unit whose NAL unit type is a video parameter set, POC information is included in the slice header in a NAL unit whose NAL unit type is a picture, and, when pictures having different POCs may be included in the access unit, an access unit delimiter indicating the boundary of the access unit is included in the access unit.
While one embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to the above description, and various design changes and the like can be made without departing from the scope of the present invention.
Industrial applicability
The present invention is suitably applicable to an image decoding device that decodes encoded data of encoded image data and an image encoding device that generates encoded data of encoded image data. In addition, the present invention can be suitably applied to a data structure of encoded data generated by an image encoding apparatus and referred to by an image decoding apparatus.
Description of the symbols
An image decoding apparatus
Image coding device
A network
An image display device
An image transmission system
A header decoding section
Head coding part
A picture decoding section
A decoded picture buffer
A reference picture management section
Reference picture set setting section
132
Reference picture determining section
A predicted image generating unit
102
A DCT/quantization section
1031
An entropy encoding section
An inverse quantization/inverse DCT unit
106
A prediction parameter memory
An encoding parameter determining section
A prediction parameter encoding unit
An inter prediction parameter encoding section
1121
An AMVP prediction parameter derivation unit
1123.. subtracting part
1126
An intra prediction parameter encoding unit
2101.. dependent layer information decoding section
A rank level information decoding unit
2103
2104. expansion and contraction table decoding part
POC information decoding unit
POC information encoding unit
POC lower bit maximum value decoding section
POC lower-bit maximum value encoding unit
POC lower bit decoding section
POC low-bit encoding section
21053
21053b
POC addition unit
POC resetting unit
POC setting unit
A picture coding section
NAL unit header decoding section
A layer ID decoding section
NAL unit type decoding section
NAL unit header encoding section
NAL unit type encoding section
A VPS decoding section
213
A PPS decoding part
A slice header decoding section
218
Reference picture set determining part
Reference picture list determining part
An entropy decoding section
A prediction parameter decoding unit
303
3031
30311
30312
30313
30314
3032
3033
3034
3035
3036
30361 merge candidate derivation unit
303611
303612
3036121
3036122
3036123
303613
3036131
3036132
3036133
3036134
303614.. MPI candidate derivation unit
30362A merge candidate selecting unit
304
A prediction parameter memory
A predicted image generating unit
309.. an inter-prediction image generating unit
3091
3092
30921 residual error acquiring unit
30922
3093
30931
30932
3094
An intra prediction image generation unit
3101
3102
An inverse quantization/inverse DCT unit
An addition section
313
Claims (12)
1. An image decoding apparatus that decodes scalable-coded image data, comprising:
a header decoding unit that decodes the first flag;
POC information decoding means for decoding slice_pic_order_cnt_lsb as one piece of the POC information,
the POC information decoding means decodes the slice_pic_order_cnt_lsb from a slice header in a case where the first flag indicates a first value and a layer ID is greater than 0, or in a case where a NAL unit type does not indicate an IDR picture, and does not decode the slice_pic_order_cnt_lsb in other cases.
2. The image decoding apparatus according to claim 1,
the header decoding means decodes the first flag in a layer unit.
3. The image decoding apparatus according to claim 1 or 2,
the header decoding means further decodes a directly dependent layer number, and decodes the first flag only when the directly dependent layer number is 0.
4. An image encoding device that performs scalable encoding on image data, comprising:
a head encoding mechanism that encodes the first mark;
POC information encoding means for encoding slice_pic_order_cnt_lsb as one piece of the POC information,
the POC information encoding means encodes the slice_pic_order_cnt_lsb in a slice header in a case where the first flag indicates a first value and a layer ID is greater than 0, or in a case where a NAL unit type does not indicate an IDR picture, and does not encode the slice_pic_order_cnt_lsb in other cases.
5. The image encoding device according to claim 4,
the head encoding means encodes the first flag in a layer unit.
6. The image encoding device according to claim 4 or 5,
the header encoding mechanism further encodes a directly dependent layer number, and encodes the first flag only when the directly dependent layer number is 0.
7. An image decoding apparatus that decodes scalable-coded image data, comprising:
a header decoding unit that decodes the first flag;
a representation information decoding means for decoding the representation information,
the representation information decoding means decodes the representation information when the first flag indicates a first value and the layer ID is other than 0.
8. An image encoding device that performs scalable encoding on image data, comprising:
a head encoding mechanism that encodes the first mark;
a representation information encoding means for encoding the representation information,
the representation information encoding means encodes the representation information when the first flag indicates a first value and the layer ID is other than 0.
9. An image decoding apparatus that decodes scalable-coded image data, comprising:
a header decoding unit that decodes the first flag;
a scaling table information decoding means for decoding the scaling table information,
the scaling table information decoding means decodes the sps_scaling_list_ref_layer_id of the scaling table information when the first flag indicates the first value and the layer ID is other than 0, and decodes the sps_scaling_list_data_present_flag and the scaling_list_data().
10. An image encoding device that performs scalable encoding on image data, comprising:
a header encoding means that encodes the first flag;
a scaling table information encoding means for encoding scaling table information,
the scaling table information encoding means encodes the sps_scaling_list_ref_layer_ID of the scaling table information when the first flag indicates the first value and the layer ID is other than 0, and encodes the sps_scaling_list_data_present_flag and the scaling_list_data().
11. Coded data consisting of an access unit containing one or more NAL units, said NAL unit consisting of a NAL unit header and NAL unit data, said coded data being characterized in that,
each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type specifying a type of the NAL unit containing the NAL unit header, dependent layer information indicating whether there is an inter-layer dependency being included in a NAL unit whose NAL unit type is the video parameter set,
in a NAL unit whose NAL unit type indicates a picture, POC information is contained in a slice header,
all pictures contained within the access unit that belong to a layer defined as a reference layer or a layer defined as a referenced layer in the dependent layer information have the same POC.
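The POC-alignment constraint of claim 11 amounts to a conformance check over one access unit. A minimal sketch, where the data shapes (layer-ID-to-POC map, VPS dependency pairs) are illustrative assumptions:

```python
def poc_aligned(access_unit, dependency_pairs):
    """Check that, within one access unit, every picture belonging to a
    layer named as a reference layer or as a referenced layer in the
    dependent layer information carries the same POC.

    ``access_unit`` maps layer_id -> POC of that layer's picture;
    ``dependency_pairs`` is a set of (dependent_layer, reference_layer)
    tuples taken from the video parameter set.
    """
    constrained = set()
    for dep, ref in dependency_pairs:
        constrained.update((dep, ref))
    pocs = {access_unit[l] for l in constrained if l in access_unit}
    return len(pocs) <= 1  # all constrained pictures share one POC
```

This is the property that lets a decoder identify an access-unit boundary from the POC values alone when layers depend on each other.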
12. Coded data consisting of an access unit containing one or more NAL units, said NAL unit consisting of a NAL unit header and NAL unit data, said coded data being characterized in that,
each of the one or more NAL unit headers included in the encoded data includes: a layer ID, and a NAL unit type specifying a type of NAL unit containing the NAL unit header,
dependent layer information indicating whether there is a dependency relationship between layers is included in a NAL unit whose NAL unit type is the video parameter set,
in a NAL unit of which the NAL unit type is a picture, POC information is contained in a slice header,
in a case where pictures contained within the access unit may have different POCs, an access unit delimiter indicating a boundary of the access unit precedes the access unit.
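Claim 12 handles the complementary case: when POCs may differ between layers, the encoder prepends an explicit access unit delimiter so the boundary remains detectable. A sketch of the writer side, with "AUD" as a placeholder for the delimiter NAL unit and the tuple layout as an illustrative assumption:

```python
def emit_access_unit(nal_units):
    """Serialize one access unit's NAL units, prepending an access unit
    delimiter when the contained pictures may carry different POCs.

    ``nal_units`` is a list of (layer_id, poc, payload) tuples.
    """
    pocs = {poc for _, poc, _ in nal_units}
    if len(pocs) > 1:  # POCs may differ across layers: mark the boundary
        return ["AUD"] + list(nal_units)
    return list(nal_units)
```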
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2013-211468 | 2013-10-08 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1225202A1 (en) | 2017-09-01 |