HK40110265A

HK40110265A - Multi-view decoder

Info

Publication number: HK40110265A
Application number: HK42024098756.0A
Authority: HK
Inventors: Robert SKUPIN; Karsten Suehring; Yago SANCHEZ DE LA FUENTE; Gerhard Tech; Valeri GEORGE; Thomas Schierl; Detlev Marpe
Original assignee: Ge Video Compression, Llc
Priority date: 2013-04-08
Filing date: 2024-10-29
Publication date: 2024-12-20

Description

Multi-view decoder

This application is a divisional application of divisional application No. 201910419460.0, which is itself a divisional of invention patent application No. 201480032450.7, filed on April 8, 2014 and entitled "Coding Concept Allowing Efficient Multi-View/Layer Coding", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to coding concepts that allow efficient multi-view/layer coding, such as multi-view picture/video coding.

Background

Scalable coding concepts are well known in the art. In video coding, for example, H.264 allows a base-layer coded video data stream to be accompanied by additional enhancement-layer data in order to increase the reconstruction quality of the base-layer video in different respects, such as spatial resolution, signal-to-noise ratio (SNR) and, last but not least, the number of views. The recently finalized HEVC standard is likewise being extended by SVC/MVC profiles (SVC = Scalable Video Coding, MVC = Multi-View Coding). HEVC differs from its predecessor H.264 in many respects, for example in its suitability for parallel decoding/encoding and low-delay transmission. As far as parallel decoding/encoding is concerned, HEVC supports WPP (Wavefront Parallel Processing) encoding/decoding as well as a tile parallel processing concept.

According to the WPP concept, the individual pictures are partitioned row-wise into substreams. The coding order within each substream runs from left to right. A decoding order is defined among the substreams, leading from the top substream to the bottom substream. The substreams are entropy coded using probability adaptation. The probabilities are initialized for each substream either individually or on the basis of a preliminarily adapted state of the probabilities used in entropy coding the immediately preceding substream, taken at a certain position measured from the left-hand edge of that preceding substream, such as the end of its second CTB (coding tree block). Spatial prediction does not need to be restricted; that is, spatial prediction may cross the boundaries between immediately succeeding substreams. In this manner, the substreams can be encoded/decoded in parallel, with the positions currently being encoded/decoded forming a wavefront that runs in a tilted manner from bottom left to top right and advances from left to right.

According to the tile concept, the pictures are partitioned into tiles, and in order to make the encoding/decoding of these tiles a possible subject of parallel processing, spatial prediction across tile boundaries is prohibited. Only in-loop filtering across tile boundaries may be allowed. In order to support low-delay processing, the slice concept has been extended: a slice can be switched to initialize the entropy probabilities anew, to adopt the entropy probabilities saved during processing of a previous substream (i.e., the substream preceding the one in which the current slice begins), or to adopt the entropy probabilities as continuously updated up to the end of the immediately preceding slice. By this measure, the WPP and tile concepts become more suitable for low-delay processing.
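The wavefront described for WPP can be sketched with a toy schedule. This is a minimal illustration, not part of the application: it assumes one thread per CTB row, unit processing time per CTB, and that a row may start once the second CTB of the row above is finished (the `lag=2` default). The function name `wpp_schedule` and its parameters are hypothetical.

```python
def wpp_schedule(rows, cols, lag=2):
    """Earliest time step at which each CTB can be processed under WPP.

    Assumptions (illustrative only): one thread per CTB row, every CTB
    takes one time step, and CTB (r, c) needs its left neighbour plus the
    CTB at column c + lag - 1 of the row above (clipped at the row end).
    """
    t = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Left neighbour in the same substream (coding order is left to right).
            left = t[r][c - 1] if c > 0 else -1
            # CTB of the preceding substream that must already be finished.
            dep_col = min(c + lag - 1, cols - 1)
            above = t[r - 1][dep_col] if r > 0 else -1
            t[r][c] = max(left, above) + 1
    return t


schedule = wpp_schedule(rows=3, cols=4)
for row in schedule:
    print(row)
```

With these assumptions each row starts two steps after the row above it, so a picture of 3 x 4 CTBs finishes at step 7 instead of step 11 for strictly sequential processing. CTBs sharing the same time step lie on the tilted wavefront.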

Nevertheless, it would be preferable to have concepts at hand that further improve multi-view/layer coding.

Summary of the Invention

Accordingly, the object of the present invention is to provide concepts that improve multi-view/layer coding.

This object is achieved by the subject matter of the pending independent claims.

A first aspect of the present application is concerned with multi-view coding. In particular, the idea underlying the first aspect is as follows. On the one hand, inter-view prediction helps to exploit redundancies between the multiple views that capture a certain scene, thereby increasing coding efficiency. On the other hand, inter-view prediction prevents the multiple views from being decodable/encodable independently of each other, i.e., in parallel, so as to take advantage of, for example, a multi-core processor. More precisely, inter-view prediction makes portions of a second view dependent on corresponding reference portions of a first view, and this relationship between portions of the first and second views requires a certain inter-view decoding/encoding offset/delay to be met when the first and second views are decoded/encoded in parallel.

The idea underlying the first aspect is that this inter-view coding offset can be reduced substantially, at the cost of only a minor reduction in coding efficiency, if the encoding and/or decoding is changed with respect to the way inter-view prediction is performed at the spatial segment boundaries of the spatial segments into which the first/reference view is partitioned. The change may be made such that the inter-view prediction from the first view to the second view does not combine any information from different spatial segments of the first view, but predicts the second view and its syntax elements, respectively, only from information stemming from one spatial segment of the first view. According to one embodiment, the change is even stricter, such that the inter-view prediction does not cross the spatial segment boundaries at all, i.e., the one spatial segment is the segment that contains the co-located position or co-located portion.

The benefit of changing the inter-view prediction at segment boundaries becomes clear when considering the consequence of combining, in the inter-view prediction, information stemming from two or more spatial segments of the first view. In that case, the encoding/decoding of any portion of the second view that involves such a combination in the inter-layer prediction has to be deferred until all spatial segments of the first view combined by the inter-layer prediction have been encoded/decoded. Changing the inter-view prediction at the spatial segment boundaries of the first view solves this problem: each portion of the second view becomes readily encodable/decodable as soon as the one spatial segment of the first view has been decoded/encoded. The coding efficiency is reduced only slightly, because inter-layer prediction is still largely permitted and the restriction applies only at the spatial segment boundaries of the first view.

According to one embodiment, the encoder takes care of the change of the inter-layer prediction at the spatial segment boundaries of the first view so as to avoid the combination of two or more spatial segments of the first view outlined above, and signals this avoidance/circumstance to the decoder, which in turn uses the signaling as a kind of guarantee in order, for example, to reduce the inter-view decoding delay in response to the signaling. According to another embodiment, the decoder also changes the way inter-layer prediction is performed, triggered by signaling in the data stream. The restriction on inter-layer prediction parameter settings at the spatial segment boundaries of the first view can then be exploited when forming the data stream, because the amount of side information needed to control the inter-layer prediction can be reduced as far as these spatial segment boundaries are concerned.
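The effect of the boundary restriction can be illustrated with a one-dimensional sketch. It is an illustration under stated assumptions rather than the claimed method: segments are modelled as column ranges of the first view, a block of the second view lies entirely inside one segment, and the restriction is realized by clamping the horizontal disparity so that the reference block stays inside the co-located segment. All names (`segment_of`, `segments_referenced`, `restrict_disparity`) are hypothetical.

```python
import bisect


def segment_of(x, starts):
    """Index of the segment containing column x.

    `starts` lists the first column of each segment in ascending order.
    """
    return bisect.bisect_right(starts, x) - 1


def segments_referenced(x, width, dv, starts):
    """First-view segments touched by the reference block [x+dv, x+dv+width)."""
    first = segment_of(x + dv, starts)
    last = segment_of(x + dv + width - 1, starts)
    return set(range(first, last + 1))


def restrict_disparity(x, width, dv, starts, pic_width):
    """Clamp dv so the reference block stays inside the co-located segment.

    Assumes the block [x, x+width) itself lies within a single segment.
    """
    seg = segment_of(x, starts)
    seg_start = starts[seg]
    seg_end = starts[seg + 1] if seg + 1 < len(starts) else pic_width
    lowest = seg_start - x           # reference block touches the left segment edge
    highest = seg_end - width - x    # reference block touches the right segment edge
    return max(lowest, min(dv, highest))


starts = [0, 64]                     # two segments: columns 0..63 and 64..127
unrestricted = segments_referenced(48, 16, 8, starts)
dv = restrict_disparity(48, 16, 8, starts, pic_width=128)
restricted = segments_referenced(48, 16, dv, starts)
print(unrestricted, dv, restricted)
```

In this example the unrestricted disparity makes the block depend on both segments of the first view, so its decoding would have to wait for both. After clamping, only the co-located segment is referenced and the block can be decoded as soon as that segment is available.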

A second aspect of the present application is concerned with multi-layer video coding and the following circumstance. Usually, the NAL units into which the pictures of multiple layers are coded are collected into access units either such that the NAL units relating to one time instant form one access unit, irrespective of the layer to which the respective NAL unit relates, or such that one access unit exists for each different pair of time instant and layer. Independently of the possibility chosen, however, the NAL units of each pair of time instant and layer are treated separately and ordered without interleaving. That is, the NAL units belonging to one certain time instant and layer are sent out before proceeding with the NAL units of another pair of time instant and layer. No interleaving is permitted. This, however, hinders a further reduction of the end-to-end delay, because the encoder is prevented from sending out NAL units belonging to a dependent layer between NAL units belonging to the base layer, although inter-layer parallel processing would offer this opportunity.

The second aspect of the present application gives up the strictly sequential, non-interleaved arrangement of the NAL units within the transmitted bitstream and, to this end, reuses the first possibility of defining the access unit, namely as collecting all NAL units of one time instant: all NAL units of one time instant are collected within one access unit, and the access units are still arranged within the transmitted bitstream in a non-interleaved manner. However, the NAL units within one access unit may be interleaved, so that NAL units of one layer are interspersed with NAL units of another layer. The runs of NAL units belonging to one layer within one access unit form decoding units. Interleaving is permitted to the extent that, for each NAL unit within an access unit, the information needed for inter-layer prediction is contained in one of the preceding NAL units within that access unit. The encoder may signal within the bitstream whether interleaving has been applied, and the decoder, in turn, may for example use multiple buffers in order to re-sort the interleaved NAL units of the different layers of each access unit, or only one buffer in the case of no interleaving, depending on the signaling. No coding efficiency penalty results, while the end-to-end delay is reduced.
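The interleaving constraint and the decoder-side re-sorting can be sketched as follows. The data model is an assumption made for illustration: each NAL unit is a dictionary with an `id` of the form `(layer, index)` and a set `needs` listing the NAL units it uses for inter-layer prediction. None of these names come from the application or from a real codec API.

```python
from collections import defaultdict


def interleaving_is_valid(access_unit):
    """True if every NAL unit only needs NAL units that precede it in the access unit."""
    seen = set()
    for nal in access_unit:
        if not nal["needs"] <= seen:
            return False
        seen.add(nal["id"])
    return True


def decoding_units(access_unit):
    """Split an access unit into runs of consecutive NAL units of the same layer."""
    runs = []
    for nal in access_unit:
        layer = nal["id"][0]
        if runs and runs[-1][0] == layer:
            runs[-1][1].append(nal["id"])
        else:
            runs.append((layer, [nal["id"]]))
    return runs


def resort_by_layer(access_unit):
    """Decoder-side re-sorting: one buffer per layer, transmission order kept inside each."""
    buffers = defaultdict(list)
    for nal in access_unit:
        buffers[nal["id"][0]].append(nal["id"])
    return dict(buffers)


# Two layers, two NAL units each, interleaved within one access unit.
au = [
    {"id": (0, 0), "needs": set()},
    {"id": (1, 0), "needs": {(0, 0)}},
    {"id": (0, 1), "needs": set()},
    {"id": (1, 1), "needs": {(0, 1)}},
]
print(interleaving_is_valid(au))
print(decoding_units(au))
print(resort_by_layer(au))
```

Here each dependent-layer NAL unit follows the base-layer NAL unit it references, so the order is admissible and yields four decoding units. Moving `(1, 0)` in front of `(0, 0)` would violate the constraint.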

A third aspect of the present application is concerned with the signaling of the layer index per bitstream packet, such as per NAL unit. According to the third aspect, the inventors realized that applications fall primarily into one of two types. Normal applications need a moderate number of layers, so a layer ID field in each packet that is sized to cover this moderate number of layers completely poses no problem. More complex applications, which in turn need an excessive number of layers, occur only rarely. Accordingly, the third aspect uses a layer-identification extension mechanism signaling in the multi-layer video signal in order to signal whether the layer-identification syntax element within each packet determines the layer of the respective packet completely, determines it only partially, together with a layer-identification extension in the multi-layer data stream, or is completely replaced/overruled by the layer-identification extension. By this measure, the layer-identification extension is needed, and consumes bit rate, only in the rarely occurring applications, while in most cases efficient signaling of the layer association is feasible.
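The three cases distinguished by the extension mechanism can be sketched as a small derivation function. This is only an illustration: the 6-bit width of the per-packet field, the way the extension is concatenated with it, and the names `layer_id`, `ext_enabled` and `mode` are assumptions chosen for the example, not syntax taken from the application.

```python
def layer_id(base_field, ext_enabled=False, extension=None, mode="combine", base_bits=6):
    """Derive a packet's layer index from its fixed-length field and an optional extension.

    Assumed behaviour (illustrative only):
      - extension mechanism off: the per-packet field alone determines the layer;
      - mode "combine": the extension supplies the high-order bits;
      - mode "replace": the extension overrules the per-packet field completely.
    """
    if not 0 <= base_field < (1 << base_bits):
        raise ValueError("base_field does not fit into the fixed-length field")
    if not ext_enabled:
        return base_field
    if extension is None:
        raise ValueError("extension mechanism signalled but no extension given")
    if mode == "combine":
        return (extension << base_bits) | base_field
    if mode == "replace":
        return extension
    raise ValueError(f"unknown mode: {mode}")


print(layer_id(5))                                  # common case, no extension
print(layer_id(5, ext_enabled=True, extension=2))   # extension adds high-order bits
print(layer_id(5, ext_enabled=True, extension=200, mode="replace"))
```

The common case costs no extra bits per packet. Only streams that enable the mechanism pay for the extension, which matches the motivation given above.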

A fourth aspect of the present application concerns the signaling of the inter-layer prediction dependencies between the different levels of information amount at which video material is coded into a multi-layer video data stream. According to the fourth aspect, a first syntax structure defines the number of dependency dimensions, a maximum N_i of rank levels for each dependency dimension i, and a bijective mapping that maps each level onto a respective one of at least a subset of the available points within the dependency space. In addition, there is a second syntax structure for each dependency dimension i. The second syntax structures define the dependencies among the layers: each second syntax structure describes the dependencies among the N_i rank levels of the dependency dimension i to which it belongs. Thus, the effort for defining the dependencies increases only linearly with the number of dependency dimensions, while the restriction that this signaling imposes on the inter-dependencies between the individual layers is low.
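The saving from describing dependencies per dimension can be made concrete with a small sketch. The rule used here to derive layer-to-layer dependencies is an assumption for illustration: a layer at point `p` depends directly on a layer at point `q` if the two points differ in exactly one dimension i and the per-dimension structure marks rank `p[i]` as dependent on rank `q[i]`. The flag count further assumes that a rank can only depend on lower ranks. All names are hypothetical.

```python
from math import prod


def layer_depends_on(p, q, dim_deps):
    """Direct dependency between two points of the dependency space.

    `dim_deps[i]` maps a rank in dimension i to the set of ranks it depends on.
    """
    differing = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(differing) != 1:
        return False
    i = differing[0]
    return q[i] in dim_deps[i].get(p[i], set())


def flag_counts(levels):
    """(per-dimension flags, pairwise layer flags) for the given N_i values."""
    per_dimension = sum(n * (n - 1) // 2 for n in levels)
    layers = prod(levels)
    pairwise = layers * (layers - 1) // 2
    return per_dimension, pairwise


# Two dependency dimensions with N_0 = N_1 = 3 rank levels.
dim_deps = [
    {1: {0}, 2: {1}},   # dimension 0: rank 1 uses rank 0, rank 2 uses rank 1
    {1: {0}, 2: {0}},   # dimension 1: ranks 1 and 2 both use rank 0
]
print(layer_depends_on((1, 0), (0, 0), dim_deps))
print(layer_depends_on((1, 1), (0, 0), dim_deps))
print(flag_counts([3, 3]))
```

Under these assumptions, nine layers arranged in a 3 x 3 dependency space need 6 per-dimension flags instead of 36 pairwise flags, and each added dimension contributes only its own term to the sum.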

Naturally, all of the above aspects may be combined in pairs, in triplets, or all together.

Brief Description of the Drawings

Preferred embodiments of the present application are described below with reference to the accompanying drawings, in which:

Figure 1 shows a video encoder serving as an illustrative example for implementing any of the multi-layer encoders further outlined with respect to the following figures;

Figure 2 shows a schematic block diagram of a video decoder fitting the video encoder of Figure 1;

Figure 3 shows a schematic diagram of a picture subdivided into substreams for WPP processing;

Figure 4 shows a schematic diagram of a picture of any layer, subdivided into blocks, with an indication of a further subdivision of the picture into spatial segments;

Figure 5 shows a schematic diagram of a picture of any layer, subdivided into blocks and tiles;

Figure 6 shows a schematic diagram of a picture subdivided into blocks and substreams;

Figure 7 shows, according to one embodiment, a schematic diagram of pictures of a base view and a dependent view, with the dependent view picture arranged in front of the base view picture and both registered to each other, in order to illustrate the restriction of the domain of possible values of the disparity vectors of base view blocks near a spatial segment boundary relative to base view blocks located farther away from the spatial segment boundary;

Figure 8 shows a schematic block diagram of an encoder supporting inter-view prediction restrictions and spatial segment boundaries according to one embodiment;

Figure 9 shows a schematic block diagram of a decoder fitting the encoder of Figure 8;

Figure 10 shows a schematic diagram of inter-view prediction using disparity vectors, in order to illustrate aspects of determining the restriction of the domain of possible values of disparity vectors and/or possible modifications of the inter-view prediction process for inter-view predicted dependent view blocks;

Figure 11 shows a schematic diagram of inter-view prediction for predicting parameters of the dependent view, in order to illustrate the application of inter-view prediction;

Figure 12 shows a schematic diagram of a picture subdivided into code blocks and tiles, respectively, with the tiles composed of integer multiples of the code blocks and a decoding order defined among the code blocks following the subdivision of the picture into tiles;

Figure 13A shows a portion of a modified VPS syntax as an example of building the embodiments of Figures 8 to 11 into HEVC;

Figure 13B shows a portion corresponding to that of Figure 13A, this portion, however, belonging to the SPS syntax;

Figure 13C shows another exemplary portion of a modified VPS syntax;

Figure 13D shows an example of a modified VPS syntax for implementing the signaling of the inter-view prediction change;

Figure 13E shows a further example of a portion of a modified VPS syntax for signaling the inter-view prediction change at spatial segment boundaries;

Figure 13F shows a further example of a portion of a modified VPS syntax;

Figure 13G shows a portion of a modified VPS syntax as a further possibility of signaling the inter-view prediction change/restriction;

Figure 14 shows an example of a portion of a modified VPS syntax for a further embodiment of signaling the inter-view prediction change/restriction;

Figure 15 shows, according to one embodiment, a schematic diagram of an overlay of a dependent view picture and a base view picture, shown one on top of the other, in order to illustrate possible modifications of the base layer filter process at spatial segment boundaries, which may be triggered concurrently with the inter-view prediction change/restriction;

Figure 16 shows a schematic diagram of a multi-layer video data stream, here exemplarily comprising three layers, wherein options 1 and 2 for arranging the NAL units belonging to a respective time instant and a respective layer within the data stream are illustrated in the bottom half of Figure 16;

Figure 17 shows a schematic diagram of a portion of a data stream, illustrating these two options in the exemplary case of two layers;

Figure 18 shows a schematic block diagram of a decoder configured to process multi-layer video data streams according to option 1 of Figures 16 and 17;

Figure 19 shows a schematic block diagram of an encoder fitting the decoder of Figure 18;

Figure 20 shows, according to one embodiment, a schematic diagram of a picture subdivided into substreams for WPP processing, additionally indicating the wavefront that results when the picture is decoded/encoded in parallel using WPP;

Figure 21 shows a multi-layer video data stream concerning three views with three decoding units each, in a non-interleaved state;

Figure 22 shows a schematic diagram of the configuration of a multi-layer video data stream according to Figure 21, but with the views interleaved;

Figure 23 shows a schematic diagram of a portion of a multi-layer video data stream as an internal sequence of NAL units, in order to illustrate constraints that may have to be obeyed when interleaving the layers within an access unit;

Figure 24 shows an example of a portion of a modified VPS syntax illustrating a possibility of signaling the interleaving of decoding units;

Figure 25 shows a portion of a NAL unit header, the portion exemplarily comprising a fixed-length layer-identification syntax element;

Figure 26 shows a portion of a VPS syntax indicating a possibility of realizing the layer-identification extension mechanism signaling;

Figure 27 shows a portion of a VPS syntax indicating another possibility of realizing the layer-identification extension mechanism signaling;

Figure 28 shows a portion of a VPS syntax illustrating a further example of realizing the layer-identification extension mechanism signaling;

Figure 29 shows a portion of a slice segment header illustrating a possibility of implementing a layer-identification extension in the data stream;

Figure 30 shows a portion of a slice segment header illustrating a further example of implementing a layer-identification extension;

Figure 31 shows a portion of a VPS syntax illustrating the realization of a layer-identification extension;

Figure 32 shows a portion of a data stream syntax illustrating another possibility of realizing a layer-identification extension;

Figure 33 schematically shows a camera setup according to one embodiment, in order to illustrate the possibility of combining the layer-identification syntax element with the layer-identification extension;

Figures 34A and 34B show a portion of a VPS extension syntax for signaling, within the data stream, the framework for the layer extension mechanism;

Figure 35 shows a schematic diagram of a decoder configured to process a multi-layer video data stream provided with signaling concerning the inter-dependencies of the layers of the data stream and the arrangement of these layers in the dependency space, respectively;

Figure 36 shows a schematic diagram illustrating the dependency space, here a direct dependency structure of layers in a two-dimensional space, with each dimension of the space using a particular prediction structure;

Figure 37 shows a schematic diagram of an array of direct dependency flags specifying the dependencies between different layers;

Figure 38 shows a schematic diagram of two arrays of direct dependency flags specifying the dependencies between different positions and different dimensions;

Figure 39 shows a portion of a data stream syntax illustrating a way of signaling the part of the first syntax structure that defines the dependency space;

Figure 40 shows a portion of a data stream illustrating a possibility of signaling the part of the first syntax structure that concerns the mapping between the layers of the data stream and the available points in the dependency space;

Figure 41 shows a portion of a data stream illustrating a possibility of defining the second syntax structure, which describes the dependencies dimension by dimension; and

Figure 42 shows another possibility of defining the second syntax structure.

Detailed Description

First, as an overview, an example of an encoder/decoder structure is presented that fits any of the concepts presented subsequently.

Figure 1 shows the general structure of an encoder according to an embodiment. The encoder 10 may be implemented so as to operate in a multi-threaded manner or not, i.e., merely single-threaded. That is, the encoder 10 may, for example, be implemented using multiple CPU cores. In other words, the encoder 10 may support parallel processing, but it does not have to. The bitstreams generated can also be generated/decoded by single-threaded encoders/decoders. The coding concepts of the present application, however, enable parallel-processing encoders to apply parallel processing efficiently without compromising compression efficiency. With regard to the parallel processing capability, similar statements hold for the decoder described later with respect to Figure 2.

The encoder 10 is a video encoder, but in general the encoder 10 may also be a picture encoder. A picture 12 of a video 14 is shown entering the encoder 10 at an input 16. Picture 12 shows a certain scene, i.e., picture content. However, the encoder 10 also receives at its input 16 another picture 15 pertaining to the same time instant, with the two pictures 12 and 15 belonging to different layers. Merely for illustration purposes, picture 12 is shown as belonging to layer 0, whereas picture 15 is shown as belonging to layer 1. Figure 1 illustrates that layer 1 may have a higher spatial resolution than layer 0, i.e., may show the same scene with a higher number of picture samples. This is for illustration only: picture 15 of layer 1 may alternatively have the same spatial resolution but differ, for example, in view direction relative to layer 0, i.e., pictures 12 and 15 may have been captured from different viewpoints. Note that the terms base layer and enhancement layer as used in this document may refer to any set of reference and dependent layers in the hierarchy of layers.

The encoder 10 is a hybrid encoder, i.e., pictures 12 and 15 are predicted by a predictor 18, and the prediction residual 20 obtained by a residual determiner 22 is subjected to a transform (a spectral decomposition such as a DCT) and to quantization in a transform/quantization module 24. The transformed and quantized prediction residual 26 thus obtained is entropy coded in an entropy coder 28, for example by arithmetic coding or variable-length coding, using, for example, context adaptivity. The encoder works with the version of the residual that can be reconstructed at the decoder: the dequantized and retransformed residual signal 30 is recovered by a retransform/requantization module 31 and recombined with the prediction signal 32 of the predictor 18 by a combiner 33, resulting in a reconstruction 34 of pictures 12 and 15, respectively. The encoder 10, however, operates on a block basis. Accordingly, the reconstructed signal 34 suffers from discontinuities at block boundaries, and a filter 36 may therefore be applied to the reconstructed signal 34 in order to yield a reference picture 38 for pictures 12 and 15, respectively, on the basis of which the predictor 18 predicts subsequently encoded pictures of the different layers. As shown by a dashed line in Figure 1, the predictor 18 may also, for example in other prediction modes such as spatial prediction modes, use the reconstructed signal 34 directly, without the filter 36, or use an intermediate version.
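The prediction, residual and reconstruction loop of a hybrid encoder can be sketched in a few lines. This is a deliberately reduced illustration: the transform and the entropy coder are omitted, quantization is a uniform scalar quantizer with step `qstep`, and the function names are hypothetical. Its point is that the encoder reconstructs from the quantized residual exactly as the decoder does, so both sides predict from identical reference samples.

```python
def encode_block(block, prediction, qstep):
    """Return (quantized levels, encoder-side reconstruction) for one block.

    Simplifications (assumed for illustration): no transform, no entropy
    coding, uniform scalar quantization of the prediction residual.
    """
    residual = [s - p for s, p in zip(block, prediction)]
    levels = [round(r / qstep) for r in residual]
    reconstruction = decode_block(levels, prediction, qstep)
    return levels, reconstruction


def decode_block(levels, prediction, qstep):
    """Dequantize the levels and add them to the prediction."""
    return [p + lvl * qstep for p, lvl in zip(prediction, levels)]


block = [104, 111, 97, 89]
prediction = [100, 100, 100, 100]
levels, enc_recon = encode_block(block, prediction, qstep=4)
dec_recon = decode_block(levels, prediction, qstep=4)
print(levels, enc_recon, dec_recon)
```

The encoder-side and decoder-side reconstructions coincide, and the reconstruction error per sample stays within half a quantization step. A real codec would additionally apply the in-loop filter before storing the result as a reference picture.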

The predictor 18 may choose among different prediction modes in order to predict certain blocks of picture 12. One such block 39 of picture 12 is shown by way of example in Figure 1. There may be a temporal prediction mode, according to which block 39, which is representative of any of the blocks into which picture 12 is partitioned, is predicted on the basis of a previously coded picture of the same layer, such as picture 12'. There may also be a spatial prediction mode, according to which block 39 is predicted on the basis of a previously coded portion of the same picture 12 that neighbors block 39. A block 41 of picture 15 is also shown in Figure 1 as representative of any of the blocks into which picture 15 is partitioned. For block 41, the predictor 18 may support the prediction modes just discussed, i.e., the temporal and spatial prediction modes. In addition, the predictor 18 may provide an inter-layer prediction mode, according to which block 41 is predicted on the basis of a corresponding portion of picture 12 of the lower layer. "Corresponding" in "corresponding portion" denotes spatial correspondence, i.e., the portion within picture 12 shows the same part of the scene as the block 41 to be predicted in picture 15.
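How a predictor might pick among the temporal, spatial and inter-layer modes can be sketched as a simple cost comparison. The text does not state a selection criterion, so the sum of absolute differences (SAD) used here is an assumption, as are the names `sad` and `choose_mode` and the sample values.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(s - p) for s, p in zip(block, prediction))


def choose_mode(block, candidates):
    """Pick the candidate prediction with the smallest SAD.

    `candidates` maps a mode name to its predicted samples. An inter-layer
    candidate would only be offered for blocks of a dependent layer.
    """
    return min(((mode, sad(block, pred)) for mode, pred in candidates.items()),
               key=lambda item: item[1])


block_41 = [10, 12, 14, 16]
candidates = {
    "temporal": [10, 10, 10, 10],      # from a previously coded picture of the same layer
    "spatial": [12, 12, 12, 12],       # from neighbouring samples of the same picture
    "inter_layer": [10, 12, 15, 16],   # from the corresponding portion of the lower layer
}
print(choose_mode(block_41, candidates))
```

In this example the inter-layer candidate wins with a cost of 1. A practical encoder would typically also weigh the bits needed to signal the mode and its parameters.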

Naturally, the predictions of the predictor 18 need not be restricted to picture samples. Prediction may also be applied to any coding parameter, e.g., prediction modes, motion vectors of the temporal prediction, disparity vectors of the inter-view prediction, and so on. Only the residuals then need to be coded in the bitstream 40. That is, coding parameters may be predictively coded/decoded using spatial and/or inter-layer prediction. Even here, disparity compensation may be used.

A certain syntax is used to code the quantized residual data 26, i.e., the transform coefficient levels and other residual data, together with the encoding parameters, which include, for example, the prediction modes and prediction parameters of the individual blocks 39 and 41 of images 12 and 15 as determined by predictor 18. The syntax elements of this syntax are entropy encoded by entropy encoder 28. The data stream 40 thus obtained at the output of entropy encoder 28 forms the bitstream 40 output by encoder 10.

Figure 2 shows a decoder that fits the video encoder of Figure 1, i.e., one that is able to decode bitstream 40. The decoder of Figure 2 is generally indicated by reference numeral 50 and comprises an entropy decoder 42, a retransform/dequantization module 54, a combiner 56, a filter 58, and a predictor 60. Entropy decoder 42 receives the bitstream and entropy decodes it to recover residual data 62 and encoding parameters 64. Retransform/dequantization module 54 dequantizes and retransforms residual data 62 and forwards the resulting residual signal 65 to combiner 56. Combiner 56 also receives a prediction signal 66 from predictor 60. Predictor 60, in turn, forms prediction signal 66 using encoding parameters 64, based on the reconstructed signal 68 that combiner 56 produces by combining prediction signal 66 and residual signal 65. The prediction mirrors the prediction finally chosen by predictor 18: the same prediction modes are available, and these modes are selected for the individual blocks of images 12 and 15 and steered according to the prediction parameters. As explained above with respect to Figure 1, predictor 60 may alternatively or additionally use a filtered version of reconstructed signal 68 or some intermediate version thereof. Likewise, the images of the different layers that are finally reproduced and output at output 70 of decoder 50 may be determined from an unfiltered version of combined signal 68 or from some filtered version thereof.
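The role of combiner 56 can be sketched as a sample-wise addition of prediction signal and residual signal. The clipping to the valid sample range and the 8-bit default are assumptions for illustration; the text only states that the two signals are combined.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(prediction: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add residual to prediction sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption, not taken from the text.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


recon = reconstruct_block([[250, 10], [100, 128]], [[10, -20], [5, 0]])
```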

According to the tile concept, images 12 and 15 are subdivided into tiles 80 and 82, respectively, and at least the spatial prediction of blocks 39 and 41 within these tiles 80 and 82 is restricted to use, as its basis, only data relating to the same tile of the same image 12, 15. This means that the spatial prediction of block 39 is restricted to previously encoded portions of the same tile, whereas the temporal prediction mode is not restricted and may rely on information from a previously encoded image (e.g., image 12'). Likewise, the spatial prediction mode of block 41 is restricted to previously encoded data of the same tile only, whereas the temporal and inter-layer prediction modes are not restricted. The subdivision of images 15 and 12 into six tiles each has been chosen merely for illustration. The subdivision into tiles can be selected and signaled within bitstream 40 individually for images 12', 12 and 15, 15', respectively. The number of tiles per image 12 and 15 can be any of 1, 2, 3, 4, 6, and so on, and the tile partitioning may be restricted to regular partitioning into rows and columns of tiles. For completeness, note that coding the tiles separately need not be limited to intra prediction or spatial prediction; the restriction may also cover any prediction of encoding parameters across tile boundaries as well as context selection in entropy coding. That is, the latter may also be restricted to depend only on data of the same tile. The decoder is therefore able to perform the operations just mentioned in parallel, namely in units of tiles.
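The tile restriction on spatial prediction can be sketched as a membership test: a reference block may serve as a basis for spatial prediction only if it lies in the same tile as the current block. The grid layout, the function names, and the representation of tile boundaries as lists of start positions are hypothetical. The sketch checks only the tile constraint and ignores coding order.

```python
from bisect import bisect_right
from typing import List, Tuple


def tile_index(x: int, y: int, col_starts: List[int], row_starts: List[int]) -> int:
    """Index (in raster order) of the tile containing block position (x, y).

    col_starts / row_starts hold the first block column / row of each tile
    column / row, sorted ascending and beginning with 0.
    """
    tile_col = bisect_right(col_starts, x) - 1
    tile_row = bisect_right(row_starts, y) - 1
    return tile_row * len(col_starts) + tile_col


def same_tile(cur: Tuple[int, int], ref: Tuple[int, int],
              col_starts: List[int], row_starts: List[int]) -> bool:
    """True if ref lies in the same tile as cur (tile constraint only)."""
    return tile_index(*cur, col_starts, row_starts) == tile_index(*ref, col_starts, row_starts)


# Six tiles: three tile columns by two tile rows on a 6x4 block grid.
COLS, ROWS = [0, 2, 4], [0, 2]
left_neighbor_ok = same_tile((3, 1), (2, 1), COLS, ROWS)
across_boundary_ok = same_tile((2, 1), (1, 1), COLS, ROWS)
```

In this layout the left neighbor of block (3, 1) lies in the same tile, whereas the left neighbor of block (2, 1) lies across a tile boundary and would be excluded from spatial prediction.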

Alternatively or additionally, the encoder and decoder of Figures 1 and 2 can use the WPP concept; see Figure 3. WPP substreams 100 likewise represent a spatial partitioning of an image 12, 15, namely into WPP substreams. In contrast to tiles and slices, WPP substreams impose no restrictions on prediction and context selection across WPP substreams 100. WPP substreams 100 extend row-wise, for example across rows of LCUs (largest coding units) 101, i.e., the largest possible blocks for which prediction coding modes can be signaled individually in the bitstream, and in order to enable parallel processing only one compromise is made, which concerns entropy coding. In particular, an order 102 is defined among the WPP substreams 100, which by way of example leads from top to bottom. For each WPP substream 100 other than the first one in order 102, the probability estimates for the symbol alphabet, i.e., the entropy probabilities, are not completely reset but are adopted from, or set equal to, the probabilities that result after entropy encoding/decoding the immediately preceding WPP substream up to its second LCU, as indicated by line 104. The LCU order, or decoding order within a substream, starts for each WPP substream at the same side of image 12 or 15, for example the left-hand side as indicated by arrow 106, and leads along the LCU row direction to the other side. Accordingly, by obeying a certain encoding delay between consecutive WPP substreams of the same image 12 or 15, these WPP substreams 100 can be decoded/encoded in parallel, so that the portions of the respective image 12, 15 that are decoded/encoded in parallel, i.e., concurrently, form a kind of wavefront 108 that moves across the image from left to right in a tilted manner.
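The tilted wavefront can be made concrete with a simplified schedule. The sketch assumes that every LCU takes one time step, that one worker is available per LCU row, and that each row starts two LCUs after the row above it, in line with the adoption of probabilities after the second LCU. The function names are hypothetical.

```python
from typing import List, Tuple


def earliest_step(row: int, col: int, delay: int = 2) -> int:
    """Earliest time step at which LCU (row, col) can be processed."""
    return col + delay * row


def wavefront(step: int, rows: int, cols: int, delay: int = 2) -> List[Tuple[int, int]]:
    """LCUs (row, col) processed concurrently at the given time step."""
    front = []
    for row in range(rows):
        col = step - delay * row
        if 0 <= col < cols:
            front.append((row, col))
    return front


# At step 4 in a 3x6 LCU grid, three rows are active along a tilted front.
front_at_4 = wavefront(4, rows=3, cols=6)
```

Under these assumptions, the last LCU of an image with R rows and C columns is processed at step (C - 1) + 2 (R - 1), compared with R C - 1 for purely sequential processing.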

Briefly note that orders 102 and 104 also define a raster scan order among the LCUs, which leads row by row, from top to bottom, from the top-left LCU 101 to the bottom-right LCU. Each WPP substream may correspond to one LCU row. Returning briefly to tiles, tiles may likewise be restricted to be aligned with LCU boundaries. A substream may be split into one or more slices, and the boundary between two slices in the interior of a substream is not bound to LCU boundaries. In that case, however, the entropy probabilities are adopted when passing from one slice of a substream to the next slice of that substream. In the case of tiles, a whole tile may be gathered into one slice, or a tile may be split into one or more slices, and again the boundary between two slices in the interior of a tile is not bound to LCU boundaries. In the case of tiles, the order among the LCUs is changed so that the LCUs of one tile are traversed in raster scan order first before proceeding, in tile order, to the next tile.

As described so far, image 12 can be partitioned into tiles or WPP substreams, and likewise image 15 can be partitioned into tiles or WPP substreams. In theory, the WPP substream partitioning/concept could be chosen for one of images 12 and 15 while the tile partitioning/concept is chosen for the other. Alternatively, a restriction could be imposed on the bitstream according to which the concept type, i.e., tiles or WPP substreams, must be the same across layers. Slices are another example of a spatial segment. Slices are used to segment bitstream 40 for transmission purposes. Slices are packed into NAL units, which are the smallest entities for transmission. Each slice can be encoded/decoded independently. That is, any prediction across slice boundaries is prohibited, as is context selection and the like. Altogether, these are three examples of spatial segments: slices, tiles, and WPP substreams. Furthermore, all three parallelization concepts (tiles, WPP substreams, and slices) can be used in combination; that is, image 12 or image 15 can be split into tiles, with each tile split into multiple WPP substreams. Slices can also be used to partition the bitstream into multiple NAL units, for instance (but not exclusively) at tile or WPP boundaries. If an image 12, 15 is partitioned using tiles or WPP substreams and, in addition, using slices, and the slice partitioning deviates from the WPP/tile partitioning, then a spatial segment is defined as the smallest independently decodable portion of image 12, 15. Alternatively, a restriction may be imposed on the bitstream as to which combinations of these concepts may be used within an image (12 or 15) and/or whether boundaries have to be aligned between the different concepts used.

Various prediction modes supported by the encoder and decoder have been discussed above, along with the restrictions imposed on the prediction modes and on context derivation for entropy encoding/decoding in order to support parallel coding concepts such as the tile and/or WPP concept. It was also mentioned above that the encoder and decoder may operate block by block. For example, the prediction modes described above are selected block by block, i.e., at a granularity finer than the images themselves. Before proceeding with the description of aspects of the present application, the relationship between slices, tiles, WPP substreams, and the blocks just mentioned is explained according to one embodiment.

Figure 4 shows an image, which may be an image of layer 0 (e.g., image 12) or an image of layer 1 (e.g., image 15). The image is regularly subdivided into an array of blocks 90. These blocks 90 are sometimes called largest coding blocks (LCBs), largest coding units (LCUs), coding tree blocks (CTBs), or the like. The subdivision of the image into blocks 90 can form a kind of base or coarsest granularity at which the predictions and residual coding described above are performed, and this coarsest granularity, i.e., the size of blocks 90, can be signaled and set by the encoder individually for layer 0 and layer 1. A multi-tree subdivision, e.g., a quadtree subdivision, can be used and signaled within the data stream in order to subdivide each block 90 into prediction blocks, residual blocks, and/or coding blocks. In particular, the coding blocks may be the leaf blocks of a recursive multi-tree subdivision of blocks 90, and some prediction-related decisions, such as the prediction mode, may be signaled at the granularity of coding blocks. The prediction blocks and the residual blocks may be the leaf blocks of separate recursive multi-tree subdivisions of the coding blocks. Prediction parameters, such as motion vectors in the case of temporal inter prediction and disparity vectors in the case of inter-layer prediction, are coded at the granularity of prediction blocks, and the prediction residual is coded at the granularity of residual blocks.
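A recursive quadtree subdivision of a block 90 into leaf blocks can be sketched as follows. The split decision is passed in as a callback because the text leaves open how the encoder decides; the callback, the minimum block size, and the function name are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of a square leaf block


def quadtree_leaves(x: int, y: int, size: int,
                    should_split: Callable[[int, int, int], bool],
                    min_size: int = 8) -> List[Leaf]:
    """Return the leaf blocks of a recursive quadtree subdivision.

    Children are visited top-left, top-right, bottom-left, bottom-right.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Leaf] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(quadtree_leaves(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Split the 64x64 root, then split only its top-left 32x32 quadrant once more.
leaves = quadtree_leaves(
    0, 0, 64,
    should_split=lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0),
)
```

In a bitstream the split decisions would be signaled as flags, so that the decoder can rebuild the same tree; here they are simply evaluated by the callback.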

A raster scan encoding/decoding order 92 can be defined among blocks 90. Encoding/decoding order 92 restricts the availability of neighboring portions for the purpose of spatial prediction: only those portions of the image that, according to encoding/decoding order 92, precede the current portion, such as the current block 90 or some smaller block thereof to which the syntax element currently to be predicted relates, are available for spatial prediction within the current image. Within each layer, encoding/decoding order 92 traverses all blocks 90 of an image and then proceeds to the blocks of the next image of the respective layer in an image encoding/decoding order that does not necessarily follow the temporal reproduction order of the images. Within the individual blocks 90, encoding/decoding order 92 is refined into a scan among the smaller blocks, such as the coding blocks.
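The availability rule implied by a raster scan order can be sketched as a comparison of scan positions. The sketch works at the granularity of whole blocks 90 and ignores tiles, slices, and the finer scan inside a block; the function name and coordinate convention are hypothetical.

```python
from typing import Tuple


def is_available(cur: Tuple[int, int], ref: Tuple[int, int], width_in_blocks: int) -> bool:
    """True if block ref = (x, y) precedes block cur = (x, y) in raster scan order."""
    cur_pos = cur[1] * width_in_blocks + cur[0]
    ref_pos = ref[1] * width_in_blocks + ref[0]
    return ref_pos < cur_pos


# For the current block (2, 1) in a grid that is 4 blocks wide:
top_right_available = is_available((2, 1), (3, 0), 4)   # row above, already coded
right_available = is_available((2, 1), (3, 1), 4)       # same row, not yet coded
```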

In relation to the blocks 90 and smaller blocks just outlined, each image is further subdivided into one or more slices along the encoding/decoding order 92 just mentioned. Slices 94a and 94b, shown by way of example in Figure 4, accordingly cover the respective image without gaps. The boundary or interface 96 between consecutive slices 94a and 94b of one image may or may not be aligned with the boundaries of neighboring blocks 90. More precisely, and as shown on the right-hand side of Figure 4, consecutive slices 94a and 94b within one image may adjoin each other at boundaries of smaller blocks such as coding blocks, i.e., leaf blocks of a subdivision of one of blocks 90.

Slices 94a and 94b of an image can form the smallest units in which the portion of the data stream into which the image is encoded can be packetized into packets, i.e., NAL units. A further property of slices was described above, namely the restrictions that slices impose on, for example, prediction and entropy context determination across slice boundaries. Slices with such restrictions may be called "ordinary" slices. As described in more detail below, besides ordinary slices there are also "dependent slices".

If the tile partitioning concept is used for an image, the encoding/decoding order 92 defined among the array of blocks 90 changes. This is shown in Figure 5, where the image is shown, by way of example, as partitioned into four tiles 82a to 82d. As illustrated in Figure 5, the tiles themselves are defined as a regular subdivision of the image in units of blocks 90. That is, each tile 82a to 82d consists of an array of n x m blocks 90, with n being set individually for each row of tiles and m being set individually for each column of tiles. Following encoding/decoding order 92, the blocks 90 in the first tile are scanned in raster scan order first before proceeding to the next tile 82b, and so forth, with the tiles 82a to 82d themselves being scanned in raster scan order.
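The modified scan order under tile partitioning can be sketched by enumerating blocks tile by tile. Tile column widths and tile row heights are given in units of blocks 90; the function name and the example grid are assumptions for illustration.

```python
from typing import List, Tuple


def tile_scan_order(col_widths: List[int], row_heights: List[int]) -> List[Tuple[int, int]]:
    """Block positions (x, y) in coding order when tiles are used.

    Tiles are visited in raster order; inside each tile the blocks are
    visited in raster order as well.
    """
    order = []
    y0 = 0
    for height in row_heights:          # tile rows, top to bottom
        x0 = 0
        for width in col_widths:        # tile columns, left to right
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append((x, y))
            x0 += width
        y0 += height
    return order


# Four tiles (2x2) on a 4x4 block grid, analogous to tiles 82a to 82d.
order = tile_scan_order([2, 2], [2, 2])
```

The first four entries cover the first tile completely before the scan enters the second tile, in contrast to a plain raster scan, which would finish the first block row of the whole image first.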

According to the WPP stream partitioning concept, an image is subdivided along encoding/decoding order 92 into WPP substreams 98a to 98d in units of one or more rows of blocks 90. For example, each WPP substream may cover one complete row of blocks 90, as illustrated in Figure 6.

The tile concept and the WPP substream concept can, however, also be mixed. In that case, each WPP substream covers, for example, one row of blocks 90 within each tile.

The slice partitioning of an image can even be used together with the tile partitioning and/or the WPP substream partitioning. With respect to tiles, each of the one or more slices into which an image is subdivided may, along encoding/decoding order 92, consist of exactly one complete tile, of more than one complete tile, or of only a sub-portion of one tile. Slices may also be used to form the WPP substreams 98a to 98d. To this end, the slices that form the smallest units for packetization may comprise ordinary slices on the one hand and dependent slices on the other hand: whereas ordinary slices impose the restrictions described above on prediction and entropy context derivation, dependent slices do not impose such restrictions. A dependent slice that starts at the boundary of the image from which encoding/decoding order 92 essentially points away row-wise adopts the entropy context that results from entropy decoding blocks 90 in the immediately preceding row of blocks 90, and a dependent slice that starts elsewhere may adopt the entropy coding context that results from entropy encoding/decoding the immediately preceding slice up to its end. By this measure, each WPP substream 98a to 98d can be composed of one or more dependent slices.

That is, the encoding/decoding order 92 defined among blocks 90 leads linearly from a first side of the respective image, here the left side by way of example, to the opposite side, here the right side, and then steps to the next row of blocks 90 in the downward direction. The available, i.e., already encoded/decoded, portions of the current image therefore lie mainly to the left of and above the portion currently being encoded/decoded, such as the current block 90. Because prediction and entropy context derivation are interrupted at tile boundaries, the tiles of one image can be processed in parallel. Encoding/decoding of the tiles of one image can even start at the same time. Restrictions arise from the in-loop filtering mentioned above if that filtering is allowed to cross tile boundaries. Encoding/decoding of the WPP substreams, in turn, is started in a staggered manner from top to bottom. The intra-image delay between consecutive WPP substreams, measured in blocks 90, is two blocks 90.

However, it would be advantageous to parallelize even the encoding/decoding of images 12 and 15, i.e., of the time instants of the different layers. Obviously, encoding/decoding of image 15 of the dependent layer has to be delayed relative to the encoding/decoding of the base layer in order to guarantee that the "spatially corresponding" portions of the base layer are already available. These considerations remain valid even if no parallelization of the encoding/decoding is used within any of images 12 and 15 individually. Even if one slice is used to cover the whole of image 12 and of image 15, respectively, with neither tiles nor WPP substream processing being used, the encoding/decoding of images 12 and 15 can be parallelized. The signaling described next, i.e., aspect 6, is a possibility for expressing such an encoding/decoding delay between layers, both in the case where tile or WPP processing is used for any of the images of the layers and irrespective of whether tile or WPP processing is used for any of the images of the layers.

Before discussing the concepts of the present application presented above, and referring again to Figures 1 and 2, it should be noted that the block structure of the encoder and decoder in Figures 1 and 2 serves merely illustrative purposes and that the structure may also be different.

With respect to the above description relating to the minimum encoding delay between the encoding of consecutive layers, it should be noted that the decoder is able to determine the minimum decoding delay from short-term syntax elements. However, if long-term syntax elements are used to signal this inter-layer temporal delay in advance for a predetermined time period, the decoder can plan ahead using the guarantee provided and can more easily distribute the workload within the parallel decoding of bitstream 40.

The first aspect concerns restricting inter-layer prediction between views, in particular, for example, disparity-compensated inter-view prediction, in support of a lower overall encoding/decoding delay or of parallelization capabilities. Details are readily available from the following figures. For a brief explanation, see Figure 7.

For example, the encoder can restrict the available domain 301 of the disparity vector of a current block 302 of the dependent view, which is to be inter-layer predicted, at the boundary 300 of a base layer segment. Reference numeral 303 indicates the restriction. For comparison, Figure 7 shows another block 302' of the dependent view whose available domain of disparity vectors is not restricted. The encoder can signal this behavior, i.e., restriction 303, within the data stream so as to enable the decoder to exploit it in terms of low delay. As far as inter-layer prediction is concerned, the decoder may operate as usual, with the encoder, however, guaranteeing that no portion of an "unavailable segment" is needed, so that the decoder can keep the inter-layer delay lower. Alternatively, both encoder and decoder change their mode of operation with respect to inter-layer prediction at boundary 300 so as, for example, to additionally exploit the lower variety of available states of the inter-layer prediction parameters at boundary 300.
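One conceivable way for an encoder to honor such a restriction is to confine the horizontal disparity component so that the referenced base-layer portion stays inside a single segment. The sketch below is an illustration under simplifying assumptions (one dimension, half-open segment interval, block no wider than the segment); clamping is a stand-in for whatever search-range restriction an encoder actually applies, and all names are hypothetical.

```python
def restrict_disparity(block_x: int, block_w: int, dv_x: int,
                       seg_x0: int, seg_x1: int) -> int:
    """Clamp dv_x so that [block_x + dv_x, block_x + dv_x + block_w) lies in [seg_x0, seg_x1).

    Assumes block_w <= seg_x1 - seg_x0.
    """
    lowest = seg_x0 - block_x                 # reference may not start left of the segment
    highest = seg_x1 - block_w - block_x      # reference may not end right of the segment
    return max(lowest, min(dv_x, highest))


# Block at x = 40, width 8, co-located base-layer segment covering x in [32, 64).
dv_left = restrict_disparity(40, 8, -20, 32, 64)   # would cross the left boundary
dv_right = restrict_disparity(40, 8, 30, 32, 64)   # would cross the right boundary
dv_inside = restrict_disparity(40, 8, 5, 32, 64)   # already inside, left unchanged
```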

Figure 8 shows a multi-view encoder 600 configured to encode multiple views 12 and 15 into a data stream 40 using inter-view prediction. In the case of Figure 8, the number of views is chosen to be two by way of example, with arrow 602 indicating the inter-view prediction from the first view 12 to the second view 15. An extension to more than two views is readily conceivable. The same applies to the embodiments described hereinafter. Multi-view encoder 600 is configured to change the inter-view prediction at the spatial segment boundaries 300 of the spatial segments 301 into which the first view is partitioned.

As to possible implementation details of encoder 600, reference is made, for example, to the description given above with respect to Figure 1. That is, encoder 600 may be an image or video encoder and may operate block by block. In particular, encoder 600 may be of the hybrid coding type, configured to subject the first view 12 and the second view 15 to predictive coding, to insert the prediction parameters into data stream 40, to transform code the prediction residual into data stream 40 using a spectral decomposition, and, at least as far as the second view 15 is concerned, to switch between different prediction types including at least spatial prediction and inter-view prediction 602. As mentioned above, the units at which encoder 600 switches between the different prediction types/modes may be called coding blocks. The size of these coding blocks may vary, since they may represent, for example, leaf blocks of a hierarchical multi-tree subdivision of the image of the second view 15, or tree root blocks into which the image of the second view 15 may be regularly pre-partitioned. Inter-view prediction may result in predicting the samples within a respective coding block using a disparity vector 604. Disparity vector 604 indicates the displacement to be applied to the portion 606 of the image of the first view 12 that is spatially co-located to the inter-view predicted block 302 of the image of the second view 15, in order to access portion 608, from which the samples within block 302 are predicted by copying the reconstructed version of portion 608 into block 302. Inter-view prediction 602 is, however, not restricted to this type of inter-view prediction of sample values of the second view 15. Rather, additionally or alternatively, the inter-view prediction supported by encoder 600 may be used to predictively code the prediction parameters themselves. Imagine that encoder 600 supports spatial and/or temporal prediction in addition to the inter-view prediction mode outlined above. Spatially predicting a certain coding block ends up in prediction parameters being inserted into data stream 40 for that coding block, just as temporal prediction does. Instead of coding all of these prediction parameters of the coding blocks of the image of the second view 15 into data stream 40 on their own, independently of the prediction parameters used to code the image of the first view into data stream 40, encoder 600 may use predictive coding and predict the prediction parameters used for predictively coding the coding blocks of the second view 15 on the basis of prediction parameters or other information available from the portion of data stream 40 into which encoder 600 has encoded the first view 12. That is, a prediction parameter of a certain coding block 302 of the second view 15, such as a motion vector, may be predicted, for example, from the motion vector of the corresponding, likewise temporally predicted, coding block of the first view 12. "Corresponding" may take the disparity between views 12 and 15 into account. For example, the first and second views 12 and 15 may each have a depth map associated with them, and encoder 600 may be configured to encode the texture samples of views 12 and 15 into data stream 40 together with the associated depth values of the depth maps. Encoder 600 may then use a depth estimate of coding block 302 to determine the "corresponding coding block" within the first view 12, whose scene content better matches the scene content of the current coding block 302 of the second view 15. Encoder 600 may also determine this depth estimate from the disparity vectors used by nearby inter-view predicted coding blocks of view 15, irrespective of whether or not any depth map is coded.

As described above, encoder 600 of Figure 8 is configured to change the inter-view prediction at spatial segment boundaries 300. That is, encoder 600 changes the way inter-view prediction is performed at these spatial segment boundaries 300. The reason and aim are outlined further below. In particular, encoder 600 changes the way of inter-view prediction such that each entity of the second view 15 that is predicted, for example the texture sample content of an inter-view predicted coding block 302 or a certain prediction parameter of such a coding block, depends, by way of inter-view prediction 602, on exactly one spatial segment 301 of the first view 12 only. The advantage is readily understood by considering the consequence of changing the inter-view prediction for a certain coding block whose sample values or prediction parameter are inter-view predicted. Without changing or restricting inter-view prediction 602, encoding of this coding block has to be deferred until encoding of the two or more spatial segments 301 of the first view 12 that participate in inter-view prediction 602 has been completed. Encoder 600 therefore has to obey this inter-view encoding delay/offset in any case and cannot further reduce the encoding delay when encoding views 12 and 15 in a temporally overlapping manner. Things are different when inter-view prediction 602 is changed/modified at spatial segment boundaries 300 in the manner described above, because in that case the coding block 302 in question, some entity of which is inter-view predicted, can be subjected to encoding as soon as the one (and only one) spatial segment 301 of the first view 12 has been completely encoded. The possible encoding delay is thereby reduced.
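The delay argument can be illustrated by counting how many spatial segments of the first view a referenced portion touches. The sketch is one-dimensional, represents segments by their start positions, and uses hypothetical names; a coding block whose reference touches more than one segment would have to wait for all of them.

```python
from bisect import bisect_right
from typing import List


def segments_touched(ref_x: int, ref_w: int, seg_starts: List[int]) -> List[int]:
    """Indices of the segments overlapped by the portion [ref_x, ref_x + ref_w).

    seg_starts holds the first sample column of each segment, sorted
    ascending and beginning with 0.
    """
    first = bisect_right(seg_starts, ref_x) - 1
    last = bisect_right(seg_starts, ref_x + ref_w - 1) - 1
    return list(range(first, last + 1))


SEGMENTS = [0, 32, 64]
unrestricted = segments_touched(28, 8, SEGMENTS)   # straddles the boundary at x = 32
restricted = segments_touched(32, 8, SEGMENTS)     # confined to one segment
```

With the unrestricted reference, encoding of the dependent block has to wait for two segments of the first view; with the restricted reference, it can start as soon as the single segment it depends on is complete.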

Accordingly, Figure 9 shows a multi-view decoder 620 that fits the multi-view encoder of Figure 8. The multi-view decoder 620 of Figure 9 is configured to reconstruct the plurality of views 12 and 15 from the data stream 40 using the inter-view prediction 602 from the first view 12 to the second view 15. As described above, the decoder 620 may redo the inter-view prediction 602 in the same manner as it is supposed to be done by the multi-view encoder 600 of Figure 8, namely by reading prediction parameters contained in the data stream 40 and applying them, such as the prediction modes indicated for the respective coding blocks of the second view 15, some of which are inter-view predicted coding blocks. As described above, the inter-view prediction 602 may alternatively or additionally relate to the prediction of prediction parameters themselves. For such inter-view predicted prediction parameters, the data stream 40 may comprise a prediction residual or an index pointing into a list of predictors, one of which is inter-view predicted according to 602.

As already described with respect to Figure 8, the encoder may change the way of inter-view prediction at the boundaries 300 so as to avoid an inter-view prediction 602 that combines information from two segments 301. The encoder 600 may achieve this in a manner that is transparent to the decoder 620. That is, the encoder 600 may simply impose a self-restriction on its selection among the possible coding parameter settings, so that the decoder 620, by merely applying the coding parameters thus set and conveyed within the data stream 40, inherently avoids combining information from two different segments 301 in the inter-view prediction 602.

That is, as long as the decoder 620 is not interested in, or not able to apply, parallel processing to the decoding of the data stream 40 by decoding views 12 and 15 in parallel, the decoder 620 may simply disregard the signaling that the encoder 600 inserts into the data stream 40 to indicate the above-described change in the inter-view prediction. More precisely, according to one embodiment of the present application, the encoder of Figure 8 signals within the data stream 40 the change in the inter-view prediction at the segment boundaries 300, i.e., whether or not there is any change at the boundaries 300. If the signaling indicates that the change applies, the decoder 620 may take the change in the inter-view prediction 602 at the boundaries 300 as a guarantee that the inter-view prediction 602 is restricted at the spatial segment boundaries 300 of the spatial segments 301 such that the inter-view prediction 602 does not involve any dependency of any portion 302 of the second view 15 on a spatial segment other than the one in which the co-located portion 306 of the first view 12, co-located to the respective portion 302 of the second view, is located. In other words, if the change in the inter-view prediction 602 at the boundaries 300 is signaled to apply, the decoder 620 may take this as a guarantee that, for any block 302 of the dependent view 15 for which the inter-view prediction 602 is used to predict its samples or any of its prediction parameters, this inter-view prediction 602 does not introduce any dependency on any "neighboring spatial segment".

This means the following. For each portion/block 302 there is a co-located portion 606 of the first view 12 that is co-located with the respective block 302 of the second view 15. "Co-located" may denote, for example, a block within view 12 whose circumference is exactly co-indexed with the circumference of block 302. Alternatively, "co-located" is measured not at sample accuracy but at the granularity of the blocks into which the picture of layer 12 is partitioned, so that determining the "co-located" block amounts to selecting one block out of that partitioning, for example the block that contains the position co-located to the upper left corner of block 302 or to another representative position of block 302. The "co-located portion/block" is denoted 606. It should be kept in mind that, because views 12 and 15 have different view directions, the co-located portion 606 may not contain the same scene content as portion 302.

Nevertheless, when the change in the inter-view prediction is signaled, the decoder 620 assumes that any portion/block 302 of the second view 15 that is subject to the inter-view prediction 602 depends, by way of the inter-view prediction 602, only on that spatial segment 301 within which the co-located portion/block 606 is located. That is, when looking at the pictures of the first and second views 12 and 15 registered to each other, the inter-view prediction 602 does not cross the segment boundaries of the first view 12 but remains within those segments 301 in which the respective portions/blocks 302 of the second view 15 are located. For example, the multi-view encoder 600 appropriately restricts the signaled/selected disparity vectors 604 of inter-view predicted portions/blocks 302 of the second view 15, and/or appropriately codes/selects the indices into predictor lists so as not to index predictors that involve inter-view prediction 602 from information of a "neighboring spatial segment 301".

Before continuing with the description of various possible details of the encoder and decoder of Figures 8 and 9, which represent various embodiments that may or may not be combined with each other, the following is noted. It is apparent from the description of Figures 8 and 9 that the encoder 600 may realize its "change/restriction" of the inter-view prediction 602 in different ways. In the looser restriction, the encoder 600 restricts the inter-view prediction 602 merely such that the inter-view prediction 602 does not combine information from two or more spatial segments. The description of Figure 9 features a stricter restriction, according to which the inter-view prediction 602 is even restricted so as not to cross spatial segments 302: any portion/block 302 of the second view 15 that is subject to the inter-view prediction 602 obtains its inter-view prediction, by way of the inter-view prediction 602, exclusively from information of that spatial segment 301 of the first view 12 in which its "co-located block/portion 606" is located. The encoder acts accordingly. This latter type of restriction is an alternative to the one described with respect to Figure 8 and is even stricter than it.

With both alternatives, the decoder 602 may exploit the restriction. For example, if the signaling indicates that the restriction applies, the decoder 602 may exploit it by reducing the inter-view decoding offset/delay of the second view 15 relative to the first view 12. Alternatively or additionally, the decoder 602 may take the guarantee signaling into account when deciding whether to attempt decoding views 12 and 15 in parallel: if the guarantee is signaled, the decoder may opportunistically attempt inter-view parallel processing, and otherwise refrain from that attempt. For example, in the example shown in Figure 9, in which the first view 12 is regularly partitioned into four spatial segments 301, each representing a quarter of the picture of the first view 12, the decoder 620 may commence decoding the second view 15 as soon as the first spatial segment 301 of the first view 12 has been completely decoded. Otherwise, assuming the disparity vectors 604 to be purely horizontal, the decoder 620 would have to wait at least until both upper spatial segments 301 of the first view 12 have been completely decoded. The stricter change/restriction of the inter-view prediction along the segment boundaries 300 makes the guarantee easier to exploit.
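
The four-segment example above can be illustrated with a small sketch. It is not taken from the application; the names `Rect`, `segments_needed` and `max_disp_x`, as well as the 64x64 picture size, are assumptions chosen for illustration only. The sketch lists which spatial segments of the first view must be completely decoded before a given block of the second view can be inter-view predicted, with and without the guarantee, assuming purely horizontal disparity vectors.

```python
from typing import NamedTuple, List, Set


class Rect(NamedTuple):
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one sample."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def segments_needed(block: Rect, segments: List[Rect],
                    max_disp_x: int, guaranteed: bool) -> Set[int]:
    """Indices of first-view segments a second-view block may depend on."""
    if guaranteed:
        # Guarantee case: only the segment holding the co-located position
        # (here taken as the block's upper left sample) can be referenced.
        return {i for i, s in enumerate(segments)
                if s.x0 <= block.x0 < s.x1 and s.y0 <= block.y0 < s.y1}
    # No guarantee: any horizontal shift up to max_disp_x is possible.
    reach = Rect(block.x0 - max_disp_x, block.y0,
                 block.x1 + max_disp_x, block.y1)
    return {i for i, s in enumerate(segments) if overlaps(reach, s)}


# Hypothetical 64x64 picture split into four quadrants.
quadrants = [Rect(0, 0, 32, 32), Rect(32, 0, 64, 32),
             Rect(0, 32, 32, 64), Rect(32, 32, 64, 64)]
block = Rect(8, 8, 16, 16)
with_guarantee = segments_needed(block, quadrants, 32, True)
without_guarantee = segments_needed(block, quadrants, 32, False)
```

In this assumed setup the guarantee case depends on the upper left quadrant only, whereas the unrestricted case depends on both upper quadrants, which mirrors the decoding offsets described in the text.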

The guarantee signaling described above may have a scope/validity that covers, for example, merely one picture or even a sequence of pictures. Accordingly, as described below, it may be signaled within a video parameter set, a sequence parameter set, or even a picture parameter set.

Up to now, embodiments have been presented with respect to Figures 8 and 9 according to which, apart from the guarantee signaling, the data stream 40 and the way the encoder and decoder of Figures 8 and 9 encode/decode the data stream do not vary with the change in the inter-view prediction 602. Rather, the way of decoding/encoding the data stream remains the same irrespective of whether the self-restriction is applied in the inter-view prediction 602. According to an alternative embodiment, however, the encoder and decoder even change their way of encoding/decoding the data stream 40 so as to exploit the guarantee case, i.e., the restriction of the inter-view prediction 602 at the spatial segment boundaries 300. For example, the domain of possible disparity vectors signalable in the data stream 40 may be restricted for inter-view predicted blocks/portions 302 of the second view 15 located near the co-located position of a spatial segment boundary 300 of the first view 12.

For example, refer again to Figure 7. As described above, Figure 7 shows two exemplary blocks 302' and 302 of the second view 15, one of which (block 302) is close to the co-located position of a spatial segment boundary 300 of the first view 12. The co-located position of the spatial segment boundary 300 of the first view 12, when transferred into the second view 15, is shown as 622. As shown in Figure 7, the co-located block 306 of block 302 is close to the spatial segment boundary 300 that vertically separates the spatial segment 301a comprising the co-located block 606 from the horizontally neighboring spatial segment 301b. A disparity vector that is too large and shifts the co-located block/portion 606 to the right (i.e., towards the neighboring spatial segment 301b) would cause the inter-view predicted block 302 to be copied, at least partially, from samples of this neighboring spatial segment 301b, in which case the inter-view prediction 602 would cross the spatial segment boundary 300. In the "guarantee case", the encoder 600 therefore does not select such a disparity vector for block 302, and the codable domain of possible disparity vectors for block 302 may accordingly be restricted. For example, when Huffman coding is used, the Huffman code used to code the disparity vector of the inter-view predicted block 302 may be changed so as to exploit the restricted domain of possible disparity vectors. When arithmetic coding is used, another binarization in combination with a binary arithmetic scheme may be used for coding the disparity vector, or another probability distribution among the possible disparity vectors may be used.

According to this embodiment, the minor reduction in coding efficiency caused by the inter-view prediction restriction at the spatial segment boundaries 300 may be partially compensated by reducing the amount of side information to be conveyed within the data stream 40 for transmitting the disparity vectors of spatial segments 302 near the co-located position of a spatial segment boundary 300.

Thus, according to the embodiment just described, both the multi-view encoder and the multi-view decoder change their way of decoding/encoding disparity vectors from/into the data stream depending on whether the guarantee case applies. For example, they change the Huffman code used to decode/encode the disparity vectors, or they change the binarization and/or the probability distribution used to arithmetically decode/encode the disparity vectors.

Reference is made to Figure 10 in order to describe more clearly, by way of a specific example, how the encoder and decoder of Figures 8 and 9 restrict the domain of possible disparity vectors signalable within the data stream 40. Figure 10 again shows the usual behavior of the encoder and decoder for an inter-view predicted block 302: a disparity vector 308 out of a domain of possible disparity vectors is determined for the current block 302. Block 302 is thus a disparity-compensated predicted prediction block. The first view 12 is then sampled at a reference portion 304 that is displaced by the determined disparity vector 308 from the co-located portion 306 of the first view 12, which is co-located to the current portion 302. The domain of possible disparity vectors signalable within the data stream is restricted as follows: the restriction is made such that the reference portion 304 lies completely within the spatial segment 301a in which the co-located portion 306 is spatially located. The disparity vector 308 shown in Figure 10, for example, does not fulfill this restriction. It therefore lies outside the domain of possible disparity vectors for block 302 and, according to one embodiment, is not signalable within the data stream 40 as far as block 302 is concerned. According to an alternative embodiment, however, the disparity vector 308 is signalable within the data stream, but in the guarantee case the encoder 600 avoids applying this disparity vector 308 and chooses, for example, another prediction mode for block 302, such as a spatial prediction mode.

Figure 10 also shows that the interpolation filter kernel half-width 10 may be taken into account when restricting the domain of disparity vectors. More precisely, when the sample content of a disparity-compensated predicted block 302 is copied from the picture of the first view 12, each sample of block 302 may, in the case of a sub-pel disparity vector, be obtained from the first view 12 by applying interpolation using an interpolation filter with a certain kernel size. The sample value illustrated with an "x" in Figure 10, for example, may be obtained by combining the samples within the filter kernel 311 at whose center the sample position "x" is located. In that case, the domain of possible disparity vectors for block 302 may be restricted even further, such that no sample of the filter kernel 311 for any sample of the reference portion 304 overlays the neighboring spatial segment 301b, and all of them remain within the current spatial segment 301a. The signalable domain may or may not be restricted accordingly. According to an alternative embodiment, the samples of the filter kernel 311 that lie within the neighboring spatial segment 301b may simply be padded according to some exceptional rule, so as to avoid the additional restriction of the domain of possible disparity vectors for sub-pel disparity vectors. The decoder would, however, have to support this substitute padding only in the case that the guarantee is signaled to apply.
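
As a minimal sketch of the domain restriction described for Figure 10, the following checks whether a candidate disparity vector keeps the reference portion, widened by a filter margin for sub-pel vectors, inside the spatial segment of the co-located portion. The names `Rect`, `reference_inside_segment`, `allowed_dx_range` and the parameter `margin` are hypothetical, vectors are given in whole samples for simplicity, and the margin value of 3 is only an assumed kernel half-width.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def reference_inside_segment(block: Rect, dv: Tuple[int, int],
                             segment: Rect, margin: int = 0) -> bool:
    """True if the displaced block, widened by `margin` samples on every
    side (interpolation kernel reach), lies completely within `segment`."""
    dx, dy = dv
    return (segment.x0 <= block.x0 + dx - margin
            and block.x1 + dx + margin <= segment.x1
            and segment.y0 <= block.y0 + dy - margin
            and block.y1 + dy + margin <= segment.y1)


def allowed_dx_range(block: Rect, segment: Rect,
                     margin: int = 0) -> Tuple[int, int]:
    """Smallest and largest horizontal displacement that stays inside."""
    return (segment.x0 + margin - block.x0, segment.x1 - margin - block.x1)


segment_a = Rect(0, 0, 32, 32)   # stands in for spatial segment 301a
block = Rect(20, 8, 28, 16)      # co-located portion of the current block
ok_integer = reference_inside_segment(block, (4, 0), segment_a)
too_far = reference_inside_segment(block, (5, 0), segment_a)
ok_subpel = reference_inside_segment(block, (4, 0), segment_a, margin=3)
```

With a non-zero margin the admissible range shrinks on both sides, which is the additional restriction that the padding alternative in the text is meant to avoid.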

The latter example makes clear that, in addition to or instead of a change in the entropy decoding of the data stream, the decoder 620 may or may not change the way it performs the inter-view prediction at the spatial segment boundaries 300 in response to the signaling inserted into the data stream by the encoder 600. For example, as just described, the encoder and decoder pad the part of the interpolation filter kernel that extends beyond the spatial segment boundary 300 differently depending on whether the guarantee case applies. The same may apply to the reference portion 306 itself: it may likewise be allowed to extend, at least partially, into the neighboring spatial segment 301b, with the respective part being filled substitutionally using information that is independent of any information outside the current spatial segment 301a. In effect, in the guarantee case the encoder and decoder may treat the spatial segment boundary 300 like a picture boundary, filling parts of the reference portion 304 and/or of the interpolation filter kernel 311 by extrapolation from the current spatial segment 301a.
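
One simple way to treat a segment boundary like a picture boundary is to replicate the nearest sample inside the current segment. The sketch below assumes this replication rule; the text only speaks of extrapolation from the current segment, so the specific rule, the function name `fetch_reference` and the nested-list picture layout are illustrative assumptions.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def fetch_reference(picture: List[List[int]], segment: Rect,
                    x0: int, y0: int, width: int, height: int) -> List[List[int]]:
    """Read a width x height reference block whose upper left sample is at
    (x0, y0). Coordinates outside `segment` are clamped to its border, so
    no sample of a neighboring segment is ever read."""
    block = []
    for y in range(y0, y0 + height):
        cy = min(max(y, segment.y0), segment.y1 - 1)
        row = []
        for x in range(x0, x0 + width):
            cx = min(max(x, segment.x0), segment.x1 - 1)
            row.append(picture[cy][cx])
        block.append(row)
    return block


# Hypothetical 8-wide, 4-high picture; sample value encodes its position.
picture = [[10 * y + x for x in range(8)] for y in range(4)]
left_segment = Rect(0, 0, 4, 4)  # columns 4..7 belong to the neighbor
ref = fetch_reference(picture, left_segment, 2, 1, 4, 2)
```

Because the clamping uses only samples of the current segment, encoder and decoder obtain identical reference samples without waiting for the neighboring segment, provided both apply the same rule.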

As also described above, the inter-view prediction 602 is not restricted to the prediction of the sample-wise content of an inter-view predicted block 302. Rather, the inter-view prediction may also apply to the prediction of prediction parameters, such as motion parameters involved in the prediction of temporally predicted blocks 302 of view 15, or spatial prediction parameters involved in the prediction of spatially predicted blocks 302. To illustrate the possible changes or restrictions imposed on this kind of inter-view prediction 602 at the boundaries 300, reference is made to Figure 11. Figure 11 shows a block 302 of the dependent view 15, a parameter of which is predicted using, inter alia, inter-view prediction. For example, a list of several predictors for the parameter of block 302 may be determined, one of them by way of the inter-view prediction 602. To this end, the encoder and decoder act, for example, as follows. A reference portion of the first view 12 is selected for the current block 302. The selection or derivation of the reference portion/block 314 is performed among the blocks, such as coding blocks or prediction blocks, into which the picture of the first layer 12 is partitioned. For this derivation, a representative position 318 within the first view 12 may be determined as the position co-located to a representative position 628 of block 302 or to a representative position 630 of a neighboring block 320 adjacent to block 302. The neighboring block 320 may, for example, be the block on top of block 302. Determining block 320 may involve selecting, among the blocks into which the picture of the second view/layer 15 is partitioned, the block that contains the sample immediately above the upper left sample of block 302. The representative positions 628 and 630 may be the upper left sample, the sample in the middle of the block, or the like. The reference position 318 within the first view 12 is then the position co-located to 628 or 630; Figure 11 illustrates the co-location to position 628. The encoder/decoder then estimates a disparity vector 316. This may be done, for example, on the basis of an estimated depth map of the current scene, or using disparity vectors that have already been decoded and lie in the spatio-temporal neighborhood of block 302 or block 320, respectively. The disparity vector 316 thus determined is applied to the representative position 318 so that the head of vector 316 points to position 632. Among the blocks into which the picture of the first view 12 is partitioned, the reference portion 314 is selected as the one that contains position 632. As described above, the partitioning among which the portion/block 314 is selected may be the partitioning of view 12 into coding blocks, prediction blocks, residual blocks and/or transform blocks.

According to one embodiment, only the multi-view encoder checks whether the reference portion 314 lies within the neighboring spatial segment 301b, i.e., the spatial segment that does not contain the co-located block in which the co-location of reference point 628 lies. If the encoder signals the above guarantee to the decoder, the encoder 600 suppresses any application of the resulting predictor to the parameter of the current block 302. That is, the list of predictors for the parameter of block 302 may contain the inter-view predictor that leads to a crossing of the boundary 300, but the encoder 600 avoids choosing that predictor and selects an index for block 302 that does not point to the unwanted predictor.

If both the multi-view encoder and the decoder check, in the guarantee case, whether the reference portion 314 lies within the neighboring spatial segment 301b, both may substitute the "boundary-crossing" inter-view predictor with another predictor, or simply exclude it from the list of predictors, which may, for example, also include spatially and/or temporally predicted parameters and/or one or more default predictors. The check of the condition (i.e., whether or not the reference portion 314 is part of the spatial segment 301a) and the conditional substitution or exclusion are performed only in the guarantee case. In the non-guarantee case, any check of whether the reference portion 314 lies within the spatial segment 301a may be omitted, and a predictor derived from an attribute of the reference portion 314 may be applied to the prediction of the parameter of block 302 irrespective of whether the reference portion 314 lies within spatial segment 301a or 301b. Where, depending on the reference block 314 lying inside or outside the spatial segment 301a, no predictor derived from an attribute of the reference portion 314 is added to the list of predictors for the current block 302, or a substitute predictor is added instead, this modification of the usual inter-view prediction is performed by the encoder as well as by the decoder 620. By this measure, any predictor index into the list of predictors thus determined for block 302 points into the same list of predictors at the decoder. The signalable domain of the index for block 302 may or may not be restricted in response to the guarantee case applying.

In the case that the guarantee applies but only the encoder performs the check, the multi-view encoder forms the list of predictors for block 302 irrespective of whether the reference portion 314 lies within the spatial segment 301a (and even irrespective of whether the guarantee case applies). In the guarantee case, however, it restricts the index so that a predictor is not selected from the list if it has been derived from an attribute of a block 314 lying outside the spatial segment 301a. In this case, the decoder 620 may derive the list of predictors for block 302 in the same manner in the guarantee case and in the non-guarantee case, because the encoder 600 has already taken care that the inter-view prediction does not need any information from the neighboring spatial segment 301b.
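
The two variants described above can be contrasted in a short sketch. It is illustrative only: the `Predictor` type, its `crosses_boundary` flag and both function names are hypothetical and not part of the application. In the first function, encoder and decoder both modify the list in the guarantee case; in the second, the list is left unchanged and only the encoder restricts which indices it may select.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Predictor:
    name: str
    value: Tuple[int, int]
    # True if the predictor was derived from a reference portion lying
    # outside the spatial segment of the co-located block.
    crosses_boundary: bool = False


def build_predictor_list(candidates: List[Predictor], guaranteed: bool,
                         substitute: Optional[Predictor] = None) -> List[Predictor]:
    """Variant with checks in encoder and decoder: in the guarantee case,
    boundary-crossing predictors are replaced by `substitute` or dropped."""
    if not guaranteed:
        return list(candidates)
    result = []
    for cand in candidates:
        if cand.crosses_boundary:
            if substitute is not None:
                result.append(substitute)
            continue
        result.append(cand)
    return result


def selectable_indices(candidates: List[Predictor], guaranteed: bool) -> List[int]:
    """Encoder-only variant: the list is unchanged, but in the guarantee
    case the encoder never signals an index of a crossing predictor."""
    return [i for i, cand in enumerate(candidates)
            if not (guaranteed and cand.crosses_boundary)]


candidates = [Predictor("spatial", (1, 0)),
              Predictor("inter_view", (7, 0), crosses_boundary=True),
              Predictor("temporal", (0, 2))]
default = Predictor("default", (0, 0))
dropped = build_predictor_list(candidates, guaranteed=True)
replaced = build_predictor_list(candidates, guaranteed=True, substitute=default)
```

The first variant requires the decoder to evaluate the guarantee as well, so that indices refer to the same list on both sides; the second keeps the decoder unchanged at the cost of leaving an unusable entry in the list.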

Regarding the parameter of block 302 and the attribute of the reference portion 314, it is noted that these may be a motion vector, a disparity vector, a residual signal (e.g., transform coefficients) and/or a depth value.

The inter-view prediction coding concept described with respect to Figures 8 to 11 may be introduced into the currently envisaged extension of the HEVC standard, namely in the manner described below. To that extent, the description presented immediately below is also to be construed as a basis for possible implementation details of the description presented above with respect to Figures 8 to 11.

As an intermediate note, the spatial segments 301 discussed above, which form the units at whose boundaries the inter-view prediction is changed/restricted, need not be the units in which inter-layer parallel processing is alleviated or enabled. In other words, although the spatial segments of Figures 8 to 11 discussed above may be tiles into which the base layer 12 is partitioned, other examples are feasible as well, such as one in which the spatial segments 301 are the coding tree blocks (CTBs) of the base layer 12. In the embodiments described below, the spatial segments 301 are tied to the definition of tiles, i.e., a spatial segment is a tile or a group of tiles.

According to the constraints for ultra-low delay and parallelization in HEVC explained below, inter-layer prediction is constrained in a way that respects the partitioning of the base layer picture, in particular into tiles.

HEVC allows the CTBs of a coded base layer picture to be divided, by a grid of vertical and horizontal boundaries, into rectangular regions that are called tiles and that can be processed independently except for in-loop filtering. The in-loop filters can be turned off at tile boundaries to make the tiles completely independent.

Parsing and prediction dependencies are broken at tile boundaries much like at picture boundaries, whereas the in-loop filters may cross tile boundaries if configured accordingly in order to reduce tile boundary artifacts. The processing of a tile therefore does not rely on other tiles within the picture, either completely or to a large extent, depending on the filtering configuration. A restriction is imposed in that all CTBs of a tile shall belong to the same slice, or all CTBs of a slice shall belong to the same tile. As can be seen in Figure 1, tiles force the CTB scan order to follow the order of the tiles, i.e., all CTBs belonging to the first (e.g., upper left) tile are traversed before continuing with the CTBs belonging to the second (e.g., upper right) tile. The tile structure is defined by the number and size of the CTBs in each tile row and tile column that form the grid within the picture. This structure may change from frame to frame or stay constant throughout the coded video sequence.

Figure 12 shows an exemplary arrangement of CTBs within a picture divided into nine tiles. The thick black lines represent tile boundaries, and the numbering indicates the scan order of the CTBs, which also reveals the tile order.
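
The tile-dependent CTB scan order can be sketched as follows. The function name `ctb_tile_scan` and the parameters `col_widths` and `row_heights` (tile column widths and tile row heights, in CTBs) are assumptions for illustration, and the grid sizes used are not those of Figure 12. The function returns, in scan order, the raster-scan address of each CTB.

```python
from typing import List


def ctb_tile_scan(col_widths: List[int], row_heights: List[int]) -> List[int]:
    """Raster-scan CTB addresses listed in tile scan order.

    Tiles are visited in raster order over the tile grid; inside each
    tile, CTBs are visited in raster order as well."""
    pic_width = sum(col_widths)  # picture width in CTBs
    order = []
    y0 = 0
    for tile_height in row_heights:
        x0 = 0
        for tile_width in col_widths:
            for y in range(y0, y0 + tile_height):
                for x in range(x0, x0 + tile_width):
                    order.append(y * pic_width + x)
            x0 += tile_width
        y0 += tile_height
    return order


# Hypothetical picture of 3x2 CTBs with two tile columns (2 and 1 CTBs wide).
scan = ctb_tile_scan([2, 1], [2])
```

For the assumed 3x2 picture, the left tile is traversed completely (addresses 0, 1, 3, 4) before the right tile (addresses 2, 5), which is the behavior the paragraph on scan order describes.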

An enhancement layer tile of an HEVC extension can be decoded as soon as all tiles covering the corresponding picture region within the base layer bitstream have been decoded.

The following sections use the concepts of Figures 7 to 11 to describe constraints, signaling, and encoding/decoding process modifications that allow for a lower inter-layer coding offset/delay.

A modified decoding process relating to tile boundaries in HEVC could look as follows:

a)运动或视差矢量不应跨越基层内的瓦片。a) Motion or parallax vectors should not cross the tiles within the base layer.

如果启用约束条件，那么以下内容适用：If constraints are enabled, the following applies:

If inter-layer prediction (e.g., prediction of sample values, motion vectors, residual data, or other data) uses the base view (layer 12) as a reference picture, the disparity or motion vectors shall be constrained so that the referenced picture area belongs to the same tile as the collocated base layer CTU. In a specific embodiment, the motion or disparity vectors 308 are clipped in the decoding process so that the referenced picture area lies inside the same tile and referenced sub-pel positions are predicted only from information inside the same tile. More specifically, in the current HEVC sample interpolation process this would cause motion vectors pointing to sub-pel positions to be clipped 3 to 4 pels away from the tile boundary 300, and in the inter-view motion vector and inter-view residual prediction processes it would constrain disparity vectors to point to positions within the same tile. An alternative embodiment adjusts the sub-pel interpolation filter to handle tile boundaries like picture boundaries, so that motion vectors may point to sub-pel positions located closer to the tile boundary than the kernel size 310 of the sub-pel interpolation filter. A further alternative embodiment implies a bitstream constraint that disallows the use of motion or disparity vectors that would have been clipped in the previously described embodiment.
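The clipping embodiment can be illustrated with a short sketch. It assumes quarter-sample vector units and an 8-tap interpolation filter that reads 3 samples on one side and 4 on the other, which matches the "3 to 4 pels" mentioned above; the function name, argument layout, and the conservative rounding are assumptions of this example, not the normative process.

```python
def clip_mv_to_tile(mv, block, tile, left_taps=3, right_taps=4):
    """Clip a quarter-sample motion/disparity vector so that the referenced
    block, including its interpolation support, stays inside `tile`.

    mv    = (mvx, mvy) in quarter-sample units
    block = (x, y, w, h) in integer samples
    tile  = (x0, y0, x1, y1) in integer samples, x1/y1 exclusive
    """
    def clip_1d(v, pos, size, lo, hi):
        # Full-sample vectors need no filter support and may reach the tile edge.
        v = max(4 * (lo - pos), min(v, 4 * (hi - size - pos)))
        if v % 4:
            # Fractional position: keep the filter taps inside the tile
            # (conservative: the limits are rounded to full samples).
            v = max(4 * (lo + left_taps - pos),
                    min(v, 4 * (hi - right_taps - size - pos)))
        return v

    x, y, w, h = block
    x0, y0, x1, y1 = tile
    return (clip_1d(mv[0], x, w, x0, x1), clip_1d(mv[1], y, h, y0, y1))


# 8x8 block at (48, 16) inside a 64x64 tile; a fractional vector pointing
# close to the right tile edge is pulled back by the filter margin.
print(clip_mv_to_tile((30, 0), (48, 16, 8, 8), (0, 0, 64, 64)))
```

The sketch does not handle tiles too small to hold the block plus the filter margin; a real decoder would need a defined behaviour for that case.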

b) Neighboring blocks of the collocated block in the base layer shall not be used when they lie in a different tile.

If the base layer is used for prediction from a neighboring block (e.g., TMVP or neighboring block disparity derivation) and tiles are used, the following applies: predictor candidates that originate from a CTU B different from the collocated base layer CTU A shall only be used if CTU B belongs to the same tile as the collocated base layer CTU A. For example, in the current HEVC derivation process, CTU B can be located to the right of the collocated CTU A. In a specific embodiment of the invention, the prediction candidate is replaced with a different prediction; for example, the collocated PU can be used for prediction instead. In another embodiment of the invention, the use of the related prediction mode is disallowed in the coded bitstream.
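A minimal sketch of rule b) is given below. The tile layout representation (lists of the starting CTU column and row of each tile) and both function names are assumptions made for the example; the fallback to the collocated candidate corresponds to the replacement embodiment described above.

```python
from bisect import bisect_right


def tile_index(ctu_x, ctu_y, col_starts, row_starts):
    """Map CTU coordinates to (tile column, tile row).

    col_starts / row_starts hold the first CTU column / row of each tile.
    """
    return (bisect_right(col_starts, ctu_x) - 1,
            bisect_right(row_starts, ctu_y) - 1)


def select_predictor(ctu_a, ctu_b, cand_b, cand_collocated,
                     col_starts, row_starts):
    """Use the candidate from CTU B only if B shares a tile with the
    collocated CTU A; otherwise fall back to the collocated candidate."""
    same_tile = (tile_index(*ctu_a, col_starts, row_starts)
                 == tile_index(*ctu_b, col_starts, row_starts))
    return cand_b if same_tile else cand_collocated


# Two tile columns starting at CTU columns 0 and 4, a single tile row.
cols, rows = [0, 4], [0]
print(select_predictor((3, 0), (4, 0), "from_B", "collocated", cols, rows))
print(select_predictor((2, 0), (3, 0), "from_B", "collocated", cols, rows))
```

In the first call CTU B lies across the tile boundary, so the collocated candidate is used; in the second call both CTUs share a tile and the candidate from CTU B is kept.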

Transferring the HEVC modification possibilities just outlined onto the description of Figures 8 and 11, it is noted that, as far as the predictor substitute of Figure 11 is concerned, the substitute may be chosen to be the respective attribute of that block of the first layer 12 which comprises the co-located position of the reference position 628 of the current block 302 itself.

c) Signaling

In specific embodiments, the following high-level syntax can be used in the VPS or SPS to enable the above-described constraints/restrictions using N flags, for example as shown in Figures 13A and 13B.

Here, PREDTYPE, RESTYPE, and SCAL in inter_layer_PREDTYPE_RESTYPE_SCAL_flag_1 to inter_layer_PREDTYPE_RESTYPE_SCAL_flag_N may be replaced by the different values described below:

PREDTYPE indicates the prediction type to which the constraint/restriction applies, and may be one of the following or another prediction type not listed:

– e.g., temporal_motion_vector_prediction, for prediction of temporal motion vectors from neighboring blocks of the collocated block in the base view;

– e.g., disparity_vector_prediction, for prediction of disparity vectors from neighboring blocks of the collocated block in the base view;

– e.g., depth_map_derivation, for prediction of depth values from the base view;

– e.g., inter_view_motion_predition, for prediction of motion vectors from the base view;

– e.g., inter_view_residual_prediction, for prediction of residual data from the base view;

– e.g., inter_view_sample_prediction, for prediction of sample values from the base view.

Alternatively, the prediction types to which the restriction/constraint applies are not signaled explicitly; the restriction/constraint then either applies to all prediction types, or is signaled for sets of prediction types using only one flag per set.

RESTYPE indicates the type of restriction and may be one of the following:

– e.g., constraint (indicates a bitstream constraint; the flag may be contained in the VUI);

– e.g., restriction (indicates clipping (a) or the choice of a different predictor (b)).

SCAL indicates whether the restriction/constraint applies only to layers of the same type:

– e.g., same_scal (indicates that the restriction applies only when the base layer and the enhancement layer are of the same scalability type);

– e.g., diff_scal (indicates that the restriction applies regardless of the scalability types of the base layer and the enhancement layer).

In an alternative embodiment, to which Figure 14 relates, the use of all the described restrictions can be signaled as an ultra-low delay mode in the high-level syntax, e.g., as ultra_low_delay_decoding_mode_flag in the VPS or SPS.

ultra_low_delay_decoding_mode_flag equal to 1 indicates the use of a modified decoding process at tile boundaries.

The restrictions implied by this flag may also include constraints on tile boundary alignment and upsampling filter restrictions at tile boundaries.

That is, with reference to Figure 1, the guarantee signaling may additionally be used to signal a guarantee that, during a predetermined time period (e.g., a time period extending over a sequence of pictures), the pictures 15 of the second layer are subdivided such that the boundaries 84 between the spatial segments 82 of the second-layer pictures overlay every boundary 86 of the spatial segments 80 of the first layer (possibly after upsampling, if spatial scalability is considered). In time intervals smaller than the predetermined time period (e.g., in units of individual pictures, i.e., at picture pitch intervals), the decoder still periodically determines the actual subdivision of the pictures 12, 15 of the first and second layers into the spatial segments 80 and 82 from short-term syntax elements of the multi-layer video data stream 40, but the knowledge of the alignment already helps in planning the assignment of the parallel processing workload. The solid lines 84 in Figure 1, for example, represent a case in which the tile boundaries 84 are completely spatially aligned with the tile boundaries 86 of layer 0. The guarantee just mentioned would, however, also allow the tile partitioning of layer 1 to be finer than that of layer 0, so that the tile partitioning of layer 1 comprises further, additional tile boundaries that do not spatially overlay any of the tile boundaries 86 of layer 0. In any case, the knowledge about the tile alignment between layer 1 and layer 0 helps the decoder to allocate the workload or the available processing power among the spatial segments processed concurrently in parallel. Without the long-term syntax element structure, the decoder would have to perform the workload allocation in the smaller time intervals, i.e., per picture, thereby spending computing power on the allocation itself. Another aspect is "opportunistic decoding": a decoder with multiple CPU cores may exploit the knowledge about the parallelism of the layers to decide whether or not to attempt decoding layers of higher complexity, i.e., a higher number of layers or, in other words, further views. Bitstreams that exceed the capability of a single core might be decodable by using all cores of the same decoder. This information is especially helpful if the profile and level indicators do not include such an indication of the minimum parallelism.
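The alignment guarantee can be expressed as a simple check along one picture dimension. The sketch below is only an illustration under assumed conventions: boundaries are given as sample positions, the spatial scaling factor is a rational number, and the function name is hypothetical.

```python
from fractions import Fraction


def boundaries_aligned(base_bounds, enh_bounds, scale=Fraction(1)):
    """Return True if every base-layer boundary, after scaling to the
    enhancement-layer resolution, coincides with an enhancement-layer
    boundary. The enhancement layer may have additional boundaries."""
    enh = set(enh_bounds)
    return all(Fraction(b) * scale in enh for b in base_bounds)


# Base-layer tile boundaries at columns 64 and 128, 2x spatial scalability.
print(boundaries_aligned([64, 128], [64, 128, 192, 256], Fraction(2)))
# One scaled base-layer boundary (256) has no counterpart: not aligned.
print(boundaries_aligned([64, 128], [128, 192], Fraction(2)))
```

The first call models the case described above in which layer 1 is partitioned more finely than layer 0 while still overlaying every layer 0 boundary.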

As discussed above, in the case of a multi-layer video in which the base layer picture 12 has a different spatial resolution than the dependent view picture 15, the guarantee signaling (compare, by way of example, ultra_low_delay_decoding_mode_flag) may also be used to steer the upsampling filter 36. If the upsampling filtering were performed in layer 0 across the spatial segment boundaries 86, the delay to be met in parallel decoding/encoding of the spatial segments 82 of layer 1 relative to the encoding/decoding of the spatial segments 80 of layer 0 would increase, because the upsampling filtering combines, and thus renders mutually dependent, the information of neighboring spatial segments of layer 0 that serves as the prediction reference 38 used in inter-layer prediction of the blocks 41 of layer 1. See, for example, Figure 15. Pictures 12 and 15 are shown overlaid, both dimensioned and registered to each other according to spatial correspondence, i.e., portions showing the same part of the scene overlay each other. Pictures 12 and 15 are shown, by way of example, as being split into 6 and 12 spatial segments, such as tiles, respectively. A filter kernel is illustratively shown moving across the upper-left tile of picture 12 so as to obtain its upsampled version, which serves as the basis for inter-layer prediction of any block within those tiles of picture 15 that spatially overlay the upper-left tile. At some intermediate positions, such as at 202, the kernel 200 overlaps a neighboring tile of picture 12. The sample value at the center of kernel 200 at position 202 of the upsampled version thus depends both on samples of the upper-left tile of picture 12 and on samples of the tile of picture 12 to its right. If the upsampled version of picture 12 serves as the basis for inter-layer prediction, the inter-layer delay in parallel processing of the segments of the layers is increased. A restriction can therefore help to increase the amount of parallelization across the different layers and, accordingly, to decrease the overall coding delay. Naturally, the syntax element may also be a long-term syntax element that is valid for a sequence of pictures. The restriction may be achieved in one of the following ways: for example, filling the overlapping portion of kernel 200 at the overlapping position 202 with a central tendency of the sample values within the non-dashed portion of kernel 200, or extrapolating the non-dashed portion into the dashed portion using linear or other functions, or the like.
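The first filling option, replacing the overlapping kernel portion by a central tendency of the in-tile samples, can be sketched in one dimension. The example applies a plain smoothing kernel rather than a real upsampling filter, uses the arithmetic mean as the central tendency, and the function name and kernel taps are assumptions of this illustration.

```python
def filter_within_tile(row, tile_start, tile_end, kernel):
    """Filter row[tile_start:tile_end] with an odd-length `kernel` without
    reading any sample outside the tile.

    Kernel positions that would fall outside the tile are filled with the
    mean of the in-tile samples currently covered by the kernel.
    """
    half = len(kernel) // 2
    out = []
    for c in range(tile_start, tile_end):
        taps = range(c - half, c + half + 1)
        inside = [row[i] for i in taps if tile_start <= i < tile_end]
        fill = sum(inside) / len(inside)   # central tendency of the in-tile part
        vals = [row[i] if tile_start <= i < tile_end else fill for i in taps]
        out.append(sum(k * v for k, v in zip(kernel, vals)))
    return out


# Two neighboring tiles with very different content; the left tile spans
# samples 0..3. Its filtered output is unaffected by the right tile.
row = [10, 10, 10, 10, 90, 90, 90, 90]
print(filter_within_tile(row, 0, 4, [0.25, 0.5, 0.25]))
```

With an unrestricted filter the last sample of the left tile would be pulled toward the neighboring tile's values; with the restriction each tile can be filtered as soon as it alone has been decoded.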

An alternative embodiment is given below in the VPS as an example, in which the restrictions/constraints mentioned above are controlled by ultra_low_delay_decoding_mode_flag but, alternatively (when the flag is disabled), each restriction/constraint can be enabled individually. For this embodiment, reference is made to Figures 13C and 13D. This embodiment may also be included in other non-VCL NAL units (e.g., SPS or PPS). In Figures 13C and 13D:

ultra_low_delay_decoding_mode_flag equal to 1 specifies that du_interleaving_enabled_flag, interlayer_tile_mv_clipping_flag, depth_disparity_tile_mv_clipping_flag, inter_layer_tile_tmvp_restriction_flag, and independent_tile_upsampling_idc are inferred to be equal to 1 and are not present in the VPS, SPS, or PPS.
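The inference rule can be sketched as a small post-parsing step. The dictionary-based representation of parsed syntax elements and the default value of 0 for absent elements when the master flag is 0 are assumptions of this example; only the syntax element names are taken from the description.

```python
DEPENDENT_ELEMENTS = (
    "du_interleaving_enabled_flag",
    "interlayer_tile_mv_clipping_flag",
    "depth_disparity_tile_mv_clipping_flag",
    "inter_layer_tile_tmvp_restriction_flag",
    "independent_tile_upsampling_idc",
)


def resolve_low_delay_flags(parsed):
    """Resolve the individual restriction flags from parsed syntax elements.

    When ultra_low_delay_decoding_mode_flag is 1, the dependent elements are
    not present in the parameter set and are inferred to be 1. Otherwise the
    individually signaled values are used (assumed default: 0).
    """
    master = parsed.get("ultra_low_delay_decoding_mode_flag", 0)
    if master == 1:
        resolved = {name: 1 for name in DEPENDENT_ELEMENTS}
    else:
        resolved = {name: parsed.get(name, 0) for name in DEPENDENT_ELEMENTS}
    resolved["ultra_low_delay_decoding_mode_flag"] = master
    return resolved


print(resolve_low_delay_flags({"ultra_low_delay_decoding_mode_flag": 1}))
print(resolve_low_delay_flags({"ultra_low_delay_decoding_mode_flag": 0,
                               "interlayer_tile_mv_clipping_flag": 1}))
```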

When parallelization techniques such as tiles are used in a layered coded video sequence, it is beneficial from a delay perspective to control, in a unified way, the restrictions that prevent coding tools such as inter-view prediction in the extensions of HEVC from crossing tile boundaries.

In one embodiment, the value of independent_tiles_flag determines the presence of the syntax elements that control the individual restrictions/constraints, such as inter_layer_PREDTYPE_RESTYPE_SCAL_flag_x or independent_tile_upsampling_idc. independent_tiles_flag may be included in the VPS, as illustrated in Figure 13E. Here, independent_tiles_flag equal to 1 specifies that inter_layer_PREDTYPE_RESTYPE_SCAL_flag_1 to inter_layer_PREDTYPE_RESTYPE_SCAL_flag_N and independent_tile_upsampling_idc are inferred to be equal to 1 and are not present in the VPS, SPS, or PPS.

In Figure 13F, an alternative embodiment is given in the VPS as an example, in which the constraints mentioned above are controlled by independent_tiles_flag but, alternatively (when the flag is disabled), each constraint can be enabled individually. This embodiment may also be included in other non-VCL NAL units (e.g., SPS or PPS), as illustrated in Figure 13G.

Summarizing the embodiments described so far with respect to Figures 8 to 15, the decoder 620 may use guarantee signaling in the data stream to optimize the inter-layer decoding offset between the decoding of the different layers/views 12 and 15, or the decoder 620 may exploit the guarantee to suppress or admit an inter-layer parallel processing attempt, as described above with reference to "opportunistic decoding".

The aspect of the present application discussed next concerns the problem of allowing a lower end-to-end delay in multi-layer video coding. It is worth noting that the aspect described next may be combined with the aspect described previously, but the opposite is also true, i.e., embodiments concerning the aspect now described may also be implemented without the details described above. In this regard, it should also be noted that the embodiments described hereinafter are not restricted to multi-view coding. The multiple layers mentioned hereinafter in connection with the second aspect of the present application may involve different views, but may also represent the same view at varying degrees of spatial resolution, SNR accuracy, or the like. The possible scalability dimensions, along which the multiple layers discussed below increase the information content conveyed by the preceding layers, are manifold and comprise, for example, the number of views, spatial resolution, and SNR accuracy. Further possibilities will become apparent from the discussion of the third and fourth aspects of the present application, which, in accordance with an embodiment, may also be combined with the aspect presently described.

The second aspect of the present application, described now, concerns the problem of actually achieving a low coding delay, i.e., embedding the low-delay idea into the framework of NAL units. As described above, NAL units are composed of slices. The tile and/or WPP concepts may be chosen freely and individually for the different layers of a multi-layer video data stream. Accordingly, each NAL unit having a slice packetized into it can be spatially attributed to the area of the picture to which the respective slice refers. In order to enable low-delay coding in the case of inter-layer prediction, it would therefore be favorable to be able to interleave NAL units of different layers pertaining to the same time instant, so that the encoder and the decoder can commence encoding and transmitting, and decoding, respectively, the slices packetized into these NAL units in a manner that allows parallel processing of the pictures of the different layers pertaining to the same time instant. Depending on the application, however, an encoder may prefer the ability to use different coding orders among the pictures of the different layers, such as different GOP structures for the different layers, over the ability to allow parallel processing in the layer dimension. Accordingly, in accordance with the second aspect, the construction of the data stream is described again hereinafter with respect to Figure 16.

Figure 16 shows a multi-layer video material 201 composed of a sequence of pictures 204 for each of the different layers. Each layer may describe a different property of the scene described by the multi-layer video material 201. That is, the meaning of the layers may be selected among, for example, color component, depth map, transparency, and/or view point. Without loss of generality, it is assumed that the different layers correspond to different views, the video material 201 being a multi-view video.

For applications requiring low delay, the encoder may decide to signal a long-term high-level syntax element (compare: setting the du_interleaving_enabled_flag introduced below equal to 1). In that case, the data stream generated by the encoder may look as indicated in the middle of Figure 16 (the case marked with a circled 1). The multi-layer video stream 200 is then composed of a sequence of NAL units 202 such that the NAL units 202 belonging to one access unit 206 relate to pictures of one time instant, and the NAL units 202 of different access units relate to different time instants. Within each access unit 206, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units 208. This means the following: among the NAL units 202 there are, as indicated above, NAL units of different types, such as VCL NAL units on the one hand and non-VCL NAL units on the other hand. More specifically, the NAL units 202 may be of different types, and these types may comprise:

1) NAL units carrying slices, tiles, WPP substreams, or the like, i.e., syntax elements concerning prediction parameters and/or residual data describing picture content at picture sample scale/granularity. One or more such types may be present. VCL NAL units are of this type. Such NAL units are not removable;

2) Parameter set NAL units, which may carry infrequently changing information such as long-term coding settings, some examples of which have been described above. Such NAL units may, for example, be interspersed within the data stream to some extent and repeatedly;

3) Supplemental enhancement information (SEI) NAL units, which may carry optional data.

Decoding units may be composed of the first of the above-mentioned NAL unit types. More precisely, a decoding unit may consist of one or more VCL NAL units within an access unit and the associated non-VCL NAL units. A decoding unit thus describes a certain area of one picture, namely the area encoded into the one or more slices contained therein.

The decoding units 208 of NAL units relating to different layers are interleaved such that, for each decoding unit, the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, these portions being coded into decoding units that precede the respective decoding unit within the respective access unit. See, for example, decoding unit 208a in Figure 16. Assume, by way of example, that this decoding unit relates to the area 210 of the respective picture of dependent layer 2 at a certain time instant. The co-located area in the base layer picture of the same time instant is denoted by 212, and an area of this base layer picture slightly exceeding the area 212 may be required in order to decode decoding unit 208a completely by exploiting inter-layer prediction. The slight excess may, for example, result from disparity-compensated prediction. This in turn means that the decoding unit(s) 208b preceding decoding unit 208a within access unit 206 should completely cover the area needed for inter-layer prediction. Reference is made to the above description of the delay indication, which may be used as a bound for the interleaving granularity.
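The ordering condition on interleaved decoding units can be illustrated with a simplified check. The sketch assumes two layers of equal picture size, decoding units that each cover a contiguous range of CTB rows, base-layer units arriving top to bottom, and a fixed number of extra base-layer rows needed for disparity; all of these, and the function name, are assumptions of the example.

```python
def interleaving_is_valid(decoding_units, margin_rows=1):
    """Check one access unit given as a list of (layer, first_row, last_row)
    tuples in bitstream order, with layer 0 the base layer and layer 1 the
    dependent layer.

    A dependent-layer decoding unit is acceptable only if the preceding
    base-layer units already cover its co-located rows plus `margin_rows`.
    At least one base-layer decoding unit must be present.
    """
    total_base_rows = max(last for layer, _, last in decoding_units
                          if layer == 0) + 1
    base_rows_done = 0      # base-layer rows [0, base_rows_done) received so far
    for layer, first, last in decoding_units:
        if layer == 0:
            base_rows_done = last + 1
        else:
            needed = min(last + 1 + margin_rows, total_base_rows)
            if base_rows_done < needed:
                return False
    return True


ok = [(0, 0, 1), (0, 2, 3), (1, 0, 1), (0, 4, 5), (1, 2, 3), (1, 4, 5)]
too_early = [(0, 0, 1), (1, 0, 1), (0, 2, 3), (1, 2, 3)]
print(interleaving_is_valid(ok), interleaving_is_valid(too_early))
```

In the second sequence the first dependent-layer unit follows a base-layer unit that covers only the co-located rows, not the extra margin, so the check fails.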

If, however, the application benefits more from the freedom to choose the decoding order of the pictures differently among the different layers, the encoder may prefer to set du_interleaving_enabled_flag equal to 0; this case is depicted at the bottom of Figure 16, marked with a circled 2. In this case, the multi-layer video data stream has an individual access unit for each picture belonging to a certain pair of one or more layer ID values and a single time instant. As shown in Figure 16, at the (i-1)-th position in decoding order, i.e., at time instant t(i-1), each layer may consist of its own access unit AU1, AU2, and so on, or it may not (compare time instant t(i)), in which case all layers are contained in a single access unit AU1. Interleaving, however, is not allowed in this case. The access units are arranged in the data stream 200 following the decoding order index i, i.e., the access units of decoding order index i for each layer are followed by the access units concerning the pictures of these layers corresponding to decoding order i+1, and so forth. Temporal inter-picture prediction signaling in the data stream indicates whether equal coding orders or different picture coding orders apply to the different layers, and this signaling may, for example, be placed at one position or even redundantly at more than one position within the data stream, such as within the slices packetized into the NAL units.

As to the NAL unit types, it should be noted that the ordering rules defined among them may enable a decoder to determine where the boundaries between consecutive access units are located, irrespective of whether NAL units of a removable packet type have been removed during transmission. NAL units of the removable packet type may, for example, comprise SEI NAL units, redundant picture data NAL units, or other specific NAL unit types. That is, the boundaries between access units do not move but remain; the ordering rules are still obeyed within each access unit, but are broken at each boundary between any two access units.

For the sake of completeness, Figure 17 illustrates that the case du_interleaving_flag = 1 allows packets belonging to different layers, but to the same time instant t(i–1), for example, to be distributed within one access unit. The case du_interleaving_flag = 0 is depicted at the circled 2, consistent with Figure 16.

With respect to Figures 16 and 17, however, it is noted that the above-mentioned interleaving signaling may be omitted, in which case the multi-layer video data stream inevitably uses the access unit definition according to the case shown at the circled 1 in Figures 16 and 17.

In accordance with an embodiment, whether the NAL units contained within each access unit are actually interleaved with respect to their association with the layers of the data stream may be decided at the encoder's discretion. To ease handling of the data stream, a syntax element (e.g., du_interleaving_flag) may signal to the decoder the interleaving or non-interleaving of the NAL units within an access unit that collects all NAL units of a certain time stamp, so that the decoder can process the NAL units more easily. For example, whenever interleaving is signaled to be switched on, the decoder may use more than one picture buffer, as briefly illustrated with respect to Figure 18.

Figure 18 shows a decoder 700, which may be embodied as outlined above with respect to Figure 2 and may even comply with the description given with respect to Figure 9. By way of example, the multi-layer video data stream of Figure 17 (option 1, circled) is shown entering the decoder 700. In order to more easily deinterleave, per access unit AU, the NAL units belonging to different layers but to a common time instant, the decoder 700 uses two buffers 702 and 704. For each access unit AU, a multiplexer 706 forwards, for example, the NAL units of that access unit belonging to a first layer to buffer 702 and the NAL units belonging to a second layer to buffer 704. A decoding unit 708 then performs the decoding. In Figure 18, NAL units belonging to the base/first layer are shown, for example, without hatching, whereas NAL units of the dependent/second layer are shown hatched. If the above-outlined interleaving signaling is present in the data stream, the decoder 700 may respond to it in the following manner: if the interleaving signaling indicates that NAL unit interleaving is switched on, i.e., NAL units of different layers are interleaved with each other within one access unit AU, the decoder 700 uses buffers 702 and 704, with the multiplexer 706 distributing the NAL units onto these buffers as just outlined. If not, however, the decoder 700 uses only one of the buffers 702 and 704 for all NAL units comprised by any access unit, for example buffer 702.
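The buffer handling of decoder 700 can be sketched as follows. The representation of NAL units as (layer_id, payload) tuples, the use of the reference numerals 702 and 704 as dictionary keys, and the function name are assumptions made for this illustration; a real decoder would operate on parsed NAL unit headers.

```python
from collections import deque


def route_nal_units(access_unit, interleaving_on):
    """Distribute the NAL units of one access unit onto two buffers.

    access_unit: list of (layer_id, payload) tuples in bitstream order,
    with layer_id 0 denoting the base/first layer.
    With interleaving signaled on, first-layer units go to buffer 702 and
    all other units to buffer 704; otherwise only buffer 702 is used.
    """
    buffers = {702: deque(), 704: deque()}
    for layer_id, payload in access_unit:
        target = 704 if (interleaving_on and layer_id != 0) else 702
        buffers[target].append((layer_id, payload))
    return buffers


au = [(0, "b1"), (1, "d1"), (0, "b2"), (1, "d2")]
print(route_nal_units(au, interleaving_on=True))
print(route_nal_units(au, interleaving_on=False))
```

Within each buffer the original bitstream order of a layer's NAL units is preserved, so each layer can be decoded from its own buffer as data arrives.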

To make the embodiment of Figure 18 easier to understand, reference is made to Figure 18 together with Figure 19, which shows an encoder configured to generate a multi-layer video data stream as outlined above. The encoder of Figure 19 is generally indicated by reference sign 720 and encodes the incoming pictures of, here by way of example, two layers which, for ease of understanding, are denoted as layer 12, forming the base layer, and layer 15, forming the dependent layer. As previously outlined, they may form different views. A general encoding order, along which encoder 720 encodes the pictures of layers 12 and 15, scans the pictures of these layers substantially along their temporal (presentation time) order, wherein the encoding order 722 may deviate from the presentation time order of the pictures 12 and 15 in units of groups of pictures. At each time instant, the encoding order 722 traverses the pictures of layers 12 and 15 along their dependency, i.e., from layer 12 to layer 15.

Encoder 720 encodes the pictures of layers 12 and 15 into data stream 40 in units of the aforementioned NAL units, each of which is associated with a portion of the respective picture in a spatial sense. The NAL units belonging to a given picture thus spatially subdivide or partition that picture. As described above, inter-layer prediction makes portions of the pictures of layer 15 dependent on portions of the time-aligned pictures of layer 12 that are substantially co-located with the respective portion of the layer 15 picture, where "substantially" encompasses disparity displacements. In the example of Figure 19, encoder 720 has chosen to exploit the interleaving possibility in forming access units that collect all NAL units belonging to a given time instant. The portion of data stream 40 shown in Figure 19 corresponds to the portion entering the decoder of Figure 18. That is, in the example of Figure 19, encoder 720 uses inter-layer parallel processing in encoding layers 12 and 15. For time instant t(i–1), encoder 720 starts encoding the picture of layer 15 as soon as NAL unit 1 of the picture of layer 12 has been encoded. Each NAL unit whose encoding has been completed is output by encoder 720 and carries an arrival time stamp corresponding to the time at which encoder 720 output it. After encoding the first NAL unit of the picture of layer 12 at time instant t(i–1), encoder 720 continues encoding the content of the layer 12 picture and outputs the second NAL unit of that picture, whose arrival time stamp follows that of the first NAL unit of the time-aligned picture of layer 15. That is, encoder 720 outputs the NAL units of the pictures of layers 12 and 15 belonging to the same time instant in an interleaved manner, and the NAL units of data stream 40 are actually transmitted in this interleaved manner. The fact that encoder 720 has chosen to exploit the interleaving possibility is indicated by encoder 720 within data stream 40 by way of corresponding interleaving signaling 724. In the non-interleaved scenario, the output of the first NAL unit of layer 15 would be deferred until all NAL units of the time-aligned base layer picture had been encoded and output. Because encoder 720 can output the first NAL unit of dependent layer 15 for time instant t(i–1) earlier than in that scenario, the end-to-end delay between the encoder of Figure 19 and the decoder of Figure 18 can be reduced.

As described above, according to an alternative example, in the non-interleaved case, i.e., when signaling 724 indicates the non-interleaved alternative, the definition of the access unit may remain the same; that is, an access unit AU may collect all NAL units belonging to a given time instant. In that case, signaling 724 merely indicates whether, within each access unit, the NAL units belonging to the different layers 12 and 15 are interleaved.

As described above, depending on signaling 724, the decoder of Figure 18 uses either one buffer or two buffers. With interleaving switched on, decoder 700 distributes the NAL units onto the two buffers 702 and 704 such that, for example, NAL units of layer 12 are buffered in buffer 702 while NAL units of layer 15 are buffered in buffer 704. Buffers 702 and 704 are emptied access unit by access unit. This holds whether signaling 724 indicates interleaving or non-interleaving.

It is preferable for encoder 720 to set a removal time within each NAL unit such that decoding unit 708 can exploit the possibility of decoding layers 12 and 15 from data stream 40 using inter-layer parallel processing. The end-to-end delay is reduced, however, even if decoder 700 does not apply inter-layer parallel processing.

As described above, the NAL units may be of different NAL unit types. Each NAL unit may have a NAL unit type index indicating the type of the respective NAL unit out of a set of possible types. Within each access unit, the types of the NAL units of that access unit may obey an ordering rule among the NAL unit types, and the ordering rule is broken only between two consecutive access units, so that decoder 700 can identify access unit boundaries by checking this rule. For further information, reference is made to the H.264 standard.

In Figures 18 and 19, decoding units DU are identifiable as runs of consecutive NAL units within one access unit that belong to the same layer. For example, the NAL units labeled "3" and "4" in Figure 19 within access unit AU(i–1) form one DU. Each of the other decoding units of access unit AU(i–1) comprises just one NAL unit. Altogether, access unit AU(i–1) of Figure 19 comprises, by way of example, six decoding units DU that are arranged alternately within the access unit, i.e., they consist of runs of NAL units of one layer, with that layer alternating between layer 1 and layer 0.
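Identifying decoding units as runs of same-layer NAL units can be sketched as below. The function name `decoding_units` is hypothetical, and the layer assigned to each NAL unit in the example is an assumption chosen to match the description (seven NAL units, with "3" and "4" sharing one DU), since the figure itself is not reproduced here.

```python
from itertools import groupby


def decoding_units(nal_layers):
    """Group (nal_id, layer) pairs into runs of consecutive same-layer NAL units.

    Each run is treated as one decoding unit and returned as (layer, [nal_ids]).
    """
    return [
        (layer, [nal_id for nal_id, _ in run])
        for layer, run in groupby(nal_layers, key=lambda item: item[1])
    ]


# Assumed layout of access unit AU(i-1): (NAL unit label, layer).
au = [(1, 0), (2, 1), (3, 0), (4, 0), (5, 1), (6, 0), (7, 1)]
dus = decoding_units(au)
```

Under this assumed layout the grouping yields six decoding units whose layers alternate, with NAL units 3 and 4 merged into a single DU.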

Similar to the first aspect, the following outlines how the second aspect described above may be built into the HEVC extensions.

Before that, however, for the sake of completeness, a further aspect of current HEVC is described that enables inter-picture parallel processing, namely WPP processing.

Figure 20 describes how WPP is currently implemented in HEVC. This description also forms a basis for optional implementations of the WPP processing of any of the embodiments described above or below.

In the base layer, wavefront parallel processing allows coding tree block (CTB) rows to be processed in parallel. Prediction dependencies are not broken across CTB rows. With regard to entropy coding, WPP changes the CABAC dependencies to the top-left CTB in the respective upper CTB row, as shown in Figure 20. Entropy coding of a CTB in a following row can start once entropy decoding of the corresponding upper-right CTB has finished.

In the enhancement layer, decoding of a CTB can start as soon as the CTBs containing the corresponding picture area have been fully decoded and are available.
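The wavefront pattern that results from the upper-right dependency can be illustrated with a small scheduling sketch. It assumes that every CTB takes one time unit, that one worker is available per CTB row, and that a CTB waits only for its left neighbor and for the upper-right CTB of the row above. The function name `wpp_start_times` and these simplifications are illustrative assumptions, not part of HEVC.

```python
def wpp_start_times(rows, cols):
    """Earliest start time of each CTB under a simplified WPP dependency model.

    A CTB may start once its left neighbor in the same row and the upper-right
    CTB of the row above (clipped to the last column) have finished.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # Left neighbor must be finished (unit processing time).
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # Upper-right CTB of the previous row must be finished.
                upper_right = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][upper_right] + 1)
            start[r][c] = ready
    return start


schedule = wpp_start_times(rows=3, cols=4)
```

In this model each row trails the row above by two CTBs, which is the diagonal wavefront that gives the technique its name.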

In HEVC and its extensions, the following definition of a decoding unit is given:

Decoding unit: an access unit if SubPicHrdFlag is equal to 0, or otherwise a subset of an access unit, consisting of one or more VCL NAL units in an access unit and the associated non-VCL NAL units.

In HEVC, the hypothetical reference decoder (HRD) can optionally operate the CPB and DPB at decoding unit level (or sub-picture level) if this is preferred by external means and sub-picture HRD parameters are available.

The HEVC specification [1] features a concept of so-called decoding units, defined as follows.

3.1 Decoding unit: an access unit if SubPicHrdFlag is equal to 0, or otherwise a subset of an access unit, consisting of one or more VCL NAL units in an access unit and the associated non-VCL NAL units.

In layered coded video sequences, as present in the HEVC extensions for 3D [3], multiview [2], or spatial scalability [4], additional representations of the video data (e.g., with higher fidelity, spatial resolution, or different camera viewpoints) are coded in dependence on lower layers by means of predictive inter-layer/inter-view coding tools. In such sequences it can be beneficial to interleave the (in terms of picture area) related or co-located decoding units of related layers in the bitstream in order to minimize the end-to-end delay at the encoder and decoder.

To allow interleaving of decoding units in the coded video bitstream, certain constraints on the coded video bitstream must be signaled and enforced.

The following subsections describe in detail how the above interleaving concept may be implemented in HEVC, and explain the reasons.

As far as the current state of the HEVC extensions, as taken from the draft documents of the MV-HEVC specification [2], is concerned, it is emphasized that a definition of the access unit is used according to which an access unit contains one coded picture (with a particular value of nuh_layer_id). One coded picture is defined below essentially identically to a view component in MVC. Whether an access unit should instead be defined to contain all view components with the same POC value is an open issue.

The base HEVC specification [1] defines:

3.1 Access unit: a set of NAL units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain exactly one coded picture.

Note 1: In addition to containing the VCL NAL units of the coded picture, an access unit may also contain non-VCL NAL units. The decoding of an access unit always results in a decoded picture.

It appears that the access unit (AU) definition, which allows only one coded picture in each access unit, was interpreted such that each dependent view is regarded as a separate coded picture and has to be contained in a separate access unit. This is depicted as "2" in Figure 17.

In previous standards, a "coded picture" contained all layers of the view representations of the picture at a given time stamp.

Access units cannot be interleaved. This means that, if each view is contained in a different access unit, the entire picture of the base view has to be received in the DPB before the first decoding unit (DU) of a dependent view can be decoded.

For ultra-low-delay operation with dependent layers/views, it is beneficial to interleave decoding units.

The example of Figure 21 contains three views with three decoding units each. The decoding units are received in order from left to right.

If each view is contained in its own access unit, the minimum delay for decoding the first decoding unit of view 3 includes completely receiving views 1 and 2.

If the views can be sent interleaved, as shown in Figure 1 and as already explained with respect to Figures 18 and 19, the minimum delay can be reduced.
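The delay difference in the three-view example can be made concrete with a short calculation. The sketch below counts how many decoding units are received before the first DU of a given view arrives. The round-robin interleaving order and the function name `dus_before_first` are assumptions for illustration; the actual order in the figure may differ.

```python
def dus_before_first(num_views, dus_per_view, target_view, interleaved):
    """Count the DUs received before the first DU of target_view (1-based).

    Non-interleaved: all DUs of view 1, then view 2, and so on.
    Interleaved (assumed round-robin): DU 0 of every view, then DU 1, etc.
    """
    views = range(1, num_views + 1)
    if interleaved:
        order = [(v, d) for d in range(dus_per_view) for v in views]
    else:
        order = [(v, d) for v in views for d in range(dus_per_view)]
    return order.index((target_view, 0))


sequential_wait = dus_before_first(3, 3, target_view=3, interleaved=False)
interleaved_wait = dus_before_first(3, 3, target_view=3, interleaved=True)
```

For three views with three DUs each, the first DU of view 3 is preceded by six DUs in the non-interleaved case but by only two under the assumed round-robin order.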

Interleaving of NAL units from different layers in the scalable extensions of HEVC may be realized as follows:

A bitstream interleaving mechanism for layer or view representations, and a decoder, which may be implemented using parallelization techniques, that is able to use this bitstream layout to decode dependent views with very low delay. The interleaving of DUs is controlled via a flag (e.g., du_interleaving_enabled_flag).

To allow low-delay decoding and parallelization in the scalable extensions of HEVC, interleaving of NAL units of different layers of the same AU is necessary. Definitions along the following lines could therefore be introduced:

Access unit: a set of NAL units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain exactly one coded picture;

Coded layer picture component: a coded representation of a layer picture component containing all coding tree units of the layer picture component;

Coded picture: a coded representation of a picture containing all coding tree units of the picture, comprising one or more coded layer picture components;

Picture: a picture is a set of one or more layer picture components;

Layer picture component: an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format, whose coded representation consists of the NAL units of a particular layer among all NAL units in an access unit.

The NAL units are interleaved (compare: du_interleaving_enabled_flag == 1) in a way that follows the dependencies among them, such that each NAL unit can be decoded using only data received in NAL units preceding it in decoding order, i.e., no data from NAL units later in decoding order is needed to decode a NAL unit.
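The ordering requirement stated above, that no NAL unit may depend on a NAL unit later in decoding order, can be checked mechanically. The following sketch is only an illustration: the labels `B1`..`B3` and `E1`..`E3`, the dependency map, and the function name `order_violations` are hypothetical.

```python
def order_violations(order, depends_on):
    """Return (nal, dependency) pairs where the dependency does not precede nal.

    order:      list of NAL unit labels in decoding order
    depends_on: mapping from a NAL unit label to the labels it needs
    A dependency that is missing from the order also counts as a violation.
    """
    position = {nal: i for i, nal in enumerate(order)}
    return [
        (nal, dep)
        for nal in order
        for dep in depends_on.get(nal, ())
        if position.get(dep, len(order)) > position[nal]
    ]


# Assumed dependencies: each enhancement unit Ek needs the co-located base unit Bk.
deps = {"E1": ["B1"], "E2": ["B2", "E1"], "E3": ["B3", "E2"], "B2": ["B1"], "B3": ["B2"]}
good = order_violations(["B1", "E1", "B2", "E2", "B3", "E3"], deps)
bad = order_violations(["E1", "B1", "B2", "E2", "B3", "E3"], deps)
```

An interleaved order that places each base-layer unit before the enhancement-layer unit that references it passes the check, while swapping the first two units is reported as a violation.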

When interleaving is applied (compare: du_interleaving_enabled_flag == 1) and luma and chroma components are separated into different color planes, the respective NAL units associated with the color planes are allowed to be interleaved. Each of these NAL units (associated with a unique value of colour_plane_id) has to fulfill the VCL NAL unit order described below. Since color planes are expected to have no coding dependencies among each other within an access unit, they follow the normal order.

The constraints on NAL unit order can be expressed using a syntax element min_spatial_segment_delay, which measures and guarantees a worst-case delay/offset between spatial segments in units of CTBs. The syntax element describes the dependency of spatial regions between CTBs or spatial segments (such as tiles, slices, or CTB rows for WPP) of base and enhancement layers. The syntax element is not needed for interleaving the NAL units or for coding the NAL units sequentially in coding order. A parallel multi-layer decoder can use the syntax element to set up parallel decoding of the layers.

The following constraints affect the possibility of allowing parallelization across layers/views and interleaving of decoding units, as mainly described in the first aspect:

1) Prediction of samples and syntax elements:

The interpolation filters for luma and chroma resampling place constraints on the data needed in the lower layer to generate the upsampled data required by the higher layer. Decoding dependencies can be reduced by constraining these filters, e.g., because segments of the picture can then be upsampled independently. The signaling of a specific constraint for tile processing was discussed above in the first aspect.

Motion vector prediction for "reference index based scalable extensions" (HLS approach), and more specifically temporal motion vector prediction (TMVP), places constraints on the data needed in the lower layer to generate the required resampled picture motion field. The related inventions and signaling were discussed above in the first aspect.

2) Motion vectors:

For SHVC, motion compensation is not used with the lower layer, i.e., if the lower layer is used as a reference picture (HLS approach), the resulting motion vectors have to be zero vectors. For MV-HEVC 0 or 3D-HEVC 0, however, the disparity vectors may be constrained but are not necessarily zero vectors. That is, motion vectors can be used for inter-view prediction. Restrictions on the motion vectors may therefore be applied to ensure that only data received in preceding NAL units is needed for decoding. The related inventions and signaling were discussed above in the first aspect.

3) Picture partitioning with tile boundaries:

If parallel processing and low delay are to be achieved effectively with NAL units of different layers being interleaved, the picture partitioning in the enhancement layer should be made dependent on the picture partitioning in the reference layer.

With regard to the order of VCL NAL units and their association with coded pictures, the following may be specified.

Each VCL NAL unit is part of a coded picture.

The order of the VCL NAL units within a coded layer picture component of a coded picture, i.e., the VCL NAL units of the coded picture with the same layer_id_nuh value, is constrained as follows:

– The first VCL NAL unit of the coded layer picture component has first_slice_segment_in_pic_flag equal to 1;

– Let sliceSegAddrA and sliceSegAddrB be the slice_segment_address values of any two coded slice segment NAL units A and B within the same coded layer picture component. Coded slice segment NAL unit A precedes coded slice segment NAL unit B when either of the following conditions is true:

– TileId[CtbAddrRsToTs[sliceSegAddrA]] is less than TileId[CtbAddrRsToTs[sliceSegAddrB]];

– TileId[CtbAddrRsToTs[sliceSegAddrA]] is equal to TileId[CtbAddrRsToTs[sliceSegAddrB]], and CtbAddrRsToTs[sliceSegAddrA] is less than CtbAddrRsToTs[sliceSegAddrB].
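The two ordering conditions above amount to sorting slice segments by tile first and by tile-scan address second. A minimal sketch follows. The tables `CTB_ADDR_RS_TO_TS` and `TILE_ID` are hand-built for an assumed picture of 4x2 CTBs split into two tiles of two CTB columns each, and the function name `slice_segment_sort_key` is hypothetical.

```python
def slice_segment_sort_key(slice_seg_addr, ctb_addr_rs_to_ts, tile_id):
    """Sort key implementing the stated order: tile ID, then tile-scan address."""
    ts_addr = ctb_addr_rs_to_ts[slice_seg_addr]
    return (tile_id[ts_addr], ts_addr)


# Assumed picture: 4 CTBs wide, 2 CTBs high, two tiles of 2 columns each.
# Raster-scan addresses:   0 1 | 2 3
#                          4 5 | 6 7
# Raster-scan address -> tile-scan address.
CTB_ADDR_RS_TO_TS = [0, 1, 4, 5, 2, 3, 6, 7]
# Tile ID for each tile-scan address.
TILE_ID = [0, 0, 0, 0, 1, 1, 1, 1]

slice_segment_addresses = [2, 4, 0, 6]
ordered = sorted(
    slice_segment_addresses,
    key=lambda addr: slice_segment_sort_key(addr, CTB_ADDR_RS_TO_TS, TILE_ID),
)
```

In this example the slice segment starting at raster address 4 precedes the one starting at address 2, because address 4 lies in the first tile while address 2 lies in the second.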

If a coded picture consists of more than one coded layer picture component, the VCL NAL units of all picture components are constrained as follows:

– Let VCL NAL unit A be the first VCL NAL unit in a coded layer picture component layerPicA that is used as a reference for another layer picture component layerPicB. Then VCL NAL unit A precedes any VCL NAL unit B belonging to layerPicB;

– Otherwise (not the first VCL NAL unit), if du_interleaving_enabled_flag is equal to 0, let VCL NAL unit A be any VCL NAL unit of a coded layer picture component layerPicA that is used as a reference for another coded layer picture component layerPicB. Then VCL NAL unit A precedes any VCL NAL unit B belonging to layerPicB;

– Otherwise (not the first VCL NAL unit and du_interleaving_enabled_flag is equal to 0), if ctb_based_delay_enabled_flag is equal to 1 (i.e., a CTB-based delay is signaled regardless of whether tiles or WPP are used in the video sequence), let layerPicA be a coded layer picture component that is used as a reference for another coded layer picture component layerPicB. Let further NALUsetA be a sequence of consecutive slice segment NAL units belonging to layerPicA that directly follows a sequence NALUsetB1 of consecutive slice segment NAL units belonging to layerPicB, and let NALUsetB2 be a sequence of consecutive slice segment NAL units belonging to layerPicB that directly follows NALUsetA. Let sliceSegAddrA be the slice_segment_address of the first slice segment NAL unit of NALUsetA, and let sliceSegAddrB be the slice_segment_address of the first coded slice segment NAL unit of NALUsetB2. Then the following conditions hold:

– If NALUsetA is present, NALUsetB2 is present;

– CtbAddrRsToTs[PicWidthInCtbsYA * CtbRowBA(sliceSegAddrB-1) + CtbColBA(sliceSegAddrB-1) + min_spatial_segment_delay] is less than or equal to CtbAddrRsToTs[sliceSegAddrA-1]. See also Figure 23.

– Otherwise (not the first VCL NAL unit and du_interleaving_enabled_flag is equal to 0), if tiles_enabled_flag is equal to 0 and entropy_coding_sync_enabled_flag is equal to 0 (i.e., neither tiles nor WPP are used in the video sequence), let layerPicA be a coded layer picture component that is used as a reference for another coded layer picture component layerPicB. Let further VCL NAL unit B be any VCL NAL unit of coded layer picture component layerPicB, and let VCL NAL unit A be a preceding VCL NAL unit of layerPicA with slice_segment_address equal to sliceSegAddrA, such that there are (min_spatial_segment_delay - 1) VCL NAL units of layerPicA between VCL NAL unit A and VCL NAL unit B. Let further VCL NAL unit C be the next VCL NAL unit of coded layer picture component layerPicB following VCL NAL unit B, with slice_segment_address equal to sliceSegAddrC. Let PicWidthInCtbsYA be the picture width in units of CTBs of layerPicA. Then the following conditions hold:

– There are always min_spatial_segment_delay VCL NAL units of layerPicA preceding VCL NAL unit B;

– PicWidthInCtbsYA * CtbRowBA(sliceSegAddrC-1) + CtbColBA(sliceSegAddrC-1) is less than or equal to sliceSegAddrA-1.

– Otherwise (not the first VCL NAL unit, du_interleaving_enabled_flag is equal to 1, and ctb_based_delay_enabled_flag is equal to 0), if tiles_enabled_flag is equal to 0 and entropy_coding_sync_enabled_flag is equal to 1 (i.e., WPP is used in the video sequence), let sliceSegAddrA be the slice_segment_address of any slice segment NAL unit A of a coded layer picture component layerPicA that directly precedes a slice segment VCL NAL unit B with slice_segment_address equal to sliceSegAddrB, belonging to a coded layer picture component layerPicB that uses layerPicA as a reference. Let further PicWidthInCtbsYA be the picture width in units of CTBs of layerPicA. Then the following condition holds:

– (CtbRowBA(sliceSegAddrB) - Floor((sliceSegAddrA) / PicWidthInCtbsYA) + 1) is equal to or greater than min_spatial_segment_delay.

– Otherwise (not the first VCL NAL unit, du_interleaving_enabled_flag is equal to 1, and ctb_based_delay_enabled_flag is equal to 0), if tiles_enabled_flag is equal to 1 and entropy_coding_sync_enabled_flag is equal to 0 (i.e., tiles are used in the video sequence), let sliceSegAddrA be the slice_segment_address of any slice segment NAL unit A of a coded layer picture component layerPicA, and let slice segment VCL NAL unit B be the first following VCL NAL unit, with slice_segment_address equal to sliceSegAddrB, that belongs to a coded layer picture component layerPicB using layerPicA as a reference. Let further PicWidthInCtbsYA be the picture width in units of CTBs of layerPicA. Then the following condition holds:

– TileId[CtbAddrRsToTs[PicWidthInCtbsYA * CtbRowBA(sliceSegAddrB-1) + CtbColBA(sliceSegAddrB-1)]] - TileId[CtbAddrRsToTs[sliceSegAddrA-1]] shall be equal to or greater than min_spatial_segment_delay.

As shown in Figure 24, signaling 724 may be located in the VPS, where:

du_interleaving_enabled_flag, when equal to 1, specifies that a frame has a single associated coded picture (i.e., a single associated AU) consisting of all coded layer picture components for that frame, and that VCL NAL units corresponding to different layers may be interleaved. When du_interleaving_enabled_flag is equal to 0, a frame may have more than one associated coded picture (i.e., one or more associated AUs), and the VCL NAL units of different coded layer picture components are not interleaved.

To complete the discussion above, and in line with the embodiment of Figure 18, the hypothetical reference decoder associated with decoder 700 may be adapted to operate with one or both of buffers 702 and 704 depending on the setting of signaling 724, i.e., to switch between these options according to signaling 724.

In the following, a further aspect of the present application is described, which again may be combined with aspect 1 and/or aspect 2. The third aspect of the present application concerns an extension of scalability signaling for applications with a large number of, for example, views.

To ease the understanding of the description presented below, an overview of existing scalability signaling concepts is given.

Most state-of-the-art 3D video applications or deployments feature either stereoscopic content, with or without a respective depth map for each of the two camera views, or multiview content with a higher number of views (>2), with or without a respective depth map for each camera view.

The High Efficiency Video Coding (HEVC) standard [1] and its extensions for 3D and multiview video [2][3] feature scalability signaling on the network abstraction layer (NAL) that is capable of expressing up to 64 different layers with a 6-bit layer identifier (compare: nuh_layer_id) in the header of each NAL unit, as specified in the syntax table of Figure 25.
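To illustrate where the 6-bit layer identifier sits, the sketch below unpacks a two-byte NAL unit header, assuming the HEVC version 1 layout of a 1-bit forbidden_zero_bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id, and a 3-bit nuh_temporal_id_plus1. The function name `parse_nal_unit_header` is hypothetical, and the syntax table of Figure 25 itself is not reproduced here.

```python
def parse_nal_unit_header(header: bytes) -> dict:
    """Unpack the fixed-length two-byte NAL unit header (assumed HEVC v1 layout)."""
    if len(header) < 2:
        raise ValueError("NAL unit header needs at least two bytes")
    word = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": word >> 15,
        "nal_unit_type": (word >> 9) & 0x3F,
        "nuh_layer_id": (word >> 3) & 0x3F,  # 6 bits, hence at most 64 values
        "nuh_temporal_id_plus1": word & 0x07,
    }


base_layer = parse_nal_unit_header(bytes([0x40, 0x01]))
highest_layer = parse_nal_unit_header(bytes([0x03, 0xF9]))
```

Because the field is six bits wide, the largest value a header can carry is 63, which is the limit the extension mechanism discussed in this aspect is meant to work around.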

层标识符的每个值可以转换成一组可扩展标识符变量(例如，DependencyID、ViewID等)，例如，通过视频参数组扩展，根据使用中的可扩展性维度，这允许在网络抽象层上指示最多64个专用视图，或者如果层标识符还用于指示位深度，那么允许32个专用视图。Each value of the layer identifier can be converted into a set of scalable identifier variables (e.g., DependencyID, ViewID, etc.), for example, by extending the video parameter group. Depending on the scalability dimension in use, this allows up to 64 dedicated views to be indicated on the network abstraction layer, or 32 dedicated views if the layer identifier is also used to indicate bit depth.

然而，还具有应用，需要将远远更大数量的视图编码成视频位流、传送、解码以及显示，例如，在具有大量摄像头的多摄像头阵列中或者在需要大量视角的全息显示中，如在[5][6][7]中所提出的。以下部分描述两个发明，所述发明解决了用于扩展的HEVC高级语法的上述缺点。However, there are also applications that require encoding, transmitting, decoding, and displaying a far greater number of views into video bitstreams, for example, in multi-camera arrays with a large number of cameras or in holographic displays requiring a large number of viewpoints, as proposed in [5][6][7]. The following sections describe two inventions that address the aforementioned shortcomings of the extended HEVC high-level syntax.

仅仅扩展在NAL单元报头内的尺寸nuh_layer_id字段，不被视为该问题的有用解决方案。报头预期具有固定的长度，需要该长度，用于容易访问非常简单(低成本)的装置，这些装置在位流(例如，路由和提取)上执行操作。这表示，如果使用远远更少的视图，那么需要为所有情况增加额外位(或字节)。Simply extending the size nuh_layer_id field within the NAL unit header is not considered a useful solution to this problem. The header is expected to have a fixed length, which is necessary for easily accessible, very simple (low-cost) devices that perform operations on bit streams (e.g., routing and extraction). This means that if far fewer views are used, then extra bits (or bytes) are needed for all cases.

Moreover, once the first version of the standard has been finalized, the NAL unit header can no longer be changed.

The following description presents an extension mechanism for an HEVC decoder or an intermediate device that extends the capabilities of the scalability signaling in order to meet the above requirements. Activation and extension data can be signaled in the HEVC high-level syntax.

In particular, the following describes the signaling that indicates that the layer identifier extension mechanism (as described in the following sections) is enabled in the video bitstream.

As with the first and second aspects, a possible implementation of the third concept within the HEVC framework is described first, followed by generalized embodiments. The concept allows multiple view components with the same existing layer identifier (cf. nuh_layer_id) to occur within the same access unit. An additional identifier extension is used to distinguish between these view components. This extension is not coded in the NAL unit header. It is therefore not as easily accessible as a field in the NAL unit header, but it still enables new use cases with many more views. In particular, with view clustering (see the description below), legacy extraction mechanisms can still be used, without any modification, to extract groups of views that belong together.

To extend the existing range of layer identifier values, the invention describes the following mechanisms:

a. A predetermined value of the existing layer identifier is used as a special value (a so-called "escape code") to indicate that the actual value is determined by an alternative derivation process (in a particular embodiment: a value (the maximum value of the layer identifier) of the syntax element nuh_layer_id in the NAL unit header is used);

b. A flag, index, or bit-length indication in a higher-level syntax structure (e.g., in the slice header syntax or in a video/sequence/picture parameter set extension, as specified in the following embodiments of the invention) that enables each value of the existing layer identifier to be combined with another syntax structure.

Activation of the extension mechanism can be achieved as follows.

For a), no explicit activation signaling is required, i.e., the reserved escape code can always be used to signal use of the extension (a₁). However, this reduces by one (the escape code value) the number of layers/views that are possible without using the extension. Therefore, the switching parameters below can be used for both variants (a₂).

The extension mechanism can be enabled or disabled within the bitstream using one or more syntax elements that persist over the entire bitstream, a video sequence, or a part of a video sequence.

With the variable LayerId denoting the existing layer identifier, specific example embodiments of the invention for enabling the extension mechanism are:

Variant I) Variant I is shown in Figure 26. Here, layer_id_ext_flag enables the use of additional LayerId values.

Variant II) Variant II is shown in Figure 27. Here, layer_id_mode_idc equal to 1 indicates that the value range of LayerId is extended by using an escape code in nuh_layer_id. layer_id_mode_idc equal to 2 indicates that the value range of LayerId is extended by an offset value. layer_id_mode_idc equal to 0 indicates that no extension mechanism is used for LayerId.

Note: The values may be assigned to the modes differently.

Variant III) Variant III is shown in Figure 28. Here, layer_id_ext_len indicates the number of bits used to extend the LayerId range.

The above syntax elements serve as an indicator that the layer identifier extension mechanism is used to indicate the layer identifier of the corresponding NAL unit or slice data.

In the following description, the variable LayerIdExtEnabled is used as a Boolean indicator that the extension mechanism is enabled. The variable is used in this description for ease of reference. The variable names are examples, and embodiments of the invention may use different names or use the corresponding syntax elements directly. The variable LayerIdExtEnabled is derived as follows for the cases above:

For a₁), if only a predetermined value of the layer identifier syntax element is used to enable the layer identifier extension mechanism, the following applies:

if (nuh_layer_id == predetermined value)
    LayerIdExtEnabled = true
else
    LayerIdExtEnabled = false

For a₂) and b), if variant I), i.e., a flag (e.g., layer_id_ext_enable_flag), is used to enable the layer identifier extension mechanism, the following applies:

LayerIdExtEnabled = layer_id_ext_enable_flag

For a₂) and b), if variant II), i.e., an index (e.g., layer_id_mode_idc), is used to enable the layer identifier extension mechanism, the following applies:

if (layer_id_mode_idc == predetermined value)
    LayerIdExtEnabled = true
else
    LayerIdExtEnabled = false

For a₂) and b), if variant III), i.e., a bit-length indication (e.g., layer_id_ext_len), is used to enable the layer identifier extension mechanism, the following applies:

if (layer_id_ext_len > 0)
    LayerIdExtEnabled = true
else
    LayerIdExtEnabled = false

For a₂), if a predetermined value is used in combination with an enabling syntax element, the following applies:

LayerIdExtEnabled &= (nuh_layer_id == predetermined value)
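The derivations above can be gathered into one function, as in the following sketch. The constants ESCAPE_CODE and MODE_IDC_ENABLING stand in for the "predetermined values", which the text leaves open; they, the function name, and the variant labels are assumptions made for illustration.

```python
ESCAPE_CODE = 63        # assumed escape value: the maximum of the 6-bit nuh_layer_id
MODE_IDC_ENABLING = 1   # assumed "predetermined value" of layer_id_mode_idc


def layer_id_ext_enabled(nuh_layer_id, *, variant,
                         layer_id_ext_enable_flag=0,
                         layer_id_mode_idc=0,
                         layer_id_ext_len=0,
                         combine_with_escape=False):
    """Derive LayerIdExtEnabled for case a1 or for variants I, II, III."""
    if variant == "a1":
        # Case a1: the escape code alone switches the mechanism on.
        return nuh_layer_id == ESCAPE_CODE
    if variant == "I":
        enabled = bool(layer_id_ext_enable_flag)
    elif variant == "II":
        enabled = layer_id_mode_idc == MODE_IDC_ENABLING
    elif variant == "III":
        enabled = layer_id_ext_len > 0
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    if combine_with_escape:
        # Case a2: LayerIdExtEnabled &= (nuh_layer_id == predetermined value)
        enabled = enabled and nuh_layer_id == ESCAPE_CODE
    return enabled
```

With combine_with_escape left at False the function covers case b), where every nuh_layer_id value may be combined with an extension.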

The layer identifier extension can be signaled as follows:

If the extension mechanism is enabled (e.g., by signaling as described in the preceding section), a predefined or signaled number of bits (cf. layer_id_ext_len) is used to determine the actual LayerId value. For VCL NAL units, the extra bits may be carried in the slice header syntax (e.g., using the existing extension) or in an SEI message that is associated with the corresponding slice data, either by its position in the video bitstream or by an index, and that is used to extend the signaling range of the layer identifier in the NAL unit header.

For non-VCL NAL units (VPS, SPS, PPS, SEI messages), the additional identifier can be added to the respective extensions or likewise be provided by an associated SEI message.

In the further description, this syntax element is referred to as layer_id_ext, regardless of its position in the bitstream syntax. The name is used as an example. The following syntax tables and semantics give examples of possible embodiments.

Figure 29 illustrates signaling of the layer identifier extension in the slice header.

Figure 30 shows an alternative signaling of the layer identifier extension in the slice header.

Figure 31 shows an example of signaling in the video parameter set (VPS).

Similar extensions exist for the SPS, the PPS, and SEI messages. The additional syntax element can be added to these extensions in a similar manner.

Figure 32 shows signaling of the layer identifier in an associated SEI message (e.g., a layer ID extension SEI message).

The scope of the SEI message can be determined from its position in the bitstream. In a particular embodiment of the invention, all NAL units that follow a layer ID extension SEI message are associated with its value of layer_id_ext until the start of a new access unit or until a new layer ID extension SEI message is received.

Depending on its position, the additional syntax element can be coded with a fixed-length code (denoted here as u(v)) or a variable-length code (ue(v)).

The layer identifier for a particular NAL unit and/or slice data is then derived, depending on whether the layer identifier extension mechanism is activated (cf. LayerIdExtEnabled), by mathematically combining the information provided by the layer identifier in the NAL unit header (cf. nuh_layer_id) and by the layer identifier extension mechanism (cf. layer_id_ext).

A specific embodiment derives the layer identifier, referred to here as LayerId, by using the existing layer identifier (cf. nuh_layer_id) as the most significant bits and the extension information as the least significant bits.

In case b), where nuh_layer_id can take different values, this signaling scheme allows more distinct LayerId values to be signaled with a small range of layer_id_ext values. It also allows specific views to be clustered, i.e., views that are close together can use the same value of nuh_layer_id to indicate that they belong together; see Figure 33.

Figure 33 illustrates the formation of view clusters, in which all NAL units associated with a cluster (i.e., a group of views from physically close cameras) have the same value of nuh_layer_id and distinct values of layer_id_ext. Alternatively, in another embodiment of the invention, the syntax element layer_id_ext can be used to form the clusters and nuh_layer_id can be used to identify the views within a cluster.
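The combination with nuh_layer_id as the most significant bits can be sketched as a shift followed by an OR. This is one reading of the description, not a formula quoted from the application; the function name, the range check, and the example cluster are assumptions.

```python
def layer_id_msb_scheme(nuh_layer_id, layer_id_ext, layer_id_ext_len,
                        ext_enabled=True):
    """LayerId with nuh_layer_id as MSBs and layer_id_ext as LSBs."""
    if not ext_enabled:
        return nuh_layer_id
    if not 0 <= layer_id_ext < (1 << layer_id_ext_len):
        raise ValueError("layer_id_ext does not fit in layer_id_ext_len bits")
    return (nuh_layer_id << layer_id_ext_len) | layer_id_ext


# A hypothetical cluster: four views share nuh_layer_id = 2 and are told
# apart by a 2-bit layer_id_ext.
cluster = [layer_id_msb_scheme(2, ext, 2) for ext in range(4)]
```

All views of a cluster occupy a contiguous LayerId range, so a legacy extractor that filters on nuh_layer_id alone keeps or drops the whole cluster.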

Another embodiment of the invention derives the layer identifier, referred to here as LayerId, by using the existing layer identifier (cf. nuh_layer_id) as the least significant bits and the extension information as the most significant bits.

This signaling scheme allows signaling with a clustering of specific views, i.e., views of cameras that are physically far apart can use the same value of nuh_layer_id to indicate that they use the same prediction dependencies with respect to camera views that have the same value of nuh_layer_id in a different cluster (i.e., in this embodiment, a different value of layer_id_ext).
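A minimal sketch of this second combination follows, assuming the extension is placed directly above the 6 bits of nuh_layer_id. The constant and the function name are illustrative and not taken from the application.

```python
NUH_LAYER_ID_BITS = 6  # width of nuh_layer_id in the NAL unit header


def layer_id_lsb_scheme(nuh_layer_id, layer_id_ext, ext_enabled=True):
    """LayerId with layer_id_ext as MSBs and nuh_layer_id as LSBs."""
    if not 0 <= nuh_layer_id < (1 << NUH_LAYER_ID_BITS):
        raise ValueError("nuh_layer_id must fit in 6 bits")
    if not ext_enabled:
        return nuh_layer_id
    return (layer_id_ext << NUH_LAYER_ID_BITS) | nuh_layer_id


# The same nuh_layer_id (3) in three clusters, i.e. three layer_id_ext values.
same_role_views = [layer_id_lsb_scheme(3, ext) for ext in range(3)]
```

Under this assumption, LayerId % 64 recovers nuh_layer_id, so views that play the same role in different clusters share their low-order bits.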

Another embodiment extends the range of LayerId with an additive scheme, in which maxNuhLayerId denotes the maximum permitted value of the existing layer identifier range (cf. nuh_layer_id) and the extension information is added to it.

This signaling scheme is particularly useful in case a), where a predefined value of nuh_layer_id is used to enable the extension. For example, the value of maxNuhLayerId can be used as the predefined escape code, which allows a gapless extension of the LayerId value range.
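The gapless property can be checked with a short sketch of the additive scheme. It assumes that the escape code equals maxNuhLayerId = 63 and that LayerId is the sum of maxNuhLayerId and layer_id_ext whenever the escape code is present; both the constant and the function name are assumptions.

```python
MAX_NUH_LAYER_ID = 63  # assumed maximum of the 6-bit field, used as escape code


def layer_id_additive_scheme(nuh_layer_id, layer_id_ext=0):
    """LayerId under the additive scheme with an escape code."""
    if nuh_layer_id != MAX_NUH_LAYER_ID:
        # No escape code: the header value is the layer identifier.
        return nuh_layer_id
    return MAX_NUH_LAYER_ID + layer_id_ext


# Values 0..62 come straight from the header; 63 and above use the extension.
ids = [layer_id_additive_scheme(n) for n in range(63)]
ids += [layer_id_additive_scheme(63, ext) for ext in range(4)]
```

Because layer_id_ext = 0 maps to 63 itself, no LayerId value is lost to the escape code.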

A possible embodiment is described in the following paragraphs in the context of an early draft version [3] of the test model for the 3D video coding extension of HEVC.

In section G.3.5 of the early version [3], a view component is defined as follows:

View component: A coded representation of a view in a single access unit. A view component may contain a depth view component and a texture view component.

The mapping of depth and texture view components is defined in the VPS extension syntax on the basis of the existing layer identifier (cf. nuh_layer_id). The invention adds the flexibility to map the additional range of layer identifier values. An exemplary syntax is shown in Figures 34A and 34B. Changes to the existing syntax are highlighted by shading.

If layer identifier extension is used, VpsMaxLayerId is set equal to vps_max_layer_id; otherwise, it is set equal to vps_max_ext_layer_id.

If layer identifier extension is used, VpsMaxNumLayers is set to the maximum number of layers that can be coded using the extension (either with a predefined number of bits or based on layer_id_ext_len); otherwise, VpsMaxNumLayers is set equal to vps_max_layers_minus1 + 1.

vps_max_ext_layer_id is the maximum LayerId value used.

layer_id_in_nalu[i] specifies the LayerId value associated with the VCL NAL units of the i-th layer. For i in the range from 0 to VpsMaxNumLayers − 1, inclusive, when not present, the value of layer_id_in_nalu[i] is inferred to be equal to i.

When i is greater than 0, layer_id_in_nalu[i] shall be greater than layer_id_in_nalu[i − 1].

When splitting_flag is equal to 1, the MSBs of layer_id_in_nuh shall be 0 if the total number of bits in the segments is less than 6.

For i in the range from 0 to vps_max_layers_minus1, inclusive, the variable LayerIdInVps[layer_id_in_nalu[i]] is set equal to i.

dimension_id[i][j] specifies the identifier of the j-th present scalability dimension type of the i-th layer. When not present, the value of dimension_id[i][j] is inferred to be equal to 0. The number of bits used to represent dimension_id[i][j] is dimension_id_len_minus1[j] + 1. When splitting_flag is equal to 1, it is a requirement of bitstream conformance that dimension_id[i][j] be equal to ((layer_id_in_nalu[i] & ((1 << dimBitOffset[j + 1]) − 1)) >> dimBitOffset[j]).
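The conformance expression for splitting_flag equal to 1 can be evaluated as in the following sketch. The text does not define dimBitOffset; the sketch assumes it is the running sum of the segment widths dimension_id_len_minus1[j] + 1, starting at 0. The function names are illustrative.

```python
from itertools import accumulate


def dim_bit_offsets(dimension_id_len_minus1):
    """Assumed dimBitOffset: cumulative bit widths, starting at 0."""
    return [0] + list(accumulate(n + 1 for n in dimension_id_len_minus1))


def dimension_ids(layer_id_in_nalu, dimension_id_len_minus1):
    """Extract every dimension_id[j] from one layer identifier value."""
    offsets = dim_bit_offsets(dimension_id_len_minus1)
    return [
        (layer_id_in_nalu & ((1 << offsets[j + 1]) - 1)) >> offsets[j]
        for j in range(len(dimension_id_len_minus1))
    ]


# Two dimensions: a 1-bit segment followed by a 3-bit segment.
example = dimension_ids(0b1011, [0, 2])
```

Each scalability dimension thus occupies its own bit segment of the layer identifier, with dimension 0 in the lowest bits.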

The variable ScalabilityId[i][smIdx], specifying the identifier of the smIdx-th scalability dimension type of the i-th layer, the variable ViewId[layer_id_in_nuh[i]], specifying the view identifier of the i-th layer, and the variable DependencyId[layer_id_in_nalu[i]], specifying the spatial/SNR scalability identifier of the i-th layer, are derived as follows:

Section 2 of the early version [3] describes that the corresponding depth map and texture components of a particular camera can be distinguished from other depth maps and textures by their scalability identifiers, namely the view order index (cf. ViewIdx) and the depth flag (cf. DepthFlag), which are derived in the NAL unit header semantics section of the early version [3] as follows:

ViewIdx = layer_id >> 1
DepthFlag = layer_id % 2

Therefore, individual view components (i.e., the texture and depth map components of a particular camera) have to be packetized into NAL units with individual, distinguishable values of layer_id, which are told apart, e.g., in the decoding process in section G.8 of the early version, by the value of the variable ViewIdx.

The concept described above allows the same value of the layer identifier in the NAL unit header (cf. nuh_layer_id) to be used for different views. The derivation of the identifiers ViewIdx and DepthFlag therefore has to be adapted to use the extended layer identifier derived earlier, as follows:

ViewIdx = LayerId >> 1
DepthFlag = LayerId % 2
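A short sketch shows the adapted derivation applied to an extended LayerId value. The function name is an illustrative assumption; the two expressions are those given above.

```python
def view_idx_and_depth_flag(layer_id):
    """Return (ViewIdx, DepthFlag) for an extended LayerId value."""
    view_idx = layer_id >> 1    # all bits except the lowest one
    depth_flag = layer_id % 2   # lowest bit: 0 = texture, 1 = depth
    return view_idx, depth_flag


# With a 6-bit identifier ViewIdx stops at 31; an extended LayerId of 131
# addresses the depth component of view 65.
extended = view_idx_and_depth_flag(131)
```

The derivation itself is unchanged; only its input widens from the 6-bit header field to the extended LayerId.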

A generalized embodiment of the third aspect is described below with reference to Figure 35. The figure shows a decoder 800 configured to decode a multi-layer video signal. The decoder may be embodied as described above with respect to Figures 2, 9, or 18. That is, examples of a more detailed description of the decoder 800 of Figure 35 according to a certain embodiment can be obtained using the aspects described above and their embodiments. To illustrate this possible overlap between the aspects described above, their embodiments, and the embodiment of Figure 35, the same reference sign is used for the multi-layer video signal 40 in Figure 35, for example. As to what the layers of the multi-layer video signal 40 may be, reference is made to the statements made above for the second aspect.

As shown in Figure 35, the multi-layer video signal is composed of a sequence of packets 804, each of which comprises a layer identification syntax element 806, embodied by the syntax element nuh_layer_id in the specific HEVC extension example above. The decoder 800 is configured to be responsive to a layer identification extension mechanism signaling in the multi-layer video signal 40 which, as outlined further below, may partially involve the layer identification syntax elements themselves. The layer identification extension mechanism signaling 808 is sensed by the decoder 800, which, responsive to the signaling 808, acts as follows for a predetermined packet among the packets 804, such a predetermined packet being shown by arrow 810 as entering the decoder 800. As illustrated by switch 812 of the decoder 800, which is controlled by the layer identification extension mechanism signaling 808, the decoder 800 reads 814, for the predetermined packet 810, a layer identification extension from the multi-layer data stream 40 and determines 816 the layer identification index of the current packet 810 using this layer identification extension, provided that the signaling 808 signals activation of the layer identification extension mechanism. The layer identification extension read at 814 may be comprised by the current packet 810 itself, as illustrated at 818, or may be positioned elsewhere within the data stream 40, but in a manner associable with the current packet 810. Thus, if the signaling 808 signals activation of the layer identification extension mechanism, the decoder 800 determines the layer identification index of the current packet 810 according to 814 and 816. If, however, the signaling 808 signals deactivation of the layer identification extension mechanism, the decoder 800 determines 820 the layer identification index of the predetermined packet 810 solely from the layer identification syntax element 806 of the current packet 810. In that case the layer identification extension 818 is not needed, i.e., it need not be present in the signal 40.

According to one embodiment, the layer identification syntax element 806 contributes to the layer identification extension mechanism signaling 808 at least on a per-packet basis: for each packet, such as the current packet 810, the decoder 800 determines whether the signaling 808 signals activation or deactivation of the layer identification extension mechanism depending, at least in part, on whether the layer identification syntax element 806 of the respective packet 810 assumes an escape value. A high-level syntax element 822, comprised by the data stream 40 within some parameter set 824, for example, may contribute to the signaling 808 more macroscopically, i.e., with a wider scope. In particular, the decoder 800 may be configured to determine whether the signaling 808 signals activation or deactivation of the layer identification extension mechanism for the predetermined packet 810 primarily depending on the high-level syntax element 822: if the high-level syntax element assumes a first state, the signaling signals deactivation of the layer identification extension mechanism. In terms of the syntax examples above, this corresponds to layer_id_ext_flag = 0, layer_id_mode_idc = 0, or layer_id_ext_len = 0. In other words, in the specific syntax examples above, layer_id_ext_flag, layer_id_mode_idc, and layer_id_ext_len each represent an example of the high-level syntax element 822.

For a given packet, such as packet 810, this means that the decoder 800 determines that the signaling 808 signals activation of the layer identification extension mechanism for packet 810 if the high-level syntax element assumes a state different from the first state and the layer identification syntax element 806 of that packet 810 assumes the escape value. If, however, the high-level syntax element 822 valid for packet 810 assumes the first state, or the layer identification syntax element 806 of that packet 810 assumes a value different from the escape value, the decoder 800 determines that the signaling 808 signals deactivation of the layer identification extension mechanism.

Rather than having only two possible states, as in the syntax examples above, the high-level syntax element 822 may have, beyond the deactivation state, more than one further state. The determination 816 may vary depending on these further states, as indicated by dashed line 824. For example, in the syntax examples above, the case layer_id_mode_idc = 2 showed that the determination 816 may involve the decoder 800 concatenating digits representing the layer identification syntax element 806 of packet 810 and digits representing the layer identification extension so as to obtain the layer identification index of packet 810. In contrast, the case layer_id_ext_len ≠ 0 showed that the determination 816 may involve the decoder 800 determining, using the high-level syntax element, a length n of the layer identification extension 818 associated with packet 810, and concatenating digits representing the layer identification syntax element 806 of packet 810 and n digits representing the layer identification extension 818 of packet 810 so as to obtain the layer identification index of the predetermined packet. Even further, the determination 816 may involve adding the layer identification extension 818 associated with packet 810 to a predetermined value, which may, for example, correspond to a number exceeding the maximum number of states representable by the layer identification syntax element 806 (less the escape value), so as to obtain the layer identification index of the predetermined packet 810.
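The per-packet behaviour of decoder 800 can be summarized in a sketch that models switch 812. The Packet class, the mode labels ("off", "concat", "add"), and the escape value 63 are assumptions made for illustration; the application leaves the concrete state assignment open.

```python
from dataclasses import dataclass
from typing import Optional

ESCAPE_VALUE = 63  # assumed escape value of syntax element 806


@dataclass
class Packet:
    nuh_layer_id: int                   # layer identification syntax element 806
    layer_id_ext: Optional[int] = None  # layer identification extension 818


def determine_layer_index(packet, mode, ext_len=0):
    """Model of switch 812: path 820, or paths 814 and 816."""
    escaped = packet.nuh_layer_id == ESCAPE_VALUE
    if mode == "off" or not escaped:
        return packet.nuh_layer_id       # 820: syntax element 806 alone
    if packet.layer_id_ext is None:      # 814: the extension must be readable
        raise ValueError("extension signaled but not present")
    if mode == "concat":                 # 816: concatenate 806 with n digits
        return (packet.nuh_layer_id << ext_len) | packet.layer_id_ext
    if mode == "add":                    # 816: add to a predetermined value
        return ESCAPE_VALUE + packet.layer_id_ext
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, determine_layer_index(Packet(63, 4), "add") follows paths 814 and 816, whereas any packet whose nuh_layer_id differs from the escape value follows path 820.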

As indicated in Figures 34A and 34B using 808', it is, however, also feasible to exclude the layer identification syntax element 806 of the packets 810 from contributing to the layer identification extension mechanism signaling 808, so that the entire range of representable values/states of the syntax element 806 remains available and none of them is reserved as an escape code. In that case, the signaling 808' indicates to the decoder 800 whether, for each packet 810, a layer identification extension 818 is present and, accordingly, whether the layer identification index determination follows 814 and 816, or 820.

An encoder fitting the decoder of Figure 35 simply forms the data stream accordingly. The encoder decides whether to use the extension mechanism depending on, for example, the number of layers to be coded into the data stream.

The fourth aspect of the present application concerns dimension-dependent direct dependency signaling.

In the current HEVC extensions ([2], [3], [4]), a coding layer can use zero or more reference coding layers for the prediction of data. Each coding layer is identified by a unique nuh_layer_id value, which can be bijectively mapped to a layerIdInVps value. The layerIdInVps values are consecutive, and when a layer with layerIdInVps equal to A is referenced by a layer with layerIdInVps equal to B, it is a requirement of bitstream conformance that A be less than B.

For each coding layer in the bitstream, the reference coding layers are signaled in the video parameter set. For this purpose, a binary mask is transmitted for each coding layer. For a coding layer with a layerIdInVps value of b, the mask (denoted direct_dependency_flag[b]) consists of b − 1 bits. The x-th bit of the binary mask (denoted direct_dependency_flag[b][x]) is equal to 1 when the layer with layerIdInVps equal to x is a reference layer of the layer with layerIdInVps equal to b. Otherwise, when the layer with layerIdInVps equal to x is not a reference layer of the layer with layerIdInVps equal to b, the value of direct_dependency_flag[b][x] is equal to 0.

After all direct_dependency_flags have been parsed, a list is created for each coding layer that contains the nuh_layer_id values of all its reference layers, as specified by the direct_dependency_flags.
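Building these lists can be sketched as follows. The sketch assumes the flags have already been parsed into rows, with row b holding one flag for each lower layerIdInVps value x (counted from 0), and that layer_id_in_nuh maps a layerIdInVps value to its nuh_layer_id; the function name and data layout are illustrative.

```python
def reference_layer_lists(direct_dependency_flag, layer_id_in_nuh):
    """Map each layer's nuh_layer_id to the nuh_layer_ids it references."""
    refs = {}
    for b, flags in enumerate(direct_dependency_flag):
        refs[layer_id_in_nuh[b]] = [
            layer_id_in_nuh[x] for x, flag in enumerate(flags) if flag
        ]
    return refs


# Three layers with nuh_layer_id 0, 4, and 9; layers 1 and 2 both
# reference layer 0 only.
flags = [[], [1], [1, 0]]
lists = reference_layer_lists(flags, [0, 4, 9])
```

The lists are keyed by nuh_layer_id because that is the value carried by the NAL units that a decoder or extractor actually sees.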

In addition, information is signaled in the VPS that allows each layerIdInVps value to be mapped to a position in a T-dimensional scalability space. Each dimension t represents a type of scalability, which may be, for example, view scalability, spatial scalability, or an indication of depth maps.

By signaling one bit for each possible dependency, the current design provides maximum flexibility. However, this flexibility comes with some drawbacks:

1. A common use case is that a specific dependency structure is used for each scalability dimension.

Moreover, direct inter-dimension dependencies are uncommon and may be prohibited.

Figure 36 depicts an example of a common layer setup. Here, dimension 0 may be a view scalability dimension that uses a kind of hierarchical prediction structure. Dimension 1 may be a spatial scalability dimension that uses an IP structure. Figure 37 shows the direct_dependency_flags for the depicted setup.

The drawback of the current solution is that such dimension-dependent dependencies cannot be identified directly from the current VPS design, because doing so requires an algorithmically complex analysis of the direct_dependency_flags.

2. Even when only one scalability dimension type is used, identical structures are commonly used for subsets of layers. In the case of view scalability only, for example, the views might be mapped to a space spanned by horizontal and vertical camera positions.

An example of such a scenario is depicted in Figure 36, where dimensions 0 and 1 are interpreted as the horizontal and vertical camera position dimensions.

Although it is common practice to use one prediction structure for each camera position dimension, the current VPS design cannot exploit the resulting redundancy. Moreover, the current VPS design gives no direct indication that the dependencies are dimension-dependent.

3. The number of direct_dependency_flags is proportional to the square of the number of layers in the bitstream, so that in the current worst case of 64 layers about 64*63/2 = 2016 bits are required. Moreover, if the maximum number of layers in the bitstream is extended, the number of bits increases drastically.

The drawbacks described above can be resolved by enabling explicit signaling of the dependencies for each dimension t of a T-dimensional dependency space.

Dimension-dependent direct dependency signaling provides the following advantages:

1. The dependencies for each dependency dimension are directly available in the bitstream, and no complex analysis of the direct_dependency_flags is required;

2. The number of bits required for signaling the dependencies can be reduced.
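To make the bit saving concrete, the following sketch compares the two signaling costs. It assumes, purely for illustration, that the dimension-wise scheme spends one flag per pair of positions (m, n) with n < m in each dimension; the exact syntax in the figures may differ.

```python
def pairwise_flag_bits(num_layers):
    """One flag per layer pair (b, x) with x < b: n * (n - 1) / 2 flags."""
    return num_layers * (num_layers - 1) // 2


def dimension_wise_flag_bits(positions_per_dim):
    """Assumed cost: one flag per position pair (m, n), n < m, in each dimension."""
    return sum(n * (n - 1) // 2 for n in positions_per_dim)


worst_case_bits = pairwise_flag_bits(64)          # the 64-layer worst case from the text
grid_bits = dimension_wise_flag_bits([8, 8])      # same 64 layers arranged as an 8 x 8 grid
figure_like_bits = dimension_wise_flag_bits([4, 2])  # a 4 x 2 grid of positions
```

Under this assumption, 64 layers arranged as an 8 x 8 grid need 28 + 28 = 56 flags instead of 2016, and a 4 x 2 grid needs 6 + 1 = 7 flags instead of 28.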

In one embodiment, the dependency space may, for example, be identical to the scalability space described in the current MV and scalable draft [2]. In another embodiment, the dependency space may be explicitly signaled and may, for example, also be a space spanned by camera positions.

An example of dimension-dependent dependency signaling is given in Figure 38. It can be seen that the dependencies between dimensions can be derived directly from the binary masks and that the number of required bits is reduced.

In the following, it is assumed that each layerIdInVps value is bijectively mapped into the T-dimensional dependency space with dimensions 0, 1, 2, ..., (T-1). Hence, each layer has an associated vector (d_0, d_1, d_2, ..., d_{T-1})′, where d_0, d_1, d_2, ..., d_{T-1} specify its positions in the corresponding dimensions 0, 1, 2, ..., (T-1).

The basic idea is dimension-wise signaling of layer dependencies. Hence, for each dimension t ∈ {0, 1, 2, ..., (T-1)} and each position d_t in dimension t, a set of reference positions Ref(d_t) is signaled. The reference position sets are used to determine the direct dependencies between the different layers, as described in the following:

When d_{t,Ref} is an element of Ref(d_t), a layer with position d_t in dimension t and positions d_x in dimensions x, with x ∈ {0, 1, 2, ..., (T-1)}\{t}, depends on the layer with position d_{t,Ref} in dimension t and the same positions d_x in dimensions x, with x ∈ {0, 1, 2, ..., (T-1)}\{t}.
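The rule above can be sketched as a small predicate. The data layout is an assumption made for this example: `ref[t][d]` holds the set Ref(d) for position d in dimension t, and layers are identified by their position tuples.

```python
def depends_on(pos_a, pos_b, ref):
    """True if the layer at pos_a directly depends on the layer at pos_b.

    ref[t][d] is the (assumed) set of reference positions Ref(d) in dimension t.
    """
    differing = [t for t in range(len(pos_a)) if pos_a[t] != pos_b[t]]
    # A direct dependency changes the position in exactly one dimension.
    if len(differing) != 1:
        return False
    t = differing[0]
    return pos_b[t] in ref[t][pos_a[t]]


# Hypothetical 4 x 2 dependency space.
ref = [
    {0: set(), 1: {0}, 2: {0}, 3: {2}},  # dimension 0: hierarchical structure
    {0: set(), 1: {0}},                  # dimension 1: position 1 refers to position 0
]
```

For instance, `depends_on((1, 1), (0, 1), ref)` holds because only dimension 0 differs and 0 is in Ref(1), while `depends_on((1, 1), (0, 0), ref)` does not, because the two positions differ in both dimensions.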

In another particular embodiment, all dependencies are reversed, so that the positions in Ref(d_t) indicate the positions in dimension t of the layers that depend on the layers at position d_t in dimension t.

As far as the signaling and derivation of the dependency space is concerned, the signaling described below may be done, for example, in the VPS, in the SPS, in an SEI message, or elsewhere in the bitstream.

Regarding the number of dimensions and the number of positions within a dimension, the following is noted. The dependency space is defined by a particular number of dimensions and a particular number of positions in each dimension.

In one particular embodiment, the number of dimensions num_dims and the number of positions in dimension t, num_pos_minus1[t], may be signaled explicitly, for example as shown in Figure 39.

In another embodiment, the value num_dims or the value num_pos_minus1 may be fixed and not signaled in the bitstream.

In another embodiment, the value num_dims or the value num_pos_minus1 may be derived from other syntax elements present in the bitstream. More specifically, in the current HEVC extension design, the number of dimensions and the number of positions in a dimension may be equal to the number of scalability dimensions and the length of the scalability dimension, respectively.

Hence, with NumScalabilityTypes and dimension_id_len_minus1[t] as defined in [2]:

num_dims = NumScalabilityTypes

num_pos_minus1[t] = dimension_id_len_minus1[t]

In another embodiment, it may be signaled in the bitstream whether the value num_dims or the value num_pos_minus1 is signaled explicitly or derived from other syntax elements present in the bitstream.

In another embodiment, the value num_dims may be derived from other syntax elements present in the bitstream and then increased by additional, separate signaling of one or more dimensions or by signaling additional dimensions.

Regarding the mapping of layerIdInVps to positions in the dependency space, it is noted that the layers are mapped into the dependency space.

In one particular embodiment, for example, a syntax element pos_in_dim[i][t], specifying the position in dimension t of the layer with layerIdInVps value i, may be transmitted explicitly. This is shown in Figure 40.

In another embodiment, the value pos_in_dim[i][t] is not signaled in the bitstream but derived directly from the layerIdInVps value i, for example,

Specifically, for the current HEVC extension design, the above might replace the current explicit signaling of the dimension_id[i][t] values.
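One conceivable way to derive positions from the layer index, shown here only as an assumption and not as the derivation specified in the source, is a mixed-radix decomposition of i over the dimension sizes. The function name and the digit order (dimension 0 varies fastest) are hypothetical choices.

```python
def pos_in_dim_from_layer_idx(i, num_pos_minus1):
    """Split layer index i into one position per dimension (mixed-radix digits)."""
    pos = []
    for n_minus1 in num_pos_minus1:
        n = n_minus1 + 1          # number of positions in this dimension
        pos.append(i % n)         # digit for this dimension
        i //= n                   # carry the remainder to the next dimension
    return pos


# Hypothetical 4 x 2 dependency space: num_pos_minus1 = [3, 1].
all_positions = [tuple(pos_in_dim_from_layer_idx(i, [3, 1])) for i in range(8)]
```

With this choice every index from 0 to 7 lands on a distinct point of the 4 x 2 grid, so the mapping is bijective without transmitting any pos_in_dim values.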

In another embodiment, the value pos_in_dim[i][t] is derived from other syntax elements in the bitstream. Specifically, in the current HEVC extension design, the value pos_in_dim[i][t] may, for example, be derived from the dimension_id[i][t] values:

pos_in_dim[i][t] = dimension_id[i][t]

In another embodiment, it may be signaled whether pos_in_dim[i][t] is signaled explicitly or derived from other syntax elements.

In another embodiment, it may be signaled whether pos_in_dim[i][t] values are signaled explicitly in addition to the pos_in_dim[i][t] values derived from other syntax elements present in the bitstream.

For the signaling and derivation of the dependencies, the following applies.

Direct position dependency flags are the subject of the following embodiment. In this embodiment, the reference positions are signaled by, for example, a flag pos_dependency_flag[t][m][n] indicating whether position n in dimension t is included in the reference position set of position m in dimension t, for example as specified in Figure 41.

In an embodiment using reference position sets, a variable num_ref_pos[t][m] specifying the number of reference positions in dimension t for position m in dimension t, and a variable ref_pos_set[t][m][j] specifying the j-th reference position in dimension t for position m in dimension t, can then be derived, for example,
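A possible shape of such a derivation is sketched below. It is an illustration under stated assumptions rather than the derivation from the source: flags are assumed to exist only for n < m, and the Python names mirror the syntax element names.

```python
def derive_ref_pos_sets(pos_dependency_flag, num_pos_minus1):
    """Collect, per dimension t and position m, the positions n flagged as references."""
    num_ref_pos = []
    ref_pos_set = []
    for t, n_minus1 in enumerate(num_pos_minus1):
        n_pos = n_minus1 + 1
        sets_t = []
        for m in range(n_pos):
            # Assumption: only lower positions n < m can be reference positions.
            sets_t.append([n for n in range(m) if pos_dependency_flag[t][m][n]])
        ref_pos_set.append(sets_t)
        num_ref_pos.append([len(s) for s in sets_t])
    return num_ref_pos, ref_pos_set


# Hypothetical flags for a 4 x 2 dependency space.
flags = [
    [[], [1], [1, 0], [0, 0, 1]],  # dimension 0
    [[], [1]],                     # dimension 1
]
num_ref_pos, ref_pos_set = derive_ref_pos_sets(flags, [3, 1])
```

Here position 3 in dimension 0 ends up with the single reference position 2, and position 1 in dimension 1 with the single reference position 0.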

In another embodiment, the elements of the reference position sets may be signaled directly, for example as specified in Figure 42.

In an embodiment using direct dependency flags, a direct dependency flag directDependencyFlag[i][j], specifying that the layer with layerIdInVps equal to i depends on the layer with layerIdInVps equal to j, might be derived from the reference position sets. This might be done, for example, as specified in the following.

The function posVecToPosIdx(posVector), with a vector posVector as input, derives an index posIdx associated with the position posVector in the dependency space, as specified in the following:
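One plausible form of such an index function, given here as an assumption and not as the definition from the source, weights each position by the product of the sizes of the preceding dimensions. The second parameter `num_pos_minus1` is added for this sketch so that the function is self-contained.

```python
def pos_vec_to_pos_idx(pos_vector, num_pos_minus1):
    """Map a position vector to a unique scalar index (mixed-radix weighting)."""
    pos_idx = 0
    offset = 1
    for t, pos in enumerate(pos_vector):
        pos_idx += offset * pos
        offset *= num_pos_minus1[t] + 1  # size of dimension t
    return pos_idx


# Every point of a hypothetical 4 x 2 space receives a distinct index in 0..7.
indices = sorted(
    pos_vec_to_pos_idx([d0, d1], [3, 1]) for d0 in range(4) for d1 in range(2)
)
```

Any injective function would serve the purpose; this one has the convenient property that the indices are dense, so a plain array can be used as the lookup table from index to layer.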

For example, a variable posIdxToLayerIdInVps[idx], specifying the layerIdInVps value i depending on the index idx derived from pos_in_dim[i], can be derived as specified in the following:

for( i = 0; i < vps_max_layers_minus1; i++ )

    posIdxToLayerIdInVps[ posVecToPosIdx( pos_in_dim[ i ] ) ] = i

The variable directDependencyFlag[i][j] is derived as specified in the following:
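The following sketch shows how such a derivation could look; it is an assumption-based illustration, not the derivation from the source. A dictionary keyed by position tuples stands in for the posVecToPosIdx / posIdxToLayerIdInVps lookup, and positions without an assigned layer are simply skipped.

```python
def derive_direct_dependency_flags(pos_in_dim, ref_pos_set):
    """Build directDependencyFlag[i][j] from per-dimension reference position sets."""
    num_layers = len(pos_in_dim)
    # Stand-in for posIdxToLayerIdInVps: position tuple -> layer index.
    layer_at = {tuple(p): i for i, p in enumerate(pos_in_dim)}
    flags = [[0] * num_layers for _ in range(num_layers)]
    for i, pos in enumerate(pos_in_dim):
        for t, d_t in enumerate(pos):
            for d_ref in ref_pos_set[t][d_t]:
                ref_pos = list(pos)
                ref_pos[t] = d_ref          # move only along dimension t
                j = layer_at.get(tuple(ref_pos))
                if j is not None:           # the reference point may be unused
                    flags[i][j] = 1
    return flags


# Hypothetical 2 x 2 dependency space with four layers.
pos_in_dim = [[0, 0], [1, 0], [0, 1], [1, 1]]
ref_pos_set = [[[], [0]], [[], [0]]]
direct_dependency_flag = derive_direct_dependency_flags(pos_in_dim, ref_pos_set)
```

In this example layer 3 at position (1, 1) depends on layer 2 at (0, 1) and on layer 1 at (1, 0), but not directly on layer 0.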

In one embodiment, the direct dependency flag directDependencyFlag[i][j], specifying that the layer with layerIdInVps equal to i depends on the layer with layerIdInVps equal to j, might be derived directly from the pos_dependency_flag[t][m][n] flags, for example as specified in the following:

In one embodiment using reference layer sets, a variable NumDirectRefLayers[i], specifying the number of reference layers of the layer with layerIdInVps equal to i, and a variable RefLayerId[i][k], specifying the layerIdInVps value of the k-th reference layer, might be derived, for example as specified in the following:
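As a hedged illustration of this step, the reference layer variables can be collected from a matrix of direct dependency flags as sketched below. The Python names mirror the variables in the text; the input matrix is a made-up example.

```python
def derive_ref_layers(direct_dependency_flag):
    """Return (NumDirectRefLayers, RefLayerId) for a square flag matrix."""
    ref_layer_id = []
    for row in direct_dependency_flag:
        # RefLayerId[i][k]: layerIdInVps of the k-th reference layer of layer i.
        ref_layer_id.append([j for j, flag in enumerate(row) if flag])
    num_direct_ref_layers = [len(refs) for refs in ref_layer_id]
    return num_direct_ref_layers, ref_layer_id


# Hypothetical four-layer example: layer 3 depends on layers 1 and 2.
example_flags = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
num_direct_ref_layers, ref_layer_id = derive_ref_layers(example_flags)
```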

In another embodiment, the reference layers may be derived directly from the reference position sets without deriving the directDependencyFlag values, for example as specified in the following:

In another embodiment, the reference layers might be derived directly from the pos_dependency_flag variables without deriving the ref_pos_set variables.

Thus, the figures discussed above show a data stream according to the fourth aspect, namely a multi-layered video data stream into which video material is coded at different levels of information amount (i.e., LayerIdInVps in number) using inter-layer prediction. The levels have a sequential order defined among them. For example, they follow the sequence 1…vps_max_layers_minus1. See, for example, Figure 40. There, the number of layers in the multi-layered video data stream is given by vps_max_layers_minus1 at 900.

The video material is coded into the multi-layered video data stream such that no layer depends, via inter-layer prediction, on any layer that follows it in the sequential order. That is, using the numbering from 1 to vps_max_layers_minus1, layer i depends only on layers j < i.

Each layer that depends, via inter-layer prediction, on one or more other layers increases the information amount at which the video material is coded into the one or more other layers. For example, the increase relates to spatial resolution, number of views, SNR accuracy or the like, or to other dimension types.

The multi-layered video data stream comprises, for example at VPS level, a first syntax structure. In the above example, num_dims may be comprised by the first syntax structure, as shown at 902 in Figure 39. The first syntax structure thus defines a number M of dependency dimensions 904 and 906. In Figure 36, M is 2 by way of example, one dimension running horizontally and the other vertically. In this regard, reference is made to item 2 above: the number of dimensions need not equal the number of different dimension types in terms of which the levels increase the information amount. The number of dimensions may be higher, for example, distinguishing between vertical and horizontal view shifts. The M dependency dimensions 904 and 906, which span the dependency space 908, are shown by way of example in Figure 36.

Further, the first syntax structure defines a maximum number N_i of rank levels per dependency dimension i, for example num_pos_minus1, thereby defining a number of available points 910 in the dependency space 908 equal to the product of the N_i over all dependency dimensions. In the case of Figure 36, there are 4 times 2 available points 910, which are represented by rectangles in Figure 36. Further, the first syntax structure defines a bijective mapping 912 (see Figure 40), which in the above example is defined by pos_in_dim[i][t] or implicitly. The bijective mapping 912 maps each level, i.e. i in Figure 40, onto a respective one of at least a subset of the available points 910 in the dependency space 908. pos_in_dim[i] is the vector pointing to the available point 910 for level i by way of its components pos_in_dim[i][t], with t scanning the dimensions 904 and 906. The subset is a proper subset if, for example, vps_max_layers_minus1 is smaller than the number of available points. For example, the levels actually used, which have the dependency order defined among them, may be mapped onto fewer than the 8 available points in Figure 36.

For each dependency dimension i, the multi-layered video data stream comprises, for example at VPS level, a second syntax structure 914. In the above example, it comprises pos_dependency_flag[t][m][n], or num_ref_pos[t][m] plus ref_pos_set[t][m][j]. For each dependency dimension i, the second syntax structure 914 describes the dependencies among the N_i rank levels of dependency dimension i. In Figure 36, the dependencies are represented by all the horizontal or all the vertical arrows between the rectangles 910.

All in all, by this measure, the dependencies between the available points in the dependency space are defined in a restricted manner such that all of these dependencies run parallel to a respective one of the dependency axes and point from higher to lower rank levels. For each dependency dimension, the dependencies parallel to that dimension are invariant against a cyclic shift along each of the dependency dimensions other than the respective dimension. See Figure 36: all horizontal arrows between rectangles in the upper row of rectangles are repeated in the lower row, and the same applies to the vertical arrows with respect to the four vertical columns of rectangles, the rectangles corresponding to the available points and the arrows to the dependencies among them. By this measure, the second syntax structure, via the bijective mapping, concurrently defines the dependencies between the layers.

A network entity such as a decoder, or a MANE such as an MME, may read the first and second syntax structures of the data stream and determine the dependencies between the layers based on the first and second syntax structures.

The network entity reads the first syntax structure and derives from it the number M of dependency dimensions spanning the dependency space as well as the maximum number N_i of rank levels per dependency dimension i, thereby obtaining the available points in the dependency space, whose number is the product of the N_i. Further, the network entity derives the bijective mapping from the first syntax structure. Further, for each dependency dimension i, the network entity reads the second syntax structure and thereby derives the dependencies among the N_i rank levels of dependency dimension i. Whenever deciding on removing any layer, i.e. the NAL units belonging to a certain layer, the network entity takes into account the positions of the layers in the dependency space as well as the dependencies between the available points and thus between the layers.

In doing so, the network entity may select one of the levels and discard packets (e.g., NAL units) of the multi-layered video data stream that belong (e.g., via nuh_layer_id) to a layer of which the selected level is, by way of the dependencies between the layers, independent.
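The discarding step can be illustrated with a small sketch. It assumes, for simplicity, that packets are (layer index, payload) tuples and that the layer index plays the role of nuh_layer_id; `ref_layer_id` is an assumed table of direct reference layers per layer.

```python
def required_layers(target_layer, ref_layer_id):
    """Collect the target layer and every layer it depends on, directly or indirectly."""
    needed = {target_layer}
    stack = [target_layer]
    while stack:
        layer = stack.pop()
        for ref in ref_layer_id[layer]:
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)
    return needed


def extract_sub_bitstream(packets, target_layer, ref_layer_id):
    """Keep only packets of layers the selected level depends on; discard the rest."""
    needed = required_layers(target_layer, ref_layer_id)
    return [packet for packet in packets if packet[0] in needed]


# Hypothetical stream with four layers; layer 3 depends on layers 1 and 2.
ref_layer_id = [[], [0], [0], [1, 2]]
packets = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (0, "e")]
kept = extract_sub_bitstream(packets, 1, ref_layer_id)
```

Selecting layer 1 keeps the packets of layers 0 and 1 only, whereas selecting layer 3 keeps everything because layer 3 depends transitively on all other layers.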

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item, or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, for example a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) having recorded thereon the computer program for performing one of the methods described herein. The data carrier, the digital storage medium, or the recording medium is typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device, or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Thus, the above description specifically describes the following embodiments:

1. A multi-view decoder configured to reconstruct a plurality of views (12, 15) from a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the multi-view decoder is configured to be responsive to a signaling in the data stream so as to change the inter-view prediction at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is partitioned.

2. The multi-view decoder according to embodiment 1, wherein the multi-view decoder is configured to perform, in changing the inter-view prediction, a restriction of a domain of possible disparity vectors signalable in the data stream.

3. The multi-view decoder according to embodiment 1 or 2, wherein the multi-view decoder is configured to determine, based on the data stream, a disparity vector (308) out of a domain of possible disparity vectors for a current portion (302) of the second view (15), and to sample the first view (12) at a reference portion (304) which is displaced, by the determined disparity vector (308), from a co-located portion (306) of the first view (12) that is co-located to the current portion (302).

4. The multi-view decoder according to embodiment 3, wherein the multi-view decoder is configured to perform, in changing the inter-view prediction, a restriction of the domain of possible disparity vectors signalable in the data stream, and to perform the restriction of the domain of possible disparity vectors such that the reference portion (304) lies within the spatial segment (301) in which the co-located portion (306) is spatially located.

5. The multi-view decoder according to embodiment 3, wherein the multi-view decoder is configured to perform, in changing the inter-view prediction, a restriction of the domain of possible disparity vectors signalable in the data stream, and to perform the restriction of the domain of possible disparity vectors such that the reference portion (304) lies within the spatial segment in which the co-located portion (306) is spatially located and, in case a component of the disparity vector in the dimension pointing towards the boundary (300) has sub-pixel resolution, the reference portion (304) is spaced apart from the boundary of the spatial segment by more than or equal to an interpolation filter kernel half-width (310).

6. The multi-view decoder according to any one of the preceding embodiments, wherein the multi-view decoder is configured to fill, in changing the inter-view prediction, an interpolation filter kernel (311), at portions extending beyond the boundary (300) of the spatial segment in which a co-located portion (306) of the first view, co-located to a current portion (302) of the second view (15) currently predicted using the inter-view prediction, is spatially located, with substitute data that is independent of information outside the boundary of the spatial segment.

7. The multi-view decoder according to any one of the preceding embodiments, wherein the multi-view decoder is configured to derive, in the inter-view prediction, for a current portion of the second view, a reference portion (314) within the first view (12) and, depending on the signaling in the data stream,

check whether the reference portion (314) lies within the spatial segment (301) in which a co-located portion (306) of the first view (12), co-located to the current portion (302), is spatially located, and, depending on whether the reference portion (314) lies within the spatial segment (301) in which the co-located portion (306) is spatially located, either apply to the current portion (302) a predictor derived from an attribute of the reference portion (314), or suppress the application to a parameter of the current portion (302) or apply a substitute predictor to the parameter of the current portion (302), or

apply the predictor irrespective of whether the reference portion (314) lies within the spatial segment (82) in which the co-located portion is spatially located.

8. The multi-view decoder according to embodiment 7, wherein the multi-view decoder is configured to, in deriving the reference portion (314),

estimate a disparity vector (316) for the current portion (302),

locate a representative position (318) of the first view that is co-located to the current portion (302), or a neighboring portion (320) of the first view neighboring the current portion (302), and

determine the reference portion (314) by applying the disparity vector (316) to the representative position (318).

9.根据实施方式8所述的多视图解码器，其中，所述多视图解码器被配置为基于在所述数据流中传输的深度图来估计用于所述当前部分的视差矢量或者用于所述当前部分的空间上或时间上预测的视差矢量。9. The multi-view decoder according to embodiment 8, wherein the multi-view decoder is configured to estimate a disparity vector for the current portion or a spatially or temporally predicted disparity vector for the current portion based on a depth map transmitted in the data stream.

10.根据实施方式8或9所述的多视图解码器，其中，所述多视图解码器被配置为在确定所述参考部分(314)中，通过使用所述视差矢量(316)在所述第一视图(12)成为编码块、预测块、残差块和/或变换块的划分中选择所述参考部分。10. The multi-view decoder according to embodiment 8 or 9, wherein the multi-view decoder is configured to, in determining the reference portion (314), use the disparity vector (316) to select the reference portion from a partitioning of the first view (12) into coding blocks, prediction blocks, residual blocks and/or transform blocks.

11.根据实施方式7到10中任一项所述的多视图解码器，其中，所述参数是运动矢量、视差矢量、残差信号和/或深度值。11. The multi-view decoder according to any one of embodiments 7 to 10, wherein the parameters are motion vectors, disparity vectors, residual signals, and/or depth values.

12.根据实施方式7到11中任一项所述的多视图解码器，其中，所述属性是运动矢量、视差矢量、残差信号和/或深度值。12. The multi-view decoder according to any one of embodiments 7 to 11, wherein the attribute is a motion vector, a disparity vector, a residual signal, and/or a depth value.

13.一种多视图编码器，被配置为使用从第一视图(12)到第二视图(15)的视图间预测，将多个视图(12、15)编码成数据流，其中，所述多视图编码器被配置为改变在所述第一视图(12)被分成的空间段(301)的空间段边界(300)处的所述视图间预测。13. A multi-view encoder configured to encode a plurality of views (12, 15) into a data stream using inter-view predictions from a first view (12) to a second view (15), wherein the multi-view encoder is configured to modify the inter-view predictions at the spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided.

14.根据实施方式13所述的多视图编码器，其中，所述多视图编码器被配置为在改变所述视图间预测中，执行可能视差矢量的域的限制。14. The multi-view encoder according to embodiment 13, wherein the multi-view encoder is configured to perform domain restriction of possible disparity vectors in changing the inter-view prediction.

15.根据实施方式13或14所述的多视图编码器，其中，所述多视图编码器被配置为确定(例如，通过优化)用于所述第二视图(15)的当前部分(302)(例如，视差补偿地预测的预测块)的可能视差矢量的域中的视差矢量(308)并且将其作为信号在所述数据流中发送，并且在从所述第一视图12的共同定位到所述当前部分(302)的共同定位部分(306)偏移所确定的视差矢量(308)的参考部分(304)处取样所述第一视图(12)。15. The multi-view encoder according to embodiment 13 or 14, wherein the multi-view encoder is configured to determine (e.g., by optimization), for a current portion (302) of the second view (15) (e.g., a disparity-compensatedly predicted prediction block), a disparity vector (308) out of the domain of possible disparity vectors and signal it in the data stream, and to sample the first view (12) at a reference portion (304) that is displaced, by the determined disparity vector (308), from the co-located portion (306) of the first view (12) co-located to the current portion (302).

16.根据实施方式15所述的多视图编码器，其中，所述多视图编码器被配置为执行所述可能视差矢量的域的限制，使得所述参考部分(304)位于(例如，完全地)所述共同定位部分(306)在空间上所在的空间段(301)内。16. The multi-view encoder according to embodiment 15, wherein the multi-view encoder is configured to perform a domain restriction of the possible disparity vector such that the reference portion (304) is located (e.g., completely) within the spatial segment (301) in which the co-location portion (306) is located in space.

17.根据实施方式15所述的多视图编码器，其中，所述多视图编码器被配置为执行所述可能视差矢量的域的限制，使得所述参考部分(304)位于所述共同定位部分(306)在空间上所在的空间段内，并且在指向所述边界(300)的维度的视差矢量的分量具有亚像素分辨率的情况下，所述参考部分(304)与所述空间段的边界间隔开大于或等于内插滤波器内核半宽度(310)。17. The multi-view encoder according to embodiment 15, wherein the multi-view encoder is configured to perform a domain restriction of the possible disparity vector such that the reference portion (304) is located within the spatial segment in which the co-location portion (306) is located, and the reference portion (304) is spaced from the boundary of the spatial segment by a distance greater than or equal to the half-width (310) of the interpolation filter kernel when the components of the disparity vector in the dimension pointing to the boundary (300) have sub-pixel resolution.
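
The restriction of the domain of possible disparity vectors in embodiments 16 and 17 can be illustrated for one dimension. The sketch below is an assumption-laden example, not the encoder's actual search: it treats the horizontal component only, stores the disparity in fractional-sample units with `frac_bits` fractional bits, and uses hypothetical names throughout. A candidate with a sub-sample component must keep the reference portion at least the interpolation filter kernel half-width (310) away from the segment boundary.

```python
def disparity_allowed(block_x, block_w, dv_x, seg_x0, seg_x1, half_width, frac_bits=2):
    """Check one horizontal disparity candidate against the segment [seg_x0, seg_x1).

    dv_x is given in units of 1 / 2**frac_bits samples (hypothetical convention).
    """
    unit = 1 << frac_bits
    is_subpel = dv_x % unit != 0
    margin = half_width if is_subpel else 0   # extra distance only for sub-sample vectors
    left = block_x * unit + dv_x              # left edge of the reference portion
    right = left + block_w * unit             # right edge, exclusive
    return left >= (seg_x0 + margin) * unit and right <= (seg_x1 - margin) * unit


def restrict_domain(candidates, block_x, block_w, seg_x0, seg_x1, half_width):
    """Keep only the disparity candidates that respect the segment boundary."""
    return [dv for dv in candidates
            if disparity_allowed(block_x, block_w, dv, seg_x0, seg_x1, half_width)]


# Block of width 8 at x = 16, segment spanning samples 0..63, kernel half-width 4.
allowed = restrict_domain([-64, -63, -48, -47, 160, 161], 16, 8, 0, 64, 4)
```

In this example the integer candidates -64, -48 and 160 remain, the sub-sample candidate -47 remains because it keeps a four-sample distance, and -63 and 161 are removed.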

18.根据实施方式13到17中任一项所述的多视图编码器，其中，所述多视图编码器被配置为在改变所述视图间预测中，在延伸超出空间段的边界(300)的部分处填充内插滤波器内核(311)，所述第一视图的共同定位到当前利用所述视图间预测进行预测的所述第二视图(15)的当前部分(302)的共同定位部分(306)在空间上位于所述空间段中。18. The multi-view encoder according to any one of embodiments 13 to 17, wherein the multi-view encoder is configured to, in changing the inter-view prediction, pad an interpolation filter kernel (311) at the part of the kernel that extends beyond the boundary (300) of the spatial segment in which the co-located portion (306) of the first view, co-located to the current portion (302) of the second view (15) currently being predicted using the inter-view prediction, is spatially located.

19.根据实施方式13到18中任一项所述的多视图编码器，其中，所述多视图编码器被配置为在所述视图间预测中，为所述第二视图的当前部分，导出在所述第一视图(12)内的参考部分(314)，并且根据所述数据流中的信令，19. A multi-view encoder according to any one of embodiments 13 to 18, wherein the multi-view encoder is configured to, in the inter-view prediction, derive a reference portion (314) within the first view (12) for the current portion of the second view, and according to signaling in the data stream,

检查所述参考部分(314)是否位于所述第一视图(12)的共同定位到所述当前部分(306)的共同定位部分(306)在空间上所在的空间段(301)内，并且将预测器应用于从所述参考部分(314)的属性导出的所述当前部分(302)，或者根据所述参考部分(314)是否位于所述共同定位部分(306)在空间上所在的所述空间段(301)内来抑制对所述当前部分(302)的参数的所述应用，或者Check whether the reference portion (314) lies within the spatial segment (301) in which the co-located portion (306) of the first view (12), co-located to the current portion, is spatially located, and apply a predictor derived from an attribute of the reference portion (314) to a parameter of the current portion (302), or suppress that application to the parameter of the current portion (302) depending on whether the reference portion (314) lies within the spatial segment (301) in which the co-located portion (306) is spatially located, or

应用所述预测器，而不管所述参考部分(314)是否位于所述共同定位部分在空间上所在的所述空间段(301)内。The predictor is applied regardless of whether the reference portion (314) is located within the spatial segment (301) in which the co-location portion is located.

20.根据实施方式19所述的多视图编码器，其中，所述多视图编码器被配置为在导出所述参考部分(314)中，20. The multi-view encoder according to embodiment 19, wherein the multi-view encoder is configured to, in deriving the reference portion (314),

估计用于所述当前部分(314)的视差矢量(316)，Estimate the disparity vector (316) for the current part (314),

21.根据实施方式20所述的多视图编码器，其中，所述多视图编码器被配置为基于在所述数据流中传输的深度图来估计用于所述当前部分的视差矢量或者用于所述当前部分的空间上或时间上预测的视差矢量。21. The multi-view encoder according to embodiment 20, wherein the multi-view encoder is configured to estimate a disparity vector for the current portion or a spatially or temporally predicted disparity vector for the current portion based on a depth map transmitted in the data stream.

22.根据实施方式19到21中任一项所述的多视图编码器，其中，所述参数是运动矢量、视差矢量、残差信号和/或深度值。22. The multi-view encoder according to any one of embodiments 19 to 21, wherein the parameters are motion vectors, disparity vectors, residual signals and/or depth values.

23.根据实施方式19到22中任一项所述的多视图编码器，其中，所述属性是运动矢量、视差矢量、残差信号和/或深度值。23. The multi-view encoder according to any one of embodiments 19 to 22, wherein the attribute is a motion vector, a disparity vector, a residual signal, and/or a depth value.

24.根据实施方式13到23中任一项所述的多视图编码器，被配置为将所述改变在所述数据流中作为信号发送到解码器，以使所述解码器依靠所述改变。24. The multi-view encoder according to any one of embodiments 13 to 23 is configured to send the change as a signal to the decoder in the data stream so that the decoder relies on the change.

25.一种多视图解码器，被配置为使用从第一视图(12)到第二视图(15)的视图间预测，从数据流中重建多个视图(12，15)，其中，所述多视图解码器被配置为使用在所述数据流中的信令作为保证，即在所述第一视图(12)被分成的空间段(301)的空间段边界(300)处限制所述视图间预测(602)，使得所述视图间预测不涉及所述第二视图(15)的任何当前部分(302)对除了所述第一视图的共同定位到所述第二视图的相应当前部分的共同定位部分(606)所在的空间段以外的空间段的任何依赖性。25. A multi-view decoder configured to reconstruct a plurality of views (12, 15) from a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the multi-view decoder is configured to use signaling in the data stream as a guarantee that the inter-view prediction (602) is restricted at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided, such that the inter-view prediction does not involve any dependency of any current portion (302) of the second view (15) on a spatial segment other than the spatial segment in which the co-located portion (606) of the first view, co-located to the respective current portion of the second view, is located.

26.根据实施方式25所述的多视图解码器，被配置为响应于在所述数据流中的所述信令，使用视图间并行性，调整视图间解码偏移或决定执行所述第一和第二视图的重建的实验。26. The multi-view decoder according to embodiment 25, configured to, in response to the signaling in the data stream, adjust an inter-view decoding offset or decide whether to attempt the reconstruction of the first and second views using inter-view parallelism.

27.根据实施方式25或26所述的多视图解码器，其中，所述多视图解码器被配置为基于所述数据流，确定用于所述第二视图(15)的当前部分(302)的可能视差矢量的域之中的视差矢量(308)，并且在从所述第一视图12的共同定位到所述当前部分(302)的共同定位部分(306)偏移所确定的视差矢量(308)的参考部分(304)处取样所述第一视图(12)。27. The multi-view decoder according to embodiment 25 or 26, wherein the multi-view decoder is configured to determine, based on the data stream, a disparity vector (308) out of the domain of possible disparity vectors for a current portion (302) of the second view (15), and to sample the first view (12) at a reference portion (304) that is displaced, by the determined disparity vector (308), from the co-located portion (306) of the first view (12) co-located to the current portion (302).

28.一种用于使用从第一视图(12)到第二视图(15)的视图间预测，从数据流中重建多个视图(12，15)的方法，其中，所述方法响应于在所述数据流中的信令，以改变在所述第一视图(12)被分成的空间段(301)的空间段边界(300)处的所述视图间预测。28. A method for reconstructing a plurality of views (12, 15) from a data stream using inter-view predictions from a first view (12) to a second view (15), wherein the method responds to signaling in the data stream to change the inter-view predictions at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided.

29.一种用于使用从第一视图(12)到第二视图(15)的视图间预测，将多个视图(12，15)编码成数据流的方法，其中，所述方法包括改变在所述第一视图(12)被分成的空间段(301)的空间段边界(300)处的所述视图间预测。29. A method for encoding a plurality of views (12, 15) into a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the method comprises changing the inter-view prediction at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided.

30.一种用于使用从第一视图(12)到第二视图(15)的视图间预测，从数据流中重建多个视图(12，15)的方法，其中，所述方法包括使用在所述数据流中的信令作为保证，即在所述第一视图(12)被分成的空间段(301)的空间段边界(300)处限制所述视图间预测(602)，使得所述视图间预测不涉及所述第二视图(15)的任何当前部分(302)对除了所述第一视图的共同定位到所述第二视图的相应当前部分的共同定位部分(606)所在的空间段以外的空间段的任何依赖性。30. A method for reconstructing a plurality of views (12, 15) from a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the method comprises using signaling in the data stream as a guarantee that the inter-view prediction (602) is restricted at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided, such that the inter-view prediction does not involve any dependency of any current portion (302) of the second view (15) on a spatial segment other than the spatial segment in which the co-located portion (606) of the first view, co-located to the respective current portion of the second view, is located.

31.一种具有程序代码的计算机程序，所述计算机程序在计算机上运行时，用于执行根据实施方式27到30中任一项所述的方法。31. A computer program having program code, which, when run on a computer, performs the method according to any one of embodiments 27 to 30.

32.一种多层视频数据流(200)，由NAL单元(202)的序列组成，所述多层视频数据流(200)具有使用层间预测编码到其中的多个层的图片(204)，每个NAL单元(202)具有表示与各个所述NAL单元相关的层的层索引(nuh_layer_id)，所述NAL单元的序列被构造成非交错的访问单元(206)的序列，其中，属于一个访问单元的NAL单元与一个时间上的时刻的图片相关，并且不同的访问单元中的NAL单元与不同的时刻相关，其中，在每个访问单元内，对于每个层，与各个层相关的NAL单元被分组到一个或多个解码单元(208)，并且与不同的层相关的NAL单元的解码单元被交错，使得对于每个解码单元(208)，用来编码各个所述解码单元的层间预测基于除了与所述各个解码单元相关的层以外的、被编码成在各个所述访问单元内于所述各个解码单元之前的解码单元的层的图片的部分。32. A multi-layer video data stream (200) composed of a sequence of NAL units (202), the multi-layer video data stream (200) having pictures (204) of a plurality of layers encoded into it using inter-layer prediction, each NAL unit (202) having a layer index (nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein the NAL units belonging to one access unit relate to pictures of one time instant, and the NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, those portions being coded into decoding units that precede the respective decoding unit within the respective access unit.

33.根据实施方式32所述的多层视频数据流(200)，其中，所述多层视频数据流具有交错信令，所述交错信令具有第一可能状态和第二可能状态，其中，33. The multi-layer video data stream (200) according to embodiment 32, wherein the multi-layer video data stream has interleaved signaling, the interleaved signaling having a first possible state and a second possible state, wherein,

如果所述交错信令采取所述第一可能状态，则在每个访问单元内，对于每个层，与各个层相关的所述NAL单元中的至少一些被分组到一个或多个解码单元，并且与不同的层相关的NAL单元的解码单元被交错，使得对于每个解码单元，用来编码各个所述解码单元的层间预测基于除了与所述各个解码单元相关的层以外的、被编码成在各个所述访问单元内于所述各个解码单元之前的解码单元的层的图片的部分，并且If the interleaved signaling adopts the first possible state, then within each access unit, for each layer, at least some of the NAL units associated with each layer are grouped into one or more decoding units, and the decoding units of NAL units associated with different layers are interleaved, such that for each decoding unit, the inter-layer prediction used to encode each decoding unit is based on a portion of the image of the layers of the decoding units preceding each decoding unit within each access unit, excluding the layers associated with each decoding unit.

如果所述交错信令采取所述第二可能状态，则在每个访问单元内，所述NAL单元被布置成相对于与所述NAL单元相关的层不交错。If the interleaving signaling adopts the second possible state, then within each access unit, the NAL unit is arranged to be non-interleaved relative to the layer associated with the NAL unit.

34.根据实施方式32或33所述的多层视频数据流，其中，每个NAL单元具有表示所述各个NAL单元在一组可能类型之中的类型的NAL单元类型索引，并且在每个访问单元内，所述各个访问单元的所述NAL单元的类型在NAL单元类型之中遵循排序规则，并且在每对访问单元之间，打破所述排序规则。34. The multi-layer video data stream according to embodiment 32 or 33, wherein each NAL unit has a NAL unit type index indicating the type of the respective NAL unit among a set of possible types, and, within each access unit, the types of the NAL units of the respective access unit obey an ordering rule among the NAL unit types, and the ordering rule is broken between each pair of access units.

35.一种多层视频编码器，用于生成由NAL单元(202)的序列组成的多层视频数据流(200)，所述多层视频编码器被配置为生成所述多层视频数据流(200)，使得所述多层视频数据流具有使用层间预测编码到其中的多个层的图片(204)，每个NAL单元(202)具有表示与各个所述NAL单元相关的层的层索引(例如，nuh_layer_id)，35. A multilayer video encoder for generating a multilayer video data stream (200) consisting of sequences of NAL units (202), the multilayer video encoder being configured to generate the multilayer video data stream (200) such that the multilayer video data stream has pictures (204) of a plurality of layers therein encoded using inter-layer predictive coding, each NAL unit (202) having a layer index (e.g., nuh_layer_id) representing the layer associated with each NAL unit.

所述NAL单元的序列被构造成非交错的访问单元(206)的序列，其中，属于一个访问单元的NAL单元与一个时间上的时刻的图片相关，并且不同的访问单元中的NAL单元与不同的时刻相关，其中，The sequence of NAL units is constructed as a sequence of non-interleaved access units (206), wherein a NAL unit belonging to one access unit is associated with an image at a given time, and NAL units in different access units are associated with different times.

在每个访问单元内，对于每个层，与各个层相关的NAL单元中的至少一些被分组到一个或多个解码单元(208)，并且与不同的层相关的NAL单元的解码单元被交错，使得对于每个解码单元(208)，用来编码各个所述解码单元的层间预测基于除了与所述各个解码单元相关的层以外的、被编码成在各个所述访问单元内于所述各个解码单元之前的解码单元的层的图片的部分。Within each access unit, for each layer, at least some of the NAL units associated with each layer are grouped into one or more decoding units (208), and the decoding units of NAL units associated with different layers are interleaved, such that for each decoding unit (208), the inter-layer prediction used to encode each of the decoding units is based on a portion of the image of the layer of the decoding unit preceding the respective decoding unit within each of the access units, excluding the layer associated with the respective decoding unit.

36.一种解码器，被配置为解码由NAL单元(202)的序列组成的多层视频数据流(200)，所述多层视频数据流(200)具有使用层间预测编码到其中的多个层的图片(204)，每个NAL单元(202)具有表示与各个所述NAL单元相关的层的层索引(例如，nuh_layer_id)，36. A decoder configured to decode a multi-layer video data stream (200) consisting of sequences of NAL units (202), the multi-layer video data stream (200) having pictures (204) of a plurality of layers therein encoded using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) representing the layer associated with the respective NAL unit.

37.根据实施方式36所述的解码器，其中，所述解码器被配置为通过并行方式从所述多层视频数据流中解码与所述一个时刻相关的多个层的图片。37. The decoder according to embodiment 36, wherein the decoder is configured to decode images of multiple layers related to the one moment from the multi-layer video data stream in parallel.

38.根据实施方式36或37所述的解码器，其中，所述解码器被配置为根据所述NAL单元属于的层，将所述NAL单元分配到多个缓冲器上，来在所述多个缓冲器中缓冲所述多层视频数据流。38. The decoder according to embodiment 36 or 37, wherein the decoder is configured to distribute the NAL unit to a plurality of buffers according to the layer to which the NAL unit belongs, so as to buffer the multi-layer video data stream in the plurality of buffers.

39.根据实施方式36到38中任一项所述的解码器，其中，所述多层视频数据流具有交错信令，所述交错信令具有第一可能状态和第二可能状态，其中，其中，所述解码器被配置为响应于所述交错信令在于：所述解码器知道，39. The decoder according to any one of embodiments 36 to 38, wherein the multi-layer video data stream has interleaved signaling, the interleaved signaling having a first possible state and a second possible state, wherein the decoder is configured to respond to the interleaved signaling in that: the decoder knows that

40.根据实施方式36到39中任一项所述的解码器，其中，所述多层视频数据流具有交错信令，所述交错信令具有第一可能状态和第二可能状态，其中，所述解码器被配置为响应于所述交错信令在于：所述解码器被配置为，在所述交错信令具有所述第一可能状态的情况下，根据所述NAL单元属于的层，将所述NAL单元分配到多个缓冲器上，来在所述多个缓冲器中缓冲所述多层视频数据流，并且在所述交错信令具有所述第二可能状态的情况下，将所述多层视频数据流缓冲在所述多个缓冲器的一个缓冲器中，而不管所述各个NAL单元属于的层。40. The decoder according to any one of embodiments 36 to 39, wherein the multi-layer video data stream has interleaving signaling having a first possible state and a second possible state, wherein the decoder is responsive to the interleaving signaling in that the decoder is configured to, if the interleaving signaling has the first possible state, buffer the multi-layer video data stream in a plurality of buffers by distributing the NAL units onto the plurality of buffers according to the layer to which each NAL unit belongs, and, if the interleaving signaling has the second possible state, buffer the multi-layer video data stream in one of the plurality of buffers irrespective of the layer to which the respective NAL units belong.
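
The buffering behavior of embodiment 40 can be sketched as below. This is only an illustration under stated assumptions: NAL units are modeled as hypothetical `(layer_id, payload)` tuples, the interleaving signaling is reduced to a boolean, and buffer 0 is arbitrarily chosen as the single buffer for the non-interleaved case.

```python
from collections import defaultdict


def buffer_nal_units(nal_units, interleaved):
    """Distribute NAL units onto buffers depending on the interleaving signaling.

    interleaved=True models the first possible state (one buffer per layer);
    interleaved=False models the second possible state (one common buffer).
    """
    buffers = defaultdict(list)
    for layer_id, payload in nal_units:
        key = layer_id if interleaved else 0
        buffers[key].append((layer_id, payload))
    return dict(buffers)


stream = [(0, "a"), (1, "b"), (0, "c")]
per_layer = buffer_nal_units(stream, interleaved=True)
single = buffer_nal_units(stream, interleaved=False)
```

Keeping one buffer per layer preserves the arrival order within each layer, which is what a per-layer decoding thread would need.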

41.根据实施方式36到40中任一项所述的解码器，其中，所述多层视频数据流(200)被布置使得每个NAL单元具有表示所述各个NAL单元在一组可能类型之中的类型的NAL单元类型索引，并且在每个访问单元内，所述各个访问单元的所述NAL单元的类型在NAL单元类型之中遵循排序规则，并且在每对访问单元之间，打破所述排序规则，其中，所述解码器被配置为通过检测所述排序规则在两个直接连续的NAL单元之间是否被打破，来使用所述排序规则检测访问单元边界。41. The decoder according to any one of embodiments 36 to 40, wherein the multi-layer video data stream (200) is arranged such that each NAL unit has a NAL unit type index indicating the type of the respective NAL unit among a set of possible types, and, within each access unit, the types of the NAL units of the respective access unit obey an ordering rule among the NAL unit types, and the ordering rule is broken between each pair of access units, wherein the decoder is configured to detect access unit boundaries using the ordering rule by detecting whether the ordering rule is broken between two immediately consecutive NAL units.
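
The boundary detection of embodiment 41 can be sketched under one simplifying assumption: the ordering rule is modeled as a non-decreasing rank over NAL unit types, so a boundary is detected wherever the rank drops between two consecutive NAL units. The type names and the `rank` table are hypothetical and do not reproduce the ordering rule of any particular standard.

```python
def split_access_units(nal_types, rank):
    """Split a sequence of NAL unit types into access units.

    A new access unit starts whenever the rank of a NAL unit is lower than
    the rank of its direct predecessor, i.e. the ordering rule is broken.
    """
    units, current, prev = [], [], None
    for nal_type in nal_types:
        r = rank[nal_type]
        if prev is not None and r < prev:
            units.append(current)
            current = []
        current.append(nal_type)
        prev = r
    if current:
        units.append(current)
    return units


rank = {"DELIM": 0, "PARAM": 1, "SLICE": 2, "SUFFIX": 3}   # hypothetical ordering
types = ["DELIM", "PARAM", "SLICE", "SLICE", "SUFFIX", "DELIM", "SLICE", "PARAM", "SLICE"]
access_units = split_access_units(types, rank)
```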

42.一种用于生成由NAL单元(202)的序列组成的多层视频数据流(200)的方法，所述方法包括生成所述多层视频数据流(200)，使得所述多层视频数据流具有使用层间预测编码到其中的多个层的图片(204)，每个NAL单元(202)具有表示与各个所述NAL单元相关的层的层索引(例如，nuh_layer_id)，所述NAL单元的序列被构造成非交错的访问单元(206)的序列，其中，属于一个访问单元的NAL单元与一个时间上的时刻的图片相关，并且不同的访问单元中的NAL单元与不同的时刻相关，其中，在每个访问单元内，对于每个层，与各个层相关的NAL单元中的至少一些被分组到一个或多个解码单元(208)，并且与不同的层相关的NAL单元的解码单元被交错，使得对于每个解码单元(208)，用来编码各个所述解码单元的层间预测基于除了与所述各个解码单元相关的层以外的、被编码成在各个所述访问单元内于所述各个解码单元之前的解码单元的层的图片的部分。42. A method for generating a multi-layer video data stream (200) composed of a sequence of NAL units (202), the method comprising generating the multi-layer video data stream (200) such that the multi-layer video data stream has pictures (204) of a plurality of layers encoded into it using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein the NAL units belonging to one access unit relate to pictures of one time instant, and the NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, those portions being coded into decoding units that precede the respective decoding unit within the respective access unit.

43.一种用于解码由NAL单元(202)的序列组成的多层视频数据流(200)的方法，所述多层视频数据流(200)具有使用层间预测编码到其中的多个层的图片(204)，每个NAL单元(202)具有表示与各个所述NAL单元相关的层的层索引(例如，nuh_layer_id)，所述NAL单元的序列被构造成非交错的访问单元(206)的序列，其中，属于一个访问单元的NAL单元与一个时间上的时刻的图片相关，并且不同的访问单元中的NAL单元与不同的时刻相关，其中，在每个访问单元内，对于每个层，与各个层相关的NAL单元中的至少一些被分组到一个或多个解码单元(208)，并且与不同的层相关的NAL单元的解码单元被交错，使得对于每个解码单元(208)，用来编码各个所述解码单元的层间预测基于除了与所述各个解码单元相关的层以外的、被编码成在各个所述访问单元内于所述各个解码单元之前的解码单元的层的图片的部分。43. A method for decoding a multi-layer video data stream (200) consisting of a sequence of NAL units (202), the multi-layer video data stream (200) having pictures (204) of a plurality of layers therein encoded using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) representing the layer associated with each NAL unit, the sequence of NAL units being constructed as a sequence of non-interleaved access units (206), wherein a NAL unit belonging to an access unit is associated with a picture at a given time. Furthermore, the NAL units in different access units are associated with different times, wherein, within each access unit, for each layer, at least some of the NAL units associated with each layer are grouped into one or more decoding units (208), and the decoding units of NAL units associated with different layers are interleaved, such that for each decoding unit (208), the inter-layer prediction used to encode each of the decoding units is based on a portion of the image of the layer of the decoding unit preceding the respective decoding unit within each of the access units, excluding the layer associated with the respective decoding unit.

44.一种具有程序代码的计算机程序，所述计算机程序用于在计算机上运行时执行根据实施方式42到43中任一项所述的方法。44. A computer program having program code, the computer program being used to perform the method according to any one of embodiments 42 to 43 when running on a computer.

45.一种解码器，被配置为解码由数据包的序列组成的多层视频信号，所述数据包中的每个数据包包括层识别语法元素(806)，其中，所述解码器被配置为响应于在所述多层视频信号中的层识别扩展机制信令(808；808’)，以45. A decoder configured to decode a multi-layer video signal consisting of a sequence of data packets, each data packet including a layer recognition syntax element (806), wherein the decoder is configured to respond to layer recognition extension mechanism signaling (808; 808') in the multi-layer video signal to

如果所述层识别扩展机制信令(808；808’)发送层识别扩展机制的激活的信号，则对于预定数据包(810)，从多层数据流中读取(814)层识别扩展(818)并且使用所述层识别扩展(818)确定(816)所述预定数据包的层识别索引，并且If the layer identification extension mechanism signaling (808; 808') sends a signal indicating activation of the layer identification extension mechanism, then for a predetermined data packet (810), the layer identification extension (818) is read (814) from the multi-layer data stream and the layer identification extension (818) is used to determine (816) the layer identification index of the predetermined data packet, and

如果所述层识别扩展机制信令(808；808’)发送层识别扩展机制的去激活的信号，则对于所述预定数据包(810)，从由所述预定数据包包括的所述层识别语法元素(806)确定(820)所述预定数据包的所述层识别索引。If the layer identification extension mechanism signaling (808; 808') sends a signal to deactivate the layer identification extension mechanism, then for the predetermined data packet (810), the layer identification index of the predetermined data packet is determined (820) from the layer identification syntax element (806) included in the predetermined data packet.

46.根据实施方式45所述的解码器，其中，所述层识别语法元素(806)至少有助于所述层识别扩展机制信令(808)，其中，所述解码器被配置为至少根据由采用或者不采用逸出值的所述预定数据包包括的所述层识别语法元素，为所述预定数据包确定所述层识别扩展机制信令(808)发送所述层识别扩展机制的激活还是去激活的信号。46. The decoder according to embodiment 45, wherein the layer identification syntax element (806) at least contributes to the layer identification extension mechanism signaling (808), wherein the decoder is configured to determine, at least based on the layer identification syntax element included in the predetermined data packet with or without an escape value, whether the layer identification extension mechanism signaling (808) sends an activation or deactivation signal for the layer identification extension mechanism.

47.根据实施方式45或46所述的解码器，其中，高级语法元素(822)至少有助于所述层识别扩展机制信令(808；808’)，并且所述解码器被配置为根据所述高级语法元素(822)，为所述预定数据包(810)确定所述层识别扩展机制信令发送所述层识别扩展机制的激活还是去激活的信号。47. The decoder according to embodiment 45 or 46, wherein the high-level syntax element (822) at least contributes to the layer identification extension mechanism signaling (808; 808'), and the decoder is configured to determine, based on the high-level syntax element (822), whether to send an activation or deactivation signal for the layer identification extension mechanism signaling for the predetermined data packet (810).

48.根据实施方式47所述的解码器，其中，所述解码器被配置为响应于采用第一状态的所述高级语法元素，确定所述层识别扩展机制信令(808；808’)发送所述层识别扩展机制的去激活的信号。48. The decoder according to embodiment 47, wherein the decoder is configured to, in response to the high-level syntax element employing the first state, determine that the layer identification extension mechanism signaling (808; 808') sends a signal for deactivation of the layer identification extension mechanism.

49.根据实施方式48所述的解码器，其中，所述层识别语法元素额外有助于所述层识别扩展机制信令(808)，并且其中，所述解码器被配置为如果所述高级语法元素采用与所述第一状态不同的第二状态并且所述预定数据包的所述层识别语法元素采用逸出值，则确定所述等级识别扩展机制信令发送所述等级识别扩展机制的激活的信号用于所述预定数据包，并且如果所述高级语法元素采用所述第一状态以及所述层识别元素采用与所述逸出值不同的值中的一个适用，则确定所述等级识别扩展机制信令发送所述等级识别扩展机制的去激活的信号。49. The decoder according to embodiment 48, wherein the layer identification syntax element additionally contributes to the layer identification extension mechanism signaling (808), and wherein the decoder is configured to determine that the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism for the predetermined data packet if the high-level syntax element assumes a second state different from the first state and the layer identification syntax element of the predetermined data packet assumes an escape value, and to determine that the layer identification extension mechanism signaling signals deactivation of the layer identification extension mechanism if either the high-level syntax element assumes the first state or the layer identification syntax element assumes a value different from the escape value.

50.根据实施方式49所述的解码器，其中，所述解码器被配置为如果所述高级语法元素采用与所述第一和第二状态不同的第三状态，则将表示由所述预定数据包包括的所述层识别语法元素的数字和表示所述层识别扩展的数字串联，以获得所述预定数据包的所述等级识别索引。50. The decoder according to embodiment 49, wherein the decoder is configured to, if the high-level syntax element assumes a third state different from the first and second states, concatenate digits representing the layer identification syntax element comprised by the predetermined data packet and digits representing the layer identification extension so as to obtain the layer identification index of the predetermined data packet.

51.根据实施方式49所述的解码器，其中，所述解码器被配置为如果所述高级语法元素采用所述第二状态，则使用所述高级语法元素确定所述等级识别扩展的长度n，并且将表示由所述预定数据包包括的所述层识别语法元素的数字和表示所述等级识别扩展的n位数字串联，以获得所述预定数据包的所述等级识别索引。51. The decoder according to embodiment 49, wherein the decoder is configured to, if the high-level syntax element assumes the second state, determine a length n of the layer identification extension using the high-level syntax element, and concatenate digits representing the layer identification syntax element comprised by the predetermined data packet and n digits representing the layer identification extension so as to obtain the layer identification index of the predetermined data packet.

52.根据实施方式45到51中任一项所述的解码器，其中，所述解码器被配置为：52. The decoder according to any one of embodiments 45 to 51, wherein the decoder is configured to:

如果所述层识别扩展机制信令发送所述层识别扩展机制的激活的信号，则通过将表示由所述预定数据包包括的所述层识别语法元素的数字和表示所述等级识别扩展的数字串联，来确定(816)所述预定数据包的所述层识别索引，以获得所述预定数据包的所述等级识别索引。If the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism, determine (816) the layer identification index of the predetermined data packet by concatenating digits representing the layer identification syntax element comprised by the predetermined data packet and digits representing the layer identification extension.

53.根据实施方式45到52中任一项所述的解码器，其中，所述解码器被配置为：53. The decoder according to any one of embodiments 45 to 52, wherein the decoder is configured to:

如果所述层识别扩展机制信令发送所述层识别扩展机制的激活的信号，则通过将所述等级识别扩展加到预定值(例如，maxNuhLayerId)来确定所述预定数据包的所述层识别索引，以获得所述预定数据包的所述等级识别索引。If the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism, determine the layer identification index of the predetermined data packet by adding the layer identification extension to a predetermined value (e.g., maxNuhLayerId).
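
The two derivations of the layer identification index in embodiments 52 and 53 can be sketched as follows. The function name, the `mode` switch, the bit order of the concatenation (layer identification syntax element in the most significant bits) and the default value 63 for `max_nuh_layer_id` are assumptions made for illustration; the embodiments do not fix them.

```python
def layer_index(layer_id, extension, ext_active, ext_bits,
                mode="concat", max_nuh_layer_id=63):
    """Derive the layer identification index of one data packet.

    ext_active models the layer identification extension mechanism signaling.
    mode="concat": concatenate layer_id and the ext_bits-bit extension.
    mode="add":    add the extension to a predetermined value.
    """
    if not ext_active:
        return layer_id                      # extension mechanism deactivated
    if not 0 <= extension < (1 << ext_bits):
        raise ValueError("extension does not fit into ext_bits")
    if mode == "concat":
        return (layer_id << ext_bits) | extension
    if mode == "add":
        return max_nuh_layer_id + extension
    raise ValueError("unknown mode")


plain = layer_index(5, 3, ext_active=False, ext_bits=4)
concatenated = layer_index(5, 3, ext_active=True, ext_bits=4)          # (5 << 4) | 3
added = layer_index(63, 3, ext_active=True, ext_bits=4, mode="add")    # 63 + 3
```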

54.一种用于解码由数据包的序列组成的多层视频信号的方法，所述数据包中的每个数据包包括层识别语法元素(806)，其中，所述方法响应于在所述多层视频信号中的层识别扩展机制信令(808；808’)在于：所述方法包括：54. A method for decoding a multilayer video signal composed of a sequence of data packets, each data packet including a layer identification syntax element (806), wherein the method responds to layer identification extension mechanism signaling (808; 808') in the multilayer video signal in that: the method includes:

如果所述层识别扩展机制信令(808；808’)发送所述层识别扩展机制的去激活的信号，则对于所述预定数据包(810)，从由所述预定数据包包括的所述层识别语法元素(806)确定(820)所述预定数据包的所述层识别索引。If the layer identification extension mechanism signaling (808; 808') sends a signal to deactivate the layer identification extension mechanism, then for the predetermined data packet (810), the layer identification index of the predetermined data packet is determined (820) from the layer identification syntax element (806) included in the predetermined data packet.

55.一种具有程序代码的计算机程序，所述计算机程序在计算机上运行时，用于执行根据实施方式54所述的方法。55. A computer program having program code, which, when run on a computer, performs the method according to embodiment 54.

56.一种使用层间预测在不同的信息量的等级将视频资料编码成的多层视频数据流，所述等级具有在其中定义的顺序次序并且所述视频资料被编码成所述多层视频数据流使得经由所述层间预测，层不取决于根据所述顺序次序的随后的任何层，其中，经由所述层间预测取决于一个或多个其他层的每个层增加所述视频资料被编码成所述一个或多个其他层时的信息量(例如，在不同维度类型的方面)，其中，所述多层视频数据流包括：56. A multi-layer video data stream that encodes video data into levels of information content using inter-layer prediction, the levels having an order defined therein, and the video data being encoded into the multi-layer video data stream such that, via the inter-layer prediction, a layer is independent of any subsequent layer according to the order, wherein each layer that depends on one or more other layers via the inter-layer prediction increases the information content (e.g., in terms of different dimensional types) when the video data is encoded into the one or more other layers, wherein the multi-layer video data stream comprises:

第一语法结构，所述第一语法结构定义跨越依赖性空间的依赖性维度的数量M以及每个依赖性维度i的排序等级的最大数N_i，从而定义所述依赖性空间中的 $\prod_{i=1}^{M} N_i$ 个可用点以及双射映射，将每个等级映射在所述依赖性空间内的所述可用点的至少一个子集中的各个可用点上，以及A first syntax structure that defines a number M of dependency dimensions spanning a dependency space and a maximum number N_i of rank levels per dependency dimension i, thereby defining $\prod_{i=1}^{M} N_i$ available points in the dependency space, and a bijective mapping that maps each level onto a respective one of at least a subset of the available points in the dependency space, and

每个依赖性维度i，第二语法结构描述在依赖性维度i的N_i排序等级之中的依赖性，从而定义在所述依赖性空间中的所述可用点之间的依赖性，所有可用点与从较高指向较低的排序等级的依赖性轴中的各个可用点并行运行，对于每个依赖性维度，与各个依赖性维度并行的依赖性相对沿着除了各个维度以外的依赖性维度中的每个依赖性维度的循环移位不变，从而通过所述双射映射同时定义所述层之间的依赖性。For each dependency dimension i, the second syntactic structure describes the dependency in the N _i ordering rank of dependency dimension i, thereby defining the dependency between the available points in the dependency space. All available points run in parallel with each available point in the dependency axis from higher to lower ordering ranks. For each dependency dimension, the dependencies that run in parallel with each dependency dimension remain unchanged along the cyclic shift of each dependency dimension other than the respective dimensions, thereby simultaneously defining the dependencies between the layers through the bijective mapping.

57. A network entity configured to:

read the first and second syntax structures of the data stream according to embodiment 56, and

determine the dependencies between the layers based on the first and second syntax structures.

58. The network entity according to embodiment 56, configured to:

select one of the levels; and

discard packets (e.g., NAL units) of the multi-layer video data stream that belong (e.g., via nuh_layer_id) to layers of which the selected level is independent according to the dependencies between the layers.

59. A method comprising:

reading the first and second syntax structures of the data stream according to embodiment 56, and

60. A computer program having program code for performing, when run on a computer, the method according to embodiment 59.
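The derivation of layer dependencies described in embodiments 56 and 57 can be illustrated with a minimal Python sketch. It is not taken from the application: the function names, the representation of the second syntax structure as sets of (higher rank, lower rank) pairs, and the example values (M = 2, N_1 = 2, N_2 = 3) are assumptions chosen for illustration only.

```python
from itertools import product
from math import prod


def available_points(ranks):
    """Enumerate all points of the dependency space; ranks[i] plays the role of N_i."""
    return list(product(*(range(n) for n in ranks)))


def layer_dependencies(dim_deps, layer_to_point):
    """Derive direct layer dependencies from per-dimension rank dependencies.

    dim_deps[i] is a set of (higher_rank, lower_rank) pairs for dimension i
    (hypothetical stand-in for the second syntax structure).
    layer_to_point is the bijective mapping from layers to available points.
    """
    point_to_layer = {point: layer for layer, point in layer_to_point.items()}
    deps = {layer: set() for layer in layer_to_point}
    for layer, point in layer_to_point.items():
        for i, pairs in enumerate(dim_deps):
            for higher, lower in pairs:
                if point[i] == higher:
                    # Move along dimension i only; all other coordinates stay
                    # fixed, so the same rule applies at every position of the
                    # other dimensions.
                    ref = point[:i] + (lower,) + point[i + 1:]
                    if ref in point_to_layer:
                        deps[layer].add(point_to_layer[ref])
    return deps


# Hypothetical example: 2 dimensions with 2 and 3 rank levels.
ranks = (2, 3)
points = available_points(ranks)
layer_to_point = dict(enumerate(points))   # layer k -> k-th point
dim_deps = [{(1, 0)}, {(1, 0), (2, 1)}]    # rank 1 -> 0; ranks 1 -> 0 and 2 -> 1
deps = layer_dependencies(dim_deps, layer_to_point)
```

In this example the layer mapped to point (1, 1) depends directly on the layers mapped to (0, 1) and (1, 0), so two short per-dimension descriptions suffice to describe the dependencies of all six layers.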

References

[1] B. Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 10", JCTVC-L1003, Geneva, CH, 14–23 Jan. 2013

[2] G. Tech et al., "MV-HEVC Draft Text 3", JCT3V-C1004, Geneva, CH, 17–23 Jan. 2013

[3] G. Tech et al., "3D-HEVC Test Model 3", JCT3V-C1005, Geneva, CH, 17–23 Jan. 2013

[4] J. Chen et al., "SHVC Draft Text 1", JCTVC-L1008, Geneva, CH, 17–23 Jan. 2013

[5] WILBURN, Bennett, et al. High performance imaging using large camera arrays. ACM Transactions on Graphics, 2005, vol. 24, no. 3, pp. 765-776.

[6] WILBURN, Bennett S., et al. Light field video camera. In: Electronic Imaging 2002. International Society for Optics and Photonics, 2001, pp. 29-36.

[7] HORIMAI, Hideyoshi, et al. Full-color 3D display system with 360 degree horizontal viewing angle. In: Proc. Int. Symposium of 3D and Contents, 2010, pp. 7-10.

Claims

1. A multi-view decoder configured to reconstruct a plurality of views (12, 15) from a data stream using inter-view predictions from a first view (12) to a second view (15), wherein the multi-view decoder is configured to change the inter-view predictions at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided in response to signaling in the data stream.

2. The multi-view decoder of claim 1, wherein the multi-view decoder is configured to, in changing the inter-view prediction, restrict the domain of possible disparity vectors that can be signaled in the data stream.

3. The multi-view decoder according to claim 1 or 2, wherein the multi-view decoder is configured to determine, based on the data stream, a disparity vector (308) out of the domain of possible disparity vectors for the current portion (302) of the second view (15), and to sample the first view (12) at a reference portion (304) that is displaced by the determined disparity vector (308) from a co-located portion (306) of the first view (12), the co-located portion (306) being co-located with the current portion (302).

4. The multi-view decoder according to claim 3, wherein the multi-view decoder is configured to, in changing the inter-view prediction, restrict the domain of possible disparity vectors that can be signaled in the data stream such that the reference portion (304) is located within the spatial segment (301) in which the co-located portion (306) is spatially located.

5. The multi-view decoder according to claim 3, wherein the multi-view decoder is configured to, in changing the inter-view prediction, restrict the domain of possible disparity vectors that can be signaled in the data stream such that the reference portion (304) is located within the spatial segment in which the co-located portion (306) is spatially located and, if the component of the disparity vector in the dimension pointing toward the boundary (300) has sub-pixel resolution, the reference portion (304) is spaced from the boundary of the spatial segment by a distance greater than or equal to the half-width (310) of the interpolation filter kernel.

6. The multi-view decoder according to any one of the preceding claims, wherein the multi-view decoder is configured to, in changing the inter-view prediction, fill the interpolation filter kernel (311), at a portion thereof extending beyond the boundary (300) of the spatial segment in which a co-located portion (306) of the first view, co-located with the current portion (302) of the second view (15) currently being predicted using the inter-view prediction, is spatially located, with substitute data that is independent of information outside the boundary of the spatial segment.

7. The multi-view decoder according to any one of the preceding claims, wherein the multi-view decoder is configured to, in the inter-view prediction, derive a reference portion (314) within the first view (12) for the current portion of the second view and, depending on the signaling in the data stream,

check whether the reference portion (314) is located within the spatial segment (301) in which the co-located portion (306) of the first view (12), co-located with the current portion (302), is spatially located, and, depending on whether it is, either apply to a parameter of the current portion (302) a predictor derived from an attribute of the reference portion (314), or suppress that application or apply a substitute predictor to the parameter of the current portion (302), or

apply the predictor regardless of whether the reference portion (314) is located within the spatial segment (82) in which the co-located portion is spatially located.

8. The multi-view decoder of claim 7, wherein the multi-view decoder is configured to, in deriving the reference portion (314),

estimate a disparity vector (316) for the current portion (302),

locate the position in the first view that is co-located with a representative position (318) of the current portion (302) or of a neighboring portion (320) of the first view neighboring the current portion (302), and

determine the reference portion (314) by applying the disparity vector (316) to the representative position (318).

9. The multi-view decoder of claim 8, wherein the multi-view decoder is configured to estimate the disparity vector for the current portion based on a depth map transmitted in the data stream or on a spatially or temporally predicted disparity vector for the current portion.

10. The multi-view decoder according to claim 8 or 9, wherein the multi-view decoder is configured to, in determining the reference portion (314), use the disparity vector (316) to select the reference portion out of a partitioning of the first view (12) into coding blocks, prediction blocks, residual blocks, and/or transform blocks.

11. The multi-view decoder according to any one of claims 7 to 10, wherein the parameters are motion vectors, disparity vectors, residual signals, and/or depth values.

12. The multi-view decoder according to any one of claims 7 to 11, wherein the attribute is a motion vector, a disparity vector, a residual signal, and/or a depth value.

13. A multi-view encoder configured to encode a plurality of views (12, 15) into a data stream using inter-view predictions from a first view (12) to a second view (15), wherein the multi-view encoder is configured to modify the inter-view predictions at the spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided.

14. The multi-view encoder of claim 13, wherein the multi-view encoder is configured to, in changing the inter-view prediction, restrict the domain of possible disparity vectors.

15. The multi-view encoder according to claim 13 or 14, wherein the multi-view encoder is configured to determine (e.g., by optimization) a disparity vector (308) out of the domain of possible disparity vectors for the current portion (302) of the second view (15) (e.g., a prediction block predicted with disparity compensation), to signal it in the data stream, and to sample the first view (12) at a reference portion (304) that is displaced by the determined disparity vector (308) from a co-located portion (306) of the first view (12), the co-located portion (306) being co-located with the current portion (302).

16. The multi-view encoder of claim 15, wherein the multi-view encoder is configured to perform a domain restriction of the possible disparity vector such that the reference portion (304) is located (e.g., completely) within the spatial segment (301) in which the co-location portion (306) is located in space.

17. The multi-view encoder of claim 15, wherein the multi-view encoder is configured to perform a domain restriction of the possible disparity vector such that the reference portion (304) is located within the spatial segment in which the co-location portion (306) is spatially located, and the reference portion (304) is spaced from the boundary of the spatial segment by a distance greater than or equal to the half-width (310) of the interpolation filter kernel, provided that the components of the disparity vector in the dimension pointing to the boundary (300) have sub-pixel resolution.

18. The multi-view encoder according to any one of claims 13 to 17, wherein the multi-view encoder is configured to, in changing the inter-view prediction, fill an interpolation filter kernel (311) at a portion extending beyond the boundary (300) of the spatial segment in which the co-located portion (306) of the first view, co-located with the current portion (302) of the second view (15) currently being predicted using the inter-view prediction, is spatially located.

19. The multi-view encoder according to any one of claims 13 to 18, wherein the multi-view encoder is configured to, in the inter-view prediction, derive a reference portion (314) within the first view (12) for the current portion of the second view and, depending on signaling in the data stream,

check whether the reference portion (314) is located within the spatial segment (301) in which the co-located portion (306) of the first view (12), co-located with the current portion (302), is spatially located, and, depending on whether it is, either apply to a parameter of the current portion (302) a predictor derived from an attribute of the reference portion (314) or suppress that application, or

apply the predictor regardless of whether the reference portion (314) is located within the spatial segment (301) in which the co-located portion is spatially located.

20. The multi-view encoder of claim 19, wherein the multi-view encoder is configured to, in deriving the reference portion (314),

estimate a disparity vector (316) for the current portion (314),

21. The multi-view encoder of claim 20, wherein the multi-view encoder is configured to estimate the disparity vector for the current portion based on a depth map transmitted in the data stream or on a spatially or temporally predicted disparity vector for the current portion.

22. The multi-view encoder according to any one of claims 19 to 21, wherein the parameters are motion vectors, disparity vectors, residual signals, and/or depth values.

23. The multi-view encoder according to any one of claims 19 to 22, wherein the attribute is a motion vector, a disparity vector, a residual signal, and/or a depth value.

24. The multi-view encoder according to any one of claims 13 to 23, configured to signal the change to the decoder in the data stream so that the decoder can rely on the change.

25. A multi-view decoder configured to reconstruct a plurality of views (12, 15) from a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the multi-view decoder is configured to use signaling in the data stream as a guarantee that the inter-view prediction (602) is constrained at the spatial segment boundaries (300) of the spatial segments (301) into which the first view (12) is divided, such that the inter-view prediction does not involve any dependency of any current portion (302) of the second view (15) on any spatial segment other than the spatial segment in which the co-located portion (606) of the first view, co-located with the respective current portion of the second view, is located.

26. The multi-view decoder of claim 25, configured to, in response to the signaling in the data stream, adjust an inter-view decoding offset or decide whether to attempt reconstructing the first and second views using inter-view parallelism.

27. The multi-view decoder according to claim 25 or 26, wherein the multi-view decoder is configured to determine, based on the data stream, a disparity vector (308) out of the domain of possible disparity vectors for the current portion (302) of the second view (15), and to sample the first view (12) at a reference portion (304) that is displaced by the determined disparity vector (308) from a co-located portion (306) of the first view (12), the co-located portion (306) being co-located with the current portion (302).

28. A method for reconstructing a plurality of views (12, 15) from a data stream using inter-view predictions from a first view (12) to a second view (15), wherein the method responds to signaling in the data stream to change the inter-view predictions at spatial segment boundaries (300) of spatial segments (301) into which the first view (12) is divided.

29. A method for encoding a plurality of views (12, 15) into a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the method includes changing the inter-view prediction at a spatial segment boundary (300) of a spatial segment (301) into which the first view (12) is divided.

30. A method for reconstructing a plurality of views (12, 15) from a data stream using inter-view prediction from a first view (12) to a second view (15), wherein the method includes using signaling in the data stream as a guarantee that the inter-view prediction (602) is constrained at the spatial segment boundaries (300) of the spatial segments (301) into which the first view (12) is divided, such that the inter-view prediction does not involve any dependency of any current portion (302) of the second view (15) on a spatial segment other than the spatial segment in which the co-located portion (606) of the first view, co-located with the respective current portion of the second view, is located.

31. A computer program having program code, which, when run on a computer, performs the method according to any one of claims 27 to 30.
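The domain restriction of claims 4, 5, 16 and 17 can be illustrated with a minimal one-dimensional Python sketch. It is not part of the claims: quarter-sample disparity units, a kernel half-width of 4 samples, the function name is_allowed and the half-open segment interval are all assumptions made for illustration.

```python
SUBPEL = 4             # assumption: disparity is expressed in quarter-sample units
KERNEL_HALF_WIDTH = 4  # assumption: half-width of the interpolation filter kernel, in samples


def is_allowed(block_x, block_w, dv_q, seg_x0, seg_x1, half_width=KERNEL_HALF_WIDTH):
    """Check whether a horizontal disparity dv_q (quarter samples) is in the restricted domain.

    The current block covers [block_x, block_x + block_w) and its co-located
    portion lies in the spatial segment [seg_x0, seg_x1) of the first view.
    """
    # A margin is only needed when the disparity has a fractional part,
    # because only then does interpolation read neighboring samples.
    fractional = dv_q % SUBPEL != 0
    margin_q = half_width * SUBPEL if fractional else 0

    # Work in quarter-sample units to avoid floating-point arithmetic.
    ref_left_q = block_x * SUBPEL + dv_q
    ref_right_q = ref_left_q + block_w * SUBPEL

    return (seg_x0 * SUBPEL + margin_q <= ref_left_q
            and ref_right_q + margin_q <= seg_x1 * SUBPEL)


def allowed_disparities(block_x, block_w, seg_x0, seg_x1, search_range_q):
    """List every disparity within +/- search_range_q that satisfies the restriction."""
    return [dv for dv in range(-search_range_q, search_range_q + 1)
            if is_allowed(block_x, block_w, dv, seg_x0, seg_x1)]
```

Under these assumptions, an encoder would pick the disparity vector only from the list returned by allowed_disparities, and a decoder that has been told the restriction is in force could rely on the reference portion never touching samples outside the segment.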

32. A multi-layer video data stream (200) comprising a sequence of NAL units (202), the multi-layer video data stream having pictures (204) of a plurality of layers encoded therein using inter-layer prediction, each NAL unit (202) having a layer index (nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein NAL units belonging to one access unit relate to pictures of one time instant, and NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit.

33. The multi-layer video data stream (200) according to claim 32, wherein the multi-layer video data stream has interleaving signaling having a first possible state and a second possible state, wherein,

if the interleaving signaling assumes the first possible state, then, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units, and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit, the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit, and

if the interleaving signaling assumes the second possible state, then, within each access unit, the NAL units are arranged non-interleaved with respect to the layers to which the NAL units relate.

34. The multi-layer video data stream of claim 32 or 33, wherein each NAL unit has a NAL unit type index representing the type of the respective NAL unit among a set of possible types, and within each access unit, the type of the NAL unit of the respective access unit follows a sorting rule among the NAL unit types, and between each pair of access units, the sorting rule is broken.

35. A multi-layer video encoder for generating a multi-layer video data stream (200) composed of a sequence of NAL units (202), the multi-layer video encoder being configured to generate the multi-layer video data stream (200) such that the multi-layer video data stream has pictures (204) of a plurality of layers encoded therein using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein NAL units belonging to one access unit relate to pictures of one time instant, and NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit.

36. A decoder configured to decode a multi-layer video data stream (200) composed of a sequence of NAL units (202), the multi-layer video data stream (200) having pictures (204) of a plurality of layers encoded therein using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein NAL units belonging to one access unit relate to pictures of one time instant, and NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit.

37. The decoder of claim 36, wherein the decoder is configured to decode, in parallel, the pictures of the plurality of layers relating to the one time instant from the multi-layer video data stream.

38. The decoder of claim 36 or 37, wherein the decoder is configured to distribute the NAL unit to a plurality of buffers according to the layer to which the NAL unit belongs, so as to buffer the multi-layer video data stream in the plurality of buffers.

39. The decoder according to any one of claims 36 to 38, wherein the multi-layer video data stream has interleaved signaling, the interleaved signaling having a first possible state and a second possible state, wherein the decoder is configured to respond to the interleaved signaling in that: the decoder knows that

40. The decoder according to any one of claims 36 to 39, wherein the multi-layer video data stream has interleaving signaling having a first possible state and a second possible state, wherein the decoder is configured to respond to the interleaving signaling in that the decoder, in the case that the interleaving signaling has the first possible state, buffers the multi-layer video data stream in a plurality of buffers by distributing the NAL units to the plurality of buffers according to the layer to which each NAL unit belongs, and, in the case that the interleaving signaling has the second possible state, buffers the multi-layer video data stream in one buffer of the plurality of buffers, regardless of the layer to which each NAL unit belongs.

41. The decoder according to any one of claims 36 to 40, wherein the multi-layer video data stream (200) is arranged such that each NAL unit has a NAL unit type index representing the type of the respective NAL unit among a set of possible types, and within each access unit, the type of the NAL unit of the respective access unit follows a sorting rule among NAL unit types, and between each pair of access units, the sorting rule is broken, wherein the decoder is configured to detect access unit boundaries using the sorting rule by detecting whether the sorting rule is broken between two directly consecutive NAL units.

42. A method for generating a multi-layer video data stream (200) composed of a sequence of NAL units (202), the method comprising generating the multi-layer video data stream (200) such that the multi-layer video data stream has pictures (204) of a plurality of layers encoded therein using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein NAL units belonging to one access unit relate to pictures of one time instant, and NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit.

43. A method for decoding a multi-layer video data stream (200) composed of a sequence of NAL units (202), the multi-layer video data stream (200) having pictures (204) of a plurality of layers encoded therein using inter-layer prediction, each NAL unit (202) having a layer index (e.g., nuh_layer_id) indicating the layer to which the respective NAL unit relates, the sequence of NAL units being structured into a sequence of non-interleaved access units (206), wherein NAL units belonging to one access unit relate to pictures of one time instant, and NAL units of different access units relate to different time instants, wherein, within each access unit, for each layer, at least some of the NAL units relating to the respective layer are grouped into one or more decoding units (208), and the decoding units of NAL units relating to different layers are interleaved such that, for each decoding unit (208), the inter-layer prediction used to encode the respective decoding unit is based on portions of pictures of layers other than the layer to which the respective decoding unit relates, which portions are coded into decoding units preceding the respective decoding unit within the respective access unit.

44. A computer program having program code, the computer program being used to perform the method according to any one of claims 42 to 43 when run on a computer.
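The access unit boundary detection of claim 41 and the per-layer buffering of claim 38 can be illustrated with a minimal Python sketch. It is not part of the claims: the NAL unit type names, their ranking in TYPE_RANK, and the representation of a NAL unit as a (type, layer index) tuple are hypothetical and chosen only to show the principle.

```python
from collections import defaultdict

# Hypothetical ordering rule: within an access unit, type ranks never decrease.
TYPE_RANK = {"AUD": 0, "PARAM_SET": 1, "SLICE": 2, "SUFFIX_SEI": 3}


def split_access_units(nal_units):
    """Split a sequence of (nal_type, layer_id) tuples into access units.

    A boundary is detected wherever the ordering rule is broken between
    two directly consecutive NAL units.
    """
    access_units, current = [], []
    for unit in nal_units:
        if current and TYPE_RANK[unit[0]] < TYPE_RANK[current[-1][0]]:
            access_units.append(current)
            current = []
        current.append(unit)
    if current:
        access_units.append(current)
    return access_units


def distribute_to_buffers(access_unit):
    """Distribute the NAL units of one access unit to one buffer per layer."""
    buffers = defaultdict(list)
    for unit in access_unit:
        buffers[unit[1]].append(unit)
    return dict(buffers)


# Two access units whose decoding units of layers 0 and 1 are interleaved.
stream = [("AUD", 0), ("SLICE", 0), ("SLICE", 1), ("SLICE", 0), ("SLICE", 1),
          ("AUD", 0), ("SLICE", 0), ("SLICE", 1)]
aus = split_access_units(stream)
```

Because interleaved decoding units of different layers share the same type rank in this sketch, the interleaving does not disturb boundary detection, and each per-layer buffer receives its NAL units in stream order.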

45. A decoder configured to decode a multi-layer video signal composed of a sequence of data packets, each data packet including a layer identification syntax element (806), wherein the decoder is configured to respond to layer identification extension mechanism signaling (808; 808') in the multi-layer video signal in that:

If the layer identification extension mechanism signaling (808; 808') sends a signal indicating activation of the layer identification extension mechanism, then for a predetermined data packet (810), the layer identification extension (818) is read (814) from the multi-layer data stream and the layer identification extension (818) is used to determine (816) the layer identification index of the predetermined data packet, and

If the layer identification extension mechanism signaling (808; 808') sends a signal to deactivate the layer identification extension mechanism, then for the predetermined data packet (810), the layer identification index of the predetermined data packet is determined (820) from the layer identification syntax element (806) included in the predetermined data packet.

46. The decoder of claim 45, wherein the layer identification syntax element (806) at least contributes to the layer identification extension mechanism signaling (808), wherein the decoder is configured to determine whether the layer identification extension mechanism signaling (808) signals activation or deactivation of the layer identification extension mechanism for the predetermined data packet depending at least on whether or not the layer identification syntax element comprised by the predetermined data packet assumes an escape value.

47. The decoder according to claim 45 or 46, wherein a high-level syntax element (822) at least contributes to the layer identification extension mechanism signaling (808; 808'), and the decoder is configured to determine, depending on the high-level syntax element (822), whether the layer identification extension mechanism signaling signals activation or deactivation of the layer identification extension mechanism for the predetermined data packet (810).

48. The decoder of claim 47, wherein the decoder is configured to determine, in response to the high-level syntax element assuming a first state, that the layer identification extension mechanism signaling (808; 808') signals deactivation of the layer identification extension mechanism.

49. The decoder of claim 48, wherein the layer identification syntax element further contributes to the layer identification extension mechanism signaling (808), and wherein the decoder is configured to determine that the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism for the predetermined data packet if the high-level syntax element assumes a second state different from the first state and the layer identification syntax element of the predetermined data packet assumes an escape value, and to determine that the layer identification extension mechanism signaling signals deactivation of the layer identification extension mechanism if the high-level syntax element assumes the first state or the layer identification syntax element assumes a value different from the escape value.

50. The decoder of claim 49, wherein the decoder is configured to, if the high-level syntax element assumes a third state different from the first and second states, concatenate a number representing the layer identification syntax element comprised by the predetermined data packet and a number representing the layer identification extension so as to obtain the layer identification index of the predetermined data packet.

51. The decoder of claim 49, wherein the decoder is configured to, if the high-level syntax element assumes the second state, determine the length n of the layer identification extension using the high-level syntax element, and concatenate a number representing the layer identification syntax element comprised by the predetermined data packet and an n-bit number representing the layer identification extension so as to obtain the layer identification index of the predetermined data packet.

52. The decoder according to any one of claims 45 to 51, wherein the decoder is configured to:

if the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism, determine (816) the layer identification index of the predetermined data packet by concatenating a number representing the layer identification syntax element comprised by the predetermined data packet and a number representing the layer identification extension.

53. The decoder according to any one of claims 45 to 52, wherein the decoder is configured to:

if the layer identification extension mechanism signaling signals activation of the layer identification extension mechanism, determine the layer identification index of the predetermined data packet by adding the layer identification extension to a predetermined value (e.g., maxNuhLayerId).

54. A method for decoding a multilayer video signal composed of a sequence of data packets, each data packet including a layer identification syntax element (806), wherein the method responds to layer identification extension mechanism signaling (808; 808') in the multilayer video signal in that: the method includes:

55. A computer program having program code, which, when run on a computer, performs the method according to claim 54.
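The determination of the layer identification index in claims 45 to 53 can be illustrated with a minimal Python sketch. It is not part of the claims: the escape value 63, the 8-bit extension length, the boolean extension_enabled (a simplified stand-in for the high-level syntax element 822), and the names Packet and layer_index are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ESCAPE_VALUE = 63       # assumption: escape value of the layer identification syntax element
EXTENSION_BITS = 8      # assumption: length n of the layer identification extension
MAX_NUH_LAYER_ID = 63   # assumption: predetermined value used in the additive variant


@dataclass
class Packet:
    layer_id: int                       # layer identification syntax element (806)
    layer_id_ext: Optional[int] = None  # layer identification extension (818), if present


def extension_active(packet: Packet, extension_enabled: bool) -> bool:
    """Signaling is active only if enabled at high level and the escape value is used."""
    return extension_enabled and packet.layer_id == ESCAPE_VALUE


def layer_index(packet: Packet, extension_enabled: bool, combine: str = "concat") -> int:
    """Return the layer identification index of a packet."""
    if not extension_active(packet, extension_enabled):
        # Deactivated: the index comes from the syntax element alone.
        return packet.layer_id
    if packet.layer_id_ext is None:
        raise ValueError("extension signaled as active but not present")
    if combine == "concat":
        # Concatenate the syntax element and the extension (as in claim 52).
        return (packet.layer_id << EXTENSION_BITS) | packet.layer_id_ext
    if combine == "offset":
        # Add the extension to a predetermined value (as in claim 53).
        return MAX_NUH_LAYER_ID + packet.layer_id_ext
    raise ValueError(f"unknown combine mode: {combine}")
```

With these assumed values, a packet carrying an ordinary layer identification value is handled exactly as without the mechanism, which is what allows legacy packets to remain decodable.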

56. A multi-layer video data stream into which video material is encoded at different levels of information amount using inter-layer prediction, the levels having a sequential order defined among them, and the video material being encoded into the multi-layer video data stream such that no layer depends, via the inter-layer prediction, on any layer that is subsequent according to the sequential order, wherein each layer that depends, via the inter-layer prediction, on one or more other layers increases the information amount at which the video material is encoded into the one or more other layers (e.g., in terms of different dimension types), wherein the multi-layer video data stream comprises:

a first syntax structure which defines a number M of dependency dimensions spanning a dependency space and a maximum number N_i of rank levels for each dependency dimension i, thereby defining N_1 × N_2 × ... × N_M available points in the dependency space, and a bijective mapping which maps each level onto a respective one of at least a subset of the available points within the dependency space, and

for each dependency dimension i, a second syntax structure which describes the dependencies among the N_i rank levels of dependency dimension i, thereby defining dependencies between the available points in the dependency space, all of which run parallel to a respective one of the dependency axes and point from higher to lower rank levels, wherein, for each dependency dimension, the dependencies parallel to the respective dependency dimension are invariant under a cyclic shift along each of the dependency dimensions other than the respective dimension, thereby concurrently defining, via the bijective mapping, the dependencies between the layers.

57. A network entity configured to:

read the first and second syntax structures of the data stream according to claim 56, and

determine the dependencies between the layers based on the first and second syntax structures.

58. The network entity according to claim 56, configured to:

select one of the levels; and

discard packets (e.g., NAL units) of the multi-layer video data stream that belong (e.g., via nuh_layer_id) to layers of which the selected level is independent according to the dependencies between the layers.

59. A method comprising:

60. A computer program having program code, which, when run on a computer, performs the method according to claim 59.
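The packet discarding of claim 58 can be illustrated with a minimal Python sketch. It is not part of the claims: the dependency table, the representation of a packet as a (nuh_layer_id, payload) tuple, and the function names are hypothetical, and the sketch assumes the direct layer dependencies have already been determined.

```python
def required_layers(target, deps):
    """Collect the selected layer and every layer it depends on, directly or indirectly."""
    needed, stack = set(), [target]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            stack.extend(deps.get(layer, ()))
    return needed


def drop_unneeded(packets, target, deps):
    """Keep only packets whose layer index is needed for the selected level."""
    keep = required_layers(target, deps)
    return [p for p in packets if p[0] in keep]


# Hypothetical direct dependencies between six layers.
deps = {0: set(), 1: {0}, 2: {1}, 3: {0}, 4: {1, 3}, 5: {2, 4}}
packets = [(layer_id, b"") for layer_id in range(6)]
kept = drop_unneeded(packets, 4, deps)
```

In this example, selecting layer 4 keeps layers 0, 1, 3 and 4 and discards the packets of layers 2 and 5, on which layer 4 does not depend.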