
HK1220060B - Video coding techniques using asymmetric motion partitioning - Google Patents


Info

Publication number
HK1220060B
HK1220060B (application HK16107896.2A)
Authority
HK
Hong Kong
Prior art keywords
block, video data, sub-blocks, motion
Prior art date
Application number
HK16107896.2A
Other languages
Chinese (zh)
Other versions
HK1220060A1 (en)
Inventor
陈颖 (Ying Chen)
张莉 (Li Zhang)
Original Assignee
高通股份有限公司 (QUALCOMM Incorporated)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. Application No. 14/483,983 (published as US10244253B2)
Application filed by 高通股份有限公司 (QUALCOMM Incorporated)
Publication of HK1220060A1
Publication of HK1220060B

Description

Video Coding Techniques Using Asymmetric Motion Partitioning

This application claims the benefit of U.S. Provisional Application No. 61/877,793, filed September 13, 2013, and U.S. Provisional Application No. 61/881,383, filed September 23, 2013, the entire contents of both of which are incorporated herein by reference.

Technical Field

This disclosure relates to video coding, i.e., the encoding or decoding of video data.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. By implementing such video coding techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for the block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
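The 2-D-to-1-D coefficient scan described above can be illustrated with a small sketch. HEVC itself specifies diagonal, horizontal, and vertical scans applied over 4x4 coefficient groups; the classic zig-zag shown here (Python, illustrative only) simply demonstrates how a 2-D array of quantized coefficients becomes a 1-D vector with the nonzero values front-loaded for entropy coding:

```python
import numpy as np

def zigzag_scan(coeffs):
    """Scan a square 2-D block of quantized transform coefficients into a
    1-D list by walking the anti-diagonals in alternating directions."""
    n = coeffs.shape[0]
    out = []
    for s in range(2 * n - 1):
        # All (row, col) pairs on anti-diagonal s, top row first.
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()  # Walk even diagonals bottom-left to top-right.
        out.extend(int(coeffs[i, j]) for i, j in diag)
    return out

# Quantization typically leaves energy in the top-left (low-frequency) corner:
block = np.array([[9, 3, 1, 0],
                  [4, 2, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0]])
print(zigzag_scan(block))  # [9, 3, 4, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The long run of trailing zeros in the scanned vector is exactly what makes the subsequent entropy coding effective.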

Summary of the Invention

In general, this disclosure relates to three-dimensional (3D) video coding based on advanced codecs, including depth coding techniques in some examples. This disclosure describes techniques for view synthesis prediction coding, including the determination of block sizes, when used in conjunction with asymmetric motion partitioning. This disclosure also describes techniques for advanced motion prediction when used in conjunction with asymmetric motion partitioning.

In one example of this disclosure, a method of decoding video data includes receiving residual data corresponding to a block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is unidirectionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16; partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and decoding the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.
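The sub-block partitioning step above can be sketched minimally as follows (Python, illustrative only; the function name is ours, and the choice of 8x4 sub-blocks for wide partitions versus 4x8 for tall ones is an assumption based on the partition shape — the text only states that each sub-block is 8x4 or 4x8):

```python
def split_amp_partition(width, height):
    """Split an asymmetric-motion-partition PU of the sizes listed above
    into sub-blocks: 8x4 when the PU is wider than tall, 4x8 otherwise.
    Returns (x, y, w, h) tuples in raster order, relative to the PU."""
    assert (width, height) in {(16, 12), (12, 16), (16, 4), (4, 16)}
    sub_w, sub_h = (8, 4) if width > height else (4, 8)
    return [(x, y, sub_w, sub_h)
            for y in range(0, height, sub_h)
            for x in range(0, width, sub_w)]

# A 16x12 partition yields six 8x4 sub-blocks, each of which would get its
# own disparity motion vector derived from the corresponding depth block.
print(split_amp_partition(16, 12))
```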

In another example of this disclosure, a method of encoding video data includes generating a block of video data using asymmetric motion partitioning, wherein the block of video data is unidirectionally predicted using backward view synthesis prediction (BVSP) and has a size of 16x12, 12x16, 16x4, or 4x16; partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and encoding the block of video data by performing motion compensation on each of the sub-blocks using the synthesized respective reference blocks.

In another example of this disclosure, an apparatus configured to decode video data includes a video memory configured to store information corresponding to a block of video data, and one or more processors configured to: receive residual data corresponding to the block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is unidirectionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16; partition the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; derive a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; synthesize a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and decode the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.

In another example of this disclosure, an apparatus configured to decode video data includes means for receiving residual data corresponding to a block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is unidirectionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16; means for partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; means for deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; means for synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and means for decoding the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.

The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the inter-prediction techniques of this disclosure.

FIG. 2 is a conceptual diagram illustrating an example decoding order for multi-view video.

FIG. 3 is a conceptual diagram illustrating an example prediction structure for multi-view video.

FIG. 4 is a conceptual diagram illustrating texture and depth values for 3D video.

FIG. 5 is a conceptual diagram illustrating example partition types.

FIG. 6 is a conceptual diagram illustrating merge mode motion vector candidates.

FIG. 7 is a table indicating an example specification of merge candidate indices.

FIG. 8 is a conceptual diagram illustrating neighboring blocks used for an example disparity vector derivation process.

FIG. 9 is a conceptual diagram illustrating a neighboring-block disparity vector derivation process.

FIG. 10 is a conceptual diagram illustrating four corner pixels of an 8x8 depth block.

FIG. 11 is a conceptual diagram illustrating an example derivation of an inter-view predicted motion vector candidate for merge/skip mode.

FIG. 12 is a table indicating an example specification of reference indices in 3D-HEVC.

FIG. 13 is a conceptual diagram illustrating an example derivation of a motion vector inheritance candidate for depth coding.

FIG. 14 illustrates the prediction structure of advanced residual prediction (ARP) in multi-view video coding.

FIG. 15 is a conceptual diagram illustrating an example relationship among a current block, a reference block, and motion-compensated blocks.

FIG. 16 is a conceptual diagram illustrating sub-prediction-unit inter-view motion prediction.

FIG. 17 is a conceptual diagram depicting the backward view synthesis prediction and motion compensation techniques of this disclosure when asymmetric motion partitioning is used.

FIG. 18 is a conceptual diagram illustrating motion vector inheritance and motion compensation techniques for asymmetric motion partition sizes of 4x16 and 16x4.

FIG. 19 is a block diagram illustrating an example of a video encoder that may implement the inter-prediction techniques of this disclosure.

FIG. 20 is a block diagram illustrating an example of a video decoder that may implement the inter-prediction techniques of this disclosure.

FIG. 21 is a flowchart illustrating an example encoding method of this disclosure.

FIG. 22 is a flowchart illustrating another example encoding method of this disclosure.

FIG. 23 is a flowchart illustrating another example encoding method of this disclosure.

FIG. 24 is a flowchart illustrating an example decoding method of this disclosure.

FIG. 25 is a flowchart illustrating an example decoding method of this disclosure.

FIG. 26 is a flowchart illustrating an example decoding method of this disclosure.

DETAILED DESCRIPTION

In general, this disclosure describes techniques related to 3D video coding based on advanced codecs, including the coding of one or more views along with depth blocks using the 3D-HEVC (High Efficiency Video Coding) codec. More specifically, this disclosure describes techniques for further partitioning prediction units (PUs) that were partitioned using asymmetric motion partitioning into smaller sub-blocks. The techniques of this disclosure include techniques for deriving and/or inheriting motion vectors and disparity motion vectors for the sub-blocks of a PU partitioned using asymmetric motion partitioning.
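For orientation, HEVC's asymmetric motion partitioning splits a square CU at one quarter of its width or height, which is what produces the 16x4/16x12 and 4x16/12x16 PU pairs discussed in this disclosure. A small sketch of the four AMP modes (Python; the mode names are the standard's PartMode identifiers, while the helper function itself is ours):

```python
def amp_partitions(cu_size, mode):
    """Return the two PU rectangles (x, y, w, h) produced by an HEVC
    asymmetric motion partitioning mode on a cu_size x cu_size CU."""
    q = cu_size // 4  # short side of the smaller partition
    if mode == "PART_2NxnU":   # small PU on top
        return [(0, 0, cu_size, q), (0, q, cu_size, cu_size - q)]
    if mode == "PART_2NxnD":   # small PU on the bottom
        return [(0, 0, cu_size, cu_size - q), (0, cu_size - q, cu_size, q)]
    if mode == "PART_nLx2N":   # small PU on the left
        return [(0, 0, q, cu_size), (q, 0, cu_size - q, cu_size)]
    if mode == "PART_nRx2N":   # small PU on the right
        return [(0, 0, cu_size - q, cu_size), (cu_size - q, 0, q, cu_size)]
    raise ValueError(f"not an AMP mode: {mode}")

print(amp_partitions(16, "PART_2NxnU"))  # [(0, 0, 16, 4), (0, 4, 16, 12)]
```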

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques of this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 may provide the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit the encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions (e.g., Dynamic Adaptive Streaming over HTTP (DASH)), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes a video source 18, a depth estimation unit 19, a video encoder 20, and an output interface 22. Destination device 14 includes an input interface 28, a video decoder 30, a depth-image-based rendering (DIBR) unit 31, and a display device 32. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated system 10 of FIG. 1 is merely one example. The techniques of this disclosure may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "codec." Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner, such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, video source 18 may generate computer-graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.

Video source 18 may provide one or more views of video data to video encoder 20. For example, video source 18 may correspond to an array of cameras, each having a unique horizontal position relative to a particular scene being filmed. Alternatively, video source 18 may generate video data from disparate horizontal camera perspectives, e.g., using computer graphics. Depth estimation unit 19 may be configured to determine values of depth pixels corresponding to pixels in a texture image. For example, depth estimation unit 19 may represent a Sound Navigation and Ranging (SONAR) unit, a Light Detection and Ranging (LIDAR) unit, or another unit capable of directly determining depth values substantially simultaneously while recording video data of a scene.

Additionally or alternatively, depth estimation unit 19 may be configured to calculate depth values indirectly by comparing two or more images that were captured at substantially the same time from different horizontal camera perspectives. By calculating horizontal disparity between substantially similar pixel values in the images, depth estimation unit 19 may approximate the depths of various objects in the scene. In some examples, depth estimation unit 19 may be functionally integrated with video source 18. For example, when video source 18 generates computer graphics images, depth estimation unit 19 may provide actual depth maps for graphical objects, e.g., using the z-coordinates of pixels and objects used to render the texture images.
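The indirect computation above rests on the standard pinhole-stereo relation: depth is inversely proportional to horizontal disparity, scaled by the focal length and the camera baseline. A minimal sketch (Python; the function name and the numbers are ours, purely for illustration):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole-stereo relation: depth = focal_length * baseline / disparity.
    Larger disparity between the two camera views means a closer object."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity: the point is at infinity
    return focal_px * baseline_m / disparity_px

# A feature shifted 40 px between two cameras 6 cm apart, focal length 800 px:
print(depth_from_disparity(800, 0.06, 40))  # 1.2 (meters)
```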

Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., GOPs. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. In some examples, display device 32 may comprise a device capable of displaying two or more views simultaneously or substantially simultaneously, e.g., to produce a 3D visual effect for a viewer.

DIBR unit 31 of destination device 14 may render synthesized views using texture and depth information of decoded views received from video decoder 30. For example, DIBR unit 31 may determine horizontal disparity for pixel data of a texture image as a function of values of pixels in a corresponding depth map. DIBR unit 31 may then generate a synthesized image by offsetting pixels in the texture image to the left or right by the determined horizontal disparity. In this manner, display device 32 may display one or more views, which may correspond to decoded views and/or synthesized views, in any combination. In accordance with the techniques of this disclosure, video decoder 30 may provide original and updated precision values for depth ranges and camera parameters to DIBR unit 31, which may use the depth ranges and camera parameters to synthesize views properly.
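The per-row pixel shifting described above can be sketched as a toy forward warp (Python; the function name is ours, and a real DIBR renderer would additionally resolve occlusions by depth ordering and fill disocclusion holes, which this sketch only marks):

```python
def dibr_shift_row(texture_row, disparity_row):
    """Warp one row of texture pixels horizontally by a per-pixel disparity.
    Positions no source pixel lands on stay None (disocclusion holes)."""
    width = len(texture_row)
    out = [None] * width
    for x, (pixel, d) in enumerate(zip(texture_row, disparity_row)):
        tx = x + d
        if 0 <= tx < width:
            out[tx] = pixel  # later writes win: a crude occlusion rule
    return out

row = [10, 20, 30, 40]
print(dibr_shift_row(row, [1, 1, 0, 0]))  # [None, 10, 30, 40]
```

In the example, pixel 20 lands on the same target position as pixel 30 and is overwritten (occluded), while position 0 is left as a hole for later filling.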

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, the multiplexer-demultiplexer units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder and decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards, such as the MVC extension of ITU-T H.264/AVC. The latest joint draft of MVC is described in "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264, March 2010. In particular, video encoder 20 and video decoder 30 may operate according to 3D and/or multi-view coding standards, including a 3D extension of the HEVC standard (e.g., 3D-HEVC).

A draft of the HEVC standard, referred to as "HEVC Working Draft 10" or "WD10," is described in document JCTVC-L1003v34, "High Efficiency Video Coding (HEVC) Text Specification Draft 10 (for FDIS & Last Call)," by Bross et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting, Geneva, Switzerland, January 14-23, 2013, which, as of August 22, 2014, is downloadable from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip.

Another draft of the HEVC standard, referred to herein as the "WD10 revisions," is described in "Editors' Proposed Corrections to HEVC Version 1" by Bross et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 13th Meeting, Incheon, South Korea, April 2013, which, as of August 22, 2014, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M0432-v3.zip. A multi-view extension to HEVC, namely MV-HEVC, is also being developed by the JCT-3V.

Currently, the Joint Collaboration Team on 3D Video Coding (JCT-3C) of VCEG and MPEG is developing a 3DV standard based on HEVC. Part of this standardization effort includes the standardization of the multi-view video codec based on HEVC (MV-HEVC), and another part covers 3D video coding based on HEVC (3D-HEVC). For MV-HEVC, it should be guaranteed that there are only high-level syntax (HLS) changes, such that no module at the coding unit/prediction unit level in HEVC needs to be redesigned and such modules can be fully reused for MV-HEVC. For 3D-HEVC, new coding tools, including those at the coding unit/prediction unit level, for both texture and depth views may be included and supported.

One version of the 3D-HEVC software, 3D-HTM, can be downloaded from the following link: [3D-HTM version 8.0]: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-8.0/. A working draft of 3D-HEVC (document number: E1001) is available from http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1361. The latest software description (document number: E1005) is available from http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1360.

A more recent version of the 3D-HTM software for 3D-HEVC can be downloaded from the following link: [3D-HTM version 12.0]: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-12.0/. The corresponding working draft of 3D-HEVC (document number: I1001) is available from http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2299. The latest software description (document number: I1005) is available from http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2301.

Initially, example coding techniques of HEVC will be discussed. The HEVC standardization efforts were based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumed several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three angular intra-prediction encoding modes plus DC and planar modes.

In HEVC and other video coding specifications, a video sequence typically includes a series of pictures. Pictures may also be referred to as "frames." A picture may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as "chroma" samples. In other instances, a picture may be monochrome and may only include an array of luma samples.

To generate an encoded representation of a picture, video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In monochrome pictures or pictures having three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block may be an NxN block of samples. A CTU may also be referred to as a "tree block" or a "largest coding unit" (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in raster scan order.

To generate a coded CTU, video encoder 20 may recursively perform quadtree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units." A coding block is an NxN block of samples. A coding unit (CU) may comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In monochrome pictures or pictures having three separate color planes, a CU may comprise a single coding block and syntax structures used to code the samples of the coding block.
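As an illustration, the recursive quadtree partitioning of a coding tree block into coding blocks can be sketched as follows. This is a simplified sketch, not the HEVC-specified process; the split decision function, block representation, and all names here are hypothetical.

```python
def split_ctb(x, y, size, min_size, needs_split):
    """Recursively quadtree-split a coding tree block into coding blocks.

    needs_split(x, y, size) is a caller-supplied decision; in a real encoder
    this decision would typically be driven by rate-distortion optimization.
    """
    if size > min_size and needs_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):       # visit the four quadrants
            for dx in (0, half):
                blocks += split_ctb(x + dx, y + dy, half, min_size, needs_split)
        return blocks
    return [(x, y, size)]          # leaf: one coding block

# Example: split a 64x64 CTB wherever blocks are still larger than 32 samples.
coding_blocks = split_ctb(0, 0, 64, 8, lambda x, y, s: s > 32)
```

With this toy split rule, the 64x64 block is divided exactly once, yielding four 32x32 coding blocks.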

Video encoder 20 may partition a coding block of a CU into one or more prediction blocks. A prediction block is a rectangular (i.e., square or non-square) block of samples on which the same prediction is applied. A prediction unit (PU) of a CU may comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome pictures or pictures having three separate color planes, a PU may comprise a single prediction block and syntax structures used to predict the prediction block. Video encoder 20 may generate predictive luma, Cb, and Cr blocks for the luma, Cb, and Cr prediction blocks of each PU of the CU.

Video encoder 20 may use intra prediction or inter prediction to generate the predictive blocks for a PU. If video encoder 20 uses intra prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of the picture associated with the PU. In some versions of HEVC, for the luma component of each PU, an intra-prediction method is utilized with 33 angular prediction modes (indexed from 2 to 34), DC mode (indexed with 1), and planar mode (indexed with 0).

If video encoder 20 uses inter prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of one or more pictures other than the picture associated with the PU. Inter prediction may be uni-directional inter prediction (i.e., uni-prediction or uni-predictive prediction) or bi-directional inter prediction (i.e., bi-prediction or bi-predictive prediction). To perform uni-prediction or bi-prediction, video encoder 20 may generate a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) for the current slice. Each of the reference picture lists may include one or more reference pictures. When uni-prediction is used, video encoder 20 may search the reference pictures in either or both of RefPicList0 and RefPicList1 to determine a reference location within a reference picture. Furthermore, when uni-prediction is used, video encoder 20 may generate, based at least in part on samples corresponding to the reference location, a predictive sample block for the PU. Moreover, when uni-prediction is used, video encoder 20 may generate a single motion vector that indicates a spatial displacement between a prediction block of the PU and the reference location. To indicate the spatial displacement between the prediction block of the PU and the reference location, the motion vector may include a horizontal component specifying a horizontal displacement between the prediction block of the PU and the reference location and may include a vertical component specifying a vertical displacement between the prediction block of the PU and the reference location.

When bi-prediction is used to encode a PU, video encoder 20 may determine a first reference location in a reference picture in RefPicList0 and a second reference location in a reference picture in RefPicList1. Video encoder 20 may then generate, based at least in part on samples corresponding to the first and second reference locations, the predictive blocks for the PU. Moreover, when bi-prediction is used to encode the PU, video encoder 20 may generate a first motion vector indicating a spatial displacement between a sample block of the PU and the first reference location and a second motion vector indicating a spatial displacement between a prediction block of the PU and the second reference location.
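A minimal sketch of how samples from the two reference locations could be combined into a bi-predictive block follows. It uses an illustrative plain average with a rounding offset; HEVC's normative process additionally supports weighted prediction and operates on interpolated sub-sample positions, and all names here are hypothetical.

```python
def bi_predict(ref_block0, ref_block1):
    """Form a bi-predictive block by averaging co-located samples fetched
    from the RefPicList0 and RefPicList1 reference locations."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref_block0, ref_block1)]

pred = bi_predict([[100, 102], [98, 96]], [[104, 100], [98, 100]])
```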

Typically, reference picture list construction for the first or second reference picture list (e.g., RefPicList0 or RefPicList1) of a B picture includes two steps: reference picture list initialization and reference picture list reordering (modification). Reference picture list initialization is an explicit mechanism that puts the reference pictures in the reference picture memory (also known as the decoded picture buffer) into a list based on the order of POC (picture order count, aligned with the display order of a picture) values. The reference picture list reordering mechanism can modify the position of a picture that was placed in the list during reference picture list initialization to any new position, or place any reference picture in the reference picture memory at any position, even if the picture does not belong to the initialized list. Some pictures may, after the reference picture list reordering (modification), be placed at further positions in the list. However, if the position of a picture exceeds the number of active reference pictures of the list, the picture is not considered an entry of the final reference picture list. The number of active reference pictures may be signaled in the slice header for each list.
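The initialization step can be illustrated with a simplified sketch. The function below and its data layout are hypothetical; the normative HEVC process also distinguishes short-term and long-term pictures and works from reference picture set subsets.

```python
def init_ref_pic_list0(current_poc, dpb_pocs, num_active):
    """Simplified RefPicList0 initialization for a B picture: reference
    pictures preceding the current picture in POC order come first (closest
    first), followed by pictures following it (closest first), truncated to
    the number of active reference pictures."""
    before = sorted((p for p in dpb_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in dpb_pocs if p > current_poc)
    return (before + after)[:num_active]

# DPB holds pictures with POC 0, 2, 6, 8; the current picture has POC 4.
ref_list0 = init_ref_pic_list0(4, [0, 2, 6, 8], num_active=3)
```

A reference index of 0 then refers to the closest preceding picture (POC 2) in this toy example.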

After the reference picture lists are constructed (namely RefPicList0 and RefPicList1, if available), a reference index into a reference picture list can be used to identify any reference picture included in that reference picture list.

After video encoder 20 generates predictive luma, Cb, and Cr blocks for one or more PUs of a CU, video encoder 20 may generate a luma residual block for the CU. Each sample in the CU's luma residual block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, video encoder 20 may generate a Cb residual block for the CU. Each sample in the CU's Cb residual block may indicate a difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the CU's Cr residual block may indicate a difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
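The per-sample residual computation described above amounts to a simple difference, sketched here for one block (illustrative names only; the same operation applies to luma, Cb, and Cr).

```python
def residual_block(original, predictive):
    """Each residual sample is the original sample minus the co-located
    sample of the predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predictive)]

res = residual_block([[120, 130], [110, 100]], [[118, 133], [110, 96]])
```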

Furthermore, video encoder 20 may use quadtree partitioning to decompose the luma, Cb, and Cr residual blocks of a CU into one or more luma, Cb, and Cr transform blocks. A transform block is a rectangular (e.g., square or non-square) block of samples on which the same transform is applied. A transform unit (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with a TU may be a sub-block of the CU's luma residual block. The Cb transform block may be a sub-block of the CU's Cb residual block. The Cr transform block may be a sub-block of the CU's Cr residual block. In monochrome pictures or pictures having three separate color planes, a TU may comprise a single transform block and syntax structures used to transform the samples of the transform block.

Video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar quantity. Video encoder 20 may apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.

After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. After video encoder 20 quantizes a coefficient block, video encoder 20 may entropy encode syntax elements indicating the quantized transform coefficients. For example, video encoder 20 may perform context-adaptive binary arithmetic coding (CABAC) on the syntax elements indicating the quantized transform coefficients.
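Conceptually, quantization maps each transform coefficient to a smaller-range level, for example by uniform scalar quantization. The sketch below is purely illustrative; HEVC's actual quantizer is specified with integer scaling tables and a QP-dependent step size.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: divide by the step size and round."""
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization recovers only an approximation of the input,
    which is why quantization is the lossy stage of the pipeline."""
    return [lvl * step for lvl in levels]

levels = quantize([10.2, -3.9, 0.4], step=2.0)
```

Note how dequantizing `levels` with the same step yields 10.0, -4.0, 0.0 rather than the original coefficients.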

Video encoder 20 may output a bitstream that includes a sequence of bits that forms a representation of coded pictures and associated data. The bitstream may comprise a sequence of network abstraction layer (NAL) units. A NAL unit is a syntax structure containing an indication of the type of data in the NAL unit and bytes containing that data in the form of a raw byte sequence payload (RBSP) interspersed as necessary with emulation prevention bits. Each of the NAL units includes a NAL unit header and encapsulates an RBSP. The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP may be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.

Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate an RBSP for a coded slice, a third type of NAL unit may encapsulate an RBSP for SEI, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units.

Video decoder 30 may receive a bitstream generated by video encoder 20. In addition, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the pictures of the video data based at least in part on the syntax elements obtained from the bitstream. The process to reconstruct the video data may be generally reciprocal to the process performed by video encoder 20. For instance, video decoder 30 may use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the current CU. Video decoder 30 may perform inverse transforms on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for the PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of a picture, video decoder 30 may reconstruct the picture.

In some examples, video encoder 20 may signal the motion information of a PU using merge mode or advanced motion vector prediction (AMVP) mode. In other words, in HEVC there are two modes for the prediction of motion parameters, one being the merge mode and the other being AMVP. Motion prediction may comprise the determination of motion information of a video unit (e.g., a PU) based on the motion information of one or more other video units. The motion information of a PU may include a motion vector of the PU and a reference index of the PU.

When video encoder 20 signals the motion information of a current PU using merge mode, video encoder 20 generates a merge candidate list. In other words, video encoder 20 may perform a motion vector predictor list construction process. The merge candidate list includes a set of merge candidates that indicate the motion information of PUs that spatially or temporally neighbor the current PU. That is, in merge mode, a candidate list of motion parameters (e.g., reference indexes, motion vectors, etc.) is constructed, where a candidate can be from spatial and temporal neighboring blocks. In some examples, the candidates may also include artificially generated candidates.

Furthermore, in merge mode, video encoder 20 may select a merge candidate from the merge candidate list and may use the motion information indicated by the selected merge candidate as the motion information of the current PU. Video encoder 20 may signal the position in the merge candidate list of the selected merge candidate. For instance, video encoder 20 may signal the selected motion vector parameters by transmitting an index into the candidate list. Video decoder 30 may obtain, from the bitstream, the index into the candidate list (i.e., a candidate list index). In addition, video decoder 30 may generate the same merge candidate list and may determine, based on the indication of the position of the selected merge candidate, the selected merge candidate. Video decoder 30 may then use the motion information of the selected merge candidate to generate the predictive blocks for the current PU. That is, video decoder 30 may determine, based at least in part on the candidate list index, a selected candidate in the candidate list, wherein the selected candidate specifies the motion vector for the current PU. In this way, at the decoder side, once the index is decoded, all motion parameters of the corresponding block to which the index points may be inherited by the current PU.
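At the decoder side, merge-mode inheritance can be sketched as a plain table lookup. The data layout below is hypothetical; a real decoder stores reference indexes and motion vectors for both reference picture lists.

```python
def merge_motion(merge_candidate_list, merge_idx):
    """In merge mode the current PU inherits all motion parameters of the
    candidate selected by the decoded merge index."""
    return merge_candidate_list[merge_idx]

candidates = [{"mv": (4, 0), "ref_idx": 0},   # e.g., from a spatial neighbor
              {"mv": (-2, 1), "ref_idx": 1}]  # e.g., from a temporal neighbor
motion = merge_motion(candidates, merge_idx=1)
```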

Skip mode is similar to merge mode. In skip mode, video encoder 20 and video decoder 30 generate and use a merge candidate list in the same way that video encoder 20 and video decoder 30 use the merge candidate list in merge mode. However, when video encoder 20 signals the motion information of a current PU using skip mode, video encoder 20 does not signal any residual data for the current PU. Accordingly, video decoder 30 may determine, without the use of residual data, a prediction block for the PU based on a reference block indicated by the motion information of a selected candidate in the merge candidate list.

AMVP mode is similar to merge mode in that video encoder 20 may generate a candidate list and may select a candidate from the candidate list. However, when video encoder 20 signals the RefPicListX motion information of a current PU using AMVP mode, video encoder 20 may signal a RefPicListX motion vector difference (MVD) for the current PU and a RefPicListX reference index for the current PU in addition to signaling a RefPicListX MVP flag for the current PU. The RefPicListX MVP flag for the current PU may indicate the position of a selected AMVP candidate in the AMVP candidate list. The RefPicListX MVD for the current PU may indicate a difference between the RefPicListX motion vector of the current PU and the motion vector of the selected AMVP candidate. In this way, video encoder 20 may signal the RefPicListX motion information of the current PU by signaling a RefPicListX motion vector predictor (MVP) flag, a RefPicListX reference index value, and a RefPicListX MVD. In other words, the data in the bitstream representing the motion vector for the current PU may include data representing a reference index, an index into the candidate list, and an MVD.

Furthermore, when the motion information of a current PU is signaled using AMVP mode, video decoder 30 may obtain, from the bitstream, the MVD and the MVP flag for the current PU. Video decoder 30 may generate the same AMVP candidate list and may determine, based on the MVP flag, the selected AMVP candidate. Video decoder 30 may recover the motion vector of the current PU by adding the MVD to the motion vector indicated by the selected AMVP candidate. That is, video decoder 30 may determine, based on the motion vector indicated by the selected AMVP candidate and the MVD, the motion vector of the current PU. Video decoder 30 may then use the recovered motion vector or motion vectors of the current PU to generate the predictive blocks for the current PU.
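The motion vector recovery in AMVP mode reduces to selecting a predictor and adding the signaled difference, as in this sketch (all names are hypothetical):

```python
def recover_amvp_mv(amvp_candidate_list, mvp_flag, mvd):
    """Recover the PU's motion vector: pick the predictor indicated by the
    MVP flag, then add the motion vector difference componentwise."""
    mvp_x, mvp_y = amvp_candidate_list[mvp_flag]
    return (mvp_x + mvd[0], mvp_y + mvd[1])

# Two candidate predictors; the MVP flag selects the second one.
mv = recover_amvp_mv([(4, -2), (0, 1)], mvp_flag=1, mvd=(3, -1))
```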

When video decoder 30 generates an AMVP candidate list for a current PU, video decoder 30 may derive one or more AMVP candidates based on the motion information of PUs that cover locations that spatially neighbor the current PU (i.e., spatially neighboring PUs). A PU may cover a location when the prediction block of the PU includes that location.

A candidate in a merge candidate list or an AMVP candidate list that is based on the motion information of a PU that temporally neighbors the current PU (i.e., a PU that is in a different time instance than the current PU) may be referred to as a TMVP (temporal motion vector predictor). A TMVP may be used to improve the coding efficiency of HEVC and, unlike other coding tools, a TMVP may need to access the motion vector of a frame in the decoded picture buffer, more specifically in a reference picture list.

The use of TMVPs may be enabled or disabled on a CVS-by-CVS (coded video sequence) basis, a slice-by-slice basis, or on another basis. A syntax element (e.g., sps_temporal_mvp_enable_flag) in an SPS may indicate whether the use of TMVPs is enabled for a CVS. Furthermore, when the use of TMVPs is enabled for a CVS, the use of TMVPs may be enabled or disabled for particular slices within the CVS. For instance, a syntax element (e.g., slice_temporal_mvp_enable_flag) in a slice header may indicate whether the use of TMVPs is enabled for a slice. Thus, in an inter-predicted slice, when the TMVP is enabled for a whole CVS (e.g., sps_temporal_mvp_enable_flag in the SPS is set to 1), slice_temporal_mvp_enable_flag is signaled in the slice header to indicate whether the TMVP is enabled for the current slice.

To determine a TMVP, a video coder may first identify a reference picture that includes a PU that is co-located with the current PU. In other words, the video coder may identify a so-called co-located picture. If the current slice of the current picture is a B slice (i.e., a slice that is allowed to include bi-directionally inter-predicted PUs), video encoder 20 may signal, in a slice header, a syntax element (e.g., collocated_from_l0_flag) that indicates whether the co-located picture is from RefPicList0 or RefPicList1. In other words, when the use of TMVPs is enabled for a current slice, and the current slice is a B slice (e.g., a slice that is allowed to include bi-directionally inter-predicted PUs), video encoder 20 may signal a syntax element (e.g., collocated_from_l0_flag) in a slice header to indicate whether the co-located picture is in RefPicList0 or RefPicList1. In other words, to get a TMVP, firstly the co-located picture is to be identified. If the current picture is a B slice, collocated_from_l0_flag is signaled in the slice header to indicate whether the co-located picture is from RefPicList0 or from RefPicList1.

在视频解码器30识别包含位于同一地点的图片的参考图片列表之后,视频解码器30可使用可在切片标头中用信号发送的另一语法元素(例如,collocated_ref_idx)来识别所识别的参考图片列表中的图片(即,位于同一地点的图片)。即,在识别参考图片列表之后,在切片标头中用信号表示的collocated_ref_idx用以识别参考图片列表中的图片。After video decoder 30 identifies a reference picture list that includes a co-located picture, video decoder 30 may use another syntax element (e.g., collocated_ref_idx) that may be signaled in a slice header to identify the picture in the identified reference picture list (i.e., the co-located picture). That is, after identifying a reference picture list, collocated_ref_idx signaled in the slice header is used to identify the picture in the reference picture list.
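
以下非规范性Python草图(假设性示例)说明如何按上文所述从collocated_from_l0_flag与collocated_ref_idx选出位于同一地点的图片。The following non-normative Python sketch (a hypothetical example) illustrates selecting the co-located picture from collocated_from_l0_flag and collocated_ref_idx as described above.

```python
def find_collocated_picture(slice_type, collocated_from_l0_flag,
                            collocated_ref_idx, ref_pic_list0,
                            ref_pic_list1=None):
    # For a B slice, collocated_from_l0_flag selects the reference picture
    # list; for other inter-predicted slices only RefPicList0 is available.
    if slice_type == 'B' and collocated_from_l0_flag == 0:
        return ref_pic_list1[collocated_ref_idx]
    return ref_pic_list0[collocated_ref_idx]
```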

视频译码器可通过检查位于同一地点的图片来识别位于同一地点的PU。TMVP可指示含有位于同一地点的PU的CU的右下方PU的运动信息或含有此PU的CU的中心PU内的右下方PU的运动信息。因此,使用含有此PU的CU的右下方PU的运动或含有此PU的CU的中心PU内的右下方PU的运动。含有位于同一地点的PU的CU的右下方PU可为覆盖直接在所述PU的预测块的右下方样本的右下方的位置的PU。换句话说,TMVP可指示在参考图片中且覆盖与当前PU的右下方拐角位于同一地点的位置的PU的运动信息,或TMVP可指示在参考图片中且覆盖与当前PU的中心位于同一地点的位置的PU的运动信息。The video coder can identify the co-located PU by examining the co-located picture. The TMVP may indicate the motion information of the bottom-right PU of the CU containing the co-located PU or the motion information of the bottom-right PU within the center PU of the CU containing the co-located PU. Therefore, the motion of the bottom-right PU of the CU containing the co-located PU or the motion of the bottom-right PU within the center PU of the CU containing the co-located PU is used. The bottom-right PU of the CU containing the co-located PU may be the PU that covers a position directly below and to the right of the bottom-right sample of the prediction block of the co-located PU. In other words, the TMVP may indicate the motion information of the PU in the reference picture that covers a position co-located with the bottom-right corner of the current PU, or the TMVP may indicate the motion information of the PU in the reference picture that covers a position co-located with the center of the current PU.

当由以上过程识别的运动向量(即,TMVP的运动向量)用以产生用于合并模式或AMVP模式的运动候选者时,视频译码器可基于时间位置(由POC值反映)按比例缩放所述运动向量。举例来说,当当前图片与参考图片的POC值之间的差较大时,视频译码器可比所述差较小时将运动向量的量值增大得更多。When the motion vector identified by the above process (i.e., the motion vector of the TMVP) is used to generate a motion candidate for merge mode or AMVP mode, the video coder may scale the motion vector based on the temporal position (reflected by the POC value). For example, the video coder may increase the magnitude of the motion vector by a greater amount when the difference between the POC values of the current picture and the reference picture is larger than when that difference is smaller.
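
作为简化说明,以下Python草图按POC距离的比例缩放运动向量;实际编解码器使用带裁剪的定点运算,此处仅为假设性示例。As a simplified illustration, the following Python sketch scales a motion vector proportionally to POC distances; a real codec uses fixed-point arithmetic with clipping, so this is a hypothetical example only.

```python
def scale_tmvp(mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    # tb: POC distance the scaled vector must span (current picture to its
    # reference); td: POC distance spanned by the co-located motion vector.
    tb = cur_poc - cur_ref_poc
    td = col_poc - col_ref_poc
    if td == 0:
        return mv
    # Simplified proportional scaling of both vector components.
    return (mv[0] * tb // td, mv[1] * tb // td)
```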

从TMVP导出的时间合并候选者的所有可能的参考图片列表的目标参考索引可始终设定成0。然而,对于AMVP,所有可能的参考图片的目标参考索引设定成等于经解码参考索引。换句话说,将从TMVP导出的时间合并候选者的所有可能参考图片列表的目标参考索引设定为0,而对于AMVP,其经设定为等于经解码参考索引。在HEVC中,SPS可包含旗标(例如,sps_temporal_mvp_enable_flag)且当sps_temporal_mvp_enable_flag等于1时切片标头可包含旗标(例如,pic_temporal_mvp_enable_flag)。当对于特定图片,pic_temporal_mvp_enable_flag与temporal_id两者都等于0时,在所述特定图片或按解码次序在所述特定图片之后的图片的解码中并不将来自按解码次序在所述特定图片之前的图片的运动向量用作TMVP。The target reference indexes of all possible reference picture lists for temporal merging candidates derived from TMVP may always be set to 0. However, for AMVP, the target reference indexes of all possible reference pictures are set equal to the decoded reference indexes. In other words, the target reference indexes of all possible reference picture lists for temporal merging candidates derived from TMVP are set to 0, while for AMVP, they are set equal to the decoded reference indexes. In HEVC, the SPS may include a flag (e.g., sps_temporal_mvp_enable_flag) and the slice header may include a flag (e.g., pic_temporal_mvp_enable_flag) when sps_temporal_mvp_enable_flag is equal to 1. When pic_temporal_mvp_enable_flag and temporal_id are both equal to 0 for a particular picture, motion vectors from pictures that precede the particular picture in decoding order are not used as TMVP in the decoding of the particular picture or pictures that follow the particular picture in decoding order.

在接下来的部分中,将论述多视图(例如,H.264/MVC中)和多视图加深度(例如,3D-HEVC中)译码技术。最初,将论述MVC技术。如上所述,MVC是ITU-T H.264/AVC的多视图译码扩展。在MVC中,按时间优先次序对多个视图的数据进行译码,且因此,解码次序布置被称作时间优先译码。确切地说,可对在共同时间实例处的多个视图中的每一者的视图分量(即,图片)进行译码,接着可对用于不同时间实例的视图分量的另一集合进行译码,诸如此类。存取单元可包含用于一个输出时间实例的所有视图的经译码图片。应理解,存取单元的解码次序未必等同于输出(或显示)次序。In the following sections, multi-view (e.g., in H.264/MVC) and multi-view plus depth (e.g., in 3D-HEVC) coding techniques will be discussed. Initially, MVC techniques will be discussed. As described above, MVC is a multi-view coding extension of ITU-T H.264/AVC. In MVC, data for multiple views is coded in a time-first order, and therefore, the decoding order arrangement is referred to as time-first coding. Specifically, the view components (i.e., pictures) of each of multiple views at a common time instance may be coded, followed by another set of view components for a different time instance, and so on. An access unit may include coded pictures for all views for one output time instance. It should be understood that the decoding order of access units is not necessarily equivalent to the output (or display) order.

图2中展示了典型MVC解码次序(即,位流次序)。解码次序布置被称作时间优先译码。应注意,存取单元的解码顺序可并不相同于输出或显示顺序。在图2中,S0到S7各自指代多视图视频的不同视图。T0到T8各自表示一个输出时间实例。存取单元可包含针对一个输出时间实例的所有视图的经译码图片。举例来说,第一存取单元可包含针对时间实例T0的所有视图S0到S7,第二存取单元可包含针对时间实例T1的所有视图S0到S7,等。A typical MVC decoding order (i.e., bitstream order) is shown in FIG2 . This decoding order arrangement is referred to as time-first coding. Note that the decoding order of access units may not be the same as the output or display order. In FIG2 , S0 to S7 each refer to a different view of the multi-view video. T0 to T8 each represent an output time instance. An access unit may include coded pictures for all views for one output time instance. For example, a first access unit may include all views S0 to S7 for time instance T0, a second access unit may include all views S0 to S7 for time instance T1, and so on.
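
以下非规范性Python草图(假设性示例)说明上述时间优先的存取单元排序。The following non-normative Python sketch (a hypothetical example) illustrates the time-first access-unit ordering described above.

```python
def time_first_order(views, time_instances):
    # One access unit per output time instance, containing the coded
    # pictures of every view at that instance.
    return [[(view, t) for view in views] for t in time_instances]
```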

出于简洁目的,本发明可使用以下定义:For the purposes of brevity, the following definitions may be used herein:

视图分量:单个存取单元中的视图的经译码表示。当视图包含经译码纹理及深度表示两者时,视图分量由纹理视图分量及深度视图分量构成。View component: A coded representation of a view in a single access unit. When a view includes both coded texture and depth representations, a view component consists of a texture view component and a depth view component.

纹理视图分量:单个存取单元中的视图的纹理的经译码表示。Texture view component: A coded representation of the texture of a view in a single access unit.

深度视图分量:单个存取单元中的视图的深度的经译码表示。Depth view component: A coded representation of the depth of a view in a single access unit.

在图2中,所述视图中的每一者包含若干图片集合。举例来说,视图S0包含图片集合0、8、16、24、32、40、48、56及64,视图S1包含图片集合1、9、17、25、33、41、49、57及65,等。对于3D视频译码,例如3D-HEVC,每一图片可包含两个分量图片:一个分量图片称为纹理视图分量,且另一分量图片称为深度视图分量。视图的一图片集合内的纹理视图分量及深度视图分量可被视为对应于彼此。举例来说,视图的一组图片内的纹理视图分量被视为对应于视图的图片的所述组内的深度视图分量,且反过来也一样(即,深度视图分量对应于其在所述组中的纹理视图分量,且反过来也一样)。如本发明中所使用,对应于深度视图分量的纹理视图分量可被视为纹理视图分量且深度视图分量为单一存取单元的同一视图的部分。In FIG2 , each of the views includes several picture sets. For example, view S0 includes picture sets 0, 8, 16, 24, 32, 40, 48, 56, and 64, view S1 includes picture sets 1, 9, 17, 25, 33, 41, 49, 57, and 65, and so on. For 3D video coding, such as 3D-HEVC, each picture may include two component pictures: one component picture is called a texture view component, and the other component picture is called a depth view component. Texture view components and depth view components within a picture set of a view may be considered to correspond to each other. For example, a texture view component within a set of pictures of a view is considered to correspond to a depth view component within the set of pictures of the view, and vice versa (i.e., a depth view component corresponds to its texture view component in the set, and vice versa). As used in this disclosure, a texture view component corresponding to a depth view component may be considered to be part of the same view of a single access unit.

纹理视图分量包含所显示的实际图像内容。举例来说,所述纹理视图分量可包含明度(Y)及色度(Cb及Cr)分量。深度视图分量可指示其对应纹理视图分量中的像素的相对深度。作为一个实例,深度视图分量为仅包含明度值的灰阶图像。换句话说,深度视图分量可不传达任何图像内容,而是提供纹理视图分量中的像素的相对深度的量度。The texture view component contains the actual image content being displayed. For example, the texture view component may include luma (Y) and chroma (Cb and Cr) components. The depth view component may indicate the relative depth of pixels in its corresponding texture view component. As an example, the depth view component is a grayscale image that only includes luma values. In other words, the depth view component may not convey any image content, but rather provides a measure of the relative depth of pixels in the texture view component.

举例来说,深度视图分量中的纯白色像素指示对应纹理视图分量中的其对应像素较接近于观察者的视角,且深度视图分量中的纯黑色像素指示对应纹理视图分量中的其对应像素距观察者的视角较远。黑色与白色之间的各种灰度渐变指示不同深度水平。举例来说,深度视图分量中的深灰色像素指示纹理视图分量中的其对应像素比深度视图分量中的浅灰色像素更远。因为仅需要灰阶来识别像素的深度,因此深度视图分量不需要包含色度分量,因为深度视图分量的色彩值可能不服务于任何目的。For example, a pure white pixel in a depth view component indicates that its corresponding pixel in the corresponding texture view component is closer to the viewer's perspective, and a pure black pixel in a depth view component indicates that its corresponding pixel in the corresponding texture view component is farther from the viewer's perspective. Various shades of gray between black and white indicate different depth levels. For example, a dark gray pixel in a depth view component indicates that its corresponding pixel in the texture view component is farther away than a light gray pixel in the depth view component. Because only grayscale is needed to identify the depth of a pixel, the depth view component does not need to include chroma components, as the color values of the depth view component may not serve any purpose.

仅使用明度值(例如,强度值)来识别深度的深度视图分量是出于说明的目的而提供,且不应被视为限制性。在其它实例中,可利用任何技术来指示纹理视图分量中的像素的相对深度。The use of only luma values (eg, intensity values) to identify depth view components is provided for purposes of illustration and should not be considered limiting. In other examples, any technique may be utilized to indicate the relative depth of pixels in a texture view component.

图3中展示了用于多视图视频译码的典型MVC预测结构(包含每一视图内的图片间预测及视图间预测两者)。预测方向由箭头指示,箭头指向的对象使用箭头出发的对象作为预测参考。在MVC中,由视差运动补偿支持视图间预测,所述视差运动补偿使用H.264/AVC运动补偿的语法但允许将不同视图中的图片用作参考图片。A typical MVC prediction structure for multi-view video coding (including both inter-picture prediction within each view and inter-view prediction) is shown in Figure 3. The prediction direction is indicated by an arrow, and the object pointed to by the arrow uses the object from which the arrow originates as a prediction reference. In MVC, inter-view prediction is supported by disparity motion compensation, which uses the syntax of H.264/AVC motion compensation but allows pictures in different views to be used as reference pictures.

在图3的实例中,说明八个视图(具有视图ID“S0”到“S7”),且对于每一视图说明十二个时间位置(“T0”到“T11”)。即,图3中的每一行对应于一视图,而每一列指示一时间位置。3, eight views are illustrated (with view IDs "S0" through "S7"), and twelve temporal positions ("T0" through "T11") are illustrated for each view. That is, each row in FIG3 corresponds to a view, and each column indicates a temporal position.

尽管MVC具有可由H.264/AVC解码器解码的所谓的基础视图,且MVC还可支持立体视图对,但MVC的优点在于其可支持使用两个以上视图作为3D视频输入且解码通过多个视图表示的此3D视频的实例。具有MVC解码器的客户端的再现器可预期具有多个视图的3D视频内容。Although MVC has so-called base views that can be decoded by H.264/AVC decoders, and MVC can also support stereoscopic view pairs, an advantage of MVC is that it can support instances where more than two views are used as 3D video input and such 3D video is decoded using multiple views. A renderer of a client with an MVC decoder can expect 3D video content with multiple views.

在每一行及每一列的交叉点处指示图3中的图片。H.264/AVC标准可使用术语帧来表示视频的一部分。本发明可互换地使用术语图片与帧。3 is indicated at the intersection of each row and each column. The H.264/AVC standard may use the term frame to refer to a portion of a video. This disclosure may use the terms picture and frame interchangeably.

使用包含字母的块来说明图3中的图片,字母指明对应图片是经帧内译码(即,I图片),还是在一个方向上经帧间译码(即,作为P图片),还是在多个方向上经帧间译码(即,作为B图片)。大体来说,预测由箭头指示,其中箭头指向的图片使用箭头出发的图片用于预测参考。举例来说,时间位置T0处的视图S2的P图片是从时间位置T0处的视图S0的I图片预测的。The pictures in FIG3 are illustrated using blocks containing letters that indicate whether the corresponding picture is intra-coded (i.e., an I-picture), inter-coded in one direction (i.e., as a P-picture), or inter-coded in multiple directions (i.e., as a B-picture). Generally, prediction is indicated by an arrow, where the picture to which the arrow points uses the picture from which the arrow originates for prediction reference. For example, the P-picture of view S2 at temporal location T0 is predicted from the I-picture of view S0 at temporal location T0.

如同单个视图视频编码,可相对于不同时间位置处的图片预测性地编码多视图视频译码视频序列的图片。举例来说,时间位置T1处的视图S0的b图片具有从时间位置T0处的视图S0的I图片指向其的箭头,从而指示所述b图片是从所述I图片预测的。然而,另外,在多视图视频编码的情况下,图片可经视图间预测。也就是说,视图分量可使用其它视图中的视图分量用于参考。举例来说,在MVC中,如同另一视图中的视图分量为帧间预测参考而实现视图间预测。潜在视图间参考在序列参数集(SPS)MVC扩展中用信号通知且可通过参考图片列表构造过程加以修改,所述参考图片列表构造过程实现帧间预测或视图间预测参考的灵活排序。视图间预测也是包含3D-HEVC(多视图加深度)的HEVC的所提出的多视图扩展的特征。As with single-view video coding, pictures of a multi-view video coding video sequence can be predictively encoded relative to pictures at different temporal locations. For example, a b-picture of view S0 at temporal location T1 has an arrow pointing to it from an I-picture of view S0 at temporal location T0, indicating that the b-picture is predicted from the I-picture. However, in addition, with multi-view video coding, pictures can be inter-view predicted. That is, a view component can use view components in other views for reference. For example, in MVC, inter-view prediction is achieved as if a view component in another view is an inter-prediction reference. Potential inter-view references are signaled in the Sequence Parameter Set (SPS) MVC extension and can be modified through the reference picture list construction process, which enables flexible ordering of inter-prediction or inter-view prediction references. Inter-view prediction is also a feature of proposed multi-view extensions of HEVC, including 3D-HEVC (Multi-view Plus Depth).

图3提供视图间预测的各种实例。在图3的实例中,视图S1的图片说明为是从视图S1的不同时间位置处的图片预测,以及是从相同时间位置处的视图S0和S2的图片经视图间预测。举例来说,时间位置T1处的视图S1的b图片是从时间位置T0及T2处的视图S1的B图片中的每一者以及时间位置T1处的视图S0及S2的b图片预测。FIG3 provides various examples of inter-view prediction. In the example of FIG3, pictures of view S1 are illustrated as being predicted from pictures at different temporal locations of view S1, as well as being inter-view predicted from pictures of views S0 and S2 at the same temporal location. For example, the b-picture of view S1 at temporal location T1 is predicted from each of the B-pictures of view S1 at temporal locations T0 and T2, as well as the b-pictures of views S0 and S2 at temporal location T1.

在一些实例中,图3可被看作说明纹理视图分量。举例来说,图2中所说明的I、P、B和b图片可被视为视图中的每一者的纹理视图分量。根据本发明中描述的技术,对于图3中所说明的纹理视图分量中的每一者,存在对应深度视图分量。在一些实例中,可以类似于图3中针对对应纹理视图分量所说明的方式的方式预测深度视图分量。In some examples, FIG3 can be viewed as illustrating texture view components. For example, the I, P, B, and b pictures illustrated in FIG2 can be viewed as texture view components for each of the views. According to the techniques described in this disclosure, for each of the texture view components illustrated in FIG3, there is a corresponding depth view component. In some examples, the depth view component can be predicted in a manner similar to that illustrated in FIG3 for the corresponding texture view components.

MVC中也可支持两个视图的译码。MVC的优点中的一个优点是:MVC编码器可将两个以上视图视为3D视频输入且MVC解码器可解码此类多视图表示。因此,具有MVC解码器的任何再现器可预期具有两个以上视图的3D视频内容。MVC can also support the decoding of two views. One of the advantages of MVC is that an MVC encoder can treat more than two views as 3D video input and an MVC decoder can decode such multi-view representations. Therefore, any renderer with an MVC decoder can expect 3D video content with more than two views.

在MVC中,允许在相同存取单元(即,具有相同时间例子)中的图片当中进行视图间预测。在译码非基础视图中的一者中的图片时,如果图片在不同视图中,但在相同时间实例内,那么可将图片添加到参考图片列表中。可将视图间参考图片放置在参考图片列表的任何位置中,正如任何帧间预测参考图片一般。如图3中所示,视图分量可出于参考目的使用其它视图中的视图分量。在MVC中,如同另一视图中的视图分量为帧间预测参考般实现视图间预测。In MVC, inter-view prediction is allowed among pictures in the same access unit (i.e., with the same temporal instance). When coding a picture in one of the non-base views, the picture can be added to the reference picture list if it is in a different view but in the same temporal instance. An inter-view reference picture can be placed in any position in the reference picture list, just like any inter-prediction reference picture. As shown in Figure 3, a view component can use view components in other views for reference purposes. In MVC, inter-view prediction is implemented as if the view component in another view is an inter-prediction reference.

在多视图视频译码的上下文中,一般来说存在两个种类的运动向量。一个称为正常运动向量。所述正常运动向量指向时间参考图片且对应时间帧间预测是运动补偿预测(MCP)。另一运动向量是视差运动向量(DMV)。所述DMV指向不同视图中的图片(即,视图间参考图片)且对应帧间预测是视差补偿预测(DCP)。In the context of multi-view video coding, there are generally two types of motion vectors. One is called a normal motion vector. It points to a temporal reference picture and its corresponding temporal inter-frame prediction is motion compensated prediction (MCP). The other motion vector is a disparity motion vector (DMV). It points to a picture in a different view (i.e., an inter-view reference picture) and its corresponding inter-frame prediction is disparity compensated prediction (DCP).

另一类型的多视图视频译码格式引入深度值的使用(例如,3D-HEVC中)。对于流行用于3D电视和自由视点视频的多视图视频加深度(MVD)数据格式,可独立地以多视图纹理图片译码纹理图像和深度图。图4说明具有纹理图像的MVD数据格式及其相关联每样本深度图。深度范围可限于在与对应3D点的相机相距最小z_near和最大z_far距离的范围内。Another type of multi-view video coding format introduces the use of depth values (e.g., in 3D-HEVC). In the Multi-view Video Plus Depth (MVD) data format, which is popular for 3D television and free-viewpoint video, texture images and depth maps can be coded independently in a multi-view texture picture. FIG4 illustrates the MVD data format with a texture image and its associated per-sample depth map. The depth range can be limited to within a minimum z_near and a maximum z_far distance from the camera for the corresponding 3D points.

相机参数和深度范围值可有助于在3D显示器上再现之前处理经解码视图分量。因此,针对H.264/MVC的当前版本界定特殊补充增强信息(SEI)消息,即多视图获取信息SEI,其包含指定获取环境的各种参数的信息。然而,H.264/MVC中未指定用于指示深度范围相关信息的语法。Camera parameters and depth range values can help process decoded view components before rendering on a 3D display. Therefore, a special supplemental enhancement information (SEI) message, the multiview acquisition information SEI, is defined for the current version of H.264/MVC, which contains information specifying various parameters of the acquisition environment. However, H.264/MVC does not specify a syntax for indicating depth range-related information.

现将论述HEVC中的不对称运动分割(AMP)和运动补偿块大小。在HEVC中,经帧间译码译码块可分裂成一个、两个或四个分区。此些分区的各种形状是可能的。图5中描绘经帧间预测译码块的实例分割可能性。Asymmetric motion partitioning (AMP) and motion compensation block size in HEVC will now be discussed. In HEVC, an inter-coded coding block can be split into one, two, or four partitions. Various shapes of such partitions are possible. An example partitioning possibility for an inter-predicted coding block is depicted in FIG5 .

图5中的分区的上部行说明所谓的对称分区。NxN分区简单地是尚未分裂的译码块。N/2xN分区是分裂成两个垂直矩形分区的译码块。同样,NxN/2分区是分裂成两个水平矩形分区的译码块。N/2xN/2分区是分裂成四个相等正方形分区的译码块。The top row of partitions in FIG5 illustrates so-called symmetric partitions. An NxN partition is simply a coding block that has not been split. An N/2xN partition is a coding block that has been split into two vertical rectangular partitions. Similarly, an NxN/2 partition is a coding block that has been split into two horizontal rectangular partitions. An N/2xN/2 partition is a coding block that has been split into four equal square partitions.

图5中的下部四个分区类型称为不对称分区,且可以用于帧间预测的不对称运动分割(AMP)。AMP模式的一个分区分别具有高度或宽度N/4和宽度或高度N,且另一分区通过具有3N/4的高度或宽度和宽度或高度N而由CB的其余部分组成。每一经帧间译码分区被指派一个或两个运动向量和参考图片索引(即,一个运动向量和参考索引用于单向预测且两个运动向量和参考索引用于双向预测)。在一些实例中,为了最小化最坏情况存储器带宽,不允许大小4x4的分区用于帧间预测,且大小4x8和8x4的分区限于基于预测性数据的一个列表的单向预测译码。The lower four partition types in Figure 5 are called asymmetric partitions and can be used for asymmetric motion partitioning (AMP) for inter prediction. One partition of the AMP mode has a height or width of N/4 and a width or height of N, respectively, and the other partition consists of the rest of the CB by having a height or width of 3N/4 and a width or height of N. Each inter-coded partition is assigned one or two motion vectors and reference picture indices (i.e., one motion vector and reference index for unidirectional prediction and two motion vectors and reference indices for bidirectional prediction). In some examples, to minimize worst-case memory bandwidth, partitions of size 4x4 are not allowed for inter prediction, and partitions of size 4x8 and 8x4 are limited to unidirectional predictive coding based on one list of predictive data.
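
以下非规范性Python草图(假设性示例;模式名称沿用常见的HEVC命名)计算上述不对称分区的尺寸。The following non-normative Python sketch (a hypothetical example; mode names follow the common HEVC naming) computes the asymmetric partition sizes described above.

```python
def amp_partition_sizes(mode, n):
    # Each asymmetric mode splits an NxN coding block into an N/4 (or 3N/4)
    # slab and the remainder. Hypothetical mode labels: nL/nR = left/right
    # vertical splits, nU/nD = upper/lower horizontal splits.
    q, tq = n // 4, (3 * n) // 4
    table = {
        'nLx2N': [(q, n), (tq, n)],
        'nRx2N': [(tq, n), (q, n)],
        '2NxnU': [(n, q), (n, tq)],
        '2NxnD': [(n, tq), (n, q)],
    }
    return table[mode]
```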

如下文将更详细地论述,本发明描述当结合3D-HEVC译码技术使用时用于AMP的技术,包含后向视图合成预测(BVSP)。As will be discussed in greater detail below, this disclosure describes techniques for AMP, including backward view synthesis prediction (BVSP), when used in conjunction with 3D-HEVC coding techniques.

以下描述HEVC中的合并候选者列表。举例来说,可以以下步骤构造所述合并候选者列表。对于空间合并候选者,视频编码器20和/或视频解码器30可从五个空间相邻块导出多达四个空间运动向量候选者,如图6中说明。The following describes the merge candidate list in HEVC. For example, the merge candidate list may be constructed by the following steps. For spatial merge candidates, video encoder 20 and/or video decoder 30 may derive up to four spatial motion vector candidates from five spatial neighboring blocks, as illustrated in FIG6 .

视频编码器20及视频解码器30可评估空间相邻块的次序如下:左边(A1)、上方(B1)、右上方(B0)、左下方(A0)和左上方(B2),如图6中所示。在一些实例中,可应用修剪过程以移除具有相同运动信息(例如,运动向量和参考索引)的运动向量候选者。举例来说,可将B1的运动向量和参考索引与A1的运动向量和参考索引进行比较,可将B0的运动向量和参考索引与B1的运动向量和参考索引进行比较,可将A0的运动向量和参考索引与A1的运动向量和参考索引进行比较,且可将B2的运动向量和参考索引与B1和A1两者的运动向量和参考索引进行比较。随后可从运动向量候选者列表移除具有相同运动信息的两个候选者中的一者。如果在修剪过程之后已经存在四个候选者,那么不将候选者B2插入到所述合并候选者列表。Video encoder 20 and video decoder 30 may evaluate spatially neighboring blocks in the following order: left (A1), top (B1), top right (B0), bottom left (A0), and top left (B2), as shown in FIG6 . In some examples, a pruning process may be applied to remove motion vector candidates with identical motion information (e.g., motion vector and reference index). For example, the motion vector and reference index of B1 may be compared with the motion vector and reference index of A1, the motion vector and reference index of B0 may be compared with the motion vector and reference index of B1, the motion vector and reference index of A0 may be compared with the motion vector and reference index of A1, and the motion vector and reference index of B2 may be compared with the motion vectors and reference indexes of both B1 and A1. One of the two candidates with identical motion information may then be removed from the motion vector candidate list. If four candidates already exist after the pruning process, candidate B2 is not inserted into the merge candidate list.
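
以下非规范性Python草图(假设性示例)概括上述检查次序与有限的修剪比较。The following non-normative Python sketch (a hypothetical example) summarizes the checking order and the limited pruning comparisons described above.

```python
CHECK_ORDER = ['A1', 'B1', 'B0', 'A0', 'B2']
# Each position is only compared against these specific neighbors.
PRUNE_AGAINST = {'B1': ['A1'], 'B0': ['B1'], 'A0': ['A1'], 'B2': ['B1', 'A1']}

def spatial_merge_candidates(neighbors):
    # neighbors maps position name -> (mv, ref_idx), or None if unavailable.
    candidates = []
    for name in CHECK_ORDER:
        info = neighbors.get(name)
        if info is None:
            continue
        if name == 'B2' and len(candidates) == 4:
            continue  # B2 is not inserted when four candidates already exist
        if any(info == neighbors.get(o) for o in PRUNE_AGAINST.get(name, [])):
            continue  # pruned: identical motion information
        candidates.append(info)
    return candidates
```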

如果经启用且可用,那么将来自参考图片的位于同一地点的时间运动向量预测符(TMVP)候选者添加到运动向量候选者列表中位于空间运动向量候选者之后。If enabled and available, the co-located temporal motion vector predictor (TMVP) candidate from the reference picture is added to the motion vector candidate list after the spatial motion vector candidate.

如果运动向量候选者列表不完整(例如,具有少于预定数目的条目),那么可产生一或多个人工运动向量候选者且插入于所述合并候选者列表的末尾。实例类型的人工运动向量候选者包含仅针对B切片导出的组合双向预测合并候选者,以及在不存在足够双向预测合并候选者的情况下用以提供预定数目的运动向量候选者的零运动向量合并候选者(或其它类型的人工运动向量候选者)。If the motion vector candidate list is incomplete (e.g., has fewer than a predetermined number of entries), one or more artificial motion vector candidates may be generated and inserted at the end of the merge candidate list. Example types of artificial motion vector candidates include a combined bi-predictive merge candidate derived only for B slices, and a zero motion vector merge candidate (or other types of artificial motion vector candidates) to provide a predetermined number of motion vector candidates if there are not enough bi-predictive merge candidates.

当当前切片是B切片时,调用组合双向预测合并候选者的导出过程。对于已经在候选者列表中且具有必要运动信息的每对候选者,使用参考列表0(如果可用)中的图片的第一候选者(具有等于l0CandIdx的合并候选者索引)的运动向量与参考列表1(如果可用且参考图片或运动向量不同于第一候选者)中的图片的第二候选者(具有等于l1CandIdx的合并候选者索引)的运动向量的组合来导出组合双向预测运动向量候选者(具有由combIdx表示的索引)。When the current slice is a B slice, the derivation process of combined bi-predictive merge candidates is called. For each pair of candidates that are already in the candidate list and have the necessary motion information, a combined bi-predictive motion vector candidate (with an index denoted by combIdx) is derived using a combination of the motion vector of the first candidate (with a merge candidate index equal to l0CandIdx) of the picture in reference list 0 (if available) and the motion vector of the second candidate (with a merge candidate index equal to l1CandIdx) of the picture in reference list 1 (if available and the reference picture or motion vector is different from the first candidate).

图7是指示3D-HEVC中的l0CandIdx及l1CandIdx的实例规范的表。举例来说,图7说明对应于combIdx的l0CandIdx和l1CandIdx的定义。FIG7 is a table indicating an example specification of l0CandIdx and l1CandIdx in 3D-HEVC. For example, FIG7 illustrates the definition of l0CandIdx and l1CandIdx corresponding to combIdx.

对于为0…11的combIdx,当以下条件中的任一者为真时,组合双向预测运动向量候选者的产生过程终止:(1)combIdx等于(numOrigMergeCand*(numOrigMergeCand-1)),其中numOrigMergeCand表示在调用此过程之前合并列表中的候选者的数目;(2)合并列表中的总候选者的数目(包含新产生的组合双向预测合并候选者)等于MaxNumMergeCand。For combIdx 0…11, the process of generating combined bi-predictive motion vector candidates terminates when either of the following conditions is true: (1) combIdx is equal to (numOrigMergeCand*(numOrigMergeCand-1)), where numOrigMergeCand represents the number of candidates in the merge list before calling this process; (2) the total number of candidates in the merge list (including the newly generated combined bi-predictive merge candidate) is equal to MaxNumMergeCand.
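
以下非规范性Python草图说明上述两个终止条件;其中的(l0CandIdx, l1CandIdx)对表为假设性内容(对应图7),整个示例仅为说明之用。The following non-normative Python sketch illustrates the two termination conditions above; the (l0CandIdx, l1CandIdx) pair table is an assumption (corresponding to Figure 7), and the whole example is illustrative only.

```python
# Assumed (l0CandIdx, l1CandIdx) pairs indexed by combIdx = 0..11.
COMB_PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),
              (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), (3, 2)]

def add_combined_bipred_candidates(merge_list, max_num_merge_cand):
    # Each candidate is (mv_l0, ref_l0, mv_l1, ref_l1); None marks a
    # missing prediction direction.
    num_orig = len(merge_list)
    for comb_idx, (l0, l1) in enumerate(COMB_PAIRS):
        if comb_idx == num_orig * (num_orig - 1):
            break                      # termination condition (1)
        if len(merge_list) == max_num_merge_cand:
            break                      # termination condition (2)
        if l0 >= num_orig or l1 >= num_orig:
            continue
        first, second = merge_list[l0], merge_list[l1]
        if first[0] is None or second[2] is None:
            continue                   # needed direction unavailable
        if (first[0], first[1]) == (second[2], second[3]):
            continue                   # identical reference picture and MV
        merge_list.append((first[0], first[1], second[2], second[3]))
    return merge_list
```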

此部分描述零运动向量合并候选者的导出。对于每一候选者,将零运动向量和参考图片索引设定为从0到可用参考图片索引的数目减1。如果仍存在比合并运动向量候选者的最大数目(例如,由MaxNumMergeCand语法元素指示)少的候选者,那么插入零参考索引和运动向量直到候选者的总数等于MaxNumMergeCand。This section describes the derivation of zero motion vector merge candidates. For each candidate, the zero motion vector and reference picture index are set from 0 to the number of available reference picture indices minus 1. If there are still fewer candidates than the maximum number of merge motion vector candidates (e.g., indicated by the MaxNumMergeCand syntax element), zero reference indices and motion vectors are inserted until the total number of candidates equals MaxNumMergeCand.
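
以下非规范性Python草图说明零运动向量候选者的填充;其中参考索引的钳位方式为简化假设,整个示例为假设性示例。The following non-normative Python sketch illustrates filling with zero motion vector candidates; the clamping of the reference index is a simplifying assumption, and the example is hypothetical.

```python
def fill_zero_candidates(candidates, max_num_merge_cand, num_ref_idx):
    # Append zero-motion candidates with reference indices running from 0
    # to num_ref_idx - 1 (then clamped, as a simplification) until the
    # list reaches MaxNumMergeCand.
    idx = 0
    while len(candidates) < max_num_merge_cand:
        candidates.append(((0, 0), min(idx, num_ref_idx - 1)))
        idx += 1
    return candidates
```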

以下描述HEVC中的运动补偿大小的约束。为了最小化最坏情况存储器带宽,对于帧间预测不允许大小4x4的分区,且大小4x8和8x4的分区限于单向预测译码。The following describes the constraints on motion compensation size in HEVC. To minimize the worst-case memory bandwidth, partitions of size 4x4 are not allowed for inter prediction, and partitions of size 4x8 and 8x4 are restricted to unidirectional predictive coding.

为了满足上文所提及的此约束,当当前PU大小等于8x4或4x8时,产生的空间/时间/组合双向预测合并候选者,如果其与双向预测模式相关联,则应当通过将预测方向修改为列表0且将对应于RefPicList1的参考图片索引和运动向量分别修改为-1和(0,0)而将当前PU复位为使用单向预测。In order to satisfy this constraint mentioned above, when the current PU size is equal to 8x4 or 4x8, the generated spatial/temporal/combined bi-directional prediction merge candidates, if they are associated with bi-directional prediction mode, should reset the current PU to use uni-directional prediction by modifying the prediction direction to list 0 and the reference picture index and motion vector corresponding to RefPicList1 to -1 and (0,0) respectively.
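
以下非规范性Python草图(假设性示例;字段名为假设)说明对8x4/4x8 PU的双向预测候选者的复位。The following non-normative Python sketch (a hypothetical example; the field names are assumptions) illustrates resetting a bi-predictive candidate for an 8x4/4x8 PU.

```python
def enforce_uni_prediction(pu_width, pu_height, cand):
    # cand: dict with 'dir' in {'BI', 'L0', 'L1'} plus 'mv_l1'/'ref_l1'.
    # A bi-predictive candidate of an 8x4 or 4x8 PU is reset to list-0
    # uni-prediction, with the RefPicList1 fields set to -1 and (0, 0).
    if (pu_width, pu_height) in ((8, 4), (4, 8)) and cand['dir'] == 'BI':
        cand = dict(cand, dir='L0', ref_l1=-1, mv_l1=(0, 0))
    return cand
```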

如上所提到,3D-HEVC在开发中。3D-HEVC可使用视图间运动预测及视图间残余预测而改善译码效率。换句话说,为了进一步改善译码效率,参考软件中已经采用两个新技术,即“视图间运动预测”和“视图间残余预测”。在视图间运动预测中,视频译码器(例如,视频编码器20或视频解码器30)可基于与当前PU不同的视图中的PU的运动信息而确定(即,预测)当前PU的运动信息。在视图间残余预测中,视频译码器可基于与当前CU不同的视图中的残余数据确定当前CU的残余块。As mentioned above, 3D-HEVC is under development. 3D-HEVC can improve coding efficiency using inter-view motion prediction and inter-view residual prediction. In other words, to further improve coding efficiency, two new technologies, inter-view motion prediction and inter-view residual prediction, have been adopted in the reference software. In inter-view motion prediction, a video coder (e.g., video encoder 20 or video decoder 30) can determine (i.e., predict) the motion information of a current PU based on the motion information of a PU in a view different from the current PU. In inter-view residual prediction, the video coder can determine a residual block for the current CU based on residual data in a view different from the current CU.

现将论述3D-HEVC中的基于相邻块的视差向量(NBDV)导出。NBDV导出用作3D-HEVC中的视差向量导出技术,原因在于3D-HEVC使用纹理优先译码次序用于全部视图的事实。由于对应深度图不可用于当前经译码纹理图片,因此在NBDV中从相邻块导出视差向量。在3D-HEVC设计的一些提议中,可通过检索对应于参考纹理视图的深度数据而进一步精炼从NBDV导出过程导出的视差向量。Neighboring Block-Based Disparity Vector (NBDV) derivation in 3D-HEVC will now be discussed. NBDV derivation is used as a disparity vector derivation technique in 3D-HEVC due to the fact that 3D-HEVC uses a texture-first coding order for all views. Since the corresponding depth map is not available for the currently coded texture picture, the disparity vector is derived from the neighboring blocks in NBDV. In some proposed 3D-HEVC designs, the disparity vector derived from the NBDV derivation process can be further refined by retrieving depth data corresponding to a reference texture view.

3D-HEVC初始地采用JCT3V-A0097(3D-CE5.h:视差向量产生结果,L·张,Y·陈,M·卡兹威姿(高通))中所提议的NBDV导出技术。JCT3V-A0126(3D-CE5.h:基于HEVC的3D视频译码的视差向量导出的简化,J·孙,M·古,S·叶(LG))中与简化NBDV一起包含隐式视差向量。另外,在JCT3V-B0047(3D-CE5.h相关:视差向量导出的改进,J·康,Y·陈,L·张,M·卡茨威姿(高通))中,通过移除存储在经解码图片缓冲器中的隐式视差向量而进一步简化NBDV导出技术,并且以随机存取图片(RAP)选择改善译码增益。JCT3V-D0181(CE2:在3D-HEVC中基于CU的视差向量导出,J·康,Y·陈,L·张,M·卡茨威姿(高通))中描述用于NBDV导出的额外技术。3D-HEVC initially adopted the NBDV derivation technique proposed in JCT3V-A0097 (3D-CE5.h: Disparity Vector Generation Results, L. Zhang, Y. Chen, M. Karczewicz (Qualcomm)). JCT3V-A0126 (3D-CE5.h: Simplification of Disparity Vector Derivation for HEVC-Based 3D Video Coding, J. Sung, M. Koo, S. Yea (LG)) includes implicit disparity vectors along with simplified NBDV. In addition, in JCT3V-B0047 (3D-CE5.h related: Improvements in Disparity Vector Derivation, J. Kang, Y. Chen, L. Zhang, M. Karczewicz (Qualcomm)), the NBDV derivation technique is further simplified by removing the implicit disparity vectors stored in the decoded picture buffer, and improving the decoding gain with random access picture (RAP) selection. Additional techniques for NBDV derivation are described in JCT3V-D0181 (CE2: CU-based Disparity Vector Derivation in 3D-HEVC, J. Kang, Y. Chen, L. Zhang, M. Karczewicz (Qualcomm)).

视差向量(DV)用作两个视图之间的位移的估计器。因为相邻块在视频译码中几乎共享相同的运动/视差信息,所以当前块可使用相邻块中的运动向量信息作为良好预测符。依照此想法,NBDV导出过程使用相邻视差信息用于估计不同视图中的视差向量。The disparity vector (DV) is used as an estimator of the displacement between two views. Because neighboring blocks share almost the same motion/disparity information in video decoding, the current block can use the motion vector information in the neighboring blocks as a good predictor. Following this idea, the NBDV derivation process uses the neighboring disparity information to estimate the disparity vector in different views.

为了实施NBDV,视频编码器20初始地界定若干空间和时间相邻块。视频编码器20随后以通过当前块与候选块之间的相关的优先级所确定的预定义次序检查相邻块中的每一者。一旦在候选者中发现视差运动向量(即,指向视图间参考图片的运动向量),视频编码器20就将所述视差运动向量转换为视差向量且还返回相关联视图次序索引。利用相邻块的两个集合。一个集合包含空间相邻块且另一集合包含时间相邻块。To implement NBDV, video encoder 20 initially defines several spatial and temporal neighboring blocks. Video encoder 20 then checks each of the neighboring blocks in a predefined order determined by the priority of the correlation between the current block and the candidate blocks. Once a disparity motion vector (i.e., a motion vector pointing to an inter-view reference picture) is found in a candidate, video encoder 20 converts the disparity motion vector into a disparity vector and also returns the associated view order index. Two sets of neighboring blocks are utilized. One set contains spatial neighboring blocks and the other set contains temporal neighboring blocks.

在3D-HEVC的最近提议中,NBDV导出中使用两个空间相邻块。所述空间相邻块是相对于当前译码单元(CU)90的左边和上方相邻块,如图8中分别由A1和B1表示。应注意图8中所描绘的相邻块处于与HEVC中的合并模式中使用的相邻块中的一些相同的位置。因此,无需要额外存储器存取。然而应理解,也可以使用相对于当前CU 90在其它位置中的相邻块。In recent proposals for 3D-HEVC, two spatially neighboring blocks are used in NBDV derivation. These are the left and top neighboring blocks relative to the current coding unit (CU) 90, as indicated by A1 and B1, respectively, in FIG8 . It should be noted that the neighboring blocks depicted in FIG8 are in the same positions as some of the neighboring blocks used in merge mode in HEVC. Therefore, no additional memory access is required. However, it should be understood that neighboring blocks in other positions relative to the current CU 90 can also be used.

为了检查时间相邻块,视频编码器20首先执行用于候选图片列表的构造过程。来自当前视图的高达两个参考图片可被视为候选图片。视频编码器20首先将位于同一地点的参考图片插入到候选图片列表,接着按参考索引的升序插入其余候选图片。当两个参考图片列表中具有相同参考索引的参考图片可用时,在与所述位于同一地点的图片相同的参考图片列表中的参考图片位于具有相同参考索引的另一参考图片之前。对于候选图片列表中的每一候选图片,视频编码器20将位于同一地点的区的覆盖中心位置的块确定为时间相邻块。To check for temporally neighboring blocks, video encoder 20 first performs a process for constructing a candidate picture list. Up to two reference pictures from the current view can be considered candidate pictures. Video encoder 20 first inserts the co-located reference picture into the candidate picture list, followed by the remaining candidate pictures in ascending order of reference index. When a reference picture with the same reference index is available in two reference picture lists, the reference picture in the same reference picture list as the co-located picture precedes the other reference picture with the same reference index. For each candidate picture in the candidate picture list, video encoder 20 determines the block covering the center position of the co-located region as a temporally neighboring block.
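The candidate-picture-list construction above can be sketched as follows. This is a simplified illustration, not the normative process: pictures are represented as plain identifiers, `ref_lists` is a hypothetical mapping from list index (0/1) to pictures ordered by reference index, and the list is capped at the two candidate pictures mentioned above.

```python
def build_candidate_picture_list(colocated_pic, ref_lists, colocated_list_idx):
    # The co-located reference picture is inserted first.
    candidates = [colocated_pic]
    max_len = max(len(lst) for lst in ref_lists)
    # Remaining candidates in ascending reference-index order; when both
    # lists hold a picture at the same reference index, the list containing
    # the co-located picture is visited first.
    for ref_idx in range(max_len):
        for list_idx in (colocated_list_idx, 1 - colocated_list_idx):
            refs = ref_lists[list_idx]
            if ref_idx < len(refs) and refs[ref_idx] not in candidates:
                candidates.append(refs[ref_idx])
                if len(candidates) == 2:  # up to two candidate pictures
                    return candidates
    return candidates
```

For example, with the co-located picture in list 0, the first remaining candidate is taken from list 0 before list 1 at each reference index.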

当通过视图间运动预测对块进行译码时,可导出视差向量以用于选择不同视图中的对应块。视图间运动预测过程中导出的视差向量称为隐式视差向量(IDV,也称为所导出的视差向量)。即使块是以运动预测译码的,为了对随后的块进行译码的目的也不会丢弃所导出的视差向量。When a block is coded using inter-view motion prediction, a disparity vector can be derived for selecting the corresponding block in a different view. The disparity vector derived during inter-view motion prediction is called an implicit disparity vector (IDV), also called a derived disparity vector. Even if a block is coded using motion prediction, the derived disparity vector is not discarded for the purpose of coding subsequent blocks.

在HTM的一个设计中,在NBDV导出过程期间,视频译码器(例如,视频编码器20和/或视频解码器30)经配置以按次序检查时间相邻块中的视差运动向量、空间相邻块中的视差运动向量和随后IDV。一旦找到视差运动向量或IDV,过程便终止。In one design of HTM, during the NBDV derivation process, a video coder (e.g., video encoder 20 and/or video decoder 30) is configured to sequentially check for disparity motion vectors in temporally neighboring blocks, disparity motion vectors in spatially neighboring blocks, and then IDVs. Once a disparity motion vector or IDV is found, the process terminates.
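The checking order just described (temporal disparity motion vectors, then spatial ones, then IDVs, terminating on the first hit) can be sketched as below. The neighbour representation is a hypothetical dict that may carry a `'dmv'` (disparity motion vector), an `'idv'`, and a `'view_idx'`; it is not the HTM data model.

```python
def nbdv_derive(temporal_nbrs, spatial_nbrs):
    # Disparity motion vectors: temporal neighbours first, then spatial.
    for nbr in temporal_nbrs + spatial_nbrs:
        if nbr.get('dmv') is not None:
            return nbr['dmv'], nbr.get('view_idx')  # terminate immediately
    # Implicit disparity vectors (IDVs) are checked last.
    for nbr in temporal_nbrs + spatial_nbrs:
        if nbr.get('idv') is not None:
            return nbr['idv'], nbr.get('view_idx')
    return None, None  # no DV found; a fallback (e.g. a zero DV) would be used
```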

现将论述通过存取深度信息对NBDV导出过程的精炼(NBDV-R)。当从NBDV导出过程导出视差向量时,可通过从参考视图的深度图检索深度数据进一步精炼所导出的视差向量。所述精炼过程可包含以下技术:We will now discuss the refinement of the NBDV derivation process by accessing depth information (NBDV-R). When deriving a disparity vector from the NBDV derivation process, the derived disparity vector can be further refined by retrieving depth data from the depth map of the reference view. The refinement process may include the following techniques:

a)在例如基础视图等经先前译码参考深度视图中通过所导出的视差向量定位对应深度块;对应深度块的大小与当前PU的大小相同。a) Locate the corresponding depth block by the derived disparity vector in a previously coded reference depth view, such as the base view; the size of the corresponding depth block is the same as that of the current PU.

b)从对应深度块的四个拐角像素选择一个深度值且将所述深度值转换为经精炼视差向量的水平分量。视差向量的垂直分量不变。b) Select one depth value from the four corner pixels of the corresponding depth block and convert the depth value into the horizontal component of the refined disparity vector. The vertical component of the disparity vector remains unchanged.
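Steps a) and b) can be sketched as follows. The depth block is a 2-D list of samples, `depth_to_disp` stands in for the camera-parameter-based depth-to-disparity conversion (here a hypothetical lookup table), and picking the *maximum* of the four corner values is an assumption of this sketch.

```python
def refine_nbdv(derived_dv, depth_block, depth_to_disp):
    h = len(depth_block) - 1
    w = len(depth_block[0]) - 1
    # Four corner samples of the corresponding depth block.
    corners = (depth_block[0][0], depth_block[0][w],
               depth_block[h][0], depth_block[h][w])
    # Convert the selected depth value (maximum, by assumption) into the
    # horizontal component; the vertical component is unchanged.
    return (depth_to_disp[max(corners)], derived_dv[1])
```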

注意,在一些实例中,经精炼视差向量可以用于视图间运动预测,而未精炼的视差向量可以用于视图间残余预测。另外,如果PU是以后向视图合成预测模式译码,那么可将经精炼视差向量存储为一个PU的运动向量。在3D-HEVC的一些提议中,无论从NBDV导出过程导出的视图次序索引的值如何,都可存取基础视图的深度视图分量。Note that in some examples, the refined disparity vector can be used for inter-view motion prediction, while the unrefined disparity vector can be used for inter-view residual prediction. In addition, if the PU is coded in the backward view synthesis prediction mode, the refined disparity vector can be stored as the motion vector of the PU. In some proposals for 3D-HEVC, the depth view component of the base view can be accessed regardless of the value of the view order index derived from the NBDV derivation process.

现将论述3D-HEVC中的后向视图合成预测(BVSP)技术。第3次JCT-3V会议中采用了由D·田等人提出的一个实例BVSP方法“CE1.h:使用相邻块的后向视图合成预测”(JCT3V-C0152),其从http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=594可用。BVSP的基本想法类似于3D-AVC中的基于块的视图合成预测。这两个技术均使用后向扭曲和基于块的视图合成预测来避免发射运动向量差且使用更精确的运动向量。3D-HEVC和3D-AVC中的BVSP的实施细节由于不同平台而不同。The Backward View Synthesis Prediction (BVSP) technique in 3D-HEVC will now be discussed. An example BVSP method, "CE1.h: Backward View Synthesis Prediction Using Neighboring Blocks" (JCT3V-C0152), proposed by D. Tian et al., was adopted at the 3rd JCT-3V meeting and is available at http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=594. The basic concept of BVSP is similar to block-based view synthesis prediction in 3D-AVC. Both techniques use backward warping and block-based view synthesis prediction to avoid transmitting motion vector differences and use more accurate motion vectors. The implementation details of BVSP in 3D-HEVC and 3D-AVC vary depending on the platform.

在3D-HEVC中,针对在跳过或合并模式中经译码的经帧间译码块支持BVSP模式。在3D-HEVC的一个实例提议中,针对在高级运动向量预测(AMVP)模式中经译码的块不允许BVSP模式。替代于发射旗标以指示BVSP模式的使用,视频编码器20可经配置以将一个额外合并候选者(即,BVSP合并候选者)添加到合并候选者列表,且每一候选者与一个BVSP旗标相关联。当经解码合并索引对应于BVSP合并候选者时,所述经解码合并索引指示当前预测单元(PU)使用BVSP模式。对于当前PU内的每一子块,可通过转换深度参考视图中的深度值而导出视差运动向量。In 3D-HEVC, BVSP mode is supported for inter-coded blocks coded in skip or merge mode. In one example proposal for 3D-HEVC, BVSP mode is not allowed for blocks coded in advanced motion vector prediction (AMVP) mode. Instead of transmitting a flag to indicate the use of BVSP mode, video encoder 20 may be configured to add one additional merge candidate (i.e., a BVSP merge candidate) to the merge candidate list, with each candidate associated with a BVSP flag. When a decoded merge index corresponds to a BVSP merge candidate, the decoded merge index indicates that the current prediction unit (PU) uses BVSP mode. For each sub-block within the current PU, a disparity motion vector may be derived by converting the depth value in the depth reference view.

如下界定BVSP旗标的设定。当用于导出空间合并候选者的空间相邻块是以BVSP模式译码时,空间合并候选者的相关联运动信息由当前块继承,如常规合并模式中。另外,此空间合并候选者与等于1的BVSP旗标相关联(即,指示空间合并候选者是以BVSP模式译码)。对于新引入的BVSP合并候选者,将BVSP旗标设定成1。对于全部其它合并候选者,将相关联BVSP旗标设定成0。The setting of the BVSP flag is defined as follows. When the spatial neighboring block used to derive the spatial merge candidate is coded in BVSP mode, the associated motion information of the spatial merge candidate is inherited by the current block, as in normal merge mode. In addition, this spatial merge candidate is associated with a BVSP flag equal to 1 (i.e., indicating that the spatial merge candidate is coded in BVSP mode). For newly introduced BVSP merge candidates, the BVSP flag is set to 1. For all other merge candidates, the associated BVSP flag is set to 0.

如上文所论述,在3D-HEVC中,视频编码器20可经配置以将名为BVSP合并候选者的新候选者导出且嵌入到合并候选者列表中。通过以下方法设定对应参考图片索引和运动向量。As discussed above, in 3D-HEVC, video encoder 20 may be configured to derive and embed a new candidate named BVSP merge candidate into the merge candidate list.The corresponding reference picture index and motion vector are set by the following method.

首先,视频编码器20可经配置以从NBDV导出过程获得由所导出的视差向量的视图索引语法元素(例如,3D-HEVC中的refVIdxLX)表示的视图索引。视频编码器20还可经配置以获得与具有等于refVIdxLX的视图次序索引的参考图片相关联的参考图片列表(例如,RefPicListX(RefPicList0或RefPicList1))。视频编码器20随后使用从NBDV导出过程获得的对应参考图片索引和视差向量作为RefPicListX(即,RefPicList0或RefPicList1)中的BVSP合并候选者的运动信息。First, video encoder 20 may be configured to obtain the view index represented by the view index syntax element of the derived disparity vector (e.g., refVIdxLX in 3D-HEVC) from the NBDV derivation process. Video encoder 20 may also be configured to obtain a reference picture list (e.g., RefPicListX (RefPicList0 or RefPicList1)) associated with the reference picture having a view order index equal to refVIdxLX. Video encoder 20 then uses the corresponding reference picture index and disparity vector obtained from the NBDV derivation process as motion information for the BVSP merge candidate in RefPicListX (i.e., RefPicList0 or RefPicList1).

如果当前切片是B切片,那么视频编码器20检查具有由除RefPicListX外的参考图片列表(即,RefPicListY,其中Y为1-X)中不等于refVIdxLX的refVIdxLY表示的视图次序索引的视图间参考图片的可用性。如果找到此不同视图间参考图片,那么视频编码器20执行双向预测视图合成预测。视频编码器20可进一步经配置以使用不同视图间参考图片的对应参考图片索引和来自NBDV导出过程的经按比例缩放视差向量作为RefPicListY中的BVSP合并候选者的运动信息。来自具有等于refVIdxLX的视图次序索引的视图的深度块用作当前块的深度信息(在纹理优先译码次序的情况下)。视频编码器20经由后向扭曲过程合成所述两个不同视图间参考图片(来自每一参考图片列表一个)并且进一步加权经合成参考图片以实现最终BVSP预测符。If the current slice is a B slice, video encoder 20 checks for the availability of an inter-view reference picture with a view order index represented by refVIdxLY that is not equal to refVIdxLX in a reference picture list other than RefPicListX (i.e., RefPicListY, where Y is 1-X). If this different inter-view reference picture is found, video encoder 20 performs bi-predictive view synthesis prediction. Video encoder 20 may further be configured to use the corresponding reference picture index of the different inter-view reference picture and the scaled disparity vector from the NBDV derivation process as motion information for the BVSP merge candidate in RefPicListY. The depth block from the view with the view order index equal to refVIdxLX is used as the depth information for the current block (in the case of texture-first coding order). Video encoder 20 synthesizes the two different inter-view reference pictures (one from each reference picture list) through a backward warping process and further weights the synthesized reference pictures to arrive at the final BVSP predictor.

对于除B切片外的切片类型(例如,P切片),视频编码器20以RefPicListX作为用于预测的参考图片列表来应用单向预测视图合成预测。For slice types other than B slices (eg, P slices), video encoder 20 applies uni-predictive view synthesis prediction with RefPicListX as the reference picture list for prediction.

在3D-HTM中,在共同测试条件中应用纹理优先译码。由于视图的纹理分量是在深度分量之前经译码,因此当解码一个非基础纹理分量时对应非基础深度分量不可用。因此,视频解码器30可经配置以估计深度信息,且随后使用估计的深度信息执行BVSP。为了估计块的深度信息,提出首先从相邻块导出视差向量(例如,使用NBDV导出过程),且随后使用所导出的视差向量从参考视图获得深度块。In 3D-HTM, texture-first coding is applied in a common test condition. Since the texture components of a view are coded before the depth components, the corresponding non-base depth component is not available when decoding a non-base texture component. Therefore, the video decoder 30 can be configured to estimate depth information and then perform BVSP using the estimated depth information. To estimate the depth information of a block, it is proposed to first derive a disparity vector from a neighboring block (e.g., using an NBDV derivation process), and then use the derived disparity vector to obtain a depth block from a reference view.

图9说明用于从参考视图定位深度块且随后使用所述深度块用于BVSP预测的实例技术。初始地,视频编码器20和/或视频解码器30可利用与相邻块102相关联的视差向量104。即,视频编码器20和/或视频解码器30可从已经编码的相邻块(例如相邻块102)存取视差向量信息且再使用当前块100的任何相关联视差向量信息。视差向量104指向参考深度图片中的深度块106。当视差向量104再用于当前块100时,视差向量104现在指向参考深度图片中的深度块108。深度块108对应于当前块100。视频编码器20和/或视频解码器30可随后使用深度块108中的深度信息来使用后向扭曲技术合成参考纹理图片中的块。经合成纹理图片可随后用作参考图片以预测当前块100。FIG. 9 illustrates an example technique for locating a depth block from a reference view and subsequently using the depth block for BVSP prediction. Initially, video encoder 20 and/or video decoder 30 may utilize disparity vector 104 associated with neighboring block 102. That is, video encoder 20 and/or video decoder 30 may access disparity vector information from an already encoded neighboring block (e.g., neighboring block 102) and reuse any associated disparity vector information for current block 100. Disparity vector 104 points to depth block 106 in a reference depth picture. When disparity vector 104 is reused for current block 100, disparity vector 104 now points to depth block 108 in a reference depth picture. Depth block 108 corresponds to current block 100. Video encoder 20 and/or video decoder 30 may then use the depth information in depth block 108 to synthesize a block in a reference texture picture using a backward warping technique. The synthesized texture picture may then be used as a reference picture to predict current block 100.

在本发明的一个实例中,对于NBDV导出过程,假设(dvx,dvy)表示由NBDV导出过程识别的视差向量104,且将当前块100的位置表示为(块x,块y)。在单向预测BVSP的一个实例中,视频编码器20和/或视频解码器30可经配置以获取参考视图的深度视图分量中具有(块x+dvx,块y+dvy)的左上方位置的深度块108。视频编码器20和/或视频解码器30可经配置以首先将当前块100(例如,PU)分裂为若干子块,其各自具有相同大小(例如,等于W*H)。对于具有等于W*H的大小的每一子块,视频编码器20和/或视频解码器30识别来自所获取深度视图分量内的对应深度子块108的四个拐角像素的最大深度值,例如如图10中所示。图10是说明8x8深度块110的四个拐角像素的概念图。所述四个拐角像素可标记为左上方(TL)像素、右上方(TR)像素、左下方(BL)像素,和右下方(BR)像素。视频编码器20和/或视频解码器30将最大深度值转换为视差运动向量。每一子块的所导出视差运动向量随后用于运动补偿。In one example of the present disclosure, for the NBDV derivation process, it is assumed that (dvx, dvy) represents the disparity vector 104 identified by the NBDV derivation process, and the position of the current block 100 is denoted as (blockx, blocky). In one example of uni-directionally predicted BVSP, video encoder 20 and/or video decoder 30 may be configured to retrieve a depth block 108 having a top-left position of (blockx + dvx, blocky + dvy) in the depth view component of the reference view. Video encoder 20 and/or video decoder 30 may be configured to first split the current block 100 (e.g., PU) into several sub-blocks, each of which has the same size (e.g., equal to W*H). For each sub-block having a size equal to W*H, video encoder 20 and/or video decoder 30 identifies the maximum depth value of the four corner pixels of the corresponding depth sub-block 108 within the retrieved depth view component, as shown in FIG. 10. FIG. 10 is a conceptual diagram illustrating the four corner pixels of an 8x8 depth block 110. The four corner pixels may be labeled as a top left (TL) pixel, a top right (TR) pixel, a bottom left (BL) pixel, and a bottom right (BR) pixel. Video encoder 20 and/or video decoder 30 converts the maximum depth value into a disparity motion vector. The derived disparity motion vector for each sub-block is then used for motion compensation.
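The per-sub-block derivation described above can be sketched as follows. Here `depth` is a 2-D array of depth samples, `(bx, by)` is the (already disparity-shifted) top-left position of the retrieved depth block, and `depth_to_disp` is a hypothetical depth-to-disparity conversion table; none of these names come from the specification.

```python
def bvsp_subblock_dmvs(depth, bx, by, pu_w, pu_h, sw, sh, depth_to_disp):
    dmvs = {}
    for y in range(by, by + pu_h, sh):        # split the PU into
        for x in range(bx, bx + pu_w, sw):    # sub-blocks of size sw x sh
            corners = (depth[y][x], depth[y][x + sw - 1],
                       depth[y + sh - 1][x], depth[y + sh - 1][x + sw - 1])
            # Maximum of the four corner depth values, converted to a
            # (horizontal-only) disparity motion vector for this sub-block.
            dmvs[(x - bx, y - by)] = (depth_to_disp[max(corners)], 0)
    return dmvs
```

Each entry of the returned map is then used for motion compensation of the corresponding sub-block.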

此部分将论述当执行双向预测时的BVSP。当存在来自RefPicList0和RefPicList1中的不同视图的多个视图间参考图片时,视频编码器20和/或视频解码器30应用双向预测BVSP。在双向预测BVSP中,将从每一参考列表产生两个视图合成预测预测符(即,两个合成参考块),如上文所描述。随后使所述两个视图合成预测预测符平均化以获得最终视图合成预测预测符。This section discusses BVSP when performing bi-directional prediction. When there are multiple inter-view reference pictures from different views in RefPicList0 and RefPicList1, video encoder 20 and/or video decoder 30 applies bi-directional prediction BVSP. In bi-directional prediction BVSP, two view synthesis prediction predictors (i.e., two synthesized reference blocks) are generated from each reference list, as described above. The two view synthesis prediction predictors are then averaged to obtain a final view synthesis prediction predictor.
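The final averaging step above can be sketched sample-wise as below; the `+1` rounding offset is an assumption of this sketch, not taken from the text.

```python
def bipred_vsp(pred0, pred1):
    # Sample-wise average of the two view-synthesis predictors
    # (one per reference list); +1 implements round-half-up (assumed).
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]
```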

运动补偿大小,即如上文所描述的W*H,可为8x4或4x8。在一个实例中,为了确定运动补偿大小,应用以下规则:The motion compensation size, i.e. W*H as described above, can be 8x4 or 4x8. In one example, to determine the motion compensation size, the following rules are applied:

对于每一8x8块,检查对应深度8x8块的四个拐角且:For each 8x8 block, check the four corners of the corresponding depth 8x8 block and:

if ((vdepth[TL] < vdepth[BR] ? 0 : 1) != (vdepth[TR] < vdepth[BL] ? 0 : 1))

使用4x8分区(W=4,H=8)Use 4x8 partitions (W=4, H=8)

elseelse

使用8x4分区(W=8,H=4)Use 8x4 partition (W=8, H=4)
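A runnable transcription of the rule above might look as follows, with `vdepth` mapping the corner labels of the corresponding 8x8 depth block to depth values:

```python
def bvsp_mc_partition(vdepth):
    # vdepth maps corner labels 'TL', 'TR', 'BL', 'BR' of the
    # corresponding 8x8 depth block to depth sample values.
    if (0 if vdepth['TL'] < vdepth['BR'] else 1) != \
       (0 if vdepth['TR'] < vdepth['BL'] else 1):
        return (4, 8)   # 4x8 partition (W=4, H=8)
    return (8, 4)       # 8x4 partition (W=8, H=4)
```

Intuitively, the comparison of the two diagonals detects the orientation of the depth edge inside the block and chooses the partition that follows it.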

以下描述3D-HEVC的一个提议中用于跳过/合并模式的视图间候选者导出过程。基于从NBDV导出过程导出的视差向量,可将称为经视图间预测的运动向量候选者(IPMVC)(如果可用)的新运动向量候选者添加到AMVP和跳过/合并模式运动向量候选者列表。经视图间预测的运动向量(如果可用)为时间运动向量。由于跳过模式具有与合并模式相同的运动向量导出过程,所以此文献中所描述的全部技术适用于合并模式及跳过模式两者。The following describes the inter-view candidate derivation process for skip/merge mode in one proposal for 3D-HEVC. Based on the disparity vector derived from the NBDV derivation process, a new motion vector candidate, called an inter-view predicted motion vector candidate (IPMVC) (if available), can be added to the AMVP and skip/merge mode motion vector candidate lists. The inter-view predicted motion vector (if available) is a temporal motion vector. Since skip mode has the same motion vector derivation process as merge mode, all techniques described in this document apply to both merge and skip modes.

图11是说明用于合并/跳过模式的经视图间预测的运动向量候选者的实例导出的概念图。举例来说,图11展示经视图间预测的运动向量候选者的导出过程的实例。对于合并/跳过模式,通过以下步骤导出经视图间预测的运动向量候选者。首先,视频编码器20和/或视频解码器30使用视差向量在同一存取单元的参考视图中定位当前PU/CU 114的对应块(例如,参考块)112。在图11的实例中,当前块(当前PU)114在视图V1中,而对应参考块112在视图V0中。如果对应参考块112未经帧内译码且未经视图间预测,且其参考图片(在此实例中在视图V0和时间T1中)具有等于当前PU/CU 114的同一参考图片列表中的一个条目的POC值的POC值,那么将对应参考块112的运动信息(即,预测方向、参考图片索引和运动向量)导出为在基于参考图片的POC转换参考索引之后的经视图间预测的运动向量。FIG11 is a conceptual diagram illustrating an example derivation of inter-view predicted motion vector candidates for merge/skip mode. For example, FIG11 shows an example of a derivation process for inter-view predicted motion vector candidates. For merge/skip mode, inter-view predicted motion vector candidates are derived by the following steps. First, video encoder 20 and/or video decoder 30 uses a disparity vector to locate a corresponding block (e.g., a reference block) 112 for a current PU/CU 114 in a reference view of the same access unit. In the example of FIG11 , current block (current PU) 114 is in view V1, while corresponding reference block 112 is in view V0. If the corresponding reference block 112 is not intra-coded and not inter-view predicted, and its reference picture (in view V0 and time T1 in this example) has a POC value equal to the POC value of an entry in the same reference picture list of the current PU/CU 114, then the motion information (i.e., prediction direction, reference picture index, and motion vector) of the corresponding reference block 112 is derived as an inter-view predicted motion vector after converting the reference index based on the POC of the reference picture.
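The IPMVC availability check above can be sketched as follows. The corresponding block is modeled as a hypothetical dict (the real process works on both reference picture lists and full motion fields); `cur_ref_pocs` lists the POC values of the current PU's reference picture list.

```python
def derive_ipmvc(corr_block, cur_ref_pocs):
    # No IPMVC if the corresponding reference block is intra-coded or
    # itself inter-view predicted.
    if corr_block['intra'] or corr_block['inter_view_predicted']:
        return None
    poc = corr_block['ref_poc']
    if poc not in cur_ref_pocs:   # its reference POC must match an entry
        return None               # in the current reference picture list
    return {'mv': corr_block['mv'],                  # inherited motion vector
            'ref_idx': cur_ref_pocs.index(poc),      # POC-converted ref index
            'pred_dir': corr_block['pred_dir']}
```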

可如下界定对应参考块112。首先,将当前预测单元的左上方明度样本相对于当前图片的左上方明度样本的明度位置表示为(xP,yP)。变量nPSW和nPSH分别表示当前预测单元的宽度和高度。参考视图次序索引标记为refViewIdx,且视差向量标记为mvDisp。通过以下操作导出参考层明度位置(xRef,yRef):The corresponding reference block 112 can be defined as follows. First, let (xP, yP) denote the luma location of the top-left luma sample of the current prediction unit relative to the top-left luma sample of the current picture. The variables nPSW and nPSH represent the width and height of the current prediction unit, respectively. The reference view order index is denoted as refViewIdx, and the disparity vector is denoted as mvDisp. The reference layer luma position (xRef, yRef) is derived by the following operation:

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))    (H-124)

yRef = Clip3(0, PicHeightInSamplesL - 1, yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))    (H-125)

对应参考块112设定成覆盖具有等于refViewIdx的ViewIdx的视图分量中的明度位置(xRef,yRef)的预测单元。The corresponding reference block 112 is set to cover the prediction unit of the luma position (xRef, yRef) in the view component with ViewIdx equal to refViewIdx.
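Equations (H-124) and (H-125) transcribe directly into code; `mvDisp` is in quarter-sample units, which is why the components are rounded with `(+2) >> 2`:

```python
def clip3(lo, hi, v):
    # Clip3(x, y, z) clamps z to the range [x, y].
    return max(lo, min(hi, v))

def ref_luma_position(xP, yP, nPSW, nPSH, mvDisp, pic_w, pic_h):
    # (H-124): center offset of the PU plus the rounded disparity.
    xRef = clip3(0, pic_w - 1, xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))
    # (H-125): same derivation for the vertical component.
    yRef = clip3(0, pic_h - 1, yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))
    return xRef, yRef
```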

另外,视差向量可转换成视图间视差运动向量(IDMVC),其添加到合并候选者列表中处于与IPMVC不同的位置。视图间视差运动向量也可以添加到AMVP候选者列表中处于与IPMVC(当其可用时)相同的位置。IPMVC或IDMVC可在此上下文中被称为“视图间候选者”。In addition, the disparity vector can be converted into an inter-view disparity motion vector (IDMVC), which is added to the merge candidate list at a different position than the IPMVC. The inter-view disparity motion vector can also be added to the AMVP candidate list at the same position as the IPMVC (when available). IPMVC or IDMVC may be referred to as "inter-view candidate" in this context.

在用于合并/跳过模式的一个实例中,IPMVC(如果可用)在全部空间和时间合并候选者之前插入到合并候选者列表。IDMVC插入在从A0导出的空间合并候选者之前。In one example for merge/skip mode, IPMVC (if available) is inserted into the merge candidate list before all spatial and temporal merge candidates. IDMVC is inserted before spatial merge candidates derived from A0 .

以下部分描述3D-HEVC中用于纹理译码的合并候选者列表构造。首先,视频编码器20和/或视频解码器30例如使用上述NBDV导出技术导出视差向量。在导出视差向量之后,视频编码器20和/或视频解码器30可经配置以如下所述执行3D-HEVC中的合并候选者列表构造过程。The following section describes the construction of a merge candidate list for texture coding in 3D-HEVC. First, video encoder 20 and/or video decoder 30 derives a disparity vector, for example, using the NBDV derivation technique described above. After deriving the disparity vector, video encoder 20 and/or video decoder 30 may be configured to perform the merge candidate list construction process in 3D-HEVC as described below.

视频编码器20和/或视频解码器30可使用上述程序导出一或多个IPMVC。如果IPMVC可用,那么可将IPMVC插入到合并列表。Video encoder 20 and/or video decoder 30 may derive one or more IPMVCs using the above procedure. If an IPMVC is available, the IPMVC may be inserted into the merge list.

接着,视频编码器20和/或视频解码器30可经配置以导出空间合并候选者和3D-HEVC中的一或多个IDMVC插入。为了导出空间合并候选者,视频编码器20和/或视频解码器30可经配置以按例如以下次序检查空间相邻PU的运动信息:A1,B1,B0,A0或B2,如图6中所示。Next, video encoder 20 and/or video decoder 30 may be configured to derive spatial merging candidates and one or more IDMVC insertions in 3D-HEVC. To derive spatial merging candidates, video encoder 20 and/or video decoder 30 may be configured to check motion information of spatially neighboring PUs in the following order, for example: A1 , B1 , B0 , A0 , or B2 , as shown in FIG.

视频编码器20和/或视频解码器30可进一步经配置以执行受约束修剪。为了执行受约束修剪,视频编码器20和/或视频解码器30可经配置以在A1和IPMVC具有相同运动向量和相同参考索引的情况下不将位置A1处的空间合并候选者插入到合并候选者列表中。否则将位置A1处的空间合并候选者插入到合并候选者列表中。Video encoder 20 and/or video decoder 30 may be further configured to perform constrained pruning. To perform constrained pruning, video encoder 20 and/or video decoder 30 may be configured to not insert the spatial merge candidate at position A1 into the merge candidate list if A1 and IPMVC have the same motion vector and the same reference index. Otherwise, the spatial merge candidate at position A1 is inserted into the merge candidate list.

如果位置B1处的合并候选者和合并位置A1处的合并候选者(或IPMVC)具有相同运动向量和相同参考索引,那么位置B1处的合并候选者不插入到合并候选者列表中。否则将位置B1处的合并候选者插入到合并候选者列表中。如果位置B0处的合并候选者可用(即,经译码且具有运动信息),那么将位置B0处的合并候选者添加到候选者列表。视频编码器20和/或视频解码器30使用上述程序导出IDMVC。如果IDMVC可用,且IDMVC的运动信息不同于从A1和B1导出的候选者,那么将IDMVC插入到候选者列表。If the merge candidate at position B1 and the merge candidate (or IPMVC) at the merge position A1 have the same motion vector and the same reference index, the merge candidate at position B1 is not inserted into the merge candidate list. Otherwise, the merge candidate at position B1 is inserted into the merge candidate list. If the merge candidate at position B0 is available (i.e., decoded and has motion information), the merge candidate at position B0 is added to the candidate list. Video encoder 20 and/or video decoder 30 derives IDMVC using the above procedure. If IDMVC is available and its motion information is different from the candidates derived from A1 and B1 , the IDMVC is inserted into the candidate list.

如果BVSP针对整个图片(或针对当前切片)经启用,那么将BVSP合并候选者插入到合并候选者列表。如果位置A0处的合并候选者可用,那么将其添加到候选者列表。如果B2处的合并候选者可用,那么将其添加到候选者列表。If BVSP is enabled for the entire picture (or for the current slice), the BVSP merge candidate is inserted into the merge candidate list. If the merge candidate at position A 0 is available, it is added to the candidate list. If the merge candidate at position B 2 is available, it is added to the candidate list.
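The insertion order and constrained pruning described in the preceding paragraphs can be sketched as below. Candidates are hypothetical dicts carrying `'mv'` and `'ref_idx'`; any unavailable candidate is passed as `None`. This is a simplified single-list view of the process, not the normative derivation.

```python
def same_motion(a, b):
    return (a is not None and b is not None
            and a['mv'] == b['mv'] and a['ref_idx'] == b['ref_idx'])

def build_merge_list(ipmvc, a1, b1, b0, idmvc, bvsp, a0, b2):
    cands = []
    if ipmvc is not None:
        cands.append(ipmvc)
    # Constrained pruning: A1 against IPMVC; B1 against A1 (or IPMVC);
    # IDMVC against both A1 and B1.
    if a1 is not None and not same_motion(a1, ipmvc):
        cands.append(a1)
    if b1 is not None and not same_motion(b1, a1) and not same_motion(b1, ipmvc):
        cands.append(b1)
    if b0 is not None:
        cands.append(b0)
    if idmvc is not None and not same_motion(idmvc, a1) \
            and not same_motion(idmvc, b1):
        cands.append(idmvc)
    if bvsp is not None:      # only when BVSP is enabled for the picture/slice
        cands.append(bvsp)
    if a0 is not None:
        cands.append(a0)
    if b2 is not None:
        cands.append(b2)
    return cands
```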

下一部分将论述3D-HEVC中用于时间合并候选者的导出过程。3D-HEVC中的时间合并候选者导出类似于HEVC中的时间合并候选者导出过程,其中利用位于同一地点的PU的运动信息。然而,对于3D-HEVC,可改变时间合并候选者的目标参考图片索引而不是将参考图片索引固定为0。当等于0的目标参考索引对应于时间参考图片(同一视图中)时,在位于同一地点预测单元(PU)的运动向量指向视图间参考图片时,将目标参考索引改变为对应于参考图片列表中的视图间参考图片的第一条目的另一索引。相反,当等于0的目标参考索引对应于视图间参考图片时,在位于同一地点的预测单元(PU)的运动向量指向时间参考图片时,将目标参考图片索引改变为对应于参考图片列表中的时间参考图片的第一条目的另一索引。The next section discusses the derivation process for temporal merging candidates in 3D-HEVC. Temporal merging candidate derivation in 3D-HEVC is similar to that in HEVC, where the motion information of the co-located PU is utilized. However, for 3D-HEVC, the target reference picture index of the temporal merging candidate can be changed instead of fixing the reference picture index to 0. When the target reference index equal to 0 corresponds to a temporal reference picture (in the same view), when the motion vector of the co-located prediction unit (PU) points to an inter-view reference picture, the target reference index is changed to another index corresponding to the first entry of the inter-view reference picture in the reference picture list. Conversely, when the target reference index equal to 0 corresponds to an inter-view reference picture, when the motion vector of the co-located prediction unit (PU) points to a temporal reference picture, the target reference picture index is changed to another index corresponding to the first entry of the temporal reference picture in the reference picture list.

现将论述3D-HEVC中用于组合双向预测合并候选者的实例导出过程。如果从以上两个步骤(即,空间合并候选者的导出和时间合并候选者的导出)导出的候选者的总数小于候选者的最大数目(可为预定义的),那么执行如在HEVC中界定的相同过程,如上文所描述。然而,参考索引l0CandIdx和l1CandIdx的规范不同。图12是指示3D-HEVC中的l0CandIdx及l1CandIdx的实例规范的另一表。举例来说,在图12中说明的表中界定combIdx、l0CandIdx和l1CandIdx之间的关系。An example derivation process for combining bi-predictive merging candidates in 3D-HEVC will now be discussed. If the total number of candidates derived from the above two steps (i.e., derivation of spatial merging candidates and derivation of temporal merging candidates) is less than the maximum number of candidates (which may be predefined), the same process as defined in HEVC is performed, as described above. However, the specification of the reference indices l0CandIdx and l1CandIdx is different. FIG12 is another table indicating an example specification of l0CandIdx and l1CandIdx in 3D-HEVC. For example, the relationship between combIdx, l0CandIdx, and l1CandIdx is defined in the table illustrated in FIG12.

3D-HEVC中用于零运动向量合并候选者的一个实例导出过程是与HEVC中经界定的程序相同的程序。在3D-HEVC的一个实例中,合并候选者列表中的候选者的总数高达6,且在切片标头中产生five_minus_max_num_merge_cand语法元素以指定从6减去合并候选者的最大数目。应注意,语法元素five_minus_max_num_merge_cand的值在0到5(包含0和5)的范围中。An example derivation process for zero motion vector merge candidates in 3D-HEVC is the same as that defined in HEVC. In one example of 3D-HEVC, the total number of candidates in the merge candidate list is up to 6, and a five_minus_max_num_merge_cand syntax element is generated in the slice header to specify the maximum number of merge candidates subtracted from 6. It should be noted that the value of the five_minus_max_num_merge_cand syntax element is in the range of 0 to 5, inclusive.
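The relation between the syntax element and the list size is a one-liner:

```python
def max_num_merge_cand(five_minus_max_num_merge_cand):
    # The slice-header syntax element gives the maximum number of merge
    # candidates subtracted from 6; its value range is 0..5 inclusive.
    assert 0 <= five_minus_max_num_merge_cand <= 5
    return 6 - five_minus_max_num_merge_cand
```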

以下描述例如3D-HEVC中用于深度译码的运动向量继承(MVI)。MVI技术寻求利用图片的纹理分量与其相关联深度视图分量之间的运动特性的相似性。图13是说明用于深度译码的运动向量继承候选者的实例导出的概念图。举例来说,图13展示MVI候选者的导出过程的实例,其中将对应纹理块120选择为位于纹理图片124中的当前PU122的中心右下方的4x4块。对于深度图片128中的当前PU 126,MVI候选者重新使用与对应纹理图片124中的已经译码对应纹理块120相关联的运动向量和参考索引(如果此些信息可用)。The following describes motion vector inheritance (MVI) for depth coding, for example, in 3D-HEVC. MVI techniques seek to exploit similarities in motion characteristics between texture components of a picture and their associated depth view components. FIG13 is a conceptual diagram illustrating an example derivation of motion vector inheritance candidates for depth coding. For example, FIG13 shows an example of an MVI candidate derivation process where the corresponding texture block 120 is selected as a 4x4 block located to the lower right of the center of the current PU 122 in the texture picture 124. For the current PU 126 in the depth picture 128, the MVI candidate reuses the motion vector and reference index associated with the already coded corresponding texture block 120 in the corresponding texture picture 124 (if such information is available).

应注意,在深度译码中使用具有整数精度的运动向量,而利用具有四分之一精度的运动向量用于纹理译码。因此,对应纹理块的运动向量可在用作MVI候选者之前经按比例缩放。It should be noted that motion vectors with integer precision are used in depth coding, while motion vectors with quarter precision are utilized for texture coding. Therefore, the motion vectors of the corresponding texture blocks can be scaled before being used as MVI candidates.
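The precision adjustment just mentioned can be sketched as below; the round-to-nearest choice via `(+2) >> 2` is an assumption of this sketch (the text only states that the texture motion vector is scaled from quarter precision to integer precision).

```python
def scale_texture_mv_for_depth(mv_quarter):
    # Quarter-sample texture motion vector -> integer-precision motion
    # vector for depth coding; rounding behaviour is assumed.
    return tuple((c + 2) >> 2 for c in mv_quarter)
```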

在MVI候选者产生的情况下,如下构造用于深度视图的合并候选者列表。对于MVI插入,使用如上文所描述的技术导出MVI,并且如果可用则插入到合并候选者列表中。In case of MVI candidate generation, the merge candidate list for depth views is constructed as follows: For MVI insertion, MVI is derived using the techniques as described above and inserted into the merge candidate list if available.

下文描述用于空间合并候选者的导出过程和3D-HEVC中用于深度译码的IDMVC插入。首先,视频编码器20和/或视频解码器30可经配置以按以下次序检查空间相邻PU的运动信息:A1,B1,B0,A0或B2The following describes the derivation process for spatial merging candidates and IDMVC insertion for depth coding in 3D-HEVC.First, video encoder 20 and/or video decoder 30 may be configured to check motion information of spatially neighboring PUs in the following order: A1 , B1 , B0 , A0 , or B2 .

视频编码器20和/或视频解码器30可随后如下执行受约束修剪。如果位置A1处的运动向量候选者和MVI候选者具有相同运动向量和相同参考索引,那么A1处的运动向量候选者不插入到合并候选者列表中。如果位置B1处的运动向量候选者和位置A1处的运动向量候选者/MVI候选者具有相同运动向量和相同参考索引,那么位置B1处的运动向量候选者不插入到合并候选者列表中。如果位置B0处的运动向量候选者可用,那么将位置B0处的运动向量候选者添加到合并候选者列表。如果位置A0处的运动向量候选者可用,那么将位置A0处的运动向量候选者添加到合并候选者列表。如果位置B2处的运动向量候选者可用,那么将位置B2处的运动向量候选者添加到合并候选者列表。Video encoder 20 and/or video decoder 30 may then perform constrained pruning as follows. If the motion vector candidate at position A1 and the MVI candidate have the same motion vector and the same reference index, the motion vector candidate at A1 is not inserted into the merge candidate list. If the motion vector candidate at position B1 and the motion vector candidate/MVI candidate at position A1 have the same motion vector and the same reference index, the motion vector candidate at position B1 is not inserted into the merge candidate list. If the motion vector candidate at position B0 is available, the motion vector candidate at position B0 is added to the merge candidate list. If the motion vector candidate at position A0 is available, the motion vector candidate at position A0 is added to the merge candidate list. If the motion vector candidate at position B2 is available, the motion vector candidate at position B2 is added to the merge candidate list.

3D-HEVC深度译码中用于时间合并候选者的导出过程类似于HEVC中的时间合并候选者导出过程,其中利用位于同一地点的PU的运动信息。然而,在3D-HEVC深度译码中,如上文所解释,时间合并候选者的目标参考图片索引可改变而不是固定为0。The derivation process for temporal merging candidates in 3D-HEVC depth coding is similar to that in HEVC, where the motion information of the co-located PU is utilized. However, in 3D-HEVC depth coding, as explained above, the target reference picture index of the temporal merging candidate can be changed instead of being fixed to 0.

现将描述3D-HEVC深度译码中用于组合双向预测合并候选者的导出过程。如果从以上两个步骤导出的候选者的总数小于候选者的最大数目,那么除l0CandIdx和l1CandIdx的指定外执行如HEVC中界定的相同过程。在图12中说明的表中界定combIdx、l0CandIdx和l1CandIdx之间的关系。The derivation process for combining bi-predictive merging candidates in 3D-HEVC depth coding will now be described. If the total number of candidates derived from the above two steps is less than the maximum number of candidates, the same process as defined in HEVC is performed, except for the designation of l0CandIdx and l1CandIdx. The relationship between combIdx, l0CandIdx, and l1CandIdx is defined in the table illustrated in FIG12.

3D-HEVC深度译码中用于零运动向量合并候选者的导出过程与HEVC中界定的程序相同。The derivation process for zero motion vector merging candidates in 3D-HEVC depth coding is the same as the procedure defined in HEVC.

以下描述用于高级残余预测(ARP)的实例技术。第4次JCT3V会议中采用应用于具有等于Part_2Nx2N(例如,图5中的NxN)的分割模式的CU的ARP,如JCT3V-D0177中所提议。JCT3V-D0177文献是张等人的标题为“CE4:用于多视图译码的高级残余预测”的文献。JCT3V-D0177文献从2014年8月22日起从http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=862可用。The following describes an example technique for advanced residual prediction (ARP). The 4th JCT3V meeting adopted ARP applied to CUs with a partitioning mode equal to Part_2Nx2N (e.g., NxN in FIG. 5 ), as proposed in JCT3V-D0177. The JCT3V-D0177 document is Zhang et al., entitled “CE4: Advanced Residual Prediction for Multi-View Coding.” The JCT3V-D0177 document is available from http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=862 as of August 22, 2014.

图14说明多视图视频译码中的高级残余预测(ARP)的预测结构。如图14中所展示,在当前块(“Curr”)140的预测中调用随后的块。由视差向量(DV)146导出的参考/基础视图144中的参考块142标记为“基础”。通过当前块140的(时间)运动向量150(表示为TMV)导出的与当前块Curr 140在同一视图(视图Vm)中的块148标记为“CurrTRef”。通过当前块的时间运动向量(TMV)导出的与块基础142在同一视图(视图V0)中的块152标记为“BaseTRef”。参考块BaseTRef 152由相对于当前块Curr 140的向量TMV+DV(向量154)识别。FIG. 14 illustrates the prediction structure of advanced residual prediction (ARP) in multi-view video coding. As shown in FIG. 14, the following blocks are invoked in the prediction of a current block ("Curr") 140. A reference block 142 in a reference/base view 144 derived from a disparity vector (DV) 146 is labeled "Base." A block 148 in the same view (view Vm) as the current block Curr 140, derived from a (temporal) motion vector 150 (denoted as TMV) of the current block 140, is labeled "CurrTRef." A block 152 in the same view (view V0) as the block Base 142, derived from the temporal motion vector (TMV) of the current block, is labeled "BaseTRef." Reference block BaseTRef 152 is identified by the vector TMV+DV (vector 154) taken relative to current block Curr 140.

残余预测符表示为BaseTRef-Base,其中减法运算应用于所表示像素阵列的每一像素。可进一步将加权因数“w”乘以残余预测符。因此当前块Curr的最终预测符可表示为CurrTRef+w*(BaseTRef-Base)。The residual predictor is expressed as BaseTRef-Base, where the subtraction operation is applied to each pixel of the represented pixel array. A weighting factor "w" may be further multiplied by the residual predictor. Thus, the final predictor of the current block Curr may be expressed as CurrTRef+w*(BaseTRef-Base).
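作为说明(并非规范的一部分),以上逐像素预测符运算可草拟如下;阵列表示与函数名称均为假设。As an illustration (not part of any specification), the per-pixel predictor computation above can be sketched as follows; the array representation and function name are assumptions:

```python
# Sketch (illustrative only): final ARP predictor CurrTRef + w * (BaseTRef - Base),
# applied to each pixel. Pixel arrays are represented as lists of lists of samples.
def arp_final_predictor(curr_t_ref, base_t_ref, base, w):
    return [[ct + w * (bt - b)
             for ct, bt, b in zip(r_ct, r_bt, r_b)]
            for r_ct, r_bt, r_b in zip(curr_t_ref, base_t_ref, base)]
```

当w等于0时,预测符退化为CurrTRef,即不使用残余预测。When w equals 0, the predictor degenerates to CurrTRef, i.e., no residual prediction is used.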

注意在以上描述和图14中,假定应用单向预测。当延伸ARP到双向预测时,针对每一参考图片列表应用以上步骤。当当前块Curr使用视图间参考图片(不同视图中)用于两个参考图片列表中的一者时,停用ARP过程。Note that in the above description and FIG14 , unidirectional prediction is assumed. When ARP is extended to bidirectional prediction, the above steps are applied for each reference picture list. When the current block Curr uses an inter-view reference picture (in a different view) for one of the two reference picture lists, the ARP process is disabled.

以下描述ARP中的解码过程。首先,视频解码器30获得指向目标参考视图的视差向量(例如,使用NBDV导出过程)。随后,在同一存取单元内的参考视图的图片中,视频解码器30使用视差向量定位对应块。The decoding process in ARP is described below. First, the video decoder 30 obtains the disparity vector pointing to the target reference view (for example, using the NBDV derivation process). Then, in the pictures of the reference view within the same access unit, the video decoder 30 uses the disparity vector to locate the corresponding block.

视频解码器30可再使用当前块的运动信息以导出参考块的运动信息。视频解码器30可随后基于当前块的同一运动向量和所导出参考图片应用用于对应块的运动补偿,以导出残余块。Video decoder 30 may reuse the motion information of the current block to derive motion information for the reference block.Video decoder 30 may then apply motion compensation for the corresponding block based on the same motion vector of the current block and the derived reference picture to derive a residual block.

图15是说明当前块160、参考块162和运动补偿块164和166之间的实例关系的概念图。具有与当前视图(Vm)的参考图片相同的POC(图片次序计数)值的参考视图(V0)中的参考图片被选择为对应块162的参考图片。视频编码器20和/或视频解码器30可将加权因数应用于残余块以得到经加权残余块且将经加权残余块的值添加到预测样本。FIG. 15 is a conceptual diagram illustrating an example relationship between a current block 160, a reference block 162, and motion compensation blocks 164 and 166. A reference picture in a reference view (V0) having the same POC (Picture Order Count) value as the reference picture of the current view (Vm) is selected as the reference picture for corresponding block 162. Video encoder 20 and/or video decoder 30 may apply a weighting factor to a residual block to obtain a weighted residual block and add the value of the weighted residual block to the prediction sample.

以下描述加权因数。ARP中使用三个加权因数,即,0、0.5和1。导致当前CU的最小速率失真成本的加权因数被选择为最终加权因数,且在CU层级在位流中发射对应加权因数索引(例如,分别对应于加权因数0、1和0.5的0、1和2)。在ARP的一个实例中,一个CU中的全部PU预测共享同一加权因数。当加权因数等于0时,ARP不用于当前CU。The weighting factors are described below. Three weighting factors are used in ARP, namely 0, 0.5, and 1. The weighting factor that results in the minimum rate-distortion cost for the current CU is selected as the final weighting factor, and the corresponding weighting factor index (e.g., 0, 1, and 2 corresponding to weighting factors 0, 1, and 0.5, respectively) is transmitted in the bitstream at the CU level. In one example of ARP, all PU predictions in a CU share the same weighting factor. When the weighting factor is equal to 0, ARP is not used for the current CU.
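以下为上述编码器端选择的非规范性草图;rd_cost为假设的回调,返回以给定加权因数编码当前CU的速率失真成本。A non-normative sketch of the encoder-side selection described above; rd_cost is a hypothetical callback returning the rate-distortion cost of coding the current CU with a given weighting factor:

```python
# Candidate weights in signaled-index order: index 0 -> 0, index 1 -> 1, index 2 -> 0.5.
ARP_WEIGHTS = [0.0, 1.0, 0.5]

def select_arp_weight(rd_cost):
    # Evaluate each candidate weight and keep the one with minimum RD cost.
    costs = [rd_cost(w) for w in ARP_WEIGHTS]
    idx = min(range(len(ARP_WEIGHTS)), key=costs.__getitem__)
    return idx, ARP_WEIGHTS[idx]   # index 0 (weight 0) disables ARP for the CU
```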

以下描述ARP的一些进一步简化。首先,描述经由运动向量按比例缩放的参考图片选择。第二,描述内插滤波器。The following describes some further simplifications of ARP. First, reference picture selection via motion vector scaling is described. Second, interpolation filters are described.

对于经由运动向量按比例缩放的参考图片选择,在JCT3V-C0049中,以非零加权因数译码的预测单元的参考图片可在块之间不同。JCT3V-C0049文献是张等人的标题为“3D-CE4:用于多视图译码的高级残余预测”的文献。JCT3V-C0049文献在2013年9月23日从http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=487可用。For reference picture selection via motion vector scaling, in JCT3V-C0049, the reference picture of a prediction unit coded with a non-zero weighting factor can differ between blocks. The JCT3V-C0049 document is Zhang et al., entitled "3D-CE4: Advanced Residual Prediction for Multiview Coding." The JCT3V-C0049 document was available on September 23, 2013, from http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=487.

因此,可能需要存取来自参考视图的不同图片以产生对应块的经运动补偿的块(例如,图14中的BaseTRef)。已提出当加权因数不等于0时在执行用于残余产生过程的运动补偿之前朝向固定图片按比例缩放当前PU的经解码运动向量。如JCT3V-D0177中所提议,所述固定图片在其来自同一视图的情况下经界定为每一参考图片列表的第一参考图片。当经解码运动向量不指向固定图片时,视频解码器30可首先按比例缩放经解码运动向量且随后使用经按比例缩放的运动向量以识别CurrTRef和BaseTRef。用于ARP的此参考图片称为目标ARP参考图片。Therefore, different pictures from the reference view may need to be accessed to produce the motion-compensated block of the corresponding block (e.g., BaseTRef in FIG. 14). It has been proposed to scale the decoded motion vector of the current PU toward a fixed picture before performing motion compensation for the residual generation process when the weighting factor is not equal to 0. As proposed in JCT3V-D0177, the fixed picture is defined as the first reference picture of each reference picture list if it is from the same view. When the decoded motion vector does not point to the fixed picture, video decoder 30 may first scale the decoded motion vector and then use the scaled motion vector to identify CurrTRef and BaseTRef. This reference picture used for ARP is called the target ARP reference picture.
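以下草图假设朝向目标ARP参考图片的运动向量按比例缩放遵循HEVC样式的POC距离缩放(常数取自HEVC的时间运动向量缩放公式;函数接口为说明性的)。The sketch below assumes the motion vector scaling toward the target ARP reference picture follows HEVC-style POC-distance scaling (the constants follow HEVC's temporal MV scaling formulas; the interface is illustrative):

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def scale_mv_component(mv, curr_poc, orig_ref_poc, target_ref_poc):
    # POC distances to the target picture (tb) and the original reference (td).
    tb = clip3(-128, 127, curr_poc - target_ref_poc)
    td = clip3(-128, 127, curr_poc - orig_ref_poc)
    if td == tb:
        return mv                      # already points to the target picture
    # tx and dist_scale follow the HEVC scaling formulas (truncating division).
    sign_td = 1 if td >= 0 else -1
    tx = sign_td * ((16384 + (abs(td) >> 1)) // abs(td))
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv
    sign = 1 if prod >= 0 else -1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))
```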

对于内插滤波器,如JCT3V-C0049中所描述,视频编码器20和/或视频解码器30可在对应块及其预测块的内插过程期间应用双线性滤波器。对于非基础视图中的当前PU的预测块,可应用常规8/4分接头滤波器。在另一实例中,如JCT3V-D0177所提议,无论当应用ARP时块在基础视图还是非基础视图中,视频编码器20和/或视频解码器30可始终采用双线性滤波。For interpolation filters, as described in JCT3V-C0049, video encoder 20 and/or video decoder 30 may apply a bilinear filter during the interpolation process of the corresponding block and its prediction block. For the prediction block of the current PU in a non-base view, a conventional 8/4 tap filter may be applied. In another example, as proposed in JCT3V-D0177, video encoder 20 and/or video decoder 30 may always employ bilinear filtering regardless of whether the block is in a base view or a non-base view when ARP is applied.

在本发明的一或多个实例中,视频编码器20和/或视频解码器30可经配置以使用从NBDV导出过程返回的视图次序索引识别参考视图。在ARP的一个实例中,当一个参考图片列表中的一个PU的参考图片来自与当前视图不同的视图时,ARP针对此参考图片列表停用。In one or more examples of the present disclosure, video encoder 20 and/or video decoder 30 may be configured to identify a reference view using a view order index returned from the NBDV derivation process. In one example of ARP, ARP is disabled for a reference picture list when the reference pictures of a PU in the reference picture list are from a different view than the current view.

2013年6月27日申请的第61/840,400号美国临时申请案和2013年7月18日申请的第61/847,942号美国临时申请案中描述用于深度帧间译码的一些添加技术。在这些实例中,当译码深度图片时,通过估计深度值从当前块的相邻样本转换视差向量。Some additional techniques for depth inter coding are described in U.S. Provisional Application No. 61/840,400, filed on June 27, 2013, and U.S. Provisional Application No. 61/847,942, filed on July 18, 2013. In these examples, when coding a depth picture, a disparity vector is converted from neighboring samples of the current block by estimating depth values.

在ARP的其它实例中,可例如通过存取由视差向量识别的基础视图的参考块而导出额外合并候选者。In other examples of ARP, additional merge candidates may be derived, eg, by accessing a reference block of the base view identified by a disparity vector.

以下描述用于定位用于视图间运动预测的块的技术。在3D-HEVC中,使用两个一般步骤识别参考4x4块。第一步骤是以视差运动向量识别参考视图中的像素。第二步骤是获得对应4x4块(具有分别对应于RefPicList0或RefPicList1的运动信息的唯一集合)且利用所述运动信息产生合并候选者。The following describes a technique for locating blocks for inter-view motion prediction. In 3D-HEVC, two general steps are used to identify reference 4x4 blocks. The first step is to identify pixels in the reference view using disparity motion vectors. The second step is to obtain the corresponding 4x4 block (with a unique set of motion information corresponding to RefPicList0 or RefPicList1, respectively) and use that motion information to generate merge candidates.

如下识别参考视图中的像素(xRef,yRef):The pixel (xRef, yRef) in the reference view is identified as follows:

xRef=Clip3(0,PicWidthInSamplesL-1,xP+((nPSW-1)>>1)+((mvDisp[0]+2)>>2)) (H-124)xRef=Clip3(0,PicWidthInSamples L -1,xP+((nPSW-1)>>1)+((mvDisp[0]+2)>>2)) (H-124)

yRef=Clip3(0,PicHeightInSamplesL-1,yP+((nPSH-1)>>1)+((mvDisp[1]+2)>>2)) (H-125)yRef=Clip3(0,PicHeightInSamples L -1,yP+((nPSH-1)>>1)+((mvDisp[1]+2)>>2)) (H-125)

其中(xP,yP)是当前PU的左上方样本的坐标,mvDisp是视差向量且nPSWxnPSH是当前PU的大小,且PicWidthInSamplesL及PicHeightInSamplesL界定参考视图(与当前视图相同)中的图片的分辨率。where (xP, yP) are the coordinates of the top left sample of the current PU, mvDisp is the disparity vector and nPSWxnPSH is the size of the current PU, and PicWidthInSamplesL and PicHeightInSamplesL define the resolution of the picture in the reference view (same as the current view).
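方程式(H-124)和(H-125)可直接转录如下(变量名称与以上规范变量一致;此为说明性草图)。Equations (H-124) and (H-125) can be transcribed directly as follows (variable names mirror the specification variables above; this is an illustrative sketch):

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def center_ref_pixel(xP, yP, nPSW, nPSH, mvDisp, pic_w, pic_h):
    # mvDisp is in quarter-sample units, so (v + 2) >> 2 rounds it to full samples.
    xRef = clip3(0, pic_w - 1, xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))
    yRef = clip3(0, pic_h - 1, yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))
    return xRef, yRef
```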

以下描述子PU层级视图间运动预测。在JCT3V-E0184中,提出使用用于IPMVC的子PU层级视图间运动预测方法,即,从参考视图中的参考块导出的候选者。安等人的JCT3V-E0184“3D-CE3.h相关:子PU层级视图间运动预测”从http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1198可用。The following describes sub-PU-level inter-view motion prediction. JCT3V-E0184 proposes using a sub-PU-level inter-view motion prediction method for IPMVC, i.e., candidates derived from reference blocks in the reference view. An et al.'s JCT3V-E0184, "3D-CE3.h Related: Sub-PU-level Inter-view Motion Prediction," is available at http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1198.

上文描述视图间运动预测的基本概念(例如,相对于用于跳过/合并模式的视图间候选者导出过程),其中仅与中心位置相关联的参考块的运动信息用于相依视图中的当前PU。然而,当前PU可对应于参考视图中的参考区域(具有与当前PU相同的大小,由视差向量识别),且参考区域可具有丰富的运动信息(即,多于一个运动向量)。The basic concept of inter-view motion prediction is described above (e.g., with respect to the inter-view candidate derivation process for skip/merge mode), where only the motion information of the reference block associated with the center position is used for the current PU in the dependent view. However, the current PU may correspond to a reference area in the reference view (having the same size as the current PU and identified by the disparity vector), and the reference area may have rich motion information (i.e., more than one motion vector).

因此,提出子PU层级视图间运动预测(SPIVMP)方法。图16是说明子预测单元(PU)视图间运动预测的概念图。如图16中所展示,当前视图V1中的当前PU 170可分裂成多个子PU(例如,四个子PU)。每一子PU的视差向量可用以定位参考视图V0中的对应参考块。视频编码器20和/或视频解码器30可经配置以复制(即,再使用)与参考块中的每一者相关联的运动向量以用于当前PU 170的对应子PU。Therefore, a sub-PU level inter-view motion prediction (SPIVMP) method is proposed. FIG16 is a conceptual diagram illustrating sub-prediction unit (PU) inter-view motion prediction. As shown in FIG16, current PU 170 in current view V1 can be split into multiple sub-PUs (e.g., four sub-PUs). The disparity vector of each sub-PU can be used to locate the corresponding reference block in reference view V0. Video encoder 20 and/or video decoder 30 can be configured to copy (i.e., reuse) the motion vector associated with each of the reference blocks for the corresponding sub-PU of current PU 170.

在一个实例中,如下导出时间视图间合并候选者。首先,由NxN表示经指派子PU大小。首先将当前PU划分为具有较小大小的多个子PU。通过nPSW x nPSH表示当前PU的大小且通过nPSWsub x nPSHSub表示子PU的大小。In one example, temporal inter-view merge candidates are derived as follows. First, the assigned sub-PU size is represented by NxN. The current PU is first divided into multiple sub-PUs of smaller size. The size of the current PU is represented by nPSW x nPSH and the size of the sub-PU is represented by nPSWsub x nPSHSub.

nPSWsub=min(N,nPSW)nPSWsub=min(N,nPSW)

nPSHSub=min(N,nPSH)nPSHSub=min(N,nPSH)

第二,针对每一参考图片列表将默认运动向量tmvLX设定为(0,0)且将参考索引refLX设定为-1。对于光栅扫描次序中的每一子PU,以下适用。将DV添加到当前子PU的中间位置以如下获得参考样本位置(xRefSub,yRefSub):Second, for each reference picture list, the default motion vector tmvLX is set to (0, 0) and the reference index refLX is set to -1. For each sub-PU in raster scan order, the following applies. DV is added to the middle position of the current sub-PU to obtain the reference sample position (xRefSub, yRefSub) as follows:

xRefSub=Clip3(0,PicWidthInSamplesL-1,xPSub+nPSWsub/2+((mvDisp[0]+2)>>2))xRefSub=Clip3(0,PicWidthInSamplesL-1,xPSub+nPSWsub/2+((mvDisp[0]+2)>>2))

yRefSub=Clip3(0,PicHeightInSamplesL-1,yPSub+nPSHSub/2+((mvDisp[1]+2)>>2))yRefSub=Clip3(0,PicHeightInSamplesL-1,yPSub+nPSHSub/2+((mvDisp[1]+2)>>2))

参考视图中覆盖(xRefSub,yRefSub)的块用作当前子PU的参考块。The block covering (xRefSub, yRefSub) in the reference view is used as the reference block of the current sub-PU.

对于所识别的参考块,如果其是使用时间运动向量经译码,那么以下适用。如果refL0和refL1两者等于-1,且当前子PU不是光栅扫描次序中的第一者,那么参考块的运动信息由全部先前子PU继承。相关联运动参数可用作当前子PU的候选运动参数。语法元素tmvLX和refLX经更新到当前子PU的运动信息。否则(例如,如果参考块经帧内译码),将当前子PU的运动信息设定成tmvLX和refLX。可应用不同的子PU块大小,例如4x4、8x8和16x16。子PU的大小可在VPS中用信号表示。For the identified reference block, if it is coded using a temporal motion vector, the following applies. If both refL0 and refL1 are equal to -1, and the current sub-PU is not the first in raster scan order, the motion information of the reference block is inherited by all previous sub-PUs. The associated motion parameters may be used as candidate motion parameters for the current sub-PU. The syntax elements tmvLX and refLX are updated to the motion information of the current sub-PU. Otherwise (e.g., if the reference block is intra-coded), the motion information of the current sub-PU is set to tmvLX and refLX. Different sub-PU block sizes may be applied, such as 4x4, 8x8, and 16x16. The size of the sub-PU may be signaled in the VPS.
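以上子PU分裂和参考样本定位步骤可草拟如下(返回光栅扫描次序中的参考样本位置列表;接口为说明性的,并非规范)。The sub-PU splitting and reference-sample positioning steps above can be sketched as follows (returning the reference sample positions in raster scan order; the interface is illustrative, not normative):

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def sub_pu_ref_positions(xP, yP, nPSW, nPSH, N, mvDisp, pic_w, pic_h):
    # Sub-PU dimensions: nPSWsub = min(N, nPSW), nPSHSub = min(N, nPSH).
    nPSWsub = min(N, nPSW)
    nPSHSub = min(N, nPSH)
    positions = []
    for yPSub in range(yP, yP + nPSH, nPSHSub):        # raster scan order
        for xPSub in range(xP, xP + nPSW, nPSWsub):
            # Add the (quarter-pel) disparity vector to the middle of each sub-PU.
            xRefSub = clip3(0, pic_w - 1,
                            xPSub + nPSWsub // 2 + ((mvDisp[0] + 2) >> 2))
            yRefSub = clip3(0, pic_h - 1,
                            yPSub + nPSHSub // 2 + ((mvDisp[1] + 2) >> 2))
            positions.append((xRefSub, yRefSub))
    return positions
```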

以下描述用于深度译码的子PU层级运动向量继承。类似于从一个纹理视图到另一纹理视图的子PU层级视图间运动预测的提议,2013年7月24日申请的第61/858,089号美国临时申请案提出应用从一个纹理视图到对应深度视图的子PU层级运动预测的技术。即,可在若干子PU中分割当前PU且每一子PU使用位于同一地点的纹理块的运动信息用于运动补偿。在此情况下,支持子PU层级MVI,且由视图间运动预测使用的视差向量被视为始终为零。The following describes sub-PU-level motion vector inheritance for depth coding. Similar to the proposed sub-PU-level inter-view motion prediction from one texture view to another, U.S. Provisional Application No. 61/858,089, filed on July 24, 2013, proposes a technique for applying sub-PU-level motion prediction from one texture view to the corresponding depth view. Specifically, the current PU can be partitioned into several sub-PUs, and each sub-PU uses the motion information of the co-located texture block for motion compensation. In this case, sub-PU-level MVI is supported, and the disparity vector used for inter-view motion prediction is considered to be always zero.

3D-HEVC中用于BVSP的当前设计展现以下问题。当使用AMP且当前PU大小是4x16或16x4且PU经单向预测时,通过针对整个PU导出一个视差向量而实现BVSP。即PU中的每一子块使用相同视差向量用于参考块合成和运动补偿。因此,对于较大的块大小,BVSP可较不高效,因为使用相同视差向量用于全部子块的块合成和运动补偿对于所述子块中的一些可能较不最佳。The current design for BVSP in 3D-HEVC exhibits the following issues. When AMP is used and the current PU size is 4x16 or 16x4, and the PU is unidirectionally predicted, BVSP is implemented by deriving a single disparity vector for the entire PU. That is, each sub-block in the PU uses the same disparity vector for reference block synthesis and motion compensation. Therefore, BVSP can be less efficient for larger block sizes because using the same disparity vector for block synthesis and motion compensation for all sub-blocks may be less optimal for some of the sub-blocks.

作为另一缺点,当当前PU经双向预测时,以等于4x8和8x4的块大小启用BVSP。然而,在HEVC中,不允许用于含有少于64个像素的块(例如,4x8或8x4块)的运动补偿(但允许16x4和4x16运动补偿)。As another drawback, when the current PU is bidirectionally predicted, BVSP is enabled with block sizes equal to 4x8 and 8x4. However, in HEVC, motion compensation is not allowed for blocks containing fewer than 64 pixels (e.g., 4x8 or 8x4 blocks) (but 16x4 and 4x16 motion compensation is allowed).

鉴于这些缺陷,本发明提出与3D-HEVC中的视图合成预测相关的技术,其集中于BVSP运动补偿大小。根据本发明的技术,对于BVSP,每一PU可分裂成子块,且每一子块可与不同视差运动向量相关联且单独地经运动补偿。以此方式,对于以AMP分割的块可增加BVSP的准确性,且因此,译码效率可增加。根据本发明的技术,可用于BVSP的子块的大小可进一步如下界定。In light of these limitations, this disclosure proposes techniques related to view synthesis prediction in 3D-HEVC, focusing on the size of BVSP motion compensation. According to the techniques of this disclosure, for BVSP, each PU can be split into sub-blocks, and each sub-block can be associated with a different disparity motion vector and independently motion compensated. In this way, the accuracy of BVSP can be increased for blocks partitioned with AMP, and thus, coding efficiency can be improved. According to the techniques of this disclosure, the size of sub-blocks that can be used for BVSP can be further defined as follows.

在本发明的一个实例中,当当前PU大小是16x4(或4x16)且当前PU经单向预测时,视频编码器20和/或视频解码器30可经配置以将BVSP和运动补偿技术应用于当前PU的8x4(或4x8)子块。即,BVSP子区的大小是8x4(或4x8)。子块中的每一者可被指派从深度块转换的视差运动向量。In one example of the present disclosure, when the current PU size is 16x4 (or 4x16) and the current PU is unidirectionally predicted, video encoder 20 and/or video decoder 30 may be configured to apply BVSP and motion compensation techniques to the 8x4 (or 4x8) sub-block of the current PU. That is, the size of the BVSP sub-region is 8x4 (or 4x8). Each of the sub-blocks may be assigned a disparity motion vector converted from a depth block.

图17是描绘当使用AMP时本发明的BVSP和运动补偿技术的概念图。在图17的实例中,视频编码器20和/或视频解码器30将当前PU 250不对称分割为4x16块。应注意4x16分割仅是一个实例,且参考图17描述的本发明的技术可应用于其它不对称分区,包含16x4分区。视频编码器20和/或视频解码器30可经配置以将PU 250细分为4x8子块255和256。FIG17 is a conceptual diagram illustrating the BVSP and motion compensation techniques of this disclosure when using AMP. In the example of FIG17 , video encoder 20 and/or video decoder 30 asymmetrically partitions current PU 250 into 4x16 blocks. It should be noted that the 4x16 partition is merely an example, and the techniques of this disclosure described with reference to FIG17 may be applied to other asymmetrical partitions, including 16x4 partitions. Video encoder 20 and/or video decoder 30 may be configured to subdivide PU 250 into 4x8 sub-blocks 255 and 256.

在图17的实例中,视频编码器20和/或视频解码器30经配置以使用BVSP单向预测PU 250。在此方面,视频编码器20和/或视频解码器30可经配置以例如使用NBDV导出技术导出PU 250的视差向量。举例来说,视差向量261可从相邻块252导出。视频编码器20和/或视频解码器30可随后经配置以再使用视差向量261来定位参考深度图片中的对应深度块260。根据本发明的技术,并非使用深度块260的全部来导出PU 250的视差运动向量,视频编码器20和/或视频解码器30可经配置以从深度块260的4x8子块264导出子块255的视差运动向量,且从深度块260的4x8子块262导出子块256的视差运动向量。In the example of FIG. 17, video encoder 20 and/or video decoder 30 are configured to unidirectionally predict PU 250 using BVSP. In this regard, video encoder 20 and/or video decoder 30 may be configured to derive a disparity vector for PU 250, for example, using NBDV derivation techniques. For example, disparity vector 261 may be derived from neighboring block 252. Video encoder 20 and/or video decoder 30 may then be configured to reuse disparity vector 261 to locate a corresponding depth block 260 in a reference depth picture. According to the techniques of this disclosure, rather than using the entirety of depth block 260 to derive a single disparity motion vector for PU 250, video encoder 20 and/or video decoder 30 may be configured to derive a disparity motion vector for sub-block 255 from 4x8 sub-block 264 of depth block 260, and derive a disparity motion vector for sub-block 256 from 4x8 sub-block 262 of depth block 260.

视频编码器20和/或视频解码器30可随后使用对应的导出视差运动向量合成子块255和256中的每一者的参考块,以关于对应于具有等于refVIdxLX的视图次序索引的视图间参考图片的参考图片执行运动补偿。通过导出子块255和256中的每一者的个别视差运动向量,可合成较准确的参考视图且对应运动补偿过程可实现增加的译码增益。Video encoder 20 and/or video decoder 30 may then use the corresponding derived disparity motion vectors to synthesize a reference block for each of sub-blocks 255 and 256 to perform motion compensation with respect to a reference picture corresponding to an inter-view reference picture having a view order index equal to refVIdxLX. By deriving individual disparity motion vectors for each of sub-blocks 255 and 256, a more accurate reference view may be synthesized and the corresponding motion compensation process may achieve an increased coding gain.

在本发明的另一实例中,当当前PU大小是16x12(或12x16)且当前PU经单向预测时,视频编码器20和/或视频解码器30可经配置以将当前PU分割为8x4(或4x8)子块(也被称为BVSP子区)且使用BVSP导出每一子块的视差运动向量。In another example of the present disclosure, when the current PU size is 16x12 (or 12x16) and the current PU is uni-directionally predicted, video encoder 20 and/or video decoder 30 may be configured to partition the current PU into 8x4 (or 4x8) sub-blocks (also referred to as BVSP sub-regions) and derive a disparity motion vector for each sub-block using BVSP.

在另一实例中,BVSP子区的大小可设定为16x12或12x16。在再一实例中,每一16x12(或12x16)子块进一步分割成16x8(或8x16)子块和两个8x4(或4x8)子块,其邻近于同一CU中的16x4(4x16)PU。在另一实例中,16x8(或8x16)子块可基于例如对应深度块的4个拐角而进一步分成两个8x8子区或四个4x8(或8x4)子区。In another example, the size of the BVSP sub-region may be set to 16x12 or 12x16. In yet another example, each 16x12 (or 12x16) sub-block is further partitioned into a 16x8 (or 8x16) sub-block and two 8x4 (or 4x8) sub-blocks adjacent to a 16x4 (4x16) PU in the same CU. In another example, a 16x8 (or 8x16) sub-block can be further divided into two 8x8 sub-regions or four 4x8 (or 8x4) sub-regions based on, for example, the four corners of the corresponding depth block.

在本发明的另一实例中,当当前PU的高度和宽度两者大于或等于8且PU经双向预测时,视频编码器20和/或视频解码器30经配置以将BVSP子区的大小设定为8x8而不是4x8或8x4,如3D-HEVC的先前提议中。在另一实例中,替代于使用双向预测BVSP用于具有等于12x16或16x12的大小的PU,可应用单向预测BVSP。在此情况下,运动补偿大小可进一步设定成4x16或16x4。在另一实例中,当当前PU大小是16x4或4x16且当前PU经双向预测时,将BVSP子区的大小设定成等于PU大小。In another example of the present disclosure, when both the height and width of the current PU are greater than or equal to 8 and the PU is bi-directionally predicted, video encoder 20 and/or video decoder 30 is configured to set the size of the BVSP sub-region to 8x8 instead of 4x8 or 8x4, as in previous proposals for 3D-HEVC. In another example, instead of using bi-directionally predicted BVSP for PUs having a size equal to 12x16 or 16x12, uni-directionally predicted BVSP may be applied. In this case, the motion compensation size may be further set to 4x16 or 16x4. In another example, when the current PU size is 16x4 or 4x16 and the current PU is bi-directionally predicted, the size of the BVSP sub-region is set equal to the PU size.
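以下草图汇总本文所述的若干BVSP子区大小规则(经单向预测的AMP PU使用8x4/4x8;两个维度均大于或等于8的经双向预测PU使用8x8;经双向预测的16x4/4x16 PU保持PU大小)。文中的其它变型未涵盖,且对于正方形单向预测PU,此处默认选择4x8仅为简化。The sketch below consolidates several of the BVSP sub-region sizing rules described herein (uni-predicted AMP PUs use 8x4/4x8; bi-predicted PUs with both dimensions >= 8 use 8x8; bi-predicted 16x4/4x16 PUs keep the PU size). Other variants in the text are not covered, and defaulting to 4x8 for square uni-predicted PUs is a simplification here:

```python
def bvsp_subregion_size(pu_w, pu_h, bi_predicted):
    # Bi-prediction: 8x8 sub-regions when both dimensions >= 8,
    # otherwise (e.g., 16x4 / 4x16) the sub-region equals the PU size.
    if bi_predicted:
        if pu_w >= 8 and pu_h >= 8:
            return (8, 8)
        return (pu_w, pu_h)
    # Uni-prediction: split along the longer dimension into 8x4 / 4x8
    # (square PUs default to 4x8 here purely as a simplification).
    return (8, 4) if pu_w > pu_h else (4, 8)
```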

子PU运动预测可展现以下缺陷。在此上下文中,子PU运动预测可包含如上文所描述的JCT3V-E0184中所提议的子PU运动预测技术,以及子PU运动预测扩展到从纹理视图到深度视图的MVI。Sub-PU motion prediction may exhibit the following drawbacks. In this context, sub-PU motion prediction may include the sub-PU motion prediction techniques proposed in JCT3V-E0184 as described above, as well as the extension of sub-PU motion prediction to MVI from texture view to depth view.

作为一个缺点,当不对称运动分割(AMP)经启用时,当前PU大小等于例如4x16、16x4,且VPS中用信号表示的子PU块大小等于8x8,基于子PU运动预测的先前提议,此些PU将分裂成两个4x8或8x4子块。对于每一子块,继承来自参考块的运动信息。运动信息可通过用于视图间运动预测的参考纹理视图中的视差向量识别,或可从用于运动向量继承的对应纹理视图中的位于同一地点的纹理块再使用。在此实例中,可调用基于4x8或8x4的双向预测,HEVC不允许这样。As a drawback, when asymmetric motion partitioning (AMP) is enabled, the current PU size is equal to, for example, 4x16 or 16x4, and the sub-PU block size signaled in the VPS is equal to 8x8, these PUs will be split into two 4x8 or 8x4 sub-blocks based on previous proposals for sub-PU motion prediction. For each sub-block, motion information is inherited from the reference block. The motion information can be identified by the disparity vector in the reference texture view used for inter-view motion prediction, or it can be reused from the co-located texture block in the corresponding texture view for motion vector inheritance. In this example, bidirectional prediction based on 4x8 or 8x4 can be invoked, which is not allowed in HEVC.

作为另一缺点,当AMP经启用时,PU大小等于例如12x16或16x12,且VPS中用信号表示的子PU块大小(即,子块大小)等于8x8,基于子PU运动预测的先前提议,此些PU将分裂成两个8x8和两个4x8/8x4子块。类似于以上情况,4x8/8x4子块可使用双向预测,HEVC不允许这样。As another disadvantage, when AMP is enabled, the PU size is equal to, for example, 12x16 or 16x12, and the sub-PU block size (i.e., sub-block size) signaled in the VPS is equal to 8x8, based on previous proposals for sub-PU motion prediction, these PUs will be split into two 8x8 and two 4x8/8x4 sub-blocks. Similar to the above case, the 4x8/8x4 sub-block can use bidirectional prediction, which is not allowed in HEVC.

本发明中提出与视图间运动预测和运动向量继承(用于深度PU)相关的技术。在当合并索引指示视图间运动预测或MVI时的情境中可应用本发明的技术。确切地说,本发明的视图间运动预测和/或MVI技术包含用于将AMP PU进一步分割为子块且获得子块中的每一者的单独运动信息的技术。以此方式,可针对子块中的每一者改善视图间运动预测和/或MVI的准确性,且因此可增加译码效率。This disclosure proposes techniques related to inter-view motion prediction and motion vector inheritance (for depth PUs). The techniques of this disclosure can be applied in the context of a merge index indicating inter-view motion prediction or MVI. Specifically, the inter-view motion prediction and/or MVI techniques of this disclosure include techniques for further partitioning an AMP PU into sub-blocks and obtaining separate motion information for each of the sub-blocks. In this way, the accuracy of inter-view motion prediction and/or MVI can be improved for each of the sub-blocks, and thus coding efficiency can be increased.

在本发明的一个实例中,当当前PU是使用视图间运动预测和/或MVI经译码且当前PU大小等于4x16或16x4时,视频编码器20和/或视频解码器30可经配置以将PU分裂为两个4x8或8x4子块。对于子块中的每一者,视频编码器20和/或视频解码器30可经配置以仅从对应于特定参考图片列表(例如,RefPicList0)的参考块获得运动信息。对应于RefPicList0中的参考块的运动信息是针对4x8或8x4子块继承。在此情况下,子块是从RefPicList0中的图片单向预测。In one example of the present disclosure, when the current PU is coded using inter-view motion prediction and/or MVI and the current PU size is equal to 4x16 or 16x4, video encoder 20 and/or video decoder 30 may be configured to split the PU into two 4x8 or 8x4 sub-blocks. For each of the sub-blocks, video encoder 20 and/or video decoder 30 may be configured to obtain motion information only from the reference blocks corresponding to a specific reference picture list (e.g., RefPicList0). The motion information corresponding to the reference blocks in RefPicList0 is inherited for the 4x8 or 8x4 sub-blocks. In this case, the sub-blocks are unidirectionally predicted from the pictures in RefPicList0.

图18是说明用于不对称分割成大小4x16和16x4的PU的运动向量继承和运动补偿技术的概念图。举例来说,对于4x16PU,视频编码器20和/或视频解码器30经配置以将4x16PU进一步划分为两个4x8子块300和302。子块300和302中的每一者的运动信息是从属于特定参考图片列表(例如,RefPicList0)的参考图片中的参考块获得。随后相对于RefPicList0中的参考块针对子块300和302中的每一者执行运动补偿。同样,对于16x4PU,视频编码器20和/或视频解码器30经配置以将16x4PU进一步划分为两个8x4子块304和306。子块304和306中的每一者的运动信息是从属于特定参考图片列表(例如,RefPicList0)的参考图片中的参考块获得。随后相对于RefPicList0中的参考块针对子块304和306中的每一者执行运动补偿。FIG18 is a conceptual diagram illustrating motion vector inheritance and motion compensation techniques for asymmetrical partitioning into PUs of size 4x16 and 16x4. For example, for a 4x16 PU, video encoder 20 and/or video decoder 30 is configured to further partition the 4x16 PU into two 4x8 sub-blocks 300 and 302. Motion information for each of sub-blocks 300 and 302 is obtained from a reference block in a reference picture belonging to a particular reference picture list (e.g., RefPicList0). Motion compensation is then performed for each of sub-blocks 300 and 302 relative to the reference block in RefPicList0. Similarly, for a 16x4 PU, video encoder 20 and/or video decoder 30 is configured to further partition the 16x4 PU into two 8x4 sub-blocks 304 and 306. Motion information for each of sub-blocks 304 and 306 is obtained from a reference block in a reference picture belonging to a particular reference picture list (e.g., RefPicList0). Motion compensation is then performed for each of sub-blocks 304 and 306 relative to a reference block in RefPicList0.
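图18中所展示的分裂可草拟如下;矩形为相对于PU的(x, y, 宽度, 高度),且每一子块将仅从RefPicList0继承运动信息(单向预测)。The split shown in FIG18 can be sketched as follows; rectangles are (x, y, width, height) relative to the PU, and each sub-block would inherit motion information only from RefPicList0 (uni-prediction):

```python
def split_amp_pu_for_mvi(pu_w, pu_h):
    # 4x16 -> two 4x8 sub-blocks stacked vertically;
    # 16x4 -> two 8x4 sub-blocks side by side.
    if (pu_w, pu_h) == (4, 16):
        return [(0, 0, 4, 8), (0, 8, 4, 8)]
    if (pu_w, pu_h) == (16, 4):
        return [(0, 0, 8, 4), (8, 0, 8, 4)]
    raise ValueError("only 4x16 and 16x4 PUs are split this way")
```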

在本发明的另一实例中,当当前PU大小是16x12、12x16、4x16或16x4中的一者时,视频编码器20及视频解码器30经配置以当应用子PU层级视图间运动预测和/或MVI(用于深度)时不使用用于8x4/4x8子块的双向预测。即,当当前PU大小是16x12、12x16、4x16或16x4中的一者时,视频编码器20及视频解码器30经配置以当应用子PU层级视图间运动预测和/或MVI(用于深度)时仅使用用于8x4/4x8子块的单向预测。In another example of the present disclosure, when the current PU size is one of 16x12, 12x16, 4x16, or 16x4, video encoder 20 and video decoder 30 are configured not to use bi-directional prediction for 8x4/4x8 sub-blocks when sub-PU level inter-view motion prediction and/or MVI (for depth) is applied. That is, when the current PU size is one of 16x12, 12x16, 4x16, or 16x4, video encoder 20 and video decoder 30 are configured to use only uni-directional prediction for 8x4/4x8 sub-blocks when sub-PU level inter-view motion prediction and/or MVI (for depth) is applied.

在本发明的另一实例中,提出当应用子PU层级视图间运动预测或MVI且当前PU大小等于4x16或16x4时,PU不分裂成子PU。In another example of the present invention, it is proposed that when sub-PU level inter-view motion prediction or MVI is applied and the current PU size is equal to 4x16 or 16x4, the PU is not split into sub-PUs.

在本发明的另一实例中,提出当应用子PU层级视图间运动预测或MVI且当前PU大小等于12x16或16x12时,将PU分裂成三个相等大小的子PU块,具有等于4x16或16x4的大小。对于每一子PU块,继承对应参考块的运动信息。In another example of the present invention, when sub-PU level inter-view motion prediction or MVI is applied and the current PU size is equal to 12x16 or 16x12, the PU is split into three equal-sized sub-PU blocks with a size equal to 4x16 or 16x4. For each sub-PU block, the motion information of the corresponding reference block is inherited.

在本发明的另一实例中,当当前PU大小等于12x16或16x12时,将PU分裂为两个8x8和一个4x16或16x4子PU块,其中8x8子PU形成含有此PU的CU的左边或上半部。在此实例的另一方面中,4x16和16x4子块进一步分裂成两个4x8或8x4子PU块。对于每一4x8或8x4子PU,仅获得对应于参考图片列表(RefPicList0)的参考块的运动信息且再用于4x8或8x4子PU。在此情况下,子PU是从RefPicList0中的图片单向预测。In another example of the present invention, when the current PU size is equal to 12x16 or 16x12, the PU is split into two 8x8 and one 4x16 or 16x4 sub-PU blocks, where the 8x8 sub-PU forms the left or upper half of the CU containing the PU. In another aspect of this example, the 4x16 and 16x4 sub-blocks are further split into two 4x8 or 8x4 sub-PU blocks. For each 4x8 or 8x4 sub-PU, only the motion information of the reference block corresponding to the reference picture list (RefPicList0) is obtained and reused for the 4x8 or 8x4 sub-PU. In this case, the sub-PU is unidirectionally predicted from the picture in RefPicList0.
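以上12x16/16x12分区的一个选项可草拟如下;坐标(x, y, w, h)相对于PU,且假设PU起始于CU的左/上边缘(此为假设)。One option for the 12x16/16x12 partition above can be sketched as follows; coordinates (x, y, w, h) are PU-relative, assuming the PU starts at the CU's left/top edge (an assumption):

```python
def split_12x16_pu(pu_w, pu_h):
    # 12x16: two 8x8 sub-PUs stacked on the left plus one 4x16 strip on the right;
    # 16x12: two 8x8 sub-PUs side by side on top plus one 16x4 strip below.
    if (pu_w, pu_h) == (12, 16):
        return [(0, 0, 8, 8), (0, 8, 8, 8), (8, 0, 4, 16)]
    if (pu_w, pu_h) == (16, 12):
        return [(0, 0, 8, 8), (8, 0, 8, 8), (0, 8, 16, 4)]
    raise ValueError("expected a 12x16 or 16x12 PU")
```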

在本发明的另一实例中,当BVSP用于具有等于12x16或16x12的大小的PU时,将PU分裂成三个相等大小的子PU块,具有等于4x16或16x4的大小。视频编码器20和/或视频解码器30可随后从对应深度块导出每一子PU的唯一视差运动向量。In another example of the present disclosure, when BVSP is used for a PU having a size equal to 12x16 or 16x12, the PU is split into three equally sized sub-PU blocks having a size equal to 4x16 or 16x4. Video encoder 20 and/or video decoder 30 may then derive a unique disparity motion vector for each sub-PU from the corresponding depth block.

图19是说明可以实施本发明的技术的视频编码器20的实例的框图。视频编码器20可执行视频切片内的视频块的帧内和帧间译码(包含视图间译码),所述视频切片例如纹理图像和深度图两者的切片。纹理信息大体上包含明度(亮度或强度)和色度(颜色,例如红色调和蓝色调)信息。一般来说,视频编码器20可确定相对于明度切片的译码模式,且再使用来自对明度信息进行译码的预测信息以编码色度信息(例如,通过再使用分割信息、帧内预测模式选择、运动向量或类似物)。帧内译码依赖于空间预测来减少或移除给定视频帧或图片内的视频中的空间冗余。帧间译码依靠时间预测来减少或移除视频序列的邻近帧或图片内的视频中的时间冗余。帧内模式(I模式)可指代若干基于空间的译码模式中的任一者。例如单向预测(P模式)或双向预测(B模式)等帧间模式可指代若干基于时间的译码模式中的任一者。FIG19 is a block diagram illustrating an example of a video encoder 20 that may implement the techniques of this disclosure. Video encoder 20 may perform intra- and inter-coding (including inter-view coding) of video blocks within video slices, such as slices of both texture images and depth maps. Texture information generally includes luma (brightness or intensity) and chroma (color, such as red and blue tones) information. In general, video encoder 20 may determine a coding mode relative to a luma slice and reuse prediction information from coding the luma information to encode the chroma information (e.g., by reusing partitioning information, intra-prediction mode selection, motion vectors, or the like). Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of several spatially based coding modes. Inter-mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of several temporal-based coding modes.

如图19中所展示,视频编码器20接收待编码的视频帧(即,纹理图像或深度图)内的当前视频块(即,视频数据块,例如明度块、色度块或深度块)。在图19的实例中,视频编码器20包含视频数据存储器40、模式选择单元41、经解码图片缓冲器(DPB)64、求和器50、变换处理单元52、量化单元54、环路滤波器单元63和熵编码单元56。模式选择单元41又包含运动补偿单元44、运动估计单元42、帧内预测处理单元46和分割单元48。为了视频块重构,视频编码器20还包含逆量化单元58、逆变换处理单元60,及求和器62。环路滤波器单元63可包含解块滤波器和SAO滤波器以对块边界进行滤波以从经重构视频移除成块效应假象。除了解块滤波器之外,还可使用额外滤波器(环路内或环路后)。为简洁起见未图示此些滤波器,但是必要时,此些滤波器可以对求和器50的输出进行滤波(作为环路内滤波器)。As shown in FIG19 , video encoder 20 receives a current video block (i.e., a block of video data, such as a luma block, a chroma block, or a depth block) within a video frame to be encoded (i.e., a texture image or a depth map). In the example of FIG19 , video encoder 20 includes video data memory 40, mode select unit 41, decoded picture buffer (DPB) 64, summer 50, transform processing unit 52, quantization unit 54, loop filter unit 63, and entropy encoding unit 56. Mode select unit 41, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction processing unit 46, and partitioning unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. Loop filter unit 63 may include a deblocking filter and an SAO filter to filter block boundaries to remove blockiness artifacts from the reconstructed video. In addition to the deblocking filter, additional filters (in-loop or post-loop) may also be used. Such filters are not shown for brevity, but may filter the output of summer 50 (as in-loop filters), if desired.

Video data memory 40 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 40 may be obtained, for example, from video source 18. DPB 64 is a buffer that stores reference video data for use by video encoder 20 in encoding video data (e.g., in intra- or inter-coding modes, also referred to as intra- or inter-prediction coding modes). Video data memory 40 and DPB 64 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 40 and DPB 64 may be provided by the same memory device or separate memory devices. In various examples, video data memory 40 may be on-chip with other components of video encoder 20, or off-chip relative to those components.

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames (including inter-view reference frames) to provide temporal prediction. Intra-prediction processing unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 41 may further produce a quadtree data structure indicative of partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs.
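As a purely illustrative sketch (not part of the described embodiments), the split-versus-no-split decision behind the quadtree can be expressed as a recursion that compares the rate-distortion cost of coding a CU whole against the summed cost of its four sub-CUs. The cost callback, the 64x64 LCU size, and the 8x8 minimum CU size are assumptions of this example:

```python
def split_lcu(x, y, size, rd_cost, min_cu=8):
    """Recursively decide whether to split a square CU into four sub-CUs.

    rd_cost(x, y, size) returns the rate-distortion cost of coding the
    block at (x, y) with the given size as a single CU.  Returns a
    nested quadtree: a leaf tuple (x, y, size), or a list of four
    sub-trees in raster order.
    """
    whole_cost = rd_cost(x, y, size)
    if size <= min_cu:
        return (x, y, size)
    half = size // 2
    subtrees = [split_lcu(x + dx, y + dy, half, rd_cost, min_cu)
                for dy in (0, half) for dx in (0, half)]

    def tree_cost(tree):
        # Sum the leaf costs of a (possibly nested) quadtree.
        if isinstance(tree, tuple):
            return rd_cost(*tree)
        return sum(tree_cost(sub) for sub in tree)

    split_cost = sum(tree_cost(tree) for tree in subtrees)
    return subtrees if split_cost < whole_cost else (x, y, size)
```

With a cost model that penalizes large blocks, the LCU splits once into four 32x32 leaves; with a constant cost per CU, splitting never pays off.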

Mode select unit 41 may select one of the intra- or inter-coding modes, e.g., based on error results, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 41 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit), with respect to the current block being coded within the current frame (or other coded unit).

A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in DPB 64. For example, video encoder 20 may interpolate values of quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
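The SAD-based block matching mentioned above can be sketched as an exhaustive integer-pel search. The pixel layout (lists of rows) and the window handling are assumptions of this example, not the encoder's actual search strategy, which in practice uses fast search patterns:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(cur, ref, bx, by, bw, bh, search_range):
    """Exhaustive integer-pel motion search.

    cur and ref are 2-D pixel arrays (lists of rows); (bx, by) is the
    top-left corner of the current block.  Returns (mvx, mvy, best_sad)
    for the candidate position in ref with minimum SAD inside the
    search window.
    """
    cur_blk = [row[bx:bx + bw] for row in cur[by:by + bh]]
    best = (0, 0, float('inf'))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or ry + bh > len(ref) or rx + bw > len(ref[0]):
                continue  # candidate falls outside the reference picture
            cand = [row[rx:rx + bw] for row in ref[ry:ry + bh]]
            cost = sad(cur_blk, cand)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best
```

For a current frame that is the reference shifted one sample to the left, the search recovers the motion vector (1, 0) with zero SAD.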

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in DPB 64. The reference picture lists may be constructed using the techniques of this disclosure. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to luma components, and motion compensation unit 44 uses motion vectors calculated based on the luma components for both chroma components and luma components. In this manner, motion compensation unit 44 may re-use motion information determined for the luma components to code the chroma components, such that motion estimation unit 42 need not perform a motion search for the chroma components. Mode select unit 41 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra-prediction processing unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 (or mode select unit 41, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.

For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics from among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
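The distortion-versus-rate comparison described above is commonly expressed as a Lagrangian cost J = D + lambda * R, minimized over the candidate modes. The candidate list and lambda values in this sketch are hypothetical:

```python
def best_mode(candidates, lam):
    """Select the candidate with minimum Lagrangian cost J = D + lam * R.

    candidates: iterable of (mode_name, distortion, rate_bits) tuples,
    where distortion is, e.g., an SSD between the original and the
    reconstruction, and rate_bits is the number of bits spent.
    Returns (mode_name, cost) of the best candidate.
    """
    best_name, best_cost = None, float('inf')
    for name, dist, rate in candidates:
        cost = dist + lam * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

Note how the chosen mode shifts with lambda: a small lambda favors the low-distortion (but expensive) mode, a large lambda favors the cheap one.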

After selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include, in the transmitted bitstream, configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 41 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients.
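For illustration only, residual formation and a direct floating-point DCT-II can be sketched as follows. Production codecs apply fast integer approximations of the transform separably to rows and columns, so this is a sketch of the concept rather than any codec's transform:

```python
import math

def residual(cur_block, pred_block):
    """Pixel-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur_block, pred_block)]

def dct_1d(samples):
    """Orthonormal DCT-II of a 1-D sample list, computed directly from
    the cosine definition.  A flat (constant) input concentrates all
    energy in the DC coefficient, which is why residual transforms
    compact energy so well."""
    n = len(samples)
    out = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(samples))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out
```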

The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 may quantize the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
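A minimal sketch of QP-controlled scalar quantization follows, assuming the step size doubles every 6 QP units as in the AVC/HEVC convention; the rounding is simplified relative to any real quantizer:

```python
def quantize(coeffs, qp):
    """Uniform scalar quantization with a QP-controlled step size.

    The step doubles every 6 QP units, so raising QP coarsens the
    quantization and reduces the bit depth of the coefficients.
    """
    step = 2 ** (qp / 6.0)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    """Inverse quantization: scale the levels back by the same step."""
    step = 2 ** (qp / 6.0)
    return [lv * step for lv in levels]
```

Round-tripping shows the lossy nature of the step: small coefficients collapse to zero at high QP, while large ones survive with bounded error.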

Following quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, context may be based on neighboring blocks. Following the entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in DPB 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
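The pixel-domain reconstruction performed by summer 62 amounts to adding the decoded residual to the prediction and clipping to the valid sample range. The 8-bit sample depth here is an assumption of the sketch:

```python
def reconstruct(pred_block, residual_block, bit_depth=8):
    """Add the reconstructed residual to the motion-compensated
    prediction and clip to the valid sample range, producing the
    reconstructed block that is stored in the decoded picture buffer."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val)
             for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_block, residual_block)]
```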

Video encoder 20 may encode depth maps in a manner that substantially resembles the coding techniques for coding luma components, albeit without corresponding chroma components. For example, intra-prediction processing unit 46 may intra-predict blocks of depth maps, while motion estimation unit 42 and motion compensation unit 44 may inter-predict blocks of depth maps. However, as discussed above, during inter-prediction of depth maps, motion compensation unit 44 may scale (that is, adjust) values of the reference depth maps based on differences in depth ranges and precision values for the depth ranges. For example, if different maximum depth values in the current depth map and a reference depth map correspond to the same real-world depth, video encoder 20 may scale the maximum depth value of the reference depth map to be equal to the maximum depth value in the current depth map, for purposes of prediction. Additionally or alternatively, video encoder 20 may use the updated depth range values and precision values to generate a view synthesis picture for view synthesis prediction, e.g., using techniques substantially similar to inter-view prediction.

As will be discussed in more detail below with reference to FIGS. 21-23, video encoder 20 may be configured to employ the techniques of this disclosure described above. In particular, video encoder 20 may be configured to partition PUs into sub-blocks when such PUs are partitioned according to an asymmetric partitioning mode. Video encoder 20 may then be configured to inherit and/or derive motion vectors or disparity motion vectors for each of the sub-blocks.

FIG. 20 is a block diagram illustrating an example of video decoder 30 that may implement the techniques of this disclosure. In the example of FIG. 20, video decoder 30 includes video data memory 79, entropy decoding unit 70, motion compensation unit 72, intra-prediction processing unit 74, inverse quantization unit 76, inverse transform processing unit 78, decoded picture buffer (DPB) 82, loop filter unit 83, and summer 80. In some examples, video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 19). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction processing unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.

Video data memory 79 may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in video data memory 79 may be obtained from a local video source, such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. Video data memory 79 may form a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. DPB 82 is one example of a DPB that stores reference video data for use by video decoder 30 in decoding video data (e.g., in intra- or inter-coding modes, also referred to as intra- or inter-prediction coding modes). Video data memory 79 and DPB 82 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 79 and DPB 82 may be provided by the same memory device or separate memory devices. In various examples, video data memory 79 may be on-chip with the other components of video decoder 30, or off-chip relative to those components.

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using the techniques of this disclosure based on reference pictures stored in decoded picture buffer 82. Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.
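Sub-integer interpolation can be illustrated with a 2-tap bilinear filter over one row of samples; real codecs such as HEVC use longer separable filters (e.g., 8-tap for luma), so this is only a sketch of the idea, not the filter the encoder and decoder actually share:

```python
import math

def interp_fractional(row, pos):
    """Bilinear interpolation of a 1-D sample row at a fractional
    position, e.g. pos=2.5 for the half-pel sample between row[2]
    and row[3].  Integer positions return the sample itself."""
    i = int(math.floor(pos))
    frac = pos - i
    if frac == 0:
        return float(row[i])
    return row[i] * (1 - frac) + row[i + 1] * frac
```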

Inverse quantization unit 76 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include use of a quantization parameter QPY calculated by video decoder 30 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.

Inverse transform processing unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Summer 80 represents the component or components that perform this summing operation. Loop filter unit 83 may include a deblocking filter and an SAO filter to filter block boundaries to remove blockiness artifacts from reconstructed video. Additional filters (in-loop or post-loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity, but if desired, may filter the output of summer 80 (as in-loop filters). The decoded video blocks in a given frame or picture are then stored in decoded picture buffer 82, which stores reference pictures used for subsequent motion compensation. Decoded picture buffer 82 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.

As will be discussed in more detail below with reference to FIGS. 24-26, video decoder 30 may be configured to employ the techniques of this disclosure described above. In particular, video decoder 30 may be configured to partition PUs into sub-blocks when such PUs are partitioned according to an asymmetric partitioning mode. Video decoder 30 may then be configured to inherit and/or derive motion vectors or disparity motion vectors for each of the sub-blocks.

FIG. 21 is a flowchart illustrating an example encoding method of this disclosure. The techniques of FIG. 21 may be implemented by one or more structural units of video encoder 20, e.g., by mode select unit 41, partition unit 48, and/or motion compensation unit 44.

In one example of this disclosure, video encoder 20 (e.g., using mode select unit 41 and partition unit 48) may be configured to generate a block of video data using AMP, wherein the block of video data is uni-directionally predicted using BVSP, and has a size of 16x12, 12x16, 16x4, or 4x16 (2100). In one example of this disclosure, the block of video data is a prediction unit.

Video encoder 20, using partition unit 48, may be further configured to partition the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8 (2110), and derive (e.g., using motion compensation unit 44) a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture (2120). Video encoder 20 (e.g., using motion compensation unit 44) may be further configured to synthesize a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors (2130), and encode the block of video data (e.g., using motion compensation unit 44) by performing motion compensation on each of the sub-blocks using the synthesized respective reference blocks (2140).
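The tiling of an AMP partition into 8x4 or 4x8 motion-compensation sub-blocks can be sketched as follows. The orientation rule (8x4 sub-blocks when the partition is at least as wide as it is tall, 4x8 otherwise) is inferred from the sizes quoted in the text and is an assumption of this example:

```python
def amp_sub_blocks(width, height):
    """Tile an asymmetric partition (16x12, 12x16, 16x4, or 4x16) with
    8x4 or 4x8 sub-blocks.  Returns (x, y, w, h) tuples, one per
    sub-block, in raster order."""
    if width >= height:
        sw, sh = 8, 4   # wide partitions use 8x4 sub-blocks
    else:
        sw, sh = 4, 8   # tall partitions use 4x8 sub-blocks
    return [(x, y, sw, sh)
            for y in range(0, height, sh)
            for x in range(0, width, sw)]
```

A 16x4 partition yields two 8x4 sub-blocks; a 16x12 partition yields six.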

In another example of this disclosure, video encoder 20 may be further configured to generate one or more syntax elements indicating that the prediction unit is encoded using AMP and indicating that the prediction unit is uni-directionally predicted using BVSP, and generate a merge candidate index pointing to a BVSP candidate.

In another example of this disclosure, video encoder 20 (e.g., using motion compensation unit 44) may be configured to derive the respective disparity motion vector for each of the sub-blocks by deriving a disparity vector for the block of video data, locating a corresponding depth block for each of the sub-blocks using the derived disparity vector, and converting one selected depth value of the corresponding depth block for each of the sub-blocks to the respective disparity motion vector.
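The conversion of a selected depth value into a disparity motion vector can be sketched with a linear depth-to-disparity model. The scale and offset constants (which in practice are derived from camera parameters) and the four-corner maximum selection are illustrative assumptions of this sketch:

```python
def depth_to_disparity(depth_value, scale, offset, precision_bits=2):
    """Convert a depth sample to a horizontal disparity motion vector
    component via a linear model; the vertical component of a
    disparity motion vector is zero.  scale and offset are assumed,
    camera-parameter-derived constants."""
    disparity = (depth_value * scale + offset) >> precision_bits
    return (disparity, 0)

def max_depth_of_block(depth_block):
    """Select one depth value for a sub-block: here, the maximum of the
    four corner samples of its corresponding depth block."""
    h = len(depth_block) - 1
    w = len(depth_block[0]) - 1
    corners = (depth_block[0][0], depth_block[0][w],
               depth_block[h][0], depth_block[h][w])
    return max(corners)
```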

FIG. 22 is a flowchart illustrating another example encoding method of this disclosure. The techniques of FIG. 22 may be implemented by one or more structural units of video encoder 20, including mode select unit 41, partition unit 48, and/or motion compensation unit 44.

In one example of this disclosure, video encoder 20 (e.g., mode select unit 41 and partition unit 48) may be configured to generate a second block of video data using AMP, wherein the second block of video data is encoded using at least one of inter-view motion prediction or MVI, and has a size of 16x4 or 4x16 (2200). Video encoder 20 (e.g., using partition unit 48) may be further configured to partition the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8 (2210), and derive (e.g., using motion compensation unit 44) motion information for each of the sub-blocks from one respective reference block (2220). Video encoder 20 may then encode the second block of video data by performing motion compensation on each of the sub-blocks using the derived motion information and one reference picture list (2230).

In another example of this disclosure, video encoder 20 (e.g., using motion compensation unit 44) may be configured to perform motion compensation by performing uni-directional motion compensation relative to a picture in the one reference picture list.

FIG. 23 is a flowchart illustrating another example encoding method of this disclosure. The techniques of FIG. 23 may be implemented by one or more structural units of video encoder 20, e.g., by mode select unit 41, partition unit 48, and/or motion compensation unit 44.

In one example of this disclosure, video encoder 20 may be configured to generate (e.g., using mode select unit 41 and partition unit 48) a second block of video data using AMP, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance, and has a size of 16x12 or 12x16 (2300), partition the second block of video data (e.g., using partition unit 48) into a plurality of sub-blocks (2310), and encode each of the plurality of sub-blocks (e.g., using motion compensation unit 44) with uni-directional prediction (2320).

FIG. 24 is a flowchart illustrating an example decoding method of this disclosure. The techniques of FIG. 24 may be implemented by one or more structural units of video decoder 30, e.g., by motion compensation unit 72.

在本发明的一个实例中,视频解码器30可经配置以接收对应于视频数据块的残余数据,其中所述视频数据块是使用AMP经编码,是使用BVSP单向预测,且具有16x12、12x16、16x4或4x16的大小(2400)。在本发明的一个实例中,所述视频数据块是预测单元。视频解码器30可进一步经配置以将视频数据块分割为子块,每一子块具有8x4或4x8的大小(2410),且从对应于参考图片的深度图片中的对应深度块导出所述子块中的每一者的相应视差运动向量(2420)。In one example of the present disclosure, video decoder 30 may be configured to receive residual data corresponding to a block of video data, wherein the block of video data is encoded using AMP, is unidirectionally predicted using BVSP, and has a size of 16x12, 12x16, 16x4, or 4x16 (2400). In one example of the present disclosure, the block of video data is a prediction unit. Video decoder 30 may further be configured to partition the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8 (2410), and derive a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture (2420).
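The per-sub-block disparity derivation above can be sketched as follows. This is a sketch under stated assumptions, not reference-software code: 3D-HEVC test-model descriptions select the maximum of the four corner samples of the corresponding depth block as the representative depth value, and `depth_to_disp` here stands in for the camera-parameter-based depth-to-disparity conversion, which is not reproduced:

```python
def bvsp_subblock_disparity(depth_block, depth_to_disp):
    """Derive one 8x4/4x8 sub-block's disparity motion vector from its
    corresponding depth block.

    Assumption flagged above: the maximum of the four corner depth
    samples is taken as the representative depth.  BVSP uses a
    horizontal-only disparity, so the vertical component is zero.
    """
    h, w = len(depth_block), len(depth_block[0])
    corners = [depth_block[0][0], depth_block[0][w - 1],
               depth_block[h - 1][0], depth_block[h - 1][w - 1]]
    return (depth_to_disp(max(corners)), 0)
```

The resulting disparity motion vector is then used to synthesize the sub-block's reference block from the inter-view reference picture, as described in the next paragraph.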

视频解码器30可进一步经配置以使用相应导出视差运动向量合成所述子块中的每一者的相应参考块(2430),且通过使用残余数据和合成相应参考块对所述子块中的每一者执行运动补偿而解码视频数据块(2440)。Video decoder 30 may be further configured to synthesize a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors (2430), and decode the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference block (2440).

在本发明的另一实例中,视频解码器30可进一步经配置以接收指示所述预测单元是使用不对称运动分割经编码且指示所述预测单元是使用后向视图合成预测单向预测的一或多个语法元素,且接收指向BVSP候选者的合并候选者索引。In another example of the disclosure, video decoder 30 may be further configured to receive one or more syntax elements indicating that the prediction unit is encoded using asymmetric motion partitioning and indicating that the prediction unit is uni-directionally predicted using backward view synthesis prediction, and receive a merge candidate index pointing to a BVSP candidate.

在本发明的另一实例中,视频解码器30可进一步经配置以通过导出视频数据块的视差向量、使用所导出的视差向量定位所述子块中的每一者的对应深度块且将所述子块中的每一者的对应深度块的一个选定深度值转换为相应视差运动向量来导出所述子块中的每一者的相应视差运动向量。In another example of the present invention, video decoder 30 may be further configured to derive a respective disparity motion vector for each of the sub-blocks by deriving a disparity vector for the block of video data, locating a corresponding depth block for each of the sub-blocks using the derived disparity vector, and converting a selected depth value of the corresponding depth block for each of the sub-blocks to a respective disparity motion vector.
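The three steps just listed (derive a disparity vector, locate the corresponding depth block, convert a selected depth value) can be sketched as below. The linear conversion model and all parameter values are assumptions for illustration; in a real codec the scale/offset/shift would come from camera parameters signalled in the bitstream, and the choice of which depth sample to select is simplified here to the top-left sample:

```python
def depth_to_disparity(depth_value, scale, offset, shift):
    # Assumed linear model: disp = (scale * depth + offset) >> shift.
    # scale/offset/shift are placeholders for signalled camera parameters.
    return (scale * depth_value + offset) >> shift


def subblock_disparity_mv(sub_x, sub_y, pu_disparity_vector, depth_picture,
                          scale, offset, shift):
    """Mirror the three steps in the text: (1) locate the sub-block's
    corresponding depth block with the PU-level disparity vector,
    (2) select one depth value from it (top-left sample here, purely
    for illustration), (3) convert it to a horizontal disparity
    motion vector."""
    dx, dy = pu_disparity_vector
    depth_x, depth_y = sub_x + dx, sub_y + dy   # step 1: locate depth block
    selected = depth_picture[depth_y][depth_x]  # step 2: select a depth value
    return (depth_to_disparity(selected, scale, offset, shift), 0)  # step 3
```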

图25是说明本发明的实例解码方法的流程图。图25的技术可由视频解码器30的一或多个结构单元实施,例如由运动补偿单元72实施。FIG. 25 is a flowchart illustrating an example decoding method of this disclosure. The technique of FIG. 25 may be implemented by one or more structural units of video decoder 30, such as motion compensation unit 72.

在本发明的一个实例中,视频解码器30可经配置以接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或MVI中的至少一者经编码且具有大小16x4或4x16(2500),将所述第二视频数据块分割为子块,每一子块具有8x4或4x8的大小(2510),从一个相应参考块导出所述子块中的每一者的运动信息(2520),且通过使用残余数据、所导出运动信息和一个参考图片列表对所述子块中的每一者执行运动补偿而解码所述第二视频数据块。In one example of the present disclosure, video decoder 30 may be configured to receive residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or MVI and has a size of 16x4 or 4x16 (2500), partition the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8 (2510), derive motion information for each of the sub-blocks from a corresponding reference block (2520), and decode the second block of video data by performing motion compensation on each of the sub-blocks using the residual data, the derived motion information, and a reference picture list.

在本发明的另一实例中,视频解码器30可进一步经配置以通过相对于所述一个参考图片列表中的图片执行单向运动补偿而执行运动补偿。In another example of the disclosure, video decoder 30 may be further configured to perform motion compensation by performing uni-directional motion compensation with respect to pictures in the one reference picture list.

图26是说明本发明的实例解码方法的流程图。图26的技术可由视频解码器30的一或多个结构单元实施,包含运动补偿单元72。FIG. 26 is a flowchart illustrating an example decoding method of this disclosure. The technique of FIG. 26 may be implemented by one or more structural units of video decoder 30, including motion compensation unit 72.

在本发明的一个实例中,视频解码器30可进一步经配置以接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或MVI中的至少一者经编码且具有大小16x12或12x16(2600),将所述第二视频数据块分割为多个子块(2610),且以单向预测性预测解码所述多个子块中的每一者。In one example of the present disclosure, video decoder 30 may be further configured to receive residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or MVI and has a size of 16x12 or 12x16 (2600), partition the second block of video data into a plurality of sub-blocks (2610), and decode each of the plurality of sub-blocks with uni-directional predictive prediction.

如上文所解释,本发明的技术包含当对视频数据块应用AMP、BVSP、视图间运动预测和/或MVI时的视频编码和解码技术。确切地说,本发明的技术通过引导对以AMP分割的PU的子块的译码技术而提供较准确译码。举例来说,当PU是使用BVSP经译码时获得以AMP分割的此PU的子块的单独视差运动向量可增加视图合成和运动预测的准确性,且因此增加译码效率。作为另一实例,当PU是使用视图间运动预测和/或MVI经译码时获得以AMP分割的此PU的子块的单独运动信息可增加运动预测的准确性,且因此增加译码效率。As explained above, the techniques of this disclosure include video encoding and decoding techniques when AMP, BVSP, inter-view motion prediction, and/or MVI are applied to blocks of video data. Specifically, the techniques of this disclosure provide more accurate coding by guiding coding techniques for sub-blocks of PUs partitioned with AMP. For example, obtaining separate disparity motion vectors for sub-blocks of a PU partitioned with AMP when the PU is coded using BVSP can increase the accuracy of view synthesis and motion prediction, and thus increase coding efficiency. As another example, obtaining separate motion information for sub-blocks of a PU partitioned with AMP when the PU is coded using inter-view motion prediction and/or MVI can increase the accuracy of motion prediction, and thus increase coding efficiency.
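The size-dependent rules summarized above can be collected into one illustrative helper. The function name, the `tool` labels, and the return convention are this sketch's own, not taken from any standard: BVSP-coded AMP PUs use 8x4/4x8 sub-blocks for all four AMP sizes, inter-view motion prediction / MVI PUs use 8x4/4x8 sub-blocks only for 16x4/4x16, and 16x12/12x16 PUs are still split and uni-directionally predicted but with no sub-block size fixed by the text (returned as `None`):

```python
def amp_subblock_size(pu_w, pu_h, tool):
    """Illustrative summary of the AMP sub-block rules described above.

    tool: "bvsp" or "interview_mvi" (labels are this sketch's own).
    Returns the (width, height) of the motion sub-blocks, or None when
    the text splits the PU but does not fix a sub-block size.
    """
    assert (pu_w, pu_h) in {(16, 12), (12, 16), (16, 4), (4, 16)}
    if tool == "bvsp" or (pu_w, pu_h) in {(16, 4), (4, 16)}:
        # the 16-sample dimension determines the sub-block orientation
        return (8, 4) if pu_w > pu_h else (4, 8)
    return None  # split size unspecified; uni-directional prediction applies
```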

应认识到,取决于实例,本文中所描述的技术中的任一者的某些动作或事件可用不同顺序执行、可添加、合并或全部省略(例如,实践所述技术并不需要所有的所描述动作或事件)。此外,在某些实例中,可(例如)通过多线程处理、中断处理或多个处理器同时而非顺序地执行动作或事件。It should be appreciated that, depending on the example, certain actions or events of any of the techniques described herein may be performed in a different order, may be added, combined, or omitted entirely (e.g., not all described actions or events are required to practice the techniques). Furthermore, in some examples, actions or events may be performed simultaneously rather than sequentially, for example, through multithreading, interrupt processing, or multiple processors.

在一或多个实例中,所描述功能可以硬件、软件、固件或其任何组合来实施。如果以软件实施,那么所述功能可以作为一或多个指令或代码在计算机可读媒体上存储或传输,并且由基于硬件的处理单元来执行。计算机可读媒体可包含计算机可读存储媒体,所述计算机可读存储媒体对应于有形媒体,例如,数据存储媒体或包含(例如)根据通信协议促进计算机程序从一位置传送至另一位置的任何媒体的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)有形计算机可读存储媒体,其是非暂时的,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一个或多个处理器存取以检索用于实施本发明中描述的技术的指令、代码及/或数据结构的任何可用媒体。计算机程序产品可以包含计算机可读媒体。In one or more instances, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored or transmitted as one or more instructions or codes on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible media, such as data storage media or communication media that includes, for example, any media that facilitates the transfer of a computer program from one location to another according to a communication protocol. In this manner, computer-readable media may generally correspond to (1) tangible computer-readable storage media that is non-transitory, or (2) communication media, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include computer-readable media.

借助于实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可以用来存储指令或数据结构的形式的期望程序代码并且可以由计算机存取的任何其它媒体。并且,可恰当地将任何连接称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。然而,应理解,所述计算机可读存储媒体和数据存储媒体并不包含连接、载波、信号或其它暂时性媒体,而是实际上针对非暂时性的有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软性磁盘和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各者的组合也应该包含在计算机可读媒体的范围内。By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Furthermore, any connection can be appropriately referred to as a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwaves, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwaves are included in the definition of medium. However, it should be understood that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but rather are directed to non-transitory, tangible storage media. As used herein, disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically reproduce data magnetically, while optical discs reproduce data optically using lasers. Combinations of the above should also be included within the scope of computer-readable media.

可由例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指代上述结构或适合于实施本文中所描述的技术的任何其它结构中的任一者。另外,在一些方面中,本文中所描述的功能性可以在经配置用于编码和解码的专用硬件和/或软件模块内提供,或者并入在组合编解码器中。另外,可以将所述技术完全实施于一或多个电路或逻辑元件中。Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Thus, the term "processor," as used herein, may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Additionally, the techniques may be fully implemented in one or more circuits or logic elements.

本发明的技术可在广泛多种装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本发明中描述各种组件、模块或单元是为了强调经配置以执行所揭示的技术的装置的功能方面,但未必需要通过不同硬件单元实现。实际上,如上文所描述,各种单元可以结合合适的软件及/或固件组合在编解码器硬件单元中,或者通过互操作硬件单元的集合来提供,所述硬件单元包含如上文所描述的一或多个处理器。The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of a device configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. In fact, as described above, the various units may be combined in a codec hardware unit in conjunction with appropriate software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.

描述了各种实例。这些和其它实例属于以下权利要求书的范围内。Various embodiments are described. These and other embodiments are within the scope of the following claims.

Claims (26)

1.一种对视频数据进行解码的方法,所述方法包括:1. A method of decoding video data, the method comprising: 接收对应于视频数据块的残余数据,其中所述视频数据块是使用不对称运动分割经编码,是使用后向视图合成预测BVSP经单向预测,且具有16x12、12x16、16x4或4x16的大小,且其中所述视频数据块是译码单元的预测单元;receiving residual data corresponding to a block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is uni-directionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16, and wherein the block of video data is a prediction unit of a coding unit; 将所述视频数据块分割为子块,每一子块具有8x4或4x8的大小;partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从对应于参考图片的深度图片中的对应深度块导出所述子块中的每一者的相应视差运动向量;deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; 使用所述相应导出的视差运动向量合成所述子块中的每一者的相应参考块;以及synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and 通过使用所述残余数据和所述经合成相应参考块对所述子块中的每一者执行运动补偿而对所述视频数据块进行解码。decoding the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.

2.根据权利要求1所述的方法,其进一步包括:2. The method of claim 1, further comprising: 接收指示所述预测单元是使用不对称运动分割经编码且指示所述预测单元是使用后向视图合成预测经单向预测的一或多个语法元素;以及receiving one or more syntax elements indicating that the prediction unit is encoded using asymmetric motion partitioning and indicating that the prediction unit is uni-directionally predicted using backward view synthesis prediction; and 接收指向BVSP候选者的合并候选者索引。receiving a merge candidate index pointing to a BVSP candidate.

3.根据权利要求1所述的方法,其中导出所述子块中的每一者的所述相应视差运动向量包括:3. The method of claim 1, wherein deriving the respective disparity motion vector for each of the sub-blocks comprises: 导出所述视频数据块的视差向量;deriving a disparity vector for the block of video data; 使用所述所导出的视差向量定位所述子块中的每一者的所述对应深度块;以及locating the corresponding depth block for each of the sub-blocks using the derived disparity vector; and 将所述子块中的每一者的所述对应深度块的一个选定深度值转换为所述相应视差运动向量。converting one selected depth value of the corresponding depth block for each of the sub-blocks to the respective disparity motion vector.

4.根据权利要求1所述的方法,其中所述视频数据块是第一视频数据块,所述方法进一步包括:4. The method of claim 1, wherein the block of video data is a first block of video data, the method further comprising: 接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x4或4x16的大小;receiving residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x4 or 4x16; 将所述第二视频数据块分割为子块,每一子块具有8x4或4x8的大小;partitioning the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从一个相应参考块导出所述子块中的每一者的运动信息;以及deriving motion information for each of the sub-blocks from one respective reference block; and 通过使用所述残余数据、所述所导出运动信息和一个参考图片列表对所述子块中的每一者执行运动补偿而对所述第二视频数据块进行解码。decoding the second block of video data by performing motion compensation on each of the sub-blocks using the residual data, the derived motion information, and one reference picture list.

5.根据权利要求4所述的方法,其中执行运动补偿包括相对于所述一个参考图片列表中的图片执行单向运动补偿。5. The method of claim 4, wherein performing motion compensation comprises performing uni-directional motion compensation relative to pictures in the one reference picture list.

6.根据权利要求1所述的方法,其中所述视频数据块是第一视频数据块,所述方法进一步包括:6. The method of claim 1, wherein the block of video data is a first block of video data, the method further comprising: 接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x12或12x16的大小;receiving residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x12 or 12x16; 将所述第二视频数据块分割为多个子块;以及partitioning the second block of video data into a plurality of sub-blocks; and 以单向预测对所述多个子块中的每一者进行解码。decoding each of the plurality of sub-blocks with uni-directional prediction.

7.一种对视频数据进行编码的方法,所述方法包括:7. A method of encoding video data, the method comprising: 使用不对称运动分割产生视频数据块,其中所述视频数据块是使用后向视图合成预测BVSP经单向预测且具有16x12、12x16、16x4或4x16的大小,且其中所述视频数据块是译码单元的预测单元;generating a block of video data using asymmetric motion partitioning, wherein the block of video data is uni-directionally predicted using backward view synthesis prediction (BVSP) and has a size of 16x12, 12x16, 16x4, or 4x16, and wherein the block of video data is a prediction unit of a coding unit; 将所述视频数据块分割为子块,每一子块具有8x4或4x8的大小;partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从对应于参考图片的深度图片中的对应深度块导出所述子块中的每一者的相应视差运动向量;deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; 使用所述相应导出的视差运动向量合成所述子块中的每一者的相应参考块;以及synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and 通过使用所述经合成相应参考块对所述子块中的每一者执行运动补偿而对所述视频数据块进行编码。encoding the block of video data by performing motion compensation on each of the sub-blocks using the synthesized respective reference blocks.

8.根据权利要求7所述的方法,其进一步包括:8. The method of claim 7, further comprising: 产生指示所述预测单元是使用不对称运动分割经编码且指示所述预测单元是使用后向视图合成预测经单向预测的一或多个语法元素;以及generating one or more syntax elements indicating that the prediction unit is encoded using asymmetric motion partitioning and indicating that the prediction unit is uni-directionally predicted using backward view synthesis prediction; and 产生指向BVSP候选者的合并候选者索引。generating a merge candidate index pointing to a BVSP candidate.

9.根据权利要求7所述的方法,其中导出所述子块中的每一者的所述相应视差运动向量包括:9. The method of claim 7, wherein deriving the respective disparity motion vector for each of the sub-blocks comprises: 导出所述视频数据块的视差向量;deriving a disparity vector for the block of video data; 使用所述所导出的视差向量定位所述子块中的每一者的所述对应深度块;以及locating the corresponding depth block for each of the sub-blocks using the derived disparity vector; and 将所述子块中的每一者的所述对应深度块的一个选定深度值转换为所述相应视差运动向量。converting one selected depth value of the corresponding depth block for each of the sub-blocks to the respective disparity motion vector.

10.根据权利要求7所述的方法,其中所述视频数据块是第一视频数据块,所述方法进一步包括:10. The method of claim 7, wherein the block of video data is a first block of video data, the method further comprising: 使用不对称运动分割产生第二视频数据块,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x4或4x16的大小;generating a second block of video data using asymmetric motion partitioning, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x4 or 4x16; 将所述第二视频数据块分割为子块,每一子块具有8x4或4x8的大小;partitioning the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从一个相应参考块导出所述子块中的每一者的运动信息;以及deriving motion information for each of the sub-blocks from one respective reference block; and 通过使用所述所导出运动信息和一个参考图片列表对所述子块中的每一者执行运动补偿而对所述第二视频数据块进行编码。encoding the second block of video data by performing motion compensation on each of the sub-blocks using the derived motion information and one reference picture list.

11.根据权利要求10所述的方法,其中执行运动补偿包括相对于所述一个参考图片列表中的图片执行单向运动补偿。11. The method of claim 10, wherein performing motion compensation comprises performing uni-directional motion compensation relative to pictures in the one reference picture list.

12.根据权利要求7所述的方法,其中所述视频数据块是第一视频数据块,所述方法进一步包括:12. The method of claim 7, wherein the block of video data is a first block of video data, the method further comprising: 使用不对称运动分割产生第二视频数据块,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x12或12x16的大小;generating a second block of video data using asymmetric motion partitioning, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x12 or 12x16; 将所述第二视频数据块分割为多个子块;以及partitioning the second block of video data into a plurality of sub-blocks; and 以单向预测对所述多个子块中的每一者进行编码。encoding each of the plurality of sub-blocks with uni-directional prediction.

13.一种经配置以对视频数据进行解码的设备,所述设备包括:13. An apparatus configured to decode video data, the apparatus comprising: 视频存储器,其经配置以存储对应于视频数据块的信息;以及a video memory configured to store information corresponding to a block of video data; and 一或多个处理器,其经配置以:one or more processors configured to: 接收对应于所述视频数据块的残余数据,其中所述视频数据块是使用不对称运动分割经编码,是使用后向视图合成预测BVSP经单向预测,且具有16x12、12x16、16x4或4x16的大小,且其中所述视频数据块是译码单元的预测单元;receive residual data corresponding to the block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is uni-directionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16, and wherein the block of video data is a prediction unit of a coding unit; 将所述视频数据块分割为子块,每一子块具有8x4或4x8的大小;partition the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从对应于参考图片的深度图片中的对应深度块导出所述子块中的每一者的相应视差运动向量;derive a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; 使用所述相应导出的视差运动向量合成所述子块中的每一者的相应参考块;以及synthesize a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and 通过使用所述残余数据和所述经合成相应参考块对所述子块中的每一者执行运动补偿而对所述视频数据块进行解码。decode the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.

14.根据权利要求13所述的设备,其中所述一或多个处理器进一步经配置以:14. The apparatus of claim 13, wherein the one or more processors are further configured to: 接收指示所述预测单元是使用不对称运动分割经编码且指示所述预测单元是使用后向视图合成预测经单向预测的一或多个语法元素;以及receive one or more syntax elements indicating that the prediction unit is encoded using asymmetric motion partitioning and indicating that the prediction unit is uni-directionally predicted using backward view synthesis prediction; and 接收指向BVSP候选者的合并候选者索引。receive a merge candidate index pointing to a BVSP candidate.

15.根据权利要求13所述的设备,其中所述一或多个处理器进一步经配置以:15. The apparatus of claim 13, wherein the one or more processors are further configured to: 导出所述视频数据块的视差向量;derive a disparity vector for the block of video data; 使用所述所导出的视差向量定位所述子块中的每一者的所述对应深度块;以及locate the corresponding depth block for each of the sub-blocks using the derived disparity vector; and 将所述子块中的每一者的所述对应深度块的一个选定深度值转换为所述相应视差运动向量。convert one selected depth value of the corresponding depth block for each of the sub-blocks to the respective disparity motion vector.

16.根据权利要求13所述的设备,其中所述视频数据块是第一视频数据块,且其中所述一或多个处理器进一步经配置以:16. The apparatus of claim 13, wherein the block of video data is a first block of video data, and wherein the one or more processors are further configured to: 接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x4或4x16的大小;receive residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x4 or 4x16; 将所述第二视频数据块分割为子块,每一子块具有8x4或4x8的大小;partition the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 从一个相应参考块导出所述子块中的每一者的运动信息;以及derive motion information for each of the sub-blocks from one respective reference block; and 通过使用所述残余数据、所述所导出运动信息和一个参考图片列表对所述子块中的每一者执行运动补偿而对所述第二视频数据块进行解码。decode the second block of video data by performing motion compensation on each of the sub-blocks using the residual data, the derived motion information, and one reference picture list.

17.根据权利要求16所述的设备,其中所述一或多个处理器进一步经配置以相对于所述一个参考图片列表中的图片执行单向运动补偿。17. The apparatus of claim 16, wherein the one or more processors are further configured to perform uni-directional motion compensation relative to pictures in the one reference picture list.

18.根据权利要求13所述的设备,其中所述视频数据块是第一视频数据块,且其中所述一或多个处理器进一步经配置以:18. The apparatus of claim 13, wherein the block of video data is a first block of video data, and wherein the one or more processors are further configured to: 接收对应于第二视频数据块的残余数据,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x12或12x16的大小;receive residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x12 or 12x16; 将所述第二视频数据块分割为多个子块;以及partition the second block of video data into a plurality of sub-blocks; and 以单向预测对所述多个子块中的每一者进行解码。decode each of the plurality of sub-blocks with uni-directional prediction.

19.根据权利要求13所述的设备,其进一步包括:19. The apparatus of claim 13, further comprising: 显示器,其经配置以显示所述经解码视频数据块。a display configured to display the decoded block of video data.

20.根据权利要求13所述的设备,其中所述视频存储器和所述一或多个处理器包括容纳于移动电话、平板计算机、膝上型计算机、桌上型计算机、机顶盒或电视中的一者内的视频解码器。20. The apparatus of claim 13, wherein the video memory and the one or more processors comprise a video decoder housed within one of a mobile phone, a tablet computer, a laptop computer, a desktop computer, a set-top box, or a television.

21.一种经配置以对视频数据进行解码的设备,所述设备包括:21. An apparatus configured to decode video data, the apparatus comprising: 用于接收对应于视频数据块的残余数据的装置,其中所述视频数据块是使用不对称运动分割经编码,是使用后向视图合成预测BVSP经单向预测,且具有16x12、12x16、16x4或4x16的大小,且其中所述视频数据块是译码单元的预测单元;means for receiving residual data corresponding to a block of video data, wherein the block of video data is encoded using asymmetric motion partitioning, is uni-directionally predicted using backward view synthesis prediction (BVSP), and has a size of 16x12, 12x16, 16x4, or 4x16, and wherein the block of video data is a prediction unit of a coding unit; 用于将所述视频数据块分割为子块的装置,每一子块具有8x4或4x8的大小;means for partitioning the block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 用于从对应于参考图片的深度图片中的对应深度块导出所述子块中的每一者的相应视差运动向量的装置;means for deriving a respective disparity motion vector for each of the sub-blocks from a corresponding depth block in a depth picture corresponding to a reference picture; 用于使用所述相应导出的视差运动向量合成所述子块中的每一者的相应参考块的装置;以及means for synthesizing a respective reference block for each of the sub-blocks using the respective derived disparity motion vectors; and 用于通过使用所述残余数据和所述经合成相应参考块对所述子块中的每一者执行运动补偿而对所述视频数据块进行解码的装置。means for decoding the block of video data by performing motion compensation on each of the sub-blocks using the residual data and the synthesized respective reference blocks.

22.根据权利要求21所述的设备,其进一步包括:22. The apparatus of claim 21, further comprising: 用于接收指示所述预测单元是使用不对称运动分割经编码且指示所述预测单元是使用后向视图合成预测经单向预测的一或多个语法元素的装置;以及means for receiving one or more syntax elements indicating that the prediction unit is encoded using asymmetric motion partitioning and indicating that the prediction unit is uni-directionally predicted using backward view synthesis prediction; and 用于接收指向BVSP候选者的合并候选者索引的装置。means for receiving a merge candidate index pointing to a BVSP candidate.

23.根据权利要求21所述的设备,其中所述用于导出所述子块中的每一者的所述相应视差运动向量的装置包括:23. The apparatus of claim 21, wherein the means for deriving the respective disparity motion vector for each of the sub-blocks comprises: 用于导出所述视频数据块的视差向量的装置;means for deriving a disparity vector for the block of video data; 用于使用所述所导出的视差向量定位所述子块中的每一者的所述对应深度块的装置;以及means for locating the corresponding depth block for each of the sub-blocks using the derived disparity vector; and 用于将所述子块中的每一者的所述对应深度块的一个选定深度值转换为所述相应视差运动向量的装置。means for converting one selected depth value of the corresponding depth block for each of the sub-blocks to the respective disparity motion vector.

24.根据权利要求21所述的设备,其中所述视频数据块是第一视频数据块,所述设备进一步包括:24. The apparatus of claim 21, wherein the block of video data is a first block of video data, the apparatus further comprising: 用于接收对应于第二视频数据块的残余数据的装置,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x4或4x16的大小;means for receiving residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x4 or 4x16; 用于将所述第二视频数据块分割为子块的装置,每一子块具有8x4或4x8的大小;means for partitioning the second block of video data into sub-blocks, each sub-block having a size of 8x4 or 4x8; 用于从一个相应参考块导出所述子块中的每一者的运动信息的装置;以及means for deriving motion information for each of the sub-blocks from one respective reference block; and 用于通过使用所述残余数据、所述所导出运动信息和一个参考图片列表对所述子块中的每一者执行运动补偿而对所述第二视频数据块进行解码的装置。means for decoding the second block of video data by performing motion compensation on each of the sub-blocks using the residual data, the derived motion information, and one reference picture list.

25.根据权利要求24所述的设备,其中所述用于执行运动补偿的装置包括用于相对于所述一个参考图片列表中的图片执行单向运动补偿的装置。25. The apparatus of claim 24, wherein the means for performing motion compensation comprises means for performing uni-directional motion compensation relative to pictures in the one reference picture list.

26.根据权利要求21所述的设备,其中所述视频数据块是第一视频数据块,所述设备进一步包括:26. The apparatus of claim 21, wherein the block of video data is a first block of video data, the apparatus further comprising: 用于接收对应于第二视频数据块的残余数据的装置,其中所述第二视频数据块是使用视图间运动预测或运动向量继承中的至少一者经编码且具有16x12或12x16的大小;means for receiving residual data corresponding to a second block of video data, wherein the second block of video data is encoded using at least one of inter-view motion prediction or motion vector inheritance and has a size of 16x12 or 12x16; 用于将所述第二视频数据块分割为多个子块的装置;以及means for partitioning the second block of video data into a plurality of sub-blocks; and 用于以单向预测对所述多个子块中的每一者进行解码的装置。means for decoding each of the plurality of sub-blocks with uni-directional prediction.
HK16107896.2A 2013-09-13 2014-09-12 Video coding techniques using asymmetric motion partitioning HK1220060B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201361877793P 2013-09-13 2013-09-13
US61/877,793 2013-09-13
US201361881383P 2013-09-23 2013-09-23
US61/881,383 2013-09-23
US14/483,983 US10244253B2 (en) 2013-09-13 2014-09-11 Video coding techniques using asymmetric motion partitioning
US14/483,983 2014-09-11
PCT/US2014/055456 WO2015038937A1 (en) 2013-09-13 2014-09-12 Video coding techniques using asymmetric motion partitioning

Publications (2)

Publication Number Publication Date
HK1220060A1 HK1220060A1 (en) 2017-04-21
HK1220060B true HK1220060B (en) 2019-10-18
