HK1223757B - Block-based advanced residual prediction for 3d video coding - Google Patents
- Publication number: HK1223757B
- Authority: HK (Hong Kong)
- Prior art keywords: block, view, prediction, reference block, inter
Description
This application claims the benefit of U.S. Provisional Application No. 61/926,290, filed January 11, 2014, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
This disclosure relates to video coding.
BACKGROUND
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of such standards. By implementing such video coding techniques, video devices can more efficiently transmit, receive, encode, decode, and/or store digital video information.
Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
Spatial or temporal prediction results in a prediction block for the block to be coded. Residual data represents the pixel differences between the original block to be coded and the prediction block. Inter-coded blocks are encoded based on a motion vector pointing to a block of reference samples forming the prediction block and residual data indicating the difference between the coded block and the prediction block. Intra-coded blocks are encoded based on an intra-coding mode and the residual data. For further compression, the residual data can be transformed from the pixel domain to the transform domain, thereby producing residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding can be applied to achieve even more compression.
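The residual-coding steps described above can be sketched as follows. This is only a minimal illustration: the transform step is omitted, the uniform quantizer and the anti-diagonal scan shown here are illustrative simplifications, and real codecs (AVC, HEVC) define their own transforms, quantizers, and scan orders.

```python
def compute_residual(original, prediction):
    """Pixel-wise difference between the block to be coded and its predictor."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

def quantize(coeffs, qstep):
    """Toy uniform quantization of (transform) coefficients."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]

def zigzag_scan(block):
    """Scan a 2-D N x N array into a 1-D list along anti-diagonals."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 1:
            diag.reverse()
        out.extend(block[i][j] for i, j in diag)
    return out

original = [[10, 12], [14, 16]]
prediction = [[8, 12], [12, 12]]
residual = compute_residual(original, prediction)   # [[2, 0], [2, 4]]
quantized = quantize(residual, 2)                   # [[1, 0], [1, 2]]
vector = zigzag_scan(quantized)                     # [1, 1, 0, 2]
```

The one-dimensional vector produced by the scan is what entropy coding would then compress further.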
SUMMARY
In general, this disclosure relates to multi-view video coding, in which the coded video data includes two or more views. Specifically, this disclosure describes various techniques related to advanced residual prediction (ARP). The techniques of this disclosure may reduce the number of times a video coder (e.g., a video encoder and/or video decoder) accesses motion information in order to perform ARP or any underlying motion-compensation process (i.e., using an assigned motion vector, with potential interpolation operations, to generate a prediction block). In this way, the speed of video coding (i.e., encoding or decoding) can be increased because fewer memory accesses for motion information are performed.
In one example of the present invention, a method for decoding video data includes: receiving a first encoded video data block in a first access unit of a first view, wherein the first encoded video data block is encoded using advanced residual prediction and bidirectional prediction, the bidirectional prediction including temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction; determining temporal motion information for the first prediction direction of the first encoded video data block; determining disparity motion information for the second prediction direction of the first encoded video data block; using the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction different from the first prediction direction, wherein the reference block is in an access unit different from the first access unit; and performing advanced residual prediction on the first encoded video data block using the identified reference block for the second prediction direction.
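The chain of steps in the example method above can be sketched as follows. The data model (dictionaries keyed by view and access unit) and every name in this snippet are illustrative assumptions for exposition, not part of any actual codec API.

```python
def identify_second_direction_reference(block):
    """Locate the ARP reference block for the second (inter-view)
    prediction direction by reusing the temporal motion vector that was
    determined for the first (temporal) prediction direction."""
    temporal_mv = block["temporal_mv"]      # first prediction direction
    disparity_mv = block["disparity_mv"]    # second prediction direction
    reference = {
        "view": block["view"] + disparity_mv["view_offset"],
        "access_unit": block["access_unit"] + temporal_mv["time_offset"],
    }
    # Per the example, the reference block lies in a different access unit.
    assert reference["access_unit"] != block["access_unit"]
    return reference

current = {
    "view": 1, "access_unit": 4,
    "temporal_mv": {"time_offset": -1},   # points one access unit back
    "disparity_mv": {"view_offset": -1},  # points toward the base view
}
ref = identify_second_direction_reference(current)
# ref == {"view": 0, "access_unit": 3}
```

The point of the sketch is the reuse: the second direction's reference block is reached with the first direction's temporal motion vector, so no additional temporal motion information needs to be fetched for the second direction.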
In another example of the present invention, an apparatus configured to decode video data includes: a video data memory configured to store a first encoded video data block in a first access unit of a first view, wherein the first encoded video data block is encoded using advanced residual prediction and bidirectional prediction, the bidirectional prediction including temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction; and one or more processors in communication with the video data memory and configured to: determine temporal motion information for the first prediction direction of the first encoded video data block; determine disparity motion information for the second prediction direction of the first encoded video data block; use the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction different from the first prediction direction, wherein the reference block is in an access unit different from the first access unit; and perform advanced residual prediction on the first encoded video data block using the identified reference block for the second prediction direction.
In another example of the present invention, an apparatus configured to decode video data includes: means for receiving a first encoded video data block in a first access unit of a first view, wherein the first encoded video data block is encoded using advanced residual prediction and bidirectional prediction, the bidirectional prediction including temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction; means for determining temporal motion information for the first prediction direction of the first encoded video data block; means for determining disparity motion information for the second prediction direction of the first encoded video data block; means for using the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction different from the first prediction direction, wherein the reference block is in an access unit different from the first access unit; and means for performing advanced residual prediction on the first encoded video data block using the identified reference block for the second prediction direction.
In another example, the present disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to decode video data to: receive a first block of encoded video data in a first access unit of a first view, wherein the first block of encoded video data is encoded using advanced residual prediction and bi-directional prediction, the bi-directional prediction including temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction; determine temporal motion information for the first prediction direction of the first block of encoded video data; determine disparity motion information for the second prediction direction of the first block of encoded video data; use the determined temporal motion information for the first prediction direction to identify a reference block for a second prediction direction different from the first prediction direction, wherein the reference block is in an access unit different from the first access unit; and perform advanced residual prediction on the first block of encoded video data using the identified reference block for the second prediction direction.
The details of one or more examples of the present invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.
FIG. 2 is a graphical diagram illustrating an example multi-view encoding or decoding order.
FIG. 3 is a conceptual diagram illustrating example temporal and inter-view prediction patterns for multi-view video coding.
FIG. 4 is a conceptual diagram illustrating texture and depth values for 3D video.
FIG. 5 is a conceptual diagram illustrating an example relationship between neighboring blocks and a current block used to predict motion information of the current block.
FIG. 6 is a conceptual diagram illustrating an example of derivation of inter-view predicted motion vector candidates and inter-view disparity motion vector candidates for predicting motion information of a current block.
FIG. 7 is a conceptual diagram illustrating example spatial neighboring blocks, relative to a current video block, from which a disparity vector for the current video block may be derived using neighboring-block-based disparity vector (NBDV) derivation.
FIG. 8 is a conceptual diagram illustrating sub-prediction-unit (PU) inter-view motion prediction.
FIG. 9 is a conceptual diagram illustrating an example prediction structure for temporal advanced residual prediction (ARP) of temporally predicted video blocks.
FIG. 10 is a conceptual diagram illustrating an example bi-directional prediction structure for temporal ARP.
FIG. 11 is a conceptual diagram of an example prediction structure for inter-view ARP of an inter-view predicted video block, according to the techniques described in this disclosure.
FIG. 12 is a conceptual diagram illustrating an example prediction structure for bi-directional ARP that uses inter-view prediction for one reference picture list and temporal prediction for the other reference picture list.
FIG. 13 is a conceptual diagram illustrating an example prediction structure for bi-directional ARP that uses inter-view prediction for one reference picture list and temporal prediction for the other reference picture list, according to the techniques of this disclosure.
FIG. 14 is a conceptual diagram illustrating block-based temporal ARP.
FIG. 15 is a conceptual diagram illustrating block-based inter-view ARP.
FIG. 16 is a conceptual diagram illustrating block-based ARP using sub-PU merge candidates.
FIG. 17 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 18 is a block diagram illustrating an example video decoder that may utilize the techniques described in this disclosure.
FIG. 19 is a flowchart illustrating an example ARP method for encoding video blocks according to the techniques described in this disclosure.
FIG. 20 is a flowchart illustrating an example ARP method for decoding video blocks according to the techniques described in this disclosure.
DETAILED DESCRIPTION
In general, this disclosure relates to multi-view video coding, in which the coded video data includes two or more views. In some examples, multi-view video coding includes a multi-view-plus-depth video coding process. In some examples, multi-view coding may include the coding of three-dimensional (3D) video, and may be referred to as 3D video coding. Various examples of this disclosure describe techniques for advanced residual prediction (ARP) in non-base views of multi-view and/or 3D video coding sequences. The techniques of this disclosure may reduce the number of times a video coder (e.g., a video encoder and/or video decoder) accesses motion information, e.g., from memory, in order to perform ARP or any underlying inter prediction (e.g., temporal and/or inter-view inter prediction, and bi-directional prediction). In this way, the speed of video coding (i.e., encoding or decoding) can be increased because fewer memory accesses for motion information are performed.
For example, this disclosure describes a method for decoding video data, comprising: receiving a first block of encoded video data in a first access unit, wherein the first block of encoded video data is encoded using advanced residual prediction and bi-directional inter-view prediction; determining temporal motion information for a first prediction direction of the first block of encoded video data; and using the temporal motion information determined for the first prediction direction to identify a reference block for a second prediction direction different from the first prediction direction, wherein the reference block is in a second access unit.
FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques of this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. Specifically, source device 12 may provide the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium that enables source device 12 to transmit the encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.
In some examples, the encoded data can be output from output interface 22 to a storage device. Similarly, the encoded data can be accessed from the storage device via the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, the storage device may correspond to a file server or another intermediate storage device that can store the encoded video generated by source device 12. Destination device 14 can access the stored video data from the storage device via streaming or downloading. The file server can be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include a network server (e.g., for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive. Destination device 14 can access the encoded video data via any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of the two suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination of both.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions (e.g., dynamic adaptive streaming over HTTP (DASH)), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of FIG. 1, source device 12 includes a video source 18, a depth estimation unit 19, a video encoder 20, and an output interface 22. Destination device 14 includes an input interface 28, a video decoder 30, a depth image-based rendering (DIBR) unit 31, and a display device 32. In other examples, the source and destination devices may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Similarly, destination device 14 may interface with an external display device rather than including an integrated display device.
The illustrated system 10 of FIG. 1 is merely one example. The techniques of this disclosure may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder (often referred to as a "codec"). Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner, such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14, for example, for video streaming, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output to computer-readable medium 16 via output interface 22.
Video source 18 may provide one or more views of video data to video encoder 20. For example, video source 18 may correspond to an array of cameras, each of which has a unique horizontal position relative to the particular scene being captured. Alternatively, video source 18 may generate video data from different horizontal camera perspectives, for example using computer graphics. Depth estimation unit 19 may be configured to determine the values of depth pixels corresponding to pixels in a texture image. For example, depth estimation unit 19 may represent a sound navigation and ranging (SONAR) unit, a light detection and ranging (LIDAR) unit, or another unit capable of directly determining depth values substantially simultaneously as video data of a scene is recorded.
Additionally or alternatively, depth estimation unit 19 may be configured to indirectly calculate depth values by comparing two or more images captured at substantially the same time from different horizontal camera perspectives. By calculating horizontal disparity between substantially similar pixel values in the images, depth estimation unit 19 can approximate the depths of various objects in the scene. In some examples, depth estimation unit 19 may be functionally integrated with video source 18. For example, when video source 18 generates computer graphics images, depth estimation unit 19 may provide an actual depth map for the graphical objects, such as using the pixels and z-coordinates of the objects used to render the texture image.
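The indirect depth computation described above follows the standard stereo triangulation relationship for a rectified horizontal camera pair: depth is inversely proportional to horizontal disparity, Z = f * B / d. A minimal sketch, with illustrative numbers that are not taken from the disclosure:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth (in metres) for a rectified horizontal stereo
    pair: focal length in pixels, camera baseline in metres, and the
    horizontal disparity of the matched pixel in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A 1000-pixel focal length, a 10 cm baseline, and a 20-pixel disparity
# place the object 5 metres from the cameras.
z = depth_from_disparity(1000.0, 0.10, 20.0)   # 5.0
```

Nearby objects produce large disparities (small Z); distant objects produce small disparities (large Z), which is why disparity maps can serve as approximate depth maps.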
Computer-readable medium 16 may include transient media, such as wireless broadcasts or wired network transmissions, or storage media (that is, non-transitory storage media) such as hard disks, flash drives, compact discs, digital video discs, Blu-ray discs, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12, for example, via a network transmission, and provide the encoded video data to destination device 14. Similarly, a computing device at a media production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce an optical disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media in various forms.
Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units (e.g., GOPs). Display device 32 displays the decoded video data to a user and may include any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. In some examples, display device 32 may include a device capable of displaying two or more views simultaneously or substantially simultaneously, for example, to produce a 3D visual effect for a viewer.
DIBR unit 31 of destination device 14 can render a synthesized view using texture and depth information for the decoded view received from video decoder 30. For example, DIBR unit 31 can determine the horizontal disparity of pixel data for the texture image based on the values of pixels in the corresponding depth map. DIBR unit 31 can then generate a synthesized image by shifting pixels in the texture image left or right by the determined horizontal disparity. In this manner, display device 32 can display one or more views, which may correspond to decoded views and/or synthesized views, in any combination. According to the techniques of this disclosure, video decoder 30 can provide original and updated precision values for depth ranges and camera parameters to DIBR unit 31, which can use these depth ranges and camera parameters to properly synthesize the view.
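The pixel-shifting step of DIBR described above can be sketched on a single scanline. This is a deliberately simplified assumption-laden toy: disparities are integers, occlusions are resolved by letting later (nearer) writes win, and the hole filling that a real renderer performs is omitted.

```python
def dibr_scanline(texture, disparity):
    """Warp one scanline of texture pixels horizontally by a per-pixel
    disparity (derived, in a real system, from the depth map)."""
    width = len(texture)
    synthesized = [None] * width
    for x in range(width):
        tx = x + disparity[x]
        if 0 <= tx < width:
            synthesized[tx] = texture[x]   # later writes win (nearer pixels)
    return synthesized

texture = [10, 20, 30, 40]
disparity = [0, 0, 1, 1]   # far pixels stay put, near pixels shift right
out = dibr_scanline(texture, disparity)
# out == [10, 20, None, 30]  (a disocclusion hole appears at index 2)
```

The `None` entry shows why disocclusion handling matters in practice: shifting nearer pixels exposes regions that were never visible in the source view.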
Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. If applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
视频编码器20及视频解码器30各自可实施为多种合适的编码器电路中的任一者,例如一或多个微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑、软件、硬件、固件或其任何组合。当部分地用软件实施所述技术时,装置可将用于软件的指令存储在合适的非暂时性计算机可读媒体中且使用一或多个处理器用硬件执行所述指令以执行本发明的技术。视频编码器20及视频解码器30中的每一者可包含在一或多个编码器或解码器中,所述编码器或解码器中的任一者可集成为相应装置中的组合编码器/解码器(CODEC)的部分。包含视频编码器20和/或视频解码器30的装置可包括集成电路、微处理器和/或无线通信装置,例如蜂窝式电话。Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. A device including video encoder 20 and/or video decoder 30 may include an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
在本发明的一个实例中,视频解码器30可经配置以接收第一视图的第一存取单元中的第一经编码视频数据块,其中所述第一经编码视频数据块是使用高级残差预测及双向预测来编码,所述双向预测包含用于第一预测方向的时间预测及用于第二预测方向的视图间预测。视频解码器30可进一步经配置以确定第一经编码视频数据块的第一预测方向的时间运动信息且确定第一经编码视频数据块的第二预测方向的视差运动信息。视频解码器30可进一步经配置以使用所述第一预测方向的所述所确定的时间运动信息而识别不同于所述第一预测方向的第二预测方向的参考块,其中所述参考块在不同于第一存取单元的存取单元中,且使用第二预测方向的所述所识别的参考块对第一经编码视频数据块执行高级残差预测。以此方式,再使用第一预测方向的时间运动信息以用于第二预测方向。因此,需要作出对时间运动信息的更少的存储器存取,这是因为不需要存取由对应于第二预测方向的第一经编码块的运动向量识别的块的时间运动信息,因此允许更快速的视频解码。另外,在执行ARP时使用的参考块的总数可从6减小到5,其导致在使用乘法及加法运算的内插方面较小的计算复杂度。同样,在执行双向帧间预测时,视频编码器20可经配置以在编码第二预测方向时再使用用于第一预测方向的时间运动信息。In one example of the present invention, video decoder 30 may be configured to receive a first block of encoded video data in a first access unit of a first view, wherein the first block of encoded video data is encoded using advanced residual prediction and bidirectional prediction, the bidirectional prediction including temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction. Video decoder 30 may be further configured to determine temporal motion information for a first prediction direction for the first block of encoded video data and to determine parallax motion information for a second prediction direction for the first block of encoded video data. Video decoder 30 may be further configured to use the determined temporal motion information for the first prediction direction to identify a reference block for a second prediction direction different from the first prediction direction, wherein the reference block is in an access unit different from the first access unit, and to perform advanced residual prediction on the first block of encoded video data using the identified reference block for the second prediction direction. In this manner, the temporal motion information for the first prediction direction is reused for the second prediction direction. 
Consequently, fewer memory accesses are required for temporal motion information because the temporal motion information for the block identified by the motion vector of the first encoded block corresponding to the second prediction direction does not need to be accessed, thereby allowing for faster video decoding. Additionally, the total number of reference blocks used when performing ARP can be reduced from 6 to 5, which results in less computational complexity in interpolation using multiplication and addition operations. Likewise, when performing bidirectional inter prediction, video encoder 20 can be configured to reuse temporal motion information for a first prediction direction when encoding a second prediction direction.
视频编码器20和视频解码器30可以根据一种视频译码标准(例如目前正在开发的高效率视频译码(HEVC)标准)来操作,并且可以符合HEVC测试模型(HM)。替代地,视频编码器20及视频解码器30可根据例如替代地被称作MPEG-4第10部分高级视频译码(AVC)的ITU-T H.264标准等其它专属或工业标准或此类标准的扩展(例如,ITU-T H.264/AVC的MVC扩展)操作。MVC的最新联合草案描述于2010年3月的“用于通用视听服务的高级视频译码(Advanced video coding for generic audiovisual services)”(ITU-T推荐H.264)中。确切地说,视频编码器20及视频解码器30可根据3D和/或多视图译码标准操作,包含HEVC标准的3D扩展(例如,3D-HEVC)。Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard currently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10 Advanced Video Coding (AVC), or extensions of such standards, such as the MVC extension of ITU-T H.264/AVC. The latest joint draft of MVC is described in "Advanced Video Coding for Generic Audiovisual Services" (ITU-T Recommendation H.264), dated March 2010. Specifically, video encoder 20 and video decoder 30 may operate according to 3D and/or multi-view coding standards, including a 3D extension of the HEVC standard (e.g., 3D-HEVC).
被称作“HEVC工作草案10”或“WD10”的HEVC标准的一个草案在布洛斯等人的文献JCTVC-L1003v34“高效率视频译码(HEVC)文本规范草案10(用于FDIS和最后呼叫)”(ITU-T SG16 WP3和ISO/IEC JTC1/SC29/WG11的视频译码联合合作小组(JCT-VC),瑞士日内瓦第12次会议,2013年1月14-23日)中描述,截至2015年1月5日,其可从http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip下载。A draft of the HEVC standard, referred to as “HEVC Working Draft 10” or “WD10,” is described in document JCTVC-L1003v34, “High Efficiency Video Coding (HEVC) Text Specification Draft 10 (for FDIS and Last Call)” by Bross et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting, Geneva, Switzerland, January 14-23, 2013, available for download from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip as of January 5, 2015.
被称作“WD10修订本”的HEVC标准的另一草案描述于布洛斯等人的“编辑者提出的对HEVC版本1校正(Editors'proposed corrections to HEVC version 1)”(ITU-T SG16 WP3及ISO/IEC JTC1/SC29/WG11的视频译码联合合作小组(JCT-VC),韩国仁川第13次会议,2013年4月)中,所述草案截至2015年1月5日可从http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M0432-v3.zip获得。对HEVC的多视图扩展(即,MV-HEVC)也正由JCT-3V开发。Another draft of the HEVC standard, known as the "WD10 revision," is described in Bross et al., "Editors' proposed corrections to HEVC version 1," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 13th Meeting, Incheon, South Korea, April 2013, available as of January 5, 2015 at http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M0432-v3.zip. A multi-view extension to HEVC, namely MV-HEVC, is also being developed by JCT-3V.
当前,VCEG和MPEG的3D视频译码联合合作小组(JCT-3C)正在开发基于HEVC的3DV标准,其标准化工作的一部分包含基于HEVC的多视图视频编解码器(MV-HEVC)的标准化,且另一部分是用于基于HEVC的3D视频译码(3D-HEVC)。对于MV-HEVC,应保证其中仅存在高级语法(HLS)改变,以使得HEVC中的译码单元/预测单元层级中的模块不需要重新设计,且可完全再用于MV-HEVC。对于3D-HEVC,可包含并支持用于纹理和深度视图两者的新译码工具,包含译码单元/预测单元层级中的工具。Currently, the Joint Collaboration Team on 3D Video Coding (JCT-3C) of VCEG and MPEG is developing the HEVC-based 3DV standard. One part of this standardization work includes the standardization of the HEVC-based Multi-View Video Codec (MV-HEVC), and another part covers HEVC-based 3D Video Coding (3D-HEVC). For MV-HEVC, it shall be guaranteed that only high-level syntax (HLS) changes are present, so that modules at the coding unit/prediction unit level in HEVC need not be redesigned and can be fully reused for MV-HEVC. For 3D-HEVC, new coding tools for both texture and depth views, including tools at the coding unit/prediction unit level, may be included and supported.
用于3D-HEVC的参考软件3D-HTM的一个版本可从以下链接下载:[3D-HTM版本9.0r1]:https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-9.0r1/。在张力(Li Zhang)、格哈德·泰克(Gerhard Tech)、克日什托夫·韦格纳(Krzysztof Wegner)、叶世勋(Sehoon Yea)的“3D-HEVC及MV-HEVC的测试模型6”(JCT3V-F1005,ITU-T SG 16 WP3和ISO/IEC JTC 1/SC 29/WG 11的3D视频译码扩展开发联合合作小组,第6次会议,瑞士日内瓦,2013年11月(JCT3V-F1005))中描述参考软件描述的一个版本。JCT3V-F1005可从http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1636下载。A version of the 3D-HTM reference software for 3D-HEVC is available for download at the following link: [3D-HTM Version 9.0r1]: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-9.0r1/. A version of the reference software description is given in Li Zhang, Gerhard Tech, Krzysztof Wegner, and Sehoon Yea, "Test Model 6 of 3D-HEVC and MV-HEVC," JCT3V-F1005, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting, Geneva, Switzerland, November 2013 (JCT3V-F1005). JCT3V-F1005 can be downloaded from http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1636.
在格哈德·泰克(Gerhard Tech)、克日什托夫·韦格纳(Krzysztof Wegner)、陈颖(Ying Chen)、叶世勋(Sehoon Yea)的“3D-HEVC草案文本2(3D-HEVC Draft Text2)”(JCT3V-F1001,ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的3D视频译码扩展开发联合合作小组,第6次会议,瑞士日内瓦,2013年11月(JCT3V-F1001))中描述3D-HEVC的一个工作草案。JCT3V-F1001可从http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1361得到。最新软件描述(文档编号:E1005)可从http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1360得到。A working draft of 3D-HEVC is described in Gerhard Tech, Krzysztof Wegner, Ying Chen, and Sehoon Yea, “3D-HEVC Draft Text 2,” JCT3V-F1001, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting, Geneva, Switzerland, November 2013 (JCT3V-F1001). JCT3V-F1001 is available at http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1361. The latest software description (document number: E1005) is available at http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1360.
用于3D-HEVC的软件3D-HTM的更近的版本可从以下链接下载:[3D-HTM版本12.0]:https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-12.0/。3D-HEVC的对应工作草案(文档编号:I1001)可从http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2299得到。最新软件描述(文档编号:I1005)可从http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2301得到。A more recent version of the 3D-HTM software for 3D-HEVC can be downloaded from the following link: [3D-HTM Version 12.0]: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-12.0/. The corresponding working draft of 3D-HEVC (Document No.: I1001) is available at http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2299. The latest software description (Document No.: I1005) is available at http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=2301.
最初,将论述HEVC的实例译码技术。HEVC标准化工作是基于被称作HEVC测试模型(HM)的视频译码装置的演进模型。HM假设视频译码装置相对于根据例如ITU-T H.264/AVC的现有装置具有若干额外能力。举例来说,虽然H.264提供九种帧内预测编码模式,但HM可提供多达三十三种角度帧内预测编码模式加DC及平面模式。Initially, example coding techniques of HEVC will be discussed. The HEVC standardization effort is based on an evolved model of a video coding device known as the HEVC Test Model (HM). The HM assumes several additional capabilities of video coding devices relative to existing devices according to, for example, ITU-T H.264/AVC. For example, while H.264 provides nine intra prediction coding modes, the HM can provide up to 33 angular intra prediction coding modes, plus DC and planar modes.
在HEVC和其它视频译码规范中,视频序列通常包含一系列图片。图片也可被称作“帧”。图片可以包含三个样本阵列,标示为SL、SCb以及SCr。SL是明度样本的二维阵列(即,块)。SCb是Cb色度样本的二维阵列。SCr是Cr色度样本的二维阵列。色度样本在本文中还可以被称为“色度”样本。在其它情况下,图片可为单色的且可仅包含明度样本阵列。In HEVC and other video coding specifications, a video sequence typically contains a series of pictures. A picture may also be referred to as a "frame." A picture may contain three sample arrays, denoted SL , SCb , and SCr . SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. Chroma samples may also be referred to herein as "chroma" samples. In other cases, a picture may be monochrome and may contain only a luma sample array.
为了产生图片的经编码的表示,视频编码器20可以产生一组译码树单元(CTU)。CTU中的每一者可包括明度样本的译码树块、色度样本的两个对应的译码树块,以及用以对译码树块的样本进行译码的语法结构。在单色图片或具有三个单独色彩平面的图片中,CTU可包括单个译码树块及用于对所述译码树块的样本进行译码的语法结构。译码树块可为样本的N×N块。CTU也可以被称为“树块”或“最大译码单元(LCU)”。HEVC的CTU可以广泛地类似于例如H.264/AVC等其它标准的宏块。然而,CTU未必限于特定大小,并且可以包含一或多个译码单元(CU)。切片可包含按光栅扫描次序连续排序的整数数目的CTU。To generate an encoded representation of a picture, video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs may include a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and a syntax structure for coding the samples of the coding tree block. In a monochrome picture or a picture with three separate color planes, a CTU may include a single coding tree block and a syntax structure for coding the samples of the coding tree block. A coding tree block may be an N×N block of samples. A CTU may also be referred to as a “tree block” or a “largest coding unit (LCU)”. A CTU of HEVC may be broadly similar to macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in raster scan order.
为了产生经译码CTU,视频编码器20可在CTU的译码树块上以递归方式执行四叉树分割,以将译码树块划分为译码块,因此命名为“译码树单元”。译码块是样本的N×N块。译码单元(CU)可包括具有明度样本阵列、Cb样本阵列和Cr样本阵列的图片的明度样本的译码块以及色度样本的两个对应的译码块,以及用以对译码块的样本进行译码的语法结构。在单色图片或具有三个单独色彩平面的图片中,CU可包括单个译码块和用以对译码块的样本进行译码的语法结构。To generate a coded CTU, video encoder 20 may recursively perform quadtree partitioning on the coding tree block of the CTU to divide the coding tree block into coding blocks, hence the name "coding tree unit". A coding block is an N×N block of samples. A coding unit (CU) may include a coding block of luma samples and two corresponding coding blocks of chroma samples for a picture having a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In a monochrome picture or a picture with three separate color planes, a CU may include a single coding block and syntax structures used to code the samples of the coding block.
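The recursive quadtree partitioning of a CTU into coding blocks can be sketched as follows. The `should_split` callback is a hypothetical stand-in for the encoder's actual (e.g., rate-distortion based) split decision, which this disclosure does not specify:

```python
def split_ctu(x, y, size, min_size, should_split):
    """Recursively quadtree-partition a CTU at (x, y) into coding blocks.

    `should_split` stands in for the encoder's split decision; the function
    returns a list of (x, y, size) leaf coding blocks in raster order.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):        # visit the four quadrants
        for dx in (0, half):
            blocks.extend(split_ctu(x + dx, y + dy, half, min_size, should_split))
    return blocks
```

For instance, a 64x64 tree block whose split decision only fires above 32x32 yields exactly four 32x32 coding blocks.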
视频编码器20可将CU的译码块分割为一或多个预测块。预测块是对其应用相同预测的样本的矩形(即,正方形或非正方形)块。CU的预测单元(PU)可包括明度样本的预测块、色度样本的两个对应预测块和用以预测预测块的语法结构。在单色图片或具有三个单独色彩平面的图片中,PU可包括单个预测块和用以预测预测块的语法结构。视频编码器20可以产生用于CU的每个PU的明度预测块、Cb预测块以及Cr预测块的预测性明度块、Cb块以及Cr块。Video encoder 20 may partition the coding blocks of a CU into one or more prediction blocks. A prediction block is a rectangular (i.e., square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU may include a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome pictures or pictures with three separate color planes, a PU may include a single prediction block and syntax structures used to predict the prediction blocks. Video encoder 20 may generate predictive luma, Cb, and Cr blocks for the luma, Cb, and Cr prediction blocks of each PU of a CU.
视频编码器20可使用帧内预测或帧间预测来产生PU的预测块。如果视频编码器20使用帧内预测产生PU的预测块,则视频编码器20可以基于与PU相关联的图片的经解码的样本来产生PU的预测块。在HEVC的一些版本中,对于每一PU的明度分量,以33种角度预测模式(从2到34编索引)、DC模式(以1编索引)和平面模式(以0编索引)利用帧内预测方法。Video encoder 20 may use intra prediction or inter prediction to generate the prediction block for the PU. If video encoder 20 uses intra prediction to generate the prediction block for the PU, video encoder 20 may generate the prediction block for the PU based on decoded samples of the picture associated with the PU. In some versions of HEVC, for the luma component of each PU, intra prediction methods are utilized with 33 angular prediction modes (indexed from 2 to 34), DC mode (indexed with 1), and planar mode (indexed with 0).
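Of the intra modes listed above, the DC mode (index 1) is the simplest to illustrate: the block is filled with the rounded average of the reconstructed neighboring samples. The sketch below follows the HEVC-style rounding (offset by N before dividing by 2N); it is a simplified illustration, not the full specification behavior (which includes reference-sample substitution and filtering):

```python
def dc_intra_predict(above, left, size):
    """DC intra prediction: fill an N x N block with the rounded average
    of the N reference samples above and the N samples to the left."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)   # rounded average of 2N samples
    return [[dc] * size for _ in range(size)]
```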
如果视频编码器20使用帧间预测产生PU的预测块,则视频编码器20可基于除与PU相关的图片以外的一或多个图片的经解码样本产生PU的预测块。帧间预测可为单向帧间预测(即,单向预测或单向预测性预测)或双向帧间预测(即,双向预测或双向预测性预测)。为了执行单向预测或双向预测,视频编码器20可产生当前切片的第一参考图片列表(RefPicList0)及第二参考图片列表(RefPicList1)。参考图片列表中的每一者可包含一或多个参考图片。当使用单向预测时,视频编码器20可以搜索RefPicList0以及RefPicList1中的任一者或两者中的参考图片,以确定参考图片内的参考位置。此外,当使用单向预测时,视频编码器20可以至少部分基于对应于参考位置的样本而产生用于PU的预测样本块。此外,在使用单向预测时,视频编码器20可产生指示PU的预测块与参考位置之间的空间移位的单一运动向量。为了指示PU的预测块与参考位置之间的空间移位,运动向量可以包含指定PU的预测块与参考位置之间的水平移位的水平分量并且可以包含指定PU的预测块与参考位置之间的垂直移位的垂直分量。If video encoder 20 uses inter-prediction to generate a prediction block for a PU, video encoder 20 may generate the prediction block for the PU based on decoded samples of one or more pictures other than the picture associated with the PU. Inter-prediction may be unidirectional inter-prediction (i.e., unidirectional prediction or unidirectional predictive prediction) or bidirectional inter-prediction (i.e., bidirectional prediction or bidirectional predictive prediction). To perform unidirectional prediction or bidirectional prediction, video encoder 20 may generate a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) for the current slice. Each of the reference picture lists may include one or more reference pictures. When using unidirectional prediction, video encoder 20 may search the reference pictures in either or both of RefPicList0 and RefPicList1 to determine a reference position within the reference pictures. Furthermore, when using unidirectional prediction, video encoder 20 may generate a block of prediction samples for the PU based at least in part on samples corresponding to the reference position. Furthermore, when using unidirectional prediction, video encoder 20 may generate a single motion vector indicating the spatial displacement between the prediction block of the PU and the reference position. 
To indicate the spatial displacement between the PU's prediction block and the reference position, the motion vector may contain a horizontal component specifying the horizontal displacement between the PU's prediction block and the reference position and may contain a vertical component specifying the vertical displacement between the PU's prediction block and the reference position.
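The role of the motion vector's horizontal and vertical components can be sketched as follows; this integer-pel sketch omits the fractional-sample interpolation a real codec performs:

```python
def fetch_reference_block(ref_picture, x, y, size, mv):
    """Locate a prediction block in a reference picture using a motion
    vector whose horizontal component (mv_x) and vertical component (mv_y)
    give the spatial displacement from the current block position."""
    mv_x, mv_y = mv
    return [row[x + mv_x : x + mv_x + size]
            for row in ref_picture[y + mv_y : y + mv_y + size]]
```

A zero motion vector returns the co-located block; a vector of (1, -1) returns the block one sample to the right and one sample up.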
在使用双向预测来编码PU时,视频编码器20可确定RefPicList0中的参考图片中的第一参考位置及RefPicList1中的参考图片中的第二参考位置。视频编码器20接着可至少部分基于对应于第一及第二参考位置的样本产生PU的预测块。此外,当使用双向预测对PU进行编码时,视频编码器20可以产生指示PU的样本块与第一参考位置之间的空间移位的第一运动向量,以及指示PU的预测块与第二参考位置之间的空间移位的第二运动向量。When encoding a PU using bi-prediction, video encoder 20 may determine a first reference location in a reference picture in RefPicList0 and a second reference location in a reference picture in RefPicList1. Video encoder 20 may then generate a prediction block for the PU based at least in part on samples corresponding to the first and second reference locations. Furthermore, when encoding a PU using bi-prediction, video encoder 20 may generate a first motion vector indicating a spatial displacement between the sample block of the PU and the first reference location, and a second motion vector indicating a spatial displacement between the prediction block of the PU and the second reference location.
通常,B图片的第一或第二参考图片列表(例如,RefPicList0或RefPicList1)的参考图片列表建构包含两个步骤:参考图片列表初始化和参考图片列表重新排序(修改)。参考图片列表初始化是显式机制,其基于POC(图片次序计数,与图片的显示次序对准)次序值将参考图片存储器(也被称作经解码图片缓冲器)中的参考图片放入列表中。参考图片列表重新排序机制可将在参考图片列表初始化期间放置在列表中的图片的位置修改为任何新位置,或即使在图片不属于初始化列表的情况下也将参考图片存储器中的任何参考图片放置在任何位置。可将参考图片列表重新排序(修改)后的一些图片放置在列表中的更进一步的位置中。然而,如果图片的位置超过列表的有效参考图片的数目,则不将所述图片视为最终参考图片列表的条目。可在每一列表的切片标头中发信号通知有效参考图片的数目。Typically, reference picture list construction for the first or second reference picture list (e.g., RefPicList0 or RefPicList1) of a B picture involves two steps: reference picture list initialization and reference picture list reordering (modification). Reference picture list initialization is an explicit mechanism that places reference pictures in the reference picture memory (also known as the decoded picture buffer) into a list based on their POC (Picture Order Count) order values, which align with the display order of the pictures. The reference picture list reordering mechanism can modify the position of pictures placed in the list during reference picture list initialization to any new position, or place any reference picture in the reference picture memory at any position, even if the picture does not belong to the initialized list. Some pictures may be placed further into the list after the reference picture list reordering (modification). However, if a picture's position exceeds the number of valid reference pictures for a list, it is not considered an entry in the final reference picture list. The number of valid reference pictures may be signaled in the slice header of each list.
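The POC-based initialization step can be sketched as follows for RefPicList0. This is a simplification limited to the short-term ordering rule (pictures before the current one by descending POC, then pictures after it by ascending POC, truncated to the signaled number of active references); the reordering/modification step described above is omitted:

```python
def init_ref_pic_list0(dpb_pocs, current_poc, num_active):
    """Sketch of RefPicList0 initialization from decoded picture buffer
    POC values: earlier pictures in descending POC order, then later
    pictures in ascending POC order, truncated to num_active entries."""
    before = sorted((p for p in dpb_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in dpb_pocs if p > current_poc)
    return (before + after)[:num_active]
```

RefPicList1 initialization would mirror this ordering (later pictures first); entries beyond the number of active reference pictures are dropped, matching the truncation rule described above.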
在建构参考图片列表(即RefPicList0和RefPicList1,如果可用)之后,可使用到参考图片列表的参考索引来识别参考图片列表中包含的任何参考图片。After the reference picture lists (ie, RefPicList0 and RefPicList1, if available) are constructed, reference indexes to the reference picture lists may be used to identify any reference pictures included in the reference picture lists.
在视频编码器20产生CU的一或多个PU的预测性明度、Cb及Cr块之后,视频编码器20可产生CU的明度残差块。CU的明度残差块中的每个样本指示CU的预测性明度块中的一者中的明度样本与CU的原始明度译码块中对应的样本之间的差异。另外,视频编码器20可以产生CU的Cb残差块。CU的Cb残差块中的每一样本可以指示CU的预测性Cb块中的一者中的Cb样本与CU的原始Cb译码块中对应的样本之间的差异。视频编码器20还可产生CU的Cr残差块。CU的Cr残差块中的每个样本可以指示CU的预测性Cr块中的一者中的Cr样本与CU的原始Cr译码块中对应的样本之间的差异。After video encoder 20 generates predictive luma, Cb, and Cr blocks for one or more PUs of a CU, video encoder 20 may generate a luma residual block for the CU. Each sample in the luma residual block of the CU indicates the difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. Additionally, video encoder 20 may generate a Cb residual block for the CU. Each sample in the Cb residual block of the CU may indicate the difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the Cr residual block of the CU may indicate the difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
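The per-sample residual computation described above amounts to a sample-wise subtraction, sketched here for the luma residual block (the Cb and Cr residual blocks are formed the same way):

```python
def luma_residual_block(original, predictive):
    """Each residual sample is the difference between a sample of the
    original coding block and the corresponding predictive sample."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predictive)]
```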
此外,视频编码器20可使用四叉树分割将CU的明度、Cb及Cr残差块分解成一或多个明度、Cb及Cr变换块。变换块是对其应用相同变换的样本的矩形(例如,正方形或非正方形)块。CU的变换单元(TU)可包括明度样本的变换块、色度样本的两个对应变换块及用以对变换块样本进行变换的语法结构。因此,CU的每个TU可以与明度变换块、Cb变换块以及Cr变换块相关联。与TU相关联的明度变换块可为CU的明度残差块的子块。Cb变换块可为CU的Cb残差块的子块。Cr变换块可以是CU的Cr残差块的子块。在单色图片或具有三个单独色彩平面的图片中,TU可包括单个变换块和用以对变换块的样本进行变换的语法结构。In addition, video encoder 20 may use quadtree partitioning to decompose the luma, Cb, and Cr residual blocks of a CU into one or more luma, Cb, and Cr transform blocks. A transform block is a rectangular (e.g., square or non-square) block of samples to which the same transform is applied. A transform unit (TU) of a CU may include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with a TU may be a subblock of the CU's luma residual block. The Cb transform block may be a subblock of the CU's Cb residual block. The Cr transform block may be a subblock of the CU's Cr residual block. In monochrome pictures or pictures with three separate color planes, a TU may include a single transform block and syntax structures used to transform the samples of the transform block.
视频编码器20可将一或多个变换应用到TU的明度变换块以产生TU的明度系数块。系数块可为变换系数的二维阵列。变换系数可为标量。视频编码器20可将一或多个变换应用于TU的Cb变换块以产生TU的Cb系数块。视频编码器20可将一或多个变换应用至TU的Cr变换块以产生TU的Cr系数块。Video encoder 20 may apply one or more transforms to the luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar. Video encoder 20 may apply one or more transforms to the Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to the Cr transform block of a TU to generate a Cr coefficient block for the TU.
在产生系数块(例如,明度系数块、Cb系数块或Cr系数块)之后,视频编码器20可以量化系数块。量化总体上是指对变换系数进行量化以可能减少用以表示变换系数的数据的量从而提供进一步压缩的过程。在视频编码器20量化系数块之后,视频编码器20可以对指示经量化变换系数的语法元素进行熵编码。举例来说,视频编码器20可对指示经量化变换系数的语法元素执行上下文自适应二进制算术译码(CABAC)。After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent the transform coefficients, thereby providing further compression. After video encoder 20 quantizes the coefficient block, video encoder 20 may entropy encode syntax elements indicating the quantized transform coefficients. For example, video encoder 20 may perform context-adaptive binary arithmetic coding (CABAC) on the syntax elements indicating the quantized transform coefficients.
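The quantization step can be sketched as uniform scalar quantization; this toy version (simple division by a quantization step, rounding toward zero) illustrates the lossy precision reduction but is not the exact HEVC scaling arithmetic:

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization of transform coefficients: a larger
    quantization step discards more precision for better compression."""
    return [int(c / qstep) for c in coeffs]  # round toward zero

def dequantize(levels, qstep):
    """Inverse quantization as performed at the decoder; the difference
    from the original coefficients is the (irreversible) quantization error."""
    return [l * qstep for l in levels]
```

Note that dequantizing the levels does not recover the original coefficients exactly, which is why quantization is the lossy stage of the pipeline.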
视频编码器20可输出包含形成经译码图片及相关联数据的表示的位序列的位流。位流可包括一连串网络抽象层(NAL)单元。NAL单元是含有NAL单元中的数据类型的指示和含有所述数据的呈按需要穿插有模拟阻止位的原始字节序列有效负载(RBSP)的形式的字节的语法结构。NAL单元中的每一者包含NAL单元标头且囊封RBSP。NAL单元标头可包含指示NAL单元类型码的语法元素。由NAL单元的NAL单元标头指定的所述NAL单元类型代码指示NAL单元的类型。RBSP可为含有囊封在NAL单元内的整数数目个字节的语法结构。在一些情况下,RBSP包含零个位。Video encoder 20 may output a bitstream that includes a sequence of bits that form a representation of a coded picture and associated data. The bitstream may include a series of network abstraction layer (NAL) units. A NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes containing the data in the form of a raw byte sequence payload (RBSP) interspersed with emulation prevention bits as needed. Each NAL unit includes a NAL unit header and encapsulates the RBSP. The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code, specified by the NAL unit header of the NAL unit, indicates the type of the NAL unit. The RBSP may be a syntax structure that contains an integer number of bytes encapsulated within the NAL unit. In some cases, the RBSP includes zero bits.
不同类型的NAL单元可囊封不同类型的RBSP。举例来说,第一类型的NAL单元可囊封用于图片参数集(PPS)的RBSP,第二类型的NAL单元可囊封用于经译码切片的RBSP,第三类型的NAL单元可囊封用于SEI的RBSP等等。封装视频译码数据的RBSP(与参数集及SEI消息的RBSP相对)的NAL单元可被称为视频编码层(VCL)NAL单元。Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate RBSPs for picture parameter sets (PPSs), a second type of NAL unit may encapsulate RBSPs for coded slices, a third type of NAL unit may encapsulate RBSPs for SEIs, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units.
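The emulation-prevention interspersing mentioned above (raw RBSP bytes must never imitate a start code inside a NAL unit) can be sketched as follows, following the H.264/HEVC rule of inserting 0x03 after two consecutive zero bytes when the next byte is 0x00-0x03:

```python
def insert_emulation_prevention(rbsp):
    """Insert emulation-prevention bytes (0x03) into an RBSP so that the
    byte patterns 0x000000, 0x000001, 0x000002, and 0x000003 never occur
    in the NAL unit payload."""
    out = bytearray()
    zeros = 0                       # run of consecutive zero bytes emitted
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)        # break up the forbidden pattern
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

The decoder simply removes each 0x03 that follows two zero bytes to recover the original RBSP.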
视频解码器30可以接收由视频编码器20产生的位流。另外,视频解码器30可以解析位流以获得来自位流的语法元素。视频解码器30可至少部分基于从位流获得的语法元素重构视频数据的图片。用以重构视频数据的过程大体上可以与由视频编码器20执行的过程互逆。举例来说,视频解码器30可使用PU的运动向量来确定当前CU的PU的预测块。另外,视频解码器30可反量化与当前CU的TU相关联的系数块。视频解码器30可以对系数块执行反变换以重构与当前CU的TU相关联的变换块。通过将用于当前CU的PU的预测块的样本添加到当前CU的TU的变换块的对应的样本,视频解码器30可以重构当前CU的译码块。通过重构用于图片的每一CU的译码块,视频解码器30可重构图片。Video decoder 30 may receive a bitstream generated by video encoder 20. Additionally, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct a picture of the video data based, at least in part, on the syntax elements obtained from the bitstream. The process used to reconstruct the video data may be generally inverse to the process performed by video encoder 20. For example, video decoder 30 may use the motion vectors of the PUs to determine prediction blocks for the PUs of the current CU. Additionally, video decoder 30 may inverse quantize coefficient blocks associated with the TUs of the current CU. Video decoder 30 may perform an inverse transform on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding samples of the prediction blocks for the PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of the picture, video decoder 30 may reconstruct the picture.
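The final reconstruction step described above, adding prediction samples to the inverse-transformed residual samples, can be sketched as follows; the clipping to the valid sample range is the standard behavior for a given bit depth:

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Decoder-side reconstruction: add each (inverse-transformed) residual
    sample to the corresponding prediction sample, then clip the result to
    the valid sample range for the given bit depth."""
    lo, hi = 0, (1 << bit_depth) - 1
    return [[min(hi, max(lo, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```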
在一些实例中,视频编码器20可使用合并模式或高级运动向量预测(AMVP)模式发信号通知PU的运动信息。换句话说,在HEVC中,存在预测运动参数的两个模式,一者为合并模式及另一者为AMVP。运动预测可包括基于一或多个其它视频单元的运动信息的视频单元(例如,PU)的运动信息的确定。PU的运动信息可以包含PU的运动向量以及PU的参考索引。In some examples, video encoder 20 may signal the motion information of a PU using merge mode or advanced motion vector prediction (AMVP) mode. In other words, in HEVC, there are two modes for predicting motion parameters, one is merge mode and the other is AMVP. Motion prediction may include determining the motion information of a video unit (e.g., a PU) based on the motion information of one or more other video units. The motion information of a PU may include a motion vector of the PU and a reference index of the PU.
当视频编码器20使用合并模式发信号通知当前PU的运动信息时,视频编码器20产生合并候选者列表。换句话说,视频编码器20可执行运动向量预测符清单建构过程。合并候选者列表包含指示在空间上或在时间上相邻于当前PU的PU的运动信息的合并候选者的集合。即,在合并模式中,建构运动参数(例如,参考索引、运动向量等)的候选者列表,其中候选者可以来自空间和时间相邻块。在一些实例中,候选者还可包含人工产生的候选者。When video encoder 20 uses merge mode to signal the motion information of the current PU, video encoder 20 generates a merge candidate list. In other words, video encoder 20 may perform a motion vector predictor list construction process. The merge candidate list includes a set of merge candidates that indicate the motion information of PUs that are spatially or temporally neighboring to the current PU. That is, in merge mode, a candidate list of motion parameters (e.g., reference index, motion vector, etc.) is constructed, where the candidates can come from spatially and temporally neighboring blocks. In some examples, the candidates may also include manually generated candidates.
此外,在合并模式中,视频编码器20可从合并候选者列表选择合并候选者且可使用由所选合并候选者指示的运动信息作为当前PU的运动信息。视频编码器20可发信号通知所选合并候选者的合并候选者列表中的位置。举例来说,视频编码器20可通过将索引发射到候选者列表中而发信号通知所选择的运动向量参数。视频解码器30可从位流获得进入候选者列表的索引(即,候选者列表索引)。另外,视频解码器30可产生相同合并候选者列表,且可基于所选合并候选者的位置的指示确定所选合并候选者。接着,视频解码器30可以使用选定的合并候选者的运动信息来产生当前PU的预测块。也就是说,视频解码器30可至少部分地基于候选者列表索引确定候选者列表中的所选候选者,其中所选候选者指定当前PU的运动向量。以此方式,在解码器侧处,一旦索引被解码,索引所指向的对应块的所有运动参数便可由当前PU继承。Furthermore, in merge mode, video encoder 20 may select a merge candidate from a merge candidate list and may use the motion information indicated by the selected merge candidate as the motion information for the current PU. Video encoder 20 may signal the position of the selected merge candidate in the merge candidate list. For example, video encoder 20 may signal the selected motion vector parameters by transmitting an index into the candidate list. Video decoder 30 may obtain the index into the candidate list (i.e., the candidate list index) from the bitstream. Alternatively, video decoder 30 may generate the same merge candidate list and may determine the selected merge candidate based on the indication of the position of the selected merge candidate. Video decoder 30 may then use the motion information of the selected merge candidate to generate a prediction block for the current PU. That is, video decoder 30 may determine the selected candidate in the candidate list based at least in part on the candidate list index, where the selected candidate specifies the motion vector for the current PU. In this way, at the decoder side, once the index is decoded, all motion parameters for the corresponding block pointed to by the index may be inherited by the current PU.
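The merge-mode signaling described above can be sketched as follows. The candidate sources and the pruning here are simplified illustrations (real merge list construction also has availability checks and artificially generated candidates, as noted earlier); each candidate is modeled as a (motion vector, reference index) pair:

```python
def build_merge_list(spatial, temporal, max_cands=5):
    """Sketch of merge candidate list construction: spatial neighbors
    first, then the temporal candidate, with duplicate entries pruned."""
    out = []
    for cand in spatial + temporal:
        if cand is not None and cand not in out:
            out.append(cand)
        if len(out) == max_cands:
            break
    return out

def merge_mode_decode(candidate_list, merge_idx):
    """The decoder builds the same list and inherits ALL motion parameters
    of the candidate selected by the signaled index."""
    return candidate_list[merge_idx]
```

Because encoder and decoder construct identical lists, the only information that needs to be transmitted is the index itself.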
跳过模式类似于合并模式。在跳过模式中,视频编码器20及视频解码器30以视频编码器20及视频解码器30在合并模式中使用合并候选者列表的相同方式来产生及使用合并候选者列表。然而,在视频编码器20使用跳过模式发信号通知当前PU的运动信息时,视频编码器20不发信号通知当前PU的任何残差数据。因此,视频解码器30可在不使用残差数据的情况下基于由合并候选者列表中的选定候选者的运动信息指示的参考块而确定PU的预测块。Skip mode is similar to merge mode. In skip mode, video encoder 20 and video decoder 30 generate and use a merge candidate list in the same manner as they use a merge candidate list in merge mode. However, when video encoder 20 signals the motion information of the current PU using skip mode, video encoder 20 does not signal any residual data for the current PU. Therefore, video decoder 30 can determine the prediction block of the PU based on the reference block indicated by the motion information of the selected candidate in the merge candidate list without using residual data.
AMVP模式类似于合并模式,类似之处在于视频编码器20可产生候选者列表并且可从候选者列表选择候选者。然而,当视频编码器20使用AMVP模式发信号通知当前PU的RefPicListX运动信息时,视频编码器20可除了发信号通知当前PU的RefPicListX MVP旗标之外还发信号通知当前PU的RefPicListX运动向量差(MVD)及当前PU的RefPicListX参考索引。当前PU的RefPicListX MVP旗标可指示AMVP候选者列表中的选定AMVP候选者的位置。当前PU的RefPicListX MVD可指示当前PU的RefPicListX运动向量与选定AMVP候选者的运动向量之间的差。以此方式,视频编码器20可通过发信号通知RefPicListX运动向量预测符(MVP)旗标、RefPicListX参考索引值和RefPicListX MVD而发信号通知当前PU的RefPicListX运动信息。换句话说,在位流中的表示当前PU的运动向量的数据可包含表示参考索引的数据、到候选者列表的索引及MVD。AMVP mode is similar to merge mode in that video encoder 20 may generate a candidate list and may select a candidate from the candidate list. However, when video encoder 20 signals the RefPicListX motion information of the current PU using AMVP mode, video encoder 20 may signal the RefPicListX motion vector difference (MVD) of the current PU and the RefPicListX reference index of the current PU in addition to signaling the RefPicListX MVP flag of the current PU. The RefPicListX MVP flag of the current PU may indicate the position of the selected AMVP candidate in the AMVP candidate list. The RefPicListX MVD of the current PU may indicate the difference between the RefPicListX motion vector of the current PU and the motion vector of the selected AMVP candidate. In this way, video encoder 20 can signal the RefPicListX motion information of the current PU by signaling the RefPicListX motion vector predictor (MVP) flag, the RefPicListX reference index value, and the RefPicListX MVD. In other words, the data representing the motion vector of the current PU in the bitstream may include data representing the reference index, an index to the candidate list, and the MVD.
此外,在使用AMVP模式发信号通知当前PU的运动信息时,视频解码器30可从所述位流获得当前PU的MVD及MVP旗标。视频解码器30可产生相同的AMVP候选者列表且可基于MVP旗标确定所述选定AMVP候选者。视频解码器30可通过将MVD添加到由所述选定AMVP候选者指示的运动向量来恢复当前PU的运动向量。也就是说,视频解码器30可基于由所述选定AMVP候选者指示的运动向量和MVD确定当前PU的运动向量。视频解码器30接着可使用当前PU的所恢复的一或多个运动向量来产生当前PU的预测块。In addition, when the motion information of the current PU is signaled using the AMVP mode, the video decoder 30 may obtain the MVD and MVP flag of the current PU from the bitstream. The video decoder 30 may generate the same AMVP candidate list and may determine the selected AMVP candidate based on the MVP flag. The video decoder 30 may recover the motion vector of the current PU by adding the MVD to the motion vector indicated by the selected AMVP candidate. That is, the video decoder 30 may determine the motion vector of the current PU based on the motion vector indicated by the selected AMVP candidate and the MVD. The video decoder 30 may then use the recovered one or more motion vectors of the current PU to generate a prediction block for the current PU.
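The recovery step described above reduces to adding the signaled MVD back onto the predictor selected by the MVP flag. A hedged sketch, assuming simple 2-D integer motion vectors (function and variable names are illustrative only):

```python
def recover_amvp_mv(amvp_candidates, mvp_flag, mvd):
    """The MVP flag selects a predictor from the shared AMVP candidate
    list; adding the signaled MVD recovers the actual motion vector."""
    mvp_x, mvp_y = amvp_candidates[mvp_flag]
    return (mvp_x + mvd[0], mvp_y + mvd[1])

# Predictor (3, 1) plus signaled difference (2, -1) gives (5, 0).
mv = recover_amvp_mv([(3, 1), (0, 0)], mvp_flag=0, mvd=(2, -1))
```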
当视频解码器30产生当前PU的AMVP候选者列表时,视频解码器30可基于覆盖在空间上与当前PU相邻的位置的PU(即,在空间上相邻的PU)的运动信息而导出一或多个AMVP候选者。在PU的预测块包含一位置时,PU可覆盖所述位置。When video decoder 30 generates an AMVP candidate list for a current PU, video decoder 30 may derive one or more AMVP candidates based on motion information of PUs that cover positions spatially adjacent to the current PU (i.e., spatially adjacent PUs). When the prediction block of a PU includes a position, the PU may cover the position.
合并候选者列表或AMVP候选者列表中基于在时间上相邻于当前PU的PU(即,在与当前PU不同的时间实例中的PU)的运动信息的候选者可被称为TMVP。即,TMVP可用以提高HEVC的译码效率,并且不同于其它译码工具,TMVP可需要存取经解码图片缓冲器中、更具体来说是参考图片列表中的帧的运动向量。Candidates in the merge candidate list or the AMVP candidate list that are based on the motion information of a PU that is temporally adjacent to the current PU (i.e., a PU at a different time instance than the current PU) may be referred to as TMVPs. That is, TMVPs may be used to improve HEVC coding efficiency, and unlike other coding tools, TMVPs may require access to the motion vectors of frames in the decoded picture buffer, more specifically, in the reference picture list.
可基于逐CVS(经译码视频序列)、逐切片或另一基础来启用或停用TMVP的使用。SPS中的语法元素(例如,sps_temporal_mvp_enable_flag)可指示TMVP的使用是否针对CVS经启用。此外,当TMVP的使用针对CVS经启用时,可针对所述CVS内的特定切片启用或停用TMVP的使用。举例来说,切片标头中的语法元素(例如,slice_temporal_mvp_enable_flag)可指示TMVP的使用是否针对切片经启用。因此,在经帧间预测的切片中,当TMVP针对整个CVS经启用(例如,SPS中的sps_temporal_mvp_enable_flag设定成1)时,在切片标头中发信号通知slice_temporal_mvp_enable_flag以指示TMVP是否针对当前切片经启用。The use of TMVP can be enabled or disabled on a per-CVS (coded video sequence) basis, per-slice basis, or on another basis. A syntax element in the SPS (e.g., sps_temporal_mvp_enable_flag) can indicate whether the use of TMVP is enabled for a CVS. Furthermore, when the use of TMVP is enabled for a CVS, the use of TMVP can be enabled or disabled for a specific slice within that CVS. For example, a syntax element in the slice header (e.g., slice_temporal_mvp_enable_flag) can indicate whether the use of TMVP is enabled for a slice. Thus, in an inter-predicted slice, when TMVP is enabled for the entire CVS (e.g., sps_temporal_mvp_enable_flag in the SPS is set to 1), the slice_temporal_mvp_enable_flag is signaled in the slice header to indicate whether TMVP is enabled for the current slice.
为了确定TMVP,视频编解码器可首先识别包含与当前PU位于相同位置的PU的参考图片。换句话说,视频译码器可识别位于同一地点的图片。如果当前图片的当前切片是B切片(即,允许包含经双向帧间预测的PU的切片),那么视频编码器20可在切片标头中发信号通知指示相同位置图片是来自RefPicList0还是RefPicList1的语法元素(例如,collocated_from_l0_flag)。换句话说,在针对当前切片启用TMVP的使用且当前切片是B切片(例如,允许包含双向帧间预测的PU的切片)时,视频编码器20可在切片标头中发信号通知指示位于同一地点的图片是在RefPicList0中还是RefPicList1中的语法元素(例如,collocated_from_l0_flag)。换句话说,为了得到TMVP,首先将识别位于同一地点的图片。如果当前图片为B切片,那么在切片标头中发信号通知collocated_from_l0_flag以指示相同位置的图片是来自RefPicList0还是来自RefPicList1。To determine the TMVP, the video codec may first identify a reference picture that includes a PU that is co-located with the current PU. In other words, the video decoder may identify a co-located picture. If the current slice of the current picture is a B slice (i.e., a slice that allows inclusion of bi-directionally inter-predicted PUs), video encoder 20 may signal in the slice header a syntax element (e.g., collocated_from_l0_flag) that indicates whether the co-located picture is from RefPicList0 or RefPicList1. In other words, when use of TMVP is enabled for the current slice and the current slice is a B slice (e.g., a slice that allows inclusion of bi-directionally inter-predicted PUs), video encoder 20 may signal in the slice header a syntax element (e.g., collocated_from_l0_flag) that indicates whether the co-located picture is in RefPicList0 or RefPicList1. In other words, to derive the TMVP, the co-located picture will first be identified. If the current picture is a B slice, collocated_from_l0_flag is signaled in the slice header to indicate whether the co-located picture is from RefPicList0 or from RefPicList1.
在视频解码器30识别包含位于同一地点的图片的参考图片列表之后,视频解码器30可使用可在切片标头中发信号通知的另一语法元素(例如,collocated_ref_idx)来识别所识别的参考图片列表中的图片(即,位于同一地点的图片)。即,在识别参考图片列表之后,在切片标头中发信号通知的collocated_ref_idx用于识别参考图片列表中的图片。After video decoder 30 identifies a reference picture list that includes a co-located picture, video decoder 30 may use another syntax element (e.g., collocated_ref_idx) that may be signaled in a slice header to identify the picture in the identified reference picture list (i.e., the co-located picture). That is, after identifying a reference picture list, the collocated_ref_idx signaled in the slice header is used to identify the picture in the reference picture list.
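The two-step lookup described above can be sketched as follows. This is a hedged illustration, not decoder source code; the syntax-element names follow the text, but the list representation is an assumption:

```python
def find_collocated_picture(ref_pic_list0, ref_pic_list1,
                            collocated_from_l0_flag, collocated_ref_idx):
    """collocated_from_l0_flag chooses between RefPicList0 and RefPicList1;
    collocated_ref_idx then indexes the picture within the chosen list."""
    ref_list = ref_pic_list0 if collocated_from_l0_flag else ref_pic_list1
    return ref_list[collocated_ref_idx]

# Flag set -> look in RefPicList0; index 1 -> its second picture.
col_pic = find_collocated_picture(["pocA", "pocB"], ["pocC"],
                                  collocated_from_l0_flag=1,
                                  collocated_ref_idx=1)
```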
视频译码器可通过检查位于同一地点的图片来识别位于同一地点的PU。TMVP可指示含有位于同一地点的PU的CU的右下方PU的运动信息或含有此PU的CU的中心PU内的右下方PU的运动信息。因此,使用含有此PU的CU的右下方PU的运动或含有此PU的CU的中心PU内的右下方PU的运动。含有位于同一地点的PU的CU的右下方PU可为覆盖直接在所述PU的预测块的右下方样本的右下方的位置的PU。换句话说,TMVP可指示在参考图片中且覆盖与当前PU的右下方拐角位于同一地点的位置的PU的运动信息,或TMVP可指示在参考图片中且覆盖与当前PU的中心位于同一地点的位置的PU的运动信息。The video coder can identify the co-located PU by examining the co-located picture. The TMVP may indicate the motion information of the bottom-right PU of the CU containing the co-located PU or the motion information of the bottom-right PU within the center PU of the CU containing the co-located PU. Therefore, the motion of the bottom-right PU of the CU containing the co-located PU or the motion of the bottom-right PU within the center PU of the CU containing the co-located PU is used. The bottom-right PU of the CU containing the co-located PU may be the PU that covers a position directly below and to the right of the bottom-right sample of the prediction block of the co-located PU. In other words, the TMVP may indicate the motion information of the PU in the reference picture that covers a position co-located with the bottom-right corner of the current PU, or the TMVP may indicate the motion information of the PU in the reference picture that covers a position co-located with the center of the current PU.
当由以上过程识别的运动向量(即,TMVP的运动向量)用于产生用于合并模式或AMVP模式的运动候选者时,视频译码器可基于时间位置(由POC值反映)缩放所述运动向量。例如,视频译码器可在当前图片及参考图片的POC值之间的差较大时将运动向量的量值增加较大的量,且在当前图片及参考图片的POC值之间的差较小时将所述运动向量的量值增加较小的量。When the motion vector identified by the above process (i.e., the motion vector of the TMVP) is used to generate a motion candidate for merge mode or AMVP mode, the video coder may scale the motion vector based on the temporal position (reflected by the POC value). For example, the video coder may increase the magnitude of the motion vector by a larger amount when the difference between the POC values of the current picture and the reference picture is larger, and increase the magnitude of the motion vector by a smaller amount when the difference between the POC values of the current picture and the reference picture is smaller.
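A sketch of this POC-distance-based scaling, loosely following the integer form used in HEVC TMVP derivation. The clipping constants and rounding are an approximation of the spec rather than a verbatim reproduction, and the sketch assumes positive POC distances so that Python floor division matches C integer division:

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def scale_mv(mv, tb, td):
    """Scale one MV component by the ratio of POC distances tb/td.
    tb: POC distance between the current picture and its target reference;
    td: POC distance between the collocated picture and its reference.
    Assumes tb > 0 and td > 0 (see lead-in note)."""
    tx = (16384 + (abs(td) >> 1)) // td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))

same = scale_mv(10, tb=4, td=4)    # equal distances: MV unchanged
half = scale_mv(10, tb=2, td=4)    # half the distance: MV halved
```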
从TMVP导出的时间合并候选者的所有可能的参考图片列表的目标参考索引可始终设定成0。然而,对于AMVP,将所有可能的参考图片的目标参考索引设定成等于经解码参考索引。换句话说,将从TMVP导出的时间合并候选者的所有可能参考图片列表的目标参考索引设定为0,而对于AMVP,将其设定为等于经解码参考索引。在HEVC中,SPS可包含旗标(例如,sps_temporal_mvp_enable_flag)且当sps_temporal_mvp_enable_flag等于1时,切片标头可包含旗标(例如,pic_temporal_mvp_enable_flag)。当对于特定图片,pic_temporal_mvp_enable_flag与temporal_id两者都等于0时,在所述特定图片或按解码次序在所述特定图片之后的图片的解码中,不将来自按解码次序在所述特定图片之前的图片的运动向量用作TMVP。The target reference indexes of all possible reference picture lists for temporal merging candidates derived from TMVP may always be set to 0. However, for AMVP, the target reference indexes of all possible reference pictures are set equal to the decoded reference indexes. In other words, the target reference indexes of all possible reference picture lists for temporal merging candidates derived from TMVP are set to 0, while for AMVP, they are set equal to the decoded reference indexes. In HEVC, the SPS may include a flag (e.g., sps_temporal_mvp_enable_flag) and when sps_temporal_mvp_enable_flag is equal to 1, the slice header may include a flag (e.g., pic_temporal_mvp_enable_flag). When both pic_temporal_mvp_enable_flag and temporal_id are equal to 0 for a particular picture, motion vectors from pictures that precede the particular picture in decoding order are not used as TMVP in the decoding of the particular picture or pictures that follow the particular picture in decoding order.
在一些实例中,视频编码器20和视频解码器30(图1)可使用用于多视图和/或3D视频译码(例如包含两个或两个以上视图的视频数据的译码)的技术。在此些实例中,视频编码器20可编码包含两个或两个以上视图的经编码视频数据的位流,且视频解码器30可解码所述经编码视频数据以将所述两个或两个以上视图提供(例如)到显示装置32。在一些实例中,视频解码器30可提供视频数据的多个视图以使显示装置32能够显示3D视频。在一些实例中,视频编码器20和视频解码器30可符合HEVC标准的3D-HEVC扩展,例如其中使用多视图译码或多视图加深度译码过程。多视图和/或3D视频译码可涉及两个或两个以上纹理视图和/或包含纹理和深度分量的视图的译码。在一些实例中,由视频编码器20编码且由视频解码器30解码的视频数据包含任何给定时间实例(即,“存取单元”内)的两个或两个以上图片,或可从其导出任何给定时间实例的两个或两个以上图片的数据。In some examples, video encoder 20 and video decoder 30 ( FIG. 1 ) may use techniques for multi-view and/or 3D video coding, e.g., coding of video data including two or more views. In such examples, video encoder 20 may encode a bitstream of encoded video data including two or more views, and video decoder 30 may decode the encoded video data to provide the two or more views, e.g., to display device 32. In some examples, video decoder 30 may provide multiple views of the video data to enable display device 32 to display 3D video. In some examples, video encoder 20 and video decoder 30 may conform to the 3D-HEVC extension of the HEVC standard, e.g., where multi-view coding or multi-view plus depth coding processes are used. Multi-view and/or 3D video coding may involve coding of two or more texture views and/or views including texture and depth components. In some examples, the video data encoded by video encoder 20 and decoded by video decoder 30 includes two or more pictures at any given time instance (i.e., within an "access unit"), or data from which two or more pictures at any given time instance may be derived.
在一些实例中,装置(例如视频源18)可通过例如使用两个或两个以上空间偏移相机或其它视频俘获装置来俘获共同场景而产生所述两个或两个以上图片。自稍微不同的水平位置同时或几乎同时俘获的相同场景的两个图片可用以产生三维效果。在一些实例中,视频源18(或源装置12的另一组件)可使用深度信息或视差信息从在给定时间实例处的第一视图的第一图片产生在所述给定时间实例处的第二(或其它额外)视图的第二(或其它额外)图片。在此状况下,存取单元内的视图可包含对应于第一视图的纹理分量及可与所述纹理分量一起使用以产生第二视图的深度分量。深度或视差信息可由俘获第一视图的视频俘获装置例如基于相机参数或关于视频俘获装置的配置及第一视图的视频数据的俘获的其它已知信息来确定。深度或视差信息可另外地或可替代地例如由视频源18或源装置12的另一组件从相机参数及/或第一视图中的视频数据进行计算。In some examples, a device (such as video source 18) may generate the two or more pictures by, for example, capturing a common scene using two or more spatially offset cameras or other video capture devices. Two pictures of the same scene captured simultaneously or nearly simultaneously from slightly different horizontal positions may be used to produce a three-dimensional effect. In some examples, video source 18 (or another component of source device 12) may use depth information or disparity information to generate a second (or other additional) picture of a second (or other additional) view at a given time instance from a first picture of a first view at the given time instance. In this case, a view within an access unit may include a texture component corresponding to the first view and a depth component that may be used with the texture component to generate the second view. The depth or disparity information may be determined by the video capture device that captured the first view, for example, based on camera parameters or other known information about the configuration of the video capture device and the capture of video data for the first view. The depth or disparity information may additionally or alternatively be calculated, for example, by video source 18 or another component of source device 12 from the camera parameters and/or video data in the first view.
为呈现3D视频,显示装置32可同时或几乎同时显示与共同场景的不同视图相关联的两个图片,其是同时或几乎同时俘获的。在一些实例中,目的地装置14的用户可戴上主动式眼镜以快速地及替代性地遮挡左及右镜片,且显示装置32可快速在左视图与右视图之间与主动式眼镜同步地切换。在其它实例中,显示装置32可同时显示两个视图,且用户可佩戴被动式眼镜(例如,具有偏光镜片),其对视图进行过滤,从而致使恰当视图进入到用户的眼睛。在其它实例中,显示装置32可包括裸眼式立体显示器,其并不需要让用户感知到3D效果的眼镜。To present 3D video, display device 32 may simultaneously or nearly simultaneously display two pictures associated with different views of a common scene, which were captured simultaneously or nearly simultaneously. In some examples, the user of destination device 14 may wear active glasses to quickly and alternately cover the left and right lenses, and display device 32 may quickly switch between the left and right views in sync with the active glasses. In other examples, display device 32 may display both views simultaneously, and the user may wear passive glasses (e.g., with polarized lenses) that filter the views so that the appropriate view enters the user's eyes. In other examples, display device 32 may include a naked-eye stereoscopic display that does not require glasses for the user to perceive a 3D effect.
多视图视频译码指代对多个视图进行译码的方式。在3D视频译码的状况下,所述多个视图可例如对应于左眼视图及右眼视图。所述多个视图中的每一视图包含多个图片。检视者对3D场景的感知归因于不同视图的图片中的对象之间的水平视差。Multi-view video coding refers to a method of coding multiple views. In the case of 3D video coding, the multiple views may correspond to a left-eye view and a right-eye view, for example. Each of the multiple views includes multiple pictures. The viewer's perception of a 3D scene is due to the horizontal disparity between objects in pictures of different views.
当前图片的当前块的视差向量(DV)是指向在与当前图片不同的视图中的对应图片中的对应块的向量。因此,使用DV,视频译码器可在对应图片中定位对应于当前图片的当前块的块。在此情况下,对应图片是与当前图片为相同的时间实例但在不同视图中的图片。对应图片中的对应块和当前图片中的当前块可包含相似视频内容;然而,当前图片中的当前块的位置与对应图片中的对应块的位置之间存在至少水平视差。当前块的DV提供对应图片中的块与当前图片中的当前块之间的此水平视差的量度。The disparity vector (DV) of a current block in a current picture is a vector that points to a corresponding block in a corresponding picture in a different view than the current picture. Thus, using the DV, a video coder can locate a block in the corresponding picture that corresponds to the current block in the current picture. In this case, the corresponding picture is a picture that is at the same time instance as the current picture but in a different view. The corresponding block in the corresponding picture and the current block in the current picture may contain similar video content; however, there is at least horizontal disparity between the position of the current block in the current picture and the position of the corresponding block in the corresponding picture. The DV of the current block provides a measure of this horizontal disparity between the block in the corresponding picture and the current block in the current picture.
在一些情况下,还可存在对应图片内的块的位置与当前图片内的当前块的位置之间的垂直视差。当前块的DV还可提供对应图片中的块与当前图片中的当前块之间的此垂直视差的量度。DV含有两个分量(x分量和y分量),但在许多情况下垂直分量将等于零。当前视图的当前图片和不同视图的对应图片所显示的时间可为相同的,也就是说当前图片和对应图片是同一时间实例的图片。In some cases, there may also be vertical disparity between the position of a block in the corresponding picture and the position of the current block in the current picture. The DV of the current block may also provide a measure of this vertical disparity between the block in the corresponding picture and the current block in the current picture. The DV contains two components (x and y), but in many cases the vertical component will be zero. The current picture of the current view and the corresponding picture of the different view may be displayed at the same time, that is, the current picture and the corresponding picture are pictures of the same time instance.
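Applying the DV described above is a simple offset: the corresponding block in the same-time-instance picture of the other view sits at the current block's position shifted by the DV's two components. A minimal sketch (names are illustrative):

```python
def locate_corresponding_block(cur_pos, dv):
    """Offset the current block's position by the disparity vector to find
    the corresponding block in the inter-view picture. The y component of
    the DV is often zero, i.e., purely horizontal disparity."""
    x, y = cur_pos
    dv_x, dv_y = dv
    return (x + dv_x, y + dv_y)

# A purely horizontal DV of (-6, 0) shifts the block 6 samples left.
pos = locate_corresponding_block((64, 32), (-6, 0))
```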
在视频译码中,通常存在两种类型的预测,通常被称为帧内预测和帧间预测。在帧内预测中,视频译码器基于相同图片中的已经译码块预测图片中的视频块。在帧间预测中,视频译码器基于不同图片(即参考图片)的已经译码块预测图片中的视频块。如本发明中所使用,参考图片通常指代含有可用于按解码次序的后续图片的解码过程中的帧间预测的样本的任何图片。当例如根据3D-HEVC相对于当前图片对多视图内容译码时,参考图片可属于相同时间实例但在不同视图中或可在相同视图中但属于不同时间实例。在例如3D-HEVC中的多视图译码的情况下,图片间预测可包含从时间上不同图片中的另一视频块(即,从与当前图片不同的存取单元)预测当前视频块(例如CU的当前译码节点),以及从与当前图片相同的存取单元中的但同与当前图片不同的视图相关联的不同图片预测。In video coding, there are generally two types of prediction, commonly referred to as intra-prediction and inter-prediction. In intra-prediction, the video coder predicts a video block in a picture based on already coded blocks in the same picture. In inter-prediction, the video coder predicts a video block in a picture based on already coded blocks in a different picture (i.e., a reference picture). As used in this disclosure, a reference picture generally refers to any picture containing samples that can be used for inter-prediction in the decoding process of subsequent pictures in decoding order. When coding multi-view content relative to the current picture, such as in 3D-HEVC, the reference pictures may belong to the same temporal instance but in different views, or may be in the same view but in different temporal instances. In the case of multi-view coding, such as in 3D-HEVC, inter-picture prediction can include predicting the current video block (e.g., the current coding node of a CU) from another video block in a temporally different picture (i.e., from a different access unit than the current picture), as well as predicting from a different picture in the same access unit as the current picture but associated with a different view than the current picture.
在帧间预测的后一种情况下,其可被称作视图间译码或视图间预测。在与当前图片相同的存取单元中但与和当前图片不同的视图相关联的参考图片可被称为视图间参考图片。在多视图译码中,在相同存取单元(即,具有相同时间实例)的不同视图中俘获的图片当中执行视图间预测以移除视图之间的相关。在对例如相依视图等非基础视图的图片译码时,来自相同存取单元但不同视图(例如来自参考视图,例如基础视图)的图片可添加到参考图片列表。视图间参考图片可放置到参考图片列表的任何位置中,正如任何帧间预测(例如,时间或视图间)参考图片的情况。In the latter case of inter-frame prediction, it may be referred to as inter-view coding or inter-view prediction. Reference pictures in the same access unit as the current picture but associated with a different view than the current picture may be referred to as inter-view reference pictures. In multi-view coding, inter-view prediction is performed among pictures captured in different views of the same access unit (i.e., with the same temporal instance) to remove correlation between views. When coding pictures of non-base views, such as dependent views, pictures from the same access unit but different views (e.g., from a reference view, such as a base view) may be added to a reference picture list. Inter-view reference pictures may be placed in any position in a reference picture list, just as with any inter-predicted (e.g., temporal or inter-view) reference picture.
用于预测当前图片的块的参考图片的块由运动向量识别。在多视图译码中,存在至少两个种类的运动向量。时间运动向量(TMV)为指向在与正被译码的块相同的视图中(例如,如上文所描述的帧间预测的第一实例)但与正被译码的块不同的时间实例或存取单元的时间参考图片中的块的运动向量,且对应帧间预测被称作经运动补偿的预测(MCP)。另一类型的运动向量为视差运动向量(DMV),其指向与当前图片相同的存取单元中的但属于不同视图的图片中的块。利用DMV,对应帧间预测被称作经视差补偿的预测(DCP)或视图间预测。The blocks of the reference picture used to predict the blocks of the current picture are identified by motion vectors. In multi-view coding, there are at least two types of motion vectors. A temporal motion vector (TMV) is a motion vector that points to a block in a temporal reference picture that is in the same view as the block being coded (e.g., as in the first example of inter-frame prediction described above), but in a different temporal instance or access unit than the block being coded, and the corresponding inter-frame prediction is called motion-compensated prediction (MCP). Another type of motion vector is a disparity motion vector (DMV), which points to a block in a picture that is in the same access unit as the current picture but belongs to a different view. With DMV, the corresponding inter-frame prediction is called disparity-compensated prediction (DCP) or inter-view prediction.
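The TMV/DMV distinction above can be decided from the reference picture's view and time instance relative to the current picture. A hedged sketch, using POC values to stand in for time instances (the classification covers only the two cases named in the text):

```python
def classify_motion_vector(cur_view, cur_poc, ref_view, ref_poc):
    """A same-time reference in another view implies a disparity motion
    vector (DCP); a same-view reference at a different time instance
    implies a temporal motion vector (MCP)."""
    if ref_view != cur_view and ref_poc == cur_poc:
        return "DMV/DCP"
    if ref_view == cur_view and ref_poc != cur_poc:
        return "TMV/MCP"
    raise ValueError("reference relationship not covered by this sketch")

kind = classify_motion_vector(cur_view=1, cur_poc=8, ref_view=0, ref_poc=8)
```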
在下一部分中,将论述多视图(例如,如在H.264/MVC中)及多视图加深度(例如,如在3D-HEVC中)译码技术。起初,将论述MVC技术。如上所述,MVC是ITU-T H.264/AVC的多视图译码扩展。在MVC中,以时间优先次序译码多个视图的数据,且相应地,解码次序布置被称作时间优先译码。具体来说,可译码共同时间实例处的多个视图中的每一者的视图分量(即,图片),随后可译码不同时间实例的另一组视图分量,且以此类推。存取单元可包含一个输出时间实例的所有视图的经译码图片。应理解,存取单元的解码次序不一定等于输出(或显示)次序。In the next section, multi-view (e.g., as in H.264/MVC) and multi-view plus depth (e.g., as in 3D-HEVC) coding techniques will be discussed. Initially, MVC techniques will be discussed. As described above, MVC is a multi-view coding extension of ITU-T H.264/AVC. In MVC, data for multiple views is coded in a time-first order, and accordingly, the decoding order arrangement is referred to as time-first coding. Specifically, the view components (i.e., pictures) of each of the multiple views at a common time instance may be coded, followed by another set of view components at a different time instance, and so on. An access unit may include coded pictures for all views of one output time instance. It should be understood that the decoding order of an access unit is not necessarily equal to the output (or display) order.
在图2中展示典型的MVC解码次序(即,位流次序)。解码次序布置被称作时间优先译码。应注意,存取单元的解码次序可不等于输出或显示次序。在图2中,S0到S7各自指代多视图视频的不同视图。T0到T8各自表示一个输出时间实例。存取单元可包含一个输出时间实例的所有视图的经译码图片。例如,第一存取单元可包含时间实例T0的所有视图S0到S7,第二存取单元可包含时间实例T1的所有视图S0到S7,且以此类推。A typical MVC decoding order (i.e., bitstream order) is shown in FIG2 . This decoding order arrangement is referred to as time-first coding. Note that the decoding order of access units may not be equal to the output or display order. In FIG2 , S0 to S7 each refer to a different view of the multi-view video. T0 to T8 each represent an output time instance. An access unit may include coded pictures for all views of an output time instance. For example, the first access unit may include all views S0 to S7 for time instance T0, the second access unit may include all views S0 to S7 for time instance T1, and so on.
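The time-first coding order described above can be sketched as two nested loops: every view of one access unit is visited before any picture of the next time instance. A minimal illustration:

```python
def time_first_order(views, time_instances):
    """Yield (time, view) picture slots in MVC time-first coding order:
    all views of one access unit precede the next time instance."""
    for t in time_instances:
        for v in views:
            yield (t, v)

# Two views, two time instances -> two access units of two pictures each.
order = list(time_first_order(["S0", "S1"], ["T0", "T1"]))
```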
出于简明目的,本发明可使用以下定义:For the purpose of simplicity, the following definitions may be used in this disclosure:
视图分量:单个存取单元中的视图的经译码表示。当视图包含经译码纹理及深度表示两者时,视图分量由纹理视图分量及深度视图分量构成。View component: A coded representation of a view in a single access unit. When a view includes both coded texture and depth representations, a view component consists of a texture view component and a depth view component.
纹理视图分量:单个存取单元中的视图的纹理的经译码表示。Texture view component: A coded representation of the texture of a view in a single access unit.
深度视图分量:单个存取单元中的视图的深度的经译码表示。Depth view component: A coded representation of the depth of a view in a single access unit.
在图2中,所述视图中的每一者包含若干图片集合。举例来说,视图S0包含图片0、8、16、24、32、40、48、56及64的集合,视图S1包含图片1、9、17、25、33、41、49、57及65的集合,且以此类推。对于3D视频译码,例如3D-HEVC,每一图片可包含两个分量图片:一个分量图片称为纹理视图分量,且另一分量图片称为深度视图分量。视图的图片集合内的纹理视图分量及深度视图分量可认为是彼此对应。举例来说,视图的图片集合内的纹理视图分量被认为是对应于视图的所述图片集合内的深度视图分量,且反之亦然(即,深度视图分量对应于所述集合中的其纹理视图分量,且反之亦然)。如本发明中所使用,对应于深度视图分量的纹理视图分量可认为是为单个存取单元的相同视图的部分的纹理视图分量及深度视图分量。In FIG2 , each of the views includes several picture sets. For example, view S0 includes the set of pictures 0, 8, 16, 24, 32, 40, 48, 56, and 64, view S1 includes the set of pictures 1, 9, 17, 25, 33, 41, 49, 57, and 65, and so on. For 3D video coding, such as 3D-HEVC, each picture may include two component pictures: one component picture is called a texture view component, and the other component picture is called a depth view component. The texture view components and depth view components within a view's picture set may be considered to correspond to each other. For example, a texture view component within a view's picture set is considered to correspond to a depth view component within the view's picture set, and vice versa (i.e., a depth view component corresponds to its texture view component in the set, and vice versa). As used in this disclosure, a texture view component corresponding to a depth view component may be considered to be a texture view component and a depth view component that are part of the same view of a single access unit.
纹理视图分量包含所显示的实际图像内容。举例来说,所述纹理视图分量可包含明度(Y)及色度(Cb及Cr)分量。深度视图分量可指示其对应纹理视图分量中的像素的相对深度。作为一个实例,深度视图分量为仅包含明度值的灰阶图像。换句话说,深度视图分量可不传达任何图像内容,而是提供纹理视图分量中的像素的相对深度的量度。The texture view component contains the actual image content being displayed. For example, the texture view component may include luma (Y) and chroma (Cb and Cr) components. The depth view component may indicate the relative depth of pixels in its corresponding texture view component. As an example, the depth view component is a grayscale image that only includes luma values. In other words, the depth view component may not convey any image content, but rather provides a measure of the relative depth of pixels in the texture view component.
举例来说,深度视图分量中的纯白色像素指示对应纹理视图分量中的其对应像素较接近于观察者的视角,且深度视图分量中的纯黑色像素指示对应纹理视图分量中的其对应像素距观察者的视角较远。黑色与白色之间的各种灰度渐变指示不同深度水平。举例来说,深度视图分量中的深灰色像素指示纹理视图分量中的其对应像素比深度视图分量中的浅灰色像素更远。因为仅需要灰阶来识别像素的深度,因此深度视图分量不需要包含色度分量,因为深度视图分量的色彩值可能不服务于任何目的。For example, a pure white pixel in a depth view component indicates that its corresponding pixel in the corresponding texture view component is closer to the viewer's perspective, and a pure black pixel in a depth view component indicates that its corresponding pixel in the corresponding texture view component is farther from the viewer's perspective. Various shades of gray between black and white indicate different depth levels. For example, a dark gray pixel in a depth view component indicates that its corresponding pixel in the texture view component is farther away than a light gray pixel in the depth view component. Because only grayscale is needed to identify the depth of a pixel, the depth view component does not need to include chroma components, as the color values of the depth view component may not serve any purpose.
仅使用明度值(例如,强度值)来识别深度的深度视图分量是出于说明的目的而提供,且不应被视为限制性的。在其它实例中,可利用任何技术来指示纹理视图分量中的像素的相对深度。The use of only luma values (eg, intensity values) to identify depth view components is provided for illustration purposes and should not be considered limiting. In other examples, any technique may be utilized to indicate the relative depth of pixels in a texture view component.
图3中展示用于多视图视频译码的典型MVC预测结构(包含每一视图内的图片间预测和视图间预测两者)。预测方向由箭头指示,箭头所指向的对象使用箭头所来自的对象作为预测参考。在MVC中,由视差运动补偿支持视图间预测,所述视差运动补偿使用H.264/AVC运动补偿的语法但允许将不同视图中的图片用作参考图片。A typical MVC prediction structure for multi-view video coding (including both inter-picture prediction within each view and inter-view prediction) is shown in FIG3 . Prediction directions are indicated by arrows; the pointed-to object uses the pointed-from object as a prediction reference. In MVC, inter-view prediction is supported by disparity motion compensation, which uses the syntax of H.264/AVC motion compensation but allows pictures in different views to be used as reference pictures.
在图3的实例中,说明八个视图(具有视图ID“S0”到“S7”),且对于每一视图说明十二个时间位置(“T0”到“T11”)。即,图3中的每一行对应于视图,而每一列指示时间位置。3, eight views are illustrated (with view IDs "S0" through "S7"), and twelve temporal positions ("T0" through "T11") are illustrated for each view. That is, each row in FIG3 corresponds to a view, and each column indicates a temporal position.
尽管MVC具有可由H.264/AVC解码器解码的所谓的基础视图,且MVC还可支持立体视图对,但MVC的优点在于其可支持使用两个以上视图作为3D视频输入且解码通过多个视图表示的此3D视频的实例。具有MVC解码器的客户端的再现器可预期具有多个视图的3D视频内容。Although MVC has so-called base views that can be decoded by H.264/AVC decoders, and MVC can also support stereoscopic view pairs, an advantage of MVC is that it can support instances where more than two views are used as 3D video input and such 3D video is decoded using multiple views. A renderer of a client with an MVC decoder can expect 3D video content with multiple views.
在每一行及每一列的交叉点处指示图3中的图片。H.264/AVC标准可使用术语帧来表示视频的一部分。本发明可互换地使用术语图片与帧。3 is indicated at the intersection of each row and each column. The H.264/AVC standard may use the term frame to refer to a portion of a video. This disclosure may use the terms picture and frame interchangeably.
使用包含字母的块来说明图3中的图片,字母标示对应图片是经帧内译码(也就是说,I图片),还是在一个方向上经帧间译码(也就是说,作为P图片),或是在多个方向上经帧间译码(也就是说,作为B图片)。一般来说,预测由箭头指示,其中箭头所指向的图片使用箭头所来自的图片用于预测参考。举例来说,时间位置T0处的视图S2的P图片是从时间位置T0处的视图S0的I图片预测的。The pictures in FIG3 are illustrated using blocks containing letters indicating whether the corresponding picture is intra-coded (that is, an I-picture), inter-coded in one direction (that is, as a P-picture), or inter-coded in multiple directions (that is, as a B-picture). In general, prediction is indicated by an arrow, where the pointed-to picture uses the pointed-from picture for prediction reference. For example, the P-picture of view S2 at temporal location T0 is predicted from the I-picture of view S0 at temporal location T0.
如同单视图视频编码,多视图视频译码视频序列的图片可相对于在不同时间位置处的图片预测性地编码。举例来说,时间位置T1处的视图S0的b图片具有从时间位置T0处的视图S0的I图片指向其的箭头,从而指示所述b图片是从所述I图片预测的。然而,另外,在多视图视频编码的情况下,图片可经视图间预测。也就是说,视图分量可使用其它视图中的视图分量用于参考。举例来说,在MVC中,如同另一视图中的视图分量为帧间预测参考而实现视图间预测。潜在视图间参考在序列参数集(SPS)MVC扩展中发信号通知且可通过参考图片列表建构过程加以修改,所述参考图片列表建构过程实现帧间预测或视图间预测参考的灵活排序。视图间预测也是包含3D-HEVC(多视图加深度)的HEVC的所提出的多视图扩展的特征。As with single-view video coding, pictures of a multi-view video coding video sequence can be predictively encoded relative to pictures at different temporal locations. For example, a b-picture of view S0 at temporal location T1 has an arrow pointing to it from an I-picture of view S0 at temporal location T0, indicating that the b-picture is predicted from the I-picture. However, in the case of multi-view video coding, pictures can also be inter-view predicted. That is, a view component can use view components in other views for reference. For example, in MVC, inter-view prediction is achieved as if a view component in another view is an inter-prediction reference. Potential inter-view references are signaled in the Sequence Parameter Set (SPS) MVC extension and can be modified through the reference picture list construction process, which enables flexible ordering of inter-prediction or inter-view prediction references. Inter-view prediction is also a feature of proposed multi-view extensions of HEVC, including 3D-HEVC (Multi-view Plus Depth).
图3提供视图间预测的各种实例。在图3的实例中,视图S1的图片说明为是从视图S1的不同时间位置处的图片预测,且是从相同时间位置处的视图S0及S2的图片经视图间预测。举例来说,时间位置T1处的视图S1的b图片是从时间位置T0及T2处的视图S1的B图片中的每一者以及时间位置T1处的视图S0及S2的b图片预测。FIG3 provides various examples of inter-view prediction. In the example of FIG3, pictures of view S1 are illustrated as being predicted from pictures at different temporal locations of view S1, and as being inter-view predicted from pictures of views S0 and S2 at the same temporal location. For example, the b-picture of view S1 at temporal location T1 is predicted from each of the B-pictures of view S1 at temporal locations T0 and T2, as well as the b-pictures of views S0 and S2 at temporal location T1.
在一些实例中,图3可被视为说明纹理视图分量。举例来说,图3中所说明的I、P、B及b图片可认为是视图中的每一者的纹理视图分量。根据本发明中描述的技术,对于图3中所说明的纹理视图分量中的每一者,存在对应深度视图分量。在一些实例中,可以类似于图3中针对对应纹理视图分量所说明的方式的方式预测深度视图分量。In some examples, FIG3 can be considered to illustrate texture view components. For example, the I, P, B, and b pictures illustrated in FIG3 can be considered to be texture view components for each of the views. According to the techniques described in this disclosure, for each of the texture view components illustrated in FIG3, there is a corresponding depth view component. In some examples, the depth view component can be predicted in a manner similar to that illustrated in FIG3 for the corresponding texture view components.
MVC中也可支持两个视图的译码。MVC的优点中的一个优点是:MVC编码器可将两个以上视图视为3D视频输入且MVC解码器可解码此类多视图表示。因此,具有MVC解码器的任何再现器可预期具有两个以上视图的3D视频内容。MVC can also support the decoding of two views. One of the advantages of MVC is that an MVC encoder can treat more than two views as 3D video input and an MVC decoder can decode such multi-view representations. Therefore, any renderer with an MVC decoder can expect 3D video content with more than two views.
在MVC中,允许在相同存取单元(即,具有相同时间实例)中的图片当中的视图间预测。在对非基础视图中的一者中的图片进行译码时,如果图片在不同视图中,但在相同时间实例内,那么可将图片添加到参考图片列表中。可将视图间预测参考图片放置在参考图片列表的任何位置中,正如任何帧间预测参考图片一般。如图3中所示,视图分量可出于参考目的使用其它视图中的视图分量。在MVC中,如同另一视图中的视图分量为帧间预测参考般实现视图间预测。In MVC, inter-view prediction is allowed among pictures in the same access unit (i.e., with the same time instance). When coding a picture in one of the non-base views, if the picture is in a different view but in the same time instance, the picture can be added to the reference picture list. An inter-view prediction reference picture can be placed in any position in the reference picture list, just like any inter-prediction reference picture. As shown in Figure 3, a view component can use view components in other views for reference purposes. In MVC, inter-view prediction is implemented as if the view component in another view is an inter-prediction reference.
在多视图视频译码的上下文中,一般来说存在两个种类的运动向量。一个称为正常运动向量。所述正常运动向量指向时间参考图片且对应时间帧间预测是运动补偿预测(MCP)。另一运动向量是视差运动向量(DMV)。所述DMV指向不同视图中的图片(即,视图间参考图片)且对应帧间预测是视差补偿预测(DCP)。In the context of multi-view video coding, there are generally two types of motion vectors. One is called a normal motion vector. It points to a temporal reference picture and its corresponding temporal inter-frame prediction is motion compensated prediction (MCP). The other motion vector is a disparity motion vector (DMV). It points to a picture in a different view (i.e., an inter-view reference picture) and its corresponding inter-frame prediction is disparity compensated prediction (DCP).
另一类型的多视图视频译码格式引入深度值的使用(例如,3D-HEVC中)。对于普遍用于3D电视和自由视点视频的多视图视频加深度(MVD)数据格式,可独立地以多视图纹理图片译码纹理图像和深度图。图4说明具有纹理图像的MVD数据格式及其相关联的每样本深度图。深度范围可限于在与对应3D点的相机相距最小znear和最大zfar距离的范围内。Another type of multi-view video coding format introduces the use of depth values (e.g., in 3D-HEVC). In the Multi-view Video Plus Depth (MVD) data format, commonly used for 3D television and free-viewpoint video, texture images and depth maps can be coded with the multi-view texture pictures independently. FIG4 illustrates the MVD data format with a texture image and its associated per-sample depth map. The depth range may be restricted to the range between a minimum znear and a maximum zfar distance from the camera for the corresponding 3D points.
在HEVC中,用于运动向量预测的技术可包含合并模式、跳过模式及高级运动向量预测(AMVP)模式。一般来说,根据合并模式及/或跳过模式,当前视频块(例如,PU)继承来自另一先前译码的相邻块(例如,相同图片中的空间上相邻块,或时间或视图间参考图片中的块)的运动信息,例如,运动向量、预测方向及参考图片索引。当实施合并/跳过模式时,视频编码器20建构作为经界定目标中的参考块的运动信息的合并候选者的列表,选择所述合并候选者中的一者,且在位流中向视频解码器30发信号通知识别所述选定合并候选者的候选者列表索引。In HEVC, techniques for motion vector prediction may include merge mode, skip mode, and advanced motion vector prediction (AMVP) mode. Generally, according to merge mode and/or skip mode, a current video block (e.g., a PU) inherits motion information, such as a motion vector, prediction direction, and reference picture index, from another previously coded neighboring block (e.g., a spatially neighboring block in the same picture, or a block in a temporal or inter-view reference picture). When implementing merge/skip mode, video encoder 20 constructs, in a defined manner, a list of merge candidates that are the motion information of reference blocks, selects one of the merge candidates, and signals in the bitstream a candidate list index identifying the selected merge candidate to video decoder 30.
在实施合并/跳过模式中,视频解码器30根据所界定的方式重构合并候选者列表且选择所述候选者列表中的由索引指示的合并候选者中的一者。视频解码器30接着可使用合并候选者中的选定一者以作为处于与合并候选者中的选定一者的所述运动向量相同的分辨率且指向与合并候选者中的选定一者的所述运动向量相同的参考图片的当前PU的运动向量。合并模式和跳过模式通过允许视频编码器20发信号通知到合并候选者列表中的索引而非用于当前视频块的帧间预测的所有运动信息而提高位流效率。In implementing merge/skip mode, video decoder 30 reconstructs the merge candidate list according to the defined manner and selects the one of the merge candidates in the candidate list indicated by the index. Video decoder 30 may then use the motion vector of the selected merge candidate as the motion vector for the current PU, at the same resolution as, and pointing to the same reference picture as, the motion vector of the selected merge candidate. Merge mode and skip mode improve bitstream efficiency by allowing video encoder 20 to signal an index into the merge candidate list rather than all of the motion information for inter-prediction of the current video block.
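The merge/skip mechanism described above can be sketched as follows. This is a minimal illustration, not the HEVC derivation process itself: the `MotionInfo` record, the neighbor list, and the pruning rule are hypothetical simplifications; the point is only that encoder and decoder build the same list in the same defined manner, so an index alone identifies the inherited motion information.

```python
from collections import namedtuple

# Hypothetical, simplified motion-information record: a motion vector,
# a prediction direction and a reference-picture index, as inherited
# wholesale in merge/skip mode.
MotionInfo = namedtuple("MotionInfo", ["mv", "pred_dir", "ref_idx"])

def build_merge_list(neighbor_motion, max_candidates=5):
    """Build a merge candidate list in a fixed, defined order,
    dropping unavailable (None) and duplicate candidates."""
    candidates = []
    for info in neighbor_motion:
        if info is not None and info not in candidates:
            candidates.append(info)
        if len(candidates) == max_candidates:
            break
    return candidates

# Encoder side: build the list, choose a candidate, signal only its index.
neighbors = [
    MotionInfo(mv=(4, -2), pred_dir=0, ref_idx=0),  # e.g., left neighbor
    None,                                           # above neighbor unavailable
    MotionInfo(mv=(4, -2), pred_dir=0, ref_idx=0),  # duplicate, pruned
    MotionInfo(mv=(0, 1), pred_dir=1, ref_idx=2),   # e.g., above-right neighbor
]
encoder_list = build_merge_list(neighbors)
signaled_index = 1  # index into the list, written to the bitstream

# Decoder side: rebuild the same list the same way and pick by index.
decoder_list = build_merge_list(neighbors)
inherited = decoder_list[signaled_index]
print(inherited)  # the current PU adopts this motion information wholesale
```

Only `signaled_index` crosses the bitstream in this sketch, which is the source of the efficiency gain the text describes.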
当实施AMVP时,视频编码器20以所界定的方式建构候选运动向量预测符(MVP)的列表,选择所述候选MVP中的一者,且在位流中向视频解码器30发信号通知识别所述选定MVP的候选者列表索引。类似于合并模式,在实施AMVP时,视频解码器30以所界定的方式重构候选MVP的列表,且基于候选者列表索引而选择MVP中的一者。When implementing AMVP, video encoder 20 constructs a list of candidate motion vector predictors (MVPs) in a defined manner, selects one of the candidate MVPs, and signals a candidate list index identifying the selected MVP in the bitstream to video decoder 30. Similar to merge mode, when implementing AMVP, video decoder 30 reconstructs a list of candidate MVPs in a defined manner and selects one of the MVPs based on the candidate list index.
然而,与合并/跳过模式相反,当实施AMVP时,视频编码器20还发信号通知参考图片索引和预测方向,因此指定由候选者列表索引指定的MVP指向的参考图片。此外,视频编码器20确定当前块的运动向量差(MVD),其中MVD为MVP与原本将用于当前块的实际运动向量之间的差。对于AMVP,除参考图片索引、参考图片方向和候选者列表索引之外,视频编码器20还在位流中发信号通知当前块的MVD。归因于给定块的参考图片索引和预测向量差的信令,AMVP可不如合并/跳过模式有效,但可提供经译码视频数据的提高的保真度。However, in contrast to merge/skip mode, when implementing AMVP, video encoder 20 also signals a reference picture index and prediction direction, thereby specifying the reference picture to which the MVP, specified by the candidate list index, points. Furthermore, video encoder 20 determines a motion vector difference (MVD) for the current block, where the MVD is the difference between the MVP and the actual motion vector that would otherwise have been used for the current block. For AMVP, in addition to the reference picture index, reference picture direction, and candidate list index, video encoder 20 also signals the MVD for the current block in the bitstream. Due to the signaling of the reference picture index and motion vector difference for a given block, AMVP may not be as efficient as merge/skip mode, but may provide improved fidelity of the coded video data.
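The MVD relationship described above is simple arithmetic and can be shown directly. The function names and the two-candidate predictor list are illustrative assumptions; the sketch only demonstrates that signaling the small difference MVD = MV − MVP, plus a predictor index, reconstructs the exact motion vector at the decoder.

```python
def amvp_encode(actual_mv, mvp_list, mvp_index):
    """Encoder side: pick an MVP from the candidate list and compute the
    MVD as the difference between the actual MV and the MVP."""
    mvp = mvp_list[mvp_index]
    mvd = (actual_mv[0] - mvp[0], actual_mv[1] - mvp[1])
    return mvd  # signaled along with mvp_index, ref index, pred direction

def amvp_decode(mvd, mvp_list, mvp_index):
    """Decoder side: reconstruct the motion vector as MVP + MVD."""
    mvp = mvp_list[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mvp_list = [(8, -4), (6, 0)]   # two candidate predictors, built identically
actual_mv = (9, -3)            # motion vector found by the encoder's search
mvd = amvp_encode(actual_mv, mvp_list, mvp_index=0)
print(mvd)                                       # (1, 1): small, cheap to code
print(amvp_decode(mvd, mvp_list, mvp_index=0))   # (9, -3): exact round trip
```

The small MVD costs fewer bits than the full motion vector, while still allowing the exact vector to be recovered, which is the fidelity/efficiency trade-off the text notes.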
图5展示当前视频块47、五个空间相邻块(41、42、43、44和45)和来自另一图片但在与当前图片相同的视图中的时间参考块46的实例。时间参考块46可(例如)为在不同时间实例的图片中但在与当前视频块47相同的视图中的位于同一地点的块。在一些实例中,当前视频块47和参考视频块41到46可如当前开发中的HEVC标准中通常所界定。参考视频块41到46根据HEVC标准被标记为A0、A1、B0、B1、B2及T。视频编码器20和视频解码器30可根据运动信息预测模式(例如,合并/跳过模式或AMVP模式)基于参考视频块41到46的运动信息而预测当前视频块47的包含TMV的运动信息。如下文更详细地描述,视频块的TMV可与DMV一起使用以实现根据本发明的技术的先进残差预测。FIG5 shows an example of a current video block 47, five spatially neighboring blocks (41, 42, 43, 44, and 45), and a temporal reference block 46 from another picture but in the same view as the current picture. Temporal reference block 46 may, for example, be a co-located block in a picture at a different temporal instance but in the same view as current video block 47. In some examples, current video block 47 and reference video blocks 41-46 may be defined as generally in the HEVC standard currently under development. Reference video blocks 41-46 are labeled A0, A1, B0, B1, B2, and T according to the HEVC standard. Video encoder 20 and video decoder 30 may predict motion information, including TMVs, for current video block 47 based on the motion information of reference video blocks 41-46 according to a motion information prediction mode (e.g., merge/skip mode or AMVP mode). As described in more detail below, the TMVs of a video block may be used together with the DMVs to implement advanced residual prediction according to the techniques of this disclosure.
如图5中所说明,视频块42、44、43、41和45可分别相对于当前视频块47在左边、上方、右上方、左下方和左上方。然而,相邻块41到45相对于图5中说明的当前视频块47的数目和位置仅是实例。不同数目的相邻块和/或不同位置处的块的运动信息可考虑包含在当前视频块47的运动信息预测候选者列表中。5 , video blocks 42, 44, 43, 41, and 45 may be respectively to the left, above, above right, below left, and above left relative to current video block 47. However, the number and positions of neighboring blocks 41-45 relative to current video block 47 illustrated in FIG5 are merely examples. A different number of neighboring blocks and/or motion information of blocks at different positions may be considered for inclusion in the motion information prediction candidate list for current video block 47.
空间相邻块42、44、43、41和45中的每一者与当前视频块47的空间关系可描述如下。明度位置(xP,yP)用以指定相对于当前图片的左上样本的当前块的左上明度样本;变量nPSW和nPSH指代针对明度的当前块的宽度和高度。空间上相邻块42的左上明度样本为xP-1,yP+nPSH-1。空间上相邻块44的左上明度样本为xP+nPSW-1,yP-1。空间上相邻块43的左上明度样本为xP+nPSW,yP-1。空间上相邻块41的左上明度样本为xP-1,yP+nPSH。空间上相邻块45的左上明度样本为xP-1,yP-1。尽管相对于明度位置描述,当前和参考块可包含色度分量。The spatial relationship of each of spatially neighboring blocks 42, 44, 43, 41, and 45 to current video block 47 can be described as follows. Luma position (xP, yP) specifies the top-left luma sample of the current block relative to the top-left sample of the current picture; the variables nPSW and nPSH refer to the width and height of the current block for luma. The top-left luma sample of spatially neighboring block 42 is xP-1, yP+nPSH-1. The top-left luma sample of spatially neighboring block 44 is xP+nPSW-1, yP-1. The top-left luma sample of spatially neighboring block 43 is xP+nPSW, yP-1. The top-left luma sample of spatially neighboring block 41 is xP-1, yP+nPSH. The top-left luma sample of spatially neighboring block 45 is xP-1, yP-1. Although described relative to luma position, the current and reference blocks may include chroma components.
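The five neighbor positions just enumerated are pure functions of (xP, yP, nPSW, nPSH) and can be computed mechanically. The dictionary keys below (pairing the block numbers of FIG5 with the conventional HEVC labels) are an illustrative layout, not an API; the coordinate formulas are exactly those given in the text.

```python
def spatial_neighbor_positions(xP, yP, nPSW, nPSH):
    """Top-left luma sample of each spatial neighbor of the current
    block, per the formulas in the text (block numbers follow FIG5)."""
    return {
        "left (42 / A1)":        (xP - 1,        yP + nPSH - 1),
        "above (44 / B1)":       (xP + nPSW - 1, yP - 1),
        "above-right (43 / B0)": (xP + nPSW,     yP - 1),
        "below-left (41 / A0)":  (xP - 1,        yP + nPSH),
        "above-left (45 / B2)":  (xP - 1,        yP - 1),
    }

# Example: a 16x16 block whose top-left luma sample is at (64, 32).
pos = spatial_neighbor_positions(64, 32, 16, 16)
print(pos["left (42 / A1)"])        # (63, 47)
print(pos["below-left (41 / A0)"])  # (63, 48)
```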
空间相邻块41到45中的每一者可提供用于预测当前视频块47的运动信息(例如TMV)的空间运动信息候选者。例如视频编码器20(图1)和/或视频解码器30(图1)等视频译码器可以预定次序(例如扫描次序)考虑空间上相邻参考块的运动信息。举例来说,在3D-HEVC的情况下,视频解码器可考虑参考块的运动信息以以下次序包含在合并模式的合并候选者列表中:42、44、43、41和45。在所说明的实例中,空间相邻块41到45在当前视频块47左边和/或上方。此布置为典型的,因为大多数视频译码器以光栅扫描次序从图片的左上方对视频块译码。因此,在此些实例中,空间相邻块41到45将通常在当前视频块47之前经译码。然而,在其它实例中,例如当视频译码器以不同次序对视频块译码时,空间相邻块41到45可位于当前视频块47的右边和/或下方。Each of spatially neighboring blocks 41-45 may provide a spatial motion information candidate for predicting the motion information (e.g., a TMV) of current video block 47. A video coder, such as video encoder 20 (FIG. 1) and/or video decoder 30 (FIG. 1), may consider the motion information of the spatially neighboring reference blocks in a predetermined order (e.g., a scan order). For example, in the case of 3D-HEVC, the video decoder may consider the motion information of the reference blocks for inclusion in a merge candidate list for merge mode in the following order: 42, 44, 43, 41, and 45. In the illustrated example, spatially neighboring blocks 41-45 are to the left of and/or above current video block 47. This arrangement is typical because most video coders code video blocks in raster scan order, starting from the top left of a picture. Therefore, in such examples, spatially neighboring blocks 41-45 will typically be coded before current video block 47. However, in other examples, such as when a video coder codes video blocks in a different order, spatially neighboring blocks 41-45 may be located to the right of and/or below current video block 47.
时间参考块46位于在当前视频块47的当前图片之前(但不必在译码次序中紧邻在其之前)经译码的时间参考图片内。另外,块46的参考图片并不一定按显示次序在当前视频块47的图片之前。参考视频块46可通常相对于当前图片中当前视频块47的位置在参考图片中位于同一地点。在一些实例中,参考视频块46位于当前图片中当前视频块47的位置右边和下方,或覆盖当前图片中当前视频块47的中心位置。Temporal reference block 46 is located in a temporal reference picture that is coded before (but not necessarily immediately before in coding order) the current picture of current video block 47. Additionally, the reference picture of block 46 does not necessarily precede the picture of current video block 47 in display order. Reference video block 46 may generally be co-located in the reference picture relative to the position of current video block 47 in the current picture. In some examples, reference video block 46 is located to the right and below the position of current video block 47 in the current picture, or overlaps the center position of current video block 47 in the current picture.
图6为说明例如根据合并/跳过模式或AMVP模式导出经视图间预测的运动向量候选者(IPMVC)和视图间视差运动向量候选者(IDMVC)用于预测当前视频块50的运动信息的实例的概念图。当视图间预测经启用时,视频编码器20和/或视频解码器30可将新的运动向量候选者IPMVC或IDMVC添加到当前视频块50的运动信息候选者列表。IPMVC可预测当前视频块50的TMV,根据本发明的技术,视频编码器20和/或视频解码器30可将其用于当前视频块50或另一视频块的ARP,如下文更详细描述。IDMVC可预测当前视频块50的DMV,视频编码器20和/或视频解码器30可将其用于当前视频块50的ARP。6 is a conceptual diagram illustrating an example of deriving inter-view predicted motion vector candidates (IPMVC) and inter-view disparity motion vector candidates (IDMVC) for predicting motion information of a current video block 50, e.g., according to merge/skip mode or AMVP mode. When inter-view prediction is enabled, video encoder 20 and/or video decoder 30 may add a new motion vector candidate, IPMVC or IDMVC, to the motion information candidate list for the current video block 50. IPMVC may predict the TMV for the current video block 50, which video encoder 20 and/or video decoder 30 may use for the ARP of the current video block 50 or another video block, as described in more detail below, according to the techniques of this disclosure. IDMVC may predict the DMV for the current video block 50, which video encoder 20 and/or video decoder 30 may use for the ARP of the current video block 50.
在图6的实例中,当前块50处于当前视图Vm中。视频编码器20和/或视频解码器30可使用视差向量(DV)51将对应或参考块52定位在参考视图V0中。视频译码器可基于相机参数或根据本文中所描述的技术中的任一者确定DV 51。举例来说,视频译码器可基于相邻块的DMV或DV例如使用基于相邻块的视差向量导出(NBDV)而确定当前视频块50的DV 51。In the example of FIG6, current block 50 is in current view Vm. Video encoder 20 and/or video decoder 30 may use disparity vector (DV) 51 to locate corresponding or reference block 52 in reference view V0. The video coder may determine DV 51 based on camera parameters or according to any of the techniques described herein. For example, the video coder may determine DV 51 for current video block 50 based on the DMVs or DVs of neighboring blocks, e.g., using neighboring block-based disparity vector (NBDV) derivation.
如果参考块52未经帧内译码且未经视图间预测,且其参考图片(例如参考图片58或参考图片60)具有等于当前视频块50的相同参考图片列表中的一个条目的图片次序计数(POC)值的POC值,那么视频编码器20和/或视频解码器30可在将基于POC的参考索引转换为用于当前视频块50的IPMVC之后导出其运动信息(预测方向、参考图片和运动向量)。If reference block 52 is not intra-coded and not inter-view predicted, and its reference picture (e.g., reference picture 58 or reference picture 60) has a picture order count (POC) value equal to the POC value of an entry in the same reference picture list of the current video block 50, then video encoder 20 and/or video decoder 30 may derive its motion information (prediction direction, reference picture, and motion vector) after converting the POC-based reference index to the IPMVC for the current video block 50.
在图6的实例中,参考视频块52与第一参考图片列表(RefPicList0)中指定的指向参考视图V0中的第一参考图片58的TMV 54和第二参考图片列表(RefPicList1)中指定的指向参考视图V0中的第二图片60的TMV 56相关联。当前视频块50继承TMV 54和56由图6中的虚线箭头说明。基于参考视频块52的运动信息,视频译码器将当前视频块50的IPMVC导出为第一参考图片列表(RefPicList0)中指定的指向当前视图Vm中的第一参考图片66的TMV 62(例如具有第一参考图片列表中的与参考图片58相同的POC)和第二参考图片列表(RefPicList1)中指定的指向当前视图Vm中的第二图片68的TMV64(例如具有与参考图片60相同的POC)中的至少一者。In the example of FIG6 , reference video block 52 is associated with a TMV 54 specified in a first reference picture list (RefPicList0) pointing to a first reference picture 58 in reference view V0 and a TMV 56 specified in a second reference picture list (RefPicList1) pointing to a second picture 60 in reference view V0. That current video block 50 inherits TMVs 54 and 56 is illustrated by the dashed arrows in FIG6 . Based on the motion information of reference video block 52, the video coder derives the IPMVC of current video block 50 as at least one of TMV 62 specified in the first reference picture list (RefPicList0) pointing to a first reference picture 66 in current view Vm (e.g., having the same POC as reference picture 58 in the first reference picture list) and TMV 64 specified in the second reference picture list (RefPicList1) pointing to a second picture 68 in current view Vm (e.g., having the same POC as reference picture 60).
视频编码器20和/或视频解码器30可将TMV 62和/或TMV 64用于当前视频块50的ARP。视频编码器20和/或视频解码器30还可将DV 51转换为当前视频块50的IDMVC,且将IDMVC添加到当前视频块50的运动信息候选者列表在与IPMVC不同的位置中。IPMVC或IDMVC中的每一者可在此上下文中被称为‘视图间候选者’。Video encoder 20 and/or video decoder 30 may use TMV 62 and/or TMV 64 for the ARP of current video block 50. Video encoder 20 and/or video decoder 30 may also convert DV 51 into an IDMVC for current video block 50, and add the IDMVC to the motion information candidate list of current video block 50 in a different location from the IPMVC. Each of IPMVC or IDMVC may be referred to as an 'inter-view candidate' in this context.
在合并/跳过模式中,视频译码器将所有空间和时间合并候选者之前的IPMVC(如果可用)插入到合并候选者列表。在合并/跳过模式中,视频译码器插入从A0导出的空间合并候选者之前的IDMVC(图5的块41)。DV 51到IDMVC的转换可视为DV 51到当前视频块50的DMV的转换。视频编码器20和/或视频解码器30可将DMV用于当前视频块50的ARP。In merge/skip mode, the video coder inserts the IPMVC, if available, into the merge candidate list before all spatial and temporal merge candidates. In merge/skip mode, the video coder inserts the IDMVC before the spatial merge candidate derived from A0 (block 41 of FIG. 5). The conversion of DV 51 to the IDMVC can be viewed as a conversion of DV 51 to a DMV for current video block 50. Video encoder 20 and/or video decoder 30 can use the DMV for the ARP of current video block 50.
在一些情形中,视频译码器可导出当前视频块的DV。举例来说,如上文参看图6所描述,视频编码器20和/或视频解码器30可导出用于当前视频块50的DV 51。在一些实例中,视频译码器可使用NBDV导出来导出用于当前视频块的DV。NBDV导出被用作3D-HEVC中的视差向量导出方法。In some cases, the video coder may derive the DV for the current video block. For example, as described above with reference to FIG6 , video encoder 20 and/or video decoder 30 may derive DV 51 for current video block 50. In some examples, the video coder may use NBDV derivation to derive the DV for the current video block. NBDV derivation is used as a disparity vector derivation method in 3D-HEVC.
针对3D-HEVC的提议针对所有视图使用纹理优先译码次序。换句话说,对于位流中所述多个视图中的每一者,纹理分量在视图的任何深度分量之前经译码,例如经编码或经解码。在一些情况下,例如对于视图间预测,需要DV来对特定存取单元中的视图的纹理分量中的视频块译码。然而,在纹理优先译码中,当前视频块的对应深度分量并不可用于确定当前视频块的DV。NBDV导出可由视频译码器采用,且经提议用于3D-HEVC,以在此些情形中导出用于当前视频块的DV。在当前3D-HEVC设计中,从NBDV导出而导出的DV可通过从由来自NBDV过程的DV指向的参考视图的深度图检索深度数据而进一步改善。The proposal for 3D-HEVC uses a texture-first coding order for all views. In other words, for each of the multiple views in the bitstream, the texture component is coded, e.g., encoded or decoded, before any depth component of the view. In some cases, such as for inter-view prediction, a DV is required to code a video block in the texture component of a view in a particular access unit. However, in texture-first coding, the corresponding depth component of the current video block is not available for determining the DV for the current video block. NBDV derivation can be employed by the video coder and is proposed for 3D-HEVC to derive the DV for the current video block in such cases. In the current 3D-HEVC design, the DV derived from the NBDV derivation can be further improved by retrieving depth data from the depth map of the reference view pointed to by the DV from the NBDV process.
DV用于两个视图之间的移位的估计量。因为相邻块共享视频译码中的几乎相同运动/视差信息,所以当前视频块可使用相邻块中的运动向量信息作为其运动/视差信息的良好预测符。遵循此想法,NBDV导出使用相邻视差信息用于估计不同视图中的DV。A DV is used as an estimate of the displacement between two views. Because neighboring blocks share almost the same motion/disparity information in video coding, the current video block can use the motion vector information of its neighboring blocks as a good predictor of its own motion/disparity information. Following this idea, NBDV derivation uses the neighboring disparity information to estimate the DV in a different view.
根据NBDV导出,视频译码器识别若干空间和时间相邻块。利用两组相邻块。一组来自空间相邻块且另一组来自时间相邻块。视频译码器随后以由当前块与候选(相邻)块之间的相关的优先级所确定的预定义次序检查空间和时间相邻块中的每一者。当视频译码器识别候选者的运动信息中的DMV(即,从相邻候选块指向视图间参考图片(相同存取单元中,但不同视图中)的运动向量)时,视频译码器将DMV转换为DV,且传回相关联的视图次序索引。举例来说,视频译码器可将当前块的DV的水平分量设定为等于DMV的水平分量,且可将DV的垂直分量设定为0。Based on the NBDV derivation, the video coder identifies several spatial and temporal neighboring blocks. Two sets of neighboring blocks are utilized. One set comes from spatial neighboring blocks and the other comes from temporal neighboring blocks. The video coder then checks each of the spatial and temporal neighboring blocks in a predefined order determined by the relative priority between the current block and the candidate (neighboring) block. When the video coder identifies a DMV in the candidate's motion information (i.e., a motion vector pointing from a neighboring candidate block to an inter-view reference picture (in the same access unit but in a different view)), the video coder converts the DMV to a DV and returns the associated view order index. For example, the video coder may set the horizontal component of the DV of the current block equal to the horizontal component of the DMV and may set the vertical component of the DV to 0.
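The DMV-to-DV conversion in the example above is a one-line rule and can be written down directly. The function name is illustrative; the behavior is exactly what the text states: the DV's horizontal component is set equal to the DMV's horizontal component, and the vertical component is set to 0.

```python
def dmv_to_dv(dmv):
    """Convert a disparity motion vector (dx, dy) to a disparity vector:
    keep the horizontal component, zero the vertical component."""
    return (dmv[0], 0)

# A DMV found in a neighboring block, with a nonzero vertical component.
print(dmv_to_dv((-12, 3)))  # (-12, 0)
```

In the full process the coder would also return the view order index associated with the DMV's inter-view reference picture, which this sketch omits.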
3D-HEVC起初采纳张(Zhang)等人“3D-CE5.h:视差向量产生结果(3D-CE5.h:Disparity vector generation results)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的视频译码扩展开发联合合作小组第1次会议:瑞典斯德哥尔摩,2012年7月16日到20日,文献JCT3V-A0097(MPEG编号m26052,下文中称为“JCT3V-A0097”))中所提议的NBDV导出技术。JCT3V-A0097可从以下链接下载:http://phenix.int-evry.fr/jct2/doc_end_user/current_document.php?id=89。JCT3V-A0097的全部内容以引用的方式并入本文中。3D-HEVC initially adopted the NBDV derivation technique proposed in Zhang et al., "3D-CE5.h: Disparity vector generation results" (Joint Collaborative Team on Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 1st Meeting: Stockholm, Sweden, July 16-20, 2012, document JCT3V-A0097 (MPEG No. m26052, hereinafter "JCT3V-A0097"). JCT3V-A0097 can be downloaded from the following link: http://phenix.int-evry.fr/jct2/doc_end_user/current_document.php?id=89. The entire contents of JCT3V-A0097 are incorporated herein by reference.
在3D-HEVC的一些提议中,当视频译码器执行NBDV导出过程时,视频译码器按次序检查时间相邻块中的视差运动向量、空间相邻块中的视差运动向量且随后检查隐式视差向量(IDV)。IDV可为使用视图间预测译码的空间上或时间上相邻PU的视差向量。IDV也可被称作经导出视差向量。IDV可在PU采用视图间预测时产生,即,用于AMVP或合并模式的候选者借助于视差向量从另一视图中的参考块导出。此视差向量称为IDV。IDV可出于DV导出的目的存储到PU。举例来说,尽管利用运动预测译码块,但出于对以下视频块译码的目的而并不丢弃块的所导出DV。因此,当视频译码器识别DMV或IDV时,视频译码器可传回所识别的DMV或IDV。In some proposals for 3D-HEVC, when a video coder performs the NBDV derivation process, the video coder sequentially checks the disparity motion vectors in temporally neighboring blocks, the disparity motion vectors in spatially neighboring blocks, and then checks the implicit disparity vector (IDV). The IDV may be the disparity vector of a spatially or temporally neighboring PU coded using inter-view prediction. The IDV may also be referred to as a derived disparity vector. The IDV may be generated when the PU employs inter-view prediction, i.e., a candidate for AMVP or merge mode is derived from a reference block in another view using the disparity vector. This disparity vector is referred to as the IDV. The IDV may be stored in the PU for the purpose of DV derivation. For example, although a block is coded using motion prediction, the derived DV of the block is not discarded for the purpose of coding the following video block. Therefore, when the video coder identifies a DMV or IDV, the video coder may return the identified DMV or IDV.
在桑(Sung)等人的“3D-CE5.h:基于HEVC的3D视频译码的视差向量导出的简化(3D-CE5.h:Simplification of disparity vector derivation for HEVC-based 3Dvideo coding)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的视频译码扩展开发联合合作小组第1次会议:瑞典斯德哥尔摩,2012年7月16-20日,文献JCT3V-A0126(MPEG编号m26079,下文为“JCT3V-A0126”))中描述的简化NBDV导出过程包含隐式视差向量(IDV)。JCT3V-A0126可从以下链接下载:http://phenix.int-evry.fr/jct2/doc_end_user/current_document.php?id=142。The simplified NBDV derivation process described in Sung et al., “3D-CE5.h: Simplification of disparity vector derivation for HEVC-based 3D video coding” (ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 Joint Collaboration Group on Video Coding Extension Development, 1st Meeting: Stockholm, Sweden, July 16–20, 2012, document JCT3V-A0126 (MPEG No. m26079, hereinafter “JCT3V-A0126”)) includes an implicit disparity vector (IDV). JCT3V-A0126 can be downloaded from the following link: http://phenix.int-evry.fr/jct2/doc_end_user/current_document.php?id=142.
在康(Kang)等人的“3D-CE5.h:用于视差向量导出的改进(3D-CE5.h:Improvementfor disparity vector derivation)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的视频译码扩展开发联合合作小组第2次会议:中国上海,2012年10月13-19日,文献JCT3V-B0047(MPEG编号m26736,下文为“JCT3V-B0047”))中描述针对3D-HEVC的NBDV导出过程的进一步开发。JCT3V-B0047可从以下链接下载:http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=236。Further development of the NBDV derivation process for 3D-HEVC is described in Kang et al., “3D-CE5.h: Improvement for disparity vector derivation,” Joint Collaboration Group on Video Coding Extensions Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 2nd Meeting: Shanghai, China, October 13–19, 2012, document JCT3V-B0047 (MPEG No. m26736, hereinafter “JCT3V-B0047”). JCT3V-B0047 can be downloaded from the following link: http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=236.
在JCT3V-B0047中,通过移除存储在经解码图片缓冲器中的IDV而进一步简化用于3D-HEVC的NBDV导出过程。还通过随机存取点(RAP)图片选择提高译码增益。视频译码器可将传回的视差运动向量或IDV转换为视差向量且可使用所述视差向量用于视图间运动预测和视图间残差预测。随机存取指代从不是位流中的第一经译码图片的经译码图片开始的位流的解码。随机存取图片或随机存取点以规则的间隔插入到位流中可实现随机存取。随机存取图片的实例类型包含即时解码器刷新(IDR)图片、清洁随机存取(CRA)图片和断链存取(BLA)图片。因此,IDR图片、CRA图片和BLA图片统称为RAP图片。在一些实例中,RAP图片可使NAL单元类型等于BLA_W_LP、BLA_W_RADL、BLA_N_LP、IDR_W_RADL、IDR_N_LP、RSV_IRAP_VCL22、RSV_IRAP_VCL23或CRA_NUT。In JCT3V-B0047, the NBDV derivation process for 3D-HEVC is further simplified by removing the IDV stored in the decoded picture buffer. Coding gain is also improved through random access point (RAP) picture selection. The video decoder can convert the returned disparity motion vector or IDV into a disparity vector and use the disparity vector for inter-view motion prediction and inter-view residual prediction. Random access refers to the decoding of a bitstream starting from a coded picture that is not the first coded picture in the bitstream. Random access can be achieved by inserting random access pictures or random access points into the bitstream at regular intervals. Example types of random access pictures include instant decoder refresh (IDR) pictures, clean random access (CRA) pictures, and broken link access (BLA) pictures. Therefore, IDR pictures, CRA pictures, and BLA pictures are collectively referred to as RAP pictures. In some examples, a RAP picture may have the NAL unit type equal to BLA_W_LP, BLA_W_RADL, BLA_N_LP, IDR_W_RADL, IDR_N_LP, RSV_IRAP_VCL22, RSV_IRAP_VCL23, or CRA_NUT.
在康(Kang)等人的“CE2.h:3D-HEVC中基于CU的视差向量导出(CE2.h:CU-baseddisparity vector derivation in 3D-HEVC)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC29/WG11的视频译码扩展开发联合合作小组第4次会议:韩国仁川,2013年4月20日到26日,文献JCT3V-D0181(MPEG编号m29012,下文为“JCT3V-D0181”))中提议用于针对3D-HEVC的基于CU的DV导出的技术。JCT3V-D0181可从以下链接下载:http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=866。Kang et al., “CE2.h: CU-based disparity vector derivation in 3D-HEVC,” Joint Collaboration Group on Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29/WG11, 4th Meeting, Incheon, South Korea, April 20–26, 2013, document JCT3V-D0181 (MPEG No. m29012, hereinafter “JCT3V-D0181”), proposes techniques for CU-based DV derivation for 3D-HEVC. JCT3V-D0181 can be downloaded from the following link: http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=866.
当视频译码器识别DMV或IDV时,视频译码器可终止检查过程。因此,一旦视频译码器找到当前块的DV,视频译码器便可终止NBDV导出过程。当视频译码器不能够通过执行NBDV导出过程确定当前块的DV时(即,当不存在NBDV导出过程期间发现的DMV或IDV时),NBDV被标记为不可用。换句话说,可认为NBDV导出过程传回不可用视差向量。When the video coder identifies a DMV or IDV, the video coder can terminate the checking process. Therefore, once the video coder finds the DV for the current block, the video coder can terminate the NBDV derivation process. When the video coder is unable to determine the DV for the current block by performing the NBDV derivation process (i.e., when there are no DMVs or IDVs found during the NBDV derivation process), the NBDV is marked as unavailable. In other words, the NBDV derivation process can be considered to return an unavailable disparity vector.
如果视频译码器不能够通过执行NBDV导出过程导出当前块的DV(即,如果未发现视差向量),那么视频译码器可使用0DV为当前PU的DV。0DV为具有等于0的水平分量和垂直分量两者的DV。因此,即使当NBDV导出过程传回不可供使用的结果时,视频译码器的需要DV的其它译码过程也可将0视差向量用于当前块。在一些实例中,如果视频译码器不能够通过执行NBDV导出过程而导出当前块的DV,那么视频译码器可停用当前块的视图间残差预测。然而,不管视频译码器是否能够通过执行NBDV导出过程而导出当前块的DV,视频译码器都可针对当前块使用视图间预测。也就是说,如果在检查所有预定义相邻块之后未发现DV,那么0视差向量可用于视图间预测,同时可针对对应CU停用视图间残差预测。If the video coder is unable to derive the DV of the current block by performing the NBDV derivation process (i.e., if no disparity vector is found), the video coder may use 0DV as the DV of the current PU. 0DV is a DV having both horizontal and vertical components equal to 0. Therefore, even when the NBDV derivation process returns a result that is not available for use, other decoding processes of the video coder that require DV may use a 0 disparity vector for the current block. In some examples, if the video coder is unable to derive the DV of the current block by performing the NBDV derivation process, the video coder may disable inter-view residual prediction for the current block. However, regardless of whether the video coder is able to derive the DV of the current block by performing the NBDV derivation process, the video coder may use inter-view prediction for the current block. That is, if no DV is found after checking all predefined neighboring blocks, a 0 disparity vector may be used for inter-view prediction, while inter-view residual prediction may be disabled for the corresponding CU.
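The control flow of the preceding paragraphs (scan the neighbors for a DMV, then for an IDV, terminate on the first hit, otherwise fall back to the zero DV and disable inter-view residual prediction) can be sketched as below. This is a simplified model under stated assumptions: neighbors are hypothetical dicts with optional `dmv`/`idv` entries already arranged in the predefined checking order, and the two-pass structure stands in for the temporal/spatial ordering detailed elsewhere in the text.

```python
def nbdv_derive(neighbors):
    """Minimal NBDV sketch: return (dv, residual_pred_enabled).
    First pass looks for a disparity motion vector (DMV), second pass
    for an implicit disparity vector (IDV); either terminates the
    search. With no hit, return the zero DV and mark inter-view
    residual prediction as disabled for the block."""
    for n in neighbors:                  # first: disparity motion vectors
        if n.get("dmv") is not None:
            return n["dmv"], True        # DV found, search terminates
    for n in neighbors:                  # then: implicit disparity vectors
        if n.get("idv") is not None:
            return n["idv"], True
    return (0, 0), False                 # zero DV; residual prediction off

# A neighbor with only an IDV is found on the second pass.
print(nbdv_derive([{"dmv": None, "idv": (5, 0)}, {"dmv": None}]))  # ((5, 0), True)
# No DMV or IDV anywhere: zero DV, inter-view residual prediction disabled.
print(nbdv_derive([{}, {}]))  # ((0, 0), False)
```

Note that even in the fallback case the zero DV remains usable for inter-view prediction, matching the text's distinction between the two tools.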
图7为说明相对于当前视频块90的可使用NBDV导出从其导出当前视频块的DV的实例空间相邻块的概念图。图7中说明的五个空间相邻块是相对于当前视频块的左下块96、左边块95、右上块92、上方块93和左上块94。空间相邻块可为覆盖当前视频块的CU的左下、左边、右上、上方和左上块。应注意,NBDV的这些空间相邻块可与由视频译码器例如根据HEVC中的合并/AMVP模式用于当前视频块的运动信息预测的空间相邻块相同。在此些情况下,可不需要由视频译码器针对NBDV的额外存储器存取,因为已经考虑将空间相邻块的运动信息用于当前视频块的运动信息预测。FIG7 is a conceptual diagram illustrating example spatial neighboring blocks relative to a current video block 90 from which a DV can be derived using NBDV. The five spatial neighboring blocks illustrated in FIG7 are a lower left block 96, a left block 95, an upper right block 92, an upper block 93, and an upper left block 94 relative to the current video block. The spatial neighboring blocks may be the lower left, left, upper right, upper, and upper left blocks of the CU covering the current video block. It should be noted that these spatial neighboring blocks of NBDV may be the same spatial neighboring blocks used by the video coder for motion information prediction of the current video block, for example, according to the merge/AMVP mode in HEVC. In such cases, additional memory access by the video coder for the NBDV may not be required because the motion information of the spatial neighboring blocks has already been taken into account for motion information prediction of the current video block.
为了检查时间相邻块,视频译码器建构候选图片列表。在一些实例中,视频译码器可处理来自当前视图的多达两个参考图片,即,与当前视频块相同的视图,作为候选图片。视频译码器可首先将位于同一地点的参考图片插入到候选图片列表中,接着按参考图片索引的升序插入候选图片的其余部分。当具有两个参考图片列表中相同参考索引的参考图片可用时,视频译码器可将与同一地点的图片相同的参考图片列表中的一者插入在来自另一参考图片列表的另一参考图片之前。在一些实例中,视频译码器可识别三个候选区,以用于从候选图片列表中的候选图片中的每一者导出时间相邻块。所述三个候选区可如下界定:To check temporally neighboring blocks, the video coder constructs a candidate picture list. In some examples, the video coder may treat up to two reference pictures from the current view, i.e., the same view as the current video block, as candidate pictures. The video coder may first insert the co-located reference picture into the candidate picture list, and then insert the rest of the candidate pictures in ascending order of reference picture index. When reference pictures with the same reference index in the two reference picture lists are both available, the video coder may insert the one in the same reference picture list as the co-located picture before the one from the other reference picture list. In some examples, the video coder may identify three candidate regions for deriving the temporally neighboring blocks from each of the candidate pictures in the candidate picture list. The three candidate regions may be defined as follows:
●CPU:当前PU或当前CU的位于同一地点的区。●CPU: The area co-located with the current PU or current CU.
●CLCU:覆盖当前块的所述位于同一地点的区的最大译码单元(LCU)。● CLCU: Largest coding unit (LCU) covering the co-located region of the current block.
●BR:CPU的右下4×4块。BR: The lower right 4×4 block of the CPU.
如果覆盖候选区的PU指定DMV,那么视频译码器可基于PU的视差运动向量确定当前视频单元的DV。If the PU covering the candidate area specifies a DMV, the video coder may determine the DV for the current video unit based on the disparity motion vector of the PU.
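The three candidate regions listed above can be located from the current block's geometry. The sketch below is hypothetical in its details: the exact anchor of the BR 4x4 block and the LCU size are assumptions chosen for illustration (an LCU size of 64 is typical in HEVC but configurable), while the CPU is simply the co-located region and the CLCU is the LCU-aligned region covering it, as the text defines.

```python
def candidate_regions(xP, yP, nPSW, nPSH, lcu_size=64):
    """Return (x, y, width, height) for the three NBDV candidate regions
    in a temporal candidate picture: the co-located region (CPU), the
    LCU covering it (CLCU), and a 4x4 block at its bottom-right (BR).
    The BR anchor and lcu_size are illustrative assumptions."""
    cpu = (xP, yP, nPSW, nPSH)                    # co-located region
    clcu = ((xP // lcu_size) * lcu_size,          # LCU-aligned origin
            (yP // lcu_size) * lcu_size,
            lcu_size, lcu_size)
    br = (xP + nPSW, yP + nPSH, 4, 4)             # bottom-right 4x4 block
    return {"CPU": cpu, "CLCU": clcu, "BR": br}

# Example: a 16x16 block at (100, 40) with 64x64 LCUs.
regions = candidate_regions(100, 40, 16, 16)
print(regions["CLCU"])  # (64, 0, 64, 64)
print(regions["BR"])    # (116, 56, 4, 4)
```

The coder would then check whether a PU covering each region carries a DMV, as the sentence above describes.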
如上文所论述,除从空间及时间相邻块导出的DMV外,视频译码器还可检查IDV。在3D-HTM 7.0及稍后版本的所提议的NBDV导出过程中,视频译码器依次检查时间相邻块中的DMV,随后是空间相邻块中的DMV,且随后是IDV。一旦发现DMV或IDV,过程就终止。另外,NBDV导出过程中检查的空间相邻块的数目进一步减小到二。As discussed above, in addition to DMVs derived from spatial and temporal neighboring blocks, the video coder can also check IDVs. In the proposed NBDV derivation process of 3D-HTM 7.0 and later versions, the video coder sequentially checks DMVs in temporally neighboring blocks, then DMVs in spatially neighboring blocks, and then IDVs. Once a DMV or IDV is found, the process terminates. In addition, the number of spatially neighboring blocks checked during the NBDV derivation process is further reduced to two.
当视频译码器检查相邻PU(即,空间或时间相邻PU)时,视频译码器可首先检查相邻PU是否具有视差运动向量。如果相邻PU均不具有视差运动向量,那么视频译码器可确定空间相邻PU中的任一者是否具有IDV。如果空间相邻PU中的一者具有IDV且所述IDV是作为合并/跳过模式而经译码,那么视频译码器可终止检查过程且可使用所述IDV作为当前PU的最终视差向量。When the video coder checks a neighboring PU (i.e., a spatial or temporal neighboring PU), the video coder may first check whether the neighboring PU has a disparity motion vector. If none of the neighboring PUs has a disparity motion vector, the video coder may determine whether any of the spatial neighboring PUs has an IDV. If one of the spatial neighboring PUs has an IDV and the IDV is coded as a merge/skip mode, the video coder may terminate the checking process and use the IDV as the final disparity vector for the current PU.
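The checking order just described — temporal DMVs, then spatial DMVs, then spatial IDVs, terminating at the first hit, with an IDV used only when its block was coded in merge/skip mode — can be sketched as follows; the dictionary-based block representation is a hypothetical stand-in for the coder's actual neighbor data.

```python
def nbdv_derive(temporal_blocks, spatial_blocks):
    """Sketch of the 3D-HTM 7.0 NBDV checking order described above.
    Blocks are dicts with optional 'dmv', 'idv' and 'merge_skip'
    entries -- a hypothetical representation, not normative syntax."""
    for blk in temporal_blocks:                  # temporal DMV pass
        if blk.get('dmv') is not None:
            return blk['dmv']
    for blk in spatial_blocks:                   # spatial DMV pass
        if blk.get('dmv') is not None:
            return blk['dmv']
    for blk in spatial_blocks:                   # spatial IDV pass
        if blk.get('idv') is not None and blk.get('merge_skip', False):
            return blk['idv']
    return None   # no DV found; the coder falls back (e.g. zero DV)
```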
如上文所指出,视频译码器可应用NBDV导出过程以导出当前块(例如,CU、PU等)的DV。当前块的视差向量可指示参考视图中的参考图片(即,参考分量)中的位置。在一些3D-HEVC设计中,允许视频译码器存取参考视图的深度信息。在一些此些3D-HEVC设计中,当视频译码器使用NBDV导出过程导出当前块的DV时,视频译码器可应用提炼过程以进一步提炼当前块的视差向量。视频译码器可基于参考图片的深度图提炼当前块的DV。视频译码器可使用类似提炼过程来提炼DMV以用于后向视图合成预测。以此方式,深度可用于提炼DV或DMV以用于后向视图合成预测。此提炼过程可在本文中被称作NBDV提炼(“NBDV-R”)、NBDV提炼过程或深度定向的NBDV(Do-NBDV)。As noted above, a video coder may apply an NBDV derivation process to derive the DV for a current block (e.g., a CU, PU, etc.). The disparity vector for the current block may indicate a location in a reference picture (i.e., a reference component) in a reference view. In some 3D-HEVC designs, the video coder is allowed to access depth information for the reference view. In some such 3D-HEVC designs, when the video coder derives the DV for the current block using the NBDV derivation process, the video coder may apply a refinement process to further refine the disparity vector for the current block. The video coder may refine the DV for the current block based on a depth map of a reference picture. The video coder may use a similar refinement process to refine the DMV for backward view synthesis prediction. In this way, depth can be used to refine either the DV or the DMV for backward view synthesis prediction. This refinement process may be referred to herein as NBDV refinement ("NBDV-R"), an NBDV refinement process, or depth-oriented NBDV (Do-NBDV).
当NBDV导出过程传回可用的视差向量时(例如,当NBDV导出过程传回指示NBDV导出过程能够基于相邻块的视差运动向量或IDV导出当前块的视差向量的变量时),视频译码器可进一步通过检索来自参考图片的深度图的深度数据而提炼视差向量。在一些实例中,提炼过程包含以下两个步骤:When the NBDV derivation process returns a usable disparity vector (e.g., when the NBDV derivation process returns a variable indicating that the NBDV derivation process is capable of deriving the disparity vector of the current block based on the disparity motion vector or IDV of the neighboring block), the video coder may further refine the disparity vector by retrieving depth data from a depth map of the reference picture. In some examples, the refinement process includes the following two steps:
1)在例如基础视图等先前经译码参考深度视图中通过所导出的DV定位对应深度块;对应深度块的大小与当前PU的大小相同。1) Locate the corresponding depth block by the derived DV in a previously coded reference depth view, such as the base view; the size of the corresponding depth block is the same as that of the current PU.
2)从对应深度块的四个隅角像素选择一个深度值且将其转换为经提炼DV的水平分量。DV的垂直分量不变。2) Select one depth value from the four corner pixels of the corresponding depth block and convert it to the horizontal component of the refined DV. The vertical component of DV remains unchanged.
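The two refinement steps can be sketched as below. The `depth_to_disparity` mapping stands in for the camera-parameter based depth-to-disparity conversion, which the text does not reproduce, and taking the maximum of the four corner samples is an assumption — the text only says one depth value is selected from them.

```python
def refine_dv(dv, depth_block, depth_to_disparity):
    """Sketch of the two NBDV-R steps: pick one depth value from the
    four corner samples of the co-located depth block (maximum assumed
    here), convert it to the horizontal component of the refined DV,
    and keep the vertical component unchanged.  depth_block is a 2-D
    list of depth samples the size of the current PU."""
    h, w = len(depth_block), len(depth_block[0])
    corners = (depth_block[0][0], depth_block[0][w - 1],
               depth_block[h - 1][0], depth_block[h - 1][w - 1])
    return (depth_to_disparity(max(corners)), dv[1])  # vertical unchanged
```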
经提炼DV可用于当前视频块的视图间预测,而未经提炼的DV可用于当前视频块的视图间残差预测。此外,将经提炼的DV存储为一个PU的运动向量(如果使用后向视图合成预测(BVSP)模式对其进行译码),其在下文更详细地描述。在3D-HTM 7.0及稍后版本的所提议的NBDV过程中,存取基础视图的深度视图分量,而不管从NBDV过程导出的视图次序索引的值如何。The refined DV can be used for inter-view prediction of the current video block, while the unrefined DV can be used for inter-view residual prediction of the current video block. In addition, the refined DV is stored as the motion vector of one PU (if it is coded using backward view synthesis prediction (BVSP) mode), which is described in more detail below. In the proposed NBDV process of 3D-HTM 7.0 and later versions, the depth view component of the base view is accessed regardless of the value of the view order index derived from the NBDV process.
已经在安(An)等人的“3D-CE3:子PU层级视图间运动预测(3D-CE3:Sub-PU level inter-view motion prediction)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的3D视频译码扩展开发联合合作小组,第6次会议,瑞士日内瓦,2013年10月25日到11月1日(文献JCT3V-F0110),下文称为“JCT3V-F0110”)中提出用以产生新合并候选者的子PU层级视图间运动预测技术。JCT3V-F0110可从以下链接下载:http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1447。将新候选者添加到合并候选者列表。A sub-PU level inter-view motion prediction technique for generating new merge candidates was proposed in An et al., “3D-CE3: Sub-PU level inter-view motion prediction” (Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting, Geneva, Switzerland, October 25–November 1, 2013, document JCT3V-F0110, hereinafter referred to as “JCT3V-F0110”). JCT3V-F0110 can be downloaded from the following link: http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1447. The new candidate is added to the merge candidate list.
图8是说明子预测单元(PU)视图间运动预测的概念图。如图8中所展示,当前视图V1中的当前PU 98可被分裂成多个子PU(例如,四个子PU)。每一子PU的视差向量可用于定位参考视图V0中的对应参考块。视频编码器20和/或视频解码器30可经配置以复制(即,再使用)与所述参考块中的每一者相关联的运动向量以用于与当前PU 98的对应子PU一起使用。FIG8 is a conceptual diagram illustrating inter-view motion prediction for sub-prediction units (PUs). As shown in FIG8, current PU 98 in current view V1 may be split into multiple sub-PUs (e.g., four sub-PUs). The disparity vector of each sub-PU may be used to locate a corresponding reference block in reference view V0. Video encoder 20 and/or video decoder 30 may be configured to copy (i.e., reuse) the motion vector associated with each of the reference blocks for use with the corresponding sub-PU of current PU 98.
在一个实例中,使用以下技术导出称为子PU合并候选者的新候选者。首先,通过nPSW×nPSH标示当前PU的大小,通过N×N标示发信号通知的子PU大小,且通过subW×subH标示最终子PU大小。取决于PU大小及发信号通知的子PU大小,可将当前PU划分成一或多个子PU,如下:In one example, a new candidate, called a sub-PU merge candidate, is derived using the following technique. First, the size of the current PU is denoted by nPSW×nPSH, the signaled sub-PU size is denoted by N×N, and the final sub-PU size is denoted by subW×subH. Depending on the PU size and the signaled sub-PU size, the current PU can be divided into one or more sub-PUs as follows:
subW=max(N,nPSW)!=N?N:nPSW;
subH=max(N,nPSH)!=N?N:nPSH;
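A direct Python transcription of the two C-style ternary expressions above (`cond ? a : b` rendered with Python's conditional expression):

```python
def sub_pu_size(N, nPSW, nPSH):
    """subW = max(N, nPSW) != N ? N : nPSW, and likewise for subH,
    per the two expressions above."""
    subW = N if max(N, nPSW) != N else nPSW
    subH = N if max(N, nPSH) != N else nPSH
    return subW, subH
```

So a PU larger than the signaled N×N size is split into N×N sub-PUs, while a PU no larger than N×N is kept whole.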
视频编码器20和/或视频解码器30可针对每一参考图片列表将默认运动向量tmvLX设定为(0,0)且将参考索引refLX设定为-1(其中X表示参考图片列表0或参考图片列表1)。对于光栅扫描次序中的每一子PU,以下适用:Video encoder 20 and/or video decoder 30 may set the default motion vector tmvLX to (0,0) and the reference index refLX to -1 for each reference picture list (where X represents reference picture list 0 or reference picture list 1). For each sub-PU in raster scan order, the following applies:
-将从DoNBDV导出过程或NBDV过程获得的DV添加到当前子PU的中间位置以通过下式获得参考样本位置(xRefSub,yRefSub):- Add the DV obtained from the DoNBDV derivation process or the NBDV process to the middle position of the current sub-PU to obtain the reference sample position (xRefSub, yRefSub) by the following formula:
xRefSub=Clip3(0,PicWidthInSamplesL-1,xPSub+nPSWsub/2+((mvDisp[0]+2)>>2))
yRefSub=Clip3(0,PicHeightInSamplesL-1,yPSub+nPSHSub/2+((mvDisp[1]+2)>>2))
参考视图中覆盖(xRefSub,yRefSub)的块用作当前子PU的参考块。The block covering (xRefSub, yRefSub) in the reference view is used as the reference block of the current sub-PU.
-对于所述所识别的参考块:-For the identified reference block:
1)如果使用时间运动向量译码所述所识别的参考块,那么以下适用:1) If the identified reference block is coded using a temporal motion vector, then the following applies:
-相关联的运动参数可用作当前子PU的候选运动参数。- The associated motion parameters may be used as candidate motion parameters for the current sub-PU.
-将tmvLX和refLX更新为当前子PU的运动信息。-Update tmvLX and refLX to the motion information of the current sub-PU.
-如果当前子PU不是光栅扫描次序中的第一者,那么所有先前子PU继承运动信息(tmvLX及refLX)。-If the current sub-PU is not the first in raster scan order, all previous sub-PUs inherit the motion information (tmvLX and refLX).
2)否则(参考块经帧内译码),将当前子PU的运动信息设定成tmvLX和refLX。2) Otherwise (the reference block is intra-coded), the motion information of the current sub-PU is set to tmvLX and refLX.
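The per-sub-PU procedure above — locating the reference sample position with the two Clip3 formulas, then assigning motion in raster-scan order — can be sketched as follows. The dictionary-based block representation and the retroactive inheritance of the first temporal hit by earlier sub-PUs are illustrative readings of the text, not normative syntax.

```python
def clip3(lo, hi, v):
    """Clip3(lo, hi, v) as used in the formulas above."""
    return lo if v < lo else hi if v > hi else v

def ref_sample_pos(xPSub, yPSub, nPSWsub, nPSHSub, mvDisp, pic_w, pic_h):
    """Reference sample position for one sub-PU: the DV (quarter-pel
    units, hence the +2 rounding and >>2) is added to the sub-PU centre
    and clipped to the picture, per the two Clip3 formulas above."""
    xRefSub = clip3(0, pic_w - 1, xPSub + nPSWsub // 2 + ((mvDisp[0] + 2) >> 2))
    yRefSub = clip3(0, pic_h - 1, yPSub + nPSHSub // 2 + ((mvDisp[1] + 2) >> 2))
    return xRefSub, yRefSub

def sub_pu_merge_motion(ref_blocks):
    """Motion assignment in raster-scan order.  Each entry of ref_blocks
    describes the reference block identified for one sub-PU (hypothetical
    representation): 'temporal_mv'/'ref_idx' when it is coded with a
    temporal motion vector, otherwise it is treated as intra coded."""
    tmv, ref_idx = (0, 0), -1            # defaults per the text
    out = [None] * len(ref_blocks)
    first_hit = None
    for i, blk in enumerate(ref_blocks):
        if blk.get('temporal_mv') is not None:
            tmv, ref_idx = blk['temporal_mv'], blk['ref_idx']
            out[i] = (tmv, ref_idx)
            if first_hit is None:        # earlier sub-PUs inherit the
                first_hit = i            # first available motion
                for j in range(i):
                    out[j] = (tmv, ref_idx)
        else:                            # intra coded: use current default
            out[i] = (tmv, ref_idx)
    return out
```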
不同子PU块大小可以用于上述的用于子PU层级视图间运动预测的技术中,包含4×4、8×8及16×16。可在例如视频参数集(VPS)等参数集中发信号通知子PU块的大小。Different sub-PU block sizes can be used in the above-mentioned techniques for sub-PU level inter-view motion prediction, including 4×4, 8×8, and 16×16. The size of the sub-PU block can be signaled in a parameter set such as a video parameter set (VPS).
高级残差预测(ARP)是设法利用视图之间的残差相关度以便提供额外译码效率的译码工具。在ARP中,通过对准当前视图处的运动信息以用于参考视图中的运动补偿而产生残差预测符。另外,引入加权因子以补偿视图之间的质量差异。在针对一个块启用ARP时,发信号通知当前残差与残差预测符之间的差。即,从当前块的残差减去残差预测符,且发信号通知所得的差。在3D-HEVC的一些提议中,ARP仅适用于具有等于Part_2Nx2N的分割模式的经帧间译码CU。Advanced Residual Prediction (ARP) is a coding tool that seeks to exploit residual correlation between views to provide additional coding efficiency. In ARP, a residual predictor is generated by aligning the motion information of the current view for motion compensation in the reference view. In addition, a weighting factor is introduced to compensate for quality differences between views. When ARP is enabled for a block, the difference between the current residual and the residual predictor is signaled. That is, the residual predictor is subtracted from the residual of the current block, and the resulting difference is signaled. In some proposals for 3D-HEVC, ARP applies only to inter-coded CUs with a partitioning mode equal to Part_2Nx2N.
图9为说明用于经时间预测视频块的ARP的实例提议的实例预测结构的概念图。如张(Zhang)等人“CE4:用于多视图译码的高级残差预测(CE4:Advanced residual prediction for multiview coding)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的视频译码扩展开发联合合作小组第4次会议:韩国仁川,2013年4月20日到26日,文献JCT3V-D0177(MPEG编号m29008,下文中称为“JCT3V-D0177”))中所提议,在第4次JCT3V会议中采纳应用于具有等于Part_2Nx2N的分割模式的CU的ARP。JCT3V-D0177可从以下链接下载:http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=862。FIG9 is a conceptual diagram illustrating an example proposed prediction structure for ARP for temporally predicted video blocks. As proposed in Zhang et al., "CE4: Advanced residual prediction for multiview coding," ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Joint Collaborative Team on Video Coding Extension Development, 4th Meeting: Incheon, South Korea, April 20-26, 2013, document JCT3V-D0177 (MPEG No. m29008, hereinafter "JCT3V-D0177"), ARP applied to CUs with a partition mode equal to Part_2Nx2N was adopted at the 4th JCT3V meeting. JCT3V-D0177 can be downloaded from the following link: http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=862.
如图9中所展示,视频译码器在当前(例如相依)视图Vm的当前图片102中的当前视频块100的残差的预测中调用或识别以下块。As shown in FIG. 9 , the video coder calls upon or identifies the following blocks in the prediction of the residual of the current video block 100 in the current picture 102 of the current (eg, dependent) view Vm.
1)当前视频块100(视图Vm中):Curr1) Current video block 100 (in view V m ): Curr
2)参考/基础视图(图9中的V0)的视图间参考图片108中的视图间参考视频块106:Base。视频译码器基于当前视频块100(Curr)的DV 104导出视图间参考视频块106。视频译码器可使用NBDV导出确定DV 104,如上文所描述。2) An inter-view reference video block 106 in the inter-view reference picture 108 of the reference/base view (V 0 in FIG9 ): Base. The video coder derives the inter-view reference video block 106 based on the DV 104 of the current video block 100 (Curr). The video coder may determine the DV 104 using NBDV derivation, as described above.
3)与当前视频块100(Curr)相同的视图(Vm)中的时间参考图片114中的时间参考视频块112:CurrTRef。视频译码器基于当前视频块100的TMV 110导出时间参考视频块112。视频译码器可使用本文中所描述的技术中的任一者确定TMV 110。3) A temporal reference video block 112 in a temporal reference picture 114 in the same view (V m ) as the current video block 100 (Curr): CurrTRef. The video coder derives the temporal reference video block 112 based on the TMV 110 of the current video block 100. The video coder may determine the TMV 110 using any of the techniques described herein.
4)参考视图(即,与视图间参考视频块106(Base)相同的视图)中的时间参考图片118中的时间参考视频块116:BaseTRef。视频译码器使用当前视频块100(Curr)的TMV 110导出参考视图中的时间参考视频块116。TMV+DV的向量121可相对于当前视频块100(Curr)识别时间参考视频块116(BaseTRef)。4) Temporal reference video block 116 in a temporal reference picture 118 in a reference view (i.e., the same view as inter-view reference video block 106 (Base)): BaseTRef. The video coder derives temporal reference video block 116 in the reference view using TMV 110 of current video block 100 (Curr). A vector 121 of TMV+DV may identify temporal reference video block 116 (BaseTRef) relative to current video block 100 (Curr).
当视频编码器20基于视频编码器20使用TMV 110识别的时间参考视频块112对当前视频块100进行时间帧间预测时,视频编码器20将当前视频块100与时间参考视频块112之间的逐像素差异确定为残差块。无ARP的情况下,视频编码器20将对残差块进行变换、量化和熵编码。视频解码器30将对经编码视频位流进行熵解码,执行反量化和变换以导出残差块,且将残差块应用到参考视频块112的重构以重构当前视频块100。When video encoder 20 performs temporal inter-frame prediction on current video block 100 based on temporal reference video block 112 identified by video encoder 20 using TMV 110, video encoder 20 determines the pixel-by-pixel difference between current video block 100 and temporal reference video block 112 as a residual block. Without ARP, video encoder 20 transforms, quantizes, and entropy encodes the residual block. Video decoder 30 entropy decodes the encoded video bitstream, performs inverse quantization and transforms to derive the residual block, and applies the residual block to the reconstruction of reference video block 112 to reconstruct current video block 100.
通过使用ARP,视频译码器确定预测残差块的值(即,预测当前视频块100(Curr)与时间参考视频块112(CurrTRef)之间的差)的残差预测符块。视频编码器20可随后仅需要编码残差块与残差预测符块之间的差,从而减少用于编码当前视频块100的经编码视频位流中包含的信息量。在图9的时间ARP实例中,基于参考/基础视图(V0)中的对应于当前视频块100(Curr)和时间参考视频块112(CurrTRef)且由DV 104识别的块确定当前视频块100的残差的预测符。参考视图中的这些对应块之间的差可为残差的良好预测符,即,当前视频块100(Curr)与时间参考视频块112(CurrTRef)之间的差。特定来说,视频译码器识别参考视图中的视图间参考视频块106(Base)和时间参考视频块116(BaseTRef),且基于视图间参考视频块106与时间参考视频块116之间的差(BaseTRef-Base)确定残差预测符块,其中减法运算应用到所表示的像素阵列的每一像素。在一些实例中,视频译码器可将加权因子w应用到残差预测符。在此些实例中,当前块的最终预测符(即,参考块与残差预测符块求和)可表示为:CurrTRef+w*(BaseTRef-Base)。By using ARP, the video coder determines a residual predictor block that predicts the value of the residual block (i.e., predicts the difference between the current video block 100 (Curr) and the temporal reference video block 112 (CurrTRef)). Video encoder 20 may then only need to encode the difference between the residual block and the residual predictor block, thereby reducing the amount of information included in the encoded video bitstream used to encode the current video block 100. In the temporal ARP example of FIG9, a predictor for the residual of the current video block 100 is determined based on blocks in the reference/base view ( V0 ) that correspond to the current video block 100 (Curr) and the temporal reference video block 112 (CurrTRef) and are identified by DV 104. The difference between these corresponding blocks in the reference view may be a good predictor of the residual, i.e., the difference between the current video block 100 (Curr) and the temporal reference video block 112 (CurrTRef). Specifically, the video coder identifies an inter-view reference video block 106 (Base) and a temporal reference video block 116 (BaseTRef) in a reference view and determines a residual predictor block based on the difference (BaseTRef-Base) between the inter-view reference video block 106 and the temporal reference video block 116, wherein a subtraction operation is applied to each pixel of the represented pixel array. 
In some examples, the video coder may apply a weighting factor w to the residual predictor. In such examples, the final predictor for the current block (i.e., the sum of the reference block and the residual predictor block) may be expressed as: CurrTRef+w*(BaseTRef-Base).
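The final predictor CurrTRef + w*(BaseTRef - Base) is a per-pixel expression; a minimal sketch over flat sample arrays is below (clipping to the valid sample range, which a real coder applies after prediction, is omitted for brevity):

```python
def arp_final_predictor(curr_t_ref, base, base_t_ref, w):
    """Per-pixel evaluation of CurrTRef + w * (BaseTRef - Base), where
    w * (BaseTRef - Base) is the weighted residual predictor added to
    the motion-compensated samples CurrTRef."""
    return [c + w * (bt - b) for c, b, bt in zip(curr_t_ref, base, base_t_ref)]
```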
图10为说明当前视图(Vm)中的当前视频块120的时间ARP的实例双向预测结构的概念图。上文描述和图9说明单向预测。当将ARP扩展到双向预测的情况时,视频译码器可将上文技术应用到参考图片列表中的一或两者以便识别当前视频块120的残差预测符块。特定来说,视频译码器可检查当前视频块120的参考列表中的一或两者以确定其中的一者是否含有可用于时间ARP的TMV。在由图10说明的实例中,当前视频块120与指向第一参考图片列表(RefPicList0)中的第一时间参考图片134的TMV 130相关联,且与指向第二参考图片列表(RefPicList1)中的第二时间参考图片136的TMV 132相关联。FIG10 is a conceptual diagram illustrating an example bidirectional prediction structure for temporal ARP for current video block 120 in the current view (Vm). The description above and FIG9 illustrate unidirectional prediction. When extending ARP to the case of bidirectional prediction, the video coder may apply the above techniques to one or both of the reference picture lists to identify the residual predictor block for current video block 120. Specifically, the video coder may examine one or both of the reference picture lists for current video block 120 to determine whether one of them contains a TMV that can be used for temporal ARP. In the example illustrated by FIG10, current video block 120 is associated with TMV 130, which points to a first temporal reference picture 134 in a first reference picture list (RefPicList0), and with TMV 132, which points to a second temporal reference picture 136 in a second reference picture list (RefPicList1).
在一些实例中,视频译码器将根据检查次序检查参考图片列表以确定其中的一者是否包含可用于时间ARP的TMV,且如果第一列表包含此TMV,则不必根据所述检查次序检查第二列表。在一些实例中,视频译码器将检查两个参考图片列表,并且如果两个列表均包含TMV,那么例如基于使用所述TMV产生的所产生残差预测符相对于当前视频块的残差的比较而确定使用哪一TMV。值得注意的是,根据针对ARP的当前提议(例如,JCT3V-D0177),在当前块针对一个参考图片列表使用视图间参考图片(不同视图中)时,停用残差预测过程。In some examples, the video coder checks the reference picture lists according to a checking order to determine whether one of them includes a TMV that can be used for temporal ARP, and if the first list includes such a TMV, then the second list need not be checked according to the checking order. In some examples, the video coder checks both reference picture lists and, if both lists include a TMV, determines which TMV to use based on, for example, a comparison of the residual predictors generated using those TMVs against the residual of the current video block. Notably, according to current proposals for ARP (e.g., JCT3V-D0177), the residual prediction process is disabled when the current block uses an inter-view reference picture (in a different view) for one reference picture list.
如图10中所说明,视频译码器可使用例如根据NBDV导出过程针对当前视频块120识别的DV 124以识别在与当前图片122不同的参考视图(V0)中但在相同存取单元中的视图间参考图片128中的对应的视图间参考视频块126(Base)。视频译码器还可针对当前视频块120使用TMV 130和132以识别两个参考图片列表(例如RefPicList0和RefPicList1)中的参考视图的各个时间参考图片中的视图间参考视频块126(Base)的时间参考块(BaseTRef)。在图10的实例中,视频译码器基于当前视频块120的TMV 130和132识别第一参考图片列表(例如RefPicList0)中的时间参考图片142中的时间参考视频块(BaseTRef)140和第二参考图片列表(例如RefPicList1)中的时间参考图片146中的时间参考视频块(BaseTRef)144。10 , the video coder may use DV 124 identified for current video block 120, e.g., according to an NBDV derivation process, to identify a corresponding inter-view reference video block 126 (Base) in an inter-view reference picture 128 that is in a different reference view (V 0 ) than current picture 122 but in the same access unit. The video coder may also use TMVs 130 and 132 for current video block 120 to identify a temporal reference block (BaseTRef) for inter-view reference video block 126 (Base) in respective temporal reference pictures of reference views in two reference picture lists (e.g., RefPicList0 and RefPicList1). In the example of Figure 10, the video decoder identifies a temporal reference video block (BaseTRef) 140 in a temporal reference picture 142 in a first reference picture list (e.g., RefPicList0) and a temporal reference video block (BaseTRef) 144 in a temporal reference picture 146 in a second reference picture list (e.g., RefPicList1) based on the TMVs 130 and 132 of the current video block 120.
参考视图中的当前视频块120的TMV 130和132的使用由图10中的虚线箭头说明。在图10中,参考视图中的时间参考视频块140和144归因于其基于TMV 130和132的识别而被称作经运动补偿的参考块。视频译码器可基于时间参考视频块140与视图间参考视频块126之间的差或基于时间参考视频块144与视图间参考视频块126之间的差而确定当前视频块120的残差预测符块。The use of TMVs 130 and 132 for current video block 120 in the reference view is illustrated by the dashed arrows in FIG 10. In FIG 10, temporal reference video blocks 140 and 144 in the reference view are referred to as motion compensated reference blocks due to their identification based on TMVs 130 and 132. The video coder may determine a residual predictor block for current video block 120 based on a difference between temporal reference video block 140 and inter-view reference video block 126 or based on a difference between temporal reference video block 144 and inter-view reference video block 126.
再次重申,解码器侧的所提议的时间ARP过程可描述(参看图10)如下:Again, the proposed temporal ARP process on the decoder side can be described (see FIG. 10 ) as follows:
1.视频解码器30例如使用指向目标参考视图(V0)的NBDV导出获得如当前3D-HEVC中指定的DV 124。随后,在相同存取单元内的参考视图的图片128中,视频解码器30通过DV124识别对应的视图间参考视频块126(Base)。1. The video decoder 30 obtains the DV 124 as specified in the current 3D-HEVC, for example, using NBDV derivation pointing to the target reference view (V 0 ). Then, in a picture 128 of a reference view within the same access unit, the video decoder 30 identifies the corresponding inter-view reference video block 126 (Base) through the DV 124 .
2.视频解码器30再使用当前视频块120的运动信息(例如,TMV 130、132)以导出对应的视图间参考视频块126的运动信息。视频解码器30可基于当前视频块120的TMV130、132和参考视频块126的参考视图中的所导出的参考图片142、146应用对应的视图间参考视频块126的运动补偿以识别经运动补偿的时间参考视频块140、144(BaseTRef)以及通过确定BaseTRef-Base确定残差预测符块。当前块、对应块(Base)和运动补偿块(BaseTRef)之间的关系在图9和10中展示。在一些实例中,参考视图(V0)中具有与当前视图(Vm)的参考图片相同的POC(图片次序计数)值的参考图片选定为对应块的参考图片。2. Video decoder 30 then uses the motion information (e.g., TMV 130, 132) of current video block 120 to derive motion information for a corresponding inter-view reference video block 126. Video decoder 30 may apply motion compensation for the corresponding inter-view reference video block 126 based on the TMV 130, 132 of the current video block 120 and the derived reference pictures 142, 146 in the reference view of the reference video block 126 to identify a motion-compensated temporal reference video block 140, 144 (BaseTRef) and determine a residual predictor block by determining BaseTRef-Base. The relationship between the current block, the corresponding block (Base), and the motion compensated block (BaseTRef) is shown in Figures 9 and 10. In some examples, a reference picture in the reference view ( V0 ) having the same POC (Picture Order Count) value as the reference picture of the current view ( Vm ) is selected as the reference picture for the corresponding block.
3.视频解码器30可将加权因子w应用到残差预测符块以获得经加权残差预测符块,且将经加权残差块的值相加到经预测样本以重构当前视频块120。3. Video decoder 30 may apply the weighting factor w to the residual predictor block to obtain a weighted residual predictor block, and add the values of the weighted residual block to the predicted samples to reconstruct the current video block 120 .
图11为根据本发明中描述的技术的用于经视图间预测视频块的视图间ARP的实例预测结构的概念图。在张(Zhang)等人的“CE4:对高级残差预测的进一步改进(CE4:Further improvements on advanced residual prediction)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC 29/WG 11的3D视频译码扩展开发联合合作小组,第6次会议,瑞士日内瓦,2013年10月25日到11月1日,下文称为“JCT3V-F0123”)中提出与图11相关的技术。JCT3V-F0123可从以下链接下载:http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1460。FIG11 is a conceptual diagram of an example prediction structure for inter-view ARP for inter-view predicted video blocks according to the techniques described in this disclosure. Techniques related to FIG11 are presented in Zhang et al., "CE4: Further improvements on advanced residual prediction" (Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC1/SC 29/WG 11, 6th Meeting, Geneva, Switzerland, October 25-November 1, 2013, hereinafter referred to as "JCT3V-F0123"). JCT3V-F0123 can be downloaded from the following link: http://phenix.it-sudparis.eu/jct2/doc_end_user/current_document.php?id=1460.
根据图11中说明的实例技术,视频译码器(例如视频编码器20和/或视频解码器30)可使用来自不同存取单元的视图间残差来预测经视图间预测的当前块的残差。与其中在当前块的运动向量为DMV时不执行ARP且仅在当前视频块的运动向量为TMV时执行ARP的针对ARP的提议相比,图11的实例技术使用DMV来执行ARP。According to the example technique illustrated in FIG 11 , a video coder (e.g., video encoder 20 and/or video decoder 30) may use inter-view residuals from different access units to predict a residual for an inter-view predicted current block. Compared to proposals for ARP in which ARP is not performed when the motion vector of the current block is a DMV and is performed only when the motion vector of the current video block is a TMV, the example technique of FIG 11 performs ARP using DMVs.
图11的实例技术可由视频译码器(例如视频编码器20或视频解码器30)在当前图片152中的当前视频块150(Curr)的运动向量为DMV 154时执行,且参考视图(V0)中的视图间参考图片158中的视图间参考视频块156(Base)含有至少一个TMV 160。在一些实例中,DMV 154可为DV,其转换为DMV以充当IDMVC用于当前视频块150的运动信息预测。11 may be performed by a video coder, such as video encoder 20 or video decoder 30, when the motion vector of a current video block 150 (Curr) in a current picture 152 is a DMV 154, and an inter-view reference video block 156 (Base) in an inter-view reference picture 158 in a reference view (V0) contains at least one TMV 160. In some examples, DMV 154 may be a DV that is converted to a DMV to serve as an IDMVC for motion information prediction of the current video block 150.
视频译码器使用当前视频块150的DMV 154识别视图间参考图片158中的视图间参考视频块156(Base)。视频译码器使用视图间参考视频块156的TMV 160和相关联参考图片(例如,参考视图(V0)中的时间参考图片164)连同DMV一起以识别参考视图(V0)中的时间参考图片164中的时间参考视频块162(BaseTRef)。基于TMV 160和DMV 154识别时间参考视频块162(BaseTRef)由虚线向量170(TMV+DMV)表示。视频译码器还使用TMV 160以识别当前视图(Vm)中的时间参考图片168中的时间参考视频块166(CurrTRef)。参考视图(V0)中的时间参考视频块162(BaseTRef)和当前视图(Vm)中的时间参考视频块166(CurrTRef)可在相同存取单元内,即,参考视图(V0)中的时间参考图片164和当前视图(Vm)中的时间参考图片168可在相同存取单元中。The video coder uses DMV 154 of current video block 150 to identify inter-view reference video block 156 (Base) in inter-view reference picture 158. The video coder uses TMV 160 of inter-view reference video block 156 and an associated reference picture (e.g., temporal reference picture 164 in reference view (V 0 )) along with the DMV to identify temporal reference video block 162 (BaseTRef) in temporal reference picture 164 in reference view (V 0 ). Identification of temporal reference video block 162 (BaseTRef) based on TMV 160 and DMV 154 is represented by dashed vector 170 (TMV+DMV). The video coder also uses TMV 160 to identify temporal reference video block 166 (CurrTRef) in temporal reference picture 168 in current view (V m ). Temporal reference video block 162 (BaseTRef) in reference view (V 0 ) and temporal reference video block 166 (CurrTRef) in current view (V m ) may be within the same access unit, i.e., temporal reference picture 164 in reference view (V 0 ) and temporal reference picture 168 in current view (V m ) may be in the same access unit.
视频译码器(例如视频编码器20和/或视频解码器30)可随后基于这后两个块之间的逐像素差(即,当前视图中的时间参考视频块166与参考视图中的时间参考视频块162之间的差,或CurrTRef-BaseTRef)计算来自当前视频块150的不同存取单元中的视图间残差预测符块。差信号(表示为视图间残差预测符)可用于预测当前视频块150的残差。当前视频块150的预测信号可为视图间预测符(即,视图间参考视频块156(Base))与基于当前视图中的时间参考视频块166与参考视图中的时间参考视频块162之间的差而确定的不同存取单元中的经预测视图间残差的总和。在一些实例中,加权因子w施加到不同存取单元中的经预测视图间残差。在此些实例中,当前视频块150的预测信号可为:Base+w*(CurrTRef-BaseTRef)。A video coder (e.g., video encoder 20 and/or video decoder 30) may then calculate an inter-view residual predictor block in a different access unit from current video block 150 based on the pixel-by-pixel difference between these latter two blocks (i.e., the difference between temporal reference video block 166 in the current view and temporal reference video block 162 in the reference view, or CurrTRef - BaseTRef). The difference signal (denoted as the inter-view residual predictor) may be used to predict the residual for current video block 150. The prediction signal for current video block 150 may be the sum of the inter-view predictor (i.e., inter-view reference video block 156 (Base)) and the predicted inter-view residual in the different access unit determined based on the difference between temporal reference video block 166 in the current view and temporal reference video block 162 in the reference view. In some examples, a weighting factor w is applied to the predicted inter-view residual in the different access units. In such examples, the prediction signal for current video block 150 may be: Base + w * (CurrTRef - BaseTRef).
在一些实例中,视频译码器可确定用于视图间ARP的目标存取单元中的目标参考图片,例如类似于用于时间ARP的目标参考图片的确定,如上文所论述。在一些实例中,如上文参看JCT3V-D0177所论述,每一参考图片列表的目标参考图片为参考图片列表中的第一参考图片。在其它实例中,一个或两个参考图片列表的目标参考图片(例如目标POC)可例如以PU、CU、切片、图片或其它为基础从视频编码器20发信号到视频解码器30。在其它实例中,每一参考图片列表的目标参考图片为与当前块相比具有最小POC差和较小参考图片索引的参考图片列表中的时间参考图片。在其它实例中,两个参考图片列表的目标参考图片相同。In some examples, the video coder may determine a target reference picture in a target access unit for inter-view ARP, e.g., similar to the determination of a target reference picture for temporal ARP, as discussed above. In some examples, as discussed above with reference to JCT3V-D0177, the target reference picture for each reference picture list is the first reference picture in the reference picture list. In other examples, the target reference picture (e.g., target POC) for one or both reference picture lists may be signaled from video encoder 20 to video decoder 30, e.g., on a PU, CU, slice, picture, or other basis. In other examples, the target reference picture for each reference picture list is the temporal reference picture in the reference picture list that has the smallest POC difference and a smaller reference picture index than the current block. In other examples, the target reference picture for both reference picture lists is the same.
如果含有TMV 160所指示的参考视图中的时间参考视频块的图片在与目标ARP参考图片不同的存取单元(时间实例)中,那么视频译码器可将TMV 160缩放到目标参考图片(例如目标参考图片164)以识别用于视图间ARP的参考视图中的时间参考视频块162(BaseTRef)。在此些实例中,视频译码器将时间参考视频块162定位在含有目标ARP参考图片的存取单元中。视频译码器可通过POC缩放来缩放TMV 160。此外,经缩放TMV用于识别定位于目标ARP参考图片中的当前视图中的时间参考视频块(CurrTRef)166。If the picture containing the temporal reference video block in the reference view indicated by TMV 160 is in a different access unit (time instance) than the target ARP reference picture, the video coder may scale TMV 160 to the target reference picture (e.g., target reference picture 164) to identify temporal reference video block 162 (BaseTRef) in the reference view for inter-view ARP. In these examples, the video coder locates temporal reference video block 162 in the access unit containing the target ARP reference picture. The video coder may scale TMV 160 by POC scaling. Furthermore, the scaled TMV is used to identify a temporal reference video block (CurrTRef) 166 in the current view that is located in the target ARP reference picture.
在一些实例中,视频译码器将TMV 160缩放到LX(X为0或1)目标参考图片,其中LX对应于包含TMV的PU的所述RefPicListX。在一些实例中,视频译码器可将来自RefPicList0或RefPicList1中的任一者或两者的TMV分别缩放到L0或L1目标参考图片。在一些实例中,视频译码器将TMV 160缩放到LX目标参考图片,其中X满足当前视频块150(例如当前PU)的DMV 154对应于RefPicListX的条件。In some examples, the video coder scales TMV 160 to an LX (X is 0 or 1) target reference picture, where LX corresponds to the RefPicListX of the PU that includes the TMV. In some examples, the video coder may scale TMVs from either or both of RefPicList0 or RefPicList1 to an L0 or L1 target reference picture, respectively. In some examples, the video coder scales TMV 160 to an LX target reference picture, where X satisfies the condition that DMV 154 of current video block 150 (e.g., current PU) corresponds to RefPicListX.
类似地,在一些实例中,视频译码器在识别目标参考视图中的参考图片158中的视图间参考视频块156之前将DMV 154缩放到ARP的目标参考视图。视频译码器可通过视图次序差缩放而缩放DMV 154。目标参考视图可由视频编码器20及视频解码器30预定和已知,或可例如以PU、CU、切片、图片或其它为基础从视频编码器20信令到视频解码器30。Similarly, in some examples, the video coder scales DMV 154 to the target reference view of the ARP before identifying inter-view reference video blocks 156 in reference pictures 158 in the target reference view. The video coder may scale DMV 154 by view order difference scaling. The target reference view may be predetermined and known by video encoder 20 and video decoder 30, or may be signaled from video encoder 20 to video decoder 30, for example, on a PU, CU, slice, picture, or other basis.
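The POC-based scaling of TMV 160 (and, analogously, the view-order-difference scaling of DMV 154) follows the usual HEVC-style fixed-point motion-vector scaling pattern. A sketch of the POC case is below; the constants follow the familiar HEVC fixed-point structure and are given as an illustrative recollection rather than a normative transcription.

```python
def clip3(lo, hi, v):
    return lo if v < lo else hi if v > hi else v

def scale_mv(mv, tb, td):
    """HEVC-style fixed-point motion-vector scaling by picture-distance
    ratio tb/td, where tb is the POC distance from the current picture
    to the target reference picture and td is the POC distance of the
    original motion vector."""
    td = clip3(-128, 127, td)
    tb = clip3(-128, 127, tb)
    num = 16384 + (abs(td) >> 1)
    tx = num // td if td > 0 else -(num // -td)    # division toward zero
    dsf = clip3(-4096, 4095, (tb * tx + 32) >> 6)  # 256 == unity scale

    def scale(v):
        p = dsf * v
        sign = 1 if p >= 0 else -1
        return clip3(-32768, 32767, sign * ((abs(p) + 127) >> 8))

    return (scale(mv[0]), scale(mv[1]))
```

When the two POC distances are equal the scale factor is unity and the vector is unchanged; doubling the target distance doubles the vector, as expected.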
在视图间ARP的一些实例中,视频译码器(例如,视频编码器20和/或视频解码器30)可使用图11中说明的相同预测结构以及所识别的参考视频块156、162和166导出当前块150的预测信号,但基于参考视图中的参考块156与162而非不同存取单元中的参考块162与166之间的差确定残差预测符块。在此些实例中,视频译码器可将加权因子应用到其它样本阵列(例如,参考视图中的参考块156与162之间的差),且相应地导出当前视频块150的预测信号如下:CurrTRef+w*(Base-BaseTRef)。在视图间ARP的一些实例中,视频译码器可使用各种内插滤波器(包含双线性滤波器)在其与分数像素位置对准的情况下导出参考视频块156、162和166。In some examples of inter-view ARP, a video coder (e.g., video encoder 20 and/or video decoder 30) may derive a prediction signal for current block 150 using the same prediction structure illustrated in FIG. 11 and the identified reference video blocks 156, 162, and 166, but may determine a residual predictor block based on the difference between reference blocks 156 and 162 in a reference view rather than between reference blocks 162 and 166 in different access units. In such examples, the video coder may apply a weighting factor to the other sample array (e.g., the difference between reference blocks 156 and 162 in a reference view) and accordingly derive a prediction signal for current video block 150 as follows: CurrTRef+w*(Base-BaseTRef). In some examples of inter-view ARP, the video coder may use various interpolation filters, including bilinear filters, to derive reference video blocks 156, 162, and 166 when they are aligned with fractional pixel positions.
尽管图11说明其中使用与视图间参考块的TMV和视图间参考视频块的相关联参考图片来识别当前和参考视图中的时间参考视频块的视图间ARP实例,但在其它实例中,其它TMV和相关联参考图片可用于识别当前和参考视图中的时间参考视频块。举例来说,如果当前视频块的DMV是来自当前视频块的第一参考图片列表(例如,RefPicList0或RefPicList1),那么视频译码器可使用对应于当前块的第二参考图片列表的TMV和来自当前视频块的第二参考图片列表(例如,RefPicList0或RefPicList1中的另一者)的相关联参考图片。在此些实例中,视频译码器可识别与TMV相关联的参考图片中的当前视图中的时间参考视频块,或将TMV缩放到ARP的目标存取单元和目标参考图片以识别当前视图中的时间参考视频块。在此些实例中,视频译码器可识别与其中定位有当前视图中的时间参考视频块的参考图片相同的存取单元中的参考图片中的参考视图中的时间参考视频块。在其它实例中,不是视图间参考视频块的TMV或当前视频块的另一参考图片列表的TMV,视频译码器可类似地使用从当前视频块的空间或时间相邻视频块的运动信息导出的TMV和相关联的参考图片来识别ARP的当前和参考视图中的时间参考视频块。Although FIG11 illustrates an inter-view ARP example in which the TMV associated with an inter-view reference block and the inter-view reference video block's associated reference picture are used to identify temporal reference video blocks in the current and reference views, in other examples, other TMVs and associated reference pictures may be used to identify temporal reference video blocks in the current and reference views. For example, if the DMV for the current video block is from the first reference picture list of the current video block (e.g., RefPicList0 or RefPicList1), the video coder may use the TMV corresponding to the second reference picture list of the current block and the associated reference picture from the second reference picture list of the current video block (e.g., the other of RefPicList0 or RefPicList1). In such examples, the video coder may identify the temporal reference video block in the current view in the reference picture associated with the TMV, or scale the TMV to the target access unit and target reference picture of the ARP to identify the temporal reference video block in the current view. In such examples, the video coder may identify the temporal reference video block in the reference view in a reference picture in the same access unit as the reference picture in which the temporal reference video block in the current view is located. 
In other examples, instead of the TMV of an inter-view reference video block or the TMV of another reference picture list of the current video block, the video coder may similarly use the TMV and associated reference pictures derived from motion information of spatially or temporally neighboring video blocks of the current video block to identify temporal reference video blocks in the current and reference views of the ARP.
在以下描述中,如果一个参考图片列表的对应参考是时间参考图片且应用ARP,那么ARP过程被标示为时间ARP。否则,如果一个参考图片列表的对应参考是视图间参考图片且应用ARP,那么ARP过程被标示为视图间ARP。In the following description, if the corresponding reference of a reference picture list is a temporal reference picture and ARP is applied, the ARP process is labeled as temporal ARP. Otherwise, if the corresponding reference of a reference picture list is an inter-view reference picture and ARP is applied, the ARP process is labeled as inter-view ARP.
在针对ARP的一些提议中,可使用三个加权因子,即0、0.5和1。产生当前CU的最小速率失真成本的加权因子被选定为最终加权因子,且对应加权因子索引(0、1和2,其分别对应于加权因子0、1和0.5)在CU层级处的位流中发射。一个CU中的所有PU预测共享相同的加权因子。当加权因子等于0时,ARP并不用于当前CU。In some proposals for ARP, three weighting factors, namely 0, 0.5, and 1, can be used. The weighting factor that produces the minimum rate-distortion cost for the current CU is selected as the final weighting factor, and the corresponding weighting factor index (0, 1, and 2, which correspond to weighting factors of 0, 1, and 0.5, respectively) is transmitted in the bitstream at the CU level. All PU predictions in a CU share the same weighting factor. When the weighting factor is equal to 0, ARP is not used for the current CU.
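以上加权因子索引到加权因子的映射可示意如下(假设性草图):The index-to-weighting-factor mapping above can be sketched as follows (a hypothetical sketch):

```python
# 加权因子索引 0、1、2 分别对应加权因子 0、1 和 0.5。
# Weighting factor indices 0, 1, 2 correspond to factors 0, 1, and 0.5.
ARP_WEIGHT_BY_INDEX = {0: 0.0, 1: 1.0, 2: 0.5}

def arp_weight(index):
    """Return the ARP weighting factor signaled by a CU-level index."""
    return ARP_WEIGHT_BY_INDEX[index]

def arp_used(index):
    """ARP is not used for the current CU when the weighting factor equals 0."""
    return arp_weight(index) != 0.0
```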
在张(Zhang)等人“3D-CE4:用于多视图译码的高级残差预测(3D-CE4:Advancedresidual prediction for multiview coding)”(ITU-T SG 16 WP 3和ISO/IEC JTC 1/SC29/WG 11的视频译码扩展开发联合合作小组第3次会议:瑞士日内瓦,2013年1月17日到23日,文献JCT3V-C0049(MPEG编号m27784,下文中称为“JCT3V-C0049”))中描述与3D-HEVC的ARP相关的方面。JCT3V-C0049可从以下链接下载:Aspects related to ARP for 3D-HEVC are described in Zhang et al., “3D-CE4: Advanced residual prediction for multiview coding” (3rd meeting of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29/WG 11 Joint Collaboration Group on Development of Extensions for Video Coding, Geneva, Switzerland, January 17–23, 2013, document JCT3V-C0049 (MPEG No. m27784, hereinafter “JCT3V-C0049”). JCT3V-C0049 can be downloaded from the following link:
http://phenix.int-evry.fr/jct3v/doc_end_user/current_document.php?id=487
在JCT3V-C0049中,以非零加权因子译码的不同PU的参考图片可在不同PU(或当前视频块)间不同。因此,可需要存取来自参考视图的不同图片以产生经运动补偿的块(BaseTRef)(例如图9和10中的视图间参考视频块116、140和144),或参考视图(Base)中的对应视图间参考视频块,例如图9和10中的视图间参考视频块106和126。In JCT3V-C0049, the reference pictures of different PUs coded with non-zero weighting factors may be different between different PUs (or current video blocks). Therefore, it may be necessary to access different pictures from the reference view to generate motion-compensated blocks (BaseTRef) (e.g., inter-view reference video blocks 116, 140, and 144 in Figures 9 and 10), or corresponding inter-view reference video blocks in the reference view (Base), such as inter-view reference video blocks 106 and 126 in Figures 9 and 10.
在加权因子不等于0时,对于时间残差,在针对残差及残差预测符产生过程执行运动补偿之前,朝向固定图片缩放当前PU的运动向量。在将ARP应用于视图间残差时,在针对残差及残差预测符产生过程执行运动补偿之前,朝向固定图片缩放参考块(例如,图11中的块156)的时间运动向量。对于两种情况(即,时间残差或视图间残差),将固定图片界定为每一参考图片列表的第一可用的时间参考图片。在经解码运动向量不指向固定图片时,其首先经缩放且随后用于识别CurrTRef及BaseTRef。When the weighting factor is not equal to 0, for temporal residual, the motion vector of the current PU is scaled towards the fixed picture before motion compensation is performed for the residual and residual predictor generation process. When ARP is applied to inter-view residual, the temporal motion vector of the reference block (e.g., block 156 in Figure 11) is scaled towards the fixed picture before motion compensation is performed for the residual and residual predictor generation process. For both cases (i.e., temporal residual or inter-view residual), a fixed picture is defined as the first available temporal reference picture in each reference picture list. When the decoded motion vector does not point to a fixed picture, it is first scaled and then used to identify CurrTRef and BaseTRef.
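未指向固定图片的运动向量首先按POC距离缩放;下面是一个HEVC风格的基于POC距离的整数缩放草图(假设性实现,沿用HEVC式的tx/distScaleFactor公式,假定td不为零):A motion vector that does not point to the fixed picture is first scaled by POC distance; below is an HEVC-style integer POC-distance scaling sketch (a hypothetical implementation following HEVC-style tx/distScaleFactor formulas, assuming td is nonzero):

```python
# HEVC 风格的运动向量 POC 距离缩放(假设性草图)。
# HEVC-style POC-distance motion vector scaling (hypothetical sketch).

def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def scale_mv(mv, poc_diff_target, poc_diff_actual):
    """Scale one MV component from its actual POC distance to the target
    (fixed-picture) POC distance, mimicking HEVC integer arithmetic."""
    td = clip3(-128, 127, poc_diff_actual)   # 假定不为零 / assumed nonzero
    tb = clip3(-128, 127, poc_diff_target)
    num = 16384 + (abs(td) >> 1)
    tx = num // td if td > 0 else -(num // -td)   # 向零截断 / truncate toward zero
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    scaled = dist_scale * mv
    sign = -1 if scaled < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(scaled) + 127) >> 8))
```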
用于ARP的此参考图片称为目标ARP参考图片。应注意,在当前切片是B切片时,目标ARP参考图片与一个特定参考图片列表相关联。因此,可利用两个目标ARP参考图片。This reference picture used for ARP is called the target ARP reference picture. It should be noted that when the current slice is a B slice, the target ARP reference picture is associated with a specific reference picture list. Therefore, two target ARP reference pictures can be used.
可如下执行目标ARP参考图片的可用性检查。首先,通过RpRefPicLX标示与一个参考图片列表X(其中X是0或1)相关联的目标ARP参考图片,且通过RefPicInRefViewLX标示视图中的具有等于从NBDV导出过程导出的视图次序索引的视图次序索引且具有RpRefPicLX的相同POC值的图片。The availability check of the target ARP reference picture can be performed as follows: First, the target ARP reference picture associated with one reference picture list X (where X is 0 or 1) is indicated by RpRefPicLX, and the picture in the view with a view order index equal to the view order index derived from the NBDV derivation process and the same POC value of RpRefPicLX is indicated by RefPicInRefViewLX.
在以下条件中的一者为真时,针对参考图片列表X停用ARP:ARP is disabled for reference picture list X when one of the following conditions is true:
-RpRefPicLX不可用-RpRefPicLX is not available
-RefPicInRefViewLX未存储在经解码图片缓冲器中-RefPicInRefViewLX is not stored in the decoded picture buffer
-RefPicInRefViewLX未包含在由来自NBDV导出过程的DV或与当前块相关联的DMV定位的对应块(例如,图9中的块106或图11中的块156)的参考图片列表中的任一者中-RefPicInRefViewLX is not included in any of the reference picture lists of the corresponding block (e.g., block 106 in FIG. 9 or block 156 in FIG. 11) located by the DV from the NBDV derivation process or the DMV associated with the current block
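以上可用性检查可用如下假设性草图表达(数据结构为示意,并非3D-HEVC的实际接口):The availability check above can be expressed with the following hypothetical sketch (the data structures are illustrative, not the actual 3D-HEVC interfaces):

```python
# 目标ARP参考图片可用性检查的示意实现。
# Illustrative implementation of the target ARP reference picture availability check.

def arp_enabled_for_list_x(rp_ref_pic_lx, ref_pic_in_ref_view_lx, dpb,
                           corresponding_block_ref_lists):
    """Return False (ARP disabled for list X) when any disabling condition holds."""
    if rp_ref_pic_lx is None:                     # RpRefPicLX 不可用 / unavailable
        return False
    if ref_pic_in_ref_view_lx not in dpb:         # 未存储在经解码图片缓冲器中
        return False
    # RefPicInRefViewLX 必须出现在对应块的某一参考图片列表中。
    # RefPicInRefViewLX must appear in a reference picture list of the corresponding block.
    if not any(ref_pic_in_ref_view_lx in lst for lst in corresponding_block_ref_lists):
        return False
    return True

dpb = ["picA", "picB", "picC"]
enabled = arp_enabled_for_list_x("picA", "picB", dpb, [["picB"], []])
```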
在应用ARP时,可在产生残差及残差预测符时使用双线性滤波器。即,例如,使用双线性滤波器产生图9中的块106、112及116。When ARP is applied, a bilinear filter may be used when generating the residual and the residual predictor. That is, for example, blocks 106, 112, and 116 in FIG9 may be generated using a bilinear filter.
图12是说明使用视图间预测用于一个参考图片列表且使用时间预测用于另一参考图片列表的双向ARP的实例预测结构的概念图。在当前视频块250的双向预测的一个预测方向使用时间预测(例如,针对参考图片列表X)且当前视频块250的另一预测方向使用视图间预测(例如,针对参考图片列表Y(Y=1-X))时,可通过视频编码器20和/或视频解码器30执行图12的实例技术。FIG. 12 is a conceptual diagram illustrating an example prediction structure for bidirectional ARP that uses inter-view prediction for one reference picture list and temporal prediction for another reference picture list. The example technique of FIG. 12 may be performed by video encoder 20 and/or video decoder 30 when one prediction direction of the bidirectional prediction of current video block 250 uses temporal prediction (e.g., for reference picture list X) and the other prediction direction of current video block 250 uses inter-view prediction (e.g., for reference picture list Y (Y=1-X)).
在图12的实例中,当前视频块250可与TMV 210及DMV 254相关联。视频编码器20和/或视频解码器30可经配置而以与上文参看图9所描述的类似方式识别及存取参考图片列表X(即,第一预测方向)的参考块。In the example of FIG. 12, current video block 250 may be associated with TMV 210 and DMV 254. Video encoder 20 and/or video decoder 30 may be configured to identify and access reference blocks of reference picture list X (i.e., the first prediction direction) in a similar manner as described above with reference to FIG. 9.
视频编码器20和/或视频解码器30识别用于当前(例如,相依)视图Vm的当前图片253中的当前视频块250的残差的预测中的以下块。视频编码器20和/或视频解码器30识别参考/基础视图(图12中的V0)的视图间参考图片258中的视图间参考视频块206(BaseX)。视频编码器20和/或视频解码器30基于当前视频块250(Curr)的DV 204而识别视图间参考视频块206。视频编码器20和/或视频解码器30可使用NBDV导出确定DV 204,如上文所描述。Video encoder 20 and/or video decoder 30 identify the following blocks used in the prediction of the residual of current video block 250 in current picture 253 of the current (e.g., dependent) view Vm. Video encoder 20 and/or video decoder 30 identify an inter-view reference video block 206 (BaseX) in inter-view reference picture 258 of the reference/base view (V0 in FIG. 12). Video encoder 20 and/or video decoder 30 identify inter-view reference video block 206 based on DV 204 of current video block 250 (Curr). Video encoder 20 and/or video decoder 30 may determine DV 204 using NBDV derivation, as described above.
视频编码器20和/或视频解码器30可进一步识别与当前视频块250(Curr)相同的视图(Vm)中的时间参考图片270中的时间参考视频块212(CurrTRefX)。视频编码器20和/或视频解码器30使用当前视频块250的TMV 210识别时间参考视频块212。视频编码器20和/或视频解码器30可使用本文中描述的技术中的任一者确定TMV 210。视频编码器20和/或视频解码器30可进一步识别与视图间参考视频块206(BaseX)相同的视图(即,参考视图)中的时间参考图片272中的时间参考视频块216(BaseTRefX)。视频编码器20和/或视频解码器30可使用当前视频块250(Curr)的TMV 210识别参考视图中的时间参考视频块216。TMV 210+DV 204的向量220可相对于当前视频块250(Curr)识别时间参考视频块216(BaseTRefX)。如可在图12中看出,对于参考图片列表X(即,第一预测方向),视频编码器20和/或视频解码器30经配置以识别及存取三个参考块(即,参考块206、212及216)。Video encoder 20 and/or video decoder 30 may further identify a temporal reference video block 212 (CurrTRefX) in a temporal reference picture 270 in the same view (Vm) as current video block 250 (Curr). Video encoder 20 and/or video decoder 30 identify temporal reference video block 212 using TMV 210 of current video block 250. Video encoder 20 and/or video decoder 30 may determine TMV 210 using any of the techniques described herein. Video encoder 20 and/or video decoder 30 may further identify a temporal reference video block 216 (BaseTRefX) in a temporal reference picture 272 in the reference view, i.e., the same view as inter-view reference video block 206 (BaseX). Video encoder 20 and/or video decoder 30 may identify temporal reference video block 216 in the reference view using TMV 210 of current video block 250 (Curr). The vector 220 given by TMV 210 + DV 204 may identify temporal reference video block 216 (BaseTRefX) relative to current video block 250 (Curr). As can be seen in FIG. 12, for reference picture list X (i.e., the first prediction direction), video encoder 20 and/or video decoder 30 are configured to identify and access three reference blocks (i.e., reference blocks 206, 212, and 216).
视频编码器20和/或视频解码器30可经配置而以与上文参看图11所描述的类似方式识别及存取参考图片列表Y(即,第二预测方向)的参考块。视频编码器20和/或视频解码器30识别参考/基础视图(图12中的V0)的视图间参考图片258中的视图间参考视频块256(BaseY)。视频编码器20和/或视频解码器30基于当前视频块250(Curr)的DMV 254识别视图间参考视频块256。Video encoder 20 and/or video decoder 30 may be configured to identify and access reference blocks of reference picture list Y (i.e., the second prediction direction) in a similar manner as described above with reference to FIG. 11. Video encoder 20 and/or video decoder 30 identify an inter-view reference video block 256 (BaseY) in inter-view reference picture 258 of the reference/base view (V0 in FIG. 12). Video encoder 20 and/or video decoder 30 identify inter-view reference video block 256 based on DMV 254 of current video block 250 (Curr).
视频编码器20和/或视频解码器30可进一步识别与当前视频块250(Curr)相同的视图(Vm)中的时间参考图片268中的时间参考视频块273(CurrTRefY)。视频编码器20和/或视频解码器30可使用当前视频块250的TMV' 285识别时间参考视频块273。视频译码器使用TMV' 285及视图间参考视频块256的相关联的参考图片(例如,参考视图(V0)中的时间参考图片265)以及DMV 254以识别参考视图(V0)中的时间参考图片265中的时间参考视频块271(BaseTRefY)。基于TMV' 285及DMV 254的时间参考视频块271(BaseTRefY)的识别由虚线向量(TMV'+DMV)表示。参考视图(V0)中的时间参考视频块271(BaseTRefY)及当前视图(Vm)中的时间参考视频块273(CurrTRefY)可在同一存取单元内,即,参考视图(V0)中的时间参考图片265及当前视图(Vm)中的时间参考图片268可在同一存取单元中。Video encoder 20 and/or video decoder 30 may further identify a temporal reference video block 273 (CurrTRefY) in a temporal reference picture 268 in the same view (Vm) as current video block 250 (Curr). Video encoder 20 and/or video decoder 30 may identify temporal reference video block 273 using TMV' 285 of current video block 250. The video coder uses TMV' 285 and the associated reference picture of inter-view reference video block 256 (e.g., temporal reference picture 265 in the reference view (V0)), together with DMV 254, to identify temporal reference video block 271 (BaseTRefY) in temporal reference picture 265 in the reference view (V0). The identification of temporal reference video block 271 (BaseTRefY) based on TMV' 285 and DMV 254 is represented by the dashed vector (TMV'+DMV). Temporal reference video block 271 (BaseTRefY) in the reference view (V0) and temporal reference video block 273 (CurrTRefY) in the current view (Vm) may be in the same access unit, that is, temporal reference picture 265 in the reference view (V0) and temporal reference picture 268 in the current view (Vm) may be in the same access unit.
如可在图12中看出,对于参考图片列表Y(即,第二预测方向),视频编码器20和/或视频解码器30经配置以识别及存取额外三个参考块(即,参考块256、271及273)。As can be seen in FIG. 12, for reference picture list Y (i.e., the second prediction direction), video encoder 20 and/or video decoder 30 are configured to identify and access three additional reference blocks (i.e., reference blocks 256, 271, and 273).
3D-HEVC中的ARP的前述技术展现若干缺点。作为一个实例,在结合双向预测执行块层级ARP或PU层级ARP时会增加对运动信息的存储器存取的数目,因为双向预测本身包含使用运动信息用于两个不同参考图片列表。另外,识别的参考块及存取的数目较高。因此,双向预测与ARP的组合会增加解码器复杂度。The aforementioned techniques for ARP in 3D-HEVC exhibit several drawbacks. For example, performing block-level or PU-level ARP in conjunction with bidirectional prediction increases the number of memory accesses for motion information, as bidirectional prediction inherently involves using motion information for two different reference picture lists. Furthermore, the number of reference blocks identified and accessed is high. Consequently, combining bidirectional prediction with ARP increases decoder complexity.
本发明提出用以解决ARP的上文所提到的问题以便减少视频解码器复杂度的各种实例技术。下文列举的技术中的每一者相对于ARP的当前提议减小执行ARP及其它相关联的视频译码技术所需要的存储器存取的数目。This disclosure proposes various example techniques to address the above-mentioned issues of ARP in order to reduce video decoder complexity. Each of the techniques listed below reduces the number of memory accesses required to perform ARP and other associated video coding techniques relative to current proposals for ARP.
图13是说明根据本发明的技术的使用视图间预测用于一个参考图片列表且使用时间预测用于另一参考图片列表的双向ARP的实例预测结构的概念图。在图13的实例中,视频编码器20和/或视频解码器30经配置以使用双向预测及ARP译码当前视频块250。所述双向预测包含用于参考图片列表X的时间预测(例如,第一预测方向)及用于参考图片列表Y的视图间预测(例如,第二预测方向)。FIG. 13 is a conceptual diagram illustrating an example prediction structure for bidirectional ARP using inter-view prediction for one reference picture list and temporal prediction for another reference picture list in accordance with the techniques of this disclosure. In the example of FIG. 13, video encoder 20 and/or video decoder 30 are configured to code current video block 250 using bidirectional prediction and ARP. The bidirectional prediction includes temporal prediction (e.g., a first prediction direction) for reference picture list X and inter-view prediction (e.g., a second prediction direction) for reference picture list Y.
根据图13的技术,视频编码器20和/或视频解码器30经配置而以与上文参看图12所描述的相同方式识别参考图片列表X(例如,第一预测方向)的参考块206(BaseX)、参考块216(BaseTrefX)及参考块212(CurrTrefX)。TMV 210用于分别相对于参考块206(BaseX)及当前视频块250识别参考块216(BaseTrefX)及参考块212(CurrTrefX)。另外,视频编码器20和/或视频解码器30经配置以使用DMV 254(即,以与上文参看图12所描述的相同方式)识别参考图片列表Y(例如,第二预测方向)的参考块256(BaseY)。According to the technique of FIG. 13, video encoder 20 and/or video decoder 30 are configured to identify reference block 206 (BaseX), reference block 216 (BaseTrefX), and reference block 212 (CurrTrefX) of reference picture list X (e.g., a first prediction direction) in the same manner as described above with reference to FIG. 12. TMV 210 is used to identify reference block 216 (BaseTrefX) and reference block 212 (CurrTrefX) relative to reference block 206 (BaseX) and current video block 250, respectively. Additionally, video encoder 20 and/or video decoder 30 are configured to identify reference block 256 (BaseY) of reference picture list Y (e.g., a second prediction direction) using DMV 254 (i.e., in the same manner as described above with reference to FIG. 12).
然而,视频编码器20和/或视频解码器30不使用与参考块256(BaseY)相关联的时间运动信息来识别CurrTrefY及BaseTrefY。而是,根据本发明的技术,视频编码器20和/或视频解码器30可经配置以使用参考列表X的时间运动信息(即,TMV 210)来识别CurrTrefY及BaseTrefY。如图13中所展示,视频编码器20和/或视频解码器30经配置以相对于参考块256(BaseY)使用TMV 210识别视图V0中的参考块290(BaseTrefY)。即,视频编码器20和/或视频解码器30经配置以使用DMV 254及TMV 210两者识别参考块290(BaseTrefY)。视频编码器20和/或视频解码器30进一步经配置以使用TMV 210识别与当前视频块250相同的视图(Vm)中的CurrTrefY。因此,参考块212充当CurrTrefX及CurrTrefY两者。因此,使用本发明的技术,视频编码器20和/或视频解码器30在使用双向预测执行ARP时仅识别及存取5个参考块而不是6个。However, video encoder 20 and/or video decoder 30 do not use the temporal motion information associated with reference block 256 (BaseY) to identify CurrTrefY and BaseTrefY. Instead, according to the techniques of this disclosure, video encoder 20 and/or video decoder 30 may be configured to use the temporal motion information of reference list X (i.e., TMV 210) to identify CurrTrefY and BaseTrefY. As shown in FIG. 13, video encoder 20 and/or video decoder 30 are configured to use TMV 210 to identify reference block 290 (BaseTrefY) in view V0 relative to reference block 256 (BaseY). That is, video encoder 20 and/or video decoder 30 are configured to use both DMV 254 and TMV 210 to identify reference block 290 (BaseTrefY). Video encoder 20 and/or video decoder 30 are further configured to use TMV 210 to identify CurrTrefY in the same view (Vm) as current video block 250. Thus, reference block 212 serves as both CurrTrefX and CurrTrefY. Accordingly, using the techniques of this disclosure, video encoder 20 and/or video decoder 30 identify and access only five reference blocks instead of six when performing ARP using bi-prediction.
总之,通过识别对应于参考图片列表Y(例如,第二预测方向)的视图间ARP的参考块,视频编码器20和视频解码器30可使用与参考图片列表X的时间预测相关联的时间运动信息(例如图13中的TMV 210)来识别不同存取单元中的参考块(即,参考块290及212)。在此情况下,在执行视图间ARP时,不需要产生与当前视图不同的存取单元中的参考块(即,参考块212),这是因为其与针对参考图片列表X的时间ARP所识别的参考块相同。即,参考块212用于时间ARP及视图间ARP两者。In summary, by identifying a reference block for inter-view ARP corresponding to reference picture list Y (e.g., the second prediction direction), video encoder 20 and video decoder 30 can use temporal motion information associated with temporal prediction of reference picture list X (e.g., TMV 210 in FIG. 13 ) to identify reference blocks in different access units (i.e., reference blocks 290 and 212). In this case, when performing inter-view ARP, there is no need to generate a reference block in an access unit different from the current view (i.e., reference block 212) because it is the same reference block identified for temporal ARP of reference picture list X. That is, reference block 212 is used for both temporal ARP and inter-view ARP.
以此方式,再使用用于第一预测方向的时间运动信息以用于第二预测方向。因此,需要作出对时间运动信息的更少的存储器存取,这是因为不需要存取由对应于第二预测方向的第一经编码块的运动向量识别的块的时间运动信息,因此允许更快速的视频解码。另外,在执行ARP时使用的参考块的总数可从6减小到5,其导致在使用乘法及加法运算的内插方面较小的计算复杂度。同样,在执行双向帧间预测时,视频编码器20可经配置以在编码第二预测方向时再使用用于第一预测方向的时间运动信息。In this way, temporal motion information used for the first prediction direction is reused for the second prediction direction. Consequently, fewer memory accesses need to be made to the temporal motion information because the temporal motion information of the block identified by the motion vector corresponding to the first encoded block in the second prediction direction does not need to be accessed, thereby allowing for faster video decoding. Additionally, the total number of reference blocks used when performing ARP can be reduced from 6 to 5, which results in less computational complexity in interpolation using multiplication and addition operations. Similarly, when performing bidirectional inter-frame prediction, video encoder 20 can be configured to reuse the temporal motion information used for the first prediction direction when encoding the second prediction direction.
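重用列表X的TMV所带来的参考块数目减少可示意如下(块名取自图12/图13的标记,属示意性质):The reduction in reference blocks from reusing the TMV of list X can be sketched as follows (block names follow the labels of FIG. 12/FIG. 13 and are illustrative):

```python
# 对比两种双向ARP变体所需存取的参考块(示意)。
# Contrast of reference blocks accessed by the two bi-directional ARP variants (illustrative).

def arp_reference_blocks(reuse_list_x_tmv):
    list_x = ["BaseX", "CurrTRefX", "BaseTRefX"]   # 时间ARP / temporal ARP, list X
    if reuse_list_x_tmv:
        # 列表Y重用列表X的TMV: CurrTRefY 与 CurrTRefX 为同一块,无需再次存取。
        # List Y reuses the TMV of list X: CurrTRefY coincides with CurrTRefX.
        list_y = ["BaseY", "BaseTRefY"]
    else:
        list_y = ["BaseY", "CurrTRefY", "BaseTRefY"]
    return list_x + list_y
```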
在本发明的另一实例中,视频编码器20和视频解码器30可经配置以在双向预测的一个预测方向(例如,对应于参考图片列表X)对应于时间参考图片且另一预测方向(例如,对应于参考图片列表Y)对应于视图间参考图片时执行简化的ARP过程。在此情况下,对于对应于参考图片列表X的时间ARP,视频编码器20和视频解码器30可经配置以使用与所述视图间参考图片相关联的所述视差运动向量(MVY)识别参考视图中的参考块(例如,图12中的参考块273),而不是使用从NBDV/DoNBDV导出过程导出的视差向量。同时,来自NBDV或DoNBDV过程的视差向量保持不变,所述视差向量可仍用于视图间运动预测中以产生IPMVC或IDMVC。In another example of this disclosure, video encoder 20 and video decoder 30 may be configured to perform a simplified ARP process when one prediction direction of bi-directional prediction (e.g., corresponding to reference picture list X) corresponds to a temporal reference picture and the other prediction direction (e.g., corresponding to reference picture list Y) corresponds to an inter-view reference picture. In this case, for temporal ARP corresponding to reference picture list X, video encoder 20 and video decoder 30 may be configured to use the disparity motion vector (MVY) associated with the inter-view reference picture to identify the reference block in the reference view (e.g., reference block 273 in FIG. 12) instead of using the disparity vector derived from the NBDV/DoNBDV derivation process. Meanwhile, the disparity vector from the NBDV or DoNBDV process remains unchanged and may still be used in inter-view motion prediction to generate an IPMVC or IDMVC.
应注意,以上方法可应用于PU层级ARP或块层级ARP两者。下文将更详细地描述PU层级及块层级ARP。It should be noted that the above method can be applied to both PU-level ARP and block-level ARP. PU-level and block-level ARP will be described in more detail below.
现将论述用于块层级ARP的技术。不同于以上描述,其中一个PU内的所有块共享用于ARP(有时被称为PU层级ARP)的相同运动信息,在块层级ARP中,一个PU分裂成若干子块(例如,8×8子块)且每一子块与其自身的导出的运动信息相关联以执行ARP。即,每一子块共享与当前PU相同的运动信息。然而,可针对每一子块确定所导出的运动向量(即,时间ARP中的视差向量或视图间ARP中的时间运动向量)。Techniques for block-level ARP will now be discussed. Unlike the above description, in which all blocks within a PU share the same motion information for ARP (sometimes referred to as PU-level ARP), in block-level ARP, a PU is split into several sub-blocks (e.g., 8×8 sub-blocks) and each sub-block is associated with its own derived motion information to perform ARP. That is, each sub-block shares the same motion information as the current PU. However, a derived motion vector (i.e., a disparity vector in temporal ARP or a temporal motion vector in inter-view ARP) can be determined for each sub-block.
图14是说明基于块的时间ARP的概念图。如图14中所展示,当前图片302包含被划分成四个子块300a、300b、300c及300d的当前块300(Curr)。运动向量310(mvLX)是用于对当前块300执行帧间预测的运动向量。运动向量310指向参考图片314中的参考块312(CurrTref)(其包含子块312a-d)。当前图片302及参考图片314在同一视图(Vm)中。FIG. 14 is a conceptual diagram illustrating block-based temporal ARP. As shown in FIG. 14, current picture 302 includes current block 300 (Curr), which is divided into four sub-blocks 300a, 300b, 300c, and 300d. Motion vector 310 (mvLX) is the motion vector used to perform inter prediction on current block 300. Motion vector 310 points to reference block 312 (CurrTref), which includes sub-blocks 312a-d, in reference picture 314. Current picture 302 and reference picture 314 are in the same view (Vm).
对于基于块的时间ARP,默认所导出的运动向量用于子块300a-d中的每一者。对于时间ARP,默认所导出的运动向量是由图14中的第i子块的DV[i]标示的视差向量,且可使用NBDV导出过程导出,与当前ARP中一样。即,可对子块300a-d中的每一者执行NBDV导出过程以导出子块300a-d中的每一者的DV。所导出的DV中的每一者指向参考视图308中的特定参考块306a-d(Base)。例如,DV 304(DV[0])指向参考块306a且DV 305(DV[1])指向参考块306b。For block-based temporal ARP, a default derived motion vector is used for each of sub-blocks 300a-d. For temporal ARP, the default derived motion vector is the disparity vector denoted DV[i] for the i-th sub-block in FIG. 14, and may be derived using the NBDV derivation process, as in current ARP. That is, the NBDV derivation process may be performed on each of sub-blocks 300a-d to derive the DV for each of sub-blocks 300a-d. Each of the derived DVs points to a particular reference block 306a-d (Base) in reference view 308. For example, DV 304 (DV[0]) points to reference block 306a, and DV 305 (DV[1]) points to reference block 306b.
参考视图308在与当前图片302相同的时间实例处,但在另一视图中。在参考块312内的子块312a-d中的一者的中心位置含有视差运动向量时,更新当前子块300a-d中的对应一者的视差向量DV[i]以使用所述视差运动向量。即,例如,如果对应于当前子块300a的参考子块312a的中心位置具有相关联的视差运动向量,那么与参考子块312a相关联的视差运动向量被用作子块300a的视差向量。Reference view 308 is at the same time instance as current picture 302, but in another view. When the center position of one of sub-blocks 312a-d within reference block 312 contains a disparity motion vector, the disparity vector DV[i] of the corresponding one of current sub-blocks 300a-d is updated to use the disparity motion vector. That is, for example, if the center position of reference sub-block 312a corresponding to current sub-block 300a has an associated disparity motion vector, the disparity motion vector associated with reference sub-block 312a is used as the disparity vector for sub-block 300a.
一旦已经识别参考块306a-d中的每一者,运动向量310可用于找到参考图片318中的参考块316a-d(BaseTRef)。参考图片318在与当前图片302不同的时间实例以及不同的视图中。接着可通过从对应参考块316a-d(BaseTref)减去参考块306a-d(Base)而确定残差预测符。接着可对子块300a-d中的每一者执行ARP。Once each of the reference blocks 306a-d has been identified, the motion vector 310 can be used to find the reference block 316a-d (BaseTRef) in the reference picture 318. The reference picture 318 is in a different time instance and a different view than the current picture 302. The residual predictor can then be determined by subtracting the reference block 306a-d (Base) from the corresponding reference block 316a-d (BaseTref). ARP can then be performed on each of the sub-blocks 300a-d.
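每一子块的视差向量更新规则可示意如下(子块以含可选"dmv"键的字典表示,属假设性表示方式):The per-sub-block disparity vector update rule can be sketched as follows (sub-blocks are represented as dicts with an optional "dmv" key, a hypothetical representation):

```python
# 基于块的时间ARP: 在CurrTRef中对应子块的中心位置含有视差运动向量时,
# 用其替换默认(NBDV)视差向量。
# Block-based temporal ARP: replace the default (NBDV) disparity vector with the
# disparity motion vector at the centre of the co-located CurrTRef sub-block, if any.

def derive_sub_block_dvs(default_dv, curr_t_ref_sub_blocks):
    return [sb.get("dmv", default_dv) for sb in curr_t_ref_sub_blocks]

# 四个对应子块,其中两个具有视差运动向量 / four co-located sub-blocks, two with DMVs
sub_blocks = [{"dmv": (3, 0)}, {}, {"dmv": (4, -1)}, {}]
dvs = derive_sub_block_dvs((2, 0), sub_blocks)
```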
图15是说明基于块的视图间ARP的概念图。如图15中所展示,当前图片352包含被划分成四个子块350a、350b、350c及350d的当前块350(Curr)。视差运动向量360(DMV)是用于对当前块350执行视图间预测的视差运动向量。视差运动向量360指向参考图片358中的参考块356(Base)(其包含子块356a-d)。当前图片352及参考图片358在同一时间实例中但在不同视图中。FIG. 15 is a conceptual diagram illustrating block-based inter-view ARP. As shown in FIG. 15, current picture 352 includes current block 350 (Curr), which is divided into four sub-blocks 350a, 350b, 350c, and 350d. Disparity motion vector 360 (DMV) is the disparity motion vector used to perform inter-view prediction on current block 350. Disparity motion vector 360 points to reference block 356 (Base), which includes sub-blocks 356a-d, in reference picture 358. Current picture 352 and reference picture 358 are in the same time instance but in different views.
对于基于块的视图间ARP,默认所导出的运动向量用于子块350a-d中的每一者。对于视图间ARP,默认所导出的运动向量是由图15中的第i子块的mvLX[i]标示的运动向量,且可被设定为覆盖子块356a-d中的每一者的中心位置的时间运动向量,这与当前ARP中一样。即,在覆盖参考块356内的第i个8×8块的中心位置的块含有时间运动向量时,将mvLX[i]更新为所述时间运动向量。For block-based inter-view ARP, a default derived motion vector is used for each of sub-blocks 350a-d. For inter-view ARP, the default derived motion vector is the motion vector denoted mvLX[i] for the i-th sub-block in FIG. 15, and may be set to the temporal motion vector covering the center position of each of sub-blocks 356a-d, as in current ARP. That is, when the block covering the center position of the i-th 8×8 block within reference block 356 contains a temporal motion vector, mvLX[i] is updated to that temporal motion vector.
所导出的运动向量中的每一者指向参考视图368中的特定参考块366a-d(BaseTref)。例如,运动向量354(mvLX[0])指向参考块366a且运动向量355(mvLX[3])指向参考块366d。Each of the derived motion vectors points to a particular reference block 366a-d (BaseTref) in reference view 368. For example, motion vector 354 (mvLX[0]) points to reference block 366a and motion vector 355 (mvLX[3]) points to reference block 366d.
一旦已经识别参考块366a-d中的每一者,视差运动向量360可用于找到参考图片364中的参考块362a-d(CurrTRef)。参考图片364在与当前图片352不同的时间实例中。接着可通过从对应参考块366a-d(BaseTref)减去参考块362a-d(CurrTref)而确定残差预测符。接着可对子块350a-d中的每一者执行ARP。Once each of the reference blocks 366a-d has been identified, the disparity motion vector 360 can be used to find a reference block 362a-d (CurrTRef) in the reference picture 364. The reference picture 364 is in a different time instance than the current picture 352. The residual predictor can then be determined by subtracting the reference block 362a-d (CurrTref) from the corresponding reference block 366a-d (BaseTref). ARP can then be performed on each of the sub-blocks 350a-d.
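每一子块的所导出运动向量mvLX[i]的更新规则可示意如下(子块以含可选"tmv"键的字典表示,属假设性表示方式):The update rule for each sub-block's derived motion vector mvLX[i] can be sketched as follows (sub-blocks are represented as dicts with an optional "tmv" key, a hypothetical representation):

```python
# 基于块的视图间ARP: 在Base中覆盖第i个子块中心位置的块含有时间运动向量时,
# 将默认mvLX[i]更新为该时间运动向量。
# Block-based inter-view ARP: update the default mvLX[i] to the temporal motion
# vector covering the centre of the i-th Base sub-block, when one is present.

def derive_sub_block_tmvs(default_tmv, base_sub_blocks):
    return [sb.get("tmv", default_tmv) for sb in base_sub_blocks]

# 四个Base子块,其中两个具有时间运动向量 / four Base sub-blocks, two with temporal MVs
base_sub_blocks = [{"tmv": (0, 2)}, {"tmv": (1, 1)}, {}, {}]
tmvs = derive_sub_block_tmvs((0, 1), base_sub_blocks)
```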
如上文所描述,对于基于块的时间ARP,运动向量310被存取且用于定位参考块312(CurrTref)。同样,对于基于块的视图间ARP,视差运动向量360被存取且用于定位参考块356(Base)。As described above, for block-based temporal ARP, the motion vector 310 is accessed and used to locate the reference block 312 (CurrTref). Likewise, for block-based inter-view ARP, the disparity motion vector 360 is accessed and used to locate the reference block 356 (Base).
图16是说明使用子PU合并候选者的基于块的ARP的概念图。在启用子PU视图间运动预测时,存取由来自NBDV/DoNBDV导出过程的所导出的视差向量410识别的一个参考块(406)的运动信息以导出子PU合并候选者。在确定子PU合并候选者之后,块400(Curr)内的每一子PU将具有其自身的时间运动信息,如图16中所展示,由运动向量404(mvLX[0])及运动向量405(mvLX[1])标示。运动向量404及405可用于识别参考块412(CurrTref)及参考块416(BaseTref)。FIG. 16 is a conceptual diagram illustrating block-based ARP using a sub-PU merge candidate. When sub-PU inter-view motion prediction is enabled, the motion information of one reference block (406) identified by the disparity vector 410 derived from the NBDV/DoNBDV derivation process is accessed to derive the sub-PU merge candidate. After the sub-PU merge candidate is determined, each sub-PU within block 400 (Curr) will have its own temporal motion information, denoted by motion vector 404 (mvLX[0]) and motion vector 405 (mvLX[1]) as shown in FIG. 16. Motion vectors 404 and 405 may be used to identify reference block 412 (CurrTref) and reference block 416 (BaseTref).
在调用ARP过程时,还存取参考块412(CurrTRef)内的每一子块(例如,8×8块)的运动信息。在对应的子块412a-d(CurrRef)与视差运动向量相关联时,所述视差运动向量可用于定位参考视图中的参考块(例如,块406)。When the ARP process is invoked, motion information is also accessed for each sub-block (e.g., an 8x8 block) within the reference block 412 (CurrTRef). When corresponding sub-blocks 412a-d (CurrRef) are associated with disparity motion vectors, the disparity motion vectors can be used to locate a reference block (e.g., block 406) in the reference view.
因此,可需要存取两个块的运动信息。即,存取由来自NBDV/DoNBDV过程的DV识别的一个块的运动信息以用于子PU视图间合并候选者。另外,存取由任何导出的时间运动信息识别的块的运动信息。Therefore, it may be necessary to access the motion information of two blocks. That is, access the motion information of one block identified by the DV from the NBDV/DoNBDV process for the sub-PU inter-view merge candidate. In addition, access the motion information of the block identified by any derived temporal motion information.
3D-HEVC中的ARP的前述技术展现若干缺点。作为一个实例,在子PU视图间合并预测及块层级时间ARP两者用于译码一个PU时,存取两个参考块的运动信息。一个是由从DoNBDV/NBDV导出过程导出的视差向量识别的参考视图中的参考块。另外,存取对应的运动信息以导出子PU视图间合并候选者。在导出子PU视图间合并候选者之后,存取时间参考图片中的另一块以检查时间参考图片中的块是否含有视差运动向量。与不同块相关联的运动信息的双重存取显著增加了视频解码器设计的复杂度,且可减小解码器处理量。The aforementioned techniques for ARP in 3D-HEVC exhibit several disadvantages. As an example, when both inter-sub-PU view merge prediction and block-level temporal ARP are used to decode a PU, motion information of two reference blocks is accessed. One is the reference block in the reference view identified by the disparity vector derived from the DoNBDV/NBDV derivation process. In addition, the corresponding motion information is accessed to derive the inter-sub-PU view merge candidate. After deriving the inter-sub-PU view merge candidate, another block in the temporal reference picture is accessed to check whether the block in the temporal reference picture contains the disparity motion vector. The dual access of motion information associated with different blocks significantly increases the complexity of the video decoder design and can reduce the decoder processing capacity.
作为另一缺点，在使用子PU(即，块层级)ARP时，与由当前块的时间运动向量指向的参考块相关联的视差运动向量用于更新默认视差向量。对于一个子块，即使所述子块具有与其相邻块(左边、上方、下方或右边)相同的视差运动向量，仍在每一子块处执行ARP过程，因此增加视频解码器复杂度。As another disadvantage, when sub-PU (i.e., block-level) ARP is used, the disparity motion vector associated with the reference block pointed to by the temporal motion vector of the current block is used to update the default disparity vector. For a given sub-block, even if the sub-block has the same disparity motion vector as its neighboring blocks (left, above, below, or right), the ARP process is still performed for each sub-block, thereby increasing video decoder complexity.
本发明提出用以解决ARP的上文所提到的问题以便减少视频解码器复杂度的各种实例技术。下文列举的技术中的每一者相对于ARP的当前提议减小执行ARP及其它相关联的视频译码技术所需要的存储器存取的数目。This disclosure proposes various example techniques to address the above-mentioned issues of ARP in order to reduce video decoder complexity. Each of the techniques listed below reduces the number of memory accesses required to perform ARP and other associated video coding techniques relative to current proposals for ARP.
在本发明的一个实例中,在启用子PU视图间运动预测且子PU视图间合并候选者(其对应于时间运动信息)应用于当前PU时,视频编码器20和视频解码器30可经配置以停用块层级ARP。而是,可启用PU层级ARP。In one example of the present disclosure, when inter-sub-PU view motion prediction is enabled and the inter-sub-PU view merge candidate (which corresponds to temporal motion information) is applied to the current PU, video encoder 20 and video decoder 30 may be configured to disable block-level ARP. Instead, PU-level ARP may be enabled.
在启用子PU视图间运动预测且应用PU层级ARP时,视频编码器20和视频解码器30可确定每一子PU的时间运动信息。即,每一子PU具有其自身的时间运动信息。然而,视频编码器20和视频解码器30针对所有子PU确定同一视差向量。所述时间运动信息及视差向量用于导出残差及残差预测符,如上文所描述。应注意,在子PU视图间运动预测适用时,所使用的ARP过程是时间ARP。When sub-PU inter-view motion prediction is enabled and PU-level ARP is applied, video encoder 20 and video decoder 30 can determine temporal motion information for each sub-PU. That is, each sub-PU has its own temporal motion information. However, video encoder 20 and video decoder 30 determine the same disparity vector for all sub-PUs. The temporal motion information and disparity vector are used to derive the residual and residual predictor, as described above. It should be noted that when sub-PU inter-view motion prediction is applicable, the ARP process used is temporal ARP.
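The derivation of the residual and residual predictor from the per-sub-PU temporal motion information and the shared disparity vector can be sketched as follows. This is a minimal pure-Python model under stated assumptions: blocks are flat pixel lists already fetched by motion/disparity compensation, and interpolation and the exact HEVC clipping rules are omitted; the function name is illustrative.

```python
def temporal_arp_prediction(curr_tref, base, base_tref, weight=0.5):
    """Temporal ARP sketch: final prediction = CurrTRef + w * (Base - BaseTRef).
    CurrTRef is the motion-compensated temporal reference block of a sub-PU;
    Base and BaseTRef are the reference-view blocks located with the disparity
    vector shared by all sub-PUs; w is the ARP weighting factor (0, 0.5, or 1).
    The weighted difference (Base - BaseTRef) is the residual predictor."""
    def clip_pixel(value):
        # Keep the result in the 8-bit sample range.
        return max(0, min(255, int(round(value))))

    return [clip_pixel(c + weight * (b - bt))
            for c, b, bt in zip(curr_tref, base, base_tref)]
```

Each sub-PU would call this with its own CurrTRef (per-sub-PU temporal motion) but the same Base/BaseTRef disparity offset, matching the shared-disparity-vector behavior described above.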
在不使用子PU视图间运动预测时使用以下实例技术。在一个实例中，对于时间ARP，视频编码器20和视频解码器30可确定每一子PU的视差向量。在一个实例中，所述所确定的视差向量可为从由时间参考图片中的当前子PU的时间运动信息识别的当前子PU的参考块导出的视差运动信息。对于视图间ARP，视频编码器20和视频解码器30可确定每一子PU的时间运动信息。在一个实例中，所述时间运动信息可从由视图间参考图片中的当前子PU的视差运动信息识别的当前子PU的参考块导出。The following example techniques are used when sub-PU inter-view motion prediction is not used. In one example, for temporal ARP, video encoder 20 and video decoder 30 may determine a disparity vector for each sub-PU. In one example, the determined disparity vector may be disparity motion information derived from a reference block of the current sub-PU identified by the temporal motion information of the current sub-PU in a temporal reference picture. For inter-view ARP, video encoder 20 and video decoder 30 may determine temporal motion information for each sub-PU. In one example, the temporal motion information may be derived from a reference block of the current sub-PU identified by the disparity motion information of the current sub-PU in an inter-view reference picture.
在本发明的另一实例中，在启用子PU视图间运动预测时，视频编码器20及视频解码器30可经配置以在相关联的参考图片是时间参考图片的情况下停用对应于特定参考图片列表的一个预测方向的块层级ARP。在此情况下，视频编码器20和视频解码器30可经配置以仅针对此预测方向启用PU层级ARP。In another example of the present disclosure, when sub-PU inter-view motion prediction is enabled, video encoder 20 and video decoder 30 may be configured to disable block-level ARP for one prediction direction corresponding to a specific reference picture list if the associated reference picture is a temporal reference picture. In this case, video encoder 20 and video decoder 30 may be configured to enable PU-level ARP only for this prediction direction.
在一个实例中,应用以下过程。如果当前PU使用视图间合并候选者,那么视频编码器20和视频解码器30确定每一子PU的时间运动信息。然而,视频编码器20和视频解码器30针对所有子PU确定同一视差向量。时间运动信息及视差向量用于导出残差及残差预测符,如上文所描述。In one example, the following process is applied. If the current PU uses an inter-view merge candidate, video encoder 20 and video decoder 30 determine temporal motion information for each sub-PU. However, video encoder 20 and video decoder 30 determine the same disparity vector for all sub-PUs. The temporal motion information and disparity vector are used to derive the residual and residual predictor, as described above.
否则,如果当前PU使用其它可用的合并候选者(即,不是视图间合并候选者)中的一者,那么视频编码器20和视频解码器30应用PU层级时间ARP,其中如果对应的参考图片是时间参考图片,那么当前PU内的所有块共享一个预测方向的同一运动信息。对于一个预测方向,如果对应的参考图片是视图间参考图片,那么视频编码器20和视频解码器30使用PU层级视图间ARP,其中当前PU内的所有块共享同一运动信息。在此情况下,还可应用块层级ARP,其中当前PU内的块可共享同一视差运动信息及不同时间运动信息。Otherwise, if the current PU uses one of the other available merge candidates (i.e., not an inter-view merge candidate), the video encoder 20 and the video decoder 30 apply PU-level temporal ARP, where if the corresponding reference picture is a temporal reference picture, all blocks within the current PU share the same motion information for one prediction direction. For one prediction direction, if the corresponding reference picture is an inter-view reference picture, the video encoder 20 and the video decoder 30 use PU-level inter-view ARP, where all blocks within the current PU share the same motion information. In this case, block-level ARP can also be applied, where blocks within the current PU can share the same disparity motion information and different temporal motion information.
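The selection in the preceding two paragraphs reduces to a small decision rule per prediction direction. The following sketch is illustrative (the flag and mode names are invented for this example and do not appear in the 3D-HEVC syntax):

```python
def select_arp_mode(uses_inter_view_merge_candidate, ref_pic_is_temporal):
    """Sketch of the per-prediction-direction ARP choice described above."""
    if uses_inter_view_merge_candidate:
        # Each sub-PU keeps its own temporal motion; one disparity vector
        # is shared by all sub-PUs.
        return "pu-level temporal ARP, shared disparity vector"
    if ref_pic_is_temporal:
        # All blocks within the PU share the same motion information.
        return "pu-level temporal ARP"
    # Inter-view reference picture: PU-level inter-view ARP; block-level ARP
    # may also apply, with shared disparity motion and per-block temporal motion.
    return "pu-level inter-view ARP"
```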
在本发明的另一实例中,在启用块层级ARP时,视频编码器20和视频解码器30可基于运动信息而确定用于执行ARP的块大小。在一个实例中,对于对应于特定参考图片列表的一个预测方向,如果对应的参考图片是时间参考图片,那么视频编码器20和视频解码器30可使用块层级ARP。在此情况下,当前块具有与其相邻块(例如,左边、上方、下方及/或右边相邻块)相同的视差运动信息。此外,当前块及相邻块合并在一起且针对合并的块执行一次ARP。In another example of the present invention, when block-level ARP is enabled, video encoder 20 and video decoder 30 may determine the block size for performing ARP based on motion information. In one example, for one prediction direction corresponding to a particular reference picture list, if the corresponding reference picture is a temporal reference picture, video encoder 20 and video decoder 30 may use block-level ARP. In this case, the current block has the same disparity motion information as its neighboring blocks (e.g., the left, above, below, and/or right neighboring blocks). Furthermore, the current block and the neighboring blocks are merged together, and ARP is performed once for the merged block.
在本发明的另一实例中,对于对应于参考图片列表的一个预测方向,如果对应的参考图片是视图间参考图片,那么视频编码器20和视频解码器30可使用块层级ARP。在此情况下,当前块具有与其相邻块(例如,左边、上方、下方及/或右边相邻块)相同的时间运动信息。此外,当前块及相邻块合并在一起且针对合并的块执行一次ARP。In another example of the present invention, for one prediction direction corresponding to a reference picture list, if the corresponding reference picture is an inter-view reference picture, the video encoder 20 and the video decoder 30 may use block-level ARP. In this case, the current block has the same temporal motion information as its neighboring blocks (e.g., the left, above, below, and/or right neighboring blocks). In addition, the current block and the neighboring blocks are merged together and ARP is performed once for the merged block.
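The merging of adjacent blocks that carry identical motion information, so that ARP executes once per merged region instead of once per sub-block, can be sketched for a single row of sub-blocks. This is a simplified assumption-laden model (the real merge may also consider above/below neighbors, and the helper name is illustrative):

```python
from itertools import groupby

def merge_equal_motion_runs(motion_vectors):
    """Group a row of per-sub-block motion vectors (disparity MVs for the
    temporal-reference case, temporal MVs for the inter-view case) into runs
    of equal consecutive vectors; ARP is then invoked once per run."""
    return [(mv, len(list(run))) for mv, run in groupby(motion_vectors)]
```

Three sub-blocks sharing one disparity motion vector thus collapse into a single ARP invocation for the merged block, reducing decoder work as described above.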
图17是说明可经配置以执行本发明中所描述的技术的实例视频编码器20的框图。视频编码器20可以对视频切片内的视频块执行帧内和帧间译码。帧内译码依赖于空间预测来减少或移除给定视频帧或图片内的视频中的空间冗余。帧间译码依赖于时间或视图间预测来减少或移除视频序列的邻近帧或图片内的视频中的冗余。帧内模式(I模式)可指代若干基于空间的压缩模式中的任一者。例如单向预测(P模式)或双向预测(B模式)等帧间模式可包含若干基于时间的压缩模式中的任一者。FIG17 is a block diagram illustrating an example video encoder 20 that may be configured to perform the techniques described in this disclosure. The video encoder 20 may perform intra- and inter-coding on video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal or inter-view prediction to reduce or remove redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of several spatially-based compression modes. Inter-modes, such as unidirectional prediction (P-mode) or bidirectional prediction (B-mode), may include any of several temporally-based compression modes.
在图17的实例中,视频编码器20包含视频数据存储器235、预测处理单元241、经解码图片缓冲器(DPB)264、求和器251、变换处理单元252、量化处理单元255及熵编码单元257。预测处理单元241包含运动估计单元242、运动及视差补偿单元244、高级残差预测(ARP)单元245及帧内预测处理单元246。为了视频块重构,视频编码器20还包含反量化处理单元259、反变换处理单元260及求和器262。还可包含解块滤波器(图17中未图示)以便对块边界进行滤波,以将成块性假影从经重构视频移除。在需要时,解块滤波器将通常对求和器262的输出进行滤波。除了解块滤波器之外,还可使用额外的环路过滤器(环路内或环路后)。In the example of FIG17 , video encoder 20 includes video data memory 235, prediction processing unit 241, decoded picture buffer (DPB) 264, summer 251, transform processing unit 252, quantization processing unit 255, and entropy encoding unit 257. Prediction processing unit 241 includes motion estimation unit 242, motion and disparity compensation unit 244, advanced residual prediction (ARP) unit 245, and intra-prediction processing unit 246. For video block reconstruction, video encoder 20 also includes inverse quantization processing unit 259, inverse transform processing unit 260, and summer 262. A deblocking filter (not shown in FIG17 ) may also be included to filter block boundaries to remove blockiness artifacts from the reconstructed video. If necessary, the deblocking filter will typically filter the output of summer 262. In addition to the deblocking filter, additional loop filters (in-loop or post-loop) may also be used.
在各种实例中,视频编码器20的一或多个硬件单元可经配置以执行本发明的技术。例如,运动及视差补偿单元244及ARP单元245可单独地或与视频编码器20的其它单元组合地执行本发明的技术。In various examples, one or more hardware units of video encoder 20 may be configured to perform the techniques of this disclosure. For example, motion and disparity compensation unit 244 and ARP unit 245 may perform the techniques of this disclosure alone or in combination with other units of video encoder 20.
如图17中所展示，视频编码器20接收待编码的视频帧(例如，纹理图像或深度图)内的视频数据(例如，视频数据块(例如，明度块、色度块或深度块))。视频数据存储器235可存储待由视频编码器20的组件编码的视频数据。可(例如)从视频源18获得存储在视频数据存储器235中的视频数据。DPB 264是存储参考视频数据以供视频编码器20用于编码视频数据(例如，在帧内或帧间译码模式中，也被称作帧内或帧间预测译码模式)的存储器缓冲器。视频数据存储器235和DPB 264可由多种存储器装置中的任一者形成，例如包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM)，或其它类型的存储器装置。视频数据存储器235和DPB 264可由同一存储器装置或单独的存储器装置提供。在各种实例中，视频数据存储器235可与视频编码器20的其它组件一起在芯片上，或相对于所述组件在芯片外。As shown in FIG. 17, video encoder 20 receives video data (e.g., video data blocks (e.g., luma blocks, chroma blocks, or depth blocks)) within a video frame to be encoded (e.g., a texture image or a depth map). Video data memory 235 may store video data to be encoded by components of video encoder 20. Video data stored in video data memory 235 may be obtained, for example, from video source 18. DPB 264 is a memory buffer that stores reference video data for use by video encoder 20 in encoding video data (e.g., in intra- or inter-coding modes, also known as intra- or inter-prediction coding modes). Video data memory 235 and DPB 264 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 235 and DPB 264 may be provided by the same memory device or separate memory devices. In various examples, video data memory 235 may be on-chip with the other components of video encoder 20, or off-chip relative to those components.
如图17中所展示,视频编码器20接收视频数据且将所述数据分割成视频块。此分割还可包含分割成切片、瓦片或其它更大单元,以及例如根据LCU及CU的四叉树结构的视频块分割。视频编码器20一般说明对待编码的视频切片内的视频块编码的组件。可将切片划分成多个视频块(且可能划分成被称作瓦片的视频块的集合)。As shown in FIG17 , video encoder 20 receives video data and partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning according to a quadtree structure of LCUs and CUs, for example. Video encoder 20 generally illustrates components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks called tiles).
预测处理单元241可基于误差结果(例如,译码速率及失真等级)针对当前视频块选择多种可能译码模式中的一者,例如,多种帧内译码模式中的一者或多种帧间译码模式中的一者。预测处理单元241可将所得的经帧内译码或经帧间译码块提供到求和器251以产生残差块数据,且提供到求和器262以重构经编码块以用作参考图片。Prediction processing unit 241 may select one of a plurality of possible coding modes for the current video block, e.g., one of a plurality of intra-coding modes or one of a plurality of inter-coding modes, based on the error results (e.g., coding rate and distortion level). Prediction processing unit 241 may provide the resulting intra-coded or inter-coded block to summer 251 to generate residual block data and to summer 262 to reconstruct the encoded block for use as a reference picture.
预测处理单元241内的帧内预测处理单元246相对于与待译码当前块在相同的帧或切片中的一或多个相邻块执行当前视频块的帧内预测性译码,以提供空间压缩。预测处理单元241内的运动估计单元242及运动与视差补偿单元244相对于一或多个参考图片(包含视图间参考图片)中的一或多个预测块执行当前视频块的帧间预测性译码(包含视图间译码)以例如提供时间和/或视图间压缩。Intra-prediction processing unit 246 within prediction processing unit 241 performs intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 242 and motion and disparity compensation unit 244 within prediction processing unit 241 perform inter-predictive coding (including inter-view coding) of the current video block relative to one or more prediction blocks in one or more reference pictures (including inter-view reference pictures) to provide, for example, temporal and/or inter-view compression.
运动估计单元242可经配置以根据用于视频序列的预定模式为视频切片确定帧间预测模式。运动估计单元242与运动与视差补偿单元244可高度集成,但出于概念目的单独地加以说明。由运动估计单元242执行的运动估计是产生运动向量的过程,所述运动向量估计视频块的运动。举例来说,运动向量可指示当前视频帧或图片内的视频块的PU相对于参考图片内的预测块的移位。Motion estimation unit 242 may be configured to determine an inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. Motion estimation unit 242 and motion and disparity compensation unit 244 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation performed by motion estimation unit 242 is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a prediction block within a reference picture.
预测块是发现在像素差方面紧密匹配待译码视频块的PU的块,像素差可由绝对差总和(SAD)、平方差总和(SSD)或其它差度量来确定。在一些实例中,视频编码器20可计算存储于DPB 264中的参考图片的子整数像素位置的值。举例来说,视频编码器20可内插四分之一像素位置、八分之一像素位置或参考图片的其它分数像素位置的值。因此,运动估计单元242可相对于全像素位置及分数像素位置执行运动搜索并且输出具有分数像素精度的运动向量。A prediction block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of squared difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of a reference picture stored in DPB 264. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference picture. Thus, motion estimation unit 242 may perform a motion search relative to full-pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
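The SAD and SSD cost metrics mentioned above can be written directly. A minimal sketch with blocks as flat integer-pel pixel lists (the fractional-pel interpolation described above is omitted):

```python
def sad(block, candidate):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(a - b) for a, b in zip(block, candidate))

def ssd(block, candidate):
    """Sum of squared differences; penalizes large errors more than SAD."""
    return sum((a - b) ** 2 for a, b in zip(block, candidate))
```

Motion estimation would evaluate one of these costs at each searched position and keep the candidate with the minimum value.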
运动估计单元242通过比较经帧间译码切片中的视频块的PU的位置与参考图片(包含时间或视图间参考图片)的预测块的位置来计算PU的运动向量和/或视差运动向量。如上文所描述,运动向量可以用于运动补偿预测,而视差运动向量可以用于视差补偿预测。参考图片可选自第一参考图片列表(列表0或RefPicList0)或第二参考图片列表(列表1或RefPicList1),其中的每一者识别存储在DPB 264中的一或多个参考图片。运动估计单元242将计算出的运动向量发送到熵编码单元257和运动与视差补偿单元244。Motion estimation unit 242 calculates the motion vector and/or disparity motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a prediction block of a reference picture (including a temporal or inter-view reference picture). As described above, motion vectors can be used for motion compensated prediction, while disparity motion vectors can be used for disparity compensated prediction. The reference picture can be selected from a first reference picture list (List 0 or RefPicList0) or a second reference picture list (List 1 or RefPicList1), each of which identifies one or more reference pictures stored in DPB 264. Motion estimation unit 242 sends the calculated motion vector to entropy encoding unit 257 and motion and disparity compensation unit 244.
通过运动与视差补偿单元244执行的运动和/或视差补偿可涉及基于通过运动估计(可能执行子像素精度的内插)确定的运动向量获取或产生预测块。在接收到当前视频块的PU的运动向量后,运动与视差补偿单元244可以即刻在参考图片列表中的一者中定位所述运动向量指向的预测块。视频编码器20通过从正被译码的当前视频块的像素值减去预测块的像素值从而形成像素差值来形成残差视频块。像素差值形成用于所述块的残差数据,且可包含明度和色度差分量两者。求和器251表示执行此减法运算的一或多个组件。运动与视差补偿单元244还可产生与视频块及视频切片相关联的语法元素以供视频解码器30在解码视频切片的视频块时使用。Motion and/or disparity compensation performed by motion and disparity compensation unit 244 may involve obtaining or generating a prediction block based on a motion vector determined by motion estimation (possibly performing interpolation with sub-pixel precision). Upon receiving the motion vector for the PU of the current video block, motion and disparity compensation unit 244 may locate the prediction block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being decoded, thereby forming pixel difference values. The pixel difference values form the residual data for the block and may include both luma and chroma difference components. Summer 251 represents the one or more components that perform this subtraction operation. Motion and disparity compensation unit 244 may also generate syntax elements associated with the video block and video slice for use by video decoder 30 when decoding the video block of the video slice.
视频编码器20(包含ARP单元245及运动及视差补偿单元244)可执行双向预测及ARP技术中的任一者，例如，本文中描述的视图间或时间ARP技术。具体来说，在本发明的一个实例中，视频编码器20可经配置以使用双向预测及视图间ARP来编码当前视频数据块。对于当前视频数据块，运动与视差补偿单元244可经配置以确定当前视频数据块的第一预测方向(例如，参考图片列表X)的时间运动信息，且使用针对第一预测方向所确定的时间运动信息而识别第二预测方向(例如，参考图片列表Y)的参考块，其中第二预测方向的参考块在当前视频数据块的不同存取单元中。以此方式，需要对运动信息及参考块的更少的存储器存取来编码当前视频块。Video encoder 20 (including ARP unit 245 and motion and disparity compensation unit 244) can perform any of bi-directional prediction and ARP techniques, such as the inter-view or temporal ARP techniques described herein. Specifically, in one example of the present disclosure, video encoder 20 may be configured to encode a current block of video data using bi-directional prediction and inter-view ARP. For the current block of video data, motion and disparity compensation unit 244 can be configured to determine temporal motion information for a first prediction direction (e.g., reference picture list X) for the current block of video data, and use the temporal motion information determined for the first prediction direction to identify a reference block for a second prediction direction (e.g., reference picture list Y), where the reference block for the second prediction direction is in a different access unit from the current block of video data. In this way, fewer memory accesses for motion information and reference blocks are required to encode the current block of video data.
作为如上文所描述由运动估计单元242和运动与视差补偿单元244执行的帧间预测的替代方案，帧内预测处理单元246可以对当前块执行帧内预测。明确地说，帧内预测处理单元246可以确定用来对当前块进行编码的帧内预测模式。在一些实例中，帧内预测处理单元246可以例如在单独的编码回合期间使用各种帧内预测模式编码当前视频块，并且帧内预测处理单元246(或在一些实例中为预测处理单元241)可以从经测试的模式中选择适当的帧内预测模式来使用。As an alternative to the inter-prediction performed by motion estimation unit 242 and motion and disparity compensation unit 244 as described above, intra-prediction processing unit 246 can perform intra-prediction on the current block. Specifically, intra-prediction processing unit 246 can determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 246 can encode the current video block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction processing unit 246 (or prediction processing unit 241 in some examples) can select an appropriate intra-prediction mode to use from among the tested modes.
举例来说,帧内预测处理单元246可以使用速率失真分析计算用于各种被测试的帧内预测模式的速率失真值,并且从所述被测试模式当中选择具有最佳速率失真特性的帧内预测模式。速率失真分析一般确定经编码块与经编码以产生所述经编码块的原始未经编码块之间的失真(或误差)的量,以及用于产生经编码块的位速率(即,位数目)。帧内预测处理单元246可根据用于各种经编码块的失真和速率计算比率,以确定哪个帧内预测模式对于所述块展现最佳速率-失真值。For example, the intra-prediction processing unit 246 can use rate-distortion analysis to calculate rate-distortion values for various tested intra-prediction modes and select the intra-prediction mode with the best rate-distortion characteristics from the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. The intra-prediction processing unit 246 can calculate a ratio based on the distortion and rate for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
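The rate-distortion selection described above amounts to minimizing the Lagrangian cost J = D + λ·R over the tested modes. A minimal sketch (the candidate tuples and values below are illustrative):

```python
def best_mode(candidates, lagrange_multiplier):
    """candidates: iterable of (mode_name, distortion, rate_in_bits).
    Returns the mode name minimizing J = D + lambda * R."""
    return min(candidates,
               key=lambda c: c[1] + lagrange_multiplier * c[2])[0]
```

A larger λ favors cheaper (lower-rate) modes, while a smaller λ favors lower distortion, which is why the same block can pick different modes at different operating points.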
在任何状况下,在选择用于块的帧内预测模式之后,帧内预测处理单元246可将指示块的选定帧内预测模式的信息提供到熵编码单元257。熵编码单元257可根据本发明的技术对指示所选帧内预测模式的信息进行编码。视频编码器20在所发射的位流中可包含配置数据,其可包含多个帧内预测模式索引表和多个经修改的帧内预测模式索引表(也称为码字映射表)、对用于各种块的上下文进行编码的定义,以及对最可能帧内预测模式、帧内预测模式索引表和经修改的帧内预测模式索引表的指示以用于所述上下文中的每一者。In any case, after selecting an intra-prediction mode for a block, intra-prediction processing unit 246 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 257. Entropy encoding unit 257 may encode the information indicating the selected intra-prediction mode according to the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream that may include multiple intra-prediction mode index tables and multiple modified intra-prediction mode index tables (also called codeword mapping tables), definitions of contexts for encoding various blocks, and indications of the most probable intra-prediction mode, intra-prediction mode index tables, and modified intra-prediction mode index tables to use for each of the contexts.
在预测处理单元241经由帧间预测或帧内预测产生当前视频块的预测块之后,视频编码器20通过从当前视频块减去预测块而形成残差视频块。残差块中的残差视频数据可包含在一或多个TU中并应用于变换处理单元252。变换处理单元252使用例如离散余弦变换(DCT)或概念上类似变换的变换将残差视频数据变换成残差变换系数。变换处理单元252可将残差视频数据从像素域转换到变换域,例如频域。After prediction processing unit 241 generates a prediction block for the current video block via inter-frame prediction or intra-frame prediction, video encoder 20 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be contained in one or more TUs and applied to transform processing unit 252. Transform processing unit 252 transforms the residual video data into residual transform coefficients using, for example, a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 252 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.
变换处理单元252可将所得变换系数发送到量化处理单元255。量化处理单元255量化变换系数以进一步减小位速率。量化过程可减少与变换系数中的一些或全部相关联的位深度。可通过调整量化参数来修改量化程度。在一些实例中,量化处理单元255可接着执行对包含经量化变换系数的矩阵的扫描。替代地,熵编码单元257可执行所述扫描。Transform processing unit 252 may send the resulting transform coefficients to quantization processing unit 255. Quantization processing unit 255 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the transform coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization processing unit 255 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 257 may perform the scan.
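The quantization performed by quantization processing unit 255 (and undone by inverse quantization processing unit 259 in the reconstruction loop) can be sketched as uniform scalar quantization with a step size. This simplification omits HEVC's exact QP-to-step mapping, scaling lists, and rounding offsets:

```python
def quantize(coefficient, qstep):
    """Map a transform coefficient to a quantization level (lossy step)."""
    return int(round(coefficient / qstep))

def dequantize(level, qstep):
    """Inverse quantization: reconstruct an approximation of the coefficient."""
    return level * qstep
```

The round trip shows where the bit-rate saving and the information loss both come from: a coefficient of 37 with step size 8 reconstructs to 40, not 37.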
在量化之后,熵编码单元257对经量化的变换系数进行熵编码。举例来说,熵编码单元257可执行上下文自适应可变长度译码(CAVLC)、上下文自适应二进制算术译码(CABAC)、基于语法的上下文自适应二进制算术译码(SBAC)、概率区间分割熵(PIPE)译码或另一熵编码方法或技术。在熵编码单元257进行的熵编码之后,可将经编码视频位流发射到视频解码器30,或将经编码位流存档以供稍后发射或由视频解码器30检索。熵编码单元257还可对正经译码当前视频切片的运动向量和其它语法元素进行熵编码。After quantization, entropy encoding unit 257 performs entropy encoding on the quantized transform coefficients. For example, entropy encoding unit 257 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. Following entropy encoding by entropy encoding unit 257, the encoded video bitstream may be transmitted to video decoder 30 or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 257 may also entropy encode the motion vectors and other syntax elements for the current video slice being coded.
反量化处理单元259和反变换处理单元260分别应用反量化和反变换以在像素域中重构残差块,例如以供稍后用作参考图片的参考块。运动与视差补偿单元244可以通过将残差块相加到参考图片列表中的一者内的参考图片中的一者的预测块来计算参考块。运动与视差补偿单元244还可将一或多个内插滤波器应用于经重构的残差块以计算子整数像素值用于运动估计。求和器262将经重构的残差块相加到由运动与视差补偿单元244产生的运动补偿预测块以产生参考块用于存储在DPB 264中。参考块可由运动估计单元242和运动与视差补偿单元244用作参考块以对后续视频帧或图片中的块进行帧间预测。Inverse quantization processing unit 259 and inverse transform processing unit 260 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference picture. Motion and disparity compensation unit 244 may calculate a reference block by adding the residual block to a prediction block of one of the reference pictures in one of the reference picture lists. Motion and disparity compensation unit 244 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 262 adds the reconstructed residual block to the motion compensated prediction block generated by motion and disparity compensation unit 244 to generate a reference block for storage in DPB 264. The reference block may be used by motion estimation unit 242 and motion and disparity compensation unit 244 as a reference block for inter-frame prediction of a block in a subsequent video frame or picture.
图18是说明可实施本发明中描述的技术的实例视频解码器30的框图。在图18的实例中,视频解码器30包含视频数据存储器278、熵解码单元280、预测处理单元281、反量化处理单元286、反变换处理单元288、求和器291及经解码图片缓冲器(DPB)292。预测处理单元281包含运动及视差补偿单元282、ARP单元283及帧内预测处理单元284。在一些实例中,视频解码器30可执行一般与相对于来自图17的视频编码器20描述的编码回合互逆的解码回合。FIG18 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG18 , video decoder 30 includes video data memory 278, entropy decoding unit 280, prediction processing unit 281, inverse quantization processing unit 286, inverse transform processing unit 288, summer 291, and decoded picture buffer (DPB) 292. Prediction processing unit 281 includes motion and disparity compensation unit 282, ARP unit 283, and intra-prediction processing unit 284. In some examples, video decoder 30 may perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG17 .
在各种实例中，视频解码器30的一或多个硬件单元可被分派任务以执行本发明的技术。例如，ARP单元283及运动及视差补偿单元282可单独地或与视频解码器30的其它单元组合地执行本发明的技术。In various examples, one or more hardware units of video decoder 30 may be tasked to perform the techniques of this disclosure. For example, ARP unit 283 and motion and disparity compensation unit 282 may perform the techniques of this disclosure alone or in combination with other units of video decoder 30.
视频数据存储器278可存储例如经编码视频位流等视频数据以由视频解码器30的组件解码。可从例如相机等本地视频源经由视频数据的有线或无线网络通信或通过存取物理数据存储媒体而获得存储在视频数据存储器278中的视频数据。视频数据存储器278可形成经译码图片缓冲器(CPB)，所述经译码图片缓冲器存储来自经编码视频位流的经编码视频数据。DPB 292是存储供视频解码器30解码视频数据使用(例如，在帧内或帧间译码模式中，还被称作帧内或帧间预测译码模式)的参考视频数据的DPB的一个实例。可通过多种存储器装置中的任一者形成视频数据存储器278及DPB 292，所述存储器装置例如为动态随机存取存储器(DRAM)，包含同步DRAM(SDRAM)、磁阻式RAM(MRAM)、电阻性RAM(RRAM)或其它类型的存储器装置。可通过同一存储器装置或单独存储器装置提供视频数据存储器278及DPB 292。在各种实例中，视频数据存储器278可与视频解码器30的其它组件在芯片上或相对于那些组件在芯片外。Video data memory 278 may store video data, such as an encoded video bitstream, for decoding by components of video decoder 30. The video data stored in video data memory 278 may be obtained from a local video source, such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. Video data memory 278 may form a coded picture buffer (CPB), which stores encoded video data from an encoded video bitstream. DPB 292 is an example of a DPB that stores reference video data used by video decoder 30 to decode video data (e.g., in intra- or inter-coding modes, also referred to as intra- or inter-prediction coding modes). Video data memory 278 and DPB 292 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 278 and DPB 292 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 278 may be on-chip with other components of video decoder 30 or off-chip relative to those components.
在解码过程期间,视频解码器30接收表示来自视频编码器20的经编码视频切片及相关联的语法元素的视频块的经编码视频位流。视频解码器30的熵解码单元280对所述位流进行熵解码以产生经量化系数、运动向量及其它语法元素。熵解码单元280将运动向量及其它语法元素转发到预测处理单元281。视频解码器30可在视频切片层级及/或视频块层级处接收所述语法元素。During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 280 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 280 forwards the motion vectors and other syntax elements to prediction processing unit 281. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.
在视频切片被译码为经帧内译码(I)切片时,预测处理单元281的帧内预测处理单元284可基于发信号通知的帧内预测模式及来自当前帧或图片的先前经解码块的数据而产生当前视频切片的视频块的预测数据。在将视频帧译码为经帧间译码(即,B或P)切片或经视图间译码切片时,预测处理单元281的运动及视差补偿单元282基于从熵解码单元280接收的运动向量、视差运动向量及其它语法元素而产生当前视频切片的视频块的预测块。可从包含视图间参考图片的参考图片列表中的一者内的参考图片中的一者产生预测块。视频解码器30可使用默认建构技术或任何其它技术基于存储在DPB 292中的参考图片而建构参考帧列表RefPicList0及RefPicList1。When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 284 of prediction processing unit 281 may generate prediction data for the video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice or an inter-view coded slice, motion and disparity compensation unit 282 of prediction processing unit 281 generates prediction blocks for the video block of the current video slice based on motion vectors, disparity motion vectors, and other syntax elements received from entropy decoding unit 280. The prediction block may be generated from one of the reference pictures within one of the reference picture lists that includes inter-view reference pictures. Video decoder 30 may construct reference frame lists RefPicList0 and RefPicList1 based on the reference pictures stored in DPB 292 using a default construction technique or any other technique.
运动与视差补偿单元282通过解析运动向量和其它语法元素确定用于当前视频切片的视频块的预测信息,并且使用所述预测信息产生用于正解码的当前视频块的预测块。举例来说,运动与视差补偿单元282使用所接收语法元素中的一些语法元素确定用于对视频切片的视频块进行译码的预测模式(例如,帧内预测或帧间预测)、帧间预测切片类型(例如,B切片、P切片和/或经视图间预测的切片)、切片的参考图片列表中的一或多者的建构信息、切片的每一经帧间编码的视频块的运动向量和/或视差运动向量、切片的每一经帧间译码的视频块的帧间预测状态,及用以解码当前视频切片中的视频块的其它信息。Motion and disparity compensation unit 282 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate a prediction block for the current video block being decoded. For example, motion and disparity compensation unit 282 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) for coding the video block of the video slice, an inter-prediction slice type (e.g., a B slice, a P slice, and/or an inter-view predicted slice), construction information for one or more of the slice's reference picture lists, a motion vector and/or a disparity motion vector for each inter-coded video block of the slice, an inter-prediction state for each inter-coded video block of the slice, and other information used to decode the video blocks in the current video slice.
Motion and disparity compensation unit 282 may also perform interpolation based on interpolation filters. Motion and disparity compensation unit 282 may use the interpolation filters used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of a reference block. In this case, motion and disparity compensation unit 282 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use those interpolation filters to generate the prediction block.
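The sub-integer-pixel interpolation described above can be illustrated with a one-dimensional half-sample filter. The tap values below are the HEVC luma half-pel coefficients (sum 64); the border handling (simple edge clamping) and single-stage rounding are simplified assumptions made for this sketch.

```python
def interp_half_pel(samples, i):
    """Interpolate the half-sample position between samples[i] and
    samples[i + 1] with an 8-tap filter.

    Taps are the HEVC luma half-pel filter; the result is normalized
    by the filter gain (64) with rounding and clipped to 8-bit range.
    """
    taps = [-1, 4, -11, 40, 40, -11, 4, -1]
    acc = 0
    for k, c in enumerate(taps):
        # clamp indices at the picture border (simple edge padding)
        idx = min(max(i - 3 + k, 0), len(samples) - 1)
        acc += c * samples[idx]
    # normalize (gain 64 -> shift by 6 with rounding offset 32) and clip
    return min(max((acc + 32) >> 6, 0), 255)
```

On a flat signal the filter returns the flat value unchanged; on a ramp it lands at the midpoint between the two neighboring integer samples, up to rounding.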
Inverse quantization processing unit 286 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 280. The inverse quantization process may include using a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform processing unit 288 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
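A minimal sketch of these two steps follows. The scale table mirrors the HEVC pattern (the quantization step roughly doubles every six QP steps); per-block scaling lists and bit-depth shifts are omitted, and a floating-point orthonormal DCT stands in for the normative inverse integer transform, so this is an illustration rather than a conforming implementation.

```python
import numpy as np

def dequantize(levels, qp):
    """Flat dequantization: scale doubles every 6 QP steps.

    The base table [40, 45, 51, 57, 64, 72] follows the HEVC levScale
    pattern; scaling lists and normalization shifts are omitted.
    """
    scale = [40, 45, 51, 57, 64, 72][qp % 6] << (qp // 6)
    return levels * scale

def inverse_transform(coeffs):
    """Inverse 2-D DCT-II via a separable orthonormal 1-D basis."""
    n = coeffs.shape[0]
    k = np.arange(n)
    # rows index frequency, columns index sample position
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] *= 1 / np.sqrt(2)
    basis *= np.sqrt(2 / n)
    # separable inverse: rows then columns
    return basis.T @ coeffs @ basis
```

A DC-only coefficient block inverts to a constant pixel block, which is a quick sanity check on the basis normalization.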
Video decoder 30 (including ARP unit 283 and motion and disparity compensation unit 282) may perform any of the bi-directional prediction and/or ARP techniques described herein, e.g., the inter-view or temporal ARP techniques. Specifically, in one example of this disclosure, video decoder 30 may receive a current block of video data encoded using bi-directional prediction and inter-view ARP. The encoded block of video data may be stored in video data memory 278. For the encoded block of video data, motion and disparity compensation unit 282 may be configured to determine temporal motion information for a first prediction direction (e.g., corresponding to reference picture list X) of the encoded block of video data, and to use the temporal motion information determined for the first prediction direction to identify a reference block for a second prediction direction (e.g., corresponding to reference picture list Y), where the reference block for the second prediction direction is in a different access unit than the current block of video data. In this manner, fewer memory accesses to motion information and reference blocks are needed to decode the encoded video block.
After motion and disparity compensation unit 282 generates the prediction block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform processing unit 288 with the corresponding prediction block generated by motion and disparity compensation unit 282. Summer 291 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in DPB 292, which stores reference pictures used for subsequent motion compensation. DPB 292 also stores decoded video for later presentation on a display device (e.g., display device 32 of FIG. 1).
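The summation performed by summer 291 can be sketched as follows; the subsequent loop filtering is omitted, and nested lists stand in for pixel blocks.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Form the decoded block: prediction + residual, clipped to the
    sample range implied by the bit depth (the summer-291 operation).
    """
    lo, hi = 0, (1 << bit_depth) - 1
    return [[min(max(p + r, lo), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```

The clip matters: residuals may push reconstructed samples outside the valid range, and the decoded sample must be clamped before it is stored in the DPB.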
FIG. 19 is a flowchart illustrating an example ARP method for encoding a video block according to the techniques described in this disclosure. The techniques of FIG. 19 may be performed by any combination of the hardware structures of video encoder 20, including motion and disparity compensation unit 244 and ARP unit 245.
In one example of this disclosure, video encoder 20 may be configured to encode a block of video data using ARP and bi-directional prediction. In this example, the bi-directional prediction includes temporal prediction for a first prediction direction (e.g., for reference picture list X) and inter-view prediction for a second prediction direction (e.g., for reference picture list Y). Motion and disparity compensation unit 244 may be configured to determine temporal motion information for the first prediction direction of the block of video data (1900). ARP unit 245 may be configured to use the determined temporal motion information to identify a reference block for the first prediction direction (1910), and to use the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction, different from the first prediction direction (1920). The reference block may be in a different access unit than the access unit of the block of video data. ARP unit 245 may further be configured to perform advanced residual prediction using the identified reference blocks for the first prediction direction and the second prediction direction (1930).
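Steps 1900-1930 can be sketched as follows. The `BlockRef` record and the integer motion-vector arithmetic (no sub-pel, no view-order indexing) are simplifying assumptions, but the sketch shows the key point of the technique: the temporal motion information of the first prediction direction is reused on the inter-view (base-view) block, so the second direction's ARP reference block lands in a different access unit than the current block.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockRef:
    view: int         # view index
    access_unit: int  # access unit (time instance)
    x: int
    y: int

def identify_arp_blocks(cur, tmv, dmv):
    """Identify ARP reference blocks for a bi-predicted block.

    cur: current block position
    tmv: temporal motion vector of direction X as (dx, dy, d_au)
    dmv: disparity motion vector of direction Y as (dx, dy, d_view)
    """
    dx, dy, d_au = tmv
    vx, vy, d_view = dmv
    # direction-X temporal reference: same view, different access unit
    ref_x = BlockRef(cur.view, cur.access_unit + d_au,
                     cur.x + dx, cur.y + dy)
    # direction-Y inter-view reference: base view, same access unit
    base = BlockRef(cur.view + d_view, cur.access_unit,
                    cur.x + vx, cur.y + vy)
    # reuse the direction-X temporal motion on the base-view block,
    # placing the direction-Y ARP reference in a different access unit
    ref_y = BlockRef(base.view, base.access_unit + d_au,
                     base.x + dx, base.y + dy)
    return ref_x, base, ref_y
```

Because a single temporal motion vector drives both directions, the decoder fetches one set of temporal motion information rather than two, which is the memory-access saving the text describes.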
In other examples of this disclosure, motion and disparity compensation unit 244 may be configured to determine disparity motion information for the second prediction direction of the first encoded block of video data. In addition, ARP unit 245 may be configured to use the determined temporal motion information to identify a first reference block for the first prediction direction, where the first reference block is in a second access unit of the first view. ARP unit 245 may further be configured to use the determined temporal motion information to identify a second reference block for the second prediction direction, and to use the determined temporal motion information and the determined disparity motion information to identify a third reference block for the second prediction direction, where the third reference block is in a third access unit of a second view.
FIG. 20 is a flowchart illustrating an example ARP method for decoding a video block according to the techniques described in this disclosure. The techniques of FIG. 20 may be performed by any combination of the hardware structures of the video decoder, ARP unit 283, and motion and disparity compensation unit 282.
In one example of this disclosure, video decoder 30 may be configured to store a first encoded block of video data in a first access unit of a first view, where the first encoded block of video data is encoded using advanced residual prediction and bi-directional prediction (2000). The bi-directional prediction may include temporal prediction for a first prediction direction and inter-view prediction for a second prediction direction.
Motion and disparity compensation unit 282 may be configured to determine temporal motion information for the first prediction direction of the first encoded block of video data (2010). ARP unit 283 may be configured to determine disparity motion information for the second prediction direction of the first encoded block of video data (2020), and to use the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction, different from the first prediction direction (2030). The reference block may be in a different access unit than the first access unit. ARP unit 283 may further be configured to perform advanced residual prediction on the first encoded block of video data using the identified reference block for the second prediction direction (2040).
In another example of this disclosure, ARP unit 283 may be configured to use the determined temporal motion information to identify a reference block for the first prediction direction, and to perform advanced residual prediction on the first encoded block of video data using the identified reference block for the first prediction direction. ARP unit 283 may further be configured to use the determined temporal motion information to identify a second reference block for the second prediction direction, and to use the determined temporal motion information and the determined disparity motion information to identify a third reference block for the second prediction direction, where the third reference block is in a third access unit of a second view. The first reference block for the first prediction direction is the same as the second reference block for the second prediction direction.
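Once the reference blocks are identified, the residual predictor for the inter-view direction is formed as in the sketch below: the difference between the base-view block and its temporal reference, scaled by an ARP weighting factor, is added to the inter-view predictor. The weighting factors 0, 0.5, and 1 follow 3D-HEVC; the flat lists standing in for pixel blocks and the rounding/clipping details are simplifying assumptions.

```python
def arp_predictor(interview_pred, base_block, base_ref, weight=0.5):
    """Advanced residual prediction for the inter-view direction.

    interview_pred: inter-view predictor samples (base view, same AU)
    base_block:     base-view block aligned with the predictor
    base_ref:       temporal reference of the base-view block
    weight:         ARP weighting factor (0, 0.5, or 1 in 3D-HEVC)
    """
    return [min(max(int(p + weight * (b - r)), 0), 255)
            for p, b, r in zip(interview_pred, base_block, base_ref)]
```

A zero weight disables the residual contribution and reduces the predictor to plain inter-view prediction, which is how ARP is switched off per block.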
In another example of this disclosure, video decoder 30 may be configured to decode the first encoded block of video data using the identified reference block for the first prediction direction and the identified reference block for the second prediction direction.
In another example of this disclosure, video decoder 30 may be configured to decode the first encoded block of video data using one of block-level advanced residual prediction or prediction-unit-level advanced residual prediction to produce residual video data, and to decode the residual data using bi-directional prediction, the identified reference block for the first prediction direction, and the identified reference block for the second prediction direction to produce a decoded block of video data.
In another example of this disclosure, video decoder 30 may be further configured to store a second encoded block of video data in a fourth access unit of a third view, where the second encoded block of video data is encoded using advanced residual prediction and bi-directional prediction. The bi-directional prediction may include temporal prediction for a third prediction direction and inter-view prediction for a fourth prediction direction.
Motion and disparity compensation unit 282 may be configured to determine temporal motion information for the first prediction direction of the first encoded block of video data. ARP unit 283 may be configured to use the determined temporal motion information to identify a reference block for the first prediction direction. ARP unit 283 may further use the determined temporal motion information for the first prediction direction to identify a reference block for the second prediction direction, different from the first prediction direction, where the reference block is in a different access unit than the first access unit. ARP unit 283 may also perform advanced residual prediction on the first encoded block of video data using the identified reference blocks for the first prediction direction and the second prediction direction.
In another example of this disclosure, motion and disparity compensation unit 282 may be configured to determine disparity motion information for the second prediction direction of the first encoded block of video data. ARP unit 283 may be configured to use the determined temporal motion information to identify a first reference block for the first prediction direction, where the first reference block is in a second access unit of the first view. ARP unit 283 may further be configured to use the determined temporal motion information and the determined disparity motion information to identify a second reference block for the second prediction direction, where the second reference block is in a third access unit of a second view.
In another example of this disclosure, video decoder 30 may be configured to decode the first encoded block of video data using the identified reference block for the first prediction direction and the identified reference block for the second prediction direction. Video decoder 30 may further be configured to decode the first encoded block of video data using one of block-level advanced residual prediction or prediction-unit-level advanced residual prediction to produce residual video data, and to decode the residual data using bi-directional prediction, the identified reference block for the first prediction direction, and the identified reference block for the second prediction direction to produce a decoded block of video data.
In some examples, one or more aspects of the techniques described in this disclosure may be performed by an intermediate network device, such as a media-aware network element (MANE), a stream adaptation processor, a splicing processor, or an editing processor. For example, such an intermediate device may be configured to generate or receive any of the variety of signaling described in this disclosure.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperative hardware units, including one or more processors as described above. Various examples have been described. These and other examples are within the scope of the following claims.
Claims (14)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201461926290P | 2014-01-11 | 2014-01-11 | |
| US61/926,290 | 2014-01-11 | ||
| US14/592,633 US9967592B2 (en) | 2014-01-11 | 2015-01-08 | Block-based advanced residual prediction for 3D video coding |
| US14/592,633 | 2015-01-08 | ||
| PCT/US2015/010878 WO2015106141A1 (en) | 2014-01-11 | 2015-01-09 | Block-based advanced residual prediction for 3d video coding |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1223757A1 HK1223757A1 (en) | 2017-08-04 |
| HK1223757B true HK1223757B (en) | 2020-05-15 |