
HK1225881B - Simplification of segment-wise dc coding of large prediction blocks in 3d video coding - Google Patents


Info

Publication number: HK1225881B
Application number: HK16113850.4A
Authority: HK (Hong Kong)
Prior art keywords: block, intra, samples, blocks, predicting
Other languages: Chinese (zh)
Other versions: HK1225881A1 (en)
Inventors: 刘宏斌 (Hongbin Liu), 陈颖 (Ying Chen)
Original Assignee: 高通股份有限公司 (Qualcomm Incorporated)
Application filed by 高通股份有限公司 (Qualcomm Incorporated)
Priority to HK16113850.4A
Publication of HK1225881A1
Publication of HK1225881B

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Description

Simplification of segment-wise DC coding of large prediction blocks in 3D video coding
Technical Field
This disclosure relates to video coding, and more particularly, to segment-wise DC coding (SDC) in a three-dimensional (3D) video coding process.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, tablet computers, smart phones, Personal Digital Assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, set-top box devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard, and extensions of such standards. By implementing such video compression techniques, video devices may transmit, receive, and store digital video information more efficiently.
An encoder-decoder (codec) applies video compression techniques to perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice may be partitioned into video blocks, which may also be referred to as Coded Treeblocks (CTBs), Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in inter-coded (P or B) slices of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may alternatively be referred to as frames.
Spatial or temporal prediction produces a predictive block for the block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the spatial domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
A multiview coding bitstream may be generated by encoding views, e.g., from multiple camera perspectives. Multiview coding may allow a decoder to select different views, or possibly render multiple views. Additionally, some three-dimensional (3D) video techniques and standards that have been developed, or are in development, make use of multiview coding aspects. For example, in some 3D video coding processes, different views may be used to transmit left and right eye views to support 3D video. Other 3D video coding processes may use multiview plus depth coding. In a multiview plus depth coding process, such as the process defined by the 3D-HEVC extension of HEVC, a 3D video bitstream may contain multiple views that include not only texture view components but also depth view components. For example, a given view may comprise a texture view component and a depth view component. The texture view components and depth view components may be used together to construct 3D video data.
Disclosure of Invention
In general, techniques are described for simplifying SDC coding of large intra-predicted blocks (e.g., 64 × 64 blocks) in 3D video coding processes (e.g., processes that conform to the 3D-HEVC extension of HEVC). In some examples, the techniques may include processing a 64 × 64 intra-prediction block as four 32 × 32 intra-prediction blocks in intra SDC. Processing a larger intra-predicted block as multiple smaller intra-predicted blocks in intra SDC may reduce the maximum buffer size requirement during intra SDC.
In one example, this disclosure describes a method of decoding depth data for video coding, the method comprising: for an intra prediction mode for a first block of depth data, intra predicting samples of depth data for a second block, wherein the second block comprises four blocks that each have the same size, equal to one-fourth of the size of the first block of depth data, and that correspond to the top-left, top-right, bottom-left, and bottom-right blocks of the first block of depth data; receiving residual data for the first block of depth data, the residual data indicating differences between pixel values of the first block and the intra-predicted samples of the second block; and reconstructing the first block of depth data based on the intra-predicted samples of the second block and the residual data.
In another example, this disclosure describes a method of encoding depth data for video coding, the method comprising: for an intra prediction mode for a first block of depth data, intra predicting samples of depth data for a second block, wherein the second block comprises four blocks that each have the same size, equal to one-fourth of the size of the first block of depth data, and that correspond to the top-left, top-right, bottom-left, and bottom-right blocks of the first block of depth data; generating residual data for the first block based on differences between pixel values of the first block and the intra-predicted samples of the second block; and encoding the first block of depth data based on the intra prediction mode and the residual data.
In another example, this disclosure describes a device for coding depth data for video coding, the device comprising: a memory storing depth data for video coding; and one or more processors configured to: for an intra prediction mode for a first block of depth data, intra predict samples of depth data for a second block, wherein the second block comprises four blocks that each have the same size, equal to one-fourth of the size of the first block of depth data, and that correspond to the top-left, top-right, bottom-left, and bottom-right blocks of the first block of depth data; and code the first block of depth data based on the intra prediction mode and residual data for the first block that indicates differences between pixel values of the first block and the intra-predicted samples of the second block.
The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques described in this disclosure will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a diagram illustrating intra prediction modes for HEVC.
FIG. 2 is a diagram illustrating neighboring samples used for intra prediction modes in HEVC.
FIG. 3 is a block diagram illustrating an example video coding system that may utilize techniques of this disclosure.
FIG. 4 is a diagram illustrating an example of one wedgelet partition pattern suitable for coding an 8 × 8 block of pixel samples.
FIG. 5 is a diagram illustrating an example of one contour partition pattern suitable for coding an 8 × 8 block of pixel samples.
FIG. 6 is a block diagram illustrating an example video encoder that may implement the techniques of this disclosure.
FIG. 7 is a block diagram illustrating an example video decoder that may implement the techniques of this disclosure.
FIG. 8 is a diagram illustrating processing of a 64 × 64 intra-predicted block into four smaller 32 × 32 intra-predicted blocks.
FIG. 9 is a flow diagram illustrating a method for encoding a 64 × 64 intra depth block, according to an example of this disclosure.
FIG. 10 is a flow diagram illustrating a method for decoding a 64 × 64 intra depth block, according to an example of this disclosure.
Detailed Description
This disclosure describes techniques for simplifying segment-wise DC coding (SDC) of large intra-predicted blocks (e.g., 64 × 64 blocks) in 3D video coding processes, such as 3D-HEVC. In the HEVC main specification, the maximum intra prediction size is 32 × 32. However, in the intra SDC mode of 3D-HEVC, the maximum intra prediction size for planar mode is 64 × 64. In addition, in Liu et al., JCT3V-F0126, "CE5 related: Generic SDC for all Intra modes in 3D-HEVC" (Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Geneva, Switzerland, 25 October to 1 November 2013), it has been proposed that, in depth coding, SDC can be applied to the additional depth intra prediction modes as well as the original HEVC intra prediction modes.
In this disclosure, expressions such as 32 × 32, 64 × 64, or, more generally, N × N, when applied to pixels, reference samples, or prediction samples, may refer to the number of pixels, reference samples, or prediction samples associated with a block of video data, i.e., a total of N² pixels or samples, where the block includes N pixels or samples in one dimension (e.g., the horizontal dimension) and N pixels or samples in the other dimension (e.g., the vertical dimension).
With the scheme in JCT3V-F0126, the maximum intra prediction size for all HEVC intra prediction modes is 64 × 64. Thus, both the scheme in 3D-HEVC and the scheme in JCT3V-F0126 increase the maximum buffer size needed for intra prediction relative to HEVC. In some examples, this disclosure describes techniques for simplifying 64 × 64 SDC coding in 3D-HEVC. To simplify SDC coding of a larger intra-predicted block (e.g., a 64 × 64 block) in a 3D video coding process (e.g., 3D-HEVC), this disclosure describes techniques that may include processing the larger intra-predicted block (e.g., the 64 × 64 intra-predicted block) as multiple smaller intra-predicted blocks (e.g., four 32 × 32 intra-predicted blocks) in intra SDC. In this way, in intra SDC, a 64 × 64 intra-predicted block is processed as four 32 × 32 intra-predicted sub-blocks using the HEVC intra prediction modes. Processing a larger intra-predicted block as multiple smaller intra-predicted blocks in intra SDC may reduce the maximum buffer size requirement during intra SDC.
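As a rough illustration of this sub-block processing, the following Python sketch (hypothetical helper names; not code from the 3D-HTM reference software) predicts a 64 × 64 block by invoking an ordinary intra prediction routine once per 32 × 32 quadrant, so that no prediction buffer larger than 32 × 32 is ever required:

```python
# Hedged sketch (not 3D-HTM code): intra-predict a 64x64 SDC block as
# four 32x32 quadrants so no buffer larger than 32x32 is needed.

def quadrants(size):
    """Yield (x0, y0) offsets of the four half-size sub-blocks in the
    order top-left, top-right, bottom-left, bottom-right."""
    half = size // 2
    for y0 in (0, half):
        for x0 in (0, half):
            yield x0, y0

def intra_predict_as_subblocks(predict_sub, size=64):
    """Fill a size x size prediction by running the ordinary intra
    prediction routine (predict_sub) once per half-size sub-block."""
    half = size // 2
    pred = [[0] * size for _ in range(size)]
    for x0, y0 in quadrants(size):
        sub = predict_sub(x0, y0, half)      # half x half sample array
        for y in range(half):
            for x in range(half):
                pred[y0 + y][x0 + x] = sub[y][x]
    return pred

# Stand-in predictor that fills each 32x32 sub-block with a constant,
# just to exercise the plumbing.
pred = intra_predict_as_subblocks(lambda x0, y0, n: [[128] * n for _ in range(n)])
assert len(pred) == 64 and all(len(row) == 64 for row in pred)
```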
In SDC, a video encoder generates delta DC residual values to represent differences between pixels of a coded Prediction Unit (PU) or PU partition of a depth Coding Unit (CU) and predicted samples of the predicted PU or PU partition. A PU may have a single partition or two or more partitions defined according to a partitioning mode, such as a Depth Map Modeling (DMM) mode. In SDC, the delta DC value is a single value representing the difference between the average of the pixels of the PU or partition and the average of the predicted samples of the predicted PU or partition. To reconstruct a PU or PU partition, a single delta DC value is summed with the values of each of the predicted samples of the predicted PU or PU partition.
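The delta DC mechanism itself is simple enough to sketch directly. The following illustrative Python is a simplification (the normative 3D-HEVC process also involves details such as the optional depth lookup table described later); it computes one delta DC value per partition and reconstructs the partition by adding that value to every predicted sample:

```python
# Hedged sketch of the SDC delta DC idea; the normative 3D-HEVC process
# differs in detail (e.g., the optional DLT mapping described later).

def partition_mean(samples, mask):
    """Integer mean over the samples where mask is 1."""
    vals = [samples[y][x]
            for y in range(len(samples))
            for x in range(len(samples[0])) if mask[y][x]]
    return sum(vals) // len(vals)

def encode_delta_dc(orig, pred, mask):
    """One delta DC per partition: mean(original) - mean(prediction)."""
    return partition_mean(orig, mask) - partition_mean(pred, mask)

def reconstruct_partition(pred, mask, delta):
    """Add the single delta DC to every predicted sample of the partition."""
    return [[pred[y][x] + delta if mask[y][x] else pred[y][x]
             for x in range(len(pred[0]))]
            for y in range(len(pred))]

orig = [[100, 102], [98, 100]]
pred = [[90, 90], [90, 90]]
mask = [[1, 1], [1, 1]]                  # a single-partition PU
d = encode_delta_dc(orig, pred, mask)    # 100 - 90 = 10
assert reconstruct_partition(pred, mask, d) == [[100, 100], [100, 100]]
```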
In this section, video coding standards and HEVC techniques related to this disclosure are reviewed. Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. The latest joint draft of MVC is described in "Advanced video coding for generic audiovisual services" (ITU-T Recommendation H.264), March 2010.
Furthermore, there is a newly developed video coding standard, High Efficiency Video Coding (HEVC), developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A recent draft of the HEVC standard, JCTVC-L1003 (Benjamin Bross, Woo-Jin Han, Jens-Rainer Ohm, Gary Sullivan, Ye-Kui Wang, Thomas Wiegand, "High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Geneva, Switzerland, 14-23 January 2013) ("HEVC WD10"), is incorporated herein by reference in its entirety and is available from the following link:
http://phenix.it-sudparis.eu/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip
fig. 1 is a diagram illustrating intra prediction modes for HEVC. Fig. 1 generally illustrates prediction directions associated with various directional intra prediction modes that may be used for intra coding in HEVC. In current HEVC, for example as described in HEVC WD10, for the luma component of each Prediction Unit (PU), the intra prediction method is utilized in conjunction with 33 directional (angular) prediction modes (indexed from 2 to 34), DC mode (indexed with 1), and planar mode (indexed with 0), as shown in fig. 1.
In planar mode (indexed with 0), prediction is performed using a so-called "plane" function to determine predictor values for each of the pixels within a block of video data (e.g., a PU). According to the DC mode (indexed with 1), prediction is performed using an average of the values of the neighboring reference samples to determine a predictor value for each of the pixels within the block. According to a directional prediction mode, prediction is performed based on reconstructed pixels of neighboring blocks along a particular direction (as indicated by the mode). In general, the tail of each arrow shown in fig. 1 represents the relative one of the neighboring pixels from which a value is retrieved, while the head of the arrow represents the direction in which the retrieved value is propagated to form a predictive block.
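As a concrete example of one of these modes, a bare-bones version of the DC mode could look like the following sketch; HEVC's additional edge filtering for luma blocks is omitted, and the helper name is illustrative:

```python
# Rough sketch of the DC mode idea (mode index 1): every pixel of the
# predicted block gets the average of the reconstructed neighboring
# samples. HEVC's edge filtering for luma is omitted here.

def dc_predict(above, left, size, bit_depth=8):
    """Return a size x size predicted block from the `size` above
    neighbors and `size` left neighbors."""
    neighbors = above[:size] + left[:size]
    if not neighbors:                        # no neighbors available
        return [[1 << (bit_depth - 1)] * size for _ in range(size)]
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * size for _ in range(size)]

pred = dc_predict(above=[100, 104, 108, 112], left=[96, 96, 100, 100], size=4)
assert pred[0][0] == 102    # (816 + 4) // 8
```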
For the HEVC intra prediction modes, the video encoder and/or video decoder generates a pixel-specific predictor value for each pixel in the PU using the various modes discussed above, e.g., using the neighboring samples of the PU for modes 2 to 34. The video encoder determines residual values for the video block based on the differences between the actual depth values of the pixels of the block and the predictor values, and provides the residual values to the video decoder. According to HEVC WD10, a video encoder transforms the residual values and quantizes the transform coefficients, and may also entropy encode the quantized transform coefficients. A video decoder determines reconstructed values for the pixels of the block (e.g., after entropy decoding, inverse quantization, and inverse transformation) by adding the residual values to the predictor values. Further details regarding HEVC intra prediction modes are specified in HEVC WD10. In SDC, a single delta DC residual value is coded for each predicted PU or partition, and the delta DC residual value is not transformed or quantized.
Fig. 2 is a diagram illustrating neighboring samples used for intra prediction modes in HEVC. As shown in fig. 2, the various directional intra-prediction modes may rely, for the pixels of a current prediction block, on spatially neighboring samples or combinations of such neighboring samples. In particular, in the intra prediction process, the below-left, left, above, and above-right neighboring reconstructed samples (if available) are used, as shown in fig. 2. The neighboring samples may be obtained from neighboring blocks that are spatially adjacent to the current block to be intra coded, e.g., within the same picture or view.
In JCT-3V, two HEVC extensions are being developed: multiview extension (MV-HEVC) and 3D video extension (3D-HEVC). The latest version of the reference software "3D-HTM version 9.0" for 3D-HEVC is incorporated herein by reference in its entirety and may be downloaded from the following links:
[3D-HTM version 9.0 ]:
https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-9.0/
A recent draft of 3D-HEVC is presented in JCT3V-F1001-v2 (Gerhard Tech, Krzysztof Wegner, Ying Chen, and Sehoon Yea, "3D-HEVC Draft Text 2," Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 6th Meeting: Geneva, Switzerland, 25 October to 1 November 2013) (hereinafter "F1001" or "3D-HEVC WD"), which is incorporated herein by reference in its entirety and is available from the following link:
http://phenix.it-sudparis.eu/jct2/doc_end_user/documents/6_Geneva/wg11/JCT3V-F1001-v2.zip
In 3D-HEVC, as defined in the 3D-HEVC WD mentioned above, each access unit contains multiple pictures, and each of the pictures in each access unit has a unique view identification (id) or view order index. However, the depth picture and the texture picture of the same view may have different layer ids.
Depth coding in 3D video coding will now be described. The 3D video data is represented using a multiview video plus depth format, where captured views (textures) are associated with corresponding depth maps. In 3D video coding, texture and depth maps are coded and multiplexed into a 3D video bitstream. The depth map is coded as grayscale video, where luma samples represent depth values, and conventional intra and inter coding methods may be applied to depth map coding.
The depth map may be characterized by sharp edges and constant regions. Due to different statistics of depth map samples, different coding schemes are designed for depth maps based on 2D video codecs. In a multiview plus depth coding process, a view may include a texture component and a depth component. Depth Coding Units (CUs) in a depth component may be inter-coded or intra-coded. A depth CU may be divided into one or more PUs, and a PU may be divided into one or more partitions.
Partitions may be intra-predicted or inter-predicted, and in some examples, the depth residual may be coded using segment-wise DC coding (SDC). In SDC, a delta DC residual value representing the difference between a coded PU or PU partition and the corresponding intra- or inter-predicted PU or partition may be coded. In particular, the delta DC value may be a single value for an entire PU or PU partition. The single value may represent the difference between the average of the pixel values of the coded PU or partition and the average of the prediction samples of the inter- or intra-predicted PU or partition.
Fig. 3 is a block diagram illustrating an example video encoding and decoding system 10 that may be configured to utilize various techniques of this disclosure, such as techniques for simplifying segment-wise DC coding (SDC) of large intra-predicted blocks, such as 64 × 64 blocks, in a 3D video coding process, such as 3D-HEVC. In some examples, video encoder 20 and/or video decoder 30 of system 10 may be configured to process a larger intra-predicted block (e.g., a 64 × 64 intra-predicted block) as four smaller intra-predicted blocks (e.g., four 32 × 32 intra-predicted blocks) in intra SDC. In this way, in intra SDC, a 64 × 64 intra prediction for an HEVC intra prediction mode is processed as four 32 × 32 intra predictions. In some cases, processing a larger intra-predicted block as multiple smaller intra-predicted blocks in intra SDC may reduce the maximum buffer size requirement during intra SDC for encoder 20 and/or decoder 30.
As shown in fig. 3, system 10 includes a source device 12 that provides encoded video data to be decoded by a destination device 14 at a later time. In particular, source device 12 provides video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (such as so-called "smart" phones), so-called "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium (e.g., a transmission channel) to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time.
The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may include routers, switches, base stations, or any other apparatus that may be used to facilitate communications from source device 12 to destination device 14.
In some examples, the encoded data may be output from output interface 22 to a computer-readable storage medium, such as a non-transitory computer-readable storage medium, i.e., a data storage device. Similarly, encoded data may be accessed from a storage device through an input interface. The storage device may include any of a variety of distributed or locally accessed non-transitory data storage media, such as a hard drive, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12.
Destination device 14 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, a Network Attached Storage (NAS) device, or a local disk drive. Destination device 14 may access the encoded video data via any standard data connection, including an internet connection. Such a connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
The techniques of this disclosure may be applied to video coding to support any of a variety of wired or wireless multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions (e.g., dynamic adaptive streaming over HTTP (DASH)), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 3, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply techniques for simplified incremental DC coding for depth coding in a 3D video coding process, such as 3D-HEVC. In other examples, the source device and destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
The illustrated system 10 of fig. 3 is merely one example. The techniques described in this disclosure may be performed by digital video encoding and/or decoding devices. Although the techniques of this disclosure are generally performed by video encoder 20 and/or video decoder 30, the techniques may also be performed by a video encoder/decoder (often referred to as a "codec"). Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14, for example, for video streaming, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called smartphones, tablets, or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output onto computer-readable medium 16 through output interface 22.
Computer-readable medium 16 may include a transitory medium, such as a wireless broadcast or a wired network transmission, or a data storage medium (i.e., a non-transitory storage medium). In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, such as via a network transmission. Similarly, a computing device of a media production facility (e.g., a disc stamping facility) may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media in various forms.
This disclosure may generally refer to video encoder 20 "signaling" certain information to another device, such as video decoder 30. It should be understood, however, that video encoder 20 may signal information by associating certain syntax elements with various encoded portions of video data. That is, video encoder 20 may "signal" the data by storing certain syntax elements into headers or payloads of various encoded portions of the video data. In some cases, such syntax elements may be encoded and stored (e.g., to computer-readable medium 16) prior to being received and decoded by video decoder 30. Thus, the term "signaling" may generally refer to communication of syntax or other data for decoding compressed video data, whether such communication occurs in real-time or near real-time or over a period of time, such as may occur when syntax elements are stored onto media at the time of encoding, and then may be retrieved by a decoding device at any time after storage onto this media.
Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20 that is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units (e.g., GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, a projection device, or another type of display device.
Although not shown in fig. 3, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. As one example, the MUX-DEMUX unit may conform to the ITU h.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder or decoder circuits, as applicable, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (codec). Devices that include video encoder 20 and/or video decoder 30 may include integrated circuits, microprocessors, and/or wireless communication devices, such as cellular telephones.
Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the HEVC standard, and more particularly, the 3D-HEVC extension of the HEVC standard, as described, for example, in document F1001, the 3D-HEVC WD mentioned in this disclosure. HEVC presumes several additional capabilities of video coding devices relative to devices configured to perform coding according to other processes, such as ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HEVC Test Model (HM) may provide as many as thirty-five intra-prediction encoding modes.
Some basic aspects of HEVC will now be discussed. In general, HEVC specifies that a video picture (or "frame") may be divided into a series of largest coding units referred to as Coding Tree Units (CTUs). A CTU includes corresponding luma and chroma components, referred to as Coded Tree Blocks (CTBs), such as luma CTB and chroma CTB, that include luma and chroma samples, respectively. Syntax data within the bitstream may define the size of the CTU (the largest coding unit in terms of number of pixels). A slice includes a plurality of consecutive CTBs in coding order. A picture may be partitioned into one or more slices. Each CTB may be split into Coding Units (CUs) according to a quadtree partitioning structure. In general, a quadtree data structure contains one node per CU, where the root node corresponds to a CTB. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.
Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in a quadtree may include a split flag that indicates whether a CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined in a recursive manner and may depend on whether the CU is split into sub-CUs. If a CU does not split further, it is called a leaf CU. The four sub-CUs of a leaf CU may also be referred to as leaf CUs even if there is no explicit split of the original leaf CU. For example, if a CU of size 16 × 16 is not further split, then the four 8 × 8 sub-CUs will also be referred to as leaf CUs, even though the 16 × 16CU is never split.
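The recursion implied by these split flags can be sketched as follows (a schematic Python model, not the normative syntax parsing; read_split_flag stands in for entropy decoding of a split flag):

```python
# Sketch of CU quadtree recursion driven by split flags.

def decode_cu_tree(x0, y0, size, min_cu, read_split_flag, leaves):
    """Recursively split a CU at (x0, y0) of width/height `size` into
    four equal sub-CUs until a split flag is 0 or min_cu is reached."""
    if size > min_cu and read_split_flag():
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                decode_cu_tree(x0 + dx, y0 + dy, half, min_cu,
                               read_split_flag, leaves)
    else:
        leaves.append((x0, y0, size))   # a leaf CU (coding node)

# Example: split a 64x64 CTB once, then leave all 32x32 CUs unsplit.
flags = iter([1, 0, 0, 0, 0])
leaves = []
decode_cu_tree(0, 0, 64, 8, lambda: next(flags), leaves)
assert leaves == [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```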
A CU in HEVC has a purpose similar to that of a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a CTB may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node (referred to as a leaf node of the quadtree) comprises a coding node, also referred to as a leaf CU. Syntax data associated with a coded bitstream may define the maximum number of times a CTB may be split (referred to as the maximum CU depth), and may also define the minimum size of a coding node. Accordingly, in some examples, the bitstream may also define a smallest coding unit.
A CU includes a coding node and Prediction Units (PUs) and Transform Units (TUs) associated with the coding node. This disclosure may use the term "block" to refer to any of a CU, a Prediction Unit (PU), a Transform Unit (TU), or a partition thereof in the context of HEVC, or to similar data structures in the context of other standards. The size of a CU corresponds to the size of the coding node. The size of a CU may range from 8 × 8 pixels up to the size of the CTB, which has a maximum of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. With depth coding as described in this disclosure, a PU may be partitioned to be non-square in shape, or to include partitions that are non-rectangular in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.
The HEVC standard allows for a transform according to a TU, which may be different for different CUs. The size of a TU is typically set based on the size of the PU within a given CU defined for the partitioned CTB, but this may not always be the case. TUs are typically the same size or smaller than a PU. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). Pixel difference values associated with TUs may be transformed to generate transform coefficients, which may be quantized. However, in SDC, the delta DC residual values are typically not transformed or quantized.
A leaf CU may include one or more Prediction Units (PUs). In general, a PU represents a spatial region corresponding to all or a portion of a corresponding CU, and may include data for retrieving a reference sample for the PU. The reference sample may be a pixel from a reference block. In some examples, the reference samples may be obtained from a reference block or generated, such as by interpolation or other techniques. The PU also contains data related to prediction. For example, when a PU is intra-mode encoded, the data of the PU may be included in a Residual Quadtree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU.
As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list for the motion vector (e.g., RefPicList0 or RefPicList1).
A leaf-CU having one or more PUs may also include one or more Transform Units (TUs). The transform units may be specified using RQTs (also referred to as TU quadtree structures), as discussed above. For example, the split flag may indicate whether a leaf CU is split into four transform units. Each transform unit may then be further split into more sub-TUs. When a TU is not further split, it may be referred to as a leaf-TU. In general, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, the same intra prediction mode is typically applied to calculate the prediction values for all TUs of a leaf-CU. For intra coding, video encoder 20 may calculate a residual value for each leaf-TU using the intra-prediction mode as the difference between the portion of the CU corresponding to the TU and the original block. TUs are not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than PUs. For intra coding, a PU may be co-located with a corresponding leaf-TU of the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
Furthermore, the TUs of a leaf-CU may also be associated with a respective quadtree data structure, referred to as a Residual Quadtree (RQT). That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. The root node of a TU quadtree typically corresponds to a leaf CU, while the root node of a CU quadtree typically corresponds to a CTB. TUs of an unsplit RQT are referred to as leaf-TUs. In general, this disclosure uses the terms CU and TU to refer to leaf-CU and leaf-TU, respectively, unless otherwise mentioned.
A video sequence typically includes a series of pictures. As described herein, "picture" and "frame" are used interchangeably; that is, a picture containing video data may be referred to as a video frame, or simply a "frame." A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data, in a header of the GOP, in a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
As an example, HEVC supports prediction for various PU sizes. Assuming that the size of a particular CU is 2N × 2N, HEVC supports intra prediction on PU sizes of 2N × 2N or N × N, and inter prediction on symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. A PU with a size of 2N × 2N represents an undivided CU, as it is the same size as the CU in which it resides. In other words, a 2N × 2N PU is the same size as its CU. HEVC also supports asymmetric partitioning for inter prediction with PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is partitioned horizontally with a 2N × 0.5N PU on top and a 2N × 1.5N PU on the bottom. For depth coding, the 3D-HEVC WD further supports partitioning of PUs according to Depth Modeling Modes (DMMs), including non-rectangular partitions, as will be described.
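To make the partition geometry concrete, the following illustrative helper (mode names as used in the text; not an API from any codec) tabulates the PU sizes implied by each partitioning mode:

```python
def pu_sizes(mode, cu_size):
    """Return the (width, height) of the PUs of a 2Nx2N CU under the
    HEVC partition modes discussed above (cu_size = 2N)."""
    n, q = cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN":  [(cu_size, n)] * 2,
        "Nx2N":  [(n, cu_size)] * 2,
        "NxN":   [(n, n)] * 4,
        # Asymmetric modes: one direction split 25% / 75%.
        "2NxnU": [(cu_size, q), (cu_size, cu_size - q)],
        "2NxnD": [(cu_size, cu_size - q), (cu_size, q)],
        "nLx2N": [(q, cu_size), (cu_size - q, cu_size)],
        "nRx2N": [(cu_size - q, cu_size), (q, cu_size)],
    }
    return table[mode]

# A horizontally partitioned 64x64 CU in 2NxnU mode: a 64x16 PU on top
# of a 64x48 PU (i.e., 2N x 0.5N above 2N x 1.5N with N = 32).
assert pu_sizes("2NxnU", 64) == [(64, 16), (64, 48)]
```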
In this disclosure, "N × N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After conventional intra-predictive or inter-predictive coding using PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may comprise syntax data that describes a method or mode of generating predictive pixel data in the spatial domain, also referred to as the pixel domain, and TUs for conventional residual coding may comprise coefficients in the transform domain after applying a transform, such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs that include residual data for the CU, and then transform the TUs to generate transform coefficients for the CU.
Video encoder 20 may perform quantization of the transform coefficients after any transform used to generate the transform coefficients. Quantization generally refers to the process by which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded to an m-bit value during quantization, where n is greater than m. For depth coding, 3D-HEVC WD further supports SDC for residual data, where delta DC values represent residual values for PU partitions. Unlike conventional HEVC residual values, delta DC residual values are typically not transformed or quantized.
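As a toy numeric illustration of the bit-depth reduction described above (not the actual HEVC quantizer, which operates with a quantization parameter and scaling factors):

```python
# Toy illustration of bit-depth reduction by rounding; not the actual
# HEVC quantizer, which uses a quantization parameter and scaling.

def requantize(value, n_bits, m_bits):
    """Round an n-bit magnitude to an m-bit magnitude by dropping the
    (n - m) least-significant bits with rounding."""
    shift = n_bits - m_bits
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + (1 << (shift - 1))) >> shift)

assert requantize(200, 8, 4) == 13     # 200 / 16 = 12.5 rounds to 13
assert requantize(-200, 8, 4) == -13
```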
After quantization, video encoder 20 may scan the quantized transform coefficients, generating a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients in front of the array and lower energy (and therefore higher frequency) coefficients behind the array.
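A simple anti-diagonal scan with this property can be generated as follows (illustrative only; the normative HEVC scan orders are defined per block size and scan type):

```python
# Illustrative anti-diagonal scan; the normative HEVC scan orders are
# defined per block size and scan type in the standard.

def diagonal_scan(size):
    """Return (x, y) positions ordered so that top-left (low-frequency)
    coefficients come first."""
    order = []
    for d in range(2 * size - 1):       # anti-diagonals from the top-left
        for x in range(size):
            y = d - x
            if 0 <= y < size:
                order.append((x, y))
    return order

coeffs = [[9, 4, 1, 0],
          [5, 2, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
serialized = [coeffs[y][x] for (x, y) in diagonal_scan(4)]
assert serialized[:6] == [9, 5, 4, 1, 2, 1]   # energy clusters up front
```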
In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Binary Arithmetic Coding (CABAC), which is used in HEVC. Examples of other entropy coding processes include Context Adaptive Variable Length Coding (CAVLC), syntax-based context adaptive binary arithmetic coding (SBAC), and Probability Interval Partitioning Entropy (PIPE) coding. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
Video encoder 20 may further send syntax data, such as block-based syntax data, picture-based syntax data, and GOP-based syntax data, to video decoder 30, such as in a picture header, a block header, a slice header, or a GOP header. The GOP syntax data may describe a number of pictures in a respective GOP, and the picture syntax data may indicate an encoding/prediction mode used to encode the corresponding picture.
Video encoder 20 and/or video decoder 30 may perform intra-picture prediction coding of depth data and inter-prediction coding of depth data. Additionally, according to examples of this disclosure, video encoder 20 and/or video decoder 30 may code DC residual data generated by depth intra-prediction coding of video data and/or depth inter-prediction coding of video data, e.g., using SDC according to any of a variety of examples, as will be described.
In HEVC, assuming that the size of a Coding Unit (CU) is 2N × 2N, video encoder 20 and video decoder 30 may support various Prediction Unit (PU) sizes of 2N × 2N or N × N for intra prediction, as well as symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, N × N, or similar sizes for inter prediction. A video encoder and video decoder may also support asymmetric partitioning with PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N for inter prediction. For depth coding as provided in 3D-HEVC, a video encoder and video decoder may be configured to support multiple different depth coding modes for intra prediction and/or inter prediction, including various Depth Modeling Modes (DMMs), as described in this disclosure.
Video data coded using 3D video coding techniques may be rendered and displayed to create three-dimensional effects. As one example, two images of different views (i.e., corresponding to two camera perspectives having slightly different horizontal positions) may be displayed substantially simultaneously such that one image is seen by the left eye of the viewer and the other image is seen by the right eye of the viewer.
The 3D effect may be implemented using, for example, a stereoscopic display or an autostereoscopic display. Stereoscopic displays may be used in conjunction with goggles that filter the two images accordingly. For example, passive glasses may filter images using polarized lenses or differently colored lenses or other optical filtering techniques to ensure that the proper eye sees the proper image. As another example, active glasses may rapidly occlude alternating lenses in coordination with a stereoscopic display that may alternate between displaying left and right eye images. An autostereoscopic display displays the two images in a manner that does not require glasses. For example, an autostereoscopic display may include a mirror or prism configured such that each image is projected into the appropriate eye of the viewer.
The techniques of this disclosure relate to coding 3D video data by coding depth data to support 3D video. In general, the term "texture" is used to describe the luminance (i.e., brightness or "luma") values of an image and the chrominance (i.e., color or "chroma") values of the image. In some examples, a texture image may include one set of luma data (Y) and two sets of chroma data, for blue hues (Cb) and red hues (Cr). In some chroma formats, such as 4:2:2 or 4:2:0, the chroma data is downsampled relative to the luma data. That is, the spatial resolution of the chroma pixels may be lower than the spatial resolution of the corresponding luma pixels, e.g., one-half or one-fourth of the luma resolution.
The depth data typically describes depth values for corresponding texture data. For example, a depth image may include a set of depth pixels (or depth values) that each describe a depth for corresponding texture data in a texture component of a view; the depth image may form a depth component of the view. Each pixel may have one or more texture values (e.g., luma and chroma), and may also have one or more depth values. A texture picture and a depth map may (but need not) have the same spatial resolution. For instance, a depth map may include more or fewer pixels than the corresponding texture picture. The depth data may be used to determine horizontal disparity for the corresponding texture data, and in some cases vertical disparity may also be used.
A device receiving texture and depth data may display a first texture image of one view (e.g., a left eye view) and use the depth data to modify the first texture image, by offsetting the pixel values of the first image by horizontal disparity values determined based on the depth values, to generate a second texture image of another view (e.g., a right eye view). In general, horizontal disparity (or simply "disparity") describes the horizontal spatial offset of a pixel in a first view relative to the corresponding pixel in a second view, where the two pixels correspond to the same portion of the same object as represented in the two views.
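A highly simplified sketch of this idea, assuming a linear mapping from 8-bit depth values to disparity in pixels and naive hole filling (both assumptions for illustration), is:

```python
# Toy scanline warp: linear depth-to-disparity mapping and naive hole
# filling are assumptions made purely for illustration.

def synthesize_right_view_row(left_row, depth_row, max_disparity):
    """Shift each left-view pixel horizontally by a disparity derived
    from its 8-bit depth value; the last write to a target position
    wins, which only crudely approximates occlusion."""
    width = len(left_row)
    right_row = [None] * width
    for x in range(width):
        disparity = depth_row[x] * max_disparity // 255
        xr = x - disparity
        if 0 <= xr < width:
            right_row[xr] = left_row[x]
    for x in range(width):               # fill disoccluded holes
        if right_row[x] is None:
            right_row[x] = right_row[x - 1] if x > 0 else left_row[0]
    return right_row

row = synthesize_right_view_row([10, 20, 30, 40], [0, 0, 255, 255], 1)
assert row == [10, 30, 40, 40]
```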
In still other examples, depth data may be defined for pixels in the z-dimension perpendicular to the image plane such that the depth associated with a given pixel is defined relative to a zero disparity plane defined for the image. This depth may be used to generate horizontal disparity for the display pixels such that the pixels are displayed differently for the left and right eyes depending on the z-dimension depth value of the pixel relative to the zero disparity plane. The zero disparity plane may vary for different portions of the video sequence, and may also vary the amount of depth relative to the zero disparity plane.
Pixels lying on the zero disparity plane may be defined similarly for the left and right eyes. Pixels located in front of the zero disparity plane may be displayed at different positions for the left and right eyes (e.g., with horizontal disparity) in order to create the perception that the pixel emerges from the image in the z-direction perpendicular to the image plane. Pixels located behind the zero disparity plane may be displayed with a slight blur, to create a slight perception of depth, or may be displayed at different positions for the left and right eyes (e.g., with horizontal disparity opposite that of pixels located in front of the zero disparity plane). Many other techniques may also be used to convey or define depth data for an image.
Two-dimensional video data is typically coded as a sequence of discrete pictures, each of which corresponds to a particular time instance. That is, each picture has an associated playback time relative to the playback time of the other images in the sequence. These pictures may be considered texture pictures or texture images. In depth-based 3D video coding, each texture picture in the sequence may also correspond to a depth map. That is, the depth map corresponding to the texture picture describes the depth data of the corresponding texture picture. The multi-view video data may include data for a variety of different views, where each view may include a respective sequence of texture components and corresponding depth components.
A picture typically corresponds to a particular time instance. Video data may be represented using a sequence of access units, where each access unit includes all data corresponding to a particular time instance. Thus, for example, for multiview video data plus depth coding, the texture images from each view for a common time instance, plus the depth maps for each of the texture images, may all be included within a particular access unit. An access unit may thus include multiple views, where each view may include data corresponding to a texture component of a texture image and data corresponding to a depth component of a depth map.
Each access unit may contain multiple view components or pictures. The view components of a particular view are associated with a unique view id or view order index, such that the view components of different views are associated with different view ids or view order indices. The view components may include a texture view component as well as a depth view component. Texture and depth view components in the same view may have different layer ids. Texture view components may be coded as one or more texture slices, while depth view components may be coded as one or more depth slices. Multiview plus depth creates a variety of coding possibilities such as intra-picture, inter-picture, intra-view, inter-view, motion prediction, and the like.
In this way, 3D video data may be represented using a multiview video plus depth format, where a captured or generated view includes texture components associated with a corresponding depth map. Furthermore, in 3D video coding, texture and depth maps may be coded and multiplexed into a 3D video bitstream. The depth map may be coded as a grayscale image, where "luma" samples (i.e., pixels) of the depth map represent depth values.
In general, a block of depth data (a block of samples of a depth map, e.g., corresponding to a pixel) may be referred to as a depth block. The depth values may be referred to as luma values associated with the depth samples. That is, a depth map may generally be considered a monochrome texture picture, i.e., a texture picture that includes luma values and does not include chroma values. In any case, conventional intra and inter coding methods may be applied to depth map coding. Alternatively or additionally, other coding methods such as intra SDC or inter SDC may be applied to depth map coding in 3D video coding processes, such as 3D-HEVC.
In 3D-HEVC, the same definition of intra prediction mode as in HEVC is utilized. That is, the intra-mode used in 3D-HEVC includes the intra-mode of HEVC. Also, in 3D-HEVC, a Depth Modeling Mode (DMM) is introduced along with HEVC intra prediction modes to code intra prediction units of a depth slice.
To better represent sharp edges in depth maps, the current HTM (3D-HTM version 9.0) applies a DMM method for intra coding of the depth map. A depth block is partitioned into two regions specified by a DMM pattern, where each region is represented by a constant value. The DMM pattern may be either explicitly signaled (DMM mode 1) or predicted by the co-located texture block (DMM mode 4).
There are two types of partitioning models defined in DMM: wedgelet partitioning and contour partitioning. Fig. 4 is a diagram illustrating an example of a wedgelet partition pattern suitable for coding a block of pixel samples. Fig. 5 is a diagram illustrating an example of a contour partition pattern suitable for coding a block of pixel samples. For a wedgelet partition, as shown in fig. 4, a depth block, e.g., a PU, is split by a straight line into two regions, where the two regions are labeled P0 and P1. For a contour partition, as shown in fig. 5, a depth block, such as a PU, may be partitioned into two irregular regions. Thus, a PU may include a single partition, or may include two partitions in the case of wedgelet or contour partitioning.
Contour partitioning is more flexible than wedgelet partitioning, but is difficult to signal explicitly. In DMM mode 4, the contour partition pattern is implicitly derived using the reconstructed luma samples of the co-located texture block.
As one example, fig. 4 provides an illustration of a wedgelet pattern for an 8 × 8 block 40. For a wedgelet partition, a depth block, e.g., a PU, is split by a straight line 46 into two regions 42, 44, with a start point 48 at (Xs, Ys) and an end point 50 at (Xe, Ye), as illustrated in fig. 4, where the two regions 42, 44 are also labeled P0 and P1, respectively. Each pattern in block 40 consists of an array of binary digits of size uB × vB that mark whether the corresponding sample belongs to region P0 or P1, where uB and vB represent the horizontal and vertical size of the current PU, respectively. Regions P0 and P1 are represented in fig. 4 by white and shaded samples, respectively.
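A sketch of how such a binary wedgelet pattern could be rasterized from its start and end points, using a simple line-side test (the 3D-HTM pattern generation differs in detail and, as noted below, pre-generates pattern lists at the beginning of coding), is:

```python
# Line-side test used to rasterize a wedgelet pattern from its start
# point (xs, ys) and end point (xe, ye); a sketch, not 3D-HTM code.

def wedgelet_pattern(size, xs, ys, xe, ye):
    """size x size array of binary digits marking whether each sample
    falls in region P0 (0) or P1 (1), using the sign of the 2-D cross
    product relative to the partition line."""
    pattern = []
    for y in range(size):
        row = []
        for x in range(size):
            cross = (xe - xs) * (y - ys) - (ye - ys) * (x - xs)
            row.append(1 if cross > 0 else 0)
        pattern.append(row)
    return pattern

# A line from (4, 0) on the top edge to (7, 5) near the right edge of
# an 8x8 block puts the left side in P1 and the right side in P0.
pat = wedgelet_pattern(8, 4, 0, 7, 5)
assert pat[0][0] == 1 and pat[0][7] == 0
```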
As shown in the example of fig. 5, a depth block, such as depth block 60, may be partitioned into three irregularly shaped regions 62, 64A, and 64B using contour partitioning, with region 62 labeled P0 and the two regions 64A and 64B collectively labeled P1. Although the pixels in region 64A are not immediately adjacent to the pixels in region 64B, regions 64A and 64B may be defined to form one single region for purposes of predicting the PU of depth block 60. In DMM mode 4, in the case of 3D-HEVC, the contour partition pattern is implicitly derived using reconstructed luma samples of the co-located texture block.
Referring to fig. 4 and 5, each individual square within N × N depth blocks 40 and 60 represents a respective individual pixel of depth blocks 40 and 60, respectively. In fig. 4, the numerical value within a square indicates whether the corresponding pixel belongs to region 42 (value "0" in the example of fig. 4) or region 44 (value "1" in the example of fig. 4). Shading is also used in fig. 4 to indicate whether a pixel belongs to region 42 (white squares) or region 44 (grey shaded squares).
As discussed above, each pattern (i.e., both wedgelet and contour) may be defined by an array of binary digits of size uB × vB that mark whether the corresponding sample (i.e., pixel) belongs to region P0 or P1 (where P0 corresponds to region 42 in fig. 4 and region 62 in fig. 5, and P1 corresponds to region 44 in fig. 4 and regions 64A, 64B in fig. 5), where uB and vB represent the horizontal and vertical size of the current PU, respectively. In the examples of fig. 4 and 5, the PUs correspond to blocks 40 and 60, respectively. Video coders, such as video encoder 20 and video decoder 30, may initialize the wedgelet patterns at the beginning of coding (e.g., the beginning of encoding or the beginning of decoding).
For HEVC intra prediction mode, a pixel-specific intra predictor value is generated for each pixel in the PU by using neighboring samples of the PU, as specified in subclause 8.4.2 in HEVC WD 10.
For other depth intra modes, a partition-specific DC predictor is computed for each partition within a PU by using at most two neighboring samples of the PU. Let bPattern[x][y] be the partition pattern of the PU, where x = 0..N-1, y = 0..N-1, and N is the width of the PU. bPattern[x][y] indicates which partition the pixel (x, y) belongs to, and bPattern[x][y] may be equal to 0 or 1. Let BitDepth be the bit depth of the depth samples, and let RecSample[x][y] be a reconstructed neighboring sample of the PU, where x = -1 and y = 0..N-1 (corresponding to the left neighbors of the PU) or y = -1 and x = 0..N-1 (corresponding to the upper neighbors of the PU). The DC predictor of partition X, i.e., DCPred[X], with X = 0 or 1, is then derived as follows:
● Set bT = (bPattern[0][0] != bPattern[N-1][0]) ? 1 : 0
● Set bL = (bPattern[0][0] != bPattern[0][N-1]) ? 1 : 0
● If bT equals bL
- DCPred[X] = (RecSample[-1][0] + RecSample[0][-1]) >> 1
- DCPred[1-X] = bL ? (RecSample[-1][N-1] + RecSample[N-1][-1]) >> 1 : 2^(BitDepth-1)
● Otherwise
- DCPred[X] = bL ? RecSample[(N-1)>>1][-1] : RecSample[-1][(N-1)>>1]
- DCPred[1-X] = bL ? RecSample[-1][N-1] : RecSample[N-1][-1]
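The derivation above can be transcribed compactly. The sketch below is a direct reading of the listed steps, assuming the neighboring samples have been unpacked into two helper arrays, recLeft[j] = RecSample[-1][j] and recTop[i] = RecSample[i][-1]; these names and the array layout are illustrative, not part of the 3D-HEVC text.

```cpp
// Sketch of the partition-specific DC predictor derivation for SDC.
#include <cstdint>
#include <vector>

void deriveDcPred(const std::vector<std::vector<uint8_t>>& bPattern,  // [x][y]
                  const std::vector<int>& recLeft,  // RecSample[-1][0..N-1]
                  const std::vector<int>& recTop,   // RecSample[0..N-1][-1]
                  int N, int bitDepth, int X, int dcPred[2]) {
  const int bT = (bPattern[0][0] != bPattern[N - 1][0]) ? 1 : 0;
  const int bL = (bPattern[0][0] != bPattern[0][N - 1]) ? 1 : 0;
  if (bT == bL) {
    dcPred[X] = (recLeft[0] + recTop[0]) >> 1;
    dcPred[1 - X] = bL ? (recLeft[N - 1] + recTop[N - 1]) >> 1
                       : 1 << (bitDepth - 1);  // mid-range default
  } else {
    dcPred[X] = bL ? recTop[(N - 1) >> 1] : recLeft[(N - 1) >> 1];
    dcPred[1 - X] = bL ? recLeft[N - 1] : recTop[N - 1];
  }
}
```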
A Depth Lookup Table (DLT) maps depth indices to depth values. DLTs may be constructed by analyzing frames within a first intra period prior to encoding a full video sequence. In the current design of 3D-HEVC, all valid depth values are ordered in ascending order and inserted into the DLT with an increasing index.
The DLT is an optional coding tool. In the current HTM (3D-HTM version 9.0), video encoder 20 does not use the DLT if more than half of the values from 0 to MAX_DEPTH_VALUE (e.g., 255 for 8-bit depth samples) occur in the original depth map at the analysis step. Otherwise, the DLT will be coded in a Sequence Parameter Set (SPS) and/or a Video Parameter Set (VPS). To code the DLT, video encoder 20 first codes the number of valid depth values with an exponential-Golomb code. Each valid depth value is then also coded with an exponential-Golomb code.
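For reference, 0-th order unsigned exponential-Golomb coding, as defined for the ue(v) syntax elements in HEVC, can be sketched as follows; the writeBit callback is a hypothetical bit-sink, not an API from the reference software.

```cpp
// Sketch: unsigned exponential-Golomb (ue(v)) encoding of one value.
#include <cstdint>
#include <functional>

void encodeExpGolomb(uint32_t value,
                     const std::function<void(int)>& writeBit) {
  const uint32_t codeNum = value + 1;
  int numBits = 0;
  for (uint32_t v = codeNum; v > 1; v >>= 1) ++numBits;  // floor(log2(codeNum))
  for (int i = 0; i < numBits; ++i) writeBit(0);         // zero prefix
  for (int i = numBits; i >= 0; --i) writeBit((codeNum >> i) & 1);
  // Examples: 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100".
}
```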
Video encoder 20 reads a predefined number of frames from the input video sequence to be coded and scans all samples for available depth map values. During this process, video encoder 20 generates a mapping table that maps depth values to valid depth values based on the original uncompressed depth map.
Encoder 20 and/or decoder 30 derive the depth lookup table Idx2Depth(.), the index lookup table Depth2Idx(.), the depth mapping table M(.), and the number of valid depth values d_valid by analyzing the depth map D_t:
1. Initialization
● boolean vector B(d) = FALSE for all depth values d
● index counter i = 0
2. Process each pixel position p in D_t for multiple time instances t:
● Set B(D_t(p)) = TRUE to mark valid depth values
3. Count the number of TRUE values in B(d) → d_valid
4. For each d with B(d) == TRUE:
● Set Idx2Depth(i) = d
● Set M(d) = d
● Set Depth2Idx(d) = i
● i = i + 1
5. For each d with B(d) == FALSE:
● Find d' = arg min |d - d'| such that B(d') == TRUE
● Set M(d) = d'
● Set Depth2Idx(d) = Depth2Idx(d')
The mapping from an index Idx back to a depth value d is as follows: d = Idx2Depth[Idx]. The mapping from a depth value d to an index Idx is as follows: Idx = Depth2Idx[d].
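The construction in steps 1 to 5 can be transcribed directly. The sketch below assumes 8-bit depth samples (maxDepth = 255), at least one valid depth value, and frames flattened into vectors of depth values; the type and function names are illustrative.

```cpp
// Sketch of DLT construction: mark occurring depth values, index them in
// ascending order, and map unused values to the nearest valid one.
#include <cstdlib>
#include <vector>

struct Dlt {
  std::vector<int> idx2Depth, depth2Idx, M;
  int dValid = 0;
};

Dlt buildDlt(const std::vector<std::vector<int>>& frames, int maxDepth) {
  std::vector<bool> B(maxDepth + 1, false);
  for (const auto& frame : frames)          // step 2: scan all pixels
    for (int d : frame) B[d] = true;        // mark valid depth values

  Dlt t;
  t.depth2Idx.assign(maxDepth + 1, 0);
  t.M.assign(maxDepth + 1, 0);
  for (int d = 0; d <= maxDepth; ++d)       // step 4: valid values, ascending
    if (B[d]) {
      t.depth2Idx[d] = static_cast<int>(t.idx2Depth.size());
      t.M[d] = d;
      t.idx2Depth.push_back(d);
    }
  t.dValid = static_cast<int>(t.idx2Depth.size());  // step 3: count
  for (int d = 0; d <= maxDepth; ++d)       // step 5: invalid values
    if (!B[d]) {
      int best = t.idx2Depth[0];            // nearest valid depth value
      for (int v : t.idx2Depth)
        if (std::abs(d - v) < std::abs(d - best)) best = v;
      t.M[d] = best;
      t.depth2Idx[d] = t.depth2Idx[best];
    }
  return t;
}
```

With these tables, the two mappings reduce to array lookups: d = t.idx2Depth[Idx] and Idx = t.depth2Idx[d].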
Intra SDC modes (i.e., intra segment-wise DC coding, which may also be referred to as intra simplified depth coding) have been introduced in 3D-HEVC along with HEVC intra prediction modes, DMM modes, and chain coding modes to code intra PUs of depth slices. In current 3D-HEVC, SDC may only be applied to the 2N × 2N PU partition size. Instead of coding quantized transform coefficients, the SDC mode represents a depth block with two types of information:
1. partition types for the current depth block, including:
a. DMM mode 1 (2 partitions)
b. Plane (1 partition)
2. For each partition, a residual value (in the pixel domain) is signaled in the bitstream.
Two sub-modes are defined in SDC, including SDC mode 1 and SDC mode 2, which correspond to the partition types of Planar and DMM mode 1, respectively. The DC residual value may be represented as a delta DC value indicating the difference between the DC value of a depth PU partition and the DC value of the corresponding predicted partition. Here, the DC value may be the average pixel value of the depth samples in the depth PU partition.
Simplified residual coding is used in intra SDC. In simplified residual coding, one DC residual value is signaled for each partition of a PU, as described above, and no transform or quantization is applied. As discussed above, to signal information representative of the DC residual value for each partition, two methods may be applied:
1. the DC residual value for each partition, calculated by subtracting the predictor (denoted Pred) generated by the neighboring samples from the DC value (i.e., average value, denoted by Aver) of the current partition in the current PU, is coded directly.
2. When a DLT is transmitted, instead of coding the DC residual value, the difference between the indices of Aver and Pred, as mapped by the index lookup table, is coded. The index difference is calculated by subtracting the index of Pred from the index of Aver. At the decoder side, the sum of the decoded index difference and the index of Pred is mapped back to a depth value based on the DLT.
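A sketch of the two signaling options follows, assuming Aver and Pred have already been computed for the partition and that the tables follow the DLT construction above; the helper names are illustrative.

```cpp
// Sketch: the two ways of forming the SDC residual for one partition.
#include <vector>

// Method 1: code the DC residual in the pixel domain directly.
int dcResidual(int aver, int pred) { return aver - pred; }

// Method 2: when a DLT is transmitted, code an index difference instead.
int indexResidual(int aver, int pred, const std::vector<int>& depth2Idx) {
  return depth2Idx[aver] - depth2Idx[pred];
}

// Decoder side of method 2: map the index sum back to a depth value.
int reconstructDc(int idxResidual, int pred,
                  const std::vector<int>& depth2Idx,
                  const std::vector<int>& idx2Depth) {
  return idx2Depth[idxResidual + depth2Idx[pred]];
}
```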
It is proposed in JCT3V-F0126 that in depth coding, intra SDC can be applied to all additional depth intra prediction modes and the original HEVC intra prediction mode. In particular, the underlying idea of SDC extends to various intra prediction modes for video encoder 20 and video decoder 30. In SDC, video encoder 20 or video decoder 30 only codes one DC residual value, i.e., delta DC value, for depth PU or PU partition coded in intra prediction mode. Transform and quantization are skipped and no additional residual transform tree is needed for a depth Coding Unit (CU). SDC thus provides an alternative residual coding method whereby encoder 20 encodes and/or video decoder 30 decodes only one DC residual value for a depth PU in intra mode.
Fig. 6 is a block diagram illustrating an example video encoder 20 that may be configured to implement techniques of this disclosure, such as techniques for simplifying segment-by-Segment DC Coding (SDC) of a larger intra-predicted block (e.g., a 64 × 64 block) in a 3D video coding process, such as 3D-HEVC. In some examples, video encoder 20 may be configured to process a larger intra-prediction block, e.g., a 64 × 64 intra-prediction block, into four smaller intra-prediction blocks, e.g., four 32 × 32 intra-prediction blocks, in intra SDC. In this way, in intra SDC, a 64 × 64 intra-prediction block of an HEVC intra prediction mode is processed as four 32 × 32 intra-prediction blocks. In some cases, processing a large intra-prediction block into multiple smaller intra-prediction blocks in intra SDC may reduce the maximum buffer size required of encoder 20 during intra SDC.
This disclosure describes video encoder 20 in the context of HEVC coding, and more particularly 3D-HEVC coding (e.g., as described in 3D-HEVC, and further modified as described in this disclosure). However, the techniques of this disclosure may be applicable to other coding standards or methods in which intra SDC modes are used for depth coding. Accordingly, fig. 6 is provided for purposes of explanation and should not be taken as a limitation on the techniques as broadly illustrated and described in this disclosure.
In the example of fig. 6, video encoder 20 includes prediction processing unit 100, residual generation unit 102, transform processing unit 104, quantization unit 106, inverse quantization unit 108, inverse transform processing unit 110, reconstruction unit 112, filter unit 114, decoded picture buffer 116, and entropy encoding unit 118. The prediction processing unit 100 includes an inter prediction processing unit 120 and an intra prediction processing unit 126. The inter prediction processing unit 120 includes a Motion Estimation (ME) unit 122 and a Motion Compensation (MC) unit 124.
The components of prediction processing unit 100 are described as performing both texture coding and depth coding. In some examples, texture and depth coding may be performed by the same component of prediction processing unit 100 or different components within prediction processing unit 100. For example, separate texture and depth encoders may be provided in some implementations. Also, multiple texture and depth encoders may be provided to encode multiple views, e.g., for multiview plus depth coding. Video encoder 20 may include more, fewer, or different functional components than those shown in fig. 6.
In any case, prediction processing unit 100 may be configured to intra or inter encode texture data and depth data as part of a 3D coding process, such as a 3D-HEVC process. In particular, in some modes, prediction processing unit 100 may use conventional non-SDC residual coding or SDC coding. In the case of SDC coding, prediction processing unit 100 may generate a delta DC residual value for an intra or inter coded depth PU, where the delta DC residual value represents the difference between the average of the pixels in the PU or a partition of the coded PU and the average of the predicted samples in the intra or inter predicted PU partition. A PU may have a single partition or multiple partitions depending on the coding mode. Depth PUs may be coded using HEVC intra, HEVC inter mode, DMM, or other modes.
In some examples, prediction processing unit 100 may operate generally in accordance with 3D-HEVC, e.g., as described in the 3D-HEVC WD, which is subject to the modifications and/or additions described in this disclosure, such as those relating to simplifying segment-wise DC coding (SDC) of a larger intra-predicted block, e.g., by processing a 64 × 64 intra-predicted block into four smaller intra-predicted blocks in an intra SDC mode. In this way, in intra SDC, the 64 × 64 intra prediction for HEVC intra prediction mode is processed into four 32 × 32 intra predictions. Prediction processing unit 100 may provide the syntax information to entropy encoding unit 118. The syntax information may indicate, for example, which prediction modes to use and information about such modes.
Video encoder 20 receives video data to be encoded. Video encoder 20 may encode each of a plurality of Coding Tree Units (CTUs) in a slice of a picture of the video data. Each of the CTUs may be associated with an equal-sized luma Coding Tree Block (CTB) and a corresponding chroma CTB of a picture. As part of encoding the CTUs, prediction processing unit 100 may perform quadtree partitioning to divide the CTBs of the CTUs into progressively smaller blocks. The smaller block may be a coding block of a CU. For example, prediction processing unit 100 may partition a CTB associated with a CTU into four equally sized sub-blocks, partition one or more of the sub-blocks into four equally sized sub-blocks, and so on.
Video encoder 20 may encode a CU of a CTB to generate an encoded representation of the CU (i.e., a coded CU). As part of encoding the CU, prediction processing unit 100 may partition a coding block associated with the CU among one or more PUs of the CU. Thus, each PU may be associated with a luma prediction block and a corresponding chroma prediction block.
Video encoder 20 and video decoder 30 may support PUs having various sizes. As indicated above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of the luma prediction block of the PU. Assuming that the size of a particular CU is 2N × 2N, video encoder 20 and video decoder 30 may support PU sizes of 2N × 2N or N × N for intra prediction, and symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, N × N, or similar for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric partitioning with PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N for inter prediction. According to aspects of this disclosure, video encoder 20 and video decoder 30 also support non-rectangular partitions for depth inter-coded PUs.
Inter-prediction processing unit 120 may generate predictive data for a PU by performing inter-prediction on each PU of the CU. The predictive data for the PU may include predictive sample blocks for the PU and motion information for the PU. Inter prediction processing unit 120 may perform different operations on PUs of a CU depending on whether the PU is in an I-slice, a P-slice, or a B-slice. In an I slice, all PUs are intra predicted. Thus, if the PU is in an I slice, inter prediction processing unit 120 does not perform inter prediction on the PU. Thus, for blocks encoded in I-mode, the predicted block is formed using spatial prediction from previously encoded neighboring blocks within the same frame.
If the PU is in a P slice, Motion Estimation (ME) unit 122 may search the reference pictures in a reference picture list (e.g., "RefPicList0") for a reference region of the PU. The reference pictures may be stored in decoded picture buffer 116. The reference region for the PU may be a region within a reference picture that contains the sample block that most closely corresponds to the sample block of the PU. Motion Estimation (ME) unit 122 may generate a reference index that indicates the position in RefPicList0 of the reference picture containing the reference region of the PU.
In addition, for inter-coding, Motion Estimation (ME) unit 122 may generate Motion Vectors (MVs) that indicate spatial displacements between the coding block of the PU and reference locations associated with the reference regions. For example, the MV may be a two-dimensional vector that provides an offset from coordinates in the current decoded picture to coordinates in a reference picture. Motion Estimation (ME) unit 122 may output the reference index and the MV as the motion information of the PU. Motion Compensation (MC) unit 124 may generate the predictive sample block for the PU based on actual samples or interpolated samples at the reference location indicated by the motion vector of the PU.
If the PU is in a B slice, motion estimation unit 122 may perform uni-directional prediction or bi-directional prediction for the PU. To perform uni-directional prediction for the PU, motion estimation unit 122 may search the reference pictures of RefPicList0, or of a second reference picture list ("RefPicList1"), for a reference region of the PU. Motion Estimation (ME) unit 122 may output the following as the motion information of the PU: a reference index indicating the position in RefPicList0 or RefPicList1 of the reference picture containing the reference region, an MV indicating a spatial displacement between the sample block of the PU and a reference location associated with the reference region, and one or more prediction direction indicators indicating whether the reference picture is in RefPicList0 or RefPicList1. Motion Compensation (MC) unit 124 may generate the predictive sample block of the PU based at least in part on actual samples or interpolated samples at the reference region indicated by the motion vector of the PU.
To perform bi-directional inter prediction for a PU, motion estimation unit 122 may search for reference pictures in RefPicList0 for a reference region of the PU, and may also search for reference pictures in RefPicList1 for another reference region of the PU. Motion Estimation (ME) unit 122 may generate a reference picture index that indicates the position of the reference picture containing the reference region in RefPicList0 and RefPicList 1. In addition, Motion Estimation (ME) unit 122 may generate MVs that indicate spatial displacements between reference locations associated with the reference region and the sample blocks of the PU. The motion information of the PU may include a reference index and a MV of the PU. Motion Compensation (MC) unit 124 may generate the predictive sample block for the PU based at least in part on actual samples or interpolated samples at a reference region indicated by the motion vector of the PU.
Intra-prediction processing unit 126 may generate predictive data for the PU by performing intra-prediction on the PU. The intra-predictive data for the PU may include predictive sample blocks for the PU and various syntax elements. Intra prediction processing unit 126 may perform intra prediction on PUs in I slices, P slices, and B slices. To perform intra-prediction for a PU, intra-prediction processing unit 126 may use a plurality of intra-prediction modes to generate a plurality of predictive data sets for the PU, and then select one of the intra-prediction modes that yields acceptable or best coding performance, e.g., using a rate-distortion optimization technique.
To generate the predictive data set for the PU using the intra-prediction mode, intra-prediction processing unit 126 may extend samples from sample blocks of spatially neighboring PUs across sample blocks of the PU in a direction associated with the intra-prediction mode. Assuming left-to-right, top-to-bottom coding order for PU, CU, and CTU, neighboring PUs may be above, above-right, above-left, or to the left of the PU. Intra-prediction processing unit 126 may use various numbers of intra-prediction modes, e.g., 33 directional intra-prediction modes. In some examples, the number of intra-prediction modes may depend on the size of the area associated with the PU.
Prediction processing unit 100 may select predictive data for a PU of the CU from among predictive data for the PU generated by inter prediction processing unit 120 or predictive data for the PU generated by intra prediction processing unit 126. In some examples, prediction processing unit 100 selects predictive data for PUs of the CU based on a rate/distortion metric for the set of predictive data. The predictive sample block of the selected predictive data may be referred to herein as the selected predictive sample block.
Residual generation unit 102 may generate luma, Cb, and Cr residual blocks for the CU based on the luma, Cb, and Cr coding blocks of the CU and the selected inter-or intra-predictive luma, Cb, and Cr blocks of the PU of the CU. For example, residual generation unit 102 may generate the residual block of the CU such that each sample in the residual block has a value equal to a difference between a sample in the coding block of the CU and a corresponding sample (i.e., in luma or chroma pixel values, if applicable) in a corresponding selected predictive sample block of a PU of the CU. The residue generation unit 102 may also generate delta DC residue values for the SDC mode.
Transform processing unit 104 may perform quadtree partitioning to partition a residual block associated with a CU into transform blocks associated with TUs of the CU. Thus, a TU may be associated with a luma transform block and two chroma transform blocks. The size and position of the luma and chroma transform blocks of a TU of a CU may or may not be based on the size and position of prediction blocks of PUs of the CU. A quadtree structure, referred to as a "residual quadtree" (RQT), may include nodes associated with each of the regions. The TUs of a CU may correspond to leaf nodes of a RQT.
For conventional residual coding, transform processing unit 104 may generate transform coefficient blocks for each TU of a CU by applying one or more transforms to the transform blocks associated with the TU. Transform processing unit 104 may apply various transforms to transform blocks associated with TUs. For example, transform processing unit 104 may apply a Discrete Cosine Transform (DCT), a directional transform, or a conceptually similar transform to the transform blocks. In some examples, transform processing unit 104 does not apply the transform to the transform block. In such examples, the transform block may be processed as a transform coefficient block. Furthermore, for SDC coding, transforms and quantization are typically not applied to the delta DC residual values generated for the predicted PU or partition.
For conventional residual coding, quantization unit 106 may quantize residual transform coefficients in a coefficient block. The quantization process may reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. Quantization unit 106 may quantize coefficient blocks associated with TUs of the CU based on Quantization Parameter (QP) values associated with the CU. Video encoder 20 may adjust the degree of quantization applied to coefficient blocks associated with a CU by adjusting the QP value associated with the CU. Quantization may cause information to be lost, so the quantized transform coefficients may have less precision than the original transform coefficients.
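Conceptually, quantization divides each coefficient by a QP-dependent step size and rounds, which is where precision is lost. The sketch below is a simplified mid-tread quantizer for illustration only; it is not the exact HEVC scaling and shift arithmetic.

```cpp
// Illustrative scalar quantizer; stepSize stands in for the QP-derived
// quantization step. Not the exact HEVC integer arithmetic.
#include <cstdlib>

int quantize(int coeff, int stepSize) {
  const int sign = (coeff < 0) ? -1 : 1;
  return sign * ((std::abs(coeff) + stepSize / 2) / stepSize);  // round
}

int dequantize(int level, int stepSize) {
  return level * stepSize;  // the rounding error is not recoverable
}
```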
Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transform, respectively, to the coefficient block to reconstruct the residual block from the coefficient block. Reconstruction unit 112 may add the reconstructed residual block to corresponding samples from one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed transform block associated with the TU. By reconstructing the transform blocks for each TU of the CU in this manner, video encoder 20 may reconstruct the coding blocks of the CU.
For HEVC intra modes, HEVC inter modes, and other modes such as DMM modes, SDC residual coding for a depth CU may be used to generate delta DC residual values, also referred to as DC residual values, for a predicted PU or PU partition. For SDC, residual generation unit 102 may generate a single delta DC value for each depth PU or PU partition, where the single delta DC value represents the difference between the average of the pixels in the PU or PU partition and the average of the predicted samples in the intra-or inter-predicted PU or PU partition. The delta DC residual value is not transformed or quantized, and may be provided by residual generation unit 102 to entropy coding unit 118, as indicated by line 115 in fig. 6.
Reconstruction unit 112 may reconstruct the depth CU based on the DC residual values of the partitions of the PUs of the CU and the corresponding predicted partitions of the PUs of the CU. For example, the delta DC residual value for each depth PU partition may be added to the pixel values in the corresponding predicted partition to reconstruct the depth PU partition, where the DC residual value may represent the difference between the average of the pixels of the depth PU partition and the average of the predicted samples of the predicted partition. In some examples, information representative of the DC residual value (e.g., one or more syntax elements representative of the delta DC value) may be generated by prediction processing unit 100, received by entropy encoding unit 118, and used by reconstruction unit 112 without inverse quantization or inverse transform processing, e.g., as indicated by line 115.
Filter unit 114 may perform one or more filtering operations to reduce artifacts, such as block artifacts, in the coding blocks associated with the reconstructed CUs. The filtering operation may include one or more of: deblocking to remove blockiness at block boundaries, loop filtering to smooth pixel transitions, sample adaptive offset filtering to smooth pixel transitions, or possibly other types of filtering operations or techniques. After filter unit 114 performs one or more deblocking operations on the reconstructed coded block, decoded picture buffer 116 may store the reconstructed coded block. Inter-prediction unit 120 may use the reference picture containing the reconstructed coding block to perform inter-prediction on PUs of other pictures. In addition, intra-prediction processing unit 126 may use reconstructed coded blocks in decoded picture buffer 116 to perform intra-prediction on other PUs that are in the same picture as the CU.
Entropy encoding unit 118 may receive data from various functional components of video encoder 20. For example, entropy encoding unit 118 may receive coefficient blocks from quantization unit 106 and may receive syntax elements from prediction processing unit 100. Additionally, entropy encoding unit 118 may receive the delta DC residual value from residual generation unit 102. Entropy encoding unit 118 may perform one or more entropy encoding operations on the data to generate entropy-encoded data. For example, entropy encoding unit 118 may perform CABAC operations. Examples of other entropy coding processes include Context Adaptive Variable Length Coding (CAVLC), syntax-based context adaptive binary arithmetic coding (SBAC), and Probability Interval Partitioning Entropy (PIPE) coding. In HEVC, CABAC is used. Video encoder 20 may output a bitstream that includes the entropy-encoded data generated by entropy encoding unit 118. For example, the bitstream may include bits that represent the bins of binarized syntax elements.
Video encoder 20 is an example of a video encoder configured to perform any of the techniques described in this disclosure, including techniques for simplified segment-by-segment DC coding of large prediction blocks. Additional 3D processing components may also be included within video encoder 20. In accordance with one or more techniques of this disclosure, one or more units within video encoder 20 may perform the techniques described herein as part of the video encoding process. Similarly, video encoder 20 may perform a video decoding process using any of the techniques of this disclosure to reconstruct video data used as reference data for prediction of subsequently coded video data.
Fig. 7 is a block diagram illustrating an example video decoder 30 configured to perform the techniques of this disclosure. Fig. 7 is provided for purposes of illustration and should not be construed as a limitation on the techniques as broadly illustrated and described in this disclosure. This disclosure describes video decoder 30 in the context of HEVC coding, and in particular 3D-HEVC coding. However, the techniques of this disclosure may be applicable to other 3D video coding standards or methods. Video decoder 30 may be configured to perform techniques for simplifying segment-by-Segment DC Coding (SDC) of large intra-predicted blocks (e.g., 64 × 64 blocks) in 3D video coding processes, such as 3D-HEVC. In some examples, video decoder 30 may be configured to process a larger intra-predicted block, e.g., a 64 × 64 intra-predicted block, into four smaller intra-predicted blocks, e.g., four 32 × 32 intra-predicted blocks, in intra SDC. In this way, in intra SDC depth coding, the 64 × 64 intra prediction of an HEVC intra prediction mode is processed as four 32 × 32 intra predictions. In some cases, processing a large intra-prediction block into multiple smaller intra-prediction blocks may reduce the maximum buffer size required of decoder 30 during intra SDC.
In the example of fig. 7, video decoder 30 includes entropy decoding unit 150, prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, filter unit 160, and decoded picture buffer 162. Prediction processing unit 152 includes a Motion Compensation (MC) unit 164 for inter prediction, and an intra prediction processing unit 166. For ease of illustration, the components of prediction processing unit 152 are described as performing both texture decoding and depth decoding. In some examples, texture and depth decoding may be performed by the same component of prediction processing unit 152 or different components within prediction processing unit 152. For example, separate texture and depth decoders may be provided in some implementations. Also, multiple texture and depth decoders may be provided to decode multiple views, e.g., for multiview plus depth coding. In any case, prediction processing unit 152 may be configured to intra or inter decode the texture data and depth data as part of a 3D coding process, such as a 3D-HEVC process.
Thus, prediction processing unit 152 may generally operate in accordance with 3D-HEVC, subject to the modifications and/or additions described in this disclosure, such as those relating to techniques for simplifying segment-wise DC coding (SDC) of larger intra-predicted blocks, e.g., techniques that process a larger intra-predicted block (such as a 64 × 64 intra-predicted block) into four smaller intra-predicted blocks (e.g., four 32 × 32 intra-predicted blocks) in intra SDC. Prediction processing unit 152 may obtain, via entropy decoding unit 150, residual data from the encoded video bitstream for intra- or inter-coded depth data using SDC or conventional non-SDC residual coding techniques, and reconstruct a CU using the intra- or inter-predicted depth data and the residual data. When SDC is used, the residual data may be a delta DC residual value. In some examples, video decoder 30 may include more, fewer, or different functional components than those shown in fig. 7.
Video decoder 30 receives an encoded video bitstream. Entropy decoding unit 150 parses the bitstream to decode entropy-encoded syntax elements from the bitstream. In some examples, for SDC, entropy decoding unit 150 may be configured to use a CABAC coder to decode, from bits in the bitstream, the bins of a syntax element that represents the delta DC residual value. Entropy decoding unit 150 may use a CABAC coder to decode various other syntax elements for different coding modes, including intra or inter coding modes using conventional residual coding and intra or inter SDC modes using delta DC residual coding.
Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 160 may generate decoded video data based on syntax elements extracted from the bitstream. The bitstream may comprise a sequence of NAL units. NAL units of a bitstream may include coded slice NAL units. As part of decoding the bitstream, entropy decoding unit 150 may extract and entropy decode syntax elements from coded slice NAL units.
Each of the coded slices may include a slice header and slice data. The slice header may contain syntax elements for the slice. The syntax elements in the slice header may include syntax elements that identify a PPS associated with the picture containing the slice. A PPS may refer to an SPS, which may in turn refer to a VPS. Entropy decoding unit 150 may also entropy decode other elements that may include syntax information, such as SEI messages. The decoded syntax elements in any of the slice header, parameter set, or SEI message may include information described herein as being signaled according to example techniques described in this disclosure. Such syntax information may be provided to prediction processing unit 152 for decoding and reconstructing the texture or depth block.
Video decoder 30 may perform reconstruction operations on non-partitioned CUs and PUs. To perform the reconstruction operation, for non-SDC coding, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing the reconstruction operation on each TU of the CU, video decoder 30 may reconstruct the blocks of the CU. As part of performing the reconstruction operation on a TU of a CU, inverse quantization unit 154 may inverse quantize (i.e., dequantize) the coefficient blocks associated with the TU. Inverse quantization unit 154 may use the QP value associated with the CU of the TU to determine the degree of quantization and, likewise, the degree of inverse quantization for inverse quantization unit 154 to apply. That is, the compression ratio, i.e., the ratio of the number of bits used to represent the original sequence to the number used for the compressed sequence, may be controlled by adjusting the value of the QP used when quantizing transform coefficients. The compression ratio may also depend on the method of entropy coding employed.
After inverse quantization unit 154 inverse quantizes the coefficient blocks, inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient blocks to generate residual blocks associated with the TUs. For example, inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block.
If the PU is encoded using intra prediction, intra prediction processing unit 166 may perform intra prediction to generate the predictive blocks for the PU. Intra prediction processing unit 166 may use the intra prediction mode to generate predictive luma, Cb, and Cr blocks for the PU based on the prediction blocks of the spatially neighboring PUs. Intra-prediction processing unit 166 may determine the intra-prediction mode for the PU based on one or more syntax elements decoded from the bitstream.
If the PU is encoded using inter prediction, MC unit 164 may perform inter prediction to generate an inter-predictive block for the PU. MC unit 164 may use the inter prediction mode to generate predictive luma, Cb, and Cr blocks for the PU based on the prediction blocks of PUs in other pictures or views. MC unit 164 may determine the inter prediction mode for the PU based on one or more syntax elements decoded from the bitstream, and may receive motion information, such as motion vectors, prediction directions, and reference picture indices.
For inter prediction, MC unit 164 may construct a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) based on syntax elements extracted from the bitstream. If the PU is encoded using inter prediction, entropy decoding unit 150 may extract the motion information of the PU. MC unit 164 may determine one or more reference blocks for the PU based on the motion information of the PU. Motion Compensation (MC) unit 164 may generate predictive luma, Cb, and Cr blocks for the PU based on the samples of the one or more reference blocks of the PU.
Reconstruction unit 158 may reconstruct the luma, Cb, and Cr coding blocks of the CU using the luma, Cb, and Cr transform blocks associated with the TUs of the CU and the predictive luma, Cb, and Cr blocks of the PUs of the CU (i.e., the intra-prediction data or the inter-prediction data), as appropriate. For example, reconstruction unit 158 may add residual samples of the luma, Cb, and Cr transform blocks to corresponding samples of the predictive luma, Cb, and Cr blocks to reconstruct luma, Cb, and Cr coding blocks of the CU.
Filter unit 160 may perform deblocking operations to reduce block artifacts associated with luma, Cb, and Cr coding blocks of a CU. Video decoder 30 may store the luma, Cb, and Cr coding blocks of the CU in decoded picture buffer 162. Decoded picture buffer 162 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation on a display device, such as display device 32 of fig. 3. For example, video decoder 30 may perform intra-prediction or inter-prediction operations on PUs of other CUs based on luma, Cb, and Cr blocks in decoded picture buffer 162.
Video decoder 30 is an example of a video decoder configured to perform any of the techniques described in this disclosure, including techniques for simplified segment-by-segment DC coding of large prediction blocks. In accordance with one or more techniques of this disclosure, one or more units within video decoder 30 may perform one or more techniques described herein as part of a video decoding process. Additional 3D coding components may also be included within video decoder 30.
Prediction processing unit 152, and more particularly intra prediction processing unit 166 and Motion Compensation (MC) unit 164, may determine whether to perform SDC in a depth intra prediction mode and a depth inter prediction mode of a 3D video coding process, such as 3D-HEVC, when applicable. When using SDC, entropy decoding unit 150 may entropy decode one or more delta DC residual values for PUs or PU partitions of the depth CU and associated syntax information.
For SDC, entropy decoding unit 150 may provide the SDC syntax information for the block to prediction processing unit 152, as indicated in fig. 7. Entropy decoding unit 150 may provide the delta DC residual value to reconstruction unit 158. The delta DC residual values received by video decoder 30 may not be transformed and quantized. In particular, the delta DC residual values need not first be provided to the inverse quantization unit 154 and inverse transform processing unit 156 for inverse quantization and inverse transformation. Alternatively, entropy decoding unit 150 may decode, from bits in the bitstream, binary numbers representing one or more syntax elements of the delta DC residual value and provide information representing the delta DC residual value to reconstruction unit 158 for use in reconstructing the SDC coded PU or partition. Reconstruction unit 158 may receive the intra-or inter-predicted PU or PU partition of the depth CU from prediction processing unit 152 and add delta DC residual values to each of the samples of the predicted PU or PU partition to reconstruct the coded PU or PU partition.
In this way, when SDC is used, reconstruction unit 158 may reconstruct the depth CU based on the delta DC residual values of the partitions of the PUs of the CU and the corresponding predicted PUs or PU partitions of the CU. Likewise, the delta DC residual value may represent the difference between the average of the pixels of the depth PU or PU partition and the average of the samples of the predicted PU or PU partition. As will be described, when the syntax information indicates the intra SDC 64 × 64 mode, decoder 30 processes the 64 × 64 block as four 32 × 32 blocks.
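SDC reconstruction itself reduces to a per-sample offset. The sketch below assumes the delta DC values have already been decoded (and, when a DLT is used, mapped back to the pixel domain as described above); partition[y][x] selects partition 0 or 1 for each sample, and the names are illustrative.

```cpp
// Sketch: reconstruct an SDC-coded PU by offsetting each predicted sample
// with the delta DC value of the partition it belongs to.
#include <cstdint>
#include <vector>

void reconstructSdcPu(const std::vector<std::vector<int16_t>>& pred,
                      const std::vector<std::vector<uint8_t>>& partition,
                      const int deltaDc[2],
                      std::vector<std::vector<int16_t>>& rec) {
  for (size_t y = 0; y < pred.size(); ++y)
    for (size_t x = 0; x < pred[y].size(); ++x)
      rec[y][x] =
          static_cast<int16_t>(pred[y][x] + deltaDc[partition[y][x]]);
}
```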
In the HEVC main specification, the maximum intra prediction size is 32 × 32. However, in intra SDC mode of 3D-HEVC, the maximum intra prediction size of planar mode is 64 × 64. With the scheme in JCT3V-F0126, SDC may be applied to the additional depth intra prediction modes and the original HEVC intra prediction modes. Thus, with this scheme, the maximum intra prediction size for all HEVC intra prediction modes is 64 × 64. Both the schemes in 3D-HEVC and JCT3V-F0126 increase the maximum buffer size for intra prediction when compared to HEVC.
According to an example of this disclosure, to simplify SDC coding of a large intra-predicted block, video encoder 20 and video decoder 30 may be configured to process the large intra-predicted block into four smaller intra blocks. For example, with a 64 × 64 intra-predicted block, video encoder 20 and video decoder 30 may be configured to process the 64 × 64 intra-predicted block into four 32 × 32 intra-predicted blocks. In this way, in SDC, the 64 × 64 intra prediction for HEVC intra prediction mode is processed into four 32 × 32 intra predictions. The indications of 64 × 64 and 32 × 32 refer to the number of pixels in a depth PU or partition to be intra coded using SDC, or the corresponding number of prediction samples in an intra-predicted depth PU or partition. Various aspects of techniques for processing a 64 × 64 intra-prediction block into four 32 × 32 intra-prediction blocks are described below for purposes of example.
Fig. 8 is a diagram illustrating processing of a 64 × 64 SDC intra-prediction block into four smaller 32 × 32 intra-prediction blocks. As shown in fig. 8, the 64 × 64 depth block 170 is split into four 32 × 32 sub-blocks 172, 174, 176, and 178. Each of the sub-blocks 172, 174, 176, 178 may have the same intra-mode. For example, an intra-mode may be applied to the block 170 and to each of the four 32 × 32 sub-blocks 172, 174, 176, 178. Block 172 is the upper-left 32 × 32 intra block of the 64 × 64 block, block 174 is the upper-right 32 × 32 intra block, block 176 is the lower-left 32 × 32 intra block, and block 178 is the lower-right 32 × 32 intra block of the 64 × 64 block of depth data. In this example, the 64 × 64 block 170 extends horizontally from x = 0 to x = 63 and vertically from y = 0 to y = 63. The upper-left 32 × 32 intra block 172 extends horizontally from x = 0 to x = 31 and vertically from y = 0 to y = 31, the upper-right 32 × 32 intra block 174 extends horizontally from x = 32 to x = 63 and vertically from y = 0 to y = 31, the lower-left 32 × 32 intra block 176 extends horizontally from x = 0 to x = 31 and vertically from y = 32 to y = 63, and the lower-right 32 × 32 intra block 178 extends horizontally from x = 32 to x = 63 and vertically from y = 32 to y = 63.
As further shown in fig. 8, spatially neighboring reconstructed samples RecSample[x][y] of the 64 × 64 block may be used to predict the 32 × 32 blocks for some intra modes, where x and y are the horizontal and vertical positions of the reconstructed sample relative to the top-left sample of 64 × 64 block 170 at x = 0, y = 0. For example, fig. 8 shows reconstructed left-neighboring samples 182 at, e.g., i = -1 and j = 0 to 63, the upper-left neighboring sample 184 at, e.g., i = -1, j = -1, upper neighboring samples 186 at, e.g., i = 0 to 63, j = -1, and upper-right neighboring samples 188 at, e.g., i = 64 to 127, j = -1 (only a portion of which are shown in fig. 8). If 64 × 64 is the maximum coding unit size for both HEVC and 3D-HEVC, the lower-left neighboring samples at i = -1, j = 64 to 127 are typically not available because coding units are typically coded in raster scan order. Thus, when a coding unit is coded, the coding units below it have not yet been encoded, and thus have not yet generated any usable reconstructed samples.
The reconstructed neighboring samples 182-188 reside in neighboring blocks that are spatially adjacent to the 64 × 64 block 170. The neighboring samples 182 to 188 are "reconstructed" in the sense that the blocks in which they reside were encoded or decoded and reconstructed before the currently encoded or decoded 64 × 64 block 170.
It is proposed in examples of this disclosure that in intra SDC coding, a 64 × 64 intra-predicted block coded with HEVC intra-prediction mode as in current 3D-HEVC and potentially as proposed in JCT3V-F0126 may be simplified by splitting the 64 × 64 intra-prediction into four 32 × 32 intra-predicted blocks, e.g., as shown in fig. 8. These four 32 × 32 intra-prediction blocks may have the same intra-prediction mode and may be intra-predicted by video encoder 20 and video decoder 30 in raster scan or decoding order.
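The split itself can be sketched as a loop over the four sub-block origins, all sharing one intra-prediction mode. The Block64 type and the intraPredict32x32 helper are placeholders standing in for the normal HEVC intra-prediction process, not 3D-HEVC names.

```cpp
// Sketch: predict a 64x64 intra SDC block as four 32x32 sub-blocks in
// raster-scan order.
#include <cstdint>

struct Block64 {
  int16_t pred[64][64];  // predicted samples of the 64x64 block
};

// Hypothetical helper: runs intra prediction for one 32x32 sub-block whose
// top-left corner is at (x0, y0) inside the 64x64 block.
void intraPredict32x32(int intraMode, Block64& blk, int x0, int y0);

void predict64x64AsFourSubBlocks(int intraMode, Block64& blk) {
  // Raster-scan order: upper-left, upper-right, lower-left, lower-right.
  const int offsets[4][2] = {{0, 0}, {32, 0}, {0, 32}, {32, 32}};
  for (const auto& off : offsets) {
    // All four sub-blocks share the same intra-prediction mode; each call
    // draws its reference samples as described in the examples below.
    intraPredict32x32(intraMode, blk, off[0], off[1]);
  }
}
```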
In a first example, the predicted samples of each 32 × 32 block may be used in place of at least some neighboring reconstructed samples to predict subsequent 32 × 32 blocks within the 64 × 64 block, as explained below. Thus, for intra prediction of some 32 × 32 blocks of a 64 × 64 block, the available reference samples may be reconstructed only, reconstructed or predicted depending on the mode, or predicted only. In general, a reconstructed reference sample has been reconstructed, e.g., by summing a predicted reference sample with a residual value, while a predicted reference sample has typically not yet been summed with a residual value. In each case, the reconstructed or predicted samples used to intra-predict a 32 × 32 block may typically be adjacent to that 32 × 32 block. In a second example, only reconstructed samples are used to intra-predict the 32 × 32 blocks of the 64 × 64 block, including, for some 32 × 32 blocks, reconstructed samples that are adjacent to a boundary of the 64 × 64 block but not immediately adjacent to the respective 32 × 32 block.
In either the first example or the second example, the first 32 x 32 block (i.e., the upper left 32 x 32 sub-block of the 64 x 64 block) may be intra predicted in the same manner as specified by HEVC. In particular, neighboring samples of spatially neighboring blocks outside of the 64 × 64 block will typically be reconstructed and available for intra prediction of the upper left 32 × 32 block. However, either predicted samples or reconstructed samples may be used for some 32 x 32 blocks in the first example, while only reconstructed samples are used for the second example.
A first example of some 32 x 32 blocks for which reconstructed or predicted samples may be used for intra-prediction of 64 x 64 blocks will now be described. Referring to fig. 8, left neighboring sample 182, upper-left neighboring sample 184, and upper neighboring sample 186, for example, of 64 x 64 block 170 will typically be reconstructed and available for use by video encoder 20 and video decoder 30 in intra-predicting upper-left 32 x 32 intra block 172.
The top-left block 172 may be intra-predicted using any of the following reconstructed samples RecSample[i][j], depending on the intra prediction mode: the reconstructed left-neighboring samples 182 at i = -1, j = 32 to 63, which serve as the lower-left reconstructed samples; the left-neighboring samples 182 at i = -1, j = 0 to 31, which serve as the left reconstructed samples; the reconstructed upper-left neighboring sample 184 at i = -1, j = -1; the reconstructed upper neighboring samples 186 at i = 0 to 31, j = -1; and the upper neighboring samples 186 at i = 32 to 63, j = -1, which serve as the upper-right reconstructed samples. Furthermore, the four sub-blocks 172, 174, 176, 178 each have the same intra-mode, and the particular samples from among the available reconstructed and/or predicted samples used to predict the sub-blocks will depend on the selected intra-mode.
In general, to define predicted reference samples that video encoder 20 and video decoder 30 may use (i.e., predicted samples that may be used to intra-predict a particular 32 × 32 block), let PredSample[x][y] be the predicted samples of the 64 × 64 block, where x = 0 to 63 and y = 0 to 63, and where x and y are the horizontal and vertical positions of the sample relative to the top-left sample of 64 × 64 block 170 at x = 0, y = 0.
For each of the other three 32 × 32 blocks (i.e., blocks 174, 176, and 178), in addition to any available neighboring reconstructed samples, the predicted samples PredSample[i][j] of a previous 32 × 32 block (i.e., a previously predicted 32 × 32 block) of the 64 × 64 block may also be used as neighboring samples in the intra prediction process. When the 64 × 64 block in which a 32 × 32 block resides is being coded, such samples may have been predicted but not yet reconstructed.
For intra-predicting upper-right 32 × 32 intra block 174, the reference samples available to video encoder 20 and video decoder 30 may include the portion of reconstructed upper neighboring samples 186 that resides above the upper-right 32 × 32 intra block 174 and a portion of reconstructed upper-right neighboring samples 188 (shown partially in fig. 8), plus the left neighboring intra-predicted samples 190 of the upper-left 32 × 32 intra block 172. If the predicted samples 198 of the lower-left 32 × 32 intra block 176 have not yet been predicted when the upper-right 32 × 32 intra block 174 is being predicted, they may not be usable as lower-left samples.
Thus, in some examples, the upper-right 32 x 32 block 174 may be intra-predicted without using the lower-left samples. In this case, for some intra modes, video encoder 20 and video decoder 30 may intra-predict upper-right 32 x 32 intra block 174 using neighboring reconstructed samples, left neighboring intra-predicted samples 190 in place of reconstructed reference samples, or left neighboring intra-predicted samples 190 in conjunction with reconstructed neighboring reference samples. Thus, the neighboring predicted samples that may be used to intra predict the upper-right 32 × 32 intra block 174 are from the upper-left 32 × 32 intra block 172 that precedes the upper-right 32 × 32 intra block in coding order. In each case, the reconstructed or predicted samples may be used to intra-predict the neighboring upper-right 32 x 32 intra block 174.
In predicting the second, upper-right 32 × 32 block 174, PredSample[i][j] with i = 31, j = 0 to 31 is used as the left neighboring samples, in the same manner as intra prediction with reconstructed samples is performed in HEVC on a 32 × 32 block in the same relative position. In this case, the predicted samples 190 of PredSample[i][j] at i = 31, j = 0 to 31, which have already been predicted for the upper-left block 172, serve as the left neighboring samples for intra-predicting the upper-right block 174. For the upper-right 32 × 32 block 174, the upper-left sample may be the reconstructed sample RecSample[i][j] at i = 31, j = -1, the upper samples are the samples RecSample[i][j] at i = 32 to 63, j = -1, and the upper-right samples are the samples RecSample[i][j] at i = 64 to 95, j = -1, all of which may be obtained from previously coded (e.g., previously coded in raster order) neighboring blocks above the 64 × 64 block 170.
For intra-prediction of lower-left 32 × 32 intra block 176, the reference samples available to video encoder 20 and video decoder 30 for intra-prediction may include: the portion of reconstructed left neighboring sample 182 that resides to the left of the lower left 32 x 32 intra block 176 as a left sample, the portion of reconstructed left neighboring sample 182 that resides above and to the left of the lower left 32 x 32 intra block 176 as an upper left sample, plus the upper neighboring intra predicted sample 192 of the upper left 32 x 32 intra block 172 as an upper sample, and the upper right intra predicted sample 196 of the upper right 32 x 32 intra block 174 as an upper right sample. In this case, for some intra modes, video encoder 20 and video decoder 30 may intra-predict lower left 32 x 32 intra block 176 using neighboring reconstructed samples, upper neighboring intra-predicted samples 192, 196 in place of reconstructed reference samples, or neighboring intra-predicted samples 192, 196 in conjunction with neighboring reconstructed samples, depending on the intra mode. The neighboring intra-predicted samples available for intra-predicting the lower-left 32 × 32 intra block 176 come from the upper-left 32 × 32 intra block 172 and the upper-right 32 × 32 intra block 174 that precede the lower-left 32 × 32 intra block in coding order.
In predicting the third, lower-left 32 × 32 block 176, PredSample[i][j] with i = 0 to 31, j = 31 is used as the upper neighboring samples and PredSample[i][j] with i = 32 to 63, j = 31 is used as the upper-right neighboring samples, in the same manner as intra prediction with reconstructed samples is performed in HEVC on a 32 × 32 block in the same relative position. In this case, the predicted samples 192, which have already been predicted for the upper-left block 172 and are adjacent to the lower-left block 176 at i = 0 to 31, j = 31, serve as the upper neighboring samples for intra-predicting the lower-left block 176, and the predicted samples 196, which have already been predicted for the upper-right block 174 at i = 32 to 63, j = 31, serve as the upper-right neighboring samples for intra-predicting the lower-left block 176. For the lower-left 32 × 32 block 176, the upper-left sample may be the reconstructed sample RecSample[i][j] at i = -1, j = 31, and the left samples are the samples RecSample[i][j] at i = -1, j = 32 to 63, all of which may be obtained from a previously coded (e.g., previously coded in raster order) block to the left of the 64 × 64 block 170.
For intra-prediction of lower-right 32 × 32 intra block 178, the reference samples available to video encoder 20 and video decoder 30 for intra-prediction may include only neighboring predicted samples of the neighboring 32 × 32 block. For example, the reference samples available to video encoder 20 and video decoder 30 for intra-predicting lower-right 32 x 32 intra block 178 may not include reconstructed samples, and may instead include upper-left neighboring predicted samples 194 of upper-left 32 x 32 intra block 172, upper neighboring predicted samples 196 of upper-right 32 x 32 intra block 174, and left neighboring predicted samples 198 of lower-left 32 x 32 intra block 176.
In predicting the fourth, lower-right 32 × 32 block 178, PredSample[31][31] is used as the upper-left neighboring sample, the predicted samples PredSample[i][j] at i = 31, j = 32..63 are used as the left neighboring samples, and the predicted samples PredSample[i][j] at i = 32..63, j = 31 are used as the upper neighboring samples, in the same manner as intra prediction with reconstructed samples is performed in HEVC on a 32 × 32 block in the same relative position. In this example, for the lower-right 32 × 32 intra block 178, the predicted neighboring samples of blocks 172, 174, 176 are used. Thus, in one example, no reconstructed neighboring samples are used for intra-predicting the lower-right 32 × 32 intra block 178, and only the predicted neighboring samples are used. Also, the lower-right 32 × 32 intra block 178 may be predicted without using lower-left or upper-right reference samples.
In this case, video encoder 20 and video decoder 30 intra-predict lower right 32 × 32 intra block 178 using only neighboring intra-predicted samples of neighboring 32 × 32 blocks of the 64 × 64 block, instead of reconstructed reference samples. Thus, in this example, video encoder 20 and video decoder 30 do not use reconstructed reference samples to intra predict lower right 32 x 32 intra block 178. Alternatively, only the neighboring predicted samples of the 32 x 32 intra block 172, 174, 176 that precede the bottom-right 32 x 32 intra block 178 in coding order are used.
For each of the 32 x 32 blocks 172, 174, 176, 178 of the 64 x 64 intra block 170, the particular reference samples selected by video encoder 20 and video decoder 30 from the available predicted and/or reconstructed reference samples will depend on the particular intra mode selected for coding the 32 x 32 intra block. By splitting the 64 x 64 block into four 32 x 32 blocks, the intra prediction process can be simplified. In some examples, processing smaller blocks may reduce memory buffer requirements in video encoder 20 or video decoder 30.
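Under the first example, the choice between reconstructed and predicted reference samples for the left column, top row, and top-left corner of each sub-block can be summarized as follows. The accessors rec(i, j) (reconstructed neighbors of the 64 × 64 block, with i = -1 or j = -1) and pred(i, j) (predicted samples inside the block) and all other names are assumptions for illustration; the above-right and below-left reference extensions are omitted for brevity.

```cpp
// Sketch: reference-sample selection for the four 32x32 sub-blocks in the
// first example (predicted samples replace unavailable reconstructed ones).
#include <array>
#include <cstdint>
#include <functional>

constexpr int kSub = 32;

struct RefSamples {
  std::array<int16_t, kSub> left;  // left neighbors, top to bottom
  std::array<int16_t, kSub> top;   // top neighbors, left to right
  int16_t topLeft;
};

// blockIdx: 0 = upper-left, 1 = upper-right, 2 = lower-left, 3 = lower-right.
RefSamples gatherRefs(int blockIdx,
                      const std::function<int16_t(int, int)>& rec,
                      const std::function<int16_t(int, int)>& pred) {
  const int x0 = (blockIdx & 1) * kSub;   // horizontal sub-block offset
  const int y0 = (blockIdx >> 1) * kSub;  // vertical sub-block offset
  RefSamples r{};
  for (int j = 0; j < kSub; ++j)  // reconstructed only on the 64x64 left edge
    r.left[j] = (x0 == 0) ? rec(-1, y0 + j) : pred(x0 - 1, y0 + j);
  for (int i = 0; i < kSub; ++i)  // reconstructed only on the 64x64 top edge
    r.top[i] = (y0 == 0) ? rec(x0 + i, -1) : pred(x0 + i, y0 - 1);
  r.topLeft = (x0 == 0 || y0 == 0) ? rec(x0 - 1, y0 - 1)
                                   : pred(x0 - 1, y0 - 1);
  return r;
}
```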
In the second example, as an alternative to using predicted neighboring samples, only the neighboring reconstructed samples RecSample[i][j] adjacent to the current 64 × 64 block 170 are used to predict all four 32 × 32 blocks 172, 174, 176, 178. In this example, the predicted samples of blocks 172, 174, 176 are not used for intra-predicting blocks 174, 176, and 178. Instead, reconstructed samples adjacent to the 64 × 64 block 170 are used to intra-predict blocks 174, 176, and 178, even though some of the reconstructed samples adjacent to the larger 64 × 64 block 170 are not adjacent to a given 32 × 32 sub-block 174, 176, 178.
In general, to define the reconstructed reference samples that video encoder 20 and video decoder 30 may use (i.e., the reconstructed reference samples that are available for intra prediction of a particular 32 × 32 block), RecSample[x][y] is likewise given as the reconstructed neighboring samples of the 64 × 64 block, where x = -1, y = -1 to 63, or x = 0 to 63, y = -1, and where x and y are, respectively, the horizontal and vertical positions of the reconstructed sample relative to the top-left sample at x = 0, y = 0 of the 64 × 64 block 170.
In this second example, the first, upper-left 32 × 32 block is predicted in the same way as in HEVC or as in the first example above. That is, the first, upper-left 32 × 32 block may still be predicted using the same process typically used in HEVC for intra prediction of, e.g., 32 × 32 blocks. In particular, when intra-predicting the first, upper-left 32 × 32 intra block 172, video encoder 20 and video decoder 30 may use any of the following reconstructed neighboring samples RecSample[i][j]: the reconstructed left samples 182 at i = -1, j = 0 to 31; the reconstructed lower-left samples 182 at i = -1, j = 32 to 63; the reconstructed upper-left sample 184 at i = -1, j = -1; the reconstructed upper samples 186 at i = 0 to 31, j = -1; and the reconstructed upper-right samples 186 at i = 32 to 63, j = -1.
For each of the other three 32 × 32 blocks (upper-right block 174, lower-left block 176, and lower-right block 178), in addition to the reconstructed samples that are actually adjacent to these blocks (e.g., the upper-left, upper, and upper-right reconstructed samples adjacent to upper-right block 174, and the upper-left and left reconstructed samples adjacent to lower-left block 176; no reconstructed samples are adjacent to lower-right block 178), additional reconstructed samples RecSample[i][j] are also used as reference samples in the intra prediction process, as described below.
In predicting the second, upper-right 32 × 32 block 174, the reconstructed samples RecSample[i][j] at i = -1, j = 0 to 31 are used by video encoder 20 and video decoder 30 as the left samples for intra-predicting the upper-right 32 × 32 intra block 174, even though these reconstructed samples are not adjacent (i.e., not contiguous) to the upper-right 32 × 32 intra block 174. Video encoder 20 and video decoder 30 may also use the reconstructed left samples 182 RecSample[i][j] at i = -1, j = 32 to 63 as the lower-left reconstructed samples for the upper-right 32 × 32 intra block 174. Like the left reconstructed samples, these lower-left reconstructed samples are not adjacent (i.e., not contiguous) to the upper-right 32 × 32 intra prediction block 174.
Video encoder 20 and video decoder 30 may also use reconstructed samples that are actually adjacent (i.e., contiguous) to the upper-right 32 × 32 intra block 174 of the 64 × 64 block 170, in addition to the non-adjacent (i.e., non-contiguous) reconstructed samples. For example, other reconstructed samples that may be used for intra-predicting the upper-right 32 × 32 intra block 174 include: the portion of reconstructed samples 186 serving as the upper-left reconstructed sample at i = 31, j = -1; the reconstructed samples 186 serving as the upper reconstructed samples at i = 32 to 63, j = -1; and the portion of reconstructed samples 188 serving as the upper-right reconstructed samples at i = 64 to 95, j = -1. Thus, to intra-predict the upper-right 32 × 32 intra block 174 of the 64 × 64 intra block 170, video encoder 20 and video decoder 30 may use reconstructed samples that are adjacent (i.e., contiguous) to the top of the 32 × 32 block 174 and reconstructed samples that are adjacent to the left of the 64 × 64 block but not adjacent to the left of the 32 × 32 block 174.
In predicting the third, lower-left 32 × 32 block 176 of the 64 × 64 block 170, the neighboring reconstructed samples 182 RecSample[i][j] at i = -1, j = 32 to 63 are used by video encoder 20 and video decoder 30 as the left neighboring samples. In this example, the neighboring reconstructed sample 182 RecSample[i][j] at i = -1, j = 31 may be used as the upper-left neighboring reconstructed sample for intra prediction of the lower-left 32 × 32 intra block 176. Thus, some of the reconstructed samples used for intra prediction may be adjacent to both the 64 × 64 block 170 and the lower-left 32 × 32 intra block.
In addition, video encoder 20 and video decoder 30 may use reconstructed samples that are adjacent to the 64 × 64 block 170 but not adjacent (i.e., not contiguous) to the lower-left 32 × 32 intra block 176 for intra prediction of that block. For example, when intra-predicting the third, lower-left 32 × 32 block 176 of the 64 × 64 block 170, the reconstructed samples 186 RecSample[i][j] at i = 0 to 31, j = -1 may be used by video encoder 20 and video decoder 30 as the upper reconstructed samples for intra prediction. In addition, the reconstructed samples 186 RecSample[i][j] at i = 32 to 63, j = -1 may be used by video encoder 20 and video decoder 30 as the upper-right reconstructed samples for the lower-left 32 × 32 intra block 176. Thus, to intra-predict the lower-left 32 × 32 intra block 176 of the 64 × 64 intra block 170, video encoder 20 and video decoder 30 may use reconstructed samples that are adjacent (i.e., contiguous) to the left of the 32 × 32 block 176 and neighboring reconstructed samples that are adjacent to the top of the 64 × 64 block 170 but not adjacent to the top of the lower-left 32 × 32 block 176, depending on the particular intra mode.
In predicting the fourth, lower-right 32 × 32 block 178, video encoder 20 and video decoder 30 may use reconstructed samples that are not adjacent (i.e., not contiguous) to the 32 × 32 block. For example, video encoder 20 and video decoder 30 may use reconstructed samples that are adjacent to the 64 × 64 block 170 but not adjacent to the lower-right 32 × 32 intra block 178, as follows: the reconstructed sample 182 RecSample[i][j] at i = -1, j = 31 may be used as the upper-left sample, the reconstructed samples 182 RecSample[i][j] at i = -1, j = 32 to 63 may be used as the left samples, and the reconstructed samples 186 RecSample[i][j] at i = 32 to 63, j = -1 may be used as the upper samples. In some examples, for the lower-right 32 × 32 intra block 178, video encoder 20 and video decoder 30 may also use the portion of reconstructed samples 188 at i = 64 to 95, j = -1 as the upper-right samples.
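The reference-sample selection of this second example may likewise be sketched for purposes of illustration. The hypothetical fragment below assumes the same rec_top and rec_left arrays as in the earlier sketch and merely tabulates which reconstructed samples of the 64 × 64 block serve each sub-block; it is not part of the described coding process.

```python
def second_example_references(rec_top, rec_left, rec_topleft):
    """Sketch of the second example: every 32x32 sub-block is predicted only
    from reconstructed samples neighboring the 64x64 block (row j = -1 and
    column i = -1), even where those samples are not contiguous to the
    sub-block. Returns (top, left, topleft) reference samples per sub-block.
    """
    return {
        # Block 172: all references are contiguous, as in HEVC.
        "upper_left_172":  (rec_top[0:32],  rec_left[0:32],  rec_topleft),
        # Block 174: left references RecSample[-1][0..31] are NOT contiguous.
        "upper_right_174": (rec_top[32:64], rec_left[0:32],  rec_top[31]),
        # Block 176: upper references RecSample[0..31][-1] are NOT contiguous.
        "lower_left_176":  (rec_top[0:32],  rec_left[32:64], rec_left[31]),
        # Block 178: none of its references are contiguous to it.
        "lower_right_178": (rec_top[32:64], rec_left[32:64], rec_left[31]),
    }
```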
For each of the first and second examples above, video encoder 20 may generate residual data that indicates differences between pixels of the coded 64 × 64 block and the corresponding four predictive 32 × 32 blocks of intra-predicted samples. In the case of conventional residual coding, the residual data may include a plurality of residual values that indicate differences, on a sample-by-sample basis, between pixels of the original 64 × 64 block and corresponding predictive samples of the four 32 × 32 predicted blocks. Alternatively, in the case of SDC, the residual data may be a single delta DC value representing the difference between the average of the pixels in the original 64 × 64 block and the average of the predictive samples of the four 32 × 32 predicted blocks, or alternatively a single delta DC value representing the difference between the average of the pixels in the original 64 × 64 block and the average of four predictive samples, namely the top-left pixel of the top-left predicted block 172, the top-right pixel of the top-right predicted block 174, the bottom-left pixel of the bottom-left predicted block 176, and the bottom-right pixel of the bottom-right predicted block 178. In either case, video encoder 20 may encode syntax information that indicates the intra coding mode for the 64 × 64 block and the residual data for the 64 × 64 block. Video decoder 30 may predict the 64 × 64 block using the syntax information and reconstruct the block by summing the residual with the predicted block. When the intra SDC 64 × 64 mode is indicated, video decoder 30 may process the intra SDC 64 × 64 block of depth data as four 32 × 32 intra-predicted sub-blocks. In this way, the prediction of a 64 × 64 block is essentially performed in four 32 × 32 sub-blocks; however, the residual is calculated for the 64 × 64 block, and the intra prediction mode is likewise signaled for the 64 × 64 block rather than for the 32 × 32 sub-blocks. For example, encoder 20 may encode, and decoder 30 may decode, syntax information indicating a 64 × 64 intra SDC mode and residual data indicating differences between pixel values of the 64 × 64 block and the intra-predicted samples of the 32 × 32 blocks.
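As a concrete illustration of the two SDC residual options just described, consider the following hypothetical sketch; the function name and the integer rounding are assumptions for illustration only and are not taken from the described process.

```python
import numpy as np

def sdc_delta_dc(orig, pred, four_corner_variant=False):
    """Sketch of the single delta DC residual for a 64x64 SDC block:
    either the difference of the two block averages, or the difference
    between the original block average and the average of one corner
    sample from each of the four predicted 32x32 sub-blocks."""
    if four_corner_variant:
        # Top-left of block 172, top-right of 174, bottom-left of 176,
        # bottom-right of 178 (arrays indexed [vertical, horizontal]).
        corner_avg = (pred[0, 0] + pred[0, 63] + pred[63, 0] + pred[63, 63]) / 4.0
        return int(round(orig.mean() - corner_avg))
    return int(round(orig.mean() - pred.mean()))
```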
In each of the first and second examples above, no transform is required when SDC is applied. More generally, if such methods are also applied to non-SDC intra prediction with conventional residual coding, in which a transform is applied, the methods would only be applied when the maximum transform size is not greater than the intra prediction block size (e.g., 32 × 32). For example, the above approach may additionally be used to enable normal 64 × 64 intra prediction for depth coding in 3D-HEVC, but may be limited to coding with transform sizes less than or equal to 32 × 32.
Fig. 9 is a flow diagram illustrating a method for encoding a 64 × 64 intra depth block, according to an example of this disclosure. In the example of fig. 9, video encoder 20 selects an intra SDC 64 × 64 mode for encoding a depth block, e.g., in a 3D-HEVC process (200). When the intra 64 × 64 mode is selected, video encoder 20 intra-predicts the 64 × 64 block using four 32 × 32 blocks (i.e., sub-blocks) of the 64 × 64 intra block (202), as described in this disclosure. Video encoder 20 then generates residual data indicating the differences between the pixels of the original 64 × 64 block and the intra-predictive samples of the four 32 × 32 predicted sub-blocks (204), and encodes the intra 64 × 64 block based on the intra prediction mode for the 64 × 64 block and the residual data (206). For example, video encoder 20 may signal syntax information indicating the 64 × 64 intra mode for the 64 × 64 depth block, and thereby signal the intra mode used to predict the samples of each of the 32 × 32 sub-blocks, together with the residual data for the 64 × 64 block. In the case of intra SDC, in some examples, the residual data may comprise delta DC values for the 64 × 64 block or partitions thereof. The delta DC values need not be transformed or quantized for encoding in the bitstream. In the case of conventional residual coding, the bitstream may include quantized transform coefficients that represent the residual data.
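Under the assumptions of the earlier sketches, the fig. 9 flow might be expressed as follows; make_intra_32x32 is a hypothetical factory returning a 32 × 32 predictor for the chosen mode, and the returned dictionary merely stands in for the signaled syntax.

```python
def encode_intra_sdc_64x64(orig, rec_top, rec_left, rec_topleft, mode):
    """Sketch of the fig. 9 encoder flow: predict the 64x64 block as four
    32x32 sub-blocks (202), derive a single delta DC residual with no
    transform or quantization (204), and emit syntax for the 64x64 block
    as a whole (206)."""
    predictor = make_intra_32x32(mode)  # hypothetical helper
    pred = predict_64x64_as_four_32x32(rec_top, rec_left, rec_topleft, predictor)
    delta_dc = sdc_delta_dc(orig, pred)
    # Mode and residual are signaled once for the 64x64 block, not per sub-block.
    return {"intra_mode": mode, "delta_dc": delta_dc}
```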
Fig. 10 is a flow diagram illustrating a method for decoding a 64 × 64 intra depth block, according to an example of this disclosure. In the example of fig. 10, e.g., in a 3D-HEVC process, video decoder 30 receives syntax information in an encoded video bitstream that indicates an SDC 64 × 64 intra mode for a 64 × 64 depth block to be decoded (210). In response to the syntax information indicating the SDC intra 64 × 64 mode, video decoder 30 intra-predicts four 32 × 32 depth sub-blocks of the 64 × 64 depth block, e.g., using the same intra mode for each of the sub-blocks (212), and receives residual data for the 64 × 64 block (214). For example, video decoder 30 may decode the residual data from the bitstream. Moreover, the residual data may be generated by SDC without transform or quantization, or by conventional residual coding, in which case video decoder 30 may apply inverse quantization and inverse transform to obtain the residual data. Video decoder 30 reconstructs the depth data for the 64 × 64 intra-coded block based on the four 32 × 32 blocks of intra-predicted samples and the received residual data for the 64 × 64 block (216).
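The corresponding fig. 10 decoder flow, under the same assumptions and for the SDC case, re-runs the four sub-block predictions and adds the delta DC residual to every sample; the clipping range is an assumption for 8-bit depth samples.

```python
import numpy as np

def decode_intra_sdc_64x64(syntax, rec_top, rec_left, rec_topleft, bit_depth=8):
    """Sketch of the fig. 10 decoder flow for the SDC case (no inverse
    transform or inverse quantization): intra-predict the four 32x32
    sub-blocks (212) and add the received delta DC residual (214, 216)."""
    predictor = make_intra_32x32(syntax["intra_mode"])  # hypothetical helper
    pred = predict_64x64_as_four_32x32(rec_top, rec_left, rec_topleft, predictor)
    return np.clip(pred + syntax["delta_dc"], 0, (1 << bit_depth) - 1)
```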
Although 64 x 64 and 32 x 32 blocks are described for purposes of example and illustration, in other examples, the techniques described in this disclosure may be applied to larger blocks. For example, an intra 128 × 128 block may be processed as four 64 × 64 blocks, or a 256 × 256 block may be processed as four 128 × 128 blocks. Thus, the concepts applied to 64 × 64 and 32 × 32 blocks may be considered applicable to larger blocks.
In some examples, the reference samples available to video encoder 20 and/or video decoder 30 for intra-predicting samples of at least some of the 32 x 32 blocks include intra-predicted reference samples of one or more of the other 32 x 32 blocks. As another example, the reference samples available for intra-predicting samples of at least some of the 32 x 32 blocks include intra-predicted reference samples of one or more of the other 32 x 32 blocks that neighbor the respective 32 x 32 block. As another example, the reference samples available for intra-predicting samples of at least some of the 32 x 32 blocks include intra-predicted reference samples of one or more of the other 32 x 32 blocks that neighbor the respective 32 x 32 block and reconstructed samples of neighboring 64 x 64 blocks and the respective 32 x 32 block.
In another example, the reference samples available to video encoder 20 and/or video decoder 30 for intra-predicting samples of at least some of the 32 x 32 blocks include reconstructed samples that neighbor the 64 x 64 block. As another example, the reference samples available for intra-predicting samples of at least some of the 32 x 32 blocks include reconstructed samples that neighbor the 64 x 64 block but not the respective 32 x 32 block.
Thus, in various examples, video encoder 20 or video decoder 30 may be configured to: intra-predicting samples of the 32 x 32 block using intra-predicted reference samples of one or more of the other 32 x 32 blocks; intra-predicting samples of the 32 x 32 block using intra-predicted reference samples of one or more other 32 x 32 blocks neighboring the respective 32 x 32 block; intra-predicting samples of the 32 x 32 block using reference samples of one or more of other 32 x 32 blocks that neighbor the respective 32 x 32 block and reconstructed samples of neighboring 64 x 64 blocks and the respective 32 x 32 block; intra-predicting samples of the 32 x 32 block using reconstructed samples of neighboring 64 x 64 blocks; or intra-predict samples of the 32 x 32 block using reconstructed samples that neighbor the 64 x 64 block but not the respective 32 x 32 block.
Video decoder 30 may perform a method of decoding depth data for video coding, the method comprising: for an intra prediction mode for a first block of depth data, intra predicting samples of depth data for a second block, wherein the second block comprises four blocks each having the same size that is one-fourth of the size of the first block of depth data and corresponds to top-left, top-right, bottom-left, and bottom-right blocks of the first block of depth data; receiving residual data for a first block of depth data indicating differences between pixel values of the first block and intra-predicted samples of a second block; and reconstructing a first block of depth data based on the intra-predicted samples of the second block and the residual data. In some examples, the residual data may include DC residual data indicating a difference between an average of pixel values of the first block and an average of intra-predicted samples of the second block.
Video encoder 20 may perform a method of encoding depth data for video coding, the method comprising: for an intra prediction mode for a first block of depth data, intra predicting samples of depth data for a second block, wherein the second block comprises four blocks each having the same size that is one-fourth of the size of the first block of depth data and corresponds to top-left, top-right, bottom-left, and bottom-right blocks of the first block of depth data; generating residual data for a first block of depth data based on differences between pixel values of the first block and intra-predicted samples of a second block; and encoding a first block of depth data based on the intra-prediction mode and the residual data. In some examples, the residual data may include DC residual data indicating a difference between an average of pixel values of the first block and an average of intra-predicted samples of the second block.
The various intra-coding techniques described in this disclosure may be performed by video encoder 20 (fig. 3 and 5) and/or video decoder 30 (fig. 3 and 7), both of which may generally be referred to as video coders. In addition, video coding may generally refer to video encoding and/or video decoding, as applicable.
Although the techniques of this disclosure are described generally with respect to 3D-HEVC, they are not necessarily limited in this manner. The techniques described above may also be applicable to other current or future standards for 3D video coding.
In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible media (e.g., data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium in a non-transitory form, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or collections of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperable hardware units, including one or more processors as described above.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (51)

1. A method of decoding depth data for video coding, the method comprising:
intra-predicting, for an intra-prediction mode of a first block of depth data, samples of depth data for a second block, wherein the second block comprises four blocks each having a same size that is one-quarter of a size of the first block of depth data and corresponds to upper-left, upper-right, lower-left, and lower-right blocks of the first block of depth data;
receiving residual data for the first block of depth data indicative of differences between pixel values of the first block and intra-predicted samples of the second block; and
reconstruct the first block of depth data based on the intra-predicted samples of the second block and the residual data.
2. The method of claim 1, wherein the intra prediction mode is an intra 64 x 64 mode, the first block has a 64 x 64 pixel size, and the second blocks each have a 32 x 32 pixel size.
3. The method of claim 1 or 2, wherein the intra prediction mode is an intra segment-wise DC coding (SDC) mode in 3D-HEVC.
4. The method of claim 1 or 2, wherein the intra prediction mode is not an intra segment-wise DC coding (SDC) mode.
5. The method of claim 1 or 2, wherein the residual data comprises DC residual data indicative of a difference between an average of the pixel values of the first block and an average of the intra-predicted samples of the second block.
6. The method of claim 1 or 2, wherein reference samples available for intra-predicting samples of at least some of the second blocks include intra-predicted reference samples of one or more of the other second blocks.
7. The method of claim 1 or 2, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
8. The method of claim 1 or 2, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
9. The method of claim 1 or 2, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples neighboring the first block.
10. The method of claim 1 or 2, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples that neighbor the first block but not the respective second block.
11. The method of claim 1 or 2, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using intra-predicted reference samples of one or more of the other second blocks.
12. The method of claim 1 or 2, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
13. The method of claim 1 or 2, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
14. The method of claim 1 or 2, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reconstructed samples neighboring the first block.
15. The method of claim 1 or 2, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reconstructed samples that neighbor the first block but not the respective second block.
16. A method of encoding depth data for video coding, the method comprising:
intra-predicting, for an intra-prediction mode of a first block of depth data, samples of depth data for a second block, wherein the second block comprises four blocks each having a same size that is one-quarter of a size of the first block of depth data and corresponds to upper-left, upper-right, lower-left, and lower-right blocks of the first block of depth data;
generating residual data for the first block of depth data based on differences between pixel values of the first block and intra-predicted samples of the second block; and
encoding the first block of depth data based on the intra-prediction mode and the residual data.
17. The method of claim 16, wherein the intra prediction mode is an intra 64 x 64 mode, the first block has a 64 x 64 pixel size, and the second blocks each have a 32 x 32 pixel size.
18. The method of claim 16 or 17, wherein the intra prediction mode is an intra segment-wise DC coding, SDC, mode in 3D-HEVC.
19. The method of claim 16 or 17, wherein the intra prediction mode is not an intra segment-wise DC coding (SDC) mode.
20. The method of claim 16 or 17, wherein the residual data comprises DC residual data indicative of a difference between an average of the pixel values of the first block and an average of the intra-predicted samples of the second block.
21. The method of claim 16 or 17, wherein reference samples available for intra-predicting samples of at least some of the second blocks include intra-predicted reference samples of one or more of the other second blocks.
22. The method of claim 16 or 17, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
23. The method of claim 16 or 17, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
24. The method of claim 16 or 17, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples neighboring the first block.
25. The method of claim 16 or 17, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples that neighbor the first block but not the respective second block.
26. The method of claim 16 or 17, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using intra-predicted reference samples of one or more of the other second blocks.
27. The method of claim 16 or 17, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
28. The method of claim 16 or 17, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
29. The method of claim 16 or 17, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reconstructed samples neighboring the first block.
30. The method of claim 16 or 17, wherein intra-predicting samples of depth data for the second block comprises intra-predicting the samples using reconstructed samples that neighbor the first block but not the respective second block.
31. A device for coding depth data for video coding, the device comprising:
a memory storing depth data for video coding; and
one or more processors configured to:
intra-predicting, for an intra-prediction mode of a first block of depth data, samples of depth data for a second block, wherein the second block comprises four blocks each having a same size that is one-quarter of a size of the first block of depth data and corresponds to upper-left, upper-right, lower-left, and lower-right blocks of the first block of depth data;
coding the first block of depth data based on the intra-prediction mode and residual data for the first block that indicates differences between pixel values of the first block and intra-predicted samples of the second block.
32. The device of claim 31, wherein the device comprises a video decoder, and the one or more processors are further configured to:
receiving syntax information indicating the intra prediction mode;
receiving the residual data; and
reconstructing the first block of depth data based on the intra-predicted samples of the second block and the residual data used to code the first block of depth data.
33. The device of claim 31, wherein the device comprises a video encoder, and the one or more processors are further configured to:
selecting the intra prediction mode;
generating the residual data; and
encoding the first block of depth data based on the intra-prediction mode and the residual data used to code the first block of depth data.
34. The device of any one of claims 31-33, wherein the intra prediction mode is an intra 64 x 64 mode, the first block has a 64 x 64 pixel size, and the second blocks each have a 32 x 32 pixel size.
35. The device of any one of claims 31-33, wherein the intra prediction mode is an intra segment-wise DC coding, SDC, mode in 3D-HEVC.
36. The device of any one of claims 31-33, wherein the intra prediction mode is not an intra segment-wise DC coding (SDC) mode.
37. The device of any one of claims 31-33, wherein the residual data comprises DC residual data indicative of a difference between an average of the pixel values of the first block and an average of the intra-predicted samples of the second block.
38. The device of any one of claims 31-33, wherein reference samples available for intra-predicting samples of at least some of the second blocks include intra-predicted reference samples of one or more of the other second blocks.
39. The device of any one of claims 31-33, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
40. The device of any one of claims 31-33, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
41. The device of any one of claims 31-33, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples adjacent to the first block.
42. The device of any one of claims 31-33, wherein reference samples available for intra-predicting the samples of at least some of the second blocks include reconstructed samples that are adjacent to the first block but not adjacent to the respective second block.
43. The device of any one of claims 31-33, wherein the one or more processors are configured to intra-predict the samples using intra-predicted reference samples of one or more of the other second blocks.
44. The device of any one of claims 31-33, wherein the one or more processors are configured to intra-predict the samples using intra-predicted reference samples of one or more of other second blocks that neighbor the respective second block.
45. The device of any one of claims 31-33, wherein the one or more processors are configured to intra-predict the samples using reference samples of one or more of other second blocks that neighbor the respective second block and reconstructed samples that neighbor the first block and the respective second block.
46. The device of any one of claims 31-33, wherein the one or more processors are configured to intra-predict the samples using reconstructed samples neighboring the first block.
47. The device of any one of claims 31-33, wherein the one or more processors are configured to intra-predict the samples using reconstructed samples that are adjacent to the first block but not adjacent to the respective second block.
48. A device for coding depth data for video coding, the device comprising:
means for storing depth data for 3D-HEVC video coding;
means for intra-predicting, for an intra-prediction mode for a first block of depth data, samples for depth data of a second block, wherein the second block comprises four blocks of the same size that are each one-fourth of the size of the first block of depth data and correspond to upper-left, upper-right, lower-left, and lower-right blocks of the first block of depth data; and
means for coding the first block of depth data based on the intra-prediction mode and residual data for the first block that indicates differences between pixel values of the first block and intra-predicted samples of the second block.
49. A device for coding depth data for video coding, the device comprising means for performing the method of any of claims 1-30.
50. A non-transitory computer-readable storage medium comprising instructions that cause one or more processors of a video coder to:
storing depth data for 3D-HEVC video coding;
intra-predicting, for an intra-prediction mode of a first block of depth data, samples of depth data for a second block, wherein the second block comprises four blocks each having a same size that is one-quarter of a size of the first block of depth data and corresponds to upper-left, upper-right, lower-left, and lower-right blocks of the first block of depth data; and
coding the first block of depth data based on the intra-prediction mode and residual data for the first block that indicates differences between pixel values of the first block and intra-predicted samples of the second block.
51. A non-transitory computer-readable storage medium comprising instructions that cause one or more processors of a video coder to perform the method of any of claims 1-30.