
HK1200047B - Motion vector coding and bi-prediction in HEVC and its extensions - Google Patents


Info

Publication number: HK1200047B
Application number: HK15100451.5A
Authority: HK (Hong Kong)
Prior art keywords: motion vector, current, type, vector predictor, video
Other languages: Chinese (zh)
Other versions: HK1200047A1 (en)
Inventors: Ying Chen (陈颖), Ye-Kui Wang (王益魁), Li Zhang (张莉)
Original assignee: Qualcomm Incorporated (高通股份有限公司)
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Priority claimed from U.S. Application No. 13/801,350 (published as US9503720B2)
Application filed by Qualcomm Incorporated
Publication of HK1200047A1 (en)
Publication of HK1200047B (en)


Description

Motion vector coding and bi-prediction in high efficiency video coding and extensions thereof
This application claims the benefit of the following U.S. provisional patent applications, the entire contents of each of which are hereby incorporated by reference:
U.S. Provisional Application No. 61/611,959, filed March 16, 2012;
U.S. Provisional Application No. 61/624,990, filed April 16, 2012;
U.S. Provisional Application No. 61/658,344, filed June 11, 2012; and
U.S. Provisional Application No. 61/663,484, filed June 22, 2012.
Technical Field
The present invention relates to video coding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), portable or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 (Advanced Video Coding (AVC)), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of these standards, such as Scalable Video Coding (SVC) and Multiview Video Coding (MVC). Version 6 of the Working Draft (WD) of HEVC may be obtained from http://phenix.int-evry.fr/jct/doc_end_user/documents/8_San%20Jose/wg11/JCTVC-H1003-v21.zip. Video devices may more efficiently transmit, receive, encode, decode, and/or store digital video information by implementing these video coding techniques.
Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or portion of a video frame) may be partitioned into multiple video blocks, which may also be referred to as treeblocks, Coding Units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or use temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction generates a predictive block for the block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients that may then be quantized. Quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate one-dimensional vectors of transform coefficients, and entropy coding may be applied to achieve an even greater degree of compression.
Disclosure of Invention
In general, techniques are described for coding motion vectors and for performing bi-prediction in High Efficiency Video Coding (HEVC) and its extensions, such as multiview or three-dimensional video (3DV) extensions. The techniques of this disclosure may support better forward compatibility with multiview video codecs and/or 3D video codecs in the base codec design.
In one example, a method of decoding video data comprises: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and decoding the current motion vector based at least in part on the value of the variable.
In another example, a method of encoding video data comprises: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and encoding the current motion vector based at least in part on the value of the variable.
In another example, a device for decoding video data comprises a video decoder configured to: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and decoding the current motion vector based at least in part on the value of the variable.
In another example, a device for encoding video data comprises a video encoder configured to: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and encoding the current motion vector based at least in part on the value of the variable.
In another example, a device for decoding video data comprises: means for determining a first type of a current motion vector for a current block of video data; means for determining a second type of candidate motion vector predictor for a neighboring block of the current block; means for setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and means for decoding the current motion vector based at least in part on the value of the variable.
In another example, a device for encoding video data comprises: means for determining a first type of a current motion vector for a current block of video data; means for determining a second type of candidate motion vector predictor for a neighboring block of the current block; means for setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and means for encoding the current motion vector based at least in part on the value of the variable.
In another example, a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) has stored thereon instructions that, when executed, cause a processor to: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and decoding the current motion vector based at least in part on the value of the variable.
In another example, a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) has stored thereon instructions that, when executed, cause a processor to: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and encoding the current motion vector based at least in part on the value of the variable.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques for coding motion vectors and for performing bi-prediction in High Efficiency Video Coding (HEVC) and its extensions, such as multi-view or three-dimensional video (3DV) extensions.
Fig. 2 is a block diagram illustrating an example of a video encoder that may implement techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions.
Fig. 3 is a block diagram illustrating an example of video decoder 30, which may implement techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions.
Fig. 4 is a conceptual diagram illustrating an example MVC prediction pattern.
FIG. 5 is a flow diagram illustrating an example method for encoding a current block in accordance with the techniques of this disclosure.
Fig. 6 is a flow diagram illustrating an example method for decoding a current block of video data in accordance with the techniques of this disclosure.
Detailed Description
In general, techniques are described for coding Multiview Video Coding (MVC) data. Currently, the Moving Picture Experts Group (MPEG) is developing a three-dimensional video (3DV) standard based on the upcoming High Efficiency Video Coding (HEVC) standard. Part of the standardization effort also includes standardization of an HEVC-based multiview video codec. In two-dimensional video coding, video data (i.e., a sequence of pictures) is coded picture by picture (not necessarily in display order). Video coding devices divide each picture into a plurality of blocks and code each block individually. Block-based prediction modes include spatial prediction, also known as intra-prediction, and temporal prediction, also known as inter-prediction.
For three-dimensional video data such as HEVC-based 3DV, blocks may also be inter-view predicted. That is, a block may be predicted from a picture of another view, where each view generally corresponds to a respective camera position. As such, in HEVC-based 3DV, inter-view prediction based on reconstructed view components from different views may be enabled. This disclosure uses the term "view component" to refer to an encoded picture of a particular view. That is, a view component may include an encoded picture of a particular view at a particular time (in terms of display order or output order). A view component (or slice of a view component) may have a Picture Order Count (POC) value that generally indicates the display order (or output order) of the view component.
In temporal inter-prediction or inter-view prediction, a video coding device may code data indicative of one or more motion vectors (temporal inter-prediction) and/or one or more displacement vectors (inter-view prediction). In some examples, a block coded with one motion vector or one displacement vector is referred to as a P block, while a block coded with two motion vectors or two displacement vectors is referred to as a bi-predictive block or B block. Techniques applicable to motion vectors are also generally applicable to displacement vectors, and thus this disclosure primarily describes motion vector coding techniques. However, it should be understood that these techniques also apply to displacement vectors, and likewise, the techniques described with respect to displacement vectors also apply to motion vectors, unless otherwise indicated.
In general, data indicating the reference pictures to which a motion vector or a displacement vector can refer is stored in a reference picture list. Thus, motion vector data (or displacement vector data) may comprise not only data for the x-component and y-component of the motion vector, but also an indication of an entry in the reference picture list, referred to as the reference picture index. A video coding device may construct multiple reference picture lists. For example, a video coding device may construct a first reference picture list (list 0 or RefPicList0) to store data for reference pictures having POC values earlier than the current picture, and a second reference picture list (list 1 or RefPicList1) to store data for reference pictures having POC values later than the current picture. Furthermore, it should be noted that the display or output order of pictures is not necessarily the same as the coding order (e.g., the frame number or "frame_num" value). Thus, pictures may be coded in an order different from the order in which they are displayed (or captured).
In general, reference picture list construction for the first or second reference picture list of a B picture comprises two steps: reference picture list initialization and reference picture list reordering (modification). Reference picture list initialization is an explicit mechanism that puts the reference pictures in the reference picture memory (also referred to as the decoded picture buffer) into a list based on the order of their POC (Picture Order Count, which is aligned with the display order of the pictures) values. The reference picture list reordering mechanism can modify the position of a picture that was placed in the list during reference picture list initialization to any new position, or place any reference picture in the reference picture memory at any position, even if that picture does not belong to the initialized list. Some pictures may thus end up at another position in the list after reference picture list reordering (modification). However, if the position of a picture exceeds the number of active reference pictures of the list, the picture is not considered an entry of the final reference picture list. The number of active reference pictures may be signaled in the slice header for each list. After the reference picture lists are constructed (e.g., RefPicList0 and RefPicList1, if available), a reference index can be used to identify a picture in any reference picture list.
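The initialization and active-count truncation described above can be sketched as follows. This is a hedged simplification: pictures are modeled as plain POC integers, the function name is an illustrative assumption, and the reordering (modification) step is omitted.

```python
# Sketch of reference picture list initialization for a B picture:
# list 0 holds earlier-POC pictures, list 1 holds later-POC pictures,
# and entries beyond the signaled active count are dropped from the
# final list.

def init_ref_pic_lists(current_poc, dpb_pocs, num_active_l0, num_active_l1):
    # List 0: pictures earlier than the current picture, closest first.
    list0 = sorted([p for p in dpb_pocs if p < current_poc], reverse=True)
    # List 1: pictures later than the current picture, closest first.
    list1 = sorted([p for p in dpb_pocs if p > current_poc])
    # A picture whose position exceeds the number of active reference
    # pictures is not an entry of the final reference picture list.
    return list0[:num_active_l0], list1[:num_active_l1]

l0, l1 = init_ref_pic_lists(8, [0, 2, 4, 12, 16], 2, 1)
# l0 == [4, 2]; l1 == [12]
```

A reference index then simply indexes into `l0` or `l1` to identify a picture.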
As mentioned above, motion vector data may also include a horizontal component (or x-component) and a vertical component (or y-component). Thus, a motion vector may be defined as <x, y>. A video coding device may code a motion vector relative to a motion vector predictor, rather than directly coding the x-component and y-component of the motion vector. In various examples, the motion vector predictor may be selected from a spatial neighbor of the current block, from a collocated block of a temporally separate picture (i.e., a collocated block in a previously coded picture), or from a collocated block of a picture in another view at the same temporal instance. A motion vector predictor from a temporally separate picture is referred to as a temporal motion vector predictor (TMVP).
To determine the TMVP for a current block, e.g., a current Prediction Unit (PU) of a current Coding Unit (CU) in HEVC, a video coding device may first identify a collocated picture. The term "collocated picture" refers to a picture that includes a particular collocated block. A collocated block may also be included in a "collocated partition," as indicated in WD6 of HEVC. If the current slice is a B slice, a collocated_from_l0_flag may be signaled in the slice header of the slice of the current picture to indicate whether the collocated picture comes from RefPicList0 or RefPicList1. After the reference picture list is identified, the video coding device may use collocated_ref_idx, signaled in the slice header, to identify the collocated picture in that reference picture list. The collocated PU is then identified by examining the collocated picture. Either the motion vector of the lower-right PU of the CU containing the current PU, or the motion vector of the lower-right PU within the center PUs of the CU containing this PU, may be treated as the TMVP for the current PU. When the motion vectors identified by the above process are used to generate motion candidates for AMVP or merge mode, they may be scaled based on temporal position (reflected by the POC values of the reference pictures). In accordance with the techniques of this disclosure, the TMVP may be from the same view or from a different view, as described below.
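The POC-based scaling of a TMVP candidate can be sketched as below. This is a simplified floating-point model of the distance-ratio scaling, not HEVC's exact fixed-point formula, and the names are illustrative assumptions. Note the degenerate case where the POC distance is zero, which arises when a reference picture has the same POC as the picture that uses it (e.g., an inter-view reference).

```python
# Sketch of POC-distance scaling of a temporal motion vector predictor:
# scale the collocated MV by the ratio of the current picture's POC
# distance to its reference over the collocated picture's POC distance
# to its own reference.

def scale_temporal_mv(mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    tb = cur_poc - cur_ref_poc   # current picture -> its reference
    td = col_poc - col_ref_poc   # collocated picture -> its reference
    if td == 0:
        # Same-POC reference (e.g., inter-view): the distance ratio is
        # undefined, so return the vector unscaled in this sketch.
        return mv
    s = tb / td
    return (round(mv[0] * s), round(mv[1] * s))

# Collocated MV spans 8 POC units; the current MV spans only 4, so halve it.
assert scale_temporal_mv((8, -4), 4, 0, 8, 0) == (4, -2)
```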
In HEVC, a Picture Parameter Set (PPS) includes a flag, enable_temporal_mvp_flag. When a particular picture with temporal_id equal to 0 refers to a PPS having enable_temporal_mvp_flag equal to 0, all reference pictures in the DPB may be marked as "unused for temporal motion vector prediction," and motion vectors from pictures preceding that particular picture in decoding order will not be used as temporal motion vector predictors when decoding that particular picture or pictures following it in decoding order.
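The marking behavior just described might be sketched like this; the DPB representation (a list of dicts) and the function name are assumptions made for illustration only.

```python
# Sketch: when a temporal_id-0 picture refers to a PPS with
# enable_temporal_mvp_flag equal to 0, mark every picture in the DPB as
# "unused for temporal motion vector prediction".

def mark_dpb_for_tmvp(dpb, enable_temporal_mvp_flag, temporal_id):
    if temporal_id == 0 and not enable_temporal_mvp_flag:
        for pic in dpb:
            pic["used_for_tmvp"] = False

dpb = [{"poc": 0, "used_for_tmvp": True},
       {"poc": 4, "used_for_tmvp": True}]
mark_dpb_for_tmvp(dpb, enable_temporal_mvp_flag=0, temporal_id=0)
# no picture in the DPB may now serve as a TMVP source
```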
In H.264/AVC or HEVC, for P slices, prediction weights are explicitly signaled when weighted prediction is allowed by setting weighted_pred_flag to 1. The syntax element weighted_pred_flag is signaled in the slice header, and its semantics are as follows:
In some examples, weighted_pred_flag equal to 0 specifies that weighted prediction is not applied to the P slice, and weighted_pred_flag equal to 1 specifies that weighted prediction is applied to the P slice.
For B slices, when weighted prediction is enabled, prediction weights can be explicitly signaled or implicitly derived by setting weighted_bipred_idc to a non-zero value. This syntax element is likewise signaled in the slice header, and its semantics are as follows:
In some examples, weighted_bipred_idc equal to 0 specifies that default weighted prediction is applied to B slices, weighted_bipred_idc equal to 1 specifies that explicit weighted prediction is applied to B slices, and weighted_bipred_idc equal to 2 specifies that implicit weighted prediction is applied to B slices. The value of weighted_bipred_idc ranges from 0 to 2, inclusive.
When weighted_bipred_idc is equal to 2 (implicit weighted prediction), the weights can be derived based on the temporal distance of the two reference frames, computed as a POC distance.
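A simplified model of that derivation is sketched below: weights proportional to the POC distances, with the temporally closer reference weighted more heavily, and a fallback to the default 1/2, 1/2 weights in degenerate cases. This is an illustrative rational-arithmetic model, not the standard's exact fixed-point computation, and the function name is an assumption.

```python
# Sketch of implicit bi-prediction weight derivation from POC distances.
from fractions import Fraction

def implicit_bipred_weights(cur_poc, ref0_poc, ref1_poc):
    d0 = abs(cur_poc - ref0_poc)   # POC distance to the list-0 reference
    d1 = abs(cur_poc - ref1_poc)   # POC distance to the list-1 reference
    if d0 == 0 or d1 == 0 or d0 == d1:
        # Zero distance (e.g., same-POC inter-view reference) or equal
        # distances: fall back to the default weights.
        return Fraction(1, 2), Fraction(1, 2)
    total = d0 + d1
    # The temporally closer reference receives the larger weight.
    return Fraction(d1, total), Fraction(d0, total)

w0, w1 = implicit_bipred_weights(cur_poc=4, ref0_poc=2, ref1_poc=8)
# w0 == 2/3, w1 == 1/3: ref0 (distance 2) is closer, so it weighs more
```

The zero-distance fallback is exactly where a same-POC inter-view reference would break a purely POC-based derivation, as discussed below.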
The current HEVC design may hinder the development of future extensions, such as multiview or 3DV extensions, especially when the developers of those extensions want to provide the ability to make only high-level syntax changes. For example, if the reference picture to be used for TMVP is from a different view, it may have the same POC as the current picture. The current HEVC design for motion vector scaling may not accurately identify the reference picture used for TMVP in such a scenario. Likewise, when implicit weighted prediction is applied to a B slice and one reference picture is from a different view, the process of calculating the prediction weights may encounter problems, because that process is designed based only on POC distances.
The techniques of this disclosure can solve these problems. In general, this disclosure provides techniques for supporting MVC or 3DV with changes only to the high-level syntax of the HEVC design. Some of the solutions apply to the HEVC base specification, and thus serve forward-compatibility purposes. A video coder, such as a video encoder or a video decoder, may be configured to implement any or all of the various techniques of this disclosure, alone or in any combination. Various techniques are described in more detail below.
As one example, a video coder may be configured to perform the following operations: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and coding a current motion vector based at least in part on the value of the variable. Different types of motion vectors may include, for example, disparity motion vectors and temporal motion vectors.
Any of a variety of techniques may be used to determine the type of a motion vector. For example, a video coder may determine the type of a motion vector (e.g., temporal versus disparity) based on a comparison of POC values between the current picture and the reference picture to which the motion vector refers. If the POC values differ, the video coder may determine that the motion vector is a temporal motion vector. On the other hand, if the POC values are the same, the video coder may determine that the motion vector is a disparity motion vector.
As another example, a video coder may compare the layers (e.g., views or scalability layers) in which the current picture and the reference picture referenced by the motion vector appear. If the current picture and the reference picture occur in the same layer, the video coder may determine that the motion vector is a temporal motion vector. On the other hand, if the current picture and the reference picture occur in different layers, the video coder may determine that the motion vector is a disparity motion vector.
As yet another example, a video coder may determine whether a reference picture to which a motion vector refers is a long-term reference picture or a short-term reference picture. If the reference picture is a short-term reference picture, the video coder may determine that the motion vector is a disparity motion vector. However, if the reference picture is a long-term reference picture, the video coder may determine that the motion vector is a temporal motion vector.
Furthermore, in accordance with certain techniques of this disclosure, when the current motion vector has a different type than a candidate motion vector predictor, the video coder may be configured to determine that the candidate motion vector predictor is unavailable. For example, the video coder may set an "available" flag (or variable), indicating whether the candidate motion vector predictor is available as a predictor for the current motion vector, to a value indicating that the candidate motion vector predictor is unavailable when the types of the current motion vector and the candidate motion vector predictor differ.
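Putting the pieces together, the type check and the resulting availability flag can be sketched as follows. This sketch uses only the POC-comparison heuristic described above (one of several options the disclosure lists), and all names are illustrative assumptions rather than HEVC syntax.

```python
# Sketch: classify a motion vector by comparing the POC of the picture it
# belongs to against the POC of its reference, then mark a candidate
# predictor unavailable whenever its type differs from the current
# motion vector's type.

def motion_vector_type(pic_poc, ref_poc):
    # A same-POC reference implies a picture in another view at the same
    # temporal instance, i.e., a disparity motion vector.
    return "disparity" if ref_poc == pic_poc else "temporal"

def candidate_available(cur_poc, cur_ref_poc, cand_poc, cand_ref_poc):
    """The 'available' flag for a candidate motion vector predictor:
    unavailable whenever its type differs from the current MV's type."""
    return (motion_vector_type(cur_poc, cur_ref_poc)
            == motion_vector_type(cand_poc, cand_ref_poc))

# Current MV is temporal (POC 8 block referencing POC 0); a disparity
# candidate (same-POC, inter-view reference) is marked unavailable.
assert candidate_available(8, 0, 8, 8) is False
assert candidate_available(8, 0, 8, 4) is True
```

The current motion vector would then be coded based at least in part on this availability value, as recited in the claims above.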
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions. As shown in fig. 1, system 10 includes a source device 12, source device 12 providing encoded video data to be decoded by a destination device 14 at a later time. In particular, source device 12 provides video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., portable) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" handsets, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may include any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium enabling source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may comprise a router, switch, base station, or any other apparatus that may be used to facilitate communication from source device 12 to destination device 14.
In some examples, the encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, the storage device may correspond to a file server, or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, and local disk drives. Destination device 14 may access the encoded video data via any standard data connection, including an internet connection. Such a data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions such as Dynamic Adaptive Streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions. In other examples, the source device and the destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
The illustrated system 10 of FIG. 1 is merely one example. The techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions, may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder commonly referred to as a "CODEC". Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, with source device 12 generating coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14, such as for video streaming processing, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may include a video capture device such as a video camera, a video archive containing previously captured video, and/or a video feed interface that receives video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.
Computer-readable medium 16 may include: transitory media such as wireless broadcast or wired network transmission; or a storage medium (i.e., a non-transitory storage medium) such as a hard disk, a flash drive, a compact disc, a digital video disc, a blu-ray disc, or other computer-readable medium. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a media production facility (such as a disc stamping facility) may receive encoded video data from source device 12 and generate a disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media in various forms.
Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may comprise syntax information defined by video encoder 20, which is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units (e.g., GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard currently under development, and may conform to the HEVC test model (HM). Likewise, video encoder 20 and video decoder 30 may be configured according to extensions of the HEVC standard, such as multiview extensions or three-dimensional video (3DV) extensions. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industrial standards such as the ITU-T h.264 standard, alternatively referred to as MPEG-4 part 10 (advanced video coding (AVC)), or extensions of these standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a general data stream or separate data streams. The MUX-DEMUX unit may be compliant with the ITU h.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), where applicable.
The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard. The H.264 standard is described by the ITU-T Study Group in ITU-T Recommendation H.264 (Advanced Video Coding for generic audiovisual services), dated March 2005, which may be referred to herein as the H.264 standard or the H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device.
JCT-VC is working on the development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device, referred to as the HEVC test model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices conforming to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.
In general, the working model of HM describes that a video frame or picture may be divided into a sequence of treeblocks or Largest Coding Units (LCUs) that include both luma and chroma samples. Syntax data within the bitstream may define a size of an LCU, which is a largest coding unit in terms of a number of pixels. A slice includes several consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into multiple Coding Units (CUs) according to a quadtree. In general, a quadtree data structure includes one node per CU, where the root node corresponds to a treeblock. If a CU is split into four sub-CUs, the node corresponding to the CU comprises four leaf nodes, each of which corresponds to one of the sub-CUs.
Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag that indicates whether the CU corresponding to the node is split into sub-CUs. Syntax elements of a CU may be defined recursively and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is called a leaf-CU. In this disclosure, the four sub-CUs of a leaf-CU will also be referred to as leaf-CUs even if there is no explicit splitting of the original leaf-CU. For example, if a CU of size 16 × 16 is not further split, then although the 16 × 16 CU is never split, its four 8 × 8 sub-CUs will also be referred to as leaf-CUs.
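The quadtree structure described above can be illustrated with a small sketch (Python is used here purely for illustration; the class and method names are ours, not part of the HM software): each node carries a split flag, a split node owns four half-size sub-CUs, and collecting the leaves of a split 16 × 16 node yields its four 8 × 8 leaf-CUs.

```python
class CUNode:
    """One node of a CU quadtree (illustrative names, not HM code)."""

    def __init__(self, size, split=False):
        self.size = size          # width/height in pixels (CUs are square)
        self.split_flag = split   # the per-node split flag described above
        # a split CU comprises four sub-CUs of half the width/height
        self.children = [CUNode(size // 2) for _ in range(4)] if split else []

    def leaf_cus(self):
        """Collect the leaf-CUs, i.e., nodes that are not split further."""
        if not self.split_flag:
            return [self]
        return [leaf for child in self.children for leaf in child.leaf_cus()]
```

For example, a 16 × 16 node split once produces four 8 × 8 leaf-CUs, while an unsplit node is its own single leaf-CU.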
A CU has a purpose similar to a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a treeblock may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split (referred to as a maximum CU depth), and may also define a minimum size of the coding nodes. Accordingly, the bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU corresponds to a size of the coding node and must be square in shape. The size of the CU may range from 8 × 8 pixels up to the size of the treeblock, with a maximum of 64 × 64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip- or direct-mode encoded, intra-prediction-mode encoded, or inter-prediction-mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.
The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size as, or smaller than, the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
A leaf CU may include one or more Prediction Units (PUs). In general, a PU represents a spatial region corresponding to all or part of a corresponding CU, and may include data used to retrieve reference samples for the PU. In addition, the PU includes data related to prediction. For example, when a PU is intra-mode encoded, data of the PU may be included in a Residual Quadtree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when a PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list of the motion vector (e.g., list0, list1, or list C).
A leaf-CU having one or more PUs may also include one or more transform units (TUs). The transform units may be specified using an RQT (also referred to as a TU quadtree structure), as discussed above. For example, a split flag may indicate whether a leaf-CU is split into four transform units. Then, each transform unit may be further split into further sub-TUs. When a TU is not split further, it may be referred to as a leaf-TU. Generally, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra-prediction mode. That is, the same intra-prediction mode is generally applied to calculate predicted values for all TUs of a leaf-CU. For intra coding, video encoder 20 may calculate a residual value for each leaf-TU using the intra-prediction mode, as a difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, a TU may be larger or smaller than a PU. For intra coding, a PU may be collocated with a corresponding leaf-TU for the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
Furthermore, the TUs of a leaf-CU may also be associated with a respective quadtree data structure, referred to as a Residual Quadtree (RQT). That is, a leaf-CU may include a quadtree that indicates the manner in which the leaf-CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf CU, while the root node of a CU quadtree generally corresponds to a treeblock (or LCU). The non-split TUs of the RQT are called leaf-TUs. In general, the terms CU and TU are used by this disclosure to refer to leaf-CU and leaf-TU, respectively, unless otherwise indicated.
A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) typically includes a series of one or more video pictures. The GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or in other locations, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode of the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified coding standard.
As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction in PU sizes of 2N × 2N or N × N, and inter prediction in symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction in PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is partitioned horizontally with a 2N × 0.5N PU on top and a 2N × 1.5N PU on bottom.
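The symmetric and asymmetric partition geometries can be checked with a short sketch (a hypothetical helper, Python used for illustration): for a 2N × 2N CU, each mode yields PU sizes that exactly tile the CU, with the asymmetric modes splitting one direction 25%/75%.

```python
def pu_sizes(mode, n):
    """Return the (width, height) of each PU for a 2N x 2N CU (sketch)."""
    two_n = 2 * n
    if mode == "2Nx2N": return [(two_n, two_n)]
    if mode == "2NxN":  return [(two_n, n)] * 2
    if mode == "Nx2N":  return [(n, two_n)] * 2
    if mode == "NxN":   return [(n, n)] * 4
    # asymmetric modes: one direction split into 25% and 75%
    if mode == "2NxnU": return [(two_n, n // 2), (two_n, 3 * n // 2)]
    if mode == "2NxnD": return [(two_n, 3 * n // 2), (two_n, n // 2)]
    if mode == "nLx2N": return [(n // 2, two_n), (3 * n // 2, two_n)]
    if mode == "nRx2N": return [(3 * n // 2, two_n), (n // 2, two_n)]
    raise ValueError(mode)
```

For instance, with N = 8 (a 16 × 16 CU), mode "2NxnU" yields a 16 × 4 PU on top and a 16 × 12 PU on bottom.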
In this disclosure, "N × N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N × M pixels, where M is not necessarily equal to N.
Video encoder 20 and video decoder 30 may be configured to perform one or more of the various techniques of this disclosure, either alone or in any combination. For example, according to certain techniques of this disclosure, video encoder 20 and video decoder 30 may be configured to perform various techniques related to Multiview Video Coding (MVC) or three-dimensional video (3DV) coding (e.g., as an extension of h.264/AVC or HEVC). In some examples, High Level Syntax (HLS) changes to the base standard may be used to enable MVC and/or 3DV extensions for the video coding standard. For example, rather than introducing new coding structures, certain existing coding structures may be redefined or otherwise used to enable HLS-only extensions.
As an example, to code video data according to MVC and 3DV extensions, video encoder 20 and video decoder 30 may be configured to perform inter-layer or inter-view prediction. That is, video encoder 20 and video decoder 30 may be configured to predict a block of a current picture in a current view using data of previously coded pictures of previously coded views. Typically, the previously coded picture (i.e., the inter-view reference picture) and the current picture have the same Picture Order Count (POC) value such that the inter-view reference picture and the current picture appear in the same access unit, and likewise, have substantially the same output order (or display order).
Video encoder 20 and video decoder 30 may be configured to utilize disparity motion vectors to code a current block of a current picture using inter-view prediction. Thus, in some examples, a disparity motion vector may be said to be a motion vector for which the POC value of the current picture, which includes the current block predicted using the motion vector, is equal to the POC value of the reference picture referred to by the motion vector. Accordingly, video encoder 20 and video decoder 30 may be configured to determine that a motion vector is a disparity motion vector when the POC value of the picture including the block predicted by the motion vector is equal to the POC value of the reference picture referred to by the motion vector. Similarly, video encoder 20 and video decoder 30 may be configured to determine that a motion vector comprises a temporal motion vector when the POC value of the picture including the block predicted by the motion vector is not equal to the POC value of the reference picture referred to by the motion vector.
Additionally or alternatively, video encoder 20 and video decoder 30 may be configured to determine that a motion vector includes a disparity motion vector when a current picture comprising a current block predicted using the motion vector is in a layer different from a reference picture referenced by the motion vector. Similarly, video encoder 20 and video decoder 30 may be configured to determine that a motion vector includes a temporal motion vector when a current picture including a current block predicted using the motion vector is in the same layer as a reference picture referenced by the motion vector.
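The POC- and layer-based distinctions described above can be summarized in a small sketch (a hypothetical function, Python used for illustration): a motion vector whose reference picture has the same POC value as the current picture, and hence lies in a different view/layer of the same access unit, is classified as a disparity motion vector; otherwise it is temporal.

```python
def motion_vector_type(cur_poc, cur_layer, ref_poc, ref_layer):
    """Classify a motion vector by comparing POC values and layers."""
    if ref_poc == cur_poc and ref_layer != cur_layer:
        # same access unit, different view/layer: inter-view reference
        return "disparity"
    # different POC (same view/layer): temporal reference
    return "temporal"
```

For example, a motion vector from a picture at POC 8 in view 0 pointing to a reference at POC 8 in view 1 is a disparity motion vector, while one pointing to POC 4 in the same view is temporal.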
As yet another example, HEVC distinguishes long-term reference pictures from short-term reference pictures. In the techniques of HEVC, long-term pictures are stored in a Decoded Picture Buffer (DPB) longer than short-term reference pictures. In addition, the syntax element is used to indicate whether the reference picture is a long-term reference picture or a short-term reference picture. In some examples, in MVC and 3DV, long-term reference pictures may instead correspond to temporal reference pictures (i.e., pictures of the same layer or view as the current picture being coded), while short-term reference pictures may instead correspond to inter-view reference pictures (i.e., pictures of a different layer or view than the current picture being coded). Thus, the use of long-term and short-term reference pictures may also provide an indication of whether the reference picture is a temporal reference picture or an inter-view reference picture. Likewise, the motion vectors referencing long-term reference pictures may include temporal motion vectors, while the motion vectors referencing short-term reference pictures may include disparity motion vectors.
According to certain techniques of this disclosure, video encoder 20 and video decoder 30 may be configured to disable the use of different types of motion vectors as motion vector predictors for each other. For example, if the current motion vector is a temporal motion vector, video encoder 20 and video decoder 30 may be configured to predict the temporal motion vector without using the disparity motion vector as a motion vector predictor. Likewise, if the current motion vector is a disparity motion vector, video encoder 20 and video decoder 30 may be configured to predict the disparity motion vector without using the temporal motion vector as a motion vector predictor.
Video encoder 20 and video decoder 30 may be configured to perform various modes of motion vector prediction. In one example (merge mode), video encoder 20 and video decoder 30 may be configured to code a merge flag, which represents which of a plurality of neighboring blocks inherits motion parameters, such as a reference picture list from which the reference picture is selected, a reference index indicating the reference picture in the reference list, a horizontal motion vector component, and a vertical motion vector component.
In another example, Advanced Motion Vector Prediction (AMVP), video encoder 20 and video decoder 30 may be configured to code an indication of: a reference picture list from which the reference picture is selected, a reference index indicating a reference picture in the reference picture list, a motion vector difference value, and an AMVP index indicating a neighboring block from which the motion vector predictor is selected.
In merge mode and/or AMVP mode or other such motion vector coding modes, video encoder 20 and video decoder 30 may be configured to not use motion information from neighboring blocks that use a different type of motion vector than the motion vector of the current block. That is, video encoder 20 and video decoder 30 may be configured to perform the following operations: determining a first type of a current motion vector; determining a second type of candidate motion vector predictor; and disabling the use of the candidate motion vector predictor as the motion vector predictor for the current motion vector if the first type is not the same as the second type.
To disable a candidate motion vector predictor, video encoder 20 and video decoder 30 may set a variable that indicates whether the candidate motion vector predictor is available as a motion vector predictor for the current motion vector. Video encoder 20 and video decoder 30 may set the value of this variable to indicate that the candidate motion vector predictor is not available, even when the candidate motion vector predictor has previously been deemed available based on other conditions indicating that the candidate motion vector predictor is available. For example, as explained in more detail below, video encoder 20 and video decoder 30 may associate a variable with the candidate motion vector predictor, where the value of the variable indicates whether the candidate motion vector predictor is available as a motion vector predictor for the current motion vector.
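One way to realize the disabling described above is an availability pass over the candidate list (a sketch with names of our choosing, not HEVC syntax): a candidate that was previously deemed available on other grounds is re-marked unavailable whenever its type differs from the current motion vector's type.

```python
def mark_candidate_availability(cur_type, candidates):
    """candidates: list of dicts with 'mv', 'type', and 'available' keys.

    Returns a new list in which a candidate remains available only if it
    was available before AND its type matches the current motion vector's
    type (temporal vs. disparity).
    """
    return [
        dict(c, available=c["available"] and c["type"] == cur_type)
        for c in candidates
    ]
```

A disparity candidate is thus excluded when predicting a temporal motion vector, and vice versa, without disturbing candidates that were already unavailable.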
In particular, video encoder 20 may be configured to determine a set of motion vector predictors that may be used to predict a current motion vector. Video decoder 30 may also be configured to construct this set, or alternatively, video encoder 20 may signal a set of available motion vector predictors. In any case, video encoder 20 and video decoder 30 may determine a set of available motion vector predictors, and select one of the set of motion vector predictors as an actual motion vector predictor to use to code the current motion vector.
In AMVP mode, video encoder 20 may calculate a motion vector difference between the current motion vector and the motion vector predictor and code the motion vector difference. Likewise, video decoder 30 may combine the motion vector difference value with the determined motion vector predictor to reconstruct the current motion vector (i.e., the motion vector for the current block (e.g., the current PU) of the video data). In merge mode, the actual motion vector predictor can be used as the current motion vector. Thus, in merge mode, video encoder 20 and video decoder 30 may treat the motion vector difference values as zero values.
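The AMVP difference coding and the merge-mode zero-difference behavior reduce to simple arithmetic, sketched below (Python used for illustration; function names are ours, not HEVC syntax):

```python
def amvp_encode(mv, predictor):
    """Encoder side: motion vector difference (MVD) = mv - predictor."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def amvp_decode(mvd, predictor):
    """Decoder side: reconstruct mv = predictor + MVD."""
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])

def merge_decode(predictor):
    """Merge mode: the MVD is treated as zero, so mv = predictor."""
    return amvp_decode((0, 0), predictor)
```

The decoder-side reconstruction inverts the encoder-side difference exactly, and merge mode falls out as the special case of a zero MVD.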
According to certain techniques of this disclosure, video encoder 20 and video decoder 30 may be configured to perform the following operations: determining whether one or more candidate motion vector predictors in a list of candidate motion vector predictors, any or all of which may have been previously determined to be available based on other criteria, are unavailable to predict a current motion vector based on whether the one or more candidate motion vector predictors are of a type other than the current motion vector. Video encoder 20 and video decoder 30 may be further configured to perform the following operations: motion vector prediction using those candidate motion vector predictors determined to be unavailable are disabled, for example, by setting an available flag (or variable) of an unavailable candidate motion vector predictor to a value indicating that an unavailable candidate motion vector predictor is unavailable.
Additionally or alternatively, after selecting a motion vector predictor from a set of available candidate motion vector predictors, video encoder 20 and video decoder 30 may be configured to determine whether the selected motion vector predictor is a disparity motion vector (i.e., whether the selected motion vector predictor refers to an inter-view reference picture). If the selected motion vector predictor is a disparity motion vector, video encoder 20 and video decoder 30 may disable scaling of the motion vector predictor when coding the current motion vector. That is, assuming that the current motion vector and motion vector predictor are both disparity motion vectors (i.e., reference inter-view reference pictures), the difference in POC values between the current picture and the inter-view reference pictures will be zero (since inter-view reference pictures typically occur within the same access unit as the current picture being coded), and thus, scaling is unnecessary. Furthermore, according to the techniques of this disclosure, attempting to scale the motion vector predictor may cause errors that may be avoided by disabling scaling.
In some examples, in MVC or 3DV extensions of HEVC, enable_temporal_mvp_flag is always set to 0 for any active PPS. That is, video encoder 20 may be configured to set enable_temporal_mvp_flag of an active PPS to 0 in an MVC or 3DV extension of HEVC. Likewise, video decoder 30 may be configured to decode enable_temporal_mvp_flag, or to infer the value of enable_temporal_mvp_flag to be 0, when decoding a bitstream conforming to an MVC or 3DV extension of HEVC.
In some examples, in an MVC or 3DV extension of HEVC, in a profile with only high-level syntax (HLS) changes, video encoder 20 and video decoder 30 set the value of collocated_ref_idx in such a way that the collocated picture never corresponds to a reference picture from a different view. Furthermore, video encoder 20 and video decoder 30 may be configured to code data representing such an indication for MVC or 3DV extensions of HEVC, enabling greater flexibility for collocated pictures in profiles that include low-level changes.
In some examples, video encoder 20 and video decoder 30 may be configured to code data representing indications of slices coded according to HEVC in a slice header to explicitly disable scaling of motion vectors of identified collocated pictures during TMVP. This collocated picture may be marked as "not used for temporal motion vector prediction".
In some examples, in HEVC, video encoder 20 and video decoder 30 may disable motion vector scaling during advanced motion vector prediction when a motion vector of a neighboring block has a reference index that is different from the current reference index, and also has a Picture Order Count (POC) value that is different from the POC value of the current reference picture. Video encoder 20 and video decoder 30 may be configured to code the data representing the indication in a slice header, an Adaptation Parameter Set (APS), a Picture Parameter Set (PPS), a Sequence Parameter Set (SPS), a Video Parameter Set (VPS), or other data structure to signal whether AMVP is disabled or turned on.
In some examples, in HEVC, video encoder 20 and video decoder 30 may determine a motion vector from a spatial neighboring block as unavailable when one and only one of this motion vector and the motion vector to be predicted from the spatial neighboring block is from a picture having the same POC value as the current picture. These techniques may be applicable to either or both AMVP and merge modes. Alternatively, these techniques may be applicable only to the Temporal Motion Vector Prediction (TMVP) aspects of AMVP and merge mode. Video encoder 20 and video decoder 30 may code data representing indications of enabling or disabling such techniques in a slice header, APS, SPS, PPS, or VPS.
In some examples of HEVC, when a reference index of a motion vector to be predicted points to a temporal reference picture (from the same view/layer), a motion vector whose reference index points to a picture from a different view/layer may be considered unusable as a motion vector predictor. This scenario may apply to both AMVP and merge modes. Alternatively, this scenario may only apply to AMVP and TMVP portions of merge mode.
In some examples, in HEVC, video encoder 20 and video decoder 30 may code data representing an indication of each Reference Picture Set (RPS) subset to signal whether any collocated picture from a particular RPS subset will be used for motion vector scaling when the collocated picture is identified as a collocated picture during TMVP. Each picture in the RPS subset may be marked as "unused for temporal motion vector prediction".
In some examples, in HEVC, video encoder 20 and video decoder 30 may code data representing an indication of each RPS subset to signal whether any spatial neighboring motion vector prediction from a picture in a particular RPS subset will be deemed unavailable during AMVP when this motion vector and the motion vector to be predicted belong to RPS subsets with the same indication.
In some examples, in HEVC, video encoder 20 and video decoder 30 may code data representing a new type of implicit weighted prediction for a B slice, such that for certain reference picture pairs in RefPicList0 and RefPicList1, the weights for both reference pictures may be the same if either of the reference pictures in the pair is used for weighted bi-prediction of a PU. For other combinations of pictures from RefPicList0 and RefPicList1, current implicit weighted prediction in HEVC or h.264/AVC may be applicable. Video encoder 20 and video decoder 30 may code data in the slice header indicating which combinations are enabled or disabled.
In some examples, video encoder 20 and video decoder 30 may be configured to not use disparity motion vectors to predict normal (i.e., temporal) motion vectors, and not use temporal motion vectors to predict disparity motion vectors. Furthermore, video encoder 20 and video decoder 30 may be configured not to scale disparity motion vectors. In some examples, when one or both reference pictures of the current PU are inter-view reference pictures and the implicit weighted prediction mode is on, the weights for the two reference pictures of the current PU may be set to be the same (e.g., 1/2, 1/2).
In some examples, as a derivation of the properties of the RPS subsets, the video coder may derive RefTypeIdc equal to 0 for each of the RPS subsets RefPicSetLtCurr, RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetStFoll. Each picture included in an RPS subset may have RefPicTypeIdc set equal to the RefTypeIdc of that RPS subset. As an example use of this scenario in a possible MVC extension of HEVC, the inter-view RPS subset may be set to have RefTypeIdc equal to 1.
This disclosure defines a function RefPicTypeFunc(pic) that returns the RefPicTypeIdc value of the reference picture "pic" passed to the function as an argument. This function may be performed as part of a decoding process, e.g., by video encoder 20 when decoding previously encoded video data for use as reference video data, or by video decoder 30 during a video decoding process.
This disclosure also provides techniques for a derivation process for motion vector predictor candidates. In addition to, or in the alternative to, the process of conventional HEVC, video coders, such as video encoder 20 and video decoder 30, may derive the motion vector mvLXA and the availability flag availableFlagLXA using the following process. When availableFlagLXA is equal to 0 for (xAk, yAk) from (xA0, yA0) to (xA1, yA1), where yA1 = yA0 - MinPuSize, the following may be applied repeatedly until availableFlagLXA is equal to 1 (where, in this example, numbers in the format #-### refer to particular sections of the upcoming HEVC standard):
● If the prediction unit covering luma position (xAk, yAk) is available, PredMode is not MODE_INTRA, and predFlagLX[xAk][yAk] is equal to 1, then availableFlagLXA is set equal to 1, the motion vector mvLXA is set equal to the motion vector mvLX[xAk][yAk], refIdxA is set equal to refIdxLX[xAk][yAk], and ListA is set equal to LX.
● Otherwise, if the prediction unit covering luma position (xAk, yAk) is available, PredMode is not MODE_INTRA, and predFlagLY[xAk][yAk] (with Y = !X) is equal to 1, then availableFlagLXA is set equal to 1, the motion vector mvLXA is set equal to the motion vector mvLY[xAk][yAk], refIdxA is set equal to refIdxLY[xAk][yAk], and ListA is set equal to LY.
● If availableFlagLXA is equal to 1 and RefPicTypeFunc(RefPicListA(refIdxA)) is not equal to RefPicTypeFunc(RefPicListLX(refIdxLX)), availableFlagLXA is set to 0.
● When availableFlagLXA is equal to 1 and RefPicTypeFunc(RefPicListA(refIdxA)) and RefPicTypeFunc(RefPicListLX(refIdxLX)) are both equal to 0, mvLXA may be derived as specified below:
tx=(16384+(Abs(td)>>1))/td (8-136)
DistScaleFactor=Clip3(-4096,4095,(tb*tx+32)>>6) (8-137)
mvLXA=Clip3(-8192,8191.75,Sign(DistScaleFactor*mvLXA)*((Abs(DistScaleFactor*mvLXA)+127)>>8)) (8-138)
where td and tb can be derived as:
td=Clip3(-128,127,PicOrderCntVal-PicOrderCnt(RefPicListA(refIdxA)))
(8-139)
tb=Clip3(-128,127,PicOrderCntVal-PicOrderCnt(RefPicListLX(refIdxLX)))
(8-140)
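The scaling in equations (8-136) through (8-140) can be sketched in Python as follows. This is an illustrative sketch, not a bit-exact implementation: `idiv` models the truncating integer division of the HEVC arithmetic operators, and the fractional clip bound 8191.75 is approximated with the integer bound 8191 in quarter-sample units:

```python
def clip3(lo, hi, x):
    # Clip3(lo, hi, x) as defined in HEVC: clip x to the range [lo, hi].
    return lo if x < lo else hi if x > hi else x

def idiv(a, b):
    # Integer division with truncation toward zero, as "/" behaves in HEVC.
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def scale_spatial_mv(mv, poc_cur, poc_ref_a, poc_ref_lx):
    # POC-based scaling of one component of mvLXA, per (8-136)-(8-140).
    td = clip3(-128, 127, poc_cur - poc_ref_a)                    # (8-139)
    tb = clip3(-128, 127, poc_cur - poc_ref_lx)                   # (8-140)
    tx = idiv(16384 + (abs(td) >> 1), td)                         # (8-136)
    dist_scale_factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)   # (8-137)
    prod = dist_scale_factor * mv                                 # (8-138)
    sign = -1 if prod < 0 else 1
    return clip3(-8192, 8191, sign * ((abs(prod) + 127) >> 8))
```

With equal POC distances (td == tb), DistScaleFactor works out to 256 and the vector passes through essentially unchanged; halving the target distance roughly halves the vector.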
● When availableFlagLXA is equal to 1 and both RefPicTypeFunc(RefPicListA(refIdxA)) and RefPicTypeFunc(RefPicListLX(refIdxLX)) are equal to non-zero values, mvLXA is used without scaling.
In addition to or as an alternative to the process of conventional HEVC, video coders such as video encoder 20 and video decoder 30 may also derive the motion vector mvLXB and the availability flag availableFlagLXB using the following process. When isScaledFlagLX is equal to 0, availableFlagLXB may be set equal to 0, and for (xBk, yBk) from (xB0, yB0) to (xB2, yB2), where xB0 = xP + nPSW, xB1 = xB0 - MinPuSize, and xB2 = xP - MinPuSize, the following may be applied repeatedly until availableFlagLXB is equal to 1:
● If a prediction unit covering the luma position (xBk, yBk) is available, PredMode is not MODE_INTRA, and predFlagLX[xBk][yBk] is equal to 1, then availableFlagLXB is set equal to 1, the motion vector mvLXB is set equal to the motion vector mvLX[xBk][yBk], refIdxB is set equal to refIdxLX[xBk][yBk], and ListB is set equal to LX.
● Otherwise, if a prediction unit covering the luma position (xBk, yBk) is available, PredMode is not MODE_INTRA, and predFlagLY[xBk][yBk] (where Y = !X) is equal to 1, then availableFlagLXB is set equal to 1, the motion vector mvLXB is set equal to the motion vector mvLY[xBk][yBk], refIdxB is set equal to refIdxLY[xBk][yBk], and ListB is set equal to LY.
● If availableFlagLXB is equal to 1 and RefPicTypeFunc(RefPicListB(refIdxB)) is not equal to RefPicTypeFunc(RefPicListLX(refIdxLX)), availableFlagLXB is set to 0.
● When availableFlagLXB is equal to 1, RefPicTypeFunc(RefPicListB(refIdxB)) and RefPicTypeFunc(RefPicListLX(refIdxLX)) are both equal to 0, and PicOrderCnt(RefPicListB(refIdxB)) is not equal to PicOrderCnt(RefPicListLX(refIdxLX)), mvLXB may be derived as specified below:
tx=(16384+(Abs(td)>>1))/td (8-144)
DistScaleFactor=Clip3(-4096,4095,(tb*tx+32)>>6) (8-145)
mvLXB=Clip3(-8192,8191.75,Sign(DistScaleFactor*mvLXB)*((Abs(DistScaleFactor*mvLXB)+127)>>8)) (8-146)
Wherein td and tb can be derived as
td=Clip3(-128,127,PicOrderCntVal-PicOrderCnt(RefPicListB(refIdxB)))
(8-147)
tb=Clip3(-128,127,PicOrderCntVal-PicOrderCnt(RefPicListLX(refIdxLX)))
(8-148)
● When availableFlagLXB is equal to 1 and both RefPicTypeFunc(RefPicListB(refIdxB)) and RefPicTypeFunc(RefPicListLX(refIdxLX)) are equal to non-zero values, mvLXB is used without scaling.
Video coders, such as video encoder 20 and video decoder 30, may derive temporal luma motion vector predictors according to the techniques of this disclosure. For example, the video coder may derive the variables mvLXCol and availableFlagLXCol as follows:
● if one of the following conditions holds, then both components of mvLXCol may be set equal to 0 and availableFlagLXCol may be set equal to 0:
○ colPu is coded in an intra prediction mode.
○ colPu is marked as "unavailable".
○ colPic is marked as "not used for temporal motion vector prediction".
○ enable_temporal_mvp_flag is equal to 0.
● otherwise, the motion vector mvCol, the reference index refIdxCol, and the reference list identifier listCol can be derived as follows.
○ If PredFlagL0[xPCol][yPCol] is equal to 0, mvCol, refIdxCol, and listCol may be set equal to MvL1[xPCol][yPCol], RefIdxL1[xPCol][yPCol], and L1, respectively.
○ Otherwise (PredFlagL0[xPCol][yPCol] is equal to 1), the following applies:
■ If PredFlagL1[xPCol][yPCol] is equal to 0, mvCol, refIdxCol, and listCol may be set equal to MvL0[xPCol][yPCol], RefIdxL0[xPCol][yPCol], and L0, respectively.
■ Otherwise (PredFlagL1[xPCol][yPCol] is equal to 1), the following assignments may be made:
● If PicOrderCnt(pic) of each picture pic in each reference picture list is less than or equal to PicOrderCntVal, mvCol, refIdxCol, and listCol may be set equal to MvLX[xPCol][yPCol], RefIdxLX[xPCol][yPCol], and LX, respectively, where X is the value of X when this process is invoked.
● Otherwise (PicOrderCnt(pic) of at least one picture pic in at least one reference picture list is greater than PicOrderCntVal), mvCol, refIdxCol, and listCol may be set equal to MvLN[xPCol][yPCol], RefIdxLN[xPCol][yPCol], and LN, respectively, where N is the value of collocated_from_l0_flag.
The variable availableFlagLXCol may then be set equal to 1, and the following applies:
■ If RefPicTypeFunc(RefPicListLX(refIdxLX)) is not equal to RefPicTypeFunc(listCol(refIdxCol)), availableFlagLXCol is set equal to 0.
Note that listCol(refIdxCol) returns the reference picture used for the temporal motion vector.
■ If availableFlagLXCol is equal to 1, and both RefPicTypeFunc(RefPicListLX(refIdxLX)) and RefPicTypeFunc(listCol(refIdxCol)) are equal to non-zero values, or PicOrderCnt(colPic) - RefPicOrderCnt(colPic, refIdxCol, listCol) is equal to PicOrderCntVal - PicOrderCnt(RefPicListLX(refIdxLX)), then
mvLXCol = mvCol (8-153)
■ Otherwise, if RefPicTypeFunc(RefPicListLX(refIdxLX)) and RefPicTypeFunc(listCol(refIdxCol)) are both equal to 0, mvLXCol may be derived as a scaled version of the motion vector mvCol, as specified below:
tx=(16384+(Abs(td)>>1))/td (8-154)
DistScaleFactor=Clip3(-4096,4095,(tb*tx+32)>>6) (8-155)
mvLXCol=Clip3(-8192,8191.75,Sign(DistScaleFactor*mvCol)*((Abs(DistScaleFactor*mvCol)+127)>>8)) (8-156)
where td and tb can be derived as:
td=Clip3(-128,127,PicOrderCnt(colPic)-RefPicOrderCnt(colPic,refIdxCol,listCol)) (8-157)
tb=Clip3(-128,127,PicOrderCntVal-PicOrderCnt(RefPicListLX(refIdxLX)))
(8-158)
The variables described herein may also be derived for implicit weighted prediction.
This disclosure also provides techniques for a weighted sample prediction process that video encoder 20 and video decoder 30 may be configured to perform. Inputs to the process may include:
● position (xB, yB), which specifies the top left sample of the current prediction unit relative to the top left sample of the current coding unit,
● the width nPSW and height nPSH of this prediction unit,
● two (nPSW) × (nPSH) arrays predSamplesL0 and predSamplesL1,
● the prediction list utilization flags predFlagL0 and predFlagL1,
● refer to indices refIdxL0 and refIdxL1,
● motion vectors mvL0 and mvL1,
● bit depth bitDepth of the chrominance component.
The output of such a process may include:
● predict the (nPSW) × (nPSH) array predSamples of sample values.
In one example, the variables shift1, shift2, offset1, and offset2 are derived as follows:
● variable shift1 is set equal to 14-bitDepth, and variable shift2 is set equal to 15-bitDepth,
● the variable offset1 is set equal to 1 << (shift1 - 1), and the variable offset2 is set equal to 1 << (shift2 - 1).
In a P slice, if the value of predFlagL0 is equal to 1, the following applies:
● If weighted_pred_flag is equal to 0, the default weighted sample prediction process is invoked as described in subclause 8.5.2.2.3.1, with the inputs and outputs identical to the process described in this subclause.
● Otherwise (weighted_pred_flag is equal to 1), the explicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 is invoked, with the inputs and outputs identical to the process described in this subclause.
In a B slice, if predFlagL0 or predFlagL1 is equal to 1, the following applies:
● If weighted_bipred_idc is equal to 0, the default weighted sample prediction process is invoked as described in subclause 8.5.2.2.3.1, with the inputs and outputs identical to the process described in this subclause.
● Otherwise, if weighted_bipred_idc is equal to 1 and if predFlagL0 or predFlagL1 is equal to 1, the explicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 is invoked, with the inputs and outputs identical to the process described in this subclause.
● Otherwise (weighted_bipred_idc is equal to 2), the following applies:
○ If predFlagL0 is equal to 1 and predFlagL1 is equal to 1, and both RefPicTypeFunc(RefPicListL0(refIdxL0)) and RefPicTypeFunc(RefPicListL1(refIdxL1)) are equal to 0, the implicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of the current HEVC working draft is invoked, with the inputs and outputs identical to the process described in this subclause.
○ Otherwise (predFlagL0 or predFlagL1 is equal to 1, but not both), the default weighted sample prediction process as described in subclause 8.5.2.2.3.1 is invoked, with the inputs and outputs identical to the process described in this subclause.
Techniques for a default weighted sample prediction process are also provided. The inputs to and outputs from such a process may be the same as those described above for the weighted sample prediction process. Depending on the values of predFlagL0 and predFlagL1, the prediction samples predSamples[x, y] may be derived as follows, where x = 0..(nPSW) - 1 and y = 0..(nPSH) - 1:
● If predFlagL0 is equal to 1 and predFlagL1 is equal to 0,
then predSamples[x,y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x,y]+offset1)>>shift1) (8-211)
● Otherwise, if predFlagL0 is equal to 0 and predFlagL1 is equal to 1,
then predSamples[x,y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL1[x,y]+offset1)>>shift1) (8-212)
● Otherwise, if predFlagL0 and predFlagL1 are both equal to 1, RefPicOrderCnt(currPic, refIdxL0, L0) is equal to RefPicOrderCnt(currPic, refIdxL1, L1), mvL0 is equal to mvL1, and both RefPicTypeFunc(RefPicListL0(refIdxL0)) and RefPicTypeFunc(RefPicListL1(refIdxL1)) are equal to 0,
then predSamples[x,y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x,y]+offset1)>>shift1) (8-213)
● Otherwise,
○ predSamples[x,y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x,y]+predSamplesL1[x,y]+offset2)>>shift2) (8-214)
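A minimal Python sketch of equations (8-211) through (8-214) for a single sample position, under the assumption that predSamplesL0/predSamplesL1 hold intermediate predictions at 14-bit precision for bitDepth 8. The (8-213) shortcut is folded into the averaging branch, since averaging two identical predictions produces the same value:

```python
def default_weighted_sample(pred_l0, pred_l1, flag_l0, flag_l1, bit_depth=8):
    # Default weighted sample prediction for one sample position.
    shift1, shift2 = 14 - bit_depth, 15 - bit_depth
    offset1, offset2 = 1 << (shift1 - 1), 1 << (shift2 - 1)
    clip = lambda v: max(0, min((1 << bit_depth) - 1, v))
    if flag_l0 and not flag_l1:
        return clip((pred_l0 + offset1) >> shift1)              # (8-211)
    if flag_l1 and not flag_l0:
        return clip((pred_l1 + offset1) >> shift1)              # (8-212)
    # Bi-prediction: average of the two intermediate predictions, (8-214).
    return clip((pred_l0 + pred_l1 + offset2) >> shift2)
```

For bitDepth 8, a uni-predicted sample of value 100 stored at 14-bit precision (100 << 6) rounds back to 100; bi-prediction of 100 and 200 averages to 150.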
in another example, weighted prediction may be performed as follows. A new type of implicit weighting may be performed, which may correspond to a modified implicit weighted prediction. The following changes may be made in the picture parameter set RBSP semantics:
weighted_bipred_idc equal to 0 may specify that default weighted prediction is applied to B slices. weighted_bipred_idc equal to 1 may specify that explicit weighted prediction is applied to B slices. weighted_bipred_idc equal to 2 may specify that implicit weighted prediction is applied to B slices. weighted_bipred_idc equal to 3 may specify that constrained implicit weighted prediction is applied to B slices. The value of weighted_bipred_idc may range from 0 to 3, inclusive.
In some examples, the techniques of this disclosure may include the following weighted sample prediction process, e.g., performed during a decoding process. The inputs to the weighted sample prediction process may include:
● position (xB, yB), which specifies the top left sample of the current prediction unit relative to the top left sample of the current coding unit,
● the width nPSW and height nPSH of this prediction unit,
● two (nPSW) × (nPSH) arrays predSamplesL0 and predSamplesL1,
● the prediction list utilization flags predFlagL0 and predFlagL1,
● refer to indices refIdxL0 and refIdxL1,
● motion vectors mvL0 and mvL1,
● bit depth bitDepth of the chrominance component.
The output of such a process may include:
● predict the (nPSW) × (nPSH) array predSamples of sample values.
The variables shift1, shift2, offset1, and offset2 may be derived as follows:
● variable shift1 is set equal to 14-bitDepth, and variable shift2 is set equal to 15-bitDepth,
● the variable offset1 is set equal to 1 << (shift1 - 1), and the variable offset2 is set equal to 1 << (shift2 - 1).
In a P slice, if the value of predFlagL0 is equal to 1, the following applies:
● If weighted_pred_flag is equal to 0, a default weighted sample prediction process may be invoked as described in subclause 8.5.2.2.3.1 of the current HEVC working draft, with inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
● Otherwise (weighted_pred_flag is equal to 1), an explicit weighted sample prediction process may be invoked as described in subclause 8.5.2.2.3.2 of the current HEVC working draft, with inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
In a B slice, if predFlagL0 or predFlagL1 is equal to 1, the following applies:
● If weighted_bipred_idc is equal to 0, a default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of the current HEVC working draft may be invoked, with the inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
● Otherwise, if weighted_bipred_idc is equal to 1 and if predFlagL0 or predFlagL1 is equal to 1, an explicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of the current HEVC working draft may be invoked, with the inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
● Otherwise, if weighted_bipred_idc is equal to 2, the following applies:
○ If predFlagL0 is equal to 1 and predFlagL1 is equal to 1, the implicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of the current HEVC working draft may be invoked, with the inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
○ Otherwise (predFlagL0 or predFlagL1 is equal to 1, but not both), a default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of the current HEVC working draft may be invoked, with the inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
● Otherwise (weighted_bipred_idc is equal to 3), the following applies:
○ If predFlagL0 is equal to 1 and predFlagL1 is equal to 1, and both RefPicTypeFunc(RefPicListL0(refIdxL0)) and RefPicTypeFunc(RefPicListL1(refIdxL1)) are equal to 0, the implicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of the current HEVC working draft may be invoked, with the inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
○ Otherwise (predFlagL0 or predFlagL1 is equal to 1, but not both), a default weighted sample prediction process may be invoked as described in subclause 8.5.2.2.3.1 of the current HEVC working draft, with inputs and outputs identical to the process described in this subclause (e.g., subclause 4.2.2).
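The B-slice dispatch above can be sketched as a small selector. Here ref_type_l0/ref_type_l1 stand in for RefPicTypeFunc() of the two reference pictures; the function and parameter names are illustrative, not from the standard:

```python
def select_b_slice_process(weighted_bipred_idc, pred_flag_l0, pred_flag_l1,
                           ref_type_l0=0, ref_type_l1=0):
    # Returns which weighted sample prediction process a B slice would use.
    if weighted_bipred_idc == 0:
        return "default"
    if weighted_bipred_idc == 1:
        return "explicit"
    if weighted_bipred_idc == 2:
        # Implicit only when both predictions are used; otherwise default.
        return "implicit" if pred_flag_l0 and pred_flag_l1 else "default"
    # weighted_bipred_idc == 3: constrained implicit -- additionally require
    # both references to have RefPicTypeFunc() equal to 0 (e.g., temporal
    # rather than inter-view references).
    if pred_flag_l0 and pred_flag_l1 and ref_type_l0 == 0 and ref_type_l1 == 0:
        return "implicit"
    return "default"
```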
In some examples, video encoder 20 and video decoder 30 may be configured to code a flag to disable POC-based scaling of motion vectors from spatial neighboring blocks for AMVP. Table 1 below provides an example sequence parameter set RBSP syntax for this flag:
TABLE 1
seq_parameter_set_rbsp(){ Descriptor(s)
profile_idc u(8)
reserved_zero_8bits /* equal to 0 */ u(8)
level_idc u(8)
seq_parameter_set_id ue(v)
disable_spatial_mv_poc_scaling_flag u(1)
In general, the semantics of the sequence parameter set of Table 1 remain the same as in the current HEVC working draft. However, Table 1 introduces disable_spatial_mv_poc_scaling_flag. Various examples of the semantics of this added element are provided below:
In this example, disable_spatial_mv_poc_scaling_flag equal to 0 indicates that, when the target motion vector corresponds to a picture with a different reference index or a different POC, the spatial motion vector is to be scaled based on POC. In this example, disable_spatial_mv_poc_scaling_flag equal to 1 indicates that, when the reference index of a spatial motion vector is different from that of the target motion vector, this motion vector is considered unavailable. Note that the target motion vector is the motion vector to be predicted under AMVP.
Alternatively, disable_spatial_mv_poc_scaling_flag equal to 1 may indicate that a spatial motion vector is considered unavailable when its reference index is different from that of the target motion vector and the POC of the reference picture of this motion vector is different from that of the target motion vector.
Alternatively, disable_spatial_mv_poc_scaling_flag may be added in a PPS, APS, or slice header to indicate the same functionality for the pictures to which the particular PPS, APS, or slice header applies.
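Under the primary semantics above, the availability test reduces to a one-line check. This is a sketch: the flag name follows Table 1, while the rest of the interface is an assumption for illustration:

```python
def spatial_mv_available(disable_spatial_mv_poc_scaling_flag,
                         ref_idx_candidate, ref_idx_target):
    # With the flag equal to 1, a spatial motion vector whose reference index
    # differs from that of the target motion vector is treated as unavailable
    # instead of being scaled based on POC.
    if ref_idx_candidate == ref_idx_target:
        return True
    return disable_spatial_mv_poc_scaling_flag == 0
```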
In yet another example, video encoder 20 and video decoder 30 may be configured to code a flag in SPS in a multiview or 3DV context to disable the use of inter-view motion vectors (e.g., disparity motion vectors) for Temporal Motion Vector Prediction (TMVP). Table 2 below provides an example sequence parameter set Raw Byte Sequence Payload (RBSP) syntax consistent with certain techniques of this disclosure:
TABLE 2
seq_parameter_set_rbsp(){ Descriptor(s)
profile_idc u(8)
reserved_zero_8bits /* equal to 0 */ u(8)
level_idc u(8)
seq_parameter_set_id ue(v)
bit_equal_one u(1)
disable_inter_view_as_tmvp_flag u(1)
if(sps_extension_flag2)
while(more_rbsp_data())
sps_extension_data_flag2 u(1)
rbsp_trailing_bits()
}
In general, the semantics of the sequence parameter set of Table 2 remain the same as in the current HEVC working draft. However, Table 2 introduces bit_equal_one, disable_inter_view_as_tmvp_flag, sps_extension_flag2, and sps_extension_data_flag2. Example semantics for these added elements are provided below:
In this example, disable_inter_view_as_tmvp_flag equal to 1 indicates that, for all slices in the coded video sequence, an inter-view (only) reference picture is never selected as the collocated picture of the TMVP mode. Note that this implies a constraint on collocated_ref_idx (i.e., collocated_ref_idx may be set in such a way that the collocated picture never corresponds to a reference picture from a different view).
In this example, disable_inter_view_as_tmvp_flag equal to 0 indicates that an inter-view (only) reference picture may be selected as the collocated picture of the TMVP mode.
In some examples, the sequence parameter set may also include elements of either or both of tables 1 and 2 described above, in addition to or in lieu of any of the elements of the current HEVC working draft.
The above syntax may be signaled as part of the extension bits for the MVC/3DV extension. Alternatively, the syntax elements may be signaled in other locations that may contain sequence-level information for multiview/3DV sequences, such as a subset sequence parameter set (SPS), or possibly even higher-level syntax tables, such as a video parameter set. Alternatively, the above syntax element (disable_inter_view_as_tmvp_flag) may not be signaled, but the bitstream may always conform to the condition that holds when disable_inter_view_as_tmvp_flag is equal to 1. This may be achieved by selecting collocated_ref_idx in such a way that collocated_ref_idx never corresponds to an inter-view reference picture.
In addition, Picture Parameter Set (PPS) syntax may be modified in accordance with the techniques of this disclosure. For example, the syntax element weighted_bipred_idc may be signaled in the PPS. The semantics for this syntax element may be as follows: weighted_bipred_idc equal to 0 may specify that default weighted prediction is applied to B slices; weighted_bipred_idc equal to 1 may specify that explicit weighted prediction is applied to B slices; weighted_bipred_idc equal to 2 may specify that implicit weighted prediction is applied to B slices; and weighted_bipred_idc equal to 3 may specify that constrained implicit weighted prediction is applied to B slices. The value of weighted_bipred_idc may range from 0 to 3, inclusive.
Table 3 below provides an example syntax table for slice headers consistent with certain techniques of this disclosure:
TABLE 3
In general, the semantics of the slice header of Table 3 remain the same as in HEVC. However, Table 3 introduces poc_scaling_tmvp_disabled_flag and a constrained implicit table. The semantics of these added elements are provided below (with examples of the constrained implicit table described with respect to Tables 4 and 5 below):
poc_scaling_tmvp_disabled_flag equal to 1 may indicate that motion vectors derived from the TMVP are not scaled. This flag equal to 0 may indicate that motion vectors derived from the TMVP may be scaled, as in the current design of the TMVP.
As mentioned above, the slice header may include a constrained implicit table, e.g., according to Table 4 or Table 5 below.
TABLE 4
const_implicit_table(){ Descriptor(s)
for(i=0;i<=num_ref_idx_lc_active_minus1;i++)
implicit_disabled_pic_flag[i] u(1)
}
Table 5 provides an alternative example of a constrained implicit table:
TABLE 5
const_implicit_table(){ Descriptor(s)
for(i=0;i<=num_ref_idx_l0_active_minus1;i++)
implicit_disabled_pic_l0_flag[i] u(1)
for(i=0;i<=num_ref_idx_l1_active_minus1;i++)
implicit_disabled_pic_l1_flag[i]
}
The semantics of the syntax elements in tables 4 and 5 are provided below:
implicit_disabled_pic_flag[i] equal to 1 may indicate that, in implicit weighted prediction, if the reference picture in the combined reference picture list corresponding to reference index i is used, the weights for this and the other reference picture during implicit weighted prediction are both set to 0.5, meaning no weighted prediction.
implicit_disabled_pic_l0_flag[i] equal to 1 may indicate that, in implicit weighted prediction, if the reference picture in RefPicList0 corresponding to reference index i is used, the weights for this and the other reference picture during implicit weighted prediction are both set to 0.5, meaning no weighted prediction.
implicit_disabled_pic_l1_flag[i] equal to 1 may indicate that, in implicit weighted prediction, if the reference picture in RefPicList1 corresponding to reference index i is used, the weights for this and the other reference picture during implicit weighted prediction are both set to 0.5, meaning no weighted prediction.
Alternatively, the reference index values for pictures that are to be constrained from implicit weighted prediction may be signaled directly.
Alternatively, as part of RPS signaling, pictures that would be constrained from implicit weighted prediction may be signaled directly.
Alternatively, in MVC or 3DV codecs, the pictures in the RPS subset containing inter-view (only) reference pictures may always be set as constrained implicit weighted prediction pictures.
A constrained implicit weighted prediction picture is a picture for which the following holds: when the picture is used for implicit prediction, the weights of this picture and of the other picture of the bi-prediction pair are both 0.5.
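To illustrate the effect of these flags, the sketch below derives implicit bi-prediction weights from POC distances in the H.264/AVC style (an assumption for illustration; HEVC-era drafts differ in detail) and falls back to 0.5/0.5 whenever either reference is a constrained implicit weighted prediction picture:

```python
def implicit_bipred_weights(poc_cur, poc_l0, poc_l1,
                            constrained_l0=False, constrained_l1=False):
    # Returns (w0, w1) for the list-0 and list-1 predictions.
    if constrained_l0 or constrained_l1:
        return 0.5, 0.5  # weighting disabled: plain averaging
    td = poc_l1 - poc_l0
    tb = poc_cur - poc_l0
    if td == 0:
        return 0.5, 0.5
    w1 = tb / td         # weight grows with proximity to the list-1 picture
    return 1.0 - w1, w1
```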
Alternatively, video encoder 20 and video decoder 30 may be configured to code, for each RPS subset, a flag in a PPS or SPS or slice header indicating whether all pictures in the RPS subset are constrained implicitly weighted prediction pictures.
As another example, video encoder 20 and video decoder 30 may be configured to code refPicType in SPS. Table 6 provides an example set of syntax for this SPS:
TABLE 6
seq_parameter_set_rbsp(){ Descriptor(s)
profile_idc u(8)
reserved_zero_8bits /* equal to 0 */ u(8)
level_idc u(8)
seq_parameter_set_id ue(v)
numAdditionalRPSSubSets ue(v)
for(i=0;i<numAdditionalRPSSubSets;i++)
ref_type_flag[i] u(1)
rbsp_trailing_bits()
}
In general, the semantics of the sequence parameter set of Table 6 remain the same as in the current HEVC working draft. However, Table 6 introduces numAdditionalRPSSubSets and ref_type_flag[i] within the for() loop. Example semantics for these added elements are provided below:
In this example, numAdditionalRPSSubSets specifies the number of additional RPS subsets, beyond the RPS subsets for short-term reference pictures and the RPS subsets for long-term reference pictures.
In this example, ref _ type _ flag [ i ] specifies a flag for any picture of the additional RPS subset i. This flag may be inferred to be equal to 0 for RPS subsets containing short-term reference pictures and RPS subsets containing long-term reference pictures.
RefTypeIdc may be set to ref _ type _ flag of the RPS subset.
Alternatively, video encoder 20 and video decoder 30 need not code this flag, and may infer this flag as a value of 1 for the inter-view RPS subset.
Alternatively, video encoder 20 and video decoder 30 may derive the value of RefTypeIdc for the reference picture as 1 if the reference picture for the motion vector has the same POC as the current picture, and 0 otherwise.
Other examples may be similar to the first example above (or other examples), with the following additions. In the example below, RefPicTypeFuncMV(mv) returns 0 if the reference index of the motion vector mv points to a temporal reference picture, and returns 1 if the reference index of the motion vector mv points to a picture in a different view/layer. Alternatively, RefPicTypeFunc(pic) returns 0 if pic is a short-term reference picture and returns 1 if pic is a long-term reference picture, and RefPicTypeFuncMV(mv) returns 0 if the reference index of the motion vector mv points to a short-term reference picture and returns 1 if it points to a long-term reference picture. In addition, the following process is modified for AMVP.
Derivation process for luma motion vector prediction
Inputs to this process are
A luminance position (xP, yP) specifying an upper left luminance sample of the current prediction unit relative to an upper left sample of the current picture,
variables specifying the width nPSW and height nPSH of the prediction unit for luminance.
The reference index refIdxLX of the current prediction unit partition (where X is 0 or 1).
The output of this process is
-a prediction mvpLX of a motion vector mvLX (where X is 0 or 1).
The motion vector predictor mvpLX is derived with the following ordered steps.
1. The derivation process for the motion vector predictor candidates from the neighboring prediction unit partition in subclause 8.5.2.1.6 is invoked, with the luminance position (xP, yP), the width nPSW and height nPSH of the prediction unit, and refIdxLX (where X is 0 or 1, respectively) as inputs, and the availability flag availableFlagLXN and the motion vector mvLXN (where N is replaced by A, B) as outputs.
2. availableFlagLXA is set equal to 0 if RefPicTypeFuncMV(mvLXA) is not equal to RefPicTypeFuncMV(mvpLX), and availableFlagLXB is set equal to 0 if RefPicTypeFuncMV(mvLXB) is not equal to RefPicTypeFuncMV(mvpLX).
3. If availableFlagLXA and availableFlagLXB are both equal to 1 and mvLXA is not equal to mvLXB, availableFlagLXCol is set equal to 0. Otherwise, the derivation process for temporal luma motion vector prediction in subclause 5 is invoked, with the luma position (xP, yP), the width nPSW and height nPSH of the prediction unit, and refIdxLX (where X is 0 or 1, respectively) as inputs, and with the availability flag availableFlagLXCol and the temporal motion vector predictor mvLXCol as outputs.
4. The motion vector predictor candidate list, mvpListLX, is constructed as follows.
mvLXA, in case availableFlagLXA equals 1
mvLXB, in case availableFlagLXB equals 1,
mvLXCol, in case availableFlagLXCol equals 1,
5. When mvLXA and mvLXB have the same value, mvLXB is removed from the list. The variable numMVPCandLX is set to the number of elements within mvpListLX, and maxNumMVPCand is set to 2.
6. The motion vector predictor list is modified as follows.
If numMVPCandLX is less than 2, then the following applies.
mvpListLX[numMVPCandLX][0]=0 (8-133)
mvpListLX[numMVPCandLX][1]=0 (8-134)
numMVPCandLX=numMVPCandLX+1 (8-135)
Otherwise (numMVPCandLX equal to or greater than 2), all motion vector predictor candidates mvpListLX [ idx ] with idx greater than 1 are removed from the list.
The motion vector of mvpListLX[mvp_lX_flag[xP, yP]] is assigned to mvpLX.
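Steps 3 through 6 of the ordered list above amount to the following candidate-list construction. This is a sketch: unavailable candidates are passed as None, and motion vectors are modeled as (x, y) tuples:

```python
def build_mvp_list(mv_a, mv_b, mv_col):
    # Step 3: the temporal candidate is not derived when two distinct
    # spatial candidates already exist.
    if mv_a is not None and mv_b is not None and mv_a != mv_b:
        mv_col = None
    # Step 4: append available candidates in A, B, Col order.
    mvp_list = [mv for mv in (mv_a, mv_b, mv_col) if mv is not None]
    # Step 5: remove mvLXB when it duplicates mvLXA.
    if mv_a is not None and mv_a == mv_b:
        mvp_list.pop(1)
    # Step 6: pad with zero motion vectors, then keep maxNumMVPCand = 2.
    while len(mvp_list) < 2:
        mvp_list.append((0, 0))
    return mvp_list[:2]
```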
In addition, the following modification may be applied to the TMVP: when checking the POC values of reference pictures during the TMVP, instead of checking every picture, only pictures for which RefPicTypeFunc() is equal to 0 are checked. In one alternative, where RefPicTypeFunc() returns 0 for short-term reference pictures, this means that only short-term reference pictures are checked.
One detailed implementation that may be implemented by video encoder 20 and video decoder 30 is as follows:
derivation process for temporal luma motion vector prediction
Inputs to this process are
A luminance position (xP, yP) specifying an upper left luminance sample of the current prediction unit relative to an upper left sample of the current picture,
variables specifying the width nPSW and height nPSH of the prediction unit for luminance,
the reference index refIdxLX of the current prediction unit partition (where X is 0 or 1).
The output of this process is
-a motion vector predictor mvLXCol,
the availability flag availableFlagLXCol.
The function RefPicOrderCnt (picX, refIdx, LX) returns the picture order count PicOrderCntVal of the reference picture with index refIdx from the reference picture list LX of picture picX and is specified as follows.
RefPicOrderCnt(picX, refIdx, LX) = PicOrderCnt(RefPicListLX(refIdx) of picture picX)
(8-141)
Depending on the values of slice_type, collocated_from_l0_flag, and collocated_ref_idx, the variable colPic, specifying the picture that contains the collocated partition, is derived as follows.
-If slice_type is equal to B and collocated_from_l0_flag is equal to 0, the variable colPic specifies the picture that contains the collocated partition, as specified by RefPicList1[collocated_ref_idx].
-Otherwise (slice_type is equal to B and collocated_from_l0_flag is equal to 1, or slice_type is equal to P), the variable colPic specifies the picture that contains the collocated partition, as specified by RefPicList0[collocated_ref_idx].
The variables colPu and their position (xPCol, yPCol) were derived using the following ordered procedure:
1. The variable colPu is derived as follows:
yPRb=yP+nPSH (8-149)
-If (yP >> Log2CtbSize) is equal to (yPRb >> Log2CtbSize), the horizontal component of the bottom-right luma position of the current prediction unit is defined by
xPRb=xP+nPSW (8-150)
and the variable colPu is set to the prediction unit covering the modified position given by ((xPRb >> 4) << 4, (yPRb >> 4) << 4) inside colPic.
-Otherwise ((yP >> Log2CtbSize) is not equal to (yPRb >> Log2CtbSize)), colPu is marked as "unavailable".
2. When colPu is coded in an intra prediction mode or colPu is marked as "unavailable", the following applies:
-The central luma position of the current prediction unit is defined by
xPCtr=xP+(nPSW>>1) (8-151)
yPCtr=yP+(nPSH>>1) (8-152)
-The variable colPu is set to the prediction unit covering the modified position given by ((xPCtr >> 4) << 4, (yPCtr >> 4) << 4) inside colPic.
-(xPCol, yPCol) is set equal to the top-left luma sample of colPu relative to the top-left luma sample of colPic.
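The position arithmetic above reduces to taking the PU center and snapping the candidate location to a 16x16 grid, as in this sketch:

```python
def pu_center(xp, yp, npsw, npsh):
    # Center luma position of the current PU, equations (8-151)/(8-152).
    return xp + (npsw >> 1), yp + (npsh >> 1)

def aligned_col_position(x, y):
    # ((x >> 4) << 4, (y >> 4) << 4): the modified position inside colPic is
    # snapped to the top-left corner of the containing 16x16 block.
    return (x >> 4) << 4, (y >> 4) << 4
```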
The function LongTermRefPic(picX, refIdx, LX) is defined as follows: if the reference picture with index refIdx from reference picture list LX of picture picX was marked as "used for long-term reference" when picX was the current picture, LongTermRefPic(picX, refIdx, LX) returns 1; otherwise, it returns 0.
The variables mvLXCol and availableFlagLXCol are derived as follows.
- Both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0 if one of the following conditions holds:
- colPu is coded in an intra-prediction mode.
- colPu is marked as "unavailable".
- pic_temporal_mvp_enable_flag is equal to 0.
Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows.
- If PredFlagL0[xPCol][yPCol] is equal to 0, then mvCol, refIdxCol, and listCol are set equal to MvL1[xPCol][yPCol], RefIdxL1[xPCol][yPCol], and L1, respectively.
- Otherwise (PredFlagL0[xPCol][yPCol] is equal to 1), the following applies.
- If PredFlagL1[xPCol][yPCol] is equal to 0, then mvCol, refIdxCol, and listCol are set equal to MvL0[xPCol][yPCol], RefIdxL0[xPCol][yPCol], and L0, respectively.
- Otherwise (PredFlagL1[xPCol][yPCol] is equal to 1), the following assignments are made.
- If PicOrderCnt(pic) of every picture pic (or, in an alternative, "every short-term picture pic") in every reference picture list is less than or equal to PicOrderCntVal, then mvCol, refIdxCol, and listCol are set equal to MvLX[xPCol][yPCol], RefIdxLX[xPCol][yPCol], and LX, respectively, where X is the value of X when this process is invoked.
- Otherwise (PicOrderCnt(pic) of at least one picture pic in at least one reference picture list is greater than PicOrderCntVal), mvCol, refIdxCol, and listCol are set equal to MvLN[xPCol][yPCol], RefIdxLN[xPCol][yPCol], and LN, respectively, where N is the value of collocated_from_l0_flag.
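The cascade above, selecting mvCol, refIdxCol, and listCol from colPu's two motion fields, can be sketched as follows. The dict keys and argument names are illustrative, not the specification's variables.

```python
def derive_col_motion(col_pu, X, all_refs_poc_leq_current,
                      collocated_from_l0_flag):
    """Choose which of colPu's two motion fields supplies mvCol.

    col_pu is an illustrative dict with pred_flag_l0/l1, mv_l0/l1, and
    ref_idx_l0/l1 keys. X is the caller's target reference list index;
    all_refs_poc_leq_current mirrors the condition that every reference
    picture's POC is less than or equal to PicOrderCntVal.
    Returns (mvCol, refIdxCol, listCol).
    """
    if not col_pu["pred_flag_l0"]:
        return col_pu["mv_l1"], col_pu["ref_idx_l1"], 1
    if not col_pu["pred_flag_l1"]:
        return col_pu["mv_l0"], col_pu["ref_idx_l0"], 0
    # colPu is bi-predicted: pick a list by the POC ordering rule.
    N = X if all_refs_poc_leq_current else collocated_from_l0_flag
    return col_pu["mv_l%d" % N], col_pu["ref_idx_l%d" % N], N
```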
After intra-predictive or inter-predictive coding using PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may include syntax data that describes a method or mode of generating predictive pixel data in the spatial domain, also referred to as the pixel domain, and a TU may include coefficients in the transform domain after applying a transform to the residual video data, such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs that comprise residual data of the CU and then transform the TUs to generate transform coefficients for the CU.
After applying any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to the process of quantizing transform coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
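As a minimal illustration of the bit-depth reduction described above (not the actual HEVC quantizer, which divides by a step size derived from the quantization parameter):

```python
def quantize_bit_depth(value, n, m):
    """Round an n-bit magnitude down to m bits by dropping the n - m
    least-significant bits. Purely illustrative of the statement that
    quantization may reduce bit depth; real quantization also scales
    by a QP-dependent step size."""
    assert n > m
    return value >> (n - m)
```

For example, the 8-bit value 255 rounded down to 4 bits becomes 15, losing the dropped low-order information.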
After quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
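The serialization step above can be illustrated with a simple anti-diagonal (zig-zag-style) scan. This is a sketch of the general idea, not one of HEVC's defined scan orders, which vary by block size and mode.

```python
def diagonal_scan(block):
    """Serialize an NxN coefficient block along anti-diagonals so that
    low-frequency (typically higher-energy) coefficients come first in
    the resulting one-dimensional vector."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    return [block[r][c] for r, c in order]
```

After a transform and quantization, such a scan tends to place the run of trailing zeros at the end of the vector, which entropy coding exploits.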
To perform CABAC, video encoder 20 may assign a context within the context model to a symbol to be transmitted. The context may be related to, for example, whether neighboring values of a symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in a VLC may be constructed such that relatively shorter codes correspond to more probable symbols and longer codes correspond to less probable symbols. As such, using VLC may achieve bit savings compared to, for example, using equal length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
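The bit savings from variable-length codes can be made concrete by comparing expected code lengths. The symbol probabilities and code lengths below are hypothetical, chosen only to illustrate the point.

```python
def expected_code_length(probs, code_lengths):
    """Average bits per symbol for a probability distribution and a
    set of codeword lengths: shorter codes assigned to likelier
    symbols lower the average versus fixed-length coding."""
    return sum(p * l for p, l in zip(probs, code_lengths))

# Four symbols: fixed-length coding needs 2 bits each; a VLC with
# lengths 1, 2, 3, 3 wins when one symbol dominates.
probs = [0.6, 0.2, 0.1, 0.1]
vlc_bits = expected_code_length(probs, [1, 2, 3, 3])    # 1.6 bits/symbol
fixed_bits = expected_code_length(probs, [2, 2, 2, 2])  # 2.0 bits/symbol
```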
Video encoder 20 may further send syntax data, such as block-based syntax data, frame-based syntax data, and GOP-based syntax data, to video decoder 30, for example, in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe a number of frames in a respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder or decoder circuits, as applicable, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (codec). A device that includes video encoder 20 and/or video decoder 30 may include an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
As such, video encoder 20 and video decoder 30 represent examples of video coders (e.g., video encoders or video decoders) that are configured to perform the following operations: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and coding a current motion vector based at least in part on the value of the variable. In addition, when the first type includes a disparity motion vector, the second type includes a disparity motion vector, and the candidate motion vector predictor is used to predict the current motion vector, the video coder may be configured to code the current motion vector without scaling the candidate motion vector predictor.
Fig. 2 is a block diagram illustrating an example of a video encoder 20, which video encoder 20 may implement techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of a number of spatial-based coding modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of several temporally based coding modes.
As shown in fig. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 20 includes mode select unit 40, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. To achieve video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and a summer 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. If used, the deblocking filter would typically filter the output of summer 62. In addition to the deblocking filter, additional filters (in-loop or post-loop) may be used. These filters are not shown for simplicity, but may filter the output of summer 50 (as an in-loop filter) if desired.
During the encoding process, video encoder 20 receives a video frame or slice to be coded. A frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive coding of the block relative to one or more neighboring blocks in the same frame or slice as the received video block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
Furthermore, partition unit 48 may partition a block of video data into a plurality of sub-blocks based on an evaluation of previous partition schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into a plurality of LCUs, and partition each of the LCUs into a plurality of sub-CUs based on a rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further generate a quadtree data structure that indicates partitioning of LCUs into sub-CUs. Leaf-node CUs of a quadtree may comprise one or more PUs and one or more TUs.
Mode select unit 40 may select one of the coding modes (intra or inter), e.g., based on the error results, and provide the resulting intra or inter coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit), relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel differences, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of a reference picture. Accordingly, motion estimation unit 42 may perform a motion search with respect to full-pixel positions and fractional-pixel positions, and output a motion vector with fractional-pixel precision.
Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to locations of predictive blocks of reference pictures. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.
The motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vectors determined by motion estimation unit 42. Furthermore, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
In accordance with the techniques of this disclosure, when mode select unit 40 elects to inter-predict a block (e.g., a PU) of video data using motion estimation unit 42 and motion compensation unit 44, video encoder 20 may further encode the motion vectors, e.g., using AMVP or merge mode. For example, entropy encoding unit 56 may receive the motion vector from mode select unit 40 and encode the motion vector. Entropy encoding unit 56 may entropy encode the motion vector using AMVP by selecting neighboring blocks for retrieval of the motion vector predictor and calculating a difference between the motion vector and the motion vector predictor (e.g., a horizontal motion vector difference and a vertical motion vector difference), then entropy encode one or more syntax elements representing the difference.
In accordance with the techniques of this disclosure, entropy encoding unit 56 may set the candidate motion vector predictor to be unavailable in AMVP (or merge mode) to predict the current motion vector when the candidate motion vector predictor has a different type than the current motion vector. Such setting of the candidate motion vector predictor as unavailable may be performed even after a different process determines that the candidate motion vector is available based on other criteria. For example, if the candidate motion vector predictor is a disparity motion vector and the current motion vector is a temporal motion vector, entropy encoding unit 56 may set the candidate motion vector predictor as unavailable as a predictor for the current motion vector. Likewise, if the candidate motion vector predictor is a temporal motion vector and the current motion vector is a disparity motion vector, entropy encoding unit 56 may set the candidate motion vector predictor as unavailable as a predictor for the current motion vector.
Entropy encoding unit 56 may use one or more various techniques to determine whether the motion vector and motion vector predictor being encoded are the same type of motion vector or different types of motion vectors. In some examples, entropy encoding unit 56 may determine whether the motion vector and candidate motion vector predictor being encoded refer to a reference picture having a different POC value than the current picture being encoded. If one of the motion vector or the candidate motion vector predictor refers to a reference picture having a different POC value than the current picture being encoded and the other refers to a reference picture having the same POC value as the current picture being encoded, entropy encoding unit 56 may determine that the motion vector and the candidate motion vector predictor are different types of motion vectors. In detail, a motion vector referring to a reference picture having the same POC value as the current picture being encoded may be considered as a disparity motion vector, and a motion vector referring to a reference picture having a different POC value from the current picture may be considered as a temporal motion vector.
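The POC-based type test described above can be sketched directly: a vector whose reference picture shares the current picture's POC value is treated as a disparity motion vector; otherwise it is treated as temporal. The function name is illustrative.

```python
def mv_types_differ_by_poc(current_poc, mv_ref_poc, pred_ref_poc):
    """Return True when the motion vector being coded and the candidate
    predictor are of different types under the POC test: same POC as
    the current picture => disparity motion vector; different POC =>
    temporal motion vector."""
    def is_disparity(ref_poc):
        return ref_poc == current_poc
    return is_disparity(mv_ref_poc) != is_disparity(pred_ref_poc)
```

When this function returns True, the candidate would be marked unavailable as a predictor for the current motion vector.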
As another example, entropy encoding unit 56 may determine whether the current motion vector refers to a reference picture in the current layer of the current picture being encoded or in a different layer. Likewise, entropy encoding unit 56 may determine whether the candidate motion vector predictor refers to a reference picture in the current layer or in a different layer. If the current motion vector and the candidate motion vector predictor both refer to reference pictures in the current layer, or both refer to reference pictures in different layers, entropy encoding unit 56 may determine that the current motion vector and the candidate motion vector predictor are the same type of motion vector. In particular, the current motion vector and the candidate motion vector predictor may include disparity motion vectors if they refer to reference pictures in one or more different layers, and may include temporal motion vectors if they refer to reference pictures in the current layer. If one of the current motion vector and the candidate motion vector predictor refers to a reference picture in the current layer and the other refers to a reference picture in a different layer, entropy encoding unit 56 may determine that the current motion vector and the candidate motion vector predictor are different types of motion vectors.
As yet another example, entropy encoding unit 56 may determine whether the current motion vector refers to a long-term reference picture or a short-term reference picture, and likewise, whether the candidate motion vector predictor refers to a long-term reference picture or a short-term reference picture. If both the current motion vector and the candidate motion vector predictor refer to the same type of reference picture (i.e., both refer to long-term reference pictures or both refer to short-term reference pictures), entropy encoding unit 56 may determine that the current motion vector and the candidate motion vector predictor are the same type of motion vector. On the other hand, if one of the current motion vector and the candidate motion vector predictor refers to a long-term reference picture and the other refers to a short-term reference picture, entropy encoding unit 56 may determine that the current motion vector and the candidate motion vector predictor are different types of motion vectors. The motion vectors referencing the long-term reference picture may include temporal motion vectors, while the motion vectors referencing the short-term reference picture may include disparity motion vectors.
As discussed above, entropy encoding unit 56 may determine that a candidate motion vector having a type different from the current motion vector is not available as the motion vector predictor for the current motion vector. Thus, entropy encoding unit 56 may remove such candidate motion vector predictors from the list of candidate motion vector predictors for the current motion vector, or omit adding them to the list. In detail, entropy encoding unit 56 may set a value of a variable associated with the candidate motion vector predictor to indicate whether the candidate motion vector predictor is available as the motion vector predictor of the current motion vector. Furthermore, entropy encoding unit 56 may be configured to select a candidate motion vector predictor of the same type as the current motion vector to encode the current motion vector, e.g., a candidate for which the associated variable has a value indicating that the candidate may be used as the motion vector predictor for the current motion vector. Entropy encoding unit 56 may encode the current motion vector using various motion vector encoding modes, such as advanced motion vector prediction (AMVP) or merge mode.
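The availability filtering above can be sketched as a pass over the candidate list. The "temporal"/"disparity" type labels are illustrative stand-ins for the availability variable described in the text.

```python
def filter_candidates(current_type, candidates):
    """Keep only candidate predictors whose type matches the current
    motion vector's type; mismatched candidates are treated as if
    their availability variable were set to "unavailable".

    candidates is a list of (predictor, type) pairs, with type being
    "temporal" or "disparity" (illustrative labels)."""
    available = []
    for mv, mv_type in candidates:
        if mv_type == current_type:
            available.append(mv)  # usable as a predictor
        # else: omitted from the candidate list entirely
    return available
```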
In general, when the motion vector predictor refers to a reference picture that is different from the reference picture referred to by the current motion vector (e.g., when the POC values of the reference pictures are different), entropy encoding unit 56 may scale the motion vector predictor used to predict the current motion vector. More particularly, entropy encoding unit 56 may scale a temporal motion vector predictor based on the difference between the POC values of the reference pictures. However, when a disparity motion vector is predicted using a motion vector predictor that is also a disparity motion vector, entropy encoding unit 56 may disable scaling of the motion vector predictor.
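A simplified, floating-point sketch of this behavior follows. It scales a temporal predictor by the ratio of POC distances and passes a disparity predictor through unscaled; the actual HEVC scaling uses a fixed-point distance scale factor with clipping, which is omitted here.

```python
def scale_predictor(mv, cur_poc, cur_ref_poc, pred_poc, pred_ref_poc,
                    is_disparity):
    """Scale a temporal motion vector predictor by the ratio of POC
    distances (tb / td); return a disparity predictor unscaled,
    mirroring the disabled-scaling case described above."""
    if is_disparity:
        return mv  # disparity-to-disparity prediction: no scaling
    tb = cur_poc - cur_ref_poc    # POC distance of the current vector
    td = pred_poc - pred_ref_poc  # POC distance of the predictor
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))
```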
Entropy encoding unit 56 may encode the motion vector by calculating a motion vector difference between the motion vector and a motion vector predictor (e.g., a motion vector predictor of the same type as the motion vector being coded). In general, a motion vector may be defined by a horizontal component (or x-component) and a vertical component (or y-component). Entropy encoding unit 56 may calculate MVDx (the x component of the motion vector difference) as the difference between the x component of the motion vector being encoded and the x component of the motion vector predictor. Likewise, entropy encoding unit 56 may calculate MVDy (the y component of the motion vector difference) as the difference between the y component of the motion vector being encoded and the y component of the motion vector predictor. In the case where the motion vector is a temporal motion vector, entropy encoding unit 56 may calculate motion vector difference values (MVDx and MVDy) relative to a scaled version of the motion vector predictor (based on the POC difference between the motion vector being encoded and the reference picture referenced by the motion vector predictor). Entropy encoding unit 56 may then entropy encode MVDx and MVDy, e.g., using CABAC.
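The componentwise MVD calculation described above, and its inverse at the decoder, can be written as:

```python
def motion_vector_difference(mv, mvp):
    """MVDx / MVDy: componentwise difference between the motion vector
    being encoded and its (possibly scaled) predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def reconstruct_mv(mvd, mvp):
    """Decoder side: add the entropy-decoded MVD back to the same
    predictor to recover the motion vector."""
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])
```

The MVD components are what would then be entropy encoded, e.g., with CABAC.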
As an alternative to inter-prediction performed by motion estimation unit 42 and motion compensation unit 44 (as described above), intra-prediction unit 46 may intra-predict the current block. In particular, intra-prediction unit 46 may determine the intra-prediction mode to be used to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40 in some examples) may select an appropriate intra-prediction mode to use from the tested modes.
For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block, which was encoded to generate the encoded block, and a bit rate (i.e., a number of bits) used to generate the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
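The mode selection above amounts to minimizing a Lagrangian cost over the tested modes. The sketch below assumes a generic J = D + lambda * R cost; the lambda value, the distortion metric, and the mode names are illustrative, not values specified by the encoder described here.

```python
def rd_cost(distortion, bits, lmbda):
    """Lagrangian rate-distortion cost J = D + lambda * R used to rank
    candidate modes; the mode with the smallest J is selected."""
    return distortion + lmbda * bits

# Hypothetical (distortion, bits) measurements for two tested modes.
modes = {"mode_a": (1200, 40), "mode_b": (900, 120)}
best = min(modes, key=lambda m: rd_cost(*modes[m], lmbda=10.0))
```

With these numbers, mode_a costs 1600 and mode_b costs 2100, so mode_a wins despite its higher distortion, because it spends far fewer bits.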
After selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information to entropy encoding unit 56 indicating the selected intra-prediction mode for the block. Entropy encoding unit 56 may encode information indicating the selected intra-prediction mode. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of the encoding contexts for the various blocks and indications of the most probable intra-prediction modes, intra-prediction mode index tables, and modified intra-prediction mode index tables to be used in each of the contexts.
Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component(s) that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used.
In any case, transform processing unit 52 applies a transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain (such as the frequency domain). Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix comprising quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
After quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the contexts may be based on neighboring blocks. After entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference picture memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
As such, video encoder 20 of fig. 2 represents an example of a video encoder configured to perform the following operations: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and encoding the current motion vector based at least in part on the value of the variable. In addition, when the first type includes a disparity motion vector, the second type includes a disparity motion vector, and the candidate motion vector predictor is used to predict the current motion vector, the video encoder may be configured to code the current motion vector without scaling the candidate motion vector predictor.
Fig. 3 is a block diagram illustrating an example of a video decoder 30, which video decoder 30 may implement techniques for coding motion vectors and for performing bi-prediction in HEVC and its extensions, such as multiview or 3DV extensions. In the example of fig. 3, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference picture memory 82, and a summer 80. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 (fig. 2). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, or intra-prediction mode indicators, among other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When a video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 72 generates predictive blocks for the video blocks of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct reference frame lists (list 0 and list 1) using default construction techniques based on the reference pictures stored in reference picture memory 82.
Motion compensation unit 72 determines prediction information for the video blocks of the current video slice by analyzing the motion vectors and other syntax elements and uses the prediction information to generate the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine prediction modes (e.g., intra-prediction or inter-prediction) used to code video blocks of the video slice, inter-prediction slice types (e.g., B-slices, P-slices, or GPB slices), construction information for one or more of the reference picture lists of the slice, motion vectors for each inter-coded video block of the slice, inter-prediction states for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.
Entropy decoding unit 70 may entropy decode the motion vectors of the P-and B-coded blocks. For example, entropy decoding unit 70 may use AMVP or merge mode to decode the motion vectors. In particular, in accordance with the techniques of this disclosure, entropy decoding unit 70 may avoid decoding the current motion vector using a candidate motion vector predictor having a type different from the current motion vector being decoded. For example, when the current motion vector includes a disparity motion vector, entropy decoding unit 70 may decode the current motion vector using a motion vector predictor that is also a disparity motion vector. Likewise, entropy decoding unit 70 may disable scaling when decoding the current motion vector using the motion vector predictor that includes the disparity motion vector. As another example, when the current motion vector includes a temporal motion vector, entropy decoding unit 70 may decode the current motion vector using a motion vector predictor that is also a temporal motion vector.
In accordance with the techniques of this disclosure, entropy decoding unit 70 may set a candidate motion vector predictor to be unavailable in AMVP (or merge mode) to predict a current motion vector when the candidate motion vector predictor has a different type than the current motion vector. For example, if the candidate motion vector predictor is a disparity motion vector and the current motion vector is a temporal motion vector, entropy decoding unit 70 may set the candidate motion vector predictor as unavailable as a predictor for the current motion vector. Likewise, if the candidate motion vector predictor is a temporal motion vector and the current motion vector is a disparity motion vector, entropy decoding unit 70 may set the candidate motion vector predictor as unavailable as a predictor for the current motion vector.
Entropy decoding unit 70 may use one or more various techniques to determine whether the motion vector and motion vector predictor being decoded are the same type of motion vector or different types of motion vectors. In some examples, entropy decoding unit 70 may determine whether the motion vector and candidate motion vector predictor being decoded reference pictures having different POC values than the current picture being decoded. Entropy decoding unit 70 may determine that the motion vector and the candidate motion vector predictor are different types of motion vectors if one of the motion vector or the candidate motion vector predictor refers to a reference picture having a different POC value than the current picture being decoded and the other refers to a reference picture having the same POC value as the current picture being decoded. In detail, a motion vector referring to a reference picture having the same POC value as the current picture being decoded may be considered as a disparity motion vector, and a motion vector referring to a reference picture having a different POC value from the current picture may be considered as a temporal motion vector.
As another example, entropy decoding unit 70 may determine whether the current motion vector refers to a reference picture in the current layer of the current picture being decoded or to a reference picture in a different layer. Likewise, entropy decoding unit 70 may determine whether the candidate motion vector predictor refers to a reference picture in the current layer or in a different layer. If the current motion vector and the candidate motion vector predictor both refer to reference pictures in the current layer, or both refer to reference pictures in different layers, entropy decoding unit 70 may determine that the current motion vector and the candidate motion vector predictor are the same type of motion vector. In particular, if the current motion vector and the candidate motion vector predictor refer to reference pictures in one or more different layers, the current motion vector and the candidate motion vector predictor may comprise disparity motion vectors. If the current motion vector and the candidate motion vector predictor refer to reference pictures in the current layer, the current motion vector and the candidate motion vector predictor may comprise temporal motion vectors. If one of the current motion vector and the candidate motion vector predictor refers to a reference picture in the current layer and the other refers to a reference picture in a different layer, entropy decoding unit 70 may determine that the current motion vector and the candidate motion vector predictor are different types of motion vectors.
As yet another example, entropy decoding unit 70 may determine whether the current motion vector refers to a long-term reference picture or a short-term reference picture, and likewise, whether the candidate motion vector predictor refers to a long-term reference picture or a short-term reference picture. If both the current motion vector and the candidate motion vector predictor refer to the same type of reference picture (i.e., both refer to long-term reference pictures or both refer to short-term reference pictures), entropy decoding unit 70 may determine that the current motion vector and the candidate motion vector predictor are the same type of motion vector. On the other hand, if one of the current motion vector and the candidate motion vector predictor refers to a long-term reference picture and the other refers to a short-term reference picture, entropy decoding unit 70 may determine that the current motion vector and the candidate motion vector predictor are different types of motion vectors. The motion vectors referencing the long-term reference picture may include temporal motion vectors, while the motion vectors referencing the short-term reference picture may include disparity motion vectors.
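The three alternative type-determination heuristics described above (comparing POC values, layers, or long-term/short-term status of the referenced pictures) can be sketched as follows. The `RefPic` record and all function names are illustrative assumptions introduced only for this sketch, not part of any standard or the disclosed implementation.

```python
# Sketch of the three alternative ways a coder might decide whether two
# motion vectors are the same type (temporal vs. disparity), per the text
# above. Each motion vector is represented here only by the reference
# picture it points to; the record fields are hypothetical.
from collections import namedtuple

RefPic = namedtuple("RefPic", ["poc", "layer", "is_long_term"])

def same_type_by_poc(curr_poc, ref_a, ref_b):
    # A motion vector whose reference picture has the same POC as the
    # current picture is treated as a disparity motion vector; a different
    # POC indicates a temporal motion vector.
    return (ref_a.poc == curr_poc) == (ref_b.poc == curr_poc)

def same_type_by_layer(curr_layer, ref_a, ref_b):
    # A reference picture in the current layer indicates a temporal motion
    # vector; a reference picture in a different layer indicates disparity.
    return (ref_a.layer == curr_layer) == (ref_b.layer == curr_layer)

def same_type_by_term(ref_a, ref_b):
    # Both long-term or both short-term reference pictures -> same type.
    return ref_a.is_long_term == ref_b.is_long_term
```

Any one of the three checks can serve as the "same type" test that gates candidate availability in the AMVP/merge processes described in this section.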
As discussed above, entropy decoding unit 70 may determine that a candidate motion vector having a type different from the current motion vector is not available as a motion vector predictor for the current motion vector. Accordingly, entropy decoding unit 70 may remove such a candidate motion vector predictor from the list of candidate motion vector predictors, or omit adding the candidate motion vector predictor to the list. Based on whether the candidate motion vector predictor is of the same type as the current motion vector, entropy decoding unit 70 may also set a variable associated with the candidate motion vector predictor that indicates whether the candidate motion vector predictor may be used as the motion vector predictor for the current motion vector. Furthermore, entropy decoding unit 70 may be configured to select, to decode the current motion vector, a candidate motion vector predictor of the same type as the current motion vector, i.e., a candidate motion vector predictor whose associated variable value indicates that it may be used as the motion vector predictor for the current motion vector being decoded. Entropy decoding unit 70 may decode the current motion vector using various motion vector decoding modes, such as advanced motion vector prediction (AMVP) or merge mode.
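The availability-marking step described above can be sketched as a small filter over the candidate list. The dictionary keys and the function name are hypothetical; a real decoder would carry per-candidate availability flags through its AMVP/merge list-construction process.

```python
def mark_candidate_availability(candidates, current_is_disparity):
    # candidates: list of dicts with 'mv' (an (x, y) tuple) and
    # 'is_disparity' keys -- a hypothetical structure for this sketch.
    # Sets an 'available' variable on each candidate, mirroring the text
    # above: a candidate whose type differs from the current motion
    # vector's type is marked unavailable, even if it passed every other
    # availability check.
    for cand in candidates:
        cand["available"] = (cand["is_disparity"] == current_is_disparity)
    # Only candidates still marked available remain eligible for
    # selection as the motion vector predictor.
    return [c for c in candidates if c["available"]]
```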
To decode the current motion vector, entropy decoding unit 70 may select one of a plurality of candidate motion vector predictors (e.g., as indicated by syntax data, or according to an implicit selection process). When the selected motion vector predictor is a temporal motion vector, entropy decoding unit 70 may scale the selected motion vector predictor based on the POC difference between the reference picture referred to by the motion vector predictor and the reference picture referred to by the current motion vector. Entropy decoding unit 70 may also decode syntax elements representing an MVDx value (i.e., the horizontal or x-component of the motion vector difference) and an MVDy value (i.e., the vertical or y-component of the motion vector difference). Entropy decoding unit 70 may then add the MVDx value to the x-component of the selected (and possibly scaled) motion vector predictor to reproduce the x-component of the current motion vector, and add the MVDy value to the y-component of the selected (and possibly scaled) motion vector predictor to reproduce the y-component of the current motion vector. Entropy decoding unit 70 may provide the reproduced (i.e., decoded) motion vector to motion compensation unit 72.
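The decoder-side reconstruction just described (POC-based scaling of a temporal predictor, then adding the decoded MVD components) can be sketched as follows. This is a simplified illustration: the integer scaling arithmetic, clipping, and rounding of a conforming decoder are omitted, and the parameter names are assumptions for this sketch.

```python
def reconstruct_motion_vector(pred_mv, pred_ref_poc, curr_ref_poc, curr_poc,
                              mvd, is_temporal):
    # pred_mv and mvd are (x, y) tuples. A temporal predictor is scaled by
    # the ratio of POC distances before the decoded MVD is added; a
    # disparity predictor is used unscaled, as described above.
    px, py = pred_mv
    if is_temporal and pred_ref_poc != curr_poc:
        # tb / td: distance to the current MV's reference over distance to
        # the predictor's reference (simplified floating-point form).
        scale = (curr_poc - curr_ref_poc) / (curr_poc - pred_ref_poc)
        px, py = round(px * scale), round(py * scale)
    mvdx, mvdy = mvd
    return (px + mvdx, py + mvdy)
```

For example, a predictor (4, 8) pointing two pictures back, reused for a reference one picture back, is halved before the MVD is added.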
Motion compensation unit 72 may retrieve data from a previously decoded picture (e.g., from reference picture memory 82) using the decoded motion vectors. Motion compensation unit 72 may also perform interpolation based on the interpolation filter. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of the reference block. In this case, motion compensation unit 72 may determine the interpolation filter used by video encoder 20 from the received syntax element and use the interpolation filter to generate the predictive block.
Inverse quantization unit 76 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include using a quantization parameter QPY, calculated by video decoder 30 for each video block in the video slice, to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.
After motion compensation unit 72 generates the predictive block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing: residual blocks from inverse transform unit 78, and corresponding predictive blocks generated by motion compensation unit 72. Summer 90 represents the component(s) that perform such a summation operation. Deblocking filters may also be applied to filter the decoded blocks, if desired, in order to remove blocking artifacts. Other loop filters (in or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, reference picture memory 82 storing reference pictures for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of fig. 1.
Video decoder 30 may be configured to perform a decoding process in accordance with one or more techniques of this disclosure. In some examples, for each picture pic, when aggressive_disabled_pic_flag[i] or aggressive_disabled_pic_lX_flag[i] corresponds to the picture pic (where X is equal to 0 for RefPicList0 or equal to 1 for RefPicList1), the flag ConsImplicitFlag is derived to be equal to aggressive_disabled_pic_flag[i] or aggressive_disabled_pic_lX_flag[i], respectively. Alternatively, when an entire RPS subset is indicated as constrained for implicit weighted prediction, each picture of this RPS subset may have ConsImplicitFlag equal to 1; otherwise, each picture of this RPS subset may have ConsImplicitFlag equal to 0.
As one example, video decoder 30 may be configured to perform a weighted sample prediction process. Inputs to such a process may include:
a position (xB, yB) specifying a top left sample of the current prediction unit relative to a top left sample of the current coding unit,
the width nPSW and the height nPSH of this prediction unit,
two (nPSW) × (nPSH) arrays predSamplesL0 and predSamplesL1,
the prediction list utilization flags predFlagL0 and predFlagL1,
reference indices refIdxL0 and refIdxL1,
motion vectors mvL0 and mvL1,
the bit depth bitDepth of the chrominance component.
The output of such a process may include:
an (nPSW) × (nPSH) array predSamples of predicted sample values.
Video decoder 30 may derive variables shift1, shift2, offset1, and offset2 as follows:
The variable shift1 may be set equal to (14 - bitDepth) and the variable shift2 may be set equal to (15 - bitDepth).
The variable offset1 may be set equal to 1 << (shift1 - 1), and the variable offset2 may be set equal to 1 << (shift2 - 1).
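Using the shift and offset derivation above, the default (unweighted) bi-predictive sample combination can be sketched as follows. This is a simplified illustration of the default weighted sample prediction process, assuming the two prediction arrays hold intermediate values at the usual 14-bit precision; it is not conforming decoder code.

```python
def default_weighted_bipred(pred_l0, pred_l1, bit_depth):
    # Combines two (nPSW) x (nPSH) prediction arrays (lists of rows) with
    # the default rule (p0 + p1 + offset2) >> shift2, clipping the result
    # to the [0, 2^bitDepth - 1] sample range.
    shift2 = 15 - bit_depth
    offset2 = 1 << (shift2 - 1)
    max_val = (1 << bit_depth) - 1
    return [[min(max(0, (a + b + offset2) >> shift2), max_val)
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred_l0, pred_l1)]
```

For 8-bit video, shift2 is 7 and offset2 is 64, so two 14-bit intermediate predictions are averaged with rounding back to the 8-bit sample range.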
In P slices, if the value of predFlagL0 is equal to 1, then the following applies:
If weighted_pred_flag is equal to 0, then a default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise (weighted_pred_flag is equal to 1), the explicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
In a B slice, if predFlagL0 or predFlagL1 is equal to 1, then the following applies:
If weighted_bipred_idc is equal to 0, a default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise, if weighted_bipred_idc is equal to 1 and predFlagL0 or predFlagL1 is equal to 1, an explicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise, if weighted_bipred_idc is equal to 2, then the following applies:
If predFlagL0 is equal to 1 and predFlagL1 is equal to 1, then the implicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise (only one of predFlagL0 and predFlagL1 is equal to 1), the default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise (weighted_bipred_idc is equal to 3), the following applies:
If predFlagL0 is equal to 1 and predFlagL1 is equal to 1, and ConsImplicitFlag(RefPicListL0(refIdxL0)) and ConsImplicitFlag(RefPicListL1(refIdxL1)) are both equal to 1, then the implicit weighted sample prediction process as described in subclause 8.5.2.2.3.2 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Otherwise (only one of predFlagL0 and predFlagL1 is equal to 1), the default weighted sample prediction process as described in subclause 8.5.2.2.3.1 of WD6 of HEVC may be invoked, with the inputs and outputs identical to the process described in this subclause.
Alternatively, the implicit method when weighted_bipred_idc is equal to 2 may be changed directly to the method specified above for weighted_bipred_idc equal to 3.
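The branching among the default, explicit, and implicit weighted sample prediction processes for B slices described above can be sketched as a small dispatch function. The string labels are illustrative, and the fallback to the default process for bi-prediction when ConsImplicitFlag is not set on both reference pictures (weighted_bipred_idc equal to 3) is an assumption of this sketch rather than text from the process above.

```python
def select_bipred_process(weighted_bipred_idc, pred_flag_l0, pred_flag_l1,
                          cons_implicit_l0=True, cons_implicit_l1=True):
    # Returns which weighted sample prediction process a B slice would
    # invoke. cons_implicit_l0/l1 stand in for ConsImplicitFlag of the two
    # reference pictures; they matter only when weighted_bipred_idc == 3.
    bipred = pred_flag_l0 and pred_flag_l1
    if weighted_bipred_idc == 0:
        return "default"
    if weighted_bipred_idc == 1:
        return "explicit"
    if weighted_bipred_idc == 2:
        # Implicit weighting only for true bi-prediction.
        return "implicit" if bipred else "default"
    # weighted_bipred_idc == 3: implicit only when bi-predicted and both
    # pictures permit implicit weighting; otherwise fall back to default
    # (an assumption for the under-specified case).
    if bipred and cons_implicit_l0 and cons_implicit_l1:
        return "implicit"
    return "default"
```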
As such, video decoder 30 of fig. 3 represents an example of a video decoder configured to perform the following operations: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and decoding the current motion vector based at least in part on the value of the variable. In addition, when the first type includes a disparity motion vector, the second type includes a disparity motion vector, and the candidate motion vector predictor is used to predict the current motion vector, the video decoder may be configured to code the current motion vector without scaling the candidate motion vector predictor.
Fig. 4 is a conceptual diagram illustrating an example MVC prediction pattern. In the example of FIG. 4, eight views are illustrated (with view IDs "S0" through "S7"), and twelve time positions ("T0" through "T11") are illustrated for each view. That is, each row in FIG. 4 corresponds to a view, while each column indicates a temporal location.
Although MVC has a so-called base view that is decodable by h.264/AVC decoders, and stereo view pairs can also be supported by MVC, an advantage of MVC is that it can support more than two views as a 3D video input and decode this 3D video represented by the multiple views. A renderer of a client having an MVC decoder may expect 3D video content with multiple views.
Frames in fig. 4 are indicated, at the intersection of each row and each column, using a shaded block including a letter that designates whether the corresponding frame is intra-coded (i.e., an I-frame), inter-coded in one direction (i.e., as a P-frame), or inter-coded in multiple directions (i.e., as a B-frame). Generally, prediction is indicated by an arrow, where the frame pointed to by the arrow uses the object from which the arrow originates for prediction reference. For example, the P frame at temporal position T0 of view S2 is predicted from the I frame at temporal position T0 of view S0.
As with single-view video encoding, frames of a multi-view video coding video sequence may be predictively encoded with respect to frames at different temporal locations. For example, the b frame at temporal position T1 of view S0 has an arrow pointing from the I frame at temporal position T0 of view S0 to the b frame, indicating that the b frame is predicted from the I frame. However, additionally, in the context of multi-view video coding, frames may be inter-view predicted. That is, the view component may use the view component in the other view for reference. For example, in MVC, inter-view prediction is implemented as if the view component in another view is an inter-prediction reference. Possible inter-view references are signaled in the Sequence Parameter Set (SPS) MVC extension and can be modified by the reference picture list construction process, which enables flexible ordering of inter-prediction or inter-view prediction references.
In the MVC extension of h.264/AVC, as an example, inter-view prediction is supported by disparity motion compensation, which uses the syntax of h.264/AVC motion compensation but allows pictures in different views to be used as reference pictures. Coding of two views, generally referred to as stereoscopic views, may be supported by MVC. One of the advantages of MVC is that an MVC encoder may take more than two views as a 3D video input, and an MVC decoder may decode such a multi-view representation. Thus, a rendering device with an MVC decoder may expect 3D video content with more than two views.
In MVC, inter-view prediction is allowed among pictures in the same access unit (i.e., pictures with the same time instance). In general, an access unit is a data unit that includes all view components (e.g., all NAL units) for a common time instance. When coding a picture in one of the non-base views, a picture may be added into a reference picture list if it is in a different view but has the same time instance (e.g., the same POC value, and thus is in the same access unit). An inter-view prediction reference picture can be placed in any position of the reference picture list, just like any inter-prediction reference picture.
Fig. 4 provides various examples of inter-view prediction. In the example of fig. 4, the frames of view S1 are illustrated as being predicted from frames at different temporal locations of view S1 and being inter-view predicted from frames at the same temporal location of views S0 and S2. For example, the B-frame at temporal position T1 of view S1 is predicted from each of the B-frames at temporal positions T0 and T2 of view S1 and the B-frames at temporal positions T1 of views S0 and S2.
In the example of fig. 4, the capital "B" and lowercase "B" letters are intended to indicate different hierarchical relationships between frames, rather than different encoding methods. In general, capital "B" frames are relatively high in the prediction hierarchy compared to lower case "B" frames. Fig. 4 also illustrates the variation of the predicted level using different degrees of shading, where a larger amount of shaded (i.e., relatively darker) frames are higher in the predicted level than those frames having less shading (i.e., relatively lighter). For example, all I frames in FIG. 4 are illustrated with full shading, while P frames have a slightly lighter shading, and B frames (and lower case B frames) have various degrees of shading relative to each other, but are always lighter than the shading of P and I frames.
In general, the prediction hierarchy is related to the view order index, in that frames relatively high in the prediction hierarchy should be decoded before frames relatively low in the hierarchy, so that the frames relatively high in the hierarchy can be used as reference frames during decoding of the frames relatively low in the hierarchy. The view order index is an index indicating the decoding order of view components in an access unit. View order indices are implied in the SPS MVC extension, as specified in Annex H of h.264/AVC (the MVC amendment). In the SPS, for each index i, the corresponding view_id is signaled. In some examples, the decoding of the view components should follow the ascending order of the view order indices. If all the views are presented, the view order indices are in a sequential order from 0 to num_views_minus_1.
As such, frames used as reference frames may be decoded prior to decoding frames encoded with reference to the reference frames. The view order index is an index indicating a decoding order of view components in an access unit. For each view order index i, the corresponding view _ id is signaled. The decoding of the view components follows the ascending order of the view order indices. If all views are presented, the set of view order indices may include a set of sequential orderings from zero to one less than the full number of views.
For some frames at equal levels of the hierarchy, the decoding order relative to each other may not matter. For example, the I frame at temporal position T0 of view S0 serves as a reference frame for the P frame at temporal position T0 of view S2, which in turn serves as a reference frame for the P frame at temporal position T0 of view S4. Thus, the I frame at temporal position T0 of view S0 should be decoded before the P frame at temporal position T0 of view S2, which should be decoded before the P frame at temporal position T0 of view S4. However, between views S1 and S3, the decoding order does not matter, because views S1 and S3 do not depend on each other for prediction, but instead are predicted only from views that are higher in the prediction hierarchy. Furthermore, view S1 may be decoded before view S4, so long as view S1 is decoded after views S0 and S2.
As such, the hierarchical ordering may be used to describe views S0-S7. Let notation SA > SB mean that view SA should be decoded before view SB. In the example of FIG. 4, this notation S0 > S2 > S4 > S6 > S7 is used. Also, with respect to the example of FIG. 4, S0 > S1, S2 > S1, S2 > S3, S4 > S3, S4 > S5, and S6 > S5. Any decoding order of views that does not violate these requirements is possible. Thus, many different decoding orders are possible with only certain limitations.
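The ordering constraints above can be checked mechanically: a proposed decoding order is valid exactly when every "SA > SB" pair places SA before SB. The following sketch encodes the constraints from the example of fig. 4; the function and variable names are illustrative.

```python
def order_respects_dependencies(decoding_order, dependencies):
    # dependencies: list of (a, b) pairs meaning view a must be decoded
    # before view b (the "SA > SB" notation above). Returns True when the
    # proposed decoding order violates none of the pairs.
    pos = {view: i for i, view in enumerate(decoding_order)}
    return all(pos[a] < pos[b] for a, b in dependencies)

# Constraints listed for the example of fig. 4.
DEPS = [("S0", "S2"), ("S2", "S4"), ("S4", "S6"), ("S6", "S7"),
        ("S0", "S1"), ("S2", "S1"), ("S2", "S3"), ("S4", "S3"),
        ("S4", "S5"), ("S6", "S5")]
```

Many distinct orders satisfy these pairs, which illustrates the point that only certain limitations constrain the decoding order.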
FIG. 5 is a flow diagram illustrating an example method for encoding a current block in accordance with the techniques of this disclosure. The current block may include the current CU or a portion of the current CU (e.g., the current PU). Although described with respect to video encoder 20 (fig. 1 and 2), it should be understood that other devices may be configured to perform a method similar to that of fig. 5.
In this example, video encoder 20 initially predicts the current block (150). For example, video encoder 20 may calculate one or more Prediction Units (PUs) for the current block. In this example, assume that video encoder 20 inter-predicts the current block. For example, motion estimation unit 42 may calculate a motion vector for the current block by performing a motion search of previously coded pictures (e.g., inter-view pictures and temporal pictures). Accordingly, motion estimation unit 42 may generate a temporal motion vector or a disparity motion vector to encode the current block.
Video encoder 20 may then encode the motion vectors. In detail, entropy encoding unit 56 may determine a list of candidate motion vector predictors (152). For example, entropy encoding unit 56 may select motion vectors of one or more neighboring blocks as candidate motion vector predictors. Video encoder 20 may determine that each of the candidate motion vector predictors in the list is available based on criteria other than the type of motion vector. Entropy encoding unit 56 may then determine whether any of the candidate motion vectors in the list is of a different type than the current motion vector. Entropy encoding unit 56 may then remove the candidate motion vector predictors having a type different from the type of the current motion vector from the list of candidate motion vector predictors (154). In detail, entropy encoding unit 56 may set a variable indicating whether the candidate motion vector predictor can be used as a motion vector predictor based on whether the candidate motion vector predictor has a type different from that of the current motion vector being encoded. As such, video encoder 20 may set the variable to a value indicating that the candidate motion vector predictor is unavailable based on having a different type than the current motion vector, even when the candidate motion vector was previously determined to be available based on other criteria.
As discussed above, entropy encoding unit 56 may determine whether the candidate motion vector predictor is of the same type as the current motion vector using one of a variety of different methods. For example, entropy encoding unit 56 may determine whether the candidate motion vector predictor refers to a reference picture that has the same POC value as the current picture being encoded or a reference picture that has a different POC value than the current picture being encoded, and determine whether the reference picture referred to by the current motion vector has the same POC value as the current picture being encoded or has a different POC value than the current picture being encoded. As another example, entropy encoding unit 56 may determine whether the candidate motion vector predictor and the current motion vector both refer to reference pictures in the same layer as the current picture being encoded, or reference pictures in one or more different layers that are different from the layer that includes the current picture being encoded. As yet another example, entropy encoding unit 56 may determine whether the candidate motion vector predictor and the current motion vector both refer to a long-term reference picture or a short-term reference picture.
After forming the list of candidate motion vector predictors such that all of the candidate motion vector predictors are of the same type as the current motion vector, entropy encoding unit 56 selects one of the candidate motion vector predictors to use as the motion vector predictor for the current motion vector (156). In detail, entropy encoding unit 56 selects one of the candidate motion vector predictors for which the variable indicates that the candidate motion vector predictor is available as the motion vector predictor for the current motion vector. If necessary, entropy encoding unit 56 may scale the selected motion vector predictor, for example, when the selected motion vector predictor is a temporal motion vector that refers to a reference picture whose POC value differs from the POC value of the reference picture to which the current motion vector refers. If the selected motion vector predictor is a disparity motion vector, entropy encoding unit 56 may disable motion vector predictor scaling. Entropy encoding unit 56 then calculates the difference between the current motion vector and the selected (and possibly scaled) motion vector predictor (158).
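On the encoder side, the difference calculation of step (158) can be sketched as follows: scale a temporal predictor by the ratio of POC distances (scaling is disabled for disparity predictors, as described above), then form the component-wise difference to be signaled as MVDx and MVDy. This is a simplified floating-point illustration with hypothetical parameter names; a conforming encoder uses integer scaling with clipping.

```python
def compute_mvd(curr_mv, pred_mv, pred_ref_poc, curr_ref_poc, curr_poc,
                is_temporal):
    # curr_mv and pred_mv are (x, y) tuples. Returns (MVDx, MVDy), the
    # difference between the current motion vector and the (possibly
    # scaled) selected motion vector predictor.
    px, py = pred_mv
    if is_temporal and pred_ref_poc != curr_poc:
        # Scale by POC-distance ratio before taking the difference.
        scale = (curr_poc - curr_ref_poc) / (curr_poc - pred_ref_poc)
        px, py = round(px * scale), round(py * scale)
    cx, cy = curr_mv
    return (cx - px, cy - py)
```

A decoder reverses this by adding the signaled MVD back onto the identically scaled predictor, so encoder and decoder must apply the same scaling rule.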
Video encoder 20 may then calculate a residual block for the current block, e.g., to generate a Transform Unit (TU) (160). To calculate the residual block, video encoder 20 may calculate the difference between the original unencoded block and the predicted block for the current block. Video encoder 20 may then transform and quantize the coefficients of the residual block (162). Next, video encoder 20 may scan the quantized transform coefficients of the residual block (164). During or after scanning, video encoder 20 may entropy encode the coefficients (166). For example, video encoder 20 may encode the coefficients using CAVLC or CABAC. Video encoder 20 may then output the entropy coded data of the block (168).
As such, the method of FIG. 5 represents an example of a method that includes: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and encoding the current motion vector based at least in part on the value of the variable.
Fig. 6 is a flow diagram illustrating an example method for decoding a current block of video data in accordance with the techniques of this disclosure. The current block may include the current CU or a portion (e.g., a PU) of the current CU. Although described with respect to video decoder 30 (fig. 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of fig. 6.
Initially, video decoder 30 receives data for transform coefficients and motion vector differences for a current block (200). Entropy decoding unit 70 entropy decodes the data for the coefficients and the motion vector difference (202). Entropy decoding unit 70 may then determine a list of candidate motion vector predictors (204). For example, entropy decoding unit 70 may select motion vectors of one or more neighboring blocks as candidate motion vector predictors. Video decoder 30 may determine that each of the candidate motion vector predictors in the list is available based on criteria other than the type of motion vector. Entropy decoding unit 70 may then determine whether any of the candidate motion vectors in the list is of a different type than the current motion vector. Entropy decoding unit 70 may then remove the candidate motion vector predictors having a type different from the type of the current motion vector from the list of candidate motion vector predictors (206). In detail, entropy decoding unit 70 sets a variable indicating whether the candidate motion vector predictor can be used as a motion vector predictor based on whether the candidate motion vector predictor has a type different from that of the current motion vector being decoded. As such, video decoder 30 may set the variable to a value indicating that the candidate motion vector predictor is unavailable based on having a different type than the current motion vector, even when the candidate motion vector was previously determined to be available based on other criteria.
As discussed above, entropy decoding unit 70 may determine whether the candidate motion vector predictor is of the same type as the current motion vector using one of a variety of different approaches. For example, entropy decoding unit 70 may determine whether the candidate motion vector predictor refers to a reference picture having the same POC value as the current picture being decoded or a reference picture having a different POC value than the current picture being decoded, and determine whether the current motion vector also refers to a reference picture having the same POC value as the current picture being decoded or a reference picture having a different POC value than the current picture being decoded. As another example, entropy decoding unit 70 may determine whether the candidate motion vector predictor and the current motion vector both refer to reference pictures in the same layer as the current picture being decoded, or to reference pictures in one or more layers different from the layer that includes the current picture being decoded. As yet another example, entropy decoding unit 70 may determine whether the candidate motion vector predictor and the current motion vector both refer to a long-term reference picture or both refer to a short-term reference picture.
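As an illustration of the first, POC-based approach, a motion vector that refers to a reference picture with the same POC value as the current picture can be classified as one type (an inter-view, or disparity, motion vector), and one that refers to a picture with a different POC value as the other type (a temporal motion vector). The function names and the "disparity"/"temporal" labels below are assumptions made for this sketch:

```python
def motion_vector_type(ref_poc, current_poc):
    # Same POC as the current picture -> the reference can only be an
    # inter-view (disparity) reference; different POC -> temporal reference.
    return "disparity" if ref_poc == current_poc else "temporal"

def types_match(cur_ref_poc, cand_ref_poc, current_poc):
    # The candidate may serve as a predictor only when both motion vectors
    # fall into the same class under the POC comparison.
    return (motion_vector_type(cur_ref_poc, current_poc)
            == motion_vector_type(cand_ref_poc, current_poc))
```

The layer-based and long-term/short-term checks follow the same pattern, with the classification function comparing layer identifiers or reference-picture marking instead of POC values.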
Entropy decoding unit 70 then selects one of the candidate motion vector predictors that is available (i.e., has a variable value indicating that the candidate motion vector is available as the motion vector predictor for the current motion vector) as the motion vector predictor for the current motion vector (208). In some examples, entropy decoding unit 70 selects the motion vector predictor according to an implicit predefined process, while in other examples, entropy decoding unit 70 decodes a syntax element that indicates which of a list of candidate motion vectors is to be selected. Entropy decoding unit 70 then mathematically combines the decoded motion vector difference value with the motion vector predictor to reproduce the current motion vector (210). For example, entropy decoding unit 70 may add the x-component of the motion vector difference (MVDx) to the x-component of the selected motion vector predictor and add the y-component of the motion vector difference (MVDy) to the y-component of the selected motion vector predictor.
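The component-wise reconstruction in step 210 amounts to adding the decoded difference to the selected predictor. A minimal sketch (the function name is invented for this example):

```python
def reconstruct_motion_vector(mvp, mvd):
    """Reproduce the current motion vector from the selected motion vector
    predictor (mvp) and the decoded motion vector difference (mvd):
    MVx = MVPx + MVDx, MVy = MVPy + MVDy."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```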
Video decoder 30 may predict the current block using the decoded motion vector (212). Video decoder 30 may then inverse scan the reproduced coefficients (214) to create a block of quantized transform coefficients. Video decoder 30 may then inverse quantize and inverse transform the coefficients to generate a residual block (216). Video decoder 30 may finally decode the current block by combining the predicted block and the residual block (218).
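The final combination in step 218 adds the residual block to the predicted block sample by sample. The helper below is an illustrative sketch only; it omits the clipping to the valid sample range that an actual decoder would apply:

```python
def reconstruct_block(predicted, residual):
    # Sample-wise sum of the predicted block and the residual block,
    # both given as 2-D lists of equal dimensions.
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]
```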
As such, the method of FIG. 6 represents an example of a method that includes: determining a first type of a current motion vector for a current block of video data; determining a second type of candidate motion vector predictor of a neighboring block of the current block; setting a variable representing whether the candidate motion vector predictor is available to a value indicating that the candidate motion vector predictor is unavailable when the first type is different from the second type; and decoding the current motion vector based at least in part on the value of the variable.
It will be recognized that depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out entirely (e.g., not all described acts or events are necessary for the practice of the techniques). Further, in some instances, acts or events may be performed concurrently, e.g., via multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. As such, computer-readable media may generally correspond to (1) tangible computer-readable storage media that is not transitory, or (2) communication media such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but instead pertain to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, as used herein, the term "processor" may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or sets of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (55)

1. A method of decoding video data, the method comprising:
determining a first type of a current motion vector for a current block of video data;
determining a second type of candidate motion vector predictor of a neighboring block of the current block;
setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor of the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor of the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor of the current motion vector;
determining whether the first type is different from the second type; and
after initially setting the variable to the first value, setting the variable to the second value in response to determining that the first type is the same as the second type; and
decoding the current motion vector based at least in part on the second value of the variable.
2. The method of claim 1,
wherein determining the first type for the current motion vector includes determining the first type based on a first reference picture subset to which a first reference picture referred to by the current motion vector belongs, and
wherein determining the second type for the candidate motion vector comprises determining the second type based on a second reference picture subset to which a second reference picture referred to by the candidate motion vector predictor belongs.
3. The method of claim 2, wherein the current block is included within a picture of a current layer, and wherein determining that the first type is different than the second type comprises determining that the first type is different than the second type when the first reference picture is included in the current layer and the second reference picture is included in a layer other than the current layer.
4. The method of claim 2, wherein the current block is included within a picture of a current layer, and wherein determining that the first type is different than the second type comprises determining that the first type is different than the second type when the second reference picture is included in the current layer and the first reference picture is included in a layer other than the current layer.
5. The method of claim 1, further comprising, prior to setting the variable to the first value, determining that the candidate motion vector is available based on criteria other than whether the first type is different than the second type.
6. The method of claim 1,
wherein the first type of the current motion vector represents whether a current reference Picture Order Count (POC) value of a first reference picture referenced by the current motion vector is the same as a current POC value of a current picture including the current block, and
wherein the second type of the candidate motion vector predictor represents whether a candidate reference POC value of a second reference picture referenced by the candidate motion vector predictor is the same as the current POC value.
7. The method of claim 6, further comprising decoding information indicating: setting the neighboring block comprising the candidate motion vector predictor as unavailable for reference when at least one of the current reference POC value and the candidate reference POC value is the same as the current POC value.
8. The method of claim 7, wherein decoding the current motion vector comprises: decoding the current motion vector without scaling the candidate motion vector predictor when a type of the first reference picture referenced by the current motion vector is different from a type of the second reference picture referenced by the candidate motion vector predictor.
9. The method of claim 1, wherein decoding the current motion vector comprises decoding the current motion vector using at least one of an Advanced Motion Vector Prediction (AMVP) mode and a merge mode, the method further comprising: refraining from adding the motion vector predictor to an AMVP candidate list for the current motion vector when the motion vector is decoded using AMVP and when the variable indicates that the candidate motion vector predictor is unavailable, and refraining from adding the motion vector predictor to a merge candidate list for the current motion vector when the motion vector is decoded using merge mode and when the variable indicates that the candidate motion vector predictor is unavailable.
10. The method of claim 1, further comprising decoding data indicating: whether inter-view reference is never selected as a collocated picture of Temporal Motion Vector Prediction (TMVP) mode for all slices in a decoded video sequence of the video data.
11. The method of claim 10, wherein decoding the data comprises decoding a flag disable_inter_view_as_tmvp_flag.
12. The method of claim 10, wherein decoding the data comprises decoding the data with at least one of: extension bits for a multi-view video coding (MVC) extension, extension bits for a three-dimensional video (3DV) extension, a subset sequence parameter set (SPS), and a video parameter set (VPS).
13. The method of claim 1, further comprising determining a type of a Reference Picture Set (RPS) subset that includes a reference picture to which the current motion vector refers, wherein determining the first type of the current motion vector comprises determining that the first type is equal to the type of the RPS subset.
14. The method of claim 1, the method being executable on a wireless communication device, wherein the device comprises:
a memory configured to store the video data;
a processor configured to execute instructions to decode the video data stored in the memory; and
a receiver configured to receive a signal comprising the video data and to store the video data included in the signal to the memory.
15. The method of claim 14, wherein the wireless communication device is a cellular handset and the signal is received by the receiver and modulated according to a cellular communication standard.
16. A method of encoding video data, the method comprising:
determining a first type of a current motion vector for a current block of video data;
determining a second type of candidate motion vector predictor of a neighboring block of the current block;
setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor of the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor of the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor of the current motion vector;
determining whether the first type is different from the second type; and after initially setting the variable to the first value, setting the variable to the second value in response to determining that the first type is the same as the second type; and
encoding the current motion vector based at least in part on the second value of the variable.
17. The method of claim 16,
wherein determining the first type for the current motion vector includes determining the first type based on a first reference picture subset to which a first reference picture referred to by the current motion vector belongs, and
wherein determining the second type for the candidate motion vector comprises determining the second type based on a second reference picture subset to which a second reference picture referred to by the candidate motion vector predictor belongs.
18. The method of claim 17, wherein the current block is included within a picture of a current layer, and wherein determining that the first type is different than the second type comprises determining that the first type is different than the second type when the first reference picture is included in the current layer and the second reference picture is included in a layer other than the current layer.
19. The method of claim 17, wherein the current block is included within a picture of a current layer, and wherein determining that the first type is different than the second type comprises determining that the first type is different than the second type when the second reference picture is included in the current layer and the first reference picture is included in a layer other than the current layer.
20. The method of claim 16, further comprising, prior to setting the variable to the first value, determining that the candidate motion vector is available based on criteria other than whether the first type is different than the second type.
21. The method of claim 16,
wherein the first type of the current motion vector represents whether a current reference Picture Order Count (POC) value of a first reference picture referenced by the current motion vector is the same as a current POC value of a current picture including the current block, and
wherein the second type of the candidate motion vector predictor represents whether a candidate reference POC value of a second reference picture referenced by the candidate motion vector predictor is the same as the current POC value.
22. The method of claim 21, further comprising encoding information indicating: setting the neighboring block comprising the candidate motion vector predictor as unavailable for reference when at least one of the current reference POC value and the candidate reference POC value is the same as the current POC value.
23. The method of claim 22, wherein encoding the current motion vector comprises: encoding the current motion vector without scaling the candidate motion vector predictor when the type of the first reference picture referenced by the current motion vector is different from the type of the second reference picture referenced by the candidate motion vector predictor.
24. The method of claim 16, wherein encoding the current motion vector comprises encoding the current motion vector using at least one of an Advanced Motion Vector Prediction (AMVP) mode and a merge mode, the method further comprising: refraining from adding the motion vector predictor to an AMVP candidate list for the current motion vector when the motion vector is encoded using AMVP and when the variable indicates that the candidate motion vector predictor is unavailable, and refraining from adding the motion vector predictor to a merge candidate list for the current motion vector when the motion vector is encoded using a merge mode and when the variable indicates that the candidate motion vector predictor is unavailable.
25. The method of claim 16, further comprising encoding data indicating: whether inter-view reference is never selected as a collocated picture of Temporal Motion Vector Prediction (TMVP) mode for all slices in an encoded video sequence of the video data.
26. The method of claim 25, wherein encoding the data comprises encoding a flag disable_inter_view_as_tmvp_flag.
27. The method of claim 25, wherein encoding the data comprises encoding the data with at least one of: extension bits for a multi-view video coding (MVC) extension, extension bits for a three-dimensional video (3DV) extension, a subset sequence parameter set (SPS), and a video parameter set (VPS).
28. The method of claim 16, further comprising determining a type of a Reference Picture Set (RPS) subset that includes a reference picture to which the current motion vector refers, wherein determining the first type of the current motion vector comprises determining that the first type is equal to the type of the RPS subset.
29. A device for decoding video data, the device comprising:
a memory configured to store video data; and
a video decoder configured to:
determining a first type of a current motion vector for a current block of the video data;
determining a second type of candidate motion vector predictor of a neighboring block of the current block;
setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor for the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor for the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor of the current motion vector;
determining whether the first type is different from the second type; and
after initially setting the variable to the first value, setting the variable to the second value in response to determining that the first type is the same as the second type; and
decoding the current motion vector based at least in part on the second value of the variable.
30. The device of claim 29, wherein the video decoder is configured to determine the first type for the current motion vector based on a first reference picture subset to which a first reference picture referenced by the current motion vector belongs, and wherein the video decoder is configured to determine the second type based on a second reference picture subset to which a second reference picture referenced by the candidate motion vector predictor belongs.
31. The device of claim 29, wherein the video decoder is further configured to, prior to setting the variable to the first value, determine that the candidate motion vector is available based on criteria other than whether the first type is different than the second type.
32. The device of claim 29,
wherein the first type of the current motion vector represents whether a current reference picture POC value of a first reference picture referenced by the current motion vector is the same as a current POC value of a current picture including the current block, and
wherein the second type of the candidate motion vector predictor represents whether a candidate reference POC value of a second reference picture referenced by the candidate motion vector predictor is the same as the current POC value.
33. The device of claim 32, wherein the video decoder is further configured to decode information indicating: setting the neighboring block as unavailable for reference when at least one of the current reference POC value and the candidate reference POC value is the same as the current POC value.
34. The device of claim 29, wherein to decode the current motion vector, the video decoder is configured to decode the current motion vector using at least one of an Advanced Motion Vector Prediction (AMVP) mode and a merge mode, wherein when decoding the motion vector using AMVP and when the variable indicates that the candidate motion vector predictor is unavailable, the video decoder is configured to refrain from adding the motion vector predictor into an AMVP candidate list for the current motion vector, and when decoding the motion vector using merge mode and when the variable indicates that the candidate motion vector predictor is unavailable, the video decoder is configured to refrain from adding the motion vector predictor into a merge candidate list for the current motion vector.
35. The device of claim 29, wherein the video decoder is configured to decode data indicating: whether inter-view reference is never selected as a collocated picture of Temporal Motion Vector Prediction (TMVP) mode for all slices in a decoded video sequence of the video data.
36. The device of claim 35, wherein the data includes a flag disable_inter_view_as_tmvp_flag.
37. The device of claim 29, wherein the video decoder is configured to determine a type of a Reference Picture Set (RPS) subset that includes a reference picture to which the current motion vector refers, wherein to determine the first type of the current motion vector, the video decoder is configured to determine that the first type is equal to the type of the RPS subset.
38. The device of claim 29, wherein the video decoder is configured to: decode the current motion vector; decode residual data for the current block; form predicted data for the current block based at least in part on the current motion vector; and combine the predicted data and the residual data to reconstruct the current block.
39. The device of claim 29, further comprising a video encoder configured to: encode the current motion vector; form predicted data for the current block based at least in part on the current motion vector; calculate residual data for the current block based on a difference between the current block and the predicted data; and encode the residual data.
40. The device of claim 29, wherein the device comprises at least one of:
an integrated circuit;
a microprocessor; and
a wireless communication device comprising the video decoder.
41. The device of claim 29, the device being a wireless communication device, further comprising:
a receiver configured to receive a signal comprising the video data and to store the video data included in the signal to the memory.
42. The device of claim 41, wherein the wireless communication device is a cellular handset and the signal is received by the receiver and modulated according to a cellular communication standard.
43. A device for encoding video data, the device comprising:
a memory configured to store video data; and
a video encoder configured to:
determining a first type of a current motion vector for a current block of the video data;
determining a second type of candidate motion vector predictor of a neighboring block of the current block;
setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor of the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor of the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor of the current motion vector;
determining whether the first type is different from the second type; and
after initially setting the variable to the first value, setting the variable to the second value in response to determining that the first type is the same as the second type; and
encoding the current motion vector based at least in part on the second value of the variable.
44. The device of claim 43, wherein the video encoder is configured to determine the first type for the current motion vector based on a first reference picture subset to which a first reference picture referred to by the current motion vector belongs, and wherein the video encoder is configured to determine the second type based on a second reference picture subset to which a second reference picture referred to by the candidate motion vector predictor belongs.
45. The device of claim 43, wherein the video encoder is further configured to, prior to setting the variable to the first value, determine that the candidate motion vector is available based on criteria other than whether the first type is different than the second type.
46. The device of claim 43,
wherein the first type of the current motion vector represents whether a current reference POC value of a first reference picture referenced by the current motion vector is the same as a current POC value of a current picture including the current block, and
wherein the second type of the candidate motion vector predictor represents whether a candidate reference POC value of a second reference picture referenced by the candidate motion vector predictor is the same as the current POC value.
47. The device of claim 46, wherein the video encoder is further configured to encode information indicating that: setting the neighboring block as unavailable for reference when at least one of the current reference POC value and the candidate reference POC value is the same as the current POC value.
48. The device of claim 43, wherein to encode the current motion vector, the video encoder is configured to encode the current motion vector using at least one of an Advanced Motion Vector Prediction (AMVP) mode and a merge mode, wherein when the motion vector is encoded using AMVP and when the variable indicates that the candidate motion vector predictor is unavailable, the video encoder is configured to refrain from adding the motion vector predictor into an AMVP candidate list for the current motion vector, and when the motion vector is encoded using a merge mode and when the variable indicates that the candidate motion vector predictor is unavailable, the video encoder is configured to refrain from adding the motion vector predictor into a merge candidate list for the current motion vector.
49. The device of claim 43, wherein the video encoder is configured to encode data indicating: whether inter-view reference is never selected as a collocated picture of Temporal Motion Vector Prediction (TMVP) mode for all slices in an encoded video sequence of the video data.
50. The device of claim 49, wherein the data includes a flag disable_inter_view_as_tmvp_flag.
51. The device of claim 43, wherein the video encoder is further configured to determine a type of a Reference Picture Set (RPS) subset that includes a reference picture to which the current motion vector refers, wherein to determine the first type of the current motion vector, the video encoder is configured to determine that the first type is equal to the type of the RPS subset.
52. The device of claim 43, further comprising a video decoder configured to: decode the current motion vector; decode residual data for the current block; form predicted data for the current block based at least in part on the current motion vector; and combine the predicted data and the residual data to reconstruct the current block.
53. The device of claim 43, wherein the video encoder is configured to: encode the current motion vector; form predicted data for the current block based at least in part on the current motion vector; calculate residual data for the current block based on a difference between the current block and the predicted data; and encode the residual data.
54. A device for encoding video data, the device comprising:
means for determining a first type of a current motion vector for a current block of video data;
means for determining a second type of candidate motion vector predictor for a neighboring block of the current block;
means for setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor for the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor for the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor for the current motion vector;
means for determining whether the first type is different from the second type; and
means for setting the variable to a second value in response to the first type being the same as the second type after initially setting the variable to the first value; and
means for encoding the current motion vector based at least in part on the second value of the variable.
55. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
determining a first type of a current motion vector for a current block of video data;
determining a second type of candidate motion vector predictor of a neighboring block of the current block;
setting a variable representing whether the candidate motion vector predictor is available as a motion vector predictor of the current motion vector to a first value indicating that the candidate motion vector predictor is not available as the motion vector predictor of the current motion vector, wherein the variable may be set to the first value or a second value different from the first value indicating that the candidate motion vector predictor is available as the motion vector predictor of the current motion vector;
determining whether the first type is different from the second type;
after initially setting the variable to the first value, setting the variable to the second value in response to the first type being the same as the second type; and
decoding the current motion vector based at least in part on the second value of the variable.
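The availability check recited in claims 54 and 55 (initialize the availability variable to "unavailable", then flip it to "available" only when the candidate motion vector predictor has the same type as the current motion vector) can be sketched as follows. The type names and the numeric values of the flag are illustrative assumptions, not taken from the claims.

```python
def predictor_available(current_mv_type, candidate_mv_type):
    """Sketch of the candidate-availability logic in claims 54-55.

    The variable is first set to the first value (predictor not
    available), and only after that initial setting is it changed
    to the second value (predictor available) when the two
    motion-vector types match, e.g. both temporal or both disparity.
    """
    UNAVAILABLE, AVAILABLE = 0, 1   # first value, second value
    available = UNAVAILABLE         # initially mark as not available
    if current_mv_type == candidate_mv_type:
        available = AVAILABLE       # types match: predictor usable
    return available
```

A coder following this scheme would then include the candidate in the predictor list (merge/AMVP-style) only when the returned flag holds the second value.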
HK15100451.5A 2012-03-16 2013-03-14 Motion vector coding and bi-prediction in hevc and its extensions HK1200047B (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US201261611959P 2012-03-16 2012-03-16
US61/611,959 2012-03-16
US201261624990P 2012-04-16 2012-04-16
US61/624,990 2012-04-16
US201261658344P 2012-06-11 2012-06-11
US61/658,344 2012-06-11
US201261663484P 2012-06-22 2012-06-22
US61/663,484 2012-06-22
US13/801,350 US9503720B2 (en) 2012-03-16 2013-03-13 Motion vector coding and bi-prediction in HEVC and its extensions
US13/801,350 2013-03-13
PCT/US2013/031536 WO2013138631A1 (en) 2012-03-16 2013-03-14 Motion vector coding and bi-prediction in hevc and its extensions

Publications (2)

Publication Number Publication Date
HK1200047A1 HK1200047A1 (en) 2015-07-31
HK1200047B true HK1200047B (en) 2018-06-29
