
HK1197781A - Progressive coding of position of last significant coefficient


Info

Publication number
HK1197781A
Authority
HK
Hong Kong
Prior art keywords
binary string
bit length
significant coefficient
last significant
binary
Prior art date
Application number
HK14111433.6A
Other languages
Chinese (zh)
Other versions
HK1197781B (en)
Inventor
Wei-Jung Chien
Joel Sole Rojas
Marta Karczewicz
Rajan Laxman Joshi
Original Assignee
Qualcomm Incorporated
Priority date
Filing date
Publication date
Application filed by Qualcomm Incorporated
Publication of HK1197781A
Publication of HK1197781B


Description

Progressive coding of the position of the last significant coefficient
The present application claims the benefit of:
United States Provisional Application No. 61/557,317, filed November 8, 2011; and
United States Provisional Application No. 61/561,909, filed November 20, 2011,
each of the applications is hereby incorporated by reference in its entirety.
Technical Field
The present invention relates to video coding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), portable or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T h.263, ITU-T h.264/MPEG-4 part 10 (advanced video coding (AVC)), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial prediction or temporal prediction results in a predictive block for the block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
Disclosure of Invention
In general, techniques are described for coding video data. Video encoding typically involves predicting a block of video data using a particular prediction mode, and coding residual values for the block based on the difference between the predicted block and the actual block being coded. The residual block includes these pixel-by-pixel differences. The residual block may be transformed and quantized. A video coder may include a quantization unit that maps transform coefficients into discrete level values. This disclosure provides techniques for coding the position of the last significant coefficient within a video block.
In one example, a method for encoding video data comprises: obtaining a value indicative of a position of a last significant coefficient within a video block of size T; determining a first binary string for the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, wherein the maximum bit length is defined by 2·log₂(T) − 1; determining a second binary string for the value indicating the position of the last significant coefficient based on a fixed length coding scheme; and encoding the first and second binary strings into a bitstream.
In another example, a method for decoding video data comprises: obtaining a first binary string and a second binary string from an encoded bitstream; determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of 2·log₂(T) − 1; and determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
In another example, an apparatus for encoding video data comprises a video encoding device configured to: obtain a value indicative of a position of a last significant coefficient within a video block of size T; determine a first binary string for the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, wherein the maximum bit length is defined by 2·log₂(T) − 1; determine a second binary string for the value indicating the position of the last significant coefficient based on a fixed length coding scheme; and encode the first and second binary strings into a bitstream.
In another example, an apparatus for decoding video data comprises a video decoding device configured to: obtain a first binary string and a second binary string from an encoded bitstream; determine a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of 2·log₂(T) − 1; and determine the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
In another example, a device for encoding video data comprises: means for obtaining a value indicative of a position of a last significant coefficient within a video block of size T; means for determining a first binary string for the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, wherein the maximum bit length is defined by 2·log₂(T) − 1; means for determining a second binary string for the value indicating the position of the last significant coefficient based on a fixed length coding scheme; and means for encoding the first and second binary strings into a bitstream.
In another example, a device for decoding video data comprises: means for obtaining a first binary string and a second binary string from an encoded bitstream; means for determining a value indicative of a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of 2·log₂(T) − 1; and means for determining the value indicative of the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
In another example, a computer-readable storage medium comprises instructions stored thereon that, when executed, cause one or more processors of a device for encoding video data to: obtain a value indicative of a position of a last significant coefficient within a video block of size T; determine a first binary string for the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, wherein the maximum bit length is defined by 2·log₂(T) − 1; determine a second binary string for the value indicating the position of the last significant coefficient based on a fixed length coding scheme; and encode the first binary string and the second binary string into a bitstream.
In another example, a computer-readable storage medium comprises instructions stored thereon that, when executed, cause a processor of a device for decoding video data to: obtain a first binary string and a second binary string from an encoded bitstream; determine a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of 2·log₂(T) − 1; and determine the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
In one example, a method for decoding video data comprises: obtaining a first binary string and a second binary string from an encoded bitstream; determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of log₂(T) + 1; and determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
In one example, a method for decoding video data comprises: obtaining a first binary string and a second binary string from an encoded bitstream; determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a truncated unary coding scheme with a maximum bit length of log₂(T); and determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.
Fig. 2A-2D illustrate exemplary coefficient value scan orders.
Fig. 3 illustrates one example of a significance map for a block of coefficient values.
FIG. 4 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 5 is a block diagram illustrating an example entropy encoder that may implement the techniques described in this disclosure.
FIG. 6 is a flow diagram illustrating an example method for determining a binary string of values indicating the location of the last significant coefficient in accordance with the techniques of this disclosure.
FIG. 7 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 8 is a flow diagram illustrating an example method for determining a value indicating a position of a last significant coefficient from a binary string, in accordance with the techniques of this disclosure.
Detailed Description
Techniques are provided for reducing the length of a bit string indicating the position of the last significant coefficient within a transform coefficient block. The bit string may be particularly useful for Context Adaptive Binary Arithmetic Coding (CABAC). In one example, a progressive codeword structure with a reduced number of binary bits and a shorter truncated unary code may be used to indicate the position of the last significant coefficient. Additionally, in one example, by reducing the maximum length of the truncated unary code, the number of CABAC context models for the last significant coefficient position may also be reduced.
The video encoder may be configured to determine a first binary string and a second binary string for a value indicating a position of a last significant coefficient within a video block of size T. The video decoder may be configured to determine, based on the first and second binary strings, a value indicating a position of a last significant coefficient within a video block of size T. In one example, the first binary string may be based on a truncated unary coding scheme defined by a maximum bit length of 2·log₂(T) − 1, and the second binary string may be based on a fixed length coding scheme defined by a maximum bit length of log₂(T) − 2. In another example, the first binary string may be based on a truncated unary coding scheme defined by a maximum bit length of log₂(T) + 1, and the second binary string may be based on a fixed length coding scheme defined by a maximum bit length of log₂(T) − 1. In yet another example, the first binary string may be based on a truncated unary coding scheme defined by a maximum bit length of log₂(T), and the second binary string may be based on a fixed length coding scheme defined by a maximum bit length of log₂(T) − 1.
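For illustration only (this sketch is not part of the original disclosure, and the helper name is hypothetical), the prefix and suffix length relationships summarized above can be tabulated for common TU sizes:

import math

def scheme_lengths(T):
    # Maximum truncated unary prefix length and fixed length suffix length
    # implied by each of the three example schemes described above.
    log2T = int(math.log2(T))
    scheme1 = (2 * log2T - 1, max(0, log2T - 2))  # (prefix bins, suffix bits)
    scheme2 = (log2T + 1, max(0, log2T - 1))
    scheme3 = (log2T, max(0, log2T - 1))
    return scheme1, scheme2, scheme3

for T in (4, 8, 16, 32):
    print(T, scheme_lengths(T))
# For T = 32 this gives maximum prefix lengths of 9, 6, and 5 bins, respectively,
# matching the examples discussed later in this description.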
FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in fig. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., portable) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" handsets, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive encoded video data to be decoded over link 16. Link 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include routers, switches, base stations, or any other apparatus that may be used to facilitate communication from source device 12 to destination device 14.
Alternatively, encoded data may be output from output interface 22 to storage device 32. Similarly, encoded data may be accessed from storage device 32 by input interface 28. Storage device 32 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In other examples, storage device 32 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access the stored video data via streaming or download from storage device 32. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 14 may access the encoded video data via any standard data connection, including an internet connection. Such a data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include sources such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface that receives video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be generally applicable to video coding, and may be applied to wireless and/or wired applications.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored on storage device 32 for later access by destination device 14 or other devices for decoding and/or playback.
Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives the encoded video data over link 16. Encoded video data communicated over link 16 or provided on storage device 32 may include a variety of syntax elements generated by video encoder 20 for use by a video decoder (e.g., video decoder 30) in decoding the video data. Such syntax elements may be included with encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
The display device 32 may be integrated with the destination device 14 or external to the destination device 14. In some examples, destination device 14 may include an integrated display device, and may also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard currently under development, and may conform to the HEVC test model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 part 10 Advanced Video Coding (AVC), or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.
Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
JCT-VC is working on the development of the HEVC standard. HEVC standardization efforts are based on an evolved model of the video coding device called the HEVC test model (HM). The HM assumes several additional capabilities of the video coding device relative to existing devices in accordance with, for example, ITU-T H.264/AVC. For example, h.264 provides nine intra-prediction encoding modes, while HM may provide up to thirty-three intra-prediction encoding modes.
In general, the working model for HM describes that a video frame or picture may be divided into a sequence of tree blocks or Largest Coding Units (LCUs) that include both luma and chroma samples. The tree block has a similar purpose to the macroblock of the h.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each tree block may be split into a number of Coding Units (CUs) according to a quadtree. For example, as the root node of a quadtree, a tree-type block may be split into four child nodes, and each child node may in turn be a parent node and split into four other child nodes. As a leaf node of the quadtree, the final non-split child node comprises a coding node, i.e., a coded video block. Syntax data associated with the coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of a coding node.
A CU includes a coding node, and a Prediction Unit (PU) and a Transform Unit (TU) associated with the coding node. The size of a CU corresponds to the size of the coding node, and the shape must be square. The size of a CU may range from 8 × 8 pixels up to the size of a tree block with a maximum of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU to one or more PUs. The partition mode may be different depending on whether the CU is skipped or direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. The PU may be partitioned into non-square shapes. Syntax data associated with a CU may also describe partitioning of the CU into one or more TUs according to a quadtree, for example. The TU may be square or non-square in shape.
The HEVC standard allows for a transform according to a TU, which may be different for different CUs. The size of a TU is typically set based on the size of a PU within a given CU defined for a partitioned LCU, although this may not always be the case. TU sizes are typically the same as PU, or smaller than PU. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). The pixel difference values associated with TUs may be transformed to produce transform coefficients that may be quantized.
In general, a PU includes data related to the prediction process. For example, when a PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when a PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., list 0, list 1, or list C) for the motion vector.
In general, TUs are used for transform and quantization processes. A given CU with one or more PUs may also include one or more TUs. After prediction, video encoder 20 may calculate residual values corresponding to the PUs. The residual values comprise pixel difference values that may be transformed into transform coefficients using TUs, quantized, and scanned to produce serialized transform coefficients for entropy coding. The term "video block" may refer in this disclosure to a coding node of a CU, or a block of transform coefficients. One or more blocks of transform coefficients may define a TU. In some particular cases, this disclosure may also use the term "video block" to refer to a treeblock (i.e., LCU) or a CU, which includes coding nodes as well as PUs and TUs.
A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) typically includes a series of one or more video pictures. The GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the individual slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may include one or more TUs or PUs corresponding to coding nodes within a CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified coding standard.
As an example, the HM supports prediction with various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a 2N×2N CU partitioned horizontally with a top 2N×0.5N PU and a bottom 2N×1.5N PU.
In this disclosure, "nxn" and "N by N" are used interchangeably to refer to the pixel size of a video block in terms of vertical and horizontal dimensions, e.g., 16 x 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y ═ 16) and 16 pixels in the horizontal direction (x ═ 16). Likewise, an nxn block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After using intra-predictive or inter-predictive coding of PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in a spatial domain (also referred to as a pixel domain), and the TUs may comprise coefficients in a transform domain after applying a transform, such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs including the residual data for the CU, and may then transform the TUs to generate transform coefficients for the CU.
After any transform is performed to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
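As a rough, purely illustrative sketch of this idea (this is not the HEVC quantizer, which uses additional scaling matrices and rounding offsets), scalar quantization can be pictured as follows:

def quantize(coeffs, qstep):
    # Map each transform coefficient to a discrete level value (fewer bits).
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    # Approximate reconstruction; the quantization error is not recoverable.
    return [l * qstep for l in levels]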
In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. Fig. 2A-2D illustrate some different exemplary scanning orders. Other defined scan orders or adaptive (varying) scan orders may also be used. Fig. 2A illustrates a zig-zag scanning order, fig. 2B illustrates a horizontal scanning order, fig. 2C illustrates a vertical scanning order, and fig. 2D illustrates a diagonal scanning order. Combinations of these scanning orders may also be defined and used. In some examples, the techniques of this disclosure may be particularly applicable during coding of so-called significance maps in video coding processes.
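The scan orders of FIGS. 2A to 2D can be described as orderings of (row, column) positions. The following sketch is illustrative only and is not the exact scan derivation of any particular standard; in particular, the visiting direction within each anti-diagonal is an assumption made for this illustration:

def horizontal_scan(n):
    return [(r, c) for r in range(n) for c in range(n)]

def vertical_scan(n):
    return [(r, c) for c in range(n) for r in range(n)]

def diagonal_scan(n):
    # Visit anti-diagonals starting from the top-left corner of the block.
    order = []
    for d in range(2 * n - 1):
        for r in range(n):
            c = d - r
            if 0 <= c < n:
                order.append((r, c))
    return order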
One or more syntax elements may be defined to indicate the position of the last significant coefficient (i.e., a non-zero coefficient), which may depend on the scan order associated with the coefficient block. For example, one syntax element may define a column position of the last significant coefficient within the coefficient value block, and another syntax element may define a row position of the last significant coefficient within the coefficient value block.
Fig. 3 illustrates one example of a significance map for a block of coefficient values. The significance map is shown on the right, with 1-bit flags identifying the coefficients that are significant (i.e., non-zero) in the left video block. In one example, given a set of significant coefficients (e.g., defined by a significance map) and a scan order, a position of a last significant coefficient may be defined. In the emerging HEVC standard, transform coefficients may be grouped into information blocks. An information block may comprise an entire TU, or in some cases, a TU may be subdivided into smaller information blocks. Significance maps and level information (absolute value and sign) are coded for each coefficient in the information block. In one example, for a 4×4 TU and an 8×8 TU, an information block consists of 16 consecutive coefficients in an inverse scan order (e.g., diagonal, horizontal, or vertical). For 16×16 and 32×32 TUs, the coefficients within 4×4 sub-blocks are treated as information blocks. Syntax elements are coded and signaled to represent coefficient level information within an information block. In one example, all symbols are encoded in inverse scan order. The techniques of this disclosure may improve coding of the syntax elements used to define the position of the last significant coefficient of a coefficient block.
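Given a block of quantized coefficients and a scan order such as one of those in FIGS. 2A to 2D, the last significant coefficient is simply the last non-zero coefficient encountered along that scan. A minimal sketch for illustration (the function names are hypothetical and not part of the disclosure):

def significance_map(block):
    # 1-bit flag per coefficient: 1 if the coefficient is non-zero.
    return [[1 if v != 0 else 0 for v in row] for row in block]

def last_significant_position(block, scan_order):
    # Return the (row, column) of the last non-zero coefficient along the
    # given scan order, or None if the block has no significant coefficient.
    last = None
    for (r, c) in scan_order:
        if block[r][c] != 0:
            last = (r, c)
    return last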
As one example, the techniques of this disclosure may be used to code the position of the last significant coefficient of a coefficient block (e.g., a TU or an information block for a TU). Then, after the position of the last significant coefficient is coded, the level and sign information may be coded. Coding of level and sign information may be processed according to a five-pass method by coding the following symbols (e.g., for a TU or an information block of a TU) in an inverse scan order:
significant_coeff_flag (abbreviated sigMapFlag): this flag may indicate the significance of each coefficient in the information block. Coefficients having an absolute value of one or greater are considered significant.
coeff_abs_level_greater1_flag (abbreviated gr1Flag): this flag may indicate, for non-zero coefficients (i.e., coefficients with sigMapFlag equal to 1), whether the absolute value of the coefficient is greater than 1.
coeff_abs_level_greater2_flag (abbreviated gr2Flag): this flag may indicate, for coefficients having an absolute value greater than 1 (i.e., coefficients with gr1Flag equal to 1), whether the absolute value of the coefficient is greater than 2.
coeff_sign_flag (abbreviated signFlag): this flag may indicate sign information for non-zero coefficients. For example, a value of zero indicates a positive sign and a value of one indicates a negative sign.
coeff_abs_level_remaining (abbreviated levelRem): the remaining absolute value of the transform coefficient level. For this syntax element, for each coefficient having a magnitude greater than x, the value abs(level) − x is coded, where the value of x depends on the presence of gr1Flag and gr2Flag.
In this way, transform coefficients for a TU or an information block of a TU may be coded. In any case, the techniques of this disclosure, which relate to coding of syntax elements used to define the position of the last significant coefficient of a coefficient block, may also be used with other types of techniques for final coding of level and sign information of transform coefficients. As stated in this disclosure, the five-pass method for coding significance, level, and sign information is merely one example technique that may be used after coding the position of the last significant coefficient of a block.
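A minimal sketch of the five-pass symbol generation outlined above, for one information block whose coefficients are already arranged in inverse scan order. This simplification is not the normative HEVC process: it ignores CABAC context selection and the rules that limit how many gr1Flag and gr2Flag symbols are signaled per block, so the last pass here only covers coefficients with magnitude greater than 2 and subtracts a fixed base of 3:

def five_pass_symbols(coeffs):
    sig, gr1, gr2, sign, level_rem = [], [], [], [], []
    for c in coeffs:                       # pass 1: significance flags
        sig.append(1 if c != 0 else 0)
    for c in coeffs:                       # pass 2: greater-than-1 flags
        if c != 0:
            gr1.append(1 if abs(c) > 1 else 0)
    for c in coeffs:                       # pass 3: greater-than-2 flags
        if abs(c) > 1:
            gr2.append(1 if abs(c) > 2 else 0)
    for c in coeffs:                       # pass 4: sign flags (0 = positive)
        if c != 0:
            sign.append(0 if c > 0 else 1)
    for c in coeffs:                       # pass 5: remaining absolute level
        if abs(c) > 2:
            level_rem.append(abs(c) - 3)   # abs(level) - x, with x = 3 here
    return sig, gr1, gr2, sign, level_rem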
After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data. The entropy coding techniques of this disclosure are specifically described as being applicable to CABAC, but the techniques may also be applicable to other entropy coding techniques such as CAVLC, SBAC, PIPE, or other techniques.
To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, using VLC may achieve bit savings compared to, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
In general, coding data symbols using CABAC may involve one or more of the following steps:
(1) binarization: if a symbol to be coded has a non-binary value, it is mapped to a sequence of so-called "binary bits". Each binary bit may have a value of "0" or "1".
(2) Context assignment: each bin (in regular mode) is assigned to a context. The context model determines how the context for a given bin is computed based on information available for the bin (e.g., the values of previously encoded symbols or the bin number).
(3) Binary bit encoding: the binary bits are encoded with an arithmetic encoder. To encode a binary bit, an arithmetic encoder requires as input the probability of the value of the binary bit (i.e., the probability that the value of the binary bit equals "0" and the probability that the value of the binary bit equals "1"). The (estimated) probability of each context is represented by an integer value called "context state". Each context has a state, and thus the state (i.e., estimated probability) is the same for the binary bits assigned to one context and differs from context to context.
(4) And (3) updating the state: the probability (state) of the selected context is updated based on the actual coded value of the binary bit (e.g., if the binary bit value is "1", then the probability of "1" is increased).
Note that probability interval partitioning entropy coding (PIPE) uses principles similar to those of arithmetic coding, and the techniques of this disclosure, described primarily with respect to CABAC, may likewise be applied to PIPE. In general, the techniques of this disclosure may be used with CABAC, PIPE, or other entropy coding methods that use binarization.
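To make the four steps above concrete, the toy sketch below tracks a per-context estimate of the probability that a bin equals 1 and distinguishes regular bins from bypass bins. It is a didactic stand-in only; the actual CABAC engine uses a finite-state probability table and arithmetic interval subdivision rather than the simple counter and bit-cost estimate shown here:

import math

class ToyContext:
    # Crude running estimate of the probability that the next bin is 1.
    def __init__(self):
        self.ones, self.total = 1, 2

    def prob_one(self):
        return self.ones / self.total

    def update(self, bin_value):
        self.total += 1
        self.ones += bin_value

def code_bins(bins, contexts, context_index_of, is_bypass):
    # bins: 0/1 values produced by binarization (step 1).
    # Returns an estimated bit cost; a real coder would drive interval subdivision.
    bits = 0.0
    for i, b in enumerate(bins):
        if is_bypass(i):
            bits += 1.0                          # bypass bins: fixed 0.5 probability, no context
            continue
        ctx = contexts[context_index_of(i)]      # step 2: context assignment
        p1 = ctx.prob_one()                      # step 3: probability used for coding
        bits += -math.log2(p1 if b == 1 else 1.0 - p1)
        ctx.update(b)                            # step 4: state update
    return bits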
One technique recently adopted into HM4.0 is described in V. Seregin and I.-K. Kim, "Binarization modification for last position coding" (JCTVC-F375, 6th Meeting, Torino, IT, July 2011) (hereinafter Seregin). The technique employed in HM4.0 reduces the number of contexts used in last position coding for CABAC by introducing fixed length codes coded in bypass mode. Bypass mode means that there is no context modeling procedure and each symbol is coded with an equal probability state. Increasing the number of bins coded in bypass mode while reducing the bins coded in regular mode may help speed up the codec and improve parallelization.
In the technique employed in HM4.0, the maximum possible magnitude, max_length, of a last position component is divided equally in half. The first half is coded with a truncated unary code and the second half is coded with a fixed length code (the number of binary bits is equal to log₂(max_length/2)). In the worst case, the number of binary bits modeled using contexts is equal to max_length/2. Table 1 shows the binarization of a 32×32 TU in HM4.0.
Table 1. Binarization of a 32×32 TU in HM4.0, where X means 1 or 0.
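The following sketch of this HM4.0-style split is reconstructed from the description above, not from the HM source code; the ones-then-zero prefix convention and the suffix bit ordering are assumptions. A last position component is coded as a truncated unary prefix of at most max_length/2 bins, and values in the second half of the range carry an additional log₂(max_length/2)-bit fixed length suffix:

import math

def hm4_style_binarize(value, max_length):
    half = max_length // 2
    suffix_bits = int(math.log2(half))
    if value < half:
        # First half of the range: truncated unary prefix only.
        return [1] * value + [0], []
    # Second half: escape prefix of `half` ones plus a fixed length suffix.
    remainder = value - half
    suffix = [(remainder >> (suffix_bits - 1 - i)) & 1 for i in range(suffix_bits)]
    return [1] * half, suffix

# For a 32x32 TU (max_length = 32) the prefix is at most 16 bins and the
# suffix is 4 bits, consistent with the worst-case lengths in Tables 3 and 4.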
This disclosure provides techniques for Context Adaptive Binary Arithmetic Coding (CABAC) of a last significant coefficient position. In one example, a progressive codeword structure with a reduced number of binary bits and a shorter truncated unary code may be used. Additionally, in one example, the number of context models for the last significant-coefficient position may be reduced by two by reducing the maximum length of the truncated unary code.
FIG. 4 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra mode (I-mode) may refer to any of a number of space-based compression modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of time-based compression modes.
In the example of fig. 4, video encoder 20 includes partitioning unit 35, prediction module 41, reference picture memory 64, summer 50, transform module 52, quantization unit 54, and entropy encoding unit 56. Prediction module 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction module 46. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform module 60, and a summer 62. A deblocking filter (not shown in fig. 4) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. In addition to the deblocking filter, additional loop filters (in-loop or post-loop) may also be used.
As shown in fig. 4, video encoder 20 receives video data and partition unit 35 partitions the data into video blocks. Such partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into a plurality of video blocks (and possibly into a set of video blocks referred to as tiles). Prediction module 41 may select one of a plurality of possible coding modes (e.g., one of a plurality of intra coding modes or one of a plurality of inter coding modes) for the current video block based on error results (e.g., coding rates and distortion levels). Prediction module 41 may provide the resulting intra or inter coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded block for use as a reference picture.
Intra-prediction module 46 within prediction module 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction module 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
Motion estimation unit 42 may be configured to determine an inter-prediction mode for a video slice according to a predetermined mode of a video sequence. The predetermined mode may specify the video slices in the sequence as predictive slices (P slices), bi-predictive slices (B slices), or generalized P and B slices (GPB slices). A P slice may refer to a previous sequential picture. A B slice may refer to a previous sequential picture or a subsequent sequential picture. A GPB slice refers to the case where two lists of reference pictures are identical. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. A motion vector, for example, may indicate a displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.
Predictive blocks are blocks that are found to closely match PUs of a video block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search relative to the full pixel position and the fractional pixel position and output motion vectors with fractional pixel precision.
Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the location of the PU to the location of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.
Motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on a motion vector determined by motion estimation (possibly performing interpolation to sub-pixel precision). Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate, in one of the reference picture lists, the predictive block to which the motion vector points. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 50 represents the component that performs this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
As an alternative to inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction module 46 may intra-predict the current block, as described above. In particular, intra-prediction module 46 may determine the intra-prediction mode used to encode the current block. In some examples, intra-prediction module 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction module 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction module 46 may calculate rate-distortion values using rate-distortion analysis on various tested intra-prediction modes, and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., number of bits) used to produce the encoded block. Intra-prediction module 46 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
In any case, after selecting the intra-prediction mode for the block, intra-prediction module 46 may provide information to entropy coding unit 56 indicating the selected intra-prediction mode for the block. Entropy coding unit 56 may encode information indicative of the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream that may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of the encoding contexts for the various blocks, and indications of the most probable intra-prediction mode, intra-prediction mode index tables, and modified intra-prediction mode index tables to be used for each of the contexts.
After prediction module 41 generates the predictive block for the current video block via inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform module 52. Transform module 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. Transform module 52 may convert the residual video data from the pixel domain to a transform domain (e.g., the frequency domain).
Transform module 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan. Inverse quantization unit 58 and inverse transform module 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block for a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors, as well as other syntax elements of the current video slice being coded.
In one example, entropy coding unit 56 may encode the position of the last significant coefficient using the techniques employed in HM4.0 described above. In other examples, entropy coding unit 56 may encode the position of the last significant coefficient using techniques that may provide improved coding. In particular, entropy coding unit 56 may use a progressive last position coding scheme for several possible TU sizes.
In one example, the codeword for the position of the last significant coefficient may comprise a truncated unary code prefix followed by a fixed length code suffix. In one example, each magnitude of the last position may use the same binarization for all possible TU sizes, except when the last position is equal to the TU size minus 1. This exception is due to the nature of truncated unary coding. In one example, the position of the last significant coefficient within a rectangular block of transform coefficients may be specified by an x-coordinate value and a y-coordinate value. In another example, the block of transform coefficients may be in the form of a 1×N vector, and the position of the last significant coefficient within the vector may be specified by a single position value.
In one example, T may define the size of a TU. As described in detail above, TU shapes may be square or non-square. Thus, T may refer to the number of rows or columns of a two-dimensional TU, or to the length of a vector. In an example where the truncated unary coding scheme provides a number of zero bits followed by a one bit, the number of zeros of the truncated unary code prefix coding the position of the last significant coefficient may be defined according to N = {0, ..., 2·log₂(T) − 1}. Note that in an example where the truncated unary coding scheme provides a number of one bits followed by a zero bit, N = {0, ..., 2·log₂(T) − 1} may also define the number of ones. In each of these truncated unary coding alternatives, 2·log₂(T) − 1 may define the maximum length of the truncated unary prefix for a TU of size T. For example, for a TU with T equal to 32, the maximum length of the truncated unary prefix is equal to 9, and for a TU with T equal to 16, the maximum length of the truncated unary prefix is equal to 7.
For a truncated unary code of value n, the fixed length code suffix may contain the next b bits of a fixed length binary code with a value defined as f_value = {0, ..., 2^b − 1}, where b = max(0, n/2 − 1). Thus, the magnitude of the last position, last_pos, may be derived from n and f_value according to equation 1, where mod(·) represents a modulo operation and f_value represents the value of the fixed length code.
Table 2 shows an example binarization for the position of the last significant coefficient of a 32×32 TU according to the definitions provided by equation 1. The second column of Table 2 provides the corresponding truncated unary prefix value for possible values of the position of the last significant coefficient within a TU of size T, defined by a maximum truncated unary prefix length of 2·log₂(T) − 1. The third column of Table 2 provides the corresponding fixed length suffix for each truncated unary prefix. For simplicity, Table 2 contains X values that indicate one or zero bit values. Note that the X values uniquely map each value sharing a truncated unary prefix according to a fixed length code. The magnitude of the last position component in Table 2 may correspond to an x-coordinate value and/or a y-coordinate value.
Table 2. Binarization of a TU of size 32×32, where X means 1 or 0.
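Because equation 1 itself is not reproduced in this text, the following sketch reconstructs the mapping from the description of Table 2 and the stated suffix length b = max(0, n/2 − 1); the handling of prefix values below 4 (where the position is conveyed by the prefix alone) and the suffix bit ordering are assumptions made for this illustration:

import math

def binarize_last_pos(last_pos, T):
    # Truncated unary prefix of value n (zeros terminated by a one, truncated at
    # the maximum length 2*log2(T) - 1), followed by a b-bit fixed length suffix.
    max_prefix = 2 * int(math.log2(T)) - 1
    if last_pos < 4:
        n, b, f_value = last_pos, 0, 0
    else:
        b = int(math.log2(last_pos)) - 1
        n = 2 * (b + 1) + ((last_pos >> b) & 1)
        f_value = last_pos - (2 + (n % 2)) * (1 << b)
    prefix = [0] * n + ([1] if n < max_prefix else [])
    suffix = [(f_value >> (b - 1 - i)) & 1 for i in range(b)]
    return prefix, suffix

def last_pos_from(n, f_value):
    # Inverse mapping: recover the last position magnitude from n and f_value.
    b = max(0, n // 2 - 1)
    return n if n < 4 else (2 + (n % 2)) * (1 << b) + f_value

# For T = 32 this covers positions 0..31 with a prefix of at most 9 bins and a
# suffix of at most 3 bits (12 bins total), consistent with Tables 3 and 4.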
Tables 3 and 4 show a comparison of the maximum lengths of the bit strings for the example binarization scheme described with respect to Table 1 and the example binarization scheme described with respect to Table 2. As shown in Table 3, the unary code prefix may have a maximum length of 16 binary bits for a 32×32 TU in the example described with respect to Table 1, while the unary code prefix may have a maximum length of 9 binary bits for a 32×32 TU in the example described with respect to Table 2. Additionally, as shown in Table 4, the overall length (maximum number of binary bits) based on the truncated unary prefix and the fixed length suffix may be, in the worst case (i.e., when the last position is at the end of a 32×32 TU), 24 for the example described with respect to Table 2 and 40 for the example described with respect to Table 1.
                   Table 1   Table 2
TU4×4                 3         3
TU8×8                 4         5
TU16×16               8         7
TU32×32              16         9
Total (luma)         31        24
Total (chroma)       15        15

Table 3. Maximum length of the truncated unary code.
                   Table 1   Table 2
TU4×4                 3         3
TU8×8                 6         6
TU16×16              11         9
TU32×32              20        12

Table 4. Maximum number of binary bits for one last position component.
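Under the same assumptions, the Table 2 column of Tables 3 and 4 can be cross-checked with the binarize_last_pos() sketch given above; the snippet below is only a worked example and is not part of the disclosure.

```python
for T in (4, 8, 16, 32):
    bins = [binarize_last_pos(pos, T) for pos in range(T)]
    max_prefix_len = max(len(prefix) for prefix, _ in bins)
    max_total_len = max(len(prefix) + len(suffix) for prefix, suffix in bins)
    print(T, max_prefix_len, max_total_len)   # 32 prints 9 and 12, as in Tables 3 and 4
```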
In another example of a truncated unary coding scheme providing a number of zero bits followed by one 1 bit, or a number of 1 bits followed by one zero bit, the truncated unary code prefix that codes the position of the last significant coefficient may be defined according to N = {0, ..., log2(T) + 1}. In each of these truncated unary coding schemes, log2(T) + 1 may define the maximum length of the truncated unary prefix for a TU of size T. For example, for a TU with T equal to 32, the maximum length of the truncated unary prefix is equal to 6, and for a TU with T equal to 16, the maximum length of the truncated unary prefix is equal to 5.
For a truncated unary code with value n, the fixed length code suffix may contain the following b bits of a fixed length binary code with a value defined as f_value = {0, ..., 2^b - 1}, where b = n - 2. Thus, the magnitude of the last position last_pos can be derived from n and f_value according to equation 2, where f_value represents the value of the fixed length code.
Table 5 shows an example binarization for the positions of the last significant coefficients of a 32 × 32 TU according to the definition provided by equation 2. The second column of Table 5 provides the corresponding truncated unary prefix value for each possible value of the position of the last significant coefficient within a TU of size T, where the maximum truncated unary prefix length is defined by log2(T) + 1. The third column of Table 5 provides the corresponding fixed length suffix for each truncated unary prefix. For simplicity, Table 5 contains X values that indicate either one or zero bit values. Note that the X values uniquely map each value sharing a truncated unary prefix according to the fixed length code. The magnitude of the last position component in Table 5 may correspond to an x-coordinate value and/or a y-coordinate value.
Table 5. example binarization of TU32 × 32, where X means 1 or 0.
In another example of a truncated unary coding scheme providing a number of zero bits followed by one 1 bit, or a number of 1 bits followed by one zero bit, the truncated unary code prefix that codes the position of the last significant coefficient may be defined according to N = {0, ..., log2(T)}. In each of these truncated unary coding schemes, log2(T) may define the maximum length of the truncated unary prefix for a TU of size T. For example, for a TU with T equal to 32, the maximum length of the truncated unary prefix is equal to 5, and for a TU with T equal to 16, the maximum length of the truncated unary prefix is equal to 4.
For a truncated unary code with value n, the fixed length code suffix may contain the following b bits of a fixed length binary code with a value defined as f_value = {0, ..., 2^b - 1}, where b = n - 1. Thus, the magnitude of the last position last_pos can be derived from n and f_value according to equation 3, where f_value represents the value of the fixed length code.
Table 6 shows an example binarization for the positions of the last significant coefficients of a 32 × 32 TU according to the definition provided by equation 3. The second column of Table 6 provides the corresponding truncated unary prefix value for each possible value of the position of the last significant coefficient within a TU of size T, where the maximum truncated unary prefix length is defined by log2(T). The third column of Table 6 provides the corresponding fixed length suffix for each truncated unary prefix. For simplicity, Table 6 contains X values that indicate either one or zero bit values. Note that the X values uniquely map each value sharing a truncated unary prefix according to the fixed length code. The magnitude of the last position component in Table 6 may correspond to an x-coordinate value and/or a y-coordinate value.
Table 6. example binarization of TU32 × 32, where X means 1 or 0.
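Because equation 3 is not reproduced above, the mapping in the following Python sketch is an assumption: it takes last_pos = 2^(n - 1) + f_value for prefix values n ≥ 1 (with last_pos = 0 for n = 0), which is consistent with b = n - 1 and a maximum prefix length of log2(T), but it is not asserted to match Table 6 exactly.

```python
import math

def binarize_last_pos_table6(last_pos, T):
    """Hypothetical binarization in the style of Table 6: a truncated unary
    prefix of at most log2(T) bins and a fixed length suffix of b = n - 1 bits."""
    max_prefix = int(math.log2(T))                # e.g. 5 for T = 32
    if last_pos == 0:
        n, b, f_value = 0, 0, 0
    else:
        n = last_pos.bit_length()                 # prefix value: position of the leading one
        b = n - 1
        f_value = last_pos - (1 << (n - 1))
    prefix = "0" * n + ("1" if n < max_prefix else "")
    suffix = format(f_value, "b").zfill(b) if b else ""
    return prefix, suffix

# Example: last_pos = 9 in a 32x32 TU gives prefix "00001" and suffix "001".
```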
Tables 5 and 6 show some alternative examples of coding the position of the last significant coefficient using a truncated unary prefix and a fixed length suffix. The examples shown in Tables 5 and 6 allow for shorter bin strings than the example provided with respect to Table 2. Note that in examples where the location of the last significant coefficient is determined based on x- and y-coordinate values, any of the example binarization schemes shown in Tables 1, 2, 5, and 6 may be independently selected for the x- and y-coordinate values. For example, the x-coordinate value may be encoded based on the binarization scheme described with respect to Table 2, and the y-coordinate value may be encoded based on the binarization scheme described with respect to Table 6.
As described above, coding data symbols using CABAC may involve one or more of the following steps: binarization and context assignment. In one example, for the last position value, context modeling may be used for arithmetic coding of the truncated unary bin string, while context modeling may not be used (i.e., bypass mode may be used) for arithmetic coding of the fixed length binary string. In the case where the truncated unary string is encoded using context modeling, a context is assigned to each of the bin indices of the bin string. Individual bin indices may share context assignments. The number of context assignments is equal to the number of bin indices, i.e., the length of the truncated unary string. Thus, in the cases illustrated in the examples in Tables 1, 2, 5, and 6, associated context tables may be assigned to the binarization schemes accordingly. Table 7 illustrates a possible context indexing for each bin of different TU sizes for the example binarization provided above with respect to Table 2. Note that the example context indexing provided in Table 7 uses two fewer contexts than the context indexing provided in smith.
TABLE 7
Tables 8 through 11 each illustrate some examples of context indexing according to the following rules created for context modeling:
1. The first K binary bits do not share contexts, where K > 1. K may be different for each TU size.
2. One context may be assigned only to consecutive binary bits. For example, binary bits 3 through 5 may use context 5; however, it is not allowed for binary bits 3 and 5 to use context 5 while binary bit 4 uses context 6.
3. The last N binary bits of different TU sizes (N ≥ 0) may share the same context.
4. The number of bits sharing the same context increases with TU size.
The above rules 1 through 4 may be particularly useful for the binarization provided in table 2. However, the context modeling may be adjusted accordingly based on the binarization scheme implemented.
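Since the contents of Tables 8 through 11 are not reproduced here, the following Python sketch only illustrates the shape of a context assignment that satisfies rules 1 through 4; the particular context index values are hypothetical and are not taken from this disclosure.

```python
# Hypothetical context index tables: entry i gives the context index used for
# bin i of the truncated unary prefix. The tables follow rules 1 through 4:
# the first bins have dedicated contexts, any shared context covers only
# consecutive bins, the trailing bins of the 16x16 and 32x32 TUs share one
# context across TU sizes, and larger TUs share a context over more bins.
ASSUMED_CONTEXTS = {
    4:  [0, 1, 2],
    8:  [3, 4, 5, 5, 5],
    16: [6, 7, 8, 9, 9, 9, 9],
    32: [10, 11, 12, 9, 9, 9, 9, 9, 9],
}

def context_for_bin(tu_size, bin_index):
    """Return the context index for arithmetic coding of a given prefix bin."""
    return ASSUMED_CONTEXTS[tu_size][bin_index]
```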
TABLE 8
TABLE 9
Watch 10
TABLE 11
FIG. 5 is a block diagram illustrating an example entropy encoder 56 that may implement the techniques described in this disclosure. Entropy encoder 56 receives syntax elements, such as one or more syntax elements representing the position of the last significant transform coefficient within a block of transform coefficients, and encodes the syntax elements into the bitstream. The syntax elements may include a syntax element that specifies an x-coordinate of the location of the last significant coefficient within the transform coefficient block and a syntax element that specifies a y-coordinate of the location of the last significant coefficient within the transform coefficient block. In one example, the entropy encoder 56 illustrated in FIG. 5 may be a CABAC encoder. The example entropy encoder 56 in FIG. 5 may include a binarization unit 502, an arithmetic encoding unit 504, and a context assignment unit 506.
Binarization unit 502 receives syntax elements and generates a bin string. In one example, binarization unit 502 receives a value representing the last position of a significant coefficient within a transform coefficient block and generates a string of bit or bin values according to the examples described above. Arithmetic encoding unit 504 receives the bin string from binarization unit 502 and performs arithmetic encoding on the bin string. As shown in FIG. 5, arithmetic encoding unit 504 may receive bin values from a bypass path or from context assignment unit 506. In the case where arithmetic encoding unit 504 receives bin values from context assignment unit 506, arithmetic encoding unit 504 may perform arithmetic encoding based on the context assignments provided by context assignment unit 506. In one example, arithmetic encoding unit 504 may encode a prefix portion of a bin string using context assignments and may encode a suffix portion of the bin string without using context assignments.
In one example, context assignment unit 506 may assign contexts based on the example context indexing provided in Tables 7 through 11 above. In this manner, video encoder 20 represents a video encoder configured to: obtain a value indicating a position of a last significant coefficient within a video block of size T; determine a first binary string for the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, where the maximum bit length is defined by 2log2(T) - 1, log2(T) + 1, or log2(T); determine a second binary string for the value indicating the position of the last significant coefficient based on a fixed length coding scheme; and encode the first and second binary strings into a bitstream.
FIG. 6 is a flow diagram illustrating an example method for determining binary strings for a value indicating the location of the last significant coefficient in accordance with the techniques of this disclosure. The method described in FIG. 6 may be performed by any of the example video encoders or entropy encoders described herein. At step 602, a value is obtained that indicates the position of the last significant transform coefficient within a video block. At step 604, a prefix binary string for the value indicating the position of the last significant coefficient is determined. The prefix binary string may be determined using any of the techniques described herein. In one example, the prefix binary string may be based on a truncated unary coding scheme defined by a maximum bit length, where the maximum bit length is defined by 2log2(T) - 1 and T defines the size of the video block. In another example, the prefix binary string may be based on a truncated unary coding scheme defined by a maximum bit length, where the maximum bit length is defined by log2(T) + 1 and T defines the size of the video block. In yet another example, the prefix binary string may be based on a truncated unary coding scheme defined by a maximum bit length, where the maximum bit length is defined by log2(T) and T defines the size of the video block. The prefix binary string may be determined by the encoder performing a set of calculations, by the encoder using a look-up table, or by a combination of the two. For example, the encoder may use any of Tables 2, 5, and 6 to determine the prefix binary string.
At step 606, a suffix binary string for the value indicating the position of the last significant coefficient is determined. The suffix binary string may be determined using any of the techniques described herein. In one example, the suffix binary string may be based on a fixed length coding scheme defined by a maximum bit length, where the maximum bit length is defined by log2(T) - 2 and T defines the size of the video block. In another example, the suffix binary string may be based on a fixed length coding scheme defined by a maximum bit length, where the maximum bit length is defined by log2(T) - 1 and T defines the size of the video block. The suffix binary string may be determined by the encoder performing a set of calculations, by the encoder using a look-up table, or by a combination of the two. For example, the encoder may use any of Tables 2, 5, and 6 to determine the suffix binary string. At step 608, the prefix and suffix binary strings are encoded into the bitstream. In one example, the prefix and suffix binary strings may be encoded using arithmetic coding. Note that the prefix and suffix portions of the bitstream may be interchanged. The arithmetic coding may be part of a CABAC coding process, or part of another entropy coding process.
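As a rough illustration of steps 602 through 608, the following Python sketch strings the pieces together, reusing the binarize_last_pos() and context_for_bin() helpers sketched earlier; the cabac object and its encode_bin()/encode_bypass() methods are placeholders rather than an actual arithmetic coder.

```python
def encode_last_pos(last_x, last_y, T, cabac):
    """Illustrative encoding of the x and y last position components: prefix
    bins are coded with context models, suffix bins are coded in bypass mode."""
    for component in (last_x, last_y):
        prefix, suffix = binarize_last_pos(component, T)    # steps 604 and 606
        for i, bin_value in enumerate(prefix):              # step 608, prefix part
            cabac.encode_bin(int(bin_value), context_for_bin(T, i))
        for bin_value in suffix:                            # step 608, suffix part
            cabac.encode_bypass(int(bin_value))
```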
Tables 12-14 provide an overview of simulation results for coding performance with respect to the example binarization scheme described in table 1 and with respect to the example binarization scheme described in table 2. The simulation results in tables 12 to 14 were obtained using high efficiency general test conditions as defined by F. Negative values in tables 12 to 14 indicate lower bit rates for the binarization scheme described with respect to table 2 than for the binarization scheme described with respect to table 1. The encoding time and decoding time in tables 12-14 describe the amount of time required to respectively encode and decode a bitstream generated using the binarization scheme described with respect to table 2 compared to the amount of time required to encode (or decode) a bitstream generated using the binarization scheme described with respect to table 1. As can be seen from the experimental results shown in tables 12-14, the binarization scheme described with respect to table 2 provides BD rate performance gains of-0.04%, -0.01%, and-0.03% under high efficiency intra-only, random access, and low latency test conditions, respectively.
The A to E classes in the following tables represent various sequences of video data. Columns Y, U and V correspond to data for luma, U chroma, and V chroma, respectively. Table 12 summarizes this data for a configuration in which all data is coded in intra mode. Table 13 summarizes this data for a configuration in which all data is coded in a "random access" mode where both intra and inter modes are available. Table 14 summarizes this data for a configuration in which pictures are coded in low-delay B mode.
Table 12. Intra-only HE
Table 13. Random access HE
Table 14. Low-delay B HE
FIG. 7 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of fig. 7, video decoder 30 includes an entropy decoding unit 80, a prediction module 81, an inverse quantization unit 86, an inverse transform module 88, a summer 90, and a reference picture memory 92. Prediction module 81 includes motion compensation unit 82 and intra-prediction module 84. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 from fig. 4.
During the decoding process, video decoder 30 receives an encoded video bitstream from video encoder 20 that represents video blocks of encoded video slices and associated syntax elements. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 may determine a value that indicates a location of the last significant coefficient within the transform coefficients based on the techniques described herein. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction module 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When a video slice is coded as an intra-coded (I) slice, intra-prediction module 84 of prediction module 81 may generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 82 of prediction module 81 generates predictive blocks for the video blocks of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may use a default construction technique to construct the reference frame lists (list 0 and list 1) based on the reference pictures stored in reference picture memory 92.
Motion compensation unit 82 determines prediction information for video blocks of the current video slice by parsing motion vectors and other syntax elements, and uses the prediction information to generate predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some received syntax elements to determine construction information for one or more of a prediction mode (e.g., intra-prediction or inter-prediction) used to code video blocks of a video slice, an inter-prediction slice type (e.g., a B slice, a P slice, or a GPB slice), a reference picture list for a slice, a motion vector for each inter-coded video block of a slice, an inter-prediction state for each inter-coded video block of a slice, and other information used to decode video blocks in a current video slice.
Motion compensation unit 82 may also perform interpolation based on the interpolation filter. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of video blocks to calculate interpolated values for sub-integer pixels of a reference block. In this case, motion compensation unit 82 may determine the interpolation filter used by video encoder 20 from the received syntax element and use the interpolation filter to generate the predictive block.
Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include, for each video block in the video slice, using the quantization parameter calculated by video encoder 20 to determine the degree of quantization that should be applied and likewise determine the degree of inverse quantization. The inverse transform module 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.
After prediction module 81 generates the predictive block for the current video block based on inter prediction or intra prediction, video decoder 30 forms a decoded video block by summing the residual block from inverse transform module 88 with the corresponding predictive block generated by prediction module 81. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores the decoded video for later presentation on a display device (e.g., display device 32 of FIG. 1). In this manner, video decoder 30 represents a video decoder configured to: obtain a first binary string and a second binary string from an encoded bitstream; determine a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1; and determine the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
FIG. 8 is a flow diagram illustrating an example method for determining, from binary strings, a value indicating the location of the last significant coefficient within a block of transform coefficients, in accordance with the techniques of this disclosure. The method described in FIG. 8 may be performed by any of the example video decoders or entropy decoding units described herein. At step 802, an encoded bitstream is obtained. The encoded bitstream may be retrieved from memory or received via transmission. The encoded bitstream may be encoded according to a CABAC encoding process or another entropy coding process. At step 804, a prefix binary string is obtained. At step 806, a suffix binary string is obtained. The prefix binary string and the suffix binary string may be obtained by decoding the encoded bitstream. Decoding may include arithmetic decoding. The arithmetic decoding may be part of a CABAC decoding process or another entropy decoding process. At step 808, a value is determined that indicates the position of the last significant coefficient within a video block of size T. In one example, the location of the last significant coefficient is determined based in part on the prefix binary string, where the prefix binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1, where T defines the size of the video block. In another example, the location of the last significant coefficient is determined based in part on the prefix binary string, where the prefix binary string is based on a truncated unary coding scheme defined by a maximum bit length of log2(T) + 1, where T defines the size of the video block. In yet another example, the location of the last significant coefficient is determined based in part on the prefix binary string, where the prefix binary string is based on a truncated unary coding scheme defined by a maximum bit length of log2(T), where T defines the size of the video block. In one example, the position of the last significant coefficient is determined based in part on the suffix binary string, where the suffix binary string is based on a fixed length coding scheme defined by a maximum bit length of log2(T) - 2, where T defines the size of the video block. In another example, the suffix binary string may be based on a fixed length coding scheme defined by a maximum bit length of log2(T) - 1, where T defines the size of the video block. Note that the prefix portion and suffix portion of the bitstream may be interchanged.
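A corresponding decoder-side sketch, again assuming the Table 2 style binarization and placeholder decode_bin()/decode_bypass() bitstream-reading methods, mirrors steps 802 through 808; it is illustrative only and inverts the same assumed mapping as the encoder sketch above.

```python
import math

def decode_last_pos_component(T, cabac):
    """Illustrative parsing of one last position component: read the truncated
    unary prefix with context models, then the fixed length suffix in bypass."""
    max_prefix = 2 * int(math.log2(T)) - 1

    # Truncated unary prefix: count zero bins until a one or the maximum length.
    n = 0
    while n < max_prefix and cabac.decode_bin(context_for_bin(T, n)) == 0:
        n += 1

    if n <= 3:
        return n

    # Fixed length suffix of b = n/2 - 1 bypass bins, inverted per equation 1.
    b = n // 2 - 1
    f_value = 0
    for _ in range(b):
        f_value = (f_value << 1) | cabac.decode_bypass()
    return (2 + (n % 2)) * (1 << (n // 2 - 1)) + f_value
```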
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as in accordance with a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. The computer program product may include a computer-readable medium.
In yet other examples, this disclosure contemplates a computer-readable medium comprising data structures stored thereon, wherein the data structures include an encoded bitstream consistent with this disclosure. In particular, the encoded bitstream may include an entropy coded bitstream including a first binary string and a second binary string, wherein the first binary string indicates a value indicating a position of a last significant coefficient and is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1, and the second binary string indicates the value indicating the position of the last significant coefficient and is based on a fixed length coding scheme.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but relate to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperability hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (50)

1. A method for encoding video data, comprising:
obtaining a value indicative of a position of a last significant coefficient within a video block of size T;
determining a first binary string of the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, the maximum bit length being defined by 2log2(T) - 1;
determining a second binary string indicating the value of the position of the last significant coefficient based on a fixed length coding scheme; and
encoding the first and second binary strings to a bitstream.
2. The method of claim 1, wherein encoding the first and second binary strings to a bitstream includes arithmetic encoding.
3. The method of claim 2, wherein encoding the first and second binary strings to a bitstream includes encoding the first binary string based on a context model.
4. The method of claim 1, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
5. The method of claim 4, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
6. The method of claim 5, wherein the first binary string includes six sequential bits having the same value and one bit having an opposite value, and wherein the second binary string has a bit length of 1.
7. A device comprising a video encoder configured to:
obtaining a value indicative of a position of a last significant coefficient within a video block of size T;
determining a first binary string of the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, the maximum bit length being defined by 2log2(T) - 1;
determining a second binary string indicating the value of the position of the last significant coefficient based on a fixed length coding scheme; and
encoding the first and second binary strings to a bitstream.
8. The device of claim 7, wherein being configured to encode the first and second binary strings to a bitstream includes being configured to perform arithmetic encoding.
9. The device of claim 8, wherein being configured to encode the first and second binary strings to a bitstream includes being configured to encode the first binary string based on a context model.
10. The device of claim 7, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
11. The device of claim 10, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
12. The device of claim 10, wherein the first binary string includes six sequential bits having the same value and one bit having an opposite value, and the second binary string has a bit length of 1.
13. A device for encoding video data, the device comprising:
means for obtaining a value indicative of a position of a last significant coefficient within a video block of size T;
means for determining a first binary string indicating the value for the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, the maximum bit length being defined by 2log2(T) - 1;
means for determining a second binary string indicating the value for the position of the last significant coefficient based on a fixed length coding scheme; and
means for encoding the first and second binary strings to a bitstream.
14. The device of claim 13, wherein means for encoding the first and second binary strings to a bitstream comprises means for performing arithmetic encoding.
15. The device of claim 14, wherein means for encoding the first and second binary strings to a bitstream includes means for encoding the first binary string based on a context model.
16. The device of claim 13, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
17. The device of claim 16, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
18. The device of claim 16, wherein the first binary string includes six sequential bits having the same value and one bit having an opposite value, and wherein the second binary string has a bit length of 1.
19. A computer-readable storage medium comprising instructions stored thereon that, when executed, cause one or more processors to:
obtaining a value indicative of a position of a last significant coefficient within a video block of size T;
determining a first binary string of the value indicating the position of the last significant coefficient based on a truncated unary coding scheme defined by a maximum bit length, the maximum bit length being defined by 2log2(T) - 1;
determining a second binary string indicating the value of the position of the last significant coefficient based on a fixed length coding scheme; and
encoding the first and second binary strings to a bitstream.
20. The computer-readable storage medium of claim 19, wherein instructions that when executed cause one or more processors to encode the first and second binary strings to a bitstream include instructions that when executed cause one or more processors to perform arithmetic encoding.
21. The computer-readable storage medium of claim 20, wherein instructions that when executed cause one or more processors to encode the first and second binary strings to a bitstream include instructions that when executed cause one or more processors to encode the first binary string based on a context model.
22. The computer-readable storage medium of claim 19, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
23. The computer-readable storage medium of claim 22, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
24. The computer-readable storage medium of claim 23, wherein the first binary string includes six sequential bits having the same value and one bit having an opposite value, and wherein the second binary string has a bit length of 1.
25. A method for decoding video data, comprising:
obtaining a first binary string and a second binary string from the encoded bitstream;
determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1; and
determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
26. The method of claim 25, wherein obtaining a first binary string and a second binary string from the encoded bitstream includes performing arithmetic decoding.
27. The method of claim 25, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
28. The method of claim 27, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
29. The method of claim 28, wherein the first binary string includes 6 sequential bits having the same value and 1 bit having the opposite value.
30. The method of claim 29, wherein the second binary string has a bit length of 1.
31. A device comprising a video decoder configured to:
obtaining a first binary string and a second binary string from the encoded bitstream;
determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1; and
determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
32. The device of claim 31, wherein being configured to obtain a first binary string and a second binary string from the encoded bitstream includes being configured to perform arithmetic decoding.
33. The device of claim 31, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
34. The device of claim 33, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
35. The device of claim 34, wherein the first binary string includes 6 sequential bits having the same value and 1 bit having the opposite value.
36. The device of claim 35, wherein the second binary string has a bit length of 1.
37. A device for decoding video data, the device comprising:
means for obtaining a first binary string and a second binary string from an encoded bitstream;
means for determining a value indicative of a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1; and
means for determining the value indicative of the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
38. The device of claim 37, wherein means for obtaining a first binary string and a second binary string from an encoded bitstream comprises means for performing arithmetic decoding.
39. The device of claim 37, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
40. The device of claim 39, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
41. The device of claim 40, wherein the first binary string includes 6 sequential bits having a same value and 1 bit having an opposite value.
42. The device of claim 41, wherein the second binary string has a bit length of 1.
43. A computer-readable storage medium comprising instructions stored thereon that, when executed, cause one or more processors to:
obtaining a first binary string and a second binary string from the encoded bitstream;
determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of 2log2(T) - 1; and
determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
44. The computer-readable storage medium of claim 43, wherein instructions that upon execution cause one or more processors to obtain a first binary string and a second binary string from an encoded bitstream include instructions that upon execution cause one or more processors to perform arithmetic decoding.
45. The computer-readable storage medium of claim 43, wherein the fixed length coding scheme is defined by a maximum bit length, the maximum bit length defined by log2(T) -2.
46. The computer-readable storage medium of claim 45, wherein T is equal to 32, wherein the value indicating the position of a last significant coefficient is equal to 8, and wherein the first binary string has a bit length of 7.
47. The computer-readable storage medium of claim 46, wherein the first binary string includes 6 sequential bits having a same value and 1 bit having an opposite value.
48. The computer-readable storage medium of claim 47, wherein the second binary string has a bit length of 1.
49. A method for decoding video data, comprising:
obtaining a first binary string and a second binary string from the encoded bitstream;
determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is defined by a maximum bit length of log2(T) + 1; and
determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
50. A method for decoding video data, comprising:
obtaining a first binary string and a second binary string from the encoded bitstream;
determining a value indicating a position of a last significant coefficient within a video block of size T based in part on the first binary string, wherein the first binary string is based on a truncated unary coding scheme defined by a maximum bit length of log2(T); and
determining the value indicating the position of the last significant coefficient based in part on the second binary string, wherein the second binary string is defined by a fixed length coding scheme.
HK14111433.6A 2011-11-08 2012-11-06 Progressive coding of position of last significant coefficient HK1197781B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US61/557,317 2011-11-08
US61/561,909 2011-11-20
US13/669,032 2012-11-05

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
HK18110474.4A Division HK1251107B (en) 2011-11-08 2014-11-12 Progressive coding of position of last significant coefficient

Related Child Applications (1)

Application Number Title Priority Date Filing Date
HK18110474.4A Addition HK1251107B (en) 2011-11-08 2014-11-12 Progressive coding of position of last significant coefficient

Publications (2)

Publication Number Publication Date
HK1197781A true HK1197781A (en) 2015-02-13
HK1197781B HK1197781B (en) 2019-02-01
