HK1197951B - Context optimization for last significant coefficient position coding - Google Patents
- Publication number: HK1197951B
- Authority: HK (Hong Kong)
- Prior art keywords: binary, context, transform block, block, index
Description
Related application
The present application claims the benefit of:
United States Provisional Application No. 61/557,317, filed November 8, 2011;
United States Provisional Application No. 61/561,909, filed November 20, 2011;
United States Provisional Application No. 61/588,579, filed January 19, 2012; and
United States Provisional Application No. 61/596,049, filed February 7, 2012, each of which is hereby incorporated by reference in its entirety.
Technical Field
The present disclosure relates to video coding, and more particularly, to techniques for coding transform coefficients.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), portable or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 (Advanced Video Coding (AVC)), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial prediction or temporal prediction results in a predictive block for the block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
Disclosure of Invention
In general, techniques are described for coding video data. In particular, this disclosure describes techniques for coding transform coefficients.
In one example of this disclosure, a method of encoding transform coefficients comprises: obtaining a binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block; determining a context for a binary index of the binary string based on a video block size, wherein the context is assigned to at least two binary indices, wherein each of the at least two binary indices is associated with a different video block size; and encoding the binary string using Context Adaptive Binary Arithmetic Coding (CABAC) based at least in part on the determined context.
In another example of this disclosure, a method of decoding transform coefficients comprises: obtaining an encoded binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block, wherein the encoded binary string is encoded using CABAC; determining a context for a binary index of the encoded binary string based on a video block size, wherein the context is assigned to at least two binary indices, wherein each of the at least two binary indices is associated with a different video block size; and decoding the encoded binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, an apparatus configured to encode transform coefficients in a video encoding process comprises: means for obtaining a binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block; means for determining a context for a binary index of the binary string based on a video block size, wherein the context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and means for encoding the binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, an apparatus configured to decode transform coefficients in a video decoding process comprises: means for obtaining an encoded binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block, wherein the encoded binary string is encoded using CABAC; means for determining a context for a binary index of the encoded binary string based on a video block size, wherein the context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and means for decoding the encoded binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, a device comprises a video encoder configured to: obtaining a binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block; determining a context for a binary index of the binary string based on a video block size, wherein the context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and encoding the binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, a device comprises a video decoder configured to: obtaining an encoded binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block, wherein the encoded binary string is encoded using CABAC; determining a context for a binary index of the encoded binary string based on a video block size, wherein the context is assigned to at least two binary indices, wherein each of the at least two binary indices is associated with a different video block size; and decoding the encoded binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, a non-transitory computer-readable storage medium has instructions stored thereon that, when executed, cause a video encoding device to: obtaining a binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block; determining a context for a binary index of the binary string based on a video block size, wherein the context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and encoding the binary string using CABAC based at least in part on the determined context.
In another example of this disclosure, a non-transitory computer-readable storage medium has instructions stored thereon that, when executed, cause a video decoding device to: obtaining an encoded binary string indicating a position of a last significant coefficient within a transform coefficient block associated with a video block, wherein the encoded binary string is encoded using CABAC; determining a context for a binary index of the encoded binary string based on a video block size, wherein the context is assigned to at least two binary indices, wherein each of the at least two binary indices is associated with a different video block size; and decoding the encoded binary string using CABAC based at least in part on the determined context.
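The shared-context idea recited in the examples above can be sketched in a few lines. The mapping below is invented for illustration (the block sizes, bin counts, and the rule that 16×16 and 32×32 blocks share one context set are assumptions, not the normative HEVC assignment); it shows how assigning one context to bins of different block sizes shrinks the number of contexts a coder must store.

```python
# Hypothetical CABAC context assignment for the bins of a
# last-significant-coefficient position codeword.

def separate_context(block_size, bin_index):
    # Baseline: every (block size, bin index) pair gets its own context.
    return (block_size, bin_index)

def shared_context(block_size, bin_index):
    # 4x4 and 8x8 blocks keep dedicated contexts; 16x16 and 32x32 blocks
    # share one context set, so a single context serves bins associated
    # with two different block sizes.
    if block_size <= 8:
        return (block_size, bin_index)
    return ('large', min(bin_index, 6))

# Number of bins in the position codeword grows with block size
# (roughly 2*log2(size) - 1 for a truncated unary prefix).
bins_per_size = {4: 3, 8: 5, 16: 7, 32: 9}

separate = {separate_context(s, b)
            for s, n in bins_per_size.items() for b in range(n)}
shared = {shared_context(s, b)
          for s, n in bins_per_size.items() for b in range(n)}
print(len(separate), len(shared))  # 24 15
```

In this toy setup the separate assignment stores 24 context states while the shared assignment stores 15, at the cost of coarser probability modeling for the larger blocks.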
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example video encoding and decoding system.
Fig. 2A-2D illustrate exemplary coefficient value scan orders.
Fig. 3 illustrates one example of a significance map for a block of coefficient values.
FIG. 4 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 5 is a block diagram illustrating an example entropy encoder that may implement the techniques described in this disclosure.
FIG. 6 is a flow diagram illustrating an example of encoding a binary string value indicating a position of a last significant coefficient in accordance with the techniques of this disclosure.
FIG. 7 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 8 is a block diagram illustrating an example entropy decoder that may implement the techniques described in this disclosure.
Fig. 9 is a flow diagram illustrating an example of decoding a binary string value indicating a position of a last significant coefficient in accordance with the techniques of this disclosure.
Detailed Description
In general, techniques are described for coding video data. In particular, this disclosure describes techniques for coding transform coefficients in a video encoding and/or decoding process. In block-based video coding, transform coefficient blocks may be arranged in two-dimensional (2D) arrays. A scanning process may be performed to rearrange a two-dimensional (2D) array of transform coefficients into an ordered one-dimensional (1D) array (i.e., a vector) of transform coefficients. One or more syntax elements may be used to indicate the position of the last significant coefficient (i.e., a non-zero coefficient) within the transform coefficient block based on the scanning order. The position of the last significant coefficient may be used by a video encoder to optimize the encoding of the transform coefficients. Likewise, the video decoder may use the position of the last significant coefficient to optimize the decoding of the transform coefficients. It is therefore desirable to efficiently code one or more syntax elements that indicate the position of the last significant coefficient.
This disclosure describes techniques for coding one or more syntax elements that indicate the position of the last significant coefficient. In some examples, all or a portion of the syntax elements indicating the position of the last significant coefficient may be entropy coded according to any of the following techniques: Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), probability interval partitioning entropy coding (PIPE), or the like. In entropy coding techniques that utilize binary indices (which may also be referred to as "bins" or "bin indices") and context assignments, common context assignments may be used for bins across different transform unit (TU) sizes and/or different color components. In this way, the total number of contexts may be reduced. By reducing the total number of contexts, the video encoder and/or video decoder may code syntax elements indicating the position of the last significant coefficient more efficiently, because fewer contexts need to be stored.
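To make "bins" concrete, a syntax element value is first binarized into a string of bins, and each bin index then selects a context. Truncated unary is one common CABAC binarization; the sketch below uses it for illustration, though the actual HEVC binarization of the last-position syntax elements differs in detail.

```python
def truncated_unary(value, max_value):
    """Binarize 'value' as that many one-bins followed by a terminating
    zero-bin; the zero is omitted when value equals max_value."""
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins

print(truncated_unary(3, 7))  # [1, 1, 1, 0]
print(truncated_unary(7, 7))  # [1, 1, 1, 1, 1, 1, 1]
```

Each position in the resulting string is a bin index; the context-determination techniques described above choose a probability model per bin index.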
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 10, according to an example of this disclosure, which example video encoding and decoding system 10 may be configured to utilize techniques for coding transform coefficients. As shown in fig. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. The encoded video data may also be stored on storage medium 34 or file server 36, and may be accessed by destination device 14 as desired. When the video data is stored to a storage medium or file server, video encoder 20 may provide the coded video data to another device, such as a network interface, a Compact Disc (CD), Blu-ray disc, or Digital Video Disc (DVD) recorder, a stamping facility device, or other device, to store the coded video data to the storage medium. Likewise, a device separate from video decoder 30, such as a network interface, CD or DVD reader, or the like, may retrieve coded video data from a storage medium and provide the retrieved data to video decoder 30.
Source device 12 and destination device 14 may comprise any of a wide variety of devices, including desktop computers, notebook (i.e., portable) computers, tablet computers, set-top boxes, telephone handsets such as so-called smartphones, televisions, cameras, display devices, digital media players, video game consoles, or the like. In many cases, such devices may be equipped for wireless communication. Thus, communication channel 16 may comprise a wireless channel, a wired channel, or a combination of a wireless channel and a wired channel suitable for transmitting encoded video data. Similarly, the file server 36 may be accessed by the destination device 14 via any standard data connection, including an internet connection. Such a data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
Techniques for coding transform coefficients according to examples of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of digital video stored on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator 22, and a transmitter 24. In source device 12, video source 18 may include sources such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface that receives video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be generally applicable to video coding, and may be applied to wireless and/or wired applications, or applications in which encoded video data is stored on a local disk.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.
The captured, pre-captured, or computer-generated video encoded by video encoder 20 may also be stored on storage medium 34 or file server 36 for later consumption. Storage medium 34 may include a blu-ray disc, DVD, CD-ROM, flash memory, or any other suitable digital storage medium for storing encoded video. The encoded video stored on storage medium 34 may then be accessed by destination device 14 for decoding and playback.
File server 36 may be any type of server capable of storing encoded video and transmitting the encoded video to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, local disk drives, or any other type of device capable of storing encoded video data and transmitting the encoded video data to a destination device. The transmission of the encoded video data from the file server 36 may be a streaming transmission, a download transmission, or a combination of both. The file server 36 may be accessed by the destination device 14 through any standard data connection, including an internet connection. Such a data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, ethernet, USB, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
In the example of fig. 1, destination device 14 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. Receiver 26 of destination device 14 receives the information over channel 16 and modem 28 demodulates the information to generate a demodulated bitstream for video decoder 30. The information communicated over channel 16 may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding the video data. This syntax may also be included with the encoded video data stored on the storage medium 34 or the file server 36. Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) capable of encoding or decoding video data.
The display device 32 may be integrated with the destination device 14 or external to the destination device 14. In some examples, destination device 14 may include an integrated display device, and may also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
In the example of fig. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network such as a local area network, a wide area network, or a global network such as the internet. Communication channel 16 generally represents any suitable communication medium or collection of different communication media, including any suitable combination of wired or wireless media, for transmitting video data from source device 12 to destination device 14. The communication channel 16 may include a router, switch, base station, or any other apparatus that may be used to facilitate communication from the source device 12 to the destination device 14.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), and may conform to the HEVC Test Model (HM). Video encoder 20 and video decoder 30 may operate according to a recent draft of the HEVC standard, referred to as "HEVC Working Draft 5" or "WD5," which is described in document JCTVC-G1103 (Bross et al.), "WD5: Working Draft 5 of High Efficiency Video Coding (HEVC)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting: Geneva, Switzerland, November 2011. In addition, another recent working draft of HEVC, Working Draft 7, is described in document JCTVC-I1003 (Bross et al.), "High Efficiency Video Coding (HEVC) Text Specification Draft 7," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 9th Meeting: Geneva, Switzerland, 27 April to 7 May 2012. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10 Advanced Video Coding (AVC), or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263.
Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Video encoder 20 may implement any or all of the techniques of this disclosure for coding transform coefficients in a video encoding process. Likewise, video decoder 30 may implement any or all of these techniques for coding transform coefficients in a video decoding process. A video coder, as described in this disclosure, may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding.
Video encoder 20 and video decoder 30 of fig. 1 represent examples of video coders configured to: obtaining a binary string indicating a position of a last significant coefficient within the video block; determining a context of a binary index of a binary string based on a video block size, wherein a context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and coding the binary string using Context Adaptive Binary Arithmetic Coding (CABAC) based at least in part on the determined context.
For video coding according to the HEVC standard currently under development, a video frame may be partitioned into coding units. A Coding Unit (CU) generally refers to an image region serving as a basic unit to which various coding tools are applied to achieve video compression. A CU typically has a luma component, denoted Y, and two chroma components, denoted U and V. Depending on the video sampling format, the size of the U and V components, in terms of the number of samples, may be the same as or different from the size of the Y component. A CU is typically square and may be considered similar to a so-called macroblock, e.g., in accordance with other video coding standards such as ITU-T H.264. For illustration purposes, coding in accordance with some of the presently proposed aspects of the HEVC standard being developed will be described in this application. However, the techniques described in this disclosure may be used for other video coding processes, such as those defined in accordance with H.264 or other standard or proprietary video coding processes. HEVC standardization efforts are based on a model of a video coding device called the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to devices according to, e.g., ITU-T H.264/AVC. For example, H.264 provides nine intra-prediction encoding modes, while the HM provides up to thirty-four intra-prediction encoding modes.
A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) typically includes a series of one or more video pictures. The GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes the encoding mode of the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may include one or more TUs or PUs corresponding to coding nodes within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
According to the HM, a CU may include one or more Prediction Units (PUs) and/or one or more Transform Units (TUs). Syntax data within the bitstream may define a Largest Coding Unit (LCU), which is the largest CU in terms of the number of pixels. In general, a CU serves a similar purpose to a macroblock of H.264, except that a CU does not have a size distinction. Thus, a CU may be split into multiple sub-CUs. In general, a reference to a CU in this disclosure may refer to a largest coding unit of a picture or a sub-CU of an LCU. An LCU may be split into multiple sub-CUs, and each sub-CU may be further split into multiple sub-CUs. Syntax data for a bitstream may define a maximum number of times an LCU may be split, referred to as CU depth. Accordingly, the bitstream may also define a smallest coding unit (SCU). This disclosure also uses the term "block" or "portion" to refer to any of a CU, PU, or TU. In general, "portion" may refer to any subset of a video frame.
The LCU may be associated with a quadtree data structure. In general, a quadtree data structure contains one node per CU, where the root node corresponds to the LCU. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs. Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag that indicates whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively, and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf-CU. In this disclosure, the four sub-CUs of a leaf-CU will also be referred to as leaf-CUs, although there is no explicit splitting of the original leaf-CU. For example, if a CU of size 16×16 is not split further, then the four 8×8 sub-CUs will also be referred to as leaf-CUs, although the 16×16 CU is never split.
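The split-flag recursion above can be sketched as follows. This is a minimal illustration, assuming one depth-first split flag per splittable CU and an 8×8 minimum CU size; it is not the normative HEVC coding-tree syntax.

```python
def leaf_cu_sizes(lcu_size, split_flags, min_size=8):
    """Consume split flags depth-first and return the leaf-CU sizes of a
    hypothetical CU quadtree rooted at an lcu_size x lcu_size LCU."""
    flags = iter(split_flags)
    def walk(cu_size):
        # A CU at or below the minimum size carries no split flag.
        if cu_size > min_size and next(flags) == 1:
            leaves = []
            for _ in range(4):          # four equal sub-CUs
                leaves.extend(walk(cu_size // 2))
            return leaves
        return [cu_size]                # leaf-CU: not split further
    return walk(lcu_size)

# 64x64 LCU: split once; only the second 32x32 sub-CU splits again.
sizes = leaf_cu_sizes(64, [1, 0, 1, 0, 0, 0, 0, 0, 0])
print(sizes)  # [32, 16, 16, 16, 16, 32, 32]
```

The leaf sizes always tile the LCU exactly, which the area check in a quick test can confirm.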
Furthermore, the TUs of a leaf-CU may also be associated with respective quadtree data structures. That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. This disclosure refers to the quadtree indicating how an LCU is partitioned as a CU quadtree, and the quadtree indicating how a leaf-CU is partitioned into TUs as a TU quadtree. The root node of a TU quadtree generally corresponds to a leaf-CU, while the root node of a CU quadtree generally corresponds to an LCU. TUs of a TU quadtree that are not split are referred to as leaf-TUs.
A leaf-CU may include one or more Prediction Units (PUs). In general, a PU represents all or a portion of the corresponding CU, and may include data for retrieving a reference sample for the PU. For example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining a motion vector may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference frame to which the motion vector points, and/or a reference list for the motion vector (e.g., list 0 or list 1). Data of the leaf-CU defining the PUs may also describe, for example, partitioning of the CU into one or more PUs. The partitioning mode may differ depending on whether the CU is uncoded, intra-prediction mode encoded, or inter-prediction mode encoded. For intra coding, a PU may be treated the same as the transform units described below.
As an example, the HM supports prediction with various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up," "Down," "Left," or "Right." Thus, for example, "2N×nU" refers to a 2N×2N CU partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
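The PU shapes described above can be written out as (width, height) pairs. This is a sketch: the mode names follow the HM convention, and the sizes assume a 2N×2N CU with an integer quarter dimension.

```python
def pu_sizes(cu_size, mode):
    """Return the (width, height) of the PUs an HM-style partition mode
    produces for a square CU of side cu_size (illustrative only)."""
    n, q = cu_size // 2, cu_size // 4
    return {
        '2Nx2N': [(cu_size, cu_size)],
        'NxN':   [(n, n)] * 4,
        '2NxN':  [(cu_size, n)] * 2,
        'Nx2N':  [(n, cu_size)] * 2,
        # Asymmetric modes: a 25% / 75% split along one direction.
        '2NxnU': [(cu_size, q), (cu_size, cu_size - q)],  # small PU up
        '2NxnD': [(cu_size, cu_size - q), (cu_size, q)],  # small PU down
        'nLx2N': [(q, cu_size), (cu_size - q, cu_size)],  # small PU left
        'nRx2N': [(cu_size - q, cu_size), (q, cu_size)],  # small PU right
    }[mode]

print(pu_sizes(64, '2NxnU'))  # [(64, 16), (64, 48)]
```

For any mode, the PU areas sum to the CU area, which is an easy sanity check on the table.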
In this disclosure, "N×N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Further, a block need not have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N×M pixels, where M is not necessarily equal to N.
To code a block (e.g., a prediction unit of video data), a predictor for the block is first derived. A predictor, also referred to as a predictive block, may be derived via intra (I) prediction (i.e., spatial prediction) or inter (P or B) prediction (i.e., temporal prediction). Thus, some prediction units may be intra-coded (I) using spatial prediction with respect to reference samples in neighboring reference blocks in the same frame (or slice), and other prediction units may be uni-directionally inter-coded (P) or bi-directionally inter-coded (B) with respect to reference sample blocks in other previously coded frames (or slices). In each case, the reference samples may be used to form a predictive block for the block to be coded.
When a predictive block is identified, a difference between the original block of video data and its predictive block is determined. This difference may be referred to as prediction residual data and indicates a pixel difference between pixel values in the block to be coded and pixel values in a predictive block selected to represent the coded block. To achieve better compression, the prediction residual data may be transformed, for example, using a Discrete Cosine Transform (DCT), an integer transform, a Karhunen-Loeve (K-L) transform, or another transform.
The residual data in a transform block, such as a TU, may be arranged into a two-dimensional (2D) array of pixel difference values that reside in the spatial pixel domain. The transform converts the residual pixel values into a two-dimensional array of transform coefficients in a transform domain, such as the frequency domain. For further compression, the transform coefficients may be quantized prior to entropy coding. The entropy coder then applies entropy coding, such as CAVLC, CABAC, PIPE, or the like, to the quantized transform coefficients.
To entropy code a block of quantized transform coefficients, a scanning process is typically performed such that a two-dimensional (2D) array of quantized transform coefficients in the block is rearranged into an ordered one-dimensional (1D) array of transform coefficients (i.e., a vector) according to a particular scanning order. Entropy coding is then applied to the vector of transform coefficients. The scanning of the quantized transform coefficients in the transform unit serializes the 2D array of transform coefficients for the entropy coder. A significance map may be generated to indicate the locations of significant (i.e., non-zero) coefficients. Scanning may be applied to scan the level of significant (i.e., non-zero) coefficients, and/or to code the sign of significant coefficients.
In HEVC, position information (e.g., a significance map) for the significant transform coefficients is first coded for a TU to indicate the position of the last non-zero coefficient in the scan order. The significance map and level information (the absolute values and signs of the coefficients) are then coded for each coefficient in an inverse scan order.
After any transform is performed to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to the process of: transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be downscaled to an m-bit value during quantization, where n is greater than m. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning.
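The n-bit to m-bit reduction described above can be pictured with a minimal scalar quantizer. The sketch below assumes a simple divide-and-truncate rule with a hypothetical step size; HEVC's actual quantizer uses integer scaling and offset tables, so this is illustrative only.

```python
def quantize(coeffs, qstep):
    """Simplified scalar quantization: divide each transform coefficient
    by the quantization step and truncate toward zero. (HEVC's real
    quantizer uses integer scaling tables; this is only a sketch.)"""
    return [int(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    """Inverse quantization: scale the levels back up. The rounding loss
    is irreversible, which is what provides the compression."""
    return [lvl * qstep for lvl in levels]

coeffs = [103, -47, 12, 5, -3, 1, 0, 0]
levels = quantize(coeffs, qstep=10)
print(levels)                       # smaller-magnitude integers
print(dequantize(levels, qstep=10))
```

Note how small coefficients collapse to zero, producing the runs of zeros that the scanning and significance-map coding described below exploit.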
Fig. 2A-2D illustrate some different exemplary scanning orders. Other defined scan orders or adaptive (varying) scan orders may also be used. Fig. 2A illustrates a zig-zag scanning order, fig. 2B illustrates a horizontal scanning order, fig. 2C illustrates a vertical scanning order, and fig. 2D illustrates a diagonal scanning order. Combinations of these scanning orders may also be defined and used. In some examples, the techniques of this disclosure may be particularly applicable during the coding of so-called significance maps in video coding processes.
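Each of the scan orders of Figs. 2A-2D can be generated programmatically. The sketch below produces one plausible up-right diagonal order in the spirit of Fig. 2D; the exact traversal (including the direction within each anti-diagonal) is fixed by the standard, so this is an assumption-laden illustration.

```python
def diagonal_scan(n):
    """Generate an up-right diagonal scan order for an n x n block as a
    list of (row, col) positions. Illustrative sketch only; the
    normative order is defined by the video coding standard."""
    order = []
    for d in range(2 * n - 1):      # anti-diagonals 0 .. 2n-2
        for col in range(n):
            row = d - col
            if 0 <= row < n:        # keep positions inside the block
                order.append((row, col))
    return order

print(diagonal_scan(4)[:6])
```

A horizontal or vertical scan is just the row-major or column-major enumeration of the same positions.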
One or more syntax elements may be defined to indicate the position of the last significant coefficient (i.e., a non-zero coefficient), which may depend on the scan order associated with the coefficient block. For example, one syntax element may define a column position of the last significant coefficient within the coefficient value block, and another syntax element may define a row position of the last significant coefficient within the coefficient value block.
Fig. 3 illustrates one example of a significance map for a block of coefficient values. The significance map is shown on the right, with 1-bit flags identifying the coefficients in the left video block that are significant (i.e., non-zero). In one example, given a set of significant coefficients (e.g., defined by a significance map) and a scan order, the position of the last significant coefficient may be defined. In the emerging HEVC standard, transform coefficients may be grouped into information blocks. An information block may comprise an entire TU, or in some cases, a TU may be subdivided into smaller information blocks. The significance map and level information (absolute value and sign) are coded for each coefficient in an information block. In one example, for a 4 × 4 TU and an 8 × 8 TU, an information block consists of 16 consecutive coefficients in an inverse scan order (e.g., diagonal, horizontal, or vertical). For 16 × 16 and 32 × 32 TUs, the coefficients within each 4 × 4 sub-block are treated as an information block. Syntax elements are coded and signaled to represent coefficient level information within an information block. In one example, all symbols are encoded in an inverse scan order. The techniques of this disclosure may improve coding of the syntax elements used to define the position of the last significant coefficient of a coefficient block.
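Combining a significance map with a scan order, the last significant coefficient can be located as sketched below. The function name and the example block are hypothetical; this is only an illustration of the idea, not the normative derivation.

```python
def significance_info(block, scan):
    """Build the 1-bit significance map for a block of quantized
    coefficients and find the last significant (non-zero) coefficient
    along the given scan order. Illustrative sketch only."""
    sig_map = [[1 if v else 0 for v in row] for row in block]
    last_pos = None
    for r, c in scan:
        if block[r][c]:
            last_pos = (r, c)       # final non-zero hit in scan order
    return sig_map, last_pos

block = [[9, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scan = [(r, c) for c in range(4) for r in range(4)]  # vertical scan
sig, last = significance_info(block, scan)
print(last)
```

The encoder signals `last` first, then codes significance and level information for the coefficients up to that position in an inverse scan order.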
As one example, the techniques of this disclosure may be used to code the position of the last significant coefficient of a coefficient block (e.g., a TU or an information block for a TU). Then, after coding the position of the last significant coefficient, level and sign information associated with the transform coefficient may be coded. Coding of level and sign information may be processed according to a five-pass method by coding the following symbols (e.g., for a TU or an information block of a TU) in an inverse scan order:
significant_coeff_flag (abbreviated sigMapFlag): this flag may indicate the significance of each coefficient in the information block. Coefficients having a value of one or greater than one are considered significant.
coeff_abs_level_greater1_flag (abbreviated gr1Flag): this flag may indicate whether the absolute value of the coefficient is greater than 1, for non-zero coefficients (i.e., coefficients with sigMapFlag equal to 1).
coeff_abs_level_greater2_flag (abbreviated gr2Flag): this flag may indicate whether the absolute value of the coefficient is greater than 2, for coefficients having an absolute value greater than 1 (i.e., coefficients with gr1Flag equal to 1).
coeff_sign_flag (abbreviated signFlag): this flag may indicate sign information for non-zero coefficients. For example, a zero for this flag may indicate a positive sign, while a 1 may indicate a negative sign.
coeff_abs_level_remaining (abbreviated levelRem): this syntax element may indicate the remaining absolute value of a transform coefficient level. For this syntax element, for each coefficient having a magnitude greater than x, the remaining absolute value of the coefficient may be coded as (abs(level) − x). The value of x depends on the presence of gr1Flag and gr2Flag: if gr2Flag has been coded, the levelRem value is calculated as (abs(level) − 2); if gr2Flag has not been coded but gr1Flag has been coded, the levelRem value is calculated as (abs(level) − 1).
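The five symbols listed above can be derived for a single coefficient as sketched below. The helper name is hypothetical, and the levelRem rule follows the simplified description above; the standard additionally limits how many gr1/gr2 flags are coded per pass, which is when the (abs(level) − 1) branch applies, and that is not modeled here.

```python
def coeff_symbols(level):
    """Derive the five symbols described above for one quantized
    transform coefficient. Flags that would not be coded for this
    coefficient are left as None. Illustrative sketch only."""
    a = abs(level)
    sym = {"sigMapFlag": 1 if a > 0 else 0,
           "gr1Flag": None, "gr2Flag": None,
           "signFlag": None, "levelRem": None}
    if a == 0:
        return sym
    sym["signFlag"] = 0 if level > 0 else 1   # 0 = positive, 1 = negative
    sym["gr1Flag"] = 1 if a > 1 else 0
    if a > 1:                                 # gr2Flag coded only if gr1Flag = 1
        sym["gr2Flag"] = 1 if a > 2 else 0
        if a > 2:                             # gr2Flag coded, so x = 2
            sym["levelRem"] = a - 2
    return sym

print(coeff_symbols(-5))
```

For example, a coefficient of −5 yields sigMapFlag = 1, signFlag = 1, gr1Flag = 1, gr2Flag = 1, and levelRem = 3.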
In this way, transform coefficients for a TU or a portion of a TU (e.g., an information block) may be coded. In any case, the techniques of this disclosure, which relate to the coding of syntax elements to define the position of the last significant coefficient of a coefficient block, may also be used with other types of techniques for finally coding level and sign information of transform coefficients. As stated in this disclosure, the five-pass method for coding significance, level, and sign information is merely one example technique that may be used after coding the position of the last significant coefficient of a block.
After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector of transform coefficients. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data. Entropy encoding may be performed according to one of the following techniques: CAVLC, CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC), PIPE coding, or another entropy encoding method. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in Variable Length Coding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, using VLC may achieve bit savings compared to, for example, using equal-length codewords for each symbol to be transmitted.
The entropy coding techniques of this disclosure are specifically described as being applicable to CABAC, but the techniques may also be applicable to CAVLC, SBAC, PIPE, or other techniques. Note that PIPE uses principles similar to those of arithmetic coding.
In general, coding data symbols using CABAC may involve one or more of the following steps:
(1) Binarization: if a symbol to be coded has a non-binary value, it is mapped to a binary sequence, a so-called "binary string". Each binary digit (i.e., "bin") in the binary string may have a value of "0" or "1".
(2) Context assignment: in regular mode, each bin is assigned a context. A bin may also be coded according to a bypass mode that does not assign contexts. A context is a probability model and is often referred to as a "context model". As used herein, the term context may refer to a probability model or a probability value. The context determines the probability used when coding the value of a given bin. The context may be associated with a probability of a bin value based on information such as the values of previously encoded symbols or the bin number. In addition, contexts may be assigned to bins based on higher-level (e.g., slice) parameters.
(3) Binary bit encoding: the binary bits are encoded with an arithmetic encoder. To encode a binary bit, an arithmetic encoder requires as input the probability of the value of the binary bit (i.e., the probability that the value of the binary bit equals "0", and/or the probability that the value of the binary bit equals "1"). The (estimated) probability may be represented in context by an integer value referred to as "context state".
(4) State update: the probability (state) of the selected context may be updated based on the actual coded value of the binary bit (e.g., if the binary bit value is "1," the probability of the binary bit being "1" may be increased). The updating of the probabilities may be governed according to transition rules associated with the context.
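The regular-mode steps above (context assignment, bin coding, state update) can be mimicked with a toy adaptive model. Real CABAC represents the estimated probability as a small integer context state updated through lookup tables; the floating-point model below only illustrates the adaptation idea and is not the normative update rule.

```python
class ToyContext:
    """A toy adaptive context: tracks P(bin == 1) and nudges it toward
    each coded bin value. Real CABAC keeps an integer 'context state'
    updated via transition tables; this sketch only shows adaptation."""
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one      # estimated probability that the bin is 1
        self.rate = rate        # adaptation speed

    def update(self, bin_val):
        target = 1.0 if bin_val else 0.0
        self.p_one += self.rate * (target - self.p_one)

ctx = ToyContext()
for b in [1, 1, 1, 0, 1, 1]:    # mostly-1 bins coded with this context
    ctx.update(b)
print(round(ctx.p_one, 3))      # estimate has drifted above 0.5
```

After a run of mostly-1 bins, the context predicts "1" with higher probability, so an arithmetic coder spends fewer bits on subsequent 1-valued bins coded with it.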
The following are example binarization techniques for a last significant coefficient syntax element that may be performed by video encoder 20. The last significant coefficient syntax element may include a row position and a column position (i.e., x-coordinate and y-coordinate) of the last significant coefficient within the two-dimensional block. For an 8 × 8 block, there are eight different possibilities for the position of the last coefficient within a column or row, i.e., 0, 1, ..., 7. Eight different binary bits are used to represent these eight row or column positions. For example, bin0 = 1 may indicate that the coefficient at row or column 0 is the last significant coefficient. In this example, if bin0 is 0, then the coefficient at position 0 is not the last coefficient. A subsequent binary bit equal to 1 may then indicate the position of the last significant coefficient. For example, bin1 = 1 may indicate that position 1 is the last significant coefficient. As another example, binX = 1 may indicate that position X is the last significant coefficient. As described above, each binary bit may be encoded by two different methods: (1) context-coded binary bits, and (2) bypass-mode-coded binary bits (with no context).
Table 1 shows an example binarization of the position of the last significant coefficient, in which some bins are context coded and others are bypass coded. The example in Table 1 provides a binarization for the last significant coefficient of a 32 × 32 TU. The second column of Table 1 provides the corresponding truncated unary prefix value for each possible value of the position of the last significant coefficient within a TU of size T, where the maximum prefix value is defined by 2·log2(T) − 1. For simplicity, Table 1 uses X values to indicate a one or a zero bit value. Note that the X values uniquely map each value that shares a truncated unary prefix according to a fixed-length code. The magnitude of the last position component in Table 1 may correspond to an x-coordinate value and/or a y-coordinate value. Note that the binarization of the last significant coefficient for 4 × 4, 8 × 8, and 16 × 16 TUs may be defined in a manner similar to the binarization for the 32 × 32 TU described with respect to Table 1.
Table 1: Binarization for a TU of size 32 × 32, where X means 1 or 0.
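A binarization in the style of Table 1, with a truncated unary prefix followed by a fixed-length "X" suffix, can be sketched as follows. The grouping below mirrors the HEVC-draft scheme (suffix length grows with the prefix value) and is offered as an assumption-laden illustration, not the normative definition; the function name is hypothetical.

```python
def binarize_last_pos(value, log2_size):
    """Binarize one last-position component (x or y) as a truncated
    unary prefix plus a fixed-length suffix of 'X' bits, in the style
    of Table 1. Sketch only, not a normative definition."""
    c_max = (log2_size << 1) - 1          # longest prefix, e.g. 9 for 32x32
    if value < 4:
        prefix, k, suffix = value, 0, 0   # small values: no suffix
    else:
        k = value.bit_length() - 2        # number of suffix ('X') bits
        prefix = 2 * k + (value >> k)     # group index for the prefix
        suffix = value & ((1 << k) - 1)
    # Truncated unary: 'prefix' ones, then a terminating zero unless the
    # prefix already reached its maximum value.
    bins = "1" * prefix + ("0" if prefix < c_max else "")
    bins += format(suffix, "0{}b".format(k)) if k else ""
    return bins

for v in (0, 3, 4, 10, 31):
    print(v, binarize_last_pos(v, 5))     # 32x32 TU -> log2_size = 5
```

The prefix bins are the candidates for context coding, while the suffix bins are the ones assumed to be bypass coded in the context assignments discussed below.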
As described above, coding data symbols using CABAC may also involve context assignment. In one example, context modeling may be used for the arithmetic encoding of the truncated unary portion of a binary string, while context modeling is not used for the arithmetic encoding of the fixed binary portion of the binary string (i.e., the fixed binary portion is bypassed). In the case where the truncated unary portion is encoded using context modeling, a context may be assigned to each of the bin indices of the binary string.
There are several ways in which a context can be assigned to each bin index of a bin string. The number of context assignments for the bin string representing the position of the last significant coefficient may be equal to the sum of the lengths of the truncated bin strings over the possible TU sizes and color components. For example, if the possible sizes of the luma component are 4 × 4, 8 × 8, 16 × 16, and 32 × 32, the number of context assignments for one dimension may be equal to 60 (i.e., 4+8+16+32) when none of the binary bits are bypassed. Likewise, for each chroma component having possible sizes of 4 × 4, 8 × 8, and 16 × 16, the number of context assignments may be equal to 28 (i.e., 4+8+16) when none of the bins are bypassed. Thus, when the position of the last significant coefficient is specified using x and y coordinates, the maximum number of context assignments may be equal to 116 (i.e., 60+28+28) for each dimension. The following example context assignments assume that some bins will be bypassed according to the binarization scheme described with respect to Table 1. However, note that the context assignment techniques described herein may be applicable to several binarization schemes. In addition, even when it is assumed that some of the bins will be bypassed, there are numerous ways in which a context may be assigned to the bins of the bin string representing the position of the last significant coefficient.
In some cases, it may be desirable to reduce the total number of contexts relative to the required number of context assignments. In this way, an encoder or decoder need not store and maintain as many contexts. However, when reducing the number of contexts, prediction accuracy may also be reduced, for example, if two bins with different probabilities share a context. In addition, a particular context may be updated more frequently, which may affect the estimated probability of the value of a binary bit. That is, coding a binary bit using the assigned context may involve updating the context. Thus, subsequent binary bits assigned to the context may be coded using the updated context. Additionally, note that in some examples, the context models may be initialized at the slice level, such that although binary bits are assigned the same context, the value of a binary bit within one slice does not affect the coding of binary bits within subsequent slices. Techniques are described for optimizing context assignment so that the number of contexts can be reduced while maintaining the accuracy of the estimated probabilities. In one example, the context assignment techniques described herein include techniques that assign the same context to multiple bin indices.
Tables 2 through 13 below illustrate context assignments for bin indices of a bin string representing the position of the last significant coefficient within a TU. Note that for some bins in tables 2-13 (e.g., bins 5-7 of an 8 x 8 block), no context is assigned. This is because, as described above, it is assumed that these binary bits will be coded using a bypass mode. Note also that the values in tables 2-13 represent context indices. In tables 2 through 13, different bins share the same context when they have the same context index value. The mapping of context index to actual context may be defined according to a video coding standard. Tables 2 through 13 illustrate the general manner in which contexts are assigned to binary bits.
Table 2 illustrates possible context indexing for each bin of different TU sizes, using the binarization described above with respect to the example provided in Table 1. In the example of Table 2, adjacent binary bits are allowed to share the same context. For example, binary bits 2 and 3 of an 8 × 8 TU share the same context.
Table 2: context assignment for last position coding
Tables 3-6 each illustrate other examples of context assignments according to the following rules:
1. the first K binary bits do not share context, where K > 1. K may be different for each TU size.
2. A context may only be assigned to consecutive binary bits. For example, binary bits 3 through 5 may use context 5. However, an assignment in which binary bits 3 and 5 use context 5 while binary bit 4 uses context 6 is not allowed.
3. The last N binary bits (N >= 0) of different TU sizes may share the same context.
4. The number of binary bits sharing the same context increases with TU size.
The above rules 1 through 4 may be particularly useful for the binarization provided in table 1. However, rules 1-4 may be equally useful for other binarization schemes, and the actual context assignment may be adjusted accordingly for the implemented binarization scheme.
| Binary bit index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| TU4x4 | 0 | 1 | 2 | | | | | | |
| TU8x8 | 3 | 4 | 5 | 6 | 7 | | | | |
| TU16x16 | 8 | 9 | 10 | 11 | 11 | 12 | 12 | | |
| TU32x32 | 13 | 14 | 14 | 15 | 16 | 16 | 16 | 16 | 17 |
Table 3: example of last position binary bits according to rules 1 to 4
Table 4: example of last position binary bits according to rules 1 to 4
Table 5: example of last position binary bits according to rules 1 to 4
Table 6: example of last position binary bits according to rules 1 to 4
Tables 7-8 below provide example context assignments in which the last binary bits from different block sizes share the same context, which may further optimize the number of contexts. In one example, a direct mapping may be used to determine how contexts are shared between the last binary bits of two or more block sizes. For example, for block A and block B having sizes M and N, respectively, the nth bin of block A may use the same context as the nth bin of block B.
Table 7: the block sizes share instances of the last position binary bit of the same context.
Table 8 shows another example of last position bits from some block sizes sharing context with each other. In this case, TUs having sizes 8 × 8 and 16 × 16 share the same context.
Table 8: some block sizes (8 x 8 and 16 x 16) share instances of the last position binary bit of the same context.
In another example, the context mapping between the last position bins of different block sizes may be derived using a function f. For example, the nth bin in block size A may share the same context as the mth bin in block size B, where m is a function of n (m = f(n)). For example, the function may be linear, i.e., m = n × a + b, where a and b are parameters of the linear function. Table 9 shows an example with a = 1, b = 1, A = 8 × 8 TU, and B = 16 × 16 TU.
Table 9: example of last position binary bit with shared context based on linear function
Note that when applying the above equations in certain cases, rounding may be involved due to integer operations. For example, 7 × 0.5 yields 3 rather than 3.5.
According to the following example, the mapping from position n in an 8 × 8 block to position m in a 4 × 4 block can be calculated using the following equation:
m = f(n) = n >> 1, which means that a = 0.5, b = 0, A = 8 × 8, B = 4 × 4    (1)
The mapping from position n in a 16 × 16 block to position m in a 4 × 4 block can be calculated using the following equation:
m = f(n) = n >> 2, which means that a = 0.25, b = 0, A = 16 × 16, B = 4 × 4    (2)
As described above, equations (1) and (2) are just two examples that may be used to implement the mapping between blocks of different sizes. Equations (1) and (2) may be referred to as mapping functions. Note that ">>" in equations (1) and (2) represents a shift operation, as defined according to video coding standards such as HEVC. In addition, other equations may be used to achieve the same mapping or different mappings.
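The linear mapping m = n × a + b, realized with integer shifts for the fractional slopes of equations (1) and (2), can be sketched as follows. The function name is hypothetical.

```python
def map_bin(n, a, b):
    """Map bin index n of block A to bin index m of block B using the
    linear rule m = n * a + b. Fractional slopes such as 0.5 and 0.25
    are realized with integer shifts, matching the shift-based
    equations above. Sketch only."""
    if a == 0.5:
        return (n >> 1) + b               # e.g., 8x8 -> 4x4
    if a == 0.25:
        return (n >> 2) + b               # e.g., 16x16 -> 4x4
    return n * a + b

# Every bin of an 8x8 block borrows the context of bin n >> 1 of the 4x4 block.
print([map_bin(n, 0.5, 0) for n in range(7)])   # [0, 0, 1, 1, 2, 2, 3]
```

Each pair of consecutive 8 × 8 bins thus shares one 4 × 4 context, which is how the shift implements the 0.5 slope without floating-point arithmetic.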
Table 10 provides example context assignments for the last significant coefficient for 4 x 4, 8 x 8, and 16 x 16 TUs according to equations (1) and (2).
Table 10: context mapping with transform units of different sizes
Table 11 provides an example of the context assignment of table 10 using different context index values (i.e., 15 to 17 instead of 0 to 2). As described above, the values of the context indices in tables 3-12 are not intended to limit the actual context assigned to the binary index.
Table 11: context mapping with transform units of different sizes
Note that the mapping of the context in table 11 is equivalent to the following mapping function:
ctx_index = (n >> k) + 15    (3)
where ctx_index is the index of the context;
n is the binary bit index;
k = log2TrafoDimension − 2;
log2TrafoDimension = log2(width), for the last position in the x-dimension;
log2TrafoDimension = log2(height), for the last position in the y-dimension.
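Mapping function (3) can be implemented directly, and doing so regenerates the shared-context pattern described for Table 11. The function name below is not from the text; the per-size bin counts follow the context-coded bin lengths used in the earlier tables.

```python
import math

def ctx_index(n, dimension):
    """Context index for bin n of a last-position component, per
    mapping function (3): ctx_index = (n >> k) + 15, where
    k = log2(dimension) - 2. Sketch reproducing Table 11's sharing."""
    k = int(math.log2(dimension)) - 2
    return (n >> k) + 15

# Context-coded bin counts assumed per TU size: 3, 5, and 7.
for size, nbins in ((4, 3), (8, 5), (16, 7)):
    print(size, [ctx_index(n, size) for n in range(nbins)])
```

All three TU sizes thus draw from the same three contexts (15 through 17), with larger TUs simply assigning more consecutive bins to each context.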
In some cases, the functions defined in (1) through (3) may be used by a coding device to create a series of tables that may be stored in memory and used to look up context assignments. In some cases, the tables may be predetermined based on the equations and rules described herein and stored in video encoder 20 and video decoder 30.
Additionally, in some examples, the functions (1) through (3) defined above may be selectively applied to assign contexts to particular binary bits. In this way, contexts may be assigned to different binary bits based on different rules. In one example, functions such as those described above may be applied only to binary bit indices (i.e., values of n) that are less than a threshold Th1 and/or greater than a threshold Th2. Table 12 shows an example of selectively applying the mapping technique described above based on the value of the binary bit index (i.e., for n > Th2 = 2).
Table 12: example of last position binary bit with shared context based on linear function and threshold
In another example, threshold values for applying techniques to the binary bit indices may be different for different block sizes, different frame types, different color components (Y, U, V), and/or other side information. This threshold may be predefined according to the video coding standard or may be signaled using a high level syntax. For example, a threshold may be signaled in a Sequence Parameter Set (SPS), Picture Parameter Set (PPS), Adaptation Parameter Set (APS), and/or slice header.
In another example, the mapping function may be different for different block sizes, different frame types, different color components (Y, U and V), and/or other side information. The mapping function may be predefined according to a video coding standard or may be signaled using a high level syntax. For example, the mapping function may be signaled in an SPS, PPS, APS, and/or slice header.
In another example, the direct mapping and functional mapping techniques described above may be adaptively applied based on color components, frame type, Quantization Parameter (QP), and/or other side information. For example, direct mapping or functional mapping techniques may be applied to only the chroma components. This adaptive rule may be predefined or may be signaled using a high level syntax. For example, the adaptive rules may be signaled in SPS, PPS, APS, and/or slice headers.
In another example, the last position binary bits of the chroma component and the luma component may share the same context. This may apply to any block size, e.g., 4 × 4, 8 × 8, 16 × 16, or 32 × 32. Table 13 shows an example of a last position binary bit sharing context for luma and chroma components of a 4 x 4 TU.
| Binary bit index | 0 | 1 | 2 | 3 |
| Lightness TU4 × 4 | 0 | 1 | 2 | |
| Chroma TU4 × 4 | 0 | 1 | 2 |
Table 13: examples of last position binary bits for luma and chroma components in a 4 x 4TU
Fig. 4 is a block diagram illustrating an example of a video encoder 20 that may use techniques for coding transform coefficients as described in this disclosure. For example, video encoder 20 represents an example of a video encoder configured to: obtaining a binary string indicating a position of a last significant coefficient within the video block; determining a context of a binary index of a binary string based on a video block size, wherein a context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and encoding the binary string using CABAC based at least in part on the determined context. For purposes of illustration, but without limiting this disclosure with respect to other coding standards or methods that may require scanning of transform coefficients, video encoder 20 will be described in the context of HEVC coding. Video encoder 20 may perform intra-coding and inter-coding of CUs within video frames. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video data within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy between a current frame and a previously coded frame of a video sequence. Intra mode (I-mode) may refer to any of a number of spatially based video compression modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of temporally-based video compression modes.
As shown in fig. 4, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 4, video encoder 20 includes a mode selection module 40, a motion estimation module 42, a motion compensation module 44, an intra-prediction module 46, a reference frame buffer 64, a summer 50, a transform module 52, a quantization module 54, and an entropy encoding module 56. Transform module 52 illustrated in fig. 4 is a module that applies the actual transform or a combination of transforms to a block of residual data and should not be confused with blocks of transform coefficients, which may also be referred to as Transform Units (TUs) of a CU. For video block reconstruction, video encoder 20 also includes an inverse quantization module 58, an inverse transform module 60, and a summer 62. A deblocking filter (not shown in fig. 4) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 if desired.
During the encoding process, video encoder 20 receives a video frame or slice to be coded. A frame or slice may be divided into multiple video blocks, e.g., Largest Coding Units (LCUs). Motion estimation module 42 and motion compensation module 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal compression. Intra-prediction module 46 may perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial compression.
Mode selection module 40 may select one of the coding modes (intra-mode or inter-mode), e.g., based on the error (i.e., distortion) result for each mode, and provide the resulting intra-or inter-predicted block (e.g., Prediction Unit (PU)) to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use in the reference frame. Summer 62 combines the predicted block with the inverse-quantized, inverse-transformed data for the block from inverse transform module 60 to reconstruct the encoded block, as described in more detail below. Some video frames may be designated as I-frames, where all blocks in an I-frame are encoded in intra-prediction mode. In some cases, intra-prediction module 46 may perform intra-prediction encoding of blocks in P-frames or B-frames, for example, when the motion search performed by motion estimation module 42 does not produce sufficient prediction for the block.
Motion estimation module 42 and motion compensation module 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation (or motion search) is a process that generates motion vectors that estimate the motion of video blocks. A motion vector, for example, may indicate the displacement of a prediction unit in a current frame relative to reference samples of a reference frame. Motion estimation module 42 calculates motion vectors for prediction units of inter-coded frames by comparing the prediction units to reference samples of reference frames stored in reference frame buffer 64. The reference samples may be blocks found to closely match the portion of the CU that includes the PU being coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. The reference samples may occur anywhere within a reference frame or reference slice, and not necessarily at block (e.g., coding unit) boundaries of the reference frame or slice. In some examples, the reference sample may occur at a fractional pixel location.
Motion estimation module 42 sends the calculated motion vectors to entropy encoding module 56 and motion compensation module 44. The portion of the reference frame identified by the motion vector may be referred to as a reference sample. Motion compensation module 44 may calculate a prediction value for the prediction unit of the current CU, e.g., by retrieving a reference sample identified by the motion vector for the PU.
As an alternative to inter prediction performed by motion estimation module 42 and motion compensation module 44, intra prediction module 46 may perform intra prediction on the received block. Intra-prediction module 46 may predict the received block relative to neighboring previously coded blocks (e.g., blocks above, above-right, above-left, or to the left of the current block assuming left-to-right, top-to-bottom encoding order of the block). Intra-prediction module 46 may be configured with a variety of different intra-prediction modes. For example, the intra-prediction module 46 may be configured to have a certain number of directional prediction modes, e.g., thirty-four directional prediction modes, based on the size of the CU being encoded.
Intra-prediction module 46 may select the intra-prediction mode by, for example, calculating error values for various intra-prediction modes and selecting the mode that yields the lowest error value. The directional prediction mode may include functions for combining values of spatially neighboring pixels and applying the combined values to one or more pixel locations in the PU. Once the values for all pixel locations in the PU have been calculated, intra-prediction module 46 may calculate an error value for the prediction mode based on the pixel differences between the PU and the received block to be encoded. Intra-prediction module 46 may continue to test intra-prediction modes until an intra-prediction mode is found that yields an acceptable error value. Intra-prediction module 46 may then send the PU to summer 50.
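The mode-selection loop described above can be sketched as follows, with a hypothetical `predict_fns` mapping standing in for the encoder's directional prediction modes and SAD standing in for the error test:

```python
def select_intra_mode(predict_fns, target_block):
    # predict_fns: hypothetical mapping of mode id -> function returning a
    # predicted block. Pick the mode whose prediction has the lowest SAD
    # against the block to be encoded (a stand-in for the error calculation).
    def sad(a, b):
        return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(predict_fns, key=lambda m: sad(predict_fns[m](), target_block))
```

An encoder may also stop early once a mode yields an acceptable error value, as the text notes, rather than exhaustively testing every mode.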
Video encoder 20 forms a residual block by subtracting prediction data calculated by motion compensation module 44 or intra-prediction module 46 from the original video block being coded. Summer 50 represents one or more components that perform this subtraction operation. The residual block may correspond to a two-dimensional matrix of pixel difference values, where the number of values in the residual block is the same as the number of pixels in the PU corresponding to the residual block. The values in the residual block may correspond to the difference (i.e., error) between the values of the co-located pixels in the PU and in the original block to be coded. The difference may be a chroma difference or a luma difference depending on the type of block being coded.
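A minimal sketch of the subtraction performed by summer 50, assuming blocks are represented as nested lists of co-located pixel values:

```python
def residual_block(original, prediction):
    # Pixel-wise difference between co-located pixels of the original block
    # and the prediction block; the result has the same dimensions as the PU.
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]
```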
Transform module 52 may form one or more Transform Units (TUs) from the residual blocks. The transform module 52 selects a transform from a plurality of transforms. The transform may be selected based on one or more coding characteristics, such as block size, coding mode, or the like. Transform module 52 then applies the selected transform to the TU, producing a video block comprising a two-dimensional array of transform coefficients.
Transform module 52 may send the resulting transform coefficients to quantization module 54. Quantization module 54 may then quantize the transform coefficients. Entropy encoding module 56 may then perform a scan of the quantized transform coefficients in the matrix according to a scan pattern. This disclosure describes the entropy encoding module 56 as performing a scan. However, it should be understood that in other examples, other processing modules, such as quantization module 54, may perform the scan.
Inverse quantization module 58 and inverse transform module 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation module 44 may calculate the reference block by adding the residual block to a predictive block of one of the frames of reference frame buffer 64. The reference frame buffer 64 is sometimes referred to as a Decoded Picture Buffer (DPB). Motion compensation module 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation module 44 to produce a reconstructed video block for storage in reference frame buffer 64. The reconstructed video block may be used by motion estimation module 42 and motion compensation module 44 as a reference block to inter-code a block in a subsequent video frame.
Once the transform coefficients are scanned into a one-dimensional array, entropy encoding module 56 may apply entropy coding, e.g., CAVLC, CABAC, SBAC, PIPE, or another entropy coding method, to the coefficients. In some cases, entropy encoding module 56 may also be configured to perform other coding functions in addition to entropy coding. For example, entropy encoding module 56 may be configured to determine Coded Block Pattern (CBP) values for the CU and the PU. Also, in some cases, entropy encoding module 56 may perform run-length coding of the coefficients. After entropy coding by entropy encoding module 56, the resulting encoded video may be transmitted to another device, such as video decoder 30, or archived for later transmission or retrieval.
In accordance with the techniques of this disclosure, entropy encoding module 56 may select a context to use to encode a syntax element based on, for example, the context assignments described above with respect to tables 2-13 and any combination of: the direction of intra prediction for an intra prediction mode, the scan position of the coefficients corresponding to the syntax elements, the block type, the transform type, and/or other video sequence properties.
In one example, entropy encoding module 56 may encode the position of the last significant coefficient using the binarization techniques employed in HEVC described above with respect to table 1. In other examples, entropy encoding module 56 may use other binarization techniques to encode the position of the last significant coefficient. In one example, the codeword for the position of the last significant coefficient may comprise a truncated unary code prefix followed by a fixed length code suffix. In one example, each magnitude of the last position may use the same binarization for all possible TU sizes, except when the last position is equal to the TU size minus 1. This exception is due to the nature of truncated unary coding. In one example, the location of the last significant coefficient within a rectangular transform coefficient block may be specified by an x-coordinate value and a y-coordinate value. In another example, the transform coefficient block may be in the form of a 1 × N vector, and the position of the last significant coefficient within the vector may be specified by a single position value.
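A sketch of the truncated unary prefix and fixed length suffix behavior described above (the `c_max` and `n_bits` parameters are illustrative; the HEVC-specified derivation of these values is not reproduced here):

```python
def truncated_unary(value, c_max):
    # Emit 'value' ones followed by a terminating zero; when value == c_max
    # the terminating zero is omitted. This is why a last position equal to
    # the TU size minus 1 is the one case whose binarization differs.
    bits = [1] * value
    if value < c_max:
        bits.append(0)
    return bits

def fixed_length(value, n_bits):
    # Fixed length suffix: 'value' written MSB-first using n_bits binary digits.
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
```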
FIG. 5 is a block diagram illustrating an example entropy encoding module 56 that may implement the techniques described in this disclosure. In one example, entropy encoding module 56 illustrated in fig. 5 may be a CABAC encoder. Example entropy encoding module 56 may include a binarization module 502, an arithmetic encoding module 510 that includes a bypass encoding engine 504 and a regular encoding engine 508, and a context modeling module 506. Entropy encoding module 56 receives syntax elements, such as one or more syntax elements representing the position of the last significant transform coefficient within a block of transform coefficients, and encodes the syntax elements into the bitstream. The syntax elements may include a syntax element that specifies an x-coordinate of the location of the last significant coefficient within the transform coefficient block and a syntax element that specifies a y-coordinate of that location.
Binarization module 502 receives a syntax element and generates a bin string (i.e., a string of binary values). In one example, binarization module 502 receives a syntax element representing the position of the last significant coefficient within a block of transform coefficients and generates a bin string according to the example described above with respect to table 1. Arithmetic encoding module 510 receives the bin string from binarization module 502 and performs arithmetic encoding on it. As shown in fig. 5, arithmetic encoding module 510 may receive bin values from a bypass path or a regular coding path. Consistent with the CABAC process described above, in the case where arithmetic encoding module 510 receives bin values from the bypass path, bypass encoding engine 504 may perform arithmetic encoding on the bin values without utilizing a context assigned to them. In one example, bypass encoding engine 504 may assume an equal probability for the possible values of a bin.
In the case where arithmetic encoding module 510 receives bin values via the regular path, context modeling module 506 may provide context variables (e.g., context states) such that regular encoding engine 508 may perform arithmetic encoding based on the context assignments provided by context modeling module 506. In one example, arithmetic encoding module 510 may encode the prefix portion of a bin string using context assignments and may encode the suffix portion of the bin string without using context assignments. The context assignments may be defined according to the examples described above with respect to tables 2-13. The context models may be stored in a memory. Context modeling module 506 may include a series of indexed tables and/or utilize mapping functions to determine the context and context variables for a particular bin. After encoding a bin value, regular encoding engine 508 may update the context based on the actual bin value and output the encoded bin value as part of the bitstream. In this manner, entropy encoding module 56 is configured to encode one or more syntax elements based on the context assignment techniques described herein.
FIG. 6 is a flow diagram illustrating an example method for determining a context of a binary string value indicating a position of a last significant coefficient, in accordance with the techniques of this disclosure. The method described in fig. 6 may be performed by any of the example video encoders or entropy encoders described herein. At step 602, a binary string is obtained indicating the position of the last significant transform coefficient within a video block. The binary string may be defined according to the binarization scheme described with respect to table 1. At step 604, a context for a bin value of the binary string is determined. Contexts may be assigned to bins based on the techniques described herein. The context may be determined by the video encoder or the entropy encoder accessing a lookup table or performing a mapping function. The context may be used to derive a particular context variable for a particular bin. The context variable may be a 7-bit binary value indicating one of 64 possible probability states and the most probable value (e.g., "1" or "0"). As described above, in some cases, bins may share a context according to the mapping function and tables 2-13 described above. At step 606, the bin values are encoded using an arithmetic coding process, such as CABAC, that utilizes the context variables. Note that when bins share a context, the value of one bin may affect the value of the context variable used to encode a subsequent bin according to the context adaptive encoding technique. For example, if a particular bin is a "1," a subsequent bin may be encoded based on an increased probability that the subsequent bin is also a "1." In this way, entropy encoding the binary string may include updating the context state of the context model.
Additionally, note that in some examples, the context model may be initialized at the slice level such that the value of a bin within a slice may not affect the encoding of a bin within a subsequent slice.
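The lookup table or mapping function mentioned above can be illustrated with the context index function recited in the claims, where n is the bin index within the prefix and T is the transform block size; the specific bin indices used in the example below are illustrative:

```python
import math

def ctx_index(n, T):
    # Context index for bin n of the last-position prefix in a T x T transform
    # block, per the function recited in the claims: ctx_index =
    # (n >> (log2(T) - 2)) + 15. Bins from blocks of different sizes can map
    # to the same index, which is how two bin indices share one context.
    return (n >> (int(math.log2(T)) - 2)) + 15

# For example, bin index 6 of a 16x16 block and bin index 8 of a 32x32 block
# map to the same context index, so those bins share a context.
shared = ctx_index(6, 16) == ctx_index(8, 32)
```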
Fig. 7 is a block diagram illustrating an example of a video decoder 30 that may use techniques for coding transform coefficients as described in this disclosure. For example, video decoder 30 represents an example of a video decoder configured to: obtaining an encoded binary string indicating a position of a last significant coefficient within a video block, wherein the encoded binary string is encoded using CABAC; determining a context of a binary index of an encoded binary string based on a video block size, wherein the context is assigned to at least two binary indexes, wherein each of the at least two binary indexes is associated with a different video block size; and decoding the encoded binary string using CABAC based at least in part on the determined context.
In the example of fig. 7, video decoder 30 includes an entropy decoding module 70, a motion compensation module 72, an intra-prediction module 74, an inverse quantization module 76, an inverse transform module 78, a reference frame buffer 82, and a summer 80. In some examples, video decoder 30 may perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20.
Entropy decoding module 70 performs an entropy decoding process on the encoded bitstream to retrieve a one-dimensional array of transform coefficients. The entropy decoding process used depends on the entropy coding (e.g., CABAC, CAVLC, etc.) used by video encoder 20. The entropy coding process used by the encoder may be signaled in the encoded bitstream, or may be a predetermined process.
In some examples, entropy decoding module 70 (or inverse quantization module 76) may scan the received values using a scan that mirrors the scan pattern used by entropy encoding module 56 (or quantization module 54) of video encoder 20. Although the scanning of coefficients may be performed in inverse quantization module 76, the scanning will be described as being performed by entropy decoding module 70 for purposes of illustration. Moreover, although shown as separate functional modules for ease of illustration, the structure and functionality of entropy decoding module 70, inverse quantization module 76, and other modules of video decoder 30 may be highly integrated with one another.
Inverse quantization module 76 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding module 70. The inverse quantization process may include conventional processes, e.g., similar to the processes proposed for HEVC or defined by the h.264 decoding standard. The inverse quantization process may include using a quantization parameter QP calculated by video encoder 20 for the CU to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse quantization module 76 may inverse quantize the transform coefficients before or after converting the coefficients from a one-dimensional array to a two-dimensional array.
The inverse transform module 78 applies an inverse transform to the inverse quantized transform coefficients. In some examples, inverse transform module 78 may determine the inverse transform based on signaling from video encoder 20 or by inferring the transform from one or more coding characteristics (e.g., block size, coding mode, or the like). In some examples, inverse transform module 78 may determine the transform to apply to the current block based on the signaled transform at the root node of the quadtree for the LCU that includes the current block. Alternatively, the transform may be signaled at the root of a TU quadtree of a leaf node CU in the LCU quadtree. In some examples, the inverse transform module 78 may apply a cascaded inverse transform, wherein the inverse transform module 78 applies two or more inverse transforms to the transform coefficients of the current block being decoded.
Intra-prediction module 74 may generate prediction data for a current block of a current frame based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame. Motion compensation module 72 may retrieve the motion vector, motion prediction direction, and reference index from the encoded bitstream. The motion prediction direction indicates whether the inter prediction mode is unidirectional (e.g., a P frame) or bidirectional (e.g., a B frame). The reference index indicates which reference frame the motion vector is based on. Based on the retrieved motion prediction direction, reference frame index, and motion vector, motion compensation module 72 generates a motion compensated block for the current portion. These motion compensated blocks essentially recreate the predictive blocks used to generate the residual data.
Motion compensation module 72 may generate motion compensated blocks, possibly performing interpolation based on interpolation filters. An identifier of an interpolation filter to be used for motion estimation with sub-pixel precision may be included in the syntax element. Motion compensation module 72 may calculate interpolation values for sub-integer pixels of the reference block using interpolation filters as used by video encoder 20 during encoding of the video block. Motion compensation module 72 may determine the interpolation filters used by video encoder 20 from the received syntax information and use the interpolation filters to generate the predictive blocks.
Additionally, in the HEVC example, motion compensation module 72 and intra-prediction module 74 may use some of the syntax information (e.g., provided by a quadtree) to determine the size of the LCU used to encode the frame(s) of the encoded video sequence. Motion compensation module 72 and intra-prediction module 74 may also use syntax information to determine splitting information that describes how each CU of a frame of the encoded video sequence (and, likewise, each sub-CU) is split. The syntax information may also include a mode indicating how each split is encoded (e.g., intra-prediction or inter-prediction, and for intra-prediction, an intra-prediction encoding mode), one or more reference frames for each inter-encoded PU (and/or a reference list containing identifiers for the reference frames), and other information used to decode the encoded video sequence.
Summer 80 combines the residual block with the corresponding prediction block generated by motion compensation module 72 or intra-prediction module 74 to form a decoded block. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blocking artifacts. The decoded video blocks are then stored in reference frame buffer 82, which provides reference blocks for subsequent motion compensation and also provides decoded video for presentation on a display device (e.g., display device 32 of fig. 1). The reference frame buffer 82 may also be referred to as a DPB.
FIG. 8 is a block diagram illustrating an example entropy decoding module 70 that may implement the techniques described in this disclosure. Entropy decoding module 70 receives an entropy encoded bitstream and decodes syntax elements from the bitstream. In one example, the syntax elements may represent the position of the last significant transform coefficient within a block of transform coefficients. The syntax elements may include a syntax element that specifies an x-coordinate of the location of the last significant coefficient within the transform coefficient block and a syntax element that specifies a y-coordinate of that location. In one example, entropy decoding module 70 illustrated in fig. 8 may be a CABAC decoder. The example entropy decoding module 70 in fig. 8 includes an arithmetic decoding module 702, which may include a bypass decoding engine 704 and a regular decoding engine 706. The example entropy decoding module 70 also includes a context modeling module 708 and an inverse binarization module 710. Example entropy decoding module 70 may perform the reciprocal function of example entropy encoding module 56 described with respect to fig. 5. In this manner, entropy decoding module 70 may perform entropy decoding based on the context assignment techniques described herein.
Arithmetic decoding module 702 receives the encoded bitstream. As shown in fig. 8, arithmetic decoding module 702 may process encoded bin values according to a bypass path or a regular coding path. An indication of whether encoded bin values should be processed according to the bypass path or the regular path may be signaled in the bitstream with higher level syntax. Consistent with the CABAC process described above, in the case where arithmetic decoding module 702 receives bin values from the bypass path, bypass decoding engine 704 may perform arithmetic decoding on the bin values without utilizing a context assigned to them. In one example, bypass decoding engine 704 may assume an equal probability for the possible values of a bin.
In the case where arithmetic decoding module 702 receives bin values via the regular path, context modeling module 708 may provide context variables such that regular decoding engine 706 may perform arithmetic decoding based on the context assignments provided by context modeling module 708. The context assignments may be defined according to the examples described above with respect to tables 2-13. The context models may be stored in a memory. Context modeling module 708 may include a series of indexed tables and/or utilize mapping functions to determine the context and context variables for particular portions of the encoded bitstream. After decoding a bin value, regular decoding engine 706 may update the context based on the decoded bin value. In addition, inverse binarization module 710 may perform inverse binarization on the bin values and use a bin matching function to determine whether a bin string is valid. Inverse binarization module 710 may also update context modeling module 708 based on the match determination. Thus, inverse binarization module 710 outputs the syntax elements according to the context adaptive decoding technique. In this manner, entropy decoding module 70 is configured to decode one or more syntax elements based on the context assignment techniques described herein.
Fig. 9 is a flow diagram illustrating an example method for determining, from a binary string, a value indicating the location of the last significant coefficient within a transform coefficient block, in accordance with the techniques of this disclosure. The method described in fig. 9 may be performed by any of the example video decoders or entropy decoding units described herein. At step 902, an encoded bitstream is obtained. The encoded bitstream may be retrieved from memory or received via transmission. The encoded bitstream may be encoded according to a CABAC encoding process or another entropy coding process. At step 904, a context for a portion of an encoded binary string is determined. Contexts may be assigned to coded bins based on the techniques described herein. The context may be determined by the video decoder or entropy decoder accessing a lookup table or performing a mapping function. The context may be determined based on higher level syntax provided in the encoded bitstream. The context may be used to derive a particular context variable for a particular encoded bin. As described above, the context variable may be a 7-bit binary value indicating one of 64 possible probability states and the most probable value (e.g., "1" or "0"), and in some cases, bins may share a context. At step 906, the binary string is decoded using an arithmetic decoding process, such as CABAC, that utilizes the context variables. The bin string may be decoded bin by bin, with the context model being updated after each bin is decoded. The decoded binary string yields syntax elements that are further used to decode transform coefficients associated with the encoded video data. In this manner, utilizing the techniques described above to assign contexts to particular bins may provide efficient decoding of encoded video data.
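The bin-by-bin context update described above can be illustrated with a toy adaptive model; this uses simple frequency counts rather than CABAC's 7-bit state machine and is purely illustrative:

```python
class ToyContext:
    # A simplified stand-in for a CABAC context: it tracks an estimated
    # probability of '1' and is updated after every bin, so earlier bin
    # values influence how later bins sharing this context are coded.
    def __init__(self):
        self.ones = 1
        self.total = 2  # Laplace-smoothed counts

    def prob_one(self):
        return self.ones / self.total

    def update(self, bin_value):
        self.ones += bin_value
        self.total += 1

ctx = ToyContext()
for b in [1, 1, 1, 0]:
    ctx.update(b)
# After observing mostly ones, the model's estimate that the next bin is '1'
# has risen above one half.
```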
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but relate to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but such components do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.
Claims (20)
1. A method of encoding transform coefficients of a video block, the method comprising:
determining a binary string indicating a location of a last significant coefficient within a transform block of transform coefficients associated with the video block, the binary string comprising a prefix portion and a suffix portion;
determining a context for a binary index in the prefix portion of the binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the following function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size; and
encoding the binary string using Context Adaptive Binary Arithmetic Coding (CABAC) based at least in part on the determined context.
2. The method of claim 1, wherein the context is assigned to a last binary index of a 16 x 16 transform block and a last binary index of a 32 x 32 transform block.
3. The method of claim 2, wherein a second context is assigned to a neighboring binary index of the 16 x 16 transform block.
4. The method of claim 1, wherein encoding the binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the binary string, and wherein the transform block is a first transform block, the method further comprising:
determining a second binary string indicating a position of a last significant coefficient within a second transform block, wherein the first transform block and the second transform block are different sizes; and
encoding the second binary string using CABAC based at least in part on the updated context.
5. A method of decoding transform coefficients for a video block, the method comprising:
obtaining an encoded binary string indicating a position of a last significant coefficient within a transform block of transform coefficients associated with the video block, the encoded binary string comprising a prefix portion and a suffix portion, wherein the encoded binary string is encoded using Context Adaptive Binary Arithmetic Coding (CABAC);
determining a context for a binary index in the prefix portion of the encoded binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size;
decoding the encoded binary string using CABAC based at least in part on the determined context; and
determining the position of the last significant coefficient in a transform block of the transform coefficients associated with the video block based on a decoded binary string.
6. The method of claim 5, wherein the context is assigned to a last binary index of a 16 x 16 transform block and a last binary index of a 32 x 32 transform block.
7. The method of claim 6, wherein a second context is assigned to a neighboring binary index of the 16 x 16 transform block.
8. The method of claim 5, wherein decoding the encoded binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the encoded binary string, and wherein the transform block is a first transform block; and further comprising:
obtaining a second encoded binary string indicating a position of a last significant coefficient within a second transform block, wherein a size of the first transform block and the second transform block are different; and
decoding the second encoded binary string using CABAC based at least in part on the updated context.
9. An apparatus configured to encode transform coefficients for a video block, the apparatus comprising:
means for determining a binary string indicating a location of a last significant coefficient within a transform block of transform coefficients associated with the video block, the binary string comprising a prefix portion and a suffix portion;
means for determining a context for a binary index in the prefix portion of the binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size; and
means for encoding the binary string using Context Adaptive Binary Arithmetic Coding (CABAC) based at least in part on the determined context.
10. The apparatus of claim 9, wherein encoding the binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the binary string, and wherein the transform block is a first transform block; and further comprising:
means for determining a second binary string indicating a position of a last significant coefficient within a second transform block, wherein the first transform block and the second transform block are different sizes; and
means for entropy coding the second binary string using CABAC based at least in part on the updated context.
11. An apparatus configured to decode transform coefficients for a video block, the apparatus comprising:
means for obtaining an encoded binary string indicating a position of a last significant coefficient within a transform block of transform coefficients associated with the video block, the encoded binary string comprising a prefix portion and a suffix portion, wherein the encoded binary string is encoded using Context Adaptive Binary Arithmetic Coding (CABAC);
means for determining a context for a binary index in the prefix portion of the encoded binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size;
means for decoding the encoded binary string using CABAC based at least in part on the determined context; and
means for determining the position of the last significant coefficient in a transform block of the transform coefficients associated with the video block based on a decoded binary string.
12. The apparatus of claim 11, wherein decoding the encoded binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the encoded binary string, and wherein the transform block is a first transform block; and further comprising:
means for obtaining a second encoded binary string indicating a position of a last significant coefficient within a second transform block, wherein the first transform block and the second transform block are different sizes; and
means for decoding the second encoded binary string using CABAC based at least in part on the updated context.
13. A device configured to encode transform coefficients for a video block, the device comprising:
a memory configured to store the transform coefficients; and
a video encoder configured to:
determining a binary string indicating a location of a last significant coefficient within a transform block of transform coefficients associated with the video block, the binary string comprising a prefix portion and a suffix portion;
determining a context for a binary index in the prefix portion of the binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size; and
encoding the binary string using Context Adaptive Binary Arithmetic Coding (CABAC) based at least in part on the determined context.
14. The device of claim 13, wherein the context is assigned to a last binary index of a 16 x 16 transform block and a last binary index of a 32 x 32 transform block.
15. The device of claim 14, wherein a second context is assigned to a neighboring binary index of the 16 x 16 transform block.
16. The device of claim 13, wherein encoding the binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the binary string, and wherein the transform block is a first transform block; and wherein the video encoder is further configured to:
determining a second binary string indicating a position of a last significant coefficient within a second transform block, wherein the first transform block and the second transform block are different sizes; and
encoding the second binary string using CABAC based at least in part on the updated context.
17. A device configured to decode transform coefficients for a video block, the device comprising:
a memory configured to store the transform coefficients; and
a video decoder configured to:
obtaining an encoded binary string indicating a position of a last significant coefficient within a transform block of the transform coefficients associated with the video block, the encoded binary string comprising a prefix portion and a suffix portion, wherein the encoded binary string is encoded using Context Adaptive Binary Arithmetic Coding (CABAC);
determining a context for a binary index in the prefix portion of the encoded binary string based on a size of the transform block, wherein the context is assigned to at least two binary indices according to a context index ctx_index defined by the function:
ctx_index=(n>>(log2(T)-2))+15,
wherein n is a binary index, > > represents a binary shift operation, and T is a size of the transform block, and wherein each of the at least two binary indices is associated with a different transform block size;
decoding the encoded binary string using CABAC based at least in part on the determined context; and
determining the position of the last significant coefficient in a transform block of the transform coefficients associated with the video block based on a decoded binary string.
18. The device of claim 17, wherein the context is assigned to a last binary index of a 16 x 16 transform block and a last binary index of a 32 x 32 transform block.
19. The device of claim 18, wherein a second context is assigned to a neighboring binary index of the 16 x 16 transform block.
20. The device of claim 17, wherein decoding the encoded binary string using CABAC based at least in part on the determined context includes updating the context based on a value of the encoded binary string, and wherein the transform block is a first transform block; and wherein the video decoder is further configured to:
obtaining a second encoded binary string indicating a position of a last significant coefficient within a second transform block, wherein the first transform block and the second transform block are different sizes; and
decoding the second encoded binary string using CABAC based at least in part on the updated context.
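The context-assignment rule recited throughout the claims can be sketched as follows. This is an illustrative reconstruction, not the patent's reference implementation; the per-block-size prefix bin counts used below (a 7-bin truncated-unary prefix for a 16 x 16 block, a 9-bin prefix for a 32 x 32 block) are assumptions based on HEVC-style binarization of the last-position prefix and are not stated in the claims themselves.

```python
def ctx_index(n: int, T: int) -> int:
    """Context index for prefix bin n of a T x T transform block,
    per the claimed function ctx_index = (n >> (log2(T) - 2)) + 15."""
    log2_T = T.bit_length() - 1  # T is assumed to be a power of two
    return (n >> (log2_T - 2)) + 15

# The last prefix bin of a 16 x 16 block (bin index 6, assuming a
# 7-bin prefix) and the last prefix bin of a 32 x 32 block (bin
# index 8, assuming a 9-bin prefix) map to the same context index,
# illustrating the sharing recited in claims 6, 14, and 18.
shared = ctx_index(6, 16)          # (6 >> 2) + 15 = 16
assert shared == ctx_index(8, 32)  # (8 >> 3) + 15 = 16
```

Because the shift amount log2(T) - 2 grows with the block size, bins of larger blocks are grouped more coarsely, which is what lets one context serve the final bins of two different block sizes.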
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/557,317 | 2011-11-08 | | |
| US61/561,909 | 2011-11-20 | | |
| US61/588,579 | 2012-01-19 | | |
| US61/596,049 | 2012-02-07 | | |
| US13/669,096 | 2012-11-05 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1197951A (en) | 2015-02-27 |
| HK1197951B (en) | 2018-10-19 |
Similar Documents
| Publication | Title |
|---|---|
| JP5955974B2 (en) | Context optimization to encode the position of the last significant coefficient |
| US9621921B2 | Coefficient groups and coefficient coding for coefficient scans |
| US9451287B2 | Context reduction for context adaptive binary arithmetic coding |
| US9462275B2 | Residual quad tree (RQT) coding for video coding |
| US9491463B2 | Group flag in transform coefficient coding for video coding |
| JP2015533061A | Context derivation for context-adaptive, multilevel significance coding |
| HK1197951B (en) | Context optimization for last significant coefficient position coding |
| HK40003889A | Decoding of position of last significant coefficient |
| HK40003889B | Decoding of position of last significant coefficient |
| HK1197951A (en) | Context optimization for last significant coefficient position coding |