HK1199995B

HK1199995B - Signaling quantization matrices for video coding

Info

Publication number: HK1199995B
Application number: HK15100255.3A
Authority: HK
Inventors: Rajan Laxman Joshi; Marta Karczewicz
Original assignee: Qualcomm Incorporated
Priority date: 2011-11-07
Filing date: 2012-11-07
Publication date: 2018-08-03

Description

Signaling quantization matrices for video coding

This application claims the benefit of U.S. Provisional Application No. 61/556,785, filed November 7, 2011; U.S. Provisional Application No. 61/594,885, filed February 3, 2012; U.S. Provisional Application No. 61/597,107, filed February 9, 2012; and U.S. Provisional Application No. 61/605,654, filed March 1, 2012, the entire contents of each of which are incorporated herein by reference.

Technical Field

This disclosure relates to data coding, and more particularly, to techniques for coding video data.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 (Advanced Video Coding (AVC)), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of these standards, to transmit, receive, and store digital video information more efficiently.

Video compression techniques include spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into multiple blocks. Each block may be further partitioned. Blocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same frame or slice. Blocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to reference samples in neighboring blocks in the same frame or slice, or temporal prediction with respect to reference samples in other reference frames. Spatial or temporal prediction generates a predictive block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block.

An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data that indicates the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients that may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in a particular order to produce a one-dimensional vector of transform coefficients for entropy coding.

Disclosure of Invention

In general, this disclosure describes signaling the values of a quantization matrix. For example, a video encoder may divide values of a quantization matrix into at least a first subset of values and a second subset of values. A video encoder may encode and signal the values of the first subset as syntax elements. A video decoder may receive syntax elements for the values of the first subset and decode the syntax elements to generate the values of the first subset. Without receiving the values of the second subset, a video decoder may predict the values of the second subset from the values in the first subset.

In one example of this disclosure, a method of encoding video data comprises: generating a quantization matrix comprising a plurality of values; down-sampling a first set of values in the quantization matrix by a first down-sampling factor to generate a first set of down-sampled values; down-sampling a second set of values in the quantization matrix by a second down-sampling factor to generate a second set of down-sampled values; and generating a coded bitstream that includes the first set of down-sampled values and the second set of down-sampled values.
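The encoding method above may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: the split of the matrix into a low-frequency top-left square (the first set) and the remainder (the second set), the function names, and the use of a rounded block average as the down-sampling operation are all hypothetical choices.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def block_mean(qm: Matrix, r: int, c: int, f: int) -> int:
    """Rounded mean of the f x f block whose top-left corner is (r, c)."""
    total = sum(qm[i][j] for i in range(r, r + f) for j in range(c, c + f))
    return (total + (f * f) // 2) // (f * f)  # assumes non-negative entries


def downsample_two_sets(qm: Matrix, split: int, f1: int,
                        f2: int) -> Tuple[List[int], List[int]]:
    """Hypothetical two-factor down-sampling of a square quantization matrix.

    First set: the split x split low-frequency corner, down-sampled by f1.
    Second set: every other position, down-sampled by f2.
    Each output list is in raster order of its block origins.
    """
    n = len(qm)
    assert split % f1 == 0 and split % f2 == 0 and n % f2 == 0
    first = [block_mean(qm, r, c, f1)
             for r in range(0, split, f1) for c in range(0, split, f1)]
    second = [block_mean(qm, r, c, f2)
              for r in range(0, n, f2) for c in range(0, n, f2)
              if not (r < split and c < split)]  # skip the first-set corner
    return first, second


qm = [[16 + i + j for j in range(8)] for i in range(8)]
first, second = downsample_two_sets(qm, split=4, f1=1, f2=2)
print(len(first) + len(second), "values instead of", 8 * 8)  # 28 instead of 64
```

Keeping the low-frequency corner at full resolution (f1 = 1) reflects the idea that those entries matter most perceptually, while coarser sampling of the high-frequency region saves most of the bits.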

In another example of this disclosure, a method of decoding video data comprises: receiving, in a coded bitstream, a quantization matrix coded with downsampled values; upsampling a first set of downsampled values in the quantization matrix by a first upsampling factor to produce a first set of values; upsampling a second set of downsampled values in the quantization matrix by a second upsampling factor to produce a second set of values; and inverse quantizing a transform coefficient block with the first and second sets of values.
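A matching decoder-side sketch, under the same hypothetical block layout, might up-sample by simple replication and then inverse quantize. Replication as the up-sampling filter, the function names, and the convention that a matrix entry of 16 leaves the step size unchanged (as in HEVC-style scaling lists) are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def upsample_two_sets(first: List[int], second: List[int], n: int,
                      split: int, f1: int, f2: int) -> Matrix:
    """Rebuild an n x n matrix by replicating each value over its f x f block.

    Assumed layout: the first set covers the split x split corner with f1 x f1
    blocks; the second set covers the rest with f2 x f2 blocks; both are in
    raster order of block origins.
    """
    qm = [[0] * n for _ in range(n)]

    def fill(r: int, c: int, f: int, v: int) -> None:
        for i in range(r, r + f):
            for j in range(c, c + f):
                qm[i][j] = v

    it1 = iter(first)
    for r in range(0, split, f1):
        for c in range(0, split, f1):
            fill(r, c, f1, next(it1))
    it2 = iter(second)
    for r in range(0, n, f2):
        for c in range(0, n, f2):
            if not (r < split and c < split):
                fill(r, c, f2, next(it2))
    return qm


def inverse_quantize(levels: Matrix, qm: Matrix,
                     step: float) -> List[List[float]]:
    """Scale each level by the step size weighted by qm / 16 (16 = flat)."""
    n = len(levels)
    return [[levels[i][j] * step * qm[i][j] / 16.0 for j in range(n)]
            for i in range(n)]
```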

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Drawings

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

Fig. 4 is a conceptual diagram illustrating an example quantization matrix.

Fig. 5 is a conceptual diagram illustrating a quantization matrix with example values.

Fig. 6 is a conceptual diagram illustrating a reconstructed quantization matrix utilizing one or more example techniques of this disclosure.

Fig. 7 is a conceptual diagram illustrating downsampling factors for different portions in one example of a quantization matrix.

Fig. 8 is a conceptual diagram illustrating downsampling factors for different portions in another example of a quantization matrix.

Fig. 9 is a conceptual diagram illustrating downsampling factors for different portions in another example of a quantization matrix.

Fig. 10 is a flow diagram illustrating a method of video encoding in accordance with the techniques of this disclosure.

FIG. 11 is a flow diagram illustrating a video decoding method in accordance with the techniques of this disclosure.

Detailed Description

This disclosure describes techniques for signaling values of quantization matrices in video coding. The quantization matrix may be a two-dimensional matrix comprising a plurality of values. As an illustration, the quantization matrix may be used to scale a quantization step size used to quantize residual transform coefficients associated with a transform unit used for video coding. A Quantization Parameter (QP) may be assigned to a block of transform coefficients (e.g., a transform unit) to specify a quantization step size. Each value in the quantization matrix corresponds to a coefficient in the block to be quantized and is used to determine the degree of quantization (given the QP value) to be applied to the coefficient.
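The interaction between the QP and the matrix can be sketched as below. The step-size formula (doubling every 6 QP steps, equal to 1.0 at QP 4) is the familiar H.264/HEVC approximation; treating an entry of 16 as "no scaling" follows the HEVC-style convention and is an assumption here, as are the function names.

```python
import math
from typing import List


def qstep(qp: int) -> float:
    """Approximate step size: doubles every 6 QP steps, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[List[int]], qp: int,
             qm: List[List[int]]) -> List[List[int]]:
    """Quantize each coefficient with a step scaled by its matrix entry.

    An entry of 16 leaves the step unchanged; larger entries quantize that
    frequency more coarsely.
    """
    base = qstep(qp)
    out = []
    for crow, mrow in zip(coeffs, qm):
        row = []
        for c, m in zip(crow, mrow):
            step = base * m / 16.0
            level = int(math.floor(abs(c) / step + 0.5))  # round half away from zero
            row.append(level if c >= 0 else -level)
        out.append(row)
    return out


print(quantize([[10, -7]], qp=4, qm=[[16, 32]]))  # [[10, -4]]
```

With the same QP, the coefficient paired with the entry 32 is quantized with twice the step of the one paired with 16.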

In particular, this disclosure proposes techniques for down-sampling quantization matrices so that fewer quantization values need to be transmitted and/or stored in an encoded video bitstream. Transmitting or storing the entire quantization matrix associated with a block of video data may require a large number of bits, reducing the bandwidth efficiency of the coded video bitstream. Also, the video decoder may need to store the entire quantization matrix in memory for use by the inverse quantization process. By down-sampling the quantization matrix using the techniques of this disclosure, bits may be saved without substantially reducing the quality of the coded video.

In this disclosure, video coding will be described for purposes of illustration. The coding techniques described in this disclosure may also be applicable to other types of data coding. Digital video devices implement video compression techniques to more efficiently encode and decode digital video information. Video compression may apply spatial (intra-picture) prediction and/or temporal (inter-picture) prediction techniques to reduce or remove redundancy inherent in video sequences.

It should be understood that the term "frame" may be used interchangeably with the term "picture". In other words, the terms "frame" and "picture" each refer to a portion of video, and the sequential display of the frames or pictures results in smooth playback. Thus, in examples where this disclosure uses the term "frame," the techniques of this disclosure should not be construed as limited to video coding techniques or standards that utilize the term "frame," and the techniques may be extended to other standards (e.g., developed standards, standards in development, or future standards) or other video coding techniques that utilize the term "picture."

Typical video encoders partition each frame of an original video sequence into contiguous rectangular regions called "blocks" or "coding units". These blocks are coded in "intra mode" (I-mode) or in "inter mode" (P-mode or B-mode).

For P-mode or B-mode, the encoder first searches a "reference frame," denoted F_ref, for blocks that are similar to the block being encoded. The search is typically limited to a range of no more than a certain spatial displacement from the block to be encoded. When the best match (i.e., a predictive block or "prediction") has been identified, it is represented as a two-dimensional (2D) motion vector (Δx, Δy), where Δx is the horizontal displacement and Δy is the vertical displacement of the position of the pixels in the predictive block in the reference frame relative to the position of the pixels in the block to be coded.

The motion vector is then used with the reference frame to construct the prediction block F_pred:

$$F_{\text{pred}}(x, y) = F_{\text{ref}}(x + \Delta x,\; y + \Delta y)$$

where (x, y) indicates the position of a pixel in the frame.
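The search and prediction steps above can be sketched with an exhaustive integer-pixel search. The matching criterion (sum of absolute differences), the search window, and the function names are assumptions; the text only requires finding a similar block within a limited displacement. Frames are indexed as frame[y][x].

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x]


def sad(block: Frame, ref: Frame, x0: int, y0: int) -> int:
    """Sum of absolute differences between block and ref at (x0, y0)."""
    return sum(abs(block[j][i] - ref[y0 + j][x0 + i])
               for j in range(len(block)) for i in range(len(block[0])))


def full_search(block: Frame, ref: Frame, bx: int, by: int,
                srange: int) -> Tuple[int, int]:
    """Return the (dx, dy) within +/- srange that minimizes SAD."""
    h, w = len(block), len(block[0])
    best: Optional[int] = None
    best_mv = (0, 0)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + w > len(ref[0]) or y0 + h > len(ref):
                continue  # candidate would fall outside the reference frame
            cost = sad(block, ref, x0, y0)
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv


def predict(ref: Frame, bx: int, by: int, w: int, h: int,
            mv: Tuple[int, int]) -> Frame:
    """F_pred(x, y) = F_ref(x + dx, y + dy) for every pixel of the block."""
    dx, dy = mv
    return [[ref[by + y + dy][bx + x + dx] for x in range(w)] for y in range(h)]
```

Practical encoders usually replace the exhaustive loop with faster search patterns and sub-pixel refinement; the brute-force version is used here only because it maps directly onto the description.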

For blocks coded in I-mode, the prediction block is formed using spatial prediction from previously coded neighboring blocks within the same frame. For both I-mode and P- or B-mode, the prediction error (i.e., the residual difference between pixel values in the block being encoded and the prediction block) is represented as a set of weighted basis functions of some discrete transform, such as a Discrete Cosine Transform (DCT). The transform may be performed on blocks of different sizes, e.g., 4 × 4, 8 × 8, 16 × 16, and larger. Transform blocks are not always square; rectangular transform blocks, e.g., of size 16 × 4 or 32 × 8, may also be used.
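As an illustration of representing a residual block as weights of DCT basis functions, the following computes a floating-point, orthonormal, separable 2-D DCT-II. Standardized codecs use integer approximations of this transform, so this is a conceptual sketch rather than the transform of any particular standard. It also accepts rectangular blocks.

```python
import math
from typing import List


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(a * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i in range(n)))
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A flat 4 x 4 residual puts all of its energy into the DC weight.
coeffs = dct_2d([[10.0] * 4 for _ in range(4)])
print(round(coeffs[0][0], 6))  # 40.0
```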

The weights (i.e., the transform coefficients) are then quantized. Quantization introduces a loss of information and, as such, the quantized coefficients have a lower precision than the original transform coefficients. The quantized transform coefficients and motion vectors are examples of "syntax elements". These syntax elements, plus some control information, form a coded representation of the video sequence. Syntax elements may also be entropy coded, thereby further reducing the number of bits they require for representation. Entropy coding is a lossless operation aimed at minimizing the number of bits required to represent transmitted or stored symbols (syntax elements in this case) by exploiting the nature of the symbol distribution (some symbols occur more frequently than others).

The compression ratio, i.e., the ratio between the number of bits used to represent the original sequence and the number used to represent the compressed sequence, can be controlled by adjusting one or both of the value of a Quantization Parameter (QP) and the values in a quantization matrix, both of which can be used to quantize the transform coefficient values. The compression ratio may also depend on the entropy coding method used. Quantization matrices are typically designed such that the values in the matrix generally, but not necessarily always, increase in both the row (left-to-right) and column (top-to-bottom) directions. For example, as the coefficient position moves from the DC position in the upper-left (0, 0) corner of the transform coefficient block toward the higher-frequency coefficients in the lower-right (n, n) corner, the corresponding values in the quantization matrix typically increase. The reason for this design is that the Contrast Sensitivity Function (CSF) of the Human Visual System (HVS) decreases with increasing frequency, both horizontally and vertically.
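The design property described above can be checked mechanically. The matrix generator below is purely hypothetical (entries growing linearly with the distance i + j from DC); it is not a default matrix from any standard.

```python
from typing import List

Matrix = List[List[int]]


def hypothetical_matrix(n: int, base: int = 16, slope: int = 2) -> Matrix:
    """Hypothetical matrix whose entries grow with distance i + j from DC."""
    return [[base + slope * (i + j) for j in range(n)] for i in range(n)]


def increases_with_frequency(qm: Matrix) -> bool:
    """True if entries never decrease left-to-right or top-to-bottom."""
    n = len(qm)
    rows_ok = all(qm[i][j] <= qm[i][j + 1] for i in range(n) for j in range(n - 1))
    cols_ok = all(qm[i][j] <= qm[i + 1][j] for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok
```

Because the check is non-strict, a matrix with flat runs still passes, matching the "generally, but not necessarily always" wording.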

In the decoder, the block in the current frame is obtained by first constructing a prediction of the block in the same way as in the encoder, and by adding the compressed prediction error to the prediction. The compressed prediction error is found by weighting the transform basis functions using the quantized coefficients. The difference between the reconstructed frame and the original frame is called the reconstruction error.
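The decoder-side reconstruction and the reconstruction error may be sketched as follows. Clipping the sum to the valid sample range and using mean squared error as the error measure are assumptions not stated in the text; the function names are illustrative.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the decoded prediction error to the prediction and clip to range."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]


def reconstruction_mse(orig: Block, recon: Block) -> float:
    """Mean squared reconstruction error between original and reconstructed."""
    diffs = [(o - r) ** 2
             for orow, rrow in zip(orig, recon) for o, r in zip(orow, rrow)]
    return sum(diffs) / len(diffs)
```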

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in fig. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. The encoded video data may also be stored on storage medium 34 or file server 36 and may be accessed by destination device 14 as desired. When stored to a storage medium or file server, video encoder 20 may provide the coded video data to another device (e.g., a network interface, a Compact Disc (CD), Blu-ray, or Digital Versatile Disc (DVD) burner or stamping facility device, or other device) for storage of the coded video data to the storage medium. Likewise, a device separate from video decoder 30 (e.g., a network interface, CD or DVD reader, or the like) may retrieve coded video data from a storage medium and provide the retrieved data to video decoder 30.

Source device 12 and destination device 14 may comprise any of a wide range of devices including: desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, televisions, video cameras, display devices, digital media players, video game consoles, or the like. In many cases, these devices may be equipped for wireless communication. Thus, communication channel 16 may comprise a wireless channel, a wired channel, or a combination of a wireless channel and a wired channel suitable for transmission of encoded video data. Similarly, the file server 36 may be accessible by the destination device 14 via any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of wireless and wired connections suitable for accessing encoded video data stored on a file server.

Techniques for signaling quantization matrices according to examples of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator 22, and a transmitter 24. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of these sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications or applications that store encoded video data on a local disk.

Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.

Captured, pre-captured, or computer-generated video encoded by video encoder 20 may also be stored on storage medium 34 or file server 36 for later use. Storage medium 34 may comprise a Blu-ray disc, DVD, CD-ROM, flash memory device, or any other suitable digital storage medium for storing encoded video. The encoded video stored on storage medium 34 may then be accessed by destination device 14 for decoding and playback. Although not shown in fig. 1, in some examples, storage medium 34 and/or file server 36 may store the output of transmitter 24.

File server 36 may be any type of server capable of storing encoded video and transmitting the encoded video to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, a Network Attached Storage (NAS) device, a local disk drive, or any other type of device capable of storing encoded video data and transmitting the encoded video data to a destination device. The transmission of the encoded video data from the file server 36 may be a streaming transmission, a download transmission, or a combination of both. The file server 36 may be accessible by the destination device 14 via any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, Ethernet, USB, etc.), or a combination of a wireless channel and a wired connection suitable for accessing encoded video data stored on a file server.

In the example of fig. 1, destination device 14 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. Receiver 26 of destination device 14 receives the information over channel 16, and modem 28 demodulates the information to generate a demodulated bitstream for use by video decoder 30. The information communicated over channel 16 may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding the video data. This syntax may also be included with the encoded video data stored on the storage medium 34 or the file server 36. Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) capable of encoding or decoding video data.

The display device 32 may be integrated with the destination device 14 or external to the destination device 14. In some examples, destination device 14 may include an integrated display device and may also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

In the example of fig. 1, communication channel 16 may include any wireless or wired communication medium (e.g., a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media). The communication channel 16 may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the internet). Communication channel 16 generally represents any suitable communication medium or collection of different communication media, including any suitable combination of wired or wireless media, for transmitting video data from source device 12 to destination device 14. Communication channel 16 may include a router, switch, base station, or any other equipment useful for facilitating communication from source device 12 to destination device 14.

Video encoder 20 and video decoder 30 may operate in accordance with a video compression standard, such as the High Efficiency Video Coding (HEVC) standard currently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC), or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.

Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device.

For video coding according to the emerging HEVC standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC), as one example, a video frame may be partitioned into coding units. A Coding Unit (CU) generally refers to an image area that serves as a basic unit to which various coding tools are applied for video compression. A CU typically has a luma component, denoted Y, and two chroma components, denoted U and V. Depending on the video sampling format, the size of the U and V components, in terms of the number of samples, may be the same as or different from the size of the Y component. A CU is typically square, and may be considered similar to a so-called macroblock (e.g., under other video coding standards such as ITU-T H.264). For purposes of illustration, coding in accordance with some of the currently proposed aspects of the developing HEVC standard will be described in this application. However, the techniques described in this disclosure may be useful for other video coding processes, such as those defined according to H.264, or other standard or proprietary video coding processes.

HEVC standardization efforts are based on a model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as 35 intra-prediction encoding modes. A recent Working Draft (WD) of HEVC, referred to as HEVC WD7, is available, as of October 30, 2012, from http://phenix.int-evry.fr/jct/doc_end_user/documents/9_Geneva/wg11/JCTVC-I1003-v6.zip.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or Largest Coding Units (LCUs) that include both luma and chroma samples. A treeblock has a similar purpose as a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into Coding Units (CUs) according to a quadtree. For example, a treeblock, as the root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
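The recursive quadtree split of a treeblock into CUs may be sketched as below. The `should_split` callback is a hypothetical stand-in for the encoder's decision (e.g., a rate-distortion test), and `min_size` plays the role of the minimum coding-node size signaled in the bitstream.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size)


def split_treeblock(x: int, y: int, size: int, min_size: int,
                    should_split: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively split a treeblock into leaf coding units (quadtree)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[CU] = []
        # Children in order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_treeblock(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]


# Hypothetical decision: split everything above 32, and split the
# top-left 32 x 32 quadrant once more.
decide = lambda x, y, s: s > 32 or (x < 32 and y < 32 and s > 16)
print(split_treeblock(0, 0, 64, 8, decide))
```

With this decision rule, a 64 × 64 treeblock yields four 16 × 16 CUs in the top-left quadrant and three 32 × 32 CUs elsewhere.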

A CU includes a coding node and the Prediction Units (PUs) and Transform Units (TUs) associated with the coding node. The size of a CU corresponds to the size of the coding node, and its shape must be square. The size of a CU may range from 8 × 8 pixels up to the size of the treeblock, with a maximum of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe partitioning of the CU into one or more TUs, e.g., according to a quadtree. A TU may be square or non-square in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. TUs are typically the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to a prediction process. For example, when a PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., list 0, list 1, or list C) of the motion vector.

In general, TUs are used for the transform and quantization processes. A given CU with one or more PUs may also include one or more Transform Units (TUs). After prediction, video encoder 20 may calculate residual values corresponding to the PUs. The residual values comprise pixel difference values that may be transformed into transform coefficients using TUs, quantized, and scanned to generate serialized transform coefficients for entropy coding. This disclosure generally uses the term "video block" to refer to a coding node of a CU. In some particular cases, this disclosure may also use the term "video block" to refer to a treeblock (i.e., LCU) or CU, which includes coding nodes and PUs and TUs.

A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) typically includes a series of one or more video pictures. The GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction in PU sizes of 2N × 2N or N × N, and inter prediction in symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction in PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is partitioned horizontally, with a 2N × 0.5N PU on top and a 2N × 1.5N PU on bottom.
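The partition geometry above can be tabulated in a few lines. The mode strings and the function name are hypothetical labels; only the rectangle sizes come from the description (each PU is given as x, y, width, height inside the CU).

```python
from typing import List, Tuple

PU = Tuple[int, int, int, int]  # (x, y, width, height) inside the CU


def pu_partitions(mode: str, cu_size: int) -> List[PU]:
    """PU rectangles for a 2N x 2N CU of side cu_size under a partition mode."""
    s, h, q = cu_size, cu_size // 2, cu_size // 4  # 2N, N, 0.5N
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN": [(0, 0, s, h), (0, h, s, h)],
        "Nx2N": [(0, 0, h, s), (h, 0, h, s)],
        "NxN": [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        # Asymmetric modes: one direction is split 25% / 75%.
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],
    }
    return table[mode]


print(pu_partitions("2NxnU", 32))  # [(0, 0, 32, 8), (0, 8, 32, 24)]
```

For a 32 × 32 CU (N = 16), 2N × nU gives a 32 × 8 PU on top and a 32 × 24 PU below, i.e., 2N × 0.5N and 2N × 1.5N.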

In this disclosure, "N × N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block need not have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.

After using intra-predictive coding or inter-predictive coding of PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in a spatial domain (also referred to as a pixel domain), and the TUs may comprise coefficients in a transform domain after applying a transform (e.g., a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs that include the residual data of the CU, and then transform the TUs to generate transform coefficients for the CU.

Video encoder 20 may perform quantization of the transform coefficients after any transform used to generate the transform coefficients. Quantization generally refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be reduced to an m-bit value during quantization, where n is greater than m.
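The n-bit to m-bit reduction mentioned above can be illustrated with a rounded right shift. This is only a toy illustration of precision loss, assuming non-negative inputs; it is not the quantizer of any particular codec, and the function name is hypothetical.

```python
def reduce_bit_depth(value: int, n_bits: int, m_bits: int) -> int:
    """Map a non-negative n-bit value to m bits by a rounded right shift."""
    assert n_bits > m_bits and 0 <= value < (1 << n_bits)
    shift = n_bits - m_bits
    rounded = (value + (1 << (shift - 1))) >> shift
    return min(rounded, (1 << m_bits) - 1)  # rounding can overflow the top code


print(reduce_bit_depth(512, 10, 8))  # 128
```

Distinct 10-bit inputs such as 1 and 2 map to the same 8-bit output, which is exactly the information loss that quantization trades for fewer bits.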

In some examples, video encoder 20 may use a predefined scan order to scan the quantized transform coefficients and produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
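As an illustration of turning a two-dimensional block into a one-dimensional vector with a predefined scan, the following Python sketch generates a classic zigzag order (the pattern used in JPEG and in H.264 frame coding). The choice of zigzag and the function names are assumptions for illustration only; HEVC drafts also use other patterns, such as diagonal scans.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        # Built with row increasing, so each diagonal runs top-right to bottom-left.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Reverse every even diagonal so the scan alternates direction.
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def serialize(block: List[List[int]]) -> List[int]:
    """Flatten a square block into a 1-D vector following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Usage: each entry of this 4x4 block holds its own zigzag index,
# so serializing it yields the integers 0 through 15 in order.
block = [[0, 1, 5, 6],
         [2, 4, 7, 12],
         [3, 8, 11, 13],
         [9, 10, 14, 15]]
print(serialize(block))
```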

To perform CABAC, video encoder 20 may assign a context within the context model to a symbol to be transmitted. A context may relate to, for example, whether neighboring values of a symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols and longer codes correspond to less probable symbols. In this way, bit savings may be achieved using VLC as compared to, for example, using an equal length codeword for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.

Video encoder 20 may implement any or all of the techniques of this disclosure for downsampling and signaling quantization matrices in a video coding process. Likewise, video decoder 30 may implement any or all of these techniques for upsampling quantization matrices in a video coding process. As described in this disclosure, a video coder may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding.

In one example of this disclosure, video encoder 20 may be configured to: generate a quantization matrix comprising a plurality of values; downsample a first set of values in the quantization matrix by a first downsampling factor to generate a first set of downsampled values; downsample a second set of values in the quantization matrix by a second downsampling factor to generate a second set of downsampled values; and generate a coded bitstream that includes the first set of downsampled values and the second set of downsampled values. In some examples, a downsampling factor may be one, in which case the corresponding values are coded directly, without downsampling.
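The encoder-side steps above can be sketched as follows. The partition is a hypothetical one chosen for illustration: the first set is the low-frequency `split × split` corner, the second set is every other entry, and downsampling keeps the top-left value of each `f × f` block (averaging would be another option). The names `downsample_two_sets`, `split`, `f1`, and `f2` are assumptions, not syntax from this disclosure.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def downsample_two_sets(q: Matrix, split: int, f1: int,
                        f2: int) -> Tuple[List[int], List[int]]:
    """Downsample a square quantization matrix with two factors (sketch).

    First set: entries with row < split and col < split, downsampled by f1.
    Second set: all other entries, downsampled by f2.
    From each f x f block only the top-left value is kept; values are
    emitted in raster order. A factor of 1 keeps every value in that set.
    """
    if split % f1 or split % f2:
        raise ValueError("split must be a multiple of both factors")
    first: List[int] = []
    second: List[int] = []
    n = len(q)
    for r in range(n):
        for c in range(n):
            in_first = r < split and c < split
            f = f1 if in_first else f2
            if r % f == 0 and c % f == 0:  # top-left of its f x f block
                (first if in_first else second).append(q[r][c])
    return first, second


# Usage: an 8x8 matrix whose values grow with frequency.
q = [[16 + r + c for c in range(8)] for r in range(8)]
first, second = downsample_two_sets(q, split=4, f1=1, f2=2)
print(len(first), len(second))  # 16 12
```

With these settings the low-frequency corner is sent at full resolution (factor one) and the rest at half resolution, so 28 values are coded instead of 64.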

In another example of this disclosure, video decoder 30 may be configured to: receive, in a coded bitstream, a quantization matrix coded with downsampled values; upsample a first set of downsampled values in the quantization matrix by a first upsampling factor to produce a first set of values; upsample a second set of downsampled values in the quantization matrix by a second upsampling factor to produce a second set of values; and inverse quantize a block of transform coefficients using the first and second sets of values. In some examples, an upsampling factor may be one, in which case the corresponding values are coded directly, without upsampling.
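A matching decoder-side sketch follows, assuming a hypothetical layout: the first set covers the `split × split` low-frequency corner (factor `f1`), the second set covers the remaining entries (factor `f2`), both are listed in raster order, and each missing entry is filled by replicating the kept value at the top-left of its block (nearest-neighbor upsampling). The resulting matrix would then be used for inverse quantization; that step is not shown.

```python
from typing import List

Matrix = List[List[int]]


def upsample_two_sets(first: List[int], second: List[int], n: int,
                      split: int, f1: int, f2: int) -> Matrix:
    """Rebuild an n x n quantization matrix from two downsampled sets (sketch).

    Assumes `split` is a multiple of f2, so every second-set entry maps back
    to a kept second-set value.
    """
    def in_first(r: int, c: int) -> bool:
        return r < split and c < split

    q = [[0] * n for _ in range(n)]
    it1, it2 = iter(first), iter(second)
    # Pass 1: place the received values at their sampled positions.
    for r in range(n):
        for c in range(n):
            first_set = in_first(r, c)
            f = f1 if first_set else f2
            if r % f == 0 and c % f == 0:
                q[r][c] = next(it1 if first_set else it2)
    # Pass 2: copy each kept value over the rest of its f x f block.
    for r in range(n):
        for c in range(n):
            f = f1 if in_first(r, c) else f2
            q[r][c] = q[r - r % f][c - c % f]
    return q


# Usage: a 4x4 matrix, 2x2 corner at factor 1, remainder at factor 2.
q = upsample_two_sets([10, 11, 12, 13], [20, 30, 40], n=4, split=2, f1=1, f2=2)
for row in q:
    print(row)
```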

FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra and inter coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra mode (I-mode) may refer to any of a number of space-based compression modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of several time-based compression modes.

In the example of fig. 2, video encoder 20 includes partition unit 35, prediction processing unit 41, reference picture store 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes a motion estimation unit 42, a motion compensation unit 44, and an intra prediction processing unit 46. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and a summer 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in-loop or post-loop) may also be used in addition to the deblocking filter.

As shown in fig. 2, video encoder 20 receives video data, and partition unit 35 partitions the data into a plurality of video blocks. Such partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning according to a quadtree structure of LCUs and CUs, for example. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes (e.g., one of a plurality of intra coding modes or one of a plurality of inter coding modes) for the current video block based on error results (e.g., coding rate and level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded block for use as a reference picture.

Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine an inter-prediction mode for a video slice according to a predetermined pattern of a video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match a PU of a video block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture store 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search with respect to the full pixel position and the fractional pixel position and output a motion vector with fractional pixel precision.
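The two difference metrics named above can be written directly. This is a minimal sketch; the helper names are hypothetical, and a real motion search would evaluate these metrics over many candidate positions, including interpolated fractional-pixel ones.

```python
from typing import List

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a: Block, b: Block) -> int:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


current = [[10, 12], [14, 16]]
candidates = {"cand_a": [[11, 12], [13, 19]], "cand_b": [[10, 13], [14, 15]]}
print(sad(current, candidates["cand_a"]), ssd(current, candidates["cand_a"]))  # 5 11
# Pick the candidate that matches best under SAD.
best = min(candidates, key=lambda k: sad(current, candidates[k]))
print(best)  # cand_b
```

SSD penalizes a few large errors more heavily than SAD, while SAD is cheaper to compute, which is why both appear as alternatives.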

Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to the locations of predictive blocks of the reference picture. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of the lists identifying one or more reference pictures stored in reference picture store 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.

The motion compensation performed by motion compensation unit 44 may involve fetching or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block and may include both luminance and chrominance difference components. Summer 50 represents one or more components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra-prediction processing unit 46 may intra-predict the current block as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 (or mode selection unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
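One widely used way to combine distortion and rate into a single figure of merit is the Lagrangian cost J = D + λ·R. The passage above speaks more generally of ratios of distortion and rate, so treat the following as a sketch of a common alternative rather than the method described here. The mode labels, the distortion and bit counts, and the λ values are all hypothetical.

```python
from typing import Dict, Tuple


def select_mode(results: Dict[str, Tuple[float, int]], lam: float) -> str:
    """Return the mode with the lowest Lagrangian cost J = D + lam * R.

    `results` maps a (hypothetical) mode label to (distortion, bits).
    """
    return min(results, key=lambda m: results[m][0] + lam * results[m][1])


tested = {"DC": (120.0, 30), "planar": (100.0, 40), "angular_26": (90.0, 70)}
print(select_mode(tested, lam=1.0))   # planar: costs 150, 140, 160
print(select_mode(tested, lam=0.0))   # angular_26: distortion only
print(select_mode(tested, lam=10.0))  # DC: rate dominates
```

A larger λ favors cheaper modes and a smaller λ favors lower distortion. Encoders typically tie λ to the quantization parameter.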

In any case, after selecting the intra-prediction mode for the block, intra-prediction processing unit 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream. This configuration data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and, for each of the contexts, indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table.

After prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. Transform processing unit 52 may transform the residual video data from the pixel domain to a transform domain, such as the frequency domain.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter or by modifying values in a quantization matrix. In some examples, quantization unit 54 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform scanning.

In some cases, quantization unit 54 may also perform a post-transform scaling operation in addition to the quantization operation. The post-transform scaling operation may be used in conjunction with the core transform operation performed by transform processing unit 52 to efficiently perform a complete spatial-to-frequency transform operation, or an approximation thereof, on a block of residual data. In some examples, the post-transform scaling operation may be integrated with the quantization operation, so that the post-transform scaling and the quantization are performed as part of the same set of operations on one or more transform coefficients to be quantized.

In some examples, quantization unit 54 may quantize the transform coefficients based on a quantization matrix. The quantization matrix may include a plurality of values, each of the plurality of values corresponding to a respective transform coefficient of a plurality of transform coefficients in a transform coefficient block to be quantized. The values in the quantization matrix may be used to determine an amount of quantization to be applied by quantization unit 54 to corresponding transform coefficients in the transform coefficient block. For example, for each of the transform coefficients to be quantized, quantization unit 54 may quantize the respective transform coefficient according to a quantization amount determined at least in part by the respective one of the values in the quantization matrix that corresponds to the transform coefficient to be quantized.

In further examples, quantization unit 54 may quantize the transform coefficients based on a quantization parameter and a quantization matrix. The quantization parameter may be a block-level parameter (i.e., a parameter assigned to the entire transform coefficient block) that may be used to determine an amount of quantization to be applied to the transform coefficient block. In these examples, the values in the quantization matrix and the quantization parameter may be used together to determine the amount of quantization to be applied to corresponding transform coefficients in the transform coefficient block. In other words, the quantization matrix may specify values that may be used with the quantization parameter to determine the amount of quantization to be applied to the corresponding transform coefficients. For example, for each of the transform coefficients to be quantized in the transform coefficient block, quantization unit 54 may quantize the respective transform coefficient according to an amount of quantization determined at least in part by a block-level Quantization Parameter (QP) for the transform coefficient block and by the one of a plurality of coefficient-specific values in the quantization matrix that corresponds to the transform coefficient to be quantized. Thus, the quantization matrix provides a corresponding value for each transform coefficient, and that value is applied together with the QP to determine the amount of quantization for the transform coefficient.

In some examples, the quantization process may include a process similar to one or more of the processes proposed for HEVC and/or defined by the H.264 standard. For example, to quantize the values (i.e., levels) of the transform coefficients, quantization unit 54 may scale the transform coefficients by the corresponding values in the quantization matrix and by post-transform scaling values. Quantization unit 54 may then shift the scaled transform coefficients by an amount based on the quantization parameter. In some cases, the post-transform scaling value may be selected based on the quantization parameter. Other quantization techniques may also be used.
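The scale-then-shift process above can be sketched for a single coefficient. This is a simplified illustration rather than the exact HEVC or H.264 procedure. It assumes that a matrix entry of 16 is neutral, that a six-entry QP-dependent table (the forward scales used in the HEVC test model, included here as illustrative values) plays the role of the post-transform scaling value, and that the result is shifted by `14 + qp // 6` bits. The transform-size-dependent shift and encoder-specific rounding offsets are omitted, and rounding is to nearest.

```python
QUANT_SCALES = [26214, 23302, 20560, 18396, 16384, 14564]  # indexed by qp % 6


def quantize(coef: int, m: int, qp: int) -> int:
    """Quantize one transform coefficient using matrix entry m and block QP (sketch).

    Under these assumptions QP 4 with m = 16 gives a step size of 1; the
    step doubles every 6 QP, and doubling m also doubles the step.
    """
    scale = QUANT_SCALES[qp % 6] * 16 // m  # matrix entry and post-transform scale
    shift = 14 + qp // 6                    # QP-dependent right shift
    level = (abs(coef) * scale + (1 << (shift - 1))) >> shift
    return -level if coef < 0 else level


print(quantize(100, 16, 4))   # 100
print(quantize(100, 16, 10))  # 50: QP + 6 halves the level
print(quantize(100, 32, 4))   # 50: a larger matrix entry quantizes more coarsely
```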

In some examples, quantization unit 54 may include, in the encoded bitstream, data indicative of the quantization matrix that quantization unit 54 used to quantize the transform coefficients. For example, quantization unit 54 may provide data indicative of the quantization matrix to entropy encoding unit 56, which entropy encodes the data and places it into the encoded bitstream.

The quantization matrix data included in the encoded bitstream may be used by video decoder 30 to decode the bitstream (e.g., to perform an inverse quantization operation). In some examples, the data may be an index value that identifies a predetermined quantization matrix from a set of quantization matrices, or may identify a function used to generate a quantization matrix. In further examples, the data may include actual values contained in the quantization matrix. In additional examples, the data may include coded versions of actual values contained in the quantization matrix. For example, a coded version of a quantization matrix may include downsampled values for a particular location in the quantization matrix. In another example, a coded version may be generated based on a prediction value as described in more detail later in this disclosure. In some examples, the data may take the form of one or more syntax elements that specify a quantization matrix used by quantization unit 54 to quantize a transform coefficient block corresponding to a video block to be coded, and quantization unit 54 may include the one or more syntax elements in a header of the coded video block.

In previous standards such as MPEG-2 and AVC/H.264, quantization matrices were used to improve subjective quality, as described above. Quantization matrices may also be included as part of the HEVC standard.

In HM5.1, transform sizes of 4 × 4, 8 × 8, 16 × 16, and 32 × 32 are possible. The 32 × 32 transform may be used for luma, and possibly only for luma (i.e., it may not be used for the chroma components). It may therefore be suitable to allow a total of 20 quantization matrices: separate matrices for the Y, U, and V components of 4 × 4, 8 × 8, and 16 × 16 intra- and inter-predicted blocks (18 matrices), plus matrices for 32 × 32 intra- and inter-predicted blocks of the Y component (2 matrices). To signal all of these matrices, the encoder may therefore need to signal 6 × (16 + 64 + 256) + 2 × 1024 = 4064 quantization matrix values. In some examples, a zigzag scan of the quantization matrix entries, followed by first-order prediction (e.g., differential coding) and exponential-Golomb coding with parameter 0, may be used to compress the quantization matrices losslessly. However, because of the large number of quantization matrix coefficients, a better compression method may be needed in HEVC.
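The lossless baseline just described can be sketched as follows. The sketch assumes the entries have already been serialized in scan order (e.g., zigzag). Each entry is predicted from the previous one, and the signed difference is written with an order-0 Exp-Golomb code using the usual signed mapping 0, 1, −1, 2, −2, … → 0, 1, 2, 3, 4, …. The initial predictor of 8 is an assumption borrowed from H.264 scaling-list syntax, and the function names are hypothetical.

```python
from typing import List


def ue(n: int) -> str:
    """Order-0 unsigned Exp-Golomb codeword for n >= 0, as a bit string."""
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def se(v: int) -> str:
    """Signed Exp-Golomb: maps 0, 1, -1, 2, -2, ... to ue(0), ue(1), ue(2), ..."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


def encode_dpcm(values: List[int], initial: int = 8) -> str:
    """Code scanned matrix entries as differences from the previous entry."""
    prev = initial
    out = []
    for v in values:
        out.append(se(v - prev))
        prev = v
    return "".join(out)


# Differences 0, +1, 0, +3 -> "1" + "010" + "1" + "00110"
print(encode_dpcm([8, 9, 9, 12]))  # 1010100110
```

Because smooth matrices produce small differences, most codewords are short, but every one of the 4064 values still costs at least one bit. That is why the subset-based signaling below can help.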

Quantization matrices are typically designed to take advantage of the Human Visual System (HVS), which is generally less sensitive to quantization errors at higher frequencies. One reason for this is that the Contrast Sensitivity Function (CSF) of the human visual system decreases with increasing frequency, both horizontally and vertically. Thus, in a well-designed quantization matrix, the entries increase along both the rows (left to right) and the columns (top to bottom). In particular, moving from the DC position in the top-left (0, 0) corner toward the higher-frequency coefficients in the bottom-right (n, n) corner, the corresponding values in the quantization matrix generally increase, or at least do not decrease.

In the prior art for signaling the quantization matrix, all values (i.e., coefficients) of the entire quantization matrix are signaled. However, it may not be necessary to signal the entire quantization matrix, as some coefficients (e.g., the coefficients towards the lower right corner of the quantization matrix) may have no substantial impact on video quality.

As one example, larger block sizes such as 32 × 32 are typically used when the residual block is smooth, where a residual block is the difference between an actual block of video data and a predicted block of video data. A smooth residual block exhibits little variation among its values. In this case, after quantization, the transformed block is unlikely to contain many non-zero coefficients at higher frequencies (i.e., toward the lower-right corner).

Statistics from encoded video sequences support this assumption. For example, partial frequency transform techniques (e.g., coding only the lowest-frequency 16 × 16 coefficients of a 32 × 32 block) show minimal loss in coding efficiency. This may be considered equivalent to choosing very high quantization matrix values for frequencies outside the 16 × 16 region. In this example, because the loss in coding efficiency may be minimal, it may not be necessary to signal all 32 × 32 quantization matrix values (1024 values) in the encoded video bitstream.

The following describes examples of signaling and coding quantization matrices. For example, for signaling, video encoder 20 may signal a one-bit flag to indicate whether the entire quantization matrix or only a subset of the quantization matrix is coded. If the flag indicates that the entire quantization matrix is coded, any coding method may be used, such as the coding methods of HM5.1, AVC/H.264, JCTVC-F085, or JCTVC-E073, or the techniques described in U.S. provisional patent application No. 61/547,647, which is incorporated by reference in its entirety and is discussed in more detail below.

If the flag indicates that only a subset of the quantization matrix is coded (e.g., a first subset), the size of the subset may be coded as a pair of values (last_row, last_col). In this example, the subset is assumed to be rectangular and to cover the quantization matrix entries from position (0, 0) to position (last_row, last_col); however, other shapes may be used. The shape may also be restricted to a square, in which case only a single last value needs to be coded, since the last_row and last_col values are the same. The last values (last_row, last_col) may be coded with a fixed number of bits, which may depend on the size of the quantization matrix. For example, for a 32 × 32 quantization matrix, the last values may be coded using 5 + 5 = 10 bits. Alternatively, the last values may be coded using a variable length code (e.g., an exponential-Golomb code or a Golomb code).
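The fixed-length option can be sketched as follows. Each coordinate uses log2(n) bits for an n × n matrix whose size is a power of two, which gives 5 + 5 = 10 bits for a 32 × 32 matrix, and a square subset sends one coordinate only. The function names and the bit-string representation are illustrative assumptions.

```python
from typing import Tuple


def encode_last(last_row: int, last_col: int, n: int, square: bool = False) -> str:
    """Fixed-length code for the subset corner (last_row, last_col) of an n x n matrix."""
    bits = (n - 1).bit_length()  # log2(n) for power-of-two n: 5 bits when n = 32
    if square:
        if last_row != last_col:
            raise ValueError("a square subset needs last_row == last_col")
        return format(last_row, f"0{bits}b")
    return format(last_row, f"0{bits}b") + format(last_col, f"0{bits}b")


def decode_last(code: str, n: int, square: bool = False) -> Tuple[int, int]:
    """Inverse of encode_last."""
    bits = (n - 1).bit_length()
    row = int(code[:bits], 2)
    col = row if square else int(code[bits:2 * bits], 2)
    return row, col


print(encode_last(7, 7, 32))               # 0011100111 (10 bits)
print(encode_last(7, 7, 32, square=True))  # 00111 (5 bits)
```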

After the last values (last_row, last_col) are coded, the quantization matrix entries belonging to the subset (e.g., the values of the first subset) may be coded. The HM5.1 method or any other method (e.g., the techniques described in AVC/H.264, JCTVC-F085, JCTVC-E073, or U.S. patent application No. 13/649,836, filed on 10/11/2012) may be used to code these entries. The coding may be lossy or lossless.

According to the techniques of U.S. patent application No. 13/649,836, video encoder 20 and video decoder 30 may perform a raster scan over the values of the first subset of the quantization matrix and use a nonlinear predictor for coding prediction errors. According to one example technique, the prediction value is the maximum of the value to the left of the current scan position and the value above it within the first subset of the quantization matrix. In other words, when the quantization matrix is scanned in raster order, the current value is predicted as the maximum of the value to its left and the value above it. Raster order generally refers to scanning the values in the quantization matrix row by row from top to bottom, and from left to right within each row. In general, the values in the quantization matrix correspond to respective transform coefficients in a transform coefficient block, with coefficients toward the upper left being low frequency and coefficients toward the lower right increasing in frequency.
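A sketch of this raster-scan, max(left, above) prediction for the first subset follows, showing both the encoder (prediction errors) and the decoder (reconstruction). It assumes that a missing left or above neighbor counts as zero, the same convention stated later for the second subset, and the function names are hypothetical. For a matrix that is non-decreasing along rows and columns, every prediction error is non-negative.

```python
from typing import List

Matrix = List[List[int]]


def _predict(q: Matrix, r: int, c: int) -> int:
    """Nonlinear predictor: max of the left and above entries (0 if absent)."""
    left = q[r][c - 1] if c > 0 else 0
    above = q[r - 1][c] if r > 0 else 0
    return max(left, above)


def encode_subset(q: Matrix, last_row: int, last_col: int) -> List[int]:
    """Prediction errors for entries (0, 0)..(last_row, last_col) in raster order."""
    return [q[r][c] - _predict(q, r, c)
            for r in range(last_row + 1) for c in range(last_col + 1)]


def decode_subset(errors: List[int], last_row: int, last_col: int) -> Matrix:
    """Rebuild the subset; raster order ensures neighbors are decoded first."""
    q = [[0] * (last_col + 1) for _ in range(last_row + 1)]
    it = iter(errors)
    for r in range(last_row + 1):
        for c in range(last_col + 1):
            q[r][c] = _predict(q, r, c) + next(it)
    return q


q = [[16, 16, 18],
     [16, 18, 20],
     [18, 20, 24]]
errs = encode_subset(q, 2, 2)
print(errs)  # [16, 0, 2, 0, 2, 2, 2, 2, 4]
```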

After the quantization matrix entries belonging to the subset have been coded, the remaining quantization matrix entries (e.g., the coefficient values of the second subset) may be predicted from the entries belonging to the first subset. This process may be performed by both the encoder and the decoder. For example, if the entries belonging to the subset were coded in a lossy manner, they are first reconstructed. Then, the quantization matrix entries outside the subset (e.g., the coefficient values of the second subset) are scanned, in raster scan order as one example, and their values are predicted.

In examples of this disclosure, video encoder 20 may be configured to signal quantization matrix values for a subset of the quantization matrices. For example, a video encoder may divide a quantization matrix into at least a first subset and a second subset of quantization matrix values. The video encoder may encode the coefficient values of the first subset and signal these encoded values as syntax elements to the video decoder. The video decoder may decode the coefficient values of the first subset according to the received syntax elements.

In some examples of this disclosure, video decoder 30 may predict the values of the second subset. For example, video encoder 20 need not signal the syntax elements from which the coefficient values of the second subset would otherwise be derived. Instead, video decoder 30 may use the techniques of this disclosure to predict the values of the second subset without such syntax elements. In this way, the amount of data that needs to be signaled for the quantization matrix may be reduced.

As one example, video decoder 30 may predict the coefficient values of the second subset of quantization matrix values based on the decoded coefficient values of the first subset, as discussed in more detail below. As another, non-limiting example, to predict the values of the second subset, video decoder 30 may assign a constant value to each coefficient in the second subset. The constant value may be the maximum allowable quantization matrix value. In some examples, video encoder 20 may signal the constant value to video decoder 30; alternatively, video encoder 20 and video decoder 30 may be preprogrammed with the constant value.

Fig. 4 is a graphical diagram illustrating an example quantization matrix. Fig. 4 illustrates quantization matrix 94, a 32 × 32 quantization matrix used to quantize a 32 × 32 block of residual transform coefficients. Although the techniques of fig. 4 are described in the context of a 32 × 32 quantization matrix, aspects of this disclosure are not so limited and may be extended to quantization matrices of other sizes, including non-square quantization matrices. Quantization matrix 94 includes a first subset 96 that contains a subset of the entry values of quantization matrix 94. In this example, first subset 96 is an 8 × 8 matrix (with quantization matrix value a001 in the upper-left corner and quantization matrix value a232 in the lower-right corner), although other sizes (including non-square sizes) are possible. In this example, the coefficient values of the entries of first subset 96 may be encoded and signaled by video encoder 20. The size of first subset 96 may also be encoded and signaled. The size may be expressed as the last row and last column of first subset 96, i.e., (7, 7), assuming that a001 is at (0, 0) in quantization matrix 94. Because this subset is square, only one value (e.g., 7) may need to be signaled. For non-square subsets, both the last row value and the last column value may be encoded and signaled.

The second subset 98 includes, among others, quantization matrix values a009, a257, and a1024, and is bounded by the dotted line; the ellipses represent additional quantization matrix values omitted to reduce the size of the figure. In some examples, no syntax elements are signaled for the entries of the second subset 98. In other words, the values of the entries of the second subset 98 may be predicted without using syntax elements calculated from the coefficient values of those entries. In some other examples, the values of the entries of the second subset 98 may be determined from downsampled values of the second subset received from the video encoder, as discussed in more detail below.

In some examples, the value of each entry of the second subset 98 may be the maximum of the quantization matrix value above that entry and the quantization matrix value to its left. If there is no left or above value, that value is assumed to be zero. For example, to predict the coefficient values of the second subset, video encoder 20 or video decoder 30 may set the coefficient value of the current entry of the second subset, at coordinate position [x, y], to the greater of the coefficient value of the entry to the left at position [x-1, y] and the coefficient value of the entry above at position [x, y-1] (assuming [0, 0] is the upper-left corner and [n-1, n-1] is the lower-right corner of an n-by-n quantization matrix).
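The prediction rule above can be sketched as follows, assuming the reconstructed first-subset values already occupy the top-left `(last_row + 1) × (last_col + 1)` corner. The function name is hypothetical. Because the scan is in raster order, the left and above neighbors of each entry are always known, whether they come from the first subset or were themselves predicted earlier. This is how entries such as a042 in fig. 4 are derived from other predicted entries.

```python
from typing import List

Matrix = List[List[int]]


def predict_second_subset(q: Matrix, last_row: int, last_col: int) -> Matrix:
    """Fill every entry outside the (0, 0)..(last_row, last_col) subset.

    Entry [x, y] (column x, row y) becomes max(left, above), with a missing
    neighbor treated as zero. Values outside the subset in `q` are ignored.
    """
    out = [row[:] for row in q]
    for y in range(len(out)):
        for x in range(len(out[0])):
            if y <= last_row and x <= last_col:
                continue  # first subset: already decoded
            left = out[y][x - 1] if x > 0 else 0
            above = out[y - 1][x] if y > 0 else 0
            out[y][x] = max(left, above)
    return out


# Usage: a 4x4 matrix with only its 2x2 first subset known.
partial = [[16, 18, 0, 0],
           [18, 20, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
for row in predict_second_subset(partial, 1, 1):
    print(row)
```

As with a009 and a257 in fig. 4, the first entry to the right of the subset on the top row copies its left neighbor, and the first entry below the subset in the left column copies its above neighbor.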

In some examples, the values of the entries of the first subset 96 may be predicted in raster scan order; however, other scan orders may be used. In this example, rather than signaling each quantization matrix value itself, the difference between the current quantization matrix value and its prediction value is signaled. Because quantization matrix values generally increase in the horizontal and vertical directions, the prediction error (i.e., the difference between the current quantization matrix value and the proposed prediction value, which is the maximum of the above and left quantization matrix values) is almost always non-negative. Notably, this prediction scheme also works well for asymmetric quantization matrices, for which zigzag-based scanning is less efficient.

In some examples, the prediction error is coded using a Golomb code. The Golomb code parameter may be included in the encoded video bitstream by the encoder (using a fixed- or variable-length code), or it may be known to both the encoder and the decoder. Other methods, such as exponential-Golomb coding, may also be used to code the prediction error. A Golomb code may be preferable because the prediction errors are somewhat spread out. To code the occasional negative value, a remapping method may be used.
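A sketch of Golomb coding with a remapping step follows. Several assumptions are made for simplicity. The Golomb divisor is restricted to a power of two, 2**k, which makes this a Rice code. The quotient is written in unary as ones terminated by a zero, followed by a k-bit remainder. The remapping 0, −1, 1, −2, 2, … → 0, 1, 2, 3, 4, … is one possible choice. All names are hypothetical.

```python
from typing import List


def remap(v: int) -> int:
    """Map signed prediction errors to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def golomb_rice(n: int, k: int) -> str:
    """Golomb code with divisor 2**k: unary quotient, '0' terminator, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    suffix = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + suffix


def encode_errors(errors: List[int], k: int) -> str:
    """Concatenate the codewords for a list of signed prediction errors."""
    return "".join(golomb_rice(remap(e), k) for e in errors)


print(golomb_rice(5, 1))              # 1101
print(encode_errors([0, 2, -1], k=1)) # 00 + 1100 + 01 -> 00110001
```

A larger k shortens the codewords for large errors at the cost of longer codewords for small ones, which is why the parameter may be signaled or fixed in advance as described above.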

In some examples, one or more of the predicted coefficient values of the second subset may be predicted from other predicted coefficient values of the second subset. For example, the entry above the current entry of the second subset and the entry to its left may themselves belong to the second subset. In that case, the coefficient values used to predict the current entry are themselves predicted values, and all coefficient values of the second subset can be predicted in this way. Video encoder 20 and video decoder 30 may use this process to derive all quantization matrix entries outside the first subset (i.e., in the second subset). Example quantization matrices and reconstructed quantization matrices are illustrated in fig. 5 and 6 and described in more detail below.

Referring back to fig. 4, as one example, the value of coefficient a009 of the second subset 98 is predicted to be equal to coefficient a008 of the first subset 96, because no value is available above a009. The value of coefficient a257 of the second subset 98 is predicted to be equal to coefficient a225 of the first subset 96, because no value is available to the left of a257. The value of coefficient a042 of the second subset 98 is predicted to be the greater of the values of coefficients a010 and a041 (both belonging to the second subset 98). In this example, the values of coefficients a010 and a041 are themselves predicted values, because both coefficients are in the second subset 98.

Fig. 5 is a graphical diagram illustrating a quantization matrix with example values that may be signaled using prediction according to the techniques described above. Fig. 6 is a graphical diagram illustrating a reconstructed quantization matrix produced using one or more example techniques of this disclosure. For purposes of illustration, fig. 5 shows quantization matrix 100, which is an 8 × 8 matrix. In this example, video encoder 20 may signal the values of the first 5 × 5 entries of quantization matrix 100 (outlined in bold). That is, the first subset 101 of quantization matrix 100 consists of the first 5 × 5 values, which means that the values of last_row and last_col are each 4 (assuming zero-based indexing). Because first subset 101 is square, video encoder 20 may signal only a single value (e.g., 5), since the last_row and last_col values are the same. The remaining values of quantization matrix 100 (i.e., the values outside first subset 101) are considered to be in the second subset.

Fig. 6 illustrates reconstructed quantization matrix 102. In this example, video decoder 30 (or video encoder 20 in its reconstruction loop) may use one of the example techniques to generate reconstructed quantization matrix 102. For example, video decoder 30 and video encoder 20 may determine the values of the second subset by taking the maximum of the coefficient to the left of the current coefficient and the coefficient above the current coefficient.

Reconstructed quantization matrix 102 illustrates the result of this technique. The first 5 × 5 entries, in first subset 103, are the same as the first 5 × 5 entries of first subset 101 of quantization matrix 100, because these values are explicitly signaled. The remaining values (i.e., the values of the second subset, outside first subset 103) are derived by taking the maximum of the coefficients above and to the left of the current coefficient.

In some examples, other scans and/or predictions may be used instead of the raster scan and prediction described above. Alternatively, the quantization matrix entries outside the first subset (i.e., the coefficient values of the second subset) may be set to a constant value, such as the maximum allowable quantization matrix value. This constant value may be signaled from the video encoder to the video decoder in the bitstream, or the video encoder and video decoder may be preprogrammed with it.

In some examples, video encoder 20 may predict the values in the second subset in the same way as video decoder 30. For example, video encoder 20 may predict the values of the second subset and replace the original values in the second subset with the predicted values. In this way, the quantization matrices used at the video encoder side and the video decoder side may be the same.

In some video coding examples, using a constant value or a prediction from the first subset may not be sufficient to determine the quantization matrix entries that are not explicitly signaled (i.e., the entries outside the rectangle from (0, 0) to (last_row, last_col), which are the values of the second subset). Other examples of signaling the quantization matrix values are described below, such as using the values of a different matrix, or using downsampled values, to determine the values of the second subset.

As one example, entries that are not explicitly signaled (e.g., values of the second subset) may be derived from a different matrix (e.g., a smaller quantization matrix). Such a smaller quantization matrix may already have been coded in the bitstream signaled by the video encoder. In some examples, the different matrix may be a quantization matrix that video encoder 20 signaled previously.

For example, a video encoder may signal the values of quantization matrices of different sizes (e.g., 4 × 4, 8 × 8, 16 × 16, or 32 × 32). In this example, video decoder 30 may use coefficient values from any quantization matrix previously encoded in the bitstream to reconstruct the current quantization matrix. For example, assume that the quantization matrix to be reconstructed is a 32 × 32 quantization matrix. In this example, video encoder 20 may signal coefficient values for a first subset of the 32 × 32 quantization matrix. Assuming that the video decoder has already received a quantization matrix of size 4 × 4, 8 × 8, or 16 × 16, the video decoder may use that quantization matrix to determine the values of the second subset when reconstructing the 32 × 32 quantization matrix.

In some examples, a 32 × 32 quantization matrix may be reconstructed using any of the 4 × 4, 8 × 8, or 16 × 16 quantization matrices. For example, to reconstruct a 32 × 32 quantization matrix, video decoder 30 may use an 8 × 8 quantization matrix that was itself reconstructed using a 4 × 4 quantization matrix. However, such hierarchical reconstruction may not be necessary in every instance. For example, video encoder 20 may signal the entire 8 × 8 quantization matrix that video decoder 30 uses to reconstruct the 32 × 32 quantization matrix. Some of the values of the 32 × 32 quantization matrix may be signaled, while other values may be reconstructed from one or more of the smaller matrices.

Additionally, in some examples, the video encoder may signal the size of the smaller matrix (e.g., the first subset). In another example, video decoder 30 and video encoder 20 may be preprogrammed with the size of the smaller matrix (e.g., the size of the smaller matrix may be known a priori to video encoder 20 and video decoder 30).

As a specific example, assume that the quantization matrix is 32 × 32, last_row is 14, and last_col is 14. In this example, video encoder 20 signals the values of the lowest-frequency 15 × 15 entries of the 32 × 32 quantization matrix. Assume that video decoder 30 is deriving values for matrix entries with indices (r, c), where r >= 15 or c >= 15. In this example, to derive these quantization matrix values, the video decoder may use values from a different matrix (e.g., an 8 × 8 matrix), which may be a smaller quantization matrix.

Video decoder 30 may use a smaller quantization matrix to determine the values of the second subset in different ways. For example, the video decoder may determine the ratio between the size of the quantization matrix and the size of the smaller matrix. Video decoder 30 may divide the position coordinates of each entry whose value is being determined (e.g., each entry of the second subset) by this ratio, and use ceiling and floor functions to determine the corresponding positions in the smaller matrix. Video decoder 30 may then use the values at those positions in the smaller matrix to determine the values of the second subset in the quantization matrix being reconstructed.

For example, let $Q_N(r, c)$ denote the value at position $(r, c)$ of the reconstructed $N \times N$ quantization matrix, where $r$ is the row index and $c$ is the column index. Let $r_L = \lfloor r/4 \rfloor$, $r_H = \lceil r/4 \rceil$, $c_L = \lfloor c/4 \rfloor$, and $c_H = \lceil c/4 \rceil$, where the factor 4 is derived as 32/8. Here, floor(x) denotes the largest integer less than or equal to x, and ceil(x) denotes the smallest integer greater than or equal to x. Then $Q_{32}(r, c)$ may be set as

$$Q_{32}(r, c) = Q_8(r_L, c_L),$$

or $Q_{32}(r, c)$ may be set to the average of $Q_8(r_L, c_L)$, $Q_8(r_L, c_H)$, $Q_8(r_H, c_L)$, and $Q_8(r_H, c_H)$:

$$Q_{32}(r, c) = \frac{1}{4}\left[Q_8(r_L, c_L) + Q_8(r_L, c_H) + Q_8(r_H, c_L) + Q_8(r_H, c_H)\right].$$

If the entire 8 × 8 quantization matrix is sent to the decoder, the reconstructed 8 × 8 matrix is the same as the original 8 × 8 quantization matrix. Bilinear interpolation, other more complex interpolation techniques, and/or longer interpolation filters may also be used. The size of the matrix from which the missing values are derived is either signaled in the bitstream or known a priori to the video encoder and video decoder. The values of the smaller matrix (e.g., the first subset) may also be included in the bitstream.
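The two rules above can be sketched in Python for the 32 × 32-from-8 × 8 case. The function names are hypothetical. One detail is an assumption not covered by the text: for positions near the last row or column, ceil(r/4) or ceil(c/4) can equal 8, which lies outside an 8 × 8 matrix, so the sketch clamps those indices to 7. Averages are returned unrounded because the text does not specify a rounding rule.

```python
import math


def derive_from_smaller(small, n, r, c, mode="floor"):
    """Value for entry (r, c) of an n x n matrix derived from an m x m matrix.

    mode="floor":   Q_n(r, c) = Q_m(r_L, c_L)
    mode="average": mean of Q_m at (r_L, c_L), (r_L, c_H), (r_H, c_L), (r_H, c_H)
    """
    m = len(small)
    ratio = n // m                                  # 4 for 32 x 32 from 8 x 8
    r_lo, c_lo = math.floor(r / ratio), math.floor(c / ratio)
    if mode == "floor":
        return small[r_lo][c_lo]
    # Assumption: clamp ceil indices that would fall outside the smaller matrix.
    r_hi = min(math.ceil(r / ratio), m - 1)
    c_hi = min(math.ceil(c / ratio), m - 1)
    total = (small[r_lo][c_lo] + small[r_lo][c_hi]
             + small[r_hi][c_lo] + small[r_hi][c_hi])
    return total / 4                                # unrounded


def reconstruct(signaled, last_row, last_col, small, mode="floor"):
    """Keep the signaled rectangle; derive every other entry from `small`."""
    n = len(signaled)
    out = [row[:] for row in signaled]
    for r in range(n):
        for c in range(n):
            if r > last_row or c > last_col:       # second subset
                out[r][c] = derive_from_smaller(small, n, r, c, mode)
    return out


# Hypothetical 8 x 8 matrix and 15 x 15 signaled corner (last_row = last_col = 14).
small = [[8 * i + j for j in range(8)] for i in range(8)]
q32 = reconstruct([[1] * 32 for _ in range(32)], 14, 14, small)
print(q32[15][0], derive_from_smaller(small, 32, 17, 6, "average"))   # 24 37.5
```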

In AVC/H.264, zigzag scanning and differential pulse code modulation (DPCM, i.e., prediction from the previous value in the scanning order) are used. In that scheme, a quantization matrix value decoded as zero indicates that no more quantization matrix values are to be decoded and that the last decoded positive quantization matrix value is repeated for the remaining positions. In this case, instead of repeating the last coded quantization matrix value, the remaining quantization matrix values may be derived from a smaller quantization matrix, as described earlier.
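The AVC-style termination and the proposed alternative can be sketched as follows. The update nextScale = (lastScale + delta_scale + 256) % 256, with lastScale starting at 8, follows the H.264/AVC scaling-list syntax; the `fill_remaining` callback is a hypothetical hook standing in for derivation from a smaller quantization matrix. The AVC special case in which the first value decodes to zero (selection of a default matrix) is not modeled.

```python
def decode_scaling_list(deltas, size, fill_remaining=None):
    """Rebuild scan-ordered quantization matrix values from DPCM deltas.

    H.264/AVC rule: last starts at 8 and next = (last + delta + 256) % 256.
    A next value of 0 ends the list. AVC repeats the last value for the
    remaining positions; if the hypothetical hook fill_remaining(j) is given,
    it supplies the values for positions j..size-1 instead (for example,
    values derived from a smaller quantization matrix).
    The AVC case of 0 at position 0 (default matrix) is not modeled.
    """
    values, last, it = [], 8, iter(deltas)
    for j in range(size):
        nxt = (last + next(it) + 256) % 256
        if nxt == 0:
            tail = fill_remaining(j) if fill_remaining else [last] * (size - j)
            values.extend(tail)
            break
        values.append(nxt)
        last = nxt
    return values


deltas = [8, 2, 3, -21]                 # hypothetical; the last delta yields 0
print(decode_scaling_list(deltas, 6))                            # [16, 18, 21, 21, 21, 21]
print(decode_scaling_list(deltas, 6, lambda j: [40] * (6 - j)))  # [16, 18, 21, 40, 40, 40]
```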

As described above, in some examples, video decoder 30 may determine the values of the second subset without receiving any syntax elements that indicate the values of the second subset. However, avoiding signaling of the quantization matrix values in the second subset may not be beneficial in every instance. That is, signaling at least some quantization matrix values for the higher frequency components of the quantization matrix (e.g., the values of the second subset) may provide a better tradeoff between coding efficiency and error in the reconstructed quantization matrix.

In another example of this disclosure, as described in more detail below, video encoder 20 may downsample the values of a subset of quantization matrix values and signal the downsampled values. The video decoder may upsample the downsampled values to determine the values needed to reconstruct the quantization matrix at the video decoder side. Because there are fewer downsampled values than original values, signaling the downsampled values may reduce the amount of data signaled for the quantization matrix.

In one example of downsampling, the values outside a subset of the quantization matrix (e.g., outside the rectangle from (0, 0) to (last_row, last_col)), i.e., the values in the second subset, may be downsampled by a particular factor (e.g., 2), and the downsampled values may be coded losslessly in the bitstream. Any coding method, such as the coding methods described in AVC/H.264, JCTVC-F085, or JCTVC-E073, or the techniques described in U.S. patent application Ser. No. 13/649,836, may be used to code the downsampled values. Downsampling may be performed using simple averaging (e.g., averaging the quantization matrix values in an N × N region) or using more complex filters and/or equations. Both video encoder 20 and video decoder 30 may upsample the coded values to generate the values outside the first subset (i.e., the values of the second subset). The upsampling may use simple pixel repetition (i.e., using the downsampled value for all coordinates within the downsampled region) or a more complex technique. For example, the downsampled quantization matrix values may be treated like a downsampled image, and known image upsampling techniques (e.g., bilinear interpolation or bicubic interpolation) may then be used to upsample the downsampled quantization matrix.
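A minimal sketch of uniform downsampling by simple averaging and upsampling by pixel repetition follows. For simplicity it operates on a whole square matrix whose size is a multiple of the factor, and it truncates each average to an integer; both are assumptions, since the text leaves the region and the rounding open. The function names are hypothetical.

```python
def downsample(q, f):
    """Average each f x f block (simple averaging, truncated to an integer)."""
    n = len(q)
    return [[sum(q[r][c] for r in range(br, br + f) for c in range(bc, bc + f)) // (f * f)
             for bc in range(0, n, f)]
            for br in range(0, n, f)]


def upsample_repeat(d, f):
    """Pixel repetition: every position in an f x f block takes the block value."""
    n = len(d) * f
    return [[d[r // f][c // f] for c in range(n)] for r in range(n)]


q = [[16, 16, 18, 20],
     [16, 18, 20, 22],
     [18, 20, 24, 28],
     [20, 22, 28, 32]]                    # hypothetical 4 x 4 values
d = downsample(q, 2)                      # [[16, 20], [20, 28]]
print(upsample_repeat(d, 2))
```

Only the four values of `d` would be coded; the decoder recovers a 4 × 4 approximation of `q` by repetition.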

As described above, video encoder 20 may explicitly signal the coefficient values of the first subset and determine the coefficient values of the second subset using some form of prediction. The following techniques take a different approach: rather than signaling the coefficient values of the first subset and determining the values of the second subset by prediction, they may allow a video encoder to generate a coded bitstream that signals the coefficient values of a quantization matrix at different levels of coarseness. For example, coefficient values corresponding to lower frequency components of the quantization matrix may be signaled losslessly (i.e., explicitly), and other coefficient values (e.g., in the second subset, the third subset, etc.) may be signaled increasingly coarsely (e.g., by using different downsampling factors). The coefficient values corresponding to lower frequency positions are typically located closest to the origin of the quantization matrix (i.e., their row and column indices are closest to (0, 0)). In general, the following techniques allow a video encoder to apply a non-uniform amount of downsampling to quantization matrix values based on where the values are located in the quantization matrix.

The techniques of this example may provide a scheme by which coefficient values located farther from the origin of the quantization matrix are more coarsely approximated than coefficient values located closer to the origin of the quantization matrix. In this example, the approximated quantization matrix values (e.g., in the second and/or third subsets or larger subsets) may be coded and signaled in the bitstream.

However, in some alternative examples, the following technique may be utilized for coefficient values of a first subset, where the first subset is similar to the first subset described above. In these alternative examples, the techniques may determine the coefficient values for the second subset using any of the above example techniques.

For example, for quantization matrix values located in a region near the origin of the quantization matrix (e.g., in the first subset near (0, 0)), video encoder 20 may apply no downsampling (i.e., a downsampling factor of 1). In this region, all quantization matrix values are signaled. For coefficient values located farther from the origin of the quantization matrix (e.g., in a second subset outside the first subset), video encoder 20 may apply a higher level of downsampling (e.g., a downsampling factor of 2, 3, 4, etc.). A downsampling factor greater than 1 indicates the number of coefficient values, in each direction, represented by one value. As an example, a downsampling factor of 2 may mean that 2² (i.e., 4) coefficient values of the quantization matrix are represented by each encoded value when pixel repetition is used for reconstruction. Similarly, a downsampling factor of 4 may mean that 4² (i.e., 16) coefficient values of the quantization matrix are represented by each encoded value when pixel repetition is used for reconstruction.

As discussed above, the downsampled value may be computed as a simple average. For example, at the encoder side, for a downsampling factor of 2, the four quantization matrix values in a 2 × 2 square are averaged and the average of those four values is signaled. Likewise, for a downsampling factor of 4, the sixteen quantization matrix values in a 4 × 4 square are averaged and the average of those sixteen values is signaled. Other, more complex equations or filtering techniques may be used to calculate the downsampled values.

In some examples, video encoder 20 may establish downsampling transition points (e.g., boundaries) within a quantization matrix. Coefficient values between the origin of the quantization matrix and the first transition point may be downsampled by a first downsampling factor (which may be as low as one, meaning no downsampling), coefficient values between the first and second transition points may be downsampled by a second downsampling factor, coefficient values between the second and third transition points may be downsampled by a third downsampling factor, and so on. In some examples, the amount by which the downsampling factor changes may differ from one transition point to the next, but aspects of the invention are not limited thereto.

In some examples, syntax elements indicating the locations of the regions (i.e., the subsets of quantization matrix values) may not be included in the bitstream. Instead, the locations of the regions are known in advance at both the video encoder and the video decoder. Using a downsampling factor of 1 is equivalent to sending all values, as was done for the low frequency subset values in the previous example (i.e., the values located closest to the origin of the quantization matrix). In addition, for other regions that use downsampling factors greater than 1, additional quantization matrix values may be included in the bitstream. An example of such a scheme for a 16 × 16 block is shown in fig. 7.

In the example of fig. 7, if the row and column indices are both in the range 0 <= index <= 3, a downsampling factor of 1 is used in each direction (i.e., no downsampling). If the row and column indices are both in the range 0 <= index <= 7, but not both in the range 0 <= index <= 3, a downsampling factor of 2 is used in each direction (row/column). For all remaining values, a downsampling factor of 4 is used in each direction. In fig. 7, one quantization matrix value is coded for each numbered square. This value can be derived by simply averaging all the quantization matrix values of the original 16 × 16 quantization matrix that belong to the corresponding square. Although simple averaging is used in this example, a more complex downsampling filter may also be used. Squares 0-15 each correspond directly to one quantization matrix value because the downsampling factor is 1 in this region. Squares 16-27 correspond to 2 × 2 blocks of quantization matrix values (i.e., 4 quantization matrix values each) because the downsampling factor is 2 in this region. Squares 28-39 correspond to 4 × 4 blocks of quantization matrix values (i.e., 16 quantization matrix values each) because the downsampling factor is 4 in this region. The numbers within the squares indicate the zigzag scanning order in which the values are coded in the bitstream.
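The region rule of fig. 7 can be written as a small lookup, sketched below with hypothetical names. Each region is described by its extent (the side of the square, anchored at (0, 0), that it fills) and its downsampling factor. Counting one coded value per square gives 16 + 12 + 12 = 40 values, which is consistent with squares numbered 0 to 39.

```python
def factor_at(r, c, regions):
    """Downsampling factor for matrix position (r, c).

    regions: (extent, factor) pairs in increasing extent; a position belongs
    to the first region whose [0, extent) x [0, extent) square contains it.
    """
    for extent, factor in regions:
        if r < extent and c < extent:
            return factor
    raise ValueError("position outside the matrix")


def coded_value_count(regions):
    """One coded value per factor x factor square in each region."""
    count, prev = 0, 0
    for extent, factor in regions:
        count += (extent * extent - prev * prev) // (factor * factor)
        prev = extent
    return count


FIG7 = [(4, 1), (8, 2), (16, 4)]          # 16 x 16 layout described for fig. 7
print(factor_at(2, 5, FIG7), coded_value_count(FIG7))   # 2 40
```

Position (2, 5) illustrates why the factor-2 condition is "not both in 0..3": its row index is in that range, yet it lies in the factor-2 region.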

One quantization matrix value corresponding to each square may be included in the bitstream. This may be accomplished using a separate zig-zag scan of the region for each downsampling factor. For example, squares 0-15, corresponding to a downsampling factor of 1, are scanned first in zigzag order. This scan is followed by a zigzag scan of squares 16-27, corresponding to a downsampling factor of 2, and then by a zigzag scan of squares 28-39, corresponding to a downsampling factor of 4. When the zig-zag scan for a higher downsampling factor crosses an area already covered by the zig-zag scan for a lower downsampling factor, no value is coded for that area (e.g., when going from square 16 to square 17). However, if DPCM is used to code the downsampled values, the prediction for the next value in the zigzag scan may be derived from the corresponding quantization matrix values for the lower downsampling factor, which have already been coded in the bitstream.

For example, in fig. 7, consider the zigzag scan corresponding to a downsampling factor of 2. Between squares 16 and 17, this scan crosses a region (covered by squares 11, 13, 14, and 15) that has already been covered by the zigzag scan corresponding to a downsampling factor of 1. Thus, no value is coded to the bitstream for that region, because it has already been coded. However, when the quantization matrix value of square 17 is coded using DPCM, the prediction value is derived from the coded values of squares 11, 13, 14, and 15. The prediction value may be a simple average of these coded values, truncated to an integer.
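The separate zig-zag scans, the skipping of already covered areas, and the modified DPCM prediction can be sketched as below. The names and the (row, column, size) tuples are hypothetical. The text describes the prediction only for the transition from square 16 to square 17; applying the same rule after every skipped area (predicting from the truncated mean of the coded squares inside the last skipped block) is an assumption of this sketch, as is predicting the very first value from 0.

```python
def zigzag(g):
    """(row, col) pairs of a g x g grid in zig-zag order."""
    order = []
    for s in range(2 * g - 1):
        diag = [(r, s - r) for r in range(g) if 0 <= s - r < g]
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def number_squares(regions):
    """Run one zig-zag scan per downsampling factor, skipping covered areas.

    Returns (squares, skipped_before): squares[i] = (row0, col0, size) of the
    i-th coded square; skipped_before[i] lists the already covered blocks the
    scan crossed just before square i.
    """
    squares, skipped_before, prev = [], [], 0
    for extent, f in regions:
        pending = []
        for br, bc in zigzag(extent // f):
            r0, c0 = br * f, bc * f
            if r0 + f <= prev and c0 + f <= prev:   # coded at a lower factor
                pending.append((r0, c0, f))
                continue
            squares.append((r0, c0, f))
            skipped_before.append(pending)
            pending = []
        prev = extent
    return squares, skipped_before


def prediction(i, squares, skipped_before, coded):
    """Assumed rule: after a skip, truncated mean of the coded squares inside
    the last skipped block; otherwise the previous coded value (0 at first)."""
    if skipped_before[i]:
        r0, c0, size = skipped_before[i][-1]
        inside = [coded[j] for j, (r, c, _) in enumerate(squares[:i])
                  if r0 <= r < r0 + size and c0 <= c < c0 + size]
        return sum(inside) // len(inside)
    return coded[i - 1] if i > 0 else 0


def dpcm_residuals(coded, squares, skipped_before):
    return [coded[i] - prediction(i, squares, skipped_before, coded)
            for i in range(len(coded))]


FIG7_REGIONS = [(4, 1), (8, 2), (16, 4)]
squares, skipped = number_squares(FIG7_REGIONS)
print(len(squares), squares[16], squares[17], skipped[17])
# 40 (4, 0, 2) (0, 4, 2) [(2, 2, 2)]
```

The skipped block before square 17 covers rows 2-3 and columns 2-3, whose factor-1 squares are 11, 13, 14, and 15 in the zig-zag numbering, matching the example above.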

Upon receiving the downsampled quantization matrix, video decoder 30 may decode the quantization matrix values for the coefficient values in the same order in which the quantization matrix values are included in the bitstream. Video decoder 30 may use simple repetition to perform upsampling of the quantization matrix values. That is, all locations within the square use the same quantization matrix value. This quantization matrix value is typically the coded value corresponding to that square. More complex upsampling filters may also be used.

As described above with respect to other techniques, the downsampled quantization matrix values may be coded using DPCM (prediction from the previous value in the scan) followed by signed exponential-Golomb coding. When no value is coded because a region has already been covered by the zig-zag scan for a lower downsampling factor, the prediction of the next coded value is modified as described above. Any other prediction and coding method may also be used. Instead of the 3 downsampling factors shown in fig. 7, fewer or more downsampling factors and regions may be used. FIG. 8 shows an example with 2 downsampling factors for an 8 × 8 block, in which blocks 0-15 have a downsampling factor of 1 and blocks 16-27 have a downsampling factor of 2.
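Signed exponential-Golomb coding of the DPCM residuals can be sketched as follows. The mapping 0, 1, -1, 2, -2 → 0, 1, 2, 3, 4 and the codeword layout (M leading zeros followed by the (M + 1)-bit binary form of codeNum + 1) follow the se(v) descriptor used in H.264/AVC and HEVC; the bit-string representation and the function names are illustrative.

```python
def se_codeword(v):
    """Signed Exp-Golomb codeword, as a bit string, for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v       # 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4
    value = code_num + 1
    m = value.bit_length() - 1                      # number of leading zeros
    return "0" * m + format(value, "b")


def se_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (value, next position)."""
    m = 0
    while bits[pos + m] == "0":
        m += 1
    code_num = int(bits[pos + m:pos + 2 * m + 1], 2) - 1
    v = (code_num + 1) // 2 if code_num % 2 else -(code_num // 2)
    return v, pos + 2 * m + 1


def se_decode_all(bits):
    out, pos = [], 0
    while pos < len(bits):
        v, pos = se_decode(bits, pos)
        out.append(v)
    return out


residuals = [4, -1, 0, 7]                           # hypothetical DPCM residuals
bits = "".join(se_codeword(v) for v in residuals)
assert se_decode_all(bits) == residuals
print([se_codeword(v) for v in (0, 1, -1, 2, -2)])  # ['1', '010', '011', '00100', '00101']
```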

Note also that other types of scans, such as an up-diagonal scan, may be used. Also, the scanning may be performed in reverse order. For example, the values corresponding to a downsampling factor of 3 may be coded first, followed by the values corresponding to a downsampling factor of 2, and so on.

In one specific example of this disclosure, the DC coefficient of the quantization matrix (i.e., the quantization matrix value at position (0, 0)) is the only value in the first subset and is downsampled by a factor of 1 (i.e., it is explicitly signaled). All other quantization matrix values are considered to be in the second subset and are downsampled by a factor of 2 or more. Fig. 9 shows a 16 × 16 quantization matrix coded according to this example. As shown in fig. 9, the DC coefficient in square 0 is explicitly coded (i.e., downsampled by a factor of 1), and all other quantization matrix values are downsampled by a factor of 2. Note that square 1, which is downsampled by a factor of 2, technically includes the DC coefficient. The value for this particular 2 × 2 block may be determined as the average of the three remaining quantization matrix values (i.e., excluding the DC coefficient), as the average of all four quantization matrix values in the region (i.e., including the DC coefficient), or by using some other filtering technique.
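The coded values for the fig. 9 layout can be sketched as follows, with the two options for the block that contains the DC position selectable by a flag. The function name, the flag, and the truncating integer average are assumptions made for illustration.

```python
def fig9_values(q, include_dc=False):
    """Coded values for a DC-plus-factor-2 layout.

    Returns (dc, blocks): dc is q[0][0], sent as is; blocks[br][bc] is the
    truncated mean of the 2 x 2 block whose top-left entry is (2*br, 2*bc).
    For the block containing the DC position, include_dc chooses between
    averaging all four values and averaging the three non-DC values.
    """
    n = len(q)
    blocks = []
    for br in range(n // 2):
        row = []
        for bc in range(n // 2):
            vals = [q[r][c] for r in (2 * br, 2 * br + 1) for c in (2 * bc, 2 * bc + 1)]
            if br == 0 and bc == 0 and not include_dc:
                vals = vals[1:]                     # drop the DC value q[0][0]
            row.append(sum(vals) // len(vals))
        blocks.append(row)
    return q[0][0], blocks


q = [[16 + r + c for c in range(16)] for r in range(16)]   # hypothetical values
q[0][0] = 4
dc, blocks = fig9_values(q)
print(dc, blocks[0][0], fig9_values(q, include_dc=True)[1][0][0])   # 4 17 14
```

For a 16 × 16 matrix this yields 1 + 64 = 65 coded values. Excluding the DC value keeps an unusually small DC entry from pulling down the neighbouring block average.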

In another example of the present invention, for a 32 × 32 block, the following downsampling transition points may be used. If the row and column indices are both in the range 0 <= index <= 3, a downsampling factor of 1 is used in each direction (i.e., no downsampling). If the row and column indices are both in the range 0 <= index <= 15, but not both in the range 0 <= index <= 3, a downsampling factor of 2 may be used in each direction (row/column). For all remaining values, a downsampling factor of 4 may be used in each direction. The transition points at which the downsampling factor changes (e.g., from 1 to 2 or from 2 to 4), as well as the downsampling factors themselves, may be included in the bitstream or may be known a priori at both video encoder 20 and video decoder 30.

In one example of this disclosure, if uniform sampling is used, only an 8 × 8 matrix needs to be coded. For non-uniform sampling, more quantization matrix values are coded, which allows a more accurate estimate of the complete quantization matrix (32 × 32 or 16 × 16).

For the uniform sampling example, instead of coding a 16 × 16 or 32 × 32 quantization matrix, a smaller (e.g., 8 × 8) quantization matrix is coded in the bitstream. Interpolation may then be used to produce the entries of the larger matrix when they are needed. For quantization matrix entries that represent frequencies in a lower frequency subset (e.g., the lowest 8 × 8 frequencies), the values of the larger quantization matrix are calculated using bilinear interpolation. For the remaining entries, the corresponding values of the smaller quantization matrix are repeated. Instead of the lowest 8 × 8 frequencies, any other subset may be used. Furthermore, any two interpolation methods may be used in place of bilinear interpolation and pixel repetition. This technique can be further generalized to more than 2 regions and more than 2 interpolation methods.
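A sketch of the uniform-sampling variant follows: bilinear interpolation for the lowest 8 × 8 frequencies of the larger matrix and repetition elsewhere. The function name is hypothetical, and the sample alignment (entry (r, c) of the larger matrix maps to position (r/s, c/s) of the smaller one, with s the size ratio), the clamping at the last row and column, and the unrounded floating-point output are assumptions, not details taken from the text.

```python
def upsample_mixed(small, n, low=8):
    """Build an n x n matrix from a smaller m x m matrix (uniform sampling).

    Entries whose row and column are both < low (the lowest-frequency region)
    use bilinear interpolation; all other entries repeat the corresponding
    small-matrix value. Mapping (r, c) -> (r / s, c / s), s = n // m, is an
    assumed sample alignment.
    """
    m = len(small)
    s = n // m
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if r < low and c < low:
                y, x = r / s, c / s
                y0, x0 = int(y), int(x)
                y1, x1 = min(y0 + 1, m - 1), min(x0 + 1, m - 1)   # assumed clamping
                wy, wx = y - y0, x - x0
                out[r][c] = ((1 - wy) * (1 - wx) * small[y0][x0]
                             + (1 - wy) * wx * small[y0][x1]
                             + wy * (1 - wx) * small[y1][x0]
                             + wy * wx * small[y1][x1])
            else:
                out[r][c] = small[r // s][c // s]                 # pixel repetition
    return out


small = [[8 * i + j for j in range(8)] for i in range(8)]   # hypothetical 8 x 8 values
big = upsample_mixed(small, 16)
print(big[1][1], big[3][0], big[8][8])                      # 4.5 12.0 36
```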

With respect to fig. 7, 8, and 9, and the examples of downsampling coefficient values described above, different downsampling factors are used in different regions (i.e., representing different subsets of quantization matrix values). For each subset, one quantization matrix value may be signaled for each block (e.g., the numbered squares in fig. 7-9), where the number of quantization matrix values represented by each block is determined by the downsampling factor for the particular subset. The location at which switching between downsampling factors occurs may be known to the video encoder and video decoder or explicitly signaled.

In other words, the downsampling techniques discussed above may allow video encoder 20 to signal lower frequency quantization matrix values (in one example, with respect to DC coefficients) losslessly and approximate other quantization matrix values in an increasingly coarse manner. This may avoid the necessity of placing the entire quantization matrix in storage, which may be beneficial for 16 x 16 and 32 x 32 block sizes (although the benefits may also apply to blocks of different sizes).

In accordance with the techniques described above, video encoder 20 may be configured to: determine a quantization matrix comprising a plurality of values; downsample a first set of values in the quantization matrix by a first downsampling factor to generate a first set of downsampled values; downsample a second set of values in the quantization matrix by a second downsampling factor to generate a second set of downsampled values; and generate a coded bitstream that includes the first set of downsampled values and the second set of downsampled values.

Referring back to fig. 2, after quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. Following entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode motion vectors and other syntax elements for the current video slice being coded.

In some examples, entropy encoding unit 56 may be operable to perform the techniques of this disclosure. However, aspects of the present invention are not limited thereto. In alternative examples, some other unit of video encoder 20 (e.g., a processor) or any other unit of video encoder 20 may be tasked with performing the techniques of this disclosure. As one example, entropy encoding unit 56 may be operable to encode a size of a first subset of quantization matrices, encode coefficient values of the first subset, and predict coefficient values of a second subset of quantization matrices. Also, in some examples, the techniques of this disclosure may be partitioned among one or more units of video encoder 20.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reference block for storage in reference picture store 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.

FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of fig. 3, video decoder 30 includes an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference picture store 92. Prediction processing unit 81 includes motion compensation unit 82 and intra prediction processing unit 84. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 of fig. 2.

During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

In some examples, entropy decoding unit 80 may be operable to perform the techniques of this disclosure. However, aspects of the present invention are not limited thereto. In alternative examples, some other unit of video decoder 30 (e.g., a processor) or any other unit of video decoder 30 may be tasked with performing the techniques of this disclosure. As one example, entropy decoding unit 80 may be operable to decode a size of a first subset of quantization matrices, decode coefficient values of the first subset, and predict coefficient values of a second subset of quantization matrices. Also, in some examples, the techniques of this disclosure may be divided among one or more units of video decoder 30.

When a video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video slice is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for the video blocks of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in reference picture store 92.

Motion compensation unit 82 determines prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate predictive blocks for the current video blocks being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra or inter prediction) used to code video blocks of the video slice, an inter-prediction slice type (e.g., a B slice, a P slice, or a GPB slice), construction information for one or more of the reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter-prediction state for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use the interpolation filters used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use those interpolation filters to produce predictive blocks.
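As an illustration of sub-integer interpolation, the following sketch computes half-sample values along one row of integer samples. It borrows the 8-tap half-sample luma filter of HEVC, (-1, 4, -11, 40, 40, -11, 4, -1), whose taps sum to 64. The single-stage rounding, the edge padding by index clamping, and the function name half_pel_row are simplifying assumptions; an actual decoder uses the filters and intermediate precision mandated by the applicable standard.

```python
# 8-tap half-sample luma filter taps as used in HEVC; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(samples, bit_depth=8):
    """Return the half-sample values between samples[i] and samples[i + 1].

    Hypothetical helper: single-stage rounding and clamped edge padding
    are simplifications chosen for this sketch.
    """
    n = len(samples)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # Taps cover integer positions i-3 .. i+4; out-of-range indices
            # are clamped to the nearest edge sample.
            j = min(max(i - 3 + k, 0), n - 1)
            acc += tap * samples[j]
        # Normalize by 64 with rounding, then clip to the sample range.
        val = (acc + 32) >> 6
        out.append(min(max(val, 0), max_val))
    return out
```

On a flat row the filter returns the same value, and at the middle of a sharp 0-to-255 edge it returns 128.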

Inverse quantization unit 86 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include the use of quantization parameters and/or quantization matrices calculated and signaled by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. In particular, inverse quantization unit 86 may be configured to decode received quantization matrices that have been coded according to the techniques described above. For example, video decoder 30 may be configured to upsample a received quantization matrix that has been downsampled in accordance with the techniques of this disclosure.
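To make the role of a quantization matrix in inverse quantization concrete, the sketch below scales each quantized level by the co-located matrix entry. It is loosely modelled on the HEVC version 1 scaling process (levelScale = {40, 45, 51, 57, 64, 72}, a flat matrix entry of 16, and bdShift = BitDepth + log2(N) - 5). Treat those constants and the helper name dequantize as assumptions to be checked against the specification, not as a normative implementation.

```python
# Per-(qp % 6) scale factors, following HEVC version 1 (assumed here).
LEVEL_SCALE = (40, 45, 51, 57, 64, 72)


def dequantize(levels, qmatrix, qp, bit_depth=8):
    """Scale an N x N block of quantized levels with a quantization matrix.

    Hypothetical helper; N is assumed to be a power of two.
    """
    n = len(levels)
    log2n = n.bit_length() - 1
    bd_shift = bit_depth + log2n - 5
    scale = LEVEL_SCALE[qp % 6] << (qp // 6)  # step doubles every 6 QP
    rnd = 1 << (bd_shift - 1)
    out = []
    for lrow, mrow in zip(levels, qmatrix):
        row = []
        for level, m in zip(lrow, mrow):
            # m = 16 means no frequency weighting; larger m means coarser steps.
            v = (level * m * scale + rnd) >> bd_shift
            row.append(max(-32768, min(32767, v)))  # clip to the 16-bit range
        out.append(row)
    return out
```

Doubling a matrix entry has the same effect on that coefficient as raising QP by 6, which is how the matrix shapes quantization per frequency.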

In one example of this disclosure, video decoder 30 may be configured to receive, in a coded bitstream, a quantization matrix coded with downsampled values; upsample a first set of downsampled values in the quantization matrix by a first upsampling factor to produce a first set of values; upsample a second set of downsampled values in the quantization matrix by a second upsampling factor to produce a second set of values; and inverse quantize a block of transform coefficients with the first and second sets of values.

Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate residual blocks in the pixel domain.

After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform processing unit 88 with the corresponding predictive block generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture store 92, which stores reference pictures used for subsequent motion compensation. Reference picture store 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.
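The summation performed by summer 90 can be sketched as a sample-wise addition followed by clipping to the valid sample range. The clipping step, the bit_depth parameter, and the function name reconstruct are assumptions added for illustration; in-loop filtering such as deblocking would be applied afterward and is not shown.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add the residual to the prediction sample by sample (hypothetical helper).

    Results are clipped to [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```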

FIG. 10 is a flow diagram illustrating a video encoding method in accordance with the techniques of this disclosure. The method of FIG. 10 may be performed by video encoder 20. Video encoder 20 may be configured to determine a quantization matrix comprising a plurality of values (920); downsample a first set of values in the quantization matrix by a first downsampling factor to generate a first set of downsampled values (922); and downsample a second set of values in the quantization matrix by a second downsampling factor to generate a second set of downsampled values (924).

In one example of this disclosure, video encoder 20 may determine a first downsampling factor based on a position of the first set of values in the quantization matrix and determine a second downsampling factor based on a position of the second set of values in the quantization matrix. In a specific example, the first set of values includes only the value at position (0, 0) of the quantization matrix, the first downsampling factor is determined to be 1, and the second downsampling factor is determined to be either 2 or 4.

Video encoder 20 may be configured to determine transition points in the quantization matrix that govern how the quantization matrix values are downsampled. In one example, video encoder 20 may be configured to determine a first transition point in the quantization matrix, where values located between the first transition point and the origin of the quantization matrix are not downsampled; determine a second transition point in the quantization matrix, where the first set of values in the quantization matrix is located between the first and second transition points; and determine a third transition point in the quantization matrix, where the second set of values in the quantization matrix is located between the second and third transition points. Video encoder 20 may be configured to signal the first, second, and third transition points and the first and second downsampling factors in a coded bitstream.
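This disclosure does not fix how a transition point maps onto matrix positions. Purely for illustration, the sketch below assumes each transition point is an anti-diagonal index d = x + y, so a position's region, and therefore its downsampling factor, follows from comparing d against the three transition points. The diagonal model and the names downsampling_factor and factor_map are hypothetical.

```python
def downsampling_factor(x, y, t1, t2, t3, f1, f2):
    """Downsampling factor for position (x, y) under a hypothetical
    anti-diagonal model of the transition points t1 < t2 < t3."""
    d = x + y  # assumed position measure: index of the anti-diagonal
    if d < t1:
        return 1   # between the origin and the first transition point: kept as-is
    if d < t2:
        return f1  # first set of values
    if d < t3:
        return f2  # second set of values
    raise ValueError("position lies beyond the third transition point")


def factor_map(n, t1, t2, t3, f1, f2):
    """Factor for every position of an n x n matrix, indexed as [y][x]."""
    return [[downsampling_factor(x, y, t1, t2, t3, f1, f2) for x in range(n)]
            for y in range(n)]
```

With t1 = 0, t2 = 1, and t3 = 2n - 1 (one past the largest diagonal index), the first set contains only the value at (0, 0) with factor 1, and every other position falls in the second set. This corresponds to the specific example of a first factor of 1 and a second factor of 2 or 4.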

In one example of this disclosure, video encoder 20 may be configured to signal the downsampled values by predicting a downsampled value along the scan order of the first and second sets of downsampled values from the preceding downsampled value in that scan order, where a downsampled value in the first set may be used to predict a downsampled value in the second set.
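A minimal sketch of this prediction is plain differential (DPCM) coding. It assumes the downsampled values of the first set, followed by those of the second set, have already been arranged into one list in scan order. Because the two sets share one list, the last value of the first set predicts the first value of the second set. The starting predictor of 8 mirrors the initial value used in HEVC scaling-list syntax but is otherwise an assumption, as are the function names.

```python
def dpcm_encode(values, start=8):
    """Replace each value (in scan order) by its difference from the previous one."""
    residuals, prev = [], start
    for v in values:
        residuals.append(v - prev)
        prev = v
    return residuals


def dpcm_decode(residuals, start=8):
    """Invert dpcm_encode by accumulating the differences."""
    values, prev = [], start
    for r in residuals:
        prev += r
        values.append(prev)
    return values
```

For a first set [16] and a second set [18, 20, 25], dpcm_encode([16, 18, 20, 25]) yields [8, 2, 2, 5]. Here the difference 2 for the value 18 is taken across the boundary between the two sets.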

In another example of this disclosure, downsampling the first set of values in the quantization matrix comprises averaging a first number of quantization matrix values in the first set of values to generate a value in the first set of downsampled values, where the first number is determined according to the first downsampling factor. Likewise, downsampling the second set of values in the quantization matrix comprises averaging a second number of quantization matrix values in the second set of values to generate a value in the second set of downsampled values, where the second number is determined according to the second downsampling factor.
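The averaging step can be sketched for the case where a downsampling factor f covers an f x f block, so that the number of averaged values is f * f. The mapping from factor to block shape, the round-half-up integer average, and the function name downsample_average are assumptions. For brevity the sketch downsamples a whole matrix by one factor, whereas the method above applies different factors to different sets of values.

```python
def downsample_average(matrix, f):
    """Average each f x f block of a square matrix into one value (hypothetical helper)."""
    n = len(matrix)
    if n % f:
        raise ValueError("matrix size must be a multiple of the factor")
    out = []
    for by in range(0, n, f):
        row = []
        for bx in range(0, n, f):
            block = [matrix[y][x] for y in range(by, by + f) for x in range(bx, bx + f)]
            # Integer mean, rounding half up (matrix values are non-negative).
            row.append((sum(block) + len(block) // 2) // len(block))
        out.append(row)
    return out
```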

Video encoder 20 may be further configured to quantize values of transform coefficients in the transform coefficient block according to a quantization matrix to form quantized transform coefficients (926). Video encoder 20 may be further configured to generate a coded bitstream including the first set of downsampled values and the second set of downsampled values (928).
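Quantizing with a quantization matrix can be sketched, in simplified floating-point form, as dividing each coefficient by a step size scaled by the co-located matrix entry, with 16 treated as a neutral weight. The relation Qstep = 2^((QP - 4) / 6) follows the step-size convention of H.264/AVC and HEVC. The plain round-to-nearest rule and the function name quantize are assumptions; practical encoders use integer arithmetic and often rate-distortion-optimized rounding.

```python
def quantize(coeffs, qmatrix, qp):
    """Quantize a block of transform coefficients with a quantization matrix.

    Hypothetical floating-point model, not a normative quantizer.
    """
    qstep = 2.0 ** ((qp - 4) / 6.0)  # step doubles every 6 QP; 1.0 at QP 4
    out = []
    for crow, mrow in zip(coeffs, qmatrix):
        row = []
        for c, m in zip(crow, mrow):
            step = qstep * m / 16.0  # weight 16 = no frequency weighting
            level = int(abs(c) / step + 0.5)  # round magnitude to nearest
            row.append(level if c >= 0 else -level)
        out.append(row)
    return out
```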

FIG. 11 is a flow diagram illustrating a video decoding method in accordance with the techniques of this disclosure. The method of FIG. 11 may be performed by video decoder 30. Video decoder 30 may be configured to receive, in a coded bitstream, a quantization matrix coded with downsampled values (1020); upsample a first set of downsampled values in the quantization matrix by a first upsampling factor to produce a first set of values (1022); upsample a second set of downsampled values in the quantization matrix by a second upsampling factor to produce a second set of values (1024); and inverse quantize a block of transform coefficients with the first and second sets of values (1026).

In one example of this disclosure, video decoder 30 may be configured to determine a first upsampling factor based on a position of the first set of downsampled values in the quantization matrix and to determine a second upsampling factor based on a position of the second set of downsampled values in the quantization matrix. In a specific example, the first set of downsampled values includes only the value at position (0, 0) of the quantization matrix, the first upsampling factor is determined to be 1, and the second upsampling factor is determined to be either 2 or 4.

In another example of this disclosure, video decoder 30 may be configured to determine a first transition point in the quantization matrix, where values of the quantization matrix located between the first transition point and the origin of the quantization matrix are not downsampled; determine a second transition point in the quantization matrix, where the first set of downsampled values in the quantization matrix is located between the first and second transition points; and determine a third transition point in the quantization matrix, where the second set of downsampled values in the quantization matrix is located between the second and third transition points. In this example, the first, second, and third transition points and the first and second downsampling factors are received in the coded bitstream.

In another example of this disclosure, video decoder 30 is configured to predict each successive downsampled value along the scan order of the first and second sets of downsampled values from the preceding downsampled value in that scan order, where a downsampled value in the first set may be used to predict a downsampled value in the second set.

In another example of this disclosure, upsampling the first set of downsampled values in the quantization matrix comprises replicating a downsampled value in the first set of downsampled values for a first number of values in the first set of values, where the first number is determined according to the first upsampling factor. Likewise, upsampling the second set of downsampled values in the quantization matrix comprises replicating a downsampled value in the second set of downsampled values for a second number of values in the second set of values, where the second number is determined according to the second upsampling factor.
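Upsampling by replication can be sketched as copying each downsampled value into an f x f block. The helper rebuild_with_dc then combines two sets along the lines of the specific example given earlier: the value at position (0, 0) forms a first set carried with factor 1, and the remaining values come from a grid upsampled by a second factor such as 2 or 4. Both function names and the square-block layout are assumptions for illustration. The arrangement resembles how HEVC rebuilds 16 x 16 and 32 x 32 scaling lists from an 8 x 8 list plus a separately signaled DC value.

```python
def upsample_replicate(coarse, f):
    """Copy each coarse value into an f x f block (nearest-neighbour upsampling)."""
    rows, cols = len(coarse), len(coarse[0])
    return [[coarse[y // f][x // f] for x in range(cols * f)]
            for y in range(rows * f)]


def rebuild_with_dc(dc, coarse, f):
    """Hypothetical two-set reconstruction: DC at factor 1, the rest at factor f."""
    full = upsample_replicate(coarse, f)
    full[0][0] = dc  # first set: the (0, 0) value is carried without downsampling
    return full
```

For a 16 x 16 matrix this would use an 8 x 8 coarse grid with f = 2, and for a 32 x 32 matrix the same grid with f = 4.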

In one example of this disclosure, different upsampling techniques are used to upsample the first and second sets of downsampled values. In a specific example, at least one of the first and second sets of downsampled values is upsampled using bilinear interpolation.
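As an alternative to replication, a bilinear upsampler can be sketched as two separable passes of linear interpolation. The sketch assumes each downsampled value sits at the centre of the block it represents and clamps at the matrix edges. Those conventions, the round-half-up rule for non-negative values, and the function names are assumptions rather than details fixed by this disclosure.

```python
def _interp_1d(vals, f):
    """Linearly interpolate a 1-D list to f times its length."""
    n = len(vals)
    out = []
    for i in range(n * f):
        # Fine sample i expressed in coarse coordinates, with coarse samples
        # assumed to sit at the centres of their f-sample blocks.
        pos = min(max((i + 0.5) / f - 0.5, 0.0), n - 1)
        lo = int(pos)            # floor, since pos >= 0
        hi = min(lo + 1, n - 1)  # clamp at the right/bottom edge
        w = pos - lo
        out.append(vals[lo] * (1.0 - w) + vals[hi] * w)
    return out


def upsample_bilinear(coarse, f):
    """Separable bilinear upsampling of a 2-D grid by factor f (hypothetical helper)."""
    rows = [_interp_1d(r, f) for r in coarse]            # horizontal pass
    cols = [_interp_1d(list(c), f) for c in zip(*rows)]  # vertical pass
    # Transpose back and round half up (matrix values are non-negative).
    return [[int(v + 0.5) for v in row] for row in zip(*cols)]
```

Compared with replication, interpolation avoids step discontinuities between neighbouring blocks of the reconstructed matrix, at the cost of a few extra operations per entry.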

Video decoder 30 may be further configured to inverse transform the inverse-quantized block of transform coefficients to form a block of residual video data, and to perform a prediction process on the block of residual video data to form a decoded block of video data.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as in accordance with a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or sets of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method of encoding video data, comprising:

determining a quantization matrix comprising a plurality of values, wherein the quantization matrix is used to quantize transform coefficients;

determining a first transition point in the quantization matrix, wherein values located between the first transition point and an origin of the quantization matrix are not downsampled;

determining a second transition point in the quantization matrix, wherein a first set of values in the quantization matrix is located between the first transition point and the second transition point;

determining a third transition point in the quantization matrix, wherein a second set of values in the quantization matrix is located between the second transition point and the third transition point;

determining a first downsampling factor based on a position of the first set of values in the quantization matrix;

determining a second downsampling factor based on a position of the second set of values in the quantization matrix;

downsampling the first set of values in the quantization matrix by the first downsampling factor to generate a first set of downsampled values;

downsampling the second set of values in the quantization matrix by the second downsampling factor to generate a second set of downsampled values, wherein the second set of downsampled values has fewer values than the second set of values in the quantization matrix and represents the second set of values in the quantization matrix; and

generating a coded bitstream that includes the first set of downsampled values and the second set of downsampled values.

2. The method of claim 1, wherein the first downsampling factor is determined to be 1, and wherein the second downsampling factor is determined to be one of 2 and 4.

3. The method of claim 1, wherein the quantization matrix has a size of 16 x 16 or 32 x 32.

4. The method of claim 1, further comprising:

signaling the first, second, and third transition points and the first and second downsampling factors in the coded bitstream.

5. The method of claim 1, further comprising:

predicting one of the downsampled values along the scan order in the first and second sets of downsampled values from a previous downsampled value along the scan order in the first and second sets of downsampled values, wherein a downsampled value in the first set of downsampled values may be used to predict a downsampled value in the second set of downsampled values.

6. The method of claim 1, wherein downsampling the first set of values in the quantization matrix comprises:

averaging a first number of quantization matrix values of the first set of values to generate values of the first set of downsampled values, wherein the first number is determined according to the first downsampling factor; and

wherein downsampling the second set of values in the quantization matrix comprises: averaging a second number of quantization matrix values in the second set of values to generate values in the second set of downsampled values, wherein the second number is determined according to the second downsampling factor.

7. The method of claim 1, further comprising:

performing a prediction process on a block of video data to form a block of residual video data;

transforming the residual video data to form a block of transform coefficients;

quantizing values of transform coefficients in the transform coefficient block according to the quantization matrix to form quantized transform coefficients; and

entropy coding the quantized transform coefficients into the coded bitstream.

8. A method of decoding video data, comprising:

receiving, in a coded bitstream, a quantization matrix coded with downsampled values;

determining a first transition point in the quantization matrix, wherein values of the quantization matrix located between the first transition point and an origin of the quantization matrix are not downsampled;

determining a second transition point in the quantization matrix, wherein a first set of downsampled values in the quantization matrix is located between the first transition point and the second transition point;

determining a third transition point in the quantization matrix, wherein a second set of downsampled values in the quantization matrix is located between the second transition point and the third transition point;

determining a first upsampling factor based on a position of the first set of downsampled values in the quantization matrix;

determining a second upsampling factor based on a position of the second set of downsampled values in the quantization matrix;

upsampling the first set of downsampled values in the quantization matrix by the first upsampling factor to produce a first set of values;

upsampling the second set of downsampled values in the quantization matrix by the second upsampling factor to generate a second set of values, wherein the second set of downsampled values has fewer values than the second set of values and represents the second set of values; and

inverse quantizing a transform coefficient block with the first and second sets of values.

9. The method of claim 8, wherein the first upsampling factor is determined to be 1, and wherein the second upsampling factor is determined to be one of 2 and 4.

10. The method of claim 8, wherein the quantization matrix has a size of 16 x 16 or 32 x 32.

11. The method of claim 8, further comprising:

receiving the first, second, and third transition points and first and second downsampling factors in the coded bitstream.

12. The method of claim 8, further comprising:

predicting each successive one of the downsampled values along the scan order in the first and second sets of downsampled values from a previous downsampled value along the scan order in the first and second sets of downsampled values, wherein a downsampled value in the first set of downsampled values may be used to predict a downsampled value in the second set of downsampled values.

13. The method of claim 8, wherein upsampling the first set of downsampled values in the quantization matrix comprises: replicating a downsampled value of the first set of downsampled values for a first number of values of the first set of values, wherein the first number is determined according to the first upsampling factor; and

wherein upsampling the second set of downsampled values in the quantization matrix comprises: replicating a downsampled value of the second set of downsampled values for a second number of values of the second set of values, wherein the second number is determined according to the second upsampling factor.

14. The method of claim 8, wherein the first and the second set of downsampled values are upsampled using different upsampling techniques.

15. The method of claim 8, wherein at least one of the first and the second set of downsampled values is upsampled using bilinear interpolation.

16. The method of claim 8, further comprising:

inverse transforming the inverse quantized transform coefficient block to form a residual video data block; and

performing a prediction process on the block of residual video data to form a decoded block of video data.

17. An apparatus configured to code video data, comprising:

a memory configured to store the video data; and

a video encoder configured to:

18. The apparatus of claim 17, wherein the first downsampling factor is determined to be 1, and wherein the second downsampling factor is determined to be one of 2 and 4.

19. The apparatus of claim 17, wherein the quantization matrix has a size of 16 x 16 or 32 x 32.

20. The apparatus of claim 17, wherein the video encoder is further configured to:

21. The apparatus of claim 17, wherein the video encoder is further configured to:

22. The apparatus of claim 17, wherein downsampling the first set of values in the quantization matrix comprises:

23. The apparatus of claim 17, wherein the video encoder is further configured to:

transform the residual video data to form a block of transform coefficients;

entropy code the quantized transform coefficients into the coded bitstream.

24. An apparatus configured to decode video data, comprising:

a memory configured to store the video data; and

a video decoder configured to:

determine a first upsampling factor based on a position of the first set of downsampled values in the quantization matrix;

25. The apparatus of claim 24, wherein the first upsampling factor is determined to be 1, and wherein the second upsampling factor is determined to be one of 2 and 4.

26. The apparatus of claim 24, wherein the quantization matrix has a size of 16 x 16 or 32 x 32.

27. The apparatus of claim 24, wherein the video decoder is further configured to:

28. The apparatus of claim 24, wherein the video decoder is further configured to:

29. The apparatus of claim 24, wherein upsampling the first set of downsampled values in the quantization matrix comprises: replicating a downsampled value of the first set of downsampled values for a first number of values of the first set of values, wherein the first number is determined according to the first upsampling factor; and

wherein upsampling the second set of downsampled values in the quantization matrix comprises: replicating a downsampled value of the second set of downsampled values for a second number of values of the second set of values, wherein the second number is determined according to the second upsampling factor.

30. The apparatus of claim 24, wherein the first and the second set of downsampled values are upsampled using different upsampling techniques.

31. The apparatus of claim 24, wherein at least one of the first and the second set of downsampled values is upsampled using bilinear interpolation.

32. The apparatus of claim 24, wherein the video decoder is further configured to:

33. An apparatus configured to encode video data, comprising:

means for determining a quantization matrix comprising a plurality of values, wherein the quantization matrix is used to quantize transform coefficients;

means for determining a first transition point in the quantization matrix, wherein values located between the first transition point and an origin of the quantization matrix are not downsampled;

means for determining a second transition point in the quantization matrix, wherein a first set of values in the quantization matrix is located between the first transition point and the second transition point;

means for determining a third transition point in the quantization matrix, wherein a second set of values in the quantization matrix is located between the second transition point and the third transition point;

means for determining a first downsampling factor based on a position of the first set of values in the quantization matrix;

means for determining a second downsampling factor based on a position of the second set of values in the quantization matrix;

means for downsampling the first set of values in the quantization matrix by the first downsampling factor to generate a first set of downsampled values;

means for downsampling the second set of values in the quantization matrix by the second downsampling factor to generate a second set of downsampled values, wherein the second set of downsampled values has fewer values than the second set of values in the quantization matrix and is representative of the second set of values in the quantization matrix; and

means for generating a coded bitstream that includes the first set of downsampled values and the second set of downsampled values.

34. An apparatus configured to decode video data, comprising:

means for receiving, in a coded bitstream, a quantization matrix coded with downsampled values;

means for determining a first transition point in the quantization matrix, wherein values of the quantization matrix located between the first transition point and an origin of the quantization matrix are not downsampled;

means for determining a second transition point in the quantization matrix, wherein a first set of downsampled values in the quantization matrix is located between the first transition point and the second transition point;

means for determining a third transition point in the quantization matrix, wherein a second set of downsampled values in the quantization matrix is located between the second transition point and the third transition point;

means for determining a first upsampling factor based on a position of the first set of downsampled values in the quantization matrix;

means for determining a second upsampling factor based on a position of the second set of downsampled values in the quantization matrix;

means for upsampling the first set of downsampled values in the quantization matrix by the first upsampling factor to generate a first set of values;

means for upsampling the second set of downsampled values in the quantization matrix by the second upsampling factor to generate a second set of values, wherein the second set of downsampled values has fewer values than the second set of values and represents the second set of values; and

means for inverse quantizing a transform coefficient block with the first and second sets of values.