
HK1210344B - Non-zero rounding and prediction mode selection techniques in video encoding - Google Patents


Info

Publication number
HK1210344B
Authority
HK
Hong Kong
Prior art keywords
data
video
weighted prediction
prediction
list
Prior art date
Application number
HK15110970.6A
Other languages
Chinese (zh)
Other versions
HK1210344A1 (en)
Inventor
Marta Karczewicz
Peisong Chen
Yan Ye
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/499,990 external-priority patent/US9161057B2/en
Application filed by Qualcomm Incorporated
Publication of HK1210344A1 publication Critical patent/HK1210344A1/en
Publication of HK1210344B publication Critical patent/HK1210344B/en

Abstract

In one aspect of this disclosure, rounding adjustments to bi-directional predictive data may be purposely eliminated to provide predictive data that lacks any rounding bias. In this case, rounded and unrounded predictive data may both be considered in a rate-distortion analysis to identify the best data for prediction of a given video block. In another aspect of this disclosure, techniques are described for selecting among default weighted prediction, implicit weighted prediction, and explicit weighted prediction. In this context, techniques are also described for adding offset to prediction data, e.g., using the format of explicit weighted prediction to allow for offsets to predictive data that is otherwise determined by implicit or default weighted prediction.

Description

Non-zero rounding and prediction mode selection techniques in video coding
This application is a divisional application. Its parent is the invention patent application with international application number PCT/US2010/041423, filed July 8, 2010, which entered the Chinese national phase under application number 201080029582.6 and is entitled "Non-zero rounding and prediction mode selection techniques in video coding."
The following co-pending and commonly assigned application is expressly incorporated herein by reference: "Non-Zero Rounding and Prediction Mode Selection Techniques in Video Coding," by Marta Karczewicz, Peisong Chen, and Yan Ye, filed on even date herewith, attorney docket number 082069U2.
Technical Field
This disclosure relates to video coding, and more specifically, to video coding techniques using bi-prediction.
Background
Digital multimedia capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, digital media players, and the like. Digital multimedia devices may implement video coding techniques, such as MPEG-2, ITU-T H.263, MPEG-4, or ITU-T H.264/MPEG-4 Part 10 (Advanced Video Coding (AVC)), to more efficiently transmit and receive or store and retrieve digital video data. Video coding techniques may perform video compression via spatial and temporal prediction to reduce or remove redundancy inherent in video sequences.
In video coding, compression often includes spatial prediction, motion estimation, and motion compensation. Intra-coding relies on spatial prediction and transform coding, such as the Discrete Cosine Transform (DCT), to reduce or remove spatial redundancy between video blocks within a given video frame. Inter-coding relies on temporal prediction and transform coding to reduce or remove temporal redundancy between video blocks of successive video frames of a video sequence. Intra-coded frames ("I-frames") are often used as random access points as well as references for the inter-coding of other frames. However, I-frames typically exhibit less compression than other frames. The term "I-unit" may refer to an I-frame, an I-slice, or another independently decodable portion of an I-frame.
For inter-frame coding, a video encoder performs motion estimation to track the movement of matching video blocks between two or more adjacent frames or other units being coded (e.g., slices of a frame). The inter-coded frame may include: a predictive frame ("P frame"), which may comprise a block predicted from a previous frame; and bi-directional predictive frames ("B-frames"), which may include blocks predicted from previous and subsequent frames of a video sequence. The terms "P-frame" and "B-frame" are somewhat historical in the sense that earlier coding techniques limited prediction to a particular direction. Newer coding formats and standards may not limit the prediction direction of P-frames or B-frames. Thus, the term "bi-directional" now refers to prediction based on a list of two or more reference data (regardless of the temporal relationship of such reference data with respect to the data being encoded).
For example, bi-prediction consistent with newer video standards such as ITU-T H.264 may be based on two different lists that do not necessarily need to have data temporally before and after the current video block. In other words, a B video block may be predicted from two data lists, which may correspond to data from two previous frames, two subsequent frames, or a previous frame and a subsequent frame. In contrast, P video blocks are predicted based on one list (i.e., one data structure), which may correspond to one predictive frame (e.g., a previous frame or a subsequent frame). P-frames and B-frames may be referred to more generally as P-units and B-units. P-units and B-units may also be implemented in smaller coded units, such as slices of a frame or portions of a frame. A B-unit may include B video blocks, P video blocks, or I video blocks. A P-unit may include P video blocks or I video blocks. An I-unit may include only I video blocks.
For P and B video blocks, motion estimation generates motion vectors that indicate the displacement of the video block relative to a corresponding prediction video block in a predictive reference frame or other coding unit. Motion compensation uses motion vectors to generate a prediction video block from a predictive reference frame or other coding unit. After motion compensation, a residual video block is formed by subtracting the prediction video block from the original video block to be encoded. Video encoders typically apply transform, quantization, and entropy encoding processes to further reduce the bit rate associated with communication of the residual block. I units and P units are typically used to define reference blocks for inter coding of P units and B units.
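The residual-formation step described above can be illustrated with a minimal sketch (the function and variable names are illustrative, not taken from this disclosure): the prediction block produced by motion compensation is subtracted, sample by sample, from the original block, and the resulting residual is what the transform, quantization, and entropy-coding stages then operate on.

```python
def residual(original_block, prediction_block):
    # Sample-wise difference between the block being encoded and its
    # motion-compensated prediction; small residuals compress well.
    return [o - p for o, p in zip(original_block, prediction_block)]

res = residual([100, 102, 98], [99, 101, 100])  # [1, 1, -2]
```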
Disclosure of Invention
This disclosure describes video encoding and decoding techniques that may be applicable to bi-prediction. In bi-prediction, a video block may be predictively encoded and decoded based on two different lists of predictive reference data. In one aspect of the disclosure, rounding adjustments to bi-directional predictive data may be purposely eliminated to provide predictive data that lacks any rounding bias. In this case, both rounded and unrounded predictive data may be considered in the rate-distortion analysis to identify the best data for predicting a given video block. One or more syntax elements may be encoded to indicate the selection, and a decoder may interpret the one or more syntax elements in order to determine whether rounding should be used.
In another aspect of this disclosure, techniques are described for selecting among default weighted prediction, implicit weighted prediction, and explicit weighted prediction. In this context, techniques are also described for adding an offset to predictive data, e.g., using a format for explicit weighted prediction to allow for implementation of an offset to predictive data that would otherwise be defined by implicit or default weighted prediction.
In one example, this disclosure describes a method of encoding video data. The method comprises the following steps: the method includes generating first weighted prediction data that depends on two or more lists of data and includes a rounding adjustment, generating second weighted prediction data that depends on the two or more lists of data and does not include the rounding adjustment, selecting prediction data for encoding the video data based on a rate-distortion analysis of a plurality of candidates for prediction data, wherein the plurality of candidates for prediction data include the first weighted prediction data and the second weighted prediction data, and encoding the video data using the selected prediction data.
In another example, this disclosure describes a method comprising: the method generally includes receiving encoded video data, receiving one or more syntax elements indicating whether rounding adjustments are used to encode the encoded video data, generating weighted prediction data that depends on two or more lists of data, wherein the weighted prediction data does not include the rounding adjustments if the one or more syntax elements indicate that the rounding adjustments are not used to encode the encoded video data, and decoding the video data using the weighted prediction data.
In another example, this disclosure describes a video encoder apparatus that encodes video data. The apparatus includes: a memory that stores the video data and two or more lists of data used to predictively encode the video data; and a motion compensation unit. The motion compensation unit generates first weighted prediction data that depends on the two or more lists of data and includes a rounding adjustment, generates second weighted prediction data that depends on the two or more lists of data and does not include the rounding adjustment, and selects prediction data for encoding the video data based on a rate-distortion analysis of a plurality of candidates for prediction data, wherein the plurality of candidates for prediction data includes the first weighted prediction data and the second weighted prediction data. The video encoder apparatus encodes the video data using the selected prediction data.
In another example, this disclosure describes a video decoder apparatus comprising: an entropy unit that receives encoded video data and decodes one or more syntax elements that indicate whether rounding adjustments are used to encode the encoded video data; and a prediction unit that generates weighted prediction data that depends on two or more lists of data, wherein the weighted prediction data does not include the rounding adjustment if the one or more syntax elements indicate that the rounding adjustment is not used to encode the encoded video data, wherein the video decoder decodes the video data using the weighted prediction data.
In another example, this disclosure describes a device that encodes video data, the device comprising: the apparatus generally includes means for generating first weighted prediction data that depends on two or more lists of data and includes a rounding adjustment, means for generating second weighted prediction data that depends on the two or more lists of data and does not include the rounding adjustment, means for selecting prediction data for encoding the video data based on a rate-distortion analysis of a plurality of candidates for prediction data, wherein the plurality of candidates for prediction data includes the first weighted prediction data and the second weighted prediction data, and means for encoding the video data using the selected prediction data.
In another example, this disclosure describes a device comprising: the apparatus generally includes means for receiving encoded video data, means for receiving one or more syntax elements indicating whether rounding adjustments are used to encode the encoded video data, means for generating weighted prediction data that depends on two or more lists of data, wherein the weighted prediction data does not include the rounding adjustments when the one or more syntax elements indicate that the rounding adjustments are not used to encode the encoded video data, and means for decoding the video data using the weighted prediction data.
The techniques described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the software may be executed in one or more processors, such as a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Digital Signal Processor (DSP). The software that executes the techniques may be initially stored in a computer-readable medium and loaded and executed in the processor.
Accordingly, the present invention also contemplates a computer-readable storage medium comprising instructions that, when executed, cause a processor to: the method includes generating first weighted prediction data that depends on two or more lists of data and includes a rounding adjustment, generating second weighted prediction data that depends on the two or more lists of data and does not include the rounding adjustment, selecting prediction data for encoding the video data based on a rate-distortion analysis of a plurality of candidates for prediction data, wherein the plurality of candidates for prediction data include the first weighted prediction data and the second weighted prediction data, and encoding the video data using the selected prediction data.
In another example, this disclosure describes a computer-readable storage medium comprising instructions that, when executed, cause a processor to, upon receiving encoded video data and receiving one or more syntax elements indicating whether rounding adjustments are used to encode the encoded video data: generating weighted prediction data that depends on two or more lists of data, wherein the weighted prediction data does not include the rounding adjustment if the one or more syntax elements indicate that the rounding adjustment was not used to encode the encoded video data; and decoding the video data using the weighted prediction data.
The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques described in this disclosure will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating one exemplary video encoding and decoding system that may implement the techniques of this disclosure.
Fig. 2 is a block diagram illustrating an example of a video encoder that may perform offset techniques consistent with this disclosure.
Fig. 3 is a block diagram illustrating an example of a motion compensation unit in more detail.
Fig. 4 is an example of a video decoder that may perform an offset technique consistent with this disclosure.
Fig. 5 is a flow chart illustrating an exemplary process performed by a video encoder consistent with this disclosure.
Fig. 6 is a flow chart illustrating an exemplary process performed by a video decoder consistent with this disclosure.
Fig. 7 is a flow diagram illustrating another exemplary process performed by a video encoder consistent with this disclosure.
Detailed Description
This disclosure describes video encoding and decoding techniques that may be applicable to bi-prediction. In bi-prediction, a video block is predictively encoded and decoded based on two different lists of predictive reference data. In an aspect of the disclosure, rounding adjustments to bi-directional predictive data may be purposely eliminated to provide predictive data that lacks any rounding bias. In this case, both rounded and unrounded predictive data may be considered in the rate-distortion analysis to identify the best data for predicting a given video block. One or more syntax elements may be encoded to indicate the selection, and a decoder may interpret the one or more syntax elements in order to determine whether rounding should be used in the decoding process.
In another aspect of this disclosure, encoding techniques are described for selecting among default weighted prediction, implicit weighted prediction, and explicit weighted prediction. In this context, techniques are also described for adding an offset to predictive data, e.g., using a format for explicit weighted prediction to allow for an offset to predictive data that would otherwise be determined by implicit or default weighted prediction.
Weighted prediction refers to bi-directional prediction in which weights may be assigned to two or more different sets of predictive data. Default weighted prediction refers to weighted prediction in which the weight factors associated with two or more different lists are predefined by some default setting. In some cases, default weighted prediction may assign equal weights to each of the lists. Implicit weighted prediction refers to weighted prediction that defines weight factors associated with two or more different lists based on some implicit factors associated with the data. For example, the implicit weight factors may be defined by the temporal position of the data in the two different lists relative to the data being predictively encoded. If the data in list 0 is temporally closer to the data being predictively encoded than the data in list 1, then the data in list 0 may be assigned a larger implicit weight factor in the implicit weighted prediction.
Explicit weighted prediction refers to weighted prediction in which weight factors are dynamically defined and encoded into the bitstream as part of the encoding process. In this regard, explicit weighted prediction differs from default weighted prediction and implicit weighted prediction, e.g., explicit weighted prediction results in weighting factors that are encoded as part of the bitstream, while default and implicit weighted prediction define weighting factors that are predefined or determined by the decoder (no weighting factors are present in the bitstream).
According to one aspect of the present invention, the weighted prediction may be modified by eliminating rounding adjustments to the weighted prediction data relative to conventional weighted prediction. In this case, the encoder may analyze and consider both rounded and unrounded weighted prediction data, and may use either rounded or unrounded weighted prediction data based on the rate-distortion analysis. One or more syntax elements may be defined and encoded into the bitstream in order to identify whether rounded or unrounded weighted prediction data was used. The decoder may interpret the one or more syntax elements in order to determine whether rounded or unrounded weighted prediction data should be used in the decoding.
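As a concrete illustration of the encoder-side choice just described (a hedged sketch, not the patented implementation: the helper names are invented, and a plain sum-of-absolute-differences metric stands in for a full rate-distortion cost), the encoder forms both a rounded and an unrounded bi-directional average and signals which one it kept:

```python
def bipredict(list0_block, list1_block, rounding):
    # Average two reference blocks; rounding adds +1 before the right-shift,
    # which is the bias the disclosure proposes to optionally eliminate.
    r = 1 if rounding else 0
    return [(p0 + p1 + r) >> 1 for p0, p1 in zip(list0_block, list1_block)]

def sad(a, b):
    # Sum of absolute differences, used here as a simple distortion proxy.
    return sum(abs(x - y) for x, y in zip(a, b))

def select_prediction(original, list0_block, list1_block):
    rounded = bipredict(list0_block, list1_block, rounding=True)
    unrounded = bipredict(list0_block, list1_block, rounding=False)
    # A real encoder would also weigh bit cost; here distortion alone decides.
    if sad(original, rounded) <= sad(original, unrounded):
        return rounded, 1   # syntax element: rounding used
    return unrounded, 0     # syntax element: rounding not used

original = [10, 11, 12, 13]
pred, rounding_flag = select_prediction(original, [9, 10, 12, 12], [10, 11, 11, 13])
# Here the rounded candidate matches the original exactly, so rounding_flag = 1.
```

A decoder, on receiving the signaled flag, would simply call the same combination with the indicated rounding setting.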
According to another aspect of this disclosure, encoding techniques are described for selecting among default weighted prediction, implicit weighted prediction, and explicit weighted prediction. Rather than weighing all of these possibilities against one another, the encoder may be programmed to first select between default weighted prediction and implicit weighted prediction. That selection may then be compared against explicit weighted prediction. Explicit weighted prediction may be performed to calculate explicit weight factors, but the calculated explicit weighted prediction may also be compared against explicit weighted prediction that uses weight factors corresponding to those defined by default weighted prediction or implicit weighted prediction.
Explicit weighted prediction may have advantages over default weighted prediction and implicit weighted prediction in the following respects: explicit weighted prediction allows an offset to be added to the predictive data. The offset may bias or adjust predictive data, and may be extremely useful in addressing flashes, darkened sky, scene changes, or other types of lighting changes between video frames. For example, the offset may provide a common adjustment to all values of the video block, e.g., biasing the values up or down. In accordance with this disclosure, the weighting factors defined by default weighted prediction or implicit weighted prediction can be considered in the context of explicit weighted prediction, facilitating an increase in offset while maintaining the weighting factors associated with default or implicit weighted prediction. In this way, predictive data may be improved, which may help improve data compression in some cases.
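A sketch of how an offset can ride on top of otherwise default weights, loosely following the fixed-point form of H.264 explicit weighted prediction (the variable names, the `log_wd` value, the single combined offset, and the 8-bit clipping are illustrative assumptions): with equal weights `w0 = w1 = 1 << log_wd` and a zero offset, the formula reduces to the default average, so a nonzero offset gives exactly the "default weights plus offset" behavior described above.

```python
def explicit_weighted_pred(list0_block, list1_block, w0, w1, offset, log_wd=5):
    out = []
    for p0, p1 in zip(list0_block, list1_block):
        # Fixed-point weighted average of the two reference samples,
        # followed by an additive offset and a clip to the 8-bit range.
        val = ((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) + offset
        out.append(min(255, max(0, val)))
    return out

# Equal weights (32 == 1 << 5) reproduce the default average; offset=3
# uniformly biases the prediction upward, e.g., for a global brightness change.
pred = explicit_weighted_pred([100, 102], [102, 104], w0=32, w1=32, offset=3)  # [104, 106]
```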
Fig. 1 is a block diagram illustrating one exemplary video encoding and decoding system 10 that may implement the techniques of this disclosure. As shown in fig. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 16 via a communication channel 15. Source device 12 and destination device 16 may comprise any of a wide range of devices. In some cases, source device 12 and destination device 16 include wireless communication devices, such as wireless handsets, so-called cellular or satellite radiotelephones, or any device that can communicate video information over a communication channel 15, which communication channel 15 may or may not be wireless. However, the techniques of this disclosure (which pertain to non-zero rounding and prediction mode selection techniques) are not necessarily limited to wireless applications or settings.
In the example of fig. 1, source device 12 may include a video source 20, a video encoder 22, a modulator/demodulator (modem) 23, and a transmitter 24. Destination device 16 may include a receiver 26, a modem 27, a video decoder 28, and a display device 30. In accordance with this disclosure, video encoder 22 of source device 12 may be configured to apply non-zero rounding and prediction mode selection techniques as part of the video encoding process. Video decoder 28 may receive one or more syntax elements that indicate the selection and indicate whether non-zero rounding was used. Thus, video decoder 28 may perform appropriate weighted prediction signaled in the received bitstream.
The illustrated system 10 of fig. 1 is merely exemplary. The non-zero rounding and prediction mode selection techniques of this disclosure may be performed by any encoding device that supports bi-directional motion compensated prediction. Source device 12 and destination device 16 are merely examples of such encoding devices, with source device 12 generating encoded video data for transmission to destination device 16. In some cases, devices 12, 16 may operate in a substantially symmetric manner such that each of devices 12, 16 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 16, such as for video streaming, video playback, video broadcasting, or video telephony.
Video source 20 of source device 12 may comprise a video capture device, such as a video camera, a video archive containing previously captured video, or a video feed from a video content provider. As another alternative, video source 20 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 20 is a video camera, source device 12 and destination device 16 may form so-called camera phones or video phones. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 22. The encoded video information may then be modulated by modem 23 according to a communication standard, such as Code Division Multiple Access (CDMA) or another communication standard, and transmitted to destination device 16 via transmitter 24. Modem 23 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.
Receiver 26 of destination device 16 receives information over channel 15 and modem 27 demodulates the information. Again, the video encoding process may implement one or more of the techniques described herein to provide non-zero rounding and prediction mode selection consistent with this disclosure. The information communicated over channel 15 may include information defined by video encoder 22 (which may be used by video decoder 28 consistent with this disclosure). Display device 30 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a cathode ray tube, a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
In the example of fig. 1, communication channel 15 may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Thus, modem 23 and transmitter 24 may support many possible wireless protocols, wired protocols, or wired and wireless protocols. The communication channel 15 may form part of a packet-based network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or a global network (e.g., the internet, including the interconnection of one or more networks). Communication channel 15 generally represents any suitable communication medium or collection of different communication media for transmitting video data from source device 12 to destination device 16. Communication channel 15 may include a router, switch, base station, or any other equipment that may be used to facilitate communication from source device 12 to destination device 16.
Video encoder 22 and video decoder 28 may operate in accordance with a video compression standard, such as the ITU-T H.264 standard, alternatively described as MPEG-4, Part 10, Advanced Video Coding (AVC). However, the techniques of this disclosure are not limited to any particular coding standard. Although not shown in fig. 1, in some aspects video encoder 22 and video decoder 28 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX (multiplexer-demultiplexer) units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally comply with the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, "Advanced Video Coding for generic audiovisual services," published by the ITU-T Study Group in March 2005, and may be referred to herein as the H.264 standard, the H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
Work is underway to advance the H.264/MPEG-4 AVC standard in various forums of the ITU-T, such as the Key Technology Area (KTA) forum. The KTA forum seeks, in part, coding technologies that exhibit higher coding efficiency than that exhibited by the H.264/AVC standard. The techniques described in this disclosure may provide coding improvements relative to the H.264/AVC standard. Recently, the KTA forum received a document describing techniques that may be considered related to the techniques described herein, submitted by Yan Ye, Peisong Chen, and Marta Karczewicz, numbered VCEG-AI33, entitled "High Precision Interpolation and Prediction," and presented at the 35th meeting held in Berlin, Germany, 16-18 July 2008, the entire contents of which are hereby incorporated by reference.
Video encoder 22 and video decoder 28 may each be implemented as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. Each of video encoder 22 and video decoder 28 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined CODEC (CODEC) in a respective mobile device, user device, broadcast device, server, or the like.
A video sequence typically comprises a series of video frames. Video encoder 22 and video decoder 28 may operate on video blocks within individual video frames in order to encode and decode video data. Video blocks may have fixed or varying sizes and may differ in size according to a specified encoding standard. Each video frame may comprise a series of slices or other independently decodable units. Each slice may include a series of macroblocks, which may be arranged into sub-blocks. As an example, the ITU-T H.264 standard supports: intra prediction at various block sizes, e.g., 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8 x 8 for chroma components; and inter prediction at various block sizes, e.g., 16 by 16, 16 by 8, 8 by 16, 8 by 8, 8 by 4, 4 by 8, and 4 by 4 for luma components, and corresponding scaled sizes for chroma components. The video blocks may comprise blocks of pixel data, or blocks of transform coefficients, such as after a transform process (e.g., a discrete cosine transform or a conceptually similar transform process).
Smaller video blocks may provide better resolution and may be used for locations of a video frame that include high levels of detail. In general, a macroblock and various sub-blocks may be considered video blocks. In addition, a slice may be considered a series of video blocks, e.g., macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of the video frame. Alternatively, the frame itself may be a decodable unit, or other portions of the frame may be defined as decodable units. The term "coding unit" refers to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP), or another independently decodable unit defined according to the coding technique used.
Quantization may be performed after inter-based predictive coding (which includes interpolation and the techniques of this disclosure for efficiently selecting the prediction algorithm or mode used to predict a coding unit) and after any transform, such as the 4 x 4 or 8 x 8 integer transform used in H.264/AVC or a discrete cosine transform (DCT). Quantization generally refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, a 16-bit value may be rounded down to a 15-bit value during quantization. After quantization, entropy encoding may be performed, e.g., according to Content Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), or another entropy encoding method.
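A toy sketch of the quantization idea just described (the step size, sample values, and round-to-nearest rule are arbitrary assumptions, not drawn from any standard): dividing a transform coefficient by a quantization step shrinks its magnitude, and hence the number of bits needed to code it, at the cost of some reconstruction error.

```python
def quantize(coeff, qstep):
    # Round-to-nearest quantization of a single transform coefficient.
    sign = 1 if coeff >= 0 else -1
    return sign * ((abs(coeff) + qstep // 2) // qstep)

def dequantize(level, qstep):
    # Inverse quantization: scale the level back up; precision is lost.
    return level * qstep

level = quantize(1000, 16)     # 63: far fewer bits than 1000 requires
recon = dequantize(level, 16)  # 1008: close to, but not exactly, 1000
```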
The techniques of this disclosure are particularly well suited for weighted bi-directional prediction. As mentioned above, bi-prediction is a prediction of a so-called "B video block" based on two different data lists. A B video block may be predicted from two lists of data from two previous frames, two lists of data from two subsequent frames, or one list of data from a previous frame and one list of data from a subsequent frame. In contrast, P video blocks are predicted based on one list, which may correspond to one predictive frame (e.g., a previous frame or a subsequent frame). B frames and P frames may be more generally referred to as P units and B units. P units and B units may also be realized in smaller coding units (e.g., slices of a frame or portions of a frame). A B unit may include B video blocks, P video blocks, or I video blocks. A P unit may include P video blocks or I video blocks. An I unit may include only I video blocks.
Weighted bi-prediction refers to bi-prediction that allows weighting factors to be assigned to two different lists. Each list may include a set of data associated with a predictive frame or other encoded unit. In weighted bi-prediction, a list may be given a greater weight in the process of generating predictive data. For example, if one of the lists has data that is more similar to the video block being encoded, that list may get more weight than the other list.
For weighted bi-prediction according to ITU-T H.264, video encoder 22 and video decoder 28 typically support three different prediction modes. A first prediction mode, referred to as "default weighted prediction," refers to weighted prediction in which the weight factors associated with two or more different lists are predefined by some default setting. In some cases, default weighted prediction may assign equal weights to each of the lists.
A second prediction mode, referred to as "implicit weighted prediction," refers to weighted prediction in which the weight factors associated with two or more different lists are defined based on some implicit factors associated with the data. For example, the implicit weight factors may be defined by the relative temporal positions of the data in the two different lists relative to the data being predictively encoded. In both default weighted prediction and implicit weighted prediction, the weight factors are not included in the bitstream. Instead, video decoder 28 may be programmed to know the weight factors (for default weighted prediction) or may be programmed to know how to derive the weight factors (for implicit weighted prediction).
A third prediction mode, referred to as "explicit weighted prediction," refers to weighted prediction in which weight factors are dynamically defined and encoded into the bitstream as part of the encoding process. Explicit weighted prediction differs from default weighted prediction and implicit weighted prediction in this respect (e.g., explicit weighted prediction produces weighting factors that are encoded as part of the bitstream).
According to an aspect of the present invention, the weighted prediction may be modified relative to conventional weighted prediction by eliminating rounding adjustments to the weighted prediction data. In this case, encoder 22 may analyze and consider both rounded and unrounded weighted prediction data, and may use either rounded or unrounded weighted prediction data based on rate-distortion analysis. One or more syntax elements may be defined and encoded into the bitstream in order to identify whether rounded or unrounded weighted prediction data is used. Decoder 28 may decode and interpret the syntax elements, and based on the syntax elements, decoder 28 may use rounded or unrounded weighted prediction data in the decoding process. The removal of rounding adjustments applies to default weighted prediction, implicit weighted prediction, and explicit weighted prediction.
According to another aspect of this disclosure, video encoder 22 may select among default weighted prediction, implicit weighted prediction, and explicit weighted prediction. Rather than exhaustively considering all three possibilities against one another, encoder 22 may be programmed to first select between default weighted prediction and implicit weighted prediction, and then compare that selection to explicit weighted prediction. Specifically, encoder 22 may perform explicit weighted prediction to calculate explicit weight factors, but encoder 22 may also compare the calculated explicit weighted prediction to an explicit weighted prediction that uses weight factors corresponding to those defined by default weighted prediction or implicit weighted prediction.
Explicit weighted prediction may have an advantage over default weighted prediction and implicit weighted prediction in that explicit weighted prediction allows offsets to be coded. The offsets may adjust the predictive data and may be very useful in addressing flashes, a darkening sky, scene changes, or other types of illumination changes between video frames. In accordance with this disclosure, the weight factors defined by default weighted prediction or implicit weighted prediction may be used by video encoder 22 within the framework of explicit weighted prediction, thereby allowing offsets to be added to the predictive data while maintaining the weight factors associated with default or implicit weighted prediction. In this way, the predictive data may be improved in some cases, which may help to improve data compression.
In the context of video encoding, video encoder 22 may calculate the DC offset by first averaging the luminance pixel values of the luminance video block being encoded. Video encoder 22 may then average the luma pixel values of the predictive video blocks used to encode the video blocks. Each of these calculated values may include a DC value. Video encoder 22 may calculate the DC offset by subtracting DC values from each other (e.g., by subtracting an average luminance value of the current block from an average luminance value of a predictive block used to encode the current block being encoded). The DC offset may also be defined for the chrominance components (if needed). The DC offset may also be accumulated over a given coding unit (e.g., frame or slice), and the DC offset of a coding unit is defined as the average of the offsets of all blocks of the given coding unit.
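As a non-normative illustration of the DC-offset computation described above, the following sketch averages the luma values of the current and predictive blocks, takes their difference, and averages the per-block offsets over a coding unit. Function names and the sign convention are illustrative only, not part of any standard.

```python
def dc_offset(current_block, predictive_block):
    """DC offset: difference between the mean luma value of the block
    being encoded and the mean luma value of its predictive block."""
    mean = lambda block: sum(block) / len(block)
    return mean(current_block) - mean(predictive_block)

def coding_unit_offset(block_pairs):
    """Coding-unit (frame/slice) level offset, defined as the average of
    the per-block DC offsets; block_pairs is a list of
    (current_block, predictive_block) tuples."""
    offsets = [dc_offset(cur, pred) for cur, pred in block_pairs]
    return sum(offsets) / len(offsets)
```

The same computation could be repeated on the chrominance components when chroma offsets are desired.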
Fig. 2 is a block diagram illustrating an example of a video encoder 50 that may perform techniques consistent with this disclosure. Video encoder 50 may correspond to video encoder 22 of source device 12, or to a video encoder of a different device. Video encoder 50 may perform intra-coding and inter-coding of blocks within video frames, although the intra-coding components are not shown in fig. 2 for ease of illustration. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence. Intra-mode (I-mode) may refer to a spatially-based compression mode, and inter-modes, such as prediction (P-mode) or bi-directional (B-mode), may refer to temporally-based compression modes.
As shown in fig. 2, video encoder 50 receives a current video block within a video frame or slice to be encoded. In the example of fig. 2, video encoder 50 includes motion estimation unit 32, motion compensation unit 35, memory 34, adder 48, transform unit 38, quantization unit 40, and entropy encoding unit 46. For video block reconstruction, video encoder 50 also includes inverse quantization unit 42, inverse transform unit 44, and adder 51. Video encoder 50 may also include a deblocking filter (not shown) to filter block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of adder 51, if desired.
During the encoding process, video encoder 50 receives the video block to be encoded, and motion estimation unit 32 and motion compensation unit 35 perform inter-predictive encoding. Motion estimation unit 32 and motion compensation unit 35 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation is generally viewed as the process of generating motion vectors, which estimate the motion of a video block. For example, a motion vector may indicate the displacement of a predictive block within a predictive frame (or other coding unit) relative to the current block being encoded within the current frame (or other coding unit). Motion compensation is generally viewed as the process of extracting or generating a predictive block based on a motion vector determined by motion estimation. Again, motion estimation unit 32 and motion compensation unit 35 may be functionally integrated. For exemplary purposes, the techniques described in this disclosure are described as being performed by motion compensation unit 35.
Motion estimation unit 32 selects the appropriate motion vector for the video block to be encoded by comparing the video block to be encoded to video blocks of one or more predictive coding units (e.g., of previous and/or future frames in temporal order). As an example, motion estimation unit 32 may select a motion vector for a B frame in a number of ways. In one approach, motion estimation unit 32 may select a previous or future frame from a first set of frames (referred to as list 0) and determine a motion vector using only this previous or future frame from list 0. Alternatively, motion estimation unit 32 may select a previous or future frame from a second set of frames (referred to as list 1) and determine a motion vector using only this previous or future frame from list 1. In yet another approach, motion estimation unit 32 may select a first frame from list 0 and a second frame from list 1, and select one or more motion vectors from the first frame of list 0 and the second frame of list 1. This third form of prediction may be referred to as bi-predictive motion estimation. The techniques of this disclosure may be implemented in order to efficiently select a motion compensated bi-directional prediction mode. The selected motion vector for any given list may point to a predictive video block that is most similar to the video block being encoded, e.g., as defined by a metric such as the sum of absolute differences (SAD) or the sum of squared differences (SSD) of the pixel values of the predictive block relative to the pixel values of the block being encoded.
According to the ITU-T H.264/AVC standard, three motion compensated bi-predictive algorithms or modes may be used to predict B frames or portions thereof, e.g., video blocks, macroblocks, or any other discrete and/or contiguous portions of a B frame. A first motion compensated bi-predictive algorithm or mode, commonly referred to as default weighted prediction, may involve applying default weights to each identified video block of the first frame of list 0 and the second frame of list 1. The default weights may be programmed according to the standard, and are often selected to be equal for default weighted prediction. The weighted blocks of the first and second frames are then added together and divided by the total number of frames used to predict the B frame (e.g., two in this instance). Often, this division is accomplished by adding 1 to the sum of the weighted block of the first frame and the weighted block of the second frame, and then right-shifting the result by one bit. The addition of 1 is a rounding adjustment.
According to an aspect of the invention, the addition of 1 (the rounding adjustment) prior to the right shift by one bit may be avoided, thereby eliminating the upward rounding bias. Motion compensation unit 35 may generate both the weighted blocks with rounding and the weighted blocks without rounding, and may select the blocks that achieve the best coding efficiency.
More generally, the weighted prediction can be given by the following equation:
pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + 2^r) >> (r+1)
where pred(i,j) is data associated with the weighted prediction block, pred0(i,j) is data from list 0, pred1(i,j) is data from list 1, w0 and w1 are weight factors, 2^r is a rounding term, and ">>" is a right-shift operation that shifts by (r+1) bits. Consistent with this disclosure, two different versions of pred(i,j) may be generated and considered by motion compensation unit 35. The first version is consistent with the equation above, and the second version is consistent with the equation above with rounding removed (i.e., with the term 2^r deleted from the equation). In some cases, eliminating this rounding may result in better weighted predictive data, which may improve coding efficiency. Motion compensation unit 35 may generate one or more syntax elements to define whether rounding is used for a given video block or set of video blocks. Both the bi-prediction mode and the one or more syntax elements indicating whether rounding is used may be output from motion compensation unit 35 to entropy encoding unit 46 for inclusion in the encoded bitstream.
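As a non-normative sketch (function and parameter names are this description's own), the rounded and unrounded variants of the weighted bi-prediction above may be computed as:

```python
def weighted_bipred(p0, p1, w0, w1, r, use_rounding=True):
    """pred = (p0*w0 + p1*w1 + 2^r) >> (r+1), with the 2^r rounding term
    optionally dropped to produce the unrounded variant."""
    rounding = (1 << r) if use_rounding else 0
    return [(a * w0 + b * w1 + rounding) >> (r + 1) for a, b in zip(p0, p1)]
```

With default weights w0 = w1 = 1 and r = 0, this reduces to (a + b + 1) >> 1 with rounding and (a + b) >> 1 without, so the two variants can differ by one for odd sums.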
B pictures use two lists of previously encoded reference pictures, list 0 and list 1. These two lists may each contain coded pictures that are in the past and/or the future in temporal order. A block in a B picture may be predicted in one of several ways: motion compensated prediction from a list 0 reference picture, motion compensated prediction from a list 1 reference picture, or motion compensated prediction from a combination of both list 0 and list 1 reference pictures. To obtain a combination of both list 0 and list 1 reference pictures, two motion compensated reference areas are obtained from the list 0 and list 1 reference pictures, respectively, and their combination is then used to predict the current block.
In this disclosure, the term "B picture" will be used to generally refer to any type of B unit, which may be a B frame, a B slice, or possibly other video units that include at least some B video blocks. As mentioned, B pictures may allow 3 types of weighted prediction. For simplicity, only forward prediction in unidirectional prediction is shown below, but backward prediction may also be used.
The default weighted prediction may be defined by the following equations for unidirectional prediction and bidirectional prediction, respectively.
Unidirectional prediction: pred(i,j) = pred0(i,j)
Bidirectional prediction: pred(i,j) = (pred0(i,j) + pred1(i,j) + 1) >> 1
Where pred0(i, j) and pred1(i, j) are prediction data from list 0 and list 1.
Implicit weighted prediction can be defined by the following equations for unidirectional prediction and bi-directional prediction, respectively.
Unidirectional prediction: pred(i,j) = pred0(i,j)
Bidirectional prediction: pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + 32) >> 6
In this case, each prediction is scaled according to a weight factor w0 or w1, where w0 and w1 are calculated based on the relative temporal positions of the list 0 and list 1 reference pictures.
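The derivation of w0 and w1 from relative temporal positions follows a distance-scaling recipe like that of H.264, sketched below in simplified form. This is an approximation rather than the normative derivation: edge cases such as long-term reference pictures are omitted, and the helper names are illustrative.

```python
def clip3(lo, hi, x):
    # Clamp x into [lo, hi], as in the Clip3 operator of the spec.
    return max(lo, min(hi, x))

def implicit_weights(poc_cur, poc0, poc1):
    """Derive (w0, w1) for implicit weighted prediction from picture order
    counts (POCs) of the current picture and the list 0 / list 1 references.
    Weights sum to 64, matching the ( ... + 32) >> 6 form above."""
    tb = clip3(-128, 127, poc_cur - poc0)   # distance current <-> list 0 ref
    td = clip3(-128, 127, poc1 - poc0)      # distance list 1 <-> list 0 ref
    if td == 0:
        return 32, 32                       # equal weights fallback
    tx = (16384 + abs(td) // 2) // td
    dist_scale = clip3(-1024, 1023, (tb * tx + 32) >> 6)
    w1 = dist_scale >> 2
    if w1 < -64 or w1 > 128:
        return 32, 32                       # out of range: fall back
    return 64 - w1, w1
```

For a current picture halfway between its two references, both weights come out as 32; a picture temporally closer to the list 0 reference gives list 0 the larger weight.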
Explicit weighted prediction can be defined by the following equations for unidirectional prediction and bi-directional prediction, respectively.
Unidirectional prediction: pred(i,j) = (pred0(i,j)*w0 + 2^(r-1)) >> r + o1
Bidirectional prediction: pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + 2^r) >> (r+1) + ((o1 + o2 + 1) >> 1)
In this case, the weight factors are determined by the encoder and transmitted in the slice header, and o1 and o2 are the picture offset for the list 0 reference picture and the picture offset for the list 1 reference picture, respectively.
Conventionally, rounding adjustments are always used in bi-directional prediction. According to the equations above, a rounding adjustment of 1 is used before the right shift by one in default weighted prediction, and a rounding adjustment of 32 is used before the right shift by six in implicit weighted prediction. In general, a rounding adjustment of 2^(r-1) is typically used before a right shift by r, where r represents a positive integer.
Such frequent and biased rounding operations may reduce the accuracy of the prediction. Furthermore, in bi-directional prediction for explicit weighted prediction, there are actually two rounding operations: one for the reference-picture data and another for the offsets. Thus, in this case, rounding errors may accumulate. According to an aspect of this disclosure, the video encoder may add the offsets to the weighted prediction before the right shift (instead of rounding twice separately), as follows:
pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + ((o1 + o2) << r) + 2^r) >> (r+1),
where pred(i,j) is the weighted prediction data associated with rounding, pred0(i,j) is the data from list 0, pred1(i,j) is the data from list 1, w0 and w1 are weight factors, o1 and o2 are offsets, 2^r is a rounding term, and ">>" is a right-shift operation that shifts by (r+1) bits. This may provide better prediction accuracy. In this case, a new syntax element may also be defined to allow the two different offsets (o1 and o2) to be combined into one offset. Further, in this case, the rounding value may include the rounding adjustment described above (e.g., the term 2^r prior to the right shift by (r+1) bits) and another rounding value ("r") associated with the offsets. The above equation may also be modified slightly to provide higher precision of the offset. If higher precision of the offset is desired, the offset may be multiplied by a factor (e.g., by 2) and then rounded to an integer. The left shift may also be changed to account for this increased offset precision.
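The difference between the conventional two-rounding form and the combined single-rounding form can be illustrated with a small sketch over scalar samples (names and sample values are illustrative):

```python
def explicit_bipred_two_roundings(p0, p1, w0, w1, r, o1, o2):
    """Conventional explicit bi-prediction: one rounding (2^r) for the
    weighted samples, plus a second rounding (+1 >> 1) for the offsets."""
    return ((p0 * w0 + p1 * w1 + (1 << r)) >> (r + 1)) + ((o1 + o2 + 1) >> 1)

def explicit_bipred_one_rounding(p0, p1, w0, w1, r, o1, o2):
    """Proposed form: offsets folded in before the shift, so only a single
    rounding (2^r) is applied."""
    return (p0 * w0 + p1 * w1 + ((o1 + o2) << r) + (1 << r)) >> (r + 1)
```

For example, with p0 = 100, p1 = 101, w0 = w1 = 2, r = 1, o1 = 0, o2 = 1, the two-rounding form yields 102 while the single-rounding form yields 101, showing how the separate offset rounding can bias the result upward.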
Another problem with explicit weighted prediction is that unidirectional prediction and bi-directional prediction may share the same weights and offsets. To have more flexibility for better prediction, uni-directional prediction and bi-directional prediction may be decoupled according to the present invention. In this case, uni-directional prediction and bi-directional prediction may define different weights and offsets for a given type of prediction (default, implicit or explicit). New syntax elements for explicit prediction may be defined to allow better prediction. An encoder may include syntax elements in an encoded bitstream to signal different rounding modes used by the encoder so that a decoder may use the same rounding modes.
It is beneficial to adaptively select rounding adjustments. One way to do this is to generate two or more different sets of predictive data, and possibly to encode the video block several times based on those different sets. One predictive data set may have a non-zero rounding adjustment and another may eliminate rounding. In yet other examples, round-up, round-down, and no rounding may be considered. Motion compensation unit 35 may generate these different types of predictive data and may perform a rate-distortion (RD) analysis to select the best predictive data for a given video block.
Rate-distortion (RD) analysis is quite common in video coding and typically involves the computation of a cost metric that indicates the coding cost. The cost metric may balance the number of bits required for encoding (rate) and the quality level associated with the encoding (distortion). A typical rate-distortion cost calculation may generally correspond to the following format:
J(λ)=λR+D,
where J (λ) is cost, R is bit rate, D is distortion, and λ is the Lagrangian multiplier.
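The selection by cost J(λ) = λR + D can be sketched as follows (the candidate tuple layout and names are illustrative, not from any standard):

```python
def rd_cost(rate_bits, distortion, lam):
    # Lagrangian cost: J(lambda) = lambda * R + D
    return lam * rate_bits + distortion

def select_candidate(candidates, lam):
    """Pick the candidate with the lowest rate-distortion cost;
    candidates is a list of (name, rate_bits, distortion) tuples."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
```

A candidate that spends a few more bits but reduces distortion enough can win, which is exactly the trade-off λ controls.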
One way video encoder 50 identifies the most desirable prediction data is to first find the motion vectors using motion estimation unit 32, and then implement motion compensation unit 35 and adder 48 to calculate the prediction error with and without rounding adjustments. Motion compensation unit 35 may then select the prediction data that yields the smallest prediction error. The prediction error may be calculated by using the sum of absolute differences between the prediction data and the current data being encoded.
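A minimal sketch of this SAD-based selection between rounded and unrounded default-weighted prediction follows (the function names and the returned one-bit flag convention are illustrative):

```python
def sad(pred, cur):
    # Sum of absolute differences between prediction and current samples.
    return sum(abs(p - c) for p, c in zip(pred, cur))

def pick_rounding(cur, p0, p1):
    """Default weights, r = 0: compare rounded (a+b+1)>>1 against
    unrounded (a+b)>>1 and keep the variant with the smaller SAD.
    Returns (prediction, rounding_flag)."""
    rounded = [(a + b + 1) >> 1 for a, b in zip(p0, p1)]
    unrounded = [(a + b) >> 1 for a, b in zip(p0, p1)]
    if sad(rounded, cur) <= sad(unrounded, cur):
        return rounded, 1   # signal: rounding used
    return unrounded, 0     # signal: rounding eliminated
```

The returned flag plays the role of the syntax element that tells the decoder which variant was used.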
In explicit weighted prediction, motion compensation unit 35 of video encoder 50 may implement three different modes. In all three explicit weighted prediction modes, each reference picture may have one offset for uni-directional prediction and each pair of reference pictures has one offset for bi-directional prediction, e.g.:
pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + (o << r) + 2^r) >> (r+1),
where pred(i,j) is the first weighted prediction data, pred0(i,j) is the data from list 0, pred1(i,j) is the data from list 1, w0 and w1 are weight factors, o is a common offset applicable to both pred0(i,j) and pred1(i,j), 2^r is a rounding term, and ">>" is a right-shift operation that shifts by (r+1) bits. The first mode may use weights defined by default weighted prediction. The second mode may use weights defined by implicit weighted prediction. The third mode allows each reference picture to have one weight for unidirectional prediction, and allows each pair of reference pictures involved in bidirectional prediction to have a pair of weights for the two reference pictures. The weights defined for the third mode may be adaptively determined, and in some cases, an explicit weighted prediction framework may be used with weights defined by default or implicit weighted prediction in order to allow for offsets in these contexts. Furthermore, the weights and offsets defined in this third mode may be different for uni-directional prediction and bi-directional prediction. The above equation may also be modified slightly to provide higher precision of the offset. If higher precision of the offset is desired, the offset may be multiplied by a factor (e.g., by 2) and then rounded to an integer. The left shift may also be changed to account for this increased offset precision (e.g., in this case, the left shift may be changed to r-1).
In order for video encoder 50 to signal a particular mode for a given video block or set of video blocks to a decoder, video encoder 50 may implement two one-bit syntax elements: derived_weight_flag and poc_weight_flag. In this case, derived_weight_flag may be used to select between the first two explicit weighted prediction modes mentioned above and the third mode, and poc_weight_flag may be used to select between the first and the second explicit weighted prediction modes.
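Since the polarity of these flags is not specified here, the following sketch assumes one illustrative mapping of the two one-bit flags to the three modes (the mode names are this description's own):

```python
def explicit_mode_from_flags(derived_weight_flag, poc_weight_flag):
    """Map the two one-bit syntax elements to one of the three explicit
    weighted-prediction modes. Polarity assumed: derived_weight_flag = 1
    means the weights are derived (default or implicit) rather than
    adaptively determined, and poc_weight_flag = 1 selects the
    POC-derived (implicit) weights over the default weights."""
    if not derived_weight_flag:
        return "adaptive-weights"   # third mode: weights carried explicitly
    if poc_weight_flag:
        return "implicit-weights"   # second mode: POC-derived weights
    return "default-weights"        # first mode: default weights
```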
To find the best weighted prediction, video encoder 50 may perform multi-pass (multi-pass) encoding and select the best mode based on the rate-distortion cost. One way to do this is an exhaustive search in which motion compensation unit 35 generates every possible weighted prediction data and selects the best weighted prediction data. However, to reduce complexity, motion compensation unit 35 may implement additional techniques of this disclosure, e.g., to first select between default weighted prediction and implicit weighted prediction, and then compare the selection to explicit weighted prediction. Motion compensation unit 35 may calculate weights and offsets associated with explicit weighted prediction, and may also use an explicit weighted prediction framework to add offsets to data otherwise associated with default weighted prediction or implicit weighted prediction, whichever is selected. Thus, there may be two sets of offsets calculated by the motion compensation unit 35. The first set of offsets may be calculated by using known weights used in default weighted prediction or implicit weighted prediction, and the second set of offsets may be calculated, for example, by minimizing motion compensated prediction errors along with weights (as part of the normal calculation of explicit weighted prediction).
To further reduce complexity, during explicit weighted prediction, if the offset is 0, motion compensation unit 35 may skip explicit weighted prediction that uses the default weights or implicit weights. Similarly, if the offset is 0 and the calculated weights are unchanged, motion compensation unit 35 may skip the typical explicit weighted prediction that uses the calculated weights and offset.
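These skip rules can be sketched as a small filter over the candidate explicit-mode passes (the tuple layout, pass names, and weight representation are illustrative):

```python
def explicit_candidates(base_weights, calc_weights, offset):
    """Enumerate the explicit-mode encoding passes while applying the
    complexity-reduction rules above: a zero offset makes the pass with
    default/implicit (base) weights redundant, and a zero offset with
    unchanged weights also makes the normal explicit pass redundant."""
    passes = []
    if offset != 0:
        passes.append(("explicit-with-base-weights", base_weights, offset))
    if offset != 0 or calc_weights != base_weights:
        passes.append(("explicit-calculated", calc_weights, offset))
    return passes
```

When both skip conditions hold, no extra explicit passes are evaluated at all, which is the complexity saving the text describes.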
Once the desired prediction data is identified by motion compensation unit 35 (as described herein), video encoder 50 forms a residual video block by subtracting the prediction data from the original video block being encoded. Adder 48 represents one or more components that perform this subtraction operation. Transform unit 38 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes residual transform block coefficients. For example, transform unit 38 may perform other transforms that are conceptually similar to DCTs (e.g., those defined by the h.264 standard). Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used. In any case, transform unit 38 applies a transform to the residual block, producing a block of residual transform coefficients. The transform may convert residual information from a pixel domain to a frequency domain.
Quantization unit 40 quantizes the residual transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, a 9-bit value may be rounded down to an 8-bit value during quantization. In addition, for the case of using an offset, the quantization unit 40 may also quantize a different offset.
After quantization, entropy encoding unit 46 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 46 may perform Content Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), or another entropy encoding method. Following the entropy encoding by entropy encoding unit 46, the encoded video may be transmitted to another device or archived for later transmission or retrieval. The encoded bitstream may include entropy encoded residual blocks, motion vectors for these blocks, and other syntax (e.g., the syntax described herein).
Inverse quantization unit 42 and inverse transform unit 44 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block in the manner described above. Adder 51 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 35 to produce a reconstructed video block for storage in memory 34. The reconstructed video block may be used by motion estimation unit 32 and motion compensation unit 35 as a reference block to inter-code a block in a subsequent video frame.
Fig. 3 is a block diagram illustrating an example of motion compensation unit 35 of fig. 2 in more detail. As shown in the example of fig. 3, motion compensation unit 35 is coupled to memory 34, which stores a first set of coding units or reference frames as list 0 52A and a second set of coding units or reference frames as list 1 52B. Additionally, memory 34 may store the current video data 53 to be encoded. Memory 34 may include a shared memory structure, or may include a number of different memories, storage units, buffers, or other types of memory that facilitate the storage of any of the data discussed herein. List 0 52A and list 1 52B are data associated with two different predictive units, e.g., data from two different frames or slices or macroblocks, according to bi-prediction. Again, bi-prediction is not necessarily limited to any prediction direction, and thus list 0 52A and list 1 52B may store data from two previous frames or slices, two subsequent frames or slices, or one previous frame or slice and one subsequent frame or slice. Further, in some cases, list 0 52A and/or list 1 52B may each include data associated with multiple frames, slices, or macroblocks. List 0 52A and/or list 1 52B are simply two different sets of possible predictive data, and each list may include one frame or slice, or several frames, slices, or macroblocks, in any direction relative to the current video block being encoded.
As shown in fig. 3, motion compensation unit 35 includes a default weighted prediction unit 54, an implicit weighted prediction unit 56, and an explicit weighted prediction unit 58. Units 54, 56, and 58 perform default weighted prediction, implicit weighted prediction, and explicit weighted prediction, respectively, as described herein. Rate-distortion (R-D) analysis unit 64 may select weighted prediction data among these possibilities, and may implement the techniques of this disclosure to facilitate the selection process.
Motion compensation unit 35 also includes a rounding unit 55 that causes one or more of units 54, 56, and 58 to generate rounded and unrounded versions of the respective weighted prediction data by rounding unit 55. Again, by eliminating rounding, the weighted prediction data may be improved in some contexts.
In addition, motion compensation unit 35 includes an offset calculation unit 62, which calculates offsets. According to the ITU-T H.264/MPEG-4 AVC coding format, offsets are permitted only in explicit weighted prediction. Thus, to consider offsets in the context of default weighted prediction or implicit weighted prediction, the weights determined by default weighted prediction unit 54 or implicit weighted prediction unit 56 may be forwarded to explicit weighted prediction unit 58 along with the offsets determined by offset calculation unit 62. In this manner, explicit weighted prediction unit 58 may exploit the ITU-T H.264/MPEG-4 AVC coding format by adding offsets to either the default weighted prediction data or the implicit weighted prediction data for consideration by R-D analysis unit 64. In this case, explicit weighted prediction unit 58 generates not only the normal explicit weighted prediction data, but also prediction data that combines the offsets determined by offset calculation unit 62 with the weights determined by default weighted prediction unit 54 or implicit weighted prediction unit 56.
Offset calculation unit 62 may calculate the offset as the difference between the average of the video block values of the block being encoded and the average of the video block values of the prediction block. An offset may be calculated for the luminance component, and in some cases, offsets may be calculated for both the luminance and the chrominance components.
R-D analysis unit 64 may analyze the differently weighted predictive data and may select the weighted predictive data that produces the best results in terms of quality, or in terms of rate and distortion. R-D analysis unit 64 outputs the selected weighted predictive data, which may be subtracted from the video block being encoded via adder 48 (fig. 2). Syntax elements may be used to inform a decoder of the manner or method that should be used to generate the weighted predictive data. For example, the syntax elements may indicate whether rounding was used, and may indicate whether default, implicit, or explicit weighted prediction should be used. If explicit weighted prediction should be used, the syntax elements may further identify the weight factors and the offsets, which may be the weight factors and offsets associated with explicit weighted prediction, or may be the weight factors actually defined by default weighted prediction unit 54 or implicit weighted prediction unit 56 plus the offset from offset calculation unit 62.
Fig. 4 is a block diagram illustrating an exemplary video decoder 70 that may perform decoding techniques that are reciprocal to the encoding techniques described above. Video decoder 70 may include entropy decoding unit 72, prediction unit 75, inverse quantization unit 76, inverse transform unit 78, memory 74, and adder 79. Prediction unit 75 may include a motion compensation (MC) unit 86, as well as spatial prediction components, which are not shown for simplicity and ease of illustration.
Video decoder 70 may receive encoded video data and one or more syntax elements indicating whether rounding adjustments were used to encode that data. MC unit 86 of prediction unit 75 may generate weighted prediction data that depends on two or more lists of data, as described herein. According to this disclosure, if the one or more syntax elements indicate that the video data was encoded without rounding adjustments, the weighted prediction data likewise lacks rounding adjustments. Video decoder 70 may decode the video data using the weighted prediction data, e.g., by invoking adder 79 to add the weighted prediction data (e.g., a prediction block) to residual data (e.g., a residual block).
In general, entropy decoding unit 72 receives the encoded bitstream and entropy decodes it to produce quantized coefficients, motion information, and other syntax. The motion information (e.g., motion vectors) and other syntax are forwarded to prediction unit 75 for use in generating predictive data. Prediction unit 75 performs bi-directional prediction consistent with this disclosure, avoiding rounding adjustments in some cases, and possibly implementing default, implicit, or explicit weighted prediction according to the received syntax elements. The syntax elements may identify the type of weighted prediction to be used, may identify the weight factors and offsets if explicit weighted prediction is used, and may identify whether rounding adjustments should be used in decoding.
The quantized coefficients are sent from entropy decoding unit 72 to inverse quantization unit 76, which inverse quantization unit 76 performs inverse quantization. Inverse transform unit 78 then inverse transforms the dequantized coefficients back to the pixel domain to generate a residual block. Adder 79 combines the prediction data (e.g., a prediction block) generated by prediction unit 75 with the residual blocks from inverse transform unit 78 to generate reconstructed video blocks, which may be stored in memory 74 and/or output from video decoder 70 as decoded video output.
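The adder stage can be sketched as a clipped per-sample sum. The function below is illustrative; the clip range assumes 8-bit samples:

```python
def reconstruct(pred, residual, bit_depth=8):
    """Adder 79: reconstructed sample = clip(pred + residual),
    clamped to the valid sample range for the given bit depth."""
    hi = (1 << bit_depth) - 1
    return [max(0, min(hi, p + r)) for p, r in zip(pred, residual)]
```

The clamp matters at the extremes: a large positive residual on a bright prediction saturates at 255 rather than wrapping.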
Fig. 5 is a flow chart illustrating an exemplary process performed by a video encoder consistent with this disclosure. Fig. 5 will be described from the perspective of video encoder 50 of fig. 2. As shown in fig. 5, motion compensation unit 35 generates first weighted prediction data that includes rounding (101), and generates second weighted prediction data that does not include rounding (102). Motion compensation unit 35 then selects prediction data from the first weighted prediction data and the second weighted prediction data based on a rate-distortion analysis (103). In particular, motion compensation unit 35 may determine a cost metric for the first and second weighted prediction data that quantifies and balances the encoding rate and encoding quality associated with the first and second weighted prediction data, and may select the prediction data with the lowest cost. Video encoder 50 may then encode the video data based on the selected prediction data (104). For example, video encoder 50 may invoke adder 48 to subtract the selected prediction data from the video data being encoded, and then invoke transform unit 38 for the transform, quantization unit 40 for the quantization, and entropy encoding unit 46 for entropy encoding the quantized and transformed residual coefficients. In this case, motion compensation unit 35 may generate one or more syntax elements to indicate whether rounding is used for the prediction data, and may forward these syntax elements to entropy encoding unit 46 for inclusion in the encoded bitstream.
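Steps (101) through (103) can be sketched as follows, using SAD as a simple stand-in for the fuller rate-distortion metric described in the text; all names are illustrative:

```python
def bipred(p0, p1, rounding):
    # Average of two prediction lists, with or without the +1 rounding term.
    r = 1 if rounding else 0
    return [(a + b + r) >> 1 for a, b in zip(p0, p1)]

def choose_rounding(block, p0, p1):
    """Build both candidate predictions (rounded and unrounded) and
    keep the one whose SAD against the original block is lower."""
    best = None
    for rounding in (True, False):
        pred = bipred(p0, p1, rounding)
        sad = sum(abs(a - b) for a, b in zip(block, pred))
        if best is None or sad < best[0]:
            best = (sad, rounding, pred)
    return best[1], best[2]
```

The decision (and hence the rounding syntax element) can differ block by block, which is exactly why it is signaled to the decoder.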
Fig. 6 is a flow chart illustrating an exemplary process performed by a video decoder consistent with this disclosure. Fig. 6 will be described from the perspective of video decoder 70 of fig. 4. As shown in fig. 6, a video decoder receives encoded video data (111), and receives one or more syntax elements indicating whether rounding adjustments are used to encode the video data (112). In particular, entropy decoding unit 72 may receive an encoded bitstream that includes video data and one or more syntax elements. After entropy decoding, entropy decoding unit 72 may output the video data as quantized transform coefficients, which are inverse quantized by unit 76 and inverse transformed by unit 78. Entropy decoding unit 72 may output syntax elements to a prediction unit that include one or more syntax elements that indicate whether rounding adjustments are used to encode the video data, motion vectors, and possibly other syntax.
Prediction unit 75 invokes motion compensation unit 86 for block-based predictive decoding. In doing so, motion compensation unit 86 generates weighted prediction data based on the syntax (113). Thus, if one or more syntax elements indicate that rounding adjustments are used, motion compensation unit 86 generates weighted prediction data that includes the rounding adjustments. However, if the one or more syntax elements indicate that rounding adjustments are not used, motion compensation unit 86 generates weighted prediction data that lacks rounding adjustments. Video decoder 70 may then decode the video data using the weighted prediction data (114). In particular, video decoder 70 may combine weighted prediction data (e.g., a prediction block) with residual video data (e.g., a residual block) using adder 79 in order to generate a reconstruction of the video data (e.g., a reconstructed video block).
Fig. 7 is a flow diagram illustrating another exemplary process performed by a video encoder consistent with this disclosure. Fig. 7 will be described from the perspective of motion compensation unit 35 of fig. 3 (which may form part of video encoder 50 of fig. 2). As shown in fig. 7, default weighted prediction unit 54 performs default weighted prediction with rounding (201), and performs default weighted prediction without rounding (202). A rounding unit 55 may be invoked to define said rounding or lack thereof. Implicit weighted prediction unit 56 then performs implicit weighted prediction with rounding (203) and performs implicit weighted prediction without rounding (204). Again, rounding unit 55 may be invoked to define said rounding or lack thereof.
As explained above, default weighted prediction refers to weighted prediction in which the weight factors associated with two or more different lists are predefined by some default setting. In some cases, default weighted prediction may assign equal weights to each of the lists. Implicit weighted prediction refers to weighted prediction in which weight factors associated with two or more different lists are defined based on some implicit factors associated with the data. For example, the implicit weight factors may be defined by the relative temporal position of the data in the two different lists relative to the data being predictively encoded.
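A simplified sketch of temporal-distance-based implicit weights, normalized so the two weights sum to 64 (an H.264-style fixed-point scale), with a fall-back to equal weights in degenerate cases. This is an approximation for illustration, not the normative derivation:

```python
def implicit_weights(poc_cur, poc0, poc1):
    """Derive (w0, w1) from picture-order-count distances.
    The reference temporally closer to the current picture
    receives the larger weight; w0 + w1 == 64."""
    td = poc1 - poc0          # distance between the two references
    tb = poc_cur - poc0       # distance from ref0 to the current picture
    if td == 0:
        return 32, 32         # degenerate: fall back to equal weights
    w1 = (tb * 64) // td
    if not 0 < w1 < 64:
        return 32, 32         # out of range: fall back to equal weights
    return 64 - w1, w1
```

For a current picture midway between its references the sketch yields (32, 32), matching the equal-weight default; off-center positions shift weight toward the nearer reference.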
The R-D analysis unit 64 selects either default weighted prediction or implicit weighted prediction (205). In particular, R-D analysis unit 64 may select default weighted prediction or implicit weighted prediction based on qualities and encoding rates associated with different versions of prediction data. For example, R-D analysis unit 64 may consider the similarity of different versions of prediction data relative to the video block being encoded, and may select the closest version.
If R-D analysis unit 64 selects default weighted prediction ("default" 205), explicit weighted prediction unit 58 may be invoked to calculate explicit weighted prediction data and to compare that data against explicit weighted prediction data that uses the default weights. As mentioned, this allows offsets to be implemented in the context of default weights. Thus, explicit weighted prediction may be used as a mechanism to apply offsets to data that would otherwise be defined by default or implicit weighted prediction. As outlined above, explicit weighted prediction refers to weighted prediction in which the weight factors are dynamically defined and encoded into the bitstream as part of the encoding process. In this regard, explicit weighted prediction differs from default weighted prediction and implicit weighted prediction: explicit weighted prediction results in weight factors that are encoded as part of the bitstream, whereas default and implicit weighted prediction use weight factors that are predefined or derived by the decoder (no weight factors are present in the bitstream).
Specifically, explicit weighted prediction unit 58 may calculate explicit weights and explicit offsets using conventional explicit weighted prediction as defined by ITU-T H.264 (206). For example, to compute the explicit weights, explicit weighted prediction unit 58 may apply a least mean squares (LMS) algorithm in order to solve the explicit weighted prediction equations listed above for the weights and offsets. In addition, explicit weighted prediction unit 58 may calculate an offset associated with the default weights (207). Offset calculation unit 62 may be invoked by explicit weighted prediction unit 58 in order to calculate the offset. Specifically, offset calculation unit 62 may calculate a given offset as the average difference between the pixel values of the video data being encoded and the pixel values of a given version of the weighted prediction data.
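For a single reference, the least-squares solve reduces to an ordinary linear fit of the original samples against the prediction samples. The one-reference sketch below is a simplification; the encoder described in the text would solve a joint system over two references:

```python
def fit_weight_offset(orig, pred):
    """Least-squares fit of orig ~ w * pred + o (closed form for
    simple linear regression). Returns (w, o)."""
    n = len(orig)
    mx = sum(pred) / n
    my = sum(orig) / n
    var = sum((x - mx) ** 2 for x in pred)
    if var == 0:
        return 1.0, my - mx   # flat prediction: weight 1, pure DC offset
    w = sum((x - mx) * (y - my) for x, y in zip(pred, orig)) / var
    o = my - w * mx
    return w, o
```

A real encoder would additionally quantize w to the fixed-point weight scale and round o to an integer before signaling them in the bitstream.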
Explicit weighted prediction unit 58 may then generate two different versions of prediction data. Specifically, explicit weighted prediction unit 58 may perform explicit weighted prediction using default weights and corresponding offsets (208), and may also perform explicit weighted prediction using calculated weights and corresponding explicit offsets (209). These two different versions of explicit weighted prediction data (one version calculated from normal explicit weighted prediction and the other version calculated using default weights plus offsets) and default weighted prediction with or without rounding may then be sent to R-D analysis unit 64. R-D analysis unit 64 may select a prediction mode based on the R-D analysis (210). Specifically, R-D analysis unit 64 may select between these two different versions of explicit weighted prediction data (one version calculated from normal explicit weighted prediction and the other version calculated using default weights plus offsets). R-D analysis unit 64 may consider the similarity of different versions of prediction data relative to the video block being encoded and may select the closest version.
A similar process occurs with respect to implicit weighted prediction data when implicit weighted prediction is selected over default weighted prediction. That is, if R-D analysis unit 64 selects implicit weighted prediction ("implicit" 205), explicit weighted prediction unit 58 may be invoked to calculate explicit weighted prediction data and to compare that data against explicit weighted prediction data that uses the implicit weights. This allows offsets to be applied in the context of implicit weights. Specifically, explicit weighted prediction unit 58 may calculate explicit weights and explicit offsets using conventional explicit weighted prediction as defined by ITU-T H.264 (211). In addition, explicit weighted prediction unit 58 may calculate offsets associated with the implicit weights (212). Offset calculation unit 62 may be invoked by explicit weighted prediction unit 58 in order to calculate an offset as described herein.
Explicit weighted prediction unit 58 may then generate two different versions of prediction data. In this case, explicit weighted prediction unit 58 may perform explicit weighted prediction using implicit weights and corresponding offsets (213), and may also perform explicit weighted prediction using calculated weights and corresponding explicit offsets (214). These two different versions of explicit weighted prediction data (one version calculated from normal explicit weighted prediction and the other version calculated using implicit weights plus an offset) and implicit weighted prediction with or without rounding may then be sent to R-D analysis unit 64. R-D analysis unit 64 may select a prediction mode based on the R-D analysis. Specifically, R-D analysis unit 64 may select between these two different versions of explicit weighted prediction data (one version calculated from normal explicit weighted prediction and the other version calculated using implicit weights plus an offset). R-D analysis unit 64 may consider the similarity of different versions of prediction data relative to the video block being encoded and may select the closest version.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets and Integrated Circuits (ICs) or sets of ICs (i.e., chipsets). Any components, modules or units that have been described are provided to emphasize functional aspects and do not necessarily require realization by different hardware units. The techniques described herein may also be implemented in hardware, software, firmware, or any combination thereof. Any features described as modules, units, or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. In some cases, various features may be implemented as an integrated circuit device, such as an integrated circuit chip or chipset.
If implemented in software, the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed in a processor, perform one or more of the methods described above. The computer-readable medium may comprise a computer-readable storage medium and may form part of a computer program product, which may include packaging materials. The computer-readable storage medium may include Random Access Memory (RAM) (e.g., Synchronous Dynamic Random Access Memory (SDRAM)), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and so forth. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.
The code or instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video codec. The techniques may also be fully implemented in one or more circuits or logic elements.
This disclosure also contemplates any of a variety of integrated circuit devices comprising circuitry to implement one or more of the techniques described in this disclosure. Such circuitry may be provided in a single integrated circuit chip or in multiple interoperable integrated circuit chips in a so-called chipset. Such integrated circuit devices may be used in a variety of applications, some of which may include use in wireless communication devices (e.g., mobile telephone handsets).
Various embodiments of the present invention have been described. These and other embodiments are within the scope of the following claims.

Claims (15)

1. A method for predicting video data that depends on two or more lists of prediction data, the method comprising:
generating weighted prediction data that depends on the two or more lists of prediction data and includes at least two offsets and a rounding adjustment, wherein the weighted prediction data is generated substantially according to the following equation:
pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + ((o1+o2)<<r) + 2^r) >> (r+1),
wherein pred(i,j) is the weighted prediction data, pred0(i,j) and pred1(i,j) are data from the two or more lists of prediction data, w0 and w1 are weight factors, o1 and o2 are the at least two offsets, and the term 2^r together with the right-shift operation >>(r+1) provides the rounding adjustment; and
reconstructing the video data using the weighted prediction data.
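A per-sample numeric check of the claimed equation, writing 2^r as a left shift; the input values below are arbitrary and chosen only for illustration:

```python
def claim1_pred(p0, p1, w0, w1, o1, o2, r):
    """Evaluate, for one sample position, the claimed equation:
    pred = (p0*w0 + p1*w1 + ((o1+o2) << r) + 2**r) >> (r+1)"""
    return (p0 * w0 + p1 * w1 + ((o1 + o2) << r) + (1 << r)) >> (r + 1)
```

With r = 0, unit weights, and zero offsets, the equation collapses to the familiar rounded average (p0 + p1 + 1) >> 1, which ties the claim back to the default weighted prediction discussed in the description.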
2. The method of claim 1, wherein the two or more lists comprise list 0 and list 1, and wherein the weighted prediction data depends on first data from the list 0, second data from the list 1, two different weighting factors, two different offsets, and at least two different rounding terms.
3. The method of claim 1, wherein o1 is a first offset applied to the pred0(i,j) from list 0 and o2 is a second offset applied to the pred1(i,j) from list 1.
4. The method of claim 1, wherein different offsets are applied for luma and chroma blocks.
5. The method of claim 1, wherein the at least two offsets are combined into a single offset o, wherein o is a common offset that applies to both the pred0(i,j) from list 0 and the pred1(i,j) from list 1.
6. The method of claim 1, further comprising encoding a video bitstream indicative of reconstructed video data.
7. The method of claim 1, further comprising decoding the video data using reconstructed video data.
8. An apparatus for predicting video data that depends on two or more lists of prediction data, the apparatus comprising:
a memory configured to store the video data and two or more lists of prediction data used to predictively encode the video data; and
a processor configured to:
generate weighted prediction data that depends on the two or more lists of prediction data and includes at least two offsets and a rounding adjustment,
wherein the processor generates the weighted prediction data substantially according to the following equation:
pred(i,j) = (pred0(i,j)*w0 + pred1(i,j)*w1 + ((o1+o2)<<r) + 2^r) >> (r+1),
wherein pred(i,j) is the weighted prediction data, pred0(i,j) and pred1(i,j) are data from the two or more lists of prediction data, w0 and w1 are weight factors, o1 and o2 are the at least two offsets, and the term 2^r together with the right-shift operation >>(r+1) provides the rounding adjustment; and
reconstruct the video data using the weighted prediction data.
9. The apparatus of claim 8, wherein the two or more lists comprise list 0 and list 1, and wherein the weighted prediction data depends on first data from the list 0, second data from the list 1, two different weighting factors, two different offsets, and at least two different rounding terms.
10. The apparatus of claim 8, wherein o1 is a first offset applied to the pred0(i,j) from list 0 and o2 is a second offset applied to the pred1(i,j) from list 1.
11. The apparatus of claim 8, wherein different offsets are applied to predict luma and chroma blocks.
12. The apparatus of claim 8, wherein the at least two offsets are combined into a single offset o, wherein o is a common offset that applies to both the pred0(i,j) from list 0 and the pred1(i,j) from list 1.
13. The apparatus of claim 8, wherein the processor is configured to encode a video bitstream indicative of reconstructed video data.
14. The apparatus of claim 8, wherein the processor is further configured to decode the video data using reconstructed video data.
15. The apparatus of claim 8, wherein the apparatus comprises one or more of:
an integrated circuit;
a microprocessor;
a video encoder;
a video decoder; or
A wireless communication device.
HK15110970.6A 2009-07-09 2015-11-06 Non-zero rounding and prediction mode selection techniques in video encoding HK1210344B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/499,990 2009-07-09
US12/499,990 US9161057B2 (en) 2009-07-09 2009-07-09 Non-zero rounding and prediction mode selection techniques in video encoding

Publications (2)

Publication Number Publication Date
HK1210344A1 HK1210344A1 (en) 2016-04-15
HK1210344B true HK1210344B (en) 2018-12-07
