HK1195430B

HK1195430B - Method for reducing latency in video decoding and computing system

Info

Publication number: HK1195430B
Application number: HK14108880.0A
Authority: HK
Inventors: Gary J. Sullivan
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2011-06-30
Filing date: 2011-10-11
Publication date: 2017-06-16

Description

Method and computing system for reducing delay in video decoding

Background

Engineers use compression (also known as source coding or source coding) to reduce the bit rate of digital video. Compression reduces the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A "codec" is an encoder/decoder system.

In the past two decades, various video codec standards have been adopted, including the H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263, and H.264 (AVC or ISO/IEC 14496-10) standards and the MPEG-1 (ISO/IEC 11172-2), MPEG-4Visual (ISO/IEC 14496-2), and SMPTE421M standards. Recently, the HEVC standard is being developed. Video codec standards typically define syntax options for encoding a video bitstream that detail parameters in the bitstream when certain features are used in encoding and decoding. In many cases, the video codec standard also provides details about the decoding operations that the decoder should perform in order to obtain the correct results in decoding.

The basic goal of compression is to provide good rate-distortion performance. Thus, the encoder attempts to provide the highest quality video for a particular bit rate. Alternatively, the encoder attempts to provide the lowest bit rate of encoded video for a particular quality level/fidelity level to the original video. In practice, depending on the context of use, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, total latency, and/or smoothness in playback also affect the decisions made during encoding and decoding.

For example, consider usage scenarios such as playback of video from a storage device, playback of video from encoded data streamed over a network connection, and video transcoding (from one bitrate to another bitrate, or from one standard to another). At the encoder side, such an application may permit offline encoding that is completely insensitive to time. Thus, the encoder can increase the encoding time and increase the resources used during encoding to find the most efficient way to compress video and thereby improve the rate-distortion performance. If a small amount of delay is also acceptable at the decoder side, the encoder can further improve the rate-distortion performance, for example by exploiting inter-picture correlation from pictures further ahead in the sequence.

On the other hand, consider usage scenarios such as remote desktop conferencing, surveillance video, video telephony, and other real-time communication scenarios. Such applications are time sensitive. Low latency between the recording of input images and the playback of output images is a critical factor in performance. When encoding/decoding tools adapted for non-real-time communication are applied in a real-time communication context, the total delay is often unacceptably high. The latency introduced by these tools during encoding and decoding can improve the performance of conventional video playback, but they disrupt real-time communication.

Disclosure of Invention

In summary, the detailed description section addresses techniques and tools for reducing delay in video encoding and decoding. The techniques and tools may reduce latency in order to improve responsiveness in real-time communications. For example, the techniques and tools reduce the overall delay by constraining the delay due to video frame reordering, and by indicating the constraint on frame reordering delay with one or more syntax elements accompanying the encoded data for the video frame.

In accordance with one aspect of the techniques and tools described herein, a tool, such as a video encoder, a real-time communication tool with a video encoder, or other tool sets one or more syntax elements that indicate a constraint on delay, such as a constraint on frame reordering delay consistent with inter-frame correlation between multiple frames of a video sequence. The tool outputs the syntax element(s), thereby facilitating a simpler and faster determination of when to reconstruct frames in their output order are ready for output.

In accordance with another aspect of the techniques and tools described herein, a tool, such as a video decoder, a real-time communication tool with a video decoder, or other tool receives and parses one or more syntax elements that indicate a constraint on delay (e.g., a constraint on frame reordering delay). The tool also receives encoded data for a plurality of frames of a video sequence. At least some of the encoded data is decoded to reconstruct one of the multiple frames. The tool may determine constraints on delay based on the syntax element(s) and then use the constraints on delay to determine when a reconstructed frame is ready for output (in output order). The tool outputs the reconstructed frame.

The foregoing and other objects, features and advantages of the invention will become more apparent from the following detailed description of the invention which proceeds with reference to the accompanying drawings.

Drawings

FIG. 1 is an illustration of an example computing system in which some described embodiments may be implemented.

Fig. 2a and 2b are illustrations of example network environments in which some described embodiments may be implemented.

FIG. 3 is an illustration of an example encoder system in conjunction with which some described embodiments may be implemented.

Fig. 4 is an illustration of an example decoder system in conjunction with which some described embodiments may be implemented.

Fig. 5a-5e are diagrams illustrating the encoding order and output order for frames in several example series.

Fig. 6 is a flow diagram illustrating an example technique for setting and outputting one or more syntax elements indicating a constraint on delay.

Fig. 7 is a flow diagram illustrating an example technique for reduced delay decoding.

Detailed Description

Detailed description of the inventionthe detailed description presents techniques and tools for reducing delay in video encoding and decoding. The techniques and tools may help reduce latency in order to improve responsiveness in real-time communications.

In a video encoding/decoding scenario, some delay between the time an input video frame is received and the time the frame is played back is inevitable. The frame is encoded by an encoder, conveyed to and decoded by a decoder, and some amount of delay is caused by practical limitations on encoding resources, decoding resources, and/or network bandwidth. However, other delays may be avoided. For example, to improve rate-distortion performance (e.g., to exploit inter-frame correlation from earlier pictures in the sequence), the encoder and decoder may introduce delays. Such delays may be reduced, although there may be losses in rate-distortion performance, processor usage, or playback smoothness.

With the techniques and tools described herein, delay is reduced by constraining delay (thus, limiting the temporal extent of inter-frame correlation) and indicating this constraint on delay to the decoder. For example, the constraint on latency is a constraint on frame reordering latency. Alternatively, the constraint on delay is a constraint in seconds, milliseconds, or another time metric. The decoder may then determine this constraint on delay and use it when determining which frames are ready for output. In this way, latency may be reduced for remote desktop conferencing, video telephony, video surveillance, camera video, and other real-time communication applications.

Some of the innovations described herein are illustrated by reference to syntax elements and operations specific to the h.264 and/or HEVC standards. Such innovations may also be implemented for other standards or formats.

More generally, various alternatives to the examples described herein are possible. Some of the techniques described with reference to the flowcharts may be altered by changing the ordering of the stages shown in the flowcharts, by splitting, repeating, or omitting certain stages, etc. Aspects of latency reduction for video encoding and decoding may be used in combination or separately. Various embodiments use one or more of the described techniques and tools. Some of the techniques and tools described herein address one or more of the problems noted in the background. Typically, a given technology/tool cannot solve all such problems.

I. Example computing System

Fig. 1 illustrates a generalized example of a suitable computing system (100) in which several of the described techniques and tools may be implemented. The computing system (100) is not intended to suggest any limitation as to scope of use or functionality, as the techniques and tools may be implemented in diverse general-purpose or special-purpose computing systems.

Referring to FIG. 1, the computing system (100) includes one or more processing units (110, 115) and memory (120, 125). In fig. 1, this most basic configuration (130) is contained within a dashed line. The processing unit (110, 115) executes computer-executable instructions. The processing unit may be a general purpose Central Processing Unit (CPU), a processor in an Application Specific Integrated Circuit (ASIC), or other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 1 shows a central processing unit (110) and a graphics processing unit or co-processing unit (115). The tangible memory (120, 125) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, that is accessible to the processing unit(s). The memory (120, 125) stores software (180) implementing one or more innovations for reducing latency in video encoding and decoding in the form of computer-executable instructions adapted to be executed by the processing unit(s).

The computing system may have additional features. For example, the computing system (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system (100). Typically, operating system software (not shown) provides an operating environment for other software running in the computing system (100) and coordinates activities of the components of the computing system (100).

The tangible storage (140) may be removable or non-removable and include magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory manner and which can be accessed within the computing system (100). The storage device (140) stores instructions of the software (180) for implementing one or more innovations for delay reduction in video encoding and decoding.

The input device(s) (150) may be a touch input device such as a keyboard, mouse, stylus or trackball, a sound input device, a scanning device, or another device that provides input to the computing system (100). For video encoding, the input device(s) (150) may be a video camera, video card, television tuner card, or similar device that accepts input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system (100). The output device(s) (160) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system (100).

The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. The modulated data signal is such a signal: one or more of its characteristics may be set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may use an electrical, optical, RF, or other carrier.

The techniques and tools may be described in the general context of computer-readable media. Computer readable media is any available tangible medium that can be accessed within a computing environment. By way of example, and not limitation, for the computing system (100), computer-readable media comprise memory (120, 125), storage (140), and combinations of any of the above.

The techniques and tools may be described in the general context of computer-executable instructions, such as those contained in program modules, being executed in a computing system on a target processor, whether real or virtual. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.

The terms "system" and "device" are used interchangeably herein. Unless the context clearly dictates otherwise, none of the terms imply any limitation as to the type of computing system or computing device. In general, a computing system or computing device may be local or distributed, and may include any combination of special purpose hardware and/or general purpose hardware with software that implements the functionality described herein.

For purposes of illustration, the detailed description uses terms like "determine" and "use" to describe computer operations in a computing system. These terms are highly abstract of computer-implemented operations and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.

Example network Environment

Fig. 2a and 2b show example network environments (201, 202) that include a video encoder (220) and a video decoder (270). The encoder (220) and decoder (270) are connected over a network (250) using an appropriate communication protocol. The network (250) may comprise the internet or other computer network.

In the network environment (201) shown in fig. 2a, each real-time communication ("RTC") tool (210) contains both an encoder (220) and a decoder (270) for two-way communication. A given encoder (220) may produce output that complies with the SMPTE421M standard, the ISO-IEC14496-10 standard (also known as h.264 or AVC), the HEVC standard, other standards, or proprietary formats, with a corresponding decoder (270) accepting encoded data from that encoder (220). The two-way communication may be part of a video conference, video phone call, or other two-party communication scenario. Although the network environment (201) in fig. 2a comprises two real-time communication tools (210), the network environment (201) may alternatively comprise three or more real-time communication tools (210) participating in a multiparty communication.

The real-time communication tool (210) manages the encoding of the encoder (220). Fig. 3 shows an example encoder system (300) that can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another encoder system. The real-time communication tool (210) also manages the decoding of the decoder (270). Fig. 4 shows an example decoder system (400) that may be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another encoder system.

In the network environment (202) shown in fig. 2b, the encoding tool (212) comprises an encoder (220) that encodes video for delivery to a plurality of playback tools (214), wherein the plurality of playback tools comprise decoders (270). One-way communication may be provided for video surveillance systems, camera surveillance systems, remote desktop conference presentations, or other scenarios in which video is encoded and transmitted from one location to one or more other locations. Although the network environment (202) in fig. 2b contains two playback tools (214), the network environment (202) may contain more or fewer playback tools (214). Generally, the playback tool (214) communicates with the encoding tool (212) to determine a video stream for the playback tool (214) to receive. The playback tool (214) receives the stream, buffers the received encoded data for an appropriate time period, and then begins decoding and playback.

Fig. 3 shows an example encoder system (300) that may be included in the encoding tool (212). Alternatively, the encoding tool (212) uses another encoder system. The encoding tool (212) may also contain server-side controller logic for managing connections with one or more playback tools (214). Fig. 4 shows an example decoder system (400) that may be included in the playback tool (214). Alternatively, the playback tool (214) uses another decoder system. The playback tool (214) may also contain client controller logic for managing connections with the encoding tool (212).

In some cases, the use of syntax elements indicating delay (e.g., frame reordering delay) is specific to a particular standard or format. For example, the encoded data may contain one or more syntax elements indicating constraints on delay as part of the syntax of the base coded video bitstream defined according to the standard or format, or as defined media metadata about the encoded data. In these cases, the real-time communication tool (210), encoding tool (212), and/or playback tool (214) with reduced latency may be codec-dependent in that the decisions they make may depend on the bitstream syntax for a particular standard or format.

In other cases, the use of syntax elements indicating constraints on delay (e.g., frame reordering delay) is outside of a particular standard or format. For example, the syntax element(s) indicating the constraint on delay may be signaled as part of the media transport stream, media storage file or, more generally, the syntax of the media system multiplexing protocol or transport protocol. Or syntax element(s) indicating the delay may be negotiated between the real-time communication tool (210), the encoding tool (212), and/or the playback tool (214) according to a media property negotiation protocol. In these cases, the real-time communication tool (210), encoding tool (212), and playback tool (214) with reduced latency may be codec independent in that they may work with any available video encoder and decoder, assuming that the level of control over inter-frame correlation is fixed during encoding.

Example encoder System

Fig. 3 is a block diagram of an example encoder system (300) in conjunction with which some described embodiments may be implemented. The encoder system (300) may be a general-purpose encoding tool capable of operating in any one of a plurality of encoding modes, such as a low-delay encoding mode for real-time communication, a transcoding mode, and a regular encoding mode for playback of media from a file or stream, or it may be a special-purpose encoding tool adapted for one such encoding mode. The encoder system (300) may be implemented as an operating system module, as part of an application library, or as a stand-alone application. In general, the encoder system (300) receives a sequence of source video frames (311) from a video source (310) and produces encoded data as output to a channel (390). The encoded data output to the channel may contain one or more syntax elements indicating constraints on delay (e.g., frame reordering delay) to facilitate reduced-delay decoding.

The video source (310) may be a video camera, tuner card, storage medium, or other digital video source. The video source (310) generates a sequence of video frames at a frame rate of, for example, 30 frames per second. As used herein, the term "frame" generally refers to source, encoded, or reconstructed image data. For progressive video, a frame is a progressive video frame. For interlaced video, in an example embodiment, interlaced video frames are de-interlaced prior to encoding. Alternatively, two complementary interlaced video fields are encoded as one interlaced video frame or separate fields. In addition to indicating progressive video frames, the term "frame" may indicate a single unpaired video field, a complementary pair of video fields, a video object plane representing a video object at a given time, or a region of interest in a larger image. The video object plane or region may be a portion of a larger image containing multiple objects or regions of a scene.

Arriving source frames (311) are stored in a source frame temporary memory storage area (320) comprising a plurality of frame buffer storage areas (321, 322, … 32 n). A frame buffer (321, 322, etc.) holds a source frame in the source frame storage area (320). After one or more source frames (311) have been stored in the frame buffers (321, 322, etc.), the frame selector (330) periodically selects an individual source frame from the source frame storage area (320). The order in which the frames are selected by the frame selector (330) for input to the encoder (340) may be different from the order in which the frames are generated by the video source (310), e.g., the order of a frame may precede to facilitate temporal backward prediction. Prior to the encoder (340), the encoder system (300) may include a pre-processor (not shown) that performs pre-processing (e.g., filtering) on the selected frames (331) prior to encoding.

The encoder (340) encodes the selected frame (331) to produce an encoded frame (341) and also produces a memory management control signal (342). If the current frame is not the first frame that has been encoded, the encoder (340) may use one or more previous encoded/decoded frames (369) that have been stored in the decoded frame temporary memory store (360) when performing its encoding process. This stored decoded frame (369) is used as a reference frame for inter-prediction of the content of the current source frame (331). Generally, the encoder (340) includes a plurality of encoding modules that perform encoding tasks such as motion estimation and compensation, frequency transformation, quantization, and entropy encoding. The exact operations performed by the encoder (340) may vary depending on the compression format. The format in which the encoded data is output may be a Windows MediaVideo format, a VC-1 format, an MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), an H.26x format (e.g., H.261, H.262, H.263, H.264), an HEVC format, or other format.

The encoded frames (341) and the memory management control signals (342) are processed by a decoding process simulator (350). The decoding process simulator (350) implements some of the functionality of the decoder, such as the decoding task of reconstructing reference frames used by the encoder (340) in motion estimation and compensation. The decoding process simulator (350) uses the memory management control signal (342) to determine whether a given encoded frame (341) needs to be reconstructed and stored for use as a reference frame in inter-frame prediction of a subsequent frame to be encoded. If the control signal (342) indicates a need to store an encoded frame (341), the decoding process simulator (350) models a decoding process that may be implemented by a decoder that receives the encoded frame (341) and produces a corresponding decoded frame (351). In doing so, when the encoder (340) has used the decoded frame(s) (369) already stored in the decoded frame store (360), the decoding process emulator (350) also uses the decoded frame(s) (369) from the store (360) as part of the decoding process.

The decoded frame temporary memory storage area (360) contains a plurality of frame buffer storage areas (361, 362, …, 36 n). The decoding process emulator (350) uses the memory management control signal (342) to manage the contents of the memory region (360) to identify any frame buffers (361, 362, etc.) that have frames that are no longer needed by the encoder (340) for use as reference frames. After modeling the decoding process, the decoding process simulator (350) stores the newly decoded frame (351) in the frame buffer (361, 362, etc.) that has been identified in this manner.

The encoded frame (341) and the memory management control signal (342) are also buffered in the temporary encoded data region (370). The encoded data aggregated in the encoded data region (370) may contain one or more syntax elements indicating constraints on delay as part of the syntax of the base encoded video bitstream. Alternatively, the encoded data aggregated in the encoded data region (370) may contain syntax element(s) indicating the constraint on delay as part of media metadata related to the encoded video data, e.g., as one or more parameters in one or more supplemental enhancement information ("SEI") messages or video usability information ("VUI") messages.

Aggregated data (371) from the temporally encoded data region (370) is processed by a channel encoder (380). The channel encoder (380) may packetize the aggregated data for transmission as a media stream, in which case the channel encoder (380) may add syntax element(s) indicating the constraint on delay as part of the syntax of the media transport stream. Or the channel encoder (380) may organize the aggregated data for storage as a file, in which case the channel encoder (380) may add syntax element(s) indicating the constraint on latency as part of the syntax of the media storage file. Or, more generally, the channel encoder (380) may implement one or more media system multiplexing protocols or transport protocols, in which case the channel encoder (380) may add syntax element(s) indicating the constraint on delay as part of the syntax of the protocol(s). The channel encoder (380) provides an output to a channel (390), the channel (390) representing storage, a communication connection, or another channel for the output.

Example decoder System

Fig. 4 is a block diagram of an example decoder system (400) in conjunction with which some described embodiments may be implemented. The decoder system (400) may be a general decoding tool capable of operating in any one of a plurality of decoding modes, such as a low-latency decoding mode for real-time communication and a regular decoding mode for playback of media from a file or stream, or it may be a dedicated decoding tool adapted for one such decoding mode. The decoder system (400) may be implemented as an operating system module, as part of an application library or as a stand-alone application. In general, the decoder system (400) receives encoded data from a channel (410) and produces reconstructed frames as output for an output destination (490). The encoded data may contain one or more syntax elements indicating constraints on delay (e.g., frame reordering delay) to facilitate reduced delay decoding.

The decoder system (400) comprises a channel (410) which may represent a storage, a communication connection, or another channel for encoded data as input. The channel (410) produces encoded data that has been channel encoded. A channel decoder (420) may process the encoded data. For example, the channel decoder (420) unpacks data that has been aggregated for transmission as a media stream, in which case the channel decoder (420) may parse syntax element(s) indicating constraints on delay as part of the syntax of the media transport stream. Alternatively, the channel decoder (420) separates encoded video data that has been aggregated for storage as a file, in which case the channel decoder (420) may parse syntax element(s) indicating the constraint on delay as part of the syntax of the media storage file. Or, more generally, the channel decoder (420) may implement one or more media system demultiplexing protocols or transport protocols, in which case the channel decoder (420) may parse syntax element(s) indicating constraints on delay as part of the syntax of the protocol(s).

The encoded data (421) output from the channel decoder (420) is stored in a temporary encoded data region (430) until a sufficient amount of such data has been received. The encoded data (421) includes an encoded frame (431) and a memory management control signal (432). The encoded data (421) in the encoded data region (430) may contain one or more syntax elements indicating constraints on delay as part of the syntax of the base encoded video bitstream. Alternatively, the encoded data (421) in the encoded data region (430) may contain syntax element(s) indicating the constraint on delay as part of media metadata related to the encoded video data (e.g., as one or more parameters in one or more SEI messages or VUI messages). Generally, the encoded data region (430) temporarily stores encoded data (421) until such encoded data (421) is used by the decoder (450). At that time, the encoded data for the encoded frame (431) and memory management control signal (432) is passed from the encoded data region (430) to the decoder (450). As decoding continues, new encoded data is added to the encoded data region (430) and the oldest encoded data remaining in the encoded data region (430) is passed to the decoder (450).

The decoder (450) periodically decodes the encoded frames (431) to produce corresponding decoded frames (451). The decoder (450) may use one or more previously decoded frames (469) as reference frames for inter-prediction when performing its decoding process, if appropriate. The decoder (450) reads such previously decoded frame (469) from the decoded frame temporary memory store (460). In general, the decoder (450) includes a plurality of decoding modules that perform decoding tasks such as entropy decoding, inverse quantization, inverse frequency transform, and motion compensation. The exact operations performed by the decoder (450) may vary depending on the compression format.

The decoded frame temporary memory storage area (460) contains a plurality of frame buffer storage areas (461, 462, …, 46 n). The decoded frame storage area (460) is an example of a decoded picture buffer. The decoder (450) uses the memory management control signal (432) to identify the frame buffer (461, 462, etc.) in which it can store the decoded frame (451). The decoder (450) stores decoded frames (451) in the frame buffer.

An output sequencer (sequencer) (480) uses the memory management control signals (432) to identify when the next frame to be generated in output order is available in the decoded frame storage area (460). In order to reduce the delay of the encoding-decoding system, the output sequencer (480) uses syntax elements indicating constraints on delay to speed up the recognition of frames to be generated in output order. When the next frame (480) to be generated in output order is available in the decoded frame storage area (460), it is read by the output sequencer (480) and output to an output destination (490) (e.g., a display). In general, the order in which the output sequencer (480) outputs frames from the decoded frame storage area (460) may be different from the order in which the decoder (450) decodes frames.

Syntax elements to facilitate delay-reducing encoding and decoding

In most video codec systems, the encoding order (also referred to as decoding order or bitstream order) is the order in which video frames are represented in the encoded data in the bitstream and thus processed during decoding. The encoding order may be different from the order in which the camera captured the frames before encoding and from the order in which the decoded frames were displayed, stored, or otherwise output (output order or display order) after decoding. Frame reordering in relation to output order is beneficial (mainly in terms of compression capability), but it increases the end-to-end delay of the encoding and decoding process.

The techniques and tools described herein reduce delay due to reordering of video frames and also facilitate delay reduction for a decoder system by providing information to the decoder system regarding constraints on reordering delay. Such delay reduction is useful for many purposes. For example, it may be used to reduce the time lag that occurs in interactive video communications using a video conferencing system, thereby making the conversation flow and communication interaction between remote participants more rapid and natural.

A. Output timing and output ordering method

According to the h.264 standard, a decoder may use two methods to determine when a decoded frame is ready to be output. The decoder may use timing information in the form of decoding timestamps and output timestamps (e.g., signaled in an image timing SEI message). Alternatively, the decoder may use a buffer capacity limit signaled with various syntax elements to determine when a decoded frame is ready to be output.

Timing information may be associated with each decoded frame. The decoder may use the timing information to determine when decoded frames may be output. In practice, however, such timing information may not be available to the decoder. Moreover, even when timing information is available, some decoders do not actually use this information (e.g., because the decoder has been designed to work regardless of whether timing information is available or not).

The buffer capacity limit according to the h.264 standard (and draft versions of the HEVC standard) is indicated by several syntax elements, including the syntax element max _ dec _ frame _ buffering, the syntax element num _ reorder _ frames, related ordering information (referred to as "picture order count" information), and other memory management control information signaled in the bitstream. The syntax element max _ dec _ frame _ buffering (or derived variable designated as MaxDpbFrames) specifies the decoded picture buffer ("DPB") size required in the frame buffer unit. Likewise, the syntax element max _ dec _ frame _ buffering represents the top-level memory capacity used for encoding the video sequence in order to enable the decoder to output the pictures in the correct order. The syntax element num _ reorder _ frames (or max _ num _ reorder _ frames) indicates the maximum number of frames (or complementary field pairs, or unpaired fields) that can precede any frame (or complementary field pairs, or unpaired fields) in coding order and follow it in output order. In other words, num _ reorder _ frames specifies the constraints on the memory capacity necessary for picture reordering. The syntax element max _ num _ ref _ frames specifies the maximum number of short-term and long-term reference frames (or complementary reference field pairs, or unpaired reference fields) that can be used by the decoding process for inter prediction of any picture in the sequence. The syntax element max _ num _ ref _ frames also determines the size of the sliding window used for decoding the reference picture markers. Like num _ reorder _ frames, max _ num _ ref _ frames specifies a constraint on the required memory capacity.

The decoder uses the max _ dec _ frame _ buffering (or MaxDpbFrames) and num _ reorder _ frames syntax elements to determine when the buffer capacity limit is exceeded. This occurs, for example, when a new decoded frame needs to be stored in the DPB, but there is no remaining available space in the DPB. In this case, the decoder uses the picture order count information to identify which of the pictures that have been decoded is the earliest in output order. The earliest image in output order is then output. Such a process is sometimes referred to as "bumping" because the arrival of a new image that needs to be stored "bumps" an image out of the DPB.

The information indicated by the max _ dec _ frame _ buffering (or MaxDpbFrames) and num _ reorder _ frames syntax elements is sufficient for determining the memory capacity needed in the decoder. However, the use of such information introduces unnecessary delays when used to control the "collision" process for image output. As defined in the h.264 standard, the max _ dec _ frame _ buffering and num _ reorder _ frames syntax elements do not establish a limit on the number of reordering that can be used for any particular picture and, therefore, do not establish a limit on the end-to-end delay. Whatever the value of these syntax elements, a particular picture can be left in the DPB for any length of time before it is output, which corresponds to the large amount of delay added by the encoder to pre-buffer the source picture.

B. Syntax element indicating constraint on frame reordering delay

The techniques and tools described herein reduce delay in video communication systems. The encoding tool, real-time communication tool, or other tool sets a limit on the reordering range that can be used to encode any frame in a video sequence. For example, the limit may be expressed as the number of frames in the encoded video sequence that may precede any given frame in output order and follow the given frame in encoding order. This restriction constrains the reordering delay allowed for any particular frame in the sequence. In other words, the restriction constrains the range of time (in frames) that can be used for reordering between the encoding order and the output order of any particular frame. Limiting the scope of reordering helps reduce end-to-end latency. As such, establishing such restrictions may be useful in real-time system negotiation protocols or application specifications for use scenarios where reducing latency is important.

The one or more syntax elements indicate a constraint on frame reordering delay. Signaling constraints on frame reordering delay facilitates system level negotiation for interactive real-time communication or other use scenarios. It provides a way to directly represent constraints on frame reordering delay and characterize the characteristics of a media stream or session.

The video decoder may use the indicated constraint on frame reordering delay to enable a reduced delay output of decoded video frames. In particular, signaling constraints on frame reordering delay enables a decoder to more simply and quickly identify frames in a DPB that are ready to be output, compared to a frame "collision" process. For example, the decoder may determine the delay state of a frame in the DPB by calculating the difference between the encoding order and the output order for the frame. By comparing the delay state of the frame to the constraint on frame reordering delay, the decoder can determine when the constraint on frame reordering delay has been reached. The decoder can immediately output any frame that has reached the limit. This may help the decoder to identify frames ready for output more quickly than a "collision" process using a variety of syntax elements and tracking structures. In this way, the decoder can quickly (and earlier) determine when a decoded frame can be output. The faster (and earlier) the decoder identifies when a frame can be output, the faster (and earlier) the decoder can output the video to a display or subsequent processing stage.

Thus, by using the constraint on frame reordering latency, a decoder can begin outputting frames from a decoded frame store before the decoded frame store is full, but still provide standard-compliant decoding (i.e., decoding all frames so that the frames exactly match the bits of frames decoded using otherwise conventional schemes). This significantly reduces latency when the latency (in frames) indicated by the delay syntax element is much smaller than the size of the decoded frame storage area (in frames).

Fig. 5a-5e illustrate a series of frames (501-505) having different inter-frame correlations. The series is characterized by different values of the following constraints: (1) constraints on the memory capacity necessary for image reordering (i.e., the number of frame buffers used to store reference frames for reordering purposes, e.g., by the syntax element num _ reorder _ frames), and (2) constraints on frame reordering delay, e.g., as specified by the variable maxlantencylframes. In FIGS. 5a-5e, F is shown for a given frame_j ^kThe index j indicates the position of the frame in output order and the index k indicates the position of the frame in coding order. The frames are shown in output order-the output order subscripts increase from left to right. The arrows illustrate the inter-frame correlation for motion compensation, according to which a subsequent frame in coding order is predicted using a preceding frame in coding order. For simplicity, FIGS. 5a-5e show inter-frame correlation at the frame level (rather than macroblock level, block level, etc., where the reference frame may change), and FIGS. 5a-5e show inter-frame correlation at the frame level for a given frameAt most two frames are used as reference frames. Indeed, in some implementations, different macroblocks, blocks, etc. in a given frame may use different reference frames, and more than two reference frames may be used for the given frame.

In fig. 5a, the series (501) contains nine frames. Last frame F in output order₈ ¹Using the first frame F₀ ⁰As a reference frame. The other frames in the series (501) use the last frame F₈ ¹And a first frame F₀ ⁰Both of which serve as reference frames. This means that frame F is decoded first₀ ⁰Then frame F₈ ¹Then frame F₁ ²And so on. In the series (501) shown in fig. 5a, the value of num _ reorder _ frames is 1. At any point in the decoder system process, only one frame (F) is present among the frames shown in FIG. 5a₈ ¹）For the purpose of reorderingIs stored in the decoded frame storage area. (first frame F)₀ ⁰Is also used as a reference frame and is stored, but not for reordering purposes. Because the first frame F₀ ⁰Is less than the output order of the intermediate frames, so that the first frame F is not processed₀ ⁰For num _ reorder _ frames. ) Although the value of num _ reorder _ frames is small, the series (501) has a relatively high delay-the value of maxlantencyframes is 7. In encoding the first frame F₀ ⁰Thereafter, the encoder waits until the next frame F in output order is encoded₁ ²It has previously buffered eight more source frames because of the next frame F₁ ²Depending on the last frame F in the series (501)₈ ¹. The value of maxlantencyframes is actually the maximum difference allowed between the subscript and superscript values for any particular encoded frame.

In fig. 5b, the series (502) contains nine frames, like the series (501) in fig. 5a, but the inter-frame correlation is different. Temporal reordering of frames occurs in a short range. As a result, the series (502) has a much lower delay-the value of maxlantencyframes is 1. The value of num _ reorder _ frames is still 1.

In fig. 5c, the series (503) contains ten frames. The longest inter-frame correlation (in the time range) is shorter than the longest inter-frame correlation in fig. 5a, but longer than the longest inter-frame correlation in fig. 5 b. This series (503) has the same low value of 1 for num _ reorder _ frames and it has a relatively low value of 2 for maxlantencyframes. This series (503) therefore allows a lower end-to-end delay than the series (501) of fig. 5a, although the delay that can be allowed is lower than without the series (501) of fig. 5 b.

In fig. 5d, the series (504) contains frames organized in a temporal hierarchy having three temporal layers according to inter-frame correlation. The lowest temporal resolution layer contains the first frame F₀ ⁰And the last frame F₈ ¹. Next temporal resolution layer addition frame F₄ ²Depending on the first frame F₀ ⁰And the last frame F₈ ¹. The highest temporal resolution layer adds the remaining frames. The series (504) shown in fig. 5d has a relatively low value of 2 for num _ reorder _ frames but a relatively high value of 7 for maxlantencyframes, at least for the highest temporal resolution layer, since the last frame F₈ ¹Is determined by the difference between the encoding order and the output order of (c). If only the intermediate temporal resolution layer or the lowest temporal resolution layer is decoded, the constraint on frame reordering latency can be reduced to 1 (for the intermediate layer) or 0 (for the lowest layer). To facilitate reduced-delay decoding in various temporal resolutions, syntax elements may indicate constraints on frame reordering delay for different layers in a temporal hierarchy.

In fig. 5e, the series (505) contains frames organized in a temporal hierarchy having three temporal layers according to different inter-frame correlations. The lowest temporal resolution layer contains the first frame F₀ ⁰Intermediate frame F₄ ¹And the last frame F₈ ⁵. Next layer temporal resolution layer additionFrame F₂ ²(which depends on the first frame F)₀ ⁰And intermediate frame F₄ ¹) And F₆ ⁶(which depends on the intermediate frame F)₄ ¹And the last frame F₈ ⁵). The highest temporal resolution layer adds the remaining frames. Compared to the series (504) of fig. 5d, the series (505) of fig. 5e still has a relatively low value of 2 for num _ reorder _ frames but a lower value of 3 for maxlantencyframes, at least for the highest temporal resolution layer, since the intermediate frame F₄ ¹And the last frame F₈ ⁵Is determined by the difference between the encoding order and the output order of (c). If only the intermediate temporal resolution layer or the lowest temporal resolution layer is decoded, the constraint on frame reordering latency can be reduced to 1 (for the intermediate layer) or 0 (for the lowest layer).

In the example shown in fig. 5a-5e, if the value of maxlantencyframes is known, the decoder may identify certain frames that are ready to be output immediately upon receiving a preceding frame in output order. For a given frame, the output order value of the frame minus the coding order value of the frame may be equal to the value of maxlantencyframes. In this case, the given frame is ready for output as long as the preceding frame of the given frame is received in output order. (conversely, num _ reorder _ frames alone, until additional frames are received or the end of the sequence is reached, such frames cannot be identified as ready for output.) specifically, the decoder may use the value of maxlantencyframes to enable the earlier output of the following frames:

in the series (501) of FIG. 5a, frame F₈ ¹。

In the series (502) of FIG. 5b, frame F₂ ¹、F₄ ³、F₆ ⁵And F₈ ⁷。

In the series (503) of FIG. 5c, frame F₃ ¹、F₆ ⁴And F₉ ⁷。

In the series (504) of FIG. 5d, frame F₈ ¹。

In the series (505) of FIG. 5e, frame F₄ ¹And F₈ ⁵。

In addition, declaring or negotiating the value of maxlantencyframes at the system level may provide a summary representation of the delay characteristics of a bitstream or session in a manner that measures reordering storage capacity and indicates that this capacity cannot be achieved using num _ reorder _ frames.

C. Example implementation

Syntax elements indicating constraints on frame reordering delay may be signaled in various ways, depending on the implementation. The syntax element may be signaled as part of a sequence parameter set ("SPS"), a picture parameter set ("PPS"), or other element of the bitstream, as part of an SEI message, VUI message, or other metadata, or in some other manner. In either implementation, the syntax element indicating the constraint value may be encoded and then signaled by using unsigned exponential golomb coding (unsigned entropy-golomb coding), some other form of entropy encoding, or fixed length coding. The decoder performs a corresponding decoding after receiving the syntax element.

In a first implementation, the flag max _ latency _ limit _ flag is signaled. If the flag has the first binary value (e.g., 0), no constraint is imposed on the frame reordering delay. In this case, the value of the max _ latency _ frames syntax element is not signaled or ignored. Otherwise (the flag has a second binary value, such as 1), the value of the max _ latency _ frames syntax element is signaled to indicate the constraint on frame reordering delay. For example, in this case, the value of the signaled max _ latency _ frames syntax element may be any non-negative integer value.

In a second implementation, the syntax element max _ latency _ frames _ plus1 is signaled to indicate the constraint on frame reordering delay. If max _ latency _ frames _ plus1 has a first value (e.g., 0), no constraint is imposed on the frame reordering delay. For other values (e.g., non-zero values), the value of the constraint on frame reordering delay is set to max _ latency _ frames _ plus 1-1. For example, max _ latency _ frames _ plus1 has a value of 0 to 2³²-2, inclusive.

Similarly, in a third implementation, the syntax element max _ latency _ frames is signaled to indicate a constraint on frame reordering delay, and if max _ latency _ frames has a first value (e.g., a maximum value), no constraint is imposed on frame reordering delay. For other values (e.g., values less than the maximum value), the value of the constraint on frame reordering delay is set to max _ latency _ frames.

In a fourth implementation, the constraint on frame reordering delay is indicated relative to a frame memory maximum size. For example, the delay constraint is signaled as an addition to the num _ reorder _ frames syntax element. Typically, the constraint on frame reordering delay (in frames) is greater than or equal to num _ reorder _ frames. To save bits in the signaling of the delay constraint, the coding (e.g., by using unsigned exponential golomb coding, some other form of entropy coding) and then signaling the difference between the delay constraint and the num _ reorder _ frames. The syntax element max _ latency _ increment _ plus1 is signaled to indicate the constraint on frame reordering delay. If max _ latency _ increment _ plus1 has a first value (e.g., 0), no constraint is placed on frame reordering delay. For other values (e.g., non-zero values), the value of the constraint on frame reordering delay is set to num _ reorder _ frames + max _ latency _ increment _ plus 1-1. For example, max _ latency _ increment _ plus1 has a value of 0 to 2³²-2, inclusive.

Alternatively, one or more syntax elements indicating the constraint on frame reordering delay are signaled in some other way.

D. Other ways of indicating constraints on delay

In many of the foregoing examples, the constraint on latency is a constraint on frame reordering latency expressed in units of frame counts. More generally, the constraint on delay is a constraint on delay expressed in units of frame counts or in units of seconds, milliseconds, or another time measure. For example, the constraint on delay may be expressed as an absolute time measure, such as 1 second or 0.5 seconds. The encoder may convert such a time metric into a frame count (taking into account the frame rate of the video) and then encode the video such that inter-frame correlations between frames of the video sequence are consistent with the frame count. Alternatively, regardless of frame reordering and inter-frame correlation, the encoder may use the time metric to limit the degree to which latency can be used to smooth short-term fluctuations in the bit rate, coding complexity, network bandwidth, etc. of the encoded video. The decoder may use the time metric to determine when a frame may be output from the decoded picture buffer.

Constraints on delay may be negotiated between the transmitter and receiver ends in order to make a trade-off between responsiveness (lack of delay) and the ability to smooth short-term fluctuations in the bit rate of the encoded video, the ability to smooth short-term fluctuations in the coding complexity, the ability to smooth short-term fluctuations in the network bandwidth, and/or other factors that benefit from increased delay. In such negotiations, it may be helpful to establish and characterize constraints on delay in a frame rate independent manner. The frame rate of the video may then be taken into account and the constraint applied during encoding and decoding. Alternatively, the constraint may be applied during encoding and decoding regardless of the frame rate of the video.

E. Generalized techniques for setting and outputting syntax elements

Fig. 6 illustrates an example technique (600) for setting and outputting syntax elements that facilitate reduced-delay decoding. The technique (600) is performed, for example, by a real-time communication tool or encoding tool as described with reference to fig. 2a and 2 b. Alternatively, another tool performs the technique (600).

First, the tool sets (610) one or more syntax elements that indicate a constraint on delay (e.g., frame reordering delay, delay in units of a time metric) consistent with inter-frame correlation between multiple frames of a video sequence. When the tool comprises a video encoder, the same tool may also receive frames, encode the frames to produce encoded data (using inter-frame correlation consistent with constraints on frame reordering delay) and output the encoded data for storage or transmission.

Typically, the constraint on frame reordering delay is the reordering delay allowed for any frame in the video sequence. The constraint may be expressed in various ways, but different ways have different other meanings. For example, the constraint may be represented by a maximum count of frames that may precede a given frame in output order but follow the given frame in coding order. Alternatively, the constraint may be expressed as the maximum difference between the coding order and the output order of any frame in the video sequence. Or, focusing on individual frames, the constraint may be expressed as a reordering delay associated with a given particular frame in the video sequence. Alternatively, focusing on a group of frames, the constraint may be expressed as a reordering delay associated with the group of frames of the video sequence. Or may represent the constraint in some other way.

Next, the tool outputs (620) the syntax element(s). This facilitates a determination of when to reconstruct the frames in the output order of the plurality of frames ready for output. The syntax element(s) may be output as part of a sequence parameter set or a picture parameter set in a base coded video bitstream, part of a syntax of a media storage file or media transport stream that also contains the encoded data for the frame, part of a media characteristics negotiation protocol (e.g., during system level negotiation or exchange of session parameter values), part of media system information multiplexed with the encoded data for the frame, or part of media metadata related to the encoded data for the frame (e.g., in an SEI message or a VUI message). Different syntax elements may be output to indicate memory capacity requirements. For example, a buffer size syntax element (e.g., max _ dec _ frame _ buffering) may indicate the maximum size of the DPB, while a frame memory syntax element (e.g., num _ reorder _ frames) may indicate the maximum size of the frame memory for reordering.

The constraint value on delay may be expressed in various ways, as described in section V.C. For example, the tool outputs a flag indicating the presence and absence of the syntax element(s). If the flag indicates that the syntax element(s) is not present, then the constraint on latency is undefined or has a default value. Otherwise, the syntax element(s) follow and indicate the constraint on delay. Alternatively, one value of the syntax element(s) indicates that the constraint on delay is undefined or has a default value, while other possible values of the syntax element(s) indicate an integer count of the constraint on delay. Alternatively, for the case where the constraint on delay is a constraint on frame reordering delay, a given value of the syntax element(s) indicates an integer value of the constraint on frame reordering delay relative to the maximum size of the frame memory used for reordering, which is indicated by a different syntax element such as num _ reorder _ frames. Alternatively, the constraint on delay is expressed in some other way.

In some implementations, the frames of a video sequence are organized according to a temporal hierarchy. In this case, different syntax elements may indicate different constraints on frame reordering delay for different time layers of the temporal hierarchy.

F. Generalized techniques for receiving and using syntax elements

Fig. 7 illustrates an example technique (700) for receiving and using syntax elements that facilitate reduced-delay decoding. The real-time communication tool or playback tool described with reference to fig. 2a and 2b, for example, performs the technique (700). Alternatively, another tool performs the technique (700).

First, the tool receives and parses (710) one or more syntax elements that indicate constraints on delay (e.g., frame reordering delay, delay in units of a time metric). For example, the parsing includes reading one or more syntax elements from the bitstream that indicate constraints on delay. The tool also receives (720) encoded data for a plurality of frames of a video sequence. The tools may parse the syntax element(s) and, based on the syntax element(s), determine a constraint on latency. Typically, the constraint on frame reordering delay is the allowed reordering delay for any frame in the video sequence. The constraint can be expressed in various ways, but different ways have different other meanings, as described in the previous section. The syntax element(s) may be signaled as part of a sequence parameter set or a picture parameter set in the base encoded video bitstream, as part of a syntax for a media storage file or media transport stream, as part of a media property negotiation protocol, as part of media system information multiplexed with the encoded data, or as part of media metadata related to the encoded data. The facility may receive and parse different syntax elements indicating memory capacity requirements, e.g., a buffer size syntax element such as max _ dec _ frame _ buffering and a frame memory syntax element such as num _ reorder _ frames.

The constraint value on delay may be expressed in various ways, as described in section V.C. For example, the tools receive a flag indicating the presence or absence of the syntax element(s). If the flag indicates that the syntax element(s) is not present, then the constraint on latency is undefined or has a default value. Otherwise, the syntax element(s) follow and indicate the constraint on delay. Alternatively, one value of the syntax element(s) indicates that the constraint on delay is undefined or has a default value, while other possible values of the syntax element(s) indicate an integer count of the constraint on delay. Alternatively, for the case where the constraint on latency is a constraint on frame reordering latency, a given value of the syntax element(s) indicates an integer count of the constraint on frame reordering latency relative to the maximum size of the frame memory used for reordering, which is indicated by a different syntax element such as num _ reorder _ frames. Alternatively, the constraint on delay is signaled in some other way.

Returning to fig. 7, the tool decodes (730) at least some of the encoded data to reconstruct a frame. The tool outputs (740) the reconstructed frame. In doing so, the tool may use constraints on delay to determine when a frame reconstructed from, for example, the output order of the frames of the video sequence is ready for output.

In some implementations, the frames in the video sequence are organized according to a temporal hierarchy. In this case, different syntax elements may indicate different constraints on frame reordering delay for different temporal layers of the temporal hierarchy. The tool may select one of different constraints on frame reordering delay based on the temporal resolution of the output.

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the appended claims. I therefore claim as my invention all inventions that come within the scope and spirit of these claims.

Claims

1. In a computing system implementing a video decoder, a method for reducing delay in video decoding comprising:

receiving and parsing a syntax element indicating a maximum size of a frame memory for reordering, wherein the maximum size of the frame memory for reordering is represented by a maximum count of frames that may precede any one frame of the video sequence in coding order but follow that frame in output order;

receiving and parsing one or more different syntax elements indicative of a constraint on frame reordering delay, wherein the constraint on frame reordering delay is represented by a maximum count of frames that may precede any one frame of a video sequence in output order but follow that frame in coding order;

receiving encoded data for a plurality of frames of a video sequence;

decoding, with the video decoder, at least some of the encoded data to reconstruct one of the plurality of frames; and

and outputting the reconstructed frame.

2. The method of claim 1, further comprising:

determining a constraint on frame reordering delay based on the one or more different syntax elements; and

the constraint on frame reordering delay is used to determine when the reconstructed frame is ready for output according to an output order of the plurality of frames of the video sequence.

3. The method of claim 2, wherein the plurality of frames of the video sequence are organized according to a temporal hierarchy, wherein different syntax elements of one or more different syntax elements indicate different constraints on frame reordering delay for different temporal layers of the temporal hierarchy, the method further comprising selecting one of the different constraints on frame reordering delay according to an output temporal resolution.

4. The method of claim 1, wherein the constraint on frame reordering delay defines a maximum difference between a coding order and an output order of any frame in the video sequence.

5. The method of claim 1, wherein the one or more different syntax elements and the encoded data are signaled as part of a syntax for encoding a video bitstream, the method further comprising:

receiving and parsing a buffer size syntax element indicating a maximum size of a decoded picture buffer, wherein the buffer size syntax element is different from the one or more different syntax elements indicating constraints on frame reordering delay.

6. The method of claim 1, wherein the one or more different syntax elements are signaled as part of a sequence parameter set, a picture parameter set, syntax for a media storage file that also contains the encoded data, syntax for a media transport stream that also contains the encoded data, a media characteristic negotiation protocol, media system information multiplexed with the encoded data, or media metadata related to the encoded data.

7. The method of claim 1, wherein:

one possible value of the one or more different syntax elements indicates that the constraint on frame reordering delay is undefined or has a default value, and wherein other possible values of the one or more different syntax elements indicate an integer count of the constraint on frame reordering delay; or

Wherein a value of the one or more different syntax elements indicates an integer count of constraints on frame reordering delay relative to a maximum size of a frame memory used for reordering.

8. In a computing system, a method for reducing delay in video decoding includes:

setting a syntax element indicating a maximum size of a frame memory for reordering, wherein the maximum size of the frame memory for reordering is represented by a maximum count of frames that may precede any one frame of the video sequence in coding order but follow that frame in output order;

setting one or more different syntax elements indicating a constraint on frame reordering delay that is consistent with inter-frame correlation between frames of a video sequence, wherein the constraint on frame reordering delay is represented by a maximum count of frames that may precede any one frame of the video sequence in output order but follow that frame in coding order; and

the one or more different syntax elements are output to facilitate a determination of when to prepare for output for reconstructing a frame from an output order of the plurality of frames.

9. The method of claim 8, wherein the computing system implements a video encoder, the method further comprising:

receiving the plurality of frames of the video sequence;

encoding, with the video encoder, the plurality of frames to produce encoded data, wherein the encoding uses inter-frame correlation consistent with a constraint on frame reordering delay; and

the encoded data is output for storage or transmission.

10. A computing system comprising a processor and a memory implementing a video decoder adapted to perform a method comprising:

receiving and parsing one or more different syntax elements indicative of constraints on frame reordering delay;

determining a constraint on frame reordering delay based on the one or more different syntax elements, wherein the constraint on frame reordering delay is represented by a maximum count of frames that can precede any one frame of the video sequence in output order but follow that frame in coding order;

receiving encoded data for a plurality of frames of a video sequence;

outputting a reconstructed frame, comprising using a constraint on frame reordering delay to determine when the reconstructed frame is ready for output according to an output order of the plurality of frames of the video sequence.