HK1191485B

HK1191485B - Low latency rate control system and method

Info

Publication number: HK1191485B
Application number: HK14104540.1A
Authority: HK
Inventors: A．杜纳斯; F．R．伊兹奎尔多; G．加西亚
Original assignee: Marvell Asia Pte. Ltd. (马维尔亚洲私人有限公司)
Priority date: 2011-05-04
Filing date: 2012-05-04
Publication date: 2017-11-24

Description

Low delay rate control system and method

Technical Field

The present invention relates to transmitting video signals over a network. More particularly, the present invention relates to transmitting and receiving complex video signals over a network with low latency constraints.

Background

In known video transmission/reception systems, data content from a digital media server is encoded, transmitted, and then decoded for playback on a digital media renderer. However, video content can be highly complex, and its complexity changes over time. As the complexity varies, the bit rate must also vary to transmit the content at high quality: more complex video data requires a higher bit rate to encode at a given level of compression quality. The capacity of the network, however, may remain fixed over time; it does not change with the complexity of the video data or with the resulting increase in bit rate.

To accommodate this bit rate fluctuation at the video encoder, rate control is used to produce a bit rate that stays constant over the network at all times. A constant bit rate allows data to be transmitted effectively even when the complexity of the video changes dramatically, such as at a scene change or when the video captures many moving objects or fine details. Rate control also strives to keep video playback quality as stable as possible, so it must strike a compromise between quality stability and the constant bit rate requirement.

Because of the limitations of rate control and the need to stay within network capacity, known systems buffer data prior to transmission. The encoded data is buffered so that the instantaneous bit rate of the video encoder can rise above and fall below the network capacity, while the rate at which buffered data is sent to the network is typically kept equal to or lower than the network capacity. Since the encoder bit rate may rise significantly above the network capacity, known systems implement larger buffers to absorb these potentially large increases. However, larger buffers introduce latency into the transmission of data.

For example, known systems may buffer multiple frames after encoding before transmitting data across the network. Large buffer capacity results in high latency within the system, and in some applications this delay is unacceptable. Real-time video playback and interactive applications cannot tolerate high latency and therefore cannot simply increase the buffer capacity to handle complex video transmissions. Some applications may not allow any significant latency in the network at all. The buffer capacity may be minimized to reduce latency, but then the system may not be able to handle the larger bit rate fluctuations caused by complex data.
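
The latency cost of buffering follows from simple arithmetic: a full buffer drains at the network rate, so the worst-case queuing delay is roughly the buffered bits divided by that rate. The sketch below assumes a constant-rate link and a hypothetical 20 Mbit/s, 60 frames-per-second stream; the function name and figures are illustrative only and are not taken from any particular system.

```python
def buffering_delay_s(buffer_bits: float, network_rate_bps: float) -> float:
    """Worst-case time for a full buffer to drain over a constant-rate link."""
    if network_rate_bps <= 0:
        raise ValueError("network rate must be positive")
    return buffer_bits / network_rate_bps


# Hypothetical stream: 20 Mbit/s link, 60 frames per second.
RATE_BPS = 20e6
FPS = 60
frame_bits = RATE_BPS / FPS  # average bits per frame

for frames_buffered in (3, 1, 0.5):
    delay_ms = 1000 * buffering_delay_s(frames_buffered * frame_bits, RATE_BPS)
    print(f"{frames_buffered} frame(s) buffered -> {delay_ms:.1f} ms")
```

In this example, buffering three frames costs about 50 ms, while buffering half a frame costs about 8.3 ms.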

Disclosure of Invention

Embodiments of the present invention control the video encoder output bitstream rate for scenarios requiring a transmission delay lower than the duration of a single frame in the network. Embodiments of the present invention do not implement large buffers to handle the bit rate increase caused by encoding complex video data; instead, they perform a process that allows encoding with low latency. According to the disclosed embodiments, one frame or less may be buffered. The proposed rate control can also operate at higher latencies, covering systems that do not require low latency.

The disclosed embodiments allocate the necessary bits within the frame to achieve a particular maximum bit rate. By remaining below network capacity, the video transmission/reception system achieves the low latency expected for real-time video playback.

The disclosed embodiments control bit allocation at the subframe level. The frame is divided into smaller blocks, called rate control blocks, and the rate control block is used as the basic unit for bit allocation. This bit allocation achieves the target bit rate desired by the system while meeting the delay constraints. The disclosed embodiments may use the slice splitting capability of a video encoder, which allows the decoder to decode each rate control block independently. As long as the encoded data size for each rate control block stays below the bits allocated to it by rate control, the end-to-end delay for each rate control block will not exceed a specified maximum delay.

To achieve low network latency, the disclosed embodiments use estimated and predicted values, which generally produce satisfactory results. However, there is always some probability of a wrong estimate, which may result in instances where the bits allocated to a rate control block are exceeded. In other words, the bit rate may be higher than the target bit rate and exceed the buffer capacity, putting the latency requirements at risk. To avoid this buffer overflow problem, the disclosed embodiments may implement a buffer protection mechanism based on the "non-encoded" portion of the rate control block. Other protection mechanisms may also be used.

According to a preferred embodiment, a method for encoding an image frame in a video transmission system is disclosed. The method includes selecting a rate control block for an image frame. The rate control block includes a plurality of macroblocks. The method also includes encoding a plurality of macroblocks of the rate control block according to the bit rate.

In further accordance with the preferred embodiments, a video transmission encoding system is disclosed. The video transmission encoding system includes a slice divider that ensures that each rate control block from the image frame has an integer number of slices, which allows the rate control blocks to be decoded independently. The rate control block includes a plurality of macroblocks. The video transmission encoding system also includes an encoder that encodes the plurality of macroblocks of the rate control block, and a buffer that stores the encoded data for each rate control block. The bit rate of the video transmission system and the size of the buffer are set according to the parameters set for the rate control block.

In further accordance with a preferred embodiment, a method for encoding an image frame within a video transmission system is disclosed. The method includes collecting statistics used as part of both the frame-level and the rate-control-block-level initial settings. The method also includes setting the size of a buffer to receive the rate-controlled encoded data. The method also includes encoding a plurality of macroblocks within the rate control block according to a bit rate corresponding to the buffer capacity.

Drawings

The accompanying drawings are included to provide a further understanding of the invention, and are incorporated into and constitute a part of this specification. The figures set forth below illustrate embodiments of the invention and together with the description serve to explain the principles of the invention as disclosed in the claims and their equivalents.

Fig. 1 illustrates a system for transmitting and receiving video signal data in accordance with a disclosed embodiment.

Fig. 2A illustrates a diagram showing in-system encoding, transmission, and decoding times in accordance with the disclosed embodiments.

Fig. 2B illustrates a graph showing bit rate fluctuation over time within a system in accordance with the disclosed embodiments.

Fig. 3 illustrates a video frame having a rate control block in accordance with a disclosed embodiment.

Fig. 4 illustrates a flow chart for controlling bit rate within a video transport encoder using rate control blocks in accordance with a disclosed embodiment.

Fig. 5 illustrates a flow chart for performing macroblock level rate control in accordance with a disclosed embodiment.

Detailed Description

Aspects of the invention are disclosed in the following description. Alternative embodiments of the invention and their equivalents may be devised without departing from the spirit and scope of the invention. Like elements disclosed below are indicated by like reference numerals in the figures.

Fig. 1 depicts a system 100 for transmitting and receiving video signal data in accordance with the disclosed embodiments. System 100 may be any system or collection of devices connected over a network to share information. Image frames within a video signal are received to be transmitted over a network. Data within an image frame is encoded for transmission using various encoding techniques.

For example, the system 100 may be a gaming system, where video content is generated in a gaming console and then sent to a high-definition digital media renderer, such as a flat screen television. Alternatively, system 100 may be a security monitoring system that uses High Definition (HD) video. This practice is becoming the norm in the surveillance industry because more and more security camera manufacturers now claim to offer HD cameras, which allow surveillance with real-time, high-quality video.

The digital media server 102 generates video content for transmission in a band. The digital media server 102 may be any device, console, camera, or the like that captures video data. For example, the digital media server 102 may be a game console that plays video games stored on a disk or other medium, with the content generated from game play displayed to the user for viewing and real-time interaction. Alternatively, the digital media server 102 may be a computer, video recorder, digital camera, scanner, or similar device that captures data.

The uncompressed data signal 104 is output from the digital media server 102 to an encoder 106. The encoder 106 may encode or compress the signal 104 for transmission within the system 100. The encoder 106 may use lossy compression techniques to encode the signal 104. The strength of these techniques may vary based on the complexity of the data within the signal 104.

For example, video of a game character sword-fighting opponents is more complex, or busier, than video of a character standing alone, and may require a different encoding process to maintain similar quality. The encoder 106 includes a slice divider 134, which is described in more detail below.

The encoder 106 outputs a compressed signal 108 to a buffer 110. Buffer 110 stores data from signal 108 until it can be transmitted through system 100. If the network bit rate does not allow signal 108 to be transmitted immediately, buffer 110 retains the data until it can be transmitted by transceiver 114.

Buffer 110 has a configurable buffer capacity. The amount of buffer capacity that rate control uses to achieve its goals (delay and bit rate) correlates directly with the maximum allowed transmission delay. Buffer 110 outputs signal 112 to transceiver 114.

Transceiver 114 transmits a signal 116 over a network 118. In the game example above, the network 118 may be a wireless network at a location where a router receives the signal 116 from the digital media server 102 and forwards it to the digital media renderer 132 for display. Alternatively, the network 118 may be a computer network that carries the signal 116 from a remote camera showing real-time video.

Receiver 120 receives signal 116 and outputs signal 122 to buffer 124. Buffer 124 may have a buffer capacity similar to that of buffer 110. Signal 126 flows from buffer 124 to decoder 128. Decoder 128 decodes or decompresses signal 126 to generate uncompressed signal 130. The uncompressed signal 130 is preferably a high-quality copy of the uncompressed data signal 104, although it may differ slightly because of the encoding process.

Digital media renderer 132 receives uncompressed signal 130 and displays the video content to a user. The digital media renderer 132 may be a high-definition television with a display resolution of 1,280x720 pixels (720p) or 1,920x1,080 pixels (1080i/1080p). Thus, the amount of data encoded and decoded within the system 100 may be large and complex because of the requirements imposed by the digital media server 102 and the digital media renderer 132.

The system 100 is subject to various limitations and parameters. The system 100 may transmit at a constant bit rate over the network 118; this bit rate generally remains the same over time, but may vary in some cases. When the buffer 110 fills up, data must wait before it can be sent, which causes delay within the system 100 when data is transmitted over the network 118.

Fig. 2A depicts a diagram 200 illustrating in-system encoding, transmission, and decoding times in accordance with the disclosed embodiments. The diagram 200 includes a timeline 202 that shows an encoding time 204, a transmission time 206, and a decoding time 208. The encoding time 204 may represent the time for the smallest decodable unit encoded or compressed by the encoder 106 plus the time for some number of video lines required before starting the encoding process.

For the disclosed embodiments, the smallest decodable unit can be a slice. Once this first slice is encoded, the encoder 106 can begin sending it. The conventional scheme waits for one frame before starting encoding and starts transmitting a bitstream only after the entire frame is encoded, which can amount to an encoding delay of two frames. The system 100 minimizes encoding time by minimizing the wait before encoding starts and by making the smallest decodable unit smaller.

The encoding time 204 is preferably less than or equal to the time period of the smallest decodable unit. For example, if the smallest decodable unit is a frame and frame 210 lasts 1/60 second, the encoding time 204 within system 100 is less than 1/60 second.

The transmission time 206 represents the time taken to transmit data over the network 118. The transmission time 206 is also less than or equal to the time required to transmit a full buffer's worth of bits over the network at the configured rate (equal to or less than the bit rate configured for rate control). Thus, data cannot remain buffered for any appreciable length of time; otherwise, system 100 may not meet these transmission requirements. Decoding time 208 represents the time to decode or decompress the smallest decodable unit of video data to reconstruct the video signal. The decoding time 208 is also less than or equal to the time period of the smallest decodable unit. If the smallest decodable unit is a frame, this time will be less than one frame.

Thus, latency within system 100 is reduced because the encoding time 204, transmission time 206, and decoding time 208 are each kept below the time period of frame 210. Known systems may take the opposite approach, in which these times exceed the duration of a frame. These delays accumulate at each step in the process, resulting in high latency, and larger buffers lengthen these times further. Fig. 2A illustrates how latency is reduced to a level well within the limits required for real-time video rendering in the system 100.
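
The latency comparison of Fig. 2A can be made concrete by summing per-stage times. The sketch below is a simplified model under stated assumptions: the frame-based figures use the two-frame encoding delay of the conventional scheme plus one frame each for transmission and decoding, and the slice-based figures assume, hypothetically, that each stage handles one slice covering 1/68 of a 60 frames-per-second frame (roughly one macroblock row of a 1080-line picture). The `StageTimes` class is illustrative and not part of the disclosure.

```python
from dataclasses import dataclass

FRAME_S = 1 / 60          # duration of one frame at 60 frames per second
SLICE_S = FRAME_S / 68    # hypothetical slice: one macroblock row of a 1080-line frame


@dataclass
class StageTimes:
    """Per-stage latency for one smallest decodable unit."""
    encode_s: float
    transmit_s: float
    decode_s: float

    def total_s(self) -> float:
        return self.encode_s + self.transmit_s + self.decode_s

    def each_within(self, limit_s: float) -> bool:
        """True if every stage individually fits within limit_s."""
        return all(t <= limit_s for t in (self.encode_s, self.transmit_s, self.decode_s))


# Frame-based pipeline: wait one frame, encode one frame, then transmit and decode.
frame_based = StageTimes(encode_s=2 * FRAME_S, transmit_s=FRAME_S, decode_s=FRAME_S)
# Slice-based pipeline: each stage works on one slice at a time.
slice_based = StageTimes(encode_s=SLICE_S, transmit_s=SLICE_S, decode_s=SLICE_S)

for name, stages in (("frame-based", frame_based), ("slice-based", slice_based)):
    print(f"{name}: {1000 * stages.total_s():.2f} ms, "
          f"each stage under one frame: {stages.each_within(FRAME_S)}")
```

Under these assumptions the slice-based pipeline totals under 1 ms versus roughly 67 ms for the frame-based one; real figures depend on the encoder and the link.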

Fig. 2B depicts a graph 220 showing bit rate fluctuation over time within a system in accordance with the disclosed embodiments. The graph 220 shows a timeline 222 and a video data bit rate line 224, which varies along the timeline 222 with the complexity of the video being encoded. For example, the video data bit rate line 224 rises as complexity increases.

Fig. 2B also includes a buffer 226 that stores video data. Buffer 226 may correspond to buffers 110 and 124 of Fig. 1. As shown, all data for the video data bit rate line 224 fits within the buffer 226: however much the line 224 varies, it does not exceed the limits of the buffer 226. The buffer 226 has a finite buffer capacity, which may depend on the expected transmission delay; according to the disclosed embodiments, this delay is less than one frame. Keeping the capacity this small keeps delay to a minimum.

Parameters may be set within the system 100 to establish the relationships shown in Figs. 2A and 2B. A particular maximum bit rate may be set as the constant bit rate of the network 118 for transmitting and receiving video content; this in turn determines the number of bits available to encode each frame of video content. The per-frame bit budget and the expected delay then set the buffer capacity of buffer 226. For example, if the bit rate corresponds to 1,000 bits per frame and the expected transmission delay is half a frame, the buffer capacity will be 500 bits. The amount of data stored in buffer 226 (or buffers 110 and 124) may not exceed this value. Thus, any delay or latency within the system 100 is approximately equal to or less than the time of one frame (1/60 second if the transmitted video runs at 60 frames per second), as shown in Fig. 2A.
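
The buffer-sizing rule above can be written as a small helper: the per-frame bit budget is the bit rate divided by the frame rate, and the buffer holds that budget scaled by the allowed delay in frames. The function names and the 60,000 bit/s, 60 frames-per-second figures are hypothetical, chosen so that each frame gets 1,000 bits as in the worked example.

```python
def bits_per_frame(bit_rate_bps: float, fps: float) -> float:
    """Average bit budget for one frame at a constant bit rate."""
    return bit_rate_bps / fps


def buffer_capacity_bits(bit_rate_bps: float, fps: float, delay_frames: float) -> float:
    """Buffer size such that a full buffer drains within delay_frames frame periods."""
    return bits_per_frame(bit_rate_bps, fps) * delay_frames


def drain_time_s(capacity_bits: float, bit_rate_bps: float) -> float:
    """Time needed to empty a full buffer at the configured bit rate."""
    return capacity_bits / bit_rate_bps


# Worked example: 1,000 bits per frame and a half-frame delay.
capacity = buffer_capacity_bits(bit_rate_bps=60_000, fps=60, delay_frames=0.5)
print(capacity)                          # 500.0 bits
print(drain_time_s(capacity, 60_000))    # 1/120 s, half of a 1/60 s frame
```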

Fig. 3 depicts a video frame 300 having a rate control block 310 in accordance with a disclosed embodiment. Preferably, each video frame 300 is allocated the same number of bits. The number of bits needed for video frame 300 may increase if the picture is very busy and requires complex encoding to capture all the motion.

Video frame 300 may be composed of pixels grouped to form macroblocks 302. Each macroblock 302 includes two or more pixels; preferably, macroblock 302 is 16 x 16 pixels. The macroblocks 302 are encoded by the encoder 106 using a compression scheme or other algorithm and sent to the decoder 128. The transmitted information may include the address of the macroblock within video frame 300, luminance information, chrominance (color) information, compression level values, and motion vector information.
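
To make the macroblock partition concrete, the sketch below counts 16 x 16 macroblocks for the two renderer resolutions mentioned for Fig. 1. It assumes that partial macroblocks at the picture edge are padded to a full macroblock, which is how, for example, H.264 codes a 1080-line picture as 68 macroblock rows (1,088 lines). The helper name is hypothetical.

```python
import math

MB_SIZE = 16  # macroblock edge in pixels (the preferred 16 x 16 size)


def macroblock_grid(width: int, height: int, mb_size: int = MB_SIZE) -> tuple[int, int]:
    """Return (columns, rows) of macroblocks, padding partial blocks at the edges."""
    return math.ceil(width / mb_size), math.ceil(height / mb_size)


for w, h in ((1280, 720), (1920, 1080)):
    cols, rows = macroblock_grid(w, h)
    print(f"{w}x{h}: {cols} x {rows} = {cols * rows} macroblocks")
```

A 720p frame holds 3,600 macroblocks and a 1080-line frame holds 8,160.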

Thus, the video frame 300 may be partitioned into a plurality of macroblocks 302. In conventional video transmission and reception systems, all macroblocks 302 within a video frame 300 are encoded, buffered, and then transmitted over the network 118. The disclosed embodiments partition the video frame 300 into rate control blocks 310 and use these rate control blocks as the basis for encoding, transmitting, and decoding video data.

The slice divider 134 of the encoder 106 ensures that each rate control block has an integer number of slices. The rate control block 310 may also be referred to as a subframe. The rate control block 310 may be used as the basic unit for bit allocation within the system 100 and comprises a plurality of macroblocks 302. Preferably, each rate control block 310 contains between 5 and 15 macroblocks 302. Rate control block 310 may comprise one or more slices of frame 300.

Alternatively, the number of macroblocks may depend on the maximum transmission delay that is expected or required. The larger the rate control block 310, the higher the minimum achievable delay. Conversely, the lower the required delay, the smaller the rate control block must be.
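
The trade-off between rate control block size and achievable delay can be approximated with a simplified model: if the encoder must finish a block before forwarding it, the block cannot leave sooner than the time the source takes to deliver its macroblocks, so the minimum delay grows with the block's share of the frame. The sketch below assumes that model, a 60 frames-per-second source, and 8,160 macroblocks per frame (1080 lines with 16 x 16 macroblocks); the function names are hypothetical.

```python
import math


def rcb_duration_s(mbs_per_rcb: int, mbs_per_frame: int, fps: float) -> float:
    """Time the source takes to deliver one RCB's macroblocks (a lower bound on its delay)."""
    return (mbs_per_rcb / mbs_per_frame) / fps


def max_mbs_per_rcb(delay_s: float, mbs_per_frame: int, fps: float) -> int:
    """Largest RCB, in macroblocks, whose duration fits within delay_s (at least 1)."""
    return max(1, math.floor(delay_s * fps * mbs_per_frame))


MBS_PER_FRAME = 8160  # 120 x 68 macroblocks for a 1920x1080 frame
FPS = 60

for n in (5, 15, 120):
    print(f"{n:4d} MBs -> {1e6 * rcb_duration_s(n, MBS_PER_FRAME, FPS):.1f} us")

# The converse: a tighter delay target forces a smaller rate control block.
print(max_mbs_per_rcb(1e-3, MBS_PER_FRAME, FPS), max_mbs_per_rcb(1e-4, MBS_PER_FRAME, FPS))
```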

Based on the number of macroblocks 302, rate control block 310 may have a target bit budget that corresponds to the duration of rate control block 310. In other words, the target number of bits for rate control block 310 is smaller than that for the whole video frame 300, since there is less information to encode. This keeps the bit rate fluctuation well below the buffer capacity for a single video frame 300.
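
A per-block budget consistent with this description can be sketched by giving each rate control block a share of the frame's bits proportional to its macroblock count. The proportional split, the top-to-bottom partition into fixed-size groups, and the helper names are assumptions for illustration; in the disclosure, slice divider 134 also aligns blocks to whole slices, which this sketch does not model.

```python
import math


def partition_rcbs(mbs_per_frame: int, mbs_per_rcb: int) -> list[int]:
    """Macroblock count of each RCB, scanning the frame top to bottom."""
    full, rest = divmod(mbs_per_frame, mbs_per_rcb)
    return [mbs_per_rcb] * full + ([rest] if rest else [])


def rcb_bit_budgets(bit_rate_bps: float, fps: float,
                    mbs_per_frame: int, mbs_per_rcb: int) -> list[float]:
    """Split the frame's bit budget across RCBs in proportion to their size."""
    frame_bits = bit_rate_bps / fps
    return [frame_bits * n / mbs_per_frame
            for n in partition_rcbs(mbs_per_frame, mbs_per_rcb)]


# Hypothetical 20 Mbit/s, 60 frames-per-second stream with 8,160 MBs per frame.
budgets = rcb_bit_budgets(20e6, 60, 8160, 10)
print(len(budgets), round(budgets[0], 1), round(math.fsum(budgets)))
```

Each 10-macroblock block receives about 408.5 bits in this example, and the budgets add back up to the whole frame's allocation.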

Buffer 110 has a buffer capacity at least equal to the bit budget of one rate control block 310. Encoder 106 may forward each rate control block 310 as soon as the last macroblock 302 within it is encoded, and decoder 128 may begin decoding each rate control block 310 independently. Thus, information is transmitted sooner and with reduced latency within the system 100. The bit rate may fluctuate between rate control blocks 310 to accommodate "busy" portions of video frames 300, but it never exceeds the bit rate configured for rate control.

Fig. 4 depicts a flow chart 400 for controlling the bit rate within the video transport encoder 106 using rate control blocks 310 in accordance with the disclosed embodiments. Step 402 determines the frame-level initial settings for video frame 300. These settings may include the number of bits per frame, the number of macroblocks 302 per rate control block 310, the target bit rate, and so forth. Step 404 generates rate control blocks 310 within the video frame 300. As noted above, the video frame 300 may include several rate control blocks 310, each including macroblocks 302.

Step 406 selects a rate control block 310 for encoding and transmission. For example, referring to video frame 300, the top rate control block 310 may be selected first, and so on until the bottom rate control block 310 is encoded. Step 408 determines the rate control block initial settings: a target bit rate is selected, and a buffer corresponding to the duration of the rate control block 310 is set.

The encoder 106 may allocate a number of bits based on the selected size of the rate control block 310. Based on this setting, step 410 performs macroblock-level rate control using the target bit rate for the size of rate control block 310. Step 410 is described in more detail with reference to Fig. 5.

Step 412 encodes one macroblock 302 within rate control block 310. Step 414 determines whether the encoded macroblock 302 is the last macroblock within rate control block 310. If not, the flow diagram 400 returns to step 410 to encode the remaining macroblocks 302. If so, step 416 performs virtual buffer management, and the encoded data in buffer 110 is forwarded to the transceiver 114 for transmission over the network 118. Because buffer 110 corresponds to the size of rate control block 310, data from an upper rate control block 310 is not held back while a subsequent rate control block 310 is encoded.
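
Step 416's virtual buffer management can be sketched as a leaky-bucket model, a common way to track encoder buffer fullness; using it here is an assumption, since the description does not define the bookkeeping. Each rate control block's encoded bits enter buffer 110, the channel drains a fixed number of bits per block period, and the model reports whether the capacity was ever exceeded. The function name and parameters are hypothetical.

```python
def simulate_virtual_buffer(encoded_bits: list[float], drain_bits_per_rcb: float,
                            capacity_bits: float) -> tuple[list[float], bool]:
    """Track buffer fullness after each RCB; report whether capacity was exceeded."""
    fullness = 0.0
    history: list[float] = []
    overflow = False
    for bits in encoded_bits:
        fullness += bits                      # the RCB's encoded data enters the buffer
        if fullness > capacity_bits:
            overflow = True                   # the latency bound would be violated here
        fullness = max(0.0, fullness - drain_bits_per_rcb)  # channel drains one RCB period
        history.append(fullness)
    return history, overflow


# Hypothetical numbers: 400 bits drained per RCB period, 600-bit buffer.
print(simulate_virtual_buffer([400, 420, 390], 400, 600))  # stays within capacity
print(simulate_virtual_buffer([400, 900], 400, 600))       # second RCB overflows
```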

Step 418 determines whether the last macroblock 302 within video frame 300 has been encoded. If not, the flow diagram 400 returns to step 406 to select the next rate control block 310. If so, the entire video frame 300 has been encoded and a new video frame should be received, so the flow diagram 400 returns to step 402.
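
The control flow of flowchart 400 can be summarized in a short loop. In this sketch, `encode_mb` and `send` are hypothetical callbacks standing in for the encoder core (step 412, with step 410's rate control decision folded in) and for forwarding data to transceiver 114; the proportional per-block budget is an assumption. The loop covers one frame, after which the caller would return to step 402 for the next frame.

```python
from typing import Callable, List


def encode_frame(mbs_per_frame: int, mbs_per_rcb: int, frame_bits: float,
                 encode_mb: Callable[[int, float], int],
                 send: Callable[[int, int], None]) -> List[int]:
    """Encode one frame RCB by RCB; the arguments play the role of step 402's settings."""
    produced: List[int] = []
    # Steps 404/406: walk consecutive RCBs from the top of the frame to the bottom.
    for rcb_index, start in enumerate(range(0, mbs_per_frame, mbs_per_rcb)):
        end = min(start + mbs_per_rcb, mbs_per_frame)
        # Step 408: give the RCB a share of the frame budget (assumed proportional).
        budget = frame_bits * (end - start) / mbs_per_frame
        used = 0
        for mb in range(start, end):
            # Steps 410/412: rate control sees the remaining budget, then one MB is encoded.
            used += encode_mb(mb, budget - used)
        # Steps 414/416: the RCB is complete, so forward it without waiting for the frame.
        send(rcb_index, used)
        produced.append(used)
    # Step 418: every RCB has been encoded; the caller moves on to the next frame.
    return produced


# Toy example: 25 macroblocks, 10 per RCB, every macroblock costing 10 bits.
log = []
print(encode_frame(25, 10, 1000.0, lambda mb, remaining: 10,
                   lambda i, bits: log.append((i, bits))))
print(log)
```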

Fig. 5 depicts a flowchart 500 for performing macroblock-level rate control in accordance with a disclosed embodiment. Flowchart 500 details step 410 of Fig. 4. To achieve low latency, the disclosed embodiments may use estimated and predicted encoded values for macroblock 302. These processes "predict" the value of macroblock 302 during encoding to reduce encoding time. However, such predictions risk producing wrong estimates, particularly in busy video frames with much movement and change compared to the previous frame.

For example, if the video frame 300 depicts a blue sky with some clouds, the encoder 106 may predict that the macroblocks 302 within the rate control block 310 all have the same blue-sky background values. When an airplane starts to fly into a macroblock 302, the encoder 106 may still predict the blue-sky value for it, producing an error: the airplane may be white, so that macroblock 302 has different values from the blue sky, yet encoder 106 uses the predicted value anyway. Such errors may cause high bit rates as the encoder handles the complex changes in values.

The effects of such errors must be avoided because of the low-latency constraints on the system 100. The disclosed embodiment shown in Fig. 5 may perform operations to prevent buffer overflow and the resulting latency. One such operation is to avoid approaching the buffer capacity limit for rate control block 310: space should be left in the buffer 110 to absorb sudden spikes in bit rate caused by complex changes. Even with this headroom, however, buffer overflows may occur and must be handled accordingly.

Step 502 determines whether the current bit rate for the selected rate control block 310 is approaching a buffer overflow. If not, step 504 determines the predicted compression level for the next macroblock. The compression level may be predicted by treating the macroblock 302 as similar to the corresponding macroblock in a previously encoded video frame 300. Step 506 then returns to flowchart 400.

If the answer at step 502 is yes, step 508 determines whether the macroblock 302 belongs to a fully spatially predicted frame (intra frame, or I-frame) rather than a temporally predicted (inter) frame. If not, step 510 skips the encoding process for the remaining macroblocks 302 within rate control block 310 and uses escape macroblocks instead: the encoder 106 informs the decoder 128 that the current frame is similar to the previous frame, so that the previous frame's macroblocks can be used to fill the rate control block 310. Buffer overflow is thus avoided because the bit rate does not exceed the capacity of the buffer 110.

If the answer at step 508 is yes, step 512 performs a special operation on the remaining macroblocks 302 in rate control block 310. In an I-frame, the encoder 106 cannot use escape macroblocks because the intra-coded video data does not reference the previous video frame.

Intra-frame coding means that the various lossless and lossy compression techniques are performed with reference only to the information contained within the current frame, not to any other frame in the video sequence. In other words, no temporal processing is performed outside the current picture or frame.

Thus, reusing the previous scene is not an option for I-frames. Instead, step 512 removes the prediction residual, either partially or completely, so that minimal information is sent. The encoder 106 keeps coding the remaining macroblocks 302 as intra macroblocks, but with most of their prediction residual set to 0 to reduce the number of bits used. As a result, the bit rate is reduced to fit within the bits allocated for the buffer 110. The flow 500 then returns to the flow diagram 400 through step 506.
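
The decision logic of flowchart 500 can be sketched as a small function. The description does not say how step 502 decides that overflow is approaching, so the sketch assumes a hypothetical headroom threshold (90% of the block's bit budget). Likewise, `zero_residual` assumes the residual is a list of transform coefficients in scan order and keeps only the first few, which is one possible way to remove "most of the prediction residual". All names are illustrative.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "predict compression level and encode"           # step 504
    ESCAPE = "skip; reuse co-located MBs of previous frame"    # step 510
    ZERO_RESIDUAL = "intra-code with residual mostly zeroed"   # step 512


def mb_rate_control(used_bits: float, budget_bits: float, is_intra_frame: bool,
                    headroom: float = 0.9) -> Action:
    """Choose how to code the next macroblock of the current RCB."""
    # Step 502 (assumed criterion): overflow is "approaching" once the RCB has
    # consumed the headroom fraction of its bit budget.
    if used_bits < headroom * budget_bits:
        return Action.NORMAL
    # Step 508: escape macroblocks need a reference frame, so I-frames cannot use them.
    return Action.ZERO_RESIDUAL if is_intra_frame else Action.ESCAPE


def zero_residual(coeffs: list[int], keep: int = 1) -> list[int]:
    """Keep the first `keep` coefficients in scan order and zero the rest."""
    return coeffs[:keep] + [0] * max(0, len(coeffs) - keep)


print(mb_rate_control(300, 400, is_intra_frame=False))   # Action.NORMAL
print(mb_rate_control(380, 400, is_intra_frame=False))   # Action.ESCAPE
print(mb_rate_control(380, 400, is_intra_frame=True))    # Action.ZERO_RESIDUAL
print(zero_residual([12, -3, 2, 0, 1]))                  # [12, 0, 0, 0, 0]
```

Checking the budget before every macroblock lets the block degrade gracefully near its limit rather than overflowing buffer 110.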

It will be apparent to those skilled in the art that various modifications and variations can be made in the disclosed embodiments without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of the embodiments disclosed above provided they come within the scope of any claims and their equivalents.

Claims

1. A method for encoding an image frame within a video transmission system, the method comprising:

selecting a rate control block for the image frame, wherein the rate control block comprises a plurality of macroblocks;

determining a rate control block bit rate;

specifying a buffer size to correspond to a size of the rate control block;

encoding the plurality of macroblocks of the rate control block according to the bit rate; and

avoiding buffer overflow by alternatively skipping the encoding of selected intra macroblocks of the rate control block or removing prediction residuals of intra macroblocks in the rate control block.

2. The method of claim 1, wherein the determining step comprises setting the bit rate to correspond to a size of the rate control block.

3. The method of claim 1, further comprising partitioning the image frame into a plurality of rate control blocks.

4. The method of claim 1, further comprising performing macroblock-level rate control.

5. The method of claim 4, further comprising determining whether the buffer is in an overflow state due to a size of the plurality of macroblocks that have been encoded.

6. The method of claim 4, wherein the performing macroblock-level rate control comprises copying macroblocks of a previous frame.