HK1170605A - Adaptive bitrate management for streaming media over packet networks - Google Patents
- Publication number: HK1170605A (application HK12111165.2A)
- Authority: HK (Hong Kong)
- Prior art keywords: bitrate, optimal, audio, video, media
Description
Cross reference to related patent
The present application claims priority from U.S. application No. 12/416,085, "Adaptive Bitrate Management For Streaming Media Over Packet Networks," filed 3/31/2009, which is a continuation-in-part of U.S. application No. 12/170,347, "Adaptive Bitrate Management For Streaming Media Over Packet Networks," filed 9/2008, which claims the benefit of U.S. provisional application No. 60/948,917, "Adaptive Bitrate Management For Streaming Media Over Packet Networks," filed 10/7/2007, all of which are incorporated herein by reference.
Background
Rate control is essential for media streaming over packet networks. The challenge in delivering bandwidth-intensive content such as multimedia over a shared link of limited capacity is to respond quickly to changes in network conditions by adjusting the bit rate and the media encoding scheme so as to optimize the viewing and listening experience of the user. In particular, when a media stream is transmitted over a connection that cannot provide the necessary throughput, a number of undesirable effects can occur. For example, a network buffer may overflow, causing packet loss and disturbed video or audio playback, or a media player buffer may underflow, causing a playback stall.
There are many different mechanisms to enable multimedia transmission over packet networks. The first type of media network transport is a streaming protocol, such as the Real-time Transport Protocol (RTP). Streaming protocols are specifically designed to transport multimedia information with explicit timing information, and packets are generally expected to be sent at the time the media frame(s) in the payload are due.
Another class is pseudo-streaming. The most commonly used transport protocol for pseudo-streaming is the Transmission Control Protocol (TCP), which was originally designed for bulk data transfer. Thus, TCP does not explicitly indicate timing information for the media in the payload. TCP is used simply to transfer media clips (for example, .flv or .mp4 files). The media time information is sent implicitly within the media clip format, and the player simply plays back the clip as parts of it are downloaded. HTTP is often used as a download protocol over TCP.
In the case of streaming protocol transmissions, standards bodies have recommended protocols or extensions to protocols to address the issues of transport flow control and the implementation of bit rate management algorithms. The Internet Engineering Task Force (IETF) specifies in RFC 3550 the real-time transport control protocol (RTCP) as a companion to RTP, and the basic building blocks to achieve bit rate/packet rate control in RTP streaming media. Several extensions of RTCP to high capacity networks follow this original recommendation. Other proprietary protocols, such as the Real Time Messaging Protocol (RTMP), feature similar mechanisms.
Pseudo-streaming, on the other hand, generally does not require an additional protocol for flow control. TCP itself uses its native endpoint feedback to perform flow control over its connections. TCP packets are identified by a packet sequence number that is acknowledged in the opposite direction via an Acknowledgement (ACK) packet. The ACK carries no information about the type and properties of the payload, making it difficult to implement a bit rate management algorithm for pseudo-streaming.
Several challenges are encountered in delivering multimedia sessions over packet radio networks. These challenges may include:
Sudden adjustment of the nominal transmission rate: due to interference, fading, etc., 3G+ networks negotiate physical layer parameters on the fly. The nominal transmission bit rate may vary by a factor of 10. In both pseudo-streaming and streaming sessions, the most direct effect is playback stall due to buffer exhaustion.
Packet loss: caused by link transmission errors or by network congestion.
Reduction of effective bandwidth: the radio link is a shared resource at layer 2 with MAC (medium access control) mechanism and scheduling. This means that the increased load presented by other wireless terminals in the same sector can reduce the effective bandwidth or capacity that the terminal will see.
Limited capacity: the available capacity may typically be a fraction of the capacity available in conventional wired internet access technologies, where capacity is not currently an issue. Typically, a fixed internet media session in a video portal may provide a network load between 250 kbps and 800 kbps. While current 3G cellular networks can maintain throughputs of 500 kbps and above, the overall bit rate budget for cellular telephone wireless multimedia sessions is typically kept below 150 kbps to ensure scalability.
The above-mentioned problems may affect streaming and pseudo-streaming sessions, making adaptive bitrate management essential for achieving a good user experience.
For wireless mobile phones with RTP or similar streaming protocols, the implementation of this adaptive bitrate management is challenging due to:
Infrequent and incomplete network state information. Typical wireless media players support RTCP receiver reports as defined in RFC 3550, and the report generation frequency is fixed. Therefore, the network state information obtained at the sender side is limited and sporadic. In its packet streaming service specification, 3GPP recommends a number of extensions to the basic IETF RTCP receiver report (i.e., RTCP extended reports or XR). Unfortunately, very few handsets implement these enhancements.
Separate processing of the different media streams. Although the audio stream and the video stream are transmitted over the same network link, they are processed separately by RTCP. The two RTCP reports provide status information about the same network, so a joint analysis is required; and
Typical use of a low media bit rate. The bit rate budget for a wireless multimedia session is typically very low (below 150 kbps). Any attempt to reduce the audio or video bit rate may therefore have a large perceptual impact on the session.
In the case of pseudo-streaming sessions, TCP handles lost packets by requesting retransmissions. Thus, there are no problems such as quality degradation due to dropped media packets, although the actual occurrence of packet loss in the system layer leads to increased latency in the data stream, thereby increasing the probability of media player stalls due to empty buffers. The following notable problems occur:
The feedback provided by TCP ACK packets is completely unaware of the media time being transmitted.
An HTTP download over TCP sends as much of the media file as possible, as quickly as possible.
Additional components may be needed at the receiver to cope with the fact that the internal state of TCP is not directly available to the media application.
Drawings
FIG. 1 is a block diagram of an exemplary system.
FIG. 2 is a block diagram illustrating an embodiment of the exemplary system of FIG. 1.
FIG. 3A is a functional diagram illustrating an exemplary communication flow in the exemplary system of FIG. 2.
FIG. 3B is an exemplary functional diagram illustrating adaptive bitrate management according to a pseudo-streaming embodiment.
FIG. 4 is a flow chart representing an exemplary method for processing RTCP packets or TCP ACKs.
FIG. 5 is a flow chart representing an exemplary method for processing optimal session bit rate data.
Detailed Description
Reference will now be made in detail to exemplary embodiments consistent with the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Adjusting the bit rate of a streaming media session based on the instantaneous network capacity can be a key function required for delivering streaming media over wireless packet networks. Adaptive bitrate management is an integrated framework and method that enables delivery of self-adjusting streaming or pseudo-streaming sessions to media players (e.g., standard 3GPP-compliant media players or Flash plug-ins for web embedded video). Adaptive bitrate management includes, among other things, an adaptive bitrate controller and a variable bitrate encoder, which together enable joint session bitrate management of audio, video, and/or other streams at the same time. In the case of a pseudo-streaming session, the adaptive bitrate controller may further include a media multiplexer for assembling the media clip by multiplexing the audio and video frames generated by the variable bit rate encoder together with the necessary timestamps indicating the playback moments.
Adaptive bitrate management may be applicable to all media transmissions (or protocol suites) that are available for media delivery and provide a transmission progress reporting mechanism. The transmission progress report may be applied to the multimedia session as a whole or to individual multimedia streams (audio, video, text, etc.). The adaptive bitrate manager may include the ability to provide the sender with a way to map media time information to bytes received by the receiver, either explicitly as in the case of RTCP or implicitly through ACK packets as in the case of TCP.
FIG. 1 is a block diagram of an exemplary system. Exemplary system 100 may be any type of system that transmits data packets over a network. For example, an exemplary system may include a mobile terminal accessing streaming media data from a content server over the internet. The exemplary system may include, among other things, a terminal 102, a gateway 104, one or more networks 106, 110, an adaptive bitrate manager 108, and one or more content servers 112 through 114.
Terminal 102 is a hardware component that includes software applications that allow terminal 102 to transmit and receive packets corresponding to streaming media. The terminal 102 provides a display and one or more software applications (e.g., media players) for displaying the streaming media to a user of the terminal 102. In addition, the terminal 102 has the capability to request and receive data packets (such as data packets of streaming media) from the internet. For example, the terminal 102 may transmit request data to the content servers 112 to 114 for a specific file or object data of a web page through its URL, and the content server of the web page may query the object data in the database and transmit corresponding response data to the terminal 102. In some embodiments, the response data may be routed through the adaptive bitrate manager 108.
Although the terminal 102 may be a wired terminal, some embodiments of the present invention may prefer to use a mobile terminal because a mobile terminal is more likely to be in a network that benefits from an adaptive bitrate manager. A mobile network connection tends to be less stable than a wired network connection due to, for example, the changing location of the mobile terminal, where the data transmission rate between the mobile terminal and the network may fluctuate, in some cases quite significantly.
The gateway 104 is a device that converts formatted data provided in one type of network into a specific format required by another type of network. For example, gateway 104 may be a server, router, firewall server, host, or proxy server. Gateway 104 has the capability to convert signals received from the terminal 102 into signals understandable by the network 106, and vice versa. Gateway 104 may be capable of handling audio, video, and T.120 transmissions, either individually or in any combination, and capable of full duplex media conversion.
Networks 106 and 110 may include any combination of Wide Area Networks (WANs), Local Area Networks (LANs), or wireless networks suitable for packet-type communications, such as internet communications. In addition, networks 106 and 110 may include buffers for storing packets prior to transmission to their intended destination.
Adaptive bitrate manager 108 is a server that provides communication between gateway 104 and content servers 112 to 114. The adaptive bitrate manager 108 can optimize performance by adjusting the streaming media bitrate according to the connection (i.e., the media network) between the adaptive bitrate manager 108 and the terminal 102. Adaptive bitrate manager 108 can include optimization techniques, described further below.
The content servers 112 to 114 are servers that receive the request data from the terminal 102, process the request data accordingly, and in some embodiments return response data to the terminal 102 through the adaptive bitrate manager 108. For example, content servers 112-114 may be web servers, enterprise servers, or any other type of server. The content servers 112 to 114 may be computers or computer programs responsible for accepting requests for streaming media (e.g., over HTTP, RTSP, or other protocols that may initiate media sessions) from the terminal 102 and serving the terminal 102 with streaming media.
FIG. 2 is a block diagram illustrating an embodiment of the exemplary system of FIG. 1. The terminal 102 may include, among other things, a media player 202 and a buffer 204. Adaptive bitrate manager 108 can include, among other things, an adaptive bitrate controller 210, a buffer 212, a variable bitrate encoder 214, a media packetization 216, and a media multiplexer 218.
The media player 202 is computer software for playing multimedia files (e.g., streaming media) including video and/or audio media files. Common examples of media player 202 may include Microsoft Windows Media Player, Apple Quicktime Player, RealOne Player, and Adobe Flash plug-ins for web embedded video. In some embodiments, the media player 202 uses a codec to decompress and play back streaming video or audio on the display of the terminal 102. The media player 202 may be used as a standalone application or embedded in a web page to create a video application that interacts with HTML content. Further, the media player 202 may provide feedback regarding the receipt of media to the adaptive bitrate manager 108 in the form of a media receiver report. The media receiver report may include RTCP packets for RTP streaming sessions or TCP ACKs for pseudo-streaming sessions.
The buffer 204 (also referred to as the terminal buffer 204) is a software program and/or hardware device that temporarily stores multimedia packets prior to providing the multimedia packets to the media player 202. In some embodiments, buffer 204 receives multimedia packets from adaptive bitrate manager 108 via network 106. In some embodiments, buffer 204 receives multimedia packets from a device other than adaptive bitrate manager 108. Once the buffer 204 receives the multimedia packets (or portions of the media clip in the case of pseudo-streaming), the buffer 204 may provide the stored multimedia packets to the media player 202. Although fig. 2 illustrates the terminal buffer 204 and the media player 202 as separate components, one of ordinary skill in the art will recognize that the terminal buffer 204 may be part of the media player 202. Further, although fig. 2 illustrates only a single buffer, one of ordinary skill in the art will recognize that multiple buffers may be present, e.g., one or more buffers for audio media packets and one or more buffers for video media packets.
Adaptive bitrate controller 210 of adaptive bitrate manager 108 is a software program and/or hardware device that periodically receives media receiver reports (for example, RTCP receiver reports or TCP ACKs) from the terminal 102 and provides an optimal session bitrate (or encoding parameters) to be used for encoding multimedia data to be sent to the terminal 102 during the next period. In some embodiments, adaptive bitrate controller 210 includes a buffer for storing current and previous media receiver reports. To calculate the optimal session bitrate or encoding parameters, adaptive bitrate controller 210 uses one or more network state estimators to estimate the state of the streaming media network and calculate the optimal session bitrate to be used in the next reporting interval. For example, these network state estimators may estimate the media time in transit (MTT), the bit rate received at the terminal 102, the round trip time estimate (RTTE), and the packet loss count. Adaptive bitrate controller 210 can use the history and statistics of the estimators to implement different control algorithms to calculate the optimal session bitrate. In addition, adaptive bitrate controller 210 can update the optimal session bitrate by determining the stability of the streaming media network. This may be done by checking whether the most recently calculated estimates comply with one or more stability criteria. Using the estimates and stability criteria, adaptive bitrate controller 210 can determine whether to adjust the output bitrate or to keep the current output bitrate constant for the next period. After this determination, adaptive bitrate controller 210 provides an optimal session bitrate value to variable bitrate encoder 214.
The buffer 212 of the adaptive bitrate manager 108 is a software program and/or hardware device that temporarily stores media data before providing the media data to the variable bitrate encoder 214. In some embodiments, buffer 212 receives media data from one or more content servers 112-114 via network 110. In some embodiments, buffer 212 receives media data from devices other than content servers 112-114. In some pseudo-streaming embodiments, buffer 212 may include: a demultiplexer (e.g., demultiplexer 350 shown in fig. 3B) to separate the audio and video tracks before relaying the media to the variable bit rate encoder 214.
The variable bit rate encoder 214 of the adaptive bitrate manager 108 is a software program and/or hardware device that receives optimal session bit rate data or encoding parameters from adaptive bitrate controller 210 and provides, to media packetization 216, audio and/or video data encoded at a bit rate that matches the optimal session bit rate provided by adaptive bitrate controller 210. For a pseudo-streaming session, variable bit rate encoder 214 may instead provide audio and video frames to media multiplexer 218. The variable bit rate encoder may include, among other things, a bitrate splitter 220, an audio encoder 222, a video encoder 224, and for some embodiments, a frame dropper 226.
Bitrate splitter 220 is a software program and/or hardware device that receives the optimal session bitrate data from adaptive bitrate controller 210 and allocates the optimal bitrates to be used when encoding the audio and video media data during the next interval. The allocation is such that the sum of the bit rates of all tracks, when combined, may be substantially equal to the optimal session bit rate specified by adaptive bitrate controller 210. For example, the allocation may be based on a predetermined allocation, user preferences, optimal performance data, giving a higher privilege to one type of data than to others, an amount of audio and video data to be provided, and/or any combination thereof. For example, bitrate splitter 220 may privilege audio quality in such a way that if a reduced bitrate is specified, bitrate splitter 220 will reduce the video bitrate first and defer reducing the audio bitrate as much as possible.
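As a non-limiting illustration, the audio-privileging allocation described above may be sketched as follows. This is a minimal sketch only: the function name split_bitrate and the parameters audio_pref_bps, audio_min_bps, and video_min_bps, together with their default values, are hypothetical and are not taken from the specification.

```python
def split_bitrate(session_bps, audio_pref_bps=32_000,
                  audio_min_bps=12_000, video_min_bps=40_000):
    """Split a session bit rate into (audio_bps, video_bps).

    All thresholds are hypothetical tuning values chosen for illustration.
    The two returned rates always sum to session_bps.
    """
    session_bps = max(0, session_bps)
    # Audio keeps its preferred rate while the budget allows it, so any
    # reduction of the session bit rate is absorbed by video first.
    audio = min(audio_pref_bps, session_bps)
    video = session_bps - audio
    if video < video_min_bps:
        # Video has reached its floor; only now are bits taken from audio,
        # and never below the audio floor.
        shortfall = video_min_bps - video
        audio = max(audio_min_bps, audio - shortfall)
        audio = min(audio, session_bps)
        video = session_bps - audio
    return audio, video
```

With these assumed defaults, lowering the session bit rate from 150 kbps to 100 kbps changes only the video allocation; the audio allocation is reduced only once video would otherwise fall below its floor.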
The audio encoder 222 and the video encoder 224 are software programs and/or hardware devices that may receive their respective bit rate allocations from bitrate splitter 220 (or directly from adaptive bitrate controller 210) and provide output media data that is encoded to match their respective bit rate allocations for the next reporting interval. Both the audio encoder 222 and the video encoder 224 may receive their respective media data from the buffer 212 and output the media data according to their respective bitrate allocations from the bitrate splitter 220. After the bit rates have been determined for both audio and video, it is the responsibility of each encoder to deliver the maximum quality in the corresponding media track. For example, the audio encoder 222 may generate a variable bit rate by adjusting spectral quantization and cutoff frequency. Further, the video encoder 224 may generate a variable bit rate, for example, by adjusting Discrete Cosine Transform (DCT) coefficient quantization or by introducing frame dropping. This frame dropping may be performed by frame dropper 226, if desired.
Frame dropper 226 is a software program and/or hardware device that may be triggered when the desired bit rate is less than the quality threshold. The threshold may be codec dependent and represents a bit rate value below which the use of coarser quantization may lead to intolerable artifacts in the image. The frame dropper 226 may dynamically determine the frame dropping rate based on the desired video bitrate and the bitrate being generated by the video encoder 224. To compensate for the inherent bitrate fluctuations in the video bitrate at the output of the encoder, frame dropper 226 can dynamically update the drop rate by using a sliding window that covers the byte size history of the most recently encoded frames.
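As a non-limiting illustration, a sliding-window drop-rate calculation of the kind described above may be sketched as follows. The class name FrameDropper, the window length, and the specific formula (dropping the fraction of frames by which the produced bit rate exceeds the desired bit rate) are assumptions made for illustration; the specification does not prescribe a formula.

```python
from collections import deque


class FrameDropper:
    """Hypothetical sketch of a frame dropper with a sliding window."""

    def __init__(self, window=30):
        # Byte sizes of the most recently encoded frames; older entries
        # fall out automatically once the window is full.
        self.sizes = deque(maxlen=window)

    def record(self, frame_bytes):
        self.sizes.append(frame_bytes)

    def drop_rate(self, target_bps, fps):
        """Return the fraction of frames to drop, between 0.0 and 1.0."""
        if not self.sizes:
            return 0.0
        target_bps = max(0.0, target_bps)
        mean_bytes = sum(self.sizes) / len(self.sizes)
        produced_bps = mean_bytes * 8 * fps
        if produced_bps <= target_bps:
            return 0.0  # the encoder output already fits the target
        return 1.0 - target_bps / produced_bps
```

Because the window holds only recent frame sizes, the drop rate follows short-term fluctuations at the encoder output rather than the long-run average.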
Media packetization 216 is a software program and/or hardware device that receives audio and video media data from audio encoder 222 and video encoder 224 and converts the data into a packet format to deliver a streaming session. Media packetization 216 may create packets to be transmitted over separate network channels for video and audio data, or combine audio and video in a single media stream. In addition to carrying audio and video data, the media packets may include, among other things, a payload type identifier for identifying the content type, a packet sequence number, a timestamp for allowing synchronization and jitter calculations, and delivery monitoring data. This type of data may then help adaptive bitrate controller 210 determine the quality of service provided by the network when adaptive bitrate controller 210 receives a corresponding media receiver report from terminal 102. After converting the data into packet format, media packetization 216 transmits the data through network buffer 230 of network 106 to terminal buffer 204 of terminal 102. In addition, adaptive bitrate manager 108 maintains a history of transmitted media packets in the audio and video tracks. The history data may include, among other things, the time of day each packet was sent, the sequence number, and the size of each media packet.
In some embodiments (e.g., embodiments involving pseudo-streaming), media multiplexer 218 may replace media packetization 216. The media multiplexer 218 is a software program and/or hardware device that receives separate audio and video media data, either directly or indirectly from audio encoder 222 and video encoder 224, and combines the data into a media clip file format to deliver a pseudo-streaming session. The media multiplexer 218 sends successive segments of the media file, assembled on the fly, to the media player 202 using TCP as the transport protocol and, in some embodiments, HTTP as the download protocol over TCP. The media multiplexer 218 may correspond to the multiplexer disclosed in U.S. application No. 12/368,260 entitled "Method for Controlling Download Rate of Real-Time Streaming as fed by Media Player," which is incorporated herein by reference, to add session timing functions that improve the effectiveness of adaptive bitrate management in pseudo-streaming sessions. For a pseudo-streaming session, the adaptive bitrate manager 108 (e.g., as described below in FIG. 3B) can provide pseudo-streaming media data at a rate matching real-time streaming, as required by the player.
FIG. 3A is a functional diagram illustrating an exemplary communication flow in the system of FIG. 2. For the purpose of illustrating this exemplary embodiment, it is assumed that the terminal 102 has received at least some of the media data of the requested media data packet. Further, it is assumed that the media data packets include both audio and video media data. After receiving the packets, the media player 202 transmits (302) a media receiver report to the adaptive bitrate manager 108.
For example, the media receiver report may be an RTCP receiver report or, in the case of pseudo-streaming, a TCP ACK. RTCP is a protocol used to provide quality control information for RTP streams (e.g., transmissions provided by media packetization 216 of adaptive bitrate manager 108). More specifically, RTCP may partner with media packetization 216 of adaptive bitrate manager 108 in the packetization and delivery of multimedia data. In some embodiments, the media player 202 periodically transmits RTCP receiver reports. The RTCP receiver report may provide feedback on the quality of the service being provided by media packetization 216.
The most widely used method for streaming media on the internet is HTTP-based pseudo-streaming carried by the Transmission Control Protocol (TCP). TCP implements its own generic (non-media-specific) packetization protocol. TCP uses ACKs internally to provide feedback on received TCP packets and thus provide transport flow control. In the case of pseudo-streaming, the TCP ACK packets are used to update the key network estimators. The most notable addition is that the TCP sequence number is mapped (as described in the above-referenced U.S. application No. 12/368,260) to the stored media time index and bytes in order to estimate the media time in transit.
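As a non-limiting illustration, the mapping from acknowledged TCP bytes to media time may be sketched as a lookup table kept on the sender side. The class name SentMediaIndex and its methods are hypothetical, and the sketch assumes that the acknowledged byte count is expressed relative to the start of the media clip (that is, the TCP sequence number minus the initial sequence number, with any protocol headers already accounted for) and that timestamps are in milliseconds.

```python
import bisect


class SentMediaIndex:
    """Hypothetical table mapping sent byte offsets to media timestamps."""

    def __init__(self):
        self.end_offsets = []    # cumulative byte offset at the end of each sent frame
        self.timestamps_ms = []  # media timestamp of each sent frame, in ms

    def record_sent(self, end_offset, media_ts_ms):
        # Frames are assumed to be recorded in sending order, so both
        # lists stay sorted by byte offset.
        self.end_offsets.append(end_offset)
        self.timestamps_ms.append(media_ts_ms)

    def acked_media_time(self, acked_bytes):
        """Timestamp of the last frame fully covered by the ACK, or None."""
        count = bisect.bisect_right(self.end_offsets, acked_bytes)
        return self.timestamps_ms[count - 1] if count > 0 else None

    def media_time_in_transit(self, acked_bytes):
        """Most recently sent timestamp minus last acknowledged timestamp."""
        acked = self.acked_media_time(acked_bytes)
        if acked is None:
            return None  # nothing acknowledged yet
        return self.timestamps_ms[-1] - acked
```

A binary search is used here because the table grows with every frame sent while an ACK may arrive for any earlier offset; other data structures would serve equally well.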
Although TCP and RTP/RTCP are used as exemplary embodiments to illustrate the adaptive bitrate control method, one of ordinary skill in the art will recognize that the adaptive bitrate control method is applicable to any protocol that provides media transmission with sequencing and timing information, together with media transmission feedback carrying information about received packets (covering sequencing, timing, loss rate, etc.).
Further, in some streaming embodiments, the media receiver report may be a single report with both audio and video report data (when audio and video are multiplexed into a single stream), or it may be split into multiple reports (as in RTCP, where RTP carries audio and video in separate streams), for example, one receiver report for audio report data and another for video report data. The media receiver report data may include, among other things, data relating to the sequence number of the most recently received media packet at the terminal 102, the timestamp of the last packet received by the terminal 102 reported in the media receiver report, the number of bits sent from the report, the round trip time, and the number of lost packets.
After receiving the receiver report, adaptive bitrate controller 210 can estimate the state of the network to determine whether to update the session bitrate for the next period. Adaptive bitrate controller 210 can save the newly received receiver reports in a cumulative history and record the time at which the packet was received. To estimate the state of the network, adaptive bitrate controller 210 can combine data from the received media receiver report, previously received receiver reports stored by adaptive bitrate manager 108, and a history of transmitted media packets stored by adaptive bitrate manager 108. The adaptive bitrate controller can estimate the following exemplary data for both streaming and pseudo-streaming sessions by using network state estimators:
The media time in transit (MTT), calculated as the difference between the timestamp of the most recently sent media packet and the timestamp of the last media packet received by the player, as reported in the receiver report. For pseudo-streaming sessions, adaptive bitrate manager 108 performs additional steps to calculate the MTT. For example, the adaptive bitrate manager 108 maintains a table of sequence numbers and timestamps in the media clips sent to the player. When an ACK is received, adaptive bitrate manager 108 can retrieve the timestamp corresponding to the byte sequence number in the ACK. Using the timestamp, the adaptive bitrate manager can calculate the MTT.
The received bit rate, calculated as the bits received between the current and previously received receiver reports divided by the time elapsed between the two receiver reports. The bits received between receiver reports are calculated by cross-referencing the sequence numbers in the receiver reports with the history of bytes sent stored at the adaptive bitrate manager 108.
The Round Trip Time Estimate (RTTE) may be obtained by averaging a plurality of lower MTT values stored at the adaptive bitrate manager 108. For example, the RTTE may be calculated by averaging the lowest 3 MTT values among all stored MTT values for the streaming media network. In addition, adaptive bitrate manager 108 can calculate RTTE from data in the (RTCP) sender report. Although these exemplary embodiments are illustrated, any method may be used to estimate the round trip time of a streaming media network.
Packet loss count, captured directly from the media receiver report.
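As a non-limiting illustration, the first three estimators listed above may be sketched as follows. The function and parameter names are hypothetical. The sketch assumes that the cumulative byte counts have already been obtained by cross-referencing the reported sequence numbers with the history of bytes sent, and that all MTT values share one time unit.

```python
def media_time_in_transit(last_sent_ts, last_received_ts):
    """MTT: timestamp of the most recently sent media packet minus the
    timestamp of the last packet reported as received."""
    return last_sent_ts - last_received_ts


def received_bitrate(bytes_prev, bytes_now, t_prev, t_now):
    """Bits received between two receiver reports divided by the time
    elapsed between them (times in seconds, result in bits per second)."""
    elapsed = t_now - t_prev
    if elapsed <= 0:
        return 0.0  # reports out of order or simultaneous; no estimate
    return (bytes_now - bytes_prev) * 8 / elapsed


def round_trip_time_estimate(mtt_history, k=3):
    """RTTE: average of the k lowest stored MTT values (k=3 follows the
    example in the text; any other k is an assumption)."""
    if not mtt_history:
        return None
    lowest = sorted(mtt_history)[:k]
    return sum(lowest) / len(lowest)
```

For example, with stored MTT values of 500, 200, 900, 300, and 400 ms, the three lowest values are 200, 300, and 400 ms, giving an RTTE of 300 ms.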
Adaptive bitrate controller 210 can use these estimates to implement a number of different control algorithms. For example, the session bitrate for the next interval may be calculated using streaming media stability criteria.
Adaptive bitrate controller 210 uses stability criteria to determine the stability of the streaming media network. Although any number of algorithms may be used to determine stability, one exemplary embodiment compares the estimated MTT to the RTTE. If the MTT and RTTE remain close, adaptive bitrate controller 210 can determine that the streaming media network can properly support the current bitrate. In addition, by comparing the received bit rate to the current session bit rate, adaptive bitrate controller 210 may determine that the network can handle the load imposed by adaptive bitrate manager 108.
Adaptive bitrate controller 210 uses estimation and stability criteria to implement a control algorithm for discovering network capacity and adjusting the session bitrate accordingly. Adaptive bitrate controller 210 can define variations of the control algorithm to operate in two different modes: (1) acquisition mode and (2) normal mode. Although two modes are illustrated in this exemplary embodiment, one of ordinary skill in the art will recognize that multiple modes of operation may be defined.
In the normal mode, adaptive bitrate controller 210 operates in a steady state condition, which indicates that the network is maintaining or incrementally increasing the effective capacity seen by the system. In some embodiments, while operating in the normal mode, the control algorithm may increase the session bitrate when the MTT is not increasing and the received bitrate remains close to the current session bitrate.
Adaptive bitrate controller 210 typically triggers the acquisition mode when it detects high packet loss, a sudden increase in MTT, and/or a value of MTT that is higher than a threshold (MTT threshold), which may be a fixed value or may be dynamically obtained for adaptive control mechanisms. Once triggered, the acquisition mode sets the optimal session bit rate to a value, such as the received bit rate or a portion of the received bit rate. Since the received bit rate may be the best estimate of the actual bit rate that the network can support at that particular point in time, the adaptive bit rate manager 108 should quickly return to a steady state condition. In some embodiments, the new session bitrate is set to only a fraction of the current session bitrate.
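As a non-limiting illustration, one update step of a two-mode control algorithm of the kind described above may be sketched as follows. Every numeric parameter (loss_limit, surge_factor, step, backoff, tolerance) is a hypothetical tuning value, and the test for a "sudden increase" in MTT is an assumption; the specification does not fix any of them.

```python
def next_session_bitrate(current_bps, received_bps, mtt, prev_mtt,
                         mtt_threshold, loss_count, loss_limit=5,
                         surge_factor=2.0, step=1.05, backoff=0.9,
                         tolerance=0.9):
    """Return (mode, new_session_bps) for the next reporting interval."""
    # Assumed definition of a sudden MTT increase: the MTT grew by more
    # than surge_factor since the previous report.
    surge = prev_mtt > 0 and mtt > surge_factor * prev_mtt
    if loss_count > loss_limit or surge or mtt > mtt_threshold:
        # Acquisition mode: fall back to a portion of the received bit
        # rate, taken as the best estimate of what the network supports.
        return "acquisition", backoff * received_bps
    if mtt <= prev_mtt and received_bps >= tolerance * current_bps:
        # Normal mode, stable network: probe for more capacity.
        return "normal", step * current_bps
    # Normal mode, no clear signal: hold the current session bit rate.
    return "normal", current_bps
```

Setting backoff to 1.0 in this sketch corresponds to using the received bit rate itself as the new session bit rate.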
In this embodiment, although only the terminal 102 is illustrated as being in communication with the adaptive bitrate manager 108, one of ordinary skill in the art will recognize that multiple terminals may be in communication with the adaptive bitrate manager 108, wherein each terminal may be located in a substantially different network environment. These environments can vary significantly because different underlying wireless technologies and fixed network topologies can be used. Thus, for some embodiments, it is desirable that characteristics of the network environment can be discovered in advance to enable automatic adjustment of key parameters in the framework. For example, adaptive bitrate controller 210 can set the MTT threshold to a value related to RTTE at the beginning of a multimedia session. In this way, the system may attempt to follow the general stability criteria provided by adaptive bitrate controller 210. As indicated above, this stability criterion may be based on a comparison between MTT and RTTE and therefore does not depend on the (previously unknown) network environment, which is advantageous because the actual type of network infrastructure is rarely known a priori. In some embodiments, the optimal session bitrate may be updated by determining a difference between the MTT and the RTTE and adjusting the session bitrate according to the difference. For example, the larger the difference, the larger the adjustment from the current session bitrate to the optimal session bitrate. In some embodiments, the MTT used for this determination may be based on one or more historical values of the MTT.
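One possible reading of the preceding paragraph is sketched below: the MTT threshold is derived from the RTTE at session start, and the session bitrate is reduced in proportion to the difference between a smoothed MTT and the RTTE. The multiplier, floor, gain, and minimum bitrate are assumptions for illustration, and the use of a plain mean over the MTT history is only one of many possible smoothing choices.

```python
def initial_mtt_threshold(rtte_s, multiplier=4.0, floor_s=1.0):
    """Derive the MTT threshold from the RTTE measured at session start."""
    return max(floor_s, multiplier * rtte_s)


def adjust_session_bitrate(current_bps, mtt_history_s, rtte_s,
                           gain=0.2, min_bps=32_000):
    """Scale the bitrate down in proportion to how far MTT exceeds RTTE."""
    # Use the mean of recent MTT samples rather than a single reading.
    mtt_s = sum(mtt_history_s) / len(mtt_history_s)
    excess_s = max(0.0, mtt_s - rtte_s)
    # The larger the difference, the larger the adjustment (capped at 90%).
    reduction = min(0.9, gain * excess_s)
    return max(min_bps, int(current_bps * (1.0 - reduction)))
```

With these assumed parameters, an MTT equal to the RTTE leaves the bitrate unchanged, while a larger gap produces a proportionally larger reduction.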
By using the control algorithm to calculate the session bitrate update as described above, the adaptive bitrate controller 210 determines the optimal session bitrate for transmitting media data to the terminal 102. Adaptive bitrate controller 210 provides (304) the optimal session bitrate data to bitrate splitter 220 of variable bitrate encoder 214. Upon receiving the optimal session bitrate data, bitrate splitter 220 allocates the optimal session bitrate between the audio stream and the video stream. For example, the allocation may be based on a predetermined allocation, user preferences, optimal performance data, giving a higher privilege to one type of data than to the other, the amount of audio and video data to be provided, and/or any combination thereof. For example, bitrate splitter 220 may privilege audio quality in such a way that if a reduced bitrate is specified, bitrate splitter 220 will reduce the video bitrate first and defer reducing the audio bitrate as much as possible.
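The audio-privileged allocation described above may, for example, be sketched as follows. The preferred audio bitrate and the audio and video floors are hypothetical values, and the function name `split_bitrate` is introduced only for this example.

```python
def split_bitrate(session_bps, preferred_audio_bps=64_000,
                  min_audio_bps=16_000, min_video_bps=48_000):
    """Return (audio_bps, video_bps), reducing video before audio."""
    # Audio keeps its preferred rate for as long as the session rate allows.
    audio = min(preferred_audio_bps, session_bps)
    video = session_bps - audio
    if video < min_video_bps:
        # Video has reached its floor, so only now is audio reduced,
        # and never below its own floor while any video remains.
        video = min(min_video_bps, max(0, session_bps - min_audio_bps))
        audio = session_bps - video
    return audio, video
```

In this sketch the two returned values always sum to the session bitrate, and audio is touched only after video has been reduced to its assumed floor.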
After splitting the optimal session bitrate into an optimal audio bitrate and an optimal video bitrate, the bitrate splitter provides (306) the optimal audio bitrate to the audio encoder 222 and provides (308) the optimal video bitrate to the video encoder 224. Upon receiving their respective bit rates, both audio encoder 222 and video encoder 224 receive their respective media data from buffer 212 and output their respective audio media data and video media data according to the respective bit rate allocations from bit rate splitter 220. After the bit rates are determined for both audio and video, it is the responsibility of each encoder to deliver the maximum quality in the corresponding media track by maintaining the requested bit rate until the next interval. For example, the audio encoder 222 may generate a variable bit rate by adjusting the quantization and cut-off frequency. Further, the video encoder 224 may generate a variable bit rate, for example, by adjusting Discrete Cosine Transform (DCT) coefficient quantization or by introducing frame dropping. This frame dropping may be performed by frame dropper 226, if desired. In some embodiments, the encoding parameters of the encoder are not modified until the encoder receives the optimal bit rate data from bit rate splitter 220, which will be provided in a subsequent interval, since encoders 222, 224 are slaves to bit rate splitter 220.
In some embodiments in which frame dropping is preferred, video encoder 224 may provide (310) video media data to frame dropper 226 when the optimal session bitrate is less than a quality threshold. The threshold may be codec dependent and represents a bitrate value below which the use of coarser quantization may lead to intolerable artifacts in the image. When frame dropping is triggered, frame dropper 226 may dynamically determine the frame dropping rate based on the desired video bitrate and the bitrate being generated by video encoder 224. To compensate for the inherent fluctuations in the video bitrate at the output of video encoder 224, frame dropper 226 may dynamically update the drop rate by using a sliding window that covers the byte size history of the most recently encoded frames. Frame dropper 226 may drop frames accordingly to deliver the optimal session bitrate. Furthermore, in some embodiments, video encoder 224 may use the network state estimator of adaptive bitrate controller 210 to encode the video in a more resilient manner. In some embodiments, video encoder 224 may use the packet loss information in conjunction with the MTT to determine whether a group of pictures (GOP) value should be decreased, which increases the number of intra-coded frames (I-frames) per second transmitted in the video stream. In some embodiments, video encoder 224 may provide video media data to media packetization 216 or media multiplexer 218 (shown in fig. 3B) only if frame dropping is not required. Audio encoder 222 and, for this embodiment, frame dropper 226 provide (312, 314) audio media data and video media data, respectively, to media packetization 216 or media multiplexer 218 (shown in fig. 3B).
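A minimal sketch of a sliding-window frame dropper is given below, under stated assumptions. The class name `FrameDropper`, the window length of 30 frames, and the credit-based way of spreading drops evenly are choices made for this example; the sketch also ignores frame dependencies (for instance, it does not protect reference frames), which a practical embodiment would need to consider.

```python
from collections import deque


class FrameDropper:
    def __init__(self, window_frames=30):
        # Byte sizes of the most recently encoded frames (sliding window).
        self.sizes = deque(maxlen=window_frames)
        self.credit = 0.0  # accumulated fraction of a frame owed as a drop

    def drop_rate(self, target_bps, fps):
        """Fraction of frames to drop so the output meets target_bps."""
        if not self.sizes:
            return 0.0
        produced_bps = 8 * sum(self.sizes) / len(self.sizes) * fps
        if produced_bps <= target_bps:
            return 0.0
        return 1.0 - target_bps / produced_bps

    def should_drop(self, frame_bytes, target_bps, fps):
        """Record a new frame and decide whether it is dropped."""
        self.sizes.append(frame_bytes)
        self.credit += self.drop_rate(target_bps, fps)
        if self.credit >= 1.0:
            self.credit -= 1.0
            return True
        return False
```

For example, if the encoder produces 1000-byte frames at 10 frames per second (80 kbit/s) and the desired video bitrate is 40 kbit/s, this sketch drops every second frame.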
Upon receiving the audio media data and the video media data, media packetization 216 converts the data to packet format. RTP defines a standardized packet format for delivering audio and video over the internet, while TCP performs the same functions for general-purpose data. After converting the data to packet format, media packetization 216 transmits (316) the audio and video media packets to network buffer 230 of network 106. Similarly, in the case of pseudo streaming, upon receiving the audio and video data from the variable bit rate encoder 214, the media multiplexer 218 creates a new portion of the media clip file and sends it to the player using TCP and possibly HTTP, as further described below in fig. 3B. Although only one transmission is shown, one of ordinary skill in the art will recognize that transmission 316 may include a separate transmission for one or more audio media packets and another separate transmission for one or more video media packets. Further, one of ordinary skill in the art will recognize that network 106 may include multiple networks, each with their own buffer or buffers. In addition to carrying audio and media data, these packets may include, among other things, a payload type identifier, a packet sequence number, a timestamp, and delivery monitoring data. This type of data may then help adaptive bitrate controller 210 determine the quality of service provided by the network when adaptive bitrate controller 210 receives a media recipient report from terminal 102. In addition, adaptive bitrate manager 108 can also store a history of transmitted media packets so that it can then adjust the bitrate accordingly.
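For illustration, the payload type identifier, sequence number, and timestamp mentioned above occupy the 12-byte fixed RTP header defined in RFC 3550. The sketch below packs and unpacks that fixed header only; it assumes no padding, no header extension, and no contributing sources, and the helper names are hypothetical.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(header):
    """Parse the fixed header fields back into a dictionary."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

The sequence number and timestamp carried in each header are what allow the sender to correlate a later receiver report with its stored history of transmitted packets.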
Upon receiving the packets, the network buffer 230 of the network 106 may store the packets until it is their turn to be provided to the terminal 102. Although only buffer 230 is illustrated, one of ordinary skill in the art will recognize that one or more separate buffers may exist for each of the audio media packets and the video media packets. When it is their turn, the network buffer 230 transmits (318) the packets to the terminal buffer 204.
Upon receiving the packets, the terminal buffer 204 of the terminal 102 may store the packets until it is their turn to be provided to the media player 202. Although only buffer 204 is illustrated, one of ordinary skill in the art will recognize that one or more separate buffers may exist for each of the audio media packets and the video media packets. When it is their turn, the buffer 204 provides (320) the packets to the media player 202. In turn, the media player 202 may extract relevant data from the packets and provide the data to the adaptive bitrate manager 108 in a subsequent receiver report.
Fig. 3B is an exemplary functional diagram illustrating adaptive bitrate management according to a pseudo-streaming embodiment. This embodiment incorporates the method and system for providing adaptive bitrate management for pseudo-streaming communications described in U.S. application No. 12/368,260. Further, demultiplexer 350, flow control module 352, frame scheduler 354, and media database 356 provided herein are similar to those described in U.S. application No. 12/368,260, which is incorporated herein by reference. Further, the adaptive bitrate controller 210 and the variable bitrate encoder 214 operate similarly to those described above in fig. 3A and will not be described in detail here.
The demultiplexer 350 may be a software program and/or a hardware device that intercepts and parses incoming media downloads and retrieves media information, such as clip timing information, as described below.
Flow control module 352 may be a software program and/or a hardware device that applies the download rate pattern and may frame the media data and program frame scheduler 354 accordingly.
The frame scheduler 354 may be a software program and/or a hardware device that triggers frame transmission according to the timing specified by flow control module 352, variable bitrate encoder 214, and/or adaptive bitrate controller 210.
The media database 356 may be a record of framed streaming media or a structured collection of data. The structure may be organized as a structured file, a relational database, an object-oriented database, or other suitable database. Computer software, such as a database management system, is utilized to manage and provide access to the media database 356. Media database 356 may store and provide framed streaming media. It may be combined with other components of network element 110, such as frame scheduler 354 or media multiplexer 218. It may also be external to the adaptive bitrate manager 108. Media database 356 provides buffering to store media data.
Upon receiving (380) streaming media data from content server 114, demultiplexer 350 parses the streaming media and obtains information about the streaming media. For example, demultiplexer 350 may retrieve, among other things, timing information of the streaming media, which may be a real-time playback rate on a media player at the terminal 102. Demultiplexer 350 then transmits (382) the parsed streaming media and information for controlling the download rate to flow control module 352.
Based on the information of the streaming media, including the timing information, flow control module 352 applies the download rate pattern and frames the parsed streaming media. The framed streaming media may correspond to a real-time playback rate on a media player at the terminal 102. Flow control module 352 then stores (384) the framed streaming media at media database 356 for transmission and programs (388) frame scheduler 354 according to the timing information and the download rate pattern to trigger transmission of the framed streaming media.
Frame scheduler 354 triggers (390) media multiplexer 218 to transmit framed streaming media according to the timing schedule specified by flow control module 352. Once triggered (390), and after retrieving (392) the stored media, media multiplexer 218 provides (394) framed streaming media to terminal 102 according to the timing schedule. The providing step 394 may include providing the framed streaming media to one or more network buffers, as described above in fig. 3A, from which it is then provided to the terminal 102. Terminal 102 handles the streaming media in a manner similar to that described above in fig. 3A. The delivery is a streaming controlled download corresponding to the real-time playback rate on the media player at the terminal 102.
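As a simplified sketch of the scheduling described above, frames may be queued by send time and released once they are due, so that delivery follows the real-time playback rate. The class `FrameScheduler`, the helper `schedule_at_playback_rate`, and the representation of a frame as a `(media_timestamp_s, payload)` pair are assumptions made for this example.

```python
import heapq


class FrameScheduler:
    def __init__(self):
        self._queue = []  # entries are (send_time_s, insertion_order, frame)
        self._order = 0

    def schedule(self, send_time_s, frame):
        """Queue a frame for transmission at the given time."""
        heapq.heappush(self._queue, (send_time_s, self._order, frame))
        self._order += 1

    def pop_due(self, now_s):
        """Return, in send order, every frame whose send time has arrived."""
        due = []
        while self._queue and self._queue[0][0] <= now_s:
            due.append(heapq.heappop(self._queue)[2])
        return due


def schedule_at_playback_rate(scheduler, frames, start_time_s):
    """Space frames by their media timestamps, starting at start_time_s."""
    if not frames:
        return
    first_ts = frames[0][0]
    for media_ts_s, payload in frames:
        scheduler.schedule(start_time_s + (media_ts_s - first_ts), payload)
```

A caller would poll `pop_due` with the current time and hand the returned frames to the multiplexer; a different download rate pattern could be obtained by scaling the offsets.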
After receiving the portion of the streaming media, terminal 102 may provide (302) a media recipient report as described above to adaptive bitrate controller 210. Adaptive bitrate controller 210 can save a table of sequence numbers and timestamps in a media clip sent to the player, which can be stored in media database 356. When a TCP ACK is received, adaptive bitrate controller 210 can retrieve the timestamp corresponding to the byte sequence number in the ACK and then calculate the MTT, RTTE, and other network estimators that can be used to implement the bitrate control algorithm and stability criteria described previously in fig. 3A for the streaming media implementation. After a change in the network segment has been detected (e.g., a degradation or improvement in bandwidth in the network segment), adaptive bitrate controller 210 can instruct (304) variable bitrate encoder 214 to perform data optimization on the streaming media in media database 356 prior to sending the streaming media to terminal 102. This may enable dynamic data optimization based on changes in the network segment in which the terminal 102 is located to provide a dynamically reduced size streaming media. Variable bit rate encoder 214 may interact (386) with flow control module 352 to combine download rate control with media data optimization. Through data optimization (e.g., media bit rate reduction techniques), the variable bit rate encoder 214 may modify the size of each media frame in the media database 356. Flow control module 352 may then frame the flow rate of the dynamically reduced size streaming media based on timing information of the streaming media.
Fig. 4 is a flow chart representing an exemplary method for processing a media receiver report. Referring to fig. 4, one of ordinary skill in the art will readily recognize that the illustrated process may be altered to delete steps or further include additional steps. For this exemplary method, assume that the receiver report includes data relating to both audio and video media data. If it is a pseudo-streaming session, the TCP ACK is processed to obtain information about the media transmission progress. Although both types exist, one of ordinary skill in the art will recognize that the receiver report data may include audio or video data. After an initial start step 400, the adaptive bitrate manager obtains (402) receiver report data, which may include one or more receiver reports. The receiver report data may relate to the quality and quantity of audio and video media packets received at the media player of the terminal, whether sent directly by the media packetization or within the media clip created by the media multiplexer. The receiver report data may include, among other things, the sequence number of the last packet received by the terminal, a timestamp corresponding to such a packet, the number of bits sent, the round trip time, and the number of packets lost during transmission from the adaptive bitrate manager to the terminal. The receiver report data may be obtained by receiving a media receiver report from the terminal and by cross-correlating the content of the last received media receiver report with a history of media packets stored at the adaptive bitrate manager.
Although RTP and RTCP are user-level protocols that are directly accessible to multimedia applications, TCP is typically implemented in kernel space in such a way that the application may not have visibility of its internal state. To overcome this, a simple kernel-level proxy may be implemented to generate application-level receiver reports and send them to the adaptive bitrate manager after receiving ACK packets in the kernel space.
After receiving the receiver report data, the adaptive bitrate manager estimates (404) network conditions of the streaming media network. To estimate the state of the network, the adaptive bitrate manager can combine the data from the received receiver report data of step 402 with previously received receiver report data stored by the adaptive bitrate manager. The adaptive bitrate manager can estimate the MTT, the received bitrate, the RTTE and the packet loss. In a pseudo-streaming session, an extra step is required to calculate the MTT. The adaptive bitrate manager can maintain a table of sequence numbers and timestamps in the media clips sent to the media player. When a TCP ACK is received, the adaptive bitrate manager can retrieve the timestamp corresponding to the byte sequence number in the ACK and then calculate the MTT. The adaptive bitrate manager can use these estimates to implement a variety of different control algorithms.
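The table lookup described above for a pseudo-streaming session may be sketched as follows. This example assumes that the MTT is the gap, in media time, between the most recently sent chunk and the most recently acknowledged chunk; it also assumes byte offsets counted from the start of the clip and ignores TCP sequence number wraparound. The class and method names are hypothetical.

```python
import bisect


class SentMediaTable:
    def __init__(self):
        self._end_bytes = []   # cumulative byte offset after each sent chunk
        self._timestamps = []  # media timestamp (seconds) of each sent chunk

    def record_sent(self, end_byte, media_ts_s):
        """Remember a chunk as it is sent; calls must be in send order."""
        self._end_bytes.append(end_byte)
        self._timestamps.append(media_ts_s)

    def acked_timestamp(self, ack_byte):
        """Media timestamp of the last chunk fully covered by the ACK."""
        i = bisect.bisect_right(self._end_bytes, ack_byte) - 1
        return self._timestamps[i] if i >= 0 else None

    def media_time_in_transit(self, ack_byte):
        """Assumed MTT: newest sent timestamp minus newest acked timestamp."""
        acked = self.acked_timestamp(ack_byte)
        if acked is None:
            return None  # nothing acknowledged yet
        return self._timestamps[-1] - acked
```

Because the offsets are appended in increasing order, the binary search finds the acknowledged chunk without scanning the whole table.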
After estimating the network conditions, the adaptive bitrate manager applies (406) stability criteria to determine the stability of the streaming media network. The stability criteria may help to adjust the bitrate, if needed, in an attempt to stabilize the streaming media network, for example to avoid buffer overflow in the network and underflow at the terminal. Although any number of algorithms may be used to determine the stability criteria, one exemplary embodiment compares the estimated MTT to the estimated RTTE, both of which are estimated in step 404. If the MTT and RTTE remain close, the adaptive bitrate manager can use the comparison to determine that the streaming media network can properly support the current bitrate. Further, by comparing the received bitrate to the current session bitrate, the adaptive bitrate manager can determine whether the streaming media network can handle the load.
After establishing the stability criteria, the adaptive bitrate manager determines (408) whether the network is stable for the current bitstream based on the estimating step 404 and/or the stability criteria establishing step 406. If the network is stable, the adaptive bitrate manager operates (410) in a steady state condition by maintaining or incrementally increasing the current bitrate. In some embodiments, the optimal session bitrate may be calculated by determining the difference between the MTT and the RTTE and adjusting the session bitrate according to the difference. For example, if the current session bitrate is less than the set target session bitrate and the values of MTT and RTTE are comparable, the adaptive bitrate manager can incrementally increase the optimal session bitrate. The adaptive bitrate manager then provides (416) the optimal session bitrate for transmitting the media data to the terminal. After providing step 416, the method may continue to end 418.
If the network is determined to be unstable, the adaptive bitrate manager adjusts (412) the bitrate so that the adaptive bitrate manager can reach a stable condition. For example, in some embodiments, the adaptive bitrate manager can use the received bitrate estimated in step 404, since the received bitrate can be the best estimate of the actual bitrate that the network can support at that particular point in time. The adaptive bitrate manager then provides (416) the optimal session bitrate for transmitting the media data to the terminal. After providing step 416, the method may continue to end 418.
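Steps 406 through 416 of fig. 4 may be summarized by the sketch below. The `NetworkEstimate` record, the slack and loss limits used as the stability test, and the fixed 16 kbit/s increment are assumptions for illustration; the specification leaves the particular stability criteria and step size open.

```python
from dataclasses import dataclass


@dataclass
class NetworkEstimate:
    """Output of the estimating step 404 (field names are hypothetical)."""
    mtt_s: float
    rtte_s: float
    received_bps: int
    loss_fraction: float


def next_session_bitrate(est, current_bps, target_bps,
                         mtt_slack_s=0.5, loss_limit=0.05, step_bps=16_000):
    """Return the optimal session bitrate to be provided in step 416."""
    # Steps 406 and 408: apply the stability criteria.
    stable = ((est.mtt_s - est.rtte_s) <= mtt_slack_s
              and est.loss_fraction <= loss_limit)
    if stable:
        # Step 410: maintain or incrementally increase, up to the target.
        return min(target_bps, current_bps + step_bps)
    # Step 412: fall back to what the network is actually delivering.
    return min(current_bps, est.received_bps)
```

Calling this function once per receiver report yields a slow climb toward the target bitrate while the network is stable and an immediate drop to the received bitrate when it is not.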
Fig. 5 is a flow chart representing an exemplary method for processing optimal session bit rate data. Referring to fig. 5, one of ordinary skill in the art will readily recognize that the illustrated process may be altered to delete steps or further include additional steps. For this exemplary method, it is assumed that both audio and video media data are present. Although both types exist, one of ordinary skill in the art will recognize that audio or video data may be present. After an initial start step 500, the adaptive bitrate manager obtains (502) optimal session bitrate data for transmitting media data to the terminal.
Upon receiving the optimal session bitrate data, the adaptive bitrate manager allocates (504) the optimal session bitrate between the audio and video streams to produce an optimal audio bitrate and an optimal video bitrate. For example, the allocation may be based on a predetermined allocation, user preferences, optimal performance data, giving a higher privilege to a type of data than others, an amount of audio and video data to be provided, and/or any combination thereof. For example, the adaptive bitrate manager can privilege audio quality in such a way that if a reduced bitrate is specified, the adaptive bitrate manager can reduce the video bitrate first and defer reducing the audio bitrate as much as possible.
The adaptive bitrate manager obtains (506) audio and video media data. In some embodiments, the obtaining step 506 may occur before the allocating step 504 or the obtaining step 502. After the allocating step 504 and the obtaining step 506, the adaptive bitrate manager encodes (508) the audio and video media data according to their respective allocated bitrates specified at step 504.
After encoding the audio and video streams according to the allocated bit rates, the adaptive bit rate manager provides (510) the encoded audio and video media data for transmission to the terminal. In some embodiments, media packetization receives encoded audio and video media data and converts the data to packet format. In other embodiments, the data is received by a media multiplexer to create a media clip file to be sent to the player over TCP. RTP defines a standardized packet format for delivering audio and video over the internet, while TCP provides its own packetization protocol for general data, which can also be used for media streaming. After converting the data to packet format, the media packetization may then transmit the audio and video media packets to the terminal. After providing the encoded audio and video media data, the method may continue to end 512.
The methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the claims that follow. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein.
Claims (24)
1. A method, comprising:
providing the pseudo-streaming media data to the terminal;
receiving a Transmission Control Protocol (TCP) acknowledgement from the terminal;
estimating one or more network conditions of a network based at least in part on the TCP acknowledgement;
determining an optimal session bitrate based on the estimated one or more network conditions; and
providing pseudo-streaming media data to the terminal according to the optimal session bitrate.
2. The method of claim 1, wherein determining the optimal session bitrate further comprises:
establishing a stability criterion based on the estimated one or more network conditions;
determining the stability of the network; and
providing the optimal session bitrate based on a stability determination.
3. The method of claim 2, wherein establishing stability criteria further comprises: comparing a media time in transit with a round trip time estimate to assist in the stability determination.
4. The method of claim 2, wherein establishing stability criteria further comprises: comparing the received bit rate with the current session bit rate.
5. The method of claim 2, further comprising: maintaining or incrementally increasing the current bit rate when the stability of the network is considered normal.
6. The method of claim 2, further comprising: adjusting the current bit rate when the stability of the network is considered abnormal.
7. The method of claim 1, further comprising: allocating the optimal session bitrate between audio media data and video media data to produce an optimal audio bitrate and an optimal video bitrate; and encoding audio and video media data according to the optimal audio bitrate and the optimal video bitrate.
8. The method of claim 7, wherein providing media data to the terminal comprises: providing encoded audio media data and encoded video media data based on the optimal audio bitrate and the optimal video bitrate.
9. A method, comprising:
receiving a Transmission Control Protocol (TCP) acknowledgement from a terminal based on the terminal receiving pseudo-streaming media data;
estimating one or more network conditions of a network based at least in part on the TCP acknowledgement;
determining a stability of the network based on the estimation;
controlling a bit rate based on the determination; and
providing the bit rate to an encoder for transmitting the pseudo-streaming media data according to the provided bit rate.
10. The method of claim 9, wherein determining the stability of the network is based on establishing a stability criterion.
11. The method of claim 10, wherein establishing stability criteria further comprises: comparing a media time in transit with a round trip time estimate to assist in the stability determination.
12. The method of claim 10, wherein establishing stability criteria further comprises: comparing the received bit rate with the current session bit rate.
13. The method of claim 9, wherein controlling the bit rate comprises: maintaining or incrementally increasing the current bit rate when the stability of the network is considered normal.
14. The method of claim 9, wherein controlling the bit rate comprises: adjusting the current bit rate when the stability of the network is deemed to be abnormal.
15. A method, comprising:
receiving an optimal session bit rate based on information provided by a Transmission Control Protocol (TCP) acknowledgement;
allocating the optimal session bitrate between audio and video media to produce an optimal audio bitrate and an optimal video bitrate;
encoding audio and video media data according to the optimal audio bitrate and the optimal video bitrate;
multiplexing the encoded audio and video media data; and
providing the multiplexed audio and video data for transmission to the terminal.
16. The method of claim 15, further comprising: dropping frames of the encoded video data.
17. The method of claim 15, wherein allocating the optimal session bitrate between audio and video media is based on giving one of the audio media and the video media a higher privilege than the other.
18. A system, comprising:
a terminal having a media player configured to receive pseudo-streaming media data and to provide a Transmission Control Protocol (TCP) acknowledgement based on the received pseudo-streaming media data;
an adaptive bitrate manager configured to receive the TCP acknowledgement, estimate one or more network conditions based at least in part on the TCP acknowledgement, determine an optimal session bitrate based on the estimated one or more network conditions, and provide pseudo-streaming media data to the terminal based on the optimal session bitrate.
19. The system of claim 18, wherein the adaptive bitrate manager further comprises an adaptive bitrate controller to receive a receiver report and calculate an optimal session bitrate.
20. The system of claim 19, wherein the adaptive bitrate manager further comprises an encoder for obtaining the optimal session bitrate, allocating the optimal session bitrate between audio and video media to produce an optimal audio bitrate and an optimal video bitrate, encoding audio and video data according to the optimal audio bitrate and the optimal video bitrate, and providing the encoded audio and video data for transmission to the terminal.
21. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method for processing a Transmission Control Protocol (TCP) acknowledgement, the method comprising:
providing the pseudo-streaming media data to the terminal;
receiving the TCP acknowledgement from the terminal;
estimating one or more network conditions of a network based at least in part on the TCP acknowledgement;
determining an optimal session bitrate based on the estimated one or more network conditions; and
providing pseudo-streaming media data to the terminal according to the optimal session bitrate.
22. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method for processing a Transmission Control Protocol (TCP) acknowledgement, the method comprising:
receiving a Transmission Control Protocol (TCP) acknowledgement from a terminal based on the terminal receiving pseudo-streaming media data;
estimating one or more network conditions of a network based at least in part on the TCP acknowledgement;
determining a stability of the network based on the estimation;
controlling a bit rate based on the determination; and
providing the bit rate to an encoder for transmitting the pseudo-streaming media data according to the provided bit rate.
23. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method for handling an optimal session bitrate, the method comprising:
receiving the optimal session bitrate based on information provided by a Transmission Control Protocol (TCP) acknowledgement;
allocating the optimal session bitrate between audio and video media to produce an optimal audio bitrate and an optimal video bitrate;
encoding audio and video media data according to the optimal audio bitrate and the optimal video bitrate;
multiplexing the encoded audio and video media data; and
providing the multiplexed audio and video data for transmission to the terminal.
24. A terminal, comprising:
a buffer that receives pseudo-streaming media data packets transmitted over the network by the adaptive bitrate manager, and
a media player to receive the pseudo-streaming media data packets and to provide a Transmission Control Protocol (TCP) acknowledgement to the adaptive bitrate manager, the adaptive bitrate manager to receive the Transmission Control Protocol (TCP) acknowledgement, to estimate one or more network conditions of the network based at least in part on a receiver report, to determine an optimal session bitrate based on the estimated one or more network conditions, and to provide pseudo-streaming media data to the buffer according to the optimal session bitrate.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/416085 | 2009-03-31 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1170605A (en) | 2013-03-01 |