HK1234236A1 - Device and system for supporting dynamic adaptive streaming over hypertext transfer protocol - Google Patents
- Publication number
- HK1234236A1
- Authority
- HK
- Hong Kong
- Prior art keywords
- media content
- content segments
- quality metric
- segments
- representation
- Prior art date
Description
Background
The growth of multimedia services, including streaming and conversational services, is one of the key drivers of new mobile broadband technologies and standards. Digital video content is increasingly being consumed on mobile devices, and many video applications are widely used on mobile devices in everyday life. For example, online video streaming includes popular services such as YouTube and Hulu, while video recording and video conferencing include services such as Skype and Google Hangouts. In 2011, YouTube recorded over 1 trillion views globally, and roughly 10% of those views were accessed via a cell phone or tablet computer. As more smartphones, tablets, and other mobile computing devices are purchased, their use for video recording and video conferencing will increase dramatically. Given such high consumption demand for multimedia services and the development of media compression and wireless network infrastructure, it is worthwhile to enhance the multimedia service capabilities of future cellular and mobile broadband systems and to provide consumers with a high quality of experience (QoE), thereby ensuring ubiquitous access to video content and services from any location, at any time, using any device and technology.
Brief Description of Drawings
Features and advantages of the present disclosure will become apparent upon consideration of the detailed description taken in conjunction with the accompanying drawings, which illustrate, by way of example, the features of the present disclosure together; and wherein:
fig. 1 illustrates a block diagram of a Media Presentation Description (MPD) metadata file configuration according to an example;
FIG. 2 illustrates a block diagram of a hypertext transfer protocol (HTTP) flow, according to an example;
fig. 3 illustrates a block diagram of an energy characterization aware Radio Access Network (RAN) architecture for hypertext transfer protocol-based (HTTP-based) video streaming, according to an example;
fig. 4 is a diagram of a dynamic adaptive streaming over hypertext transfer protocol (DASH) Media Presentation Description (MPD) file generation process according to an example;
FIG. 5 illustrates quality variations between media content segments within a representation of media content segments according to an example;
fig. 6 is a diagram of a dynamic adaptive streaming over hypertext transfer protocol (DASH) Media Presentation Description (MPD) file generation process including an MPD post-processing technique according to an example;
fig. 7 depicts functionality of circuitry of a network device operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), according to an example;
fig. 8 depicts a flow diagram of a method for supporting dynamic adaptive streaming over hypertext transfer protocol (DASH), according to an example;
fig. 9 depicts functionality of circuitry of a network device operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), according to an example; and
fig. 10 illustrates a diagram of a wireless device (e.g., UE) according to an example.
Reference will now be made to the exemplary embodiments illustrated, and specific language will be used herein to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended.
Detailed Description
Before the present invention is disclosed and described, it is to be understood that this invention is not limited to the particular structures, process steps, or materials disclosed herein, but extends to equivalents thereof as would normally be recognized by those skilled in the relevant art. It is also to be understood that the terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting. Like reference symbols in the various drawings indicate like elements. The numerals provided in the flowcharts and processes are provided for clarity of explanation of the steps and operations and do not necessarily indicate a particular order or sequence.
Example embodiments
An initial overview of technical embodiments is provided below, and then specific technical embodiments are described in further detail later. This initial summary is intended to aid the reader in understanding the present technology more quickly, and is not intended to identify key features or essential features of the technology, nor is it intended to be limiting as to the scope of the claimed subject matter.
A technique for grouping media content segments of similar quality within a representation of the media content segments in a Media Presentation Description (MPD) file is described. For example, the MPD file may describe a first set of media content files in a first representation (e.g., a representation at a relatively high bit rate). In addition, the MPD file may describe a second set of media content files in a second representation (e.g., a representation at a relatively low bit rate). A representation can relate to a set of media content files at a defined quality level and/or a defined bit rate. A quality metric can be determined for each of a plurality of media content segments within the same representation. Media content segments having a quality metric below a selected threshold can be identified. For example, in a group of ten media content segments in the representation, one of the media content segments can have a relatively low quality (as evidenced by the quality metric for that media content segment being below the selected threshold). Media content segments in the representation having a quality metric below the selected threshold can be replaced with other media content segments. For example, media content segments having a quality metric below the selected threshold can be replaced by corresponding media content segments from a different representation. The different representation can include a set of media content files at a relatively high bit rate. The corresponding media content segments can cover substantially the same media time range as the media content segments being replaced. As a result, the media content segments used for the presentation can all have relatively similar quality levels (although the bit rate can vary slightly between media content segments).
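The replacement step described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the `representations` mapping, the function name, and the use of per-segment PSNR-like quality values are all assumptions made for the example.

```python
# Hypothetical sketch of segment replacement across representations.
# `representations` maps a representation id to per-segment quality metrics,
# one entry per media time range (values here resemble PSNR in dB).

def replace_low_quality_segments(representations, rep_id, threshold):
    """Return a per-segment source list for `rep_id`: each entry names the
    representation whose segment should be used for that media time range."""
    sources = []
    for i, quality in enumerate(representations[rep_id]):
        if quality >= threshold:
            sources.append(rep_id)  # segment quality is acceptable; keep it
            continue
        # Replace with the corresponding segment (same media time range)
        # from the representation where that segment scores best.
        best = max(representations, key=lambda r: representations[r][i])
        sources.append(best)
    return sources

representations = {
    "rep_500k":  [38.0, 31.0, 37.5, 38.2],
    "rep_1000k": [40.1, 39.6, 40.3, 40.0],
}
# Segment 1 of rep_500k falls below the threshold and is replaced:
print(replace_low_quality_segments(representations, "rep_500k", 35.0))
# -> ['rep_500k', 'rep_1000k', 'rep_500k', 'rep_500k']
```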
The modified MPD can be generated to include media content segments with relatively similar quality levels. The modified MPD can be transmitted to the client, where the modified MPD can provide substantially constant quality playback of the media content segments at the client.
In an alternative configuration, media content segments in the representation having a quality metric below a selected threshold can be re-encoded. Re-encoding the media content segments can increase the quality level such that the quality metric of the re-encoded media content segments can be above a selected threshold. A modified MPD may be generated to include the re-encoded media content segments. As a result, the media content segments in the representation (i.e., the re-encoded media content segments and the non-re-encoded media content segments) may have relatively similar quality levels.
Hypertext transfer protocol (HTTP) adaptive streaming (HAS) may be used as a form of multimedia delivery of internet video. HTTP-based delivery may provide reliability and deployment simplicity due to the widespread adoption of HTTP and the underlying protocols for HTTP, including Transmission Control Protocol (TCP)/Internet Protocol (IP). HTTP-based delivery can enable simple and easy streaming services by avoiding Network Address Translation (NAT) and firewall traversal problems. HTTP-based delivery or streaming may also provide the ability to use standard HTTP servers and caches instead of dedicated streaming servers. HTTP-based delivery may provide scalability due to minimal or reduced state information on the server side.
When using HAS to deliver internet multimedia content, a video client operating on a mobile device may be configured to perform a primary role in rate adaptation by selecting and requesting an appropriate video presentation level from a video server using HTTP GET or partial GET commands to retrieve data from a specified resource, such as a multimedia server. The video client first fills its buffer to a certain level before starting playback of streaming multimedia content, such as audio or video. This phase is called the start-up phase. Thereafter, the client begins playback of the buffered multimedia content. The quality and resolution of multimedia playback at the client device depends on the available link bandwidth. Video clients typically estimate the available link bandwidth based only on higher-layer throughput estimates, such as HTTP-level video streaming throughput or Transmission Control Protocol (TCP) throughput.
Multimedia streaming in high mobility environments can be challenging when fluctuations in network conditions (i.e., network variability) reduce the communication data rate associated with multimedia content. When an overloaded network causes a reduction in the communication data rate, the end user quality of experience (QoE) may also be reduced. For example, multimedia content received at the mobile device may have a lesser resolution or quality, and/or the multimedia content may be periodically interrupted or paused while being provided over an overloaded network.
The use of progressive download based streaming techniques in mobile networks with limited resources may be undesirable due to inefficient bandwidth utilization and poor end user quality of experience. As discussed in further detail below, a hypertext transfer protocol (HTTP) -based streaming service, such as dynamic adaptive streaming over HTTP (DASH), may be used to address the vulnerability of progressive download-based streaming.
Multimedia content streamed to a client, such as User Equipment (UE), may include a plurality of multimedia content segments. The multimedia content segments may each contain different encoded versions representing different quality levels of the multimedia content. Different encoded versions may allow clients to seamlessly adapt to changing network conditions. For example, when network conditions are good (i.e., network conditions are above a predetermined threshold), the client may request a multimedia content segment with higher video quality. When the network conditions are poor (i.e., the network conditions are below a predetermined threshold), the client may request a multimedia content segment with lower video quality. As a result, the client is still able to receive multimedia content segments (albeit of lower quality) when network conditions are poor and the likelihood of the adaptive media stream being interrupted can be reduced.
In DASH, the client may select the multimedia content segments with the highest bit rate that can still be downloaded at the client in time for media playback without causing a rebuffering event in the media playback. In other words, the client may not select multimedia content segments with a bit rate so high that the adaptive media stream is periodically interrupted in order to buffer or preload a portion of the media content onto the client before resuming media playback at the client. In one example, adverse network conditions may degrade the quality of the media content stream. Adverse network conditions may include coverage nulls, sudden bandwidth changes, packet loss, substantial delay variations, and the like. While adaptive streaming techniques may take current network conditions into account when calculating the available throughput and when determining an appropriate streaming bit rate based on the available throughput, smooth media playback at the client may not be guaranteed during sudden network changes and/or adverse network conditions.
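The bit-rate selection constraint described above (pick the highest representation that can still be downloaded in time for playback) can be sketched as follows. The `safety_factor` margin and the kbps units are illustrative assumptions, not part of the DASH standard.

```python
def select_bitrate(available_kbps, estimated_throughput_kbps, safety_factor=0.8):
    """Pick the highest representation bit rate the link can sustain without
    triggering a rebuffering event; fall back to the lowest representation
    when even that exceeds the throughput estimate."""
    usable = safety_factor * estimated_throughput_kbps
    candidates = [b for b in sorted(available_kbps) if b <= usable]
    return candidates[-1] if candidates else min(available_kbps)

print(select_bitrate([500, 1000, 1500], 1400))  # 0.8 * 1400 = 1120 -> 1000
print(select_bitrate([500, 1000, 1500], 100))   # nothing fits -> 500
```

The safety margin models the observation in the text that throughput estimates alone do not guarantee smooth playback under sudden network changes.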
Thus, to maintain a desired quality of experience of the adaptive media stream at the client, the client's planned route and the current network conditions along the planned route may be used to strategically cache multimedia content segments at the client, resulting in smoother media playback and an enhanced quality of experience for the client. The client may select a planned route (i.e., a geographical route that the client is about to travel). The client may be streaming media content (e.g., a movie) while traveling on the planned route. In one example, the client may include a mobile device located within a moving vehicle or a computing device of the vehicle. The client may receive the current network conditions for the planned route from a Channel Information Database (CID). The current network conditions may indicate certain locations along the planned route (e.g., tunnels, bridges, remote areas) that have corresponding network conditions below a predetermined threshold. The client may request additional media content segments of the media content (e.g., additional segments of a movie) from the media content server and then store the additional media content segments in a cache. When the client reaches a location along the planned route having network conditions below the predetermined threshold, the client may play back the media content stored in the cache. As a result, substantially continuous media playback may be provided at the client even during times when the current network conditions along the planned route fall below the predetermined threshold.
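A rough sketch of the route-aware caching decision described above might look like the following. The CID is modeled here as a plain dictionary of expected throughput per location; all location names, values, and the function name are hypothetical.

```python
def plan_prefetch(route, cid_readings, threshold_kbps):
    """Return the locations along the planned route whose expected network
    conditions fall below the threshold, i.e., the stretches where playback
    must come from segments cached in advance (illustrative logic)."""
    return [loc for loc in route
            if cid_readings.get(loc, float("inf")) < threshold_kbps]

cid_readings = {"tunnel_a": 120.0, "bridge_b": 900.0, "downtown": 2500.0}
route = ["downtown", "tunnel_a", "bridge_b"]
print(plan_prefetch(route, cid_readings, 500.0))  # -> ['tunnel_a']
```

The client would request extra segments before entering each returned location, matching the caching behavior described in the text.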
Wireless multimedia standard
A number of multimedia standards have been developed to enable multimedia to be transferred to, from, or between mobile computing devices. For example, in streaming video, the third generation partnership project (3GPP) has developed Technical Specification (TS) 26.234 (e.g., release 11.0.0), which describes packet-switched streaming services (PSS) based on the real-time streaming protocol (RTSP) for on-demand unicast streaming or live content. Furthermore, hypertext transfer protocol (HTTP)-based streaming services, including progressive download and dynamic adaptive streaming over HTTP (DASH), are described in 3GPP TS 26.247 (e.g., release 11.0.0). The 3GPP-based Multimedia Broadcast and Multicast Service (MBMS) specification TS 26.346 (e.g., release 11.0.0) specifies streaming and downloading techniques for multicast/broadcast content distribution. Thus, DASH/PSS/MBMS-based mobile computing devices, such as User Equipment (UEs), decode and present streamed video at the UE device. Support for the 3GP file format in 3GPP TS 26.244 (e.g., release 11.0.0) is mandated in all of these specifications to support file download and HTTP-based streaming use cases.
One example of a standard for conversational video communication, such as video conferencing, is provided in 3GPP TS 26.114 (e.g., release 11.0.0). The standard describes the multimedia telephony service over IMS (MTSI), which allows Internet Protocol (IP) multimedia subsystem (IMS) based networks to deliver advanced multimedia session services and content. IMS is standardized in 3GPP TS 26.140 (e.g., release 11.0.0). An MTSI-based transmitter UE terminal may capture and record video and then transmit the video to an MTSI-based receiver UE terminal over a 3GPP network. The receiver UE terminal may then decode and render the video. 3GPP TS 26.140 also enables video sharing using the multimedia sharing service (MMS), where support for the 3GP file format is provided.
The above-described standards are provided as examples of wireless multimedia standards that may be used to transfer multimedia files to, from, and/or between multimedia devices. These examples are not intended to be limiting. Additional criteria may be used to provide streaming video, conversational video, or video sharing.
Streaming media standards
A more detailed explanation of HTTP streaming, and of the DASH standard in particular, is provided herein in the context of embodiments of the present invention. This detailed explanation is not intended to be limiting. As will be further explained in the following paragraphs, embodiments of the present invention can be used to select and/or transmit multimedia having desired energy characteristics by enabling a mobile device, or a server in communication with the mobile device, to efficiently transmit multimedia to, from, and/or between mobile devices. Multimedia may be transmitted using standardized or non-standardized communication schemes.
Hypertext transfer protocol (HTTP) streaming may be used as a form of multimedia delivery of internet video. In HTTP streaming, a multimedia file may be partitioned into one or more segments and delivered to a client using the HTTP protocol. HTTP-based delivery may provide reliability and deployment simplicity due to the widespread adoption of both HTTP and the underlying protocols for HTTP, including the Transmission Control Protocol (TCP)/Internet Protocol (IP). HTTP-based delivery may enable simplified streaming services by avoiding Network Address Translation (NAT) and firewall traversal issues. HTTP-based delivery or streaming may also provide the ability to use standard HTTP servers and caches instead of dedicated streaming servers. HTTP-based delivery may provide scalability due to minimal or reduced state information on the server side. Examples of HTTP streaming technologies include Microsoft IIS Smooth Streaming, Apple HTTP Live Streaming, and Adobe HTTP Dynamic Streaming.
DASH is a standardized HTTP streaming protocol. As shown in fig. 1, DASH may specify different formats for Media Presentation Description (MPD) metadata files 102 that provide information about the structure and different versions of media content representations stored in the server, as well as segment formats. The MPD metadata file contains information about the initialization and media segments for the media player (e.g., the media player may look at the initialization segments to determine the container format and media timing information) to ensure that segments are mapped to the media presentation timeline for switching and synchronized presentations with other representations. DASH technology has also been standardized by other organizations, such as Moving Picture Experts Group (MPEG), open IPTV forum (OIPF), and hybrid broadcast broadband television (HbbTV).
A DASH client may receive multimedia content by downloading segments in a series of HTTP request response transactions. DASH may provide the ability to dynamically switch between different bit rate representations of media content as the bandwidth available to the mobile device changes. Thus, DASH may allow for rapid adaptation to changing network and wireless link conditions, user preferences, and device capabilities, such as display resolution, type of Central Processing Unit (CPU) employed, available memory resources, and so forth. Dynamic adaptation of DASH may provide users with better quality of experience (QoE) with shorter startup delay and fewer rebuffering events than other streaming protocols.
In DASH, Media Presentation Description (MPD) metadata 102 may provide information about the structure and different versions of the media content representations stored in the web/media server 212, as shown in fig. 2. In the example shown in fig. 1, the MPD metadata is temporally divided into periods having a predetermined length, such as 60 seconds in this example. Each period may include multiple adaptation sets 104. Each adaptation set may provide information about one or more media components with multiple encoding alternatives. For example, adaptation set 0 in this example may include various differently encoded audio alternatives, such as different bit rates, mono, stereo, surround, etc. In addition to providing different-quality audio for the multimedia presentation during the period, the adaptation set may also include audio in different languages. The different alternatives provided in an adaptation set are referred to as representations 106.
In fig. 1, adaptation set 1 is shown to provide video at different bit rates, such as 5 megabits per second (Mbps), 2 Mbps, 500 kilobits per second (kbps), or trick modes. Trick modes can be used for seeking, fast-forwarding, rewinding, or other changes of position in a multimedia stream file. Further, the video may also be available in different formats, such as two-dimensional (2D) video or three-dimensional (3D) video. Each representation 106 may include segmentation information 108. The segmentation information may include initialization information 110 and the actual media segment data 112. In this example, an MPEG-4 (MP4) file is streamed from a server to a mobile device. Although MP4 is used in this example, a variety of different codecs may be used, as previously described.
The multimedia in the adaptation set may be further divided into smaller segments. In the example of fig. 1, the 60-second video segment of adaptation set 1 is further divided into four sub-segments 112 of 15 seconds each. These examples are not intended to be limiting. The actual length of the adaptation set and of each media segment or sub-segment depends on the type of media, system requirements, the type of potential interference, and so forth. The actual media segments or sub-segments may range in length from less than one second to several minutes.
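As an illustration of the MPD structure described above (periods containing adaptation sets, which in turn contain representations), the following sketch parses a heavily simplified MPD-like document. A real MPD uses the `urn:mpeg:dash:schema:mpd:2011` namespace and many additional attributes, so this fragment is not schema-valid; the element names mirror fig. 1 only.

```python
import xml.etree.ElementTree as ET

# Heavily simplified MPD-like document: one 60-second Period, one video
# AdaptationSet, two Representations at different bit rates.
MPD_XML = """
<MPD>
  <Period id="0" duration="PT60S">
    <AdaptationSet id="1" contentType="video">
      <Representation id="v-5mbps" bandwidth="5000000"/>
      <Representation id="v-500kbps" bandwidth="500000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

root = ET.fromstring(MPD_XML)
# A client would enumerate the representations to learn the available
# bit-rate alternatives for adaptation:
reps = [(r.get("id"), int(r.get("bandwidth"))) for r in root.iter("Representation")]
print(reps)  # [('v-5mbps', 5000000), ('v-500kbps', 500000)]
```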
As shown in fig. 2, MPD metadata information may be transmitted to a client 220, such as a mobile device. The mobile device may be a wireless device configured to receive and display streaming media. In one embodiment, the mobile device may perform only a portion of this functionality, such as receiving streaming media and then transmitting it to another device or display device for presentation. The mobile device may be configured to run a client 220. The client may request the segment using an HTTP GET 240 message or a series of partial GET messages. The client may control the streaming session, such as managing on-time requests and smooth playout of segment sequences, or potentially adjusting the bit rate or other attributes to react to changes in the wireless link, device state, or user preferences.
Fig. 2 illustrates a DASH-based streaming framework. A media encoder 214 in the network/media server 212 may encode the input media from the audio/video input 210 into a format for storage or streaming. The media segmenter 216 may be used to separate the input media into a series of segments 232, which may be provided to the web server 218. The client 220 may request new data in the segment using an HTTP GET message 234 sent to a web server (e.g., an HTTP server).
For example, the web browser 222 of the client 220 may request multimedia content using an HTTP GET message 240. The web server 218 may provide the MPD 242 for the multimedia content to the client. The MPD may be used to convey the index of each segment and the respective location of the segments, as indicated by the associated metadata information 252. The web browser may pull the media from the server segment by segment according to the MPD 242, as shown at 236. For example, the web browser may request the first segment using an HTTP GET URL (frag 1 req) 244. A Uniform Resource Locator (URL) or universal resource locator may be used to tell the web server which segment the client is requesting 254. The web server may provide the first segment (i.e., segment 1) 246. For subsequent segments, the web browser can request segment i using an HTTP GET URL (frag i req) 248, where i is the integer index of the segment. As a result, the web server can provide segment i 250. The segments may be presented to the client via the media decoder/player 224.
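The segment-by-segment request sequence above (frag 1 req, then frag i req) can be sketched as URL construction plus an ordered list of GET targets. The `seg_<i>.m4s` naming scheme and the base URL are invented for this example; a real client derives segment URLs from the MPD itself.

```python
def segment_url(base_url, rep_id, index):
    """Build the request URL for segment `index` of representation `rep_id`.
    The path layout and `seg_<i>.m4s` naming are assumptions for this sketch."""
    return f"{base_url}/{rep_id}/seg_{index}.m4s"

# One HTTP GET per segment, issued in order (frag 1 req, frag 2 req, ...):
urls = [segment_url("http://example.com/media", "v-500kbps", i) for i in range(1, 4)]
for u in urls:
    print(u)
```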
Fig. 3 shows the flow of multimedia content 312 between an HTTP server 310 providing the multimedia content and a 3GPP client 338 operating on a mobile device, such as UE 336. The HTTP server may interface with a public or private network 322 (or the internet), which communicates with the core network 324 of a Wireless Wide Area Network (WWAN). In one embodiment, the WWAN may be a 3GPP LTE based network or an IEEE 802.16 based network (i.e., 802.16-2009). The core network may access a wireless network 330, such as an Evolved Packet System (EPS), via a Radio Access Network (RAN) 332. The RAN 332 may provide the multimedia content to the client operating on the UE 336 via a node (e.g., an Evolved Node B (eNB) 334).
The HTTP server 310 may be coupled to a channel information database 350. The channel information database 350 may include current network conditions for a plurality of geographic locations. The plurality of geographic locations may include particular roads, streets, neighborhoods, geographic areas, bridges, tunnels, and the like. The current network conditions may be based on real-time monitoring of the network conditions at the plurality of geographic locations. Accordingly, the channel information database 350 may be dynamically updated as the current network conditions change. Alternatively, the current network conditions may be inferred based on historical network condition information for the plurality of geographic locations. In yet another example, crowd-sourced network condition information may be used to determine the current network conditions.
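The channel information database behavior described above (real-time measurements, with a fall-back to historical information) might be sketched as follows. The class, its fields, and the throughput units are illustrative design choices, not taken from any specification.

```python
class ChannelInfoDatabase:
    """Toy Channel Information Database (CID): prefer a live measurement for
    a location, fall back to the historical average (illustrative design)."""

    def __init__(self):
        self.live = {}      # location -> latest measured throughput (kbps)
        self.history = {}   # location -> all past measurements (kbps)

    def report(self, location, kbps):
        """Record a real-time (e.g., crowd-sourced) measurement."""
        self.live[location] = kbps
        self.history.setdefault(location, []).append(kbps)

    def expire(self, location):
        """Drop a stale live reading; historical data is kept."""
        self.live.pop(location, None)

    def conditions(self, location):
        """Current network conditions: live if available, else the historical average."""
        if location in self.live:
            return self.live[location]
        past = self.history.get(location)
        return sum(past) / len(past) if past else None

cid = ChannelInfoDatabase()
cid.report("tunnel_a", 100.0)
cid.report("tunnel_a", 300.0)
print(cid.conditions("tunnel_a"))  # live reading: 300.0
cid.expire("tunnel_a")
print(cid.conditions("tunnel_a"))  # historical average: 200.0
```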
In DASH, the media content may be stored in different representations (e.g., corresponding to quality levels). Each representation may include a list of media content segments that may be requested by a client (e.g., a mobile device). Information about the different representations of each representation and the list of media content segments may be compiled in the MPD and downloaded by the client, and based on the MPD, the client may request the different media content segments from the server.
In one example, various post-processing operations may be performed on DASH-formatted content and the associated MPD file (i.e., the manifest file) in order to handle quality variations introduced by the encoding process. Video content characteristics often vary based on the nature of the content, which is one reason why encoders cannot always produce consistent quality while at the same time producing bitstreams at certain specified bit rates. For example, rapidly changing scenes with a relatively large amount of motion, such as in a sports video clip, may be difficult to encode at a consistent quality, and thus the quality of the encoded data may fluctuate significantly. As another example, transitions between scenes may be difficult to encode without introducing a certain level of quality variation. On the other hand, slowly changing scenes may be encoded with less quality variation because a relatively small number of bits is sufficient to represent such scenes.
Many commercial encoders (or video codecs) produce segments of encoded media content with varying levels of quality. A video codec is a device or software program that implements compression or decompression of digital video. Some examples of video codecs include H.265 or Moving Picture Experts Group (MPEG)-H High Efficiency Video Coding (HEVC), H.264 or MPEG-4 Advanced Video Coding (AVC), and H.263/MPEG-4 Part 2.
Fig. 4 is a diagram of an exemplary dynamic adaptive streaming over hypertext transfer protocol (DASH) Media Presentation Description (MPD) file generation process. In some examples, the DASH MPD file generation process may occur at a server (e.g., an edge server) in a Content Delivery Network (CDN) or an operator network. The input video may be received at the server. The input video may include media content such as a sports game or a news broadcast. The media content received at the server may include a single file (e.g., a file of a 2-hour news event). In addition, the input video may be an original uncompressed video signal. In block 402, a video/audio encoding process may be performed on the media content. The video encoding (or video transcoding) process may organize the media content into a digital format compatible with network players and mobile devices. In other words, the media content may undergo a video encoding process to convert the media content into a format that is viewable on various devices. Examples of video codecs may include H.265, H.264, Windows Media Video (WMV), and so forth. Examples of audio codecs include MPEG-1 or MPEG-2 Audio Layer III (MP3) and Windows Media Audio (WMA). In block 404, a video multiplexing process may be performed to interleave the audio content and the video content with each other.
In block 406, the media content (with interleaved video and audio) may undergo a segmentation process. In other words, the media content may be partitioned into a plurality of media content segments. For example, each media content segment may be 0.5 seconds long, 1 second long, 2 seconds long, and so on. In block 408, a DASH MPD describing the media content segments may be generated. The DASH MPD file generation process may be repeated for each bit rate specified by the content provider. Thus, the MPD may contain multiple representations, one for each bit rate. The bit rate may differ from one representation to another (e.g., 500 kilobits/second, 1000 kilobits/second, 1500 kilobits/second) in order to provide adaptive media content streaming. The DASH MPD may be transmitted to a client, and the client may use the DASH MPD to request particular media content segments from the server for playback at the client.
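The segmentation step above (partitioning the media timeline into fixed-length segments) reduces to a small amount of timeline arithmetic, sketched here. The tuple-of-(start, end) representation is an assumption made for illustration.

```python
import math

def build_segment_list(total_duration_s, segment_length_s):
    """Partition a media timeline into fixed-length segments; the final
    segment may be shorter when the duration is not an exact multiple."""
    count = math.ceil(total_duration_s / segment_length_s)
    segments = []
    for i in range(count):
        start = i * segment_length_s
        end = min(start + segment_length_s, total_duration_s)
        segments.append((start, end))
    return segments

# A 7-second clip split into 2-second segments:
print(build_segment_list(7.0, 2.0))
# -> [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 7.0)]
```

Repeating this over each encoded bit rate yields the per-representation segment lists that the generated MPD describes.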
FIG. 5 illustrates exemplary quality variations between media content segments within a representation of media content segments. As shown in FIG. 5, the quality level of a typical representation may generally remain constant, but may include higher quality and lower quality anomalies in the representation. Due to the nature of variable bit rate coding, the media content being coded, and the different coding techniques employed by the video coding process, the media content segments resulting from the coding, multiplexing, and splitting processes (as shown in fig. 4) may vary in quality level (e.g., a particular media content segment may have a greater or lesser quality level than other media content segments). In other words, the process of encoding an uncompressed video signal may cause quality variation.
In one example, fast-moving scenes in the video content and slow-moving scenes in the video content may be encoded using the same encoding technique. However, because fast-moving scenes require a relatively large number of bits while slow-moving scenes require a relatively small number of bits, fast-moving scenes may have a lower quality than slow-moving scenes, even if both scenes are in the same representation. For clients with different supported bit rates, quality variations from one media content segment to another media content segment may be noticeable at the client. In other words, the user of the client may notice a change in quality level between different scenes or frames, potentially resulting in a poor user experience.
Fig. 6 is a diagram of an exemplary dynamic adaptive streaming over hypertext transfer protocol (DASH) Media Presentation Description (MPD) file generation process including an MPD post-processing technique. In some examples, the DASH MPD file generation process with the MPD post-processing technique may occur at a server (e.g., an edge server) in a Content Delivery Network (CDN) or an operator network. The input video may be received at the server. The input video may include media content such as a sports game or a news broadcast. In block 602, a video/audio encoding process may be performed on the media content. In block 604, a video multiplexing process may be performed to interleave the audio content and the video content with each other. In block 606, the video content and the audio content (or media content stream) may undergo a segmentation process. In other words, the media content stream may be partitioned into a plurality of media content segments. In block 608, a DASH MPD describing the media content segments may be generated. In block 610, an MPD post-processing technique may be performed on the DASH MPD. Media content segments of similar quality within a representation may be grouped together and used to create a modified MPD 620. In one example, the MPD post-processing technique may be performed at a DASH encoder instead of at a video codec.
The quality metrics for each media content segment in each representation may be compared, for example, using a quality measurement tool 614. The quality metric may be an objective or subjective criterion used to determine a quality level of a media content segment. In general, media content quality (or video quality) refers to a formal or informal measure of perceived video degradation after the original video content passes through a video transmission or processing system (e.g., a video encoder). In other words, the quality metric may measure the difference between the original video signal, which is generally considered to be high quality (since the original video signal is not compressed), and the encoded (or otherwise transformed) video signal. One technique for assessing the quality of a digital video processing system (e.g., a video codec) is to calculate the signal-to-noise ratio (SNR) or peak signal-to-noise ratio (PSNR) between the original video signal and the signal that has passed through the video processing system. PSNR is a commonly used objective video quality metric. Other quality metrics may include Perceptual Evaluation of Video Quality (PEVQ), Structural Similarity (SSIM), and Czenakowski distance (CZD). A quality metric may be assigned to each media content segment or, alternatively, each media content segment may be divided into media content sub-segments, and each media content sub-segment may be assigned a quality metric.
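As a minimal illustrative sketch (not part of the original disclosure), the PSNR metric mentioned above can be computed from the mean squared error between the original and encoded signals. The frames here are modeled simply as equal-length sequences of 8-bit pixel samples:

```python
import math

def psnr(original, encoded, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length
    sequences of pixel samples; higher values indicate less degradation."""
    mse = sum((o - e) ** 2 for o, e in zip(original, encoded)) / len(original)
    if mse == 0:
        return math.inf  # signals are identical, no degradation
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A maximally degraded 8-bit signal yields 0 dB, while identical signals yield infinite PSNR; a quality measurement tool such as the one labeled 614 could apply such a metric per segment or per sub-segment.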
The quality metric for each media content segment may be compared to those of the other media content segments in the same representation. A media content segment in the representation is not modified if it has a substantially similar quality to the other media content segments in the representation. In other words, when the quality metric for each of the media content segments exceeds the selected quality threshold 612, the media content segments may be left unmodified. In some examples, the selected quality threshold 612 may be determined by a server or User Equipment (UE).
If the media content segments in the representation are below the selected quality threshold 612, these media content segments may be replaced in the representation. For example, these media content segments may be replaced by corresponding media content segments from different representations described in the MPD file. In some examples, the different representations may include a set of media content files at a relatively high bit rate or a relatively low bit rate. The quality metric of the corresponding media content segment may be greater than the selected quality threshold 612. The respective media content segments may be from substantially the same media time range in different representations. As a result, the media content segments used for the representations may have substantially similar qualities. Modified MPD 620 may be generated to include media content segments, each having relatively similar quality levels. The modified MPD 620 may be transmitted to the client, where the modified MPD may provide substantially constant quality playback of the media content segments at the client. A representation of the media content with minimal temporal variation in quality may be provided to the client. Thus, the client may access a given representation and experience a stable quality and a reduced amount of quality fluctuation during playback.
By way of non-limiting example, a 2 second media content segment may correspond to a video time code of 2:11:22 (i.e., two hours, eleven minutes, and twenty-two seconds) to 2:11:24. The 2 second media content segment may be included in a 4.5 megabits per second (Mbps) video stream. The quality metric for the 2 second media content segment may be determined to be below the quality threshold 612. The corresponding media content segment from a higher representation (e.g., the 2 second media content segment with a video time code of 2:11:22 to 2:11:24 in a 5.3 Mbps video stream) may replace the reduced-quality 2 second media content segment. Thus, media content segments with similar quality (despite different bit rates) may be grouped together in a representation. The media content segments may be mixed and matched to achieve a set of media content segments in all representations having substantially the same level of quality.
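The replacement step described above can be sketched as follows. This is an illustrative sketch only, using hypothetical data structures: each representation is a list of `(segment_uri, quality_metric)` pairs aligned by media time range, and a below-threshold segment is swapped for the corresponding segment of the first other representation that meets the threshold:

```python
def post_process(representations, target, threshold):
    """For each segment index in the target representation, substitute any
    segment whose quality metric is below the threshold with the
    corresponding segment (same media time range) from another
    representation that meets the threshold.

    representations: dict mapping a representation id to a list of
    (segment_uri, quality_metric) tuples, aligned by segment index.
    """
    result = []
    for i, (uri, quality) in enumerate(representations[target]):
        if quality >= threshold:
            result.append((uri, quality))
            continue
        # look for a corresponding segment of acceptable quality elsewhere
        for rep_id, segments in representations.items():
            if rep_id == target:
                continue
            alt_uri, alt_quality = segments[i]
            if alt_quality >= threshold:
                result.append((alt_uri, alt_quality))
                break
        else:
            # no better corresponding segment found; keep the original
            result.append((uri, quality))
    return result
```

The resulting segment list could then be written into the modified MPD 620 in place of the original segment list for that representation.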
In one example, media content segments that are grouped together may have similar quality levels but different bit rates. For example, relatively high bit rate media content segments can be mixed with relatively low bit rate media content segments when their quality levels are substantially similar. The client may operate more efficiently when the bit rate of the media content segments fluctuates less; therefore, a minimal amount of bit rate fluctuation may be desired for the client. On the other hand, segments with varying quality levels may be undesirable to the user of the client. Thus, a trade-off may be made between achieving a stable bit rate and achieving a stable video quality.
In an alternative configuration, media content segments in the representation having quality metrics below the selected quality threshold 612 may be re-encoded. For example, a media content segment may be re-encoded using a different encoder configuration. The media content segments may be re-encoded using a video codec (e.g., H.264) or an audio codec (e.g., MP3). Re-encoding the media content segments may increase the quality level such that the quality metric of the re-encoded media content segments is above the quality threshold 612. In other words, a media content segment may be re-encoded and the quality metric of the re-encoded media content segment may be determined. If the quality metric is now above the quality threshold 612, the re-encoded media content segment may be included in the MPD. As a result, the media content segments in the representation (i.e., both the re-encoded media content segments and the media content segments that were not re-encoded) may have relatively similar quality levels.
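The re-encoding loop in this alternative configuration can be sketched as below. The `encode` and `measure` callables are hypothetical stand-ins for a real codec invocation and a quality measurement tool; the sketch tries successive encoder configurations until the threshold is met:

```python
def ensure_quality(segment, threshold, configs, encode, measure):
    """Re-encode a segment with successive encoder configurations until its
    quality metric meets the threshold; keep the best attempt otherwise."""
    best, best_quality = segment, measure(segment)
    for config in configs:
        if best_quality >= threshold:
            break  # threshold already met; stop re-encoding
        candidate = encode(segment, config)
        quality = measure(candidate)
        if quality > best_quality:
            best, best_quality = candidate, quality
    return best, best_quality
```

A server could apply such a routine only to segments flagged as below the quality threshold 612, leaving the remaining segments untouched.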
In one example, the modified MPD 620 may be generated for a particular device type (or target device), as the quality threshold 612 may depend on the type of device being used by the user. For example, a high resolution 12 inch display screen may have a greater number of pixels than a 6 inch display screen; thus, a video with acceptable quality on a 6 inch display screen may be unacceptable on a 12 inch display screen. Therefore, for a given target device with known capabilities (e.g., screen size, screen resolution), MPD post-processing may be performed for that particular target device. The media content may be re-encoded at the DASH level for each device type. Media content for a television may be encoded differently than media content for a smartphone or tablet computer. In one example, a client subscribing to a premium subscription plan can access media content encoded for that particular client.
In another example, MPD post-processing may be used to create a new representation of the media content from a combination of existing representations, with new bit rate values that are better suited to a particular client. For example, media content segments from a 500 Kbps representation and media content segments from a 1000 Kbps representation (i.e., of the same media content) can be combined to create a 750 Kbps representation of the media content without having to transcode the media content.
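One way to realize the 500/1000 Kbps combination above is to alternate time-aligned segments from the two existing representations, yielding an average rate roughly midway between them with no transcoding. The even/odd alternation policy here is an assumed illustration, not specified by the original; segments are modeled as per-segment sizes in bits:

```python
def blend_representations(rep_low, rep_high):
    """Create a new segment list by alternating segments of two
    time-aligned representations; the average bit rate of the result
    falls roughly midway between the two inputs, with no transcoding."""
    return [high if i % 2 else low
            for i, (low, high) in enumerate(zip(rep_low, rep_high))]

def average_bitrate(segment_sizes_bits, duration_s=2.0):
    # total bits divided by total duration, assuming fixed-duration segments
    return sum(segment_sizes_bits) / (len(segment_sizes_bits) * duration_s)
```

For 2 second segments of 1,000,000 bits (about 500 Kbps) and 2,000,000 bits (about 1000 Kbps), alternating produces an average of about 750 Kbps, matching the example in the text.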
Another example provides functionality 700 of circuitry of a network device operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), as shown in the flow diagram of fig. 7. The functions may be implemented as a method or the functions may be executed as instructions on a machine, where the instructions are included on at least one computer readable medium or one non-transitory machine readable storage medium. The circuitry may be configured to identify a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file, as in block 710. The circuitry may be configured to determine a quality metric for each of the plurality of media content segments in the defined representation described in the MPD file, as in block 720. The circuitry may be configured to identify a media content segment in the defined representation whose determined quality metric is below a selected threshold, as in block 730. Further, the circuitry may be configured to replace the identified media content segments with corresponding media content segments from different representations described in the MPD file to form a modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold in order to provide substantially constant quality playback of the media content segments in the defined representation, as in block 740.
In one example, the circuitry may be further configured to transmit the modified MPD file to a client device that supports DASH. In another example, the circuitry may also be configured to generate a modified MPD for a particular device type. In yet another example, the network device is located in a Content Delivery Network (CDN) or in an operator network.
In one aspect, the circuitry may be further configured to identify the quality metric for a media content segment using at least one of a bit rate parameter or a quality parameter. In another aspect, the corresponding media content segments are from substantially the same media time range, in a different representation, as the media content segments in the defined representation. In yet another aspect, the circuitry may be further configured to re-encode a media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold.
Another example provides a method 800 for supporting dynamic adaptive streaming over hypertext transfer protocol (DASH), as shown in the flow diagram of fig. 8. The method may be performed as instructions on a machine, where the instructions are included on at least one computer-readable medium or one non-transitory machine-readable storage medium. The method may include the operation of determining, at a network device, a quality metric for each of a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file, as in block 810. The method may include the operation of identifying a media content segment in the defined representation whose determined quality metric is below a selected threshold, as in block 820. The method may include the operation of replacing the identified media content segments with corresponding media content segments from different representations described in the MPD file to form a modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold in order to provide substantially constant quality playback of the media content segments in the defined representation, as in block 830.
In one example, the method may include an operation of transmitting the modified MPD file from the network device to a client device that supports DASH. In another example, the method may include the operation of receiving, at a network device, a selected threshold value for a quality metric from a User Equipment (UE). In yet another example, the method may include an operation of generating a modified MPD for a particular device type.
In one configuration, the network device is located in a Content Delivery Network (CDN) or an operator network. In another configuration, the method further includes identifying the quality metric for a media content segment using at least one of a bit rate parameter or a quality parameter. In yet another configuration, the corresponding media content segments are from substantially the same media time range, in a different representation, as the media content segments in the defined representation. Additionally, the method may include the operation of re-encoding a media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold.
Another example provides functionality 900 of circuitry of a network device operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), as shown in the flow diagram of fig. 9. The functionality may be implemented as a method or the functionality may be executed as instructions on a machine, where the instructions are included on at least one computer readable medium or one non-transitory machine readable storage medium. The circuitry may be configured to identify a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file, as in block 910. The circuitry may be configured to determine a quality metric for each of the plurality of media content segments in the defined representation described in the MPD file, as in block 920. The circuitry may be configured to identify a media content segment in the defined representation whose determined quality metric is below a selected threshold, as in block 930. The circuitry may be further configured to re-encode the media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold, as in block 940. Moreover, the circuitry may be further configured to generate a modified MPD to include the re-encoded media content segments so as to provide substantially constant quality playback of the media content segments in the defined representation, as in block 950.
In one example, the circuitry may be further configured to transmit the modified MPD file to a client device that supports DASH. In another example, the circuitry may also be configured to generate a modified MPD for a particular device type. In yet another example, the network device is located in a Content Delivery Network (CDN) or in an operator network.
In one aspect, the circuitry may be further configured to identify a quality metric for the media content segment using at least one of a bit rate parameter or a quality parameter. In another aspect, the circuitry may be further configured to replace the identified media content segments with corresponding media content segments from different representations described in the MPD file to form a modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold. In yet another aspect, the respective media content segments are from substantially the same media time range in a different representation than the media content segments in the defined representation.
Fig. 10 provides an example illustration of a wireless device, such as a User Equipment (UE), Mobile Station (MS), mobile wireless device, mobile communication device, tablet, handset, or other type of wireless device. The wireless device may include one or more antennas configured to communicate with a node or transmission station, such as a Base Station (BS), evolved Node B (eNB), baseband unit (BBU), Remote Radio Head (RRH), Remote Radio Equipment (RRE), Relay Station (RS), Remote Radio Unit (RRU), Central Processing Module (CPM), or other type of Wireless Wide Area Network (WWAN) access point. The wireless device may be configured to communicate using at least one wireless communication standard including 3GPP LTE, WiMAX, High Speed Packet Access (HSPA), Bluetooth, and WiFi. The wireless device may communicate using separate antennas for each wireless communication standard or shared antennas for multiple wireless communication standards. The wireless device may communicate in a Wireless Local Area Network (WLAN), a Wireless Personal Area Network (WPAN), and/or a WWAN.
Fig. 10 also provides an illustration of a microphone and one or more speakers, which may be used for audio input and output from the wireless device. The display screen may be a Liquid Crystal Display (LCD) screen, or other types of display screens such as Organic Light Emitting Diode (OLED) displays. The display screen may be configured as a touch screen. The touch screen may use capacitive, resistive, or another type of touch screen technology. The application processor and the graphics processor may be coupled to internal memory to provide processing and display capabilities. The non-volatile memory port may also be used to provide data input/output options to a user. The non-volatile memory port may also be used to expand the memory capabilities of the wireless device. The keyboard may be integrated with or wirelessly connected to the wireless device to provide additional user input. A touch screen may also be used to provide a virtual keyboard.
Various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, compact disc read only memories (CD-ROMs), hard drives, non-transitory computer-readable storage media, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. The circuitry may include hardware, firmware, program code, executable code, computer instructions, and/or software. The non-transitory computer-readable storage medium may be a computer-readable storage medium that does not include a signal. In the case of program code execution on programmable computers, the computing device may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements may be Random Access Memory (RAM), erasable programmable read-only memory (EPROM), flash drives, optical drives, magnetic hard drives, solid state drives, or other media for storing electronic data. The nodes and wireless devices may also include a transceiver module (i.e., transceiver), a counter module (i.e., counter), a processing module (i.e., processor), and/or a clock module (i.e., clock) or timer module (i.e., timer). One or more programs that may implement or utilize the various techniques described herein may use an Application Programming Interface (API), reusable controls, and the like. Such programs may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. 
In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.
It should be appreciated that many of the functional units described in this specification have been labeled as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom Very Large Scale Integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
Modules may also be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network. The module may be passive or active, including an agent operable to perform a desired function.
Reference throughout this specification to "an example" or "exemplary" means that a particular feature, structure, or characteristic described in connection with the example is included in at least one embodiment of the present invention. Thus, the appearances of the phrase "in an example" or the word "exemplary" in various places throughout this specification are not necessarily all referring to the same embodiment.
As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no single member of such list should be construed as a de facto equivalent of any other member of the same list solely based on its presentation in a common group without indications to the contrary. In addition, reference may be made herein to various embodiments and examples of the invention and to alternatives to various components thereof. It should be understood that such embodiments, examples, and alternatives are not to be construed as actual equivalents of each other, but are to be considered as separate and autonomous representations of the invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of layouts, distances, network examples, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, arrangements, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
While the foregoing examples illustrate the principles of the invention in one or more particular applications, it will be apparent to those of ordinary skill in the art that numerous modifications in form, usage and implementation details may be made without departing from the principles and concepts of the invention. Accordingly, it is not intended that the invention be limited, except as by the claims set forth below.
Claims (22)
1. A network apparatus operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), the network apparatus having circuitry configured to:
identifying a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file;
determining a quality metric for each of the plurality of media content segments in the defined representation described in the MPD file;
identifying media content segments in the defined representation, wherein the determined quality metric is below a selected threshold; and
replacing the identified media content segments with corresponding media content segments from different representations described in the MPD file to form a modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold in order to provide substantially constant quality playback of the media content segments in the defined representation.
2. The circuitry of claim 1, further configured to communicate the modified MPD file to a client device that supports DASH.
3. The circuitry of claim 1, further configured to generate the modified MPD for a particular device type.
4. The circuitry of claim 1, wherein the network device is located in a Content Delivery Network (CDN) or in an operator network.
5. The circuitry of claim 1, further configured to identify the quality metric for the media content segment using at least one of a bit rate parameter or a quality parameter.
6. The circuitry of claim 1, wherein the corresponding media content segments are from substantially the same media time range in the different representation as compared to the media content segments in the defined representation.
7. The circuitry of claim 1, further configured to re-encode a media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold.
8. A method for supporting dynamic adaptive streaming over hypertext transfer protocol (DASH), the method comprising:
determining, at a network device, a quality metric for each of a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file;
identifying media content segments in the defined representation, wherein the determined quality metric is below a selected threshold; and
replacing the identified media content segments with corresponding media content segments from different representations described in the MPD file to form a modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold in order to provide substantially constant quality playback of the media content segments in the defined representation.
9. The method of claim 8, further comprising transmitting the modified MPD file from the network device to a client device that supports DASH.
10. The method of claim 8, further comprising receiving, at the network device, the selected threshold value of the quality metric from a User Equipment (UE).
11. The method of claim 8, further comprising generating the modified MPD for a particular device type.
12. The method of claim 8, wherein the network device is located in a Content Delivery Network (CDN) or in an operator network.
13. The method of claim 8, further comprising identifying the quality metric for the media content segment using at least one of a bit rate parameter or a quality parameter.
14. The method of claim 8, wherein the respective media content segments are from substantially the same media time range in the different representation as compared to the media content segments in the defined representation.
15. The method of claim 8, further comprising re-encoding a media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold.
16. A network apparatus operable to support dynamic adaptive streaming over hypertext transfer protocol (DASH), the network apparatus having circuitry configured to:
identifying a plurality of media content segments in a defined representation described in a Media Presentation Description (MPD) file;
determining a quality metric for each of the plurality of media content segments in the defined representation described in the MPD file;
identifying a media content segment in the defined representation, wherein the determined quality metric is below a selected threshold;
re-encoding the media content segment in the defined representation whose determined quality metric is below the selected threshold, such that the determined quality metric of the re-encoded media content segment is above the selected threshold; and
generating a modified MPD to include re-encoded media content segments to provide substantially constant quality playback of the media content segments in the defined representation.
17. The circuitry of claim 16, further configured to communicate the modified MPD file to a client device that supports DASH.
18. The circuitry of claim 16, further configured to generate the modified MPD for a particular device type.
19. The circuitry of claim 16, wherein the network device is located in a Content Delivery Network (CDN) or in an operator network.
20. The circuitry of claim 16, further configured to identify the quality metric for the media content segment using at least one of a bit rate parameter or a quality parameter.
21. The circuitry of claim 16, further configured to replace the identified media content segments with corresponding media content segments from different representations described in the MPD file to form the modified MPD file, wherein the quality metric of the corresponding media content segments is greater than the selected threshold.
22. The circuitry of claim 21, wherein the corresponding media content segments are from substantially the same media time range in the different representation as compared to the media content segments in the defined representation.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/494,192 | 2014-09-23 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1234236A1 true HK1234236A1 (en) | 2018-02-09 |
| HK1234236B HK1234236B (en) | 2021-03-05 |