WO2017161122A1 - System for live video streaming using delay-aware fountain codes - Google Patents
- Publication number
- WO2017161122A1 (PCT/US2017/022715)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- daf
- data
- node
- future
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/35—Unequal or adaptive error protection, e.g. by providing a different level of protection according to significance of source information or by adapting the coding according to the change of transmission channel characteristics
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/37—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
- H03M13/373—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35 with erasure correction and erasure determination, e.g. for packet loss recovery or setting of erasures for the decoding of Reed-Solomon codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/37—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
- H03M13/3761—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35 using code combining, i.e. using combining of codeword portions which may have been transmitted separately, e.g. Digital Fountain codes, Raptor codes or Luby Transform [LT] codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/63—Joint error correction and other techniques
- H03M13/6312—Error control coding in combination with data compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/115—Selection of the code volume for a coding unit prior to coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/177—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2402—Monitoring of the downstream path of the transmission network, e.g. bandwidth available
Definitions
- FEC Forward Error Correction
- fountain codes such as Luby transform (LT) code [2] and Raptor code [3].
- LT Luby transform
- Raptor code [3]
- fountain codes encoded packets are generated from a given set of native packets, such that the original file can ideally be recovered as long as the number of the received packets is equal to or only slightly larger than the number of native packets.
- the RaptorQ codes [4] For example, given a file consisting of a sufficiently large number of native packets k , with the latest generation of fountain codes, the RaptorQ codes [4], the chance of decoding failure is less than 1% upon receiving k packets, and less than one in a million upon receiving k + 2 packets [5].
- a major advantage of fountain codes is ratelessness, i.e., the coding rate can automatically adapt to the wireless condition without demanding acknowledgments (ACKs) or retransmissions.
- traditional fountain codes are designed for file transfer rather than video streaming. Under the current fountain codes paradigm, users may not start watching a video until the whole video file is successfully decoded.
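The recovery property described above (decoding once roughly k coded packets arrive, in any order) can be illustrated with a toy rateless code. The sketch below is a simplified stand-in, not the LT, Raptor, or RaptorQ constructions cited in the text: it uses dense random GF(2) combinations and Gaussian-elimination decoding, which demonstrates ratelessness and the near-k decoding threshold but lacks the efficient encoding/decoding structure of real fountain codes.

```python
import random

def encode_packet(native, rng):
    """One coded packet: XOR of a random nonempty subset of native packets,
    with the chosen subset recorded as a bitmask (a dense GF(2) combination)."""
    k = len(native)
    mask = rng.getrandbits(k) or 1
    payload = 0
    for i in range(k):
        if (mask >> i) & 1:
            payload ^= native[i]
    return mask, payload

def decode(packets, k):
    """Gaussian elimination over GF(2): returns the k native packets once the
    received masks span the full space, or None while the rank is still < k."""
    basis = {}                          # pivot bit -> (mask, payload)
    for mask, payload in packets:
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, payload)
                break
            bm, bp = basis[pivot]
            mask ^= bm
            payload ^= bp
    if len(basis) < k:
        return None
    out = [0] * k                       # back-substitute in ascending pivot order
    for pivot in sorted(basis):
        mask, payload = basis[pivot]
        for j in range(pivot):
            if (mask >> j) & 1:
                payload ^= out[j]
        out[pivot] = payload
    return out
```

With k native packets, this sketch typically decodes after k to slightly more than k receptions, mirroring the overhead behavior quoted above (the specific failure probabilities cited for RaptorQ apply to that code, not to this illustration).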
- window-based fountain codes In order to accommodate fountain codes to video streaming, several window-based fountain codes have been proposed.
- block coding [6], [7] divides the video file into fixed-length data blocks and separately sends them using fountain codes;
- expanding window [11], [12] expands the previous window to include new packets.
- the block size is fixed in block coding, virtually extended in sliding window, and keeps growing in expanding window.
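The three window styles above can be contrasted with a small sketch. The function and parameter names are illustrative simplifications, not the exact constructions of [6]-[12]:

```python
def window_frames(scheme, w, W, step=1):
    """Frame indices covered by the w-th window (0-based) under each of the
    three window-based fountain coding styles (illustrative simplification)."""
    if scheme == "block":        # fixed-length, non-overlapping blocks
        start = w * W
        return range(start, start + W)
    if scheme == "sliding":      # fixed-length window advancing by `step`
        start = w * step
        return range(start, start + W)
    if scheme == "expanding":    # window always starts at 0 and keeps growing
        return range(0, W + w * step)
    raise ValueError(scheme)
```

For the same window index, block coding jumps ahead by whole blocks, the sliding window advances gradually while keeping its length, and the expanding window only grows.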
- a network system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node via a relay node may comprise a first node configured to: predict at least one future video encoding characteristic of a video source over a finite number of future frames, encode a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic, and transmit the plurality of video data packets, wherein the video source is a video source of the plurality of video data packets.
- At least one computer-readable storage medium encoded with executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method for transmitting a video stream over a data link from a source node to a sink node via a relay node.
- the method may comprise: predicting at least one future video encoding characteristic of a video source over a finite number of future frames; encoding a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic; and transmitting the plurality of video data packets, wherein the video source is a video source of the plurality of video data packets.
- FIG. 1 is a block diagram illustrating an exemplary source node, relay nodes, and a sink
- FIG. 2 is a flowchart of an exemplary method of increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node according to some embodiments.
- FIG. 3 is a flowchart of an additional exemplary method of increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node according to some embodiments.
- FIG. 4 is a flowchart of an exemplary fountain code process according to some embodiments.
- FIG. 5 is a set of charts illustrating a comparison between bit rates and sampling distributions using (a) foreman, (b) DAF-O, (c) DAF-M, and (d) DAF according to some embodiments.
- FIG. 6 is a chart illustrating exemplary comparisons of DAF-L, DAF-O, DAF-M, and DAF according to some embodiments.
- FIG. 7 is a chart illustrating exemplary comparisons of DAF-L, DAF-O, DAF-M, DAF, and Block coding according to some embodiments.
- FIG. 8 is a set of charts illustrating exemplary comparisons of DAF, DAF-L, DAF-O, and DAF-M according to some embodiments.
- FIG. 9 is a chart illustrating a comparison of exemplary fountain code schemes according to some embodiments.
- FIG. 10 is a diagram illustrating a computer system on which some embodiments of the invention may be implemented.
- window-based fountain codes may introduce extra play delay due to window size and decoding time.
- window-based fountain codes for video streaming may trade delay for higher decoding performance. The reason is that larger block size may introduce longer delay, but may also provide higher coding gain at the same time [13].
- video streaming applications may have different levels of delay tolerance.
- Some applications may be very delay- sensitive, such as live video chat, cloud gaming, etc., all of which are characterized by bi-directional communication [14].
- Some uni-directional applications, on the other hand, have a looser tolerance for end-to-end delay, such as TV broadcasting and Internet live streams. Others, such as the streaming of pre-recorded content, are the least sensitive to delay.
- DAF codes may be suitable for video streaming, as proposed in [10]. Different from the other sliding window fountain codes schemes, DAF codes do not treat the sliding windows as homogeneous. By adaptively selecting the window length and adjusting the sampling pattern according to the ongoing video bit rate, DAF codes may deliver significantly higher video decoding ratio than existing schemes.
- the inventors have recognized and appreciated, however, that the high-complexity global optimization process of DAF codes may prevent its application in delay-sensitive video streaming (e.g., live streaming). As a result, the inventors have recognized and appreciated that reducing complexity while maintaining relatively high coding performance may improve the video watching experience for delay-sensitive video streaming. The inventors have recognized and appreciated that using Model Predictive Control (MPC) may be most appropriate.
- MPC Model Predictive Control
- MPC has developed considerably over recent years. As an advanced method of process control, it has been successfully used in the control processes of chemical plants, oil refineries, power plants, etc. [15]. There are also successful applications in academia, ranging from energy management [16] to automatic vehicle control [17]. Despite its popularity in the automatic control community, MPC has never been used in the domain of multimedia communication. The inventors have recognized and appreciated that integrating MPC with DAF codes may improve video streaming performance while maintaining low complexity. In MPC, actions may be optimized according to a state sequence over a horizon. Based on that idea, two schemes are discussed herein: DAF-M and DAF-O.
- DAF-M may improve on the adjustment of the sampling pattern for fountain codes according to the ongoing video bit rate, so as to deliver a consistent video watching experience.
- the computational complexity may be affordable for live streaming because the objective function may be minimized over a limited-size horizon, which does not grow with the video length.
- DAF-O may be implemented based on the prediction of future bit rate.
- DAF-O may provide a decoding ratio that is significantly higher than existing online video streaming schemes.
- FIG. 1 is a diagram illustrating a system 100 that may employ techniques for increasing data throughput and decreasing transmission delay from a source node to a sink node via a relay node as described herein.
- a source node 110 (which may be referred to as first node) may encode data packets for transmission.
- the source node 110 may encode the data packets using fountain coding (as illustrated at stage 210 of FIG. 2).
- any suitable coding including rateless coding, may be used to encode the data packets.
- the source node 110 may also transmit the data packets to a first relay node 130 via connection 120 (as illustrated at stage 220 of FIG. 2), which may be a wireless connection.
- connection 120 as illustrated at stage 220 of FIG. 2
- any suitable connection or communication technology may be used to communicate among the nodes.
- the first relay node 130 may receive at least one of the data packets from the source node 110. In addition, the first relay node 130 may relay or transmit the data packets to a second relay node 150 via connection 140, which may be a wireless connection. The second relay node 150 may receive at least one of the data packets from the first relay node 130. In addition, the second relay node 150 may relay or transmit the data packets to a sink node 170 via connection 160, which may be a wireless connection.
- source node 110 may be a server, such as a streaming video server and/or a live streaming video server streaming live video content.
- source node 110 may include a network ingress point, such as a gateway to a wireless network (e.g., a base station in a wireless network).
- sink node 170 may be a client, such as a video receiver and/or playback device.
- sink node 170 may be a client of the server referred to as source node 110.
- Sink node 170 may be a wireless device, such as a smartphone, tablet, laptop computer, or desktop computer.
- relay nodes such as first relay node 130 and/or second relay node 150, may include network routers and/or network switches.
- relay nodes such as first relay node 130 and/or second relay node 150, may include hubs, and/or any other suitable components.
- relay nodes may include other cell transceivers in a cellular network, such as a 5G network.
- the first relay node 130 and/or the second relay node 150 may regenerate, re-encode, and relay the data packets conditionally, based on the quantity of the data packets received at the given relay node. For example, the first relay node 130 and/or the second relay node 150 may receive a subset of the data packets, and based on the subset of the data packets, the first relay node 130 and/or the second relay node 150 may regenerate the data packets, re-encode the regenerated data packets, and transmit the regenerated, re-encoded data packets.
- the sink node 170 may receive one or more data packets from the second relay node 150. If the sink node 170 has received a sufficient quantity of the data packets, the sink node 170 may regenerate and decode the data packets.
- FIG. 1 shows only two relay nodes, the first relay node 130 and the second relay node 150. This number of relay nodes is shown for simplicity of illustration. It should be appreciated that a network system may have many more nodes and relay nodes.
- source node 110 may predict at least one future video encoding characteristic of a video source over a finite number of future frames (as illustrated at stage 210 of FIG. 2).
- a video encoding characteristic may include any characteristic or property of how a video source is encoded, such as frame rate, number of frames in a group of pictures, video bit rate, and so on.
- source node 110 may encode a plurality of video data packets using rateless coding (as described above) based on the predicted at least one future video encoding characteristic of the video source (as illustrated at stage 220 of FIG. 2) by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic (as illustrated at stage 225 of FIG. 2).
- the video source may be a video source of the plurality of video data packets. Additionally, source node 110 may transmit the video data packets. For example, source node 110 may transmit the video data packets over a data link to a second node, which may be first relay node 130, second relay node 150, and/or sink node 170 (as illustrated at stage 220 of FIG. 2).
- the second node may be a sink node configured to receive one or more of the plurality of video data packets from the streaming video server via at least one relay node, or at least one relay node configured to receive at least one of the plurality of video data packets from the streaming video server.
- the data link is at least partially wireless.
- the rateless coding may comprise fountain coding.
- the video source may include video data, such as a video file.
- at least one video data packet of the plurality of video data packets comprises at least 100 bits, although any number of bits may be used.
- the plurality of overlapping sliding data windows may be non-homogeneous, as discussed herein. For example, the plurality of overlapping sliding data windows may collectively have more than one length and/or more than one sampling distribution.
- source node 110 may obtain at least one video encoding characteristic by preprocessing the video source, and source node 110 may use the at least one video encoding characteristic as well as the predicted at least one future video encoding characteristic.
- source node 110 may predict the at least one future video encoding characteristic based on at least one video encoding characteristic of the video source.
- adaptively adjusting the at least one parameter of the rateless coding may comprise adaptively selecting a first length of a first data window and a second length of a second data window of a plurality of overlapping sliding data windows (as illustrated at stage 226 of FIG. 3). Additionally, the selecting may use model predictive control based on the predicted at least one future video encoding characteristic. In some embodiments, the selecting may be based on a number of bits in frames of the video source and/or on a first number of frames in the video source.
- the first length may comprise a second number of frames that the first data window can accommodate, and the second length may comprise a third number of frames that the second data window can accommodate.
- source node 110 may store the first length of the first data window in a header of a first packet of the first data window and may store the second length of the second data window in a header of a second packet of the second data window.
- source node 110 may store the first sampling distribution in the header of the first packet of the first data window as a first slope factor and may store the second sampling distribution in the header of the second packet of the second data window as a second slope factor.
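A minimal sketch of the header storage described above, assuming a hypothetical layout (the field widths and byte order are not specified in the text): a 16-bit window length and a 32-bit float slope factor placed ahead of the coded payload.

```python
import struct

# Hypothetical header layout (an assumption, not given in the text):
# network byte order, 16-bit unsigned window length, 32-bit float slope factor.
HEADER = struct.Struct("!Hf")

def pack_packet(window_len: int, slope: float, payload: bytes) -> bytes:
    """Store the window length and slope factor in the packet header."""
    return HEADER.pack(window_len, slope) + payload

def unpack_packet(data: bytes):
    """Recover (window_len, slope, payload) from a received packet."""
    window_len, slope = HEADER.unpack_from(data)
    return window_len, slope, data[HEADER.size:]
```

The receiver can then reconstruct each window's length and sampling distribution (via the slope factor) without any side channel.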
- adaptively adjusting the at least one parameter of the rateless coding may comprise adaptively adjusting a first sampling distribution for the first data window and a second sampling distribution for the second data window (as illustrated at stage 227 of FIG. 3).
- the at least one future video encoding characteristic may include a future video bit rate of the video source.
- the adjusting may be based on a future video bit rate of the video source.
- the future video bit rate may be variable.
- source node 110 may segment data from the frames of the video source into the plurality of video data packets based on at least the first length and the second length (as illustrated at stage 228 of FIG. 3).
- the at least one video encoding characteristic may comprise a past frame rate of the video source, a number of frames in a group of pictures of the video source, and/or a past video bit rate of the video source. Additionally, the past video bit rate may be variable.
- DAF codes are delay-aware fountain codes that may improve streaming high quality video over lossy wireless networks. DAF codes deeply integrate channel coding and video coding. In some embodiments, DAF codes include two techniques: time-based sliding window and optimal window-wise sampling strategy.
- a basic idea of DAF codes is to segment the video file into overlapping windows, and then encode and send them consecutively with fountain codes. While the non-overlapping block coding scheme has a relatively small block size, the overlap between sliding windows may allow the decoded packets in a previous window to help the decoding of future windows. By doing so, the block size may be virtually extended.
- the window size W in DAF codes is defined by the maximum tolerable latency T_Delay (in units of time, or the number of frames to be transmitted in the window), instead of by a fixed number of packets. This technique is therefore named the time-based sliding window.
- T_Delay, the maximum tolerable latency
- the step size between two consecutive windows is Δt.
- Δt = 1 is assumed herein. Because both the video length T and the window size W are defined to be integral multiples of Δt, if Δt > 1, all parameters can be down-sampled by a factor of Δt and all formulas still hold.
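The time-based window definition can be sketched as follows, assuming Δt = 1 and a window size derived from the tolerable latency and the frame rate (the helper names are illustrative, not from the text):

```python
def window_size(t_delay_seconds: float, fps: float) -> int:
    """W is fixed by the maximum tolerable latency T_Delay, expressed here
    as a number of frames rather than a fixed number of packets."""
    return int(t_delay_seconds * fps)

def sliding_windows(total_frames: int, W: int, dt: int = 1):
    """Frame ranges of the time-based sliding windows: window tau covers the
    W consecutive frames starting at tau * dt (dt = 1 in the text)."""
    return [range(tau, tau + W)
            for tau in range(0, total_frames - W + 1, dt)]
```

Consecutive windows overlap in W − dt frames, which is what lets packets decoded in one window assist the decoding of the next.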
- the fountain codes algorithm may randomly choose the native packets and combine them into a coded packet, according to degree distribution and sampling probability [2]. Because the windows are overlapping, the total sampling probability of a frame is related to all the windows that cover it, as shown in (1).
- ASP or accumulated sampling probability
- the sampling probabilities in each window need to be adjusted in order to obtain a uniform overall ASP.
- the slope-only description is proposed: the distribution is approximated by a linear function defined by the slope factor a.
- the resulting ASP for each frame can be computed by (2).
- ASP_{s,a}(t) = s(t) · a + D(t)   (2)
- · denotes the dot product of the two vectors of (t − W + 1) elements, and D(t) collects the slope-independent terms
- DAF codes may aim to find a set of slope factors a , for which the variance of the sampling probabilities of all packets attains its minimum value.
- the optimization problem is defined in (4): a* = argmin_a Σ_t ( ASP_{s,a}(t) − ASP̄_{s,a} )²   (4)
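A numeric sketch of the quantities involved follows. The linear slope parameterization and the window overlap mirror the text; the exact normalization of the per-window distribution and the precise form of (2) are assumptions made for illustration.

```python
import numpy as np

def window_probs(W, slope):
    """Linear sampling distribution over a window of W frames, defined only
    by its slope factor; clipped to be non-negative and normalized to sum 1."""
    p = 1.0 / W + slope * (np.arange(W) - (W - 1) / 2.0)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

def asp(slopes, T, W):
    """Accumulated sampling probability of each frame: the sum of its sampling
    probability in every window that covers it (step size 1)."""
    out = np.zeros(T)
    for tau, a in enumerate(slopes):     # window tau covers [tau, tau + W - 1]
        out[tau:tau + W] += window_probs(W, a)
    return out

def variance_objective(slopes, T, W):
    """The DAF-style objective (cf. (4)): variance of the ASP across frames."""
    return float(np.var(asp(slopes, T, W)))
```

With all slopes at zero (uniform per-window distributions), frames near the start and end of the video are covered by fewer windows, so the ASP is non-uniform and the variance is positive; the optimization in (4) chooses slopes that flatten it.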
- IDR in-time decoding ratio
- FDR file decoding ratio
- While DAF and DAF-L can improve the quality of video streaming over lossy wireless networks, the inventors have recognized and appreciated areas of improvement:
- DAF-L: the decoding performance of DAF-L may be considered relatively low. Although its performance is higher than that of other delay-aware schemes, there is still a notable gap between DAF and DAF-L (up to 10% in decoding ratio). That is because DAF-L does not use the window-wise sampling distribution optimization of DAF. In exchange, DAF-L is a low-complexity, online algorithm.
- Some embodiments may reduce the complexity of DAF in order to improve performance for delay- sensitive video streaming and for long videos.
- the global optimization function of DAF brings a complexity of Θ(T³). Since the computational scale grows cubically with the video length T, the inventors have recognized and appreciated that alternatives may be more practical for long videos.
- DAF may be an offline scheme. Because the bit rate information of all frames is required to perform the optimization, the whole video file may need to be available before being encoded by fountain codes. Therefore, some embodiments of DAF may not be appropriate or possible for live video streaming applications.
- MPC Model Predictive Control
- the minimization function usually takes the form of a quadratic function of the error between the prediction and the target value.
- the algorithm is iteratively executed each time a new instant is sampled.
- Various MPC algorithms may differ from each other only in the horizon length, the predictive model, and the objective function to be minimized. Note that MPC is in general not a globally optimal algorithm, since the control decisions are made based only on the history values and the prediction of a finite future horizon.
- MPC provides a locally optimal online solution to any objective-function-minimizing process control problem, which the inventors have recognized and appreciated as supporting a solution to problems described herein.
- DAF-M may be considered an offline algorithm.
- the prediction of bit rate and other practical problems for online algorithms will be discussed further below.
- τ denotes the index of the current time (and also the index of the current sliding window). Since the window size is W, the frame newly added to the encoding buffer is the (τ + W − 1)-th frame.
- window size W and horizon length H are independent parameters that serve different purposes.
- W is typically chosen in accordance with the longest tolerable end-to-end delay. It is specified by the user, so it is an input value that does not change in a communication session.
- H is a parameter to balance the computational complexity and desired performance. It is chosen by the application according to the computing power, network condition, quality demand, etc.
- a good target may be similar to the global optimization problem in (4), which is minimizing the variance of ASP.
- the objective function is limited to a local range. Because the slopes in the horizon [τ, τ + H − 1] can affect the ASP of the frames in [τ, τ + H + W − 2], the objective function is the variance over that range, as in (5).
- a_τ^H denotes the H-length slope vector for the windows starting from τ.
- P denotes the vector of ASP in the range [τ, τ + H + W − 2], including both past and predicted values. The calculation of P is discussed in the following part.
- P̄_H is the average value of P over that range.
- the process model may play a decisive role in a MPC controller.
- the chosen model may need to be capable of capturing the process dynamics so as to precisely predict the future outputs.
- ASP s a (t) is defined in (2).
- Vector length is equal to the length of a . In the calculation of ASP, s needs to contain W - 1 more elements than a .
- while the ASP model is deterministic when all future bit rates s and slope parameters a are known, the process model may no longer be deterministic once a finite horizon is imposed.
- the ASP in the range of the objective function in (5) consists of three parts, all of which may be (H + W − 1)-length vectors representing a part of the ASP within the range [τ, τ + H + W − 2]. Some of the components have non-zero elements only in a sub-range, as pointed out below. • P_init: the initial ASP already sampled by previous windows.
- Non-zero range [τ, τ + W − 2]. P_init = ASP(s_{τ−W+1}, a_{τ−W+1})   (7), where the index notation in s and a means the elements in these two vectors are the t-th to the (t + l − 1)-th elements from vectors s and a, respectively. Because P_init is computed from the known s and the previous selections of a, it has no variable parameter.
- the ASP contributed by the windows within the horizon: non-zero range [τ, τ + H + W − 2], where a_τ^H is the H-length slope vector to be optimized as in (5).
- the ASP contributed by windows beyond the horizon: non-zero range [τ + H, τ + H + W − 2].
- only the first element of the optimized slope vector, a_τ^H(1), may be chosen to encode the τ-th window.
- the optimal slope in (11) can be solved via the Karush-Kuhn-Tucker (KKT) conditions. Because there are H variables to optimize and 2H constraints, the KKT conditions yield a system of 3H equations. Omitting constant factors, the solution involves generating an H × H parameter matrix and computing its inverse. The algorithm is executed iteratively for each window, T times in total, so the total computational complexity is Θ(H³T). Since H ≪ T and H does not increase with the video length, it can be considered a linear-complexity algorithm in T. Comparatively, DAF-M may have a much lower complexity than the Θ(T³) of DAF.
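Structurally, the receding-horizon loop can be sketched as below. This is a hedged simplification: a greedy grid search over a few candidate slopes stands in for the KKT solution of (11), and the per-frame bit-rate weighting of the real scheme is omitted for brevity.

```python
import numpy as np

def probs(W, slope):
    """Linear window distribution defined by a slope factor (clipped, normalized)."""
    p = 1.0 / W + slope * (np.arange(W) - (W - 1) / 2.0)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

def mpc_slopes(n_frames, W, H, candidates=(-0.1, -0.05, 0.0, 0.05, 0.1)):
    """Receding-horizon sketch of DAF-M: at each window tau, pick the slope
    that minimizes the local ASP variance over the frames the horizon can
    affect, commit only that window's decision, then advance."""
    n_windows = n_frames - W + 1
    asp = np.zeros(n_frames)            # ASP accumulated by committed windows
    committed = []
    for tau in range(n_windows):
        lo, hi = tau, min(tau + H + W - 1, n_frames)   # local objective range
        best, best_cost = 0.0, float("inf")
        for a in candidates:
            trial = asp.copy()
            trial[tau:tau + W] += probs(W, a)
            cost = float(np.var(trial[lo:hi]))
            if cost < best_cost:
                best, best_cost = a, cost
        committed.append(best)
        asp[tau:tau + W] += probs(W, best)             # apply first action only
    return committed
```

As in MPC generally, only the first action of each horizon-wide decision is applied, and the optimization is re-run as the window slides, which keeps the per-step cost independent of the total video length.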
- the DAF encoder may be implemented in an ASIC, an FPGA, or a similar integrated circuit chip.
- the system may be implemented as a driver controlling network interface hardware or in any other suitable way.
- the prediction of bit rate in DAF-O is only Θ(1), so the online algorithm may not increase the total computational complexity.
- FIG. 5a shows the bit rates for CIF video sequence foreman coded with H.264/AVC standard.
- the inventors have recognized and appreciated that (i) the first W − 1 slopes in DAF-M (FIG. 5c) and DAF-O (FIG. 5b) are uniform distributions. That is because there may not be enough past ASP to calculate P_init in that range (see (7)), so the slopes in that range are set to 0 as a warm-up period; (ii) the latter part of DAF-M is very similar to the globally optimal result of DAF in FIG. 5d, whilst that of DAF-O is not, because of its lack of information about future bit rates.
- FIG. 6 shows the resulting ASP using the optimized slopes from FIG. 5b-d (including DAF-L, which uses all uniform distributions).
- the numbers following the scheme names in the legend are the corresponding variances of the ASP.
- [·] is the Iverson bracket notation, defined to be 1 if the condition in the square brackets is satisfied, and 0 otherwise.
- p(c) gives the ratio of frames whose ASP surpasses 1/c. It is a monotonically increasing function, and its output values are in the range [0, 1]: if c < 1/P_max, the output value is 0; if c > 1/P_min, the output value is 1 (where P_max and P_min denote the maximum and minimum values in P, respectively).
- P max and P min denote the maximum and minimum values in P respectively.
- ASP reflects the average accumulated sampling probability for a frame, so, under the same code rate, a greater ASP indicates a higher probability of being sampled than other frames. It also means that, in order to be sampled into at least one coded packet, a frame with higher ASP expects a lower code rate; ASP is inversely proportional to the expected code rate. Therefore, if the ASP of a frame is higher than the reciprocal of the code rate, 1/c, that frame is likely to be sampled into coded packets under code rate c, and thus has a chance to be decoded on the receiver side. As a result, p(c) represents the ratio of frames that could be decoded under code rate c.
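The coverage ratio p(c) described above reduces to an Iverson-bracket average over the per-frame ASP values. A minimal sketch (the variable names are illustrative, not from the specification):

```python
import numpy as np

def coverage_ratio(asp, c):
    """Fraction of frames whose ASP reaches the sampling threshold 1/c.

    Implements p(c) = (1/|P|) * sum_i [ asp_i >= 1/c ], where [.] is the
    Iverson bracket: a frame whose ASP is at least the reciprocal of the
    code rate is likely to appear in at least one coded packet.
    """
    asp = np.asarray(asp, dtype=float)
    return float(np.mean(asp >= 1.0 / c))
```

As c grows, the threshold 1/c falls, so more frames qualify and the function increases monotonically from 0 toward 1, matching the behavior described above.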
- FIG. 7 shows the coverage ratio curves of ASP from FIG. 6 and block coding. c is normalized to the range [0, 1].
- the inventors have recognized and appreciated that the result justifies the gains brought by the proposed DAF codes: when c is greater than 60% of 1/P_min, the ASP coverage ratios are over 60% and DAF > DAF-M > DAF-O > DAF-L. That means, in terms of the ratio of possibly decoded frames within that code rate range, in some embodiments DAF outperforms DAF-M, DAF-M outperforms DAF-O, and DAF-O outperforms DAF-L.
- this metric does not fully represent the actual decoding ratio in video communication. It only considers the sampling probability, but (i) it does not imply when the frames are decoded, so it does not represent the in-time decoding ratio (IDR); (ii) it does not consider the factors of fountain codes, such as degree distribution, codeword length, etc. That is why the p curve of block coding does not look significantly worse than DAF-L in FIG. 7, while the IDR results of block coding are far inferior to those of any other scheme, as we will see in the next section.
- the encoding/decoding parts of some embodiments may generally follow the DAF framework in [10], with changes made to the Video Preprocessing Module, especially with implementation of a different optimization algorithm in some embodiments.
- VMs are connected to the virtual network as a source (or encoder/sender) node running the client application, and a destination (or decoder/receiver) node running the server application.
- a video is streamed from client to server using different schemes.
- EMANE is used for emulation of IEEE 802.11b on the PHY and MAC layers of each wireless node. Because of the forward error correction (FEC) nature of fountain codes, we disable the retransmission mechanism of 802.11b for all fountain-code-based schemes. The adaptive rate selection mechanism of 802.11b is also disabled. The Ad-hoc On-Demand Distance Vector (AODV) protocol is used for routing.
- the communication path from the source to the destination has one hop.
- the distance between the two nodes is carefully set so that the packets with 1024-byte payload will experience a 10% loss rate.
- DAF the full optimization version of delay-aware fountain codes in [10].
- DAF-M the MPC-based DAF codes as described herein.
- DAF-O the online version of MPC-based DAF codes as described herein.
- DAF-L the non-optimized version of DAF codes. All the windows may use uniform distributions in some embodiments.
- S-LT the sliding window LT code from [8].
- Block the block coding for fountain codes.
- TCP uses the TCP protocol to stream video. In order to add delay awareness, the video file is also segmented into blocks as in the "Block" scheme, but they are sent using TCP.
- Results of DAF-M and DAF-O are plotted with dashed lines and dotted lines, respectively.
- the different horizon lengths H are indicated by the width of lines and the brightness of colors: the thicker and darker lines indicate larger H , and the thinner and lighter lines indicate smaller H .
- Various values of H may be chosen for DAF-O and DAF-M.
- Table 2 IDR comparisons of DAF-based schemes using different horizon lengths.
- the decoding ratio of all the schemes is an increasing function of T_Delay and a decreasing function of c. That means larger delay and lower code rate lead to higher overall performance, which meets our expectation.
- the performance of DAF may be the highest in some embodiments, while DAF-L may be the lowest in some embodiments.
- the results of DAF-M and DAF-0 may generally be between those of DAF and DAF-L in some embodiments.
- the relationships of decoding ratios is consistent with the ASP coverage ratios as seen in FIG. 7.
- DAF-M may outperform DAF-O under the same horizon length. That is because DAF-M can be deemed DAF-O with "perfect" bit rate prediction. As a result, the gap between DAF and DAF-M is only caused by local bit rate knowledge, while the performance drop from DAF to DAF-O may come from both limited horizon length and inaccuracy of bit rate prediction.
- FIG. 9 shows five surfaces of the online schemes.
- Table 3 IDR comparisons of proposed schemes and existing video streaming algorithms.
- DAF may have the highest performance, followed by DAF-M, both of which are offline algorithms. Considering the computational complexity of DAF-M may be orders of magnitude lower than DAF in some embodiments, DAF-M may be a more practical offline algorithm in some embodiments.
- among the online schemes, DAF-O may have the highest performance, followed by DAF-L. As shown in FIG. 9, the surface of DAF-O is almost always higher than that of any other scheme.
- TCP The performance of TCP is relatively low. The reason is that TCP is not suitable for wireless channels where packet loss rate is high [31]. Its congestion control mechanism does not help the performance.
- Block coding scheme performs the lowest among all schemes, although its ASP coverage ratio is not very low compared to others. The reason is that the blocks are too small and non-overlapping, so the coding overhead is very large [13].
- FIG. 1 illustrates a system implemented with multiple computing devices, which may be distributed and/or centralized.
- FIGS. 2 and 3 illustrate algorithms executing on at least one computing device.
- FIG. 19 illustrates an example of a suitable computing system environment 300 on which embodiments of these algorithms may be implemented.
- This computing system may be representative of a computing system that implements the described technique of increasing data throughput and decreasing transmission delay from a source node to a sink node via a relay node.
- computing system environment 300 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 300 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 300.
- the invention is operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments, or cloud-based computing environments that include any of the above systems or devices, and the like.
- the techniques described herein may be implemented in whole or in part within network interface 370.
- the computing environment may execute computer-executable instructions, such as program modules.
- program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer storage media including memory storage devices.
- an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 310. Though a programmed general purpose computer is illustrated, it should be understood by one of skill in the art that the algorithms may be implemented in any suitable computing device. Accordingly, techniques as described herein may be implemented in a system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node. These techniques may be implemented in such network devices as originally manufactured or as a retrofit, such as by changing the program memory devices holding programming for such network devices or by software download. Thus, some or all of the components illustrated in FIG. 19, though illustrated as part of a general purpose computer, may be regarded as representing portions of a node or other component in a network system.
- Components of computer 310 may include, but are not limited to, a processing unit 320, a system memory 330, and a system bus 321 that couples various system components including the system memory 330 to the processing unit 320.
- the system bus 321 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
- bus architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
- Computer 310 typically includes a variety of computer readable media.
- Computer readable media can be any available media that can be accessed by computer 310 and includes both volatile and nonvolatile media, removable and non-removable media.
- Computer readable media may comprise computer storage media and communication media.
- Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computer 310.
- Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
- modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct- wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR), and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
- the system memory 330 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 331 and random access memory (RAM) 332.
- a basic input/output system 333 (BIOS), containing the basic routines that help to transfer information between elements within computer 310, such as during start-up, is typically stored in ROM 331.
- RAM 332 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 320.
- FIG. 19 illustrates operating system 334, application programs 335, other program modules 336, and program data 337.
- the computer 310 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
- FIG. 19 illustrates a hard disk drive 341 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 351 that reads from or writes to a removable, nonvolatile magnetic disk 352, and an optical disk drive 355 that reads from or writes to a removable, nonvolatile optical disk 356 such as a CD-ROM or other optical media.
- Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
- the hard disk drive 341 is typically connected to the system bus 321 through a non-removable memory interface such as interface 340, and magnetic disk drive 351 and optical disk drive 355 are typically connected to the system bus 321 by a removable memory interface, such as interface 350.
- the drives and their associated computer storage media discussed above and illustrated in FIG. 19, provide storage of computer readable instructions, data structures, program modules, and other data for the computer 310.
- hard disk drive 341 is illustrated as storing operating system 344, application programs 345, other program modules 346, and program data 347. Note that these components can either be the same as or different from operating system 334, application programs 335, other program modules 336, and program data 337.
- Operating system 344, application programs 345, other program modules 346, and program data 347 are given different numbers here to illustrate that, at a minimum, they are different copies.
- a user may enter commands and information into the computer 310 through input devices such as a keyboard 362 and pointing device 361, commonly referred to as a mouse, trackball, or touch pad.
- Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
- These and other input devices are often connected to the processing unit 320 through a user input interface 360 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
- a monitor 391 or other type of display device is also connected to the system bus 321 via an interface, such as a video interface 390.
- computers may also include other peripheral output devices such as speakers 397 and printer 396, which may be connected through an output peripheral interface 395.
- the computer 310 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 380.
- the remote computer 380 may be a personal computer, a server, a router, a network PC, a peer device, or some other common network node, and typically includes many or all of the elements described above relative to the computer 310, although only a memory storage device 381 has been illustrated in FIG. 19.
- the logical connections depicted in FIG. 19 include a local area network (LAN) 371 and a wide area network (WAN) 373, but may also include other networks.
- Such networking environments are commonplace in offices, enterprise- wide computer networks, intranets, and the Internet.
- the computer 310 When used in a LAN networking environment, the computer 310 is connected to the LAN 371 through a network interface or adapter 370. When used in a WAN networking environment, the computer 310 typically includes a modem 372 or other means for establishing communications over the WAN 373, such as the Internet.
- the modem 372 which may be internal or external, may be connected to the system bus 321 via the user input interface 360, or other appropriate mechanism.
- program modules depicted relative to the computer 310, or portions thereof may be stored in the remote memory storage device.
- FIG. 19 illustrates remote application programs 385 as residing on memory device 381. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
- Such services may include streaming multimedia, such as video and/or audio, over a network such as the Internet between a streaming server and a client.
- a streaming service may be over a different network, such as a LAN, where the streaming server may be installed on a computer within the premises of a customer, such as a house or office building.
- the streaming server may be geographically remote relative to the clients, and the connection between the server and clients may be a dedicated wireless connection.
- the connection may be over a shared network such as a 5G cellular network.
- the above-described embodiments of the present invention can be implemented in any of numerous ways.
- the embodiments may be implemented using hardware, software or a combination thereof.
- the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
- processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component.
- a processor may be implemented using circuitry in any suitable format.
- a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, or any other suitable portable or fixed electronic device.
- a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
- Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet.
- networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, or fiber optic networks.
- the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
- the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above.
- a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form.
- Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above.
- computer-readable storage medium encompasses only a computer-readable medium that can be considered to be a manufacture (i.e., article of manufacture) or a machine.
- the invention may be embodied as a computer readable medium other than a computer-readable storage medium, such as a propagating signal.
- program or “software” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
- Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices.
- program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- functionality of the program modules may be combined or distributed as desired in various embodiments.
- data structures may be stored in computer-readable media in any suitable form.
- data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields.
- any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags, or other mechanisms that establish relationship between data elements.
- the invention may be embodied as a method, of which an example has been provided.
- the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
Abstract
A network system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node via a relay node. The network system may comprise: a first node configured to: predict at least one future video encoding characteristic of a video source over a finite number of future frames, encode a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic, and transmit the plurality of video data packets, wherein the video source is a video source of the plurality of video data packets.
Description
SYSTEM FOR LIVE VIDEO STREAMING USING DELAY-AWARE FOUNTAIN CODES
RELATED APPLICATIONS
This application claims priority to and the benefit of U.S. Provisional Patent
Application Number 62/309,069, entitled "SYSTEM FOR VIDEO STREAMING USING DELAY-AWARE FOUNTAIN CODES," filed March 16, 2016. The entire contents of the foregoing are hereby incorporated herein by reference.
BACKGROUND
Over the recent decade, video-based services have seen explosive growth, especially with the prevalence of smart mobile devices. People are increasingly willing to use YouTube, FaceTime, Netflix, etc., on the move. Thanks to the evolution of communication technologies such as 3G/4G, LTE, and WiFi, service quality over wireless networks has improved dramatically. However, with ever higher video qualities (720p, 1080p, 4K) and the stochastic nature of wireless channels, new challenges are constantly raised for both academia and industry.
In order to deal with stochastic loss in a wireless network, Forward Error Correction (FEC) codes were developed. One important class of FEC codes is fountain codes [1], such as the Luby transform (LT) code [2] and the Raptor code [3]. With fountain codes, encoded packets are generated from a given set of native packets, such that the original file can ideally be recovered as long as the number of received packets is equal to or only slightly larger than the number of native packets. For example, given a file consisting of a sufficiently large number of native packets k, with the latest generation of fountain codes, the RaptorQ codes [4], the chance of decoding failure is less than 1% upon receiving k packets, and less than one in a million upon receiving k + 2 packets [5]. A major advantage of fountain codes is ratelessness, i.e., the coding rate can automatically adapt to the wireless condition without demanding acknowledgments (ACKs) or retransmissions. However, traditional fountain codes are designed for file transfer rather than video streaming. Under the current fountain codes paradigm, users may not start to watch a video until the whole file is successfully decoded.
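The rateless property described above can be illustrated with a toy LT-style code: coded packets are XORs of random subsets of native packets, and a peeling decoder recovers the file once enough packets arrive. This sketch uses a uniform degree distribution for brevity; a practical LT code would use the robust soliton distribution [2].

```python
import random

def lt_encode(native, rng):
    """Yield an endless stream of (index set, payload) coded packets.

    Each payload is the XOR of a randomly sampled subset of the native
    symbols, so the sender can keep producing packets indefinitely
    (ratelessness).  Degrees are drawn uniformly for brevity.
    """
    k = len(native)
    while True:
        d = rng.randint(1, k)
        idx = frozenset(rng.sample(range(k), d))
        payload = 0
        for i in idx:
            payload ^= native[i]
        yield idx, payload

def lt_decode(packets, k):
    """Peeling decoder: repeatedly resolve degree-one packets."""
    decoded = {}
    pending = [[set(idx), payload] for idx, payload in packets]
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for pkt in pending:
            idx, payload = pkt
            # Subtract already-decoded symbols from this packet.
            for i in list(idx):
                if i in decoded:
                    payload ^= decoded[i]
                    idx.discard(i)
            pkt[1] = payload
            if len(idx) == 1:
                (i,) = idx
                if i not in decoded:
                    decoded[i] = payload
                    progress = True
    return decoded
```

A receiver simply collects packets until the decoder succeeds; no ACKs or retransmissions are needed, which is the rateless behavior the paragraph above describes.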
In order to accommodate fountain codes to video streaming, several window-based
fountain codes have been proposed. In terms of window structure, (i) block coding [6], [7] divides the video file into fixed-length data blocks and separately sends them using fountain codes; (ii) sliding window [8]-[10] segments the video into overlapping windows, encodes them with fountain codes, and decodes them jointly; (iii) expanding window [11], [12] expands the previous window to include new packets. In the above three schemes, the block size is fixed in block coding, virtually extended in sliding window, and keeps growing in expanding window.
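The three window structures can be contrasted with a small sketch that generates the packet index sets each scheme would encode over (illustrative only; real schemes also choose sizes and steps adaptively):

```python
def block_windows(n, size):
    """Block coding: fixed-length, non-overlapping blocks."""
    return [list(range(i, min(i + size, n))) for i in range(0, n, size)]

def sliding_windows(n, size, step):
    """Sliding window: overlapping windows advancing by `step` packets."""
    return [list(range(i, min(i + size, n)))
            for i in range(0, n - size + step, step)]

def expanding_windows(n, step):
    """Expanding window: each window extends the previous by `step` packets."""
    return [list(range(0, min(i, n))) for i in range(step, n + step, step)]
```

For a 6-packet stream, the block size stays fixed, sliding windows overlap, and expanding windows keep growing, matching the three behaviors described above.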
SUMMARY
A network system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node via a relay node. The network system may comprise a first node configured to: predict at least one future video encoding characteristic of a video source over a finite number of future frames, encode a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic, and transmit the plurality of video data packets, wherein the video source is a video source of the plurality of video data packets.
At least one computer-readable storage medium encoded with executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method for transmitting a video stream over a data link from a source node to a sink node via a relay node. The method may comprise: predicting at least one future video encoding characteristic of a video source over a finite number of future frames; encoding a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic; and transmitting the plurality of video data packets, wherein the video source is a video source of the plurality of video data packets.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an exemplary source node, relay nodes, and a sink
(or destination) node of a network in which some embodiments of the application may be implemented.
FIG. 2 is a flowchart of an exemplary method of increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node according to some embodiments.
FIG. 3 is a flowchart of an additional exemplary method of increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node according to some embodiments.
FIG. 4 is a flowchart of an exemplary fountain code process according to some embodiments.
FIG. 5 is a set of charts illustrating a comparison between bit rates and sampling distributions using (a) foreman, (b) DAF-O, (c) DAF-M, and (d) DAF according to some embodiments.
FIG. 6 is a chart illustrating exemplary comparisons of DAF-L, DAF-O, DAF-M, and DAF according to some embodiments.
FIG. 7 is a chart illustrating exemplary comparisons of DAF-L, DAF-O, DAF-M, DAF, and Block coding according to some embodiments.
FIG. 8 is a set of charts illustrating exemplary comparisons of DAF, DAF-L, DAF-O, and DAF-M according to some embodiments.
FIG. 9 is a chart illustrating a comparison of exemplary fountain code schemes according to some embodiments.
FIG. 10 is a diagram illustrating a computer system on which some embodiments of the invention may be implemented.
DETAILED DESCRIPTION
The inventors have recognized and appreciated that window-based fountain codes may introduce extra play delay due to window size and decoding time. Moreover, window-based fountain codes for video streaming may trade delay for higher decoding performance. The reason is that larger block size may introduce longer delay, but may also provide higher coding gain at the same time [13].
In fact, different video streaming applications may have different levels of delay tolerance. Some applications may be very delay-sensitive, such as live video chat, cloud gaming, etc., all of which are characterized by bi-directional communication [14]. Some uni-directional applications, on the other hand, have a loose tolerance on end-to-end delay, such as TV broadcasting and Internet live streams. Others, such as the streaming of pre-recorded content, are the least sensitive to delay.
The inventors have recognized and appreciated that Delay-Aware Fountain (DAF) codes may be suitable for video streaming, as proposed in [10]. Different from the other sliding window fountain codes schemes, DAF codes do not treat the sliding windows as homogeneous. By adaptively selecting the window length and adjusting the sampling pattern according to the ongoing video bit rate, DAF codes may deliver significantly higher video decoding ratio than existing schemes.
The inventors have recognized and appreciated, however, that the high-complexity global optimization process of DAF codes may prevent its application in delay-sensitive video streaming (e.g., live streaming). As a result, the inventors have recognized and appreciated that reducing complexity while maintaining relatively high coding performance may improve the video watching experience for delay-sensitive video streaming. The inventors have recognized and appreciated that using Model Predictive Control (MPC) may be most appropriate.
MPC has developed considerably over recent years. As an advanced method of process control, it has been successfully used in the control processes of chemical plants, oil refineries, power plants, etc. [15]. There are also successful applications in academia, ranging from energy management [16] to automatic vehicle control [17]. Despite its popularity in the automatic control community, MPC has never been used in the domain of multimedia communication. The inventors have recognized and appreciated that integrating MPC with DAF codes may improve video streaming performance while maintaining low complexity. In MPC, actions may be optimized according to a state sequence over a horizon. Based on that idea, two schemes are discussed herein, i.e., DAF-M and DAF-O.
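The receding-horizon idea behind MPC can be sketched generically: at each step the controller evaluates candidate actions over the next H predicted states, applies only the first action, and re-plans. The greedy single-action search below is a simplification of a full MPC optimization, and the model and cost are caller-supplied placeholders:

```python
def mpc_control(predict, cost, candidates, state, H, steps):
    """Generic receding-horizon control loop.

    At every step the controller scores each candidate action by rolling
    the state model `predict(state, action, h)` forward over an H-step
    horizon, applies only the best first action, and re-plans at the
    next step.  A full MPC would optimize the whole action sequence;
    the greedy one-action search keeps the sketch short.
    """
    applied = []
    for _ in range(steps):
        best = min(
            candidates,
            key=lambda a: sum(cost(predict(state, a, h)) for h in range(H)),
        )
        applied.append(best)
        state = predict(state, best, 0)   # commit only the first action
    return applied
```

In the DAF-M/DAF-O setting, the "action" would correspond to a window's sampling slope and the "state" to the local bit rate trajectory; those mappings are assumptions for illustration, not the specification's exact formulation.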
The inventors have recognized and appreciated that DAF-M may improve upon adjusting the sampling pattern for fountain codes according to the ongoing video bit rate, so as to deliver a consistent video watching experience. The computational complexity may be affordable for live streaming because the objective function may be minimized over a limit-sized horizon, which may not grow with the video length.
The inventors have recognized and appreciated that the online algorithm of DAF-O may be implemented based on the prediction of future bit rate. The inventors have recognized and appreciated that DAF-O may provide a decoding ratio that is significantly higher than existing online video streaming schemes.
Implementation of the System
FIG. 1 is a diagram illustrating a system 100 that may employ techniques for increasing data throughput and decreasing transmission delay from a source node to a sink node via a relay node as described herein. In the example of FIG. 1, a source node 110 (which may be referred to as first node) may encode data packets for transmission. In some embodiments, the source node 110 may encode the data packets using fountain coding (as illustrated at stage 210 of FIG. 2). However, any suitable coding, including rateless coding, may be used to encode the data packets. The source node 110 may also transmit the data packets to a first relay node 130 via connection 120 (as illustrated at stage 220 of FIG. 2), which may be a wireless connection. However, any suitable connection or communication technology may be used to communicate among the nodes.
The first relay node 130 may receive at least one of the data packets from the source node 110. In addition, the first relay node 130 may relay or transmit the data packets to a second relay node 150 via connection 140, which may be a wireless connection. The second relay node 150 may receive at least one of the data packets from the first relay node 130. In addition, the second relay node 150 may relay or transmit the data packets to a sink node 170 via connection 160, which may be a wireless connection.
In some embodiments, source node 110 may be a server, such as a streaming video server and/or a live streaming video server streaming live video content. Alternatively or additionally, source node 110 may include a network ingress point, such as a gateway to a wireless network (e.g., a base station in a wireless network).
Additionally, sink node 170 may be a client, such as a video receiver and/or playback device. For example, sink node 170 may be a client of the server referred to as source node 110. Sink node 170 may be a wireless device, such as a smartphone, tablet, laptop computer, or desktop computer.
Alternatively or additionally, relay nodes, such as first relay node 130 and/or second relay node 150, may include network routers and/or network switches. In some embodiments, relay nodes, such as first relay node 130 and/or second relay node 150, may include hubs, and/or any other suitable components. Alternatively or additionally, relay nodes may include other cell transceivers in a cellular network, such as a 5G network.
In some embodiments, the first relay node 130 and/or the second relay node 150 may regenerate, re-encode, and relay the data packets conditionally, based on the quantity of the
data packets received at the given relay node. For example, the first relay node 130 and/or the second relay node 150 may receive a subset of the data packets, and based on the subset of the data packets, the first relay node 130 and/or the second relay node 150 may regenerate the data packets, re-encode the regenerated data packets, and transmit the regenerated, re-encoded data packets.
The sink node 170 may receive one or more data packets from the second relay node 150. If the sink node 170 has received a sufficient quantity of the data packets, the sink node 170 may regenerate and decode the data packets.
FIG. 1 shows only two relay nodes, the first relay node 130 and the second relay node 150. This number of relay nodes is shown for simplicity of illustration. It should be appreciated that a network system may have many more nodes and relay nodes.
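The conditional regenerate-and-relay behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation; `k_needed` and `reencode` are hypothetical names standing in for the decoding threshold and the re-encoding routine:

```python
def relay(received, k_needed, reencode):
    """Conditional relay sketch: forward coded packets as-is unless enough
    have arrived (k_needed) to regenerate the source block, in which case
    fresh coded packets are produced, restoring redundancy lost on the
    previous hop."""
    if len(received) >= k_needed:
        # Enough packets: regenerate the block and emit re-encoded packets.
        return reencode(received)
    # Not enough packets to decode: pass along what was received.
    return list(received)
```

With fountain codes, `k_needed` would be slightly larger than the number of native packets in the block, reflecting the small reception overhead of rateless decoding.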
In some embodiments, source node 110 may predict at least one future video encoding characteristic of a video source over a finite number of future frames (as illustrated at stage 210 of FIG. 2). A video encoding characteristic may include any characteristic or property of how a video source is encoded, such as frame rate, number of frames in a group of pictures, video bit rate, and so on. Additionally, source node 110 may encode a plurality of video data packets using rateless coding (as described above) based on the predicted at least one future video encoding characteristic of the video source (as illustrated at stage 220 of FIG. 2) by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic (as illustrated at stage 225 of FIG. 2). The video source may be a video source of the plurality of video data packets. Additionally, source node 110 may transmit the video data packets. For example, source node 110 may transmit the video data packets over a data link to a second node, which may be first relay node 130, second relay node 150, and/or sink node 170 (as illustrated at stage 220 of FIG. 2). For example, the second node may be a sink node configured to receive one or more of the plurality of video data packets from the streaming video server via at least one relay node, or at least one relay node configured to receive at least one of the plurality of video data packets from the streaming video server. In some embodiments, the data link is at least partially wireless.
In some embodiments, the rateless coding may comprise fountain coding. Alternatively or additionally, the video source may include video data, such as a video file. Alternatively or additionally, at least one video data packet of the plurality of video data packets comprises at least 100 bits, although any number of bits may be used.
In some embodiments, the plurality of overlapping sliding data windows may be non-homogeneous, as discussed herein. For example, the plurality of overlapping sliding data windows may collectively have more than one length and/or more than one sampling distribution. In some embodiments, source node 110 may obtain at least one video encoding characteristic by preprocessing the video source, and source node 110 may use the at least one video encoding characteristic as well as the predicted at least one future video encoding characteristic. For example, source node 110 may predict the at least one future video encoding characteristic based on at least one video encoding characteristic of the video source. In some embodiments, adaptively adjusting the at least one parameter of the rateless coding may comprise adaptively selecting a first length of a first data window and a second length of a second data window of a plurality of overlapping sliding data windows (as illustrated at stage 226 of FIG. 3). Additionally, the selecting may use model predictive control based on the predicted at least one future video encoding characteristic. In some embodiments, the selecting may be based on a number of bits in frames of the video source and/or on a first number of frames in the video source. Additionally, the first length may comprise a second number of frames that the first data window can accommodate, and the second length may comprise a third number of frames that the second data window can accommodate. In some embodiments, source node 110 may store the first length of the first data window in a header of a first packet of the first data window and may store the second length of the second data window in a header of a second packet of the second data window. 
In addition, source node 110 may store the first sampling distribution in the header of the first packet of the first data window as a first slope factor and may store the second sampling distribution in the header of the second packet of the second data window as a second slope factor.
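The per-window metadata described above (window length and slope factor) is small enough to travel in each packet header. A minimal sketch follows; the 12-byte layout and field names are assumptions for illustration, not the patented format:

```python
import struct

# Hypothetical header: window start frame, window length W, slope factor a.
HDR = struct.Struct("!IIf")  # big-endian uint32, uint32, float32

def pack_header(window_start, window_len, slope):
    """Serialize the window metadata carried in each coded packet."""
    return HDR.pack(window_start, window_len, slope)

def unpack_header(data):
    """Recover (window_start, window_len, slope) from a packet's first bytes."""
    return HDR.unpack(data[:HDR.size])
```

Because only a start index, a length, and one real-valued slope are carried, the per-packet overhead stays constant regardless of the window size.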
In some embodiments, adaptively adjusting the at least one parameter of the rateless coding may comprise adaptively adjusting a first sampling distribution for the first data window and a second sampling distribution for the second data window (as illustrated at stage 227 of FIG. 3). Additionally, the at least one future video encoding characteristic may include a future video bit rate of the video source. For example, the adjusting may be based on a future video bit rate of the video source. In addition, the future video bit rate may be variable.
In some embodiments, source node 110 may segment data from the frames of the video
source into the plurality of video data packets based on at least the first length and the second length (as illustrated at stage 228 of FIG. 3).
In some embodiments, the at least one video encoding characteristic may comprise a past frame rate of the video source, a number of frames in a group of pictures of the video source, and/or a past video bit rate of the video source. Additionally, the past video bit rate may be variable.
Related Work
First of all, the important notations that will be used in this article are listed in Table 1.
Table 1: Definitions of the notations.
Delay-Aware Fountain Codes
DAF codes [10] are delay-aware fountain codes that may improve streaming high quality video over lossy wireless networks. DAF codes deeply integrate channel coding and video coding. In some embodiments, DAF codes include two techniques: time-based sliding window and optimal window-wise sampling strategy.
Time-Based Sliding Window
In some embodiments, a basic idea of DAF codes is to segment the video file into overlapping windows, and then encode and send them consecutively with fountain codes. While the non-overlapping block coding scheme has a relatively small block size, the overlap between sliding windows may allow the decoded packets in a previous window to help the
decoding of future windows. By doing so, the block size may be virtually extended.
In order to maximize block size, the window size W in DAF codes is defined by the maximum tolerable latency T_Delay (in units of time, or the number of frames to be transmitted in the window), instead of by a fixed number of packets. Therefore this technique is named time-based sliding window. In the rest of this article, we call the window covering the frames [t, t + W − 1] the t-th window.
In [10], the step size between two consecutive windows is Δt. For simplicity, we set Δt = 1 herein. This is because both the video length T and the window size W are defined to be integral multiples of Δt; if Δt > 1, all the parameters can be down-sampled by a factor of Δt, and all formulas still hold.
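The time-based sliding window with Δt = 1 can be sketched as follows; this is a simplified illustration in which frames are indexed from 0:

```python
def sliding_windows(num_frames, W, step=1):
    """Enumerate time-based sliding windows: window t covers frames
    [t, t + W - 1]; consecutive windows overlap by W - step frames."""
    return [list(range(t, t + W))
            for t in range(0, num_frames - W + 1, step)]

# For a 6-frame video and W = 3:
# [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

The overlap of W − 1 frames between consecutive windows is what lets packets decoded in a previous window assist the decoding of later windows.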
Window-wise sampling strategy
Within each window, the fountain codes algorithm may randomly choose the native packets and combine them into a coded packet, according to degree distribution and sampling probability [2]. Because the windows are overlapping, the total sampling probability of a frame is related to all the windows that cover it, as shown in (1).
ASP(t) = Σ_{ω ∈ all windows covering frame t} p_ω(t), (1)
where p_ω(t) denotes the average sampling probability of each packet in frame t within the window ω. So, ASP(t) denotes the total probability accumulated on every packet in frame t through all the sliding windows covering that frame.
ASP, or accumulated sampling probability, is an important metric in DAF codes and will be discussed further herein. The final objective of DAF codes is to minimize the variance of ASP, because the higher efficiency of fountain codes is achieved by a more stable sampling rate.
However, with the time-based sliding window, even if the sampling distribution of every window is uniform, the overall ASP may still be nonuniform. As a result, the sampling probabilities in each window need to be adjusted in order to obtain a uniform overall ASP.
A proper representation of the sampling distribution is required. Storing the entire sampling distribution in each packet header (per-frame description) would be impractical due to both communication and computation overhead. Alternatively, the slope-only description is proposed: the distribution is approximated by a linear function defined by the slope factor a. The slope factor is a real number ranging from −1 (forming a backward triangular distribution) to 1 (forming a forward triangular distribution). When a = 0, it represents a uniform distribution.
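A sketch of how a single slope factor can describe a window's sampling distribution. The linear parameterization below is an assumption consistent with the triangular/uniform endpoints described above, not necessarily the exact formula of the source:

```python
def slope_distribution(W, a):
    """Per-frame sampling weights of one W-frame window (W >= 2) described
    only by slope factor a in [-1, 1]: a = 0 gives a uniform distribution,
    a = 1 a forward triangular ramp, a = -1 a backward triangular ramp."""
    if not -1.0 <= a <= 1.0:
        raise ValueError("slope factor must lie in [-1, 1]")
    mid = (W - 1) / 2.0
    # Linear ramp centered on the window; weights stay non-negative for |a| <= 1.
    raw = [1.0 + a * (i - mid) / mid for i in range(W)]
    total = sum(raw)
    return [w / total for w in raw]
```

A single real number per window is thus enough to recover the full distribution at the receiver.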
Given bit rates s and slope factors a, the resulting ASP for each frame can be computed by (2), where "·" denotes the dot product of two vectors of (T − W + 1) elements, and
pkts(t, k) = Σ_{i=t}^{k} s(i) (3)
denotes the number of packets in frames t through k.
Given the total number of frames T, the window size W, and the number of packets in each frame s(t), DAF codes may aim to find a set of slope factors a for which the variance of the sampling probabilities of all packets attains its minimum value. The optimization problem is defined in (4):

argmin_a Σ_{t=W}^{T−W+1} (ASP_{s,a}(t) − \overline{ASP}_{s,a})², (4)
s.t. −1 ≤ a_t ≤ 1, ∀t,

where \overline{ASP}_{s,a} = (1 / (T − 2W + 2)) Σ_{t=W}^{T−W+1} ASP_{s,a}(t).
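A sketch of computing ASP by accumulating each window's contribution. The per-window linear weighting used here is a hypothetical stand-in for the slope-only description; the point it illustrates is why all-uniform slopes still yield nonuniform ASP at the video boundaries:

```python
def asp(num_frames, W, slopes):
    """Accumulated sampling probability per frame: window w (slope slopes[w])
    covers frames [w, w + W - 1] and contributes its normalized per-frame
    weight to each covered frame.  Assumes W >= 2."""
    mid = (W - 1) / 2.0
    P = [0.0] * num_frames
    for w, a in enumerate(slopes):
        raw = [1.0 + a * (i - mid) / mid for i in range(W)]
        total = sum(raw)
        for i, r in enumerate(raw):
            if w + i < num_frames:
                P[w + i] += r / total
    return P
```

With all-uniform slopes, interior frames (covered by W windows) accumulate ASP 1, while the first and last W − 1 frames accumulate less; this edge nonuniformity is exactly what the slope optimization in (4) corrects.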
Based on the aforementioned two techniques, the inventors have recognized and appreciated two schemes: DAF and the low-complexity version DAF-L.
In-Time Decoding Ratio
As pointed out in [10], a packet decoded after its playback time is useless. This application-specific feature is captured by a new metric called in-time decoding ratio (IDR), which only counts the "in-time" decoded packets within the current window. This is in contrast to the file decoding ratio (FDR), which counts all the decoded packets after a coding session completes.
It has been shown in [10] that DAF and DAF-L schemes yield much higher IDR than the existing schemes, such as [7]-[9], [11], [12], and TCP.
Observations of the Inventors
Although DAF and DAF-L can improve the quality of video streaming over lossy wireless network, the inventors have recognized and appreciated areas of improvement:
• The decoding performance of DAF-L may be considered relatively low. Although its performance is higher than other delay-aware schemes, there is still a notable gap between DAF and DAF-L (up to 10% in decoding ratio). That is because DAF-L does not use the window-wise sampling distribution optimization as in DAF. In exchange, DAF-L is a low-complexity and online algorithm.
• Some embodiments may reduce the complexity of DAF in order to improve performance for delay-sensitive video streaming and for long videos. The global optimization function of DAF brings a complexity of Θ(T³). Since the computational scale grows cubically with the video length T, the inventors have recognized and appreciated that alternatives may be more practical for long videos.
• DAF may be an offline scheme. Because the bit rate information of all frames is required to perform the optimization, the whole video file may need to be available before being encoded by fountain codes. Therefore, some embodiments of DAF may not be appropriate or possible for live video streaming applications.
Model Predictive Control
The term Model Predictive Control (MPC) does not designate a specific control strategy, but rather a wide range of methodologies that make explicit use of a process model to obtain the control signal by minimizing an objective function [15]. The methodology of MPC-based controllers may be characterized by the following steps:
1. At each time τ, the process model is explicitly used to predict the output over a future time horizon H. These predictions ŷ_τ(τ + k)¹ for k = 1…H depend on the known values (history inputs and outputs) up to time τ and on the future control signals, which are those to be optimized.
2. The control sequence u_τ(τ + k), k = 0…H − 1, is calculated by minimizing the objective function. The minimization usually takes the form of a quadratic function of the error between the prediction and the target value.
3. The first action u_τ(τ) from the calculated control sequence is sent to the process, whilst the other control actions are rejected. Then, the next sampled value y(τ + 1) becomes known, step 1 is repeated with this new value, and the next control action u_{τ+1}(τ + 1) is calculated, which in principle will differ from u_τ(τ + 1) because of the new information available.
The algorithm is iteratively executed each time a new instant is sampled. Various MPC algorithms may differ from each other only in the horizon length, the predictive model, and the objective function to be minimized. Note that it is in general not a globally optimal algorithm, since the control decisions are made based only on the history values and the prediction over a finite future horizon.
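The three steps above can be sketched as a generic receding-horizon loop; `predict`, `optimize`, and `apply_action` are placeholder callables for illustration, not interfaces from the source:

```python
def mpc_loop(n_steps, horizon, predict, optimize, apply_action):
    """Generic MPC skeleton: at each instant, predict over the horizon,
    optimize the whole control sequence, but apply only its first action;
    everything is recomputed at the next instant with new information."""
    applied = []
    for tau in range(n_steps):
        prediction = predict(tau, horizon)       # step 1: model-based forecast
        u = optimize(prediction)                 # step 2: minimize the objective
        applied.append(apply_action(tau, u[0]))  # step 3: first action only
    return applied
```

Discarding all but the first action at every instant is what makes the controller robust to prediction error: later actions are always recomputed once new measurements arrive.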
MPC-Based Optimal Control for DAF Codes
In order to extend the application of DAF codes to live video streaming, the inventors have recognized and appreciated that a method to close the gap between DAF-L and DAF may be valuable: an online optimization algorithm with a lower computational complexity than DAF, but higher performance than DAF-L. Meanwhile, MPC provides a locally optimal online solution to any objective-function-minimizing process control problem, which the inventors have recognized and appreciated as supporting a solution to the problems described herein.
MPC-Based DAF Codes: The Offline Version
The inventors have recognized and appreciated that integrating MPC and DAF codes to impose a finite horizon on the optimization process of DAF codes may improve DAF codes at least in the areas described above. We call the proposed scheme DAF-M, where M stands for
¹ This notation indicates that the value of the variable at time τ + k is calculated at time τ.
MPC. A general flowchart of some embodiments of DAF-M is shown in FIG. 4. The text on the top of each block describes the procedure in MPC. The text in the parentheses below explains its function in DAF-M.
In this part, we focus on the localization of the optimization problem, so we temporarily assume that we know the full bit rate information of a video sequence. As a result, DAF-M may be considered an offline algorithm. The prediction of bit rate and other practical problems for online algorithms will be discussed further below.
Horizon
Denote τ as the index of the current time (also the index of the current sliding window). Since the window size is W, the frame newly added to the encoding buffer is the (τ + W − 1)-th frame.
In order to determine what slope to use in the τ-th window, we may calculate the best set of slopes over a future horizon based on both the known ASP in the past and the predicted ASP in the future, then take the first slope as the one used in encoding the τ-th window. We denote the length of the horizon as H, so the range of windows involved in our slope optimization is [τ, τ + H − 1].
We hope to clarify here the difference between the window size W and the horizon length H. The two concepts are easily confused, since both are spans of frames, and as either value increases, the performance and computational complexity of the controller tend to increase. In fact, they are independent parameters that serve different purposes. W is typically chosen in accordance with the longest tolerable end-to-end delay. It is specified by the user, so it is an input value that does not change during a communication session. On the other hand, H is a parameter that balances computational complexity against desired performance. It is chosen by the application according to the computing power, network condition, quality demand, etc. One extreme case is H = 1, for which only one slope is calculated and the algorithm reduces to a greedy algorithm; if H = T − τ + 1, so that all future frames are considered in obtaining each slope, it becomes the same as the globally optimal algorithm of DAF.
Objective Function
The inventors have recognized and appreciated that a good target may be similar to the global optimization problem in (4), which is minimizing the variance of ASP. With a finite horizon imposed, the objective function is limited to a local range. Because the slopes in the horizon [τ, τ + H − 1] can affect the ASP of the frames in [τ, τ + H + W − 2], the objective function is the variance over that range, as in (5):

J(a_τ^H) = Σ_{t=τ}^{τ+H+W−2} (P(t) − P̄)², (5)

where a_τ^H denotes the H-length slope vector for the windows starting from τ. P denotes the vector of ASP over the range [τ, τ + H + W − 2], including both past and predicted values. The calculation of P will be discussed in the following part. P̄ is the average value over P.
Process Model
In some embodiments, the process model may play a decisive role in an MPC controller. The chosen model may need to be capable of capturing the process dynamics so as to precisely predict the future outputs. Different from the problems in most industrial control scenarios, in DAF codes, if the bit rate vector s is given, the mapping between the control sequence (slope vector) and the predicted output (future ASP) may be deterministic:

ASP(s, a) = [ASP_{s,a}(1), ASP_{s,a}(2), …], (6)

where s is a bit rate vector and a is a slope vector. ASP_{s,a}(t) is defined in (2). The vector length is equal to the length of a. In the calculation of ASP, s needs to contain W − 1 more elements than a.
Although the ASP model is deterministic if we know all the future bit rates s and slope parameters a, the process model may no longer be deterministic when a finite horizon is imposed.
For the current window τ, the ASP in the range of the objective function in (5) consists of three parts, all of which may be (H + W − 1)-length vectors representing a part of the ASP within the range [τ, τ + H + W − 2]. Some of the components have non-zero elements on only a part of that range, as pointed out below.
• P_init: the initial ASP already sampled by previous windows. Non-zero range: [τ, τ + W − 2].
P_init = ASP(s_{τ−W+1}^{2W−2}, a_{τ−W+1}^{W−1}), (7)
where the index notation x_t^l means the t-th to (t + l − 1)-th elements of the vector x, for x being s or a. Because P_init is computed based on known s and previous selections of a, it may have no variable parameter.
• P_horizon: the range where the ASP is affected by the currently optimized slopes within the horizon. Non-zero range: [τ, τ + H + W − 2].
P_horizon = ASP(s_τ^{H+W−1}, a_τ^H). (8)
• P_future: the range where the ASP will be affected by the slopes beyond the horizon range in the future. Non-zero range: [τ + H, τ + H + W − 2].
P_future = ASP(s_{τ+H}^{2W−2}, â_{τ+H}^{W−1}), (9)
where â denotes the prediction of future slopes. Since the optimal choices of the slopes in this part will also be affected by slopes even further in the future, in order to limit the length of the horizon, we shall make a prediction without full knowledge about further frames. This is the part that may introduce error. Fortunately, because the selection of slopes for frames further away has less impact on the current optimal slope choice, a simple and safe prediction may be enough. In some embodiments, an all-zero vector may be used, â = 0, assuming all the windows in that range will use uniform distributions. In that case, we can eliminate the variable parameter â. The future bit rates up to time τ + H + 2W − 3 may still be needed as input parameters.
The result is a combination of the three components:
P = P_init + P_horizon + P_future. (10)
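Under the non-zero ranges listed above, equation (10) is an aligned, zero-padded sum of the three components; a minimal sketch, with frame indices taken relative to τ:

```python
def assemble_asp(W, H, p_init, p_horizon, p_future):
    """Combine the three ASP components over [tau, tau + H + W - 2]
    (length H + W - 1):
      p_init    -> first W - 1 positions (frames already partly sampled)
      p_horizon -> the full range (affected by the slopes being optimized)
      p_future  -> last W - 1 positions (frames [tau + H, tau + H + W - 2])"""
    L = H + W - 1
    P = [0.0] * L
    for i, v in enumerate(p_init):
        P[i] += v
    for i, v in enumerate(p_horizon):
        P[i] += v
    for i, v in enumerate(p_future):
        P[H + i] += v
    return P
```

Only `p_horizon` depends on the slope variables; `p_init` is fixed history and `p_future` is a constant once the all-zero slope prediction is assumed.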
We plug (6)–(10) into the objective function (5) and solve the optimization problem (11):

argmin_{a_τ^H} J(a_τ^H), (11)
s.t. −1 ≤ a_t ≤ 1, ∀t.
After solving for the optimal a, only the first element a_τ(τ) may be chosen to encode the τ-th window.
Complexity
For each window τ, the optimal slope in (11) can be solved via the Karush-Kuhn-Tucker (KKT) conditions. Because there are H variables to optimize and 2H KKT conditions, the optimization process yields a system of 3H equations. If we omit constant factors, the solution involves the generation of an H × H parameter matrix and the computation of its inverse. The algorithm is executed iteratively for each window, T times in total. As a result, the total computational complexity is Θ(H³T). Since H ≪ T and H does not increase with the video length, this can be considered a linear-complexity algorithm in T. Comparatively, DAF-M may have a much lower complexity than the Θ(T³) of DAF.
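For intuition, the per-window receding-horizon choice can be emulated with a brute-force search in place of the KKT solve. This toy stand-in is exponential in H and only illustrates the "optimize over a horizon, keep the first slope" pattern; it is not the low-complexity solver described above:

```python
from itertools import product

def best_first_slope(cost, H, grid=11):
    """Search a coarse grid of slope vectors over the horizon and return
    only the first slope, as DAF-M applies only a_tau(tau).  `cost` is the
    objective, e.g. the ASP variance over the affected frame range."""
    levels = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    best = min(product(levels, repeat=H), key=cost)
    return best[0]
```

The KKT system replaces this exhaustive search with one H × H linear solve per window, which is what keeps DAF-M affordable for live streaming.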
Online Algorithm With Bit Rate Prediction
From (8) and (9), we know that in order to optimize the slopes in horizon H, we may need to know the bit rates H + 2W − 3 frames ahead of the current time τ, or H + W − 2 frames ahead of the last known (a.k.a. the (τ + W − 1)-th) frame. However, in live-streamed video, we do not know the future bit rates beforehand. As a result, in order to make DAF-M an online algorithm, the inventors have recognized and appreciated that a prediction may need to be made on future bit rates, and the length of the prediction may be H + W − 2. Admittedly, a more accurate prediction of the video bit rate will help the MPC optimization obtain better control sequences. Therefore, using advanced algorithms, such as [18]–[22], may improve the coding result. A survey of video frame size forecasting algorithms can be found in [23]. Nevertheless, since video bit rate is a stochastic process, we can never expect a perfect prediction. Since our work does not focus on bit rate prediction algorithms, a simple linear formula (12) is adapted to predict the future bit rates. A similar formula is used in [24] and yields good results:

ŝ_{t+1} = (1 − α)ŝ_t + α s_t, (12)

where s_t denotes the bit rate of the frame newly added to the DAF encoder, and ŝ_{t+1} denotes the prediction for the future frames. α ∈ (0, 1] controls the weight of the new frame in the prediction of the bit rate. We fix it to 0.75 so that our algorithm gets good average performance in our implementations.
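Equation (12) is a one-tap exponentially weighted moving average; a minimal sketch, seeded with the first observed frame size:

```python
def predict_bitrate(frame_sizes, alpha=0.75):
    """Run the EWMA of (12) over observed frame sizes and return the
    prediction for the next frame:
        s_hat <- (1 - alpha) * s_hat + alpha * s.
    alpha in (0, 1] weights the newest frame."""
    if not frame_sizes:
        raise ValueError("need at least one observed frame size")
    s_hat = float(frame_sizes[0])
    for s in frame_sizes[1:]:
        s_hat = (1.0 - alpha) * s_hat + alpha * s
    return s_hat
```

With α = 0.75 the predictor tracks recent frames closely while still smoothing out single-frame spikes, which matches its use here as a coarse level estimate rather than a frame-by-frame forecast.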
In some embodiments, the DAF encoder may be implemented in an ASIC, an FPGA, or a similar integrated circuit chip. Alternatively or additionally, the system may be implemented as a driver controlling network interface hardware or in any other suitable way.
Since (12) is not accurate enough to perform frame-by-frame prediction, only a level of the future bit rate is obtained. We replace all the elements of s in (10) that exceed the last known frame with the predicted bit rate ŝ_{t+1}. We call the online MPC-based DAF codes algorithm DAF-O.
When it comes to computational complexity, in some embodiments, the prediction of the bit rate in DAF-O is only Θ(1), so the online algorithm may not increase the total computational complexity over DAF-M.
Optimization Results and Evaluations
In this part, we show some examples of optimization results obtained by different schemes, and analyze the resulting ASP.
ASP Results Comparison
FIG. 5a shows the bit rates for the CIF video sequence foreman coded with the H.264/AVC standard. FIG. 5b-d present the optimization results using DAF-O, DAF-M, and DAF, respectively, with W = 30 and H = 10. In FIG. 5b-d, each slope line represents a sampling distribution within a window. Since Δt = 1, there should be a window for every [t, t + W − 1] span. Because there are too many windows to show clearly in one figure, only a fraction of the windows is presented here.
The inventors have recognized and appreciated that (i) the first W − 1 slopes in DAF-M (FIG. 5c) and DAF-O (FIG. 5b) are uniform distributions. That is because there may not be enough past ASP to calculate P_init in that range (see (7)), so the slopes in that range are set to 0 as a warm-up period; (ii) the latter part of DAF-M is very similar to the globally optimal result of DAF in FIG. 5d, whilst DAF-O is not, because of its lack of information about the future bit rates.
FIG. 6 shows the resulting ASP using the optimized slopes from FIG. 5b-d (including DAF-L, which uses all uniform distributions). The numbers following the scheme names in the legend are the corresponding variances of ASP, and there is
DAF < DAF-M < DAF-O < DAF-L.
Measuring the Quality of ASP
Although we use the variance of ASP as the objective function in (5), the inventors have recognized and appreciated that it may not be an effective metric to measure the quality of the resulting ASP in fountain codes. For example, if the obtained sampling distributions generate very low ASP, they are not desirable even if the variance is low. Besides, the performance of an obtained sampling distribution may vary under different code rates. Therefore, we introduce a new metric to measure the quality of ASP, called the ASP coverage ratio, denoted ρ as in (13):

ρ(c) = (1/T) Σ_{t=1}^{T} [P(t) ≥ 1/c], (13)

where [·] is the Iverson bracket notation, defined to be 1 if the condition in the square brackets is satisfied and 0 otherwise. ρ(c) gives the ratio of frames whose ASP surpasses 1/c. It is a monotonically increasing function, and its output values are in the range [0, 1]: if c < 1/P_max, the output value is 0; if c ≥ 1/P_min, the output value is 1 (where P_max and P_min denote the maximum and minimum values in P, respectively). The inventors have recognized and appreciated that it is a good metric to measure the quality of ASP in this case for the following reasons. ASP reflects the average accumulated sampling probability of a frame, so, under the same code rate, a greater ASP for a frame indicates a higher probability of it being sampled than others. It also means that, in order to be sampled into at least one coded packet, a frame with higher ASP requires a lower code rate; ASP is inversely proportional to the expected code rate. Therefore, if the ASP of a frame is higher than the reciprocal of the code rate, 1/c, that frame is likely to be sampled into coded packets under code rate c, and thus has a chance to be decoded on the receiver side. As a result, ρ(c) represents the ratio of frames that could be decoded under code rate c.
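The coverage ratio of (13) reduces to a thresholded count over the ASP vector; a minimal sketch:

```python
def coverage_ratio(P, c):
    """ASP coverage ratio rho(c): the fraction of frames whose ASP is at
    least 1/c, i.e., the frames likely to be sampled (and thus decodable)
    under code rate c."""
    if c <= 0:
        return 0.0
    threshold = 1.0 / c
    return sum(1 for p in P if p >= threshold) / len(P)
```

Sweeping `c` over its range traces the monotone curves compared in FIG. 7.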
FIG. 7 shows the coverage ratio curves of the ASP from FIG. 6 and of block coding, with c normalized to the range [0, 1]. The inventors have recognized and appreciated that the result justifies the gains brought by the proposed DAF codes: when c is greater than 60% of 1/P_min, the ASP coverage ratios are over 60% and there is DAF > DAF-M > DAF-O > DAF-L. That means, in terms of the ratio of possibly decoded frames within that code rate range, in some embodiments, DAF outperforms DAF-M, DAF-M outperforms DAF-O, and DAF-O outperforms DAF-L.
Conversely, the relationships of ρ are inverted when c < 50%. However, since in those scenarios the code rates are so low that the decoding ratios fall below 50% and the video can hardly be viewed under any scheme, they are not the cases of most concern to users. In other words, the proposed DAF codes sacrifice performance when the code rate is extremely low, in exchange for better performance when the code rate is in a moderately high range.
It should be noted that this metric does not fully represent the actual decoding ratio in video communication. It only considers the sampling probability, but (i) it does not indicate when the frames are decoded, so it does not represent the in-time decoding ratio (IDR); (ii) it does not consider the factors of fountain codes, such as degree distribution, codeword length, etc. That is the reason why the ρ curve of block coding does not look significantly worse than DAF-L in FIG. 7, while the IDR results of block coding are far inferior to any other scheme, as we will see in the next section.
Exemplary Experimental Results
Simulator Setup
The encoding/decoding parts of some embodiments may generally follow the DAF framework in [10], with changes made to the Video Preprocessing Module, especially with implementation of a different optimization algorithm in some embodiments.
We conduct the simulation experiments on the Common Open Research Emulator (CORE) [25] and the Extendable Mobile Ad-hoc Network Emulator (EMANE) [26]. The working environment is set up on Oracle® VM VirtualBox virtual machines.
We use CORE to emulate the topology of the virtual network and the relay nodes. Two VMs are connected to the virtual network as a source (or encoder/sender) node running the client application, and a destination (or decoder/receiver) node running the server application. A video is streamed from client to server using different schemes.
EMANE is used to emulate the IEEE 802.11b PHY and MAC layers of each wireless node. Because of the forward error correction (FEC) nature of fountain codes, we disable the 802.11b retransmission mechanism for all fountain-code-based schemes. The adaptive rate selection mechanism of 802.11b is also disabled. The Ad-hoc On-Demand Distance Vector (AODV) protocol is used for routing.
There are two nodes in the network: a source node and a destination node. The communication path from the source to the destination has one hop. The distance between the two nodes is carefully set so that the packets with 1024-byte payload will experience a 10% loss rate.
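The lossy link described above can also be modeled, outside the emulator, as an independent packet-erasure channel with a 10% drop probability. A minimal sketch (the function name and interface are illustrative, not part of the described system):

```python
import random

def send_with_loss(packets, loss_rate=0.10, seed=None):
    """Simulate the emulated one-hop link: each packet (e.g. with a
    1024-byte payload) is dropped independently with probability
    loss_rate. No retransmission occurs, matching the disabled-ARQ
    802.11b setup. Illustrative channel model only."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() >= loss_rate]

received = send_with_loss(list(range(20000)), loss_rate=0.10, seed=1)
```

With 20,000 packets, roughly 90% arrive, which is the operating point assumed for the experiments.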
Performance Evaluation
We implement eight schemes for comparisons, which are abbreviated as follows:
1. DAF: the full optimization version of delay-aware fountain codes in [10].
2. DAF-M: the MPC-based DAF codes as described herein.
3. DAF-O: the online version of MPC-based DAF codes as described herein.
4. DAF-L: the non-optimized version of DAF codes. All the windows may use uniform distributions in some embodiments.
5. S-LT: the sliding window LT code from [8].
6. Block: the block coding for fountain codes.
7. Expand: the expanding window scheme of [11].
8. TCP: uses the TCP protocol to stream video. In order to add delay awareness, the video file is also segmented into blocks as in the "Block" scheme, but the blocks are sent using TCP.
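The window-based schemes in the list above differ chiefly in which source frames a packet coded at time t may cover. The following sketch illustrates that difference; the function, its name, and the fixed window/block sizes are hypothetical choices for illustration only:

```python
def coding_window(scheme, t, total_frames, window=32, block=32):
    """Return the inclusive (start, end) frame range that a packet
    generated at frame t may encode over, for three window-based
    schemes. Parameter values are illustrative assumptions."""
    if scheme == "sliding":    # S-LT: fixed-length window ending at t
        return max(0, t - window + 1), t
    if scheme == "expanding":  # window grows from the start of the stream
        return 0, t
    if scheme == "block":      # disjoint, fixed-size, non-overlapping blocks
        start = (t // block) * block
        return start, min(start + block - 1, total_frames - 1)
    raise ValueError(f"unknown scheme: {scheme}")
```

The DAF-based schemes also use windows, but additionally optimize the sampling distribution over frames inside each window, which this sketch does not capture.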
All seven fountain-code-based schemes use the following parameter setting: the packet size P = 1024 bytes; for the degree distribution, let δ = 0.02 and c = 0.4 (as defined in [2, Definition 11]). For the TCP scheme, the maximum data rate is limited to the same amount as required by the fountain-code-based schemes for the sake of fairness.
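The parameters δ and c above parameterize the robust soliton degree distribution of [2, Definition 11]. As a sketch of how that distribution may be computed — this follows Luby's published definition and is not necessarily the exact implementation used in the experiments:

```python
import math

def robust_soliton(k, c=0.4, delta=0.02):
    """Robust soliton degree distribution over degrees 1..k (Luby's
    LT-code definition). Returns a list mu of length k + 1, where
    mu[d] is the probability of degree d (mu[0] is unused and 0)."""
    # Ideal soliton component rho
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for i in range(2, k + 1):
        rho[i] = 1.0 / (i * (i - 1))
    # Spike component tau, with R = c * ln(k / delta) * sqrt(k)
    R = c * math.log(k / delta) * math.sqrt(k)
    tau = [0.0] * (k + 1)
    pivot = int(round(k / R))          # approximation of k/R as an integer index
    for i in range(1, min(pivot, k + 1)):
        tau[i] = R / (i * k)
    if 1 <= pivot <= k:
        tau[pivot] = R * math.log(R / delta) / k
    # Normalize
    beta = sum(rho[i] + tau[i] for i in range(1, k + 1))
    return [(rho[i] + tau[i]) / beta for i in range(k + 1)]
```

With c = 0.4 and δ = 0.02 the spike term is large relative to typical textbook settings, which concentrates extra mass on the pivot degree; the distribution remains a valid probability distribution after normalization.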
Several benchmark CIF test sequences, provided by [27], are used for our evaluation. They are coded into H.264/AVC format using x264 [28], encapsulated into ISO MP4 files using MP4Box [29], and converted into packet streams by the mp4trace tool from the EvalVid tool-set [30]. The coding structure is IPPP, which contains only one I-frame (the first frame), no B-frames, and P-frames for all the rest. All the delays shown in the experiments are in units of seconds. Denote by C and TDelay the code rate and the tolerable delay, respectively.
We conduct 20 experiments for each setting with different random seeds, and take the median value as the performance measure. The in-time decoding ratio (IDR) results are presented, since IDR is a better metric for live video streaming than the file decoding ratio.
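In simplified form, the IDR is the fraction of frames decoded no later than their playback deadlines. A hedged sketch of that computation (the exact metric used in the experiments may differ in details such as how deadlines are derived from TDelay):

```python
def in_time_decoding_ratio(decode_times, deadlines):
    """IDR: fraction of frames decoded no later than their playback
    deadline. decode_times[i] is the decoding time of frame i, or
    None if the frame was never decoded. Illustrative definition."""
    on_time = sum(
        1 for t, d in zip(decode_times, deadlines)
        if t is not None and t <= d
    )
    return on_time / len(deadlines)
```

A frame decoded after its deadline counts as a failure even though it would count toward the file decoding ratio, which is why IDR better reflects live-streaming quality.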
Evaluate The Effect Of Horizon Length
In this case, we compare the performance of four DAF-based schemes. For DAF-M and DAF-O, we set H to 1, 5, 10, 15 and 20 to observe the influence of horizon length.
FIG. 8 shows the IDR curves of the DAF-based schemes for foreman. Because there are two dimensions of variables, we obtain the IDR curves vs. C when fixing TDelay = 1.2s in FIG. 8a, and the IDR curves vs. TDelay when fixing C = 0.77 in FIG. 8b.
Results of DAF-M and DAF-O are plotted with dashed lines and dotted lines, respectively. The different horizon lengths H are indicated by the width of the lines and the brightness of the colors: thicker and darker lines indicate larger H, and thinner and lighter lines indicate smaller H. Various values of H may be chosen for MPC-O and MPC-M.
The numerical results when fixing both the delay TDelay = 1.2s and the code rate C = 0.8 for foreman are shown in Table 2.
Table 2: IDR comparisons of DAF-based schemes using different horizon lengths.
From the results above, the inventors have recognized and appreciated the following:
• The decoding ratio of all the schemes is an increasing function of TDelay and a decreasing function of C. That means larger delay and lower code rate lead to higher overall performance, which meets our expectation.
• Among the DAF-based schemes, the performance of DAF may be the highest in some embodiments, while DAF-L may be the lowest in some embodiments. The results of DAF-M and DAF-O may generally fall between those of DAF and DAF-L in some embodiments. The relationships of the decoding ratios are consistent with the ASP coverage ratios seen in FIG. 7.
• In general, DAF-M may outperform DAF-O under the same horizon length. That is because DAF-M can be deemed DAF-O with "perfect" bit rate prediction. As a result, the gap between DAF and DAF-M is caused only by local bit rate knowledge, while the performance drop from DAF to DAF-O may come from both the limited horizon length and the inaccuracy of bit rate prediction.
• For DAF-M, the decoding ratio gets higher with increasing horizon length. However, since the computational complexity increases cubically with H, an extremely large H may not be affordable.
• For DAF-O, a longer horizon length may not guarantee better performance. In the example of Table 2, the highest IDR of DAF-O is achieved when H = 10, not H = 20. The reason is that in DAF-O, the long-term bit rate prediction may become inaccurate. Due to accumulated long-term prediction error, the performance of DAF-O may degrade when the prediction length increases.
• When H = 1, the performance of DAF-O is significantly lower than at other H values. That is because the scheme reduces to a greedy algorithm, and the decision is based only on the history ASP.
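The receding-horizon behavior of DAF-O discussed above can be sketched as follows: at each step the encoder predicts bit rates over a horizon of H windows, optimizes over that horizon, and applies only the first decision before sliding forward. All names and interfaces below are placeholders, not the patent's implementation:

```python
def daf_o_step(history_rates, predictor, optimizer, H=10):
    """One receding-horizon (MPC-style) iteration of the online scheme.

    history_rates : observed bit rates so far
    predictor     : placeholder for the bit-rate predictor,
                    called as predictor(history_rates, H) -> list of H rates
    optimizer     : placeholder for the window optimizer,
                    called as optimizer(future_rates) -> list of H decisions
    """
    future = predictor(history_rates, H)  # predicted bit rates over the horizon
    plan = optimizer(future)              # optimized decisions for all H steps
    return plan[0]                        # apply only the first step, then slide

# With H == 1 the horizon contains no meaningful look-ahead, so the
# decision degenerates to a greedy choice driven by history alone —
# consistent with the observed performance drop at H = 1.
```

Accumulated prediction error over long horizons explains why H = 20 can underperform H = 10 for DAF-O, while DAF-M (which uses the true local bit rates rather than predictions) keeps improving with H.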
Compare With the Existing Schemes
In this case, we compare the proposed schemes with the existing video streaming algorithms. To strike a good balance between complexity and performance, we use H = 10 for MPC-O and MPC-M.
First, we want to show IDR comparisons of the online window-based fountain codes schemes: DAF-O, DAF-L, S-LT, expanding window, and block coding. We choose all the combinations of TDelay ∈ [0.6, 1.5] and C ∈ [0.6, 0.9] to conduct the experiments on foreman.
There are two dimensions of variables, so the results of each scheme form a surface of IDR . FIG. 9 shows five surfaces of the online schemes.
The numerical results obtained by more schemes for more sequences are shown in Table 3.
Table 3: IDR comparisons of proposed schemes and existing video streaming algorithms.
From the results above, we have the following observations:
• Among all the schemes, DAF may have the highest performance, followed by DAF-M, both of which are offline algorithms. Considering that the computational complexity of DAF-M may be orders of magnitude lower than that of DAF in some embodiments, DAF-M may be a more practical offline algorithm in some embodiments.
• Among all the online schemes, DAF-O may have the highest performance, followed by DAF-L. As shown in FIG. 9, the surface of DAF-O is almost always higher than that of any other scheme.
• If C is too high or TDelay is too small, the performance of DAF may be lower than that of DAF-L. This result accords with the ASP coverage ratio in FIG. 7 when the coverage c is small, but it is not a very typical scenario, since all the decoding ratios are too low for the video to be properly watched.
• The performance of TCP is relatively low. The reason is that TCP is not suitable for wireless channels where the packet loss rate is high [31]. Its congestion control mechanism does not help the performance.
• The block coding scheme performs the lowest among all schemes, although its ASP coverage ratio is not very low compared to the others. The reason is that the blocks are small and non-overlapping, so the coding overhead is very large [13].
References
The following references are incorporated herein by reference in their entireties:
[1] J. W. Byers, M. Luby, and M. Mitzenmacher, "A digital fountain approach to asynchronous reliable multicast," IEEE Journal on Selected Areas in Communications, vol. 20, no. 8, pp. 1528-1540, 2002.
[2] M. Luby, "LT codes," in Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, 2002, pp. 271-280.
[3] A. Shokrollahi, "Raptor codes," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2551-2567, 2006.
[4] M. Luby, A. Shokrollahi, M. Watson, T. Stockhammer, and L. Minder, "RaptorQ forward error correction scheme for object delivery (RFC 6330)," IETF Request for Comments, 2011.
[5] Q. Huang, K. Sun, X. Li, and D. O. Wu, "Just FUN: A joint fountain coding and network coding approach to loss-tolerant information spreading," in Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 2014, pp. 83-92.
[6] J. -P. Wagner, J. Chakareski, and P. Frossard, "Streaming of scalable video from multiple servers using rateless codes," in IEEE International Conference on Multimedia and Expo, 2006. IEEE, 2006, pp. 1501-1504.
[7] S. Ahmad, R. Hamzaoui, and M. M. Al-Akaidi, "Unequal error protection using fountain codes with applications to video communication," IEEE Transactions on Multimedia, vol. 13, no. 1, pp. 92-101, 2011.
[8] M. C. Bogino, P. Cataldi, M. Grangetto, E. Magli, and G. Olmo, "Sliding-window digital fountain codes for streaming of multimedia contents," in IEEE International
Symposium on Circuits and Systems, 2007. ISCAS 2007. IEEE, 2007, pp. 3467-3470.
[9] P. Cataldi, M. Grangetto, T. Tillo, E. Magli, and G. Olmo, "Sliding-window raptor codes for efficient scalable wireless video broadcasting with unequal loss protection," IEEE
Transactions on Image Processing, vol. 19, no. 6, pp. 1491-1503, 2010.
[10] K. Sun, H. Zhang, and D. Wu, "DAF codes: Delay-aware fountain codes for video streaming," IEEE Transactions on Multimedia, vol. 20, no. 8, pp. 1528-1540, 2016.
[11] D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. J. Piechocki, "Expanding window fountain codes for unequal error protection," IEEE Transactions on Communications, vol. 57, no. 9, pp. 2510-2516, 2009.
[12] D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, "Scalable video multicast using expanding window fountain codes," IEEE Transactions on Multimedia, vol. 11, no. 6, pp. 1094-1104, 2009.
[13] G. Liva, E. Paolini, and M. Chiani, "Performance versus overhead for fountain codes over Fq," IEEE Communications Letters, vol. 14, no. 2, pp. 178-180, 2010.
[14] K. Sun and D. Wu, "Video rate control strategies for cloud gaming," Journal of Visual Communication and Image Representation, vol. 30, pp. 234-241, 2015.
[15] E. F. Camacho and C. B. Alba, Model predictive control. Springer Science & Business Media, 2013.
[16] H. Borhan, A. Vahidi, A. M. Phillips, M. L. Kuang, I. V. Kolmanovsky, and S. D. Cairano, "MPC-based energy management of a powersplit hybrid electric vehicle," IEEE Transactions on Control Systems Technology, vol. 20, no. 3, pp. 593-603, 2012.
[17] P. Falcone, H. Eric Tseng, F. Borrelli, J. Asgari, and D. Hrovat, "MPC-based yaw and lateral stabilisation via active front steering and braking," Vehicle System Dynamics, vol. 46, no. S1, pp. 611-628, 2008.
[18] A. Abdennour, "Short-term MPEG-4 video traffic prediction using ANFIS," International Journal of Network Management, vol. 15, no. 6, pp. 377-392, 2005.
[19] A. D. Doulamis, N. D. Doulamis, and S. D. Kollias, "An adaptable neural-network model for recursive nonlinear traffic prediction and modeling of MPEG video sources," IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 150-166, 2003.
[20] A. Bhattacharya, A. G. Parlos, and A. F. Atiya, "Prediction of MPEG-coded video source traffic using recurrent neural networks," IEEE Transactions on Signal Processing, vol. 51, no. 8, pp. 2177-2190, 2003.
[21] K. Y. Lee, M. Kim, H.-S. Jang, and K.-S. Cho, "Accurate prediction of real-time MPEG-4 variable bit rate video traffic," ETRI Journal, vol. 29, no. 6, pp. 823-825, 2007.
[22] K. Sun and B. Yan, "Efficient P-frame complexity estimation for frame layer rate control of H.264/AVC," in 18th IEEE International Conference on Image Processing (ICIP),
2011. IEEE, 2011, pp. 1669-1672.
[23] R. J. Haddad, M. P. McGarry, and P. Seeling, "Video bandwidth forecasting," IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 1803-1818, 2013.
[24] A. M. Adas, "Using adaptive linear prediction to support real-time VBR video under RCBR network service model," IEEE/ACM Transactions on Networking, vol. 6, no. 5, pp. 635-644, 1998.
[25] J. Ahrenholz, "Comparison of CORE network emulation platforms," in Military Communications Conference (MILCOM 2010). IEEE, 2010, pp. 166-171.
[26] U.S. Naval Research Laboratory, Networks and Communication Systems Branch. The extendable mobile ad-hoc network emulator (EMANE). [Online]. Available:
http://www.nrl.navy.mil/itd/ncs/products/emane
[27] P. Seeling and M. Reisslein, "Video transport evaluation with H.264 video traces," IEEE Communications Surveys and Tutorials, vol. 14, no. 4, pp. 1142-1165, 2012.
[28] x264 team. x264. [Online]. Available:
http://www.videolan.org/developers/x264.html
[29] GPAC project. MP4Box. [Online]. Available:
http://gpac.wp.mines-telecom.fr/mp4box/
[30] J. Klaue, B. Rathke, and A. Wolisz, "Evalvid-a framework for video transmission and quality evaluation," in Computer Performance Evaluation. Modelling Techniques and Tools. Springer, 2003, pp. 255-272.
[31] G. Holland and N. Vaidya, "Analysis of TCP performance over mobile ad hoc networks," Wireless Networks, vol. 8, no. 2/3, pp. 275-288, 2002.
Computing Environment
Techniques for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node may be implemented on any suitable hardware, including a programmed computing system. For example, FIG. 1 illustrates a system implemented with multiple computing devices, which may be distributed and/or centralized. Also, FIGS. 2 and 3 illustrate algorithms executing on at least one computing device. FIG. 19 illustrates an example of a suitable computing system environment 300 on which embodiments of these algorithms may be implemented. This computing system may be representative of a computing system that implements the described technique of increasing data throughput and decreasing transmission delay from a source node to a sink node via a relay node. However, it
should be appreciated that the computing system environment 300 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 300 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 300.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments, or cloud-based computing environments that include any of the above systems or devices, and the like.
The techniques described herein may be implemented in whole or in part within network interface 370. The computing environment may execute computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to FIG. 19, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 310. Though a programmed general purpose computer is illustrated, it should be understood by one of skill in the art that algorithms may be implemented in any suitable computing device. Accordingly, techniques as described herein may be implemented in a system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node. These techniques may be implemented in such network devices as originally manufactured or as a retrofit, such as by changing program memory devices holding programming for such network devices or software download. Thus, some or all of the components illustrated in FIG. 19, though illustrated as part of a general purpose computer, may be regarded as representing portions of a node or other component in a network system.
Components of computer 310 may include, but are not limited to, a processing unit 320, a system memory 330, and a system bus 321 that couples various system components including the system memory 330 to the processing unit 320. The system bus 321 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel
Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as
Mezzanine bus.
Computer 310 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 310 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computer 310. Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example and not limitation, communication media includes wired media such as a wired network or direct- wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR), and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 330 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 331 and random access memory (RAM) 332. A basic input/output system 333 (BIOS), containing the basic routines that help to transfer information between elements within computer 310, such as during start-up, is
typically stored in ROM 331. RAM 332 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 320. By way of example and not limitation, FIG. 19 illustrates operating system 334, application programs 335, other program modules 336, and program data 337.
The computer 310 may also include other removable/non-removable,
volatile/nonvolatile computer storage media. By way of example only, FIG. 19 illustrates a hard disk drive 341 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 351 that reads from or writes to a removable, nonvolatile magnetic disk 352, and an optical disk drive 355 that reads from or writes to a removable, nonvolatile optical disk 356 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 341 is typically connected to the system bus 321 through a non-removable memory interface such as interface 340, and magnetic disk drive 351 and optical disk drive 355 are typically connected to the system bus 321 by a removable memory interface, such as interface 350.
The drives and their associated computer storage media discussed above and illustrated in FIG. 19, provide storage of computer readable instructions, data structures, program modules, and other data for the computer 310. In FIG. 19, for example, hard disk drive 341 is illustrated as storing operating system 344, application programs 345, other program modules 346, and program data 347. Note that these components can either be the same as or different from operating system 334, application programs 335, other program modules 336, and program data 337. Operating system 344, application programs 345, other program modules 346, and program data 347 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 310 through input devices such as a keyboard 362 and pointing device 361, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 320 through a user input interface 360 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A monitor 391 or other type of display device is also connected to the system bus 321 via an interface, such as a video
interface 390. In addition to the monitor, computers may also include other peripheral output devices such as speakers 397 and printer 396, which may be connected through an output peripheral interface 395.
The computer 310 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 380. The remote computer 380 may be a personal computer, a server, a router, a network PC, a peer device, or some other common network node, and typically includes many or all of the elements described above relative to the computer 310, although only a memory storage device 381 has been illustrated in FIG. 19. The logical connections depicted in FIG. 19 include a local area network (LAN) 371 and a wide area network (WAN) 373, but may also include other networks. Such networking environments are commonplace in offices, enterprise- wide computer networks, intranets, and the Internet.
When used in a LAN networking environment, the computer 310 is connected to the LAN 371 through a network interface or adapter 370. When used in a WAN networking environment, the computer 310 typically includes a modem 372 or other means for establishing communications over the WAN 373, such as the Internet. The modem 372, which may be internal or external, may be connected to the system bus 321 via the user input interface 360, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 310, or portions thereof, may be stored in the remote memory storage device. By way of example and not limitation, FIG. 19 illustrates remote application programs 385 as residing on memory device 381. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art.
In some embodiments, techniques described herein may be used in streaming multimedia services. Such services may include streaming multimedia, such as video and/or audio, over a network such as the Internet between a streaming server and a client. Some embodiments of a streaming service may be over a different network, such as a LAN, where the streaming server may be installed on a computer within the premises of a customer, such as a house or office building. Alternatively or additionally, the streaming server may be geographically remote relative to the clients, and the connection between the server and clients
may be a dedicated wireless connection. Alternatively, the connection may be over a shared network such as a 5G cellular network.
Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Further, though advantages of the present invention are indicated, it should be appreciated that not every embodiment of the invention will include every described advantage. Some embodiments may not implement any features described as advantageous herein.
Accordingly, the foregoing description and drawings are by way of example only.
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. Though, a processor may be implemented using circuitry in any suitable format.
Further, it should be appreciated that a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, or any other suitable portable or fixed electronic device.
Also, a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, or fiber optic networks.
Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
In this respect, the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. As is apparent from the foregoing examples, a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form. Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above. As used herein, the term
"computer-readable storage medium" encompasses only a computer-readable medium that can be considered to be a manufacture (i.e., article of manufacture) or a machine. Alternatively or additionally, the invention may be embodied as a computer readable medium other than a computer-readable storage medium, such as a propagating signal.
The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or
implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags, or other mechanisms that establish relationship between data elements.
Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and the invention is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
Also, the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
Also, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," "having," "containing," "involving," and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items.
In the attached claims, various elements are recited in different claims. However, the claimed elements, even if recited in separate claims, may be used together in any suitable combination.
Claims
1. A network system for increasing data throughput and decreasing transmission delay along a data link from a source node to a sink node via a relay node, the network system comprising:
a first node configured to:
predict at least one future video encoding characteristic of a video source over a finite number of future frames,
encode a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic, and
transmit the plurality of video data packets,
wherein the video source is a video source of the plurality of video data packets.
2. The network system of claim 1, wherein:
the rateless coding comprises fountain coding,
the first node is configured to predict the at least one future video encoding characteristic based on at least one video encoding characteristic of the video source, and
adaptively adjusting the at least one parameter of the rateless coding comprises:
using model predictive control by adaptively selecting, based on the predicted at least one future video encoding characteristic, a first length of a first data window and a second length of a second data window of a plurality of overlapping sliding data windows, the first length comprising a first number of frames that the first data window can accommodate, the second length comprising a second number of frames that the second data window can accommodate, and
segmenting data from frames of the video source into the plurality of video data packets based on at least the first length and the second length.
3. The network system of claim 2, wherein:
the at least one future video encoding characteristic comprises a future video bit rate of the video source, the future video bit rate being variable.
4. The network system of claim 2, wherein:
the plurality of overlapping sliding data windows are non-homogeneous and collectively have more than one length and/or more than one sampling distribution.
5. The network system of claim 2, wherein:
adaptively adjusting the at least one parameter of the rateless coding comprises:
adaptively adjusting a first sampling distribution for the first data window and a second sampling distribution for the second data window.
6. The network system of claim 2, wherein:
the at least one video encoding characteristic comprises:
a past frame rate of the video source,
a number of frames in a group of pictures of the video source, and/or a past video bit rate of the video source, the past video bit rate being variable.
7. The network system of claim 1, wherein:
the data link is at least partially wireless.
8. The network system of claim 1, wherein at least one video data packet of the plurality of video data packets comprises at least 100 bits.
9. The network system of claim 1, wherein:
the first node comprises a streaming video server streaming live video content.
10. The network system of claim 9, wherein:
the first node is configured to transmit the plurality of video data packets to at least one second node, the at least one second node comprising:
a sink node configured to receive one or more of the plurality of video data packets from the source node via at least one relay node, or
at least one relay node configured to receive at least one of the plurality of video data packets from the source node.
11. At least one computer-readable storage medium encoded with executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method for transmitting a video stream over a data link from a source node to a sink node via a relay node, the method comprising:
predicting at least one future video encoding characteristic of a video source over a finite number of future frames;
encoding a plurality of video data packets using rateless coding based on the predicted at least one future video encoding characteristic by adaptively adjusting at least one parameter of the rateless coding based on the predicted at least one future video encoding characteristic; and
transmitting the plurality of video data packets,
wherein the video source is a video source of the plurality of video data packets.
12. The at least one computer-readable storage medium of claim 11, wherein:
the predicting is based on at least one video encoding characteristic of the video source, and
adaptively adjusting the at least one parameter of the rateless coding comprises:
using model predictive control by adaptively selecting, based on the predicted at least one future video encoding characteristic, a first length of a first data window and a second length of a second data window of a plurality of overlapping sliding data windows, the first length comprising a first number of frames that the first data window can accommodate, the second length comprising a second number of frames that the second data window can accommodate, and
segmenting data from frames of the video source into the plurality of video data packets based on at least the first length and the second length.
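The pipeline recited in claims 1, 2, 11, and 12 (predict future encoding characteristics, select sliding-window lengths, segment frames into packets, then apply rateless coding) can be illustrated with a minimal sketch. This is an illustrative stand-in, not the claimed implementation: the function names, the averaging bit-rate predictor, the bit-budget window selector, and the toy degree distribution are all hypothetical simplifications of the model-predictive-control and optimized fountain-code components described in the specification.

```python
import random

def predict_bitrates(past_bitrates, horizon):
    """Hypothetical predictor: carry the recent average bit rate
    forward over the next `horizon` frames (a stand-in for the
    claimed prediction of future video encoding characteristics)."""
    avg = sum(past_bitrates) / len(past_bitrates)
    return [avg] * horizon

def choose_window_length(predicted_bitrates, min_len, max_len, bit_budget):
    """Toy window-length selector: accommodate as many future frames
    as fit within a per-window bit budget, clamped to [min_len, max_len].
    A real system would solve this via model predictive control."""
    total, length = 0.0, 0
    for rate in predicted_bitrates[:max_len]:
        if total + rate > bit_budget and length >= min_len:
            break
        total += rate
        length += 1
    return max(length, min_len)

def segment(frames, packet_size):
    """Concatenate a window's frames and split them into fixed-size
    source packets, zero-padding the final packet."""
    data = b"".join(frames)
    data += b"\x00" * ((-len(data)) % packet_size)
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]

def lt_encode(source_packets, n_coded, rng):
    """Rateless (LT-style) encoding: each coded packet is the XOR of a
    randomly sampled subset of source packets. The degree distribution
    here is a toy choice, not an optimized fountain-code distribution."""
    coded = []
    for _ in range(n_coded):
        degree = min(rng.choice([1, 2, 2, 3, 4]), len(source_packets))
        chosen = rng.sample(range(len(source_packets)), degree)
        payload = bytes(len(source_packets[0]))
        for idx in chosen:
            payload = bytes(a ^ b for a, b in zip(payload, source_packets[idx]))
        coded.append((chosen, payload))
    return coded

# Example: predict, size one window, segment it, and emit coded packets.
rng = random.Random(42)
frames = [bytes([i]) * 50 for i in range(6)]          # six 50-byte frames
future = predict_bitrates([400.0, 420.0], horizon=10)  # predicted bits/frame
win_len = choose_window_length(future, min_len=2, max_len=8, bit_budget=1800)
packets = segment(frames[:win_len], packet_size=64)
coded = lt_encode(packets, n_coded=8, rng=rng)
```

A receiver that collects slightly more coded packets than there are source packets can recover the window by iteratively peeling degree-one packets; overlapping successive windows (claim 2) lets packets received late for one window still contribute to the next.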
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662309069P | 2016-03-16 | 2016-03-16 | |
| US62/309,069 | 2016-03-16 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017161122A1 (en) | 2017-09-21 |
Family
ID=59851216
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2017/022715 Ceased WO2017161122A1 (en) | 2016-03-16 | 2017-03-16 | System for live video streaming using delay-aware fountain codes |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2017161122A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110170591A1 (en) * | 2008-09-16 | 2011-07-14 | Dolby Laboratories Licensing Corporation | Adaptive Video Encoder Control |
| US20130322287A1 (en) * | 2011-02-17 | 2013-12-05 | Blackberry Limited | Packet Delay Optimization in the Uplink of a Multi-Hop Cooperative Relay-Enabled Wireless Network |
| US20150179183A1 (en) * | 2006-12-12 | 2015-06-25 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream |
| WO2016022982A1 (en) * | 2014-08-08 | 2016-02-11 | University Of Florida Research Foundation, Inc. | Joint fountain coding and network coding for loss-tolerant information spreading |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110191342A (en) * | 2019-07-02 | 2019-08-30 | 北京字节跳动网络技术有限公司 | Video encoding method and device, and video decoding method and device |
| CN113068065A (en) * | 2021-02-25 | 2021-07-02 | 北京邮电大学 | A low-latency streaming media transmission method based on viewpoint prediction |
| CN114900273A (en) * | 2022-03-30 | 2022-08-12 | 上海交通大学 | A continuous reliable transmission method based on covering windows and non-uniformly sampled fountain codes |
| CN114900273B (en) * | 2022-03-30 | 2024-05-24 | 上海交通大学 | A continuous reliable transmission method based on covering windows and non-uniformly sampled fountain codes |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110620892B (en) | Video Annotation Technology in Video Communication | |
| US8711929B2 (en) | Network-based dynamic encoding | |
| EP1709783B1 (en) | Methods and systems that use information about data packets to determine an order for sending the data packets | |
| US9510028B2 (en) | Adaptive video transcoding based on parallel chunked log analysis | |
| Sun et al. | MPC-based delay-aware fountain codes for real-time video communication | |
| Darabkh et al. | Efficient PFD-based networking and buffering models for improving video quality over congested links | |
| Chakareski | Wireless streaming of interactive multi-view video via network compression and path diversity | |
| WO2017161124A1 (en) | System for video streaming using delay-aware fountain codes | |
| WO2017161122A1 (en) | System for live video streaming using delay-aware fountain codes | |
| Sun et al. | MPC-based delay-aware fountain codes for live video streaming | |
| JP4732428B2 (en) | Transcoding node and transcoding method for multiple description transcoding | |
| Nguyen et al. | Scalable 360 video streaming using HTTP/2 | |
| Darabkh et al. | Improving UDP performance using intermediate QoD‐aware hop system for wired/wireless multimedia communication systems | |
| Kurdoglu et al. | Perceptual quality maximization for video calls with packet losses by optimizing FEC, frame rate, and quantization | |
| Wang et al. | Adaptive VR video data transmission method using mobile edge computing based on AIoT cloud VR | |
| CN112040235B (en) | Video resource encoding method and device and video resource decoding method and device | |
| Sun et al. | Unequal error protection for video streaming using delay-aware fountain codes | |
| Roder et al. | Branch and bound algorithms for rate-distortion optimized media streaming | |
| Singh et al. | Modeling and optimization of video transmission in data compression & source coding | |
| US20250385944A1 (en) | Dynamic systems and methods for media-aware low- to ultralow-latency, real-time transport protocol content delivery | |
| 남우승 | Exploring Compressive Transmission with Neural Compression for Efficient Data Communication | |
| CN106875416B (en) | User terminal and multicast-based content receiving method of user terminal | |
| Lima et al. | Optimal rate allocation for p2p video streaming | |
| Huang et al. | Optimized packet scheduling for live streaming on peer-to-peer network with network coding | |
| Kovács et al. | Receiver heterogeneity helps: Network coding for wireless multi-layer multicast |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17767526; Country of ref document: EP; Kind code of ref document: A1 |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17767526; Country of ref document: EP; Kind code of ref document: A1 |