
TWI354495B - Google Patents


Info

Publication number
TWI354495B
TWI354495B TW96142831A
Authority
TW
Taiwan
Prior art keywords
audio
video
module
playback
layered
Prior art date
Application number
TW96142831A
Other languages
Chinese (zh)
Other versions
TW200922328A (en)
Original Assignee
Univ Nat Cheng Kung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Cheng Kung filed Critical Univ Nat Cheng Kung
Priority to TW96142831A priority Critical patent/TW200922328A/en
Publication of TW200922328A publication Critical patent/TW200922328A/en
Application granted granted Critical
Publication of TWI354495B publication Critical patent/TWI354495B/zh


Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Description

IX. DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a multimedia streaming system, and more particularly to a layered multimedia streaming system that employs audio-video synchronization and bandwidth adaptation techniques.

[Prior Art]

With the spread of broadband and IEEE 802.11b/g networks, together with emerging mobile 3G and WiMAX networks, users in the resulting heterogeneous network can use any kind of device, over different access networks, to reach audio-video multimedia resources on the network.

To satisfy different users' quality-of-service requirements, scalable multimedia coding can compress audio-video data into files of different quality. In addition, a well-designed adaptive multimedia streaming control architecture can change the transmission rate according to the network resources available at the time, so as to adjust the resolution and quality of playback. Combining the properties of scalable multimedia coding with an adaptive multimedia streaming control architecture is therefore a very important issue for serving users with diverse quality-of-service requirements in today's heterogeneous networks.

As for scalable multimedia coding, several good techniques have been proposed, for example transcoding, Fine Granularity Scalability (FGS) for video, and Bit Sliced Arithmetic Coding (BSAC) for audio. The main advantage of FGS and BSAC is that the content needs to be encoded only once, in the highest-quality mode, to produce streams with different base and enhancement layers, and the decoder can decode from whatever stream data it receives (the base layer plus some of the enhancement layers). However, multimedia data with different numbers of base and enhancement layers have different decoding time complexity, which makes synchronized playback of audio and video streams difficult. Existing techniques and literature rarely discuss or study this problem.

As for adaptive multimedia streaming control architectures, three main factors affect synchronization: first, time-varying network bandwidth; second, unpredictable delay jitter; and third, packet loss. For the time-varying bandwidth factor, many good bandwidth estimation techniques and reference tools (such as TOPP, SLoPS, and pathChirp) have been used to accurately estimate the available bandwidth under Transmission Control Protocol (TCP) / User Datagram Protocol (UDP) traffic. For the delay jitter factor, buffer control schemes have been proposed to reduce its effect, for example managing the client buffer by adjusting the playback speed at the decoder, compensating for network delay jitter by adjusting the sending rate at the server, or handling delay jitter through buffer allocation. For the packet loss factor, packet retransmission mechanisms are often used to improve the audio-video quality degraded by lost packets, for example deciding whether to retransmit a packet according to its importance, or using a conditional retransmission mechanism that retransmits a lost packet only when the audio-video distortion it causes exceeds a set threshold. However, most existing adaptive streaming techniques focus on the transmission of traditional single-layer audio-video data and are not designed around the encoding/decoding characteristics of layered, scalable audio-video data, which wastes the advantages of scalable coding for audio-video streaming over heterogeneous networks.

A solution is therefore needed that combines the properties of scalable multimedia coding with an adaptive multimedia streaming control architecture, so as to adapt to different user devices and network characteristics and to satisfy users with diverse audio-video quality-of-service requirements in heterogeneous networks.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a layered multimedia streaming system that employs audio-video synchronization and bandwidth adaptation techniques.

Accordingly, the layered multimedia streaming system of the present invention, which employs audio-video synchronization and bandwidth adaptation techniques, comprises a client processing unit. The client processing unit includes a stream synchronization module, a multi-layer audio-video decoding module, and a playback synchronization module. The stream synchronization module delivers a received layered video bitstream and a received layered audio bitstream to the multi-layer audio-video decoding module within a predetermined decoding time. The multi-layer audio-video decoding module decodes the layered video bitstream and the layered audio bitstream into a decoded video signal and a decoded audio signal, respectively. The playback synchronization module determines a video playback time for the decoded video signal and an audio playback time for the decoded audio signal.

Through the stream synchronization module, the multi-layer audio-video decoding module, and the playback synchronization module, which together combine scalable multimedia encoding/decoding techniques with an adaptive multimedia streaming control architecture, the object of the present invention is indeed achieved.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The foregoing and other technical content, features, and effects of the present invention will become clear in the following detailed description of a preferred embodiment, given with reference to the drawing.

Referring to Figure 1, the preferred embodiment of the layered multimedia streaming system of the present invention, employing audio-video synchronization and bandwidth adaptation techniques, comprises a server 1 and a client processing unit 2. The server 1 includes a multi-layer audio-video encoding module 11 and a bandwidth adaptation module 12. The client processing unit 2 includes a stream synchronization module 21, a multi-layer audio-video decoding module 22, a playback synchronization module 23, and an audio-video playback module 24. The stream synchronization module 21 has a delay jitter elimination sub-module 211, a conditional retransmission sub-module 212, and a receive data buffer 213. The playback synchronization module 23 has a video playback buffer 231 and an audio playback buffer 232.

The multi-layer audio-video encoding module 11 encodes an input video signal and an input audio signal into a layered video bitstream and a layered audio bitstream, respectively. The multi-layer audio-video encoding module 11 can be implemented with existing scalable multimedia coding techniques, for example FGS or BSAC. The layered video bitstream consists of video frames, and the layered audio bitstream consists of audio frames. Each video frame has a video base layer and at least one video enhanced layer, and each audio frame has an audio base layer and at least one audio enhanced layer. The audio/video base layer carries the minimum data required to decode each audio/video frame, while the audio/video enhanced layers improve the quality obtained after decoding the base layer.

The bandwidth adaptation module 12 determines the bit-rates at which the layered video bitstream and the layered audio bitstream are to be transmitted, and sends them over a network 3. Because of the nature of scalable multimedia encoding/decoding, the audio/video base layer must first be decoded correctly before the audio/video enhanced layers can further improve the decoded quality. The bandwidth adaptation module 12 therefore first reserves the bit-rates needed to transmit the audio and video base layers; what remains is the bit-rate available for the audio/video enhanced layers (denoted Tr_el), as defined in equation (1).

Tr_el = Tr_total - Tr_A,bl - Tr_V,bl ................ (1)

where Tr_total is the predicted available bandwidth of the network 3, Tr_A,bl is the bit-rate of the audio base layer, and Tr_V,bl is the bit-rate of the video base layer.

The bandwidth adaptation module 12 then uses the enhanced-layer bit-rate obtained from the predicted available bandwidth, together with characteristics of human perception, to decide how many audio enhanced layers and how many video enhanced layers to transmit. Generally, in human perception hearing matters far more than vision; moreover, a video decoder can exploit the persistence of human vision by presenting lost pictures through interleaved playback. In terms of transmission priority, audio data is therefore ranked much higher than video data; that is, the number of video enhanced layers to be transmitted is smaller than the number of audio enhanced layers. Assume a ratio R between the numbers of audio and video enhanced layers, as defined in equation (2).

R = Na / Nv ................ (2)

where Na is the number of audio enhanced layers and Nv is the number of video enhanced layers. Each audio enhanced layer carries about 2 KB of data, and each video enhanced layer about 10 KB. The ratio R can be decided by jointly considering audio and video quality. Assume a combined audio-video quality (denoted Qav), as defined in equation (3).
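The budget split of equation (1) can be sketched as follows. This is a minimal illustration, not code from the patent; the bit-rate values are made-up assumptions.

```python
# Minimal sketch of equation (1): base-layer bit-rates are reserved first,
# and only the remainder is available for the enhancement layers.
# All bit-rate values below are illustrative assumptions (units: KB).

def enhancement_budget(tr_total: float, tr_a_bl: float, tr_v_bl: float) -> float:
    """Equation (1): Tr_el = Tr_total - Tr_A,bl - Tr_V,bl."""
    tr_el = tr_total - tr_a_bl - tr_v_bl
    # If the predicted bandwidth barely covers the base layers,
    # no enhancement layers can be sent at all.
    return max(tr_el, 0.0)

print(enhancement_budget(tr_total=60.0, tr_a_bl=8.0, tr_v_bl=20.0))  # 32.0
```

The clamp to zero reflects the design constraint stated above: the base layers must always be deliverable before any enhancement layer is considered.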

Qav = 2 x Qa + Qv ................ (3)

where Qa is the audio quality, evaluated with the Objective Difference Grade (ODG) and the Distortion Index (DI), and Qv is the video quality, evaluated with the Peak Signal-to-Noise Ratio (PSNR). Because ODG is better suited to evaluating high-quality audio while DI can evaluate audio across a wide range of quality, the average of ODG and DI is taken as the audio quality grade, and the ODG and DI values are linearly normalized from their original range of -4.0~0.0 to 0~40. According to the results of many experiments, the optimal ratio R between the numbers of audio and video enhanced layers is 5. Based on this, a mapping table between the enhanced-layer bit-rate and the numbers of enhanced layers is established, as shown in Table 1. In the preferred embodiment, the bandwidth adaptation module 12 uses the enhanced-layer bit-rate to look up directly the number of audio enhanced layers and the number of video enhanced layers to transmit. In Table 1, Ea,1~m denotes audio enhanced layers 1 to m (i.e., m audio enhanced layers), and Ev,1~n denotes video enhanced layers 1 to n (i.e., n video enhanced layers).

Table 1. Mapping between enhanced-layer bit-rate and numbers of enhanced layers

  Enhanced-layer bit-rate    Enhanced layers
  2 KB                       Ea,1
  4 KB                       Ea,1~2
  ...                        ...
  (2 x m + 10 x n) KB        Ea,1~m, Ev,1~n
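The Table 1 lookup can be approximated by the sketch below: given the enhancement-layer budget Tr_el (in KB), it chooses numbers of audio and video enhanced layers whose cost 2m + 10n fits the budget while keeping the audio-first priority and the ratio R = 5 described above. The greedy fill order is an assumption about how such a table could be generated, not the patented table itself.

```python
# Hedged sketch of a Table-1-style mapping: audio enhancement layers cost
# about 2 KB each, video enhancement layers about 10 KB each, and at most
# r audio layers are taken per video layer (r = Na/Nv = 5 from equation (2)).

def pick_layers(tr_el_kb: float, r: int = 5,
                audio_kb: float = 2.0, video_kb: float = 10.0):
    na = nv = 0
    remaining = tr_el_kb
    while True:
        # Audio first: fill up to r audio layers per (pending) video layer.
        if na < r * (nv + 1) and remaining >= audio_kb:
            na += 1
            remaining -= audio_kb
        elif remaining >= video_kb:
            nv += 1
            remaining -= video_kb
        else:
            break
    return na, nv

print(pick_layers(4.0))   # (2, 0): Ea,1~2, matching the 4 KB row of Table 1
print(pick_layers(20.0))  # (5, 1): five audio layers (10 KB) plus one video layer (10 KB)
```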

The server 1 divides the layered video bitstream and the layered audio bitstream into packets and transmits them over the network 3. When the client processing unit 2 receives these packets, the stream synchronization module 21 delivers the received layered video bitstream and layered audio bitstream to the multi-layer audio-video decoding module 22 within a predetermined decoding time. The received layered video and audio bitstreams are stored in the receive data buffer 213. The delay jitter elimination sub-module 211 measures the current delay time of the network 3 and keeps the temporal presentation length of the receive data buffer 213 greater than that delay time, so that the layered video bitstream and the layered audio bitstream can be delivered to the multi-layer audio-video decoding module 22 within the decoding time. The temporal presentation length is determined by the number of video frames of the layered video bitstream and the number of audio frames of the layered audio bitstream held in the receive data buffer 213.

The delay jitter elimination sub-module 211 first computes a start delay-jitter threshold; if the temporal presentation length is smaller than this threshold, it sends a start delay-jitter-elimination message back through the network 3 to the bandwidth adaptation module 12. Assume the start delay-jitter threshold (denoted THj) is defined as in equation (4).

THj = a x Jmax + (1 - a) x Javg ................ (4)

where Jmax is the maximum delay jitter time and Javg is the average delay jitter time, both obtained by the delay jitter elimination sub-module 211 from statistics of the current state of the network 3, and a is a control parameter with 0 < a < 1. When the bandwidth adaptation module 12 receives the start delay-jitter-elimination message, it further adjusts the frame-interval number of the audio frames of the layered audio bitstream to be transmitted (denoted FRa) and the frame-interval number of the video frames of the layered video bitstream (denoted FRv), as defined in equations (5) and (6), respectively.

FRa = (Tr_A,bl + Tr_A,el) / (Tr_A,bl + (Tr_A,el / 2)) ................ (5)

where Tr_A,bl is the bit-rate of the audio base layer and Tr_A,el is the bit-rate of the audio enhanced layers.

FRv = (Tr_V,bl + Tr_V,el) / (Tr_V,bl + Tr_V,el_min) ................ (6)

where Tr_V,bl is the bit-rate of the video base layer, Tr_V,el is the bit-rate of the video enhanced layers, and Tr_V,el_min is the minimum required bit-rate of the video enhanced layers.

The delay jitter elimination sub-module 211 then computes a stop delay-jitter threshold; if the temporal presentation length is greater than this threshold, it sends a stop delay-jitter-elimination message back through the network 3 to the bandwidth adaptation module 12.
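Equations (4)-(6) above can be sketched directly; the values for Jmax, Javg, and the bit-rates below are illustrative assumptions, not figures from the patent.

```python
# Minimal sketch of equations (4)-(6).

def start_threshold(j_max: float, j_avg: float, a: float) -> float:
    """Equation (4): THj = a*Jmax + (1-a)*Javg, with 0 < a < 1."""
    return a * j_max + (1 - a) * j_avg

def fra(tr_a_bl: float, tr_a_el: float) -> float:
    """Equation (5): audio frame-interval number."""
    return (tr_a_bl + tr_a_el) / (tr_a_bl + tr_a_el / 2)

def frv(tr_v_bl: float, tr_v_el: float, tr_v_el_min: float) -> float:
    """Equation (6): video frame-interval number."""
    return (tr_v_bl + tr_v_el) / (tr_v_bl + tr_v_el_min)

print(start_threshold(j_max=200.0, j_avg=50.0, a=0.5))    # 125.0
print(fra(tr_a_bl=8.0, tr_a_el=8.0))                       # about 1.333
print(frv(tr_v_bl=20.0, tr_v_el=30.0, tr_v_el_min=10.0))   # about 1.667
```

Note that both ratios exceed 1 whenever enhancement layers are being sent, which is what lets the server stretch the frame spacing while jitter elimination is active.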

Assume the stop delay-jitter threshold (denoted THsj) is defined as in equation (7).

THsj = r x Jmax ................ (7)

where r is an adjustment parameter. When the bandwidth adaptation module 12 receives the stop delay-jitter-elimination message, the server 1 returns to its original transmission mode. The value of r affects how often delay-jitter elimination operates and how long the low-quality audio/video periods last: when r is larger, the low-quality periods caused by delay-jitter elimination last longer (fewer enhanced-layer streams are sent, so the audio-video quality is lower). Because delay-jitter elimination affects audio quality more strongly, in the preferred embodiment r is set to 2.0 for audio and to 1.5 for video.

The conditional retransmission sub-module 212 first computes an allowable packet-delay threshold, used to judge whether a packet has been lost. Assume the allowable packet-delay threshold (denoted THd) is defined as in equation (8).

THd = b x Jmax + (1 - b) x Javg ................ (8)

where b is a control parameter that sets where THd falls between Jmax and Javg; when a packet's delay exceeds THd, the packet is regarded as lost. If a packet is lost, the conditional retransmission sub-module 212 further judges whether the lost packet carries video base layer or audio base layer data, and whether the temporal presentation length is greater than the time a packet retransmission requires (obtained from statistics of the current state of the network 3); if so, it sends a packet-retransmission message back through the network 3 to the bandwidth adaptation module 12. When the bandwidth adaptation module 12 receives a retransmission message for a packet, it decides from the predicted available bandwidth whether to retransmit that packet; and when it receives retransmission messages for several packets at the same time, it retransmits them according to each packet's priority, in the following order: I-frames within the video frames highest, audio frames next, and other frame types within the video frames (e.g., B-frames and P-frames) lowest.

It is worth noting that the delay jitter elimination sub-module 211 and the conditional retransmission sub-module 212 can operate separately, but they can also operate simultaneously. When they operate simultaneously, the bit-rate needed for packet retransmission is reserved first, and the frame-interval numbers FRa and FRv defined in equations (5) and (6) are further modified as defined in equations (9) and (10).

FRa = (Tr_A,bl + Tr_A,el - Tr_A,ret) / (Tr_A,bl + (Tr_A,el / 2)) ................ (9)

where Tr_A,ret is the bit-rate required for audio packet retransmission.

FRv = (Tr_V,bl + Tr_V,el - Tr_V,ret) / (Tr_V,bl + Tr_V,el_min) ................ (10)

where Tr_V,ret is the bit-rate required for video packet retransmission.
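The conditional retransmission test of equation (8) and the priority ordering can be sketched as follows. The `Packet` record and the frame-type labels are illustrative assumptions, not structures defined in the patent.

```python
# Hedged sketch of equation (8) and the retransmission decision.
from dataclasses import dataclass

@dataclass
class Packet:
    delay: float        # observed delay of this packet
    base_layer: bool    # carries audio/video base-layer data?
    frame_type: str     # "I", "audio", "B", or "P" (assumed labels)

def is_lost(pkt: Packet, j_max: float, j_avg: float, b: float) -> bool:
    """Equation (8): THd = b*Jmax + (1-b)*Javg; delay beyond THd means lost."""
    thd = b * j_max + (1 - b) * j_avg
    return pkt.delay > thd

def should_request_retransmit(pkt: Packet, presentation_len: float,
                              retransmit_time: float,
                              j_max: float, j_avg: float, b: float) -> bool:
    # Only base-layer packets are worth retransmitting, and only if the
    # buffered presentation length still leaves time for the retransmission.
    return (is_lost(pkt, j_max, j_avg, b)
            and pkt.base_layer
            and presentation_len > retransmit_time)

# Retransmission priority: video I-frames first, audio frames next,
# other video frame types (B/P) last.
PRIORITY = {"I": 0, "audio": 1, "B": 2, "P": 2}

pending = [Packet(90.0, True, "P"), Packet(95.0, True, "I"), Packet(80.0, True, "audio")]
pending.sort(key=lambda p: PRIORITY[p.frame_type])
print([p.frame_type for p in pending])  # ['I', 'audio', 'P']
```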
While the stream synchronization module 21 is operating, the multi-layer audio-video decoding module 22 continuously reads the layered video bitstream and the layered audio bitstream from the receive data buffer 213 and decodes them into a decoded video signal and a decoded audio signal, respectively. The multi-layer audio-video decoding module 22 can be implemented with existing scalable multimedia decoding techniques, matching the scalable multimedia coding technique adopted by the multi-layer audio-video encoding module 11. The decoded video signal and decoded audio signal produced by the multi-layer audio-video decoding module 22 are stored in the video playback buffer 231 and the audio playback buffer 232, respectively.

The playback synchronization module 23 computes step by step, using equations (11)~(18), the video playback time of the decoded video signal and the audio playback time of the decoded audio signal.

RAmax = TAmax - TAmin ................ (11)

where TAmax is a maximum audio decoding time (decoding the audio base layer and all audio enhanced layers), TAmin is a minimum audio decoding time (decoding only the audio base layer), and RAmax is a maximum audio decoding time difference.

RVmax = TVmax - TVmin ................ (12)

where TVmax is a maximum video decoding time (decoding the video base layer and all video enhanced layers), TVmin is a minimum video decoding time (decoding only the video base layer), and RVmax is a maximum video decoding time difference.

PA(1) = TA(1) + RAmax ................ (13)

PV(1) = TV(1) + RVmax ................ (14)

P(av) = max{PA(1), PV(1)} ................ (15)

PA(1) = P(av), PV(1) = P(av) ................ (16)

where TA(1) and TV(1) are the times at which the first audio packet enters the audio playback buffer 232 and the first video packet enters the video playback buffer 231, respectively; P(av) is the audio-video start playback time; and PA(1) and PV(1) are the playback times of the first decoded audio and video frames, both finally set to the audio-video start playback time.

Because the temporal resolution of the audio-video data can change dynamically, to support variable temporal resolution the audio/video playback time of the i-th audio/video frame (i >= 2), denoted PA(i)/PV(i), is given as follows.

PA(i) = PA(i-1) + Ua(i) ................ (17)

PV(i) = PV(i-1) + Uv(i) ................ (18)

where Ua(i) and Uv(i) are the time intervals of the i-th audio and video frames, respectively. According to the video playback time PV(i) and the audio playback time PA(i), the playback synchronization module 23 sends the decoded video signal in the video playback buffer 231 and the decoded audio signal in the audio playback buffer 232 to the audio-video playback module 24 at the appropriate times for playback.
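The playback-time bookkeeping of equations (11)-(18) can be sketched as follows: both streams start at the common time P(av), and each later frame's playback time advances by that frame's own interval, which is how variable temporal resolution is supported. The numeric values are illustrative assumptions.

```python
# Minimal sketch of equations (13)-(18).

def start_times(ta1: float, tv1: float, ra_max: float, rv_max: float):
    """Equations (13)-(16): common audio-video start playback time."""
    p_av = max(ta1 + ra_max, tv1 + rv_max)  # (13)-(15)
    return p_av, p_av                        # (16): PA(1) = PV(1) = P(av)

def playback_times(p1: float, intervals):
    """Equations (17)-(18): P(i) = P(i-1) + U(i) for i >= 2."""
    times = [p1]
    for u in intervals:  # per-frame intervals Ua(i) or Uv(i)
        times.append(times[-1] + u)
    return times

pa1, pv1 = start_times(ta1=10.0, tv1=12.0, ra_max=3.0, rv_max=5.0)
print(pa1, pv1)                          # 17.0 17.0
print(playback_times(pa1, [40.0, 40.0]))  # [17.0, 57.0, 97.0]
```

Taking the maximum in equation (15) budgets for the worst-case decoding time of either stream, so neither buffer is asked to play a frame before it could possibly have been decoded.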

In summary, through the cooperative operation of the modules, the system successfully combines the properties of scalable multimedia coding with an adaptive multimedia streaming control architecture, adapting to different user devices and network characteristics and satisfying users with diverse audio-video quality-of-service requirements in heterogeneous networks, so the object of the present invention is indeed achieved.

The foregoing, however, is only a preferred embodiment of the present invention and cannot limit the scope of its practice; all simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a system block diagram illustrating the preferred embodiment of the layered multimedia streaming system of the present invention that employs audio-video synchronization and bandwidth adaptation techniques.

[Explanation of main reference numerals]

1    server
11   multi-layer audio-video encoding module
12   bandwidth adaptation module
2    client processing unit
21   stream synchronization module
211  delay jitter elimination sub-module
212  conditional retransmission sub-module
213  receive data buffer
22   multi-layer audio-video decoding module
23   playback synchronization module
231  video playback buffer
232  audio playback buffer
24   audio-video playback module
3    network

Claims (1)

X. Claims:

1. A layered multimedia streaming system utilizing audio-video synchronization and bandwidth adaptation techniques, comprising: a client processing unit, including a stream synchronization module, a multi-level audio/video decoding module, and a playback synchronization module; the stream synchronization module is used to send a received layered video bit stream and a received layered audio bit stream to the multi-level audio/video decoding module within a predetermined decoding time; the multi-level audio/video decoding module is used to decode the layered video bit stream and the layered audio bit stream into a decoded video signal and a decoded audio signal, respectively; and the playback synchronization module is used to determine a video playback time of the decoded video signal and an audio playback time of the decoded audio signal.

2. The layered multimedia streaming system utilizing audio-video synchronization and bandwidth adaptation techniques of claim 1, wherein the playback synchronization module has a video playback buffer and an audio playback buffer, for storing the decoded video signal and the decoded audio signal, respectively.

3. The layered multimedia streaming system utilizing audio-video synchronization and bandwidth adaptation techniques of claim 2, wherein the client processing unit further includes an audio/video playback module, and the playback synchronization module, according to the video playback time and the audio playback time, sends the decoded video signal in the video playback buffer and the decoded audio signal in the audio playback buffer, at the appropriate times, to the audio/video playback module for playback.

4. The layered multimedia streaming system utilizing audio-video synchronization and …
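Claim 1's requirement that the layered bit streams reach the multi-level decoding module within a predetermined decoding time suggests a simple layer-selection policy: decode the base layer unconditionally and add enhancement layers, in order, only while the remaining time budget allows. The sketch below is a hypothetical illustration of that idea, not the method disclosed in this patent; the function name, layer names, and decode costs are invented for the example.

```python
def select_layers(layers, time_budget):
    """Pick which layers to hand to the decoder.

    `layers` is a list of (name, estimated_decode_cost) pairs with the
    base layer first. The base layer is always decoded; enhancement
    layers are added in order until the cumulative cost would exceed
    the predetermined decoding-time budget. Because each enhancement
    layer depends on the layers below it, selection stops at the first
    layer that does not fit.
    """
    chosen, spent = [], 0.0
    for i, (name, cost) in enumerate(layers):
        if i > 0 and spent + cost > time_budget:
            break  # higher layers depend on this one; stop here
        chosen.append(name)
        spent += cost
    return chosen


# Base layer plus two enhancement layers, with a 20 ms decode budget:
print(select_layers([("base", 8.0), ("enh1", 6.0), ("enh2", 9.0)], 20.0))
# → ['base', 'enh1']
```

Under this policy a tighter budget simply yields fewer enhancement layers, which matches the scalable-coding premise that decoding can proceed from the base layer plus any prefix of the enhancement layers.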
TW96142831A 2007-11-13 2007-11-13 Hierarchical multimedia streaming system of utilizing video synchronization and bandwidth adaption techniques TW200922328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96142831A TW200922328A (en) 2007-11-13 2007-11-13 Hierarchical multimedia streaming system of utilizing video synchronization and bandwidth adaption techniques

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96142831A TW200922328A (en) 2007-11-13 2007-11-13 Hierarchical multimedia streaming system of utilizing video synchronization and bandwidth adaption techniques

Publications (2)

Publication Number Publication Date
TW200922328A TW200922328A (en) 2009-05-16
TWI354495B true TWI354495B (en) 2011-12-11

Family

ID=44728142

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96142831A TW200922328A (en) 2007-11-13 2007-11-13 Hierarchical multimedia streaming system of utilizing video synchronization and bandwidth adaption techniques

Country Status (1)

Country Link
TW (1) TW200922328A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI479869B (en) * 2012-04-03 2015-04-01 Qualcomm Inc Chroma slice-level qp offset and deblocking
TWI496457B (en) * 2012-01-20 2015-08-11 Univ Nat Taiwan Science Tech Hierarchical available bandwitdh estimation method for video system
TWI497959B (en) * 2012-10-17 2015-08-21 Inst Information Industry Scene extraction and playback system, method and its recording media
TWI500293B (en) * 2012-06-10 2015-09-11 Apple Inc Systems and methods for seamlessly switching between media streams

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI600319B (en) * 2016-09-26 2017-09-21 A method for capturing video and audio simultaneous for one-to-many video streaming
CN108124183B (en) * 2016-11-29 2020-06-19 达升企业股份有限公司 Method for synchronously acquiring video and audio to perform one-to-many video and audio streaming

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI496457B (en) * 2012-01-20 2015-08-11 Univ Nat Taiwan Science Tech Hierarchical available bandwitdh estimation method for video system
TWI479869B (en) * 2012-04-03 2015-04-01 Qualcomm Inc Chroma slice-level qp offset and deblocking
US9451258B2 (en) 2012-04-03 2016-09-20 Qualcomm Incorporated Chroma slice-level QP offset and deblocking
TWI500293B (en) * 2012-06-10 2015-09-11 Apple Inc Systems and methods for seamlessly switching between media streams
TWI497959B (en) * 2012-10-17 2015-08-21 Inst Information Industry Scene extraction and playback system, method and its recording media

Also Published As

Publication number Publication date
TW200922328A (en) 2009-05-16

Similar Documents

Publication Publication Date Title
TWI354495B (en)
AU2006346226B8 (en) System and method for a conference server architecture for low delay and distributed conferencing applications
CN101321275B (en) System and method for processing video stream
EP2196033B1 (en) Method for an early start of audio-video rendering
CN1656809A (en) A transmission method that absorbs channel transmission rate fluctuations using a virtual receive buffer
US10499094B2 (en) Transmission apparatus, transmitting method, reception apparatus, and receiving method
US7652994B2 (en) Accelerated media coding for robust low-delay video streaming over time-varying and bandwidth limited channels
WO2012170904A2 (en) Adaptive bitrate management on progressive download with indexed media files
CN102257785A (en) Method and apparatus for interleaving a data block
CN101316357A (en) A channel switching method, terminal and media server
CN115943631B (en) Streaming media data including addressable resource index tracks with switch sets
CN102761776A (en) Video and audio synchronizing method of P2PVoD (peer-to-peer video on demand) system based on SVC (scalable video coding)
Fiandrotti et al. Traffic prioritization of H.264/SVC video over 802.11e ad hoc wireless networks
Xie et al. Rate-distortion optimized dynamic bitstream switching for scalable video streaming
EP2324635A1 (en) Subdivision of media streams for channel switching
CN1992886A (en) Streaming media server with bandwidth adapting function
Zhou et al. Scalable Audio Streaming over the Internet with Network-Aware Rate-Distortion Optimization.
Dhamodaran et al. Adaptive bitstream prioritization for dual TCP/UDP streaming of HD video
Görkemli et al. Adaptation strategies for streaming SVC video
Nunome The joint effect of MMT AL-FEC and error concealment on video streaming QoE
Deshpande High quality video streaming using content-aware adaptive frame scheduling with explicit deadline adjustment
WO2015010233A1 (en) Method, apparatus and network media system for playing multiple media contents
Ho et al. Content-adaptive packetization and streaming of wavelet video over IP networks
Kopilovic et al. A benchmark for fast channel change in IPTV
Chung Symmetrical frame discard method for 3D video over IP networks

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees