200922328 IX. Description of the Invention:

[Technical Field]

The present invention relates to a multimedia streaming system, and more particularly to a layered multimedia streaming system that utilizes audio-video synchronization technology and bandwidth adaptation technology.

[Prior Art]

With the popularity of broadband and IEEE 802.11b/g networks, together with the emerging mobile 3G and Wi-Max networks, users in the resulting heterogeneous network can use any kind of device, over different networks, to access audio-video multimedia resources on the network.

To satisfy the service-quality requirements of different users, Scalable Multimedia Coding technology can compress audio-video data into files of different quality. In addition, a well-designed Adaptive Multimedia Streaming control architecture can change the transmission rate according to the network resources available at the time, so as to adjust the resolution and quality of audio-video playback. To serve users with diverse quality-of-service requirements in today's heterogeneous networks, combining the characteristics of scalable multimedia coding with an adaptive multimedia streaming control architecture is therefore a very important issue.

As for scalable multimedia coding, several good techniques have been proposed, for example Transcoding, Fine Granularity Scalability (FGS) video coding, and Bit Sliced Arithmetic Coding (BSAC) for audio. The main advantage of FGS and BSAC is that the content needs to be encoded only once, in the highest-quality mode, to produce a bitstream containing a base layer and enhancement layers, and the decoder can decode from whatever it receives (the base layer plus part of the enhancement layers). However, the decoding-time complexity of multimedia data with different numbers of base and enhancement layers makes synchronized audio-video playback difficult, and the existing techniques and literature seldom address this problem.

As for adaptive multimedia streaming control architectures, three main factors affect synchronization: first, time-varying network bandwidth; second, unpredictable delay jitter (Delay Jitter); and third, packet loss. Regarding the "time-varying network bandwidth" factor, many bandwidth-estimation techniques and reference tools (e.g., pathChirp) have already been proposed, and are used to accurately estimate the available bandwidth under the Transmission Control
Protocol (TCP) / User Datagram Protocol (UDP). Regarding the "unpredictable delay jitter" factor, buffer-control schemes have also been proposed to reduce delay jitter, for example managing the buffer by adjusting the playback speed at the decoder, compensating by adjusting the transmission rate at the server, or handling the delay jitter through the configuration of the buffer. Regarding the "packet loss" factor, packet retransmission mechanisms are often used to recover the audio-video quality degraded by lost packets; for example, deciding whether to retransmit a packet according to the importance of the lost packet, or retransmitting a lost packet only when the audio-video distortion it causes exceeds a predefined threshold.

However, the adaptive multimedia streaming techniques above focus mostly on transmission and are not designed together with scalable audio-video coding, so it is difficult for them to exploit the advantages of scalable coding for audio-video streaming in heterogeneous networks. A solution is therefore needed that combines scalable multimedia coding technology with an adaptive multimedia streaming control architecture, so as to adapt to different user devices and network characteristics and to satisfy users with diverse audio-video service-quality requirements in heterogeneous networks.

[Invention Content]

Accordingly, an object of the present invention is to provide a layered multimedia streaming system utilizing audio-video synchronization technology and bandwidth adaptation technology.

The layered multimedia streaming system of the present invention comprises a client processing unit. The client processing unit includes a stream synchronization module, a multi-layer audio-video decoding module, and a playback synchronization module. The stream synchronization module feeds a received layered video bitstream and a received layered audio bitstream into the multi-layer audio-video decoding module within a predetermined decoding time.
The multi-layer audio-video decoding module decodes the layered video bitstream and the layered audio bitstream into a decoded video signal and a decoded audio signal, respectively. The playback synchronization module determines a video playback time for the decoded video signal and an audio playback time for the decoded audio signal.

By combining scalable multimedia encoding/decoding technology with an adaptive multimedia streaming control architecture through the stream synchronization module, the multi-layer audio-video decoding module, and the playback synchronization module, the object of the present invention can indeed be achieved.

[Embodiment]

The foregoing and other technical contents, features and effects of the present invention will be clearly presented in the following detailed description of a preferred embodiment with reference to the drawings.

Referring to FIG. 1, the preferred embodiment of the layered multimedia streaming system of the present invention comprises a server 1 and a client processing unit 2. The server 1 includes a multi-layer audio-video encoding module 11 and a bandwidth adaptation module 12. The client processing unit 2 includes a stream synchronization module 21, a multi-layer audio-video decoding module 22, a playback synchronization module 23, and an audio-video playback module 24. The stream synchronization module 21 has a delay jitter (Delay Jitter) elimination sub-module 211, a conditional retransmission sub-module 212, and a receiving data buffer (Buffer) 213. The playback synchronization module 23 has a video playback buffer 231 and an audio playback buffer 232.

The multi-layer audio-video encoding module 11 encodes an input video signal and an input audio signal into a layered video bitstream (Layered Video Bitstream) and a layered audio bitstream (Layered Audio Bitstream), respectively.
The multi-layer audio-video encoding module 11 may be implemented with existing scalable multimedia coding technologies, for example FGS and BSAC. The layered video bitstream comprises a plurality of video frames (Video Frame), and the layered audio bitstream comprises a plurality of audio frames (Audio Frame). Each video frame has a video base layer (Video Base Layer) and at least one video enhancement layer (Video Enhanced Layer); each audio frame has an audio base layer (Audio Base Layer) and at least one audio enhancement layer (Audio Enhanced Layer). The audio/video base layer is the minimum data required to decode each audio/video frame, while the audio/video enhancement layers improve the quality obtained after the base layer is decoded.
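The base-layer/enhancement-layer relationship described above can be sketched as follows; this is a minimal illustration, and the class and field names are not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayeredFrame:
    """One audio or video frame: a base layer plus ordered enhancement layers."""
    base_layer: bytes                                  # minimum data needed to decode the frame
    enhancement_layers: List[bytes] = field(default_factory=list)

    def decodable_payload(self, num_enhancement: int) -> bytes:
        """Return the base layer plus the first num_enhancement enhancement
        layers; decoding more layers yields higher playback quality."""
        layers = self.enhancement_layers[:num_enhancement]
        return self.base_layer + b"".join(layers)

frame = LayeredFrame(base_layer=b"BASE", enhancement_layers=[b"E1", b"E2", b"E3"])
# The base layer alone is decodable; each extra layer only refines quality.
assert frame.decodable_payload(0) == b"BASE"
assert frame.decodable_payload(2) == b"BASEE1E2"
```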
The bandwidth adaptation module 12 determines the bit-rate (Bit-rate) of the layered video bitstream and the bit-rate of the layered audio bitstream to be transmitted, and transmits them over a network 3. Owing to the characteristics of scalable multimedia encoding/decoding, the audio/video base layer must first be decoded correctly before the audio/video enhancement layers can further improve the decoded quality. Therefore, the bandwidth adaptation module 12 first reserves the bit-rate to be transmitted for the audio/video base layers; what remains is the enhancement-layer bit-rate (denoted Tr_el), as defined in equation (1).

Tr_el = Tr_total − Tr_A,bl − Tr_V,bl ……… (1)

where Tr_total is the predicted available bandwidth of the network 3, Tr_A,bl is the bit-rate of the audio base layer, and Tr_V,bl is the bit-rate of the video base layer.

From the enhancement-layer bit-rate derived from the predicted available bandwidth, the bandwidth adaptation module 12 then determines the number of audio enhancement layers and the number of video enhancement layers to transmit, using characteristics of human perception.
Generally speaking, hearing is much more important than vision in human perception; moreover, a video decoder can exploit interpolated (Interleave) playback, together with the persistence of human vision, to present lost pictures. For multimedia data, audio therefore has higher priority than video; that is, the number of video enhancement layers to be transmitted is kept smaller than the number of audio enhancement layers to be transmitted. The ratio of the number of audio enhancement layers to the number of video enhancement layers (denoted R) is defined in equation (2).

R = N_a / N_v ……… (2)

where N_a is the number of audio enhancement layers and N_v is the number of video enhancement layers. The data size of each audio enhancement layer is about 2 KB, and that of each video enhancement layer is about 10 KB. The value of the ratio R can be decided by jointly considering audio and video quality. Assume an audio/video joint quality (denoted Q_av), defined in equation (3).
Q_av = 2 × Q_a + Q_v ……… (3)

where Q_a is the audio quality, evaluated with the Objective Difference Grade (ODG) and the Distortion Index (DI), and Q_v is the video quality, evaluated with the Peak Signal-to-Noise Ratio (PSNR). Since ODG is better suited to grading high-quality audio while DI can grade audio across a wide range of quality, the average of ODG and DI is taken as the audio quality grade. The ODG and DI values are linearly normalized (Linear Normalize) from their original scales to 1 to 40. According to the results of many experiments, the optimized ratio R of the number of audio enhancement layers to the number of video enhancement layers is 5, and based on this concept a mapping table between enhancement-layer bit-rate and number of enhancement layers (
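The joint quality measure of equation (3) can be sketched as follows; the input ranges shown for ODG and DI are assumptions for illustration, since the patent states only that both scores are linearly normalized to 1 to 40:

```python
def linear_normalize(x, lo, hi, new_lo=1.0, new_hi=40.0):
    """Linearly map x from the interval [lo, hi] onto [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

def joint_quality(odg, di, psnr):
    """Equation (3): Q_av = 2*Q_a + Q_v, where Q_a is the mean of the
    normalized ODG and DI scores and Q_v is the PSNR in dB."""
    # ODG conventionally ranges over [-4, 0] (0 = imperceptible distortion);
    # DI is assumed here to share that scale purely for illustration.
    q_a = (linear_normalize(odg, -4.0, 0.0) + linear_normalize(di, -4.0, 0.0)) / 2.0
    q_v = psnr
    return 2.0 * q_a + q_v
```

With perfect audio scores (ODG = DI = 0) both normalize to 40, so Q_av = 2 × 40 + PSNR.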
Mapping Table), shown as Table 1, was established. In the preferred embodiment, the bandwidth adaptation module 12 uses the enhancement-layer bit-rate to look up directly the number of audio enhancement layers and the number of video enhancement layers to transmit. In Table 1, EI_a,1~m denotes the audio enhancement layers 1 through m (i.e., the number of audio enhancement layers is m), and EI_v,1~n denotes the video enhancement layers 1 through n (i.e., the number of video enhancement layers is n).

Table 1. Mapping between enhancement-layer bit-rate and number of enhancement layers

  Enhancement-layer bit-rate    Enhancement layers transmitted
  2 KB                          EI_a,1
  4 KB                          EI_a,1~2
  10 KB                         EI_a,1~5
  20 KB                         EI_a,1~5, EI_v,1
  22 KB                         EI_a,1~6, EI_v,1
  50 KB                         EI_a,1~20, EI_v,1
  60 KB                         EI_a,1~20, EI_v,1~2
  (2×m + 10×n) KB               EI_a,1~m, EI_v,1~n

The server 1 divides the layered video bitstream and the layered audio bitstream into a plurality of packets (Packet) and transmits them over the network 3.
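As a rough illustration of how equation (1) and the mapping table fit together, the following sketch assumes each audio enhancement layer costs about 2 KB and each video enhancement layer about 10 KB, per the table's generic (2×m + 10×n) KB row; all names and the dictionary form are illustrative, not from the patent:

```python
# Equation (1): bandwidth left for enhancement layers after reserving
# the audio and video base layers (all rates in KB per time unit).
def enhancement_budget_kb(tr_total, tr_a_base, tr_v_base):
    return max(tr_total - tr_a_base - tr_v_base, 0)

# Table 1 as a lookup: enhancement-layer budget (KB) -> (m audio layers,
# n video layers); audio layers are favored per the perceptual priority.
TABLE_1 = {2: (1, 0), 4: (2, 0), 10: (5, 0), 20: (5, 1),
           22: (6, 1), 50: (20, 1), 60: (20, 2)}

def layer_cost_kb(m, n):
    # generic last row of the table: (2*m + 10*n) KB
    return 2 * m + 10 * n

# every listed row spends exactly its bit budget
assert all(layer_cost_kb(m, n) == kb for kb, (m, n) in TABLE_1.items())
```

For example, with 100 KB of predicted bandwidth and 16 KB + 64 KB reserved for the audio and video base layers, the 20 KB remainder maps to five audio enhancement layers and one video enhancement layer.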
When the client processing unit 2 receives these packets, the stream synchronization module 21 feeds the received layered video bitstream and layered audio bitstream into the multi-layer audio-video decoding module 22 within a predetermined decoding time. The received layered video bitstream and layered audio bitstream are stored in the receiving data buffer 213. The delay jitter elimination sub-module 211 keeps statistics on the current delay time of the network 3 and keeps the temporal presentation length (Temporal Presentation Length) of the receiving data buffer 213 greater than that delay time, so that the layered video bitstream and the layered audio bitstream can be fed to the multi-layer audio-video decoding module 22 within the decoding time. The temporal presentation length is determined by the number of video frames of the layered video bitstream and the number of audio frames of the layered audio bitstream held in the receiving data buffer 213.

The delay jitter elimination sub-module 211 first computes a start delay jitter threshold; if the temporal presentation length is smaller than this threshold, it sends a start delay jitter elimination message back to the bandwidth adaptation module 12 over the network 3. The start delay jitter threshold (denoted TH_sj) is defined in equation (4).

TH_sj = a × J_max + (1 − a) × J_avg ……… (4)

where J_max is the maximum delay jitter time and J_avg is the average delay jitter time, both obtained by the delay jitter elimination sub-module 211 from statistics on the current state of the network 3, and a is an adjustment parameter with 0 ≤ a ≤ 1. When the bandwidth adaptation module 12 receives the start delay jitter elimination message, it further adjusts the interval number of audio frames of the layered audio bitstream to transmit (denoted FR_a) and the interval number of video frames of the layered video bitstream (denoted FR_v), as defined in equations (5) and (6), respectively.
FR_a = (Tr_A,bl + Tr_A,el) / (Tr_A,bl + Tr_A,el/2) ……… (5)

where Tr_A,bl is the bit-rate of the audio base layer and Tr_A,el is the bit-rate of the audio enhancement layers.

FR_v = (Tr_V,bl + Tr_V,el) / (Tr_V,bl + Tr_V,el_min) ……… (6)

where Tr_V,bl is the bit-rate of the video base layer, Tr_V,el is the bit-rate of the video enhancement layers, and Tr_V,el_min is the bit-rate of the minimum required video enhancement layers.

The delay jitter elimination sub-module 211 then computes a stop delay jitter threshold; if the temporal presentation length is greater than this threshold, it sends a stop delay jitter elimination message to the bandwidth adaptation module 12 over the network 3. The stop delay jitter threshold (denoted TH_stop) is defined in equation (7).

TH_stop = r × J_max ……… (7)

where r is an adjustment parameter. When the bandwidth adaptation module 12 receives the stop delay jitter elimination message, the server 1 returns to its original transmission mode. The value of the adjustment parameter r affects the operating frequency of delay jitter elimination and the length of the low-quality audio/video period: when r is larger, the low-quality period caused by delay jitter elimination lasts longer (because fewer enhancement layers are streamed, lowering the audio-video quality). Since delay jitter elimination affects audio quality more strongly, the preferred embodiment sets the r value for audio and the r value for video separately.

The conditional retransmission sub-module 212 first computes an allowable packet delay time threshold, used to judge whether a packet has been lost. The allowable packet delay time threshold (denoted TH_d) is defined in equation (8).

TH_d = b × J_max + (1 − b) × J_avg ……… (8)
where b is a control parameter, with 0 ≤ b ≤ 1, that places TH_d between J_max and J_avg. When the delay of a packet exceeds TH_d, the packet is regarded as lost. When a packet is lost, the conditional retransmission sub-module 212 further checks whether the lost packet carries data of the video base layer or the audio base layer, and whether the temporal presentation length is greater than the time needed to retransmit a packet (obtained from statistics on the current state of the network 3); if so, it sends a packet retransmission message for that packet to the bandwidth adaptation module 12 over the network 3. When the bandwidth adaptation module 12 receives the retransmission message for a packet, it retransmits the packet according to the predicted available bandwidth; when it receives retransmission messages for several packets at the same time, it orders them by priority: the I-frame within a video frame is highest, the audio frame is next, and the other frame types within a video frame (e.g., B-frames and P-frames) are lowest.

It is worth mentioning that the delay jitter elimination sub-module 211 and the conditional retransmission sub-module 212 can operate separately, but they can also operate at the same time. When they operate simultaneously, the bit-rate needed for packet retransmission is reserved first, and the audio frame interval number FR_a and the video frame interval number FR_v defined in equations (5) and (6) are further modified as defined in equations (9) and (10).

FR_a = (Tr_A,bl + Tr_A,el − Tr_A,ret) / (Tr_A,bl + Tr_A,el/2) ……… (9)

where Tr_A,ret is the bit-rate needed for audio packet retransmission.

FR_v = (Tr_V,bl + Tr_V,el − Tr_V,ret) / (Tr_V,bl + Tr_V,el_min) ……… (10)

where Tr_V,ret is the bit-rate needed for video packet retransmission.
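The threshold logic of equations (4), (7) and (8), together with the conditional-retransmission test, can be sketched as follows; the default parameter values are illustrative and not fixed by the patent:

```python
def start_jitter_threshold(j_max, j_avg, a=0.5):
    """Equation (4): TH_sj = a*J_max + (1-a)*J_avg, with 0 <= a <= 1."""
    return a * j_max + (1 - a) * j_avg

def stop_jitter_threshold(j_max, r):
    """Equation (7): TH_stop = r*J_max; a larger r keeps the
    reduced-rate mode active longer."""
    return r * j_max

def packet_delay_threshold(j_max, j_avg, b=0.5):
    """Equation (8): TH_d = b*J_max + (1-b)*J_avg, with 0 <= b <= 1."""
    return b * j_max + (1 - b) * j_avg

def should_request_retransmit(packet_delay, j_max, j_avg,
                              has_base_layer, presentation_len, rtx_time):
    """A packet whose delay exceeds TH_d is treated as lost; retransmission
    is requested only for base-layer packets, and only when the buffered
    presentation length still covers the retransmission round trip."""
    lost = packet_delay > packet_delay_threshold(j_max, j_avg)
    return lost and has_base_layer and presentation_len > rtx_time
```

For instance, with J_max = 100 ms and J_avg = 40 ms the default thresholds sit at 70 ms, so an 80 ms base-layer packet with ample buffered presentation length triggers a retransmission request.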
While the stream synchronization module 21 is processing, the multi-layer audio-video decoding module 22 continuously reads the layered video bitstream and the layered audio bitstream from the receiving data buffer 213 and decodes them into a decoded video signal and a decoded audio signal, respectively. The multi-layer audio-video decoding module 22 may be implemented with existing scalable multimedia decoding technology corresponding to the scalable multimedia coding adopted by the multi-layer audio-video encoding module 11. The decoded video signal and the decoded audio signal are stored in the video playback buffer 231 and the audio playback buffer 232, respectively.

The playback synchronization module 23 computes step by step with equations (11) to (18) to determine a video playback time for the decoded video signal and an audio playback time for the decoded audio signal.

RA_max = TA_max − TA_min ……… (11)

where TA_max is a maximum audio decoding time (the decoded data include the audio base layer and all audio enhancement layers), TA_min is a minimum audio decoding time (the decoded data include only the audio base layer), and RA_max is a maximum audio decoding time difference.

RV_max = TV_max − TV_min ……… (12)

where TV_max is a maximum video decoding time (the decoded data include the video base layer and all video enhancement layers), TV_min is a minimum video decoding time (the decoded data include only the video base layer), and RV_max is a maximum video decoding time difference.

PA(1) = TA(1) + RA_max ……… (13)

PV(1) = TV(1) + RV_max ……… (14)

P(av) = max{PA(1), PV(1)} ……… (15)

PA(1) = P(av), PV(1) = P(av) ……… (16)
where TA(1) and TV(1) are the times at which the first audio packet enters the audio playback buffer 232 and the first video packet enters the video playback buffer 231, respectively; P(av) is the audio-video start playback time; and PA(1) and PV(1) are the playback times of the first decoded audio frame and the first decoded video frame, both finally set to the audio-video start playback time.

Since the temporal resolution of audio-video data can change dynamically, to support variable temporal resolution the audio/video playback time of the i-th audio/video frame (i ≥ 2), denoted PA(i)/PV(i), is computed as follows.

PA(i) = PA(i−1) + Ua(i) ……… (17)

PV(i) = PV(i−1) + Uv(i) ……… (18)

where Ua(i) and Uv(i) are the time intervals of the i-th audio frame and the i-th video frame, respectively. According to the video playback time PV(i) and the audio playback time PA(i), the playback synchronization module 23 sends the decoded video signal in the video playback buffer 231 and the decoded audio signal in the audio playback buffer 232 to the audio-video playback module 24 for playback at the proper times.

To summarize, through the cooperative operation of the modules, the characteristics of scalable multimedia coding and an adaptive multimedia streaming control architecture are successfully combined, so as to adapt to different user devices and network characteristics and to satisfy users with diverse audio-video service-quality requirements in heterogeneous networks; the object of the present invention can indeed be achieved.

The foregoing, however, is merely a preferred embodiment of the present invention and cannot be used to limit the scope of practice of the invention; all simple equivalent changes and modifications made according to the claims and the description of the present invention still fall within the scope covered by the patent of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG.
1 is a system block diagram illustrating the preferred embodiment of the layered multimedia streaming system utilizing audio-video synchronization technology and bandwidth adaptation technology of the present invention.

[Explanation of Main Component Symbols]

1 server
11 multi-layer audio-video encoding module
12 bandwidth adaptation module
2 client processing unit
21 stream synchronization module
211 delay jitter elimination sub-module
212 conditional retransmission sub-module
213 receiving data buffer
22 multi-layer audio-video decoding module
23 playback synchronization module
231 video playback buffer
232 audio playback buffer
24 audio-video playback module
3 network
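The playback-time bookkeeping of equations (11) to (18) may be illustrated with the following sketch; the function and argument names are illustrative only, and times are in arbitrary units:

```python
def playback_times(ta1, tv1, ta_max, ta_min, tv_max, tv_min,
                   audio_intervals, video_intervals):
    """Equations (11)-(18): align the first audio/video frames on a common
    start time, then advance each stream by its own frame intervals
    (supporting variable temporal resolution)."""
    ra_max = ta_max - ta_min            # (11) worst-case audio decode spread
    rv_max = tv_max - tv_min            # (12) worst-case video decode spread
    pa = [ta1 + ra_max]                 # (13) first audio playback time
    pv = [tv1 + rv_max]                 # (14) first video playback time
    p_av = max(pa[0], pv[0])            # (15) common start of playback
    pa[0] = pv[0] = p_av                # (16) both streams start together
    for u in audio_intervals:           # (17) PA(i) = PA(i-1) + Ua(i)
        pa.append(pa[-1] + u)
    for u in video_intervals:           # (18) PV(i) = PV(i-1) + Uv(i)
        pv.append(pv[-1] + u)
    return pa, pv

pa, pv = playback_times(0, 5, 30, 10, 40, 25, [10, 10], [20])
assert pa == [20, 30, 40] and pv == [20, 40]
```

Here the audio stream (frame intervals of 10) and the video stream (one interval of 20) both begin at the common start time 20 and then advance independently.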