200922328 IX. Description of the Invention:

[Technical Field]

The present invention relates to a multimedia streaming system, and more particularly to a layered multimedia streaming system that utilizes audio-video synchronization technology and bandwidth adaptation technology.

[Prior Art]

With the popularity of broadband and IEEE 802.11b/g networks, together with the emerging mobile 3G and Wi-Max networks, users in the resulting heterogeneous network can use any kind of device, over different networks, to access audio-video multimedia resources on the network.

To satisfy the service-quality requirements of different users, Scalable Multimedia Coding technology can compress audio-video data into files of different quality. In addition, a well-designed Adaptive Multimedia Streaming control architecture can change the transmission rate according to the network resources available at the time, so as to adjust the resolution and quality of audio-video playback. To serve users with diverse quality-of-service requirements in today's heterogeneous networks, combining the characteristics of scalable multimedia coding with an adaptive multimedia streaming control architecture is therefore a very important issue.

As for scalable multimedia coding, several good techniques have been proposed, for example Transcoding, Fine Granularity Scalability (FGS) video coding, and Bit Sliced Arithmetic Coding (BSAC) for audio. The main advantage of FGS and BSAC is that the content needs to be encoded only once, in the highest-quality mode, to produce a bitstream containing a base layer and enhancement layers, and the decoder can decode from whatever it receives (the base layer plus part of the enhancement layers). However, the decoding-time complexity of multimedia data with different numbers of base and enhancement layers makes synchronized audio-video playback difficult, and the existing techniques and literature seldom address this problem.

As for adaptive multimedia streaming control architectures, three main factors affect synchronization: first, time-varying network bandwidth; second, unpredictable delay jitter (Delay Jitter); and third, packet loss. Regarding the "time-varying network bandwidth" factor, many bandwidth-estimation techniques and reference tools (e.g., pathChirp) have already been proposed, and are used to accurately estimate the available bandwidth under the Transmission Control
Protocol (TCP) / User Datagram Protocol (UDP). Regarding the "unpredictable delay jitter" factor, buffer-control schemes have also been proposed to reduce delay jitter, for example managing the buffer by adjusting the playback speed at the decoder, compensating by adjusting the transmission rate at the server, or handling the delay jitter through the configuration of the buffer. Regarding the "packet loss" factor, packet retransmission mechanisms are often used to recover the audio-video quality degraded by lost packets; for example, deciding whether to retransmit a packet according to the importance of the lost packet, or retransmitting a lost packet only when the audio-video distortion it causes exceeds a predefined threshold.

However, the adaptive multimedia streaming techniques above focus mostly on transmission and are not designed together with scalable audio-video coding, so it is difficult for them to exploit the advantages of scalable coding for audio-video streaming in heterogeneous networks. A solution is therefore needed that combines scalable multimedia coding technology with an adaptive multimedia streaming control architecture, so as to adapt to different user devices and network characteristics and to satisfy users with diverse audio-video service-quality requirements in heterogeneous networks.

[Invention Content]

Accordingly, an object of the present invention is to provide a layered multimedia streaming system utilizing audio-video synchronization technology and bandwidth adaptation technology.

The layered multimedia streaming system of the present invention comprises a client processing unit. The client processing unit includes a stream synchronization module, a multi-layer audio-video decoding module, and a playback synchronization module. The stream synchronization module feeds a received layered video bitstream and a received layered audio bitstream into the multi-layer audio-video decoding module within a predetermined decoding time.
The multi-layer audio-video decoding module decodes the layered video bitstream and the layered audio bitstream into a decoded video signal and a decoded audio signal, respectively. The playback synchronization module determines a video playback time for the decoded video signal and an audio playback time for the decoded audio signal.

By combining scalable multimedia encoding/decoding technology with an adaptive multimedia streaming control architecture through the stream synchronization module, the multi-layer audio-video decoding module, and the playback synchronization module, the object of the present invention can indeed be achieved.

[Embodiment]

The foregoing and other technical contents, features and effects of the present invention will be clearly presented in the following detailed description of a preferred embodiment with reference to the drawings.

Referring to FIG. 1, the preferred embodiment of the layered multimedia streaming system of the present invention comprises a server 1 and a client processing unit 2. The server 1 includes a multi-layer audio-video encoding module 11 and a bandwidth adaptation module 12. The client processing unit 2 includes a stream synchronization module 21, a multi-layer audio-video decoding module 22, a playback synchronization module 23, and an audio-video playback module 24. The stream synchronization module 21 has a delay jitter (Delay Jitter) elimination sub-module 211, a conditional retransmission sub-module 212, and a receiving data buffer (Buffer) 213. The playback synchronization module 23 has a video playback buffer 231 and an audio playback buffer 232.

The multi-layer audio-video encoding module 11 encodes an input video signal and an input audio signal into a layered video bitstream (Layered Video Bitstream) and a layered audio bitstream (Layered Audio Bitstream), respectively.
The multi-layer audio-video encoding module 11 may be implemented with existing scalable multimedia coding technologies, for example FGS and BSAC. The layered video bitstream comprises a plurality of video frames (Video Frame), and the layered audio bitstream comprises a plurality of audio frames (Audio Frame). Each video frame has a video base layer (Video Base Layer) and at least one video enhancement layer (Video Enhanced Layer); each audio frame has an audio base layer (Audio Base Layer) and at least one audio enhancement layer (Audio Enhanced Layer). The audio/video base layer is the minimum data required to decode each audio/video frame, while the audio/video enhancement layers improve the quality obtained after the base layer is decoded.
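The base-layer/enhancement-layer relationship described above can be sketched as follows; this is a minimal illustration, and the class and field names are not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayeredFrame:
    """One audio or video frame: a base layer plus ordered enhancement layers."""
    base_layer: bytes                                  # minimum data needed to decode the frame
    enhancement_layers: List[bytes] = field(default_factory=list)

    def decodable_payload(self, num_enhancement: int) -> bytes:
        """Return the base layer plus the first num_enhancement enhancement
        layers; decoding more layers yields higher playback quality."""
        layers = self.enhancement_layers[:num_enhancement]
        return self.base_layer + b"".join(layers)

frame = LayeredFrame(base_layer=b"BASE", enhancement_layers=[b"E1", b"E2", b"E3"])
# The base layer alone is decodable; each extra layer only refines quality.
assert frame.decodable_payload(0) == b"BASE"
assert frame.decodable_payload(2) == b"BASEE1E2"
```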
The bandwidth adaptation module 12 determines the bit-rate (Bit-rate) of the layered video bitstream and the bit-rate of the layered audio bitstream to be transmitted, and transmits them over a network 3. Owing to the characteristics of scalable multimedia encoding/decoding, the audio/video base layer must first be decoded correctly before the audio/video enhancement layers can further improve the decoded quality. Therefore, the bandwidth adaptation module 12 first reserves the bit-rate to be transmitted for the audio/video base layers; what remains is the enhancement-layer bit-rate (denoted Tr_el), as defined in equation (1).

Tr_el = Tr_total − Tr_A,bl − Tr_V,bl ……… (1)

where Tr_total is the predicted available bandwidth of the network 3, Tr_A,bl is the bit-rate of the audio base layer, and Tr_V,bl is the bit-rate of the video base layer.

From the enhancement-layer bit-rate derived from the predicted available bandwidth, the bandwidth adaptation module 12 then determines the number of audio enhancement layers and the number of video enhancement layers to transmit, using characteristics of human perception.
Generally speaking, hearing is much more important than vision in human perception; moreover, a video decoder can exploit interpolated (Interleave) playback, together with the persistence of human vision, to present lost pictures. For multimedia data, audio therefore has higher priority than video; that is, the number of video enhancement layers to be transmitted is kept smaller than the number of audio enhancement layers to be transmitted. The ratio of the number of audio enhancement layers to the number of video enhancement layers (denoted R) is defined in equation (2).

R = N_a / N_v ……… (2)

where N_a is the number of audio enhancement layers and N_v is the number of video enhancement layers. The data size of each audio enhancement layer is about 2 KB, and that of each video enhancement layer is about 10 KB. The value of the ratio R can be decided by jointly considering audio and video quality. Assume an audio/video joint quality (denoted Q_av), defined in equation (3).
Q_av = 2 × Q_a + Q_v ……… (3)

where Q_a is the audio quality, evaluated with the Objective Difference Grade (ODG) and the Distortion Index (DI), and Q_v is the video quality, evaluated with the Peak Signal-to-Noise Ratio (PSNR). Since ODG is better suited to grading high-quality audio while DI can grade audio across a wide range of quality, the average of ODG and DI is taken as the audio quality grade. The ODG and DI values are linearly normalized (Linear Normalize) from their original scales to 1 to 40. According to the results of many experiments, the optimized ratio R of the number of audio enhancement layers to the number of video enhancement layers is 5, and based on this concept a mapping table between enhancement-layer bit-rate and number of enhancement layers (
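The joint quality measure of equation (3) can be sketched as follows; the input ranges shown for ODG and DI are assumptions for illustration, since the patent states only that both scores are linearly normalized to 1 to 40:

```python
def linear_normalize(x, lo, hi, new_lo=1.0, new_hi=40.0):
    """Linearly map x from the interval [lo, hi] onto [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

def joint_quality(odg, di, psnr):
    """Equation (3): Q_av = 2*Q_a + Q_v, where Q_a is the mean of the
    normalized ODG and DI scores and Q_v is the PSNR in dB."""
    # ODG conventionally ranges over [-4, 0] (0 = imperceptible distortion);
    # DI is assumed here to share that scale purely for illustration.
    q_a = (linear_normalize(odg, -4.0, 0.0) + linear_normalize(di, -4.0, 0.0)) / 2.0
    q_v = psnr
    return 2.0 * q_a + q_v
```

With perfect audio scores (ODG = DI = 0) both normalize to 40, so Q_av = 2 × 40 + PSNR.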
Mapping Table), shown as Table 1, was established. In the preferred embodiment, the bandwidth adaptation module 12 uses the enhancement-layer bit-rate to look up directly the number of audio enhancement layers and the number of video enhancement layers to transmit. In Table 1, EI_a,1~m denotes the audio enhancement layers 1 through m (i.e., the number of audio enhancement layers is m), and EI_v,1~n denotes the video enhancement layers 1 through n (i.e., the number of video enhancement layers is n).

Table 1. Mapping between enhancement-layer bit-rate and number of enhancement layers

  Enhancement-layer bit-rate    Enhancement layers transmitted
  2 KB                          EI_a,1
  4 KB                          EI_a,1~2
  10 KB                         EI_a,1~5
  20 KB                         EI_a,1~5, EI_v,1
  22 KB                         EI_a,1~6, EI_v,1
  50 KB                         EI_a,1~20, EI_v,1
  60 KB                         EI_a,1~20, EI_v,1~2
  (2×m + 10×n) KB               EI_a,1~m, EI_v,1~n

The server 1 divides the layered video bitstream and the layered audio bitstream into a plurality of packets (Packet) and transmits them over the network 3.
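As a rough illustration of how equation (1) and the mapping table fit together, the following sketch assumes each audio enhancement layer costs about 2 KB and each video enhancement layer about 10 KB, per the table's generic (2×m + 10×n) KB row; all names and the dictionary form are illustrative, not from the patent:

```python
# Equation (1): bandwidth left for enhancement layers after reserving
# the audio and video base layers (all rates in KB per time unit).
def enhancement_budget_kb(tr_total, tr_a_base, tr_v_base):
    return max(tr_total - tr_a_base - tr_v_base, 0)

# Table 1 as a lookup: enhancement-layer budget (KB) -> (m audio layers,
# n video layers); audio layers are favored per the perceptual priority.
TABLE_1 = {2: (1, 0), 4: (2, 0), 10: (5, 0), 20: (5, 1),
           22: (6, 1), 50: (20, 1), 60: (20, 2)}

def layer_cost_kb(m, n):
    # generic last row of the table: (2*m + 10*n) KB
    return 2 * m + 10 * n

# every listed row spends exactly its bit budget
assert all(layer_cost_kb(m, n) == kb for kb, (m, n) in TABLE_1.items())
```

For example, with 100 KB of predicted bandwidth and 16 KB + 64 KB reserved for the audio and video base layers, the 20 KB remainder maps to five audio enhancement layers and one video enhancement layer.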
When the client processing unit 2 receives these packets, the stream synchronization module 21 feeds the received layered video bitstream and layered audio bitstream into the multi-layer audio-video decoding module 22 within a predetermined decoding time. The received layered video bitstream and layered audio bitstream are stored in the receiving data buffer 213. The delay jitter elimination sub-module 211 keeps statistics on the current delay time of the network 3 and keeps the temporal presentation length (Temporal Presentation Length) of the receiving data buffer 213 greater than that delay time, so that the layered video bitstream and the layered audio bitstream can be fed to the multi-layer audio-video decoding module 22 within the decoding time. The temporal presentation length is determined by the number of video frames of the layered video bitstream and the number of audio frames of the layered audio bitstream held in the receiving data buffer 213.

The delay jitter elimination sub-module 211 first computes a start delay jitter threshold; if the temporal presentation length is smaller than this threshold, it sends a start delay jitter elimination message back to the bandwidth adaptation module 12 over the network 3. The start delay jitter threshold (denoted TH_sj) is defined in equation (4).

TH_sj = a × J_max + (1 − a) × J_avg ……… (4)

where J_max is the maximum delay jitter time and J_avg is the average delay jitter time, both obtained by the delay jitter elimination sub-module 211 from statistics on the current state of the network 3, and a is an adjustment parameter with 0 ≤ a ≤ 1. When the bandwidth adaptation module 12 receives the start delay jitter elimination message, it further adjusts the interval number of audio frames of the layered audio bitstream to transmit (denoted FR_a) and the interval number of video frames of the layered video bitstream (denoted FR_v), as defined in equations (5) and (6), respectively.
FR_a = (Tr_A,bl + Tr_A,el) / (Tr_A,bl + Tr_A,el/2) ……… (5)

where Tr_A,bl is the bit-rate of the audio base layer and Tr_A,el is the bit-rate of the audio enhancement layers.

FR_v = (Tr_V,bl + Tr_V,el) / (Tr_V,bl + Tr_V,el_min) ……… (6)

where Tr_V,bl is the bit-rate of the video base layer, Tr_V,el is the bit-rate of the video enhancement layers, and Tr_V,el_min is the bit-rate of the minimum required video enhancement layers.

The delay jitter elimination sub-module 211 then computes a stop delay jitter threshold; if the temporal presentation length is greater than this threshold, it sends a stop delay jitter elimination message to the bandwidth adaptation module 12 over the network 3. The stop delay jitter threshold (denoted TH_stop) is defined in equation (7).

TH_stop = r × J_max ……… (7)

where r is an adjustment parameter. When the bandwidth adaptation module 12 receives the stop delay jitter elimination message, the server 1 returns to its original transmission mode. The value of the adjustment parameter r affects the operating frequency of delay jitter elimination and the length of the low-quality audio/video period: when r is larger, the low-quality period caused by delay jitter elimination lasts longer (because fewer enhancement layers are streamed, lowering the audio-video quality). Since delay jitter elimination affects audio quality more strongly, the preferred embodiment sets the r value for audio and the r value for video separately.

The conditional retransmission sub-module 212 first computes an allowable packet delay time threshold, used to judge whether a packet has been lost. The allowable packet delay time threshold (denoted TH_d) is defined in equation (8).

TH_d = b × J_max + (1 − b) × J_avg ……… (8)
where b is a control parameter, with 0 ≤ b ≤ 1, that places TH_d between J_max and J_avg. When the delay of a packet exceeds TH_d, the packet is regarded as lost. When a packet is lost, the conditional retransmission sub-module 212 further checks whether the lost packet carries data of the video base layer or the audio base layer, and whether the temporal presentation length is greater than the time needed to retransmit a packet (obtained from statistics on the current state of the network 3); if so, it sends a packet retransmission message for that packet to the bandwidth adaptation module 12 over the network 3. When the bandwidth adaptation module 12 receives the retransmission message for a packet, it retransmits the packet according to the predicted available bandwidth; when it receives retransmission messages for several packets at the same time, it orders them by priority: the I-frame within a video frame is highest, the audio frame is next, and the other frame types within a video frame (e.g., B-frames and P-frames) are lowest.

It is worth mentioning that the delay jitter elimination sub-module 211 and the conditional retransmission sub-module 212 can operate separately, but they can also operate at the same time. When they operate simultaneously, the bit-rate needed for packet retransmission is reserved first, and the audio frame interval number FR_a and the video frame interval number FR_v defined in equations (5) and (6) are further modified as defined in equations (9) and (10).

FR_a = (Tr_A,bl + Tr_A,el − Tr_A,ret) / (Tr_A,bl + Tr_A,el/2) ……… (9)

where Tr_A,ret is the bit-rate needed for audio packet retransmission.

FR_v = (Tr_V,bl + Tr_V,el − Tr_V,ret) / (Tr_V,bl + Tr_V,el_min) ……… (10)

where Tr_V,ret is the bit-rate needed for video packet retransmission.
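The threshold logic of equations (4), (7) and (8), together with the conditional-retransmission test, can be sketched as follows; the default parameter values are illustrative and not fixed by the patent:

```python
def start_jitter_threshold(j_max, j_avg, a=0.5):
    """Equation (4): TH_sj = a*J_max + (1-a)*J_avg, with 0 <= a <= 1."""
    return a * j_max + (1 - a) * j_avg

def stop_jitter_threshold(j_max, r):
    """Equation (7): TH_stop = r*J_max; a larger r keeps the
    reduced-rate mode active longer."""
    return r * j_max

def packet_delay_threshold(j_max, j_avg, b=0.5):
    """Equation (8): TH_d = b*J_max + (1-b)*J_avg, with 0 <= b <= 1."""
    return b * j_max + (1 - b) * j_avg

def should_request_retransmit(packet_delay, j_max, j_avg,
                              has_base_layer, presentation_len, rtx_time):
    """A packet whose delay exceeds TH_d is treated as lost; retransmission
    is requested only for base-layer packets, and only when the buffered
    presentation length still covers the retransmission round trip."""
    lost = packet_delay > packet_delay_threshold(j_max, j_avg)
    return lost and has_base_layer and presentation_len > rtx_time
```

For instance, with J_max = 100 ms and J_avg = 40 ms the default thresholds sit at 70 ms, so an 80 ms base-layer packet with ample buffered presentation length triggers a retransmission request.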
While the stream synchronization module 21 is processing, the multi-layer audio-video decoding module 22 continuously reads the layered video bitstream and the layered audio bitstream from the receiving data buffer 213 and decodes them into a decoded video signal and a decoded audio signal, respectively. The multi-layer audio-video decoding module 22 may be implemented with existing scalable multimedia decoding technology corresponding to the scalable multimedia coding adopted by the multi-layer audio-video encoding module 11. The decoded video signal and the decoded audio signal are stored in the video playback buffer 231 and the audio playback buffer 232, respectively.

The playback synchronization module 23 computes step by step with equations (11) to (18) to determine a video playback time for the decoded video signal and an audio playback time for the decoded audio signal.

RA_max = TA_max − TA_min ……… (11)

where TA_max is a maximum audio decoding time (the decoded data include the audio base layer and all audio enhancement layers), TA_min is a minimum audio decoding time (the decoded data include only the audio base layer), and RA_max is a maximum audio decoding time difference.

RV_max = TV_max − TV_min ……… (12)

where TV_max is a maximum video decoding time (the decoded data include the video base layer and all video enhancement layers), TV_min is a minimum video decoding time (the decoded data include only the video base layer), and RV_max is a maximum video decoding time difference.

PA(1) = TA(1) + RA_max ……… (13)

PV(1) = TV(1) + RV_max ……… (14)

P(av) = max{PA(1), PV(1)} ……… (15)

PA(1) = P(av), PV(1) = P(av) ……… (16)
where TA(1) and TV(1) are the times at which the first audio packet enters the audio playback buffer 232 and the first video packet enters the video playback buffer 231, respectively; P(av) is the audio-video start playback time; and PA(1) and PV(1) are the playback times of the first decoded audio frame and the first decoded video frame, both finally set to the audio-video start playback time.

Since the temporal resolution of audio-video data can change dynamically, to support variable temporal resolution the audio/video playback time of the i-th audio/video frame (i ≥ 2), denoted PA(i)/PV(i), is computed as follows.

PA(i) = PA(i−1) + Ua(i) ……… (17)

PV(i) = PV(i−1) + Uv(i) ……… (18)

where Ua(i) and Uv(i) are the time intervals of the i-th audio frame and the i-th video frame, respectively. According to the video playback time PV(i) and the audio playback time PA(i), the playback synchronization module 23 sends the decoded video signal in the video playback buffer 231 and the decoded audio signal in the audio playback buffer 232 to the audio-video playback module 24 for playback at the proper times.

To summarize, through the cooperative operation of the modules, the characteristics of scalable multimedia coding and an adaptive multimedia streaming control architecture are successfully combined, so as to adapt to different user devices and network characteristics and to satisfy users with diverse audio-video service-quality requirements in heterogeneous networks; the object of the present invention can indeed be achieved.

The foregoing, however, is merely a preferred embodiment of the present invention and cannot be used to limit the scope of practice of the invention; all simple equivalent changes and modifications made according to the claims and the description of the present invention still fall within the scope covered by the patent of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG.
1 is a system block diagram illustrating the preferred embodiment of the layered multimedia streaming system utilizing audio-video synchronization technology and bandwidth adaptation technology of the present invention.

[Explanation of Main Component Symbols]

1 server
11 multi-layer audio-video encoding module
12 bandwidth adaptation module
2 client processing unit
21 stream synchronization module
211 delay jitter elimination sub-module
212 conditional retransmission sub-module
213 receiving data buffer
22 multi-layer audio-video decoding module
23 playback synchronization module
231 video playback buffer
232 audio playback buffer
24 audio-video playback module
3 network
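The playback-time bookkeeping of equations (11) to (18) may be illustrated with the following sketch; the function and argument names are illustrative only, and times are in arbitrary units:

```python
def playback_times(ta1, tv1, ta_max, ta_min, tv_max, tv_min,
                   audio_intervals, video_intervals):
    """Equations (11)-(18): align the first audio/video frames on a common
    start time, then advance each stream by its own frame intervals
    (supporting variable temporal resolution)."""
    ra_max = ta_max - ta_min            # (11) worst-case audio decode spread
    rv_max = tv_max - tv_min            # (12) worst-case video decode spread
    pa = [ta1 + ra_max]                 # (13) first audio playback time
    pv = [tv1 + rv_max]                 # (14) first video playback time
    p_av = max(pa[0], pv[0])            # (15) common start of playback
    pa[0] = pv[0] = p_av                # (16) both streams start together
    for u in audio_intervals:           # (17) PA(i) = PA(i-1) + Ua(i)
        pa.append(pa[-1] + u)
    for u in video_intervals:           # (18) PV(i) = PV(i-1) + Uv(i)
        pv.append(pv[-1] + u)
    return pa, pv

pa, pv = playback_times(0, 5, 30, 10, 40, 25, [10, 10], [20])
assert pa == [20, 30, 40] and pv == [20, 40]
```

Here the audio stream (frame intervals of 10) and the video stream (one interval of 20) both begin at the common start time 20 and then advance independently.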