TW201001995A

TW201001995A - Jitter buffer and jitter buffering method

Info

Publication number: TW201001995A
Application number: TW97124150A
Authority: TW
Inventors: Jeng-Shyan Yang
Original assignee: Hon Hai Prec Ind Co Ltd
Priority date: 2008-06-27
Filing date: 2008-06-27
Publication date: 2010-01-01

Abstract

A jitter buffering method includes a packet data writing step, a data buffering step, and a data reading step. The packet data writing step and the data reading step are executed synchronously and repeatedly. In the packet data writing step, each received packet is checked to determine whether it was received normally by comparing its timestamp with the timestamp of the frame currently at the read position of the jitter buffer and with the depth of the jitter buffer, and a storing position for the packet is determined according to the comparison result. The data buffering step saves the packet in the storing position. The data reading step sends the packet as a series of frames to a signal DSP. A jitter buffer is also disclosed. With the present invention, the quality of a real-time protocol application can be enhanced.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to Internet Protocol (IP) technology, and more particularly to a method for dynamically adjusting the delay depth of a jitter buffer.

[Prior Art]

In an IP network, real-time voice data is carried by the Real-time Transport Protocol (RTP), encapsulated in User Datagram Protocol (UDP) packets. Because the data is carried over UDP, packets may arrive out of order or be received more than once. For voice to be carried effectively over the network with guaranteed call quality, the adverse effects of network delay, delay jitter, packet loss, reordering, and duplicate packets must be taken into account.

To deal with the jitter caused by unstable network traffic, a jitter buffer is usually used together with a suitable method for eliminating network jitter. The essential idea of a jitter buffer is to remove network jitter at the cost of added delay at the receiving end. The amount of delay introduced determines how much jitter can be removed, but a larger delay is not always better: an excessive delay prevents real-time voice transmission from being truly real-time and impairs the interaction between the two parties of a call.

When network traffic is unstable during a voice call, a jitter buffer allows voice data to be delivered normally to the voice signal processor even if packets are delayed, without affecting playback quality. However, a packet whose jitter exceeds a certain time is still treated as lost; otherwise, when network delay jitter is excessive, the real-time nature of the data cannot be guaranteed, and that real-time nature is very important to voice transmission quality.

A conventional static jitter buffer generally uses a fixed delay.
If the fixed delay is set too large, the real-time nature of the voice is lost; conversely, if it is set too small, many packets are dropped because of excessive jitter, which seriously degrades playback quality. Dynamic adjustment of the delay depth therefore means adjusting the delay depth of the jitter buffer in real time, so that voice quality is maintained without losing the real-time nature of the voice.

Some dynamic jitter buffer systems exist in the prior art. Based on the voice being played, they divide the audio in the jitter buffer into talk-spurts and silence-spurts, which can be detected by the voice activity detection (VAD) function of a DSP. If the user wants to adjust the network delay that the jitter buffer can tolerate, the adjustment is best made during a silence-spurt, because the listener then cannot perceive the slight change in sound during the adjustment. This method has a drawback, however: when the VAD function is turned off, the DSP cannot distinguish talk-spurts from silence-spurts, so the adjustment method cannot operate.

[Summary of the Invention]

In view of the above, it is necessary to provide a jitter buffer and a jitter buffering method that can improve real-time voice transmission quality even when the voice activity detection function is turned off or absent.

A jitter buffer uses a buffer storage queue to receive data packets from the network. The jitter buffer includes a data enqueue module, a buffer module, and a data dequeue module. The data enqueue module and the data dequeue module repeatedly and synchronously perform data enqueue and data dequeue, respectively, according to a predetermined clock period.
In this jitter buffer, the data enqueue module compares the timestamp of a received data packet with the playback timestamp of the voice frame currently being read by the data dequeue module and with the delay depth of the jitter buffer, to determine whether the data packet was received normally, and calculates the storage address of the data packet according to the result. The buffer module buffers and stores the data packet at the calculated storage address. The data dequeue module checks the playback timestamp of the voice frame at the current read position of the jitter buffer to determine whether a voice frame is to be output and, if so, transmits that voice frame to the voice signal processor for voice decoding.

A jitter buffering method uses a buffer storage queue to receive data packets from the network and repeatedly and synchronously performs the following steps according to a predetermined clock period. Data enqueue: determine whether a data packet was received normally by comparing the timestamp of the received data packet, the playback timestamp of the voice frame currently being read, and the delay depth of the jitter buffer, and calculate the storage address of the data packet according to the result. Data buffering: buffer and store the data packet at the calculated storage address. Data dequeue: check the playback timestamp of the voice frame at the current read position of the jitter buffer to determine whether a voice frame is to be output and, if so, transmit that voice frame to the voice signal processor for voice decoding.

Compared with the prior art, the jitter buffer and jitter buffering method achieve a trade-off between packet delay and packet loss rate, and can improve real-time voice transmission quality even when the voice activity detection function is turned off or absent.

[Embodiments]

FIG. 1 shows the operating environment of the jitter buffering method of the present invention. The operating environment includes a jitter buffer 1 and a voice signal processor 2. The jitter buffer 1 includes a data enqueue module 10, a buffer module 12, and a data dequeue module 14. The jitter buffer 1 receives Real-time Transport Protocol (RTP) data packets (hereinafter "data packets") from the IP network, and repeatedly and synchronously performs data enqueue and data dequeue according to a predetermined clock period to carry out the buffer storage function. In other words, the data enqueue module 10 and the data dequeue module 14 work synchronously, and the buffer module 12 acts as an intermediate station that buffers and stores the data.

In this embodiment, each data packet carries a timestamp according to the time at which it was written, and the data packet itself represents voice data. During data dequeue, data packets are read in units of voice frames; each voice frame carries a timestamp, and each voice frame and its timestamp are stored at the read position of the jitter buffer 1. The jitter buffer 1 also sends the voice frame at the read position, together with its timestamp, to the voice signal processor 2. Each time the jitter buffer 1 sends a voice frame and timestamp, its read count is automatically incremented by 1. The voice signal processor 2 decodes and plays the voice frame.

FIG. 2 is a flowchart of the jitter buffering method of the present invention.

Step S10, data enqueue: the data enqueue module 10 determines whether a data packet was received normally by comparing the timestamp of the received data packet, the playback timestamp of the voice frame currently being read by the data dequeue module 14, and the delay depth of the jitter buffer 1, and calculates the storage address of the data packet according to the result.

Step S20, data buffering: the buffer module 12 buffers and stores the data packet at the calculated storage address.

Step S30, data dequeue: because each data packet consists of multiple voice frames, the data dequeue module 14 checks the playback timestamp of the voice frame at the current read position of the jitter buffer 1 to determine whether a voice frame is to be output and, if so, transmits that voice frame to the voice signal processor 2 for voice decoding.

FIG. 3 is a detailed flowchart of steps S10 and S20 of FIG. 2.

Step S100: the data enqueue module 10 receives a data packet.

Step S102: the data enqueue module 10 determines, from the timestamp of the data packet, whether the received data packet is the first data packet received.

If the result of step S102 is yes, then in step S104 the data enqueue module 10 stores the first data packet at the start address of the pre-buffering stage of the jitter buffer 1, the buffer module 12 buffers the first data packet, and the timestamp of the first data packet is taken as the reference timestamp TS1. The flow then returns to step S100.

If the result of step S102 is no, then in step S106 the data enqueue module 10 records the timestamp of the data packet as TSw and determines whether the difference (TSw - TS1) between TSw and the reference timestamp TS1 is less than the delay depth Dn of the jitter buffer 1.

If (TSw - TS1) < Dn, then in step S108 the data enqueue module 10 determines that the jitter buffer 1 is in the pre-buffering stage, and the buffer module 12 buffers the data packet and sorts it according to the order of the packet timestamps, that is, determines the storage address of the data packet. The flow then ends.

If (TSw - TS1) ≥ Dn, then in step S110 the data enqueue module 10 determines whether TSw is less than the playback timestamp TSr of the voice frame currently read by the jitter buffer 1.

If TSw < TSr, then in step S112 the data enqueue module 10 determines that the data packet arrived late, discards it, and increases the adjustment value of the delay depth of the jitter buffer 1 (hereinafter the "adjustment value") by one adjustment weight. The flow then goes directly to step S120.

In this embodiment, the adjustment value is a temporary value used for adjusting the delay depth, and the adjustment weight is the amount by which the adjustment value is increased or decreased when a data packet arrives late or too early. The adjustment weight is related to the voice frame transmission interval (SamplePerFrame) by a power-of-two factor: the adjustment weight equals SamplePerFrame / 2^n, where n can be any integer from 0 to 8 and its specific value can be determined by experiment in a particular implementation.

SamplePerFrame literally means the number of samples per voice frame, but here it can be understood as the voice frame transmission interval. In a voice data packet, each sample of a voice frame can represent 1/8 millisecond (that is, 8 samples per millisecond), and depending on the voice codec, each voice frame can last 5 milliseconds or 10 milliseconds. For example, if the voice frames in this embodiment last 5 milliseconds, SamplePerFrame equals 40; if they last 10 milliseconds, SamplePerFrame equals 80.

If the result of step S110 is TSw ≥ TSr, then in step S114 the data enqueue module 10 determines whether TSw is less than or equal to the sum (TSr + Dn) of the playback timestamp TSr of the voice frame currently read by the jitter buffer 1 and the delay depth Dn.

If TSw ≤ (TSr + Dn), then in step S116 the data enqueue module 10 copies the voice frames in the data packet and their timestamps into the storage address whose index is (TSw - TS1) / SamplePerFrame + Idx, and the flow ends. Here Idx is the index of the storage address of the voice frame currently read by the jitter buffer 1.

If TSw > (TSr + Dn), the data packet arrived too early. In step S118 the data enqueue module 10 copies the voice frames in the data packet and their timestamps into the storage address whose index is Dn / SamplePerFrame + Idx, and decreases the adjustment value by one adjustment weight.

Step S120: the data enqueue module 10 checks the adjustment value of the current delay depth Dn of the jitter buffer 1 to determine whether the adjustment value is greater than or equal to SamplePerFrame.

If it is (written Dn ≥ SamplePerFrame in the original), then in step S122 the data enqueue module 10 increases or decreases the delay depth by one SamplePerFrame and resets the adjustment value to zero; otherwise (Dn < SamplePerFrame), the flow ends.
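The enqueue decisions of steps S106 to S122 can be summarized in a short sketch. The Python below is illustrative only and is not taken from the patent: the names `classify_packet` and `DepthAdjuster`, the action strings, and the default `n = 2` are hypothetical. All times are in 1/8 ms units as in this embodiment, and the two index formulas are copied literally from steps S116 and S118. Two points are assumptions: floor division is used for the index, and the magnitude of the adjustment value is compared against SamplePerFrame so that early arrivals can shrink the depth (the text only says the depth is "increased or decreased").

```python
from dataclasses import dataclass


def classify_packet(tsw, ts1, tsr, dn, spf, idx):
    """Return (action, slot) for a non-first packet, following S106-S118.

    tsw: packet timestamp; ts1: reference timestamp of the first packet;
    tsr: playback timestamp of the frame at the read position;
    dn: delay depth; spf: SamplePerFrame; idx: index of the read position.
    """
    if tsw - ts1 < dn:
        # S108: still pre-buffering; the packet is sorted by timestamp.
        return "prebuffer", None
    if tsw < tsr:
        # S112: arrived after its playback time, so it is discarded.
        return "late", None
    if tsw <= tsr + dn:
        # S116: normal arrival (floor division is an assumption).
        return "store", (tsw - ts1) // spf + idx
    # S118: arrived too early.
    return "early", dn // spf + idx


@dataclass
class DepthAdjuster:
    depth: int        # delay depth Dn, a multiple of spf
    spf: int          # SamplePerFrame
    n: int = 2        # exponent of the adjustment weight, 0..8 (assumed value)
    adjust: int = 0   # accumulated adjustment value

    def update(self, action):
        weight = self.spf // (2 ** self.n)
        if action == "late":
            self.adjust += weight   # S112
        elif action == "early":
            self.adjust -= weight   # S118
        # S120/S122 (assumption: compare the magnitude of the adjustment).
        if abs(self.adjust) >= self.spf:
            self.depth += self.spf if self.adjust > 0 else -self.spf
            self.adjust = 0
        return self.depth
```

With SamplePerFrame = 80 and n = 2 the adjustment weight is 20, so under these assumptions four late packets in a row raise the delay depth by one frame interval, and four early packets lower it again.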
In addition, in this embodiment the timestamp TSw of the data packet received by the data enqueue module 10, the playback timestamp TSr of the voice frame currently read by the data dequeue module 14, the playback timestamp of a newly played voice frame, and the delay depth of the jitter buffer 1 are all expressed in units of 1/8 millisecond, and the delay depth of the jitter buffer 1 must be a multiple of SamplePerFrame.

FIG. 4 is a detailed flowchart of step S30 of FIG. 2.

Step S300: the voice signal processor 2 is about to play a voice frame; that is, the data dequeue module 14 needs to read a voice frame from the read position of the jitter buffer 1.

Step S302: the data dequeue module 14 checks whether the jitter buffer 1 is in the pre-buffering stage.

If step S302 finds that the jitter buffer 1 is in the pre-buffering stage, then in step S304 the data dequeue module 14 sends a silent voice frame to the voice signal processor 2.

Conversely, if step S302 finds that the jitter buffer 1 is not in the pre-buffering stage, then in step S306 the data dequeue module 14 checks whether the playback timestamp TSr of the voice frame at the current read position of the jitter buffer 1 is null.

If TSr at the current read position is null, then in step S308 the data dequeue module 14 sets TSr at the current read position to the playback timestamp of the previous voice frame plus one SamplePerFrame, records that a voice frame has been played at the current read position, and performs voice frame compensation.

If TSr at the current read position is not null, then in step S310 the data dequeue module 14 sends the voice frame at the current read position to the voice signal processor 2, and the read count of the jitter buffer 1 is incremented by 1.

FIG. 5 is a flowchart of the voice frame compensation of the present invention.

Step S500: the data dequeue module 14 checks whether the playback timestamp of the voice frame immediately before the current read position of the jitter buffer 1 is null.

If it is not null, then in step S502 the data dequeue module 14 sends the voice frame immediately before the current read position to the voice signal processor 2.

Conversely, if it is null, then in step S504 the data dequeue module 14 checks whether the playback timestamp of the voice frame immediately after the current read position of the jitter buffer 1 is null.

If the playback timestamp of the following voice frame is not null, then in step S506 the data dequeue module 14 sends the following voice frame to the voice signal processor 2.

Conversely, if the playback timestamp of the following voice frame is null, then in step S508 the data dequeue module 14 checks whether the playback timestamps of the two voice frames before the current read position of the jitter buffer 1 are null.

If they are not null, then in step S510 the data dequeue module 14 sends the two preceding voice frames to the voice signal processor 2 for playback.

Conversely, if they are null, then in step S512 the data dequeue module 14 sends a silent voice frame to the voice signal processor 2.
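The read-side logic of FIG. 4 and FIG. 5 can be sketched in the same way. This is a hedged illustration rather than the patent's code. The `Slot` record, the `SILENCE` placeholder, the use of `None` for a null playback timestamp, and the circular indexing are all assumptions introduced here. The sketch also reads "the two voice frames before the current read position" in step S508 as the frame two positions back, since the immediately preceding frame is already known to be null at that point, and it assumes the read position advances after a concealed frame; the text does not settle either point.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SILENCE = b""  # hypothetical stand-in for a silent voice frame


@dataclass
class Slot:
    ts: Optional[int] = None   # playback timestamp; None means null
    frame: bytes = SILENCE


def compensate(slots: List[Slot], read: int) -> bytes:
    """Pick a substitute for an empty read position (S500-S512)."""
    size = len(slots)
    # Preference order: previous frame, next frame, frame two back
    # (the last one is an interpretation of S508).
    for offset in (-1, 1, -2):              # S500, S504, S508
        candidate = slots[(read + offset) % size]
        if candidate.ts is not None:
            return candidate.frame          # S502, S506, S510
    return SILENCE                          # S512


def dequeue(slots: List[Slot], read: int, prebuffering: bool,
            prev_ts: int, spf: int) -> Tuple[bytes, int, bool]:
    """One playback tick (S300-S310).

    Returns (frame, playback_ts, advance), where advance says whether
    the read count should be incremented.
    """
    if prebuffering:
        return SILENCE, prev_ts, False      # S304
    slot = slots[read]
    if slot.ts is None:
        # S308: synthesize the timestamp and conceal the missing frame.
        return compensate(slots, read), prev_ts + spf, True
    return slot.frame, slot.ts, True        # S310
```

A caller would invoke `dequeue` once per clock period, feed the returned frame to the decoder, and carry the returned timestamp forward as `prev_ts` for the next tick.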
In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. However, the above describes only preferred embodiments of the present invention; equivalent modifications or variations made by those skilled in the art in accordance with the spirit of the invention shall all be covered by the following claims.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram of the operating environment of the jitter buffering method of the present invention.

FIG. 2 is a flowchart of the jitter buffering method of the present invention.

FIG. 3 is a detailed flowchart of the data enqueue and data buffering of the present invention.

FIG. 4 is a detailed flowchart of the data dequeue of the present invention.

FIG. 5 is a flowchart of the voice frame compensation of the present invention.

[Description of Main Component Symbols]

Jitter buffer 1
Voice signal processor 2
Data enqueue module 10
Buffer module 12
Data dequeue module 14

Claims

X. Scope of the Patent Application:

1. A jitter buffering method that uses a buffer storage queue to receive data packets from a network and repeatedly and synchronously performs the following steps according to a predetermined clock period: data enqueue: determining whether a data packet is received normally by comparing the timestamp of the received data packet, the playback timestamp of the voice frame currently being read, and the delay depth of the jitter buffer, and calculating a storage address for the data packet according to the result; data buffering: buffering and storing the data packet at the calculated storage address; and data dequeue: determining whether a voice frame is to be output by checking the playback timestamp of the voice frame at the current read position of the jitter buffer and, if so, transmitting the voice frame to a voice signal processor for voice decoding.

2. The jitter buffering method of claim 1, wherein the data enqueue step comprises the following steps: (1) after receiving a data packet, determining whether it is the first data packet received and, if so, storing the first data packet at the start address of the pre-buffering stage of the jitter buffer, buffering it, and taking its timestamp as a reference timestamp TS1; (2) if the data packet received in step (1) is not the first data packet received, determining whether the difference (TSw - TS1) between the timestamp TSw of the data packet and the reference timestamp TS1 is less than the delay depth Dn of the jitter buffer; (3) if (TSw - TS1) < Dn, determining that the jitter buffer is in the pre-buffering stage, buffering the data packet, and sorting it according to the order of the packet timestamps, that is, determining the storage address of the data packet; if (TSw - TS1) ≥ Dn, determining whether the timestamp TSw of the data packet is less than the playback timestamp TSr of the voice frame currently read by the jitter buffer; (4) if TSw < TSr, determining that the data packet arrived late, discarding the data packet, and increasing the adjustment value of the delay depth by one adjustment weight; if TSw ≥ TSr, determining whether the timestamp TSw of the data packet is less than or equal to the sum (TSr + Dn) of the playback timestamp TSr of the currently read voice frame and the delay depth Dn; and (5) if TSw ≤ (TSr + Dn), copying the voice frames in the data packet and their playback timestamps into the storage address of a first index and then ending the flow; if TSw > (TSr + Dn), copying the voice frames in the data packet and their playback timestamps into the storage address of a second index and decreasing the adjustment value of the delay depth by one adjustment weight.

3. The jitter buffering method of claim 2, wherein the first index is calculated as (TSw - TS1) / SamplePerFrame + Idx and the second index is calculated as Dn / SamplePerFrame + Idx, where SamplePerFrame is the voice frame transmission interval and Idx is the index of the storage address of the voice frame currently read by the jitter buffer.

4. The jitter buffering method of claim 3, wherein the adjustment value of the delay depth is a temporary value used for adjusting the delay depth, and the adjustment weight equals SamplePerFrame / 2^n, where n is an integer from 0 to 8.

5. The jitter buffering method of claim 4, further comprising the following steps after step (5): determining whether the adjustment value of the current delay depth Dn of the jitter buffer is greater than or equal to SamplePerFrame; if Dn ≥ SamplePerFrame, increasing or decreasing the delay depth by one SamplePerFrame and resetting the adjustment value of the delay depth to zero; or if Dn < SamplePerFrame, ending the flow.

6. The jitter buffering method of claim 1, wherein the data dequeue step comprises the following steps: when the voice signal processor is about to play a voice frame, checking whether the jitter buffer is in the pre-buffering stage; if so, sending a silent voice frame to the voice signal processor; if the jitter buffer is not in the pre-buffering stage, checking whether the playback timestamp of the voice frame at the current read position of the jitter buffer is null; if that playback timestamp is null, setting it to the playback timestamp of the previous voice frame plus one voice frame transmission interval, recording that a voice frame has been played at the current read position, and performing voice frame compensation; and if that playback timestamp is not null, sending the voice frame at the current read position to the voice signal processor and incrementing the read count of the jitter buffer by 1.

7. The jitter buffering method of claim 6, wherein the voice frame compensation comprises the following steps: checking whether the playback timestamp of the voice frame immediately before the current read position of the jitter buffer is null; if it is not null, sending the voice frame immediately before the current read position to the voice signal processor; if it is null, checking whether the playback timestamp of the voice frame immediately after the current read position of the jitter buffer is null; if the playback timestamp of the following voice frame is not null, sending the following voice frame to the voice signal processor; if the playback timestamp of the following voice frame is null, checking whether the playback timestamps of the two voice frames before the current read position of the jitter buffer are null; and if they are not null, sending the two preceding voice frames to the voice signal processor, or if they are null, sending a silent voice frame to the voice signal processor.

8. A jitter buffer that uses a buffer storage queue to receive data packets from a network, the jitter buffer comprising a data enqueue module, a buffer module, and a data dequeue module, the data enqueue module and the data dequeue module repeatedly and synchronously performing data enqueue and data dequeue, respectively, according to a predetermined clock period, wherein: the data enqueue module is configured to determine whether a data packet is received normally by comparing the timestamp of the received data packet, the playback timestamp of the voice frame currently read by the data dequeue module, and the delay depth of the jitter buffer, and to calculate the storage address of the data packet according to the result; the buffer module is configured to buffer and store the data packet at the calculated storage address; and the data dequeue module is configured to determine whether a voice frame is to be output by checking the playback timestamp of the voice frame at the current read position of the jitter buffer and, if so, to transmit the voice frame to a voice signal processor for voice decoding.

9. The jitter buffer of claim 8, wherein the data dequeue module is further configured, when no voice frame is output at the current read position of the jitter buffer, to set the playback timestamp of the voice frame at the current read position to the playback timestamp of the previous voice frame plus one voice frame transmission interval, to record that a voice frame has been played at the current read position, and to perform voice frame compensation.

10. The jitter buffer of claim 8, wherein the data dequeue module is further configured to determine whether the jitter buffer is in the pre-buffering stage when the voice signal processor is about to play a voice frame, and to send a silent voice frame to the voice signal processor when the jitter buffer is found to be in the pre-buffering stage.