TW201101852A - Sound source direction detecting method and apparatus thereof - Google Patents
Sound source direction detecting method and apparatus thereof
- Publication number
- TW201101852A (application TW98121567A)
- Authority
- TW
- Taiwan
- Prior art keywords
- sound source
- signal
- microphones
- voice
- microphone
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 9
- 238000005314 correlation function Methods 0.000 claims abstract description 16
- 230000000694 effects Effects 0.000 claims abstract description 15
- 239000013598 vector Substances 0.000 claims abstract description 15
- 230000003595 spectral effect Effects 0.000 claims abstract description 5
- 238000006243 chemical reaction Methods 0.000 claims description 15
- 238000001514 detection method Methods 0.000 claims description 11
- 238000001914 filtration Methods 0.000 claims description 7
- 238000001228 spectrum Methods 0.000 claims description 7
- 238000004364 calculation method Methods 0.000 claims description 4
- 230000003993 interaction Effects 0.000 claims description 3
- 238000012790 confirmation Methods 0.000 claims 1
- 238000000691 measurement method Methods 0.000 claims 1
- 230000005236 sound signal Effects 0.000 abstract 1
- 230000003321 amplification Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000002452 interceptive effect Effects 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000000739 chaotic effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000030808 detection of mechanical stimulus involved in sensory perception of sound Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Landscapes
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
VI. Description of the Invention

[Technical Field]

The present invention relates to a sound source detection device, and more particularly to a sound source detection device suitable for detecting the direction of a human voice.

[Prior Art]

Sound source direction detection has a wide range of applications, such as sonar, wireless communication, and video conferencing systems. In recent years, as robotics has developed rapidly, research on sound source direction detection for robot hearing has steadily increased. A sound source direction detection system can also be used to enhance Human-Robot Interaction (HRI), or be combined with an Automatic Speech Recognition (ASR) system or a Speech Enhancement system to improve the performance of a speech processing system.

The techniques proposed in the literature on sound source direction detection fall roughly into two categories. One category transforms the data received by a microphone array and processes it with beamforming or subspace theory to obtain the angle of the source. The other estimates the time difference of arrival (TDOA) between microphones and then uses the geometric relationship between the source and the microphones to estimate the angle of the source.

[Summary of the Invention]

The present invention provides a sound source detection method and apparatus that determine the direction of a sound source through voice activity detection, time delay estimation, and direction-angle computation.

The invention provides a sound source detection method for detecting the direction of a sound source. First, a speech signal from the source is received by a microphone array comprising a plurality of microphones. The input signal received by each microphone is amplified and filtered, and each filtered signal is converted into a digital signal. Voice activity detection is then performed by computing the spectral entropy of each frame of each digital signal. When speech frames are detected simultaneously in all of the digital signals, an approximation of the generalized cross-correlation function is used to compute a time delay estimate for each pair of microphones, and the estimates form a TDOA vector. Finally, the TDOA vector is used in a direction computation to obtain the horizontal azimuth angle and the elevation angle of the source.

The invention also provides a sound source detection apparatus for detecting the direction of a sound source. The apparatus includes a microphone array and a determination circuit. The microphone array includes a first, a second, and a third microphone, which receive a speech signal from the source and produce a first, a second, and a third input signal, respectively. The determination circuit first detects the speech frames (frames containing speech) of the first, second, and third input signals. When speech is detected simultaneously in all three input signals, the determination circuit estimates the TDOA values between the first, second, and third input signals and then computes the horizontal azimuth angle and the elevation angle of the source from those estimates.

[Embodiments]

To make the above and other objects, features, and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Fig. 1 shows a sound source detection device 100 according to an embodiment of the invention, used to detect the direction of a sound source 140. The device 100 includes a microphone array 110, a signal conversion circuit 120, and a determination circuit 130. The signal conversion circuit 120 includes an amplification-and-filtering unit 122 and an analog-to-digital conversion unit 124, while the determination circuit 130 includes a voice activity detection unit 132, a time delay estimation unit 134, and a direction computation unit 136. From the speech signal Ss emitted by the source 140, the device 100 obtains the three-dimensional direction of the source 140, namely the horizontal azimuth angle θ and the elevation angle φ of its position.

Fig. 2 shows the structure of the microphone array 110 of Fig. 1. The array consists of three microphones 210, 220, and 230, arranged as an equilateral triangle to form a planar array; all three are omnidirectional. Referring to Figs. 1 and 2, when the speech signal Ss travels from the source 140 to the device 100, microphones 210, 220, and 230 each produce a corresponding input signal: microphone 210 provides input signal Sin1 to the signal conversion circuit 120, microphone 220 provides Sin2, and microphone 230 provides Sin3. Because the distances between the source 140 and the three microphones differ, the signal Ss does not reach them at the same time. For example, if microphone 210 is closest to the source 140 and microphone 230 is farthest, microphone 210 receives Ss first and microphone 230 receives it last.

Fig. 3 shows the signal conversion circuit 120 of Fig. 1. The amplification-and-filtering unit 122 includes amplifiers 302, 304, and 306 and filters 312, 314, and 316, and the analog-to-digital conversion unit 124 includes three analog-to-digital converters 322, 324, and 326; filters 312, 314, and 316 are low-pass filters. Referring to Figs. 1 and 3, the input signals Sin1, Sin2, and Sin3 provided by the microphones of the array 110 are each amplified and filtered by the corresponding amplifier and filter. For example, Sin1 is first amplified by amplifier 302 and then filtered by filter 312 to obtain the filtered signal Sf1. Likewise, Sin2 is amplified and filtered by amplifier 304 and filter 314 to obtain Sf2, and Sin3 by amplifier 306 and filter 316 to obtain Sf3. The analog-to-digital converters 322, 324, and 326 then convert the filtered signals Sf1, Sf2, and Sf3 into digital signals.
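The arrival-time differences across the triangular array described above can be illustrated numerically. The sketch below is not from the patent: the 10 cm side length, the 2 m source distance, and the source directions are illustrative assumptions (the patent does not specify array dimensions).

```python
import numpy as np

SIDE = 0.10            # assumed triangle side length, m
C = 343.0              # speed of sound in air, m/s
r = SIDE / np.sqrt(3)  # circumradius of the equilateral triangle

# Microphones 210, 220, 230 at the vertices of an equilateral triangle
# in the z = 0 plane, centred at the origin.
mics = np.array([[r * np.cos(a), r * np.sin(a), 0.0]
                 for a in np.deg2rad([90.0, 210.0, 330.0])])

def arrival_time_differences(azimuth_deg, elevation_deg, dist=2.0):
    """Pairwise arrival-time differences (t1-t2, t1-t3, t2-t3) for a
    source at the given direction and distance from the array centre."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    src = dist * np.array([np.cos(el) * np.cos(az),
                           np.cos(el) * np.sin(az),
                           np.sin(el)])
    t = np.linalg.norm(mics - src, axis=1) / C  # propagation times
    return np.array([t[0] - t[1], t[0] - t[2], t[1] - t[2]])

# A source directly above the array centre is equidistant from all three
# microphones, so every pairwise delay vanishes.
overhead = arrival_time_differences(0.0, 90.0)

# A source in the plane of the array yields non-zero delays, bounded in
# magnitude by SIDE / C (the bound is reached when the source is
# collinear with a microphone pair).
in_plane = arrival_time_differences(0.0, 0.0)
```

This is the geometric effect the rest of the description exploits: the pattern of pairwise delays varies with direction, so measuring the delays constrains where the source is.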
The digital signals D1, D2, and D3 are sent to the determination circuit 130. Referring back to Fig. 1, the voice activity detection unit 132 performs voice activity detection (VAD) on the frames of the digital signals D1, D2, and D3. Voice activity detection is a speech-signal-processing technique that determines whether a frame contains spoken speech; by passing only speech frames on to further processing such as speech recognition or time delay estimation, it improves efficiency. Because the energy of a speech signal is concentrated in the lower frequency bands, it is possible to judge whether an input frame is speech or non-speech.

First, the voice activity detection unit 132 divides each of the digital signals D1, D2, and D3 into frames. For example, if one frame is 23 ms long and 230 ms of D1 has been received, D1 can be divided into 10 frames. Each frame is then converted to the frequency domain by a Fast Fourier Transform (FFT), a probability value is defined for each frequency band, and the spectral entropy of the frame is computed from these values. The low-frequency components of a speech signal are strong (concentrated roughly below 3 kHz), and its spectrum shows pronounced peaks and valleys, whereas the spectral intensity of a noise signal varies less and is flatter. Therefore, by comparing the spectral entropy of each frame with a reference entropy value, the voice activity detection unit 132 can determine whether the frame is a speech frame or a non-speech frame.
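The frame-entropy decision described above can be sketched as follows. The band count, the 16 kHz sampling rate, and the entropy threshold are illustrative assumptions; the patent specifies only the 23 ms frame example and a reference entropy value.

```python
import numpy as np

def spectral_entropy(frame, n_bands=32):
    """Entropy of the normalized band-energy distribution of one frame.
    A voiced frame has pronounced spectral peaks, so its entropy is low;
    flat noise pushes the entropy toward log2(n_bands)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)
    e = np.array([b.sum() for b in bands]) + 1e-12
    p = e / e.sum()                      # per-band probability values
    return -np.sum(p * np.log2(p))

def is_speech_frame(frame, entropy_ref=4.5):
    # The threshold entropy_ref is an illustrative assumption; the patent
    # only says the entropy is compared with a reference value.
    return spectral_entropy(frame) < entropy_ref

fs = 16000
n = int(0.023 * fs)                      # 23 ms frame, as in the example
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 200 * t)       # strongly peaked, speech-like
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)           # flat spectrum, noise-like
```

A real implementation would additionally apply the SNR check the text describes next; this sketch only shows the entropy comparison itself.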
When a frame is judged to be a speech frame, the voice activity detection unit 132 further computes its signal-to-noise ratio (SNR) to avoid false detections and so improve the accuracy of the voice activity decision.

Fig. 4 illustrates, according to an embodiment of the invention, the principle of estimating the time delay of arrival (TDOA) between two microphones with the generalized cross-correlation function; in an actual implementation this theoretical structure is replaced by an approximation. In Fig. 4, x1 and x2 are the amplified and filtered signals received by two microphones (for example, the output pairs Sf1 and Sf2, Sf1 and Sf3, or Sf2 and Sf3 of the amplification-and-filtering unit in Fig. 3). First, x1 and x2 pass through linear time-invariant filters 410 and 420 to yield y1 and y2. Multiplier 440 multiplies y1 by a delayed version of y2 produced by delay element 430 to obtain the signal M, integrator 450 integrates M over the observation interval, and peak detector 460 searches the time-shift range for the maximum of the generalized cross-correlation function.

The time delay estimate obtained from the generalized cross-correlation of Fig. 4 is governed by the relation between the cross-correlation function R_x1,x2(τ) and the cross-power spectrum G_x1,x2(f):

    R_x1,x2(τ) = ∫ G_x1,x2(f) e^(j2πfτ) df.    (1)

From Fig. 4, the cross-power spectrum G_y1,y2(f) between y1 and y2 can also be obtained.
The relation between G_y1,y2(f) and G_x1,x2(f) is

    G_y1,y2(f) = H1(f) H2*(f) G_x1,x2(f),    (2)

where * denotes the complex conjugate and H1(f) and H2(f) are the frequency responses of filters 410 and 420.
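Relation (2) can be checked numerically for any pair of filters using frame spectra. The two frequency responses below are arbitrary illustrative choices, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)

# Arbitrary (assumed) filter frequency responses H1, H2.
f = np.fft.rfftfreq(n)
H1 = 1.0 / (1.0 + 1j * 4.0 * f)          # illustrative low-pass
H2 = np.exp(-2j * np.pi * f * 3.0)       # illustrative pure delay

X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
Y1, Y2 = H1 * X1, H2 * X2                # filtering x1, x2 into y1, y2

G_x = X1 * np.conj(X2)                   # cross-power spectrum of x1, x2
G_y = Y1 * np.conj(Y2)                   # cross-power spectrum of y1, y2

# Relation (2): G_y1,y2(f) = H1(f) * conj(H2(f)) * G_x1,x2(f)
lhs, rhs = G_y, H1 * np.conj(H2) * G_x
```

The point of (2) is that the filters only reweight the cross-power spectrum, which is what allows the weighting ψ_g(f) below to be chosen freely.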
The product ψ_g(f) = H1(f) H2*(f) acts as a frequency-domain weighting, and the generalized cross-correlation function of x1 and x2 can be defined as

    R_y1,y2(τ) = ∫ ψ_g(f) G_x1,x2(f) e^(j2πfτ) df.    (3)

Because only finite-length observations of x1 and x2 are available, only an estimate Ĝ_x1,x2(f) of G_x1,x2(f) can be obtained, so (3) is rewritten as

    R̂_y1,y2(τ) = ∫ ψ_g(f) Ĝ_x1,x2(f) e^(j2πfτ) df.    (4)

For an accurate delay estimate, the weighting ψ_g(f) should be chosen so that the value of R̂_y1,y2(τ) at the true delay differs markedly from its values at other delays. The PHAT (phase transform) weighting sets the combined frequency response of the filters in Fig. 4 to

    ψ_PHAT(f) = 1 / |G_x1,x2(f)|,    (5)

which makes the cross-correlation between y1 and y2 approach an impulse function whose maximum appears at the true delay position. Substituting (5) into (4) gives

    R̂_y1,y2(τ) = ∫ [Ĝ_x1,x2(f) / |G_x1,x2(f)|] e^(j2πfτ) df.    (6)

When the voice activity detection unit 132 detects speech frames in the digital signals D1, D2, and D3 simultaneously, the time delay estimation unit 134 applies an FFT to each microphone's speech frame to obtain its frequency-domain spectrum and estimates the time delays between microphones 210, 220, and 230. The actual procedure follows (6). First, the frames of D1 (the digital signal of x1) and D2 (the digital signal of x2) are transformed into their FFT spectra X1(f) and X2(f). These spectra are then used to approximate the quantities in (6): Ĝ_x1,x2(f) is approximated by X1(f) X2*(f), and |G_x1,x2(f)| by |X1(f) X2*(f)|. An inverse FFT then yields discrete values of R̂_y1,y2(τ), and within the physically reasonable range of τ, the τ with the maximum value is taken as the time delay estimate. The time delay estimation unit 134 estimates delays precisely by this approximation of the generalized cross-correlation method. Each pair of microphone input signals yields one time delay estimate, so for microphones 210, 220, and 230 of Fig. 2 the three pairs yield three estimates, which together form a TDOA vector.

The reasonable range of delay values is determined by the distance between the two microphones and the propagation speed of sound in air; the maximum time delay estimate is obtained when the source lies on the line through the two microphones. In addition, the larger the delay τ obtained from R̂_y1,y2(τ), the larger the gap between the angles corresponding to two adjacent discrete values of τ. To improve the precision of the delay estimate, parabolic interpolation is therefore used to take the τ value at the vertex of a parabola fitted around the peak, reducing the angular error.

Fig. 5 shows the positions of a sound source 530 and a microphone pair (microphones 510 and 520). With the source at three-dimensional coordinates (xs, ys, zs) and microphones 510 and 520 at (x1, y1, z1) and (x2, y2, z2), the propagation times t1 and t2 of the speech signal from the source to the two microphones are given by (7) and (8):

    t1 = sqrt((xs - x1)^2 + (ys - y1)^2 + (zs - z1)^2) / c,    (7)
    t2 = sqrt((xs - x2)^2 + (ys - y2)^2 + (zs - z2)^2) / c,    (8)

where c is the propagation speed of sound in air. The theoretical time delay between microphones 510 and 520 is then t1 - t2, and a theoretical delay can be computed in the same way for every microphone pair; together the pairwise values form a theoretical delay vector (d1, d2, d3). For each candidate direction, that is, for each pair of horizontal azimuth and elevation angles in a discretized set, the direction computation unit 136 computes such a theoretical vector, obtaining a collection of theoretical value vectors. The unit 136 then computes the geometric distance between the time delay estimate vector from the time delay estimation unit 134 and every theoretical value vector, finds the theoretical vector at the smallest distance, and thereby obtains the azimuth angle θ and the elevation angle φ of the source.

As described above, the sound source detection device of the invention uses an array of three microphones arranged as an equilateral triangle to detect the three-dimensional direction of a sound source. By using entropy-based voice activity detection together with time delay estimation based on the generalized cross-correlation principle, the estimation errors of the source's horizontal azimuth and elevation angles can be reduced.

Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.
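The two stages described above, the approximation of (6) with parabolic peak refinement and the nearest-theoretical-vector direction search, can be sketched as follows. The sampling rate, the 10 cm array side length, and the 5° search grid are illustrative assumptions, and a far-field direction model is used for the theoretical delay vectors (the patent computes exact propagation times from source coordinates).

```python
import numpy as np

FS = 16000          # sampling rate (illustrative assumption)
C = 343.0           # speed of sound in air, m/s

def gcc_phat(sig, ref, max_tau, fs=FS):
    """Approximate Eq. (6): PHAT-weighted cross-spectrum of the two frame
    spectra, inverse-FFT'd, peak-picked within the physically reasonable
    delay range, then refined with parabolic interpolation."""
    n = len(sig) + len(ref)
    X1, X2 = np.fft.rfft(sig, n), np.fft.rfft(ref, n)
    G = X1 * np.conj(X2)
    G /= np.abs(G) + 1e-12                  # psi_PHAT = 1 / |G|
    cc = np.fft.irfft(G, n)
    m = int(max_tau * fs)                   # limit search to +/- max_tau
    cc = np.concatenate((cc[-m:], cc[:m + 1]))
    k = int(np.argmax(cc))
    if 0 < k < len(cc) - 1:                 # parabolic peak refinement
        y0, y1, y2 = cc[k - 1], cc[k], cc[k + 1]
        d = y0 - 2.0 * y1 + y2
        if d != 0.0:
            k = k + 0.5 * (y0 - y2) / d
    return (k - m) / fs

def theoretical_tdoas(az_deg, el_deg, mics):
    """Far-field theoretical delay vector (t1-t2, t1-t3, t2-t3)."""
    a, e = np.radians(az_deg), np.radians(el_deg)
    u = np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])
    t = -(mics @ u) / C                     # relative arrival times
    return np.array([t[0] - t[1], t[0] - t[2], t[1] - t[2]])

def localize(tdoa_meas, mics, step=5):
    """Grid search: direction whose theoretical vector is closest."""
    best, best_dir = np.inf, (0, 0)
    for az in range(0, 360, step):
        for el in range(0, 90, step):
            d = np.linalg.norm(theoretical_tdoas(az, el, mics) - tdoa_meas)
            if d < best:
                best, best_dir = d, (az, el)
    return best_dir

# Equilateral-triangle array, 10 cm sides (dimensions are assumed).
r = 0.10 / np.sqrt(3)
mics = np.array([[r * np.cos(a), r * np.sin(a), 0.0]
                 for a in np.deg2rad([90.0, 210.0, 330.0])])

# Delay estimation on a synthetic signal delayed by 10 samples.
rng = np.random.default_rng(2)
s = rng.standard_normal(2048)
sig = np.concatenate((np.zeros(10), s))[:2048]
tau = gcc_phat(sig, s, max_tau=0.002)

# Direction lookup on an exact theoretical delay vector.
direction = localize(theoretical_tdoas(40, 20, mics), mics)
```

The grid step trades resolution against the number of theoretical vectors stored; the parabolic refinement in `gcc_phat` is what lets the delay estimate fall between sample instants, as the description notes.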
[Brief Description of the Drawings]

Fig. 1 shows a sound source detection device according to an embodiment of the invention, used to detect the direction of a sound source;
Fig. 2 is a schematic diagram of the microphone array of Fig. 1;
Fig. 3 is a schematic diagram of the signal conversion circuit of Fig. 1;
Fig. 4 is a schematic diagram of estimating the time delay between two microphones using the generalized cross-correlation principle, according to an embodiment of the invention; and
Fig. 5 shows the positions of a sound source and a microphone pair.

[Description of the Reference Numerals]

100: sound source detection device; 110: microphone array; 120: signal conversion circuit; 122: amplification-and-filtering unit; 124: analog-to-digital conversion unit; 130: determination circuit; 132: voice activity detection unit; 134: time delay estimation unit; 136: direction computation unit; 140, 530: sound sources; 210, 220, 230, 510, 520: microphones; 302, 304, 306: amplifiers; 312, 314, 316: filters; 322, 324, 326: analog-to-digital converters; 410, 420: linear time-invariant filters; 430: delay element; 440: multiplier; 450: integrator; 460: peak detector; D1, D2, D3: digital signals; d̂, d̂1, d̂2, d̂3: time delay estimates; Sf1, Sf2, Sf3: filtered signals; Sin1, Sin2, Sin3, x1, x2: input signals; Ss: speech signal; y1, y2, y3, M: signals; θ: azimuth angle; φ: elevation angle.
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW98121567A TW201101852A (en) | 2009-06-26 | 2009-06-26 | Sound source direction detecting method and apparatus thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| TW201101852A true TW201101852A (en) | 2011-01-01 |
Family
ID=44837154
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW98121567A TW201101852A (en) | 2009-06-26 | 2009-06-26 | Sound source direction detecting method and apparatus thereof |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TW201101852A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9473868B2 (en) | 2013-02-07 | 2016-10-18 | Mstar Semiconductor, Inc. | Microphone adjustment based on distance between user and microphone |
| CN112700122A (en) * | 2020-12-29 | 2021-04-23 | 华润电力技术研究院有限公司 | Thermodynamic system performance calculation method, device and equipment |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP1983799B1 (en) | Acoustic localization of a speaker | |
| CN111044973B (en) | An MVDR target sound source directional pickup method for microphone array | |
| CN103308889B (en) | Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment | |
| CN102800325A (en) | Ultrasonic-assisted microphone array speech enhancement device | |
| TW201234873A (en) | Sound acquisition via the extraction of geometrical information from direction of arrival estimates | |
| KR20130084298A (en) | Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation | |
| GB2517690A (en) | Method and device for localizing sound sources placed within a sound environment comprising ambient noise | |
| CN110534126B (en) | A method and system for sound source localization and speech enhancement based on fixed beamforming | |
| CN102509552A (en) | Method for enhancing microphone array voice based on combined inhibition | |
| CN102103200A (en) | Acoustic source spatial positioning method for distributed asynchronous acoustic sensor | |
| CN108447499B (en) | Double-layer circular-ring microphone array speech enhancement method | |
| CN103544959A (en) | Verbal system and method based on voice enhancement of wireless locating microphone array | |
| CN110830870B (en) | Earphone wearer voice activity detection system based on microphone technology | |
| CN100571451C (en) | Microphone array sound receiving method and system combining positioning technology | |
| WO2020043037A1 (en) | Voice transcription device, system and method, and electronic device | |
| US20140269198A1 (en) | Beamforming Sensor Nodes And Associated Systems | |
| KR101086304B1 (en) | Apparatus and method for removing echo signals generated by robot platform | |
| JP2010124370A (en) | Signal processing device, signal processing method, and signal processing program | |
| TW201101852A (en) | Sound source direction detecting method and apparatus thereof | |
| JP2006194700A (en) | Sound source direction estimation system, sound source direction estimation method and sound source direction estimation program | |
| JP5635024B2 (en) | Acoustic signal emphasizing device, perspective determination device, method and program thereof | |
| JP3862685B2 (en) | Sound source direction estimating device, signal time delay estimating device, and computer program | |
| CN105025416B (en) | A portable dual-microphone sound source identification and localization device | |
| Shen et al. | A modified cross power-spectrum phase method based on microphone array for acoustic source localization | |
| JP2010206449A (en) | Speech direction estimation device and method, and program |