TWI520131B - Speech Recognition System Based on Joint Time-Frequency Domain and Its Method - Google Patents
Speech Recognition System Based on Joint Time-Frequency Domain and Its Method
- Publication number: TWI520131B
- Application number: TW102136684A
- Authority: TW (Taiwan)
- Prior art keywords: speech, time, sound, frequency domain, characteristic
Description
The present invention relates to a speech and non-speech recognition system and method, and more particularly to a speech/non-speech discrimination system and method based on joint time-frequency-domain modulation energy.
With the rapid development of mobile communication devices, voice input has become one of the most popular solutions for providing a more convenient user interface and enabling smart electronic products to offer better services.
When a mobile device is used outdoors or in a noisy environment, background noise can easily cause the speech recognizer to misjudge, degrading service quality. Conventional techniques for suppressing background noise include: (1) filters designed from time-domain features of the signal (such as frame energy and zero-crossing rate); (2) filters designed from frequency-domain features of the signal (targeting specific frequency bands); and (3) filters combining one or both of the above, optimized through machine learning to improve noise robustness.
However, environmental noise is highly unpredictable. Filters designed from only partial time- or frequency-domain features cannot reliably distinguish speech from noise in complex or low signal-to-noise-ratio environments, which degrades the quality of speech services. Filters optimized by machine learning, on the other hand, require lengthy training and substantial resources, so they have not seen wide practical deployment.
To solve these problems of the prior art, one object of the present invention is to address speech recognition amid diverse background noise by analyzing and identifying the specific structure of speech.
To this end, the present invention provides a speech/non-speech recognition system comprising a sound conversion module, a feature analysis module, an extraction module, and a decision module. First, the sound conversion module converts the input audio signal into a two-dimensional time-frequency image and passes it to the feature analysis module. The feature analysis module analyzes this image to obtain a plurality of sound features and passes them to the extraction module. The extraction module then derives a speech-characteristic identification value from the sound features and passes it to the decision module. Finally, the decision module compares the speech-characteristic identification value against a speech threshold to separate the audio signal into speech and non-speech components.
The present invention further provides a method for recognizing speech and non-speech, comprising the following steps: convert the input audio signal into a two-dimensional time-frequency image via the sound conversion module; analyze that image with the feature analysis module to obtain a plurality of sound features; extract a speech-characteristic identification value from those features with the extraction module; and finally, with the decision module, compare the identification value against a built-in speech threshold to separate the speech and non-speech portions of the audio signal.
Because a conventional filter can only be designed against specific noise, its effectiveness and operating range are limited. In contrast, the speech/non-speech recognition system and method of the present invention analyze the specific structure of speech, so that even under complex background noise the desired signal can still be extracted, providing high-quality speech recognition.
1‧‧‧speech/non-speech recognition system
3‧‧‧sound conversion module
5‧‧‧feature analysis module
7‧‧‧extraction module
9‧‧‧decision module
20‧‧‧audio signal
22‧‧‧two-dimensional time-frequency image
24‧‧‧sound features
26‧‧‧speech-characteristic identification value
28‧‧‧output signal
FIG. 1 is an architecture diagram of the speech/non-speech recognition system of the present invention.
FIG. 2 and FIG. 3 are schematic diagrams of the two-dimensional filters used in the present invention.
FIG. 4 and FIG. 5 are schematic diagrams of the sound features obtained after the audio signal is processed by the two-dimensional filters of the present invention.
FIG. 6 is a flowchart of the speech/non-speech recognition method of the present invention.
Specific embodiments are described below to illustrate the invention; they are not intended to limit the scope of protection sought.
An embodiment of the speech/non-speech recognition system is disclosed first. Referring to FIG. 1, the architecture diagram of the system, the speech/non-speech recognition system 1 mainly comprises a sound conversion module 3, a feature analysis module 5, an extraction module 7, and a decision module 9.
The sound conversion module 3 converts the received audio signal 20 into a two-dimensional time-frequency image 22 and passes it to the feature analysis module 5 connected to it. The feature analysis module 5 analyzes the image 22 to obtain a plurality of sound features 24 and passes them to the extraction module 7 connected to it. The extraction module 7 performs extraction on the sound features 24 and passes the resulting speech-characteristic identification value 26 to the decision module 9 connected to it. The decision module 9 then compares the identification value 26 against the speech threshold, thereby distinguishing speech signals from non-speech signals.
Moreover, after receiving the audio signal 20, the sound conversion module 3 further divides it into a plurality of frames, applies a short-time Fourier transform to each, and recombines the results into the two-dimensional time-frequency image 22. This image 22 is an auditory spectrogram that simulates the output of the cerebral cortex in hearing.
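The framing-plus-STFT conversion described above can be sketched as follows. This is a minimal illustration only: the frame length, hop size, window choice, and log-magnitude mapping are assumptions, since the patent specifies only short-time Fourier analysis of frames recombined into a two-dimensional time-frequency image.

```python
import numpy as np

def to_time_frequency_image(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames, window and FFT each,
    and stack the magnitudes into a 2-D time-frequency image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))  # frames x frequency bins
    return np.log1p(spectra).T                     # frequency bins x frames

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)                 # 1 s of a 440 Hz tone
image = to_time_frequency_image(tone)
print(image.shape)                                 # (frequency bins, frames)
```

With a 25 ms frame and 10 ms hop at 16 kHz, one second of audio yields 98 frames of 201 frequency bins each; any frame/hop pair can be substituted without changing the structure of the result.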
Referring now to FIG. 2 and FIG. 3, schematic diagrams of speech-modulation directionality based on the two-dimensional time-frequency image: because a speech signal has specific harmonicity and frequency-modulation directionality, the direction of frequency modulation of the signal to be analyzed can be seen clearly when the audio signal is viewed along the time and frequency axes. In FIG. 2, the frequency of the input signal decreases with time over a given interval — a downward modulation pattern — with an envelope change rate of 4 Hz on the time axis and 2 ms on the frequency axis. In FIG. 3, the frequency of the input signal increases with time — an upward modulation pattern — with an envelope change rate of 8 Hz on the time axis and 4 ms on the frequency axis. From the time-frequency image 22, the modulation direction of the audio signal 20 can thus be determined and supplied to the subsequent modules for analysis.
To obtain further analysis parameters, the feature analysis module 5 uses a bank of two-dimensional time-frequency impulse-response bandpass filters to produce, from the image 22, a plurality of sound features 24 jointly over the time and frequency domains. The bank takes the envelope change rate on the time axis and the envelope change rate on the frequency axis as design parameters to generate a plurality of bandpass filters; the number of filters is the product of the number of temporal envelope change rates, the number of spectral envelope change rates, and the number of impulse-response directions. These filters are divided by speech and non-speech characteristics into two groups — speech-characteristic filters and non-speech-characteristic filters — to process speech and non-speech signals respectively. The sound features 24 mainly comprise the parameters of time, frequency, envelope change rate on the time axis, and envelope change rate on the frequency axis.
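The filter-count rule stated above (number of filters = temporal rates × spectral rates × directions) can be sketched with a toy bank of separable two-dimensional modulation filters. The specific rate values and the Gabor-style construction below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def modulation_filter(rate, scale, direction, t_axis, f_axis):
    """One 2-D time-frequency modulation filter: a windowed plane wave
    whose temporal and spectral periods set the rates, with `direction`
    flipping the sign of the sweep (upward vs. downward modulation)."""
    tt, ff = np.meshgrid(t_axis, f_axis)
    phase = 2 * np.pi * (rate * tt + direction * scale * ff)
    envelope = np.exp(-(tt - tt.mean()) ** 2) * np.exp(-(ff - ff.mean()) ** 2)
    return envelope * np.cos(phase)

rates = [2, 4, 8, 16, 32]      # temporal envelope change rates (illustrative)
scales = [0.5, 1, 2, 4]        # spectral envelope change rates (illustrative)
directions = [+1, -1]          # upward / downward modulation

t_axis = np.linspace(0.0, 1.0, 50)
f_axis = np.linspace(0.0, 1.0, 40)
bank = [modulation_filter(r, s, d, t_axis, f_axis)
        for r in rates for s in scales for d in directions]
print(len(bank))               # = len(rates) * len(scales) * len(directions)
```

Here 5 rates × 4 scales × 2 directions yield 40 filters, mirroring the product rule; an actual implementation would tune the rate and scale grids to the speech and noise regions of interest.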
For a clearer explanation, refer to FIG. 4 and FIG. 5, both schematic diagrams of the sound features 24. When an audio signal containing both speech and unpredictable wind noise is processed in turn by the sound conversion module 3 and the feature analysis module 5, the sound features 24 shown in FIG. 4 and FIG. 5 are obtained. FIG. 4 shows the modulation-energy distribution of the speech signal over the envelope change rates on the time and frequency axes; FIG. 5 shows the corresponding distribution for the non-speech signal. By observing the difference between the two energy distributions, the present invention can clearly classify speech and non-speech signals.
In FIG. 4, most of the speech modulation energy falls in the region where the envelope change rate on the time axis is 4 Hz and that on the frequency axis is 5 ms; in FIG. 5, most of the non-speech modulation energy falls where the envelope change rate on the time axis is 32 Hz and that on the frequency axis is 2 ms. The desired signal can therefore be extracted simply by selecting these specific regions.
Next, the extraction module 7 passes the features through the speech-characteristic and non-speech-characteristic filters described above to obtain the modulation energies of the speech and non-speech characteristics over the change-rate distributions on the time and frequency axes, then multiplies each by its respective weight (for example, the weight of the speech-characteristic modulation energy may be set to 1 and that of the non-speech-characteristic modulation energy to −1) to obtain weighted scores for the speech and non-speech characteristics respectively.
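The weighting step just described — speech-filter energies weighted +1, non-speech-filter energies weighted −1, summed into a score that is then compared with a threshold — can be sketched as follows; the toy energy values and the zero threshold are assumptions for illustration:

```python
import numpy as np

def speech_score(speech_energies, noise_energies,
                 w_speech=1.0, w_noise=-1.0):
    """Weighted sum of modulation energies from the speech-characteristic
    and non-speech-characteristic filters, using the example weights
    (+1 for speech filters, -1 for non-speech filters)."""
    return (w_speech * np.sum(speech_energies)
            + w_noise * np.sum(noise_energies))

def classify(score, threshold=0.0):
    """Decision step: compare the weighted score with the speech threshold."""
    return "speech" if score > threshold else "non-speech"

# Toy energies: a frame dominated by low-rate (speech-like) modulation,
# and a frame dominated by high-rate (wind-like) modulation.
speech_like = speech_score(np.array([3.0, 2.5]), np.array([0.2, 0.1]))
wind_like = speech_score(np.array([0.3, 0.2]), np.array([2.8, 3.1]))
print(classify(speech_like), classify(wind_like))
```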
Finally, the decision module 9 compares the weighted scores of the speech and non-speech characteristics against the speech threshold to determine whether the audio signal 20 is a speech signal or a non-speech signal, achieving the purpose of speech/non-speech recognition.
An embodiment of the speech/non-speech recognition method is disclosed next. Referring to FIG. 6, a flowchart of the method, the method comprises the following steps: step 601, convert the audio signal into a two-dimensional time-frequency image via the sound conversion module; step 603, analyze the time-frequency image with the feature analysis module to obtain a plurality of sound features; step 605, extract a speech-characteristic identification value from the sound features with the extraction module; step 607, compare the identification value against the speech threshold with the decision module to separate the speech and non-speech portions of the audio signal.
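Steps 601–607 can be strung together into a minimal end-to-end sketch. Every numeric choice here — frame parameters, the band boundaries standing in for the 2-D modulation filter bank, the ±1 weights, and the zero threshold — is an illustrative assumption, not the patent's specification:

```python
import numpy as np

def frame_stft(x, frame_len=256, hop=128):
    """Step 601: frame the signal and build a time-frequency image."""
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * np.hanning(frame_len)
                       for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # frequency bins x frames

def band_energy(image, rows):
    """Steps 603/605: energy in a band of the image (a crude stand-in
    for the 2-D modulation filter-bank analysis)."""
    return float(np.sum(image[rows] ** 2))

def recognize(x, threshold=0.0):
    image = frame_stft(x)
    n_bins = image.shape[0]
    low = band_energy(image, slice(0, n_bins // 4))            # "speech-like" band
    high = band_energy(image, slice(3 * n_bins // 4, None))    # "noise-like" band
    score = 1.0 * low - 1.0 * high                             # step 605 weighting
    return "speech" if score > threshold else "non-speech"     # step 607 decision

fs = 8000
t = np.arange(fs) / fs
low_tone = np.sin(2 * np.pi * 200 * t)    # energy concentrated in low bins
high_tone = np.sin(2 * np.pi * 3500 * t)  # energy concentrated in high bins
print(recognize(low_tone), recognize(high_tone))
```

The real system would replace `band_energy` over spectral bands with the joint time-frequency modulation-energy analysis of the filter bank; the control flow from conversion through decision is the same.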
In this embodiment, the audio signal obtained in step 601 may first be divided by the sound conversion module into a plurality of frames, from which short-time Fourier transforms produce the two-dimensional time-frequency image; the image is then passed to the feature analysis module for processing. The time-frequency image converted by the sound conversion module is an auditory spectrogram that simulates the output of the cerebral cortex in hearing.
The feature analysis module of step 603 is a bank of two-dimensional time-frequency impulse-response bandpass filters. The bank performs speech-structure analysis on the joint time- and frequency-domain energy change rates of the two-dimensional image to produce a plurality of sound features, which are passed to the extraction module for further processing. The bank may be implemented with a plurality of bandpass filters, their number being the product of the number of envelope change rates on the time axis, the number on the frequency axis, and the number of impulse-response directions. The bank is divided by speech and non-speech characteristics into a plurality of speech-characteristic filters and a plurality of non-speech-characteristic filters to process speech and non-speech signals respectively. The sound features comprise the parameters of time, frequency, envelope change rate on the time axis, and envelope change rate on the frequency axis.
In step 605, the sound features are passed through the speech-characteristic filters of the bank and multiplied by the speech-filter weight to obtain a weighted score with speech characteristics, which forms one part of the speech-characteristic identification value; likewise, the sound features are passed through the non-speech-characteristic filters of the bank and multiplied by the non-speech-filter weight to obtain a weighted score with non-speech characteristics, which forms the other part of the identification value.
Finally, step 607 compares the weighted scores of the speech and non-speech characteristics against the speech threshold as the basis for judging whether the audio signal is a speech or non-speech signal.
The detailed description above illustrates one feasible embodiment of the invention; it is not intended to limit the scope of the patent, and all equivalent implementations or modifications that do not depart from the spirit of the invention are included within the scope of this patent.
Claims (21)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW102136684A | 2013-10-11 | 2013-10-11 | Speech Recognition System Based on Joint Time-Frequency Domain and Its Method |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| TW201514977A | 2015-04-16 |
| TWI520131B | 2016-02-01 |
Family
ID=53437707
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI689865B (en) * | 2017-04-28 | 2020-04-01 | 塞席爾商元鼎音訊股份有限公司 | Smart voice system, method of adjusting output voice and computre readable memory medium |
| TWI768676B (en) * | 2021-01-25 | 2022-06-21 | 瑞昱半導體股份有限公司 | Audio processing method and audio processing device, and associated non-transitory computer-readable medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MM4A | Annulment or lapse of patent due to non-payment of fees |