
TWI310543B - A method for classifying similar mandarin syllables using two consecutive bayesian decision rules - Google Patents


Info

Publication number
TWI310543B
Authority
TW
Taiwan
Prior art keywords
unknown
tone
similar
classification
bayesian
Prior art date
Application number
TW95118948A
Other languages
Chinese (zh)
Other versions
TW200744067A (en
Inventor
Tze Fen Li
Original Assignee
Tze Fen Li
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tze Fen Li filed Critical Tze Fen Li
Priority to TW95118948A priority Critical patent/TWI310543B/en
Publication of TW200744067A publication Critical patent/TW200744067A/en
Application granted granted Critical
Publication of TWI310543B publication Critical patent/TWI310543B/en


Landscapes

  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

IX. Description of the Invention:

1. [Technical Field of the Invention]

Mandarin has a great many identical or similar monosyllables, which makes recognizing a single syllable difficult. Men and women pronounce differently, and a speaker's physiological and psychological state varies, so the sound produced for the same syllable differs widely. Speakers from different provinces and regions also have their own accents, so even the same syllable can be uttered as many different, similar-sounding syllables. The present invention takes the known syllables that sound similar to an unknown syllable and applies two consecutive Bayesian classifications to find the true unknown syllable among them. In detail, the speech recognition system of the invention uses E equal-length elastic frames, with no filter and no overlap, to cover monosyllable waveforms of different lengths, normalizes the waveform, and converts it into an equal-sized sequence of linear predictive coding cepstrum (LPCC) vectors. A simple Bayesian classification (Bayes decision rule) first finds the known syllables most similar to the unknown syllable. A refined Bayesian classification then picks the unknown syllable out of those similar known syllables. Waveform normalization and feature extraction are simple and fast, recognition is quick and accurate, and the range of recognizable syllables is wide.

2. [Prior Art]

When a monosyllable is uttered, its pronunciation is represented by a sound wave. A sound wave is a system that changes nonlinearly with time, and the waveform of a syllable contains a series of dynamic features that also change nonlinearly and continuously with time. When the same syllable is uttered again, the same series of dynamic features appears, stretched and contracted nonlinearly in time: the features occur in the same order but at different times. Placing the same dynamic features of the same syllable at the same time positions is therefore very difficult, and the large number of similar syllables makes recognition harder still.

A computerized speech recognition system must first extract from the sound wave the linguistic information, that is, the dynamic features, and filter out what is unrelated to the language, such as the speaker's timbre and pitch and the psychological, physiological and emotional state while speaking, none of which bear on recognition. The same features of the same syllable must then be arranged at the same time positions. This series of features is represented by an equal-length sequence of feature vectors, called the feature model of the syllable. In current speech recognition systems, producing feature models of uniform size is too complex and time-consuming, and the same features of the same syllable are hard to align at the same time positions, which makes matching and recognition difficult.

Speech recognition in general has three main tasks: feature extraction, feature normalization (feature models of equal size, with the same features of the same syllable at the same time positions), and recognition of the unknown syllable. Commonly used features of a syllable's waveform include energy, zero crossings, extreme count, formants, the linear predictive coding cepstrum (LPCC) and the Mel-frequency cepstrum (MFCC); of these, LPCC and MFCC are the most effective and the most widely used. LPCC is the most reliable, stable and accurate feature for representing a syllable. It models the waveform with a linear regression model, computes the regression coefficients by least squares, and converts the estimates into the cepstrum to give the LPCC. MFCC converts the waveform to the frequency domain with the Fourier transform and then models the auditory system according to the Mel frequency scale. According to S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, 1980, MFCC features give a higher recognition rate than LPCC features when dynamic time warping (DTW) is used. However, in many speech recognition experiments with Bayesian classification (including the inventor's earlier inventions), LPCC features gave a higher recognition rate than MFCC features and took less time.

As for recognition itself, many methods have been used: dynamic time warping, vector quantization and the hidden Markov model (HMM). With dynamic time warping, when utterances of the same syllable differ in timing, the same features are pulled to the same time positions during matching. The recognition rate can be good, but pulling the same features to the same positions is difficult and the warping takes too long to be practical. Vector quantization, when applied to a large number of syllables, is both inaccurate and time-consuming.
More recently, HMM recognition has performed well, but the method is complicated, too many unknown parameters must be estimated, and both computing the estimates and recognizing are time-consuming. T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, vol. 36, 2003, applied Bayesian classification to the same database, compressing LPCC vector sequences of various lengths into classification models of equal size, and obtained better recognition results than the HMM method of Y. K. Chen, C. Y. Liu, G. H. Chiang and M. T. Lin, "The recognition of mandarin monosyllables based on the discrete hidden Markov model", Proceedings of Telecommunication Symposium, Taiwan, 1990. The compression process, however, is complicated and time-consuming, the same features of the same syllable are hard to compress to the same time positions, and similar syllables are hard to tell apart.

The speech recognition method of the present invention addresses these shortcomings. Starting from theory, and from the fact that a sound wave carries a speech feature that changes nonlinearly with time, it derives a natural method of feature extraction. The waveform of a syllable is first normalized and then converted into an equal-sized feature model sufficient to represent the syllable, such that the same syllable has the same features at the same time positions of its feature models. No manual or experimental tuning of unknown parameters or thresholds is needed. With the simple Bayesian classification, the classification model of the unknown syllable can be compared directly with the standard models of the known syllables in the database, without further compressing, warping or searching for matching features. The speech recognition system of the invention therefore completes feature extraction, feature normalization and recognition quickly.
To raise the recognition rate, the system applies Bayesian classification a second time to sort out similar syllables that are hard to tell apart, which greatly improves the overall recognition rate for Mandarin syllables. The recognition rate is high and the range of application is wide: with elastic frames, very short or very long syllable waveforms can be recognized, and the method is also effective for very short English syllables.

[Overview of the Invention]

(1) The invention provides a speech recognition method that can delete the parts of a waveform that contain no speech.

(2) The invention provides a method for normalizing the waveform of a syllable and extracting its features. It uses E equal elastic frames, with no overlap and no filter, which stretch or shrink freely with the length of the waveform so as to cover all of it. The series of dynamic features in the waveform, which change nonlinearly with time, is converted into a feature model of fixed size, and the feature models of the same syllable have the same features at the same time positions. Recognition can be carried out promptly, so the computer recognizes speech in real time.

(3) The invention provides a simple and effective Bayesian method for recognizing an unknown syllable. The probability of misclassification is minimized, little computation is needed, recognition is fast and the recognition rate is high.

(4) The invention provides a method for extracting the features of a syllable. The waveform of a syllable has a dynamic feature that changes nonlinearly with time. The invention uses a regression model that changes linearly with time to estimate the nonlinearly changing waveform, which yields the least-squares estimates of the unknown regression coefficients (the LPC vector).

(5) The invention uses every part of the waveform that carries speech (every sampled signal point). A small number of equal elastic frames, without overlap, covers the features of all signal points. A syllable is not discarded because its waveform is too short, and signal points are not deleted or compressed because the waveform is too long.
As long as a human listener can distinguish the syllable, the invention can extract its features. The speech recognition system of the invention thus uses every signal point that carries speech and extracts as much of the speech feature as possible. Because the E = 12 elastic frames do not overlap and are few in number, the time needed for feature extraction and for computing the LPCC is greatly reduced.

(6) The speech recognition system of the invention can recognize syllables spoken too fast or too slowly. When speech is too fast the waveform is very short, which is especially true of English syllables. The elastic frames of the invention then shrink, and the same number E of equal-length elastic frames still covers the short waveform and produces E LPCC vectors. As long as a human can distinguish the short sound, these E LPCC vectors effectively represent its feature model. A syllable spoken too slowly has a longer waveform. The elastic frames then stretch, and the E LPCC vectors produced represent the long sound equally well.

(7) The invention includes a complete speech recognition method. First, many speakers utter the same known syllable, producing waveforms of various lengths. The same E elastic frames, with no filter and no overlap, cover each whole waveform and produce E LPCC vectors per utterance, so a known syllable has multiple samples, that is, multiple feature models. These feature models contain the same feature vectors at the same time positions: the E LPCC vectors of each sample are, in order, approximately the same. The mean and variance of the LPCC values are then computed over the samples, giving a matrix of fixed size that holds the means and variances. This matrix is called the standard model of the known syllable and is stored in the database.
In the same way, the waveform of an unknown syllable is covered completely by E equal-length elastic frames, with no filter and no overlap, producing E LPCC vectors. These E LPCC vectors are called the classification model of the unknown syllable. When the simple Bayesian classification compares the unknown syllable with a known syllable in the database, all LPCC values in the classification model are assumed to be independent, normally distributed random variables whose means and variances are replaced by the sample means and sample variances in the standard model of the known syllable. For every known syllable in the database, the distance between the means in its standard model and the LPCC values in the classification model of the unknown syllable is computed and then adjusted by the variances of the known syllable. The known syllables most similar to the unknown syllable are selected. A second Bayesian classification then computes, for each similar known syllable, the total Bayesian distance (mis-categorization risk) of its samples closest to the unknown syllable. Among the similar known syllables, the one with the smallest total Bayesian distance is judged to be the unknown syllable.

[Disclosure of the Invention]

Fig. 1 and Fig. 2 illustrate the procedure of the invention. Fig. 1 shows the flow for building the database. The database contains the standard models of all known syllables, which represent their features. A known syllable 1 enters the receiver 20 as a continuous sound wave 10. The digital converter 30 turns the continuous sound wave into a sequence of digitized signal points. The preprocessor 40 offers two deletion methods. (1) Compute the variance of the signal points within a short time interval and the variance of general noise. If the former is smaller than the latter, the interval contains no speech and is deleted.
(2) Compute the sum of the distances between consecutive signal points within a short time interval and the corresponding sum for general noise. If the former is smaller than the latter, the interval contains no speech and is deleted. After the preprocessor 40, a sequence of signal points belonging to the known syllable remains. The waveform is first normalized and the features are then extracted: all signal points of the known syllable are divided into E equal time intervals, and each interval forms a frame. A syllable thus has E equal-length frames 50, with no filter and no overlap, whose length adjusts to the total number of signal points so that the E frames cover all of them. The frame is therefore called an elastic frame: its length stretches and shrinks freely, but all E frames have the same length. This is unlike a Hamming window, which has a filter, half overlaps its neighbor, has a fixed length and cannot adjust to the length of the waveform. Because the waveform of a syllable changes nonlinearly with time, it contains a speech dynamic feature that also changes nonlinearly with time. Because the frames do not overlap, the invention needs only a small number (E = 12) of elastic frames to cover the waveform. Since a signal point can be estimated from the signal points before it, a regression model that changes linearly with time is used to approximate the nonlinearly changing waveform closely, and the unknown regression coefficients are estimated by least squares. The set of least-squares estimates of the unknown coefficients produced in each frame is called the linear predictive coding (LPC) vector. The LPC vector is then converted into the more stable LPC cepstrum (LPCC). In this way the speech dynamic features in the waveform of a syllable, which change nonlinearly with time, are converted into E LPCC vectors 60 of equal size. To build the database, a known syllable is uttered by many speakers, and the same method converts each utterance into a sample of E equal-sized LPCC vectors. The samples all represent the same known syllable, so their LPCC vectors, taken in order, should be approximately the same. That is, at the same time position the LPCC vectors of the samples agree. The mean and variance of the LPCC values are then computed over the samples. These E × P LPCC means and variances represent the standard features of the known syllable and are called its standard model 70. The standard model, containing the sample means and variances, represents a known syllable and is stored in the database 80.

Fig. 2 shows the flow of the method for recognizing an unknown syllable. When an unknown syllable 2 is input to the speech recognition method of the invention, it enters the receiver 21 as a continuous sound wave 11 and is converted by the digital converter 30 into a sequence of signal points. The preprocessor 40 of Fig. 1 deletes the parts that contain no speech. The waveform is normalized and the features are extracted: all signal points of the unknown syllable that carry speech are divided into E equal time intervals, and each interval forms an elastic frame 50. There are E elastic frames in all, with no filter and no overlap, stretching freely to cover all signal points. Within each frame, since a signal point can be estimated from the preceding signal points, the least-squares method gives estimates of the unknown regression coefficients. The set of least-squares estimates produced in each frame is called the LPC vector. The LPC vector has a normal distribution.
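The elastic-frame division used for both known and unknown syllables can be sketched as follows. This is a minimal Python illustration rather than the patent's implementation: the function name `elastic_frames` is hypothetical, and handing the leftover points to the first few frames when the length is not divisible by the number of frames is an assumption, since the text only states that the frames have equal length.

```python
from typing import List, Sequence


def elastic_frames(signal: Sequence[float], n_frames: int = 12) -> List[List[float]]:
    """Split a signal into n_frames contiguous, non-overlapping frames.

    The frame length adapts to the signal length, so short and long
    utterances both yield exactly n_frames frames. If the length is not
    divisible by n_frames, the first frames get one extra point
    (an assumption made for this sketch).
    """
    if len(signal) < n_frames:
        raise ValueError("signal is shorter than the number of frames")
    base, extra = divmod(len(signal), n_frames)
    frames: List[List[float]] = []
    start = 0
    for i in range(n_frames):
        length = base + (1 if i < extra else 0)
        frames.append(list(signal[start:start + length]))
        start += length
    return frames


# A short and a long utterance both produce 12 frames; only the frame
# length changes.
short = elastic_frames([float(n) for n in range(30)])
long_ = elastic_frames([float(n) for n in range(3000)])
print(len(short), len(long_))
print(len(short[0]), len(long_[0]))
```

No window function is applied and no points are shared between frames, which is what distinguishes this split from the half-overlapping Hamming windows mentioned in the text.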
The LPC vector is then converted into the more stable LPC cepstrum, that is, the LPCC vector 60. An unknown syllable is represented by a feature model of E LPCC vectors, called the classification model 90, which has the same size as the standard model of a known syllable. If a known syllable is the unknown syllable, the means in its standard model are the closest to the LPCC values in the classification model of the unknown syllable. The simple Bayesian recognizer 100 of the invention therefore compares the classification model of the unknown syllable with the standard model of each known syllable in the database. To save computation, when a known syllable is taken to be the unknown syllable, all LPCC values in the classification model are assumed to be independent and normally distributed, with means and variances estimated by the sample means and sample variances of the standard model of that known syllable. The simple Bayesian method first computes the distance between the LPCC values of the unknown syllable and the means of the known syllable and adjusts it by the variances of the known syllable. The resulting value represents the similarity between the unknown syllable and that known syllable. The known syllables with the highest similarity to the unknown syllable are selected 110. A refined Bayesian classification then computes, for each similar known syllable, the total Bayesian distance (mis-categorization risk) of its samples closest to the unknown syllable 120. Among the similar known syllables, the one whose closest samples have the smallest total Bayesian distance is judged to be the unknown syllable 130.

5. [Embodiments]

(1) After a syllable is input to the speech recognition system, its continuous sound wave is converted into a sequence of digitized signal points (signal sampled points), and the signal points that contain no speech are deleted. The invention provides two methods: the first computes the variance of the signal points within a short time interval, and the second computes the sum of the distances between adjacent signal points within the interval.
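The two deletion criteria can be sketched in Python as below. This is an illustration under stated assumptions, not the patent's code: the names `variance`, `adjacent_distance` and `remove_non_speech` are hypothetical, the interval length of 4 points is arbitrary, and the noise reference is taken to be the first interval of a separately supplied stretch of background noise.

```python
from typing import Callable, List, Sequence


def variance(xs: Sequence[float]) -> float:
    """Method 1: variance of the signal points in one short interval."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def adjacent_distance(xs: Sequence[float]) -> float:
    """Method 2: sum of the distances between consecutive signal points."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))


def remove_non_speech(
    signal: Sequence[float],
    noise: Sequence[float],
    interval: int = 4,
    score: Callable[[Sequence[float]], float] = adjacent_distance,
) -> List[float]:
    """Delete every interval whose score is smaller than that of the noise.

    Only the first `interval` points of `noise` are scored (an assumption
    made for this sketch).
    """
    threshold = score(noise[:interval])
    kept: List[float] = []
    for start in range(0, len(signal), interval):
        segment = signal[start:start + interval]
        if score(segment) >= threshold:  # keep intervals at or above noise level
            kept.extend(segment)
    return kept


noise = [0.0, 0.1, 0.0, 0.1]
signal = [0.0, 0.05, 0.0, 0.05, 5.0, -5.0, 5.0, -5.0, 0.05, 0.0, 0.05, 0.0]
print(remove_non_speech(signal, noise))                  # method 2
print(remove_non_speech(signal, noise, score=variance))  # method 1
```

On this toy signal both criteria keep only the loud middle interval. The second criterion needs no mean and no squaring, which is one plausible reading of why the text calls it the faster of the two.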
In theory the first method is better, because a variance of the signal points greater than the noise variance indicates that speech is present. In the syllable recognition of the invention, however, the two methods give the same recognition rate, and the second takes less time.

(2) After the signal points that contain no speech are deleted, the remaining signal points represent the whole syllable. The waveform is first normalized and the features are then extracted: all signal points are divided into E equal time intervals, and each interval forms a frame. A syllable has E equal-length elastic frames in all, with no filter and no overlap, stretching freely to cover all signal points. The signal points within an elastic frame change nonlinearly with time and are hard to represent with a mathematical model. J. Makhoul, "Linear prediction: A tutorial review", Proceedings of the IEEE, vol. 63, no. 4, 1975, shows that a signal point has a linear relationship with the signal points before it, so a regression model that changes linearly with time can be used to estimate the nonlinearly changing signal points. The signal point S(n) can be estimated from the preceding signal points, and its estimate S'(n) is given by the regression model

$$S'(n) = \sum_{k=1}^{P} a_k S(n-k), \quad n \ge 0 \tag{1}$$

In (1), $a_k$, k = 1, ..., P, are the estimates of the unknown regression coefficients and P is the number of preceding signal points. The least-squares estimates are computed with Durbin's recursive formula as given in L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993. This set of estimates is called the LPC vector. The LPC vector of the signal points in a frame is obtained as follows.

Let $E_1$ denote the sum of squared differences between the signal points S(n) and their estimates S'(n):

$$E_1 = \sum_{n=0}^{N} \Big[ S(n) - \sum_{k=1}^{P} a_k S(n-k) \Big]^2 \tag{2}$$

The regression coefficients are chosen to minimize this sum of squares. Taking the partial derivative of (2) with respect to each unknown regression coefficient $a_i$, i = 1, ..., P, and setting it to 0 gives P normal equations:

$$\sum_{k=1}^{P} a_k \sum_{n} S(n-k)\,S(n-i) = \sum_{n} S(n)\,S(n-i), \quad 1 \le i \le P \tag{3}$$

Expanding (2) and substituting (3) gives the minimum total squared error $E_P$:

$$E_P = \sum_{n} S^2(n) - \sum_{k=1}^{P} a_k \sum_{n} S(n)\,S(n-k) \tag{4}$$

Equations (3) and (4) can be rewritten as

$$\sum_{k=1}^{P} a_k R(i-k) = R(i), \quad 1 \le i \le P \tag{5}$$

$$E_P = R(0) - \sum_{k=1}^{P} a_k R(k) \tag{6}$$

In (5) and (6), N denotes the number of signal points in the frame and

$$R(i) = \sum_{n=0}^{N-i} S(n)\,S(n+i), \quad i \ge 0 \tag{7}$$

Durbin's recursion computes the LPC vector quickly as follows:

$$E_0 = R(0) \tag{8}$$

$$k_i = \Big[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \Big] \Big/ E_{i-1} \tag{9}$$

$$a_i^{(i)} = k_i \tag{10}$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{11}$$

$$E_i = (1 - k_i^2)\, E_{i-1} \tag{12}$$

Formulas (8)-(12) are evaluated recursively for i = 1, ..., P, which gives the least-squares estimates $a_j$, j = 1, ..., P, of the regression coefficients (the LPC vector):

$$a_j = a_j^{(P)}, \quad 1 \le j \le P \tag{13}$$

The following formulas then convert the LPC vector into the more stable LPC cepstrum (LPCC) vector $a'_i$, i = 1, ..., P:

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad 1 \le i \le P \tag{14}$$

$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad P < i \tag{15}$$
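Equations (7)-(14) translate almost line by line into code. The Python sketch below is an illustration of that recursion, not the patent's implementation: the function names are hypothetical, the autocorrelation simply sums over all index pairs that exist in the frame, and only the first P cepstral coefficients of equation (14) are computed.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """R(0..order) as in equation (7), summing over all valid index pairs."""
    n = len(frame)
    return [sum(frame[t] * frame[t + i] for t in range(n - i))
            for i in range(order + 1)]


def lpc_durbin(r: Sequence[float], order: int) -> List[float]:
    """Durbin's recursion, equations (8)-(13); returns a_1..a_P."""
    a = [0.0] * (order + 1)          # a[j] holds a_j^{(i)}; a[0] is unused
    err = r[0]                       # E_0 = R(0)                      (8)
    for i in range(1, order + 1):
        if err == 0.0:
            raise ValueError("prediction error is zero; frame is degenerate")
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # (9)
        prev = a[:]
        a[i] = k                                                      # (10)
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]                          # (11)
        err = (1.0 - k * k) * err                                     # (12)
    return a[1:]                                                      # (13)


def lpc_to_lpcc(a: Sequence[float]) -> List[float]:
    """Equation (14): LPC vector a_1..a_P to LPCC vector a'_1..a'_P."""
    p = len(a)
    c = [0.0] * (p + 1)              # c[i] holds a'_i; c[0] is unused
    for i in range(1, p + 1):
        c[i] = a[i - 1] + sum((j / i) * a[i - j - 1] * c[j] for j in range(1, i))
    return c[1:]


# Toy frame: a decaying exponential, which one regression coefficient
# (a_1 close to 0.5) predicts almost exactly.
frame = [0.5 ** n for n in range(50)]
lpc = lpc_durbin(autocorrelation(frame, 2), 2)
lpcc = lpc_to_lpcc(lpc)
print(lpc, lpcc)
```

In the patent each of the E elastic frames would be passed through these three steps with P = 12, giving one row of the E × P LPCC matrix per frame.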

Each elastic frame produces one LPCC vector $(a'_1, \ldots, a'_P)$. In the speech recognition method of the invention P = 12, because the later LPCC values are almost 0. The features of a syllable are represented by E LPCC vectors, that is, by an E × P matrix of LPCC values.

(3) A known syllable is uttered by many speakers, which produces multiple sample E × P LPCC matrices that all represent the known syllable. Taking the mean and variance over the LPCC samples gives an E × P matrix of LPCC sample means and sample variances. This matrix is called the standard feature, or standard model, of the known syllable.

(4) In the same way, formulas (8)-(15) are used to compute the E LPCC vectors of the waveform of an unknown syllable.
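The standard model of item (3) is an element-wise mean and variance over the sample matrices. A minimal Python sketch follows. The name `standard_model` is hypothetical, and dividing by n - 1 for the sample variance is an assumption, since the text does not say which divisor is used.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def standard_model(samples: List[Matrix]) -> Tuple[Matrix, Matrix]:
    """Element-wise sample mean and sample variance of E x P LPCC matrices.

    Each element of `samples` is the LPCC matrix of one utterance of the
    same known syllable. The variance uses the n - 1 divisor (assumption).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed for a variance")
    rows, cols = len(samples[0]), len(samples[0][0])
    mean = [[sum(s[j][l] for s in samples) / n for l in range(cols)]
            for j in range(rows)]
    var = [[sum((s[j][l] - mean[j][l]) ** 2 for s in samples) / (n - 1)
            for l in range(cols)]
           for j in range(rows)]
    return mean, var


# Two toy "utterances" with E = 1 frame and P = 2 coefficients each.
mean, var = standard_model([[[1.0, 2.0]], [[3.0, 6.0]]])
print(mean, var)
```

Because every utterance has already been reduced to the same E × P shape by the elastic frames, no time alignment is needed before averaging.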

These E LPCC vectors form an E × P matrix of LPCC values of the same size, called the classification model of the unknown syllable.

In Fig. 2 the speech recognizer 100 receives the classification model of an unknown syllable, an E × P matrix of LPCC values, written $X = \{X_{jl}\}$, j = 1, ..., E, l = 1, ..., P. When it is compared with a known syllable $c_i$, i = 1, ..., m (m is the total number of known syllables), the $\{X_{jl}\}$ are assumed, for fast computation, to have E × P independent normal distributions whose means and variances $(\mu_{ijl}, \sigma_{ijl}^2)$ are estimated by the sample means and sample variances in the standard model of the known syllable. Let $f(x \mid c_i)$ denote the conditional density function of X. The Bayesian classification is explained below using the decision theory in T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, vol. 36, 2003. Suppose the database holds the standard models of m known syllables. Let $\theta_i$, i = 1, ..., m, denote the probability that syllable $c_i$ occurs, that is, the prior probability, so that $\sum_{i=1}^{m} \theta_i = 1$. Let d denote a decision rule. Define a simple loss function, which makes the risk of d its probability of misclassification, as follows: if the decision rule d misclassifies an unknown syllable, the loss is $L(c_i, d(x)) = 1$, and if d classifies it correctly there is no loss, $L(c_i, d(x)) = 0$. The recognition method is as follows. Let $\Gamma_i$, i = 1, ..., m, denote the region of matrix values X = x that belong to the known syllable $c_i$, that is, when X falls in $\Gamma_i$, d decides that the unknown syllable is $c_i$. The average probability of misclassification of d is

$$R(\tau, d) = \sum_{i=1}^{m} \theta_i \int_{\Gamma_i^c} f(x \mid c_i)\, dx \tag{16}$$

t(16)中’ υ), Γ7是Γ,以外範圍。以D表示所有語 音辨認方法’也_分π個已知單音的範圍所有方法。 在D中找—個辨認方法植它的平均認錯機率( 最小,以%,W表示 (17) R{t,dr)二 minR(t,d) del) 滿足(Π)式的辨認方法d做與先前機率貝氏分 類法。可用下列表示: 、刀 19 1310543 dT(x) = Ci if dif{x\c)>ejf{x\cj) (18) 在(18)式中,; = ’也即屬於已知單音^的範圍是 對所有尸/,rf ={χΚ/(小。如所有已知單音出 現機率一樣’則貝氏分類法和最大機率法—樣。 貝氏分類法(18)辨認一個未知單音時,先計算所有| 的條件密度函數/〇c|c,), / = l,...,w,'(υ) in t(16), Γ7 is Γ, outside the range. All methods of the speech recognition method are also indicated by D, which also divides the range of π known tones. Find the average probability of error in D in the identification method (minimum, in %, W (17) R{t, dr) two minR (t, d) del) meet the (Π) formula identification method d do With the previous probability of Bayesian classification. It can be expressed as follows: , knife 19 1310543 dT(x) = Ci if dif{x\c)>ejf{x\cj) (18) In (18), ; = 'is also known as a single tone ^ The range is for all corpses /, rf = {χΚ / (small. If all known tones have the same probability of occurrence - then Bayesian classification and maximum probability method - Bayesian classification (18) identifies an unknown tone) When calculating the conditional density function / 〇c|c,), / = l,...,w,

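$$f(x \mid c_i) = \left[ \prod_{j,l} \frac{1}{\sqrt{2\pi}\, \sigma_{ijl}} \right] \exp\!\left( -\frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^{2} \right) \qquad (19)$$

In (19), $i = 1, \dots, m$, where $m$ is the total number of known syllables. For ease of computation, take the logarithm of (19), drop the constant term and change the sign, which gives

$$l(c_i) = \sum_{j,l} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^{2} \qquad (20)$$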
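The step from (19) to (20) can be checked numerically. The sketch below, with made-up values and the hypothetical helper names `log_density` and `bayes_distance`, confirms that $l(c_i)$ equals $-\ln f(x \mid c_i)$ minus the constant $\tfrac{n}{2}\ln(2\pi)$, where $n$ is the number of matrix entries (the matrix is flattened to a list here).

```python
import math

def log_density(x, mu, sigma):
    """ln f(x | c_i) from (19) for independent normal entries (flattened lists)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((xv - m) / s) ** 2
        for xv, m, s in zip(x, mu, sigma)
    )

def bayes_distance(x, mu, sigma):
    """l(c_i) from (20): sum of ln(sigma) plus half the sum of squared z-scores."""
    return sum(
        math.log(s) + 0.5 * ((xv - m) / s) ** 2
        for xv, m, s in zip(x, mu, sigma)
    )

# Illustrative values only.
x = [0.2, -0.1, 0.4]
mu = [0.0, 0.0, 0.5]
sigma = [0.5, 0.2, 0.3]

constant = 0.5 * len(x) * math.log(2 * math.pi)
# l(c_i) + ln f(x | c_i) + (n/2) ln(2*pi) should vanish up to rounding error.
gap = bayes_distance(x, mu, sigma) + log_density(x, mu, sigma) + constant
```

Because the dropped term is the same for every syllable, ranking by smallest $l(c_i)$ is the same as ranking by largest $f(x \mid c_i)$.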

The Bayesian classification (18) thus reduces to computing the value $l(c_i)$ in (20) for each known syllable $c_i$. $l(c_i)$ is also called the similarity between the unknown syllable and the known syllable $c_i$, or the Bayesian distance (mis-categorization risk). In (20), $x = \{x_{jl}\}$, $j = 1, \dots, E$, $l = 1, \dots, P$, are the LPCC values in the classification model of the unknown syllable, and $\{\mu_{ijl}, \sigma_{ijl}^2\}$ are estimated by the sample means and sample variances in the standard model of the known syllable $c_i$. The first Bayesian classification takes the classification model $x = \{x_{jl}\}$ of the unknown syllable and finds in the database the $N$ known syllables with the $N$ smallest values of $l(c_i)$; these are taken as the $N$ known syllables similar to the unknown syllable. The second Bayesian classification then picks the unknown syllable out of these $N$ similar known syllables.
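The first pass described above amounts to ranking all known syllables by $l(c_i)$ and keeping the $N$ smallest. The following is a minimal sketch under simplifying assumptions: feature matrices are flattened to short lists, and the model names, means and standard deviations are invented for illustration.

```python
import math

def bayes_distance(x, mu, sigma):
    """l(c_i) from (20) for flattened feature lists."""
    return sum(
        math.log(s) + 0.5 * ((xv - m) / s) ** 2
        for xv, m, s in zip(x, mu, sigma)
    )

def top_n_similar(x, models, n):
    """Return the names of the n known syllables with the smallest l(c_i).

    models maps a syllable name to a (mu, sigma) pair of flattened lists.
    """
    ranked = sorted(models, key=lambda name: bayes_distance(x, *models[name]))
    return ranked[:n]

# Hypothetical standard models with two features each.
models = {
    "ba": ([0.0, 0.0], [1.0, 1.0]),
    "pa": ([0.5, 0.0], [1.0, 1.0]),
    "ma": ([3.0, 3.0], [1.0, 1.0]),
    "fa": ([5.0, 5.0], [1.0, 1.0]),
}
unknown = [0.1, 0.0]
similar = top_n_similar(unknown, models, 2)
```

In the patent's setting each model would hold $E \times P$ entries and $N$ would be 5, 10 or 20.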

Let $x_{ik} = \{x_{ikjl}\}$, $i = 1, \dots, N$, $k = 1, \dots, K_i$, denote the $E \times P$ LPCC matrix of the $k$-th sample of the $i$-th similar known syllable. Suppose the $i$-th similar syllable has $K_i$ samples. In the second Bayesian classification, the LPCC matrix $x$ of the unknown syllable is compared separately with the LPCC matrix of each of the $K_i$ samples of its $i$-th similar syllable, so in this classification $x_{ik}$ plays the role of the mean. If the unknown syllable $X = \{X_{jl}\}$ belongs to the $i$-th similar syllable, the mean of $X$ is taken to be the $k$-th sample matrix $x_{ik}$, and the variance of $X$ is the conditional variance given by equation (21) [equation (21) is not recoverable from the source text]. This conditional variance (21) cannot be computed, so the variance $\sigma_{ijl}^2$ of the $i$-th known syllable is used instead. In (21), $X$ and $x_{ik}$ are assumed to belong to the same syllable but to be different samples. Hence, when the unknown syllable $x$ is compared with the $k$-th sample $x_{ik}$ of the $i$-th similar syllable, the variance of $X$ is still the $\sigma_{ijl}^2$ of the $i$-th syllable in database 80.

In the second Bayesian classification, when the unknown syllable $x$ is compared with a sample $x_{ik}$ of a similar syllable, the Bayesian distance (mis-categorization risk) (20) becomes

$$l(x_{ik}) = \sum_{j,l} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - x_{ikjl}}{\sigma_{ijl}} \right)^{2} \qquad (22)$$

Because (22) uses $\ln(\sigma_{ijl})$, $l(x_{ik})$ may be negative. Among the $K_i$ values $l(x_{ik})$, the sum of the $K$ smallest is called the total Bayesian distance (also the total risk) between the unknown syllable $x$ and the $i$-th similar known syllable; it measures the similarity between the two. The smaller the distance, the greater the similarity. In the second Bayesian classification, the similar known syllable among the $N$ candidates whose total Bayesian distance is smallest is identified as the unknown syllable. The second Bayesian classification somewhat resembles a k-NN classifier, but the invention uses the Bayesian distance (22). In the second classification, this experiment uses, for each similar syllable $c_i$, only the $K$ samples closest (in Bayesian distance) to the unknown syllable.
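The second pass can be sketched as follows: for each similar syllable, compute (22) against every stored sample, sum the $K$ smallest values, and choose the syllable with the smallest total. This is only an illustration under stated assumptions; the data are invented, feature matrices are flattened to lists, and the names `sample_distance`, `total_distance` and `second_pass` are hypothetical.

```python
import math

def sample_distance(x, sample, sigma):
    """l(x_ik) from (22): the stored sample replaces the mean, sigma is the syllable's."""
    return sum(
        math.log(s) + 0.5 * ((xv - sv) / s) ** 2
        for xv, sv, s in zip(x, sample, sigma)
    )

def total_distance(x, samples, sigma, k):
    """Sum of the k smallest l(x_ik) over the samples of one similar syllable."""
    dists = sorted(sample_distance(x, s, sigma) for s in samples)
    return sum(dists[:k])  # uses all samples if fewer than k are available

def second_pass(x, candidates, k):
    """candidates maps a syllable name to (samples, sigma); returns the best name."""
    return min(
        candidates,
        key=lambda name: total_distance(x, candidates[name][0], candidates[name][1], k),
    )

unknown = [0.0, 0.0]
candidates = {
    # Two good samples and one poor sample.
    "ba": ([[0.1, 0.0], [0.0, -0.1], [2.0, 2.0]], [1.0, 1.0]),
    "pa": ([[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]], [1.0, 1.0]),
}
best_k2 = second_pass(unknown, candidates, 2)
best_k3 = second_pass(unknown, candidates, 3)
```

In this toy data, $K = 2$ selects "ba", while $K = 3$ pulls the poor third sample of "ba" into the sum and the decision flips to "pa", which is the kind of effect that motivates using only the nearest samples.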

If too many samples of a similar syllable enter the sum, some poor samples that do not represent the syllable may be included, and a similar known syllable ranked first may then be misclassified by the second Bayesian classification. The recognition rate then falls instead of rising. In theory, the more samples each similar syllable has, the better; the samples may even include unclear utterances, because the second Bayesian classification then has more chances to select $K$ better samples (samples "closer" to the unknown syllable). If there are too many samples, however, computing the $l(x_{ik})$ values is time-consuming. Moreover, the second Bayesian classification computes $l$ values for the samples of all similar syllables, so taking too many similar syllables for an unknown syllable (large $N$) also makes the computation slow. In this experiment, a value of 10 is sufficient.

To show that the speech recognition system of the invention achieves a high recognition rate over a wide range of syllables, and that its feature extraction and recognition are fast, several speech recognition experiments were carried out. First, a database of commonly used Mandarin syllables was built. The database was purchased from Academia Sinica; its quality is poor and it is incomplete. Nominally it contains 493 different Mandarin syllables. The number of samples per syllable varies: many syllables have only one or two samples, some have none, some samples even contain two syllables, and some syllables have as many as 90 samples. The samples were uttered by many speakers, each speaker uttering a syllable only once. To use the Bayesian classification, each Mandarin syllable needs at least 6 samples: when any one sample is used for testing, the others serve as training samples (at least five training samples per syllable are needed to compute the mean and variance). Only 361 Mandarin syllables qualify, with 4155 samples in total, and the quality is not very good. According to quality (judged by student listening tests), the syllables were divided into three classes. The best class has 233 syllables (3152 samples), the second class has 257 syllables (3302 samples), and the third class is the original 361 syllables (4155 samples). When one sample is tested, the other samples (training samples) are used to compute the means and variances, which are stored in database 80. The test sample and the samples used to compute the means and variances are completely separate. The recognition results for the three classes are given in Table 2. In Table 2, the first Bayesian classification finds the $N$ ($= 5, 10, 20$) known syllables most similar to the unknown syllable according to the similarity $l(c_i)$ in (20). Table 1 lists the 10 most similar syllables for three test syllables. In the second Bayesian classification, the samples of the similar syllables are all taken from the full third class.

Because more samples are available, the second Bayesian classification has a better chance of finding the $K$ best samples. In Table 2, when the first Bayesian classification recognizes the first class of syllables, the recognition rate is 71.26% (taking the first-ranked similar known syllable as the decision). When the second Bayesian classification is applied to the 20 similar known syllables of each unknown syllable (test sample), the recognition rate rises to 94.96%. 768 samples are recognized correctly only by the second Bayesian classification, while 21 samples that the first Bayesian classification had recognized correctly are misclassified by the second. According to the experimental results, $E = 12$ and $P = 12$ work best (least time and highest recognition rate). This is because the elastic frames do not overlap and 12 elastic frames can fully extract the features of a syllable; making $E$ larger adds little recognition ability and costs much time. Because the quality of the Academia Sinica database is not very good, however, it is advisable to use only a small number $K$ of the best samples of each similar syllable. The ranking of unknown syllables by similarity $l(c_i)$ is shown in Table 1, which lists the 10 known syllables most similar to an unknown syllable as found by the first Bayesian classification.
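The train/test separation described above (test one sample, estimate the mean and variance from the rest, require at least five training samples) might look like the sketch below. It assumes the unbiased sample variance (divisor $n - 1$), which the text does not specify, and the function name and data are invented for illustration.

```python
from statistics import mean, variance

def leave_one_out_model(samples, test_index, min_train=5):
    """Estimate per-feature mean and sample variance without the test sample.

    samples is a list of flattened feature lists for one syllable.
    Assumption: variance uses the n - 1 divisor (statistics.variance).
    """
    train = [s for i, s in enumerate(samples) if i != test_index]
    if len(train) < min_train:
        raise ValueError("need at least %d training samples" % min_train)
    columns = list(zip(*train))  # one tuple per feature
    mu = [mean(col) for col in columns]
    var = [variance(col) for col in columns]
    return mu, var

# Six hypothetical samples of one syllable, two features each.
samples = [
    [1.0, 2.0],
    [2.0, 2.0],
    [3.0, 4.0],
    [4.0, 4.0],
    [5.0, 3.0],
    [100.0, 0.0],  # held out as the test sample below
]
mu, var = leave_one_out_model(samples, test_index=5)
```

The held-out sample has no influence on `mu` and `var`, so the test sample and the training statistics stay completely separate, as the experiment requires.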

Table 1. The 10 known syllables most similar to each of three unknown test syllables, as found by the first Bayesian classification and ordered by similarity $l(c_i)$. [The Chinese characters in the table body are not recoverable from the source text.]

Note: 倆 and 兩 should be the same Mandarin syllable; they appear as separate entries in the database supplied by Academia Sinica.

Table 2. Recognition of unknown syllables among their N similar syllables by the second Bayesian classification. "top 1" is the result of the first Bayesian classification alone; "max top N" is the number of test samples whose true syllable is among the top N similar syllables.

361 syllables, total test samples = 4155
  top 1              top 5              top 10             top 20
  2341/4155 = .5634  3097/4155 = .7454  3208/4155 = .7721  3254/4155 = .7832
                     max top 5          max top 10         max top 20
                     3337/4155 = .8031  3581/4155 = .8619  3758/4155 = .9045
  Wrong in the first Bayesian classification but right in the second: 1026 test samples.
  Right in the first Bayesian classification but wrong in the second: 113 test samples.

257 syllables, total test samples = 3302
  top 1              top 5              top 10             top 20
  2273/3302 = .6884  2933/3302 = .8889  3025/3302 = .9161  3064/3302 = .9279
                     max top 5          max top 10         max top 20
                     3006/3302 = .9104  3118/3302 = .9443  3179/3302 = .9628
  Wrong in the first Bayesian classification but right in the second: 820 test samples.
  Right in the first Bayesian classification but wrong in the second: 29 test samples.

233 syllables, total test samples = 3152
  top 1              top 5              top 10             top 20
  2246/3152 = .7126  2872/3152 = .9112  2947/3152 = .9350  2993/3152 = .9496
                     max top 5          max top 10         max top 20
                     2918/3152 = .9258  3012/3152 = .9556  3073/3152 = .9749
  Wrong in the first Bayesian classification but right in the second: 768 test samples.
  Right in the first Bayesian classification but wrong in the second: 21 test samples.
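The counts reported with Table 2 can be cross-checked with simple arithmetic: the top-1 count, plus the samples gained by the second classification, minus the samples it lost, should give the top-20 count. The short sketch below performs that check; reading the gained and lost counts as referring to the top-20 column is an interpretation, supported only by the fact that the sums agree.

```python
# (syllables, total samples, top-1 correct, gained by 2nd pass, lost by 2nd pass, top-20 correct)
table2 = [
    (361, 4155, 2341, 1026, 113, 3254),
    (257, 3302, 2273, 820, 29, 3064),
    (233, 3152, 2246, 768, 21, 2993),
]

def check(row):
    """Return (counts consistent?, top-20 recognition rate rounded to 4 places)."""
    _, total, first, gained, lost, top20 = row
    return first + gained - lost == top20, round(top20 / total, 4)

results = [check(row) for row in table2]
```

For example, in the 233-syllable class, 2246 + 768 - 21 = 2993, which matches the rise from 71.26% to 94.96% quoted in the description.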
[Brief description of the drawings]
Fig. 1 shows the flow for building the database.
Fig. 2 shows the flow of the method for recognizing an unknown syllable.

[Description of main reference numerals]
(1) Reference numerals in Fig. 1:
1. Input of a known syllable (1)
2. Continuous waveform of the known syllable (10)
3. Receiver for the known syllable (20)
4. Waveform digitizing converter (30)
5. Noise removal (40)
6. Waveform normalization with elastic frames (50)
7. Computation of the LPCC vectors by the least squares method (60)
8. Computation of the means and variances of the LPCCs of the known syllable (70)
9. Database of known syllables containing the standard models of all means and variances (80)

(2) Reference numerals in Fig. 2:
1. Input of an unknown syllable (2)
2. Continuous waveform of the unknown syllable (11)
3. Receiver for the unknown syllable (21)
4. Waveform digitizing converter (30)
5. Noise removal (40)
6. Waveform normalization with elastic frames (50)
7. Computation of the LPCC vectors by the least squares method (60)
8. The LPCC vectors of the unknown syllable represent the classification model of the unknown syllable (90)
9. Database of known syllables containing the standard models of all means and variances (80)
10. Comparison of the standard model of each known syllable with the classification model of the unknown syllable (100)
11. The first Bayesian classification selects the N known syllables most similar to the unknown syllable (110)
12. The second Bayesian classification computes, for each similar known syllable, the total Bayesian distance of its K samples closest to the unknown syllable (120)
13. The second Bayesian classification selects the similar known syllable whose K samples have the smallest total Bayesian distance and identifies it as the unknown syllable (130)

Claims (1)

X. Scope of the patent application:

1. A method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules, comprising:
(1) deleting sampled points that carry no speech signal, or noise;
(2) a method of normalizing the waveform of a known syllable and extracting a feature matrix of uniform size: elastic frames normalize the waveform and convert it into an LPCC feature matrix of equal size, so that waveforms of the same syllable are converted into matrices with the same features;
(3) a method of building a standard model of the features of a known syllable, stored in a database, the standard model containing the sample means and variances of the features of the known syllable;
(4) a method of normalizing the waveform of an unknown syllable and extracting features: the waveform is normalized and converted into a feature matrix of the same size as the standard models of the known syllables, called the classification model of the unknown syllable, which contains linear predictive coding cepstra (LPCC);
(5) a simple Bayesian classification: the classification model of the unknown syllable is compared with the standard models of all known syllables in the database to find the $N$ known syllables $c_1, \dots, c_N$ most similar to the unknown syllable, which are the $N$ similar known syllables of the unknown syllable;
(6) a second Bayesian classification: for each similar known syllable, the sum of the $K$ smallest Bayesian distances of its samples is called the total Bayesian distance (mis-categorization risk) between that similar known syllable and the unknown syllable; among the $N$ similar known syllables, the one with the smallest total Bayesian distance is identified as the unknown syllable.

2. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (1), deleting sampled points that carry no speech or noise, comprises two methods:
(a) for the sampled points within a short time segment, compute their variance and the variance of general noise; if the variance of the sampled points is smaller than the noise variance, delete the segment;
(b) for the sampled points within a short time segment, compute the sum of the distances between adjacent sampled points and the sum of the distances between adjacent points of general noise; if the former is smaller than the latter, delete the segment.

3. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (2), the method of normalizing the waveform of a known syllable and extracting a feature matrix of uniform size, comprises the following steps:
(a) a method of dividing the sampled points of the waveform of a known syllable into equal segments: in order to estimate the nonlinearly varying waveform with a regression model that varies linearly with time, the full length of the waveform is divided into $E$ equal time segments, each forming an elastic frame, so that a syllable has $E$ elastic frames of equal length, with no filter and no overlap, which stretch or shrink freely to cover the full waveform, unlike a Hamming window of fixed length;
(b) within each frame, a regression model that varies linearly with time estimates the waveform that varies nonlinearly with time;
(c) Durbin's recursion

$$R(i) = \sum_{n=0}^{N-1-i} S(n)\, S(n+i), \quad i \ge 0$$
$$E_0 = R(0)$$
$$k_i = \left[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \right] \Big/ E_{i-1}$$
$$a_i^{(i)} = k_i$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$E_i = (1 - k_i^2)\, E_{i-1}$$
$$a_j = a_j^{(P)}, \quad 1 \le j \le P$$

gives the least squares estimates of the regression coefficients, called the linear predictive coding (LPC) vector, and the formulas

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad 1 \le i \le P$$
$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad P < i$$

convert the LPC vector into the stable LPC cepstrum (LPCC) vector;
(d) the $E$ linear predictive coding cepstrum (LPCC) vectors represent the $E \times P$ LPCC feature matrix of a syllable.

4. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (3) further comprises a method of building a standard model of the features of a known syllable, with the following steps:
(a) a known syllable uttered by several speakers produces waveforms of different lengths; because they represent the same known syllable, these waveforms contain the same dynamic features varying nonlinearly with time, only at different time positions;
(b) $E$ elastic frames adjust the waveforms of different lengths of the syllable and convert them into several sample matrices with roughly the same LPCC features; the sample means and sample variances of the LPCCs are then computed and represented by an $E \times P$ matrix, called the standard model of the syllable.

5. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (4) further comprises a method of normalizing the waveform of an unknown syllable and extracting features, representing the classification model of the unknown syllable, with the following steps:
(a) the waveform of the unknown syllable is divided into $E$ equal time segments, each forming an elastic frame, so that an unknown syllable has $E$ elastic frames of equal length, with no filter and no overlap, which stretch or shrink freely to cover all sampled points of the waveform;
(b) within each elastic frame, a regression model that varies linearly with time estimates the waveform that varies nonlinearly with time;
(c) Durbin's recursion

$$R(i) = \sum_{n=0}^{N-1-i} S(n)\, S(n+i), \quad i \ge 0$$
$$E_0 = R(0)$$
$$k_i = \left[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \right] \Big/ E_{i-1}$$
$$a_i^{(i)} = k_i$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$E_i = (1 - k_i^2)\, E_{i-1}$$
$$a_j = a_j^{(P)}, \quad 1 \le j \le P$$

computes the least squares estimates of the regression coefficients (the LPC vector);
(d) the LPC vector is then converted into the stable LPC cepstrum (LPCC) vector by the formulas

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad 1 \le i \le P$$
$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad P < i$$

(e) the $E$ LPCC vectors (an $E \times P$ LPCC matrix) serve as the classification model of the unknown syllable.

6. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (5) comprises a simple Bayesian classification for recognizing an unknown syllable, with the following steps:
(a) the feature of an unknown syllable is its classification model, represented by an $E \times P$ LPCC matrix $X = \{X_{jl}\}$, $j = 1, \dots, E$, $l = 1, \dots, P$; for fast recognition, the $E \times P$ entries $\{X_{jl}\}$ are assumed to be $E \times P$ independent random variables with normal distributions; when the unknown syllable is compared with a known syllable $c_i$, $i = 1, \dots, m$, the means and variances $(\mu_{ijl}, \sigma_{ijl}^2)$ of $\{X_{jl}\}$ are estimated by the sample means and sample variances in the standard model of that known syllable, and the conditional density function of $X$ is

$$f(x \mid c_i) = \left[ \prod_{j,l} \frac{1}{\sqrt{2\pi}\, \sigma_{ijl}} \right] \exp\!\left( -\frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^{2} \right)$$

where $x = \{x_{jl}\}$ are the linear predictive coding cepstra (LPCC) in the classification model of the unknown syllable, and $(\mu_{ijl}, \sigma_{ijl}^2)$ are estimated by the sample means and sample variances in the standard model of the known syllable $c_i$;
(b) the simple Bayesian classification looks in the database for the known syllable $c_i$ most like the unknown syllable $x$; the similarity between a known syllable $c_i$ and the unknown syllable $x$ is measured by the magnitude of $f(x \mid c_i)$;
(c) for fast recognition, the conditional density function $f(x \mid c_i)$ is simplified by taking logarithms and dropping the constants that need not be computed, which gives

$$l(c_i) = \sum_{j,l} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^{2}$$

(d) for each known syllable $c_i$, $i = 1, \dots, m$, the value $l(c_i)$ is computed;
(e) in the database, the $N$ known syllables $c_i$ whose values $l(c_i)$ are the $N$ smallest are selected as the $N$ similar known syllables of the unknown syllable.

7. The method for classifying similar Mandarin syllables using two consecutive Bayesian decision rules according to claim 1, wherein step (6) comprises a second Bayesian classification for recognizing the unknown syllable among its $N$ similar known syllables, with the following steps:
(a) in the second Bayesian classification, let $c_i$, $i = 1, \dots, N$, denote the $N$ similar known syllables of an unknown syllable, let $k = 1, \dots, K_i$ index the samples of the $i$-th similar known syllable, and let $x_{ik} = \{x_{ikjl}\}$, $j = 1, \dots, E$, $l = 1, \dots, P$, denote the $E \times P$ LPCC matrix of the $k$-th sample; when the unknown syllable $x = \{x_{jl}\}$ is compared with the sample $x_{ik}$, $x$ is assumed to belong to the $i$-th similar known syllable $c_i$ with mean $x_{ik}$ and variance $\sigma_{ijl}^2$, where $\sigma_{ijl}^2$ is the variance of $c_i$; the similarity between the unknown syllable $x$ and the sample $x_{ik}$ is then computed, for $k = 1, \dots, K_i$, as

$$l(x_{ik}) = \sum_{j,l} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{j,l} \left( \frac{x_{jl} - x_{ikjl}}{\sigma_{ijl}} \right)^{2}$$

(b) for each similar syllable, the sum of the $K$ smallest of its $K_i$ values $l(x_{ik})$ is computed (the sum may be negative because $\ln \sigma$ is used); it represents the total Bayesian distance, or similarity, between the unknown syllable and the $i$-th similar known syllable; the smaller the total Bayesian distance, the greater the similarity;
(c) from the $N$ similar known syllables of the unknown syllable, the similar known syllable with the smallest total Bayesian distance to the unknown syllable is selected and identified as the unknown syllable;
(d) according to the recognition test results, $E = 12$, $P = 12$, $K = 5$ work best, because the elastic frames do not overlap and 12 elastic frames can fully extract the features of a syllable.
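The feature extraction recited in claims 3 and 5 (equal non-overlapping elastic frames, Durbin's recursion for the LPC vector, conversion to LPCC) can be sketched as below. This is a minimal pure-Python illustration rather than the patented implementation: the function names are hypothetical, the frame boundaries are computed by integer division as an assumption, and the test signal is synthetic.

```python
import math
import random

def elastic_frames(signal, e):
    """Split the whole waveform into e equal, non-overlapping frames (no window)."""
    bounds = [k * len(signal) // e for k in range(e + 1)]
    return [signal[bounds[k]:bounds[k + 1]] for k in range(e)]

def autocorrelation(frame, order):
    """R(i) = sum_n S(n) S(n + i) for i = 0..order."""
    n = len(frame)
    return [sum(frame[t] * frame[t + i] for t in range(n - i)) for i in range(order + 1)]

def durbin(r, order):
    """Durbin's recursion; returns the LPC coefficients a_1..a_order."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients (a[0] is a_1) into n_ceps cepstral coefficients."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for i in range(1, n_ceps + 1):
        acc = a[i - 1] if i <= p else 0.0
        for j in range(max(1, i - p), i):
            acc += (j / i) * a[i - j - 1] * c[j]
        c[i] = acc
    return c[1:]

def lpcc_matrix(signal, e=12, p=12):
    """E x P LPCC feature matrix: one LPCC vector per elastic frame."""
    return [lpc_to_lpcc(durbin(autocorrelation(f, p), p), p) for f in elastic_frames(signal, e)]

# Synthetic stand-in for a digitized syllable: a sinusoid plus seeded noise.
rng = random.Random(0)
signal = [math.sin(0.3 * n) + 0.3 * rng.gauss(0.0, 1.0) for n in range(1200)]
features = lpcc_matrix(signal)
```

Whatever the length of the input, the result is always a 12 x 12 matrix, which is what lets syllables of different durations be compared entry by entry.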
TW95118948A 2006-05-29 2006-05-29 A method for classifying similar mandarin syllables using two consecutive bayesian decision rules TWI310543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW95118948A TWI310543B (en) 2006-05-29 2006-05-29 A method for classifying similar mandarin syllables using two consecutive bayesian decision rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW95118948A TWI310543B (en) 2006-05-29 2006-05-29 A method for classifying similar mandarin syllables using two consecutive bayesian decision rules

Publications (2)

Publication Number Publication Date
TW200744067A TW200744067A (en) 2007-12-01
TWI310543B true TWI310543B (en) 2009-06-01

Family

ID=45072266

Family Applications (1)

Application Number Title Priority Date Filing Date
TW95118948A TWI310543B (en) 2006-05-29 2006-05-29 A method for classifying similar mandarin syllables using two consecutive bayesian decision rules

Country Status (1)

Country Link
TW (1) TWI310543B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI412019B (en) 2010-12-03 2013-10-11 Ind Tech Res Inst Sound event detection module and method thereof

Also Published As

Publication number Publication date
TW200744067A (en) 2007-12-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees