JPH02203397A

JPH02203397A - Voice/voiceless part detection system

Info

Publication number: JPH02203397A
Application number: JP1022512A
Authority: JP
Inventors: Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-02-02
Filing date: 1989-02-02
Publication date: 1990-08-13

Abstract

PURPOSE: To improve detection accuracy by deciding which category a feature parameter belongs to, using the position of the projection point of the feature parameter in the principal component vector space of each category and a specific area predetermined in that space. CONSTITUTION: Means 150-170 decide whether or not the feature parameter belongs to each category by using the positions of the projection points, produced by projecting means 120-140, of the feature parameter in the principal component vector spaces of the categories and the specific areas predetermined in those spaces. The feature parameter is thus projected into the principal component vector spaces before the voiced/voiceless decision is made. Consequently, even when the number of parameters used for the voiced/voiceless decision is reduced, the loss of information carried by the original feature parameter is minimized, the voiced/voiceless detection accuracy is high, and speech dropouts and added noise caused by decision errors are reduced.

Description

[Detailed Description of the Invention]
[Object of the Invention]
(Field of Industrial Application) The present invention relates to a speech/silence detection system for speech signals. Such detection is a basic technology for ATM (Asynchronous Transfer Mode) communication in which only the speech-present portions of a voice signal are assembled into cells and transmitted, for recording devices that record only the speech-present portions, and for speech recognition.
(Prior Art) In ATM communication that assembles only the speech-present portions into cells, in speech recognition, and in recording devices that record only the speech-present portions, speech/silence detection, which detects the speech-present intervals or the start and end points of speech, is the most basic and important process. If speech/silence detection is not performed accurately, speech is clipped or speech recognition errors increase. In ATM communication in particular, it is expected to be the key to using the line efficiently.

A known prior-art speech/silence detection method that does not depend on variations in the input signal level caused by the signal input conditions, and that can reduce the dropout of low-level word-initial consonants even when the ambient noise level is high, is Japanese Unexamined Patent Publication No. Sho 60-200300, "Speech start and end point detection device".

This conventional method is described below.

FIG. 6 is a block diagram of the start and end point detection device described in the above publication. In FIG. 6, 600 is an energy extraction unit, which consists of a rectifying and smoothing circuit and extracts the signal power for each frame. 610 is a spectral shape extraction unit, which consists of a group of three band-pass filters, low band (250 to 600 Hz), mid band (600 to 1500 Hz), and high band (1500 to 4000 Hz), together with rectifying and smoothing circuits. The per-frame power in each band is used as spectral information.

The energy extraction unit 600 and the spectral shape extraction unit 610 together form a feature extraction unit 620. 630 is a multiplexer that feeds the signal power from 600 and the band filter powers from 610 to the speech/silence determination unit 640 in a time-division manner.

640 is a speech/silence determination unit that discriminates among silence, unvoiced sound, and voiced sound. 650 and 660 are a threshold memory and a standard pattern memory, which hold the constants used by the speech/silence determination unit 640. The threshold memory 650 stores two power thresholds, $E_1$ and $E_2$. The standard pattern memory 660 stores the coefficients of a linear discriminant function for discriminating silence from unvoiced sound and of a linear discriminant function for discriminating silence from voiced sound.

The two thresholds $E_1$, $E_2$ and the coefficients of the two linear discriminant functions are obtained in advance by statistical processing of speech data uttered in the environment of use, and are then stored. 670 is a start/end candidate detection unit, which detects candidate start and end points of the speech from the duration of the per-frame speech/silence decisions sent from the speech/silence determination unit. 680 is a start/end determination unit, which determines the final start and end points.

The actual operation of the speech start and end point detection device configured as above is as follows. First, a signal containing speech, input from a microphone or the like, is converted for each frame into the log power $LPW$ and the log band powers $LP_i$ ($i = 1$ to $3$). The speech/silence determination unit 640 uses these four parameters, the thresholds $E_1$ and $E_2$ stored in the threshold memory 650, and the coefficients of the two linear discriminant functions stored in the standard pattern memory 660 to decide whether the input frame is speech or silence.

This speech/silence decision is first made by comparing the log power $LPW$ with the two energy thresholds $E_1$ and $E_2$, as follows.

If $LPW > E_1$: speech
If $LPW < E_2$: silence
If $E_2 \le LPW \le E_1$: undetermined
In the undetermined case, the discriminant function value $FX$ of equation (1) is calculated from the log band powers $LP_i$ ($i = 1$ to $3$) and the coefficients of the two linear discriminant functions stored in 660, and speech or silence is decided from $FX$.

Here $A_i$ are the coefficients of the discriminant function stored in 660, and $\overline{LP}_i$ is the standard pattern stored in 660.

$A_i$ and $\overline{LP}_i$ in equation (1) are obtained in advance by statistically processing the silence, unvoiced sound, and voiced sound in speech data uttered in the environment of use. The value of $FX$ is set so that it is negative when the input is silence and positive when the input is unvoiced or voiced sound. The decision based on spectral shape computes two linear discriminant functions, one for silence versus unvoiced sound and one for silence versus voiced sound. If either takes a positive value the frame is judged to be speech, and if both are negative it is judged to be silence. Because this method exploits the differences in spectral shape among silence, unvoiced sound, and voiced sound, it can reduce the dropout of low-energy unvoiced and voiced consonants.
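The two-stage prior-art decision described above can be sketched as follows. This is an illustrative Python sketch only: equation (1) is not reproduced in this text, so the form `FX = sum(A_i * (LP_i - standard_i))` is an assumption inferred from the description of the coefficients and the standard pattern, and all function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

# (coefficients A_i, standard pattern), both hypothetical containers
Discriminant = Tuple[Sequence[float], Sequence[float]]


def discriminant_value(lp: Sequence[float], fn: Discriminant) -> float:
    """Assumed form of equation (1): FX = sum_i A_i * (LP_i - standard_i)."""
    coeffs, pattern = fn
    return sum(a * (x - m) for a, x, m in zip(coeffs, lp, pattern))


def is_speech(lpw: float, lp: Sequence[float], e1: float, e2: float,
              silence_vs_unvoiced: Discriminant,
              silence_vs_voiced: Discriminant) -> bool:
    """Return True for speech, False for silence (e1 is the upper threshold)."""
    if lpw > e1:          # clearly above the upper energy threshold
        return True
    if lpw < e2:          # clearly below the lower energy threshold
        return False
    # Undetermined by energy alone: speech if either discriminant is positive.
    return (discriminant_value(lp, silence_vs_unvoiced) > 0
            or discriminant_value(lp, silence_vs_voiced) > 0)
```

The energy test settles the clear cases cheaply, and the spectral-shape discriminants are evaluated only for frames that fall between the two thresholds.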

With this method, however, the parameters representing the spectral shape are few and there is no theoretical basis for how they are chosen, so the speech/silence decision can be wrong and speech dropout or added noise may be unavoidable. The parameters of this method are the log powers of the outputs of three band filters: low band (250 to 600 Hz), mid band (600 to 1500 Hz), and high band (1500 to 4000 Hz). Suppose, for example, that the spectrum of an unvoiced sound is (a) and the spectrum of noise is (b), as illustrated in the drawings. Although the two spectra differ greatly, the value of the linear discriminant function calculated by equation (1) is the same for both, and the speech/silence decision is wrong (assuming $A_i = 1$). As a result, speech dropout or added noise may be unavoidable.
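The failure case described above can be reproduced numerically. The sketch below again assumes the form `FX = sum(A_i * (LP_i - standard_i))` for equation (1), which is not reproduced in this text. The band powers and the standard pattern are made-up illustrative numbers, not values from the publication.

```python
def discriminant_value(lp, coeffs, pattern):
    """Assumed form of equation (1): FX = sum_i A_i * (LP_i - standard_i)."""
    return sum(a * (x - m) for a, x, m in zip(coeffs, lp, pattern))


coeffs = [1.0, 1.0, 1.0]        # A_i = 1, as in the text
pattern = [30.0, 30.0, 30.0]    # hypothetical standard pattern

# Two clearly different spectral shapes (low, mid, high log band powers).
spectrum_a = [10.0, 20.0, 60.0]  # energy concentrated in the high band
spectrum_b = [60.0, 20.0, 10.0]  # energy concentrated in the low band

fx_a = discriminant_value(spectrum_a, coeffs, pattern)
fx_b = discriminant_value(spectrum_b, coeffs, pattern)
print(fx_a, fx_b)  # both values are identical
```

With equal coefficients the discriminant depends only on the sum of the band powers, so any two spectra with the same total are indistinguishable regardless of their shape.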

This problem arises because the number of parameters is small and the choice of band filters is not appropriate. Moreover, because there is no theoretical basis for selecting the parameters, the band filter passbands have to be set by trial and error. This takes a great deal of effort, and the resulting parameters are still not necessarily appropriate. Increasing the number of band filters, and hence the number of parameters, would reduce speech/silence decision errors. However, the computation needed for the discriminant functions would grow, and the effort required to set the parameters would become enormous. The above publication also states that the Mahalanobis distance can be used in place of the linear discriminant function of equation (1), but using the Mahalanobis distance increases the amount of computation still further.

(Problems to Be Solved by the Invention) As described above, the conventional speech/silence detection method uses a small number of parameters in order to reduce the amount of computation, so the speech/silence decision may be wrong and speech dropout or added noise may be unavoidable. In addition, the conventional method has no theoretical criterion for selecting the parameters, so selecting them takes a great deal of effort.

The present invention was made to solve these problems. Its object is to provide a speech/silence detection system that detects speech and silence with high accuracy and causes little speech dropout or added noise.

[Constitution of the Invention]

(Means for Solving the Problems) To solve the above problems, the present invention comprises: means for obtaining feature parameters, such as signal power and LPC coefficients, that represent the characteristics of an acoustic signal such as a speech signal; means for projecting the feature parameters of each frame onto the principal component vector space of each category, where the principal component vectors of each category are obtained by determining the feature parameters of speech signals, including the noise expected when communication equipment or a processing device such as a telephone or a recognition device is used, classifying the autocorrelation matrices of those feature parameters into a plurality of categories, and applying principal component analysis to each category; means for deciding, for each category, whether the feature parameters belong to that category, using the position of the projection point of the feature parameters in the principal component vector space and a specific region predetermined in that space; and means for making the speech/silence decision by combining the decision results for the individual categories.

(Operation) First, the feature parameters of an acoustic signal such as a speech signal are obtained. Next, consider transforming those parameters into another set of parameters and then using fewer parameters than the original feature parameters. FIG. 4 illustrates this concept. In FIG. 4, let $X$ be the vector whose elements are the $L$ original feature parameters $x_i$ ($i = 1, 2, \ldots, L$). The transformation is orthogonal, with transformation matrix $A$. Let $Y$ be the vector whose elements are the transformed feature parameters $y_i$ ($i = 1, 2, \ldots, L$), and let $\hat{Y}$ be the feature parameter vector in which $N$ parameters $y_j$ ($j = 1, 2, \ldots, N$) are kept and the remaining $(L - N)$ are set to zero, where $N < L$.

The error vector $e$ introduced by reducing the number of parameters is then the difference between the original feature parameter vector $X$ and the inverse transform of $\hat{Y}$:

$e = X - A^{-1}\hat{Y} = A^{-1}(Y - \hat{Y})$   (2)

If the transformation is chosen to minimize the mean square value of this error, $\varepsilon^2 = E(e^t e)$, the error caused by reducing the number of feature parameters is minimized. Here $t$ denotes the transpose and $E$ the expected value. It is known that the transformation minimizing $\varepsilon^2$ is the one given by the matrix $A$ whose row vectors are the eigenvectors of the autocorrelation matrix of $x_i$, that is, the KL transform. These eigenvectors are identical to the principal component vectors obtained by principal component analysis of $x_i$: the eigenvectors taken in decreasing order of eigenvalue correspond to the first, second, third, ... principal component vectors.

Applying the KL transform to the $L$ feature parameters $X$ and then reducing the number of parameters corresponds to projecting $X$ onto the $N$-dimensional principal component vector space whose coordinate axes are the first to $N$-th principal component vectors. Therefore, by projecting the feature parameters onto the principal component vector space, the number of feature parameters can be reduced while minimizing the error incurred by representing the original feature parameters in fewer dimensions, in other words, while minimizing the loss of the information carried by the original feature parameters.
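The truncation error of equation (2) can be illustrated with a small numeric sketch. The Python below uses an arbitrary 2-D rotation as the orthogonal matrix `A`, chosen only for illustration and not derived from any data. All helper names are hypothetical. For an orthogonal `A`, the inverse is the transpose, and the squared length of `e` equals the energy of the discarded components of `Y`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mat_vec(a: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(r * v for r, v in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def truncate(a: Matrix, x: List[float], n: int) -> Tuple[List[float], List[float], List[float]]:
    """Return (Y, Y_hat, e) where Y = A X, Y_hat keeps the first n
    components of Y, and e = X - A^{-1} Y_hat with A^{-1} = A^t."""
    y = mat_vec(a, x)
    y_hat = y[:n] + [0.0] * (len(y) - n)
    x_rec = mat_vec(transpose(a), y_hat)
    e = [xi - ri for xi, ri in zip(x, x_rec)]
    return y, y_hat, e


c = s = math.sqrt(0.5)          # 45-degree rotation, illustrative only
A = [[c, s], [-s, c]]
y, y_hat, e = truncate(A, [3.0, 1.0], 1)
```

Choosing the rows of `A` as the leading eigenvectors of the autocorrelation matrix is what makes the discarded energy as small as possible on average; the sketch only shows the bookkeeping of equation (2).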

Because of differences in their characteristics, for example in spectral shape, the feature parameters of speech-present portions and of silent portions are distributed in specific regions of the principal component vector space.

The speech/silence decision basically exploits this property. It can be made with high accuracy by comparing the projection point of the feature parameters in the principal component vector space with speech and silence regions predetermined in that space.

To make the speech/silence decision more accurate, the present invention classifies the feature parameters of speech-present portions into a plurality of categories, makes a speech/silence decision within each category, and combines the results into the final decision. This is because, even among sounds that are all called speech, the feature parameters differ, for example, between vowels and consonants, between male and female voices, and, among consonants, from one phoneme to another. Accordingly, if speech is classified into as many categories as possible and the decision is made within each category, the accuracy of speech/silence detection improves. However, making the number of categories as large as in speech recognition increases the scale of the device, so the number of categories must be narrowed down to an appropriate value.

The method of classifying into categories and of narrowing down the number of categories is now described. Let the feature parameter vectors be $X_i$ ($i = 1, 2, \ldots, M$). The transformation that minimizes the mean square error of equation (2) over all $X_i$ is formed from the eigenvectors obtained by principal component analysis of the autocorrelation matrix calculated by the following equation.

$R = \frac{1}{M} \sum_{i=1}^{M} X_i X_i^t$   (3)

Here $x_{i1}, x_{i2}, \ldots, x_{iL}$ are the feature parameters that form the elements of the vector $X_i$.

From equation (3), the autocorrelation matrix $R$ is the average, in the $L^2$-dimensional space, of the autocorrelation matrices $R_i$ of the individual feature parameter vectors given by the following equation; that is, it is their centroid.

$R_i = X_i X_i^t$   (4)

When the $M$ autocorrelation matrices $R_i$ ($i = 1, 2, \ldots, M$) are to be represented by a single autocorrelation matrix $R$, the $R$ obtained from equation (3) minimizes the mean square error $E$ between $R_i$ and $R$ defined by the following equation.

$E = \frac{1}{M} \sum_{i=1}^{M} \sum_{k=1}^{L} \sum_{j=1}^{L} \left( R_i(k,j) - R(k,j) \right)^2$

This is because partially differentiating the above equation with respect to $R(k,j)$ and setting the result to zero gives

$R(k,j) = \frac{1}{M} \sum_{i=1}^{M} R_i(k,j)$

which is the $R$ of equation (3). Here $R(k,j)$ and $R_i(k,j)$ are the $(k,j)$ elements of the autocorrelation matrices $R$ and $R_i$, respectively.
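The centroid property stated above can be checked numerically. The Python sketch below assumes that the autocorrelation matrix of a single vector is the outer product of the vector with itself, which is consistent with the training vector defined later in the embodiment. The function names and the sample vectors are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def autocorr(x: List[float]) -> Matrix:
    """Autocorrelation matrix of one vector, taken as the outer product x x^t."""
    return [[a * b for b in x] for a in x]


def mean_matrix(mats: List[Matrix]) -> Matrix:
    """Element-wise average (centroid) of the matrices."""
    m, n = len(mats), len(mats[0])
    return [[sum(r[k][j] for r in mats) / m for j in range(n)] for k in range(n)]


def mean_sq_error(mats: List[Matrix], r: Matrix) -> float:
    """Mean over i of the sum over (k, j) of (R_i(k,j) - R(k,j))^2."""
    n = len(r)
    total = sum((ri[k][j] - r[k][j]) ** 2
                for ri in mats for k in range(n) for j in range(n))
    return total / len(mats)


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up sample vectors
mats = [autocorr(x) for x in vectors]
R = mean_matrix(mats)

# Any other matrix gives a larger error than the centroid.
perturbed = [row[:] for row in R]
perturbed[0][0] += 0.1
print(mean_sq_error(mats, R) < mean_sq_error(mats, perturbed))
```

The same property is what the category classification relies on: within each category, the averaged matrix is the best single representative in the squared-error sense.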
This is because π(k, j) and 11(lc, j) are respectively self-phase 1 column row 91J R and R1. (k, j) elements.

Based on this idea, the feature parameters of speech-present portions are classified into a predetermined number of categories, and a representative autocorrelation matrix is obtained for each category. Specifically, the technique known as the LBG algorithm is used.

First, a large number of feature parameter vectors $X_i$ ($i = 1, 2, \ldots, M$) are obtained from the speech-present portions of speech collected in advance in the environment of use, such as a telephone, and the autocorrelation matrix $R_i$ of each $X_i$ is calculated according to equation (4). Next, the vector whose elements are the row vectors of the matrix $R_i$ is taken as the training vector $t_i$, and the LBG algorithm is applied to obtain a predetermined number of representative vectors $A_j$ ($j = 1, 2, \ldots, N$) and the partition $P(A_j)$. The feature parameter vectors $X_i$ that produced the autocorrelation matrices $R_i$ belonging to the partition $P(A_j)$ become the members of the $j$-th category, and the matrix $R_j$ formed from the elements of the representative vector $A_j$ becomes the representative autocorrelation matrix of the $j$-th category. The LBG algorithm is described in detail in Y. Linde, A. Buzo and R. M. Gray: "An algorithm for vector quantizer design", IEEE Trans., COM-28, No. 1, pp. 84-95 (January 1980) (Reference 2).

By the above method, the feature parameter vectors are classified into a predetermined number of categories, and the representative autocorrelation matrix $R_j$ of each category is obtained. Because this method uses the LBG algorithm, the mean square error incurred when the $M$ autocorrelation matrices $R_i$ ($i = 1, 2, \ldots, M$) are represented, or approximated, by $N$ ($< M$) representative autocorrelation matrices $R_j$ ($j = 1, 2, \ldots, N$) is minimized.
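The classification step above can be sketched as an iteration in the style of the LBG (generalized Lloyd) algorithm. The Python below is a minimal illustration that assumes squared-error distortion, a caller-supplied initial codebook, and a relative-improvement stopping rule. The function names, the `max_iter` guard, and the handling of empty cells are hypothetical choices, not details taken from the publication.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(training: List[Vector], codebook: List[Vector],
        eps: float = 1e-3, max_iter: int = 100) -> Tuple[List[Vector], List[List[Vector]]]:
    """Return (representative vectors, partition cells)."""
    codebook = [list(c) for c in codebook]
    prev = float("inf")                       # distortion before the first pass
    cells: List[List[Vector]] = [[] for _ in codebook]
    for _ in range(max_iter):
        # Partition: assign each training vector to its nearest representative.
        cells = [[] for _ in codebook]
        total = 0.0
        for t in training:
            d, j = min((sq_dist(t, c), j) for j, c in enumerate(codebook))
            cells[j].append(t)
            total += d
        dist = total / len(training)
        # Stop when the relative improvement falls below the threshold.
        if dist == 0.0 or (prev - dist) / dist <= eps:
            break
        prev = dist
        # Update: replace each representative by the centroid of its cell.
        for j, cell in enumerate(cells):
            if cell:                          # leave empty cells unchanged
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook, cells
```

In the setting of the text, each training vector would be a flattened autocorrelation matrix, and each resulting representative vector would be reshaped into the representative autocorrelation matrix of its category. Like any Lloyd-type iteration, the sketch converges to a local minimum that depends on the initial codebook.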

Next, principal component analysis is applied to the representative autocorrelation matrix $R_j$ of each category to obtain the principal component vectors of that category. In addition, for each category, the feature parameter vectors belonging to the category are projected onto the principal component vector space whose coordinate axes are the principal component vectors. In this way the region used to decide whether a feature parameter belongs to the category is determined in advance in the principal component vector space.
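One simple way to obtain the leading principal component vectors of a representative autocorrelation matrix is power iteration with deflation, sketched below in Python. This is a swapped-in illustrative technique, since the text does not say how the eigenvectors are computed. The function names, the fixed start vector, and the iteration count are assumptions. The sketch presumes a symmetric positive semidefinite matrix with distinct leading eigenvalues and a start vector that is not orthogonal to the wanted eigenvector.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def power_iteration(r: Matrix, iters: int = 200) -> Tuple[float, List[float]]:
    """Return (eigenvalue, unit eigenvector) for the dominant eigenpair of r."""
    n = len(r)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^t R v gives the eigenvalue estimate.
    lam = sum(v[i] * sum(r[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def principal_components(r: Matrix, k: int) -> List[Tuple[float, List[float]]]:
    """Return the k leading (eigenvalue, eigenvector) pairs, largest first."""
    work = [row[:] for row in r]
    n = len(work)
    comps = []
    for _ in range(k):
        lam, v = power_iteration(work)
        comps.append((lam, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return comps
```

For example, `principal_components(R_j, 3)` would return candidates for the first to third principal component vectors stored for a category in the embodiment, where `R_j` is a placeholder for that category's representative autocorrelation matrix.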

The speech/silence decision is made by projecting the feature parameter vector obtained for each frame onto the principal component vector space of each category, deciding for every category whether the vector belongs to that category by comparing the projection point with the predetermined region, and then combining the decision results of the categories.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

First, a speech cell assembly device for ATM communication that uses the present invention is described with reference to FIG. 8. This device is intended to use the line efficiently and to transmit at high speed. On one path, the input speech signal is encoded by a speech encoder 702. On the other path, the noise signal is encoded by a noise encoder 703. The encoded signals are assembled into cells by a cell assembly unit 705 and transmitted. When the speech is encoded, the speech-present and silent portions are detected by a speech/silence detector 701, and a switch 706 is controlled so that only the speech-present portions are assembled into cells. For the noise as well, a significant-noise detector 704 detects only significant noise, and only that noise is encoded.

The noise is included to make the speech sound natural. When something other than speech, that is, silence, is detected, the switch 706 is set to the noise encoder 703 side and the noise is transmitted. Possible transmission methods (systems) for this block diagram include the following.
(1) A method in which the speech-present portions and the silent portions (including noise) are encoded differently before transmission, or are transmitted at different bit rates (24 kbps and 8 kbps).
(2) A method, not shown in this block diagram, in which only the noise (a silent frame) is transmitted to the receiving side at an early stage (for example, at connection time) and is continuously reproduced on the receiving side, and the noise is retransmitted only when a predetermined change in the noise is detected.
(3) A method in which only the speech is transmitted and silence is not sent at all. In case (3), the receiving side can reproduce noise using, for example, white noise that it holds itself.

The speech/silence detection used in the above system is now described in detail.

FIG. 1 is a block diagram of a speech/silence detector according to an embodiment of the present invention. In FIG. 1, 110 is an LPC cepstrum extraction circuit, which calculates the LPC cepstrum $C_i$ ($i = 1, 2, \ldots, P$) of the signal input from the input terminal 100 by a known method for each frame (10 ms). Here $P$ is the analysis order, for example $P = 16$. The method of calculating the LPC cepstrum is described, for example, in Sadaoki Furui: "Digital Speech Processing" (Tokai University Press, 1985). 120 is the feature parameter projection circuit #1, which projects the LPC cepstrum vector $C = (C_1, C_2, \ldots, C_P)^t$ obtained by 110 onto the predetermined principal component vector space of category #1. Similarly, 130 is the feature parameter projection circuit #2, which projects the LPC cepstrum vector $C$ onto the predetermined principal component vector space of category #2. In this embodiment the number of categories is 10, and a feature parameter projection circuit is provided for each category.

FIG. 2 is a block diagram showing a configuration example of the feature parameter projection circuit #1 (120) shown in FIG. 1. It consists of an inner product calculation circuit 210 and a category #1 principal component vector memory 200. The feature parameter projection circuits #2 to #10 are configured in the same way.

The category principal component vector memory stores the three principal component vectors $V_1$, $V_2$, $V_3$ (the first to third principal component vectors) of category #1, which are obtained in advance. The method of classifying into categories and of obtaining the principal component vectors is described later.

The inner product calculation circuit 210 calculates the inner products of the LPC cepstrum vector $C$ output by 110 with the principal component vectors $V_1$ to $V_3$ according to the following equation. This projects $C$ onto the three-dimensional principal component vector space whose coordinate axes are $V_1$, $V_2$, $V_3$, and gives the coordinates $Q_1$, $Q_2$, $Q_3$ of the projection point $Q$.

$Q_i = \sum_{j=1}^{P} V_{ij} C_j \quad (i = 1, 2, 3)$

Here $V_{ij}$ are the elements of the principal component vector $V_i$. The projection point coordinates $Q_i$ ($i = 1, 2, 3$) obtained by each feature parameter projection circuit are input to the category determination circuits 150, 160, 170, where it is decided for each category whether the LPC cepstrum extracted from the input signal belongs to that category. FIG. 3 is a block diagram showing a configuration example of the category #1 determination circuit shown in FIG. 1. It consists of a category #1 region-defining parameter memory 220 and a determination circuit 230. The category #2 to #10 determination circuits are configured in the same way. The category #1 to #10 region-defining parameter memories store the parameters that define the region of each category in that category's principal component vector space. When the region of each category is a rectangular parallelepiped as shown in FIG. 4, the region is defined by six parameters: $V_{1l}$, $V_{1h}$, $V_{2l}$, $V_{2h}$, $V_{3l}$, $V_{3h}$. These parameters are determined in advance by a method described later. The determination circuit 230 in FIG. 3 decides whether the input feature parameters belong to category #1 according to whether the projection point $Q$ lies inside the region of FIG. 4. That is, when

$V_{1l} < Q_1 < V_{1h}$ and $V_{2l} < Q_2 < V_{2h}$ and $V_{3l} < Q_3 < V_{3h}$

it decides that the feature parameters belong to category #1 and outputs "1"; otherwise it outputs "0".
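The projection and region test for a single category can be sketched as follows. The Python below is illustrative only: the function names are hypothetical, the comparisons are taken as strict inequalities, and the example basis and bounds are made-up numbers rather than trained values.

```python
from typing import List, Sequence


def project(c: Sequence[float], basis: Sequence[Sequence[float]]) -> List[float]:
    """Coordinates Q_i of the projection point: inner products of C with each V_i."""
    return [sum(v * x for v, x in zip(vec, c)) for vec in basis]


def category_output(c: Sequence[float], basis: Sequence[Sequence[float]],
                    lows: Sequence[float], highs: Sequence[float]) -> int:
    """Return 1 if the projection point lies inside the box region, else 0."""
    q = project(c, basis)
    inside = all(lo < qi < hi for qi, lo, hi in zip(q, lows, highs))
    return 1 if inside else 0


# Made-up example: 4-dimensional feature vector, 3 principal component vectors.
basis = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
lows, highs = [0.0, -1.0, -1.0], [1.0, 1.0, 1.0]
print(category_output([0.5, -0.2, 0.1, 9.0], basis, lows, highs))  # inside the box
print(category_output([2.0, -0.2, 0.1, 9.0], basis, lows, highs))  # outside on axis 1
```

Each category costs only three inner products and six comparisons per frame, which is the source of the low computational load compared with distance measures that need a full covariance matrix.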

Returning to FIG. 1, the speech/silence determination circuit 180 combines the outputs of the category #1 to category #10 determination circuits to make the speech/silence decision.

Specifically, it ORs the outputs of the category #1 through category #10 judgment circuits; if the result is "1" the input is judged to be speech, and if it is "0" the input is judged to be silence.
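
As a minimal sketch of this combining step, assuming each category judgment circuit delivers a 0/1 value, the OR operation can be written as follows. The name `speech_decision` is hypothetical.

```python
def speech_decision(category_outputs):
    """OR the per-category 0/1 outputs: 1 means speech, 0 means silence."""
    return 1 if any(category_outputs) else 0


# Ten category outputs, as in the embodiment with categories #1 to #10.
print(speech_decision([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]))  # 1 (one category matched)
print(speech_decision([0] * 10))                        # 0 (no category matched)
```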

The above describes the operation of the speech/silence detector according to one embodiment of the present invention. The following describes how the feature parameters are classified into a plurality of categories, and how the principal component vectors of each category and the region each category occupies in its principal component vector space are obtained.

First, a large number (M) of LPC cepstrum vectors $C_i$ ($i = 1, 2, \ldots, M$) are obtained from the speech portions of voice collected in advance under telephone usage conditions, and the autocorrelation matrix $R_i$ of each $C_i$ is computed according to equation (9). Next, the $P^2$-dimensional vector whose elements are the row vectors of the matrix $R_i$ is taken as the training vector $t_i$.

$$t_i = (c_{i1}c_{i1},\ c_{i1}c_{i2},\ \ldots,\ c_{i1}c_{iP},\ c_{i2}c_{i1},\ c_{i2}c_{i2},\ \ldots,\ c_{i2}c_{iP},\ \ldots,\ c_{iP}c_{iP})^t$$

where $c_{i1}, c_{i2}, \ldots, c_{iP}$ are the elements of the LPC cepstrum vector $C_i$.
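
The construction of a training vector can be illustrated as follows. This sketch assumes that the entries of $R_i$ are the pairwise products $c_{ij}c_{ik}$, which is what the element list of $t_i$ above implies; equation (9) itself is not reproduced here, and the name `training_vector` is hypothetical.

```python
def training_vector(c):
    """Flatten the P x P products c[j] * c[k], row by row, into P^2 elements."""
    return [cj * ck for cj in c for ck in c]


# A toy 3-element "cepstrum" vector (illustrative values only).
t = training_vector([1.0, 2.0, 3.0])
print(t)  # [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 3.0, 6.0, 9.0]
```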

Here, $t$ denotes matrix transposition. The LBG algorithm is applied to the training vectors $t_i$ to obtain N representative vectors $y_j$ ($j = 1, 2, \ldots, N$) and the corresponding partition, as follows.

Step 0: Initialization. Given the number of representative vectors N, the distortion threshold $\varepsilon$, the initial set of representative vectors $A_0$, and the training vectors $t_i$ ($i = 1, 2, \ldots, M$), set $m = 0$ and $D_{-1} = \infty$.

Step 1: For the given set of representative vectors $A_m = \{y_j;\ j = 1, \ldots, N\}$, use the training vectors $t_i$ to find the partition $P(A_m) = \{S_i;\ i = 1, 2, \ldots, N\}$ that gives the minimum average distortion.

That is, the partition is chosen so that every $t_i$ belonging to the partition region $S_i$ satisfies $d(t_i, y_i) < d(t_i, y_j)$, $j = 1, 2, \ldots, N$. Here $d(t_i, y_i)$ is the distortion between $t_i$ and $y_i$, and it can be defined as the squared error, as follows.

$$d(t_i, y_i) = \sum_{k=1}^{P^2} \bigl(t_i(k) - y_i(k)\bigr)^2$$

where $t_i(k)$ and $y_i(k)$ are the elements of $t_i$ and $y_i$.

The minimum average distortion for the partition $P(A_m)$ is then computed by the following equation.

$$D_m = D[(A_m, P(A_m))]$$

The LPC cepstrum vector $C_j$ from which a training vector $t_j$ belonging to $S_i$ was formed is thereby a member of the $i$-th category. In addition, the matrix $R_i$ obtained by rearranging the elements of the representative vector $y_i$ ($i = 1, 2, \ldots, N$) serves as the representative autocorrelation matrix of the $i$-th category.

Step 2: Convergence check. If $(D_{m-1} - D_m)/D_m < \varepsilon$, stop the process and take $A_m$ as the final set of representative vectors.

Step 3: Iteration. Take the set of representative vectors obtained from the current partition as $A_{m+1}$, set $m = m + 1$, and return to Step 1.

In this embodiment, the initial settings are $N = 10$, $\varepsilon = 0.01$, and $M = 10000$.
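
Steps 0 through 3 can be sketched in Python as below. This is a rough illustration under stated assumptions, not the patented implementation: the names `squared_error` and `lbg` are hypothetical, the representative vectors of Step 3 are taken to be the centroids of the partition regions (the usual LBG choice, which the text does not spell out), and the `max_iter` cap is an added safeguard.

```python
def squared_error(t, y):
    """Distortion d(t, y): sum of squared element differences."""
    return sum((a - b) ** 2 for a, b in zip(t, y))


def lbg(training, initial, eps=0.01, max_iter=100):
    """Return (representative vectors, partition) for the training vectors."""
    reps = [list(y) for y in initial]      # Step 0: A_0
    prev_d = float("inf")                  # Step 0: D_{-1} = infinity
    cells = [[] for _ in reps]
    for _ in range(max_iter):
        # Step 1: assign each training vector to its nearest representative.
        cells = [[] for _ in reps]
        total = 0.0
        for t in training:
            dists = [squared_error(t, y) for y in reps]
            j = min(range(len(reps)), key=dists.__getitem__)
            cells[j].append(t)
            total += dists[j]
        d = total / len(training)          # average distortion D_m
        # Step 2: stop when the relative improvement falls below eps.
        if d == 0.0 or (prev_d - d) / d < eps:
            break
        # Step 3: assumed centroid update; empty regions keep their old vector.
        reps = [[sum(col) / len(cell) for col in zip(*cell)] if cell else y
                for cell, y in zip(cells, reps)]
        prev_d = d
    return reps, cells


# Toy data: two well-separated one-dimensional clusters (illustrative only).
training = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
reps, cells = lbg(training, initial=[[0.0], [10.0]])
print(reps)
print([len(cell) for cell in cells])
```

If the loop ends by reaching `max_iter` rather than by convergence, the returned partition is the one computed in the last Step 1, before the final update of the representative vectors.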

The 10 partition regions $S_i$ ($i = 1, 2, \ldots, N$) obtained by the above process become the 10 categories, and the principal component vectors of each category can be obtained in advance by principal component analysis of $R_i$. The parameters that define the region of each category in the principal component vector space can likewise be determined in advance, category by category, by projecting the LPC cepstrum vectors belonging to a category onto that category's principal component vector space.
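
The following Python sketch illustrates one way these quantities could be derived. Several points are assumptions rather than statements from the patent: the eigenvectors of $R_i$ are found by power iteration with deflation (any symmetric eigen-solver would do), the region bounds are taken as the minimum and maximum of the projected member vectors (the text only says the bounds are determined from the projections), and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mat_vec(a, v):
    return [dot(row, v) for row in a]


def to_matrix(y, p):
    """Rearrange a P^2-element representative vector into a P x P matrix."""
    return [y[i * p:(i + 1) * p] for i in range(p)]


def principal_components(r, k, iters=500):
    """Top-k eigenvectors of a symmetric positive semidefinite matrix r."""
    n = len(r)
    a = [row[:] for row in r]
    vectors = []
    for idx in range(k):
        v = [1.0 / (j + 1 + idx) for j in range(n)]  # arbitrary nonzero start
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(a, v))                  # eigenvalue estimate
        vectors.append(v)
        # Deflation: remove the component just found before the next pass.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def project(c, vectors):
    """Projection point Q: inner product of c with each principal vector."""
    return [dot(c, v) for v in vectors]


def region_bounds(members, vectors):
    """Assumed rule: per-axis min and max of the projected member vectors."""
    points = [project(c, vectors) for c in members]
    lower = [min(col) for col in zip(*points)]
    upper = [max(col) for col in zip(*points)]
    return lower, upper


# Toy 2 x 2 example (illustrative values only).
r = to_matrix([2.0, 0.0, 0.0, 1.0], 2)
pcs = principal_components(r, 2)
print(region_bounds([[1.0, 2.0], [3.0, -1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Eigenvectors are only defined up to sign, so the projected coordinates and bounds must be computed with the same stored vectors that the detector later uses.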

In the above embodiment, three parameters per category, $Q_1$ through $Q_3$, are used for the speech/silence decision, but this number can be set arbitrarily. Whatever the number of parameters, the error caused by reducing it below the original number of feature parameters is minimal. Likewise, the number of categories is set to 10 in this embodiment, but it too can be chosen freely, and in every case the error caused by classifying the feature parameters into a finite number of categories is minimal.

Furthermore, in this embodiment, whether the input feature parameter belongs to a category is decided by whether its projection point in the principal component vector space falls within the specific region set for that category, but the decision can instead be based on the distance between the centroid of the region and the projection point. For example, let the centroid of the region be $V = (V_1, V_2, V_3)$, define the distance D by the following equation, and compare D with a predetermined threshold Th: if D < Th, the parameter is judged to belong to the category, and if D > Th, it is judged not to belong.

Here, $a_i$ is a weighting coefficient.
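
A distance-based variant of the category test might look like the sketch below. The exact form of equation (14) is not legible in this text, so the weighted squared distance used here is an assumption, chosen because it yields a spherical region when all weights are 1, as the description states. The name `belongs_by_distance` and the numeric values are hypothetical.

```python
def belongs_by_distance(q, centroid, weights, threshold):
    """Return 1 if the assumed weighted squared distance D is below Th, else 0."""
    d = sum(a * (qk - vk) ** 2 for a, qk, vk in zip(weights, q, centroid))
    return 1 if d < threshold else 0


centroid = [0.0, 0.0, 0.0]   # V = (V_1, V_2, V_3), illustrative values
weights = [1.0, 1.0, 1.0]    # a_i = 1 gives a spherical region

print(belongs_by_distance([1.0, 0.0, 0.0], centroid, weights, 2.0))  # 1 (D = 1 < 2)
print(belongs_by_distance([1.0, 0.0, 0.0], centroid, weights, 0.5))  # 0 (D = 1 > 0.5)
```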

Conventional speech/silence detection methods have used distance-based decisions; region-based decisions have not been used before. Because a region-based decision can discriminate speech from silence even when speech or silence is distributed over an unusually shaped region of the principal component vector space, it improves the accuracy of speech/silence detection. For example, when $a_i = 1$ ($i = 1, 2, 3$) in equation (14), the region where D < Th is the interior of a sphere. With a distance-based decision, the shape of the speech region is thus fixed by the definition of the distance and cannot be set arbitrarily, whereas with a region-based decision any shape can be set.

As for the feature parameters of the input signal, besides the LPC cepstrum, the signal power, zero-crossing count, LPC coefficients, autocorrelation coefficients, DFT coefficients, and combinations of these can be used.

As described above, the present invention makes the speech/silence decision after projecting the feature parameters onto the principal component vector space. Consequently, even when the number of parameters used for the decision is reduced, the loss of the information carried by the original feature parameters is minimal, speech/silence detection accuracy is high, and speech dropouts and added noise caused by erroneous speech/silence decisions are reduced. In addition, there is a theoretical criterion for selecting the parameters used for the decision: first compute many feature parameters, then use the parameters obtained by taking inner products of those feature parameters with the principal component vectors in order, starting from the first principal component vector (the one with the largest eigenvalue) and proceeding to the second, third, and so on. Doing so minimizes the loss of the information carried by the original feature parameters, which makes it easy to set the parameters used for the speech/silence decision.

Furthermore, in the present invention, the autocorrelation matrices of the feature parameters are classified into a plurality of categories, the feature parameter of each frame is projected onto the principal component vector space of each category obtained by principal component analysis of that category, a speech/silence decision is made for each category, and the results are combined to detect speech or silence. This improves speech/silence detection accuracy. Moreover, because the LBG algorithm is used to partition the categories and to obtain the principal component vectors of each category, the error caused by classifying the autocorrelation matrices of the M feature parameters into fewer than M categories can be minimized, which raises speech/silence detection accuracy.

[Effect of the Invention]

As described above, according to the present invention, speech and silence can be determined accurately despite the small amount of computation, which also improves the reliability of the system.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech/silence detector according to one embodiment of the present invention. FIG. 2 is a block diagram showing an example configuration of the feature parameter projection circuit #1 shown in FIG. 1. FIG. 3 is a block diagram showing an example configuration of the category #1 judgment circuit shown in FIG. 1. FIG. 4 is a diagram showing the region used, in the principal component vector space according to one embodiment of the present invention, to decide whether a feature parameter belongs to a category. FIG. 5 is a diagram showing the concept of reducing the number of feature parameters, used in the detailed description of the invention. FIG. 6 is a block diagram of a conventional speech/silence detection device. FIG. 7 is a diagram showing examples of spectra that a conventional speech/silence detection device judges to have the same spectral shape. FIG. 8 is a block diagram of a speech cell assembly device of the present invention.

100: input terminal
110: LPC cepstrum extraction circuit
120, 130, 140: feature parameter projection circuits
150, 160, 170: category judgment circuits
180: speech/silence decision circuit
200: category principal component vector memory
210: inner product calculation circuit
220: category region-defining parameter memory
230: judgment circuit
600: energy extraction unit
610: spectral shape extraction unit
620: feature extraction unit
630: multiplexer
640: speech/silence decision unit
650: threshold memory
660: standard pattern memory
670: start/end point candidate detection unit
680: start/end point determination unit

Claims

[Claims]

(1) A speech/silence detection method comprising: means for obtaining a feature parameter representing the acoustic characteristics of a speech signal; means for projecting the feature parameter obtained by that means onto the principal component vector space of each of a plurality of categories into which the autocorrelation matrices of feature parameters of preset speech signals including noise have been classified; means for deciding, for each category, whether the feature parameter belongs to that category by using the position of the projection point of the feature parameter in the principal component vector space of that category and a specific region; and means for deciding between speech and silence by using the decision results obtained for each category by that means.

(2) The speech/silence detection method according to claim 1, wherein the means for obtaining the feature parameter obtains it using signal power, LPC coefficients, or the like representing the characteristics of the acoustic signal.

(3) A speech/silence detection method wherein the preset principal component vector space of the feature parameters of the speech portion or the silent portion is obtained by principal component analysis of the feature parameters of the device used.