JPH0740200B2

JPH0740200B2 - Voice section detection method

Info

Publication number: JPH0740200B2
Application number: JP61079304A
Authority: JP
Inventors: 陽一山田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-04-08
Filing date: 1986-04-08
Publication date: 1995-05-01
Anticipated expiration: 2010-05-01
Also published as: JPS62237498A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a voice section detection method for use in a voice recognition device.

(Prior Art) Voice sections have conventionally been detected as part of voice recognition processing. The conventional voice section detection method is first described with reference to FIG. 4.

In the conventional voice section detection method, the input level value of the voice signal is treated as a function S(t) of time t, and a level threshold L is set from quantities such as the noise level value N measured when the signal is input and the input level value S(t). When the state in which the input level value S(t) exceeds the threshold L (S(t) > L) continues for at least a fixed time, namely the start determination high-level input minimum duration TS, the start time of that duration TS is taken as the start point of the voice section. Subsequently, when the state in which the input level value S(t) is at or below the threshold L (S(t) ≦ L) continues for at least a fixed time, namely the end determination low-level input duration TE, the start time of that duration TE is taken as the end point of the voice section. The voice section has been determined by this procedure.

In this case, the noise level value N is the average of the input level value S(t) over a continuous noise measurement interval TN of predetermined length, starting at a time t0 at which no voice signal is assumed to be input. The level threshold L is then generally set to the noise level value N plus a predetermined constant C0, that is, L = N + C0.

With this method, let t1, t2, t3, and t4 be the times in FIG. 4 at which the input level value S(t) coincides with the level threshold L. The start of a section in which the input level value S(t) exceeds the level threshold L for at least the duration TS described above, for example time t3, is determined to be the start point of the voice section.

Next, the start of a section in which the input level value stays at or below the level threshold L for at least the duration TE described above, for example time t4, is determined to be the end point of that voice section.
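The conventional procedure described above can be sketched in Python as follows. This is a minimal illustration, not code from the patent: it assumes S(t) has been sampled into a list of frame levels, that TN, TS, and TE are expressed as frame counts, and that the noise measurement interval starts at index 0. The names `first_run` and `detect_conventional` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_run(flags: Sequence[bool], min_len: int, begin: int = 0) -> Optional[int]:
    """Return the index at which the first run of >= min_len True values starts."""
    run_start = None
    for i in range(begin, len(flags)):
        if flags[i]:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_len:
                return run_start
        else:
            run_start = None
    return None


def detect_conventional(
    s: Sequence[float], noise_frames: int, c0: float, ts: int, te: int
) -> Optional[Tuple[int, int]]:
    """Single-threshold detection: L = N + C0 is used for both start and end."""
    n = sum(s[:noise_frames]) / noise_frames      # noise level value N
    level = n + c0                                # level threshold L
    start = first_run([x > level for x in s], ts, noise_frames)
    if start is None:
        return None
    end = first_run([x <= level for x in s], te, start)
    if end is None:
        return None
    return start, end
```

In this sketch a burst shorter than TS frames is ignored, which corresponds to the threshold crossings in FIG. 4 that do not become the start point.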

(Problems to Be Solved by the Invention) When a voice recognition device is actually used in an environment with severe noise fluctuation, however, the probability that the input level value at some time after the noise level measurement differs from the input level value at the time of measurement is generally considered to grow with the elapsed time. At a time shortly after the noise level measurement, for example immediately before the utterance begins, the input level is therefore relatively unlikely to differ from the measured noise level value, so the start point can, with high probability, be detected stably and accurately using a level threshold set in the conventional manner. At a time long after the noise level measurement, for example immediately after the utterance ends, the input level is likely to differ from the measured noise level value. By the time the end point is to be detected, the level threshold set at the outset is no longer suited to end detection, so the end point is more likely to be detected inaccurately, which in turn degrades voice recognition performance.

The present invention has been made to solve the problem described above.

Accordingly, an object of the present invention is to provide a voice section detection method that can detect the end point of a voice section stably and accurately.

(Means for Solving the Problems) To achieve this object, the voice section detection method according to the present invention performs the following processing (see FIG. 1).

According to the present invention, a provisional end is determined in a first stage, and the true end is determined in a second stage.

In the first stage, a level value that is smaller than the start detection level threshold LS, obtained by adding a predetermined positive constant C1 to the noise level value N of the input voice signal, and larger than the noise level value N is obtained and set as the provisional end detection level threshold LKE. Next, the start time t3 of a section in which the input level value S(t) stays below the provisional end detection level threshold LKE for at least a predetermined provisional end determination low-level input minimum duration TKE is determined to be the provisional end of the voice section.

In the second stage, the magnitude of the input level value immediately after the provisional end is taken as the end detection noise level value NE, and the value obtained by adding a predetermined positive constant C3 to NE is set as the end detection level threshold LE. Next, the time t2 at which the stored input level value S(t) between the start point and the provisional end crosses the level threshold LE is detected as the true end.

(Operation) Thus, according to the present invention, the level threshold LE for detecting the true end is set from the input level value S(t) immediately after the provisional end is detected. End detection is therefore less susceptible to temporal fluctuation of the noise, and the voice section is detected stably and accurately.

(Embodiment) An embodiment of the voice section detection method of the present invention is described below with reference to the drawings.

FIG. 1 is an explanatory diagram for the voice section detection method of the present invention, showing an example of how the input level value changes, with time t on the horizontal axis and the level value on the vertical axis. FIG. 2 is a block diagram showing one configuration example of a voice section detection unit for carrying out the voice section detection method of the present invention. FIG. 3 is a flowchart of the processing used to explain the method of the present invention.

The voice section detection unit shown in FIG. 2 comprises a level extraction unit 11, a provisional voice section detection threshold setting unit 12, a provisional voice section detection unit 13, a level storage unit 14, an end detection threshold setting unit 15, an end detection unit 16, and a control unit 17. In the following description, processing steps of the flowchart are denoted by S.

First, processing is started (S1). The input voice signal a1 is input to the level extraction unit 11, which extracts its level and converts it into the input level signal a2 (S2). The input level value of this input level signal a2 is denoted S(t) and is shown by the solid line in FIG. 1. The input level signal a2 is output to the provisional voice section detection threshold setting unit 12, the provisional voice section detection unit 13, and the level storage unit 14.

At a time when no voice is assumed to be being uttered, the control unit 17 outputs a provisional voice section detection threshold setting command signal r1 to the provisional voice section detection threshold setting unit 12. The threshold setting unit 12 receives the input level signal a2 for a predetermined time period TSN from the time t0 at which the command signal r1 is input, and sets the average of the input level value S(t) over this period as the noise level value N. Next, the value obtained by adding a positive constant C1, determined in advance by learning, to this noise level value N is determined as the start detection level threshold LS (S3).

LS = N + C1

The signal b1 carrying this level threshold LS is sent to the provisional voice section detection unit 13.

Next, in the provisional voice section detection threshold setting unit 12, the value obtained by adding to this noise level value N a positive constant C2, determined in advance by learning and smaller than the constant C1 used to set the start detection level threshold, is determined as the provisional end detection level threshold LKE (S4).

LKE = N + C2

Alternatively, the provisional end detection level threshold LKE may be set by subtracting a predetermined positive constant C′2 from the start detection level threshold LS, which is obtained by adding the predetermined positive constant C1 to the noise level value N of the input voice signal, so as to obtain a level value smaller than LS and larger than the noise level value N.
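Steps S3 and S4 amount to a few lines of arithmetic. The sketch below is illustrative only. It assumes S(t) is available as a list of frame levels, with t0 given as a frame index and TSN as a frame count. The function names are hypothetical, and the constants C1, C2, and C′2, which the text says are determined by learning, are simply passed in.

```python
from typing import Sequence, Tuple


def provisional_thresholds(
    s: Sequence[float], t0: int, tsn: int, c1: float, c2: float
) -> Tuple[float, float, float]:
    """Return (N, LS, LKE) with LS = N + C1 and LKE = N + C2."""
    if tsn <= 0:
        raise ValueError("TSN must be a positive number of frames")
    if not 0 < c2 < c1:
        raise ValueError("C2 must be positive and smaller than C1")
    window = s[t0:t0 + tsn]
    if len(window) != tsn:
        raise ValueError("not enough frames in the noise measurement period")
    n = sum(window) / tsn          # noise level value N (average over TSN)
    return n, n + c1, n + c2


def lke_from_ls(ls: float, c1: float, c2_prime: float) -> float:
    """Alternative form: LKE = LS - C'2, which lies between N and LS if 0 < C'2 < C1."""
    if not 0 < c2_prime < c1:
        raise ValueError("C'2 must be positive and smaller than C1")
    return ls - c2_prime
```

Both forms give N < LKE < LS; the second simply parameterises the gap from LS instead of the gap from N.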

The provisional voice section detection threshold setting unit 12 sends the signal b2 carrying this level threshold LKE to the provisional voice section detection unit 13, and sends a provisional voice section detection threshold setting end signal b3 to the control unit 17.

On receiving the provisional voice section detection threshold setting end signal b3, the control unit 17 outputs a provisional voice section detection command signal r2 to the provisional voice section detection unit 13.

After receiving the provisional voice section detection command signal r2, the provisional voice section detection unit 13 starts detecting the provisional voice section, taking as inputs the input level signal a2, the start detection level threshold signal b1, and the provisional end detection level threshold signal b2, and detects the start point and the provisional end.

In detecting the start time t1, the unit checks whether, from the time t1 at which the input level value S(t) coincides with the start detection level threshold LS as time passes, the input level value S(t) remains larger than this level threshold LS, that is, S(t) > LS, for at least the start determination high-level input minimum duration TS determined in advance by learning. When this is detected, the start time t1 of that duration TS is determined to be the start point of the voice section (S5).

In detecting the provisional end, the unit checks whether, from the time t3 at which the input level value S(t), having fallen after the start point was detected, coincides with the provisional end detection level threshold LKE, the input level value S(t) remains at or below this level threshold LKE, that is, S(t) ≦ LKE, for at least the provisional end determination low-level input minimum duration TKE determined in advance by learning. When this is detected, the start time t3 of that duration TKE is determined to be the provisional end of the provisional voice section (S6).
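Steps S5 and S6 can be pictured as a small two-state scan over the incoming levels. The following is a sketch under stated assumptions rather than the patented circuit: S(t) is taken to be a list of frame levels, TS and TKE are frame counts, and the function name and the `begin` parameter are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def detect_start_and_provisional_end(
    s: Sequence[float], ls: float, lke: float, ts: int, tke: int, begin: int = 0
) -> Tuple[Optional[int], Optional[int]]:
    """Return (t1, t3) as frame indices; an entry is None if it was not found."""
    t1 = None          # start point, once confirmed
    run_start = None   # first frame of the run currently being timed
    for t in range(begin, len(s)):
        if t1 is None:
            # S5: wait for S(t) > LS to hold for at least TS frames.
            if s[t] > ls:
                if run_start is None:
                    run_start = t
                if t - run_start + 1 >= ts:
                    t1 = run_start
                    run_start = None
            else:
                run_start = None
        else:
            # S6: wait for S(t) <= LKE to hold for at least TKE frames.
            if s[t] <= lke:
                if run_start is None:
                    run_start = t
                if t - run_start + 1 >= tke:
                    return t1, run_start
            else:
                run_start = None
    return t1, None
```

Because LKE is lower than LS, a signal that drops only slightly below LS is not yet treated as having ended; the provisional end is declared only once the level has stayed at or below LKE for TKE frames.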

The signal d1 indicating the start time t1 detected in this way is output to the level storage unit 14, the end detection unit 16, and the control unit 17 (S5), and the signal d2 indicating the detected provisional end time t3 is output to the level storage unit 14, the end detection threshold setting unit 15, the end detection unit 16, and the control unit 17 (S6).

The signals d1 and d2, indicating the start time t1 and the provisional end time t3 respectively, are input to the level storage unit 14. When the start time signal d1 is input, the unit begins storing the input level value S(t) of the input level signal a2 from the start time t1, and continues storing it until a predetermined time, determined in advance by learning, has elapsed after the provisional end time t3.

After receiving the provisional end time signal d2 from the provisional voice section detection unit 13, the control unit 17 outputs an end detection threshold setting command signal r3 to the end detection threshold setting unit 15.

After receiving the end detection threshold setting command signal r3 from the control unit 17, the end detection threshold setting unit 15 receives from the level storage unit 14, as the end detection noise level signal e1, the input level value S(t) for a predetermined end detection noise setting time TEN running in the positive time direction from the time t3 given by the provisional end time signal d2 from the provisional voice section detection unit 13. It sets the average of the input level value S(t) over this noise measurement time TEN as the end detection noise level value NE. It then adds a positive constant C3, determined in advance by learning, to this noise level value NE and sets the result as the end detection level threshold LE (S7), that is,

LE = NE + C3

The end detection level threshold LE can thus be adjusted through the choice of the constant C3.
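Step S7 re-measures the noise just after the provisional end and derives LE from it. A minimal sketch follows, assuming the stored levels are held in a list indexed by frame, that t3 is a frame index, and that TEN is a frame count; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def end_detection_threshold(
    stored: Sequence[float], t3: int, ten: int, c3: float
) -> Tuple[float, float]:
    """Return (NE, LE) where NE averages S(t) over TEN frames from t3 and LE = NE + C3."""
    if ten <= 0 or c3 <= 0:
        raise ValueError("TEN and C3 must be positive")
    window = stored[t3:t3 + ten]
    if len(window) != ten:
        raise ValueError("level storage does not cover TEN frames after t3")
    ne = sum(window) / ten     # end detection noise level value NE
    return ne, ne + c3         # end detection level threshold LE
```

The length check reflects the requirement in the text that the level storage unit keeps recording for a predetermined time after t3.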

The signal f1 carrying this level threshold LE is output to the end detection unit 16, and an end detection threshold setting end signal f2 is output to the control unit 17.

On receiving the end detection threshold setting end signal f2, the control unit 17 outputs an end detection command signal r4 to the end detection unit 16. After receiving the end detection command signal r4, the end detection unit 16 receives from the level storage unit 14 the signal e2 carrying the input level value S(t) of the provisional voice section, which runs from the time t1 given by the start time signal d1 input from the provisional voice section detection unit 13 to the time t3 given by the provisional end time signal d2, and also receives the end detection level threshold signal f1 from the end detection threshold setting unit 15.

The end detection unit 16 then compares the input level value S(t) of the provisional voice section, given by the signal e2, with the end detection level threshold LE, given by the signal f1, working from the provisional end time t3 in the negative time direction. The first time at which the input level value S(t) of the provisional voice section exceeds the end detection level threshold LE, for example t2, is detected as the true end time (S8).
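The backward comparison of step S8 can be sketched as a reverse scan over the stored levels. As before, this is an illustrative assumption-laden sketch: levels are a list indexed by frame, t1 and t3 are frame indices, and `find_true_end` is a hypothetical name.

```python
from typing import Optional, Sequence


def find_true_end(
    stored: Sequence[float], t1: int, t3: int, le: float
) -> Optional[int]:
    """Scan from t3 back toward t1 and return the first frame where S(t) > LE."""
    for t in range(t3, t1 - 1, -1):
        if stored[t] > le:
            return t
    return None  # no stored level in [t1, t3] exceeds LE
```

Scanning backward from t3 means that, if the stored levels cross LE more than once, the crossing closest to the provisional end is the one returned, which matches the wording of the claim.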

The signal g indicating the end time determined in this way is output to the control unit 17, and the voice section detection processing ends (S9).

(Effects of the Invention) As is clear from the above description, in the present invention the end detection level threshold is set on the basis of the average of the input level values immediately after the provisional end, in order to cope with noise fluctuation while the voice recognition device is in use. End detection is therefore not affected by temporal fluctuation of the ambient noise level, so voice sections can be detected stably and accurately, and improved recognition performance of the voice recognition device can be expected.

[Brief description of drawings]

FIG. 1 is an explanatory diagram showing the temporal change of the input level value of an input voice signal, used to explain an embodiment of the voice section detection method according to the present invention. FIG. 2 is a block diagram showing a voice section detection unit used to explain the voice section detection method of the present invention. FIG. 3 is a flowchart of the processing of the voice section detection method of the present invention. FIG. 4 is an explanatory diagram of the conventional voice section detection method.

11: level extraction unit
12: provisional voice section detection threshold setting unit
13: provisional voice section detection unit
14: level storage unit
15: end detection threshold setting unit
16: end detection unit
17: control unit

Claims

1. A voice section detection method in which the average of the input level values of an input voice signal over a predetermined time period preceding the voice section is taken as a noise level value, and the voice section is detected by comparing the input level value with a level threshold set on the basis of that noise level value, characterized in that: a level value that is smaller than a start detection level threshold, obtained by adding a predetermined positive constant to the noise level value of the input voice signal, and larger than the noise level value is set as a provisional end detection level threshold; the start time of a section in which the input level value stays below the provisional end detection level threshold for at least a predetermined provisional end determination low-level input minimum duration is determined to be the provisional end of the voice section; the average of the input level values over a predetermined noise measurement period from the provisional end is taken as an end detection noise level value; a value obtained by adding a predetermined positive constant to that noise level value is set as an end detection level threshold; and, among the times at which the stored input level values from the start point to the provisional end cross the end detection level threshold, the time closest to the provisional end is detected as the true end.