JP2002261553A

JP2002261553A - Automatic audio gain control device, automatic audio gain control method, storage medium for storing computer program having automatic audio gain control algorithm, and computer program having automatic audio gain control algorithm

Info

Publication number: JP2002261553A
Application number: JP2001058171A
Authority: JP
Inventors: Atsushi Yamane; 淳山根
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-03-02
Filing date: 2001-03-02
Publication date: 2002-09-13
Anticipated expiration: 2021-03-02
Also published as: JP4548953B2

Abstract

(57) [Summary] [Problem] To accurately perform automatic level adjustment of input speech even when high-level background noise or unvoiced sound is input.

[Solution] A voice determination means 102, which determines the nature of the input audio signal, is given a voiced sound determination function that judges how likely the input audio signal is to be voiced. Portions of the input speech that are judged to be unvoiced are processed in the same way as silence. The voiced sound determination function calculates, from the frame, the autocorrelation function for a predetermined set of pitches and the ratio of each autocorrelation value to the intensity of the frame, finds the pitch at which this ratio is largest together with the ratio itself, and judges the frame to be voiced if that maximum exceeds a predetermined threshold. As a result, when high-level background noise or unvoiced sound is input, appropriate gain control can be performed without increasing the gain more than necessary.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an automatic audio gain control device that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically determines a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level; to an automatic audio gain control method; to a storage medium storing a computer program having an automatic audio gain control algorithm; and to a computer program having an automatic audio gain control algorithm. Technologies to which the present invention can be applied include voice recording devices, voice communication, voice memos, answering machines, voice message recording devices, and speech recognition devices.

[0002]

[Prior Art] Technology for digitizing and storing speech is used in many places, such as answering machines in telephones and facsimile machines, voice message transmitting devices, and portable voice memos. In these voice recording devices, various speech coding (speech compression) techniques are sometimes used to make efficient use of memory while maintaining high speech quality.

[0003] In these voice recording devices, the level of the input speech can be a problem. When recording an answering-machine message or a voice message, speakers record at widely varying voice volumes and distances from the microphone. In addition, the level at which the input speech is digitized varies with the characteristics of the AD converter that converts the speech to digital form. The latter variation can be evened out by applying level processing matched to the converter characteristics in advance, but the former cannot be fully handled by settings made in advance.

[0004] To solve the above problem, automatic gain control (Auto Gain Control, hereinafter abbreviated as AGC) devices, which automatically adjust the input speech to a desired level, are used. Some such AGC devices perform AGC processing in analog form before AD conversion, but in some cases it is more efficient to perform AGC processing digitally so as to absorb the characteristics of the AD conversion.

[0005] An AGC device based on digital processing generally determines a gain from the ratio between the intensity of the input speech and a target intensity, and multiplies the input speech by that gain. In cases such as when a DSP (digital signal processor) is used as the hardware, division and square-root operations incur a heavy processing load. A recursive method is therefore sometimes used: the intensity of a frame consisting of one or more speech samples is compared with the target intensity, the gain is increased if the frame intensity is smaller than the target intensity and decreased if it is larger, and the resulting gain is used as the gain for the next frame.
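The recursive scheme described above can be sketched as follows. This is a minimal illustration rather than the device's actual procedure: the step factors `up` and `down`, the initial gain, and the choice to compare the intensity of the gain-adjusted frame with the target are all assumptions, since the text says only that the gain is increased or decreased.

```python
def frame_intensity(frame):
    """Frame intensity: sum of squares of the samples."""
    return sum(x * x for x in frame)


def recursive_agc(frames, target_intensity, gain=1.0, up=1.1, down=0.9):
    """Scale each frame by the current gain, then adjust the gain for the next frame.

    `gain`, `up` and `down` are hypothetical tuning values, not taken from the text.
    """
    output = []
    for frame in frames:
        scaled = [gain * x for x in frame]
        output.append(scaled)
        intensity = frame_intensity(scaled)
        if intensity < target_intensity:
            gain *= up      # frame too quiet: raise the gain for the next frame
        elif intensity > target_intensity:
            gain *= down    # frame too loud: lower the gain for the next frame
    return output, gain


# A constant low-level input: the gain climbs until the scaled intensity
# reaches the target, then hovers around that point.
frames = [[0.1, 0.1, 0.1, 0.1] for _ in range(50)]
output, final_gain = recursive_agc(frames, target_intensity=1.0)
```

The per-frame update uses only a comparison and a multiplication, which is the point of the recursive form: no division or square root is needed.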

[0006] In a recursive digital AGC device, when the input audio signal is silent (or is relatively low-level background noise that is almost meaningless to the speaker and listener), calculating the gain from its intensity produces an excessively large gain, so sound detection is performed. A commonly used sound detection method applies a predetermined threshold to the intensity of the frame (the sum of squares of the samples in the frame) and judges the frame to be silent (or noise only) if the intensity is at or below the threshold.
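The conventional intensity-threshold test can be written in a few lines. The function names and the threshold values in the example calls are illustrative assumptions; only the rule itself (sum of squares compared with a fixed threshold, with "at or below" meaning silence) comes from the text.

```python
def frame_intensity(frame):
    """Frame intensity: sum of squares of the samples in the frame."""
    return sum(x * x for x in frame)


def has_sound(frame, threshold):
    """Return True if the frame contains sound, False if it is silent or noise only.

    A frame whose intensity is at or below the threshold counts as silent.
    """
    return frame_intensity(frame) > threshold


loud = has_sound([0.5, -0.5], threshold=0.1)    # intensity 0.5
quiet = has_sound([0.01, 0.0], threshold=0.1)   # intensity 0.0001
edge = has_sound([1.0, 0.0], threshold=1.0)     # intensity equal to the threshold
```

Because the threshold is fixed, the test cannot tell loud background noise from speech, which is the weakness the following paragraphs address.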

[0007]

[Problems to Be Solved by the Invention] However, actual speech contains unvoiced sounds. Even when voiced portions such as vowels have high intensity, the beginning of a word that starts with an unvoiced sound has low intensity, so performing gain control based on this portion may again produce an excessive gain.

[0008] Furthermore, when gain control is applied to speech from an environment with loud background noise, such as an answering-machine recording of a call made from outdoors, the background noise exceeds the threshold and is amplified. If the threshold is instead set with high-intensity background noise in mind, it becomes so high that speech whose gain should be controlled may be judged to be silent.

[0009] An object of the present invention is to accurately perform automatic level adjustment of input speech even when high-level background noise or unvoiced sound is input.

[0010]

[Means for Solving the Problems] The invention according to claim 1 is an automatic audio gain control device that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically determines a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level, the device comprising voice determination means for determining the nature of the frame, wherein the voice determination means comprises voiced sound determination means for determining whether the frame is voiced or unvoiced.

[0011] The invention according to claim 2 is the automatic audio gain control device according to claim 1, further comprising gain estimation means for estimating the gain by which the next frame is multiplied, wherein the gain estimation means performs different gain estimation processing depending on whether sound existence determination means has judged the frame to contain sound or to be silent, and, when the voiced sound determination means has judged the frame to be unvoiced, performs the same processing as in the silent case.

[0012] The invention according to claim 3 is the automatic audio gain control device according to claim 1 or 2, wherein the voiced sound determination means comprises autocorrelation function calculation means for calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and voiced sound decision means for making the final decision, using the plurality of autocorrelation functions, as to whether the frame is voiced or unvoiced.

[0013] The invention according to claim 4 is the automatic audio gain control device according to claim 3, wherein the voiced sound decision means finds the maximum value of the plurality of autocorrelation functions, judges the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and judges the frame to be unvoiced if it is smaller than the predetermined threshold.

[0014] The invention according to claim 5 is the automatic audio gain control device according to claim 4, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.
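The decision rule of claims 4 and 5 can be sketched as below. This is an illustration under stated assumptions: the unnormalized autocorrelation formula, the lag set, and the ratio value 0.5 are hypothetical choices, and for brevity the analysis window is a single list of samples rather than the current frame plus earlier frames.

```python
def autocorrelation(samples, lag):
    """Unnormalized autocorrelation of the window at one lag (pitch period in samples)."""
    return sum(samples[n] * samples[n - lag] for n in range(lag, len(samples)))


def is_voiced_max(samples, lags, ratio):
    """Voiced if the largest autocorrelation exceeds ratio * (sum of squares).

    The threshold is recomputed for every frame from the same samples that
    feed the autocorrelation, so no division is needed.
    """
    threshold = ratio * sum(x * x for x in samples)
    peak = max(autocorrelation(samples, lag) for lag in lags)
    return peak > threshold


# A signal that repeats every 4 samples versus a single impulse.
periodic = [1.0, 0.0, -1.0, 0.0] * 8
impulse = [1.0] + [0.0] * 31
periodic_voiced = is_voiced_max(periodic, lags=range(2, 6), ratio=0.5)
impulse_voiced = is_voiced_max(impulse, lags=range(2, 6), ratio=0.5)
```

Scaling the threshold by the window's own sum of squares makes the test independent of the signal level: multiplying every sample by a constant scales both sides of the comparison equally.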

[0015] The invention according to claim 6 is the automatic audio gain control device according to claim 3, wherein the voiced sound decision means compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, judges the frame to be voiced if at least one autocorrelation function is larger than the threshold, and judges the frame to be unvoiced if none is.

[0016] The invention according to claim 7 is the automatic audio gain control device according to claim 6, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.
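The per-lag comparison of claims 6 and 7 might look like the following sketch. The autocorrelation formula, lag set, and ratio are illustrative assumptions. The outcome is the same as comparing the maximum with the threshold, but the search can stop at the first lag that exceeds it.

```python
def autocorrelation(samples, lag):
    """Unnormalized autocorrelation of the window at one lag."""
    return sum(samples[n] * samples[n - lag] for n in range(lag, len(samples)))


def is_voiced_any(samples, lags, ratio):
    """Voiced if at least one autocorrelation value exceeds ratio * (sum of squares)."""
    threshold = ratio * sum(x * x for x in samples)
    # any() short-circuits, so later lags are not evaluated once one lag passes.
    return any(autocorrelation(samples, lag) > threshold for lag in lags)


periodic = [1.0, 0.0, -1.0, 0.0] * 8
impulse = [1.0] + [0.0] * 31
periodic_voiced = is_voiced_any(periodic, lags=range(2, 6), ratio=0.5)
impulse_voiced = is_voiced_any(impulse, lags=range(2, 6), ratio=0.5)
```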

[0017] The invention according to claim 8 is an automatic audio gain control method that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically determines a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level, the method comprising a voice determination step of determining the nature of the frame, wherein the voice determination step comprises a voiced sound determination step of determining whether the frame is voiced or unvoiced.

[0018] The invention according to claim 9 is the automatic audio gain control method according to claim 8, further comprising a gain estimation step of estimating the gain by which the next frame is multiplied, wherein the gain estimation step performs different gain estimation processing depending on whether a sound existence determination step has judged the frame to contain sound or to be silent, and, when the voiced sound determination step has judged the frame to be unvoiced, performs the same processing as in the silent case.

[0019] The invention according to claim 10 is the automatic audio gain control method according to claim 8 or 9, wherein the voiced sound determination step comprises an autocorrelation function calculation step of calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision step of making the final decision, using the plurality of autocorrelation functions, as to whether the frame is voiced or unvoiced.

[0020] The invention according to claim 11 is the automatic audio gain control method according to claim 10, wherein the voiced sound decision step finds the maximum value of the plurality of autocorrelation functions, judges the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and judges the frame to be unvoiced if it is smaller than the predetermined threshold.

[0021] The invention according to claim 12 is the automatic audio gain control method according to claim 11, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0022] The invention according to claim 13 is the automatic audio gain control method according to claim 10, wherein the voiced sound decision step compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, judges the frame to be voiced if at least one autocorrelation function is larger than the threshold, and judges the frame to be unvoiced if none is.

[0023] The invention according to claim 14 is the automatic audio gain control method according to claim 13, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0024] The invention according to claim 15 is a storage medium storing a machine-readable computer program having an automatic audio gain control algorithm, the program being installed on a computer and causing the computer to execute: a function of performing automatic audio gain control that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically determines a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level; and a function of performing a voice determination function that determines the nature of the frame and that includes a voiced sound determination function for determining whether the frame is voiced or unvoiced.

[0025] The invention according to claim 16 is the computer program stored in the storage medium according to claim 15, wherein the program causes the computer to execute a gain estimation function of estimating the gain by which the next frame is multiplied, and the gain estimation function performs different gain estimation processing depending on whether execution of a sound existence determination function has judged the frame to contain sound or to be silent, and, when execution of the voiced sound determination function has judged the frame to be unvoiced, performs the same processing as in the silent case.

[0026] The invention according to claim 17 is the computer program stored in the storage medium according to claim 15 or 16, wherein the voiced sound determination function includes an autocorrelation function calculation function of calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision function of making the final decision, using the plurality of autocorrelation functions, as to whether the frame is voiced or unvoiced.

[0027] The invention according to claim 18 is the computer program stored in the storage medium according to claim 17, wherein the voiced sound decision function finds the maximum value of the plurality of autocorrelation functions, judges the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and judges the frame to be unvoiced if it is smaller than the predetermined threshold.

[0028] The invention according to claim 19 is the computer program stored in the storage medium according to claim 18, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0029] The invention according to claim 20 is the computer program stored in the storage medium according to claim 17, wherein the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, judges the frame to be voiced if at least one autocorrelation function is larger than the threshold, and judges the frame to be unvoiced if none is.

[0030] The invention according to claim 21 is the computer program stored in the storage medium according to claim 20, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0031] The invention according to claim 22 is a machine-readable computer program having an automatic audio gain control algorithm, the program being installed on a computer and causing the computer to execute: a function of performing automatic audio gain control that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically determines a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level; and a function of performing a voice determination function that determines the nature of the frame and that includes a voiced sound determination function for determining whether the frame is voiced or unvoiced.

[0032] The invention according to claim 23 is the computer program according to claim 22, wherein the program causes the computer to execute a gain estimation function of estimating the gain by which the next frame is multiplied, and the gain estimation function performs different gain estimation processing depending on whether execution of a sound existence determination function has judged the frame to contain sound or to be silent, and, when execution of the voiced sound determination function has judged the frame to be unvoiced, performs the same processing as in the silent case.

[0033] The invention according to claim 24 is the computer program according to claim 22 or 23, wherein the voiced sound determination function includes an autocorrelation function calculation function of calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision function of making the final decision, using the plurality of autocorrelation functions, as to whether the frame is voiced or unvoiced.

[0034] The invention according to claim 25 is the computer program according to claim 24, wherein the voiced sound decision function finds the maximum value of the plurality of autocorrelation functions, judges the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and judges the frame to be unvoiced if it is smaller than the predetermined threshold.

[0035] The invention according to claim 26 is the computer program according to claim 25, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0036] The invention according to claim 27 is the computer program according to claim 24, wherein the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, judges the frame to be voiced if at least one autocorrelation function is larger than the threshold, and judges the frame to be unvoiced if none is.

[0037] The invention according to claim 28 is the computer program according to claim 27, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

[0038]

[Embodiments of the Invention] Embodiments of the present invention will now be described.

[0039] In this embodiment, the processing that a computer's processor executes, using memory areas such as RAM, in accordance with a computer program installed on a storage medium of the computer, for example an HDD (hard disk drive), is treated as a set of functions. The automatic audio gain control device and automatic audio gain control method realized by that computer are described using a functional block diagram in which those functions are drawn as blocks, together with the functional description that the diagram expresses.

[0040] The functional description based on this functional block diagram also serves as a description of the computer program having the automatic audio gain control algorithm and of the storage medium storing that program. The storage medium here includes not only the computer's HDD but also various other storage media: storage areas such as its RAM; optical media that exist apart from the computer, such as CD-ROM, CD-R, and CD-RW; magnetic information recording media; and the systems of a communication network that distribute computer programs to computers.

[0041] [First Embodiment] A first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a functional block diagram showing one form of an automatic audio gain control device (audio AGC device) to which the present invention is applied.

[0042] This audio AGC device comprises voice input means 101, voice determination means 102, gain multiplication means 103, and gain estimation means 104.

[0043] The voice determination means 102 comprises sound existence determination means 201 and voiced sound determination means 202.

[0044] The sound existence determination means 201 comprises first frame intensity calculation means 301 and sound existence decision means 302.

[0045] The voiced sound determination means 202 comprises second frame intensity calculation means 401, autocorrelation function calculation means 402, autocorrelation function maximum value determination means 403, and voiced sound decision means 404.

[0046] The gain estimation means 104 comprises third frame intensity calculation means 501, intensity comparison means 502, and gain calculation means 503.

[0047] The voice input means 101 receives a digital audio signal and forms frames of a predetermined number of samples. When the sampling frequency is 8 kHz, the number of samples is preferably in the range of 40 to 200, over which speech is regarded as stationary.

[0048] The voice determination means 102 receives the frame, and first the sound existence determination means 201 determines whether the frame contains sound or is silent. To make this determination, this AGC device calculates the intensity of the frame (the sum of squares of its samples) and treats the frame as containing sound if the value is larger than a predetermined threshold and as silent if it is smaller. That is, in the sound existence determination means 201, the first frame intensity calculation means 301 first calculates the frame intensity (1). The sound existence decision means 302 then compares the frame intensity (1) with the predetermined threshold, judging the frame to contain sound if the frame intensity (1) is larger than the threshold and to be silent if it is smaller.

[0049] For a frame judged by the sound existence determination means 201 to contain sound, the voiced sound determination means 202 determines whether it is voiced or unvoiced. To make this determination, this AGC device calculates the autocorrelation function of the frame for a plurality of predetermined pitches, obtains the ratio of the maximum value of the autocorrelation function to the intensity of the frame, and treats the frame as voiced if the ratio is larger than a predetermined value.

[0050] In this AGC device, even among frames judged to contain sound, every frame other than a voiced frame is processed in the same way as a silent frame. This prevents an unnecessarily high gain from being applied when the background noise level is high or when the frame is unvoiced.
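The effect of treating non-voiced frames like silence can be illustrated with a small gain-update sketch. Everything specific here is an assumption: the class labels, the step factors, and in particular the choice to hold the gain unchanged for silent frames, since this passage does not spell out what the silent-case processing is.

```python
def update_gain(gain, frame_class, intensity, target, up=1.1, down=0.9):
    """Return the gain to apply to the next frame.

    frame_class is one of "silent", "unvoiced", "voiced" (hypothetical labels).
    Unvoiced frames take the same path as silent ones; holding the gain on
    that path is an assumption made for this sketch.
    """
    if frame_class in ("silent", "unvoiced"):
        return gain
    if intensity < target:
        return gain * up
    if intensity > target:
        return gain * down
    return gain


held = update_gain(2.0, "unvoiced", intensity=0.001, target=1.0)
raised = update_gain(2.0, "voiced", intensity=0.5, target=1.0)
lowered = update_gain(2.0, "voiced", intensity=4.0, target=1.0)
```

With this gating, a low-intensity unvoiced word onset or a stretch of loud background noise no longer drives the gain, because only voiced frames reach the comparison with the target.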

【００５１】That is, in the voiced sound determination means 202, the second frame intensity calculation means 401 first calculates the intensity (2) of the frame. Because the autocorrelation function is calculated using not only the samples of the current frame but also samples of previous frames, the intensity (2) is the intensity of all the samples used in the autocorrelation calculation, and therefore differs in value from the intensity (1) of the frame.

【００５２】If the number of samples used to calculate the intensity (2) is an integral multiple of the number of samples used to calculate the intensity (1) (that is, the number of samples in a frame), the intensity (2) can be obtained by adding together the past and current values of the intensity (1), which reduces the number of multiply-accumulate operations.
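One way to picture this saving is a small running history of per-frame intensities. The sketch below is an illustration under assumptions: the class name `WindowIntensity` and the parameter `frames_in_window` (how many frames make up the autocorrelation window) are hypothetical and do not appear in the source.

```python
from collections import deque
from typing import Deque, Sequence


class WindowIntensity:
    """Derives intensity (2) by summing the intensity (1) of recent frames."""

    def __init__(self, frames_in_window: int) -> None:
        # Only the most recent `frames_in_window` per-frame intensities are kept.
        self._history: Deque[float] = deque(maxlen=frames_in_window)

    def update(self, frame: Sequence[float]) -> float:
        # Intensity (1) is computed once per frame (one multiply per sample).
        intensity1 = sum(x * x for x in frame)
        self._history.append(intensity1)
        # Intensity (2) needs only additions over the stored values.
        return sum(self._history)


window = WindowIntensity(frames_in_window=2)
for frame in ([1, 1], [2, 0], [0, 3]):
    print(window.update(frame))
```

The squares of each sample are computed only once, when its frame arrives; later windows reuse the stored per-frame sums.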

【００５３】Next, the autocorrelation function calculation means 402 calculates the autocorrelation function for a plurality of predetermined pitches. The autocorrelation function is a technique used to estimate the pitch of the frame, and its calculation method is well known to those skilled in the art. The pitches at which the autocorrelation function is calculated preferably cover the pitch range of the human voice, 50 Hz to 400 Hz, that is, about 20 to 120 samples at a sampling frequency of 8 kHz. If a clear maximum of the autocorrelation function (a peak in the autocorrelation sequence) is obtained within this range, the frame can be regarded as voiced, and the pitch that gives the maximum can be regarded as the pitch of the frame.
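The autocorrelation search over candidate pitch lags can be sketched as below. The text leaves the exact formula to the skilled reader, so this example assumes one common definition (the sum of products of samples a fixed lag apart within the analysis window). The function names, the 240-sample window, and the synthetic 200 Hz test tone are likewise assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def autocorrelation(window: Sequence[float], lag: int) -> float:
    """Sum of products of samples that are `lag` samples apart."""
    return sum(window[n] * window[n - lag] for n in range(lag, len(window)))


def autocorrelation_by_lag(
    window: Sequence[float], min_lag: int = 20, max_lag: int = 120
) -> Dict[int, float]:
    """Autocorrelation for every candidate pitch lag (in samples)."""
    return {lag: autocorrelation(window, lag) for lag in range(min_lag, max_lag + 1)}


SAMPLE_RATE = 8000  # Hz, as in the text
# Synthetic 200 Hz tone: its period is 8000 / 200 = 40 samples.
window = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(240)]

values = autocorrelation_by_lag(window)
best_lag = max(values, key=values.get)
print(best_lag, SAMPLE_RATE / best_lag)  # lag in samples, pitch estimate in Hz
```

The window must be longer than the largest lag; here it spans 240 samples so that every lag from 20 to 120 has samples to compare.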

【００５４】Next, the autocorrelation function maximum value determination means 403 finds the maximum among the autocorrelation values obtained for the respective pitches.

【００５５】Next, the voiced sound decision means 404 obtains the ratio of the maximum value of the autocorrelation function to the intensity (2) of the frame and compares this ratio with a predetermined threshold. If the ratio is larger than the threshold, the frame is determined to be voiced; if it is smaller, the frame is determined to be unvoiced.

【００５６】Alternatively, instead of obtaining the ratio of the maximum value of the autocorrelation function to the intensity (2) of the frame and comparing it with a predetermined ratio threshold, the intensity (2) of the frame may be multiplied in advance by the predetermined ratio threshold. The frame is then determined to be voiced if the maximum value of the autocorrelation function is larger than the product, and unvoiced if it is smaller.
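The two forms of the voiced/unvoiced test, dividing by the intensity or scaling the threshold by the intensity, can be compared side by side. The sketch below uses hypothetical function names and example numbers. The handling of a zero-intensity window is an assumption, since the text does not discuss that case.

```python
def is_voiced_by_ratio(
    max_autocorr: float, intensity2: float, ratio_threshold: float
) -> bool:
    """Form 1: compare (max autocorrelation / intensity (2)) with the threshold."""
    if intensity2 <= 0.0:
        return False  # assumption: an all-zero window is never voiced
    return max_autocorr / intensity2 > ratio_threshold


def is_voiced_by_scaled_threshold(
    max_autocorr: float, intensity2: float, ratio_threshold: float
) -> bool:
    """Form 2: compare max autocorrelation with (threshold * intensity (2))."""
    return max_autocorr > ratio_threshold * intensity2


# Hypothetical values: intensity (2) = 100, ratio threshold = 0.5.
print(is_voiced_by_ratio(80.0, 100.0, 0.5), is_voiced_by_scaled_threshold(80.0, 100.0, 0.5))
print(is_voiced_by_ratio(30.0, 100.0, 0.5), is_voiced_by_scaled_threshold(30.0, 100.0, 0.5))
```

The second form replaces a division with a multiplication and needs no special case for zero intensity.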

【００５７】In the gain multiplication means 103, the frame is multiplied by the gain obtained by the gain estimation means 104 for the previous frame. If the current frame is the first frame to be processed, a predetermined initial value is used as the gain; for example, the initial value may be 1.

【００５８】The frame multiplied by the gain is output as the AGC output and is input to an application such as speech coding or speech recognition.
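The one-frame delay between gain estimation and gain multiplication can be sketched as a simple loop. This is an illustration under assumptions: `run_agc` and the callback `estimate_next_gain` are hypothetical names, and the doubling estimator exists only to make the delay visible; it is not the estimation rule of the source.

```python
from typing import Callable, List, Sequence


def run_agc(
    frames: Sequence[Sequence[float]],
    estimate_next_gain: Callable[[List[float], float], float],
    initial_gain: float = 1.0,
) -> List[List[float]]:
    """Multiply each frame by the gain estimated while processing the previous frame."""
    gain = initial_gain  # used for the first frame
    outputs: List[List[float]] = []
    for frame in frames:
        scaled = [gain * x for x in frame]  # AGC output for this frame
        outputs.append(scaled)
        gain = estimate_next_gain(scaled, gain)  # takes effect on the next frame
    return outputs


def doubling_estimator(scaled_frame: List[float], current_gain: float) -> float:
    """Toy estimator (assumption): doubles the gain after every frame."""
    return current_gain * 2.0


result = run_agc([[1.0], [1.0], [1.0]], doubling_estimator)
print(result)  # [[1.0], [2.0], [4.0]]
```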

【００５９】In the gain estimation means 104, based on the determination result of the voice determination means 102, processing specific to each case is performed for silent frames and unvoiced frames (hereinafter, silent/unvoiced frames) and for voiced frames, and the gain for the next frame is estimated. A technique known to those skilled in the art can be adopted for gain estimation, such as the technique described in Japanese Unexamined Patent Application Publication No. H11-214940. An example is described below, but the gain estimation is not limited to it.

【００６０】First, for a silent/unvoiced frame, the gain calculation means 503 does not change the gain. However, if a predetermined number of silent/unvoiced frames occur in succession, the gain is reset to its initial value.

【００６１】For a voiced frame, the third frame intensity calculation means 501 first calculates the intensity of the frame after multiplication by the gain, yielding the intensity (3) of the frame. The intensity (3) differs from the intensity (1) and the intensity (2) in that it is the intensity of the gain-multiplied frame.

【００６２】Next, the intensity comparison means 502 compares the intensity (3) of the frame with a predetermined intensity target value.

【００６３】Then, in the gain calculation means 503, if the intensity comparison means 502 finds the intensity (3) of the frame to be larger than the predetermined intensity target value, the gain is decreased by a predetermined value; if it is smaller, the gain is increased by a predetermined value.
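The example gain estimation rule (hold the gain on silent/unvoiced frames, reset it after a run of them, and step it toward the target on voiced frames) can be sketched as follows. The class name, the parameter names, and all numeric values are assumptions chosen for illustration; the text specifies only "predetermined" values.

```python
from typing import Sequence


class GainEstimator:
    """Sketch of the example gain update rule; all parameters are assumed."""

    def __init__(self, target: float, step: float = 0.25,
                 initial_gain: float = 1.0, reset_after: int = 2) -> None:
        self.target = target            # intensity target value
        self.step = step                # amount added to or subtracted from the gain
        self.initial_gain = initial_gain
        self.reset_after = reset_after  # run length of silent/unvoiced frames before reset
        self.gain = initial_gain
        self.idle_frames = 0            # consecutive silent/unvoiced frames seen so far

    def update(self, scaled_frame: Sequence[float], voiced: bool) -> float:
        """Return the gain to apply to the next frame."""
        if not voiced:
            # Silent/unvoiced: keep the gain, but reset it after a long enough run.
            self.idle_frames += 1
            if self.idle_frames >= self.reset_after:
                self.gain = self.initial_gain
            return self.gain
        self.idle_frames = 0
        intensity3 = sum(x * x for x in scaled_frame)  # intensity (3)
        if intensity3 > self.target:
            self.gain -= self.step
        elif intensity3 < self.target:
            self.gain += self.step
        return self.gain


est = GainEstimator(target=4.0)
print(est.update([1.0, 1.0], voiced=True))   # below target: gain rises
print(est.update([3.0, 0.0], voiced=True))   # above target: gain falls
```

A practical implementation would probably also bound the gain from below and above; the text does not mention such limits, so none are applied here.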

【００６４】[Second Embodiment] A second embodiment of the present invention will be described with reference to FIG. 1. FIG. 2 is a functional block diagram showing one form of an automatic audio gain control device (audio AGC apparatus) to which the present invention is applied. Parts identical to those of the first embodiment are denoted by the same reference numerals, and their description is omitted.

【００６５】The audio AGC apparatus comprises voice input means 101, voice determination means 102, gain multiplication means 103, and gain estimation means 104.

【００６６】The voice determination means 102 comprises sound presence determination means 201 and voiced sound determination means 202.

【００６７】The sound presence determination means 201 comprises first frame intensity calculation means 301 and sound presence decision means 302.

【００６８】The voiced sound determination means 202 comprises second frame intensity calculation means 401, autocorrelation function calculation means 402, and voiced sound decision means 403.

【００６９】The gain estimation means 104 comprises third frame intensity calculation means 501, intensity comparison means 502, and gain calculation means 503.

【００７０】The present AGC apparatus differs from the AGC apparatus based on claim 5 only in the voiced sound determination means 202, so only the operation of the voiced sound determination means 202 is described here.

【００７１】In the voiced sound determination means 202, the second frame intensity calculation means 401 first calculates the intensity (2) of the frame.

【００７２】If the number of samples used to calculate the intensity (2) is an integral multiple of the number of samples used to calculate the intensity (1) (that is, the number of samples in a frame), the intensity (2) can be obtained by adding together the past and current values of the intensity (1), which reduces the number of multiply-accumulate operations.

【００７３】Next, the autocorrelation function calculation means 402 calculates the autocorrelation function for a plurality of predetermined pitches. The autocorrelation function is a technique used to estimate the pitch of the frame, and its calculation method is well known to those skilled in the art. The pitches at which the autocorrelation function is calculated preferably cover the pitch range of the human voice, 50 Hz to 400 Hz, that is, about 20 to 120 samples at a sampling frequency of 8 kHz. If a clear maximum of the autocorrelation function (a peak in the autocorrelation sequence) is obtained within this range, the frame can be regarded as voiced, and the pitch that gives the maximum can be regarded as the pitch of the frame.

【００７４】Next, the voiced sound decision means 403 obtains the ratio of each of the calculated autocorrelation values to the intensity (2) of the frame and compares each ratio with a predetermined threshold. If at least one ratio is larger than the threshold, the frame is determined to be voiced; if none is, the frame is determined to be unvoiced.

【００７５】Alternatively, instead of obtaining the ratio of each autocorrelation value to the intensity (2) of the frame and comparing it with a predetermined ratio threshold, the intensity (2) of the frame may be multiplied in advance by the predetermined ratio threshold. The frame is then determined to be voiced if at least one autocorrelation value is larger than the product, and unvoiced if none is.
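The per-lag test of this embodiment can be sketched with a short-circuiting check. The function names and example numbers below are assumptions. For the same threshold, "at least one value exceeds the limit" gives the same result as comparing the maximum with the limit, but it can stop at the first qualifying lag instead of scanning for the maximum.

```python
from typing import Iterable


def is_voiced_any(
    autocorr_values: Iterable[float], intensity2: float, ratio_threshold: float
) -> bool:
    """Voiced if any autocorrelation value exceeds threshold * intensity (2)."""
    limit = ratio_threshold * intensity2  # computed once per frame
    return any(r > limit for r in autocorr_values)  # stops at the first hit


def is_voiced_max(
    autocorr_values: Iterable[float], intensity2: float, ratio_threshold: float
) -> bool:
    """Maximum-based test, shown for comparison."""
    return max(autocorr_values) > ratio_threshold * intensity2


# Hypothetical values: intensity (2) = 100 and ratio threshold = 0.5 give a limit of 50.
print(is_voiced_any([10.0, 60.0, 20.0], 100.0, 0.5))
print(is_voiced_any([10.0, 40.0, 20.0], 100.0, 0.5))
```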

【００７６】

[Effects of the Invention] The invention according to claim 1 is an automatic audio gain control device that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically obtains a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level. The device comprises voice determination means for determining the nature of the frame, and the voice determination means comprises voiced sound determination means for determining whether the frame is voiced or unvoiced. Therefore, by performing the same subsequent processing as for a silent frame when the voiced sound determination means determines the frame to be unvoiced, appropriate gain control can be performed without increasing the gain more than necessary when high-level background noise or unvoiced sound is input.

【００７７】The invention according to claim 2 is the automatic audio gain control device according to claim 1, further comprising gain estimation means for estimating the gain by which the next frame is multiplied. The gain estimation means performs different gain estimation processing depending on whether the sound presence determination means determines the frame to contain sound or to be silent, and, when the voiced sound determination means determines the frame to be unvoiced, performs the same processing as for a silent frame. The automatic audio gain control device according to claim 1 can therefore be realized with a simpler configuration.

【００７８】The invention according to claim 3 is the automatic audio gain control device according to claim 1 or 2, in which the voiced sound determination means comprises autocorrelation function calculation means for calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and voiced sound decision means for finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced. The automatic audio gain control device according to claims 1 and 2 can therefore be realized with a simpler configuration.

【００７９】The invention according to claim 4 is the automatic audio gain control device according to claim 3, in which the voiced sound decision means obtains the maximum value of the plurality of autocorrelation functions, determines the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines the frame to be unvoiced if it is smaller than the threshold. The automatic audio gain control device according to claim 3 can therefore be realized with a simpler configuration.

【００８０】The invention according to claim 5 is the automatic audio gain control device according to claim 4, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The automatic audio gain control device according to claim 4 can therefore be realized with a simpler configuration.

【００８１】The invention according to claim 6 is the automatic audio gain control device according to claim 3, in which the voiced sound decision means compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines the frame to be voiced if at least one autocorrelation function is larger than the threshold, and determines the frame to be unvoiced if none is. The automatic audio gain control device according to claim 3 can therefore be realized with a simpler configuration.

【００８２】The invention according to claim 7 is the automatic audio gain control device according to claim 6, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The automatic audio gain control device according to claim 6 can therefore be realized with a simpler configuration.

【００８３】The invention according to claim 8 is an automatic audio gain control method that divides an input audio signal into frames each consisting of a predetermined number of samples, automatically obtains a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level. The method comprises a voice determination step of determining the nature of the frame, and the voice determination step comprises a voiced sound determination step of determining whether the frame is voiced or unvoiced. Therefore, by performing the same subsequent processing as for a silent frame when the voiced sound determination step determines the frame to be unvoiced, appropriate gain control can be performed without increasing the gain more than necessary when high-level background noise or unvoiced sound is input.

【００８４】The invention according to claim 9 is the automatic audio gain control method according to claim 8, further comprising a gain estimation step of estimating the gain by which the next frame is multiplied. The gain estimation step performs different gain estimation processing depending on whether the sound presence determination step determines the frame to contain sound or to be silent, and, when the voiced sound determination step determines the frame to be unvoiced, performs the same processing as for a silent frame. The automatic audio gain control method according to claim 8 can therefore be realized with a simpler configuration.

【００８５】The invention according to claim 10 is the automatic audio gain control method according to claim 8 or 9, in which the voiced sound determination step comprises an autocorrelation function calculation step of calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision step of finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced. The automatic audio gain control method according to claim 8 or 9 can therefore be realized with a simpler configuration.

【００８６】The invention according to claim 11 is the automatic audio gain control method according to claim 10, in which the voiced sound decision step obtains the maximum value of the plurality of autocorrelation functions, determines the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines the frame to be unvoiced if it is smaller than the threshold. The automatic audio gain control method according to claim 10 can therefore be realized with a simpler configuration.

【００８７】The invention according to claim 12 is the automatic audio gain control method according to claim 11, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The automatic audio gain control method according to claim 11 can therefore be realized with a simpler configuration.

【００８８】The invention according to claim 13 is the automatic audio gain control method according to claim 10, in which the voiced sound decision step compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines the frame to be voiced if at least one autocorrelation function is larger than the threshold, and determines the frame to be unvoiced if none is. The automatic audio gain control method according to claim 10 can therefore be realized with a simpler configuration.

【００８９】The invention according to claim 14 is the automatic audio gain control method according to claim 13, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The automatic audio gain control method according to claim 13 can therefore be realized with a simpler configuration.

【００９０】The invention according to claim 15 is a storage medium storing a machine-readable computer program having an algorithm for automatic audio gain control. The computer program is installed in a computer and causes the computer to execute two functions: a function of performing automatic audio gain control, which divides an input audio signal into frames each consisting of a predetermined number of samples, automatically obtains a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level; and a voice determination function, which determines the nature of the frame and includes a voiced sound determination function for determining whether the frame is voiced or unvoiced. Therefore, by performing the same subsequent processing as for a silent frame when the voiced sound determination function determines the frame to be unvoiced, appropriate gain control can be performed without increasing the gain more than necessary when high-level background noise or unvoiced sound is input.

【００９１】The invention according to claim 16 is the computer program stored in the storage medium according to claim 15, which causes the computer to execute a gain estimation function for estimating the gain by which the next frame is multiplied. The gain estimation function performs different gain estimation processing depending on whether execution of the sound presence determination function determines the frame to contain sound or to be silent, and, when execution of the voiced sound determination function determines the frame to be unvoiced, performs the same processing as for a silent frame. The computer program according to claim 15 can therefore be realized with a simpler configuration.

【００９２】The invention according to claim 17 is the computer program stored in the storage medium according to claim 15 or 16, in which the voiced sound determination function includes an autocorrelation function calculation function for calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision function for finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced. The computer program according to claim 15 or 16 can therefore be realized with a simpler configuration.

【００９３】The invention according to claim 18 is the computer program stored in the storage medium according to claim 17, in which the voiced sound decision function obtains the maximum value of the plurality of autocorrelation functions, determines the frame to be voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines the frame to be unvoiced if it is smaller than the threshold. The computer program according to claim 17 can therefore be realized with a simpler configuration.

【００９４】The invention according to claim 19 is the computer program stored in the storage medium according to claim 18, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The computer program according to claim 18 can therefore be realized with a simpler configuration.

【００９５】The invention according to claim 20 is the computer program stored in the storage medium according to claim 17, in which the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines the frame to be voiced if at least one autocorrelation function is larger than the threshold, and determines the frame to be unvoiced if none is. The computer program according to claim 17 can therefore be realized with a simpler configuration.

【００９６】The invention according to claim 21 is the computer program stored in the storage medium according to claim 20, in which the threshold is the sum of squares of all the sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The computer program according to claim 20 can therefore be realized with a simpler configuration.

【００９７】The invention according to claim 22 is a machine-readable computer program having an algorithm for automatic audio gain control. The computer program is installed in a computer and causes the computer to execute two functions: a function of performing automatic audio gain control, which divides an input audio signal into frames each consisting of a predetermined number of samples, automatically obtains a gain by which each frame is multiplied, and adjusts the level of the frame to a desired level; and a voice determination function, which determines the nature of the frame and includes a voiced sound determination function for determining whether the frame is voiced or unvoiced. Therefore, by performing the same subsequent processing as for a silent frame when the voiced sound determination function determines the frame to be unvoiced, appropriate gain control can be performed without increasing the gain more than necessary when high-level background noise or unvoiced sound is input.

【００９８】The invention according to claim 23 is the computer program according to claim 22, which causes the computer to execute a gain estimation function for estimating the gain by which the next frame is multiplied. The gain estimation function performs different gain estimation processing depending on whether execution of the sound presence determination function determines the frame to contain sound or to be silent, and, when execution of the voiced sound determination function determines the frame to be unvoiced, performs the same processing as for a silent frame. The computer program according to claim 22 can therefore be realized with a simpler configuration.

【００９９】The invention according to claim 24 is the computer program according to claim 22 or 23, in which the voiced sound determination function includes an autocorrelation function calculation function for calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches, and a voiced sound decision function for finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced. The computer program according to claim 22 or 23 can therefore be realized with a simpler configuration.

[0100] The invention according to claim 25 is the computer program according to claim 24, wherein the voiced sound decision function obtains the maximum value of the plurality of autocorrelation functions, determines that the frame is voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines that the frame is unvoiced if it is smaller than the predetermined threshold. The computer program according to claim 24 can therefore be realized with a simpler configuration.

[0101] The invention according to claim 26 is the computer program according to claim 25, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The computer program according to claim 25 can therefore be realized with a simpler configuration.
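The maximum-value decision with an energy-scaled threshold can be sketched as below. It assumes that every sample of the frame takes part in the autocorrelation calculation, so the sum of squares runs over the whole frame, and it treats a maximum exactly equal to the threshold as unvoiced, a case the text leaves open. The function names and the `ratio` value are illustrative.

```python
def autocorrelation(frame, lag):
    """Sum of products of samples that are `lag` positions apart."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def is_voiced_by_max(frame, lags, ratio):
    """Voiced if the largest autocorrelation exceeds ratio * frame energy."""
    # Per-frame threshold: sum of squares of the samples times a fixed ratio.
    energy = sum(x * x for x in frame)
    threshold = ratio * energy
    peak = max(autocorrelation(frame, lag) for lag in lags)
    # Assumption: equality counts as unvoiced.
    return peak > threshold


periodic = [1, 0, -1, 0] * 4   # repeats every 4 samples
impulse = [1] + [0] * 15       # a single click, no periodicity

print(is_voiced_by_max(periodic, [2, 3, 4, 5], 0.5))  # True
print(is_voiced_by_max(impulse, [2, 3, 4, 5], 0.5))   # False
```

Because the threshold scales with the energy of the frame, the decision depends on how periodic the frame is rather than on how loud it is.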

[0102] The invention according to claim 27 is the computer program according to claim 24, wherein the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines that the frame is voiced if at least one autocorrelation function is larger than the threshold, and determines that the frame is unvoiced if none is. The computer program according to claim 24 can therefore be realized with a simpler configuration.

[0103] The invention according to claim 28 is the computer program according to claim 27, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio. The computer program according to claim 27 can therefore be realized with a simpler configuration.
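The per-pitch comparison can be sketched in the same way. For a given threshold it reaches the same decision as comparing the maximum value, since some value exceeds the threshold exactly when the largest one does. The practical difference in this sketch is that evaluation can stop at the first pitch that passes. The function names and the `ratio` value are assumptions for illustration.

```python
def autocorrelation(frame, lag):
    """Sum of products of samples that are `lag` positions apart."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def is_voiced_by_any(frame, lags, ratio):
    """Voiced if at least one autocorrelation exceeds ratio * frame energy."""
    threshold = ratio * sum(x * x for x in frame)
    # any() consumes the generator lazily, so later lags are not
    # evaluated once one lag has exceeded the threshold.
    return any(autocorrelation(frame, lag) > threshold for lag in lags)


periodic = [1, 0, -1, 0] * 4   # repeats every 4 samples
impulse = [1] + [0] * 15       # a single click, no periodicity

print(is_voiced_by_any(periodic, [4, 8, 2], 0.5))  # True
print(is_voiced_by_any(impulse, [4, 8, 2], 0.5))   # False
```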

[Brief description of the drawings]

FIG. 1 is a functional block diagram of an automatic audio gain control device (AGC device) according to a first embodiment of the present invention.

FIG. 2 is a functional block diagram of an automatic audio gain control device (AGC device) according to a second embodiment of the present invention.

[Explanation of symbols]

102: voice determination means (step, function)
202: voiced sound determination means (step, function)
104: gain estimation means (step, function)
402: autocorrelation function calculation means (step, function)
404: voiced sound decision means (step, function)


Claims


1. An automatic audio gain control device that divides an input audio signal into frames each having a predetermined number of samples, automatically obtains a gain by which the frames are multiplied, and adjusts the level of the frames to a desired level, the device comprising voice determination means for determining a property of the frame, wherein the voice determination means includes voiced sound determination means for determining whether the frame is voiced or unvoiced.

2. The automatic audio gain control device according to claim 1, further comprising gain estimation means for estimating a gain by which the next frame is multiplied, wherein the gain estimation means performs different gain estimation processing depending on whether the voiced sound determination means determines that the frame contains sound or that it is silent and, when the voiced sound determination means determines that the frame is unvoiced, performs the same processing as for silence.

3. The automatic audio gain control device according to claim 1 or 2, wherein the voiced sound determination means includes: autocorrelation function calculation means for calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches; and voiced sound decision means for finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced.

4. The automatic audio gain control device according to claim 3, wherein the voiced sound decision means obtains the maximum value of the plurality of autocorrelation functions, determines that the frame is voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines that the frame is unvoiced if it is smaller than the predetermined threshold.

5. The automatic audio gain control device according to claim 4, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

6. The automatic audio gain control device according to claim 3, wherein the voiced sound decision means compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines that the frame is voiced if at least one autocorrelation function is larger than the threshold, and determines that the frame is unvoiced if none is.

7. The automatic audio gain control device according to claim 6, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

8. An automatic audio gain control method that divides an input audio signal into frames each having a predetermined number of samples, automatically obtains a gain by which the frames are multiplied, and adjusts the level of the frames to a desired level, the method comprising a voice determination step of determining a property of the frame, wherein the voice determination step includes a voiced sound determination step of determining whether the frame is voiced or unvoiced.

9. The automatic audio gain control method according to claim 8, further comprising a gain estimation step of estimating a gain by which the next frame is multiplied, wherein the gain estimation step performs different gain estimation processing depending on whether the voiced sound determination step determines that the frame contains sound or that it is silent and, when the voiced sound determination step determines that the frame is unvoiced, performs the same processing as for silence.

10. The automatic audio gain control method according to claim 8 or 9, wherein the voiced sound determination step includes: an autocorrelation function calculation step of calculating a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches; and a voiced sound decision step of finally deciding, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced.

11. The automatic audio gain control method according to claim 10, wherein the voiced sound decision step obtains the maximum value of the plurality of autocorrelation functions, determines that the frame is voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines that the frame is unvoiced if it is smaller than the predetermined threshold.

12. The automatic audio gain control method according to claim 11, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

13. The automatic audio gain control method according to claim 10, wherein the voiced sound decision step compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines that the frame is voiced if at least one autocorrelation function is larger than the threshold, and determines that the frame is unvoiced if none is.

14. The automatic audio gain control method according to claim 13, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

15. A storage medium storing a computer program that is installed in a computer and has a machine-readable automatic audio gain control algorithm causing the computer to execute: a function of performing automatic audio gain control that divides an input audio signal into frames each having a predetermined number of samples, automatically obtains a gain by which the frames are multiplied, and adjusts the level of the frames to a desired level; and a voice determination function that determines a property of the frame and includes a voiced sound determination function for determining whether the frame is voiced or unvoiced.

16. The storage medium storing the computer program according to claim 15, wherein the computer program causes the computer to execute a gain estimation function that estimates a gain by which the next frame is multiplied, and the gain estimation function performs different gain estimation processing depending on whether execution of the voiced sound determination function determines that the frame contains sound or that it is silent and, when execution of the voiced sound determination function determines that the frame is unvoiced, performs the same processing as for silence.

17. The storage medium storing the computer program according to claim 15 or 16, wherein the voiced sound determination function includes: an autocorrelation function calculation function that calculates a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches; and a voiced sound decision function that finally decides, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced.

18. The storage medium storing the computer program according to claim 17, wherein the voiced sound decision function obtains the maximum value of the plurality of autocorrelation functions, determines that the frame is voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines that the frame is unvoiced if it is smaller than the predetermined threshold.

19. The storage medium storing the computer program according to claim 18, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

20. The storage medium storing the computer program according to claim 17, wherein the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines that the frame is voiced if at least one autocorrelation function is larger than the threshold, and determines that the frame is unvoiced if none is.

21. The storage medium storing the computer program according to claim 20, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

22. A computer program that is installed in a computer and has a machine-readable automatic audio gain control algorithm causing the computer to execute: a function of performing automatic audio gain control that divides an input audio signal into frames each having a predetermined number of samples, automatically obtains a gain by which the frames are multiplied, and adjusts the level of the frames to a desired level; and a voice determination function that determines a property of the frame and includes a voiced sound determination function for determining whether the frame is voiced or unvoiced.

23. The computer program according to claim 22, which causes the computer to execute a gain estimation function that estimates a gain by which the next frame is multiplied, wherein the gain estimation function performs different gain estimation processing depending on whether execution of the voiced sound determination function determines that the frame contains sound or that it is silent and, when execution of the voiced sound determination function determines that the frame is unvoiced, performs the same processing as for silence.

24. The computer program according to claim 22 or 23, wherein the voiced sound determination function includes: an autocorrelation function calculation function that calculates a plurality of autocorrelation functions of the frame for a plurality of predetermined pitches; and a voiced sound decision function that finally decides, using the plurality of autocorrelation functions, whether the frame is voiced or unvoiced.

25. The computer program according to claim 24, wherein the voiced sound decision function obtains the maximum value of the plurality of autocorrelation functions, determines that the frame is voiced if that maximum value is larger than a predetermined threshold obtained for each frame, and determines that the frame is unvoiced if it is smaller than the predetermined threshold.

26. The computer program according to claim 25, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.

27. The computer program according to claim 24, wherein the voiced sound decision function compares each of the plurality of autocorrelation functions with a predetermined threshold obtained for each frame, determines that the frame is voiced if at least one autocorrelation function is larger than the threshold, and determines that the frame is unvoiced if none is.

28. The computer program according to claim 27, wherein the threshold is the sum of squares of all sample points used in calculating the plurality of autocorrelation functions, multiplied by a predetermined ratio.