JP2002358093A

JP2002358093A - Speech recognition method, speech recognition device, and storage medium therefor

Info

Publication number: JP2002358093A
Application number: JP2001165457A
Authority: JP
Inventors: Tatsuya Kimura; 達也木村; Akira Ishida; 明石田; Nobuyuki Kunieda; 伸行國枝; Kazuya Nomura; 和也野村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-05-31
Filing date: 2001-05-31
Publication date: 2002-12-13

Abstract

(57) [Abstract]
[Problem] To provide a speech recognition method that realizes, with simple processing, a speaker adaptation method that adapts sufficiently even from a small amount of data.
[Solution] The speech recognition method of the present invention comprises: a step S12 of extracting a feature parameter sequence from input speech; a step S14 of reading the parameters of a first acoustic model when adaptation is performed; a step S15 of creating a first learning parameter set; a step S16 of creating a second learning parameter set; a step S17 of calculating adaptation parameters from the first learning parameter set and the second learning parameter set; a step S18 of creating the parameters of a second acoustic model from the parameters of the first acoustic model using the adaptation parameters; a step 110 of selecting either the first acoustic model or the second acoustic model when recognition is performed; and a step S112 of matching the feature parameter sequence against the selected acoustic model.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of the Invention] The present invention relates to speaker-independent speech recognition technology capable of recognizing anyone's voice, and in particular to a speech recognition method, a speech recognition device, and a storage medium therefor that have a speaker adaptation function, which uses the speech of a speaker whose recognition rate is low to adapt the recognizer so as to raise the recognition rate for that speaker.

【０００２】[0002]

[Description of the Related Art] One known speaker adaptation technique for speech recognition is, for example, the one described in Japanese Patent No. 3035239.

[0003] FIG. 9 is a processing flow diagram of the conventional speaker adaptation method.

[0004] As shown in FIG. 9, the conventional speaker adaptation method combines a first known technique, the MLLR method (Reference 1: C. L. Leggetter et al., "Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models", Computer Speech and Language, Vol. 9, pp. 171-185, 1995), with a second known technique, the MAP estimation method (Reference 2: C. H. Lee et al., "A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models", IEEE Transactions on Signal Processing, Vol. 39, No. 4, pp. 806-814, 1991). The combination is intended to solve the problem that the MLLR method alone does not adapt sufficiently when only a small amount of training data is available.

【０００５】[0005]

[Problems to Be Solved by the Invention] However, because the conventional speaker adaptation method uses two learning algorithms, the MLLR method and the MAP estimation method, its processing is extremely complicated.

[0006] The present invention has been made to solve this conventional problem, and provides a speech recognition method, a speech recognition device, and a storage medium therefor that realize, with simple processing, a speaker adaptation method that adapts sufficiently even from a small amount of data.

【０００７】[0007]

[Means for Solving the Problems] The speech recognition method of the present invention comprises: a step of analyzing input speech and extracting a feature parameter sequence; a step of reading, when the speech was input for adaptation, the parameters of a first acoustic model prepared in advance for speaker-independent use; a step of creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of calculating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; a step of selecting, when recognition is performed, either the first acoustic model or the second acoustic model; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.

[0008] With this method, a speech recognition method having stable speaker adaptation can be realized with simple processing even when the number of training speakers is small.

[0009] Further, in the speech recognition method of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0010] With this method, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.
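The time-axis alignment that yields the vector pairs can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes a left-to-right state sequence in which each frame either stays in the current state or advances by one, it uses squared Euclidean distance as a stand-in for the HMM output likelihood, and the names `sq_dist` and `align_pairs` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed stand-in for a negative log-likelihood)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_pairs(frames: List[Vector], means: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """Viterbi-style DP alignment; returns one (state mean, frame) pair per frame."""
    T, S = len(frames), len(means)
    if T < S or S == 0:
        raise ValueError("need at least one frame per state")
    INF = float("inf")
    cost = [[INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    cost[0][0] = sq_dist(frames[0], means[0])  # the path must start in state 0
    for t in range(1, T):
        for s in range(S):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else INF
            if move < stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best < INF:
                cost[t][s] = best + sq_dist(frames[t], means[s])
    # Backtrack from the final state to recover the state assigned to each frame.
    s = S - 1
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return [(means[state], frames[t]) for t, state in enumerate(path)]
```

Each returned pair couples a model mean vector with the input feature vector assigned to it, which is the form the learning parameter set elements take in the description above.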

[0011] Further, in the speech recognition method of the present invention, each element of the learning parameter sets is a pair of vectors; the elements of one parameter vector of the pair are used as the objective variables of a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are used as the partial regression coefficients of the multiple regression analysis.

[0012] With this method, the adaptation parameters can be obtained by maximum likelihood estimation and can therefore be estimated accurately.
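A rough sketch of estimating the partial regression coefficients from a set of vector pairs, and of using them as a mapping function on a mean vector, is given below. Several points are assumptions made for illustration only: the model mean is taken as the explanatory vector and the paired vector as the objective vector, an intercept term is included, ordinary least squares is solved through the normal equations, and the names `solve`, `fit_regression`, and `apply_mapping` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]  # (explanatory vector x, objective vector y)


def solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate pairs")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_regression(pairs: List[Pair]) -> List[List[float]]:
    """Return a (d+1) x d coefficient matrix B such that y is approximately [1, x] @ B."""
    d = len(pairs[0][0])
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [[0.0] * d for _ in range(d + 1)]
    for x, y in pairs:
        z = [1.0] + list(x)  # prepend 1.0 for the intercept term
        for i in range(d + 1):
            for j in range(d + 1):
                xtx[i][j] += z[i] * z[j]
            for k in range(d):
                xty[i][k] += z[i] * y[k]
    return solve(xtx, xty)


def apply_mapping(coef: List[List[float]], mean: Vector) -> Vector:
    """Map one mean vector of the first model to a mean vector of the second model."""
    z = [1.0] + list(mean)
    d = len(mean)
    return [sum(z[i] * coef[i][k] for i in range(d + 1)) for k in range(d)]
```

Each column of the coefficient matrix holds the coefficients for one objective variable, so every output dimension is regressed on all input dimensions.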

[0013] Further, the speech recognition method of the present invention comprises: a step of analyzing input speech and extracting a feature parameter sequence; a step of reading, when the speech was input for adaptation, the parameters of a first acoustic model prepared in advance for speaker-independent use; a step of judging whether the speech is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; a step of selecting, when recognition is performed, either the first acoustic model or the second acoustic model; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.
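One way to picture the "create or update" step is to keep running sums and add the first learning parameter set only at the first utterance. The sketch below makes strong simplifying assumptions that are not taken from the source: each dimension is regressed independently (a scalar slope and offset per dimension rather than a full multiple regression), the first learning parameter set is assumed to pair every speaker-independent mean with itself, and the class name `SequentialAdapter` and its methods are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]  # (model mean x, paired vector y)


class SequentialAdapter:
    """Accumulates regression statistics across adaptation utterances."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.sx = [0.0] * dim
        self.sy = [0.0] * dim
        self.sxx = [0.0] * dim
        self.sxy = [0.0] * dim
        self.first_utterance = True

    def _accumulate(self, pairs: List[Pair]) -> None:
        for x, y in pairs:
            self.n += 1
            for k in range(self.dim):
                self.sx[k] += x[k]
                self.sy[k] += y[k]
                self.sxx[k] += x[k] * x[k]
                self.sxy[k] += x[k] * y[k]

    def update(self, si_means: List[Vector], aligned: List[Pair]) -> List[Tuple[float, float]]:
        """Add one utterance; the first set is added only on the first call."""
        if self.first_utterance:
            # Assumed form of the first learning parameter set: (mean, mean) pairs.
            self._accumulate([(m, m) for m in si_means])
            self.first_utterance = False
        self._accumulate(aligned)
        return self.coefficients()

    def coefficients(self) -> List[Tuple[float, float]]:
        """Per-dimension (offset, slope); falls back to identity if degenerate."""
        out = []
        for k in range(self.dim):
            var = self.n * self.sxx[k] - self.sx[k] ** 2
            if abs(var) < 1e-12:
                out.append((0.0, 1.0))
                continue
            slope = (self.n * self.sxy[k] - self.sx[k] * self.sy[k]) / var
            offset = (self.sy[k] - slope * self.sx[k]) / self.n
            out.append((offset, slope))
        return out
```

Under these assumptions the identity pairs pull the estimate toward "no change" while little adaptation speech has been seen, and their influence shrinks as more utterances are accumulated.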

[0014] With this configuration, a speech recognition device having a stable speaker adaptation function can be realized even when the number of training speakers is small. Further, in the speech recognition method of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0015] With this configuration, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0016] Further, in the speech recognition method of the present invention, each element of the learning parameter sets is a pair of vectors; the elements of one parameter vector of the pair are used as the objective variables of a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are used as the partial regression coefficients of the multiple regression analysis.

[0017] With this configuration, a storage medium having a stable speaker adaptation function can be realized even when the number of training speakers is small.

[0018] Further, the speech recognition method of the present invention comprises: a step of analyzing input speech and extracting a feature parameter sequence; a step of reading, when the speech was input for adaptation, the parameters of a first acoustic model prepared in advance for speaker-independent use; a step of judging whether the speech is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; a step of determining, when recognition is performed, whether to recognize with the first acoustic model or the second acoustic model; a step of controlling the selection between the two acoustic models based on the determination result; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.

[0019] With this configuration, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0020] Further, in the speech recognition method of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0021] With this configuration, the adaptation parameters can be estimated under the maximum likelihood criterion and can therefore be estimated accurately.

[0022] Further, in the speech recognition method of the present invention, each element of the learning parameter sets is a pair of vectors; the elements of one parameter vector of the pair are used as the objective variables of a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are used as the partial regression coefficients of the multiple regression analysis.

[0023] With this method, a stable sequential (incremental) speaker adaptation method can be realized even when the number of training speakers is small.

[0024] Further, the speech recognition method of the present invention comprises: a step of analyzing input speech and extracting a feature parameter sequence; a step of reading, when the speech was input for adaptation, the parameters of a first acoustic model prepared in advance for speaker-independent use; a step of judging whether the speech is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; a step of matching, when recognition is performed, the feature parameter sequence against both the first acoustic model and the second acoustic model; and a step of determining a recognition result from the matching results.
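Matching against both acoustic models and then deciding on a result might look like the sketch below. How the two matching results are combined is not specified in this passage, so taking the single best score across both models is an assumption; the scoring function is also a crude, order-insensitive stand-in for HMM matching, and the names `score` and `recognize` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
WordModels = Dict[str, List[Vector]]  # word -> mean vectors of its model


def score(frames: List[Vector], means: List[Vector]) -> float:
    """Higher is better: negative sum of each frame's squared distance to its nearest mean."""
    total = 0.0
    for f in frames:
        total -= min(sum((a - b) ** 2 for a, b in zip(f, m)) for m in means)
    return total


def recognize(frames: List[Vector], first: WordModels, second: WordModels) -> Tuple[str, str]:
    """Match against both model sets; return (best word, which model set produced it)."""
    candidates = []
    for label, models in (("first", first), ("second", second)):
        for word, means in models.items():
            candidates.append((score(frames, means), word, label))
    best = max(candidates, key=lambda c: c[0])
    return best[1], best[2]
```

Because both model sets are always scored, a poorly adapted second model cannot make the result worse than the first model's best candidate under this combination rule.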

[0025] With this method, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0026] Further, in the speech recognition method of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0027] With this method, the adaptation parameters can be estimated under the maximum likelihood criterion and can therefore be estimated accurately.

[0028] Further, in the speech recognition method of the present invention, each element of the learning parameter sets is a pair of vectors; the elements of one parameter vector of the pair are used as the objective variables of a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are used as the partial regression coefficients of the multiple regression analysis.

[0029] With this configuration, a stable speaker adaptation device can be realized even when the number of training speakers is small.

[0030] Further, the speech recognition device of the present invention comprises: feature parameter extraction means for analyzing input speech and extracting a feature parameter sequence; first storage means for storing the parameters of a first acoustic model prepared in advance for speaker-independent use before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second parameter set; adaptation parameter creation means for calculating adaptation parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; and second storage means for storing the parameters of the second acoustic model.

[0031] With this configuration, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0032] Further, in the speech recognition device of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0033] With this configuration, a storage medium having a stable speaker adaptation function can be provided even when the number of training speakers is small. Further, the speech recognition device of the present invention comprises: feature parameter extraction means for analyzing input speech and extracting a feature parameter sequence; first storage means for storing the parameters of a first acoustic model prepared in advance for speaker-independent use before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second parameter set only at the first utterance of learning; adaptation parameter creation means for calculating or updating adaptation parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; and second storage means for storing the parameters of the second acoustic model.

[0034] With this configuration, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0035] Further, in the speech recognition device of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0036] With this configuration, the adaptation parameters can be obtained by maximum likelihood estimation and can therefore be estimated accurately. Further, the speech recognition device of the present invention comprises: feature parameter extraction means for analyzing input speech and extracting a feature parameter sequence; first storage means for storing the parameters of a first acoustic model prepared in advance for speaker-independent use before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second parameter set only at the first utterance of learning; adaptation parameter creation means for calculating or updating adaptation parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; second storage means for storing the parameters of the second acoustic model; and means for determining, at recognition, whether to recognize with the first acoustic model or the second acoustic model.

[0037] With this method, when there are too few training samples, recognition is automatically performed with the speaker-independent acoustic model, so a speech recognition method with stable performance can be provided.
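The fallback described here can be sketched as a simple count-based rule. The threshold and its value are hypothetical (the text gives no number), as are the names `ModelSelector` and `choose`; the only behavior taken from the description is that too few training samples leads to the speaker-independent (first) model being used.

```python
from dataclasses import dataclass


@dataclass
class ModelSelector:
    """Chooses between the first (speaker-independent) and second (adapted) model."""

    min_samples: int  # hypothetical threshold; the source does not state a value

    def choose(self, num_adaptation_samples: int) -> str:
        # With too few samples the adapted model is not trusted,
        # so recognition falls back to the first acoustic model.
        if num_adaptation_samples < self.min_samples:
            return "first"
        return "second"
```

For example, `ModelSelector(min_samples=50).choose(10)` selects the first model, while the same selector given 50 or more samples selects the second.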

[0038] Further, in the speech recognition device of the present invention, the acoustic model is modeled by a continuous-density HMM or by an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter sets is a pair of vectors; and the learning parameter sets are created while the vector pairs are determined by aligning the feature parameter sequence with the parameter sequence of the acoustic model along the time axis using the Viterbi method, the DP method, or the Baum-Welch method.

[0039] With this method, a learning parameter set is obtained in which the input feature vector sequence is aligned along the time axis with the parameter sequence of the model, so the adaptation parameters can be estimated accurately.

[0040] Further, the speech recognition device of the present invention comprises: feature parameter extraction means for analyzing input speech and extracting a feature parameter sequence; first storage means for storing the parameters of a first acoustic model prepared in advance for speaker-independent use before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second parameter set only at the first utterance of learning; adaptation parameter creation means for calculating or updating adaptation parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model that uses the adaptation parameters; second storage means for storing the parameters of the second acoustic model; and matching means for obtaining, at recognition, a matching result for the first acoustic model and a matching result for the second acoustic model.

【００４１】この方法により、適応パラメータの推定を
最尤を規準に実施することができ、精度の良い適応パラ
メータの推定ができる。According to this method, the adaptive parameter can be estimated on the basis of the maximum likelihood, and the adaptive parameter can be estimated with high accuracy.

【００４２】また、本発明の音声認識装置は、前記音響
モデルが連続分布ＨＭＭまたは連続ＨＭＭを近似したＨ
ＭＭでモデル化される音響モデルであり、前記音響モデ
ルのパラメータが前記音響モデルにおける平均ベクトル
であり、前記学習パラメータセットの個々の要素がベク
トルの対の形で構成され、前記特徴パラメータ系列と音
響モデルのパラメータ系列との間でViterbi法またはDP
法またはBaum-Welch法による時間軸の整合方法を用いて
前記ベクトル対の確定をしながら学習パラメータセット
を作成することとした。Further, in the speech recognition apparatus of the present invention, the acoustic model is modeled by a continuous-density HMM or an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter set is a pair of vectors; and the learning parameter set is created while the vector pairs are determined by time alignment between the feature parameter sequence and the parameter sequence of the acoustic model using the Viterbi method, the DP method, or the Baum-Welch method.
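The time alignment described above can be illustrated with a minimal sketch. It assumes a strictly left-to-right sequence of state mean vectors and uses squared Euclidean distance as the local cost, which is a simplification of Viterbi scoring with Gaussian likelihoods. The names `align_pairs` and `sq_dist`, and the ordering of each pair as (model mean, input feature vector), are assumptions made for illustration and do not appear in the patent.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, used here as a stand-in for a frame score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_pairs(frames: List[Vector],
                state_means: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """Align frames to a left-to-right state sequence by dynamic programming
    and return one (state mean, frame) pair per input frame."""
    n_frames, n_states = len(frames), len(state_means)
    if n_states == 0 or n_frames < n_states:
        raise ValueError("need at least one frame per state")
    inf = float("inf")
    cost = [[inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    cost[0][0] = sq_dist(frames[0], state_means[0])
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = cost[t - 1][s]                        # remain in the same state
            move = cost[t - 1][s - 1] if s > 0 else inf  # advance by one state
            if move < stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best < inf:
                cost[t][s] = best + sq_dist(frames[t], state_means[s])
    # Trace back from the final state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [(state_means[s], frames[t]) for t, s in enumerate(path)]
```

Each returned pair couples a model mean with the frame aligned to it, which is the pair form that the regression step takes as input.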

【００４３】この構成により、学習サンプルが少なすぎ
る場合には不特定話者の音響モデルを用いた認識が自動
的に行われ性能が安定した音声認識装置を提供すること
ができる。According to this configuration, when the number of learning samples is too small, the recognition using the acoustic model of the unspecified speaker is automatically performed, and a speech recognition device with stable performance can be provided.

【００４４】また、本発明の音声認識記憶媒体は、入力
した音声を分析して特徴パラメータ系列を抽出する過程
と、前記音声が適応のために入力した音声であるときに
はあらかじめ作成してある不特定話者用の第１の音響モ
デルのパラメータを読込む過程と、前記第１の音響モデ
ルのパラメータより第１の学習パラメータセットを作成
する過程と、前記特徴パラメータ系列および前記第１の
音響モデルのパラメータ系列より第２の学習パラメータ
セットを作成する過程と、前記第１の学習パラメータセ
ットおよび第２の学習パラメータセットから適応パラメ
ータを算出する過程と、前記適応パラメータを用いた重
回帰モデルに基づく所定の写像関数により前記第１の音
響モデルのパラメータから第２の音響モデルのパラメー
タを作成する過程と、認識を行うときには前記第１の音
響モデルまたは第２の音響モデルのいずれかを選択する
過程と、前記選択された音響モデルを用いて前記特徴パ
ラメータの系列と照合を行う過程と、前記照合結果から
認識結果を決定する過程とを有する音声認識方法のプロ
グラムを記憶し、前記プログラムをコンピュータより読
み取り可能とすることとした。Further, the speech recognition storage medium of the present invention stores, in computer-readable form, a program of a speech recognition method comprising the steps of: analyzing input speech and extracting a feature parameter sequence; when the speech is input for adaptation, reading the parameters of a first acoustic model created in advance for unspecified speakers; creating a first learning parameter set from the parameters of the first acoustic model; creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; calculating adaptive parameters from the first learning parameter set and the second learning parameter set; creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; at recognition, selecting either the first acoustic model or the second acoustic model; matching the feature parameter sequence against the selected acoustic model; and determining a recognition result from the matching result.

【００４５】この構成により、入力された特徴量ベクト
ルの系列とモデルのパラメータ系列との間の時間軸上で
の対応付けがなされた学習パラメータセットが得られ、
精度の良い適応パラメータの推定ができる。With this configuration, a learning parameter set is obtained in which the input feature vector sequence and the model parameter sequence are aligned on the time axis, so the adaptive parameters can be estimated accurately.

【００４６】また、本発明の音声認識記憶媒体は、前記
音響モデルが連続分布ＨＭＭまたは連続ＨＭＭを近似し
たＨＭＭでモデル化される音響モデルであり、前記音響
モデルのパラメータが前記音響モデルにおける平均ベク
トルであり、前記学習パラメータセットの個々の要素が
ベクトルの対の形で構成され、前記特徴パラメータ系列
と前記音響モデルのパラメータ系列との間でViterbi法
またはDP法またはBaum-Welch法による時間軸の整合方法
を用いて前記ベクトル対の確定をしながら学習パラメー
タセットを作成することとした。Further, in the speech recognition storage medium of the present invention, the acoustic model is modeled by a continuous-density HMM or an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter set is a pair of vectors; and the learning parameter set is created while the vector pairs are determined by time alignment between the feature parameter sequence and the parameter sequence of the acoustic model using the Viterbi method, the DP method, or the Baum-Welch method.

【００４７】この構成により、学習サンプルが少なすぎ
る場合には不特定話者の音響モデルを用いた認識が自動
的に行われ性能が安定した音声認識装置を提供すること
ができる。With this configuration, when the number of learning samples is too small, recognition using an acoustic model of an unspecified speaker is automatically performed, and a speech recognition device with stable performance can be provided.

【００４８】また、本発明の音声認識記憶媒体は、前記
学習パラメータセットの個々の要素がベクトルの対の形
で構成され、対の一方のパラメータベクトルの要素が重
回帰分析における目的変数として用いられ、対の他方の
前記パラメータベクトルが説明変数として用いられ、前
記適応パラメータが重回帰分析における偏回帰係数とし
て用いられることとした。Further, in the speech recognition storage medium of the present invention, each element of the learning parameter set is a pair of vectors; the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptive parameters are used as the partial regression coefficients of the multiple regression analysis.
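The roles of objective variable, explanatory variables, and partial regression coefficients can be made concrete with a small least-squares sketch. It assumes that the explanatory vector is the model-side vector of each pair and that each element of the other vector is regressed on it separately, with an intercept term. The function names `solve` and `fit_adaptation`, the intercept, and the use of ordinary least squares via the normal equations are illustrative assumptions; the patent does not give this procedure at this point.

```python
from typing import List, Tuple

Vector = List[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate pairs")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_adaptation(pairs: List[Tuple[Vector, Vector]]) -> List[List[float]]:
    """For each element of the objective vector y, fit
    y[d] = b0 + b1 * x[0] + ... + bD * x[D-1] over all (x, y) pairs.
    Returns one coefficient row [b0, b1, ..., bD] per objective element."""
    in_dim = len(pairs[0][0])
    out_dim = len(pairs[0][1])
    xs = [[1.0] + list(x) for x, _ in pairs]  # leading 1.0 carries the intercept
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(in_dim + 1)]
           for i in range(in_dim + 1)]
    coeffs = []
    for d in range(out_dim):
        xty = [sum(x[i] * y[d] for x, (_, y) in zip(xs, pairs))
               for i in range(in_dim + 1)]
        coeffs.append(solve(xtx, xty))
    return coeffs
```

Solving one system per objective element keeps the sketch short. All elements share the same left-hand matrix, so a fuller implementation would factor it once.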

【００４９】この構成により、入力された特徴量ベクト
ルの系列とモデルのパラメータ系列との間の時間軸上で
の対応付けがなされた学習パラメータセットが得られ、
精度の良い適応パラメータの推定ができる。With this configuration, a learning parameter set is obtained in which the input feature vector sequence and the model parameter sequence are aligned on the time axis, so the adaptive parameters can be estimated accurately.

【００５０】また、本発明の音声認識記憶媒体は、入力
した音声を分析して特徴パラメータ系列を抽出する過程
と、前記音声が適応のために入力した音声であるときに
はあらかじめ作成してある不特定話者用の第１の音響モ
デルのパラメータを読込む過程と、適応における初回発
声であるかどうかを判断し初回発声のときにのみ前記第
１の音響モデルのパラメータより第１の学習パラメータ
セットを作成する過程と、前記特徴パラメータ系列およ
び前記第１の音響モデルのパラメータ系列より第２の学
習パラメータセットを作成する過程と、前記第１の学習
パラメータセットおよび第２の学習パラメータセットか
ら適応パラメータを作成または更新する過程と、前記適
応パラメータを用いた重回帰モデルに基づく所定の写像
関数により前記第１の音響モデルのパラメータから第２
の音響モデルのパラメータを作成する過程と、認識を行
うときには前記第１の音響モデルまたは第２の音響モデ
ルのいずれかを選択する過程と、前記選択された音響モ
デルを用いて前記特徴パラメータの系列と照合を行う過
程と、前記照合結果から認識結果を決定する過程とを有
する音声認識方法のプログラムを記憶し、前記プログラ
ムをコンピュータより読み取り可能とすることとした。Further, the speech recognition storage medium of the present invention stores, in computer-readable form, a program of a speech recognition method comprising the steps of: analyzing input speech and extracting a feature parameter sequence; when the speech is input for adaptation, reading the parameters of a first acoustic model created in advance for unspecified speakers; determining whether the utterance is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; at recognition, selecting either the first acoustic model or the second acoustic model; matching the feature parameter sequence against the selected acoustic model; and determining a recognition result from the matching result.

【００５１】この構成により、適応パラメータを最尤推
定することが可能となり、精度の良い適応パラメータの
推定ができる。With this configuration, the adaptive parameter can be estimated with maximum likelihood, and the adaptive parameter can be estimated with high accuracy.

【００５２】また、本発明の音声認識記憶媒体は、前記
音響モデルが連続分布ＨＭＭまたは連続ＨＭＭを近似し
たＨＭＭでモデル化される音響モデルであり、前記音響
モデルのパラメータが前記音響モデルにおける平均ベク
トルであり、前記学習パラメータセットの個々の要素が
ベクトルの対の形で構成され、前記特徴パラメータ系列
と前記音響モデルのパラメータ系列との間でViterbi法
またはDP法またはBaum-Welch法による時間軸の整合方法
を用いて前記ベクトル対の確定をしながら学習パラメー
タセットを作成することとした。Further, in the speech recognition storage medium of the present invention, the acoustic model is modeled by a continuous-density HMM or an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter set is a pair of vectors; and the learning parameter set is created while the vector pairs are determined by time alignment between the feature parameter sequence and the parameter sequence of the acoustic model using the Viterbi method, the DP method, or the Baum-Welch method.

【００５３】この方法により、逐次適応を行っていて話
者が変わった場合には、不特定話者の音響モデルが有効
になり、極端な認識性能の低下を回避することができる
音声認識方法を実現できる。With this method, if the speaker changes during sequential adaptation, the acoustic model for unspecified speakers takes effect, so a speech recognition method that avoids an extreme drop in recognition performance can be realized.

【００５４】また、本発明の音声認識記憶媒体は、前記
学習パラメータセットの個々の要素がベクトルの対の形
で構成され、対の一方のパラメータベクトルの要素が重
回帰分析における目的変数として用いられ、対の他方の
前記パラメータベクトルが説明変数として用いられ、前
記適応パラメータが重回帰分析における偏回帰係数とし
て用いられることとした。Further, in the speech recognition storage medium of the present invention, each element of the learning parameter set is a pair of vectors; the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptive parameters are used as the partial regression coefficients of the multiple regression analysis.

【００５５】この方法により、入力された特徴量ベクト
ルの系列とモデルのパラメータ系列との間の時間軸上で
の対応付けがなされた学習パラメータセットが得られ、
精度の良い適応パラメータの推定ができる。With this method, a learning parameter set is obtained in which the input feature vector sequence and the model parameter sequence are aligned on the time axis, so the adaptive parameters can be estimated accurately.

【００５６】また、本発明の音声認識記憶媒体は、入力
した音声を分析して特徴パラメータ系列を抽出する過程
と、前記音声が適応のために入力した音声であるときに
はあらかじめ作成してある不特定話者用の第１の音響モ
デルのパラメータを読込む過程と、適応における初回発
声であるかどうかを判断し初回発声のときにのみ前記第
１の音響モデルのパラメータより第１の学習パラメータ
セットを作成する過程と、前記特徴パラメータ系列およ
び前記第１の音響モデルのパラメータ系列より第２の学
習パラメータセットを作成する過程と、前記第１の学習
パラメータセットおよび第２の学習パラメータセットか
ら適応パラメータを作成または更新する過程と、前記適
応パラメータを用いた重回帰モデルに基づく所定の写像
関数により前記第１の音響モデルのパラメータから第２
の音響モデルのパラメータを作成する過程と、認識を行
うときには前記第１の音響モデルまたは第２の音響モデ
ルのどちらを使って認識するかを判定する過程と、前記
判定結果に基づき前記２種類音響モデルの選択制御を行
う過程と、前記選択された音響モデルを用いて前記特徴
パラメータの系列と照合を行う過程と、前記照合結果か
ら認識結果を決定する過程とを有する音声認識方法のプ
ログラムを記憶し、前記プログラムをコンピュータより
読み取り可能とすることとした。Further, the speech recognition storage medium of the present invention stores, in computer-readable form, a program of a speech recognition method comprising the steps of: analyzing input speech and extracting a feature parameter sequence; when the speech is input for adaptation, reading the parameters of a first acoustic model created in advance for unspecified speakers; determining whether the utterance is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; at recognition, determining whether the first acoustic model or the second acoustic model is to be used; controlling the selection between the two acoustic models based on that determination; matching the feature parameter sequence against the selected acoustic model; and determining a recognition result from the matching result.

【００５７】この方法により、適応パラメータを最尤推
定により求めることができ、精度の良い適応パラメータ
の推定ができる。According to this method, the adaptive parameter can be obtained by the maximum likelihood estimation, and the adaptive parameter can be estimated with high accuracy.

【００５８】また、本発明の音声認識記憶媒体は、前記
音響モデルが連続分布ＨＭＭまたは連続ＨＭＭを近似し
たＨＭＭでモデル化される音響モデルであり、前記音響
モデルのパラメータが前記音響モデルにおける平均ベク
トルであり、前記学習パラメータセットの個々の要素が
ベクトルの対の形で構成され、前記特徴パラメータ系列
と前記音響モデルのパラメータ系列との間でViterbi法
またはDP法またはBaum-Welch法による時間軸の整合方法
を用いて前記ベクトル対の確定をしながら学習パラメー
タセットを作成することとした。Further, in the speech recognition storage medium of the present invention, the acoustic model is modeled by a continuous-density HMM or an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter set is a pair of vectors; and the learning parameter set is created while the vector pairs are determined by time alignment between the feature parameter sequence and the parameter sequence of the acoustic model using the Viterbi method, the DP method, or the Baum-Welch method.

【００５９】この構成により、逐次適応を行っていて話
者が変わった場合には、不特定話者の音響モデルが有効
になり、極端な認識性能の低下を回避することができる
音声認識装置を実現できる。With this configuration, if the speaker changes during sequential adaptation, the acoustic model for unspecified speakers takes effect, so a speech recognition apparatus that avoids an extreme drop in recognition performance can be realized.

【００６０】また、本発明の音声認識記憶媒体は、前記
学習パラメータセットの個々の要素がベクトルの対の形
で構成され、対の一方のパラメータベクトルの要素が重
回帰分析における目的変数として用いられ、対の他方の
前記パラメータベクトルが説明変数として用いられ、前
記適応パラメータが重回帰分析における偏回帰係数とし
て用いられることとした。Further, in the speech recognition storage medium of the present invention, each element of the learning parameter set is a pair of vectors; the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptive parameters are used as the partial regression coefficients of the multiple regression analysis.

【００６１】この構成により、入力された特徴量ベクト
ルの系列とモデルのパラメータ系列との間の時間軸上で
の対応付けがなされた学習パラメータセットが得られ、
精度の良い適応パラメータの推定ができる。With this configuration, a learning parameter set is obtained in which the input feature vector sequence and the model parameter sequence are aligned on the time axis, so the adaptive parameters can be estimated accurately.

【００６２】また、本発明の音声認識記憶媒体は、入力
した音声を分析して特徴パラメータ系列を抽出する過程
と、前記音声が適応のために入力した音声であるときに
はあらかじめ作成してある不特定話者用の第１の音響モ
デルのパラメータを読込む過程と、適応における初回発
声であるかどうかを判断し初回発声のときにのみ前記第
１の音響モデルのパラメータより第１の学習パラメータ
セットを作成する過程と、前記特徴パラメータ系列およ
び前記第１の音響モデルのパラメータ系列より第２の学
習パラメータセットを作成する過程と、前記第１の学習
パラメータセットおよび第２の学習パラメータセットか
ら適応パラメータを作成または更新する過程と、前記適
応パラメータを用いた重回帰モデルに基づく所定の写像
関数により前記第１の音響モデルのパラメータから第２
の音響モデルのパラメータを作成する過程と、認識を行
うときには前記第１の音響モデルおよび第２の音響モデ
ルを用いて前記特徴パラメータの系列と照合を行う過程
と、前記照合結果から認識結果を決定する過程とを有す
る音声認識方法のプログラムを記憶し、前記プログラム
をコンピュータより読み取り可能とすることとした。Further, the speech recognition storage medium of the present invention stores, in computer-readable form, a program of a speech recognition method comprising the steps of: analyzing input speech and extracting a feature parameter sequence; when the speech is input for adaptation, reading the parameters of a first acoustic model created in advance for unspecified speakers; determining whether the utterance is the first utterance of the adaptation and, only for the first utterance, creating a first learning parameter set from the parameters of the first acoustic model; creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; creating the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; at recognition, matching the feature parameter sequence against both the first acoustic model and the second acoustic model; and determining a recognition result from the matching results.

【００６３】この構成により、逐次適応を行っていて話
者が変わった場合には、不特定話者の音響モデルが有効
になり、極端な認識性能の低下を回避することができる
記憶媒体を実現できる。With this configuration, if the speaker changes during sequential adaptation, the acoustic model for unspecified speakers takes effect, so a storage medium that avoids an extreme drop in recognition performance can be realized.

【００６４】また、本発明の音声認識記憶媒体は、前記
音響モデルが連続分布ＨＭＭまたは連続ＨＭＭを近似し
たＨＭＭでモデル化される音響モデルであり、前記音響
モデルのパラメータが前記音響モデルにおける平均ベク
トルであり、前記学習パラメータセットの個々の要素が
ベクトルの対の形で構成され、前記特徴パラメータ系列
と前記音響モデルのパラメータ系列との間でViterbi法
またはDP法またはBaum-Welch法による時間軸の整合方法
を用いて前記ベクトル対の確定をしながら学習パラメー
タセットを作成することとした。Further, in the speech recognition storage medium of the present invention, the acoustic model is modeled by a continuous-density HMM or an HMM approximating a continuous HMM; the parameters of the acoustic model are the mean vectors of the acoustic model; each element of the learning parameter set is a pair of vectors; and the learning parameter set is created while the vector pairs are determined by time alignment between the feature parameter sequence and the parameter sequence of the acoustic model using the Viterbi method, the DP method, or the Baum-Welch method.

【００６５】この構成により、入力された特徴量ベクト
ルの系列とモデルのパラメータ系列との間の時間軸上で
の対応付けがなされた学習パラメータセットが得られ、
精度の良い適応パラメータの推定ができる。With this configuration, a learning parameter set is obtained in which the input feature vector sequence and the model parameter sequence are aligned on the time axis, so the adaptive parameters can be estimated accurately.

【００６６】さらに、本発明の音声認識記憶媒体は、前
記学習パラメータセットの個々の要素がベクトルの対の
形で構成され、対の一方のパラメータベクトルの要素が
重回帰分析における目的変数として用いられ、対の他方
の前記パラメータベクトルが説明変数として用いられ、
前記適応パラメータが重回帰分析における偏回帰係数と
して用いられることとした。Further, in the speech recognition storage medium of the present invention, each element of the learning parameter set is a pair of vectors; the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptive parameters are used as the partial regression coefficients of the multiple regression analysis.

【００６７】この構成により、適応パラメータを最尤推
定により求めることができ、精度の良い適応パラメータ
の推定ができる。With this configuration, the adaptive parameter can be obtained by maximum likelihood estimation, and the adaptive parameter can be estimated with high accuracy.

【００６８】[0068]

【発明の実施の形態】以下、本発明の実施の形態につい
て、図面を用いて説明する。Embodiments of the present invention will be described below with reference to the drawings.

【００６９】図１は、本発明の第１の実施形態の音声認
識方法のフローチャートを示す。FIG. 1 shows a flowchart of the voice recognition method according to the first embodiment of the present invention.

【００７０】図１において、Ｓ１１は音声を入力する過
程であり、Ｓ１２は入力した音声を分析して特徴パラメ
ータ系列を抽出する過程である。Ｓ１３において適応モ
ードか認識モードの選択をする。Ｓ１４はあらかじめ作
成してある不特定話者用の音響モデルである第１の音響
モデルのパラメータを読込む過程、Ｓ１５は第１の音響
モデルのパラメータの全てあるいは一部分のパラメータ
を抽出して第１の学習パラメータセットを作成する過程
であり、Ｓ１６は特徴パラメータ系列および第１の音響
モデルのパラメータ系列の全てあるいは一部分のパラメ
ータを抽出して第２の学習パラメータセットを作成する
過程であり、Ｓ１７は第１の学習パラメータセットおよ
び第２のパラメータセットから適応パラメータを算出す
る過程であり、Ｓ１８は適応パラメータを用いた重回帰
モデル（後述）に基づく所定の写像関数により第１の音
響モデルのパラメータから第２の音響モデルを作成する
過程であり、Ｓ１９は第２の音響モデルのパラメータを
記憶する過程である。Ｓ１４からＳ１９までの過程を経
て作成される第２音響モデルを適応後のパラメータとし
て得ることにより適応処理を行う。In FIG. 1, S11 is a step of inputting speech, and S12 is a step of analyzing the input speech and extracting a feature parameter sequence. In S13, either the adaptive mode or the recognition mode is selected. S14 is a step of reading the parameters of the first acoustic model, which is an acoustic model created in advance for unspecified speakers; S15 is a step of extracting all or part of the parameters of the first acoustic model to create the first learning parameter set; S16 is a step of extracting all or part of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model to create the second learning parameter set; S17 is a step of calculating adaptive parameters from the first learning parameter set and the second parameter set; S18 is a step of creating the second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model (described later) using the adaptive parameters; and S19 is a step of storing the parameters of the second acoustic model. Adaptation is performed by obtaining the second acoustic model created through steps S14 to S19 as the adapted parameters.
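Step S18 can be pictured as applying one affine map to every mean vector of the first acoustic model. The sketch below assumes the adaptive parameters are stored as one coefficient row `[b0, b1, ..., bD]` per output dimension, and that the model is a plain dictionary from a unit name to its mean vector. Both layouts, and the names `map_mean` and `adapt_model`, are hypothetical choices made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def map_mean(mean: Vector, coeffs: List[List[float]]) -> Vector:
    """Apply y[d] = b0 + sum_j b_j * mean[j] for each coefficient row."""
    return [row[0] + sum(b * x for b, x in zip(row[1:], mean))
            for row in coeffs]


def adapt_model(first_model: Dict[str, Vector],
                coeffs: List[List[float]]) -> Dict[str, Vector]:
    """Build the second (adapted) model by mapping every mean vector of the
    first model, including units that never occurred in the adaptation speech."""
    return {name: map_mean(mean, coeffs) for name, mean in first_model.items()}
```

Because the map is shared across the whole model in this sketch, units that were not observed during adaptation are moved as well, which is one reason a regression-based mapping can work with little data.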

【００７１】認識時には、第１の音響モデルまたは第２
の音響モデルのいずれかを選択する過程Ｓ１１０と、選
択された音響モデルを用いて特徴パラメータの系列と照
合を行う過程Ｓ１１１と、照合結果から認識結果を決定
する過程Ｓ１２２と、認識結果を出力する過程Ｓ１３３
の過程を経て音声認識を行う。At recognition, speech recognition is performed through step S110 of selecting either the first acoustic model or the second acoustic model, step S111 of matching the feature parameter sequence against the selected acoustic model, step S122 of determining a recognition result from the matching result, and step S133 of outputting the recognition result.

【００７２】適応時に第１の学習パラメータセットと第
２のパラメータの学習パラメータセットを併合したパラ
メータセットを用いて適応パラメータを最尤推定により
求める。パラメータの併合により、不特定話者の音響モ
デルのパラメータが事前に学習サンプルに含まれている
ため、学習話者が少ない場合でも精度の良い適応パラメ
ータを安定して求めることができる。At adaptation time, the adaptive parameters are obtained by maximum likelihood estimation using a parameter set that merges the first learning parameter set with the second learning parameter set. Because the merge places the parameters of the acoustic model for unspecified speakers in the learning samples in advance, accurate adaptive parameters can be obtained stably even when the number of learning speakers is small.
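The stabilizing effect of the merge can be seen in a one-dimensional toy case. The sketch assumes that the first learning parameter set consists of identity pairs (each model mean paired with itself), which is one plausible reading of a set built from the first acoustic model alone and is not stated explicitly here. The helper names `fit_line`, `first_set`, and `merge` are hypothetical, and ordinary least squares stands in for the estimation step.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (explanatory value, objective value)


def fit_line(pairs: List[Pair]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("explanatory variable has no spread")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def first_set(model_means: List[float]) -> List[Pair]:
    """Assumed form of the first set: every model mean maps to itself."""
    return [(mu, mu) for mu in model_means]


def merge(first: List[Pair], second: List[Pair]) -> List[Pair]:
    """Merged (third) learning parameter set."""
    return first + second


model_means = [0.0, 1.0, 2.0, 3.0]
speaker_pairs = [(1.0, 1.4)]  # a single aligned pair from the new speaker
slope, intercept = fit_line(merge(first_set(model_means), speaker_pairs))
```

With only the single speaker pair the regression is undetermined and `fit_line` raises an error. With the merged set the fit stays close to the identity map and shifts slightly toward the new speaker.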

【００７３】図２は、本発明の第２の実施形態の音声認
識装置の構成図を示す。FIG. 2 shows a configuration diagram of a speech recognition apparatus according to a second embodiment of the present invention.

【００７４】図２において、入力した音声を音響分析部
２１にて分析して、特徴パラメータ抽出部２２において
特徴パラメータ系列を抽出する。切り替え手段２３によ
り「適応モード」または「認識モード」の切り替えを行
う。まず「適応モード」の場合について説明する。第１
の記憶装置２４には適応前の不特定話者用音響モデルと
してあらかじめ作成してある第１の音響モデルのパラメ
ータを格納している。記憶装置２４から読み出した第１
の音響モデルのパラメータの全てあるいは一部分のパラ
メータを抽出して、第１の学習パラメータセット作成部
２５により第１の学習パラメータセットを作成する。特
徴パラメータ系列および第１の音響モデルのパラメータ
系列の全てあるいは一部分のパラメータを抽出して、第
２の学習パラメータセット作成部２６により第２の学習
パラメータセットを作成する。第１の学習パラメータセ
ットおよび第２のパラメータセットを学習パラメータ併
合部２７で併合する。学習パラメータ併合部２７により
得られた第３の学習用パラメータセットから重回帰分析
法により適応パラメータ作成部２８により適応パラメー
タを算出し第２の記憶装置２９に格納する。音響モデル
変換部２１０では、適応パラメータを用いた重回帰モデ
ルに基づく所定の写像関数により第１の音響モデルのパ
ラメータから第２の音響モデルのパラメータを作成し第
３の記憶手段２１１に格納する。In FIG. 2, the input speech is analyzed by the acoustic analysis unit 21, and the feature parameter extraction unit 22 extracts a feature parameter sequence. The switching means 23 switches between the "adaptive mode" and the "recognition mode". The "adaptive mode" is described first. The first storage device 24 stores the parameters of the first acoustic model, created in advance as the acoustic model for unspecified speakers before adaptation. All or part of the parameters of the first acoustic model read from the storage device 24 are extracted, and the first learning parameter set creating unit 25 creates the first learning parameter set. All or part of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model are extracted, and the second learning parameter set creating unit 26 creates the second learning parameter set. The learning parameter merging unit 27 merges the first learning parameter set and the second parameter set. From the third learning parameter set obtained by the learning parameter merging unit 27, the adaptive parameter creating unit 28 calculates adaptive parameters by multiple regression analysis and stores them in the second storage device 29. The acoustic model conversion unit 210 creates the parameters of the second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters, and stores them in the third storage means 211.

【００７５】次に、「認識モード」の場合における構成
及び動作について説明する。特徴パラメータ抽出部２２
で抽出した特徴パラメータは、照合部２１２にて制御部
２１４および音響モデル選択部２１５により選択される
音響モデルとの照合スコアが求められ結果判定部２１７
により最終結果が得られる。Next, the configuration and operation in the "recognition mode" are described. For the feature parameters extracted by the feature parameter extraction unit 22, the matching unit 212 computes a matching score against the acoustic model selected by the control unit 214 and the acoustic model selection unit 215, and the result determination unit 217 produces the final result.

【００７６】学習パラメータセット併合部２７を設け学
習パラメータセットの中に不特定話者用の音響モデルを
含ませておくことにより、学習データが少ない場合で
も、精度の良い適応パラメータを安定して得ることがで
きる。By providing the learning parameter set merging unit 27 and including the acoustic model for unspecified speakers in the learning parameter set, accurate adaptive parameters can be obtained stably even when little learning data is available.

【００７７】図３は、本発明の第３の実施形態の音声認
識方法のフローチャートを示す。FIG. 3 shows a flowchart of the voice recognition method according to the third embodiment of the present invention.

【００７８】第１の実施形態では学習音声の提示を何発
声分か行った後で認識をすることを想定しているが、こ
の実施形態においては、逐次学習が可能な構成を示して
いる。In the first embodiment, it is assumed that the recognition is performed after a certain number of presentations of the learning speech are made, but this embodiment shows a configuration in which the learning can be performed sequentially.

【００７９】図３において、Ｓ３１は音声を入力する過
程であり、Ｓ３２は入力した音声を分析して特徴パラメ
ータ系列を抽出する過程である。Ｓ３３において適応モ
ードか認識モードの選択をする。Ｓ３４はあらかじめ作
成してある不特定話者用の音響モデルである第１の音響
モデルのパラメータを読込む過程、分岐Ｓ３５にて学習
の初回であるかどうかを調べ、初回の場合のみ第１の学
習パラメータセット作成過程Ｓ３６にて第１の音響モデ
ルのパラメータの全てあるいは一部分のパラメータを抽
出して第１の学習パラメータセットを作成する。Ｓ３７
は特徴パラメータ系列および第１の音響モデルのパラメ
ータ系列の全てあるいは一部分のパラメータを抽出して
第２の学習パラメータセットを作成する過程であり、Ｓ
３８は逐次学習のために必要な中間パラメータを作成す
る過程である。Ｓ３９は第１の学習パラメータセットお
よび第２のパラメータセットから適応パラメータを算出
する過程である、Ｓ３１０は適応パラメータを用いた重
回帰モデルに基づく所定の写像関数により第１の音響モ
デルのパラメータから第２の音響モデルを作成する過程
であり、Ｓ３１１は第２の音響モデルを記憶する過程で
ある。Ｓ３４からＳ３１１までの過程を経て作成される
第２音響モデルを適応後のパラメータとして得ることに
より逐次型適応における１発声当たりの適応処理を行
う。In FIG. 3, S31 is a step of inputting speech, and S32 is a step of analyzing the input speech and extracting a feature parameter sequence. In S33, either the adaptive mode or the recognition mode is selected. S34 is a step of reading the parameters of the first acoustic model, which is an acoustic model created in advance for unspecified speakers. Branch S35 checks whether this is the first learning utterance; only in that case does the first learning parameter set creation step S36 extract all or part of the parameters of the first acoustic model to create the first learning parameter set. S37 is a step of extracting all or part of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model to create the second learning parameter set, and S38 is a step of creating the intermediate parameters required for sequential learning. S39 is a step of calculating adaptive parameters from the first learning parameter set and the second parameter set, S310 is a step of creating the second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters, and S311 is a step of storing the second acoustic model. The adaptation processing for one utterance in sequential adaptation is performed by obtaining the second acoustic model created through steps S34 to S311 as the adapted parameters.

【００８０】認識時には、第１の音響モデルまたは第２
の音響モデルのいずれかを選択する過程Ｓ３１２と、選
択された音響モデルを用いて特徴パラメータの系列と照
合を行う過程Ｓ３１３と照合結果から認識結果を決定す
る過程Ｓ３１４と、認識結果を出力する過程Ｓ３１５過
程を経て音声認識を行う。At recognition, speech recognition is performed through step S312 of selecting either the first acoustic model or the second acoustic model, step S313 of matching the feature parameter sequence against the selected acoustic model, step S314 of determining a recognition result from the matching result, and step S315 of outputting the recognition result.

【００８１】前記説明の通り、第３の実施形態の方法に
より第１の学習パラメータセットの作成を適応の初回発
声時にのみ行うとともに、逐次型の適応に必要な中間パ
ラメータを発声毎に保存しておくことにより、適応パラ
メータの更新を発声毎に行うことが可能となり、逐次型
の学習を実現することができる。As described above, the method of the third embodiment creates the first learning parameter set only at the first utterance of the adaptation and saves the intermediate parameters needed for sequential adaptation at every utterance, so the adaptive parameters can be updated utterance by utterance and sequential learning can be realized.
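One way to read the "intermediate parameters" is as running sums from which the regression can be re-solved after each utterance without keeping earlier pairs. The one-dimensional sketch below takes that reading. The class `SequentialAdapter`, its running sums, and the identity-style prior pairs are assumptions made for illustration, since the patent does not specify the intermediate parameters at this point.

```python
from typing import Iterable, Tuple

Pair = Tuple[float, float]  # (explanatory value, objective value)


class SequentialAdapter:
    """Keeps running sums so y = slope * x + intercept can be refit per utterance."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0
        self.first_utterance = True

    def _accumulate(self, pairs: Iterable[Pair]) -> None:
        for x, y in pairs:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def update(self, utterance_pairs: Iterable[Pair],
               prior_pairs: Iterable[Pair]) -> Pair:
        """Add one utterance; the prior (first) set is folded in only once."""
        if self.first_utterance:
            self._accumulate(prior_pairs)
            self.first_utterance = False
        self._accumulate(utterance_pairs)
        return self.coefficients()

    def coefficients(self) -> Pair:
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0.0:
            raise ValueError("not enough spread in the explanatory variable")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept
```

A caller would create one adapter per speaker and call `update` after each adaptation utterance. The result after several utterances matches a batch fit over the prior pairs plus all utterance pairs.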

【００８２】図４は、本発明の第４実施形態の音声認識
装置の構成図を示す。FIG. 4 shows a configuration diagram of a speech recognition apparatus according to a fourth embodiment of the present invention.

【００８３】この実施形態では逐次型の適応が可能な音
声認識装置の構成例を示している。This embodiment shows an example of the configuration of a speech recognition apparatus that can perform successive adaptation.

【００８４】図４において、入力した音声を音響分析部
４１にて分析して、特徴パラメータ抽出部４２において
特徴パラメータ系列を抽出する。切り替え手段４３によ
り「適応モード」または「認識モード」の切り替えを行
う。In FIG. 4, an input voice is analyzed by an acoustic analysis unit 41, and a feature parameter extraction unit 42 extracts a feature parameter sequence. The switching unit 43 switches between “adaptive mode” and “recognition mode”.

【００８５】まず「適応モード」の場合について説明す
る。第１の記憶装置４４には適応前の不特定話者用音響
モデルとしてあらかじめ作成してある第１の音響モデル
のパラメータを格納している。第１の記憶装置４４から
読み出した第１の音響モデルのパラメータの全てあるい
は一部分のパラメータを抽出して、第１の学習パラメー
タセット作成部４５により第１の学習パラメータセット
を作成する。この操作は仮想スイッチ４３、４８１によ
り、学習の初回発声時にのみ行う。特徴パラメータ系列
および第１の音響モデルのパラメータ系列の全てあるい
は一部分のパラメータを抽出して、第２の学習パラメー
タセット作成部４６により第２の学習パラメータセット
を作成する。学習パラメータセット併合部４７では学習
の初回発声時にのみ第１及び第２の学習パラメータセッ
トの併合を行う。First, the case of the "adaptive mode" will be described. The first storage device 44 stores the parameters of the first acoustic model created in advance as an acoustic model for unspecified speakers before adaptation. All or a part of the parameters of the first acoustic model read from the first storage device 44 are extracted, and the first learning parameter set creating unit 45 creates the first learning parameter set. This operation is performed by the virtual switches 43 and 481 only at the first utterance of learning. The parameters of all or a part of the feature parameter series and the parameter series of the first acoustic model are extracted, and the second learning parameter set creating unit 46 creates a second learning parameter set. The learning parameter set merging unit 47 merges the first and second learning parameter sets only at the first utterance of learning.

【００８６】From the learning parameter set obtained by the learning parameter set merging unit 47 or the second learning parameter set creation unit 46, the adaptation parameter creation unit 48 calculates adaptation parameters by multiple regression analysis and stores them in the second storage device 49. The acoustic model conversion unit 410 creates the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters, and stores them in the third storage means 411.
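
A minimal sketch of the multiple regression step and the mapping function, assuming an affine model fitted by ordinary least squares over vector pairs (x taken from the first acoustic model, y taken from the input speech). The affine form, the function names, and the small Gauss-Jordan solver are illustrative assumptions, not the implementation described in this document.

```python
def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n matrix and b an n x k matrix, both as lists of rows.
    A singular matrix raises ZeroDivisionError.
    """
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_adaptation(pairs):
    """Least-squares fit of y ~ [x, 1] @ W over (x, y) vector pairs.

    Returns W as a (d + 1) x d matrix of partial regression coefficients.
    """
    d = len(pairs[0][0])
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [[0.0] * d for _ in range(d + 1)]
    for x, y in pairs:
        xe = list(x) + [1.0]  # append 1 for the constant term
        for i in range(d + 1):
            for j in range(d + 1):
                xtx[i][j] += xe[i] * xe[j]
            for j in range(d):
                xty[i][j] += xe[i] * y[j]
    return solve(xtx, xty)


def convert_mean(w, mean):
    """Mapping function: convert one mean vector of the first model."""
    xe = list(mean) + [1.0]
    d = len(mean)
    return [sum(xe[i] * w[i][j] for i in range(d + 1)) for j in range(d)]


# Toy pairs generated by y0 = 2*x0 + 1, y1 = x1 - 1.
pairs = [([0.0, 0.0], [1.0, -1.0]), ([1.0, 0.0], [3.0, -1.0]),
         ([0.0, 1.0], [1.0, 0.0]), ([1.0, 1.0], [3.0, 0.0])]
w = fit_adaptation(pairs)
adapted = convert_mean(w, [3.0, 4.0])
```

Applying `convert_mean` to every mean vector of the first acoustic model would yield the parameters of the second acoustic model under these assumptions.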

【００８７】Next, the configuration and operation in the "recognition mode" are described. For the feature parameters extracted by the feature parameter extraction unit 42, the matching unit 412 computes a matching score against the acoustic model selected by the control unit 414 and the acoustic model selection unit 415, and the result determination unit 417 obtains the final result.

【００８８】Providing the learning parameter set merging unit 47 and including the acoustic model for unspecified speakers in the learning parameter set makes it possible to obtain accurate adaptation parameters stably even when little learning data is available. In addition, because the first learning parameter set is added to the second learning parameter set only at the first utterance of learning and the adaptation parameters are updated at every subsequent utterance, sequential adaptation can be realized.

【００８９】FIG. 5 shows a flowchart of the speech recognition method according to the fifth embodiment of the present invention.

【００９０】In this embodiment, in addition to the sequential learning made possible by the configuration of the third embodiment, a function for selectively controlling the acoustic model is newly added to the recognition processing. This is an example of a method for stabilizing recognition performance when the number of learning samples is extremely small.

【００９１】In FIG. 5, S51 is a step of inputting speech, and S52 is a step of analyzing the input speech and extracting a feature parameter sequence. In S53, either the adaptive mode or the recognition mode is selected. S54 is a step of reading the parameters of the first acoustic model, which is an acoustic model for unspecified speakers prepared in advance. Branch S55 checks whether this is the first utterance of learning; only in that case does the first learning parameter set creation step S56 extract all or some of the parameters of the first acoustic model and create the first learning parameter set. S57 is a step of extracting all or some of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model to create the second learning parameter set, and S58 is a step of creating the intermediate parameters required for sequential learning. S59 is a step of calculating the adaptation parameters from the first learning parameter set and the second learning parameter set. S510 is a step of creating the second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model (described later) using the adaptation parameters, and S511 is a step of storing the second acoustic model. Obtaining the second acoustic model created through steps S54 to S511 as the adapted parameters constitutes the adaptation processing for one utterance in sequential adaptation.

【００９２】At recognition time, step S512 determines which of the first acoustic model and the second acoustic model to select, and step S513 controls the selection. Recognition then proceeds through step S514 of matching the feature parameter sequence against the selected acoustic model, step S515 of determining the recognition result from the matching result, and step S516 of outputting the recognition result.

【００９３】As described above, in the method of the fifth embodiment the first learning parameter set is created only at the first utterance of adaptation, and the intermediate parameters required for sequential adaptation are saved at every utterance. The adaptation parameters can therefore be updated at every utterance, which realizes sequential learning.
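
The per-utterance update can be illustrated with running sums. This sketch assumes that the intermediate parameters are the accumulated sums of a least-squares fit and, for brevity, uses a single explanatory variable (y ≈ a·x + b) rather than the full multiple regression; the class and attribute names are hypothetical.

```python
class SequentialAdapter:
    """Keeps least-squares sums so coefficients can be updated per utterance."""

    def __init__(self, si_means):
        # si_means: scalar means of the first acoustic model; they must
        # contain at least two distinct values for the fit to be defined.
        self.si_means = list(si_means)
        self.first_utterance = True
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _accumulate(self, x, y):
        # Intermediate parameters (assumed form): running sums.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b

    def add_utterance(self, pairs):
        """pairs: (model mean, observed value) pairs from one utterance."""
        if self.first_utterance:
            # The model-derived pairs are added once, at the first utterance.
            for m in self.si_means:
                self._accumulate(m, m)
            self.first_utterance = False
        for x, y in pairs:
            self._accumulate(x, y)
        return self.coefficients()


adapter = SequentialAdapter([0.0, 1.0, 2.0])
after_first = adapter.add_utterance([(1.0, 2.0)])
after_second = adapter.add_utterance([(0.0, 1.0), (2.0, 3.0)])
```

Because only the sums are stored, earlier utterances never need to be reprocessed when a new one arrives.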

【００９４】In the embodiment described above, newly adding the selective control function S512 for the acoustic model and the selection control S513 to the recognition processing makes it possible to stabilize recognition performance when the number of learning samples is extremely small.
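
One possible form of the selection control is sketched below. The criterion used to choose between the two models is not stated in this passage; the utterance-count threshold, the parameter `min_utterances`, and the function name are purely illustrative assumptions.

```python
def select_model(first_model, second_model, num_adaptation_utterances,
                 min_utterances=3):
    """Return the model to use for matching.

    Hypothetical rule: fall back to the first (unadapted) model until
    enough adaptation utterances have been collected.
    """
    if second_model is None or num_adaptation_utterances < min_utterances:
        return first_model
    return second_model


chosen_early = select_model("first", "second", num_adaptation_utterances=1)
chosen_late = select_model("first", "second", num_adaptation_utterances=5)
```

With very few adaptation utterances the unadapted model is kept, which is the situation in which the adapted model is least reliable.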

【００９５】FIG. 6 shows a configuration diagram of the speech recognition device according to the sixth embodiment of the present invention.

【００９６】This embodiment is an example of a speech recognition device capable of sequential adaptation, configured to stabilize recognition performance when the number of learning samples is extremely small.

【００９７】In FIG. 6, the input speech is analyzed by the acoustic analysis unit 61, and the feature parameter extraction unit 62 extracts a feature parameter sequence. The switching means 63 switches between the "adaptive mode" and the "recognition mode". First, the "adaptive mode" is described. The first storage device 64 stores the parameters of a first acoustic model prepared in advance as the acoustic model for unspecified speakers before adaptation. The first learning parameter set creation unit 65 extracts all or some of the parameters of the first acoustic model read from the first storage device 64 and creates a first learning parameter set. Virtual switches 63 and 681 ensure that this operation is performed only at the first utterance of learning. The second learning parameter set creation unit 66 extracts all or some of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model and creates a second learning parameter set. The learning parameter set merging unit 67 merges the first and second learning parameter sets only at the first utterance of learning.

【００９８】From the learning parameter set obtained by the learning parameter set merging unit 67 or the second learning parameter set creation unit 66, the adaptation parameter creation unit 68 calculates adaptation parameters by multiple regression analysis and stores them in the second storage device 69. The acoustic model conversion unit 611 creates the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters, and stores them in the third storage device 612.

【００９９】Next, the "recognition mode" is described. For the feature parameters extracted by the feature parameter extraction unit 62, the matching unit 612 computes a matching score against the acoustic model selected by the control unit 614, the determination unit 615, and the acoustic model selection unit 616, and the result determination unit 617 obtains the final result.

【０１００】In this embodiment, in addition to the sequential learning made possible by the configuration of the fourth embodiment, the function for selectively controlling the acoustic model is implemented by the control unit 614 and the result determination unit 615 and newly added to the recognition processing, which makes it possible to stabilize recognition performance when the number of learning samples is extremely small.

【０１０１】FIG. 7 shows a flowchart of the speech recognition method according to the seventh embodiment of the present invention.

【０１０２】In this embodiment, in addition to the sequential learning made possible by the configuration of the third embodiment, a configuration is realized that prevents an extreme drop in recognition performance even when, for example, the speaker changes partway through sequential learning.

【０１０３】In FIG. 7, S71 is a step of inputting speech, and S72 is a step of analyzing the input speech and extracting a feature parameter sequence. In S73, either the adaptive mode or the recognition mode is selected. S74 is a step of reading the parameters of the first acoustic model, which is an acoustic model for unspecified speakers prepared in advance. Branch S75 checks whether this is the first utterance of learning; only in that case does the first learning parameter set creation step S76 extract all or some of the parameters of the first acoustic model and create the first learning parameter set. S77 is a step of extracting all or some of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model to create the second learning parameter set, and S78 is a step of creating the intermediate parameters required for sequential learning. S79 is a step of calculating the adaptation parameters from the first learning parameter set and the second learning parameter set. S710 is a step of creating the second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model (described later) using the adaptation parameters, and S711 is a step of storing the second acoustic model. Obtaining the second acoustic model created through steps S74 to S711 as the adapted parameters constitutes the adaptation processing for one utterance in sequential adaptation.

【０１０４】At recognition time, the feature parameter sequence is matched against the first acoustic model in S712 and against the second acoustic model in S713. In S714, the result with the highest score among the matching results for the two acoustic models is chosen as the final result, and the recognition result is output in S715.

【０１０５】In the embodiment described above, always matching against both the pre-adaptation acoustic model and the post-adaptation acoustic model in S712 and S713 avoids an extreme drop in the recognition rate even when the speaker changes.
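
The dual matching can be sketched as follows. Each acoustic model is reduced here to a dictionary from a word to a single template vector, and the score is a negative squared distance; both simplifications, and all names, are assumptions made only to show the "match against both, keep the highest score" logic.

```python
def neg_sq_distance(features, template):
    """Toy matching score: higher is better."""
    return -sum((f - t) ** 2 for f, t in zip(features, template))


def recognize_with_both(features, first_model, second_model,
                        score_fn=neg_sq_distance):
    """Match against both models and keep the highest-scoring result.

    Returns a (score, word, model_name) tuple.
    """
    best = None
    for model_name, model in (("first", first_model), ("second", second_model)):
        for word, template in model.items():
            score = score_fn(features, template)
            if best is None or score > best[0]:
                best = (score, word, model_name)
    return best


first_model = {"yes": [0.0, 0.0], "no": [1.0, 1.0]}    # before adaptation
second_model = {"yes": [0.5, 0.5], "no": [2.0, 2.0]}   # after adaptation

adapted_speaker = recognize_with_both([0.4, 0.6], first_model, second_model)
new_speaker = recognize_with_both([0.0, 0.1], first_model, second_model)
```

In this toy setup the adapted speaker's input is best explained by the second model, while input from a different speaker falls back to the first model without any explicit speaker-change detection.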

【０１０６】FIG. 8 shows a configuration diagram of the speech recognition device according to the eighth embodiment of the present invention.

【０１０７】This embodiment is an example of a speech recognition device capable of sequential adaptation, configured to stabilize recognition performance when the number of learning samples is extremely small.

【０１０８】In FIG. 8, the input speech is analyzed by the acoustic analysis unit 81, and the feature parameter extraction unit 82 extracts a feature parameter sequence. The switching means 83 switches between the "adaptive mode" and the "recognition mode". First, the "adaptive mode" is described. The first storage device 84 stores the parameters of a first acoustic model prepared in advance as the acoustic model for unspecified speakers before adaptation. The first learning parameter set creation unit 85 extracts all or some of the parameters of the first acoustic model read from the first storage device 84 and creates a first learning parameter set. Virtual switches 83 and 881 ensure that this operation is performed only at the first utterance of learning. The second learning parameter set creation unit 86 extracts all or some of the parameters of the feature parameter sequence and of the parameter sequence of the first acoustic model and creates a second learning parameter set. The learning parameter set merging unit 87 merges the first and second learning parameter sets only at the first utterance of learning.

【０１０９】From the learning parameter set obtained by the learning parameter set merging unit 87 or the second learning parameter set creation unit 86, the adaptation parameter creation unit 88 calculates adaptation parameters by multiple regression analysis and stores them in the second storage device 89. The acoustic model conversion unit 810 creates the parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters, and stores them in the third storage means 811.

【０１１０】Next, the "recognition mode" is described. For the feature parameters extracted by the feature parameter extraction unit 82, the matching unit 812 computes matching scores against the first acoustic model and the second acoustic model, and the result determination unit 817 obtains the final result.

【０１１１】In this embodiment, in addition to the sequential learning made possible by the configuration of the fourth embodiment, matching is always performed against both the first acoustic model and the second acoustic model, which makes it possible to prevent an extreme drop in the recognition rate even when the speaker changes while the recognition device is in use.

【０１１２】The recording medium according to the ninth embodiment of the present invention is a computer-readable medium that stores a program for causing a computer to execute the method according to the first embodiment.

【０１１３】The recording medium according to the tenth embodiment of the present invention is a computer-readable medium that stores a program for causing a computer to execute the method according to the third embodiment.

【０１１４】The recording medium according to the eleventh embodiment of the present invention is a computer-readable medium that stores a program for causing a computer to execute the method according to the fifth embodiment.

【０１１５】Further, the recording medium according to the twelfth embodiment of the present invention is a computer-readable medium that stores a program for causing a computer to execute the method according to the seventh embodiment.

【０１１６】[0116]

[Effects of the Invention] As described above, the present invention obtains adaptation parameters using a third learning parameter set, which is obtained by merging a first learning parameter set created from the acoustic model for unspecified speakers with a second learning parameter set created from the feature parameters of the input signal and the acoustic model for unspecified speakers. This provides a speech recognition method that realizes stable speaker adaptation by simple processing even with a small amount of learning data.

[Brief description of the drawings]

FIG. 1 is a flowchart of the speech recognition method according to the first embodiment of the present invention.

FIG. 2 is a configuration diagram of the speech recognition device according to the second embodiment of the present invention.

FIG. 3 is a flowchart of the speech recognition method according to the third embodiment of the present invention.

FIG. 4 is a configuration diagram of the speech recognition device according to the fourth embodiment of the present invention.

FIG. 5 is a flowchart of the speech recognition method according to the fifth embodiment of the present invention.

FIG. 6 is a configuration diagram of the speech recognition device according to the sixth embodiment of the present invention.

FIG. 7 is a flowchart of the speech recognition method according to the seventh embodiment of the present invention.

FIG. 8 is a configuration diagram of the speech recognition device according to the eighth embodiment of the present invention.

FIG. 9 is a processing flow diagram of a conventional speaker adaptation method.

[Explanation of symbols]

S11: speech input step
S12: feature parameter extraction step
S13: mode selection step
S14: first acoustic model parameter reading step
S15: first learning parameter set creation step
S16: second learning parameter set creation step
S17: adaptation parameter calculation step
S18: second acoustic model parameter creation step
S19: second acoustic model parameter storage step
S110: acoustic model selection step
S111: matching step
S112: recognition result determination step
S113: recognition result output step
21: acoustic analysis unit
22: feature parameter extraction unit
23: changeover switch
24: first storage device
25: first learning parameter set creation unit
26: second learning parameter set creation unit
27: learning parameter set merging unit
28: adaptation parameter creation unit
29: second storage device
210: acoustic model conversion unit
211: third storage device
212: matching unit
215: acoustic model selection unit
214: control unit
217: result determination unit

Continued from the front page

(72) Inventor: Nobuyuki Kunieda, c/o Matsushita Communication Industrial Co., Ltd., 4-3-1 Tsunashima-higashi, Kohoku-ku, Yokohama-shi, Kanagawa
(72) Inventor: Kazuya Nomura, c/o Matsushita Communication Industrial Co., Ltd., 4-3-1 Tsunashima-higashi, Kohoku-ku, Yokohama-shi, Kanagawa
F-term (reference): 5D015 GG03 HH00

Claims

1. A speech recognition method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading, when the speech is input for adaptation, the parameters of a first acoustic model for unspecified speakers prepared in advance; a step of creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of calculating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters; a step of selecting, when performing recognition, either the first acoustic model or the second acoustic model; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.

2. The speech recognition method according to claim 1, wherein the acoustic model is an acoustic model modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each element of the learning parameter sets is configured as a pair of vectors, and the learning parameter sets are created while the vector pairs are determined between the feature parameter sequence and the parameter sequence of the acoustic model using time-axis alignment by the Viterbi method, the DP method, or the Baum-Welch method.

3. The speech recognition method according to claim 1, wherein each element of the learning parameter sets is configured as a pair of vectors, the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

4. A speech recognition method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading, when the speech is input for adaptation, the parameters of a first acoustic model for unspecified speakers prepared in advance; a step of determining whether the speech is the first utterance in the adaptation and creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters; a step of selecting, when performing recognition, either the first acoustic model or the second acoustic model; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.

5. The speech recognition method according to claim 4, wherein the acoustic model is an acoustic model modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each element of the learning parameter sets is configured as a pair of vectors, and the learning parameter sets are created while the vector pairs are determined between the feature parameter sequence and the parameter sequence of the acoustic model using time-axis alignment by the Viterbi method, the DP method, or the Baum-Welch method.

6. The speech recognition method according to claim 4, wherein each element of the learning parameter sets is configured as a pair of vectors, the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

7. A speech recognition method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading, when the speech is input for adaptation, the parameters of a first acoustic model for unspecified speakers prepared in advance; a step of determining whether the speech is the first utterance in the adaptation and creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters; a step of determining, when performing recognition, which of the first acoustic model and the second acoustic model to select; a step of controlling the selection between the two acoustic models based on the determination result; a step of matching the feature parameter sequence against the selected acoustic model; and a step of determining a recognition result from the matching result.

8. The speech recognition method according to claim 7, wherein the acoustic model is an acoustic model modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each element of the learning parameter sets is configured as a pair of vectors, and the learning parameter sets are created while the vector pairs are determined between the feature parameter sequence and the parameter sequence of the acoustic model using time-axis alignment by the Viterbi method, the DP method, or the Baum-Welch method.

9. The speech recognition method according to claim 7, wherein each element of the learning parameter sets is configured as a pair of vectors, the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

10. A speech recognition method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading, when the speech is input for adaptation, the parameters of a first acoustic model for unspecified speakers prepared in advance; a step of determining whether the speech is the first utterance in the adaptation and creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptation parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters; a step of matching, when performing recognition, the feature parameter sequence against both the first acoustic model and the second acoustic model; and a step of determining a recognition result from the matching results.

11. The speech recognition method according to claim 10, wherein the acoustic model is an acoustic model modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each element of the learning parameter sets is configured as a pair of vectors, and the learning parameter sets are created while the vector pairs are determined between the feature parameter sequence and the parameter sequence of the acoustic model using time-axis alignment by the Viterbi method, the DP method, or the Baum-Welch method.

12. The speech recognition method according to claim 10, wherein each element of the learning parameter sets is configured as a pair of vectors, the elements of one parameter vector of the pair are used as objective variables in multiple regression analysis, the other parameter vector of the pair is used as explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

13. A speech recognition device comprising: feature parameter extraction means for analyzing input speech to extract a feature parameter sequence; first storage means for storing the parameters of a first acoustic model for unspecified speakers, prepared in advance, before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second learning parameter set; adaptation parameter creation means for calculating adaptation parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptation parameters; and second storage means for storing the parameters of the second acoustic model.

14. The speech recognition apparatus according to claim 13, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

15. A speech recognition apparatus comprising: feature parameter extracting means for analyzing input speech to extract a feature parameter sequence; first storage means for storing parameters of a first acoustic model prepared in advance for unspecified speakers before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second learning parameter set only at the first utterance of learning; adaptive parameter creation means for calculating or updating adaptive parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; and second storage means for storing the parameters of the second acoustic model.
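One way to realize "calculating or updating" across utterances is to accumulate the normal-equation sums of the regression, sketched below. The accumulation scheme is an assumption of this example, not something the claim specifies, as is the self-paired first learning set. The class name `IncrementalAdapter` is hypothetical, and each utterance is assumed to contribute at least one pair.

```python
import numpy as np

class IncrementalAdapter:
    """Update regression-based adaptation one utterance at a time (sketch)."""

    def __init__(self, si_means):
        self.si_means = np.asarray(si_means, dtype=float)
        d = self.si_means.shape[1]
        self.xtx = np.zeros((d + 1, d + 1))  # running sum of design^T design
        self.xty = np.zeros((d + 1, d))      # running sum of design^T targets
        self.first_utterance = True

    def _accumulate(self, X, Y):
        design = np.hstack([np.ones((len(X), 1)), X])
        self.xtx += design.T @ design
        self.xty += design.T @ Y

    def add_utterance(self, pairs):
        """pairs: non-empty list of (model mean, observed feature vector)."""
        if self.first_utterance:
            # Assumption: the first learning set pairs each mean with itself;
            # it is merged in only once, at the first utterance.
            self._accumulate(self.si_means, self.si_means)
            self.first_utterance = False
        X = np.array([x for x, _ in pairs], dtype=float)
        Y = np.array([y for _, y in pairs], dtype=float)
        self._accumulate(X, Y)
        coef = np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]
        ones = np.ones((len(self.si_means), 1))
        return np.hstack([ones, self.si_means]) @ coef  # second-model means
```

Because only the two sums are kept, earlier utterances need not be stored, and the first learning set enters the statistics exactly once.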

16. The speech recognition apparatus according to claim 15, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

17. A speech recognition apparatus comprising: feature parameter extracting means for analyzing input speech to extract a feature parameter sequence; first storage means for storing parameters of a first acoustic model prepared in advance for unspecified speakers before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second learning parameter set only at the first utterance of learning; adaptive parameter creation means for calculating or updating adaptive parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; second storage means for storing the parameters of the second acoustic model; and means for determining whether to use the first acoustic model or the second acoustic model for recognition.

18. The speech recognition apparatus according to claim 17, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

19. A speech recognition apparatus comprising: feature parameter extracting means for analyzing input speech to extract a feature parameter sequence; first storage means for storing parameters of a first acoustic model prepared in advance for unspecified speakers before adaptation; means for creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance of learning; means for creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; means for merging the first learning parameter set and the second learning parameter set only at the first utterance of learning; adaptive parameter creation means for calculating or updating adaptive parameters by multiple regression analysis from a third learning parameter set obtained by the merging means; acoustic model parameter conversion means for creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; second storage means for storing the parameters of the second acoustic model; and matching means for obtaining, at the time of recognition, a matching result against the first acoustic model and a matching result against the second acoustic model.
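Matching against both acoustic models can be sketched as below. This is a simplified illustration: each word model is assumed to be a left-to-right sequence of mean vectors, the matching cost is a squared-distance alignment cost rather than an HMM likelihood, and picking the lowest cost over both models is an assumed decision rule that the claim does not state. The names `alignment_cost` and `recognize_with_both` are hypothetical.

```python
import numpy as np

def alignment_cost(features, means):
    """Minimum left-to-right alignment cost of the frames against the means."""
    features = np.asarray(features, dtype=float)
    means = np.asarray(means, dtype=float)
    T, S = len(features), len(means)
    dist = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    cost = np.full((T, S), np.inf)
    cost[0, 0] = dist[0, 0]  # the path must start in the first state
    for t in range(1, T):
        for s in range(S):
            prev = cost[t - 1, s]
            if s > 0:
                prev = min(prev, cost[t - 1, s - 1])
            cost[t, s] = prev + dist[t, s]
    return float(cost[-1, -1])  # the path must end in the last state

def recognize_with_both(features, first_model, second_model):
    """Match against every word of both models; return the best candidate.

    first_model, second_model: dict mapping word -> (S, D) mean vectors.
    Returns (word, model name, cost) for the lowest-cost match (assumed rule).
    """
    candidates = []
    for name, model in (("first", first_model), ("second", second_model)):
        for word, means in model.items():
            candidates.append((alignment_cost(features, means), word, name))
    cost, word, name = min(candidates)
    return word, name, cost
```

Keeping both results available means an utterance that fits the unadapted model better is not forced through the adapted one.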

20. The speech recognition apparatus according to claim 19, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

21. A computer-readable speech recognition storage medium storing a program of a speech recognition method, the method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading parameters of a first acoustic model for unspecified speakers prepared in advance when the speech is input for adaptation; a step of creating a first learning parameter set from the parameters of the first acoustic model; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of calculating adaptive parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a mapping function based on a multiple regression model using the adaptive parameters; a step of selecting either the first acoustic model or the second acoustic model when performing recognition; a step of performing matching using the selected acoustic model; and a step of determining a recognition result from the matching result.

22. The speech recognition storage medium according to claim 21, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

23. The speech recognition storage medium according to claim 21, wherein each element of the learning parameter set is configured in the form of a pair of vectors, an element of one parameter vector of the pair is used as the objective variable in a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

24. A computer-readable speech recognition storage medium storing a program of a speech recognition method, the method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading parameters of a first acoustic model for unspecified speakers prepared in advance when the speech is input for adaptation; a step of determining whether or not the utterance is the first utterance in the adaptation; a step of creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; a step of selecting either the first acoustic model or the second acoustic model at the time of recognition; a step of performing matching against the feature parameter sequence using the selected acoustic model; and a step of determining a recognition result from the matching result.

25. The speech recognition storage medium according to claim 24, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

26. The speech recognition storage medium according to claim 24, wherein each element of the learning parameter set is configured in the form of a pair of vectors, an element of one parameter vector of the pair is used as the objective variable in a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

27. A computer-readable speech recognition storage medium storing a program of a speech recognition method, the method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading parameters of a first acoustic model for unspecified speakers prepared in advance when the speech is input for adaptation; a step of determining whether or not the utterance is the first utterance in the adaptation; a step of creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; a step of determining whether to use the first acoustic model or the second acoustic model when performing recognition; a step of performing selection control of the two acoustic models based on the determination result; a step of performing matching against the feature parameter sequence using the selected acoustic model; and a step of determining a recognition result from the matching result.

28. The speech recognition storage medium according to claim 27, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

29. The speech recognition storage medium according to claim 27, wherein each element of the learning parameter set is configured in the form of a pair of vectors, an element of one parameter vector of the pair is used as the objective variable in a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.

30. A computer-readable speech recognition storage medium storing a program of a speech recognition method, the method comprising: a step of analyzing input speech to extract a feature parameter sequence; a step of reading parameters of a first acoustic model for unspecified speakers prepared in advance when the speech is input for adaptation; a step of determining whether or not the utterance is the first utterance in the adaptation; a step of creating a first learning parameter set from the parameters of the first acoustic model only at the first utterance; a step of creating a second learning parameter set from the feature parameter sequence and the parameter sequence of the first acoustic model; a step of creating or updating adaptive parameters from the first learning parameter set and the second learning parameter set; a step of creating parameters of a second acoustic model from the parameters of the first acoustic model by a predetermined mapping function based on a multiple regression model using the adaptive parameters; a step of performing matching against the feature parameter sequence using both the first acoustic model and the second acoustic model when performing recognition; and a step of determining a recognition result from the matching results.

31. The speech recognition storage medium according to claim 30, wherein the acoustic model is modeled by a continuous-distribution HMM or an HMM approximating a continuous HMM, the parameters of the acoustic model are the mean vectors of the acoustic model, each of the learning parameter sets is configured in the form of pairs of vectors, and the learning parameter set is created while the correspondence between the feature parameter sequence and the parameter sequence of the acoustic model is determined by time-axis alignment using the Viterbi method, the DP method, or the Baum-Welch method.

32. The speech recognition storage medium according to claim 30, wherein each element of the learning parameter set is configured in the form of a pair of vectors, an element of one parameter vector of the pair is used as the objective variable in a multiple regression analysis, the other parameter vector of the pair is used as the explanatory variables, and the adaptation parameters are the partial regression coefficients of the multiple regression analysis.