JP2002297181A

JP2002297181A - Speech recognition vocabulary registration determination method and speech recognition device

Info

Publication number: JP2002297181A
Application number: JP2001100776A
Authority: JP
Inventors: Tsuneo Kato; 恒夫加藤; Toru Shimizu; 徹清水; Norio Higuchi; 宜男樋口
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2001-03-30
Filing date: 2001-03-30
Publication date: 2002-10-11

Abstract

PROBLEM TO BE SOLVED: To prevent the registration of vocabulary that is liable to cause misrecognition. SOLUTION: Evaluation value calculation means 5 calculates a numerical value that evaluates how easily a new vocabulary entry is confused with the already registered vocabulary. Determination means 6 applies a threshold to the numerical value supplied by the evaluation value calculation means 5 and decides whether registration of the new vocabulary is permitted or prohibited.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a technique for detecting, when the vocabulary to be recognized by a speech recognition system is selected or registered, vocabulary (words, phrases, or combinations of them) that is expected to be prone to misrecognition, so that recognition performance does not degrade. It is useful when a user of a speech recognition system registers new vocabulary in the system, and when a developer of such a system selects the recognition target vocabulary to register in it.

[0002]

DESCRIPTION OF THE RELATED ART

Telephone-based airline ticket reservation systems, automated telephone reception systems, and other task-limited speech recognition devices, personally registered voice portal devices, center-provided voice portal devices, and the like differ from the speech recognition system of a voice-input word processor in two ways: the input speech is of poorer quality because it arrives by telephone, and the task is limited. For these reasons the number of recognition target vocabulary entries is often restricted to prevent a drop in recognition performance. In other words, only a limited vocabulary suited to the task is recognized.

[0003] In this type of speech recognition system, the system developer usually selects the limited vocabulary empirically. For example, a developer knows from experience that 「おじさん」 (ojisan, "uncle") and 「おじいさん」 (ojiisan, "grandfather") are easily misrecognized as each other, so once "ojisan" is registered, "ojiisan" is rejected as unsuitable and is not registered. Because this relies on experience, however, selecting the recognition target vocabulary is inefficient.

[0004] On the other hand, users of the system could be allowed to customize the recognition target vocabulary. Users have little experience, however, so there is a risk that they will register new vocabulary that is easily confused with the registered vocabulary and that recognition performance will drop as a result. For example, a user might unknowingly add "ojiisan" as new vocabulary even though "ojisan" is already registered. In practice, therefore, it is difficult to let users extend the recognition target vocabulary.

[0005]

PROBLEM TO BE SOLVED BY THE INVENTION

The object of the present invention is to detect vocabulary that is expected to be prone to misrecognition when the recognition target vocabulary is selected or registered, and to prevent that vocabulary from being registered. This allows system developers to select the recognition target vocabulary efficiently. It also allows users to customize the system, for example by adding recognition target vocabulary, without degrading recognition performance.

[0006]

MEANS FOR SOLVING THE PROBLEM

The invention according to claim 1 is a speech recognition vocabulary registration determination method characterized by quantifying how easily a new vocabulary entry is confused with the registered vocabulary of a speech recognition device, setting a threshold for the resulting numerical value (hereinafter, the evaluation value), and determining whether registration of the new vocabulary is permitted or prohibited.

[0007] The invention according to claim 2 is the speech recognition vocabulary registration determination method of claim 1, in which the characters or character strings of the new vocabulary and of the registered vocabulary are each decomposed into recognition units such as phonemes or syllables to obtain a first recognition unit sequence corresponding to the new vocabulary and a second recognition unit sequence corresponding to the registered vocabulary, and the similarity between the first and second recognition unit sequences is obtained as the evaluation value.

[0008] The invention according to claim 3 is the speech recognition vocabulary registration determination method of claim 2, in which the similarity is obtained by taking the sum or the product of measures that represent the distance between recognition units.

[0009] The invention according to claim 4 is the speech recognition vocabulary registration determination method of claim 2, in which the similarity is obtained using a confusion matrix between recognition units that has been obtained in advance.

[0010] The invention according to claim 5 is the speech recognition vocabulary registration determination method of claim 2, in which the similarity is obtained using the distributions in acoustic parameter space that represent the HMMs (hidden Markov models) of the recognition units.

[0011] The invention according to claim 6 is the speech recognition vocabulary registration determination method of claim 1, in which a pseudo acoustic parameter sequence corresponding to the character or character string of the new vocabulary is generated, speech recognition processing is performed on the generated acoustic parameter sequence, and the evaluation value is obtained from the resulting likelihood.

[0012] The invention according to claim 7 is the speech recognition vocabulary registration determination method of claim 6, in which the speech recognition processing comprises a first speech recognition process using a grammar that accepts the new vocabulary and a second speech recognition process using a grammar that accepts only the registered vocabulary, and the difference between the likelihood obtained in the first process and the likelihood obtained in the second process is used as the evaluation value.

[0013] The invention according to claim 8 is the speech recognition vocabulary registration determination method of claim 7, in which the acoustic parameter sequence is generated using an HMM (hidden Markov model).

[0014] The invention according to claim 9 is the speech recognition vocabulary registration determination method of claim 8, in which the acoustic parameter sequence is generated by applying a perturbation based on the variance of the HMM, or by randomly connecting acoustic parameters of a noise HMM.

[0015] The invention according to claim 10 is a speech recognition vocabulary registration determination method characterized by using the method of claim 2 and the method of claim 6 in combination.

[0016] The invention according to claim 11 is a speech recognition device that has speech input means, vocabulary registration means, and speech recognition means, and that recognizes speech input from the speech input means on the basis of the vocabulary registered in the vocabulary registration means (hereinafter, the registered vocabulary). The device is characterized by having character input means for inputting characters; evaluation value calculation means for calculating a numerical value (hereinafter, the evaluation value) that evaluates how easily an input character or character string (hereinafter, the new vocabulary) is confused with the registered vocabulary; and determination means for setting a threshold for the resulting evaluation value and determining whether registration of the new vocabulary is permitted or prohibited.

[0017] The invention according to claim 12 is the speech recognition device of claim 11, which has first conversion means for decomposing the new vocabulary into recognition units and generating a first recognition unit sequence corresponding to the new vocabulary, and second conversion means for decomposing the registered vocabulary into recognition units and generating a second recognition unit sequence corresponding to the registered vocabulary, and in which the evaluation value calculation means calculates, as the evaluation value, the similarity between the first recognition unit sequence obtained by the first conversion means and the second recognition unit sequence obtained by the second conversion means.

[0018] The invention according to claim 13 is the speech recognition device of claim 12, in which the evaluation value calculation means calculates the similarity by taking the sum or the product of measures that represent the distance between recognition units.

[0019] The invention according to claim 14 is the speech recognition device of claim 12, in which the evaluation value calculation means calculates the similarity using a confusion matrix between recognition units that has been obtained in advance.

[0020] The invention according to claim 15 is the speech recognition device of claim 12, in which the evaluation value calculation means calculates the similarity using the distributions in acoustic parameter space that represent the HMMs (hidden Markov models) of the recognition units.

[0021] The invention according to claim 16 is the speech recognition device of claim 11, which has acoustic parameter sequence generation means for generating a pseudo acoustic parameter sequence corresponding to the new vocabulary and supplying it to the speech recognition means, and in which the evaluation value calculation means calculates the evaluation value using the likelihood obtained from the speech recognition means.

[0022] The invention according to claim 17 is the speech recognition device of claim 16, in which the speech recognition means performs a first speech recognition process using a grammar that accepts the new vocabulary and a second speech recognition process using a grammar that accepts only the registered vocabulary, and the evaluation value calculation means calculates, as the evaluation value, the difference between the likelihood obtained in the first process and the likelihood obtained in the second process.

[0023] The invention according to claim 18 is the speech recognition device of claim 17, in which the acoustic parameter sequence generation means generates the acoustic parameter sequence using an HMM (hidden Markov model).

[0024] The invention according to claim 19 is the speech recognition device of claim 18, in which the acoustic parameter sequence generation means generates the acoustic parameter sequence by applying a perturbation based on the variance of the HMM, or by randomly connecting acoustic parameters of a noise HMM.

[0025] The invention according to claim 20 is a computer program that causes a computer to execute the speech recognition vocabulary registration determination method according to any one of claims 1 to 10.

[0026] The invention according to claim 21 is a recording medium on which the computer program according to claim 20 is recorded.

[0027]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention are described below.

[0028] The basic speech recognition device and speech recognition vocabulary registration determination method of the present invention are described with reference to FIGS. 1 and 2. FIG. 1 shows a configuration example of the speech recognition device. The device has speech input means 1, vocabulary registration means 2, speech recognition means 3, character input means 4, evaluation value calculation means 5, and determination means 6, and is implemented mainly by a computer program and a computer that executes it.

[0029] The speech input means 1 supplies a speech signal to the speech recognition means 3. It may be, for example, a unit that receives a speech signal from an external source such as a telephone, or a unit such as a microphone that itself converts speech into a signal.

[0030] The vocabulary registration means 2 registers the recognition target vocabulary.

[0031] The speech recognition means 3 recognizes the speech input from the speech input means 1 on the basis of the vocabulary registered in the vocabulary registration means 2.

[0032] The character input means 4 supplies the new vocabulary, as a character-based character or character string, to the evaluation value calculation means 5 and the vocabulary registration means 2. It may be, for example, CUI-type input means such as a keyboard, character recognition means, or GUI-type input means. If the new vocabulary consists of kanji, alphanumeric characters, symbols, or the like, its reading is supplied to the evaluation value calculation means 5.

[0033] The evaluation value calculation means 5 calculates a numerical value (the evaluation value) that evaluates how easily the new vocabulary supplied by the character input means 4 is confused with the vocabulary registered in the vocabulary registration means 2, and supplies that evaluation value to the determination means 6. In this way the ease of confusion is quantified.

[0034] The determination means 6 sets a threshold for the evaluation value supplied by the evaluation value calculation means 5 and determines whether registration of the new vocabulary is permitted or prohibited. The determination result is supplied to the vocabulary registration means 2. The threshold can be set to a value found in advance, for example by experiment.

[0035] If the determination result permits registration, the vocabulary registration means 2 registers the new vocabulary supplied by the character input means 4 as recognition target vocabulary; if registration is prohibited, it does not register the new vocabulary. When registration is prohibited, warning means 7 reports this visually, audibly, or by similar means.

[0036] Accordingly, when vocabulary is registered in the speech recognition device of FIG. 1, the procedure shown in FIG. 2 is followed. First, the ease of confusion between the new vocabulary and the registered vocabulary is quantified (step S1). A threshold is then set for the resulting evaluation value, and it is determined whether registration of the new vocabulary is permitted or prohibited (step S2). If registration is permitted, the new vocabulary is registered (step S3); if it is prohibited, a warning to that effect is issued (step S4).
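The flow of steps S1 to S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function name `register_if_distinct`, the `distance` callback, and the choice to compare the smallest value over all registered entries with the threshold are hypothetical, as is treating a value equal to the threshold as prohibited. The sketch assumes a distance-like evaluation value, where a smaller value means two entries are easier to confuse.

```python
from typing import Callable, List


def register_if_distinct(new_word: str,
                         registered: List[str],
                         distance: Callable[[str, str], float],
                         threshold: float) -> bool:
    """Return True and register new_word if it is not too close to any entry."""
    # S1: quantify the ease of confusion against every registered entry.
    scores = [distance(new_word, word) for word in registered]
    # S2: compare the worst case (smallest distance) with the threshold.
    if scores and min(scores) <= threshold:
        # S4: registration prohibited, so warn instead of registering.
        print(f"warning: '{new_word}' is too close to a registered entry")
        return False
    # S3: registration permitted.
    registered.append(new_word)
    return True
```

Any evaluation function can be passed as `distance`; if a similarity-like value is used instead (larger means more confusable), the comparison has to be reversed.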

[0037] The computer of the speech recognition device reads a computer program and executes this procedure. In other words, the procedure that the computer is to execute is described in the computer program. The computer can read the computer program from a suitable recording medium, or through suitable transmission means such as a network.

[0038] The evaluation value calculation means 5 can quantify the ease of confusion by, for example, DP matching or pseudo speech recognition.

[0039] In the case of DP matching, the characters or character strings of the new vocabulary and of the registered vocabulary are each decomposed into recognition units such as phonemes or syllables, a first recognition unit sequence corresponding to the new vocabulary and a second recognition unit sequence corresponding to the registered vocabulary are obtained, and the similarity between the first and second recognition unit sequences is calculated. This similarity is the evaluation value for the ease of confusion between the new vocabulary and the registered vocabulary.

[0040] In a simple example, the scores (measures, or distance measures) that represent distance for a match, a substitution, a deletion, and an insertion between recognition units are each fixed values that do not depend on the combination of recognition units, and DP matching is performed between the two recognition unit sequences using these match (usually 0), substitution, deletion, and insertion scores. For example, the path with the minimum total score (sum of scores) is searched for, and the total score of that path is taken as the similarity between the two recognition unit sequences. When none of the match, substitution, deletion, and insertion scores is 0, the minimum product score (product of scores) can instead be used as the similarity. FIG. 3 shows a configuration example of a speech recognition device that implements this technique. Only the parts needed to calculate the evaluation value are shown; the speech input means and the speech recognition means are omitted. The computer of this speech recognition device likewise reads a computer program and executes the above procedure. The computer program describes the procedure to be executed by the computer.
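The fixed-score DP matching with a minimum total score can be sketched as below. This is an illustrative sketch rather than the patent's code: the function name `dp_distance` and the default score of 1.0 for substitution (replacement), deletion, and insertion are assumptions, and single letters of a romanized reading stand in for the phoneme or syllable recognition units.

```python
from typing import Sequence


def dp_distance(x: Sequence[str], y: Sequence[str],
                hit: float = 0.0, sub: float = 1.0,
                dele: float = 1.0, ins: float = 1.0) -> float:
    """Minimum total score of aligning unit sequence x with unit sequence y."""
    n, m = len(x), len(y)
    # d[i][j] holds the best total score for x[:i] aligned with y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele      # delete every unit of x
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins       # insert every unit of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = hit if x[i - 1] == y[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + step,   # match or substitution
                          d[i - 1][j] + dele,       # deletion
                          d[i][j - 1] + ins)        # insertion
    return d[n][m]
```

With these defaults the result is the ordinary edit distance, so `dp_distance("ojisan", "ojiisan")` is one insertion away, which is the kind of small value a threshold would be meant to catch.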

[0041] The speech recognition device shown in FIG. 3 additionally has storage means 8, first conversion means 9, and second conversion means 10. The storage means 8 holds the match, substitution, deletion, and insertion scores between recognition units as fixed values.

[0042] The first conversion means 9 obtains the new vocabulary from the character input means 4, decomposes it into recognition units, generates a first recognition unit sequence corresponding to the new vocabulary, and supplies it to the evaluation value calculation means 5. The second conversion means 10 obtains the registered vocabulary from the vocabulary registration means 2, decomposes it into recognition units, generates a second recognition unit sequence corresponding to the registered vocabulary, and supplies it to the evaluation value calculation means 5. Using the fixed match, substitution, deletion, and insertion scores held in the storage means 8, the evaluation value calculation means 5 performs the DP matching calculation between the first recognition unit sequence from the first conversion means 9 and the second recognition unit sequence from the second conversion means 10, and takes the minimum sum score or the minimum product score as the similarity between the two recognition unit sequences (the evaluation value for the ease of confusion between the new vocabulary and the registered vocabulary). Because the scores here represent distance, a larger value means the two entries are less alike. In this case the determination means 6 therefore permits registration of the new vocabulary when the evaluation value (similarity) is larger than the set threshold and prohibits registration when it is smaller.

[0043] In the example above the match, substitution, deletion, and insertion scores between recognition units are fixed values, but the ease of confusion between recognition units differs depending on the combination of units. The similarity (evaluation value) can therefore be obtained more accurately by setting the match, substitution, deletion, and insertion scores to values that depend on the combination of recognition units.

[0044] As one example, the following describes DP matching in which the ease of confusion between recognition units is quantified using a confusion matrix of recognition units such as phonemes or syllables.

[0045] A confusion matrix such as the one shown in FIG. 4 is obtained by conducting a recognition experiment on the recognition units in advance, such as a phoneme recognition experiment or a syllable recognition experiment. For simplicity, only two recognition units, a and b, are used here.

[0046] In FIG. 4, each cell gives the number of times an event of the recognition unit on the vertical axis was recognized as the recognition unit on the horizontal axis. The symbol del gives the number of times an event of the vertical-axis recognition unit was deleted, and the symbol ins gives the number of times the horizontal-axis recognition unit was inserted. For example, the value 80 at vertical a and horizontal a is the number of times an event of recognition unit a was recognized as recognition unit a, that is, the number of correct results. The value 15 at vertical a and horizontal b is the number of times an event of recognition unit a was recognized as recognition unit b, that is, the number of substitutions. The value 5 at vertical a and horizontal del is the number of times an event of recognition unit a was deleted, and the value 30 at vertical ins and horizontal a is the number of times recognition unit a was inserted.

[0047] From this confusion matrix, scores that express the ease of confusion between recognition units can be defined. One example is given by equations (1) to (8) below.

hit(a-a) = log(80 / (80 + 15 + 5))    ... (1)
sub(a-b) = log(15 / (80 + 15 + 5))    ... (2)
del(a)   = log( 5 / (80 + 15 + 5))    ... (3)
hit(b-b) = log(70 / (20 + 70 + 10))   ... (4)
sub(b-a) = log(20 / (20 + 70 + 10))   ... (5)
del(b)   = log(10 / (20 + 70 + 10))   ... (6)
ins(a)   = log(30 / (80 + 20 + 30))   ... (7)
ins(b)   = log( 5 / (15 + 70 + 5))    ... (8)

Here hit(a-a) is the match score for an event of recognition unit a being recognized as a (the logarithm of the ratio of correct results to the number of events); sub(a-b) is the substitution score for an event of recognition unit a being recognized as b (the logarithm of the ratio of substitutions to the number of events); and del(a) is the deletion score for an event of recognition unit a being deleted (the logarithm of the ratio of deletions to the number of events). Likewise, hit(b-b) is the match score for an event of recognition unit b being recognized as b, sub(b-a) is the substitution score for an event of recognition unit b being recognized as a, and del(b) is the deletion score for an event of recognition unit b being deleted. Finally, ins(a) is the insertion score for recognition unit a being inserted (the logarithm of the ratio of insertions to the number of observations), and ins(b) is the insertion score for recognition unit b being inserted (the logarithm of the ratio of insertions to the number of observations).
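The scores of equations (1) to (8) can be computed mechanically from the counts, as in the sketch below. The counts are the ones that appear in the equations; the dictionary layout, the names `CONFUSION`, `INSERTIONS`, and `log_scores`, and the use of the natural logarithm are assumptions, since the text does not state the base of the logarithm.

```python
import math

# Rows: spoken unit. Columns: recognized unit, or "del" when the unit was dropped.
CONFUSION = {
    "a": {"a": 80, "b": 15, "del": 5},
    "b": {"a": 20, "b": 70, "del": 10},
}
# Number of times each unit was inserted with nothing spoken.
INSERTIONS = {"a": 30, "b": 5}


def log_scores(confusion, insertions):
    """Return log-ratio scores keyed by (spoken, recognized) or ("ins", unit)."""
    scores = {}
    for spoken, row in confusion.items():
        events = sum(row.values())  # all events of the spoken unit
        for recognized, count in row.items():
            scores[(spoken, recognized)] = math.log(count / events)
    for unit, n_ins in insertions.items():
        # Observations of `unit` in the output: every column entry plus insertions.
        observed = sum(row[unit] for row in confusion.values()) + n_ins
        scores[("ins", unit)] = math.log(n_ins / observed)
    return scores
```

All of these scores are zero or negative, and a value closer to zero means the corresponding outcome was observed more often.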

[0048] The match, substitution, deletion, and insertion scores are determined from such a confusion matrix, DP matching is performed with them, and the similarity between the two recognition unit sequences (the evaluation value for ease of confusion) is obtained.

[0049] In the example of equations (1) to (8), however, the match, substitution, deletion, and insertion scores express ease of confusion rather than distance between recognition units, so in DP matching it is the maximum sum score or product score that becomes the evaluation value (similarity).

[0050] To use the minimum sum score or product score as the evaluation value (similarity), DP matching can be performed with scores whose signs are inverted, as shown in equations (9) to (16) below. In this case, the path with the minimum sum score or product score is searched for, and the sum score or product score of that path becomes the evaluation value (similarity).

\begin{align}
\mathrm{hit}(a\text{-}a) &= -\log\frac{80}{80+15+5} \tag{9}\\
\mathrm{sub}(a\text{-}b) &= -\log\frac{15}{80+15+5} \tag{10}\\
\mathrm{del}(a) &= -\log\frac{5}{80+15+5} \tag{11}\\
\mathrm{hit}(b\text{-}b) &= -\log\frac{70}{20+70+10} \tag{12}\\
\mathrm{sub}(b\text{-}a) &= -\log\frac{20}{20+70+10} \tag{13}\\
\mathrm{del}(b) &= -\log\frac{10}{20+70+10} \tag{14}\\
\mathrm{ins}(a) &= -\log\frac{30}{80+20+30} \tag{15}\\
\mathrm{ins}(b) &= -\log\frac{5}{15+70+5} \tag{16}
\end{align}
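As an illustration of DP matching with the sign-inverted scores of equations (9) to (16), the following sketch searches for the minimum-sum path with a standard edit-distance style recurrence. The function name `dp_match` and the dictionary layout are assumptions made for this example, and only the sum score (not the product score) is shown.

```python
import math

# Sign-inverted scores from equations (9) to (16), used as non-negative costs.
pair_cost = {
    ("a", "a"): -math.log(80 / 100), ("a", "b"): -math.log(15 / 100),
    ("b", "a"): -math.log(20 / 100), ("b", "b"): -math.log(70 / 100),
}
del_cost = {"a": -math.log(5 / 100), "b": -math.log(10 / 100)}
ins_cost = {"a": -math.log(30 / 130), "b": -math.log(5 / 90)}


def dp_match(seq1, seq2, pair_cost, del_cost, ins_cost):
    """Minimum total cost of aligning recognition unit sequence seq1 to seq2."""
    n, m = len(seq1), len(seq2)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match or substitution
                step = pair_cost[(seq1[i - 1], seq2[j - 1])]
                best[i][j] = min(best[i][j], best[i - 1][j - 1] + step)
            if i > 0:  # unit of seq1 deleted
                best[i][j] = min(best[i][j], best[i - 1][j] + del_cost[seq1[i - 1]])
            if j > 0:  # unit of seq2 inserted
                best[i][j] = min(best[i][j], best[i][j - 1] + ins_cost[seq2[j - 1]])
    return best[n][m]


same = dp_match(["a", "b"], ["a", "b"], pair_cost, del_cost, ins_cost)
swapped = dp_match(["a"], ["b"], pair_cost, del_cost, ins_cost)
```

With these costs a small value means the two sequences are easily confused, which is why a larger evaluation value favors permitting registration.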

[0051] FIG. 5 shows a configuration example of a speech recognition apparatus that implements this method. Only the parts needed for the evaluation value calculation are shown; the voice input means, speech recognition means, warning means, and the like are omitted. The computer of this speech recognition apparatus likewise reads a computer program and executes the above procedure. The computer program describes the procedure to be executed by the computer.

[0052] The speech recognition apparatus shown in FIG. 5 additionally has storage means 8, first conversion means 9, second conversion means 10, and score calculation means 11. The score calculation means 11 uses the confusion matrix to calculate the match, substitution, deletion, and insertion scores as in equations (9) to (16) above, and these scores are stored in the storage means 8.

[0053] The first conversion means 9 acquires the new vocabulary from the character input means 4, decomposes it into recognition units, generates a first recognition unit sequence corresponding to the new vocabulary, and supplies it to the evaluation value calculation means 5. The second conversion means 10 acquires the registered vocabulary from the vocabulary registration means 2, decomposes it into recognition units, generates a second recognition unit sequence corresponding to the registered vocabulary, and supplies it to the evaluation value calculation means 5. Using the match, substitution, deletion, and insertion scores between recognition units stored in the storage means 8, the evaluation value calculation means 5 performs DP matching between the first recognition unit sequence obtained from the first conversion means 9 and the second recognition unit sequence obtained from the second conversion means 10, and takes the minimum sum score or minimum product score as the similarity between the two sequences (the evaluation value of the ease of confusion between the new vocabulary and the registered vocabulary). In this case, the determination means 6 permits registration of the new vocabulary when the evaluation value (similarity) is larger than the set threshold and prohibits registration when it is smaller.

[0054] As another example, the scores between recognition units can be obtained, instead of from the confusion matrix, from the acoustic parameters of HMMs (hidden Markov models) that represent recognition units, such as phoneme models or syllable models. For a discrete HMM, for example, the distance between codebooks can be used; for a continuous-density HMM, the distance between the mean vectors of the distributions can be used.

[0055] An example of a distance calculation that obtains the score between recognition units from the acoustic parameters of HMMs is given here. Suppose, for example, that recognition units a and b are three-state HMMs and that the second state of each is a Gaussian mixture distribution that outputs the probability P_a or P_b given by equations (17) and (18) below. The distance D(a, b) between the two states is then defined by equation (19) or equation (20) below, where d(a_j, b_j) denotes the distance between distributions. Equation (20) defines D(a, b) as the minimum of the inter-distribution distances, but the maximum or the average of the inter-distribution distances may be used as D(a, b) instead.

[0056]

[Equation 1]
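The minimum, maximum, and average variants of the state distance mentioned in the text can be sketched as follows. This is not a reconstruction of equations (17) to (20): it assumes that the inter-distribution distance d is the Euclidean distance between component mean vectors, that every pair of components is compared, and that mixture weights are ignored. The name `state_distance` is hypothetical.

```python
import math


def state_distance(means_a, means_b, mode="min"):
    """Distance between two HMM states given as lists of component mean vectors.

    Assumption: d(.,.) is the Euclidean distance between mean vectors,
    evaluated over all component pairs.
    """
    pairwise = [math.dist(ma, mb) for ma in means_a for mb in means_b]
    if mode == "min":
        return min(pairwise)
    if mode == "max":
        return max(pairwise)
    if mode == "mean":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown mode: {mode}")


# Two-component state for unit a, one-component state for unit b (made-up values).
state_a = [(0.0, 0.0), (3.0, 4.0)]
state_b = [(0.0, 0.0)]
d_min = state_distance(state_a, state_b, "min")
d_max = state_distance(state_a, state_b, "max")
d_mean = state_distance(state_a, state_b, "mean")
```

Which variant suits a given system is a design choice; the minimum is the most pessimistic about confusability because a single close pair of components makes the two states look similar.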

[0057] Once the scores between recognition units have been obtained from the acoustic parameters of the HMMs representing the recognition units, DP matching is performed with these scores. As in the case based on the confusion matrix, the path with the minimum sum score or product score is searched for, and the sum score or product score of that path is taken as the evaluation value (similarity).

[0058] When the scores between recognition units are obtained from the acoustic parameters of the HMMs representing the recognition units, however, no scores are available for the deletion and insertion of recognition units. The DP matching used here therefore prohibits deletion and insertion of recognition units and allows staying (dwelling) instead.
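One way to read "prohibit deletion and insertion but allow staying" is a DTW-style recurrence in which a path may remain on the same unit of one sequence while advancing along the other. The sketch below follows that reading, which is an interpretation rather than something the text spells out; `dtw_match` and the toy distance function are illustrative names.

```python
import math


def dtw_match(seq1, seq2, dist):
    """Minimum summed distance over paths that may stay on a unit of either sequence."""
    n, m = len(seq1), len(seq2)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    best = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            step = dist(seq1[i], seq2[j])
            if i == 0 and j == 0:
                best[i][j] = step
                continue
            prev = math.inf
            if i > 0 and j > 0:
                prev = min(prev, best[i - 1][j - 1])  # advance in both sequences
            if i > 0:
                prev = min(prev, best[i - 1][j])  # stay on seq2[j]
            if j > 0:
                prev = min(prev, best[i][j - 1])  # stay on seq1[i]
            best[i][j] = prev + step
    return best[n - 1][m - 1]


def toy_dist(x, y):
    # Stand-in for a distance derived from HMM acoustic parameters.
    return 0.0 if x == y else 1.0


identical = dtw_match("ab", "ab", toy_dist)
with_stay = dtw_match("aab", "ab", toy_dist)
one_diff = dtw_match("ab", "ac", toy_dist)
```

Every cell is paid for with a unit-to-unit distance, so no separate deletion or insertion score is needed.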

[0059] FIG. 6 shows a configuration example of a speech recognition apparatus that implements this method. Only the parts needed for the evaluation value calculation are shown; the speech recognition means and the like are omitted. The computer of this speech recognition apparatus likewise reads a computer program and executes the above procedure. The computer program describes the procedure to be executed by the computer.

[0060] The speech recognition apparatus shown in FIG. 6 additionally has storage means 8, first conversion means 9, second conversion means 10, and score calculation means 11. The score calculation means 11 calculates the scores between recognition units from the acoustic parameters of the HMMs representing the recognition units, and these scores are stored in the storage means 8.

[0061] The first conversion means 9 acquires the new vocabulary from the character input means 4, decomposes it into recognition units, generates a first recognition unit sequence corresponding to the new vocabulary, and supplies it to the evaluation value calculation means 5. The second conversion means 10 acquires the registered vocabulary from the vocabulary registration means 2, decomposes it into recognition units, generates a second recognition unit sequence corresponding to the registered vocabulary, and supplies it to the evaluation value calculation means 5.

[0062] Using the scores between recognition units stored in the storage means 8, the evaluation value calculation means 5 performs DP matching between the first recognition unit sequence obtained from the first conversion means 9 and the second recognition unit sequence obtained from the second conversion means 10, prohibiting deletion and insertion of recognition units and allowing staying instead. It takes the minimum sum score or minimum product score as the similarity between the two sequences (the evaluation value of the ease of confusion between the new vocabulary and the registered vocabulary). In this case, the determination means 6 permits registration of the new vocabulary when the evaluation value (similarity) is larger than the set threshold and prohibits registration when it is smaller.

[0063] Next, an example is described in which pseudo speech recognition is performed to quantify the ease of confusion between the new vocabulary and the registered vocabulary. To perform pseudo speech recognition, a pseudo acoustic parameter sequence corresponding to the character or character string of the new vocabulary is generated. Speech recognition processing is applied to this pseudo acoustic parameter sequence to obtain a likelihood. An evaluation value is derived from the likelihood and used to determine whether registration of the new vocabulary is permitted or prohibited. Compared with obtaining the evaluation value from scores between recognition units, pseudo speech recognition can reflect acoustic characteristics more accurately.

[0064] The pseudo acoustic parameter sequence can be generated using the acoustic parameters of recognition units such as phoneme models or syllable models.

[0065] For example, the acoustic parameters of the HMMs representing the recognition units are used. In this case, the new vocabulary is decomposed into recognition units such as phonemes or syllables, and the acoustic parameters representing the HMM of each recognition unit are concatenated to create the pseudo acoustic parameter sequence. For a discrete HMM the codebook can be used, and for a continuous-density HMM the mean of the distribution representing the acoustic features can be used.

[0066] Here, perturbing the acoustic parameter sequence using the variance of the HMM, for example the variance of the distribution expressed by the HMM's acoustic parameters, makes it possible to examine variations in acoustic characteristics. Randomly concatenating the acoustic parameters of a noise HMM into the acoustic parameter sequence makes it possible to examine the effect of noise contamination. For the number of frames assigned to each recognition unit, it is advisable to use the reciprocal of the self-transition probability of the HMM state.
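The concatenation and perturbation steps described above can be sketched as follows for a continuous-density HMM. The data layout (a list of mean vector, diagonal variance vector, and frame count per state) and the name `pseudo_sequence` are assumptions for illustration. The frame counts are supplied by the caller, and insertion of noise-HMM parameters is not shown.

```python
import random


def pseudo_sequence(states, rng=None):
    """Build a pseudo acoustic parameter sequence from per-state Gaussian parameters.

    states: list of (mean_vector, variance_vector, n_frames) tuples.
    rng: optional random.Random; when given, each frame is drawn around the
         state mean using the state's (diagonal) variance as the perturbation.
    """
    frames = []
    for mean, var, n_frames in states:
        for _ in range(n_frames):
            if rng is None:
                frames.append(list(mean))  # unperturbed: repeat the mean vector
            else:
                frames.append([rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)])
    return frames


# Two hypothetical states with two-dimensional acoustic parameters.
states = [([0.0, 1.0], [1.0, 1.0], 2), ([5.0, 5.0], [0.25, 0.25], 3)]
plain = pseudo_sequence(states)
perturbed = pseudo_sequence(states, random.Random(0))
```

Running the recognizer on several perturbed sequences instead of one unperturbed sequence would be one way to probe how stable the evaluation value is.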

[0067] An example of generating a pseudo acoustic parameter sequence is described with reference to FIG. 7.

[0068] First, a phoneme (HMM state) sequence is specified using the silence model "sil", the phoneme models "ph1", "ph2", ..., and the noise models N1, ... together with their durations TN1, .... An acoustic parameter sequence corresponding to the phoneme (HMM state) sequence is thereby generated, as shown in FIG. 8.

[0069] In FIG. 7 the sequence is specified as a phoneme sequence, but it can also be specified as a state sequence. As noted above, a perturbation can also be applied to the acoustic parameter sequence. Furthermore, when the acoustic parameter sequence is generated from a mixture-density HMM, it is possible to use the centroid of the mixture distribution, the mean vector of the component with the largest mixture weight, or the mean vector of a component picked at random from the mixture.

[0070] Once a pseudo acoustic parameter sequence has been created as described above, speech recognition processing is applied to it. Three approaches are conceivable: (1) performing speech recognition with a grammar that accepts only the registered vocabulary; (2) using both a speech recognition process with a grammar that accepts only the registered vocabulary and a speech recognition process with a grammar that accepts only the new vocabulary; and (3) performing speech recognition with a grammar that accepts both the registered vocabulary and the new vocabulary. In method (1), the likelihood obtained by the speech recognition processing can itself serve as the evaluation value: registration of the new vocabulary is permitted when the likelihood is smaller than the set threshold and prohibited when it is larger. In method (2), the evaluation value can be the difference between the likelihood from the process using the grammar that accepts only the new vocabulary and the likelihood from the process using the grammar that accepts only the registered vocabulary: registration is permitted when this difference is larger than the set threshold and prohibited when it is smaller. In method (3), the evaluation value can be the difference between the largest likelihood and the second-largest likelihood obtained by the speech recognition processing: registration is permitted when this difference is larger than the set threshold and prohibited when it is smaller.
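The three decision rules can be written down compactly. The sketch below takes the likelihoods as plain numbers (for example log-likelihoods returned by some recognizer, which is not modeled here); the function names are hypothetical, and treating a value exactly equal to the threshold as "prohibit" is an assumption, since the text does not cover that case.

```python
def permit_method_1(registered_likelihood, threshold):
    """Method (1): permit when the registered-only grammar scores below the threshold."""
    return registered_likelihood < threshold


def permit_method_2(new_likelihood, registered_likelihood, threshold):
    """Method (2): permit when the new-only grammar beats the registered-only grammar by a margin."""
    return (new_likelihood - registered_likelihood) > threshold


def permit_method_3(likelihoods, threshold):
    """Method (3): permit when the best hypothesis leads the runner-up by a margin."""
    if len(likelihoods) < 2:
        raise ValueError("need at least two hypotheses")
    top, second = sorted(likelihoods, reverse=True)[:2]
    return (top - second) > threshold


# Made-up log-likelihood values for illustration only.
decision_1 = permit_method_1(-120.0, -100.0)
decision_2 = permit_method_2(-50.0, -80.0, 10.0)
decision_3 = permit_method_3([-50.0, -52.0, -90.0], 5.0)
```

Method (1) reads a high likelihood under the registered-only grammar as a sign that the new word sounds like an existing one, which is why its inequality points the opposite way from the other two.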

[0071] FIG. 8 shows a configuration example of a speech recognition apparatus that implements method (2) above. Only the parts needed for the evaluation value calculation are shown; the voice input means, warning means, and the like are omitted. The computer of this speech recognition apparatus likewise reads a computer program and executes the above procedure. The computer program describes the procedure to be executed by the computer.

[0072] The speech recognition apparatus shown in FIG. 8 additionally has acoustic parameter generation means 12 and storage means 13.

[0073] The acoustic parameter generation means 12 receives the character string of the new vocabulary from the voice input means 1, generates the corresponding pseudo acoustic parameter sequence using HMMs as described above, and supplies it to the speech recognition means 3. In doing so, it generates the acoustic parameter sequence by applying a perturbation using the variance of the HMM or by randomly concatenating acoustic parameters of a noise HMM.

[0074] The storage means 13 stores a grammar that accepts only the new vocabulary and a grammar that accepts only the registered vocabulary. The speech recognition means 3 applies two processes to the pseudo acoustic parameters, a first speech recognition process 3a using the grammar that accepts the new vocabulary and a second speech recognition process 3b using the grammar that accepts only the registered vocabulary, and supplies the likelihood from each to the evaluation value calculation means 5.

[0075] The evaluation value calculation means 5 calculates the difference between the likelihood obtained in the first speech recognition process 3a and the likelihood obtained in the second speech recognition process 3b, and takes this likelihood difference as the evaluation value. In this case, the determination means 6 permits registration of the new vocabulary when the evaluation value is larger than the set threshold and prohibits registration when it is smaller.

[0076] The speech recognition vocabulary registration determination methods described above can be used individually or in any combination. For example, all of the four methods (1) to (4) below, or an appropriate subset of them, are used in combination.
(1) A method that determines the scores between recognition units using a confusion matrix and performs DP matching with these scores to obtain the evaluation value.
(2) A method that determines the scores between recognition units using the distributions in the acoustic parameter space that represent the HMMs of the recognition units, and performs DP matching with these scores to obtain the evaluation value.
(3) A method that generates a pseudo acoustic parameter sequence corresponding to the new vocabulary using HMMs and applies speech recognition processing to it to obtain the evaluation value.
(4) A method that, when generating the pseudo acoustic parameter sequence in method (3), applies a perturbation using the variance of the HMM or randomly concatenates acoustic parameters of a noise HMM, and thereby evaluates fluctuations in utterance and robustness to noise.

[0077] When several of these methods are combined, methods (3) and (4) are more accurate than methods (1) and (2), but in software processing on a computer they are also computationally heavier. In such a case it is efficient to apply method (1), method (2), or both first with a strict threshold, and to apply method (3), method (4), or both only to the vocabulary that fails this first determination.
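The two-stage arrangement can be sketched as a simple cascade. This is one reading of the paragraph above: a word that clears the strict threshold of the cheap check is permitted at once, and only the remaining words go through the costly check. The callables, thresholds, and sample values are hypothetical, and both evaluation values are assumed to follow the convention that a larger value means less confusable.

```python
def cascade_decision(word, cheap_value, costly_value, strict_threshold, costly_threshold):
    """Return True when registration of word is permitted.

    cheap_value / costly_value: callables returning an evaluation value
    for word, e.g. DP matching and pseudo speech recognition respectively.
    """
    if cheap_value(word) > strict_threshold:
        return True  # clearly distinct; the costly check is skipped
    return costly_value(word) > costly_threshold


# Made-up evaluation values standing in for the two kinds of method.
cheap_table = {"tokyo": 9.0, "kyoto": 2.0, "kyodo": 1.0}
costly_table = {"tokyo": 0.0, "kyoto": 7.0, "kyodo": 3.0}
costly_calls = []


def cheap(word):
    return cheap_table[word]


def costly(word):
    costly_calls.append(word)  # record which words needed the heavy check
    return costly_table[word]


results = {w: cascade_decision(w, cheap, costly, 5.0, 6.0) for w in ["tokyo", "kyoto", "kyodo"]}
```

In this toy run only two of the three words reach the costly stage, which is the efficiency gain the paragraph describes.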

[0078]

[Effects of the Invention] According to the present invention, registration of vocabulary such as words and phrases that are expected to be prone to misrecognition can be prevented when the recognition target vocabulary is selected or registered. This lets system developers select the recognition target vocabulary efficiently. It also lets users customize the system, for example by enlarging the recognition target vocabulary, while limiting any loss of recognition performance. The invention is useful for task-limited speech recognition devices, personal-registration voice portal devices, center-provided voice portal devices, and the like.

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of a speech recognition apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a speech recognition vocabulary registration determination method according to an embodiment of the present invention.

FIG. 3 is a diagram showing a configuration example of another speech recognition apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram showing an example of a confusion matrix.

FIG. 5 is a diagram showing a configuration example of another speech recognition apparatus according to an embodiment of the present invention.

FIG. 6 is a diagram showing a configuration example of another speech recognition apparatus according to an embodiment of the present invention.

FIG. 7 is a diagram showing an example of generating a pseudo acoustic parameter sequence.

FIG. 8 is a diagram showing a configuration example of another speech recognition apparatus according to an embodiment of the present invention.

[Explanation of symbols]

1: voice input means
2: vocabulary registration means
3: speech recognition means
3a: first speech recognition process
3b: second speech recognition process
4: character input means
5: evaluation value calculation means
6: determination means
7: warning means
8: storage means
9: first conversion means
10: second conversion means
11: score calculation means
12: acoustic parameter generation means
13: storage means

Continuation of front page

(72) Inventor: Norio Higuchi, c/o KDD R&D Laboratories, Inc., 2-1-15 Ohara, Kamifukuoka-shi, Saitama

F-terms (reference): 5D015 GG01 HH00

Claims

1. A speech recognition vocabulary registration determination method, characterized in that the ease of confusion between a new vocabulary and the registered vocabulary of a speech recognition apparatus is quantified, and a threshold is set for the obtained numerical value (hereinafter, the evaluation value) to determine whether registration of the new vocabulary is permitted or prohibited.

2. The speech recognition vocabulary registration determination method according to claim 1, wherein the characters or character strings of the new vocabulary and of the registered vocabulary are each decomposed into recognition units such as phonemes or syllables to obtain a first recognition unit sequence corresponding to the new vocabulary and a second recognition unit sequence corresponding to the registered vocabulary, and the similarity between the first and second recognition unit sequences is obtained as the evaluation value.

3. The speech recognition vocabulary registration determination method according to claim 2, wherein the similarity is obtained by taking a sum or a product of scales representing distances between recognition units.

4. The speech recognition vocabulary registration determination method according to claim 2, wherein said similarity is obtained by using a confusion matrix between recognition units obtained in advance.

5. The speech recognition vocabulary registration determination method according to claim 2, wherein the similarity is obtained using a distribution in the acoustic parameter space representing the HMM (hidden Markov model) of each recognition unit.

6. The speech recognition vocabulary registration determination method according to claim 1, wherein a pseudo acoustic parameter sequence corresponding to the character or character string of the new vocabulary is generated, and the evaluation value is obtained by performing speech recognition processing on the generated acoustic parameter sequence.

7. The speech recognition vocabulary registration determination method according to claim 6, wherein, as the speech recognition processing, a first speech recognition process using a grammar that accepts the new vocabulary and a second speech recognition process using a grammar that accepts only the registered vocabulary are performed, and the difference between the likelihood obtained in the first speech recognition process and the likelihood obtained in the second speech recognition process is used as the evaluation value.

8. The speech recognition vocabulary registration determination method according to claim 7, wherein the acoustic parameter sequence is generated using an HMM (Hidden Markov Model).

9. The speech recognition vocabulary registration determination method according to claim 8, wherein the acoustic parameter sequence is generated by applying a perturbation using the variance of the HMM or by randomly concatenating acoustic parameters of a noise HMM.

10. A speech recognition vocabulary registration / judgment method characterized by using a combination of the speech recognition vocabulary registration / judgment method according to claim 2 and the speech recognition vocabulary registration / judgment method according to claim 6.

11. A speech recognition apparatus having voice input means, vocabulary registration means, and speech recognition means, which recognizes speech input from the voice input means on the basis of the vocabulary registered in the vocabulary registration means (hereinafter, the registered vocabulary), the apparatus comprising: character input means for inputting characters; evaluation value calculation means for calculating a numerical value (hereinafter, the evaluation value) that evaluates the ease of confusion between an input character or character string (hereinafter, the new vocabulary) and the registered vocabulary; and determination means for setting a threshold for the obtained evaluation value and determining whether registration of the new vocabulary is permitted or prohibited.

12. The speech recognition apparatus according to claim 11, further comprising first conversion means for decomposing the new vocabulary into recognition units to generate a first recognition unit sequence corresponding to the new vocabulary, and second conversion means for decomposing the registered vocabulary into recognition units to generate a second recognition unit sequence corresponding to the registered vocabulary, wherein the evaluation value calculation means calculates, as the evaluation value, the similarity between the first recognition unit sequence obtained by the first conversion means and the second recognition unit sequence obtained by the second conversion means.

13. The speech recognition apparatus according to claim 12, wherein said evaluation value calculation means calculates the similarity by taking a sum or a product of scales representing distances between recognition units.

14. The speech recognition apparatus according to claim 12, wherein the evaluation value calculation means calculates the similarity using a confusion matrix between recognition units obtained in advance.

15. The speech recognition apparatus according to claim 12, wherein the evaluation value calculation means calculates the similarity using a distribution in the acoustic parameter space representing the HMM (hidden Markov model) of each recognition unit.

16. The speech recognition apparatus according to claim 11, further comprising acoustic parameter sequence generation means for generating a pseudo acoustic parameter sequence corresponding to the new vocabulary and supplying it to the speech recognition means, wherein the evaluation value calculation means calculates the evaluation value using the likelihood obtained from the speech recognition means.

17. The speech recognition apparatus according to claim 16, wherein the speech recognition means performs a first speech recognition process using a grammar that accepts the new vocabulary and a second speech recognition process using a grammar that accepts only the registered vocabulary, and the evaluation value calculation means calculates, as the evaluation value, the difference between the likelihood obtained in the first speech recognition process and the likelihood obtained in the second speech recognition process.

18. The speech recognition apparatus according to claim 17, wherein said acoustic parameter sequence generation means generates an acoustic parameter sequence using an HMM (Hidden Markov Model).

19. The speech recognition apparatus according to claim 18, wherein the acoustic parameter sequence generation means generates the acoustic parameter sequence by applying a perturbation using the variance of the HMM or by randomly concatenating acoustic parameters of a noise HMM.

20. A computer program for causing a computer to execute the speech recognition vocabulary registration / judgment method according to claim 1.

21. A recording medium on which the computer program according to claim 20 is recorded.