JP2002082690A

JP2002082690A - Language model generation method, speech recognition method, and program recording medium therefor

Info

Publication number: JP2002082690A
Application number: JP2000268900A
Authority: JP
Inventors: Katsutoshi Ofu; 克年大附; Takaaki Hori; 貴明堀; Shoichi Matsunaga; 昭一松永; Takeshi Kawabata; 豪川端
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2000-09-05
Filing date: 2000-09-05
Publication date: 2002-03-22
Anticipated expiration: 2020-09-05
Also published as: JP3628245B2

Abstract

(57) [Abstract] [Problem] To generate a highly accurate symbol chain probability (language model) for a recognition task (speech content). [Solution] In addition to a recognition-task text database 150, a plurality of general text databases 160-1 to 160-N are used. A symbol chain probability P_T is obtained from DB 150, and P_T is used to obtain the test-set perplexity PP_n of each general DB 160-n (n = 1, 2, ..., N). The smaller PP_n is, the larger the weight W_n (0 < W_n < 1) given to DB 160-n, and the symbol chain probability is obtained from DB 150 and DBs 160-1 to 160-N. In doing so, for example, when the number of occurrences of a word A is obtained, the occurrence counts in DBs 160-1 to 160-N are multiplied by W_1 to W_N respectively, and the sum of these values and the occurrence count in DB 150 is taken as the number of occurrences of A.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech recognition method that takes speech, such as a sentence uttered by a person, as an input signal, recognizes the speech using an acoustic model and a symbol chain probability (language model), and outputs the result as a symbol string; to a method of generating the language model used in that method; and to a recording medium storing programs for these methods.

[0002]

[Prior art] When speech input is converted into a symbol string (word string) by speech recognition, a known technique improves recognition performance by generating, from a large-scale text database, a symbol chain probability (language model) that describes how symbols (words) occur in sequence, and using it during recognition. However, this technique is less effective when the recognition task (speech content) differs from the task of the large-scale text database used to generate the symbol chain probability.

[0003] To solve this problem, a technique has also been proposed, as disclosed in Japanese Patent Application Laid-Open No. H4-291399, in which a symbol chain probability generated from a large-scale text database is adapted using a training symbol chain probability created from a text database similar to the recognition task, and speech recognition is performed using the adapted symbol chain probability.

[0004]

[Problem to be solved by the invention] The conventional technique using the adapted symbol chain probability described above achieves higher recognition performance than a technique using only the symbol chain probability generated from a large-scale text database. However, because the symbol chain probability generated from the large-scale text database also contains information from data far removed from the recognition task, the estimate of the adapted probability value becomes unstable for some symbol chains. In addition, because the adapted symbol chain probability retains all the information of both the large-scale text database and the text database similar to the recognition task, it requires a large storage capacity.

[0005] Accordingly, one object of the present invention is to provide a language model generation method that generates a highly accurate symbol chain probability by giving greater weight to the texts in a large-scale text database that are more similar to the recognition task, and that improves recognition performance when this probability is used for recognition. Another object is to provide a symbol chain probability with a small storage requirement by excluding data with low similarity to the recognition task (setting its weight to 0), thereby reducing the information to be retained.

[0006]

[Means for solving the problem] The language model generation method of the present invention uses a recognition-task text database storing text data related to the task to be recognized (speech content) and a plurality of general text databases not directly related to that task. A weight indicating the relevance (similarity) of each general text database to the recognition-task text database is obtained, and the symbol chain probability is generated from the recognition-task and general text databases by applying, to each symbol (word) of interest, the weight of the database to which it belongs.

[0007] To obtain the weight of each general text database, the similarity of the text data in each general text database is assessed on the basis of information obtained from the text data in the recognition-task text database, and a larger weight is given to a general text database with greater similarity to the recognition-task text database. The similarity is assessed using one of the following, or a combination of them: the perplexity (entropy) obtained when the symbol chain probability generated from the text data in the recognition-task text database is applied to the text data of each general text database; or, for each general text database, the proportion of its words that are not contained in the recognition-task text database (the unknown word rate). When a weight W_T is determined for the recognition-task text database, a value based on the ratio between the amount of data in the recognition-task text database and the amount of data in the plurality of general text databases is used. This weight W_T may be combined with a value based on the perplexity and the unknown word rate.

[0008]

[Embodiments of the invention] Embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 shows a configuration example of a speech recognition apparatus to which an embodiment of the speech recognition method according to the present invention is applied. The apparatus comprises a speech recognition unit 110, a symbol chain probability (language model) database 120, a standard speech pattern database 130, a recognition-task symbol chain probability generation unit 140, a recognition-task text database 150, and a plurality of general text databases 160-1 to 160-N.

[0009] The standard speech pattern database 130 holds a plurality of standard speech patterns analyzed in advance. The recognition-task text database 150 stores text data related to the task to be recognized (speech content). If the input speech is, for example, speech from a news program, the words of a large number of news transcripts are stored in the recognition-task text database 150 as the text data of the recognition task. The general text databases 160-1 to 160-N are databases not directly related to the task to be recognized, such as a newspaper article database or a novel database. For example, a large number of words may be collected from newspaper articles, web pages, netnews, and the like, and each sentence in which the collected words originally appeared may constitute one general text database 160-n (n = 1, 2, ..., N).

[0010] The recognition-task symbol chain probability generation unit 140 executes the language model generation method according to the present invention. Prior to recognition processing, it takes the recognition-task text database 150 and the general text databases 160-1 to 160-N, gives greater weight to general text databases containing text data more similar to the recognition task, and uses the recognition-task text database and the plurality of general text databases to generate a symbol chain probability (language model) that can narrow down candidates for the recognition task with high accuracy. The result is stored in the symbol chain probability database 120. The speech recognition unit 110 narrows down symbol string candidates for the input speech on the basis of information obtained from the symbol chain probabilities in the symbol chain probability database 120, the standard speech patterns in the standard speech pattern database 130, and the like, and outputs the symbol string that is the recognition result.

[0011] FIG. 2 shows a configuration example of the recognition-task symbol chain probability generation unit 140; an embodiment of the method of generating the language model, that is, the symbol chain probability, according to the present invention is described with reference to this figure. The weight determination unit 210 receives the text data of the recognition task in the recognition-task text database 150 and the text data of each general text database 160-n, and determines the weight W_n for each general text database 160-n from the similarity between the two. It also determines the weight W_T of the recognition-task text database 150 on the basis of the weights W_n of the general text databases. Specific methods of determining the weights W_n and W_T are described later. The weight W_T is assigned to the recognition-task text database 150, and the weights W_1 to W_N are assigned to the general text databases 160-1 to 160-N, respectively.

[0012] The symbol chain probability generation unit 220 receives the text data of the weighted recognition-task text database 150 and of the weighted general text databases 160-1 to 160-N output by the weight determination unit 210, generates the symbol chain probability (language model), and stores it in the symbol chain probability database 120. The basic procedure for generating this symbol chain probability, that is, a unigram, bigram, or trigram, or in general an M-gram (M being an integer of 1 or more), is the same as the conventional one: the recognition-task text database 150 and the general text databases 160-1 to 160-N are treated as a single text database, and the symbol chain probability is generated from it. In doing so, however, the weight of the text database to which each symbol (word) belongs is taken into account. For example, let C_T(A) be the number of occurrences of a word A in the recognition-task text database 150, and let C_1(A) to C_N(A) be its numbers of occurrences in the general text databases 160-1 to 160-N. Each count is multiplied by the weight of its database and the products are summed, so that

C(A) = W_T·C_T(A) + W_1·C_1(A) + W_2·C_2(A) + ... + W_N·C_N(A)

is taken as the number of occurrences of the word A. The numbers of occurrences of other words are obtained in the same way. The occurrence probability of the word A on its own (unigram) is the number of occurrences C(A) divided by the total number of occurrences of all words, Σ_k C(k):

P(A) = C(A) / Σ_k C(k)

The occurrence probabilities of other words are obtained in the same way and stored in the symbol chain probability database 120.
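The weighted unigram count C(A) and probability P(A) described in this paragraph can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `weighted_unigram`, the token-list inputs, and the choice to skip zero-weight databases entirely are assumptions made for the example.

```python
from collections import Counter

def weighted_unigram(task_tokens, general_dbs, w_task=1.0):
    """Return P(A) = C(A) / sum_k C(k) using database-weighted counts.

    task_tokens: words of the recognition-task database (weight w_task).
    general_dbs: list of (W_n, words) pairs, one per general database.
    """
    counts = Counter()
    for word in task_tokens:
        counts[word] += w_task          # W_T * C_T(A)
    for weight, tokens in general_dbs:
        if weight == 0:                 # assumption: weight 0 means excluded
            continue
        for word in tokens:
            counts[word] += weight      # W_n * C_n(A)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

probs = weighted_unigram(
    ["news", "today", "news"],
    [(0.5, ["news", "sports"]), (0.0, ["novel", "novel"])],
)
print(probs["news"])  # 0.625
```

Here C(news) = 1·2 + 0.5·1 = 2.5 out of a weighted total of 4.0, and the zero-weight database contributes no entries at all, which is how the storage saving described in the text arises.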

[0013] Alternatively, for a bigram, that is, the probability that a word B occurs immediately after a word A, let C_T(A, B) be the number of times B follows A in the recognition-task text database 150, and let C_1(A, B) to C_N(A, B) be the corresponding counts in the general text databases 160-1 to 160-N. The sum of these counts, each multiplied by the weight of its database,

C(A, B) = W_T·C_T(A, B) + W_1·C_1(A, B) + W_2·C_2(A, B) + ... + W_N·C_N(A, B)

is taken as the number of occurrences of the word chain A, B. Dividing it by the number of occurrences C(A) of the word A gives

P(B|A) = C(A, B) / C(A)

as the bigram probability that B occurs after A. The probabilities of other word chains may be obtained in the same way and stored in the symbol chain probability database 120.
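As an illustration of the bigram case, the sketch below accumulates weighted counts C(A, B) and C(A) over several databases and returns P(B|A) = C(A, B) / C(A). The name `weighted_bigram` and the sentence-list data layout are hypothetical, and no smoothing or back-off is applied because the text does not describe any.

```python
from collections import defaultdict

def weighted_bigram(databases):
    """databases: list of (weight, sentences); a sentence is a list of words.

    Returns prob(b, a) = C(a, b) / C(a) with database-weighted counts.
    """
    uni = defaultdict(float)
    bi = defaultdict(float)
    for weight, sentences in databases:
        for sent in sentences:
            for word in sent:
                uni[word] += weight             # weighted C(A)
            for a, b in zip(sent, sent[1:]):
                bi[(a, b)] += weight            # weighted C(A, B)

    def prob(b, a):
        c_a = uni.get(a, 0.0)
        return bi.get((a, b), 0.0) / c_a if c_a > 0 else 0.0

    return prob

p = weighted_bigram([
    (1.0, [["a", "b"], ["a", "c"]]),   # recognition-task database, W_T = 1
    (0.5, [["a", "b"]]),               # one general database, W_1 = 0.5
])
print(p("b", "a"))  # 0.6
```

In this toy input C(a) = 2 + 0.5 = 2.5 and C(a, b) = 1 + 0.5 = 1.5, so P(b|a) = 0.6 and P(c|a) = 0.4.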

[0014] Next, FIG. 3 shows an example of the processing procedure in the weight determination unit 210 of the recognition-task symbol chain probability generation unit 140 shown in FIG. 2; its operation is as follows. First, n is initialized to 1 (S1). The weight W_n for the general text database 160-n is determined from the text data of the recognition-task text database 150 and the text data of the general text database 160-n (S2). It is then checked whether n = N (S3); if not, n is incremented by 1 and the process returns to step S2 (S4). If n = N, that is, once the weights W_1 to W_N have been determined for all the general text databases 160-1 to 160-N, the weight W_T for the recognition-task text database 150 is determined (S5).

[0015] A specific example of determining the weight W_n is described with reference to FIG. 4. When weighting is based on perplexity, for example, the symbol chain probability generation unit 410 generates a symbol chain probability P_T from the text data of the recognition-task text database 150. The text database weight calculation unit 420 then calculates the test-set perplexity of P_T on the general text database 160-n and determines the weight value W_n on the basis of that perplexity. The test-set perplexity PP represents the average number of word branches of a language L in the information-theoretic sense; it is applied to an evaluation text set (word string, symbol string) and is given by the following equation.

[0016]

PP = 2^{H(L)}

where

H(L) = -Σ_{w_1^n} (1/n) P(w_1^n) log P(w_1^n)

Here H(L) is the entropy per word and P(w_1^n) is the generation probability of the word string w_1^n = w_1 ... w_n. In other words, the symbol chain probability P_T generated from the recognition-task text database 150 is used to obtain the test-set perplexity PP of the text data in the general text database 160-n. The larger the perplexity of a language, that is, the larger the average number of word branches, the harder it is to identify a word. When the test-set perplexity PP is obtained from a given symbol chain probability and an evaluation text, a smaller PP means that the symbol chain probability represents the evaluation text better. Therefore, if the test-set perplexity obtained for the general text database 160-n in this embodiment is small, that database resembles the recognition-task text database 150, and the weight W_n for the general text database 160-n is made large.

[0017] The general text databases can also be weighted on the basis of the unknown word rate. In this case, as shown in FIG. 4, the symbol list generation unit 430 generates a list of the symbols (words) that occur in the recognition-task text database 150, that is, a symbol list L_T. The text database weight calculation unit 420 calculates the proportion of the total number of words (symbols) occurring in the general text database 160-n that are not contained in the symbol list L_T (the unknown word rate), and determines the weight value W_n of the general text database 160-n on the basis of that rate. For example, if the general text database 160-n contains 2,000 words in total and 100 of them are unknown words not contained in the symbol list L_T, the unknown word rate is (100 / 2000) × 100 = 5%. The smaller the unknown word rate, the more words the symbol list L_T and the general text database 160-n have in common and the more closely the general text database 160-n resembles the recognition-task text database 150, so the weight W_n is made larger.
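The unknown word rate can be computed in a few lines. The sketch below reproduces the worked example of 100 unknown words among 2,000 running words; the function name `unknown_word_rate` and the set-based symbol list are illustrative choices.

```python
def unknown_word_rate(symbol_list, tokens):
    """Percentage of running words in `tokens` that are absent from `symbol_list`."""
    if not tokens:
        return 0.0
    unknown = sum(1 for w in tokens if w not in symbol_list)
    return 100.0 * unknown / len(tokens)

l_t = {"known"}                                   # stand-in for symbol list L_T
general = ["known"] * 1900 + ["unseen"] * 100     # 2,000 running words
rate = unknown_word_rate(l_t, general)
print(rate)  # 5.0
```

The rate is taken over running words (tokens), not distinct words, which matches the "total number of words" wording in the paragraph.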

[0018] The text database weight calculation unit 420 is not limited to using either the symbol list L_T or the symbol chain probability P_T alone; the two can also be combined. For example, if the test-set perplexity of the general text database 160-n under the symbol chain probability P_T is below a threshold and its unknown word rate with respect to the symbol list L_T is also below a threshold, the weight W_n of the general text database 160-n is set to 1; otherwise W_n is set to 0. Regarding generation of the symbol list L_T: in the example described later, the recognition-task text database 150 contains 30,000 sentences and a total of 1 million words, of which about 100,000 are distinct. These 100,000 words include many that occur only once in the recognition-task text database 150. Because such words are statistically unreliable, they may be treated as not having occurred and left out of the symbol list L_T. In the experiment described later, the symbol list L_T was limited to the 20,000 most frequent words. These top 20,000 words account for about 99% of the 1 million total words in the database 150.
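The two ideas in this paragraph, a frequency-limited symbol list and a binary weight that requires both measures to pass, might look like the following. The names `build_symbol_list` and `binary_weight` and the threshold values in the usage lines are hypothetical; the source gives no numeric thresholds for the combined rule.

```python
from collections import Counter

def build_symbol_list(task_tokens, max_size):
    """Keep only the `max_size` most frequent words of the task database."""
    return {w for w, _ in Counter(task_tokens).most_common(max_size)}

def binary_weight(perplexity, oov_rate, pp_threshold, oov_threshold):
    """W_n = 1 only if both perplexity and unknown word rate are below threshold."""
    return 1 if perplexity < pp_threshold and oov_rate < oov_threshold else 0

symbols = build_symbol_list(["a", "a", "a", "b", "b", "c"], max_size=2)
w_close = binary_weight(perplexity=50, oov_rate=3.0, pp_threshold=100, oov_threshold=5.0)
w_far = binary_weight(perplexity=150, oov_rate=3.0, pp_threshold=100, oov_threshold=5.0)
```

With `max_size=2` the single-occurrence word "c" is dropped from the list, mirroring the treatment of low-frequency words as if they had not occurred.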

[0019] Next, an example of the processing for determining the weight W_T of the recognition-task text database 150 is described with reference to FIG. 5. For example, the text data amount calculation unit 510 obtains the data amount C_T of the recognition-task text database 150, and the text data amount calculation unit 520 obtains the total text data amount C_D of the general text databases 160-1 to 160-N. The weight calculation unit 530 calculates the ratio C_D/C_T, and the weight W_T of the recognition-task text database 150 is assigned on the basis of the result. When the weights W_n are taken into account in the total text data amount C_D of the general text databases 160-1 to 160-N, it is calculated as

C_D = Σ_{n=1}^{N} W_n·C_n

where C_n is the total number of words in the general text database 160-n.
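A sketch of the task-database weight follows, assuming the simplest reading in which W_T is set directly to the ratio C_D/C_T with C_D = Σ W_n·C_n. The text only says the weight is assigned "on the basis of" this ratio, so any other mapping would be equally consistent with it; `task_weight` and its parameters are illustrative names.

```python
def task_weight(task_size, general_sizes, general_weights):
    """W_T = C_D / C_T, where C_D = sum_n W_n * C_n (sizes in words)."""
    if task_size <= 0:
        raise ValueError("task database must not be empty")
    c_d = sum(w * c for w, c in zip(general_weights, general_sizes))
    return c_d / task_size

# Three general databases of 4000, 6000 and 5000 words; the second is excluded.
w_t = task_weight(1000, [4000, 6000, 5000], [1, 0, 1])
print(w_t)  # 9.0
```

Scaling the task counts by C_D/C_T makes the task database contribute roughly as much weighted mass as the retained general data, so the small in-domain corpus is not swamped.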

[0020] In the above, the weight W_T need not be given to the recognition-task text database 150. That is, the weights W_1 to W_N of the general text databases 160-1 to 160-N may be obtained and used by the symbol chain probability generation unit 220 in FIG. 2 to obtain the symbol chain probability as described above. This can be regarded as setting W_T = 1 and making W_1 to W_N positive numbers no greater than 1. Conversely, the general text databases 160-1 to 160-N may be left unweighted, that is, W_1 to W_N are all set to 1, while a weight W_T of 1 or more is given to the recognition-task text database 150, and the symbol chain probability generation unit 220 obtains the symbol chain probability.

[0021] The generation of the language model and the speech recognition described above can also be carried out by having a computer execute a program. For example, as shown in FIG. 6, the components are connected to a bus 670, and a language model generation program is installed in a memory 640 from a CD-ROM, a hard disk, or the like, or via a communication line. The CPU 660 executes this language model generation program using the recognition-task text database 150 and the general text databases 160-1 to 160-N. As shown in FIG. 7, the symbol chain probability P_T or the symbol list L_T is generated from the words of the recognition-task text database 150 (S1); the weight W_n is then calculated in turn for each of the general text databases 160-1 to 160-N (S2); next, the weight W_T of the recognition-task text database 150 is calculated (S3); and finally the weights W_1 to W_N and W_T are used to generate the symbol chain probability (language model) for the words of the text databases 150 and 160-1 to 160-N, which is stored in the symbol chain probability database 120 (S4).

[0022] Speech recognition is then performed. A speech recognition program is installed in a memory 650 in the same manner as described above. When speech is input to the input unit 610, the CPU 660 executes the speech recognition program, performs speech recognition with reference to the symbol chain probability database 120 and the standard speech pattern database 130, and outputs the resulting symbol string from the output unit 630. The speech input at the input unit 610 is converted into a time series of vector data of feature parameters, such as LPC cepstrum obtained by linear predictive analysis, LPC delta cepstrum, and logarithmic power. The storage unit 620 is used, for example, to store data temporarily during language model generation and speech recognition.

Example

An evaluation experiment was carried out to confirm the effect of the present invention. The input speech data for evaluation consisted of 129 sentences uttered by a male announcer on a news program. News transcript text of about 1 million words was used as the recognition-task text database 150. For the general text databases 160-1 to 160-N, a database of about 150 million words collected from newspaper articles, web pages, netnews, and the like was prepared, and in this experiment each sentence in that database was treated as one general text database 160-n. The perplexity PP_n was used to determine the weights W_n of the general text databases. The perplexity threshold was set so that sentences with PP_n below the threshold made up 40% of all sentences in the general text databases 160-1 to 160-N. The weight W_n was set to 1 for each general text database 160-n whose sentence had PP_n below the threshold, and to 0 for each one whose sentence had PP_n at or above the threshold. The weight W_T for the recognition-task text database 150 was set to C_D/C_T, based on the data amount (number of words) C_T of the recognition-task text database 150 and the data amount (number of words) C_D of the weighted general text databases 160-1 to 160-N.
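The sentence-level selection used in the experiment, where the threshold is placed so that 40% of sentences fall below it, can be sketched by ranking sentences by perplexity and keeping the lowest fraction. Ranking rather than computing an explicit threshold is an assumption of this sketch (it differs only in how ties are treated), and `threshold_weights` is a hypothetical name.

```python
def threshold_weights(perplexities, keep_fraction=0.4):
    """Assign weight 1 to the `keep_fraction` of sentences with lowest perplexity."""
    order = sorted(range(len(perplexities)), key=lambda i: perplexities[i])
    keep = int(len(perplexities) * keep_fraction)
    weights = [0] * len(perplexities)
    for i in order[:keep]:
        weights[i] = 1      # sentence kept as a general database with W_n = 1
    return weights

# Five one-sentence "databases" with their perplexities under the task model.
weights = threshold_weights([120.0, 30.0, 80.0, 45.0, 200.0])
print(weights)  # [0, 1, 0, 1, 0]
```

The resulting 0/1 weights can then be fed into the weighted counts and into C_D = Σ W_n·C_n when computing W_T.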

[0023] The models were first evaluated by the perplexity of the symbol chain probability on the transcript of the evaluation speech data (the average number of word branches; the smaller the value, the more accurately the chain probability models the evaluation text). The symbol chain probability generated from the recognition-task text database 150 alone gave a large value of 75 because of the small amount of text data. The symbol chain probability generated from the recognition-task text database 150 together with the large-scale text databases, that is, the general text databases 160-1 to 160-N, gave a smaller value of 42. The symbol chain probability generated using the weighting of the present invention gave a still smaller value of 36.

[0024] In an evaluation by speech recognition experiment, the word error rate was 14.7% for the symbol chain probability generated from the recognition-task text database 150 alone, 11.6% for the symbol chain probability generated from the recognition-task text data and the general text databases 160-1 to 160-N, and 9.9% for the symbol chain probability generated using the weighting of the present invention, a clear improvement in recognition rate.

[0025] The number of parameters of the symbol chain probability was about 10 million when it was generated from the recognition task text database 150 and the general text databases 160-1 to 160-N, and about 4.6 million when it was generated using the weighting of this invention, a considerable reduction.
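The relative size of these improvements follows from simple arithmetic on the reported figures. The snippet below only restates the numbers given above (perplexity 42 and 36, word error rate 11.6% and 9.9%, about 10 million and about 4.6 million parameters); the helper name `relative_reduction` is hypothetical, and the percentages it prints are derived values, not additional experimental results.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# (unweighted combination, weighted combination) as reported in the text.
reported = {
    "perplexity": (42, 36),
    "word error rate (%)": (11.6, 9.9),
    "parameters (millions)": (10.0, 4.6),
}

for name, (unweighted, weighted) in reported.items():
    ratio = relative_reduction(unweighted, weighted)
    print(f"{name}: {ratio:.1%} lower with weighting")
```

Relative to the unweighted combination, the weighting lowers perplexity by roughly 14%, word error rate by roughly 15%, and the parameter count by roughly 54%.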

【００２６】[0026]

[Effects of the Invention] As described above, this invention provides the following first and second effects. The first effect is that, by weighting the data in a general large-scale database group that is similar to the recognition task text data, a highly accurate symbol chain probability can be generated for the recognition task.

[0027] The second effect is that, by setting to 0 the weight of data that has low similarity to the recognition task, a symbol chain probability that is both highly accurate and small in storage size can be generated.
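Both effects can be seen in a small sketch of weighted count merging. It follows the scheme summarized in the abstract (each general database's occurrence count is multiplied by its weight and added to the count from database 150), but the function name `merge_weighted_counts`, the `task_weight` parameter, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def merge_weighted_counts(task_counts, general_counts_list, weights,
                          task_weight=1.0):
    """Combine n-gram counts from the task database and general databases.

    Each general database n contributes W_n times its counts. A database
    with weight 0 contributes nothing, so n-grams that occur only there
    never enter the merged table, which keeps the model small.
    """
    if len(general_counts_list) != len(weights):
        raise ValueError("one weight is required per general database")
    merged = defaultdict(float)
    for ngram, count in task_counts.items():
        merged[ngram] += task_weight * count
    for counts, weight in zip(general_counts_list, weights):
        if weight == 0:
            continue  # dissimilar database: skipped entirely
        for ngram, count in counts.items():
            merged[ngram] += weight * count
    return dict(merged)


task = {("a", "b"): 2}
general = [
    {("a", "b"): 4, ("c", "d"): 10},  # similar to the task, weight 0.5
    {("x", "y"): 7},                  # dissimilar, weight 0
]
merged = merge_weighted_counts(task, general, [0.5, 0.0])
```

Here `("x", "y")` never enters `merged`, which illustrates how zero weights reduce the number of stored parameters while similar data still reinforces the task counts.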

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an apparatus to which the speech recognition method of this invention is applied.

FIG. 2 is a block diagram showing the configuration of a generation unit to which the language model (symbol chain probability) generation method of this invention is applied.

FIG. 3 is a flowchart showing the processing flow of the language model generation method of this invention.

FIG. 4 is a block diagram showing a configuration example of the weight determination unit for the general text databases.

FIG. 5 is a block diagram showing a configuration example of the weight determination unit for the recognition task text database.

FIG. 6 is a diagram showing a configuration example in which the language model generation method and the speech recognition method of this invention are executed by a computer.

FIG. 7 is a flowchart showing an example of the processing procedure of the language model generation method of this invention.

Continued from the front page
(72) Inventor: Shoichi Matsunaga, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
(72) Inventor: Takeshi Kawabata, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-term (reference): 5D015 HH23

Claims

[Claims]

1. A language model generation method using a recognition task text database storing text data relating to a recognition target task and a plurality of general text databases storing general text data not directly related to the recognition target task, characterized by determining, for each of the general text databases, a weight indicating its relation to the recognition task text database, and generating a symbol chain probability (language model) using the recognition task text database and the plurality of general text databases, giving each symbol (word) of interest the weight of the database to which it belongs.

2. The language model generation method according to claim 1, wherein a symbol chain probability is determined using the recognition task text database, and the weight of each general text database is calculated based on the perplexity (entropy) of that database under the symbol chain probability.

3. The language model generation method according to claim 1, wherein the weight of each of the general text databases is calculated based on the ratio (unknown word rate) of symbols (words) in that database that are not included in the recognition task text data.

4. The language model generation method according to claim 1, wherein a symbol chain probability is determined using the recognition task text database; for each general text database, a perplexity (entropy) is determined using the symbol chain probability, and the ratio (unknown word rate) of symbols (words) in that database that are not included in the recognition task text data is determined; and the weight of each of the general text databases is obtained from its perplexity and unknown word rate.

5. The language model generation method according to any one of claims 1 to 4, wherein the weight of the recognition task text database is determined based on the ratio of the text data amount of the recognition task text database to the text data amount of the plurality of general text databases.

6. A language model generation method using a recognition task text database storing text data relating to a recognition target task and a plurality of general text databases storing general text data not directly related to the recognition target task, characterized in that the recognition task text database is given a greater weight than the plurality of general text databases, and a symbol chain probability (language model) is generated for the symbol (word) of interest using all of the recognition task text database and the plurality of general text databases, with the weight applied to the recognition task text database.

7. The language model generation method according to claim 1, wherein generating the symbol chain probability by giving a weight to the symbol of interest comprises multiplying the number of appearances of the symbol in each database by the weight of that database, and generating the symbol chain probability using the total of the resulting values as the number of appearances in the databases as a whole.

8. The language model generation method according to claim 1, wherein generating the symbol chain probability by giving a weight to the symbol of interest comprises giving the weight of each database to the appearance frequency or symbol chain probability of the symbol in that database, and obtaining the symbol chain probability using the total of the weighted values as the appearance frequency or symbol chain probability in the databases as a whole.

9. A speech recognition method for recognizing input speech using an acoustic model and a symbol chain probability (language model) and outputting the result as a sequence of symbols (words), characterized by using a language model generated by the method according to any one of the first to third aspects.

10. A recording medium on which a program for causing a computer to execute the method according to claim 1 is recorded.
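The unknown word rate referred to in claims 3 and 4 can be illustrated with a short sketch. It assumes the rate is counted over word tokens (each occurrence counts once); the claims do not say whether tokens or distinct words are meant, and the function name `unknown_word_rate` and the toy data are hypothetical.

```python
def unknown_word_rate(general_tokens, task_vocabulary):
    """Share of tokens in a general database absent from the task text.

    Assumption: the rate is token-based. A lower rate suggests the general
    database is closer to the recognition task and could be given a larger
    weight; the mapping from rate to weight is not shown here.
    """
    if not general_tokens:
        raise ValueError("the general database must not be empty")
    unknown = sum(1 for token in general_tokens if token not in task_vocabulary)
    return unknown / len(general_tokens)


task_vocab = {"a", "b"}
rate = unknown_word_rate(["a", "c", "c", "b"], task_vocab)
```

In this toy case half of the tokens in the general database are unknown to the task text, so the rate is 0.5.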