JP2002221989A - Text input method and apparatus

Info

Publication number: JP2002221989A
Application number: JP2001354521A
Authority: JP
Inventors: Mitsuru Endo (遠藤 充); Makoto Nishizaki (西崎 誠); Natsuki Saito (齋藤 夏樹)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2000-11-22
Filing date: 2001-11-20
Publication date: 2002-08-09
Anticipated expiration: 2021-11-20
Also published as: JP3948260B2

Abstract

(57) [Abstract]

[Problem] To provide a text input method that allows the device to be made smaller.

[Solution] The method has input means for inputting speech, candidate display means for creating and displaying word string candidates from the input speech, and candidate selection means by which the user selects a candidate. The user selects candidates sequentially, in units of one to several words, starting from the beginning of the utterance. The search that large-vocabulary continuous speech recognition performs automatically is thereby replaced with a sequential search driven by the user's select-and-confirm operations. This greatly reduces the search space that would otherwise have to account for many combinations of word string candidates, so the device can be made smaller in terms of both storage capacity and processing load.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a text input method that uses speech recognition, and in particular to a text input method and apparatus for small devices such as mobile phones.

[0002]

[Prior art] Conventional text input methods that use speech recognition fall into two types. In one, the speaker utters one word or phrase at a time and each utterance is recognized separately. In the other, the speaker utters a sentence or passage and the whole sentence is recognized at once. In the former, as described in Japanese Unexamined Patent Publication No. H2-298997, a predetermined number of candidates are shown in a menu after each utterance and the speaker selects one of them. With this method, however, the speaker has to break the utterance at every phrase and pick the correct word each time, which makes input tedious and slow.

[0003] As an example of the latter, the method described in "Japanese Large-Vocabulary Continuous Speech Recognition Using Words as Recognition Units" (Transactions of the Information Processing Society of Japan, Vol. 40, No. 4, pp. 1395-1403, Apr. 1999) is known.

[0004] FIG. 12 shows the operation flow of this conventional text input method, which is described below.

[0005] First, the user inputs speech (S1201). The device then searches for recognition results automatically. In this search, the device computes an acoustic score for the whole utterance while connecting acoustic units such as phonemes. At the same time, it computes a language score for sequences of linguistic units such as words. It then ranks the recognition results in descending order of the combined score. An utterance is usually a sentence of several to several tens of words. To output accurate results, the device keeps a large number of word string candidates during the search, covering combinations of word candidates (S1202).

[0006] Next, the device displays the top-ranked word sequence for the entire input speech (S1203). The user then corrects the parts of the displayed result that differ from what was intended (S1204). When the user has finished all corrections, the device ends the input operation for that utterance (S1205).

[0007]

[Problem to be solved by the invention] In the prior art above, however, recognition candidates are corrected only after the whole sentence has been recognized, so for long utterances the recognition load is heavy and a large storage capacity is needed. As a result, the device cannot be made small.

[0008] An object of the present invention is to realize a text input method that allows the device to be made small and that accepts continuous speech of one sentence or more.

[0009]

[Means for solving the problem] In the text input method and apparatus according to the present invention, an utterance input as a sentence or passage is processed by a search in which the user selects and confirms candidates one after another from the beginning of the sentence, in units of words or phrases.

[0010] This removes the need to hold a search space covering many combinations of word string candidates, so both the storage capacity and the amount of speech recognition processing can be cut substantially. The device can therefore be made smaller. In addition, because the user can input continuous speech in units of one sentence or more, the tedium of word-by-word input is avoided.

[0011]

[Embodiments of the invention] A text input method according to a first aspect of the present invention comprises a step of continuously inputting speech, a candidate creation step of creating word string candidates in units of one to several words from the beginning of the input speech, a display step of displaying the candidates, and a selection step in which the user selects one of the displayed candidates. Based on the selected candidate, the candidate creation step, the display step, and the selection step are repeated in order for the speech that follows.

[0012] The user thus confirms candidates starting from the beginning of the input speech, in units of a few words, so the system only has to prepare local candidates. Text can therefore be input with little processing and little storage, and the device can be made smaller.
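The repeat-until-done structure of the first aspect can be sketched as a short loop. This is only an illustration under stated assumptions: the function name `sequential_input`, its three callbacks, and the toy lookup table are hypothetical and do not come from the patent, which defines the steps but not a programming interface.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sequential_input(
    make_candidates: Callable[[List[str]], Sequence[str]],
    choose: Callable[[Sequence[str]], int],
    is_finished: Callable[[List[str]], bool],
) -> List[str]:
    """Confirm one short unit at a time, starting from the head of the utterance."""
    confirmed: List[str] = []
    while not is_finished(confirmed):
        # Candidate creation step: only candidates that follow what is
        # already confirmed are produced, so nothing global is stored.
        candidates = make_candidates(confirmed)
        if not candidates:
            break  # nothing left to offer
        # Display and selection steps are folded into the `choose` callback.
        confirmed.append(candidates[choose(candidates)])
    return confirmed


# Hypothetical toy data: candidates keyed by the units confirmed so far.
TABLE: Dict[Tuple[str, ...], List[str]] = {
    (): ["send it", "sell it"],
    ("send it",): ["to him", "to them"],
    ("send it", "to him"): ["tomorrow"],
}

result = sequential_input(
    make_candidates=lambda done: TABLE.get(tuple(done), []),
    choose=lambda cands: 0,  # stand-in for the user: always picks the first
    is_finished=lambda done: len(done) == 3,
)
```

At any moment the loop holds only the confirmed units and the current local candidate list, which is the property the paragraph above credits for the reduced storage and processing.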

[0013] In a second aspect of the present invention, in the text input method according to the first aspect, the candidate creation step determines phrase-level candidates by an extension process that repeatedly concatenates words according to word-level chain probabilities.

[0014] This uses word-level chain probabilities, a kind of information that favors miniaturization, while converting the result into phrase units that are easy for the user to follow.

[0015] In a third aspect of the present invention, in the text input method according to the second aspect, the candidate creation step further includes a process of updating the candidates using acoustic scores.

[0016] Candidates are ranked using both the language score and the acoustic score, so they are presented starting from those that are frequently used and close in pronunciation. The user therefore has to look through fewer candidates before reaching the desired one.

[0017] In a fourth aspect of the present invention, in the text input method according to the third aspect, the extension process ends when the number of phrase candidates it has produced, counted from the top of the language score ranking, reaches a predetermined number.

[0018] Limiting the number of candidates presented to the user makes it possible, when speech recognition has gone wrong, to prompt for the speech again instead of showing incorrect candidates indefinitely.
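A minimal sketch of an extension process that stops at a predetermined number of candidates is given below. It is not the procedure of the patent's flowchart: the best-first search, the `END` marker used to close a phrase, and the bigram table are all assumptions made for illustration.

```python
import heapq
import math
from typing import Dict, List, Tuple

END = "</p>"  # hypothetical marker meaning "the phrase ends here"

# Hypothetical word-level chain probabilities P(next | previous).
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"that": 0.6, "this": 0.4},
    "that": {"OBJ": 0.7, END: 0.3},
    "this": {"OBJ": 0.6, END: 0.4},
    "OBJ": {END: 1.0},
}


def extend_phrases(start: str, max_candidates: int) -> List[Tuple[float, List[str]]]:
    """Concatenate words best-first; stop once max_candidates phrases are complete.

    Returns (log probability, words) pairs, best language score first.
    """
    heap: List[Tuple[float, List[str]]] = [(0.0, [start])]  # (cost, words so far)
    done: List[Tuple[float, List[str]]] = []
    while heap and len(done) < max_candidates:
        cost, words = heapq.heappop(heap)
        if words[-1] == END:
            # Costs are non-negative, so completed phrases pop in score order.
            done.append((-cost, words[1:-1]))
            continue
        for nxt, prob in BIGRAM.get(words[-1], {}).items():
            heapq.heappush(heap, (cost - math.log(prob), words + [nxt]))
    return done


top3 = extend_phrases("<s>", max_candidates=3)
```

Because the search stops at the cap, a caller that exhausts the returned list without a confirmation can ask the user to speak again, which is the behavior paragraph [0018] describes.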

[0019] A text input apparatus according to a fifth aspect of the present invention comprises: an input unit that inputs speech; a speech preprocessing unit that extracts feature values of the speech from the input unit; a word candidate creation unit that uses a language model to create the word candidates that follow a confirmed word string; a word string creation unit that creates word string candidates of one to several words from the extracted feature values and the word candidates, using at least one of the language model and an acoustic model; a display unit that displays the word string candidates; an operation unit with which the user selects a displayed word string candidate; and a candidate creation instruction unit that instructs the word candidate creation unit to create the next word candidates from the word string selected with the operation unit.

[0020] The user thus confirms candidates starting from the beginning of the input speech, in units of a few words, so the system only has to prepare local candidates. Text can therefore be input with little processing and little storage, and the device can be made smaller.

[0021] In a sixth aspect of the present invention, in the text input apparatus according to the fifth aspect, the word string creation unit creates phrase-level candidates by an extension process that repeatedly concatenates words according to word-level chain probabilities.

[0022] This uses word-level chain probabilities, a kind of information that favors miniaturization, while converting the result into phrase units that are easy for the user to follow.

[0023] In a seventh aspect of the present invention, in the text input apparatus according to the sixth aspect, the word string creation unit further performs an update process using acoustic scores.

[0024] Candidates are ranked using both the language score and the acoustic score, so they are presented starting from those that are frequently used and close in pronunciation. The user therefore has to look through fewer candidates before reaching the desired one.

[0025] In an eighth aspect of the present invention, in the text input apparatus according to the seventh aspect, the word string creation unit ends the extension process when the number of phrase candidates it has produced, counted from the top of the language score ranking, reaches a predetermined number.

[0026] Limiting the number of candidates presented to the user makes it possible, when speech recognition has gone wrong, to prompt for the speech again instead of showing incorrect candidates indefinitely.

[0027] A mobile phone according to a ninth aspect of the present invention has the text input apparatus according to any one of the fifth to eighth aspects.

[0028] Particularly in a mobile phone, the system only has to prepare local candidates, so the device can be made smaller.

[0029] A program according to a tenth aspect of the present invention causes the following to be executed: an input step of continuously inputting speech, a candidate creation step of creating word string candidates in units of one to several words from the beginning of the input speech, a display step of displaying the candidates, and a selection step in which the user selects one of the displayed candidates; and, based on the selected candidate, repeating the candidate creation step, the display step, and the selection step in order for the speech that follows.

[0030] The user thus confirms candidates starting from the beginning of the input speech, in units of a few words, so the system only has to prepare local candidates. Text can therefore be input with little processing and little storage, and the device can be made smaller.

[0031] In an eleventh aspect of the present invention, in the program according to the tenth aspect, the candidate creation step determines phrase-level candidates by an extension process that repeatedly concatenates words according to word-level chain probabilities.

[0032] This uses word-string-level chain probabilities, a kind of information that favors miniaturization, while converting the result into phrase units that are easy for the user to follow.

[0033] In a twelfth aspect of the present invention, in the program according to the eleventh aspect, the candidate creation step further includes a process of updating the candidates using acoustic scores.

[0034] Candidates are ranked using both the language score and the acoustic score, so they are presented starting from those that are frequently used and close in pronunciation. The user therefore has to look through fewer candidates before reaching the desired one.

[0035] In a thirteenth aspect of the present invention, in the program according to the twelfth aspect, the extension process ends when the number of phrase candidates it has produced, counted from the top of the language score ranking, reaches a predetermined number.

[0036] Limiting the number of candidates presented to the user makes it possible, when speech recognition has gone wrong, to prompt for the speech again instead of showing incorrect candidates indefinitely.

[0037] A computer-readable storage medium according to a fourteenth aspect of the present invention records a program that causes the following to be executed: an input step of continuously inputting speech, a candidate creation step of creating word string candidates in units of one to several words from the beginning of the input speech, a display step of displaying the candidates, and a selection step in which the user selects one of the displayed candidates; and, based on the selected candidate, repeating the candidate creation step, the display step, and the selection step in order for the speech that follows.

[0038] The user thus confirms candidates starting from the beginning of the input speech, in units of a few words, so the system only has to prepare local candidates. Text can therefore be input with little processing and little storage, and the device can be made smaller.

[0039] In a fifteenth aspect of the present invention, in the computer-readable storage medium recording the program according to the fourteenth aspect, the candidate creation step of the program determines phrase-level candidates by an extension process that repeatedly concatenates words according to word-level chain probabilities.

[0040] This uses word-string-level chain probabilities, a kind of information that favors miniaturization, while converting the result into phrase units that are easy for the user to follow.

[0041] In a sixteenth aspect of the present invention, in the computer-readable storage medium recording the program according to the fifteenth aspect, the candidate creation step of the program further includes a process of updating the candidates using acoustic scores.

[0042] Candidates are ranked using both the language score and the acoustic score, so they are presented starting from those that are frequently used and close in pronunciation. The user therefore has to look through fewer candidates before reaching the desired one.

[0043] In a seventeenth aspect of the present invention, in the computer-readable storage medium recording the program according to the sixteenth aspect, the extension process of the program ends when the number of phrase candidates it has produced, counted from the top of the language score ranking, reaches a predetermined number.

[0044] Limiting the number of candidates presented to the user makes it possible, when speech recognition has gone wrong, to prompt for the speech again instead of showing incorrect candidates indefinitely.

[0045] Embodiments of the present invention are described below with reference to FIGS. 1 to 11.

[0046] (Embodiment 1) FIG. 1 is a block diagram of a text input apparatus according to one embodiment of the present invention. In FIG. 1, input speech captured by the input unit 101 is passed to the speech preprocessing unit 102, which performs A/D conversion and then feature extraction. The word candidate creation unit 103 refers to the language model 104 and creates a predetermined number of word candidates that follow the most recently confirmed phrase. The language model 104 models the relationships between words in a word sequence. For the first utterance, the candidate creation instruction unit 109, on receiving an instruction from the operation unit 108, tells the word candidate creation unit 103 that the position is the beginning of a sentence. On receiving this, the word candidate creation unit 103 refers to the language model 104 and creates, as candidates, words that have a high probability of being uttered at the beginning of a sentence. The word candidates created in this way are passed to the word string creation unit 106.

[0047] Meanwhile, the word string creation unit 106 receives from the speech preprocessing unit 102 the feature values of the speech uttered as a sentence and stores them temporarily in the memory 110. It applies the extension process to the word candidates from the word candidate creation unit 103, performs an acoustic score update that refers to the acoustic model 105 and the word dictionary 111, and creates a predetermined number of word strings as phrase candidates. The acoustic model 105 models acoustic features. The word dictionary 111 registers the words to be recognized in terms of the phonetic symbols that the acoustic model represents. The extension process and the acoustic score update are described in detail later.

[0048] The display unit 107 displays the word string candidates that were created. Using the operation unit 108, the user selects the correct phrase from the displayed candidates. Following the instruction from the operation unit 108, the candidate creation instruction unit 109 receives the selected phrase from the word string creation unit 106 and outputs it as a confirmed phrase. It also passes the confirmed phrase to the word candidate creation unit 103.

[0049] On receiving the confirmed phrase, the word candidate creation unit 103 refers to the language model 104 and creates the word candidates that follow, as described above. This processing is repeated until the input sentence has been processed to the end. After that, the feature data for the sentence stored in the memory 110 is erased.

[0050] FIG. 2 shows the man-machine interface of the mobile phone of this embodiment. The voice button 201 tells the device to start speech recognition. The candidate button 202 requests that a phrase candidate be displayed or changed. The display screen 203 shows confirmed text, phrase candidates, and so on. The confirm button 204 confirms a phrase candidate.

[0051] FIG. 3 is a flowchart outlining the operation of the text input apparatus of the present invention. The operation is described below with reference to FIGS. 1 to 3.

[0052] First, the user presses the voice button 201 and utters one sentence to input speech. The text input apparatus performs A/D conversion on the input speech. It then performs feature extraction on the converted speech signal, for example computing LPC cepstrum coefficients, frame by frame at intervals of, for example, 10 msec (S301).
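The frame-by-frame handling in S301 can be illustrated by the splitting step alone. The sketch below assumes non-overlapping frames and an 8 kHz sampling rate in the example; neither detail is stated in the text, and the LPC cepstrum computation itself is not shown.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float], sample_rate_hz: int, frame_ms: int = 10
) -> List[Sequence[float]]:
    """Cut digitized speech into consecutive frames of frame_ms milliseconds.

    A trailing partial frame is dropped. Each returned frame would then be
    handed to a feature extractor (not implemented here).
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, frame_len)]


# 250 dummy samples at an assumed 8 kHz rate: 80 samples per 10 ms frame.
frames = split_into_frames(list(range(250)), sample_rate_hz=8000)
```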

[0053] Next, the user presses the candidate button 202 to request display of phrase candidates (S302). The text input apparatus creates a phrase candidate list using the feature values of the input speech, the acoustic model, and the language model, and displays one or more of the top candidates on the display screen 203 (S303).

[0054] The phrase candidate list arranges word strings in descending order of the integrated score, which is the weighted sum of the acoustic score and the language score. The acoustic score of a word string can be obtained as follows. The acoustic score as(i,j) for input frame i and dictionary frame j is calculated by Equation (1).

[0055]

[Equation 1]

[0056] Here, "t" denotes the transpose, "-1" denotes the matrix inverse, x(i) is the input vector for input frame i, and Σ(j) and μ(j) are the covariance matrix and mean vector of the feature vectors for dictionary frame j. Concretely, the acoustic model mentioned above is the set of covariance matrices and mean vectors of these dictionary frames. The input vector is a feature vector extracted from the input speech, such as an LPC cepstrum coefficient vector. A dictionary frame is likewise a feature vector, taken from the acoustic model, for a phrase that is registered in the phrase dictionary and regarded as corresponding to the input frame. The feature data is not limited to LPC cepstrum coefficient vectors; MFCC (Mel Frequency Cepstral Coefficients) and the like can also be used.
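The body of Equation (1) does not survive in this text, so its exact form is not known here. The symbols that are defined (an input vector, a mean vector, a covariance matrix, a transpose, and an inverse) are those of a Gaussian log-likelihood, and the sketch below uses that form purely as an assumption. It further assumes a diagonal covariance so that it needs no matrix library; the function name is hypothetical.

```python
import math
from typing import Sequence


def acoustic_score_diag(
    x: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Gaussian log-density of x with a diagonal covariance (assumed form).

    With a diagonal covariance, the quadratic term (x - mean)^t Cov^-1 (x - mean)
    reduces to a sum of squared differences divided by the variances.
    """
    score = 0.0
    for xi, mi, vi in zip(x, mean, var):
        score += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return score


at_mean = acoustic_score_diag([1.0, 2.0], [1.0, 2.0], [1.0, 1.0])
off_mean = acoustic_score_diag([2.0, 2.0], [1.0, 2.0], [1.0, 1.0])
```

Under this assumed form, an input frame close to a dictionary frame's mean scores higher than one far from it, which is the ordering any frame-level acoustic score needs in order to be summed along a matching path.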

[0057] The acoustic score of a word is obtained by finding the correspondence between input frames and dictionary frames with a matching technique such as DP matching, and adding up the acoustic scores along the optimal path that links the corresponding frames. The acoustic score of a word string is likewise obtained by adding the word-level acoustic scores while taking the temporal consistency of adjacent words into account.
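DP matching over frame-level scores can be sketched as a small dynamic program. The allowed moves (advance the input frame, the dictionary frame, or both) and the squared-difference frame score in the example are assumptions; the text names DP matching without fixing these details.

```python
from typing import Callable, List


def dp_match(n_input: int, n_dict: int, frame_score: Callable[[int, int], float]) -> float:
    """Best total score over alignments of input frames to dictionary frames.

    The path starts at (0, 0), ends at (n_input - 1, n_dict - 1), and sums
    frame_score(i, j) over every cell it visits. Both counts must be >= 1.
    """
    neg_inf = float("-inf")
    best: List[List[float]] = [[neg_inf] * n_dict for _ in range(n_input)]
    for i in range(n_input):
        for j in range(n_dict):
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = max(
                    best[i - 1][j] if i > 0 else neg_inf,                 # input advances
                    best[i][j - 1] if j > 0 else neg_inf,                 # dictionary advances
                    best[i - 1][j - 1] if i > 0 and j > 0 else neg_inf,   # both advance
                )
            best[i][j] = prev + frame_score(i, j)
    return best[-1][-1]


# One-dimensional toy "features": the input holds the first dictionary frame twice.
inputs = [1.0, 1.0, 3.0]
dictionary = [1.0, 3.0]
total = dp_match(
    len(inputs), len(dictionary),
    lambda i, j: -((inputs[i] - dictionary[j]) ** 2),
)
```

In the toy case the optimal path maps the first two input frames to the first dictionary frame and the third to the second, so no penalty is accumulated.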

[0058] The language score of a word string can be obtained as follows.

[0059] Concretely, the language model is a set of chain probabilities P(w(i)|pre(i,n)), the probability that word w(i) appears after the n preceding words pre(i,n). The language score of a word string is obtained by referring to the language model, finding for each word the chain probability (or its logarithm) given its preceding words, and adding these values.
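The summation of log chain probabilities can be written out directly. In this sketch the table layout (a dictionary keyed by the context words followed by the word itself), the floor probability for unseen sequences, and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def language_score(
    words: Sequence[str],
    chain_prob: Dict[Tuple[str, ...], float],
    n: int = 1,
    floor: float = 1e-6,
) -> float:
    """Sum of log P(w(i) | up to n preceding words) over the word string."""
    score = 0.0
    for i, word in enumerate(words):
        context = tuple(words[max(0, i - n):i])
        # Unseen sequences fall back to a small floor probability (assumption).
        score += math.log(chain_prob.get(context + (word,), floor))
    return score


# Hypothetical probabilities: P(that) = 0.5, P(OBJ | that) = 0.8.
CHAIN = {("that",): 0.5, ("that", "OBJ"): 0.8}
lm_score = language_score(["that", "OBJ"], CHAIN)
```

Adding logarithms instead of multiplying probabilities keeps long word strings from underflowing, which is one reason the text allows the logarithm to be used.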

[0060] In this way, the acoustic score is obtained from the feature values of the input speech and the acoustic model, and the language score from the word string hypothesis and the language model. Word strings with a high integrated score are registered in the list as phrase candidates.
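Ranking by the integrated score amounts to sorting on a weighted sum. The weights, the `Candidate` record, and the sample scores below are hypothetical; the text states that the sum is weighted but gives no values.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    text: str
    acoustic: float   # acoustic score, e.g. a log-likelihood
    language: float   # language score, e.g. a log chain probability


def rank_by_integrated_score(
    candidates: List[Candidate],
    acoustic_weight: float = 1.0,
    language_weight: float = 1.0,
    top_k: Optional[int] = None,
) -> List[Candidate]:
    """Sort candidates by the weighted sum of their two scores, best first."""
    def integrated(c: Candidate) -> float:
        return acoustic_weight * c.acoustic + language_weight * c.language

    ranked = sorted(candidates, key=integrated, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


pool = [
    Candidate("A", acoustic=-10.0, language=-1.0),
    Candidate("B", acoustic=-8.0, language=-4.0),
    Candidate("C", acoustic=-9.0, language=-1.5),
]
both = rank_by_integrated_score(pool)
acoustic_only = rank_by_integrated_score(pool, language_weight=0.0)
```

With equal weights the order is C, A, B; with the language score switched off it becomes B, C, A, which shows how the weighting shifts the list between frequently used candidates and acoustically close ones.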

[0061] Next, the user checks the displayed phrase candidate and, if it is not the desired one, presses the candidate button 202 to display the next candidate. When the desired candidate is displayed, the user presses the confirm button 204 to confirm the phrase (S304).

[0062] Confirmation proceeds phrase by phrase. If phrases have not yet been confirmed through to the end of the utterance, processing returns to step S302; it ends once the last phrase has been confirmed (S305).
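The two-button interaction for a single phrase can be modeled as a small state holder. The class and method names are hypothetical, and returning `None` when the list runs out is an assumption standing in for whatever the device does at that point, such as prompting for the speech again.

```python
from typing import List, Optional


class PhraseSelector:
    """Steps through the ranked candidates for one phrase (sketch of S302 to S304)."""

    def __init__(self, ranked: List[str]) -> None:
        self._ranked = ranked
        self._index = -1  # nothing is displayed yet

    def press_candidate(self) -> Optional[str]:
        """Candidate button: display the next candidate, or None when exhausted."""
        if self._index + 1 >= len(self._ranked):
            self._index = len(self._ranked)
            return None
        self._index += 1
        return self._ranked[self._index]

    def press_confirm(self) -> Optional[str]:
        """Confirm button: return the displayed candidate, or None if none is shown."""
        if 0 <= self._index < len(self._ranked):
            return self._ranked[self._index]
        return None


selector = PhraseSelector(["first guess", "second guess"])
shown = [selector.press_candidate(), selector.press_candidate()]
confirmed = selector.press_confirm()
```

A surrounding loop would create a new selector for each phrase, seeded with the candidates that follow the phrase just confirmed, until the end of the utterance is reached.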

[0063] As described above, in the present invention the candidates for the next phrase are created only after the user has confirmed a phrase candidate, so the other candidates need not be stored and need not be processed further by recognition. The device therefore needs less storage capacity and can be made smaller.

[0064] The linguistic unit is considered here. A short unit such as the morpheme achieves high coverage with a small inventory and is therefore well suited to a compact device. As the unit that the user selects, however, a longer unit such as the phrase is easier to understand and is preferable. The present invention uses the morpheme as its minimum linguistic unit. This embodiment presents an example in which morphemes are joined as appropriate and built up into phrase units, which are preferable for interaction with a human. This build-up is called the morpheme extension process.

[0065] The phrase candidate creation process performed by the word string creation unit 106 is described in detail below with reference to FIGS. 4 to 6.

[0066] FIG. 4 is a flowchart showing the procedure of the phrase candidate creation process of the present invention. In this embodiment, morpheme-level candidates were first extended to build a phrase candidate list (S401 to S406). The acoustic score was then taken into account to produce the final phrase candidate list (S407 to S412). FIG. 5 shows an example of the processing data when the list of phrase candidates that follow the confirmed phrase 「それを」 (sore o, "that" plus object marker) 500 is created by the extension process. FIG. 6 shows an example of the processing data when, after the extension process, the acoustic scores are updated to create the phrase candidate list.

[0067] In FIG. 5, the phrase candidate list 510 of candidates that follow the confirmed phrase 「それを」 500 is created first. The chain probability between 「それを」 and every morpheme is obtained from a language model trained in advance. The resulting list of morphemes, sorted in descending order of chain probability, is the phrase candidate list 510. Each phrase candidate is given an extension-complete flag (shown as 「完了」, "complete", in FIGS. 5, 8, 10, and 11) with an initial value of 0, meaning that the candidate may still be extended (S401). The flag is set to 1 when no further extension is possible. In this state the phrase candidates are too short to be easy to understand. The word string creation unit 106 therefore looks for morphemes that follow a phrase candidate with a relatively high chain probability, joins them to the candidate, and creates longer phrase candidates.

[0068] First, the word string creation unit 106 decides which phrase candidate to extend. It scans the list from the top and selects the first phrase candidate whose extension-complete flag is 0 (S402). The selected candidate is shown as list 511.

[0069] Next, the word string creation unit 106 obtains the chain probability between the phrase candidate being extended and each morpheme that can follow it. Morphemes whose chain probability is below a predetermined threshold, morphemes whose chain probability is smaller than that of a punctuation mark, and the punctuation marks themselves are grouped together as "other morphemes", and their chain probabilities are summed. The sum becomes the chain probability of "other morphemes" (S403). The resulting chain probabilities are shown in list 512: everything other than 「た」 (ta) and 「て」 (te), which have relatively high chain probabilities after 「し」 (shi), is grouped as 「（他）」 ("other"). The figures use the symbol （他） for the concept of "other morphemes". The symbol is omitted, however, from lists 510, 520, and 530, which show the extension-complete flag (「完了」). (The same applies to FIGS. 6, 8, 10, and 11.)

Next, the extended candidates are created. The chain probability of 「それを」→「し」 multiplied by the chain probability of 「し」→「た」 is taken as the chain probability of 「それを」→「した」 (shita). The phrase candidate 「し」 has thus been extended to 「した」. In the same way, the word string creation unit 106 creates the extended candidate 「して」 (shite). The morphemes grouped as "other morphemes" branch to many different following morphemes, which suggests that this point is a suitable phrase boundary. The word string creation unit 106 therefore regards extension as complete for "other morphemes": the candidate stays as 「し」, its chain probability is the product of the probabilities of 「それを」→「し」 and 「し」→「（他）」, and its extension-complete flag is set to 1 (S404). The result is the list 513 of extended candidates. This completes one round of the extension process.
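
One round of the extension process (S403 and S404) can be sketched as below. The function name `extend_once`, the probability values, the threshold of 0.10, and the choice of taking the largest punctuation probability as the comparison value are all assumptions made for illustration; the specification gives no concrete numbers for these.

```python
PUNCTUATION = {"、", "。"}


def extend_once(text, prob, next_probs, threshold):
    """Return (text, chain probability, extension-complete flag) tuples.

    next_probs maps each following morpheme to P(morpheme | text).
    """
    # Assumption: compare against the most probable punctuation mark.
    punct = max((p for m, p in next_probs.items() if m in PUNCTUATION), default=0.0)
    extended = []
    other_mass = 0.0
    for morpheme, p in next_probs.items():
        if morpheme in PUNCTUATION or p < threshold or p < punct:
            other_mass += p  # grouped into "other morphemes"
        else:
            extended.append((text + morpheme, prob * p, 0))
    # The unextended candidate absorbs the "other" mass and is marked complete.
    extended.append((text, prob * other_mass, 1))
    return extended


# Hypothetical probabilities of morphemes following 「し」.
NEXT_AFTER_SHI = {"た": 0.50, "て": 0.30, "、": 0.05, "。": 0.05, "ない": 0.04, "ます": 0.06}

result = extend_once("し", 0.20, NEXT_AFTER_SHI, threshold=0.10)
```

With these toy numbers, 「し」 (chain probability 0.20 after 「それを」) yields 「した」 and 「して」 as extendable candidates, plus 「し」 itself with its flag set to 1.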

[0070] Next, the word string creation unit 106 updates the phrase candidate list. It removes the pre-extension candidate 511 from the phrase candidate list 510, adds the extended candidates 513, and re-sorts the list in descending order of chain probability (S405). The result is the updated phrase candidate list 520.

[0071] Next, the word string creation unit 106 makes a termination decision. In this embodiment, processing ended once the extension process had been performed a preset number of times, 100 (S406). If fewer than 100 rounds have been performed, processing does not end and returns to S402. By continuing the extension process in this way, candidates of a size suitable as phrases, such as 「した」 (shita, "did"), 「受けた」 (uketa, "received"), and 「決めた」 (kimeta, "decided"), were obtained, as shown in the phrase candidate list 530.

[0072] Alternatively, processing can be terminated when the number of phrase candidates whose extension-complete flag is set to 1, counted from the top of the chain-probability ranking, reaches a predetermined number. It is also possible to terminate when no phrase candidate remains that has an extension-complete flag of 0 and a chain probability greater than that of "other morphemes".
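
The list update and termination logic of S402 to S406 can be sketched as a loop around a pluggable extension function. The names `run_extension`, `toy_extend`, and `NEXT`, the probabilities, and the reading of the alternative criterion as "the top `top_done` candidates are all complete" are assumptions for illustration.

```python
def run_extension(initial, extend, max_rounds=100, top_done=None):
    """Repeat the extension process on a list of (text, prob, done) tuples."""
    cands = sorted(initial, key=lambda c: c[1], reverse=True)
    for _ in range(max_rounds):
        # Optional criterion: stop once the top-ranked candidates are all complete.
        if top_done is not None and all(c[2] for c in cands[:top_done]):
            break
        index = next((i for i, c in enumerate(cands) if not c[2]), None)  # S402
        if index is None:
            break  # nothing left to extend
        target = cands.pop(index)                     # remove pre-extension candidate
        cands.extend(extend(target))                  # add extended candidates
        cands.sort(key=lambda c: c[1], reverse=True)  # S405
    return cands


# Hypothetical followers with a relatively high chain probability.
NEXT = {"し": {"た": 0.5, "て": 0.3}, "受け": {"た": 0.6}}


def toy_extend(cand):
    text, prob, _ = cand
    followers = NEXT.get(text, {})
    out = [(text + m, prob * p, 0) for m, p in followers.items()]
    out.append((text, prob * (1.0 - sum(followers.values())), 1))
    return out


start = [("し", 0.2, 0), ("受け", 0.12, 0)]
full = run_extension(start, toy_extend)
early = run_extension(start, toy_extend, top_done=2)
```

With `top_done=2` the loop stops while lower-ranked candidates are still unextended, which trades completeness of the tail of the list for fewer rounds.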

[0073] Next, the method of creating a phrase candidate list that is ranked with the acoustic score taken into account is described.

[0074] In FIG. 6, the confirmed phrase 「それを」 (end time 314) 600 indicates that, taking the time at which the voice button 201 was pressed in step S301 as the start, the time at which the utterance of 「それを」 finished (the end time) was 314 ms.

[0075] First, based on the phrase candidate list 530 created by 100 rounds of the extension process, the language score is obtained by taking the logarithm of the chain probability. In this embodiment, the language score was obtained from the chain probability by equation (2).

[0076]

$$L = 20 \log_{10} l \tag{2}$$

where L is the language score and l is the chain probability.

The initial value of the acoustic score was set to a suitably high value (1.00 here). The sum of the language score and the acoustic score was used as the integrated score. The word string creation unit 106 then sorted the phrase candidate list in descending order of integrated score to obtain list 610. In addition, the end time 314 of the confirmed phrase was set for each candidate as the initial value of the utterance end time obtained by acoustic matching (S407).
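
A short worked example of equation (2) and the integrated score follows. The chain probability of 0.10 is a hypothetical value; the initial acoustic score of 1.00 and the updated score of 0.89 are the values quoted in this description. The function names are illustrative.

```python
import math


def language_score(chain_prob):
    """Equation (2): L = 20 * log10(l)."""
    return 20.0 * math.log10(chain_prob)


def integrated_score(chain_prob, acoustic_score=1.00):
    """Integrated score = language score + acoustic score."""
    return language_score(chain_prob) + acoustic_score


# Hypothetical candidate with chain probability 0.10.
initial = integrated_score(0.10)        # before acoustic matching (score 1.00)
updated = integrated_score(0.10, 0.89)  # after acoustic matching (score 0.89)
```

Because the initial acoustic score is set high, an update can only keep or lower a candidate's integrated score, so a candidate that stays at the top after being updated cannot be overtaken by candidates that have not been updated yet, provided real acoustic scores do not exceed the initial value.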

[0077] Next, the word string creation unit 106 decided which candidate's acoustic score to update. It scanned the list from the top and selected the first candidate whose acoustic score had not yet been updated (S408). Whether the acoustic score has been updated is judged by whether the end time of the phrase candidate is the same as the end time of the confirmed phrase. In list 610, 「した」 was selected.

[0078] Next, the acoustic score for 「した」 is computed, starting from around time 314 ms (S409). As the result of acoustic matching, a relatively high acoustic score of 0.89 was obtained from equation (1) for the speech segment with start time 314 ms and end time 643 ms (list 612).

[0079] A typical acoustic matching method consists of the following processing stages: A/D conversion of the speech signal, conversion to feature parameters, calculation of local distances to the acoustic model, and accumulation of the local distances by DP matching. These stages can be divided between processing done in one batch at speech input in step S301 and processing done incrementally when the acoustic score is computed in step S409. Batch processing avoids redundant computation and is therefore advantageous in processing load; incremental processing does not need to store intermediate results and is therefore advantageous in storage capacity. How the stages are divided should be decided according to the actual hardware configuration. In this embodiment, the calculation of local distances to the acoustic model and the accumulation of local distances by DP matching were performed in step S409.
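
The accumulation of local distances by DP matching can be illustrated with a textbook dynamic-time-warping recurrence. This is a heavily simplified sketch, not the matching used in the embodiment: it assumes one-dimensional feature values, an absolute-difference local distance, and a plain template in place of an acoustic model. The function name `dp_match` is hypothetical.

```python
def dp_match(frames, template):
    """Return the minimum cumulative local distance between two sequences."""
    inf = float("inf")
    n, m = len(frames), len(template)
    # cost[i][j]: best cumulative distance aligning frames[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(frames[i - 1] - template[j - 1])  # local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeats the same template frame
                cost[i][j - 1],      # template frame is skipped over
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same = dp_match([1, 2, 3], [1, 2, 3])
stretched = dp_match([1, 1, 2, 3], [1, 2, 3])
shifted = dp_match([1, 2, 3], [2, 2, 3])
```

The full cost table kept here corresponds to the batch style of processing; keeping only the previous row would reduce memory at the price of not being able to reuse intermediate results.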

[0080] Next, the word string creation unit 106 updated the values of the phrase candidate. The acoustic score was updated to 0.89, and the integrated score was updated to the sum of the language score and the acoustic score. The end time of the phrase candidate was updated from the matched segment (S410). The new candidate is shown as list 613.

[0081] Next, the phrase candidate list is updated. The word string creation unit 106 deletes the candidate 611, which precedes the acoustic score update, from the phrase candidate list 610, adds the updated candidate 613 to the list, and re-sorts the list in descending order of integrated score (S411). The result was the phrase candidate list 620. The processing above is called the acoustic score update process.

[0082] Next, the word string creation unit 106 makes a termination decision. In this embodiment, processing ended once the acoustic score update process had been performed a preset number of times, 100 (S412). If fewer than 100 rounds have been performed, processing does not end and returns to step S408. By continuing the acoustic score update process in this way, a list of phrase candidates that are both frequently used and score highly in acoustic matching against the utterance was created. In this list the phrase candidates are arranged in descending order of score.
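
The acoustic score update loop (S407 to S412) can be sketched as follows. The function and variable names are hypothetical, as are all probabilities and matching results except the pair 0.89 / 643 ms for 「した」, which is quoted in the description. The acoustic matcher is passed in as a function so that the sketch stays independent of any particular recognizer.

```python
import math


def total(row):
    """Integrated score: language score plus acoustic score."""
    return row["lang"] + row["ac"]


def update_acoustic_scores(candidates, confirmed_end, acoustic_match, max_rounds=100):
    """candidates: (text, chain probability) pairs from the extension process."""
    rows = [
        {"text": text, "lang": 20.0 * math.log10(prob), "ac": 1.00, "end": confirmed_end}
        for text, prob in candidates
    ]
    rows.sort(key=total, reverse=True)  # S407
    for _ in range(max_rounds):
        # S408: a candidate whose end time still equals the confirmed end time
        # has not been matched yet.
        pending = next((r for r in rows if r["end"] == confirmed_end), None)
        if pending is None:
            break
        pending["ac"], pending["end"] = acoustic_match(pending["text"], confirmed_end)
        rows.sort(key=total, reverse=True)  # S411
    return rows


MATCHES = {"した": (0.89, 643), "受けた": (0.02, 640), "決めた": (0.10, 655)}

ranked = update_acoustic_scores(
    [("した", 0.100), ("受けた", 0.105), ("決めた", 0.050)],
    confirmed_end=314,
    acoustic_match=lambda text, start: MATCHES[text],
)
```

In this toy run 「受けた」 starts at the top on language score alone, drops once its low acoustic score is applied, and 「した」 ends up first.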

[0083] Alternatively, processing can be terminated when the number of phrase candidates whose end time differs from the confirmed end time, counted from the top of the integrated-score ranking, reaches a predetermined number.

[0084] The text input device displays the phrase candidates obtained in this way, starting from the top of the phrase candidate list. Because speech recognition only needs to handle the phrase currently being targeted, text can be input with a small processing load and a small storage capacity. Because one or more top candidates can be displayed in descending order of integrated score, fewer candidates need to be presented before the user reaches the desired one. Furthermore, displaying candidates phrase by phrase makes the choices easy for the user to understand.

[0085] (Embodiment 2) This embodiment differs from Embodiment 1 in that the extension process and the acoustic score update process performed by the word string creation unit 106 update the phrase candidate list concurrently. The block diagram of the text input device, the man-machine interface, and everything else are the same.

[0086] FIG. 7 is a flowchart showing the procedure of the phrase candidate creation process of the text input device according to Embodiment 2 of the present invention.

[0087] FIG. 8 shows the flow of processing data when the list of phrase candidates that follow the confirmed phrase 「それを」 500 is created by alternately repeating the extension process and the acoustic score process.

[0088] A detailed description follows with reference to FIGS. 7 and 8.

[0089] First, step S701, which creates the phrase candidate list 801 of candidates that follow the confirmed phrase 「それを」 500, is the same as step S401 of Embodiment 1. Next, the integrated score, the sum of the language score (the logarithm of the chain probability in candidate list 801) and the acoustic score, was obtained to create the candidate list 802 with acoustic scores (S702). Next, the candidate list with acoustic scores is searched from the top for candidates that have not yet been extended, and the first one found becomes the extension target (S703). In list 802 this candidate is 「し」. For this candidate, the language model is used as in S407 to find 「た」 and 「て」, which have relatively high chain probabilities after 「し」, and 「（他）」 (S704). As in Embodiment 1, these phrase candidates are added to candidate list 802, which is re-sorted in descending order of integrated score to give the new phrase candidate list 803 (S705).

[0090] Next, this candidate list is searched from the top for candidates whose end time is the same as the end time of the confirmed phrase, and the acoustic score of the first one found is computed (S706). In list 803 this is 「受け」 (uke). Computing the acoustic score for this candidate in the same way as in S409 gave an acoustic score of 0.02 for the speech segment with start time 314 ms and end time 640 ms. This was reflected in the phrase candidate list 803 (S707), which was re-sorted in descending order of integrated score to give the new phrase candidate list 804 (S708). Steps S703 to S708 were repeated a preset number of times to obtain the phrase candidate list 806; in this embodiment they were repeated 100 times. The result of this embodiment was the same as that of Embodiment 1.
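
The alternation of steps S703 to S708 can be sketched as a single loop that extends one candidate and then scores one candidate per round. All names and numbers below are hypothetical except the 0.89 / 643 ms result for 「した」 quoted in Embodiment 1. As a simplification that the description does not specify, every candidate produced by an extension starts again from the initial acoustic score and the confirmed end time.

```python
import math


def total(row):
    # Guard against log10(0) for candidates whose chain probability is zero.
    return 20.0 * math.log10(max(row["prob"], 1e-12)) + row["ac"]


def interleave(initial, confirmed_end, extend, acoustic_match, rounds=100):
    def new_row(text, prob, done):
        return {"text": text, "prob": prob, "done": done, "ac": 1.00, "end": confirmed_end}

    rows = sorted((new_row(t, p, False) for t, p in initial), key=total, reverse=True)
    for _ in range(rounds):
        # S703 to S705: extend the first candidate that is not yet complete.
        target = next((r for r in rows if not r["done"]), None)
        if target is not None:
            rows.remove(target)
            rows += [new_row(*item) for item in extend(target["text"], target["prob"])]
            rows.sort(key=total, reverse=True)
        # S706 to S708: score the first candidate that is not yet matched.
        pending = next((r for r in rows if r["end"] == confirmed_end), None)
        if pending is not None:
            pending["ac"], pending["end"] = acoustic_match(pending["text"], confirmed_end)
            rows.sort(key=total, reverse=True)
        if target is None and pending is None:
            break
    return rows


NEXT = {"し": {"た": 0.5, "て": 0.3}, "受け": {"た": 0.6}}
MATCH = {"した": (0.89, 643)}


def toy_extend(text, prob):
    followers = NEXT.get(text, {})
    out = [(text + m, prob * p, False) for m, p in followers.items()]
    out.append((text, prob * (1.0 - sum(followers.values())), True))
    return out


rows = interleave(
    [("し", 0.2), ("受け", 0.12)],
    confirmed_end=314,
    extend=toy_extend,
    acoustic_match=lambda text, start: MATCH.get(text, (0.10, 600)),
)
```

Interleaving lets the acoustic evidence influence which candidate is extended next, at the cost of rescoring candidates whose text changes.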

[0091] In this embodiment, processing was judged complete once the extension process and the acoustic score update process had been repeated a predetermined number of times. Alternatively, processing can be terminated when the number of phrase candidates whose extension-complete flag is set to 1, counted from the top of the list, reaches a predetermined number.

[0092] Processing can also be terminated when the number of phrase candidates whose end time differs from the confirmed end time, counted from the top of the integrated-score ranking, reaches a predetermined number.

[0093] Alternatively, processing can be terminated as soon as either the extension-complete-flag criterion or the end-time criterion above is met, whichever comes first.

[0094] (Embodiment 3) This embodiment differs from Embodiment 1 in that the extension process and the acoustic score update process performed by the word string creation unit are carried out in the reverse order. The block diagram of the text input device, the man-machine interface, and everything else are the same.

[0095] FIG. 9 is a flowchart showing the procedure of the phrase candidate creation process of the text input device according to Embodiment 3 of the present invention.

[0096] FIG. 10 shows the flow of processing data when the list of phrase candidates that follow the confirmed phrase 「それを」 500 is created by completing the acoustic score process first and then performing the extension process.

[0097] A detailed description follows with reference to FIGS. 9 and 10.

[0098] First, step S901, which creates the phrase candidate list 1001 of candidates that follow the confirmed phrase 「それを」 500, is the same as step S401 of Embodiment 1. Next, the integrated score, the sum of the language score (the logarithm of the chain probability in candidate list 1001) and the acoustic score, was obtained to create the candidate list 1002 with provisional acoustic scores (S902). Next, candidate list 1002 is searched from the top for candidates whose end time differs from the end time 314 of the confirmed phrase, and the first one found is chosen as the candidate for acoustic score calculation (S903). The acoustic score for this candidate is computed in the same way as in S409 (S904). In list 1002, 「し」 was selected, and computing its acoustic score gave a relatively high value of 0.90 for the speech segment with start time 314 ms and end time 510 ms (S904). This was reflected in the phrase candidate list 1002 (S905), which was re-sorted in descending order of integrated score to give the new phrase candidate list 1003 (S906). Steps S903 to S906 were repeated a preset number of times (S907). In this embodiment they were repeated 100 times to obtain candidate list 1004.

[0099] Next, the extension process was applied to candidate list 1004 using the language model. First, scanning candidate list 1004 from the top, the first candidate whose extension-complete flag is not set to 1 is selected (S908).

[0100] Next, the chain probabilities of the language model are referenced (S909) and, as in step S403, 「た」 and 「て」, which have relatively high chain probabilities after 「し」, and 「（他）」 are obtained (S910).

[0101] As in Embodiment 1, these phrase candidates are added to candidate list 1004, which is re-sorted in descending order of integrated score to give the new phrase candidate list 1005 (S911).

[0102] Steps S908 to S911 were repeated a preset number of times, 100 (S912), to obtain the phrase candidate list 1006. Because only the value for the first morpheme is used as the acoustic score, the result differs from those of Embodiments 1 and 2, but the phrases displayed at the top were similar.

[0103] In this embodiment, the acoustic score update process was terminated after a predetermined number of repetitions. It can also be terminated when the number of phrase candidates whose end time differs from the confirmed end time, counted from the top of the integrated-score ranking, reaches a predetermined number.

[0104] Likewise, the extension process can be terminated when the number of phrase candidates whose extension-complete flag is set to 1, counted from the top of the list, reaches a predetermined number.

[0105] As in FIG. 10, FIG. 11 shows the case in which the extension process is performed after the acoustic score update process has finished. This processing data example differs from FIG. 10, however, in that during the extension process, after the extended candidates are created in step S910, the acoustic score for the appended morpheme is computed and added to the acoustic score from before the extension.

[0106] In the candidate list 1105 of FIG. 11, the end times of 「した」 and 「して」 have been updated to 643 and 640, respectively. Updating the acoustic score together with the extension process in this way is preferable because it yields an accurate acoustic score for each phrase candidate.
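
The variant of FIG. 11 can be sketched as an extension step that also matches the appended morpheme against the speech following the candidate's current end time and adds its score. The acoustic score 0.90 and end time 510 ms for 「し」, and the end times 643 and 640, are taken from the description; the chain probabilities, the per-morpheme acoustic scores, and all names are hypothetical.

```python
def extend_with_acoustics(cand, morpheme, chain_prob, acoustic_match):
    """Extend a candidate and accumulate the appended morpheme's acoustic score."""
    # Match only the appended morpheme, starting where the candidate ended.
    score, end = acoustic_match(morpheme, cand["end"])
    return {
        "text": cand["text"] + morpheme,
        "prob": cand["prob"] * chain_prob,
        "ac": cand["ac"] + score,  # added to the score from before the extension
        "end": end,
    }


shi = {"text": "し", "prob": 0.20, "ac": 0.90, "end": 510}

# Hypothetical per-morpheme matching results keyed by (morpheme, start time).
MORPHEME_MATCH = {("た", 510): (0.85, 643), ("て", 510): (0.40, 640)}


def toy_match(morpheme, start):
    return MORPHEME_MATCH[(morpheme, start)]


shita = extend_with_acoustics(shi, "た", 0.5, toy_match)
shite = extend_with_acoustics(shi, "て", 0.3, toy_match)
```

Because each extension only matches the newly appended morpheme, the speech already covered by the candidate is not matched again.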

[0107] Embodiments 1 to 3 were described using an example in which a phrase candidate list is created, a candidate is confirmed by pressing the confirm button, and only then are the next phrase candidates created. To shorten the time between the user confirming a candidate and the next phrase candidates being displayed, however, the next phrase candidates can also be created while a candidate is on display, using the displayed candidate. Alternatively, if the displayed candidate list does not contain the desired phrase, the user can press the voice button again and utter only the phrase to be recognized, causing the device to recreate the candidates.

【０１０８】[0108]

[Effects of the Invention] As described above, according to the present invention, an utterance input in units of sentences or passages is processed by a search in which the user selects and confirms candidates sequentially from the beginning of the sentence in units of words or phrases. This has the advantageous effect of realizing text input that both allows the device to be made smaller and reduces the burden of voice input.

[Brief description of the drawings]

FIG. 1 is a block diagram of the text input device according to Embodiment 1 of the present invention.

FIG. 2 is a man-machine interface diagram of the text input device according to Embodiment 1 of the present invention.

FIG. 3 is a flowchart showing the operation of the text input device according to Embodiment 1 of the present invention.

FIG. 4 is a flowchart of the procedure of the phrase candidate creation process of the text input device according to Embodiment 1 of the present invention.

FIG. 5 is a diagram showing an example of processing data in the extension process of the text input device according to Embodiment 1 of the present invention.

FIG. 6 is a diagram showing an example of processing data in the acoustic score update process of the text input method according to Embodiment 1 of the present invention.

FIG. 7 is a flowchart of the procedure of the phrase candidate creation process of the text input device according to Embodiment 2 of the present invention.

FIG. 8 is a diagram showing an example of processing data in the phrase candidate creation process of the text input device according to Embodiment 2 of the present invention.

FIG. 9 is a flowchart of the procedure of the phrase candidate creation process of the text input device according to Embodiment 3 of the present invention.

FIG. 10 is a diagram showing an example of processing data in the phrase candidate creation process of the text input device according to Embodiment 3 of the present invention.

FIG. 11 is a diagram showing an example of processing data in a more preferable phrase candidate creation process of the text input device according to Embodiment 3 of the present invention.

FIG. 12 is a flowchart of a conventional text input method.

[Explanation of symbols]

101 input unit
102 speech preprocessing unit
103 word candidate unit
104 language model
105 acoustic model
106 word string creation unit
107 display unit
108 operation unit
109 candidate creation instruction unit
110 memory
201 voice button
202 candidate button
203 display screen
204 confirm button

Continued from the front page: (72) Inventor Natsuki Saito, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka. F-term (reference): 5D015 HH23 KK02 LL08

Claims


1. A step of continuously inputting voices, a step of generating a word string candidate in units of one to several words from the beginning of the input voice, a display step of displaying the candidates, A selection step for the user to select the displayed candidate, and a text that repeats the candidate creation step, the display step, and the selection step sequentially for the next sound based on the selected candidate. input method.

2. The text input method according to claim 1, wherein, in the candidate creation step, a phrase unit candidate is determined by an expansion process of repeating word connection according to a word unit chain probability.

3. The text input method according to claim 2, wherein said candidate creating step further includes a candidate updating process based on an acoustic score.

4. The text input method according to claim 3, wherein the expansion process is terminated when the number of phrase candidates subjected to the expansion process reaches a predetermined number, counted from the top in order of language score.

5. A text input device comprising: an input unit for inputting speech; a speech preprocessing unit for extracting a feature quantity of the speech from the input unit; a word candidate creation unit for creating next word candidates from a confirmed word string using a language model; a word string creation unit for creating word string candidates of one to several words from the extracted feature quantity and the word candidates using at least one of the language model and an acoustic model; a display unit for displaying the word string candidates; an operation unit with which the user selects a displayed word string candidate; and a candidate creation instruction unit for instructing the word candidate creation unit to create next word candidates from the word string selected with the operation unit.

6. The text input device according to claim 5, wherein the word string creation unit creates phrase unit candidates by an expansion process of repeating word connection according to a word unit chain probability.

7. The text input device according to claim 6, wherein the word string creation unit further performs a candidate updating process based on an acoustic score.

8. The text input device according to claim 7, wherein the word string creation unit terminates the expansion process when the number of phrase candidates subjected to the expansion process reaches a predetermined number, counted from the top in order of language score.

9. A mobile phone having the text input device according to claim 5.

10. A program for causing a computer to execute: an input step of continuously inputting speech; a candidate creating step of creating word string candidates in units of one to several words from the beginning of the input speech; a display step of displaying the candidates; and a selection step in which the user selects a displayed candidate, wherein the candidate creating step, the display step, and the selection step are repeated sequentially for the subsequent speech on the basis of the selected candidate.

11. The program according to claim 10, wherein in the candidate creating step, a phrase unit candidate is determined by an expansion process of repeating word connection according to a word unit chain probability.

12. The program according to claim 11, wherein the candidate creating step further includes a candidate updating process based on an acoustic score.

13. The program according to claim 12, wherein the expansion process is terminated when the number of phrase candidates subjected to the expansion process reaches a predetermined number, counted from the top in order of language score.

14. A computer-readable storage medium storing a program for causing a computer to execute: an input step of continuously inputting speech; a candidate creating step of creating word string candidates in units of one to several words from the beginning of the input speech; a display step of displaying the candidates; and a selection step in which the user selects a displayed candidate, wherein the candidate creating step, the display step, and the selection step are repeated sequentially for the subsequent speech on the basis of the selected candidate.

15. The computer-readable storage medium storing a program according to claim 14, wherein, in the candidate creating step, a phrase unit candidate is determined by an expansion process of repeating word connection according to a word unit chain probability.

16. The computer-readable storage medium storing a program according to claim 15, wherein the candidate creating step further includes a candidate updating process based on an acoustic score.

17. The computer-readable storage medium storing a program according to claim 16, wherein the expansion process is terminated when the number of phrase candidates subjected to the expansion process reaches a predetermined number, counted from the top in order of language score.
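
The method of claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the implementation described in this publication, and everything in it is an assumption made for illustration: the toy bigram table stands in for the language model, a phrase is treated as complete when its last word belongs to a hypothetical `phrase_final` set, the `acoustic` callable stands in for scoring against the speech that follows the confirmed text, and `choose` stands in for the user's selection on the display.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

Bigram = Dict[str, Dict[str, float]]  # hypothetical: bigram[prev][next] = P(next | prev)
Scored = Tuple[float, List[str]]      # (log language score, words of the phrase)


def expand_phrases(bigram: Bigram, prev: str, phrase_final: Set[str],
                   limit: int = 3, max_words: int = 4) -> List[Scored]:
    """Expansion process (claims 2 and 4): connect words by chain probability,
    best-first, and stop once `limit` complete phrases have been collected."""
    heap = [(-math.log(p), [w]) for w, p in bigram.get(prev, {}).items()]
    heapq.heapify(heap)
    done: List[Scored] = []
    while heap and len(done) < limit:
        cost, words = heapq.heappop(heap)   # lowest cost = highest language score
        if words[-1] in phrase_final:       # assumed phrase-boundary test
            done.append((-cost, words))
        elif len(words) < max_words:
            for w, p in bigram.get(words[-1], {}).items():
                heapq.heappush(heap, (cost - math.log(p), words + [w]))
    return done


def rescore(cands: List[Scored], acoustic: Callable[[List[str]], float]) -> List[Scored]:
    """Candidate update (claim 3): reorder by language score plus acoustic score."""
    return sorted(cands, key=lambda c: c[0] + acoustic(c[1]), reverse=True)


def input_text(bigram: Bigram, phrase_final: Set[str],
               acoustic: Callable[[List[str]], float],
               choose: Callable[[List[List[str]]], Optional[List[str]]],
               max_steps: int = 10) -> List[str]:
    """Sequential loop (claim 1): create candidates, display, let the user select, repeat."""
    confirmed: List[str] = []
    prev = "<s>"                            # assumed start-of-utterance marker
    for _ in range(max_steps):
        cands = rescore(expand_phrases(bigram, prev, phrase_final), acoustic)
        picked = choose([words for _, words in cands]) if cands else None
        if picked is None:
            break
        confirmed.extend(picked)
        prev = picked[-1]                   # the next search starts from the confirmed text
    return confirmed


# Toy data, romanized and entirely made up for the example.
toy_bigram: Bigram = {
    "<s>": {"kyou": 0.6, "kyouto": 0.4},
    "kyou": {"wa": 0.7, "no": 0.3},
    "kyouto": {"ni": 0.5, "wa": 0.5},
    "wa": {"hare": 1.0},
    "hare": {"desu": 1.0},
}
toy_final = {"wa", "no", "ni", "desu"}
first_candidates = expand_phrases(toy_bigram, "<s>", toy_final, limit=2)
text = input_text(toy_bigram, toy_final, lambda ws: 0.0, lambda opts: opts[0])
```

Because each step searches only from the last confirmed word, the frontier stays small; this is the search-space reduction that the abstract attributes to user-led sequential selection.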