JP2002073081A

JP2002073081A - Voice recognition method and electronic equipment

Info

Publication number: JP2002073081A
Application number: JP2000256653A
Authority: JP
Inventors: Toshihisa Tsukada (塚田 俊久); Yoshiaki Kitatsume (北爪 吉明); Makoto Tanaka (田中 誠); Hideki Uchidate (内館 秀樹)
Original assignee: Hitachi ULSI Systems Co Ltd
Current assignee: Hitachi Solutions Technology Ltd
Priority date: 2000-08-28
Filing date: 2000-08-28
Publication date: 2002-03-12

Abstract

PROBLEM TO BE SOLVED: To provide an English speech recognition method that greatly improves the recognition rate of English words with a simple configuration, and to provide electronic equipment whose usability is improved by speech recognition technology. SOLUTION: In the English speech recognition method, a word is input by voice as the sequence of alphabet letters that spell it, those letters are spoken using a combination of letter names drawn from a plurality of languages, and speech recognition is performed letter by letter. The speech recognition method is installed in the electronic equipment.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition method and an electronic device, and more particularly to a technique that is effective when used in electronic devices that accept voice input, such as mobile phones.

[0002]

[Prior Art] Speech recognition technology lets a computer directly recognize the words and sentences a person speaks, and it has attracted attention as a simple input means that can replace the keyboard and pen input currently in use. An example of the literature on speech recognition is Y. Obuchi, A. Koizumi, Y. Kitahara, J. Matsuda, and T. Tsukada, Proc. EUROSPEECH '99, pp. 2023-2026, 1999. Examples of dictation software are ViaVoice (IBM) and NaturallySpeaking (Dragon), and an example of speech recognition software is ASR1600 (L&H). IBM's ViaVoice and Dragon's NaturallySpeaking are, so to speak, heavily equipped dictation packages aimed mainly at Wintel PCs.

[0003]

[Problem to Be Solved by the Invention] Because digital electronic devices such as mobile phones are made small, it is difficult to fit them with a keyboard that has as many keys as a personal computer's. Even if many keys could be mounted, each key would be small and the keys would be densely packed, which would make them awkward to use and therefore impractical. Using the speech recognition technology described above as the input means is conceivable. However, the dictation software described above raises its recognition rate by drawing on an enormous amount of speech data. By exploiting context and the like it achieves quite high performance for sentence input, but it requires a large-capacity memory and a high-performance CPU. Installing such dictation software in a digital electronic device that must be small and consume little power, such as a mobile phone, is therefore also impractical.

[0004] An object of the present invention is to provide a speech recognition method that greatly improves the recognition rate of English words with a simple configuration. Another object of the present invention is to provide an electronic device whose usability is improved by speech recognition technology. The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

[0005]

[Means for Solving the Problem] A representative aspect of the invention disclosed in this application is outlined briefly as follows. In a speech recognition method, a specific language is input by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and speech recognition is performed on the combination, letter by letter.

[0006] Another representative aspect of the invention disclosed in this application is outlined briefly as follows. In an electronic device in which an input unit captures and digitizes a speech signal, and a signal analysis unit extracts features from the input speech signal and matches them against an acoustic model prepared in advance to identify letters, input in a specific language is made by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and the speech recognition unit performs recognition on the combination, letter by letter.

[0007]

[Embodiments of the Invention] FIG. 1 is a block diagram of one embodiment, used to explain the speech recognition method according to the present invention. The English speech recognition method of this embodiment follows the signal processing performed in each block of FIG. 1. The uttered sound is first converted into a digital signal in the speech input unit. One feature of this embodiment is that, to achieve a high recognition rate with a simple configuration, speech is input not as words of a specific language but letter by letter, as the alphabet letters that spell those words.

[0008] Here, according to a Japanese dictionary, a "letter" (jibo) is each individual written character that indicates pronunciation, such as kana, the letters of the alphabet, or Sanskrit characters (bonji), and it corresponds to "phoneme" in English. A phoneme is the smallest phonetic unit of a language.

[0009] Spelled speech input in units of letters, as described above, is carried out by inserting a fixed silent period after each letter, by inserting a key input signal that marks the boundary between letters, or by similar means. From the standpoint of ease of use, inserting a silent period after each letter is considered beneficial. If the letters are to be separated more reliably, a key signal is beneficial.
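The silence-based separation of letters described in this paragraph can be illustrated with a minimal Python sketch. It assumes the signal has already been reduced to one energy value per frame. The function name `split_on_silence`, the energy threshold, and the minimum pause length are hypothetical and are not taken from the specification.

```python
def split_on_silence(frame_energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame ranges, end exclusive, one per spoken letter.

    A letter ends once `min_silence_frames` consecutive frames fall below
    `threshold`; shorter dips are treated as part of the same letter.
    """
    segments = []
    start = None       # first voiced frame of the current letter
    silent_run = 0     # consecutive silent frames seen inside a letter
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the letter at its last voiced frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended before a full pause was observed.
        segments.append((start, len(frame_energies) - silent_run))
    return segments
```

A key-signal delimiter, the alternative the paragraph mentions, would replace the pause test with an explicit boundary event and would not depend on a threshold at all.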

[0010] The speech signal input letter by letter from the speech input unit is sent to the speech analysis unit, where feature extraction is performed. Specifically, this is short-time frequency analysis. The analysis result is matched in the matching unit against an acoustic model prepared in advance, and a decision is made. The candidate with the highest score is displayed as the recognition result.
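The analyze, match, and pick-the-highest-score flow of this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the features are averaged short-time magnitude spectra computed with a naive discrete Fourier transform, and the acoustic model is replaced by fixed per-label spectral templates scored by negative squared distance, which is a deliberate simplification and not the HMM-based model the specification refers to. All function names, frame sizes, and templates are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the discrete Fourier transform for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def short_time_features(samples, frame_len=16, hop=8):
    """Average the magnitude spectra of overlapping frames of one letter."""
    starts = range(0, len(samples) - frame_len + 1, hop)
    spectra = [magnitude_spectrum(samples[s:s + frame_len]) for s in starts]
    if not spectra:
        raise ValueError("segment is shorter than one frame")
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def best_match(features, templates):
    """Score every template and return (highest-scoring label, all scores)."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(features, template))
        for label, template in templates.items()
    }
    return max(scores, key=scores.get), scores
```

In use, each letter segment would be passed through `short_time_features` and the result handed to `best_match` together with one template per letter name.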

[0011] The acoustic model used in the matching unit is generally an HMM linked with a word dictionary and a context dictionary. Here, HMM stands for hidden Markov model, the reference model on which recognition is based. The acoustic model is the combination of this HMM with a word dictionary and a context dictionary that incorporates context data and the like.

[0012] The HMM at the core of the speech recognition described above is relatively lightweight software. However, to recognize speech in a specific language for dictation, as in the conventional approach, a word dictionary and a context dictionary for that language are essential, and depending on the application they tend to become heavy. Dictation software that must recognize long passages read aloud in real time needs not only a word dictionary but also large amounts of data such as context and example sentences, and it therefore requires a high-performance CPU that processes data at high speed and a large-capacity memory.

[0013] A portable electronic device must reduce the number of parts and hold down power consumption so that it can operate for a long time. Memory capacity is kept as small as possible, and CPU power is also limited. In other words, the amount of data, starting with the dictionaries, should be as small as possible. In the English speech recognition method according to the present invention, English words are therefore input letter by letter and recognized letter by letter, so that the word dictionary and the context dictionary, drawn with dotted lines in FIG. 1, can basically be greatly simplified.

[0014] In this embodiment, speech input by spelling is adopted to raise the efficiency of English voice input to the limit. For example, "butter" is uttered as "bi: ju: ti: ti: i: a:r". Because this carries more information than simply uttering "b*t*r" (batā), the recognition rate is markedly improved. Only the 26 letters of the alphabet are uttered, so compared with heavily equipped software that draws on the phonetic symbols of every word and their associated speech data, far lighter software can input any word and recognize it accurately.

[0015] If words can be input accurately, sentences can be input in the same way. This can be regarded as, so to speak, a voice version of a word processor. It is exactly what an English word processor does: to enter "butter", the user strikes the keys "b", "u", "t", "t", "e", and "r" on the keyboard.

[0016] Another feature of the present invention is that the Greek alphabet is used in combination with the above. In the earlier "butter" example, in place of "b", "u", "t", "e", "r", and so on, β (beita), υ (ju:psilan), τ (tau), ε (eps*lan), ρ (rou), and so on are used alongside the English letter names. Suppose that uttering "butter" as "bi: ju: ti: ti: i: a:r" produces the output "gutter"; that is, "b" and "g" were confused. If the user then says "β", the output is automatically corrected to "butter". When such confusion is anticipated, it can be avoided from the start by uttering "beita ju: ti: ti: i: a:r". Besides β (b), the letters δ (d), γ (g), π (p), τ (t), and others are used. In addition, α (a), κ (k), ν (n), μ (m), ω (o), ρ (r), σ (s), and others can be used as appropriate.
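The mixed English and Greek spelling described here amounts to a many-to-one table from spoken letter names to letters. The Python sketch below illustrates the idea under several assumptions: the tokens are already recognized, the keys reuse the ad hoc transcriptions from the text where it gives them ("bi:", "beita", "rou") and plain Greek letter names otherwise, and the position to correct is supplied by the caller, since the specification does not say how it is determined. The names `LETTER_NAMES`, `decode`, and `correct` are hypothetical.

```python
# Spoken letter names mapped to the letter they stand for.
# Only a handful of entries are shown; both tables are illustrative.
ENGLISH_NAMES = {"bi:": "b", "ju:": "u", "ti:": "t", "i:": "e", "a:r": "r"}
GREEK_NAMES = {"beita": "b", "delta": "d", "gamma": "g", "pi": "p", "tau": "t", "rou": "r"}
LETTER_NAMES = {**ENGLISH_NAMES, **GREEK_NAMES}


def decode(tokens):
    """Turn a sequence of recognized letter names into a spelled word."""
    return "".join(LETTER_NAMES[token] for token in tokens)


def correct(word, position, token):
    """Replace the letter at `position` with the one named by `token`."""
    return word[:position] + LETTER_NAMES[token] + word[position + 1:]
```

With these tables, `decode("beita ju: ti: ti: i: a:r".split())` spells "butter", and `correct("gutter", 0, "beita")` turns a misrecognized "gutter" into "butter".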

[0017] Adopting the recognition method described above raises the recognition rate for words in English and other languages to the limit. With spelled input, the syllables to be recognized are basically limited to the letter names of the alphabet, so the recognition rate rises. Excluding letter combinations that do not form a word also helps raise the recognition rate. For example, when "butter" is uttered as "bi: ju: ti: ti: i: a:r", the initial "b" is not mistaken for "d", with which it is ordinarily easily confused, because there is no word "dutter".
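The exclusion of letter combinations that do not form a word can be sketched as a lexicon filter over per-letter candidates. The Python example below is a minimal illustration, not the specification's procedure: it assumes the recognizer returns a few scored candidates for each letter position, enumerates their combinations exhaustively, and keeps the best-scoring one that appears in a word list. The function name `best_word`, the scores, and the small lexicon are hypothetical.

```python
from itertools import product


def best_word(candidates, lexicon=None):
    """Pick the highest-scoring spelling, optionally restricted to a lexicon.

    `candidates` holds, for each letter position, a list of
    (letter, score) pairs. Returns None if no combination is allowed.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        word = "".join(letter for letter, _ in combo)
        if lexicon is not None and word not in lexicon:
            continue  # not a real word, so it is excluded
        score = sum(s for _, s in combo)
        if score > best_score:
            best, best_score = word, score
    return best
```

If the first position scores "d" slightly above "b", the unfiltered result is "dutter", while a lexicon containing "butter" but not "dutter" yields "butter". Exhaustive enumeration is acceptable only for short words with few candidates per position; a practical system would need a more economical search.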

[0018] Spelled input is effective on its own, but the Greek alphabet is used together with it to make it more effective still. This helps widen the differences between letter names. Taking English as an example, the so-called "e problem" can be avoided. That is, the confusability of "b", "d", "e", "g", "p", "t", and so on can be avoided by also using "β", "δ", "ε", "γ", "π", "τ", and so on. This is because letter names that are far apart from one another, that is, that have little similarity, can be chosen from the English and Greek alphabets. The Greek alphabet is not as well known as the English one, but it is still relatively familiar and easy to remember.

[0019] The present invention is not limited to English. It is universal in that it can be applied to Western languages such as French, German, and Russian. For example, using the French and Greek alphabets in combination raises the word recognition rate for French to the utmost.

[0020] When software or the like implementing the speech recognition method according to the present invention is installed in a portable electronic device, the improved word recognition rate naturally makes voice input much more efficient, but an even greater benefit is the reassurance that input will always succeed. This matters a great deal to the person using the product: a product will not be used if correct input cannot be achieved no matter how many times a word is spoken. This holds whatever product the present invention is applied to.

[0021] In the speech recognition method according to the present invention, several commands may be added to the spelled input described above, for example "Capital letter", "hyphen", "comma", "period", "colon", "space", and "new paragraph" (line break). Providing such commands makes sentence input easy as well, and the effect is very large. Furthermore, conventional means required the data for the words to be input to be registered in advance, and their recognition rate fell markedly as the number of words grew to 1,000 and then 10,000. With the present invention, the recognition rate does not change as the number of words increases, and input always succeeds.
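How such commands might combine with spelled letters can be illustrated with a short Python sketch. It assumes the recognizer has already produced a token stream in which each token is either a single letter or one of the command phrases listed above. The function name `apply_commands` and the rule that "Capital letter" affects only the next letter are assumptions, since the specification does not define the command semantics in detail.

```python
# Commands that insert a fixed character into the text.
PUNCTUATION = {
    "hyphen": "-",
    "comma": ",",
    "period": ".",
    "colon": ":",
    "space": " ",
    "new paragraph": "\n",
}


def apply_commands(tokens):
    """Build text from recognized tokens: single letters or spoken commands."""
    out = []
    capitalize_next = False
    for token in tokens:
        key = token.lower()
        if key == "capital letter":
            capitalize_next = True          # applies to the next letter only
        elif key in PUNCTUATION:
            out.append(PUNCTUATION[key])
        else:
            out.append(token.upper() if capitalize_next else token)
            capitalize_next = False
    return "".join(out)
```

For example, the tokens "Capital letter", "t", "h", "a", "n", "k", "space", "y", "o", "u", "period" produce the text "Thank you.".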

[0022] Because the speech recognition method of this embodiment basically discriminates among only a small number of sounds, such as the letter names of the English and Greek alphabets, the amount of data in the acoustic model can be kept to a minimum, and by combining the recognized letters, words and sentences of every kind can ultimately be input. Since the amount of data used in recognition is small, the central processing unit (CPU) that processes it can be a low-power RISC (reduced instruction set computer) type, and the memory capacity can also be small. As a result, the speech recognition method according to the present invention is an input method ideally suited to portable electronic devices. Even when specific control signals and operation commands are given by voice to improve usability, small word and context dictionaries suffice.

[0023] FIG. 2 is an external view of one embodiment in which the present invention is applied to a portable interpreter. The portable interpreter of this embodiment is intended for English-to-Japanese interpretation. It is used as follows. When the user presses the utterance button 1 and speaks into the microphone 2, the recognition result is shown on the display device 3. If the result is correct, the user presses the OK button 4 to move on to the example-sentence search. The user searches with the scroll button 5 and, on finding the desired sentence, presses the OK button 4 to display its translation. Pressing the utterance button 1 again plays the translation as speech through the speaker 6.

[0024] An example of applying the speech recognition method according to the present invention to this portable interpreter is given using "Thank". The user presses the utterance button 1, brings the mouth close to the microphone 2, and utters "ti: eit* ei en kei" (T, H, A, N, K). The recognition result "Thank" appears on the display device 3. During recognition a spell check is performed automatically, and words not in the dictionary are excluded, which helps raise the recognition rate. After checking the recognition result on the display device 3, the user presses the OK button 4 and example sentences are displayed.

[0025] When "Thank you for your help." is selected and the OK button 4 is pressed, the translation 「ありがとう。助かりました。」 ("Thank you. That was a great help.") is displayed. Pressing the utterance button 1 then plays the speech through the speaker 6. The above describes selection by example-sentence search, but a short sentence can also be input directly, as in "ti: eit* ei en kei, space, wai ou ju:". The display device 3 shows "Thank you", and the speech output is 「ありがとう」 ("Thank you"). Note that in "eit*", the pronunciation of the letter h, a phonetic symbol has been replaced with "*". The same applies to the phonetic transcriptions of the Greek letters given earlier, and to those that follow.

[0026] Input is basically possible by spelling, but it can be difficult in some cases because of ambient noise and the like. In such cases it is effective to use the Greek alphabet as well. Taking the earlier "thank" as an example, θ (theta), α (alpha), ν (nu), and κ (kappa) are used in place of "th", "a", "n", and "k". The Greek letters are also effective for corrections. In the same example, if the result is displayed as "think", the correct result "thank" is easily obtained by re-entering the "a" as α.

[0027] Not every letter of the English alphabet has a corresponding Greek letter, so the two alphabets are basically used together. Combining the English and Greek alphabets not only avoids the "e problem" but also increases the distance between letter names, and as a result the recognition rate is raised to the utmost. Here, the "e problem" is the problem that the similarity of the utterances of b, d, e, g, p, t, and so on lowers the recognition rate.

[0028] The portable interpreter of FIG. 2 can also be used, as it is, as a portable word processor. It is used as follows. First the user utters the command "Word Processor", followed by "File" and then "New". A new input screen is displayed. The user utters "Title of the invention". Suppose the result is displayed as "Idol of a convention". The user moves the pointer to the "I" of "Idol" with the scroll button and utters, for example, "ti: ai ti: el i:" (T, I, T, L, E) or "tau iota tau lambda epsilon".

[0029] Alternatively, a mixed utterance that combines English and Greek letter names may be used, such as "tau ai tau el i". "Title of a convention" is then displayed. Next the user moves the pointer to "a" and utters "θ eit* i:" (theta, H, E) or the like, and "a" is replaced by "the". Continuing in the same way yields the correct input "Title of the invention". The user then moves on to the body text and completes it by entering sentences and correcting them as needed. Spelled input may be used partway through or from the start. It is particularly convenient for entering technical terms, personal names, and place names that are not in the dictionary. When the whole text is complete, the user utters "Store" to save the entered text and finish the work.

[0030] Because such a portable word processor combines read-aloud sentence input with spelled input, input efficiency is improved. In addition, if spelled input is adopted whenever appropriate, the user can work without worrying about people nearby. Frequent use of the Greek alphabet further enhances this effect.

[0031] The portable interpreter of FIG. 2 can be replaced, as it is, with a palmtop personal computer that has a mobile phone function. As another way of using the invention, an example in which it is applied to entering the text of an e-mail is described below.

[0032] First the user utters "e-mail" as the e-mail command. The screen switches to new-message composition. The user selects an address from the address book shown on the display screen and begins entering the body. The user utters "How are you" and "Question mark", and "How are you?" is displayed. If a correction is needed, it is made at any time with the Greco-Roman spelled input according to the present invention. The rest of the message is entered in the same way. Finally, uttering the command "Send mail" sends the message.

[0033] The palmtop personal computer described above may instead be a mobile phone, a network-connected notebook personal computer, or the like. As a secondary effect of the present invention, switching the input to the spelling method as appropriate lets the user work without concern even when someone nearby is listening.

[0034] The operational effects obtained from the above embodiments are as follows. (1) In the speech recognition method, a specific language is input by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and speech recognition is performed on the combination, letter by letter. This provides the effect that the recognition rate is greatly improved with a simple configuration.

[0035] (2) In addition to the above, in the letter input in the plurality of languages, part of the English alphabet is replaced with Greek letters. This makes highly similar letters easy to distinguish and provides the effect of greatly improving the recognition rate achieved in a single recognition pass.

[0036] (3) In addition to the above, the speech signal is converted into a digital signal in the input unit, feature extraction is performed in the speech analysis unit, and the result is matched in the matching unit against an acoustic model, prepared in advance, that includes a hidden Markov model. This provides the effect that recognition becomes possible with simple signal processing, inexpensive memory and CPU can be used, and recognition can be performed by simple software.

[0037] (4) An electronic device comprises a speech input unit that captures and digitizes a speech signal, and a speech signal processing unit that extracts features from the digitized speech signal and matches them against an acoustic model prepared in advance to identify letters. Input in a specific language is made by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and the speech recognition unit performs recognition on the combination, letter by letter. This provides the effect that an electronic device is obtained in which key operation is simple and characters and words can be input easily.

[0038] (5) In addition to the above, in the letter input in the plurality of languages, part of the English alphabet is replaced with Greek letters when uttered. This provides the effect of greatly improving the recognition rate with simple signal processing.

[0039] The invention made by the present inventors has been described concretely on the basis of the embodiments, but the present invention is not limited to those embodiments and can of course be modified in various ways without departing from its gist. For example, as noted above, it is universal in that it can be applied to Western languages such as English, French, German, and Russian. That is, using the French and Greek alphabets in combination, for example, raises the word recognition rate for French to the utmost. Besides the embodiments described above, an electronic device equipped with a speech recognition function that uses the speech recognition method according to the present invention can equally be a car navigation system or the like.

[0040]

[Effects of the Invention] The effects obtained by representative aspects of the invention disclosed in this application are described briefly as follows. In the speech recognition method, a specific language is input by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and speech recognition is performed on the combination, letter by letter. The recognition rate can thereby be greatly improved with a simple configuration.

[0041] An electronic device comprises a speech input unit that captures and digitizes a speech signal, and a speech signal processing unit that extracts features from the digitized speech signal and matches them against an acoustic model prepared in advance to identify letters. Input in a specific language is made by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and the speech recognition unit performs recognition on the combination, letter by letter. An electronic device is thereby obtained in which key operation is simple and characters and words can be input easily.

[Brief description of the drawings]

FIG. 1 is a block diagram of one embodiment, used to explain the speech recognition method according to the present invention.

FIG. 2 is an external view of one embodiment in which the present invention is applied to a portable interpreter.

[Explanation of symbols]

1: utterance button, 2: display device, 3: display screen, 4: OK button, 5: scroll button, 6: speaker.

Continued from the front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): H04M 1/00, G10L 3/00 537H, 1/725, 551A, 551B
(72) Inventor: Makoto Tanaka, c/o Hitachi ULSI Systems Co., Ltd., 5-22-1 Josuihon-cho, Kodaira-shi, Tokyo
(72) Inventor: Hideki Uchidate, c/o Hitachi ULSI Systems Co., Ltd., 5-22-1 Josuihon-cho, Kodaira-shi, Tokyo
F-terms (reference): 5D015 AA05 BB02 HH23 KK02; 5K027 AA11 BB01 HH20

Claims

[Claims]

1. A speech recognition method characterized in that a specific language is input by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and speech recognition is performed on the combination, letter by letter.

2. The speech recognition method according to claim 1, wherein, in the letter input in the plurality of languages, part of the English alphabet is replaced with Greek letters when uttered.

3. The speech recognition method according to claim 1 or 2, wherein the speech signal is converted into a digital signal in an input unit, feature extraction is performed in a speech analysis unit, and letters are identified in a matching unit by matching against an acoustic model, prepared in advance, that includes a hidden Markov model.

4. An electronic device comprising a speech input unit that inputs a speech signal, and a speech signal processing unit that digitizes the input speech signal, extracts its features, and matches them against an acoustic model prepared in advance to identify letters, characterized in that input in a specific language is made by voice after being replaced with the alphabet letters that spell it, those letters are input after being replaced with a combination of the spoken letter names of a plurality of languages, and speech recognition is performed on the combination, letter by letter.

5. The electronic device according to claim 4, wherein, in the letter input in the plurality of languages, part of the English alphabet is replaced with Greek letters when uttered.