JP2000322088A

JP2000322088A - Voice recognition microphone, voice recognition system, and voice recognition method

Info

Publication number: JP2000322088A
Application number: JP11133659A
Authority: JP
Inventors: Shinji Wakizaka; 新路脇坂; Kazuo Kondo; 和夫近藤; Hiroaki Kokubo; 浩明小窪; Nobuo Hataoka; 信夫畑岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-05-14
Filing date: 1999-05-14
Publication date: 2000-11-24

Abstract

(57) [Abstract]

[Problem] To make voice recognition an easy-to-use interface in voice recognition systems used in car navigation systems, small information devices, games, and the like.

[Solution] A voice recognition microphone 1 that has a function of executing voice recognition processing is connected to a system main body 5 by a communication means 4, and the recognition result obtained by a voice recognition unit 13 is transferred to the system main body 5 to operate the system. The voice recognition microphone 1 has the voice recognition unit 13, which includes an acoustic model 135 and a voice recognition processing unit 134; a command dictionary 131, a recognition target dictionary 152, and a user registration dictionary 153; and a data communication unit 17. Necessary words from the recognition target dictionary 152 transferred from the system main body 5 are registered in the user registration dictionary 154, and voice recognition is normally performed using the command dictionary 151 and the user registration dictionary 154.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice recognition system and method for use in car navigation systems, in-vehicle PCs, car electronics, small information devices typified by PDAs and handheld PCs, portable voice translators, game devices, and home appliances. In particular, it relates to a voice recognition system and method that are easy to use in terms of recognition response time and improved recognition rate in the car multimedia field typified by car navigation systems, in-vehicle PCs, and car electronics.

[0002]

[Prior Art] In recent years, small information systems using voice recognition technology have been coming into widespread use. Examples include car navigation systems, small information devices typified by PDAs, and portable translators. As an example of such a voice recognition system, Japanese Patent Application Laid-Open No. 5-35776, titled "Translation device with automatic language selection function," discloses a technique for a portable translation device that recognizes an operator's voice input from a microphone, translates it, and outputs speech in the translated language.

[0003] An outline of a speech translation device according to this prior art is described below with reference to FIG. 6. FIG. 6 is a block diagram showing the configuration of a speech translation device according to the prior art. The speech translation device 7, which is equipped with voice recognition means, comprises a microphone 71, a voice recognition unit 72, a translation unit 73, a control unit 74, and a bus 75 that transfers data among the units 72, 73, and 74.

[0004] The voice recognition unit 72 comprises a speech segment extraction unit 721, a voice recognition processing unit 722, an acoustic model storage unit 723, and a voice recognition dictionary unit 724.

[0005] The translation unit 73 has a translated-word data memory card 731, a speech synthesis unit 732, and a display unit 735. A speaker amplifier 733 and a speaker 734 are connected to the speech synthesis unit 732.

[0006] The microphone 71 converts the user's voice and other sound into an electrical signal and inputs it.

[0007] The speech segment extraction unit 721 converts the signal input from the microphone 71, which contains both speech and noise, into a digital signal, extracts the speech segment, and sends the signal of that segment to the voice recognition processing unit 722.
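As a rough illustration of what the speech segment extraction in unit 721 involves, the following Python sketch marks a speech segment by thresholding short-time energy. The source does not specify the detection method; the energy criterion, the frame length, the threshold, and the function name `find_speech_segment` are all assumptions made for illustration.

```python
def find_speech_segment(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices spanning the frames whose mean
    energy exceeds `threshold`, or None if no frame does.

    Hypothetical helper: the source does not say how unit 721 detects speech.
    """
    active = []
    # Walk over non-overlapping frames of frame_len samples.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            active.append(i)
    if not active:
        return None
    # From the first active frame to the end of the last active frame.
    return active[0], active[-1] + frame_len
```

Only the samples between the returned indices would then be passed on for recognition. A practical detector would typically add smoothing and a hangover period, which this sketch omits.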

[0008] On instruction from the control unit 74, which receives an operation signal 79 from a keyboard, switch, or the like, the voice recognition processing unit 722 analyzes the speech extracted via the microphone 71 and the speech segment extraction unit 721, using the acoustic model stored in the acoustic model storage unit 723. The voice recognition processing unit 722 then performs voice recognition by comparing the analysis result with the standard speech patterns stored in the voice recognition dictionary unit 724.

[0009] The acoustic model storage unit 723 stores the acoustic models, for extracted speech segments, that are used in voice recognition.

[0010] The voice recognition dictionary unit 724 consists of RAM or the like and stores standard speech patterns corresponding to the operator's utterances. The operator stores these standard speech patterns in advance.

[0011] The translated-word data memory card 731 of the translation unit 73, on the other hand, consists of a ROM card or the like and stores the translated words corresponding to recognized words; when translated words are to be output as synthesized speech, it also stores the audio data. Character codes corresponding to a translated word are read from the memory card 731 and shown on the display unit 735. By replacing the memory card 731 with one for another language, recognized words can be translated into multiple languages.

[0012] The speech synthesis unit 732 reads from the translated-word data memory card 731 the translated word corresponding to the speech recognized by the voice recognition processing unit 722, converts it into an audio signal, and outputs it through the speaker amplifier 733 and the speaker 734.

[0013] The display unit 735 shows instructions to the user of the translation device, displays translated words as text, and so on.

[0014] The control unit 74 consists of a microprocessor or the like and controls each unit of the speech translation device 7.

[0015] Against the background of advances in semiconductor technology, the fields of voice recognition and speech synthesis are expected to develop further, driven by the demand that systems provide a more human-like user interface. Small information systems using the conventional voice recognition technology described above are also expected to spread further, starting with car navigation systems and extending to portable information devices typified by PDAs, portable translators, and information appliances with voice interfaces.

[0016] The challenges in putting such voice recognition technology to practical use are improving the recognition rate and shortening the recognition response time. With conventional technology, the number of words to be recognized must be restricted so that recognition rate and response time do not degrade. Within that restriction, for each word or sentence registered in advance, the statistical features of speakers' voices associated with its character string are compared with the features of the speech the speaker actually uttered, and the probabilistically closest candidate is taken as the recognition result. This approach is indispensable for securing a recognition rate above a certain level, particularly in noisy environments.

[0017] In the future, technical innovation in voice recognition and improvements in the software and hardware that implement it may improve recognition rate and response time even without restricting the number of words to be recognized. From a practical standpoint, however, the smaller the processing load, the better, both for the standalone performance of voice recognition (recognition rate and response time) and for the performance and usability of the overall system that incorporates it. Furthermore, the challenge in making voice recognition an easy-to-use interface is to make it simply one more user interface, one that happens to use voice.

[0018] To that end, voice recognition processing should be carried out not in the system main body but on the interface side, such as in the microphone. A voice recognition interface such as a microphone can then be connected easily to an existing system main body. In addition, because the result of voice recognition processing is what is transferred to the system main body, the influence of environmental noise can be reduced compared with the conventional approach of transferring an analog audio signal to the system main body and recognizing it there. The recognition performance of the overall system can therefore be improved. Achieving this requires separating the system main body from the interface unit, such as a microphone, that executes voice recognition, so as to provide an easy-to-use system.

[0019] In voice recognition systems in conventional car navigation systems, dictionaries of several hundred thousand words subject to voice recognition, such as place names, intersection names, building names, station names, and telephone numbers, are prepared, divided hierarchically, and searched by hierarchical voice recognition. The user speaks several times, and recognition is repeated, until the target word is reached. Once the target word is recognized, a route search to the destination, for example, is performed.

[0020] In such a system the number of words in the recognition target dictionaries is enormous, so the dictionaries are layered and voice recognition is executed hierarchically to keep recognition rate and response time from degrading. As a result, voice recognition must be run several times to recognize a single target word. A voice recognition interface that ought to be convenient thus becomes inconvenient, and the usability of the whole system suffers.

[0021] Conversely, a voice recognition system that does not layer the dictionary, and in which the user speaks the target word directly against a dictionary of several hundred thousand words, cannot deliver adequate recognition rate or response time, particularly in low-cost systems, even if future innovation in voice recognition technology improves the recognition rate.

[0022] Moreover, even when the dictionary is layered and voice recognition is run several times so that basic recognition performance is obtained, the recognition rate does not reach 100%. This is the same as a person mishearing phonetically similar words.

[0023] As an example, a voice recognition system applied to a car navigation system is described with reference to FIG. 7. FIG. 7 illustrates the flow of voice recognition processing in a conventional voice recognition system in which the processing is performed in the car navigation system main body. In the conventional system, the entire series of voice recognition processes is performed by the car navigation system main body, so it competes with other, higher-priority processing (a problem not limited to car navigation systems) and the CPU load increases. In a system that originally has no voice interface, hardware modification is required in addition to the added CPU load.

[0024] The following describes voice recognition processing when a first utterance, "hotel," is followed by a second utterance, "△△△ Hotel." This system has multiple dictionaries, one per category. When the first utterance "hotel" is input (S1), voice recognition process P1 performs the series of voice recognition steps, outputs the recognition result "hotel," and sends it to application P2 (S2). On receiving the result "hotel," application P2 starts dictionary selection process P3 (S3), which selects the hotel word dictionary 531 in the large-scale recognition target dictionary 53 (S4).

[0025] The hotel word dictionary 531 consists of hotel names and is assumed to contain 5,000 words. Besides the hotel word dictionary, the large-scale recognition target dictionary 53 also holds a park word dictionary 531, a ski resort word dictionary 532, and others, assumed to contain 10,000 and 4,000 words respectively. The selected hotel word dictionary 531 is used as the recognition target dictionary 534 in subsequent voice recognition process P1.

[0026] When the second and subsequent utterance "△△△ Hotel" is input (S5), voice recognition process P1 performs voice recognition and outputs the recognition result "△△△ Hotel" to application P2 (S6). Application P2 passes this data on to processing such as destination setting and route search (S7).
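The S1 to S7 exchange of the conventional system can be sketched in Python as a two-step lookup. This is a minimal illustration only: the dictionary contents, the exact-match `recognize` placeholder, and the function names are invented here, and real recognition scores acoustic features rather than comparing strings.

```python
# Toy stand-in for the large-scale recognition target dictionary 53.
# Entries are invented; the source assumes thousands of words per category.
LARGE_DICTIONARY = {
    "hotel": ["alpha hotel", "beta hotel"],
    "park": ["central park"],
    "ski resort": ["north ski resort"],
}

def recognize(utterance, vocabulary):
    """Placeholder for recognition process P1: exact match against the active vocabulary."""
    return utterance if utterance in vocabulary else None

def set_destination(first_utterance, second_utterance):
    # S1-S2: recognize the category word (assumed here to be matched
    # against the category names).
    category = recognize(first_utterance, LARGE_DICTIONARY.keys())
    if category is None:
        return None
    # S3-S4: dictionary selection P3 picks the matching sub-dictionary.
    active = LARGE_DICTIONARY[category]
    # S5-S6: recognize the target word against the selected dictionary only.
    return recognize(second_utterance, active)
```

The sketch makes the usability cost visible: every destination requires two utterances and two recognition passes, because the second pass cannot start until the first has selected a dictionary.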

[0027] With this method, the user must go through the exchange above, starting with the input "hotel," every time a destination is set. As noted earlier, because of the CPU load and the number of words in the dictionaries, this approach degrades recognition performance in terms of response time and recognition rate, particularly when the total system cost is kept low.

[0028]

[Problems to Be Solved by the Invention] In view of the above, a first object of the present invention is to provide a voice recognition microphone in which the voice recognition interface unit that executes voice recognition is separated from the system main body that uses the recognition result, and which transfers only the recognition result to the system main body. A second object of the present invention is to provide a voice recognition microphone that is easy to use in terms of performance as well.

[0029] That is, of the enormous dictionaries registered in a car navigation system, for example, the words a given user frequently uses across the various dictionaries, including commands and destination place names, are assumed to total 100 words or fewer. The voice recognition microphone therefore limits its recognition targets to the words the user has registered: the user selects only the necessary words from the enormous dictionaries on the system main body side and registers them from the system main body to the voice recognition microphone, and voice recognition processing is executed on those words alone. As a result the recognition rate approaches 100% and the processing load is small, so sufficient performance can be achieved even with low-cost hardware.

[0030]

[Means for Solving the Problems] To achieve the above objects, the invention of claim 1 is a voice recognition microphone with a voice recognition function for use in a voice recognition system that collects the words and sentences subject to voice recognition and defines them as a dictionary, retrieves those words and sentences based on the voice recognition result, and outputs them as data for subsequent information processing, as a displayed character string, as an image that the word represents, and so on, or outputs the recognition result as speech using speech synthesis. The voice recognition microphone comprises: a microphone unit that functions as a plain microphone; an A/D converter that converts the analog signal from the microphone unit into a digital signal; speech segment detection processing that detects speech segments; speech analysis processing that analyzes the captured speech; an acoustic model that holds speech features in phoneme units; and a voice recognition processing unit that links the pre-registered dictionaries with the acoustic model, matches the analysis result of the input speech against all of the linked dictionaries, and outputs the most likely recognition result. The voice recognition result is thus output from the microphone.

[0031] The invention of claim 2 is the voice recognition microphone of claim 1, comprising: a voice recognition unit that performs the series of voice recognition processes from speech input to output of the recognition result; a data communication unit that transfers the recognition result to a system main body, which executes further processing using that result, and that transfers recognition target data from the system main body; and a dictionary unit containing the dictionaries subject to recognition.

[0032] The invention of claim 3 is the voice recognition microphone of claim 2, in which the data communication unit is configured as an interface that connects the voice recognition microphone and the system main body by wired, wireless, or infrared communication, the communicated data is digital data, and its content is text data representing the voice recognition result and/or digitized speech waveform data and/or S/N ratio data indicating the levels of non-speech noise N and speech S.

[0033] The invention of claim 4 is the above voice recognition microphone, provided with an interface for connecting voice input notification means that informs the voice recognition microphone that speech is about to be input.

[0034] The invention of claim 5 is the above voice recognition microphone, comprising a command dictionary registered in the voice recognition microphone in advance, a recognition target dictionary read out from the large-scale dictionary registered on the storage medium of the system main body, and a user registration dictionary created through registration by the user.

[0035] The invention of claim 6 is a voice recognition system constructed from a voice recognition microphone that has a voice recognition function and outputs the result of recognizing input speech, and information processing means that executes subsequent processing using the voice recognition result.

[0036] The invention of claim 7 is the above voice recognition system, in which the voice recognition microphone has a voice recognition processing unit, a dictionary unit, and a data communication unit; the information processing means has a recognition target dictionary used for voice recognition and a data communication unit; a part of the recognition target dictionary of the information processing means is transferred to the dictionary unit of the voice recognition microphone based on the voice recognition result obtained in the voice recognition microphone; and voice recognition is performed using the transferred recognition target dictionary.

[0037] The invention of claim 8 is the above voice recognition system, in which a user registration dictionary is provided in the dictionary unit of the voice recognition microphone, the words the user needs from the recognition target dictionary are registered in the user registration dictionary, and normal voice recognition is performed with the command dictionary and the user registration dictionary as recognition targets.

[0038] The invention of claim 9 is a voice recognition method for a voice recognition system consisting of a voice recognition microphone, which has a voice recognition processing unit, a dictionary unit, and a data communication unit, and information processing means that performs processing based on the voice recognition result. In this method, the dictionary unit of the voice recognition microphone holds a command dictionary, a recognition target dictionary transferred from the information processing means, and a user registration dictionary; the user registration dictionary is created by collecting from the recognition target dictionary the words the user ultimately needs; and normal voice recognition is performed with the command dictionary and the user registration dictionary as recognition targets.

[0039]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to FIGS. 1 to 5. FIG. 1 is a block diagram showing the functional configuration of a voice recognition microphone according to the present invention.

[0040] The voice recognition microphone 1 shown in FIG. 1 outputs a voice recognition result 3. The output 3 of the voice recognition microphone 1 is not limited to the recognition result; it may be any information related to voice recognition. One example is a digital waveform signal: the microphone's basic function is to pick up speech and ambient sound, and what was conventionally transmitted as an analog signal is digitized and output. Another is S/N ratio data, which expresses the level of non-speech noise N and the level of speech S as a relative ratio. This digitized information is basic information needed by applications when a new system is built by combining the voice recognition microphone with a system main body.
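One common way to express an S/N ratio is as the ratio of mean speech power to mean noise power in decibels. The Python sketch below uses that definition. The source only says the data expresses the levels of noise N and speech S as a relative ratio, so the power-based definition, the decibel unit, and the function name `snr_db` are assumptions.

```python
import math

def snr_db(speech_samples, noise_samples):
    """Ratio of mean speech power to mean noise power, in decibels.

    Assumes both sample lists are non-empty and the noise is not all zero.
    """
    p_speech = sum(x * x for x in speech_samples) / len(speech_samples)
    p_noise = sum(x * x for x in noise_samples) / len(noise_samples)
    return 10.0 * math.log10(p_speech / p_noise)
```

In practice the noise samples would be taken from a stretch where no speech is detected, and the speech samples from a detected speech segment.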

[0041] The voice recognition microphone 1 consists of the following functional blocks: a microphone 11, a voice recognition unit 13, a dictionary unit 15, and a data communication unit 17.

[0042] The microphone 11 picks up speech and noise. It consists of a conventional condenser microphone or the like and is directional.

[0043] The voice recognition unit 13 detects only the speech in the input speech and noise and performs speech analysis. It then links all of the pre-registered dictionaries with the acoustic model, which holds speech features in phoneme units, matches them against the analysis result of the speech actually input, and outputs the most likely recognition result.
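The matching step can be pictured as scoring every dictionary word against the analyzed input and returning the best-scoring word. The Python sketch below is a deliberately simplified stand-in: it compares phoneme label sequences with `difflib.SequenceMatcher` instead of evaluating an acoustic model, and the lexicon entries and the function name `best_match` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation lexicon: each dictionary word mapped to phoneme units.
LEXICON = {
    "hotel": ["h", "o", "t", "e", "r", "u"],
    "home": ["h", "o", "m", "u"],
    "map": ["m", "a", "p", "u"],
}

def best_match(observed_phonemes, lexicon=LEXICON):
    """Return (word, score) for the entry most similar to the observation.

    The similarity ratio stands in for the likelihood a real recognizer
    would compute from its acoustic model.
    """
    scored = [
        (SequenceMatcher(None, observed_phonemes, phonemes).ratio(), word)
        for word, phonemes in lexicon.items()
    ]
    score, word = max(scored)
    return word, score
```

Because every entry is scored, the cost of this step grows with the size of the active vocabulary, which is why the document keeps the active dictionaries small.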

[0044] The dictionary unit 15 stores the dictionaries subject to voice recognition. These are: a command dictionary registered in the voice recognition microphone 1 in advance; a recognition target dictionary transferred from the large-scale recognition dictionary registered on the storage medium of the system main body; and a user registration dictionary built by extracting and registering only the words the user needs from the command dictionary and the recognition target dictionary. The dictionary unit 15 creates the user registration dictionary when the voice recognition result is the command "register in user dictionary (register in dictionary)," which is registered in the command dictionary.
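The three dictionaries and the registration command can be modeled with a small Python class. This is a sketch under stated assumptions: the class and method names are invented, and the rule that the register command stores the most recently recognized word is an assumed flow that the source does not spell out at this point.

```python
class DictionarySection:
    """Toy model of dictionary unit 15 (names and structure are assumptions)."""

    REGISTER_COMMAND = "register in dictionary"

    def __init__(self, command_words):
        # Command dictionary: registered in the microphone in advance.
        self.command = set(command_words) | {self.REGISTER_COMMAND}
        # Recognition target dictionary: transferred from the system main body.
        self.recognition_target = set()
        # User registration dictionary: words the user chose to keep.
        self.user_registered = set()
        self._last_word = None

    def load_recognition_target(self, words):
        self.recognition_target = set(words)

    def normal_vocabulary(self):
        # Normal recognition uses only the command and user registration dictionaries.
        return self.command | self.user_registered

    def on_recognized(self, word):
        """Handle one recognition result; the register command stores the previous word."""
        if word == self.REGISTER_COMMAND:
            if self._last_word is not None:
                self.user_registered.add(self._last_word)
        elif word in self.recognition_target or word in self.command:
            self._last_word = word
```

For example, after loading a transferred hotel dictionary, recognizing "alpha hotel" followed by "register in dictionary" leaves "alpha hotel" in the normal vocabulary, while unregistered hotel names stay out of it.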

[0045] The data communication unit 17 handles transfer of the recognition result from the voice recognition unit to the system main body to which the voice recognition microphone 1 is connected, or to the system main body that is using the voice recognition microphone 1. In the same way as the recognition result, it also transfers digitized audio signals and other speech-related information.
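A payload carrying these kinds of data might look like the following Python sketch. The source states only that digital data such as the recognition result, digitized speech, and speech-related information is transferred; the field names, the `MicMessage` class, and the choice of JSON as the encoding are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class MicMessage:
    """Hypothetical payload sent from the microphone to the system main body."""
    recognition_text: Optional[str] = None  # recognition result as text
    waveform: Optional[list] = None         # digitized speech samples
    snr: Optional[float] = None             # S/N ratio data

def encode(message):
    """Serialize only the fields that are present, as UTF-8 JSON bytes."""
    fields = {k: v for k, v in asdict(message).items() if v is not None}
    return json.dumps(fields, ensure_ascii=False).encode("utf-8")
```

Sending only the recognized text keeps the transfer to a few bytes per utterance, whereas including the waveform makes the message grow with the length of the speech.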

[0046] The hardware configuration of the voice recognition microphone 1 is described with reference to FIG. 2. The voice recognition microphone 1 comprises the microphone 11, an amplifier 21, an A/D converter 22, a CPU 23, a ROM 24, a RAM 25, a wired interface 26-1, an infrared interface (IR) 26-2, a wireless interface 26-3, a voice recognition mode interface 26-4, a system bus 27 that interconnects these, and a voice input button 29.

[0047] The microphone 11 is the same as the microphone 11 shown in FIG. 1.

[0048] The amplifier 21 is built from electronic components such as resistors and capacitors and includes a high-pass filter and a band-pass filter for removing noise.

[0049] The A/D converter 22 converts the analog speech and noise signal input from the microphone 11 via the amplifier 21 into a digital signal. The A/D converter 22 is connected to the system bus 27.

【００５０】ＣＰＵ２３は、音声認識マイク１におい
て、音声認識および辞書登録ならびにデータ通信の全て
の処理をソフトウエアで行う中央処理ユニットあるいは
ＣＰＵコアである。The CPU 23 is a central processing unit or a CPU core which performs all processes of voice recognition, dictionary registration and data communication in the voice recognition microphone 1 by software.

【００５１】The ROM 24 holds the program that initializes the voice recognition microphone system and executes the whole series of voice recognition, dictionary registration, and data communication processes in software. The ROM 24 also holds the acoustic models, dictionaries, grammars, and other data required for voice recognition.

【００５２】The RAM 25 is fast-access memory into which the program for the series of voice recognition, dictionary registration, and data communication processes is transferred for execution, and it provides the work area needed while the program runs. It also serves as memory that keeps the dictionaries registered by the user, and similar data, from being erased when the power is turned off.

【００５３】The wired interface 26-1 exchanges information (data 31) between the voice recognition microphone 1 and the system main body connected to it. The connection to the system main body is wired, and the data path may be either serial or parallel.

【００５４】The data 31 is transferred bidirectionally over the wire between the voice recognition microphone 1 and the system main body, and is of four kinds. The first is the recognition result output from the voice recognition microphone 1; it may be character information in the form of text data, or coded data. The second is the dictionary data to be recognized, which is transferred from the system main body to the voice recognition microphone 1. The third is S/N ratio data indicating the levels of the non-voice noise N and the voice S. The fourth is information notifying the microphone that the system main body has entered the voice recognition mode, for example when the utterance button of the remote control is pressed in a car navigation system on the system main body side. On receiving this notification, the voice recognition microphone 1 executes voice recognition processing on the input voice.
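
The four kinds of data carried as data 31 can be pictured as tagged messages on the link. The following Python sketch is illustrative only: the names `Kind`, `Message`, `encode`, and `decode`, and the framing of one tag byte followed by a UTF-8 payload, are assumptions made for this example. The specification does not define a wire format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical tags for the four kinds of data 31."""
    RECOGNITION_RESULT = 1   # microphone -> system main body
    DICTIONARY_DATA = 2      # system main body -> microphone
    SN_RATIO = 3             # levels of noise N and voice S
    RECOGNITION_MODE = 4     # system main body entered recognition mode


@dataclass(frozen=True)
class Message:
    kind: Kind
    payload: str


def encode(msg: Message) -> bytes:
    """Assumed framing: one tag byte followed by a UTF-8 payload."""
    return bytes([msg.kind]) + msg.payload.encode("utf-8")


def decode(frame: bytes) -> Message:
    """Inverse of encode(): split the tag byte from the payload."""
    return Message(Kind(frame[0]), frame[1:].decode("utf-8"))
```

Because the same message kinds are said to travel over the wired, infrared, and wireless interfaces, a single framing of this sort could in principle be shared by all three links.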

【００５５】The infrared interface 26-2 exchanges information (data 32) between the voice recognition microphone 1 and the system main body connected to it. The link to the system main body is wireless, using infrared (IR), and the data path may be either serial or parallel.

【００５６】The data 32 is transferred bidirectionally between the voice recognition microphone 1 and the system main body by infrared communication. Its content is the same as that of the data 31.

【００５７】The wireless interface 26-3 exchanges information (data 33) between the voice recognition microphone 1 and the system main body connected to it. The link to the system main body is a radio link such as a wireless LAN, and the data path may be either serial or parallel.

【００５８】The data 33 is transferred bidirectionally by radio between the voice recognition microphone 1 and the system main body. Its content is the same as that of the data 31.

【００５９】The voice recognition microphone 1 may be provided with only one of the wired interface 26-1, the infrared interface 26-2, and the wireless interface 26-3 that interface with the system main body, or with all of them.

【００６０】The voice recognition mode interface 26-4 is an interface through which the voice recognition microphone 1 is told directly that the voice recognition mode has been entered, instead of receiving the mode notification described above from the system main body. The voice input button 29, which signals the voice input state, is connected to the voice recognition mode interface 26-4. Notifying the voice recognition microphone 1 directly is useful, for example, when the system main body only receives the recognition results (data 31 to 33) from the voice recognition microphone 1 in one direction while it carries out some other processing or task. One case that calls for this approach is when the microphone is applied to a voice remote control.

【００６１】The voice recognition microphone 1 may be built from a plurality of LSIs and ICs, or may be formed on a single semiconductor device such as an ASIC. In the ASIC case, the CPU 23 is implemented as a CPU core.

【００６２】With reference to FIG. 3, the following describes the configuration of a system that uses voice recognition as its interface, in which the voice recognition microphone 1 is connected, wirelessly or by wire, to a system main body 5 that operates on the voice recognition results, together with the flow of processing. A car navigation system is one example of such a system.

【００６３】The system is configured by connecting the voice recognition microphone 1 and the system main body 5 through a communication means 4, which may be wireless or wired.

【００６４】In a car navigation system, for example, the voice recognition microphone 1 is mounted on or built into the sun visor, seat belt, steering column, pillar, or steering wheel of the vehicle. Alternatively, the voice recognition microphone 1 is built into the remote control that controls the system.

【００６５】The car navigation system main body 5 is integrated with the audio system and the air conditioning system and is installed in the vehicle together with its display.

【００６６】The voice recognition microphone 1 comprises the microphone 11, a voice recognition unit 13, a dictionary unit 15, a data communication unit 17, and the voice input button 29.

【００６７】The voice recognition unit 13 comprises an A/D converter 131, a voice section cutout unit 132, a voice analysis processing unit 133, a voice recognition processing unit 134, and an acoustic model storage unit 135.

【００６８】The dictionary unit 15 comprises a command dictionary 151, a recognition target dictionary 152, a dictionary registration processing unit 153, and a user registration dictionary 154.

【００６９】The data communication unit 17 transfers the result recognized by the voice recognition processing unit 134 to the system main body 5. It also receives, from the system main body 5, the dictionary data to be recognized and the information notifying that the voice recognition mode has been entered.

【００７０】The system main body 5 is configured, for example, as a car navigation system. In that case, the system main body 5 comprises a data communication unit 51, a dictionary reading unit 52, a large-scale recognition target dictionary 53, application software 54, a voice synthesis unit 55, and a display unit 56.

【００７１】The A/D converter 131 converts the analog signal output from the microphone 11 into a digital signal and outputs it to the voice section cutout unit 132.

【００７２】The voice section cutout unit 132 cuts the voice portion out of the digitized input signal, which contains both voice and noise, and outputs it to the voice analysis processing unit 133.
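
How the voice section cutout unit 132 separates voice from noise is not detailed at this point. One common approach, shown here purely as an assumption, is a short-time energy threshold: a frame is treated as voice when its mean squared amplitude exceeds a threshold. The function name `cut_voice_section`, the frame length, and the threshold value are all hypothetical.

```python
def cut_voice_section(samples, frame_len=4, threshold=100.0):
    """Return (start, end) sample indices of the voiced span, or None.

    Assumed method: a frame counts as voice when its mean squared
    amplitude exceeds `threshold`; the span runs from the first such
    frame to the end of the last one.
    """
    voiced_starts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            voiced_starts.append(start)
    if not voiced_starts:
        return None
    return voiced_starts[0], voiced_starts[-1] + frame_len


# Quiet frame, two loud frames, quiet frame.
signal = [1, -2, 1, 0, 30, -40, 35, -20, 25, -30, 20, -15, 0, 1, -1, 2]
section = cut_voice_section(signal)  # (4, 12)
```

In a real device the threshold would have to track the ambient noise level, which a fixed constant like the one above does not do.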

【００７３】The voice analysis processing unit 133 analyzes the voice and outputs the analysis result to the voice recognition processing unit 134.

【００７４】The voice recognition processing unit 134 uses the voice analysis result, the acoustic model, and the dictionaries to carry out the series of probabilistic voice matching steps and outputs the most likely recognition result.

【００７５】The acoustic model storage unit 135 stores the acoustic models required for voice recognition. In the general-purpose voice recognition systems now coming into practical use, speaker-independent recognition, which recognizes any speaker's voice without prior voice enrollment, has become mainstream. A hidden Markov model (HMM), for example, can be used as the acoustic model for such voice recognition.
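
To make the idea of probabilistic matching with an HMM concrete, the sketch below scores an observation sequence against one toy discrete HMM per dictionary word using the forward algorithm and picks the most likely word. Everything in it is an assumption for illustration: the two-state models, the probabilities, the two-symbol observation alphabet, and the names `forward_likelihood` and `recognize`. Real acoustic models work on continuous feature vectors and phoneme-level units.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed by the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


# Toy left-to-right models shared by both words (made-up numbers).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]

# One emission table per dictionary word: rows are states, columns symbols.
WORD_MODELS = {
    "hotel": [[0.9, 0.1], [0.1, 0.9]],  # mostly symbol 0, then symbol 1
    "park": [[0.1, 0.9], [0.9, 0.1]],   # mostly symbol 1, then symbol 0
}


def recognize(obs):
    """Return the dictionary word whose model gives obs the highest likelihood."""
    return max(
        WORD_MODELS,
        key=lambda w: forward_likelihood(obs, START, TRANS, WORD_MODELS[w]),
    )
```

The cost of `recognize` grows with the number of words scored, which is one way to see why restricting recognition to a small dictionary helps response time.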

【００７６】The command dictionary 151 is registered in the voice recognition microphone 1 in advance and contains words such as the genres on which voice recognition is performed and the commands for voice recognition processing.

【００７７】The recognition target dictionary 152 is the dictionary against which voice recognition is performed, and is transferred from the system main body 5 as needed.

【００７８】The dictionary registration processing unit 153 registers words from the current recognition target dictionary 152 that the user wants to keep permanently in the voice recognition microphone 1. Registration is performed when the user speaks the "register in dictionary" command of the command dictionary 151 and the voice recognition processing unit 134 outputs "register in dictionary" as the recognition result.

【００７９】The user registration dictionary 154 holds the words that the user wants to keep permanently registered in the voice recognition microphone 1.

【００８０】The structure of the dictionaries shown in FIG. 3 is described with reference to FIG. 4. The command dictionary 151, registered in the voice recognition microphone 1 in advance, is a dictionary of, for example, 300 words, illustrated as the data D151 in FIG. 4(a). In this example, the command dictionary 151 consists of words such as "hotel", "park", "ski resort", and "golf course", which denote recognition target genres, and "register in dictionary", "delete from dictionary", and "dictionary contents", which denote processing commands.

【００８１】The recognition target dictionary 152 is the dictionary against which voice recognition is performed, and is transferred as needed from the large-scale recognition target dictionary 53 of the system main body 5. The voice recognition microphone 1 performs voice recognition on the input voice only within the range of words in the command dictionary 151 and this recognition target dictionary 152.

【００８２】The recognition target dictionary 152 contains about 5,000 words and consists of the words that correspond to a genre-specifying command of the command dictionary. For example, for the genre-specifying command "hotel", only words that represent hotel names are registered, such as "○○○ Hotel", "△△△ Hotel", "Hotel □□□□", "◇◇◇◇ Ryokan", ... "×××× Hotel", and "☆☆☆ Hotel", as illustrated by the data D152 in FIG. 4(b).

【００８３】The user registration dictionary data D154 shows an example of the specific contents of the user registration dictionary 154. It consists of the words that the user wants to keep registered, and the user registration dictionary 154 is assumed to hold about 100 words.

【００８４】The user registration dictionary 154 is a dictionary of about 100 words. It is built by taking the words that the user wants to keep registered out of the command dictionary 151 and the recognition target dictionary 152 and registering them. As illustrated by the data D154 in FIG. 4(c), it holds words for places and processing commands such as "△△△ Hotel", "Tokyo ○○○ Land", ... and "return home".
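
The relationship between the three dictionaries can be sketched as a small data structure. This is a hypothetical model rather than the structure used in the device: the class name `DictionaryUnit`, its methods, and the choice to refuse registration once the roughly 100-word capacity is reached are assumptions. The placeholder names "AAA Hotel" and "BBB Hotel" stand in for the anonymized hotel names in FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryUnit:
    """Illustrative model of dictionary unit 15."""
    command: set                                  # command dictionary 151
    target: set = field(default_factory=set)      # recognition target dictionary 152
    user: list = field(default_factory=list)      # user registration dictionary 154
    user_capacity: int = 100                      # "about 100 words"

    def load_target(self, words):
        """Replace the recognition target dictionary with transferred words."""
        self.target = set(words)

    def register(self, word):
        """Copy a known word into the user dictionary if there is room."""
        known = word in self.command or word in self.target
        if not known or word in self.user:
            return False
        if len(self.user) >= self.user_capacity:
            return False  # assumed policy: refuse when full
        self.user.append(word)
        return True

    def normal_vocabulary(self):
        """Words searched in normal use: command plus user dictionary."""
        return self.command | set(self.user)


unit = DictionaryUnit(command={"hotel", "register in dictionary"})
unit.load_target(["AAA Hotel", "BBB Hotel"])
unit.register("AAA Hotel")
```

With the figures given in the text, normal use searches roughly 300 + 100 words, compared with roughly 300 + 5,000 while a genre dictionary is loaded.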

【００８５】Next, the processing on the system main body 5 side is described. The data communication unit 51 of the system main body 5 receives the results recognized by the voice recognition microphone 1. It also transfers, from the system main body 5 to the voice recognition microphone 1, the dictionary data to be recognized and the information notifying that the voice recognition mode has been entered.

【００８６】The dictionary reading unit 52 reads the dictionary classified under the requested recognition target from the large-scale recognition target dictionary 53, which is stored on a large-capacity storage medium such as a CD-ROM or DVD, and transfers it to the voice recognition microphone 1 through the data communication unit 51.

【００８７】In the large-scale recognition target dictionary 53, the recognition target dictionaries are classified by item and stored on a CD-ROM or DVD. For example, the hotel word dictionary 531 consists of hotel names and is assumed to contain 5,000 words. In addition to the hotel word dictionary, dictionaries for other recognition target genres are stored, such as a park word dictionary 532 and a ski resort word dictionary 533, which are assumed to contain 10,000 words and 4,000 words, respectively.

【００８８】The application software 54 performs the main processing of the car navigation system, covering system-wide processing such as GPS processing, navigation processing, and the voice interface.

【００８９】The display unit 56 consists of a liquid crystal display (LCD) or similar device and displays the map, the travel progress, information relating to the voice recognition results, and so on.

【００９０】The voice synthesis unit 55 synthesizes speech for the travel progress, the read-back of voice recognition results, voice guidance, and so on.

【００９１】The voice recognition processing in the voice recognition microphone 1 and the system main body 5 is described with reference to FIG. 5. Unlike the conventional voice recognition system shown in FIG. 7, the voice recognition system of the present invention performs the voice recognition processing on the voice recognition microphone 1 side. Consequently, in car navigation systems and other systems alike, the processing in the system main body 5 can remain as it is and the load on the system-side CPU does not change. Even for a system that originally has no voice interface, the load on the CPU of the system main body does not change and only minor hardware modifications are needed; a system that already has a communication interface can be operated by voice using its existing hardware, which has no voice interface.

【００９２】In FIG. 5, the part above the broken line shows the dictionary registration processing and voice recognition processing in the voice recognition microphone 1 and the dictionaries used for them, and the part below the broken line shows the use of the voice recognition results and the dictionaries in the system main body 5. The voice recognition microphone 1 side performs the whole series of voice recognition steps. This processing comprises the user registration dictionary creation processing, carried out before a word is registered, and the voice recognition processing that uses the user registration dictionary.

【００９３】First, before registration, when the first utterance "hotel" is input (S11), the voice recognition processing P11 is executed using the command dictionary 151 and the recognition result "hotel" is output (S12). The recognition result "hotel" is transferred to the system main body 5 side, where the application 54 starts the dictionary selection processing P52 (S13) and selects the hotel word dictionary 531 from the large-scale recognition target dictionary 53, which covers all recognition targets.

【００９４】The selected hotel word dictionary 531 is transferred from the system main body 5 to the voice recognition microphone 1 side and stored in the recognition target dictionary 152 (S14).

【００９５】When the second or a subsequent utterance, "△△△ Hotel", is input (S15), the voice recognition processing P11 is performed using the hotel dictionary data D531 stored in the recognition target dictionary 152, and the recognition result "△△△ Hotel" is output (S16). The recognition result "△△△ Hotel" is transferred to the system main body side, where the application 54 of the system main body 5 passes it to the processing P57, such as destination setting and route search (S17).

【００９６】When the third utterance "register in dictionary" is input (S18), the voice recognition processing P11 performs voice recognition using the command dictionary 151 and outputs the recognition result "register in dictionary" (S19). On receiving the recognition result "register in dictionary", the dictionary registration processing unit 153 registers "△△△ Hotel" in the user registration dictionary 154 on the voice recognition microphone side (S20). The user registration dictionary 154 is assumed to hold 100 words.

【００９７】After registration, when the user says "△△△ Hotel" straight away as the first utterance to specify the destination (S21), the voice recognition processing P11 is performed using the user registration dictionary data D154 stored in the user registration dictionary 154, and the recognition result "△△△ Hotel" is output (S16). The recognition result "△△△ Hotel" is transferred to the system main body side, where the application 54 of the system main body 5 passes it to the processing P57, such as destination setting and route search (S17).
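
The sequence S11 to S21 can be traced with a small simulation. This is a sketch under stated assumptions, not the device's implementation: recognition P11 is replaced by exact string matching, the classes `SystemBody` and `Microphone` and their methods are hypothetical, and "AAA Hotel" stands in for the anonymized hotel name.

```python
REGISTER = "register in dictionary"
COMMANDS = {"hotel", "park", REGISTER}            # command dictionary 151
LARGE_DICTIONARY = {                              # large-scale dictionary 53
    "hotel": ["AAA Hotel", "BBB Hotel"],
    "park": ["CCC Park"],
}


class SystemBody:
    """Stand-in for system main body 5."""

    def __init__(self):
        self.destination = None

    def on_result(self, word):
        if word in LARGE_DICTIONARY:              # dictionary selection P52 (S13)
            return LARGE_DICTIONARY[word]         # dictionary sent back (S14)
        self.destination = word                   # destination setting P57 (S17)
        return None


class Microphone:
    """Stand-in for voice recognition microphone 1."""

    def __init__(self, body):
        self.body = body
        self.target = []                          # recognition target dictionary 152
        self.user = []                            # user registration dictionary 154
        self.last_place = None

    def hear(self, utterance):
        # Recognition P11 is replaced by exact matching (an assumption).
        if utterance not in COMMANDS | set(self.target) | set(self.user):
            return None
        if utterance == REGISTER:                 # S18 to S20
            if self.last_place and self.last_place not in self.user:
                self.user.append(self.last_place)
        elif utterance in COMMANDS:               # S11 to S14
            self.target = list(self.body.on_result(utterance) or [])
        else:                                     # S15 to S17, and S21
            self.last_place = utterance
            self.body.on_result(utterance)
        return utterance


body = SystemBody()
mic = Microphone(body)
before = mic.hear("AAA Hotel")   # unknown word before any dictionary is loaded
mic.hear("hotel")                # S11 to S14: hotel dictionary transferred
mic.hear("AAA Hotel")            # S15 to S17: destination set
mic.hear(REGISTER)               # S18 to S20: word kept in dictionary 154
mic.target = []                  # later session: target dictionary no longer loaded
body.destination = None
after = mic.hear("AAA Hotel")    # S21: recognized from dictionary 154 alone
```

The point the trace illustrates is that, after registration, the genre command and the dictionary transfer are no longer needed to reach the same destination.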

【００９８】Thus, according to the present invention, once the required word has been selected and registered in the user registration dictionary 154, the user only has to say "△△△ Hotel" for voice recognition to be executed immediately. Furthermore, as noted above, the user registration dictionary 154 holds only 100 words, so good recognition performance can be obtained in terms of both recognition response time and recognition rate, and the voice recognition interface can be expected to improve.

【００９９】Furthermore, because this voice recognition system greatly reduces the load that voice recognition processing places on the system main body, it can be applied to a wide variety of existing systems simply by adding an interface that can receive the recognition results from the voice recognition microphone.

【０１００】[0100]

[Effects of the Invention] According to the present invention, in a voice recognition system used in car navigation systems, small information systems, and games, the threshold for voice section detection is set automatically to match the noise level of the actual operating environment. This makes it possible to realize a good voice recognition system in which voice section detection with the automatically set threshold, and recognition performance, do not degrade even in a real environment.

[Brief description of the drawings]

FIG. 1 is a block diagram outlining the functions of the voice recognition microphone according to the present invention.

FIG. 2 is a block diagram showing the hardware configuration of the voice recognition microphone according to the present invention.

FIG. 3 is a block diagram outlining the voice recognition system according to the present invention.

FIG. 4 is a diagram illustrating the dictionary structure in the voice recognition system according to the present invention.

FIG. 5 is a diagram illustrating the operation of the voice recognition system according to the present invention.

FIG. 6 is a block diagram illustrating the configuration of a portable translation device that uses a conventional voice recognition system.

FIG. 7 is a diagram illustrating the operation of a conventional voice recognition system.

[Explanation of symbols]

1: voice recognition microphone
11: microphone
13: voice recognition unit
131: A/D converter
132: voice section cutout unit
133: voice analysis processing unit
134: voice recognition processing unit
135: acoustic model storage unit
15: dictionary unit
151: command dictionary
152: recognition target dictionary
153: dictionary registration processing unit
154: user registration dictionary
17: data communication unit
21: amplifier
22: A/D converter
23: CPU
24: ROM
25: RAM
26: interface
29: voice input button
3: recognition result
4: communication means
5: system main body
51: data communication unit
52: dictionary reading unit
53: large-scale recognition target dictionary
531: hotel word dictionary
532: park word dictionary
533: ski resort word dictionary
54: application software
55: voice synthesis unit
56: display unit

Continued from the front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 3/00 571A
(72) Inventor: Hiroaki Kokubo, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Nobuo Hataoka, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
F-terms (reference): 5B075 ND02 PP07 PP22 PQ02 PQ04 PQ05 UU01; 5D015 DD02 GG03 LL09 LL11; 9A001 CC05 EE05 HH16 HH17 JJ77 JZ76

Claims

1. Words and sentences to be subjected to speech recognition are collected and defined as a dictionary, and those words and sentences are extracted based on the speech recognition result. A voice recognition microphone having a voice recognition function in a voice recognition system that outputs a recognized image or the like, or outputs a recognition result as voice using voice synthesis. An A/D converter that converts an analog signal to a digital signal, a voice section detection process for detecting a voice section, a voice analysis process for performing voice analysis on the captured voice, a voice model having voice characteristics in phoneme units, a dictionary and a voice model registered in advance. In all of the dictionaries linked to the acoustic model, the dictionary is provided with a voice recognition processing unit that compares the input voice with the voice analysis result and outputs a reliable recognition result, and outputs the voice recognition result from the microphone. A voice recognition microphone characterized by the following.

2. A speech recognition unit for performing a series of speech recognition processes from inputting speech to outputting a speech recognition result, and transferring the recognition result to a system main unit for executing a new process using the speech recognition result. The voice recognition microphone according to claim 1, further comprising a data communication unit that transfers recognition target data from the main body, and a dictionary unit that includes a dictionary to be recognized.

3. The data communication unit is an interface for connecting a voice recognition microphone and the system body by wire, wireless, or infrared communication, and data to be communicated is digital data, and the content of the data is text representing a voice recognition result. 3. An S / N ratio data indicating data and / or digitized audio waveform data and / or non-audio noise N and audio S levels.
Voice recognition microphone described in.

4. The voice recognition microphone according to claim 3, further comprising an interface for connecting a voice input notifying means that notifies the voice recognition microphone that voice is about to be input.

5. A command dictionary registered in advance in a voice recognition microphone, a recognition target dictionary in which a dictionary to be recognized is read from a large-scale dictionary registered in a storage medium of a system main body, and a user registration process. The voice recognition microphone according to any one of claims 1 to 4, further comprising a user registration dictionary created.

6. A speech recognition system comprising a speech recognition microphone having a speech recognition function and outputting a result of recognizing input speech, and an information processing means for executing subsequent processing using the speech recognition result.

7. The speech recognition microphone according to claim 1, wherein the speech recognition microphone has a speech recognition processing unit, a dictionary unit, and a data communication unit, and the information processing unit has a recognition target dictionary and a data communication unit used for speech recognition. 7. A part of the recognition target dictionary of the information processing means is transferred to a dictionary part of a voice recognition microphone based on the voice recognition result in step (a), and voice recognition is performed using the transferred recognition target dictionary. A speech recognition system as described.

8. A user registration dictionary is provided in a dictionary section of the voice recognition microphone, and words required by a user in the dictionary to be recognized are registered in the user dictionary. In a normal voice recognition, a command dictionary and a user registration are registered. The speech recognition system according to claim 6, wherein speech recognition is performed with a dictionary as a recognition target.

9. A speech recognition method for a speech recognition system, comprising: a speech recognition microphone having a speech recognition processing unit, a dictionary unit, and a data communication unit; and information processing means for performing processing based on the speech recognition result. The dictionary section includes a command dictionary, a recognition target dictionary transferred from the information processing means, and a user registration dictionary. A plurality of words finally required by the user are collected from the recognition target dictionary for user registration. A speech recognition method for a speech recognition system, comprising: creating a dictionary; and performing speech recognition using a command dictionary and a user registration dictionary as recognition targets in normal speech recognition.