JP2004271561A

JP2004271561A - Speech input device, speaker identification device using the same, speech input method and speaker identification method using the same, speech input program, speaker identification program, and program recording medium

Info

Publication number: JP2004271561A
Application number: JP2003058071A
Authority: JP
Inventors: Yoichiro Hachiman; 洋一郎八幡
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-03-05
Filing date: 2003-03-05
Publication date: 2004-09-30
Anticipated expiration: 2023-03-05
Also published as: JP4253518B2

Abstract

【課題】ユーザに大きな負担を強いることなく発話区間で区切られた音声データと話者識別情報とを得る。
【解決手段】音声入力スイッチ５がオン状態の場合に、生体情報取得センサ１で取得された生体情報が生体情報入力部２から入力されると共に、音声入力部６から音声が入力される。生体情報照合部４は、生体情報入力部２からの入力生体情報と生体情報登録部３に登録された登録生体情報とを照合し、照合結果に基づいて話者識別結果を得る。こうして、音声入力スイッチ５を操作して発声している話者の生体情報を自動的に取得することによって、話者情報の入力を個別に行う必要が無い。したがって、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とを得ることができる。
【選択図】図１
An object of the present invention is to obtain voice data delimited by utterance sections, together with speaker identification information, without imposing a heavy burden on the user.
When the voice input switch 5 is on, biometric information acquired by the biometric information acquisition sensor 1 is input from the biometric information input unit 2, and voice is input from the voice input unit 6. The biometric information matching unit 4 collates the input biometric information from the biometric information input unit 2 with the registered biometric information registered in the biometric information registration unit 3, and obtains a speaker identification result based on the collation result. Because the biometric information of the speaker who operates the voice input switch 5 and speaks is acquired automatically, the speaker information does not have to be input separately. It is therefore possible to obtain voice data delimited by utterance sections, together with speaker identification information, without burdening the speaker.
[Selection diagram] Fig. 1

Description

【０００１】
【発明の属する技術分野】
この発明は、音声を入力する音声入力装置およびそれを用いた話者識別装置、音声入力方法およびそれを用いた話者識別方法、音声入力プログラム、話者識別プログラム、並びに、プログラム記録媒体に関する。
【０００２】
【従来の技術】
従来、指紋データや音声データを用いてユーザを識別するユーザ識別装置がある。例えば、特開２０００‐１６５３７８号公報に開示された指紋認証装置においては、電源スイッチに設けた指紋読取部で読み取った指紋データの認証結果に基づいて移動無線端末機を動作させ、マイクから入力された音声データの認証結果に基づいて上記移動体無線端末機を通信可能にするようにしている。
【０００３】
また、例えば、特開平１１‐１４６０５７号公報に開示された移動電話においては、ユーザの音声データ、耳介や虹彩等の画像データ、指紋および掌紋の凹凸データ、二酸化炭素の濃度データを用いたユーザ識別結果に基づいて、移動電話を通話可能にする。
【０００４】
また、従来、予めユーザが音声を発話することによって、音声辞書を特定ユーザに適応し登録する音声認識装置がある。例えば、日本アイ・ビー・エム株式会社のＶｉａＶｏｉｃｅでは、予めユーザが多量の音声を発話することによって音声辞書を特定ユーザに適応した上で登録しておき、登録後はユーザが登録音声辞書を示すインデックスを選択することによりユーザに適応した音声辞書を用いて音声を認識するようにしている。
【０００５】
【発明が解決しようとする課題】
しかしながら、上記従来のユーザ識別装置においては、音声データの入力に関して以下のような問題がある。すなわち、先ず、特開２０００‐１６５３７８号公報に開示された指紋認証装置では、通話時の音声をマイクから入力して認証しているが、認証の対象となる音声データの範囲（取得タイミング）について言及しておらず、通話音声全体を通して正確に認証するのは周囲雑音の影響等があるため技術的に困難である。また、特開平１１‐１４６０５７号公報に開示された移動電話では、種々のユーザ情報に基づいて識別することによってセキュリティ効果の向上を図っている。ところが、ユーザ情報の入力については個別に扱っており、入力においてユーザに負担が掛るという問題がある。
【０００６】
また、上記従来の音声認識装置では、登録音声辞書の選択に関して以下のような問題がある。日本アイ・ビー・エム株式会社のＶｉａＶｏｉｃｅの場合には、ユーザが登録音声辞書を示すインデックスを選択する必要があるが、同一の音声認識装置を使用するユーザが複数存在する場合やユーザの交替頻度が激しい場合に、ユーザが交替する毎にインデックスを選択するのはユーザに負担が掛るという問題がある。
【０００７】
そこで、この発明の目的は、ユーザに大きな負担を強いることなく発話区間で区切られた音声データと発話者識別情報とを得ることが可能な音声入力装置およびそれを用いた話者識別装置、音声入力方法およびそれを用いた話者識別方法、音声入力プログラム、話者識別プログラム、並びに、プログラム記録媒体を提供することにある。
【０００８】
【課題を解決するための手段】
上記目的を達成するため、この発明の音声入力装置は、話者によって、音声入力スイッチが音声入力手段による音声入力を可にすると共に、生体情報取得手段による生体情報取得を可にするように切り替えられる。そうすると、音声入力手段によって、話者の音声に基づいて音声データが生成されて後段に入力される。さらに、生体情報取得手段によって取得された生体情報が生体情報入力手段から入力される。そして、生体情報照合手段によって、上記入力生体情報と生体情報登録手段に登録された登録生体情報とが比較照合され、入力音声に係る話者が識別されて話者識別結果が出力される。
【０００９】
こうして、上記音声入力スイッチを操作して発声している話者の生体情報が自動的に取得されて話者識別が行われる。したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とが得られる。また、認証に音声データを用いていないので、周囲雑音の影響等がなく精度良く発話者の認証が行われる。また、上記発話者識別情報を音声認識時に用いる話者適応辞書の選択に利用すれば、自動的に話者適応辞書を選択することが可能になる。
【００１０】
また、１実施例の音声入力装置では、上記生体情報に指紋または掌紋を含んでいる。こうして、比較的に音声入力時に取得し易い話者情報に基づいて、容易に話者識別が行われる。
【００１１】
また、１実施例の音声入力装置では、上記音声入力スイッチを、押下されている間は上記音声入力と生体情報取得とを可にするオン状態であり、押下されていない間は上記音声入力と生体情報取得とを否にするオフ状態であるとしている。したがって、一般のスイッチと同様に、上記音声入力スイッチを押下し続けている間オン状態となって、上記音声入力と生体情報取得とが可能になる。
【００１２】
また、１実施例の音声入力装置では、上記音声入力スイッチを、押下された際に上記音声入力と生体情報取得とを可にするオン状態となり、発話終了判定手段によって発話が終了したと判定されると、自動的に、上記音声入力と生体情報取得とを否にするオフ状態となるようにしている。こうして、話者に負担を強いることなく、発話区間で区切られた音声データが自動的に得られる。
【００１３】
また、１実施例の音声入力装置では、上記生体情報取得手段を、上記音声入力スイッチがオン状態の間に継続して生体情報を取得するようにしている。こうして、精度良い生体情報を入力することが可能になる。
【００１４】
また、１実施例の音声入力装置では、上記生体情報取得手段を、上記音声入力スイッチがオン状態となった時点で一度だけ上記生体情報を取得するようにしている。こうして、生体情報入力処理が迅速に行われる。
【００１５】
また、この発明の話者識別装置は、話者によって、上記音声入力装置の音声入力スイッチが、音声入力手段による音声入力を可にすると共に、生体情報取得手段による生体情報取得を可にするように切り替えられると、上記音声入力装置によって、話者の音声に基づく音声データが後段に入力されると共に、上記入力生体情報に基づいて入力音声に係る話者識別結果が出力される。
【００１６】
さらに、上記音声入力装置の音声入力手段に設けられた声紋情報抽出手段によって、上記音声データから声紋情報が抽出される。そして、声紋照合手段によって、入力声紋情報と声紋情報登録手段に登録された登録声紋情報とが比較照合され、入力音声に係る話者が識別されて話者識別結果が出力される。そうすると、照合結果統合手段によって、上記音声入力装置の生体情報照合手段による話者識別結果と声紋照合手段による話者識別結果とが統合されて、一つの話者識別結果が出力される。
【００１７】
こうして、上記音声入力スイッチを操作して発声している話者の生体情報および声紋情報が自動的に入力されて話者識別が行われる。したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく話者識別情報が得られる。また、入力生体情報と入力声紋情報との二つの話者情報に基づいて話者識別結果が得られる。したがって、精度の良い話者識別結果が得られる。また、上述したごとく、今発声している話者の生体情報と声紋情報とに基づいて話者識別が行われる。したがって、入力音声データと発話者との対応を確実に取ることができ、信頼性の高い話者識別が行われる。
【００１８】
また、１実施例の話者識別装置では、上記照合結果統合手段を、入力生体情報のＳ／Ｎ比と入力音声情報のＳ／Ｎ比とを予めメモリ手段に保持された平均値と比較し、上記平均値に比べてより大きいＳ／Ｎ比を呈する方の入力情報に基づく話者識別結果を上記一つの話者識別結果として出力するようにしている。したがって、上記入力生体情報に基づく話者識別結果と上記入力声紋情報に基づく話者識別結果とのうち、より確からしい方の話者識別結果が最終的な話者識別結果として出力される。
【００１９】
また、１実施例の話者識別装置では、上記照合結果統合手段を、上記生体情報照合手段が話者識別結果を得た際の確からしさの情報と上記声紋照合手段が話者識別結果を得た際の確からしさの情報とを比較して、より確からしさが大きい方の話者識別結果を上記一つの話者識別結果として出力するようにしている。したがって、上記入力生体情報に基づく話者識別結果と上記入力声紋情報に基づく話者識別結果とのうち、より確からしい方の話者識別結果が最終的な話者識別結果として出力される。
【００２０】
また、この発明の音声入力方法は、話者によって音声入力スイッチがオン状態にされると、上記音声入力スイッチのオン状態に呼応して、上記話者の生体情報が入力される一方、音声に基づいて生成された音声データが後段に入力される。そして、入力生体情報と生体情報登録手段に登録された登録生体情報とが比較照合され、入力音声に係る話者が識別されて話者識別結果が出力される。
【００２１】
こうして、上記音声入力スイッチを操作して発声している話者の生体情報が自動的に取得されて話者識別が行われる。したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とが得られる。また、上記発話者識別情報を音声認識時に用いる話者適応辞書の選択に利用すれば、自動的に話者適応辞書を選択することが可能になる。
【００２２】
また、この発明の話者識別方法は、上記音声入力方法が実行されて、入力生体情報に基づいて入力音声に係る話者識別結果を得ると共に、生成された音声データから声紋情報が抽出されて入力される。さらに、入力声紋情報と声紋情報登録手段に登録された登録声紋情報とが比較照合され、入力音声に係る話者が識別されて話者識別結果が出力される。そうすると、上記入力生体情報に基づく話者識別結果と上記入力声紋情報に基づく話者識別結果とが統合されて、一つの話者識別結果が出力される。
【００２３】
こうして、上記音声入力スイッチを操作して発声している話者の生体情報および声紋情報が自動的に入力されて話者識別が行われる。したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく話者識別情報が得られる。さらに、入力音声データと発話者との対応を確実に取ることができ、信頼性の高い話者識別が行われる。また、入力生体情報と入力声紋情報との二つの話者情報に基づいて、精度良く話者識別結果が得られる。
【００２４】
また、この発明の音声入力プログラムは、上記音声入力装置の音声入力スイッチが話者によって操作された際に、コンピュータを、上記音声入力装置の音声入力手段，生体情報入力手段，生体情報登録手段および生体情報照合手段として機能させる。
【００２５】
したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とが得られる。また、上記発話者識別情報を音声認識時に用いる話者適応辞書の選択に利用すれば、自動的に話者適応辞書を選択することが可能になる。
【００２６】
また、この発明の話者識別プログラムは、上記話者識別装置の音声入力スイッチが話者によって操作された際に、コンピュータを、上記話者識別装置の音声入力手段，生体情報入力手段，生体情報登録手段，生体情報照合手段，声紋情報抽出手段，声紋情報登録手段，声紋照合手段および照合結果統合手段として機能させる。
【００２７】
したがって、話者情報の入力を個別に扱う必要が無く、話者に負担を掛けることなく話者識別情報が得られる。また、入力生体情報と入力声紋情報との二つの話者情報に基づいて、精度良く話者識別結果が得られる。
【００２８】
また、この発明のプログラム記録媒体は、上記音声入力プログラムが記録されている。したがって、この音声入力プログラムをコンピュータで読み出して実行することによって、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とが得られる。また、上記発話者識別情報を音声認識時に用いる話者適応辞書の選択に利用すれば、自動的に話者適応辞書を選択することが可能になる。
【００２９】
また、この発明のプログラム記録媒体は、上記話者識別プログラムが記録されている。したがって、この話者識別プログラムをコンピュータで読み出して実行することによって、話者に負担を掛けることなく話者識別情報を得ることができる。また、入力生体情報と入力声紋情報との二つの話者情報に基づいて、精度良く話者識別結果が得られる。
【００３０】
【発明の実施の形態】
以下、この発明を図示の実施の形態により詳細に説明する。
【００３１】
＜第１実施の形態＞
図１は、本実施の形態の音声入力装置における機能ブロック図である。図１において、１は、発話者の生体情報を取得する上記生体情報取得手段としての生体情報取得センサであり、例えば指紋読取センサまたは掌紋読取センサ等である。上記指紋読取センサまたは掌紋読取センサ等はその読み取り方法を限定するものではなく、例えば、光学式の装置や、感圧シートを用いる装置や、電極の容量変化を測定する装置や、電解効果型トランジスタを用いる装置等を用いればよい。
【００３２】
２は、上記生体情報取得センサ１によって取得された生体情報を入力する生体情報入力部である。３は、一人または複数の人の生体情報が個人別に登録されている生体情報登録部である。この生体情報登録部３に登録される生体情報は、生体情報取得センサ１によって取得される生体情報に対応した情報、例えば指紋または掌紋等である。４は、生体情報入力部２から入力された入力生体情報と生体情報登録部３に登録されている登録生体情報とを比較照合して、話者識別結果を出力する生体情報照合部である。
【００３３】
５は、音声を入力する際にオンする音声入力スイッチである。６は、音声入力スイッチ５がオン状態の場合に、マイク（図示せず）等によって音声を音声データに変換し、Ａ／Ｄ（アナログ／デジタル）変換処理を施してディジタルの音声データを生成して後段に入力する音声入力部である。尚、音声入力部６におけるＡ／Ｄ変換処理は必須の処理ではなく、マイクを介して得られるアナログ音声データをそのまま入力しても差し支えない。
【００３４】
ここで、上述したような機能構成において、生体情報入力部２，生体情報照合部４および音声入力部６は、専用のＬＳＩ（大規模集積回路）素子等によって構成される。また、生体情報登録部３は、半導体メモリ，磁気メモリまたは記憶装置等で構成される。尚、上記各部を構成する素子等は、１つであっても複数が複合されたものであっても本実施の形態に影響はない。さらに、上記各部は、ＣＰＵ（中央演算処理装置）あるいはその周辺機器等で代用することも可能である。
【００３５】
ここで、図１においては、上記生体情報取得センサ１と音声入力スイッチ５とを一つのモジュールとして図示しているが、これが本実施の形態における特徴の一つである。つまり、音声入力スイッチ５がオン状態である場合に、生体情報取得センサ１によって生体情報を取得すると共に、音声入力部６を構成する上記マイクを介して音声データを生成するのである。尚、音声入力スイッチ５のオン状態とオフ状態との交替は、発話者が指先で音声入力スイッチ５を押下している間はオン状態であり、離している間はオフ状態とするのが好ましい。また、発話終了判定手段（図示せず）を有して、発話者が発話の直前に一度だけ音声入力スイッチ５を押下し、その後入力音声のパワー等から算出されるＳ／Ｎ比等を用いて上記発話終了判定手段によって発話区間の終了を判定し、発話区間が終了した場合あるいはその直後に音声入力スイッチ５を自動的にオフ状態としてもよい。
【００３６】
図２は、図１に示す音声入力装置によって実行される音声入力処理動作のフローチャートである。以下、図２に従って、音声入力処理動作について詳細に説明する。発話者によって音声入力スイッチ５がオフ状態からオン状態に切り替えられると音声入力処理動作がスタートする。
【００３７】
ステップＳ１で、上記生体情報入力部２によって、生体情報取得センサ１で取得された例えば指紋あるいは掌紋等の生体情報が、生体情報照合部４に入力される（生体情報入力処理）。ステップＳ２で、音声入力部６によって、上記マイク等によって音声が音声データに変換され、Ａ／Ｄ変換処理が施されてディジタルの音声データが生成されて後段に入力される（音声入力処理）。この場合、上述したように、上記Ａ／Ｄ変換処理は必須の処理ではなく、上記マイクを介して変換されたアナログ音声データをそのまま入力してもよい。
【００３８】
ステップＳ３で、上記生体情報照合部４によって、音声入力スイッチ５がオン状態であるかオフ状態であるか判別される。その結果、オフ状態であればステップＳ４に進む一方、オン状態であればステップＳ１に戻って生体情報入力処理および音声入力処理を続行する。ステップＳ４で、生体情報照合部４によって、上記ステップＳ１において入力された入力生体情報と生体情報登録部３に登録されている登録生体情報とが比較照合される。そして、照合結果に基づいて話者識別結果が出力される（生体情報照合処理）。そうした後に、音声入力処理動作を終了する。
【００３９】
こうして出力された上記話者識別結果は、例えば、上記入力された音声データを認識する際に用いる話者適応辞書の選択時に利用することによって、精度良く入力音声を認識することが可能になるのである。
【００４０】
上述したように、本実施の形態においては、話者によってオフ状態からオン状態に切り替えられる音声入力スイッチ５を設け、音声入力スイッチ５がオン状態の場合に、生体情報取得センサ１によって生体情報が取得されると共に、音声入力部６から音声が入力されるようにしている。そして、生体情報取得センサ１によって取得された生体情報と生体情報登録部３に登録された生体情報とを生体情報照合部４で照合し、照合結果に基づいて話者識別結果を得るようにしている。
【００４１】
このように、本実施の形態によれば、上記音声入力スイッチ５を操作して発声している話者の生体情報を自動的に取得することができる。したがって、話者情報の入力を個別に行う必要が無く、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とを得ることができる。また、認証に音声データを用いてはいないので認証に周囲雑音の影響等がなく、精度良く発話者の認証を行うことができる。また、上記発話者識別情報を音声認識時に用いる話者適応辞書を選択する際に利用すれば、日本アイ・ビー・エム株式会社のＶｉａＶｏｉｃｅの場合のように使用者がインデックスを選択することなく自動的に話者適応辞書を選択することが可能になる。さらに、音声入力スイッチ５を操作して発声している話者の生体情報を取得するので音声データと発話者との対応が取れ、セキュリティ効果を高めることができる。
【００４２】
尚、上記第１実施の形態においては、上記音声入力スイッチ５がオン状態の間生体情報の取得と音声の入力とを継続する場合について記述している。しかしながら、本実施の形態は、これに限定されるものではなく、例えば、図２に示す音声入力処理動作のステップＳ３において、音声入力スイッチ５はオン状態であると判別された場合には上記ステップＳ２に戻るようにして、生体情報の入力は音声入力スイッチ５がオン状態を継続している間において一度だけ行うようにしても差し支えない。また、上記ステップＳ１において生体情報を入力した後に上記ステップＳ２において音声を入力するようにしているが、生体情報入力処理と音声入力処理とは平行して行うのが好ましい。また、上記ステップＳ１において生体情報を入力し上記ステップＳ２において音声を入力した後に生体情報照合処理を行うようにしているが、生体情報照合処理と音声入力処理とは並行して行うのが好ましい。
【００４３】
＜第２実施の形態＞
本実施の形態は、上記第１実施の形態における音声入力装置を用いた話者識別装置に関するものである。
【００４４】
図３は、本実施の形態の話者識別装置における機能ブロック図である。図３において、生体情報取得センサ１１，生体情報入力部１２，生体情報登録部１３，生体情報照合部１４，音声入力スイッチ１５および音声入力部１６は、上記第１実施の形態における生体情報取得センサ１，生体情報入力部２，生体情報登録部３，生体情報照合部４，音声入力スイッチ５および音声入力部６と同じであり、音声入力装置１７を構成している。尚、音声入力装置１７の詳細な説明は省略する。
【００４５】
但し、上記音声入力部１６は、図１における音声入力部６の機能に加えて、上記マイク等を介して入力された音声データを分析して声紋情報を得る声紋情報抽出手段を備えて、抽出した声紋情報を後述の声紋照合部１８へ入力するようになっている。尚、上記声紋情報としては、例えば、ＦＦＴ（高速フーリエ変換）等によって得られる周波数分析結果に基づく特徴量等を用いればよい。
【００４６】
１９は、一人または複数の人の声紋情報を個人別に登録しておく声紋登録部である。上記声紋照合部１８は、音声入力部１６から入力された入力声紋情報と声紋登録部１９に登録されている登録声紋情報とを比較照合して、話者識別結果を出力する。２０は、生体情報照合部１４からの話者識別結果と声紋照合部１８からの話者識別結果とを統合して、一つの話者識別結果を出力する照合結果統合部である。
【００４７】
この照合結果統合部２０では、例えば、予め、生体情報入力部１２から入力される種々の生体情報および音声入力部１６から入力される種々の音声情報の各々に関して算出されたＳ／Ｎ比の平均値を、半導体メモリ（図示せず）等に保持しておく。そして、照合結果統合処理の際には、入力生体情報のＳ／Ｎ比および入力音声情報のＳ／Ｎ比と上記保存された平均値とを比較し、平均値に比してより大きいＳ／Ｎ比を呈する方の入力情報を採用し、この採用された入力情報に基づく話者識別結果を上記一つの話者識別結果として出力するのである。
【００４８】
または、例えば、上記生体情報入力部１２から入力される生体情報および音声入力部１６から入力される音声情報の各々に関して、生体情報照合部１４および声紋照合部１８が話者識別結果を生成した際の確からしさの情報を用いて、より確からしさが大きい方の入力情報に基づく話者識別結果を上記一つの話者識別結果として出力してもよい。
【００４９】
ここで、上述したような機能構成において、生体情報入力部１２，生体情報照合部１４，音声入力部１６，声紋照合部１８および照合結果統合部２０は、専用のＬＳＩ（大規模集積回路）素子等によって構成される。また、生体情報登録部１３および声紋登録部１９は、半導体メモリ，磁気メモリまたは記憶装置等で構成される。尚、上記各部を構成する素子等は、１つであっても複数が複合されたものであっても本実施の形態に影響はない。さらに、上記各部は、ＣＰＵ（中央演算処理装置）あるいはその周辺機器等で代用することも可能である。
【００５０】
図４は、図３に示す話者識別装置によって実行される話者識別処理動作のフローチャートである。以下、図４に従って、話者識別処理動作について詳細に説明する。発話者によって音声入力スイッチ１５がオフ状態からオン状態に切り替えられると話者識別処理動作がスタートする。
【００５１】
ステップＳ１１〜ステップＳ１４で、上記音声入力装置１７によって、図２に示す音声入力処理動作における上記ステップＳ１〜ステップＳ４と同様にして、生体情報入力処理および音声入力処理が行われ、音声入力スイッチ１５がオフ状態になると生体情報照合処理が行われる。但し、上記ステップＳ１２における音声入力処理においては、音声データが得られるとこの音声データが分析されて上記声紋情報（入力声紋情報）が抽出される。
【００５２】
ステップＳ１５で、上記声紋照合部１８によって、上記ステップＳ１２において抽出された入力声紋情報と声紋登録部１９に登録されている登録声紋情報とが比較照合される。そして、照合結果に基づいて話者識別結果が出力される（声紋情報照合処理）。ステップＳ１６で、照合結果統合部２０によって、上記ステップＳ１４における生体情報照合処理の結果得られた話者識別結果と、上記ステップＳ１５における声紋情報照合処理の結果得られた話者識別結果とが、上述したようにして統合されて一つの話者識別結果が出力される（照合結果統合処理）。そうした後、話者識別処理動作を終了する。
【００５３】
上述したように、本実施の形態においては、上記第１実施の形態の場合と同じ構成を有する音声入力装置１７を設けて、音声入力スイッチ１５がオン状態の場合に、生体情報入力部１２からの入力生体情報と登録生体情報との照合結果に基づいて話者識別結果が得られる。さらに、音声入力装置１７における音声入力部１６は、得られた音声データから声紋情報を抽出するようになっており、入力声紋情報と声紋登録部１９に登録されている登録声紋情報との照合を声紋照合部１８で行い、照合結果に基づいて話者識別結果を得る。そして、照合結果統合部２０によって、生体情報照合部１４からの入力生体情報に基づく話者識別結果と声紋照合部１８からの入力声紋情報に基づく話者識別結果とを統合して、一つの話者識別結果を出力するようにしている。
【００５４】
このように、本実施の形態によれば、上記音声入力スイッチ１５を操作して発声している話者の生体情報および声紋情報を自動的に取得することができる。したがって、話者情報の入力を個別に行う必要が無く、話者に負担を掛けることなく話者識別情報を得ることができる。また、入力生体情報および入力声紋情報の複数の話者情報に基づいて話者識別結果を得るようにしている。したがって、精度良く話者識別結果を得ることができる。さらに、音声入力スイッチ１５を操作して発声している話者の生体情報及び声紋情報を取得して話者識別を行うので、入力音声データと発話者との対応を確実に取ることができ、信頼性の高い話者識別を行うことができる。
【００５５】
尚、上記第２実施の形態においては、上記ステップＳ１４において生体情報照合処理を行った後に上記ステップＳ１５において声紋情報照合処理を行うようにしているが、生体情報照合処理と声紋情報照合処理とは平行して行うのが好ましい。また、上記生体情報照合処理と声紋情報照合処理とを照合結果統合処理とは独立して個別に行うようにしているが、生体情報照合処理と声紋情報照合処理とを照合結果統合処理内において行うようにしても差し支えない。
【００５６】
ところで、上記各実施の形態における生体情報入力部２・１２，生体情報登録部３・１３，生体情報照合部４・１４，音声入力部６・１６，声紋照合部１８，声紋登録部１９および照合結果統合部２０としての機能は、プログラム記録媒体に記録された音声入力プログラムおよび話者識別プログラムによっても実現できる。上記各実施の形態における上記プログラム記録媒体は、ＲＯＭ（リード・オンリ・メモリ）でなるプログラムメディアである。または、外部補助記憶装置に装着されて読み出されるプログラムメディアであってもよい。尚、何れの場合においても、上記プログラムメディアから音声入力プログラムまたは話者識別プログラムを読み出すプログラム読み出し手段は、上記プログラムメディアに直接アクセスして読み出す構成を有していてもよいし、ＲＡＭ（ランダム・アクセス・メモリ）に設けられたプログラム記憶エリア（図示せず）にダウンロードし、上記プログラム記憶エリアにアクセスして読み出す構成を有していてもよい。尚、上記プログラムメディアから上記ＲＡＭのプログラム記憶エリアにダウンロードするためのダウンロードプログラムは、予め本体装置に格納されているものとする。
【００５７】
ここで、上記プログラムメディアとは、本体側と分離可能に構成され、磁気テープやカセットテープ等のテープ系、フロッピーディスク，ハードディスク等の磁気ディスクやＣＤ（コンパクトディスク）‐ＲＯＭ，ＭＯ（光磁気）ディスク，ＭＤ（ミニディスク），ＤＶＤ（ディジタル多用途ディスク）等の光ディスクのディスク系、ＩＣ（集積回路）カードや光カード等のカード系、マスクＲＯＭ，ＥＰＲＯＭ（紫外線消去型ＲＯＭ），ＥＥＰＲＯＭ（電気的消去型ＲＯＭ），フラッシュＲＯＭ等の半導体メモリ系を含めた、固定的にプログラムを坦持する媒体である。
【００５８】
また、上記実施の形態における音声入力装置および話者識別装置は、インターネットを含む通信ネットワークと通信Ｉ／Ｆを介して接続可能な構成を有している場合には、上記プログラムメディアは、通信ネットワークからのダウンロード等によって流動的にプログラムを坦持する媒体であっても差し支えない。尚、その場合における上記通信ネットワークからダウンロードするためのダウンロードプログラムは、予め本体装置に格納されているものとする。あるいは、別の記録媒体からインストールされるものとする。
【００５９】
尚、上記記録媒体に記録されるものはプログラムのみに限定されるものではなく、データも記録することが可能である。
【００６０】
【発明の効果】
以上より明らかなように、この発明では、話者が音声入力スイッチを音声入力と生体情報取得とを可能にする側に切り替えて発話すると、音声を音声データに変換して後段に入力する一方、生体情報を取得して入力し、入力生体情報と登録生体情報とを照合して入力音声に係る話者識別結果を出力するので、発声している話者の生体情報を自動的に取得して話者識別を行うことができる。
【００６１】
したがって、話者情報の入力を個別に行う必要が無く、話者に負担を掛けることなく発話区間で区切られた音声データと話者識別情報とを得ることができる。また、認証に音声データを用いていないので、周囲雑音の影響等がなく精度良く発話者の認証を行うことができる。また、上記発話者識別情報を音声認識に用いる話者適応辞書の選択時に利用すれば、自動的に話者適応辞書を選択することが可能になる。
【００６２】
また、この発明では、話者が音声入力スイッチを音声入力と生体情報取得とを可能にする側に切り替えて発話すると、音声に基づく音声データと入力生体情報に基づく話者識別結果とを出力し、さらに、上記音声データから声紋情報を抽出し、入力声紋情報と登録声紋情報とを比較照合して話者識別結果を出力し、上記入力生体情報に基づく話者識別結果と上記入力声紋情報に基づく話者識別結果とを統合して一つの話者識別結果を出力するので、発声している話者の生体情報および声紋情報を自動的に入力して話者識別を行うことができる。
【００６３】
したがって、話者情報の入力を個別に行う必要が無く、話者に負担を掛けることなく話者識別情報を得ることができる。また、入力生体情報と入力声紋情報との二つの話者情報に基づいて、精度の良い話者識別結果を得ることができる。また、上述したごとく、今発声している話者の生体情報と声紋情報とに基づいて話者識別を行うことによって、入力音声データと発話者との対応を確実に取ることができる。したがって、信頼性の高い話者識別を行うことができる。
【図面の簡単な説明】
【図１】この発明の音声入力装置におけるブロック図である。
【図２】図１に示す音声入力装置によって実行される音声入力処理動作のフローチャートである。
【図３】この発明の話者識別装置におけるブロック図である。
【図４】図３に示す話者識別装置によって実行される話者識別処理動作のフローチャートである。
【符号の説明】
１，１１…生体情報取得センサ、
２，１２…生体情報入力部、
３，１３…生体情報登録部、
４，１４…生体情報照合部、
５，１５…音声入力スイッチ、
６，１６…音声入力部、
１７…音声入力装置、
１８…声紋照合部、
１９…声紋登録部、
２０…照合結果統合部。
[0001]
[Technical field of the invention]
The present invention relates to a voice input device for inputting voice, a speaker identification device using the same, a voice input method and a speaker identification method using the same, a voice input program, a speaker identification program, and a program recording medium.
[0002]
[Prior art]
Conventionally, there are user identification devices that identify a user using fingerprint data or voice data. For example, in the fingerprint authentication device disclosed in Japanese Patent Application Laid-Open No. 2000-165378, a mobile wireless terminal is activated based on the authentication result of fingerprint data read by a fingerprint reading unit provided on the power switch, and the mobile wireless terminal is enabled to communicate based on the authentication result of voice data input from a microphone.
[0003]
Further, for example, in the mobile phone disclosed in Japanese Patent Application Laid-Open No. H11-146057, the phone is enabled for calls based on the result of user identification using the user's voice data, image data of the auricle, iris, and the like, relief (ridge-and-valley) data of fingerprints and palm prints, and carbon dioxide concentration data.
[0004]
In addition, there are conventional speech recognition devices in which the user speaks in advance so that a speech dictionary is adapted to that specific user and registered. For example, in ViaVoice from IBM Japan, Ltd., the user first utters a large amount of speech so that the speech dictionary is adapted to that user and registered; after registration, the user selects an index indicating the registered speech dictionary, and speech is recognized using the dictionary adapted to that user.
[0005]
[Problems to be solved by the invention]
However, the conventional user identification devices described above have the following problems regarding the input of voice data. First, the fingerprint authentication device disclosed in Japanese Patent Application Laid-Open No. 2000-165378 authenticates voice input from the microphone during a call, but it does not mention the range of voice data to be authenticated (the acquisition timing), and accurate authentication over the entire call is technically difficult because of ambient noise and similar influences. The mobile phone disclosed in Japanese Patent Application Laid-Open No. H11-146057 improves security by identifying the user from various kinds of user information. However, each kind of user information is input separately, which burdens the user at input time.
[0006]
The conventional speech recognition device described above also has the following problem regarding selection of the registered speech dictionary. With ViaVoice from IBM Japan, Ltd., the user must select an index indicating the registered speech dictionary. When several users share the same speech recognition device, or when users change frequently, selecting the index every time the user changes burdens the user.
[0007]
Therefore, an object of the present invention is to provide a voice input device capable of obtaining voice data delimited by utterance sections, together with speaker identification information, without imposing a heavy burden on the user, as well as a speaker identification device using the voice input device, a voice input method and a speaker identification method using the same, a voice input program, a speaker identification program, and a program recording medium.
[0008]
[Means for Solving the Problems]
In order to achieve the above object, in the voice input device of the present invention, the speaker switches a voice input switch so as to enable voice input by the voice input means and to enable biometric information acquisition by the biometric information acquisition means. Voice data is then generated by the voice input means from the speaker's voice and input to the subsequent stage. Further, the biometric information acquired by the biometric information acquisition means is input from the biometric information input means. The biometric information matching means then compares and collates the input biometric information with the registered biometric information registered in the biometric information registration means, identifies the speaker of the input voice, and outputs a speaker identification result.
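The matching step described above can be illustrated with a minimal sketch. The patent does not specify how biometric information is represented or compared, so the feature vectors, the cosine similarity measure, the threshold, and the names `IdentificationResult` and `match_biometric` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class IdentificationResult:
    speaker_id: Optional[str]  # None when no registered template is close enough
    score: float               # similarity of the best match


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_biometric(input_features, registry, threshold=0.9):
    """Compare the input features with every registered template.

    `registry` maps speaker IDs to registered feature vectors. The feature
    representation, the similarity measure, and the threshold are
    assumptions of this sketch, not details taken from the patent.
    """
    best_id, best_score = None, -1.0
    for speaker_id, template in registry.items():
        score = cosine_similarity(input_features, template)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return IdentificationResult(None, best_score)
    return IdentificationResult(best_id, best_score)
```

Returning the score together with the speaker ID keeps open the option of comparing this result with a result from another modality later on.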
[0009]
Thus, the biometric information of the speaker speaking by operating the voice input switch is automatically acquired, and the speaker identification is performed. Therefore, it is not necessary to handle the input of the speaker information individually, and the voice data and the speaker identification information separated by the utterance section can be obtained without putting a burden on the speaker. In addition, since voice data is not used for authentication, the speaker is accurately authenticated without being affected by ambient noise. Further, if the speaker identification information is used for selecting a speaker adaptive dictionary used for speech recognition, it becomes possible to automatically select the speaker adaptive dictionary.
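As a rough illustration of the automatic dictionary selection mentioned above, the speaker identification result can serve as a lookup key. The function name, the dictionary objects, and the fallback to a default dictionary for unidentified speakers are hypothetical choices, not something the patent prescribes.

```python
def select_dictionary(speaker_id, adapted_dictionaries, default_dictionary):
    """Return the dictionary adapted to the identified speaker.

    `adapted_dictionaries` maps speaker IDs to speaker-adapted dictionaries.
    Falling back to `default_dictionary` for an unknown or unidentified
    speaker is an assumption of this sketch.
    """
    if speaker_id is None:
        return default_dictionary
    return adapted_dictionaries.get(speaker_id, default_dictionary)
```

With this arrangement the user never selects an index by hand; the identification result takes its place.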
[0010]
In the voice input device of one embodiment, the biometric information includes a fingerprint or a palm print. Speaker identification is thus performed easily, based on speaker information that is relatively easy to acquire during voice input.
[0011]
Further, in the voice input device of one embodiment, the voice input switch is in an on state, which enables the voice input and the biometric information acquisition, while it is pressed, and in an off state, which disables the voice input and the biometric information acquisition, while it is not pressed. Therefore, as with an ordinary switch, the voice input switch stays on for as long as it is held down, and the voice input and the biometric information acquisition are possible during that time.
[0012]
Further, in the voice input device of one embodiment, the voice input switch enters the on state, which enables the voice input and the biometric information acquisition, when it is pressed, and automatically enters the off state, which disables the voice input and the biometric information acquisition, when the utterance end determination means determines that the utterance has ended. In this way, voice data delimited by utterance sections is obtained automatically without burdening the speaker.
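The automatic switch-off can be sketched as a small state machine. The detailed description (paragraph 【００３５】) mentions judging the end of the utterance from an S/N ratio computed from the power of the input speech; everything else here, including the class name `AutoOffSwitch`, the fixed noise power estimate, the threshold, and the count of quiet frames tolerated before switching off, is an assumption for illustration.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


class AutoOffSwitch:
    """Switch that turns on when pressed and off when the utterance ends."""

    def __init__(self, noise_power, snr_threshold=4.0, hangover_frames=3):
        self.noise_power = noise_power          # assumed background noise power (> 0)
        self.snr_threshold = snr_threshold      # hypothetical power-ratio threshold
        self.hangover_frames = hangover_frames  # quiet frames tolerated after speech
        self.on = False
        self._speech_seen = False
        self._quiet_run = 0

    def press(self):
        """The speaker presses the switch once, just before speaking."""
        self.on = True
        self._speech_seen = False
        self._quiet_run = 0

    def feed(self, frame):
        """Process one audio frame; return True while the switch stays on."""
        if not self.on:
            return False
        snr = frame_power(frame) / self.noise_power
        if snr >= self.snr_threshold:
            self._speech_seen = True
            self._quiet_run = 0
        elif self._speech_seen:
            # Only count quiet frames once speech has started, so that the
            # pause between pressing and speaking does not end the utterance.
            self._quiet_run += 1
            if self._quiet_run >= self.hangover_frames:
                self.on = False
        return self.on
```

The hold-to-talk variant of the previous paragraph needs no such logic, since the switch state simply follows the finger.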
[0013]
Further, in the voice input device of one embodiment, the biometric information acquisition means is configured to continuously obtain biometric information while the voice input switch is on. Thus, it is possible to input accurate biological information.
[0014]
Further, in the voice input device of one embodiment, the biometric information acquisition means is configured to obtain the biometric information only once when the voice input switch is turned on. Thus, the biological information input process is performed quickly.
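The two acquisition policies (continuous while the switch is on, or once when it turns on) differ only in how often the sensor is read inside the input loop. The sketch below assumes three hypothetical callables standing in for the switch, the biometric sensor, and the microphone; none of these names come from the patent.

```python
def run_voice_input(switch_is_on, read_biometric, read_audio_frame, continuous=True):
    """Collect biometric samples and audio frames while the switch is on.

    switch_is_on, read_biometric, and read_audio_frame are hypothetical
    stand-ins for the voice input switch, the biometric sensor, and the
    microphone. With continuous=False the sensor is read only once, on the
    first pass through the loop.
    """
    biometric_samples = []
    audio_frames = []
    while switch_is_on():
        if continuous or not biometric_samples:
            biometric_samples.append(read_biometric())
        audio_frames.append(read_audio_frame())
    return biometric_samples, audio_frames
```

Continuous acquisition yields several samples to match against, which may help accuracy, while the one-shot policy keeps the biometric input step short.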
[0015]
Also, in the speaker identification device of the present invention, when the speaker switches the voice input switch of the voice input device so as to enable voice input by the voice input means and to enable biometric information acquisition by the biometric information acquisition means, the voice input device inputs voice data based on the speaker's voice to the subsequent stage and outputs a speaker identification result for the input voice based on the input biometric information.
[0016]
Further, voiceprint information is extracted from the voice data by voiceprint information extracting means provided in the voice input means of the voice input device. Then, the input voiceprint information and the registered voiceprint information registered in the voiceprint information registration unit are compared and collated by the voiceprint collation unit, a speaker related to the input voice is identified, and a speaker identification result is output. Then, the result of speaker identification by the biometric information matching means of the voice input device and the result of speaker identification by the voiceprint matching means are integrated by the matching result integration means, and one speaker identification result is output.
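The detailed description (paragraph 【００４５】) suggests feature values based on a frequency analysis such as the FFT as voiceprint information. The sketch below is one possible reading of that: it computes normalized energies in a few frequency bands. It uses a naive O(n²) DFT instead of an FFT only to stay self-contained, and the function name, the number of bands, and the normalization are assumptions.

```python
import cmath
import math


def extract_voiceprint(samples, num_bands=4):
    """Return normalized band energies of the signal as a feature vector.

    Uses a naive discrete Fourier transform over the lower half of the
    spectrum; a real implementation would use an FFT and more refined
    features. Band layout and normalization are assumptions of this sketch.
    """
    n = len(samples)
    half = n // 2
    spectrum = []
    for k in range(half):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    band_size = half // num_bands
    energies = [sum(spectrum[b * band_size:(b + 1) * band_size]) for b in range(num_bands)]
    total = sum(energies)
    # Normalize so that loudness differences do not dominate the comparison.
    return [e / total for e in energies] if total else energies
```

The resulting vector can be compared with registered voiceprints in the same way as any other feature vector.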
[0017]
Thus, the biometric information and the voiceprint information of the speaker who operates the voice input switch and speaks are input automatically, and speaker identification is performed. Therefore, it is not necessary to handle the input of the speaker information individually, and the speaker identification information can be obtained without imposing a burden on the speaker. Further, the speaker identification result is obtained from two pieces of speaker information, the input biometric information and the input voiceprint information, so an accurate speaker identification result can be obtained. Further, as described above, speaker identification is performed based on the biometric information and the voiceprint information of the speaker who is currently speaking. Therefore, the input voice data can be reliably associated with the speaker, and highly reliable speaker identification is performed.
[0018]
In the speaker identification device of one embodiment, the matching result integrating means compares the S/N ratio of the input biometric information and the S/N ratio of the input voice information with average values held in advance in memory means, and outputs, as the one speaker identification result, the speaker identification result based on whichever input information shows the larger S/N ratio relative to its average value. Therefore, of the speaker identification result based on the input biometric information and the speaker identification result based on the input voiceprint information, the more likely one is output as the final speaker identification result.
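One way to read the S/N-based integration is sketched below. The patent does not say how "larger relative to the average" is measured, so this sketch assumes S/N ratios expressed in decibels and compares each input's margin over its stored average; the function name and the tie-break in favor of the biometric result are also assumptions.

```python
def integrate_by_snr(bio_result, bio_snr_db, bio_avg_snr_db,
                     voice_result, voice_snr_db, voice_avg_snr_db):
    """Pick the result whose input exceeds its stored average S/N by more.

    All S/N values are assumed to be in dB, so the margin is a difference.
    A tie goes to the biometric result (an arbitrary choice of this sketch).
    """
    bio_margin = bio_snr_db - bio_avg_snr_db
    voice_margin = voice_snr_db - voice_avg_snr_db
    return bio_result if bio_margin >= voice_margin else voice_result
```

Comparing margins rather than raw S/N values keeps the two modalities on a common footing even when their typical S/N levels differ.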
[0019]
Further, in the speaker identification device of one embodiment, the matching result integrating means compares the certainty information obtained when the biometric information matching means produced its speaker identification result with the certainty information obtained when the voiceprint matching means produced its speaker identification result, and outputs the speaker identification result with the greater certainty as the one speaker identification result. Therefore, of the speaker identification result based on the input biometric information and the speaker identification result based on the input voiceprint information, the more likely one is output as the final speaker identification result.
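The certainty-based alternative reduces to choosing the result with the higher certainty value. The sketch assumes that both matchers report certainties on a comparable scale (for example, both in the range 0 to 1), which the patent does not state; the `MatchResult` type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional


class MatchResult(NamedTuple):
    speaker_id: Optional[str]
    certainty: float  # assumed comparable across both matchers


def integrate_by_certainty(bio: MatchResult, voice: MatchResult) -> MatchResult:
    """Return whichever result carries the greater certainty.

    A tie goes to the biometric result (an arbitrary choice of this sketch).
    """
    return bio if bio.certainty >= voice.certainty else voice
```

If the two matchers score on different scales, the certainties would need to be calibrated before a comparison like this is meaningful.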
[0020]
Further, in the voice input method of the present invention, when the speaker turns on the voice input switch, the biometric information of the speaker is input in response to the on state of the voice input switch, while voice data generated from the voice is input to the subsequent stage. Then, the input biometric information and the registered biometric information registered in the biometric information registration means are compared and collated, the speaker of the input voice is identified, and a speaker identification result is output.
[0021]
Thus, the biometric information of the speaker speaking by operating the voice input switch is automatically acquired, and the speaker identification is performed. Therefore, it is not necessary to handle the input of the speaker information individually, and the voice data and the speaker identification information separated by the utterance section can be obtained without putting a burden on the speaker. Further, if the speaker identification information is used for selecting a speaker adaptive dictionary used for speech recognition, it becomes possible to automatically select the speaker adaptive dictionary.
[0022]
In the speaker identification method of the present invention, the voice input method is executed to obtain a speaker identification result for the input voice based on the input biometric information, and voiceprint information is extracted from the generated voice data and input. Further, the input voiceprint information and the registered voiceprint information registered in the voiceprint information registration means are compared and collated, the speaker of the input voice is identified, and a speaker identification result is output. Then, the speaker identification result based on the input biometric information and the speaker identification result based on the input voiceprint information are integrated, and one speaker identification result is output.
[0023]
Thus, the biometric information and the voiceprint information of the speaker who operates the voice input switch and speaks are input automatically, and speaker identification is performed. Therefore, it is not necessary to individually handle the input of the speaker information, and the speaker identification information can be obtained without imposing a burden on the speaker. Furthermore, the input voice data can be reliably associated with the speaker, and highly reliable speaker identification is performed. Further, a speaker identification result can be obtained with high accuracy based on two pieces of speaker information, the input biometric information and the input voiceprint information.
[0024]
Also, the voice input program of the present invention causes a computer to function as the voice input means, the biometric information input means, the biometric information registration means, and the biometric information matching means of the voice input device when the voice input switch of the voice input device is operated by the speaker.
[0025]
Therefore, it is not necessary to handle the input of the speaker information individually, and the voice data and the speaker identification information separated by the utterance section can be obtained without putting a burden on the speaker. Further, if the speaker identification information is used for selecting a speaker adaptive dictionary used for speech recognition, it becomes possible to automatically select the speaker adaptive dictionary.
[0026]
Also, the speaker identification program of the present invention causes a computer to function as the voice input means, the biometric information input means, the biometric information registration means, the biometric information matching means, the voiceprint information extracting means, the voiceprint information registration means, the voiceprint matching means, and the matching result integrating means of the speaker identification device when the voice input switch of the speaker identification device is operated by the speaker.
[0027]
Therefore, it is not necessary to individually handle the input of the speaker information, and the speaker identification information can be obtained without imposing a burden on the speaker. Further, a speaker identification result can be obtained with high accuracy based on two pieces of speaker information of input biometric information and input voiceprint information.
[0028]
Further, the program recording medium of the present invention stores the above-mentioned voice input program. Therefore, by reading and executing the voice input program by the computer, the voice data and the speaker identification information separated by the utterance section can be obtained without imposing a burden on the speaker. Further, if the speaker identification information is used for selecting a speaker adaptive dictionary used for speech recognition, it becomes possible to automatically select the speaker adaptive dictionary.
[0029]
Further, the program recording medium of the present invention stores the above speaker identification program. Therefore, by reading and executing this speaker identification program by a computer, speaker identification information can be obtained without imposing a burden on the speaker. Further, a speaker identification result can be obtained with high accuracy based on two pieces of speaker information of input biometric information and input voiceprint information.
[0030]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described in detail with reference to the illustrated embodiments.
[0031]
<First embodiment>
FIG. 1 is a functional block diagram of the voice input device according to the present embodiment. In FIG. 1, reference numeral 1 denotes a biometric information acquisition sensor, such as a fingerprint reading sensor or a palm print reading sensor, serving as the biometric information acquisition means for acquiring biometric information of a speaker. The reading method of the fingerprint reading sensor, palm print reading sensor, or the like is not limited. For example, an optical device, a device using a pressure-sensitive sheet, a device that measures a change in the capacitance of an electrode, or a device using a field effect transistor may be used.
[0032]
Reference numeral 2 denotes a biological information input unit for inputting biological information acquired by the biological information acquisition sensor 1. Reference numeral 3 denotes a biometric information registration unit in which biometric information of one or more persons is registered for each individual. The biological information registered in the biological information registration unit 3 is information corresponding to the biological information acquired by the biological information acquisition sensor 1, for example, a fingerprint or a palm print. Reference numeral 4 denotes a biometric information matching unit that compares input biometric information input from the biometric information input unit 2 with registered biometric information registered in the biometric information registration unit 3 and outputs a speaker identification result.
[0033]
Reference numeral 5 denotes a voice input switch that is turned on when voice is input. Reference numeral 6 denotes a voice input unit that, when the voice input switch 5 is on, converts the voice into audio data by a microphone (not shown) or the like, performs A/D (analog/digital) conversion to generate digital audio data, and inputs the digital audio data to a subsequent stage. Note that the A/D conversion in the voice input unit 6 is not essential, and analog audio data obtained via the microphone may be input directly.
[0034]
Here, in the functional configuration as described above, the biometric information input unit 2, the biometric information matching unit 4, and the voice input unit 6 are configured by a dedicated LSI (large-scale integrated circuit) element or the like. The biometric information registration unit 3 is configured by a semiconductor memory, a magnetic memory, a storage device, or the like. It should be noted that the present embodiment is not affected whether the elements or the like constituting each of the above sections are one or a plurality of composite elements. Further, each of the above-mentioned units may be replaced by a CPU (Central Processing Unit) or its peripheral devices.
[0035]
Here, in FIG. 1, the biological information acquisition sensor 1 and the voice input switch 5 are illustrated as one module, and this is one of the features of the present embodiment. That is, when the voice input switch 5 is on, the biometric information is acquired by the biometric information acquisition sensor 1 and voice data is generated via the microphone constituting the voice input unit 6. It is preferable that the voice input switch 5 be on while the speaker is pressing it down with a fingertip and off once the speaker releases it. Alternatively, utterance end determination means (not shown) may be provided. In that case, the speaker presses the voice input switch 5 once immediately before the utterance, the utterance end determination means then determines the end of the utterance section using, for example, the S/N ratio calculated from the power of the input voice, and the voice input switch 5 is automatically turned off when the utterance section ends or immediately thereafter.
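The utterance end determination described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the frame-based processing, the 6 dB threshold, and the three-frame hangover are all assumptions introduced for illustration. The sketch computes the S/N ratio of each frame from its power relative to a given noise power and reports the frame at which trailing silence begins.

```python
import math
from typing import Optional, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def find_utterance_end(
    frames: Sequence[Sequence[float]],
    noise_power: float,
    snr_threshold_db: float = 6.0,   # assumed value, not from the source
    hangover_frames: int = 3,        # assumed value, not from the source
) -> Optional[int]:
    """Return the index of the first frame of the trailing silence, or None.

    The utterance is treated as finished once speech has been observed and
    the S/N ratio then stays below the threshold for `hangover_frames`
    consecutive frames.
    """
    speech_seen = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        power = max(frame_power(frame), 1e-12)  # avoid taking the log of zero
        snr_db = 10.0 * math.log10(power / noise_power)
        if snr_db >= snr_threshold_db:
            speech_seen = True
            quiet_run = 0
        elif speech_seen:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                return i - hangover_frames + 1
    return None
```

In a device of this kind, the returned index would mark the point at which the voice input switch 5 could be turned off automatically. The hangover keeps short pauses inside an utterance from being mistaken for its end.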
[0036]
FIG. 2 is a flowchart of the voice input processing operation performed by the voice input device shown in FIG. 1. Hereinafter, the voice input processing operation will be described in detail with reference to FIG. 2. When the voice input switch 5 is switched from the off state to the on state by the speaker, the voice input processing operation starts.
[0037]
In step S1, the biometric information input unit 2 inputs biometric information such as a fingerprint or a palm print acquired by the biometric information acquisition sensor 1 to the biometric information matching unit 4 (biometric information input processing). In step S2, the audio input unit 6 converts the audio into audio data by the microphone or the like, performs A / D conversion processing, generates digital audio data, and inputs the digital audio data to the subsequent stage (audio input processing). In this case, as described above, the A / D conversion processing is not an essential processing, and the converted analog audio data may be directly input via the microphone.
[0038]
In step S3, the biological information collating unit 4 determines whether the voice input switch 5 is on or off. As a result, if it is off, the process proceeds to step S4, while if it is on, the process returns to step S1 to continue the biometric information input process and the voice input process. In step S4, the input biometric information input in step S1 is compared with the registered biometric information registered in the biometric information registration unit 3 by the biometric information matching unit 4. Then, a speaker identification result is output based on the matching result (biological information matching processing). After that, the voice input processing operation ends.
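The control flow of steps S1 to S4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the callables `switch_is_on`, `read_biometric`, and `read_audio` are hypothetical stand-ins for the voice input switch 5, the biometric information input unit 2, and the voice input unit 6, and exact string equality stands in for real fingerprint or palm print matching.

```python
from typing import Callable, Dict, List, Optional, Tuple


def voice_input_operation(
    switch_is_on: Callable[[], bool],
    read_biometric: Callable[[], str],
    read_audio: Callable[[], bytes],
    registered: Dict[str, str],
) -> Tuple[bytes, Optional[str]]:
    """Run steps S1 to S4 once the switch has been turned on.

    `registered` maps a stored biometric template to a speaker name
    (a hypothetical layout for the registration unit).
    """
    audio_chunks: List[bytes] = []
    latest_biometric = ""
    while True:
        latest_biometric = read_biometric()      # S1: biometric information input
        audio_chunks.append(read_audio())        # S2: voice input
        if not switch_is_on():                   # S3: switch off, leave the loop
            break
    speaker = registered.get(latest_biometric)   # S4: biometric information matching
    return b"".join(audio_chunks), speaker
```

The function returns the voice data collected over the utterance section together with the identified speaker, or `None` when no registered template matches.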
[0039]
The speaker identification result thus output is used, for example, when selecting a speaker adaptive dictionary used for recognizing the input voice data, so that the input voice can be recognized with high accuracy.
[0040]
As described above, in the present embodiment, the voice input switch 5 that is switched from the off state to the on state by the speaker is provided, and when the voice input switch 5 is in the on state, the biometric information is acquired by the biometric information acquisition sensor 1 and the voice is input from the voice input unit 6. Then, the biometric information acquired by the biometric information acquisition sensor 1 and the biometric information registered in the biometric information registration unit 3 are collated by the biometric information collation unit 4, and a speaker identification result is obtained based on the collation result.
[0041]
As described above, according to the present embodiment, it is possible to automatically acquire the biological information of the speaking speaker by operating the voice input switch 5. Therefore, it is not necessary to input speaker information individually, and it is possible to obtain the voice data and the speaker identification information separated by the utterance section without putting a burden on the speaker. In addition, since voice data is not used for authentication, the authentication is not affected by ambient noise and the like, and the speaker can be accurately authenticated. In addition, if the speaker identification information is used to select a speaker adaptive dictionary used for speech recognition, the speaker adaptive dictionary can be selected automatically, without the user having to select an index as in the case of Via Voice of IBM Japan, Ltd. Further, because the biometric information is acquired from the speaker who is operating the voice input switch 5 while speaking, the correspondence between the speech data and the speaker can be established, and the security effect can be enhanced.
[0042]
In the first embodiment, the case where the acquisition of the biological information and the input of the voice are continued while the voice input switch 5 is on is described. However, the present embodiment is not limited to this. For example, if it is determined in step S3 of the voice input processing operation shown in FIG. 2 that the voice input switch 5 is on, the process may return to step S2 instead of step S1, so that the input of biometric information is performed only once while the voice input switch 5 is kept on. In addition, although the voice is input in step S2 after the biometric information is input in step S1, the biometric information input process and the voice input process are preferably performed in parallel. Likewise, although the biometric information matching process is performed after the biometric information is input in step S1 and the voice is input in step S2, it is preferable that the biometric information matching process and the voice input process be performed in parallel.
[0043]
<Second embodiment>
This embodiment relates to a speaker identification device using the voice input device according to the first embodiment.
[0044]
FIG. 3 is a functional block diagram of the speaker identification device according to the present embodiment. In FIG. 3, the biometric information acquisition sensor 11, the biometric information input unit 12, the biometric information registration unit 13, the biometric information matching unit 14, the voice input switch 15, and the voice input unit 16 are the same as the biometric information acquisition sensor 1, the biometric information input unit 2, the biometric information registration unit 3, the biometric information collation unit 4, the voice input switch 5, and the voice input unit 6 of the first embodiment, and together constitute a voice input device 17. The detailed description of the voice input device 17 is omitted.
[0045]
However, in addition to the function of the voice input unit 6 in FIG. 1, the voice input unit 16 includes voiceprint information extraction means for analyzing voice data input via the microphone or the like to obtain voiceprint information. The obtained voiceprint information is input to a voiceprint collation unit 18 described later. Note that as the voiceprint information, for example, a feature amount based on a frequency analysis result obtained by FFT (Fast Fourier Transform) or the like may be used.
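As an illustration of a frequency-analysis feature of this kind, the following Python sketch derives a small feature vector from one frame of samples. It is a sketch under assumptions, not the method of the source: the choice of normalised band energies as the feature, the number of bands, and the function names are hypothetical, and a naive discrete Fourier transform is used in place of an FFT to keep the example dependency-free (an FFT library would give the same magnitudes faster).

```python
import cmath
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of the DFT bins 0 .. N/2 (naive O(N^2) transform)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


def band_energy_features(samples: Sequence[float], bands: int = 4) -> List[float]:
    """Normalised energy per frequency band (a stand-in voiceprint vector)."""
    mags = magnitude_spectrum(samples)[1:]       # drop the DC bin
    width = len(mags) // bands                   # leftover top bins are ignored
    energies = [
        sum(m * m for m in mags[b * width:(b + 1) * width])
        for b in range(bands)
    ]
    total = sum(energies) or 1.0                 # guard against an all-zero frame
    return [e / total for e in energies]
```

For a pure low-frequency tone, almost all of the energy falls into the first band, so the first element of the returned vector is close to 1.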
[0046]
Reference numeral 19 denotes a voiceprint registration unit for registering voiceprint information of one or more persons for each individual. The voiceprint matching unit 18 collates the input voiceprint information input from the voice input unit 16 with the registered voiceprint information registered in the voiceprint registration unit 19, and outputs a speaker identification result. Reference numeral 20 denotes a matching result integrating unit that integrates the speaker identification result from the biometric information matching unit 14 and the speaker identification result from the voiceprint matching unit 18 and outputs one speaker identification result.
[0047]
In the matching result integrating unit 20, for example, the average values of the S/N ratios calculated in advance for various biometric information input from the biometric information input unit 12 and for various voice information input from the voice input unit 16 are stored in a semiconductor memory (not shown) or the like. Then, in the matching result integration processing, the S/N ratio of the input biometric information and the S/N ratio of the input voice information are each compared with the corresponding stored average value, the input information exhibiting an S/N ratio larger than its average value is adopted, and the speaker identification result based on the adopted input information is output as the one speaker identification result.
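The S/N-ratio-based integration can be sketched in Python as follows. The data layout and names are hypothetical. The source does not say what happens when both inputs, or neither input, exceed their stored averages, so the tie-break by larger margin and the `None` result below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModalityResult:
    speaker: str      # identification result for this modality
    snr_db: float     # S/N ratio measured on the current input


def integrate_by_snr(
    biometric: ModalityResult,
    voiceprint: ModalityResult,
    mean_biometric_snr_db: float,
    mean_voice_snr_db: float,
) -> Optional[str]:
    """Adopt the modality whose S/N ratio exceeds its stored average."""
    bio_margin = biometric.snr_db - mean_biometric_snr_db
    voice_margin = voiceprint.snr_db - mean_voice_snr_db
    bio_ok = bio_margin > 0
    voice_ok = voice_margin > 0
    if bio_ok and not voice_ok:
        return biometric.speaker
    if voice_ok and not bio_ok:
        return voiceprint.speaker
    if bio_ok and voice_ok:
        # Not specified in the source: assume the larger margin wins.
        return biometric.speaker if bio_margin >= voice_margin else voiceprint.speaker
    return None  # neither input is better than average (assumed behaviour)
```

Comparing each input against its own stored average, rather than comparing the two S/N ratios directly, keeps the decision meaningful even though fingerprint and audio S/N ratios are measured on different scales.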
[0048]
Alternatively, for example, the certainty obtained when the biometric information matching unit 14 generates its speaker identification result for the biometric information input from the biometric information input unit 12 may be compared with the certainty obtained when the voiceprint matching unit 18 generates its speaker identification result for the voice information input from the voice input unit 16, and the speaker identification result based on the input information with the higher certainty may be output as the one speaker identification result.
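A certainty-based integration of this kind reduces to picking the result with the higher score. The short Python sketch below assumes each matching unit reports a `(speaker, certainty)` pair on a comparable scale. That representation, the function name, and the tie-break are assumptions, not details from the source.

```python
from typing import Tuple


def integrate_by_certainty(
    biometric: Tuple[str, float],
    voiceprint: Tuple[str, float],
) -> str:
    """Each argument is (speaker, certainty); return the more certain speaker."""
    # On a tie, max() keeps the first item, so the biometric result wins.
    # That preference is an assumption, not something the source specifies.
    return max((biometric, voiceprint), key=lambda result: result[1])[0]
```

For this to be meaningful, the two certainty values must be normalised to a common range before they are compared.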
[0049]
Here, in the functional configuration as described above, the biometric information input unit 12, the biometric information matching unit 14, the voice input unit 16, the voiceprint matching unit 18, and the matching result integrating unit 20 are configured by dedicated LSI (large-scale integrated circuit) elements or the like. The biometric information registration unit 13 and the voiceprint registration unit 19 are configured by a semiconductor memory, a magnetic memory, a storage device, or the like. It should be noted that the present embodiment is not affected whether the elements or the like constituting each of the above sections are one or a plurality of composite elements. Further, each of the above-mentioned units may be replaced by a CPU (Central Processing Unit) or its peripheral devices.
[0050]
FIG. 4 is a flowchart of the speaker identification processing operation executed by the speaker identification device shown in FIG. 3. Hereinafter, the speaker identification processing operation will be described in detail with reference to FIG. 4. When the voice input switch 15 is switched from the off state to the on state by the speaker, the speaker identification processing operation starts.
[0051]
In steps S11 to S14, the voice input device 17 performs biological information input processing and voice input processing in the same manner as in steps S1 to S4 of the voice input processing operation shown in FIG. 2, and when the voice input switch 15 is turned off, the biometric information matching process is performed. However, in the voice input process in step S12, when voice data is obtained, the voice data is analyzed and the voiceprint information (input voiceprint information) is extracted.
[0052]
In step S15, the voiceprint verification unit 18 collates the input voiceprint information extracted in step S12 with the registered voiceprint information registered in the voiceprint registration unit 19. Then, a speaker identification result is output based on the collation result (voiceprint information collation processing). In step S16, the matching result integration unit 20 integrates, as described above, the speaker identification result obtained by the biometric information matching process in step S14 and the speaker identification result obtained by the voiceprint information matching process in step S15, and outputs one speaker identification result (collation result integration processing). After that, the speaker identification processing operation ends.
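The voiceprint collation of step S15 can be illustrated, under assumptions, as a nearest-neighbour search over registered feature vectors. The source does not specify the matching method. The Euclidean distance, the mapping of distance to a certainty value, and the names below are all hypothetical choices made for this Python sketch.

```python
import math
from typing import Dict, Sequence, Tuple


def match_voiceprint(
    input_vec: Sequence[float],
    registered: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return (speaker, certainty) for the closest registered voiceprint."""
    best_name, best_dist = min(
        ((name, math.dist(input_vec, vec)) for name, vec in registered.items()),
        key=lambda item: item[1],
    )
    # Map distance to a score in (0, 1]; an exact match scores 1.0.
    return best_name, 1.0 / (1.0 + best_dist)
```

A result of this shape, a speaker plus a certainty, is what an integration step such as S16 would need in order to weigh the voiceprint result against the biometric one.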
[0053]
As described above, in the present embodiment, the voice input device 17 having the same configuration as that of the first embodiment is provided, and when the voice input switch 15 is on, a speaker identification result is obtained based on the collation result between the input biometric information and the registered biometric information. Further, the voice input unit 16 of the voice input device 17 extracts voiceprint information from the obtained voice data, and the voiceprint collating unit 18 collates the input voiceprint information with the registered voiceprint information registered in the voiceprint registration unit 19 and obtains a speaker identification result based on the collation result. Then, the matching result integrating unit 20 integrates the speaker identification result based on the input biometric information from the biometric information matching unit 14 and the speaker identification result based on the input voiceprint information from the voiceprint matching unit 18, and outputs one speaker identification result.
[0054]
As described above, according to the present embodiment, the voice input switch 15 can be operated to automatically acquire the biological information and the voiceprint information of the speaking speaker. Therefore, it is not necessary to individually input speaker information, and speaker identification information can be obtained without imposing a burden on the speaker. Further, a speaker identification result is obtained based on a plurality of pieces of speaker information, namely the input biometric information and the input voiceprint information. Therefore, a speaker identification result can be obtained with high accuracy. Furthermore, since the voice input switch 15 is operated to acquire the biometric information and voiceprint information of the uttering speaker and perform the speaker identification, the correspondence between the input voice data and the speaker can be ensured, and reliable speaker identification can be performed.
[0055]
In the second embodiment, the voiceprint information matching process is performed in step S15 after the biometric information matching process is performed in step S14. However, it is preferable to carry out the two matching processes in parallel. Further, the biometric information matching process and the voiceprint information matching process are performed separately from and independently of the matching result integration process. However, they may instead be performed within the matching result integration process.
[0056]
By the way, the functions of the biometric information input units 2 and 12, the biometric information registration units 3 and 13, the biometric information matching units 4 and 14, the voice input units 6 and 16, the voiceprint matching unit 18, the voiceprint registration unit 19, and the matching result integrating unit 20 in the above embodiments can also be realized by a voice input program and a speaker identification program recorded on a program recording medium. The program recording medium in each of the above embodiments is a program medium composed of a ROM (Read Only Memory). Alternatively, it may be a program medium that is mounted on and read from an external auxiliary storage device. In either case, the program reading means for reading the voice input program or the speaker identification program from the program medium may be configured to access and read the program medium directly, or to download the program to a program storage area (not shown) provided in a RAM (random access memory) and then access and read that program storage area. It is assumed that a download program for downloading from the program medium to the program storage area of the RAM is stored in the main device in advance.
[0057]
Here, the above-mentioned program medium is a medium that is configured to be separable from the main body and that fixedly carries a program. It includes tape systems such as magnetic tape and cassette tape; disk systems comprising magnetic disks such as floppy disks and hard disks and optical disks such as CD (compact disk)-ROM, MO (magneto-optical) disk, MD (mini disk), and DVD (digital versatile disk); card systems such as IC (integrated circuit) cards and optical cards; and semiconductor memory systems such as mask ROM, EPROM (ultraviolet erasable ROM), EEPROM (electrically erasable ROM), and flash ROM.
[0058]
In the case where the voice input device and the speaker identification device according to the above-described embodiments have a configuration that can be connected to a communication network including the Internet via a communication I/F, the program medium may be a medium that carries the program fluidly, for example by downloading it from the communication network. In this case, it is assumed that a download program for downloading from the communication network is stored in the main device in advance or is installed from another recording medium.
[0059]
It should be noted that what is recorded on the recording medium is not limited to only a program, and data can also be recorded.
[0060]
[Effect of the invention]
As is clear from the above, in the present invention, when the speaker switches the voice input switch to the side that enables voice input and biometric information acquisition and then speaks, the voice is converted into voice data and input to the subsequent stage, and the biometric information is acquired and input. The input biometric information is then compared with the registered biometric information to output a speaker identification result relating to the input voice, so that speaker identification can be performed by automatically acquiring the biometric information of the speaking speaker.
[0061]
Therefore, it is not necessary to input speaker information individually, and it is possible to obtain the voice data and the speaker identification information separated by the utterance section without putting a burden on the speaker. In addition, since voice data is not used for authentication, it is possible to accurately perform speaker authentication without being affected by ambient noise. Further, if the speaker identification information is used when selecting a speaker adaptation dictionary used for speech recognition, the speaker adaptation dictionary can be automatically selected.
[0062]
Further, in the present invention, when the speaker switches the voice input switch to the side that enables voice input and biometric information acquisition and then speaks, the voice data based on the voice and the speaker identification result based on the input biometric information are output. Further, voiceprint information is extracted from the voice data, the input voiceprint information is collated with the registered voiceprint information, and a speaker identification result is output. The speaker identification result based on the input biometric information and the speaker identification result based on the input voiceprint information are then integrated to output one speaker identification result, so that speaker identification can be performed by automatically inputting the biometric information and the voiceprint information of the speaking speaker.
[0063]
Therefore, it is not necessary to individually input speaker information, and speaker identification information can be obtained without imposing a burden on the speaker. Further, a highly accurate speaker identification result can be obtained based on the two pieces of speaker information of the input biometric information and the input voiceprint information. Further, as described above, by performing the speaker identification based on the biometric information and the voiceprint information of the speaker that is currently speaking, the correspondence between the input voice data and the speaker can be ensured. Therefore, highly reliable speaker identification can be performed.
[Brief description of the drawings]
FIG. 1 is a block diagram of a voice input device according to the present invention.
FIG. 2 is a flowchart of a voice input processing operation executed by the voice input device shown in FIG. 1;
FIG. 3 is a block diagram of the speaker identification device according to the present invention.
FIG. 4 is a flowchart of a speaker identification processing operation executed by the speaker identification device shown in FIG. 3;
[Explanation of symbols]
1, 11 ... biological information acquisition sensor,
2, 12 ... biological information input unit,
3, 13 ... biological information registration unit,
4, 14 ... biometric information collation unit,
5, 15 ... voice input switch,
6, 16 ... voice input unit,
17 ... voice input device,
18 ... voiceprint collation unit,
19 ... voiceprint registration unit,
20 ... matching result integration unit.

Claims

Voice input means for generating voice data based on the voice of the speaker and inputting the voice data to a subsequent stage;
Biological information acquisition means for acquiring the biological information of the speaker,
Biological information input means for inputting biological information acquired by the biological information acquisition means,
Biometric information registration means in which biometric information is registered for each individual;
Biometric information matching means for collating input biometric information input from the biometric information input means with registered biometric information registered in the biometric information registration means, and identifying a speaker related to the input voice,
A voice input device comprising the foregoing means and a voice input switch that is operated by the speaker to switch whether voice input by the voice input means is enabled and whether biometric information acquisition by the biometric information acquisition means is enabled, such that biometric information can be acquired when voice input is enabled.

The voice input device according to claim 1,
A voice input device, wherein the biometric information includes a fingerprint or a palm print.

The voice input device according to claim 1,
A voice input device, wherein the voice input switch is in an on state that enables the voice input and the acquisition of biometric information while pressed, and is in an off state that disables the voice input and the acquisition of biometric information while not pressed.

The voice input device according to claim 1,
Further comprising speech end determination means for determining the end of the utterance based on the input voice,
A voice input device, wherein the voice input switch enters an on state that enables the voice input and the biometric information acquisition when pressed, and enters an off state that disables the voice input and the biometric information acquisition when the speech end determination means determines that the utterance has ended.

The voice input device according to claim 3 or 4,
The voice input device, wherein the biometric information obtaining means continuously obtains biometric information while the voice input switch is on.

The voice input device according to claim 3 or 4,
The voice input device, wherein the biometric information obtaining means obtains biometric information only once when the voice input switch is turned on.

A voice input device according to any one of claims 1 to 6,
Voiceprint information extraction means provided in the voice input means of the voice input device, for extracting voiceprint information from the generated voice data;
Voiceprint information registration means in which voiceprint information is registered for each individual;
Voiceprint matching means for comparing and collating input voiceprint information input from the voice input means with registered voiceprint information registered in the voiceprint information registration means, and identifying a speaker related to the input voice;
A speaker identification device comprising the foregoing and collation result integration means that integrates the speaker identification result by the biometric information matching means of the voice input device and the speaker identification result by the voiceprint matching means, and outputs one speaker identification result.

The speaker identification device according to claim 7,
The collation result integration means:
Stores in memory means the average value of the S/N ratio of biometric information and the average value of the S/N ratio of voice information, calculated in advance based on various biometric information and voice information input from the biometric information input means and the voice input means, and
At the time of speaker identification, compares the S/N ratio of the input biometric information and the S/N ratio of the input voice information with the stored average values, and outputs the speaker identification result based on the input information that exhibits an S/N ratio higher than its average value as the one speaker identification result.

The speaker identification device according to claim 7,
The collation result integration means compares information on the certainty with which the biometric information matching means of the voice input device obtained its speaker identification result with information on the certainty with which the voiceprint matching means obtained its speaker identification result, and outputs the speaker identification result with the higher certainty as the one speaker identification result.

A voice input method comprising:
A biometric information input step of acquiring and inputting biometric information of the speaker in response to the on state of a voice input switch;
A voice input step of generating voice data based on the voice of the speaker and inputting the generated voice data to a subsequent stage, in response to the on state of the voice input switch; and
A biometric information matching step of collating the input biometric information with registered biometric information registered in biometric information registration means to identify the speaker associated with the input voice.

A speaker identification method comprising:
A voice input step of executing the voice input method according to claim 10 to obtain a speaker identification result for the input voice based on the input biometric information, and of extracting and inputting voiceprint information from the generated voice data;
A voiceprint verification step of collating the input voiceprint information with registered voiceprint information registered in voiceprint information registration means to identify the speaker related to the input voice; and
A matching result integration step of integrating the speaker identification result based on the input biometric information and the speaker identification result based on the input voiceprint information to output one speaker identification result.

A voice input program that causes a computer to function, when the voice input switch according to claim 1 is operated by a speaker, as the voice input means, the biometric information input means, the biometric information registration means, and the biometric information matching means according to claim 1.

A speaker identification program that causes a computer to function, when the voice input switch according to claim 7 is operated by a speaker, as the voice input means, biometric information input means, biometric information registration means, biometric information matching means, voiceprint information extraction means, voiceprint information registration means, voiceprint matching means, and collation result integration means according to claim 7.

A computer-readable program recording medium on which the voice input program according to claim 12 is recorded.

A computer-readable program recording medium on which the speaker identification program according to claim 13 is recorded.