JP2006011591A

JP2006011591A - Individual authentication system

Info

Publication number: JP2006011591A
Application number: JP2004184663A
Authority: JP
Inventors: Shogo Kameyama; 昌吾亀山
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2004-06-23
Filing date: 2004-06-23
Publication date: 2006-01-12

Abstract

PROBLEM TO BE SOLVED: To provide an individual authentication system that allows a further improvement of the security level and whose security is difficult to break unless the living person himself or herself directly performs the operation. SOLUTION: Four kinds of units, namely contact-type biometric feature information detection parts 342, 343, a face photographing camera 341, a bone conduction sound detection part 340 and an air conduction sound detection part 304, form the parent group of authentication feature information acquisition parts; at least two kinds are selected from this group and provided in a cellphone 1. In the telephone-use grasping and holding state, acquisition of authentication feature information by at least two designated authentication feature information acquisition parts is executed simultaneously. When the authentication feature information is not acquired simultaneously, acceptance authentication (e.g. authentication that the person is a regular user) of the person subject to authentication processing is not granted. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a personal authentication system using a mobile phone.

JP 2000-259828 A; JP 2004-80080 A

So-called speaker recognition, which uses the speaker-specific information contained in the voice waveform of the person to be authenticated, is widely used as a personal authentication method. Recently, for example, as disclosed in Patent Documents 1 and 2, various personal authentication methods including speaker recognition have been proposed to raise the security level of mobile phones. The number of mobile phones in use has been growing rapidly and competition to develop new models has intensified, so the replacement cycle has shortened. Because a mobile phone stores personal data such as a phone book and an e-mail address list, it has been pointed out that discarded phones with data still on them are traded as junk and cause leaks of personal information. In addition, mobile phones with information and communication terminal functions such as Internet access are becoming standard and are widely used for paid information services, payment for shopping, and mobile banking, and their use as lock operation terminals for houses, buildings, and automobiles is also being considered. A higher security level is therefore required. Patent Documents 1 and 2 disclose techniques that raise the security level by combining voice authentication with other authentication methods such as face image matching and fingerprint matching.

In recent years, as security systems have become more sophisticated, the criminal techniques used to break them have also become more sophisticated and more brazen. For example, when authentication by images such as fingerprints or faces is combined with authentication by voice, as in Patent Documents 1 and 2, breaking the security appears at first glance to be very difficult. With the following approach, however, it is not impossible to slip through every one of the multiple security steps. A photograph or video is used for the face, a tape recording for the voice, and a photoengraved stamp, or in the extreme case an arm or finger cut from the person to be authenticated, for the fingerprint. Each is used separately to reproduce virtually the presence of the legitimate user, and acceptance authentication is obtained one step after another (hereinafter, this kind of fraud is referred to as "substitute false authentication"). This approach can break the security even when the living person is not present, and it does not necessarily require high-risk methods such as kidnapping or abduction. Moreover, even when a violent crime such as kidnapping is involved, once the information needed for authentication has been taken from the person, copies or acquired items (a finger, for example) suffice from then on. There is therefore a danger that the criminal will not hesitate to kill the person, who is no longer needed, in order to silence him or her.

An object of the present invention is to provide a personal authentication system that allows a further improvement of the security level and whose security is difficult to break unless the living person directly operates it.

Means and actions / effects for solving the problems

The present invention relates to a personal authentication system that authenticates a person subject to authentication processing by using a mobile phone. To solve the above problem, the system comprises:
an authentication feature information acquisition unit provided in the mobile phone and including two or more units selected from the group consisting of
a contact-type biometric feature information detection unit that detects biometric feature information of the hand and is provided at a position that the hand of the person subject to authentication processing touches when that person holds the mobile phone in a telephone-use grasping and holding state, that is, grasped and held in the same state as when telephone functions other than authentication processing are used,
a face photographing camera provided at a position from which the face of the person subject to authentication processing can be photographed when that person holds the mobile phone in the telephone-use grasping and holding state,
a bone conduction sound detection unit that detects the voice information of the person subject to authentication processing as bone conduction sound, and
an air conduction sound detection unit that detects the voice information of the person subject to authentication processing as air conduction sound
(when two or more kinds of contact-type biometric feature information detection units are provided, the selection may consist of two kinds of contact-type biometric feature information detection units only);
authentication feature information acquisition control means, provided in the mobile phone, that in the telephone-use grasping and holding state executes the acquisition of authentication feature information by the two or more authentication feature information acquisition units simultaneously for at least two designated units; and
authentication processing means, provided inside or outside the mobile phone, that performs authentication processing of the person subject to authentication processing based on the individual authentication feature information acquired by each of the two or more authentication feature information acquisition units,
wherein acceptance authentication of the person subject to authentication processing is not granted when the authentication feature information has not been acquired simultaneously for the at least two designated authentication feature information acquisition units.

"Telephone functions other than authentication processing" refers to the functions of a general mobile phone, of which the call function is essential; one or more additional functions such as e-mail composition and transmission, still image or video capture, and videophone can be added.

In mobile phone authentication technologies such as those of Patent Documents 1 and 2, fraud such as the substitute false authentication described above is possible because multiple forms of authentication are merely gathered together, and no particular consideration is given to determining whether a living legitimate user is directly performing the authentication operation.

According to the present invention, four kinds of units, namely the contact-type biometric feature information detection unit, the face photographing camera, the bone conduction sound detection unit, and the air conduction sound detection unit, form the parent group of authentication feature information acquisition units, and at least two kinds are selected from this group and provided in the mobile phone. In the telephone-use grasping and holding state, the acquisition of authentication feature information by at least two designated authentication feature information acquisition units is executed simultaneously, and when that information has not been acquired simultaneously, acceptance authentication of the person subject to authentication processing (for example, authentication as the legitimate user) is not granted. It is therefore difficult to break the security unless the living person directly operates the phone. This effectively excludes fraud in which feature information from substitutes or the like is presented to the individual authentication feature information acquisition units one after another, and allows a further improvement of the security level.

The word "simultaneously" in "executing the acquisition of authentication feature information by two or more authentication feature information acquisition units simultaneously" naturally includes modes that are simultaneous in the literal sense, for example when the information acquisition processing of each unit is controlled in parallel by separate computers. In the present invention, however, even if the individual information acquisition processes of different authentication feature information acquisition units are performed sequentially within the total period from the start to the completion of information acquisition for authentication, the acquisition is defined as "simultaneous" when the proportion of that total period taken up by redundant periods belonging to no information acquisition process (for example, the waiting time between the end of acquisition by a first unit and the start of acquisition by a second unit) is limited to 50% or less (preferably 10% or less). In other words, what matters is that a criminal attempting substitute false authentication is given no time to swap one "substitute" for another. Individual information acquisition processes of two or more units performed by time-division parallel processing are likewise regarded as simultaneous.
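The 50% (preferably 10%) criterion above can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the function names `idle_ratio` and `is_simultaneous`, the representation of each acquisition as a `(start, end)` pair in seconds, and the merging of overlapping intervals are illustrative choices, not something the specification prescribes.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one acquisition process, in seconds


def idle_ratio(intervals: List[Interval]) -> float:
    """Fraction of the total acquisition period that belongs to no acquisition process."""
    if not intervals:
        raise ValueError("at least one acquisition interval is required")
    ordered = sorted(intervals)
    total_start = ordered[0][0]
    total_end = max(end for _, end in ordered)
    total = total_end - total_start
    if total <= 0:
        return 0.0
    # Merge overlapping intervals and sum the covered time.
    covered = 0.0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start > cur_end:  # gap: close the current merged interval
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlap: extend the current merged interval
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start
    return (total - covered) / total


def is_simultaneous(intervals: List[Interval], max_idle_ratio: float = 0.5) -> bool:
    """Hypothetical check: at least two acquisitions and an idle ratio within the limit."""
    return len(intervals) >= 2 and idle_ratio(intervals) <= max_idle_ratio
```

For example, a voice capture over 0.0 to 1.0 s followed by a fingerprint capture over 1.1 to 2.0 s leaves 0.1 s idle in a 2.0 s total period, a ratio of 5%, which satisfies even the stricter 10% limit.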

The function of "not granting acceptance authentication to the person subject to authentication processing when the authentication feature information has not been acquired simultaneously for the at least two designated authentication feature information acquisition units" can be realized in two ways. A dedicated processing routine can judge the simultaneity of the acquisition and actively control acceptance or rejection based on the result. Alternatively, without any explicit simultaneity judgment, the sequence of the information acquisition processing can be arranged so that information acquired non-simultaneously is itself rendered hollow, with the result that acceptance authentication cannot be obtained. For example, if the time or number of steps allotted to acquiring the multiple pieces of authentication feature information is limited to an extent that sequential input cannot satisfy, then when the information is input sequentially, at least part of the information input later in the time series inevitably falls outside that time or number of steps. The corresponding information acquisition unit proceeds with acquisition within the allotted time or steps even when the information source is absent, so the acquired information is either blank or, even if something is acquired, not meaningful enough to obtain acceptance authentication; it is hollow. Such hollow information, if passed to the subsequent authentication processing without being distinguished from regular information, necessarily results in rejection, so it is clear that no simultaneity judgment is needed.
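The second approach, in which no simultaneity judgment is made and late input simply falls outside the allotted window, can be sketched as follows. This is a minimal illustration with assumed names: `acquire_in_window`, `matcher_accepts`, the timestamped sample format, and the `min_samples` criterion are all hypothetical stand-ins, and a real matcher would compare features rather than count samples.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sensor value)


def acquire_in_window(
    streams: Dict[str, List[Sample]], start: float, duration: float
) -> Dict[str, List[float]]:
    """Keep, for every sensor, only the values that arrive inside one shared window.

    Acquisition is forced to finish within the window even if a sensor has
    no source present, so a late input yields blank (hollow) data.
    """
    end = start + duration
    return {
        name: [value for t, value in samples if start <= t < end]
        for name, samples in streams.items()
    }


def matcher_accepts(captured: Dict[str, List[float]], min_samples: int = 5) -> bool:
    """Hypothetical stand-in for the downstream matcher: hollow data cannot match."""
    return all(len(values) >= min_samples for values in captured.values())
```

With a 1-second window, a fingerprint presented two seconds after the voice produces an empty capture for the fingerprint sensor, and the downstream matcher rejects it without any explicit timing comparison.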

The contact-type biometric feature information detection unit can be, for example, a well-known fingerprint detection unit. It can also be a contact detection sensor that detects the state of contact between the mobile phone and the hand in the telephone-use grasping and holding state. The two can of course be combined.

The contact detection sensor may be one that detects the presence or absence of contact in a binary manner, such as a proximity switch. In that case, however, it is difficult to give the contact detection information itself any significant power to identify individuals, and its use in the authentication system is auxiliary, for example limited to detecting a continued grasping and holding state (and it is naturally easy to circumvent by fraud). To obtain biometric information of the hand with greater power to identify individuals, a surface-type contact sensor that detects the contact distribution or grasping pressure distribution on the mobile phone can be used. In this case, the authentication processing means can perform authentication based on the contact distribution or grasping pressure distribution information detected by the surface-type contact sensor. The way a mobile phone is grasped differs between individuals depending on the size and three-dimensional shape of the user's hand, grasping force, and gripping habits, and these differences are reflected in the contact distribution and grasping pressure distribution on the surface of the phone. If this information is detected by a surface-type contact sensor, it can be used as authentication feature information with high power to identify individuals. Moreover, because such distribution information (pressure distribution in particular) is specific to the living person, substitute false authentication using a severed arm or a replica is extremely difficult.
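One way to compare a measured grasping pressure distribution with a registered one is a normalized similarity over the sensor cells. The sketch below is an assumption for illustration only: the specification does not name a matching method, and the function names, the flat list of per-cell pressures, and the 0.95 threshold are hypothetical. Note that cosine similarity ignores the overall grip strength and compares only the shape of the distribution.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally sized pressure maps (flattened cell values)."""
    if len(a) != len(b):
        raise ValueError("pressure maps must have the same number of cells")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no contact at all cannot match anything
    return dot / (norm_a * norm_b)


def grip_matches(
    template: Sequence[float], measured: Sequence[float], threshold: float = 0.95
) -> bool:
    """Hypothetical acceptance rule: similarity to the registered grip at or above threshold."""
    return cosine_similarity(template, measured) >= threshold
```

A measurement close to the registered template, such as `[0, 3.1, 4.8, 1.2, 0, 4.1]` against `[0, 3, 5, 1, 0, 4]`, passes, while a grip that presses on different cells does not.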

The surface-type contact sensor can be provided so as to cover the surface of the mobile phone housing that is grasped. As a surface-type contact sensor suited to pressure distribution detection, a sheet-type pressure-sensitive sensor module in which multiple pressure-sensitive contacts, whose contact resistance changes with pressing force, are distributed within a sheet can be used (for example, Fujikura Technical Report No. 104 (April 2003), pp. 32-36). Such a module has the advantage that pressure distribution information with high power to identify individuals can be obtained directly.

Next, as the telephone-use holding state that can be adopted in the present invention, the holding state used during an ordinary call can be used, that is, the face-contact holding state in which the receiver of the mobile phone is held against the face (see FIG. 29). The advantage of using the face-contact holding state for authentication is that the user holds the phone with exactly the same natural feel as when using it as a telephone, so the acquisition conditions of the authentication feature information vary little and accurate authentication results are easy to achieve. Because the receiver is held against the face, however, this posture is clearly unsuited to photographing the face with a camera provided on the phone. Accordingly, two or more units selected from the group consisting of the contact-type biometric feature information detection unit, the bone conduction sound detection unit, and the air conduction sound detection unit are used in combination as the authentication feature information acquisition units, and the mobile phone either has no face photographing camera or, if it has one, does not use it in the mode in which authentication feature information is acquired in the face-contact holding state.

In the face-contact holding state, holding the phone in the hand and speaking is the most natural action. Using the contact-type biometric feature information detection unit in combination with at least one of the bone conduction sound detection unit and the air conduction sound detection unit therefore has the advantage that the tension and unease peculiar to undergoing authentication are unlikely to arise, the authentication information can be input smoothly, and an accurate authentication result is easy to obtain. In this case, the authentication feature information acquisition control means can simultaneously acquire, as authentication feature information, voice information of at least one of air conduction sound and bone conduction sound, and biometric information of the hand.

In the face-contact holding state, the phone is in contact with the face, which has the further advantage that bone conduction sound can be used as authentication feature information. In this case, both the bone conduction sound detection unit and the air conduction sound detection unit are used as authentication feature information acquisition units, and the authentication feature information acquisition control means can be configured to acquire both bone conduction voice information and air conduction voice information as authentication feature information by simultaneously detecting the voice uttered by the person subject to authentication processing with the two detection units.

In conventional authentication based on speaker recognition, the only consideration at the voice detection step was detection accuracy in the presence of noise and the like. Whether to detect the airborne sound emitted from the vocal cords through the airway into the air (called "air conduction sound" in the present invention) with an ordinary microphone, or to detect bone conduction sound with a dedicated bone conduction microphone, was regarded as a choice to be made according to the acoustic environment in which the system would be used, and there was no idea at all of using the two together.

However, whereas the medium that carries air conduction sound is air, the medium of bone conduction sound is the body tissue and skeleton lying between the vocal cords and the bone conduction sound detection unit (specifically, a bone conduction microphone), and the acoustic impedance structures are entirely different. The detected voice waveform is affected accordingly, and although both originate from the same vocal cords, the detected waveforms of the air conduction sound and the bone conduction sound differ considerably. Because the propagation path of bone conduction sound passes through body tissue and skeleton, it is more complex and inhomogeneous than the air that carries air conduction sound, and parameters that affect sound propagation, such as propagation speed, amplitude, and acoustic resonance frequency, are also distributed along it. The original waveform from the vocal cords is therefore altered far more when propagating as bone conduction sound than as air conduction sound. The body tissue and skeleton forming the propagation path naturally differ between individuals, and the difference between the air conduction and bone conduction waveforms is correspondingly specific to each person.

The present inventor therefore focused on this difference between bone conduction voice information and air conduction voice information, and found that combining the two produces a number of significant effects for personal authentication. Specifically, the combination produces the following effects, which neither bone conduction voice information nor air conduction voice information can achieve alone.
(1) Feature information derived from the difference between the two waveforms, which cannot be known from bone conduction sound or air conduction sound alone, becomes available. As a result, the security level of personal authentication can be raised substantially.
(2) Because bone conduction voice information and air conduction voice information are both voice information of the same kind, hardware and software processing is easy to share between them, and feature information derived from the difference between the waveforms is easy to extract by calculation.
(3) Because detection of bone conduction sound involves contact with the human body, accurate reproduction by recording or the like is relatively difficult. If the system is configured to sample bone conduction sound and air conduction sound at the same time, breaking the security becomes very difficult unless the living person directly operates the system.
(4) When the voice uttered by the person subject to authentication processing is detected simultaneously by the bone conduction sound detection unit and the air conduction sound detection unit, the bone conduction sound and the air conduction sound share the same sound source. Compared with detecting separately uttered voices individually as bone conduction sound or air conduction sound, the correlation between the two voice waveforms is stronger, so the component of the waveform difference that is specific to the person being authenticated, that is, the feature information usable for authentication, can be identified more clearly and authentication accuracy can be increased.

When authentication is performed using air conduction sound and bone conduction sound, the authentication processing means can be configured to have a standard voice feature information storage unit, which stores the standard voice feature information against which source voice feature information based on both the bone conduction voice information and the air conduction voice information is checked, and collation means that collates the source voice feature information with the standard voice feature information. Standard voice feature information is created in advance from the air conduction sound information and bone conduction sound information of the specified person (the person who should receive acceptance authentication, that is, who should be authenticated as "correct"), and is used as the reference against which the source voice feature information acquired from the person subject to authentication processing is checked at authentication time. This simplifies the authentication processing and improves its accuracy. When authentication uses, for example, the phase difference described later as the standard voice feature information, the standard voice of the specified person can also be detected by a bone conduction sound detection unit and an air conduction sound detection unit provided outside the system. From the viewpoint of reducing the influence of differences in characteristics between hardware, however, it is more effective to create the standard voice feature information by detection with the bone conduction sound detection unit and air conduction sound detection unit provided in the system itself, and creating the standard voice feature information naturally becomes simpler as well.

The voice feature information can include the frequency spectrum of the bone conduction sound and the frequency spectrum of the air conduction sound. In this case, the collation means collates these frequency spectra with the respective standard frequency spectra of bone conduction sound and air conduction sound contained in the standard voice feature information, and grants acceptance authentication when a match is obtained for both. Even for the voice of the same person, the frequency spectrum of the bone conduction sound and that of the air conduction sound differ from each other, so collating each with its corresponding standard frequency spectrum increases the accuracy of personal authentication. This effect is especially strong when both the frequency spectra being authenticated and the standard frequency spectra are created by simultaneously detecting the voice uttered by the person with the bone conduction sound detection unit and the air conduction sound detection unit. Because collation uses the frequency spectra of both bone conduction sound and air conduction sound, the resulting authentication method implicitly includes feature information derived from the difference between the two waveforms, which cannot be determined from either waveform alone.
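The rule that both spectra must match their standards can be sketched as follows. This is an illustrative assumption rather than the method of the specification: the naive discrete Fourier transform, the cosine similarity measure, the 0.9 threshold, and the names `magnitude_spectrum`, `spectrum_similarity`, and `accept_voice` are all hypothetical choices made to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def spectrum_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two magnitude spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def accept_voice(
    bone: Sequence[float],
    air: Sequence[float],
    std_bone_spectrum: Sequence[float],
    std_air_spectrum: Sequence[float],
    threshold: float = 0.9,
) -> bool:
    """Accept only if the bone AND the air spectra each match their own standard."""
    bone_ok = spectrum_similarity(magnitude_spectrum(bone), std_bone_spectrum) >= threshold
    air_ok = spectrum_similarity(magnitude_spectrum(air), std_air_spectrum) >= threshold
    return bone_ok and air_ok
```

Under this rule, feeding the same air conduction recording to both inputs fails, because the bone conduction channel is compared against its own, different standard spectrum.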

Alternatively, the personal authentication system of the present invention can be configured to have composite voice feature information calculation means for calculating composite voice feature information that can be calculated only when both the bone conduction sound waveform detected by the bone conduction sound detection unit and the air conduction sound waveform detected by the air conduction sound detection unit are used. In this case, the authentication processing means can perform the authentication processing based on the composite voice feature information. This approach amounts to extracting by calculation, as composite voice feature information, the feature information derived from the difference between the two waveforms, which cannot be identified from the bone conduction sound waveform or the air conduction sound waveform alone, and it can further enhance the improvement in authentication accuracy and security level obtained by combining the two kinds of voice information.

The composite voice feature information calculation means can calculate the phase difference between the air conduction sound waveform and the bone conduction sound waveform as the composite voice feature information. As described above, in the human tissue and skeleton that form the propagation path of the bone conduction sound, the biological characteristics of the individual are directly reflected in the distribution of acoustic impedance. Specifically, the formation of reflected waves and the phase delay at impedance discontinuities (for example, tissue boundaries) differ for each living body (that is, each individual to be authenticated), so the bone conduction sound waveform has a phase difference relative to the air conduction sound waveform that differs for each individual and is therefore personally identifying. Accordingly, if the phase difference between the air conduction sound waveform and the bone conduction sound waveform is obtained by calculation, it can be used as effective and important information for personal authentication. To calculate the phase difference accurately, the bone conduction sound and the air conduction sound must be detected simultaneously for the same utterance.

In this case, the phase difference between the air conduction sound waveform and the bone conduction sound waveform specific to the person specified in advance as the authentication target is obtained beforehand as a standard phase difference, and the authentication processing means can perform the authentication processing based on whether the calculated phase difference matches the standard phase difference. The waveform phase difference itself can be obtained by a relatively simple waveform calculation (for example, setting the phase difference between the two waveforms to various values, calculating the difference or sum waveform for each, and finding the phase difference that minimizes or maximizes the integrated amplitude), which has the advantage of a lower computational load than spectrum collation and the like.
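The shift-and-compare calculation mentioned in parentheses above might look like the following sketch. It assumes that the two waveforms are sampled simultaneously at the same rate and normalized to comparable amplitude, and it expresses the phase difference as a lag in samples. The function names, the search range and the tolerance are hypothetical.

```python
import math

def estimate_lag(air, bone, max_lag):
    """Lag (in samples) of `bone` relative to `air` that minimizes the
    integrated amplitude of the difference waveform over the overlap."""
    n = len(air)
    best_lag, best_cost = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += abs(air[t] - bone[u])
                count += 1
        if count == 0:
            continue  # no overlap at this lag
        cost = total / count  # mean, so different overlaps stay comparable
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag

def matches_standard(lag, standard_lag, tolerance=1):
    """Compare the measured lag with the enrolled standard lag.
    The tolerance of one sample is an assumed value."""
    return abs(lag - standard_lag) <= tolerance
```

The search range should stay below one period of the dominant frequency component, since a periodic waveform also lines up at lags that differ by a whole period.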

Since the air conduction sound and the bone conduction sound also differ in frequency spectrum, the phase difference can be calculated more accurately by extracting the frequency components common to both waveforms and obtaining the phase difference from them. The frequency components can be extracted using well-known digital filter techniques.

The composite voice feature information is not limited to the phase difference between the two waveforms described above; for example, the difference spectrum between the frequency spectra of the air conduction sound and the bone conduction sound can also be used. For bone conduction sound, acoustic characteristics such as attenuation and resonance in the human body along the propagation path differ between individuals, and as a result the frequency components that are weakened or emphasized relative to the air conduction sound also differ between individuals. The difference spectrum between the air conduction sound and the bone conduction sound is therefore personally identifying. Spectra obtained equivalently by mathematical operations on the individual frequency spectra and the difference spectrum, such as the common spectrum of the air conduction sound and the bone conduction sound (an individual frequency spectrum minus the difference spectrum), can naturally also be used as composite voice feature information.
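As an illustration of the difference spectrum, the sketch below subtracts the bone conduction magnitude spectrum from the air conduction magnitude spectrum bin by bin and compares the result with an enrolled difference spectrum. The specification does not define the subtraction or the comparison in detail, so the per-bin linear difference, the mean absolute deviation measure and the tolerance are assumptions made for this example.

```python
def difference_spectrum(air_spec, bone_spec):
    """Per-bin difference of two magnitude spectra of equal length."""
    if len(air_spec) != len(bone_spec):
        raise ValueError("spectra must have the same number of bins")
    return [a - b for a, b in zip(air_spec, bone_spec)]

def difference_matches(diff, master_diff, tolerance=0.1):
    """True when the mean absolute deviation from the enrolled
    difference spectrum is within a (hypothetical) tolerance."""
    deviation = sum(abs(d - m) for d, m in zip(diff, master_diff)) / len(diff)
    return deviation <= tolerance
```

Under these assumptions, a replayed recording presented identically to both detection units yields a difference spectrum near zero and is unlikely to match the enrolled one.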

The phase difference and difference spectrum described above arise mainly from the mechanical structure of the skeleton and human tissue that form the propagation path of the bone conduction sound, so there is the advantage that misidentification is unlikely even if the voice to be authenticated is somewhat altered by, for example, the condition of the throat.

The authentication processing means can also perform the authentication processing as a combination of a first authentication process, which collates at least one of the frequency spectrum of the bone conduction sound and the frequency spectrum of the air conduction sound with a standard frequency spectrum, and a second authentication process based on the composite voice feature information. Conventional voice authentication based on either the bone conduction or the air conduction frequency spectrum achieves high personal identifiability through spectrum collation, but it also has a security hole in that it can be deceived using recordings and the like. Combining it with authentication based on the composite voice feature information described above (in particular the phase difference, which is simple to calculate) can effectively prevent such a security hole.

Several authentication modes in the face contact type holding state have been described above by way of example. However, in addition to the simple call function, recent mobile phones have gained functions such as e-mail composition and transmission, still and video camera shooting, and videophone one after another, and they are now frequently used in holding states other than the face contact type. The next most common standard holding state after the face contact type is the face-to-face holding state, in which the phone is held with the user's face directly facing the main display screen, such as a liquid crystal panel (see FIG. 30). This holding state is used when composing e-mail, using the Internet, or shooting still images or video with a camera provided on the back of the phone, opposite the main display screen. In this state the phone does not touch the user's face, so it is clearly unsuited to acquiring bone conduction sound. Accordingly, either the mobile phone is not provided with a bone conduction sound detection unit as an authentication feature information acquisition unit, or, if one is provided, it is not used in the mode in which the authentication feature information is acquired in the face-to-face holding state.

In this case, any two or more of the group consisting of the air conduction sound detection unit, the contact-type biometric feature information detection unit and the face photographing camera can be selected and mounted as the authentication feature information acquisition units. In the face-to-face holding state, however, the user can display his or her own face on the display unit while it is photographed for authentication, so the following can be given as an authentication usage form specific to this state. First, the mobile phone is provided with a display unit that displays the face image of the person to be authenticated captured by the face photographing camera. The face photographing camera is attached to the mobile phone at a position where it can photograph the face when the face of the person to be authenticated and the display unit directly face each other. The face-to-face holding state, in which the display unit and the face photographing camera of the mobile phone directly face the face of the person to be authenticated, is used as the telephone use holding state, and the face photographing camera is used as an essential authentication feature information acquisition unit in combination with at least one of the contact-type biometric feature information detection unit and the air conduction sound detection unit. This makes it possible to treat the face image as essential and combine it with either the air conduction sound or the biometric information of the hand, acquired simultaneously, enabling authentication with a high security level.

Various specific authentication forms can be adopted in the face-to-face holding state. For example, the face photographing camera and the air conduction sound detection unit can be used in combination as the authentication feature information acquisition units, with the authentication feature information acquisition control means acquiring the face photographed image and the air conduction sound voice information simultaneously as the authentication feature information. In this case, the mouth and other parts of the face image move during voice input, so it is difficult to make the acquisition strictly simultaneous; however, if the redundant waiting time is shortened as described above and the face photographing process and the air conduction sound detection accompanying voice input are carried out in immediate succession, the simultaneity defined in the present invention is sufficiently satisfied. To prevent substitute false authentication and the like more effectively, the authentication feature information acquisition control means should alternately and repeatedly execute a process of confirming the detection state of the face image by the face photographing camera and a process of detecting the air conduction sound voice information.

When a face image is used, considering the simultaneity of acquisition of the authentication feature information, another desirable mode is to use the face photographing camera and the contact-type biometric feature information detection unit in combination as the authentication feature information acquisition units, with the authentication feature information acquisition control means configured to acquire the face photographed image and the biometric information of the hand simultaneously as the authentication feature information.

In the personal authentication system of the present invention, when the authentication feature information acquisition units consist of the contact-type biometric feature information detection unit combined with at least one of the face photographing camera, the air conduction sound detection unit and the bone conduction sound detection unit, the authentication feature information acquisition control means can perform a contact change confirmation process, which checks for a change in the detection state of the biometric information of the hand by the contact-type biometric feature information detection unit, before and after the acquisition of at least one of the face photographed image, the air conduction sound voice information and the bone conduction sound voice information. That is, as the contact change confirmation process, the biometric information of the hand is detected twice, for example once before and once after the acquisition process, and checking whether the detected biometric information has changed confirms whether the holding state of the mobile phone (whether face contact type or face-to-face type) has been maintained, including during the acquisition process. This is advantageous for blocking a security breach by sequential substitute false authentication.
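The contact change confirmation process can be sketched as a wrapper that reads the hand biometric information before and after another acquisition and discards the result if the reading has changed. The callables `read_hand` and `acquire`, the representation of a reading as a list of numbers and the `max_change` threshold are hypothetical stand-ins for the sensor interfaces, which the specification does not define at this level.

```python
def mean_abs_change(before, after):
    """Mean absolute change between two hand readings of equal length."""
    return sum(abs(a - b) for a, b in zip(before, after)) / len(before)

def acquire_with_contact_check(read_hand, acquire, max_change=0.05):
    """Run `acquire` between two hand readings.

    Returns the acquired feature, or None when the hand reading changed
    by more than `max_change` (an assumed threshold), which suggests the
    grip was released or replaced during the acquisition.
    """
    before = read_hand()
    feature = acquire()
    after = read_hand()
    if mean_abs_change(before, after) > max_change:
        return None
    return feature
```

Returning `None` rather than the feature keeps a face image or voice sample captured during an interrupted grip out of the later collation step.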

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
In this embodiment, a case where the functions of the personal authentication system of the present invention are incorporated in a mobile phone is described as an example. FIG. 1 is an external perspective view showing an example of the mobile phone 1. The mobile phone 1 has a receiver 303 toward the top of the main body and a transmitter 304 toward the bottom, and between the two are provided a liquid crystal monitor 308 constituted by a liquid crystal display device (for example, a color liquid crystal display device), an input unit 305, and an on-hook/off-hook switch 306 that switches the mobile phone 1 between the on-hook state and the off-hook state. In this embodiment, the mobile phone 1 can access not only a line telephone communication network but also an information communication network such as the Internet. The input unit includes call dial keys 305a that double as a keyboard for information input, cursor movement keys 305b, a mode switching key 305c for switching between use modes such as a call mode and an information search mode, and so on.

The transmitter 304 is constituted by a microphone that also serves as the air conduction sound detection unit. The receiver 303, on the other hand, is constituted by a bone conduction speaker in this embodiment, and a bone conduction microphone 340 serving as the bone conduction sound detection unit is arranged close to it. The basic configuration of a bone conduction speaker is well known from, for example, Japanese Patent No. 2967777 and Japanese Patent Laid-Open No. 2003-340370, and that of a bone conduction microphone from, for example, Japanese Utility Model Laid-Open No. 55-146785, Japanese Patent Laid-Open No. 58-182397, Japanese Utility Model Laid-Open No. 63-173991 and Japanese Patent No. 3488749, so detailed description is omitted. Both are used by placing them against the ear or the jawbone below the ear. All of these constitute authentication feature information acquisition units.

The mobile phone 1 also includes, as further authentication feature information acquisition units, a face photographing camera 341, surface contact sensors 343 forming a contact-type biometric feature information detection unit, and a fingerprint detection unit 342. As shown in FIG. 1, the way the mobile phone 1 is gripped varies somewhat between users, but the basic form is almost the same. That is, the lower part of the phone rests on the heel of the palm with the display unit 308 facing the inside of the hand MH, the four bent fingers 14F are placed against the first side surface of the phone (the left side surface for a right-handed person, the opposite for a left-handed person), the side of the thumb MS from its base lies along the lower half of the second side surface (the left side surface for a right-handed person, the opposite for a left-handed person), and the tip of the thumb rests at a position reaching the upper half. Users hold the phone this way unconsciously, to avoid touching the input unit 305 inadvertently and to reduce the discomfort of fingertips touching the face. This embodiment takes advantage of this by providing the fingerprint detection unit 342 at the position where the pad of the thumb tip rests and the surface contact sensors 343 on both side surfaces.

In this embodiment, as shown in FIG. 3, the surface contact sensor 343 is the sheet-like pressure-sensitive sensor module already described, in which a plurality of pressure-sensitive contacts SP whose contact resistance (or contact capacitance) changes with pressing force are distributed within a sheet. The resistance value (pressure detection value) of each pressure-sensitive contact SP is converted to a gradated multi-bit digital signal, and pressure distribution information is obtained from the signal values of the pressure-sensitive contacts SP. The surface contact sensor 343 on the first side (here, the left) thus detects a pressure-sensitive distribution region PDP corresponding to the gripping pressure area of the four fingers other than the thumb, and the surface contact sensor 343 on the second side (here, the right) detects a pressure-sensitive distribution region PDP corresponding to the gripping pressure area of the thumb (and the base of the thumb on the palm). The shape (and pressure distribution) of the pressure-sensitive distribution region PDP differs between individuals and can therefore be used as feature information. A single surface contact sensor spanning three surfaces of the mobile phone 1, namely both side surfaces and the back surface, may be provided to detect the gripping pressure area of the fingers and palm as a whole, but this requires design changes such as eliminating the battery compartment cover normally provided on the back of the phone so that the battery can be inserted and removed from, for example, the bottom of the phone.
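As a rough illustration of how a pressure-sensitive distribution region could be derived from the digitized values of the contacts SP and compared with an enrolled region, consider the sketch below. The grid layout, the threshold and the overlap measure (intersection over union) are assumptions for this example; the specification states only that the shape and pressure distribution of the region serve as feature information.

```python
def pressure_region(grid, threshold):
    """Set of (row, column) contacts whose digitized pressure value
    reaches the threshold; this stands in for the region PDP."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value >= threshold
    }

def region_overlap(region, master_region):
    """Intersection over union of two regions, between 0.0 and 1.0."""
    union = region | master_region
    if not union:
        return 1.0  # two empty regions are treated as identical
    return len(region & master_region) / len(union)
```

A collation step could then accept the grip when the overlap with the enrolled region exceeds some chosen level; a fuller comparison would also weigh the pressure values inside the region.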

As another usable surface contact sensor, an analog capacitive coupling surface touch sensor using the same mechanism as a well-known touch panel can be used. In this type of touch sensor, fine wiring consisting of a group of vertical lines and a group of horizontal lines arranged in a grid without contacting each other is formed on the detection surface, an AC voltage is applied alternately to the vertical and horizontal lines at regular intervals, the impedance change of each wire is monitored by current detection, and the coordinates of the contact point on the detection surface are identified from the positions of the vertical and horizontal lines at which an impedance change is detected. With this method it is difficult to detect the pressure applied at the contact point, and it is suited to identifying the contact distribution. However, when the same person grips the mobile phone with different forces, the contact area of the fingers and so on changes with the force, so information on the gripping force can be obtained indirectly.

As shown in FIG. 2, an input unit pressure sensor 323 that detects contact with the input unit 305 may also be provided as a contact-type biometric feature information detection unit.

Returning to FIG. 1, the face photographing camera 341 is, for example, a CCD camera, and is provided close to the display unit 308 of the mobile phone 1 so that it can photograph the face when the face of the person to be authenticated and the display unit 308 directly face each other. The required part of the face must fall within the field of view of the camera 341 for the authentication face image, so the viewfinder image captured by the camera 341 is displayed on the display unit 308, allowing the user to take the picture while checking whether an image with a posture suitable for authentication will be obtained (for example, as shown in FIG. 13, that the face fits within a prescribed frame F in the display unit 308 and the eyes are aligned with a reference line SL). A retina photographing camera may be provided instead of the face photographing camera 341, and a retina image used as the authentication feature information. Besides the retina, an image of the iris can also be captured and used as authentication feature information. When an iris image is used, collation and authentication use the individuality of its pattern and color. The iris pattern in particular is formed after birth and is little affected by genetics, so it differs markedly even between identical twins and has the advantage of allowing reliable identification. Authentication using the iris pattern allows fast recognition and collation and has a low rate of mistaking another person for the user. The iris can be photographed with an ordinary camera; a dedicated camera may be provided instead of the face photographing camera 341, or an attachment for iris close-up photography may be fitted to the face photographing camera 341.

FIG. 2 is a block diagram showing an example of the electrical configuration of the mobile phone 1. The main part of the circuit includes a control unit 310 comprising an I/O port 311 and, connected to it, a CPU 312 (which constitutes the authentication feature information acquisition control means, the authentication processing means, the collation means and the composite voice feature information calculation means), a ROM 313, a RAM 314 (which serves as the bone conduction voice information storage unit and the air conduction voice information storage unit), and so on. The input unit 305 and the on-hook/off-hook switch 306 described above are connected to the I/O port 311. The receiver 303 is connected to the I/O port 311 via an amplifier 315 and a D/A converter 316, the transmitter 304 via an amplifier 317 and an A/D converter 318, and the bone conduction microphone 340 via an amplifier 320 and an A/D converter 321. A telephone connection circuit 323 is also connected to the I/O port 311. The connection circuit 323 consists of a connection interface 331 for connecting to the control unit 310 and, connected to it, a modulator 332, a transmitter 333, a frequency synthesizer 334, a receiver 335, a demodulator 336, a duplexer 337, and so on. A data signal from the control unit 310 is modulated by the modulator 332 and transmitted by the transmitter 333 from an antenna 339 via the duplexer 337. Received radio waves are received by the receiver 335 via the antenna 339 and the duplexer 337, demodulated by the demodulator 336, and input to the I/O port 311 of the control unit 310. During a call, for example, a voice signal input from the transmitter 304 is amplified by the amplifier 317, converted to digital form by the A/D converter 318, and input to the control unit 310. The signal is processed by the control unit 310 as necessary and then output from the receiver 303 via the D/A converter 316 and the amplifier 315.

A control radio wave transmitter 338 that transmits a control radio wave P is connected to the connection interface 331. The control radio wave P is transmitted from the antenna 339 via the duplexer 337. When the mobile phone 1 moves to another communication zone 102, the radio network controller 104 on the network side performs well-known handover processing based on the reception status of the control radio wave P.

The face photographing camera 341, the fingerprint detection unit 342 and the surface contact sensors 343 are connected to the I/O port 311. The resistance change of each contact SP (FIG. 3) of the surface contact sensor 343 is input to a digitizer 344 as an analog voltage signal, converted to digital pressure data for each contact SP, and input to the I/O port 311.

The ROM 314 holds a communication program, which is the basic control program for wireless telephone communication, and a display program that controls the screen display of the liquid crystal monitor 308. As shown in FIG. 4, the ROM 314 also holds an authentication program for authenticating whether the user of the mobile phone 1 is the regular user (the authentication processing means is realized by executing it on the CPU 312). In this embodiment, the authentication processing is specifically performed by speaker recognition and collation using both the air conduction sound waveform and the bone conduction sound waveform. The authentication program consists of a main program 201 and a group of sub-modules used by the main program 201, specifically an air conduction sound sampling module 202, a bone conduction sound sampling module 203, an air conduction sound/bone conduction sound phase difference calculation and collation determination module 204, an air conduction sound/bone conduction sound spectrum calculation and collation determination module 205, a face image sampling module 207, a face image collation and determination module 208, a fingerprint sampling module 209, a fingerprint collation and determination module 210, a pressure-sensitive distribution measurement module 211 for detecting the gripping pressure area described above, a pressure-sensitive distribution collation and determination module 212, and so on. All of these programs are executed by the CPU 312 using the RAM 313 of FIG. 2 as a work area.

As authentication master data 322, voice spectrum master data are prepared for use when voice authentication is performed by spectrum collation (the modules involved are those with reference numerals 205 and 206), specifically air conduction sound spectrum master data 321, bone conduction sound spectrum master data 222, and master data 223 of their difference spectrum. Face image master data 224, fingerprint master data 224, and pressure-sensitive distribution master data 226 are also prepared. In the case of air conduction sound and bone conduction sound, these data are created before the authentication processing is carried out by having the regular user (the specific person to be authenticated) pronounce a sound, word, or sentence predetermined for collation, detecting its waveform with the receiver 303 (air conduction sound) and the bone conduction microphone 340 (bone conduction sound), and converting it into a spectrum by a well-known Fourier transform operation. Likewise, the face image master data 241 (FIG. 10), the fingerprint master data 243, and the pressure-sensitive distribution master data 226 (FIG. 12) are acquired in advance from the regular user by the face photographing camera 341, the fingerprint detection unit 342, and the surface contact sensor 343, respectively. Because these data differ from user to user, and so that the reference voice feature information can be updated at any time to improve the security level and the like, they are stored rewritably in a rewritable ROM, specifically the EEPROM (Electrically Erasable Programmable Read Only Memory) 322 of FIG. 2, and are loaded into the authentication data memory of the RAM 313 for use as required.

Several specific voice authentication methods are described below. Some modules and data are not used by a given method, so the necessary modules and data are to be selected as appropriate (modules and data not used by the method in question may of course be omitted).

The way the telephone part of the mobile phone 1 is used is well known, so a detailed description is omitted; the authentication processing that precedes its use is described in detail below. FIG. 10 shows the flow of the main authentication processing by the main program 201 (FIG. 4). To perform authentication, an initialization process that includes registration of the collation data must first be carried out (S1). Once performed, this initialization process is skipped thereafter, except when the collation master data are to be updated. S3 and S4 are the authentication processes at the core of the procedure; according to their result, an authentication flag indicating whether use of the functions of the mobile phone 1 is permitted is set, for example, in the RAM 313 (FIG. 2). In S5 the authentication flag is read. If the prescribed condition is satisfied, the lock is released (S7, that is, use is permitted); if it is not satisfied, the lock is not released (S8, that is, use is not permitted).
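The main flow just described (S1, S3, S4, S5, S7/S8) can be sketched as follows. This is a minimal illustration, not the patent's program: the function names are hypothetical, and the "prescribed condition" checked in S5 is assumed here to be that both authentication stages passed.

```python
from typing import Callable


def authenticate_main(initialized: bool,
                      initialize: Callable[[], None],
                      first_auth: Callable[[], bool],
                      second_auth: Callable[[], bool]) -> str:
    """Return "unlocked" or "locked" following the S1 -> S3 -> S4 -> S5 flow."""
    if not initialized:
        initialize()                 # S1: register master data (skipped once done)
    flags = {
        "first": first_auth(),       # S3: first authentication process
        "second": second_auth(),     # S4: second authentication process
    }
    # S5: read the flags; assumed condition: every stage must have passed
    if all(flags.values()):
        return "unlocked"            # S7: use permitted
    return "locked"                  # S8: use not permitted
```

Passing the stages in as callables keeps the sketch independent of how each stage is implemented, so a stage can be dropped or reordered as the description allows.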

In the present embodiment, the authentication processing is performed in two stages: a first authentication process (S3), carried out in the face-contact holding state shown in FIG. 29, and a second authentication process (S4), carried out in the face-facing holding state. The order of the first and second authentication processes may be swapped. It is also possible to perform only the first authentication process (that is, to omit S4 in FIG. 15) or only the second authentication process (that is, to omit S3 in FIG. 15).

The functions of the mobile phone 1 that are unlocked by authentication are not limited to the well-known telephone functions (including connection to a telephone network or the Internet, mail functions, and the like). The phone can also serve, for example, as a wireless remote control unit for automobile functions such as locking and unlocking the vehicle, starting the engine, and switching the headlights and interior lights on and off.

Before the authentication processing is described in detail, the flow of the initialization process and of the voice recognition process is described with reference to FIGS. 16 to 19. In both, the main part of the processing is the voice data processing responsible for acquiring and processing the voice data (S301 in the initialization process, S402 in the voice authentication process). This voice data processing is described first, in detail, with reference to FIG. 17. In speaker authentication technology, various methods have been devised, for purposes such as improving security, for having the person being authenticated utter the authentication voice, and the way the initial data are acquired differs from method to method. All of these methods are well known, so only an outline is given.

(1) Method of uttering a single character (or sound, for example a vowel)
The character to be uttered is designated by display or the like, the user utters it, and sampling is performed.
(2) Method of uttering several characters in sequence
Basically the same as (1). The order of utterance is guided by display or the like, and the waveforms are sampled one after another. In actual collation the order of utterance may be fixed, or it may be changed every time using random numbers. In the latter case the order of the characters designated at authentication changes randomly, which has the advantage that a recording of an utterance made in a fixed order becomes useless.
(3) Method of uttering a word
Only one word may be used (in which case this is the same as (2)), or the word may be selected from several. In the latter case (see FIG. 1), a selection list of candidate words is displayed on the screen 108, a selection is made with the input unit 305, and the chosen word is then uttered and sampled. Alternatively, the number of characters (or the recording time) may be specified, and the user may enter any word of their choice with the input unit 305 and then utter it for sampling. In this case the word clearly serves in place of a password. As a more elaborate method, a question whose answer only the regular user knows may be output by voice, and the corresponding registered answer input by voice. In this case the initialization process requires the question to be output, and the answer to it, to be entered or selected.
(4) Method of inputting a sentence
Basically the same as (3). When the question / answer format is adopted, several questions and answers may be input interactively.
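The randomized ordering mentioned under method (2) can be sketched as below. This is only an illustration under assumptions: the helper names are hypothetical, and the recognized order is assumed to come from some separate speech recognition step that is not shown.

```python
import random


def make_challenge(characters, rng=None):
    """Return the registered characters in a freshly shuffled utterance order."""
    rng = rng or random.Random()
    order = list(characters)
    rng.shuffle(order)               # a new order for every authentication attempt
    return order


def order_matches(challenge, recognized):
    """True only if the characters were uttered in exactly the requested order."""
    return list(challenge) == list(recognized)
```

Because the order changes on every attempt, a recording made in one fixed order will generally fail `order_matches`, which is the advantage the description points out.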

Comparing bone conduction sound with air conduction sound, the bone conduction sound is picked up closer to the vocal cords, so sound components derived from vocal cord vibration, such as vowels, tend to be emphasized more than in air conduction sound. Fricatives and plosives, on the other hand, involve articulators other than the vocal cords, such as the tongue and lips, and therefore appear more strongly in air conduction sound. Accordingly, when authentication is based on the difference between bone conduction sound and air conduction sound in waveform or spectrum (in particular the difference spectrum), it is desirable to designate, as the voice waveform data to be authenticated (bone conduction sound and air conduction sound), utterances containing vowels, fricatives, and plosives. A sound sequence in which the most frequent sound belongs to one of these types is preferable, for example "sa-shi-su-se-so", "shishi shinchu no mushi", or "a-i-u-e-o"; a single sound from the sa-row, ta-row, or a-row is of course also acceptable. Even among vowels, sounds such as "i" and "e", which are articulated with the front of the tongue, are clearer in air conduction sound, whereas sounds such as "u" and "o", which are articulated with the back of the tongue, are clearer in bone conduction sound. It is therefore also effective to designate a sound sequence consisting mainly of one group or the other, such as "ie" (house) or "koubo" (yeast).

Returning to FIG. 17, in S501 the designated voice is input using both the transmitter 304 and the bone conduction microphone 340. In S502 it is sampled (by executing the air conduction sound sampling module 202 and the bone conduction sound sampling module 203 of FIG. 4). Because the user utters the requested sound sequence only once, the two samplings must take place simultaneously in time. It follows that, in the first authentication process described later, which uses this processing, the bone conduction sound and the air conduction sound that form the two pieces of authentication feature information are acquired simultaneously. When a single CPU is used, the sampling is executed as time-division parallel processing, as shown in FIG. 18. Specifically, the sampling counter is reset in S101. Thereafter, while the sampling counter is incremented, the following are repeated in alternation: reading the microphone input port for air conduction sound (S102), writing the read value to memory (RAM 313) (S103), reading the input port of the bone conduction microphone (S104), and writing that read value to memory (S105). The total sampling time is fixed in advance according to the length of the voice data to be sampled (the sampling counter value can stand in for it, although other timer means may be used), and sampling is cut off at time-up (S107). With this arrangement, the data for both sounds cannot be acquired correctly unless the bone conduction sound waveform and the air conduction sound waveform are sampled simultaneously, which effectively prevents deception by, for example, sequential voice input from a tape recorder.
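The time-division loop of S101 to S107 can be sketched as follows. This is a simplified illustration: `read_air` and `read_bone` are hypothetical callables standing in for the two input-port reads, and the time-up condition is approximated by a fixed sample count.

```python
def sample_interleaved(read_air, read_bone, total_samples):
    """Alternately read both microphones until the sample budget is used up."""
    air, bone = [], []
    counter = 0                        # S101: reset the sampling counter
    while counter < total_samples:     # S107: stop at time-up (counter as timer)
        air.append(read_air())         # S102 + S103: read air port, store value
        bone.append(read_bone())       # S104 + S105: read bone port, store value
        counter += 1                   # increment the sampling counter
    return air, bone
```

Since both buffers are filled inside the same loop and the loop ends at a fixed count, a voice that arrives late on one channel is simply truncated, which is what makes sequential playback ineffective.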

When voice data are input as words or sentences, well-known speech recognition technology may be used to determine whether input of a voice with the prescribed content (meaning) has been completed, and sampling may be terminated once it has. In this case the timer means is not necessarily required. Although the hardware becomes somewhat more complex, the air conduction sound and the bone conduction sound may also be sampled independently by separate (that is, two) CPUs, in which case the two voice waveforms can be sampled in parallel without time-division processing.

Returning to FIG. 17, once sampling of the air conduction sound and bone conduction sound waveforms has finished as described above, S503 checks whether the two voices were sampled simultaneously. Various checking methods are conceivable. For example, if the air conduction sound and the bone conduction sound were deliberately input at different timings, one of them will run past the end of the sampling time and the acquired data will contain a long blank period, and this can be exploited. In this case, it is checked whether, in at least one of the acquired air conduction sound waveform and bone conduction sound waveform, a period in which the voice amplitude is at or below a predetermined lower limit continues for a certain length or more; if such a period exists, it is determined that there is no simultaneity. If S503 determines that there is no simultaneity, the process proceeds to S511, where processing is terminated and an error or warning is output.
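One way to realize the blank-period check of S503 is sketched below. The threshold names `amp_floor` and `max_silent_samples` are illustrative assumptions; the description only says that both limits are predetermined.

```python
def has_long_silence(waveform, amp_floor, max_silent_samples):
    """True if the amplitude stays at or below amp_floor for too many samples in a row."""
    run = 0
    for value in waveform:
        if abs(value) <= amp_floor:
            run += 1
            if run >= max_silent_samples:
                return True
        else:
            run = 0                    # voice resumed, restart the silent run
    return False


def is_simultaneous(air, bone, amp_floor, max_silent_samples):
    """S503: reject if either channel contains a long blank period."""
    return not (has_long_silence(air, amp_floor, max_silent_samples)
                or has_long_silence(bone, amp_floor, max_silent_samples))
```

A practical implementation would likely need to tolerate natural pauses inside words or sentences, so the two thresholds would have to be tuned to the designated utterance.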

If simultaneity is satisfied, the process proceeds to S505 and S506, where the detected air conduction sound waveform data and bone conduction sound waveform data are stored and registered in memory. What follows is the calculation of the composite voice feature information used for authentication (realizing the function of the composite voice feature information calculation means). In S507, the phase difference between the air conduction sound waveform and the bone conduction sound waveform is calculated as composite voice feature information (by executing the air conduction sound / bone conduction sound phase difference calculation and collation determination module 204). As shown in FIG. 8, the air conduction sound waveform and the bone conduction sound waveform are the same voice sampled simultaneously by separate microphones, and the relative phase of the two waveforms when they are superimposed with the sampling start timing as the reference is taken as the reference superposition phase. The two waveforms derive from the same voice and share many frequency components. Therefore, as shown in FIG. 9, if the superposition phase of the two sets of waveform data is shifted relatively so that the phase difference φ inherently present at the reference superposition phase (that is, the phase difference to be found) is cancelled, and the difference waveform is then calculated, the integral amplitude (average amplitude) of the difference waveform is minimized at that superposition phase (see the bottom of FIG. 9). Accordingly, by varying the superposition phase of the two sets of waveform data while calculating the integral amplitude of the difference waveform, and finding the superposition phase at which the integral amplitude is minimized, the phase difference φ between the two waveforms is obtained.

Considering that it is to be used as personal feature information for authentication, it suffices to obtain a parameter that corresponds uniquely to the phase difference φ being sought. The composite voice feature information is therefore not limited to the phase difference that minimizes the integral amplitude of the difference waveform; any of the following may be substituted.
(1) The phase difference that maximizes the integral amplitude of the difference waveform
(2) The phase difference that minimizes the integral amplitude of the sum waveform
(3) The phase difference that maximizes the integral amplitude of the sum waveform

The processing is described below with reference to the flowchart of FIG. 19, taking as an example the case of finding the phase difference φ that minimizes the integral amplitude of the difference waveform. In S201, the superposition phase difference Σt is reset (because the waveform is a superposition of various sinusoids, the phase difference is expressed in units of time rather than angle). Next, with one of the air conduction sound waveform and the bone conduction sound waveform taken as the first waveform and the other as the second waveform, in S202 the phase of the second waveform is shifted by a predetermined small time Δt while the first waveform is held fixed, and in S203 the difference waveform is calculated. In S204, the integral amplitude A of the difference waveform is calculated. Methods of calculating the integral amplitude are well known; it can be calculated, for example, as follows. First, with the waveform written as f(t), the values of f(t) at every sampling timing t are summed and divided by the number of samples N to obtain the waveform center line f0. Then |f(t) - f0| is calculated for each t, and these values are summed over all t and divided by N to give the integral amplitude. In S205, the value of Σt at that time is taken as the phase difference φ and stored in association with the value of the integral amplitude A.
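Written out as formulas, the integral amplitude computation of S204 reads as follows. The index notation t_i for the i-th sampling timing is introduced here only for illustration and does not appear in the original text.

```latex
f_0 = \frac{1}{N}\sum_{i=1}^{N} f(t_i), \qquad
A = \frac{1}{N}\sum_{i=1}^{N} \bigl| f(t_i) - f_0 \bigr|
```

As a small worked case, for the four samples f = (1, 3, 5, 3) the center line is f_0 = 12/4 = 3 and the integral amplitude is A = (2 + 0 + 2 + 0)/4 = 1.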

Next, in S206, Σt is incremented by Δt, and the processing of S202 to S206 is repeated until Σt reaches the predetermined maximum value Σtmax. Considering that the user should be able to utter the voice designated for authentication naturally, it is desirable to secure a voice sample length of, for example, 1 second or more. A waveform shift of 0.5 to 2 wavelengths is sufficient for finding the phase difference, so, given that the frequency of the human voice is on average 1 to 2 kHz, the range of Σt is preferably set to about 0.5 to 2 ms. The sampling period Δt is desirably about 1/1000 to 1/10 of Σt. The shift interval of the second waveform may be set in only one direction, positive or negative, from the reference superposition phase as the origin, or in both the positive and negative directions.

When the above calculation is finished, the process proceeds to S208, where the minimum value A0 among the stored integral amplitudes A is found, and in S209 the phase difference φ corresponding to A0 is determined as the phase difference φ0 being sought. As shown in FIG. 6, there are considerable spectral differences between bone conduction sound and air conduction sound, and there are frequency components that the two do not share (for example, in bone conduction sound the spectral intensity in the high frequency range tends to be missing). When calculating the phase difference, it may therefore be preferable to extract by filtering a frequency band in which there are many common components before performing the waveform calculation. This concludes the description of the phase difference calculation.
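The search of S201 to S209 can be sketched as follows. This is a simplified illustration under stated assumptions: the shift is counted in whole samples rather than in time units of Δt, it is scanned in one direction only starting from zero shift, and the function names are hypothetical. No band filtering is applied.

```python
def integral_amplitude(wave):
    """S204: mean absolute deviation of the waveform from its center line."""
    n = len(wave)
    center = sum(wave) / n
    return sum(abs(v - center) for v in wave) / n


def find_phase_shift(first, second, max_shift):
    """Return (shift, amplitude) minimizing the difference-waveform amplitude."""
    best_shift, best_amp = 0, None
    for shift in range(max_shift + 1):             # S202/S206: step the shift
        n = min(len(first), len(second) - shift)   # overlapping region only
        diff = [first[i] - second[i + shift] for i in range(n)]   # S203
        amp = integral_amplitude(diff)             # S204
        if best_amp is None or amp < best_amp:     # S208: keep the minimum
            best_shift, best_amp = shift, amp      # S209: associated shift
    return best_shift, best_amp
```

For example, if `second` is a copy of `first` delayed by three samples, the difference waveform vanishes at a shift of 3 and the function returns that shift. The sketch tracks the running minimum instead of storing every (φ, A) pair as S205 does, which gives the same result with less memory.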

Returning to FIG. 17, in S508 and S509 the frequency spectrum of each of the air conduction sound and bone conduction sound waveforms is calculated and the results are stored. As already mentioned, this calculation can be carried out by applying a well-known Fourier transform to the original waveform. In speaker recognition, however, it is known that the spectral outline shown in the lower part of FIG. 5 (information that mainly reflects voice quality) gives better measurement reproducibility than the spectral waveform with fine structure shown in the upper part of FIG. 5, is fully effective as personal identification information, and is easier to collate. This spectral outline is also called the spectral envelope and can be extracted and calculated by various well-known speech analysis algorithms. Nonparametric methods include short-time autocorrelation analysis, short-time spectrum analysis, cepstrum analysis, band-pass filter bank analysis, and zero-crossing count analysis. Parametric methods include linear prediction analysis, maximum likelihood spectrum estimation, the covariance method, PARCOR analysis, and LSP analysis.

Returning to FIG. 15, in S510 the difference between the frequency spectra of the air conduction sound and the bone conduction sound obtained as described above is calculated, as shown in FIG. 6, and stored as difference spectrum data. The above processing is carried out by executing the air conduction sound / bone conduction sound difference spectrum calculation and collation determination module 205 and the waveform spectrum collation and determination module 206 of FIG. 4. This concludes the description of the voice data processing.
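The spectrum and difference-spectrum steps (S508 to S510) can be sketched as follows. This is a minimal illustration that uses a naive discrete Fourier transform on the raw waveform; it does not compute a spectral envelope, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(wave):
    """Normalized DFT magnitudes for bins 0 .. N/2 (naive O(N^2) transform)."""
    n = len(wave)
    return [
        abs(sum(wave[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def difference_spectrum(air, bone):
    """S510: air conduction spectrum minus bone conduction spectrum, bin by bin."""
    air_spec = magnitude_spectrum(air)     # S508
    bone_spec = magnitude_spectrum(bone)   # S509
    return [a - b for a, b in zip(air_spec, bone_spec)]
```

A component present only in the air conduction signal shows up as a positive peak in the result, while components shared at equal strength cancel to roughly zero.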

Returning to FIG. 16, the flow of the initialization process is described. In the voice data processing of S301, voice input is performed with the regular user's (the specific person to be authenticated) own voice, and data for the phase difference, the frequency spectrum of the air conduction sound or bone conduction sound, or the difference spectrum are created by the method already described. In S302 these are registered in the EEPROM 322 (FIG. 4) as the master data to be used in subsequent voice authentication (standard voice feature information: standard phase difference, standard frequency spectrum, or standard difference spectrum). In S303 to S305, the face image master data 241 (FIG. 10), the fingerprint master data 243 (FIG. 11), and the pressure-sensitive distribution master data 226 (FIG. 12) are acquired by the face photographing camera 341, the fingerprint detection unit 342, and the surface contact sensor 343, respectively, and registered.

The first authentication process is described below, taking as an example the case in which bone conduction sound and air conduction sound are acquired simultaneously as authentication feature information. FIG. 20 shows one example. In S401, the user inputs the voice designated for authentication. In S402, the voice data processing described above is executed and the phase difference φ is calculated. In S403, this phase difference φ is compared with the standard phase difference φ0 stored as master data; here the difference φ - φ0 is calculated. In S406, it is checked whether the deviation between the phase difference φ and the standard phase difference φ0 is within the allowable range. If it is, the authentication flag is set to permitted (S407); if it is outside the range, the flag is set to not permitted (S408). Instead of registering the standard phase difference φ0 as the master, an allowable phase difference range containing the standard phase difference φ0 (given by a maximum value φmax and a minimum value φmin) may be registered, and authentication performed according to whether φ falls within that range.
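Both collation variants for the phase difference (S403 to S408) reduce to a simple comparison, sketched below. The parameter name `tolerance` is an assumption introduced for illustration; the description only speaks of an allowable range.

```python
def phase_flag_by_deviation(phi, phi0, tolerance):
    """S403/S406: permitted if |phi - phi0| lies within the allowable deviation."""
    return abs(phi - phi0) <= tolerance


def phase_flag_by_range(phi, phi_min, phi_max):
    """Variant: the master is a registered range [phi_min, phi_max] around phi0."""
    return phi_min <= phi <= phi_max
```

With a master value of 0.50 ms and a tolerance of 0.05 ms, a measured 0.52 ms would set the flag to permitted (S407) and a measured 0.60 ms would set it to not permitted (S408); these numbers are purely illustrative.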

FIG. 21 shows an example of voice authentication that uses the difference spectrum instead of the phase difference (steps in common with FIG. 20 are given the same step numbers and their description is omitted). In S402 the voice data processing is executed. In S410 the calculated difference spectrum between the air conduction sound and the bone conduction sound, as shown in FIG. 6, is read out, and in S411 it is compared with the difference spectrum master data (FIG. 4: reference numeral 223). If S412 determines that the two match, the authentication flag is set to permitted (S413); otherwise it is set to not permitted (S414).

As shown in FIG. 6, the air conduction sound spectrum and the bone conduction sound spectrum share their main parts, but in particular frequency bands there is a marked difference in spectral intensity (for example, high-frequency components appear more strongly in the air conduction sound spectrum than in the bone conduction sound spectrum). Match or mismatch can therefore be determined by comparing the shape of the difference spectrum in such a frequency band with the master. In particular, where a peak of the spectral envelope exists in one of the air conduction sound spectrum and the bone conduction sound spectrum but not in the other (such as the peak marked "x" in FIG. 6), and the position of that peak varies with the individual to be authenticated, the peak can be detected in the difference spectrum and its position (frequency) collated, which allows accurate authentication to be performed simply.
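Peak-position collation on a difference spectrum could look like the sketch below. The peak criterion (a local maximum above `min_height`) and the tolerance `max_bin_offset` are assumptions made for illustration; the description does not specify how peaks are detected or how closely positions must agree.

```python
def peak_bins(spectrum, min_height):
    """Indices of local maxima whose height is at least min_height."""
    return [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] >= min_height
        and spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
    ]


def peaks_match(sample_peaks, master_peaks, max_bin_offset):
    """Match if both have the same number of peaks at nearly the same positions."""
    if len(sample_peaks) != len(master_peaks):
        return False
    return all(abs(s - m) <= max_bin_offset
               for s, m in zip(sample_peaks, master_peaks))
```

Comparing only peak positions keeps the master data small, at the cost of ignoring peak heights.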

FIG. 22 shows an example of voice authentication in which the spectra of the bone conduction sound and the air conduction sound are each collated with the master individually (steps in common with FIG. 20 are given the same step numbers and their description is omitted). In S402 the voice data processing is executed, and the calculated frequency spectra of the air conduction sound and the bone conduction sound are read out (S420, S423). Each is compared individually with its master data (FIG. 4: reference numerals 221, 222). Only when S422 and S425 determine a match for both the bone conduction sound and the air conduction sound is the authentication flag set to permitted (S426); otherwise it is set to not permitted (S427).

As shown in FIG. 6, the spectral envelope of either the air conduction sound or the bone conduction sound has peaks at positions characteristic of the uttered sound, so the number and positions of these peaks can be used to determine whether the input speech (for example, a word or character) is the same as the speech represented by the master (that is, speech recognition). If the speech content is the same, the peak positions and intensities (or the intensity ratios between peaks) can be compared with the master, and the match or mismatch used to authenticate whether the speaker is the legitimate user (that is, speaker recognition).

Returning to FIG. 15, when the first authentication process described above ends in S3, processing moves on to the second authentication process (if the conditions for rejection authentication were satisfied in the first authentication process, processing may be terminated at this point). In the second authentication process, as shown in FIG. 30, the user shifts the mobile phone 1 to the face-to-face holding state (the phone 1 held to the ear as in FIG. 29 can be repositioned in a single motion by lowering it while turning the wrist about a quarter turn, without moving the fingers).

FIG. 23 shows a first example of the second authentication process. Here, the air conduction sound and the face image are acquired simultaneously as authentication feature information. In S601, face image (I) is captured. Next, as shown in FIG. 14, a predetermined question 250 is displayed on the display unit 308, and the user speaks the answer. As shown in FIG. 23, face image capture, question display, and answer voice input are repeated twice in this order, and finally the face image is captured once more (S601 to S607). In S608, the three captured face images (I) to (III) are collated with the master (see FIG. 10). In S610 and S612, the two answer voices (I) and (II) (air conduction sound spectra) are collated with the master. In S609, S611, and S613, each collation result is judged; only if all match does processing proceed to S614, where the authentication flag is set to permitted (acceptance authentication), and if there is even one mismatch, processing proceeds to S615, where the flag is set to not permitted (rejection authentication).

Of the three captured face images, it is desirable to authenticate either the first or the last by collation with the master, but for the remaining two it is sufficient to make a determination that prevents false authentication and the like, so they can be replaced with a simple pattern collation process that checks whether the face has left the camera's field of view. FIG. 26 shows an example. If the two patterns are in color or grayscale, each is binarized in S701. In S703 to S706, corresponding pixels in the two pattern frames are read in sequence, and the exclusive OR of their values (0 or 1) is computed. If the pattern has not moved, corresponding pixels have equal values and the exclusive OR is 0; if the pattern has moved, the pixels differ and the exclusive OR is 1. This exclusive OR is computed for every pixel and added to a counter K (S707). If the pattern moves abnormally, the number of changed pixels increases and so does the value of K. In S709 to S714, the final value of K is divided by the total number of pixels M in the frame; if this value is at or below the allowable value, a match is determined, and if it exceeds the allowable value, a mismatch is determined.
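
The exclusive-OR comparison of FIG. 26 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: frames are plain nested lists of grayscale values, and the binarization threshold (128) and the allowable ratio (0.1) are hypothetical values, not figures from the specification.

```python
def binarize(frame, threshold=128):
    """Convert a grayscale frame (rows of 0-255 values) to 0/1 pixels (cf. S701)."""
    return [[1 if px >= threshold else 0 for px in row] for row in frame]


def changed_pixel_ratio(frame_a, frame_b):
    """Return K / M: XOR-counted changed pixels over total pixels."""
    if len(frame_a) != len(frame_b) or any(
        len(ra) != len(rb) for ra, rb in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    k = 0  # counter K: number of pixels whose XOR is 1
    m = 0  # total number of pixels M
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            k += a ^ b
            m += 1
    if m == 0:
        raise ValueError("frames must not be empty")
    return k / m


def patterns_match(gray_a, gray_b, allowed_ratio=0.1, threshold=128):
    """Match when the changed-pixel ratio is at or below the allowable value."""
    ratio = changed_pixel_ratio(binarize(gray_a, threshold), binarize(gray_b, threshold))
    return ratio <= allowed_ratio
```

Because only a per-pixel XOR and a count are needed, the check is far cheaper than a full collation against the master, which is why it suits the intermediate captures.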

The process of FIG. 26 is not limited to face images and can be applied in the same way to the fingerprint images and pressure-sensitive distribution patterns described later (contact change confirmation process). As shown in the upper part of FIG. 13, if the hand MH releases its hold while the face image 240 is being captured, the detected fingerprint image or pressure-sensitive distribution pattern changes, and this can be treated as an abnormality and result in rejection authentication. Likewise, as shown in the lower part of FIG. 13, if the face image disappears while the hand MH is holding the phone (that is, while the fingerprint image or pressure-sensitive distribution pattern is being detected), the face image pattern changes, and this too can be rejected as an abnormality.

As is clear from the flow of FIG. 23, multiple authentication processes are performed using multiple types of authentication feature information, but the authentication feature information is all acquired together in the first half of the procedure, and the authentication processing that uses it is performed together in the second half. Acquiring the authentication feature information continuously and quickly eliminates redundant (idle) time and makes sequential substitute false authentication difficult. The face image, fingerprint, pressure-sensitive distribution, and voice input are acquired sequentially in terms of processing, but acquiring the face image, fingerprint, and pressure-sensitive distribution each amounts to capturing a single frame of a pattern, which takes roughly 1 to 10 ms apiece, so one second at most is ample for acquiring all three. Voice input, on the other hand, takes about 3 to 20 seconds per phrase, so the proportion of redundant time in the procedure can be kept well within 50%, leaving a criminal attempting substitute false authentication no time to swap in a "substitute".

For example, prior to S601, the viewfinder image from the camera 341 may be shown on the display unit 308 to prompt the user to align the face with the camera 341; once a start signal is given, for instance by pressing a confirm button (assigned to one of the keys of the input unit 305, or provided as a separate authentication button), the processing from S601 to S607 should run straight through in a state where the user cannot interrupt it. As for the interval allowed for answering the questions in S602 to S605, a legitimate user should be able to answer at once, so the process moves to voice sampling immediately and advances automatically to the next step once the time needed for the answer has elapsed. The image captures in S601, S604, and S607, meanwhile, each finish in an instant of a few milliseconds, so unless the user is notified by a shutter sound or an on-screen capture message, the procedure appears to the user as nothing more than speaking an answer immediately each time a question appears on the screen, with no awareness that the face is being photographed several times in the meantime. As a result, the user merely exchanges a few spoken words with the mobile phone 1 as if in conversation and can complete the authentication casually, with no sense that complex processing including image collation is being performed internally. Linking the questions and answers to one another so that they form a short story is even more effective. Examples are shown below. Naturally, the answers should be ones that only the legitimate user can give.
(Example 1)
(Question (I)) "Who do you like?"
(Answer (I)) "Kaoru-chan"
(Question (II)) "How much?"
(Answer (II)) "To the bone"
(Example 2)
(Question (I)) "Daddy, when is Sumire's birthday?"
(Answer (I)) "December 21"
(Question (II)) "What will you buy me?"
(Answer (II)) "Shortcake"

With the redundant time shortened as described above, suppose, for example, that the face image is captured first and the fingerprint or pressure-sensitive distribution is presented only after a considerable delay. The detection window for the fingerprint and pressure-sensitive distribution then elapses during the face image capture, while the mobile phone is not being held in the hand. In other words, information acquisition is forcibly carried out while the information source, such as the fingerprint or palm, is absent, and only empty, blank fingerprint or pressure-sensitive distribution data remains. Submitting this data to the authentication process inevitably results in rejection authentication, so the objective is achieved.

FIG. 24 shows a second example of the second authentication process. Here, in S650 to S656, four types of authentication feature information are acquired simultaneously: the face image, the fingerprint, the pressure-sensitive distribution from holding the mobile phone, and the air conduction sound. Specifically, voice input is performed in the middle, at S653, and the face image, fingerprint, and pressure-sensitive distribution are each detected once before and once after the voice input to confirm simultaneity. In S657 to S659, the flow shown in FIG. 26 is first used to check whether the face image, fingerprint, and pressure-sensitive distribution patterns have moved between the captures before and after the voice input; if any is judged to have moved, the authentication flag is set to not permitted (S669). If none has moved, processing proceeds to S660 to S667, where the voice, pressure-sensitive distribution, fingerprint, and face image are each compared with the master; only if all match does processing proceed to S668, where the authentication flag is set to permitted (acceptance authentication), and if there is even one mismatch, processing proceeds to S669, where the flag is set to not permitted (rejection authentication).
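
The two-stage decision of FIG. 24 (first the before/after movement check, then the all-match collation) can be sketched as follows. This is a hedged illustration rather than the specification's implementation: the dictionary layout, the function name `second_authentication`, and the injected `has_moved` and `matches_master` callables are all assumptions.

```python
PATTERN_KINDS = ("face", "fingerprint", "pressure")
COLLATED_KINDS = ("voice", "pressure", "fingerprint", "face")


def second_authentication(before, after, voice, masters, has_moved, matches_master):
    """Return True for acceptance, False for rejection.

    before / after: patterns captured before and after the voice input,
    keyed by the names in PATTERN_KINDS.
    has_moved(a, b) and matches_master(kind, sample, master) are hypothetical
    callables supplied by the caller (for example, an XOR-based frame check).
    """
    # Stage 1 (cf. S657-S659): any movement across the voice input rejects.
    for kind in PATTERN_KINDS:
        if has_moved(before[kind], after[kind]):
            return False
    # Stage 2 (cf. S660-S667): every feature must match its master.
    samples = dict(after, voice=voice)
    return all(
        matches_master(kind, samples[kind], masters[kind]) for kind in COLLATED_KINDS
    )
```

Running the cheap movement check first means a substituted or removed hand or face is rejected before any of the more expensive master collations are attempted.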

Authentication by voice or pattern matching involves fuzzy, variable factors, and raising its accuracy requires more complex, albeit well-known, processing. When multiple types of authentication processing are combined as in the present invention, the load on the CPU or other processor increases, and a long processing wait may arise before authentication completes. One approach, therefore, is to lower the collation accuracy of each individual authentication process somewhat so as to reduce the processing load, and to let the combination of multiple methods make up for the accuracy. In this case, the process of FIG. 24 can be modified as shown in FIG. 25. The processing in S650 to S659 is exactly the same as in FIG. 24. In S660 to S673, the results of collating the voice, pressure-sensitive distribution, fingerprint, and face image with the master are not decided as a binary accept/reject; instead, the degree of match is expressed as a numerical parameter such as points, and acceptance or rejection is decided comprehensively based on a determination computation using that parameter. Thus, even if the collation result for one type of authentication feature information is somewhat unclear, a useful authentication decision with small overall error is possible as long as the results for the other types are clear. In this embodiment, a starting score is set and points are deducted each time the degree of match is low (S670, S671, S672, S674); if the passing score is met in S673, processing proceeds to S668, where the authentication flag is set to permitted (acceptance authentication), and if the passing score is not met, processing proceeds to S669, where the flag is set to not permitted (rejection authentication).
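
The point-deduction scheme of FIG. 25 can be sketched in Python as below. The specification gives no concrete numbers, so the starting score (100), the low-match threshold (0.8), the fixed penalty (20), and the passing score (70) are purely illustrative assumptions, as are the function names.

```python
def remaining_points(match_degrees, start_points=100, low_match=0.8, penalty=20):
    """Deduct a fixed penalty for each feature whose match degree is low.

    match_degrees maps a feature name to a match degree in [0, 1].
    All numeric defaults are hypothetical.
    """
    points = start_points
    for degree in match_degrees.values():
        if degree < low_match:
            points -= penalty
    return points


def accept(match_degrees, passing_points=70, **scoring):
    """Acceptance when the remaining points reach the passing score (cf. S673)."""
    return remaining_points(match_degrees, **scoring) >= passing_points
```

With these assumed values a single weak feature still passes (80 points) while two weak features fail (60 points), which mirrors the idea that one unclear collation can be compensated by clear results from the others.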

Finally, FIG. 27 shows an example in which the first authentication process (S3) and the second authentication process (S4) of the authentication main process in FIG. 15 are replaced by a composite authentication process (S3) that uses only the face contact holding state. As shown in FIG. 29, face image data cannot be acquired in the face contact holding state, so the composite authentication process, as shown in S651 to S655 of FIG. 28, combines authentication feature information other than the face image, specifically voice (here, air conduction sound and bone conduction sound) and biometric feature information of the hand (here, fingerprint and pressure-sensitive distribution), and acquires them simultaneously. Here too, voice input is performed in the middle, at S652, and the fingerprint and pressure-sensitive distribution are each detected once before and once after the voice input to confirm simultaneity. In S658 and S659, the flow shown in FIG. 26 is used to check whether the fingerprint and pressure-sensitive distribution patterns have moved between the captures before and after the voice input; if either is judged to have moved, the authentication flag is set to not permitted (S669). If neither has moved, processing proceeds to S662 to S665 and S403 to S422, where the voice, pressure-sensitive distribution, and fingerprint are each compared with the master; only if all match does processing proceed to S668, where the authentication flag is set to permitted (acceptance authentication), and if there is even one mismatch, processing proceeds to S669, where the flag is set to not permitted (rejection authentication).

For the voice authentication processing from S403 onward, the phase-difference authentication process of FIG. 19 (second authentication process: S401 to S406) is combined with the spectrum-collation authentication process of FIG. 22 (first authentication process: S420 to S422); the authentication flag is set to permitted (S426) only when both judge a match, and is set to not permitted (S427) if out of range. Here only the air conduction sound is used for spectrum collation, but the bone conduction sound may be used instead, or both may be used. However, computing the phase difference is simpler than computing a spectrum, so performing spectrum collation on only one of the air conduction sound and the bone conduction sound (omitting the spectrum computation itself for the other) and using phase-difference authentication as a supplement makes it possible to lighten the processing and improve authentication accuracy at the same time.
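
As a rough illustration of combining the two voice checks, the Python sketch below accepts only when a phase-difference check and a single-spectrum check both pass. The way each check is scored here (an allowed deviation from a master phase difference, and a mean absolute error against a master spectrum) is an assumption made for the example; the specification does not define these measures in this passage.

```python
def phase_difference_ok(measured, master, allowed_deviation):
    """Phase difference lies within the allowed range around the master value."""
    return abs(measured - master) <= allowed_deviation


def spectrum_ok(spectrum, master_spectrum, max_mean_abs_error):
    """Mean absolute error between one spectrum and its master is small enough."""
    if len(spectrum) != len(master_spectrum) or not spectrum:
        raise ValueError("spectra must be non-empty and of equal length")
    error = sum(abs(s - m) for s, m in zip(spectrum, master_spectrum)) / len(spectrum)
    return error <= max_mean_abs_error


def voice_authentication(phase, master_phase, allowed_deviation,
                         air_spectrum, master_air_spectrum, max_mean_abs_error):
    """Permit only when both the phase check and the spectrum check pass."""
    return (phase_difference_ok(phase, master_phase, allowed_deviation)
            and spectrum_ok(air_spectrum, master_air_spectrum, max_mean_abs_error))
```

Only one spectrum (here the air conduction sound) is collated, so the second spectrum computation can be skipped entirely while the inexpensive phase check still ties the result to both sound paths.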

In the embodiment above, acquisition of the data needed for authentication and the authentication processing using that data are all completed within the mobile phone (more generally, an authentication terminal), but all or part of the authentication processing can also be assigned to a device outside the mobile phone. For example, the mobile phone only acquires the authentication feature information and transfers the data, either directly or after suitable processing, by communication to an authentication data processing device implemented on another computer (in this case, the master data for collation must be transferred to the authentication data processing device in advance). The authentication data processing device receives the transferred data, performs authentication by collation in the same way as already described, and returns the result (which may have the same data format as the authentication flag) to the mobile phone. Depending on the result received, the mobile phone performs the unlock (use permitted) or no-unlock (use not permitted) processing already described.
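
The division of roles between the phone and an external authentication data processing device can be sketched as a simple request and response exchange. Everything in this Python example is hypothetical: the JSON message layout, the field names `features` and `auth_flag`, and the function names are illustrative assumptions, and the transport (radio, wireless LAN, or cable) is left out.

```python
import json


def build_request(features):
    """Phone side: package the acquired authentication feature information."""
    return json.dumps({"features": features})


def host_authenticate(request_json, masters, matches_master):
    """Host side: collate each feature with the pre-registered master data.

    matches_master(kind, sample, master) is a caller-supplied comparison.
    """
    features = json.loads(request_json)["features"]
    ok = all(
        kind in features and matches_master(kind, features[kind], master)
        for kind, master in masters.items()
    )
    return json.dumps({"auth_flag": "permitted" if ok else "not_permitted"})


def phone_should_unlock(response_json):
    """Phone side: unlock only when the returned flag is 'permitted'."""
    return json.loads(response_json).get("auth_flag") == "permitted"
```

The host iterates over its own master data rather than over whatever the phone sends, so a request that omits a required feature is rejected instead of being accepted by default.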

In FIG. 2, the authentication data processing device is an authentication host computer 352 connected to a communication network 351 such as the Internet, and the mobile phone 1 connects to the authentication host computer 352 via a radio base station 350 by radio communication through the communication connection circuit 323. The connection to the authentication host computer 352 may instead be made via a short-range wireless network such as a wireless LAN or Bluetooth, or by wire through a connector or cable.

FIG. 1 is an external perspective view showing an example of a mobile phone used in the personal authentication system of the present invention.
FIG. 2 is a block diagram showing an example of the electrical configuration of the mobile phone used in the personal authentication system of FIG. 1.
FIG. 3 is a schematic diagram showing an example of pressure-sensitive distribution detection by the surface contact sensor.
FIG. 4 is a schematic diagram showing the contents stored in the ROM and EEPROM of FIG. 2.
FIG. 5 is a graph showing an example of a voice spectrum and its spectral envelope.
FIG. 6 is a conceptual diagram of the individual frequency spectra of the air conduction sound and the bone conduction sound and their difference spectrum.
FIG. 7 is a schematic waveform diagram showing the concept of filtering a voice waveform before use.
FIG. 8 is a schematic waveform diagram explaining the phase difference between the air conduction sound and the bone conduction sound.
FIG. 9 is an explanatory diagram of a method for obtaining the phase difference between the air conduction sound and the bone conduction sound from a waveform difference.
FIG. 10 is a conceptual diagram of authentication by face image.
FIG. 11 is a conceptual diagram of authentication by fingerprint.
FIG. 12 is a conceptual diagram of the pressure-sensitive distribution by fingerprint.
FIG. 13 is a diagram explaining how a face image and biometric information of the hand are improperly input one after the other.
FIG. 14 is a schematic diagram illustrating a prompting format for voice authentication input.
FIG. 15 is a flowchart showing the flow of the authentication main process.
FIG. 16 is a flowchart showing the flow of the initialization process.
FIG. 17 is a flowchart showing the flow of the voice data processing.
FIG. 18 is a flowchart showing the flow of the air conduction sound / bone conduction sound waveform sampling process.
FIG. 19 is a flowchart showing the flow of the air conduction sound / bone conduction sound phase difference calculation process.
FIG. 20 is a flowchart showing the flow of a first example of the first authentication process.
FIG. 21 is a flowchart showing the flow of a second example of the same.
FIG. 22 is a flowchart showing the flow of a third example of the same.
FIG. 23 is a flowchart showing the flow of a first example of the second authentication process.
FIG. 24 is a flowchart showing the flow of a second example of the same.
FIG. 25 is a flowchart showing the flow of a third example of the same.
FIG. 26 is a flowchart showing the flow of the pattern collation process that detects movement of a face image or of biometric information of the hand.
FIG. 27 is a flowchart showing an example of the authentication main process using the composite authentication process in the face contact holding state.
FIG. 28 is a flowchart showing the flow of the composite authentication process.
FIG. 29 is an explanatory diagram of the face contact holding state.
FIG. 30 is an explanatory diagram of the face-to-face holding state.

Explanation of symbols

1 Mobile phone (personal authentication system)
304 Transmitter (microphone: air conduction sound detection unit)
340 Bone conduction microphone (bone conduction sound detection unit)
341 Face photographing camera
342 Fingerprint detection unit (contact-type biometric feature information detection unit)
343 Surface contact sensor (contact-type biometric feature information detection unit)
312 CPU (authentication processing means, collation means, composite voice feature information calculation means)
313 RAM (bone conduction voice information storage unit, air conduction voice information storage unit)
322 EEPROM (standard voice feature information storage unit)

Claims

1. A personal authentication system for authenticating a person to be authenticated using a mobile phone, comprising:
an authentication feature information acquisition unit provided in the mobile phone and including two or more selected from the group consisting of:
a contact-type biometric feature information detection unit that detects biometric feature information of the hand and is provided at a position touched by the hand of the person to be authenticated when that person holds the mobile phone in a telephone use grasp holding state, in which the mobile phone is grasped and held in the same way as when a telephone function other than the authentication processing is used;
a face photographing camera provided at a position where the face of the person to be authenticated can be photographed when that person holds the mobile phone in the telephone use grasp holding state;
a bone conduction sound detection unit that detects voice information of the person to be authenticated as bone conduction sound; and
an air conduction sound detection unit that detects voice information of the person to be authenticated as air conduction sound;
authentication feature information acquisition control means that is provided in the mobile phone and causes at least two designated ones of the two or more authentication feature information acquisition units to acquire authentication feature information simultaneously in the telephone use grasp holding state; and
authentication processing means that is provided inside or outside the mobile phone and performs authentication processing of the person to be authenticated based on the individual authentication feature information acquired by each of the two or more authentication feature information acquisition units,
wherein, when the acquisition of the authentication feature information by the at least two designated authentication feature information acquisition units is not performed simultaneously, acceptance authentication of the person to be authenticated is not performed.

2. The personal authentication system according to claim 1, wherein the contact-type biometric feature information detection unit includes a fingerprint detection unit.

3. The personal authentication system according to claim 1, wherein the contact-type biometric feature information detection unit includes a contact detection sensor that detects a contact state between the mobile phone and the hand in the telephone use grasp holding state.

4. The personal authentication system according to claim 3, wherein the contact detection sensor is a surface contact sensor that detects, as the biometric information of the hand, contact distribution or grasping pressure distribution information on the surface of the mobile phone, and the authentication processing means performs the authentication based on the contact distribution or grasping pressure distribution information detected by the surface contact sensor.

5. The personal authentication system according to claim 1, wherein a face contact type holding state, in which the receiver of the mobile phone is held against the face, is used as the telephone use grasp holding state; two or more selected from the group consisting of the contact-type biometric feature information detection unit, the bone conduction sound detection unit, and the air conduction sound detection unit are used in combination as the authentication feature information acquisition unit; and the mobile phone either is not provided with the face photographing camera or, even if provided with it, does not use the face photographing camera in the mode in which the authentication feature information is acquired in the face contact type holding state.

6. The personal authentication system, wherein, in the face contact type holding state, the contact-type biometric feature information detection unit and at least one of the bone conduction sound detection unit and the air conduction sound detection unit are used in combination as the authentication feature information acquisition unit, and the authentication feature information acquisition control means simultaneously acquires, as the authentication feature information, voice information of at least one of the air conduction sound and the bone conduction sound together with the biometric information of the hand.

7. The personal authentication system according to claim 5 or 6, wherein, in the face contact type holding state, both the bone conduction sound detection unit and the air conduction sound detection unit are used as the authentication feature information acquisition unit, and the authentication feature information acquisition control means acquires both bone conduction sound information and air conduction sound information as the authentication feature information by having the bone conduction sound detection unit and the air conduction sound detection unit simultaneously detect the voice uttered by the person to be authenticated.

8. The personal authentication system according to claim 7, further comprising composite voice feature information calculation means for calculating composite voice feature information that can be calculated only when both the bone conduction sound waveform detected by the bone conduction sound detection unit and the air conduction sound waveform detected by the air conduction sound detection unit are used, wherein the authentication processing means performs the authentication processing based on the composite voice feature information.

9. The personal authentication system according to claim 8, wherein the composite voice feature information calculation means calculates a phase difference between the air conduction sound waveform and the bone conduction sound waveform as the composite sound feature information.

10. The personal authentication system according to claim 8 or 9, wherein the authentication processing means performs, in combination, a first authentication process that compares at least one of a frequency spectrum of the bone conduction sound and a frequency spectrum of the air conduction sound with a standard frequency spectrum, and a second authentication process based on the composite voice feature information.
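The combination of the first and second authentication processes recited above could be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: the spectrum comparison is assumed to be a cosine similarity, the composite voice feature information is assumed to be a phase difference in radians, "in combination" is assumed to mean that both processes must pass, and the thresholds `min_similarity` and `tolerance` are hypothetical values.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length magnitude spectra."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def first_authentication(spectrum, standard_spectrum, min_similarity=0.9):
    """First process: compare a measured spectrum with the standard one.
    Cosine similarity and the 0.9 threshold are assumptions."""
    return cosine_similarity(spectrum, standard_spectrum) >= min_similarity


def second_authentication(phase_diff, registered_phase_diff, tolerance=0.1):
    """Second process: compare the composite feature (assumed here to be
    a phase difference in radians) with a registered value."""
    # Wrap the difference so that values near +pi and -pi count as close.
    delta = (phase_diff - registered_phase_diff + math.pi) % (2 * math.pi) - math.pi
    return abs(delta) <= tolerance


def authenticate(spectrum, standard_spectrum, phase_diff, registered_phase_diff):
    """Accept only when both processes pass (AND combination is assumed)."""
    return (first_authentication(spectrum, standard_spectrum)
            and second_authentication(phase_diff, registered_phase_diff))
```

Requiring both checks means a recording that reproduces only the air conduction spectrum would still fail the second process.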

11. The personal authentication system according to any one of the above claims, wherein the mobile phone is provided with a display unit that displays a face image of the person to be authenticated photographed by the face photographing camera, the face photographing camera is attached at a position where it can photograph the face when the person to be authenticated directly faces the display unit of the mobile phone, a face-to-face holding state, in which the mobile phone is held with the display unit and the face photographing camera facing the face of the person to be authenticated, is used as the telephone use holding state, the face photographing camera is indispensable as the authentication feature information acquisition unit and is used in combination with at least one of the contact-type biometric feature information detection unit and the air conduction sound detection unit, and the mobile phone either is not provided with the bone conduction sound detection unit or, where it is provided, does not use the bone conduction sound detection unit in the mode in which the authentication feature information is acquired in the face-to-face holding state.

12. The personal authentication system according to any one of claims 1 to 11, wherein the authentication feature information acquisition unit includes the contact-type biometric feature information detection unit in combination with at least one of the face photographing camera, the air conduction sound detection unit, and the bone conduction sound detection unit, and the authentication feature information acquisition control means performs a contact change confirmation process in which the biometric information of the hand is detected by the contact-type biometric feature information detection unit before and after the acquisition process for at least one of the face image, the air conduction sound information, and the bone conduction sound information, and a change in the detection state is examined.
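The contact change confirmation process recited above can be illustrated with a short sketch. It assumes, purely for illustration, that the contact-type detection unit returns a list of numeric feature values, that an empty list means no contact, and that a "change in the detection state" means any value differing by more than `max_change`. The names `contact_changed`, `acquire_with_contact_check`, `read_contact`, and `acquire_feature` are hypothetical.

```python
def contact_changed(before, after, max_change):
    """Return True when the hand contact reading differs between the two
    detections. Readings are assumed to be lists of numbers; an empty or
    differently sized reading is treated as a change."""
    if not before or not after or len(before) != len(after):
        return True
    return max(abs(b - a) for b, a in zip(before, after)) > max_change


def acquire_with_contact_check(read_contact, acquire_feature, max_change=0.1):
    """Detect the hand biometric reading before and after acquiring
    another feature (face image or voice), and discard the feature if
    the contact state changed in between.

    `read_contact` and `acquire_feature` are caller-supplied callables
    standing in for the detection units.
    """
    before = read_contact()
    feature = acquire_feature()
    after = read_contact()
    if contact_changed(before, after, max_change):
        return None  # contact was lost or replaced during acquisition
    return feature
```

Returning `None` when the reading changes models the idea that the same hand must remain in contact throughout the acquisition.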