CA2013371C - Voice verification circuit for validating the identity of telephone calling card customers - Google Patents
Voice verification circuit for validating the identity of telephone calling card customers
- Publication number
- CA2013371C CA2013371C CA002013371A CA2013371A CA2013371C CA 2013371 C CA2013371 C CA 2013371C CA 002013371 A CA002013371 A CA 002013371A CA 2013371 A CA2013371 A CA 2013371A CA 2013371 C CA2013371 C CA 2013371C
- Authority
- CA
- Canada
- Prior art keywords
- speech
- parameters
- circuitry
- coding
- covariance matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000012795 verification Methods 0.000 title abstract description 46
- 239000011159 matrix material Substances 0.000 claims abstract description 68
- 230000009466 transformation Effects 0.000 claims abstract description 27
- 238000000034 method Methods 0.000 claims description 21
- 230000001131 transforming effect Effects 0.000 claims description 15
- 230000003595 spectral effect Effects 0.000 claims description 13
- 238000010586 diagram Methods 0.000 description 8
- 238000000605 extraction Methods 0.000 description 8
- 239000013598 vector Substances 0.000 description 5
- 238000013475 authorization Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 3
- 230000006978 adaptation Effects 0.000 description 2
- 230000004075 alteration Effects 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000000153 supplemental effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
Abstract
A speaker verification system receives input speech from a speaker of unknown identity. The speech (12, 14) undergoes LPC analysis and transformation to maximize separability between true speakers and impostors when compared (16) to reference speech parameters which have been similarly transformed. The transformation incorporates an "inter-class" covariance matrix of successful impostors within a database.
Description
- VOICE VERIFICATION CIRCUIT FOR VALIDATING THE IDENTITY OF TELEPHONE CALLING CARD CUSTOMERS
TECHNICAL FIELD OF THE INVENTION
This invention relates in general to speech analysis, and more particularly to a high performance speaker verification system including speaker discrimination.
BACKGROUND OF THE INVENTION
In many applications, it is necessary to verify the identity of an unknown person. One example of an identity verification device is a photo badge, by which an interested party may compare the photo on the badge with the person claiming an identity in order to verify the claim. This method of verification has many shortcomings.
Badges are prone to loss and theft, and to relatively easy duplication or alteration. Furthermore, the inspection of the badge must be performed by a person, and is thus not applicable to many situations where the verification must be done by a machine. In short, an effective verification system or device must be cost-effective, fast, accurate, easy to use and resistant to tampering or impersonation.
Long distance credit card services, for example, must identify a user to ensure that an impostor does not use the service under another person's identity. Prior art systems provide a lengthy identification number (calling card number) which must be entered via the phone's keypad to initiate the long distance service. This approach is prone to abuse, since the identification number may be easily appropriated by theft, or by simply observing the entry of the identification number by another. It has been estimated that the loss to the long distance services due to unauthorized use exceeds $500,000,000 per year.
Speaker verification systems have been available for several years. However, most applications require a very small true speaker rejection rate, and a small impostor acceptance rate. If the true speaker rejection
rate is too high, then the verification system will place a burden on the users. If the impostor acceptance rate is too high, then the verification system may not be of value. Prior art speaker verification systems have not provided the necessary discrimination between true speakers and impostors to be commercially acceptable in applications where the speaking environment is unfavorable.
Speaker verification over long distance telephone networks presents challenges not previously overcome. Variations in handset microphones result in severe mismatches between speech data collected from different handsets for the same speaker. Further, the telephone channels introduce signal distortions which reduce the accuracy of the speaker verification system.
Also, there is little control over the speaker or speaking conditions.
Therefore, a need has arisen in the industry for a system to prevent calling card abuse over telephone lines. Further, a need has arisen to provide a speaker verification system which effectively discriminates between true speakers and impostors, particularly in a setting where verification occurs over a long distance network.
SUMMARY OF THE INVENTION
In accordance with the present invention, a speaker verification method and apparatus is provided which substantially reduces the problems associated with prior verification systems.
A telephone long distance service is provided using speaker verification to determine whether a user is a valid user or an impostor. The user claims an identity by offering some form of identification, typically by entering a calling card number on the phone's touch-tone keypad. The service requests the user to speak a speech sample, which is subsequently transformed and compared with a reference model previously created from a speech sample provided by the valid user. The comparison results in a score which is used to accept or reject the user.
The telephone service verification system of the present invention provides significant advantages over the prior art. In order to be accepted, an impostor would need to know the correct phrase, the proper inflection and cadence in repeating the phrase, and would have to have speech features sufficiently close to the true speaker.
Hence, the likelihood of defeating the system is very small.
The speaker verification system of the present invention receives input speech from a speaker of unknown identity. The speech signal is subjected to an LPC analysis to derive a set of spectral and energy parameters based on the energy and spectral content of the speech signal. These parameters are transformed to derive a template of statistically optimum features that are designed to maximize separability between true speakers and known impostors. The template is compared with a previously stored reference model for the true speaker. A score is derived from the comparison with the reference model, which may be compared to a threshold to determine whether the unknown speaker is a true speaker or an impostor. The comparison of the input speech to the reference speech is made using Euclidean distance measurements (the sum of the squares of distances between corresponding features) using either Dynamic Time Warping or Hidden Markov Modeling.
In one aspect of the present invention, the transformation is computed using two matrices. The first matrix is derived from the speech of all true speakers in a database. The second matrix is derived from all impostors in the database, those whose speech may be confused with that of a true speaker. The second matrix provides a basis for discriminating between true speakers and known impostors, thereby increasing the separability of the system.
The speaker verification system of the present invention provides significant advantages over prior art systems. First, the percentage of impostor acceptance relative to the percentage of true speaker rejection is decreased. Second, the dimensionality of the transformation matrix may be reduced, thereby reducing template storage requirements and computational burden.
In accordance with one aspect of the present invention there is provided a method of verifying the identity of an unknown person, whereby the unknown person's identity is determined to be either a true speaker or an impostor, comprising the steps of: receiving input speech from the unknown person; coding the input speech into a set of predetermined spectral and energy parameters; transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
In accordance with another aspect of the present invention there is provided a method of verifying the identity of an unknown person over a telephone network, whereby the unknown person's identity is determined to be either a true speaker or an impostor, comprising the steps of: receiving input speech from the unknown person over the telephone network; coding the input speech into a set of predetermined spectral and energy parameters;
transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
In accordance with yet another aspect of the present invention there is provided apparatus for verifying the identity of an unknown person, whereby the unknown person's identity is determined to be either a true speaker or an impostor, comprising: circuitry for receiving input speech from the unknown person; circuitry for coding the input speech into a set of predetermined spectral and energy parameters; circuitry for transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and circuitry for comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
FIGURE 1 illustrates a flow chart depicting a personal identity verification system for long-distance calling card services using speech verification;
FIGURE 2 illustrates a block diagram depicting enrollment of a speaker into the speaker verification system of the present invention;
FIGURE 3 illustrates a block diagram depicting the verification and reference update used in the present invention;
FIGURE 4 illustrates a block diagram depicting the verification system used in the present invention;
FIGURE 5a illustrates the vectors used to form the in-class and inter-class matrices used to form the transformation matrix in the present invention;
FIGURE 5b illustrates a block diagram depicting the formation of the speaker discrimination transformation matrix; and FIGURE 6 illustrates a comparison of the speaker verification system of the present invention to a prior art speaker verification system.
DETAILED DESCRIPTION OF THE INVENTION
The preferred embodiment of the present invention is best understood by referring to FIGURES 1-6 of the drawings.
FIGURE 1 illustrates a flow chart 10 depicting personal identity verification using speaker verification in connection with a long-distance calling card service.
In block 12, a person claims an identity by offering some information corresponding to a unique identification. For example, a long distance telephone subscriber may enter a unique ID number to claim his identity. In other applications, such as entry to a building, a person may claim identity by presenting a picture badge.
Since the identification offered in block 12 is subject to theft and/or alteration, the personal identity verification system of the present invention requests a voice sample from the person in block 14. In block 16, the voice sample provided by the person is compared to a stored reference voice sample which has been previously obtained for the speaker whose identity is being claimed (the "true" speaker). Supplemental security is necessary to ensure that unauthorized users do not create a reference model for another valid user. If the voice sample correlates with the stored voice sample according to predefined decision criteria in decision block 18, the identity offered by the person is accepted in block 20.
If the match between the reference voice sample and the input speech utterance does not satisfy the decision criteria in decision block 18, then the offered identity is rejected in block 22.
FIGURE 2 illustrates a block diagram depicting enrollment of a user's voice into the speaker verification system of the present invention. During the enrollment phase, each user of the system supplies a voice sample comprising an authorization phrase which the user will use to gain access to the system. The enrollment speech sample is digitized using an analog-to-digital (A/D) converter 24. The digitized speech is subjected to a linear predictive coding (LPC) analysis in circuit 26.
The beginning and end of the enrollment speech sample are detected by the utterance detection circuit 28. The utterance detection circuit 28 estimates a speech utterance level parameter from RMS energy (computed every 40 msec frame) using fast upward adaptation and slow downward adaptation. The utterance detection threshold is determined from a noise level estimate and a predetermined minimum speech utterance level. The end of the utterance is declared when the speech level estimate remains below a fraction (for example, 0.125) of the peak speech utterance level for a specified duration (for example, 500 msec).
Typically, the utterance has a duration of 2-3 seconds.
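The endpoint-detection logic described above (fast upward and slow downward level adaptation, with the end of the utterance declared once the level stays below a fraction of its peak for about 500 msec) can be sketched as follows. This is an illustrative reconstruction, not the patent's circuit: the function name, the adaptation constants, and the default noise level are assumptions; only the 0.125 fraction and the roughly 13-frame (500 msec) hold come from the text.

```python
def detect_utterance(rms, noise_level=0.01, min_speech_level=0.05,
                     end_fraction=0.125, end_frames=13):
    """Endpoint detection over per-frame RMS energies (one frame = 40 msec).

    Tracks a speech-level estimate with fast upward / slow downward
    adaptation, then declares the end of the utterance once the level
    stays below end_fraction of the peak for end_frames frames
    (13 frames is roughly 500 msec). Returns (start_frame, end_frame).
    """
    # Threshold from a noise-level estimate and a minimum speech level,
    # as the text describes; the factor of 2 is an assumption.
    threshold = max(noise_level * 2.0, min_speech_level)
    level = 0.0
    peak = 0.0
    start = end = None
    below = 0
    for i, e in enumerate(rms):
        # Fast attack when energy rises, slow decay when it falls.
        alpha = 0.5 if e > level else 0.05
        level = (1 - alpha) * level + alpha * e
        if start is None and level > threshold:
            start = i
        if start is not None:
            peak = max(peak, level)
            if level < end_fraction * peak:
                below += 1
                if below >= end_frames:
                    end = i
                    break
            else:
                below = 0
    return start, end
```

Feeding in a silence-speech-silence energy contour yields a start index at the onset of speech and an end index once the decayed level estimate has stayed under 0.125 of the peak for the hold duration.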
The feature extraction circuit 30 computes a plurality of parameters from each frame of LPC data. In the preferred embodiment, thirty-two parameters are computed by the feature extraction circuit 30, including:
a speech level estimate;
RMS frame energy;
a scalar measure of rate of spectral change;
fourteen filter-bank magnitudes using MEL-spaced simulated filter banks normalized by frame energy;
time difference of frame energy over 40 msec;
and time difference of fourteen filter-bank magnitudes over 40 msec.
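The thirty-two-parameter frame vector enumerated above can be sketched as follows. The function name and argument layout are illustrative assumptions; the input quantities are assumed to come from an earlier LPC and filter-bank stage, with the previous frame's (already energy-normalized) filter-bank magnitudes supplied for the 40 msec time differences.

```python
import numpy as np

def frame_features(level_est, rms_energy, spectral_change,
                   filterbank_mags, prev_energy, prev_filterbank):
    """Assemble the 32-parameter vector for one 40 msec frame:
    1 speech level estimate + 1 RMS frame energy + 1 spectral-change
    scalar + 14 mel-spaced filter-bank magnitudes (normalized by frame
    energy) + 1 energy time difference + 14 filter-bank time differences.
    """
    # Normalize the 14 filter-bank magnitudes by frame energy.
    fb = np.asarray(filterbank_mags, dtype=float) / max(rms_energy, 1e-8)
    prev_fb = np.asarray(prev_filterbank, dtype=float)
    return np.concatenate((
        [level_est, rms_energy, spectral_change],
        fb,
        [rms_energy - prev_energy],   # time difference of energy over 40 msec
        fb - prev_fb,                 # time differences of filter banks
    ))
```

The resulting vector has exactly 3 + 14 + 1 + 14 = 32 entries, matching the count given in the preferred embodiment.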
The feature extraction circuit 30 computes the thirty-two parameters and derives fourteen features (the least significant features are discarded) using a linear transformation of the LPC data for each frame. The formation of the linear transformation matrix is described in connection with FIGURE 5. The fourteen features computed by the feature extraction circuit 30 for each 40 msec frame are stored in a reference template memory 32.
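The 32-to-14 reduction described above is a matrix projection; it can be sketched as below. The transformation rows would in practice be the most significant eigenvectors derived as described in connection with FIGURE 5; here a random orthonormal matrix is used purely as a stand-in, so the numerical output is not meaningful, only the shapes.

```python
import numpy as np

def project_features(frames_32, transform):
    """Project each 32-parameter frame onto the 14 retained features.

    transform is a 14x32 matrix whose rows are the retained (most
    significant) basis vectors; the other 18 are discarded.
    """
    frames_32 = np.atleast_2d(frames_32)          # (n_frames, 32)
    return frames_32 @ transform.T                # (n_frames, 14)

# Stand-in transform: first 14 rows of a random orthonormal 32x32 basis.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
T14 = q[:14]                                      # shape (14, 32)

# A 60-frame utterance of 32-parameter vectors becomes a 60x14 template.
template = project_features(rng.standard_normal((60, 32)), T14)
```

Storing 14 features per frame instead of 32 is what reduces the template storage and the per-frame distance computation mentioned earlier.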
FIGURE 3 illustrates a block diagram depicting the verification circuit. The person desiring access must repeat the authorization phrase into the speech verification system. Many impostors will be rejected because they do not know the correct authorization phrase.
The input speech (hereinafter "verification speech") is input to a process and verification circuit 34, which determines whether the verification speech matches the speech submitted during enrollment. If the speech is accepted by decision logic 36, then the reference template is updated in circuit 38. If the verification speech is rejected, then the person is requested to repeat the phrase. If the verification speech is rejected after a predetermined number of repeated attempts, the user is denied access.
After each successful verification, the reference template is updated by averaging the reference and the most recent utterance (in the feature domain) as follows:

Rnew = (1 - a)Rold + aT

where:
a = min(max(1/n, 0.05), 0.2)
n = session index
R = reference template data
T = last accepted utterance

A block diagram depicting verification of an utterance is illustrated in FIGURE 4. The verification speech, submitted by the user requesting access, is subjected to A/D conversion, LPC analysis, and feature extraction in blocks 40-44. The A/D conversion, LPC analysis and feature extraction are identical to the processes described in connection with FIGURE 2.
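The exponentially weighted reference update given above can be sketched directly; the function name is illustrative, and the templates are assumed to already be time-aligned frame arrays in the feature domain.

```python
def update_reference(R_old, T_last, n):
    """After a successful verification, blend the last accepted utterance
    into the reference template: Rnew = (1 - a) * Rold + a * T, with
    a = min(max(1/n, 0.05), 0.2) and n the session index.
    Returns the updated template and the weight a that was used.
    """
    # Early sessions adapt quickly (a capped at 0.2); mature references
    # adapt slowly (a floored at 0.05).
    a = min(max(1.0 / n, 0.05), 0.2)
    R_new = [(1 - a) * r + a * t for r, t in zip(R_old, T_last)]
    return R_new, a
```

The clamping means the reference keeps tracking gradual voice and handset drift without any single utterance (possibly a lucky impostor) dominating it.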
The parameters computed by the feature extraction circuit 44 are input to a dynamic time warping and compare circuit 46. Dynamic time warping (DTW) employs an optimum warping function for nonlinear time alignment of the two utterances (reference and verification) at equivalent points of time. The correlation between the two utterances is derived by integrating over time the Euclidean distances between the feature parameters representing the time-aligned reference and verification utterances at each frame. The DTW and compare circuit 46 outputs a score representing the similarity between the two utterances. The score is compared to a predetermined threshold by decision logic 36, which determines whether the utterance is accepted or rejected.
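The DTW comparison described above can be sketched with the textbook dynamic-programming recursion below. This is a generic DTW with squared Euclidean frame distances and path-length normalization, offered as a hedged illustration of circuit 46 rather than the patent's exact recursion or local path constraints.

```python
import numpy as np

def dtw_score(ref, ver):
    """Dynamic time warping between reference and verification templates,
    each an array of per-frame feature vectors. Accumulates squared
    Euclidean frame distances along the optimal nonlinear time alignment
    and normalizes by path length; lower scores mean a closer match.
    """
    ref, ver = np.asarray(ref, float), np.asarray(ver, float)
    n, m = len(ref), len(ver)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Sum of squares of distances between corresponding features.
            d = np.sum((ref[i - 1] - ver[j - 1]) ** 2)
            # Best of the three allowed predecessor alignments.
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```

Decision logic 36 would then compare this score against a predetermined threshold to accept or reject the claimed identity.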
In order to compute the linear transformation matrix used in the feature extraction circuits 44 and 30, a speech database is collected over a group of users. If, for example, the speech database is to be used in connection with a telephone network, the database speech will be collected over the long distance network to provide for the variations in handset microphones and signal distortions due to the telephone channel. Speech is collected from the users over a number of sessions.
During each session, the users repeat an authorization phrase, such as "1650 Atlanta, Georgia" or a phone number such as "765-4321".
FIGURE 5a illustrates the speech data for a single user. The database utterances are digitized and ' subjected to LPC analysis as discussed in connection with FIGURE 2. Consequently, each utterance 38 is broken into a number of 40 msec frames 50. Each frame is represented by 32 parameters, as previously discussed herein. Each speaker provides a predetermined number of utterances 48.
For example, in FIGURE 5a, each speaker provides 40 utterances. An initial linear transformation matrix or "in-class" covariance matrix [L) is derived from a principal component analysis performed on a pooled covariance matrix computed over all true speakers. To compute .the initial linear transformation matrix [L), covariance matrices are computed for each speaker over the 40 (or other predetermined number) time aligned database utterances 48. The covariance matrices derived for each speaker in the database are pooled together and diagonalzzed. The initial linear transformation matrix is made up of the eigenvectors of the pooled covariance matrix. The resulting diagonalized initial linear transform matrix will have dimensions of 32x32; however, the resulting matrix comprises uncorrelated features ranked in decreasing order of statistical variance.
Therefore, the least significant features may be discarded. The resulting initial linear transformation (after discarding the least significant features) accounts for approximately 95% of the total variance in the data.
In an important aspect of the present invention, the initial linear transformation matrix is adjusted to maximize the reparability between true speakers and impostors in a given data base. Speaker reparability is a more desirable goal than creating a set of statistically uncorrelated features, since the uncorrelated features may ' not be good discriminant features.
An inter-class or "confusion" cavariance matrix is computed over all time-aligned utterances for all successful impostors of a given true speaker. For example, if the database shows that the voice data supplied by 120 impostors (anyone other than the true speaker) will be accepted by the verification system as coming from the true speaker, a covariance matrix is computed far these utterances. The covariance matrices computed for impostors of each true speaker are pooled over all true speakers. The covariance matrix corresponding to the pooled impostor data is known as the "inter-class" or "confusion" covariance matrix [Cj.
Z5 To compute the final linear transformation matrix.[LTj, the initial linear transformation covariance matrix [L] is diagonalized, resulting in a matrix [Ldj.
The matrix [Ldj is multiplied by the confusion matrix [Cj and is subsequently diagonalized. The resulting matrix is the linear transformation matrix [LTj. The block diagram ShOwlllg computation of the linear transformation matrix is illustrated in FIGURE 5b in blocks 52-58.
The transformation provided by the confusion matrix further rotates the speech feature vector to increase separability between true speakers and impostors.
In addition to providing a higher impostor rejection rate, the transformation leads to a further reduction in the number of features used in the speech representation (dimensionality), since only the dominant dimensions need to be preserved. Whereas, an eighteen-feature vector per frame is typically used for the principal spectral ' components, it has been found that a fourteen-feature vector may be used in connection with the present invention. The smaller feature vector reduces the noise inherent in the transformation.
Experimental results comparing impostor acceptance as a function of true speaker rejection is shown in FTGURE 6. In FIGURE 6, curve "A" illustrates impostor acceptance computed without use of the confusion matrix. Curve "B" illustrates the impostor acceptance using the confusion matrix to provide speaker discrimination. As can be seen, for a true speaker rejection of approximately two percent, the present invention reduces the impostor acceptance by approximately ten percent.
In addition to the dynamic time warping (time-aligned) method of performing the comparison of the reference and verification utterances, a Hidden Markov Model-based {HMM) comparison could be employed. An HMM
comparison would provide a state-by-state comparison of the reference and verification utterances, each utterance being transformed as described hereinabove. It has been found that a word by word HMM comparison is preferable to a whole-phrase comparison, due to the inaccuracies caused mainly by pauses between words.
Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
and time difference of fourteen filter-bank magnitudes over 40 msec.
The feature extraction circuit 30 computes the thirty-two parameters and derives fourteen features (the least significant features are discarded) using a linear transformation of the LPC data for each frame. The formation of the linear transformation matrix is described in connection with FIGURE 5. The fourteen features computed by the feature extraction circuit 30 for each 40 msec frame are stored in a reference template memory 32.
FIGURE 3 illustrates a block diagram depicting the verification circuit. The person desiring access must repeat the authorization phrase into the speech verification system. Many impostors will be rejected because they do not know the correct authorization phrase.
The input speech (hereinafter "verification speech") is input to a process and verification circuit 34, which determines whether the verification speech matches the speech submitted during enrollment. If the speech is accepted by decision logic 36, then the reference template is updated in circuit 38. If the verification speech is rejected, then the person is requested to repeat the phrase. If the verification speech is rejected after a predetermined number of repeated attempts, the user is denied access.
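The accept/repeat/deny flow described above can be sketched as a short loop. All names here (`get_utterance`, `score_fn`, `threshold`, `max_attempts`) are illustrative assumptions, not identifiers from the patent:

```python
def verify_with_retries(get_utterance, score_fn, threshold, max_attempts=3):
    """Accept the caller if any of max_attempts utterances scores under
    the threshold; deny access once the allowed repeats are exhausted.
    Hypothetical names: score_fn returns a dissimilarity score, so
    lower means closer to the enrolled reference."""
    for _ in range(max_attempts):
        if score_fn(get_utterance()) < threshold:
            return True   # accepted; the reference template may now be updated
    return False          # denied after repeated rejections
```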
After each successful verification, the reference template is updated by averaging the reference and the most recent utterance (in the feature domain) as follows:

R_new = (1 - a)R_old + aT

where a = min(max(1/n, 0.05), 0.2), n = session index, R = reference template data, and T = last accepted utterance.

A block diagram depicting verification of an utterance is illustrated in FIGURE 4. The verification speech, submitted by the user requesting access, is subjected to A/D conversion, LPC analysis, and feature extraction in blocks 40-44. The A/D conversion, LPC
analysis and feature extraction are identical to the processes described in connection with FIGURE 2.
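The template update rule above, R_new = (1-a)R_old + aT with a clamped between 0.05 and 0.2, can be sketched as follows (Python with NumPy assumed; function and variable names are illustrative):

```python
import numpy as np

def update_reference(r_old, t_accepted, n):
    """Blend the stored reference template with the last accepted
    utterance in the feature domain. The weight a = min(max(1/n, 0.05), 0.2)
    adapts quickly in early sessions (a capped at 0.2) and settles to a
    floor of 0.05 as the session index n grows."""
    a = min(max(1.0 / n, 0.05), 0.2)
    return (1.0 - a) * r_old + a * t_accepted
```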
The parameters computed by the feature extraction circuit 44 are input to a dynamic time warping and compare circuit 46. Dynamic time warping (DTW) employs an optimum warping function for nonlinear time alignment of the two utterances (reference and verification) at equivalent points of time. The correlation between the two utterances is derived by integrating over time the Euclidean distances between the feature parameters representing the time-aligned reference and verification utterances at each frame. The DTW and compare circuit 46 outputs a score representing the similarity between the two utterances. The score is compared to a predetermined threshold by decision logic 36, which determines whether the utterance is accepted or rejected.
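A minimal sketch of the alignment performed by the DTW and compare circuit 46, accumulating Euclidean frame distances along the optimal warping path. This is the textbook DTW recursion, not the patent's exact circuit:

```python
import numpy as np

def dtw_score(ref, ver):
    """Accumulated Euclidean distance between feature frames along the
    best nonlinear time alignment of two utterances.
    ref, ver: arrays of shape (frames, features). Lower score = closer."""
    n, m = len(ref), len(ver)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - ver[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]
```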
In order to compute the linear transformation matrix used in the feature extraction circuits 44 and 30, a speech database is collected over a group of users. If, for example, the speech database is to be used in connection with a telephone network, the database speech will be collected over the long distance network to provide for the variations in handset microphones and signal distortions due to the telephone channel. Speech is collected from the users over a number of sessions.
During each session, the users repeat an authorization phrase, such as "1650 Atlanta, Georgia" or a phone number such as "765-4321".
FIGURE 5a illustrates the speech data for a single user. The database utterances are digitized and subjected to LPC analysis as discussed in connection with FIGURE 2. Consequently, each utterance 48 is broken into a number of 40 msec frames 50. Each frame is represented by 32 parameters, as previously discussed herein. Each speaker provides a predetermined number of utterances 48.
For example, in FIGURE 5a, each speaker provides 40 utterances. An initial linear transformation matrix or "in-class" covariance matrix [L] is derived from a principal component analysis performed on a pooled covariance matrix computed over all true speakers. To compute the initial linear transformation matrix [L], covariance matrices are computed for each speaker over the 40 (or other predetermined number) time-aligned database utterances 48. The covariance matrices derived for each speaker in the database are pooled together and diagonalized. The initial linear transformation matrix is made up of the eigenvectors of the pooled covariance matrix. The resulting diagonalized initial linear transform matrix will have dimensions of 32x32; however, the resulting matrix comprises uncorrelated features ranked in decreasing order of statistical variance.
Therefore, the least significant features may be discarded. The resulting initial linear transformation (after discarding the least significant features) accounts for approximately 95% of the total variance in the data.
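The derivation of the initial transform — pool the per-speaker covariances, diagonalize, and keep the eigenvectors covering roughly 95% of the total variance — can be sketched as follows (illustrative NumPy, not the patent's implementation):

```python
import numpy as np

def initial_transform(speaker_frames, var_kept=0.95):
    """speaker_frames: list of (frames, 32) feature arrays, one per true
    speaker. Pool the per-speaker covariance matrices, diagonalize the
    pool, and keep the leading eigenvectors (ranked by decreasing
    variance) that together account for ~95% of the total variance."""
    pooled = np.mean([np.cov(f, rowvar=False) for f in speaker_frames], axis=0)
    evals, evecs = np.linalg.eigh(pooled)      # eigh returns ascending order
    order = np.argsort(evals)[::-1]            # re-rank by decreasing variance
    evals, evecs = evals[order], evecs[:, order]
    k = np.searchsorted(np.cumsum(evals) / evals.sum(), var_kept) + 1
    return evecs[:, :k]                        # 32 x k projection matrix
```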
In an important aspect of the present invention, the initial linear transformation matrix is adjusted to maximize the separability between true speakers and impostors in a given database. Speaker separability is a more desirable goal than creating a set of statistically uncorrelated features, since the uncorrelated features may not be good discriminant features.
An inter-class or "confusion" covariance matrix is computed over all time-aligned utterances for all successful impostors of a given true speaker. For example, if the database shows that the voice data supplied by 120 impostors (anyone other than the true speaker) will be accepted by the verification system as coming from the true speaker, a covariance matrix is computed for these utterances. The covariance matrices computed for impostors of each true speaker are pooled over all true speakers. The covariance matrix corresponding to the pooled impostor data is known as the "inter-class" or "confusion" covariance matrix [C].
To compute the final linear transformation matrix [LT], the initial linear transformation covariance matrix [L] is diagonalized, resulting in a matrix [Ld].
The matrix [Ld] is multiplied by the confusion matrix [C] and is subsequently diagonalized. The resulting matrix is the linear transformation matrix [LT]. The block diagram showing computation of the linear transformation matrix is illustrated in FIGURE 5b in blocks 52-58.
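A sketch of the two-stage diagonalization described above, under the assumption that "multiplying [Ld] by [C]" means projecting the confusion matrix into the [Ld] eigenbasis before the second diagonalization:

```python
import numpy as np

def final_transform(in_class_cov, confusion_cov):
    """Two-stage diagonalization: the eigenvectors of the in-class
    covariance [L] give [Ld]; the confusion covariance [C] expressed in
    the [Ld] basis is then diagonalized, further rotating the feature
    space toward true-speaker/impostor separability."""
    _, ld = np.linalg.eigh(in_class_cov)   # [Ld]: diagonalizes [L]
    projected = ld.T @ confusion_cov @ ld  # [C] in the [Ld] basis
    _, rot = np.linalg.eigh(projected)     # second diagonalization
    return ld @ rot                        # combined transform [LT]
```

With this construction, [LT] diagonalizes the confusion covariance, so the transformed impostor features are uncorrelated and can be ranked for truncation to the dominant dimensions.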
The transformation provided by the confusion matrix further rotates the speech feature vector to increase separability between true speakers and impostors.
In addition to providing a higher impostor rejection rate, the transformation leads to a further reduction in the number of features used in the speech representation (dimensionality), since only the dominant dimensions need to be preserved. Whereas an eighteen-feature vector per frame is typically used for the principal spectral components, it has been found that a fourteen-feature vector may be used in connection with the present invention. The smaller feature vector reduces the noise inherent in the transformation.
Experimental results comparing impostor acceptance as a function of true speaker rejection are shown in FIGURE 6. In FIGURE 6, curve "A" illustrates impostor acceptance computed without use of the confusion matrix. Curve "B" illustrates the impostor acceptance using the confusion matrix to provide speaker discrimination. As can be seen, for a true speaker rejection of approximately two percent, the present invention reduces the impostor acceptance by approximately ten percent.
In addition to the dynamic time warping (time-aligned) method of performing the comparison of the reference and verification utterances, a Hidden Markov Model-based (HMM) comparison could be employed. An HMM
comparison would provide a state-by-state comparison of the reference and verification utterances, each utterance being transformed as described hereinabove. It has been found that a word by word HMM comparison is preferable to a whole-phrase comparison, due to the inaccuracies caused mainly by pauses between words.
Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (21)
1. A method of verifying the identity of an unknown person, whereby the unknown person's identity is determined to be either a true speaker or an imposter, comprising the steps of:
receiving input speech from the unknown person;
coding the input speech into a set of predetermined spectral and energy parameters;
transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
2. The method of claim 1 wherein said step of transforming comprises the step of transforming the parameters with a linear transform matrix.
3. The method of claim 2 wherein said step of transforming the parameters with a linear transform matrix comprises the steps of:
forming a database of speech samples from a plurality of speakers;
coding said speech samples into a plurality of parameters;
creating an in-class covariance matrix based on the parameters of all true speakers in the database;
creating an inter-class covariance matrix based on the parameters of successful impostors in the database;
and creating the linear transform matrix based on said in-class covariance matrix and said inter-class covariance matrix.
4. The method of claim 3 wherein said step of creating the linear transform matrix comprises the steps of:
determining a transformation by diagonalizing the in-class covariance matrix;
multiplying the inter-class covariance matrix by said transformation; and diagonalizing the matrix formed in said multiplying step.
5. The method of claim 1 wherein said step of coding the speech comprises the step of performing linear predictive coding on said speech to generate spectral information.
6. The method of claim 5 wherein said step of coding the speech further comprises the step of performing linear predictive coding on said speech to generate energy information.
7. The method of claim 6 wherein said step of coding said speech further comprises the step of digitizing said speech prior to said steps of performing said linear predictive coding.
8. A method of verifying the identity of an unknown person over a telephone network, whereby the unknown person's identity is determined to be either a true speaker or an imposter, comprising the steps of:
receiving input speech from the unknown person over the telephone network;
coding the input speech into a set of predetermined spectral and energy parameters;
transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
9. The method of claim 8 wherein said step of transforming comprises the step of transforming the parameters with a linear transform matrix.
10. The method of claim 9 wherein said step of transforming the parameters with a linear transform matrix comprises the steps of:
forming a database of speech samples from a plurality of speakers over the telephone network;
coding said speech samples into a plurality of parameters;
creating an in-class covariance matrix based on the parameters of all true speakers in the database;
creating an inter-class covariance matrix based on the parameters of successful impostors in the database;
and creating the linear transform matrix based on said in-class covariance matrix and said inter-class covariance matrix.
11. The method of claim 10 wherein said step of creating the linear transform matrix comprises the steps of:
determining a transformation by diagonalizing the in-class covariance matrix;
multiplying the inter-class covariance matrix by said transformation; and diagonalizing the matrix formed in said multiplying step.
12. The method of claim 8 wherein said step of coding the speech comprises the step of performing linear predictive coding on said speech to generate spectral information.
13. The method of claim 12 wherein said step of coding the speech further comprises the step of performing linear predictive coding on said speech to generate energy information.
14. The method of claim 13 wherein said step of coding said speech further comprises the step of digitizing said speech prior to said steps of performing said linear predictive coding.
15. Apparatus for verifying the identity of an unknown person, whereby the unknown person's identity is determined to be either a true speaker or an imposter, comprising:
circuitry for receiving input speech from the unknown person;
circuitry for coding the input speech into a set of predetermined spectral and energy parameters;
circuitry for transforming the parameters based on a predetermined statistically maximized discrimination between true speakers and successful impostors; and circuitry for comparing the transformed parameters with a stored reference model to verify the identity of the unknown person.
16. The apparatus of claim 15 wherein said circuitry for transforming comprises circuitry for transforming the parameters with a linear transform matrix.
17. The apparatus of claim 16 wherein said circuitry for transforming the parameters with a linear transform matrix comprises:
circuitry for forming a database of speech samples from a plurality of speakers;
circuitry for coding said speech samples into a plurality of parameters;
circuitry for creating an in-class covariance matrix based on the parameters of all true speakers in the database;
circuitry for creating an inter-class covariance matrix based on the parameters of successful impostors in the database; and circuitry for creating the linear transform matrix based on said in-class covariance matrix and said inter-class covariance matrix.
18. The apparatus of claim 17 wherein said circuitry for creating the linear transform matrix comprises:
circuitry for determining a transformation by diagonalizing the in-class covariance matrix;
circuitry for multiplying the inter-class covariance matrix by said transformation; and circuitry for diagonalizing the product of the inter-class covariance matrix multiplied by the diagonalized in-class covariance matrix.
19. The apparatus of claim 15 wherein said coding circuitry comprises circuitry for performing linear predictive coding on said speech to generate spectral information.
20. The apparatus of claim 19 wherein said coding circuitry further comprises circuitry for performing linear predictive coding on said speech to generate energy information.
21. The apparatus of claim 20 wherein said coding circuitry further comprises circuitry for digitizing said speech prior to performing said linear predictive coding.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US07/350,060 US5054083A (en) | 1989-05-09 | 1989-05-09 | Voice verification circuit for validating the identity of an unknown person |
| US350,060 | 1989-05-09 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CA2013371A1 CA2013371A1 (en) | 1990-11-09 |
| CA2013371C true CA2013371C (en) | 2001-10-30 |
Family
ID=23375063
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CA002013371A Expired - Fee Related CA2013371C (en) | 1989-05-09 | 1990-03-29 | Voice verification circuit for validating the identity of telephone calling card customers |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US5054083A (en) |
| EP (1) | EP0397399B1 (en) |
| JP (1) | JP3080388B2 (en) |
| KR (1) | KR0139949B1 (en) |
| AU (1) | AU636335B2 (en) |
| CA (1) | CA2013371C (en) |
| DE (1) | DE69031189T2 (en) |
Families Citing this family (42)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5054083A (en) | 1989-05-09 | 1991-10-01 | Texas Instruments Incorporated | Voice verification circuit for validating the identity of an unknown person |
| US5167004A (en) * | 1991-02-28 | 1992-11-24 | Texas Instruments Incorporated | Temporal decorrelation method for robust speaker verification |
| US5293452A (en) * | 1991-07-01 | 1994-03-08 | Texas Instruments Incorporated | Voice log-in using spoken name input |
| DE69229584T2 (en) * | 1991-08-30 | 2000-07-20 | Texas Instruments Inc., Dallas | Telephone signal classification and method and system for telephone message delivery |
| DE4207837A1 (en) * | 1992-03-12 | 1993-09-16 | Sel Alcatel Ag | METHOD AND DEVICE FOR CHECKING AND OBTAINING ACCESS RIGHTS |
| US5566229A (en) * | 1992-08-24 | 1996-10-15 | At&T | Voice directed communications system employing shared subscriber identifiers |
| US5450524A (en) * | 1992-09-29 | 1995-09-12 | At&T Corp. | Password verification system based on a difference of scores |
| FI96247C (en) * | 1993-02-12 | 1996-05-27 | Nokia Telecommunications Oy | Procedure for converting speech |
| NZ250812A (en) * | 1993-02-27 | 1996-09-25 | Alcatel Australia | Voice controlled data memory and input/output card |
| DE4306200A1 (en) * | 1993-02-27 | 1994-09-01 | Sel Alcatel Ag | Voice-controlled telephone device |
| US5677989A (en) * | 1993-04-30 | 1997-10-14 | Lucent Technologies Inc. | Speaker verification system and process |
| US5502759A (en) * | 1993-05-13 | 1996-03-26 | Nynex Science & Technology, Inc. | Apparatus and accompanying methods for preventing toll fraud through use of centralized caller voice verification |
| US5623539A (en) * | 1994-01-27 | 1997-04-22 | Lucent Technologies Inc. | Using voice signal analysis to identify authorized users of a telephone system |
| US5414755A (en) * | 1994-08-10 | 1995-05-09 | Itt Corporation | System and method for passive voice verification in a telephone network |
| US5687287A (en) | 1995-05-22 | 1997-11-11 | Lucent Technologies Inc. | Speaker verification method and apparatus using mixture decomposition discrimination |
| US5696880A (en) * | 1995-06-26 | 1997-12-09 | Motorola, Inc. | Communication system user authentication method |
| US5940476A (en) | 1996-06-28 | 1999-08-17 | Distributed Software Development, Inc. | System and method for identifying an unidentified caller |
| US5901203A (en) * | 1996-06-28 | 1999-05-04 | Distributed Software Development, Inc. | Computer-based system and method for identifying an unidentified caller |
| US6205204B1 (en) | 1996-06-28 | 2001-03-20 | Distributed Software Development, Inc. | System and method for identifying an unidentified person using an ambiguity-resolution criterion |
| US7006605B1 (en) * | 1996-06-28 | 2006-02-28 | Ochopee Big Cypress Llc | Authenticating a caller before providing the caller with access to one or more secured resources |
| US6529881B2 (en) | 1996-06-28 | 2003-03-04 | Distributed Software Development, Inc. | System and method for identifying an unidentified customer at the point of sale |
| US6205424B1 (en) | 1996-07-31 | 2001-03-20 | Compaq Computer Corporation | Two-staged cohort selection for speaker verification system |
| US6292782B1 (en) * | 1996-09-09 | 2001-09-18 | Philips Electronics North America Corp. | Speech recognition and verification system enabling authorized data transmission over networked computer systems |
| US5995927A (en) * | 1997-03-14 | 1999-11-30 | Lucent Technologies Inc. | Method for performing stochastic matching for use in speaker verification |
| US6078807A (en) * | 1997-08-26 | 2000-06-20 | International Business Machines Corporation | Telephony fraud detection using voice recognition techniques |
| US5913196A (en) * | 1997-11-17 | 1999-06-15 | Talmor; Rita | System and method for establishing identity of a speaker |
| US6233555B1 (en) | 1997-11-25 | 2001-05-15 | At&T Corporation | Method and apparatus for speaker identification using mixture discriminant analysis to develop speaker models |
| US6141644A (en) * | 1998-09-04 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and speaker identification based on eigenvoices |
| US6519565B1 (en) * | 1998-11-10 | 2003-02-11 | Voice Security Systems, Inc. | Method of comparing utterances for security control |
| IL129451A (en) | 1999-04-15 | 2004-05-12 | Eli Talmor | System and method for authentication of a speaker |
| US6795804B1 (en) * | 2000-11-01 | 2004-09-21 | International Business Machines Corporation | System and method for enhancing speech and pattern recognition using multiple transforms |
| US7899742B2 (en) * | 2001-05-29 | 2011-03-01 | American Express Travel Related Services Company, Inc. | System and method for facilitating a subsidiary card account |
| US20030081756A1 (en) * | 2001-10-23 | 2003-05-01 | Chan Norman C. | Multi-detector call classifier |
| US7299177B2 (en) * | 2003-05-30 | 2007-11-20 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
| WO2006087799A1 (en) * | 2005-02-18 | 2006-08-24 | Fujitsu Limited | Audio authentication system |
| US7940897B2 (en) * | 2005-06-24 | 2011-05-10 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
| JP4714523B2 (en) * | 2005-07-27 | 2011-06-29 | 富士通東芝モバイルコミュニケーションズ株式会社 | Speaker verification device |
| CN101051463B (en) * | 2006-04-06 | 2012-07-11 | 株式会社东芝 | Verification method and device identified by speaking person |
| TWI412941B (en) * | 2008-11-25 | 2013-10-21 | Inst Information Industry | Apparatus and method for generating and verifying a voice signature of a message and computer program product thereof |
| JP5612014B2 (en) * | 2012-03-29 | 2014-10-22 | 株式会社東芝 | Model learning apparatus, model learning method, and program |
| CN105245497B (en) * | 2015-08-31 | 2019-01-04 | 刘申宁 | A kind of identity identifying method and device |
| CN121075338B (en) * | 2025-11-07 | 2026-02-03 | 广州思正电子股份有限公司 | Sound pickup identification method and system based on voiceprint characterization memory |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4032711A (en) * | 1975-12-31 | 1977-06-28 | Bell Telephone Laboratories, Incorporated | Speaker recognition arrangement |
| US4053710A (en) * | 1976-03-01 | 1977-10-11 | Ncr Corporation | Automatic speaker verification systems employing moment invariants |
| JPS58129682A (en) * | 1982-01-29 | 1983-08-02 | Toshiba Corp | Individual verifying device |
| EP0233285A4 (en) * | 1985-07-01 | 1987-12-01 | Ecco Ind | Speaker verification system. |
| US4827518A (en) * | 1987-08-06 | 1989-05-02 | Bell Communications Research, Inc. | Speaker verification system using integrated circuit cards |
| US5216720A (en) | 1989-05-09 | 1993-06-01 | Texas Instruments Incorporated | Voice verification circuit for validating the identity of telephone calling card customers |
| US5054083A (en) | 1989-05-09 | 1991-10-01 | Texas Instruments Incorporated | Voice verification circuit for validating the identity of an unknown person |
-
1989
- 1989-05-09 US US07/350,060 patent/US5054083A/en not_active Expired - Lifetime
-
1990
- 1990-03-29 CA CA002013371A patent/CA2013371C/en not_active Expired - Fee Related
- 1990-03-30 AU AU52403/90A patent/AU636335B2/en not_active Ceased
- 1990-05-03 DE DE69031189T patent/DE69031189T2/en not_active Expired - Fee Related
- 1990-05-03 EP EP90304828A patent/EP0397399B1/en not_active Expired - Lifetime
- 1990-05-08 KR KR1019900006448A patent/KR0139949B1/en not_active Expired - Fee Related
- 1990-05-08 JP JP02118491A patent/JP3080388B2/en not_active Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| US5054083A (en) | 1991-10-01 |
| KR900018910A (en) | 1990-12-22 |
| DE69031189T2 (en) | 1997-12-11 |
| EP0397399A3 (en) | 1991-07-31 |
| EP0397399B1 (en) | 1997-08-06 |
| KR0139949B1 (en) | 1998-07-15 |
| DE69031189D1 (en) | 1997-09-11 |
| AU636335B2 (en) | 1993-04-29 |
| EP0397399A2 (en) | 1990-11-14 |
| AU5240390A (en) | 1990-11-15 |
| JP3080388B2 (en) | 2000-08-28 |
| CA2013371A1 (en) | 1990-11-09 |
| JPH0354600A (en) | 1991-03-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|
| CA2013371C (en) | Voice verification circuit for validating the identity of telephone calling card customers |
| US5216720A (en) | Voice verification circuit for validating the identity of telephone calling card customers | |
| US6480825B1 (en) | System and method for detecting a recorded voice | |
| US5548647A (en) | Fixed text speaker verification method and apparatus | |
| US5414755A (en) | System and method for passive voice verification in a telephone network | |
| US6038528A (en) | Robust speech processing with affine transform replicated data | |
| EP0501631B1 (en) | Temporal decorrelation method for robust speaker verification | |
| EP0528990B1 (en) | Simultaneous speaker-independent voice recognition and verification over a telephone network | |
| US6510415B1 (en) | Voice authentication method and system utilizing same | |
| EP1019904B1 (en) | Model enrollment method for speech or speaker recognition | |
| EP0822539B1 (en) | Two-staged cohort selection for speaker verification system | |
| CA2173302C (en) | Speaker verification method and apparatus using mixture decomposition discrimination | |
| EP0983587B1 (en) | Speaker verification method using multiple class groups | |
| US20150112682A1 (en) | Method for verifying the identity of a speaker and related computer readable medium and computer | |
| GB2541466A (en) | Replay attack detection | |
| US20120029922A1 (en) | Method of accessing a dial-up service | |
| US20080071538A1 (en) | Speaker verification method | |
| EP0643520B1 (en) | System and method for passive voice verification in a telephone network | |
| Naik et al. | Evaluation of a high performance speaker verification system for access Control | |
| Melin | Speaker verification in telecommunication | |
| Gonzalez-Rodriguez et al. | Biometric identification through speaker verification over telephone lines | |
| Saeta et al. | Weighting scores to improve speaker-dependent threshold estimation in text-dependent speaker verification | |
| EP1016075B1 (en) | Method and arrangement for providing speaker reference data for speaker verification | |
| AU4068402A (en) | System and method for detecting a recorded voice | |
| Kounoudes et al. | Intelligent Speaker Verification based Biometric System for Electronic Commerce Applications |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | EEER | Examination request | |
| | MKLA | Lapsed | |