WO1992006468A1 - Methods and apparatus for verifying the originator of a sequence of operations - Google Patents
Methods and apparatus for verifying the originator of a sequence of operations
- Publication number
- WO1992006468A1 (PCT/GB1991/001681; application GB9101681W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- derived
- features
- operations
- models
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
Definitions
- the present invention relates to verifying that a sequence of operations has been carried out by a specific entity.
- the entity is a person and the sequence of operations may, for example, be speaking a digit or a letter, or writing a letter or a word.
- the invention relates particularly to the verification that an utterance was made by a predetermined person, but it is believed that the invention can be applied to other actions carried out by persons such as recognising written words.
- HMM: Hidden Markov Models
- a method of verifying that a sequence of operations originates from a specific entity comprising the steps of
- apparatus for verifying that a sequence of operations originated from a specific entity comprising
- the specific entity is usually a person, although the entity may be an object, for example an object undergoing non-destructive testing when the sequence of operations may be signals originated by the object under test.
- the sequence of operations may, for example, be the utterance of a sound or the signing of a signature.
- the sounds may be alpha-numeric characters or words and the characters or words may be uttered as isolated items, or connected items as in continuous speech.
- the invention has the advantage that it tends to reduce false acceptances and false rejections in speaker verification.
- Signals resulting from incoming speech may be digitised at relatively short intervals and processed over relatively long intervals to provide sets or "frames" of digital signals derived from spectral components. By rejecting some of these components before or after further processing, the effects of telephone link limitations and distortion can be reduced so that speaker verification over telephone systems is possible.
- a method of speech verification or recognition including obtaining digital signals representative of speech
- the finite state machine models employed by the invention may be refined when an appropriate method of finding a suitable partial differential is known. Such a method is described below.
- a method of modifying Hidden Markov Models using a gradient based algorithm. Preferably a number of iterations are carried out, and after each iteration the modified models are tested against stored data to determine whether improvements have taken place, the process finishing when improvements become insignificant.
- the invention also includes apparatus for carrying out the third and fourth aspects of the invention.
- Figure 2 is a block diagram of the computer card shown as a block in Figure 1
- Figures 3 and 4 are flow charts showing how cepstral and related features can be extracted from signals representing sounds
- Figure 5 is a flow chart showing speaker verification for isolated words
- Figures 6 and 7 form a flow chart showing the calculation of probabilities in speaker verification using connected words
- Figure 8 is a flow chart showing the construction of HMM models
- Figure 9 is a diagram illustrating an alpha-net which may be used in modifying HMMs employed in the invention.
- a person whose speech is to be verified may use a telephone 10 at a location remote from a personal computer 11 containing a circuit card 12 which together carry out verification and indicate the result.
- the telephone 10 will be connected by way of exchanges and telephone lines 13 to the input of the card 12, which contains an analogue-to-digital (A/D) and digital-to-analogue (D/A) converter 32 (see Figure 2), and a digital signal processor (DSP) 33 in which the program for speaker verification and data for the program are stored.
- the card 12 also contains a memory 34, and interface logic 35 for coupling the DSP to a host computer such as the personal computer mentioned above.
- a telephony interface 36 converts from a two wire telephone line to four wires: an A/D input pair and a D/A output pair.
- the interface 36 also contains a circuit for ring detection, which provides an output on a control line 37, and for "on hook" and "off hook" operations at the beginning and end of telephone messages.
- An audio interface 38 includes a pre-amplifier allowing an audio input for the card 12 to be connected to a microphone.
- An output for connection to a loudspeaker is also provided to allow audio messages or synthesised speech as an alternative to screen messages.
- a switch 39 is operated as required to connect either the telephony interface or the audio interface to the converter 32.
- the A/D converter samples incoming speech typically at a rate of 8,000 samples per second, and spectral representations of the input samples are produced at a frequency called the frame rate, typically every 20 ms.
- Spectral representation is in the form of the outputs of a bank of narrow band filters each centred on a different frequency, with these frequencies spread across the spectrum of the incoming telephone signals.
- the use of a bank of filters for this purpose is well known, and the filters may for example be formed by discrete components or by digital filters achieved by programming the DSP, for example as described in Chapter 4 of the book "Digital Signal Processing Design" by A. Bateman and W. Yates, see particularly Section 4.27 including Example 4.2 on page 148.
- Table 1 gives an example of centre frequencies and bandwidths for a suitable filter bank having 11 filters.
- each filter gives a power output and the DSP program sums these outputs over each frame to give a frame output for each filter.
- the outputs of the filters are, in this embodiment, subjected to the known technique of cepstral processing which is described for example in the paper "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences", by S.B. Davis and P. Mermelstein, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-28, No. 4, August 1980, pages 357 to 366, see particularly page 359.
- Cepstral processing results in the derivation of a number of coefficients which can be regarded as descriptors for the spectrum of the speech signal.
- the first coefficient represents the total energy of the spectrum
- the second coefficient represents the general slope of the spectrum with increase in frequency
- the third coefficient gives an indication of how "peaky" the spectrum is.
- a variable n designating the filters in the filter bank is set to zero and then incremented (operations 41 and 42).
- the logarithm of the output power for each frame of the first filter is calculated and stored in an operation 43 and then the operations 42 to 43 are repeated for the other filters as n increases until under the control of a test 44 the logarithms of all the filter outputs have been stored.
- f_n is the log of the power output of the n-th filter and j is the number of the cepstrum coefficient.
- the j-th cepstral coefficient is found by an operation 46 and a test 47 where a variable m_j is increased, each time n is incremented, by adding the product of the logarithm of the output power of the n-th filter and the appropriate cosine transform coefficient a_jn (which equals cos[jπ(n − ½)/N] for a bank of N filters), so that m_j = Σ_n f_n·a_jn.
- a test 48 in conjunction with previous operations 49 and 50 causes the operations 45 and 46, and the test 47, to be repeated j times, so generating j cepstral coefficients.
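By way of illustration only (not part of the original disclosure), a minimal Python sketch of this cepstral computation, assuming the per-frame filter-bank powers are already available; all names are illustrative:

```python
import math

def cepstral_coefficients(filter_powers, num_coeffs):
    """Cepstral coefficients for one frame from the summed power
    outputs of the filter bank (operations 41 to 50).  filter_powers
    holds one power value per filter (N = 11 in the example of
    Table 1); the cosine transform coefficient for the n-th filter
    (n = 1..N) is a_jn = cos(j * (n - 1/2) * pi / N)."""
    N = len(filter_powers)
    logs = [math.log(p) for p in filter_powers]        # operations 41 to 44
    coeffs = []
    for j in range(num_coeffs):                        # j = 0 gives MFCC0
        # n is 0-based here, so (n + 0.5) corresponds to (n - 1/2) above
        m_j = sum(logs[n] * math.cos(j * (n + 0.5) * math.pi / N)
                  for n in range(N))                   # operations 45 to 47
        coeffs.append(m_j)
    return coeffs
```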
- the resulting representation of the spectrum has advantages in that the cepstrum can be readily processed for further purposes by giving different weightings to the coefficients.
- the zero and first cepstral coefficients, known as MFCC0 and MFCC1, may be given zero weights.
- Processing preferably also includes deriving an indication related to the rate of change of each coefficient (first order difference) and its second order difference.
- An algorithm for this purpose begins by setting the number j of the coefficient to one in an operation 52 and setting (operation 53) a variable k, nominally representing a particular recent frame, to -k_max, where k_max is the number of previous and succeeding frames relative to the frame k used in forming each j-th first order difference d_j.
- m_j(k) is the j-th coefficient of the k-th frame.
- with k_max = 2 the weights are 2, 1, 0, −1 and −2, so that d_j = 2m_j(2) + 1·m_j(1) + 0·m_j(0) − 1·m_j(−1) − 2·m_j(−2), where the figures in brackets give the frame position relative to the k-th frame.
- the second order j-th difference e_j is calculated in an operation 58, which derives e_j in a single operation.
- the e_j calculated in this way is for the nominal frame k but is derived partly from a frame k+2, two frames later.
- the second order difference so derived is a trend which is almost always nearly the same as could, alternatively, be obtained from using d_j(k−1) and d_j(k+1).
- Second order differences could be calculated with values of d_j from more frames, and first order differences could be calculated with values from two frames only.
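Again as an illustrative sketch only: since the single-operation form of operation 58 is not reproduced in the text, the d_j(k−1)/d_j(k+1) variant mentioned above, which the description says gives almost the same trend, is used for the second order difference:

```python
def first_order_difference(m, j, k, k_max=2):
    """First order difference d_j for coefficient j at frame k
    (operations 52 to 57): a weighted sum over the k_max frames each
    side of frame k, with the weights 2, 1, 0, -1, -2 of the example
    above.  m[k][j] is the j-th coefficient of the k-th frame."""
    return sum(offset * m[k + offset][j]
               for offset in range(-k_max, k_max + 1))

def second_order_difference(m, j, k, k_max=2):
    """Second order difference e_j at frame k, formed here from
    d_j(k - 1) and d_j(k + 1) as the description allows."""
    return (first_order_difference(m, j, k + 1, k_max)
            - first_order_difference(m, j, k - 1, k_max))
```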
- each frame is represented by a number of values which form elements in a "feature vector". There is one such vector for each frame.
- the vector includes elements corresponding to the cepstral coefficients and 7 corresponding to the first order differences of these coefficients. Preferably an additional 7 corresponding to second order differences are also used.
- Feature extraction may be carried out by any suitable alternative method such as the known methods of linear prediction and discrete Fourier transform whose outputs may also be converted to the cepstrum of the speech signal.
- an HMM is a finite state machine, which in the field of speech recognition typically comprises from 3 to 10 states coupled by transitions, usually from one state to the next and from one state to itself.
- Each state has an associated probability distribution function (pdf) which allows the calculation of the probability that a given feature vector would be produced by the HMM when in that state.
- PDF: probability distribution function
- Each pdf is a multidimensional function specified by a plurality of pairs of mean values and variances each of which is derived from the normal distribution of an element in the feature vectors as is mentioned in more detail below.
- the DSP on the card 12 stores data for two sets of HMMs: the first set is means and variances derived from the nominally normal distributions of the elements of feature vectors of a large number of utterances of the digits 0 to 9 from many different persons (males and females) and the models in this set are known as world models; and the second set in which each model is derived from the nominally normal distributions of the elements of feature vectors of, typically, 5 utterances of each digit spoken by a person whose speech is to be verified.
- the data for the HMMs of these two sets is stored as means and variances in files in the memory of the DSP.
- the memory may also store probabilities of transition from one state to another and also from one state to itself, these transition probabilities also being calculated in a known way from the digit utterances.
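An illustrative sketch (not part of the original disclosure) of the per-state pdf evaluation implied by the stored mean/variance pairs, assuming one independent normal distribution per feature-vector element:

```python
import math

def state_log_pdf(frame, means, variances):
    """Log likelihood that a single HMM state generated one feature
    vector, using the stored per-element mean/variance pairs of that
    state's probability distribution function."""
    log_p = 0.0
    for x, mu, var in zip(frame, means, variances):
        log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return log_p
```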
- a person whose speech is to be verified enters an identification code into the computer 11.
- the card 12 causes the computer to carry out an operation 15 to prompt for a random digit by displaying a request for this digit to be spoken or by generating a voice synthesised request.
- When the person utters the digit, a sequence of feature vectors is extracted and stored in an operation 16, and the probability of this sequence being generated by the world model for the digit prompted is calculated in an operation 17. This probability is calculated using the Viterbi algorithm, which again is well known in the field of speech recognition.
- in deriving a probability, the Viterbi algorithm considers each feature vector in the sequence and the probability that each state of the HMM could have produced that vector.
- the Viterbi algorithm takes into account the transition probabilities from one state to another and the probability calculated from the previous state. In this way the Viterbi algorithm finds the most likely combination of states and transitions and calculates a log probability that a sequence of feature vectors matches a particular HMM model.
- the Viterbi algorithm and its use in calculating the probabilities from HMM models is described in Chapter 8, particularly Sections 8.4 and 8.11 of the book "Speech Synthesis and Recognition" by J.N. Holmes, published by Van Nostrand Reinhold (UK) in 1988.
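A compact sketch of a Viterbi scoring pass of the kind used in operations 17 and 18, assuming log-domain arithmetic throughout; the model representation (log_pdfs, log_trans) is an assumption for illustration:

```python
def viterbi_log_prob(frames, log_pdfs, log_trans):
    """Log probability of the most likely path through a left-to-right
    HMM (Sections 8.4 and 8.11 of Holmes).  log_pdfs[j](frame) is the
    state log pdf (e.g. state_log_pdf above); log_trans[i][j] is the
    log transition probability, float('-inf') where no transition
    exists."""
    n = len(log_pdfs)
    NEG = float("-inf")
    # a left-to-right model starts in its first state
    scores = [log_pdfs[0](frames[0])] + [NEG] * (n - 1)
    for frame in frames[1:]:
        scores = [max(scores[i] + log_trans[i][j] for i in range(n))
                  + log_pdfs[j](frame)
                  for j in range(n)]
    return scores[-1]      # best path ending in the final state
```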
- the log probability that the sequence of vectors could have been generated by the alleged speaker's personal model of the prompted digit is now calculated in the same way (operation 18), the calculated world model log probability is subtracted from the calculated personal model probability and the result is stored (operation 19).
- a positive value for this result indicates that the personal probability is greater than the world probability and that therefore it is more likely that the digit was uttered by the alleged speaker than by an impostor.
- a test 20 is used to determine whether the last prompt in the operation 15 was, in this example, for the fifth in a string of random digits. If not, then operations 15 to 20 are repeated; otherwise an operation 22 is carried out in which the results stored in the operation 19 are compared and a decision on verification is given on the basis of a poll, with the majority of acceptances or rejections determining the decision. The decision is indicated on the display of the computer 11 or by means of a voice synthesised message in an operation 23, and the algorithm ends.
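The whole isolated-digit loop might be sketched as follows (illustrative only; prompt_and_capture stands in for operations 15 and 16, and the model containers and viterbi_log_prob from the sketch above are assumptions):

```python
import random

def verify(prompt_and_capture, world, personal, num_prompts=5):
    """The loop of Figure 5 (operations 15 to 23).  world[d] and
    personal[d] are assumed to hold the (log_pdfs, log_trans) pair
    for digit d."""
    results = []
    for _ in range(num_prompts):
        d = random.randint(0, 9)                            # operation 15
        frames = prompt_and_capture(d)                      # operation 16
        lp_world = viterbi_log_prob(frames, *world[d])      # operation 17
        lp_personal = viterbi_log_prob(frames, *personal[d])  # operation 18
        results.append(lp_personal - lp_world)              # operation 19
    accepts = sum(1 for r in results if r > 0)
    return accepts > num_prompts // 2                       # operation 22: poll
```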
- the beginning and end of an utterance are found by comparing the probability that the features currently extracted could have been generated by a "silence" state, defined by means and variances, with the probability that the features could have been generated by the beginning and end state, respectively, of an HMM representing an expected word.
- Improvements in speaker verification can usually be achieved if a phrase of connected words, that is continuously spoken words, is used in preference to isolated words. For example five numerals could be spoken as a continuous phrase or a string of five numerals could be split into two continuously spoken parts.
- a computer such as the PC 11 may be programmed to display a prompt for the required phrase, so that the model of the expected response can be formed by joining models for individual words end to end to make an overall model in the form of a string of word models.
- silences between words may be allowed for by including a state representing silence between the end of one word model and the beginning of the next word model, and by allowing transitions either directly from one word model to the next or by way of the silence state, which may also have a transition to itself to allow for longer silences.
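One illustrative way of assembling such a string model, assuming each word model is supplied as a list of state pdfs; the 0.0 and −infinity entries merely mark allowed and forbidden transitions, trained log transition probabilities being substituted in practice:

```python
NEG = float("-inf")

def join_word_models(word_models, silence_pdf):
    """Join word models end to end with an optional silence state
    between consecutive words.  Every state may repeat (including the
    silence state, allowing longer silences), every state may step to
    the next, and the last state of each word may skip the silence
    state and enter the next word directly."""
    pdfs = []
    for w, word in enumerate(word_models):
        if w > 0:
            pdfs.append(silence_pdf)          # optional inter-word silence
        pdfs.extend(word)
    n = len(pdfs)
    log_a = [[NEG] * n for _ in range(n)]
    for i in range(n):
        log_a[i][i] = 0.0                     # transition to itself
        if i + 1 < n:
            log_a[i][i + 1] = 0.0             # step to the next state
    pos = 0
    for word in word_models[:-1]:
        pos += len(word)                      # index of the following silence
        log_a[pos - 1][pos + 1] = 0.0         # direct word-to-word transition
        pos += 1                              # step over the silence state
    return pdfs, log_a
```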
- Figures 6 and 7 are in the form of a flow chart for calculating probabilities of connected words.
- each incoming frame in a complete utterance is dealt with in turn to calculate the probability, for each state in the string of models, that a sequence of states ending with that state could have generated the utterance.
- Figure 7 uses these probabilities to segment the utterance into words and extract the probability that each of these words was spoken.
- a variable FRAME is set to 1 corresponding to the first frame of an utterance received, and then feature extraction is carried out as described above in connection with Figures 3 and 4 (operation 81). Since the next group of operations is to be carried out for every state in the string of models a variable STATE is set to the total number of states in this string (operation 82) so that in this group of operations the last state in the string is considered first. The probability that the features of frame 1 could have been generated by this last state is calculated in an operation 83 from the probability distribution for this state.
- This probability is now added to the maximum probability obtained in previous iterations for states having transitions to this last state (operation 84), the maximum including the probability of the transition to the last state (the multiplication by a transition probability being carried out by adding its logarithm).
- in the operation 84 a type of Viterbi algorithm is operated.
- an operation 85 is carried out in which the identification of the state which had this previous maximum probability is stored.
- An operation 86 and a test 87 cause operations 83, 84 and 85 to be repeated for every state in the string.
- the STATE variable is again set to the total number of states in the string of models in an operation 91, the variable FRAME is set to the last frame occurring before silence and a variable WORD is set to the total number of words represented by the string of models (operations 92 and 93).
- the probability of the last word in the utterance is the probability calculated in the operation 84 for the last state in the string and the last frame in the utterance.
- An operation 94 stores this probability.
- a variable FRAME COUNT is set to zero in an operation 95 and then a test 96 is carried out to determine whether the previous state of this word model is in the previous word model as indicated by the identification stored in the operation 85.
- the test 96 now determines for the previous frame of the last word whether, from the indication for this frame stored in the operation 85, the previous state was in the last or previous word model. This process continues until the test 96 gives a positive response indicating that the algorithm has backtracked through all the states in the last word to the beginning of the word.
- the number of frames in the word is available from the variable FRAME COUNT.
- the variable WORD is now decremented (operation 100) and if all the words in the string have not been considered as indicated by a test 101, the operations 94 to 100 are repeated for the previous word in the string after decrementing the variable FRAME in an operation 102.
- when the outcome of the test 101 indicates that all the words in the string have been considered, the probabilities of each word are available as stored in the operation 94, and these probabilities can be used as above in an acceptance calculation to give an indication as to whether the speaker is verified as genuine or not.
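An illustrative sketch of this trace-back, assuming the operation 85 identifications are held in prev_state[t][j] and that each state index can be mapped to its word; the names are not from the original text:

```python
def segment_words(prev_state, word_of_state, last_state, last_frame):
    """Trace back of Figure 7 (operations 91 to 102): prev_state[t][j]
    is the best state at frame t - 1 leading to state j at frame t,
    as stored in the operation 85; word_of_state maps a state index
    to its word.  Returns the FRAME COUNT of each word, last word
    first."""
    counts = []
    state, frame, frame_count = last_state, last_frame, 0  # operations 91-95
    while frame > 0:
        prev = prev_state[frame][state]
        frame_count += 1
        if word_of_state[prev] != word_of_state[state]:    # test 96
            counts.append(frame_count)                     # operation 97
            frame_count = 0
        state, frame = prev, frame - 1
    counts.append(frame_count + 1)     # the first frame belongs to word 1
    return counts
```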
- a program may be used in which the probabilities derived from the world models of each of, for example, five digits may be multiplied together and compared with a similar product derived by matching feature vectors against personal models. If the product of probabilities from the world models is smaller than the product from the personal models then the speech is verified.
- speaker verification using the invention is not limited to the utterance of digits.
- Other characters such as letters from the English or other alphabets may be used, as may be complete words if either each character of the word is spoken separately or a known continuous speech recognition algorithm is used to separate one character or word from another.
- a set of world models for each digit is built using the Baum-Welch re-estimation process which is another well known technique in the field of speech recognition. These initial world models are common to all speakers. Personal models for each of the digits are then constructed using the same process from, typically, five examples of each of the digits collected for each person whose speech is to be verified.
- Each world model is derived from a number of utterances (Q_MAX) of the digit represented by that model, each taken from a different speaker.
- the transition probabilities, means and variances are initialised in an operation 60 as follows.
- the transition probabilities a(i)(i) from state i to itself and the transition probabilities a(i)(j) from state i to state j are initialised to a(i)(i) = a(i)(j) = 0.5.
- the frames of each utterance are assigned linearly to the HMM states so that each state has, typically, one tenth of the frames (P_MAX) of the utterance assigned to it.
- each frame has k features; the mean μ(j)(k) for the normal or Gaussian distribution of state j, feature k, is initialised from the frames assigned to state j.
- the variance σ²(j)(k) of state j, feature k, is calculated using the same assigned frames.
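An illustrative sketch of this initialisation, using ordinary sample statistics since the exact expressions are not reproduced in the text:

```python
def initialise_state_statistics(utterances, n_states=10):
    """Initialisation of operation 60: the frames of each training
    utterance are assigned linearly to the states, and per-state,
    per-feature means and variances are then taken over the assigned
    frames."""
    assigned = [[] for _ in range(n_states)]
    for frames in utterances:
        per_state = len(frames) / n_states
        for p, frame in enumerate(frames):
            j = min(int(p / per_state), n_states - 1)
            assigned[j].append(frame)
    K = len(utterances[0][0])            # number of features per frame
    means = [[sum(f[k] for f in seg) / len(seg) for k in range(K)]
             for seg in assigned]
    variances = [[sum((f[k] - mu[k]) ** 2 for f in seg) / len(seg)
                  for k in range(K)]
                 for seg, mu in zip(assigned, means)]
    return means, variances
```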
- P and Q are set to zero in an operation 61 to allow the probabilities of each of the P_MAX frames of each of the Q_MAX utterances of the digit to be calculated, given the HMM for that model, in an operation 62 which is repeated P_MAX × Q_MAX times by operation of tests 63 and 64 and increment operations 65 and 66.
- the operations 69, 70 and 71 provide a new re-estimated model which is used, for a number of iterations, to allow the frame probabilities of the Q_MAX utterances to be recalculated, followed by repetitions of the operations 67 to 71.
- the number of iterations is determined by a test 73 using a value "MAX ITER" which is typically about 10 but alternatively iteration may be continued until a test (not shown) indicates that convergence of the transition probabilities, means and variances has occurred.
- the personal HMMs typically have 7 states and are also left to right models. Again each feature in each state is described by a normal or Gaussian distribution.
- HMM parameters can be estimated by the Baum-Welch algorithm or by the Viterbi algorithm. Since the Viterbi algorithm is simpler and faster it is used for the personal HMMs where a person being enrolled for speaker verification has to wait until the enrolment process is complete.
- the algorithm used for the personal HMMs is the same as that of Figure 8 except that the operations 67, 68 and 69 are replaced by an operation which calculates the forward probabilities using the Viterbi algorithm and a following operation to trace back to find the best sequence of states for each word. Also the operations 70 and 71 are carried out in a different way, as explained below.
- the transition probabilities, means and variances are initialised in the same way as for the world models but while the means and variances are re-estimated the transition probabilities remain fixed during the re-estimation.
- the Viterbi operation is as described in the above mentioned Section 8.4 of Chapter 8 of the book by J.N. Holmes.
- the trace back operation keeps track of the state giving rise to the maximum value for each frame of each word and shows the sequence of states having the highest probability for each word.
- the frames for each word are assigned to the best fitting state found on trace back.
- the new mean μ(j)(k) for feature k in state j is re-estimated in the operation 70 from the frames assigned to state j.
- the re-estimation process is repeated for the number of iterations required. The number is either fixed, for example 3, or iteration continues until the model has converged.
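An illustrative sketch of this Viterbi-style re-estimation, assuming plain per-state averages since the formulas themselves are not reproduced in the text:

```python
def reestimate_means_variances(frames, best_state, n_states):
    """Re-estimation of operations 70 and 71 for the personal models:
    each frame has been assigned by trace back to its best fitting
    state, and the new mean and variance of each feature in state j
    are taken over the frames assigned to j."""
    K = len(frames[0])
    means, variances = [], []
    for j in range(n_states):
        own = [f for f, s in zip(frames, best_state) if s == j]
        mu = [sum(f[k] for f in own) / len(own) for k in range(K)]
        var = [sum((f[k] - mu[k]) ** 2 for f in own) / len(own)
               for k in range(K)]
        means.append(mu)
        variances.append(var)
    return means, variances
```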
- Continuous speech can also be used in deriving the world and personal models and the algorithms of Figures 6 and 7 may be used for this purpose when during training a prompt again shows a string of numerals which are to be spoken.
- the frame count available from the operation 97 when the test 96 indicates that the end of a word has been reached is used to segment the words and to identify the frames in each word. Since the initial word models are based on the features of these frames, the operation 81 also stores the features of each frame when the world and personal models are being derived. As before the frames are initially allocated linearly to model states to allow means and variances to be calculated for initialisation and then the models are re-estimated using either the Baum-Welch or the Viterbi algorithm.
- the resulting world and personal models are used for the operation of the system described above. Improvements in discrimination between personal models and the world model, and hence in the overall operation of the system, can however be expected if the models are further adapted using discriminative training, which makes the best possible use of the differences between the sets of utterances used in forming the personal models and the world models, rather than using the utterances to improve the likelihood results provided by the models.
- the preferred way of doing this is to use a gradient algorithm but, as is mentioned below, this requires the rate of change of the likelihood function for the output probability of a model as a function of the means and variances. The rate of change of the output probability has to be calculated with respect to each state probability in turn.
- the error in the output (that is the difference between the actual and required output) is taken and the error back-propagation algorithm for the perceptron (a neural net concept) is used to work out the appropriate error derivative with respect to a given state.
- MLP multi-layer perceptron
- Each pair of models, the personal model and the world model, for each digit is treated as an example of an alpha-net and the alpha-net training technique is used to increase discrimination between the two models. When the discrimination has been maximised the models are ready to be used for the process of verification.
- the Alpha algorithm, used in the application of HMMs to speech recognition, computes sums over alternative state sequences. This Alpha computation can be thought of as being performed by a particular form of recurrent network which is called an alpha-net.
- the parameters of the network are parameters of the HMMs, such as the means and variances.
- ⁇ jt the likelihood of the model generating all the observed data sequence up to and including time t, in terms of b jt the likelihood of it generating the data at time t given that it is in state j at time t, a ij the probability of state j given that the state at the previous time was i, and the ⁇ 's at the previous time.
- ⁇ for the final state of each model is the likelihood of all the data given that model.
- the prior probabilities reflect the amount of training material of each kind, that is expected speaker trials and other speaker trials. During use the prior probabilities will depend on other factors.
- FIG. 9 An example of an alpha-net is shown in Figure 9 where the personal model of a digit includes three states 25, 26 and 27 with transitions between the states and each state having a transition to itself.
- the corresponding world model of the same digit has states 28, 29 and 30 with similar transitions.
- the alpha-net is formed by assuming an initial silent state 31 and transitions from this state to the two models.
- the outputs of the net represent the probability of the digit when uttered being generated by the personal and world models, respectively.
- Adaption is by changing the means and variances for each of the states 25 to 30 to give optimum results, and the technique used for training makes use of the identity between the Baum-Welch backward pass algorithm and the MLP back-propagation of partial derivatives.
- ⁇ is a coefficient which controls the rate of adaption and the last part of the last term of the equations signifies that the coefficient is to be multiplied by the sign of the rate of change.
- the adaption rate can be decreased periodically by changing the value of ⁇ by deducting a fixed amount from ⁇ or by taking a proportion of ⁇ .
- the equations 1 and 2, or 3 and 4, with 5 and 6 are used repeatedly to calculate new means and variances for all states of the two models.
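Since equations 1 to 6 are not reproduced in the text, only the sign-of-gradient update and the rate decrease can be sketched here; the gradient is assumed to be supplied by the alpha-net back-propagation, and the names are illustrative:

```python
def adapt(value, gradient, eta):
    """One parameter update of the kind described for equations 5 and
    6: the mean or variance moves by eta times the sign of the rate of
    change of the discrimination measure."""
    sign = (gradient > 0) - (gradient < 0)
    return value + eta * sign

def decrease_rate(eta, fixed_amount=None, proportion=0.5):
    """Periodic decrease of the adaptation rate: deduct a fixed amount
    from eta, or keep only a proportion of it."""
    if fixed_amount is not None:
        return max(eta - fixed_amount, 0.0)
    return eta * proportion
```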
- the result is a pair of models: a modified personal and a corresponding world model.
- Next sequences of stored vectors representing the digit when spoken by about 50 speakers other than the speaker corresponding to the personal model are used to test for improvements in the discrimination afforded by the modified models.
- the process is then repeated for the models of every other digit so that a pair of models is obtained for every digit.
- two distributions of output probabilities are obtained, one corresponding to the world model and one corresponding to the personal model.
- Speaker verification has applications other than by way of telephone links, for example in access control both for locations such as buildings and for computers. Applications of the invention also occur in recognising spoken PINs for cash dispensing machines.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Complex Calculations (AREA)
- Collating Specific Patterns (AREA)
- Programmable Controllers (AREA)
- Telephonic Communication Services (AREA)
Abstract
Checking the identity of a speaker is important in applications such as financial transactions carried out automatically by telephone. Wrongly accepting a speaker can lead to serious problems; frequent unjustified rejections of speakers are also problematic because of the annoyance they cause. Some of these drawbacks of identity checking can be reduced by the invention by forming Hidden Markov Models (HMMs) for each word of a group of words, using the pronunciation characteristics of those words supplied by a large number of speakers. These models are referred to as "world models". In addition, for each person whose speech is to be verified, an HMM is created for each of the words as spoken by that person. These models are referred to as "personal models". During identity verification, the person is asked to repeat a sequence of isolated or connected words (15), and features of each of these words are then extracted (16). After extraction, the probability that these word features could have been generated by the world models and by the person's personal model, respectively, is calculated (17 and 18). These probabilities are then compared (19) for each word. The verification decision (23) is based on a poll (22) of these comparisons.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US08/039,054 US5526465A (en) | 1990-10-03 | 1991-09-30 | Methods and apparatus for verifying the originator of a sequence of operations |
| AU86496/91A AU665745B2 (en) | 1990-10-03 | 1991-09-30 | Methods and apparatus for verifying the originator of a sequence of operations |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB909021489A GB9021489D0 (en) | 1990-10-03 | 1990-10-03 | Methods and apparatus for verifying the originator of a sequence of operations |
| GB9021489.1 | 1990-10-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1992006468A1 true WO1992006468A1 (fr) | 1992-04-16 |
Family
ID=10683160
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB1991/001681 Ceased WO1992006468A1 (fr) | 1990-10-03 | 1991-09-30 | Methods and apparatus for verifying the identity of the originator of a sequence of operations |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US5526465A (fr) |
| AU (1) | AU665745B2 (fr) |
| GB (2) | GB9021489D0 (fr) |
| WO (1) | WO1992006468A1 (fr) |
| ZA (1) | ZA917886B (fr) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5679001A (en) * | 1992-11-04 | 1997-10-21 | The Secretary Of State For Defence In Her Britannic Majesty's Government Of The United Kingdom Of Great Britain And Northern Ireland | Children's speech training aid |
| US5799278A (en) * | 1995-09-15 | 1998-08-25 | International Business Machines Corporation | Speech recognition system and method using a hidden markov model adapted to recognize a number of words and trained to recognize a greater number of phonetically dissimilar words. |
| US6125284A (en) * | 1994-03-10 | 2000-09-26 | Cable & Wireless Plc | Communication system with handset for distributed processing |
Families Citing this family (48)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2105034C (fr) * | 1992-10-09 | 1997-12-30 | Biing-Hwang Juang | Systeme de verification de haut-parleurs utilisant l'evaluation normalisee de cohortes |
| WO1995005656A1 (fr) * | 1993-08-12 | 1995-02-23 | The University Of Queensland | Systeme d'identification par la parole |
| DE4416598A1 (de) * | 1994-05-11 | 1995-11-16 | Deutsche Bundespost Telekom | Verfahren und Vorrichtung zur Sicherung von Telekommunikations-Verbindungen |
| JPH0973440A (ja) * | 1995-09-06 | 1997-03-18 | Fujitsu Ltd | コラム構造の再帰型ニューラルネットワークによる時系列トレンド推定システムおよび方法 |
| US5960391A (en) * | 1995-12-13 | 1999-09-28 | Denso Corporation | Signal extraction system, system and method for speech restoration, learning method for neural network model, constructing method of neural network model, and signal processing system |
| US6073101A (en) * | 1996-02-02 | 2000-06-06 | International Business Machines Corporation | Text independent speaker recognition for transparent command ambiguity resolution and continuous access control |
| US6137863A (en) * | 1996-12-13 | 2000-10-24 | At&T Corp. | Statistical database correction of alphanumeric account numbers for speech recognition and touch-tone recognition |
| US6061654A (en) * | 1996-12-16 | 2000-05-09 | At&T Corp. | System and method of recognizing letters and numbers by either speech or touch tone recognition utilizing constrained confusion matrices |
| US5924070A (en) * | 1997-06-06 | 1999-07-13 | International Business Machines Corporation | Corporate voice dialing with shared directories |
| US5897616A (en) | 1997-06-11 | 1999-04-27 | International Business Machines Corporation | Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases |
| US6219453B1 (en) | 1997-08-11 | 2001-04-17 | At&T Corp. | Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm |
| US6154579A (en) * | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
| US5913192A (en) * | 1997-08-22 | 1999-06-15 | At&T Corp | Speaker identification with user-selected password phrases |
| US6141661A (en) * | 1997-10-17 | 2000-10-31 | At&T Corp | Method and apparatus for performing a grammar-pruning operation |
| US6205428B1 (en) | 1997-11-20 | 2001-03-20 | At&T Corp. | Confusion set-base method and apparatus for pruning a predetermined arrangement of indexed identifiers |
| US6208965B1 (en) | 1997-11-20 | 2001-03-27 | At&T Corp. | Method and apparatus for performing a name acquisition based on speech recognition |
| US6122612A (en) * | 1997-11-20 | 2000-09-19 | At&T Corp | Check-sum based method and apparatus for performing speech recognition |
| US6205261B1 (en) | 1998-02-05 | 2001-03-20 | At&T Corp. | Confusion set based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
| DE69922082T2 (de) | 1998-05-11 | 2005-12-15 | Citicorp Development Center, Inc., Los Angeles | System und Verfahren zur biometrischen Authentifizierung eines Benutzers mit einer Chipkarte |
| US6421453B1 (en) * | 1998-05-15 | 2002-07-16 | International Business Machines Corporation | Apparatus and methods for user recognition employing behavioral passwords |
| US7937260B1 (en) | 1998-06-15 | 2011-05-03 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
| US6400805B1 (en) | 1998-06-15 | 2002-06-04 | At&T Corp. | Statistical database correction of alphanumeric identifiers for speech recognition and touch-tone recognition |
| US6157731A (en) * | 1998-07-01 | 2000-12-05 | Lucent Technologies Inc. | Signature verification method using hidden markov models |
| US6233557B1 (en) * | 1999-02-23 | 2001-05-15 | Motorola, Inc. | Method of selectively assigning a penalty to a probability associated with a voice recognition system |
| IL129451A (en) | 1999-04-15 | 2004-05-12 | Eli Talmor | System and method for authentication of a speaker |
| US7590538B2 (en) * | 1999-08-31 | 2009-09-15 | Accenture Llp | Voice recognition system for navigating on the internet |
| US6526544B1 (en) * | 1999-09-14 | 2003-02-25 | Lucent Technologies Inc. | Directly verifying a black box system |
| US6711699B1 (en) * | 2000-05-04 | 2004-03-23 | International Business Machines Corporation | Real time backup system for information based on a user's actions and gestures for computer users |
| US6961703B1 (en) * | 2000-09-13 | 2005-11-01 | Itt Manufacturing Enterprises, Inc. | Method for speech processing involving whole-utterance modeling |
| US7143044B2 (en) * | 2000-12-29 | 2006-11-28 | International Business Machines Corporation | Translator for infants and toddlers |
| WO2002067245A1 (fr) * | 2001-02-16 | 2002-08-29 | Imagination Technologies Limited | Verification de haut-parleurs |
| GB2372366A (en) * | 2001-02-16 | 2002-08-21 | Imagination Tech Ltd | Speaker verification |
| US20040104062A1 (en) * | 2002-12-02 | 2004-06-03 | Yvon Bedard | Side panel for a snowmobile |
| US7240007B2 (en) * | 2001-12-13 | 2007-07-03 | Matsushita Electric Industrial Co., Ltd. | Speaker authentication by fusion of voiceprint match attempt results with additional information |
| US20030149881A1 (en) * | 2002-01-31 | 2003-08-07 | Digital Security Inc. | Apparatus and method for securing information transmitted on computer networks |
| US7143073B2 (en) * | 2002-04-04 | 2006-11-28 | Broadcom Corporation | Method of generating a test suite |
| US8171298B2 (en) * | 2002-10-30 | 2012-05-01 | International Business Machines Corporation | Methods and apparatus for dynamic user authentication using customizable context-dependent interaction across multiple verification objects |
| JP2004191705A (ja) * | 2002-12-12 | 2004-07-08 | Renesas Technology Corp | 音声認識装置 |
| TWI223791B (en) * | 2003-04-14 | 2004-11-11 | Ind Tech Res Inst | Method and system for utterance verification |
| US7363223B2 (en) * | 2004-08-13 | 2008-04-22 | International Business Machines Corporation | Policy analysis framework for conversational biometrics |
| US7584098B2 (en) * | 2004-11-29 | 2009-09-01 | Microsoft Corporation | Vocabulary-independent search of spontaneous speech |
| US8239200B1 (en) * | 2008-08-15 | 2012-08-07 | Google Inc. | Delta language model |
| DK2364495T3 (en) * | 2008-12-10 | 2017-01-16 | Agnitio S L | Method of verifying the identity of a speaking and associated computer-readable medium and computer |
| US9015093B1 (en) | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
| US8775341B1 (en) | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
| US10438591B1 (en) | 2012-10-30 | 2019-10-08 | Google Llc | Hotword-based speaker recognition |
| US9384738B2 (en) | 2014-06-24 | 2016-07-05 | Google Inc. | Dynamic threshold for speaker verification |
| US9715874B2 (en) * | 2015-10-30 | 2017-07-25 | Nuance Communications, Inc. | Techniques for updating an automatic speech recognition system using finite-state transducers |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0099476A2 (fr) * | 1982-06-25 | 1984-02-01 | Kabushiki Kaisha Toshiba | Système pour la vérification de l'identité |
| EP0121248A1 (fr) * | 1983-03-30 | 1984-10-10 | Nec Corporation | Procédé et système de contrôle de l'identité d'un locuteur |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4783804A (en) * | 1985-03-21 | 1988-11-08 | American Telephone And Telegraph Company, At&T Bell Laboratories | Hidden Markov model speech recognition arrangement |
| US4910782A (en) * | 1986-05-23 | 1990-03-20 | Nec Corporation | Speaker verification system |
| US5033087A (en) * | 1989-03-14 | 1991-07-16 | International Business Machines Corp. | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system |
| US5311601A (en) * | 1990-01-12 | 1994-05-10 | Trustees Of Boston University | Hierarchical pattern recognition system with variable selection weights |
| US5293452A (en) * | 1991-07-01 | 1994-03-08 | Texas Instruments Incorporated | Voice log-in using spoken name input |
1990
- 1990-10-03 GB GB909021489A patent/GB9021489D0/en active Pending
1991
- 1991-09-30 US US08/039,054 patent/US5526465A/en not_active Expired - Lifetime
- 1991-09-30 AU AU86496/91A patent/AU665745B2/en not_active Expired
- 1991-09-30 GB GB9120698A patent/GB2248513B/en not_active Expired - Lifetime
- 1991-09-30 WO PCT/GB1991/001681 patent/WO1992006468A1/fr not_active Ceased
- 1991-10-02 ZA ZA917886A patent/ZA917886B/xx unknown
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0099476A2 (fr) * | 1982-06-25 | 1984-02-01 | Kabushiki Kaisha Toshiba | Système pour la vérification de l'identité |
| EP0121248A1 (fr) * | 1983-03-30 | 1984-10-10 | Nec Corporation | Procédé et système de contrôle de l'identité d'un locuteur |
Non-Patent Citations (7)
| Title |
|---|
| ICASSP '88/1988 International Conference on Acoustics, Speech and Signal Processing, 11-14 April 1988, New York, US, volume 1, IEEE (New York, US) G. Velius: "Variants of cepstrum based speaker identity verification", pages 583-586, see paragraph 5: "Experiment II: Feature weighting and distance measures" * |
| ICASSP '88/1988 International Conference on Acoustics, Speech and Signal Processing, 11-14 April 1988, New York, US, volume 1, IEEE (New York, US) N. Tishby: "Information theoretic factorization of speaker and language in hidden Markov models, with application to speaker recognition", pages 87-90, see paragraph 5 "Application to speaker verification" * |
| ICASSP '89/1989 International Conference on Acoustics, Speech and Signal Processing, 23-26 May 1989, Glasgow, GB, volume 1, IEEE (New York, US) J.M. Naik et al.: "Speaker verification over long distance telephone lines", pages 524-527, see page 527 "Speaker verification using hidden Markov modeling" * |
| ICASSP '90/1990 International Conference on Acoustics, Speech and Signal Processing, 3-6 April 1990, Albuquerque US, volume 1, IEEE (New York, US) H. Gish: "Robust discrimination in automatic speaker identification", pages 289-292, see paragraph II: "The basis ISIS model" * |
| IEEE Communications Magazine, volume 28, no. 1, January 1990 (New York, US) J.M. Naik: "Speaker verification: A tutorial", pages 42-48, see pages 43,44: "Pattern matching"; pages 45,46: "An example of a speaker verification system" * |
| IEEE Transactions on Acoustics, Speech and Signal Processing, volume 36, no. 6, June 1988 (New York, US) F.K. Soong et al.: "On the use of instantaneous and transitional spectral information in speaker recognition", pages 871-879, see paragraph I: "Introduction" * |
| The Journal of the Acoustical Society of America, volume 46, no. 4, part 2, 1969 (New York, US) J.E. Luck: "Automatic speaker verification using cepstral measurements", pages 1026-1032, see page 1029, right-hand column, lines 11-17 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5679001A (en) * | 1992-11-04 | 1997-10-21 | The Secretary Of State For Defence In Her Britannic Majesty's Government Of The United Kingdom Of Great Britain And Northern Ireland | Children's speech training aid |
| US5791904A (en) * | 1992-11-04 | 1998-08-11 | The Secretary Of State For Defence In Her Britannic Majesty's Government Of The United Kingdom Of Great Britain And Northern Ireland | Speech training aid |
| US6125284A (en) * | 1994-03-10 | 2000-09-26 | Cable & Wireless Plc | Communication system with handset for distributed processing |
| US5799278A (en) * | 1995-09-15 | 1998-08-25 | International Business Machines Corporation | Speech recognition system and method using a hidden markov model adapted to recognize a number of words and trained to recognize a greater number of phonetically dissimilar words. |
Also Published As
| Publication number | Publication date |
|---|---|
| US5526465A (en) | 1996-06-11 |
| ZA917886B (en) | 1992-10-28 |
| GB9021489D0 (en) | 1990-11-14 |
| GB2248513A (en) | 1992-04-08 |
| AU665745B2 (en) | 1996-01-18 |
| GB2248513B (en) | 1994-08-31 |
| AU8649691A (en) | 1992-04-28 |
| GB9120698D0 (en) | 1991-11-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US5526465A (en) | Methods and apparatus for verifying the originator of a sequence of operations | |
| US9646614B2 (en) | Fast, language-independent method for user authentication by voice | |
| US6278970B1 (en) | Speech transformation using log energy and orthogonal matrix | |
| Carey et al. | A speaker verification system using alpha-nets | |
| US6760701B2 (en) | Subword-based speaker verification using multiple-classifier fusion, with channel, fusion, model and threshold adaptation | |
| US6195634B1 (en) | Selection of decoys for non-vocabulary utterances rejection | |
| Peacocke et al. | An introduction to speech and speaker recognition | |
| US6401063B1 (en) | Method and apparatus for use in speaker verification | |
| EP0744734B1 (fr) | Méthode et appareil de vérification du locuteur utilisant une discrimination basée sur la décomposition des mixtures | |
| US6249760B1 (en) | Apparatus for gain adjustment during speech reference enrollment | |
| EP1159737B9 (fr) | Reconnaissance du locuteur | |
| JPS62231997A (ja) | 音声認識システム及びその方法 | |
| US8433569B2 (en) | Method of accessing a dial-up service | |
| US6519563B1 (en) | Background model design for flexible and portable speaker verification systems | |
| CA2304747C (fr) | Reconnaissance de formes au moyen de modeles de reference multiples | |
| US20080071538A1 (en) | Speaker verification method | |
| Karthikeyan et al. | Hybrid machine learning classification scheme for speaker identification | |
| Kasuriya et al. | Comparative study of continuous hidden Markov models (CHMM) and artificial neural network (ANN) on speaker identification system | |
| Pandey et al. | Multilingual speaker recognition using ANFIS | |
| Thakur et al. | Speaker authentication using gmm-ubm | |
| Emori et al. | Vocal tract length normalization using rapid maximum‐likelihood estimation for speech recognition | |
| WO1997037345A1 (fr) | Speech processing | |
| US9978373B2 (en) | Method of accessing a dial-up service | |
| Ahn et al. | On effective speaker verification based on subword model | |
| HK1018110B (en) | Speech processing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU JP US |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |