EP0984431B1 - Speaker verification and speaker identification based on eigenvoices - Google Patents
Speaker verification and speaker identification based on eigenvoices
- Publication number
- EP0984431B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- speaker
- eigenspace
- training
- speakers
- new
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Definitions
- The present invention relates generally to speech technology and, more particularly, to a system and method for performing speaker verification or speaker identification.
- Speaker verification involves determining whether a given voice belongs to a certain speaker (herein called the "client") or to an impostor (anyone other than the client).
- Speaker identification involves matching a given voice to one of a set of known voices. Like speaker verification, speaker identification has a number of attractive applications. For example, a speaker identification system may be used to classify voice mail by speaker for a set of speakers for which voice samples are available. Such capability would allow a computer-implemented telephony system to display on a computer screen the identity of callers who have left messages on the voice mail system.
- Human speech is the product of air under pressure from the lungs being forced through the vocal cords and modulated by the glottis to produce sound waves that then resonate in the oral and nasal cavities before being articulated by the tongue, jaw, teeth and lips. Many factors affect how these sound-producing mechanisms interoperate.
- The common cold, for example, greatly alters the resonance of the nasal cavity as well as the tonal quality of the vocal cords.
- The present invention uses a model-based analytical approach to speaker verification and speaker identification.
- Models are constructed and trained upon the speech of known client speakers (and, in the case of speaker verification, also upon the speech of one or more impostors).
- These speaker models typically employ a multiplicity of parameters (such as Hidden Markov Model parameters). Rather than using these parameters directly, the parameters are concatenated to form supervectors. These supervectors, one per speaker, represent the entire training speaker population.
- A linear transformation is performed on the supervectors, resulting in a space that we call eigenspace.
- We call the basis vectors of this eigenspace "eigenvoice" vectors or "eigenvectors".
- The eigenspace can be dimensionally reduced by discarding some of the eigenvector terms.
- Each of the speakers comprising the training data is represented in eigenspace, either as a point or as a probability distribution.
- The former is somewhat less precise in that it treats the speech from each speaker as relatively unchanging.
- The latter reflects that the speech of each speaker will vary from utterance to utterance.
- The system may then be used to perform speaker verification or speaker identification.
- New speech data is obtained and used to construct a supervector that is then dimensionally reduced and represented in the eigenspace. Speaker verification or speaker identification is then performed by assessing the proximity of the new speech data to prior data in eigenspace. The new speech is verified if its corresponding point or distribution within eigenspace is within a threshold proximity to the training data for that client speaker. The system may reject the new speech as inauthentic if it falls closer to an impostor's speech when placed in eigenspace.
- Speaker identification is performed in a similar fashion.
- The new speech data is placed in eigenspace and identified with the training speaker whose eigenvector point or distribution is closest.
- The eigenspace represents each entire speaker in a concise, low-dimensional way, not merely a selected few features of each speaker.
- Proximity computations performed in eigenspace can be made quite rapidly, as there are typically considerably fewer dimensions to contend with in eigenspace than in the original speaker model space or feature vector space.
- The system does not require that the new speech data include each and every example or utterance that was used to construct the original training data. Through techniques described herein, it is possible to perform dimensionality reduction on a supervector for which some of its components are missing. The resulting point or distribution in eigenspace will nevertheless represent the speaker remarkably well.
- The eigenvoice techniques employed by the present invention will work with many different speech models.
- The invention can also be practiced using other types of model-based recognizers, such as phoneme similarity recognizers.
- HMM: Hidden Markov Model.
- The Hidden Markov Model is a modeling approach involving state diagrams. Any speech unit (such as a phrase, word, subword, phoneme or the like) can be modeled, with all knowledge sources included in that model.
- The HMM represents an unknown process that produces a sequence of observable outputs at discrete intervals, the outputs being members of some finite alphabet (corresponding to the predefined set of speech units). These models are called "hidden" because the state sequence that produced the observable output is not known.
- As illustrated in Figure 1, an HMM 10 comprises a set of states (S1, S2 ... S5), vectors that define transitions between certain pairs of states (shown as arrows in Figure 1), and a collection of probability data.
- The Hidden Markov Model includes a set of transition probabilities 12 associated with the transition vectors and a set of output probabilities 14 associated with the observed output at each state.
- The model is clocked from one state to another at regularly spaced, discrete intervals. At clock time, the model may change from its current state to any state for which a transition vector exists. As illustrated, a transition can be from a given state back to itself.
- Transition probabilities represent the likelihood that a transition from one state to another will occur when the model is clocked.
- Each transition has an associated probability value between 0 and 1.
- The sum of all probabilities leaving any state equals 1.
- For illustration purposes, a set of exemplary transition probability values has been given in transition probability Table 12. It will be understood that in a working embodiment these values would be generated from the training data, with the constraint that the sum of all probabilities leaving any state equals 1.
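As an illustration of the row-sum constraint, the sketch below turns hypothetical transition counts into a transition probability table like Table 12 and checks that the probabilities leaving each state sum to 1. The five-state left-to-right topology and the counts are made-up assumptions, not the values shown in Figure 1.

```python
def normalize_transitions(counts):
    """Turn per-state transition counts into transition probabilities.

    counts[i][j] is a (hypothetical) number of observed transitions from
    state i to state j. Each row is divided by its total so that the
    probabilities leaving any state sum to 1.
    """
    table = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state has no outgoing transitions")
        table.append([c / total for c in row])
    return table


# Hypothetical counts for states S1..S5 with self-loops and forward
# transitions only; zeros mark transitions that do not exist.
counts = [
    [6, 4, 0, 0, 0],
    [0, 7, 3, 0, 0],
    [0, 0, 5, 5, 0],
    [0, 0, 0, 8, 2],
    [0, 0, 0, 0, 1],
]
table = normalize_transitions(counts)
for i, row in enumerate(table, start=1):
    print(f"S{i}:", [round(p, 2) for p in row], "sum =", round(sum(row), 6))
```

In a trained system the counts would come from aligning training speech to the model; here they only demonstrate the constraint.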
- Upon each transition, the model can be thought of as emitting or outputting one member of its alphabet.
- In Figure 1, a phoneme-based speech unit has been assumed.
- The symbols identified in output probability Table 14 correspond to some of the phonemes found in standard English. Which member of the alphabet gets emitted upon each transition depends on the output probability value or function learned during training. The outputs emitted thus represent a sequence of observations (based on the training data), and each member of the alphabet has a probability of being emitted.
- HMMs are often based on probability functions comprising one or more Gaussian distributions. When a plurality of Gaussian functions are used, they are typically additively mixed together to define a complex probability distribution, as illustrated at 16.
- The probability distributions can be described by a plurality of parameters. Like the transition probability values (Table 12), these output probability parameters may comprise floating point numbers. Parameters Table 18 identifies the parameters typically used to represent probability density functions (pdf) based on observed data from the training speakers. As illustrated by the equation in Figure 1 at Gaussian function 16, the probability density function for an observation vector $O$ is the sum, over the mixture components, of each mixture coefficient $c_j$ multiplied by a Gaussian density $\mathcal{N}$, where each Gaussian density has a mean vector $\mu_j$ and covariance matrix $U_j$ computed from the cepstral or filter bank coefficient speech parameters: $f(O) = \sum_{j=1}^{M} c_j \, \mathcal{N}(O; \mu_j, U_j)$.
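The mixture density above can be evaluated as in the following sketch. It assumes diagonal covariance matrices (each $U_j$ stored as a vector of variances), a common simplification that the text does not state; the function names and example parameters are hypothetical.

```python
import math


def gaussian_diag(o, mean, var):
    """Density at observation o of a Gaussian with diagonal covariance
    (assumption: U_j is diagonal, given as a list of variances)."""
    log_p = 0.0
    for x, mu, v in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return math.exp(log_p)


def gmm_density(o, weights, means, variances):
    """f(O) = sum_j c_j * N(O; mu_j, U_j) for a mixture of diagonal Gaussians."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must sum to 1")
    return sum(c * gaussian_diag(o, mu, var)
               for c, mu, var in zip(weights, means, variances))


# Hypothetical two-component mixture over 2-dimensional observations.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
print(gmm_density([0.5, -0.5], weights, means, variances))
```

Working in the log domain inside `gaussian_diag` avoids underflow when the feature dimension is large, which is typical for cepstral vectors.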
- The details of a Hidden Markov Model recognizer may vary widely from one application to another.
- The HMM example shown in Figure 1 is intended merely to illustrate how Hidden Markov Models are constructed, and is not intended as a limitation upon the scope of the present invention.
- There are many variations on the Hidden Markov Modeling concept.
- The eigenvoice adaptation technique of the invention can be readily adapted to work with each of the different Hidden Markov Model variations, as well as with other parameter-based speech modeling systems.
- Figures 2 and 3 illustrate, respectively, how speaker identification and speaker verification may be performed using the techniques of the invention.
- As a first step, an eigenspace is constructed.
- The specific eigenspace constructed depends upon the application.
- For speaker identification, a set of known client speakers 20 is used to supply training data 22 upon which the eigenspace is created.
- For speaker verification, the training data 22 are supplied from the client speaker or speakers 21a for which verification will be desired and also from one or more potential impostors 21b.
- Aside from this difference in the source of the training data, the procedure for generating the eigenspace is essentially the same for both speaker identification and speaker verification applications. Accordingly, like reference numerals have been applied to Figures 2 and 3.
- The eigenspace is constructed by developing and training speaker models for each of the speakers represented in the training data 22. This step is illustrated at 24 and generates a set of models 26 for each speaker.
- Although Hidden Markov Models have been illustrated here, the invention is not restricted to Hidden Markov Models. Rather, any speech model having parameters suitable for concatenation may be used.
- The models 26 are trained with sufficient training data so that all sound units defined by the model are trained by at least one instance of actual speech for each speaker.
- The model training step 24 can include appropriate auxiliary speaker adaptation processing to refine the models.
- Examples of such auxiliary processing include Maximum A Posteriori estimation (MAP) or other transformation-based approaches such as Maximum Likelihood Linear Regression (MLLR).
- MAP: Maximum A Posteriori estimation; MLLR: Maximum Likelihood Linear Regression.
- The models for each speaker are used to construct a supervector at step 28.
- The supervector, illustrated at 30, may be formed by concatenating the parameters of the model for each speaker.
- The supervector for each speaker may comprise an ordered list of parameters (typically floating point numbers) corresponding to at least a portion of the parameters of the Hidden Markov Models for that speaker. Parameters corresponding to each sound unit are included in the supervector for a given speaker.
- The parameters may be organized in any convenient order. The order is not critical; however, once an order is adopted, it must be followed for all training speakers.
- The choice of which model parameters to use in constructing the supervector will depend on the available processing power of the computer system.
- With Hidden Markov Model parameters, we have achieved good results by constructing supervectors from the Gaussian means. If greater processing power is available, the supervectors may also include other parameters, such as the transition probabilities (Table 12, Fig. 1) or the covariance matrix parameters (parameters 18, Fig. 1). If the Hidden Markov Models generate discrete outputs (as opposed to probability densities), then these output values may be used to comprise the supervector.
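To make the ordering requirement concrete, the sketch below concatenates Gaussian means into a supervector using one fixed key order shared by all speakers. The dictionary layout keyed by (sound unit, state, mixture) and the names `build_supervector` and `ORDER` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # hypothetical key: (sound unit, state index, mixture index)


def build_supervector(means: Dict[Key, List[float]], order: List[Key]) -> List[float]:
    """Concatenate Gaussian mean vectors in a fixed, shared order."""
    supervector: List[float] = []
    for key in order:
        supervector.extend(means[key])  # KeyError if a sound unit was never trained
    return supervector


# The same order must be used for every training speaker.
ORDER: List[Key] = [("ah", 0, 0), ("ah", 1, 0), ("iy", 0, 0)]

# Dictionary insertion order does not matter; ORDER decides the layout.
speaker_a = {("iy", 0, 0): [0.5, 0.6], ("ah", 0, 0): [1.0, 2.0], ("ah", 1, 0): [3.0, 4.0]}
print(build_supervector(speaker_a, ORDER))  # [1.0, 2.0, 3.0, 4.0, 0.5, 0.6]
```

Raising on a missing key matches the training-time requirement that every sound unit be trained for each speaker; new-speaker supervectors with gaps are handled separately by the maximum likelihood technique.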
- A dimensionality reduction operation is performed at step 32.
- Dimensionality reduction can be effected through any linear transformation that reduces the original high-dimensional supervectors into basis vectors. A non-exhaustive list of examples includes Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, Factor Analysis, and Singular Value Decomposition.
- The class of dimensionality reduction techniques useful in implementing the invention is defined as follows. Consider a set of $T$ training supervectors obtained from speaker-dependent models for speech recognition, each of dimension $V$, so that every supervector can be written $X = [x_1, x_2, \dots, x_V]^T$. A linear transformation $M$ of dimension $E \times V$ maps a supervector to a new vector $W = [w_1, w_2, \dots, w_E]^T = M X$, where $E$ is less than or equal to $T$, the number of training supervectors. The values of the parameters of $M$ are calculated in some way from the set of $T$ training supervectors.
- The invention may be implemented with any such method (not only those listed) for finding such a constant linear transformation $M$ in the special case where the input vectors are training supervectors derived from speaker-dependent modeling, and where $M$ is used to carry out the aforementioned technique.
- The basis vectors generated at step 32 define an eigenspace spanned by the eigenvectors. Dimensionality reduction yields one eigenvector for each of the training speakers. Thus, if there are T training speakers, the dimensionality reduction step 32 produces T eigenvectors. These eigenvectors define what we call eigenvoice space or eigenspace.
- Each supervector in the original training set can be represented as a linear combination of these eigenvectors.
- The eigenvectors are ordered by their importance in modeling the data: the first eigenvector is more important than the second, which is more important than the third, and so on. Our experiments with this technique thus far show that the first eigenvector appears to correspond to a male-female dimension.
- Although a maximum of T eigenvectors is produced at step 32, in practice it is possible to discard several of these eigenvectors, keeping only the first N. Thus, at step 36, we optionally extract N of the T eigenvectors to comprise a reduced parameter eigenspace at 38.
- The higher-order eigenvectors can be discarded because they typically contain less important information with which to discriminate among speakers. Reducing the eigenvoice space to fewer than the total number of training speakers provides an inherent data compression that can be helpful when constructing practical systems with limited memory and processor resources.
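One way to realize steps 32 and 36 is Principal Component Analysis, one of the techniques listed above. The sketch below assumes NumPy and mean-centered supervectors and obtains the principal directions by singular value decomposition. `numpy.linalg.svd` returns singular values in descending order, so the rows are already ordered by importance and keeping the first N gives the reduced eigenspace. With centering, at most T - 1 directions carry variance. The function name `build_eigenspace` and the random data are hypothetical.

```python
import numpy as np


def build_eigenspace(supervectors: np.ndarray, n_keep: int):
    """PCA on a (T, V) array of training supervectors.

    Returns the mean supervector, an (n_keep, V) matrix whose rows are
    orthonormal eigenvoices ordered from most to least variance (playing
    the role of M in W = M X, applied after centering), and the variance
    along every principal direction.
    """
    T, V = supervectors.shape
    if not 1 <= n_keep <= min(T, V):
        raise ValueError("n_keep must be between 1 and min(T, V)")
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean  # assumption: mean-centered PCA
    # Rows of vt are the principal directions, sorted by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / max(T - 1, 1)
    return mean, vt[:n_keep], variances


rng = np.random.default_rng(0)
train = rng.normal(size=(6, 20))  # hypothetical: T = 6 speakers, V = 20 parameters
mean, eigvoices, variances = build_eigenspace(train, n_keep=3)
print(eigvoices.shape)  # (3, 20)
```

Keeping the full list of variances is useful later: the discarded ones describe how much speaker information lies outside the reduced eigenspace.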
- Once the eigenspace is constructed, each speaker in the training data is represented in eigenspace.
- For speaker identification, each known client speaker is represented in eigenspace as depicted at step 40a and illustrated diagrammatically at 42a.
- For speaker verification, the client speaker and potential impostor speakers are represented in eigenspace as indicated at step 40b and as illustrated at 42b.
- The speakers may be represented in eigenspace either as points (as illustrated diagrammatically in Figure 2 at 42a) or as probability distributions (as illustrated diagrammatically in Figure 3 at 42b).
- The user seeking speaker identification or verification supplies new speech data at 44, and these data are used to train a speaker-dependent model as indicated at step 46.
- The model 48 is then used at step 50 to construct a supervector 52.
- The new speech data may not necessarily include an example of each sound unit.
- For instance, the new speech utterance may be too short to contain examples of all sound units. The system will handle this, as will be more fully explained below.
- Dimensionality reduction is performed at step 54 upon the supervector 52, resulting in a new data point that can be represented in eigenspace as indicated at step 56 and illustrated at 58.
- In the illustration at 58, the previously acquired points in eigenspace are represented as dots, whereas the new speech data point is represented by a star.
- Figure 4 illustrates an exemplary embodiment of both speaker identification and speaker verification.
- For speaker identification, the new speech data is assigned at step 62 to the closest training speaker in eigenspace, as diagrammatically illustrated at 64.
- The system will thus identify the new speech as being that of the prior training speaker whose data point or data distribution lies closest to the new speech in eigenspace.
- For speaker verification, the system tests the new data point at step 66 to determine whether it is within a predetermined threshold proximity to the client speaker in eigenspace. As a safeguard, the system may, at step 68, reject the new speaker data if it lies closer in eigenspace to an impostor than to the client speaker. This is diagrammatically illustrated at 69, where the proximity to the client speaker and the proximity to the closest impostor have been depicted.
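The decision rules of Figure 4 can be sketched for the case where speakers are points in eigenspace. The text leaves the proximity measure open, so Euclidean distance, the threshold value, and the names `identify` and `verify` are illustrative assumptions.

```python
import math
from typing import Dict, Sequence

Point = Sequence[float]


def identify(new_point: Point, speakers: Dict[str, Point]) -> str:
    """Step 62: assign the new point to the closest training speaker
    (assumption: Euclidean distance as the proximity measure)."""
    return min(speakers, key=lambda name: math.dist(new_point, speakers[name]))


def verify(new_point: Point, client: Point, impostors: Sequence[Point],
           threshold: float) -> bool:
    """Steps 66 and 68: accept only if the new point lies within `threshold`
    of the client and is not closer to any impostor than to the client."""
    d_client = math.dist(new_point, client)
    if d_client > threshold:
        return False
    return all(d_client <= math.dist(new_point, p) for p in impostors)


speakers = {"alice": (1.0, 0.0), "bob": (-1.0, 0.5), "carol": (0.0, 2.0)}
print(identify((0.8, 0.2), speakers))                             # alice
print(verify((0.8, 0.2), speakers["alice"], [(0.5, 0.5)], 0.5))   # True
print(verify((0.8, 0.2), speakers["alice"], [(0.9, 0.15)], 0.5))  # False
```

The second `verify` call fails only because of the impostor safeguard: the new point is within the client threshold but lies even closer to the impostor.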
- A projection operation finds the point within eigenspace that is as close as possible to the point outside of eigenspace corresponding to the new speaker's input speech. It bears noting that these points are actually supervectors from which a set of HMMs can be reconstituted.
- The projection operation is a comparatively crude technique that does not guarantee that the point within eigenspace is optimal for the new speaker. Furthermore, the projection operation requires that the supervector for the new speaker contain a complete set of data to represent the entire set of HMMs for that speaker. This requirement gives rise to a significant practical limitation.
- When using projection to constrain a new speaker to the eigenspace, that speaker must supply enough input speech so that all speech units are represented in the data. For example, if the Hidden Markov Models are designed to represent all phonemes in the English language, then the new speaker must supply examples of all phonemes before the simple projection technique can be used. In many applications this constraint is simply not practical.
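The simple projection can be sketched as follows, assuming a mean-centered eigenspace with orthonormal eigenvoices (details the text does not fix). Under that assumption, the closest point in eigenspace is obtained by a dot product with each eigenvoice. The sketch refuses to run when the supervector has missing entries (marked here with NaN), which is exactly the limitation described above. The function name `project` is hypothetical.

```python
import numpy as np


def project(supervector, mean, eigvoices):
    """Project a complete supervector onto an orthonormal eigenspace.

    Returns the coordinates w (one per eigenvoice) and mean + eigvoices.T @ w,
    the point in the eigenspace closest (in Euclidean distance) to the input.
    Assumption: rows of `eigvoices` are orthonormal.
    """
    x = np.asarray(supervector, dtype=float)
    if np.isnan(x).any():
        raise ValueError("projection needs a fully populated supervector")
    w = eigvoices @ (x - mean)
    return w, mean + eigvoices.T @ w


# Toy 3-D supervector space whose eigenspace is the x-y plane.
mean = np.zeros(3)
eigvoices = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
w, closest = project([2.0, -1.0, 5.0], mean, eigvoices)
print(w.tolist(), closest.tolist())  # [2.0, -1.0] [2.0, -1.0, 0.0]
```

The component along the third axis is simply dropped, which illustrates why projection is described as crude: it ignores how probable the discarded information was.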
- The maximum likelihood technique of the invention addresses both of the above-mentioned drawbacks of simple projection.
- It finds a point within eigenspace that represents the supervector corresponding to a set of Hidden Markov Models that have the maximum probability of generating the speech supplied by the new speaker.
- The maximum likelihood technique is based on probabilities arising from the actual adaptation data and thus tends to weight the more probable data more heavily. Unlike the simple projection technique, the maximum likelihood technique will work even if the new speaker has not supplied a full set of training data (i.e., data for some of the sound units are missing). In effect, the maximum likelihood technique takes into account the context under which the supervectors are constructed, namely from Hidden Markov Models involving probabilities that certain models are more likely than others to generate the input speech supplied by the new speaker.
- The maximum likelihood technique will select the supervector within eigenspace that is the most consistent with the new speaker's input speech, regardless of how much input speech is actually available.
- Suppose, for example, that the new speaker is a young female native of Alabama.
- the maximum likelihood technique will select a point within eigenspace that represents all phonemes (even those not yet represented in the input speech) consistent with this speaker's native Alabama female accent.
- Figure 5 shows how the maximum likelihood technique works.
- The input speech from the new speaker is used to construct supervector 70.
- The supervector comprises a concatenated list of speech parameters, corresponding to cepstral coefficients or the like.
- In the illustrated embodiment, these parameters are floating point numbers representing the Gaussian means extracted from the set of Hidden Markov Models corresponding to the new speaker.
- Other HMM parameters may also be used.
- In Figure 5, these HMM means are shown as dots, as at 72.
- When fully populated with data, supervector 70 would contain floating point numbers for each of the HMM means, corresponding to each of the sound units represented by the HMM models. For illustration purposes, it is assumed here that the parameters for phoneme "ah" are present but the parameters for phoneme "iy" are missing.
- The eigenspace 38 is represented by a set of eigenvectors 74, 76 and 78.
- The supervector 70 corresponding to the observation data from the new speaker may be represented in eigenspace by multiplying each of the eigenvectors by a corresponding eigenvalue, designated $W_1, W_2, \dots, W_n$.
- These eigenvalues are initially unknown.
- The maximum likelihood technique finds values for these unknown eigenvalues. As will be more fully explained, these values are selected by seeking the optimal solution that will best represent the new speaker within eigenspace.
- Multiplying each eigenvector by its eigenvalue and summing the products produces an adapted model 80.
- Whereas the supervector of the input speech may have had some missing parameter values (the "iy" parameters, for example), the supervector 80 representing the adapted model is fully populated with values. That is one benefit of the invention.
- The values in supervector 80 represent the optimal solution, namely the one that has the maximum likelihood of representing the new speaker in eigenspace.
- The individual eigenvalues $W_1, W_2, \dots, W_n$ may be viewed as comprising a maximum likelihood vector.
- Figure 5 illustrates this vector diagrammatically at 82. As the illustration shows, maximum likelihood vector 82 comprises the set of eigenvalues $W_1, W_2, \dots, W_n$.
- The procedure for performing adaptation using the maximum likelihood technique is shown in Figure 6.
- Speech from a new speaker, comprising the observation data, is used to construct a set of HMMs as depicted at 100.
- The set of HMMs 102 is then used in constructing a supervector as depicted at 104.
- The supervector 106 comprises a concatenated list of HMM parameters extracted from the HMM models 102.
- Using the supervector 106, a probability function Q is constructed at 108.
- The presently preferred embodiment employs a probability function that represents the probability of generating the observed data for the predefined set of HMM models 102. Subsequent manipulation of the probability function Q is made easier if the function includes not only a probability term P but also the logarithm of that term, log P.
- The probability function is then maximized at step 110 by taking the derivative of the probability function individually with respect to each of the eigenvalues $W_1, W_2, \dots, W_n$.
- For example, if the eigenspace is of dimension 100, the system calculates 100 derivatives of the probability function Q, setting each to zero and solving for the respective W. While this may seem like a large computation, it is far less computationally expensive than performing the thousands of computations typically required of conventional MAP or MLLR techniques.
- The resulting set of Ws represents the eigenvalues needed to identify the point in eigenspace corresponding to the point of maximum likelihood.
- The set of Ws thus comprises a maximum likelihood vector in eigenspace.
- Each of the eigenvectors (eigenvectors 74, 76 and 78 in Fig. 5) defines one of a set of orthogonal vectors or coordinates against which the eigenvalues are multiplied to define a point constrained within eigenspace.
- This maximum likelihood vector, depicted at 112, is used to construct supervector 114 corresponding to the optimal point in eigenspace (point 66 in Fig. 4).
- Supervector 114 may then be used at step 116 to construct the adapted model 118 for the new speaker.
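- In the maximum likelihood formulation, $\bar{\mu}_m^{(s)}(j)$ represents the mean vector for mixture Gaussian $m$ in state $s$ of eigenvector (eigenmodel) $j$.
- The new speaker's means are written as $\hat{\mu} = \sum_{j=1}^{E} w_j \bar{\mu}_j$, where the $\bar{\mu}_j$ are orthogonal and the $w_j$ are the eigenvalues of our speaker model.
- Thus any new speaker can be modeled as a linear combination of our database of observed speakers, with $s$ ranging over the states of $\lambda$ and $m$ over the mixture Gaussians of $M$.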
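Setting the derivatives of Q to zero gives one linear equation per weight. Assuming Gaussian output densities with covariances $C_m^{(s)}$ and occupation probabilities $\gamma_m^{(s)}(t)$ for frame $o_t$, the equation for each $i$ is:

$$\sum_{s,m} \bar{\mu}_m^{(s)}(i)^T C_m^{(s)-1} \sum_t \gamma_m^{(s)}(t)\, o_t = \sum_{j} w_j \sum_{s,m} \Big(\sum_t \gamma_m^{(s)}(t)\Big) \bar{\mu}_m^{(s)}(i)^T C_m^{(s)-1} \bar{\mu}_m^{(s)}(j)$$

The sketch below solves this system with NumPy. It makes several assumptions: diagonal covariances, occupation probabilities already obtained from an alignment of the new speaker's speech (for example by a forward-backward pass, not shown), and means modeled as $\hat{\mu} = \sum_j w_j \bar{\mu}_j$ with no separate offset. Gaussians that never occur in the adaptation data (like the missing "iy" units) contribute no statistics, yet the returned means are fully populated. The statistics layout and the name `mled_weights` are hypothetical.

```python
import numpy as np


def mled_weights(eigvoices, variances, occupancy, first_order):
    """Solve the maximum likelihood equations for the eigenvoice weights.

    eigvoices:   (E, G, D) mean vectors mu_bar_g(j) of each eigenvoice j for
                 every Gaussian g (all state/mixture pairs), D feature dims.
    variances:   (G, D) diagonal covariances C_g (assumption: diagonal).
    occupancy:   (G,) sum over frames of gamma_g(t).
    first_order: (G, D) sum over frames of gamma_g(t) * o_t.
    Returns the weights w, shape (E,), and the adapted means, shape (G, D).
    """
    scaled = eigvoices / variances                  # C_g^-1 mu_bar_g(j)
    # A[i, j] = sum_g n_g * mu_bar_g(i)^T C_g^-1 mu_bar_g(j)
    A = np.einsum("g,igd,jgd->ij", occupancy, eigvoices, scaled)
    # b[i] = sum_g mu_bar_g(i)^T C_g^-1 f_g
    b = np.einsum("igd,gd->i", scaled, first_order)
    w = np.linalg.solve(A, b)
    adapted = np.einsum("j,jgd->gd", w, eigvoices)  # sum_j w_j mu_bar(j)
    return w, adapted


# Hypothetical toy model: E = 2 eigenvoices, G = 3 Gaussians, D = 2 dims.
eigvoices = np.array([
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]],
])
variances = np.ones((3, 2))
true_means = np.einsum("j,jgd->gd", np.array([1.5, -0.5]), eigvoices)
occupancy = np.array([4.0, 2.0, 0.0])          # Gaussian 2 is never observed
first_order = occupancy[:, None] * true_means  # frames sitting on the means
w, adapted = mled_weights(eigvoices, variances, occupancy, first_order)
print(np.round(w, 6).tolist())           # [1.5, -0.5]
print(np.round(adapted[2], 6).tolist())  # [1.0, 2.0], filled in without data
```

The system is only E by E, which is why solving for the weights stays cheap even when the supervectors themselves are very long.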
- When the speakers are represented as points, a simple geometric distance calculation can be used to identify which training speaker is closest to the new speaker.
- When the speakers are represented as distributions, proximity is assessed by treating the new speaker data as an observation O and then testing each distribution candidate (representing the training speakers) to determine the probability that the candidate generated the observation data. The candidate with the highest probability is assessed as having the closest proximity. In some high-security applications, it may be desirable to reject verification if the most probable candidate has a probability score below a predetermined threshold. A cost function may thus be used to rule out candidates that lack a high degree of certainty.
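For speakers represented as distributions, the comparison can be sketched as a log-likelihood score per candidate, with an optional rejection threshold. Modeling each training speaker as a diagonal Gaussian in eigenspace, the threshold value, and the names below are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical representation: (mean, variance) per eigenspace dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def log_likelihood(point: Sequence[float], dist: Gaussian) -> float:
    """Log density of `point` under a diagonal Gaussian (assumption)."""
    mean, var = dist
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(point, mean, var))


def most_probable(point, candidates: Dict[str, Gaussian],
                  min_log_likelihood: float) -> Optional[str]:
    """Return the candidate most likely to have generated the point, or
    None if even the best score falls below the threshold."""
    scores = {name: log_likelihood(point, d) for name, d in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_log_likelihood else None


candidates = {
    "client": ((1.0, 0.0), (0.2, 0.2)),
    "impostor": ((-1.0, 0.5), (0.5, 0.5)),
}
print(most_probable((0.9, 0.1), candidates, min_log_likelihood=-5.0))  # client
print(most_probable((5.0, 5.0), candidates, min_log_likelihood=-5.0))  # None
```

The second call shows the high-security case: the impostor model scores best, but its score is too low to trust, so no candidate is accepted.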
- Assessing the proximity of the new speaker to the training speakers may be carried out entirely within eigenspace, as described above.
- Alternatively, a Bayesian estimation technique can be used for even greater accuracy.
- In this technique, the Gaussian densities of the training speakers within eigenspace are multiplied by the estimated marginal density in the orthogonal complement space, which represents the speaker data that were discarded through dimensionality reduction.
- Performing dimensionality reduction upon the speaker model supervectors results in a significant data compression from a high-dimensional space to a low-dimensional space.
- Although dimensionality reduction preserves the most important basis vectors, some higher-order information is discarded.
- The Bayesian estimation technique estimates a marginal Gaussian density that corresponds to this discarded information.
- Here, the original eigenspace is constructed by linear transformation of the supervector through a dimensionality reduction process whereby M components are extracted from the larger number N of all components (M and N here denote numbers of components).
- The M extracted components span a lower-dimensional subspace of the transformation basis, corresponding to the largest eigenvalues.
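The Bayesian score can be sketched as the in-eigenspace Gaussian log-density plus the log of a marginal Gaussian for the residual in the orthogonal complement. The text does not say how the marginal density is estimated. This sketch assumes an isotropic Gaussian whose variance `rho` is the average of the discarded eigenvalues, a choice borrowed from probabilistic PCA-style estimators. It also assumes orthonormal eigenvoices and a per-speaker mean supervector. NumPy is assumed and all names are hypothetical.

```python
import numpy as np


def estimate_rho(all_eigenvalues, n_keep):
    """Assumption: complement variance = mean of the discarded eigenvalues."""
    discarded = np.asarray(all_eigenvalues, dtype=float)[n_keep:]
    return float(discarded.mean())


def bayesian_log_score(x, speaker_mean, in_space_var, eigvoices, rho):
    """log of (in-eigenspace Gaussian) * (marginal Gaussian in complement).

    x, speaker_mean: (V,) supervectors; eigvoices: (N, V) orthonormal rows;
    in_space_var: (N,) this speaker's variances inside the eigenspace;
    rho: assumed isotropic variance of the discarded directions.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(speaker_mean, dtype=float)
    var = np.asarray(in_space_var, dtype=float)
    w = eigvoices @ diff                 # coordinates in eigenspace
    residual = diff - eigvoices.T @ w    # part in the orthogonal complement
    n_in, v = eigvoices.shape
    log_in = -0.5 * np.sum(np.log(2 * np.pi * var) + w ** 2 / var)
    log_out = -0.5 * ((v - n_in) * np.log(2 * np.pi * rho) + residual @ residual / rho)
    return float(log_in + log_out)


eigvoices = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # N = 2 kept, V = 3
rho = estimate_rho([2.0, 1.0, 0.5], n_keep=2)              # only 0.5 discarded
var_in = np.array([1.0, 1.0])
x = np.array([0.2, 0.1, 0.3])
score_a = bayesian_log_score(x, [0.0, 0.0, 0.0], var_in, eigvoices, rho)
score_b = bayesian_log_score(x, [0.0, 0.0, 2.0], var_in, eigvoices, rho)
print(score_a > score_b)  # True
```

Speakers A and B fit the new supervector equally well inside the eigenspace; only the complement term separates them, which is the information that pure eigenspace distance would ignore.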
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Electrically Operated Instructional Devices (AREA)
- Image Analysis (AREA)
- Telephonic Communication Services (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Collating Specific Patterns (AREA)
Claims (11)
- A method for verifying or identifying a speaker with respect to a predetermined client speaker, the method comprising: training a set of speech models on the speech of a plurality of training speakers, said plurality of training speakers including at least one client speaker; constructing an eigenspace to represent said plurality of training speakers by performing dimensionality reduction on said set of models to generate a set of basis vectors that define said eigenspace; representing said client speaker as a first location in said eigenspace; processing input data for a new speaker by training a new speech model on said input data and by performing dimensionality reduction on said new speech model to generate a representation of said new speaker as a second location in said eigenspace; and assessing the proximity between said first and second locations and using said assessment as an indication of whether the new speaker is the client speaker.
- A speaker identification method according to claim 1, wherein said plurality of training speakers includes a plurality of different client speakers, the method further comprising: representing each of said plurality of client speakers as training speaker locations in said eigenspace; and assessing the proximity between said second location and said training speaker locations and identifying said new speaker as a selected one of said plurality of client speakers based at least in part on said proximity assessment.
- A speaker verification method according to claim 1, wherein said plurality of training speakers includes at least one impostor speaker that is represented as a third location in said eigenspace.
- A speaker verification method according to claim 3, further comprising assessing the proximity between said second and third locations and using said additional assessment as a further indication of whether the new speaker is the client speaker.
- The method of claim 1, wherein said step of assessing proximity is performed by determining the distance between said first and second locations.
- The method of claim 1, wherein said training speakers are represented as locations in said eigenspace.
- The method of claim 1, wherein said training speakers are represented as points in said eigenspace.
- The method of claim 1, wherein said training speakers are represented as distributions in said eigenspace.
- The method of claim 1, wherein said step of processing input data for the new speaker also uses said input data to generate a probability function and then maximizes said probability function to determine a maximum likelihood vector that lies within said eigenspace.
- The method of claim 1, wherein said plurality of training speakers includes a plurality of client speakers and at least one impostor speaker.
- The method of claim 1, further comprising assessing the proximity between said first and second locations and using said assessment as an indication of whether the new speaker is the client speaker, in order to determine whether the identity of the new speaker varies.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US148911 | 1998-09-04 | | |
| US09/148,911 (US6141644A) | 1998-09-04 | 1998-09-04 | Speaker verification and speaker identification based on eigenvoices |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| EP0984431A2 (fr) | 2000-03-08 |
| EP0984431A3 (fr) | 2000-11-29 |
| EP0984431B1 (fr) | 2004-02-18 |
Family
ID=22527990
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP99306671A (EP0984431B1, Expired - Lifetime) | Speaker verification and speaker identification based on eigenvoices | 1998-09-04 | 1999-08-23 |
Country Status (7)
| Country | Link |
|---|---|
| US (2) | US6141644A |
| EP (1) | EP0984431B1 |
| JP (1) | JP2000081894A |
| CN (1) | CN1188828C |
| DE (1) | DE69914839T2 |
| ES (1) | ES2214815T3 |
| TW (1) | TW448416B |
Families Citing this family (238)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7630895B2 (en) * | 2000-01-21 | 2009-12-08 | At&T Intellectual Property I, L.P. | Speaker verification method |
| US6076055A (en) * | 1997-05-27 | 2000-06-13 | Ameritech | Speaker verification method |
| US6141644A (en) * | 1998-09-04 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and speaker identification based on eigenvoices |
| US8095581B2 (en) * | 1999-02-05 | 2012-01-10 | Gregory A Stobbs | Computer-implemented patent portfolio analysis method and apparatus |
| US20010044719A1 (en) * | 1999-07-02 | 2001-11-22 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for recognizing, indexing, and searching acoustic signals |
| US6556969B1 (en) * | 1999-09-30 | 2003-04-29 | Conexant Systems, Inc. | Low complexity speaker verification using simplified hidden markov models with universal cohort models and automatic score thresholding |
| US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
| WO2001073756A1 * | 2000-03-31 | 2001-10-04 | Centre For Signal Processing Of The Nanyang Technological University School Of Electrical & Electronic Engineering | Speaker verification from a projection matrix |
| US6609094B1 (en) * | 2000-05-22 | 2003-08-19 | International Business Machines Corporation | Maximum entropy and maximum likelihood criteria for feature selection from multivariate data |
| EP1178467B1 * | 2000-07-05 | 2005-03-09 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and identification |
| US7216077B1 (en) * | 2000-09-26 | 2007-05-08 | International Business Machines Corporation | Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation |
| DE10047723A1 * | 2000-09-27 | 2002-04-11 | Philips Corp Intellectual Pty | Method for determining an eigenspace for representing a plurality of training speakers |
| US7496510B2 (en) * | 2000-11-30 | 2009-02-24 | International Business Machines Corporation | Method and apparatus for the automatic separating and indexing of multi-speaker conversations |
| US6895376B2 (en) * | 2001-05-04 | 2005-05-17 | Matsushita Electric Industrial Co., Ltd. | Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification |
| US7437289B2 (en) * | 2001-08-16 | 2008-10-14 | International Business Machines Corporation | Methods and apparatus for the systematic adaptation of classification systems from sparse adaptation data |
| US20030113002A1 (en) * | 2001-12-18 | 2003-06-19 | Koninklijke Philips Electronics N.V. | Identification of people using video and audio eigen features |
| US6952674B2 (en) * | 2002-01-07 | 2005-10-04 | Intel Corporation | Selecting an acoustic model in a speech recognition system |
| US7620547B2 (en) * | 2002-07-25 | 2009-11-17 | Sony Deutschland Gmbh | Spoken man-machine interface with speaker identification |
| US7181393B2 (en) * | 2002-11-29 | 2007-02-20 | Microsoft Corporation | Method of real-time speaker change point detection, speaker tracking and speaker model construction |
| US7272565B2 (en) * | 2002-12-17 | 2007-09-18 | Technology Patents Llc. | System and method for monitoring individuals |
| US7634063B2 (en) * | 2003-01-02 | 2009-12-15 | Technology Patents, Llc | System and method for monitoring individuals |
| WO2004064040A1 * | 2003-01-15 | 2004-07-29 | Siemens Corporate Research Inc. | Speech processing method |
| US7299177B2 (en) * | 2003-05-30 | 2007-11-20 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
| WO2005015547A1 * | 2003-07-01 | 2005-02-17 | France Telecom | Method and system for analysing voice signals for the compact representation of speakers |
| SG140445A1 (en) * | 2003-07-28 | 2008-03-28 | Sony Corp | Method and apparatus for automatically recognizing audio data |
| US7328154B2 (en) * | 2003-08-13 | 2008-02-05 | Matsushita Electrical Industrial Co., Ltd. | Bubble splitting for compact acoustic modeling |
| US7643989B2 (en) * | 2003-08-29 | 2010-01-05 | Microsoft Corporation | Method and apparatus for vocal tract resonance tracking using nonlinear predictor and target-guided temporal restraint |
| US7224786B2 (en) * | 2003-09-11 | 2007-05-29 | Capital One Financial Corporation | System and method for detecting unauthorized access using a voice signature |
| US7212613B2 (en) * | 2003-09-18 | 2007-05-01 | International Business Machines Corporation | System and method for telephonic voice authentication |
| US20080208581A1 (en) * | 2003-12-05 | 2008-08-28 | Queensland University Of Technology | Model Adaptation System and Method for Speaker Recognition |
| KR20050063986A * | 2003-12-23 | 2005-06-29 | 한국전자통신연구원 | Speaker-dependent speech recognition system and method using eigenvoice coefficients |
| US7636855B2 (en) * | 2004-01-30 | 2009-12-22 | Panasonic Corporation | Multiple choice challenge-response user authorization system and method |
| US20050192973A1 (en) * | 2004-02-12 | 2005-09-01 | Smith Micro Software, Inc. | Visual database management system and method |
| US20070033041A1 (en) * | 2004-07-12 | 2007-02-08 | Norton Jeffrey W | Method of identifying a person based upon voice analysis |
| US9355651B2 (en) | 2004-09-16 | 2016-05-31 | Lena Foundation | System and method for expressive language, developmental disorder, and emotion assessment |
| US9240188B2 (en) | 2004-09-16 | 2016-01-19 | Lena Foundation | System and method for expressive language, developmental disorder, and emotion assessment |
| US8938390B2 (en) * | 2007-01-23 | 2015-01-20 | Lena Foundation | System and method for expressive language and developmental disorder assessment |
| US10223934B2 (en) | 2004-09-16 | 2019-03-05 | Lena Foundation | Systems and methods for expressive language, developmental disorder, and emotion assessment, and contextual feedback |
| US8078465B2 (en) * | 2007-01-23 | 2011-12-13 | Lena Foundation | System and method for detection and analysis of speech |
| US7565292B2 (en) * | 2004-09-17 | 2009-07-21 | Micriosoft Corporation | Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech |
| US20080208578A1 (en) * | 2004-09-23 | 2008-08-28 | Koninklijke Philips Electronics, N.V. | Robust Speaker-Dependent Speech Recognition System |
| US7574359B2 (en) * | 2004-10-01 | 2009-08-11 | Microsoft Corporation | Speaker selection training via a-posteriori Gaussian mixture model analysis, transformation, and combination of hidden Markov models |
| US7565284B2 (en) * | 2004-11-05 | 2009-07-21 | Microsoft Corporation | Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories |
| US7447633B2 (en) * | 2004-11-22 | 2008-11-04 | International Business Machines Corporation | Method and apparatus for training a text independent speaker recognition system using speech data with text labels |
| US7519531B2 (en) * | 2005-03-30 | 2009-04-14 | Microsoft Corporation | Speaker adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation |
| US20060229879A1 (en) * | 2005-04-06 | 2006-10-12 | Top Digital Co., Ltd. | Voiceprint identification system for e-commerce |
| US20060287863A1 (en) * | 2005-06-16 | 2006-12-21 | International Business Machines Corporation | Speaker identification and voice verification for voice applications |
| US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
| US8825482B2 (en) * | 2005-09-15 | 2014-09-02 | Sony Computer Entertainment Inc. | Audio, video, simulation, and user interface paradigms |
| US7788101B2 (en) * | 2005-10-31 | 2010-08-31 | Hitachi, Ltd. | Adaptation method for inter-person biometrics variability |
| JP4556028B2 * | 2005-11-04 | 2010-10-06 | 株式会社国際電気通信基礎技術研究所 | Utterance subject identification device and computer program |
| US20070201443A1 (en) * | 2006-02-09 | 2007-08-30 | Debanjan Saha | VoIP caller authentication by voice signature continuity |
| US7539616B2 (en) * | 2006-02-20 | 2009-05-26 | Microsoft Corporation | Speaker authentication using adapted background models |
| WO2007111169A1 * | 2006-03-24 | 2007-10-04 | Pioneer Corporation | Speaker model registration device, method, and computer program in a speaker identification system |
| DE602006010511D1 * | 2006-04-03 | 2009-12-31 | Voice Trust Ag | Speaker authentication in digital communication networks |
| US7769583B2 (en) * | 2006-05-13 | 2010-08-03 | International Business Machines Corporation | Quantizing feature vectors in decision-making applications |
| AU2006343470B2 (en) * | 2006-05-16 | 2012-07-19 | Loquendo S.P.A. | Intersession variability compensation for automatic extraction of information from voice |
| DE602006011287D1 * | 2006-05-24 | 2010-02-04 | Voice Trust Ag | Robust speaker recognition |
| US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
| CN101154380B * | 2006-09-29 | 2011-01-26 | 株式会社东芝 | Method and apparatus for enrollment and verification in speaker authentication |
| US8024193B2 (en) * | 2006-10-10 | 2011-09-20 | Apple Inc. | Methods and apparatus related to pruning for concatenative text-to-speech synthesis |
| CA2676380C * | 2007-01-23 | 2015-11-24 | Infoture, Inc. | System and method for detection and analysis of speech |
| US20080195395A1 (en) * | 2007-02-08 | 2008-08-14 | Jonghae Kim | System and method for telephonic voice and speech authentication |
| US8099288B2 (en) * | 2007-02-12 | 2012-01-17 | Microsoft Corp. | Text-dependent speaker verification |
| KR20080090034A * | 2007-04-03 | 2008-10-08 | 삼성전자주식회사 | Voice speaker recognition method and system |
| US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
| US20090006085A1 (en) * | 2007-06-29 | 2009-01-01 | Microsoft Corporation | Automated call classification and prioritization |
| US20090018826A1 (en) * | 2007-07-13 | 2009-01-15 | Berlin Andrew A | Methods, Systems and Devices for Speech Transduction |
| US20090030676A1 (en) * | 2007-07-26 | 2009-01-29 | Creative Technology Ltd | Method of deriving a compressed acoustic model for speech recognition |
| US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
| US8817964B2 (en) * | 2008-02-11 | 2014-08-26 | International Business Machines Corporation | Telephonic voice authentication and display |
| WO2009110613A1 * | 2008-03-07 | 2009-09-11 | 日本電気株式会社 | Personal collation device and speaker registration device, and method and program |
| US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
| US8504365B2 (en) * | 2008-04-11 | 2013-08-06 | At&T Intellectual Property I, L.P. | System and method for detecting synthetic speaker verification |
| US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
| US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
| US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
| JP5326892B2 * | 2008-12-26 | 2013-10-30 | 富士通株式会社 | Information processing device, program, and method for generating an acoustic model |
| US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
| US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
| US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
| US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
| US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
| US9685159B2 (en) * | 2009-11-12 | 2017-06-20 | Agnitio Sl | Speaker recognition from telephone calls |
| US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
| US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
| US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
| US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
| DE202011111062U1 | 2010-01-25 | 2019-02-19 | Newvaluexchange Ltd. | Device and system for a digital conversation management platform |
| US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
| US8719191B2 (en) * | 2010-03-01 | 2014-05-06 | International Business Machines Corporation | Training and verification using a correlated boosted entity model |
| CN102194455A * | 2010-03-17 | 2011-09-21 | 博石金(北京)信息技术有限公司 | Voiceprint identification and authentication method independent of speech content |
| US8442823B2 (en) * | 2010-10-19 | 2013-05-14 | Motorola Solutions, Inc. | Methods for creating and searching a database of speakers |
| US9318114B2 (en) * | 2010-11-24 | 2016-04-19 | At&T Intellectual Property I, L.P. | System and method for generating challenge utterances for speaker verification |
| WO2012068705A1 * | 2010-11-25 | 2012-05-31 | Telefonaktiebolaget L M Ericsson (Publ) | Analysis system and method for audio data |
| US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
| US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
| US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
| US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
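| CN103186527B * | 2011-12-27 | 2017-04-26 | 北京百度网讯科技有限公司 | System for building a music classification model, system for recommending music, and corresponding methods |
| JP6031761B2 * | 2011-12-28 | 2016-11-24 | 富士ゼロックス株式会社 | Speech analysis device and speech analysis system |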
| US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
| US9390445B2 (en) | 2012-03-05 | 2016-07-12 | Visa International Service Association | Authentication using biometric technology through a consumer device |
| US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
| US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
| US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
| US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
| US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
| US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
| EP2713367B1 * | 2012-09-28 | 2016-11-09 | Agnitio, S.L. | Speaker recognition |
| US20140136204A1 (en) * | 2012-11-13 | 2014-05-15 | GM Global Technology Operations LLC | Methods and systems for speech systems |
| US8694315B1 (en) * | 2013-02-05 | 2014-04-08 | Visa International Service Association | System and method for authentication using speaker verification techniques and fraud model |
| US9406298B2 (en) * | 2013-02-07 | 2016-08-02 | Nuance Communications, Inc. | Method and apparatus for efficient i-vector extraction |
| KR20250004158A | 2013-02-07 | 2025-01-07 | 애플 인크. | Voice trigger for a digital assistant |
| US20140222423A1 (en) * | 2013-02-07 | 2014-08-07 | Nuance Communications, Inc. | Method and Apparatus for Efficient I-Vector Extraction |
| US9865266B2 (en) * | 2013-02-25 | 2018-01-09 | Nuance Communications, Inc. | Method and apparatus for automated speaker parameters adaptation in a deployed speaker verification system |
| US9336775B2 (en) | 2013-03-05 | 2016-05-10 | Microsoft Technology Licensing, Llc | Posterior-based feature with partial distance elimination for speech recognition |
| US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
| US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
| WO2014144579A1 | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
| US9258425B2 (en) | 2013-05-22 | 2016-02-09 | Nuance Communications, Inc. | Method and system for speaker verification |
| WO2014197334A2 | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
| WO2014197336A1 | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
| US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
| WO2014197335A1 | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
| HK1220268A1 | 2013-06-09 | 2017-04-28 | 苹果公司 | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
| US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
| JP2016521948A | 2013-06-13 | 2016-07-25 | アップル インコーポレイテッド | System and method for emergency calls initiated by voice command |
| US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
| US8812320B1 (en) | 2014-04-01 | 2014-08-19 | Google Inc. | Segment-based speaker verification using dynamically generated phrases |
| US9542948B2 (en) | 2014-04-09 | 2017-01-10 | Google Inc. | Text-dependent speaker identification |
| US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
| US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
| US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
| US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
| US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
| US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
| US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
| US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
| US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
| US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
| US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
| US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
| US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
| US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
| US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
| US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
| US9257120B1 (en) | 2014-07-18 | 2016-02-09 | Google Inc. | Speaker verification using co-location information |
| US11942095B2 (en) | 2014-07-18 | 2024-03-26 | Google Llc | Speaker verification using co-location information |
| US11676608B2 (en) | 2021-04-02 | 2023-06-13 | Google Llc | Speaker verification using co-location information |
| US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
| US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
| US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
| BR102014023647B1 * | 2014-09-24 | 2022-12-06 | Fundacao Cpqd - Centro De Pesquisa E Desenvolvimento Em Telecomunicacoes | Method and system for fraud detection in applications based on voice processing |
| US9606986B2 (en) | 2014-09-29 | 2017-03-28 | Apple Inc. | Integrated word N-gram and class M-gram language models |
| US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
| US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
| US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
| US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
| US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
| US9424841B2 (en) | 2014-10-09 | 2016-08-23 | Google Inc. | Hotword detection on multiple devices |
| US9318107B1 (en) | 2014-10-09 | 2016-04-19 | Google Inc. | Hotword detection on multiple devices |
| US9812128B2 (en) | 2014-10-09 | 2017-11-07 | Google Inc. | Device leadership negotiation among voice interface devices |
| US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
| US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
| US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
| US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
| US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
| US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
| US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
| US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
| US12268523B2 (en) | 2015-05-08 | 2025-04-08 | ST R&DTech LLC | Biometric, physiological or environmental monitoring using a closed chamber |
| US10709388B2 (en) | 2015-05-08 | 2020-07-14 | Staton Techiya, Llc | Biometric, physiological or environmental monitoring using a closed chamber |
| US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
| US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
| US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
| US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
| US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
| US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
| US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
| US10056076B2 (en) * | 2015-09-06 | 2018-08-21 | International Business Machines Corporation | Covariance matrix estimation with structural-based priors for speech processing |
| US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
| US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
| US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
| US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
| US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
| US20170092278A1 (en) * | 2015-09-30 | 2017-03-30 | Apple Inc. | Speaker recognition |
| US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
| US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
| US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
| US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
| US9779735B2 (en) | 2016-02-24 | 2017-10-03 | Google Inc. | Methods and systems for detecting and processing speech signals |
| US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
| CN105845141A * | 2016-03-23 | 2016-08-10 | 广州势必可赢网络科技有限公司 | Channel-robust speaker verification model, and speaker verification method and device |
| US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
| US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
| US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
| US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
| DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
| US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
| US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
| US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
| US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
| US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
| DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
| DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
| DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
| DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
| US10141009B2 (en) | 2016-06-28 | 2018-11-27 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
| US9972320B2 (en) | 2016-08-24 | 2018-05-15 | Google Llc | Hotword detection on multiple devices |
| US9824692B1 (en) | 2016-09-12 | 2017-11-21 | Pindrop Security, Inc. | End-to-end speaker recognition using deep neural network |
| US10553218B2 (en) * | 2016-09-19 | 2020-02-04 | Pindrop Security, Inc. | Dimensionality reduction of baum-welch statistics for speaker recognition |
| US10347256B2 (en) | 2016-09-19 | 2019-07-09 | Pindrop Security, Inc. | Channel-compensated low-level features for speaker recognition |
| US10325601B2 (en) | 2016-09-19 | 2019-06-18 | Pindrop Security, Inc. | Speaker recognition in the call center |
| US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
| KR102241970B1 | 2016-11-07 | 2021-04-20 | 구글 엘엘씨 | Recorded media hotword trigger suppression |
| US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
| US10559309B2 (en) | 2016-12-22 | 2020-02-11 | Google Llc | Collaborative voice controlled devices |
| US10397398B2 (en) | 2017-01-17 | 2019-08-27 | Pindrop Security, Inc. | Authentication using DTMF tones |
| US10720165B2 (en) * | 2017-01-23 | 2020-07-21 | Qualcomm Incorporated | Keyword voice authentication |
| US10522137B2 (en) | 2017-04-20 | 2019-12-31 | Google Llc | Multi-user authentication on a device |
| DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
| DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
| DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
| DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
| DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
| DK179549B1 (en) | 2017-05-16 | 2019-02-12 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
| US10395650B2 (en) | 2017-06-05 | 2019-08-27 | Google Llc | Recorded media hotword trigger suppression |
| KR102364853B1 | 2017-07-18 | 2022-02-18 | 삼성전자주식회사 | Signal processing method for an acoustic sensing element, and acoustic sensing system |
| US10529357B2 (en) | 2017-12-07 | 2020-01-07 | Lena Foundation | Systems and methods for automatic determination of infant cry and discrimination of cry from fussiness |
| EP3553775B1 | 2018-04-12 | 2020-11-25 | Spotify AB | Voice authentication |
| EP3690875B1 | 2018-04-12 | 2024-03-20 | Spotify AB | Training and testing of a phrase-based system |
| US10692496B2 (en) | 2018-05-22 | 2020-06-23 | Google Llc | Hotword suppression |
| WO2020159917A1 | 2019-01-28 | 2020-08-06 | Pindrop Security, Inc. | Keyword spotting and unsupervised word discovery for fraud analysis |
| WO2020163624A1 | 2019-02-06 | 2020-08-13 | Pindrop Security, Inc. | Systems and methods for gateway detection in a telephone network |
| US11646018B2 (en) | 2019-03-25 | 2023-05-09 | Pindrop Security, Inc. | Detection of calls from voice assistants |
| US12015637B2 (en) | 2019-04-08 | 2024-06-18 | Pindrop Security, Inc. | Systems and methods for end-to-end architectures for voice spoofing detection |
| US10841424B1 (en) | 2020-05-14 | 2020-11-17 | Bank Of America Corporation | Call monitoring and feedback reporting using machine learning |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4032711A (en) * | 1975-12-31 | 1977-06-28 | Bell Telephone Laboratories, Incorporated | Speaker recognition arrangement |
| US5548647A (en) * | 1987-04-03 | 1996-08-20 | Texas Instruments Incorporated | Fixed text speaker verification method and apparatus |
| US5054083A (en) * | 1989-05-09 | 1991-10-01 | Texas Instruments Incorporated | Voice verification circuit for validating the identity of an unknown person |
| US5345535A (en) * | 1990-04-04 | 1994-09-06 | Doddington George R | Speech analysis method and apparatus |
| US5339385A (en) * | 1992-07-22 | 1994-08-16 | Itt Corporation | Speaker verifier using nearest-neighbor distance measure |
| FR2696036B1 (fr) * | 1992-09-24 | 1994-10-14 | France Telecom | Procédé de mesure de ressemblance entre échantillons sonores et dispositif de mise en œuvre de ce procédé. |
| US5632002A (en) * | 1992-12-28 | 1997-05-20 | Kabushiki Kaisha Toshiba | Speech recognition interface system suitable for window systems and speech mail systems |
| AUPM983094A0 (en) * | 1994-12-02 | 1995-01-05 | Australian National University, The | Method for forming a cohort for use in identification of an individual |
| US5687287A (en) * | 1995-05-22 | 1997-11-11 | Lucent Technologies Inc. | Speaker verification method and apparatus using mixture decomposition discrimination |
| US5895447A (en) * | 1996-02-02 | 1999-04-20 | International Business Machines Corporation | Speech recognition using thresholded speaker class model selection or model adaptation |
| US6205424B1 (en) * | 1996-07-31 | 2001-03-20 | Compaq Computer Corporation | Two-staged cohort selection for speaker verification system |
| US6088669A (en) * | 1997-01-28 | 2000-07-11 | International Business Machines, Corporation | Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling |
| US6182037B1 (en) * | 1997-05-06 | 2001-01-30 | International Business Machines Corporation | Speaker recognition over large population with fast and detailed matches |
| US5953700A (en) * | 1997-06-11 | 1999-09-14 | International Business Machines Corporation | Portable acoustic interface for remote access to automatic speech/speaker recognition server |
| US6233555B1 (en) * | 1997-11-25 | 2001-05-15 | At&T Corporation | Method and apparatus for speaker identification using mixture discriminant analysis to develop speaker models |
| CA2318262A1 (fr) * | 1998-03-03 | 1999-09-10 | Lernout & Hauspie Speech Products N.V. | Systeme et procede multiresolution destines a une verification du locuteur |
| US6141644A (en) * | 1998-09-04 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and speaker identification based on eigenvoices |
1998
- 1998-09-04: US application US09/148,911, publication US6141644A (en); status: Expired - Lifetime (not active)
1999
- 1999-08-23: DE application DE69914839T, publication DE69914839T2 (de); status: Expired - Fee Related (not active)
- 1999-08-23: ES application ES99306671T, publication ES2214815T3 (es); status: Expired - Lifetime (not active)
- 1999-08-23: EP application EP99306671A, publication EP0984431B1 (fr); status: Expired - Lifetime (not active)
- 1999-09-02: JP application JP11248458A, publication JP2000081894A (ja); status: Pending (active)
- 1999-09-03: CN application CNB991183894A, publication CN1188828C (zh); status: Expired - Fee Related (not active)
- 1999-10-12: TW application TW088115204A, publication TW448416B (zh); status: IP Right Cessation (not active)
2000
- 2000-07-05: US application US09/610,495, publication US6697778B1 (en); status: Expired - Lifetime (not active)
Also Published As
| Publication number | Publication date |
|---|---|
| JP2000081894A (ja) | 2000-03-21 |
| CN1188828C (zh) | 2005-02-09 |
| ES2214815T3 (es) | 2004-09-16 |
| TW448416B (en) | 2001-08-01 |
| DE69914839T2 (de) | 2005-01-05 |
| US6141644A (en) | 2000-10-31 |
| EP0984431A2 (fr) | 2000-03-08 |
| DE69914839D1 (de) | 2004-03-25 |
| EP0984431A3 (fr) | 2000-11-29 |
| US6697778B1 (en) | 2004-02-24 |
| CN1247363A (zh) | 2000-03-15 |
Similar Documents
| Publication or author | Title |
|---|---|
| EP0984431B1 (fr) | Vérification et identification de locuteur basée sur des voix-propres |
| Hansen et al. | Speaker recognition by machines and humans: A tutorial review |
| US9646614B2 (en) | Fast, language-independent method for user authentication by voice |
| Campbell | Speaker recognition: A tutorial |
| US6571210B2 (en) | Confidence measure system using a near-miss pattern |
| US6401063B1 (en) | Method and apparatus for use in speaker verification |
| EP0744734B1 (fr) | Méthode et appareil de vérification du locuteur utilisant une discrimination basée sur la décomposition des mixtures |
| EP2022042B1 (fr) | Compensation de la variabilite intersession pour extraction automatique d'informations a partir de la voix |
| US6697779B1 (en) | Combined dual spectral and temporal alignment method for user authentication by voice |
| US6263309B1 (en) | Maximum likelihood method for finding an adapted speaker model in eigenvoice space |
| US20030225719A1 (en) | Methods and apparatus for fast and robust model training for object classification |
| US6751590B1 (en) | Method and apparatus for performing pattern-specific maximum likelihood transformations for speaker recognition |
| US6341264B1 (en) | Adaptation system and method for E-commerce and V-commerce applications |
| US20070124145A1 (en) | Method and apparatus for estimating discriminating ability of a speech, method and apparatus for enrollment and evaluation of speaker authentication |
| Maghsoodi et al. | Speaker recognition with random digit strings using uncertainty normalized HMM-based i-vectors |
| EP1178467B1 (fr) | Vérification et identification du locuteur |
| EP0953968B1 (fr) | Adaptation au locuteur et à l'environnement basée sur des vecteurs propres de voix incluant la méthode de vraisemblance maximum |
| US6917919B2 (en) | Speech recognition method |
| WO2002029785A1 (fr) | Procede, appareil et systeme permettant la verification du locuteur s'inspirant d'un modele de melanges de gaussiennes (gmm) |
| Olsson | Text dependent speaker verification with a hybrid HMM/ANN system |
| Singh | Bayesian distance metric learning and its application in automatic speaker recognition systems |
| Omar et al. | Maximum conditional mutual information projection for speech recognition. |
| Ramli et al. | Diagonal Factorisation Subspace in I-vector Extraction for Fast Computation and Memory Efficiency: A Case Study on Frog Sound Identification |
| Nguyen et al. | Eigenvoices: a compact representation of speakers in model space |
| Gangisetty | Text-independent speaker recognition |
Legal Events
| Code | Title | Description |
|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): DE ES FR GB IT |
| AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor names: JUNQUA, JEAN-CLAUDE; NGUYEN, PATRICK; BOWMAN, ROBERT; KUHN, ROLAND |
| PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| 17P | Request for examination filed | Effective date: 20000929 |
| AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
| AKX | Designation fees paid | Free format text: DE ES FR GB IT |
| 17Q | First examination report despatched | Effective date: 20021122 |
| GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE ES FR GB IT |
| REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
| REF | Corresponds to: | Ref document number: 69914839; Country of ref document: DE; Date of ref document: 20040325; Kind code of ref document: P |
| REG | Reference to a national code | Ref country code: ES; Ref legal event code: FG2A; Ref document number: 2214815; Country of ref document: ES; Kind code of ref document: T3 |
| ET | Fr: translation filed | |
| PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| 26N | No opposition filed | Effective date: 20041119 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20050809; Year of fee payment: 7 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20060817; Year of fee payment: 8 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20060823; Year of fee payment: 8 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: ES; Payment date: 20060828; Year of fee payment: 8 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: IT; Payment date: 20060831; Year of fee payment: 8 |
| REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST; Effective date: 20070430 |
| GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20070823 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20060831 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080301 |
| REG | Reference to a national code | Ref country code: ES; Ref legal event code: FD2A; Effective date: 20070824 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20070823 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: ES; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20070824 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20070823 |