Kanervisto et al., 2021 - Google Patents
Optimizing tandem speaker verification and anti-spoofing systemsKanervisto et al., 2021
View PDF- Document ID
- 2278966985535760509
- Author
- Kanervisto A
- Hautamäki V
- Kinnunen T
- Yamagishi J
- Publication year
- Publication venue
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
External Links
Snippet
As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then …
- 238000001514 detection method 0 abstract description 2
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6261—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation partitioning the feature space
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6228—Selecting the most significant subset of features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kinnunen et al. | Tandem assessment of spoofing countermeasures and automatic speaker verification: Fundamentals | |
Kanervisto et al. | Optimizing tandem speaker verification and anti-spoofing systems | |
Kreuk et al. | Fooling end-to-end speaker verification with adversarial examples | |
Gomez-Alanis et al. | A gated recurrent convolutional neural network for robust spoofing detection | |
Jahangir et al. | Text-independent speaker identification through feature fusion and deep neural network | |
Ge et al. | Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations | |
US9373330B2 (en) | Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis | |
Gomez-Alanis et al. | A kernel density estimation based loss function and its application to asv-spoofing detection | |
Mandasari et al. | Quality measure functions for calibration of speaker recognition systems in various duration conditions | |
Mak et al. | Machine learning for speaker recognition | |
US20100017209A1 (en) | Random voiceprint certification system, random voiceprint cipher lock and creating method therefor | |
Aida-zade et al. | Speech recognition using support vector machines | |
Wan | Speaker verification using support vector machines | |
CN1302427A (en) | Model adaptation system and method for speaker verification | |
Wang et al. | A practical guide to logical access voice presentation attack detection | |
Hanilçi | Data selection for i-vector based automatic speaker verification anti-spoofing | |
Rohdin et al. | End-to-end DNN based text-independent speaker recognition for long and short utterances | |
Uzun et al. | A second look at the performance of neural networks for keystroke dynamics using a publicly available dataset | |
Scardapane et al. | On the use of deep recurrent neural networks for detecting audio spoofing attacks | |
Altınçay et al. | An information theoretic framework for weight estimation in the combination of probabilistic classifiers for speaker identification | |
Ramos-Castro et al. | Speaker verification using speaker-and test-dependent fast score normalization | |
Adiban et al. | Sut system description for anti-spoofing 2017 challenge | |
Jung et al. | To what extent can ASV systems naturally defend against spoofing attacks? | |
Perera et al. | Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | |
US20250201139A1 (en) | Systems and methods for artificial intelligence-mediated multiparty electronic communication |