Kanervisto et al., 2021

Optimizing tandem speaker verification and anti-spoofing systems

Document ID: 2278966985535760509
Author: Kanervisto A; Hautamäki V; Kinnunen T; Yamagishi J
Publication year: 2021
Publication venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Snippet

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then …
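The snippet is truncated, so the following is only a minimal sketch of the general tandem idea it introduces: a CM gate followed by an ASV check. The two-threshold cascade, the names `TandemThresholds` and `tandem_accept`, and the convention that higher scores mean "more likely bona fide" or "more likely the target speaker" are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TandemThresholds:
    """Hypothetical decision thresholds (assumed, not from the paper)."""
    cm: float   # minimum CM score to treat the input as human speech
    asv: float  # minimum ASV score to treat the input as the claimed speaker


def tandem_accept(cm_score: float, asv_score: float,
                  thresholds: TandemThresholds) -> bool:
    """Accept a trial only if it passes the CM gate and then the ASV check.

    Assumes higher scores are better for both systems.
    """
    # Stage 1: the countermeasure rejects inputs it judges to be spoofed.
    if cm_score < thresholds.cm:
        return False
    # Stage 2 (assumed): the ASV system verifies the claimed identity.
    return asv_score >= thresholds.asv


if __name__ == "__main__":
    t = TandemThresholds(cm=0.5, asv=0.7)
    print(tandem_accept(cm_score=0.9, asv_score=0.8, thresholds=t))
    print(tandem_accept(cm_score=0.2, asv_score=0.95, thresholds=t))
```

In this sketch a trial with a high ASV score is still rejected when the CM score falls below its threshold, which is the security benefit the snippet attributes to pairing the two systems.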

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6261—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation partitioning the feature space
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6228—Selecting the most significant subset of features
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication	Publication Date	Title
Kinnunen et al.	2020	Tandem assessment of spoofing countermeasures and automatic speaker verification: Fundamentals
Kanervisto et al.	2021	Optimizing tandem speaker verification and anti-spoofing systems
Kreuk et al.	2018	Fooling end-to-end speaker verification with adversarial examples
Gomez-Alanis et al.	2019	A gated recurrent convolutional neural network for robust spoofing detection
Jahangir et al.	2020	Text-independent speaker identification through feature fusion and deep neural network
Ge et al.	2022	Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations
US9373330B2 (en)	2016-06-21	Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis
Gomez-Alanis et al.	2020	A kernel density estimation based loss function and its application to asv-spoofing detection
Mandasari et al.	2013	Quality measure functions for calibration of speaker recognition systems in various duration conditions
Mak et al.	2020	Machine learning for speaker recognition
US20100017209A1 (en)	2010-01-21	Random voiceprint certification system, random voiceprint cipher lock and creating method therefor
Aida-zade et al.	2016	Speech recognition using support vector machines
Wan	2003	Speaker verification using support vector machines
CN1302427A (en)	2001-07-04	Model adaptation system and method for speaker verification
Wang et al.	2022	A practical guide to logical access voice presentation attack detection
Hanilçi	2018	Data selection for i-vector based automatic speaker verification anti-spoofing
Rohdin et al.	2020	End-to-end DNN based text-independent speaker recognition for long and short utterances
Uzun et al.	2012	A second look at the performance of neural networks for keystroke dynamics using a publicly available dataset
Scardapane et al.	2017	On the use of deep recurrent neural networks for detecting audio spoofing attacks
Altınçay et al.	2000	An information theoretic framework for weight estimation in the combination of probabilistic classifiers for speaker identification
Ramos-Castro et al.	2007	Speaker verification using speaker-and test-dependent fast score normalization
Adiban et al.	2017	Sut system description for anti-spoofing 2017 challenge
Jung et al.	2024	To what extent can ASV systems naturally defend against spoofing attacks?
Perera et al.	2024	Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
US20250201139A1 (en)	2025-06-19	Systems and methods for artificial intelligence-mediated multiparty electronic communication