Broux et al., 2016 - Google Patents

An active learning method for speaker identity annotation in audio recordings

Document ID: 8542369261892039884
Author: Broux P; Doukhan D; Petitrenaud S; Meignier S; Carrive J
Publication year: 2016
Publication venue: 1st International Workshop on Multimodal Media Data Analytics (MMDA 2016)

Snippet

Given that manual annotation of speech is an expensive and long process, we attempt in this paper to assist an annotator to perform a speaker diarization. This assistance takes place in an annotation background for a large amount of archives. We propose a method …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3074—Audio data retrieval
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication	Publication Date	Title
Makhoul et al.	2000	Speech and language technologies for audio indexing and retrieval
US6424946B1 (en)	2002-07-23	Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
US10109280B2 (en)	2018-10-23	Blind diarization of recorded calls with arbitrary number of speakers
US20220328037A1 (en)	2022-10-13	System and method for neural network orchestration
Delacourt et al.	2000	DISTBIC: A speaker-based segmentation for audio data indexing
Moattar et al.	2012	A review on speaker diarization systems and approaches
Vijayasenan et al.	2009	An information theoretic approach to speaker diarization of meeting data
Rouvier et al.	2013	An open-source state-of-the-art toolbox for broadcast news diarization
US6748356B1 (en)	2004-06-08	Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
US20180218738A1 (en)	2018-08-02	Word-level blind diarization of recorded calls with arbitrary number of speakers
Rao et al.	2013	Pitch synchronous and glottal closure based speech analysis for language recognition
Poignant et al.	2014	Unsupervised speaker identification in TV broadcast based on written names
Takamichi et al.	2021	JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Bhanja et al.	2019	Deep residual networks for pre-classification based Indian language identification
CN101226558B (en)	2011-08-31	Method for searching audio data based on MFCCM
Xiong et al.	2006	A tree-based kernel selection approach to efficient Gaussian mixture model–universal background model based speaker identification
Bhowmick et al.	2021	Identification/segmentation of indian regional languages with singular value decomposition based feature embedding
Tahon et al.	2024	Allies: a speech corpus for segmentation, speaker diarization speech recognition and speaker change detection
Levinson et al.	1988	Large vocabulary speech recognition using a hidden Markov model for acoustic/phonetic classification
Wang	2000	Mandarin spoken document retrieval based on syllable lattice matching
Esteve et al.	2007	Extracting true speaker identities from transcriptions
Li et al.	2004	Adaptive speaker identification with audiovisual cues for movie content analysis
Tetariy et al.	2013	An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases
Vijayasenan	2010	An information theoretic approach to speaker diarization of meeting recordings