Broux et al., 2016 - Google Patents
An active learning method for speaker identity annotation in audio recordings
Broux et al., 2016
- Document ID
- 8542369261892039884
- Author
- Broux P
- Doukhan D
- Petitrenaud S
- Meignier S
- Carrive J
- Publication year
- 2016
- Publication venue
- 1st International Workshop on Multimodal Media Data Analytics (MMDA 2016)
Snippet
Given that manual annotation of speech is an expensive and time-consuming process, we attempt in this paper to assist an annotator in performing speaker diarization. This assistance takes place in the context of annotating a large volume of archives. We propose a method …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3074—Audio data retrieval
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Title
---|---
Makhoul et al. | Speech and language technologies for audio indexing and retrieval
US6424946B1 (en) | Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
US10109280B2 (en) | Blind diarization of recorded calls with arbitrary number of speakers
US20220328037A1 (en) | System and method for neural network orchestration
Delacourt et al. | DISTBIC: A speaker-based segmentation for audio data indexing
Moattar et al. | A review on speaker diarization systems and approaches
Vijayasenan et al. | An information theoretic approach to speaker diarization of meeting data
Rouvier et al. | An open-source state-of-the-art toolbox for broadcast news diarization
US6748356B1 (en) | Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
US20180218738A1 (en) | Word-level blind diarization of recorded calls with arbitrary number of speakers
Rao et al. | Pitch synchronous and glottal closure based speech analysis for language recognition
Poignant et al. | Unsupervised speaker identification in TV broadcast based on written names
Takamichi et al. | JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Bhanja et al. | Deep residual networks for pre-classification based Indian language identification
CN101226558B (en) | Method for searching audio data based on MFCCM
Broux et al. | An active learning method for speaker identity annotation in audio recordings
Xiong et al. | A tree-based kernel selection approach to efficient Gaussian mixture model–universal background model based speaker identification
Bhowmick et al. | Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding
Tahon et al. | Allies: a speech corpus for segmentation, speaker diarization, speech recognition and speaker change detection
Levinson et al. | Large vocabulary speech recognition using a hidden Markov model for acoustic/phonetic classification
Wang | Mandarin spoken document retrieval based on syllable lattice matching
Esteve et al. | Extracting true speaker identities from transcriptions
Li et al. | Adaptive speaker identification with audiovisual cues for movie content analysis
Tetariy et al. | An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases
Vijayasenan | An information theoretic approach to speaker diarization of meeting recordings