Broux et al., 2016 - Google Patents

An active learning method for speaker identity annotation in audio recordings

Document ID
8542369261892039884
Author
Broux P
Doukhan D
Petitrenaud S
Meignier S
Carrive J
Publication year
2016
Publication venue
1st International Workshop on Multimodal Media Data Analytics (MMDA 2016)

Snippet

Given that manual annotation of speech is an expensive and long process, we attempt in this paper to assist an annotator to perform a speaker diarization. This assistance takes place in an annotation background for a large amount of archives. We propose a method …

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3074 Audio data retrieval
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication Publication Date Title
Makhoul et al. Speech and language technologies for audio indexing and retrieval
US6424946B1 (en) Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
US10109280B2 (en) Blind diarization of recorded calls with arbitrary number of speakers
US20220328037A1 (en) System and method for neural network orchestration
Delacourt et al. DISTBIC: A speaker-based segmentation for audio data indexing
Moattar et al. A review on speaker diarization systems and approaches
Vijayasenan et al. An information theoretic approach to speaker diarization of meeting data
Rouvier et al. An open-source state-of-the-art toolbox for broadcast news diarization
US6748356B1 (en) Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
US20180218738A1 (en) Word-level blind diarization of recorded calls with arbitrary number of speakers
Rao et al. Pitch synchronous and glottal closure based speech analysis for language recognition
Poignant et al. Unsupervised speaker identification in TV broadcast based on written names
Takamichi et al. JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Bhanja et al. Deep residual networks for pre-classification based Indian language identification
CN101226558B (en) Method for searching audio data based on MFCCM
Broux et al. An active learning method for speaker identity annotation in audio recordings
Xiong et al. A tree-based kernel selection approach to efficient Gaussian mixture model–universal background model based speaker identification
Bhowmick et al. Identification/segmentation of indian regional languages with singular value decomposition based feature embedding
Tahon et al. Allies: a speech corpus for segmentation, speaker diarization speech recognition and speaker change detection
Levinson et al. Large vocabulary speech recognition using a hidden Markov model for acoustic/phonetic classification
Wang Mandarin spoken document retrieval based on syllable lattice matching
Esteve et al. Extracting true speaker identities from transcriptions
Li et al. Adaptive speaker identification with audiovisual cues for movie content analysis
Tetariy et al. An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases
Vijayasenan An information theoretic approach to speaker diarization of meeting recordings