Kaya et al., 2017 - Google Patents
Emotion, age, and gender classification in children's speech by humans and machines
- Document ID
- 8811920000706917527
- Authors
- Kaya H
- Salah A
- Karpov A
- Frolova O
- Grigorev A
- Lyakso E
- Publication year
- 2017
- Publication venue
- Computer Speech & Language
Snippet
In this article, we present the first child emotional speech corpus in Russian, called “EmoChildRu”, collected from 3 to 7 years old children. The base corpus includes over 20 K recordings (approx. 30 h), collected from 120 children. Audio recordings are carried out in …
Classifications
All codes below fall under G—Physics; most fall under G10L—Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding.
- G10L15/1822—Speech recognition; speech classification or search using natural language modelling; parsing for meaning understanding
- G10L17/26—Speaker identification or verification; recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
- G10L25/66—Speech or voice analysis techniques specially adapted for extracting parameters related to health condition
- G10L13/033—Speech synthesis; voice editing, e.g. manipulating the voice of the synthesiser
- G10L15/07—Speech recognition; training and adaptation to the speaker
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L21/013—Changing voice quality, e.g. pitch or formants; adapting to target pitch
- G10L13/06—Elementary speech units used in speech synthesisers; concatenation rules
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation, or stress or intonation determination
- G10L15/02—Feature extraction for speech recognition; selection of recognition unit
- G06F17/20—Digital data processing; handling natural language data
Similar Documents
| Publication | Title |
|---|---|
| Kaya et al. | Emotion, age, and gender classification in children's speech by humans and machines |
| Fagherazzi et al. | Voice for health: the use of vocal biomarkers from research to clinical practice |
| Swain et al. | Databases, features and classifiers for speech emotion recognition: a review |
| Eni et al. | Estimating autism severity in young children from speech signals using a deep neural network |
| Rao et al. | Emotion recognition using speech features |
| Rohanian et al. | Alzheimer's dementia recognition using acoustic, lexical, disfluency and speech pause features robust to noisy inputs |
| Koolagudi et al. | Emotion recognition from speech: a review |
| Le et al. | Automatic quantitative analysis of spontaneous aphasic speech |
| Anagnostopoulos et al. | Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011 |
| Alonso et al. | New approach in quantification of emotional intensity from the speech signal: emotional temperature |
| Lyakso et al. | EmoChildRu: emotional child Russian speech corpus |
| Brahmi et al. | Exploring the role of machine learning in diagnosing and treating speech disorders: a systematic literature review |
| Pérez-Toro et al. | Depression assessment in people with Parkinson's disease: the combination of acoustic features and natural language processing |
| Leal et al. | Speech-based depression assessment: a comprehensive survey |
| Radha et al. | Automatic speaker and age identification of children from raw speech using SincNet over ERB scale |
| Campbell et al. | Alzheimer's dementia detection from audio and language modalities in spontaneous speech |
| Rangra et al. | Emotional speech-based personality prediction using NPSO architecture in deep learning |
| Xin et al. | JNV corpus: a corpus of Japanese nonverbal vocalizations with diverse phrases and emotions |
| Ouyang et al. | Individual differences in the prosodic encoding of informativity |
| Shanmugam et al. | Understanding the use of acoustic measurement and Mel Frequency Cepstral Coefficient (MFCC) features for the classification of depression speech |
| Pravin et al. | Acousto-prosodic delineation and classification of speech disfluencies in bilingual children |
| Themistocleous | Computational language assessment in patients with speech, language, and communication impairments |
| Stasak | An investigation of acoustic, linguistic, and affect based methods for speech depression assessment |
| Feng et al. | Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics |
| Pandey | Segmental evaluation of text-to-speech synthesizers |