Fernandes et al., 2021 - Google Patents

Enhanced Deep Hierarchical Long Short-Term Memory and Bidirectional Long Short-Term Memory for Tamil Emotional Speech Recognition using Data Augmentation …

Fernandes et al., 2021

View PDF

Document ID: 15777672378294777342
Author: Fernandes B; Mannepalli K
Publication year: 2021
Publication venue: Pertanika Journal of Science & Technology

External Links

Cited by

Snippet

Neural networks have become increasingly popular for language modelling and within these large and deep models, overfitting, and gradient remains an important problem that heavily influences the model performance. As long short-term memory (LSTM) and …

Continue reading at 119.40.116.186 (PDF) (other versions)

230000002996 emotional 0 title abstract description 43

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions

Similar Documents

Publication	Publication Date	Title
Sultana et al.	2021	Bangla speech emotion recognition and cross-lingual study using deep CNN and BLSTM networks
Huang et al.	2015	Joint optimization of masks and deep recurrent neural networks for monaural source separation
Das et al.	2020	A hybrid meta-heuristic feature selection method for identification of Indian spoken languages from audio signals
US8886533B2 (en)	2014-11-11	System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification
Agarwal et al.	2021	RETRACTED ARTICLE: Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition
Guha et al.	2020	Hybrid feature selection method based on harmony search and naked mole-rat algorithms for spoken language identification from audio signals
Wu et al.	2021	Environmental sound classification via time–frequency attention and framewise self-attention-based deep neural networks
Nasef et al.	2021	Voice gender recognition under unconstrained environments using self-attention
Swain et al.	2022	A DCRNN-based ensemble classifier for speech emotion recognition in Odia language
Sekkate et al.	2023	A statistical feature extraction for deep speech emotion recognition in a bilingual scenario
Cardona et al.	2017	Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs
Arya et al.	2021	Speech based emotion recognition using machine learning
Telmem et al.	2021	The convolutional neural networks for Amazigh speech recognition system
Song et al.	2025	MPSA-DenseNet: A novel deep learning model for English accent classification
Sharma et al.	2023	HindiSpeech-Net: a deep learning based robust automatic speech recognition system for Hindi language
Mishra et al.	2024	Speech emotion classification using feature-level and classifier-level fusion
Zhang et al.	2022	Chinese dialect tone’s recognition using gated spiking neural P systems
Udeh et al.	2025	Improved ShuffleNet V2 network with attention for speech emotion recognition
Deb et al.	2016	Classification of speech under stress using harmonic peak to energy ratio
Mandal et al.	2025	Is attention always needed? a case study on language identification from speech
Dar et al.	2025	Bi-directional LSTM-based isolated spoken word recognition for Kashmiri language utilizing Mel-spectrogram feature
Tibi et al.	2025	Arabic dialect classification using an adaptive deep learning model
Tailor et al.	2022	Deep learning approach for spoken digit recognition in Gujarati language
Biswas et al.	2023	Speech Emotion Recognition Using Deep CNNs Trained on Log-Frequency Spectrograms
Fernandes et al.	2021	Enhanced Deep Hierarchical Long Short-Term Memory and Bidirectional Long Short-Term Memory for Tamil Emotional Speech Recognition using Data Augmentation and Spatial Features.