Krusche, 2021 - Google Patents
Visualization and auralization of features learned by neural networks for musical instrument recognition
- Document ID
- 18169282700840059244
- Author
- Krusche A
- Publication year
- 2021
- Publication venue
- PQDT-Global
Snippet
In computer vision, a number of feature visualization techniques have been developed to make convolutional networks more interpretable. For audio classification, these methods are used as well but have not been as extensively investigated. This thesis picks up on that and investigates …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/215—Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
- G10H2250/235—Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Klapuri | Automatic music transcription as we know it today | |
| US20110225196A1 (en) | Moving image search device and moving image search program | |
| Hu et al. | Separation of singing voice using nonnegative matrix partial co-factorization for singer identification | |
| DE10123366C1 (en) | Device for analyzing an audio signal for rhythm information | |
| Durrieu et al. | Main instrument separation from stereophonic audio signals using a source/filter model | |
| Sainburg et al. | Noisereduce: Domain general noise reduction for time series signals | |
| DE112022006903T5 (en) | METHOD FOR PROCESSING INFORMATION, INFORMATION PROCESSING SYSTEM AND STORAGE MEDIUM | |
| Rao | Audio signal processing | |
| Ullrich et al. | Music transcription with convolutional sequence-to-sequence models | |
| Krusche | Visualization and auralization of features learned by neural networks for musical instrument recognition | |
| Chen et al. | Synthesis and Restoration of Traditional Ethnic Musical Instrument Timbres Based on Time-Frequency Analysis. | |
| Hashemi et al. | Persian music source separation in audio-visual data using deep learning | |
| Papadopoulos | Music-content-adaptive robust principal component analysis for a semantically consistent separation of foreground and background in music audio signals | |
| Lewis et al. | Knowledge discovery-based identification of musical pitches and instruments in polyphonic sounds | |
| Akman et al. | Audio explanation synthesis with generative foundation models | |
| Bastianello | Sound generation using GAN Models | |
| Battenberg | Techniques for machine understanding of live drum performances | |
| Kumar et al. | Machine learning for audio processing: From feature extraction to model selection | |
| dos Santos Moura et al. | Source Extraction based on Binary Masking and Machine Learning | |
| Parsons et al. | Effects of Prosodic Information on Dialect Classification Using Whisper Features | |
| Paulino et al. | Analysis of Frequency Range Effect on the Detection of Voice Disorder Using Convolutional Neural Networks Trained on Spectrogram Images | |
| EP1743324B1 (en) | Device and method for analysing an information signal | |
| Kadi et al. | Real-Time Musical Instruments Recognition for Scenography Purposes | |
| Serdyuk et al. | Information technologies of neural network speech recognition in real-time | |