Krusche, 2021 - Google Patents
Visualization and auralization of features learned by neural networks for musical instrument recognition
- Document ID
- 18169282700840059244
- Author
- Krusche A
- Publication year
- 2021
- Publication venue
- PQDT-Global
Snippet
In computer vision, a number of feature visualization techniques have been developed to make convolutional networks more interpretable. For audio classification, these methods are used as well but have not been as extensively investigated. This thesis picks up on that and investigates …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/215—Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
- G10H2250/235—Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Klapuri | Automatic music transcription as we know it today | |
| US20110225196A1 (en) | Moving image search device and moving image search program | |
| Hu et al. | Separation of singing voice using nonnegative matrix partial co-factorization for singer identification | |
| DE10123366C1 (en) | Device for analyzing an audio signal for rhythm information | |
| Durrieu et al. | Main instrument separation from stereophonic audio signals using a source/filter model | |
| Sainburg et al. | Noisereduce: Domain general noise reduction for time series signals | |
| DE112022006903T5 (en) | METHOD FOR PROCESSING INFORMATION, INFORMATION PROCESSING SYSTEM AND STORAGE MEDIUM | |
| Rao | Audio signal processing | |
| Ullrich et al. | Music transcription with convolutional sequence-to-sequence models | |
| Krusche | Visualization and auralization of features learned by neural networks for musical instrument recognition | |
| Chen et al. | Synthesis and Restoration of Traditional Ethnic Musical Instrument Timbres Based on Time-Frequency Analysis. | |
| Hashemi et al. | Persian music source separation in audio-visual data using deep learning | |
| Papadopoulos | Music-content-adaptive robust principal component analysis for a semantically consistent separation of foreground and background in music audio signals | |
| Lewis et al. | Knowledge discovery-based identification of musical pitches and instruments in polyphonic sounds | |
| Akman et al. | Audio explanation synthesis with generative foundation models | |
| Bastianello | Sound generation using GAN Models | |
| Battenberg | Techniques for machine understanding of live drum performances | |
| Kumar et al. | Machine learning for audio processing: From feature extraction to model selection | |
| dos Santos Moura et al. | Source Extraction based on Binary Masking and Machine Learning | |
| Parsons et al. | Effects of Prosodic Information on Dialect Classification Using Whisper Features | |
| Paulino et al. | Analysis of Frequency Range Effect on the Detection of Voice Disorder Using Convolutional Neural Networks Trained on Spectrogram Images | |
| EP1743324B1 (en) | Device and method for analysing an information signal | |
| Kadi et al. | Real-Time Musical Instruments Recognition for Scenography Purposes | |
| Serdyuk et al. | Information technologies of neural network speech recognition in real-time | |