CN109065070B - A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function - Google Patents
- Publication number
- CN109065070B
- Authority
- CN
- China
- Prior art keywords
- audio
- dimension reduction
- signal
- kernel function
- dimensionality reduction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Technical Field
The invention relates to a kernel-function-based dimensionality reduction method for audio feature signals and belongs to the technical field of audio feature signal processing.
Background
To manage and control wireless audio broadcasting, and to monitor and screen broadcasts safely and efficiently in real time, audio information must be processed quickly, because processing speed governs the pace of the entire workflow. Dimensionality reduction of audio feature signals is the core of audio information processing, so its efficiency and reliability have become pressing problems. Existing dimensionality reduction methods for audio feature signals mainly include locality preserving projection (LPP), multidimensional scaling (MDS), locally linear embedding (LLE), and principal component analysis (PCA). Most of these algorithms have high complexity and reduce dimensionality by discarding part of the feature signal, which can introduce unpredictable errors in practical engineering applications. The present invention is proposed to address these drawbacks.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a kernel-function-based dimensionality reduction method for audio feature signals. The method performs dimensionality reduction analysis on extracted linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC), thereby reducing the data dimension and increasing the information processing rate.
The technical solution of the present invention is a kernel-function-based dimensionality reduction method for audio feature signals, comprising the following steps:
(1) Audio signal acquisition: acquire audio signals to obtain audio samples.
(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal to a WAV file. Filter, pre-emphasize, and frame the digital signal written to the WAV file.
(3) Feature parameter extraction: extract high-dimensional feature parameters from the processed digital signal, namely linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC).
(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to obtain low-dimensional latent variables directly; these latent variables are the dimension-reduced data. The core of the model is a Gaussian process regression (GPR) model that nonlinearly relates the latent variables to the observed variables.
(5) Dimensionality reduction analysis: visualize the dimension-reduced data in 2D or 3D and compare it with the results of other dimensionality reduction methods.
In the above method, the audio acquisition in step (1) collects audio samples with an audio acquisition device. Before acquisition, the device is configured with a sampling frequency (which must satisfy the Nyquist sampling theorem), a number of channels, and a quantization precision.
In the above method, the audio signal preprocessing in step (2) comprises the following steps:
(1) A rectangular window function w(n) (upper cutoff frequency typically f_H = 3400 Hz, lower cutoff frequency f_L = 60 to 100 Hz) is used to filter the acquired audio signal x(n), giving the signal y_a(n), where [expression not reproduced in the source text]
(2) The filtered signal y_a(n) is pre-emphasized with a first-order difference to give the signal y_b(n), where y_b(n) = y_a(n) - α·y_a(n-1) and α is the pre-emphasis coefficient, typically close to 1. This boosts the high-frequency part, suppresses the low-frequency part, and flattens the signal spectrum.
(3) Framing: short-time analysis divides the speech signal into segments, each called a frame, with a duration of 10 to 30 ms. To ensure a smooth transition between frames, adjacent frames partially overlap. The overlapping part is referred to as the frame shift and is taken as 1/2 or 1/3 of the frame length.
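The pre-emphasis and framing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the default α = 0.97, the 20 ms frame length, and the choice to pass the first sample through unchanged are all assumptions. Following the wording above, the overlap between adjacent frames is given as a fraction of the frame length.

```python
def pre_emphasis(y_a, alpha=0.97):
    """Return y_b(n) = y_a(n) - alpha * y_a(n-1).

    The first sample has no predecessor, so it is copied unchanged
    (an assumption; the source does not say how n = 0 is handled).
    """
    if not y_a:
        return []
    return [y_a[0]] + [y_a[n] - alpha * y_a[n - 1] for n in range(1, len(y_a))]


def split_frames(signal, sample_rate, frame_ms=20.0, overlap_ratio=0.5):
    """Cut a signal into overlapping frames of frame_ms milliseconds.

    overlap_ratio is the fraction of each frame shared with the next one
    (1/2 or 1/3 in the text). Trailing samples that do not fill a whole
    frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    overlap = int(round(frame_len * overlap_ratio))
    hop = max(1, frame_len - overlap)  # distance between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames
```

For example, `split_frames(pre_emphasis(samples), 44100)` would yield 20 ms frames (882 samples each) with 50% overlap.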
In the above method, the feature parameter extraction in step (3) comprises the following steps:
(1) Linear prediction coefficients (LPC): call an LPC function package from code, set the frame length, frame shift, window function, and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 1.
(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package from code, set the frame length, frame shift, window function, and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 2.
(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package from code, set the frame length, frame shift, window function, and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 3.
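The text delegates feature extraction to existing function packages without naming them. To show what such a package typically computes, the sketch below derives LPC coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion, then converts them to LPCC with the standard cepstral recursion. The function names and the predictor sign convention, x(n) ≈ Σ a_k·x(n-k), are assumptions; windowing and MFCC extraction are omitted for brevity.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[k-1] is the predictor coefficient a_k in
    x(n) ~ sum_k a_k * x(n-k), and err is the final prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a, n_ceps):
    """Convert predictor coefficients a_1..a_p to cepstral coefficients c_1..c_n."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

Applying `lpc` and `lpc_to_lpcc` to every frame gives one feature vector per frame; stacking those vectors row by row gives the kind of table the steps above refer to.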
In the above method, the construction of the dimensionality reduction model in step (4) comprises the following steps:
(1) Building the feature dimensionality reduction model. The latent space (with elements z) has dimension q, and the observation space Y has dimension d, with q < d. Assume that the observations and the latent-space parameters are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the latent function f follows a Gaussian process with a squared-exponential kernel function: [equation not reproduced in the source text]
where σ is the coefficient (amplitude) parameter of the squared-exponential kernel, l is the parameter that controls how the distance between the two points z and z′ influences the kernel value, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved in the kernel function are θ(σ, l, β). The kernel attains its maximum when z and z′ are very close and its minimum when they are far apart. To simplify the later derivation, the covariance matrix is given first: [equation not reproduced in the source text]
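The kernel and covariance expressions are missing from this text. A commonly used form that is consistent with the symbol definitions above is given below as an assumption; the patent's exact expression (for example, whether σ appears squared, or whether β enters as a variance or a precision) may differ.

```latex
% Assumed squared-exponential kernel with a noise term on the diagonal
k(z, z') = \sigma^{2} \exp\!\left( -\frac{\lVert z - z' \rVert^{2}}{2 l^{2}} \right) + \beta \, \delta(z, z')

% Covariance matrix over n latent points z_1, \dots, z_n
K \in \mathbb{R}^{n \times n}, \qquad K_{ij} = k(z_i, z_j)
```

With this form the kernel value is largest, σ² + β, when z = z′ and decays toward 0 as the distance grows, which matches the behavior described above.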
(2) Assuming that the d dimensions of the observation space are sampled independently, the observation probability of Y is obtained, where y_{:,i} denotes the n elements of the i-th dimension of the observation space Y: [equation not reproduced in the source text]
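The likelihood expression is also missing from this text. Under the independence assumption just stated, the usual Gaussian process latent variable model likelihood takes the form below. It is offered as an assumption, with K the n × n kernel covariance matrix over the latent points and Z the matrix of latent points.

```latex
% Each observed dimension is an independent zero-mean Gaussian with covariance K
P(Y \mid Z, \theta) = \prod_{i=1}^{d} \mathcal{N}\!\left( y_{:,i} \mid 0, K \right)
  = \prod_{i=1}^{d} \frac{1}{(2\pi)^{n/2} \lvert K \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2}\, y_{:,i}^{\top} K^{-1} y_{:,i} \right)

% Equivalent log-likelihood, the quantity an optimizer would maximize
\ln P(Y \mid Z, \theta) = -\frac{dn}{2} \ln 2\pi - \frac{d}{2} \ln \lvert K \rvert
  - \frac{1}{2} \operatorname{tr}\!\left( K^{-1} Y Y^{\top} \right)
```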
To obtain a good dimensionality reduction result, the kernel hyperparameters must be chosen so that the above probability is maximized. Here the particle swarm optimization (PSO) algorithm is used to solve for them. Write θ(σ, l, β) as the particle position A = (a_1, a_2, a_3), the velocity of particle i as v_i = (v_i1, v_i2, v_i3), and the best position found so far as p_g = (p_g1, p_g2, p_g3). The algorithm repeatedly updates each particle's position with the following equations: [equations not reproduced in the source text]
where w is a non-negative inertia factor, the acceleration constants c_1 and c_2 are non-negative, and r_1 and r_2 are random numbers in the range [0, 1]. PSO adjusts each particle's state using its current position, its own best-known position, and information from its neighbors. Applying this information-exchange scheme to kernel parameter optimization means that each particle is influenced both by its own experience and by that of the swarm, which gives good global search ability and convergence speed.
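The update equations themselves are missing from this text. The sketch below uses the standard PSO update, v ← w·v + c_1·r_1·(p_i - a) + c_2·r_2·(p_g - a) followed by a ← a + v, which matches the symbols defined above but is an assumption about the patent's exact formula. The function name, the default constants, the per-particle best p_i, and the clamping to bounds are likewise illustrative. The routine minimizes, so maximizing the likelihood would mean passing its negative as the objective.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) over a box given by bounds = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                 # best position seen by each particle
    p_best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g][:], p_best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and keep it inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
    return g_best, g_best_val
```

In the setting described above, each position would be a candidate (σ, l, β) and the objective would be the negative log of the observation probability of Y.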
The kernel function used in this model is nonlinear. The solved kernel parameters θ(σ, l, β) are substituted back into the model, and the extracted feature parameters are fed into the dimensionality reduction model to obtain the latent parameters, which are the dimension-reduced data.
In the above method, in step (5) the dimension-reduced data is displayed in a two- or three-dimensional visualization and then analyzed and compared with the results of other dimensionality reduction algorithms.
Compared with existing dimensionality reduction methods for audio feature signals, the present invention has the following advantages:
(1) The invention uses a nonlinear kernel function to represent the relationship between the observation-space data and the latent-space parameters, which avoids the poor dimensionality reduction that linear mappings produce for some audio feature data.
(2) The invention uses the particle swarm algorithm to solve for the hyperparameters of the kernel function. The swarm's global search ability and the directed movement of its particles allow the optimal hyperparameters to be found quickly, and the approach makes it easy to substitute other kernel functions later.
(3) The proposed audio feature dimensionality reduction algorithm is theoretically simple and easy to implement in code, is well suited to real engineering projects, and substantially improves audio information processing speed.
Description of Drawings
Fig. 1 is a flow chart of the dimensionality reduction analysis of the present invention;
Fig. 2 is a flow chart of the signal preprocessing of the present invention;
Fig. 3 is a flow chart of the feature parameter extraction and dimensionality reduction processing of the present invention.
Detailed Description
The present invention is further described below with reference to the drawings and an embodiment.
As shown in Figs. 1 to 3, a kernel-function-based dimensionality reduction method for audio feature signals comprises the following steps:
(1) Audio signal acquisition: acquire audio signals to obtain audio samples.
(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal to a WAV file. Filter, pre-emphasize, and frame the digital signal written to the WAV file.
(3) Feature parameter extraction: extract high-dimensional feature parameters from the processed digital signal, namely linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC).
(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to obtain low-dimensional latent variables directly; these latent variables are the dimension-reduced data.
(5) Dimensionality reduction analysis: visualize the dimension-reduced data in 2D or 3D and compare it with the results of other dimensionality reduction methods.
The audio acquisition collects audio samples with an audio acquisition device. For acquisition, the sampling frequency is set to 44.1 kHz (which satisfies the Nyquist sampling theorem); because the acquired signal is speech, a single channel (mono) is used; and the quantization precision is 16 bits.
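The acquisition settings in this embodiment (44.1 kHz, mono, 16 bit) map directly onto the parameters of a WAV file. The sketch below writes samples with those settings using Python's standard `wave` module. The function names and the float-to-integer scaling are assumptions, and the test tone stands in for real captured audio. A 44.1 kHz rate comfortably exceeds twice the 3400 Hz upper cutoff mentioned in the preprocessing step, so the Nyquist condition holds for that band.

```python
import math
import struct
import wave


def sine_tone(freq_hz, seconds, sample_rate=44100, amplitude=0.5):
    """Generate a test tone as floats in [-1, 1] (stand-in for captured audio)."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def write_mono_wav(target, samples, sample_rate=44100):
    """Write float samples in [-1, 1] as 16-bit mono PCM.

    target may be a file path or a writable, seekable binary file object.
    """
    pcm = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes = 16-bit quantization
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

For example, `write_mono_wav("sample.wav", sine_tone(440, 1.0))` would produce a one-second file with the embodiment's settings; the file name here is only a placeholder.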
The signal preprocessing comprises the following steps:
(1) A rectangular window function w(n) (upper cutoff frequency typically f_H = 3400 Hz, lower cutoff frequency f_L = 60 to 100 Hz) is used to filter the acquired audio signal x(n), giving the signal y_a(n), where [expression not reproduced in the source text]
(2) The filtered signal y_a(n) is pre-emphasized with a first-order difference to give the signal y_b(n), where y_b(n) = y_a(n) - α·y_a(n-1) and α is the pre-emphasis coefficient, typically close to 1.
(3) The pre-emphasized signal y_b(n) is divided into segments, each called a frame, with a duration of 10 to 30 ms. Adjacent frames partially overlap. The overlapping part is referred to as the frame shift and is taken as 1/2 or 1/3 of the frame length.
The feature parameter extraction comprises the following steps:
(1) Linear prediction coefficients (LPC): call an LPC function package from code, set the frame length, frame shift, window function, and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 1.
(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package from code, set the frame length, frame shift, window function, and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 2.
(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package from code, set the frame length, frame shift, window function, and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 3.
The construction of the dimensionality reduction model comprises the following steps:
(1) Denote the latent-space parameters and the observation space as follows: [definitions not reproduced in the source text]
That is, the latent space has dimension q and the observation space has dimension d, with q < d. Assume that the observations and the latent-space parameters are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance ξ, and assume that the latent function f follows a Gaussian process with a squared-exponential kernel function: [equation not reproduced in the source text]
where σ is the coefficient (amplitude) parameter of the squared-exponential kernel, l is the parameter that controls how the distance between the two points z and z′ influences the kernel value, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved in the kernel function are θ(σ, l, β). The kernel attains its maximum when z and z′ are very close and its minimum when they are far apart. The covariance matrix is computed as: [equation not reproduced in the source text]
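As a numerical illustration of the kernel described above, the sketch below builds the covariance matrix for a small set of latent points. It assumes the common squared-exponential form k(z, z′) = σ²·exp(-‖z - z′‖² / (2l²)) + β·δ(z, z′), with the Kronecker delta applied to sample indices; the patent's own expression is not reproduced in this text, and the function name is hypothetical.

```python
import math


def se_kernel_matrix(Z, sigma, length, beta):
    """Covariance matrix K with K[i][j] = k(z_i, z_j) for latent points Z.

    Z is a list of equal-length coordinate lists. The beta term is added
    only on the diagonal (Kronecker delta on the sample index, an assumption).
    """
    n = len(Z)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
            K[i][j] = sigma ** 2 * math.exp(-sq_dist / (2.0 * length ** 2))
            if i == j:
                K[i][j] += beta
    return K
```

Nearby latent points give entries close to σ², while distant points give entries close to 0, which is the behavior the text describes for the kernel's maximum and minimum.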
(2) Assuming that the d dimensions of the observation space are sampled independently, the observation probability of Y is obtained, where y_{:,i} denotes the n elements of the i-th dimension of the observation space Y: [equation not reproduced in the source text]
The invention uses the particle swarm optimization (PSO) algorithm to solve for these parameters. Write θ(σ, l, β) as the particle position A = (a_1, a_2, a_3), the velocity of particle i as v_i = (v_i1, v_i2, v_i3), and the best position found so far as p_g = (p_g1, p_g2, p_g3). The iterative position update of the particle swarm algorithm is: [equations not reproduced in the source text]
where w is a non-negative inertia factor, the acceleration constants c_1 and c_2 are non-negative, and r_1 and r_2 are random numbers in the range [0, 1]. Substituting the solved kernel parameters θ(σ, l, β) back into the model gives the kernel-function-based dimensionality reduction model. The extracted feature parameters are fed into this model to obtain the latent parameters, which are the dimension-reduced data.
In the dimensionality reduction analysis, because people live in three-dimensional space and cannot readily picture spaces of higher dimension, and because reduction results covering many data groups are hard to analyze directly, the preprocessed audio signal is fed into the constructed dimensionality reduction model, and the resulting latent-parameter data is saved and visualized so that it can be compared with the results of other dimensionality reduction models. The present invention is not limited to the above embodiment; within the knowledge of a person of ordinary skill in the art, the dimensionality reduction algorithm can also be applied to other related fields.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810995309.7A CN109065070B (en) | 2018-08-29 | 2018-08-29 | A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810995309.7A CN109065070B (en) | 2018-08-29 | 2018-08-29 | A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109065070A CN109065070A (en) | 2018-12-21 |
CN109065070B (en) | 2022-07-19 |
Family
ID=64757611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810995309.7A Active CN109065070B (en) | 2018-08-29 | 2018-08-29 | A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109065070B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112444785B (en) | 2019-08-30 | 2024-04-12 | 华为技术有限公司 | A method, device and radar system for identifying target behavior |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105679321A (en) * | 2016-01-29 | 2016-06-15 | 宇龙计算机通信科技(深圳)有限公司 | Speech recognition method and device and terminal |
CN105913066A (en) * | 2016-04-13 | 2016-08-31 | 刘国栋 | Digital lung sound characteristic dimension reducing method based on relevance vector machine |
CN106898362A (en) * | 2017-02-23 | 2017-06-27 | 重庆邮电大学 | The Speech Feature Extraction of Mel wave filters is improved based on core principle component analysis |
CN109166591A (en) * | 2018-08-29 | 2019-01-08 | 昆明理工大学 | A kind of classification method based on audio frequency characteristics signal |
CN109346104A (en) * | 2018-08-29 | 2019-02-15 | 昆明理工大学 | A Dimensionality Reduction Method for Audio Features Based on Spectral Clustering |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8756061B2 (en) * | 2011-04-01 | 2014-06-17 | Sony Computer Entertainment Inc. | Speech syllable/vowel/phone boundary detection using auditory attention cues |
Non-Patent Citations (6)
Title |
---|
"Hierarchical Gaussian Process Latent Variable Models";Neil D.Lawrence;《Machine Learning, Proceedings of the Twenty-Fourth International Conference》;20171230;第20-24页 * |
"Semi-supervised Gaussian process latent variable model with pairwise";Xiumei Wang 等;《Neurocomputing》;20101230;全文 * |
"语音情感特征提取及其降维方法综述";刘振焘 等;《计算机学报》;20181230;全文 * |
"基于语音特征的汉语数字语音降维与识别研究";高文曦;《中国优秀硕士学位论文全文数据库(信息科技辑)》;20120715;第31-33页 * |
"基于高斯过程隐变量模型的数据降维与分类";张家源;《中国优秀硕士学位论文全文数据库(信息科技辑)》;20181015;全文 * |
"降维技术与方法综述";张煜东;《四川兵工学报》;20101030;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN109065070A (en) | 2018-12-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |