CN109065070B - A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function

Info

Publication number: CN109065070B
Application number: CN201810995309.7A
Authority: CN
Inventors: 龙华; 杨明亮; 邵玉斌; 杜庆治
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2022-07-19
Anticipated expiration: 2038-08-29
Also published as: CN109065070A

Abstract

The invention relates to a kernel function-based dimensionality reduction method for audio feature signals and belongs to the technical field of audio signal processing. The method reduces the dimensionality of audio feature parameters without discarding audio feature information, visualizes the reduced data, and compares the result with those of other dimensionality reduction methods for audio feature parameters. Dimensionality reduction is applied mainly to the linear prediction coefficients, linear prediction cepstral coefficients and Mel-frequency cepstral coefficients of the audio signal, and the reduced data are displayed visually. The method can be used for monitoring broadcast signals and for fast identification and processing of audio signals. The algorithm is simple: it uses a nonlinear kernel function to represent the mapping between the Gaussian observation space and the latent space, which avoids the limited applicability and poor dimensionality reduction performance of linear mapping methods.

Description

A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function

Technical Field

The invention relates to a kernel function-based dimensionality reduction method for audio feature signals and belongs to the technical field of audio feature signal processing.

Background

To manage and control wireless audio broadcasting, and to monitor and screen broadcasts safely and efficiently in real time, audio information must be processed quickly, because its processing speed determines the speed of the whole workflow. Dimensionality reduction of audio feature signals is the core of audio information processing, so its efficiency and reliability have become urgent problems. Existing dimensionality reduction methods for audio feature signals mainly include locality preserving projection, multidimensional scaling, locally linear embedding and principal component analysis. Most of these algorithms have high complexity and achieve dimensionality reduction by discarding part of the feature signal, which can cause unpredictable errors in practical engineering applications. The present invention is proposed to address these drawbacks.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a kernel function-based dimensionality reduction method for audio feature signals that performs dimensionality reduction analysis on the extracted linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), so as to reduce the data dimension and increase the information processing rate.

The technical solution of the present invention is a kernel function-based dimensionality reduction method for audio feature signals, comprising the following steps:

(1) Audio signal acquisition: acquire audio signals to obtain audio samples.

(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal into a WAV file. Filter, pre-emphasize and frame the digital signal written into the WAV file.

(3) Feature parameter extraction: extract high-dimensional feature parameters, namely the linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), from the processed digital signal.

(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to obtain low-dimensional latent variables directly; these latent variables are the dimension-reduced data. The core of the model is a Gaussian process regression (GPR) model that nonlinearly models the relationship between the latent variables and the observed variables.

(5) Dimensionality reduction analysis: visualize the dimension-reduced data in 2D or 3D and compare it with the results obtained by other dimensionality reduction methods.

In the method above, the audio acquisition in step (1) is performed with an audio acquisition device. Before acquisition, the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of channels and the quantization precision are set on the device.

In the method above, the audio signal preprocessing in step (2) comprises the following steps:

(1) Filter the acquired audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60 to 100 Hz) to obtain the signal y_a(n).

(2) Apply pre-emphasis to the filtered signal y_a(n) by a first-order difference to obtain the signal y_b(n), where y_b(n) = y_a(n) − α·y_a(n−1) and the pre-emphasis coefficient α is generally close to 1. Pre-emphasis boosts the high-frequency part and suppresses the low-frequency part, which flattens the spectrum of the signal.

(3) Framing: short-time analysis of a speech signal divides the signal into a number of segments, each called a frame, with a duration of 10 to 30 ms. To ensure a smooth transition between frames, adjacent frames partially overlap; the overlapping part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.
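The three preprocessing steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the band-pass filter is assumed to be an FIR filter designed by the window method with a rectangular window (a plain truncation of the ideal impulse response), and the function names, the tap count, α = 0.97, the 8 kHz test tone and the 25 ms frame length are all illustrative choices.

```python
import numpy as np

def bandpass_fir(fs, f_low=80.0, f_high=3400.0, num_taps=201):
    """Band-pass FIR taps: ideal response truncated by a rectangular window (assumed design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    # np.sinc(x) = sin(pi*x)/(pi*x); band-pass = difference of two ideal low-pass filters
    return (2 * f_high / fs) * np.sinc(2 * f_high / fs * n) \
        - (2 * f_low / fs) * np.sinc(2 * f_low / fs * n)

def pre_emphasis(y_a, alpha=0.97):
    """y_b(n) = y_a(n) - alpha * y_a(n-1); the first sample is passed through unchanged."""
    y_a = np.asarray(y_a, dtype=float)
    return np.append(y_a[0], y_a[1:] - alpha * y_a[:-1])

def frame_signal(y_b, frame_len, frame_shift):
    """Split into overlapping frames; trailing samples that do not fill a frame are dropped."""
    num_frames = 1 + (len(y_b) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    return y_b[idx]

fs = 8000                                   # illustrative sampling rate (Hz)
t = np.arange(fs) / fs                      # one second of samples
x = np.sin(2 * np.pi * 440 * t)             # illustrative test tone standing in for x(n)
y_a = np.convolve(x, bandpass_fir(fs), mode="same")
y_b = pre_emphasis(y_a)
frame_len = int(0.025 * fs)                 # 25 ms frames, within the 10 to 30 ms range
frame_shift = frame_len // 2                # half-frame shift
frames = frame_signal(y_b, frame_len, frame_shift)
```

Each row of `frames` is one frame; with a half-frame shift, consecutive rows share half of their samples.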

In the method above, the feature parameter extraction in step (3) comprises the following steps:

(1) Linear prediction coefficients (LPC): call an LPC function package programmatically, set the frame length, frame shift, window function and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 1.

(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package programmatically, set the frame length, frame shift, window function and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 2.

(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package programmatically, set the frame length, frame shift, window function and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 3.
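The text does not name the function packages it calls. As a hedged illustration of what such packages compute, the sketch below estimates LPC by the autocorrelation method and converts them to LPCC with the standard cepstral recursion. The function names `lpc` and `lpcc` and the synthetic AR(2) test signal are assumptions made for this example; MFCC extraction is not covered here.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC.

    Returns a[0..order-1] such that x(n) is approximated by
    sum_k a[k-1] * x(n-k), k = 1..order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lpcc(a, n_ceps):
    """Cepstral coefficients of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c

# Synthetic AR(2) signal with known coefficients (illustrative data only)
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for i in range(2, len(e)):
    x[i] = 1.3 * x[i - 1] - 0.4 * x[i - 2] + e[i]

a_hat = lpc(x, order=2)          # should be close to (1.3, -0.4)
c_hat = lpcc(a_hat, n_ceps=12)   # 12 cepstral coefficients from the LPC
```

In practice the functions would be applied frame by frame, giving one feature vector per frame.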

In the method above, the construction of the dimensionality reduction model in step (4) comprises the following steps:

(1) Building the feature dimensionality reduction model. First, let the latent space have dimension q and the observation space Y have dimension d (q < d). Assume that an observed value y and a latent-space point z are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the latent function f follows a Gaussian process with a squared exponential kernel function:
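The kernel equation itself does not appear in this text. A common form of the squared exponential kernel that is consistent with the parameters σ, l and β and with a Kronecker delta term δ(z, z′) is given below. Whether the amplitude enters as σ or σ², and the noise term as β or its inverse, is an assumption of this reconstruction.

```latex
k(z, z') = \sigma^{2}\exp\!\left(-\frac{\lVert z - z'\rVert^{2}}{2l^{2}}\right) + \beta\,\delta(z, z')
```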

where σ is the coefficient (amplitude) parameter of the squared exponential kernel, l is the parameter that controls the influence of the distance between the two points z and z′, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved for in the kernel function are θ = (σ, l, β). The kernel function reaches its maximum when z and z′ are very close and its minimum when they are far apart. To simplify the subsequent derivation, the formula for the covariance matrix is given first:
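A covariance matrix of this kind is usually built by evaluating the kernel on every pair of latent points, K_ij = k(z_i, z_j). The sketch below assumes the kernel form σ²·exp(−‖z − z′‖²/(2l²)) + β·δ(z, z′), with the delta term adding β only on the diagonal. That form and the function name `se_kernel_matrix` are assumptions, since the original formula is not reproduced in this text.

```python
import numpy as np

def se_kernel_matrix(Z, sigma, length, beta):
    """Covariance matrix K (n x n) of n latent points Z (n x q) under an assumed
    squared exponential kernel with a Kronecker-delta noise term on the diagonal."""
    Z = np.asarray(Z, dtype=float)
    sq_norms = np.sum(Z ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negative values
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T, 0.0)
    return sigma ** 2 * np.exp(-d2 / (2.0 * length ** 2)) + beta * np.eye(len(Z))

# Two illustrative latent points at distance 5 from each other
K = se_kernel_matrix([[0.0, 0.0], [3.0, 4.0]], sigma=2.0, length=5.0, beta=0.1)
```

The diagonal entries equal σ² + β, and the off-diagonal entries decay toward zero as the distance between the points grows, which matches the behavior described in the text.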

(2) Assuming that the d dimensions of the observation space are sampled independently, the observation probability of Y can be obtained, where y_:,i denotes the n elements of the i-th dimension of the observation space Y.
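The probability expression is not reproduced in this text. Under the stated independence assumption, a standard form is the product over the d columns of zero-mean Gaussians sharing the covariance matrix K, whose logarithm is −(d·n/2)·ln 2π − (d/2)·ln|K| − ½·tr(K⁻¹·Y·Yᵀ). The sketch below evaluates that log-likelihood. It is an assumed reconstruction, and the function name and demo matrices are illustrative.

```python
import numpy as np

def gp_log_likelihood(Y, K):
    """Log-probability of Y (n x d) when each of the d columns is an independent
    zero-mean Gaussian with the same n x n covariance matrix K."""
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    L = np.linalg.cholesky(K)                            # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # ln |K|
    return (-0.5 * d * n * np.log(2.0 * np.pi)
            - 0.5 * d * log_det
            - 0.5 * np.sum(Y * alpha))                   # trace(K^{-1} Y Y^T) / 2

# Illustrative values only
K_demo = np.array([[2.0, 0.5], [0.5, 1.0]])
Y_demo = np.array([[1.0, -0.3], [0.2, 0.7]])
ll = gp_log_likelihood(Y_demo, K_demo)
```

Because the columns are independent, the value for Y equals the sum of the values obtained for each column separately.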

To obtain a good dimensionality reduction result, the kernel hyperparameters that maximize the above probability must be found. Here a particle swarm optimization (PSO) algorithm is used to solve for them. Write θ = (σ, l, β) as the position A = (a₁, a₂, a₃); the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3), and the best position passed through by the particles is denoted p_g = (p_g1, p_g2, p_g3). The PSO algorithm continuously updates the particle positions with the following equations:
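The update equations are not reproduced in this text. The standard PSO update, which uses the same symbols w, c₁, c₂, r₁, r₂ and p_g, reads as follows. Here a_i is the position of particle i and p_i is its own best position so far; the symbol p_i is an assumption, since only p_g is named in the text.

```latex
\begin{aligned}
v_i &\leftarrow w\,v_i + c_1 r_1\,(p_i - a_i) + c_2 r_2\,(p_g - a_i),\\
a_i &\leftarrow a_i + v_i
\end{aligned}
```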

where w is a non-negative inertia factor, the acceleration constants c₁ and c₂ are non-negative, and r₁ and r₂ are random numbers drawn from the range [0, 1]. PSO adjusts the state of each particle using its current position, its own best (experience) position and the positions of its neighbors. Applying this information exchange to kernel parameter optimization means that each particle is influenced by both its own experience and the experience of the swarm, which gives good global search ability and convergence speed.
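A minimal PSO sketch following the standard velocity and position update is shown below. The function name, the swarm size, the values w = 0.7 and c₁ = c₂ = 1.5, the search bounds and the quadratic test objective are illustrative assumptions. In the method described here, the objective would be the negative log of the observation probability as a function of (σ, l, β), so that minimizing it maximizes the probability.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(a) over a box using a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))  # positions a_i
    vel = np.zeros((n_particles, dim))                        # velocities v_i
    p_best = pos.copy()                                       # each particle's best position
    p_best_val = np.array([objective(p) for p in pos])
    g = np.argmin(p_best_val)
    g_best, g_best_val = p_best[g].copy(), p_best_val[g]      # swarm best p_g
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia term + pull toward own best + pull toward swarm best
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < p_best_val
        p_best[improved] = pos[improved]
        p_best_val[improved] = vals[improved]
        g = np.argmin(p_best_val)
        if p_best_val[g] < g_best_val:
            g_best, g_best_val = p_best[g].copy(), p_best_val[g]
    return g_best, g_best_val

# Illustrative objective with its minimum at (1, 1, 1), standing in for -ln p(Y)
best, best_val = pso_minimize(lambda a: np.sum((a - 1.0) ** 2), [-5.0] * 3, [5.0] * 3)
```

Clipping the positions to the bounds is a design choice of this sketch; it keeps parameters such as l and β inside a valid range when the bounds are chosen to be positive.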

The kernel function used in this model is nonlinear. The solved kernel parameters θ = (σ, l, β) are substituted back into the model, and the extracted feature parameters are fed into the dimensionality reduction model to obtain the latent variables, which are the dimension-reduced data.

In the method above, step (5) displays the dimension-reduced data in two or three dimensions and then analyzes and compares it with the results of other dimensionality reduction algorithms.
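For the comparison, one of the other methods named in the background, principal component analysis, can serve as a baseline. The sketch below projects a feature matrix onto its first q principal components with NumPy. The function name and the random placeholder feature matrix are assumptions; real input would be the LPC, LPCC or MFCC tables, and the resulting 2D or 3D coordinates could be scatter-plotted next to the latent variables from the kernel model.

```python
import numpy as np

def pca_project(X, q=2):
    """Project the rows of X (n_frames x d) onto the first q principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T                         # coordinates in the top-q subspace

rng = np.random.default_rng(0)
features = rng.standard_normal((100, 12))        # placeholder for 12 features per frame
low_dim = pca_project(features, q=2)             # 2D coordinates, one row per frame
```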

Compared with existing dimensionality reduction methods for audio feature signals, the present invention has the following advantages:

(1) The invention uses a nonlinear kernel function to represent the relationship between the observation-space data and the latent-space parameters, which avoids the poor dimensionality reduction performance that linear mappings show on some audio feature data.

(2) The invention uses a particle swarm algorithm to solve for the hyperparameters of the kernel function. The good global search ability of the swarm and the directionality of its particles allow the optimal hyperparameters to be found quickly, and this also makes it very convenient to switch to other kernel functions later.

(3) The proposed audio feature dimensionality reduction algorithm is simple in theory and easy to program, is well suited to practical engineering projects, and substantially improves the processing speed of audio information.

Brief Description of the Drawings

Fig. 1 is a flow chart of the dimensionality reduction analysis of the present invention;

Fig. 2 is a flow chart of the signal preprocessing of the present invention;

Fig. 3 is a flow chart of the feature parameter extraction and dimensionality reduction processing of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and an embodiment.

As shown in Figs. 1 to 3, a kernel function-based dimensionality reduction method for audio feature signals comprises the following specific steps:

(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal into a WAV file. Filter, pre-emphasize and frame the digital signal written into the WAV file.

(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to obtain low-dimensional latent variables directly; these latent variables are the dimension-reduced data.

Audio samples are acquired with an audio acquisition device. For acquisition, the sampling frequency is set to 44.1 kHz (which satisfies the Nyquist sampling theorem); because the acquired signal is speech, a single channel (mono) is used; and the quantization precision is 16 bits.
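The acquisition settings map directly onto the header fields of a WAV file. The sketch below uses Python's standard `wave` module to write a short mono, 16-bit signal and read the parameters back. It assumes the common 44.1 kHz rate; the in-memory buffer and the 440 Hz test tone stand in for a real recording.

```python
import io
import wave
import numpy as np

fs = 44100                                        # sampling frequency in Hz (assumed 44.1 kHz)
t = np.arange(fs // 10) / fs                      # 0.1 s of samples
samples = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype("<i2")  # 16-bit PCM test tone

buf = io.BytesIO()                                # in-memory stand-in for a .wav file
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)                            # mono
    wf.setsampwidth(2)                            # 2 bytes = 16-bit quantization
    wf.setframerate(fs)
    wf.writeframes(samples.tobytes())

buf.seek(0)
with wave.open(buf, "rb") as wf:
    params = (wf.getnchannels(), wf.getsampwidth() * 8, wf.getframerate())
    data = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")
```

After reading, `params` holds the channel count, bit depth and sampling rate, and `data` holds the digital signal that the preprocessing steps operate on.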

The signal preprocessing comprises the following steps:

(1) Filter the acquired audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60 to 100 Hz) to obtain the signal y_a(n).

(2) Apply pre-emphasis to the filtered signal y_a(n) by a first-order difference to obtain the signal y_b(n), where y_b(n) = y_a(n) − α·y_a(n−1) and the pre-emphasis coefficient α is generally close to 1.

(3) Divide the pre-emphasized signal y_b(n) into a number of speech segments, each called a frame, with a duration of 10 to 30 ms. Adjacent frames partially overlap; the overlapping part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.

The feature parameter extraction comprises the following steps:

(1) Linear prediction coefficients (LPC): call an LPC function package programmatically, set the frame length, frame shift, window function and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 1.

(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package programmatically, set the frame length, frame shift, window function and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 2.

(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package programmatically, set the frame length, frame shift, window function and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in designated Table 3.

The construction of the dimensionality reduction model comprises the following steps:

(1) Let the latent space have dimension q and the observation space have dimension d (q < d). Assume that an observed value y and a latent-space point z are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the latent function f follows a Gaussian process with a squared exponential kernel function:

where σ is the coefficient (amplitude) parameter of the squared exponential kernel, l is the parameter that controls the influence of the distance between the two points z and z′, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved for in the kernel function are θ = (σ, l, β). The kernel function reaches its maximum when z and z′ are very close and its minimum when they are far apart. The covariance matrix is computed as:

The invention uses a particle swarm optimization algorithm to solve for these parameters. Write θ = (σ, l, β) as the position A = (a₁, a₂, a₃); the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3), and the best position passed through by the particles is denoted p_g = (p_g1, p_g2, p_g3). The iterative position update formula of the particle swarm algorithm is:

where w is a non-negative inertia factor, the acceleration constants c₁ and c₂ are non-negative, and r₁ and r₂ are random numbers drawn from the range [0, 1]. The solved kernel parameters θ = (σ, l, β) are substituted back into the model to obtain the kernel function-based dimensionality reduction model, and the extracted feature parameters are fed into this model to obtain the latent variables, which are the dimension-reduced data.

In the dimensionality reduction analysis, because people live in three-dimensional space and cannot picture spaces of higher dimension, and because results containing many data groups are hard to analyze directly, the preprocessed audio signal is fed into the constructed dimensionality reduction model, and the resulting latent-variable data are saved and visualized so that they can be compared with the results of other dimensionality reduction models.

The present invention is not limited to the embodiment described above. Within the knowledge of a person of ordinary skill in the art, the dimensionality reduction algorithm can also be applied to other related fields.

Claims

1. A kernel function-based dimensionality reduction method for audio feature signals, characterized in that the method comprises the following specific steps:

(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;

(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;

(3) characteristic parameter extraction: extracting characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient in the processed digital signal;

(4) building a dimensionality reduction model: feeding the extracted characteristic parameters into a dimensionality reduction model built with the kernel trick to directly obtain low-dimensional latent variables, the low-dimensional latent variables being the dimension-reduced data;

the dimensionality reduction model is specifically built as follows:

(1) let the latent space have dimension q and the observation space Y have dimension d, with q < d; assume that the observed value and the latent-space parameter satisfy the relation y = f(z) + ε, that the noise ε follows a Gaussian distribution with mean 0 and variance β, and that the latent function f follows a Gaussian process with a squared exponential kernel function:

wherein σ is the coefficient parameter of the squared exponential kernel, l is the distance influence factor parameter between the two points z and z′, β is a hyperparameter of the model, δ(z, z′) is the Kronecker delta function, and the parameters to be solved for in the kernel function are θ = (σ, l, β); the kernel function reaches its maximum when z and z′ are very close and its minimum when they are far apart; the covariance matrix of the kernel function is computed as:

(2) assuming that the d-dimensional observation space is sampled independently, the observation probability of Y is obtained, where y_:,i denotes the n elements of the i-th dimension of the observation space Y;

the parameters are solved by a particle swarm optimization algorithm, writing θ = (σ, l, β) as A = (a₁, a₂, a₃), wherein the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3) and the best position passed through by the particles is denoted p_g = (p_g1, p_g2, p_g3), with the particle swarm position iteration formula:

wherein w is a non-negative inertia factor; the acceleration constants c₁ and c₂ are non-negative; r₁ and r₂ are random numbers drawn from the range [0, 1]; the information exchange mode of the particle swarm optimization algorithm is applied to the kernel parameter optimization process, the solved kernel parameters θ = (σ, l, β) are substituted back into the model to obtain the dimensionality reduction model, and the extracted characteristic parameters are fed into the dimensionality reduction model to obtain latent variables, the latent variables being the dimension-reduced data;

(5) dimensionality reduction result analysis: visually displaying the dimension-reduced data.

2. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio acquisition is performed by an audio acquisition device, and the audio acquisition device sets the sampling frequency, the number of sampling channels and the quantization precision when acquiring the audio signals.

3. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio signal pre-processing comprises the steps of:

(1) filtering the collected audio signal x(n) with a rectangular window function w(n) to obtain a signal y_a(n);

(2) performing pre-emphasis on the filtered signal y_a(n) by a difference method to obtain a signal y_b(n), wherein y_b(n) = y_a(n) − α·y_a(n−1), α being the pre-emphasis coefficient, generally with a value close to 1;

(3) dividing the pre-emphasized signal y_b(n) into a plurality of speech frames, adjacent frames partially overlapping, the overlapped part being called the frame shift.