
CN109065070B - A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function - Google Patents


Info

Publication number
CN109065070B
Authority
CN
China
Prior art keywords
audio
dimension reduction
signal
kernel function
dimensionality reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810995309.7A
Other languages
Chinese (zh)
Other versions
CN109065070A (en)
Inventor
龙华
杨明亮
邵玉斌
杜庆治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201810995309.7A
Publication of CN109065070A
Application granted
Publication of CN109065070B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a kernel-function-based dimensionality reduction method for audio feature signals, and belongs to the technical field of audio signal processing. The method reduces the dimensionality of the feature parameters of an audio signal, achieving the required reduction without discarding audio feature information, visualizes the reduced data, and compares it with the results of other dimensionality reduction methods for audio feature parameters. The reduction is applied mainly to the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients of the audio signal. The method can be used for monitoring broadcast signals and for fast identification and processing of audio signals. The algorithm is simple: it uses a nonlinear kernel function to represent the mapping between the Gaussian observation space and the latent space, avoiding the limited applicability and poor reduction performance of linear mapping methods.

Description

A Dimensionality Reduction Method for Audio Feature Signals Based on a Kernel Function

Technical Field

The invention relates to a kernel-function-based dimensionality reduction method for audio feature signals, and belongs to the technical field of audio feature signal processing.

Background

To manage and control wireless audio broadcasting, audio broadcasts must be monitored and screened safely and efficiently in real time. The speed of the whole workflow depends on how quickly the audio information can be processed, and dimensionality reduction of audio feature signals is at the core of that processing, so its efficiency and reliability are a pressing problem. Existing dimensionality reduction methods for audio feature signals mainly include locality preserving projection, multidimensional scaling, locally linear embedding, and principal component analysis. Most of these algorithms are computationally complex and reduce dimensionality by discarding part of the feature signal, which can cause unpredictable errors in practical engineering applications. The present invention is proposed to address these drawbacks.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a kernel-function-based dimensionality reduction method for audio feature signals that performs dimensionality reduction on the extracted linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC), so as to reduce the data dimension and increase the information processing rate.

The technical solution of the present invention is a kernel-function-based dimensionality reduction method for audio feature signals, comprising the following steps:

(1) Audio signal acquisition: acquire an audio signal to obtain audio samples.

(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal to a WAV file. Filter, pre-emphasize, and frame the digital signal written to the WAV file.

(3) Feature parameter extraction: extract high-dimensional feature parameters, namely the LPC, LPCC, and MFCC, from the processed digital signal.

(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to directly obtain low-dimensional latent variables, which are the dimensionality-reduced data. The core of the model is a Gaussian process regression (GPR) model that nonlinearly models the relationship between the latent variables and the observed variables.

(5) Dimensionality reduction analysis: visualize the dimensionality-reduced data (2D/3D) and compare it with the results obtained by other dimensionality reduction methods.

In step (1) of the above method, the audio samples are acquired with an audio acquisition device. Before acquisition, the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of channels, and the quantization precision are set on the device.

In the above method, the audio signal preprocessing in step (2) comprises the following steps:

(1) Filter the acquired audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60 to 100 Hz) to obtain the signal y_a(n), where

Figure BDA0001781726780000021

(2) Apply pre-emphasis to the filtered signal y_a(n) using a first-order difference to obtain the signal y_b(n), where y_b(n) = y_a(n) - αy_a(n-1) (α is the pre-emphasis coefficient, generally taken close to 1). Pre-emphasis boosts the high-frequency part and suppresses the low-frequency part, flattening the spectrum of the signal.

(3) Framing: short-time analysis of a speech signal divides the signal into a number of segments, each called a frame, with a duration of 10 to 30 ms. To ensure a smooth transition between frames, adjacent frames partially overlap; the overlapping part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.
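The pre-emphasis and framing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the default α = 0.97, and the 8 kHz / 20 ms figures in the usage lines are all hypothetical. The sketch takes the offset between consecutive frame starts (`hop`) as a parameter; when the overlap is 1/2 of the frame length, the overlap and the hop are equal.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order difference y_b(n) = y_a(n) - alpha * y_a(n-1).

    alpha=0.97 is an assumed value; the text only says it is close to 1.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is kept as-is
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def split_frames(signal, frame_len, hop):
    """Cut the signal into frames of frame_len samples, one every hop samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


# Usage with assumed figures: 8 kHz sampling, 20 ms frames, 50% overlap.
sample_rate = 8000
frame_len = int(0.020 * sample_rate)   # 160 samples
hop = frame_len // 2                   # 80 samples
y_a = [0.0] * sample_rate              # one second of silence as a stand-in
frames = split_frames(pre_emphasis(y_a), frame_len, hop)
```

Frames of 10 to 30 ms at other sampling rates follow by changing `sample_rate` and the duration factor.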

In the above method, the feature parameter extraction in step (3) comprises the following steps:

(1) Linear prediction coefficients (LPC): call an LPC function package from code, set the frame length, frame shift, window function, and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 1.

(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package from code, set the frame length, frame shift, window function, and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 2.

(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package from code, set the frame length, frame shift, window function, and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 3.
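The text does not name the function packages it calls. As a rough idea of what an LPC/LPCC routine computes for a single frame, the following pure-Python sketch uses the autocorrelation method with the Levinson-Durbin recursion, followed by the usual LPC-to-cepstrum recursion. The function names and the choice of the autocorrelation method are assumptions made for illustration; a production system would more likely rely on an established signal-processing library.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a_1..a_order, where x(n) ~ sum_k a_k * x(n-k)."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame: nothing to predict
    a = [0.0] * (order + 1)           # a[0] is unused
    err = r[0]                        # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break                     # frame is already perfectly predicted
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)          # c[0] (gain term) is omitted
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Usage: a decaying exponential behaves like a first-order predictor.
frame = [0.5 ** n for n in range(30)]
coeffs = lpc(frame, 2)
ceps = lpcc(coeffs, 4)
```

Running `lpc` and `lpcc` over every frame yields one row per frame, which corresponds to the tables of feature values described above.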

In the above method, the construction of the dimensionality reduction model in step (4) comprises the following steps:

(1) Building the feature dimensionality reduction model. Denote the latent space as

Figure BDA0001781726780000031
with dimension q, and the observation space as
Figure BDA0001781726780000032
with dimension d (q < d). Assume that the observations and the latent variables are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the latent function f follows a Gaussian process with the squared exponential kernel function

Figure BDA0001781726780000033
where σ is the coefficient of the squared exponential kernel, l is the parameter controlling the influence of the distance between the two points z and z′, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved for in the kernel function are θ(σ, l, β). The kernel function takes its maximum value when z and z′ are very close, and its minimum value when they are far apart. To simplify the subsequent derivation, the covariance matrix is given first:

Figure BDA0001781726780000034
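The kernel and covariance formulas survive here only as image placeholders, so the sketch below assumes a commonly used form of the squared exponential kernel with a Kronecker delta noise term, k(z, z′) = σ² exp(-‖z - z′‖² / (2l²)) + β·δ(z, z′). That exact form, the function names, and the treatment of the delta as "same sample index" are assumptions, not a transcription of the patent's equations.

```python
import math


def se_kernel(z1, z2, sigma, length, beta, same_point):
    """Assumed kernel: sigma^2 * exp(-|z1 - z2|^2 / (2 * length^2)) + beta * delta."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    value = sigma ** 2 * math.exp(-sq_dist / (2.0 * length ** 2))
    # The Kronecker delta contributes only on the diagonal (same sample).
    return value + (beta if same_point else 0.0)


def covariance_matrix(Z, sigma, length, beta):
    """n x n matrix K with K[i][j] = k(z_i, z_j) for n latent points of dimension q."""
    n = len(Z)
    return [[se_kernel(Z[i], Z[j], sigma, length, beta, i == j)
             for j in range(n)]
            for i in range(n)]


# Usage: three latent points in a q = 2 latent space.
Z = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
K = covariance_matrix(Z, sigma=1.0, length=1.0, beta=0.1)
```

Nearby points give entries close to σ², distant points give entries close to zero, which matches the description of the kernel's maximum and minimum.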

(2) Assuming the d dimensions of the observation space are sampled independently, the observation probability of Y is obtained, where y_{:,i} denotes the n elements of the i-th dimension of the observation space Y:

Figure BDA0001781726780000035
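The likelihood itself is only present as an image. Assuming it is the standard zero-mean Gaussian process likelihood with one shared covariance matrix K for all d independently sampled dimensions, its logarithm is log p(Y) = -(dn/2)·log 2π - (d/2)·log|K| - (1/2)·Σ_i y_{:,i}ᵀ K⁻¹ y_{:,i}. The sketch below evaluates that quantity with a Cholesky factorization in pure Python; the function names are hypothetical and the formula is this assumed standard form.

```python
import math


def cholesky(K):
    """Lower-triangular L with L * L^T = K, for a symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def log_likelihood(Y, K):
    """Log of prod_i N(y_{:,i} | 0, K) for Y with n rows and d columns."""
    n, d = len(Y), len(Y[0])
    L = cholesky(K)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = 0.0
    for col in range(d):
        y = [Y[row][col] for row in range(n)]
        w = solve_lower(L, y)              # y^T K^-1 y equals |L^-1 y|^2
        quad += sum(v * v for v in w)
    return -0.5 * (d * n * math.log(2.0 * math.pi) + d * log_det + quad)


# Usage: two observations, one dimension, identity covariance.
value = log_likelihood([[0.0], [0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Maximizing this value over the kernel hyperparameters is the objective that the optimization step works on.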

To obtain a good dimensionality reduction result, the kernel hyperparameters that maximize the above probability must be found. Particle swarm optimization is used here. Write θ(σ, l, β) as A = (a_1, a_2, a_3), the velocity of particle i as v_i = (v_i1, v_i2, v_i3), and the best position found by the particles as p_g = (p_g1, p_g2, p_g3). The particle swarm algorithm repeatedly updates the particle positions with the following equation:

Figure BDA0001781726780000041
where w is a non-negative inertia factor, the acceleration constants c_1 and c_2 are non-negative, and r_1 and r_2 are random numbers in the range [0, 1]. Particle swarm optimization adjusts each particle's state using its current position, its own best position, and information from neighboring particles. Applying this information-exchange scheme to kernel parameter optimization means each particle is influenced both by its own experience and by the experience of the swarm, which gives good global search ability and convergence speed.
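The update equation appears only as an image, so the following sketch assumes the textbook particle swarm update, v ← w·v + c_1·r_1·(p_i - x) + c_2·r_2·(p_g - x) followed by x ← x + v, where p_i is a particle's own best position and p_g the swarm's best. The function name, the default constants, the bound clamping, and the toy objective are all hypothetical; in the method described above the objective would be the observation likelihood evaluated at A = (a_1, a_2, a_3).

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize objective(x) over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's own best
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best (p_g)
    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (best_pos[i][k] - pos[i][k])
                             + c2 * r2 * (g_pos[k] - pos[i][k]))
                lo, hi = bounds[k]
                # Clamp to the search box (an added safeguard, not from the text).
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Usage on a toy objective whose maximum is at (1, -2).
def toy(x):
    return -((x[0] - 1.0) ** 2) - (x[1] + 2.0) ** 2


best_x, best_f = pso_maximize(toy, [(-5.0, 5.0), (-5.0, 5.0)])
```

Swapping in a different kernel only changes the objective and the bounds; the swarm update itself stays the same.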

The kernel function used in this model is nonlinear. The solved kernel parameters θ(σ, l, β) are substituted back into the model, and the extracted feature parameters are fed into the dimensionality reduction model to obtain the latent variables, which are the dimensionality-reduced data.

In step (5) of the above method, the dimensionality-reduced data is visualized in two or three dimensions and then analyzed and compared with the results of other dimensionality reduction algorithms.

Compared with existing dimensionality reduction methods for audio feature signals, the present invention has the following advantages:

(1) The invention uses a nonlinear kernel function to represent the relationship between the observation-space data and the latent-space variables, avoiding the poor dimensionality reduction that linear mappings give for some audio feature data.

(2) The invention solves for the hyperparameters of the kernel function with particle swarm optimization, whose global search ability and directed swarm movement allow the optimal hyperparameters to be found quickly. It also makes it easy to switch to other kernel functions later.

(3) The proposed audio feature dimensionality reduction algorithm is simple in theory and easy to implement, is well suited to practical engineering projects, and substantially improves audio information processing speed.

Description of the Drawings

Fig. 1 is a flow chart of the dimensionality reduction analysis of the present invention;

Fig. 2 is a flow chart of the signal preprocessing of the present invention;

Fig. 3 is a flow chart of the feature parameter extraction and dimensionality reduction processing of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

As shown in Figs. 1 to 3, a kernel-function-based dimensionality reduction method for audio feature signals comprises the following steps:

(1) Audio signal acquisition: acquire an audio signal to obtain audio samples.

(2) Audio signal preprocessing: convert the analog signal in the acquired audio samples into a digital signal and write the digital signal to a WAV file. Filter, pre-emphasize, and frame the digital signal written to the WAV file.

(3) Feature parameter extraction: extract high-dimensional feature parameters, namely the LPC, LPCC, and MFCC, from the processed digital signal.

(4) Construction of the dimensionality reduction model: feed the extracted feature parameters into a dimensionality reduction model built with the kernel trick to directly obtain low-dimensional latent variables, which are the dimensionality-reduced data.

(5) Dimensionality reduction analysis: visualize the dimensionality-reduced data (2D/3D) and compare it with the results obtained by other dimensionality reduction methods.

The audio samples are acquired with an audio acquisition device. The sampling frequency is set to 44.1 kHz (satisfying the Nyquist sampling theorem); because the acquired signal is speech, a single channel (mono) is used; and the quantization precision is 16 bits.
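The acquisition settings map directly onto the header fields of a WAV file. The sketch below uses Python's standard `wave` and `struct` modules to write and read 16-bit mono PCM, assuming the intended sampling rate is 44.1 kHz (the common CD rate). The helper names and the in-memory buffer are illustrative choices, not part of the described system.

```python
import io
import struct
import wave

SAMPLE_RATE = 44100   # Hz, assumed to be the intended 44.1 kHz rate
N_CHANNELS = 1        # mono, since the signal is speech
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit quantization


def write_wav(stream, samples):
    """Write floats in [-1, 1] as 16-bit mono PCM to a path or file-like object."""
    with wave.open(stream, "wb") as wf:
        wf.setnchannels(N_CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        # Clip to [-1, 1], then scale to the signed 16-bit range.
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))


def read_wav(stream):
    """Read a 16-bit mono WAV; return (rate, channels, width) and float samples."""
    with wave.open(stream, "rb") as wf:
        params = (wf.getframerate(), wf.getnchannels(), wf.getsampwidth())
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // SAMPLE_WIDTH
    ints = struct.unpack("<%dh" % count, raw)
    return params, [v / 32767.0 for v in ints]


# Usage: round-trip three samples through an in-memory WAV file.
buffer = io.BytesIO()
write_wav(buffer, [0.0, 0.5, -1.0])
buffer.seek(0)
params, samples = read_wav(buffer)
```

Passing a file path instead of the `BytesIO` buffer writes the same data to disk.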

The signal preprocessing comprises the following steps:

(1) Filter the acquired audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60 to 100 Hz) to obtain the signal y_a(n), where

Figure BDA0001781726780000051

(2) Apply pre-emphasis to the filtered signal y_a(n) using a first-order difference to obtain the signal y_b(n), where y_b(n) = y_a(n) - αy_a(n-1) (α is the pre-emphasis coefficient, generally taken close to 1).

(3) Divide the pre-emphasized signal y_b(n) into a number of speech segments, each called a frame, with a duration of 10 to 30 ms. Adjacent frames partially overlap; the overlapping part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.

The feature parameter extraction comprises the following steps:

(1) Linear prediction coefficients (LPC): call an LPC function package from code, set the frame length, frame shift, window function, and LPC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 1.

(2) Linear prediction cepstral coefficients (LPCC): call an LPCC function package from code, set the frame length, frame shift, window function, and LPCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 2.

(3) Mel-frequency cepstral coefficients (MFCC): call an MFCC function package from code, set the frame length, frame shift, window function, and MFCC order, extract the feature values from the audio signal preprocessed in step (2), and store them in Table 3.

The construction of the dimensionality reduction model comprises the following steps:

(1) Denote the latent variables as

Figure BDA0001781726780000061
and the observation space as
Figure BDA0001781726780000062

That is, the latent space has dimension q and the observation space has dimension d (q < d). Assume that the observations and the latent variables are related by y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance ξ, and assume that the latent function f follows a Gaussian process with the squared exponential kernel function:

Figure BDA0001781726780000063

where σ is the coefficient of the squared exponential kernel, l is the parameter controlling the influence of the distance between the two points z and z′, β is a hyperparameter of the model, and δ(z, z′) is the Kronecker delta function. The parameters to be solved for in the kernel function are θ(σ, l, β). The kernel function takes its maximum value when z and z′ are very close, and its minimum value when they are far apart. The covariance matrix is computed as:

Figure BDA0001781726780000064

(2) Assuming the d dimensions of the observation space are sampled independently, the observation probability of Y is obtained, where y_{:,i} denotes the n elements of the i-th dimension of the observation space Y:

Figure BDA0001781726780000065

The invention uses particle swarm optimization to solve for the parameters. Write θ(σ, l, β) as A = (a_1, a_2, a_3), the velocity of particle i as v_i = (v_i1, v_i2, v_i3), and the best position found by the particles as p_g = (p_g1, p_g2, p_g3). The iterative position update equations of the particle swarm algorithm are:

Figure BDA0001781726780000071

Figure BDA0001781726780000072

where w is a non-negative inertia factor, the acceleration constants c_1 and c_2 are non-negative, and r_1 and r_2 are random numbers in the range [0, 1]. The solved kernel parameters θ(σ, l, β) are substituted back into the model to obtain the kernel-function-based dimensionality reduction model, and the extracted feature parameters are fed into it to obtain the latent variables, which are the dimensionality-reduced data.

In the dimensionality reduction analysis, because people live in three-dimensional space and cannot visualize spaces of more than three dimensions, and because reduction results with many data groups are hard to analyze directly, the preprocessed audio signal is fed into the constructed model for dimensionality reduction, and the resulting latent variable data is saved and visualized so that it can be compared with other dimensionality reduction models. The present invention is not limited to the above embodiments; within the knowledge of a person of ordinary skill in the art, the dimensionality reduction algorithm can also be applied to other related fields.

Claims (3)

1. A kernel function-based audio feature signal dimension reduction method is characterized in that: the method comprises the following specific steps:
(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;
(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;
(3) characteristic parameter extraction: extracting characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient in the processed digital signal;
building a dimension reduction model: sending the extracted characteristic parameters into a dimensionality reduction model built by a nucleation skill to directly obtain low-dimensional hidden variables, wherein the low-dimensional hidden variables are dimensionality reduced data;
the dimension reduction model is specifically built as follows:
(1) the dimension reduction model is built by first recording the hidden space as
Figure FDA0003562036870000011
Dimension q, let observation space be
Figure FDA0003562036870000012
Dimension d, q<d, assuming that a relation of y ═ f (z) + epsilon exists between the observed value and the hidden space parameter, the noise epsilon follows a gaussian distribution with mean 0 and variance β, and assuming that the hidden function f is a squared exponential kernel function satisfying the gaussian process:
Figure FDA0003562036870000013
wherein σ is a coefficient parameter of a square exponential kernel, l represents a distance influence factor parameter between z and z ', β represents a hyper-parameter of the model, σ (z, z ') represents a Kronecker delta function, the parameter requiring solution in the kernel function is θ (σ, l, β), it can be known from the above formula that the kernel function obtains a maximum value when z and z ' are very close, and obtains a minimum value when the distance is very far, and a calculation formula of a covariance matrix of the kernel function:
Figure FDA0003562036870000014
(2) assuming independent sampling of the d-dimensional observation space, the probability of observation for Y, where Y is:,iFor n elements of the i-th dimension in the observation space Y
Figure FDA0003562036870000021
Solving the parameters by a particle swarm optimization algorithm, and recording theta (sigma, l, beta) as A ═ a1,a2,a3) Wherein the velocity of the particle i is denoted vi=(vi1,vi2,vi3) The best position where the particle passes is denoted pg=(pg1,pg2,pg3) And a particle swarm algorithm position iteration formula:
Figure FDA0003562036870000022
Figure FDA0003562036870000023
(4) wherein w is a non-negative inertia factor; acceleration constant c1And c2Is a non-negative number; r is1And r2Is at [01]Random numbers transformed within the range are applied to a kernel parameter optimization process by an information exchange mode of a particle swarm optimization algorithm, the solved kernel parameters theta (sigma, l, beta) are brought back into the model to obtain a dimension reduction model, the extracted characteristic parameters are sent into the dimension reduction model to obtain hidden parameters, and the hidden parameters are dimension reduced data;
(5) and (3) analyzing a dimension reduction result: and carrying out visual display on the data subjected to the dimensionality reduction.
2. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio acquisition is performed by an audio acquisition device, and the audio acquisition device sets the sampling frequency, the number of sampling channels and the quantization precision when acquiring the audio signals.
3. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio signal pre-processing comprises the steps of:
(1) filtering the collected audio signal x (n) by adopting a rectangular window function w (n) to obtain a signal ya(n) wherein
Figure FDA0003562036870000024
(2) performing pre-emphasis on the filtered signal y_a(n) by a difference method to obtain a signal y_b(n), wherein y_b(n) = y(n) − αy(n−1), α being the pre-emphasis coefficient, generally with a value close to 1;
(3) dividing the pre-emphasized signal y_b(n) into a plurality of speech frames, with adjacent frames partially overlapping, wherein the overlapped part is called the frame shift.
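The pre-processing steps of claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the window formula is only available as a figure, so the rectangular window is assumed to be w(n) = 1 for 0 ≤ n < N and 0 otherwise; the function names, the default α = 0.97, and the parameters frame_len and hop are hypothetical. Here hop is the step between successive frame starts, so the overlap between adjacent frames is frame_len − hop.

```python
import numpy as np


def rectangular_window(x, n_keep):
    """Apply an ASSUMED rectangular window: w(n) = 1 for 0 <= n < n_keep, else 0.
    Multiplying by this window simply keeps the first n_keep samples."""
    x = np.asarray(x, dtype=float)
    return x[:n_keep]


def pre_emphasis(y, alpha=0.97):
    """First-order difference y_b(n) = y(n) - alpha * y(n - 1).
    The first sample is passed through unchanged because it has no predecessor."""
    y = np.asarray(y, dtype=float)
    return np.concatenate(([y[0]], y[1:] - alpha * y[:-1]))


def split_frames(y, frame_len, hop):
    """Cut y into overlapping frames of frame_len samples whose starts are
    hop samples apart; trailing samples that do not fill a frame are dropped."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]
```

For example, split_frames(pre_emphasis(rectangular_window(x, 16000)), frame_len=400, hop=160) would produce 25 ms frames every 10 ms for a signal sampled at 16 kHz; these particular values are common choices in speech processing and are not stated in the patent.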
CN201810995309.7A 2018-08-29 2018-08-29 A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function Active CN109065070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810995309.7A CN109065070B (en) 2018-08-29 2018-08-29 A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810995309.7A CN109065070B (en) 2018-08-29 2018-08-29 A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function

Publications (2)

Publication Number Publication Date
CN109065070A CN109065070A (en) 2018-12-21
CN109065070B true CN109065070B (en) 2022-07-19

Family

ID=64757611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810995309.7A Active CN109065070B (en) 2018-08-29 2018-08-29 A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function

Country Status (1)

Country Link
CN (1) CN109065070B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112444785B (en) 2019-08-30 2024-04-12 华为技术有限公司 A method, device and radar system for identifying target behavior

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679321A (en) * 2016-01-29 2016-06-15 宇龙计算机通信科技(深圳)有限公司 Speech recognition method and device and terminal
CN105913066A (en) * 2016-04-13 2016-08-31 刘国栋 Digital lung sound characteristic dimension reducing method based on relevance vector machine
CN106898362A (en) * 2017-02-23 2017-06-27 重庆邮电大学 The Speech Feature Extraction of Mel wave filters is improved based on core principle component analysis
CN109166591A (en) * 2018-08-29 2019-01-08 昆明理工大学 A kind of classification method based on audio frequency characteristics signal
CN109346104A (en) * 2018-08-29 2019-02-15 昆明理工大学 A Dimensionality Reduction Method for Audio Features Based on Spectral Clustering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8756061B2 (en) * 2011-04-01 2014-06-17 Sony Computer Entertainment Inc. Speech syllable/vowel/phone boundary detection using auditory attention cues

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"Hierarchical Gaussian Process Latent Variable Models"; Neil D. Lawrence; 《Machine Learning, Proceedings of the Twenty-Fourth International Conference》; 20171230; pp. 20-24 *
"Semi-supervised Gaussian process latent variable model with pairwise"; Xiumei Wang et al.; 《Neurocomputing》; 20101230; full text *
"语音情感特征提取及其降维方法综述"; 刘振焘 et al.; 《计算机学报》; 20181230; full text *
"基于语音特征的汉语数字语音降维与识别研究"; 高文曦; 《中国优秀硕士学位论文全文数据库(信息科技辑)》; 20120715; pp. 31-33 *
"基于高斯过程隐变量模型的数据降维与分类"; 张家源; 《中国优秀硕士学位论文全文数据库(信息科技辑)》; 20181015; full text *
"降维技术与方法综述"; 张煜东; 《四川兵工学报》; 20101030; full text *

Also Published As

Publication number Publication date
CN109065070A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN109599120B (en) Abnormal mammal sound monitoring method based on large-scale farm plant
Müller et al. Acoustic anomaly detection for machine sounds based on image transfer learning
CN107393554B (en) A feature extraction method based on fusion of inter-class standard deviations in acoustic scene classification
CN105976809B (en) Recognition method and system based on dual-modal emotion fusion of voice and facial expression
CN103310789B (en) A kind of sound event recognition method of the parallel model combination based on improving
CN109166591B (en) Classification method based on audio characteristic signals
WO2016155047A1 (en) Method of recognizing sound event in auditory scene having low signal-to-noise ratio
CN107564543B (en) A Speech Feature Extraction Method with High Emotion Discrimination
CN106653032A (en) Animal sound detecting method based on multiband energy distribution in low signal-to-noise-ratio environment
CN105448291A (en) Parkinsonism detection method and detection system based on voice
CN112651452A (en) Fan blade abnormity detection method and storage medium
Hao et al. Time-domain neural network approach for speech bandwidth extension
CN109065070B (en) A Dimensionality Reduction Method of Audio Feature Signal Based on Kernel Function
CN117316178A (en) Voiceprint recognition method, device, equipment and medium for power equipment
CN102623007B (en) Classification method of audio features based on variable duration
CN111179972A (en) Human voice detection algorithm based on deep learning
CN104867493B (en) Multifractal Dimension end-point detecting method based on wavelet transformation
CN108564967B (en) Mel energy voiceprint feature extraction method for cry detection system
CN112863541A (en) Audio cutting method and system based on clustering and median convergence
CN113990297B (en) An audio tampering identification method based on ENF
CN109935234B (en) Method for identifying source equipment of sound recording
Biswas et al. Audio visual isolated Oriya digit recognition using HMM and DWT
Therese et al. A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system
Han et al. ARResNet: A convolutional neural network based on human ear features to construct abnormal sound detection system for air-conditioning
CN109215633A (en) The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant