CN106228979A

CN106228979A - A method for feature extraction and recognition of abnormal sounds in public places

Info

Publication number: CN106228979A
Application number: CN201610674982.1A
Authority: CN
Inventors: 李伟红; 田真真; 龚卫国; 王伟冰
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2016-08-16
Filing date: 2016-08-16
Publication date: 2016-12-14
Anticipated expiration: 2036-08-16
Also published as: CN106228979B

Abstract

The invention relates to a method for extracting features of, and recognizing, abnormal sounds in public places. It improves extreme-point symmetric mode decomposition (ESMD); the improved method is referred to as D-ESMD. The improvements reduce the influence of background noise in public places on the feature extraction of abnormal sounds. To address the poor decomposition quality of the original ESMD on abnormal sounds, a symmetric midpoint interpolation method replaces the extreme-midpoint odd-even interpolation method, improving decomposition efficiency and the recognition of abnormal sounds. To address the shortcomings of the original ESMD in selecting effective decomposition modes, a permutation entropy algorithm measures the complexity of the modes obtained by ESMD decomposition and adaptively obtains the effective modal components of abnormal sounds. The invention describes the characteristics of abnormal sounds more fully, extracts their features more accurately, obtains better classification and recognition results, and is more robust to environmental background noise.

Description

A method for feature extraction and recognition of abnormal sounds in public places

Technical field

The invention belongs to the technical field of audio signal feature extraction and pattern recognition, and in particular relates to a method for feature extraction and recognition of abnormal sounds in public places.

Background

Public places such as squares, bus stations and subways are characterized by heavy pedestrian traffic and large areas, and their security has long been a concern of governments and the public in all countries. At present, surveillance technology based mainly on video monitoring plays a positive role in the security of public places. However, video monitoring suffers from problems such as blind spots and blurred images in rainy weather. Abnormal events are often accompanied by abnormal sounds such as screams, gunshots, breaking glass and explosions, so the cooperative operation of audio and video monitoring has become the direction of development for security monitoring of public places. Existing audio monitoring systems only perform simple sound collection and transmission and lack effective recognition of abnormal sounds, because no breakthrough has been made in the core theory and technology of audio monitoring. Recognition of abnormal sounds in public places is the core technology of an audio monitoring system, so research on this technology has significant social and research value.

The extreme-point symmetric mode decomposition (ESMD) method currently has the following problems when used to extract features of abnormal sounds in public places:

① An abnormal sound in a public place consists of two parts: the abnormal sound signal and the background noise signal. The background noise masks local features of the abnormal sound. When ESMD is used to decompose such a sound, the resulting modal components necessarily contain background noise components, which biases the abnormal sound features.

② When decomposing a signal, ESMD constructs 1, 2, 3 or more interpolation curves from the extreme midpoints to improve the decomposition, giving the ESMD_I, ESMD_II and ESMD_III methods. The interpolation scheme strongly affects the decomposition result: comparing these three schemes shows that as the number of interpolation curves increases, the number of modes decreases, the degree of symmetry decreases, the amplitude variation increases and the decomposition efficiency increases.

③ When ESMD decomposes abnormal sounds with background noise, it uses the number of remaining extreme points as the criterion for terminating the decomposition. This retains the decomposed low-frequency noise and yields pseudo-feature components that do not belong to the abnormal sound.

④ When ESMD decomposes abnormal sound signals, the number of sifting iterations K is varied within the interval [K_min, K_max], the signal is decomposed repeatedly with each value of K, and the optimal number of sifting iterations is finally computed by the least-squares principle. As a result, ESMD takes a long time to decompose abnormal sound signals.

In summary, the ESMD decomposition technique leaves room for improvement.

Summary of the invention

To address the above deficiencies of the prior art, the invention proposes a method for feature extraction and recognition of abnormal sounds in public places based on an improved ESMD decomposition (D-ESMD). By adding noise to the ESMD input signal and improving the internal interpolation method, the criterion for terminating mode decomposition, and the number of sifting iterations for modal components, the features of abnormal sounds in public places at different scales are obtained.

A method for feature extraction and recognition of abnormal sounds in public places is implemented in the following steps:

Step 1: Input the abnormal sound to be recognized from a public place and preprocess it.

Step 2: Decompose the abnormal sound signal to be recognized with the improved extreme-point symmetric mode decomposition (D-ESMD) to obtain the modal components of each order. Each order contains the features of the abnormal sound signal in a different frequency band.

Step 3: Compute the energy ratio of each modal component obtained in Step 2 relative to the original abnormal sound signal, combine the ratios into a vector and normalize it. The result is the feature vector of the abnormal sound signal to be recognized.

Step 4: Determine whether the feature vector is valid. If it is invalid, go to Step 3; if it is valid, go to Step 5.

Step 5: Recognize the abnormal sound. First, randomly select a certain number of training samples of each class from an established abnormal sound library, compute the feature vectors of the training samples through Steps 2 and 3, and build an SVM classification model. Then use the SVM classification model to classify the feature vector of the abnormal sound to be recognized and obtain the classification result.
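The feature vector of Step 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: energy is taken as the sum of squared samples, and "normalization" is assumed to mean scaling the ratios so that they sum to 1, since the text does not specify the scheme. The function names and the toy modes are hypothetical.

```python
def energy(signal):
    """Energy of a sampled signal, taken here as the sum of squared samples."""
    return sum(s * s for s in signal)


def energy_ratio_features(modes, original):
    """Energy of each mode relative to the original signal, scaled to sum to 1.

    The sum-to-one scaling is an assumption; the patent only says "normalize".
    """
    total = energy(original)
    if total == 0:
        raise ValueError("original signal has zero energy")
    ratios = [energy(m) / total for m in modes]
    norm = sum(ratios)
    return [r / norm for r in ratios] if norm > 0 else ratios


# Toy example: two hypothetical modal components of a four-sample signal.
modes = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5]]
original = [a + b for a, b in zip(*modes)]
features = energy_ratio_features(modes, original)
```

In a full pipeline, `modes` would be the D-ESMD modal components of Step 2 and `features` would be passed to the classifier of Step 5.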

The D-ESMD decomposition method builds on the extreme-point symmetric mode decomposition (ESMD) method. It adds random T-distributed noise sequences to the abnormal sound to be recognized, replaces the ESMD extreme-midpoint odd-even interpolation method with a symmetric midpoint interpolation method, computes the permutation entropy of the decomposed modal components, and improves the number of sifting iterations for modal components. In this way it checks the complexity of each mode and adaptively obtains the effective modal components of the abnormal sound.

The abnormal sound library includes explosions, screams, gunshots and breaking glass.

Specifically, the D-ESMD decomposition method proceeds as follows:

Step 2.1: Determine the number N of times T-distributed random noise is added.

Step 2.2: Let the abnormal sound signal to be recognized be x. Add a random T-distributed sequence to x to obtain the noise-added abnormal sound signal X_i.

Step 2.3: Find the extreme points of the noise-added signal X_i, connect adjacent extreme points, and mark the midpoints of the line segments as F_i. Supplement the left and right boundary points F_0 and F_n. Using the symmetric midpoint interpolation method in place of the ESMD extreme-midpoint odd-even interpolation method, construct the interpolation curve L^* from the n+1 extreme midpoints.

Step 2.4: Take X_i - L^* as the input and repeat Step 2.3 until the number of sifting iterations reaches its maximum, giving the first-order modal component M_1^i. Compute the permutation entropy of the modal component. If it is greater than the agreed threshold θ, the component is regarded as an abnormal sound modal component; otherwise it is regarded as a noise component.

Step 2.5: If M_1^i is an abnormal sound modal component, take X_i - M_1^i as the input signal and repeat Steps 2.3 to 2.4 until the modal component M_n^i obtained by decomposition is a noise component.

Step 2.6: If i < N, set i = i + 1 and repeat Steps 2.2 to 2.5 with a different T-distributed noise signal each time, until N decompositions have been performed. Take the ensemble average of all the modal components M_k^i obtained and use the result as the final modal component M_k of the signal being decomposed:

$M_k = \frac{1}{N} \sum_{i=1}^{N} M_k^i$

In the above formula, k is the order of the modal component and N is the number of noise additions.
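The noise addition of Step 2.2 and the ensemble average of Step 2.6 can be sketched as below. Several points are assumptions rather than statements of the patent: "T distribution" is read as Student's t-distribution, the degrees of freedom `df` and the amplitude `scale` are illustrative parameters, and when trials yield different numbers of modes only the orders present in every trial are averaged. The sifting itself (Steps 2.3 to 2.5) is not shown.

```python
import math
import random


def student_t_noise(length, df, scale, rng):
    """Draw `length` samples of scaled Student's t noise with `df` degrees of freedom."""
    samples = []
    for _ in range(length):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        samples.append(scale * z / math.sqrt(chi2 / df))
    return samples


def ensemble_average(trial_modes):
    """Average the k-th mode over all N trials: M_k = (1/N) * sum_i M_k^i.

    trial_modes[i][k] is the k-th modal component of trial i.
    Only mode orders present in every trial are kept (an assumption).
    """
    n_trials = len(trial_modes)
    n_modes = min(len(modes) for modes in trial_modes)
    averaged = []
    for k in range(n_modes):
        length = len(trial_modes[0][k])
        averaged.append([
            sum(trial_modes[i][k][t] for i in range(n_trials)) / n_trials
            for t in range(length)
        ])
    return averaged


rng = random.Random(0)
x = [math.sin(0.2 * t) for t in range(50)]
noisy = [a + b for a, b in zip(x, student_t_noise(len(x), df=5, scale=0.1, rng=rng))]

# Hypothetical modes from N = 2 trials, two modes per trial, two samples per mode.
trial_modes = [[[1.0, 2.0], [0.0, 4.0]], [[3.0, 4.0], [2.0, 0.0]]]
final_modes = ensemble_average(trial_modes)
```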

Specifically, the symmetric midpoint interpolation method comprises the following steps:

Step 3.1: Let the input signal be y. Find all maximum points y_max and minimum points y_min of y.

Step 3.2: Connect all adjacent extreme points and compute the extreme midpoints y_mean:

y_mean = (y_max + y_min)/2

Step 3.3: Compute the symmetric midpoints y_m of adjacent extreme midpoints, and interpolate y_m with cubic spline interpolation to obtain the final interpolation curve.
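Steps 3.1 to 3.3 can be sketched as follows, up to but not including the spline. The reading of "symmetric midpoint" as the midpoint between two adjacent extreme midpoints is an assumption, as are the function names; boundary points are not handled. The final cubic spline through the symmetric midpoints could be built with a library routine such as SciPy's `scipy.interpolate.CubicSpline`, which is left out here to keep the sketch dependency-free.

```python
def local_extrema(y):
    """Indices of strict local maxima and minima of a sampled signal."""
    return [
        i for i in range(1, len(y) - 1)
        if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0
    ]


def midpoints(points):
    """Midpoints of the segments joining consecutive (t, value) points."""
    return [
        ((t0 + t1) / 2.0, (v0 + v1) / 2.0)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def symmetric_midpoints(y):
    """Extreme midpoints (Step 3.2) and their symmetric midpoints (Step 3.3)."""
    extrema = [(float(i), y[i]) for i in local_extrema(y)]
    extreme_mid = midpoints(extrema)        # y_mean = (y_max + y_min) / 2
    symmetric_mid = midpoints(extreme_mid)  # assumed reading of "symmetric midpoint"
    return extreme_mid, symmetric_mid


y = [0.0, 2.0, 0.0, -1.0, 0.0, 3.0, 0.0, -2.0, 0.0]
extreme_mid, symmetric_mid = symmetric_midpoints(y)
```

Each returned point is a `(sample index, value)` pair, so the list can be fed directly to a spline routine as knots.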

Specifically, the optimal number of sifting iterations in Step 2.4 is 12.

Specifically, the permutation entropy is calculated as follows:

Given a time series x(i), i = 1, 2, …, N, of length N, delay reconstruction gives the following time series:

$X(i) = \{x(i), x(i+l), \ldots, x(i+(m-1)l)\}$

where l is the time delay and m is the reconstruction dimension. Arranging the m elements of X(i) in ascending order gives:

$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \ldots \le x(i+(j_m-1)l)\}$

Each vector X(i) therefore has a permutation sequence:

$Sg = \{j_1, j_2, j_3, \ldots, j_m\}$

where j denotes the column index of each element in the reconstructed component.

For m distinct symbols there are m! possible permutations. Let p_1, p_2, …, p_k be the probabilities with which each permutation occurs among the vectors X(i). The normalized permutation entropy is then:

$H = \frac{-\sum_{i=1}^{k} p_i \lg p_i}{\lg(m!)}$

where N is the length of the time series, m is the reconstruction dimension and l is the time delay.
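The normalized permutation entropy described above can be sketched in a few lines. The function name and default arguments are illustrative; ties between equal samples are broken by position, which is a choice the text does not address.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy of sequence x, in the range [0, 1]."""
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("sequence too short for the chosen m and delay")
    patterns = Counter()
    for i in range(n_vectors):
        window = [x[i + j * delay] for j in range(m)]
        # Ordinal pattern: indices of the window sorted by ascending value.
        patterns[tuple(sorted(range(m), key=window.__getitem__))] += 1
    probs = [count / n_vectors for count in patterns.values()]
    entropy = -sum(p * math.log10(p) for p in probs)
    return entropy / math.log10(math.factorial(m))


ramp = list(range(20))                      # a single ordinal pattern throughout
h_ramp = permutation_entropy(ramp, m=3, delay=1)
h_mixed = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=3, delay=1)
```

A monotonic ramp contains only one ordinal pattern, so its entropy is zero, while sequences that mix several patterns score higher. The logarithm base cancels in the ratio, so base 10 is used only to mirror the `lg` in the formula.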

The beneficial effects are as follows:

When decomposing abnormal sounds in public places with D-ESMD, the invention adds random T-distributed noise sequences to the abnormal sound signal to be decomposed. This reduces, at the source, the decomposition deviation caused by background noise and greatly improves the recognition of abnormal sounds in public places. Taking into account the characteristics of both abnormal sounds and background noise in public places, the invention proposes the D-ESMD method for feature extraction and recognition, decomposing an abnormal sound into a series of modal components, each with a relatively narrow frequency content. The interpolation method inside ESMD, the criterion for terminating mode decomposition, and the number of sifting iterations for modal components are improved on a theoretical basis, so that the modal components obtained reflect the features of abnormal sounds in public places at different scales.

Description of drawings

Fig. 1: Flow chart of the method for feature extraction and recognition of abnormal sounds in public places proposed by the invention.

Fig. 2: Decomposition of the simulated signal using the ESMD interpolation method.

Fig. 3: Decomposition of the simulated signal using the improved interpolation method proposed by the invention.

Fig. 4: Comparison of receiver operating characteristic (ROC) curves for the invention and several other abnormal sound feature extraction methods.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings.

The core technique of the invention is the D-ESMD decomposition method, which improves on the ESMD decomposition method in the following respects:

1. An ESMD decomposition based on the T distribution is used to weaken the background noise component in the modal components, so that the features of abnormal sounds are extracted more effectively.

A random T-distributed sequence is added to the sound signal to be recognized. This weakens the background noise component in the modal components, reduces at the source the decomposition deviation caused by background noise, and improves the feature extraction of abnormal sounds. The procedure is as follows:

Let the abnormal sound signal in a public place be X(t). It generally consists of the true abnormal sound signal x(t) and the background noise signal N(t), that is:

$X(t) = x(t) + N(t)$

When X(t) is decomposed by ESMD, each resulting mode M(t) likewise contains an abnormal sound component m(t) and a background noise component c(t):

$X(t) = \sum_{i=1}^{n} M_i(t) + r(t) = \sum_{i=1}^{n} \left[ m_i(t) + c_i(t) \right] + r(t)$

where n is the number of modal components and r(t) is the decomposition residual.

After k different T-distributed noise sequences n_i(t) are added to the signal X(t), the following series of formulas is obtained:

$X(t) + n_1(t) = m_{11}(t) + m_{12}(t) + \ldots + m_{1n}(t) + c_{11}(t) + c_{12}(t) + \ldots + c_{1n}(t) + r_1(t)$

$X(t) + n_2(t) = m_{21}(t) + m_{22}(t) + \ldots + m_{2n}(t) + c_{21}(t) + c_{22}(t) + \ldots + c_{2n}(t) + r_2(t)$

$\vdots$

$X(t) + n_i(t) = m_{i1}(t) + m_{i2}(t) + \ldots + m_{in}(t) + c_{i1}(t) + c_{i2}(t) + \ldots + c_{in}(t) + r_i(t)$

$\vdots$

$X(t) + n_k(t) = m_{k1}(t) + m_{k2}(t) + \ldots + m_{kn}(t) + c_{k1}(t) + c_{k2}(t) + \ldots + c_{kn}(t) + r_k(t)$

Adding up the above k formulas gives:

$\begin{aligned} & k X(t) + n_1(t) + n_2(t) + \ldots + n_k(t) \\ &= k\,x(t) + k\,N(t) + n_1(t) + n_2(t) + \ldots + n_k(t) \\ &= \sum_{i=1}^{k} \left[ \sum_{j=1}^{n} M_{ij}(t) + r_i(t) \right] \\ &= \sum_{i=1}^{k} \left[ \sum_{j=1}^{n} \left( m_{ij}(t) + c_{ij}(t) \right) + r_i(t) \right] \\ &= \sum_{i=1}^{k} \left[ \sum_{j=1}^{n} m_{ij}(t) + r_i(t) \right] + \sum_{i=1}^{k} \sum_{j=1}^{n} c_{ij}(t) \end{aligned}$

From the above formula, as k → ∞ the term k·N(t) + n_1(t) + n_2(t) + … + n_k(t) and the terms c_ij(t) all tend to zero, so the formula becomes:

$x(t) = \frac{1}{k} \sum_{i=1}^{k} \left[ \sum_{j=1}^{n} m_{ij}(t) + r_i(t) \right]$

It can be seen from this formula that when k random T-distributed noise sequences are added to the abnormal sound and the modes of each order obtained by ESMD decomposition are averaged, the background noise component c(t) is eliminated, which reduces the influence of background noise in public places on the decomposition of abnormal sounds.
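One ingredient of this argument can be checked numerically: the point-by-point average of k independent zero-mean noise sequences shrinks as k grows. The sketch below illustrates only that averaging of the added sequences n_i(t); it does not test the statement about the background noise N(t) or the components c_ij(t). Reading "T distribution" as Student's t-distribution and the choice of 5 degrees of freedom are assumptions.

```python
import math
import random


def t_sample(df, rng):
    """One Student's t sample with `df` degrees of freedom."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)


def mean_abs_of_averaged_noise(k, length, df, rng):
    """Average k independent noise sequences point by point; return the mean |value|."""
    averaged = [
        sum(t_sample(df, rng) for _ in range(k)) / k
        for _ in range(length)
    ]
    return sum(abs(v) for v in averaged) / length


rng = random.Random(42)
residual_small_k = mean_abs_of_averaged_noise(k=1, length=200, df=5, rng=rng)
residual_large_k = mean_abs_of_averaged_noise(k=100, length=200, df=5, rng=rng)
```

With finite variance (df > 2), the standard deviation of the averaged noise falls roughly as 1/√k, so `residual_large_k` comes out much smaller than `residual_small_k`.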

2. Symmetric midpoint interpolation replaces extreme-midpoint odd-even interpolation, improving the efficiency and accuracy of ESMD decomposition at the source.

The symmetric midpoint interpolation method is as follows:

Step 3.1: Find all maximum points y_max and minimum points y_min of the original signal.

Step 3.2: Connect all adjacent extreme points and compute the extreme midpoints y_mean:

y_mean = (y_max + y_min)/2

Step 3.3: Compute the symmetric midpoints y_m of adjacent extreme midpoints, and interpolate y_m with cubic spline interpolation to obtain the final interpolation curve.

A simulated signal z is decomposed using symmetric midpoint interpolation and extreme-midpoint odd-even interpolation, respectively. The simulated signal z consists of three sinusoids with different frequencies and amplitudes:

z = sin(20*π*t) + 1.5cos(40*π*t) + 2.5cos(80*π*t)
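For reference, the simulated signal can be generated as below, taking the first term as sin(20πt) so that the three components lie at 10 Hz, 20 Hz and 40 Hz. The sampling rate and the one-second duration are assumptions; the patent does not state them.

```python
import math


def simulated_signal(t):
    """z(t) = sin(20*pi*t) + 1.5*cos(40*pi*t) + 2.5*cos(80*pi*t)."""
    return (math.sin(20 * math.pi * t)
            + 1.5 * math.cos(40 * math.pi * t)
            + 2.5 * math.cos(80 * math.pi * t))


sample_rate = 1000  # assumed sampling rate in Hz; the patent does not state one
z = [simulated_signal(n / sample_rate) for n in range(sample_rate)]
```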

As Fig. 2 shows, when the simulated signal is decomposed with the ESMD interpolation method, the resulting modes are distorted and their amplitudes deviate considerably from those of the original signal. Fig. 3 shows the decomposition of the simulated signal with the improved interpolation method proposed by the invention, which effectively alleviates the distortion caused by the poorly defined end points of ESMD interpolation.

3. The permutation entropy algorithm is used to check the complexity of the modal components obtained by ESMD decomposition. This serves as the criterion for distinguishing abnormal sounds from background noise, so that the effective abnormal sound components are obtained adaptively.

The permutation entropy is calculated as follows:

Given a time series x(i), i = 1, 2, …, N, of length N, delay reconstruction gives the following time series:

$X(i) = \{x(i), x(i+l), \ldots, x(i+(m-1)l)\}$

where l is the time delay and m is the reconstruction dimension. Arranging the m elements of X(i) in ascending order gives:

$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \ldots \le x(i+(j_m-1)l)\}$

$Sg = \{j_1, j_2, j_3, \ldots, j_m\}$

For m distinct symbols there are m! possible permutations. Let p_1, p_2, …, p_k be the probabilities with which each permutation occurs among the vectors X(i). The normalized permutation entropy is then:

$H = \frac{-\sum_{i=1}^{k} p_i \lg p_i}{\lg(m!)}$

where N is the length of the time series, m is the reconstruction dimension and l is the time delay. According to experimental results, the reconstruction dimension m is generally chosen between 3 and 7. The time delay has little effect on the permutation entropy and can generally be set to 1.

In the invention, whether a mode is kept or discarded is decided by checking whether the permutation entropy H of each modal component, obtained at different frequency scales by decomposing the abnormal sound signal with the added random T-distributed sequence, is greater than the threshold θ. Experiments show that abnormal sound features are extracted well when θ lies in the range 0.25 to 0.35.
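The threshold rule can be sketched as a simple stopping criterion: modes are accepted in order while their permutation entropy exceeds θ, and decomposition stops at the first mode classed as noise. The mode labels and entropy values below are hypothetical, and θ = 0.3 is one value from the stated 0.25 to 0.35 range.

```python
def select_effective_modes(modes, entropies, theta=0.3):
    """Keep modes in order until the first one whose entropy is not above theta."""
    kept = []
    for mode, h in zip(modes, entropies):
        if h <= theta:
            break  # treated as a noise component; decomposition stops here
        kept.append(mode)
    return kept


modes = ["M1", "M2", "M3", "M4"]          # placeholders for modal components
entropies = [0.82, 0.55, 0.21, 0.40]      # hypothetical permutation entropy values
effective = select_effective_modes(modes, entropies, theta=0.3)
```

Note that the fourth mode is discarded even though its entropy exceeds θ, because the rule as described terminates at the first noise component.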

4. Number of sifting iterations for modal components

The optimal number of sifting iterations was determined through a large number of experiments; the preferred value is 12.

Using the above improvements, the invention extracts features of and recognizes abnormal sounds in public places. As shown in Fig. 1, the method comprises three main parts: decomposition, feature extraction and recognition of the abnormal sound to be recognized.

The specific steps of the method are as follows:

Step 1: Input the abnormal sound signal to be recognized from a public place and preprocess it.

Step 2: Decompose the abnormal sound signal to be recognized into a series of modal components with the improved extreme-point symmetric mode decomposition (D-ESMD). Each order contains the features of the abnormal sound signal in a different frequency band.

Step 3: Compute the energy ratio of each modal component obtained in Step 2 relative to the original abnormal sound signal, combine the ratios into a vector and normalize it. The result is the feature vector of the abnormal sound signal to be recognized.

Step 4: Determine whether the feature vector is valid. If it is invalid, go to Step 3; if it is valid, go to Step 5.

Step 5: Recognize the abnormal sound. First, randomly select a certain number of training samples of each class from an established abnormal sound library, compute the feature vectors of the training samples through Steps 2 and 3, and build an SVM classification model. Then use the SVM classification model to classify the feature vector of the abnormal sound to be recognized and obtain the classification result.

The specific steps by which D-ESMD extracts the features of the abnormal sound to be recognized are as follows:

Step 2.1: Determine the number N of times T-distributed random noise is added.

Step 2.2: Let the abnormal sound signal to be recognized be x. Add a random T-distributed sequence to x to obtain the noise-added abnormal sound signal X_i.

Step 2.3: Find the extreme points of the noise-added signal X_i, connect adjacent extreme points, and mark the midpoints of the line segments as F_i. Supplement the left and right boundary points F_0 and F_n. Using the symmetric midpoint interpolation method in place of the ESMD extreme-midpoint odd-even interpolation method, construct the interpolation curve L^* from the n+1 extreme midpoints.

Step 2.4: Take X_i - L^* as the input and repeat Step 2.3 until the number of sifting iterations reaches the maximum value of 12, giving the first-order modal component M_1^i. Compute the permutation entropy of the modal component. If it is greater than the agreed threshold θ, the component is regarded as an abnormal sound modal component; otherwise it is regarded as a noise component.

Step 2.5: If M_1^i is an abnormal sound modal component, take X_i - M_1^i as the input signal and repeat Steps 2.3 to 2.4 until the modal component M_n^i obtained by decomposition is a noise component.

Step 2.6: If i < N, set i = i + 1 and repeat Steps 2.2 to 2.5 with a different T-distributed noise signal each time, until N decompositions have been performed. Take the ensemble average of all the modal components M_k^i obtained and use the result as the final modal component M_k of the signal being decomposed:

$M_k = \frac{1}{N} \sum_{i=1}^{N} M_k^i$

Fig. 4 compares the ROC curves of the invention and several other abnormal sound feature extraction methods. ESMD is the extreme-point symmetric mode decomposition method, EEMD is the ensemble empirical mode decomposition method, SaSEEMD is the ensemble empirical mode decomposition method based on the α distribution, and ELMD is the ensemble local mean decomposition method. D-ESMD is the improved ESMD decomposition method proposed by the invention.

Claims

1. A method for feature extraction and recognition of abnormal sounds in public places, characterized in that it comprises decomposition, feature extraction and recognition of the abnormal sound to be recognized in a public place, implemented in the following steps:

Step 1: input the abnormal sound to be recognized from a public place and preprocess it;

Step 2: decompose the abnormal sound signal to be recognized with the improved extreme-point symmetric mode decomposition (D-ESMD) method to obtain the modal components of each order, each order containing the features of the abnormal sound signal in a different frequency band;

Step 3: compute the energy ratio of each modal component obtained in Step 2 relative to the original abnormal sound signal, combine the ratios into a vector and normalize it, as the feature vector of the abnormal sound signal to be recognized;

Step 4: determine whether the feature vector is valid; if it is invalid, go to Step 3; if it is valid, go to Step 5;

Step 5: recognition of the abnormal sound to be recognized: first, randomly select a certain number of training samples of each class from an established abnormal sound library, compute the feature vectors of the training samples through Steps 2 and 3, and build an SVM classification model; then use the SVM classification model to classify the feature vector of the abnormal sound to be recognized and obtain the classification result;

wherein the D-ESMD decomposition method, building on the extreme-point symmetric mode decomposition (ESMD) method, adds random T-distributed noise sequences to the abnormal sound to be recognized, replaces the ESMD extreme-midpoint odd-even interpolation method with a symmetric midpoint interpolation method, computes the permutation entropy of the decomposed modal components, and improves the number of sifting iterations for modal components, thereby checking the complexity of each mode and adaptively obtaining the effective modal components of the abnormal sound;

and wherein the abnormal sound library includes explosions, screams, gunshots and breaking glass.

2. The method for feature extraction and recognition of abnormal sounds in public places according to claim 1, characterized in that the D-ESMD decomposition method proceeds as follows:

Step 2.1: determine the number N of times T-distributed random noise is added;

Step 2.2: let the abnormal sound signal to be recognized be x, and add a random T-distributed sequence to x to obtain the noise-added abnormal sound signal X_i;

Step 2.3: find the extreme points of the noise-added signal X_i, connect adjacent extreme points, mark the midpoints of the line segments as F_i, supplement the left and right boundary points F_0 and F_n, and, using the symmetric midpoint interpolation method in place of the ESMD extreme-midpoint odd-even interpolation method, construct the interpolation curve L^* from the n+1 extreme midpoints;

Step 2.4: take X_i - L^* as the input and repeat Step 2.3 until the number of sifting iterations reaches its maximum, giving the first-order modal component M_1^i; compute the permutation entropy of the modal component; if it is greater than the agreed threshold θ, the component is regarded as an abnormal sound modal component, otherwise it is regarded as a noise component;

Step 2.5: if M_1^i is an abnormal sound modal component, take X_i - M_1^i as the input signal and repeat Steps 2.3 to 2.4 until the modal component M_n^i obtained by decomposition is a noise component;

Step 2.6: if i < N, set i = i + 1 and repeat Steps 2.2 to 2.5 with a different T-distributed noise signal each time, until N decompositions have been performed; take the ensemble average of all the modal components M_k^i obtained and use the result as the final modal component M_k of the signal being decomposed:

$M_k = \frac{1}{N} \sum_{i=1}^{N} M_k^i$

where k is the order of the modal component and N is the number of noise additions.

A kind of abnormal sound in public places feature extraction the most according to claim 1 and 2 and recognition methods, its feature exists In, described symmetrical Point Interpolation method concretely comprises the following steps:

Step 3.1, assume that input signal is y, ask for all maximum point y of y_maxWith minimum point y_min；

Step 3.2, connect all adjacent extreme points, and ask for extreme value midpoint y_mean；

y_mean=(y_max+y_min)/2

Step 3.3, ask for the symmetrical midpoint y at adjacent extreme value midpoint_m, use cubic spline interpolation method to y simultaneously_mCarry out interpolation, Obtain final interpolation curve.

4. The method for feature extraction and recognition of abnormal sounds in public places according to claim 2, characterized in that the maximum number of sifting iterations in Step 2.4 is preferably 12.

5. The method for feature extraction and recognition of abnormal sounds in public places according to claim 2, characterized in that the permutation entropy is calculated as follows:

given a time series x(i), i = 1, 2, …, N, of length N, delay reconstruction gives the following time series:

$X(i) = \{x(i), x(i+l), \ldots, x(i+(m-1)l)\}$

where l is the time delay and m is the reconstruction dimension; arranging the m elements of X(i) in ascending order gives:

$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \ldots \le x(i+(j_m-1)l)\}$

each vector X(i) therefore has a permutation sequence:

$Sg = \{j_1, j_2, j_3, \ldots, j_m\}$

where j denotes the column index of each element in the reconstructed component;

for m distinct symbols there are m! possible permutations; with p_1, p_2, …, p_k the probabilities with which each permutation occurs among the vectors X(i), the normalized permutation entropy is:

$H = \frac{-\sum_{i=1}^{k} p_i \lg p_i}{\lg(m!)}$

where N is the length of the time series, m is the reconstruction dimension and l is the time delay.