CN106228979A - Method for extracting and identifying abnormal sound features in public places - Google Patents
Method for extracting and identifying abnormal sound features in public places
- Publication number
- CN106228979A (application CN201610674982.1A)
- Authority
- CN
- China
- Prior art keywords
- abnormal sound
- esmd
- decomposition
- signal
- modal components
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention relates to a method for extracting and identifying abnormal sounds in public places. It improves extreme-point symmetric mode decomposition (ESMD); the improved method is referred to as D-ESMD. Its characteristics are as follows. A random T-distributed sequence signal is added to the abnormal sound in order to reduce the influence of public-place background noise on abnormal sound feature extraction. To address the poor decomposition quality of the original ESMD on abnormal sounds, symmetric midpoint interpolation replaces the extreme-midpoint odd-even interpolation, which improves decomposition efficiency and recognition rate. To address the weakness of the original ESMD in selecting effective modes, a permutation entropy algorithm checks the complexity of the modes produced by the decomposition and adaptively yields the effective modal components of the abnormal sound. The invention describes the features of abnormal sounds more fully, extracts them more accurately, gives better classification and recognition results, and is robust to environmental background noise.
Description
Technical field
The invention belongs to the technical field of audio signal feature extraction and pattern recognition, and in particular relates to a method for extracting and identifying abnormal sound features in public places.
Background
Public places such as squares, bus stations and subways have heavy foot traffic and cover large areas, and their security has long been a concern of governments and the public. Surveillance based mainly on video plays a positive role in securing public places, but video surveillance has blind spots and produces blurred images in rainy weather. Abnormal events are often accompanied by abnormal sounds such as screams, gunshots, breaking glass and explosions, so combining audio surveillance with video surveillance has become the direction of development for public-place security monitoring. Existing audio surveillance systems only collect and transmit sound and cannot effectively recognize abnormal sounds, because the core theory and technology of audio surveillance have not yet had a breakthrough. Abnormal sound recognition in public places is the core technology of an audio surveillance system, so research on it has significant social and research value.
Extracting abnormal sound features in public places with the Extreme-point Symmetric Mode Decomposition (ESMD) method currently has the following problems:
① An abnormal sound in a public place consists of two parts: the abnormal sound signal and the background noise signal. The background noise masks local features of the abnormal sound. When ESMD decomposes such a sound, the resulting modal components inevitably contain background-noise components, which biases the abnormal sound features.
② When decomposing a signal, ESMD constructs 1, 2, 3 or more interpolation curves through the extreme midpoints to improve the decomposition; these variants are ESMD_I, ESMD_II and ESMD_III. The interpolation scheme strongly affects the decomposition. Comparing the three schemes shows that as the number of interpolation curves increases, the number of modes decreases, the symmetry decreases, the amplitude variation increases and the decomposition efficiency improves.
③ When ESMD decomposes an abnormal sound with background noise, it uses the number of remaining extreme points as the criterion for terminating the decomposition. Low-frequency noise is therefore retained, producing pseudo-feature components that do not belong to the abnormal sound.
④ When ESMD decomposes an abnormal sound signal, the number of sifting iterations K is varied over the interval [K_min, K_max], the signal is decomposed repeatedly with each value, and the best number of sifting iterations is then computed by least squares. This makes the ESMD decomposition of abnormal sound signals slow.
In summary, the ESMD decomposition technique leaves room for improvement.
Summary of the invention
To address the above shortcomings of the prior art, the invention proposes a method for extracting and identifying abnormal sound features in public places based on an improved ESMD decomposition (D-ESMD). It adds noise to the ESMD input signal and improves the internal interpolation method, the criterion for terminating mode decomposition, and the number of sifting iterations for the modal components, so as to obtain the features of abnormal sounds in public places at different scales.
A method for extracting and identifying abnormal sound features in public places is implemented in the following steps:
Step 1: Input the abnormal sound to be identified and preprocess it.
Step 2: Decompose the abnormal sound signal to be identified with the improved extreme-point symmetric mode decomposition (D-ESMD) to obtain the modal components of each order. Each modal component contains the features of the abnormal sound signal in a different frequency band.
Step 3: Compute the energy ratio of each modal component obtained in step 2 relative to the original abnormal sound signal, combine the ratios into a vector and normalize it. The result is the feature vector of the abnormal sound signal to be identified.
Step 4: Check whether the feature vector is valid. If it is invalid, go to step 3; if it is valid, go to step 5.
Step 5: Identify the abnormal sound. First, randomly select a certain number of training samples of each class from an existing abnormal sound library, compute their feature vectors with steps 2 and 3, and build an SVM classification model. Then use the SVM classification model to classify the feature vector of the abnormal sound to be identified and obtain the recognition result.
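Step 3 can be illustrated with a minimal Python sketch. It assumes that the "energy" of a sequence is its sum of squares and that normalization means scaling the ratios so they sum to one; the source does not state either choice, and the function name `energy_ratio_features` and the sample values are hypothetical.

```python
def energy_ratio_features(modes, signal):
    """Energy of each mode relative to the original signal, scaled to sum to 1.

    Assumptions (not stated in the source): energy is the sum of squared
    samples, and normalization is division by the sum of the ratios.
    """
    total = sum(s * s for s in signal)
    if total == 0:
        raise ValueError("signal has zero energy")
    ratios = [sum(v * v for v in mode) / total for mode in modes]
    norm = sum(ratios)
    return [r / norm for r in ratios] if norm > 0 else ratios


# Hypothetical signal and two hypothetical modal components.
signal = [3.0, -1.0, 2.0, 0.0]
modes = [[2.0, -1.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
features = energy_ratio_features(modes, signal)
```

The resulting list has one entry per modal component and could serve as the input row for the SVM in step 5.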
The D-ESMD decomposition method builds on the extreme-point symmetric mode decomposition (ESMD). It adds a random T-distributed noise sequence to the abnormal sound to be identified, replaces the extreme-midpoint odd-even interpolation of ESMD with symmetric midpoint interpolation, computes the permutation entropy of each decomposed modal component, and improves the number of sifting iterations. In this way it checks the complexity of each mode and adaptively obtains the effective modal components of the abnormal sound.
The abnormal sound library includes explosions, screams, gunshots and breaking glass.
Specifically, the D-ESMD decomposition method proceeds as follows:
Step 2.1: Determine the number of times $N$ that T-distributed random noise is added.
Step 2.2: Let the abnormal sound signal to be identified be $x$. Add a random T-distributed sequence to $x$ to obtain the noisy abnormal sound signal $X_i$.
Step 2.3: Find the extreme points of the noisy signal $X_i$, connect adjacent extreme points, and mark the midpoints of the line segments as $F_i$. Add the left and right boundary points $F_0$ and $F_n$. Construct the interpolation curve $L^*$ through the $n+1$ extreme midpoints using symmetric midpoint interpolation in place of the extreme-midpoint odd-even interpolation of ESMD.
Step 2.4: Take $X_i - L^*$ as the input and repeat step 2.3 until the number of sifting iterations reaches its maximum, which gives the first-order modal component $M_1^i$. Compute the permutation entropy of the modal component. If it is greater than the agreed threshold $\theta$, the component is treated as an abnormal sound modal component; otherwise it is treated as a noise component.
Step 2.5: If $M_1^i$ is an abnormal sound modal component, take $X_i - M_1^i$ as the input signal and repeat steps 2.3 and 2.4 until the decomposed modal component $M_n^i$ is a noise component.
Step 2.6: If $i < N$, set $i = i + 1$ and repeat steps 2.2 to 2.5 with a different T-distributed noise signal each time, until $N$ decompositions have been performed. Take the ensemble average of all the modal components obtained and use the result as the final modal component $M_k$ of the signal:
$$M_k = \frac{1}{N}\sum_{i=1}^{N} M_k^i$$
In the above formula, $k$ is the order of the modal component and $N$ is the number of times noise is added.
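The outer loop of steps 2.1 to 2.6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the degrees of freedom `df`, the noise amplitude `scale` and the seed are hypothetical parameters that the source does not give, every trial is assumed to return the same number of modes, and `toy_decompose` is a placeholder (a moving average and its remainder), not the ESMD sifting of steps 2.3 to 2.5.

```python
import math
import random


def student_t_noise(length, df, scale, rng):
    """Draw scaled Student-t samples as Z / sqrt(V / df), with V chi-square(df)."""
    samples = []
    for _ in range(length):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        samples.append(scale * z / math.sqrt(v / df))
    return samples


def ensemble_modes(x, decompose, n_trials, df=5.0, scale=0.1, seed=0):
    """Average the k-th mode over n_trials noisy copies of x (steps 2.1 to 2.6)."""
    rng = random.Random(seed)
    sums = None
    for _ in range(n_trials):
        noise = student_t_noise(len(x), df, scale, rng)   # step 2.2
        noisy = [a + b for a, b in zip(x, noise)]
        modes = decompose(noisy)                          # stands in for steps 2.3 to 2.5
        if sums is None:
            sums = [[0.0] * len(x) for _ in modes]
        for total, mode in zip(sums, modes):
            for t, value in enumerate(mode):
                total[t] += value
    return [[v / n_trials for v in total] for total in sums]  # step 2.6


def toy_decompose(signal):
    """Placeholder, NOT ESMD: 3-point moving average plus its remainder."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    low = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
           for i in range(len(signal))]
    high = [s - l for s, l in zip(signal, low)]
    return [high, low]


x = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
modes = ensemble_modes(x, toy_decompose, n_trials=200)
rebuilt = [sum(col) for col in zip(*modes)]
mean_abs_error = sum(abs(a - b) for a, b in zip(rebuilt, x)) / len(x)
```

Passing the decomposer as a callable keeps the averaging logic separate from the sifting, so a real ESMD or D-ESMD routine could be substituted without changing the loop.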
Specifically, the symmetric midpoint interpolation method has the following steps:
Step 3.1: Let the input signal be $y$. Find all maximum points $y_{max}$ and minimum points $y_{min}$ of $y$.
Step 3.2: Connect all adjacent extreme points and compute the extreme midpoints $y_{mean}$:
$$y_{mean} = (y_{max} + y_{min})/2$$
Step 3.3: Compute the symmetric midpoints $y_m$ of adjacent extreme midpoints, and interpolate $y_m$ with cubic spline interpolation to obtain the final interpolation curve.
Specifically, the optimal number of sifting iterations in step 2.4 is 12.
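The point construction in steps 3.1 to 3.3 can be sketched in Python. The sketch assumes that the "symmetric midpoint" of two adjacent extreme midpoints is simply the point halfway between them, which is one reading of the text and not confirmed by the source. The function names are hypothetical, and the final cubic spline through $y_m$ is not shown (a library routine such as SciPy's `CubicSpline` could be used for it).

```python
def local_extrema(y):
    """Indices of interior samples where the slope changes sign (step 3.1)."""
    return [i for i in range(1, len(y) - 1)
            if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0]


def midpoints(points):
    """Midpoints of consecutive (position, value) pairs."""
    return [((t0 + t1) / 2.0, (v0 + v1) / 2.0)
            for (t0, v0), (t1, v1) in zip(points, points[1:])]


def symmetric_midpoints(y):
    """Return (y_mean, y_m): extreme midpoints and their assumed symmetric midpoints."""
    extrema = [(float(i), float(y[i])) for i in local_extrema(y)]
    y_mean = midpoints(extrema)   # step 3.2: (y_max + y_min) / 2 for adjacent extrema
    y_m = midpoints(y_mean)       # step 3.3: assumed reading of "symmetric midpoint"
    return y_mean, y_m


# Small hypothetical sequence with extrema at indices 1, 3 and 5.
y = [0, 2, 0, -2, 0, 4, 0]
y_mean, y_m = symmetric_midpoints(y)
```

The points in `y_m` are the knots that the cubic spline of step 3.3 would pass through.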
Specifically, the permutation entropy is calculated as follows.
Given a time series $x(i)$, $i = 1, 2, \ldots, N$, of length $N$, delay reconstruction yields the vectors:
$$X(i) = \{x(i),\ x(i+l),\ \ldots,\ x(i+(m-1)l)\}, \quad i = 1, 2, \ldots, N-(m-1)l$$
where $l$ is the time delay and $m$ is the reconstruction dimension. Sorting the $m$ elements of $X(i)$ in ascending order gives:
$$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \cdots \le x(i+(j_m-1)l)\}$$
Each vector $X(i)$ therefore has an associated permutation sequence:
$$S_g = \{j_1, j_2, j_3, \ldots, j_m\}$$
where each $j$ is the column index of an element within the reconstructed vector.
For $m$ distinct symbols there are $m!$ possible permutations. Let $p_1, p_2, \ldots, p_k$ ($k \le m!$) be the probabilities with which the permutations occur among the vectors $X(i)$. The normalized permutation entropy is then:
$$H = -\frac{1}{\ln(m!)}\sum_{g=1}^{k} p_g \ln p_g$$
where $N$ is the length of the time series, $m$ is the reconstruction dimension and $l$ is the time delay.
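The calculation above corresponds to the standard normalized permutation entropy, which can be sketched in Python as follows. The function name and the defaults `m=3`, `l=1` are illustrative choices; ties between equal samples are broken by position, which the source does not specify.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, l=1):
    """Normalized permutation entropy of sequence x (dimension m, delay l)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    n_vectors = len(x) - (m - 1) * l
    if n_vectors < 1:
        raise ValueError("sequence too short for the chosen m and l")
    patterns = Counter()
    for i in range(n_vectors):
        window = [x[i + j * l] for j in range(m)]
        # Ordinal pattern: element positions sorted by ascending value.
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        patterns[pattern] += 1
    probs = [count / n_vectors for count in patterns.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(m))


pe_example = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=3, l=1)
pe_monotone = permutation_entropy(list(range(10)), m=3, l=1)
```

For the seven-sample example, five windows produce three distinct ordinal patterns with frequencies 2/5, 2/5 and 1/5, so the value is about 0.589; a strictly increasing sequence yields a single pattern and an entropy of 0.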
The beneficial effects are as follows.
When decomposing abnormal sounds in public places with D-ESMD, the invention adds random T-distributed noise sequences to the abnormal sound signal to be decomposed. This reduces, at the source, the decomposition bias caused by background noise and thereby greatly improves the recognition of abnormal sounds in public places. Taking into account the characteristics of abnormal sounds and background noise in public places, the invention proposes the D-ESMD method for feature extraction and recognition, which decomposes an abnormal sound into a series of modal components, each with a relatively simple frequency content. The interpolation method inside ESMD, the criterion for terminating mode decomposition and the number of sifting iterations are all improved, so that the modal components obtained reflect the features of abnormal sounds in public places at different scales.
Description of drawings
Fig. 1: Flow chart of the method for extracting and identifying abnormal sound features in public places proposed by the invention.
Fig. 2: Decomposition of the simulated signal with the ESMD interpolation method.
Fig. 3: Decomposition of the simulated signal with the improved interpolation method proposed by the invention.
Fig. 4: Comparison of the Receiver Operating Characteristic (ROC) curves of the invention and several other abnormal sound feature extraction methods.
Detailed description
The invention is described in further detail below with reference to the drawings.
The core of the invention is the D-ESMD decomposition method, which improves the ESMD decomposition method in the following respects.
1. ESMD decomposition based on the T distribution is used to weaken the background-noise component in the modal components, so that the features of abnormal sounds are extracted better. Specifically:
A random T-distributed sequence is added to the sound signal to be identified. This weakens the background-noise component in the modal components, reduces at the source the decomposition bias caused by background noise, and improves feature extraction for abnormal sounds. The procedure is as follows.
Let the abnormal sound signal in a public place be $X(t)$. It generally consists of the true abnormal sound signal $x(t)$ and the background noise signal $N(t)$:
$$X(t) = x(t) + N(t)$$
When $X(t)$ is decomposed with ESMD, each resulting mode $M_j(t)$ likewise contains an abnormal-sound component $m_j(t)$ and a background-noise component $c_j(t)$:
$$X(t) = \sum_{j=1}^{n} M_j(t) + r(t) = \sum_{j=1}^{n}\left[m_j(t) + c_j(t)\right] + r(t)$$
where $n$ is the number of modal components and $r(t)$ is the decomposition residual.
After $k$ different T-distributed noise sequences $n_i(t)$ are added to $X(t)$, the decompositions can be written as:
$$X(t)+n_1(t)=m_{11}(t)+m_{12}(t)+\cdots+m_{1n}(t)+c_{11}(t)+c_{12}(t)+\cdots+c_{1n}(t)+r_1(t)$$
$$X(t)+n_2(t)=m_{21}(t)+m_{22}(t)+\cdots+m_{2n}(t)+c_{21}(t)+c_{22}(t)+\cdots+c_{2n}(t)+r_2(t)$$
$$\vdots$$
$$X(t)+n_i(t)=m_{i1}(t)+m_{i2}(t)+\cdots+m_{in}(t)+c_{i1}(t)+c_{i2}(t)+\cdots+c_{in}(t)+r_i(t)$$
$$\vdots$$
$$X(t)+n_k(t)=m_{k1}(t)+m_{k2}(t)+\cdots+m_{kn}(t)+c_{k1}(t)+c_{k2}(t)+\cdots+c_{kn}(t)+r_k(t)$$
Summing these $k$ equations and substituting $X(t) = x(t) + N(t)$ gives:
$$k\,x(t) + k\,N(t) + \sum_{i=1}^{k} n_i(t) = \sum_{i=1}^{k}\sum_{j=1}^{n} m_{ij}(t) + \sum_{i=1}^{k}\sum_{j=1}^{n} c_{ij}(t) + \sum_{i=1}^{k} r_i(t)$$
The argument is that, as $k \to \infty$, the terms $k\,N(t) + n_1(t) + n_2(t) + \cdots + n_k(t)$ and $c_{ij}(t)$ both tend to zero, so the equation becomes:
$$x(t) = \frac{1}{k}\sum_{i=1}^{k}\sum_{j=1}^{n} m_{ij}(t) + \frac{1}{k}\sum_{i=1}^{k} r_i(t)$$
This shows that when $k$ random T-distributed noise sequences are added to the abnormal sound and the modes of each order obtained by ESMD are averaged, the background-noise component $c(t)$ is eliminated, which reduces the influence of public-place background noise on the decomposition of abnormal sounds.
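One part of this argument is easy to check numerically: the average of $k$ independent zero-mean T-distributed sequences, $(1/k)\sum_i n_i(t)$, shrinks as $k$ grows. The Python sketch below illustrates only that added-noise term; it says nothing about the background noise $N(t)$ or the components $c_{ij}(t)$. The degrees of freedom, sequence length and seed are hypothetical choices.

```python
import math
import random


def t_sample(df, rng):
    """One Student-t sample built as Z / sqrt(V / df), with V chi-square(df)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)


def mean_abs_of_averaged_noise(k, length=200, df=5.0, seed=1):
    """Mean absolute value of the average of k independent t-noise sequences."""
    rng = random.Random(seed)
    averaged = [0.0] * length
    for _ in range(k):
        for t in range(length):
            averaged[t] += t_sample(df, rng) / k
    return sum(abs(v) for v in averaged) / length


small_k = mean_abs_of_averaged_noise(4)
large_k = mean_abs_of_averaged_noise(400)
```

With a finite-variance T distribution (here `df=5`), the averaged noise amplitude falls roughly as $1/\sqrt{k}$, so `large_k` comes out well below `small_k`.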
2. Symmetric midpoint interpolation replaces the extreme-midpoint odd-even interpolation, which improves the efficiency and accuracy of the ESMD decomposition at the signal source.
The symmetric midpoint interpolation method is:
Step 3.1: Find all maximum points $y_{max}$ and minimum points $y_{min}$ of the original signal.
Step 3.2: Connect all adjacent extreme points and compute the extreme midpoints $y_{mean}$:
$$y_{mean} = (y_{max} + y_{min})/2$$
Step 3.3: Compute the symmetric midpoints $y_m$ of adjacent extreme midpoints, and interpolate $y_m$ with cubic spline interpolation to obtain the final interpolation curve.
A simulated signal $z$ is decomposed with symmetric midpoint interpolation and with extreme-midpoint odd-even interpolation, respectively. The simulated signal $z$ consists of three sinusoids of different frequencies and amplitudes:
$$z = \sin(20\pi t) + 1.5\cos(40\pi t) + 2.5\cos(80\pi t)$$
Fig. 2 shows that when the simulated signal is decomposed with the ESMD interpolation method, the resulting modes are distorted and deviate considerably in amplitude from the original signal. Fig. 3 shows the decomposition of the simulated signal with the improved interpolation method proposed by the invention, which effectively alleviates the distortion caused by the ill-defined endpoints of ESMD interpolation.
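For readers who want to reproduce the test input, the simulated signal can be generated with a few lines of Python. The sketch reads the symbol in the first term as $\pi$, so the three components lie at 10 Hz, 20 Hz and 40 Hz; the sampling rate of 1000 Hz and the one-second duration are assumptions, since the source gives neither.

```python
import math


def simulated_signal(t):
    """z(t) = sin(20*pi*t) + 1.5*cos(40*pi*t) + 2.5*cos(80*pi*t)."""
    return (math.sin(20 * math.pi * t)
            + 1.5 * math.cos(40 * math.pi * t)
            + 2.5 * math.cos(80 * math.pi * t))


fs = 1000                                  # assumed sampling rate in Hz
t_axis = [n / fs for n in range(fs)]       # assumed duration: one second
z = [simulated_signal(t) for t in t_axis]
```

The signal repeats every 0.1 s, the period of its lowest-frequency component, which gives a simple sanity check on the generated samples.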
3. A permutation entropy algorithm checks the complexity of the modal components obtained by the ESMD decomposition. This serves as the criterion for distinguishing abnormal sound from background noise, so that the effective abnormal sound components are obtained adaptively.
The permutation entropy is calculated as follows.
Given a time series $x(i)$, $i = 1, 2, \ldots, N$, of length $N$, delay reconstruction yields the vectors:
$$X(i) = \{x(i),\ x(i+l),\ \ldots,\ x(i+(m-1)l)\}, \quad i = 1, 2, \ldots, N-(m-1)l$$
where $l$ is the time delay and $m$ is the reconstruction dimension. Sorting the $m$ elements of $X(i)$ in ascending order gives:
$$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \cdots \le x(i+(j_m-1)l)\}$$
Each vector $X(i)$ therefore has an associated permutation sequence:
$$S_g = \{j_1, j_2, j_3, \ldots, j_m\}$$
where each $j$ is the column index of an element within the reconstructed vector.
For $m$ distinct symbols there are $m!$ possible permutations. Let $p_1, p_2, \ldots, p_k$ ($k \le m!$) be the probabilities with which the permutations occur among the vectors $X(i)$. The normalized permutation entropy is then:
$$H = -\frac{1}{\ln(m!)}\sum_{g=1}^{k} p_g \ln p_g$$
where $N$ is the length of the time series, $m$ is the reconstruction dimension and $l$ is the time delay. According to experimental results, the reconstruction dimension $m$ is generally chosen between 3 and 7. The time delay has little effect on the permutation entropy and can generally be set to 1.
In the invention, the abnormal sound signal with the added random T-distributed sequence is decomposed into modal components at different frequency scales, and each mode is kept or discarded according to whether its permutation entropy $H$ is greater than the threshold $\theta$. Experiments show that abnormal sound features are extracted well when $\theta$ lies in the range 0.25 to 0.35.
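The keep-or-discard rule can be sketched in Python as follows. It follows the stopping logic of step 2.5, where decomposition ends at the first mode judged to be noise. The entropy values are hypothetical, and `theta=0.3` is simply a value inside the reported 0.25 to 0.35 range.

```python
def select_effective_modes(entropies, theta=0.3):
    """Return the orders of the leading modes whose permutation entropy exceeds theta.

    Selection stops at the first mode with entropy <= theta, which is treated
    as a noise component.
    """
    kept = []
    for order, h in enumerate(entropies, start=1):
        if h <= theta:
            break
        kept.append(order)
    return kept


# Hypothetical permutation entropies of modes 1 to 5.
example_entropies = [0.82, 0.61, 0.44, 0.27, 0.35]
selected = select_effective_modes(example_entropies, theta=0.3)
```

Here modes 1 to 3 are kept; mode 4 falls below the threshold, so mode 5 is never considered even though its entropy is above it.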
4. Number of sifting iterations for the modal components
The best number of sifting iterations was determined through a large number of experiments; the preferred value is 12.
Using the above improvements, the invention extracts and identifies abnormal sound features in public places. As shown in Fig. 1, the method has three main parts: decomposition of the abnormal sound to be identified, feature extraction, and recognition.
The specific steps of the method are as follows:
Step 1: Input the abnormal sound signal to be identified and preprocess it.
Step 2: Decompose the abnormal sound signal to be identified into a series of modal components with the improved extreme-point symmetric mode decomposition (D-ESMD). Each modal component contains the features of the abnormal sound signal in a different frequency band.
Step 3: Compute the energy ratio of each modal component obtained in step 2 relative to the original abnormal sound signal, combine the ratios into a vector and normalize it. The result is the feature vector of the abnormal sound signal to be identified.
Step 4: Check whether the feature vector is valid. If it is invalid, go to step 3; if it is valid, go to step 5.
Step 5: Identify the abnormal sound. First, randomly select a certain number of training samples of each class from an existing abnormal sound library, compute their feature vectors with steps 2 and 3, and build an SVM classification model. Then use the SVM classification model to classify the feature vector of the abnormal sound to be identified and obtain the recognition result.
The specific steps by which D-ESMD extracts the features of the abnormal sound to be identified are as follows:
Step 2.1: Determine the number of times $N$ that T-distributed random noise is added.
Step 2.2: Let the abnormal sound signal to be identified be $x$. Add a random T-distributed sequence to $x$ to obtain the noisy abnormal sound signal $X_i$.
Step 2.3: Find the extreme points of the noisy signal $X_i$, connect adjacent extreme points, and mark the midpoints of the line segments as $F_i$. Add the left and right boundary points $F_0$ and $F_n$. Construct the interpolation curve $L^*$ through the $n+1$ extreme midpoints using symmetric midpoint interpolation in place of the extreme-midpoint odd-even interpolation of ESMD.
Step 2.4: Take $X_i - L^*$ as the input and repeat step 2.3 until the number of sifting iterations reaches the maximum of 12, which gives the first-order modal component $M_1^i$. Compute the permutation entropy of the modal component. If it is greater than the agreed threshold $\theta$, the component is treated as an abnormal sound modal component; otherwise it is treated as a noise component.
Step 2.5: If $M_1^i$ is an abnormal sound modal component, take $X_i - M_1^i$ as the input signal and repeat steps 2.3 and 2.4 until the decomposed modal component $M_n^i$ is a noise component.
Step 2.6: If $i < N$, set $i = i + 1$ and repeat steps 2.2 to 2.5 with a different T-distributed noise signal each time, until $N$ decompositions have been performed. Take the ensemble average of all the modal components obtained and use the result as the final modal component $M_k$ of the signal:
$$M_k = \frac{1}{N}\sum_{i=1}^{N} M_k^i$$
In the above formula, $k$ is the order of the modal component and $N$ is the number of times noise is added.
Fig. 4 compares the ROC curves of the invention and several other abnormal sound feature extraction methods. ESMD is extreme-point symmetric mode decomposition, EEMD is ensemble empirical mode decomposition, SaSEEMD is ensemble empirical mode decomposition based on the α distribution, and ELMD is ensemble local mean decomposition. D-ESMD is the improved ESMD decomposition method proposed by the invention.
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610674982.1A CN106228979B (en) | 2016-08-16 | 2016-08-16 | Method for extracting and identifying abnormal sound features in public places |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610674982.1A CN106228979B (en) | 2016-08-16 | 2016-08-16 | Method for extracting and identifying abnormal sound features in public places |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106228979A (en) | 2016-12-14 |
| CN106228979B (en) | 2020-01-10 |
Family
ID=57552521
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610674982.1A Expired - Fee Related CN106228979B (en) | 2016-08-16 | 2016-08-16 | Method for extracting and identifying abnormal sound features in public places |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106228979B (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106683687A (en) * | 2016-12-30 | 2017-05-17 | 杭州华为数字技术有限公司 | Abnormal voice classifying method and device |
| CN107527617A (en) * | 2017-09-30 | 2017-12-29 | 上海应用技术大学 | Monitoring method, apparatus and system based on voice recognition |
| CN107525671A (en) * | 2017-07-28 | 2017-12-29 | 中国科学院电工研究所 | A kind of wind-powered electricity generation driving-chain combined failure character separation and discrimination method |
| CN108182950A (en) * | 2017-12-28 | 2018-06-19 | 重庆大学 | The abnormal sound in public places feature decomposition and extracting method of improved experience wavelet transformation |
| CN109258509A (en) * | 2018-11-16 | 2019-01-25 | 太原理工大学 | A kind of live pig abnormal sound intelligent monitor system and method |
| CN110910897A (en) * | 2019-12-05 | 2020-03-24 | 四川超影科技有限公司 | A feature extraction method for motor abnormal sound recognition |
| CN111461090A (en) * | 2020-06-17 | 2020-07-28 | 杭州云智声智能科技有限公司 | Sound vibration signal processing method and system based on environment sample basic cloud model |
| CN112906578A (en) * | 2021-02-23 | 2021-06-04 | 北京建筑大学 | Bridge time sequence displacement signal denoising method |
| CN114710419A (en) * | 2022-02-21 | 2022-07-05 | 上海交通大学 | Switching power supply sound-based equipment working state single-point monitoring method and device and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102522082A (en) * | 2011-12-27 | 2012-06-27 | 重庆大学 | Recognizing and locating method for abnormal sound in public places |
| CN103730109A (en) * | 2014-01-14 | 2014-04-16 | 重庆大学 | Method for extracting characteristics of abnormal noise in public places |
| CN105125204A (en) * | 2015-07-31 | 2015-12-09 | 华中科技大学 | Electrocardiosignal denoising method based on ESMD (extreme-point symmetric mode decomposition) method |
| US20160171975A1 (en) * | 2014-12-11 | 2016-06-16 | Mediatek Inc. | Voice wakeup detecting device and method |
Non-Patent Citations (1)
| Title |
|---|
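| Zhou Taotao et al.: "Wavelet threshold denoising method for fault data based on CEEMD and permutation entropy", Journal of Vibration and Shock (《振动与冲击》) * |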
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106683687B (en) * | 2016-12-30 | 2020-02-14 | 杭州华为数字技术有限公司 | Abnormal sound classification method and device |
| CN106683687A (en) * | 2016-12-30 | 2017-05-17 | 杭州华为数字技术有限公司 | Abnormal voice classifying method and device |
| CN107525671A (en) * | 2017-07-28 | 2017-12-29 | 中国科学院电工研究所 | A kind of wind-powered electricity generation driving-chain combined failure character separation and discrimination method |
| CN107527617A (en) * | 2017-09-30 | 2017-12-29 | 上海应用技术大学 | Monitoring method, apparatus and system based on voice recognition |
| CN108182950B (en) * | 2017-12-28 | 2021-05-28 | 重庆大学 | An Improved Empirical Wavelet Transform Method for Decomposition and Extraction of Abnormal Sounds in Public Places |
| CN108182950A (en) * | 2017-12-28 | 2018-06-19 | 重庆大学 | The abnormal sound in public places feature decomposition and extracting method of improved experience wavelet transformation |
| CN109258509A (en) * | 2018-11-16 | 2019-01-25 | 太原理工大学 | A kind of live pig abnormal sound intelligent monitor system and method |
| CN110910897A (en) * | 2019-12-05 | 2020-03-24 | 四川超影科技有限公司 | A feature extraction method for motor abnormal sound recognition |
| CN110910897B (en) * | 2019-12-05 | 2023-06-09 | 四川超影科技有限公司 | Feature extraction method for motor abnormal sound recognition |
| CN111461090A (en) * | 2020-06-17 | 2020-07-28 | 杭州云智声智能科技有限公司 | Sound vibration signal processing method and system based on environment sample basic cloud model |
| CN112906578A (en) * | 2021-02-23 | 2021-06-04 | 北京建筑大学 | Bridge time sequence displacement signal denoising method |
| CN112906578B (en) * | 2021-02-23 | 2023-09-05 | 北京建筑大学 | A Method for Denoising Bridge Time Series Displacement Signals |
| CN114710419A (en) * | 2022-02-21 | 2022-07-05 | 上海交通大学 | Switching power supply sound-based equipment working state single-point monitoring method and device and storage medium |
| CN114710419B (en) * | 2022-02-21 | 2023-07-28 | 上海交通大学 | Method, device and storage medium for single-point monitoring of equipment working status based on sound of switching power supply |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106228979B (en) | 2020-01-10 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN106228979A (en) | A kind of abnormal sound in public places feature extraction and recognition methods | |
| CN109065030B (en) | Ambient sound recognition method and system based on convolutional neural network | |
| CN104795064B (en) | The recognition methods of sound event under low signal-to-noise ratio sound field scape | |
| CN102890930B (en) | Speech emotion recognizing method based on hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model | |
| CN109784410B (en) | Characteristic extraction and classification method for ship radiation noise signals | |
| CN109800700A (en) | A kind of underwater sound signal target classification identification method based on deep learning | |
| CN113221673A (en) | Speaker authentication method and system based on multi-scale feature aggregation | |
| CN110767216A (en) | Voice recognition attack defense method based on PSO algorithm | |
| CN107300971B (en) | The intelligent input method and system propagated based on osteoacusis vibration signal | |
| CN104135327A (en) | Spectrum sensing method based on support vector machine | |
| CN106328120B (en) | Method for extracting abnormal sound features of public places | |
| CN110211604A (en) | A kind of depth residual error network structure for voice deformation detection | |
| CN103871424A (en) | Online speaking people cluster analysis method based on bayesian information criterion | |
| CN110808067A (en) | Low signal-to-noise ratio sound event detection method based on binary multiband energy distribution | |
| CN116908618A (en) | A method for diagnosing AC series arc fault in low-voltage distribution network | |
| CN110610722B (en) | Short-time energy and Mel cepstrum coefficient combined novel low-complexity dangerous sound scene discrimination method based on vector quantization | |
| CN106899357B (en) | A camouflaged covert underwater communication device simulating dolphin whistle | |
| CN111325143A (en) | A method for underwater target recognition under the condition of unbalanced dataset | |
| CN101308651B (en) | Detection method of audio transient signal | |
| CN116186524A (en) | A self-supervised machine abnormal sound detection method | |
| CN110247714B (en) | Bionic hidden underwater acoustic communication coding method and device integrating camouflage and encryption | |
| CN120408373A (en) | Transmission Line Fault Diagnosis Method Based on Time-Frequency Multi-Level Fusion and Bidirectional Time Series Enhanced Network | |
| CN107170442A (en) | Multi-parameters optimization method based on self-adapted genetic algorithm | |
| CN103544953A (en) | Sound environment recognition method based on background noise minimum statistic feature | |
| CN115910073B (en) | Voice fraud detection method based on bidirectional attention residual error network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200110 |