WO2008106852A1

WO2008106852A1 - A method and device for determining the classification of non-noise audio signal

Info

Publication number: WO2008106852A1
Application number: PCT/CN2007/003985
Authority: WO
Inventors: Qin Yan; Haojiang Deng; Jun Wang; Xuewen Zeng; Jun Zhang; Libin Zhang; Zhe Wang
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2007-03-02
Filing date: 2007-12-29
Publication date: 2008-09-12
Anticipated expiration: 2009-09-02
Also published as: CN101256772A; CN101256772B

Abstract

A method for determining the classification of a non-noise audio signal includes: first extracting the spectral feature parameters of the non-noise audio signal; then determining, in the frequency domain, the classification of the current non-noise audio signal according to those spectral feature parameters and the defined feature-parameter thresholds.

Description

Method and apparatus for determining the attribution category of a non-noise audio signal

Technical Field

The present invention relates to the field of communications, and in particular to techniques for determining the attribution category of a useful signal.

Background of the Invention

With the development of broadband technology, audio signals have become diverse: they include not only speech but also music, unvoiced sound, and various kinds of noise. Speech, music, and unvoiced audio signals are collectively referred to as non-noise audio signals, and the various noise signals are referred to as noise audio signals. In order to apply a suitable codec algorithm to each kind of audio signal, the attribution category of a non-noise audio signal must be determined before the signal is encoded or decoded.

In the field of audio signal processing, some existing encoders can discriminate between music signals and speech signals, for example AMR-WB (Adaptive Multi-Rate Wideband) and SMV (Selectable Mode Vocoder). Their basic approach is as follows: before the audio signal is encoded or decoded, the time-domain feature parameters used by the codec are extracted; these time-domain feature parameters are then used to discriminate the music signals and the speech signals in the audio signal.

This discrimination process can use only the time-domain feature parameters involved in the coding algorithm. A method of this kind therefore depends on the coding algorithm and is neither independent nor portable.

Summary of the Invention

Embodiments of the present invention provide a method and an apparatus for determining the attribution category of a non-noise audio signal that do not depend on a coding algorithm.

Embodiments of the present invention are implemented by the following technical solutions.

An embodiment of the present invention provides a method for determining the attribution category of a non-noise audio signal, including:

obtaining spectral feature parameters of the non-noise audio signal; and

determining, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and set feature-parameter thresholds.

An embodiment of the present invention further provides an apparatus for determining the attribution category of a non-noise audio signal, including:

a feature parameter acquiring unit, configured to obtain spectral feature parameters of the non-noise audio signal; and

an attribution category determining unit, configured to determine, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and set feature-parameter thresholds.

An embodiment of the present invention further provides an unvoiced discriminating device, including:

a first acquiring unit, configured to obtain spectral feature parameters of the audio signal; and

an unvoiced discriminating unit, configured to decide whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following obtained feature parameters and the corresponding feature-parameter thresholds: the time-domain zero-crossing rate zcr; the ratio of low-band energy to full-band energy ratio1.

An embodiment of the present invention further provides a speech discriminating device, including:

a second acquiring unit, configured to obtain spectral feature parameters of the audio signal; and

a speech discriminating unit, configured to decide whether the current non-noise audio signal belongs to the speech category according to one or more of the following obtained feature parameters and the corresponding feature-parameter thresholds: the spectral flux flux; the spectral flux variance var_flux; the moving average of the spectral flux variance flux_var_mov; the time-domain zero-crossing rate zcr; the x% spectral rolloff Rolloff_x.

An embodiment of the present invention further provides a music discriminating device, including:

a third acquiring unit, configured to obtain spectral feature parameters of the audio signal; and

a music discriminating unit, configured to decide whether the current non-noise audio signal belongs to the music category according to one or more of the following obtained feature parameters and the corresponding feature-parameter thresholds: the moving average of the spectral flux variance flux_var_mov; the x% spectral rolloff Rolloff_x.

As can be seen from the specific implementations provided by the above embodiments, the attribution category of the current non-noise audio signal is determined according to the spectral feature parameters of the non-noise audio signal. The embodiments therefore do not depend on a coding algorithm and are independent and portable.

Brief Description of the Drawings

FIG. 1 is a flowchart of a first embodiment provided by the present invention;

FIG. 2 is a flowchart of the initial decision logic in the first embodiment provided by the present invention;

FIG. 3 is a flowchart of the correction decision logic in the first embodiment provided by the present invention;

FIG. 4 is a schematic structural diagram of a second embodiment provided by the present invention;

FIG. 5 is a schematic structural diagram of a third embodiment provided by the present invention;

FIG. 6 is a schematic structural diagram of a fourth embodiment provided by the present invention;

FIG. 7 is a schematic structural diagram of a fifth embodiment provided by the present invention.

Detailed Description of the Embodiments

The first embodiment provided by the present invention is a method for determining the attribution category of a non-noise audio signal. Its implementation process is shown in FIG. 1 and includes:

Step S100: obtain the spectral feature parameters of the non-noise audio signal.

The spectral parameters of an input non-noise audio signal mainly comprise short-term feature parameters and their quasi-long-term feature parameters. The short-term feature parameters include: the spectral flux (flux); the 95% spectral rolloff (Rolloff); the x% spectral rolloff Rolloff_x (for example, the 50% spectral rolloff Rolloff_half); the ratio of low-band energy to full-band energy, ratio1; the time-domain zero-crossing rate, zcr; and the frequency-domain zero-crossing rate, fzcr. The quasi-long-term feature parameters are the variances and moving averages of the short-term feature parameters, for example the spectral flux variance flux_var, the moving average of the spectral flux variance flux_var_mov, and the spectral rolloff variance rolloff_var.

In the first embodiment, these feature parameters are computed over 10 frames, that is, a duration of 100 ms. Their definitions and calculation formulas are given below.

Let x(i) denote the i-th time-domain sample of a frame of the sound signal, where 0 ≤ i < M; M is the number of samples in a frame; T is the number of frames; U_pw_i is the signal spectrum of the i-th frame; N is the length of the FFT (Fast Fourier Transform); flux(i) is the spectral flux of the i-th frame; and \overline{flux}_i, \overline{U\_pw}_i, and \overline{Rolloff}_i are the moving averages of the spectral flux, the spectrum, and the spectral rolloff at the i-th frame. The feature parameters are described in detail below, taking a sound signal sampled at 16 kHz as an example.

1. Spectral flux (flux), the derived spectral flux variance (flux_var), and the moving average of the spectral flux variance (flux_var_mov).

The spectral flux feature parameter describes the change from frame to frame. For a music signal, flux is relatively low and steady, whereas for a speech signal flux is usually relatively high and varies strongly. It can be calculated using Formula 1; the spectral flux variance flux_var and the moving average of the spectral flux variance flux_var_mov are calculated using Formula 2 and Formula 3, respectively:

(Formula 1; equation not preserved)

(Formula 2; equation not preserved)

flux\_var\_mov(i) = \overline{var\_flux}_i = \frac{1}{10} \sum_{k=i-9}^{i} var\_flux(k)

Formula 3

where norm(·) is a normalization function.
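As a rough illustration of Formula 3, the following Python sketch keeps a running mean of the most recent spectral flux variance values. The function name `moving_average` is hypothetical, and averaging over fewer than 10 values at the start of the stream is an assumption; the text does not say how the first frames are handled.

```python
from collections import deque


def moving_average(values, window=10):
    """Running mean over the last `window` values (10 frames in the text).

    Assumption: while fewer than `window` values are available, the mean
    is taken over the values seen so far.
    """
    recent = deque(maxlen=window)  # automatically drops the oldest value
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages
```

Feeding the per-frame flux variance into `moving_average` would give one flux_var_mov value per frame.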

2. Ratio of low-band energy to full-band energy, ratio1.

This feature parameter describes the proportion of the total energy that lies in the low-frequency sub-band. The ratio1 of a speech signal is usually relatively high, and that of a music signal relatively low. It is calculated as shown in Formula 4:

(Formula 4; equation not preserved)
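Because Formula 4 itself is not legible here, the following Python sketch only illustrates the verbal definition of ratio1: energy in a low band divided by energy in the full band. The function name and the `low_bins` cutoff are hypothetical; the text does not state where the low band ends.

```python
def low_band_energy_ratio(power_spectrum, low_bins):
    """Fraction of the total spectral energy found in the first `low_bins` bins.

    `low_bins` is a hypothetical cutoff; the source does not give the
    boundary of the low-frequency sub-band.
    """
    total = sum(power_spectrum)
    if total == 0:
        return 0.0  # silent frame: define the ratio as zero (assumption)
    return sum(power_spectrum[:low_bins]) / total
```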

3. 95% spectral rolloff (Rolloff), 50% spectral rolloff (Rolloff_half), and spectral rolloff variance (rolloff_var).

Rolloff is the position of the point below which 95% of the full-band energy lies; Rolloff_half is the position of the point below which 50% of the full-band energy lies.

The rolloff point of a speech signal is usually relatively low, and that of a music signal relatively high. Rolloff and rolloff_var are calculated as shown in Formula 5 and Formula 6, respectively:

Rolloff(i) = \max\left\{ K : \sum_{k \le K} U\_pw_i(k) < 0.95 \sum_{l} U\_pw_i(l) \right\}

Formula 5

rolloff\_var(i) = \frac{1}{m} \sum_{k=i-m}^{i} \left( Rolloff(k) - \overline{Rolloff}_i \right)^2

Formula 6

Rolloff_half is calculated as shown in Formula 7:

Rolloff\_half(i) = \max\left\{ K : \sum_{k \le K} U\_pw_i(k) \le 0.5 \sum_{l} U\_pw_i(l) \right\}

Formula 7
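The rolloff definitions in Formula 5 and Formula 7 can be sketched in Python as a search for the largest bin whose cumulative energy stays within a fraction of the total. This is a minimal sketch: the function name, the `inclusive` switch (strict comparison as in Formula 5, non-strict as in Formula 7), and returning bin 0 when even the first bin exceeds the limit are assumptions.

```python
def spectral_rolloff(power_spectrum, fraction=0.95, inclusive=False):
    """Largest bin index K whose cumulative energy is within `fraction` of the total.

    inclusive=False uses a strict '<' comparison (as in Formula 5);
    inclusive=True uses '<=' (as in Formula 7).
    """
    limit = fraction * sum(power_spectrum)
    cumulative = 0.0
    rolloff_bin = 0
    for k, energy in enumerate(power_spectrum):
        cumulative += energy
        within = cumulative <= limit if inclusive else cumulative < limit
        if not within:
            break  # energies are non-negative, so later bins cannot qualify
        rolloff_bin = k
    return rolloff_bin
```

The result is a bin index; converting it to a frequency would need the sampling rate and FFT length.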

4. Time-domain zero-crossing rate, zcr.

This feature parameter is mainly used to detect unvoiced sound. Because unvoiced sounds occur intermittently in speech, speech exhibits a higher zcr than music. It is calculated as shown in Formula 8:

zcr = \sum_{i} II\{ x(i)\, x(i-1) < 0 \}

Formula 8

In Formula 8, the function II{A} equals 1 when A is true and 0 when A is false.
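Formula 8 counts sign changes between consecutive samples, which can be sketched in Python as follows. The function name is hypothetical, and the count is left unnormalized, which is an assumption about the garbled formula.

```python
def zero_crossing_count(samples):
    """Count adjacent sample pairs whose product is negative (a sign change)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev * cur < 0)
```

A sample that is exactly zero yields a product of zero and is therefore not counted, matching the strict inequality in the indicator function.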

5. Frequency-domain zero-crossing rate, fzcr.

fzcr measures how strongly the energy of a frame fluctuates across frequencies in the frequency domain. For a speech signal, fzcr can be regarded as a rough indicator of formants. It is obtained as follows: at least one segment of the spectrum of the non-noise audio frame is extracted; each extracted spectral segment is normalized; the mean is removed from the normalized spectral segment; and the zero-crossing rate of the resulting spectral segment is calculated. Specifically, it can be calculated using Formulas 9 to 13:

U\_avg_i(t) = \frac{1}{N2(t) - N1(t)} \sum_{n=N1(t)}^{N2(t)} U\_pw_i(n)

Formula 9

For n \in [N1(t), N2(t)]:

U\_mov_i(t, n) = U\_mov0_i(n) - U\_avg_i(t)

Formula 10

where U_mov0_i(n) is given by Formula 11:

U\_mov0_i(n) = [U\_pw_i(n) + U\_pw_i(n-1) + U\_pw_i(n+1)] / 3

Formula 11

It follows that:

K(t) = \frac{1}{N2(t) - N1(t)} \sum_{n=N1(t)+1}^{N2(t)} II\{ U\_mov_i(t, n)\, U\_mov_i(t, n-1) < 0 \}

Formula 12

fzcr(i) = \sum_{t} K(t)

Formula 13

where N1 and N2 are the start and end points of the frequency-domain sub-bands, for example N1 = [188 Hz, 1500 Hz, 2500 Hz, 3750 Hz] and N2 = [1500 Hz, 2500 Hz, 3750 Hz, 8000 Hz]; U_pw_i is the signal spectrum of the i-th frame; U_mov_i(t, n) is the moving average of sub-band t of the i-th frame; and T is the number of frames.
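The fzcr procedure (smooth each sub-band with a 3-point average, subtract the sub-band mean, count sign changes, sum over sub-bands) can be sketched in Python as below. This is a sketch under several assumptions: sub-bands are given as inclusive bin-index pairs rather than frequencies in Hz, neighbours are clamped at the spectrum edges, and each sub-band count is divided by its width N2 - N1, since the normalization factor is not legible in the source.

```python
def fzcr(power_spectrum, bands):
    """Frequency-domain zero-crossing measure summed over sub-bands.

    `bands` is a list of (n1, n2) inclusive bin-index pairs (hypothetical
    representation; the text gives the band edges in Hz).
    """
    last = len(power_spectrum) - 1
    total = 0.0
    for n1, n2 in bands:
        width = n2 - n1
        # Sub-band mean, divided by (N2 - N1) as written in Formula 9.
        mean = sum(power_spectrum[n1:n2 + 1]) / width
        centered = []
        for n in range(n1, n2 + 1):
            lo, hi = max(n - 1, 0), min(n + 1, last)  # clamp at edges (assumption)
            smoothed = (power_spectrum[lo] + power_spectrum[n] + power_spectrum[hi]) / 3
            centered.append(smoothed - mean)  # Formula 10
        crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
        total += crossings / width  # assumed normalization of Formula 12
    return total
```

Scaling a sub-band by a positive constant does not change the signs of the mean-removed values, so the normalization step described in the text is omitted in this sketch.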

After the above feature parameters are obtained, step S200 is performed: in the frequency domain, the attribution category of the current non-noise audio signal is determined according to the feature parameters of the non-noise audio signal and the set feature-parameter thresholds.

When the logical decision in step S200 is made using combinations of the above feature parameters, a preliminary logical decision is made first, giving an initial classification of the non-noise audio signal into four categories: unvoiced, speech, music, and uncertain. A correction logical decision is then made, in which the uncertain signals left by the preliminary decision are examined further so that they can be assigned to speech or music. The procedure is as follows.

The specific implementation of the preliminary logical decision is shown in FIG. 2:

Step S101: set the speech flag and the music flag to 0, that is, Speech_flag=0 and Music_flag=0. The following decisions are then made in parallel:

Step S102: determine whether the current non-noise audio signal is unvoiced according to one or more of the following feature parameters: the time-domain zero-crossing rate zcr; the ratio of low-band energy to full-band energy ratio1.

Step S103: determine whether the current non-noise audio signal is speech according to one or more of the following feature parameters: the spectral flux flux; the spectral flux variance flux_var; the moving average of the spectral flux variance flux_var_mov; the time-domain zero-crossing rate zcr.

Step S104: determine whether the current non-noise audio signal is speech according to the x% spectral rolloff Rolloff_x, for example the 50% spectral rolloff Rolloff_half.

Step S105: determine whether the current non-noise audio signal is speech according to the previous frame's unvoiced hangover flag ZCR_hangover_flag, spectral flux hangover flag Flux_hangover_flag, or spectral rolloff hangover flag Rollhalf_hangover_flag.

Step S106: determine whether the current non-noise audio signal is music according to one or more of the following feature parameters: the moving average of the spectral flux variance flux_var_mov; the x% spectral rolloff Rolloff_x.

Step S107: determine whether the current non-noise audio signal is music according to the previous frame's hangover flag for the moving average of the spectral flux variance, flux_var_mov_hangover_flag.

In step S102, if the current audio signal is determined to be unvoiced, step S108 is performed: the unvoiced hangover flag ZCR_hangover_flag is set to a first set value, for example ZCR_hangover_flag=20. Step S109 is then performed: the unvoiced identifier is output. Otherwise, step S113 is performed: Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. A specific implementation is as follows:

Determine whether one or more of the following conditions are satisfied: the time-domain zero-crossing rate zcr is greater than the time-domain zero-crossing rate threshold THR_ZCR; the low-band to full-band energy ratio ratio1 is greater than the energy-ratio threshold THR_RA. If either condition is satisfied, the current non-noise frame is determined to belong to the unvoiced category and the unvoiced hangover flag ZCR_hangover_flag is set to the first set value, for example ZCR_hangover_flag=20; otherwise, step S113 is performed, that is, Speech_flag=0 is kept.
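The unvoiced decision of step S102 reduces to two threshold comparisons joined by a logical OR. The Python sketch below illustrates this; the function name is hypothetical, and because the text gives no numeric values for THR_ZCR and THR_RA they are passed in as parameters.

```python
FIRST_SET_VALUE = 20  # example hangover value given in the text


def unvoiced_decision(zcr, ratio1, zcr_hangover_flag, thr_zcr, thr_ra):
    """Return (is_unvoiced, updated ZCR_hangover_flag).

    thr_zcr and thr_ra stand for THR_ZCR and THR_RA, whose values the
    source does not specify.
    """
    is_unvoiced = zcr > thr_zcr or ratio1 > thr_ra
    if is_unvoiced:
        zcr_hangover_flag = FIRST_SET_VALUE  # step S108
    return is_unvoiced, zcr_hangover_flag
```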

In step S103, if the current audio signal is determined to be speech, step S110 is performed: the spectral flux hangover flag Flux_hangover_flag is set to a second set value, for example Flux_hangover_flag=20. Step S112 is then performed: the speech identifier is output, that is, Speech_flag=1 is set. Otherwise, step S113 is performed: Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. A specific implementation is as follows:

Determine whether one or more of the following conditions are satisfied:

the spectral flux flux is greater than the spectral flux threshold THR_FLUX; the spectral flux variance flux_var is greater than the spectral flux variance threshold THR_FLUX_VAR; the spectral flux flux is greater than a first spectral flux variance function f1(flux_var), for example f1(flux_var) = 0.7 - 20*flux_var; the spectral flux flux is less than a second spectral flux variance function f2(flux_var), for example f2(flux_var) = 8*flux_var; zcr is greater than a function f(flux_var_mov) of the moving average of the spectral flux variance, for example f(flux_var_mov) = 60 - 2609*flux_var_mov. If any one of these conditions is satisfied, the current non-noise audio signal is determined to belong to the speech category, the spectral flux hangover flag Flux_hangover_flag is set to the second set value, for example Flux_hangover_flag=20, and Speech_flag=1 is set; otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class.
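The speech conditions of step S103 can be written as a list of comparisons of which any one suffices. The Python sketch below uses the example functions quoted in the text (0.7 - 20*flux_var, 8*flux_var, and 60 - 2609*flux_var_mov); the function name is hypothetical, and THR_FLUX and THR_FLUX_VAR are passed in because the text gives no values for them.

```python
def speech_decision(flux, flux_var, flux_var_mov, zcr, thr_flux, thr_flux_var):
    """True if any of the step S103 speech conditions holds.

    thr_flux and thr_flux_var stand for THR_FLUX and THR_FLUX_VAR,
    whose values the source does not specify.
    """
    conditions = (
        flux > thr_flux,
        flux_var > thr_flux_var,
        flux > 0.7 - 20 * flux_var,       # example f1(flux_var)
        flux < 8 * flux_var,              # example f2(flux_var)
        zcr > 60 - 2609 * flux_var_mov,   # example f(flux_var_mov)
    )
    return any(conditions)
```

When the function returns True, the surrounding logic would set Flux_hangover_flag to the second set value and Speech_flag to 1.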

In step S104, if the current audio signal is determined to be speech, step S111 is performed: the spectral rolloff hangover flag Rollhalf_hangover_flag is set to a third set value, for example Rollhalf_hangover_flag=20. Step S112 is then performed: the speech identifier is output, that is, Speech_flag=1 is set. Otherwise, step S113 is performed: Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. A specific implementation is as follows:

Determine whether the x% spectral rolloff Rolloff_half is less than the x% spectral rolloff threshold THR_ROLL. If so, the current non-noise audio signal is determined to belong to the speech category, the spectral rolloff hangover flag Rollhalf_hangover_flag is set to the third set value, for example Rollhalf_hangover_flag=20, and Speech_flag=1 is set; otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech.

In step S105, if the current audio signal is determined to be speech, step S111 is performed and the speech identifier is output, that is, Speech_flag=1 is set. Otherwise, step S113 is performed: Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech. A specific implementation is as follows:

Determine whether one or more of the following conditions are satisfied:

the unvoiced hangover flag ZCR_hangover_flag is greater than 0; the spectral flux hangover flag Flux_hangover_flag is greater than 0; the spectral rolloff hangover flag Rollhalf_hangover_flag is greater than 0.

If so, the current audio signal is considered to be speech, and Speech_flag=1 is set. Otherwise, nothing is done, that is, Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech.

In step S106, if the current audio signal is determined to be music, step S114 is performed: the hangover flag for the moving average of the spectral flux variance, flux_var_mov_hangover_flag, is set to a fourth set value, for example flux_var_mov_hangover_flag=20. Step S115 is then performed: the music identifier is output, that is, Music_flag=1 is set. Otherwise, step S116 is performed: Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class. A specific implementation is as follows:

Determine whether one or more of the following conditions are satisfied: the moving average of the spectral flux variance flux_var_mov is less than a third x% spectral rolloff function f3(Rolloff_x), for example f3(Rolloff_half) = 0.03 - 1/2400*Rolloff_half; flux_var_mov is less than a fifth set value, for example 0.005; flux_var_mov is less than a fourth x% spectral rolloff function f4(Rolloff_x), for example f4(Rolloff_half) = 1/1867*Rolloff_half - 0.0486; flux_var_mov is less than the threshold THR_FLUX_VAR_MOV for the moving average of the spectral flux variance.

If any one of these conditions is satisfied, the current non-noise audio signal is determined to belong to the music category, the hangover flag flux_var_mov_hangover_flag is set to the fourth set value, for example flux_var_mov_hangover_flag=20, and Music_flag=1 is set; otherwise, step S116 is performed, that is, Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class.
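The music conditions of step S106 follow the same pattern as the speech conditions: several comparisons on flux_var_mov, any one of which suffices. The Python sketch below uses the example functions and the example fifth set value 0.005 from the text; the function name is hypothetical, THR_FLUX_VAR_MOV is passed in because no value is given, and the unit of Rolloff_half (bin index or frequency) is not stated in this part of the text.

```python
def music_decision(flux_var_mov, rolloff_half, thr_flux_var_mov):
    """True if any of the step S106 music conditions holds.

    thr_flux_var_mov stands for THR_FLUX_VAR_MOV, whose value the
    source does not specify.
    """
    conditions = (
        flux_var_mov < 0.03 - rolloff_half / 2400,    # example f3(Rolloff_half)
        flux_var_mov < 0.005,                         # example fifth set value
        flux_var_mov < rolloff_half / 1867 - 0.0486,  # example f4(Rolloff_half)
        flux_var_mov < thr_flux_var_mov,
    )
    return any(conditions)
```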

In step S107, determine whether the hangover flag flux_var_mov_hangover_flag for the moving average of the spectral flux variance is greater than 0. If so, the current audio signal is considered to be music, and Music_flag=1 is set. Otherwise, step S116 is performed, that is, Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class.

After the above process, the decision on a non-noise audio signal may output the following flags:

Speech_flag=1, Music_flag=1, Speech_flag=0, and Music_flag=0.

Step S117 is then performed: the attribution category of the current non-noise audio signal is determined according to Speech_flag and Music_flag:

When Speech_flag=1 and Music_flag=1, the current non-noise audio signal has been assigned to both speech and music; when Speech_flag=0 and Music_flag=0, it has been assigned to neither. In either case step S118 is performed: the non-noise audio signal is decided to be an uncertain signal, UNCERTAIN.

When Speech_flag=1 and Music_flag=0, the non-noise audio signal is speech, and step S119 is performed: the non-noise audio signal is decided to be speech.

When Speech_flag=0 and Music_flag=1, the non-noise audio signal is music, and step S120 is performed: the current non-noise audio signal is decided to be music.
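The flag combination of step S117 amounts to a small truth table, sketched in Python below. The function name and the string labels are hypothetical.

```python
def combine_flags(speech_flag, music_flag):
    """Map (Speech_flag, Music_flag) to a category label as in step S117."""
    if speech_flag == music_flag:
        # Both set or both clear: the initial decision is inconclusive (S118).
        return "UNCERTAIN"
    return "SPEECH" if speech_flag else "MUSIC"  # S119 / S120
```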

For an uncertain audio signal UNCERTAIN, judged to belong to neither the speech category nor the music category, the attribution category must be further determined according to the audio environment preceding the audio signal. The specific decision method is shown in FIG. 3:

步骤 S201 , 判断当前非噪声音频信号之前的音频环境为语音音频环境，还是音乐环境； Step S201, determining that the audio environment before the current non-noise audio signal is a voice audio environment, and is also a music environment;

如杲满足 Speech— continue_counter (连续语音计数器，表示所述当前非噪声音频信号之前，连续出现的语音音频信号的个数） >THR— SPEECH 阁值，则确定当前非噪声音频信号之前的音频环境为语音音频环境； If the speech_continue_counter (continuous speech counter, the number of consecutively occurring speech audio signals before the current non-noise audio signal) >THR_SPEECH value is satisfied, the audio environment before the current non-noise audio signal is determined as Voice and audio environment;

If Music_continue_counter > THR_MUSIC is satisfied (Music_continue_counter is the continuous music counter, i.e. the number of music audio signals that previously occurred consecutively), the audio environment preceding the current non-noise audio signal is determined to be a music audio environment.

If neither Speech_continue_counter > THR_SPEECH nor Music_continue_counter > THR_MUSIC is satisfied, the audio environment preceding the current non-noise audio signal is neither a speech environment nor a music environment. Step S205 is then performed directly, i.e. the non-noise audio signal is judged to be an uncertain audio signal.
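The environment test of step S201 can be sketched as follows. This is a hedged illustration only: the function name, the return labels, and the choice to test the speech counter first are assumptions, since the text does not say what happens if both counters exceed their thresholds.

```python
def detect_environment(speech_continue_counter: int,
                       music_continue_counter: int,
                       thr_speech: int,
                       thr_music: int) -> str:
    """Classify the audio environment preceding the current signal."""
    if speech_continue_counter > thr_speech:
        return "SPEECH_ENV"   # continue with step S202
    if music_continue_counter > thr_music:
        return "MUSIC_ENV"    # continue with step S203
    return "NONE"             # go directly to step S205 (uncertain)
```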

When the audio environment preceding the current non-noise audio signal is determined to be a speech environment, step S202 is performed: according to at least one of flux, flux_var, flux_var_mov, Rolloff_var and fzcr of the current non-noise audio signal, determine whether the current non-noise audio signal is attributed to speech. If so, step S204 is performed, i.e. the current non-noise audio signal is determined to be speech and the speech signal flag is set to Speech_flag=1; otherwise step S205 is performed, i.e. the current non-noise audio signal is determined to be an uncertain audio signal.

The specific implementation of step S202 is as follows:

Determine whether at least one of the following conditions is satisfied: flux > THR_flux, flux_var > THR_flux_var, flux_var_mov > THR_flux_var_mov, Rolloff_var > THR_Rolloff_var, fzcr < THR_fzcr;

If one of the above conditions is satisfied, the current non-noise audio signal is determined to be speech and the speech signal flag is set to Speech_flag=1; otherwise, the current non-noise audio signal is determined to be an uncertain audio signal.

Here the thresholds THR_flux, THR_flux_var and THR_flux_var_mov may differ from the corresponding thresholds set in the initial decision process.
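The "at least one condition" test of step S202 can be sketched as below. The sketch assumes all five features are available and stored in dictionaries keyed by feature name; the function name and the dictionary layout are illustrative, and no threshold values are given in the text.

```python
from typing import Mapping


def is_speech_in_speech_env(features: Mapping[str, float],
                            thresholds: Mapping[str, float]) -> bool:
    """Return True if any step S202 condition holds (speech environment)."""
    return (
        features["flux"] > thresholds["flux"]
        or features["flux_var"] > thresholds["flux_var"]
        or features["flux_var_mov"] > thresholds["flux_var_mov"]
        or features["Rolloff_var"] > thresholds["Rolloff_var"]
        or features["fzcr"] < thresholds["fzcr"]  # note the reversed comparison
    )
```

A True result corresponds to step S204 (Speech_flag=1); False corresponds to step S205 (uncertain).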

When the audio environment preceding the current non-noise audio signal is determined to be a music environment, step S203 is performed: according to at least one of flux_var_mov, Rolloff_var and fzcr of the current non-noise audio signal, determine whether the current non-noise audio signal is attributed to music. If so, the current non-noise audio signal is determined to be music and the music signal flag is set to Music_flag=1; otherwise step S205 is performed, i.e. the current non-noise audio signal is determined to be an uncertain audio signal.

The specific implementation of step S203 is as follows:

Determine whether at least one of the following conditions is satisfied: flux_var_mov < THR_flux_var_mov, Rolloff_var < THR_Rolloff_var, fzcr > THR_fzcr;

If one of the above conditions is satisfied, the current non-noise audio signal is determined to be music and the music signal flag is set to Music_flag=1; otherwise, the current non-noise audio signal is determined to be an uncertain audio signal.

Here the threshold THR_flux_var_mov may differ from the corresponding threshold set in the initial decision process.
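The music-environment test uses the opposite comparison directions from the speech-environment test. A minimal sketch, assuming all three features are available in dictionaries keyed by feature name (the function name and data layout are illustrative):

```python
from typing import Mapping


def is_music_in_music_env(features: Mapping[str, float],
                          thresholds: Mapping[str, float]) -> bool:
    """Return True if any of the music-environment conditions holds."""
    return (
        features["flux_var_mov"] < thresholds["flux_var_mov"]
        or features["Rolloff_var"] < thresholds["Rolloff_var"]
        or features["fzcr"] > thresholds["fzcr"]
    )
```

A True result corresponds to setting Music_flag=1; False corresponds to step S205 (uncertain).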

For audio signals determined in steps S101 to S120 to belong to both the speech category and the music category, and for uncertain audio signals determined in steps S201 to S205 to belong to neither the speech category nor the music category, a further decision may be made as follows:

The category of the uncertain audio signal is decided according to the audio signals preceding the current non-noise audio signal. Specifically:

The category of the uncertain audio signal is taken to be the category of the audio signal immediately preceding it; or, the category of the uncertain audio signal is taken to be the category that accounts for the larger share of a segment of audio signals preceding it.
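Both fallback rules can be sketched over a list of earlier decisions. In this illustration the history is a list of category labels, oldest first; the function names, the window length parameter, and the tie-break in favour of speech are assumptions not stated in the text.

```python
from collections import Counter
from typing import Sequence


def resolve_by_previous(history: Sequence[str]) -> str:
    """Use the category of the immediately preceding audio signal."""
    return history[-1] if history else "UNCERTAIN"


def resolve_by_majority(history: Sequence[str], window: int) -> str:
    """Use the category holding the larger share of the last `window` signals."""
    if window <= 0:
        return "UNCERTAIN"
    recent = [c for c in history[-window:] if c in ("SPEECH", "MUSIC")]
    if not recent:
        return "UNCERTAIN"
    counts = Counter(recent)
    # Tie-break toward speech is an arbitrary choice made for this sketch.
    return "SPEECH" if counts["SPEECH"] >= counts["MUSIC"] else "MUSIC"
```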

For audio signals determined in steps S101 to S120 to belong to both the speech category and the music category, and for uncertain audio signals determined in steps S201 to S205 to belong to neither the speech category nor the music category, other soft-decision methods may also be used to decide the category of the uncertain audio signal, for example further classification using a GMM (Gaussian mixture model) decision.
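The text names GMM-based classification without giving details. As a rough illustration only, the sketch below compares the log-likelihood of a single scalar feature under two one-dimensional mixtures. The one-dimensional simplification, the function names, and the (weight, mean, variance) parameter layout are all assumptions; a real system would use trained, multi-dimensional models.

```python
import math
from typing import Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance), hypothetical layout


def gmm_log_likelihood(x: float, components: Sequence[Component]) -> float:
    """Log-likelihood of scalar x under a 1-D Gaussian mixture."""
    total = 0.0
    for weight, mean, var in components:
        density = math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += weight * density
    return math.log(total) if total > 0.0 else float("-inf")


def gmm_decide(x: float,
               speech_gmm: Sequence[Component],
               music_gmm: Sequence[Component]) -> str:
    """Pick the class whose mixture assigns x the higher likelihood."""
    if gmm_log_likelihood(x, speech_gmm) >= gmm_log_likelihood(x, music_gmm):
        return "SPEECH"
    return "MUSIC"
```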

The above embodiment is described using the example in which the judgments of steps S101 to S107 are performed simultaneously; the details of each step are otherwise the same as in the implementation described above and are not repeated here.

The second embodiment provided by the present invention is an apparatus for determining the category of a non-noise audio signal. Its structure is shown in FIG. 4 and includes a feature parameter acquisition unit and a category determination unit. The category determination unit includes an unvoiced discrimination subunit, a speech discrimination subunit and a music discrimination subunit, and further includes a decision subunit.

The signal interaction between the units is as follows:

The feature parameter acquisition unit acquires feature parameters of the non-noise audio signal; the feature parameters include at least one of the following:

Spectral fluctuation flux; spectral fluctuation variance flux_var; spectral fluctuation variance moving average flux_var_mov; low-band to full-band energy ratio ratio1; 95% spectral attenuation Rolloff; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half; spectral attenuation variance rolloff_var; variance of the spectral magnitude magvar; time-domain zero-crossing rate zcr; frequency-domain zero-crossing rate fzcr.

The category determination unit determines, in the frequency domain, the category of the current non-noise audio signal according to the feature parameters of the non-noise audio signal and the set feature parameter thresholds. The specific processing is as follows:

The unvoiced discrimination subunit decides whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: time-domain zero-crossing rate zcr; low-band to full-band energy ratio ratio1. The specific processing is the same as described in the first embodiment and is not described in detail here; and

The speech discrimination subunit decides whether the current non-noise audio signal belongs to the speech category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time-domain zero-crossing rate zcr; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not described in detail here; and

The music discrimination subunit decides whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not described in detail here.

When the unvoiced discrimination subunit, the speech discrimination subunit or the music discrimination subunit judges that the current non-noise audio signal belongs to neither the speech category nor the music category, the category determination unit further uses the decision subunit to determine whether a speech audio environment or a music audio environment precedes the current non-noise audio signal;

When a speech audio environment precedes the current non-noise audio signal, the speech category decision is made for the current non-noise audio signal, which so far belongs to neither speech nor music, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; spectral attenuation variance rolloff_var; frequency-domain zero-crossing rate fzcr. The specific processing is the same as described in the first embodiment and is not described in detail here.

When a music audio environment precedes the current non-noise audio signal, the music category decision is made for the current non-noise audio signal, which so far belongs to neither speech nor music, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation variance moving average flux_var_mov; spectral attenuation variance rolloff_var; frequency-domain zero-crossing rate fzcr. The specific processing is the same as described in the first embodiment and is not described in detail here.

For audio signals that the unvoiced discrimination subunit, the speech discrimination subunit or the music discrimination subunit determines to belong to both the speech category and the music category, and for uncertain audio signals that the decision subunit determines to belong to neither the speech category nor the music category, the decision subunit may make a further decision using the following method: the category of the uncertain audio signal is decided according to the audio signals preceding the current non-noise audio signal. That is, the category of the uncertain audio signal is taken to be the category of the audio signal immediately preceding it; or, the category of the uncertain audio signal is taken to be the category that accounts for the larger share of a segment of audio signals preceding it.

Other soft-decision methods may also be used to decide the category of the uncertain audio signal, for example further classification using a GMM (Gaussian mixture model) decision.

The third embodiment provided by the present invention is an unvoiced discrimination apparatus. Its structure is shown in FIG. 5 and includes a first acquisition unit and an unvoiced discrimination unit.

The first acquisition unit acquires feature parameters of the audio signal; the feature parameters include the time-domain zero-crossing rate zcr and/or the low-band to full-band energy ratio ratio1.

The unvoiced discrimination unit decides whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: time-domain zero-crossing rate zcr; low-band to full-band energy ratio ratio1. The specific processing is the same as described in the first embodiment and is not described in detail here.

The fourth embodiment provided by the present invention is a speech discrimination apparatus. Its structure is shown in FIG. 6 and includes a second acquisition unit and a speech discrimination unit;

The second acquisition unit acquires feature parameters of the audio signal; the feature parameters include one or more of the following:

Spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time-domain zero-crossing rate zcr; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half.

The speech discrimination unit decides whether the current non-noise audio signal belongs to the speech category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time-domain zero-crossing rate zcr; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not described in detail here.

The fifth embodiment provided by the present invention is a music discrimination apparatus. Its structure is shown in FIG. 7 and includes a third acquisition unit and a music discrimination unit.

The third acquisition unit acquires feature parameters of the audio signal; the feature parameters include one or more of the following:

Spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half.

The music discrimination unit decides whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, e.g. 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not described in detail here.

As can be seen from the specific implementations provided by the above embodiments of the present invention, the category of the current non-noise audio signal is determined according to the spectral feature parameters of the non-noise audio signal. The embodiments of the present invention can therefore exist independently of any coding algorithm, and are thus independent and portable.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to encompass them as well.

Claims


1. A method for determining the category of a non-noise audio signal, comprising: acquiring spectral feature parameters of the non-noise audio signal;

Determining, in the frequency domain, the category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and set feature parameter thresholds.

2. The method of claim 1 wherein the characteristic parameters comprise at least one of:

Spectral fluctuation flux; spectral fluctuation variance flux_var; spectral fluctuation variance moving average flux_var_mov; low-band to full-band energy ratio ratio1; 95% spectral attenuation Rolloff; x% spectral attenuation Rolloff_x; time-domain zero-crossing rate zcr.

3. The method of claim 2, wherein the feature parameter further comprises at least one of the following:

Spectral attenuation variance rolloff_var; frequency-domain zero-crossing rate fzcr.

4. The method according to claim 3, wherein the frequency domain zero crossing rate fzcr is obtained by:

Extracting at least one segment of the spectral signal of the non-noise audio signal;

Normalizing the extracted spectral segment; and removing the mean from the normalized spectral signal and calculating the zero-crossing rate of the resulting sequence.
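The fzcr computation described in this claim (extract a spectral segment, normalize it, remove the mean, count zero crossings) can be sketched as follows. The claim does not say how the segment is normalized or how the rate is scaled, so peak normalization and division by the number of adjacent pairs are assumptions of this sketch, as are the function and parameter names.

```python
from typing import Sequence


def frequency_domain_zcr(spectrum: Sequence[float], start: int, stop: int) -> float:
    """Zero-crossing rate of a mean-removed, normalized spectral segment."""
    segment = list(spectrum[start:stop])           # 1) extract a segment
    if len(segment) < 2:
        return 0.0
    peak = max(abs(v) for v in segment)
    if peak == 0.0:
        return 0.0
    normalized = [v / peak for v in segment]       # 2) normalize (assumed: by peak)
    mean = sum(normalized) / len(normalized)
    centered = [v - mean for v in normalized]      # 3) remove the mean
    crossings = sum(                               # 4) count sign changes
        1 for a, b in zip(centered, centered[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(centered) - 1)
```

A spectrum that alternates between low and high magnitudes crosses its mean at every bin and yields a rate of 1.0, while a flat spectrum yields 0.0.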

5. The method according to claim 2 or 3, wherein determining the category of the current non-noise audio signal according to the spectral feature parameters and the set feature parameter thresholds comprises:

Deciding whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: time-domain zero-crossing rate zcr; low-band to full-band energy ratio ratio1; and/or,

Deciding whether the current non-noise audio signal belongs to the speech category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time-domain zero-crossing rate zcr; and/or,

Deciding whether the current non-noise audio signal belongs to the speech category according to the acquired x% spectral attenuation Rolloff_x feature parameter and the corresponding feature parameter threshold; and/or,

Deciding whether the current non-noise audio signal belongs to the speech category according to the unvoiced trailing flag, the spectral fluctuation trailing flag and the spectral attenuation trailing flag of the previous frame of the audio signal; and/or,

Deciding whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x; and/or,

Deciding whether the current non-noise audio signal belongs to the music category according to the spectral fluctuation variance moving average trailing flag of the previous frame of the audio signal.

6. The method of claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the unvoiced category comprises:

Determining whether one or more of the following conditions are satisfied: whether the time-domain zero-crossing rate zcr is greater than the time-domain zero-crossing rate threshold THR_ZCR; whether the low-band to full-band energy ratio ratio1 is greater than the low-band to full-band energy ratio threshold THR_RA;

If one of the conditions is satisfied, it is determined that the current non-noise frame belongs to the unvoiced category, and the unvoiced trailing flag is set to the first set value; otherwise, it is determined that the current non-noise frame does not belong to the voice class.
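The unvoiced test in this claim is a two-condition OR. A minimal sketch, with an illustrative function name and with threshold values left to the caller because the claim does not give any:

```python
def is_unvoiced(zcr: float, ratio1: float, thr_zcr: float, thr_ra: float) -> bool:
    """True if zcr exceeds THR_ZCR or ratio1 exceeds THR_RA."""
    return zcr > thr_zcr or ratio1 > thr_ra
```

When the function returns True, the claim additionally sets the unvoiced trailing flag to the first set value; that side effect is left out of this sketch.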

7. The method according to claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the speech category according to one or more of the following acquired feature parameters: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time-domain zero-crossing rate zcr, and the corresponding feature parameter thresholds, specifically comprises:

Determining whether one or more of the following conditions are satisfied:

Whether the spectral fluctuation flux is greater than the spectral fluctuation threshold THR_FLUX; whether the spectral fluctuation variance flux_var is greater than the spectral fluctuation variance threshold THR_FLUX_VAR; whether the spectral fluctuation flux is greater than the first spectral fluctuation variance function f₁(flux_var); second spectral fluctuation variance function f₂(flux_var); whether zcr is greater than the spectral fluctuation variance moving average function f(flux_var_mov);

If one of the conditions is satisfied, it is determined that the current non-noise audio signal belongs to the speech category, and the spectral fluctuation trailing flag is set to the second set value; otherwise, it is determined that the current non-noise audio signal does not belong to the speech category.

8. The method according to claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the speech category according to the acquired x% spectral attenuation Rolloff_x feature parameter and the corresponding feature parameter threshold comprises:

Determining whether the x% spectral attenuation Rolloff_x is less than the x% spectral attenuation threshold THR_ROLL; if so, determining that the current non-noise audio signal belongs to the speech category and setting the spectral attenuation trailing flag to the third set value; otherwise, determining that the current non-noise audio signal does not belong to the speech category.
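The rolloff-based speech test and its trailing flag can be sketched together. The function name is illustrative, and returning a flag value of 0 when the condition fails is an assumption, because the claim only states what the flag becomes when the condition holds.

```python
from typing import Tuple


def rolloff_speech_decision(rolloff_x: float,
                            thr_roll: float,
                            third_set_value: int) -> Tuple[bool, int]:
    """Return (belongs_to_speech, spectral_attenuation_trailing_flag)."""
    if rolloff_x < thr_roll:
        return True, third_set_value
    # Assumption: the flag is reported as 0 when the condition is not met.
    return False, 0
```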

9. The method according to claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the speech category according to the unvoiced trailing flag, the spectral fluctuation trailing flag and the spectral attenuation trailing flag of the previous frame of the audio signal specifically comprises:

Determining whether at least one of the following conditions is satisfied: whether the unvoiced trailing flag of the previous frame of the audio signal is greater than 0; whether the spectral fluctuation trailing flag of the previous frame of the audio signal is greater than 0; whether the spectral attenuation trailing flag of the previous frame of the audio signal is greater than 0;

If one of the conditions is satisfied, it is determined that the current non-noise audio signal belongs to speech; if none of the above conditions is satisfied, it is determined that the current non-noise audio signal does not belong to speech.
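The trailing-flag test reduces to checking whether any of the previous frame's three flags is positive. A minimal sketch with an illustrative function name; how the flags are updated from frame to frame is not covered here.

```python
def speech_by_trailing_flags(unvoiced_flag: int,
                             flux_flag: int,
                             rolloff_flag: int) -> bool:
    """True if any trailing flag carried over from the previous frame is > 0."""
    return unvoiced_flag > 0 or flux_flag > 0 or rolloff_flag > 0
```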

10. The method according to claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, and the corresponding feature parameter thresholds, specifically comprises:

Determining whether one or more of the following conditions are satisfied:

Whether the spectral fluctuation variance moving average flux_var_mov is less than the third x% spectral attenuation function f₃(Rolloff_x); whether the spectral fluctuation variance moving average flux_var_mov is less than the fifth set value; whether the spectral fluctuation variance moving average flux_var_mov is less than the fourth x% spectral attenuation function f₄(Rolloff_x); whether the spectral fluctuation variance moving average flux_var_mov is less than the spectral fluctuation variance moving average threshold THR_FLUX_VAR_MOV;

If one of the conditions is satisfied, it is determined that the current non-noise audio signal belongs to the music category, and the spectral fluctuation variance moving average trailing flag is set to the fourth set value; otherwise, it is determined that the current non-noise audio signal does not belong to the music category.

11. The method according to claim 5, wherein the process of deciding whether the current non-noise audio signal belongs to the music category according to the spectral fluctuation variance moving average trailing flag of the previous frame of the audio signal specifically comprises:

Determining whether the spectral fluctuation variance moving average trailing flag of the previous frame of the audio signal is greater than 0; if so, determining that the current non-noise audio signal belongs to music; otherwise, determining that the current non-noise audio signal does not belong to music.

12. The method according to claim 5, wherein when it is determined that the current non-noise audio signal belongs to neither the speech category nor the music category, the method further comprises:

Determining whether a speech audio environment or a music audio environment precedes the current non-noise audio signal; when a speech audio environment precedes the current non-noise audio signal, determining whether one or more of the following conditions are satisfied: whether one or more of the feature parameters spectral fluctuation flux, spectral fluctuation variance var_flux, spectral fluctuation variance moving average flux_var_mov and spectral attenuation variance rolloff_var is greater than the corresponding feature parameter threshold; whether the frequency-domain zero-crossing rate fzcr is less than the corresponding feature parameter threshold; if one of the conditions is satisfied, determining that the current non-noise audio signal belongs to speech; otherwise, determining that the current non-noise audio signal does not belong to speech;

When a music audio environment precedes the current non-noise audio signal, determining whether one or more of the following conditions are satisfied: whether one or more of the feature parameters spectral fluctuation variance moving average flux_var_mov and spectral attenuation variance rolloff_var is less than the corresponding feature parameter threshold; whether the frequency-domain zero-crossing rate fzcr is greater than the corresponding feature parameter threshold; if one of the conditions is satisfied, determining that the current non-noise audio signal belongs to music; otherwise, determining that the current non-noise audio signal does not belong to music.

13. The method according to claim 5, wherein when it is determined that the current non-noise audio signal belongs to both the speech category and the music category, or belongs to neither the speech category nor the music category, the method further comprises:

Using a Gaussian mixture model to decide the category of the uncertain audio signal that belongs to both speech and music, or to neither speech nor music; or

Taking the category of the uncertain audio signal to be the category of the audio signal immediately preceding the uncertain audio signal; or

Taking the category of the uncertain audio signal to be the category that accounts for the larger share of a segment of audio signals preceding the uncertain audio signal.

14. The method according to claim 12, wherein when it is determined that the current non-noise audio signal belongs to neither the speech category nor the music category, the method further comprises:

Using a Gaussian mixture model to decide the category of the uncertain audio signal that belongs to both speech and music, or to neither speech nor music; or

15. An apparatus for determining the category of a non-noise audio signal, comprising: a feature parameter acquisition unit, configured to acquire spectral feature parameters of the non-noise audio signal;

A category determination unit, configured to determine, in the frequency domain, the category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and set feature parameter thresholds.

16. The apparatus of claim 15, wherein the spectral characteristic parameter comprises at least one of:

Spectral fluctuation flux; spectral fluctuation variance flux_var; spectral fluctuation variance moving average flux_var_mov; low-band to full-band energy ratio ratio1; 95% spectral attenuation Rolloff; x% spectral attenuation Rolloff_x; time-domain zero-crossing rate zcr.

17. The apparatus according to claim 16, wherein the spectral feature parameters further comprise at least one of: spectral attenuation variance rolloff_var; frequency-domain zero-crossing rate fzcr.

18. The apparatus according to claim 16 or 17, wherein the attribution category determining unit comprises:

an unvoiced discriminating subunit, configured to perform, according to one or more of the following acquired characteristic parameters: time domain zero-crossing rate zcr; ratio of low frequency band energy to full band energy ratio1, and the corresponding characteristic parameter thresholds, a decision of the unvoiced category on the current non-noise audio signal; and,

a speech discriminating subunit, configured to perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time domain zero-crossing rate zcr; x% spectral attenuation Rolloff_x, and the corresponding characteristic parameter thresholds, a decision of the speech attribution category on the current non-noise audio signal;

a music discriminating subunit, configured to perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, and the corresponding characteristic parameter thresholds, a decision of the music attribution category on the current non-noise audio signal.
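The three subunits of claim 18 each compare a few characteristic parameters against thresholds. The Python sketch below shows one way such threshold rules could be organized. Every threshold value, every comparison direction, and the order in which the subunits are consulted are placeholders chosen for illustration; the claims state only which parameters each subunit may use, not how they are compared. The names `Rule`, `decide`, and `classify` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    feature: str                             # parameter name, e.g. "zcr"
    compare: Callable[[float, float], bool]  # assumed comparison direction
    threshold: float                         # placeholder threshold value


def decide(features: Dict[str, float], rules: List[Rule]) -> bool:
    """A subunit votes yes only if all of its threshold rules hold."""
    return all(r.compare(features[r.feature], r.threshold) for r in rules)


# Placeholder rules: values and directions are NOT taken from the source.
UNVOICED_RULES = [Rule("zcr", operator.gt, 0.3), Rule("ratio1", operator.lt, 0.2)]
SPEECH_RULES = [Rule("flux_var_mov", operator.gt, 1.0)]
MUSIC_RULES = [
    Rule("flux_var_mov", operator.lt, 0.5),
    Rule("Rolloff_x", operator.lt, 0.5),
]


def classify(features: Dict[str, float]) -> str:
    """Consult the unvoiced, speech and music subunits in an assumed order."""
    if decide(features, UNVOICED_RULES):
        return "unvoiced"
    is_speech = decide(features, SPEECH_RULES)
    is_music = decide(features, MUSIC_RULES)
    if is_speech and not is_music:
        return "speech"
    if is_music and not is_speech:
        return "music"
    # Both or neither: left to a further decision stage.
    return "indeterminate"
```

A signal for which both subunits or neither subunit votes yes is returned as indeterminate, which corresponds to the case handled by the further decision stage of the following claim.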

19. The device according to claim 18, wherein the attribution category determining unit further comprises:

a decision subunit, configured to determine, when it is determined that the current non-noise audio signal belongs to neither the speech category nor the music category, whether a speech audio environment or a music audio environment exists before the current non-noise audio signal;

When a speech audio environment exists before the current non-noise audio signal, perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; spectral attenuation variance rolloff_var; frequency domain zero-crossing rate fzcr, and the corresponding characteristic parameter thresholds, a decision of the speech attribution category on the current non-noise audio signal;

When a music audio environment exists before the current non-noise audio signal, perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation variance moving average flux_var_mov; spectral attenuation variance rolloff_var; frequency domain zero-crossing rate fzcr, and the corresponding characteristic parameter thresholds, a decision of the speech attribution category on the current non-noise audio signal that is neither speech nor music.

20. An unvoiced discriminating device, comprising: a first acquiring unit, configured to acquire a spectral feature parameter of the audio signal;

an unvoiced discriminating unit, configured to perform, according to one or more of the following acquired characteristic parameters: time domain zero-crossing rate zcr; ratio of low frequency band energy to full band energy ratio1, and the corresponding characteristic parameter thresholds, a decision of the unvoiced category on the current non-noise audio signal.

21. A speech discriminating device, comprising:

a second acquiring unit, configured to acquire a spectral feature parameter of the audio signal;

a speech discriminating unit, configured to perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation flux; spectral fluctuation variance var_flux; spectral fluctuation variance moving average flux_var_mov; time domain zero-crossing rate zcr; x% spectral attenuation Rolloff_x, and the corresponding characteristic parameter thresholds, a decision of the speech attribution category on the current non-noise audio signal.

22. A music discriminating device, comprising:

a third acquiring unit, configured to acquire a spectral feature parameter of the audio signal;

a music discriminating unit, configured to perform, according to one or more of the following acquired characteristic parameters: spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, and the corresponding characteristic parameter thresholds, a decision of the music attribution category on the current non-noise audio signal.