WO2008106852A1 - A method and device for determining the classification of non-noise audio signal - Google Patents
A method and device for determining the classification of non-noise audio signal
- Publication number
- WO2008106852A1 (PCT/CN2007/003985)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- spectral
- var
- flux
- current non
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Definitions
- The present invention relates to the field of communications, and more particularly to techniques for determining the category to which a useful signal belongs.
- With the development of broadband technology, audio signals have diversified: they include not only speech but also music, unvoiced sound, and various kinds of noise.
- Speech, music, and unvoiced audio signals are collectively referred to as non-noise audio signals; the various noise signals are referred to as noise audio signals.
- AMR-WB: Adaptive Multi-Rate Wideband
- SMV: Selectable Mode Vocoder
- Embodiments of the present invention provide a method and apparatus for determining the attribution category of a non-noise audio signal that do not depend on the encoding algorithm.
- Embodiments of the present invention provide a method of determining the attribution category of a non-noise audio signal, including: obtaining spectral feature parameters of the non-noise audio signal;
- determining, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and the set feature parameter thresholds.
- An embodiment of the present invention further provides an apparatus for determining the attribution category of a non-noise audio signal, including: a feature parameter acquiring unit, configured to acquire spectral feature parameters of the non-noise audio signal;
- an attribution category determining unit, configured to determine, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and the set feature parameter thresholds.
- An embodiment of the present invention further provides an unvoiced discriminating device, including:
- a first acquiring unit, configured to acquire spectral feature parameters of the audio signal;
- an unvoiced discriminating unit, configured to decide whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the time domain zero-crossing rate zcr; the energy ratio of the low frequency band to the full frequency band, ratio1.
- An embodiment of the present invention further provides a voice discriminating device, including:
- a second acquiring unit, configured to acquire spectral feature parameters of the audio signal;
- a speech discriminating unit, configured to decide whether the current non-noise audio signal belongs to the voice category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the spectral fluctuation variance moving average flux_var_mov; the time domain zero-crossing rate zcr; the x% spectral attenuation Rolloff_x.
- An embodiment of the present invention further provides a music discriminating device, including:
- a third acquiring unit, configured to acquire spectral feature parameters of the audio signal;
- a music discriminating unit, configured to decide whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation variance moving average flux_var_mov; the x% spectral attenuation Rolloff_x.
- FIG. 1 is a flow chart of the first embodiment provided by the present invention.
- FIG. 3 is a flow chart of the correction decision logic in the first embodiment provided by the present invention.
- FIG. 4 is a schematic structural diagram of the second embodiment provided by the present invention.
- FIG. 5 is a schematic structural diagram of the third embodiment provided by the present invention.
- FIG. 6 is a schematic structural diagram of the fourth embodiment provided by the present invention.
- FIG. 7 is a schematic structural diagram of the fifth embodiment provided by the present invention.
- The first embodiment of the present invention provides a method for determining the attribution category of a non-noise audio signal.
- The implementation process is shown in FIG. 1 and includes:
- Step S100: Acquire spectral feature parameters of the non-noise audio signal.
- The short-term feature parameters include: the spectral fluctuation (flux); the 95% spectral attenuation (spectral rolloff); the x% spectral attenuation Rolloff_x (e.g., the 50% spectral attenuation Rolloff_half); the energy ratio of the low frequency band to the full frequency band, ratio1.
- The quasi-long-term features are the variance and moving average of each short-term feature parameter, such as the spectral fluctuation variance flux_var.
- The above feature parameters are computed over 10 frames, i.e. a duration of 100 ms; their definitions and calculation formulas are given below:
- The i-th time-domain sample of one frame of the sound signal is defined for 0 ≤ i < M; M is the number of samples in one frame; T is the number of frames; U_pw_i is the signal spectrum of the i-th frame; N is the length of the FFT (Fast Fourier Transform); flux(i) is the spectral fluctuation of the i-th frame; the spectral fluctuation moving average, the spectrum moving average, and the spectral attenuation moving average of the i-th frame are defined as well.
- The following takes a sound signal with a sampling rate of 16 kHz as an example to describe the feature parameters in detail: 1. The spectral fluctuation flux and its derived spectral fluctuation variance flux_var and spectral fluctuation variance moving average flux_var_mov.
- The spectral fluctuation flux describes the variation between successive frames. For music signals, flux is relatively low and stable, while for speech signals it is usually high and varies greatly. It can be calculated using Equation 1; the spectral fluctuation variance flux_var and the spectral fluctuation variance moving average flux_var_mov are calculated using Equation 2 and Equation 3, respectively:
- where norm(·) is a normalization function.
- 2. The energy ratio of the low frequency band to the full frequency band, ratio1.
- This feature parameter describes the ratio of the low-frequency sub-band energy to the total energy. The ratio1 of a voice signal is usually high, and the ratio1 of a music signal is usually low. It is calculated as shown in Equation 4:
- Rolloff is the position of the point that accounts for 95% of the full-band energy; Rolloff_half is the position of the point that accounts for 50% of the full-band energy.
- The calculation formula for Rolloff_half is shown in Equation 7:
- $$\mathrm{Rolloff\_half}(i) = \max\Big\{\, h : \sum_{k \le h} U\_pw_i(k) < 0.5 \sum_{l} U\_pw_i(l) \Big\} \qquad \text{(Equation 7)}$$
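Equation 7 can be read as "the largest bin index h whose cumulative spectral energy is still below half of the total". The Python sketch below illustrates that reading for a general fraction x, so the same helper covers both Rolloff (0.95) and Rolloff_half (0.5). The function name `rolloff_bin`, the 0-based bin indexing, and the use of a plain list as the power spectrum are assumptions made for illustration, not details given in the text.

```python
def rolloff_bin(power_spectrum, fraction=0.5):
    """Largest bin index h whose cumulative energy stays below fraction * total.

    Returns -1 when even the first bin already reaches the target energy.
    Assumes non-negative power values, so the cumulative sum never decreases.
    """
    target = fraction * sum(power_spectrum)
    cumulative = 0.0
    last_below = -1
    for h, value in enumerate(power_spectrum):
        cumulative += value
        if cumulative < target:
            last_below = h
        else:
            break  # cumulative energy can only grow from here on
    return last_below


flat_spectrum = [1.0, 1.0, 1.0, 1.0]
half_point = rolloff_bin(flat_spectrum, 0.5)    # Rolloff_half
high_point = rolloff_bin(flat_spectrum, 0.95)   # Rolloff
```

A signal whose energy sits in the low bins yields a small index, which matches the remark that the rolloff point of speech tends to be lower than that of music.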
- In Equation 8, the indicator function II{A} is 1 when A is true and 0 when A is false.
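An indicator function of this kind is commonly used to count sign changes in a time domain zero-crossing rate. Equation 8 itself is not reproduced in this text, so the Python sketch below only assumes that form: it counts adjacent sample pairs with opposite signs and divides by the number of pairs. The names `indicator` and `zero_crossing_rate` are hypothetical.

```python
def indicator(condition):
    """II{A}: 1 when A is true, 0 when A is false."""
    return 1 if condition else 0


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (assumed form of zcr)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(indicator(a * b < 0) for a, b in zip(samples, samples[1:]))
    return crossings / (len(samples) - 1)


alternating_rate = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
rising_rate = zero_crossing_rate([1.0, 2.0, 3.0])
```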
- fzcr measures how the energy of one frame of the signal fluctuates across frequencies in the frequency domain.
- fzcr can be regarded as a preliminary algorithm for finding formants. It can be obtained by: extracting at least one segment of the spectrum of the non-noise audio signal frame; normalizing each extracted spectrum segment; removing the mean from the normalized spectrum; and calculating the zero-crossing rate of the resulting spectrum signal. Specifically, it can be calculated using Equations 9 to 13:
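The four steps listed for fzcr (extract a spectrum segment, normalize it, remove the mean, count zero crossings) can be sketched in Python as follows. Equations 9 to 13 are not reproduced in this text, so the normalization by the segment peak, the division by the number of adjacent pairs, and the `start`/`stop` bin arguments are assumptions; the function name is hypothetical.

```python
def frequency_domain_zcr(spectrum, start, stop):
    """Zero-crossing rate of a normalized, mean-removed spectrum segment."""
    segment = spectrum[start:stop]                 # step 1: extract a segment
    peak = max(segment, default=0.0)
    if len(segment) < 2 or peak <= 0:
        return 0.0
    normalized = [v / peak for v in segment]       # step 2: normalize (assumed: by peak)
    mean = sum(normalized) / len(normalized)
    centered = [v - mean for v in normalized]      # step 3: remove the mean
    # step 4: count sign changes between adjacent bins of the centered segment
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return crossings / (len(centered) - 1)


comb_like = frequency_domain_zcr([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], 0, 6)
flat = frequency_domain_zcr([2.0, 2.0, 2.0, 2.0], 0, 4)
```

A spectrum with many alternating peaks and valleys gives a value near 1, while a flat spectrum gives 0, which is consistent with reading fzcr as a rough indicator of spectral peak structure.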
- Step S200: determine, in the frequency domain, the attribution category of the current non-noise audio signal according to the feature parameters of the non-noise audio signal and the set feature parameter thresholds.
- In step S200, when the decision logic is applied to the combination of the above feature parameters, a preliminary logical decision is performed first, which initially classifies the non-noise audio signal into four categories: unvoiced, voice, music, and uncertain. A correction decision is then performed: the uncertain signals left by the preliminary decision are examined further so that they can be attributed to voice or music, as follows:
- Step S102: Determine whether the current non-noise audio signal belongs to unvoiced sound according to one or more of the following feature parameters: the time domain zero-crossing rate zcr; the energy ratio of the low frequency band to the full frequency band, ratio1.
- Step S103: Determine whether the current non-noise audio signal belongs to voice according to one or more of the following feature parameters: the spectral fluctuation flux; the spectral fluctuation variance flux_var; the spectral fluctuation variance moving average flux_var_mov; the time domain zero-crossing rate zcr.
- Step S104: Determine whether the current non-noise audio signal belongs to voice according to the x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- Step S105: Determine whether the current non-noise audio signal belongs to voice according to the unvoiced hangover flag ZCR_hangover_flag, the spectral fluctuation hangover flag Flux_hangover_flag, or the spectral attenuation hangover flag Rollhalf_hangover_flag of the previous frame of the audio signal.
- Step S106: Determine whether the current non-noise audio signal belongs to music according to one or more of the following feature parameters: the spectral fluctuation variance moving average flux_var_mov; the x% spectral attenuation Rolloff_x.
- Step S107: Determine whether the current non-noise audio signal belongs to music according to the spectral fluctuation variance moving average hangover flag flux_var_mov_hangover_flag of the previous frame.
- the specific implementation can be carried out as follows:
- the spectral fluctuation variance moving average flux_var_mov is less than the third x%
- the following flag may be output:
- Step S117: determine the attribution category of the current non-noise audio signal according to Speech_flag and Music_flag:
- Step S118: determine that the non-noise audio signal is an uncertain signal (UNCERTAIN);
- Step S119: determine that the non-noise audio signal belongs to voice;
- Step S120: determine that the current non-noise audio signal belongs to music.
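Steps S117 to S120 combine two flags into one of three outcomes. The exact mapping from flag combinations to outcomes is not spelled out in this text, so the Python sketch below assumes the simplest one: a signal is voice or music only when exactly one flag is set, and UNCERTAIN otherwise. The function name and the string labels are hypothetical.

```python
def classify_from_flags(speech_flag, music_flag):
    """Combine Speech_flag and Music_flag into a category (assumed mapping)."""
    if speech_flag and not music_flag:
        return "SPEECH"      # step S119
    if music_flag and not speech_flag:
        return "MUSIC"       # step S120
    # Assumption: both flags set, or neither set, leaves the decision open.
    return "UNCERTAIN"       # step S118


outcomes = [classify_from_flags(s, m) for s in (False, True) for m in (False, True)]
```

Signals labeled UNCERTAIN here are the ones the later correction decision tries to attribute to voice or music.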
- Step S201: determine whether the audio environment before the current non-noise audio signal is a voice audio environment or a music audio environment;
- speech_continue_counter is the consecutive speech counter: the number of consecutive voice audio signals that occur before the current non-noise audio signal;
- with respect to the TRR_SPEECH value, the audio environment before the current non-noise audio signal is determined to be a voice audio environment
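One way to picture the environment check is a counter of consecutive voice decisions that is compared with a threshold. The Python sketch below is only an illustration under stated assumptions: the comparison direction (counter strictly greater than the threshold), the category labels, and the function names are not given in the text, and the `threshold` argument merely stands in for the value the text associates with TRR_SPEECH.

```python
def speech_continue_counter(previous_categories):
    """Number of consecutive SPEECH decisions immediately before the current signal."""
    count = 0
    for category in reversed(previous_categories):
        if category != "SPEECH":
            break
        count += 1
    return count


def is_speech_environment(previous_categories, threshold):
    """Assumed rule: a voice environment exists when the counter exceeds the threshold."""
    return speech_continue_counter(previous_categories) > threshold


history = ["MUSIC", "SPEECH", "SPEECH", "SPEECH"]
run_length = speech_continue_counter(history)
```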
- step S205 is performed directly, that is, the non-noise audio signal is judged to be an uncertain audio signal.
- The specific implementation process of step S202 is as follows:
- The thresholds THR_flux, THR_flux_var, and THR_flux_var_mov may differ from the corresponding thresholds set in the initial decision process.
- Step S204 is as follows:
- The threshold THR_flux_var_mov may differ from the corresponding threshold set in the initial decision process.
- The attribution category of the uncertain audio signal is decided based on the audio signals preceding the current non-noise audio signal, as follows:
- A soft decision method is applied to the uncertain audio signal to decide its attribution category, for example further classification using a GMM (Gaussian mixture model) decision method.
- The second embodiment of the present invention is an apparatus for determining the attribution category of a non-noise audio signal. Its structure is shown in FIG. 4 and includes a feature parameter obtaining unit and an attribution category determining unit.
- The attribution category determining unit includes an unvoiced discriminating subunit, a voice discriminating subunit, and a music discriminating subunit, and further includes a determining subunit.
- The feature parameter obtaining unit acquires feature parameters of the non-noise audio signal; the feature parameters include at least one of the following:
- spectral fluctuation flux;
- spectral fluctuation variance flux_var;
- spectral fluctuation variance moving average flux_var_mov;
- energy ratio of the low frequency band to the full frequency band, ratio1;
- 95% spectral attenuation Rolloff;
- x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half;
- spectral attenuation variance rolloff_var;
- spectral amplitude variance magvar;
- time domain zero-crossing rate zcr;
- frequency domain zero-crossing rate fzcr.
- The attribution category determining unit determines, in the frequency domain, the attribution category of the current non-noise audio signal according to the feature parameters of the non-noise audio signal and the set feature parameter thresholds.
- The specific processing is as follows:
- The unvoiced discriminating subunit decides whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the time domain zero-crossing rate zcr; the energy ratio of the low frequency band to the full frequency band, ratio1.
- The voice discriminating subunit decides whether the current non-noise audio signal belongs to the voice category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the spectral fluctuation variance moving average flux_var_mov; the time domain zero-crossing rate zcr; the x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not repeated here.
- The music discriminating subunit decides whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation variance moving average flux_var_mov; the x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- the specific processing procedure is the same as that in the first embodiment, and will not be described in detail herein.
- The attribution category determining unit uses the determining subunit to determine whether a voice audio environment or a music audio environment precedes the current non-noise audio signal;
- when a music audio environment precedes the current non-noise audio signal, the attribution category of a current non-noise audio signal that is neither voice nor music is decided according to one or more of the following feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation variance moving average flux_var_mov; the spectral attenuation variance rolloff_var; the frequency domain zero-crossing rate fzcr.
- the specific processing procedure is the same as that in the first embodiment, and will not be described in detail herein.
- The uncertain audio signal may be further decided by the determining subunit using the following method: the attribution category of the uncertain audio signal is decided according to the audio signals preceding the current non-noise audio signal.
- The attribution category of the uncertain audio signal is set to the attribution category of the audio signal immediately before it, or to the category to which the larger proportion of the audio signals preceding it belong.
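The two fallback rules for an uncertain signal (copy the immediately preceding category, or take the category that makes up the larger share of the preceding signals) can be sketched in Python as follows. The function names, the string labels, and the representation of the history as a list of earlier decisions are assumptions for illustration; how many preceding signals to consider is not stated in the text.

```python
from collections import Counter


def resolve_by_previous(history):
    """Category of the audio signal immediately before the uncertain one."""
    return history[-1]


def resolve_by_majority(history):
    """Category to which the larger proportion of the preceding signals belong."""
    return Counter(history).most_common(1)[0][0]


history = ["SPEECH", "SPEECH", "MUSIC"]
by_previous = resolve_by_previous(history)
by_majority = resolve_by_majority(history)
```

The two rules can disagree, as in the example history: the last decision is music while most of the preceding decisions are voice.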
- GMM: Gaussian Mixture Model
- The third embodiment provided by the present invention is an unvoiced discriminating device. Its structure is shown in FIG. 5 and includes a first acquiring unit and an unvoiced discriminating unit.
- The first acquiring unit acquires feature parameters of the audio signal; the feature parameters include the time domain zero-crossing rate zcr and/or the energy ratio of the low frequency band to the full frequency band, ratio1.
- The unvoiced discriminating unit decides whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the time domain zero-crossing rate zcr; the energy ratio of the low frequency band to the full frequency band, ratio1.
- the specific processing procedure is the same as that in the first embodiment, and will not be described in detail herein.
- The fourth embodiment of the present invention is a voice discriminating device. Its structure is shown in FIG. 6 and includes a second acquiring unit and a voice discriminating unit.
- The second acquiring unit acquires feature parameters of the audio signal;
- the feature parameters include one or more of the following:
- spectral fluctuation flux;
- spectral fluctuation variance var_flux;
- spectral fluctuation variance moving average flux_var_mov;
- time domain zero-crossing rate zcr;
- x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- The voice discriminating unit decides whether the current non-noise audio signal belongs to the voice category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the spectral fluctuation variance moving average flux_var_mov; the time domain zero-crossing rate zcr; the x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- the specific processing procedure is the same as that in the first embodiment, and will not be described in detail herein.
- The fifth embodiment of the present invention is a music discriminating device. Its structure is shown in FIG. 7 and includes a third acquiring unit and a music discriminating unit.
- The third acquiring unit acquires feature parameters of the audio signal;
- the feature parameters include one or more of the following:
- spectral fluctuation variance moving average flux_var_mov; x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- The music discriminating unit decides whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation variance moving average flux_var_mov; the x% spectral attenuation Rolloff_x, such as the 50% spectral attenuation Rolloff_half.
- The attribution category of the current non-noise audio signal is determined according to the spectral feature parameters of the non-noise audio signal, so the embodiments of the present invention do not depend on the encoding algorithm and are therefore independent and portable.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Telephone Function (AREA)
Abstract
Description
Method and apparatus for determining the attribution category of a non-noise audio signal
TECHNICAL FIELD
The present invention relates to the field of communications, and more particularly to techniques for determining the category to which a useful signal belongs.
BACKGROUND OF THE INVENTION
With the development of broadband technology, audio signals have diversified: they include not only speech but also music, unvoiced sound, and various kinds of noise. Speech, music, and unvoiced audio signals are collectively referred to as non-noise audio signals; the various noise signals are referred to as noise audio signals. In order to apply a suitable codec algorithm to each kind of audio signal, the attribution category of a non-noise audio signal needs to be determined before it is encoded or decoded.
In the field of audio signal processing, some existing encoders can distinguish music signals from speech signals, such as AMR-WB (Adaptive Multi-Rate Wideband) and SMV (Selectable Mode Vocoder). Their basic approach is as follows: before the audio signal is encoded or decoded, the time-domain feature parameters used by the codec are extracted; these time-domain feature parameters are then used to distinguish the music signals from the speech signals in the audio signal.
It can be seen that this discrimination process can use only the time-domain feature parameters involved in the encoding algorithm. The method of determining the attribution category of the audio signal therefore depends on the encoding algorithm, and is neither independent nor portable.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a method and apparatus for determining the attribution category of a non-noise audio signal that do not depend on the encoding algorithm.
Embodiments of the present invention are implemented by the following technical solutions:
An embodiment of the present invention provides a method of determining the attribution category of a non-noise audio signal, including: obtaining spectral feature parameters of the non-noise audio signal;
determining, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and the set feature parameter thresholds.
An embodiment of the present invention further provides an apparatus for determining the attribution category of a non-noise audio signal, including: a feature parameter acquiring unit, configured to acquire spectral feature parameters of the non-noise audio signal;
an attribution category determining unit, configured to determine, in the frequency domain, the attribution category of the current non-noise audio signal according to the spectral feature parameters of the non-noise audio signal and the set feature parameter thresholds.
An embodiment of the present invention further provides an unvoiced discriminating device, including:
a first acquiring unit, configured to acquire spectral feature parameters of the audio signal;
an unvoiced discriminating unit, configured to decide whether the current non-noise audio signal belongs to the unvoiced category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the time domain zero-crossing rate zcr; the energy ratio of the low frequency band to the full frequency band, ratio1.
An embodiment of the present invention further provides a voice discriminating device, including:
a second acquiring unit, configured to acquire spectral feature parameters of the audio signal;
a voice discriminating unit, configured to decide whether the current non-noise audio signal belongs to the voice category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the spectral fluctuation variance moving average flux_var_mov; the time domain zero-crossing rate zcr; the x% spectral attenuation Rolloff_x.
An embodiment of the present invention further provides a music discriminating device, including:
a third acquiring unit, configured to acquire spectral feature parameters of the audio signal;
a music discriminating unit, configured to decide whether the current non-noise audio signal belongs to the music category according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation variance moving average flux_var_mov; the x% spectral attenuation Rolloff_x.
It can be seen from the specific implementations provided by the above embodiments that the attribution category of the current non-noise audio signal is determined according to the spectral feature parameters of the non-noise audio signal, so the embodiments of the present invention do not depend on the encoding algorithm and are therefore independent and portable.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a flow chart of the first embodiment provided by the present invention;
FIG. 2 is a flow chart of the initial decision logic in the first embodiment provided by the present invention;
FIG. 3 is a flow chart of the correction decision logic in the first embodiment provided by the present invention;
FIG. 4 is a schematic structural diagram of the second embodiment provided by the present invention;
FIG. 5 is a schematic structural diagram of the third embodiment provided by the present invention;
FIG. 6 is a schematic structural diagram of the fourth embodiment provided by the present invention;
FIG. 7 is a schematic structural diagram of the fifth embodiment provided by the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The first embodiment provided by the present invention is a method for determining the attribution category of a non-noise audio signal. Its implementation process is shown in FIG. 1 and includes:
步骤 S100, 获取非噪声音频信号的频旙特征参数。 Step S100: Acquire frequency characteristic parameters of the non-noise audio signal.
对于输入的非噪声音频信号, 其具有的频语参数, 主要包括: 短时特征参 数及其类长时特征参数。 所述短时特征参数包括: 谱波动 (flux ), 95%谱衰减 ( spectral rolloff), x%谙衰减 Rolloff— x (如 50%谱衰减( Roiloff— half ) ), 低频带 占全频带的能量比率 ratiol , 时域过零率 zcr ( zero crossing rate, zcr ); 频域过 零率 fzcr; 所述类长时特征则是各短时特征参数的方差和移动平均,如旙波动方 差 flux_var; 普波动方差移动平均 flux— var— tnov; "普衰减方差 rolloff— var。 For the input non-noise audio signal, the frequency parameters it has include: short-term feature parameters and their class-time feature parameters. The short-term characteristic parameters include: spectral fluctuation, 95% spectral rolloff, x% 谙 attenuation Rolloff-x (eg, 50 % spectral attenuation (Roiloff-half)), low frequency band occupying the full frequency band The energy ratio ratiol, the zero-crossing rate zcr (the zero crossing rate, zcr); the frequency domain zero-crossing rate fzcr; the length-time characteristic is the variance and moving average of each short-term characteristic parameter, such as the fluctuation variance flux_var; The volatility variance moving average flux—var—tnov; “the decay variance rolloff—var.
在所述第一实施例中, 取 10帧, 即 100ms的时长统计上述特征参数, 下面 给出这些特征参数的定义和计算公式: In the first embodiment, the above characteristic parameters are counted by taking 10 frames, that is, the duration of 100 ms, and the definitions and calculation formulas of these characteristic parameters are given below:
定义 表示一帧声音信号的第 i个时域采样值, 其中 0≤ <M ; M表示一 帧信号的采样值数目; T表示帧数; u jw,是第 i帧的信号频谱; N是 FFT ( Fast Fourier Transform, 快速傅立叶变换) 的长度, flux ) 为第 i帧潘波动, , TT^和 是第 i帧谱波动移动平均, 频谱移动平均和谱衰减移动平均。 下 面以采样率 16kHz的声音信号为例, 对特征参数作详细说明: 1. 谱波动 flux 及其衍生的谙波动方差 flux— var 和 i普波动方差移动平均 flux—var— mov。 Defining the i-th time-domain sample value representing a frame of the sound signal, where 0 ≤ <M; M represents the number of sample values of one frame signal; T represents the number of frames; u jw, is the signal spectrum of the ith frame; N is the FFT (Fast Fourier Transform, the length of the Fast Fourier Transform), flux is the ier wave of the i-th frame, and TT^ is the moving average of the ith frame spectrum, the moving average of the spectrum, and the moving average of the spectral attenuation. The following takes the sound signal with a sampling rate of 16 kHz as an example to describe the characteristic parameters in detail: 1. Spectral fluctuation flux and its derived 谙 fluctuation variance flux—var and i-wave variance variance moving average flux—var—mov.
The spectral flux feature describes the change from one frame to the next. For a music signal, flux is relatively low and stable, whereas for a speech signal flux is usually relatively high and varies strongly. flux is calculated with Formula 1; the spectral flux variance flux_var and the moving average of the spectral flux variance flux_var_mov are calculated with Formula 2 and Formula 3, respectively:
(Formula 1)
(Formula 2)
$flux\_var\_mov(i) = \overline{var\_flux}_i = \frac{1}{10} \sum_{k} var\_flux(k)$ (Formula 3)
where norm(·) is a normalization function, and the sum in Formula 3 runs over the 10-frame statistics window ending at frame i.
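The quasi-long-term statistics above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `FluxStats` and its method are hypothetical, the variance is taken as the population variance of the flux values in the window (Formula 2 is not reproduced here), and during the first frames the moving average divides by the number of values available rather than by 10.

```python
from collections import deque
from statistics import pvariance


class FluxStats:
    """Running flux_var and flux_var_mov over a fixed window of frames (hypothetical helper)."""

    def __init__(self, window=10):
        self.flux = deque(maxlen=window)      # most recent flux values
        self.flux_var = deque(maxlen=window)  # most recent flux_var values

    def update(self, flux):
        """Feed the flux of one frame and return (flux_var, flux_var_mov)."""
        self.flux.append(flux)
        var = pvariance(self.flux)  # assumed definition: population variance of the window
        self.flux_var.append(var)
        # Moving average of the variance over the values currently in the window
        mov = sum(self.flux_var) / len(self.flux_var)
        return var, mov
```

A steady signal yields a variance and a moving average of zero, which matches the observation that music tends to have a low, stable flux.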
2. Energy ratio of the low band to the full band, ratio1.
This feature describes the proportion of the total energy that lies in the low-frequency sub-band. The ratio1 of a speech signal is usually relatively high, and that of a music signal relatively low. It is calculated as shown in Formula 4:
(Formula 4)
3. 95% spectral rolloff (Rolloff), 50% spectral rolloff (Rolloff_half), and spectral rolloff variance (rolloff_var).
Rolloff is the position of the point that accounts for 95% of the full-band energy; Rolloff_half is the position of the point that accounts for 50% of the full-band energy.
The rolloff point of a speech signal is usually relatively low, and that of a music signal relatively high. Rolloff and rolloff_var are calculated as shown in Formula 5 and Formula 6, respectively:
$Rolloff(i) = \max\left(h \,\middle|\, \sum_{k=0}^{h} U\_pw_i(k) < 0.95 \cdot \sum_{l} U\_pw_i(l)\right)$ (Formula 5)
$rolloff\_var(i) = \frac{1}{m} \sum_{k=i-m}^{i} \left(Rolloff(k) - \overline{Rolloff}_i\right)^2$ (Formula 6)
where $\overline{Rolloff}_i$ is the moving average of the spectral rolloff at frame i.
Rolloff_half is calculated as shown in Formula 7:
$Rolloff\_half(i) = \max\left(h \,\middle|\, \sum_{k=0}^{h} U\_pw_i(k) \le 0.5 \cdot \sum_{l} U\_pw_i(l)\right)$ (Formula 7)
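As an illustration of Formulas 5 and 7, the rolloff point can be sketched as a cumulative-sum search. The function name `rolloff`, its parameters, and the convention of returning -1 when no bin qualifies are assumptions made for this sketch; the input is assumed to be a list of non-negative power values indexed by frequency bin.

```python
def rolloff(power_spectrum, fraction=0.95, strict=True):
    """Return the largest bin h whose cumulative power stays below fraction * total.

    strict=True uses '<' (as in Formula 5); strict=False uses '<=' (as in Formula 7).
    Returns -1 if even the first bin exceeds the target (assumed convention).
    """
    target = fraction * sum(power_spectrum)
    cumulative = 0.0
    last = -1
    for h, power in enumerate(power_spectrum):
        cumulative += power
        within = cumulative < target if strict else cumulative <= target
        if not within:
            break  # power is non-negative, so later bins cannot qualify
        last = h
    return last
```

For a flat four-bin spectrum, `rolloff([1, 1, 1, 1])` stops at bin 2, while `rolloff([1, 1, 1, 1], 0.5, strict=False)` stops at bin 1.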
4. Time-domain zero-crossing rate, zcr.
This feature is mainly used to detect unvoiced sound. Because unvoiced segments occur intermittently in speech, speech exhibits a higher zcr than music. It is calculated as shown in Formula 8:
$zcr = \sum_{i=1}^{M-1} II\{x(i)\,x(i-1) < 0\}$ (Formula 8)
In Formula 8, the function II{A} equals 1 when A is true and 0 when A is false.
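Formula 8 amounts to counting sign changes between adjacent samples. A minimal sketch, with the function name `zcr` chosen here for illustration and the frame assumed to be a list of sample values:

```python
def zcr(frame):
    """Count adjacent sample pairs whose product is negative (a sign change)."""
    return sum(1 for prev, cur in zip(frame, frame[1:]) if prev * cur < 0)
```

A pair that contains an exact zero has a product of 0 and is therefore not counted, which follows directly from the strict inequality in the indicator function.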
5. Frequency-domain zero-crossing rate, fzcr.
fzcr measures how strongly the energy of a frame fluctuates across frequency. For speech signals, fzcr can be regarded as a rough formant estimator. It is obtained as follows: extract at least one spectral segment of the non-noise audio signal frame; normalize each extracted segment; remove the mean from each normalized segment; and compute the zero-crossing rate of the resulting spectrum. Specifically, Formulas 9 to 13 can be used:
$U\_avg_i(t) = \frac{1}{N2(t) - N1(t)} \sum_{n=N1(t)}^{N2(t)} U\_pw_i(n)$ (Formula 9)
For n ∈ [N1(t), N2(t)]:
$U\_mov_i(t, n) = U\_mov0_i(n) - U\_avg_i(t)$ (Formula 10)
where U_mov0_i(n) is given by Formula 11:
$U\_mov0_i(n) = [U\_pw_i(n) + U\_pw_i(n-1) + U\_pw_i(n+1)] / 3$ (Formula 11)
K(t), the zero-crossing measure of sub-band t, is then formed from the sum $\sum_{n} II\{U\_mov_i(t, n) \cdot U\_mov_i(t, n-1) < 0\}$ taken over the bins n of that sub-band (Formula 12), and
$fzcr(i) = \sum_{t} K(t)$ (Formula 13)
where N1 and N2 are the boundary points of the frequency-domain sub-bands, for example N1 = [188 Hz, 1500 Hz, 2500 Hz, 3750 Hz] and N2 = [1500 Hz, 2500 Hz, 3750 Hz, 8000 Hz]; U_pw_i is the signal spectrum of the i-th frame; U_mov_i(t, n) is the moving average of sub-segment t of the i-th frame; and T is the number of frames.
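The fzcr procedure can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the function name `fzcr` is hypothetical, the sub-bands are given as (N1, N2) pairs of bin indices rather than frequencies in Hz, the per-segment normalization step is omitted, and K(t) is taken as the raw count of sign changes without any scaling factor.

```python
def fzcr(power_spectrum, bands):
    """Sum, over sub-bands, the sign changes of the smoothed, mean-removed spectrum.

    bands: list of (n1, n2) inclusive bin ranges with 1 <= n1 < n2 <= len(power_spectrum) - 2,
    so that the three-point smoothing never reads outside the spectrum.
    """
    total = 0
    for n1, n2 in bands:
        # Formula 9: sub-band average, with the divisor N2 - N1 as written in the formula
        avg = sum(power_spectrum[n1:n2 + 1]) / (n2 - n1)
        # Formulas 10 and 11: three-point smoothing, then subtract the sub-band average
        centered = [
            (power_spectrum[n - 1] + power_spectrum[n] + power_spectrum[n + 1]) / 3 - avg
            for n in range(n1, n2 + 1)
        ]
        # Formula 12 (unscaled): count sign changes between adjacent bins
        total += sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return total  # Formula 13: sum over sub-bands
```

A flat spectrum produces no crossings, while a spectrum that alternates from bin to bin produces one crossing per adjacent pair.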
Once the feature parameters have been obtained, step S200 is performed: in the frequency domain, the class of the current non-noise audio signal is determined from its feature parameters and the preset feature-parameter thresholds.
When step S200 combines the feature parameters for logical judgment, a preliminary logic decision is made first, giving an initial speech/music classification of the non-noise audio signal into four classes: unvoiced, speech, music, and uncertain. A corrective logic decision then follows, in which the signals left uncertain by the preliminary decision are judged further so that they can be assigned to speech or music. The procedure is as follows.
The preliminary logic decision is carried out as shown in Figure 2:
Step S101: set the speech flag and the music flag to 0, that is, Speech_flag=0 and Music_flag=0. The following judgments are then made simultaneously.
Step S102: judge whether the current non-noise audio signal is unvoiced, according to one or more of the following feature parameters: the time-domain zero-crossing rate zcr; the energy ratio of the low band to the full band, ratio1.
Step S103: judge whether the current non-noise audio signal is speech, according to one or more of the following feature parameters: the spectral flux flux; the spectral flux variance flux_var; the moving average of the spectral flux variance flux_var_mov; the time-domain zero-crossing rate zcr.
Step S104: judge whether the current non-noise audio signal is speech, according to the x% spectral rolloff Rolloff_x, for example the 50% spectral rolloff Rolloff_half.
Step S105: judge whether the current non-noise audio signal is speech, according to the unvoiced hangover flag ZCR_hangover_flag, the spectral flux hangover flag Flux_hangover_flag, or the spectral rolloff hangover flag Rollhalf_hangover_flag of the previous frame.
Step S106: judge whether the current non-noise audio signal is music, according to one or more of the following feature parameters: the moving average of the spectral flux variance flux_var_mov; the x% spectral rolloff Rolloff_x.
Step S107: judge whether the current non-noise audio signal is music, according to the spectral-flux-variance moving-average hangover flag flux_var_mov_hangover_flag of the previous frame.
In step S102, if the current audio signal is determined to be unvoiced, step S108 is performed: the unvoiced hangover flag ZCR_hangover_flag is set to a first set value, for example ZCR_hangover_flag=20. Step S109 is then performed, that is, the unvoiced identifier is output. Otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. This can be implemented as follows:
Judge whether one or more of the following conditions are satisfied: whether the time-domain zero-crossing rate zcr is greater than the time-domain zero-crossing-rate threshold THR_ZCR; whether the energy ratio of the low band to the full band, ratio1, is greater than the corresponding threshold THR_RA. If one of the conditions is satisfied, the current non-noise frame is determined to belong to the unvoiced class and the unvoiced hangover flag ZCR_hangover_flag is set to the first set value, for example ZCR_hangover_flag=20; otherwise, step S113 is performed, that is, Speech_flag=0 is kept.
In step S103, if the current audio signal is determined to be speech, step S110 is performed: the spectral flux hangover flag Flux_hangover_flag is set to a second set value, for example Flux_hangover_flag=20. Step S112 is then performed to output the speech identifier, that is, Speech_flag=1 is set. Otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. This can be implemented as follows:
Judge whether one or more of the following conditions are satisfied:
whether the spectral flux flux is greater than the spectral flux threshold THR_FLUX; whether the spectral flux variance flux_var is greater than the spectral flux variance threshold THR_FLUX_VAR; whether flux is greater than a first function of the spectral flux variance f1(flux_var), for example f1(flux_var) = 0.7 - 20*flux_var; whether flux is less than a second function of the spectral flux variance f2(flux_var), for example f2(flux_var) = 8*flux_var; whether zcr is greater than a function of the moving average of the spectral flux variance f(flux_var_mov), for example f(flux_var_mov) = 60 - 2609*flux_var_mov. If one of the conditions is satisfied, the current non-noise audio signal is determined to belong to the speech class, the spectral flux hangover flag Flux_hangover_flag is set to the second set value, for example Flux_hangover_flag=20, and Speech_flag=1 is set; otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class.
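The speech conditions of step S103 can be sketched as a single predicate. The function name `is_speech_s103` and the threshold parameter names are hypothetical; the linear functions use the example coefficients quoted in the text, and the threshold values THR_FLUX and THR_FLUX_VAR are not given in the source, so they are passed in by the caller.

```python
def is_speech_s103(flux, flux_var, flux_var_mov, zcr, thr_flux, thr_flux_var):
    """Return True if at least one of the step S103 speech conditions holds."""
    f1 = 0.7 - 20 * flux_var        # example first function of flux_var
    f2 = 8 * flux_var               # example second function of flux_var
    f = 60 - 2609 * flux_var_mov    # example function of flux_var_mov
    return (
        flux > thr_flux
        or flux_var > thr_flux_var
        or flux > f1
        or flux < f2
        or zcr > f
    )
```

When the predicate returns True, the procedure described above would set Flux_hangover_flag to its set value and Speech_flag to 1.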
In step S104, if the current audio signal is determined to be speech, step S111 is performed: the spectral rolloff hangover flag Rollhalf_hangover_flag is set to a third set value, for example Rollhalf_hangover_flag=20. Step S112 is then performed to output the speech identifier, that is, Speech_flag=1 is set. Otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame does not belong to the speech class. This can be implemented as follows:
Judge whether the following condition is satisfied:
whether the x% spectral rolloff Rolloff_half is less than the x% spectral rolloff threshold THR_ROLL. If so, the current non-noise audio signal is determined to belong to the speech class, the spectral rolloff hangover flag Rollhalf_hangover_flag is set to the third set value, for example Rollhalf_hangover_flag=20, and Speech_flag=1 is set; otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech.
In step S105, if the current audio signal is determined to be speech, step S111 is performed and the speech identifier is output, that is, Speech_flag=1 is set. Otherwise, step S113 is performed, that is, Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech. This can be implemented as follows:
Judge whether one or more of the following conditions are satisfied:
whether the unvoiced hangover flag ZCR_hangover_flag is greater than 0; whether the spectral flux hangover flag Flux_hangover_flag is greater than 0; and whether the spectral rolloff hangover flag Rollhalf_hangover_flag is greater than 0.
If so, the current audio signal is considered to be speech and Speech_flag=1 is set. Otherwise, nothing is done, that is, Speech_flag=0 is kept, indicating that the current non-noise frame is non-speech.
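One way to picture the hangover flags is as per-feature counters. The sketch below is an assumption-laden illustration: the text only states that a flag is set to a value such as 20 on detection and tested for being greater than 0, so the once-per-frame decrement, the dictionary layout, and the function names are all hypothetical.

```python
HANGOVER_FRAMES = 20  # example set value quoted in the text


def speech_from_hangover(hangover):
    """Step S105 test: True if any hangover counter of the previous frame is above 0."""
    return any(count > 0 for count in hangover.values())


def update_hangover(hangover, detections):
    """Reload a counter when its detector fired, otherwise count it down (assumed behaviour)."""
    for name in hangover:
        if detections.get(name, False):
            hangover[name] = HANGOVER_FRAMES
        elif hangover[name] > 0:
            hangover[name] -= 1
```

Under this reading, a single detection keeps the speech decision alive for the next 20 frames even if no other condition fires.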
In step S106, if the current audio signal is determined to be music, step S114 is performed: the spectral-flux-variance moving-average hangover flag flux_var_mov_hangover_flag is set to a fourth set value, for example flux_var_mov_hangover_flag=20. Step S115 is then performed to output the music identifier, that is, Music_flag=1 is set. Otherwise, step S116 is performed, that is, Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class. This can be implemented as follows:
Judge whether one or more of the following conditions are satisfied:
whether the moving average of the spectral flux variance flux_var_mov is less than a third function of the x% spectral rolloff f3(Rolloff_x), for example f3(Rolloff_half) = 0.03 - 1/2400*Rolloff_half; whether flux_var_mov is less than a fifth set value, for example 0.005; whether flux_var_mov is less than a fourth function of the x% spectral rolloff f4(Rolloff_x), for example f4(Rolloff_half) = 1/1867*Rolloff_half - 0.0486; whether flux_var_mov is less than the threshold THR_FLUX_VAR_MOV.
If one of the conditions is satisfied, the current non-noise audio signal is determined to belong to the music class, the spectral-flux-variance moving-average hangover flag flux_var_mov_hangover_flag is set to the fourth set value, for example flux_var_mov_hangover_flag=20, and Music_flag=1 is set; otherwise, step S116 is performed, that is, Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class.
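The music conditions of step S106 can be sketched in the same way. The function name `is_music_s106` and its parameter names are hypothetical; the coefficients are the example values quoted in the text, and THR_FLUX_VAR_MOV is not specified in the source, so it is supplied by the caller.

```python
def is_music_s106(flux_var_mov, rolloff_half, thr_flux_var_mov, fifth_value=0.005):
    """Return True if at least one of the step S106 music conditions holds."""
    f3 = 0.03 - rolloff_half / 2400     # example third function of Rolloff_half
    f4 = rolloff_half / 1867 - 0.0486   # example fourth function of Rolloff_half
    return (
        flux_var_mov < f3
        or flux_var_mov < fifth_value
        or flux_var_mov < f4
        or flux_var_mov < thr_flux_var_mov
    )
```

All four conditions compare the same quantity, flux_var_mov, against different bounds, so the predicate is equivalent to comparing it with the largest of the four bounds.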
In step S107, judge whether the spectral-flux-variance moving-average hangover flag flux_var_mov_hangover_flag is greater than 0. If so, the current audio signal is considered to be music and Music_flag=1 is set. Otherwise, step S116 is performed, that is, Music_flag=0 is kept, indicating that the current non-noise frame does not belong to the music class.
After the above process, the decision on the non-noise audio signal may output the following flags: Speech_flag=1, Music_flag=1, Speech_flag=0, and Music_flag=0.
Step S117 is then performed, that is, the class of the current non-noise audio signal is judged from Speech_flag and Music_flag:
When Speech_flag=1 and Music_flag=1, the current non-noise audio signal belongs to both speech and music; when Speech_flag=0 and Music_flag=0, it belongs to neither speech nor music. In either case step S118 is performed, that is, the non-noise audio signal is judged to be an uncertain signal (UNCERTAIN).
When Speech_flag=1 and Music_flag=0, the non-noise audio signal belongs to speech, and step S119 is performed to judge the non-noise audio signal as speech.
When Speech_flag=0 and Music_flag=1, the non-noise audio signal belongs to music, and step S120 is performed to judge the current non-noise audio signal as music.
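Steps S117 to S120 reduce to a small truth table over the two flags. A minimal sketch, in which the function name `combine_flags` and the string labels are chosen here for illustration:

```python
def combine_flags(speech_flag, music_flag):
    """Map (Speech_flag, Music_flag) to a class label as in steps S117 to S120."""
    if speech_flag == 1 and music_flag == 0:
        return "SPEECH"     # step S119
    if speech_flag == 0 and music_flag == 1:
        return "MUSIC"      # step S120
    return "UNCERTAIN"      # step S118: both flags set, or neither
```

Only the two unambiguous flag combinations yield a final class; the other two are passed on to the corrective decision.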
For an uncertain audio signal (UNCERTAIN) that has been judged to belong to neither the speech class nor the music class, its class must be determined further from the audio environment preceding the signal. The decision method is shown in Figure 3:
Step S201: judge whether the audio environment preceding the current non-noise audio signal is a speech environment or a music environment.
If Speech_continue_counter (the continuous speech counter, that is, the number of speech audio signals that occurred consecutively before the current non-noise audio signal) is greater than the threshold THR_SPEECH, the audio environment preceding the current non-noise audio signal is determined to be a speech environment.
If Music_continue_counter (the continuous music counter, that is, the number of music audio signals that occurred consecutively before it) is greater than the threshold THR_MUSIC, the audio environment preceding the current non-noise audio signal is determined to be a music environment.
If neither Speech_continue_counter > THR_SPEECH nor Music_continue_counter > THR_MUSIC is satisfied, the audio environment preceding the current non-noise audio signal is neither a speech environment nor a music environment. Step S205 is then performed directly, that is, the non-noise audio signal is judged to be an uncertain audio signal.
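The environment test of step S201 can be sketched as below. The function name `audio_environment`, the string labels, and the choice to test the speech counter first are assumptions of this sketch; the threshold values are not given in the text and are passed in by the caller.

```python
def audio_environment(speech_count, music_count, thr_speech, thr_music):
    """Classify the preceding audio environment from the two run-length counters."""
    if speech_count > thr_speech:
        return "speech"
    if music_count > thr_music:
        return "music"
    return "none"  # neither environment: the frame stays uncertain (step S205)
```

Because each counter tracks a run of consecutive frames of one class, both conditions would not normally hold at the same time, so the order of the two tests should rarely matter.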
When the audio environment preceding the current non-noise audio signal is determined to be a speech environment, step S202 is performed: judge whether the current non-noise audio signal is speech according to at least one of its flux, flux_var, flux_var_mov, Rolloff_var, and fzcr. If so, step S204 is performed, that is, the current non-noise audio signal is determined to be speech and the speech signal flag is set, Speech_flag=1; otherwise, step S205 is performed, that is, the current non-noise audio signal is determined to be an uncertain audio signal.
Step S202 is implemented as follows:
Judge whether at least one of the following conditions is satisfied: flux > THR_flux, flux_var > THR_flux_var, flux_var_mov > THR_flux_var_mov, Rolloff_var > THR_Rolloff_var, fzcr < THR_fzcr.
If one of these conditions is satisfied, the current non-noise audio signal is determined to be speech and the speech signal flag is set, Speech_flag=1; otherwise, the current non-noise audio signal is determined to be an uncertain audio signal.
Here the thresholds THR_flux, THR_flux_var, and THR_flux_var_mov may differ from the corresponding thresholds used in the initial decision.
When the audio environment preceding the current non-noise audio signal is determined to be a music environment, step S203 is performed: judge whether the current non-noise audio signal is music according to at least one of its flux_var_mov, Rolloff_var, and fzcr. If so, the current non-noise audio signal is determined to be music and the music signal flag is set, Music_flag=1; otherwise, step S205 is performed, that is, the current non-noise audio signal is determined to be an uncertain audio signal.
Step S203 is implemented as follows:
Judge whether at least one of the following conditions is satisfied: flux_var_mov < THR_flux_var_mov, Rolloff_var < THR_Rolloff_var, fzcr > THR_fzcr.
If one of these conditions is satisfied, the current non-noise audio signal is determined to be music and the music signal flag is set, Music_flag=1; otherwise, the current non-noise audio signal is determined to be an uncertain audio signal.
Here the threshold THR_flux_var_mov may differ from the corresponding threshold used in the initial decision.
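The corrective decisions for a speech environment and a music environment can be sketched together. The function name `resolve_uncertain`, the dictionary keys, and the string labels are hypothetical; the thresholds are supplied by the caller because the text does not give values and notes that they may differ from those of the initial decision.

```python
def resolve_uncertain(environment, features, thr):
    """Re-judge an uncertain frame given the preceding environment.

    features and thr are dicts keyed by 'flux', 'flux_var', 'flux_var_mov',
    'rolloff_var' and 'fzcr' (key names are assumptions of this sketch).
    """
    if environment == "speech":
        # Speech-environment conditions: any one suffices
        hit = (
            features["flux"] > thr["flux"]
            or features["flux_var"] > thr["flux_var"]
            or features["flux_var_mov"] > thr["flux_var_mov"]
            or features["rolloff_var"] > thr["rolloff_var"]
            or features["fzcr"] < thr["fzcr"]
        )
        return "speech" if hit else "uncertain"
    if environment == "music":
        # Music-environment conditions: the inequalities point the other way
        hit = (
            features["flux_var_mov"] < thr["flux_var_mov"]
            or features["rolloff_var"] < thr["rolloff_var"]
            or features["fzcr"] > thr["fzcr"]
        )
        return "music" if hit else "uncertain"
    return "uncertain"  # no clear environment (step S205)
```

In practice a caller would pass one threshold dictionary for the speech branch and possibly another for the music branch.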
For audio signals that steps S101 to S120 determined to belong to both the speech class and the music class, and for uncertain audio signals that steps S201 to S205 determined to belong to neither class, a further decision can be made. The decision method is as follows:
The class of the uncertain audio signal is decided from the audio signals preceding the current non-noise audio signal. Specifically:
The uncertain audio signal is assigned the class of the audio signal immediately preceding it; or, it is assigned the class that accounts for the largest proportion of the audio signals in a segment preceding it.
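Both fallback rules can be sketched over a list of earlier class labels. The function name `fallback_class`, the `use_majority` switch, and the handling of an empty history are assumptions of this sketch; the length of the preceding segment is left to the caller.

```python
from collections import Counter


def fallback_class(history, use_majority=False):
    """Assign an uncertain frame the previous class or the majority class of a past segment.

    history: class labels of the preceding frames, oldest first.
    """
    if not history:
        return "uncertain"  # nothing to inherit from (assumed behaviour)
    if not use_majority:
        return history[-1]  # class of the immediately preceding signal
    # Class with the largest share in the preceding segment
    return Counter(history).most_common(1)[0][0]
```

For the history `["music", "music", "speech"]`, the first rule yields speech and the second yields music, which shows how the two rules can disagree near a transition.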
For audio signals that steps S101 to S120 determined to belong to both the speech class and the music class, and for uncertain audio signals that steps S201 to S205 determined to belong to neither class, other soft-decision methods can also be used to decide the class of the uncertain audio signal, for example further classification with a GMM (Gaussian mixture model).
The above embodiment is described taking as an example the case in which the judgments of steps S101 to S107 are made simultaneously. In other cases, the details of each step are the same as in the implementation described above and are not repeated here.
本发明提供的第二实施例是一种确定非噪声音频信号归属类别的装置, 其 结构如图 4 所示, 包括: 特征参数获取单元和归属类别确定单元。 所述归属类 别确定单元包括: 清音判别子单元、 语音判别子单元和音乐判别子单元, 所述 归属类别确定单元还包括: 一判决子单元。 The second embodiment of the present invention is an apparatus for determining a home category of a non-noise audio signal, and has a structure as shown in FIG. 4, including: a feature parameter obtaining unit and a home category determining unit. The attribution class determining unit includes: a voiceless discriminating subunit, a voice discriminating subunit, and a music discriminating subunit, and the home class determining unit further includes: a determining subunit.
各个单元之间信号的交互关系如下: The interaction of signals between the various units is as follows:
所述特征参数获取单元获取非噪声音频信号的特征参数; 所述特征参数包 括如下中的至少一个: The feature parameter obtaining unit acquires a feature parameter of the non-noise audio signal; the feature parameter includes at least one of the following:
谱波动 flux; 谱波动方差 flux— var; 谱波动方差移动平均 flux— var_mov; 频带占全频带的能量比率 ratiol ; 95%谱衰减 Rolloff; x%谱衰减 Rolloff—x, 如 50%谱衰减 Rolloff_half; 谱衰减方差 rolloff一 var; 频谱幅度的方差 magvar; 时域 过零率 zcr; 频域过零率 fzcr。 Spectral fluctuation flux; spectral fluctuation variance flux_var; spectral fluctuation variance moving average flux_var_mov; The energy ratio of the frequency band to the full frequency band ratiol; 95% spectral attenuation Rolloff; x% spectral attenuation Rolloff-x, such as 50% spectral attenuation Rolloff_half; spectral attenuation variance rolloff-var; spectral amplitude variance magvar; time domain zero-crossing rate zcr; The frequency domain zero crossing rate fzcr.
所述归属类别确定单元, 在频域范围内, 根据所述非噪声音频信号的特征 参数, 以及设定的特征参数阈值, 确定当前非噪声音频信号归属类别。 具体处 理如下: The attribution category determining unit determines, in the frequency domain range, the current non-noise audio signal attribution category according to the characteristic parameter of the non-noise audio signal and the set characteristic parameter threshold. The specific treatment is as follows:
清音判别子单元, 根据获取到的如下特征参数的一个或多个, 以及相应的 特征参数阈值, 对当前非噪声音频信号进行清音归属类别的判决: 时域过零率 zcr; 低频带占全频带的能量比率 ratiol ; 具体处理过程与第一实施例中的相关描 述雷同, 这里不再详细描述。 以及, The unvoiced discriminant subunit performs a decision on the unvoiced attribution category of the current non-noise audio signal according to one or more of the obtained characteristic parameters and the corresponding characteristic parameter threshold: a time domain zero-crossing rate zcr; a low frequency band occupying the full frequency band The specific energy ratio ratio is the same as the related description in the first embodiment, and will not be described in detail here. as well as,
语音判别子单元, 根据获取到的如下特征参数中的一个或多个, 以及相应 的特征参数阈值,对当前非噪声音频信号进行语音归属类别的判决:谱波动 flux; 傳波动方差 var— flux; 谱波动方差移动平均 flux— var— mov; 时域过零率 zcr; x% 谱衰減 Rolloff—x, 如 50%谱衰减 Rolloff— half; 具体处理过程与第一实施例中的 相关描述雷同, 这里不再详细描述。 以及, The speech discriminating subunit performs a speech attribution category determination on the current non-noise audio signal according to one or more of the acquired characteristic parameters and the corresponding characteristic parameter threshold: a spectral fluctuation flux; a variability variance var-flux; Spectral fluctuation variance moving average flux_var-mov; time domain zero-crossing rate zcr; x% spectral attenuation Rolloff-x, such as 50% spectral attenuation Rolloff-half; the specific processing is the same as the related description in the first embodiment, here No longer described in detail. as well as,
音乐判别子单元, 根据获取到的如下特征参数的一个或多个, 以及相应的 特征参数阈值, 对当前非噪声音频信号进行音乐归属类别的判决: 谱波动方差 移动平均 flux— var— mov; x%谱衰减 Rolloff_x, 如 50%谱衰减 Rolloff— half。 具体 处理过程与第一实施例中的相关描述雷同, 这里不再详细描述。 The music discriminating subunit performs a music attribution category decision on the current non-noise audio signal according to one or more of the acquired characteristic parameters and the corresponding characteristic parameter threshold: spectral fluctuation variance moving average flux_var_mov; x The % spectrum decays Rolloff_x, such as 50% spectral decay Rolloff-half. The specific processing procedure is the same as that in the first embodiment, and will not be described in detail herein.
当通过所述清音判决子单元、 语音判决子单元或音乐判决子单元, 判决出 当前非噪声音频信号为既不归属于语音类别又不归属于音乐类别时, 所述归属 类别确定单元还通过所述一判决子单元, 判断当前非噪声音频信号前存在语音 音频环境还是音乐音频环境; When the unvoiced audio sub-unit, the voice decision sub-unit or the music decision sub-unit determines that the current non-noise audio signal belongs to neither the voice class nor the music class, the home class determining unit passes the Describe a decision subunit to determine whether a voice audio environment or a music audio environment exists before the current non-noise audio signal;
When a speech audio environment precedes the current non-noise audio signal, a speech attribution class decision is made on the current non-noise audio signal, which belongs to neither speech nor music, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the moving average of the spectral fluctuation variance flux_var_mov; the spectral attenuation variance rolloff_var; and the frequency-domain zero-crossing rate fzcr. The specific processing is the same as described in the first embodiment and is not repeated here.
When a music audio environment precedes the current non-noise audio signal, a speech attribution class decision is made on the current non-noise audio signal, which belongs to neither speech nor music, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the moving average of the spectral fluctuation variance flux_var_mov; the spectral attenuation variance rolloff_var; and the frequency-domain zero-crossing rate fzcr. The specific processing is the same as described in the first embodiment and is not repeated here.
For an audio signal that the unvoiced decision subunit, the speech decision subunit, or the music decision subunit determines to belong to both the speech class and the music class, and for an undetermined audio signal that the decision subunit determines to belong to neither the speech class nor the music class, the decision subunit may make a further decision on the signal using the following method: the attribution class of the undetermined audio signal is decided according to the audio signals that precede the current non-noise audio signal. That is, the undetermined audio signal is assigned the attribution class of the audio signal immediately preceding it; or it is assigned the class to which the larger share of the signals belongs within a segment of audio preceding it.
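The two context-based rules described above can be sketched as follows. This is only an illustrative sketch: the function name `resolve_by_context`, the string labels, the `window` parameter, and the tie-breaking rule are assumptions introduced here and do not come from the specification.

```python
from collections import Counter

def resolve_by_context(history, window=None):
    """Assign a class to an undetermined frame from the frames before it.

    history: class labels ("speech" or "music") of preceding frames, oldest first.
    window:  None -> copy the label of the immediately preceding frame;
             int  -> take the label with the larger share among the
                     last `window` frames.
    """
    if not history:
        return None  # no context available; the caller must choose a default
    if window is None:
        return history[-1]
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = history[-window:]
    ranked = Counter(recent).most_common()
    # Assumed tie-break for the two-class case: fall back to the nearest frame.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return recent[-1]
    return ranked[0][0]
```

Calling `resolve_by_context(labels)` applies the first rule (nearest preceding signal), while `resolve_by_context(labels, window=20)` applies the second rule over the last 20 frames; the window length of 20 is an arbitrary illustration.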
Other soft-decision methods may also be used to decide the attribution class of an undetermined audio signal, for example further classification by a GMM (Gaussian mixture model) decision.
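As a rough illustration of what a GMM-based soft decision could look like, the sketch below scores a feature vector under one diagonal-covariance Gaussian mixture per class and picks the class with the higher log-likelihood. The specification does not give model details, so everything here is an assumption: the function names, the choice of a two-dimensional feature vector, and the hand-set weights, means, and variances in `MODELS`, which in practice would be estimated from labelled training data.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))

def classify_with_gmms(x, models):
    """Return the class whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, *models[name]))

# Hypothetical hand-set models (weights, means, variances) over a 2-D
# feature vector; these numbers are illustrative only, not trained values.
MODELS = {
    "speech": ([0.6, 0.4], [[0.8, 0.7], [0.6, 0.5]], [[0.02, 0.02], [0.03, 0.03]]),
    "music": ([0.5, 0.5], [[0.2, 0.1], [0.3, 0.2]], [[0.02, 0.02], [0.03, 0.03]]),
}
```

The difference between the two log-likelihoods can also serve as a confidence measure, which is what makes the decision "soft" rather than a fixed threshold comparison.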
The third embodiment provided by the present invention is an unvoiced discrimination device. Its structure is shown in FIG. 5 and comprises a first acquisition unit and an unvoiced discrimination unit.
The first acquisition unit acquires feature parameters of the audio signal. The feature parameters include the time-domain zero-crossing rate zcr and/or the ratio of low-band energy to full-band energy, ratio1.
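The two feature parameters named above can be computed per frame roughly as follows. The exact formulas belong to the first embodiment and are not repeated in this passage, so the definitions below are common textbook forms assumed for illustration; the function names and the `low_bins` parameter (the number of DFT bins treated as the low band) are hypothetical.

```python
import cmath
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zcr)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def low_band_energy_ratio(frame, low_bins):
    """Energy in DFT bins [0, low_bins) divided by energy in bins [0, N/2]."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        # Naive DFT; a real implementation would use an FFT.
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        energies.append(abs(s) ** 2)
    total = sum(energies)
    return sum(energies[:low_bins]) / total if total > 0 else 0.0
```

Each value would then be compared with its own threshold; the threshold values themselves are not stated in this passage.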
The unvoiced discrimination unit decides whether the current non-noise audio signal belongs to the unvoiced class, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the time-domain zero-crossing rate zcr; and the ratio of low-band energy to full-band energy, ratio1. The specific processing is the same as described in the first embodiment and is not repeated here.
The fourth embodiment provided by the present invention is a speech discrimination device. Its structure is shown in FIG. 6 and comprises a second acquisition unit and a speech discrimination unit.
The second acquisition unit acquires feature parameters of the audio signal. The feature parameters include one or more of the following:
the spectral fluctuation flux; the spectral fluctuation variance var_flux; the moving average of the spectral fluctuation variance flux_var_mov; the time-domain zero-crossing rate zcr; and the x% spectral attenuation Rolloff_x, for example the 50% spectral attenuation Rolloff_half.
The speech discrimination unit decides whether the current non-noise audio signal belongs to the speech class, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the spectral fluctuation flux; the spectral fluctuation variance var_flux; the moving average of the spectral fluctuation variance flux_var_mov; the time-domain zero-crossing rate zcr; and the x% spectral attenuation Rolloff_x, for example the 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not repeated here.
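To make the three fluctuation-based parameters concrete, the sketch below shows one plausible way to derive flux, var_flux, and flux_var_mov from successive magnitude spectra. The definitions in the first embodiment are not reproduced in this passage, so these are assumed forms: flux as the sum of squared spectral differences, var_flux as the population variance over a recent window, and flux_var_mov as an exponential moving average. The function names and the `window` and `alpha` parameters are hypothetical.

```python
from statistics import pvariance

def spectral_flux(prev_mag, cur_mag):
    """Sum of squared differences between two magnitude spectra (flux)."""
    return sum((c - p) ** 2 for p, c in zip(prev_mag, cur_mag))

def flux_variance(flux_history, window):
    """Population variance of the last `window` flux values (var_flux)."""
    return pvariance(flux_history[-window:])

def moving_average(prev_avg, new_value, alpha):
    """Exponential moving average; alpha weights the previous average.

    Applied to successive var_flux values, this yields flux_var_mov.
    """
    return alpha * prev_avg + (1 - alpha) * new_value
```

A decision would then compare each value with its threshold; both the thresholds and the direction of each comparison are left to the first embodiment.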
The fifth embodiment provided by the present invention is a music discrimination device. Its structure is shown in FIG. 7 and comprises a third acquisition unit and a music discrimination unit.
The third acquisition unit acquires feature parameters of the audio signal. The feature parameters include one or more of the following:
the moving average of the spectral fluctuation variance flux_var_mov; and the x% spectral attenuation Rolloff_x, for example the 50% spectral attenuation Rolloff_half.
The music discrimination unit decides whether the current non-noise audio signal belongs to the music class, according to one or more of the following acquired feature parameters and the corresponding feature parameter thresholds: the moving average of the spectral fluctuation variance flux_var_mov; and the x% spectral attenuation Rolloff_x, for example the 50% spectral attenuation Rolloff_half. The specific processing is the same as described in the first embodiment and is not repeated here.
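The x% spectral attenuation parameter can be illustrated with a short sketch. It assumes the common definition of spectral rolloff, namely the lowest frequency bin at which the cumulative spectral energy reaches x% of the total; the specification's own formula is in the first embodiment and may differ. The function name and the `fraction` parameter are hypothetical.

```python
def spectral_rolloff(power_spectrum, fraction=0.5):
    """Index of the first bin where cumulative energy reaches `fraction` of the total.

    fraction=0.5 corresponds to the 50% case (Rolloff_half).
    """
    total = sum(power_spectrum)
    if total <= 0:
        return 0  # silent frame: no meaningful rolloff point
    threshold = fraction * total
    cumulative = 0.0
    for k, p in enumerate(power_spectrum):
        cumulative += p
        if cumulative >= threshold:
            return k
    return len(power_spectrum) - 1  # guard against floating-point shortfall
```

Tracking this index over several frames would also give the values from which a variance such as rolloff_var could be computed.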
As can be seen from the specific implementations provided by the above embodiments of the present invention, the attribution class of the current non-noise audio signal is determined from the spectral feature parameters of the non-noise audio signal. The embodiments can therefore exist independently of any encoding algorithm, which makes them self-contained and portable.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 200710080333 CN101256772B (en) | 2007-03-02 | 2007-03-02 | Method and device for determining attribution class of non-noise audio signal |
| CN200710080333.X | 2007-03-02 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2008106852A1 (en) | 2008-09-12 |
Family
ID=39737776
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2007/003985 Ceased WO2008106852A1 (en) | 2007-03-02 | 2007-12-29 | A method and device for determining the classification of non-noise audio signal |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN101256772B (en) |
| WO (1) | WO2008106852A1 (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101399039B (en) * | 2007-09-30 | 2011-05-11 | 华为技术有限公司 | Method and device for determining non-noise audio signal classification |
| CN102044246B (en) | 2009-10-15 | 2012-05-23 | 华为技术有限公司 | An audio signal detection method and device |
| CN102129858B (en) * | 2011-03-16 | 2012-02-08 | 天津大学 | Note Segmentation Method Based on Teager Energy Entropy |
| JP6182895B2 (en) * | 2012-05-01 | 2017-08-23 | 株式会社リコー | Processing apparatus, processing method, program, and processing system |
| US9589570B2 (en) | 2012-09-18 | 2017-03-07 | Huawei Technologies Co., Ltd. | Audio classification based on perceptual quality for low or medium bit rates |
| PL2922052T3 (en) * | 2012-11-13 | 2021-12-20 | Samsung Electronics Co., Ltd. | HOW TO SET THE ENCODING MODE |
| CN114534130A (en) * | 2020-11-25 | 2022-05-27 | 深圳市安联消防技术有限公司 | Method for eliminating airflow noise of breathing mask |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4720862A (en) * | 1982-02-19 | 1988-01-19 | Hitachi, Ltd. | Method and apparatus for speech signal detection and classification of the detected signal into a voiced sound, an unvoiced sound and silence |
| CN1204766A (en) * | 1997-03-25 | 1999-01-13 | 皇家菲利浦电子有限公司 | Method and device for detecting voice activity |
| JP2000066691A (en) * | 1998-08-21 | 2000-03-03 | Kdd Corp | Audio information classification device |
| US6202046B1 (en) * | 1997-01-23 | 2001-03-13 | Kabushiki Kaisha Toshiba | Background noise/speech classification method |
| CN1354455A (en) * | 2000-11-18 | 2002-06-19 | 深圳市中兴通讯股份有限公司 | Sound activation detection method for identifying speech and music from noise environment |
| US20060136211A1 (en) * | 2000-04-19 | 2006-06-22 | Microsoft Corporation | Audio Segmentation and Classification Using Threshold Values |
| CN1909060A (en) * | 2005-08-01 | 2007-02-07 | 三星电子株式会社 | Method and apparatus for extracting voiced/unvoiced classification information |
| CN1920947A (en) * | 2006-09-15 | 2007-02-28 | 清华大学 | Voice/music detector for audio frequency coding with low bit ratio |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6570991B1 (en) * | 1996-12-18 | 2003-05-27 | Interval Research Corporation | Multi-feature speech/music discrimination system |
-
2007
- 2007-03-02 CN CN 200710080333 patent/CN101256772B/en not_active Withdrawn - After Issue
- 2007-12-29 WO PCT/CN2007/003985 patent/WO2008106852A1/en not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| BAI L. ET AL.: "FEATURE ANALYSIS AND EXTRACTION FOR AUDIO AUTOMATIC CLASSIFICATION", MINI-MICRO SYSTEMS, vol. 26, no. 11, November 2005 (2005-11-01), pages 2029 - 2034 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8050916B2 (en) | 2009-10-15 | 2011-11-01 | Huawei Technologies Co., Ltd. | Signal classifying method and apparatus |
| US8438021B2 (en) | 2009-10-15 | 2013-05-07 | Huawei Technologies Co., Ltd. | Signal classifying method and apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101256772A (en) | 2008-09-03 |
| CN101256772B (en) | 2012-02-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07846019; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07846019; Country of ref document: EP; Kind code of ref document: A1 |