HK1041739B

HK1041739B - Noise suppression using external voice activity detection

Info

Publication number: HK1041739B
Application number: HK01107509.8A
Authority: HK
Inventors: 詹姆斯‧布莱恩‧皮科特; 克里斯托弗‧怀恩‧斯普林菲尔德; 陈培菁
Original assignee: Cdc知识产权公司
Priority date: 1999-04-19
Filing date: 2000-03-16
Publication date: 2004-09-30

Description

Method of controlling the updating of a noise estimate of a speech signal, and transmitter

Technical Field

The present invention relates to communication systems, and more particularly to noise suppression of transmitted speech signals.

Background

In a communication system, a transmitting station may employ a noise suppression mechanism to reduce the noise content of the transmitted speech signal. This is particularly useful when the transmitting station is a mobile handset or hands-free phone that operates in the presence of background noise. In these environments, a sudden increase in background noise can cause a far-end listener to hear an undesirable noise level. The problem is most apparent when the transmitting station operates as a mobile station and employs noise suppression. Although current noise suppression techniques are effective at reducing background noise in static or slowly varying noise environments, their effectiveness may be significantly reduced when the transmitting station operates where rapidly varying noise is present.

In a mobile environment, a large change in background noise can occur when the user of a mobile transmitter switches on a fan, lowers a window while the mobile station is in motion, or otherwise experiences a significant and sudden change in background noise within the mobile station. Background noise within a mobile unit may also be affected by a variety of other changes in its surroundings.

In a typical mobile transmitter whose noise suppression algorithm relies on internal voice activity detection, an increase in background noise can be misinterpreted by the noise suppression algorithm as speech from the user of the mobile transmitter. This occurs because of the interdependence between the detection of speech activity and the estimation of the noise floor computed by the noise suppression algorithm. Noise suppression techniques such as spectral stationarity checking have been used with some success to mitigate the effects of sudden increases in background noise. In practice, however, this approach has proven inadequate in many cases because of the time the noise suppression algorithm needs to reduce the background noise to an acceptable level; in some cases this period lasts 10 to 20 seconds. In other cases, the system may enter a lock-in fault condition in which noise floor updates cease altogether. The transmitter is then left in a state where the listener hears an unacceptable amount of noise for an extended period.

Disclosure of Invention

It is therefore desirable to provide a noise suppression method and system that adapts to sudden increases in background noise by using a voice activity detector with reduced interdependence between voice activity detection and noise floor estimation. Such a system can transmit speech with a lower noise level while the mobile station operates amid widely varying background noise.

According to the present invention, there is provided a method for controlling the updating of a noise estimate of an input speech signal in a transmitter that performs a noise suppression technique on the input speech signal, the noise suppression technique using an internal voice activity detector, the method comprising the steps of: estimating a background noise floor of the input speech signal using an external voice activity detector outside the noise suppression technique; estimating a signal power of the input speech signal using the external voice activity detector; obtaining a voice activity factor in the external voice activity detector based on the background noise floor estimate; determining voice activity from the voice activity factor and the signal power estimate; and controlling the updating of the noise estimate according to the determining step; wherein the second estimating step comprises integrating previous signal power estimates.

The present invention also provides a transmitter for transmitting a speech signal to a remote receiver, comprising: an internal voice activity detector; a noise estimator coupled to the internal voice activity detector; and an external voice activity detector coupled to the noise estimator for controlling the updating of the noise estimator, said external voice activity detector applying a slope factor to produce the update of the noise floor associated with the speech signal and comprising: a signal power estimator for calculating a signal power estimate of the speech signal; a noise floor estimator for estimating a noise floor of the speech signal independently of the voice activity state; and a voice activity processor, coupled to the signal power estimator and the noise floor estimator, for controlling the updating of the noise estimator.

Drawings

A more complete understanding of the present invention may be derived by referring to the detailed description when considered in connection with the figures, wherein like reference numbers refer to similar items throughout the figures, and:

FIG. 1 is a block diagram of a transmitter employing voice activity detection using an external voice activity detector in accordance with a preferred embodiment of the present invention;

FIG. 2 is a flow diagram of a method for noise suppression using an external voice activity detector in accordance with a preferred embodiment of the present invention; and

FIG. 3 is a flow diagram of a method used by an external voice activity detector to control the updating of noise estimates by a noise suppression algorithm in accordance with a preferred embodiment of the present invention.

Detailed Description

A method and system for improved noise suppression using an external voice activity detector provides the ability to communicate voice in the presence of widely varying background noise. The method and system overcome the shortcomings of various noise suppression techniques by providing fast noise updates that minimize the noise heard at the listening station. In addition, the lock-in fault condition in which noise updates cease is avoided. The result is a hands-free communication system that does not subject the far-end listener to bursts of noise when the background noise increases.

FIG. 1 is a block diagram of a transmitter employing voice activity detection using an external voice activity detector in accordance with a preferred embodiment of the present invention. In FIG. 1, microphone 50 receives acoustic energy and converts it into an electrical signal. Microphone 50 may be any type of microphone or other sensor that converts mechanical or acoustic vibrations into an electrical signal. Microphone 50 is coupled to analog-to-digital converter 75, which converts the input analog electrical signal into a digital representation. Analog-to-digital converter 75 can be any general type of converter, preferably one with sufficient sample rate and dynamic range to produce an accurate digital representation of the analog speech signal from microphone 50.
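As a rough guide to the "sufficient sample rate and dynamic range" requirement, the sketch below computes the textbook ideal signal-to-quantization-noise ratio of an N-bit converter and checks the Nyquist condition. The 16-bit, 8 kHz and 3.4 kHz figures are illustrative assumptions for telephone-band speech; the text does not specify converter parameters.

```python
import math

def ideal_sqnr_db(bits: int) -> float:
    """Ideal signal-to-quantization-noise ratio (dB) of an N-bit converter
    driven by a full-scale sine wave: 20*log10(2**N) + 10*log10(1.5),
    i.e. roughly 6.02*N + 1.76 dB."""
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)

def meets_nyquist(sample_rate_hz: float, max_signal_hz: float) -> bool:
    """True if the sample rate is at least twice the highest signal frequency."""
    return sample_rate_hz >= 2.0 * max_signal_hz

# Hypothetical telephone-band converter: 16 bits at 8 kHz, speech up to 3.4 kHz.
print(round(ideal_sqnr_db(16), 2))    # 98.09
print(meets_nyquist(8000.0, 3400.0))  # True
```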

The output of analog-to-digital converter 75 is input to noise suppressor 100, which comprises pre-processor 110, voice activity detector 120, noise estimator 130, and channel gain calculation element 140. The output of analog-to-digital converter 75 is also coupled to external voice activity detector 150. Noise suppressor 100 is representative of the various noise suppressors with which the present invention may be used. Its functionality may be implemented entirely as one or more software processing elements, or in hardware in which the respective functions are performed by discrete, dedicated processing elements.

In FIG. 1, pre-processor 110 receives the digital representation of the speech signal from analog-to-digital converter 75. In a preferred embodiment, pre-processor 110 performs any desired spectral modification, emphasizing some spectral bands, preferably those containing primarily speech, while attenuating others, such as those containing primarily noise. Pre-processor 110 may also convert the time-domain signal to the frequency domain so that the remainder of noise suppressor 100 can perform further processing on the digital representation of the speech signal.
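A minimal sketch of the two pre-processor functions, assuming a Hann-windowed frame transformed with a direct DFT followed by a fixed band emphasis. The window, the 300 to 3400 Hz speech band and the boost/cut factors are hypothetical choices for illustration, not values from the text; a practical implementation would use an FFT.

```python
import cmath
import math

def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def power_spectrum(frame):
    """Hann-windowed DFT power spectrum of one frame, bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def emphasize(spectrum, sample_rate_hz, band_hz=(300.0, 3400.0), boost=2.0, cut=0.5):
    """Scale bins inside band_hz (assumed speech band) by boost, all others by cut."""
    n_fft = 2 * (len(spectrum) - 1)
    out = []
    for k, p in enumerate(spectrum):
        f = k * sample_rate_hz / n_fft
        out.append(p * (boost if band_hz[0] <= f <= band_hz[1] else cut))
    return out

# 1 kHz tone sampled at 8 kHz, 64-sample frame: energy concentrates in bin 8.
frame = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(64)]
spec = power_spectrum(frame)
shaped = emphasize(spec, 8000.0)
```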

The output of pre-processor 110 is coupled to voice activity detector 120 and noise estimator 130. In a preferred embodiment, voice activity detector 120 detects speech based on channel energy statistics of the noise floor and of the digital representation of the speech signal from pre-processor 110. Noise estimator 130 estimates the background noise present in the digital representation of the speech signal from pre-processor 110.
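The dependence of voice activity detector 120 on the noise estimate can be sketched as a per-channel SNR test. The channel grouping, the mean-SNR rule and the 6 dB threshold below are assumptions chosen for illustration; the text says only that the decision uses channel energy statistics of the noise floor and of the signal.

```python
import math

def channel_energies(power_spectrum, n_channels):
    """Sum contiguous power-spectrum bins into n_channels equal-width channels
    (leftover bins at the top are ignored)."""
    width = len(power_spectrum) // n_channels
    return [sum(power_spectrum[c * width:(c + 1) * width]) for c in range(n_channels)]

def internal_vad(ch_energy, ch_noise, threshold_db=6.0):
    """Declare speech when the mean per-channel SNR (dB) exceeds threshold_db.
    ch_noise comes from the noise estimator, which is what couples the two."""
    eps = 1e-12
    snrs = [10.0 * math.log10(max(e, eps) / max(nz, eps))
            for e, nz in zip(ch_energy, ch_noise)]
    return sum(snrs) / len(snrs) > threshold_db
```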

The outputs of voice activity detector 120 and noise estimator 130 are coupled to channel gain calculation element 140. In a preferred embodiment, channel gain calculation element 140 divides the digital representation of the speech signal into a set of frequency bins (channels). Segmenting the speech signal in frequency allows a gain to be calculated separately for each channel, so that frequency bands containing primarily speech are preserved while those containing mostly noise are attenuated.
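The text does not state how channel gain calculation element 140 derives its gains. The sketch below assumes a common spectral-subtraction-style rule, gain = 1 - noise/energy per channel with a lower limit, so speech-dominated channels pass nearly unchanged and noise-dominated channels are attenuated. `channel_gains` and `min_gain` are hypothetical names.

```python
def channel_gains(ch_energy, ch_noise, min_gain=0.1):
    """Per-channel gain 1 - noise/energy (spectral-subtraction style),
    clamped to [min_gain, 1]; noise-dominated channels get min_gain."""
    gains = []
    for e, nz in zip(ch_energy, ch_noise):
        g = 1.0 - nz / e if e > 0.0 else 0.0
        gains.append(min(1.0, max(min_gain, g)))
    return gains

# A speech-dominated channel, a channel at the noise level, and one below it.
gains = channel_gains([10.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```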

As shown in FIG. 1, noise estimator 130 is coupled to voice activity detector 120, which makes voice activity decisions based on the amount of noise in the digital representation of the speech signal from pre-processor 110. Voice activity detector 120 therefore depends on input from noise estimator 130 to determine voice activity.

In FIG. 1, external voice activity detector 150 makes a separate voice activity determination to assist noise estimator 130 in estimating the noise in the digital representation of the speech signal from pre-processor 110. In a preferred embodiment, the external voice activity detector determines voice activity without any input from noise estimator 130. Because the external noise floor estimate does not depend on a voice activity decision, the circular dependency between noise floor estimation and voice activity detection is removed, providing a more reliable voice activity detection mechanism in environments where background noise changes rapidly.

External voice activity detector 150 receives the digital representation of the speech signal from analog-to-digital converter 75. This input is coupled to signal power estimator 154 and noise floor estimator 156. Signal power estimator 154 computes the signal power present in the input signal, and noise floor estimator 156 computes the noise floor of the input signal.

The outputs of signal power estimator 154 and noise floor estimator 156 are coupled to voice activity processor 158, which compares the signal power with the noise floor to determine whether noise estimator 130 should be updated. The methods used by signal power estimator 154, noise floor estimator 156, and voice activity processor 158 are discussed further with reference to FIGS. 2 and 3. The output of voice activity processor 158 is coupled to noise suppressor 100. In a preferred embodiment, this output includes an instruction that forces noise estimator 130 to update its noise estimate of the digital representation of the speech signal from pre-processor 110.
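One way to picture the interface between voice activity processor 158 and noise suppressor 100 is a two-valued instruction that either leaves the update to internal voice activity detector 120 or forces noise estimator 130 to update. The enum, the function and the exponential smoothing rule below are hypothetical; the text does not define how estimator 130 computes its estimate.

```python
from enum import Enum

class ExternalDecision(Enum):
    ALLOW = "allow"  # leave the update decision to internal VAD 120
    FORCE = "force"  # instruct noise estimator 130 to update now

def update_noise_estimate(noise, frame_power, internal_speech, decision, smoothing=0.9):
    """Hypothetical noise estimator 130: it normally updates only when the
    internal VAD reports no speech, but a FORCE decision overrides that."""
    if decision is ExternalDecision.FORCE or not internal_speech:
        return smoothing * noise + (1.0 - smoothing) * frame_power
    return noise
```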

FIG. 2 is a flow chart of a method implemented by an external voice activity detector in accordance with a preferred embodiment of the present invention; external voice activity detector 150 of FIG. 1 is adapted to implement this method. The method begins by computing a background noise floor estimate in the external voice activity detector. By way of example, and not by way of limitation, the estimate is based on a slow-rise/fast-fall technique designed to track changes in the noise floor of a particular signal. Preferably, the technique requires no assumption as to whether the input digital representation of the speech signal is speech or noise. As each sample y(n) is processed, the current signal power estimate is preferably updated at step 220 by an integrating function such as a leaky integrator, as given by the following equation.

$$P_y(n) = (1-\gamma)\,y^2(n) + \gamma\,P_y(n-1), \qquad \text{where } \gamma \approx 0.9875$$
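The leaky integrator of step 220 translates directly into code. The sketch below assumes the estimate starts at zero; with γ = 0.9875 the response to a unit-amplitude input after n samples is 1 - γ^n, and the time constant is 1/(1 - γ) = 80 samples.

```python
class SignalPowerEstimator:
    """Leaky-integrator estimate P_y(n) = (1 - gamma) * y(n)**2 + gamma * P_y(n-1)."""

    def __init__(self, gamma=0.9875, initial_power=0.0):
        self.gamma = gamma
        self.power = initial_power  # assumed starting value P_y(-1)

    def update(self, y):
        self.power = (1.0 - self.gamma) * y * y + self.gamma * self.power
        return self.power

    def time_constant_samples(self):
        """Samples for the step response to reach about 63%: 1 / (1 - gamma)."""
        return 1.0 / (1.0 - self.gamma)

est = SignalPowerEstimator()
for _ in range(80):
    p = est.update(1.0)  # unit-amplitude input switched on at n = 0
```

At an assumed 8 kHz sampling rate, 80 samples corresponds to 10 ms, so the power estimate smooths over roughly 10 ms of input.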

At step 230, the current signal power estimate is compared with the noise floor estimate. If the signal power estimate is less than the noise floor estimate, which can indicate a reduction in the noise level of the input speech signal, the updated noise floor is set equal to the signal power estimate in step 245. This produces the desired "fast fall" of the noise floor. If the signal power estimate exceeds the noise floor estimate, which can indicate an increase in noise level, a slope factor is applied to the noise floor estimate at step 240, making the current noise floor estimate rise slowly at a rate of β decibels per second. Steps 230, 240 and 245 can be expressed as:

$$NF_y(n) = \begin{cases} P_y(n), & \text{if } P_y(n) < NF_y(n-1) \\ \beta\, NF_y(n-1), & \text{otherwise} \end{cases}$$

where the slope factor β corresponds to a rise of approximately 2 to 8 dB/s.
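A sketch of the slow-rise/fast-fall tracker of steps 230 to 245. Because NF_y(n) is compared with a power estimate, the sketch treats it as a power and converts a rise rate in dB/s into a per-sample multiplier β = 10^(rate / (10·fs)). The 8 kHz sampling rate, the 5 dB/s rate (within the 2 to 8 dB/s range) and the initial floor are assumptions.

```python
class NoiseFloorEstimator:
    """Slow-rise / fast-fall tracker for NF_y(n), operating on power values."""

    def __init__(self, sample_rate_hz=8000.0, rise_db_per_s=5.0, initial_floor=1.0):
        # Per-sample multiplier beta that raises a power by rise_db_per_s each second.
        self.beta = 10.0 ** (rise_db_per_s / (10.0 * sample_rate_hz))
        self.floor = initial_floor

    def update(self, power):
        if power < self.floor:
            self.floor = power       # fast fall: snap down to the power estimate
        else:
            self.floor *= self.beta  # slow rise: creep up by beta per sample
        return self.floor

nf = NoiseFloorEstimator()
for _ in range(8000):         # one second of input louder than the floor
    nf.update(100.0)
after_one_second = nf.floor   # about 10**0.5, i.e. +5 dB
nf.update(0.5)                # a quieter sample pulls the floor straight down
```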

At step 250, a voice activity factor α is applied to the updated noise floor estimate to produce a voice activity threshold estimate, α·NF_y(n). The method then continues to step 260, where the signal power estimate is compared with the voice activity threshold estimate from step 250. Step 260 is the primary decision as to whether to force the noise suppression technique to update its noise estimate of the digital representation of the speech signal, although practical implementations preferably also employ well-known techniques such as release delay (hangover) periods and hysteresis.

If the signal power estimate exceeds the voice activity threshold estimate, the external voice activity detector does not force an update and leaves the decision to update the noise estimate to the noise suppression technique (step 270). If the signal power estimate does not exceed the voice activity threshold, step 262 determines whether the silence counter has reached its upper limit. If it has not, the count is incremented at step 263 and the method returns to step 260. The purpose and preferred numerical value of the silence counter are described with reference to FIG. 3.

If step 262 indicates that the silence counter has reached its upper limit, step 265 is performed, in which the external voice activity detector forces the noise suppression technique to update the noise estimate. Step 280 is then performed, in which the silence counter is reset. After steps 265 and 280, the method returns to step 210, where the next frame of the digital representation of the speech signal is processed. Steps 250 through 280 can be expressed as:

if P_y(n) > α·NF_y(n), then the update is not forced

Otherwise

check the silence counter against its upper limit: if the limit has not been reached, increment the counter; if it has, force the update and reset the counter
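Putting steps 220 through 280 together, the sketch below processes one sample per call and reports whether the external detector forces a noise update. The voice activity factor α = 2, the 8 kHz sampling rate, the 5 dB/s rise rate and the `min_floor` lower bound are hypothetical values not given in the text; the lower bound is added so that a floor driven to zero by digital silence can still rise. The silence-counter logic follows the FIG. 2 description: check the limit first, increment if it has not been reached, otherwise force and reset.

```python
class ExternalVAD:
    """Sketch of the FIG. 2 loop, one input sample per call.
    process() returns True when the noise update is forced."""

    def __init__(self, sample_rate_hz=8000.0, gamma=0.9875, rise_db_per_s=5.0,
                 alpha=2.0, silence_limit=20, min_floor=1e-6):
        self.gamma = gamma
        self.beta = 10.0 ** (rise_db_per_s / (10.0 * sample_rate_hz))
        self.alpha = alpha                # hypothetical voice activity factor
        self.silence_limit = silence_limit
        self.min_floor = min_floor        # hypothetical lower bound on NF
        self.power = 0.0
        self.floor = min_floor
        self.silence = 0

    def process(self, y):
        # Step 220: leaky-integrator signal power estimate.
        self.power = (1.0 - self.gamma) * y * y + self.gamma * self.power
        # Steps 230-245: fast fall / slow rise of the noise floor.
        if self.power < self.floor:
            self.floor = max(self.power, self.min_floor)
        else:
            self.floor *= self.beta
        # Steps 250-260: compare the power with the threshold alpha * NF.
        if self.power > self.alpha * self.floor:
            return False                  # step 270: update is not forced
        # Steps 262-263: count silent samples until the limit is reached.
        if self.silence < self.silence_limit:
            self.silence += 1
            return False
        # Steps 265 and 280: force the update, then reset the counter.
        self.silence = 0
        return True

vad = ExternalVAD()
forced = [i for i in range(42) if vad.process(0.0)]  # digital silence
```

With a limit of 20 and the check made before the increment, this sketch forces an update on every 21st consecutive silent sample; FIG. 3 increments before checking, which shifts the first forced update by one count.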

FIG. 3 is a flow diagram of a method used by an external voice activity detector to control the updating of noise estimates by a noise suppression algorithm in accordance with a preferred embodiment of the present invention. The method begins at step 310, where an external voice activity detector, such as external voice activity detector 150 of FIG. 1, determines whether voice activity is present. Step 310 represents the result of voice activity detection such as that described with reference to FIG. 2, in which the noise estimate update is forced when the appropriate conditions exist. If step 310 determines that voice activity is not present, the counter is incremented at step 320. At step 330, the current value of the counter is checked against an upper limit. In a preferred embodiment, the upper limit is 20.

If the upper limit has been reached, the external voice activity detector forces an update of the noise estimate of the input digital representation of the speech signal, and the method returns to step 310. If step 330 determines that the upper limit has not been reached, the method proceeds to step 350, where the external voice activity detector allows the noise suppression algorithm to decide whether the noise estimate needs updating; the method then returns to step 310. If step 310 determines that speech is present, the counter is reset at step 315 and the method returns to step 310.
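The FIG. 3 loop can be sketched as a small per-frame state machine. The class name and the string results are hypothetical. Following the text, the counter is reset only when voice is detected (step 315), not after a forced update, so once the release delay has elapsed every further silent frame forces an update.

```python
class ReleaseDelayController:
    """Sketch of the FIG. 3 loop, one voice activity decision per frame."""

    def __init__(self, limit=20):
        self.limit = limit   # upper limit of the counter (20 in the preferred embodiment)
        self.count = 0

    def step(self, voice_present):
        if voice_present:
            self.count = 0               # step 315: reset the counter
            return "allow"
        self.count += 1                  # step 320: increment the counter
        if self.count >= self.limit:     # step 330: limit reached?
            return "force"               # force the noise update
        return "allow"                   # step 350: the suppressor decides

ctl = ReleaseDelayController()
outputs = [ctl.step(False) for _ in range(25)] + [ctl.step(True)]
```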

Steps 320 and 330 allow noise updates only after a longer "release delay" period has elapsed. The release delay restricts noise estimation by the noise suppression algorithm to periods when the hands-free user has stopped speaking, so noise is not estimated during speech or during the short pauses that occur in normal speech. In addition, because the counter has a fixed upper limit, the time before an update is forced, and hence the length of the release delay period, is bounded. Bounding the release delay avoids the lock-in fault condition in which the noise suppression algorithm stops updating its noise estimate, and thus prevents the far-end listener from experiencing high noise levels.
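The effect of bounding the release delay can be illustrated with a toy frame-level simulation in which background noise power jumps tenfold and stays there. The internal detector (speech if power exceeds 4 times its own noise estimate), the smoothing constant, α = 2 for the external detector, the 0.05 dB-per-frame rise (5 dB/s at an assumed 10 ms frame) and the function name are all hypothetical; the external part follows the FIG. 3 counter logic. Without the external detector the internal estimate locks in at the old noise level; with it, forced updates let the estimate converge to the new level.

```python
def simulate(frame_powers, use_external, vad_ratio=4.0, smoothing=0.9,
             alpha_ext=2.0, rise_per_frame=10.0 ** 0.005, limit=20):
    """Return the final internal noise estimate for a sequence of frame powers."""
    noise = frame_powers[0]      # internal noise estimate (estimator 130)
    ext_floor = frame_powers[0]  # external noise floor (estimator 156)
    count = 0                    # silence counter of the external detector
    for p in frame_powers:
        # External detector: fast-fall / slow-rise floor, no internal input.
        ext_floor = p if p < ext_floor else ext_floor * rise_per_frame
        ext_voice = p > alpha_ext * ext_floor
        count = 0 if ext_voice else count + 1
        force = use_external and count >= limit
        # Internal detector: depends on its own noise estimate.
        internal_speech = p > vad_ratio * noise
        if force or not internal_speech:
            noise = smoothing * noise + (1.0 - smoothing) * p
    return noise

# Background noise power jumps tenfold (a fan switched on) and stays there.
frames = [1.0] * 50 + [10.0] * 600
stuck = simulate(frames, use_external=False)     # stays at 1.0: lock-in
recovered = simulate(frames, use_external=True)  # converges toward 10.0
```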

A method and system for improved noise suppression using an external voice activity detector provides the ability to communicate voice in the presence of widely varying background noise. The method and system correct the shortcomings of many noise suppression techniques by forcing the noise suppression technique, under certain conditions, to update its noise estimate of the input digital representation of the speech signal. This in turn minimizes the noise heard at the listening station. In addition, the lock-in fault condition in which noise updates stop is avoided. The result is a hands-free communication system that does not subject a far-end listener to bursts of noise when background noise increases.

Accordingly, it is intended by the appended claims to cover all modifications that fall within the true spirit and scope of the invention.

Claims

1. A method for controlling the updating of a noise estimate of an input speech signal in a transmitter that performs a noise suppression technique on the input speech signal, the noise suppression technique using an internal voice activity detector, the method comprising the steps of:

estimating a background noise floor of the input speech signal using an external voice activity detector outside the noise suppression technique;

estimating a signal power of the input speech signal using the external voice activity detector;

obtaining a voice activity factor in the external voice activity detector based on the background noise floor estimate;

determining voice activity from the voice activity factor and the signal power estimate;

controlling the updating of the noise estimate according to the determining step; and

wherein the second estimating step comprises the step of integrating previous signal power estimates.

2. The method of claim 1, wherein the determining step further comprises the step of applying a slope factor to the background noise floor estimate to form an updated noise floor estimate if the signal power estimate exceeds the background noise floor estimate.

3. The method of claim 2, wherein the slope factor is approximately in the range of 2 to 8 decibels per second.

4. The method of claim 3 wherein the determining step further comprises the step of applying a voice activity factor to the updated noise floor estimate to produce a voice activity threshold estimate.

5. The method of claim 4, wherein the slope factor is approximately 8 decibels per second.

6. The method of claim 4, wherein the controlling step further comprises the step of allowing the internal voice activity detector to update the noise estimate if the signal power estimate is greater than the voice activity threshold estimate.

7. The method of claim 1, wherein the determining step further comprises the step of setting the background noise floor estimate equal to the signal power estimate if the signal power estimate is less than the background noise floor estimate.

8. The method of claim 7 wherein the determining step further comprises the step of applying a voice activity factor to the background noise floor estimate to produce a voice activity threshold estimate.

9. The method of claim 8, wherein the controlling step further comprises the step of updating the noise estimate if the signal power estimate is less than the voice activity threshold estimate.

10. The method of claim 1, wherein said step of integrating further comprises the step of applying a leaky integrator factor.

11. The method of claim 10, wherein the leaky integrator factor is approximately 0.9875.

12. A transmitter for transmitting a speech signal to a remote receiver, comprising:

an internal voice activity detector;

a noise estimator coupled to the internal voice activity detector;

an external voice activity detector coupled to the noise estimator for controlling the updating of the noise estimator;

said external voice activity detector applying a slope factor to produce said update of the noise floor associated with said speech signal; and

said external voice activity detector comprising:

a signal power estimator for calculating a signal power estimate of the speech signal;

a noise floor estimator for estimating a noise floor of the speech signal independently of the voice activity state; and

a voice activity processor coupled to said signal power estimator and said noise floor estimator for controlling the updating of said noise estimator.