HK1171273B - System for adaptive voice intelligibility processing - Google Patents


Info

Publication number: HK1171273B (application HK12111607.8A)
Authority: HK (Hong Kong)
Prior art keywords: noise, enhancement, signal, input, voice
Other languages: Chinese (zh)
Other versions: HK1171273A1 (en)
Inventors: Jun Yang, Richard J. Oliver, James Tracey, Xing He
Original assignee: DTS LLC (application filed by DTS LLC)
Priority claimed from PCT/US2009/056850 (external-priority patent WO2011031273A1)
Publication of HK1171273A1
Publication of HK1171273B

Description

System for adaptive voice intelligibility processing
Background
Mobile phones are typically used in areas with high background noise. Such noise often reaches a level that greatly reduces the intelligibility of speech from the mobile phone speaker. In many cases, some of the communicated information is lost, at least partially, because the high ambient noise level masks or distorts the caller's voice as the listener hears it.
Attempts to minimize loss of intelligibility in the presence of high background noise have included using an equalizer, using a clipping circuit, or simply increasing the volume of the mobile phone. The equalizer and clipping circuit may themselves amplify the background noise and thus fail to solve the problem. Increasing the overall sound level or speaker volume of the mobile phone generally does not significantly improve intelligibility and can cause other problems, such as feedback and listener discomfort.
Disclosure of Invention
In a particular embodiment, a system for automatically adjusting a voice intelligibility enhancement applied to an audio signal comprises an enhancement module that receives an input speech signal comprising formants and applies an audio enhancement to the input speech signal to provide an enhanced speech signal. The audio enhancement may emphasize one or more formants in the input speech signal. The system further comprises an enhancement controller having one or more processors. The enhancement controller may adjust an amount of the audio enhancement applied by the enhancement module based at least in part on a detected amount of ambient noise. The system further comprises an output gain controller that can adjust a total gain of the enhanced speech signal based at least in part on the amount of ambient noise and the input speech signal, and apply the total gain to the enhanced speech signal to produce an amplified speech signal. The system may further comprise a distortion control module that may reduce clipping in the amplified speech signal at least by mapping one or more samples of the amplified speech signal to one or more values stored in a sum-of-sinusoids table. The sum-of-sinusoids table may be generated from a sum of lower-order sinusoidal harmonics.
In various embodiments, a method of adjusting a voice intelligibility enhancement may comprise: receiving a speech signal and an input signal having near-end ambient content; calculating, with one or more processors, the near-end ambient content in the input signal; adjusting, with the one or more processors, a level of speech enhancement based at least in part on the near-end ambient content; and applying the speech enhancement to the speech signal to produce an enhanced speech signal. The speech enhancement may emphasize one or more formants of the speech signal.
Further, in a particular embodiment, a system for automatically adjusting a voice intelligibility enhancement applied to an audio signal may include an enhancement module that may receive an input voice signal including formants and apply audio enhancement to the input voice signal to provide an enhanced voice signal. The audio enhancement may emphasize one or more formants in the input speech signal. The system may also include an enhancement controller comprising one or more processors. The enhancement controller may adjust an amount of audio enhancement applied by the enhancement module based at least in part on the amount of detected ambient noise. The system may also include an output gain controller that may adjust a total gain of the enhanced speech signal based at least in part on the amount of ambient noise and the input speech signal, and apply the total gain to the enhanced speech signal to produce an amplified speech signal.
A processor-readable storage medium having instructions stored thereon that cause one or more processors to perform a method of adjusting a voice intelligibility enhancement, the method may comprise: receiving a voice signal from a remote telephone and a noise signal from a microphone, calculating a value of the noise signal, adjusting a gain applied to a formant of the voice signal based at least in part on the value of the noise signal, and applying the gain to the formant of the voice signal.
In some implementations, a system for adjusting a noise threshold for voice intelligibility enhancement may comprise: a speech enhancement module that can utilize a receiving device to receive an input speech signal from a remote device and apply audio enhancement to the input speech signal to emphasize one or more formants in the input speech signal. The system may also include a voice enhancement controller having one or more processors. The voice enhancement controller may adjust an amount of audio enhancement applied by the enhancement module based at least in part on an amount of ambient noise detected above a first noise threshold. The system may also include a noise sensitivity controller that may adjust the first noise threshold. The noise sensitivity controller may include a first correlator that may calculate a first autocorrelation value from a microphone input signal received from a microphone of the receiving device, a first variance module that may calculate a first variance of the first autocorrelation value, a second correlator that may calculate a second autocorrelation value from a speaker input signal, wherein the speaker input signal includes an output signal of the speech enhancement module, a second variance module that may calculate a second variance of the second autocorrelation value, and a noise sensitivity adjuster that may adjust the first noise threshold using one or more of the first and second autocorrelation values and the first and second variance values to produce a second noise threshold. Thus, in particular embodiments, the voice enhancement controller may adjust the amount of audio enhancement applied to the second input audio signal based at least in part on the second amount of ambient noise detected above the second noise threshold.
In a particular embodiment, a system for adjusting sensitivity of a voice intelligibility enhancement comprises a speech enhancement module that may receive, with a receiving device, an input speech signal sent from a remote device and apply audio enhancement to the input speech signal to emphasize one or more formants in the input speech signal. The system may also include an enhancement controller that may adjust an amount of the audio enhancement applied by the voice enhancement module based at least in part on an amount of ambient noise present in the input voice signal. The system may also include a noise sensitivity controller having one or more processors that may adjust a sensitivity of the enhancement controller to ambient noise based at least in part on a statistical analysis of one or both of a microphone input signal obtained from a microphone of the receiving device and a speaker input signal provided as an output signal of the voice enhancement module.
In a particular embodiment, a method for adjusting sensitivity of speech enhancement includes: receiving an input audio signal; detecting correlated content in an input audio signal, wherein detecting comprises calculating, using one or more processors, a statistical analysis of the input audio signal; and adjusting a level of enhancement applied to the input audio signal in response to performing the detecting.
Further, in various embodiments, an audio signal processing method includes: receiving a microphone input signal; detecting substantially periodic content in the microphone input signal; and adjusting, with one or more processors, audio enhancement based at least in part on the detected substantially periodic content in the microphone input signal. The audio enhancement may selectively enhance an audio output signal based at least in part on a level of the microphone input signal. The method may further comprise providing the audio output signal to a speaker.
For purposes of summarizing the disclosure, certain aspects, advantages and novel features of the invention have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment of the invention disclosed herein. Thus, the invention disclosed herein may be implemented or performed in the following manner: one or a set of advantages taught herein may be realized and optimized without necessarily realizing other advantages that may be taught or suggested herein.
Drawings
Throughout the drawings, reference numerals may be repeated to indicate correspondence between reference elements. The drawings are provided to illustrate embodiments of the invention described herein and not to limit the scope thereof.
FIG. 1 illustrates an embodiment of a mobile phone environment for implementing a voice enhancement system;
FIG. 2 illustrates an embodiment of the enhancement system of FIG. 1;
FIG. 3 illustrates an embodiment of a voice enhancement control process used by the voice enhancement system;
FIG. 4 illustrates an embodiment of an output volume control process used by the voice enhancement system;
FIGS. 5A, 5B, 5C and 6 illustrate embodiments of a noise sensitivity control process used by the voice enhancement system;
FIG. 7 illustrates an example distortion control module of the system of FIG. 1;
FIG. 8 shows an example time domain plot of a sine wave;
FIG. 9 shows an example frequency spectrum of the sine wave of FIG. 8;
FIG. 10 shows an example time domain plot of a clipped sine wave;
FIG. 11 shows an example frequency spectrum of the clipped sine wave of FIG. 10;
FIG. 12 shows an example spectrum of a reduced number of harmonics compared to the clipped sine wave spectrum of FIG. 11;
FIG. 13 shows an example time domain plot of a partially saturated wave corresponding to the spectrum of FIG. 12;
FIG. 14 illustrates an embodiment of a sine and mapping function;
FIG. 15 shows an example time domain plot of an audio signal and a distortion-controlled version of the signal.
Detailed Description
I. Introduction
Mobile phones and other similarly sized devices tend to have small speakers whose output volume is limited. Consequently, it is difficult to hear a conversation on a mobile phone in the presence of ambient noise.
The present disclosure describes a system and method for adjusting voice intelligibility processing based on ambient noise, speech level, a combination of the two, and the like. The voice intelligibility processing may include techniques to emphasize formants in speech. For example, voice intelligibility processing may be used to clarify speech in a mobile phone conversation or the like. The voice intelligibility processing may be adapted to increase or decrease emphasis of voice formants or other sound characteristics based at least in part on the ambient noise. By enhancing the voice intelligibility processing, formants in the speaker's speech can be emphasized to make them more perceptible to the listener. However, when little ambient noise is present, the enhancement of formants may make the speech sound harsh. Thus, if the ambient noise is reduced, the amount of voice intelligibility processing can be reduced to avoid harshness in the speech.
Further, the overall gain of the audio signal may also be adaptively increased based at least in part on the noise level and/or the speech level. However, if the total gain of the audio signal is increased beyond a certain level, saturation of the audio signal may occur, thereby causing harmonic distortion. In particular embodiments, to reduce the effects of saturated distortion, a distortion control process may be used. The distortion control process may reduce distortion that occurs during high gain situations while allowing some distortion to occur to maintain or increase loudness. In a particular embodiment, distortion control may be performed by mapping an audio signal to an output signal, where the output signal has fewer harmonics than a fully saturated signal.
II. System overview
FIG. 1 illustrates an embodiment of a mobile phone environment 100 for implementing a voice enhancement system 110. In the example mobile phone environment 100, a caller phone 104 and a recipient phone 108 are shown. The caller phone 104 and the recipient phone 108 may be mobile phones, voice over Internet protocol (VoIP) phones, smart phones, landline phones, and the like. The caller phone 104 may be considered to be located at a far end of the mobile phone environment 100, and the recipient phone 108 may be considered to be located at a near end of the mobile phone environment 100. The near end and the far end may be reversed when the user of the recipient phone 108 speaks.
In the depicted embodiment, the caller provides voice input 102 to caller telephone 104. A transmitter 106 in the calling party telephone 104 sends the voice input signal 102 to a receiving party telephone 108. The transmitter 106 may transmit the voice input signal 102 wirelessly or through a landline depending on the type of calling party telephone 104. The voice enhancement system 110 of the receiving party's telephone 108 may receive the voice input signal 102. The voice enhancement system 110 may include hardware and/or software for improving the intelligibility of the voice input signal 102. For example, the voice enhancement system 110 may process the voice input signal 102 with voice enhancement that enhances the distinctive characteristics of the spoken sound.
The voice enhancement system 110 may also utilize the microphone of the recipient phone 108 to detect the ambient noise 112. The ambient noise or content 112 may include background or ambient noise. In addition to its ordinary meaning, ambient noise or content may also include some or all of the near-end noise. For example, the ambient noise or content may include echo from the speaker output 114 in addition to background sound received by the microphone of the recipient phone 108. In some cases, the ambient noise may also include voice input from the user of the recipient phone 108, including coughing, whistling, and double talk (see "Noise sensitivity control" below).
Advantageously, in a particular embodiment, the voice enhancement system 110 adjusts the strength of the voice enhancement applied to the voice input signal 102 based at least in part on the amount of ambient noise 112. For example, if the ambient noise 112 increases, the voice enhancement system 110 may increase the amount of voice enhancement applied and vice versa. Thus, the speech enhancement may track, at least in part, the amount of detected ambient noise 112.
Further, the speech enhancement system 110 may increase the overall gain applied to the speech input signal 102 based at least in part on the amount of ambient noise 112. However, when less ambient noise 112 is present, the speech enhancement system 110 may reduce the amount of speech enhancement and/or gain increase applied. This reduction is beneficial to the listener because the speech enhancement and/or volume increase may sound harsh or annoying when low levels of background noise 112 are present.
Thus, in particular embodiments, the speech enhancement system 110 transforms the speech input signal into an enhanced output signal 114, where the enhanced output signal 114 may be better understood by a listener in the presence of varying ambient noise levels. In some embodiments, a voice enhancement system 110 may also be included in calling party telephone 104. The voice enhancement system 110 may apply enhancement to the voice input signal 102 based at least in part on the amount of ambient noise detected by the calling party telephone 104. Thus, the voice enhancement system 110 may be used in the caller telephone 104, the recipient telephone 108, or both.
Although the voice enhancement system 110 is shown as part of the telephone 108, the voice enhancement system 110 may alternatively be implemented in any communication device or devices in communication with a telephone. For example, the voice enhancement system 110 may be implemented in a computer, router, analog telephone adapter, or the like that is in communication with or coupled to a VOIP-enabled telephone. The voice enhancement system 110 may also be used in public address ("PA") devices (including PAs over internet protocol), radio transceivers, hearing aid devices (e.g., hearing aids), walkie-talkies, and other audio systems. Further, the voice enhancement system 110 may be implemented in any processor-based system that provides audio output to one or more speakers.
Fig. 2 shows a more detailed embodiment of the voice enhancement system 210. The voice enhancement system 210 may have all of the features of the voice enhancement system 110. The voice enhancement system 210 may be implemented in: a mobile phone, a cellular phone, a smart phone, or other computing device including any of the above. Advantageously, in a particular embodiment, the voice enhancement system 210 adjusts the voice intelligibility processing and the volume processing based at least in part on the amount of detected ambient noise and/or the level of the voice signal.
The voice enhancement system 210 includes a voice enhancement module 220. The speech enhancement module 220 may include hardware and/or software for applying speech enhancement to the speech input signal 202. The voice enhancement may enhance the distinctive characteristics of the vocal sounds in the voice input signal 202. In some embodiments, these distinguishing characteristics include formants generated in the vocal tract of a person (e.g., a caller using a telephone). Intelligibility of human speech may depend to a large extent on the pattern of the frequency distribution of the formants. Thus, the speech enhancement module 220 may selectively enhance formants to provide more easily understood speech in the presence of background noise.
In a particular embodiment, the voice enhancement module 220 applies voice enhancement using some or all of the features described in U.S. Patent No. 5,459,813 (the "'813 patent"), entitled "Public Address Intelligibility System," dated October 17, 1995, the entire contents of which are incorporated herein by reference. Although the '813 patent describes these features in the context of a circuit, the voice enhancement module 220 may implement some or all of them using instructions executed by a processor, such as a digital signal processor (DSP). In addition, the voice enhancement module 220 may also use voice enhancement techniques not disclosed in the '813 patent.
The speech enhancement module 220 may process formants by dividing the speech input signal 202 into frequency sub-bands. For example, the voice enhancement module 220 may divide the voice input signal 202 into two or more sub-bands. The speech enhancement module 220 may perform this frequency division by applying band-pass filters having center frequencies at or near frequencies where formants tend to occur. In an embodiment, such frequency division may be accomplished by a spectrum analyzer 42 or 124 as described, for example, at column 4, line 50 through column 5, line 24 and at column 7, lines 10 through 32 of the '813 patent, which sections of the '813 patent are specifically incorporated herein by reference.
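A digital sub-band split of this kind can be sketched with a small bank of band-pass biquad filters. The sketch below uses the well-known RBJ audio-EQ-cookbook band-pass form; the center frequencies and Q value are illustrative assumptions, not values from the patent.

```python
import math

def bandpass_coeffs(fc, fs, q=2.0):
    """Biquad band-pass coefficients (RBJ cookbook, constant 0 dB peak
    gain form): unity gain at the center frequency fc."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(x, b, a):
    """Direct-form-I filtering of a list of samples."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def split_subbands(signal, fs, centers=(500.0, 1500.0, 2500.0)):
    """Divide the input into sub-bands centered near typical formant
    regions (the center frequencies here are illustrative)."""
    return [biquad(signal, *bandpass_coeffs(fc, fs)) for fc in centers]
```

A 500 Hz tone fed through this bank retains most of its energy in the 500 Hz band and is strongly attenuated in the other bands, which is the behavior the formant split relies on.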
The speech enhancement module 220 may apply speech enhancement by independently amplifying formants in the sub-bands and selectively weighting them. Weighting the formants may allow certain formants to be emphasized, thereby improving intelligibility. The speech enhancement module 220 may combine the weighted formants with the baseband speech component to provide an output speech signal to an output gain controller 230 (described below). The voice enhancement module 220 may also enhance other voiced distinction features such as plosives and fricatives.
For example, the voice enhancement module 220 may perform these amplification, weighting, and combining functions (or digital implementations thereof) in the same or a similar manner as described in the following portions of the '813 patent: column 5, lines 1 through 7; column 5, line 46 through column 6, line 19; and column 9, lines 8 through 39. Accordingly, these portions of the '813 patent are specifically incorporated herein by reference. To illustrate how some of these functions may be implemented digitally, the '813 patent describes using variable resistors to weight signals in particular sub-bands (see, e.g., column 5, line 66 through column 6, line 19). The speech enhancement module 220 may implement these weights digitally by storing gain values in memory and applying the gain values to the signal with a processor.
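The digital analogue of the variable-resistor weighting is simply a stored gain per sub-band, applied by multiplication and mixed back with the baseband component. A minimal sketch, with the function name and gain values chosen for illustration:

```python
def apply_formant_weights(subbands, weights, baseband=None):
    """Multiply each sub-band by its stored gain value and sum the
    weighted bands (plus the unprocessed baseband component, if given)."""
    n = len(subbands[0])
    out = [0.0] * n if baseband is None else list(baseband)
    for band, gain in zip(subbands, weights):
        for i in range(n):
            out[i] += gain * band[i]
    return out
```

Raising a band's gain value emphasizes the formants in that band; the gains themselves would come from the enhancement level control described below.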
Advantageously, in certain embodiments, a voice enhancement controller 222 is provided that can control the level of voice enhancement provided by the voice enhancement module 220. The voice enhancement controller 222 may include hardware and/or software. The speech enhancement controller 222 may provide an enhancement level control signal or value to the speech enhancement module 220, which increases or decreases the level of speech enhancement applied accordingly. In one embodiment, the enhancement level control signal adjusts the weighting of the sub-bands. For example, the control signal may include one or more gain values multiplied by the outputs (or inputs) of some or all of the sub-bands. Likewise, the control signal may be used to add or subtract the inputs or outputs of some or all of the sub-bands. The control signal may be adjusted on a sample-by-sample basis as the ambient noise 204 increases and decreases.
In a particular embodiment, the voice enhancement controller 222 adjusts the level of voice enhancement after detecting a threshold energy of the ambient noise 204. Above the threshold, the speech enhancement controller 222 may cause the level of speech enhancement to follow, or substantially follow, the amount of ambient noise 204. In one embodiment, for example, the level of enhancement provided above the noise threshold is proportional to the ratio of the noise energy (or power) to the threshold. In an alternative embodiment, the level of speech enhancement is adjusted regardless of the amount of ambient noise present, e.g., no threshold is used.
The depicted embodiment of the voice enhancement system 210 includes a noise sensitivity controller 224 and an additional enhancement control 226, the additional enhancement control 226 being used to further adjust the amount of control provided by the voice enhancement controller 222. The noise sensitivity controller 224 may provide a noise sensitivity control value to the voice enhancement controller 222 to adjust the degree to which the voice enhancement controller 222 is sensitive to the amount of noise 204 present. As will be described in more detail below, the noise sensitivity controller 224 may affect the noise threshold below which the speech enhancement controller 222 may not adjust the level of speech enhancement.
In a particular embodiment, the noise sensitivity controller 224 automatically generates the noise sensitivity control based at least in part on audio samples obtained from microphone and/or speaker inputs. Advantageously, in certain embodiments, the noise sensitivity controller 224 may automatically adjust the noise sensitivity to account for speaker echo and other noise artifacts picked up by the microphone. These features will be described in more detail below with reference to fig. 5 and 6. Further, in some embodiments, the noise sensitivity controller 224 provides a user interface that allows a user to adjust the noise sensitivity control. Thus, the noise sensitivity controller 224 may provide automatic and/or manual control of the speech enhancement controller 222.
The additional enhancement control 226 may provide an additional enhancement control signal to the voice enhancement controller 222 that may serve as a floor below which the level of enhancement does not decrease. The additional enhancement control 226 may be presented to the user via a user interface. This control 226 allows the user to increase the level of enhancement beyond that determined by the voice enhancement controller 222. In one embodiment, the voice enhancement controller 222 adds the additional enhancement from the additional enhancement control 226 to the enhancement level it has determined. The additional enhancement control 226 may be particularly useful for hearing-impaired listeners who want stronger voice enhancement processing or who want voice enhancement processing applied frequently.
In particular embodiments, output gain controller 230 may control the amount of overall gain applied to the output signal of speech enhancement module 220. The output gain controller 230 may be implemented in hardware and/or software. The output gain controller 230 adjusts the gain applied to the output signal based at least in part on the level of the noise input 204 and the level of the voice input 202. Such a gain may be applied in addition to any user-set gain such as the volume control of the phone. Advantageously, adjusting the gain of the audio signal based on the ambient noise 204 and/or the voice input 202 may help the listener to further perceive the voice input signal 202.
In the depicted embodiment, an adaptive level control 232 is also shown, which may further adjust the amount of gain provided by the output gain controller 230. The user interface may also present the adaptive level control 232 to the user. Increasing this control 232 causes the gain of the controller 230 to increase more as the incoming voice input 202 level decreases or as the noise input 204 increases. Reducing this control 232 causes the gain of the controller 230 to increase less as the incoming voice input 202 level decreases or as the noise input 204 increases.
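The patent does not give a formula for the output gain, so the following is a purely hypothetical sketch of the behavior just described: gain rises as noise rises or the incoming voice level falls, scaled by an adapt factor standing in for the adaptive level control 232, and capped at a maximum.

```python
def output_gain(noise_level, voice_level, adapt=1.0, base=1.0, max_gain=4.0):
    """Hypothetical sketch (not the patent's formula): boost the overall
    gain as ambient noise rises or the incoming voice level falls; the
    adapt factor plays the role of the adaptive level control 232."""
    boost = adapt * max(0.0, noise_level - voice_level)
    return min(base + boost, max_gain)
```

With this shape, raising the adapt factor makes the gain respond more aggressively to the same noise conditions, mirroring the behavior of increasing control 232.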
In some cases, the gain applied by the speech enhancement module 220, the speech enhancement controller 222, and/or the output gain controller 230 may cause the speech signal to be clipped or saturated. Saturation may result in harmonic distortion that is unpleasant to the listener. Thus, in certain embodiments, a distortion control module 140 is also provided. The distortion control module 140 may receive the gain-adjusted speech signal of the output gain controller 230. The distortion control module 140 may include hardware and/or software to control distortion while at least partially preserving or even increasing the signal energy provided by the speech enhancement module 220, the speech enhancement controller 222, and/or the output gain controller 230.
In a particular embodiment, the distortion control module 140 controls distortion in the speech signal by mapping one or more samples in the speech signal to an output signal that has fewer harmonics than a fully saturated signal. For non-saturated samples, the mapping may follow the speech signal linearly or approximately linearly. For saturated samples, the mapping may be a nonlinear transformation that applies controlled distortion. Thus, in particular embodiments, the distortion control module 140 may allow the voice signal to sound louder with less distortion than a fully saturated signal. In this way, the distortion control module 140 converts data representing a physical voice signal into data representing another physical voice signal having controlled distortion.
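The sum-of-sinusoids mapping can be sketched as a lookup table built from a few low-order sine harmonics, through which clamped samples are mapped with their sign preserved. The table size and harmonic amplitudes below are illustrative assumptions, not values from the patent; the point is only that the resulting saturation curve is smooth and monotonic, with far fewer harmonics than hard clipping.

```python
import math

def sos_table(size=1024, harmonics=(1, 3, 5), amps=(1.0, 0.15, 0.05)):
    """Quarter-wave table built from a sum of low-order (odd) sine
    harmonics, normalized so it runs from 0.0 to 1.0."""
    table = []
    for i in range(size):
        phase = (math.pi / 2.0) * i / (size - 1)
        table.append(sum(a * math.sin(h * phase) for h, a in zip(harmonics, amps)))
    top = table[-1]
    return [v / top for v in table]

def map_sample(x, table):
    """Map a clamped sample through the table, preserving sign:
    a smooth, monotonic saturation instead of a hard clip."""
    x = max(-1.0, min(1.0, x))
    idx = int(abs(x) * (len(table) - 1))
    return math.copysign(table[idx], x)
```

Samples that would have clipped at plus or minus full scale instead land on the rounded top of this curve, which is what keeps the perceived loudness while limiting harmonic distortion.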
III. Speech enhancement control
Fig. 3 illustrates an embodiment of a voice enhancement control process 300. The speech enhancement control process 300 may be implemented by the speech enhancement system 110 or 210. In particular, the voice enhancement control process 300 may be implemented by the voice enhancement controller 222. Advantageously, in particular embodiments, the voice enhancement control process 300 adjusts the voice enhancement processing based at least in part on the level of ambient noise energy.
At block 302, an ambient noise input signal is received by a communication device, such as a telephone. The ambient noise input signal may be detected by a microphone of the communication device. At decision block 304, a determination is made whether environmental control is enabled. If environmental control is not enabled, a value of zero is provided to block 306. In one embodiment, the environmental control may be enabled or disabled by a user through a user interface of the communication device. Disabling environmental control may cause the voice enhancement control process to adjust the voice enhancement processing based on factors other than the noise level, such as the additional control level described above.
The energy of the ambient noise signal may be calculated by taking the absolute value of the noise signal in block 306 and by applying a noise smoothing filter to the noise signal in block 308. The noise smoothing filter may be a first order filter or a higher order filter. For example, the smoothing filter may be a low pass filter or the like. In some embodiments, the noise smoothing filter provides an average (e.g., moving average) noise energy level per sample. In an alternative embodiment, the power of the noise signal is calculated instead of the energy.
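The rectify-and-smooth computation of blocks 306 and 308 can be sketched as a one-pole (first-order) smoother over the absolute value of the noise samples. The smoothing coefficient below is an illustrative assumption.

```python
def smoothed_noise_energy(samples, coeff=0.99):
    """Blocks 306-308 sketch: rectify the noise signal (absolute value),
    then run a first-order (one-pole) smoothing filter to track an
    average noise energy estimate per sample."""
    estimate = 0.0
    trace = []
    for s in samples:
        estimate = coeff * estimate + (1.0 - coeff) * abs(s)
        trace.append(estimate)
    return trace
```

A coefficient closer to 1.0 gives a slower, steadier estimate; a smaller coefficient tracks noise bursts more quickly.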
At block 310, the energy of the ambient noise signal may be provided to an output gain control process. An example output gain control process is described below with reference to fig. 4. Ambient noise energy may also be provided to a decision block 312, which decision block 312 may determine whether the energy has reached (e.g., is greater than or equal to) a noise threshold. In one embodiment, the noise threshold is calculated as follows:
noise threshold = 1 - α^(noise sensitivity control)    (1)
where α is a constant and the noise sensitivity control may be a value generated by the noise sensitivity controller 224 of FIG. 2. The noise sensitivity control may affect the sensitivity of the voice enhancement controller 222 to the ambient noise input 302. The noise sensitivity control may be changed based on various factors such that the noise threshold changes (see FIGS. 5 and 6). In certain embodiments, α and the noise sensitivity control may be in the range [0, 1], or may take other values outside this example range.
In the depicted embodiment, if the noise energy is greater than or equal to the threshold, the noise energy is passed to the multiplication block 314. Otherwise, a zero control level is provided to the multiplication block 314. Because the control level may be multiplied by the voice signal sub-bands described above with reference to FIG. 2, a zero control level may potentially result in no voice enhancement processing being applied to the voice signal (e.g., no additional processing is provided at block 316 below).
At multiplication block 314, the output of decision block 312 is multiplied by the multiplicative inverse of the noise threshold. Alternatively, the output of decision block 312 may be divided by the noise threshold. The output of the multiplication block 314 may be a preliminary enhancement control level. Thus, in particular embodiments, the enhancement level may be a ratio of the noise energy to the noise threshold.
At block 316, additional enhancement control, described above with reference to FIG. 2, may be added to the preliminary enhancement control level. The additional enhancement control may be in the range [0, 1], or have some other value. At decision block 318, a determination is made whether a high control level has been reached. The high control level may be a predetermined peak or maximum control level. If the high control level has been reached, the enhancement control level is limited to the high control level at block 318. Otherwise, decision block 318 passes the enhancement control level to decision block 320.
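Combining equation (1) with blocks 312 through 318, the enhancement control level might be computed as in this sketch; the default constants, the additional enhancement value, and the high control level are illustrative assumptions, not values from the patent:

```python
def enhancement_level(noise_energy, noise_sensitivity, alpha=0.5,
                      additional=0.0, high_level=4.0):
    """Sketch of the enhancement control computation of blocks 312-318."""
    noise_threshold = 1.0 - alpha * noise_sensitivity  # equation (1)
    if noise_energy >= noise_threshold:                # decision block 312
        level = noise_energy / noise_threshold         # block 314: ratio
    else:
        level = 0.0                                    # below threshold: no enhancement
    level += additional                                # block 316
    return min(level, high_level)                      # block 318: limit to high level
```

Note how a higher noise sensitivity lowers the threshold, so enhancement engages at lower ambient noise levels.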
At decision block 320, it may be determined whether voice enhancement control is enabled. If not, the user input may be used to adjust the speech enhancement processing level. User input may be presented to the user via a user interface or the like. If control is enabled, the enhanced control level calculated in blocks 302 through 318 may be taken as the output control level in block 322.
Although a noise threshold is used in this example, it is not necessary to use a noise threshold in all embodiments. In particular embodiments, the voice enhancement processing may be adjusted based on any noise level. However, in some cases, it may be beneficial to use a threshold. For example, in low ambient noise situations, the voice enhancement processing may be harsh or unpleasant. Thus, using a threshold to determine when to turn on voice enhancement control may cause voice enhancement processing to be used in the presence of greater noise levels.
Output gain control
Fig. 4 illustrates an embodiment of an output gain control process 400. The output gain control process 400 may be implemented by the voice enhancement system 110 or 210. In particular, the output gain control process 400 may be implemented by the output gain controller 230. Advantageously, in particular embodiments, the output gain control process 400 adjusts the output gain based at least in part on the level of ambient noise energy and the speech input level.
At block 402, a voice input signal from a remote caller is received by a communication device, such as a telephone. The energy in the voice input signal may be determined at blocks 404 and 406 by taking the absolute value of the voice input at block 404 and by applying a voice smoothing filter at block 406. The voice smoothing filter may be a low pass filter or the like that provides an average (e.g., moving average) voice level on a sample-by-sample basis.
At block 408, ambient noise energy is received. The ambient noise energy is calculated in the volume control process 300 described above. At decision block 410, the output of the voice smoothing filter is compared to a receive gain threshold and the ambient noise energy is compared to a microphone gain threshold. The receive gain threshold may depend at least in part on the adaptive gain control described above with reference to fig. 2. The microphone gain threshold may be based at least in part on the noise sensitivity control described above with reference to fig. 2.
In one embodiment, the receive gain threshold is calculated as follows:
receive gain threshold = 0.5 + (γ × adaptive gain control)    (2)
Where γ is a constant ranging between [0, 1], and the adaptive gain control may be a value corresponding to the adaptive gain control 232 of fig. 2. Also, the microphone gain threshold may be calculated as follows:
microphone gain threshold = 1 - (η × noise sensitivity control)    (3)
Where η is a constant in the range between [0, 1], and the noise sensitivity control is a value generated by the above-described noise sensitivity controller 224. The noise sensitivity control may change value (see also fig. 5 and 6) so that in some embodiments the microphone gain threshold also changes.
At decision block 410, if the condition is satisfied, the ambient noise energy is provided to multiplication block 412. Otherwise, a low gain level may be provided to the multiplication block 412. The low gain level may be a minimum gain level, etc. For example, where the ambient noise energy is relatively low and the voice input is relatively high, a low gain level may be used. In these cases, fine gain adjustments may be desirable because the voice signal may already be relatively easy to understand.
At a multiplication block 412, the output of the decision block 410 is multiplied by the multiplicative inverse of the microphone gain threshold to produce a gain level. Alternatively, the output of decision block 410 may be divided by the microphone gain threshold. Thus, the gain level may be a ratio of the ambient noise energy to the microphone gain threshold. At decision block 414, it is determined whether a high gain level has been reached. If the high gain level has not been reached, the output of the multiplication block 412 is passed to the output gain smoothing filter 416. Otherwise, the high gain level is provided to the output gain smoothing filter. The high gain level may be a maximum gain level or the like.
At block 416, an output gain smoothing filter is applied to the output of decision block 414. The output gain smoothing filter, which may be a low pass filter or the like, averages the gain levels calculated at the multiplication block 412 and/or the decision block 414. The smoothing filter can reduce abrupt changes in gain level. At block 418, the output of the gain smoothing filter is multiplied by an output gain control, which may be a user-set value. For example, the output gain control may be presented to the user via a user interface. At block 420, the output of the multiplication block 418 is provided as the output gain level.
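As a rough sketch (not the patent's implementation), the gain computation of blocks 408 through 414 might look like the following, using equations (2) and (3) and one plausible reading of the block-410 condition (voice level below the receive gain threshold and noise energy at or above the microphone gain threshold); the constants and limits are illustrative, and the smoothing of block 416 and the user output gain control of block 418 are omitted:

```python
def output_gain(voice_level, noise_energy, adaptive_gain, noise_sensitivity,
                gamma=0.3, eta=0.5, low_gain=1.0, high_gain=4.0):
    """One step of the output gain computation of fig. 4 (pre-smoothing sketch)."""
    receive_threshold = 0.5 + gamma * adaptive_gain  # equation (2)
    mic_threshold = 1.0 - eta * noise_sensitivity    # equation (3)
    # Decision block 410 (assumed reading): boost only when the voice is
    # quiet relative to its threshold and the ambient noise is loud.
    if voice_level < receive_threshold and noise_energy >= mic_threshold:
        gain = noise_energy / mic_threshold          # block 412: ratio
    else:
        gain = low_gain                              # quiet noise or loud voice
    return min(gain, high_gain)                      # decision block 414
```

The per-call results would then feed the output gain smoothing filter of block 416 to avoid abrupt gain changes.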
V. noise sensitivity control
As described above, the noise sensitivity control generated by the noise sensitivity controller 224 may be changed automatically or under user control. In certain embodiments, changing the noise sensitivity control affects the sensitivity of the voice enhancement controller 222 and/or the output gain controller 230 to noise. In one embodiment, increasing the noise sensitivity control causes the voice enhancement controller 222 to respond more aggressively to ambient noise by enhancing the intelligibility of the voice more strongly, and vice versa. Similarly, increasing the noise sensitivity control may cause the output gain controller 230 to increase the output gain applied to the enhanced audio signal more strongly, and vice versa.
In several cases, it may be beneficial to automatically reduce the sensitivity of the speech enhancement controller 222 and/or the output gain controller 230. For example, if the receiving telephone 108 of fig. 1 only receives noise and does not receive a voice signal from the calling party telephone 104 (e.g., due to a conversation pause), applying voice enhancement may increase the loudness of the noise. Furthermore, unpleasant effects may occur when the microphone of the receiving telephone 108 picks up the voice signal from the speaker output 114 of the telephone 108. This speaker feedback may be interpreted by the voice enhancement controller 222 as ambient noise, which may cause the voice enhancement to modulate the speaker feedback. The resulting modulated output signal 114 may be unpleasant to a listener. A similar problem may occur when the recipient phone 108 outputs a voice signal received from the caller phone 104 while the listener is talking to the recipient phone 108. The microphone of the receiving telephone 108 may detect the double talk and the voice enhancement controller 222 may cause the voice enhancement to modulate the double talk, resulting in an unpleasant sound.
In particular embodiments, noise sensitivity controller 224 may overcome these and other problems by automatically adjusting the sensitivity of speech enhancement controller 222 and/or output gain controller 230 to noise. Alternatively, the noise sensitivity controller 224 may trigger (e.g., turn on or off) the voice enhancement controller 222 and/or the output gain controller 230. Referring to fig. 5A, 5B, and 5C, embodiments of noise sensitivity controllers 524a, 524b, and 524c are shown in greater detail. The noise sensitivity controller 524a of fig. 5A may adjust the noise sensitivity of the controllers 222, 230 or trigger the controllers 222, 230 to account for a situation where the receiving telephone 108 receives only noise and does not receive a voice signal from a far end (e.g., from the calling party telephone 104). The noise sensitivity controller 524b of fig. 5B may adjust the noise sensitivity of the controllers 222, 230 or trigger the controllers 222, 230 to account for speaker feedback and/or double talk situations. The noise sensitivity controller 524c of fig. 5C combines the features of the controllers 524a, 524b shown in fig. 5A and 5B.
In fig. 5A, the noise sensitivity controller 524a receives the speaker input 502 a. Speaker input 502a may include one or more output samples stored in a buffer or the like that are also provided to a speaker of a communication device, such as telephone 108. The speaker input 502a may be the output signal 250 of the voice enhancement system 210 described above. Speaker input 502a is provided to correlator 530a, where correlator 530a may calculate or estimate an autocorrelation of speaker input 502 a. In an embodiment, correlator 530a calculates an autocorrelation of a set of samples in speaker input 502 a.
The voice signal tends to be periodic or substantially periodic. Thus, if the speaker input 502a includes a voice signal, the autocorrelation function of the speaker input 502a may also be periodic or substantially periodic due to the nature of the autocorrelation. On the other hand, noise signals are generally uncorrelated and not periodic (some anomalies are described below). Evaluating the autocorrelation of a periodic or substantially periodic signal may result in a value that is greater than the autocorrelation of many noise signals.
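This distinction can be illustrated with a normalized autocorrelation; the example signals, period, and lag below are arbitrary choices for illustration, not values from the patent:

```python
import math
import random

def normalized_autocorr(samples, lag):
    """Autocorrelation at a single lag, normalized by total signal energy."""
    energy = sum(s * s for s in samples)
    if energy == 0.0:
        return 0.0
    corr = sum(samples[i] * samples[i - lag] for i in range(lag, len(samples)))
    return corr / energy

# A periodic, voice-like signal correlates strongly at its period;
# white noise does not.
tone = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
```

Evaluating `normalized_autocorr` at the tone's period yields a value near one, while the same lag on the noise signal yields a value near zero, matching the behavior the correlator 530a relies on.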
The autocorrelation calculated by correlator 530a is provided to sensitivity adjuster 550 a. In one embodiment, the speaker input 502a is most likely noise if the autocorrelation is small or below a threshold. Accordingly, the sensitivity adjuster 550a can reduce the noise sensitivity control 504a corresponding to the noise sensitivity control of the above equations (1) and (3). Thus, noise sensitivity control 504a may adjust the noise threshold used by speech enhancement controller 222 and/or the microphone gain threshold used by output gain controller 230. Thus, the speech enhancement controller 222 and/or the output gain controller 230 may be less aggressive in response to ambient noise. If the autocorrelation is large or greater than a threshold (indicating that the speaker input 502a may include speech), the sensitivity adjuster 550a may increase the noise sensitivity control 504 a. Thus, the speech enhancement controller 222 and/or the output gain controller 230 may be actively responsive to ambient noise.
In certain embodiments, the amount of sensitivity adjustment provided by the sensitivity adjuster 550a may correspond to the level of autocorrelation. For example, the lower the autocorrelation, the smaller the sensitivity adjuster 550a may make the noise sensitivity control 504a, and vice versa.
In the depicted embodiment, correlator 530a also provides autocorrelation values to optional variance module 540a. The variance module 540a may calculate or estimate a variance of a set of autocorrelation values. The variance module 540a may provide the resulting variance value to the sensitivity adjuster 550a, which the sensitivity adjuster 550a may use to refine the adjustment to the noise sensitivity control 504a. Larger variance values may reflect the presence of a speech signal, while smaller variance values may reflect the presence of dominant noise. Thus, the sensitivity adjuster 550a may include logic to increase the noise sensitivity control 504a when the autocorrelation and the variance value are both large, and to decrease the noise sensitivity control 504a when one or both of the autocorrelation and the variance value are small.
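A sketch of the variance computation over a set of autocorrelation values; the lag range and example signals are arbitrary illustrative choices:

```python
import math
import random

def autocorr_series(samples, max_lag):
    """Normalized autocorrelation values for lags 1 through max_lag."""
    energy = sum(s * s for s in samples) or 1.0
    return [
        sum(samples[i] * samples[i - lag] for i in range(lag, len(samples))) / energy
        for lag in range(1, max_lag + 1)
    ]

def variance(values):
    """Population variance of a list of values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# A voice-like periodic signal yields an oscillating autocorrelation with
# large variance; white noise yields values near zero with small variance.
tone = [math.sin(2 * math.pi * i / 40) for i in range(800)]
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]
```

The large variance gap between the two cases is what lets the sensitivity adjuster 550a refine its decision beyond the raw correlation value.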
A variety of alternative configurations for the illustrated example noise sensitivity controller 524a may be provided. For example, the variance module 540a may be omitted. Alternatively, correlator 530a may provide a value only to the variance module, and sensitivity adjuster 550a may adjust noise sensitivity control 504a based only on the variance value. In addition, correlator 530a may use other statistical measures to analyze speaker input 502 a. For example, correlator 530a may use any normalized unbiased estimator. In one embodiment, correlator 530a normalizes the correlation by the total power or energy of a set of samples. Normalizing the correlation by power may cause the sensitivity adjuster 550a to adjust the noise sensitivity control 504a based on the characteristics of the input signal 502a rather than based on the power variance of the input signal 502 a.
Referring to fig. 5B, an example noise sensitivity controller 524b includes most of the features of fig. 5A. However, the noise sensitivity controller 524b receives a microphone ("mic") input 502b, rather than the speaker input 502a, where the microphone input 502b may comprise a set of samples received by a microphone. Applying the correlation and/or variance techniques described above to the microphone input 502b may allow the noise sensitivity controller 524b to improve voice intelligibility processing in the presence of speaker feedback and/or double talk.
The microphone input 502b is provided to a correlator 530b, which may provide the same autocorrelation features described above. In the case of speaker feedback or double talk, the microphone input 502b may include periodic or substantially periodic information. Thus, the autocorrelation function may be periodic or substantially periodic, and the autocorrelation values calculated by correlator 530b may be greater than the autocorrelation of many forms of noise.
As before, correlator 530b may provide autocorrelation values to sensitivity adjuster 550b. If the autocorrelation value is large or greater than the threshold, the sensitivity adjuster 550b may reduce the noise sensitivity control 504b to reduce the voice enhancement modulation caused by speaker feedback and/or double talk. Similarly, if the autocorrelation value is small or less than the threshold, the sensitivity adjuster 550b may increase the noise sensitivity control 504b. As above, the sensitivity adjuster 550b may adjust the amount of the noise sensitivity control 504b based at least in part on the level of autocorrelation.
Correlator 530b also provides autocorrelation values to optional variance module 540 b. The variance module 540b may calculate a variance or an approximation of the variance of a set of autocorrelation values. The variance module 540b may provide the resulting variance value to the sensitivity adjuster 550b, which the sensitivity adjuster 550b may use to refine the adjustment of the noise sensitivity control 504 b. Larger variance values may reflect the presence of voice feedback and/or double talk, while smaller variance values may reflect primarily the presence of noise. Thus, the sensitivity adjuster 550b may also reduce the noise sensitivity control 504b when the variance is large, and vice versa.
Advantageously, the variance module 540b may account for certain noise signals having harmonic components. Some noise signals, such as those generated by automobiles and aircraft, have low frequency harmonic content, which can result in higher correlation values. However, the autocorrelation of these noise signals may have a lower variance value than for the speech signal. Accordingly, the sensitivity adjuster 550b may include logic to decrease the noise sensitivity control 504b when the autocorrelation and the variance values are both large, and to increase the noise sensitivity control 504b when one or both of the autocorrelation and the variance values are small.
In various embodiments, the alternative configuration described above with reference to noise sensitivity controller 524a may also be used to modify noise sensitivity controller 524 b. Further, in alternative embodiments, an acoustic echo canceller may be used in place of correlator 530b, variance module 540b, and/or sensitivity adjuster 550b (or an acoustic echo canceller may be used in addition to correlator 530b, variance module 540b, and/or sensitivity adjuster 550 b). The acoustic echo canceller may reduce or cancel the echo received from the speaker at the microphone input 502 b. For example, an acoustic echo canceller implementing the features described in ITU-T recommendation G.167, 3 months 1993, may be employed and is hereby incorporated by reference in its entirety. Advantageously, however, in certain embodiments, the correlation and/or variance features described herein may be implemented with less processing resources than an acoustic echo canceller.
Referring to fig. 5C, the noise sensitivity controller 524c combines the features of the noise sensitivity controllers 524a and 524b. In particular, the noise sensitivity controller 524c receives the microphone input 502b and the speaker input 502a. Speaker input 502a is provided to correlator 530a, correlator 530a provides autocorrelation values to sensitivity adjuster 550c and variance module 540a, and variance module 540a provides variance values to sensitivity adjuster 550c. Microphone input 502b is provided to correlator 530b, correlator 530b provides autocorrelation values to sensitivity adjuster 550c and variance module 540b, and variance module 540b provides variance values to sensitivity adjuster 550c.
The sensitivity adjuster 550c may include logic to adjust the noise sensitivity control 504c based at least in part on information received from any of the components 530a, 530b, 540a, and 540b. In certain embodiments, the sensitivity adjuster 550c performs a soft decision to adjust the noise sensitivity control 504c. One example of a process 600 that may be performed by the sensitivity adjuster 550c is depicted in fig. 6. At decision block 602 of process 600, a determination is made whether the microphone variance value is greater than a threshold value. The microphone variance values may be calculated by variance module 540b. If the variance of the autocorrelation of the microphone input 502b is greater than a threshold, a periodic or substantially periodic signal may be present due to voice feedback or double talk. Accordingly, at block 604, the sensitivity adjuster 550c reduces the noise sensitivity control based at least in part on the correlation value from the correlator 530b, where a larger correlation value potentially results in a larger reduction.
If the microphone variance is less than the threshold, a determination is made at decision block 606 as to whether the speaker variance is less than the threshold. The variance module 540a may calculate speaker variance values from the autocorrelation of the speaker input 502 a. If the speaker variance is greater than or equal to the threshold, a speech signal may be present in the speaker input signal 502 a. Accordingly, at block 608, the sensitivity adjuster 550c sets the noise sensitivity control to a default level.
If the speaker variance is less than the threshold, then noise may be present in the speaker input 502a. Thus, at block 610, the sensitivity adjuster 550c reduces the noise sensitivity control based at least in part on the correlation value from the correlator 530a, where a smaller correlation value potentially results in a greater reduction.
The process 600 illustrates one example implementation of the sensitivity adjuster 550 c. In other embodiments, one or both of the thresholds described in process 600 may be provided with an amount of hysteresis. In other embodiments, in block 604, the noise sensitivity control is set to a particular smaller value that does not depend directly on the correlation value. Likewise, in block 610, the noise sensitivity control may be set to a value that is not dependent on the correlation value. In addition, other statistical measures besides autocorrelation and variance may be used to adjust noise sensitivity, including standard deviation, high order moments, acoustic echo cancellation, and the like. Numerous other configurations are also possible.
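The soft decision of process 600 might be sketched as follows; the variance thresholds, the default level, and the 0.5 scale factor are hypothetical values chosen for illustration, not taken from the patent:

```python
def adjust_sensitivity(mic_var, mic_corr, spk_var, spk_corr,
                       mic_var_threshold=0.1, spk_var_threshold=0.1,
                       default=1.0):
    """Sketch of the soft decision of process 600 (fig. 6)."""
    if mic_var > mic_var_threshold:
        # Block 604: feedback or double talk likely; a larger microphone
        # correlation produces a larger reduction.
        return max(default - 0.5 * mic_corr, 0.0)
    if spk_var >= spk_var_threshold:
        # Block 608: speech likely present on the speaker; keep the default.
        return default
    # Block 610: mostly noise on the speaker; a smaller speaker
    # correlation produces a larger reduction.
    return max(default - 0.5 * (1.0 - spk_corr), 0.0)
```

The correlation and variance inputs would come from components 530a, 530b, 540a, and 540b.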
More generally, any of the noise sensitivity controllers described above may be viewed as a voice, dialog, or speech classifier that detects and/or classifies one or more sound, dialog, or speech components in an input audio signal. The noise sensitivity controller can also be seen as a voice detector or a general signal classifier. The noise sensitivity controller may perform voice or signal classification or detection at least in part by using one or more processors to analyze one or more statistical characteristics of the input audio signal. Autocorrelation and variance, acoustic echo cancellation, and estimators are only examples of techniques that may be employed by the noise sensitivity controller. Other techniques, including other statistical techniques, may be used to detect speech or other components of the input signal.
Furthermore, voice feedback and double talk are also only examples of sound components that may be detected. The features of the noise sensitivity controller described above with reference to fig. 5 and 6 may be used to detect other speech components in an audio signal, including speech in any media content (e.g., television, radio, music, and other content). For example, the controller may detect a speech component in the media content using autocorrelation of audio in the media content. In one embodiment, the controller may provide the detected voice component to the dialog enhancement to increase or decrease the amount of dialog enhancement applied, thereby enabling the dialog enhancement to enhance the dialog more effectively.
Distortion control
The speech enhancement controller 222 and/or the output gain controller 230 may increase one or more gains applied to the speech signal. In some cases, increasing the gain beyond a certain point may cause the signal to saturate, which may cause distortion. Advantageously, in certain embodiments, the distortion control module 240 described above may provide for controlling distortion, thereby providing greater loudness.
Fig. 7 illustrates an embodiment of the distortion control module 740 in greater detail, which may have all of the features of the distortion control module 140 described above. Distortion control module 740 may be implemented in hardware and/or software. In certain embodiments, the distortion control module 740 may cause selected distortions in the audio signal to increase signal energy and, thus, loudness. The selected distortion may be a control distortion that adds fewer harmonics than are present in a fully saturated signal.
As described above, distortion control module 740 may cause the selected distortion at least in part by mapping input samples to output samples. Distortion control module 740 may perform this mapping by using samples of input signal 702 as an index into a sum-of-sines table 714 or tables. The sum-of-sines table 714 may include values resulting from summing harmonically related sine waves.
For example, if the input signal 702 has samples with a value of m, the distortion control module 740 may map the input samples to output samples at index m in the sum-of-sinusoids table 714. If the samples of the input signal 702 fall between the index values of the table 714, the distortion control module 740 may interpolate the index values. Using interpolation may allow the size of the sine sum table 714 to be reduced to conserve memory. However, in certain embodiments, the sum-of-sinusoids table 714 may be designed to be large enough to avoid the use of interpolation. Distortion control module 740 may use the mapped output values of sine sum table 714 as output samples for output signal 722.
The sum of sinusoids table 714 may be implemented as any data structure, such as an array, matrix, or the like. Table 714 is generated to include any number of harmonic sinusoids including odd harmonics, even harmonics, or a combination thereof. In particular embodiments, odd harmonics may provide good distortion control for a speech audio signal. Even harmonics may be used in other implementations and are advantageous for reducing clipping in music signals. Odd or even harmonics can be used for mixed speech and music signals. However, this is merely an illustrative example, and either the odd harmonics or the even harmonics, or both, may be used in any application.
The more sinusoids used to generate table 714, the greater the potential increase in signal energy and distortion, and vice versa. Because the use of a large number of sinusoids can result in significant harmonic distortion, in certain embodiments it may be beneficial to construct the sum-of-sines table 714 using a relatively small number of low frequency sinusoids.
For example, table 714 may be constructed from a sum of two or three harmonically related sinusoids, four sinusoids, five sinusoids, or more. A plurality of sum-of-sines tables 714 may be stored in memory and may be used by the distortion control module 740 for different purposes. For example, a sum-of-sines table 714 with more harmonics may be used for a speech signal, while a table 714 with fewer harmonics may be used for music to produce less distortion.
The distortion control module 740 may also provide a user interface that provides distortion control to a user to adjust the amount of signal energy increase and/or distortion. For example, a graphical cursor, button, or the like may be provided, or the user can press a physical or soft button to adjust the amount of applied energy increase or distortion. Increasing distortion control may enable the use of tables with more harmonics and vice versa.
An example process for generating the sine sum table 714 is now described using sine waves associated with three odd harmonics. In this example, the sum of sinusoids table 714 may be generated by populating a first table of a selected size having a value of one sine wave period (e.g., from 0 radians to 2 pi). Populating a table of size N (N being an integer) may include dividing a sine wave period into N values and assigning the N values to N slots in the table. The first sine wave table may represent a fundamental harmonic or a first harmonic.
A second table of the same size as the first table may be populated with three periods of a sine wave in a similar manner (by dividing the three sine periods into N values). The values in the second table may represent the third harmonic of the first sine wave. Similarly, a third table of the same size as the first two tables, representing fifth harmonics, may be populated with five sine wave cycles. The values in the first, second and third tables may be scaled as needed. For example, the values in the second table may be scaled down to have a magnitude less than the magnitude of those in the first table, and the values in the third table may be scaled to include smaller values than those in the second table.
Because in particular embodiments the three tables are the same size (e.g., have the same number of N entries), the values in the respective indices of the three tables may be added together to create a new sum-of-sines table 714, which includes the sum of the first, third, and fifth harmonics. Thus, in a particular embodiment, if the values in table 714 were plotted, an approximate graph of one cycle of the summed waveform would be shown. In a particular embodiment, the more sine waves that are used, the more closely this waveform approximates a square wave. In various embodiments, other sum-of-sines tables with different harmonics may be constructed in a manner similar to that described for the three odd harmonics. Alternatively, the sum-of-sines table 714 may be constructed using portions of a sine wave period rather than a complete period.
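The three-table procedure above might be condensed into a single loop, as in this sketch; the harmonic set and the decreasing scale weights are hypothetical choices, not values from the patent:

```python
import math

def sum_of_sines_table(size, harmonics=(1, 3, 5), weights=(1.0, 0.35, 0.15)):
    """Build a one-cycle sum-of-sines table from scaled odd harmonics."""
    table = []
    for n in range(size):
        phase = 2.0 * math.pi * n / size
        # Sum the scaled harmonics at this phase (equivalent to filling
        # one table per harmonic and adding them index by index).
        value = sum(w * math.sin(h * phase) for h, w in zip(harmonics, weights))
        table.append(value)
    # Normalize so the table peaks at 1.0.
    peak = max(abs(v) for v in table)
    return [v / peak for v in table]
```

Choosing monotonically decreasing weights mirrors the scaling described above, where each higher harmonic has a smaller magnitude than the one before it.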
Since the distortion control module 740 maps samples from the input signal 702 into the sum of sinusoids table 714, the harmonic frequencies in the table 714 may depend on the table lookup rate, which in turn may depend on the frequency of the input signal. In certain embodiments, this frequency dependence causes distortion control module 740 to perform a table lookup operation at or near the same rate as the frequency of input signal 702.
By way of illustration, for a simple sine wave input signal 702 having a given frequency, the distortion control module 740 may perform the mapping operation at the same frequency. The resulting harmonics may have a particular frequency that depends on the sine wave frequency. Thus, doubling the frequency of the sine wave may double the harmonic frequency. For input signals 702 that include multiple superimposed frequencies, the mapping performed by distortion control module 740 may result in harmonic superposition.
Fig. 8 to 15 show examples of distortion and sine wave sums. For reference, fig. 8 shows an example time domain plot 800 of a sine wave 802. The peak 804 of sine wave 802 is shown without clipping. The peak level 804 of sine wave 802 is at 0 dB, and in some embodiments, peak level 804 may be the maximum possible digital level. Fig. 9 shows an example curve 900, the example curve 900 showing a frequency spectrum 902 of the sine wave 802 of fig. 8. Because the signal is a sine wave, a single frequency is represented.
In certain embodiments, increasing the amplitude of sine wave 802 beyond the peak level may result in hard clipping. The hard clipping of sinusoid 1002 is shown in graph 1000 of fig. 10. The clipped sinusoid 1002 includes a clipping portion 1004 that saturates at the peak level. In the frequency domain plot 1102 shown in fig. 11, an example of harmonics 1104 of the clipped sine wave 1002 can be seen. As shown, the harmonics 1104 may be spread as high as the sampling frequency (about 22kHz in the illustrated example). Certain portions of harmonics 1106 are also aliased, causing further distortion.
To avoid the full distortion of hard clipping while still allowing increased volume, distortion control module 740 may use a composite of lower frequency harmonics, as described above. An example of a set of harmonics of such a wave is shown in FIG. 12, which includes an example frequency response curve 1200 for a composite wave that may be generated in response to a 400 Hz input sine wave. The spectrum in curve 1200 includes fewer harmonics 1202 than the full clipping case of fig. 11. In the depicted embodiment, five harmonics 1202 have been generated. The highest order harmonic 1202 is at a lower frequency than the high frequency harmonics 1104 of fig. 11. In this embodiment there are no aliased harmonics 1106.
The illustrated example embodiment includes harmonics of approximately 400Hz, 1200Hz, 2000Hz, 2800Hz, and 3600 Hz. These harmonics 1202 are odd harmonics 1202, which include first harmonic 1204, third harmonic 1206, fifth harmonic 1208, seventh harmonic 1210, and ninth harmonic 1212. The first harmonic 1204 has an amplitude of about 0dB, with 0dB being the maximum possible digital amplitude in a particular embodiment. The amplitude of successive harmonics 1202 becomes smaller as the frequency increases. In an embodiment, the amplitude of the harmonics 1202 is monotonically decreasing. In other embodiments, these amplitudes may vary.
The controlled distortion provided by the lower-frequency harmonics may result in a rounder, more natural-sounding waveform with higher signal energy or higher average signal energy. Fig. 13 shows an example time domain plot 1300 of a wave 1302 produced by mapping a sine wave onto the harmonics 1202 of fig. 12. The example wave 1302 is shown having a partially clipped portion 1306 and a rounded portion 1308. A comparison between wave 1302 and the hard-clipped wave 1002 shows that wave 1302 is more rounded than the hard-clipped wave 1002. Further, a portion 1304 of the wave 1302 is linear or approximately linear. The rounded portion 1308 begins to bend at about -3 dB relative to the clipped portion 1306.
Fig. 14 illustrates an example curve 1400 depicting an embodiment of a sum-of-sines mapping function 1410. The illustrated sum-of-sines mapping function 1410 may be obtained by plotting the values in a sum of sinusoids table (e.g., table 714 described above). The mapping function 1410 includes a quarter cycle of a sum-of-sines wave. As an optimization, a quarter cycle of the wave may be used instead of the full wave, as described below.
Input signal values are plotted on the x-axis, which includes positive amplitude values ranging between 0 and 1. Similarly, output signal values are plotted on the y-axis, which also includes amplitude values ranging between 0 and 1. Negative amplitude values are described below. When distortion control module 140 or 740 maps an input sample to an output sample, in certain embodiments, the input sample is mapped to a point on the mapping function 1410. The mapped output sample may have a larger or smaller value than the input sample, depending on where the input sample is mapped.
For clarity, the sum-of-sines mapping function 1410 is shown as a continuous function. However, when implemented in a digital system, the mapping function 1410 may be discrete. Furthermore, as described above, the mapping function 1410 might not be defined for all input signal values. Thus, for example, distortion control module 140 or 740 may interpolate an output signal value between the two closest points on the mapping function 1410.
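The table-driven mapping with interpolation might look like the following sketch. The table size (256 entries) and the harmonic weights are illustrative assumptions rather than the contents of the patent's table 714:

```python
import math

# Hypothetical stand-in for table 714: a quarter cycle of a sum-of-sines wave,
# sampled at 256 points and normalized so its largest entry is 1.0.
ODD_HARMONICS = {1: 1.00, 3: 0.30, 5: 0.15, 7: 0.08, 9: 0.05}
TABLE_SIZE = 256

def quarter_wave(t):
    """Evaluate the composite wave at phase t, with t in [0, 0.25] of a cycle."""
    return sum(a * math.sin(2 * math.pi * k * t) for k, a in ODD_HARMONICS.items())

_raw = [quarter_wave(0.25 * j / (TABLE_SIZE - 1)) for j in range(TABLE_SIZE)]
_peak = max(_raw)
TABLE = [v / _peak for v in _raw]

def map_sample(x):
    """Map an input sample in [0, 1] through TABLE, interpolating linearly
    between the two closest table entries."""
    pos = x * (TABLE_SIZE - 1)
    i = int(pos)
    if i >= TABLE_SIZE - 1:
        return TABLE[-1]
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])
```

A real implementation would trade table size against interpolation error; the linear interpolation here stands in for whatever interpolation the module actually uses.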
For reference, a dashed line 1420 is shown, which corresponds to the line y = x. If input samples were mapped according to dashed line 1420, the output samples would be the same as the input samples. The mapping function 1410 includes a linear or approximately linear mapping region 1412 and a nonlinear mapping region 1414. As input sample values falling within the linear mapping region 1412 increase, the corresponding output samples also increase linearly or substantially linearly. Output samples corresponding to input values falling in the nonlinear region 1414 instead increase nonlinearly, with varying amounts of increase.
Most values of the mapping function 1410 are larger than the corresponding values of dashed line 1420, so that most input samples are mapped to larger values. However, in region 1416 of the nonlinear mapping region 1414, the values of the mapping function 1410 are less than or equal to the values of dashed line 1420. In this region 1416, input samples are mapped to smaller values. Thus, for example, the values of hard-clipped samples (e.g., samples having a value of 1.0 or approximately 1.0) may be reduced.
As described above, the mapping function 1410 includes a quarter cycle of a sum-of-sines wave, rather than a full wave. Using a quarter wave (or even a half wave) can reduce the size of the sum of sinusoids table 714, thereby saving memory. For negative input signal values (e.g., values in the range [-1, 0]), the distortion control module 140, 740 may invert the mapping function 1410 about the x-axis and about the y-axis and then apply the inverted mapping function 1410 to the input samples. Alternatively, negative values may be negated into the range [0, 1], the mapping function 1410 applied, and the resulting output samples negated to return them to negative values.
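The quarter-wave handling of negative inputs can be sketched as below. To keep the focus on the sign handling, the stored table here is a plain quarter sine, a deliberately simplified stand-in for a true sum-of-sines table:

```python
import math

TABLE_SIZE = 256
# Simplified stand-in quarter-cycle table (a plain quarter sine); the exact
# table contents are not the point here, only the negative-input handling.
TABLE = [math.sin(0.5 * math.pi * j / (TABLE_SIZE - 1)) for j in range(TABLE_SIZE)]

def _map_positive(x):
    """Interpolated lookup for x in [0, 1]."""
    pos = x * (TABLE_SIZE - 1)
    i = int(pos)
    if i >= TABLE_SIZE - 1:
        return TABLE[-1]
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def map_sample(x):
    """Negate a negative input into [0, 1], map it, then negate the result back,
    so a single stored quarter wave serves the full [-1, 1] range."""
    mag = min(abs(x), 1.0)
    out = _map_positive(mag)
    return -out if x < 0 else out
```

Because the same lookup is reused with the sign folded in and out, the mapping is odd-symmetric by construction and only a quarter cycle ever needs to be stored.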
In alternative embodiments, the illustrated function 1410 may look different, depending, for example, on the number of harmonics used to generate the sum of sinusoids table 714. For example, the linear mapping region 1412 may have a greater or lesser slope. The nonlinear mapping region 1414 may have a different shape; for example, it may have fewer peaks. Likewise, the region 1416 may be smaller or larger in magnitude.
In particular embodiments, the range of the x-axis and/or the y-axis may differ from the range [0, 1] described above. Reducing the x-axis range to [0, a], where a is less than 1, may increase the amplification of at least a portion of the input signal. Conversely, increasing the x-axis range to [0, b], where b is greater than 1, may decrease the amplification of at least a portion of the input signal. Advantageously, in some embodiments, clipping may be reduced by using values of b greater than 1. Similarly, the y-axis range may be changed to [0, c], where c is less than or greater than 1.
Fig. 15 shows a plot 1500 that includes an example time domain plot of an audio signal 1512 before distortion control is applied, together with an example time domain plot of the same audio signal 1514 after distortion control has been applied. An example implementation using distortion control introduces approximately 6 dB of additional gain into the waveform.
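The energy gain from such soft limiting can be estimated with a toy version of the map. This sketch uses a one-harmonic quarter-sine map, so it yields a smaller boost (on the order of 1 dB) than the roughly 6 dB reported above; it only demonstrates that average energy rises while peaks stay at or below full scale:

```python
import math

def soft_map(x):
    """Quarter-sine soft map with odd symmetry -- a one-harmonic stand-in for
    the full sum-of-sines mapping, used here only to illustrate the gain."""
    s = max(-1.0, min(1.0, x))
    return math.copysign(math.sin(0.5 * math.pi * abs(s)), s)

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in x) / len(x))

N = 1000
tone = [math.sin(2 * math.pi * n / N) for n in range(N)]
shaped = [soft_map(s) for s in tone]
gain_db = 20 * math.log10(rms(shaped) / rms(tone))
```

Because the map lifts mid-level samples toward full scale without ever exceeding it, the average (RMS) level rises even though the peak level does not.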
Distortion control may be used in other applications as well. For example, distortion control may be used to increase bass volume while reducing distortion, or in frequency-spreading applications. Further, distortion control may be used to synthesize instrument sounds or other sounds, with various harmonics selected to create a desired instrument timbre.
VII. Conclusion
Depending on the embodiment, certain acts, events, or functions of any algorithm described herein can be performed in a different sequence, can be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the algorithm). Moreover, in certain embodiments, acts or events can be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors or processor cores, rather than sequentially.
The various illustrative logical blocks, modules, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality. The implementation of such functionality as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality may be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks and modules described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be a processor, controller, microcontroller, or state machine, combinations of these, or the like. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor), a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
Conditional language used herein, such as "may," "for example," and the like, unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include certain features, elements, and/or states while other embodiments do not. Thus, such conditional language is not generally intended to imply that features, elements, and/or states are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or states are included or are to be performed in any particular embodiment.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the spirit of the disclosure. As will be recognized, certain embodiments described herein can be embodied within a form that does not provide all of the features described herein, as some features can be used or practiced separately from others. The scope of the invention is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (17)

1. A system for automatically adjusting a voice intelligibility enhancement applied to an audio signal, the system comprising:
an enhancement module configured to receive an input speech signal comprising formants and to apply audio enhancement to the input speech signal to provide an enhanced speech signal, the audio enhancement configured to emphasize one or more formants in the input speech signal;
an enhancement controller comprising one or more processors, the enhancement controller configured to adjust an amount of audio enhancement applied by the enhancement module based at least in part on an amount of detected ambient noise;
an output gain controller configured to:
adjust a total gain of the enhanced speech signal based at least in part on the amount of ambient noise and the input voice signal, and
apply the total gain to the enhanced speech signal to produce an amplified speech signal; and
a distortion control module configured to reduce clipping in the amplified speech signal by at least mapping one or more samples of the amplified speech signal to one or more values stored in a sum of sinusoids table, wherein the sum of sinusoids table is generated from a sum of lower order sinusoidal harmonics.
2. The system of claim 1, wherein the enhancement module is further operative to emphasize the one or more formants by applying gains to frequency subbands of the input speech signal.
3. The system of claim 1, wherein the enhancement controller is further configured to adjust an amount of audio enhancement applied by the enhancement module based at least in part on an amount of detected ambient noise above a first noise threshold.
4. The system of claim 3, further comprising: a noise sensitivity controller configured to adjust the first noise threshold.
5. The system of claim 4, wherein the noise sensitivity controller provides a user interface configured to allow a user to adjust a noise sensitivity control configured to affect the first noise threshold.
6. The system of claim 4, wherein the noise sensitivity controller comprises:
a first correlator configured to calculate a first autocorrelation value from a microphone input signal received from a microphone of the receiving device;
a first variance module operative to calculate a first variance of the first autocorrelation values;
a second correlator configured to calculate a second autocorrelation value from a speaker input signal, the speaker input signal comprising an output signal of the enhancement module;
a second variance module operative to calculate a second variance of the second autocorrelation values; and
a noise sensitivity adjuster configured to adjust a first noise threshold using one or more of the first and second autocorrelation values and the first and second variances to produce a second noise threshold, wherein the enhancement controller is configured to adjust an amount of audio enhancement applied to the second input audio signal based at least in part on a second amount of detected ambient noise above the second noise threshold.
7. The system of claim 6, wherein the noise sensitivity adjuster is further configured to generate a second noise threshold that is less than the first noise threshold in response to the second variance being less than a predetermined amount.
8. The system of claim 6, wherein the noise sensitivity adjuster is further configured to generate a second noise threshold that is less than the first noise threshold in response to the first variance being greater than a predetermined amount.
9. The system of claim 8, wherein the noise sensitivity adjuster is further configured to decrease the second noise threshold based at least in part on one or more of the first autocorrelation values.
10. The system of claim 8, wherein the noise sensitivity adjuster is further configured to provide a greater reduction in the second noise threshold for larger first autocorrelation values.
11. The system of claim 10, wherein the noise sensitivity adjuster is further configured to decrease the second noise threshold based at least in part on one or more of the second autocorrelation values.
12. The system of claim 8, wherein the noise sensitivity adjuster is further configured to provide a greater reduction in the second noise threshold for a smaller second autocorrelation value.
13. The system of claim 1, wherein the distortion control module is further configured to map the amplified speech signal to an output signal, wherein the output signal has fewer harmonics than a fully saturated signal.
14. The system of claim 1, wherein the enhancement controller is further configured to adjust the amount of audio enhancement applied based at least in part on a ratio of the amount of detected ambient noise to a threshold level.
15. A method for automatically adjusting a voice intelligibility enhancement applied to an audio signal, the method comprising:
receiving an input voice signal comprising formants;
applying audio enhancement to the input speech signal to provide an enhanced speech signal, the audio enhancement configured to emphasize one or more formants in the input speech signal;
adjusting an amount of audio enhancement applied based at least in part on the amount of detected ambient noise;
adjusting a total gain of the enhanced speech signal based at least in part on the amount of ambient noise and the input voice signal;
applying the total gain to the enhanced speech signal to produce an amplified speech signal; and
reducing clipping in the amplified speech signal at least by mapping one or more samples of the amplified speech signal to one or more values stored in a sum of sinusoids table, the sum of sinusoids table being generated from a sum of lower order sinusoidal harmonics.
16. The method of claim 15, further comprising: emphasizing the one or more formants by applying gains to frequency subbands of the input voice signal.
17. The method of claim 15, wherein the mapping further comprises mapping the amplified speech signal to an output signal, wherein the output signal has fewer harmonics than a fully saturated signal.
HK12111607.8A 2009-09-14 System for adaptive voice intelligibility processing HK1171273B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2009/056850 WO2011031273A1 (en) 2009-09-14 2009-09-14 System for adaptive voice intelligibility processing

Publications (2)

Publication Number Publication Date
HK1171273A1 (en) 2013-03-22
HK1171273B true HK1171273B (en) 2015-08-14

