HK1163328A

HK1163328A - Masking based gain control

Info

Publication number: HK1163328A
Application number: HK12103873.2A
Authority: HK
Inventors: R．卡策尔; K．哈通
Original assignee: 伯斯有限公司
Priority date: 2008-12-23
Filing date: 2009-12-02
Publication date: 2012-09-07

Abstract

Interfering signals that may be present in a listening environment are masked by reproducing a desired signal in a listening environment, determining a masking threshold associated with the desired signal, identifying an interfering signal that may be present in the environment, comparing the interfering signal to the masking threshold, and adjusting the desired signal over time to raise its masking threshold above the level of the interfering signal.

Description

Masking based gain control

Technical Field

This specification relates to signal processing that utilizes the masking behavior of the human auditory system to reduce the perception of unwanted signal interference, and a system for creating acoustically isolated regions to reduce noise and signal interference.

Background

Since audible signals have been broadcast and reproduced from sound recordings, a wide variety of content has been provided for the listener to select from. For example, passengers traveling in a vehicle may each have a different favorite radio station or recording (e.g., compact disc, etc.). However, only a single station can be selected at a time for broadcasting from the vehicle's radio. Similarly, different passengers may want to listen to different types and genres of recorded material (e.g., music from a compact disc or memory device) using the vehicle's audio device (e.g., compact disc player). However, only a single selection (e.g., a compact disc track) can be played back at a time. Furthermore, the perception of the played back selection may be deteriorated by interference from noise sources both inside and outside the vehicle. For example, along with engine noise and passenger speech, as the vehicle travels through a noisy environment (e.g., a city center), relatively loud noise may overwhelm selected radio stations or recorded playback and create an unpleasant listening experience for the passengers.

Disclosure of Invention

In one aspect, a method for masking an interfering audio signal includes identifying a first frequency band of a signal provided to a first acoustic region to adjust a masking threshold associated with a second frequency band of the signal. The method also includes applying a gain to the first frequency band of the signal to raise the masking threshold in the second frequency band above the interfering signal.

Implementations may include one or more of the following features. Identifying the first frequency band of the signal may include selecting a frequency band having a maximum level from a set of frequency bands. The first and second frequency bands may be in the Bark domain. Adjusting the first frequency band of the signal may include comparing the masking threshold to a level of the interfering signal. The gain applied to the first signal may be slew rate limited. To apply the gain to the first frequency band, the method may include smoothing the gain while maintaining a peak gain value. To maintain the peak value, the method may include expanding the peak value. The interfering signal may comprise various types of signals, such as a signal provided to a second acoustic region, an estimate of a noise signal, or other types of signals.

In another aspect, a method for masking an interfering audio signal includes reproducing a first signal having a level at a first location. The first signal is also associated with a first frequency range. The method also includes determining a masking threshold based on a frequency associated with the first signal at the first location. Further, the method includes identifying a level of a second signal present at the first location. The second signal is associated with a second frequency range different from the first frequency range. The method also includes comparing a level of a second signal present at the first location to a masking threshold. The method also includes adjusting the first signal level to increase the masking threshold above a level of the second signal within the second frequency range.

Implementations may include one or more of the following features. The first and second frequency ranges may be represented in the Bark domain or other similar domain. The adjustment of the first signal may be slew rate limited. Adjusting the first signal level may include applying a gain. Applying such a gain may include smoothing the gain while maintaining a peak gain value. Maintaining the peak may include expanding the peak. The second signal may comprise various types of signals, such as a signal provided to a second location, an estimate of a noise signal, or other similar signals. The method may further include adjusting the second signal level as a function of frequency to reduce the second signal level below the masking threshold over at least a portion of the second frequency range in order to reduce audibility of the second signal at the first location.

In yet another aspect, a method includes reproducing a first signal having a level at a first location as a function of frequency. The first signal also has a first frequency range. The method also includes determining a masking threshold as a function of frequency associated with the first signal at the first location. Furthermore, the method includes identifying a level, as a function of frequency, of a second signal present at the first location. The second signal has a second frequency range. The method also includes comparing the level of the second signal present at the first location to the masking threshold. Further, the method includes adjusting the second signal level as a function of frequency to reduce the second signal level below the masking threshold over at least a portion of the second frequency range in order to reduce audibility of the second signal at the first location.

Implementations may include one or more of the following features. The first and second frequency ranges may be represented in the Bark domain or other similar domain. To adjust the level of the second signal, the method may include reducing the gain. The second signal may include various types of signals, such as a signal provided to a second location.

In another aspect, a method includes receiving a plurality of data points, wherein each data point is associated with a value. The method also includes defining an averaging window having a window length and identifying at least one peak value from the data point values. The method also includes assigning the identified peak value to data points adjacent to the data point associated with the identified peak to produce an adjusted plurality of data points. The combined length of the adjacent data points and the data point associated with the identified peak is equivalent to the window length. The method also includes averaging the adjusted plurality of data points using the averaging window to produce a smoothed version of the plurality of data points.

Implementations may include one or more of the following features. The data point associated with the identified peak may be located at the center of the adjacent data points to which the peak is assigned. Averaging may include stepping the averaging window along the adjusted plurality of data points.
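The peak-preserving smoothing described in this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `peak_hold_smooth`, the rule used to detect a peak (a point higher than its left neighbor and at least as high as its right neighbor), the requirement of an odd window length, and the clipping of the window at the edges of the data are all hypothetical choices made for the example.

```python
def peak_hold_smooth(values, window_len):
    """Moving average that keeps peak values (hypothetical sketch).

    Each identified peak is first copied onto its neighbors so that the
    peak plus its neighbors span exactly `window_len` points, centered on
    the peak. A plain moving average over the adjusted data then returns
    the full peak value at the peak position instead of a reduced one.
    """
    if window_len < 1 or window_len % 2 == 0:
        raise ValueError("window_len must be a positive odd integer")
    half = window_len // 2
    n = len(values)
    neg_inf = float("-inf")

    # Step 1: identify peaks (assumed rule: above left, not below right).
    peaks = []
    for i in range(n):
        left = values[i - 1] if i > 0 else neg_inf
        right = values[i + 1] if i < n - 1 else neg_inf
        if values[i] > left and values[i] >= right:
            peaks.append(i)

    # Step 2: assign each peak value to the adjacent data points.
    adjusted = list(values)
    for i in peaks:
        for j in range(max(0, i - half), min(n, i + half + 1)):
            adjusted[j] = max(adjusted[j], values[i])

    # Step 3: step the averaging window along the adjusted data.
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(adjusted[lo:hi]) / (hi - lo))
    return smoothed
```

With the input `[0, 0, 0, 6, 0, 0, 0]` and a window length of 3, a plain moving average would reduce the peak to 2, whereas this sketch keeps the value 6 at the peak position and tapers on either side.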

These and other aspects and features, and various combinations thereof, may be expressed as methods, apparatus, systems, means for performing functions, program products, and in other ways.

Drawings

Fig. 1 is a top view of an automobile.

Fig. 2 illustrates an acoustically isolated region within a passenger cabin.

Figs. 3-5 are graphs illustrating masking of acoustic signals.

Fig. 6 is a block diagram of an audio processing device.

Fig. 7 includes a block diagram of an interference estimator.

Fig. 8 is a graph of masking thresholds.

Fig. 9 is a graph of acoustic signal input level versus output level.

Fig. 10 is a graph of gain versus frequency.

Fig. 11 is a flow chart of the operation of a masking estimator.

Fig. 12 is a flow chart of the operation of the interference estimator.

Fig. 13 is a flow chart of the operation of the gain setter.

Detailed Description

Referring to fig. 1, an automobile 100 includes an audio reproduction system 102 capable of reducing interference from acoustically isolated zones. Such zones allow occupants of the automobile 100 to individually select different audio content for playback without interfering with, or being interfered with by, playback in other zones. However, spillover of acoustic signals may occur and disturb the playback. By reducing spillover, the system 102 improves audio reproduction while reducing interference. Although the system 102 is illustrated as being implemented in the automobile 100, similar systems may be implemented in other types of vehicles (e.g., airplanes, buses, etc.) and/or environments (e.g., homes, places of business, restaurants, sporting arenas, etc.) where multiple people may wish to individually select and listen to similar or different audio content. Along with addressing audio content spillover from other isolated zones, the audio reproduction system 102 may address spillover from other types of audio sources. For example, noise originating outside of the passenger compartment of the automobile, such as engine noise, wind noise, etc., may be addressed by the reproduction system 102.

As represented in the figure, the system 102 includes an audio processing device 104 that processes audio signals for reproduction. In particular, the audio processing device 104 monitors and reduces spillover to assist in maintaining acoustically isolated zones within the automobile 100. In some arrangements, the functionality of the audio processing device 104 may be incorporated into an audio device, such as an amplifier or the like (e.g., a radio, CD player, DVD player, digital audio player, hands-free telephone system, navigation system, vehicle infotainment system, etc.). Additional audio devices may also be included in the system 102. For example, speakers 106(a)-(f) distributed throughout the passenger cabin may be used to reproduce audio signals and to create acoustically isolated zones. The speakers 106(a)-(f), along with other speakers and devices (as desired), may be used in a system such as that described in U.S. patent application Serial No. 11/780,463, "System and Method for Directionally Radiating Sound," the entire contents of which are incorporated by reference. Other transducers, such as one or more microphones (e.g., an in-dash microphone 108), may be used by the system 102 to collect audio signals, e.g., for processing by the system. Additional speakers may also be included in the system 102 and located throughout the vehicle. The microphone may be located on a headliner, pillar, seat back, headrest, or other location that facilitates sensing sounds within or proximate to the vehicle. In addition, an in-dash control panel 110 provides a user interface for initiating system operation, exchanging information (such as allowing a user to control settings), and providing a visual display for monitoring the operation of the system. In this implementation, the in-dash control panel 110 includes a control knob 112 to enable user input for controlling volume adjustments and the like.

To reduce spillover and control the acoustic energy radiated into these zones, various signals may be collected and used in the processing operations of the audio reproduction system 102. For example, signals from one or more audio sources and signals of selected audio content may be used to form and maintain an isolated zone. Environmental information (e.g., ambient noise present in the interior of the automobile) that may interfere with the ability of passengers to hear audio may be sensed (e.g., by the in-dash microphone 108) and used to reduce zone spillover. Instead of the in-dash microphone 108 (or multiple microphones incorporated into the automobile), the audio system 102 may use one or more other microphones placed in the interior of the automobile 100. For example, a microphone of a cellular telephone 114 (or other type of handheld device) may be used to collect ambient noise. By connecting the cellular telephone 114 to the in-dash control panel 110, either wirelessly or by a hardwired connection, the audio processing device 104 may be provided with an ambient noise signal over a cable (not shown), a Bluetooth connection, or other similar connection technique. Ambient noise may also be estimated by other techniques and methods, such as inferring the noise level from engine operation (e.g., engine RPM), vehicle speed, or other similar parameters. The state (e.g., open or closed) of a window, sunroof, etc. may also be used to provide an estimate of the ambient noise. Location and time of day may also be used for noise level estimation. For example, a global positioning system may be used to determine the location of the automobile 100 (e.g., in a city), and this may be used along with a clock for the estimate (e.g., noise is greater during the day).

Referring to fig. 2, a portion of the passenger compartment of the automobile 100 illustrates zones that are desired to be acoustically isolated from one another. In this particular example, four zones 200, 202, 204, 206 are monitored by the reproduction system 102, and each zone is centered on a unique seat of the automobile (e.g., zone 200 is centered on the driver's seat, zone 202 is centered on the front passenger seat, etc.). When each zone is acoustically isolated, a passenger located in one zone is able to select and listen to audio content without disturbing, or being disturbed by, the audio content played back in one or more other zones. In one example, the reproduction system 102 operates to reduce inter-zone spillover (as described in U.S. patent application Serial No. 11/780,463) to improve acoustic isolation. The reproduction system 102 may also operate to reduce perceived interference between zones. Further, the zones 200-206 can be monitored to reduce perceived interference from other types of audible signals. For example, perceived interference from signals internal (e.g., engine noise) and external (e.g., street noise) to the automobile 100, and the associated interference with audio content selected for playback, may be substantially reduced.

Generally, perceived interference is reduced by masking signals from outside of the zone (i.e., undesired signals) with in-zone (i.e., desired) signals. In general, complete removal of zone-to-zone spillover may not be achieved, and some audible disturbance may be discernible. However, spillover may be less noticeable when different audio content is being provided to multiple zones (e.g., one broadcast station is provided to zone 200 and another broadcast station is provided to zone 202) and signal processing that exploits auditory masking is implemented. Although four zones are illustrated in this particular arrangement, the reproduction system 102 may monitor and reduce spillover (both actual physical sound leakage and perceived interference) for additional or fewer zones. Along with the number of zones, the size of the zones may also be adjustable. For example, the front seat zones 200, 202 may be combined to form a single zone and the rear seat zones 204, 206 may be combined to form a single zone, thereby creating two zones of increased size in the automobile 100.

Referring to fig. 3, a graph 300 illustrates auditory masking in the human auditory system in response to a received signal. Such masking may be utilized by the reproduction system 102 to reduce perceived spillover among two or more regions. Generally, audio signals (e.g., from a radio station, CD track, etc.) selected for playback in a particular region (e.g., region 200) stimulate the auditory system. When the selected signal is present, another signal presented to the auditory system may or may not be perceived, depending on its relationship to the first signal. In other words, the first signal may mask other signals. In general, a loud sound may mask quieter sounds that are relatively close in frequency to the loud sound. A masking threshold may be determined in association with the first signal that describes the perceptual relationship between the first signal and other signals present. A second signal presented to the auditory system below the masking threshold may not be perceived, while a second signal above the masking threshold may be perceived.

In the graph 300, a horizontal axis 302 (e.g., an X-axis) represents frequency on a logarithmic scale, and a vertical axis 304 (e.g., a Y-axis) represents signal level, also on a logarithmic scale (e.g., a decibel scale). To illustrate the masking present in the auditory system, a tone signal 306 is represented at a certain frequency (on the horizontal axis 302) with a corresponding signal level on the vertical axis 304. When the tone signal 306 is presented to the auditory system, a masking threshold 308 may be produced in the auditory system over a range of frequencies. For example, in response to the tone signal 306 (at frequency f₀), the masking threshold 308 extends above the frequency of the tone signal 306 (e.g., to frequency f₂) and below it (e.g., to frequency f₁). As shown, the masking threshold 308 is not symmetric about the tone signal frequency f₀ and spreads further toward higher frequencies than toward lower frequencies (i.e., f₂ - f₀ > f₀ - f₁), as dictated by the auditory system.

When a second acoustic signal is presented to the listener (e.g., an acoustic signal that spills over from another region) and it includes frequencies within the frequency range of the masking threshold curve (i.e., between frequencies f₁ and f₂), the relationship between the level of the second acoustic signal and the masking threshold 308 determines whether the second signal is audible to the listener. Signals having levels below the masking threshold curve 308 may not be audible to the listener, while signals having levels above the masking threshold curve 308 may be audible. For example, the tone signal 310 is masked by the tone signal 306 because the level of the tone signal 310 is below the masking threshold 308. In contrast, the tone signal 312 is not masked because its level is above the masking threshold 308. Thus, the tone signal 312 is audible, while the tone signal 310 is not heard over the tone signal 306.

Referring to fig. 4, a graph 400 illustrates (at a particular instant in time) a frequency response 402 of a selected signal and a corresponding masking threshold 404 of the auditory system associated with the signal. For example, numerical models may be developed to represent a typical auditory system. From the model, an auditory system response (e.g., the masking threshold 404) may be determined for an audio signal (e.g., a selected audio signal within a region). Although the masking threshold 404 follows the general shape of the frequency response 402, the threshold is not equivalent to the frequency response because of the behavior of the auditory system (which is represented in the auditory system model). Similar to the scenario illustrated in fig. 3, a second (i.e., interfering) signal presented to the auditory system with a level that exceeds the masking threshold 404 may be audible, while a signal with a level below the threshold may not be discernible (and is considered masked). For example, because the level of the tone signal 406 is below the masking threshold 404 (at the frequency f₁ of the tone signal 406), the tone signal 406 is masked (not discernible by the auditory system). In contrast, the level of the tone signal 408 exceeds the level of the masking threshold 404 (at the frequency f₂ of the tone signal 408) and is audible to a listener. Accordingly, adjustments may be applied over time to the selected audio signal within the region to reduce the number of instances in which the interfering signal exceeds the masking threshold associated with the selected signal. In some arrangements, if the interfering signal is known and controllable by the audio system, the adjustment may be applied over time to the interfering signal to reduce the number of instances in which the interfering signal exceeds the masking threshold associated with the selected signal. In some arrangements, both the selected signal within the region and the interfering signal may be adjusted over time to reduce the number of instances in which the interfering signal exceeds the masking threshold associated with the selected signal.

One or more techniques may be implemented to adjust the signals to reduce the audibility of the interfering signal. The level of the desired signal (e.g., the selected signal within the region, represented by frequency response 402) may be increased (e.g., gain applied) to correspondingly raise its level at the appropriate frequency (e.g., frequency f₂) where the interfering signal has energy. Without considering masking, the gain of signal 402 may be increased by an amount (β) to raise its level above the level of the interfering signal 408 at frequency f₂. In some cases, the gain of signal 402 may be increased by an amount equal to (β) plus an offset (e.g., an offset of 1 dB, 2 dB, or higher) to ensure that signal 402 completely masks the interfering signal. Alternatively, the level of the selected signal may be increased (e.g., gain applied) to correspondingly raise its associated masking threshold at frequency f₂, where the interfering signal 408 has energy. The masking threshold need only be increased by an amount (α) to raise it above the level of the interfering signal 408. The selected signal may be increased at frequency f₂ to raise its associated masking threshold above the level of the interfering signal 408. In some cases, this may be done by adjusting the gain of signal 402 by an amount less than (β) but greater than (α). If the signal 402 has relatively less energy at frequency f₂ than at adjacent frequencies, and the masking threshold at frequency f₂ is primarily a result of the energy present at these nearby frequencies, it may be necessary to apply a gain greater than (α) to the signal 402 at frequency f₂ to raise the masking threshold above the level of the interfering signal 408. Alternatively, the gain of the selected signal may be adjusted at a frequency different from f₂ to shift its masking threshold by the amount (α) needed to raise it above the level of the interfering signal at frequency f₂. In this case, raising the masking threshold of the selected signal above the level of the interfering signal at f₂ by applying gain at a frequency different from f₂ requires less gain than raising the level of the selected signal itself above the level of the interfering signal at f₂. Accordingly, by adjusting the masking threshold 404 for signal masking, the spectral content of the selected signal may be altered less. This is shown in fig. 5 and described in more detail below.
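The difference between the two gain amounts can be made concrete with a short calculation. The sketch below is illustrative only: the function name `required_gains_db` and the numeric levels are hypothetical, and it assumes the case described above in which the masking threshold at f₂ is set by energy at a nearby frequency and therefore lies above the selected signal's own level at f₂.

```python
def required_gains_db(interferer_db, signal_db, threshold_db, offset_db=0.0):
    """Return (alpha, beta) in dB for one frequency (hypothetical sketch).

    beta  : gain that lifts the selected signal's own level above the
            interferer at this frequency (masking not considered).
    alpha : amount by which the masking threshold must rise to sit above
            the interferer at this frequency.
    offset_db is an optional safety margin added to both amounts.
    """
    beta = max(0.0, interferer_db - signal_db + offset_db)
    alpha = max(0.0, interferer_db - threshold_db + offset_db)
    return alpha, beta


# Hypothetical levels at f2: the interferer is at 50 dB, the selected
# signal has a dip at 40 dB, and the masking threshold (set by a nearby
# spectral peak) is at 46 dB.
alpha, beta = required_gains_db(50.0, 40.0, 46.0)
```

For these assumed levels the threshold needs to rise by only 4 dB, while lifting the signal itself above the interferer would take 10 dB, which is why adjusting the masking threshold alters the selected signal less.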

Referring to fig. 5, a graph 500 illustrates raising the masking threshold 404 such that both tone signals 406, 408 are below the threshold at their respective frequencies f₁ and f₂. In this illustration, a portion of the signal frequency response 402 is adjusted to position the masking threshold 404 above the response of the interfering signal. By applying a gain, for example, the level of the masking threshold 404 is raised above the level of the tone signal 408 (at frequency f₂).

A portion of the spectrum of the desired signal may be identified that controls the level of the masking threshold (at the frequency at which the interference occurs). For example, one or more portions of the signal frequency response 402 may be identified and adjusted to position the masking threshold 404 at an appropriate level (at frequency f₂). In this case, a peak 502 of the signal frequency response 402 is identified as controlling the masking threshold 404 (at frequency f₂). By applying a relatively small gain adjustment to the peak 502 of the frequency response 402 (at frequency f₃), the appropriate portion 504 of the masking threshold 404 is raised to a level above the tone signal 408 (at frequency f₂). Accordingly, by selectively identifying and adjusting one or more appropriate portions of the frequency response 402, the masking threshold 404 may be adjusted in order to mask the interfering signal.

Referring to fig. 6, a block diagram 600 represents a portion of the audio processing device 104 that monitors one or more acoustically isolated zones (e.g., zone 200) and reduces the effects of undesired signals (e.g., spillover signals) from other locations (e.g., nearby zones, external noise sources, etc.). For example, when presented with the signal selected for playback in a region of interest (e.g., region 200), the auditory system exhibits a masking threshold capable of masking undesired signals. Accordingly, an audio signal to be reproduced in the region of interest (referred to in the figure as the in-region signal) is provided to the audio input stage 602 of the audio processing device 104. Audio signals selected for playback in other regions (e.g., regions 202, 204, 206), referred to as interfering signals, are also provided to the audio input stage 602. In some arrangements, other types of signals may be collected by the audio input stage 602. For example, noise signals inside or outside the vehicle may be collected. Further, while the processing of block diagram 600 described below involves operations for a single region, it should be understood that the processing may be duplicated to provide similar functionality to multiple regions.

In this implementation, both the in-region signal and the interfering signals are provided to the audio input stage 602 in the time domain and are passed to the domain transformers 604, 606, respectively, to be segmented into overlapping blocks and transformed into the frequency domain (or another domain, such as the time-frequency domain or any other domain that may be useful). For example, one or more transforms (e.g., fast Fourier transforms, wavelets, etc.) and segmentation techniques (e.g., windowing, etc.), as well as other processing methods (e.g., zero padding, overlapping, etc.), may be used by the domain transformers 604, 606. The transformed interfering signals are provided to an interference estimator 608, which estimates the amount of interference (e.g., audio spillover) contributed by each respective interfering signal. For example, for the region of interest 200 (shown in fig. 2), the interference estimator 608 estimates the amount of the signal present in each of the other regions 202, 204, and 206 that spills into the region 200. To produce such an estimate, one or more signal processing techniques may be implemented, such as determining a transfer function (e.g., S-parameters S₁₂, S₂₁, etc.) between each pair of regions. For example, transfer functions between regions 200 and 202, between regions 200 and 204, and between regions 200 and 206 may be determined. Once a transfer function is known, it can be convolved in the time domain (or multiplied in the frequency domain) with the signal selected for presentation in each interfering region (regions 202, 204, and 206) to estimate the interfering signal that spills into region 200. Once determined, the results from multiple regions can be combined using superposition (or other similar techniques). Additional quantities, such as statistics and higher order transfer functions, may also be computed to characterize potential region spillover.
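The frequency-domain form of this estimate (multiply each interfering spectrum by its transfer function, then combine by superposition) can be sketched as follows. The function name `estimate_spillover`, the dictionary layout, and the numeric transfer-function values are assumptions made for illustration; in practice the transfer functions would be measured or modeled for the actual cabin.

```python
def estimate_spillover(zone_spectra, transfer_functions):
    """Estimate the spectrum spilled into the zone of interest (sketch).

    zone_spectra       : dict mapping an interfering zone id to a list of
                         complex frequency bins of the signal played there.
    transfer_functions : dict mapping the same zone ids to a list of
                         complex (or real) per-bin transfer functions from
                         that zone into the zone of interest.
    Returns one list of complex bins: the superposition of all
    contributions.
    """
    if not zone_spectra:
        return []
    n_bins = len(next(iter(zone_spectra.values())))
    total = [0j] * n_bins
    for zone, spectrum in zone_spectra.items():
        h = transfer_functions[zone]
        for k in range(n_bins):
            # Multiplication per bin is the frequency-domain equivalent
            # of time-domain convolution with the transfer function.
            total[k] += h[k] * spectrum[k]
    return total


# Hypothetical two-bin example for zones 202 and 204 spilling into 200.
spill = estimate_spillover(
    {202: [1 + 0j, 2 + 0j], 204: [0 + 1j, 1 + 0j]},
    {202: [0.5, 0.1], 204: [0.2, 0.3]},
)
```

The magnitude of each resulting bin gives an estimate of the interference level at that frequency in the zone of interest.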

Referring to fig. 7, one or more techniques and methodologies may be used by the interference estimator 608 (shown in fig. 6) to quantify interference from other regions or noise sources. For example, in one implementation, an interference estimator 700 may include an inter-region transfer function processor 702 that provides an estimate of the amount of audible spillover between regions. A slew rate limiter 704 may also be included in the interference estimator 700, as described below, to reduce cross modulation of signals between isolated regions. In another implementation, an interference estimator 706 may estimate a level of noise present at one or more locations (e.g., areas outside of a passenger cabin, etc.) in order to adjust one or more masking thresholds to reduce the effects of the noise. A slew rate limiter 720 may also be included in the interference estimator 706 to reduce the modulation of the desired signal by interfering noise. For example, a noise estimator 708 (included within the interference estimator 706) may use one or more adaptive filters (e.g., least mean squares (LMS) filters, etc.) to estimate the noise level, as described in U.S. Pat. Nos. 5,434,922 and 5,615,270, the contents of which are incorporated herein by reference. The noise signal collected by one or more microphones (e.g., the in-dash microphone 108) may be provided (via the audio input stage 602) to the interference estimator 706 for estimating the noise level used to adjust the masking threshold. In some implementations, the functionality of both interference estimators 700, 706 may be used such that a masking threshold may be determined based on multiple types of noise signals (e.g., present in a region, outside of a region, etc.) and audible signals provided to one or more regions for playback.

The slew rate limiters 704, 720 apply slew rate limiting to the outputs of the interference estimators 700, 706 to reduce audible and objectionable modulation. In this way, a peak of the interfering signal is held for a predetermined period of time before being allowed to decay. For example, the slew rate limiters 704, 720 may hold the peak interference signal level for 0.1 to 1.0 seconds before allowing the signal level to decay at a predetermined rate (e.g., 3 dB to 6 dB per second). Referring to graph 710, trace 712 represents the interference signal as a function of time for a single frequency band (or Bark band, as described below) that is provided to slew rate limiter 704, and trace 714 represents the slew rate limited interference signal. As represented by trace 714, each peak is held for an approximately constant period of time before decaying at the predetermined rate. When another, higher peak occurs as time progresses, the signal level is allowed to increase unimpeded. By including the slew rate limiters 704, 720, the intermittent structure of the interfering signal is substantially prevented from appearing as audible artifacts (e.g., modulation) within the in-region signal. Further, the gain can be adjusted rapidly without overdriving signals in the regions while reducing cross modulation of signals between regions. In one implementation, where the interference estimator divides the interfering signal into multiple frequency (or Bark) bands, the multiple bands are processed in parallel according to the method described above.
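The hold-then-decay behavior described for a single band can be sketched as follows. This is a minimal sketch rather than the patented limiter: the function name `slew_limit`, the frame-based processing, and the default hold time and decay rate (chosen from within the ranges quoted above) are assumptions for illustration.

```python
def slew_limit(levels_db, frame_s, hold_s=0.5, decay_db_per_s=6.0):
    """Peak-hold and limited-decay of a per-frame level in dB (sketch).

    levels_db      : interference level for one band, one value per frame.
    frame_s        : frame duration in seconds.
    hold_s         : time a peak is held before it may decay.
    decay_db_per_s : maximum decay rate once the hold time has elapsed.
    Rising levels pass through unimpeded; falling levels are held and
    then released slowly.
    """
    out = []
    held = float("-inf")
    since_peak = 0.0
    for level in levels_db:
        if level >= held:
            # New peak: follow the input immediately and restart the hold.
            held = level
            since_peak = 0.0
        else:
            since_peak += frame_s
            if since_peak > hold_s:
                # Hold expired: decay at the fixed rate, but never fall
                # below the current input level.
                held = max(level, held - decay_db_per_s * frame_s)
        out.append(held)
    return out


# One 10 dB burst, 0.5 s frames, 1 s hold, 6 dB/s decay.
trace = slew_limit([0, 10, 0, 0, 0, 0], frame_s=0.5, hold_s=1.0)
```

In this example the burst is held at 10 dB for two further frames and then falls by 3 dB per frame, so the limited trace is `[0, 10, 10, 10, 7.0, 4.0]`. Running one such limiter per band corresponds to the parallel processing of multiple bands mentioned above.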

Returning to fig. 6, block diagram 600 includes a masking threshold estimator 610 to estimate one or more masking thresholds associated with the in-region signal. In this implementation, the in-region frequency-domain signal is received from the domain transformer 606 and scaled to reflect the auditory system response (e.g., the frequency bins of the frequency-domain signal are transformed based on a model of human auditory perception). For example, the signal may be converted to the Bark scale, which defines bandwidths based on the human auditory system. In one implementation, the Bark value may be calculated from the frequency in Hz by using the following equation:

Equation (1) is one particular definition of the bark scale; however, other equations and mathematical functions may be used to define another scale. Further, other methods and techniques may be used to transform a signal from one domain (e.g., the frequency domain) to another domain (e.g., the bark domain). As in the masking threshold estimator 610, the signal provided by the interference estimator 608 is also transformed to the bark scale before being provided to the gain setter 612. In one implementation, both the masking threshold estimator 610 and the interference estimator 608 convert the frequency range of 0 to 24,000 Hz to a bark scale in the range of approximately 0 to 25 bark. Further, by dividing each bark band into a predetermined number of segments (e.g., three segments), the number of bark bands is increased proportionally (e.g., to 75 bark sub-bands).
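
Equation (1) is not reproduced in this text. Purely as an illustration, the sketch below assumes the widely used Zwicker-Terhardt approximation of the bark scale, which may differ from equation (1). Under that assumption, 24,000 Hz maps to just under 25 bark, and three segments per bark give sub-band indices 0 to 74. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the bark scale (an assumed stand-in
    for equation (1), which is not shown in this text)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def sub_band_index(f_hz, segments_per_bark=3):
    """Index of the bark sub-band containing f_hz when each bark band is
    split into segments_per_bark equal segments."""
    return int(hz_to_bark(f_hz) * segments_per_bark)
```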

Along with transforming the frequency domain signal onto the bark scale, the masking threshold estimator 610 determines a masking threshold for each bark band based on the in-region signal level. For each bark band, the masking threshold estimator 610 also identifies the bark band of the in-region signal that is most responsible for the threshold in that band. This can be understood as follows.

When a signal has energy present in a first frequency (e.g., bark) band, it has an associated masking threshold in that bark band. The masking threshold also extends into the nearby bark bands. The level of the threshold falls off with a certain slope (determined according to the characteristics of the auditory system) on either side of the first bark band where energy is present. This is shown in curve 308 of fig. 3 for a single tone, but is similar for a bark band. The slope is determined according to the characteristics of the human auditory system and has been experimentally determined to be on the order of -24 dB to -60 dB per octave. In general, the threshold falls off much more steeply toward lower frequencies than toward higher frequencies. In one implementation, slopes of -28 dB/octave (rising in frequency) and -60 dB/octave (falling in frequency) are used. In other implementations, other slope values may be used.

Depending on the level of the energy present in the signal in nearby frequency bands and on the slopes, the masking threshold in the first bark band may be controlled by the energy in the first bark band, or it may be controlled by the energy in other nearby bark bands. When the masking threshold estimator 610 determines the masking threshold for the in-region signal 402, it keeps track of which bark bands dominate the masking threshold in each bark band of the signal. For signal 402, masking threshold estimator 610 superimposes the masking threshold curves for all individual bark bands and chooses the maximum curve in each band as the masking threshold in that band. That is, it overlays a curve similar to curve 308 of fig. 3 for each bark band (scaled by the amount of energy in that bark band) and selects the highest one in each band. The masking threshold estimator 610 then keeps track of which bark band is responsible for the threshold in each bark band.

The masking threshold estimator 610 may also subtract an offset from the determined threshold. The offset is arbitrary; it may be 1 dB, 2 dB, generally less than 6 dB, or some other amount. The objective is to set the threshold lower than it would otherwise be, so that when a gain is applied to a selected signal to raise its masking threshold above the level of the interfering signal, slightly more gain is applied than would be applied without the offset. This reduces the chance that the interfering signal will remain audible above the selected signal. As described above, to control the adjustment, the masking threshold estimator 610 identifies a particular bark band, which may be the same as (or different from) the band being adjusted. Of course, other techniques and methods may be used to identify one or more frequency bands to control the threshold adjustment.
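
The superposition of per-band threshold curves, the selection of the maximum, and the bookkeeping of the controlling band can be sketched as below. This is a simplified model under stated assumptions: each band is represented by a center frequency in Hz, each band's curve starts at the band's own level and falls off linearly in dB per octave, and the function name and default offset are hypothetical. The slopes default to the -28 dB/octave and -60 dB/octave values mentioned above.

```python
import math


def masking_threshold(centers_hz, levels_db, up_slope=-28.0, down_slope=-60.0,
                      offset_db=2.0):
    """Return (thresholds_db, controllers) for a set of bands.

    thresholds_db[j]: maximum over all bands i of the curve spread from band i
                      into band j, minus offset_db.
    controllers[j]:   index i of the band whose curve sets that maximum.
    """
    thresholds, controllers = [], []
    for f_target in centers_hz:
        best_db, best_i = float("-inf"), -1
        for i, (f_src, level) in enumerate(zip(centers_hz, levels_db)):
            octaves = math.log2(f_target / f_src)
            # Gentle slope toward higher frequencies, steep slope toward lower.
            slope = up_slope if octaves >= 0 else down_slope
            contribution = level + slope * abs(octaves)
            if contribution > best_db:
                best_db, best_i = contribution, i
        thresholds.append(best_db - offset_db)
        controllers.append(best_i)
    return thresholds, controllers
```

For example, with bands centered at 500, 1000 and 2000 Hz at levels of 40, 80 and 30 dB, the strong 1000 Hz band sets the threshold in the 2000 Hz band as well as in its own band, so band index 1 is recorded as the controller for both.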

Referring to fig. 8, a graph 800 represents a portion of a frequency domain signal 802 (from the domain transformer 606) converted to a bark domain signal 804. The portion of the bark range shown spans values between 10 and 18, and each band is segmented into three sub-bands (producing a sub-band range of 30 to 54, as represented on the horizontal axis). For each bark domain value of signal 802, masking threshold estimator 610 calculates a masking threshold represented by signal trace 806. Further, the masking threshold estimator 610 identifies, for each calculated masking threshold, the particular bark band that primarily controls its adjustment. Referring to the graph, an integer is placed above each band to identify the bark band that is primarily responsible for the masking threshold, which is the bark band that should be adjusted to most strongly influence the masking threshold. For example, adjustment of the masking thresholds in bark bands 32, 33 and 34 is controlled by adjusting bark band 32 (as indicated by the three instances of the numeral "32" labeled above bands 32-34).

One or more techniques may be implemented to select a particular bark band to control adjustments to other bark bands or to the same bark band. For example, specific frequency bands may be grouped, and the group member with the largest masking threshold may be used to adjust the group members. Referring to the figure, a group may be formed by bark bands 32-34, and the group member with the largest threshold may be identified by the masking threshold estimator 610. In this case, bark band 32 is associated with the maximum masking threshold and is selected to control adjustment of the group members. Various parameters may be adjusted for such a determination, e.g., a group may include more or fewer members. Other methods, separate from or in combination with determining the maximum value, may be implemented to identify a particular bark band. For example, multi-valued searches, value estimates, hysteresis, and other types of mathematical operations may be implemented when identifying a particular bark band.

Returning to fig. 6, upon receiving the masking threshold from masking threshold estimator 610 and the estimate of the interfering signal from interference estimator 608, gain setter 612 determines an appropriate gain to apply to the in-region signal such that the masking threshold of the selected in-region signal exceeds the interfering signal (e.g., spill-over signals from other regions, noise, etc.). In general, the gain setter 612 compares the masking threshold (from the in-region signal) to the interference signal, on a per bark band basis, to determine whether signal adjustment is warranted. If so, one or more gains are identified for application to the signal portions associated with the one or more controlling bark bands (e.g., if the interfering signal has a level in bark band 33 that would be above the masking threshold associated with the unmodified in-region signal, a gain is applied to the signal portion associated with bark band 32 to adjust the masking threshold in bark band 33).
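
The per-band comparison and the routing of gain to the controlling band can be sketched as follows. The sketch assumes that raising the controlling band by some number of dB raises the masking threshold it controls by the same amount, which is a simplification, and the function name and argument layout are hypothetical.

```python
def band_gains(thresholds_db, interference_db, controllers):
    """Return a gain in dB per band, applied at the controlling bands.

    thresholds_db[j]:   masking threshold of the in-region signal in band j.
    interference_db[j]: estimated interference level in band j.
    controllers[j]:     index of the band that controls the threshold in band j.
    """
    gains = [0.0] * len(thresholds_db)
    for thr, intf, ctrl in zip(thresholds_db, interference_db, controllers):
        deficit = intf - thr
        if deficit > 0.0:
            # Interference exceeds the threshold here: raise the controlling
            # band by at least the deficit (keep the largest request).
            gains[ctrl] = max(gains[ctrl], deficit)
    return gains
```

If, say, the thresholds are 38, 78 and 50 dB, the interference is 30, 70 and 56 dB, and the third band is controlled by the second, the only nonzero gain is 6 dB on the second band.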

Referring to fig. 9, a graph 900 illustrates applying a gain to an in-region signal (at a particular bark band) to adjust a masking threshold at one or more bark bands. Graph 900 includes a horizontal axis representing the level of the in-region signal and a vertical axis representing the output signal level (after the gain is applied). Generally, both the input in-region signal and the output signal have minimum and maximum levels. The maximum output level may be user selected (e.g., provided by a maximum volume setting), while the minimum output level may be determined from the estimated level of the interfering signal plus an offset value, so as to mask the interfering signal. Thus, one or more suitable gains are applied over the in-region signal range 902, which is bounded by the minimum in-region signal level and by the in-region signal level equal to the interference signal level plus the offset. In this way, gain is applied as needed so that the signal level exceeds the interference level.
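
The figure is not reproduced here, so the exact shape of the curve in graph 900 is unknown. As a rough sketch of the relationship described in the text, the mapping below assumes a simple clamp: inputs below the interference level plus offset are raised to that floor, and the output never exceeds the maximum output level. The function name and default values are hypothetical.

```python
def output_level_db(input_db, interference_db, offset_db=2.0, max_out_db=100.0):
    """Map an in-region input level to an output level, both in dB.

    Assumed behavior: the output floor is interference_db + offset_db and the
    output ceiling is max_out_db (e.g., set by the user's volume limit).
    """
    floor_db = interference_db + offset_db
    return min(max(input_db, floor_db), max_out_db)
```

The applied gain is then the output level minus the input level, which is positive only for inputs inside the range that lies below the floor.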

Returning to fig. 6, along with determining the gain required to adjust the masking threshold and identifying the appropriate barker band to use for controlling the adjustment, gain setter 612 also determines the appropriate gain value in the frequency domain. In this way, the gains identified in the bark domain are converted into the frequency domain. For example, equation (1) may be used to define a function to convert the gain from the bark domain into the frequency domain. Along with the conversion provided into the frequency domain, other operations may be provided by the gain setter 612 to prepare the gain for application to the signals within the region. For example (as described below), the gain values may be smoothed before application.

Referring to fig. 10, a graph 1000 illustrates a set of gains determined by the gain setter 612 to produce a masking threshold for a particular time instant. Solid line 1002 represents the gain, after conversion from the bark domain to the frequency domain, across a range of frequencies (100 Hz to 20,000 Hz) represented on the horizontal axis. In this illustration, the gains derived in the bark domain are converted into corresponding frequency bins. Referring to equation (1), at lower frequencies one band in the bark domain may be equivalent to one bin in the frequency domain. At higher frequencies, however, a bark band may contain hundreds of frequency bins. Thus, the gain (as represented by trace 1002 on the logarithmic frequency axis) appears compressed with increasing frequency and is relatively discontinuous and blocky in the frequency domain. When converted into the time domain, such a gain function usually produces an impulse response that extends over a long time period and is susceptible to aliasing.
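
The blocky character of the converted gain can be seen in a small sketch that copies each bark sub-band gain to every frequency bin falling inside that sub-band. The bark mapping used here is the Zwicker-Terhardt approximation, assumed because equation (1) is not shown in this text, and the function names, sample rate, and transform size are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Assumed bark mapping (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bin_gains(band_gains_db, sample_rate_hz, fft_size, segments_per_bark=3):
    """Expand per-sub-band gains (dB) to one gain per frequency bin."""
    n_bins = fft_size // 2 + 1                      # bins from 0 Hz to Nyquist
    last_band = len(band_gains_db) - 1
    gains = []
    for k in range(n_bins):
        f_hz = k * sample_rate_hz / fft_size        # center frequency of bin k
        band = min(int(hz_to_bark(f_hz) * segments_per_bark), last_band)
        gains.append(band_gains_db[band])           # same gain for the whole band
    return gains
```

With a 48 kHz sample rate and a 1024-point transform, each of the lowest sub-bands receives at most about one bin, while the highest sub-band covers dozens of bins, so the expanded gain forms wide flat steps at high frequencies.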

To reduce the length of the impulse response and concentrate the signal energy in time, one or more techniques and methods are used to apply a smoothing function to the gain (represented by trace 1002). However, to properly mask the interfering signal, the peak gain levels need to be maintained. Accordingly, a smoothing technique that preserves the gain peaks is used. In one exemplary technique, a smoothing function is selected that averages the gain values over a window of predetermined length. The average gain value is saved, and the window is slid up in frequency to repeat the process, calculating a moving average while stepping along the frequency axis. To maintain the gain peaks, each peak is detected and widened by an amount equivalent to the window width. In this way, the peak value is maintained when the widened peaks are averaged over the window. For example, for an averaging window defined as 1/6 octave, each gain peak is widened by 1/12 octave on each side of the peak. Other window sizes may also be implemented.

The dashed trace 1004 represents the smoothed gain and illustrates peak hold. Although the smoothed gain values (e.g., highlighted with arrow 1006) may be relatively higher for non-peaks, each peak is guaranteed to be maintained across the frequency range and a suitable masking threshold is guaranteed to be generated. By applying such a smoothing function, aliasing can be reduced and the corresponding impulse response (of such a gain in the time domain) is generally more compact.
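
The widen-then-average idea can be sketched as below. Two simplifications are assumed: the window is a fixed number of bins rather than a fraction of an octave, and peak widening is implemented as a running maximum over the window instead of explicit peak detection. The function name is hypothetical.

```python
def smooth_preserving_peaks(gains_db, half_width):
    """Moving-average smoothing that keeps every gain peak at full height.

    gains_db:   gain per frequency bin, in dB.
    half_width: number of bins on each side of the window center.
    """
    n = len(gains_db)
    # Step 1: widen each peak by half the window on each side (running maximum).
    widened = [max(gains_db[max(0, i - half_width):i + half_width + 1])
               for i in range(n)]
    # Step 2: moving average of the widened gains over the same window.
    smoothed = []
    for i in range(n):
        window = widened[max(0, i - half_width):i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Because every value inside the window centered on a peak equals the peak after widening, the average there is the peak itself. The smoothed gain is never below the original gain, which matches the observation that non-peak values may end up somewhat higher.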

Returning to fig. 6, after the appropriate gain values are determined by the gain setter 612 and transformed into the linear frequency domain (and smoothed), the gain values are applied to the in-region signal. In this particular implementation, the gain stage 614 is provided with the gain values from the gain setter 612 and applies the gains to the in-region signal in the frequency domain. The output of the gain stage 614 is received by a domain transformer 616 and transformed back into the time domain. Further, in this implementation, the domain transformer 616 accounts for the segmentation (performed by the domain transformer 606) to produce a substantially continuous signal. The audio output stage 618 is provided with the time domain signal from the domain transformer 616 and prepares the signal for playback. For example, signals may be adjusted (e.g., gain applied) by the audio output stage 618 to deliver audio content to one or more speakers (e.g., speakers 106(a)-(f)).

Referring to fig. 11, a flow diagram 1100 represents certain operations of the masking threshold estimator 610. As described above, the masking threshold estimator 610 may be executed by the audio processing device 104, e.g., instructions may be executed by a processor (e.g., a microprocessor) associated with the audio processing device. Such instructions may be stored in a storage device (e.g., hard disk drive, CD-ROM, etc.) and provided to the processor (or processors) for execution. In addition to being installed in a vehicle, the audio processing device may be installed in other locations (e.g., a home, office, etc.). Further, a computing device, such as a computer system, may be used to perform the operations of the masking threshold estimator 610. Circuitry (e.g., digital logic) may also be used individually or in combination with one or more processing devices to provide the operations of the masking threshold estimator 610.

The operations of the masking threshold estimator 610 include receiving (1102) a frequency domain signal and calculating (1104) a bark domain representation of the signal. From the bark domain representation of the signal, the masking threshold estimator 610 calculates (1106) a masking threshold, e.g., an adjustable masking threshold may be calculated for each bark band. An offset may be subtracted from the calculated threshold in one or more frequency bands. The masking threshold estimator keeps track of the bark bands that are responsible for the masking threshold in each bark band. To adjust the masking threshold in a bark band, the masking threshold estimator 610 determines (1108) one or more appropriate bark bands (the band or bands that are most responsible for masking) for controlling the adjustment. In some examples, a group of bark bands may be formed, and the particular frequency band with the largest signal level (within the group) is assigned to adjust each bark band member of the group.

Referring to fig. 12, a flow diagram 1200 includes certain operations of the interference estimator 608. As described with reference to fig. 7, slew rate limiters 704, 720 may be included in the interference estimator to reduce modulation artifacts of interfering signals present in the in-region signal. Similar to the masking threshold estimator 610, the operations of the interference estimator 608 may be performed from instructions provided to one or more processors (e.g., microprocessors), by custom circuits, by other similar processing techniques, or by a combination of these methods.

To provide slew rate limiting, the operations of the interference estimator 608 may include receiving (1202) an interference signal (e.g., a frequency or bark domain signal obtained from a transfer function between two regions, or a frequency or bark domain signal obtained from microphone measurements) and determining (1204) whether a peak is detected. Peak detection is well known in the art, and methods for performing peak detection will not be described in further detail herein. In one arrangement, peak detection is provided by monitoring and comparing individual signal levels. If a peak is detected, the operations include holding (1206) the peak for a predetermined period of time (e.g., 0.1 seconds, 1.0 seconds, etc.). If a peak has not been detected, or after holding the detected peak, the operations include determining (1208) whether a peak is currently being held. If the peak hold period is not active (e.g., a peak has not been detected), the interference estimator 608 allows the signal to fade (1210). If a peak is currently being held, operation returns to determining whether another peak is detected.

Referring to fig. 13, a flow chart 1300 includes certain operations of the gain setter 612. As described with reference to fig. 10, along with selecting gain values and converting the values from the bark domain to the frequency domain, the gain setter 612 applies a smoothing function to the derived gains that maintains the peak values. Similar to the masking threshold estimator 610 and the interference estimator 608, the operations of the gain setter 612 may be performed from instructions provided to one or more processors (e.g., microprocessors), by custom circuits, or using other similar processing techniques or combinations of processing techniques.

To identify the appropriate gain, the operations of the gain setter 612 include comparing (1302) the in-region signal (or signals) with one or more interfering signals. The comparison may be performed on bark band representations of the various signals. Based on this comparison, gain setter 612 determines (1304) the gain or gains required to adjust the masking threshold and the appropriate bark band to which the gain is applied. The operations of the gain setter also include converting (1306) the identified gain from the bark domain to the frequency domain, depending on how the bark domain is defined (e.g., equation (1)). Once the gain is placed on the linear frequency scale, the operations include applying (1308) a smoothing function to the gain. For example, a peak-hold smoothing function may be applied such that the peak gain values are maintained, to ensure that an appropriate masking signal is generated.

According to one implementation, to perform the operations described in flowcharts 1100, 1200, and 1300, masking threshold estimator 610, interference estimator 608, and gain setter 612 may individually or in combination perform any of the computer-implemented methods previously described. For example, the audio processing device 104 may include a computing device (e.g., a computer system) to execute instructions associated with the masking threshold estimator 610, the interference estimator 608, and the gain setter 612. The computing device may include a processor, memory, storage, and one or more input/output devices. Each component may be interconnected using a system bus or other similar structure. The processor may be capable of processing instructions for execution within a computing device. In one implementation, the processor is a single-threaded processor. In another implementation, the processor is a multi-threaded processor. The processor is capable of processing instructions stored in the memory or on the storage device to display graphical information for a user interface on the input/output device.

The memory stores information within the computing device. In one implementation, the memory is a computer-readable medium. In one implementation, the memory is a volatile memory unit. In another implementation, the memory is a non-volatile memory unit.

The storage device can provide mass storage for the computing device. In one implementation, the storage device is a computer-readable medium. In various different implementations, the storage device may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.

The input/output device provides input/output operations for the computing device. In one implementation, the input/output device includes a keyboard and/or a pointing device. In another implementation, an input/output device includes a display unit for displaying a graphical user interface.

The described features (e.g., the masking threshold estimator 610, the interference estimator 608, and the gain setter 612, the operations described in the flowcharts 1100, 1200, and 1300) may be implemented in digital electronic circuitry (e.g., a processor), or in computer hardware, firmware, software, or in combinations of them. The apparatus may be embodied in a computer program product embodied in an information carrier (e.g., in a machine-readable storage device) for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output. The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and memory may be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

To provide for interaction with a user, these features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer.

The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a LAN, a WAN, and the computers and networks forming the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a network, such as the one described. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Other embodiments are within the scope of the following claims. The techniques described herein may be performed in a different order and still achieve desirable results.

Claims

1. A method for masking an interfering audio signal, comprising:

identifying a first frequency band of a desired signal provided to a first acoustic region to adjust a masking threshold associated with a second frequency band of the desired signal; and

applying a gain to a first frequency band of the desired signal to raise the masking threshold in the second frequency band above a level of an interfering signal containing energy in the second frequency band.

2. The method of claim 1, wherein identifying the first frequency band of the desired signal comprises selecting a frequency band having a maximum level from a set of frequency bands.

3. The method of claim 1, wherein the first and second frequency bands are in the bark domain.

4. The method of claim 1, wherein adjusting the first portion of the signal comprises comparing the masking threshold to the level of the interfering signal.

5. The method of claim 4, wherein the gain applied is slew rate limited.

6. The method of claim 1, wherein applying the gain comprises smoothing the gain to maintain a peak gain value.

7. The method of claim 6, wherein maintaining the peak value comprises expanding the peak value.

8. The method of claim 1, wherein the interfering signal comprises a signal provided to a second acoustic region.

9. The method of claim 1, wherein the interfering signal comprises an estimate of a noise signal.

10. A method for masking an interfering audio signal, comprising:

reproducing a first signal having a level at a first location, the first signal also having a first frequency range,

determining a masking threshold as a function of a frequency associated with the first signal at the first location,

identifying a level of a second signal present at the first location, the second signal having a second frequency range different from the first frequency range,

comparing the level of the second signal present at the first location with the masking threshold, and

adjusting the first signal level to raise the masking threshold above the level of the second signal within the second frequency range.

11. The method of claim 10, wherein the first and second frequency ranges are represented in the bark domain.

12. The method of claim 10, wherein the adjusted level of the first signal is slew rate limited.

13. The method of claim 10, wherein adjusting the first signal level comprises applying a gain.

14. The method of claim 13, wherein applying the gain comprises smoothing the gain to maintain a peak gain value.

15. The method of claim 14, wherein maintaining the peak value comprises expanding the peak value.

16. The method of claim 10, wherein the second signal comprises a signal provided to a second location.

17. The method of claim 10, wherein the second signal represents an estimate of a noise signal.

18. The method of claim 10, further comprising:

adjusting the second signal level according to frequency to reduce the second signal level below the masking threshold over at least a portion of the second frequency range in order to reduce audibility of the second signal in the first location.

19. A method for reducing audibility of an interfering signal, comprising:

reproducing a first signal having a level at a first location as a function of frequency, the first signal also having a first frequency range,

identifying a level, as a function of frequency, of a second signal present at the first location, the second signal having a second frequency range, and

adjusting the second signal level according to frequency to reduce the second signal level below the masking threshold over at least a portion of the second frequency range in order to reduce audibility of the second signal in the first location.

20. The method of claim 19, wherein the first and second frequency ranges are represented in the bark domain.

21. The method of claim 19, wherein adjusting the second signal level comprises reducing a gain.

22. The method of claim 19, wherein the second signal comprises a signal provided to a second location.