HK1176177B - Method for modifying a parameter of an audio dynamics processor and apparatus for performing the method - Google Patents


Info

Publication number: HK1176177B (application number HK13103397.8A)
Authority: HK (Hong Kong)
Prior art keywords: audio, signal, auditory, event, loudness
Other languages: Chinese (zh)
Other versions: HK1176177A1 (en)
Inventors: Brett Graham Crockett, Alan Jeffrey Seefeldt
Applicant and original assignee: Dolby Laboratories Licensing Corporation
Published as HK1176177A1, then as HK1176177B

Description

Method for modifying parameters of an audio dynamics processor and device for carrying out the method
This application is a divisional application of the invention patent application entitled "Audio Gain Control Using Specific-Loudness-Based Auditory Event Detection", application number 200780014742.8 (a PCT application entering the national phase, international application number PCT/US2007/008313).
Technical Field
The present invention relates to audio dynamic range control methods and apparatus in which an audio processing device analyzes an audio signal and changes the level, gain, or dynamic range of the audio, and in which all or some of the parameters of the audio gain and dynamics processing are generated as a function of auditory events. The invention also relates to a computer program for implementing such a method or controlling such a device.
The invention also relates to methods and apparatus utilizing specific loudness-based auditory event detection. The invention also relates to a computer program for implementing such a method or controlling such a device.
The invention also relates to a method for modifying parameters of an audio dynamics processor. The invention also relates to a device for carrying out such a method.
Background
Dynamic processing of audio
Techniques for automatic gain control (AGC) and dynamic range control (DRC) are well known and are common elements of many audio signal paths. In an abstract sense, both techniques measure the level of an audio signal in some way and then modify the gain of the signal by an amount that is a function of the measured level. In a linear, 1:1 dynamics processing system, the input audio is not processed and the output audio signal ideally matches the input audio signal. If, instead, an audio dynamics processing system automatically measures characteristics of the input signal and uses that measurement to control the output signal, then when the level of the input signal rises by 6 dB and the output signal is processed so that its level rises by only 3 dB, the output signal has been compressed at a 2:1 ratio relative to the input signal. International publication WO2006/047600A1 ("Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal" by Alan Jeffrey Seefeldt) provides a detailed overview of the five basic types of audio dynamics processing: compression, limiting, automatic gain control (AGC), expansion, and gating.
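The 2:1 ratio arithmetic above can be sketched as a static compression curve (a minimal illustrative example, not taken from the cited publication; the -20 dB threshold is an arbitrary choice):

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=2.0):
    """Static compression curve: below the threshold the signal is left
    untouched; above it, the output level rises only 1/ratio dB for
    every 1 dB rise in input level. Returns the applied gain in dB."""
    if level_db <= threshold_db:
        return 0.0
    compressed_level = threshold_db + (level_db - threshold_db) / ratio
    return compressed_level - level_db
```

With these illustrative values, a 6 dB input rise above the threshold (from -20 dB to -14 dB) produces an output rise of only 3 dB (from -20 dB to -17 dB), i.e., 2:1 compression.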
Auditory events and auditory event detection
The process of dividing sound into units or segments that are perceived as separate and distinct is sometimes referred to as "auditory event analysis" or "auditory scene analysis" ("ASA"), and the segments are sometimes referred to as "auditory events" or "audio events". An extensive discussion of auditory scene analysis is set forth by Bregman in his book "Auditory Scene Analysis: The Perceptual Organization of Sound" (Massachusetts Institute of Technology, 1991, Fourth printing 2001, Second MIT Press paperback edition). In addition, publications dating back to 1976 are cited in U.S. Patent 6,002,776, issued to Bhadkamkar et al. on December 14, 1999, as "prior art related to sound separation by auditory scene analysis". However, the Bhadkamkar et al. patent discourages the practical use of auditory scene analysis, concluding that "techniques involving auditory scene analysis, although interesting from a scientific point of view as models of human auditory processing, are currently far too computationally demanding and specialized to be considered practical techniques for sound separation until fundamental progress is made."
In the patent applications and papers listed below under the heading "Incorporation by Reference", Crockett and Crockett et al. propose effective means for identifying auditory events. According to those documents, an audio signal is divided into auditory events, each of which tends to be perceived as separate and distinct, by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, calculating the difference in spectral content between successive time blocks, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in spectral content between them exceeds a threshold. Alternatively, changes in amplitude with respect to time may be calculated instead of, or in addition to, changes in spectral content with respect to time.
In its implementation with the lowest computational requirements, the process divides the audio into time segments by analyzing the entire frequency band (full-bandwidth audio) or substantially the entire frequency band (in practical implementations, band-limiting filtering at the ends of the spectrum is often employed) and giving the greatest weight to the loudest audio signal components. This approach takes advantage of the psychoacoustic phenomenon in which, at smaller time scales (20 milliseconds (ms) and less), the ear may tend to focus on a single auditory event at a given time. This implies that, while multiple events may be occurring at the same time, one component tends to be perceptually most prominent and may be processed individually as though it were the only event taking place. Taking advantage of this effect also allows the auditory event detection to scale with the complexity of the audio being processed. For example, if the input audio signal being processed is a solo instrument, the audio events identified will likely be the individual notes being played. Similarly, for an input speech signal, the individual components of the speech, for example the vowels and consonants, will likely be identified as individual audio elements. As the complexity of the audio increases, such as music with a drumbeat or multiple instruments and voices, the auditory event detection identifies the "most prominent" (i.e., the loudest) audio element at any given moment.
At the expense of greater computational complexity, the process may also account for variations in spectral content over time in discrete sub-bands (either fixed or dynamically determined, or both) rather than the entire bandwidth. This alternative approach considers more than one audio stream in different sub-bands, rather than assuming that only a single audio stream can be perceived at a particular time.
Auditory event detection may be achieved by: the time domain audio waveform is divided into time intervals or blocks and the data in each block is then converted to the frequency domain using a filter bank or a time-frequency transform such as an FFT. The amplitude of the spectral content of each block may be normalized to eliminate or reduce the effect of amplitude variations. Each resulting frequency domain representation provides an indication of the spectral content of the audio in a particular block. The spectral content of successive blocks is compared and a change greater than a threshold can be used to indicate the time start or time end of an auditory event.
Preferably, the frequency domain data is normalized, as described below. The degree to which the frequency domain data needs to be normalized gives an indication of amplitude. Hence, if a change in this degree exceeds a predetermined threshold, it too may be taken to indicate an event boundary. Event start and end points resulting from spectral changes and from amplitude changes may be ORed together so that event boundaries resulting from either type of change are identified.
While the techniques described in the noted Crockett and Crockett et al applications and articles are particularly effective in connection with aspects of the invention, other techniques for identifying auditory events and event boundaries may be used in connection with aspects of the invention.
Disclosure of Invention
Conventional prior art audio dynamics processing involves multiplying the audio by a time-varying control signal that adjusts the gain of the audio to produce a desired result. "Gain" is a scaling factor that scales the audio amplitude. The control signal may be generated on a continuous basis or from blocks of audio data, but it is typically derived from some form of measurement of the audio being processed, and its rate of change is determined by a smoothing filter, sometimes with fixed characteristics and sometimes with characteristics that vary with the dynamics of the audio. For example, response times may be adjusted according to changes in the magnitude or the power of the audio. Prior art methods such as automatic gain control (AGC) and dynamic range compression (DRC) do not assess in any psychoacoustically based way the time intervals during which gain changes may be perceived as defects and the time intervals during which gain changes can be applied without introducing audible artifacts. Therefore, conventional audio dynamics processes often introduce audible artifacts; that is, the dynamics processing introduces unwanted perceptible changes in the audio.
Auditory scene analysis identifies perceptually discrete auditory events, with each event occurring between two consecutive auditory event boundaries. The audible artifacts caused by gain changes can be greatly reduced by ensuring that the gain is more nearly constant within an auditory event and by confining much of the gain change to the neighborhood of an event boundary. In the context of compressors or expanders, the response to an increase in audio level (often called the attack) may be rapid, comparable to or shorter than the minimum duration of an auditory event, but the response to a decrease (the release or recovery) may be slower, so that sound that should appear constant or decay gradually may be audibly disturbed. Under such circumstances, it is very beneficial to delay the gain recovery until the next event boundary or to slow the rate of gain change during an event. For automatic gain control applications, in which the medium- to long-term level or loudness of the audio is normalized and the attack and release times may therefore be long in comparison with the minimum duration of an auditory event, it is beneficial, for both increasing and decreasing gains, to delay the gain change, or slow its rate of change, until the next event boundary.
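The event-gated release described above can be sketched as follows (an illustrative one-pole gain smoother operating on per-block gains in dB; the coefficient values are arbitrary and this is not the embodiment of any figure):

```python
def apply_dynamic_gain(targets_db, is_boundary, attack=0.2, release=0.95):
    """Smooth a per-block target gain (in dB, 0 dB = unity) with a fast
    attack and an event-gated release: once the gain has been pulled
    down, recovery toward the target is held until the next auditory
    event boundary, so steady sound inside an event is not disturbed."""
    g = 0.0
    allow_release = True
    out = []
    for tg, b in zip(targets_db, is_boundary):
        if b:                              # event boundary: recovery may resume
            allow_release = True
        if tg < g:                         # attack: track decreases quickly
            g += (1.0 - attack) * (tg - g)
            allow_release = False          # recovery must wait for a boundary
        elif allow_release:                # release: track increases slowly
            g += (1.0 - release) * (tg - g)
        out.append(g)
    return out
```

For a target gain that dips and then returns to 0 dB with no intervening boundary, the smoothed gain stays at the dipped value and only begins to recover once a boundary occurs.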
According to one aspect of the invention, an audio processing system receives an audio signal and analyzes and alters the gain and/or dynamic range characteristics of the audio. The dynamic range modification of the audio is typically controlled by parameters of the dynamics processing system (attack and release times, compression ratio, etc.) that have a significant impact on the perceptual distortion introduced by the dynamics processing. Changes in signal characteristics in the audio signal with respect to time are detected and identified as auditory event boundaries such that audio segments between successive boundaries constitute auditory events in the audio signal. The characteristics of auditory events of interest may include event characteristics such as perceived intensity or duration. Some of the one or more dynamic processing parameters are generated at least partially in response to auditory events and/or a degree of change in a signal characteristic associated with the auditory event boundaries.
According to another aspect of the present invention, there is provided a method for modifying parameters of an audio dynamics processor, the method comprising: detecting a change in signal characteristics in the audio signal with respect to time; identifying as auditory event boundaries changes in the signal characteristics in the audio signal with respect to time that exceed a threshold, wherein segments of audio between successive boundaries constitute auditory events; generating a parameter modification control signal based on the identified boundary of the auditory event; and modifying the parameter of the audio dynamics processor in accordance with the control signal.
According to yet another aspect of the invention, there is provided an apparatus for performing the above method.
Typically, auditory events are segments of audio that tend to be perceived as separate and distinct. One effective measure of signal characteristics includes a measure of the spectral content of the audio, for example as described in the cited Crockett and Crockett et al documents. All or some of the one or more dynamic processing parameters may be generated at least partially in response to the presence or absence of one or more auditory events and characteristics thereof. Auditory event boundaries can be identified as changes in signal characteristics with respect to time that exceed a threshold. Alternatively, all or some of the one or more parameters may be generated at least partially in response to a continuous measure of the degree of signal characteristic change associated with the auditory event boundary. Although aspects of the present invention may in principle be implemented in the analog and/or digital domain, a practical implementation may be in the digital domain where each audio signal is represented by individual samples or samples in a data block. In this case, the signal characteristic may be the spectral content of the audio within a block, the detection of a change in the signal characteristic with respect to time may be the detection of a change in the spectral content of the audio from block to block, and the temporal start boundary and the temporal end boundary of an auditory event each coincide with a boundary of a data block. It should be noted that for the more traditional case where dynamic gain variation is performed on a sample-by-sample basis, the auditory scene analysis described may be performed on a block-by-block basis, and the resulting auditory event information is used to perform the sample-by-sample applied dynamic gain variation.
By controlling key audio dynamics processing parameters using the results of auditory scene analysis, a significant reduction in the auditory distortion introduced by dynamics processing can be achieved.
The present invention provides two ways of performing auditory scene analysis. The first way performs spectral analysis and identifies the location of a perceivable audio event for controlling the dynamic gain parameter by identifying a change in spectral content. The second way transforms the audio to the perceptual loudness domain (which may provide more psychoacoustically relevant information than the first way) and identifies the location of the auditory event which is then used to control the dynamic gain parameters. It should be noted that the second approach requires the audio processing to know the absolute acoustic reproduction level, which may not be feasible in some implementations. The provision of these two auditory scene analysis methods enables ASA controlled dynamic gain modification with a process or device that may or may not be calibrated to take into account the absolute reproduction level.
Aspects of the invention are described herein in an audio dynamics processing environment in which aspects of other inventions are also described. Those other inventions are described in various pending United States and international patent applications of Dolby Laboratories Licensing Corporation, the owner of this application, which applications are identified herein.
Drawings
FIG. 1 is a flow chart showing an example of processing steps for performing auditory scene analysis;
FIG. 2 shows an example of block processing, windowing, and performing DFT on audio while auditory scene analysis is being performed;
FIG. 3 is in the nature of a flowchart or functional block diagram showing parallel processing in which audio is used both to identify auditory events and to identify characteristics of the auditory events, such that the events and their characteristics are used to modify dynamics processing parameters;
FIG. 4 is in the nature of a flowchart or functional block diagram showing processing in which audio is used only to identify auditory events, with the event characteristics determined from the auditory event detection, such that the events and their characteristics are used to modify dynamics processing parameters;
FIG. 5 is in the nature of a flowchart or functional block diagram showing processing in which audio is used only to identify auditory events, with the event characteristics determined from the auditory event detection, such that only the characteristics of the auditory events are used to modify dynamics processing parameters;
FIG. 6 shows an idealized set of auditory filter characteristic responses approximating the critical bands on the ERB scale, in which the horizontal scale is frequency in Hertz and the vertical scale is level in decibels;
FIG. 7 shows an ISO226 equal loudness curve with the horizontal scale being frequency in Hertz (logarithmic scale base 10) and the vertical scale being sound pressure level in decibels;
FIGS. 8a-c show idealized input/output characteristics and input gain characteristics of an audio dynamic range compressor;
FIGS. 9a-f show an example of the use of auditory events to control the release time in a digital implementation of a conventional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal;
FIGS. 10a-f show an example of the use of auditory events to control the release time in a digital implementation of a conventional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal, for a signal different from that used in FIGS. 9a-f;
FIG. 11 depicts a suitable idealized set of AGC and DRC curves for applying AGC prior to DRC in a loudness domain dynamics processing system, the goal of such a combination being to make all of the processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio dynamics.
Detailed Description
Auditory scene analysis (original non-loudness domain method)
In accordance with an embodiment of one aspect of the present invention, auditory scene analysis may be composed of the four general processing steps shown in a portion of FIG. 1. The first step 1-1 ("Perform spectral analysis") takes a time-domain audio signal, divides it into blocks, and calculates a spectral profile or spectral content for each of the blocks. Spectral analysis transforms the audio signal into the short-term frequency domain. This can be performed using any filterbank, based either on transforms or on banks of band-pass filters, and in either linear or warped frequency space (such as the Bark scale or critical bands, which better approximate the characteristics of the human ear). With any filterbank there is a trade-off between time and frequency. Greater time resolution, and hence shorter time intervals, leads to lower frequency resolution. Greater frequency resolution, and hence narrower subbands, leads to longer time intervals.
The first step, illustrated conceptually in FIG. 1, calculates the spectral content of successive time segments of the audio signal. In a practical embodiment, the ASA block size may be any number of samples of the input audio signal, although 512 samples provide a good trade-off between time and frequency resolution. In the second step 1-2, the differences in spectral content from block to block are determined ("Perform spectral profile difference measurements"). Thus, the second step calculates the difference in spectral content between successive time segments of the audio signal. As discussed above, a change in spectral content is believed to be a powerful indicator of the beginning or end of a perceived auditory event. In the third step 1-3 ("Identify location of auditory event boundaries"), when the spectral difference between one spectral-profile block and the next is greater than a threshold, the block boundary is taken to be an auditory event boundary. The audio segment between consecutive boundaries constitutes an auditory event. Thus, the third step sets an auditory event boundary between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold, thereby defining auditory events. In this embodiment, auditory event boundaries define auditory events having a length that is an integer multiple of the spectral profile block, with a minimum length of one spectral profile block (512 samples in this example). In principle, event boundaries need not be so limited. As an alternative to the practical embodiments discussed herein, the input block size may vary, for example, so as to be essentially the size of an auditory event.
After the event boundaries are identified, key characteristics of the auditory event are identified, as shown in steps 1-4.
Either overlapping or non-overlapping segments of the audio may be windowed and used to compute the spectral profiles of the input audio. Overlap results in finer resolution of auditory event locations and also makes it less likely to miss an event, such as a short transient. However, overlap also increases computational complexity. Thus, overlap may be omitted. FIG. 2 shows a conceptual representation of N non-overlapping blocks of samples being windowed and transformed into the frequency domain by the Discrete Fourier Transform (DFT). Each block may be windowed and transformed into the frequency domain, such as by the DFT, preferably implemented as a Fast Fourier Transform (FFT) for speed.
The following variables may be used to compute the spectral profile of the input block:

M = the number of windowed samples in a block used to compute the spectral profile
P = the number of samples of spectral computation overlap

In general, any integer numbers may be used for these variables. However, the implementation will be more efficient if M is set equal to a power of 2, so that standard FFTs may be used for the spectral profile calculations. In a practical embodiment of the auditory scene analysis process, the parameters listed may be set to:

M = 512 samples (or 11.6 ms at 44.1 kHz)
P = 0 samples (no overlap)
The values listed above were determined experimentally and were found generally to identify, with sufficient accuracy, the location and duration of auditory events. However, setting the value of P to 256 samples (50% overlap) rather than zero samples (no overlap) has been found to be useful in identifying some events that are otherwise hard to find. The window used in the spectral profile calculations may be an M-point Hamming window, an M-point Kaiser-Bessel window, or another suitable, preferably non-rectangular, window; many different window types may be used to minimize spectral artifacts due to windowing. The values indicated above, and a Hamming window type, were selected after extensive experimental analysis, as they have shown to provide excellent results across a wide range of audio material. Non-rectangular windowing is preferred for the processing of audio signals with predominantly low-frequency content; rectangular windowing produces spectral artifacts that may cause incorrect detection of events. Unlike certain encoder/decoder (codec) applications in which an overall overlap/add process must provide a constant level, such a constraint does not apply here, and the window may be selected for characteristics such as its time/frequency resolution and stop-band rejection.
In step 1-1 (FIG. 1), the spectrum of each M-sample block may be computed by windowing the data with an M-point Hamming window, an M-point Kaiser-Bessel window, or another suitable window, converting to the frequency domain using an M-point Fast Fourier Transform, and calculating the magnitude of the complex FFT coefficients. The resulting data is normalized so that the largest magnitude is set to unity, and the normalized array of M numbers is converted to the log domain. The data may also be normalized by some other metric, such as the average magnitude value or the average power value of the data. The array need not be converted to the log domain, but the conversion simplifies the calculation of the difference measure in step 1-2. Furthermore, the log domain more closely matches the nature of the human auditory system. The resulting log-domain values range from minus infinity to zero. In a practical embodiment, a lower limit can be imposed on the range of values; the limit may be fixed, for example -60 dB, or be frequency-dependent to reflect the lower audibility of quiet sounds at low and very high frequencies. (Note that the size of the array could be reduced to M/2 in that the FFT represents negative as well as positive frequencies.)
Step 1-2 calculates a measure of the difference between the spectra of adjacent blocks. For each block, each of the M (log) spectral coefficients from step 1-1 is subtracted from the corresponding coefficient for the preceding block, and the magnitude of the difference is calculated (the sign is ignored). These M differences are then summed to one number. The difference measure may also be expressed as an average difference per spectral coefficient by dividing the sum by the number of spectral coefficients used in the sum (M coefficients in this case).
Step 1-3 identifies the locations of auditory event boundaries by applying a threshold to the array of difference measures from step 1-2. When a difference measure exceeds the threshold, the change in spectrum is deemed sufficient to signal a new event, and the block number of the change is recorded as an event boundary. For the values of M and P given above and for log-domain values expressed in dB (in step 1-1), the threshold may be set equal to 2500 if the whole magnitude FFT (including the mirrored part) is compared, or 1250 if half the FFT magnitude is compared (as noted above, the FFT represents negative as well as positive frequencies; for the magnitude of the FFT, one is the mirror image of the other). This value was chosen experimentally and provides good auditory event boundary detection. The parameter value may be changed to reduce (by increasing the threshold) or increase (by decreasing the threshold) the detection of events.
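Steps 1-1 through 1-3 can be sketched end to end as follows (an illustrative sketch only: it uses a naive DFT, a rectangular window, small 64-sample blocks, and a correspondingly scaled-down threshold rather than the M = 512 and threshold = 1250 values of the practical embodiment described above):

```python
import cmath
import math

def block_spectrum_db(block):
    """Half-spectrum magnitudes of one block, normalized to unity peak
    and converted to dB, with a -60 dB lower limit as in the text."""
    M = len(block)
    mags = []
    for k in range(M // 2):
        s = sum(block[n] * cmath.exp(-2j * math.pi * k * n / M)
                for n in range(M))
        mags.append(abs(s))
    peak = max(mags) or 1.0
    floor = 10.0 ** (-60.0 / 20.0)
    return [20.0 * math.log10(max(m / peak, floor)) for m in mags]

def detect_event_boundaries(x, M=64, threshold=100.0):
    """Compare the log spectra of consecutive non-overlapping blocks
    (P = 0) and report the indices of blocks that start a new event."""
    boundaries, prev = [], None
    for b in range(len(x) // M):
        spec = block_spectrum_db(x[b * M:(b + 1) * M])
        if prev is not None:
            diff = sum(abs(a - c) for a, c in zip(spec, prev))
            if diff > threshold:
                boundaries.append(b)
        prev = spec
    return boundaries
```

For a signal that switches from one tone to another, the block at which the switch occurs is reported as an auditory event boundary, while blocks of unchanged content produce no boundaries.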
The process of FIG. 1 may be represented more generally by the equivalent arrangements of FIGS. 3, 4 and 5. In FIG. 3, an audio signal is applied in parallel to an "Identify Auditory Events" function or step 3-1 that divides the audio signal into auditory events, each of which tends to be perceived as separate and distinct, and to an optional "Identify Characteristics of Auditory Events" function or step 3-2. The process of FIG. 1 may be employed to divide the audio signal into auditory events and to identify their characteristics, or some other suitable process may be employed. The auditory event information determined by function or step 3-1, which may be an identification of auditory event boundaries, is then used by a "Modify Dynamics Parameters" function or step 3-3 to modify the audio dynamics processing parameters (such as attack, release, ratio, etc.) as desired. The optional "Identify Characteristics" function or step 3-2 also receives the auditory event information. The "Identify Characteristics" function or step 3-2 may characterize some or all of the auditory events by one or more characteristics. Such characteristics may include an identification of the dominant subband of the auditory event, as described in connection with the process of FIG. 1. The characteristics may also include one or more audio characteristics such as, for example, a measure of the power of the auditory event, a measure of the amplitude of the auditory event, a measure of the spectral flatness of the auditory event, and whether the auditory event is substantially silent, or other characteristics that help modify dynamics parameters so that negative audible artifacts of the processing are reduced or removed. The characteristics may also include other characteristics, such as whether the auditory event includes a transient.
An alternative to the arrangement of fig. 3 is shown in fig. 4 and 5. In fig. 4, the audio input signal is not applied directly to the "identify characteristics" function or step 4-3, but rather the "identify characteristics" function or step 4-3 receives information from the "identify auditory events" function or step 4-1. The arrangement of fig. 1 is a specific example of such an arrangement. In FIG. 5, functions or steps 5-1, 5-2 and 5-3 are arranged in series.
The details of this practical embodiment are not critical. Other ways may be employed to calculate the spectral content of successive time segments of the audio signal, calculate the differences between successive time segments, and set auditory event boundaries at the respective boundaries between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold.
Auditory scene analysis (New loudness domain method)
International application PCT/US2005/038579, entitled "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal", filed October 25, 2005 by Alan Jeffrey Seefeldt and published as international publication WO2006/047600A1, discloses, among other things, an objective measure of perceived loudness based on a psychoacoustic model. The entire content of said application is hereby incorporated by reference. As described in said application, from an audio signal x[n], an excitation signal E[b,t] is computed that approximates the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t. This excitation may be computed from the Short-time Discrete Fourier Transform (STDFT) of the audio signal as follows:

E[b,t] = λ_b E[b,t-1] + (1 - λ_b) Σ_k |T[k] C_b[k] X[k,t]|²    (1)

where X[k,t] represents the STDFT of x[n] at time block t and frequency bin k. Note that in equation 1, t represents time in discrete units of transform blocks, as opposed to a continuous measure such as seconds. T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear, and C_b[k] represents the frequency response of the basilar membrane at a location corresponding to critical band b. FIG. 6 depicts a suitable set of critical band filter responses in which 40 bands are spaced uniformly along the Equivalent Rectangular Bandwidth (ERB) scale, as defined by Moore and Glasberg. Each filter shape is described by a rounded exponential function, and the bands are distributed using a spacing of 1 ERB. Finally, the smoothing time constant λ_b in equation 1 may be advantageously chosen proportional to the integration time of human loudness perception within band b.
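One time block of the excitation recursion of equation 1 can be sketched numerically as follows (a toy illustration with made-up filter responses, not the filters of FIG. 6; the combined outer/middle-ear and basilar-membrane responses are collapsed into a single squared-magnitude weight per band and bin):

```python
def excitation_step(E_prev, X_mag2, TC2, lam):
    """One time block of the recursion
    E[b,t] = lam[b]*E[b,t-1] + (1-lam[b]) * sum_k |T[k]*C_b[k]*X[k,t]|^2.
    E_prev: previous excitation per band; X_mag2: |X[k,t]|^2 per bin;
    TC2[b][k]: combined squared magnitude response |T[k]*C_b[k]|^2."""
    E = []
    for b, lam_b in enumerate(lam):
        band_energy = sum(TC2[b][k] * X_mag2[k]
                          for k in range(len(X_mag2)))
        E.append(lam_b * E_prev[b] + (1.0 - lam_b) * band_energy)
    return E
```

Repeated application of the step smooths each band's energy toward its steady-state value at a rate set by λ_b.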
Using equal-loudness curves such as those shown in fig. 7, the excitation in each band is transformed into the excitation level that would generate the same perceived loudness at 1 kHz. The specific loudness, a measure of perceived loudness distributed across frequency and time, is then calculated from the transformed excitation E1kHz[b,t] through a compressive non-linearity. One suitable function for calculating the specific loudness N[b,t] is given by:
N[b,t]=β((E1kHz[b,t]/TQ1kHz)^α-1) (2)

where TQ1kHz is the threshold in quiet at 1 kHz, and the constants β and α are chosen to match the growth of loudness data collected from listening experiments. Abstractly, this transformation from excitation to specific loudness can be represented by a function Ψ{·} such that:
N[b,t]=Ψ{E[b,t]}
Finally, the total loudness L[t], in units of sone, is calculated by summing the specific loudness across frequency bands:

L[t]=Σb N[b,t] (3)
The specific loudness N[b,t] is a spectral representation intended to model the way a human perceives audio as a function of frequency and time. It captures variations of sensitivity with frequency, variations of sensitivity with level, and variations in frequency resolution. As such, it is a spectral representation well matched to auditory event detection. Though computationally more complex, comparing the difference of N[b,t] across bands between successive time blocks may in many cases yield perceptually more accurate auditory event detection than the direct use of the successive FFT spectra described earlier.
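As a concrete sketch of equations 2 and 3, the compressive excitation-to-specific-loudness transform and the total loudness sum might be implemented as follows. Note that the constants tq_1khz, alpha and beta below are illustrative placeholders, not the calibrated values of the disclosed model:

```python
import numpy as np

def specific_loudness(E_1khz, tq_1khz=1e-4, alpha=0.23, beta=0.25):
    """Equation 2: N[b,t] = beta*((E/TQ)^alpha - 1).
    The constants here are placeholders, not the model's tuned values."""
    return beta * ((E_1khz / tq_1khz) ** alpha - 1.0)

def total_loudness(N):
    """Equation 3: total loudness L[t] in sone, summed across bands."""
    return N.sum(axis=0)

# Toy excitation: 40 critical bands by 10 time blocks, well above threshold
E = np.full((40, 10), 1e-2)
N = specific_loudness(E)   # specific loudness N[b,t]
L = total_loudness(N)      # total loudness L[t]
```

With this placeholder non-linearity, doubling the excitation well above threshold multiplies the specific loudness by roughly 2^alpha, mimicking the compressive growth of perceived loudness.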
In said patent application, several applications for modifying audio based on this psychoacoustic loudness model are disclosed, among them several dynamic processing algorithms such as AGC and DRC. These disclosed algorithms may benefit from using auditory events to control various parameters. Because the specific loudness is already being computed, it is readily available for detecting those events. Details of a preferred embodiment are discussed below.
Audio dynamics processing parameter control using auditory events
Two examples of embodiments of the present invention are now provided. A first example describes the use of auditory events to control release time in a digital implementation of a Dynamic Range Controller (DRC), where the gain control is derived from the Root Mean Square (RMS) power of the signal. The second embodiment describes the use of auditory events to control certain aspects of the more complex combination of AGC and DRC implemented in the context of the psychoacoustic loudness model described above. These two embodiments are intended to serve only as examples of the present invention, and it should be understood that the use of auditory events to control the parameters of the dynamic processing algorithm is not limited to the details described below.
Dynamic range control
The digital implementation of the DRC described here segments the audio signal x[n] into windowed, half-overlapping blocks, and for each block a modification gain is calculated based on a measure of the signal's local power and a selected compression curve. The gain is smoothed across blocks and then multiplied with each block. Finally, the modified blocks are overlap-added to produce the modified audio signal y[n].
It should be noted that although the auditory scene analysis and digital implementation of DRC described herein divide the time-domain audio signal into blocks to perform the analysis and processing, the DRC processing need not be performed using block segmentation. For example, auditory scene analysis can be performed using the block segmentation and spectral analysis described above, and the resulting auditory event locations and characteristics can be utilized to provide control information to a digital implementation of a conventional DRC implementation that typically operates on a sample-by-sample basis. However, the same blocking structure as used for auditory scene analysis is used here for DRC to simplify the description of the combination of DRC and auditory scene analysis.
Continuing with the description of the block-based DRC implementation, overlapping blocks of an audio signal may be represented as:
x[n,t]=w[n]x[n+tM/2], for 0 ≤ n ≤ M-1 (4)
where M is the block length, the hop size is M/2, w[n] is the window, n is the sample index within the block, and t is the block index (note that t is used here in the same way as in equation 1 for the STDFT; i.e., it represents time in discrete units of blocks rather than seconds). Ideally, the window w[n] tapers to zero at both ends and sums to one when half-overlapped with itself; the commonly used sinusoidal window, for example, meets these criteria.
For each block, the RMS power may then be calculated to produce one power measurement P[t], in dB, per block:

P[t]=10log10((1/M)Σn x²[n,t]) (5)
As mentioned previously, the power measurement may be smoothed with a fast attack and slow release before being processed with the compression curve, but as an alternative the instantaneous power P[t] is processed and the resulting gain is smoothed. This alternative has the advantage that a simple compression curve with a sharp knee can be used, while the resulting gain remains smooth as the power travels past the knee. The compression curve is represented as a function F of the signal level that produces a gain, as shown in fig. 8c, and the block gain G[t] is given by:
G[t]=F{P[t]} (6)
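The chain of equations 4 through 6 can be sketched as follows. The window choice, block length, and the one-knee curve standing in for F are assumptions made for illustration; the disclosed implementation uses the curve of the figure:

```python
import numpy as np

def drc_block_gains(x, M=512, knee_db=-20.0, ratio=5.0):
    """Sketch of equations 4-6: half-overlapping windowed blocks, per-block
    RMS power in dB, and a hypothetical one-knee compression curve F that
    attenuates at ratio:1 above the knee and leaves lower levels unchanged."""
    w = np.sin(np.pi * np.arange(M) / M) ** 2   # Hann-type window (tapers to zero)
    hop = M // 2
    n_blocks = (len(x) - M) // hop + 1
    P = np.empty(n_blocks)
    for t in range(n_blocks):
        blk = w * x[t * hop : t * hop + M]      # equation 4: x[n,t] = w[n] x[n + tM/2]
        P[t] = 10 * np.log10(np.mean(blk ** 2) + 1e-12)   # equation 5
    # Equation 6 with the hypothetical curve F: gain in dB per block
    G = np.where(P > knee_db, (knee_db - P) * (1 - 1 / ratio), 0.0)
    return P, G

fs = 44100
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # a steady, loud tone
P, G = drc_block_gains(x)
```

For this steady tone every block sits above the knee, so every block receives a negative (attenuating) gain.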
Assuming that the compression curve applies greater attenuation as the signal level increases, the gain will be decreasing when the signal is in "attack mode" and increasing when the signal is in "release mode". The smoothed gain Ḡ[t] may therefore be calculated according to:

Ḡ[t]=α[t]Ḡ[t-1]+(1-α[t])G[t] (7a)

where

α[t]=αattack, if G[t]<Ḡ[t-1]; α[t]=αrelease, otherwise (7b)

and

αrelease>>αattack (7c)
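A minimal sketch of the smoother of equations 7a through 7c, with illustrative coefficient values satisfying αrelease >> αattack:

```python
import numpy as np

def smooth_gain(G, alpha_attack=0.1, alpha_release=0.99):
    """Equations 7a-7c: one-pole smoothing with a mode-dependent coefficient.
    A falling raw gain selects the small (fast) attack coefficient; a rising
    raw gain selects the large (slow) release coefficient. Values illustrative."""
    Gs = np.empty_like(G)
    Gs[0] = G[0]
    for t in range(1, len(G)):
        a = alpha_attack if G[t] < Gs[t - 1] else alpha_release
        Gs[t] = a * Gs[t - 1] + (1 - a) * G[t]
    return Gs

# A gain step down (attack) followed by a step back up (release)
G = np.array([0.0] * 5 + [-12.0] * 5 + [0.0] * 10)
Gs = smooth_gain(G)
```

The smoothed gain reaches nearly -12 dB within a few blocks of the step down, but recovers only slowly after the step back up; this asymmetry is what conventional DRC relies on.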
Finally, the smoothed gain Ḡ[t], in dB, is applied to each block of the signal, and the modified blocks are overlap-added to produce the modified audio:

y[n+tM/2]=10^(Ḡ[t-1]/20)x[n+M/2,t-1]+10^(Ḡ[t]/20)x[n,t], for 0 ≤ n < M/2 (8)
Note that, because each block is multiplied by a tapered window as shown in equation 4, the overlap-add synthesis shown above effectively smooths the gain across the samples of the processed signal y[n]; the gain control signal thus receives smoothing in addition to that of equation 7a. In a more traditional DRC implementation operating sample-by-sample rather than block-by-block, gain smoothing more sophisticated than the simple one-pole filter of equation 7a might be necessary to prevent audible distortion in the processed signal. In addition, the use of block-based processing introduces an inherent delay of M/2 samples into the system, and as long as the attack time associated with αattack is close to this delay, the signal x[n] need not be delayed further before the gain is applied in order to prevent overshoot.
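A sketch of the overlap-add gain application of equation 8, assuming a Hann-type window (which tapers to zero and sums to one at half overlap, as required):

```python
import numpy as np

def overlap_add_gains(x, Gs_db, M=512):
    """Apply a per-block dB gain to half-overlapping windowed blocks and
    overlap-add them (equation 8). The tapered window cross-fades the
    per-block gains sample by sample."""
    w = np.sin(np.pi * np.arange(M) / M) ** 2   # Hann window: half-overlaps sum to one
    y = np.zeros(len(x))
    for t, g_db in enumerate(Gs_db):
        start = t * (M // 2)
        if start + M > len(x):
            break
        y[start:start + M] += (10 ** (g_db / 20)) * w * x[start:start + M]
    return y

x = np.ones(2048)
y = overlap_add_gains(x, np.zeros(7))   # 0 dB everywhere: reproduces x away from edges
```

Away from the un-overlapped edges, a constant 0 dB gain reproduces the input exactly, which confirms the window's overlap property.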
Figs. 9a to 9c depict the result of applying the described DRC processing to an audio signal. For this particular implementation, a block length of M=512 is used at a sampling rate of 44.1 kHz. A compression curve similar to that shown in fig. 8b was used:
the signal is attenuated at a ratio of 5:1 above-20 dB relative to full scale numbers and enhanced at a ratio of 5:1 below-30 dB relative to full scale numbers. Using an impact coefficient alpha corresponding to a half-life of 10msattackAnd a release factor alpha corresponding to a half-life of 500msreleaseTo smooth the gain. The original audio signal depicted in fig. 9a includes six consecutive piano chords attenuated to silence, with the last chord at the 1.75 × 10 th chord5Around one sample. The gain shown in FIG. 9b is investigatedIt should be noted that the gain remains close to 0dB while the six chords are played. This is because the energy of the signal is mainly between-30 dB and-20 dB, i.e. the region where the DRC curve does not need correction. However, after the final chord is reached, the energy of the signal drops below-30 dB andand the gain starts to increase with decreasing chord and eventually exceeds 15 dB. Fig. 9c depicts the resulting modified audio signal, and it can be seen that the tail of the final chord is significantly enhanced. This enhancement of the otherwise low-level attenuated sound of the chord produces a particularly unnatural effect acoustically. It is an object of the present invention to avoid this type of problem associated with conventional dynamic processors.
Figs. 10a to 10c depict the result of applying exactly the same DRC system to a different audio signal. In this case the first half of the signal consists of an up-tempo piece of music at a high level, after which, at approximately sample 10×10^4, the signal switches to a second up-tempo piece at a considerably lower level. Examining the gain in fig. 10b, it can be seen that the signal is attenuated by about 10 dB during the first half, and that the gain then rises back to 0 dB during the second half while the second piece plays. Here the gain behaves as desired: the second piece should be boosted relative to the first, and the gain should increase quickly after the switch so as to be acoustically unobtrusive. The gain behavior is similar to that observed for the first signal, but here it is desirable. One would therefore wish to correct the first case without affecting the second. Using auditory events to control the release time of the DRC system provides such a solution.
For the first signal examined in fig. 9, the boosting of the chord's decay sounds unnatural because the final chord and its decay are perceived as a single auditory event whose integrity is expected to be maintained. In the second case, however, many auditory events occur while the gain increases, meaning that any individual event undergoes little change; the overall gain variation is therefore not objectionable. One may thus argue that a gain change should be allowed only in close temporal proximity to an auditory event boundary. This principle could be applied to the gain in either attack or release mode, but for most practical DRC implementations the gain moves so quickly in attack mode, relative to the human temporal resolution of event perception, that no control is needed; events may therefore be used to control smoothing only while the DRC gain is in release mode.
Suitable behavior of the release control will now be described. In qualitative terms, when an event is detected, the gain is smoothed with the release time constant specified in equation 7a. As time advances past the detected event, and if no further event is detected, the release time constant increases continuously, so that the smoothed gain is eventually "frozen" in place. If another event is detected, the smoothing time constant is reset to its initial value and the process repeats. To modulate the release time, a control signal may first be generated based on the detected event boundaries.
As discussed previously, event boundaries may be detected by looking for changes in the successive spectra of the audio signal. In this particular implementation, the DFT of each overlapping block x[n,t] may be computed to produce an STDFT of the audio signal x[n]:

X[k,t]=Σn x[n,t]e^(-j2πkn/M) (9)
Next, the difference between the normalized log-magnitude spectra of successive blocks may be calculated according to:

D[t]=Σk|XNORM[k,t]-XNORM[k,t-1]| (10a)

where:

XNORM[k,t]=log(|X[k,t]|/max_k{|X[k,t]|}) (10b)

Here the maximum of |X[k,t]| across frequency bins k is used for normalization, although other normalization factors may be used, such as the average of |X[k,t]| across bins. If the difference D[t] exceeds a threshold Dmin, an event is considered to have occurred. In addition, an intensity between zero and one may be assigned to the event based on where D[t] lies between Dmin and a maximum threshold Dmax. The resulting auditory event intensity signal A[t] may be calculated as:

A[t]=0, for D[t]≤Dmin; A[t]=(D[t]-Dmin)/(Dmax-Dmin), for Dmin<D[t]<Dmax; A[t]=1, for D[t]≥Dmax (11)
By assigning to each auditory event an intensity proportional to the amount of spectral change associated with it, greater control over the dynamics processing is achieved than with a binary event decision. The inventors have found that larger gain changes are acceptable during stronger events, and the signal in equation 11 allows such variable control.
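Equations 10a, 10b and 11 can be sketched as follows; the thresholds d_min and d_max are illustrative placeholders:

```python
import numpy as np

def event_intensity(X_mag, d_min=1.0, d_max=2.5):
    """Equations 10a-11: difference of max-normalized log-magnitude spectra
    between consecutive blocks, mapped to an intensity A[t] in [0, 1].
    d_min and d_max are illustrative placeholders."""
    eps = 1e-12
    X_norm = np.log(X_mag / (X_mag.max(axis=0, keepdims=True) + eps) + eps)
    D = np.abs(np.diff(X_norm, axis=1)).sum(axis=0)       # D[t] per block transition
    A = np.clip((D - d_min) / (d_max - d_min), 0.0, 1.0)  # 0 below d_min, 1 above d_max
    return np.concatenate(([0.0], A))                     # no transition into block 0

# Two steady spectra with an abrupt spectral change entering block 3
X = np.ones((8, 6))
X[:, 3:] = 0.1
X[0, 3:] = 1.0
A = event_intensity(X)
```

A spectral change at one block boundary yields a single non-zero intensity value there; the steady spectra on either side produce none.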
The signal A[t] is an impulsive signal, with an impulse appearing at the position of each event boundary. To control the release time, the signal A[t] may be further smoothed so that it decays smoothly toward zero after the detection of an event boundary. The smoothed event control signal Ā[t] may be calculated from A[t] according to:

Ā[t]=A[t], if A[t]>αeventĀ[t-1]; Ā[t]=αeventĀ[t-1], otherwise (12)

Here, αevent controls the decay time of the event control signal. Figs. 9d and 10d depict the event control signal Ā[t] for the two corresponding audio signals, with the half-decay time of the smoother set to 250 ms. In the first case, an event boundary is detected for each of the six piano chords, and the event control signal decays smoothly toward zero after each event. For the second signal, many events are detected very close together in time, so the event control signal never decays fully to zero.
The event control signal Ā[t] may now be used to vary the release time constant with which the gain is smoothed. When the control signal equals one, the smoothing coefficient α[t] from equation 7a equals αrelease, as before; when the control signal equals zero, the coefficient equals one, preventing any change of the smoothed gain. The smoothing coefficient is interpolated between these two extremes using the control signal, according to:

α[t]=αattack, if G[t]<Ḡ[t-1]; α[t]=Ā[t]αrelease+(1-Ā[t]), otherwise (13)

By interpolating the smoothing coefficient continuously as a function of the event control signal, the release time is reset, at the onset of an event, to a value proportional to the event's intensity, and then increases smoothly toward infinity afterwards, at a rate dictated by the coefficient αevent used to generate the smoothed event control signal.
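Equations 12 and 13 (release branch) can be sketched as follows, with illustrative coefficients:

```python
import numpy as np

def smooth_event_control(A, alpha_event=0.98):
    """Equation 12: hold each event pulse and let it decay at a rate set by
    alpha_event, resetting whenever a stronger event arrives."""
    A_bar = np.empty_like(A)
    A_bar[0] = A[0]
    for t in range(1, len(A)):
        A_bar[t] = max(A[t], alpha_event * A_bar[t - 1])
    return A_bar

def release_coefficient(A_bar, alpha_release=0.9):
    """Equation 13, release branch: interpolate between alpha_release
    (control = 1, normal release) and one (control = 0, frozen gain)."""
    return A_bar * alpha_release + (1.0 - A_bar)

A = np.zeros(50)
A[5] = 1.0                        # a single full-strength event boundary
A_bar = smooth_event_control(A)
alpha = release_coefficient(A_bar)
```

Immediately after the full-strength event the coefficient equals αrelease; as the event control signal decays, the coefficient drifts toward one and the smoothed gain freezes in place.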
Figs. 9e and 10e show the effect of smoothing the gain with the event-controlled coefficient from equation 13, as opposed to the non-event-controlled coefficient from equation 7b. In the first case, the event control signal falls to zero after the final piano chord, preventing the gain from moving upward; as a result, the corresponding modified audio in fig. 9f does not suffer an unnatural boost of the chord's decay. In the second case, the event control signal never approaches zero, so the smoothed gain signal is affected very little by the application of the event control; its trajectory is nearly identical to the non-event-controlled gain in fig. 10b. This is exactly the desired effect.
Loudness-based AGC and DRC
As an alternative to conventional dynamics processing techniques, in which the signal modification is a direct function of simple signal measurements such as peak or RMS power, international patent application PCT/US2005/038579 discloses the use of the psychoacoustics-based loudness model described earlier as a framework within which to perform dynamics processing. Several advantages are cited. First, measurements and modifications are specified in units of sone, a more accurate measure of loudness perception than more basic measures such as peak or RMS power. Second, the audio may be modified such that the perceived spectral balance of the original audio is maintained as the overall loudness is changed. In this way the loudness change becomes perceptually less conspicuous than with a dynamics processor that, for example, uses a wideband gain to modify the audio. Finally, the psychoacoustic model is inherently multi-band, so the system is easily configured to perform multi-band dynamics processing, alleviating the well-known cross-spectral pumping problems associated with wideband dynamics processors.
While performing dynamics processing in this loudness domain has several advantages over more traditional dynamics processing, the technique can be further improved by using auditory events to control various parameters. Consider the audio segment containing the piano chords depicted in fig. 9a and the associated DRC shown in figs. 9b and 9c. A similar DRC could be performed in the loudness domain, and in that case, when the decay of the final piano chord is boosted, the boost would be less objectionable, since the spectral balance of the decaying note is maintained as the boost is applied. A better solution, however, is not to boost the decay at all, so the same principle of controlling attack and release times with auditory events, described above for the conventional DRC, can be applied advantageously in the loudness domain.
The loudness-domain dynamics processing system now described consists of AGC followed by DRC. The purpose of this combination is to bring all processed audio to approximately the same perceived loudness while still preserving at least some of the original audio's dynamics. Fig. 11 depicts a suitable set of AGC and DRC curves for this application. Note that, because the processing is performed in the loudness domain, the inputs and outputs of both curves are expressed in units of sone. The AGC curve strives to bring the output audio close to a target level and, as mentioned earlier, does so with relatively slow time constants. The AGC can be said to make the long-term loudness of the audio equal to the target, although on a short-term basis the loudness may fluctuate significantly around that target. A faster-acting DRC can therefore be employed to limit these fluctuations to a range deemed acceptable for the particular application. Fig. 11 shows such a DRC curve, in which the AGC target falls within the "null band" of the DRC, the portion of the curve that calls for no modification. With this combination of curves, the AGC places the long-term loudness of the audio within the null band of the DRC curve, so that only minimal fast-acting DRC modification need be applied. If the short-term loudness still fluctuates outside the null band, the DRC acts to move the loudness of the audio toward the null band. As a final general point, the slow-acting AGC may be applied such that all bands of the loudness model receive the same amount of loudness modification, thereby maintaining the perceived spectral balance, while the fast-acting DRC may be applied in a manner that allows the modification to vary across bands, in order to mitigate the cross-spectral pumping that fast-acting, band-independent loudness modification might otherwise produce.
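The interaction of the two curves can be sketched with a pair of hypothetical curve functions; the target, null-band limits, and ratio below are invented for illustration and are not the curves of fig. 11:

```python
def agc_curve(L_in, target=8.0, strength=0.5):
    """Hypothetical AGC curve F_AGC: pull long-term loudness (in sone) part
    of the way toward a target level. Purely illustrative."""
    return L_in + strength * (target - L_in)

def drc_curve(L_in, null_lo=6.0, null_hi=10.0, ratio=3.0):
    """Hypothetical DRC curve F_DRC with a null band around the AGC target:
    loudness inside [null_lo, null_hi] passes unchanged, and excursions
    outside are compressed toward the band at ratio:1."""
    if L_in > null_hi:
        return null_hi + (L_in - null_hi) / ratio
    if L_in < null_lo:
        return null_lo - (null_lo - L_in) / ratio
    return L_in

# The AGC lands long-term loudness inside the null band, so the DRC is inactive;
# short-term excursions outside the band are pulled back toward it.
settled = agc_curve(12.0)   # 10.0 sone: inside the null band
```

Loudness placed in the null band by the AGC passes through the DRC unchanged, while a short-term excursion beyond the band (e.g. 14 sone) is compressed back toward it.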
Auditory events can be used to control the attack and release of both the AGC and the DRC. In the case of the AGC, both the attack and release times are large compared to the temporal resolution of event perception, so event control can be applied advantageously to both. For the DRC, the attack is relatively short, so event control may be needed only for the release, as described above for the conventional DRC.
As previously discussed, the specific loudness spectrum associated with the loudness model being used may be employed for event detection. A difference signal D[t], analogous to the one in equations 10a and 10b, can be calculated from the specific loudness N[b,t] defined in equation 2 as follows:

D[t]=Σb|NNORM[b,t]-NNORM[b,t-1]| (14a)

where

NNORM[b,t]=N[b,t]/max_b{N[b,t]} (14b)

Here the maximum of |N[b,t]| across bands b is used for normalization, although other normalization factors may be used, such as the average of |N[b,t]| across bands. If the difference D[t] exceeds a threshold Dmin, an event is considered to have occurred. The difference signal may then be processed in the same manner shown in equations 11 and 12 to produce a smoothed event control signal Ā[t] used to control the attack and release times.
The AGC curve depicted in fig. 11 can be expressed as a function that takes a measure of loudness as its input and produces the desired output loudness:
Lo=FAGC{Li} (15a)
the DRC curve may be similarly expressed as:
Lo=FDRC{Li} (15b)
For the AGC, the input loudness is a measure of the long-term loudness of the audio. Such a measure can be calculated by smoothing the instantaneous loudness L[t], defined in equation 3, with a relatively long time constant (on the order of several seconds). It has been shown that, when judging the long-term loudness of an audio segment, humans weight the louder portions more heavily than the softer portions, and this effect can be simulated by using a faster attack than release in the smoothing. With event control incorporated for both attack and release, the long-term loudness used for determining the AGC modification can thus be calculated according to:
LAGC[t]=αAGC[t]LAGC[t-1]+(1-αAGC[t])L[t] (16a)
where

αAGC[t]=Ā[t]αAGCattack+(1-Ā[t]), for L[t]>LAGC[t-1]; αAGC[t]=Ā[t]αAGCrelease+(1-Ā[t]), otherwise (16b)
In addition, an associated long-term specific loudness spectrum, which will later be used for the multi-band DRC, can be calculated:
NAGC[b,t]=αAGC[t]NAGC[b,t-1]+(1-αAGC[t])N[b,t] (16c)
In practice, the smoothing coefficients may be chosen such that the attack time is approximately half the release time. Given the long-term loudness measure, the loudness modification scaling associated with the AGC can then be calculated as the ratio of the output loudness to the input loudness:

SAGC[t]=FAGC{LAGC[t]}/LAGC[t] (17)
the DRC corrections will now be calculated from the loudness after applying the AGC scaling. Instead of smoothing the measure of loudness prior to applying the DRC curve, a DRC curve may instead be applied to the instantaneous loudness and then the resulting modification smoothed. This is similar to the previously described technique for smoothing the gain of a conventional DRC. Furthermore, DRC can be applied in a multi-band manner, which means that the DRC modification is a function of the specific loudness N [ b, t ] (instead of the total loudness L [ t ]) in each band b. However, in order to maintain the average spectral balance of the original audio, DRC can be applied to each band so that the resulting modification has the same average effect as would have been caused by applying DRC to the overall loudness. This can be achieved by: each band is scaled by the ratio of the long-term total loudness (after applying AGC scaling) relative to the long-term specific loudness, and this value is used as an argument for the DRC function. The result is then rescaled by the inverse of the ratio to produce specific loudness. Thus, the DRC scaling in each band may be calculated according to:
the AGC modification and DRC modification can then be combined to form a total loudness scaling for each band:
STOT[b,t]=SAGC[t]SDRC[b,t] (19)
the overall scaling can then be smoothed over time independently of each band with a fast bump and a slow release and event control applied only to the release. Ideally, smoothing is performed on the logarithm of the scaling, similar to the gain of a conventional DRC being smoothed in its decibel representation, but this is not essential. To ensure that the smoothed overall scale varies in synchronism with the specific loudness in each band, the attack and release modes may be determined by simultaneously smoothing the specific loudness itself:
wherein the content of the first and second substances,
Finally, the target specific loudness may be calculated by applying the smoothed scaling to the original specific loudness:

N̂[b,t]=S̄TOT[b,t]N[b,t] (21)
Gains G[b,t] which, when applied to the original excitation, result in a specific loudness equal to this target are then solved for:

N̂[b,t]=Ψ{G²[b,t]E[b,t]} (22)
the gain may be applied to each frequency band of a filter bank used to calculate the excitation, and the modified audio may then be generated by inverting (invert) the filter bank to generate a modified time domain audio signal.
Additional parameter control
While the discussion above has focused on controlling the AGC and DRC attack and release parameters by means of auditory event analysis of the processed audio, other important parameters may also benefit from control based on the ASA results. For example, the event control signal Ā[t] from equation 12 may be used to vary the DRC ratio parameter that dynamically adjusts the gain of the audio. Like the attack and release time parameters, the ratio parameter can contribute significantly to the perceptual artifacts introduced by dynamic gain adjustment.
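As a purely hypothetical illustration of such control, the ratio could be interpolated with the event control signal in the same manner as the release coefficient in equation 13; the endpoint values below are invented:

```python
def event_controlled_ratio(A_bar, ratio_min=1.5, ratio_max=5.0):
    """Hypothetical: blend the DRC ratio with the event control signal so
    that stronger (more recent) events permit more aggressive compression."""
    return ratio_min + A_bar * (ratio_max - ratio_min)

# Far from any event the ratio relaxes to its gentle minimum;
# at a full-strength event boundary it reaches its maximum.
r_idle, r_event = event_controlled_ratio(0.0), event_controlled_ratio(1.0)
```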
Implementation
The invention may be implemented in hardware or software, or a combination of both (e.g., a programmable logic array). The algorithms included as part of the invention are not inherently related to any particular computer or other apparatus, except as otherwise specified. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus (e.g., an integrated circuit) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs running on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or input port, and at least one output device or output port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices in a known manner.
Each such program may be implemented in any desired computer language including machine, assembly, high level procedural, logical, or object oriented programming languages for communication with a computer system. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
According to the present application, the following solutions are provided:
scheme 1. an audio processing method in which a processor receives input channels and generates output channels, the output channels being generated by applying dynamic gain modifications to the input channels, the audio processing method comprising:
detecting a change in signal characteristics in the audio input channel with respect to time,
identifying changes in signal characteristics in the input channel with respect to time as auditory event boundaries, wherein an audio segment between successive boundaries constitutes an auditory event in the channel, an
All or some of the one or more parameters of the audio dynamics gain modification method are generated at least partially in response to the degree of auditory events and/or signal characteristic changes associated with the auditory event boundaries.
Scheme 2. the method of scheme 1 wherein the auditory events are audio segments that tend to be perceived as independent and distinct.
Scheme 3. the method of scheme 1 or 2, wherein the signal characteristic comprises a spectral content of the audio.
Scheme 4. the method of scheme 1 or 2, wherein the signal characteristic comprises a perceived loudness of the audio.
Scheme 5. the method of any of schemes 1-4, wherein all or some of the one or more parameters are generated at least in part in response to the presence or absence of one or more auditory events.
Scheme 6. the method of any of schemes 1-4, wherein the identifying step identifies a change in signal characteristics with respect to time that exceeds a threshold as an auditory event boundary.
Scheme 7. the method of any of schemes 1-4, wherein the auditory event boundaries may be modified by means of a function for generating a control signal used to modify audio dynamics gain modification parameters.
Scheme 8. the method of any of schemes 1-4, wherein all or some of the one or more parameters are generated at least in part in response to a continuous measurement of a degree of change in a signal characteristic associated with the auditory event boundary.
Scheme 9. an apparatus adapted to perform the method of any of schemes 1-8.
Arrangement 10. a computer program, stored on a computer-readable medium, for causing a computer to control the apparatus of arrangement 9.
Scheme 11. a computer program stored on a computer readable medium for causing a computer to perform the method of any of the schemes 1-8.
Scheme 12. a method for partitioning an audio signal into auditory events, wherein each auditory event tends to be perceived as independent and distinct, the method comprising:
calculating a difference in spectral content between successive time blocks of the audio signal, wherein the difference is calculated by comparing a difference in specific loudness between successive time blocks, wherein the specific loudness is a measure of perceived loudness as a function of frequency and time, an
Auditory event boundaries are identified as boundaries between consecutive time blocks when the difference in spectral content between such consecutive time blocks exceeds a threshold.
Scheme 13. the method of scheme 12, wherein the audio signal is represented by a discrete time sequence x[n] sampled from an audio source at a sampling frequency fs, and the difference is calculated by comparing the specific loudness N[b,t] across frequency bands b between successive time blocks t.
Scheme 14. the method of scheme 13, wherein the difference in spectral content between successive time blocks of the audio signal is calculated according to:
where:
scheme 15. the method of scheme 13, wherein the difference in spectral content between successive time blocks of the audio signal is calculated according to:
where:
scheme 16. an apparatus adapted to perform the method of any of schemes 12-15.
Scheme 17. a computer program stored on a computer readable medium for causing a computer to control the apparatus of scheme 16.
Scheme 18. a computer program, stored on a computer-readable medium, for causing a computer to perform the method of any of schemes 12-15.
Numerous embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus may be performed in an order different than the order described.
It should be understood that the implementation of other variations and modifications of the invention and its various aspects will be apparent to those of ordinary skill in the art, and that the invention is not limited by the specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true spirit and scope of the basic underlying principles disclosed and claimed herein.
Incorporation by reference
The entire contents of each of the following patents, patent applications and publications are incorporated herein by reference.
Audio dynamics processing
"Audio Engineer's ReferenceBook", second edition, Limited, Reed Edutation and Professional Publishing Co., Ltd, and companies, Limited, and suppliers and Compressors, Alan Tutton, 2-1492-.
Detecting and utilizing auditory events
U.S. patent application 10/474,387 of Brett Graham Crockett, "High Quality Time-Scaling and Pitch-Scaling of Audio Signals", published as US 2004/0122662 A1 on June 24, 2004.
U.S. patent application 10/478,398 of Brett G. Crockett et al., "Method for Time Aligning Audio Signals Using Characterizations Based on Auditory Events", published as US 2004/0148159 A1 on July 29, 2004.
U.S. patent application 10/478,538 of Brett G. Crockett, "Segmenting Audio Signals into Auditory Events", published as US 2004/0165730 A1 on August 26, 2004. Aspects of the present invention provide a way to detect auditory events in addition to those disclosed in said application of Crockett.
U.S. patent application 10/478,397 of Brett G. Crockett et al., "Comparing Audio Using Characterizations Based on Auditory Events", published as US 2004/0172240 A1 on September 2, 2004.
International application PCT/US05/24630 under the Patent Cooperation Treaty, of Michael John Smithers, entitled "Method for Combining Audio Signals Using Auditory Scene Analysis", filed July 13, 2005 and published as WO 2006/026161 on March 9, 2006.
International application PCT/US2004/016964 under the Patent Cooperation Treaty, of Alan Jeffrey Seefeldt et al., entitled "Method, Apparatus and Computer Program for Calculating and Adjusting the Perceived Loudness of an Audio Signal", filed May 27, 2004 and published as International publication WO 2004/111994 A2 on December 23, 2004.
International application PCT/US2005/038579 under the Patent Cooperation Treaty, of Alan Jeffrey Seefeldt, entitled "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal", filed October 25, 2005 and published as International publication WO 2006/047600.
Brett Crockett and Michael Smithers, "A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis", Audio Engineering Society Convention Paper 6416, 118th Convention, Barcelona, May 28-31, 2005.
Brett Crockett, "High Quality Multichannel Time Scaling and Pitch-Shifting using Auditory Scene Analysis", Audio Engineering Society Convention Paper 5948, New York, October 2003.
Alan Seefeldt et al., "A New Objective Measure of Perceived Loudness", Audio Engineering Society Convention Paper 6236, San Francisco, October 28, 2004.
Handbook for Sound Engineers, The New Audio Cyclopedia, edited by Glen M. Ballou, second edition, "Dynamics", pages 850-851, Focal Press, an imprint of Butterworth-Heinemann, 1998.

Claims (12)

1. A method for modifying parameters of an audio dynamics processor, comprising:
detecting a change in a signal characteristic in the audio signal with respect to time,
identifying as auditory event boundaries a change in the signal characteristic in the audio signal with respect to time that exceeds a threshold, wherein an audio segment between successive boundaries constitutes an auditory event,
generating a parameter modification control signal based on the identified boundary of the auditory event, and
Modifying the parameter of the audio dynamics processor in accordance with the parameter modification control signal.
2. The method of claim 1, wherein auditory events are segments of audio that tend to be perceived as separate and distinct.
3. The method of claim 1, wherein the signal characteristic comprises spectral content of the audio signal.
4. The method of claim 1, wherein the signal characteristic comprises a perceived intensity of the audio signal.
5. The method of claim 1, wherein the signal characteristic comprises a perceived loudness of the audio signal.
6. The method of claim 1, wherein the parameter is generated at least in part in response to the presence or absence of one or more auditory events.
7. An apparatus for modifying parameters of an audio dynamics processor, comprising:
means for detecting a change in a signal characteristic in the audio signal with respect to time,
means for identifying as auditory event boundaries a change in the signal characteristic in the audio signal with respect to time that exceeds a threshold, wherein an audio segment between successive boundaries constitutes an auditory event,
means for generating a parameter modification control signal based on the identified boundary of the auditory event, and
Means for modifying the parameter of the audio dynamics processor in accordance with the parameter modification control signal.
8. The apparatus of claim 7, wherein auditory events are segments of audio that tend to be perceived as separate and distinct.
9. The apparatus of claim 7, wherein the signal characteristic comprises spectral content of the audio signal.
10. The apparatus of claim 7, wherein the signal characteristic comprises a perceived intensity of the audio signal.
11. The apparatus of claim 7, wherein the signal characteristic comprises a perceived loudness of the audio signal.
12. The apparatus of claim 7, wherein the parameter is generated at least in part in response to the presence or absence of one or more auditory events.
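A minimal sketch may help orient the reader in the control flow the claims recite: target gains produced by a dynamics processor are smoothed by a time constant (the "parameter"), and a control signal derived from auditory event boundaries switches that smoothing between slow and fast. The coefficient values, the linear interpolation on event strength, and all names below are illustrative assumptions, not taken from the claims:

```python
def smooth_gain(target_gains, event_strength, fast=0.5, slow=0.05):
    """One-pole smoothing of per-block dynamics-processor gains.

    event_strength is a per-block control signal in [0, 1] derived from
    auditory event boundaries; near a boundary the smoothing coefficient
    moves toward `fast`, so the gain is allowed to change quickly, while
    within an event it stays near `slow`. Coefficients are illustrative.
    """
    g = target_gains[0]
    out = []
    for tgt, ev in zip(target_gains, event_strength):
        ev = min(max(ev, 0.0), 1.0)
        alpha = slow + (fast - slow) * ev
        g += alpha * (tgt - g)  # move faster toward the target near a boundary
        out.append(g)
    return out
```

The design intent this sketches is that audible gain changes are confined to moments where the signal itself changes, so the listener is less likely to perceive the processing as pumping or breathing.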
HK13103397.8A 2006-04-27 2013-03-19 Method for modifying a parameter of an audio dynamics processor and apparatus for performing the method HK1176177B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79580806P 2006-04-27 2006-04-27
US60/795,808 2006-04-27

Publications (2)

Publication Number Publication Date
HK1176177A1 HK1176177A1 (en) 2013-07-19
HK1176177B true HK1176177B (en) 2015-09-18
