HK1173274B - Adaptive dynamic range enhancement of audio recordings - Google Patents

Info

Publication number
HK1173274B
HK1173274B (Application No. HK13100484.8A)
Authority
HK
Hong Kong
Prior art keywords
subband
frame
gain
mth
kth
Prior art date
Application number
HK13100484.8A
Other languages
Chinese (zh)
Other versions
HK1173274A1 (en)
Inventor
M. Walsh
E. Stein
J-M. Jot
Original Assignee
DTS (British Virgin Islands) Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DTS (British Virgin Islands) Limited
Priority claimed from PCT/US2010/052088 external-priority patent/WO2011044521A1/en
Publication of HK1173274A1 publication Critical patent/HK1173274A1/en
Publication of HK1173274B publication Critical patent/HK1173274B/en

Description

Adaptive dynamic range enhancement of audio recordings
Cross Reference to Related Applications
This application claims the priority of the United States provisional patent application entitled ADAPTIVE DYNAMIC RANGE ENHANCEMENT OF AUDIO RECORDINGS, Serial No. 61/250,320, filed by inventor Walsh et al. on October 9, 2009, and the United States provisional patent application entitled ADAPTIVE DYNAMIC RANGE ENHANCEMENT, Serial No. 61/381,860, filed by inventor Walsh et al. on September 10, 2010. United States provisional patent applications Serial Nos. 61/217,562 and 61/381,860 are incorporated herein by reference.
Statement regarding federally sponsored research/development
Not applicable.
Technical Field
The present invention relates generally to audio signal processing and, more particularly, to enhancing audio streams and recordings by restoring or emphasizing their dynamic range.
Background
It has become common practice in the recording industry to record and distribute recordings at ever higher loudness levels, in keeping with the dictum that "louder is better". With the advent of digital media formats such as the CD, music is encoded with a maximum peak level defined by the number of bits available to represent the encoded signal. Once the maximum amplitude of the CD is reached, perceived loudness can be increased further by signal processing techniques such as multi-band dynamic range compression, peak limiting, and equalization. Using such digital mastering tools, a sound engineer can maximize the average signal level by compressing transient peaks (such as drum hits) and raising the gain of the resulting signal. Extreme use of dynamic range compression, however, may introduce clipping and other audible distortion into the recorded waveform. Modern albums mastered with such extreme dynamic range compression therefore gain loudness at the expense of reproduction quality.
The practice of boosting the loudness of a music release to match competing releases has two consequences. Because there is a maximum loudness level available to a recording (as opposed to playback, whose loudness is limited by the playback speakers and amplifiers), raising the overall loudness of a song or track ultimately produces a track that is maximally and uniformly loud from start to finish. The result is music with a small dynamic range (little difference between the loud and quiet parts), an effect widely perceived as fatiguing and as flattening the artist's creative performance.
Another possible effect is distortion, commonly referred to in the digital domain as clipping. Digital media cannot output a signal above digital full scale, so whenever the signal's peaks are pushed past that point the waveform becomes clipped. When this occurs it can produce an audible click. However, some sounds, such as drum hits, reach their peak for only a very short time; if the peak is much louder than the rest of the signal, the click may not be heard. In many cases the peak of a drum hit is clipped without being noticed by the casual listener.
Figs. 1a and 1b provide a visual illustration of harmful mastering techniques. The audio waveforms shown in Figs. 1a and 1b represent an originally mastered track and a version of the same track re-mastered using different techniques. Fig. 1a shows the original recording; the presence of numerous peaks indicates a high dynamic range, reflecting the variety of dynamics present in the original performance. This recording provides a vibrant listening experience, as percussive beats such as drum hits sound forceful and clear. In contrast, the recording shown in Fig. 1b was re-mastered for a louder commercial CD release. Most of the peaks present in the original recording have been compressed or even clipped, and as a result the dynamic range of the recording has been compromised. The increasingly aggressive use of dynamic range compression during the mastering stage of commercial music has drawn strong objections from consumers, producers, and artists.
Approaches discussed by the audio industry to solve this problem have focused on reforming mastering techniques at the source of the problem. An example is described in Bob Katz's Mastering Audio: The Art and the Science, Second Edition. Katz describes how the loudness of a recording can be monitored by calibrating the processed signal, and how more moderate compression parameters can be used without compromising the final result. Most mastering engineers agree, however, that Katz's method is often overridden by the demands of studio management. And even if more conservative mastering techniques were to become the new standard, that would not address the large body of existing recordings already mastered and distributed to end users.
Prior processing techniques for modifying the dynamics of an audio recording are known in the art. One such process is loudness leveling, in which differences between the perceived loudness of audio materials subjected to different degrees of dynamic range compression are normalized to some predetermined level. However, these methods are designed to normalize the average loudness of successive audio tracks played from various sources; they make no attempt to restore the dynamic range of over-compressed content. As a result, when played back at a lower normalized listening level, compressed media may sound even less dynamic than it otherwise would.
Another known technique is to apply an upward expander, as described in U.S. Patent No. 3,978,423 to Bench, entitled "Dynamic Expander". The upward expander applies a time-varying gain to the audio signal according to a fixed "expansion curve", whereby the output signal level exceeds the input level above a selected threshold. As a result, the amplitude of the louder portions of the source signal is increased. However, this can leave an already-dynamic soundtrack with over-emphasized transients in the output signal.
Another known technique is dynamic spectral equalization, which boosts the lower and upper frequency bands when transients are detected, producing a more dynamic output. Dynamic spectral equalization is described in the following documents: X. Rodet, F. Jaillet, "Detection and Modeling of Fast Attack Transients" (2001), Proceedings of the International Computer Music Conference; U.S. Patent No. 7,353,169 to Goodwin et al., entitled "Transient Detection and Modification in Audio Signals"; and U.S. Patent Application No. 11/744,465 to Avendano et al., entitled "Method for Enhancing Audio Signals". Unlike previous approaches, these dynamic enhancement techniques affect only signal transients. However, they affect all signal transients, even those that already exhibit high dynamics. Dynamic spectral equalization generally processes all audio signal content, whether or not processing is needed, and for certain types of audio content this can produce an over-processed, excessively dynamic output.
U.S. Patent No. 6,453,282 to Hilpert et al. summarizes methods of transient detection in the discrete-time audio domain. Such time-domain approaches are unreliable when analyzing heavily dynamic-range-compressed material, since the energy variations due to transients become insignificant when the signal is viewed as a whole. This leads to misclassification of transient signals and to false positives.
In view of the ever-increasing interest in improving the presentation of audio recordings, there is a need in the art for improved audio processing.
Disclosure of Invention
According to the present invention, a method and apparatus for conditioning an audio signal are provided. The present invention enhances the dynamic range of audio signals, in particular audio signals that have been subjected to harmful mastering techniques.
According to one aspect of the invention, a method for conditioning an audio signal is provided, the method comprising the steps of: receiving at least one audio signal, each audio signal having at least one channel, each channel being divided into a time sequence of frames; calculating at least one measure of the dynamic excursion of the audio signal over a plurality of consecutive time segments; filtering the audio signal into a plurality of subbands, each frame being represented by at least one subband; deriving a dynamic gain factor from the measures for the consecutive time segments; analyzing at least one subband of a frame to determine whether a transient exists in the frame; and applying the dynamic gain factor to each frame having a transient.
The measure of dynamic excursion may be represented by a crest factor for the time segment. The crest factor for each successive time segment may be calculated as the ratio of a function of the peak signal magnitude to a function of the average signal magnitude of the audio signal within the frame. The method may further comprise the step of calculating a subband relative energy function for at least one subband.
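As a rough illustration (not the patent's actual implementation), a per-frame crest factor can be computed as the ratio of peak to mean absolute amplitude. The function name and test signals below are assumptions for demonstration only:

```python
import numpy as np

def crest_factor(frame: np.ndarray) -> float:
    """Crest factor of one frame: peak magnitude over mean magnitude.

    A high value suggests preserved transients; a value near 1 suggests
    heavy dynamic range compression or limiting.
    """
    peak = np.max(np.abs(frame))
    avg = np.mean(np.abs(frame))
    if avg == 0.0:
        return 1.0  # silent frame: treat as fully "flat"
    return peak / avg

# A click on top of a quiet floor has a high crest factor...
dynamic = np.zeros(1024); dynamic[100] = 1.0; dynamic += 0.01
# ...while a constant-amplitude (heavily limited) frame is near 1.
flat = np.full(1024, 0.5)

assert crest_factor(dynamic) > 10.0
assert abs(crest_factor(flat) - 1.0) < 1e-9
```

Because the measure is a ratio, it is unaffected by an overall gain applied to the frame, which matches the level-independence property described later in the detailed description.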
The overall subband transient energy for each frame may be calculated by comparing the subband transient energy in each subband of the frame, or a portion of the frame, to a relative energy threshold and summing the number of subbands that pass the threshold. A transient may be deemed present in a frame when the number of subbands passing the relative energy threshold in the analysis of the frame is greater than a predetermined fraction of the total subbands, for example greater than one quarter of the total subbands.
The method continues by calculating a dynamic gain weighting factor based on the number of subbands that pass the threshold relative to the total number of subbands in the analysis. The dynamic gain factor for each frame is weighted by this weighting factor. If no transient is detected for a frame, the previous dynamic gain for the frame may be reduced toward a value of 1 using an exponential decay curve. Before the final dynamic gain is applied to the input signal, a check for tonal audio may be made to avoid audible modulation of strong tones present in the input signal. If a strong tone is detected within a subband, no additional gain is applied to that subband for that frame period, and the subband's dynamic gain continues to decay from the dynamic gain value of the previous frame.
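The subband vote, gain weighting, and exponential decay described above might be sketched as follows. The threshold, boost range, and decay constant are illustrative assumptions, not values from the patent:

```python
import numpy as np

def frame_gain(subband_energy, avg_energy, prev_gain,
               rel_threshold=2.0, frac_needed=0.25,
               max_boost=2.0, decay=0.9):
    """One frame of the transient vote described above (illustrative only).

    A subband "votes" for a transient when its instantaneous energy exceeds
    rel_threshold times its running-average energy.  A transient is declared
    when more than frac_needed of the subbands vote, and the gain is weighted
    by the vote fraction; otherwise the previous gain decays toward 1.
    """
    votes = subband_energy > rel_threshold * avg_energy
    frac = np.mean(votes)
    if frac > frac_needed:                    # transient present in this frame
        weight = frac                         # gain weighted by vote fraction
        return 1.0 + (max_boost - 1.0) * weight
    # no transient: decay the previous gain toward 1 exponentially
    return max(1.0, 1.0 + (prev_gain - 1.0) * decay)

avg = np.ones(64)
quiet = np.ones(64)                  # no subband passes the threshold
burst = np.ones(64); burst[:32] = 5  # half of the subbands pass it

g_transient = frame_gain(burst, avg, prev_gain=1.0)
g_decay = frame_gain(quiet, avg, prev_gain=g_transient)

assert g_transient > 1.0
assert 1.0 <= g_decay < g_transient
```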
According to another aspect of the present invention, an audio signal processing apparatus is provided. The audio signal processing apparatus comprises: receiving means for receiving at least one audio signal, each audio signal having at least one channel, each channel being divided into a time sequence of frames; a calculation component for calculating at least one measure of dynamic excursion of the audio signal over a plurality of consecutive time segments; a filtering component for filtering the audio signal into a plurality of subbands, each frame being represented by at least one subband; and deriving means for deriving a dynamic gain from the measure of dynamic excursion, analyzing at least one subband of a frame to determine whether a transient is present within the frame, and applying the dynamic gain to each frame having a transient.
Drawings
These and other features and advantages of the various embodiments disclosed herein will be better understood with reference to the following description and drawings, in which like numbers refer to like parts throughout, and in which:
FIG. 1a is an illustration of the waveform of an original audio recording;
FIG. 1b is an illustration of the waveform of a re-mastered audio recording with an over-compressed dynamic range;
FIG. 2 is a schematic diagram of a listening environment using adaptive dynamics enhancement for playback on multi-channel speakers or headphones according to an embodiment of the present invention;
FIG. 3 is a flow diagram illustrating an optional loudness leveling processing block preceding an adaptive dynamic enhancement processor, according to an embodiment of the present invention;
FIG. 4 is a flowchart showing steps taken in an adaptive dynamics enhancement process for detecting transients and applying gain accordingly, in accordance with one embodiment of the present invention;
FIG. 5 is a flowchart showing steps taken in an adaptive dynamics enhancement process of detecting transients, evaluating transients against a known threshold, and applying an adaptive EQ curve accordingly, in accordance with one embodiment of the present invention.
Detailed Description
The detailed description set forth below in connection with the appended drawings is intended as a description of the presently preferred embodiments of the invention and is not intended to represent the only forms in which the present invention may be constructed or utilized. The description sets forth the functions and the sequence of steps for developing and operating the invention in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and sequences may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention. It is further understood that relational terms such as first and second are used solely to distinguish one entity from another, without necessarily requiring or implying any actual such relationship or order between those entities.
The object of the present invention is to address the problems caused by harmful mastering techniques, in which audio recordings are mastered as loud as possible using aggressively applied dynamic range compression algorithms. The dynamic excursions of transients in these recordings are much lower than they should be. This produces a perception of weak, dull, or lifeless reproduction when the recordings are listened to at moderate levels.
The present invention analyzes the dynamics of audio recordings and enhances transients that show signs of harmful mastering practices. The invention uses intelligent, adaptive processing derived by analyzing the loudness and dynamic properties of the source audio signal. Modification of the original audio signal is avoided unless necessary. The default amount of additional dynamic processing may also be adjusted by the user, so that the dynamics of any recording may be exaggerated for a sharper or "punchier" sound, or reduced for a subtler enhancement. The present invention can be used to enhance the transient dynamics of any music, movie, or game soundtrack from any media source, and in any listening environment.
Referring now to fig. 2, a schematic diagram is provided that illustrates an implementation of various embodiments. Fig. 2 illustrates an audio listening environment for playback of a dynamically enhanced audio recording on a speaker or headset. The audio listening environment comprises at least one consumer electronic device 10 such as a DVD or BD player, a TV tuner, a CD player, a handheld player, an internet audio/video device, or a game console. The consumer electronic device 10 provides a source audio recording that is dynamically enhanced to compensate for any harmful mastering techniques.
In this embodiment, the consumer electronic device 10 is connected to an audio reproduction system 12. Audio reproduction system 12 processes the audio recording through adaptive dynamic enhancement processing (ADE) that dynamically enhances the audio recording. In an alternative embodiment, the stand alone consumer electronic device 10 may enhance the audio recording through ADE processing.
Audio reproduction system 12 contains a Central Processing Unit (CPU) such as an IBM PowerPC, Intel Pentium (x86) processor, or the like, which may represent one or more conventional types of such processors. Random Access Memory (RAM) temporarily stores the results of data processing operations performed by the CPU and is typically interconnected with the CPU by dedicated memory channels. The audio reproduction system 12 may also contain a permanent storage device, such as a hard disk drive, that also communicates with the CPU over the I/O bus. Other types of storage devices, such as tape drives and optical drives, may also be connected. A graphics card is also connected to the CPU through a video bus and transmits signals representing display data to the display monitor. Peripheral data input devices such as a keyboard or mouse may be connected to the audio reproduction system over a USB port. A USB controller translates data and instructions to and from the CPU for the peripheral devices connected to the USB port. Additional devices such as a printer, microphone, and speakers may be connected to audio reproduction system 12.
The audio reproduction system 12 may utilize an operating system having a Graphical User Interface (GUI), such as WINDOWS from Microsoft Corporation of Redmond, Washington; MAC OS from Apple Inc. of Cupertino, California; or one of various UNIX versions with the X-WINDOWS system. Audio reproduction system 12 executes one or more computer programs. Generally, the operating system and computer programs are tangibly embodied in a computer-readable medium, e.g., one or more fixed and/or removable data storage devices, including a hard disk drive. Both the operating system and the computer programs may be loaded from such a data storage device into RAM for execution by the CPU. The computer programs may contain instructions that, when read and executed by the CPU, cause it to perform the steps or features of the present invention.
The above audio reproduction system 12 represents only one exemplary device suitable for implementing aspects of the present invention. Audio reproduction system 12 may have many different configurations and structures. Any such configuration or construction may be readily substituted without departing from the scope of the present invention. Those skilled in the art will recognize that the above-described sequence is most commonly used in computer-readable media, however, there are other existing sequences that can be substituted without departing from the scope of the present invention.
Elements of one embodiment of ADE processing may be implemented by hardware, firmware, software, or any combination thereof. When implemented in hardware, the ADE processing may be used on one audio signal processor or distributed among various processing components. When implemented as software, the elements of an embodiment of the present invention are essentially the code segments to perform the necessary tasks. The software preferably comprises actual code or code that emulates or simulates the operations described in one embodiment of the invention. The program or code segments can be stored in a processor or machine accessible medium or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium. A "processor-readable or accessible medium" or a "machine-readable or accessible medium" may include any medium that can store, communicate, or transport information. Examples of a processor-readable medium include electronic circuits, semiconductor memory devices, Read Only Memory (ROM), flash memory, Erasable ROM (EROM), floppy disks, Compact Disk (CD) ROM, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the internet, intranet, etc. The machine-accessible medium may be embodied in an article of manufacture. The machine-accessible medium may contain data that, when accessed by a machine, cause the machine to perform the operations described below. The term "data" herein refers to any type of information encoded for machine-readable purposes. Thus, it may comprise programs, code, data, files, and the like.
All or a portion of embodiments of the present invention may be implemented by software. The software may have several modules coupled to each other. A software module is coupled to another module to receive variables, parameters, arguments, pointers, etc. and/or to generate or pass results, updated variables, pointers, etc. The software modules may also be software drivers or interfaces that interact with the operating system running on the platform. The software modules may also be hardware drivers that configure, set up, initialize, send, and receive data to and from the hardware devices.
One embodiment of the invention may be described as a process, which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a block diagram may describe the operations as a sequential process, multiple operations may be performed in parallel or concurrently. Additionally, the order of the operations may be rearranged. A process terminates when its operations are completed, and may correspond to a method, a program, a procedure, and so on. Fig. 2 is a schematic diagram illustrating the audio reproduction system 12 for reproduction on headphones 14 or speakers 16. Audio reproduction system 12 may receive digital or analog audio source signals from various audio or audio/video sources 10. The audio source signal may be a single-channel signal, a two-channel signal (such as a music track or a TV broadcast), or a multi-channel signal (such as a movie soundtrack). The audio signal may be any perceived or unperceived sound, such as a real-world sound or an engineered sound.
Audio reproduction system 12 may include an analog-to-digital converter or a digital audio input interface for connecting an analog audio source. It may contain a digital signal processor for processing the audio signal and a digital-to-analog converter and signal amplifier for converting the processed output signal into an electrical signal that is sent to the transducer (headset 14 or speaker 16). The audio reproduction system 12 may be a home cinema receiver or a car sound system dedicated to the selection, processing and routing of audio and/or video signals. Alternatively, one or more of the audio reproduction system 12 and the audio signal source may be incorporated together into a consumer electronic device 10 such as a portable media player, television, or laptop computer. The speaker 16 may also be incorporated into the same appliance, such as in the case of a television or laptop computer.
Fig. 3 is a high-level flow diagram illustrating an ADE processing environment. The flow chart begins at step 300 by receiving an input signal. The input signal is a digital audio signal. In the present embodiment, the input signal is processed by a loudness leveling algorithm in step 310, whereby the gain of the incoming input signal is adapted over time so that it has a substantially constant average loudness level (say, -20 dB relative to digital full scale). The loudness leveling algorithm is an optional feature and is not required for implementing ADE processing. Subsequently, in step 320, if an upstream gain normalization algorithm is present, the ADE processing may factor the reference gain level into the available headroom needed to extend the gain of the incoming signal without causing audible artifacts from signal waveform clipping. This communication is represented by the dashed arrow. The ADE headroom requirement may also factor in the input mastering gain and the gain of the input signal content. The amount of dynamic enhancement applied can be scaled using the user parameter described below as DYNAMICS ENHANCEMENT LEVEL. An output limiter is used to ensure that output saturation does not occur as a result of applying the required dynamic EQ to the input signal.
Referring now to fig. 4, a flow diagram depicting one embodiment of ADE processing is shown. ADE processing begins in step 400 by receiving an input signal representing an audio recording. The input signal is a digital audio signal of at least one channel. The input signal represents a tangible physical phenomenon, in particular sound, that has been converted into an electronic signal, converted into a digital format by analog-to-digital conversion, and suitably preprocessed. Generally, analog filtering, digital filtering, and other preprocessing are applied to minimize aliasing, saturation, or other signal processing errors downstream, as is known in the art. The audio signal may be represented by a conventional linear method such as PCM encoding. In step 410, the input signal is filtered through a multi-tap, multi-band analysis filter bank, which may suitably be a complementary quadrature mirror filter bank. Alternatively, a Pseudo Quadrature Mirror Filter (PQMF) bank, such as a polyphase filter bank, may be used. The filter bank produces a plurality of subband signal outputs. In this embodiment, 64 such subband outputs are used. However, one skilled in the art will readily recognize that the input signal may be filtered into any number of subbands. As part of the filtering function, the filter bank should preferably also critically decimate the subband signals in each subband, in particular down to the lowest number of samples per second just sufficient to fully represent the signal in each subband ("critical sampling"). The subband sampling may also mimic the physiology of human hearing.
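To make the frame/subband layout concrete, a critically sampled DFT over non-overlapping blocks can serve as a simple stand-in for the 64-band analysis bank; this is an illustrative assumption, not the patent's QMF/PQMF polyphase bank, which would use longer overlapping prototype filters:

```python
import numpy as np

def subband_frames(x, n_bands=64):
    """Split a mono signal into frames of 64 complex subband samples.

    Illustrative stand-in for a 64-band analysis filter bank: a critically
    sampled DFT over non-overlapping blocks of 64 samples.  Each row is one
    analysis frame m; each column is one subband k.
    """
    n_frames = len(x) // n_bands
    blocks = x[: n_frames * n_bands].reshape(n_frames, n_bands)
    return np.fft.fft(blocks, axis=1)  # shape: (frame m, subband k)

# 4096 input samples of a 1 kHz tone at 48 kHz yield 64 frames of 64 subbands.
x = np.sin(2 * np.pi * 1000 * np.arange(4096) / 48000)
G = subband_frames(x)

assert G.shape == (64, 64)
# Critical sampling: total subband sample count equals input sample count.
assert G.size == 4096
```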
After filtering, the subbands are analyzed for transient detection in step 420. It is contemplated that not all subbands will be used for transient analysis, since some frequencies are known to have a low probability of containing transients. In this embodiment, transients are detected using a transient detection algorithm that calculates a weighted sum of energy over the frequency bands. Since signal energy usually concentrates at low frequencies, additional weights are used to emphasize the signal energy in the region of the spectrum where transients are most significant. This reduces the likelihood of "false positives" in the transient identification process:
TEHF(m,c) = Σk w(k)·|G(k,m,c)| (1)
where TEHF(m,c) is the instantaneous high-frequency-weighted transient energy, k is the subband index, m is the analysis frame index, c is the channel index, w(k) is the kth frequency weighting filter coefficient, and |G(k,m,c)| is the gain magnitude of the kth subband of the mth analysis frame of the cth channel. It will be appreciated by those skilled in the art that various transient detection algorithms may be applied in accordance with the present invention; the above is provided as an example and should not be construed as limiting the scope of the invention.
The instantaneous transient energy is compared to a time average of previous transient energies. A transient event is indicated when the instantaneous transient energy is much greater than the average transient energy. The average transient energy TEav may be calculated by applying a leaky integrator filter in each frequency band:
TEav(m,c) = (1 - αTE)·TEav(m-1,c) + αTE·TEHF(m,c) (2)
where αTE is the transient energy damping factor, m is the frame index, and c is the channel index.
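Equations (1) and (2) and the onset comparison can be sketched as follows. The linear high-frequency weighting w(k) and the numeric constants are illustrative assumptions:

```python
import numpy as np

def transient_energy(G_frame, w):
    """Equation (1): high-frequency-weighted transient energy of one frame."""
    return np.sum(w * np.abs(G_frame))

def update_average(te_av_prev, te_hf, alpha=0.05):
    """Equation (2): leaky-integrator average of the transient energy."""
    return (1 - alpha) * te_av_prev + alpha * te_hf

n_bands = 64
w = np.linspace(0.0, 1.0, n_bands)   # emphasize high subbands (assumed shape)
g_trans = 2.5                        # onset threshold from the text (2 to 3)

te_av = 1.0
quiet = np.full(n_bands, 0.02)
loud_hf = np.full(n_bands, 0.02); loud_hf[48:] = 2.0

te_q = transient_energy(quiet, w)
te_l = transient_energy(loud_hf, w)

assert te_l > g_trans * te_av        # high-frequency burst triggers an onset
assert te_q < g_trans * te_av        # steady quiet frame does not
assert update_average(te_av, te_l) > te_av
```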
The onset of a transient is triggered if
TEHF(m,c) > GTRANS·TEav(m,c) (3)
where GTRANS is a predetermined transient threshold. Values of GTRANS between 2 and 3 generally give good results; however, the threshold may vary depending on the source material. Subsequently, in step 440, a multiband crest factor value CF(k,m,c) is calculated by taking the ratio of the peak signal level to a time average of previous signal levels in each of the 64 analysis bands.
The peak and average signal levels are derived using leaky integrators with different attack and release time constants. An alternative method of calculating the average signal level is to average over several past "frames" of the frequency subbands stored in system memory. The peak and average gain calculations in this embodiment use leaky integrator filters:
Gpeak(k,m,c) = (1 - αpeak_av)·Gpeak(k,m-1,c) + αpeak_av·G(k,m,c), if G(k,m,c) > Gpeak(k,m-1,c) (4)
Gpeak(k,m,c) = (1 - αpeak_rel)·Gpeak(k,m-1,c) + αpeak_rel·G(k,m,c), if G(k,m,c) ≤ Gpeak(k,m-1,c) (5)
Gav(k,m,c) = (1 - αav)·Gav(k,m-1,c) + αav·G(k,m,c) (6)
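Equations (4)-(6) amount to a fast-attack/slow-release peak tracker and a single slow average tracker per subband. A one-band sketch with assumed damping constants:

```python
import numpy as np

def track_gains(gains, a_peak_att=0.5, a_peak_rel=0.05, a_av=0.05):
    """Equations (4)-(6) for one subband over a sequence of frames.

    gains holds |G(k, m, c)| over frames m.  The peak tracker attacks fast
    (a_peak_att) and releases slowly (a_peak_rel); the average tracker is a
    single slow leaky integrator (a_av).
    """
    g_peak = g_av = gains[0]
    peaks, avgs = [g_peak], [g_av]
    for x in gains[1:]:
        a = a_peak_att if x > g_peak else a_peak_rel
        g_peak = (1 - a) * g_peak + a * x    # equations (4)/(5)
        g_av = (1 - a_av) * g_av + a_av * x  # equation (6)
        peaks.append(g_peak)
        avgs.append(g_av)
    return np.array(peaks), np.array(avgs)

# A short burst over a low floor: just after the burst, the peak tracker
# sits well above the average tracker, so CF = Gpeak / Gav is high.
g = np.array([0.1] * 50 + [1.0] * 5 + [0.1] * 50)
peaks, avgs = track_gains(g)
cf = peaks / avgs  # per-frame crest factor in this band

assert cf[54] > 2.0               # end of the burst: high crest factor
assert abs(cf[0] - 1.0) < 1e-9    # steady floor: crest factor near 1
```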
The derived crest factor is based on a ratio of gains and is therefore independent of the level of the input signal; the result is the same regardless of the system's mastering gain or the recording level of the original material. Referring to equation (3), a significant transient such as a percussive strike should exhibit a higher crest factor than a more steady-state or tonal signal. If the signal contains a transient onset but exhibits a low crest factor value, that is a strong indication of post-recording dynamic range compression or limiting in that frequency band. In this case, the original signal should benefit from a short-duration gain boost, on the order of the onset and decay times of the detected transient, to produce the desired crest factor value.
As a result, the ADE process evaluates the crest factor whenever a transient onset is detected. In step 460, the crest factor is evaluated, and if it is lower than the target crest factor threshold (determined by a combination of algorithm tuning and/or user preferences), the gain in that subband is increased so that the desired crest factor value is obtained. The gain may be limited to remain within a specified or dynamically evaluated headroom budget:
if it is notAnd TEHF(m,c)>GTRANSTEav(m,c)
(7)
Wherein G iseq(k, m, c) represents the applied gain function, Geq_maxRepresenting the maximum allowable gain (typically corresponding to the allocated algorithm headroom), αattackIs the gain onset damping function (gain onset damping function) if found to be derived fromA false signal of a rapid gain change, the gain start damping function can be tuned to a value close to 1. The value of the damping function may be frequency dependent to allow the gain ramp to occur at different rates for different frequency ranges. CF (compact flash)TargetRepresents the target crest factor value, and CF (k, m, c) represents the crest factor values measured at frequency k and frame m and channel c.
If no transient onset is detected or if the crest factor is equal to or greater than the target crest factor value, then the applied dynamic EQ gain is backed off to a value of 1 by using an envelope that mimics the dynamics of a typical transient attack. The rate of gain reduction is weighted such that the higher frequency gain decreases faster than the lower frequency gain:
Geq(k,m,c)=max(1,αdecay(k,m)Geq(k,m-1,c)) (8)
wherein αdecay(k,m) represents a frequency-dependent damping factor. In the present embodiment, αdecay(k,m) is represented by a 64-point function that decreases exponentially with frequency from high to low values, bounded by 1 and 0.
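A minimal sketch of this frequency-weighted gain release (in the style of equation (8)) follows; the exponential interpolation of the damping factor across bands and the numeric values are assumptions, not the patent's actual 64-point function:

```python
def release_gains(gains, d_low=0.99, d_high=0.90):
    """Hypothetical frequency-dependent release of EQ gains toward unity.

    Each band's gain is multiplied by a damping factor that shrinks with
    band index, so higher-frequency boosts decay faster, and the result is
    floored at 1 so the gain never dips below unity.
    """
    n = len(gains)
    out = []
    for k, g in enumerate(gains):
        t = k / max(1, n - 1)                  # 0 at lowest band, 1 at highest
        alpha = d_low * (d_high / d_low) ** t  # exponential interpolation
        out.append(max(1.0, alpha * g))
    return out
```

On a frame with a uniform 2x boost across 64 bands, the lowest band releases to 1.98 while the highest releases to 1.8, reproducing the faster high-frequency decay described above.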
In step 480, a user parameter, denoted "Dynamics Enhancement Level" (DEL), scales the target crest factor by a value between 0.0 and 1.0. A DEL value of 0.0 means that the crest factor threshold will always be met, and therefore no enhancement will be applied to the original signal. A DEL value of 0.5 represents the default analysis threshold and corresponds to a "reasonable" crest factor expectation. With this value, compressed signals are enhanced, while signals with sufficient dynamics receive little or no dynamic enhancement. A DEL value of 1.0 sets an expectation exceeding the "reasonable" crest factor, such that most transients are enhanced whether or not enhancement is needed.
The output is derived by multiplying the subband input signal components by a time-varying EQ curve derived from the enhancement gains. These gains are smoothed in frequency to avoid artifacts. The EQ curve is applied to the original complex input signal data, and the resulting complex band coefficients are then recombined and transformed into a block of time-domain output samples using a 64-band synthesis bank or equivalent frequency-to-time-domain filter. Finally, the time-domain output of the synthesis filter bank is passed through a soft limiter (or equivalent) to counteract any incidental level overshoot that may be caused by an increase in signal level beyond the available headroom.
This input/output process is repeated for each analysis frame. The gains of the EQ curve change dynamically according to the analysis of each frame. In the embodiment described above, the derived gain curve is applied to the original signal by multiplication in the frequency domain, followed by output synthesis complementary to the input analysis block. In other embodiments, the analysis and synthesis methods may differ. For example, as described above, the analysis may be performed in the frequency domain, and once a desired gain curve has been calculated, a filter representing the desired frequency response may be implemented in the time domain using FIR and/or IIR filters. The coefficients of the time-domain filter may be changed according to the analysis of each input data frame. Alternatively, the analysis for crest factor and transient onset detection may occur entirely in the time domain.
The analysis and synthesis described above use evenly spaced frequency bands. However, the analysis is preferably performed on logarithmically spaced frequency bands, which better match the psychoacoustics of human hearing.
Referring now to fig. 5, a flow chart illustrating a preferred embodiment of the ADE processing is presented. The flowchart begins in step 500 by converting an input signal into a complex frequency-domain representation using a 64-band oversampled polyphase analysis filter bank. Other types of filter banks may be used, as may a different number of bands. In the implementation described herein, the analysis filter bank extracts a block of 64 frequency-domain samples for each block of 64 time-domain input samples to form the subband audio signals.
In step 510, to evaluate the amount of dynamics present in the input signal, a frequency-independent per-frame crest factor is derived for each channel:
CF(m,c)=Hsum_pk(m,c)/Hsum_av(m,c)
wherein Hsum(m,c) is defined as the sum of the K subband magnitudes of the mth frame of the cth channel of input data:
Hsum(m,c)=∑H(k,m,c)
The peak sum function is defined as:
Hsum_pk(m,c)=Hsum(m,c), if Hsum(m,c)>Hsum_pk(m-1,c)
otherwise,
Hsum_pk(m,c)=(1-αpk_rel)Hsum_pk(m-1,c)+αpk_relHsum(m,c)
The average sum function is defined by a leaky integrator function:
Hsum_av(m,c)=(1-αavg)Hsum_av(m-1,c)+αavgHsum(m,c)
wherein αpk_rel represents the peak release coefficient, and αavg represents the average smoothing factor.
The crest factor per frame is defined as the ratio of the peak signal magnitude to the average signal magnitude:
CF(m,c)=Hsum_pk(m,c)/Hsum_av(m,c)
where CF(m,c) represents the crest factor of the mth frame of the cth channel of input data. It is also contemplated that the crest factor may be computed from a sum of energies:
Hsum(m,c)=∑|H(k,m,c)|2
The crest factor per frame represents the amount of dynamic range present in the input signal. When a transient is detected, the crest factor should be equal to or greater than some desired target value. If the per-frame crest factor is too low in the presence of a transient, a short-term gain is applied to the input signal frame to increase the measured crest factor to a more desirable value, where short-term means on the order of the onset and decay times of the detected transient.
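The per-frame crest factor machinery above (the peak tracker, the leaky-integrator average, and their ratio) can be sketched as follows; the function name and the smoothing coefficient values are illustrative assumptions:

```python
def frame_crest_factors(frames, alpha_pk_rel=0.1, alpha_avg=0.05):
    """Hypothetical per-frame crest factor from subband magnitudes.

    frames: list of per-frame subband magnitude lists |H(k, m)| for one
    channel. Returns CF(m) = H_sum_pk(m) / H_sum_av(m) per frame, using an
    instant-attack / smoothed-release peak tracker and a leaky-integrator
    average, as described above.
    """
    h_pk = 0.0
    h_av = 1e-12   # avoid division by zero before the average warms up
    cfs = []
    for mags in frames:
        h_sum = sum(mags)
        if h_sum > h_pk:
            h_pk = h_sum                                   # instant attack
        else:
            h_pk = (1 - alpha_pk_rel) * h_pk + alpha_pk_rel * h_sum
        h_av = (1 - alpha_avg) * h_av + alpha_avg * h_sum  # leaky average
        cfs.append(h_pk / h_av)
    return cfs
```

On steady material the ratio settles near 1; a sudden magnitude spike drives the peak tracker up immediately while the average lags, producing a momentarily high crest factor.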
In step 520, a per-frame dynamic gain GDYN(m,c) is derived by taking the ratio of a predetermined target crest factor CFT to the measured crest factor CF(m,c); this ratio represents the amount of gain required to obtain the desired dynamic bias level:
GDYN(m,c)=CFT/CF(m,c)
CFT is assumed to represent a reasonable crest factor for dynamic material, for example 14 dB. The specified target crest factor can also be modified by a user-controllable gain called the Dynamic Enhancement Level (DEL), thereby indirectly affecting the amount of enhancement applied.
If the target crest factor is less than the measured crest factor, GDYN(m,c) will be less than 1. If such gain values were allowed, they would reduce the level of transient events in the input. However, in this embodiment, GDYN(m,c) is limited to values of 1 or greater.
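In sketch form (names assumed), the bounded per-frame dynamic gain is simply:

```python
def dynamic_gain(cf_measured, cf_target):
    """G_DYN(m, c) = CF_T / CF(m, c), floored at 1 so the processor only
    ever boosts under-dynamic frames and never attenuates transients."""
    return max(1.0, cf_target / cf_measured)
```

A frame measuring half the target crest factor receives a 2x candidate gain; a frame already exceeding the target receives unity gain.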
At this stage, GDYN(m,c) is not yet applied to the input signal. Instead, it is applied only if two further conditions are satisfied:
1. a transient has been detected in the current frame; and
2. the subband to which the gain is applied does not have any strong tonal content.
In step 540, transients in the current frame are detected. The subband signals are analyzed for transients using a transient detection algorithm that computes a per-subband relative energy function. When a large energy increase occurs within a subband, the value of that subband's function increases sharply. A simultaneous increase across a larger number of subbands indicates a higher probability that a transient has occurred within the given frame.
The relative energy function can be defined as:
RE(k,m,c)=Einst(k,m,c)/Eav(k,m,c)
wherein Einst(k,m,c) represents the instantaneous energy measured on the kth subband of the mth frame of the cth channel, and Eav(k,m,c) represents the average energy measured on the kth subband of the mth frame of the cth channel. The per-subband average energy is based on a leaky integrator function:
Eav(k,m,c)=(1-αav)Eav(k,m-1,c)+αavEinst(k,m,c)
For each subband, the current value of the relative energy function is compared with a relative energy threshold RETRESH. If the threshold is exceeded in a subband, that subband is marked as exhibiting an energy increase indicative of a transient. The overall per-frame transient energy function is then calculated by summing the number of subbands that exceed the relative energy threshold.
TE(m,c)=∑(RE(k,m,c)>RETRESH)
where TE(m,c) is an integer value between 0 and K, and K represents the total number of subbands analyzed. Note that K may be less than the total number of bands in a frame; for example, it may be desirable to focus the transient detection on subbands where significant energy has been detected.
A significant proportion of subbands exceeding the relative energy threshold represents a broadband increase in energy indicative of a transient. However, it is difficult to correlate an exact number of positive subbands with a definitive transient. In some cases, the average signal level may be so high that the relative energy function remains low in many frequency bands. Although the required number of positive subbands could be reduced to account for this, doing so may lead to "false positive" transient detections. The per-frame transient energy function is therefore used to derive an estimate of the likelihood of a transient, and a series of gain weighting functions proportional to the number of subbands exceeding RETRESH is calculated. For example,
if TE(m,c)>K/2, then WT(m,c)=1;
if TE(m,c)>K/3, then WT(m,c)=0.75;
if TE(m,c)>K/4, then WT(m,c)=0.5;
otherwise, WT(m,c)=0;
where K represents the total number of subbands in the analysis.
Other values may be used for the positive subband thresholds and associated weighting gains. In step 550, any value of WT(m,c)>0 on any input channel is taken to represent the onset of a transient. The dynamic gain is then modified by the weighting factor:
GDYN_MOD(m,c)=max(1,GDYN(m,c)*WT(m,c))
a boundary check is applied to ensure that no gain less than 1 is applied. The gain may then be applied to all subbands of the current data frame. However, this may be undesirable in sub-bands with significant tonal-like components, since sudden increases in gain in these bands may result in audible signal modulation. To avoid this, each sub-band is analyzed for the presence of strong tones. By their nature, tonal-like components have a relatively low peak-to-average ratio (or subband crest factor). Therefore, there is no additional gain applied to subbands having a measured crest factor below the so-called tonality threshold, and they continue to decay based on their original decay trajectory.
In step 530, a per-subband crest factor value is calculated by taking the ratio of the peak gain level to the time-averaged gain in each analysis band:
CF(k,m,c)=Gpeak(k,m,c)/Gav(k,m,c)
Both peak and average filters are implemented using leaky integrators.
If G (k, m, c) > Gpeak(k, m-1, c), then Gpeak(k,m,c)=G(k,m,c)
wherein G(k,m,c) represents the magnitude of the kth subband of the mth frame of the cth channel.
Otherwise,
Gpeak(k,m,c)=(1-βpeak_rel)Gpeak(k,m-1,c)+βpeak_relG(k,m,c)
Gav(k,m,c)=(1-βav)Gav(k,m-1,c)+βavG(k,m,c)
wherein βpeak_rel represents the per-subband peak release function, and βav represents the average smoothing function.
In frames for which a transient onset is detected, the crest factor of each subband is compared with a predetermined threshold γTONE, which determines whether a tonal component is present in that subband. If the subband crest factor is below the threshold, a tonal component is assumed to be present and no gain is applied to that subband for the frame. Various measures of tonality may be used, such as the tonality coefficient described in J. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE J. Sel. Areas in Comm., vol. 6, no. 2, pp. 314-323, Feb. 1988. The final per-subband dynamic gain, denoted EQDYN(k,m,c), is updated instantaneously to the following value:
if CF(k,m,c)>γTONE, then EQDYN(k,m,c)=GDYN_MOD(m,c)
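A sketch of this tonality gating step follows; the function name and the γTONE value are assumptions, and the previous (decaying) gains are passed in explicitly:

```python
def gate_by_tonality(g_dyn_mod, subband_crest, prev_eq, gamma_tone=1.5):
    """Hypothetical tonality gate: boost only non-tonal subbands.

    Tonal content has a low subband crest factor, so bands whose crest
    factor is at or below gamma_tone keep their previous (decaying) gain
    rather than receiving a sudden boost that would audibly modulate them.
    """
    return [g_dyn_mod if cf > gamma_tone else prev_eq[k]
            for k, cf in enumerate(subband_crest)]
```

A band with a crest factor of 3.0 receives the full transient boost; a band at 1.0 (tonal) keeps its previous gain.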
In step 560, if no transient is detected, or if tonal components are detected in a subband, the respective subband values of EQDYN(k,m,c) decay toward a value of 1 (no processing) using a frequency-dependent exponential curve that models a typical transient decay function:
EQDYN(k,m,c)=max(EQDYN(k,m,c)*σdecay(k),1)
wherein σdecay(k) represents a per-subband decay coefficient function that decreases with increasing frequency, to model the slower decay of low-frequency transients relative to high-frequency transients. A boundary check is applied to ensure that no gain less than 1 is applied.
In step 570, EQDYN(k, m, c) is constrained within a limited range to avoid output saturation as follows:
if EQDYN(k,m,c)·|X(k,m,c)|>Ymax, then EQDYN(k,m,c)=Ymax/|X(k,m,c)|
wherein |X(k,m,c)| represents the magnitude of the input data of the kth subband of the mth frame of the cth channel, and Ymax represents the maximum allowed output value per subband, per frame, per channel. If warranted, the final version of EQDYN(k,m,c) may be smoothed in frequency to avoid artifacts.
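The per-subband saturation constraint of step 570 can be sketched as follows (function and parameter names are assumptions):

```python
def limit_eq_gains(eq, x_mag, y_max=1.0):
    """Hypothetical output-saturation guard: scale back any subband gain
    whose product with the input magnitude would exceed y_max."""
    return [y_max / m if g * m > y_max else g
            for g, m in zip(eq, x_mag)]
```

For example, a 4x gain on a band of magnitude 0.5 would overshoot a unity ceiling, so it is pulled back to 2x, while gains that stay within the ceiling pass through unchanged.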
In step 580, the specified enhancement is applied to the appropriate input channels by multiplying the complex input coefficients in each frequency band by EQDYN(k,m,c):
Y(k,m,c)=EQDYN(k,m,c)X(k,m,c)
where X(k,m,c) represents the input data of the kth subband of the mth frame of the cth channel, and Y(k,m,c) represents the output data of the kth subband of the mth frame of the cth channel.
The resulting complex band coefficients are recombined and transformed into a block of time-domain output samples using a 64-band synthesis bank or equivalent frequency-to-time-domain filter.
The input/output process described above (steps 500-580) is repeated for each block of input samples. The gains of the EQ curve change dynamically according to the analysis of each input block. In the embodiment described above, the derived gain curve is applied to the original signal by multiplication in the frequency domain, followed by output synthesis complementary to the input analysis block. In other embodiments, the analysis and synthesis methods may differ.
The analysis and synthesis described above use evenly spaced frequency bands. However, it is preferred to perform the analysis on logarithmically spaced frequency bands, which better match the psychoacoustics of human hearing.
The particulars shown herein are by way of example and for purposes of illustrative discussion of the embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the present invention. In this regard, no attempt is made to show details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

Claims (18)

1. A method for conditioning an audio signal, comprising:
receiving at least one audio signal, each audio signal having at least one channel, each channel being divided into a plurality of frames in a time sequence;
calculating a crest factor of the audio signal over at least one frame;
filtering the audio signal into a plurality of subbands, each frame being represented by at least one subband;
deriving a dynamic gain per frame by taking a ratio of a prescribed target crest factor to the calculated crest factor;
calculating a subband crest factor by taking a ratio of a peak gain level to a time-averaged gain in each subband of the at least one subband;
analyzing at least one subband of the frame to determine whether a transient exists in the frame by using a transient detection algorithm that calculates a subband relative energy function for the at least one subband of the frame;
in response to a transient detected in a frame, comparing a sub-band crest factor for each sub-band of the frame to a predetermined tonality threshold; and
applying the dynamic gain to subbands having subband crest factors that exceed the predetermined tonality threshold.
2. The method of claim 1, wherein the crest factor of the audio signal is calculated by taking a ratio of a function of a peak signal magnitude to a function of an average signal magnitude of the audio signal within a frame.
3. The method of claim 1, wherein analyzing at least one subband of the frame comprises computing a subband relative energy represented as:
RE(k,m,c)=Einst(k,m,c)/Eav(k,m,c)
wherein:
RE(k,m,c)=subband relative energy measured on the kth subband of the mth frame of the cth channel;
Einst(k,m,c)=the instantaneous energy measured on the kth subband of the mth frame of the cth channel; and
Eav(k,m,c)=the average energy measured on the kth subband of the mth frame of the cth channel.
4. The method of claim 3, wherein the overall transient energy for each frame is calculated by comparing the subband relative energy in each subband of the frame to a threshold and summing the number of subbands that exceed the threshold, the overall transient energy being represented as:
TE(m,c)=∑(RE(k,m,c)>RETRESH)
wherein:
TE(m,c)=the total transient energy measured on the mth frame of the cth channel;
RE(k,m,c)=subband relative energy measured on the kth subband of the mth frame of the cth channel; and
RETRESHthreshold relative energy value.
5. The method of claim 4, wherein a transient is detected in a frame when the number of subbands that exceed the threshold in the analysis of the frame is greater than a predetermined fraction of total subbands.
6. The method of claim 4, further comprising:
calculating a weighting factor for each frame based on the number of subbands that exceed a threshold; and
weighting the dynamic gain of each frame based on the weighting factor.
7. The method of claim 1, wherein the subband crest factor for each subband is calculated by determining a ratio of a peak gain level to a time-averaged gain, the subband crest factor being expressed as:
CF(k,m,c)=Gpeak(k,m,c)/Gav(k,m,c)
wherein:
CF(k,m,c)=the crest factor value on the kth subband of the mth frame of the cth channel;
Gpeak(k,m,c)=the peak gain level on the kth subband of the mth frame of the cth channel; and
Gav(k,m,c)=the time-averaged gain on the kth subband of the mth frame of the cth channel.
8. The method of claim 1, further comprising not further modifying the subband gain if the subband crest factor is below a predetermined tonality threshold.
9. The method of claim 1, further comprising applying a dynamic gain reduced by an exponential decay curve to subbands having subband crest factors below a predetermined tonality threshold.
10. An apparatus for conditioning an audio signal, comprising:
receiving means for receiving at least one audio signal, each audio signal having at least one channel, each channel being divided into a plurality of frames in a time sequence;
a calculation section for calculating a crest factor of the audio signal over at least one frame;
a filtering component for filtering the audio signal into a plurality of subbands, each frame being represented by at least one subband;
deriving means for deriving a dynamic gain per frame by taking a ratio of a prescribed target crest factor to the calculated crest factor;
wherein the calculating means is further for calculating a subband crest factor by taking a ratio of a peak gain level to a time-averaged gain in each subband of the at least one subband;
an analysis component for analyzing at least one subband of the frame to determine whether a transient exists in the frame by using a transient detection algorithm that calculates a subband relative energy function for the at least one subband of the frame;
comparing means for comparing the sub-band crest factor for each sub-band of a frame with a predetermined tonality threshold in response to a transient detected in the frame; and
a dynamic gain applying component for applying a dynamic gain to subbands having subband crest factors exceeding a predetermined tonality threshold.
11. The apparatus of claim 10, wherein the crest factor of the audio signal is calculated by taking a ratio of a function of a peak signal magnitude to a function of an average signal magnitude of the audio signal within a frame.
12. The apparatus of claim 10, wherein analyzing at least one subband of the frame comprises computing a subband relative energy represented as:
RE(k,m,c)=Einst(k,m,c)/Eav(k,m,c)
wherein:
RE(k,m,c)=subband relative energy measured on the kth subband of the mth frame of the cth channel;
Einst(k,m,c)=the instantaneous energy measured on the kth subband of the mth frame of the cth channel; and
Eav(k,m,c)=the average energy measured on the kth subband of the mth frame of the cth channel.
13. The apparatus of claim 12, wherein the overall transient energy for each frame is calculated by comparing the subband relative energy in each subband of the frame to a threshold and summing the number of subbands that exceed the threshold, the overall transient energy being represented as:
TE(m,c)=∑(RE(k,m,c)>RETRESH)
wherein:
TE(m,c)=the total transient energy measured on the mth frame of the cth channel;
RE(k,m,c)=subband relative energy measured on the kth subband of the mth frame of the cth channel; and
RETRESHthreshold relative energy value.
14. The apparatus of claim 13, wherein a transient is detected in a frame when a number of subbands that exceed a threshold in the analysis of the frame is greater than a predetermined fraction of total subbands.
15. The apparatus of claim 13, wherein the computing component is further configured to:
calculate a weighting factor for each frame based on the number of subbands that exceed the threshold; and
weight the dynamic gain of each frame based on the weighting factor.
16. The apparatus of claim 10, wherein the subband crest factor for each subband is calculated by determining a ratio of a peak gain level to a time-averaged gain, the subband crest factor being expressed as:
CF(k,m,c)=Gpeak(k,m,c)/Gav(k,m,c)
wherein:
CF(k,m,c)=the crest factor value on the kth subband of the mth frame of the cth channel;
Gpeak(k,m,c)=the peak gain level on the kth subband of the mth frame of the cth channel; and
Gav(k,m,c)=the time-averaged gain on the kth subband of the mth frame of the cth channel.
17. The apparatus of claim 10, wherein the dynamic gain applying component is further for not further modifying the subband gain if the subband crest factor is below a predetermined tonality threshold.
18. The apparatus of claim 10, wherein the dynamic gain applying component is further for applying a dynamic gain reduced by an exponential decay curve to subbands having subband crest factors below a predetermined tonality threshold.
HK13100484.8A 2009-10-09 2010-10-08 Adaptive dynamic range enhancement of audio recordings HK1173274B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US25032009P 2009-10-09 2009-10-09
US61/250,320 2009-10-09
US38186010P 2010-09-10 2010-09-10
US61/381,860 2010-09-10
PCT/US2010/052088 WO2011044521A1 (en) 2009-10-09 2010-10-08 Adaptive dynamic range enhancement of audio recordings

Publications (2)

Publication Number Publication Date
HK1173274A1 HK1173274A1 (en) 2013-05-10
HK1173274B true HK1173274B (en) 2016-09-02


Similar Documents

Publication Publication Date Title
EP2486654B1 (en) Adaptive dynamic range enhancement of audio recordings
US9685921B2 (en) Loudness control with noise detection and loudness drop detection
US7848531B1 (en) Method and apparatus for audio loudness and dynamics matching
KR101914312B1 (en) Dynamic compensation of audio signals for improved perceived spectral imbalances
US7353169B1 (en) Transient detection and modification in audio signals
US7970144B1 (en) Extracting and modifying a panned source for enhancement and upmix of audio signals
US8750538B2 (en) Method for enhancing audio signals
US8634578B2 (en) Multiband dynamics compressor with spectral balance compensation
US20120275625A1 (en) Signal processing device, method thereof, program, and data recording medium
US10374564B2 (en) Loudness control with noise detection and loudness drop detection
JP2013102411A (en) Audio signal processing apparatus, audio signal processing method, and program
JP2013521539A (en) System for synthesizing loudness measurements in single playback mode
EP2828853B1 (en) Method and system for bias corrected speech level determination
JP7316093B2 (en) Audio noise elimination device and program
KR20240014462A (en) Adjusting the dynamic range of spatial audio objects
HK1173274B (en) Adaptive dynamic range enhancement of audio recordings
HK1167527B (en) Adaptive dynamic range enhancement of audio recordings
HK1208290B (en) Loudness control with noise detection and loudness drop detection
HK1187741B (en) Dynamic compensation of audio signals for improved perceived spectral imbalances