HK1173250B - Virtual audio processing for loudspeaker or headphone playback - Google Patents
- Publication number
- HK1173250B (application HK13100483.9A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- channel signal
- signal
- center channel
- output
- processing
- Prior art date
Description
Cross Reference to Related Applications
The present application claims priority from the U.S. provisional patent application entitled VIRTUAL 3D AUDIO PROCESSING FOR LOUDSPEAKER OR HEADPHONE PLAYBACK, filed on 6/1/2009 by the inventors hereof, serial No. 61/217,562. U.S. provisional patent application serial No. 61/217,562 is incorporated herein by reference.
Statement Regarding Federally Sponsored Research or Development
Not applicable.
Technical Field
The present application relates to processing audio signals, and more particularly to processing audio signals to reproduce sound on virtual channels.
Background
Audio plays an important role in providing a content-rich multimedia experience in consumer electronics. Advances in the portability and mobility of consumer electronics devices, together with wireless connectivity, provide users with instant access to content. Fig. 1a shows a conventional audio reproduction system 10 for playback through headphones 12 or speakers 14, as known to those skilled in the art.
The conventional audio reproduction system 10 receives a digital or analog audio source signal 16 from an audio or audio/video source 18 (e.g., CD player, TV tuner, handheld media player, etc.). The audio reproduction system 10 may be a home cinema receiver or an automotive audio system dedicated to selecting, processing and routing broadcast audio and/or video signals. Alternatively, the audio reproduction system 10 and one or more audio signal sources may be integrated together in a consumer electronic device, such as a portable media player, a television, a laptop computer, or the like.
The audio output signal 20 is typically processed through a speaker system and output for playback. Such an output signal 20 may be a dual-channel signal sent to the headphones 12 or a pair of front speakers 14, or a multi-channel signal for surround sound playback. For surround sound playback, the audio reproduction system 10 may include a multi-channel decoder, as described in U.S. patent No. 5,974,380, assigned to Digital Theater Systems, Inc. Other commonly used multi-channel decoders include AC-3.
The audio reproduction system 10 also includes standard processing equipment (not shown), such as an analog-to-digital converter for connecting an analog audio source, or a digital audio input interface. The audio reproduction system 10 may include a digital signal processor for processing the audio signals, and a digital-to-analog converter and signal amplifier for converting the processed output signals into electrical signals that are sent to the transducers (headphones 12 or speakers 14).
In general, the speakers 14 may be arranged in a variety of configurations depending upon the application. The speakers 14 may be stand-alone speakers as shown in fig. 1a. Alternatively, the speakers 14 may be incorporated into the same device, for example in the case of a consumer electronics device such as a television, laptop, or handheld stereo player. Fig. 1b shows a laptop computer 22 with two embedded speakers 24a, 24b arranged parallel to each other. The embedded speakers are narrowly spaced from each other, as indicated by the spacing a. The consumer electronic device may include embedded speakers 24a, 24b arranged in various orientations, such as side-by-side or one above the other. The spacing and size of the embedded speakers 24a, 24b are application specific and therefore depend on the size and physical limitations of the enclosure.
Due to technical and physical limitations, audio playback is often compromised or limited in such devices. This is particularly evident in electronic devices (e.g., laptop computers, MP3 players, mobile phones, etc.) that are physically constrained to narrow speaker spacing or that play sound through headphones. Some devices are limited by the physical separation between the speakers and by the correspondingly small angle subtended by the speakers at the listener. In such sound systems, the sound field (sound stage) perceived by the listener is generally narrower than in systems where the loudspeakers are sufficiently spaced. Product designers also often omit a center speaker so as not to deviate from the aesthetic design of a television set. This compromise may limit the overall sound quality of the television, since sound and dialog are typically directed toward the center channel.
To address these audio limitations, audio processing methods have typically been used to reproduce two-channel or multi-channel audio signals through a pair of headphones or a pair of speakers. Such methods include dramatic spatial enhancement effects to improve audio playback in applications with narrowly spaced speakers.
In U.S. patent No. 5,671,287, Gerzon discloses pseudo-stereophonic or directional dispersion effects with low "reverberation" and a substantially flat reproduced total energy response. The pseudo-stereo effect produces few unpleasant or undesirable subjective side effects. It may also provide a simple way of controlling various parameters of the pseudo-stereo effect, such as the magnitude of the angular spread of the sound source.
In U.S. patent No. 6,370,256, McGrath discloses head-related transfer functions applied to an input audio signal in a head-tracking listening environment, comprising: a series of principal component filters connected to the input audio signal, each filter outputting a predetermined simulated sound arrival; a series of delay elements, each connected to a respective one of the principal component filters and delaying the output of that filter by a variable amount in accordance with a delay input, thereby producing a filter delay output; summing means interconnected to the series of delay elements for summing the filter delay outputs to produce an audio speaker output signal; and a head tracking parameter mapping unit having a current direction signal input and interconnected to each of the series of delay elements to provide the delay inputs.
In U.S. patent No. 6,574,649, McGrath discloses an efficient convolution technique for spatial enhancement, in which the time-domain output adds various spatial effects to the input signal while requiring less processing power.
Conventional spatial audio enhancement effects include processing audio signals to provide the perception that they are output from virtual speakers, thereby producing an out-of-head effect (in headphone playback) or a beyond-the-speaker-arc effect (in speaker playback). Such "virtualization" processing is particularly effective for audio signals mainly containing side-panned (or "hard-panned", i.e. full left/full right sound image) sounds. However, when the audio signal contains a center-panned sound component, the perceived position of that component remains "anchored" at the center point between the speakers. When such sounds are reproduced through headphones, they tend to be perceived as elevated and may create an undesirable "in the head" audio experience.
For a two-channel or stereo signal, the virtual audio effect is less noticeable for less strongly panned audio material. In such material, the center image component dominates the mix, resulting in little spatial enhancement. In the extreme case where the input signal is entirely single-channel (identical in the left and right audio source channels), no spatial effect is audible at all when the spatial enhancement algorithm is enabled.
This is particularly a problem in systems where the speakers are below the ear level of the listener (horizontal listening plane). Such structures are found in laptop computers or mobile devices. In these cases, the processed full left/full right image components of the audio mix may be perceived to extend beyond the speakers and be elevated above the plane of the speakers, while the center image and/or single channel content is perceived to emanate from between the original speakers. This results in a very "disjointed" reproduced stereo image.
Accordingly, in view of the increasing interest and application to providing spatial effects in audio signals, there is a need in the art for improved virtual audio processing.
Disclosure of Invention
According to a first aspect of the present invention, there is provided a method of processing an audio signal, the method comprising the steps of: receiving at least one audio signal having at least a center channel signal, a right side channel signal, and a left side channel signal; processing the right side channel signal and the left side channel signal using a first virtualization processor, thereby creating a right virtualized channel signal and a left virtualized channel signal; processing the center channel signal using a spatial expander to produce distinct right and left outputs, thereby expanding the center channel to have a pseudo-stereo effect; and adding the right output and the left output to the right virtualized channel signal and the left virtualized channel signal to produce at least one modified side channel output.
The center channel signal is filtered by a right all-pass filter and a left all-pass filter to produce a right phase-shifted output signal and a left phase-shifted output signal. The right and left channel signals are processed by the first virtualization processor to create different perceived spatial locations for at least one of the right and left channel signals. In an alternative embodiment, the step of processing the center channel signal using a spatial expander further comprises the step of applying a delay or all-pass filter to the center channel signal, thereby creating a phase-shifted center channel signal. The phase-shifted center channel signal is then subtracted from the center channel signal to produce the right output. The center channel signal is then added to the phase-shifted center channel signal to produce the left output. In an alternative embodiment, the spatial expander scales the center channel signal based on at least one coefficient for determining the amount of perceived spatial expansion. The coefficient is defined by multiplication factors a and b satisfying a^2 + b^2 = c, where c is equal to a predetermined constant value.
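The coefficient rule above can be illustrated numerically. The following sketch (a hypothetical helper, not part of the claimed method) derives a and b from a chosen ratio a/b under the constraint a^2 + b^2 = c:

```python
import math

def expansion_coefficients(ratio, c=0.5):
    """Derive scaling factors a and b with a chosen ratio a/b,
    constrained so that a**2 + b**2 == c (power preservation).
    The function name and closed-form solution are illustrative."""
    # a = ratio * b together with a**2 + b**2 = c
    # gives b = sqrt(c / (ratio**2 + 1))
    b = math.sqrt(c / (ratio ** 2 + 1.0))
    a = ratio * b
    return a, b

a, b = expansion_coefficients(0.6)  # a/b = 0.6, a^2 + b^2 = 0.5
```

Any ratio in [0.0, 1.0] then yields a pair (a, b) whose combined output power matches the stated constant c.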
According to a second aspect of the present invention, there is provided a method of processing an audio signal, the method comprising the steps of: receiving at least one audio signal having at least a right-side channel signal and a left-side channel signal; processing the right side channel signal and the left side channel signal to extract a center channel signal; further processing the right side channel signal and the left side channel signal using a first virtualization processor, thereby creating a right virtualized channel signal and a left virtualized channel signal; processing the center channel signal using a spatial expander to produce distinct left and right outputs, thereby expanding the center channel to have a pseudo-stereo effect; and adding the right output and the left output to the right virtualized channel signal and the left virtualized channel signal to produce at least one modified side channel output.
The first processing step may include the steps of: filtering the right and left side channel signals into a plurality of subband audio signals, each subband signal being associated with a different frequency band; extracting a subband center channel signal from each frequency band; and recombining the extracted subband center channel signals to produce a full-band center channel signal. The first processing step may comprise the step of extracting the subband center channel signal by scaling at least one of the right or left subband channel signals using at least one scaling factor. It is contemplated that the at least one scaling factor is determined by evaluating an inter-channel similarity index between the right-side channel signal and the left-side channel signal. The inter-channel similarity index is related to the magnitude of the signal components common to the right-side channel signal and the left-side channel signal.
According to a third aspect of the present invention, there is provided an audio signal processing apparatus comprising: at least one audio signal having at least a center channel signal, a right side channel signal and a left side channel signal; a processor for receiving the right and left side channel signals, the processor processing the right and left side channel signals using a first virtualization processor, thereby creating a right and left virtualized channel signal; a spatial expander for receiving the center channel signal, the spatial expander processing the center channel signal to produce distinct right and left output signals, thereby expanding the center channel to have a pseudo-stereo effect; and a mixer for adding the right and left output signals to the right and left virtualized channel signals to produce at least one modified side channel output. Processing the right-side channel signal and the left-side channel signal using the first virtualization processor to create a different perceived spatial location for at least one of the right-side channel signal and the left-side channel signal. The invention is best understood by reference to the following detailed description when read in conjunction with the accompanying drawings.
Drawings
These and other features and advantages of the various embodiments disclosed herein will be better understood with regard to the following description and drawings, in which like numbers refer to like parts throughout, and in which:
fig. 1a is a schematic diagram showing a conventional audio reproduction playing system for reproduction through headphones or speakers.
Fig. 1b is a schematic diagram showing a laptop computer with two embedded speakers at a narrower separation.
Fig. 2 is a schematic diagram showing a virtual audio processing device for playing through a pair of front speakers.
Fig. 3 is a block diagram illustrating a virtual audio processing system having three parallel processing blocks and a spatial extender included in a central channel processing block.
Fig. 3a is a block diagram of a front channel virtualization processing block with HRTF filters having sum and difference transfer functions and producing two output signals.
Fig. 3b is a block diagram illustrating a surround channel virtualization processing block having HRTF filters with sum and difference transfer functions and producing two output signals.
Fig. 4 is a diagram illustrating an auditory effect of a spatial expansion process according to an embodiment of the present invention.
Fig. 5a is a block diagram depicting a spatial expansion processing block for filtering a center channel signal with a right all-pass filter and a left all-pass filter.
Fig. 5b is a block diagram of an all-pass filter including a delay unit.
Fig. 5c is a block diagram of a spatial expansion processing block with a delay unit.
Fig. 5d is a block diagram of a spatial expansion processing block with an all-pass filter.
Fig. 6 is a block diagram of a virtual audio processing apparatus including a center channel extraction block for extracting a center channel signal from a right channel signal and a left channel signal.
FIG. 7 is a block diagram of a center channel extraction processing block that performs sub-band analysis.
FIG. 8 is a block diagram of a virtual audio processing device with spatial extensions and a channel virtualizer in the same processing block.
Detailed Description
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
Elements of one embodiment of the invention may be implemented by hardware, firmware, software, or any combination thereof. When implemented in software, the elements of an embodiment of the invention are essentially the code segments to perform the necessary tasks. The software may include the actual code to perform the operations described in one embodiment of the invention, or code that emulates or simulates the operations. The program or code segments can be stored in a processor or machine accessible medium or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium. A "processor-readable or accessible medium" or a "machine-readable or accessible medium" may include any medium that can store, transmit, or communicate information. Examples of a processor-readable medium include electronic circuits, semiconductor memory devices, Read Only Memory (ROM), flash memory, Erasable ROM (EROM), floppy disks, Compact Disk (CD) ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
The machine-accessible medium may be included in an article of manufacture. The machine-accessible medium may include data that, when accessed by a machine, cause the machine to perform the operations described below. The term "data" is used herein to mean any type of information that is encoded for machine-readable purposes. Accordingly, it may include programs, code, data, files, and the like.
An embodiment of the invention is implemented in whole or in part by software. The software may have a plurality of modules coupled to each other. A software module is coupled to another module to receive variables, parameters, arguments, pointers, etc. and/or to generate or pass results, updated variables, pointers, etc. The software modules may also be software drivers or interfaces to interact with the operating system running on the platform. The software modules may also be hardware drivers for configuring, setting up, initializing, and sending and receiving data to and from the hardware devices.
One embodiment of the invention may be described as a process which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a block diagram may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. The processes may correspond to methods, procedures, and the like.
FIG. 2 is a schematic diagram illustrating an environment in which one embodiment of the invention may be practiced. The environment includes a virtual audio processing device 26 configured to receive at least one audio source signal 28. The audio source signal 28 may be any audio signal, such as a single-channel signal or a dual-channel signal (e.g., music or a TV broadcast). The two-channel audio signal comprises two side-channel signals LF(t), RF(t) intended to be played through a pair of front loudspeakers LF, RF. Alternatively, the audio source signal 28 may be a multi-channel signal (e.g., a film soundtrack) intended to be played through a surround sound speaker array, comprising a center channel signal CF(t) and four side channel signals LS(t), LF(t), RF(t), RS(t). Preferably, the audio signal source 28 comprises at least a left channel signal LF(t) and a right channel signal RF(t).
The virtual audio processing device 26 processes the audio source signals 28 to produce audio output signals 30a, 30b for playback through speakers or headphones. Audio source signal 28 may be a multi-channel signal intended to be played through a speaker array 14 surrounding a listener, such as the standard "5.1" speaker layout shown in fig. 1a, with speakers labeled LS (left surround), LF (left front), CF (center front), RF (right front), RS (right surround), SW (subwoofer). The standard "5.1" speaker layout 14 is given as an example and is not limiting. In this regard, it is contemplated that the audio output signals 30a, 30b may be configured to simulate any source (or "virtual") speaker layout represented by "m.n", where m is the number of main (satellite) channels and n is the number of subwoofer (or low-frequency enhancement) channels. Alternatively, the audio output signals 30a, 30b may be processed for playback over a pair of headphones 12.
The virtual audio processing device 26 has various conventional processing means (not shown) which may include a digital signal processor connected to digital audio input and output interfaces, and a storage device for storing temporary processing data and processing program instructions.
The audio output signals 30a, 30b are directed to a pair of loudspeakers, labeled L and R respectively. Fig. 2 shows the desired arrangement of loudspeakers LS, LF, CF, RF and RS for a 5-channel audio input signal. In many practical applications, such as television sets or laptop computers, the physical separation of the output speakers L and R is narrower than the intended separation of the LF and RF speakers. In this case, the virtual audio processing device 26 is designed to produce a stereo widening effect. The stereo widening effect provides the illusion that the audio signals LF(t) and RF(t) are emitted from a pair of virtual loudspeakers located at positions LF and RF. Thus, the perceived sound emanates from virtual speakers located at the intended speaker locations. The virtual loudspeakers may be located anywhere in the spatial sound field. In this regard, it is contemplated that the audio source signals 28 may be processed to emanate from virtual speakers located at any perceived location.
For a 5-channel audio source signal 28, the virtual audio processing device 26 produces the perception that audio channel signals CF(t), LS(t), and RS(t) emanate from speakers located at positions CF, LS, and RS, respectively. Similarly, audio channel signals CF(t), LF(t), and RF(t) may be perceived as emanating from speakers located at positions CF, LF, and RF, respectively. As is known in the art, these illusions may be achieved by transforming the audio input signal 28 using a measurement or approximation of the speaker-to-ear sound transfer function, otherwise known as a Head Related Transfer Function (HRTF). HRTFs capture the frequency-dependent time and amplitude differences imposed on sound emanating from any sound source, which result from sound diffraction around the listener's head. Each source direction produces two associated HRTFs (one for each ear). It is important to note that most 3D sound systems are not able to use the individual user's HRTFs; in most cases, non-personalized (universal) HRTFs are used. Generally, theoretical methods based on physics or psychoacoustics are used to derive non-personalized HRTFs that work well for most people.
The ipsilateral HRTF represents the path to the ear closest to the source, while the contralateral HRTF represents the path to the ear farthest from it. The HRTFs labeled in fig. 2 are as follows:
H0i: an ipsilateral HRTF for the front left or right front actual speaker position;
H0c: a contralateral HRTF for the front left or right front actual speaker position;
HFi: an ipsilateral HRTF for a front left or right virtual speaker position;
HFc: a contralateral HRTF for a front left or right virtual speaker position;
- HSi: an ipsilateral HRTF for the surround left or right virtual speaker position;
HSc: a contralateral HRTF for a surround left or right virtual speaker position;
HF: HRTF for the front center virtual speaker position (same for both ears).
The virtual audio processing device assumes that the actual speaker layout and the virtual speaker layout are in a symmetrical relationship with respect to the front direction of the listener. In the symmetrical case, the listener is positioned on the central axis relative to the CF speaker so that the audio image is directionally balanced. It is contemplated that slight changes in head position will not disturb the symmetry. The symmetrical relationship is presented as an example and is not intended to be limiting. In this regard, those skilled in the art will appreciate that the present invention extends to asymmetric virtual speaker layouts that include any number of virtual speakers located at any perceived location in the sound field.
In an exemplary embodiment of the present invention, the intended output speakers may be the headphones 12. In this case, the actual output speakers L and R are located at the ears of the listener. The transfer function H0i is then the headphone transfer function, and the transfer function H0c can be ignored.
Referring now to fig. 3, a block diagram of the virtual audio processing device 26 is shown. The overall process is broken down into three parallel processing blocks operating on the audio source signals 28, whose output signals are added to compute the final output signals L(t), R(t), respectively. Each audio source signal 28 is virtualized to provide the illusion that the respective source channel signals LF(t), RF(t), LS(t), RS(t), CF(t) are located at different predetermined positions in 3D space. However, to provide the desired spatial effect, only one of the side channel signals LF(t), RF(t), LS(t), RS(t) needs to be virtualized. Various virtualization techniques for the surround speakers of a 5.1-channel system are known in the art. In some systems, the LS(t) and RS(t) channels of a 5.1 surround mix may be binaurally processed to generate virtual sources, using HRTFs corresponding to approximately 110 degrees from the forward direction on either side (the normal position of the surround speakers).
Front channel virtualization processing block 34 processes the front channel source audio signal pair LF(t), RF(t). Surround channel virtualization processing block 36 processes the surround channel source audio signal pair LS(t), RS(t). The center channel virtualization processing block 38 processes the center channel source audio signal CF(t).
For front speaker output, the center channel virtualization processing block 38 may include a 3 dB signal attenuation. For headphone output, the center channel virtualization processing block 38 may apply filtering defined by the transfer function [HF/H0i] to the source signal CF(t).
Referring now to fig. 3a and 3b, block diagrams depicting preferred embodiments of the front channel virtualization processing block 34 and the surround channel virtualization processing block 36 are shown. The present embodiment assumes that the actual and virtual loudspeaker layouts are symmetrical with respect to the front direction of the listener. Blocks HFSUM, HFDIFF, HSSUM and HSDIFF represent filters having transfer functions defined by the following respective equations:
HFSUM = [HFi + HFc] / [H0i + H0c];
HFDIFF = [HFi - HFc] / [H0i - H0c];
HSSUM = [HSi + HSc] / [H0i + H0c];
HSDIFF = [HSi - HSc] / [H0i - H0c].
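Assuming the HRTF-derived transfer functions above are approximated by short FIR impulse responses, the sum/difference ("shuffler") structure they imply can be sketched in the time domain as follows. The function names and placeholder filter taps are illustrative, not from the patent:

```python
def convolve(signal, fir):
    """Direct-form FIR convolution, truncated to the input length."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(fir):
            if n - k >= 0:
                acc += h * signal[n - k]
        out[n] = acc
    return out

def shuffler_virtualize(left, right, h_sum, h_diff):
    """Sum/difference virtualization for a symmetric speaker pair:
    filter the sum and difference of the channel pair with FIR
    approximations of the SUM and DIFF transfer functions, then
    reconstruct left/right outputs."""
    s = [l + r for l, r in zip(left, right)]   # sum path
    d = [l - r for l, r in zip(left, right)]   # difference path
    s_f = convolve(s, h_sum)
    d_f = convolve(d, h_diff)
    out_l = [(a + b) * 0.5 for a, b in zip(s_f, d_f)]
    out_r = [(a - b) * 0.5 for a, b in zip(s_f, d_f)]
    return out_l, out_r
```

With identity filters (a single unit tap) the structure passes the input through unchanged, which is a convenient sanity check of the sum/difference decomposition.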
Referring back to fig. 3, the center channel virtualization block 38 is followed by a spatial expansion processing block 40 (or spatial expander, described in detail below) to produce two different (L and R) output signals from the single-channel input signal CF(t), resulting in a pseudo-stereo effect. The pseudo-stereo effect converts a mono signal into a dual- or multi-channel output signal, thereby expanding the mono signal into a dual- or multi-channel sound field.
In front speaker playback, the subjective effect obtained is that the center channel audio signal CF(t) is perceived to emanate from an extended spatial region located near the actual speakers, as shown in fig. 4. The signal CF(t) is thus scattered or dispersed, resulting in a more natural sound perception. In headphone playback, the subjective effect achieved is a more natural and vivid perception of the location of the center channel audio signal. The subjective effect is an improved "out of head" frontal impression, alleviating one of the common disadvantages of headphone playback.
In fig. 3, the center channel virtualization processing block 38 is a single-input single-output filter; it is therefore equivalent to modify the process of fig. 3 by first applying the spatial expansion process to the input signal CF(t) and then applying the center channel virtualization process identically to each of the two output signals L and R of the spatial expansion processing block.
Referring now to fig. 5a, a block diagram of the spatial expansion processing block 40 is shown. The source signal CF(t) is split into left and right output signals L, R, which are processed by different all-pass filters APFL and APFR. An all-pass filter is an electronic filter that passes all frequencies equally but changes the phase relationship between them. Thus, an all-pass filter may provide a frequency-dependent phase shift to the signal and/or vary the propagation delay with frequency. All-pass filters are typically used to compensate for other unwanted phase shifts occurring in processing, or are mixed with an unshifted version of the original signal to implement a notch comb filter. They can also be used to convert a mixed-phase filter into a minimum-phase filter with an equivalent magnitude response, or an unstable filter into a stable filter with an equivalent magnitude response.
Referring now to fig. 5b, a block diagram of an embodiment of the all-pass filter processing block APF is shown. The all-pass filter APF comprises a delay unit 42, denoted Z^-N, for introducing a time delay into the center channel signal CF(t). The digital delay length N is expressed in samples, and g represents a positive or negative loop gain such that its magnitude |g| < 1.0. Preferably, the spatial expansion processing block 40 includes a different digital delay length N for each all-pass filter APF, with the time length of the delay between 3 ms and 5 ms. However, this range is not intended to be limiting, as the time length may be determined according to various parameters.
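The delay-plus-loop-gain structure described here matches the classic Schroeder all-pass section. A minimal sketch, assuming the standard difference equation y[n] = -g·x[n] + x[n-N] + g·y[n-N] (an assumption about the exact topology of fig. 5b):

```python
def allpass(x, N, g):
    """Schroeder all-pass section: y[n] = -g*x[n] + x[n-N] + g*y[n-N].
    Passes all frequencies with unit magnitude but imposes a
    frequency-dependent phase shift. N is the delay in samples;
    |g| < 1.0 is required for stability, as stated in the text."""
    assert abs(g) < 1.0
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - N] if n >= N else 0.0  # x delayed by N samples
        yd = y[n - N] if n >= N else 0.0  # feedback path
        y[n] = -g * x[n] + xd + g * yd
    return y
```

For a unit impulse, the first output sample is -g and the sample at lag N is 1 - g^2, the familiar all-pass impulse response; at a 48 kHz sample rate, the preferred 3 to 5 ms delay corresponds to N between roughly 144 and 240 samples.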
Referring now to FIG. 5c, a block diagram of the spatial expansion processing block 40 is shown, according to an alternative embodiment. In this embodiment, the difference between the L and R output signals of the spatial expansion processing block 40 is produced by adding and subtracting delayed copies of the audio source signal CF(t) itself, respectively. Preferably, the copied CF(t) signal comprises a time delay having a digital delay length of between 2 ms and 4 ms. For a given digital delay length N, the amount of spatial expansion is determined by scaling factors a and b, generated according to a multiplication factor having a ratio a/b. Preferably, the ratio a/b lies in the range [0.0, 1.0]. By applying the rule a^2 + b^2 = c, the total power of the output signals L and R may be limited to match the power of the input signal CF(t). It is contemplated that c is equal to a predetermined constant; preferably, c is approximately equal to 0.5.
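A sketch of the add/subtract expander of this embodiment; the exact placement of the scaling factors a and b relative to the direct and delayed paths is an assumption based on the description:

```python
def spatial_expand(cf, delay_samples, a, b):
    """Pseudo-stereo expander sketch (Fig. 5c style): the left output
    adds, and the right output subtracts, a delayed copy of the mono
    center signal. Scaling a on the direct path and b on the delayed
    path is an assumption; choosing a**2 + b**2 = c keeps the total
    output power matched to the input, per the text."""
    # Delay by prepending zeros and truncating to the input length.
    delayed = ([0.0] * delay_samples + cf)[:len(cf)]
    left = [a * x + b * d for x, d in zip(cf, delayed)]
    right = [a * x - b * d for x, d in zip(cf, delayed)]
    return left, right
```

Note that L + R recovers 2a·CF(t) while L - R is 2b times the delayed copy, so the ratio a/b directly controls how much decorrelated (delayed) signal appears between the two outputs.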
Referring now to FIG. 5d, a block diagram of the spatial expansion processing block 40 is shown, according to an alternative embodiment. The processing block of fig. 5c is modified by replacing the delay unit 42 with an all-pass filter APF. A delay or all-pass filter is applied to CF(t) to produce a phase-shifted center channel signal. The phase-shifted center channel signal is subtracted from CF(t) to produce the right output. CF(t) is added to the phase-shifted center channel signal to produce the left output. A variation of the spatial expansion processing block 40 may be implemented by replacing the APF with another single-input single-output all-pass network. Alternative methods for constructing a single-input single-output all-pass network may be applied in the embodiments of the spatial expansion block shown in fig. 5a or 5d. These methods include cascading multiple single-input single-output all-pass networks, and/or replacing any delay unit in an all-pass network filter with another all-pass network.
Referring now to FIG. 6, another embodiment of the front channel and center channel virtualization processing included in the device 26 is shown. This embodiment is preferred when the audio source signal 28 does not include a discrete center channel signal CF(t). A center channel extraction processing block 44 is inserted in front of the front channel virtualization processing block 34. The center channel extraction processing block 44 receives the front channel signal pair, denoted LF(t) and RF(t), and outputs three signals LF', RF' and CF'. The audio signal CF' is an extracted center channel audio signal that contains the audio signal components (or "center image") common to the original left and right input signals LF and RF. The audio signal LF' contains the audio signal components localized (or "imaged") to the left in the original two-channel input signal (LF, RF). Similarly, the audio signal RF' contains the audio signal components localized to the right in the input signal (LF, RF). The three signals LF', RF' and CF' are then processed in the same way as in the virtual audio processing device 26 of FIG. 3. Alternatively, the extracted center channel signal CF' may be additively combined with a separate center channel input signal CF(t), so that the same virtual audio processing device 26 may also be employed to process a multi-channel input signal that includes an original center channel signal.
Referring now to FIG. 7, a block diagram of an embodiment of the center channel extraction processing block 44 is shown. The audio source channel signals LF(t) and RF(t) are processed by optional subband analysis stages 46a, 46b, which decompose the signals into a plurality of subband audio signals associated with different frequency bands. In embodiments that include these subband analysis stages 46a, 46b, the center channel extraction process is performed separately for each frequency band, and synthesis blocks may optionally be provided to recombine the subband output signals corresponding to each of the three output channels into full-band audio signals LF', RF' and CF', respectively. In one embodiment, the center channel extraction process is implemented by:
LF' = kL * LF; RF' = kR * RF; CF' = kC * (LF + RF);
where kL represents the scaling factor for the LF' signal, kR represents the scaling factor for the RF' signal, and kC represents the scaling factor for the CF' signal. In one embodiment, the scaling factors kL, kR and kC are adaptively calculated by an adaptive dominance detector block 48, which continuously evaluates the inter-channel similarity index M between the input channels, increasing kC when the inter-channel similarity is high and decreasing kC when it is low. At the same time, the adaptive dominance detector block decreases kL and kR when the inter-channel similarity is high and increases these values when it is low. In one embodiment of the present invention, the inter-channel similarity index M is defined as follows:
M = log[|LF + RF|² / |LF - RF|²]
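The extraction rule and the similarity index can be sketched together as follows. Note that the specific mapping from M to the gains (a sigmoid here) is a hypothetical choice for illustration: the paragraph above only specifies that kC rises, and kL, kR fall, as the inter-channel similarity increases.

```python
import math

def similarity_index(lf, rf, eps=1e-12):
    """Inter-channel similarity M = log(|LF+RF|^2 / |LF-RF|^2) over one block."""
    p_sum = sum((l + r) ** 2 for l, r in zip(lf, rf))
    p_diff = sum((l - r) ** 2 for l, r in zip(lf, rf))
    return math.log((p_sum + eps) / (p_diff + eps))

def extract_center(lf, rf):
    """LF' = kL*LF; RF' = kR*RF; CF' = kC*(LF+RF), gains driven by M.

    The sigmoid mapping from M to kC is an assumed, illustrative choice.
    """
    m = similarity_index(lf, rf)
    kc = 1.0 / (1.0 + math.exp(-m))  # high similarity -> kC near 1
    kl = kr = 1.0 - kc               # ...and kL, kR near 0
    return ([kl * x for x in lf],
            [kr * x for x in rf],
            [kc * (l + r) for l, r in zip(lf, rf)])
```

For identical left and right inputs, M is large and positive, so nearly all the energy is routed to CF'; for anti-correlated inputs, M is negative and the side channels are preserved instead.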
referring now to FIG. 8, a block diagram of a virtual audio processing device 26 is shown, in accordance with an alternative embodiment. The spatial expansion processing block 40 and the front channel virtualization processing block 34 of fig. 3a are combined in a single processing block. The spatial expansion process is applied to the filter HFSUMOf the filter HFSUMIs derived from the sum of the audio source channel signals lf (t) and rf (t). A delay or all-pass filter is applied to cf (t) to produce a phase-shifted center channel signal. The phase-shifted center channel signal is subtracted from cf (t) to produce a right output. Cf (t) is added to the phase-shifted center channel signal to produce a left output. The difference between the right and left channel signals is HFDIFFProcessing to produce a filtered difference signal. The filtered difference signal is added to the phase shifted center channel signal. The optional adaptive dominance detector 48 continuously adjusts the spatial extent according to the inter-channel similarity index M. Alternatively, as in fig. 7, the input signals lf (t) and rf (t) may be pre-processed by a subband analysis block (not shown in fig. 8), and the output signals L and R may be post-processed by a synthesis block to recombine the subbands into a full band signal。
Specific details are set forth herein by way of example only, in order to provide a thorough discussion of the embodiments of the invention and to present what is believed to be the most useful and readily understood description of its principles and conceptual aspects. In this regard, no attempt is made to show details of the invention beyond what is necessary for a fundamental understanding; the description, taken with the drawings, makes apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
Claims (26)
1. A method of processing an audio signal, comprising the steps of:
receiving at least one audio signal having at least a center channel signal, a right side channel signal, and a left side channel signal;
processing the right side channel signal and the left side channel signal using a first virtualization processor, thereby creating a right virtualized channel signal and a left virtualized channel signal;
processing the center channel signal using a spatial expander to produce distinct right and left outputs, thereby expanding the center channel to have a pseudo-stereo effect, the step of processing the center channel signal using the spatial expander further comprising:
applying a delay or all-pass filter to the center channel signal, thereby creating a phase-shifted center channel signal;
subtracting the phase-shifted center channel signal from the center channel signal to produce the right output;
adding the center channel signal to the phase-shifted center channel signal to produce the left output;
scaling the center channel signal based on at least one coefficient for determining a perceived amount of spatial expansion; and
adding the right output to the right virtualized channel signal and the left output to the left virtualized channel signal to produce at least one modified side channel output.
2. The method of claim 1, wherein processing the center channel signal using a spatial expander comprises:
the center channel signal is processed using a right all-pass filter to produce a right phase-shifted output signal.
3. The method of claim 1, wherein processing the center channel signal using a spatial expander comprises:
the center channel signal is processed using a left all-pass filter to produce a left phase-shifted output signal.
4. The method of claim 1, wherein processing the right and left channel signals using the first virtualization processor creates a different perceived spatial location for at least one of the right and left channel signals.
5. The method of claim 1, wherein the at least one coefficient satisfies the rule a² + b² = c, where c is equal to a predetermined constant value.
6. The method according to claim 5, wherein the predetermined constant value is 0.5.
7. The method of claim 1, wherein the at least one audio signal further comprises a right surround side channel signal and a left surround side channel signal.
8. The method of claim 7, wherein the right surround side channel signal and the left surround side channel signal are processed by a second virtualization processor, thereby creating a right surround virtualized channel signal and a left surround virtualized channel signal.
9. The method of claim 8, further comprising the steps of:
adding the right output to the right surround virtualized channel signal and the left output to the left surround virtualized channel signal to produce at least one modified side channel output.
10. The method of claim 1, wherein the first virtualization processor comprises a first HRTF filter represented by H(SUM) and a second HRTF filter represented by H(DIFF), where H(SUM) and H(DIFF) comprise the following transfer functions:
H(SUM) = [Hi + Hc] / [H0i + H0c];
H(DIFF) = [Hi - Hc] / [H0i - H0c];
wherein Hi is an ipsilateral HRTF for the left virtual speaker position or the right virtual speaker position, Hc is a contralateral HRTF for the left virtual speaker position or the right virtual speaker position, H0i is an ipsilateral HRTF for the left actual speaker position or the right actual speaker position, and H0c is a contralateral HRTF for the left actual speaker position or the right actual speaker position.
11. A method of processing an audio signal, comprising the steps of:
receiving at least one audio signal having at least a right-side channel signal and a left-side channel signal;
processing the right side channel signal and the left side channel signal to extract a center channel signal;
further processing the right side channel signal and the left side channel signal using a first virtualization processor, thereby creating a right virtualized channel signal and a left virtualized channel signal;
processing the center channel signal using a spatial expander to produce distinct left and right outputs, thereby expanding the center channel to have a pseudo-stereo effect; and
adding the right output to the right virtualized channel signal and the left output to the left virtualized channel signal to produce at least one modified side channel output.
12. The method of claim 11, wherein the step of processing the right and left side channel signals to extract a center channel signal comprises:
filtering the right and left side channel signals into a plurality of sub-band audio signals associated with different frequency bands;
extracting a sub-band center channel signal in at least one frequency band; and
recombining the sub-band center channel signals to produce a full-band center channel signal.
13. The method of claim 11, wherein the step of processing the right and left side channel signals to extract a center channel signal comprises:
scaling at least one of the right-side channel signal or the left-side channel signal using at least one scaling factor.
14. The method of claim 13, wherein the at least one scaling factor is determined by continuously evaluating an inter-channel similarity index between the right and left side channel signals, wherein the inter-channel similarity index is related to a magnitude of a signal component common to the right and left side channel signals.
15. The method of claim 14, wherein the inter-channel similarity index is determined by comparing a power of a sum and a power of a difference of the right and left channel signals.
16. The method of claim 11, wherein the first virtualization processor comprises a first HRTF filter represented by H(SUM) and a second HRTF filter represented by H(DIFF), where H(SUM) and H(DIFF) comprise the following transfer functions:
H(SUM) = [Hi + Hc] / [H0i + H0c];
H(DIFF) = [Hi - Hc] / [H0i - H0c];
wherein Hi is an ipsilateral HRTF for the left virtual speaker position or the right virtual speaker position, Hc is a contralateral HRTF for the left virtual speaker position or the right virtual speaker position, H0i is an ipsilateral HRTF for the left actual speaker position or the right actual speaker position, and H0c is a contralateral HRTF for the left actual speaker position or the right actual speaker position.
17. The method of claim 16, comprising the steps of:
processing a sum of the right side channel signal and the left side channel signal using H(SUM) to generate the center channel signal.
18. The method of claim 11, wherein the step of processing the center channel signal using a spatial expander comprises:
applying a delay or all-pass filter to the center channel signal, thereby creating a phase-shifted center channel signal;
subtracting the phase-shifted center channel signal from the center channel signal to produce the right output; and
adding the center channel signal to the phase-shifted center channel signal to produce the left output.
19. The method of claim 16, further comprising the step of:
applying a delay or all-pass filter to the center channel signal, thereby creating a phase-shifted center channel signal;
subtracting the phase-shifted center channel signal from the center channel signal to produce the right output;
adding the center channel signal to the phase-shifted center channel signal to produce the left output;
processing a difference between the right side channel signal and the left side channel signal using H(DIFF) to generate a filtered difference signal; and
adding the filtered difference signal to the phase-shifted center channel signal.
20. The method of claim 16, wherein the transfer function H0i is a headphone transfer function, and the transfer function H0c is zero.
21. The method according to claim 18, comprising the step of scaling the center channel signal based on at least one coefficient for determining the perceived amount of spatial expansion.
22. The method of claim 18, wherein the amplitude of the center channel signal is continuously adjusted by a scaling factor based on an inter-channel similarity index between the right and left side channel signals, wherein the similarity index is related to the magnitude of a signal component common to the right and left side channel signals.
23. The method of claim 1 or 11, wherein the adding step produces at least two modified side channel output signals for playback through headphones.
24. An audio signal processing apparatus comprising:
at least one audio signal having at least a center channel signal, a right side channel signal and a left side channel signal;
a processor for receiving the right and left side channel signals, the processor processing the right and left side channel signals using a first virtualization processor, thereby creating a right and left virtualized channel signal;
a spatial expander for receiving the center channel signal, the spatial expander processing the center channel signal to produce distinct right and left output signals, thereby expanding the center channel to have a pseudo-stereo effect; wherein the spatial expander applies a delay or all-pass filter to the center channel signal, thereby creating a phase-shifted center channel signal, and subtracts the phase-shifted center channel signal from the center channel signal to produce the right output; the spatial expander adds the center channel signal to the phase-shifted center channel signal to produce the left output and scales the center channel signal based on at least one coefficient for determining an amount of perceived spatial expansion; and
a mixer to add the right output to the right virtualized channel signal and the left output to the left virtualized channel signal to produce at least one modified side channel output.
25. The audio signal processing apparatus of claim 24, wherein processing the right and left channel signals using the first virtualization processor produces different perceived spatial locations for at least one of the right and left channel signals.
26. The audio signal processing apparatus of claim 24, wherein the audio signal comprises a right surround side channel signal and a left surround side channel signal.
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US21756209P | 2009-06-01 | 2009-06-01 | |
| US61/217,562 | 2009-06-01 | ||
| US12/762,915 US8000485B2 (en) | 2009-06-01 | 2010-04-19 | Virtual audio processing for loudspeaker or headphone playback |
| US12/762,915 | 2010-04-19 | ||
| PCT/US2010/036683 WO2010141371A1 (en) | 2009-06-01 | 2010-05-28 | Virtual audio processing for loudspeaker or headphone playback |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1173250A1 HK1173250A1 (en) | 2013-05-10 |
| HK1173250B true HK1173250B (en) | 2016-05-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN102597987B (en) | Virtual audio processing for loudspeaker or headphone playback | |
| US10299056B2 (en) | Spatial audio enhancement processing method and apparatus | |
| KR102423757B1 (en) | Method, apparatus and computer-readable recording medium for rendering audio signal | |
| US7680289B2 (en) | Binaural sound localization using a formant-type cascade of resonators and anti-resonators | |
| JP2004529515A (en) | Method for decoding two-channel matrix coded audio to reconstruct multi-channel audio | |
| JP4782614B2 (en) | decoder | |
| US7599498B2 (en) | Apparatus and method for producing 3D sound | |
| JP2010178375A (en) | 5-2-5 matrix encoder and decoder system | |
| US8027494B2 (en) | Acoustic image creation system and program therefor | |
| US12507011B2 (en) | Stereo headphone psychoacoustic sound localization system and method for reconstructing stereo psychoacoustic sound signals using same | |
| HK1173250B (en) | Virtual audio processing for loudspeaker or headphone playback | |
| JP7332745B2 (en) | Speech processing method and speech processing device | |
| WO2020045109A1 (en) | Signal processing device, signal processing method, and program | |
| US11470435B2 (en) | Method and device for processing audio signals using 2-channel stereo speaker | |
| GB2609667A (en) | Audio rendering |