GB2369976A - A method of synthesising an averaged diffuse-field head-related transfer function - Google Patents
- Publication number: GB2369976A (application GB0029810A)
- Authority
- GB
- United Kingdom
- Legal status: Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Abstract
A method of synthesising an averaged diffuse-field head-related transfer function (HRTF) for use in three-dimensional audio signal processing includes: (1) selecting, from a synthesis library of HRTFs, a set of HRTFs having approximately equal interaural time delay values; (2) averaging the left-ear response functions from the set of HRTFs as a function of frequency; (3) averaging the right-ear response functions from the set of HRTFs as a function of frequency; and (4) combining the averaged left and right ear response functions with the corresponding interaural time delay value to give an averaged diffuse-field HRTF which enables creation of non-specific localisation HRTFs. These "diffuse field" HRTFs have application in impulse response modelling with wave-scattering, and in virtual reverberation systems, for example for headphones and telephone handsets.
Description
A METHOD OF SYNTHESISING A DIFFUSE-FIELD HEAD-RELATED TRANSFER FUNCTION

The present invention relates to a method of synthesising a diffuse-field
Head Related Transfer function (HRTF). It relates particularly, though not exclusively, to 3D-audio signal-processing, in which sounds can be reproduced so as to appear to originate in full, three-dimensional space around the listener, using two audio channels, and reproduced via either loudspeakers or headphones.
One aspect of the present invention relates to headphone "virtualisation" technology, in which an audio signal is processed such that, when it is auditioned using headphones, the source of the sound appears to originate outside the head of the listener. (At present, conventional stereo audio creates sound-images which appear, for the most part, to originate inside the head of the listener.)
Another aspect of the invention is its use in virtual 3D-reverberation processing.
HRTFs from a sound-source at specified locations in space can be measured using an "artificial head" microphone system (Figure 1), or similar HRTFs may be synthesised electronically. Each HRTF comprises three elements: (a) a left-ear transfer function; (b) a right-ear transfer function; and (c) an inter-aural time-delay (ITD), and each is specific to a particular direction in three-dimensional space with respect to the listener (Figure 2). Sometimes it is convenient and more descriptive to refer to the left- and right-ear functions as a "near-ear" and "far-ear" function, according to relative source position.
The HRTF can be used to process a monophonic sound-source (Figure 2), digitally, such that the resultant stereo-pair signal contains the naturally-occurring 3D-sound cues which are introduced acoustically by the head and ears when we listen to sounds in real life. Typically, the use of two 25-tap FIR filters (one for the near-ear filter and one for the far-ear filter), together with an appropriate ITD time-delay element in the range 0 to 680 µs, provides an effective signal-processing means for implementing an HRTF filter at the usual sample rates of either 22.05 kHz or 44.1 kHz.
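The FIR-plus-delay structure just described can be sketched as follows. The filter coefficients below are randomly generated placeholders (a real implementation would use measured or synthesised HRTF taps), and the `apply_hrtf` helper name is illustrative, not from the patent:

```python
import numpy as np

def apply_hrtf(mono, near_fir, far_fir, itd_samples):
    """Apply an anechoic HRTF: two FIR filters plus an interaural delay.

    mono        : 1-D array, monophonic input signal
    near_fir    : FIR taps for the near (leading) ear
    far_fir     : FIR taps for the far (lagging) ear
    itd_samples : interaural time delay in whole samples
                  (0..30 at 44.1 kHz spans the 0-680 us range)
    """
    near = np.convolve(mono, near_fir)
    far = np.convolve(mono, far_fir)
    # Delay the far-ear signal by the ITD, then pad both to equal length.
    far = np.concatenate([np.zeros(itd_samples), far])
    n = max(len(near), len(far))
    near = np.pad(near, (0, n - len(near)))
    far = np.pad(far, (0, n - len(far)))
    return near, far  # e.g. (left, right) for a source on the left

# Illustrative (not measured) 25-tap filters; 16 samples ~= 363 us ITD.
rng = np.random.default_rng(0)
near_fir = rng.standard_normal(25) * np.hanning(25)
far_fir = 0.5 * rng.standard_normal(25) * np.hanning(25)
left, right = apply_hrtf(np.ones(100), near_fir, far_fir, itd_samples=16)
```

With 'full' convolution the two output channels share a common length, and the far-ear channel leads with `itd_samples` zeros, which is the delay cue.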
This processing, when applied to an audio recording and auditioned on headphones, creates the auditory illusion that the listener hears the recording from a virtual sound-source at a point in space corresponding to the direction associated with the HRTF (as shown in Figure 3). However, this method is anechoic (no sound-wave reflections are present), and emulates listening to sounds in an anechoic chamber. Consequently, although the direction of the sound-source can be emulated reasonably well, its distance is more difficult to judge. Through headphones, for example, the sound-source seems very close to the head.
When one listens through loudspeakers instead of headphones, the signals are not conveyed efficiently into the ears, because "transaural acoustic crosstalk" is present which inhibits the 3D-sound cues. This means that the left ear hears a little of what the right ear is hearing (after a small, additional time-delay of around 0.25 ms), and vice versa. To prevent this happening, it is known to create appropriate "crosstalk cancellation" signals from the opposite loudspeaker.
These signals are equal in magnitude and inverted with respect to the crosstalk signals, and designed to cancel them out. There are more advanced schemes which anticipate the secondary (and higher order) effects of the cancellation signals themselves contributing to secondary crosstalk, and the correction thereof.
This anechoic HRTF processing is the first stage in creating a virtual sound image for playback over two loudspeakers or headphones. It is restricted, however, in terms of conveying a convincing sense of distance to the listener, and through headphones, the image always appears very close to the head. In order to improve matters, several attempts have been made to supplement the anechoic simulation with additional signals representing reflected and reverberant waves, as will be described below. First, however, it is useful and important to define the terms "free-field" and "diffuse field" in the context of acoustical waves.
A "free-field" acoustical sound-field is one which is free of reflections, in which the acoustic energy from a sound-emitting object is free to propagate in the form of plane, progressive waves, without encountering any physical objects which would modify the nature of its propagation (such as reflective or absorbing surfaces). Consequently, if one were to make measurements in free-field conditions at any particular point in space, an incident sound-wave will possess both a
magnitude (related to the sound intensity value) and a propagation vector, representing its direction and speed. The wave passes through any particular point once only. An example of a free-field environment is the anechoic chamber.
Several out-of-doors scenarios are almost anechoic, such as when standing in a snow-covered [sound-absorbing] field.
Free-field conditions, then, represent a reflection-free acoustical environment in which a sound-wave, measured or perceived at a single point (a) has magnitude and direction, and (b) is encountered only once.
A "diffuse field" acoustical sound-field is one created by a reflection-rich environment, in which many waves are superposed such that it is impossible to identify or perceive the individual components. In "Fundamentals of Acoustics", 3rd edition, John Wiley and Sons, New York (1984), by L E Kinsler et al., the generation of a diffuse field is described as: "... a ray model wherein sound is assumed to travel outward from the source along diverging rays. At each encounter with the boundaries of the room, the rays are partially absorbed and reflected... After a large number of reflections the sound in the room may have been assumed to have become diffuse: the average energy density is the same throughout the volume of the enclosure, and all directions of propagation are equally probable." An example of a diffuse-field environment is an echoic chamber: an ideal room with perfectly reflecting walls. Diffuse-field conditions represent a reflection-rich acoustical environment in which, at a single point, any particular sound-wave can be encountered many times, successively. Any measurement represents the summation of many individual waves, to which can be attributed a measurable averaged magnitude, but no measurable (or perceivable) direction.
The free-field condition and the diffuse-field condition represent the two most extreme acoustic scenarios, at opposite ends of a continuous range of environments. In reality, of course, most environments lie somewhere between the two; indoor environments are largely diffuse in nature, and outdoor ones largely free-field plus a ground reflection.
Recently, in order to enhance the anechoic simulation, various 3D-sound vendors have used either of two different approaches to supplement the direct sound, as will now be described. The first approach is to add a number of the first-order reflections, using ray-tracing calculations, and the second approach is to simulate 3D-reverberation.
It is common practice to model the propagation of sound-waves in a room by means of ray-tracing. This method assumes that when a sound wave is reflected from a planar surface, such as a wall, the process is analogous to an optical reflection (the law of reflection: the angle of reflection is equal to the angle of incidence). This is a very crude method of visualising the situation, but it has been adopted widely, probably because of its convenient synergy with reverberation modelling using delay-lines.
Figure 4 shows the ray-tracing method applied to a simple rectangular room, depicted here in plan view. The listener is placed in the centre of the room, for convenience, and there is a sound-source to the front and on the right-hand side of the listener, at distance r and at azimuth angle θ. The room has width w and length l. The sound from the source travels via a direct path to the listener, r, as shown, and also via a reflection off the right-hand wall such that the total path length is a + b. If the reflection path is extrapolated backwards from the listener and beyond the wall by its distance from the wall to the source, a, then this point specifies the position of the associated "virtual" sound-source. Because there is only a single reflection in the path from the source to the listener, it is termed a "first-order" reflection. There are six first-order reflections in all: one from each wall, one from the ceiling and one from the ground.
The four wall-reflection-related virtual sound sources which correspond to a specific source-listener arrangement are shown in Figure 5 as v1 through v4. They are all in adjacent "virtual rooms".
It is known in the prior art to enhance the synthesis of a virtual sound source by adding several additional virtual sound sources, derived directly from the original, which correspond to room reflections. However, note that this simulation
is a free-field one, because all of the sources have a direction associated with them, and encounter the listener only once.
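The backward extrapolation described above is equivalent to mirroring the source position across each wall (the image-source construction). A minimal sketch, assuming the listener at the origin of a plan-view coordinate system with walls at x = ±width/2 and y = ±length/2; the `image_sources` helper and its coordinate convention are illustrative, not from the patent:

```python
def image_sources(src, width, length):
    """First-order (wall) image sources for a rectangular room in plan view.

    src : (x, y) source position, with the room centred on the origin.
    Mirroring across the wall at x = +width/2 maps x to (width - x), etc.
    Returns the four virtual sources v1..v4 in the adjacent virtual rooms.
    """
    x, y = src
    return [
        (width - x, y),    # reflection off the right wall  (x = +width/2)
        (-width - x, y),   # reflection off the left wall   (x = -width/2)
        (x, length - y),   # reflection off the front wall  (y = +length/2)
        (x, -length - y),  # reflection off the rear wall   (y = -length/2)
    ]

# Source 1 m right and 2 m forward of the listener, in a 4 m x 6 m room.
v = image_sources((1.0, 2.0), width=4.0, length=6.0)
```

The straight-line distance from the listener (origin) to each image source equals the folded reflection path length a + b described in the text, which is what makes the construction useful for delay-line modelling.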
The second common approach to the concept of reflected acoustical waves in rooms is the use of "virtual reverberation". This is achieved by creating several reverberant signals from the input sound-source, and then sending them to an array of virtual sources placed around the listener. Figure 6 shows this type of arrangement, in which the primary (direct) sound-source is positioned at about −30° azimuth from the listener, and there are four virtual speakers, placed at ±45° and ±135° azimuth, which deliver the reverberant signals. Each of the virtual speakers would require its own HRTF processing as shown in Figure 2.
The angles of the speakers are not critical, and it is possible to use a different number of virtual speakers. In the extreme, only two could be used, positioned laterally at ±90°, say. However, this is not satisfactory, because the reverberant signals would all appear to the listener to originate from two point sources at precisely ±90°. It is an aim of the present invention to overcome this problem. The use of four virtual speakers (Figure 6) is a compromise solution which distributes the reverberant signal between four sources, but it is not ideal. Additional virtual speakers could be employed, but each additional speaker requires its own HRTF processing algorithm, which is costly in signal-processing terms. Also, interference between the channels degrades the audio quality, and can "collapse" the sound image. Four speakers is a practical compromise.
However, note that this simulation also is a free-field one (despite the use of reverberant signals), because all of the reverberant virtual sources have a direction associated with them, and each of their signals encounters the listener only once.
In practice, neither of these prior-art schemes is truly satisfactory. For the critical headphone user, the accurate incorporation of all six first-order reflections makes little difference to the 3D-audio simulation. The addition of 3D-reverberation is more effective, in that it can create the correct ambient sound for a particular acoustic environment, but it does not help in "externalising" the sound image: the primary source still sounds close to the head. Indeed, even when great care is taken
to adjust the reverberation parameters, the inventor has found that it is difficult to achieve convincing "externalisation" effects, even when using a complex reverberation engine (featuring all six accurately-simulated first-order reflections, together with eight individual virtual reverberation sources).
The link which overcomes these problems, and enables much more effective simulation of 3D-audio, is wave-scattering, as described in co-pending patent application number GB0022891.6, which is incorporated herein by reference.
The ray tracing models propose the existence of a great many virtual sources in adjacent rooms to the primary one, but assume that the room is free of scattering objects. The results do not externalise the headphone image properly, and they are not convincing in terms of natural reverberation quality either.
In reality, however, the presence of physical features in a room, such as loudspeakers, chairs, tables, and so on, all scatter the sound-waves from the sound-source. Consequently, the listener receives first the direct sound (by definition), which is followed quickly by a chaotic sequence of elemental contributions from the scattering objects, even before the first wall reflections arrive at the listener. It is this wave-scattering which is the dominant feature in the 5-30 ms period following the arrival of the first, direct wavefront. Following this, of course, the scattered waves themselves participate in the reflection and reverberation processes.
To create a truly realistic 3D-audio simulation, these scattered waves must be incorporated with the direct sound. However, a problem arises because the scattered contributions do not possess a specific direction-of-origin. They might have undergone reflection from oblique or grossly irregular surfaces; they might have diffracted around various objects. In short, the waveform at the listener's ear is analogous to the motion of a buoy in a harbour with choppy waves. One can discern the amplitude of displacement, but one cannot attribute a direction to the wave motion. The scattering phenomenon itself can be simulated, but how can the resultant waveform be transmitted to the listener?
An HRTF of some form must be used, if the tonal qualities are to remain consistent with those of the direct-sound simulation. One might consider that the best HRTFs to use for delivering the scattered waveform into each ear might be lateral ones (say ±90°), as with the virtual reverberation system above, but again, this is not satisfactory, because the scattered signals would appear to originate precisely from two small point sources at ±90°, rather than from a larger area all around the listener. What is needed is an effective "non-specific" localisation means, where the required effect is to create the perception that the simulated 3D-audio source (in this case, the synthesised scattered signals) is positioned in a general area of space with respect to the listener, rather than existing as a point source.
One might consider "averaging" the HRTFs in some way, in order to create a "vague", non-specific HRTF, but it is not obvious how to do this. For example, if the spectral data were to be averaged around the region of +90° (in the horizontal plane in the range, say, from +45° to +135°), what should be done about the ITD component of the HRTFs, which varies from around 300 µs to 680 µs in this range? It is known that if the spectral data itself is inaccurate or poor, perhaps through inadequate synthesis or the use of data from a poor-quality artificial head, then the brain can be confused and will localise the recorded sound source incorrectly. For example, some artificial-head recordings have provided only a rearward image through headphones, and some poor-quality '3D' systems cannot create the illusion of sounds originating from azimuth angles greater than 90°. The use of incorrect or inconsistent spectral data can also fragment an audio image, such that different types of sounds (e.g. from different instruments) occupy different positions in space.
An object of the present invention is to enable creation of non-specific localisation HRTFs. These "diffuse field" HRTFs have application in our co-pending patent application (GB0022891.6) relating to wave-scattering, and also in conjunction with virtual reverberation systems (Figure 6), to replace the multi-source reverberation array with a single, custom HRTF.
According to the present invention there is provided a method as specified in claims 1-3.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings, in which:
Figure 1 shows a plan view of an arrangement for measuring HRTFs,
Figure 2 shows a schematic diagram of the three elements of an HRTF,
Figure 3 shows a plan view of headphone replay of a signal from a virtual sound-source at a point in space corresponding to the direction associated with an HRTF,
Figure 4 depicts in plan view the ray-tracing method applied to a simple rectangular room,
Figure 5 shows the four wall-reflection-related virtual sound sources which correspond to a specific source-listener arrangement, shown in plan view,
Figure 6 shows in plan view a known arrangement for producing "virtual reverberation" for a headphone user,
Figure 7 shows a plot of inter-aural time delay (ITD) versus azimuth angle,
Figure 8 shows a cone shaped locus of points at one side of a head having a common value of interaural time-delay,
Figure 9 shows a near-ear response curve for two angles having a similar ITD before averaging,
Figure 10 shows a far-ear response curve for the same two angles as Figure 9 before averaging,
Figure 11 shows the data from Figures 9 and 10 after averaging,
Figure 12 shows how the averaged near-ear response from Figure 11 compares with a near-ear response for 90°,
Figure 13 shows how the averaged far-ear response from Figure 11 compares with a far-ear response for 90°,
Figure 14 shows a block diagram of an embodiment for creating an "out-of-the-head" sound image for headphone listening, and
Figure 15 shows a block diagram of a telephone handset according to the present invention.
Figure 16 shows a block diagram of a method according to the present invention.
The present invention provides a means of creating a "diffuse-field HRTF" (DFHRTF) by averaging a number of conventional HRTFs relating to a chosen solid angle with respect to the listener. The invention is based on the selection of HRTFs for averaging which possess similar ITD values, based on the "cone of confusion" principle. By selecting a number of HRTFs in the chosen solid angle which possess identical ITD values, the spectral data can be averaged to create a DFHRTF, retaining the original ITD value.
The inter-aural time delay (ITD), having a range from 0 to 680 µs, is probably the most powerful spatial sound cue component of the HRTF, and is dependent on the position of the sound-source with respect to the head and ears. If an inconsistent ITD is used with particular spectral data, then the imaging quality is compromised greatly: the sound image can become grossly incorrectly positioned (including elevated and depressed positions), and musical sources can become spatially incoherent ("distributed" in space), with different instruments and voices occurring in different positions.
It will be appreciated that the ITD cue is symmetrical (or nearly so) in the horizontal plane about the front-rear axis horizontally through the head. For example, the ITD associated with a loudspeaker positioned at an angle of 45° is about 362 µs, and this is also the time delay associated with an angle of 138°. As the sound source moves, for example, in a horizontal, clockwise circle around the listener, starting from a point directly ahead (0° azimuth), the ITD increases to a maximum of about 680 µs when the source is directly on the right-hand side of the listener, and then diminishes to zero again when the source is directly behind the listener. Figure 7 shows a plot of ITD (in time units of 22.68 µs, corresponding to 44.1 kHz samples) vs. azimuth angle in the horizontal plane. As the source continues from this rearward position, around to the left-hand side of the listener, the delay increases again, to minus 680 µs (i.e. the left-ear signal now leads the right-ear signal), and then diminishes to zero as the source arrives back at the 0° position. Consequently, there are two positions of azimuth (in the horizontal plane) which correspond to any single value of ITD (e.g. 45° and 138°, as shown in Figure 7). As might be expected from the near symmetry, these angles are nearly supplementary (i.e. nearly summing to 180°).
A consequence of this is that the brain, after identifying first the ITD, must decide on the basis of spectral information alone which of the two (or more) possible locations contains the sound source.
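The ITD-versus-azimuth behaviour described above is often approximated with a spherical-head (Woodworth-style) model. The sketch below assumes that model with a nominal 8.75 cm head radius; because the model head is perfectly spherical, its paired angles sum to exactly 180°, whereas the measured pair quoted above is 45°/138°:

```python
import math

def itd_us(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth spherical-head ITD approximation, in microseconds.

    Positive azimuth = source clockwise (to the listener's right).
    The front/rear symmetry described in the text falls out of folding
    the lateral angle about 90 degrees, so theta and (180 - theta)
    share an ITD and lie on the same cone of confusion.
    """
    az = azimuth_deg % 360.0
    sign = 1.0 if az <= 180.0 else -1.0      # right vs left of listener
    lateral = az if az <= 180.0 else 360.0 - az
    folded = min(lateral, 180.0 - lateral)    # front/rear pair share an ITD
    theta = math.radians(folded)
    return sign * (head_radius_m / c) * (theta + math.sin(theta)) * 1e6
```

With these constants the model gives roughly 656 µs at 90°, close to the 680 µs maximum quoted in the text, and zero everywhere on the median plane.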
Because the head is approximately spherical, the symmetry extends to the third dimension. For example, in addition to the horizontal-plane 0° and 180° directions, there are two more poles where the time-delay is zero, namely the positions directly above and directly below the listener. Indeed, any position on the median plane (the vertical plane bisecting the listener's head from front to back) has a zero ITD. In fact, any particular value of time-delay can be represented by a corresponding locus at one side of the head: this locus is cone-shaped (see Figure 8) and is sometimes called the "Cone of Confusion", because if the spectral data of a particular HRTF is distorted or incorrect in some way (for example, if it does not match the listener's own characteristics), then the associated virtual sound image appears to be incorrectly located. Generally, the image becomes localised by the listener somewhere on the rim of the particular cone of confusion associated with the ITD of the distorted HRTF, because the ITD component of the HRTF is such a powerful sound cue.
It appears that the brain uses, first of all, the ITD to determine on which cone of confusion to localise the virtual sound-source, following which the spectral data determines where on the rim of the cone the source appears. By selecting
HRTFs which possess a common ITD, their spectral data can be averaged so as to be spatially ambiguous, but still consistent with the ITD, and therefore unlikely to cause fragmentation of the audio image. It is thus possible to choose a region of space around the listener where the virtual source is to be positioned, determine which is the cone of confusion rim passing closest to the centroid of this region, and then average, respectively, the near-ear and far-ear spectral data of the HRTFs on this rim to create a region-specific diffuse field HRTF.
There are a number of different spatial regions for which the use of related, specific diffuse-field HRTFs would be valuable; these include, but are not limited to, the following.
Spatial region 1: Full Lateral.
Average all HRTFs having ITDs of around 300 µs (representing all positions on the whole rim of the cone of Figure 8). (Note that this offers both a LHS and a RHS diffuse HRTF, as do all other options except #4 and #6.) Application: Wave-scattering effects.
Spatial region 2: Upper Lateral.
Average HRTFs having ITDs of around 300 µs and above the horizontal plane only (i.e. the upper half of the cone rim).
Applications: Virtual 3D-reverberation (most real world reverb occurs above the listener).
Spatial region 3: Partial Upper Lateral.
Average HRTFs having ITDs of around 300 µs above the horizontal plane and within azimuths of 45° to 135° only (the central portion of the upper half of the rim).
Applications: Cinema type 3D-reverberation (lateral, elevated bias for predominantly lateral sources).
Spatial region 4: Central.
Average all HRTFs having ITDs of zero (i.e. lying on the median plane).
Application: Centralised for representing short, wide rooms.
Spatial region 5: Upper Central.
Average all HRTFs having ITDs of zero (lying on the median plane) above the horizontal plane.
Application: Represents short, wide, high rooms.
Spatial region 6: Above the listener.
Average all HRTFs having ITDs of zero (lying on the median plane) above the horizontal plane in a restricted range of elevation values (say, from +45° to +90°).
Application: Represents overhead reverb (cathedral-type effect).
These are just a few examples of how the primary HRTFs can be averaged to create region-specific DFHRTFs. There is a full description of the use of diffuse-field HRTFs for wave-scattering applications in our co-pending application GB0022891.6.
Figures 9 to 13 show a simple practical example of the invention, in which a pair of horizontal-plane HRTFs having identical ITDs are averaged so as to create a lateral diffuse-field HRTF. The objective here is to create a lateral, diffuse HRTF suitable for use with non-directional scattered-wave simulation, in order to create non-specific lateral positioning. First, two horizontal-plane positions with identical ITDs are chosen so as to represent a reasonably wide spread of angle (around the 90° direction), but still represent a non-central location. The HRTFs for +45° and +138° are ideal for this purpose. Figure 9 shows the near-ear data for both of these HRTFs, and Figure 10 shows the far-ear data. By averaging, respectively, their near-ear and far-ear amplitude responses as a function of frequency, the corresponding data for the resultant diffuse-field HRTF is obtained (Figure 11). When this new HRTF is used to process a sound recording, the resulting virtual image sounds as if it is placed somewhere to the side of the listener, but, as required, the exact positioning cannot be determined. Wave-scattering and reverberation applied using this arrangement sounds natural. By contrast, if an individual HRTF were to be employed, say that of +90°, then the resultant image would be very precisely located. Wave-scattering and reverberation through this latter arrangement sounds quite artificial, as if it were emanating from a single point-source loudspeaker.
The following points, visible in the Figures, are worthy of note.
Figure 9: main differences between the two primary HRTF near-ear responses occur between 2 kHz and 4 kHz.
Figure 10: differences between the two primary HRTF far-ear responses are much smaller. However, note the common 7 dB trough at about 1.8 kHz: this seems to be caused by head diffraction, and is absent in the +90° far-ear characteristic.
Figure 11: the new diffuse spectral data exhibits the common 7 dB trough at about 1.8 kHz.
Figure 12 shows the near-ear new spectral data in comparison with that of the +90° HRTF. The main differences lie in the range 4 kHz to 8 kHz, where the diffuse data response is about 6 dB less than that of the +90° data.
Figure 13 shows the far-ear new spectral data in comparison with that of the +90° HRTF. A fundamental difference here is that the 7 dB trough at about 1.8 kHz has been incorporated into the diffuse data, and there is considerable difference in the range 3.5 kHz to 6 kHz.
Figure 14 shows a block diagram of the implementation of the invention for creating an "out-of-the-head" sound image for headphone listening, based on wave-scattering according to the co-pending application GB0022891.6. An audio signal input source is fed via a conventional HRTF processing stage (as in Figure 2), and the L and R outputs are fed into respective summing means. These represent the direct sound; the first to arrive at the listener. Simultaneously, the input source is fed into a pair of scattering filters, each creating respective scattered, non-directional signals. The signal from each scattering filter is fed into a diffuse HRTF (one corresponding to the left hemisphere, and the other to the right). The respective L and R outputs of both diffuse HRTFs are fed to the respective summing means, and combined with the output representing the direct sound. Consequently, the direct sound is precisely positioned, and the scattered sound is not precisely positioned.
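The Figure 14 topology can be sketched as a pure signal-flow function. All filter objects below are caller-supplied stand-ins, since the patent does not specify the scattering-filter design; the trivial lambdas in the usage example exist only to exercise the summing structure:

```python
import numpy as np

def out_of_head(mono, direct_hrtf, scatter_l, scatter_r,
                diffuse_hrtf_l, diffuse_hrtf_r):
    """Sketch of the Figure 14 signal flow (all filters hypothetical).

    direct_hrtf      : function mono -> (L, R), the conventional HRTF stage
    scatter_l/r      : functions mono -> scattered, non-directional signal
    diffuse_hrtf_l/r : functions mono -> (L, R), the two diffuse HRTFs
    """
    dl, dr = direct_hrtf(mono)                  # direct sound: precisely positioned
    sl_l, sl_r = diffuse_hrtf_l(scatter_l(mono))  # left-hemisphere scattered path
    sr_l, sr_r = diffuse_hrtf_r(scatter_r(mono))  # right-hemisphere scattered path
    # Respective summing means: direct + both diffuse contributions per ear.
    return dl + sl_l + sr_l, dr + sl_r + sr_r

# Trivial stand-in filters, just to exercise the topology.
ident = lambda x: (x, 0.5 * x)   # "HRTF" returning an (L, R) pair
scat = lambda x: 0.1 * x         # "scattering filter"
L, R = out_of_head(np.ones(8), ident, scat, scat, ident, ident)
```

The point of the structure is that only the direct path carries a direction-specific HRTF; the scattered paths pass through diffuse HRTFs, so their contribution sums into both ears without a precise image position.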
Another application is shown in Figure 15, relating to virtualisation for a telephone handset, such as for example a cell-phone. The cell-phone's internal audio source is fed via a pinna transfer function (that is, the near-ear characteristic of an HRTF; in this case the 90° near-ear function would be appropriate (Figure 12)) to a summing junction, and thence to the internal speaker. Simultaneously, the audio source is fed into a scattering filter, thus creating a scattered, non-directional signal, and thence via a diffuse pinna transfer function (that is, the near-ear characteristic of a lateral, diffuse HRTF, such as that also shown in Figure 12) to the summing junction and the internal speaker.
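The handset chain of Figure 15 is a single-channel variant of the same idea. The sketch below uses placeholder coefficients, not the measured 90° near-ear responses of Figure 12, and its names are assumptions for illustration only.

```python
import numpy as np

def fir(x, h):
    """Apply an FIR filter, truncating the convolution to the input length."""
    return np.convolve(x, h)[: len(x)]

# Placeholder coefficients (NOT measured data): a real handset would use the
# measured 90-degree near-ear pinna response and its diffuse counterpart.
pinna_direct = np.array([1.0, 0.25])         # pinna transfer function, direct path
scatter = np.array([0.0, 0.0, 0.45, 0.3])    # scattering filter: delayed, non-directional energy
pinna_diffuse = np.array([0.7, 0.15])        # diffuse pinna transfer function

def handset_virtualise(x):
    direct = fir(x, pinna_direct)                   # direct path via pinna function
    diffuse = fir(fir(x, scatter), pinna_diffuse)   # scattered path via diffuse pinna function
    return direct + diffuse                         # summing junction -> internal speaker
```

As in the headphone case, the first-arriving sound comes from the direct path alone, with the scattered, diffuse component following.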
Figure 16 shows a block diagram of the method of the present invention in broad terms. Block 1 denotes selecting from an existing library of HRTFs, or from a library of possible HRTFs capable of being synthesised by an HRTF synthesiser, a set of HRTFs having approximately equal ITD values. Block 2 denotes averaging the left ear response functions from the set of HRTFs at each frequency. Block 3 denotes the step of averaging the right ear response functions from the set of HRTFs at each frequency, and block 4 denotes combining the averaged left and right ear response functions with the corresponding ITD value to give an averaged diffuse-field HRTF.
Finally, the accompanying abstract is incorporated herein by reference.
Claims (5)
- CLAIMS
- 1. A method of synthesising an averaged diffuse-field head-related transfer function (HRTF) for use in three dimensional audio signal processing, the HRTF consisting of a left ear response function which varies with frequency, a right ear response function which varies with frequency, and an interaural time delay (ITD) which is substantially independent of frequency, the method consisting of or including:
a) selecting from an existing library of HRTFs, or from a library of possible HRTFs capable of being synthesised by an HRTF synthesiser, a set of HRTFs having approximately equal ITD values,
b) averaging the left ear response functions from the set of HRTFs at each frequency,
c) averaging the right ear response functions from the set of HRTFs at each frequency, and
d) combining the averaged left and right ear response functions with the corresponding ITD value to give an averaged diffuse-field HRTF.
- 2. A method as claimed in claim 1 in which the set of HRTFs selected in step a) correspond to a predetermined spatial region with respect to the preferred location of a listener in use.
- 3. A method of creating a sound image, apparently located external to the head of a listener in use, by combining a direct sound modified by a conventional free-field HRTF, and the same sound modified by a scattering filter followed by one or more averaged diffuse-field HRTFs synthesised according to claim 1.
- 4. A headphone system adapted to perform the method as claimed in claim 3.
- 5. A telephone handset adapted to perform the method as claimed in claim 3.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0029810A GB2369976A (en) | 2000-12-06 | 2000-12-06 | A method of synthesising an averaged diffuse-field head-related transfer function |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB0029810D0 GB0029810D0 (en) | 2001-01-17 |
| GB2369976A true GB2369976A (en) | 2002-06-12 |
Family
ID=9904595
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB0029810A Withdrawn GB2369976A (en) | 2000-12-06 | 2000-12-06 | A method of synthesising an averaged diffuse-field head-related transfer function |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2369976A (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101221763B (en) * | 2007-01-09 | 2011-08-24 | 昆山杰得微电子有限公司 | Three-dimensional sound field synthesizing method aiming at sub-Band coding audio |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1995023493A1 (en) * | 1994-02-25 | 1995-08-31 | Moeller Henrik | Binaural synthesis, head-related transfer functions, and uses thereof |
| US5742689A (en) * | 1996-01-04 | 1998-04-21 | Virtual Listening Systems, Inc. | Method and device for processing a multichannel signal for use with a headphone |
| WO1999031938A1 (en) * | 1997-12-13 | 1999-06-24 | Central Research Laboratories Limited | A method of processing an audio signal |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1929838A4 (en) * | 2005-10-01 | 2011-06-01 | Samsung Electronics Co Ltd | METHOD AND APPARATUS FOR PRODUCING A SPATIAL SOUND |
| US8340304B2 (en) | 2005-10-01 | 2012-12-25 | Samsung Electronics Co., Ltd. | Method and apparatus to generate spatial sound |
| EP2357854A1 (en) | 2010-01-07 | 2011-08-17 | Deutsche Telekom AG | Method and device for generating individually adjustable binaural audio signals |
| WO2012011015A1 (en) | 2010-07-22 | 2012-01-26 | Koninklijke Philips Electronics N.V. | System and method for sound reproduction |
| CN103053180A (en) * | 2010-07-22 | 2013-04-17 | 皇家飞利浦电子股份有限公司 | System and method for sound reproduction |
| US9107018B2 (en) | 2010-07-22 | 2015-08-11 | Koninklijke Philips N.V. | System and method for sound reproduction |
| CN103053180B (en) * | 2010-07-22 | 2016-03-23 | 皇家飞利浦电子股份有限公司 | System and method for sound reproduction |
| RU2589377C2 (en) * | 2010-07-22 | 2016-07-10 | Конинклейке Филипс Электроникс Н.В. | System and method for reproduction of sound |
| US10142761B2 (en) | 2014-03-06 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Structural modeling of the head related impulse response |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Hacihabiboglu et al. | Perceptual spatial audio recording, simulation, and rendering: An overview of spatial-audio techniques based on psychoacoustics | |
| AU2001239516B2 (en) | System and method for optimization of three-dimensional audio | |
| JP4633870B2 (en) | Audio signal processing method | |
| US6498857B1 (en) | Method of synthesizing an audio signal | |
| AU713105B2 (en) | A four dimensional acoustical audio system | |
| US6738479B1 (en) | Method of audio signal processing for a loudspeaker located close to an ear | |
| MXPA05004091A (en) | Dynamic binaural sound capture and reproduction. | |
| AU2001239516A1 (en) | System and method for optimization of three-dimensional audio | |
| US6990210B2 (en) | System for headphone-like rear channel speaker and the method of the same | |
| KR20080079502A (en) | Stereo sound output device and method for generating early reflection sound | |
| US10440495B2 (en) | Virtual localization of sound | |
| GB2369976A (en) | A method of synthesising an averaged diffuse-field head-related transfer function | |
| EP0959644A2 (en) | Method of modifying a filter for implementing a head-related transfer function | |
| US7050596B2 (en) | System and headphone-like rear channel speaker and the method of the same | |
| Pelzer et al. | 3D reproduction of room auralizations by combining intensity panning, crosstalk cancellation and Ambisonics | |
| GB2366975A (en) | A method of audio signal processing for a loudspeaker located close to an ear | |
| US6983054B2 (en) | Means for compensating rear sound effect | |
| US20060109986A1 (en) | Apparatus and method to generate virtual 3D sound using asymmetry and recording medium storing program to perform the method | |
| GB2353926A (en) | Generating a second audio signal from a first audio signal for the reproduction of 3D sound | |
| Glasgal | Improving 5.1 and Stereophonic Mastering/Monitoring by Using Ambiophonic Techniques | |
| JPH05115098A (en) | Stereophonic sound field synthesis method | |
| Bahri | Loudspeaker directivity and playback environment in acoustic crosstalk cancelation | |
| KR100705930B1 (en) | 3D sound realization device and method | |
| Corey | An integrated system for dynamic control of auditory perspective in a multichannel sound field | |
| Lee et al. | Reduction of sound localization error for non-individualized HRTF by directional weighting function |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) | ||
| WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |