
WO2018162803A1 - Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes - Google Patents

Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes

Info

Publication number
WO2018162803A1
WO2018162803A1 (PCT/FI2018/050172)
Authority
WO
WIPO (PCT)
Prior art keywords
spherical harmonic
digital representation
arrangement
representation
harmonic digital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/FI2018/050172
Other languages
English (en)
Inventor
Archontis Politis
Sakari TERVO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aalto Korkeakoulusaatio sr
Original Assignee
Aalto Korkeakoulusaatio sr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aalto Korkeakoulusaatio sr filed Critical Aalto Korkeakoulusaatio sr
Publication of WO2018162803A1
Current legal status: Ceased

Classifications

    • G10L 21/0272: Voice signal separating (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H04S 3/02: Systems employing more than two channels, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H04S 7/303: Tracking of listener position or orientation (control circuits for electronic adaptation of the sound field)
    • H04S 7/304: Tracking of listener position or orientation, for headphones
    • H04S 7/305: Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S 7/306: Electronic adaptation to reverberation of the listening space, for headphones
    • H04S 2400/13: Aspects of volume control, not necessarily automatic, in stereophonic sound systems
    • H04S 2400/15: Aspects of sound capture and related signal processing for recording or reproduction
    • H04S 2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTFs] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S 2420/11: Application of ambisonics in stereophonic audio systems
    • H04R 27/00: Public address systems
    • H04R 2227/003: Digital PA systems using, e.g., LAN or internet

Definitions

  • the present invention generally pertains to audio processing.
  • the invention relates to spatial audio processing and enhancing ambisonically encoded sound scenes through analysis of related data.
  • Spatial sound recording, processing and reproduction are generally moving away from playback setup-based channel formats, e.g. stereo or 5.1 surround, to systems that are flexible and able to appropriately render spatial sound scenes to arbitrary playback systems.
  • Such systems can be intended exclusively for synthetically produced sound scenes, where parametric spatial information, such as position and orientation, is attached to all sounds in the scene, and each individual sound is transmitted to the client for rendering and playback. This approach is termed object-based.
  • Alternatively, all the sound objects can be encoded, using appropriate mixing tools, into a number of audio signals that describe the whole sound scene, an approach sometimes termed scene-based, or spatial-transform-based.
  • An advantage of scene-based encoding and reproduction over object-based is the reduction of bandwidth requirements to a fixed number of channels, instead of a potentially large number of object channels, and the important possibility to represent recorded, real sound scenes, captured with appropriate spatial audio recording devices employing microphone arrays.
  • One prominent scene-based approach is Ambisonics, which uses spherical harmonics as spatial basis functions to represent a sound scene.
  • Ambisonics basically encode the spatial properties of the sound scene as level differences between audio channels, without additional metadata.
  • A sound scene may refer both to a synthesized spatial sound scene 102 created e.g. in a studio or workstation with appropriate software/hardware and ambisonic encoding 104, and to a real sound scene captured with an ambisonic microphone 103 and encoded 105, as shown at 100 in Fig. 1.
  • Ambisonics define a hierarchical spatial audio format, wherein increasing orders define an increasing spatial resolution, with a corresponding increasing number of audio channels describing the scene. Due to technological limitations, Ambisonics were limited to the first-order format in the past, described by four audio channels, termed here first-order Ambisonics (FOA).
  • Higher-order formats, described by a correspondingly larger number of audio channels, are termed here higher-order Ambisonics (HOA); both FOA and HOA signals are referred to as Ambisonics or ambisonically encoded signals.
  • The encoded sound scene can be reproduced on a playback system for listening through appropriate mixing of the channels, which depends only on the target playback system and the available order of Ambisonics, FOA or HOA. This process is called ambisonic decoding.
  • Additionally, useful transformations such as rotations can be applied to the sound scene. Rotations are useful e.g. in headphone playback, since they can stabilize the perceived playback sound scene for the listener when combined with head-tracking headphones or an external head-tracking system.
  • Since ambisonic decoding is signal-independent, it does not consider the content of the sound scene at all. To improve upon the limitations of Ambisonics, some signal-dependent methods have been developed that perform a time-frequency analysis of the ambisonic signals and aim to extract parameters that they then use to sharpen reproduction and improve overall perceptual quality.
  • One such solution is Directional Audio Coding, which aims to improve FOA signals by extracting one direction-of-arrival (DoA) and one diffuseness parameter, essentially breaking the sound scene into a directional sound stream and a non-directional/diffuse sound stream.
  • Another such method is HARPEX, which works only with FOA signals and assumes two directional sounds active at each time-frequency point, hence extracting two directional streams.
  • With such methods, the resolution at reproduction is in theory limited only by the capabilities of the playback system.
  • Low-order ambisonic decoding fails, however, to utilize the full resolution of arbitrary playback systems due to the inability of the decoding stage to duly approximate the appropriate spatialization functions of the playback system, such as panning functions for loudspeakers or head-related transfer functions (HRTFs) for headphones.
  • Ambisonic sound scenes often exist in a lower-order format only, mainly FOA, due to the lack of HOA microphones on the market.
  • Synthetic sound scenes, in contrast, may be more conveniently produced directly in a higher-order format, e.g. third or fourth, to preserve the directional sharpness of the material.
  • Ambisonic modifications to a sound scene are, despite their usefulness on some occasions, limited to global spatial modifications, such as the aforesaid rotations of the scene, directional blurring, and warping, meaning that sounds from certain directions are pushed closer together, while others are stretched further apart.
  • These can be thought of as modifications of a 360° picture, in which the image can be rotated, blurred or distorted across certain directions.
  • an objective of the present invention is to at least alleviate one or more of the above problems and challenges associated with prior art solutions in the context of audio encoding or decoding involving spherical harmonic digital representation of sound scenes, or specifically, as termed above, Ambisonics.
  • An electronic arrangement for cultivating a spherical harmonic digital representation of a sound scene comprises at least one data interface for transferring data, at least one processing unit for processing instructions and other data, and memory for storing the instructions and other data, said at least one processing unit being configured, in accordance with the stored instructions, to cause: obtaining the spherical harmonic digital representation of the sound scene, preferably comprising an ambisonically encoded digital representation; determining through analysis of said spherical harmonic digital representation a number of related spatial parameters indicative of at least dominant sound sources in the sound scene, their directions-of-arrival (DOA) and associated powers, wherein time-frequency decomposition of said spherical harmonic digital representation is preferably utilized to divide the representation into a plurality of analysed frequency bands, said bands optionally reflecting human auditory frequency resolution; and providing said spherical harmonic digital representation, preferably as divided into said plurality of frequency bands, and said number of spatial parameters to spatial filtering in order to produce an output signal for audio rendering and/or upmixing the representation to a higher order.
  • An electronic arrangement for processing a spherical harmonic digital representation of a sound scene comprises at least one data interface for transferring data, at least one processing unit for processing instructions and other data, and memory for storing the instructions and other data, said at least one processing unit being configured, in accordance with the stored instructions, to cause: obtaining the spherical harmonic digital representation of the sound scene, preferably divided into a plurality of frequency bands, and a number of related spatial parameters indicative of at least dominant sound sources in the scene, their directions-of-arrival (DOA) and associated powers; subjecting the spherical harmonic digital representation to spatial filtering and audio rendering or spatial filtering and upmixing, wherein corresponding matrices for decomposition of the spherical harmonic digital representation and rendering to audio signals associated with respective playback channels, or upmixing to a higher-order representation, both for the dominant sound sources and the ambient component, are determined based on the spatial parameters; and respectively providing the resulting, rendered signals forward for audio playback via a number of transducers associated with the playback channels, optionally speakers such as loudspeakers or headphones, or the upmixed, higher-order spherical harmonic digital representation of the sound scene for storage.
  • An electronic arrangement for processing a low-bandwidth indication of a spherical harmonic digital representation of a sound scene comprises at least one data interface for transferring data, at least one processing unit for processing instructions and other data, and memory for storing the instructions and other data, said at least one processing unit being configured, in accordance with the stored instructions, to cause: obtaining a number of dominant sound source signals and a monophonic ambient signal resulting from decomposing the spherical harmonic digital representation, preferably divided into a plurality of frequency bands, and further receiving a number of related spatial parameters indicative of at least dominant sound sources in the scene, their directions-of-arrival (DOA) and associated powers; subjecting said dominant sound source signals and said ambient signal to audio rendering, utilizing said spatial parameters and involving distribution of the dominant sound source signals and said ambient signal among a number of playback channels, or to upmixing involving re-encoding the signals to a higher-order spherical harmonic representation; and respectively providing the resulting, rendered signals forward for audio playback via a number of transducers associated with the playback channels, preferably speakers such as loudspeakers or headphones, or the upmixed, higher-order spherical harmonic digital representation of the sound scene for storage.
  • A method for cultivating a spherical harmonic digital representation of a sound scene to be performed by an electronic arrangement comprises: obtaining the spherical harmonic digital representation of the sound scene; determining through analysis of said spherical harmonic digital representation a number of related spatial parameters indicative of at least dominant sound sources in the scene, their directions-of-arrival (DOA) and associated powers, wherein time-frequency decomposition of said spherical harmonic digital representation is utilized to divide the representation into a plurality of analysed frequency bands, said bands optionally reflecting human auditory frequency resolution; and providing said spherical harmonic digital representation, preferably as divided into said plurality of frequency bands, and said number of spatial parameters to spatial filtering in order to produce an output signal for audio rendering or upmixing the representation to a higher order.
  • A method for processing a spherical harmonic digital representation of a sound scene comprises: obtaining the spherical harmonic digital representation of the sound scene, preferably divided into a plurality of frequency bands, and a number of related spatial parameters indicative of at least dominant sound sources in the scene, their directions-of-arrival (DOA) and associated powers; subjecting the spherical harmonic digital representation to combined spatial filtering and audio rendering or combined spatial filtering and upmixing, wherein corresponding matrices for decomposition of the spherical harmonic digital representation and rendering to audio signals associated with respective playback channels, or upmixing to a higher-order representation, both for the dominant sound sources and the ambient component, are determined based on the spatial parameters; and respectively providing the resulting rendered signals forward for audio playback via a number of transducers associated with the playback channels, optionally speakers such as loudspeakers or headphones, or the upmixed, higher-order spherical harmonic digital representation of the sound scene for storage.
  • A method for processing a low-bandwidth indication of a spherical harmonic digital representation of a sound scene to be performed by an electronic arrangement comprises: obtaining a number of dominant sound source signals and a preferably monophonic ambient signal resulting from decomposing the spherical harmonic digital representation, preferably divided into a plurality of frequency bands, and further receiving a number of related spatial parameters indicative of at least dominant sound sources in the scene, their directions-of-arrival (DOA) and associated powers; subjecting said dominant sound source signals and said ambient signal to audio rendering, utilizing said spatial parameters and involving distribution of the dominant sound source signals and said ambient signal among a number of playback channels, or to upmixing involving re-encoding the signals to a higher-order spherical harmonic representation; and respectively providing the resulting, rendered signals forward for audio playback via a number of transducers associated with the playback channels, preferably speakers such as loudspeakers or headphones, or the upmixed, higher-order spherical harmonic digital representation of the sound scene for storage.
  • Various embodiments of the present invention provide enhanced playback and flexible manipulation of HOA signals using parametric information.
  • The solution uniquely applies an acoustic model of the sound scene that considers multiple directional sounds and a diffuse ambient signal.
  • The solution focuses on spatially transformed HOA signals and aims at playback of the whole sound scene, without rejecting ambience and reverberation.
  • The underlying model utilizes acoustical array techniques of DoA estimation and beamforming.
  • The HOA signals that serve as input to the method may originate either from a microphone array recording or from a software source such as mixing software.
  • The analysis and synthesis can be performed in a suitable time-frequency transform domain, such as the short-time Fourier transform, or a perceptually optimized filter bank.
  • The analysis/synthesis may proceed in time frames analyzed and processed in a number of frequency bands.
  • The applied time-frequency processing improves estimation accuracy due to e.g. improved separability and sparsity of the source and diffuse signals, being hence in better agreement with the assumed model.
  • Various embodiments of the present invention yield increased spatial resolution at reproduction for e.g. loudspeakers or headphones; instead of directly transforming the ambisonic signals for spatial modifications and decoding, the solution suggested herein analyzes the statistics of the ambisonic signals and extracts spatial parameters describing the dominant sources in the scene, such as their directions-of-arrival (DoA) and their powers. It then estimates their signals, along with a residual that is not modeled by the dominant source signals and hence represents the remaining sounds in the scene, corresponding to reverberation and ambience. As the solution knows the DoAs of the estimated sources, it can use them to spatialize the sources on the playback system with the highest spatial resolution it can offer.
  • The ambience component may be enhanced to achieve maximally diffuse properties by processing it separately through a network of decorrelators, something which cannot be done with e.g. direct ambisonic decoding.
  • This kind of diffuse processing basically restores the spaciousness of the ambient and reverberant sound, which is otherwise degraded by the high correlation between the playback signals in direct ambisonic decoding.
  • The suggested solution is flexible in terms of playback setup.
  • Panning functions that better suit the target playback system, such as amplitude or vector-base amplitude panning (VBAP) functions, can be utilized.
  • Various embodiments of the present invention enable more advanced and flexible spatial modifications of the captured or constructed sound scene.
  • The embodiments can be harnessed to introduce a wide variety of such novel, meaningful modifications based on the analyzed parameters. Since the parameters are rather intuitive (DoAs of sources, levels of sources and ambience), they can be conveniently utilized by a sound effect developer to design interesting effects for manipulating the sound scene. Examples of applicable modifications include selective attenuation of certain detected sources in the scene, control of the level of the ambient component, spatial re-mapping of certain sources in the scene, and visualization of the source parameters. These can be especially useful when editing the spatial sound scene for combination and alignment with immersive video content, for example.
  • These benefits extend also to lower-order ambisonic recordings and mixes, e.g. first or second order.
  • The expression "a number of" refers herein to any positive integer starting from one (1), e.g. one, two, or three.
  • The expression "a plurality of" refers herein to any positive integer starting from two (2), e.g. two, three, or four.
  • Figure 1 illustrates two common use scenarios involving Ambisonics
  • Figure 2 illustrates a potential context and use scenario of various embodiments of the present invention, related entities and electronic arrangements
  • Figure 3 illustrates an embodiment of the present invention for cultivating spherical harmonic representations of sound scenes
  • Figure 4 illustrates one other embodiment of the present invention for cultivating spherical harmonic representations of sound scenes
  • Figure 5 illustrates an embodiment of an audio analyser applicable e.g. in connection with the embodiments of Figs. 3 and 4;
  • Figure 6 illustrates an embodiment of establishing audio modification/separation matrices for use e.g. with the embodiment of Fig. 3 and based on spatial parameters provided by an analyser such as the analyser of Fig. 5;
  • Figure 7 illustrates embodiments of spatial filtering and rendering as well as upmixing, applicable e.g. in connection with the embodiment of Fig. 3;
  • Figure 8 illustrates embodiments of source panning and diffuse rendering as well as upmixing, applicable e.g. in connection with the embodiment of Fig. 4.
  • Figure 9 is a high-level flow diagram depicting the internals of various embodiments of a method in accordance with the invention.
  • User devices (UE) 204a, 204b, 204c of users may refer to wired or wireless terminals, e.g. desktop or laptop computers, smartphones, tablets, which may be functionally connected to a communications network 210 by a suitable wired or wireless transceiver, for example, to access the network 210 and remote entities operatively reachable therethrough.
  • the network 210 may include or be at least operatively connected to various private and/or public networks, e.g. the internet.
  • One of such user devices 204a, 204b, 204c and/or a number of further devices, such as a server 220, also connected to the network, may be configured to host at least a portion of one or more embodiments of an electronic arrangement described herein and/or to execute the related method(s) suggested.
  • There may be further device(s) 218 included in the environment that do not contribute to encoding, cultivating or decoding the spherical harmonic digital representations of sound scenes but participate in storing or transferring them, for example.
  • the arrangement typically includes at least one processing unit 222, such as a microprocessor or a digital signal processor, for processing instructions and other data, a memory 228, such as one or more memory chips optionally integrated with the processing unit 222, for storing instructions and other data, and a (data) communication interface 224, such as a transceiver, transmitter and/or receiver, for transferring data e.g. via the network 210.
  • the interface 224 may comprise wired and/or wireless means of communication.
  • the arrangement may comprise or be at least operatively connected to a microphone such as an ambisonic microphone 230 for capturing a sound scene and sounds 229 for ambisonic encoding instead of or in addition to encoding a locally or externally synthetically produced sound scene.
  • the arrangement may comprise or be at least operatively connected to a speaker 231 such as one or more loudspeakers, a related playback system, or e.g. headphones for audio output 232 of a reproduced sound scene.
  • the arrangement may thus be configured to locally obtain or receive an externally created spherical harmonic digital representation of a sound scene, which may be captured via microphone(s) and/or be of synthetic origin (e.g., created directly digitally with a computer).
  • one or more embodiments of the arrangement in accordance with the present invention may be implemented and/or related method(s) executed by any one or more of the user devices 204a, 204b, 204c and/or other computing devices such as the server 220.
  • Intermediate device(s) 218 may be utilized for storing and/or transferring related data, for instance.
  • One or more of the devices 204a, 204b, 204c, 220 may be configured to analyze the digital representation of a sound scene, whereas the same or different devices 204a, 204b, 204c, 220 may take care of the decoding/rendering side activities.
  • The suggested solution may take as input an ambisonic stream, and produce signals for a) loudspeakers with arbitrary setups (stereo, 5.1, 7.1, hexagons, octagons, cubic, 13.1, 22.2, arbitrary ones), b) headphones employing head-related transfer functions for effective 3D sound rendering personalized to the user, and with head-tracking support, and/or c) ambisonic signals of a higher order than the original (upmixing).
  • the method can work with ambisonic signals of basically any resolution, such as the common first-order ambisonics (FOA) of 4 channels, or higher-order ambisonics (HOA) of 9, 16, 25, or 36 channels for example.
  • variants 300, 400 of the solution generally suggested herein are discussed in detail hereinafter.
  • Various embodiments of the present solution may rely e.g. on a time-frequency decomposition 302, 402 of the FOA/HOA signals 301, 401 using an appropriate time-frequency transform or a filter bank.
  • The created frequency channel(s) 303, 403 may then be handled separately.
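
As a rough illustration of such a decomposition, the following sketch (in Python; the signal arrays and parameter values are illustrative assumptions, not the patent's own implementation) applies a short-time Fourier transform to each ambisonic channel so that later analysis and synthesis can operate per time-frequency tile:

```python
import numpy as np
from scipy.signal import stft, istft

def tf_decompose(ambi, fs, nfft=1024, hop=512):
    """STFT of each ambisonic channel.
    ambi: (Q, n_samples) array of FOA/HOA signals.
    Returns complex spectra of shape (Q, n_bins, n_frames)."""
    _, _, spec = stft(ambi, fs=fs, nperseg=nfft, noverlap=nfft - hop, axis=-1)
    return spec

def tf_reconstruct(spec, fs, nfft=1024, hop=512):
    """Inverse STFT back to (Q, n_samples) time-domain signals."""
    _, x = istft(spec, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    return x

# Example: a placeholder 4-channel FOA scene at 48 kHz.
fs = 48000
foa = np.random.randn(4, fs)
spec = tf_decompose(foa, fs)   # each (channel, band, frame) tile is
                               # then analyzed/processed separately
```
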
  • Both variants 300, 400 may utilize the same or at least similar analysis 304, 404 stage, which at each time step extracts spatial parameters for the sound scene, incorporating e.g. estimation of the number of dominant sources, their respective directions-of-arrival (DoA), and preferably also their powers.
  • The two variants 300, 400 differ e.g. in where the signal decomposition happens and in the resulting bandwidth requirements.
  • The first variant 300, termed here the high-bandwidth version, may be configured to decompose the sound scene and render or upmix the associated components in one stage 308.
  • This version preferably utilizes all the FOA/HOA channels that are to be transmitted to the decoder, along with the spatial parameters, and is considered advantageous when e.g. maximum quality is desired, or when encoding and decoding may naturally occur in one stage (e.g. one application performing both in the same machine).
  • This version preserves the original sound scene impression as much as possible.
  • the decoder obtains/receives 360 the FOA/HOA channels 303 as well as spatial parameters 536, 538 used to form mixing matrices 632, 634 to enable decomposition of the sound scene (spatial filtering 732) and rendering 750, 752 to loudspeakers/headphones 310 or upmixing 754, 756.
  • the derived matrices 632, 634 may be further adapted based on the information 730 about the loudspeaker setup, headphone characteristics (such as headphone calibration filters) and e.g. the user's headphone spatialization filters (e.g. head-related transfer functions), if available.
  • the matrices may combine the decomposition of the sound scene and rendering to speakers/upmixing in one stage, for example. Accordingly, at least one matrix 632 may be determined to decompose and render the source components in the sound scene, and at least one other one 634 to decompose and render the ambient component. Furthermore, if upmixing 312 to higher-order Ambisonics is desired, the matrices may be configured to take into account the desired target order 742.
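
A minimal sketch of how such combined decomposition-plus-rendering matrices could be formed follows; the steering matrix, gain inputs and the pseudo-inverse construction are assumptions for illustration, not the patent's prescribed design:

```python
import numpy as np

def combined_matrices(A, G_dir, D_amb):
    """Sketch of combining sound-scene decomposition and rendering in one
    stage (cf. matrices 632, 634). Inputs are assumptions:
      A:     (Q, K) SH steering vectors of the K analyzed source DoAs
      G_dir: (L, K) spatialization gains (e.g. VBAP or HRTF-derived gains)
             for the source DoAs on the L output channels
      D_amb: (L, Q) ambisonic decoding matrix for the ambient part
    Returns M_dir and M_amb, both (L, Q), applied directly to the
    ambisonic signals of one frequency band."""
    W = np.linalg.pinv(A)                   # (K, Q) source separation
    M_dir = G_dir @ W                       # separate + render sources
    residual = np.eye(A.shape[0]) - A @ W   # remove source estimates
    M_amb = D_amb @ residual                # render ambient residual
    return M_dir, M_amb
```
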
  • The second variant 400, termed here the low-bandwidth version, results in a smaller number of channels to be used by the decoder than the number of input FOA/HOA channels. Hence this version is more suitable for efficient transmission and compression of the sound scene, without significantly compromising quality during rendering/upmixing.
  • the sound scene is decomposed at the encoder stage by spatial filtering block 414 into a number of sound source signals 416, variable at each time step, and a monophonic ambient signal 418.
  • The total number of source signals plus the ambient signal is smaller than or equal to half the number of FOA/HOA channels plus one (e.g. at most nine such signals for 16-channel third-order input).
  • the decomposed channels are then stored, transmitted or otherwise provided 460 to the decoder along with the spatial parameters such as DOA 836 and power parameters 838, which may correspond to output 536, 538 of the analysis 404 or result from modification in adjustment 406 executed based on e.g. user input/preferences. Further input may include information 830 on the playback setup.
  • the source signals are rendered 850 to speakers 410 of the playback setup using e.g. amplitude panning in the case of loudspeakers or e.g. head-related transfer functions for headphones.
  • the ambient signal may be distributed to the loudspeakers or headphones through rendering 852 involving decorrelation filters, to be perceived as diffuse and surrounding by the listener.
  • the source signals are re-encoded 854 to higher-order Ambisonics based on their analyzed directions-of-arrival and desired target order 842.
  • Spatial parameter(s), e.g. on associated power, may also be utilized.
  • Both variants 300, 400 enable and support modification 306, 406, 630 of the spatial parameters in the sound scene by the user, such as a) modification of the level difference between the source signals and the ambient signal, b) change of directions-of-arrival of the source signals towards other user-set ones, c) suppression of source signals coming from certain user-defined directions.
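
The following sketch illustrates modifications a) to c) applied to the analyzed parameters; all function and parameter names are hypothetical, and the representation of DoAs as unit vectors is an assumption:

```python
import numpy as np

def modify_scene(doas, powers, amb_gain_db=0.0, remap=None, mute_zones=()):
    """Illustrative user modifications of the spatial parameters:
    a) scale the ambient level, b) re-map selected source DoAs,
    c) suppress sources inside angular mute zones.
    doas: (K, 3) unit vectors; powers: (K,) source powers."""
    doas, powers = doas.copy(), powers.copy()
    if remap:                                    # b) user-set new DoAs
        for k, new_dir in remap.items():
            doas[k] = new_dir / np.linalg.norm(new_dir)
    for center, radius in mute_zones:            # c) (unit vector, radians)
        ang = np.arccos(np.clip(doas @ center, -1.0, 1.0))
        powers[ang < radius] = 0.0               # suppress matching sources
    return doas, powers, 10.0 ** (amb_gain_db / 20.0)   # a) ambient gain
```
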
  • the modification parameters 636 can be defined through an appropriate graphical user interface (GUI), for example, and be subsequently sent or otherwise provided to the decoder stage.
  • In the high-bandwidth version 300, the modification parameters contribute 306, 630 to the formulation of the separation and rendering matrices 632, 634, while in the low-bandwidth version 400 they contribute directly to the panning and diffuse rendering stage 408.
  • a source number estimator 532 may utilize eigendecomposition of the SCM and perform analysis of its eigenvalues and eigenvectors. Based on the estimated number of sources, a source signal space and an ambient signal space may be formed from the eigenvectors. These subspaces may be then utilized to estimate the directions-of-arrival 534 of the source signals, using any subspace method, such as the MUSIC method, or the weighted subspace fitting method.
  • The total sound scene power is preferably computed from the sum of powers of the ambisonic input signals. Using the directions-of-arrival, the power of each source component and subsequently the power of the ambient component may be estimated 535, the latter as the difference between the total and the sum of the source powers.
  • Ambisonics are based on the formalism of a continuous spatially band-limited amplitude distribution describing the incident sound field, and capturing all contributions of the sound scene, such as multiple sources, reflections, and reverberation.
  • the band-limitation refers to the spatial, or angular, variability of the sound field distribution, and a low-order representation approximates only coarsely sharp directional distributions and sounds incident with high spatial concentration.
  • The spatio-temporal amplitude distribution describing the sound field at time \(t\) is expressed by \(a(t, \boldsymbol{\gamma})\), where \(\boldsymbol{\gamma} = \boldsymbol{\gamma}(\theta, \phi)\) is a unit vector at azimuth \(\theta\) and elevation \(\phi\), respectively.
  • The vector \(\mathbf{y}(\boldsymbol{\gamma})\) contains the spherical harmonic (SH) functions \(Y_{nm}(\boldsymbol{\gamma})\) of order \(n\) and degree \(m\).
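
For reference, the spherical harmonic transform referred to above as Eq. 1 can plausibly be written as follows; the exact normalization convention is an assumption, since it is not recoverable from this text:

```latex
a_{nm}(t) = \int_{\mathbb{S}^2} a(t, \boldsymbol{\gamma})\, Y_{nm}(\boldsymbol{\gamma})\, \mathrm{d}\boldsymbol{\gamma},
\qquad
\mathbf{a}(t) = \left[ a_{00}(t),\, a_{1(-1)}(t),\, \ldots,\, a_{NN}(t) \right]^{\mathrm{T}},
\quad Q = (N+1)^2 .
```
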
  • The HOA signals can be obtained either by synthesizing a sound scene, i.e. directly encoding sound signals at desired directions along with reverberation, or by capturing a real sound scene.
  • Ambisonic recording obtains the ambisonic signals by sampling the sound field at some finite region around the origin with a microphone array.
  • The encoding then aims at optimally achieving the SHT of Eq. 1 based on the microphone recordings. Physical constraints severely limit broadband recording of the higher-order terms, with the limitation dependent on the array arrangement, the number of microphones and the overall array size. It is assumed here that we have access directly to the ambisonic signals, after encoding, with the best possible performance.
  • Recording the ambisonic signals is done with microphone arrays, most commonly spherical ones for practical and theoretical convenience. Recording begins by sampling the sound field pressure over a surface or volume around the origin, expressed through a number of \(M\) microphone signals \(\mathbf{x}\). These signals are transformed to spherical harmonic coefficients of order \(N \leq \sqrt{M} - 1\), hence expressing the field in the array region, and then extrapolated to the plane-wave density coefficients \(\mathbf{a}\), which are in theory independent of the array. Due to physical limitations, the frequency region over which this extrapolation is valid, and hence the acquisition of the ambisonic signals, depends on the size, geometric properties and diffraction characteristics of the array. The recording process can, however, be described in the compact form \(\mathbf{a}(f) = \mathbf{E}(f)\,\mathbf{x}(f)\), where \(\mathbf{E}(f)\) is the \(Q \times M\) matrix of encoding filters, which is derived by a constrained inversion of either the theoretical or measured directional response of the array.
  • Ambisonic decoding defines a linear mapping of the ambisonic signals a to L output channels of the reproduction system, defined through the ambisonic decoding matrix D of size L x Q . It is derived according to the spatial properties of the reproduction system and can be either frequency- independent (a matrix of gains), or frequency-dependent (a matrix of filters).
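
A minimal sketch of such a frequency-independent (gain-matrix) decoder, using a mode-matching design via the pseudo-inverse; this is only one of several possible decoder designs, and the helper inputs are assumptions:

```python
import numpy as np

def decode_ambisonics(a, Y_ls):
    """a: (Q, n_samples) ambisonic signals.
    Y_ls: (Q, L) matrix whose columns are SH vectors y(gamma_l) evaluated
    at the L loudspeaker directions. A mode-matching decoder is the
    pseudo-inverse D = pinv(Y_ls), an (L, Q) matrix of gains."""
    D = np.linalg.pinv(Y_ls)
    return D @ a    # L output channels
```
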
  • a second case of interest is isotropic diffuse sound, coming with equal power from all directions, which is a useful simplification of late reverberant sound.
  • This can be modeled as an amplitude distribution \(d(t, \boldsymbol{\gamma})\) whose angular correlation (Eq. 15) has a power component \(P_{\mathrm{diff}}\) that is independent of direction due to the isotropy assumption.
  • The ambisonic signals due to such diffuse sound are given by \(\mathbf{a}_{\mathrm{diff}}(t) = \int_{\mathbb{S}^2} d(t, \boldsymbol{\gamma})\, \mathbf{y}(\boldsymbol{\gamma})\, \mathrm{d}\boldsymbol{\gamma}\), with correlation matrix \(\mathbf{C}_{\mathrm{diff}} = \mathrm{E}[\mathbf{a}_{\mathrm{diff}}\mathbf{a}_{\mathrm{diff}}^{\mathrm{H}}] = P_{\mathrm{diff}}\,\mathbf{I}_Q\) under the isotropy assumption.
  • This correlation matrix forms the basis of the parametric estimation for the analysis and synthesis. According to the assumed field model, the total field power is \(P_{\mathrm{tot}} = \sum_{k=1}^{K} P_k + P_{\mathrm{diff}}\).
  • The analysis and synthesis are performed in a suitable time-frequency transform domain, such as the short-time Fourier transform, or a perceptually optimized filter bank. All quantities defined before are used in their time-frequency counterparts at time index \(l\) and frequency index \(k\), while correlations now denote subband correlations and are frequency-dependent.
  • the time-frequency processing improves estimation due to better separability and sparsity of the source and diffuse signals, and hence better agreement with the assumed model.
  • Dominance of directional or diffuse components in the sound scene is reflected in the structure of the spatial statistics of the signals, as captured in the correlation matrix of Eq. 21 with K sources and diffuse sound. Detection of these conditions is based on the subspace principle of sensor array processing.
  • The eigenvalue decomposition (EVD) of the correlation matrix has the form \(\mathbf{C} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{\mathrm{H}}\), where \(\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_Q)\) with the sorted eigenvalues \(\lambda_1 \geq \ldots \geq \lambda_q \geq \ldots \geq \lambda_Q > 0\), and \(\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_Q]\) contains the respective eigenvectors.
  • All the lowest eigenvalues, for \(K < q \leq Q\), should be equal and close to the diffuse power \(P_{\mathrm{diff}}\).
  • All the eigenvalues for \(1 \leq q \leq K\) are associated with the powers of both the sources and the diffuse field, with \(\lambda_q > P_{\mathrm{diff}}\).
  • the distribution of the eigenvalues reveals information about how many sources exist in the scene, and sources with significant direct-to-diffuse ratio (DDR) will be associated with eigenvalues significantly higher than the lower ones corresponding to the diffuse field. This information will be used in order to detect diffuse conditions and get an estimate of the number of significant sources in the sound scene.
  • Both detection and estimation use a frequency-averaged covariance matrix across multiple bins, in frequency ranges that are perceptually motivated and reflect human auditory frequency resolution, such as equivalent rectangular bandwidth (ERB) bands.
  • Estimation of the number of sources in the sound scene is based on analysis of the subspace decomposition.
  • Various approaches from array processing literature could be applied for this task. They can be based for example on analysis of dominant eigenvalues, eigenvalue ratios, eigenvalue statistics, analysis of the eigenvectors, or information theoretic criteria.
  • One applicable criterion is SORTE, which is based on eigenvalue statistics; a simple heuristic in the same spirit is sketched below.
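
As a simple stand-in for such criteria, the heuristic below counts eigenvalues well above the smallest one, taken as a proxy for the diffuse floor; this is an illustrative eigenvalue-gap detector, not the SORTE estimator itself:

```python
import numpy as np

def estimate_num_sources(C, gap=4.0):
    """Heuristic source count from a (Q, Q) spatial covariance matrix:
    count eigenvalues exceeding the smallest eigenvalue (a crude proxy for
    the diffuse-power floor) by a factor `gap`. At most Q - 1 sources are
    reported so that a noise/diffuse subspace always remains."""
    evals = np.linalg.eigvalsh(C)[::-1]      # sorted descending
    floor = max(evals[-1], 1e-12)
    return min(int(np.sum(evals > gap * floor)), len(evals) - 1)
```
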
  • DoA estimation can be performed by a variety of established methods from array signal processing. They vary widely in their complexity and performance, and should be chosen according to the sound scene and the application requirements.
  • DoA estimation can be done by narrowband DoA methods, which require scanning over a grid of directions and finding the associated maxima or minima. That can be done through analysis of the power maps of beamformers, such as the MVDR, or by subspace methods, such as MUSIC. We present an example based on MUSIC.
  • The source DoAs \(\Gamma_s \in \Gamma_g\) are found at the grid directions for which the minima of Eq. 35 occur.
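
A compact sketch of grid-based MUSIC in the spherical harmonic domain follows; Eq. 35 is taken to be the projection of the steering vector onto the noise subspace, complex spherical harmonics are used for brevity (the text implies real SHs), and the grid and helper names are assumptions:

```python
import numpy as np
from scipy.special import sph_harm

def sh_vector(order, azi, polar):
    """Complex SH steering vector up to `order`, length (order+1)**2.
    scipy's sph_harm takes (m, n, azimuth, polar angle)."""
    return np.array([sph_harm(m, n, azi, polar)
                     for n in range(order + 1)
                     for m in range(-n, n + 1)])

def music_doas(C, order, K, grid):
    """C: (Q, Q) SH-domain covariance; K: estimated source count;
    grid: list of (azimuth, polar) pairs. Returns the K grid directions
    where the null-spectrum is smallest."""
    evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    Vn = evecs[:, :C.shape[0] - K]       # noise-subspace eigenvectors
    null_spec = np.empty(len(grid))
    for i, (azi, pol) in enumerate(grid):
        y = sh_vector(order, azi, pol)
        null_spec[i] = np.real(y.conj() @ (Vn @ Vn.conj().T) @ y)
    # A robust implementation would search for local minima instead of
    # simply taking the K globally smallest grid values.
    return [grid[i] for i in np.argsort(null_spec)[:K]]
```
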
  • the powers of the individual components may be estimated.
  • The source powers can be computed by considering a beamformer with nulls towards all estimated DoAs apart from the source of interest, as sketched below.
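
One way to realize such a null-steering beamformer (a sketch; the patent does not specify this exact construction) is to collect the steering vectors of all estimated DoAs into a matrix and use the rows of its pseudo-inverse as beamforming weights, which by construction pass the source of interest with unit gain and null the remaining directions:

```python
import numpy as np

def source_powers(C, A):
    """C: (Q, Q) SH-domain covariance. A: (Q, K) matrix whose columns are
    the steering vectors of the K estimated DoAs. Since pinv(A) @ A = I,
    row k of pinv(A) passes source k with unit gain while nulling the
    other K - 1 estimated directions."""
    W = np.linalg.pinv(A)                                   # (K, Q) weights
    return np.real(np.einsum('kq,qp,kp->k', W.conj(), C, W))  # w_k^H C w_k
```
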
  • The diffuse sound can be estimated in various ways, and the final choice should depend on the application scenario. If the application has no strict bandwidth or transmission requirements, for example in a standalone high-resolution spatial sound reproduction system where all the spherical harmonic signals are available at the decoder, then it is advantageous to retain a diffuse signal in the SHD, with a directional distribution which can deviate from isotropic and which can be reproduced without need of decorrelation, as will be detailed at the synthesis stage.
  • the diffuse field power in this case is computed similarly to Eq. 20
  • the directional sounds are distributed to the output channels with maximum directional concentration from their analyzed directions. It is suitable to consider such distribution functions as synthesis steering vectors, which may include panning laws, head-related-transfer functions, ambisonic panning functions, virtual steering vectors for transcoding into virtual recording arrays or other spatial formats, and others.
  • \(\mathbf{g}(\boldsymbol{\gamma}) = [g_1(\boldsymbol{\gamma}), \ldots, g_L(\boldsymbol{\gamma})]^{\mathrm{T}}\).
  • the design of the spatialization vectors depends on the target system. Three major cases of interest are:
  • Loudspeaker rendering: A common solution is vector-base amplitude panning (VBAP), which adapts to any loudspeaker setup and provides perceptually maximum directional concentration for any direction, and is hence a suitable choice for fully directional sounds.
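
A minimal VBAP sketch for a single loudspeaker triplet follows; selecting the active triplet from a full layout triangulation is omitted, and the example triplet is hypothetical:

```python
import numpy as np

def vbap_gains(source_dir, triplet):
    """source_dir: (3,) unit vector of the source DoA.
    triplet: (3, 3) matrix whose rows are unit vectors of three
    loudspeakers enclosing the source direction.
    Solves p = L^T g and normalizes the gains."""
    g = np.linalg.solve(triplet.T, source_dir)
    g = np.clip(g, 0.0, None)        # negative gains imply a wrong triplet
    return g / np.linalg.norm(g)     # energy normalization

# Usage with a hypothetical frontal triplet:
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.866, 0.0],
              [0.5, 0.0, 0.866]])
p = np.array([0.8, 0.4, 0.2])
print(vbap_gains(p / np.linalg.norm(p), L))
```
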
  • Alternatively, smooth panning functions such as ambisonic panning [citation] can be used; these have increased localization blur but provide a more even perceived source width, if such a characteristic is preferred over directional sharpness.
  • Headphone rendering: HRTF interpolation should be employed for arbitrary DoAs, or, in the case of a dense grid of HRTF measurements, quantization to the closest HRTF direction can be adequate.
  • Ambisonic upmixing: In the case of ambisonic upmixing, new synthetic ambisonic signals are generated from the analyzed lower-order signals. Assuming that the target order is \(N' > N\), the re-encoding gains are the target-order spherical harmonic vectors for the analyzed DoAs, \(\mathbf{g}_k = \mathbf{y}_{N'}(\gamma_k)\).
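
A sketch of such re-encoding, built on scipy's complex spherical harmonics converted to one common real-SH convention; ambisonic normalization schemes differ, so the helper below is an illustrative assumption rather than a definitive implementation:

```python
import numpy as np
from scipy.special import sph_harm

def real_sh_vector(order, azi, polar):
    """Real SH vector up to `order` at (azimuth, polar angle), using one
    common conversion from scipy's complex SHs (N3D/SN3D normalizations
    used by ambisonic formats differ from this)."""
    y = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            Ynm = sph_harm(abs(m), n, azi, polar)
            if m < 0:
                y.append(np.sqrt(2.0) * (-1) ** m * Ynm.imag)
            elif m == 0:
                y.append(Ynm.real)
            else:
                y.append(np.sqrt(2.0) * (-1) ** m * Ynm.real)
    return np.array(y)

def upmix_sources(source_sigs, doas, target_order):
    """source_sigs: (K, n_samples) separated source signals;
    doas: K (azimuth, polar) pairs. Returns ((N'+1)**2, n_samples)
    higher-order ambisonic signals re-encoded at the analyzed DoAs."""
    G = np.stack([real_sh_vector(target_order, a, p) for a, p in doas])
    return G.T @ source_sigs    # g_k = y_N'(gamma_k) as re-encoding gains
```
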
  • In a low-bandwidth application scenario, a total omnidirectional monophonic diffuse signal can be estimated and transmitted by keeping only the first component of Eq. 33.
  • This diffuse component can then be distributed to the output channels at the decoding stage through a network of decorrelators to achieve spatially diffuse reproduction.
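
One simple decorrelator network along these lines uses independent random-phase FIR filters per output channel; this is a common textbook choice rather than the patent's prescribed design:

```python
import numpy as np

def decorrelate(mono, n_out, length=1024, seed=0):
    """Distribute a monophonic diffuse signal to n_out channels through
    mutually independent random-phase FIR filters with flat magnitude at
    the DFT bins, so the channel signals are mutually decorrelated and
    perceived as diffuse rather than as a point source."""
    rng = np.random.default_rng(seed)
    n_bins = length // 2 + 1
    outs = []
    for _ in range(n_out):
        phase = rng.uniform(-np.pi, np.pi, n_bins)
        phase[0] = phase[-1] = 0.0       # keep DC and Nyquist bins real
        h = np.fft.irfft(np.exp(1j * phase), length)
        outs.append(np.convolve(mono, h, mode='same'))
    return np.stack(outs)
```
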
  • In the high-bandwidth case, all the ambisonic signals model the ambient/diffuse residual, as shown in Eq. 33.
  • Distribution of the diffuse signals \(\mathbf{a}_{\mathrm{diff}}\) can be performed in two stages: first a non-parametric ambisonic rendering, and second an optional parametric enhancement stage.
  • The non-parametric decoding stage relies on well-designed ambisonic decoding matrices.
  • Ideally, the output channels should be uncorrelated with each other, preferably with the original output signal powers, to preserve the directional power distribution of the original sound field.
  • The enhanced correlation matrix \(\mathbf{C}_{b,\mathrm{enh}}\) of these signals would be diagonal with entries \(\mathrm{diag}[\mathbf{C}_b]\).
  • The directional rendering matrix is smoothed recursively over time frames as \(\mathbf{A}_{\mathrm{dir}}(k, l) = \alpha\, \mathbf{A}_{\mathrm{dir}}(k, l-1) + (1 - \alpha)\, \mathbf{A}_{\mathrm{dir},0}(k, l)\) (Eq. 54), where \(\mathbf{A}_{\mathrm{dir},0}(k, l)\) denotes the instantaneous directional rendering matrix and \(\alpha\) the smoothing coefficient.
  • The final diffuse rendering matrix B from the ambisonic signals to the output signals is, similarly to the directional rendering matrix, given by smoothing Eq. 38 with the same time constant as for the directional sounds.
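
A minimal illustration of this per-frame recursive smoothing, with the smoothing coefficient as a placeholder standing in for the chosen time constant:

```python
def smooth_matrix(m_prev, m_inst, alpha=0.9):
    """One-pole recursive smoothing of a rendering matrix per time frame,
    in the spirit of Eq. 54: M(l) = alpha * M(l - 1) + (1 - alpha) * M_0(l).
    alpha = 0.9 is a placeholder standing in for the chosen time constant."""
    return alpha * m_prev + (1.0 - alpha) * m_inst
```
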
  • Figure 9 illustrates, at 900, items that may be performed in various embodiments of a method according to the present invention.
  • the executing arrangement may be configured with the necessary hardware and software (with reference to a number of computing devices such as user terminal devices and/or servers, for example) and provided in or at least connected to a target network in cases where such connectivity is preferred for e.g. transfer (transmission and/or receipt, from a standpoint of a single device) of sound scene related data.
  • Item 901 encompasses some items the method may, depending on the embodiment, comprise.
  • a number of feasible microphones such as an ambisonic microphone may be obtained and configured for capturing a sound scene.
  • It may further be connected to or comprise recording equipment that stores the captured signals in a target format, which may in some embodiments already be e.g. a selected ambisonic encoding format or other preferred spherical harmonic representation of the scene.
  • synthesis equipment with reference to e.g. a computer provided with necessary synthesis software may be utilized.
  • item 920 refers to actual capturing (microphone) or creation (synthetic production) of the sound scene and item 922 refers to establishing the associated spherical harmonic representation.
  • The representation is obtained, either as ready-made by an external entity such as a computer functionally connected to the arrangement, or self-established as deliberated above.
  • obtaining may herein refer to e.g. receiving, fetching, reading, capturing, synthetic production, etc.
  • the representation is subjected to analysis.
  • analysis of said spherical harmonic digital representation preferably contains determination of a number of related spatial parameters indicative of at least dominant sound sources in the sound scene, their directions-of-arrival (DOA) and associated powers.
  • Time-frequency decomposition of said spherical harmonic digital representation, such as a selected time-frequency transform (e.g. a selected variant of the Fourier transform) or a filter bank, may be utilized to divide the representation into a plurality of analysed frequency bands.
  • the bands may be selected so as to reflect characteristic(s) of human auditory system such as frequency resolution thereof.
  • Spherical harmonic digital representation data, or source/diffuse signals decomposed therefrom, may be spatially filtered 908, potentially in the aforementioned bands, and rendered 912 for audio playback and/or upmixed 914, optionally based on sound modification input that is translated into changes 910 in the spatial parameters (either through direct manipulation of parameter data or of the process via which the parameters are configured to affect audio rendering/upmixing), as discussed hereinbefore.
  • In the embodiment of Fig. 3, modification of spatial parameters may be executed prior to or in connection with spatial filtering, preferably upon creation of the separation/mixing matrices, whereas in the embodiment of Fig. 4 such modifications typically take place after spatial filtering; the embodiment-dependent execution order of items 908 and 910 has therefore been highlighted in the figure by a curved bidirectional arrow between them.
  • The two dotted horizontal lines are indicative of two options for a potential division between analysis and decoding/rendering side activities, further illustrating the fact that in some embodiments spatial filtering 908 may be executed at the decoding/rendering phase (Fig. 3), while in some other embodiments it may already be executed in connection with the analysis (Fig. 4). Some embodiments of the present invention may correspondingly concentrate on analysis side activities only, some others on decoding/rendering, while there may also be "complete system" type embodiments, executing both analysis and decoding/rendering side tasks at least selectively.
  • The execution ends at 916.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention concerns an arrangement (204a, 204b, 204c, 220) for cultivating a spherical harmonic digital representation of a sound scene (102, 103, 229, 230, 301, 401), configured to: obtain the spherical harmonic digital representation (301, 401) of the sound scene; determine, through analysis (304, 404, 530, 532, 534, 535) of said spherical harmonic digital representation, a number of related spatial parameters (536, 538) indicative of at least dominant sound sources in the sound scene, their directions-of-arrival (DOA) and associated powers, wherein time-frequency decomposition of said spherical harmonic digital representation is preferably utilized to divide the representation into a plurality of analysed frequency bands (302, 402); and provide (360) said spherical harmonic digital representation, preferably as divided into said plurality of frequency bands, together with said number of spatial parameters, to spatial filtering (308, 414) in order to produce an output signal for audio rendering (231, 232, 310, 410) or for upmixing (312, 412) the representation to a higher order. A corresponding method, as well as related arrangements and methods for audio playback or upmixing, are also disclosed.
PCT/FI2018/050172 2017-03-09 2018-03-08 Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes Ceased WO2018162803A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20175220 2017-03-09
FI20175220 2017-03-09

Publications (1)

Publication Number Publication Date
WO2018162803A1 (fr) 2018-09-13

Family

ID=63447458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050172 Ceased WO2018162803A1 (fr) Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes

Country Status (1)

Country Link
WO (1) WO2018162803A1 (fr)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130223658A1 (en) * 2010-08-20 2013-08-29 Terence Betlehem Surround Sound System
US20150156578A1 (en) * 2012-09-26 2015-06-04 Foundation for Research and Technology - Hellas (F.O.R.T.H) Institute of Computer Science (I.C.S.) Sound source localization and isolation apparatuses, methods and systems
WO2015175981A1 (fr) * 2014-05-16 2015-11-19 Qualcomm Incorporated Vecteurs de codage décomposés à partir de signaux audio ambiophoniques d'ordre supérieur

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
EPAIN, N. ET AL.: "Spherical Harmonic Signal Covariance and Sound Field Diffuseness", IEEE /ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 24, no. 10, October 2016 (2016-10-01), pages 1796 - 1807, XP011618336 *
HAN, K. ET AL.: "Improved Source Number Detection and Direction Estimation With Nested Arrays and ULAs Using Jackknifing", IEEE TRANSACTIONS ON SIGNAL PROCESSING, vol. 61, no. 23, December 2013 (2013-12-01), pages 6118 - 6128, XP011531751 *
KUECH, F. ET AL.: "Directional Audio Coding Using Planar Microphone Arrays", IEEE CONFERENCE ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 6 May 2008 (2008-05-06), pages 37 - 40, XP055557498 *
POLITIS, A. ET AL.: "JSAmbisonics: A Web Audio library for interactive spatial sound processing on the web", INTERACTIVE AUDIO SYSTEMS SYMPOSIUM, UNIVERSITY OF YORK , UNITED KINGDOM, 23 September 2016 (2016-09-23), XP055557512 *
POLITIS, A. ET AL.: "PARAMETRIC SPATIAL AUDIO EFFECTS", PROC. OF THE 15TH INT. CONFERENCE ON DIGITAL AUDIO EFFECTS (DAFX-12), 17 September 2012 (2012-09-17), York, UK, XP055527425 *
POLITIS, A. ET AL.: "Sector-Based Parametric Sound Field Reproduction in the Spherical Harmonic Domain", IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, vol. 9, no. 5, August 2015 (2015-08-01), XP055204187 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2593117A (en) * 2018-07-24 2021-09-22 Nokia Technologies Oy Apparatus, methods and computer programs for controlling band limited audio objects
CN113016189A (zh) * 2018-11-16 2021-06-22 Samsung Electronics Co., Ltd. Electronic device and method for recognizing an audio scene
CN113016189B (zh) * 2018-11-16 2023-12-19 Samsung Electronics Co., Ltd. Electronic device and method for recognizing an audio scene
CN113490980A (zh) * 2019-01-21 2021-10-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding a spatial audio representation, and apparatus and method for decoding an encoded audio signal using transport metadata, and related computer programs
US12198709B2 (en) 2019-01-21 2025-01-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding a spatial audio representation or apparatus and method for decoding an encoded audio signal using transport metadata and related computer programs
CN114341976A (zh) * 2019-06-24 2022-04-12 Qualcomm Incorporated Correlating scene-based audio data for psychoacoustic audio coding
CN114424588A (zh) * 2019-09-17 2022-04-29 Nokia Technologies Oy Direction estimation enhancement for parametric spatial audio capture using broadband estimates
US12156014B2 (en) 2019-09-17 2024-11-26 Nokia Technologies Oy Direction estimation enhancement for parametric spatial audio capture using broadband estimates
CN115881140A (zh) * 2021-09-29 2023-03-31 Huawei Technologies Co., Ltd. Encoding and decoding method, apparatus, device, storage medium, and computer program product
WO2024038702A1 (fr) * 2022-08-15 2024-02-22 Panasonic IP Management Co., Ltd. Sound field reproduction device, sound field reproduction method, and sound field reproduction system
US12445799B2 (en) 2022-12-08 2025-10-14 Samsung Electronics Co., Ltd. Surround sound to immersive audio upmixing based on video scene analysis

Similar Documents

Publication Publication Date Title
US11832080B2 (en) Spatial audio parameters and associated spatial audio playback
WO2018162803A1 (fr) Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes
RU2663343C2 (ru) System, apparatus and method for consistent acoustic scene reproduction based on adaptive functions
US11950063B2 (en) Apparatus, method and computer program for audio signal processing
US9014377B2 (en) Multichannel surround format conversion and generalized upmix
CN115176486B (zh) Audio rendering using spatial metadata interpolation
CN114503606B (zh) Audio processing
CN117560615A (zh) Determination of targeted spatial audio parameters and associated spatial audio playback
CN118368580A (zh) Generating a spatial audio signal format from a microphone array using adaptive capture
CN112189348B (zh) Apparatus and method for spatial audio capture
Politis et al. JSAmbisonics: A Web Audio library for interactive spatial sound processing on the web
CN113439303A (zh) Apparatus, method and computer program for encoding, decoding, scene processing and other procedures related to DirAC-based spatial audio coding using diffuse components
CN113597776A (zh) Wind noise reduction in parametric audio
JP2023551016A (ja) Audio encoding and decoding method and apparatus
GB2549922A (en) Apparatus, methods and computer programs for encoding and decoding audio signals
JP2024023412A (ja) Sound-field-related rendering
CN112970062A (zh) Spatial parameter signalling
CN120303957A (zh) Binaural audio rendering of spatial audio
FR3101741A1 (fr) Determination of corrections to be applied to a multichannel audio signal, and associated coding and decoding
EP4312439A1 (fr) Pair direction selection based on a dominant audio direction
EP4172986A1 (fr) Optimized coding of information representative of a spatial image of a multichannel audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18763843

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18763843

Country of ref document: EP

Kind code of ref document: A1