EP2198425A1

EP2198425A1 - Method, module and computer software with quantification based on gerzon vectors

Info

Publication number: EP2198425A1
Application number: EP08840014A
Authority: EP
Inventors: Adil Mouhssine; Abdellatif Benjelloun Touimi
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2007-10-01
Filing date: 2008-09-30
Publication date: 2010-06-23
Also published as: WO2009050409A1; US20100241439A1

Abstract

The invention relates to a method for encoding the components ($X_{i,k}$) of an audio scene including N signals ($S_1, \ldots, S_N$) with N > 1, which comprises the step of quantizing at least some of said components, wherein the quantization is defined as a function of at least one energy vector and/or one velocity vector associated with Gerzon criteria and as a function of said components.

Description

METHOD, MODULE AND COMPUTER PROGRAM WITH QUANTIFICATION BASED ON GERZON VECTORS

The present invention relates to audio signal coding devices comprising quantization modules and intended in particular for use in applications for the transmission or storage of digitized and compressed audio signals.

The invention relates more particularly to the coding of 3D sound scenes. A 3D sound scene, also called spatialized sound, comprises a plurality of audio channels, each corresponding to a monophonic signal.

In signal coding techniques for a sound scene, each monophonic signal is encoded independently of the other signals, based on perceptual criteria, so as to reduce the bit rate while minimizing the perceptual distortion of the coded monophonic signal relative to the original monophonic signal. State-of-the-art audio encoders of the MPEG-2/4 AAC type provide rate-reduction techniques that minimize the perceptual distortion of the signal.

Another coding technique for a sound scene, used in the "MPEG Audio Surround" encoder (see "Text of ISO/IEC FDIS 23003-1, MPEG Surround", ISO/IEC JTC1/SC29/WG11 N8324, July 2006, Klagenfurt, Austria), includes the extraction and coding of spatial parameters from all the monophonic audio signals on the different channels. These signals are then mixed to obtain a monophonic or stereophonic signal, which is then compressed by a conventional mono or stereo encoder (for example of the MPEG-4 AAC or HE-AAC type). At the decoder, the rendered 3D sound scene is synthesized from the spatial parameters and the decoded mono or stereo signal.

The coding of the multichannel signals of a sound scene includes, in certain cases, a transformation (KLT, ambiophonic, DCT, etc.) that makes it possible to better take into account the interactions that may exist between the different signals of the sound scene to be coded.

For these new types of encoders, the problem arises of reducing the bit rate while respecting the spatial aspect of the sound scene.

The present invention improves this situation by proposing, according to a first aspect, a method of encoding components of an audio scene comprising N signals with N > 1, comprising a step of quantizing at least some of the components. The method is characterized in that the quantization is defined as a function of at least one energy vector and/or one velocity vector associated with Gerzon criteria and as a function of the components.

A method according to the invention thus proposes a quantization that takes into account the interactions between the signals of a sound scene, which makes it possible to reduce the spatial distortion of the sound scene and thus to respect its original aspect. The allocation of bits to the spatial components takes into account the spatial accuracy and the spatial stability of the rendered sound scene.

The audio quality of the decoded global sound stage is improved for a given coding rate.

In one embodiment, the quantization is defined as a function of the variations of at least one of said energy and velocity vectors when the components vary. The allocation of bits to the various components is thus performed as a function of the impact of their respective variations on the spatial accuracy and/or the spatial stability of the decoded sound scene.

In one embodiment, component variations corresponding to the minimization, or limitation, of the variations of at least one of the energy and velocity vectors are determined and, from said component variations, quantization error values are derived to define the quantization of the components. This arrangement makes it possible to determine the quantization function that will give rise to a minimum or limited disturbance of the rendered sound scene.

In one embodiment, a method according to the invention further comprises a step of detecting a transition frequency for determining which of the energy vector and the velocity vector is to be taken into account to define the quantization of the components. Such an arrangement makes it possible to increase the quality of the coding while limiting the amount of computation required.

In one embodiment, the components are obtained by spatial transformation, for example of the ambiophonic type.

In other embodiments, the transformation is a time/frequency transformation, for example a DCT, or a combination of transformations.

In one embodiment, the energy vector is calculated based on an inverse spatial transformation on said spatial components and/or the velocity vector is calculated based on an inverse spatial transformation on said spatial components.

According to a second aspect, the invention proposes a module for processing components originating from an audio scene comprising N signals with N > 1, comprising means for determining elements defining a step of quantizing at least some of the components, as a function at least of the energy vector and/or the velocity vector associated with Gerzon criteria and as a function of the components.

According to a third aspect, the invention provides an audio coder adapted to encode components of an audio scene comprising N signals with N > 1, comprising: a component processing module according to the second aspect of the invention; and a quantization module adapted to define quantization indices associated with the components as a function of at least elements determined by the processing module.

According to a fourth aspect, the invention proposes a computer program to be installed in a processing module, said program comprising instructions for implementing, when the program is executed by processing means of said module, the steps of a method according to the first aspect of the invention.

Other features and advantages of the invention will become apparent on reading the description which follows. This description is purely illustrative and should be read in conjunction with the attached drawings, in which: FIG. 1 represents an encoder in one embodiment of the invention; FIG. 2 illustrates the propagation of a plane wave in space; FIG. 3 represents a device for rendering a sound scene, comprising loudspeakers.

Gerzon's criteria are generally used to characterize the localization of synthesized virtual sound sources when the signals of a 3D sound scene are rendered over the loudspeakers of a given sound rendering system.

These criteria are based on the study of the velocity and energy vectors of the acoustic pressures generated by the sound rendering system used.

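When a sound reproduction system comprises n loudspeakers, the n signals generated by these loudspeakers are defined by an acoustic pressure $P_i$ and an acoustic propagation angle $\varphi_i$, $i = 1$ to $n$.

The velocity vector $\vec{V}$, of polar coordinates $(r_V, \theta_V)$, is then defined as:

$$\vec{V} = \frac{\sum_{i=1}^{n} P_i\,\vec{u}_i}{\sum_{i=1}^{n} P_i}, \quad\text{i.e.}\quad r_V\cos\theta_V = \frac{\sum_{i=1}^{n} P_i\cos\varphi_i}{\sum_{i=1}^{n} P_i}, \qquad r_V\sin\theta_V = \frac{\sum_{i=1}^{n} P_i\sin\varphi_i}{\sum_{i=1}^{n} P_i}$$

where $\vec{u}_i = (\cos\varphi_i, \sin\varphi_i)$ is the unit vector pointing towards the $i$-th loudspeaker.

The energy vector $\vec{E}$, of polar coordinates $(r_E, \theta_E)$, is defined as follows:

$$\vec{E} = \frac{\sum_{i=1}^{n} P_i^2\,\vec{u}_i}{\sum_{i=1}^{n} P_i^2}, \quad\text{i.e.}\quad r_E\cos\theta_E = \frac{\sum_{i=1}^{n} P_i^2\cos\varphi_i}{\sum_{i=1}^{n} P_i^2}, \qquad r_E\sin\theta_E = \frac{\sum_{i=1}^{n} P_i^2\sin\varphi_i}{\sum_{i=1}^{n} P_i^2}$$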

The conditions necessary for the localization of the virtual sound sources to be optimal are defined by looking for the angles $\varphi_i$, characterizing the positions of the loudspeakers of the sound rendering system considered, that satisfy the following criteria, called Gerzon criteria:

- criterion 1, relating to the accuracy of the sound image of the source S at low frequencies: $\theta_V = \theta$, where $\theta$ is the propagation angle of the actual source S that is to be reproduced;

- criterion 2, relating to the stability of the sound image of the source S at low frequencies: $r_V = 1$;

- criterion 3, relating to the accuracy of the sound image of the source S at high frequencies: $\theta_E = \theta$;

- criterion 4, relating to the stability of the sound image of the source S at high frequencies: $r_E = 1$.
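The velocity and energy vectors and the four Gerzon criteria can be evaluated numerically for any set of loudspeaker pressures. The following Python sketch is illustrative only: the function names `gerzon_vectors`, `wrap_angle` and `gerzon_criteria_errors` are hypothetical, angles are assumed to be in radians, and the sums in the denominators are assumed to be non-zero.

```python
import numpy as np


def gerzon_vectors(pressures, angles):
    """Return (r_V, theta_V, r_E, theta_E) for pressures P_i at loudspeaker angles phi_i (radians)."""
    p = np.asarray(pressures, dtype=float)
    phi = np.asarray(angles, dtype=float)
    # Velocity vector: pressure-weighted mean of the unit vectors towards the loudspeakers
    vx = np.sum(p * np.cos(phi)) / np.sum(p)
    vy = np.sum(p * np.sin(phi)) / np.sum(p)
    # Energy vector: the same mean, weighted by P_i**2
    ex = np.sum(p**2 * np.cos(phi)) / np.sum(p**2)
    ey = np.sum(p**2 * np.sin(phi)) / np.sum(p**2)
    return np.hypot(vx, vy), np.arctan2(vy, vx), np.hypot(ex, ey), np.arctan2(ey, ex)


def wrap_angle(a):
    """Map an angle difference to the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def gerzon_criteria_errors(pressures, angles, theta):
    """Deviation from each Gerzon criterion for a target source angle theta (0 means satisfied)."""
    r_v, th_v, r_e, th_e = gerzon_vectors(pressures, angles)
    return {
        "criterion_1_accuracy_low_freq": wrap_angle(th_v - theta),
        "criterion_2_stability_low_freq": 1.0 - r_v,
        "criterion_3_accuracy_high_freq": wrap_angle(th_e - theta),
        "criterion_4_stability_high_freq": 1.0 - r_e,
    }
```

For example, two loudspeakers at +30° and -30° fed with equal pressures give $\theta_V = \theta_E = 0$ but $r_V = r_E = \cos 30° \approx 0.87$: criteria 1 and 3 are met for a frontal source, criteria 2 and 4 are not.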

The encoder described below in one embodiment of the invention uses the velocity and energy vectors associated with the Gerzon criteria in an application other than the search for the best angles $\varphi_i$ characterizing the positions of the loudspeakers of a given sound rendering system.

Figure 1 shows an audio coder 1 in one embodiment of the invention.

The encoder 1 comprises a time / frequency transformation module 3, a spatial transformation module 4, a quantization module 6 and a module 7 for constituting a binary sequence.

As an illustration, a 3D sound scene to be coded includes N channels (with N > 1), on each of which a respective signal $S_1, \ldots, S_N$ is delivered. The time/frequency transformation module 3 of the encoder 1 receives as input the N signals $S_1, \ldots, S_N$ of the 3D sound scene to be encoded.

Each signal $S_i$, $i = 1$ to $N$, is represented by the variation of its omnidirectional acoustic pressure $P_i$ and by the propagation angle $\theta_i$, in the space of the 3D scene, of the associated acoustic wave. On each time frame of each of these signals, which gives the successive values taken over time by the sound pressure $P_i$, the time/frequency transformation module 3 performs a time/frequency transformation. In the present case, it determines, for each signal $S_i$, $i = 1$ to $N$, its spectral representation, characterized by $M$ MDCT coefficients $Y_{i,k}$, $k = 0$ to $M-1$. An MDCT coefficient $Y_{i,k}$ thus represents the element of the spectrum of the signal $S_i$ at the frequency $F_k$. The spectral representations $Y_{i,k}$, $k = 0$ to $M-1$, of the signals $S_i$, $i = 1$ to $N$, are supplied to the input of the spatial transformation module 4, which also receives as input the acoustic propagation angles $\theta_i$ characterizing the input signals $S_i$.
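A direct (non-fast) MDCT of one frame illustrates the spectral representation $Y_{i,k}$ produced by module 3. This is a sketch under stated assumptions: frames of $2M$ samples, a sine analysis window, and no overlap-add bookkeeping across frames; the text does not specify these details, and `mdct` and `spectra` are hypothetical names.

```python
import numpy as np


def mdct(frame):
    """Direct O(M^2) MDCT of one frame of 2M samples, returning M coefficients."""
    x = np.asarray(frame, dtype=float)
    two_m = x.size
    if two_m % 2:
        raise ValueError("frame length must be even (2M samples)")
    m = two_m // 2
    n = np.arange(two_m)
    k = np.arange(m)
    # Sine window: an assumption, the text does not specify the window
    w = np.sin(np.pi * (n + 0.5) / two_m)
    basis = np.cos(np.pi / m * (n[None, :] + 0.5 + m / 2) * (k[:, None] + 0.5))
    return basis @ (w * x)


def spectra(signals, m):
    """MDCT coefficients Y[i, k] of the first 2M-sample frame of each signal S_i."""
    return np.stack([mdct(s[: 2 * m]) for s in signals])
```

A production encoder would use a fast (FFT-based) MDCT with 50% overlapping frames; the direct form is kept here because it mirrors the defining formula.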

The spatial transformation module 4 is adapted to perform a spatial transformation of the input signals provided, that is to say to determine the spatial components of these signals resulting from their projection onto a spatial reference frame that depends on the order of the transformation.

The order of a spatial transformation is related to the angular frequency according to which it "scans" the sound field. In one embodiment, the spatial transformation considered is the ambiophonic transformation. The sound scene is then represented by a set of signals called ambiophonic components, which make it possible to store the sound information relative to the acoustic field. This representation facilitates the manipulation of the acoustic field (rotation of the sound stage, distortion of perspective i.e. possibility of tightening the frontal scene and dilating the back scene) and the extraction of relevant parameters for a reproduction on a given device.

Another advantage of the ambiophonic transformation is that, when the number N of signals of the sound scene is large, they can be represented by a number L of ambiophonic components much lower than N, with very little degradation of the spatial quality of the sound scene. The volume of data to be transmitted is thus reduced without significant degradation of the audio quality of the sound scene.

Thus, in the case considered, the spatial transformation module 4 performs an ambiophonic transformation, which gives a compact spatial representation of a 3D sound scene by projecting the sound field onto the associated spherical or cylindrical harmonic functions. For more information on ambiophonic transformations, see: Jérôme Daniel, "Representation of acoustic fields, application to the transmission and reproduction of complex sound scenes in a multimedia context", doctoral thesis, Université Paris 6, July 31, 2001; Jens Meyer and Gary Elko, "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield", Proc. ICASSP 2002, Vol. II, pp. 1781-1784.

With reference to FIG. 2, the following formula gives the decomposition into cylindrical harmonics, at infinite order, of a signal $S_i$ of the sound scene:

$$S_i(r,\varphi) = P_i\left[J_0(kr) + \sum_{1\le m\le\infty} 2\,j^m J_m(kr)\left(\cos m\theta_i\cos m\varphi + \sin m\theta_i\sin m\varphi\right)\right]$$

where the $J_m$ are Bessel functions, $k$ is the wave number, $r$ is the distance between the origin of the reference frame and the position of a listener placed at a point M, $P_i$ is the acoustic pressure of the signal $S_i$, $\theta_i$ is the propagation angle of the acoustic wave corresponding to the signal $S_i$, and $\varphi$ is the angle between the position of the listener and the axis of the reference frame.
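Since $\cos m\theta_i\cos m\varphi + \sin m\theta_i\sin m\varphi = \cos m(\varphi - \theta_i)$, the series above is the Jacobi-Anger expansion of the plane wave $P_i e^{jkr\cos(\varphi-\theta_i)}$, so a truncated sum should converge to it. The sketch below checks this numerically; it computes $J_m$ from Bessel's integral with NumPy rather than a special-function library, and the helper names and the truncation order are illustrative choices.

```python
import numpy as np


def bessel_j(m, x, n_grid=4001):
    """J_m(x) from Bessel's integral (1/pi) * int_0^pi cos(m t - x sin t) dt, trapezoidal rule."""
    t = np.linspace(0.0, np.pi, n_grid)
    f = np.cos(m * t - x * np.sin(t))
    dt = t[1] - t[0]
    # The integrand is even and 2*pi-periodic, so the trapezoidal rule converges very fast
    return (np.sum(f) - 0.5 * (f[0] + f[-1])) * dt / np.pi


def cylindrical_expansion(p, theta, kr, phi, order):
    """Cylindrical-harmonic series of S_i(r, phi), truncated at m = order."""
    total = bessel_j(0, kr)
    for m in range(1, order + 1):
        total += 2 * (1j ** m) * bessel_j(m, kr) * (
            np.cos(m * theta) * np.cos(m * phi) + np.sin(m * theta) * np.sin(m * phi))
    return p * total
```

Truncating the same series at a finite order $m \le p$ is what leads to the $2p+1$ components of a finite-order ambiophonic transform.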

If the ambiophonic transformation is of finite order $p$, then, for a 2D ambiophonic transformation (in the horizontal plane), the ambiophonic transform of a signal $S_i$ expressed in the time domain comprises the following $2p+1$ components:

$$\left(P_i,\ P_i\cos\theta_i,\ P_i\sin\theta_i,\ P_i\cos 2\theta_i,\ P_i\sin 2\theta_i,\ P_i\cos 3\theta_i,\ P_i\sin 3\theta_i,\ \ldots,\ P_i\cos p\theta_i,\ P_i\sin p\theta_i\right)$$

In what follows, a 2D ambiophonic transformation is considered. However, the invention can also be implemented with a 3D ambiophonic transformation (in which case the loudspeakers are considered to be arranged on a sphere).

Furthermore, the invention can be implemented with any ambiophonic transformation of any order $p$, for example $p = 2$ or more.

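Let $A = (A_{ij})_{1\le i\le L,\ 1\le j\le N}$ be the ambiophonic transformation matrix of order $p$ for the 3D scene. Then $A_{1j} = 1$, $A_{ij} = \sqrt{2}\cos\left(\tfrac{i}{2}\theta_j\right)$ if $i$ is even, and $A_{ij} = \sqrt{2}\sin\left(\tfrac{i-1}{2}\theta_j\right)$ if $i > 1$ is odd, that is:

$$A = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
\sqrt{2}\cos\theta_1 & \sqrt{2}\cos\theta_2 & \cdots & \sqrt{2}\cos\theta_N \\
\sqrt{2}\sin\theta_1 & \sqrt{2}\sin\theta_2 & \cdots & \sqrt{2}\sin\theta_N \\
\sqrt{2}\cos 2\theta_1 & \sqrt{2}\cos 2\theta_2 & \cdots & \sqrt{2}\cos 2\theta_N \\
\sqrt{2}\sin 2\theta_1 & \sqrt{2}\sin 2\theta_2 & \cdots & \sqrt{2}\sin 2\theta_N \\
\vdots & \vdots & & \vdots \\
\sqrt{2}\cos p\theta_1 & \sqrt{2}\cos p\theta_2 & \cdots & \sqrt{2}\cos p\theta_N \\
\sqrt{2}\sin p\theta_1 & \sqrt{2}\sin p\theta_2 & \cdots & \sqrt{2}\sin p\theta_N
\end{pmatrix}$$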

Let $Y = (Y_{ik})_{1\le i\le N,\ 0\le k\le M-1}$ be the matrix of the frequency components of the signals $S_i$, $i = 1$ to $N$.

Let $X$ be the matrix of the ambiophonic components: $X = (X_{ik})_{1\le i\le L,\ 0\le k\le M-1}$.

The matrix $X$ of the ambiophonic components is determined using the following equation:

$$X = A\cdot Y \qquad (3)$$

The spatial transformation module 4 is thus adapted to determine the matrix $X$ using equation (3), as a function of the data $Y_{i,k}$ and $\theta_i$ ($i = 1$ to $N$, $k = 0$ to $M-1$) provided as input.
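Equation (3) can be sketched in NumPy by building $A$ row by row (a first row of ones, then a $\sqrt{2}\cos m\theta_j$ / $\sqrt{2}\sin m\theta_j$ pair of rows for each order $m = 1, \ldots, p$) and multiplying it by the spectral matrix $Y$. The function names are hypothetical and angles are assumed to be in radians.

```python
import numpy as np


def ambi_encoding_matrix(thetas, p):
    """2D order-p matrix A (L x N, L = 2p + 1): rows 1, sqrt2 cos(m theta_j), sqrt2 sin(m theta_j)."""
    thetas = np.asarray(thetas, dtype=float)
    rows = [np.ones_like(thetas)]
    for m in range(1, p + 1):
        rows.append(np.sqrt(2) * np.cos(m * thetas))  # even row index i = 2m (1-based)
        rows.append(np.sqrt(2) * np.sin(m * thetas))  # odd row index i = 2m + 1 (1-based)
    return np.vstack(rows)


def ambi_components(Y, thetas, p):
    """X = A . Y (equation (3)); Y is N x M, the result is L x M."""
    return ambi_encoding_matrix(thetas, p) @ np.asarray(Y, dtype=float)
```

Each column of $X$ is then the vector of ambiophonic components for one frequency $F_k$.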

The values $X_{i,k}$ ($i = 1$ to $L$, $k = 0$ to $M-1$), which are the elements to be encoded by the encoder 1 into a binary sequence, are provided at the input of the quantization module 6.

The quantization module 6 comprises a processing module 5 adapted to implement a method for defining the quantization function to be applied to the received ambiophonic components $X_{i,k}$ ($i = 1$ to $L$, $k = 0$ to $M-1$). The method exploits relationships between the variations of the velocity and energy vectors used in the Gerzon criteria and the variations of the ambiophonic components. The quantization function thus defined is then applied to the ambiophonic components received by the quantization module 6.

The steps for defining the quantization function implemented by the processing module 5 are based on the principles described below, with respect to the obtained values $X_{i,k}$ ($i = 1$ to $L$, $k = 0$ to $M-1$) of the ambiophonic components to be quantized.

Let D be the p-order ambiophonic decoding matrix for a regular loudspeaker audio rendering system (i.e., the loudspeakers are arranged regularly around a point).

Let $\chi[k] = (X_{1,k}, \ldots, X_{L,k})^T$ be the vector, for the frequency $F_k$ ($k = 0$ to $M-1$), of the ambiophonic components of order $p$, with $L = 2p+1$, and let $\tau[k] = (\tau_1[k], \ldots, \tau_Q[k])^T$ be the vector of the signals delivered to the $Q$ loudspeakers after ambiophonic decoding.

We then have:

$$\tau[k] = D\cdot\chi[k] \qquad (4)$$

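If $(\varphi_1, \ldots, \varphi_Q)$ is the vector of the acoustic propagation angles from the respective $Q$ loudspeakers, then the $p$-order ambiophonic decoding matrix $D$ is written as follows:

$$D = \frac{1}{Q}\begin{pmatrix}
1 & \sqrt{2}\cos\varphi_1 & \sqrt{2}\sin\varphi_1 & \cdots & \sqrt{2}\cos p\varphi_1 & \sqrt{2}\sin p\varphi_1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & \sqrt{2}\cos\varphi_Q & \sqrt{2}\sin\varphi_Q & \cdots & \sqrt{2}\cos p\varphi_Q & \sqrt{2}\sin p\varphi_Q
\end{pmatrix}$$

It will be noted that a regular system has been chosen because its decoding matrix has a reduced computational complexity: if $C$ denotes the $p$-order ambiophonic (encoding) matrix associated with the $Q$ loudspeaker directions, the decoding matrix is simply $D = \frac{1}{Q}C^T$. However, another ambiophonic decoding matrix may be used by the processing module.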
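For a regular layout with more than $2p$ loudspeakers, the rows of the encoding matrix $C$ are mutually orthogonal with $CC^T = Q\,I$, which is why $D = \frac{1}{Q}C^T$ is cheap to compute. The sketch below, with hypothetical names and loudspeakers assumed at $\varphi_q = 2\pi q/Q$, builds $D$ and checks that re-encoding the decoded loudspeaker signals returns the original ambiophonic components.

```python
import numpy as np


def encoding_matrix(angles, p):
    """2D order-p ambiophonic encoding matrix, shape (2p + 1) x len(angles)."""
    a = np.asarray(angles, dtype=float)
    rows = [np.ones_like(a)]
    for m in range(1, p + 1):
        rows += [np.sqrt(2) * np.cos(m * a), np.sqrt(2) * np.sin(m * a)]
    return np.vstack(rows)


def regular_decoder(q, p):
    """Decoder D = C^T / Q (Q x L) for Q regularly spaced loudspeakers; also returns their angles."""
    if q <= 2 * p:
        raise ValueError("a regular layout needs Q > 2p loudspeakers for C C^T = Q I")
    phis = 2 * np.pi * np.arange(q) / q  # assumed regular layout
    return encoding_matrix(phis, p).T / q, phis
```

With eight loudspeakers and $p = 2$, for instance, `C @ D` is the $5 \times 5$ identity matrix.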

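The coordinates of the velocity vector $\vec{V}$ and of the energy vector $\vec{E}$, hereinafter called Gerzon vectors, satisfy, for the frequency $F_k$, $k = 0$ to $M-1$:

$$r_V[k]\cos\theta_V[k] = \frac{\sum_{q=1}^{Q}\tau_q[k]\cos\varphi_q}{\sum_{q=1}^{Q}\tau_q[k]}, \qquad r_V[k]\sin\theta_V[k] = \frac{\sum_{q=1}^{Q}\tau_q[k]\sin\varphi_q}{\sum_{q=1}^{Q}\tau_q[k]}$$

$$r_E[k]\cos\theta_E[k] = \frac{\sum_{q=1}^{Q}\tau_q^2[k]\cos\varphi_q}{\sum_{q=1}^{Q}\tau_q^2[k]}, \qquad r_E[k]\sin\theta_E[k] = \frac{\sum_{q=1}^{Q}\tau_q^2[k]\sin\varphi_q}{\sum_{q=1}^{Q}\tau_q^2[k]}$$

and therefore, since $\tau_q[k] = \sum_{j=1}^{L} D_{qj}X_{j,k}$ by equation (4), we get (equations (5)), with all sums taken over $1\le q\le Q$ and $1\le j\le L$:

$$\tan\theta_V[k] = \frac{\sum_{q}\Big(\sum_{j}D_{qj}X_{j,k}\Big)\sin\varphi_q}{\sum_{q}\Big(\sum_{j}D_{qj}X_{j,k}\Big)\cos\varphi_q}, \qquad r_V^2[k] = \frac{\Big(\sum_{q}\big(\sum_{j}D_{qj}X_{j,k}\big)\cos\varphi_q\Big)^2 + \Big(\sum_{q}\big(\sum_{j}D_{qj}X_{j,k}\big)\sin\varphi_q\Big)^2}{\Big(\sum_{q}\sum_{j}D_{qj}X_{j,k}\Big)^2}$$

$$\tan\theta_E[k] = \frac{\sum_{q}\Big(\sum_{j}D_{qj}X_{j,k}\Big)^2\sin\varphi_q}{\sum_{q}\Big(\sum_{j}D_{qj}X_{j,k}\Big)^2\cos\varphi_q}, \qquad r_E^2[k] = \frac{\Big(\sum_{q}\big(\sum_{j}D_{qj}X_{j,k}\big)^2\cos\varphi_q\Big)^2 + \Big(\sum_{q}\big(\sum_{j}D_{qj}X_{j,k}\big)^2\sin\varphi_q\Big)^2}{\Big(\sum_{q}\big(\sum_{j}D_{qj}X_{j,k}\big)^2\Big)^2}$$

This last system of equations (5) defines the relation that exists between the ambiophonic components and the Gerzon vectors $\vec{V}$ and $\vec{E}$, defined by their respective polar coordinates $(r_V, \theta_V)$ and $(r_E, \theta_E)$.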
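Equations (4) and (5) can be evaluated directly: decode the component vector $\chi[k]$ with $D$, then form the pressure-weighted and energy-weighted sums over the loudspeakers. This sketch assumes a regular layout with the $D = \frac{1}{Q}C^T$ decoder; the names are hypothetical.

```python
import numpy as np


def encoding_matrix(angles, p):
    """2D order-p ambiophonic encoding matrix, shape (2p + 1) x len(angles)."""
    a = np.asarray(angles, dtype=float)
    rows = [np.ones_like(a)]
    for m in range(1, p + 1):
        rows += [np.sqrt(2) * np.cos(m * a), np.sqrt(2) * np.sin(m * a)]
    return np.vstack(rows)


def gerzon_from_components(chi_k, p, q):
    """(r_V, theta_V, r_E, theta_E) at one frequency F_k from chi[k], via equations (4) and (5)."""
    phis = 2 * np.pi * np.arange(q) / q        # regular layout (assumption)
    d = encoding_matrix(phis, p).T / q         # decoder D = C^T / Q, shape Q x L
    tau = d @ np.asarray(chi_k, dtype=float)   # loudspeaker signals, equation (4)
    out = []
    for w in (tau, tau**2):                    # velocity weights, then energy weights
        x = (w @ np.cos(phis)) / w.sum()
        y = (w @ np.sin(phis)) / w.sum()
        out += [np.hypot(x, y), np.arctan2(y, x)]
    return tuple(out)
```

For a single plane wave at angle $\theta$, encoded at order 1 and decoded on six regularly spaced loudspeakers, this gives $r_V = 1$ and $\theta_V = \theta_E = \theta$, while $r_E = 2/3$: with this basic decoder, the high-frequency stability criterion 4 is not met.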

A variation of the values taken by the ambiophonic components therefore implies a corresponding variation or displacement of the Gerzon vectors around their original position.

However, when the ambiophonic components are quantized, their quantized values are only approximations of their true values.

The influence on the Gerzon vectors of an elementary displacement $h$ around the values of the ambiophonic components will now be determined.

By definition of the differential of a composite function, we can write (equations (6)):

$$d\tan(\theta_V[k](h)) = \left(1+\tan^2(\theta_V[k](h))\right)d\theta_V[k](h)$$

$$d\tan(\theta_E[k](h)) = \left(1+\tan^2(\theta_E[k](h))\right)d\theta_E[k](h)$$

$$dr_V^2(h) = 2r_V(h)\,dr_V(h), \qquad dr_E^2(h) = 2r_E(h)\,dr_E(h) \qquad (6)$$

We can deduce from equations (6) that knowing the variations of the functions $\tan(\theta_V[k])$, $\tan(\theta_E[k])$, $r_V^2$ and $r_E^2$ makes it possible to determine the corresponding variation of the Gerzon vectors for the displacement $h$.

The vector $h = (h_1, \ldots, h_L)^T$ represents the quantization error, for a frequency $F_k$, of the ambiophonic components $X_{i,k}$ ($i = 1$ to $L$) considered.

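The differential of the function $\tan(\theta_V[k])$ for the displacement $h$ can be written as follows:

$$d\tan(\theta_V[k](h)) = \sum_{n=1}^{L} h_n\,\frac{\partial\tan(\theta_V[k])}{\partial X_{n,k}} \qquad (7)$$

By calculating, using equations (5), the partial derivatives of the functions $\tan(\theta_V[k])$ and $r_V^2[k]$ with respect to the variation of each ambiophonic component $X_{n,k}$, $1\le n\le L$, we obtain, for $n\in[1,L]$ and $k\in[0,M-1]$, writing $\tau_q = \sum_{j=1}^{L}D_{qj}X_{j,k}$, $a_V = \sum_q\tau_q\cos\varphi_q$, $b_V = \sum_q\tau_q\sin\varphi_q$ and $s_V = \sum_q\tau_q$ (equations (8)):

$$\frac{\partial\tan(\theta_V[k])}{\partial X_{n,k}} = \frac{a_V\sum_q D_{qn}\sin\varphi_q - b_V\sum_q D_{qn}\cos\varphi_q}{a_V^2}$$

$$\frac{\partial r_V^2[k]}{\partial X_{n,k}} = \frac{2\left(a_V\sum_q D_{qn}\cos\varphi_q + b_V\sum_q D_{qn}\sin\varphi_q\right)}{s_V^2} - \frac{2\left(a_V^2+b_V^2\right)\sum_q D_{qn}}{s_V^3}$$

In the same way, writing $a_E = \sum_q\tau_q^2\cos\varphi_q$, $b_E = \sum_q\tau_q^2\sin\varphi_q$ and $s_E = \sum_q\tau_q^2$, we calculate the partial derivatives of the functions $\tan(\theta_E[k])$ and $r_E^2[k]$ (equations (9)), for $n\in[1,L]$ and $k\in[0,M-1]$:

$$\frac{\partial\tan(\theta_E[k])}{\partial X_{n,k}} = \frac{2\left(a_E\sum_q\tau_q D_{qn}\sin\varphi_q - b_E\sum_q\tau_q D_{qn}\cos\varphi_q\right)}{a_E^2}$$

$$\frac{\partial r_E^2[k]}{\partial X_{n,k}} = \frac{4\left(a_E\sum_q\tau_q D_{qn}\cos\varphi_q + b_E\sum_q\tau_q D_{qn}\sin\varphi_q\right)}{s_E^2} - \frac{4\left(a_E^2+b_E^2\right)\sum_q\tau_q D_{qn}}{s_E^3}$$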

In the above section, relations (8) and (9), which link the variations of the Gerzon vectors to the variations of the ambiophonic components, have been determined. The error undergone by the Gerzon vectors is therefore a function of the error introduced on the ambiophonic components. In what follows, these relationships are exploited by the processing module 5 to determine a new type of quantization based on spatialization criteria.
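Closed-form derivatives such as (8) and (9) are easy to get wrong by hand, so it can help to check them against central finite differences of the functions in (5). The sketch below uses hypothetical names and assumes the regular-layout decoder; it writes the derivatives in terms of the sums $a = \sum_q w_q\cos\varphi_q$, $b = \sum_q w_q\sin\varphi_q$ and $s = \sum_q w_q$, with weights $w_q = \tau_q$ for the velocity vector and $w_q = \tau_q^2$ for the energy vector (the closed forms are derived from (5) for this sketch).

```python
import numpy as np


def encoding_matrix(angles, p):
    """2D order-p ambiophonic encoding matrix, shape (2p + 1) x len(angles)."""
    a = np.asarray(angles, dtype=float)
    rows = [np.ones_like(a)]
    for m in range(1, p + 1):
        rows += [np.sqrt(2) * np.cos(m * a), np.sqrt(2) * np.sin(m * a)]
    return np.vstack(rows)


def gerzon_functions(x, d, phis):
    """[tan theta_V, r_V^2, tan theta_E, r_E^2] as functions of chi[k] (equations (5))."""
    tau = d @ x
    out = []
    for w in (tau, tau**2):
        a, b, s = w @ np.cos(phis), w @ np.sin(phis), w.sum()
        out += [b / a, (a**2 + b**2) / s**2]
    return np.array(out)


def gerzon_gradients(x, d, phis):
    """Partial derivatives w.r.t. X_{n,k}: rows as in gerzon_functions, one column per n."""
    tau = d @ x
    cosp, sinp = np.cos(phis), np.sin(phis)
    rows = []
    # d tau_q / dX_n = D_qn for the velocity sums; d tau_q^2 / dX_n = 2 tau_q D_qn for the energy sums
    for w, dw in ((tau, d), (tau**2, 2 * tau[:, None] * d)):
        a, b, s = w @ cosp, w @ sinp, w.sum()
        da, db, ds = cosp @ dw, sinp @ dw, dw.sum(axis=0)
        rows.append((a * db - b * da) / a**2)
        rows.append(2 * (a * da + b * db) / s**2 - 2 * (a**2 + b**2) * ds / s**3)
    return np.vstack(rows)
```

Agreement with finite differences to within about $10^{-6}$ is a quick sanity check before the gradients are used in the optimization.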

In one embodiment of the invention, given a bit rate Deb allocated to quantization, the processing module 5 seeks to determine the quantization error $h$ of the ambiophonic components, compatible with the bit rate Deb, that optimizes the displacement of the Gerzon vectors.

In one embodiment, the optimization sought is the minimization, or the limitation within a given threshold, of the displacement of the Gerzon vectors around their position corresponding to a zero error. This amounts to looking for the value of the error vector $h$ that allows the Gerzon vectors to keep an orientation and a modulus close to those of the Gerzon vectors calculated without quantization.

Indeed, Gerzon's vectors make it possible to control the degree of spatial fidelity (stability and accuracy of the sound image restored) during the rendering of a sound scene on a given device.

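Consider the following function vector:

$$\kappa(h) = \begin{pmatrix} d\theta_V[k](h) \\ dr_V[k](h) \\ d\theta_E[k](h) \\ dr_E[k](h) \end{pmatrix} \qquad (10)$$

This vector (10) represents the variations of the Gerzon vectors for a displacement $h$ of the values of the ambiophonic components $(X_{n,k})_{1\le n\le L}$.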
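Combining (6) and (7), the vector $\kappa(h)$ can be computed as a linearized displacement: $d\theta = d\tan\theta/(1+\tan^2\theta)$ and $dr = d(r^2)/(2r)$. In the sketch below, the gradients needed in (7) come from a numerical Jacobian instead of the analytic expressions (8) and (9), which is a simplification; the names and the regular-layout decoder are assumptions.

```python
import numpy as np


def encoding_matrix(angles, p):
    """2D order-p ambiophonic encoding matrix, shape (2p + 1) x len(angles)."""
    a = np.asarray(angles, dtype=float)
    rows = [np.ones_like(a)]
    for m in range(1, p + 1):
        rows += [np.sqrt(2) * np.cos(m * a), np.sqrt(2) * np.sin(m * a)]
    return np.vstack(rows)


def tan_and_r2(x, d, phis):
    """[tan theta_V, r_V^2, tan theta_E, r_E^2] for the component vector x (equations (5))."""
    tau = d @ x
    out = []
    for w in (tau, tau**2):  # velocity weights tau_q, energy weights tau_q^2
        a, b, s = w @ np.cos(phis), w @ np.sin(phis), w.sum()
        out += [b / a, (a**2 + b**2) / s**2]
    return np.array(out)


def kappa(x, h, d, phis, eps=1e-6):
    """Linearized (d theta_V, d r_V, d theta_E, d r_E) for an error h, via (6), (7) and (10)."""
    f0 = tan_and_r2(x, d, phis)
    # Numerical Jacobian used in place of the analytic derivatives (8)-(9) (simplification)
    jac = np.column_stack([
        (tan_and_r2(x + eps * e, d, phis) - tan_and_r2(x - eps * e, d, phis)) / (2 * eps)
        for e in np.eye(x.size)])
    df = jac @ h  # differentials of tan(theta) and r^2, equation (7)
    return np.array([df[0] / (1 + f0[0]**2), df[1] / (2 * np.sqrt(f0[1])),
                     df[2] / (1 + f0[2]**2), df[3] / (2 * np.sqrt(f0[3]))])
```

For small $h$, the result matches the exact change in $(\theta_V, r_V, \theta_E, r_E)$ up to terms of second order in $\|h\|$.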

Let Deb be the overall bit rate allocated to the quantization module 6 for quantizing the ambiophonic components. The overall bit rate Deb is equal to the sum of the bit rates $D_{j,s}$ allocated to each frequency $F_s$, $s = 0$ to $M-1$, of each ambiophonic component $X_j$, $1\le j\le L$, $M$ representing the number of spectral bands of the ambiophonic components. So:

$$\mathrm{Deb} = \sum_{j=1}^{L}\sum_{s=0}^{M-1} D_{j,s}$$

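In the case where the quantization module 6 is a high-resolution quantizer, it can be written that:

$$D_{j,k} = \mathrm{cte} + \frac{1}{2}\log_2\left(\frac{X_{j,k}^2}{h_j^2(k)}\right) \qquad (11)$$

where cte is a constant and $h_j(k)$ is the quantization error on the component $X_{j,k}$.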
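The rate model and the overall rate Deb can be sketched as follows, assuming (11) has the high-resolution form $D_{j,k} = \mathrm{cte} + \frac{1}{2}\log_2\left(X_{j,k}^2/h_j(k)^2\right)$. The constant, the function names and the use of an element-wise error array are assumptions for illustration.

```python
import numpy as np


def bit_rates(X, h, cte=0.0):
    """Assumed high-resolution rate model (11): D_{j,k} = cte + 0.5 * log2(X_{j,k}^2 / h_j(k)^2)."""
    X = np.asarray(X, dtype=float)
    h = np.asarray(h, dtype=float)
    return cte + 0.5 * np.log2(X**2 / h**2)


def total_rate(X, h, cte=0.0):
    """Deb = sum over components j and frequencies k of D_{j,k}."""
    return bit_rates(X, h, cte).sum()
```

Under this model, halving an error $h_j(k)$ costs exactly one extra bit for that coefficient, which is the trade-off the spatial optimization has to arbitrate.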

Thus, in one embodiment, the optimization problem to be solved can be written as follows:

"Determine $h$ minimizing the norm $\|\kappa(h)\|_2$, with $\kappa(h)$ the vector (10), in each frequency $F_k$, under the constraint of the overall bit rate $\mathrm{Deb} = \sum_{j=1}^{L}\sum_{k=0}^{M-1} D_{j,k}$."

This problem can instead be solved by considering the dual problem: "Determine $h$ minimizing, in each frequency $F_k$, the overall bit rate Deb under the constraint $\|\kappa(h)\|_2 < \|\delta\|_2$". Minimizing the elementary bit rate in each frequency is a sufficient condition for minimizing the overall bit rate Deb.

The element $\delta$ is a vector indicating a given threshold of spatial perception. This threshold vector $\delta$ can be determined statistically by calculating, for different rendering systems and different ambiophonic transformation orders, the threshold at which a change in the values taken by the ambiophonic components becomes perceptible.

In one embodiment, this optimization problem is solved by the processing module using the Lagrangian method and gradient descent methods, which are known per se, for example by means of a computer program implementing the steps of the algorithm described below. During an iteration of the algorithm, each of steps a/, b/ and c/ is carried out in parallel for each frequency $F_k$, $k = 0$ to $M-1$.

Step d/ uses the results determined for all the frequencies $F_k$, $k = 0$ to $M-1$.

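Consider the following Lagrangian function: $\mathcal{L}(h,\lambda) = D_k - \lambda\cdot\left(\kappa(h) - \delta\right)$, where $D_k = \sum_{j=1}^{L} D_{j,k}$ is the bit rate in the frequency $F_k$. In a first step a/, for a frequency $F_k$, the coordinates of the Lagrange vector $\lambda$ are initialized: $\lambda = \lambda^{(0)}$.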

Then steps b/ to d/ are carried out successively, starting from iteration $l = 0$:

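In step b/, the following is determined for the frequency $F_k$:

$$h^{(l)} = \arg\min_{h}\ \mathcal{L}\left(h, \lambda^{(l)}\right)$$

This determination is made by searching for the coordinates of $h$ such that the partial derivatives $\partial\mathcal{L}(h,\lambda^{(l)})/\partial h_n$ are zero, using equations (6), (7), (8) and (9).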

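In step c/, the following is calculated for the frequency $F_k$:

$$\lambda^{(l+1)} = \max\left(\lambda^{(l)} + \alpha\, g\left(h^{(l)}\right),\ 0\right)$$

where $\alpha$ is the step size and $g$ represents the gradient of the function, with:

$$g\left(h^{(l)}\right) = \begin{pmatrix} d\theta_V(h^{(l)}) \\ dr_V(h^{(l)}) \\ d\theta_E(h^{(l)}) \\ dr_E(h^{(l)}) \end{pmatrix}$$

The value of $g(h^{(l)})$ is computed using equations (6), (7), (8) and (9).

In step d/, the bit rate $D_{j,k}^{(l)}$ allocated for coding the $j$-th ambiophonic component in the frequency $F_k$ is determined; it is equal to the value given by equation (11) for the quantization error $h^{(l)}$.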

Then the sum $D^{(l)} = \sum_{j=1}^{L}\sum_{k=0}^{M-1} D_{j,k}^{(l)}$ of the bit rates $D_{j,k}^{(l)}$ is calculated.

The value $D^{(l)}$ is then compared with the desired overall bit rate Deb.

If the bit rate $D^{(l)}$ obtained is greater than the desired value Deb, $l$ is incremented by 1 and steps b/ to d/ are repeated.

Otherwise, the iterations are stopped.

When, in step d/ of an iteration $l_f$, the bit rate $D^{(l_f)}$ obtained is less than the desired value Deb, the coordinates of the vector $h^{(l_f)}$ calculated during iteration $l_f$ for a frequency $F_k$ are those of the error minimizing the displacement of the Gerzon vectors in the frequency $F_k$.

The quantization function is thus defined for each ambiophonic component in each frequency $F_k$: the coordinate $h_j^{(l_f)}(k)$ calculated for the frequency $F_k$ represents the quantization error of the $j$-th ambiophonic component in the frequency $F_k$.
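Steps a/ to d/ can be sketched as a projected dual-ascent loop. To obtain a closed-form step b/, this sketch replaces $\|\kappa(h)\|_2^2$ by a diagonal quadratic model $\sum_j W_{j,k}h_j^2$ (for instance the diagonal of $J^TJ$, with $J$ the Jacobian used in (7)), uses a scalar threshold $\delta$, writes the Lagrangian as the per-frequency rate $\sum_j D_{j,k}$ plus $\lambda$ times the constraint (so that a non-negative $\lambda$ penalizes a violated constraint), takes the rate model (11) in the form $\mathrm{cte} + \frac{1}{2}\log_2(X_{j,k}^2/h_j^2)$, and floors $\lambda$ at a tiny positive value instead of 0 to avoid a division by zero. All of these choices, as well as the function name, the initial value and the step size, are assumptions rather than details given in the text.

```python
import numpy as np


def dual_ascent_allocation(X, W, delta, deb, lam0=1e4, alpha=1e5, cte=0.0, max_iter=1000):
    """Sketch of steps a/-d/ under an assumed diagonal model ||kappa(h)||^2 ~ sum_j W[j, k] h_j^2.

    X: (L, M) ambiophonic components; W: (L, M) positive sensitivity weights (assumption);
    delta: scalar spatial threshold; deb: target overall bit rate.
    Returns (h, rates, iteration index at which the loop stopped).
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    # Step a/: one Lagrange multiplier per frequency F_k
    lam = np.full(X.shape[1], float(lam0))
    for it in range(max_iter):
        # Step b/: exact minimizer over u_j = h_j^2 of sum_j D_jk + lam * (sum_j W h_j^2 - delta^2)
        h2 = 1.0 / (2.0 * np.log(2.0) * lam[None, :] * W)
        # Step c/: projected gradient step; the gradient is the constraint value
        g = (W * h2).sum(axis=0) - delta**2
        lam = np.maximum(lam + alpha * g, 1e-12)  # floor > 0 instead of 0 (assumption)
        # Step d/: rates from the assumed model (11), summed over j and k, compared with Deb
        rates = cte + 0.5 * np.log2(X**2 / h2)
        if rates.sum() <= deb:
            return np.sqrt(h2), rates, it
    return np.sqrt(h2), rates, max_iter
```

Starting from a large $\lambda^{(0)}$ (fine quantization, high rate), the loop relaxes $\lambda$ until the total rate falls below Deb, mirroring the stopping test of step d/. The step size $\alpha$ has to be tuned to the scale of $W$ and $\delta$, and the high-resolution model can give meaningless (negative) rates for very coarse errors.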

Once the quantization to be applied has thus been defined by the processing module 5, the module 6 determines the corresponding quantization indices for each ambiophonic spectral component and supplies these data to the module 7 for constituting a binary sequence. The latter, after carrying out, if necessary, additional processing on the received data (for example entropy coding), constitutes from these data a binary sequence intended, for example, to be transmitted in a bit stream Φ.
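The text does not fix the quantizer used by module 6. One simple possibility, assumed here for illustration, is a uniform mid-tread scalar quantizer whose step $2|h_j(k)|$ guarantees that the reconstruction error on $X_{j,k}$ never exceeds $|h_j(k)|$; the function name is hypothetical.

```python
import numpy as np


def quantize(X, h):
    """Uniform mid-tread quantizer with step 2|h| (assumption), so |X - X_hat| <= |h| per coefficient.

    Returns the integer quantization indices and the reconstructed values.
    """
    step = 2.0 * np.abs(np.asarray(h, dtype=float))
    idx = np.rint(np.asarray(X, dtype=float) / step).astype(int)
    return idx, idx * step
```

The integer indices are what module 7 would then entropy-code into the bit stream; the reconstructed values are what a decoder would recover.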

The invention thus proposes a novel quantization technique applicable to multichannel signals, which takes into account the spatial characteristics of the scene to be encoded. The quantization, defined by the bit allocation, by the quantization steps or by an index characterizing a quantizer among a set of quantizers, is determined so as to cause only a limited deviation of the Gerzon vectors, and thus to guarantee, when the quantized signals are rendered, an acoustic scene faithful to the original acoustic scene.

The velocity and energy vectors are two mathematical tools introduced by Gerzon to describe the localization effect of a synthesized sound source, in the low-frequency and high-frequency domains respectively. For a listener placed at the center of a reproduction device, the velocity vector $\vec{V}$ and the energy vector $\vec{E}$ are respectively associated with the localization effects at low and high frequencies.

In practice, in one embodiment, a transition frequency is determined that delimits the frequency domains in which the $\vec{V}$ and $\vec{E}$ criteria respectively predominate.

Thus, for frequencies above this transition frequency, the localization is predicted using the energy vector $\vec{E}$, and for frequencies below it, the localization is based on the velocity vector $\vec{V}$.

Physically, the transition frequency corresponds to the frequency beyond which the area over which the wavefront is accurately reconstructed becomes smaller than the size of the listener's head. For first-order ambiophonic systems, this transition frequency is of the order of 700 Hz.

From these data, the optimization problem can then be split into two problems. The first seeks to optimize the position of the source reconstructed after quantization in the low-frequency domain, and the second seeks to optimize it in the high-frequency domain. The number of constraints can thus be reduced to two: the optimization algorithm uses only the pair $(d\theta_V(h), dr_V(h))$ or the pair $(d\theta_E(h), dr_E(h))$, depending on whether one is in the low-frequency or the high-frequency domain.
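A rough way to place the transition frequency, assumed here and not taken from the text, is the rule of thumb $kr_{head} \approx p$, i.e. $f_t \approx pc/(2\pi r_{head})$; with $p = 1$, $c = 343$ m/s and an assumed head radius of 8.75 cm, it gives about 620 Hz, the same order of magnitude as the 700 Hz mentioned above. The sketch then selects which pair of components of $\kappa(h)$ enters the optimization in each band; the names are hypothetical.

```python
import math


def transition_frequency(order, head_radius=0.0875, c=343.0):
    """Rule-of-thumb f_t from k * r_head = order (assumption): f_t = order * c / (2 * pi * r_head)."""
    return order * c / (2.0 * math.pi * head_radius)


def active_constraints(freq_hz, f_t):
    """Components of kappa(h) kept in the optimization: velocity pair below f_t, energy pair above."""
    if freq_hz < f_t:
        return ("d_theta_V", "d_r_V")
    return ("d_theta_E", "d_r_E")
```

Halving the number of constraints per band in this way also halves the size of the gradient $g$ used in step c/.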

In the embodiment described above, the invention is implemented using the inverse of a spatial transformation applied during coding.

In one embodiment, the Gerzon vectors are computed and used independently of any transformation possibly used during coding, i.e. the invention may be implemented whether or not the signals have undergone a spatial or other transformation.

Indeed, these Gerzon vectors are physical parameters that make it possible to characterize the wavefront reconstructed by the superposition of the waves emitted by the different loudspeakers (see Jérôme Daniel, "Representation of acoustic fields, application to the transmission and reproduction of complex sound scenes in a multimedia context", doctoral thesis, Université Paris 6, July 31, 2001).

With reference to FIG. 3, which represents a rendering device 10 comprising N loudspeakers $H_i$ ($i = 1$ to $N$), of which only the loudspeakers $H_1$, $H_n$ and $H_N$ are shown, consider a listening point E in space, which represents the center of the sound reproduction device 10.

In this case, the velocity and energy vectors relative to this listening point E can be calculated using the following formulas:

$$\vec{V} = \frac{\sum_{i=1}^{N} G_i\,\vec{U}_i}{\sum_{i=1}^{N} G_i}, \qquad \vec{E} = \frac{\sum_{i=1}^{N} G_i^2\,\vec{U}_i}{\sum_{i=1}^{N} G_i^2}$$

where $(G_1, \ldots, G_N)$ are the gains of the different loudspeakers $H_i$, $i = 1$ to $N$, rendering the sound scene, and the vectors $\vec{U}_i$ are unit vectors pointing from the point E towards the loudspeakers $H_i$.

From these formulas, the Gerzon vectors can be computed without prior ambiophonic encoding.
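Computing the Gerzon vectors directly from loudspeaker gains and positions, without any ambiophonic transform, only requires the unit vectors $\vec{U}_i$ from the listening point to each loudspeaker. This sketch works in 2D or 3D; the positions and gains are illustrative inputs and the function name is hypothetical.

```python
import numpy as np


def gerzon_from_gains(gains, speaker_positions, listening_point):
    """Velocity and energy vectors at listening point E from loudspeaker gains G_i."""
    g = np.asarray(gains, dtype=float)
    diff = np.asarray(speaker_positions, dtype=float) - np.asarray(listening_point, dtype=float)
    u = diff / np.linalg.norm(diff, axis=1, keepdims=True)  # unit vectors U_i from E towards H_i
    v = (g[:, None] * u).sum(axis=0) / g.sum()
    e = ((g**2)[:, None] * u).sum(axis=0) / (g**2).sum()
    return v, e
```

Two loudspeakers at $(1, \pm 1, 0)$ with equal gains, seen from the origin, give $\vec{V} = \vec{E} = (1/\sqrt{2}, 0, 0)$: the image is centered but with a modulus below 1.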

For a spatial quantizer based on the Gerzon vectors, the quantization problem can then be defined as follows:

For a given bit rate Deb, the variations of the velocity vector, $\Delta V = \|\hat{V} - \vec{V}\|$, and of the energy vector, $\Delta E = \|\hat{E} - \vec{E}\|$, must be minimized, $\hat{V}$ and $\hat{E}$ representing respectively the velocity vector and the energy vector calculated after quantization. This problem is solved in a similar way to the case described above using the ambiophonic transform, by solving the corresponding Lagrangian problem.
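The spatial distortion measures $\Delta V$ and $\Delta E$ can be evaluated for any candidate quantization of the gains. The sketch below assumes, for illustration only, a uniform scalar quantizer with a single step size applied to the gains $G_i$; it does not solve the rate-constrained problem, it only measures the displacement that a given quantization causes. The function names are hypothetical.

```python
import numpy as np


def gerzon(gains, u):
    """Velocity and energy vectors from gains G_i and unit vectors u (one row per loudspeaker)."""
    g = np.asarray(gains, dtype=float)
    return (g @ u) / g.sum(), (g**2 @ u) / (g**2).sum()


def spatial_distortion(gains, u, step):
    """Delta_V = ||V_hat - V|| and Delta_E = ||E_hat - E|| after uniform quantization (assumption)."""
    g_hat = step * np.rint(np.asarray(gains, dtype=float) / step)
    v, e = gerzon(gains, u)
    v_hat, e_hat = gerzon(g_hat, u)
    return np.linalg.norm(v_hat - v), np.linalg.norm(e_hat - e)
```

Sweeping `step` and keeping the coarsest value whose $\Delta V$ and $\Delta E$ stay under a perceptual threshold is one simple, if brute-force, way to trade rate against spatial fidelity.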

Claims

1. A method of encoding components ($X_{i,k}$) of an audio scene comprising N signals ($S_1, \ldots, S_N$) with N > 1, comprising a step of quantizing at least some of the components, characterized in that said quantization is defined as a function of at least one energy vector ($\vec{E}$) and/or one velocity vector ($\vec{V}$) associated with Gerzon criteria and as a function of said components.

2. Method according to claim 1, wherein the quantization is defined as a function of variations of at least one of said vectors ($\vec{V}$, $\vec{E}$) when the components ($X_{i,k}$) vary.

3. Method according to the preceding claim, wherein variations of the components ($X_{i,k}$) corresponding to the minimization or limitation of the variations of at least one of the vectors ($\vec{V}$, $\vec{E}$) are determined and, as a function of said determined component variations, quantization error values making it possible to define the quantization of the components are determined.

4. Method according to one of claims 1 to 3, characterized in that it comprises a step of detecting a transition frequency for determining which of the energy vector and the velocity vector is to be taken into account to define the quantization of the components.

5. Method according to one of the preceding claims, characterized in that the components are obtained by spatial transformation.

6. Method according to claim 5, characterized in that the spatial components are ambiophonic components, determined by an ambiophonic spatial transformation.

7. Method according to claim 5 or 6, wherein the energy vector ($\vec{E}$) is calculated as a function of an inverse spatial transformation (D) on said spatial components and/or the velocity vector ($\vec{V}$) is calculated as a function of an inverse spatial transformation (D) on said spatial components.

8. Module (5) for processing components ($X_{i,k}$) originating from an audio scene comprising N signals ($S_1, \ldots, S_N$) with N > 1, comprising means for determining elements defining a step of quantizing at least some of the components, as a function at least of the energy vector ($\vec{E}$) and/or the velocity vector ($\vec{V}$) associated with Gerzon criteria and as a function of said components.

9. An audio encoder (1) adapted to encode components ($X_{i,k}$) of an audio scene comprising N signals ($S_1, \ldots, S_N$) with N > 1, comprising: a component processing module (5) according to claim 8; and a quantization module adapted to define quantization data associated with the components as a function of at least elements determined by the processing module.

10. Computer program to be installed in a processing module (5), said program comprising instructions for implementing, when the program is executed by processing means of said module, the steps of a method according to any one of claims 1 to 7.