HK1238786B - Method and apparatus for compressing and decompressing a higher order ambisonics signal representation

Info

Publication number: HK1238786B
Application number: HK17112517.0A
Authority: HK
Inventors: A. Krueger; S. Kordon; J. Boehm; J.-M. Batke
Original assignee: Dolby International AB
Priority date: 2012-05-14
Filing date: 2017-11-28
Publication date: 2021-09-17

Description

Method and apparatus for compressing and decompressing high-order Ambisonics signal representation

The present application is a divisional application of invention patent application No. 201380025029.9, filed on May 6, 2013 and entitled "Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation".

Technical Field

The present invention relates to a method and an apparatus for compressing and decompressing a Higher Order Ambisonics signal representation, wherein directional and ambient components are processed differently.

Background Art

Higher Order Ambisonics (HOA) offers the advantage of capturing a complete sound field in the vicinity of a specific location in three-dimensional space, which location is called the "sweet spot". In contrast to channel-based techniques like stereo or surround sound, such an HOA representation is independent of a specific loudspeaker set-up. However, this flexibility comes at the expense of a decoding process that is required for playback of the HOA representation on a particular loudspeaker set-up.

HOA is based on a description of the complex amplitudes of the air pressure for individual angular wavenumbers k, for positions x in the vicinity of a desired listener position, using a truncated spherical harmonics (SH) expansion. Without loss of generality, the desired listener position may be assumed to be the origin of the spherical coordinate system. The spatial resolution of this representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, i.e. O = (N+1)². For example, a typical HOA representation using order N = 4 requires O = 25 HOA coefficients. Given a desired sampling rate f_S and a number of bits per sample N_b, the total bit rate for transmitting an HOA signal representation is O·f_S·N_b. Transmitting an HOA signal representation of order N = 4 with a sampling rate of f_S = 48 kHz and N_b = 16 bits per sample therefore results in a bit rate of 19.2 Mbit/s. Compression of HOA signal representations is thus highly desirable.
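The bit-rate figure above follows directly from O = (N+1)² and the product O·f_S·N_b. The short Python sketch below reproduces the arithmetic; the function names are illustrative only and do not come from the patent.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences, O = (N + 1)^2, for order N."""
    return (order + 1) ** 2


def hoa_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate O * f_S * N_b in bits per second."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    rate = hoa_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
    # Order 4 gives 25 coefficient sequences and 19.2 Mbit/s.
    print(hoa_coefficient_count(4), rate / 1e6)
```

Because the count is quadratic in N, doubling the order roughly quadruples the uncompressed rate.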

An overview of existing spatial audio compression approaches can be found in patent application EP 10306472.1 or in I. Elfitri, B. Günel, A.M. Kondoz, "Multichannel Audio Coding Based on Analysis by Synthesis", Proceedings of the IEEE, Vol. 99, No. 4, pp. 657-670, April 2011.

The following techniques are more relevant to the present invention.

B-format signals, which are equivalent to first-order Ambisonics representations, can be compressed using Directional Audio Coding (DirAC) as described in V. Pulkki, "Spatial Sound Reproduction with Directional Audio Coding", Journal of the Audio Eng. Society, Vol. 55(6), pp. 503-516, 2007. In one version proposed for teleconferencing applications, the B-format signal is coded into a single omnidirectional signal together with side information in the form of a single direction and a diffuseness parameter per frequency band. However, the resulting drastic reduction of the data rate comes at the price of a lower signal quality at reproduction. Further, DirAC is limited to the compression of first-order Ambisonics representations, which suffer from a very low spatial resolution.

The known methods for compressing HOA representations with N > 1 are rather few. One of them performs direct encoding of the individual HOA coefficient sequences with the perceptual Advanced Audio Coding (AAC) codec, see E. Hellerud, I. Burnett, A. Solvang, U. Peter Svensson, "Encoding Higher Order Ambisonics with AAC", 124th AES Convention, Amsterdam, 2008. However, the inherent problem with this approach is the perceptual coding of signals that are never listened to. The reconstructed playback signals are usually obtained as a weighted sum of the HOA coefficient sequences. That is why there is a high probability that perceptual coding noise is unmasked when the decompressed HOA representation is rendered on a particular loudspeaker set-up. In more technical terms, the main cause of the unmasking is the high cross-correlation between the individual HOA coefficient sequences. Because the coding noise signals in the individual HOA coefficient sequences are usually uncorrelated with each other, a constructive superposition of the perceptual coding noise may occur while, at the same time, the noise-free HOA coefficient sequences cancel at superposition. A further problem is that these cross-correlations reduce the efficiency of the perceptual coders.

In order to minimise the extent of these effects, EP 10306472.1 proposes transforming the HOA representation into an equivalent representation in the spatial domain before perceptual coding. The spatial domain signals correspond to conventional directional signals, and would correspond to the loudspeaker signals if the loudspeakers were positioned in exactly the same directions as those assumed for the spatial domain transform.

The transform to the spatial domain reduces the cross-correlations between the individual spatial domain signals. However, the cross-correlations are not completely eliminated. An example of relatively high cross-correlation is a directional signal whose direction falls between the adjacent directions covered by the spatial domain signals.

A further disadvantage of EP 10306472.1 and the above-mentioned Hellerud et al. paper is that the number of perceptually coded signals is (N+1)², where N is the order of the HOA representation. Therefore, the data rate of the compressed HOA representation grows quadratically with the Ambisonics order.

The compression processing of the present invention decomposes an HOA sound field representation into a directional component and an ambient component. In particular for the computation of the directional sound field component, a new processing is described below for the estimation of several dominant sound directions.

Regarding existing methods for direction estimation based on Ambisonics, the above-mentioned Pulkki paper describes a method, used in connection with DirAC coding, for estimating the direction from a B-format sound field representation. The direction is obtained from the average intensity vector, which points in the direction of the flow of the sound field energy. An alternative based on the B-format is proposed in D. Levin, S. Gannot, E.A.P. Habets, "Direction-of-Arrival Estimation using Acoustic Vector Sensors in the Presence of Noise", IEEE Proc. of the ICASSP, pp. 105-108, 2011. The direction estimation is performed iteratively by searching for the direction that provides the maximum power of a beamformer output signal steered into that direction.

However, both approaches are constrained to the B-format for the direction estimation, which suffers from a relatively low spatial resolution. A further disadvantage is that the estimation is restricted to a single dominant direction only.

HOA representations offer an improved spatial resolution and thus allow an improved estimation of several dominant directions. Existing methods for estimating several directions from HOA sound field representations are quite rare. An approach based on compressive sensing is proposed in N. Epain, C. Jin, A. van Schaik, "The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields", 127th Convention of the Audio Eng. Soc., New York, 2009, and in A. Wabnitz, N. Epain, A. van Schaik, C. Jin, "Time Domain Reconstruction of Spatial Sound Fields Using Compressed Sensing", IEEE Proc. of the ICASSP, pp. 465-468, 2011. The main idea is to assume that the sound field is spatially sparse, i.e. that it consists of only a small number of directional signals. After allocating a large number of test directions on the sphere, an optimisation algorithm is employed in order to find as few test directions as possible, together with the corresponding directional signals, such that they are well described by the given HOA representation. This method provides an improved spatial resolution compared to that actually provided by the given HOA representation, because it circumvents the spatial dispersion resulting from the limited order of the given HOA representation. However, the performance of the algorithm depends heavily on whether the sparsity assumption is satisfied. In particular, the approach fails if the sound field contains any minor additional ambient components, or if the HOA representation is affected by noise, as occurs when it is computed from multi-channel recordings.

A further, rather intuitive method is to transform the given HOA representation to the spatial domain as described in B. Rafaely, "Plane-wave decomposition of the sound field on a sphere by spherical convolution", J. Acoust. Soc. Am., Vol. 116, No. 4, pp. 2149-2157, October 2004, and then to search for maxima in the directional powers. The disadvantage of this approach is that the presence of ambient components leads to a blurring of the directional power distribution and to a displacement of the maxima of the directional powers compared to the absence of any ambient component.

Summary of the Invention

A problem to be solved by the invention is to provide a compression for HOA signals whereby the high spatial resolution of the HOA signal representation is still kept. This problem is solved by the methods disclosed in claims 1 and 2. Apparatuses that utilise these methods are disclosed in claims 3 and 4.

The present invention addresses the compression of Higher Order Ambisonics (HOA) representations of sound fields. In this application, the term "HOA" denotes the Higher Order Ambisonics representation as such as well as a correspondingly encoded or represented audio signal. Dominant sound directions are estimated, and the HOA signal representation is decomposed into a number of dominant directional signals in the time domain with related direction information, and an ambient component in the HOA domain, followed by compression of the ambient component by reducing its order. After that decomposition, the ambient HOA component of reduced order is transformed to the spatial domain and is perceptually coded together with the directional signals.

At the receiver or decoder side, the encoded directional signals and the order-reduced encoded ambient component are perceptually decompressed. The perceptually decompressed ambient signals are transformed to an HOA domain representation of reduced order, followed by an order extension. The total HOA representation is re-composed from the directional signals and the corresponding direction information, and from the original-order ambient HOA component.

Advantageously, the ambient sound field component can be represented with sufficient accuracy by an HOA representation of lower than original order, and the extraction of the dominant directional signals ensures that a high spatial resolution is still achieved after compression and decompression.

In principle, the inventive method is suited for compressing a Higher Order Ambisonics (HOA) signal representation, said method including the steps:

- estimating dominant directions, wherein said dominant direction estimation depends on a directional power distribution of the energetically dominant HOA components;

- decomposing or decoding the HOA signal representation into a number of dominant directional signals in the time domain with related direction information, and a residual ambient component in the HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;

- compressing said residual ambient component by reducing its order as compared to its original order;

- transforming said residual ambient HOA component of reduced order to the spatial domain;

- perceptually encoding said dominant directional signals and said transformed residual ambient HOA component.

In principle, the inventive method is suited for decompressing a Higher Order Ambisonics (HOA) signal representation that was compressed by the steps:

- transforming said residual ambient component of reduced order to the spatial domain;

- perceptually encoding said dominant directional signals and said transformed residual ambient HOA component;

said method including the steps:

- perceptually decoding said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component;

- inverse transforming said perceptually decoded transformed residual ambient HOA component so as to obtain an HOA domain representation;

- performing an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component;

- composing said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to obtain an HOA signal representation.
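The order reduction on the compression side and the order extension on the decompression side can be illustrated with a small Python sketch. It rests on assumptions that this section does not state: the coefficient sequences are stored as rows of a matrix sorted by increasing order n (so orders up to N_red occupy the first (N_red+1)² rows), reduction is plain truncation, and extension is zero-padding. The function names are hypothetical.

```python
import numpy as np


def reduce_order(hoa: np.ndarray, reduced_order: int) -> np.ndarray:
    """Keep only the coefficient sequences of order n <= reduced_order.

    hoa has shape ((N + 1)^2, num_samples), rows sorted by increasing order
    (assumed layout, not prescribed by the text above).
    """
    keep = (reduced_order + 1) ** 2
    if keep > hoa.shape[0]:
        raise ValueError("reduced order exceeds the order of the input")
    return hoa[:keep].copy()


def extend_order(hoa_reduced: np.ndarray, original_order: int) -> np.ndarray:
    """Zero-pad a reduced-order representation back to the original order."""
    total = (original_order + 1) ** 2
    if hoa_reduced.shape[0] > total:
        raise ValueError("input already exceeds the requested order")
    out = np.zeros((total, hoa_reduced.shape[1]), dtype=hoa_reduced.dtype)
    out[: hoa_reduced.shape[0]] = hoa_reduced
    return out


if __name__ == "__main__":
    ambient = np.random.default_rng(0).standard_normal((25, 8))  # order 4
    reduced = reduce_order(ambient, reduced_order=2)             # 9 rows
    restored = extend_order(reduced, original_order=4)           # 25 rows
    print(reduced.shape, restored.shape)
```

Under these assumptions the round trip preserves the low-order rows exactly and discards the higher-order detail of the ambient component, which is the intended source of the data-rate saving.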

In principle, the inventive apparatus is suited for compressing a Higher Order Ambisonics (HOA) signal representation, said apparatus including:

- means being adapted for estimating dominant directions, wherein said dominant direction estimation depends on a directional power distribution of the energetically dominant HOA components;

- means being adapted for decomposing or decoding the HOA signal representation into a number of dominant directional signals in the time domain with related direction information, and a residual ambient component in the HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;

- means being adapted for compressing said residual ambient component by reducing its order as compared to its original order;

- means being adapted for transforming said residual ambient component of reduced order to the spatial domain;

- means being adapted for perceptually encoding said dominant directional signals and said transformed residual ambient HOA component.

In principle, the inventive apparatus is suited for decompressing a Higher Order Ambisonics (HOA) signal representation that was compressed by the steps set out above for the decompression method, said apparatus including:

- means being adapted for perceptually decoding the perceptually encoded dominant directional signals and the perceptually encoded transformed residual ambient HOA component;

- means being adapted for inverse transforming said perceptually decoded transformed residual ambient HOA component so as to obtain an HOA domain representation;

- means being adapted for performing an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component;

- means being adapted for composing said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to obtain an HOA signal representation.

Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.

Brief Description of the Drawings

Exemplary embodiments of the invention are described with reference to the accompanying drawings, in which:

Fig. 1 shows the normalised dispersion function v_N(Θ) for different Ambisonics orders N and for angles Θ ∈ [0, π];

Fig. 2 is a block diagram of the compression processing according to the invention;

Fig. 3 is a block diagram of the decompression processing according to the invention.

Detailed Description

Ambisonics signals describe sound fields within source-free regions using a spherical harmonics (SH) expansion. The feasibility of this description can be attributed to the physical property that the temporal and spatial behaviour of the sound pressure is essentially determined by the wave equation.

Wave equation and spherical harmonics expansion

For a more detailed description of Ambisonics, a spherical coordinate system is assumed in the following, in which a point in space $x = (r, \theta, \phi)^T$ is represented by a radius $r > 0$ (i.e. the distance to the coordinate origin), an inclination angle $\theta \in [0, \pi]$ measured from the polar axis $z$, and an azimuth angle $\phi \in [0, 2\pi[$ measured in the x-y plane from the x-axis. In this spherical coordinate system, the wave equation for the sound pressure $p(t, x)$ within a connected source-free region, where $t$ denotes time, is given by (see the textbook by Earl G. Williams, "Fourier Acoustics", Applied Mathematical Sciences, Vol. 93, Academic Press, 1999):

$$ \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial p(t,x)}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial p(t,x)}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 p(t,x)}{\partial\phi^2} - \frac{1}{c_s^2}\frac{\partial^2 p(t,x)}{\partial t^2} = 0 \tag{1} $$

where $c_s$ denotes the speed of sound. Consequently, the Fourier transform of the sound pressure with respect to time, i.e.

$$ P(\omega, x) := \mathcal{F}_t\{p(t, x)\} \tag{2} $$

$$ := \int_{-\infty}^{\infty} p(t, x)\, e^{-i\omega t}\, dt \tag{3} $$

where $i$ denotes the imaginary unit, may be expanded into a series of SH according to the Williams textbook:

$$ P(k c_s, (r, \theta, \phi)^T) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} p_n^m(kr)\, Y_n^m(\theta, \phi) \tag{4} $$

It should be noted that this expansion is valid for all points $x$ within a connected source-free region, which corresponds to the region of convergence of the series.

In equation (4), $k$ denotes the angular wavenumber defined by

$$ k := \frac{\omega}{c_s} \tag{5} $$

and $p_n^m(kr)$ denotes the SH expansion coefficients, which depend only on the product $kr$.

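Further, $Y_n^m(\theta, \phi)$ are the SH functions of order $n$ and degree $m$:

$$ Y_n^m(\theta, \phi) := \sqrt{\frac{(2n+1)}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\theta)\, e^{im\phi} \tag{6} $$

where $P_n^m(\cos\theta)$ denote the associated Legendre functions and $(\cdot)!$ denotes the factorial.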

The associated Legendre functions for non-negative degree indices $m$ are defined through the Legendre polynomials $P_n(x)$ by

$$ P_n^m(x) := (-1)^m (1 - x^2)^{m/2}\, \frac{d^m}{dx^m} P_n(x) \quad \text{for } m \geq 0. \tag{7} $$

For negative degree indices, i.e. $m < 0$, the associated Legendre functions are defined by

$$ P_n^m(x) := (-1)^m\, \frac{(n+m)!}{(n-m)!}\, P_n^{-m}(x) \quad \text{for } m < 0. \tag{8} $$

The Legendre polynomials $P_n(x)$, $n \geq 0$, can in turn be defined using Rodrigues' formula as

$$ P_n(x) = \frac{1}{2^n\, n!}\, \frac{d^n}{dx^n} (x^2 - 1)^n \tag{9} $$
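Rodrigues' formula, $P_n(x) = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2-1)^n$, can be checked numerically. The following Python sketch builds the polynomial by literal differentiation with NumPy's polynomial classes and compares it against NumPy's built-in Legendre basis; the function name `legendre_rodrigues` is illustrative.

```python
from math import factorial

import numpy as np
from numpy.polynomial import Legendre, Polynomial


def legendre_rodrigues(n: int) -> Polynomial:
    """Legendre polynomial P_n built literally from Rodrigues' formula."""
    base = Polynomial([-1.0, 0.0, 1.0]) ** n        # (x^2 - 1)^n
    return base.deriv(n) / (2 ** n * factorial(n))  # n-th derivative, scaled


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 11)
    for n in range(6):
        reference = Legendre.basis(n)(x)            # NumPy's own P_n
        assert np.allclose(legendre_rodrigues(n)(x), reference)
    print("Rodrigues' formula matches numpy's Legendre basis for n = 0..5")
```

For numerical work a recurrence (as used internally by `numpy.polynomial.legendre`) is preferable to repeated differentiation, which becomes ill-conditioned for large n.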

In the prior art, e.g. in M. Poletti, "Unified Description of Ambisonics using Real and Complex Spherical Harmonics", Proceedings of the Ambisonics Symposium 2009, 25-27 June 2009, Graz, Austria, there also exist definitions of the SH functions that deviate from equation (6) by a factor of $(-1)^m$ for negative degree indices $m$.

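Alternatively, the Fourier transform of the sound pressure with respect to time can be expressed using real SH functions $S_n^m(\theta, \phi)$ as

$$ P(k c_s, (r, \theta, \phi)^T) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} q_n^m(kr)\, S_n^m(\theta, \phi) \tag{10} $$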

In the literature, there exist various definitions of real SH functions (see, for example, the above-mentioned Poletti paper). One possible definition, which is applied throughout this document, is given by:

where $(\cdot)^*$ denotes complex conjugation. An alternative expression is obtained by inserting equation (6) into equation (11):

where

Although the real SH functions are real-valued by definition, this does not hold in general for the corresponding expansion coefficients.

The complex SH functions are related to the real SH functions as follows:

The complex SH functions $Y_n^m(\theta, \phi)$ as well as the real SH functions $S_n^m(\theta, \phi)$, with the direction vector $\Omega := (\theta, \phi)^T$, form an orthonormal basis for square-integrable complex-valued functions on the unit sphere $\mathbb{S}^2$ in three-dimensional space, and thus satisfy the conditions

$$ \int_{\mathbb{S}^2} Y_n^m(\Omega)\, Y_{n'}^{m'*}(\Omega)\, d\Omega = \delta_{n-n'}\,\delta_{m-m'} \tag{15} $$

$$ \int_{\mathbb{S}^2} S_n^m(\Omega)\, S_{n'}^{m'}(\Omega)\, d\Omega = \delta_{n-n'}\,\delta_{m-m'} \tag{16} $$

where $\delta$ denotes the Kronecker delta function. The second result can be derived using equation (15) and the definition of the real spherical harmonics in equation (11).
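The orthonormality of the complex SH functions on the unit sphere can be verified numerically. The Python sketch below is one way to do so, under stated assumptions: the SH normalisation is the standard $\sqrt{\frac{2n+1}{4\pi}\frac{(n-|m|)!}{(n+|m|)!}}$, the associated Legendre functions carry a $(-1)^m$ phase, and negative degrees are obtained as $Y_n^{-m} = (-1)^m\,\overline{Y_n^m}$. The quadrature (Gauss-Legendre in $\cos\theta$, uniform in $\phi$) is chosen here for convenience and is not taken from the patent; all function names are illustrative.

```python
from math import factorial

import numpy as np
from numpy.polynomial import legendre as leg


def assoc_legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n, including a (-1)^m phase (assumed convention)."""
    series = np.zeros(n + 1)
    series[n] = 1.0                              # Legendre series of P_n
    derivative = leg.legder(series, m)           # m-th derivative of P_n
    return (-1.0) ** m * (1.0 - x ** 2) ** (m / 2.0) * leg.legval(x, derivative)


def complex_sh(n, m, theta, phi):
    """Complex SH Y_n^m(theta, phi) with orthonormal scaling."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * assoc_legendre(n, am, np.cos(theta)) * np.exp(1j * am * phi)
    # Negative degree: Y_n^{-m} = (-1)^m * conj(Y_n^m)
    return y if m >= 0 else (-1.0) ** am * np.conj(y)


def gram_matrix(max_order):
    """Matrix of inner products <Y_n^m, Y_n'^m'> over the unit sphere."""
    x, w = leg.leggauss(max_order + 1)           # exact for degree <= 2N + 1
    n_phi = 2 * max_order + 1                    # exact for |m - m'| <= 2N
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    theta_g, phi_g = np.meshgrid(np.arccos(x), phi, indexing="ij")
    weights = np.repeat(w, n_phi) * (2 * np.pi / n_phi)
    columns = [complex_sh(n, m, theta_g.ravel(), phi_g.ravel())
               for n in range(max_order + 1) for m in range(-n, n + 1)]
    y = np.stack(columns, axis=1)                # shape (points, (N + 1)^2)
    return (y.conj().T * weights) @ y


if __name__ == "__main__":
    gram = gram_matrix(3)
    print(np.allclose(gram, np.eye(16), atol=1e-9))
```

The Gram matrix equals the identity up to rounding, which is the discrete counterpart of the Kronecker-delta condition. The result does not depend on the chosen sign conventions, since they only change each basis function by a factor of ±1.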

Interior problem and Ambisonics coefficients

The purpose of Ambisonics is to represent a sound field in the vicinity of the coordinate origin. Without loss of generality, this region of interest is here assumed to be a ball of radius $R$ centred at the coordinate origin, which is specified by the set $\{x \mid 0 \leq r \leq R\}$. A crucial assumption for the representation is that this ball does not contain any sound sources. Finding the representation of the sound field within this ball is termed the "interior problem", see the above-mentioned Williams textbook.

It can be shown that for the interior problem the SH function expansion coefficients $p_n^m(kr)$ can be expressed as

$$ p_n^m(kr) = a_n^m(k)\, j_n(kr) \tag{17} $$

where $j_n(\cdot)$ denotes the spherical Bessel function of the first kind of order $n$. From equation (17) it follows that the complete information about the sound field is contained in the coefficients $a_n^m(k)$, which are referred to as Ambisonics coefficients.

Similarly, the coefficients of the real SH function expansion $q_n^m(kr)$ can be factorised as

$$ q_n^m(kr) = b_n^m(k)\, j_n(kr) \tag{18} $$

where the coefficients $b_n^m(k)$ are referred to as Ambisonics coefficients with respect to the expansion using real-valued SH functions. They are related to $a_n^m(k)$ through:

Plane wave decomposition

The sound field within a source-free ball centred at the coordinate origin can be expressed as a superposition of an infinite number of plane waves of different angular wavenumbers k, impinging on the ball from all possible directions, see the above-mentioned Rafaely "Plane-wave decomposition..." paper. Assuming that the complex amplitude of a plane wave with angular wavenumber k from the direction Ω₀ is given by D(k, Ω₀), it can be shown in a similar way, using equations (11) and (19), that the corresponding Ambisonics coefficients with respect to the real SH function expansion are given by:

Consequently, the Ambisonics coefficients for the sound field resulting from a superposition of an infinite number of plane waves of angular wavenumber k are obtained by integrating equation (20) over all possible directions:

The function D(k, Ω) is termed the "amplitude density" and is assumed to be square-integrable on the unit sphere. It can be expanded into a series of real SH functions as

where the expansion coefficients are equal to the integral occurring in equation (22), i.e.

By inserting equation (24) into equation (22) it can be seen that the Ambisonics coefficients are a scaled version of the expansion coefficients, i.e.

When applying the inverse Fourier transform with respect to time to the scaled Ambisonics coefficients and to the amplitude density function D(k, Ω), the corresponding time-domain quantities are obtained:

Then, in the time domain, equation (24) can be formulated as

The time-domain directional signal d(t, Ω) may be represented by a real SH function expansion according to

Using the fact that the SH functions are real-valued, its complex conjugate can be expressed by

Assuming the time-domain signal d(t, Ω) to be real-valued, i.e. d(t, Ω) = d*(t, Ω), it follows from the comparison of equation (29) with equation (30) that the coefficients are real-valued in that case, i.e.

In the following, these coefficients are referred to as scaled time-domain Ambisonics coefficients.

In the following it is also assumed that the sound field representation is given by these coefficients, which are described in more detail in the section dealing with the compression processing below.

It is noted that the time-domain HOA representation by the coefficients used for the processing according to the invention is equivalent to the corresponding frequency-domain HOA representation. Therefore the described compression and decompression can be realised equivalently in the frequency domain with minor corresponding modifications of the equations.

Spatial resolution with finite order

In practice, the sound field in the vicinity of the coordinate origin is described using only a finite number of Ambisonics coefficients of order n ≤ N. Computing the amplitude density function from the truncated series of SH functions according to

introduces a kind of spatial dispersion compared to the true amplitude density function D(k, Ω), see the above-mentioned "Plane-wave decomposition..." paper. This can be seen by computing, using equation (31), the amplitude density function for a single plane wave from the direction Ω₀:

where

where Θ denotes the angle between the two vectors pointing in the directions Ω and Ω₀, which satisfies the property

$$ \cos\Theta = \cos\theta \cos\theta_0 + \cos(\phi - \phi_0)\, \sin\theta \sin\theta_0 \tag{39} $$
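Equation (39) is the spherical law of cosines for two directions given by inclination and azimuth. A brief Python sketch, with illustrative function names, evaluates it and cross-checks the result against the dot product of the corresponding Cartesian unit vectors:

```python
import numpy as np


def cos_angle_between(theta, phi, theta0, phi0):
    """cos(Theta) between directions (theta, phi) and (theta0, phi0), eq. (39)."""
    return (np.cos(theta) * np.cos(theta0)
            + np.cos(phi - phi0) * np.sin(theta) * np.sin(theta0))


def unit_vector(theta, phi):
    """Cartesian unit vector for inclination theta and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


if __name__ == "__main__":
    a = (0.7, 1.9)   # (theta, phi) in radians
    b = (2.1, 4.4)
    via_formula = cos_angle_between(*a, *b)
    via_dot = unit_vector(*a) @ unit_vector(*b)
    print(np.isclose(via_formula, via_dot))
```

Working with the cosine rather than Θ itself avoids an `arccos` call and is all that is needed when the angle only enters through Legendre polynomials of cos Θ.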

In equation (34), the Ambisonics coefficients for a plane wave given in equation (20) are employed, while in equations (35) and (36) some mathematical theorems are exploited, see the above-mentioned "Plane-wave decomposition..." paper. The property in equation (33) can be shown using equation (14).

Comparing equation (37) with the true amplitude density function

where δ(·) denotes the Dirac delta function, the spatial dispersion becomes obvious from the replacement of the scaled Dirac delta function by the dispersion function $v_N(\Theta)$, which, after having been normalised by its maximum value, is shown in Fig. 1 for different Ambisonics orders N and angles Θ ∈ [0, π].

Because the first zero of v_N(Θ) is located approximately at π/N for N ≥ 4 (see the above-mentioned "Plane-wave decomposition..." paper), the dispersion effect is reduced (and thus the spatial resolution is improved) with increasing Ambisonics order N.

For N → ∞ the dispersion function v_N(Θ) converges to the scaled Dirac delta function. This can be seen if the completeness relation for the Legendre polynomials

is used together with equation (35) to express the limit of v_N(Θ) for N → ∞ as

When defining the vector of real SH functions of order n ≤ N by

where O = (N+1)² and (·)^T denotes transposition, the comparison of equation (37) with equation (33) shows that the dispersion function can be expressed through the scalar product of two real SH vectors as

v_N(Θ) = S^T(Ω) S(Ω₀) (47)
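The behaviour of the dispersion function can be checked numerically. The sketch below assumes orthonormal real SH functions, for which the addition theorem turns the scalar product in (47) into a Legendre sum, v_N(Θ) = Σ_{n=0..N} (2n+1)/(4π) · P_n(cos Θ). The normalisation used in the original equations is not reproduced in this text, so the absolute scaling here is an assumption, and the function names are hypothetical.

```python
import math

def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] via the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]

def dispersion(order, angle):
    """v_N as a Legendre sum (assumes orthonormal real SH functions)."""
    p = legendre(order, math.cos(angle))
    return sum((2 * n + 1) * p[n] for n in range(order + 1)) / (4 * math.pi)

def first_zero(order, step=1e-3):
    """Scan for the first sign change of v_N, then refine it by bisection."""
    if order < 1:
        raise ValueError("v_0 is constant and has no zero")
    lo = 0.0
    while dispersion(order, lo + step) > 0.0:
        lo += step
    hi = lo + step
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if dispersion(order, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this normalisation the maximum is v_N(0) = (N+1)²/(4π), and `first_zero(order)` can be compared with π/N to see how the main lobe narrows as the order grows.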

In the time domain, the dispersion can be equivalently expressed as

Sampling

For some applications it is desirable to determine the scaled time-domain Ambisonics coefficients from the samples of the time-domain amplitude density function d(t,Ω) at a finite number J of discrete directions Ω_j. The integral in equation (28) is then approximated by a finite sum according to B. Rafaely, "Analysis and Design of Spherical Microphone Arrays", IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, January 2005:

where the g_j denote some appropriately chosen sampling weights. In contrast to the "Analysis and Design..." paper, approximation (50) refers to a time-domain representation using real SH functions rather than a frequency-domain representation using complex SH functions. A necessary condition for approximation (50) to become exact is that the amplitude density is of limited harmonic order N, meaning that

for n > N. (51)

If this condition is not met, approximation (50) suffers from spatial aliasing errors, see B. Rafaely, "Spatial Aliasing in Spherical Microphone Arrays", IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003-1010, March 2007.

The second necessary condition requires the sampling points Ω_j and the corresponding weights to fulfil the corresponding condition given in the "Analysis and Design..." paper:

for m, m′ ≤ N (52)

Conditions (51) and (52) jointly are sufficient for exact sampling.

The sampling condition (52) consists of a set of linear equations, which can be formulated compactly using a single matrix equation as

ΨGΨ^H = I (53)

where Ψ denotes the mode matrix defined by

and G denotes the matrix with the weights on its diagonal, i.e.

G := diag(g₁, ..., g_J) (55)

From equation (53) it can be seen that a necessary condition for equation (52) to hold is that the number J of sampling points fulfils J ≥ O. The values of the time-domain amplitude density at the J sampling points are collected into the vector

w(t) := (d(t,Ω₁), ..., d(t,Ω_J)) (56)

and the vector of scaled time-domain Ambisonics coefficients is defined by

Both vectors are related through the SH function expansion (29). This relation provides the following system of linear equations:

w(t) = Ψ^H c(t) (58)

Using the introduced vector notation, the computation of the scaled time-domain Ambisonics coefficients from the values of the time-domain amplitude density function samples can be written as

c(t) ≈ ΨGw(t) (59)

Given a fixed Ambisonics order N, it is often not possible to compute a number J ≥ O of sampling points Ω_j and the corresponding weights such that the sampling condition equation (52) holds. However, if the sampling points are chosen such that the sampling condition is well approximated, then the rank of the mode matrix Ψ is O and its condition number is low. In this case, the pseudo-inverse of the mode matrix Ψ

Ψ⁺ := (ΨΨ^H)^-1 Ψ (60)

exists, and a reasonable approximation of the scaled time-domain Ambisonics coefficient vector c(t) from the vector of time-domain amplitude density function samples is given by

c(t) ≈ Ψ⁺ w(t) (61)

If J = O and the rank of the mode matrix is O, then its pseudo-inverse coincides with its inverse, since

Ψ⁺ = (ΨΨ^H)^-1 Ψ = Ψ^-H Ψ^-1 Ψ = Ψ^-H (62)

If additionally the sampling condition equation (52) is satisfied, then

Ψ^-H = ΨG (63)

holds, and both approximations (59) and (61) are equivalent and exact.
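A small numerical case may help to see conditions (53), (58) and (59) at work. The sketch below is an illustration under stated assumptions, not the sampling grid of the application: it uses order N = 1 (O = 4), orthonormal real SH functions in an assumed ordering, J = 4 directions at the vertices of a regular tetrahedron, and equal weights g_j = 4π/J. The mode matrix is assumed to hold one column per sampling direction; all names are hypothetical.

```python
import math

# Four directions at the vertices of a regular tetrahedron (unit vectors).
POINTS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
POINTS = [tuple(v / math.sqrt(3) for v in p) for p in POINTS]
WEIGHTS = [4 * math.pi / len(POINTS)] * len(POINTS)  # equal weights summing to 4*pi

def sh_vector(x, y, z):
    """Orthonormal real SH functions up to order 1 for the unit vector (x, y, z)."""
    c0 = 1 / math.sqrt(4 * math.pi)
    c1 = math.sqrt(3 / (4 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

# Mode matrix with one column per sampling direction (O rows, J columns).
PSI = [list(row) for row in zip(*(sh_vector(*p) for p in POINTS))]

def to_spatial(c):
    """w = Psi^T c, i.e. equation (58) for a real-valued mode matrix."""
    return [sum(PSI[o][j] * c[o] for o in range(len(c))) for j in range(len(POINTS))]

def to_hoa(w):
    """c = Psi G w, i.e. equation (59)."""
    return [sum(PSI[o][j] * WEIGHTS[j] * w[j] for j in range(len(w)))
            for o in range(len(PSI))]

def gram():
    """Psi G Psi^T, which equation (53) requires to be the identity."""
    n = len(PSI)
    return [[sum(PSI[a][j] * WEIGHTS[j] * PSI[b][j] for j in range(len(POINTS)))
             for b in range(n)] for a in range(n)]
```

For this grid `gram()` returns the identity up to rounding, so `to_hoa(to_spatial(c))` recovers `c`. Grids for higher orders generally satisfy (53) only approximately, which is where the pseudo-inverse (60) comes in.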

The vector w(t) can be interpreted as a vector of spatial time-domain signals. The transform from the HOA domain to the spatial domain can be performed, for example, using equation (58). This kind of transform is termed "spherical harmonic transform" (SHT) in this application and is used when the ambient HOA component of reduced order is transformed to the spatial domain. It is implicitly assumed that the spatial sampling points Ω_j of the SHT approximately satisfy the sampling condition in equation (52), with J = O.

Under these assumptions, the SHT matrix Ψ^H approximately equals Ψ^-1 up to a constant factor. If the absolute scaling of the SHT is not important, this constant can be neglected.

Compression

The invention relates to the compression of a given HOA signal representation. As mentioned above, the HOA representation is decomposed into a predefined number of dominant directional signals in the time domain and an ambient component in the HOA domain, followed by compression of the HOA representation of the ambient component by reducing its order. This operation exploits the assumption, supported by listening tests, that the ambient sound field component can be represented with sufficient accuracy by an HOA representation of low order. The extraction of the dominant directional signals ensures that a high spatial resolution is retained after compression and corresponding decompression.

After the decomposition, the ambient HOA component of reduced order is transformed to the spatial domain and is perceptually coded together with the directional signals as described in the Exemplary embodiments section of patent application EP 10306472.1.

The compression processing comprises two successive steps, which are depicted in Fig. 2. The exact definitions of the individual signals are given in the section on the details of the compression below.

In the first step or stage, shown in Fig. 2a, dominant directions are estimated in a dominant direction estimator 22, and the Ambisonics signal C(l) is decomposed into a directional component and a residual or ambient component, where l denotes the frame index. The directional component is computed in a directional signal computation step or stage 23, whereby the Ambisonics representation is converted to time-domain signals represented by a set of D conventional directional signals X(l) with corresponding directions. The residual ambient component is computed in an ambient HOA component computation step or stage 24 and is represented by the HOA-domain coefficients C_A(l).

In the second step, shown in Fig. 2b, perceptual coding of the directional signals X(l) and the ambient HOA component C_A(l) is carried out as follows:

- The conventional time-domain directional signals X(l) can be compressed individually in the perceptual coder 27 using any known perceptual compression technique.

- The compression of the ambient HOA-domain component C_A(l) is carried out in two sub-steps or stages. The first sub-step or stage 25 reduces the original Ambisonics order N to N_RED, e.g. N_RED = 2, resulting in the ambient HOA component C_A,RED(l). Here, the assumption is exploited that the ambient sound field component can be represented with sufficient accuracy by an HOA representation of low order. The second sub-step or stage 26 is based on the compression described in patent application EP 10306472.1. The O_RED := (N_RED+1)² HOA signals C_A,RED(l) of the ambient sound field component, which were computed in sub-step/stage 25, are transformed into O_RED equivalent signals W_A,RED(l) in the spatial domain by applying a spherical harmonic transform, resulting in conventional time-domain signals which can be input to a bank of parallel perceptual codecs 27. Any known perceptual coding or compression technique can be applied. The encoded directional signals and the order-reduced encoded spatial-domain signals are output and can be transmitted or stored.

Advantageously, the perceptual compression of all time-domain signals X(l) and W_A,RED(l) can be performed jointly in the perceptual coder 27 in order to improve the overall coding efficiency by exploiting potentially remaining inter-channel correlations.

Decompression

The decompression processing for a received or replayed signal is depicted in Fig. 3. Like the compression processing, it comprises two successive steps.

In the first step or stage, shown in Fig. 3a, perceptual decoding or decompression of the encoded directional signals and of the order-reduced encoded spatial-domain signals is carried out in a perceptual decoder 31, where the former represent the directional component and the latter represent the ambient HOA component. The perceptually decoded or decompressed spatial-domain signals are transformed into an HOA-domain representation of order N_RED by an inverse spherical harmonic transform in an inverse spherical harmonic transformer 32. Thereafter, in an order extension step or stage 33, an appropriate HOA representation of order N is estimated from this order-N_RED representation by order extension.

In the second step or stage, shown in Fig. 3b, the total HOA representation is recomposed in an HOA signal assembler 34 from the directional signals, the corresponding direction information and the original-order ambient HOA component.

Achievable data rate reduction

The problem solved by the invention is a considerable reduction of the data rate compared with existing compression methods for HOA representations. In the following, the achievable compression rate compared with the non-compressed HOA representation is discussed. The compression rate results from the comparison of the data rate required for the transmission of a non-compressed HOA signal C(l) of order N with the data rate required for the transmission of a compressed signal representation consisting of D perceptually coded directional signals X(l) with corresponding directions and O_RED perceptually coded spatial-domain signals W_A,RED(l) representing the ambient HOA component.

The transmission of the non-compressed HOA signal C(l) requires a data rate of O·f_S·N_b. In contrast, the transmission of the D perceptually coded directional signals X(l) requires a data rate of D·f_b,COD, where f_b,COD denotes the bit rate of one perceptually coded signal. Similarly, the transmission of the O_RED perceptually coded spatial-domain signals W_A,RED(l) requires a bit rate of O_RED·f_b,COD. The directions are assumed to be computed at a much lower rate than the sampling rate f_S, i.e. they are assumed to be fixed for the duration of a signal frame composed of B samples, e.g. B = 1200 for a sampling rate of f_S = 48 kHz, so that the corresponding data rate share can be neglected in the computation of the total data rate of the compressed HOA signal.

Therefore, the transmission of the compressed representation requires a data rate of approximately (D + O_RED)·f_b,COD. Consequently, the compression rate r_COMPR is

r_COMPR ≈ (O·f_S·N_b) / ((D + O_RED)·f_b,COD) (64)

For example, compressing an HOA representation of order N = 4, with a sampling rate of f_S = 48 kHz and N_b = 16 bits per sample, into a representation with D = 3 dominant directions, using a reduced HOA order N_RED = 2 and a bit rate of f_b,COD = 64 kbit/s per perceptually coded signal, results in a compression rate of r_COMPR ≈ 25. The transmission of the compressed representation then requires a data rate of approximately 768 kbit/s.
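The arithmetic behind this example can be reproduced in a few lines. In the sketch below the function and parameter names are hypothetical, and the per-signal bit rate of 64 kbit/s is an assumption chosen because it reproduces the stated compression rate of about 25.

```python
def hoa_compression_ratio(order, order_red, num_dir, fs_hz, bits_per_sample, coded_rate_bps):
    """Return (ratio, raw data rate, compressed data rate) in bit/s."""
    num_coeffs = (order + 1) ** 2          # O = (N + 1)^2
    num_coeffs_red = (order_red + 1) ** 2  # O_RED = (N_RED + 1)^2
    raw_rate = num_coeffs * fs_hz * bits_per_sample
    # Direction side information is neglected, as in the text.
    coded_rate = (num_dir + num_coeffs_red) * coded_rate_bps
    return raw_rate / coded_rate, raw_rate, coded_rate

ratio, raw, coded = hoa_compression_ratio(4, 2, 3, 48_000, 16, 64_000)
print(ratio, raw, coded)  # prints 25.0 19200000 768000
```

With these values, 25 coefficient channels at 768 kbit/s each are replaced by 3 + 9 = 12 perceptually coded signals.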

Reduced probability of coding noise unmasking

As explained in the Background section, the perceptual compression of spatial-domain signals described in patent application EP 10306472.1 suffers from remaining cross-correlations between the signals, which may lead to unmasking of perceptual coding noise. According to the invention, the dominant directional signals are first extracted from the HOA sound field representation before being perceptually coded. This means that, when composing the HOA representation after perceptual decoding, the coding noise has exactly the same spatial directivity as the directional signals. In particular, the contributions of the coding noise and of the directional signal to any arbitrary direction are deterministically described by the spatial dispersion function explained in the section "Finite-order spatial resolution". In other words, at any time instant the HOA coefficient vector representing the coding noise is exactly a multiple of the HOA coefficient vector representing the directional signal. Thus, an arbitrarily weighted sum of the noisy HOA coefficients will not lead to any unmasking of the perceptual coding noise.

Further, the ambient component of reduced order is processed exactly as proposed in EP 10306472.1, but because the spatial-domain signals of the ambient component by definition have a rather low correlation between each other, the probability of perceptual noise unmasking is low.

Improved direction estimation

The inventive direction estimation depends on the directional power distribution of the energetically dominant HOA component. The directional power distribution is computed from the rank-reduced correlation matrix of the HOA representation, which is obtained by eigenvalue decomposition of the correlation matrix of the HOA representation. Compared with the direction estimation used in the above-mentioned "Plane-wave decomposition..." paper, it offers the advantage of being more precise, since focusing on the energetically dominant HOA component instead of using the complete HOA representation for the direction estimation reduces the spatial blurring of the directional power distribution.

Compared with the direction estimation proposed in the above-mentioned papers "The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields" and "Time Domain Reconstruction of Spatial Sound Fields Using Compressed Sensing", it offers the advantage of being more robust. The reason is that the decomposition of the HOA representation into the directional and the ambient component can hardly ever be accomplished perfectly, so that a small amount of the ambient component remains in the directional component. Compressive sampling methods like those in these two papers then fail to provide reasonable direction estimates due to their high sensitivity to the presence of ambient signals.

Advantageously, the inventive direction estimation does not suffer from this problem.

Alternative applications of the HOA representation decomposition

The described decomposition of the HOA representation into a number of directional signals with related direction information and an ambient component in the HOA domain can be used for a signal-adaptive DirAC-like rendering of the HOA representation, following what is proposed in the above-mentioned Pulkki paper "Spatial Sound Reproduction with Directional Audio Coding".

Each HOA component can be rendered differently because the physical characteristics of the two components are different. For example, the directional signals can be rendered to the loudspeakers using signal panning techniques such as Vector Based Amplitude Panning (VBAP), see V. Pulkki, "Virtual Sound Source Positioning Using Vector Base Amplitude Panning", Journal of the Audio Engineering Society, vol. 45, no. 6, pp. 456-466, 1997. The ambient HOA component can be rendered using known standard HOA rendering techniques.

Such rendering is not restricted to Ambisonics representations of order "1" and can thus be seen as an extension of the DirAC-like rendering to HOA representations of order N > 1.

The estimation of several directions from an HOA signal representation can be used for any related kind of sound field analysis.

The following sections describe the signal processing steps in more detail.

Compression

Definition of input format

As input, the scaled time-domain HOA coefficients defined in equation (26) are assumed to be sampled at the rate f_S = 1/T_S. The vector c(j) is defined to be composed of all coefficients belonging to the sampling time t = jT_S, according to:

Framing

In a framing step or stage 21, the incoming vectors c(j) of scaled HOA coefficients are framed into non-overlapping frames of length B according to:

Assuming a sampling rate of f_S = 48 kHz, an appropriate frame length is B = 1200 samples, corresponding to a frame duration of 25 ms.
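The framing step can be pictured as follows. This is a minimal sketch, assuming that each frame C(l) is an O × B matrix whose columns are B consecutive coefficient vectors; the exact index convention of the framing equation is not reproduced in this text, so the 0-based indexing and the name `frame_hoa` are assumptions.

```python
def frame_hoa(samples, frame_len):
    """Split a list of coefficient vectors c(j) into non-overlapping frames.

    Each frame is an O x B matrix (a list of O rows), so column j of frame l
    holds c(l * B + j). Trailing samples that do not fill a whole frame are
    dropped in this sketch.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        block = samples[start:start + frame_len]
        frames.append([list(row) for row in zip(*block)])
    return frames

# Seven toy vectors with O = 2 coefficients each, framed with B = 3.
toy = [[j, 10 * j] for j in range(7)]
frames = frame_hoa(toy, 3)
```

With B = 1200 and f_S = 48 kHz, each frame produced this way covers 1200 / 48000 s = 25 ms.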

Estimation of dominant directions

For the estimation of the dominant directions, the following correlation matrix is computed:

The summation over the current frame l and the L−1 previous frames indicates that the directional analysis is based on long overlapping groups of frames with L·B samples, i.e. for each current frame the content of adjacent frames is taken into consideration. This contributes to the stability of the directional analysis for two reasons: longer frames result in a greater number of observations, and the direction estimates are smoothed due to the overlapping frames.

Assuming f_S = 48 kHz and B = 1200, a reasonable value for L is 4, corresponding to an overall frame duration of 100 ms.

Next, the eigenvalue decomposition of the correlation matrix B(l) is determined according to

B(l) = V(l) Λ(l) V^T(l) (68)

where the matrix V(l) is composed of the eigenvectors v_i(l), 1 ≤ i ≤ O, as

and Λ(l) is a diagonal matrix with the corresponding eigenvalues λ_i(l), 1 ≤ i ≤ O, on its diagonal:

The eigenvalues are assumed to be indexed in non-ascending order, i.e.

λ₁(l) ≥ λ₂(l) ≥ … ≥ λ_O(l) (71)

Thereafter, the index set of dominant eigenvalues is computed. One possibility to manage this is to define a desired minimum broadband directional-to-ambient power ratio DAR_MIN and then to determine the index set such that

and

for

A reasonable choice for DAR_MIN is 15 dB. The number of dominant eigenvalues is further constrained to be not greater than D, in order to concentrate on no more than D dominant directions. This is accomplished by replacing the index set by a reduced index set, where

Next, the rank-reduced approximation of B(l) is obtained by

where (74)

This matrix should contain the contributions of the dominant directional components to B(l).

Thereafter, the vector

is computed, where Ξ denotes a mode matrix with respect to a high number of nearly equally distributed test directions Ω_q := (θ_q, φ_q), 1 ≤ q ≤ Q, where θ_q ∈ [0, π] denotes the inclination angle measured from the polar axis z and φ_q ∈ [−π, π[ denotes the azimuth angle measured in the x-y plane from the x axis.

The mode matrix Ξ is defined by

with, for 1 ≤ q ≤ Q,

The elements of σ²(l) are approximations of the powers of plane waves, corresponding to dominant directional signals, impinging from the directions Ω_q. The theoretical explanation for this is provided in the section explaining the direction search algorithm below.

From σ²(l), a number of dominant directions is computed for the determination of the directional signal components. The number of dominant directions is thereby constrained in order to ensure a constant data rate. However, if a variable data rate is allowed, the number of dominant directions can be adapted to the current sound scene.

One possibility to compute the dominant directions is to set the first dominant direction Ω_CURRDOM,1(l) to the test direction with the maximum power in σ²(l). Assuming that a power maximum is created by a dominant directional signal, and considering the fact that using an HOA representation of finite order N results in a spatial dispersion of directional signals (see the above-mentioned "Plane-wave decomposition..." paper), it can be concluded that power components belonging to the same directional signal should occur in the directional neighbourhood of Ω_CURRDOM,1(l). Since the spatial signal dispersion can be expressed by the function v_N(Θ_q,1) (see equation (38)), where Θ_q,1 denotes the angle between Ω_q and Ω_CURRDOM,1(l), the power belonging to the directional signal declines with increasing Θ_q,1 as governed by the dispersion function. Therefore, it is reasonable to exclude all directions Ω_q in the directional neighbourhood with Θ_q,1 ≤ Θ_MIN from the search for further dominant directions. The distance Θ_MIN can be chosen as the first zero of v_N(x), which for N ≥ 4 is approximately given by π/N. The second dominant direction is then set to the one with the maximum power among the remaining directions. The remaining dominant directions are determined in an analogous way.
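The search described in this paragraph amounts to greedy peak picking with an exclusion neighbourhood. The sketch below illustrates that idea only: the names are hypothetical, directions are (inclination, azimuth) pairs in radians, and the stopping rule based on DAR_MIN is left out, so the loop simply stops after `max_dirs` picks or when no candidate remains.

```python
import math

def angle_between(a, b):
    """Angle between two directions given as (inclination, azimuth) pairs."""
    (theta, phi), (theta0, phi0) = a, b
    cos_angle = (math.cos(theta) * math.cos(theta0)
                 + math.cos(phi - phi0) * math.sin(theta) * math.sin(theta0))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def pick_dominant_directions(directions, powers, max_dirs, theta_min):
    """Greedy peak picking with an exclusion neighbourhood of radius theta_min."""
    candidates = set(range(len(directions)))
    picked = []
    while candidates and len(picked) < max_dirs:
        best = max(candidates, key=lambda q: powers[q])
        picked.append(best)
        # Drop the peak and every test direction within theta_min of it.
        candidates = {q for q in candidates
                      if angle_between(directions[q], directions[best]) > theta_min}
    return picked
```

The returned list holds indices into `directions`, ordered by decreasing power of the picked peaks.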

The number of dominant directions can be determined by regarding the powers assigned to the individual dominant directions and searching for the case where the ratio exceeds the value of the desired directional-to-ambient power ratio DAR_MIN. This means that the number of dominant directions satisfies

The overall processing for the computation of all dominant directions can be carried out as follows:

Next, the directions obtained in the current frame are smoothed with the directions from the previous frames, resulting in smoothed directions. This operation can be subdivided into two successive parts:

(a) The current dominant directions are assigned to the smoothed directions from the previous frame. The assignment function is determined such that the sum of the angles between assigned directions

is minimised. Such an assignment problem can be solved using the well-known Hungarian algorithm (see H. W. Kuhn, "The Hungarian method for the assignment problem", Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83-97, 1955). The angles between the current directions and the inactive directions from the previous frame (see below for an explanation of the term "inactive direction") are set to 2Θ_MIN. The effect of this operation is that current directions which are closer than 2Θ_MIN to previously active directions are preferably assigned to them. If the distance exceeds 2Θ_MIN, the corresponding current direction is assumed to belong to a new signal, which means that it is favoured to be assigned to a previously inactive direction. Remark: when a greater latency of the overall compression algorithm is allowed, the assignment of successive direction estimates can be performed more robustly. For example, abrupt direction changes can be better identified without mixing them up with outliers resulting from estimation errors.
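The role of the 2Θ_MIN cost can be seen in a toy version of the assignment in part (a). The sketch below is not the Hungarian algorithm: for brevity it tries all injective assignments by brute force, which gives the same minimum-cost result but is only practical for a handful of directions. All names are hypothetical, and directions are (inclination, azimuth) pairs in radians.

```python
import itertools
import math

def angle_between(a, b):
    """Angle between two directions given as (inclination, azimuth) pairs."""
    (theta, phi), (theta0, phi0) = a, b
    cos_angle = (math.cos(theta) * math.cos(theta0)
                 + math.cos(phi - phi0) * math.sin(theta) * math.sin(theta0))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def assign_directions(current, previous, active, theta_min):
    """Return a tuple f where f[i] is the previous slot assigned to current[i]."""
    def cost(i, k):
        if not active[k]:
            return 2.0 * theta_min  # fixed cost for inactive slots
        return angle_between(current[i], previous[k])

    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(previous)), len(current)):
        total = sum(cost(i, k) for i, k in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return best
```

A current direction that is farther than 2Θ_MIN from every active slot is cheaper to place in an inactive slot, which is how a new signal ends up there.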

(b) The smoothed directions are computed using the assignment from step (a). The smoothing is based on spherical geometry rather than Euclidean geometry. For each of the current dominant directions, the smoothing is performed along the minor arc of the great circle crossing the two points on the sphere that are specified by the two assigned directions. Specifically, the azimuth and inclination angles are smoothed independently by computing the exponentially weighted moving average with a smoothing factor α_Ω. For the inclination angle this results in the following smoothing operation:

For the azimuth angle, the smoothing has to be modified to achieve correct smoothing at the transition from π − ε (ε > 0) to −π and at the transition in the opposite direction. This can be taken into account by first computing the difference angle modulo 2π as

which is converted to the interval [−π, π[ by

The smoothed dominant azimuth angle modulo 2π is determined as

and is finally converted to lie within the interval [−π, π[.
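The wrap-around handling for the azimuth angle can be sketched as follows. The smoothing equations themselves are not reproduced in this text, so the weighting convention used here (new value = previous value + α_Ω · difference) is an assumption, as are the function names.

```python
import math

def wrap_to_pi(angle):
    """Map an angle to the half-open interval [-pi, pi[."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def smooth_inclination(prev, cur, alpha):
    """Exponentially weighted moving average of the inclination angle."""
    return (1 - alpha) * prev + alpha * cur

def smooth_azimuth(prev, cur, alpha):
    """Moving average of the azimuth angle taken along the shorter way round."""
    diff = wrap_to_pi(cur - prev)  # signed difference within [-pi, pi[
    return wrap_to_pi(prev + alpha * diff)
```

Smoothing from an azimuth just below π towards one just above −π therefore moves across the ±π boundary instead of travelling almost a full turn through zero.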

If fewer than D dominant directions are found in the current frame, there are directions from the previous frame that are not assigned a current dominant direction. The corresponding index set is denoted by

The respective directions are copied from the last frame, i.e. for

Directions which are not assigned for a predefined number L_IA of frames are termed inactive.

Thereafter, the index set of active directions is computed. Its cardinality is denoted by D_ACT(l).

Then all smoothed directions are concatenated into a single direction matrix as

方向信号的计算Calculation of direction signal

方向信号的计算基于模式匹配。具体地，对于那些HOA表示得到给出的HOA信号的最佳近似的方向信号进行搜索。因为相继帧之间的方向的改变会导致方向信号的不连续性，所以可以计算重叠帧的方向信号的估计，继之以使用适当的窗口函数平滑相继的重叠帧的结果。然而，该平滑引入单个帧的等待时间。The calculation of the directional signal is based on pattern matching. Specifically, a search is performed for directional signals whose HOA representation gives the best approximation to a given HOA signal. Because changes in direction between consecutive frames can lead to discontinuities in the directional signal, an estimate of the directional signal can be calculated for overlapping frames, followed by smoothing the result of consecutive overlapping frames using an appropriate window function. However, this smoothing introduces latency in the individual frames.

下面解释关于方向信号的详细估计：The detailed estimation of the direction signal is explained below:

First, the mode matrix based on the smoothed active directions is computed according to

where

where d_{ACT,j}, 1≤j≤D_ACT(l), denotes the indices of the active directions.

Next, a matrix X_INST(l) is computed that contains the non-smoothed estimates of all directional signals for the (l-1)-th and l-th frames:

where

This is accomplished in two steps. In the first step, the directional signal samples in the rows corresponding to inactive directions are set to zero, i.e.,

if

In the second step, the directional signal samples corresponding to active directions are obtained by first arranging them in a matrix according to

This matrix is then computed such that the Euclidean norm of the error

Ξ_ACT(l) X_{INST,ACT}(l) - [C(l-1) C(l)] (97)

is minimized. The solution is given by
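For reference, a least-squares problem of this kind has a well-known closed-form solution. The following LaTeX sketch assumes a real-valued mode matrix Ξ_ACT(l) with full column rank; the notation of the original expression may differ.

```latex
X_{\mathrm{INST,ACT}}(l)
  = \left( \Xi_{\mathrm{ACT}}^{T}(l)\, \Xi_{\mathrm{ACT}}(l) \right)^{-1}
    \Xi_{\mathrm{ACT}}^{T}(l)\, \left[\, C(l-1) \;\; C(l) \,\right]
```

Under that assumption, the factor in front of [C(l-1) C(l)] is the Moore-Penrose pseudoinverse of Ξ_ACT(l).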

The estimates of the directional signals x_{INST,d}(l, j), 1≤d≤D, are windowed by an appropriate window function w(j):

x_{INST,WIN,d}(l, j) := x_{INST,d}(l, j)·w(j), 1≤j≤2B (99)

An example of a window function is the periodic Hamming window, defined by

where K_w denotes a scaling factor that is determined such that the sum of the shifted windows equals 1. The smoothed directional signals for the (l-1)-th frame are computed by the appropriate superposition of the windowed non-smoothed estimates according to

x_d((l-1)B+j) = x_{INST,WIN,d}(l-1, B+j) + x_{INST,WIN,d}(l, j) (101)
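The windowing of equation (99) and the overlap-add of equation (101) can be sketched in Python as follows. The sketch assumes a Hamming window with a period of exactly 2B samples, for which the scaling factor works out to K_w = 1/1.08; the period and the function names are assumptions made for this illustration and need not match the window definition of the original text.

```python
import math

def periodic_hamming(B):
    """Return w(1..2B) as a list, scaled so that w(j) + w(j + B) = 1.

    Assumes a period of 2B samples; then the two shifted windows add up to
    1.08 * K_w, which gives K_w = 1 / 1.08.
    """
    K_w = 1.0 / 1.08
    return [K_w * (0.54 - 0.46 * math.cos(2.0 * math.pi * j / (2 * B)))
            for j in range(1, 2 * B + 1)]

def overlap_add(prev_long, cur_long, window):
    """Smoothed samples of frame l-1 for one directional signal, eq. (101).

    prev_long: 2B unsmoothed samples of long frame l-1.
    cur_long:  2B unsmoothed samples of long frame l.
    """
    B = len(window) // 2
    return [prev_long[B + j] * window[B + j] + cur_long[j] * window[j]
            for j in range(B)]
```

Because the shifted windows sum to 1, a signal whose two overlapping estimates agree passes through the overlap-add unchanged.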

The samples of all smoothed directional signals for the (l-1)-th frame are arranged in the matrix X(l-1) as

where

Calculation of the ambient HOA component

The ambient HOA component C_A(l-1) is obtained by subtracting the total directional HOA component C_DIR(l-1) from the total HOA representation C(l-1) according to

where C_DIR(l-1) is determined by

where Ξ_DOM(l) denotes the mode matrix based on all smoothed directions, defined by

Because the computation of the total directional HOA component is also based on a spatial smoothing of overlapping successive instantaneous total directional HOA components, the ambient HOA component is likewise obtained with a latency of a single frame.

Order reduction of the ambient HOA component

Expressing C_A(l-1) through its components as

the order reduction is accomplished by dropping all HOA coefficients with n>N_RED:
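The truncation described above can be sketched in Python. The sketch assumes that the coefficient rows are sorted by ascending order n, with the 2n+1 coefficients of each order stored contiguously, so that the coefficients of order n≤N_RED are the first (N_RED+1)² rows; this ordering and the function name are assumptions of the illustration.

```python
def reduce_order(hoa_frame, N_red):
    """Keep only the coefficient rows with order n <= N_red.

    hoa_frame: list of rows, one per HOA coefficient, each row holding the
        samples of one frame (rows assumed sorted by ascending order n).
    """
    keep = (N_red + 1) ** 2  # number of coefficients up to order N_red
    return [row[:] for row in hoa_frame[:keep]]
```

For example, reducing an order-4 frame (25 rows) to N_RED = 1 keeps 4 rows.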

Spherical harmonic transform of the ambient HOA component

The spherical harmonic transform is performed by multiplying the reduced-order ambient HOA component C_{A,RED}(l) by the inverse of the mode matrix

where

based on O_RED uniformly distributed directions Ω_{A,d}, 1≤d≤O_RED:

W_{A,RED}(l) = (Ξ_A)^{-1} C_{A,RED}(l) (111)

Decompression

Inverse spherical harmonic transform

The perceptually decompressed spatial-domain signals are transformed into an HOA domain representation of order N_RED via the inverse spherical harmonic transform by
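Since equation (111) maps the HOA coefficients to the spatial domain through the inverse of the mode matrix Ξ_A, the inverse transform is expected to apply Ξ_A itself. A LaTeX sketch follows, in which the hat marking decoded quantities is notation assumed here rather than taken from the text:

```latex
\hat{C}_{\mathrm{A,RED}}(l) = \Xi_{\mathrm{A}}\, \hat{W}_{\mathrm{A,RED}}(l)
```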

Order extension

The Ambisonics order of the HOA representation is extended to N by appending zeros according to

where 0_{m×n} denotes a zero matrix with m rows and n columns.
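A minimal Python sketch of this zero-padding step is given below. It assumes that coefficient rows are sorted by ascending order n, so that the missing higher-order coefficients are simply appended as zero rows at the end; the function name is hypothetical.

```python
def extend_order(hoa_frame_red, N):
    """Append zero rows so that the frame has (N + 1)**2 coefficient rows.

    hoa_frame_red: list of rows (one per coefficient of the reduced-order
        representation), each row holding the samples of one frame.
    """
    target = (N + 1) ** 2
    if len(hoa_frame_red) > target:
        raise ValueError("input already has more rows than order N allows")
    num_samples = len(hoa_frame_red[0]) if hoa_frame_red else 0
    padding = [[0.0] * num_samples
               for _ in range(target - len(hoa_frame_red))]
    return [row[:] for row in hoa_frame_red] + padding
```

With N_RED = 1 and N = 4, for instance, 21 zero rows are appended to the 4 existing rows.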

HOA coefficients composition

The final decompressed HOA coefficients are additively composed of the directional and the ambient HOA component according to

At this stage, a latency of a single frame is once again introduced to allow the directional HOA component to be computed based on spatial smoothing. In this way, potential undesired discontinuities in the directional component of the sound field, caused by changes of the directions between successive frames, are avoided.

To compute the smoothed directional HOA component, two successive frames containing the estimates of all individual directional signals are concatenated into a single long frame as

Each of the individual signal excerpts contained in this long frame is multiplied by a window function, e.g., that of equation (100). When the long frame is expressed through its components as

the windowing operation can be formulated as computing the windowed signal excerpts by

Finally, the total directional HOA component C_DIR(l-1) is obtained by encoding all windowed directional signal excerpts into the appropriate directions and superposing them in an overlapped fashion:

Explanation of the direction search algorithm

In the following, the motivation behind the direction search processing described in the main direction estimation section is explained. It is based on some assumptions which are defined first.

Assumptions

The HOA coefficient vector c(j) is in general related to the time-domain amplitude density function d(j, Ω) through

The HOA coefficient vector c(j) is assumed to conform to the following model:

for lB+1≤j≤(l+1)B (120)

This model states that the HOA coefficient vector c(j) is, on the one hand, created by I main directional source signals x_i(j), 1≤i≤I, arriving from the directions of the l-th frame. In particular, the directions are assumed to be fixed for the duration of a single frame. The number I of main source signals is assumed to be distinctly smaller than the total number O of HOA coefficients. Further, the frame length B is assumed to be distinctly greater than O. On the other hand, the vector c(j) contains a residual component c_A(j), which can be regarded as representing the ideal isotropic ambient sound field.

The individual HOA coefficient vector components are assumed to have the following properties:

● The main source signals are assumed to be zero-mean, i.e.,

and are assumed to be uncorrelated with each other, i.e.,

where the power term denotes the average power of the i-th signal in the l-th frame.

● The main source signals are assumed to be uncorrelated with the ambient component of the HOA coefficient vector, i.e.,

● The ambient HOA component vector is assumed to be zero-mean and to have the covariance matrix

● The directional-to-ambient power ratio DAR(l) of each frame l, defined here by

is assumed to be no smaller than a predefined desired value DAR_MIN, i.e.,

DAR(l)≥DAR_MIN (126)

Explanation of the direction search

For the explanation, consider the case where the correlation matrix B(l) (see equation (67)) is computed based only on the samples of the l-th frame, without considering the samples of the L-1 previous frames. This corresponds to setting L = 1. Consequently, the correlation matrix can be expressed as
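As a rough illustration of a single-frame correlation matrix, the following Python sketch computes (1/B)·C·Cᵀ for one frame C holding O coefficient rows of B samples each. The 1/B normalization and the function name are assumptions of this sketch; the exact definition is the one given by the referenced equations.

```python
def frame_correlation(frame):
    """Return (1/B) * C * C^T for a frame C given as O rows of B samples.

    The 1/B normalization is an assumption made for this illustration.
    """
    num_coeffs = len(frame)
    B = len(frame[0])
    return [[sum(frame[a][j] * frame[b][j] for j in range(B)) / B
             for b in range(num_coeffs)]
            for a in range(num_coeffs)]
```

The result is a symmetric O×O matrix whose diagonal holds the average power of each HOA coefficient sequence within the frame.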

By substituting the model assumption of equation (120) into equation (128), and by using equations (122) and (123) together with the definition in equation (124), the correlation matrix B(l) can be approximated as follows (equations (129) to (131)):

From equation (131) it can be seen that B(l) approximately consists of two additive components attributable to the directional and to the ambient HOA component. Its low-rank approximation provides an approximation of the directional HOA component, i.e.,

which follows from equation (126) on the directional-to-ambient power ratio.

However, it should be stressed that some portion of Σ_A(l) will inevitably leak into this directional approximation, because Σ_A(l) generally has full rank, so that the subspace spanned by the columns of the directional matrix and the subspace spanned by Σ_A(l) are not orthogonal to each other. With equation (132), the vector σ²(l) in equation (77), which is used for the search of the main directions, can be expressed by

In equation (135), the following property of the spherical harmonics, shown in equation (47), is used:

S^T(Ω_q) S(Ω_q′) = v_N(∠(Ω_q, Ω_q′)) (137)

Equation (136) shows that the components of σ²(l) are approximations of the powers of the signals arriving from the test directions Ω_q, 1≤q≤Q.
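To give a feel for the kernel v_N in equation (137), the following Python sketch evaluates it under the assumption that the spherical harmonics are real-valued and orthonormal, in which case the addition theorem gives v_N(Θ) = Σ_{n=0..N} (2n+1)/(4π)·P_n(cos Θ) with the Legendre polynomials P_n. The normalization used in equation (47) may differ, so the absolute scale here is an assumption; the function name is hypothetical.

```python
import math

def dispersion_kernel(theta, N):
    """Evaluate sum_{n=0}^{N} (2n+1)/(4*pi) * P_n(cos(theta)).

    Assumes orthonormal real spherical harmonics (not confirmed by the text).
    Legendre polynomials are generated with the three-term recurrence
    (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x).
    """
    x = math.cos(theta)
    p_prev, p_cur = 1.0, x               # P_0(x) and P_1(x)
    total = 1.0 / (4.0 * math.pi)        # n = 0 term
    for n in range(1, N + 1):
        total += (2 * n + 1) / (4.0 * math.pi) * p_cur   # p_cur is P_n(x)
        p_prev, p_cur = p_cur, ((2 * n + 1) * x * p_cur - n * p_prev) / (n + 1)
    return total
```

The kernel peaks at Θ = 0 and becomes narrower as N grows, which is why each component of σ²(l) mainly collects power from sources close to its test direction.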

Claims

1. A method for decompressing a Higher Order Ambisonics (HOA) signal representation, the method comprising:

receiving an encoded directional signal and an encoded ambient signal;

perceptually decoding the encoded directional signal and the encoded ambient signal to produce a decoded directional signal and a decoded ambient signal, respectively;

converting the decoded ambient signal from the spatial domain to an HOA domain representation of the ambient signal; and

reconstructing a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal;

wherein the converting includes applying an inverse spatial transform to the decoded ambient signal.

2. The method according to claim 1, wherein the Higher Order Ambisonics (HOA) signal representation has an order greater than 1.

3. The method according to claim 1, wherein the order of the decoded ambient signal is less than the order of the Higher Order Ambisonics (HOA) signal representation.

4. The method according to claim 1, wherein the encoded directional signal and the encoded ambient signal are received in a bitstream, and the bitstream is perceptually decoded into a plurality of transport channels, each of the plurality of transport channels being reassigned to the directional signal or the ambient signal before the converting and the reconstructing.

5. An apparatus for decompressing a Higher Order Ambisonics (HOA) signal representation, the apparatus comprising:

an input interface that receives an encoded directional signal and an encoded ambient signal;

an audio decoder that perceptually decodes the encoded directional signal and the encoded ambient signal to produce a decoded directional signal and a decoded ambient signal, respectively;

an inverse transformer that converts the decoded ambient signal from the spatial domain to an HOA domain representation of the ambient signal; and

a synthesizer that reconstructs a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal;

wherein the inverse transformer is further configured to perform the conversion by applying an inverse spatial transform to the decoded ambient signal.

6. The apparatus according to claim 5, wherein the Higher Order Ambisonics (HOA) signal representation has an order greater than 1.

7. The apparatus according to claim 5, wherein the order of the decoded ambient signal is less than the order of the Higher Order Ambisonics (HOA) signal representation.

8. The apparatus according to claim 5, wherein the encoded directional signal and the encoded ambient signal are received in a bitstream, and the bitstream is perceptually decoded into a plurality of transport channels, each of the plurality of transport channels being reassigned to the directional signal or the ambient signal before the conversion and the reconstruction.

9. A method for decompressing a Higher Order Ambisonics (HOA) signal representation, the method comprising:

receiving an encoded directional signal and an encoded ambient signal;

converting the decoded ambient signal from the spatial domain to an HOA domain representation of the ambient signal;

reconstructing a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal; and

smoothing the reconstructed HOA signal.

10. An apparatus for decompressing a Higher Order Ambisonics (HOA) signal representation, the apparatus comprising:

an inverse transformer that converts the decoded ambient signal from the spatial domain to an HOA domain representation of the ambient signal;

a synthesizer that reconstructs a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal; and

a smoother that smooths the reconstructed HOA signal.

11. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the method according to any one of claims 1-4 and 9 to be performed.

12. An apparatus for decompressing a Higher Order Ambisonics (HOA) signal representation, comprising:

one or more processors; and

one or more storage media storing instructions that, when executed by the one or more processors, cause the method according to any one of claims 1-4 and 9 to be performed.

13. An apparatus comprising means for performing the method according to any one of claims 1-4 and 9.