HK40019652B - Method and apparatus for decoding a bitstream including encoded hoa representations, and medium - Google Patents

Info

Publication number
HK40019652B
Authority
HK
Hong Kong
Prior art keywords
prediction
array
hoa
index
elements
Prior art date
Application number
HK42020010097.2A
Other languages
Chinese (zh)
Other versions
HK40019652A (en
Inventor
A. Krueger
S. Kordon
O. Wuebbolt
Original Assignee
Dolby International AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby International AB
Publication of HK40019652A
Publication of HK40019652B

Description

Method and apparatus for decoding a bitstream including encoded HOA representations, and medium

This application is a divisional application of the invention patent application No. 201480072725.X, filed on December 19, 2014, entitled "Method and apparatus for improving the coding of side information required for coding a Higher Order Ambisonics representation of a sound field".

Technical Field

The invention relates to a method and an apparatus for improving the coding of side information required for coding a Higher Order Ambisonics representation of a sound field.

Background

Higher Order Ambisonics (HOA) offers one possibility to represent three-dimensional sound, alongside other techniques such as wave field synthesis (WFS) and channel-based approaches such as the 22.2 multichannel audio format. In contrast to channel-based methods, an HOA representation has the advantage of being independent of a specific loudspeaker setup. This flexibility, however, comes at the expense of a decoding process that is required for playback of the HOA representation on a particular loudspeaker setup. Compared with the WFS approach, where the number of required loudspeakers is usually very large, HOA signals can also be rendered to setups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can be used without any modification for binaural rendering to headphones.

HOA is based on a representation of the spatial density of complex harmonic plane wave amplitudes by a truncated spherical harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time-domain function. Hence, without loss of generality, the complete HOA sound field representation can be assumed to consist of $O$ time-domain functions, where $O$ denotes the number of expansion coefficients. These time-domain functions are referred to below as HOA coefficient sequences or HOA channels.

The spatial resolution of the HOA representation improves with a growing maximum order $N$ of the expansion. Unfortunately, the number of expansion coefficients $O$ grows quadratically with the order $N$, namely $O = (N+1)^2$. For example, a typical HOA representation of order $N = 4$ requires $O = 25$ HOA (expansion) coefficients. Given a desired single-channel sampling rate $f_s$ and a number of bits per sample $N_b$, the total bit rate for transmitting an HOA representation is $O \cdot f_s \cdot N_b$. Transmitting an HOA representation of order $N = 4$ at a sampling rate of $f_s = 48$ kHz with $N_b = 16$ bits per sample therefore results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications such as streaming. Compression of HOA representations is therefore highly desirable.
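The arithmetic behind the 19.2 Mbit/s figure can be checked with a minimal sketch. The function names are illustrative assumptions and do not come from the patent text.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences O = (N + 1)^2 for order N."""
    return (order + 1) ** 2


def raw_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate O * f_s * N_b in bits per second."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


rate = raw_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
print(hoa_coefficient_count(4), rate / 1e6)  # 25 19.2
```

Because the coefficient count grows with $(N+1)^2$, raising the order from 4 to 5 alone increases the raw rate by a factor of 36/25.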

Compression of HOA sound field representations is proposed in WO 2013/171083 A1, EP 13305558.2 and PCT/EP2013/075559. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component. The final compressed representation is assumed to comprise, on the one hand, a number of quantized signals resulting from the perceptual coding of the directional signals and of relevant coefficient sequences of the ambient HOA component. On the other hand, it is assumed to comprise additional side information related to the quantized signals, which is necessary for reconstructing the HOA representation from its compressed version.

An important part of this side information is a description of a prediction of portions of the original HOA representation from the directional signals. Because, for this prediction, the original HOA representation is assumed to be equivalently represented by a number of spatially dispersed general plane waves impinging from spatially uniformly distributed directions, the prediction is referred to below as spatial prediction.

The coding of such side information related to spatial prediction is described in ISO/IEC JTC1/SC29/WG11, N14061, "Working Draft Text of MPEG-H 3D Audio HOA RM0", November 2013, Geneva, Switzerland. However, this prior-art coding of the side information is rather inefficient.

Summary of the Invention

A problem to be solved by the invention is to provide a more efficient way of coding the side information related to that spatial prediction.

This problem is solved by the methods disclosed in this invention. Apparatus that utilizes these methods is also disclosed.

A bit is prepended to the coded side information representation data $\zeta_{COD}$, indicating whether any prediction is to be performed at all. Over time, this feature reduces the average bit rate for transmitting the $\zeta_{COD}$ data. Further, in specific situations, instead of using a bit array that indicates for each direction whether a prediction is performed, it is more efficient to transmit the number of active predictions and the respective indices. A single bit can be used to indicate in which way the indices of the directions for which a prediction is to be performed are coded. On average, this further reduces the bit rate for transmitting the $\zeta_{COD}$ data over time.

In principle, the inventive method is suited for improving the coding of side information required for coding a Higher Order Ambisonics (HOA) representation of a sound field, processed in input time frames of HOA coefficient sequences, wherein dominant directional signals as well as a residual ambient HOA component are determined and a prediction is used for said dominant directional signals, thereby providing, for a coded frame of HOA coefficients, side information data describing said prediction, and wherein said side information data can include:

- a bit array indicating whether a prediction is performed for a direction;

- a bit array in which each bit indicates, for a direction for which a prediction is to be performed, the kind of prediction;

- a data array whose elements denote, for the predictions to be performed, the indices of the directional signals to be used;

- a data array whose elements represent quantized scaling factors,

said method including the steps:

- providing a bit value indicating whether said prediction is to be performed;

- if no prediction is to be performed, omitting said bit arrays and said data arrays in said side information data;

- if said prediction is to be performed, providing a bit value indicating whether, instead of said bit array indicating whether a prediction is performed for a direction, a number of active predictions and a data array containing the indices of the directions for which a prediction is to be performed are included in said side information data.
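The three steps above can be sketched as a small bit-writing routine. This is only an illustration under stated assumptions: the field widths (`count_width`, `index_width`), the polarity of the two flag bits, and the rule "use the index list when it is shorter than the bit array" are hypothetical choices made for this sketch and are not taken from the patent text.

```python
from math import ceil, log2


def to_bits(value: int, width: int) -> list[int]:
    """Unsigned integer as a fixed-width, MSB-first list of bits."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def code_active_predictions(active_pred: list[int]) -> list[int]:
    """Code which directions are predicted (hypothetical layout).

    active_pred[q] is 1 if a prediction is performed for direction q.
    """
    num_directions = len(active_pred)
    indices = [q for q, bit in enumerate(active_pred) if bit]  # zero-based

    if not indices:
        return [0]  # flag 0: no prediction, all arrays are omitted

    bits = [1]  # flag 1: at least one prediction follows
    index_width = ceil(log2(num_directions))      # assumed width of one index
    count_width = ceil(log2(num_directions + 1))  # assumed width of the count
    list_cost = count_width + len(indices) * index_width

    if list_cost < num_directions:
        bits.append(1)  # assumed polarity: 1 = count plus index list
        bits += to_bits(len(indices), count_width)
        for q in indices:
            bits += to_bits(q, index_width)
    else:
        bits.append(0)  # assumed polarity: 0 = plain bit array
        bits += list(active_pred)
    return bits


sparse = [1, 0, 0, 0, 0, 0, 1] + [0] * 9   # predictions for directions 1 and 7
print(len(code_active_predictions([0] * 16)))  # 1
print(len(code_active_predictions(sparse)))    # 15
```

With these assumed widths, a frame without any prediction costs a single bit, and a sparse frame costs 15 bits instead of the 18 bits needed for two flags plus a 16-bit array.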

In principle, the inventive apparatus is suited for improving the coding of side information required for coding a Higher Order Ambisonics (HOA) representation of a sound field, processed in input time frames of HOA coefficient sequences, wherein dominant directional signals as well as a residual ambient HOA component are determined and a prediction is used for said dominant directional signals, thereby providing, for a coded frame of HOA coefficients, side information data describing said prediction, and wherein said side information data can include:

- a bit array indicating whether a prediction is performed for a direction;

- a bit array in which each bit indicates, for a direction for which a prediction is to be performed, the kind of prediction;

- a data array whose elements denote, for the predictions to be performed, the indices of the directional signals to be used;

- a data array whose elements represent quantized scaling factors,

said apparatus including means which:

- provide a bit value indicating whether said prediction is to be performed;

- if no prediction is to be performed, omit said bit arrays and said data arrays in said side information data;

- if said prediction is to be performed, provide a bit value indicating whether, instead of said bit array indicating whether a prediction is performed for a direction, a number of active predictions and a data array containing the indices of the directions for which a prediction is to be performed are included in said side information data.

Advantageous additional embodiments of the invention are disclosed in the respective claims.

Brief Description of the Drawings

Exemplary embodiments of the invention are described with reference to the accompanying drawings, in which:

Fig. 1 shows an exemplary coding of side information related to spatial prediction in the HOA compression processing described in EP 13305558.2;

Fig. 2 shows an exemplary decoding of side information related to spatial prediction in the HOA decompression processing described in patent application EP 13305558.2;

Fig. 3 shows the HOA decomposition described in patent application PCT/EP2013/075559;

Fig. 4 illustrates the directions of the general plane waves representing the residual signals (depicted as crosses) and the directions of the dominant sound sources (depicted as circles). The directions are presented in a three-dimensional coordinate system as sampling positions on the unit sphere;

Fig. 5 shows the prior-art coding of spatial prediction side information;

Fig. 6 shows the inventive coding of spatial prediction side information;

Fig. 7 shows the inventive decoding of coded spatial prediction side information;

Fig. 8 is a continuation of Fig. 7.

Detailed Description

To provide the context in which the inventive coding of side information related to spatial prediction is used, the HOA compression and decompression processing described in patent application EP 13305558.2 is recapitulated first.

HOA compression

Fig. 1 shows how the coding of side information related to spatial prediction can be embedded into the HOA compression processing described in patent application EP 13305558.2. For the compression of the HOA representation, a frame-wise processing is assumed with non-overlapping input frames $C(k)$ of HOA coefficient sequences of length $L$, where $k$ denotes the frame index. The first step or stage 11/12 in Fig. 1 is optional and consists of concatenating the non-overlapping $k$-th and $(k-1)$-th frames of HOA coefficient sequences into a long frame $\tilde{C}(k)$:

$$\tilde{C}(k) := [\,C(k-1)\;\; C(k)\,]$$

This long frame overlaps with the adjacent long frame by 50% and is used successively for the estimation of the dominant sound source directions. As in the notation $\tilde{C}(k)$, the tilde is used in the following description to indicate that the respective quantity refers to long overlapping frames. If step/stage 11/12 is not present, the tilde has no specific meaning. A parameter in bold denotes a set of values, e.g. a matrix or a vector.
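A minimal sketch of the optional concatenation step is given below. It assumes each frame is stored as a list of channel rows, which is an illustrative data layout and not prescribed by the text.

```python
def long_frames(frames: list[list[list[float]]]) -> list[list[list[float]]]:
    """Build long frames [C(k-1) C(k)] for k >= 1.

    frames[k][o] holds the L samples of HOA coefficient sequence o in frame k.
    Each long frame has 2*L samples per channel.
    """
    result = []
    for k in range(1, len(frames)):
        previous, current = frames[k - 1], frames[k]
        # Concatenate along the time axis, channel by channel.
        result.append([p + c for p, c in zip(previous, current)])
    return result


# One channel, L = 2 samples per frame, three frames.
frames = [[[0.0, 1.0]], [[2.0, 3.0]], [[4.0, 5.0]]]
longs = long_frames(frames)
print(longs[0][0])  # [0.0, 1.0, 2.0, 3.0]
print(longs[1][0])  # [2.0, 3.0, 4.0, 5.0]
```

The second half of one long frame equals the first half of the next, which is the 50% overlap mentioned above.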

As described in EP 13305558.2, the long frame is used successively in step or stage 13 for estimating the dominant sound source directions. This estimation provides a data set of indices of the detected relevant directional signals, as well as a data set of the corresponding direction estimates of the directional signals. $D$ denotes the maximum number of directional signals, which has to be set before the HOA compression starts and which can be handled in the subsequent known processing.

In step or stage 14, the current (long) frame of HOA coefficient sequences is decomposed (as proposed in EP 13305156.5) into a number of directional signals $X_{DIR}(k-2)$ belonging to the directions contained in the set of direction estimates, and a residual ambient HOA component $C_{AMB}(k-2)$. The delay of two frames is introduced as a result of overlap-add processing, which is used to obtain smooth signals. $X_{DIR}(k-2)$ is assumed to contain a total of $D$ channels, of which only those corresponding to active directional signals are non-zero. The indices specifying these channels are assumed to be output in the data set $J_{DIR,ACT}(k-2)$. In addition, the decomposition in step/stage 14 provides parameters $\zeta(k-2)$ that can be used at the decompression side for predicting portions of the original HOA representation from the directional signals (see EP 13305156.5 for more details). To explain the meaning of the spatial prediction parameters $\zeta(k-2)$, the HOA decomposition is described in more detail in the section "HOA decomposition" below.

In step or stage 15, the number of coefficients of the ambient HOA component $C_{AMB}(k-2)$ is reduced so that it contains only $O_{RED} + D - N_{DIR,ACT}(k-2)$ non-zero HOA coefficient sequences, where $N_{DIR,ACT}(k-2) = |J_{DIR,ACT}(k-2)|$ denotes the cardinality of the data set $J_{DIR,ACT}(k-2)$, i.e. the number of active directional signals in frame $k-2$. Since the ambient HOA component is assumed to be always represented by a minimum number $O_{RED}$ of HOA coefficient sequences, this problem can be reduced to selecting the remaining $D - N_{DIR,ACT}(k-2)$ HOA coefficient sequences out of the possible $O - O_{RED}$ ones. To obtain a smooth reduced ambient HOA representation, this choice is made such that as few changes as possible occur compared to the choice taken for the previous frame $k-3$.

The final ambient HOA representation with the reduced number of $O_{RED} + N_{DIR,ACT}(k-2)$ non-zero coefficient sequences is denoted by $C_{AMB,RED}(k-2)$. The indices of the chosen ambient HOA coefficient sequences are output in the data set $J_{AMB,ACT}(k-2)$. In step/stage 16, as described in EP 13305558.2, the active directional signals contained in $X_{DIR}(k-2)$ and the HOA coefficient sequences contained in $C_{AMB,RED}(k-2)$ are assigned to the frame $Y(k-2)$ of $I$ channels for individual perceptual coding. Perceptual coding step/stage 17 encodes the $I$ channels of frame $Y(k-2)$ and outputs an encoded frame.

According to the invention, following the decomposition of the original HOA representation in step/stage 14, the spatial prediction parameters or side information data $\zeta(k-2)$ resulting from that decomposition are losslessly coded in step or stage 19, using an index set delayed by two frames in delay 18, in order to provide a coded data representation $\zeta_{COD}(k-2)$.

HOA decompression

Fig. 2 shows by way of example how the decoding, in step or stage 25, of the received coded side information data $\zeta_{COD}(k-2)$ related to spatial prediction can be embedded into the HOA decompression processing described in Fig. 3 of patent application EP 13305558.2. The coded side information data $\zeta_{COD}(k-2)$ are decoded, using the received index set delayed by two frames in delay 24, before their decoded version $\zeta(k-2)$ enters the composition of the HOA representation in step or stage 23.

In step or stage 21, a perceptual decoding of the $I$ received coded signals is performed in order to obtain the $I$ decoded signals.

In signal re-distribution step or stage 22, the perceptually decoded signals are re-distributed in order to recreate the frame of directional signals and the frame of the ambient HOA component. The information about how to re-distribute the signals is obtained by reproducing the assignment operation performed for the HOA compression, using the index data sets, including $J_{AMB,ACT}(k-2)$. In composition step or stage 23, a current frame of the desired total HOA representation is re-composed (according to the processing described in connection with Fig. 2b and Fig. 4 of PCT/EP2013/075559) using the frame of the directional signals, the set of active directional signal indices together with the set of corresponding directions, the parameters $\zeta(k-2)$ for predicting portions of the HOA representation from the directional signals, and the frame of HOA coefficient sequences of the reduced ambient HOA component.

The quantities used here correspond to those in PCT/EP2013/075559, where the active directional signal indices can be obtained by taking the indices of those rows that contain valid elements. That is, directional signals with respect to uniformly distributed directions are predicted from the directional signals using the received parameters $\zeta(k-2)$ for that prediction, and thereafter the current decompressed frame is re-composed from the frame of directional signals, the predicted portions and the reduced ambient HOA component.

HOA decomposition

In connection with Fig. 3, the HOA decomposition processing is described in detail in order to explain the meaning of the spatial prediction therein. The processing is derived from the processing described in connection with Fig. 3 of patent application PCT/EP2013/075559.

First, in step or stage 31, the smoothed dominant directional signals $X_{DIR}(k-1)$ and their HOA representation $C_{DIR}(k-1)$ are computed, using the long frame of the input HOA representation, the set of directions and the set of corresponding indices of directional signals. $X_{DIR}(k-1)$ is assumed to contain a total of $D$ channels, of which only those corresponding to active directional signals are non-zero. The indices specifying these channels are assumed to be output in the set $J_{DIR,ACT}(k-1)$. In step or stage 33, the residual between the original HOA representation and the HOA representation $C_{DIR}(k-1)$ of the dominant directional signals is represented by a number of $O$ directional signals, which can be thought of as general plane waves from uniformly distributed directions, referred to as a uniform grid. In step or stage 34, these directional signals are predicted from the dominant directional signals $X_{DIR}(k-1)$ in order to provide the predicted signals together with the respective prediction parameters $\zeta(k-1)$. For the prediction, only the dominant directional signals $x_{DIR,d}(k-1)$ with indices $d$ contained in that set are considered. The prediction is described in more detail in the section "Spatial prediction" below.

In step or stage 35, the smoothed HOA representation of the predicted directional signals is computed. In step or stage 37, the residual $C_{AMB}(k-2)$ between the original HOA representation and the HOA representation $C_{DIR}(k-2)$ of the dominant directional signals, together with the HOA representation of the predicted directional signals from uniformly distributed directions, is computed and output.

The signal delays required in the processing of Fig. 3 are performed by corresponding delays 381 to 387.

Spatial prediction

The goal of the spatial prediction is to predict the $O$ residual signals from the extended frame of smoothed directional signals (see the description in patent application PCT/EP2013/075559 and the section "HOA decomposition" above).

Each residual signal, $q = 1, \dots, O$, represents a spatially dispersed general plane wave impinging from the direction $\Omega_q$, whereby it is assumed that all the directions $\Omega_q$, $q = 1, \dots, O$, are nearly uniformly distributed on the unit sphere. The total of all directions is referred to as a "grid".

Each directional signal, $d = 1, \dots, D$, represents a general plane wave impinging from a trajectory interpolated between the directions $\Omega_{ACT,d}(k-3)$, $\Omega_{ACT,d}(k-2)$, $\Omega_{ACT,d}(k-1)$ and $\Omega_{ACT,d}(k)$, assuming that the $d$-th directional signal is active for the respective frames.

To illustrate the meaning of the spatial prediction by an example, consider the decomposition of an HOA representation of order $N = 3$, where the maximum number of directions to be extracted is $D = 4$. For simplicity, it is further assumed that only the directional signals with indices "1" and "4" are active, while those with indices "2" and "3" are inactive. Also for simplicity, the directions of the dominant sound sources are assumed to be constant over the considered frames, i.e. $\Omega_{ACT,d}(k-3) = \Omega_{ACT,d}(k-2) = \Omega_{ACT,d}(k-1) = \Omega_{ACT,d}(k) = \Omega_{ACT,d}$ for $d = 1, 4$.

As a consequence of the order $N = 3$, there are $O = 16$ directions $\Omega_q$ of spatially dispersed general plane waves, $q = 1, \dots, O$. Fig. 4 shows these directions together with the directions $\Omega_{ACT,1}$ and $\Omega_{ACT,4}$ of the active dominant sound sources.

State-of-the-art parameters for describing the spatial prediction

One way of describing the spatial prediction is given in the above-mentioned ISO/IEC document. In this document, the signals, $q = 1, \dots, O$, are assumed to be predicted by a weighted sum of a predefined maximum number $D_{PRED}$ of directional signals, or by a low-pass filtered version of that weighted sum. The side information related to spatial prediction is described by the parameter set $\zeta(k-1) = \{p_{TYPE}(k-1), P_{IND}(k-1), P_{Q,F}(k-1)\}$, which consists of the following three components:

• The vector $p_{TYPE}(k-1)$, whose elements $p_{TYPE,q}(k-1)$, $q = 1, \dots, O$, indicate whether a prediction is performed for the $q$-th direction $\Omega_q$ and, if so, also indicate the kind of prediction: 0 means no prediction, 1 a full-band prediction, and 2 a low-pass prediction.

• The matrix $P_{IND}(k-1)$, whose elements $p_{IND,d,q}(k-1)$, $d = 1, \dots, D_{PRED}$, $q = 1, \dots, O$, denote the indices of the directional signals from which the prediction for direction $\Omega_q$ is to be performed. If no prediction is performed for a direction $\Omega_q$, the corresponding column of $P_{IND}(k-1)$ consists of zeros. Likewise, if fewer than $D_{PRED}$ directional signals are used for the prediction for direction $\Omega_q$, the non-required elements in the $q$-th column of $P_{IND}(k-1)$ are zero.

• The matrix $P_{Q,F}(k-1)$, which contains the corresponding quantized prediction factors $p_{Q,F,d,q}(k-1)$, $d = 1, \dots, D_{PRED}$, $q = 1, \dots, O$.

To allow these parameters to be interpreted properly, the following two parameters must be known at the decoding side:

• The maximum number $D_{PRED}$ of directional signals from which a general plane wave signal is allowed to be predicted.

• The number $B_{SC}$ of bits used for quantizing the prediction factors $p_{Q,F,d,q}(k-1)$, $d = 1, \dots, D_{PRED}$, $q = 1, \dots, O$. The dequantization rule is given in equation (10).

Both parameters must either be set to fixed values known to the encoder and the decoder, or be transmitted additionally, though distinctly less frequently than at the frame rate. The latter option can be used to adapt the two parameters to the HOA representation to be compressed.

Assuming $O = 16$, $D_{PRED} = 2$ and $B_{SC} = 8$, an example parameter set might look like this:

$$p_{TYPE}(k-1) = [\,1\ 0\ 0\ 0\ 0\ 0\ 2\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\,] \qquad (7)$$

Such parameters mean that the general plane wave signal from direction $\Omega_1$ is predicted from the directional signal from direction $\Omega_{ACT,1}$ by a pure multiplication (i.e. full band) with a factor that results from dequantizing the value 40. Further, the general plane wave signal from direction $\Omega_7$ is predicted from two directional signals by low-pass filtering and multiplication with factors that result from dequantizing the values 15 and -13.
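The example can be written down as plain data to make the column-wise reading of the three components concrete. Only the vector $p_{TYPE}$ and the quantized values 40, 15 and -13 appear in the text. The index and factor matrices below are assumptions reconstructed from the verbal description, in particular that direction 7 uses the two active signals 1 and 4 in that order.

```python
O, D_PRED = 16, 2

# Equation (7): prediction kind per direction (0 none, 1 full band, 2 low pass).
p_type = [1, 0, 0, 0, 0, 0, 2] + [0] * 9

# Assumed index and quantized-factor matrices, D_PRED rows by O columns.
p_ind = [[0] * O for _ in range(D_PRED)]
p_qf = [[0] * O for _ in range(D_PRED)]
p_ind[0][0], p_qf[0][0] = 1, 40                    # direction 1 from signal 1
p_ind[0][6], p_ind[1][6] = 1, 4                    # direction 7 from signals 1, 4
p_qf[0][6], p_qf[1][6] = 15, -13

KIND = {0: "none", 1: "full band", 2: "low pass"}


def describe(q: int) -> tuple[str, list[tuple[int, int]]]:
    """Kind of prediction and (signal index, quantized factor) pairs for direction q (1-based)."""
    col = q - 1
    pairs = [(p_ind[d][col], p_qf[d][col]) for d in range(D_PRED) if p_ind[d][col] != 0]
    return KIND[p_type[col]], pairs


print(describe(1))  # ('full band', [(1, 40)])
print(describe(7))  # ('low pass', [(1, 15), (4, -13)])
```

Columns for directions without prediction stay all zero, which is the redundancy the coding scheme tries to avoid transmitting.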

Given this side information, the prediction is assumed to be performed as follows:

First, the quantized prediction factors $p_{Q,F,d,q}(k-1)$, $d = 1, \dots, D_{PRED}$, $q = 1, \dots, O$, are dequantized according to the rule of equation (10) to provide the actual prediction factors $p_{F,d,q}(k-1)$.

As already mentioned, $B_{SC}$ denotes the predefined number of bits used for quantizing the prediction factors. In addition, $p_{F,d,q}(k-1)$ is assumed to be set to zero if $p_{IND,d,q}(k-1)$ equals zero.

For the example above, assuming $B_{SC} = 8$, dequantization yields the actual prediction factors that correspond to the quantized values 40, 15 and -13.
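The dequantization rule of equation (10) is not reproduced in this text. The sketch below therefore assumes a uniform mid-rise dequantizer of the form $(p_Q + 1/2) \cdot 2^{-B_{SC}+1}$; this form is an assumption for illustration and should be checked against the original equation.

```python
def dequantize(p_q: int, b_sc: int, index: int = 1) -> float:
    """Dequantize one prediction factor (ASSUMED mid-rise rule, see lead-in).

    index is the matching element of P_IND; a zero index forces a zero factor,
    as stated in the text.
    """
    if index == 0:
        return 0.0
    return (p_q + 0.5) * 2.0 ** (-b_sc + 1)


B_SC = 8
factors = [dequantize(value, B_SC) for value in (40, 15, -13)]
print(factors)  # [0.31640625, 0.12109375, -0.09765625]
```

Under this assumed rule, 8-bit signed values map to factors strictly between -1 and 1 with a step size of $2^{-7}$.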

Further, for performing the low-pass prediction, a predefined low-pass FIR filter $h_{LP} := [\,h_{LP}(0)\ h_{LP}(1)\ \dots\ h_{LP}(L_h - 1)\,]$ (12) of length $L_h = 31$ is used. The filter delay is given by $D_h = 15$ samples.
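The relation between the filter length and its delay can be illustrated with a generic linear-phase design. The text does not give the filter coefficients, so the windowed-sinc design, the Hamming window and the cutoff value below are purely illustrative assumptions. Only the length 31 and the delay 15 come from the text.

```python
import math


def lowpass_fir(length: int = 31, cutoff: float = 0.25) -> list[float]:
    """Symmetric windowed-sinc low-pass filter (illustrative design, not the patent's).

    cutoff is a fraction of the sampling rate (assumed value).
    """
    center = (length - 1) / 2
    taps = []
    for n in range(length):
        x = n - center
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))  # Hamming
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unit gain at DC


h_lp = lowpass_fir()
delay = (len(h_lp) - 1) // 2  # group delay of a symmetric FIR filter
print(len(h_lp), delay)  # 31 15
```

For any symmetric FIR filter of odd length $L_h$, the delay is $(L_h - 1)/2$ samples, which gives $D_h = 15$ for $L_h = 31$.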

Assuming that the predicted signals and the directional signals are composed of their samples, the sample values of the predicted signals are given by equation (17), which distinguishes cases by the kind of prediction. [Equations (13) to (18) are not legible in this copy.]

As stated above, and as can now be seen from Equation (17), the predicted signals for q = 1, …, O are assumed to be obtained from a weighted sum of at most the predetermined maximum number D_PRED of directional signals, or from a low-pass filtered version of that weighted sum.

State-of-the-art coding of the side information related to spatial prediction

The ISO/IEC document mentioned above addresses the coding of the spatial prediction side information. The coding is summarized in Algorithm 1, shown in Figure 5, and is explained in the following. For clarity of presentation, the frame index k-1 is omitted in all expressions.

First, a bit array ActivePred consisting of O bits is created, in which the bit ActivePred[q] indicates whether or not a prediction is performed for direction Ω_q. The number of ones in this array is denoted by NumActivePred.

Next, a bit array PredType of length NumActivePred is created, in which each bit indicates, for a direction where a prediction is to be performed, the kind of prediction: full band or low pass. At the same time, an unsigned integer array PredDirSigIds of length NumActivePred·D_PRED is created, whose elements denote, for each active prediction, the D_PRED indices of the directional signals to be used. If fewer than D_PRED directional signals are used for a prediction, the remaining indices are assumed to be set to zero. Each element of the array PredDirSigIds is assumed to be represented by ⌈log2(D+1)⌉ bits. The number of non-zero elements in the array PredDirSigIds is denoted by NumNonZeroIds.

Finally, an integer array QuantPredGains of length NumNonZeroIds is created, whose elements are assumed to represent the quantized scaling factors p_Q,F,d,q(k-1) to be used in Equation (17). The dequantization that yields the corresponding dequantized scaling factors p_F,d,q(k-1) is given in Equation (10). Each element of the array QuantPredGains is assumed to be represented by B_SC bits.

In the end, the coded representation of the side information ζ_COD consists of the four arrays described above:

ζ_COD = [ActivePred PredType PredDirSigIds QuantPredGains].  (19)
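The construction of the four arrays can be sketched in Python as follows. This is only an illustration under stated assumptions, not the normative Algorithm 1 of Figure 5: the function name `build_side_info_arrays` is hypothetical, p_TYPE is taken to use 0 for no prediction, 1 for full band and 2 for low pass, and the matrices P_IND and P_Q,F of the example are assumed to have the entries implied by the description of the parameter set above (factor 40 for direction 1, factors 15 and -13 for direction 7).

```python
def build_side_info_arrays(p_type, P_IND, P_QF):
    """Build ActivePred, PredType, PredDirSigIds and QuantPredGains.

    p_type: length-O list, assumed 0 = no prediction, 1 = full band, 2 = low pass.
    P_IND, P_QF: D_PRED x O lists of directional-signal indices and quantized gains.
    """
    O = len(p_type)
    D_PRED = len(P_IND)
    active_pred = [1 if t != 0 else 0 for t in p_type]
    pred_type = [t - 1 for t in p_type if t != 0]  # one bit per active direction
    pred_dir_sig_ids = []
    quant_pred_gains = []
    for q in range(O):
        if p_type[q] == 0:
            continue
        for d in range(D_PRED):
            idx = P_IND[d][q]
            pred_dir_sig_ids.append(idx)  # zero marks an unused slot
            if idx != 0:
                quant_pred_gains.append(P_QF[d][q])  # gains only for used slots
    return active_pred, pred_type, pred_dir_sig_ids, quant_pred_gains


# Example parameter set (matrix entries are an assumed reconstruction).
O, D_PRED = 16, 2
p_type = [0] * O
p_type[0], p_type[6] = 1, 2
P_IND = [[0] * O for _ in range(D_PRED)]
P_QF = [[0] * O for _ in range(D_PRED)]
P_IND[0][0], P_QF[0][0] = 1, 40
P_IND[0][6], P_IND[1][6] = 1, 4
P_QF[0][6], P_QF[1][6] = 15, -13

arrays = build_side_info_arrays(p_type, P_IND, P_QF)
```

Grouping the D_PRED index slots of each active direction consecutively is an assumption; it matches the ordering of the example arrays in this description.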

To illustrate this coding with an example, the parameter set of Equations (7) to (9) is coded as:

ActivePred = [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]  (20)

PredType = [0 1]  (21)

PredDirSigIds = [1 0 1 4]  (22)

QuantPredGains = [40 15 -13].  (23)

The number of bits required is 16 + 2 + 3·4 + 8·3 = 54.
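The bit count can be checked with a short Python sketch. The function name `prior_art_bits` is hypothetical, and the value D = 4 is an assumption chosen only because it gives the 3 bits per element of PredDirSigIds that appear in the sum above; the text does not state D for this example.

```python
from math import ceil, log2


def prior_art_bits(active_pred, pred_dir_sig_ids, quant_pred_gains, D, B_SC):
    """Bits needed for [ActivePred PredType PredDirSigIds QuantPredGains]."""
    num_active_pred = sum(active_pred)
    bits_active_pred = len(active_pred)       # one bit per direction, O in total
    bits_pred_type = num_active_pred          # one bit per active prediction
    bits_dir_ids = len(pred_dir_sig_ids) * ceil(log2(D + 1))
    bits_gains = len(quant_pred_gains) * B_SC
    return bits_active_pred + bits_pred_type + bits_dir_ids + bits_gains


active_pred = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
total = prior_art_bits(active_pred, [1, 0, 1, 4], [40, 15, -13], D=4, B_SC=8)
print(total)
```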

Inventive coding of the side information related to spatial prediction

To increase the efficiency of coding the side information related to spatial prediction, the state-of-the-art processing is advantageously modified.

A) When coding HOA representations of typical sound fields, the inventors observed that there are often many frames for which the HOA compression processing decides not to perform any spatial prediction at all. In such frames the bit array ActivePred consists of zeros only, and the number of these zeros equals O. Because such frame content occurs frequently, the inventive processing prepends a single bit PSPredictionActive to the coded representation ζ_COD, which indicates whether any prediction is to be performed. If the bit PSPredictionActive has the value zero (or, alternatively, one), the array ActivePred and all other prediction-related data are not included in the coded side information ζ_COD. In practice, this reduces the average bit rate for transmitting ζ_COD over time.

B) A further observation made when coding HOA representations of typical sound fields is that the number of active predictions, NumActivePred, is often very low. In that case, instead of using the bit array ActivePred to indicate for each direction Ω_q whether a prediction is to be performed, it can be more efficient to transmit the number of active predictions together with the respective direction indices. In particular, this modified way of coding the activity is more efficient if

NumActivePred ≤ M_M,  (24)

where M_M is the greatest integer that satisfies

⌈log2(M_M)⌉ + M_M·⌈log2(O)⌉ < O.  (25)

The value of M_M can be computed from knowledge of the HOA order N alone, since O = (N+1)^2. In Equation (25), ⌈log2(M_M)⌉ denotes the number of bits required for coding the actual number NumActivePred of active predictions, and M_M·⌈log2(O)⌉ is the number of bits required for coding the respective direction indices. The right-hand side of Equation (25) corresponds to the number of bits of the array ActivePred, which would be required to code the same information in the known way. Following this explanation, a single bit KindOfCodedPredIds can be used to indicate in which way the indices of those directions where a prediction is to be performed are coded. If the bit KindOfCodedPredIds has the value one (or, alternatively, zero), the number NumActivePred and the array PredIds, which contains the indices of the directions where a prediction is to be performed, are added to the coded side information ζ_COD. Otherwise, if the bit KindOfCodedPredIds has the value zero (or, alternatively, one), the array ActivePred is used to code the same information.

On average, this operation reduces the bit rate for transmitting ζ_COD over time.
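A minimal Python sketch of how M_M could be derived from the HOA order is given below. It assumes that Equation (25) is the strict inequality ⌈log2(M_M)⌉ + M_M·⌈log2(O)⌉ < O, which is what the description of its left-hand and right-hand sides suggests; the function name `max_listed_predictions` is hypothetical.

```python
from math import ceil, log2


def max_listed_predictions(N):
    """Greatest M with ceil(log2(M)) + M*ceil(log2(O)) < O, where O = (N+1)^2.

    Returns 0 if no positive integer satisfies the (assumed strict) inequality.
    """
    O = (N + 1) ** 2
    bits_per_index = ceil(log2(O))  # bits for one direction index
    m_best = 0
    for m in range(1, O + 1):
        # Left side: bits for NumActivePred plus bits for m direction indices.
        if ceil(log2(m)) + m * bits_per_index < O:
            m_best = m
    return m_best


M_M = max_listed_predictions(3)  # HOA order N = 3 gives O = 16
print(M_M, ceil(log2(M_M)))
```

For O = 16 this gives M_M = 3, so NumActivePred would occupy 2 bits, which agrees with the 2-bit term in the bit count of the example in this description.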

C) To further increase the efficiency of the side information coding, use is made of the fact that the number of active directional signals actually available for the prediction is often smaller than D. This means that fewer than ⌈log2(D+1)⌉ bits are needed to code each element of the index array PredDirSigIds. Specifically, the number of active directional signals actually available for the prediction is given by the number of elements of the data set that contains the indices of the active directional signals. Each element of the index array PredDirSigIds can therefore be coded with a correspondingly smaller number of bits derived from that number of elements, which is more efficient. The data set is assumed to be known at the decoder, so the decoder also knows how many bits have to be read to decode the index of a directional signal. Note that the frame index of the ζ_COD to be computed and that of the index data set used must be the same.

The modifications A) to C) of the known side information coding processing described above result in the example coding processing depicted in Figure 6.

Consequently, the coded side information ζ_COD consists of the components given in Equation (26).

Remark: in the ISO/IEC document mentioned above, e.g. in section 6.1.3, QuantPredGains is called PredGains, although it contains quantized values.

The coded representation of the example in Equations (7) to (9) would be:

PSPredictionActive = 1  (27)

KindOfCodedPredIds = 1  (28)

NumActivePred = 2  (29)

PredIds = [1 7]  (30)

PredType = [0 1]  (31)

PredDirSigIds = [1 0 1 4]  (32)

QuantPredGains = [40 15 -13],  (33)

and the number of bits required is 1 + 1 + 2 + 2·4 + 2 + 2·4 + 8·3 = 46. Advantageously, compared with the state-of-the-art coded representation in Equations (20) to (23), this representation coded according to the invention requires 8 bits fewer. The bit array PredType may also be omitted at the encoder side.
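The bit count of the modified coding can be reproduced with the following Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, Equation (25) is taken to be the strict inequality ⌈log2(M_M)⌉ + M_M·⌈log2(O)⌉ < O, the choice of KindOfCodedPredIds follows condition (24), and the width of each PredDirSigIds element is passed in as `bits_per_dir_id` (2 bits in the example, read off from the sum above).

```python
from math import ceil, log2


def largest_m(O):
    """Greatest M with ceil(log2(M)) + M*ceil(log2(O)) < O (assumed form of Eq. 25)."""
    m_best = 0
    for m in range(1, O + 1):
        if ceil(log2(m)) + m * ceil(log2(O)) < O:
            m_best = m
    return m_best


def modified_bits(O, pred_ids, pred_dir_sig_ids, quant_pred_gains,
                  bits_per_dir_id, B_SC):
    """Bits needed for the coded side information with modifications A) to C)."""
    bits = 1                                  # PSPredictionActive
    num_active_pred = len(pred_ids)
    if num_active_pred == 0:
        return bits                           # nothing else is transmitted
    bits += 1                                 # KindOfCodedPredIds
    m_max = largest_m(O)
    if num_active_pred <= m_max:              # condition (24): list the indices
        bits += ceil(log2(m_max))             # NumActivePred
        bits += num_active_pred * ceil(log2(O))   # PredIds
    else:
        bits += O                             # fall back to ActivePred
    bits += num_active_pred                   # PredType
    bits += len(pred_dir_sig_ids) * bits_per_dir_id
    bits += len(quant_pred_gains) * B_SC
    return bits


total = modified_bits(16, [1, 7], [1, 0, 1, 4], [40, 15, -13],
                      bits_per_dir_id=2, B_SC=8)
print(total)
```

A frame without any prediction costs a single bit in this scheme, compared with the O bits of an all-zero ActivePred array in the known coding.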

Decoding of the modified coding of the side information related to spatial prediction

The decoding of the modified side information related to spatial prediction is summarized in the example decoding processing depicted in Figures 7 and 8 (the processing in Figure 8 continues the processing in Figure 7) and is explained in the following. Initially, all elements of the vector p_TYPE and of the matrices P_IND and P_Q,F are set to zero. Then the bit PSPredictionActive is read, which indicates whether a spatial prediction is to be performed at all. If a spatial prediction is to be performed (i.e. PSPredictionActive = 1), the bit KindOfCodedPredIds is read, which indicates the kind of coding used for the indices of the directions where a prediction is to be performed.

If KindOfCodedPredIds = 0, the bit array ActivePred of length O is read, whose q-th element indicates whether a prediction is performed for direction Ω_q. In the next step, the number of predictions NumActivePred is computed from the array ActivePred, and the bit array PredType of length NumActivePred is read, whose elements indicate the kind of prediction to be performed for each of the relevant directions. From the information contained in ActivePred and PredType, the elements of the vector p_TYPE are computed.

Alternatively, the bit array PredType is not provided at the encoder side, and the elements of the vector p_TYPE are computed from the bit array ActivePred.

If KindOfCodedPredIds = 1, the number of active predictions NumActivePred is read, which is assumed to be coded with ⌈log2(M_M)⌉ bits, where M_M is the greatest integer that satisfies Equation (25). Then the data array PredIds consisting of NumActivePred elements is read, where each element is assumed to be coded with ⌈log2(O)⌉ bits. The elements of this array are the indices of the directions where a prediction has to be performed. Next, the bit array PredType of length NumActivePred is read, whose elements indicate the kind of prediction to be performed for each of the relevant directions. From the knowledge of NumActivePred, PredIds and PredType, the elements of the vector p_TYPE are computed. Alternatively, the bit array PredType is not provided at the encoder side, and the elements of the vector p_TYPE are computed from the number NumActivePred and the data array PredIds.

For both cases (i.e. KindOfCodedPredIds = 0 and KindOfCodedPredIds = 1), the array PredDirSigIds, which consists of NumActivePred·D_PRED elements, is read in the next step. Each element is assumed to be coded with the reduced number of bits described under C) above. Using the information contained in p_TYPE, the data set of indices of the active directional signals, and PredDirSigIds, the elements of the matrix P_IND are set and the number NumNonZeroIds of non-zero elements in P_IND is computed.

Finally, the array QuantPredGains is read, which consists of NumNonZeroIds elements, each coded with B_SC bits. Using the information contained in P_IND and QuantPredGains, the elements of the matrix P_Q,F are set.
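The last decoding steps can be sketched in Python on fields that have already been parsed from the bitstream. This is not the processing of Figures 7 and 8 itself: the function names are hypothetical, the bit-level parsing is left out because the binary formats are not given here, the values in PredDirSigIds are assumed to be directional-signal indices already (any mapping through the index data set is skipped), and p_TYPE is assumed to use 1 for full band and 2 for low pass.

```python
def ids_from_active_pred(active_pred):
    """Derive 1-based direction indices from the bit array ActivePred."""
    return [q + 1 for q, bit in enumerate(active_pred) if bit]


def rebuild_prediction_params(O, D_PRED, pred_ids, pred_type,
                              pred_dir_sig_ids, quant_pred_gains):
    """Rebuild p_TYPE, P_IND and P_Q,F from the parsed side information."""
    p_type = [0] * O
    P_IND = [[0] * O for _ in range(D_PRED)]
    P_QF = [[0] * O for _ in range(D_PRED)]
    gains = iter(quant_pred_gains)
    for n, q in enumerate(pred_ids):          # q is a 1-based direction index
        p_type[q - 1] = pred_type[n] + 1      # assumed: bit 0 -> 1, bit 1 -> 2
        for d in range(D_PRED):
            idx = pred_dir_sig_ids[n * D_PRED + d]
            P_IND[d][q - 1] = idx
            if idx != 0:                      # gains exist only for non-zero ids
                P_QF[d][q - 1] = next(gains)
    return p_type, P_IND, P_QF


# KindOfCodedPredIds = 1 path of the example: PredIds is transmitted directly.
p_type, P_IND, P_QF = rebuild_prediction_params(
    16, 2, [1, 7], [0, 1], [1, 0, 1, 4], [40, 15, -13])
```

With KindOfCodedPredIds = 0, the same function can be fed with `ids_from_active_pred(ActivePred)`, so both coding variants end in identical parameter matrices.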

The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.

Claims (7)

1. A method for decoding a bitstream that includes a coded Higher Order Ambisonics (HOA) representation, the method comprising: evaluating the value of a bit KindOfCodedPredIds; evaluating, based on the value of the bit KindOfCodedPredIds, a first array ActivePred, wherein each element of the first array ActivePred indicates whether a prediction is performed for the corresponding direction; determining elements of a vector p_TYPE based on the evaluation of the first array ActivePred; evaluating a second array PredDirSigIds, wherein the elements of the second array PredDirSigIds denote indices of directional signals to be used for active predictions; and determining, based on the vector p_TYPE, a data set of indices of directional signals, and the elements of the second array PredDirSigIds, elements of a matrix P_IND that denotes the indices of the directional signals from which the prediction for a direction is to be performed.

2. The method according to claim 1, wherein each element of the second array PredDirSigIds denotes, for a prediction to be performed, the index of a directional signal to be used, and wherein each element is coded, and correspondingly decoded, using a number of bits determined by the number of elements of said data set of indices of directional signals.

3. A decoder for decoding a bitstream that includes a coded Higher Order Ambisonics (HOA) representation, the decoder comprising a processor configured to: evaluate the value of a bit KindOfCodedPredIds; evaluate, based on the value of the bit KindOfCodedPredIds, a first array ActivePred, wherein each element of the first array ActivePred indicates whether a prediction is performed for the corresponding direction; determine elements of a vector p_TYPE based on the evaluation of the first array ActivePred; evaluate a second array PredDirSigIds, wherein the elements of the second array PredDirSigIds denote indices of directional signals to be used for active predictions; and determine, based on the vector p_TYPE, a data set of indices of directional signals, and the elements of the second array PredDirSigIds, elements of a matrix P_IND that denotes the indices of the directional signals from which the prediction for a direction is to be performed.

4. The decoder according to claim 3, wherein each element of the second array PredDirSigIds denotes, for a prediction to be performed, the index of a directional signal to be used, and wherein each element is coded, and correspondingly decoded, using a number of bits determined by the number of elements of said data set of indices of directional signals.

5. A computer program product comprising instructions which, when executed on a computer, carry out the method according to claim 1 or 2.

6. A device for decoding a bitstream that includes a coded Higher Order Ambisonics (HOA) representation, comprising: a processor, and a computer program product comprising instructions which, when executed on the processor, cause the device to perform the method according to claim 1 or 2.

7. An apparatus for decoding a bitstream that includes a coded Higher Order Ambisonics (HOA) representation, the apparatus comprising a processor or electronic circuit for performing the method according to claim 1 or 2.
HK42020010097.2A 2014-01-08 2020-06-29 Method and apparatus for decoding a bitstream including encoded hoa representations, and medium HK40019652B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP14305022.7 2014-01-08
EP14305061.5 2014-01-16

Publications (2)

Publication Number Publication Date
HK40019652A (en) 2020-10-16
HK40019652B (en) 2024-06-21
