
HK1224073B - Indicating frame parameter reusability for coding vectors - Google Patents


Info

Publication number
HK1224073B
Authority
HK
Hong Kong
Prior art keywords
syntax element
vector
quantization
unit
audio
Prior art date
Application number
HK16112175.4A
Other languages
Chinese (zh)
Other versions
HK1224073A1 (en
Inventor
N. G. Peters
D. Sen
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/609,190 (US9489955B2)
Application filed by Qualcomm Incorporated
Publication of HK1224073A1
Publication of HK1224073B

Description

Indicating frame parameter reusability for coding vectors

This application claims the benefit of the following U.S. provisional applications:

U.S. Provisional Application No. 61/933,706, filed January 30, 2014, entitled "COMPRESSION OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD";

U.S. Provisional Application No. 61/933,714, filed January 30, 2014, entitled "COMPRESSION OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD";

U.S. Provisional Application No. 61/933,731, filed January 30, 2014, entitled "INDICATING FRAME PARAMETER REUSABILITY FOR DECODING SPATIAL VECTORS";

U.S. Provisional Application No. 61/949,591, filed March 7, 2014, entitled "IMMEDIATE PLAY-OUT FRAME FOR SPHERICAL HARMONIC COEFFICIENTS";

U.S. Provisional Application No. 61/949,583, filed March 7, 2014, entitled "FADE-IN/FADE-OUT OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD";

U.S. Provisional Application No. 61/994,794, filed May 16, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/004,147, filed May 28, 2014, entitled "INDICATING FRAME PARAMETER REUSABILITY FOR DECODING SPATIAL VECTORS";

U.S. Provisional Application No. 62/004,067, filed May 28, 2014, entitled "IMMEDIATE PLAY-OUT FRAME FOR SPHERICAL HARMONIC COEFFICIENTS AND FADE-IN/FADE-OUT OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD";

U.S. Provisional Application No. 62/004,128, filed May 28, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/019,663, filed July 1, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/027,702, filed July 22, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/028,282, filed July 23, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/029,173, filed July 25, 2014, entitled "IMMEDIATE PLAY-OUT FRAME FOR SPHERICAL HARMONIC COEFFICIENTS AND FADE-IN/FADE-OUT OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD";

U.S. Provisional Application No. 62/032,440, filed August 1, 2014, entitled "CODING V-VECTORS OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/056,248, filed September 26, 2014, entitled "SWITCHED V-VECTOR QUANTIZATION OF A HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL";

U.S. Provisional Application No. 62/056,286, filed September 26, 2014, entitled "PREDICTIVE VECTOR QUANTIZATION OF A DECOMPOSED HIGHER ORDER AMBISONICS (HOA) AUDIO SIGNAL"; and

U.S. Provisional Application No. 62/102,243, filed January 12, 2015, entitled "TRANSITIONING OF AMBIENT HIGHER-ORDER AMBISONIC COEFFICIENTS."

Each of the foregoing U.S. provisional applications is incorporated herein by reference as if set forth in its respective entirety.

Technical Field

This disclosure relates to audio data and, more specifically, to the coding of higher-order ambisonic audio data.

Background Art

A higher-order ambisonics (HOA) signal (often represented by a plurality of spherical harmonic coefficients (SHC) or other hierarchical elements) is a three-dimensional representation of a sound field. The HOA or SHC representation can represent the sound field in a manner that is independent of the local speaker geometry used to play back a multi-channel audio signal rendered from the SHC signal. The SHC signal can also facilitate backward compatibility, because the SHC signal can be rendered to well-known and widely adopted multi-channel formats (e.g., the 5.1 audio channel format or the 7.1 audio channel format). The SHC representation can therefore enable a better representation of a sound field that also accommodates backward compatibility.

Summary of the Invention

In general, techniques are described for coding higher-order ambisonic audio data. Higher-order ambisonic audio data may include at least one spherical harmonic coefficient corresponding to a spherical harmonic basis function having an order greater than one.

In one aspect, a method of efficient bit use comprises obtaining a bitstream that includes a vector representing an orthogonal spatial axis in a spherical harmonic domain. The bitstream further includes an indicator of whether to reuse, from a previous frame, at least one syntax element indicating information used when compressing the vector.

In another aspect, a device configured to perform efficient bit use comprises one or more processors configured to obtain a bitstream that includes a vector representing an orthogonal spatial axis in a spherical harmonic domain. The bitstream further includes an indicator of whether to reuse, from a previous frame, at least one syntax element indicating information used when compressing the vector. The device also comprises a memory configured to store the bitstream.

In another aspect, a device configured to perform efficient bit use comprises means for obtaining a bitstream that includes a vector representing an orthogonal spatial axis in a spherical harmonic domain. The bitstream further includes an indicator of whether to reuse, from a previous frame, at least one syntax element indicating information used when compressing the vector. The device also comprises means for storing the indicator.

In another aspect, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed, cause one or more processors to obtain a bitstream that includes a vector representing an orthogonal spatial axis in a spherical harmonic domain, wherein the bitstream further includes an indicator of whether to reuse, from a previous frame, at least one syntax element indicating information used when compressing the vector.
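The reuse indicator described in these aspects can be pictured as a decoder-side parsing step. The following is a minimal sketch, not the bitstream syntax of this disclosure: the field names (`reuse_flag`, `quant_mode`, `huffman_table`), their bit widths, and the `BitReader` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


@dataclass(frozen=True)
class VectorSyntax:
    """Hypothetical syntax elements describing how a vector was compressed."""
    quant_mode: int     # assumed 2-bit field
    huffman_table: int  # assumed 3-bit field


def parse_vector_syntax(reader: BitReader,
                        previous: Optional[VectorSyntax]) -> VectorSyntax:
    """Read a 1-bit reuse indicator, then reuse or re-read the syntax elements."""
    reuse_flag = reader.read(1)
    if reuse_flag:
        if previous is None:
            raise ValueError("reuse signalled but no previous frame is available")
        return previous  # no further bits are spent on these elements
    return VectorSyntax(quant_mode=reader.read(2), huffman_table=reader.read(3))


# Frame 1: flag=0, quant_mode=0b10, huffman_table=0b011, then padding bits
frame1 = BitReader(bytes([0b01001100]))
syntax1 = parse_vector_syntax(frame1, previous=None)

# Frame 2: flag=1, so a single bit signals reuse of frame 1's elements
frame2 = BitReader(bytes([0b10000000]))
syntax2 = parse_vector_syntax(frame2, previous=syntax1)
```

In this sketch the second frame spends one bit where the first spent six, which is the kind of saving the "efficient bit use" wording points at.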

The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.

Brief Description of the Drawings

FIG. 1 is a diagram illustrating spherical harmonic basis functions of various orders and sub-orders.

FIG. 2 is a diagram illustrating a system that may perform various aspects of the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating the audio decoding device of FIG. 2 in more detail.

FIG. 5A is a flowchart illustrating exemplary operation of an audio encoding device in performing various aspects of the vector-based synthesis techniques described in this disclosure.

FIG. 5B is a flowchart illustrating exemplary operation of an audio encoding device in performing various aspects of the coding techniques described in this disclosure.

FIG. 6A is a flowchart illustrating exemplary operation of an audio decoding device in performing various aspects of the techniques described in this disclosure.

FIG. 6B is a flowchart illustrating exemplary operation of an audio decoding device in performing various aspects of the coding techniques described in this disclosure.

FIG. 7 is a diagram illustrating, in more detail, a frame of the bitstream that may specify a compressed spatial component.

FIG. 8 is a diagram illustrating, in more detail, a portion of the bitstream that may specify a compressed spatial component.

Detailed Description

The evolution of surround sound has made many output formats available for entertainment. Most of these consumer surround sound formats are "channel" based, in that they implicitly specify feeds to loudspeakers at certain geometric coordinates. Consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low-frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of speakers (in symmetric and non-symmetric geometries) and are often termed "surround arrays." One example of such an array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.

The input to a future MPEG encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (among other information); and (iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called "spherical harmonic coefficients" or SHC, "higher-order ambisonics" or HOA, and "HOA coefficients"). The future MPEG encoder may be described in more detail in the document entitled "Call for Proposals for 3D Audio," by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) JTC1/SC29/WG11/N13411, released January 2013 in Geneva, Switzerland, and available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.

There are various "surround sound" channel-based formats on the market. They range, for example, from the 5.1 home theater system (which has been the most successful in terms of making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai, or Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce the soundtrack for a movie once, and not spend effort remixing it for each speaker configuration. Recently, standards development organizations have been considering ways to provide an encoding into a standardized bitstream and a subsequent decoding that is adaptable and agnostic to the speaker geometry (and number) and acoustic conditions at the location of the playback (involving a renderer).

To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. A hierarchical set of elements is a set in which the elements are ordered such that a basic set of lower-order elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.

One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a sound field using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r)\right] e^{j\omega t}$$

The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega/c$, $c$ is the speed of sound (~343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and sub-order $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$) that can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.

FIG. 1 is a diagram illustrating spherical harmonic basis functions from the zeroth order (n = 0) to the fourth order (n = 4). As can be seen, for each order there is an expansion of sub-orders m, which are shown in the example of FIG. 1 but not explicitly labeled, for ease of illustration.

The SHC can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving $(1+4)^2$ (that is, 25) coefficients may be used.
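The coefficient count quoted above can be checked by enumerating the basis functions directly. This is a minimal sketch, assuming the usual convention that the sub-order m runs from -n to n for each order n; the function names are illustrative only.

```python
from typing import List, Tuple


def sh_indices(order: int) -> List[Tuple[int, int]]:
    """Return all (n, m) pairs with 0 <= n <= order and -n <= m <= n."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]


def num_coefficients(order: int) -> int:
    """Number of spherical harmonic coefficients up to the given order."""
    return (order + 1) ** 2


# A fourth-order representation has one coefficient per (n, m) pair.
fourth_order = sh_indices(4)
```

Each order n contributes 2n + 1 sub-orders, and summing 2n + 1 for n = 0 through N gives (N + 1)^2.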

As noted above, the SHC may be derived from a microphone recording using a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.

To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(k r_s)\, Y_n^{m*}(\theta_s, \varphi_s)$$

where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (e.g., by using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) allows each PCM object and its corresponding location to be converted into the SHC $A_n^m(k)$. Further, it can be shown (since the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall sound field in the vicinity of the observation point. The remaining figures are described below in the context of object-based and SHC-based audio coding.
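The additivity property stated above means that a scene with several objects can be formed by summing per-object coefficient vectors element by element. The sketch below illustrates only that summation; the coefficient values are made up, and computing them from an object's position and source energy is not shown.

```python
from typing import List, Sequence


def sum_object_coefficients(objects: Sequence[Sequence[complex]]) -> List[complex]:
    """Add per-object coefficient vectors element-wise to get the scene's coefficients."""
    if not objects:
        return []
    length = len(objects[0])
    if any(len(vec) != length for vec in objects):
        raise ValueError("all coefficient vectors must have the same length")
    return [sum(column) for column in zip(*objects)]


# Two made-up first-order coefficient vectors (4 coefficients each, one frequency bin)
object_a = [1 + 0j, 0.5j, 0.25 + 0j, -0.5 + 0j]
object_b = [2 + 0j, -0.5j, 0.25 + 0j, 1.5 + 0j]
scene = sum_object_coefficients([object_a, object_b])
```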

FIG. 2 is a diagram illustrating a system 10 that may perform various aspects of the techniques described in this disclosure. As shown in the example of FIG. 2, the system 10 includes a content creator device 12 and a content consumer device 14. While described in the context of the content creator device 12 and the content consumer device 14, the techniques may be implemented in any context in which SHC (which may also be referred to as HOA coefficients) or any other hierarchical representation of a sound field is encoded to form a bitstream representing the audio data. Moreover, the content creator device 12 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, or a desktop computer, to provide a few examples. Likewise, the content consumer device 14 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, a set-top box, or a desktop computer, to provide a few examples.

The content creator device 12 may be operated by a movie studio or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some examples, the content creator device 12 may be operated by an individual user who would like to compress the HOA coefficients 11. Often, the content creator generates audio content in conjunction with video content. The content consumer device 14 may be operated by an individual. The content consumer device 14 may include an audio playback system 16, which may refer to any form of audio playback system capable of rendering SHC for playback as multi-channel audio content.

The content creator device 12 includes an audio editing system 18. The content creator device 12 obtains live recordings 7 in various formats (including directly as HOA coefficients) and audio objects 9, which the content creator device 12 may edit using the audio editing system 18. The content creator may, during the editing process, render HOA coefficients 11 from the audio objects 9, listening to the rendered speaker feeds in an attempt to identify various aspects of the sound field that require further editing. The content creator device 12 may then edit the HOA coefficients 11 (potentially indirectly, by manipulating different ones of the audio objects 9 from which the source HOA coefficients may be derived in the manner described above). The content creator device 12 may employ the audio editing system 18 to generate the HOA coefficients 11. The audio editing system 18 represents any system capable of editing audio data and outputting the audio data as one or more source spherical harmonic coefficients.

When the editing process is complete, the content creator device 12 may generate a bitstream 21 based on the HOA coefficients 11. That is, the content creator device 12 includes an audio encoding device 20, which represents a device configured to encode or otherwise compress the HOA coefficients 11 in accordance with various aspects of the techniques described in this disclosure to generate the bitstream 21. The audio encoding device 20 may generate the bitstream 21 for transmission, as one example, across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like. The bitstream 21 may represent an encoded version of the HOA coefficients 11 and may include a primary bitstream and another side bitstream, which may be referred to as side channel information.

Although described in more detail below, the audio encoding device 20 may be configured to encode the HOA coefficients 11 based on a vector-based synthesis or a direction-based synthesis. To determine whether to perform the vector-based decomposition method or the direction-based decomposition method, the audio encoding device 20 may determine, based at least in part on the HOA coefficients 11, whether the HOA coefficients 11 were generated via a natural recording of a sound field (e.g., live recording 7) or produced artificially (i.e., synthetically) from, as one example, audio objects 9, such as PCM objects. When the HOA coefficients 11 were generated from the audio objects 9, the audio encoding device 20 may encode the HOA coefficients 11 using the direction-based decomposition method. When the HOA coefficients 11 were captured live using, for example, an Eigenmike, the audio encoding device 20 may encode the HOA coefficients 11 based on the vector-based decomposition method. The above distinction represents one example of where the vector-based or direction-based decomposition method may be deployed. There may be other cases in which either or both methods are useful for natural recordings, artificially generated content, or a mixture of the two (hybrid content). Furthermore, it is also possible to use both methods simultaneously for coding a single time frame of HOA coefficients.
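The example policy in this paragraph (direction-based for synthetic content, vector-based for live captures) amounts to a simple dispatch. The sketch below shows only that dispatch, with hypothetical enum and function names; how the encoder actually classifies the HOA coefficients as natural or synthetic is not shown, and, as the paragraph notes, other policies (including using both methods within one frame) are possible.

```python
from enum import Enum


class ContentOrigin(Enum):
    """Hypothetical label for how the HOA coefficients were produced."""
    NATURAL_RECORDING = "natural"
    SYNTHETIC = "synthetic"


class Decomposition(Enum):
    """Hypothetical label for the decomposition method to apply."""
    VECTOR_BASED = "vector"
    DIRECTION_BASED = "direction"


def choose_decomposition(origin: ContentOrigin) -> Decomposition:
    """Apply the example policy: synthetic -> direction-based, natural -> vector-based."""
    if origin is ContentOrigin.SYNTHETIC:
        return Decomposition.DIRECTION_BASED
    return Decomposition.VECTOR_BASED


method_for_live_capture = choose_decomposition(ContentOrigin.NATURAL_RECORDING)
```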

Assuming, for purposes of illustration, that the audio encoding device 20 determines that the HOA coefficients 11 were captured live or otherwise represent live recordings (e.g., live recording 7), the audio encoding device 20 may be configured to encode the HOA coefficients 11 using a vector-based decomposition method involving application of a linear invertible transform (LIT). One example of a linear invertible transform is the "singular value decomposition" (or "SVD"). In this example, the audio encoding device 20 may apply SVD to the HOA coefficients 11 to determine a decomposed version of the HOA coefficients 11. The audio encoding device 20 may then analyze the decomposed version of the HOA coefficients 11 to identify various parameters that may facilitate reordering of the decomposed version of the HOA coefficients 11. The audio encoding device 20 may then reorder the decomposed version of the HOA coefficients 11 based on the identified parameters. As described in further detail below, this reordering may improve coding efficiency, given that the transform may reorder the HOA coefficients across frames of the HOA coefficients (where a frame may include M samples of the HOA coefficients 11 and M is, in some examples, set to 1024). After reordering the decomposed version of the HOA coefficients 11, the audio encoding device 20 may select those portions of the decomposed version of the HOA coefficients 11 that represent foreground (or, in other words, distinct, predominant, or salient) components of the sound field. The audio encoding device 20 may specify the decomposed version of the HOA coefficients 11 representing the foreground components as an audio object and associated directional information.
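For reference, the standard form of the singular value decomposition is shown below. The symbol $\mathbf{X}$ for one frame of HOA coefficients and its layout (M samples by $(N+1)^2$ coefficients for order N) are assumptions made for illustration, not notation taken from the text above.

```latex
\mathbf{X} = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^{T},
\qquad \mathbf{X} \in \mathbb{R}^{M \times (N+1)^2},
\qquad \mathbf{U}^{T}\mathbf{U} = \mathbf{I},
\qquad \mathbf{V}^{T}\mathbf{V} = \mathbf{I},
\qquad \mathbf{S} = \operatorname{diag}(\sigma_1, \sigma_2, \dots),
\quad \sigma_1 \ge \sigma_2 \ge \dots \ge 0
```

Under this assumed layout, each column of $\mathbf{U}$ scaled by its singular value is a length-M time signal, and each column of $\mathbf{V}$ is a vector over the $(N+1)^2$ coefficients. The columns of $\mathbf{V}$ are mutually orthogonal, which is one way to read the pairing of an audio object with associated directional information described above. Because $\mathbf{U}$ and $\mathbf{V}$ have orthonormal columns, multiplying the three factors back together recovers $\mathbf{X}$.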

The audio encoding device 20 may also perform a sound field analysis with respect to the HOA coefficients 11 in order, at least in part, to identify those of the HOA coefficients 11 that represent one or more background (or, in other words, ambient) components of the sound field. The audio encoding device 20 may perform energy compensation with respect to the background components given that, in some examples, the background components may include only a subset of any given sample of the HOA coefficients 11 (e.g., those of the HOA coefficients 11 corresponding to zeroth-order and first-order spherical basis functions, and not those corresponding to second-order or higher-order spherical basis functions). In other words, when order reduction is performed, the audio encoding device 20 may augment (e.g., add energy to or subtract energy from) the remaining background HOA coefficients of the HOA coefficients 11 to compensate for the change in overall energy that results from performing the order reduction.
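One simple way to picture energy compensation after order reduction is a single gain that restores the total energy of the frame. This is a minimal sketch under stated assumptions, not the compensation method of this disclosure: it assumes the coefficient channels are sorted by increasing order and applies one broadband gain to all retained channels. The function names and the sample values are made up.

```python
import math
from typing import List, Sequence


def energy(channels: Sequence[Sequence[float]]) -> float:
    """Total energy: sum of squared samples over all coefficient channels."""
    return sum(x * x for channel in channels for x in channel)


def reduce_order_with_compensation(
    channels: Sequence[Sequence[float]], keep_order: int
) -> List[List[float]]:
    """Keep channels up to keep_order and rescale them to the original energy.

    Assumes channels are sorted by increasing order, so the first
    (keep_order + 1) ** 2 channels are exactly those of order <= keep_order.
    """
    keep = (keep_order + 1) ** 2
    kept = [list(channel) for channel in channels[:keep]]
    energy_before = energy(channels)
    energy_after = energy(kept)
    if energy_after == 0.0:
        return kept  # nothing to scale
    gain = math.sqrt(energy_before / energy_after)
    return [[gain * x for x in channel] for channel in kept]


# Made-up second-order frame: 9 coefficient channels, 2 samples each
frame = [[1.0, -1.0]] * 4 + [[0.5, 0.5]] * 5
background = reduce_order_with_compensation(frame, keep_order=1)
```

Here the four retained zeroth-order and first-order channels are scaled up so that their total energy matches that of the original nine channels.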

The audio encoding device 20 may next perform a form of psychoacoustic encoding (such as MPEG Surround, MPEG-AAC, MPEG-USAC, or other known forms of psychoacoustic encoding) with respect to each of the HOA coefficients 11 representing the background components and each of the foreground audio objects. The audio encoding device 20 may perform a form of interpolation with respect to the foreground directional information and then perform order reduction with respect to the interpolated foreground directional information to generate order-reduced foreground directional information. In some examples, the audio encoding device 20 may further perform quantization with respect to the order-reduced foreground directional information, outputting coded foreground directional information. In some cases, the quantization may comprise scalar/entropy quantization. The audio encoding device 20 may then form the bitstream 21 to include the encoded background components, the encoded foreground audio objects, and the quantized directional information. The audio encoding device 20 may then transmit or otherwise output the bitstream 21 to the content consumer device 14.
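The scalar part of the quantization step can be illustrated with a uniform quantizer applied to each element independently. This is a minimal sketch, assuming values normalized to [-1, 1] and a fixed number of bits per element; both are assumptions for illustration, the sample values are made up, and the entropy coding stage is not shown.

```python
from typing import List, Sequence


def quantize(value: float, nbits: int) -> int:
    """Map a value in [-1, 1] to an integer index in [0, 2**nbits - 1]."""
    max_index = (1 << nbits) - 1
    clipped = max(-1.0, min(1.0, value))
    return round((clipped + 1.0) / 2.0 * max_index)


def dequantize(index: int, nbits: int) -> float:
    """Map an index back to its reconstruction point in [-1, 1]."""
    max_index = (1 << nbits) - 1
    return index / max_index * 2.0 - 1.0


def quantize_vector(vector: Sequence[float], nbits: int) -> List[int]:
    """Quantize every element of the vector independently."""
    return [quantize(v, nbits) for v in vector]


direction = [0.9, -0.3, 0.1, -1.0]  # made-up directional information
indices = quantize_vector(direction, nbits=8)
restored = [dequantize(i, 8) for i in indices]
```

With 8 bits per element the reconstruction points are spaced 2/255 apart, so each restored value lies within 1/255 of the original.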

Although shown in FIG. 2 as being transmitted directly to the content consumer device 14, the content creator device 12 may output the bitstream 21 to an intermediate device positioned between the content creator device 12 and the content consumer device 14. The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream. The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder. The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 (possibly together with a corresponding video data bitstream) to subscribers, such as the content consumer device 14, that request the bitstream 21.

Alternatively, the content creator device 12 may store the bitstream 21 to a storage medium, such as a compact disc, a digital versatile disc, a high-definition video disc, or other storage media, most of which can be read by a computer and may therefore be referred to as computer-readable storage media or non-transitory computer-readable storage media. In this context, the transmission channel may refer to the channels by which content stored to these media is transmitted (and may include retail stores and other store-based delivery mechanisms). In any event, the techniques of this disclosure should not be limited in this respect to the example of FIG. 2.

As further shown in the example of FIG. 2, the content consumer device 14 includes an audio playback system 16. The audio playback system 16 may represent any audio playback system capable of playing back multi-channel audio data. The audio playback system 16 may include a number of different renderers 22. The renderers 22 may each provide a different form of rendering, where the different forms of rendering may include one or more of the various ways of performing vector-base amplitude panning (VBAP) and/or one or more of the various ways of performing sound field synthesis. As used herein, "A and/or B" means "A or B," or both "A and B."

The audio playback system 16 may further include an audio decoding device 24. The audio decoding device 24 may represent a device configured to decode HOA coefficients 11' from the bitstream 21, where the HOA coefficients 11' may be similar to the HOA coefficients 11 but differ due to lossy operations (e.g., quantization) and/or transmission via the transmission channel. That is, the audio decoding device 24 may dequantize the foreground directional information specified in the bitstream 21, while also performing psychoacoustic decoding with respect to the foreground audio objects specified in the bitstream 21 and the encoded HOA coefficients representing the background components. The audio decoding device 24 may further perform interpolation with respect to the decoded foreground directional information and then determine the HOA coefficients representing the foreground components based on the decoded foreground audio objects and the interpolated foreground directional information. The audio decoding device 24 may then determine the HOA coefficients 11' based on the determined HOA coefficients representing the foreground components and the decoded HOA coefficients representing the background components.

The audio playback system 16 may, after decoding the bitstream 21, obtain the HOA coefficients 11' and render the HOA coefficients 11' to output loudspeaker feeds 25. The loudspeaker feeds 25 may drive one or more loudspeakers (which are not shown in the example of FIG. 2 for ease of illustration).

To select the appropriate renderer or, in some instances, to generate an appropriate renderer, the audio playback system 16 may obtain loudspeaker information 13 indicating the number of loudspeakers and/or the spatial geometry of the loudspeakers. In some instances, the audio playback system 16 may obtain the loudspeaker information 13 by using a reference microphone and driving the loudspeakers in such a way that the loudspeaker information 13 is determined dynamically. In other instances, or in conjunction with the dynamic determination of the loudspeaker information 13, the audio playback system 16 may prompt a user to interface with the audio playback system 16 and enter the loudspeaker information 13.

The audio playback system 16 may then select one of the audio renderers 22 based on the loudspeaker information 13. In some instances, when none of the audio renderers 22 is within some threshold similarity measure (in terms of loudspeaker geometry) of the geometry specified in the loudspeaker information 13, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13. In some instances, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13 without first attempting to select an existing one of the audio renderers 22.
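The select-or-generate decision described above can be sketched as follows. The similarity measure (mean angular difference between speakers), the 10-degree threshold, and the names `layout_distance` and `choose_renderer` are hypothetical; the text only requires some threshold similarity measure over the loudspeaker geometry. Layouts are assumed to be lists of (azimuth, elevation) pairs in degrees, with azimuth in [-180, 180).

```python
import math

def angular_gap(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def layout_distance(layout_a, layout_b):
    """Hypothetical measure: mean angular difference; infinite if speaker counts differ."""
    if len(layout_a) != len(layout_b):
        return math.inf
    total = 0.0
    for (az_a, el_a), (az_b, el_b) in zip(sorted(layout_a), sorted(layout_b)):
        total += angular_gap(az_a, az_b) + abs(el_a - el_b)
    return total / len(layout_a)

def choose_renderer(renderers, speaker_layout, threshold_deg=10.0):
    """Return ("selected", name) if an existing renderer is close enough, else ("generate", None).

    renderers maps a renderer name to the layout it was designed for.
    """
    if renderers:
        best = min(renderers, key=lambda n: layout_distance(renderers[n], speaker_layout))
        if layout_distance(renderers[best], speaker_layout) <= threshold_deg:
            return ("selected", best)
    return ("generate", None)
```

With a stereo renderer designed for speakers at ±30 degrees, a measured layout of (-28, 31) degrees would select it, whereas a layout at ±90 degrees would fall outside the threshold and trigger generation of a new renderer.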

FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device 20 shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure. The audio encoding device 20 includes a content analysis unit 26, a vector-based decomposition unit 27, and a direction-based decomposition unit 28. Although described briefly below, more information regarding the audio encoding device 20 and the various aspects of compressing or otherwise encoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

The content analysis unit 26 represents a unit configured to analyze the content of the HOA coefficients 11 to identify whether the HOA coefficients 11 represent content generated from a live recording or content generated from an audio object. The content analysis unit 26 may determine whether the HOA coefficients 11 were generated from a recording of an actual sound field or from an artificial audio object. In some instances, when the framed HOA coefficients 11 were generated from a recording, the content analysis unit 26 passes the HOA coefficients 11 to the vector-based decomposition unit 27. In some instances, when the framed HOA coefficients 11 were generated from a synthetic audio object, the content analysis unit 26 passes the HOA coefficients 11 to the direction-based synthesis unit 28. The direction-based synthesis unit 28 may represent a unit configured to perform a direction-based synthesis of the HOA coefficients 11 to generate a direction-based bitstream 21.

As shown in the example of FIG. 3, the vector-based decomposition unit 27 may include a linear invertible transform (LIT) unit 30, a parameter calculation unit 32, a reorder unit 34, a foreground selection unit 36, an energy compensation unit 38, a psychoacoustic audio coder unit 40, a bitstream generation unit 42, a sound field analysis unit 44, a coefficient reduction unit 46, a background (BG) selection unit 48, a spatio-temporal interpolation unit 50, and a quantization unit 52.

The linear invertible transform (LIT) unit 30 receives the HOA coefficients 11 in the form of HOA channels, each channel representing a block or frame of a coefficient associated with a given order and sub-order of the spherical basis functions (which may be denoted as HOA[k], where k may denote the current frame or block of samples). The matrix of HOA coefficients 11 may have dimensions D: $M \times (N+1)^2$.
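The $(N+1)^2$ column count follows from there being $2n+1$ sub-orders for each order $n$ from 0 to $N$. The short sketch below enumerates them and reports the resulting frame shape; the function names and the ordering of the (order, sub-order) pairs are assumptions for illustration only.

```python
def order_suborder_pairs(max_order):
    """List every (order n, sub-order m) pair with 0 <= n <= max_order and -n <= m <= n."""
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1)]

def num_hoa_coefficients(max_order):
    """Number of HOA channels for a given order: (N+1)^2."""
    return (max_order + 1) ** 2

def frame_shape(frame_length, max_order):
    """Dimensions M x (N+1)^2 of one frame of HOA coefficients."""
    return (frame_length, num_hoa_coefficients(max_order))
```

For a fourth-order sound field and a frame of 1024 samples, `frame_shape(1024, 4)` gives a 1024-by-25 matrix.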

That is, the LIT unit 30 may represent a unit configured to perform a form of analysis referred to as singular value decomposition. Although described with respect to SVD, the techniques described in this disclosure may be performed with respect to any similar transformation or decomposition that provides sets of linearly uncorrelated, energy-compacted output. Also, references to "sets" in this disclosure are generally intended to refer to non-zero sets (unless specifically stated to the contrary) and are not intended to refer to the classical mathematical definition of sets, which includes the so-called "empty set."

An alternative transformation may comprise a principal component analysis, often referred to as "PCA." PCA refers to a mathematical procedure that employs an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables referred to as principal components. Linearly uncorrelated variables are variables that have no linear statistical relationship (or dependence) to one another. The principal components may be described as having a small degree of statistical correlation to one another. In any event, the number of principal components is less than or equal to the number of original variables. In some examples, the transformation is defined such that the first principal component has the largest possible variance (or, in other words, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (which may be restated as uncorrelated with) the preceding components. PCA may perform a form of order reduction, which for the HOA coefficients 11 may result in their compression. Depending on the context, PCA may be referred to by a number of different names, such as the discrete Karhunen-Loeve transform, the Hotelling transform, proper orthogonal decomposition (POD), and eigenvalue decomposition (EVD), to name a few examples. The properties of such operations that serve the underlying goal of compressing audio data are "energy compaction" and "decorrelation" of the multi-channel audio data.

In any event, assuming for purposes of example that the LIT unit 30 performs a singular value decomposition (which, again, may be referred to as "SVD"), the LIT unit 30 may transform the HOA coefficients 11 into two or more sets of transformed HOA coefficients. The "sets" of transformed HOA coefficients may include vectors of transformed HOA coefficients. In the example of FIG. 3, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to generate a so-called V matrix, an S matrix, and a U matrix. In linear algebra, the SVD may represent a factorization of a y-by-z real or complex matrix X (where X may represent multi-channel audio data, such as the HOA coefficients 11) in the following form:

$X = U S V^{*}$

U may represent a y-by-y real or complex unitary matrix, where the y columns of U are known as the left-singular vectors of the multi-channel audio data. S may represent a y-by-z rectangular diagonal matrix with non-negative real numbers on the diagonal, where the diagonal values of S are known as the singular values of the multi-channel audio data. V* (which may denote the conjugate transpose of V) may represent a z-by-z real or complex unitary matrix, where the z columns of V* are known as the right-singular vectors of the multi-channel audio data.
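For reference, the dimensions and standard properties implied by this factorization can be written out as below. These are general linear-algebra facts rather than statements specific to this disclosure; the symbol $\sigma_i$ for the singular values is introduced here for illustration. Note that in the usual textbook convention the right-singular vectors are the columns of $V$, which appear as the rows of $V^{*}$.

```latex
X = U S V^{*}, \qquad
X \in \mathbb{C}^{y \times z},\;
U \in \mathbb{C}^{y \times y},\;
S \in \mathbb{R}^{y \times z},\;
V \in \mathbb{C}^{z \times z}

U^{*} U = I_{y}, \qquad V^{*} V = I_{z}, \qquad
S_{ii} = \sigma_{i} \ge 0, \qquad S_{ij} = 0 \ \text{for } i \ne j
```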

Although described in this disclosure as being applied to multi-channel audio data comprising the HOA coefficients 11, the techniques may be applied to any form of multi-channel audio data. In this way, the audio encoding device 20 may perform a singular value decomposition with respect to multi-channel audio data representing at least a portion of a sound field to generate a U matrix representing the left-singular vectors of the multi-channel audio data, an S matrix representing the singular values of the multi-channel audio data, and a V matrix representing the right-singular vectors of the multi-channel audio data, and may represent the multi-channel audio data as a function of at least a portion of one or more of the U matrix, the S matrix, and the V matrix.

In some examples, the V* matrix in the SVD expression above is written as the conjugate transpose of the V matrix to reflect that SVD may be applied to matrices comprising complex numbers. When applied to matrices comprising only real numbers, the conjugate transpose of the V matrix (or, in other words, the V* matrix) is simply the transpose of the V matrix. For ease of illustration below, it is assumed that the HOA coefficients 11 comprise real numbers, with the result that the V matrix, rather than the V* matrix, is output by the SVD. Moreover, although denoted as the V matrix in this disclosure, references to the V matrix should be understood to refer to the transpose of the V matrix where appropriate. Although the V matrix is assumed, the techniques may be applied in a similar fashion to HOA coefficients 11 having complex coefficients, in which case the output of the SVD is the V* matrix. Accordingly, the techniques should not be limited in this respect to applying SVD to generate a V matrix, but may include applying SVD to HOA coefficients 11 having complex components to generate a V* matrix.

In any event, the LIT unit 30 may perform a block-wise form of SVD with respect to each block (which may refer to a frame) of higher-order ambisonics (HOA) audio data (where the ambisonics audio data includes blocks or samples of the HOA coefficients 11 or any other form of multi-channel audio data). As noted above, the variable M may be used to denote the length of an audio frame in samples. For example, when an audio frame includes 1024 audio samples, M equals 1024. Although described with respect to this typical value of M, the techniques of this disclosure should not be limited to it. The LIT unit 30 may therefore perform a block-wise SVD with respect to a block of the HOA coefficients 11 having M-by-$(N+1)^2$ HOA coefficients, where N again denotes the order of the HOA audio data. Through performing this SVD, the LIT unit 30 may generate a V matrix, an S matrix, and a U matrix, each of which may represent the respective V, S, and U matrices described above. In this way, the LIT unit 30 may perform SVD with respect to the HOA coefficients 11 to output US[k] vectors 33 (which may represent a combined version of the S vectors and the U vectors) having dimensions D: $M \times (N+1)^2$, and V[k] vectors 35 having dimensions D: $(N+1)^2 \times (N+1)^2$. Individual vector elements in the US[k] matrix may also be termed $X_{PS}(k)$, while individual vectors of the V[k] matrix may also be termed $v(k)$.

An analysis of the U, S, and V matrices may reveal that the matrices carry or represent spatial and temporal characteristics of the underlying sound field denoted above by X. Each of the N vectors in U (of length M samples) may represent normalized, separated audio signals as a function of time (for the time period represented by the M samples) that are orthogonal to one another and that have been decoupled from any spatial characteristics (which may also be referred to as directional information). The spatial characteristics, representing spatial shape, position, and width, may instead be represented by the individual i-th vectors, $v^{(i)}(k)$, in the V matrix (each of length $(N+1)^2$). The individual elements of each of the $v^{(i)}(k)$ vectors may represent an HOA coefficient describing the shape and direction of the sound field for an associated audio object. The vectors in both the U matrix and the V matrix are normalized such that their root-mean-square energies are equal to unity. The energy of the audio signals in U is thus represented by the diagonal elements of S. Multiplying U and S to form US[k] (with individual vector elements $X_{PS}(k)$) therefore yields the audio signals with their true energies. The ability of the SVD to decouple the audio time signals (in U), their energies (in S), and their spatial characteristics (in V) may support various aspects of the techniques described in this disclosure. Further, the model of synthesizing the underlying HOA[k] coefficients, X, by a vector multiplication of US[k] and V[k] gives rise to the term "vector-based decomposition," which is used throughout this document.

Although described as being performed directly with respect to the HOA coefficients 11, the LIT unit 30 may apply the linear invertible transform to derivatives of the HOA coefficients 11, that is, to quantities derived from them. For example, the LIT unit 30 may apply SVD to a power spectral density matrix derived from the HOA coefficients 11. The power spectral density matrix may be denoted PSD and is obtained by matrix multiplication of the transpose of hoaFrame by hoaFrame, as outlined in the pseudo-code below. The hoaFrame notation refers to a frame of the HOA coefficients 11.

After applying the SVD (svd) to the PSD, the LIT unit 30 may obtain an $S[k]^2$ matrix (S_squared) and a V[k] matrix. The $S[k]^2$ matrix may denote the square of the S[k] matrix, so the LIT unit 30 may apply a square root operation to the $S[k]^2$ matrix to obtain the S[k] matrix. In some instances, the LIT unit 30 may quantize the V[k] matrix to obtain a quantized V[k] matrix (which may be denoted as the V[k]' matrix). The LIT unit 30 may obtain the U[k] matrix by first multiplying the S[k] matrix by the quantized V[k]' matrix to obtain an SV[k]' matrix. The LIT unit 30 may next obtain the pseudo-inverse (pinv) of the SV[k]' matrix and then multiply the HOA coefficients 11 by that pseudo-inverse to obtain the U[k] matrix. The foregoing may be represented by the following pseudo-code:

PSD = hoaFrame' * hoaFrame;           % F-by-F power spectral density matrix
[V, S_squared] = svd(PSD, 'econ');    % PSD is symmetric, so its left singular vectors equal V
S = sqrt(S_squared);                  % singular values of hoaFrame
U = hoaFrame * pinv(S * V');          % recover U from hoaFrame = U*S*V'

By performing SVD with respect to the power spectral density (PSD) of the HOA coefficients rather than the coefficients themselves, the LIT unit 30 may potentially reduce the computational complexity of the SVD in terms of one or more of processor cycles and storage space, while achieving the same source audio coding efficiency as if the SVD were applied directly to the HOA coefficients. That is, the PSD-type SVD described above may be less computationally demanding because the SVD is performed on an F*F matrix (where F is the number of HOA coefficients) rather than on an M*F matrix (where M is the frame length, i.e., 1024 or more samples). When applied to the PSD rather than the HOA coefficients 11, the complexity of the SVD may be around $O(L^3)$, compared to $O(M \cdot L^2)$ when applied to the HOA coefficients 11 (where $O(\cdot)$ denotes the big-O notation for computational complexity common in computer science, and L appears to denote the number of HOA coefficients, written F above).
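The reason the PSD-based pseudo-code above recovers the same V and S as a direct SVD can be seen from a short derivation, sketched here under the real-valued assumption made earlier and assuming an economy-size SVD in which $S$ is a square, invertible $F \times F$ matrix (that is, the frame has full column rank). The symbol $X$ stands for hoaFrame.

```latex
X = U S V^{T}
\;\Longrightarrow\;
X^{T} X = V S^{T} U^{T} U S V^{T} = V S^{2} V^{T}

U = X \left( S V^{T} \right)^{+} = X V S^{-1}
```

The first line shows that decomposing the $F \times F$ matrix $X^{T}X$ yields $V$ together with the squared singular values, which is why the pseudo-code takes a square root; the second line is the pseudo-inverse step that recovers $U$.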

In this respect, the LIT unit 30 may perform a decomposition with respect to, or otherwise decompose, the higher-order ambisonics audio data 11 to obtain vectors (e.g., the V-vectors described above) that represent orthogonal spatial axes in the spherical harmonics domain. The decomposition may include SVD, EVD, or any other form of decomposition.

The parameter calculation unit 32 represents a unit configured to calculate various parameters, such as a correlation parameter (R), directional properties parameters (θ, φ, r), and an energy property (e). Each of the parameters for the current frame may be denoted as R[k], θ[k], φ[k], r[k], and e[k]. The parameter calculation unit 32 may perform an energy analysis and/or correlation (or so-called cross-correlation) with respect to the US[k] vectors 33 to identify these parameters. The parameter calculation unit 32 may also determine the parameters for the previous frame, where the previous-frame parameters may be denoted as R[k-1], θ[k-1], φ[k-1], r[k-1], and e[k-1], based on the previous frame of US[k-1] vectors and V[k-1] vectors. The parameter calculation unit 32 may output the current parameters 37 and the previous parameters 39 to the reorder unit 34.

The SVD decomposition does not guarantee that the audio signal or object represented by the p-th vector in the US[k-1] vectors 33 (which may be denoted as the US[k-1][p] vector or, alternatively, as $X_{PS}^{(p)}(k-1)$) will be the same audio signal or object, progressed in time, as the one represented by the p-th vector in the US[k] vectors 33 (which may be denoted as the US[k][p] vector 33 or, alternatively, as $X_{PS}^{(p)}(k)$). The parameters calculated by the parameter calculation unit 32 may be used by the reorder unit 34 to reorder the audio objects so that they reflect their natural evolution, or continuity, over time.

That is, the reorder unit 34 may compare, turn-wise, each of the parameters 37 from the first US[k] vectors 33 against each of the parameters 39 for the second US[k-1] vectors 33. The reorder unit 34 may reorder (using, as one example, a Hungarian algorithm) the various vectors within the US[k] matrix 33 and the V[k] matrix 35 based on the current parameters 37 and the previous parameters 39, and output a reordered US[k] matrix 33' and a reordered V[k] matrix 35' to a foreground sound (or predominant sound, PS) selection unit 36 ("foreground selection unit 36") and an energy compensation unit 38.
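The matching step can be illustrated with a small sketch. For brevity it finds the best assignment by brute force over all permutations rather than with the Hungarian algorithm named above (both return an optimal one-to-one assignment, but brute force is practical only for a handful of vectors), and it uses a zero-lag normalized correlation as the only matching parameter. Both choices, and the function names, are assumptions made for this example.

```python
from itertools import permutations

def correlation(a, b):
    """Absolute normalized correlation of two equal-length vectors at lag zero."""
    num = abs(sum(x * y for x, y in zip(a, b)))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

def reorder(prev_vectors, curr_vectors):
    """Return perm such that curr_vectors[perm[p]] best matches prev_vectors[p]."""
    n = len(prev_vectors)
    score = [[correlation(p, c) for c in curr_vectors] for p in prev_vectors]
    best = max(permutations(range(n)),
               key=lambda perm: sum(score[p][perm[p]] for p in range(n)))
    return list(best)

def apply_order(vectors, perm):
    """Reorder a list of vectors according to perm."""
    return [vectors[i] for i in perm]
```

The same permutation would be applied to both the US[k] vectors and the V[k] vectors so that each audio signal stays paired with its spatial vector.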

The sound field analysis unit 44 may represent a unit configured to perform a sound field analysis with respect to the HOA coefficients 11 so as to potentially achieve a target bitrate 41. Based on the analysis and/or on the received target bitrate 41, the sound field analysis unit 44 may determine the total number of psychoacoustic coder instantiations (which may be a function of the total number of ambient or background channels ($BG_{TOT}$)) and the number of foreground channels (or, in other words, predominant channels). The total number of psychoacoustic coder instantiations may be denoted as numHOATransportChannels.

Again to potentially achieve the target bitrate 41, the sound field analysis unit 44 may also determine the total number of foreground channels (nFG) 45, the minimum order of the background (or, in other words, ambient) sound field ($N_{BG}$ or, alternatively, MinAmbHoaOrder), the corresponding number of actual channels representing the minimum order of the background sound field ($nBGa = (\text{MinAmbHoaOrder}+1)^2$), and the indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of FIG. 3). The background channel information 43 may also be referred to as ambient channel information 43. Each of the channels that remain from numHOATransportChannels - nBGa may be an "additional background/ambient channel," an "active vector-based predominant channel," an "active direction-based predominant signal," or "completely inactive." In one aspect, the channel type may be indicated by a two-bit "ChannelType" syntax element (e.g., 00: direction-based signal; 01: vector-based predominant signal; 10: additional ambient signal; 11: inactive signal). The total number of background or ambient signals, nBGa, may be given by $(\text{MinAmbHOAorder}+1)^2$ plus the number of times the index 10 (in the above example) appears as a channel type in the bitstream for that frame.

In any event, the sound field analysis unit 44 may select the number of background (or, in other words, ambient) channels and the number of foreground (or, in other words, predominant) channels based on the target bitrate 41, selecting more background and/or foreground channels when the target bitrate 41 is relatively high (e.g., when the target bitrate 41 equals or exceeds 512 Kbps). In one aspect, numHOATransportChannels may be set to 8 and MinAmbHOAorder may be set to 1 in the header section of the bitstream. In this scenario, at every frame, four channels may be dedicated to representing the background or ambient portion of the sound field, while the other four channels may vary in channel type on a frame-by-frame basis, being used, for example, as either additional background/ambient channels or foreground/predominant channels. The foreground/predominant signals may be either vector-based or direction-based signals, as described above.
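The per-frame channel bookkeeping described above can be sketched as follows, using the two-bit ChannelType values from the example mapping. The function name, the dictionary keys, and the representation of a frame as a list of ChannelType values for the flexible channels are assumptions for illustration; they do not reflect the actual bitstream syntax.

```python
# Two-bit ChannelType values, following the example mapping in the text.
DIRECTION_BASED = 0b00
VECTOR_BASED = 0b01
ADDITIONAL_AMBIENT = 0b10
INACTIVE = 0b11

def count_channels(min_amb_hoa_order, channel_types):
    """Tally the channels of one frame.

    channel_types holds one ChannelType value per flexible transport channel,
    i.e. those beyond the (min_amb_hoa_order + 1)^2 always-ambient channels.
    """
    fixed_ambient = (min_amb_hoa_order + 1) ** 2
    return {
        "nBGa": fixed_ambient + channel_types.count(ADDITIONAL_AMBIENT),
        "vector_based": channel_types.count(VECTOR_BASED),
        "direction_based": channel_types.count(DIRECTION_BASED),
        "inactive": channel_types.count(INACTIVE),
    }
```

With numHOATransportChannels set to 8 and MinAmbHOAorder set to 1, four channels are flexible; a frame whose flexible channels are typed 01, 01, 10, and 11 would then have nBGa equal to 5 and two vector-based predominant signals.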

In some instances, the total number of vector-based predominant signals for a frame may be given by the number of times the ChannelType index is 01 in the bitstream of that frame. In the above aspect, for every additional background/ambient channel (e.g., corresponding to a ChannelType of 10), corresponding information indicating which of the possible HOA coefficients (beyond the first four) is represented in that channel may be provided. For fourth-order HOA content, this information may be an index indicating one of HOA coefficients 5 to 25. The first four ambient HOA coefficients, 1 to 4, may be sent at all times when minAmbHOAorder is set to 1; hence the audio encoding device may only need to indicate which one of the additional ambient HOA coefficients, having an index of 5 to 25, is being sent. The information can therefore be sent using a 5-bit syntax element (for fourth-order content), which may be denoted as "CodedAmbCoeffIdx."
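The 5-bit width can be checked with a little arithmetic: fourth-order content has 25 coefficients, four of which are always sent, leaving 21 candidates, and 21 distinct values need 5 bits. The helper below generalizes this under the assumption that the field is sized to the number of remaining candidates; the text itself only states the 5-bit figure for fourth-order content, and the function name is hypothetical.

```python
def coded_amb_coeff_idx_bits(hoa_order, min_amb_hoa_order):
    """Bits needed to index the ambient coefficients not covered by min_amb_hoa_order.

    Assumes the field is just wide enough to distinguish the remaining candidates.
    """
    total = (hoa_order + 1) ** 2
    always_sent = (min_amb_hoa_order + 1) ** 2
    candidates = total - always_sent
    # (n - 1).bit_length() is the number of bits needed to tell n values apart.
    return max(candidates - 1, 0).bit_length()
```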

To illustrate, assume that minAmbHOAorder is set to 1 and that an additional ambient HOA coefficient with an index of 6 is sent via the bitstream 21 (as one example). In this example, a minAmbHOAorder of 1 indicates that the ambient HOA coefficients have indices 1, 2, 3, and 4. The audio encoding device 20 may select these ambient HOA coefficients because their indices are less than or equal to $(\text{minAmbHOAorder}+1)^2$, which is 4 in this example. The audio encoding device 20 may specify the ambient HOA coefficients associated with indices 1, 2, 3, and 4 in the bitstream 21. The audio encoding device 20 may also specify the additional ambient HOA coefficient with index 6 in the bitstream as an additionalAmbientHOAchannel with a ChannelType of 10, and may specify the index using the CodedAmbCoeffIdx syntax element. In principle, the CodedAmbCoeffIdx element could specify any index from 1 to 25. However, because minAmbHOAorder is set to 1, the audio encoding device 20 may not specify any of the first four indices (as the first four indices are known to be specified in the bitstream 21 via the minAmbHOAorder syntax element). In any case, because the audio encoding device 20 specifies the five ambient HOA coefficients via minAmbHOAorder (for the first four) and CodedAmbCoeffIdx (for the additional ambient HOA coefficient), the audio encoding device 20 may not specify the corresponding V-vector elements associated with the ambient HOA coefficients having indices 1, 2, 3, 4, and 6. As a result, the audio encoding device 20 may specify the V-vector with elements [5, 7:25].

In a second aspect, all of the foreground/dominant signals are vector-based signals. In this second aspect, the total number of foreground/dominant signals may be given by nFG = numHOATransportChannels - [(MinAmbHoaOrder + 1)^2 + the number of additionalAmbientHOAchannel entries].
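As a worked instance of this count, the following sketch evaluates nFG for the header values used earlier (eight transport channels, MinAmbHoaOrder of 1). The function and parameter names are hypothetical, and the last term is read here as the number of additional ambient channels in the frame.

```python
def num_foreground_channels(num_hoa_transport_channels: int,
                            min_amb_hoa_order: int,
                            num_additional_ambient: int) -> int:
    """nFG = transport channels minus all channels used for ambience."""
    base_ambient = (min_amb_hoa_order + 1) ** 2
    n_fg = num_hoa_transport_channels - (base_ambient + num_additional_ambient)
    if n_fg < 0:
        raise ValueError("more ambient channels than transport channels")
    return n_fg


# Eight transport channels, four fixed ambient channels, one additional
# ambient channel: three channels remain for vector-based foreground signals.
n_fg = num_foreground_channels(8, 1, 1)
```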

The sound field analysis unit 44 may output the background channel information 43 and the HOA coefficients 11 to the background (BG) selection unit 48, the background channel information 43 to the coefficient reduction unit 46 and the bitstream generation unit 42, and nFG 45 to the foreground selection unit 36.

The background selection unit 48 may represent a unit configured to determine background or ambient HOA coefficients 47 based on the background channel information (e.g., the order of the background sound field ($N_{BG}$) and the number (nBGa) and indices (i) of additional BG HOA channels to send). For example, when $N_{BG}$ equals one, the background selection unit 48 may select the HOA coefficients 11 for each sample of the audio frame having an order equal to or less than one. In this example, the background selection unit 48 may then select the HOA coefficients 11 having an index identified by one of the indices (i) as additional BG HOA coefficients, where nBGa is provided to the bitstream generation unit 42 to be specified in the bitstream 21 so that an audio decoding device (e.g., the audio decoding device 24 shown in the examples of Figures 2 and 4) can parse the background HOA coefficients 47 from the bitstream 21. The background selection unit 48 may then output the ambient HOA coefficients 47 to the energy compensation unit 38. The ambient HOA coefficients 47 may have dimensions D: $M \times [(N_{BG}+1)^2 + \text{nBGa}]$. The ambient HOA coefficients 47 may also be referred to as "ambient HOA channels 47," where each of the ambient HOA coefficients 47 corresponds to a separate ambient HOA channel 47 to be encoded by the psychoacoustic audio coder unit 40.
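The selection step can be sketched as a column pick over one frame of HOA coefficients. This is a minimal illustration, assuming the frame is stored as M rows (samples) of $(N+1)^2$ coefficients with 1-based coefficient indices ordered so that the first $(N_{BG}+1)^2$ indices cover all orders up to $N_{BG}$; the function name and data layout are hypothetical.

```python
def select_ambient_coefficients(frame, n_bg, additional_indices):
    """Return an M x [(n_bg + 1)**2 + nBGa] list of ambient coefficients.

    frame: list of M samples, each a list of (N + 1)**2 HOA coefficients.
    additional_indices: 1-based indices (i) of the extra BG HOA channels.
    """
    base = list(range(1, (n_bg + 1) ** 2 + 1))
    extra = [i for i in additional_indices if i not in base]
    columns = base + extra
    return [[sample[i - 1] for i in columns] for sample in frame]


# Three samples of fourth-order content; each value equals its own index
# so the selected columns are easy to read off.
frame = [[float(c) for c in range(1, 26)] for _ in range(3)]
ambient = select_ambient_coefficients(frame, n_bg=1, additional_indices=[6])
```

With $N_{BG}$ equal to one and a single additional channel at index 6, every output row holds five coefficients (indices 1, 2, 3, 4, and 6), matching the stated dimension.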

The foreground selection unit 36 may represent a unit configured to select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), those portions of the reordered US[k] matrix 33' and the reordered V[k] matrix 35' that represent foreground or distinct components of the sound field. The foreground selection unit 36 may output nFG signals 49 (which may be denoted as the reordered $US[k]_{1,\ldots,nFG}$ 49 or $FG_{1,\ldots,nFG}[k]$ 49) to the psychoacoustic audio coder unit 40, where the nFG signals 49 may have dimensions D: $M \times nFG$ and each represent a mono audio object. The foreground selection unit 36 may also output the reordered V[k] matrix 35' (or $v^{(1..nFG)}(k)$ 35') corresponding to the foreground components of the sound field to the spatio-temporal interpolation unit 50, where the subset of the reordered V[k] matrix 35' corresponding to the foreground components may be denoted as the foreground V[k] matrix $51_k$, having dimensions D: $(N+1)^2 \times nFG$.

The energy compensation unit 38 may represent a unit configured to perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for energy loss caused by the removal of various ones of the HOA channels by the background selection unit 48. The energy compensation unit 38 may perform an energy analysis with respect to one or more of the reordered US[k] matrix 33', the reordered V[k] matrix 35', the nFG signals 49, the foreground V[k] vectors $51_k$, and the ambient HOA coefficients 47, and then perform energy compensation based on the energy analysis to generate energy-compensated ambient HOA coefficients 47'. The energy compensation unit 38 may output the energy-compensated ambient HOA coefficients 47' to the psychoacoustic audio coder unit 40.

The spatio-temporal interpolation unit 50 may represent a unit configured to receive the foreground V[k] vectors $51_k$ for the k-th frame and the foreground V[k-1] vectors $51_{k-1}$ for the previous frame (hence the k-1 notation) and to perform spatio-temporal interpolation to generate interpolated foreground V[k] vectors. The spatio-temporal interpolation unit 50 may recombine the nFG signals 49 with the foreground V[k] vectors $51_k$ to recover reordered foreground HOA coefficients. The spatio-temporal interpolation unit 50 may then divide the reordered foreground HOA coefficients by the interpolated V[k] vectors to generate interpolated nFG signals 49'. The spatio-temporal interpolation unit 50 may also output those foreground V[k] vectors $51_k$ that were used to generate the interpolated foreground V[k] vectors, so that an audio decoding device (e.g., the audio decoding device 24) can generate the interpolated foreground V[k] vectors and thereby recover the foreground V[k] vectors $51_k$. The foreground V[k] vectors $51_k$ used to generate the interpolated foreground V[k] vectors are denoted as the remaining foreground V[k] vectors 53. To ensure that the same V[k] and V[k-1] are used at the encoder and the decoder (to create the interpolated vectors V[k]), quantized/dequantized versions of the vectors may be used at both the encoder and the decoder.

In operation, the spatio-temporal interpolation unit 50 may interpolate one or more subframes of a first audio frame from a first decomposition (e.g., the foreground V[k] vectors $51_k$) of a portion of a first plurality of HOA coefficients 11 included in the first frame and a second decomposition (e.g., the foreground V[k-1] vectors $51_{k-1}$) of a portion of a second plurality of HOA coefficients 11 included in a second frame, to generate decomposed interpolated spherical harmonic coefficients for the one or more subframes.

In some examples, the first decomposition comprises the first foreground V[k] vectors $51_k$ representing right-singular vectors of the portion of the HOA coefficients 11. Likewise, in some examples, the second decomposition comprises the second foreground V[k-1] vectors $51_{k-1}$ representing right-singular vectors of the portion of the HOA coefficients 11.

In other words, spherical-harmonics-based 3D audio may be a parametric representation of the 3D pressure field in terms of orthogonal basis functions on a sphere. The higher the order N of the representation, the potentially higher the spatial resolution, and often the larger the number of spherical harmonic (SH) coefficients (a total of $(N+1)^2$ coefficients). For many applications, bandwidth compression of the coefficients may be required so that they can be transmitted and stored efficiently. The techniques described in this disclosure may provide a frame-based dimensionality reduction process using singular value decomposition (SVD). The SVD analysis may decompose each frame of coefficients into three matrices U, S, and V. In some examples, the techniques may handle some of the vectors in the US[k] matrix as foreground components of the underlying sound field. However, when handled in this manner, these vectors (in the US[k] matrix) are discontinuous from frame to frame, even though they represent the same distinct audio component. These discontinuities may lead to significant artifacts when the components are fed through a transform audio coder.

In some aspects, the spatio-temporal interpolation may rely on the observation that the V matrix can be interpreted as orthogonal spatial axes in the spherical harmonics domain. The U[k] matrix may represent a projection of the spherical harmonics (HOA) data in terms of those basis functions, where the discontinuity can be attributed to the orthogonal spatial axes (V[k]), which change every frame and are therefore themselves discontinuous. This is unlike some other decompositions, such as the Fourier transform, where the basis functions are, in some examples, constant from frame to frame. In these terms, the SVD may be considered a matching pursuit algorithm. The spatio-temporal interpolation unit 50 may perform the interpolation to potentially maintain continuity of the basis functions (V[k]) from frame to frame by interpolating between them.

As noted above, the interpolation may be performed with respect to samples. This case is generalized in the above description when a subframe comprises a single set of samples. In both the case of interpolation over samples and the case of interpolation over subframes, the interpolation operation may take the form of the following equation:

$$\overline{v(l)} = w(l)\,v(k) + \bigl(1 - w(l)\bigr)\,v(k-1)$$

In the above equation, the interpolation may be performed from the single V-vector $v(k-1)$ to the single V-vector $v(k)$, which in one aspect may represent V-vectors from the adjacent frames $k-1$ and $k$. In the above equation, $l$ represents the resolution over which the interpolation is carried out, where $l$ may indicate an integer sample and $l = 1, \ldots, T$ (where $T$ is the number of samples over which the interpolation is carried out and over which the interpolated output vectors $\overline{v(l)}$ are required, so that the process produces $T$ such vectors). Alternatively, $l$ may indicate a subframe consisting of multiple samples. When, for example, a frame is divided into four subframes, $l$ may take the values 1, 2, 3, and 4, one for each of the subframes. The value of $l$ may be signaled via the bitstream as a field termed "CodedSpatialInterpolationTime" so that the interpolation operation can be replicated in the decoder. $w(l)$ may comprise the values of the interpolation weights. When the interpolation is linear, $w(l)$ may vary linearly and monotonically between 0 and 1 as a function of $l$. In other cases, $w(l)$ may vary between 0 and 1 in a non-linear but monotonic fashion (such as a quarter cycle of a raised cosine) as a function of $l$. The function $w(l)$ may be indexed among a few different candidate functions and signaled in the bitstream as a field termed "SpatialInterpolationMethod" so that the identical interpolation operation can be replicated by the decoder. When $w(l)$ has a value close to 0, the output $\overline{v(l)}$ may be heavily weighted toward, or influenced by, $v(k-1)$. When $w(l)$ has a value close to 1, the output $\overline{v(l)}$ is instead heavily weighted toward, or influenced by, $v(k)$.
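The weighting behavior just described can be sketched as follows, assuming the interpolation is the convex combination w(l)·v(k) + (1 − w(l))·v(k−1), which is the form consistent with the 0-to-1 weighting above. The two weight functions are illustrative only: the exact raised-cosine shape used by a real codec, and how SpatialInterpolationMethod indexes it, are assumptions here.

```python
import math


def linear_weight(l: int, T: int) -> float:
    """Linear ramp that reaches 1 at l = T."""
    return l / T


def raised_cosine_weight(l: int, T: int) -> float:
    """Monotonic raised-cosine ramp from 0 toward 1 (assumed shape)."""
    return 0.5 * (1.0 - math.cos(math.pi * l / T))


def interpolate_v(v_prev, v_curr, T, weight=linear_weight):
    """Return T interpolated vectors between v(k-1) and v(k)."""
    out = []
    for l in range(1, T + 1):
        w = weight(l, T)
        out.append([w * c + (1.0 - w) * p for p, c in zip(v_prev, v_curr)])
    return out


# Four subframes (l = 1..4) between two toy two-element V-vectors.
steps = interpolate_v([0.0, 2.0], [4.0, 0.0], T=4)
smooth = interpolate_v([0.0, 2.0], [4.0, 0.0], T=4, weight=raised_cosine_weight)
```

Both ramps end exactly on v(k) at l = T; they differ only in how quickly the output moves away from v(k−1) in the early subframes.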

The coefficient reduction unit 46 may represent a unit configured to perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43, in order to output reduced foreground V[k] vectors 55 to the quantization unit 52. The reduced foreground V[k] vectors 55 may have dimensions D: $[(N+1)^2 - (N_{BG}+1)^2 - BG_{TOT}] \times nFG$.

In this respect, the coefficient reduction unit 46 may represent a unit configured to reduce the number of coefficients in the remaining foreground V[k] vectors 53. In other words, the coefficient reduction unit 46 may represent a unit configured to eliminate those coefficients of the foreground V[k] vectors (which form the remaining foreground V[k] vectors 53) that carry little to no directional information. As described above, in some examples, the coefficients of the distinct or, in other words, foreground V[k] vectors that correspond to the first-order and zero-order basis functions (which may be denoted $N_{BG}$) provide little directional information and can therefore be removed from the foreground V-vectors (through a process that may be referred to as "coefficient reduction"). In this example, greater flexibility may be provided not only to identify the coefficients that correspond to $N_{BG}$ but also to identify additional HOA channels (which may be denoted by the variable TotalOfAddAmbHOAChan) from the set $[(N_{BG}+1)^2+1, (N+1)^2]$. The sound field analysis unit 44 may analyze the HOA coefficients 11 to determine $BG_{TOT}$, which may identify not only $(N_{BG}+1)^2$ but also TotalOfAddAmbHOAChan, the two of which may collectively be referred to as the background channel information 43. The coefficient reduction unit 46 may then remove the coefficients corresponding to $(N_{BG}+1)^2$ and TotalOfAddAmbHOAChan from the remaining foreground V[k] vectors 53 to generate a lower-dimensional V[k] matrix 55 of size $((N+1)^2 - BG_{TOT}) \times nFG$, which may also be referred to as the reduced foreground V[k] vectors 55.

In other words, as noted in publication no. WO 2014/194099, the coefficient reduction unit 46 may generate syntax elements for the side channel information 57. For example, the coefficient reduction unit 46 may specify, in a header of an access unit (which may include one or more frames), a syntax element denoting which of a plurality of configuration modes was selected. Although described as being specified on a per-access-unit basis, the coefficient reduction unit 46 may specify the syntax element on a per-frame basis or on any other periodic or non-periodic basis (e.g., once for the entire bitstream). In any case, the syntax element may comprise two bits indicating which of three configuration modes was selected for specifying the set of non-zero coefficients of the reduced foreground V[k] vectors 55 that represent the directional aspects of the distinct components. The syntax element may be denoted "CodedVVecLength." In this manner, the coefficient reduction unit 46 may signal or otherwise specify in the bitstream which of the three configuration modes was used to specify the reduced foreground V[k] vectors 55 in the bitstream 21.

For example, three configuration modes may be presented in the syntax table for VVecData (referenced later in this document). In that example, the configuration modes are as follows: (Mode 0) the complete V-vector length is transmitted in the VVecData field; (Mode 1) the elements of the V-vector associated with the minimum number of coefficients for the ambient HOA coefficients, and all elements of the V-vector that include additional HOA channels, are not transmitted; and (Mode 2) the elements of the V-vector associated with the minimum number of coefficients for the ambient HOA coefficients are not transmitted. The syntax table of VVecData illustrates the modes in conjunction with switch and case statements. Although described with respect to three configuration modes, the techniques should not be limited to three configuration modes and may include any number of configuration modes, including a single configuration mode or a plurality of modes. Publication no. WO 2014/194099 provides a different example with four modes. The coefficient reduction unit 46 may also specify a flag 63 as another syntax element in the side channel information 57.
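To make the three modes concrete, the sketch below lists which 1-based V-vector elements would be transmitted under each mode. It is an illustration of the prose description only, not the normative VVecData syntax table; the function name and arguments are hypothetical.

```python
def vvec_elements_for_mode(mode, hoa_order, min_amb_hoa_order,
                           additional_ambient_indices=()):
    """List the 1-based V-vector elements sent for a CodedVVecLength mode."""
    total = (hoa_order + 1) ** 2
    min_amb = (min_amb_hoa_order + 1) ** 2
    if mode == 0:
        # Mode 0: the complete V-vector is sent.
        return list(range(1, total + 1))
    if mode == 1:
        # Mode 1: skip the minimum ambient set and the additional channels.
        skipped = set(additional_ambient_indices)
        return [i for i in range(min_amb + 1, total + 1) if i not in skipped]
    if mode == 2:
        # Mode 2: skip only the minimum ambient set.
        return list(range(min_amb + 1, total + 1))
    raise ValueError("unknown configuration mode")


# Fourth-order content, minAmbHOAorder = 1, additional ambient index 6.
full = vvec_elements_for_mode(0, 4, 1, [6])
mode1 = vvec_elements_for_mode(1, 4, 1, [6])
mode2 = vvec_elements_for_mode(2, 4, 1, [6])
```

Under these inputs Mode 1 yields elements 5 and 7 through 25, which is the [5, 7:25] selection from the earlier example, while Mode 2 keeps element 6 as well.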

The quantization unit 52 may represent a unit configured to perform any form of quantization to compress the reduced foreground V[k] vectors 55, generating coded foreground V[k] vectors 57 and outputting the coded foreground V[k] vectors 57 to the bitstream generation unit 42. In operation, the quantization unit 52 may represent a unit configured to compress a spatial component of the sound field, i.e., in this example, one or more of the reduced foreground V[k] vectors 55. For purposes of example, the reduced foreground V[k] vectors 55 are assumed to include two row vectors, each having fewer than 25 elements as a result of the coefficient reduction (which implies a fourth-order HOA representation of the sound field). Although described with respect to two row vectors, any number of vectors may be included in the reduced foreground V[k] vectors 55, up to $(n+1)^2$, where $n$ denotes the order of the HOA representation of the sound field. Moreover, although described below as performing scalar and/or entropy quantization, the quantization unit 52 may perform any form of quantization that results in compression of the reduced foreground V[k] vectors 55.

The quantization unit 52 may receive the reduced foreground V[k] vectors 55 and perform a compression scheme to generate the coded foreground V[k] vectors 57. The compression scheme may generally involve any conceivable compression scheme for compressing elements of a vector or data, and should not be limited to the example described below in more detail. As one example, the quantization unit 52 may perform a compression scheme that includes one or more of: transforming floating-point representations of each element of the reduced foreground V[k] vectors 55 into integer representations of each element of the reduced foreground V[k] vectors 55, uniform quantization of the integer representations of the reduced foreground V[k] vectors 55, and categorization and coding of the quantized integer representations of the reduced foreground V[k] vectors 55.
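The first two steps of such a scheme (float-to-integer conversion and uniform quantization) might look like the sketch below. It assumes V-vector elements lie roughly in [-1, 1) and that the quantizer width is given by a parameter called `nbits`; both are assumptions made for illustration, and the categorization and coding step is omitted.

```python
def uniform_quantize(vector, nbits):
    """Map floats in about [-1, 1) to signed integers of width nbits."""
    scale = 2 ** (nbits - 1)
    lo, hi = -scale, scale - 1
    # round() uses round-half-to-even; out-of-range values are clamped.
    return [max(lo, min(hi, round(x * scale))) for x in vector]


def dequantize(indices, nbits):
    """Inverse mapping back to floats (lossy)."""
    scale = 2 ** (nbits - 1)
    return [q / scale for q in indices]


q = uniform_quantize([0.5, -1.0, 0.999, 0.0], nbits=8)
approx = dequantize(q, nbits=8)
```

A larger `nbits` lowers the quantization error at the cost of more bits per element, which is one of the parameters an encoder could adjust to approach a target bit rate.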

In some examples, several of the one or more processes of the compression scheme may be dynamically controlled by parameters to achieve, or nearly achieve (as one example), the target bit rate 41 for the resulting bitstream 21. Given that the reduced foreground V[k] vectors 55 are orthogonal to one another, each of the reduced foreground V[k] vectors 55 may be coded independently. In some examples, as described in more detail below, each element of each reduced foreground V[k] vector 55 may be coded using the same coding mode (defined by various sub-modes).

As described in publication no. WO 2014/194099, the quantization unit 52 may perform scalar quantization and/or Huffman encoding to compress the reduced foreground V[k] vectors 55, outputting the coded foreground V[k] vectors 57 (which may also be referred to as side channel information 57). The side channel information 57 may include the syntax elements used to code the reduced foreground V[k] vectors 55.

Moreover, although described with respect to a form of scalar quantization, the quantization unit 52 may perform vector quantization or any other form of quantization. In some cases, the quantization unit 52 may switch between vector quantization and scalar quantization. During the scalar quantization described above, the quantization unit 52 may compute the difference between two successive V-vectors (successive as in frame to frame) and code the difference (or, in other words, the residual). This scalar quantization may represent a form of predictive coding based on a previously specified vector and a difference signal. Vector quantization does not involve such difference coding.
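The difference coding described here can be sketched on already-quantized integer vectors. The helper names are hypothetical, and whether a real encoder takes the difference before or after quantization is not stated in this passage, so the ordering below is an assumption.

```python
def encode_residual(prev_q, curr_q):
    """Residual between two successive quantized V-vectors."""
    return [c - p for p, c in zip(prev_q, curr_q)]


def decode_residual(prev_q, residual):
    """Rebuild the current quantized V-vector from the previous one."""
    return [p + r for p, r in zip(prev_q, residual)]


prev_q = [64, -20, 7, 0]   # quantized V-vector of frame k-1
curr_q = [66, -19, 7, -1]  # quantized V-vector of frame k
residual = encode_residual(prev_q, curr_q)
rebuilt = decode_residual(prev_q, residual)
```

When a V-vector changes slowly from frame to frame, the residual values cluster near zero, which is the situation in which an entropy code such as a Huffman code tends to be compact.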

In other words, the quantization unit 52 may receive an input V-vector (e.g., one of the reduced foreground V[k] vectors 55) and perform different types of quantization in order to select which of the quantization types is to be used for the input V-vector. As one example, the quantization unit 52 may perform vector quantization, scalar quantization without Huffman coding, and scalar quantization with Huffman coding.

In this example, the quantization unit 52 may vector-quantize the input V-vector according to a vector quantization mode to generate a vector-quantized V-vector. The vector-quantized V-vector may include vector-quantized weight values that represent the input V-vector. In some examples, the vector-quantized weight values may be represented as one or more quantization indices that point to a quantization codeword (i.e., a quantization vector) in a quantization codebook of quantization codewords. When configured to perform vector quantization, the quantization unit 52 may decompose each of the reduced foreground V[k] vectors 55 into a weighted sum of code vectors based on code vectors 63 ("CV 63"). The quantization unit 52 may generate a weight value for each of the selected ones of the code vectors 63.

The quantization unit 52 may next select a subset of the weight values to generate a selected subset of weight values. For example, the quantization unit 52 may select the Z greatest-magnitude weight values from the set of weight values to generate the selected subset of weight values. In some examples, the quantization unit 52 may further reorder the selected weight values to generate the selected subset of weight values. For example, the quantization unit 52 may reorder the selected weight values by magnitude, starting with the highest-magnitude weight value and ending with the lowest-magnitude weight value.

When performing the vector quantization, the quantization unit 52 may select a Z-component vector from a quantization codebook to represent the Z weight values. In other words, the quantization unit 52 may vector-quantize the Z weight values to generate a Z-component vector that represents the Z weight values. In some examples, Z may correspond to the number of weight values selected by the quantization unit 52 to represent a single V-vector. The quantization unit 52 may generate data indicative of the Z-component vector selected to represent the Z weight values, and provide this data to the bitstream generation unit 42 as coded weights 57. In some examples, the quantization codebook may include a plurality of Z-component vectors that are indexed, and the data indicative of the Z-component vector may be an index value into the quantization codebook that points to the selected vector. In such examples, the decoder may include a similarly indexed quantization codebook to decode the index value.
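The codebook lookup can be sketched as a nearest-neighbor search. The codebook below is a made-up three-entry table for Z = 2, and the squared-error selection criterion is an assumption; the passage only says that a Z-component vector is selected and that its index is signaled.

```python
def nearest_codebook_index(weights, codebook):
    """Index of the codebook entry closest to the Z weight values."""
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: squared_error(weights, codebook[i]))


# Hypothetical codebook of Z-component vectors (Z = 2).
codebook = [
    [1.0, 0.5],
    [0.8, 0.1],
    [0.3, 0.3],
]
index = nearest_codebook_index([0.75, 0.2], codebook)
decoded = codebook[index]  # what a decoder with the same codebook recovers
```

Only `index` needs to be written to the bitstream; a decoder holding an identically indexed codebook maps it back to the Z-component vector.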

Mathematically, each of the reduced foreground V[k] vectors 55 may be represented based on the following expression:

$$V \approx \sum_{j=1}^{J} \omega_j \Omega_j \qquad (1)$$

where $\Omega_j$ represents the j-th code vector in a set of code vectors ($\{\Omega_j\}$), $\omega_j$ represents the j-th weight in a set of weights ($\{\omega_j\}$), $V$ corresponds to the V-vector that is being represented, decomposed, and/or coded by the quantization unit 52, and $J$ represents the number of weights and the number of code vectors used to represent $V$. The right-hand side of expression (1) may represent a weighted sum of code vectors that includes the set of weights ($\{\omega_j\}$) and the set of code vectors ($\{\Omega_j\}$).

In some examples, the quantization unit 52 may determine the weight values based on the following equation:

$$\omega_k = V\,\Omega_k^{T} \qquad (2)$$

where $\Omega_k^{T}$ represents the transpose of the k-th code vector in a set of code vectors ($\{\Omega_k\}$), $V$ corresponds to the V-vector that is being represented, decomposed, and/or coded by the quantization unit 52, and $\omega_k$ represents the k-th weight in a set of weights ($\{\omega_k\}$).

Consider an example where 25 weights and 25 code vectors are used to represent a V-vector $V_{FG}$. Such a decomposition of $V_{FG}$ may be written as:

$$V_{FG} = \sum_{j=1}^{25} \omega_j \Omega_j \qquad (3)$$

where $\Omega_j$ represents the j-th code vector in a set of code vectors ($\{\Omega_j\}$), $\omega_j$ represents the j-th weight in a set of weights ($\{\omega_j\}$), and $V_{FG}$ corresponds to the V-vector that is being represented, decomposed, and/or coded by the quantization unit 52.

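In examples where the set of code vectors ($\{\Omega_j\}$) is orthogonal, the following expression may apply:

$$\Omega_j \Omega_k^{T} = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{for } j \neq k \end{cases} \qquad (4)$$

In such examples, multiplying both sides of equation (3) by $\Omega_k^{T}$ allows the right-hand side to be simplified as follows:

$$V_{FG}\,\Omega_k^{T} = \left(\sum_{j=1}^{25} \omega_j \Omega_j\right) \Omega_k^{T} = \omega_k \qquad (5)$$

where $\omega_k$ corresponds to the k-th weight in the weighted sum of code vectors.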
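The projection that yields each weight can be checked numerically on a small case. The sketch below assumes four orthonormal code vectors (a normalized 4×4 Hadamard set chosen purely for illustration, not the code vectors of any standard) and treats vectors as plain lists, so the product of V with a transposed code vector becomes a dot product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_weights(v, code_vectors):
    """One weight per code vector: the dot product of V with that vector."""
    return [dot(v, omega) for omega in code_vectors]


def weighted_sum(weights, code_vectors):
    """Sum of weight * code vector, element by element."""
    size = len(code_vectors[0])
    return [sum(w * omega[i] for w, omega in zip(weights, code_vectors))
            for i in range(size)]


# Illustrative orthonormal code vectors (rows have unit norm and are
# mutually orthogonal).
CODE_VECTORS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

v = [1.0, 2.0, 3.0, 4.0]
weights = projection_weights(v, CODE_VECTORS)
rebuilt = weighted_sum(weights, CODE_VECTORS)
```

Because the code vectors are orthonormal and span the whole space, the weighted sum of all of them reproduces the input vector exactly; dropping or quantizing weights is what introduces approximation.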

For the example weighted sum of code vectors used in equation (3), the quantization unit 52 may calculate the weight value for each of the weights in the weighted sum of code vectors using equation (5) (similar to equation (2)), and the resulting weights may be represented as:

$$\{\omega_k\}_{k=1,\ldots,25} \qquad (6)$$

Consider an example where the quantization unit 52 selects the five greatest weight values (i.e., the weights with the greatest values or absolute values). The subset of weight values to be quantized may be represented as:

$$\{\bar{\omega}_k\}_{k=1,\ldots,5} \qquad (7)$$

可使用权重值的子集以及其对应码向量形成估计V-向量的码向量的加权总和,如以下表达式中所展示:A weighted sum of codevectors to estimate the V-vector may be formed using a subset of weight values and their corresponding codevectors, as shown in the following expression:

其中Ωj表示码向量({Ωj})的子集中的第j码向量,表示权重()的子集中的第j权重,且对应于所估计的V-向量,其对应于由量化单元52分解及/或译码的V-向量。表达式(1)的右侧可表示包含一组权重()及一组码向量({Ωj})的码向量的加权总和。Wherein Ω j represents the j-th code vector in the subset of code vectors ({Ω j }), represents the j-th weight in the subset of weights (), and corresponds to the estimated V-vector, which corresponds to the V-vector decomposed and/or decoded by quantization unit 52. The right side of expression (1) may represent a weighted sum of code vectors including a set of weights () and a set of code vectors ({Ω j }).
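By way of illustration only, the selection of the largest-magnitude weights and the formation of the estimated V-vector may be sketched in Python as follows. The function names, the eight example weights, and the identity code vectors are hypothetical and are chosen only so that the result is easy to inspect.

```python
def select_largest(weights, count):
    """Return (index, weight) pairs for the `count` largest-magnitude weights."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return [(j, weights[j]) for j in ranked[:count]]


def estimate(selected, code_vectors):
    """Weighted sum of only the selected code vectors."""
    length = len(code_vectors[0])
    return [sum(w * code_vectors[j][i] for j, w in selected) for i in range(length)]


# Hypothetical weights; identity code vectors keep the example easy to read.
weights = [0.1, -0.9, 0.4, 0.05, -0.6, 0.3, 0.8, -0.2]
code_vectors = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]

selected = select_largest(weights, 5)
estimated = estimate(selected, code_vectors)
```

Retaining the code vector index alongside each selected weight matters, because a decoder needs to know which code vectors the surviving weights belong to.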

Quantization unit 52 may quantize the subset of the weight values to generate quantized weight values, which may be represented as:

$$\{\hat{\omega}_k\}_{k=1,\ldots,5} \qquad (9)$$

The quantized weight values, together with their corresponding code vectors, may be used to form a weighted sum of code vectors that represents a quantized version of the estimated V-vector, as shown in the following expression:

$$\hat{V}_{FG} = \sum_{j=1}^{5} \hat{\omega}_j \Omega_j \qquad (10)$$

where $\Omega_j$ represents the j-th code vector in a subset of the code vectors ($\{\Omega_j\}$), $\hat{\omega}_j$ represents the j-th weight in a subset of the weights ($\{\hat{\omega}_j\}$), and $\hat{V}_{FG}$ corresponds to an estimated V-vector that corresponds to the V-vector being decomposed and/or coded by quantization unit 52. The right-hand side of expression (1) may represent a weighted sum of a subset of the code vectors that includes a set of weights ($\{\hat{\omega}_j\}$) and a set of code vectors ($\{\Omega_j\}$).

前文的替代重新叙述(其大部分等效于上文所描述的叙述)可如下。可基于一组预定义码向量译码V-向量。为了译码V-向量,将每一V-向量分解成码向量的加权总和。码向量的加权总和由k对预定义码向量及相关联权重组成:An alternative restatement of the foregoing (which is largely equivalent to the one described above) can be as follows. V-vectors can be coded based on a set of predefined code vectors. To code a V-vector, each V-vector is decomposed into a weighted sum of code vectors. The weighted sum of code vectors consists of k pairs of predefined code vectors and associated weights:

where $\Omega_j$ represents the j-th code vector in a set of predefined code vectors ($\{\Omega_j\}$), $\omega_j$ represents the j-th real-valued weight in a set of predefined weights ($\{\omega_j\}$), k corresponds to the index of the addends (which may be up to 7), and V corresponds to the V-vector being coded. The choice of k depends on the encoder. If the encoder chooses a weighted sum of two or more code vectors, the total number of predefined code vectors from which the encoder may choose is $(N+1)^2$, where the predefined code vectors are derived as HOA expansion coefficients from Tables F.3 through F.7 of the 3D Audio standard (entitled "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio," ISO/IEC JTC 1/SC 29/WG 11, dated July 25, 2014, and identified by document number ISO/IEC DIS 23008-3). When N is 4, the table in Annex F.5 of the above-referenced 3D Audio standard with 32 predefined directions is used. In all cases, the absolute values of the weights ω are vector quantized with respect to the predefined weighting values found in the first k+1 columns of the table in Table F.12 of the above-referenced 3D Audio standard and signaled with the associated row number index.

将权重ω的数字正负号分别译码为:Decode the positive and negative signs of the weight ω as:

换句话说,在用信号通知值k之后,通过指向k+1个预定义码向量{Ωj}的k+1个索引、指向预定义加权码簿中的k个经量化的权重的一索引及k+1个数字正负号值sj编码V-向量:In other words, after signaling the value k, the V-vector is encoded by k+1 indices pointing to the k+1 predefined code vectors {Ω j }, an index pointing to the k quantized weights in the predefined weighted codebook, and k+1 digital sign values s j :

If the encoder chooses a weighted sum of one code vector, a codebook derived from Table F.8 of the above-referenced 3D Audio standard is used in combination with the absolute weighting values in the table of Table F.11 of the above-referenced 3D Audio standard, where both of these tables are shown below. Also, the number sign of the weighting value ω may be separately coded. Quantization unit 52 may signal which of the foregoing codebooks set forth in the above-noted Tables F.3 through F.12 is used to code the input V-vector using a codebook index syntax element (which may be denoted "CodebkIdx" below). Quantization unit 52 may also scalar quantize the input V-vector to generate an output scalar-quantized V-vector without Huffman coding the scalar-quantized V-vector. Quantization unit 52 may further scalar quantize the input V-vector according to a Huffman-coded scalar quantization mode to generate a Huffman-coded scalar-quantized V-vector. For example, quantization unit 52 may scalar quantize the input V-vector to generate a scalar-quantized V-vector, and Huffman code the scalar-quantized V-vector to generate an output Huffman-coded scalar-quantized V-vector.

在一些实例中,量化单元52可执行一种形式的经预测的向量量化。量化单元52可通过在位流21中指定指示是否执行用于向量量化的预测的一或多个位(例如,PFlag语法元素)而识别是否预测向量量化(如通过指示量化模式的一或多个位识别,例如,NbitsQ语法元素)。In some examples, quantization unit 52 may perform a form of predicted vector quantization. Quantization unit 52 may identify whether vector quantization is predicted (as identified by one or more bits indicating a quantization mode, e.g., an NbitsQ syntax element) by specifying one or more bits in the bitstream 21 indicating whether prediction for vector quantization is performed (e.g., a PFlag syntax element).

To illustrate predicted vector quantization, quantization unit 52 may be configured to receive weight values (e.g., weight value magnitudes) that correspond to a code-vector-based decomposition of a vector (e.g., a V-vector), to generate predictive weight values based on the received weight values and on reconstructed weight values (e.g., weight values reconstructed from one or more previous or subsequent audio frames), and to vector quantize sets of the predictive weight values. In some cases, each weight value in a set of predictive weight values may correspond to a weight value included in a code-vector-based decomposition of a single vector.

量化单元52可接收权重值及从向量的先前或后续译码获得的经加权的经重建构的权重值。量化单元52可基于权重值及经加权的经重建构的权重值产生预测性权重值。量化单元42可将经加权的经重建构的权重值从权重值中减去以产生预测性权重值。预测性权重值可替代地被称作(例如)残余、预测残余、残余权重值、权重值差、误差或预测误差。Quantization unit 52 may receive the weight value and a weighted, reconstructed weight value obtained from a previous or subsequent decoding of the vector. Quantization unit 52 may generate a predictive weight value based on the weight value and the weighted, reconstructed weight value. Quantization unit 52 may subtract the weighted, reconstructed weight value from the weight value to generate a predictive weight value. The predictive weight value may alternatively be referred to as, for example, a residual, a prediction residual, a residual weight value, a weight value difference, an error, or a prediction error.

The weight value may be represented as $|w_{i,j}|$, which is the magnitude (or absolute value) of the corresponding weight value $w_{i,j}$. The weight value may therefore alternatively be referred to as a weight value magnitude or as the magnitude of a weight value. The weight value $w_{i,j}$ corresponds to the j-th weight value from an ordered subset of weight values for the i-th audio frame. In some examples, the ordered subset of weight values may correspond to a subset of the weight values in a code-vector-based decomposition of a vector (e.g., a V-vector) that is ordered based on the magnitudes of the weight values (e.g., ordered from greatest magnitude to least magnitude).

The weighted reconstructed weight value may include the term $|\hat{w}_{i-1,j}|$, which corresponds to the magnitude (or absolute value) of the corresponding reconstructed weight value $\hat{w}_{i-1,j}$. The reconstructed weight value $\hat{w}_{i-1,j}$ corresponds to the j-th reconstructed weight value from an ordered subset of reconstructed weight values for the (i-1)-th audio frame. In some examples, the ordered subset (or set) of reconstructed weight values may be generated based on quantized predictive weight values that correspond to the reconstructed weight values.

Quantization unit 52 also includes a weighting factor $\alpha_j$. In some examples, $\alpha_j = 1$, in which case the weighted reconstructed weight value reduces to $|\hat{w}_{i-1,j}|$. In other examples, $\alpha_j \neq 1$. For example, $\alpha_j$ may be determined based on the following equation:

其中I对应于用以确定αj的音频帧的数目。如先前等式中所展示,在一些实例中,可基于来自多个不同音频帧的多个不同权重值确定加权因子。where I corresponds to the number of audio frames used to determine α j . As shown in the previous equation, in some examples, the weighting factor may be determined based on a plurality of different weight values from a plurality of different audio frames.

Moreover, when configured to perform predicted vector quantization, quantization unit 52 may generate the predictive weight value based on the following equation:

$$e_{i,j} = |w_{i,j}| - \alpha_j\,|\hat{w}_{i-1,j}|$$

where $e_{i,j}$ corresponds to the predictive weight value for the j-th weight value from the ordered subset of weight values for the i-th audio frame, and $\alpha_j\,|\hat{w}_{i-1,j}|$ is the weighted reconstructed weight value that is subtracted from the weight value $|w_{i,j}|$.

量化单元52基于预测性权重值及经预测的向量量化(PVQ)码簿产生经量化的预测性权重值。举例来说,量化单元52可将预测性权重值结合针对待译码的向量或针对待译码的帧产生的其它预测性权重值向量量化以便产生经量化的预测性权重值。Quantization unit 52 generates quantized predictive weight values based on the predictive weight values and a predicted vector quantization (PVQ) codebook. For example, quantization unit 52 may vector quantize the predictive weight values in conjunction with other predictive weight values generated for the vector to be coded or for the frame to be coded to generate quantized predictive weight values.

量化单元52可基于PVQ码簿将预测性权重值620向量量化。PVQ码簿可包含多个M-分量候选量化向量,且量化单元52可选择所述候选量化向量中的一者来表示Z个预测性权重值。在一些实例中,量化单元52可从PVQ码簿中选择使量化误差最小化(例如,使最小平方误差最小化)的候选量化向量。Quantization unit 52 may vector-quantize predictive weight values 620 based on a PVQ codebook. The PVQ codebook may include a plurality of M-component candidate quantization vectors, and quantization unit 52 may select one of the candidate quantization vectors to represent the Z predictive weight values. In some examples, quantization unit 52 may select a candidate quantization vector from the PVQ codebook that minimizes quantization error (e.g., minimizes a least square error).
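By way of illustration only, selecting the candidate quantization vector that minimizes the squared error may be sketched in Python as follows. The four-entry, two-component codebook is hypothetical and is not a PVQ codebook from any standard; only the nearest-entry search is being illustrated.

```python
def nearest_codebook_index(values, codebook):
    """Return the index of the candidate vector with the smallest squared error."""

    def squared_error(candidate):
        return sum((x - c) ** 2 for x, c in zip(values, candidate))

    return min(range(len(codebook)), key=lambda idx: squared_error(codebook[idx]))


# Hypothetical codebook: each entry is one candidate quantization vector.
codebook = [
    [0.0, 0.0],
    [0.5, -0.5],
    [1.0, 0.0],
    [-0.5, 0.5],
]

predictive_weight_values = [0.4, -0.3]
index = nearest_codebook_index(predictive_weight_values, codebook)
quantized = codebook[index]
```

Only the index needs to be conveyed when the decoder holds a matching codebook, since the decoder can look up the same candidate vector from that index.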

在一些实例中,PVQ码簿可包含多个条目,其中所述条目中的每一者包含量化码簿索引及对应M-分量候选量化向量。量化码簿中的所述索引中的每一者可对应于多个M-分量候选量化向量中的相应者。In some examples, the PVQ codebook may include multiple entries, wherein each of the entries includes a quantization codebook index and a corresponding M-component candidate quantization vector. Each of the indices in the quantization codebook may correspond to a respective one of a plurality of M-component candidate quantization vectors.

The number of components in each of the quantization vectors may depend on the number of weights (i.e., Z) that are selected to represent a single V-vector. In general, for a codebook with Z-component candidate quantization vectors, quantization unit 52 may vector quantize Z predictive weight values at a time to generate a single quantized vector. The number of entries in the quantization codebook may depend on the bitrate used to vector quantize the weight values.

When quantization unit 52 vector quantizes the predictive weight values, quantization unit 52 may select, from the PVQ codebook, a Z-component vector to be the quantization vector that represents the Z predictive weight values. The quantized predictive weight value may be represented as $\hat{e}_{i,j}$, which may correspond to the j-th component of the Z-component quantization vector for the i-th audio frame, and which may further correspond to a vector-quantized version of the j-th predictive weight value for the i-th audio frame.

当经配置以执行经预测的向量量化时,量化单元52也可基于经量化的预测性权重值及经加权的经重建构的权重值产生经重建构的权重值。举例来说,量化单元52可将经加权的经重建构的权重值加到经量化的预测性权重值以产生经重建构的权重值。经加权的经重建构的权重值可与上文所描述的经加权的经重建构的权重值相同。在一些实例中,经加权的经重建构的权重值可为经重建构的权重值的经加权及经延迟的版本。When configured to perform predicted vector quantization, the quantization unit 52 may also generate a reconstructed weight value based on the quantized predictive weight value and the weighted reconstructed weight value. For example, the quantization unit 52 may add the weighted reconstructed weight value to the quantized predictive weight value to generate a reconstructed weight value. The weighted reconstructed weight value may be the same as the weighted reconstructed weight value described above. In some examples, the weighted reconstructed weight value may be a weighted and delayed version of the reconstructed weight value.

The reconstructed weight value may be represented as $|\hat{w}_{i-1,j}|$, which corresponds to the magnitude (or absolute value) of the corresponding reconstructed weight value $\hat{w}_{i-1,j}$. The reconstructed weight value $\hat{w}_{i-1,j}$ corresponds to the j-th reconstructed weight value from an ordered subset of reconstructed weight values for the (i-1)-th audio frame. In some examples, quantization unit 52 may separately code data indicative of the sign of a predictively coded weight value, and the decoder may use this information to determine the sign of the reconstructed weight value.

Quantization unit 52 may generate the reconstructed weight value based on the following equation:

$$|\hat{w}_{i,j}| = \hat{e}_{i,j} + \alpha_j\,|\hat{w}_{i-1,j}|$$

where $\hat{e}_{i,j}$ corresponds to the quantized predictive weight value for the j-th weight value from the ordered subset of weight values for the i-th audio frame (e.g., the j-th component of an M-component quantization vector), $|\hat{w}_{i-1,j}|$ corresponds to the magnitude of the reconstructed weight value for the j-th weight value from the ordered subset of weight values for the (i-1)-th audio frame, and $\alpha_j$ corresponds to the weighting factor for the j-th weight value from the ordered subset of weight values.
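By way of illustration only, one frame of the prediction and reconstruction described above may be sketched in Python as follows. The sketch assumes that the predictive weight value is the weight magnitude minus the weighted reconstructed magnitude from the previous frame, and that the reconstructed magnitude is the quantized predictive weight value plus that same weighted term. The uniform rounding quantizer is a stand-in for the PVQ codebook search, and all names and numbers are hypothetical.

```python
def predict_residuals(magnitudes, previous_reconstructed, alphas):
    """Predictive weight values: magnitude minus weighted previous reconstruction."""
    return [m - a * p for m, p, a in zip(magnitudes, previous_reconstructed, alphas)]


def quantize(residuals, step=0.25):
    """Stand-in quantizer (assumption): uniform rounding instead of a PVQ codebook."""
    return [step * round(r / step) for r in residuals]


def reconstruct(quantized_residuals, previous_reconstructed, alphas):
    """Reconstructed magnitudes: quantized residual plus weighted previous reconstruction."""
    return [q + a * p for q, p, a in zip(quantized_residuals, previous_reconstructed, alphas)]


# Hypothetical values for two weights of one audio frame.
magnitudes = [1.0625, 0.5]             # weight magnitudes for frame i
previous_reconstructed = [0.75, 0.5]   # reconstructed magnitudes for frame i-1
alphas = [1.0, 1.0]                    # weighting factors

residuals = predict_residuals(magnitudes, previous_reconstructed, alphas)
quantized_residuals = quantize(residuals)
reconstructed = reconstruct(quantized_residuals, previous_reconstructed, alphas)
```

The encoder forms its prediction from reconstructed values rather than original values so that the decoder, which only has the quantized predictive weight values, can run the same recursion and stay in step with the encoder.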

Quantization unit 52 may generate a delayed reconstructed weight value based on the reconstructed weight value. For example, quantization unit 52 may delay the reconstructed weight value by one audio frame to generate the delayed reconstructed weight value.

量化单元52也可基于经延迟的经重建构的权重值及加权因子产生经加权的经重建构的权重值。举例来说,量化单元52可将经延迟的经重建构的权重值乘以加权因子以产生经加权的经重建构的权重值。Quantization unit 52 may also generate a weighted reconstructed weight value based on the delayed reconstructed weight value and the weighting factor. For example, quantization unit 52 may multiply the delayed reconstructed weight value by the weighting factor to generate a weighted reconstructed weight value.

类似地,量化单元52可基于经延迟的经重建构的权重值及加权因子产生经加权的经重建构的权重值。举例来说,量化单元52可将经延迟的经重建构的权重值乘以加权因子以产生经加权的经重建构的权重值。Similarly, quantization unit 52 may generate a weighted reconstructed weight value based on the delayed reconstructed weight value and the weighting factor. For example, quantization unit 52 may multiply the delayed reconstructed weight value by the weighting factor to generate a weighted reconstructed weight value.

响应于从PVQ码簿中选择将为用于Z个预测性权重值的量化向量的Z-分量向量,在一些实例中,量化单元52可译码对应于所选定Z-分量向量的索引(来自PVQ码簿)(而非译码所选定Z-分量向量自身)。所述索引可指示一组经量化的预测性权重值。在此些实例中,解码器24可包含类似于PVQ码簿的码簿,且可通过将指示经量化的预测性权重值的索引映射到解码器码簿中的对应Z-分量向量而解码所述索引。Z-分量向量中的分量中的每一者可对应于经量化的预测性权重值。In response to selecting a Z-component vector from the PVQ codebook that is to be the quantized vector for the Z predictive weight values, in some examples, quantization unit 52 may code an index (from the PVQ codebook) corresponding to the selected Z-component vector (rather than coding the selected Z-component vector itself). The index may indicate a set of quantized predictive weight values. In such examples, decoder 24 may include a codebook similar to the PVQ codebook and may decode the index indicating the quantized predictive weight values by mapping the index to the corresponding Z-component vector in the decoder codebook. Each of the components in the Z-component vector may correspond to a quantized predictive weight value.

将向量(例如,V-向量)纯量量化可涉及个别地及/或独立于其它分量将所述向量的分量中的每一者量化。举例来说,考虑以下实例V-向量:Quantizing a vector (e.g., a V-vector) scalar-wise can involve quantizing each of the components of the vector individually and/or independently of the other components. For example, consider the following example V-vector:

V = [0.23 0.31 -0.47 … 0.85]

为了将此实例V向量纯量量化,可个别地将所述分量中的每一者量化(即,纯量量化)。举例来说,如果量化步长为0.1,那么可将0.23分量量化为0.2,可将0.31分量量化为0.3,等等。经纯量量化的分量可共同地形成经纯量量化的V-向量。To scalar quantize this example V vector, each of the components may be individually quantized (i.e., scalar quantized). For example, if the quantization step size is 0.1, the 0.23 component may be quantized to 0.2, the 0.31 component may be quantized to 0.3, and so on. The scalar quantized components may collectively form a scalar quantized V-vector.
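By way of illustration only, scalar quantization with a step size of 0.1 may be sketched in Python as follows, using the first three components of the example V-vector. Rounding to the nearest step is an assumption of this sketch; the text above is equally consistent with truncation for the two components it mentions.

```python
def scalar_quantize(vector, step):
    """Quantize each component independently to an integer step index."""
    return [round(x / step) for x in vector]


def dequantize(indices, step):
    """Map step indices back to approximate component values."""
    return [i * step for i in indices]


# First three components of the example V-vector.
v = [0.23, 0.31, -0.47]
step = 0.1

indices = scalar_quantize(v, step)
approx = dequantize(indices, step)
```

Each component is handled without reference to the others, which is what distinguishes scalar quantization from the vector quantization described earlier.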

In other words, quantization unit 52 may perform uniform scalar quantization with respect to all elements of a given one of the reduced foreground V[k] vectors 55. Quantization unit 52 may identify a quantization step size based on a value, which may be denoted as an NbitsQ syntax element. Quantization unit 52 may dynamically determine this NbitsQ syntax element based on target bitrate 41. The NbitsQ syntax element may also identify the quantization mode, as noted in the ChannelSideInfoData syntax table reproduced below, while also identifying the step size (for purposes of scalar quantization). That is, quantization unit 52 may determine the quantization step size as a function of this NbitsQ syntax element. As one example, quantization unit 52 may determine the quantization step size (denoted as "delta" or "Δ" in this disclosure) as equal to $2^{16-NbitsQ}$. In this example, when the value of the NbitsQ syntax element equals six, delta equals $2^{10}$ and there are $2^{6}$ quantization levels. In this respect, for a vector element $v$, the quantized vector element $v_q$ equals $[v/\Delta]$, and $-2^{NbitsQ-1} < v_q < 2^{NbitsQ-1}$.
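By way of illustration only, the relationship between the NbitsQ value, the step size, and the number of quantization levels may be sketched in Python as follows, taking the step size to be two raised to the power (16 minus NbitsQ) as in the example where an NbitsQ of six gives a step of two to the tenth. Reading the bracket notation as rounding to the nearest integer is an assumption of this sketch, as are the function names and the sample element value.

```python
def quantization_step(nbits_q):
    """Step size delta = 2 ** (16 - NbitsQ)."""
    return 2 ** (16 - nbits_q)


def level_count(nbits_q):
    """Number of quantization levels, 2 ** NbitsQ."""
    return 2 ** nbits_q


def quantize_element(v, nbits_q):
    """Assumption: [v / delta] is read as rounding to the nearest integer."""
    return round(v / quantization_step(nbits_q))


def in_range(vq, nbits_q):
    """Check -2 ** (NbitsQ - 1) < vq < 2 ** (NbitsQ - 1)."""
    bound = 2 ** (nbits_q - 1)
    return -bound < vq < bound


step = quantization_step(6)
levels = level_count(6)
vq = quantize_element(5000, 6)  # hypothetical vector element value
```

A larger NbitsQ value yields a smaller step and more levels, so finer quantization costs more bits per element.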

Quantization unit 52 may then perform a categorization and a residual coding of the quantized vector elements. As one example, quantization unit 52 may, for a given quantized vector element $v_q$, identify the category to which this element corresponds (by determining a category identifier cid) using the following equation:

$$cid = \begin{cases} 0 & \text{if } v_q = 0 \\ \lfloor \log_2 |v_q| \rfloor + 1 & \text{if } v_q \neq 0 \end{cases}$$

量化单元52可接着对此类别索引cid进行霍夫曼译码,同时也识别指示vq为正值还是负值的正负号位。量化单元52接下来可识别此类别中的残余。作为一实例,量化单元52可根据以下等式确定此残余:Quantization unit 52 may then Huffman code this class index cid while also identifying the sign bit that indicates whether vq is positive or negative. Quantization unit 52 may then identify the residual in this class. As an example, quantization unit 52 may determine this residual according to the following equation:

residual = $|v_q| - 2^{cid-1}$

量化单元52可接着用cid-1个位对此残余进行块译码。Quantization unit 52 may then block code this residue with cid-1 bits.
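By way of illustration only, the categorization and residual coding of a quantized vector element may be sketched in Python as follows. The sketch assumes that the category identifier is zero for a zero element and otherwise equals the bit length of the element magnitude, which is consistent with the residual being the magnitude minus two raised to the power (cid minus one) and fitting in cid-1 bits. The function names and the boolean representation of the sign are hypothetical, and the Huffman coding of cid itself is not shown.

```python
def classify(vq):
    """Return (cid, is_negative, residual) for a quantized vector element."""
    magnitude = abs(vq)
    # int.bit_length() is 0 for 0 and floor(log2(m)) + 1 for m > 0.
    cid = magnitude.bit_length()
    residual = magnitude - (1 << (cid - 1)) if cid > 0 else 0
    return cid, vq < 0, residual


def residual_bits(cid, residual):
    """Block code the residual with cid - 1 bits (empty when cid is 0 or 1)."""
    width = cid - 1
    return format(residual, "0{}b".format(width)) if width > 0 else ""


cid, is_negative, residual = classify(-12)
bits = residual_bits(cid, residual)
```

Splitting each element into a category, a sign, and a fixed-width residual means that only the category needs an entropy code, and the residual width follows directly from the category.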

在一些实例中,当译码cid时,量化单元52可选择用于NbitsQ语法元素的不同值的不同霍夫曼码簿。在一些实例中,量化单元52可提供用于NbitsQ语法元素值6,…,15的不同霍夫曼译码表。此外,量化单元52可包含用于在6,…,15的范围内的不同NbitsQ语法元素值中的每一者的五个不同霍夫曼码簿,总共50个霍夫曼码簿。就此而言,量化单元52可包含多个不同霍夫曼码簿以适应数个不同统计上下文中的cid的译码。In some examples, when coding the cid, quantization unit 52 may select different Huffman codebooks for different values of the NbitsQ syntax element. In some examples, quantization unit 52 may provide different Huffman coding tables for NbitsQ syntax element values of 6, ..., 15. Furthermore, quantization unit 52 may include five different Huffman codebooks for each of the different NbitsQ syntax element values in the range of 6, ..., 15, for a total of 50 Huffman codebooks. In this regard, quantization unit 52 may include multiple different Huffman codebooks to accommodate coding of the cid in a number of different statistical contexts.

为了进行说明,量化单元52可针对NbitsQ语法元素值中的每一者包含:用于译码向量元素一到四的第一霍夫曼码簿;用于译码向量元素五到九的第二霍夫曼码簿;用于译码向量元素九及九以上的第三霍夫曼码簿。当出现以下情形时,可使用此些前三个霍夫曼码簿:经缩减前景V[k]向量55中待压缩的经缩减前景V[k]向量55并非是从经缩减前景V[k]向量55中在时间上后续的对应经缩减前景V[k]向量预测且并非表示合成音频对象((例如)最初通过经脉码调制(PCM)音频对象界定的音频对象)的空间信息。当经缩减前景V[k]向量55中的此经缩减前景V[k]向量55是从经缩减前景V[k]向量55中在时间上后续的对应经缩减前景V[k]向量55预测时,量化单元52可针对NbitsQ语法元素值中的每一者另外包含用于译码经缩减前景V[k]向量55中的所述经缩减前景V[k]向量55的第四霍夫曼码簿。当经缩减前景V[k]向量55中的此经缩减前景V[k]向量55表示合成音频对象时,量化单元52也可针对NbitsQ语法元素值中的每一者包含用于译码经缩减前景V[k]向量55中的所述经缩减前景V[k]向量55的第五霍夫曼码簿。可针对此些不同统计上下文(即,在此实例中,未经预测及非合成上下文、经预测的上下文及合成上下文)中的每一者开发各种霍夫曼码簿。For illustration, quantization unit 52 may include, for each of the NbitsQ syntax element values, a first Huffman codebook for coding vector elements one through four, a second Huffman codebook for coding vector elements five through nine, and a third Huffman codebook for coding vector elements nine and above. These first three Huffman codebooks may be used when the reduced foreground V[k] vector 55 to be compressed is not predicted from a corresponding reduced foreground V[k] vector 55 that is temporally subsequent in the reduced foreground V[k] vector 55 and does not represent spatial information of a synthetic audio object (e.g., an audio object originally defined by a pulse code modulated (PCM) audio object). When this one of the reduced foreground V[k] vectors 55 is predicted from a corresponding reduced foreground V[k] vector 55 that is temporally subsequent in the reduced foreground V[k] vectors 55, the quantization unit 52 may additionally include, for each of the NbitsQ syntax element values, a fourth Huffman codebook for coding that one of the reduced foreground V[k] vectors 55. When this one of the reduced foreground V[k] vectors 55 represents a synthetic audio object, the quantization unit 52 may also include, for each of the NbitsQ syntax element values, a fifth Huffman codebook for coding that one of the reduced foreground V[k] vectors 55. 
Various Huffman codebooks may be developed for each of such different statistical contexts, i.e., in this example, unpredicted and non-synthesized contexts, predicted contexts, and synthetic contexts.

下表说明霍夫曼表选择及待于位流中指定以使得解压缩单元能够选择适当霍夫曼表的位:The following table illustrates the Huffman table selection and the bits to be specified in the bitstream to enable the decompression unit to select the appropriate Huffman table:

Pred Mode    HT Information    HT Table
0            0                 HT5
0            1                 HT{1,2,3}
1            0                 HT4
1            1                 HT5

在前表中,预测模式(“Pred模式”)指示是否针对当前向量执行了预测,而霍夫曼表(“HT信息”)指示用以选择霍夫曼表一到五中的一者的额外霍夫曼码簿(或表格)信息。预测模式也可表示为下文所论述的PFlag语法元素,而HT信息可通过下文所论述的CbFlag语法元素来表示。In the preceding table, the prediction mode ("Pred Mode") indicates whether prediction is performed for the current vector, and the Huffman table ("HT Information") indicates additional Huffman codebook (or table) information used to select one of Huffman tables 1 to 5. The prediction mode can also be represented as the PFlag syntax element discussed below, while the HT Information can be represented by the CbFlag syntax element discussed below.

下表进一步说明此霍夫曼表选择过程(在给定各种统计上下文或情形的情况下)。The following table further illustrates this Huffman table selection process (given various statistical contexts or situations).

              Recorded       Synthetic
No Pred       HT{1,2,3}      HT5
With Pred     HT4            HT5

在前表中,“记录”列指示向量表示经记录的音频对象时的译码上下文,而“合成”列指示向量表示合成音频对象时的译码上下文。“无Pred”行指示并不关于向量元素执行预测时的译码上下文,而“具有Pred”行指示关于向量元素执行预测时的译码上下文。如此表中所展示,量化单元52在向量表示所记录音频对象且并不关于向量元素执行预测时选择HT{1,2,3}。量化单元52在音频对象表示合成音频对象且并不关于向量元素执行预测时选择HT5。量化单元52在向量表示所记录音频对象且关于向量元素执行预测时选择HT4。量化单元52在音频对象表示合成音频对象且关于向量元素执行预测时选择HT5。In the previous table, the "Recorded" column indicates the decoding context when the vector represents a recorded audio object, while the "Synthesized" column indicates the decoding context when the vector represents a synthetic audio object. The "Without Pred" row indicates the decoding context when prediction is not performed on the vector elements, while the "With Pred" row indicates the decoding context when prediction is performed on the vector elements. As shown in this table, the quantization unit 52 selects HT{1,2,3} when the vector represents a recorded audio object and does not perform prediction on the vector elements. The quantization unit 52 selects HT5 when the audio object represents a synthetic audio object and does not perform prediction on the vector elements. The quantization unit 52 selects HT4 when the vector represents a recorded audio object and performs prediction on the vector elements. The quantization unit 52 selects HT5 when the audio object represents a synthetic audio object and performs prediction on the vector elements.
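By way of illustration only, the Huffman table selection signaled by the prediction mode and the Huffman table information may be sketched in Python as a simple lookup that mirrors the first of the two preceding tables. The dictionary and function names are hypothetical, and the further choice among tables one to three according to the vector element index is not resolved here.

```python
# Keys are (Pred mode, HT information); values are the Huffman table label.
HUFFMAN_TABLE_BY_FLAGS = {
    (0, 0): "HT5",
    (0, 1): "HT{1,2,3}",
    (1, 0): "HT4",
    (1, 1): "HT5",
}


def select_huffman_table(pred_mode, ht_info):
    """Return the Huffman table label for the two signaled bits."""
    return HUFFMAN_TABLE_BY_FLAGS[(pred_mode, ht_info)]


table_without_prediction = select_huffman_table(0, 1)
table_with_prediction = select_huffman_table(1, 0)
```

Because the decompression unit reads the same two bits from the bitstream, it can perform the same lookup and select the matching table.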

Quantization unit 52 may select one of the following to use as the output switched-quantized V-vector based on any combination of the criteria discussed in this disclosure: the non-predicted vector-quantized V-vector, the predicted vector-quantized V-vector, the non-Huffman-coded scalar-quantized V-vector, and the Huffman-coded scalar-quantized V-vector. In some examples, quantization unit 52 may select a quantization mode from a set of quantization modes that includes a vector quantization mode and one or more scalar quantization modes, and quantize the input V-vector based on (or according to) the selected mode. Quantization unit 52 may then provide the selected one of the following to bitstream generation unit 42 for use as the coded foreground V[k] vectors 57: the non-predicted vector-quantized V-vector (e.g., in terms of weight values or bits indicative thereof), the predicted vector-quantized V-vector (e.g., in terms of error values or bits indicative thereof), the non-Huffman-coded scalar-quantized V-vector, and the Huffman-coded scalar-quantized V-vector. Quantization unit 52 may also provide the syntax element indicative of the quantization mode (e.g., the NbitsQ syntax element) and any other syntax elements used to dequantize or otherwise reconstruct the V-vector (as discussed in more detail below with respect to the examples of FIGS. 4 and 7).

Psychoacoustic audio coder unit 40 included within audio encoding device 20 may represent multiple instances of a psychoacoustic audio coder, each of which is used to encode a different audio object or HOA channel of each of the energy-compensated ambient HOA coefficients 47' and the interpolated nFG signals 49' to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61. Psychoacoustic audio coder unit 40 may output the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to bitstream generation unit 42.

包含于音频编码装置20内的位流产生单元42表示将数据格式化以符合已知格式(其可指为解码装置已知的格式)借此产生基于向量的位流21的单元。换句话说,位流21可表示以上文所描述的方式编码的经编码音频数据。位流产生单元42在一些实例中可表示多路复用器,其可接收经译码前景V[k]向量57、经编码环境HOA系数59、经编码nFG信号61,及背景信道信息43。位流产生单元42可接着基于经译码前景V[k]向量57、经编码环境HOA系数59、经编码nFG信号61及背景信道信息43产生位流21。以此方式,位流产生单元42可借此在位流21中指定向量57以获得如下文关于图7的实例更详细描述的位流21。位流21可包含主要或主位流及一或多个旁侧信道位流。The bitstream generation unit 42 included in the audio encoding device 20 represents a unit that formats data to conform to a known format (which may be a format known to the decoding device), thereby generating a vector-based bitstream 21. In other words, the bitstream 21 may represent the encoded audio data encoded in the manner described above. In some examples, the bitstream generation unit 42 may represent a multiplexer that may receive the coded foreground V[k] vector 57, the encoded ambient HOA coefficients 59, the encoded nFG signal 61, and the background channel information 43. The bitstream generation unit 42 may then generate the bitstream 21 based on the coded foreground V[k] vector 57, the encoded ambient HOA coefficients 59, the encoded nFG signal 61, and the background channel information 43. In this manner, the bitstream generation unit 42 may thereby specify the vector 57 in the bitstream 21 to obtain the bitstream 21, as described in more detail below with respect to the example of FIG. 7. The bitstream 21 may include a primary or main bitstream and one or more side channel bitstreams.

尽管在图3的实例中未展示,但音频编码装置20也可包含位流输出单元,所述位流输出单元基于当前帧将使用基于方向的合成还是基于向量的合成编码而切换从音频编码装置20输出的位流(例如,在基于方向的位流21与基于向量的位流21之间切换)。位流输出单元可基于由内容分析单元26输出的指示执行基于方向的合成(作为检测到HOA系数11是从合成音频对象产生的结果)还是执行基于向量的合成(作为检测到HOA系数经记录的结果)的语法元素执行所述切换。位流输出单元可指定正确的标头语法以指示用于当前帧以及位流21中的相应位流的切换或当前编码。Although not shown in the example of FIG3 , the audio encoding device 20 may also include a bitstream output unit that switches the bitstream output from the audio encoding device 20 (e.g., switches between a direction-based bitstream 21 and a vector-based bitstream 21) based on whether the current frame is to be encoded using direction-based synthesis or vector-based synthesis. The bitstream output unit may perform the switching based on a syntax element output by the content analysis unit 26 indicating whether direction-based synthesis is to be performed (as a result of detecting that the HOA coefficients 11 are generated from a synthesized audio object) or vector-based synthesis is to be performed (as a result of detecting that the HOA coefficients are recorded). The bitstream output unit may specify the correct header syntax to indicate the switching or current encoding for the current frame and the corresponding bitstream in the bitstream 21.

Moreover, as noted above, sound field analysis unit 44 may identify the $BG_{TOT}$ ambient HOA coefficients 47, which may change on a frame-by-frame basis (although at times $BG_{TOT}$ may remain constant or the same across two or more frames that are adjacent in time). A change in $BG_{TOT}$ may result in changes to the coefficients expressed in the reduced foreground V[k] vectors 55. A change in $BG_{TOT}$ may also result in background HOA coefficients (which may also be referred to as "ambient HOA coefficients") that change on a frame-by-frame basis (although, again, at times $BG_{TOT}$ may remain constant or the same across two or more frames that are adjacent in time). The changes often result in a change of energy for the aspects of the sound field represented by the addition or removal of the additional ambient HOA coefficients and by the corresponding removal of coefficients from, or addition of coefficients to, the reduced foreground V[k] vectors 55.

As a result, the soundfield analysis unit 44 may further determine when the ambient HOA coefficients change from frame to frame and generate a flag or other syntax element indicating the change to the ambient HOA coefficient in terms of its use in representing the ambient components of the soundfield (where the change may also be referred to as a "transition" of the ambient HOA coefficient). In particular, the coefficient reduction unit 46 may generate the flag (which may be denoted as an AmbCoeffTransition flag or an AmbCoeffIdxTransition flag) and provide it to the bitstream generation unit 42 so that the flag may be included in the bitstream 21 (possibly as part of the side channel information).

In addition to specifying the ambient coefficient transition flag, the coefficient reduction unit 46 may also modify how the reduced foreground V[k] vectors 55 are generated. In one example, upon determining that one of the ambient HOA coefficients is in transition during the current frame, the coefficient reduction unit 46 may specify, for each of the V-vectors of the reduced foreground V[k] vectors 55, a vector coefficient (which may also be referred to as a "vector element" or "element") that corresponds to the ambient HOA coefficient in transition. Again, the ambient HOA coefficient in transition may be added to or removed from the BG_TOT total number of background coefficients. The resulting change in the total number of background coefficients therefore affects whether the ambient HOA coefficient is included in the bitstream, and whether the corresponding element of the V-vectors is included for the V-vectors specified in the bitstream in the second and third configuration modes described above. More information on how the coefficient reduction unit 46 may specify the reduced foreground V[k] vectors 55 to overcome the changes in energy is provided in U.S. Application No. 14/594,533, filed January 12, 2015, entitled "TRANSITIONING OF AMBIENT HIGHER_ORDER AMBISONIC COEFFICIENTS."

FIG. 4 is a block diagram illustrating the audio decoding device 24 of FIG. 2 in more detail. As shown in the example of FIG. 4, the audio decoding device 24 may include an extraction unit 72, a direction-based reconstruction unit 90, and a vector-based reconstruction unit 92. Although described below, more information regarding the audio decoding device 24 and the various aspects of decompressing or otherwise decoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, filed May 29, 2014, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD."

The extraction unit 72 may represent a unit configured to receive the bitstream 21 and extract the various encoded versions of the HOA coefficients 11 (e.g., a direction-based encoded version or a vector-based encoded version). The extraction unit 72 may determine, from the syntax element noted above, whether the HOA coefficients 11 were encoded via the direction-based version or the vector-based version. When direction-based encoding was performed, the extraction unit 72 may extract the direction-based version of the HOA coefficients 11 and the syntax elements associated with that encoded version (denoted as direction-based information 91 in the example of FIG. 4), and pass the direction-based information 91 to the direction-based reconstruction unit 90. The direction-based reconstruction unit 90 may represent a unit configured to reconstruct the HOA coefficients, in the form of HOA coefficients 11', based on the direction-based information 91. The bitstream and the arrangement of syntax elements within the bitstream are described in more detail below with respect to the examples of FIGS. 7A to 7J.

When the syntax element indicates that the HOA coefficients 11 were encoded using vector-based synthesis, the extraction unit 72 may extract the coded foreground V[k] vectors 57 (which may include coded weights 57 and/or indices 63, or scalar-quantized V-vectors), the encoded ambient HOA coefficients 59, and the corresponding audio objects 61 (which may also be referred to as the encoded nFG signals 61). The audio objects 61 each correspond to one of the vectors 57. The extraction unit 72 may pass the coded foreground V[k] vectors 57 to the V-vector reconstruction unit 74, and provide the encoded ambient HOA coefficients 59 along with the encoded nFG signals 61 to the psychoacoustic decoding unit 80.

To extract the coded foreground V[k] vectors 57, the extraction unit 72 may extract syntax elements in accordance with the following ChannelSideInfoData (CSID) syntax table.

Table - Syntax of ChannelSideInfoData(i)

The semantics for the preceding table are as follows.

This payload holds the side information for the i-th channel. The size and the data of the payload depend on the type of the channel.

ChannelType[i] This element stores the type of the i-th channel, as defined in Table 95.

ActiveDirsIds[i] This element indicates the direction of the active directional signal using an index into the 900 predefined, uniformly distributed points from Annex F.7. The codeword 0 is used to signal the end of a directional signal.

PFlag[i] The prediction flag associated with the vector-based signal of the i-th channel.

CbFlag[i] The codebook flag used for Huffman decoding of the scalar-quantized V-vector associated with the vector-based signal of the i-th channel.

CodebkIdx[i] Signals the specific codebook used to dequantize the vector-quantized V-vector associated with the vector-based signal of the i-th channel.

NbitsQ[i] This index determines the Huffman table used for Huffman decoding of the data associated with the vector-based signal of the i-th channel. The codeword 5 determines the use of a uniform 8-bit dequantizer. The two MSBs 00 determine reuse of the NbitsQ[i], PFlag[i], and CbFlag[i] data of the previous frame (k-1).

bA, bB The msb (bA) and second msb (bB) of the NbitsQ[i] field.

uintC The codeword of the remaining two bits of the NbitsQ[i] field.

NumVecIndices The number of vectors used to dequantize a vector-quantized V-vector.

AddAmbHoaInfoChannel(i) This payload holds the information for additional ambient HOA coefficients.

In accordance with the CSID syntax table, the extraction unit 72 may first obtain a ChannelType syntax element indicating the type of the channel (e.g., where a value of 0 signals a direction-based signal, a value of 1 signals a vector-based signal, and a value of 2 signals an additional ambient HOA signal). Based on the ChannelType syntax element, the extraction unit 72 may switch between three cases.
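The three-way switch on ChannelType can be sketched as follows. This is a minimal illustration that assumes only the three values named above; the enum, the function name, and the rejection of any other value are hypothetical and are not taken from the syntax table.

```python
from enum import IntEnum


class ChannelType(IntEnum):
    """Channel types named in the text; member names are illustrative."""
    DIRECTION_BASED = 0
    VECTOR_BASED = 1
    ADDITIONAL_AMBIENT_HOA = 2


def select_case(channel_type: int) -> str:
    """Return which side-information fields apply to a channel."""
    # Values other than 0, 1, 2 raise ValueError (an assumption of this sketch).
    kind = ChannelType(channel_type)
    if kind is ChannelType.DIRECTION_BASED:
        return "read ActiveDirsIds"
    if kind is ChannelType.VECTOR_BASED:
        return "read NbitsQ, PFlag, CbFlag, CodebkIdx, NumVecIndices"
    return "read AddAmbHoaInfoChannel"
```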

Focusing on case 1 to illustrate one example of the techniques described in this disclosure, the extraction unit 72 may obtain the most significant bit of the NbitsQ syntax element (i.e., the bA syntax element in the example CSID syntax table above) and the second most significant bit of the NbitsQ syntax element (i.e., the bB syntax element in the example CSID syntax table above). The (k)[i] of NbitsQ(k)[i] may denote that the NbitsQ syntax element is obtained for the k-th frame of the i-th transport channel. The NbitsQ syntax element may represent one or more bits indicating the quantization mode used to quantize the spatial component of the soundfield represented by the HOA coefficients 11. The spatial component may also be referred to in this disclosure as a V-vector or as the coded foreground V[k] vectors 57.

In the example CSID syntax table above, the NbitsQ syntax element may include four bits to indicate one of 12 quantization modes used to compress the vector specified in the corresponding VVecData field (with values zero through three of the NbitsQ syntax element being reserved or unused). The 12 quantization modes are as follows:

0-3: Reserved

4: Vector quantization

5: Scalar quantization without Huffman coding

6: 6-bit scalar quantization with Huffman coding

7: 7-bit scalar quantization with Huffman coding

8: 8-bit scalar quantization with Huffman coding

…

16: 16-bit scalar quantization with Huffman coding

In the above, a value of the NbitsQ syntax element from 6 to 16 indicates not only that scalar quantization with Huffman coding is to be performed, but also the quantization step size of the scalar quantization. In this respect, the quantization modes may comprise a vector quantization mode, a scalar quantization mode without Huffman coding, and a scalar quantization mode with Huffman coding.
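The mapping from an NbitsQ value to its quantization mode can be sketched as below. The function name is hypothetical. Because a four-bit field can only carry the values 0 through 15, the sketch rejects anything outside that range; that restriction is an assumption of this illustration rather than something the mode list states.

```python
def quantization_mode(nbits_q: int) -> str:
    """Classify a four-bit NbitsQ value into one of the modes listed above."""
    if not 0 <= nbits_q <= 15:
        raise ValueError("NbitsQ is modelled here as a four-bit field")
    if nbits_q <= 3:
        return "reserved"
    if nbits_q == 4:
        return "vector quantization"
    if nbits_q == 5:
        return "scalar quantization without Huffman coding"
    # For 6 and above, the value also gives the scalar quantization step size.
    return f"{nbits_q}-bit scalar quantization with Huffman coding"
```

Counting the values of this four-bit field that are not reserved gives 12, which matches the number of quantization modes stated in the text.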

Returning to the example CSID syntax table above, the extraction unit 72 may combine the bA syntax element with the bB syntax element, where this combination may be an addition, as shown in the example CSID syntax table above. The combined bA/bB syntax element may represent an indicator of whether to reuse, from the previous frame, at least one syntax element indicative of information used when compressing the vector. The extraction unit 72 next compares the combined bA/bB syntax element to a value of zero. When the combined bA/bB syntax element has a value of zero, the extraction unit 72 may determine that the quantization mode information for the current k-th frame of the i-th transport channel (i.e., the NbitsQ syntax element indicating the quantization mode in the example CSID syntax table above) is the same as the quantization mode information of the (k-1)-th frame of the i-th transport channel. In other words, when set to a value of zero, the indicator signals reuse of the at least one syntax element from the previous frame.

The extraction unit 72 similarly determines that the prediction information for the current k-th frame of the i-th transport channel (i.e., the PFlag syntax element, which in this example indicates whether prediction is performed during vector quantization or scalar quantization) is the same as the prediction information of the (k-1)-th frame of the i-th transport channel. The extraction unit 72 may also determine that the Huffman codebook information for the current k-th frame of the i-th transport channel (i.e., the CbFlag syntax element indicating the Huffman codebook used to reconstruct the V-vector) is the same as the Huffman codebook information of the (k-1)-th frame of the i-th transport channel. The extraction unit 72 may also determine that the vector quantization information for the current k-th frame of the i-th transport channel (i.e., the CodebkIdx syntax element indicating the vector quantization codebook used to reconstruct the V-vector, and the NumVecIndices syntax element indicating the number of code vectors used to reconstruct the V-vector) is the same as the vector quantization information of the (k-1)-th frame of the i-th transport channel.

When the combined bA/bB syntax element does not have a value of zero, the extraction unit 72 may determine that the quantization mode information, the prediction information, the Huffman codebook information, and the vector quantization information for the k-th frame of the i-th transport channel are not the same as those of the (k-1)-th frame of the i-th transport channel. As a result, the extraction unit 72 may obtain the least significant bits of the NbitsQ syntax element (i.e., the uintC syntax element in the example CSID syntax table above) and combine the bA, bB, and uintC syntax elements to obtain the NbitsQ syntax element. Based on this NbitsQ syntax element, the extraction unit 72 may obtain the PFlag, CodebkIdx, and NumVecIndices syntax elements when the NbitsQ syntax element signals vector quantization, or the PFlag and CbFlag syntax elements when the NbitsQ syntax element signals scalar quantization with Huffman coding. In this way, the extraction unit 72 may extract the foregoing syntax elements used to reconstruct the V-vector and pass these syntax elements to the vector-based reconstruction unit 92.
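The reuse logic described above can be sketched as follows. This is an illustration under stated assumptions, not the normative parser: the `BitReader` helper, the function name, and the dictionary used to carry the previous frame's values are hypothetical, and the bit widths of CodebkIdx and NumVecIndices are placeholder parameters because the text does not give them.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def read_vector_side_info(reader, prev, codebk_idx_bits=3, num_vec_indices_bits=3):
    """Parse the vector-based side info of one channel for frame k.

    prev holds the values obtained for frame k-1 of the same channel.
    The two *_bits defaults are assumptions made for this sketch.
    """
    b_a = reader.read(1)  # msb of NbitsQ
    b_b = reader.read(1)  # second msb of NbitsQ
    if b_a + b_b == 0:
        # Indicator is zero: reuse every syntax element of the previous frame.
        return dict(prev)
    uint_c = reader.read(2)  # remaining two bits of NbitsQ
    nbits_q = (b_a << 3) | (b_b << 2) | uint_c
    info = {"NbitsQ": nbits_q}
    if nbits_q == 4:  # vector quantization
        info["PFlag"] = reader.read(1)
        info["CodebkIdx"] = reader.read(codebk_idx_bits)
        info["NumVecIndices"] = reader.read(num_vec_indices_bits)
    elif nbits_q >= 6:  # scalar quantization with Huffman coding
        info["PFlag"] = reader.read(1)
        info["CbFlag"] = reader.read(1)
    return info
```

Because the values 0 through 3 of NbitsQ are reserved, the bit pair 00 never occurs in a valid NbitsQ value, which is what frees it to serve as the reuse indicator at a cost of two bits per channel per frame.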

The extraction unit 72 may next extract the V-vector from the k-th frame of the i-th transport channel. The extraction unit 72 may obtain the HOADecoderConfig container, which includes the syntax element denoted CodedVVecLength, and parse CodedVVecLength from the HOADecoderConfig container. The extraction unit 72 may obtain the V-vector in accordance with the following VVecData syntax table.

VVec(k)[i] This vector is the V-vector of the k-th HOAframe() for the i-th channel.

VVecLength This variable indicates the number of vector elements to read out.

VVecCoeffId This vector contains the indices of the transmitted V-vector coefficients.

VecVal An integer value between 0 and 255.

aVal A temporary variable used during decoding of the VVectorData.

huffVal A Huffman codeword to be Huffman decoded.

SgnVal This symbol is the coded sign value used during decoding.

intAddVal This symbol is an additional integer value used during decoding.

NumVecIndices The number of vectors used to dequantize a vector-quantized V-vector.

WeightIdx The index into WeightValCdbk used to dequantize a vector-quantized V-vector.

nBitsW The field size for reading WeightIdx to decode a vector-quantized V-vector.

WeightValCdbk A codebook containing vectors of positive real-valued weighting coefficients. Necessary only if NumVecIndices > 1. A WeightValCdbk with 256 entries is provided.

WeightValPredCdbk A codebook containing vectors of predictive weighting coefficients. Necessary only if NumVecIndices > 1. A WeightValPredCdbk with 256 entries is provided.

WeightValAlpha The predictive coding coefficient used for the predictive coding mode of the V-vector quantization.

VvecIdx An index into VecDict used to dequantize a vector-quantized V-vector.

nbitsIdx The field size for reading VvecIdx to decode a vector-quantized V-vector.

WeightVal A real-valued weighting coefficient used to decode a vector-quantized V-vector.

In the foregoing syntax table, the extraction unit 72 may determine whether the value of the NbitsQ syntax element equals four (or, in other words, signals that vector dequantization is used to reconstruct the V-vector). When the value of the NbitsQ syntax element equals four, the extraction unit 72 may compare the value of the NumVecIndices syntax element to a value of one. When the value of NumVecIndices equals one, the extraction unit 72 may obtain a VecIdx syntax element. The VecIdx syntax element may represent one or more bits indicating an index into VecDict used to dequantize the vector-quantized V-vector. The extraction unit 72 may instantiate a VecIdx array, with the zeroth element set to the value of the VecIdx syntax element plus one. The extraction unit 72 may also obtain a SgnVal syntax element. The SgnVal syntax element may represent one or more bits indicating the coded sign value used during decoding of the V-vector. The extraction unit 72 may instantiate a WeightVal array, setting the zeroth element as a function of the value of the SgnVal syntax element.

When the value of the NumVecIndices syntax element does not equal one, the extraction unit 72 may obtain a WeightIdx syntax element. The WeightIdx syntax element may represent one or more bits indicating an index into the WeightValCdbk array used to dequantize the vector-quantized V-vector. The WeightValCdbk array may represent a codebook containing vectors of positive real-valued weighting coefficients. The extraction unit 72 may next determine nbitsIdx as a function of the NumOfHoaCoeffs syntax element specified in the HOAConfig container (specified, as one example, at the start of the bitstream 21). The extraction unit 72 may then iterate through NumVecIndices, obtaining a VecIdx syntax element from the bitstream 21 and setting the VecIdx array elements with each obtained VecIdx syntax element.

The extraction unit 72 does not perform the PFlag syntax comparison that follows, which involves determining tmpWeightVal variable values that are unrelated to extracting syntax elements from the bitstream 21. As such, the extraction unit 72 may next obtain the SgnVal syntax element for use in determining the WeightVal syntax element.

When the value of the NbitsQ syntax element equals five (signaling that scalar dequantization without Huffman decoding is used to reconstruct the V-vector), the extraction unit 72 iterates from 0 to VVecLength, setting the aVal variable to the VecVal syntax element obtained from the bitstream 21. The VecVal syntax element may represent one or more bits indicating an integer between 0 and 255.

When the value of the NbitsQ syntax element is equal to or greater than six (signaling that NbitsQ-bit scalar dequantization with Huffman decoding is used to reconstruct the V-vector), the extraction unit 72 iterates from 0 to VVecLength, obtaining one or more of the huffVal, SgnVal, and intAddVal syntax elements. The huffVal syntax element may represent one or more bits indicating a Huffman codeword. The intAddVal syntax element may represent one or more bits indicating an additional integer value used during decoding. The extraction unit 72 may provide these syntax elements to the vector-based reconstruction unit 92.

The vector-based reconstruction unit 92 may represent a unit configured to perform operations reciprocal to those described above with respect to the vector-based synthesis unit 27 so as to reconstruct the HOA coefficients 11'. The vector-based reconstruction unit 92 may include a V-vector reconstruction unit 74, a spatio-temporal interpolation unit 76, a foreground formulation unit 78, a psychoacoustic decoding unit 80, an HOA coefficient formulation unit 82, a fade unit 770, and a reorder unit 84. The dashed lines of the fade unit 770 indicate that the fade unit 770 may be an optional unit in terms of being included in the vector-based reconstruction unit 92.

The V-vector reconstruction unit 74 may represent a unit configured to reconstruct the V-vectors from the coded foreground V[k] vectors 57. The V-vector reconstruction unit 74 may operate in a manner reciprocal to that of the quantization unit 52.

In other words, the V-vector reconstruction unit 74 may operate in accordance with the following pseudocode to reconstruct the V-vectors:

In accordance with the foregoing pseudocode, the V-vector reconstruction unit 74 may obtain the NbitsQ syntax element for the k-th frame of the i-th transport channel. When the NbitsQ syntax element equals four (which, again, signals that vector quantization was performed), the V-vector reconstruction unit 74 may compare the NumVecIndices syntax element to one. As described above, the NumVecIndices syntax element may represent one or more bits indicating the number of vectors used to dequantize the vector-quantized V-vector. When the value of the NumVecIndices syntax element equals one, the V-vector reconstruction unit 74 may then iterate from zero up to the value of the VVecLength syntax element, setting the idx variable to VVecCoeffId and setting the VVecCoeffId-th V-vector element ($v^{(i)}_{\text{VVecCoeffId}[m]}(k)$) to WeightVal multiplied by the VecDict entry identified by [900][VecIdx[0]][idx]. In other words, when the value of NumVecIndices equals one, the vector codebook HOA expansion coefficients are derived from Table F.8 in conjunction with the codebook of 8×1 weighting values shown in Table F.11.

When the value of the NumVecIndices syntax element does not equal one, the V-vector reconstruction unit 74 may set the cdbLen variable to O, which is a variable denoting the number of vectors. The cdbLen syntax element indicates the number of entries in the dictionary or codebook of code vectors (where this dictionary is denoted "VecDict" in the foregoing pseudocode and represents a codebook with cdbLen codebook entries containing vectors of HOA expansion coefficients used to decode a vector-quantized V-vector). When the order of the HOA coefficients 11 (denoted by "N") equals four, the V-vector reconstruction unit 74 may set the cdbLen variable to 32. The V-vector reconstruction unit 74 may next iterate from zero to O, setting the TmpVVec array to zero. During this iteration, the V-vector reconstruction unit 74 may also iterate from zero to the value of the NumVecIndices syntax element, setting the m-th entry of the TmpVVec array equal to the j-th WeightVal multiplied by the [cdbLen][VecIdx[j]][m] entry of VecDict.
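The nested iteration described above amounts to a weighted sum of code vectors. The sketch below illustrates that idea under several assumptions: the function name and the tiny dictionary in the usage example are hypothetical, the code-vector indices are taken as zero-based, the contributions of the selected code vectors are accumulated into TmpVVec, and the subsequent FNorm normalization is left out.

```python
def weighted_code_vector_sum(vec_dict, vec_idx, weight_val, num_coeffs):
    """Accumulate weighted code vectors into a temporary V-vector.

    vec_dict   -- list of code vectors, each of length num_coeffs
                  (stands in for one cdbLen slice of VecDict)
    vec_idx    -- zero-based code-vector indices (assumption of this sketch)
    weight_val -- one weight per selected code vector
    """
    tmp_vvec = [0.0] * num_coeffs  # TmpVVec initialised to zero
    for j, idx in enumerate(vec_idx):
        for m in range(num_coeffs):
            tmp_vvec[m] += weight_val[j] * vec_dict[idx][m]
    return tmp_vvec
```

For example, with a three-entry dictionary `[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]`, indices `[0, 2]`, and weights `[0.5, -0.25]`, the call returns `[0.25, -0.25, -0.25, -0.25]`.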

The V-vector reconstruction unit 74 may derive WeightVal in accordance with the following pseudocode:

In the foregoing pseudocode, the V-vector reconstruction unit 74 may iterate from zero up to the value of the NumVecIndices syntax element, first determining whether the value of the PFlag syntax element equals zero. When the PFlag syntax element equals zero, the V-vector reconstruction unit 74 may set the tmpWeightVal variable equal to the [CodebkIdx][WeightIdx] entry of the WeightValCdbk codebook. When the value of the PFlag syntax element does not equal zero, the V-vector reconstruction unit 74 may set the tmpWeightVal variable equal to the [CodebkIdx][WeightIdx] entry of the WeightValPredCdbk codebook plus the WeightValAlpha variable multiplied by the tmpWeightVal of the (k-1)-th frame of the i-th transport channel. The WeightValAlpha variable may refer to the alpha value noted above, which may be statically defined at the audio encoding and decoding devices 20 and 24. The V-vector reconstruction unit 74 may then obtain WeightVal as a function of the SgnVal syntax element obtained by the extraction unit 72 and the tmpWeightVal variable.
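The weight derivation described above can be sketched as follows. The function name and argument layout are hypothetical. Two details are assumptions of this sketch rather than statements from the text: each [CodebkIdx][WeightIdx] codebook entry is treated as a vector with one weight per code vector j, and a SgnVal of 1 is taken to mean a positive sign and 0 a negative sign.

```python
def derive_weight_vals(p_flag, codebk_idx, weight_idx, sgn_val,
                       weight_val_cdbk, weight_val_pred_cdbk,
                       weight_val_alpha, prev_tmp_weight_val):
    """Return (WeightVal, tmpWeightVal) for the current frame.

    prev_tmp_weight_val is tmpWeightVal of frame k-1 for the same channel
    and is only used when p_flag is non-zero (predictive mode).
    """
    weight_val, tmp_weight_val = [], []
    for j in range(len(sgn_val)):
        if p_flag == 0:
            # Non-predictive mode: take the weight straight from the codebook.
            tmp = weight_val_cdbk[codebk_idx][weight_idx][j]
        else:
            # Predictive mode: codebook residual plus scaled previous weight.
            tmp = (weight_val_pred_cdbk[codebk_idx][weight_idx][j]
                   + weight_val_alpha * prev_tmp_weight_val[j])
        tmp_weight_val.append(tmp)
        sign = 1.0 if sgn_val[j] else -1.0  # sign convention is an assumption
        weight_val.append(sign * tmp)
    return weight_val, tmp_weight_val
```

Returning tmpWeightVal alongside WeightVal lets the caller keep it as the prediction state for the next frame of the same transport channel.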

In other words, the V-vector reconstruction unit 74 may derive the weight value for each corresponding code vector used to reconstruct the V-vector based on a weight value codebook (denoted "WeightValCdbk" for non-predicted vector quantization and "WeightValPredCdbk" for predicted vector quantization, both of which may represent a multi-dimensional table indexed based on one or more of a codebook index (denoted as the "CodebkIdx" syntax element in the foregoing VVectorData(i) syntax table) and a weight index (denoted as the "WeightIdx" syntax element in the foregoing VVectorData(i) syntax table)). This CodebkIdx syntax element may be defined in a portion of the side channel information, as shown in the ChannelSideInfoData(i) syntax table below.

The remaining vector quantization portion of the above pseudocode relates to computing FNorm to normalize the elements of the V-vector, and then computing the V-vector element ($v^{(i)}_{\text{VVecCoeffId}[m]}(k)$) as TmpVVec[idx] multiplied by FNorm. The V-vector reconstruction unit 74 may obtain the idx variable as a function of VVecCoeffId.

When NbitsQ equals 5, a uniform 8-bit scalar dequantization is performed. In contrast, an NbitsQ value greater than or equal to 6 may result in the application of Huffman decoding. The cid value noted above may be equal to the two least significant bits of the NbitsQ value. The prediction mode is denoted as PFlag in the above syntax table, while the Huffman table information bit is denoted as CbFlag in the above syntax table. The remaining syntax specifies how the decoding occurs in a manner substantially similar to that described above.

The psychoacoustic decoding unit 80 may operate in a manner reciprocal to the psychoacoustic audio coder unit 40 shown in the example of FIG. 3 so as to decode the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 and thereby generate energy-compensated ambient HOA coefficients 47' and interpolated nFG signals 49' (which may also be referred to as interpolated nFG audio objects 49'). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground formulation unit 78.

The spatio-temporal interpolation unit 76 may operate in a manner similar to that described above with respect to the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 76 may receive the reduced foreground V[k] vectors 55_k and perform spatio-temporal interpolation with respect to the foreground V[k] vectors 55_k and the reduced foreground V[k-1] vectors 55_{k-1} to generate interpolated foreground V[k] vectors 55_k''. The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k'' to the fade unit 770.

The extraction unit 72 may also output a signal 757, indicating when one of the ambient HOA coefficients is in transition, to the fade unit 770, which may then determine which of the SHC_BG 47′ (where the SHC_BG 47′ may also be denoted as "ambient HOA channels 47′" or "ambient HOA coefficients 47′") and the elements of the interpolated foreground V[k] vectors 55k″ are to be faded in or faded out. In some examples, the fade unit 770 may operate in opposite ways with respect to each of the ambient HOA coefficients 47′ and the elements of the interpolated foreground V[k] vectors 55k″. That is, the fade unit 770 may perform a fade-in or a fade-out, or both, with respect to a corresponding one of the ambient HOA coefficients 47′, while performing a fade-in or a fade-out, or both, with respect to the corresponding one of the elements of the interpolated foreground V[k] vectors 55k″. The fade unit 770 may output adjusted ambient HOA coefficients 47″ to the HOA coefficient formulation unit 82 and adjusted foreground V[k] vectors 55k‴ to the foreground formulation unit 78. In this respect, the fade unit 770 represents a unit configured to perform a fade operation with respect to various aspects of the HOA coefficients or derivatives thereof (e.g., in the form of the ambient HOA coefficients 47′ and the elements of the interpolated foreground V[k] vectors 55k″).

The foreground formulation unit 78 may represent a unit configured to perform matrix multiplication with respect to the adjusted foreground V[k] vectors 55k‴ and the interpolated nFG signals 49′ to generate the foreground HOA coefficients 65. In this respect, the foreground formulation unit 78 may combine the audio objects 49′ (which is another way of denoting the interpolated nFG signals 49′) with the vectors 55k‴ to reconstruct the foreground (or, in other words, predominant) aspects of the HOA coefficients 11′. The foreground formulation unit 78 may perform a matrix multiplication of the interpolated nFG signals 49′ by the adjusted foreground V[k] vectors 55k‴.

The HOA coefficient formulation unit 82 may represent a unit configured to combine the foreground HOA coefficients 65 with the adjusted ambient HOA coefficients 47″ so as to obtain the HOA coefficients 11′. The prime notation reflects that the HOA coefficients 11′ may be similar to, but not the same as, the HOA coefficients 11. The differences between the HOA coefficients 11 and 11′ may result from loss due to transmission over a lossy transmission medium, quantization, or other lossy operations.
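As a non-limiting illustration, the two formulation steps may be sketched as a matrix multiplication followed by an element-wise addition. The matrix orientation is an assumption made for this sketch (signals as samples by nFG, V-vectors as nFG by coefficients, ambient coefficients as samples by coefficients), and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x n) and b (n x p)."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def formulate_hoa(nfg_signals: Matrix, v_vectors: Matrix,
                  ambient: Matrix) -> Matrix:
    """Foreground = nFG signals x V-vectors; result = foreground + ambient."""
    foreground = matmul(nfg_signals, v_vectors)
    return [[f + a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(foreground, ambient)]
```

In this sketch each output row holds the reconstructed HOA coefficients for one sample; a practical implementation would likely use an optimized linear algebra routine instead of nested lists.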

FIG. 5A is a flowchart illustrating exemplary operation of an audio encoding device (e.g., the audio encoding device 20 shown in the example of FIG. 3) in performing various aspects of the vector-based synthesis techniques described in this disclosure. Initially, the audio encoding device 20 receives the HOA coefficients 11 (106). The audio encoding device 20 may invoke the LIT unit 30, which may apply a LIT with respect to the HOA coefficients to output transformed HOA coefficients (e.g., in the case of SVD, the transformed HOA coefficients may include the US[k] vectors 33 and the V[k] vectors 35) (107).

The audio encoding device 20 may next invoke the parameter calculation unit 32 to perform the analysis described above with respect to any combination of the US[k] vectors 33, the US[k-1] vectors 33, and the V[k] and/or V[k-1] vectors 35 so as to identify various parameters in the manner described above. That is, the parameter calculation unit 32 may determine at least one parameter based on an analysis of the transformed HOA coefficients 33/35 (108).

The audio encoding device 20 may then invoke the reorder unit 34, which may reorder the transformed HOA coefficients (which, again in the context of SVD, may refer to the US[k] vectors 33 and the V[k] vectors 35) based on the parameter to generate reordered transformed HOA coefficients 33′/35′ (or, in other words, the US[k] vectors 33′ and the V[k] vectors 35′), as described above (109). During any of the foregoing operations or subsequent operations, the audio encoding device 20 may also invoke the soundfield analysis unit 44. As described above, the soundfield analysis unit 44 may perform a soundfield analysis with respect to the HOA coefficients 11 and/or the transformed HOA coefficients 33/35 to determine the total number of foreground channels (nFG) 45, the order of the background soundfield (N_BG), and the number (nBGa) and indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of FIG. 3) (109).

The audio encoding device 20 may also invoke the background selection unit 48. The background selection unit 48 may determine the background or ambient HOA coefficients 47 based on the background channel information 43 (110). The audio encoding device 20 may further invoke the foreground selection unit 36, which may select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), the reordered US[k] vectors 33′ and the reordered V[k] vectors 35′ that represent foreground or distinct components of the soundfield (112).

The audio encoding device 20 may invoke the energy compensation unit 38. The energy compensation unit 38 may perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for energy loss due to removal of various ones of the HOA coefficients by the background selection unit 48 (114), thereby generating energy-compensated ambient HOA coefficients 47′.

The audio encoding device 20 may also invoke the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 50 may perform spatio-temporal interpolation with respect to the reordered transformed HOA coefficients 33′/35′ to obtain the interpolated foreground signals 49′ (which may also be referred to as the "interpolated nFG signals 49′") and the remaining foreground directional information 53 (which may also be referred to as the "V[k] vectors 53") (116). The audio encoding device 20 may then invoke the coefficient reduction unit 46. The coefficient reduction unit 46 may perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43 to obtain reduced foreground directional information 55 (which may also be referred to as the reduced foreground V[k] vectors 55) (118).

The audio encoding device 20 may then invoke the quantization unit 52 to compress, in the manner described above, the reduced foreground V[k] vectors 55 and generate coded foreground V[k] vectors 57 (120).

The audio encoding device 20 may also invoke the psychoacoustic audio coder unit 40. The psychoacoustic audio coder unit 40 may psychoacoustically code each vector of the energy-compensated ambient HOA coefficients 47′ and the interpolated nFG signals 49′ to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61. The audio encoding device 20 may then invoke the bitstream generation unit 42. The bitstream generation unit 42 may generate the bitstream 21 based on the coded foreground directional information 57, the coded ambient HOA coefficients 59, the coded nFG signals 61, and the background channel information 43.

FIG. 5B is a flowchart illustrating exemplary operation of an audio encoding device in performing the coding techniques described in this disclosure. The bitstream generation unit 42 of the audio encoding device 20 shown in the example of FIG. 3 may represent one example unit configured to perform the techniques described in this disclosure. The bitstream generation unit 42 may determine whether the quantization mode of a frame is the same as the quantization mode of a temporally previous frame (which may be denoted as a "second frame") (314). Although described with respect to a previous frame, the techniques may be performed with respect to a temporally subsequent frame. A frame may include a portion of one or more transport channels. The portion of a transport channel may include ChannelSideInfoData (formed in accordance with the ChannelSideInfoData syntax table) along with some payload (e.g., the VVectorData field 156 in the example of FIG. 7). Other examples of payloads may include an AddAmbientHOACoeffs field.

When the quantization modes are the same ("YES" 316), the bitstream generation unit 42 may specify a portion of the quantization mode in the bitstream 21 (318). The portion of the quantization mode may include the bA syntax element and the bB syntax element, but not the uintC syntax element. The bA syntax element may represent a bit indicating the most significant bit of the NbitsQ syntax element. The bB syntax element may represent a bit indicating the second most significant bit of the NbitsQ syntax element. The bitstream generation unit 42 may set the value of each of the bA syntax element and the bB syntax element to zero, thereby signaling that the quantization mode field in the bitstream 21 (i.e., the NbitsQ field, as one example) does not include the uintC syntax element. This signaling of zero-valued bA and bB syntax elements also indicates that the NbitsQ value, the PFlag value, the CbFlag value, and the CodebkIdx value from the previous frame are to be used as the corresponding values of the same syntax elements for the current frame.

When the quantization modes are not the same ("NO" 316), the bitstream generation unit 42 may specify one or more bits indicative of the entire quantization mode in the bitstream 21 (320). That is, the bitstream generation unit 42 may specify the bA, bB, and uintC syntax elements in the bitstream 21. The bitstream generation unit 42 may also specify quantization information based on the quantization mode (322). This quantization information may include any information concerning quantization, such as vector quantization information, prediction information, and Huffman codebook information. As one example, the vector quantization information may include one or both of a CodebkIdx syntax element and a NumVecIndices syntax element. As one example, the prediction information may include a PFlag syntax element. As one example, the Huffman codebook information may include a CbFlag syntax element.
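As a non-limiting illustration, the encoder-side decision of FIG. 5B may be sketched as follows. The function name, the representation of the bitstream as a list of named fields, and the two-bit width of uintC are assumptions made for this sketch. The sketch also reuses the previous frame only when the quantization information is unchanged, and rejects NbitsQ values whose bA and bB bits are both zero, since those would be indistinguishable from the reuse signal; both choices are assumptions rather than requirements stated above.

```python
from typing import Dict, List, Optional, Tuple

Field = Tuple[str, int]


def write_quant_mode(cur_nbits_q: int,
                     cur_info: Dict[str, int],
                     prev_nbits_q: Optional[int] = None,
                     prev_info: Optional[Dict[str, int]] = None) -> List[Field]:
    """Return the fields an encoder would write for one transport channel."""
    if not 4 <= cur_nbits_q <= 15:
        # bA == bB == 0 is reserved in this sketch for the reuse signal.
        raise ValueError("NbitsQ must have bA or bB set")
    if cur_nbits_q == prev_nbits_q and cur_info == prev_info:
        # Same mode as the previous frame: write only bA = bB = 0 (318).
        return [("bA", 0), ("bB", 0)]
    # Different mode: write the entire quantization mode (320) ...
    fields: List[Field] = [
        ("bA", (cur_nbits_q >> 3) & 1),
        ("bB", (cur_nbits_q >> 2) & 1),
        ("uintC", cur_nbits_q & 0b11),
    ]
    # ... followed by the quantization information (322), e.g. PFlag, CbFlag.
    fields.extend(cur_info.items())
    return fields
```

When the mode repeats, only two fields are written in place of the full mode and its side information, which is the saving the reuse indication is intended to provide.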

In this respect, the techniques may enable the audio encoding device 20 to be configured to obtain a bitstream 21 comprising a compressed version of a spatial component of a soundfield. The spatial component may be generated by performing a vector-based synthesis with respect to a plurality of spherical harmonic coefficients. The bitstream may further comprise an indicator of whether to reuse, from a previous frame, one or more bits of a header field that specifies information used when compressing the spatial component.

In other words, the techniques may enable the audio encoding device 20 to be configured to obtain a bitstream 21 comprising a vector 57 representative of an orthogonal spatial axis in a spherical harmonic domain. The bitstream 21 may further comprise an indicator (e.g., the bA/bB syntax elements of the NbitsQ syntax element) of whether to reuse, from a previous frame, at least one syntax element indicative of information used when compressing (e.g., quantizing) the vector.

FIG. 6A is a flowchart illustrating exemplary operation of an audio decoding device (e.g., the audio decoding device 24 shown in FIG. 4) in performing various aspects of the techniques described in this disclosure. Initially, the audio decoding device 24 may receive the bitstream 21 (130). Upon receiving the bitstream, the audio decoding device 24 may invoke the extraction unit 72. Assuming for purposes of discussion that the bitstream 21 indicates that vector-based reconstruction is to be performed, the extraction unit 72 may parse the bitstream to retrieve the information noted above and pass the information to the vector-based reconstruction unit 92.

In other words, the extraction unit 72 may extract, in the manner described above, the coded foreground directional information 57 (which, again, may also be referred to as the coded foreground V[k] vectors 57), the coded ambient HOA coefficients 59, and the coded foreground signals (which may also be referred to as the coded foreground nFG signals 61 or the coded foreground audio objects 61) from the bitstream 21 (132).

The audio decoding device 24 may further invoke the dequantization unit 74. The dequantization unit 74 may entropy decode and dequantize the coded foreground directional information 57 to obtain reduced foreground directional information 55k (136). The audio decoding device 24 may also invoke the psychoacoustic decoding unit 80. The psychoacoustic decoding unit 80 may decode the encoded ambient HOA coefficients 59 and the encoded foreground signals 61 to obtain energy-compensated ambient HOA coefficients 47′ and the interpolated foreground signals 49′ (138). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47′ to the fade unit 770 and the nFG signals 49′ to the foreground formulation unit 78.

The audio decoding device 24 may next invoke the spatio-temporal interpolation unit 76. The spatio-temporal interpolation unit 76 may receive the reordered foreground directional information 55k′ and perform spatio-temporal interpolation with respect to the reduced foreground directional information 55k/55k-1 to generate the interpolated foreground directional information 55k″ (140). The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55k″ to the fade unit 770.

The audio decoding device 24 may invoke the fade unit 770. The fade unit 770 may receive or otherwise obtain (e.g., from the extraction unit 72) syntax elements (e.g., an AmbCoeffTransition syntax element) indicative of when the energy-compensated ambient HOA coefficients 47′ are in transition. The fade unit 770 may, based on the transition syntax elements and maintained transition state information, fade in or fade out the energy-compensated ambient HOA coefficients 47′, outputting adjusted ambient HOA coefficients 47″ to the HOA coefficient formulation unit 82. The fade unit 770 may also, based on the syntax elements and the maintained transition state information, fade out or fade in the corresponding one or more elements of the interpolated foreground V[k] vectors 55k″, outputting the adjusted foreground V[k] vectors 55k‴ to the foreground formulation unit 78 (142).

The audio decoding device 24 may invoke the foreground formulation unit 78. The foreground formulation unit 78 may perform a matrix multiplication of the nFG signals 49′ by the adjusted foreground directional information 55k‴ to obtain the foreground HOA coefficients 65 (144). The audio decoding device 24 may also invoke the HOA coefficient formulation unit 82. The HOA coefficient formulation unit 82 may add the foreground HOA coefficients 65 to the adjusted ambient HOA coefficients 47″ so as to obtain the HOA coefficients 11′ (146).

FIG. 6B is a flowchart illustrating exemplary operation of an audio decoding device in performing the coding techniques described in this disclosure. The extraction unit 72 of the audio decoding device 24 shown in the example of FIG. 4 may represent one example unit configured to perform the techniques described in this disclosure. The extraction unit 72 may obtain bits indicative of whether the quantization mode of a frame is the same as the quantization mode of a temporally previous frame (which may be denoted as a "second frame") (362). Again, although described with respect to a previous frame, the techniques may be performed with respect to a temporally subsequent frame.

When the quantization modes are the same ("YES" 364), the extraction unit 72 may obtain a portion of the quantization mode from the bitstream 21 (366). The portion of the quantization mode may include the bA syntax element and the bB syntax element, but not the uintC syntax element. The extraction unit 72 may also set the NbitsQ value, the PFlag value, the CbFlag value, the CodebkIdx value, and the NumVecIndices value for the current frame to be the same as the NbitsQ value, the PFlag value, the CbFlag value, the CodebkIdx value, and the NumVecIndices value set for the previous frame (368).

When the quantization modes are not the same ("NO" 364), the extraction unit 72 may obtain one or more bits indicative of the entire quantization mode from the bitstream 21. That is, the extraction unit 72 obtains the bA, bB, and uintC syntax elements from the bitstream 21 (370). The extraction unit 72 may also obtain one or more bits indicative of quantization information based on the quantization mode (372). As noted above with respect to FIG. 5B, the quantization information may include any information concerning quantization, such as vector quantization information, prediction information, and Huffman codebook information. As one example, the vector quantization information may include one or both of a CodebkIdx syntax element and a NumVecIndices syntax element. As one example, the prediction information may include a PFlag syntax element. As one example, the Huffman codebook information may include a CbFlag syntax element.
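As a non-limiting illustration, the decoder-side flow of FIG. 6B may be sketched as follows. The class and function names are hypothetical; the bit order (bA first, then bB, then a two-bit uintC) is an assumption made for this sketch; and the parsing of the quantization information is delegated to a caller-supplied callback because its layout depends on the syntax tables described elsewhere.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

QuantInfo = Dict[str, int]


class ChannelState:
    """Per-transport-channel state carried from one frame to the next."""

    def __init__(self) -> None:
        self.nbits_q: Optional[int] = None
        self.quant_info: QuantInfo = {}


def read_bits(bits: Iterator[int], n: int) -> int:
    """Read n bits, most significant bit first."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def parse_quant_mode(
    bits: Iterator[int],
    state: ChannelState,
    read_quant_info: Callable[[int, Iterator[int]], QuantInfo],
) -> Tuple[int, QuantInfo]:
    b_a = read_bits(bits, 1)
    b_b = read_bits(bits, 1)
    if b_a == 0 and b_b == 0:
        # Reuse the values set for the previous frame (366, 368).
        if state.nbits_q is None:
            raise ValueError("reuse signaled without a previous frame")
        return state.nbits_q, state.quant_info
    # Otherwise read the rest of the quantization mode (370) ...
    uint_c = read_bits(bits, 2)
    state.nbits_q = (b_a << 3) | (b_b << 2) | uint_c
    # ... and the quantization information that depends on it (372).
    state.quant_info = read_quant_info(state.nbits_q, bits)
    return state.nbits_q, state.quant_info
```

Keeping one ChannelState per transport channel lets the parser return syntax element values that were never written for the current frame.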

In this respect, the techniques may enable the audio decoding device 24 to be configured to obtain a bitstream 21 comprising a compressed version of a spatial component of a soundfield. The spatial component may be generated by performing a vector-based synthesis with respect to a plurality of spherical harmonic coefficients. The bitstream may further comprise an indicator of whether to reuse, from a previous frame, one or more bits of a header field that specifies information used when compressing the spatial component.

In other words, the techniques may enable the audio decoding device 24 to be configured to obtain a bitstream 21 comprising a vector 57 representative of an orthogonal spatial axis in a spherical harmonic domain. The bitstream 21 may further comprise an indicator (e.g., the bA/bB syntax elements of the NbitsQ syntax element) of whether to reuse, from a previous frame, at least one syntax element indicative of information used when compressing (e.g., quantizing) the vector.

FIG. 7 is a diagram illustrating example frames 249S and 249T specified in accordance with various aspects of the techniques described in this disclosure. As shown in the example of FIG. 7, the frame 249S includes ChannelSideInfoData (CSID) fields 154A to 154D, HOAGainCorrectionData (HOAGCD) fields, VVectorData fields 156A and 156B, and HOAPredictionInfo fields. The CSID field 154A includes a uintC syntax element ("uintC") 267 set to a value of 10, a bB syntax element ("bB") 266 set to a value of 1, and a bA syntax element ("bA") 265 set to a value of 0, along with a ChannelType syntax element ("ChannelType") 269 set to a value of 01.

The uintC syntax element 267, the bB syntax element 266, and the bA syntax element 265 together form the NbitsQ syntax element 261, with the bA syntax element 265 forming the most significant bit, the bB syntax element 266 forming the second most significant bit, and the uintC syntax element 267 forming the least significant bits of the NbitsQ syntax element 261. As noted above, the NbitsQ syntax element 261 may represent one or more bits indicative of the quantization mode used to encode the higher-order ambisonic audio data (e.g., one of a vector quantization mode, a scalar quantization mode without Huffman coding, and a scalar quantization mode with Huffman coding).
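As a non-limiting illustration, the composition of the NbitsQ syntax element from its parts may be sketched as follows. The function names are hypothetical, and the sketch assumes that uintC is two bits wide and that the value "10" shown for uintC in FIG. 7 is binary.

```python
from typing import Tuple


def compose_nbits_q(b_a: int, b_b: int, uint_c: int) -> int:
    """bA is the MSB, bB the next bit, uintC the two least significant bits."""
    return (b_a << 3) | (b_b << 2) | (uint_c & 0b11)


def split_nbits_q(nbits_q: int) -> Tuple[int, int, int]:
    """Inverse of compose_nbits_q: return (bA, bB, uintC)."""
    return (nbits_q >> 3) & 1, (nbits_q >> 2) & 1, nbits_q & 0b11


# CSID field 154A of FIG. 7: bA = 0, bB = 1, uintC = 10 (binary).
nbits_q_154a = compose_nbits_q(0, 1, 0b10)  # 0b0110, i.e. 6
```

Under these assumptions, the CSID field 154A carries an NbitsQ value of 6.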

The CSID field 154A also includes the PFlag syntax element 300 and the CbFlag syntax element 302 referenced above in the various syntax tables. The PFlag syntax element 300 may represent one or more bits indicative of whether a coded element of a spatial component of the soundfield represented by the HOA coefficients 11 of the first frame 249S (where, again, a spatial component may refer to a V-vector) is predicted from a second frame (e.g., the previous frame in this example). The CbFlag syntax element 302 may represent one or more bits indicative of Huffman codebook information, which may identify which of the Huffman codebooks (or, in other words, tables) was used to encode the elements of the spatial component (or, in other words, the V-vector elements).

The CSID field 154B includes the bB syntax element 266 and the bA syntax element 265 along with the ChannelType syntax element 269, each of which is set, in the example of FIG. 7, to the corresponding values 0, 0, and 01. Each of the CSID fields 154C and 154D includes the ChannelType field 269 having a value of 3 (11 in binary). Each of the CSID fields 154A to 154D corresponds to a respective one of the transport channels 1, 2, 3, and 4. In effect, each of the CSID fields 154A to 154D indicates whether the corresponding payload is a direction-based signal (when the corresponding ChannelType is equal to zero), a vector-based signal (when the corresponding ChannelType is equal to one), an additional ambient HOA coefficient (when the corresponding ChannelType is equal to two), or empty (when the ChannelType is equal to three).
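As a non-limiting illustration, the four ChannelType values listed above may be represented as an enumeration. The enumeration and member names are hypothetical; only the numeric values and their meanings come from the text.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable


class ChannelType(IntEnum):
    DIRECTION_BASED = 0         # direction-based signal
    VECTOR_BASED = 1            # vector-based signal
    ADDITIONAL_AMBIENT_HOA = 2  # additional ambient HOA coefficient
    EMPTY = 3                   # no payload


def summarize(channel_types: Iterable[int]) -> Counter:
    """Count how many transport channels carry each payload type."""
    return Counter(ChannelType(value) for value in channel_types)


# ChannelType values of CSID fields 154A to 154D in frame 249S of FIG. 7.
frame_249s = summarize([0b01, 0b01, 0b11, 0b11])
```

For frame 249S this yields two vector-based channels and two empty channels.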

In the example of FIG. 7, the frame 249S includes two vector-based signals (given that the ChannelType syntax element 269 is equal to 1 in the CSID fields 154A and 154B) and two empty channels (given that the ChannelType 269 is equal to 3 in the CSID fields 154C and 154D). Moreover, the prediction used by the audio encoding device 20, as indicated by the PFlag syntax element 300, is set to one. Again, the prediction indicated by the PFlag syntax element 300 refers to a prediction mode indication of whether prediction was performed with respect to the corresponding one of the compressed spatial components v1 to vn. When the PFlag syntax element 300 is set to one, the audio encoding device 20 may employ prediction by taking a difference: for scalar quantization, the difference between a vector element from the previous frame and the corresponding vector element of the current frame, or, for vector quantization, the difference between a weight from the previous frame and the corresponding weight of the current frame.
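As a non-limiting illustration, the difference-based prediction may be sketched as follows. The function names are hypothetical, the sign convention (current minus previous) is an assumption, and the reconstruction function is simply the arithmetic inverse rather than a procedure stated above. The same arithmetic applies whether the values are vector elements (scalar quantization) or weights (vector quantization).

```python
from typing import List, Sequence


def prediction_residual(previous: Sequence[int],
                        current: Sequence[int]) -> List[int]:
    """Difference between current-frame values and previous-frame values."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same number of values")
    return [c - p for p, c in zip(previous, current)]


def reconstruct(previous: Sequence[int],
                residual: Sequence[int]) -> List[int]:
    """Recover current-frame values from previous values and the residual."""
    if len(previous) != len(residual):
        raise ValueError("frames must have the same number of values")
    return [p + r for p, r in zip(previous, residual)]
```

When consecutive frames are similar, the residual values tend to be small, which is generally favorable for subsequent entropy coding.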

The audio encoding device 20 also determines that the value of the NbitsQ syntax element 261 of the CSID field 154B for the second transport channel in the frame 249S is the same as the value of the NbitsQ syntax element 261 of the CSID field 154B for the second transport channel of the previous frame (e.g., the frame 249T in the example of FIG. 7). As a result, the audio encoding device 20 specifies a value of zero for each of the bA syntax element 265 and the bB syntax element 266 to signal that the value of the NbitsQ syntax element 261 for the second transport channel in the previous frame 249T is reused for the NbitsQ syntax element 261 for the second transport channel in the frame 249S. The audio encoding device 20 may thereby avoid specifying the uintC syntax element 267 for the second transport channel in the frame 249S, along with the other syntax elements identified above.

FIG. 8 is a diagram illustrating example frames of one or more channels of at least one bitstream in accordance with the techniques described herein. The bitstream 450 includes frames 810A to 810H, each of which may include one or more channels. The bitstream 450 may be one example of the bitstream 21 shown in the example of FIG. 7. In the example of FIG. 8, the audio decoding device 24 maintains state information, updating the state information to determine how to decode the current frame k. The audio decoding device 24 may utilize state information from the configuration 814 and the frames 810B to 810D.

In other words, the audio encoding device 20 may include, within the bitstream generation unit 42 for example, a state machine 402 that maintains state information for encoding each of the frames 810A to 810E, in that the bitstream generation unit 42 may specify the syntax elements for each of the frames 810A to 810E based on the state machine 402.

The audio decoding device 24 may likewise include, within the extraction unit 72 for example, a similar state machine 402 that outputs syntax elements (some of which are not explicitly specified in the bitstream 21). The state machine 402 of the audio decoding device 24 may operate in a manner similar to that of the state machine 402 of the audio encoding device 20. As such, the state machine 402 of the audio decoding device 24 may maintain state information, updating the state information based on the configuration 814 (and, in the example of FIG. 8, the decoding of the frames 810B to 810D). The extraction unit 72 may extract the frame 810E based on the state information maintained by the state machine 402. The state information may provide a number of implicit syntax elements that the audio decoding device 24 may utilize when decoding the various transport channels of the frame 810E.

The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to the example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.

Movie studios, music studios, and gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (e.g., in 2.0, 5.1, and 7.1), such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (e.g., in 2.0 and 5.1), such as by using a DAW. In either case, the coding engines may receive and encode the channel-based audio content based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems. Another example context in which the techniques may be performed comprises an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, the HOA audio format, on-device rendering, consumer audio, TV and accessories, and car audio systems.

The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using the HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system (i.e., as opposed to requiring a particular configuration such as 5.1, 7.1, etc.), such as the audio playback system 16.

可执行所述技术的上下文的其它实例包含可包含获取元件及重放元件的音频生态系统。获取元件可包含有线及/或无线获取装置(例如,Eigen麦克风)、装置上环绕声俘获器及移动装置(例如,智能电话及平板计算机)。在一些实例中,有线及/或无线获取装置可经由有线及/或无线通信信道耦合到移动装置。Other examples of contexts in which the techniques may be implemented include audio ecosystems that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capturers, and mobile devices (e.g., smartphones and tablets). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile devices via wired and/or wireless communication channels.

In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For instance, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capturer (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired sound field into HOA coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record (acquire a sound field of) a live event (e.g., a meeting, a conference, a game, a concert, etc.) and code the recording into HOA coefficients.

The mobile device may also utilize one or more of the playback elements to play back the HOA coded sound field. For instance, the mobile device may decode the HOA coded sound field and output, to one or more of the playback elements, a signal that causes those playback elements to recreate the sound field. As one example, the mobile device may utilize wired and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, for example to create realistic binaural sound.

In some examples, a particular mobile device may both acquire a 3D sound field and play back the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.

Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs that may support editing of HOA signals. For instance, the one or more DAWs may include HOA plug-ins and/or tools that may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines, which may render a sound field for playback by the delivery systems.

也可关于示范性音频获取装置执行所述技术。举例来说,可关于可包含共同地经配置以记录3D声场的多个麦克风的Eigen麦克风执行所述技术。在一些实例中,Eigen麦克风的所述多个麦克风可位于具有大约4cm的半径的实质上球面球的表面上。在一些实例中,音频编码装置20可集成到Eigen麦克风中以便直接从麦克风输出位流21。The techniques may also be performed with respect to an exemplary audio acquisition device. For example, the techniques may be performed with respect to an Eigen microphone, which may include multiple microphones collectively configured to record a 3D sound field. In some examples, the multiple microphones of the Eigen microphone may be located on the surface of a substantially spherical ball having a radius of approximately 4 cm. In some examples, the audio encoding device 20 may be integrated into the Eigen microphone so as to output the bitstream 21 directly from the microphone.

Another exemplary audio acquisition context may include a production truck that may be configured to receive a signal from one or more microphones (e.g., one or more Eigen microphones). The production truck may also include an audio encoder, such as the audio encoder 20 of FIG. 3.

In some instances, the mobile device may also include a plurality of microphones that are collectively configured to record a 3D sound field. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone that may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as the audio encoder 20 of FIG. 3.

加固型视频俘获装置可进一步经配置以记录3D声场。在一些实例中,加固型视频俘获装置可附接到参与活动的用户的头盔。举例来说,加固型视频俘获装置可在用户泛舟时附接到用户的头盔。以此方式,加固型视频俘获装置可俘获表示用户周围的动作(例如,水在用户身后的撞击、另一泛舟者在用户前方说话,等等)的3D声场。The ruggedized video capture device can be further configured to record a 3D sound field. In some examples, the ruggedized video capture device can be attached to a helmet of a user participating in an activity. For example, the ruggedized video capture device can be attached to a user's helmet while the user is whitewater rafting. In this way, the ruggedized video capture device can capture a 3D sound field representing the action around the user (e.g., water crashing behind the user, another whitewater rafter speaking in front of the user, etc.).

也可关于可经配置以记录3D声场的附件增强型移动装置执行所述技术。在一些实例中,移动装置可类似于上文所论述的移动装置,其中添加一或多个附件。举例来说,Eigen麦克风可附接到上文所提及的移动装置以形成附件增强型移动装置。以此方式,附件增强型移动装置可俘获3D声场的较高质量版本(与仅使用与附件增强型移动装置成一体式的声音俘获组件的情形相比较)。The techniques may also be performed with respect to an accessory-enhanced mobile device that can be configured to record a 3D sound field. In some examples, the mobile device may be similar to the mobile device discussed above, with one or more accessories added. For example, an Eigen microphone may be attached to the mobile device mentioned above to form an accessory-enhanced mobile device. In this way, the accessory-enhanced mobile device can capture a higher-quality version of the 3D sound field (compared to the case of using only the sound capture components that are integral to the accessory-enhanced mobile device).

Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. Moreover, in some examples, headphone playback devices may be coupled to the decoder 24 via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any combination of the speakers, the sound bars, and the headphone playback devices.

数个不同实例音频重放环境也可适合于执行本发明中所描述的技术的各种方面。举例来说,以下环境可为用于执行本发明中所描述的技术的各种方面的合适环境:5.1扬声器重放环境、2.0(例如,立体声)扬声器重放环境、具有全高前扩音器的9.1扬声器重放环境、22.2扬声器重放环境、16.0扬声器重放环境、汽车扬声器重放环境,及具有耳挂式耳机重放环境的移动装置。Several different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For example, the following environments may be suitable environments for performing various aspects of the techniques described in this disclosure: a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full-height front speakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with an over-the-ear headphone playback environment.

根据本发明的一或多个技术,可利用声场的单一通用表示来在前述重放环境中的任一者上呈现声场。另外,本发明的技术使得呈现器能够从通用表示呈现声场以供在不同于上文所描述的环境的重放环境上重放。举例来说,如果设计考虑禁止扬声器根据7.1扬声器重放环境的恰当置放(例如,如果不可能置放右环绕扬声器),那么本发明的技术使得呈现器能够通过其它6个扬声器进行补偿,使得可在6.1扬声器重放环境上达成重放。According to one or more techniques of this disclosure, a single universal representation of a sound field can be utilized to render the sound field on any of the aforementioned playback environments. Additionally, the techniques of this disclosure enable a renderer to render the sound field from the universal representation for playback on playback environments other than the environments described above. For example, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if placement of a right surround speaker is not possible), the techniques of this disclosure enable the renderer to compensate by using the other six speakers so that playback can be achieved on a 6.1 speaker playback environment.
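As a rough illustration of rendering one generic representation onto whatever speakers are actually present, the following Python sketch projects first-order coefficients onto an arbitrary horizontal layout. It is only a sketch under stated assumptions, not the renderer of this disclosure: the coefficient order (W, X, Y, Z), the weight of 3.0 on the first-order terms (which presumes a particular normalization), the plain projection decoder, and the speaker azimuths are all assumptions made for the example.

```python
import math


def direction(az_deg, el_deg=0.0):
    """Unit vector for an azimuth/elevation in degrees (x front, y left, z up)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def encode_first_order(az_deg, el_deg=0.0):
    """First-order coefficients (W, X, Y, Z) for a unit plane wave (assumed normalization)."""
    x, y, z = direction(az_deg, el_deg)
    return (1.0, x, y, z)


def render(coeffs, speaker_azimuths_deg):
    """Project the coefficients onto each speaker direction (basic projection decoder)."""
    w, x, y, z = coeffs
    count = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        sx, sy, sz = direction(az)
        # The 3.0 weight matches the assumed normalization of the first-order terms.
        feeds.append((w + 3.0 * (x * sx + y * sy + z * sz)) / count)
    return feeds


# Hypothetical horizontal layouts (azimuths in degrees, positive to the left).
LAYOUT_7 = [0, 30, -30, 90, -90, 150, -150]
LAYOUT_6 = [az for az in LAYOUT_7 if az != -90]  # right surround cannot be placed

source = encode_first_order(-90)      # a sound arriving from the right
feeds_7 = render(source, LAYOUT_7)    # one feed per speaker of the full layout
feeds_6 = render(source, LAYOUT_6)    # same coefficients, reduced layout
```

The same coefficients are passed to `render` for both layouts; only the list of speaker directions changes. With the right surround speaker missing, the largest feeds go to the two remaining speakers nearest the source direction, which is the kind of compensation the paragraph above describes.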

Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the baseball stadium). HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder. The decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer. The renderer may obtain an indication as to the type of playback environment (e.g., headphones) and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sports game.

In each of the various instances described above, it should be understood that the audio encoding device 20 may perform a method, or may otherwise comprise means to perform each step of the method that the audio encoding device 20 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio encoding device 20 has been configured to perform.

在一或多个实例中,所描述功能可以硬件、软件、固件或其任何组合来实施。如果以软件实施,那么所述功能可作为一或多个指令或代码存储于计算机可读媒体上或经由计算机可读媒体进行传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于例如数据存储媒体的有形媒体。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本发明中所描述的技术的指令、代码及/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。In one or more examples, the described functions can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted via a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible media such as data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

Likewise, in each of the various instances described above, it should be understood that the audio decoding device 24 may perform a method, or may otherwise comprise means to perform each step of the method that the audio decoding device 24 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio decoding device 24 has been configured to perform.

借助于实例而非限制,此些计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储呈指令或数据结构形式的所要程序代码且可由计算机存取的任何其它媒体。然而,应理解,计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体,而是针对非暂时性有形存储媒体。如本文中所使用,磁盘及光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、磁盘及蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘通过激光以光学方式再现数据。以上各者的组合也应包含于计算机可读媒体的范围内。By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but rather refer to non-transitory, tangible storage media. As used herein, disk and optical disk include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), magnetic disk, and Blu-ray disc, where magnetic disks typically reproduce data magnetically, while optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

指令可由一或多个处理器执行,所述一或多个处理器例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效的集成或离散逻辑电路。因此,如本文中所使用的术语“处理器”可指上述结构或适合于实施本文中所描述的技术的任何其它结构中的任一者。另外,在一些方面,可在经配置用于编码及解码的专用硬件及/或软件模块内提供本文中所描述的功能性,或将本文中所描述的功能性并入于组合式编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Thus, the term "processor," as used herein, may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.

本发明的技术可在广泛多种装置或设备中实施,所述装置或设备包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。在本发明中描述各种组件、模块或单元以强调经配置以执行所揭示技术的装置的功能方面,但未必需要通过不同硬件单元来实现。确切地说,如上文所描述,各种单元可与合适的软件及/或固件一起组合于编解码器硬件单元中或由互操作性硬件单元的集合提供,硬件单元包含如上文所描述的一或多个处理器。The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as described above, the various units may be combined with appropriate software and/or firmware in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above.

已描述所述技术的各种方面。所述技术的此些及其它方面在以下权利要求书的范围内。Various aspects of the technology have been described. These and other aspects of the technology are within the scope of the following claims.

Claims (52)

1. A method of efficient bit use, the method comprising: obtaining a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component of the sound field being represented by a vector representative of an orthogonal spatial axis in a spherical harmonics domain, wherein the bitstream further comprises an indicator for whether to reuse, from a previous frame, a syntax element indicative of a prediction mode, the prediction mode indicating whether prediction is performed with respect to the vector.

2. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator comprises one or more bits of a second syntax element indicative of a quantization mode used when compressing the vector.

3. The method of claim 2, wherein the one or more bits of the second syntax element, when set to a zero value, indicate reuse of the first syntax element from the previous frame.

4. The method of claim 2, wherein the quantization mode comprises a vector quantization mode.

5. The method of claim 2, wherein the quantization mode comprises a scalar quantization mode without Huffman coding.

6. The method of claim 2, wherein the quantization mode comprises a scalar quantization mode with Huffman coding.

7. The method of claim 2, wherein the indicator comprises a most significant bit of the second syntax element and a second most significant bit of the second syntax element.

8. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a Huffman table used when compressing the vector.

9. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a category identifier that identifies a compression category to which the vector corresponds.

10. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of whether an element of the vector is a positive value or a negative value.

11. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a number of code vectors used when compressing the vector.

12. The method of claim 1, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a vector quantization codebook used when compressing the vector.

13. The method of claim 1, wherein the compressed version of the vector is represented in the bitstream using, at least in part, a Huffman code to represent a residual value of an element of the vector.

14. The method of claim 1, further comprising: decomposing higher-order ambisonic audio data to obtain the vector; and specifying the vector in the bitstream to obtain the bitstream.

15. The method of claim 1, further comprising: obtaining, from the bitstream, an audio object that corresponds to the vector; and combining the audio object with the vector to reconstruct higher-order ambisonic audio data.

16. The method of claim 1, wherein the compression of the vector includes quantization of the vector.

17. A device configured to perform efficient bit use, the device comprising: one or more processors configured to obtain a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component of the sound field being represented by a vector representative of an orthogonal spatial axis in a spherical harmonics domain, wherein the bitstream further comprises an indicator for whether to reuse, from a previous frame, a syntax element indicative of a prediction mode, the prediction mode indicating whether prediction is performed with respect to the vector; and a memory configured to store the bitstream.

18. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator comprises one or more bits of a second syntax element indicative of a quantization mode used when compressing the vector.

19. The device of claim 18, wherein the one or more bits of the second syntax element, when set to a zero value, indicate reuse of the first syntax element from the previous frame.

20. The device of claim 18, wherein the quantization mode comprises a vector quantization mode.

21. The device of claim 18, wherein the quantization mode comprises a scalar quantization mode without Huffman coding.

22. The device of claim 18, wherein the quantization mode comprises a scalar quantization mode with Huffman coding.

23. The device of claim 18, wherein the indicator comprises a most significant bit of the second syntax element and a second most significant bit of the second syntax element.

24. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a Huffman table used when compressing the vector.

25. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a category identifier that identifies a compression category to which the vector corresponds.

26. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of whether an element of the vector is a positive value or a negative value.

27. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a number of code vectors used when compressing the vector.

28. The device of claim 17, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a vector quantization codebook used when compressing the vector.

29. The device of claim 17, wherein the compressed version of the vector is represented in the bitstream using, at least in part, a Huffman code to represent a residual value of an element of the vector.

30. The device of claim 17, wherein the one or more processors are further configured to decompose higher-order ambisonic audio data to obtain the vector, and to specify the vector in the bitstream to obtain the bitstream.

31. The device of claim 17, wherein the one or more processors are further configured to obtain, from the bitstream, an audio object that corresponds to the vector, and to combine the audio object with the vector to reconstruct higher-order ambisonic audio data.

32. The device of claim 17, wherein the compression of the vector includes quantization of the vector.

33. A device configured to perform efficient bit use, the device comprising: means for obtaining a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component of the sound field being represented by a vector representative of an orthogonal spatial axis in a spherical harmonics domain, wherein the bitstream further comprises an indicator for whether to reuse, from a previous frame, a syntax element indicative of a prediction mode, the prediction mode indicating whether prediction is performed with respect to the vector; and means for storing the indicator.

34. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator comprises one or more bits of a second syntax element indicative of a quantization mode used when compressing the vector.

35. The device of claim 34, wherein the one or more bits of the second syntax element, when set to a zero value, indicate reuse of the first syntax element from the previous frame.

36. The device of claim 34, wherein the quantization mode comprises a vector quantization mode.

37. The device of claim 34, wherein the quantization mode comprises a scalar quantization mode without Huffman coding.

38. The device of claim 34, wherein the quantization mode comprises a scalar quantization mode with Huffman coding.

39. The device of claim 34, wherein the indicator comprises a most significant bit of the second syntax element and a second most significant bit of the second syntax element.

40. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a Huffman table used when compressing the vector.

41. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a category identifier that identifies a compression category to which the vector corresponds.

42. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of whether an element of the vector is a positive value or a negative value.

43. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a number of code vectors used when compressing the vector.

44. The device of claim 33, wherein the syntax element is a first syntax element, and the indicator indicates whether to reuse a second syntax element from the previous frame, the second syntax element indicative of a vector quantization codebook used when compressing the vector.

45. The device of claim 33, wherein the compressed version of the vector is represented in the bitstream using, at least in part, a Huffman code to represent a residual value of an element of the vector.

46. The device of claim 33, further comprising: means for decomposing higher-order ambisonic audio data to obtain the vector; and means for specifying the vector in the bitstream to obtain the bitstream.

47. The device of claim 33, further comprising: means for obtaining, from the bitstream, an audio object that corresponds to the vector; and means for combining the audio object with the vector to reconstruct higher-order ambisonic audio data.

48. The device of claim 33, wherein the compression of the vector includes quantization of the vector.

49. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to: obtain a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component of the sound field being represented by a vector representative of an orthogonal spatial axis in a spherical harmonics domain, wherein the bitstream further comprises an indicator for whether to reuse, from a previous frame, at least one syntax element indicative of a prediction mode, the prediction mode indicating whether prediction is performed with respect to the vector.

50. A device configured to perform efficient bit use, the device comprising: one or more processors configured to obtain a bitstream comprising a compressed version of a spatial component of a sound field, the spatial component of the sound field being represented by a vector representative of an orthogonal spatial axis in a spherical harmonics domain, wherein the bitstream further comprises an indicator for whether to reuse, from a previous frame, a syntax element indicative of a Huffman table used when compressing the vector; and a memory configured to store the bitstream.

51. A device configured to perform efficient bit use, the device comprising:
An apparatus configured to perform efficient bit usage, the apparatus comprising: 一或多个处理器,其经配置以获得包括声场的空间分量的经压缩版本的位流,所述声场的所述空间分量由表示球谐域中的正交空间轴线的向量表示,其中所述位流进一步包括关于是否重用来自前一帧的语法元素的指示符,所述语法元素指示在压缩所述向量时使用的向量量化码簿;及One or more processors configured to obtain a compressed version of a bitstream comprising spatial components of a sound field, the spatial components of which are represented by vectors representing orthogonal spatial axes in a spherically harmonic domain, wherein the bitstream further comprises an indicator regarding whether syntax elements from a previous frame are reused, the syntax elements indicating the vector quantization codebook used when compressing the vectors; and 存储器,其经配置以存储所述位流。A memory configured to store the bit stream. 52.一种经配置以执行有效率的位使用的装置,所述装置包括:52. An apparatus configured to perform efficient bit usage, the apparatus comprising: 一或多个处理器,其经配置以获得包括声场的空间分量的经压缩版本的位流,所述声场的所述空间分量由表示球谐域中的正交空间轴线的向量表示,其中所述位流进一步包括关于是否重用来自前一帧的语法元素的指示符,所述语法元素指示在压缩所述向量时使用的量化模式,所述指示符包括所述语法元素的一或多个位;及One or more processors configured to obtain a compressed version of a bitstream comprising spatial components of a sound field, the spatial components of which are represented by vectors representing orthogonal spatial axes in a spherically harmonic domain, wherein the bitstream further comprises an indicator regarding whether a syntax element from a previous frame is reused, the syntax element indicating the quantization mode used when compressing the vector, the indicator comprising one or more bits of the syntax element; and 存储器,其经配置以存储所述位流。A memory configured to store the bit stream.
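The reuse indicator recited in the claims above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the 4-bit width of the quantization-mode field, the 1-bit prediction flag, the field order, and the names `BitReader`, `FrameParams`, and `parse_frame` are all assumptions made for illustration. The only behavior taken from the claims is that the indicator consists of the two most significant bits of the quantization-mode syntax element, and that a zero value signals reuse of the previous frame's syntax elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameParams:
    quant_mode: int  # hypothetical 4-bit quantization-mode value
    pred_flag: int   # 1 if prediction is performed with respect to the vector


class BitReader:
    """Reads bits MSB-first from a string of '0'/'1' characters."""

    def __init__(self, bits: str):
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("bitstream exhausted")
        self.pos += n
        return int(chunk, 2)


def parse_frame(reader: BitReader, prev: Optional[FrameParams]) -> FrameParams:
    # Indicator: the two most significant bits of the quantization-mode field.
    top_two = reader.read(2)
    if top_two == 0:
        # Zero value: reuse the previous frame's syntax elements, read nothing more.
        if prev is None:
            raise ValueError("reuse signalled but no previous frame is available")
        return FrameParams(prev.quant_mode, prev.pred_flag)
    # Otherwise read the rest of the (assumed 4-bit) field and the prediction flag.
    low_two = reader.read(2)
    quant_mode = (top_two << 2) | low_two
    pred_flag = reader.read(1)
    return FrameParams(quant_mode, pred_flag)


# Frame 1 sends its parameters explicitly ("0110" + "1"); frame 2 sends only "00".
reader = BitReader("0110100")
frame1 = parse_frame(reader, None)
frame2 = parse_frame(reader, frame1)
```

Under these assumptions the second frame spends two bits instead of five, which is the bit saving the claims refer to as efficient bit usage.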
HK16112175.4A 2014-01-30 2015-01-30 Indicating frame parameter reusability for coding vectors HK1224073B (en)

Applications Claiming Priority (37)

Application Number Priority Date Filing Date Title
US201461933731P 2014-01-30 2014-01-30
US201461933714P 2014-01-30 2014-01-30
US201461933706P 2014-01-30 2014-01-30
US61/933,706 2014-01-30
US61/933,731 2014-01-30
US61/933,714 2014-01-30
US201461949591P 2014-03-07 2014-03-07
US201461949583P 2014-03-07 2014-03-07
US61/949,591 2014-03-07
US61/949,583 2014-03-07
US201461994794P 2014-05-16 2014-05-16
US61/994,794 2014-05-16
US201462004147P 2014-05-28 2014-05-28
US201462004067P 2014-05-28 2014-05-28
US201462004128P 2014-05-28 2014-05-28
US62/004,067 2014-05-28
US62/004,147 2014-05-28
US62/004,128 2014-05-28
US201462019663P 2014-07-01 2014-07-01
US62/019,663 2014-07-01
US201462027702P 2014-07-22 2014-07-22
US62/027,702 2014-07-22
US201462028282P 2014-07-23 2014-07-23
US62/028,282 2014-07-23
US201462029173P 2014-07-25 2014-07-25
US62/029,173 2014-07-25
US201462032440P 2014-08-01 2014-08-01
US62/032,440 2014-08-01
US201462056248P 2014-09-26 2014-09-26
US201462056286P 2014-09-26 2014-09-26
US62/056,248 2014-09-26
US62/056,286 2014-09-26
US201562102243P 2015-01-12 2015-01-12
US62/102,243 2015-01-12
US14/609,190 2015-01-29
US14/609,190 US9489955B2 (en) 2014-01-30 2015-01-29 Indicating frame parameter reusability for coding vectors
PCT/US2015/013818 WO2015116952A1 (en) 2014-01-30 2015-01-30 Indicating frame parameter reusability for coding vectors

Publications (2)

Publication Number Publication Date
HK1224073A1 HK1224073A1 (en) 2017-08-11
HK1224073B HK1224073B (en) 2021-01-08


Similar Documents

Publication Publication Date Title
CN105917408B (en) Indicating frame parameter reusability for coding vectors
HK1224073B (en) Indicating frame parameter reusability for coding vectors
HK1229524B (en) Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
HK1229522B (en) Method and device for obtaining a plurality of higher order ambisonic (hoa) coefficients