HK1232013B - Methods and devices for processing audio data - Google Patents
- Publication number
- HK1232013B (application HK17105633.3A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- coefficients
- stereo reverberation
- order
- decorrelation
- unit
Description
This application claims the benefit of:
U.S. Provisional Patent Application No. 62/020,348, entitled "REDUCING CORRELATION BETWEEN HOA BACKGROUND CHANNELS," filed July 2, 2014; and
U.S. Provisional Patent Application No. 62/060,512, entitled "REDUCING CORRELATION BETWEEN HOA BACKGROUND CHANNELS," filed October 6, 2014,
the entire contents of each of which are incorporated herein by reference.
Technical Field
This disclosure relates to audio data and, more particularly, to coding of higher-order ambisonic audio data.
Background
A higher-order ambisonics (HOA) signal (often represented by a plurality of spherical harmonic coefficients (SHC) or other hierarchical elements) is a three-dimensional representation of a sound field. The HOA or SHC representation may represent the sound field in a manner that is independent of the local speaker geometry used to play back a multi-channel audio signal rendered from the SHC signal. The SHC signal may also facilitate backward compatibility, because the SHC signal can be rendered to well-known and widely adopted multi-channel formats (for example, a 5.1 audio channel format or a 7.1 audio channel format). The SHC representation may therefore enable a better representation of a sound field that also accommodates backward compatibility.
Summary
In general, techniques are described for coding higher-order ambisonic audio data. Higher-order ambisonic audio data may comprise at least one higher-order ambisonic (HOA) coefficient corresponding to a spherical harmonic basis function having an order greater than one. Techniques are described for reducing correlation between higher-order ambisonic (HOA) background channels.
In one aspect, a method includes: obtaining a decorrelated representation of ambient ambisonic coefficients having at least a left signal and a right signal, the ambient ambisonic coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and generating a speaker feed based on the decorrelated representation of the ambient ambisonic coefficients.
In another aspect, a method includes: applying a decorrelation transform to ambient ambisonic coefficients to obtain a decorrelated representation of the ambient ambisonic coefficients, the ambient HOA coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one.
In another aspect, a device for compressing audio data includes one or more processors configured to: obtain a decorrelated representation of ambient ambisonic coefficients having at least a left signal and a right signal, the ambient ambisonic coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and generate a speaker feed based on the decorrelated representation of the ambient ambisonic coefficients.
In another aspect, a device for compressing audio data includes one or more processors configured to: apply a decorrelation transform to ambient ambisonic coefficients to obtain a decorrelated representation of the ambient ambisonic coefficients, the ambient HOA coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one.
In another aspect, a device for compressing audio data includes: means for obtaining a decorrelated representation of ambient ambisonic coefficients having at least a left signal and a right signal, the ambient ambisonic coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and means for generating a speaker feed based on the decorrelated representation of the ambient ambisonic coefficients.
In another aspect, a device for compressing audio data includes: means for applying a decorrelation transform to ambient ambisonic coefficients to obtain a decorrelated representation of the ambient ambisonic coefficients, the ambient HOA coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and means for storing the decorrelated representation of the ambient ambisonic coefficients.
In another aspect, a computer-readable storage medium is encoded with instructions that, when executed, cause one or more processors of an audio compression device to: obtain a decorrelated representation of ambient ambisonic coefficients having at least a left signal and a right signal, the ambient ambisonic coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and generate a speaker feed based on the decorrelated representation of the ambient ambisonic coefficients.
In another aspect, a computer-readable storage medium is encoded with instructions that, when executed, cause one or more processors of an audio compression device to: apply a decorrelation transform to ambient ambisonic coefficients to obtain a decorrelated representation of the ambient ambisonic coefficients, the ambient HOA coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one.
The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.
Brief Description of the Drawings
FIG. 1 is a diagram illustrating spherical harmonic basis functions of various orders and sub-orders.
FIG. 2 is a diagram illustrating a system that may perform various aspects of the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure.
FIG. 4 is a block diagram illustrating the audio decoding device of FIG. 2 in more detail.
FIG. 5 is a flowchart illustrating exemplary operation of an audio encoding device in performing various aspects of the vector-based synthesis techniques described in this disclosure.
FIG. 6A is a flowchart illustrating exemplary operation of an audio decoding device in performing various aspects of the techniques described in this disclosure.
FIG. 6B is a flowchart illustrating exemplary operation of an audio encoding device and an audio decoding device in performing the coding techniques described in this disclosure.
Detailed Description
The evolution of surround sound has made many output formats available for entertainment. Examples of such consumer surround sound formats are mostly "channel" based, because they implicitly specify feeds to loudspeakers at certain geometric coordinates. Consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low-frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (for example, for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of speakers (in symmetric and non-symmetric geometries), which are often termed "surround arrays." One example of such an array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.
The input to a future MPEG encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (among other information); and (iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called "spherical harmonic coefficients" or SHC, "higher-order ambisonics" or HOA, and "HOA coefficients"). The future MPEG encoder is described in more detail in a document entitled "Call for Proposals for 3D Audio," by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) JTC1/SC29/WG11/N13411, released January 2013 in Geneva, Switzerland, and available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.
There are various channel-based "surround sound" formats in the market. They range, for example, from the 5.1 home theater system (which has been the most successful in terms of making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai, or Japan Broadcasting Corporation). Content creators (for example, Hollywood studios) would like to produce the soundtrack for a movie once, and not spend effort remixing it for each speaker configuration. Recently, standards developing organizations have been considering ways to provide an encoding into a standardized bitstream, and a subsequent decoding, that is adaptable and agnostic to the speaker geometry (and number) and to the acoustic conditions at the location of the playback (involving a renderer).
To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. The hierarchical set of elements may refer to a set of elements in which the elements are ordered such that a basic set of lower-ordered elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.
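One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a sound field using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r)\right] e^{j\omega t}$$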
The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega/c$, $c$ is the speed of sound (approximately 343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions. Higher-order ambisonic signals are typically processed by truncating the higher orders so that only the zeroth and first orders remain. Because of the energy lost with the higher-order coefficients, some energy compensation is usually applied to the remaining signals.
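The truncation to zeroth and first order mentioned above can be illustrated with a minimal sketch. The text does not say how the energy compensation is computed, so the single broadband gain used here is an assumption, as are the function name `truncate_to_first_order` and the channel ordering (lowest orders first, so that the first four channels hold orders 0 and 1).

```python
import math

def truncate_to_first_order(frame, compensate=True):
    """Keep only the order-0 and order-1 channels of an HOA frame.

    frame: list of (N+1)^2 channels, each a list of samples, assumed to be
    ordered by increasing order n, so channels 0..3 hold orders 0 and 1.
    The single energy-preserving gain is an illustrative assumption.
    """
    kept = [list(channel) for channel in frame[:4]]
    if not compensate:
        return kept
    energy_all = sum(s * s for channel in frame for s in channel)
    energy_kept = sum(s * s for channel in kept for s in channel)
    if energy_kept == 0.0:
        return kept  # nothing to scale
    gain = math.sqrt(energy_all / energy_kept)
    return [[gain * s for s in channel] for channel in kept]

# Second-order frame: 9 channels of 2 samples each.
frame = [[1.0, -1.0] for _ in range(9)]
out = truncate_to_first_order(frame)
```

With this choice of gain, the four retained channels carry the same total energy as the original nine. A real codec might instead compensate per frequency band or per channel.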
Various aspects of this disclosure are directed to reducing correlation between background signals. For instance, the techniques of this disclosure may reduce, or possibly eliminate, correlation between background signals expressed in the HOA domain. A potential advantage of reducing correlation between background HOA signals is the mitigation of noise unmasking. As used herein, the expression "noise unmasking" may refer to attributing an audio object to a location that does not correspond to that audio object in the spatial domain. In addition to mitigating potential issues related to noise unmasking, the encoding techniques described herein may generate output signals that represent left and right audio signals, such as signals that together form a stereo output. In turn, a decoding device may decode the left and right audio signals to obtain a stereo output, or may mix the left and right audio signals to obtain a mono output. Additionally, in scenarios where an encoded bitstream represents a purely horizontal layout, a decoding device may implement various techniques of this disclosure to decode only the horizontal-component decorrelated HOA background signals. By limiting the decoding process to the horizontal-component decorrelated HOA background signals, the decoder may implement the techniques to conserve computing resources and reduce bandwidth consumption.
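As a rough illustration of the two ideas in this paragraph, the sketch below measures how correlated two channels are and mixes a left/right pair down to mono. It does not implement the decorrelation transform itself. The zero-lag normalized correlation and the equal-weight average are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length signals.

    Returns a value in [-1, 1]; 0.0 means uncorrelated at lag zero.
    """
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator else 0.0

def mono_downmix(left, right):
    """Equal-weight mono mix of a left/right pair (assumed mixing rule)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

left = [1.0, 0.0, -1.0, 0.0]
right = [0.0, 1.0, 0.0, -1.0]
corr = normalized_correlation(left, right)
mono = mono_downmix(left, right)
```

Here `left` and `right` are orthogonal, so their correlation is zero, while a signal compared with itself yields one.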
FIG. 1 is a diagram illustrating spherical harmonic basis functions from the zeroth order (n = 0) to the fourth order (n = 4). As can be seen, for each order there is an expansion of suborders m, which are shown but not explicitly noted in the example of FIG. 1 for ease of illustration.
The SHC $A_n^m(k)$ can either be physically acquired (for example, recorded) by various microphone array configurations or, alternatively, derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving $(1+4)^2$ (that is, 25) coefficients may be used.
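The coefficient count follows from the suborder range: each order n contributes 2n + 1 suborders (m from -n to n), so a representation up to order N has (N + 1)^2 coefficients. A short sketch, with hypothetical helper names:

```python
def num_shc(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def orders_and_suborders(order):
    """Enumerate every (n, m) pair with 0 <= n <= order and -n <= m <= n."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]

pairs = orders_and_suborders(4)
```

For the fourth-order case in the text, `num_shc(4)` and `len(pairs)` both give 25.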
As noted above, the SHC may be derived from a microphone recording using a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.
To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(k r_s)\, Y_n^{m*}(\theta_s, \varphi_s)$$
where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (for example, using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) allows each PCM object and its corresponding location to be converted into the SHC $A_n^m(k)$. Further, it can be shown (because the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (for example, as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall sound field in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$. The remaining figures are described below in the context of object-based and SHC-based audio coding.
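The object-to-SHC conversion can be sketched for the zeroth-order coefficient only, assuming the standard point-source form $A_n^m(k) = g(\omega)(-4\pi i k)\,h_n^{(2)}(k r_s)\,Y_n^{m*}(\theta_s, \varphi_s)$. For n = 0 the closed forms $h_0^{(2)}(x) = i e^{-ix}/x$ and $Y_0^0 = 1/\sqrt{4\pi}$ apply, so no angles are needed. The function names and the fixed speed of sound are illustrative assumptions, and higher orders would need a numerical library for the Hankel functions and spherical harmonics.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, the approximate value quoted in the text

def h0_second_kind(x):
    """Spherical Hankel function of the second kind, order 0: i*exp(-ix)/x."""
    return 1j * cmath.exp(-1j * x) / x

def object_shc_order0(g, freq_hz, r_s):
    """Zeroth-order coefficient A_0^0(k) for one object at distance r_s.

    g is the object's source energy at this frequency (assumed known).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber k = omega / c
    y00 = 1.0 / math.sqrt(4.0 * math.pi)          # Y_0^0 is real and constant
    return g * (-4.0 * math.pi * 1j * k) * h0_second_kind(k * r_s) * y00

def sum_objects(objects, freq_hz):
    """Coefficients are additive: sum the per-object contributions."""
    return sum(object_shc_order0(g, freq_hz, r_s) for g, r_s in objects)

a_single = object_shc_order0(1.0, 1000.0, 2.0)
a_total = sum_objects([(1.0, 2.0), (0.5, 3.0)], 1000.0)
```

In this zeroth-order case the magnitude reduces to $g\sqrt{4\pi}/r_s$, which shows the expected 1/r decay with source distance.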
FIG. 2 is a diagram illustrating a system 10 that may perform various aspects of the techniques described in this disclosure. As shown in the example of FIG. 2, the system 10 includes a content creator device 12 and a content consumer device 14. While described in the context of the content creator device 12 and the content consumer device 14, the techniques may be implemented in any context in which SHCs (which may also be referred to as HOA coefficients) or any other hierarchical representation of a sound field are encoded to form a bitstream representative of the audio data. Moreover, the content creator device 12 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, or a desktop computer, to provide a few examples. Likewise, the content consumer device 14 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, a set-top box, or a desktop computer, to provide a few examples.
The content creator device 12 may be operated by a movie studio or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some examples, the content creator device 12 may be operated by an individual user who would like to compress HOA coefficients 11. Often, the content creator generates audio content in conjunction with video content. The content consumer device 14 may be operated by an individual. The content consumer device 14 may include an audio playback system 16, which may refer to any form of audio playback system capable of rendering SHC for playback as multi-channel audio content.
The content creator device 12 includes an audio editing system 18. The content creator device 12 obtains live recordings 7 in various formats (including directly as HOA coefficients) and audio objects 9, which the content creator device 12 may edit using the audio editing system 18. A microphone 5 may capture the live recordings 7. During the editing process, the content creator may render HOA coefficients 11 from the audio objects 9, listening to the rendered speaker feeds in an attempt to identify various aspects of the sound field that require further editing. The content creator device 12 may then edit the HOA coefficients 11 (potentially indirectly, by manipulating different ones of the audio objects 9 from which the source HOA coefficients may be derived in the manner described above). The content creator device 12 may employ the audio editing system 18 to generate the HOA coefficients 11. The audio editing system 18 represents any system capable of editing audio data and outputting the audio data as one or more source spherical harmonic coefficients.
When the editing process is complete, the content creator device 12 may generate a bitstream 21 based on the HOA coefficients 11. That is, the content creator device 12 includes an audio encoding device 20, which represents a device configured to encode or otherwise compress the HOA coefficients 11 in accordance with various aspects of the techniques described in this disclosure to generate the bitstream 21. The audio encoding device 20 may generate the bitstream 21 for transmission, as one example, across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like. The bitstream 21 may represent an encoded version of the HOA coefficients 11 and may include a primary bitstream and another side bitstream, which may be referred to as side channel information.
While shown in FIG. 2 as being transmitted directly to the content consumer device 14, the content creator device 12 may output the bitstream 21 to an intermediate device positioned between the content creator device 12 and the content consumer device 14. The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream. The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder. The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 (possibly in conjunction with transmitting a corresponding video data bitstream) to subscribers, such as the content consumer device 14, requesting the bitstream 21.
Alternatively, the content creator device 12 may store the bitstream 21 to a storage medium, such as a compact disc, a digital video disc, a high-definition video disc, or other storage media, most of which can be read by a computer and may therefore be referred to as computer-readable storage media or non-transitory computer-readable storage media. In this context, the transmission channel may refer to the channels by which content stored to the media is transmitted (and may include retail stores and other store-based delivery mechanisms). In any event, the techniques of this disclosure should not be limited in this respect to the example of FIG. 2.
As further shown in the example of FIG. 2, the content consumer device 14 includes the audio playback system 16. The audio playback system 16 may represent any audio playback system capable of playing back multi-channel audio data. The audio playback system 16 may include a number of different renderers 22. The renderers 22 may each provide a different form of rendering, where the different forms of rendering may include one or more of the various ways of performing vector-base amplitude panning (VBAP) and/or one or more of the various ways of performing sound field synthesis. As used herein, "A and/or B" means "A or B," or both "A and B."
The audio playback system 16 may further include an audio decoding device 24. The audio decoding device 24 may represent a device configured to decode HOA coefficients 11' from the bitstream 21, where the HOA coefficients 11' may be similar to the HOA coefficients 11 but differ due to lossy operations (for example, quantization) and/or transmission via the transmission channel. The audio playback system 16 may, after decoding the bitstream 21, obtain the HOA coefficients 11' and render the HOA coefficients 11' to output loudspeaker feeds 25. The loudspeaker feeds 25 may drive one or more loudspeakers (which are not shown in the example of FIG. 2 for ease of illustration).
To select the appropriate renderer or, in some instances, generate an appropriate renderer, the audio playback system 16 may obtain loudspeaker information 13 indicative of the number of loudspeakers and/or the spatial geometry of the loudspeakers. In some instances, the audio playback system 16 may obtain the loudspeaker information 13 using a reference microphone and driving the loudspeakers in such a manner as to dynamically determine the loudspeaker information 13. In other instances, or in conjunction with the dynamic determination of the loudspeaker information 13, the audio playback system 16 may prompt a user to interface with the audio playback system 16 and input the loudspeaker information 13.
The audio playback system 16 may then select one of the audio renderers 22 based on the loudspeaker information 13. In some instances, when none of the audio renderers 22 is within some threshold similarity measure (in terms of loudspeaker geometry) of the loudspeaker geometry specified in the loudspeaker information 13, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13. In some instances, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13 without first attempting to select an existing one of the audio renderers 22. One or more speakers 3 may then play back the rendered loudspeaker feeds 25.
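The select-or-generate logic described above might be sketched as follows. The text does not define the similarity measure, so the mean angular distance between speaker directions, the 5-degree threshold, and all names (`choose_renderer`, `layout_mismatch`, the `"generated"` label) are assumptions made for illustration. Layouts are lists of (azimuth, elevation) pairs in degrees, compared speaker by speaker in list order.

```python
import math

def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    az1, el1 = (math.radians(v) for v in a)
    az2, el2 = (math.radians(v) for v in b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def layout_mismatch(layout_a, layout_b):
    """Mean angular distance between corresponding speakers (assumed metric)."""
    if not layout_a or len(layout_a) != len(layout_b):
        return math.inf
    total = sum(angular_distance_deg(a, b) for a, b in zip(layout_a, layout_b))
    return total / len(layout_a)

def choose_renderer(renderers, speaker_layout, threshold_deg=5.0):
    """Return (renderer_name, was_generated) for the given speaker layout.

    renderers maps a renderer name to the layout it was designed for.
    """
    best_name, best_mismatch = None, math.inf
    for name, layout in renderers.items():
        mismatch = layout_mismatch(layout, speaker_layout)
        if mismatch < best_mismatch:
            best_name, best_mismatch = name, mismatch
    if best_mismatch <= threshold_deg:
        return best_name, False
    # No stored renderer is close enough: a new one would be generated here.
    return "generated", True

renderers = {
    "stereo": [(30.0, 0.0), (-30.0, 0.0)],
    "5.0": [(30.0, 0.0), (-30.0, 0.0), (0.0, 0.0), (110.0, 0.0), (-110.0, 0.0)],
}
close = choose_renderer(renderers, [(28.0, 0.0), (-31.0, 0.0)])
far = choose_renderer(renderers, [(90.0, 0.0), (-90.0, 0.0)])
```

A layout slightly off the stereo positions selects the stored stereo renderer, while a layout with speakers at plus and minus 90 degrees falls outside the threshold and triggers generation.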
FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device 20 shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure. The audio encoding device 20 includes a content analysis unit 26, a vector-based synthesis method unit 27, a direction-based synthesis method unit 28, and a decorrelation unit 40′. Although described only briefly below, more information regarding the audio encoding device 20 and the various aspects of compressing or otherwise encoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.
The content analysis unit 26 represents a unit configured to analyze the content of the HOA coefficients 11 to identify whether the HOA coefficients 11 represent content generated from a live recording or content generated from audio objects. That is, the content analysis unit 26 may determine whether the HOA coefficients 11 were generated from a recording of an actual sound field or from artificial audio objects. In some examples, when the framed HOA coefficients 11 were generated from a recording, the content analysis unit 26 passes the HOA coefficients 11 to the vector-based decomposition unit 27. In some examples, when the framed HOA coefficients 11 were generated from synthetic audio objects, the content analysis unit 26 passes the HOA coefficients 11 to the direction-based synthesis unit 28. The direction-based synthesis unit 28 may represent a unit configured to perform a direction-based synthesis of the HOA coefficients 11 to generate a direction-based bitstream 21.
As shown in the example of FIG. 3, the vector-based decomposition unit 27 may include a linear invertible transform (LIT) unit 30, a parameter calculation unit 32, a reordering unit 34, a foreground selection unit 36, an energy compensation unit 38, a psychoacoustic audio coder unit 40, a bitstream generation unit 42, a sound field analysis unit 44, a coefficient reduction unit 46, a background (BG) selection unit 48, a spatio-temporal interpolation unit 50, and a quantization unit 52.
The linear invertible transform (LIT) unit 30 receives the HOA coefficients 11 in the form of HOA channels, each channel representing a block or frame of a coefficient associated with a given order and sub-order of the spherical basis functions (which may be denoted HOA[k], where k denotes the current frame or block of samples). The matrix of HOA coefficients 11 may have dimensions D: M × (N+1)².
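The dimension M × (N+1)² stated above can be sketched as follows. This is only an illustration: the function names and the frame length of 1024 samples are assumptions chosen for the example, not values taken from this disclosure.

```python
def num_hoa_coefficients(order: int) -> int:
    """Number of HOA coefficients (HOA channels) for ambisonic order N: (N+1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def hoa_frame_shape(samples_per_frame: int, order: int) -> tuple:
    """Shape D: M x (N+1)^2 of one frame HOA[k] of HOA coefficients."""
    return (samples_per_frame, num_hoa_coefficients(order))


# Hypothetical frame length M = 1024 samples, fourth-order content (N = 4).
shape = hoa_frame_shape(1024, 4)
```

For fourth-order content this gives 25 HOA channels per frame, which matches the coefficient indices 1 to 25 used elsewhere in the description.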
The LIT unit 30 may represent a unit configured to perform a form of analysis referred to as singular value decomposition (SVD). Although described with respect to SVD, the techniques described in this disclosure may be performed with respect to any similar transformation or decomposition that provides sets of linearly uncorrelated, energy-compacted output. Moreover, references to "sets" in this disclosure are generally intended to refer to non-zero sets (unless specifically stated to the contrary) and are not intended to refer to the classical mathematical definition of a set, which includes the so-called "empty set." An alternative transformation may include principal component analysis, commonly referred to as "PCA." Depending on the context, PCA may be referred to by several different names, such as the discrete Karhunen-Loève transform, the Hotelling transform, proper orthogonal decomposition (POD), and eigenvalue decomposition (EVD), to name a few. The properties of such operations that serve the underlying goal of compressing audio data are "energy compaction" and "decorrelation" of the multi-channel audio data.
In any event, assuming for purposes of example that the LIT unit 30 performs a singular value decomposition (which may again be referred to as "SVD"), the LIT unit 30 may transform the HOA coefficients 11 into two or more sets of transformed HOA coefficients. The "sets" of transformed HOA coefficients may include vectors of transformed HOA coefficients. In the example of FIG. 3, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to generate a so-called V matrix, an S matrix, and a U matrix. In linear algebra, SVD may represent a factorization of a y-by-z real or complex matrix X (where X may represent multi-channel audio data, such as the HOA coefficients 11) in the following form:
X = USV*
U may represent a y-by-y real or complex unitary matrix, where the y columns of U are known as the left-singular vectors of the multi-channel audio data. S may represent a y-by-z rectangular diagonal matrix with non-negative real numbers on the diagonal, where the diagonal values of S are known as the singular values of the multi-channel audio data. V* (which may denote the conjugate transpose of V) may represent a z-by-z real or complex unitary matrix, where the z columns of V* are known as the right-singular vectors of the multi-channel audio data.
In some examples, the V* matrix in the SVD expression above is denoted as the conjugate transpose of the V matrix to reflect that SVD may be applied to matrices containing complex numbers. When applied to matrices containing only real numbers, the complex conjugate of the V matrix (or, in other words, the V* matrix) may be considered to be the transpose of the V matrix. For ease of illustration, it is assumed below that the HOA coefficients 11 contain real numbers, with the result that the V matrix, rather than the V* matrix, is output through SVD. Moreover, although denoted as the V matrix in this disclosure, a reference to the V matrix should be understood to refer to the transpose of the V matrix where appropriate. Although assumed to be the V matrix, the techniques may be applied in a similar fashion to HOA coefficients 11 having complex coefficients, where the output of the SVD is the V* matrix. Accordingly, the techniques should not be limited in this respect to applying SVD to generate a V matrix, but may include applying SVD to HOA coefficients 11 having complex components to generate a V* matrix.
In this way, the LIT unit 30 may perform SVD with respect to the HOA coefficients 11 to output US[k] vectors 33 (which may represent a combined version of the S vectors and the U vectors) having dimensions D: M × (N+1)², and V[k] vectors 35 having dimensions D: (N+1)² × (N+1)². Individual vector elements in the US[k] matrix may also be denoted X_PS(k), while individual vectors in the V[k] matrix may also be denoted v(k).
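As a dimensional check, the factorization can be written out for one frame under the real-valued assumption made above, so that V* reduces to the transpose of V. The frame index on each factor and the example order N = 4 are notational assumptions added here for illustration.

```latex
\[
\underbrace{\mathrm{HOA}[k]}_{M \times (N+1)^2}
  \;=\;
\underbrace{U[k]\,S[k]}_{US[k]:\; M \times (N+1)^2}
\;\;
\underbrace{V[k]^{\mathsf{T}}}_{(N+1)^2 \times (N+1)^2},
\qquad
N = 4 \;\Rightarrow\; (N+1)^2 = 25 .
\]
```

Here U[k] is M × M and S[k] is M × (N+1)², so their product US[k] has the stated dimensions M × (N+1)².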
An analysis of the U, S, and V matrices may reveal that these matrices carry or represent spatial and temporal characteristics of the underlying sound field represented above by X. Each of the N vectors in U (of length M samples) may represent normalized separated audio signals as a function of time (for the time period represented by M samples) that are orthogonal to each other and that have been decoupled from any spatial characteristics (which may also be referred to as directional information). The spatial characteristics, representing spatial shape and position (r, θ, φ), may instead be represented by the individual i-th vectors v^(i)(k) in the V matrix (each of length (N+1)²). The individual elements of each of the v^(i)(k) vectors may represent an HOA coefficient describing the shape (including width) and position of the sound field for an associated audio object. The vectors in both the U matrix and the V matrix are normalized such that their root-mean-square energies are equal to one. The energy of the audio signals in U is thus represented by the diagonal elements in S. Multiplying U and S to form US[k] (with individual vector elements X_PS(k)) thus represents the audio signals with their energies. The ability of the SVD to decouple the audio time signals (in U), their energies (in S), and their spatial characteristics (in V) may support various aspects of the techniques described in this disclosure. Further, the model of synthesizing the underlying HOA[k] coefficients, X, by a vector multiplication of US[k] and V[k] gives rise to the term "vector-based decomposition," which is used throughout this document.
Although described as being performed directly with respect to the HOA coefficients 11, the LIT unit 30 may apply the linear invertible transform to derivatives of the HOA coefficients 11. For example, the LIT unit 30 may apply SVD with respect to a power spectral density matrix derived from the HOA coefficients 11. By performing SVD with respect to the power spectral density (PSD) of the HOA coefficients rather than the coefficients themselves, the LIT unit 30 may potentially reduce the computational complexity of performing the SVD in terms of one or more of processor cycles and storage space, while achieving the same source audio encoding efficiency as if the SVD were applied directly to the HOA coefficients.
The parameter calculation unit 32 represents a unit configured to calculate various parameters, such as a correlation parameter (R), directional characteristic parameters (θ, r), and an energy characteristic (e). Each of the parameters for the current frame may be denoted as R[k], θ[k], r[k], and e[k]. The parameter calculation unit 32 may perform an energy analysis and/or correlation (or so-called cross-correlation) with respect to the US[k] vectors 33 to identify these parameters. The parameter calculation unit 32 may also determine the parameters for the previous frame, where the previous-frame parameters may be denoted R[k-1], θ[k-1], r[k-1], and e[k-1], based on the previous frame of US[k-1] vectors and V[k-1] vectors. The parameter calculation unit 32 may output the current parameters 37 and the previous parameters 39 to the reordering unit 34.
The parameters calculated by the parameter calculation unit 32 may be used by the reordering unit 34 to reorder the audio objects so as to represent their natural evaluation or continuity over time. The reordering unit 34 may compare each of the parameters 37 from the first US[k] vectors 33, in turn, against each of the parameters 39 for the second US[k-1] vectors 33. The reordering unit 34 may reorder the various vectors within the US[k] matrix 33 and the V[k] matrix 35 based on the current parameters 37 and the previous parameters 39 (using, as one example, the Hungarian algorithm) to output a reordered US[k] matrix 33′ and a reordered V[k] matrix 35′ to the foreground sound (or dominant sound (PS)) selection unit 36 ("foreground selection unit 36") and the energy compensation unit 38.
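The reordering step can be illustrated with a small assignment problem. The sketch below is not the method of this disclosure: it assumes a single scalar energy per vector as the only matching parameter, and it uses an exhaustive search over permutations in place of the Hungarian algorithm, which is practical only for a handful of vectors. All names are hypothetical.

```python
from itertools import permutations


def match_to_previous(previous, current):
    """Return `order` such that current[order[i]] best matches previous[i].

    The cost is the summed absolute parameter difference. Exhaustive search
    stands in for the Hungarian algorithm and scales factorially.
    """
    n = len(previous)
    if len(current) != n:
        raise ValueError("both frames must hold the same number of vectors")

    def total_cost(perm):
        return sum(abs(previous[i] - current[perm[i]]) for i in range(n))

    return list(min(permutations(range(n)), key=total_cost))


# Hypothetical per-vector energies e[k-1] and e[k]; the objects swapped places.
previous_energy = [9.0, 4.0, 1.0]
current_energy = [1.1, 8.7, 4.2]

order = match_to_previous(previous_energy, current_energy)
reordered_energy = [current_energy[j] for j in order]
```

Applying the same `order` to the columns of US[k] and V[k] would keep each audio object in the same position from frame to frame.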
The sound field analysis unit 44 may represent a unit configured to perform a sound field analysis with respect to the HOA coefficients 11 so as to potentially achieve a target bitrate 41. Based on the analysis and/or on the received target bitrate 41, the sound field analysis unit 44 may determine the total number of psychoacoustic coder instantiations (which may be a function of the total number of ambient or background channels (BG_TOT)) and the number of foreground channels (or, in other words, dominant channels). The total number of psychoacoustic coder instantiations may be denoted numHOATransportChannels.
Again to potentially achieve the target bitrate 41, the sound field analysis unit 44 may also determine the total number of foreground channels (nFG) 45, the minimum order of the background (or, in other words, ambient) sound field (N_BG or, alternatively, MinAmbHOAorder), the corresponding number of actual channels representing the minimum order of the background sound field (nBGa = (MinAmbHOAorder+1)²), and the indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of FIG. 3). The background channel information 43 may also be referred to as ambient channel information 43. Each of the channels that remains from numHOATransportChannels - nBGa may be an "additional background/ambient channel," an "active vector-based dominant channel," an "active direction-based dominant signal," or "completely inactive." In one aspect, the channel types may be indicated by a two-bit syntax element ("ChannelType") (e.g., 00: direction-based signal; 01: vector-based dominant signal; 10: additional ambient signal; 11: inactive signal). The total number of background or ambient signals, nBGa, may be given by (MinAmbHOAorder+1)² plus the number of times the index 10 (in the above example) appears as a channel type in the bitstream for that frame.
The sound field analysis unit 44 may select the number of background (or, in other words, ambient) channels and the number of foreground (or, in other words, dominant) channels based on the target bitrate 41, selecting more background and/or foreground channels when the target bitrate 41 is relatively high (e.g., when the target bitrate 41 is equal to or greater than 512 Kbps). In one aspect, numHOATransportChannels may be set to 8 while MinAmbHOAorder may be set to 1 in the header section of the bitstream. In this scenario, at every frame, four channels may be dedicated to representing the background or ambient portion of the sound field, while the other four channels may vary in channel type on a frame-by-frame basis, e.g., being used either as additional background/ambient channels or as foreground/dominant channels. The foreground/dominant signals may be either vector-based or direction-based signals, as described above.
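The channel bookkeeping in this scenario can be sketched as below. The two-bit codes and the counts follow the text; the function name, the dictionary layout, and the example frame are assumptions made for illustration, and no actual bitstream parsing is shown.

```python
from collections import Counter

# Two-bit ChannelType codes as listed in the description.
CHANNEL_TYPES = {
    0b00: "direction-based signal",
    0b01: "vector-based dominant signal",
    0b10: "additional ambient signal",
    0b11: "inactive signal",
}


def summarize_frame(min_amb_hoa_order, num_hoa_transport_channels, channel_types):
    """Count channel roles for one frame from its flexible-channel ChannelType codes."""
    fixed_ambient = (min_amb_hoa_order + 1) ** 2
    flexible = num_hoa_transport_channels - fixed_ambient
    if len(channel_types) != flexible:
        raise ValueError("expected one ChannelType per flexible channel")
    if any(code not in CHANNEL_TYPES for code in channel_types):
        raise ValueError("ChannelType must be a two-bit code")
    counts = Counter(channel_types)
    return {
        "fixed_ambient": fixed_ambient,
        "flexible": flexible,
        # (MinAmbHOAorder+1)^2 plus the number of channels signalled as type 10.
        "total_ambient": fixed_ambient + counts[0b10],
        "vector_based": counts[0b01],
        "direction_based": counts[0b00],
        "inactive": counts[0b11],
    }


# numHOATransportChannels = 8, MinAmbHOAorder = 1, hypothetical per-frame types.
summary = summarize_frame(1, 8, [0b01, 0b01, 0b10, 0b11])
```

With these settings four channels always carry the ambient sound field and the remaining four are reassigned from frame to frame.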
In some examples, the total number of vector-based dominant signals for a frame may be given by the number of times the ChannelType index is 01 in the bitstream of that frame. In the above aspect, for every additional background/ambient channel (e.g., corresponding to a ChannelType of 10), corresponding information indicating which of the possible HOA coefficients (beyond the first four) is represented in that channel may be provided. For fourth-order HOA content, the information may be an index indicating one of HOA coefficients 5 to 25. The first four ambient HOA coefficients 1 to 4 may be sent at all times when MinAmbHOAorder is set to 1; hence, the audio encoding device may only need to indicate one of the additional ambient HOA coefficients having an index of 5 to 25. The information could thus be sent using a 5-bit syntax element (for fourth-order content), which may be denoted "CodedAmbCoeffIdx." In any event, the sound field analysis unit 44 outputs the background channel information 43 and the HOA coefficients 11 to the background (BG) selection unit 48, the background channel information 43 to the coefficient reduction unit 46 and the bitstream generation unit 42, and the nFG 45 to the foreground selection unit 36.
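The 5-bit width quoted for fourth-order content follows from counting the candidate indices. The sketch below generalizes that count to other orders, which is an assumption made here; the text itself states only the fourth-order case, and the function names are hypothetical.

```python
def additional_ambient_indices(hoa_order, min_amb_hoa_order):
    """1-based HOA coefficient indices that are not always sent as ambient."""
    always_sent = (min_amb_hoa_order + 1) ** 2
    total = (hoa_order + 1) ** 2
    return range(always_sent + 1, total + 1)


def coded_amb_coeff_idx_bits(hoa_order, min_amb_hoa_order):
    """Smallest number of bits able to distinguish all candidate indices."""
    candidates = len(additional_ambient_indices(hoa_order, min_amb_hoa_order))
    if candidates <= 1:
        return 0
    # (candidates - 1).bit_length() equals ceil(log2(candidates)) for candidates > 1.
    return (candidates - 1).bit_length()


# Fourth-order content with MinAmbHOAorder = 1: indices 5..25, 21 candidates.
indices = list(additional_ambient_indices(4, 1))
bits = coded_amb_coeff_idx_bits(4, 1)
```

Twenty-one candidates do not fit in 4 bits (16 values) but do fit in 5 bits (32 values), consistent with the 5-bit syntax element described above.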
The background selection unit 48 may represent a unit configured to determine background or ambient HOA coefficients 47 based on the background channel information (e.g., the background sound field (N_BG) and the number (nBGa) and the indices (i) of additional BG HOA channels to send). For example, when N_BG equals one, the background selection unit 48 may select the HOA coefficients 11 for each sample of the audio frame having an order equal to or less than one. In this example, the background selection unit 48 may then select the HOA coefficients 11 having an index identified by one of the indices (i) as additional BG HOA coefficients, where the nBGa to be specified in the bitstream 21 is provided to the bitstream generation unit 42 so as to enable an audio decoding device (e.g., the audio decoding device 24 shown in the examples of FIGS. 2 and 4) to parse the background HOA coefficients 47 from the bitstream 21. The background selection unit 48 may then output the ambient HOA coefficients 47 to the energy compensation unit 38. The ambient HOA coefficients 47 may have dimensions D: M × [(N_BG+1)² + nBGa]. Each of the ambient HOA coefficients 47 corresponds to a separate ambient HOA channel 47 to be encoded by the psychoacoustic audio coder unit 40.
The foreground selection unit 36 may represent a unit configured to select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), those portions of the reordered US[k] matrix 33′ and the reordered V[k] matrix 35′ that represent foreground, or distinct, components of the sound field. The foreground selection unit 36 may output nFG signals 49 (which may be denoted as reordered US[k]_{1,…,nFG} 49 or FG_{1,…,nFG}[k] 49) to the psychoacoustic audio coder unit 40, where the nFG signals 49 may have dimensions D: M × nFG and each represents a mono audio object. The foreground selection unit 36 may also output the reordered V[k] matrix 35′ (or v^(1..nFG)(k) 35′) corresponding to the foreground components of the sound field to the spatio-temporal interpolation unit 50, where the subset of the reordered V[k] matrix 35′ corresponding to the foreground components may be denoted as the foreground V[k] matrix 51_k, having dimensions D: (N+1)² × nFG.
The energy compensation unit 38 may represent a unit configured to perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for the energy loss caused by the background selection unit 48 removing various ones of the HOA channels. The energy compensation unit 38 may perform an energy analysis with respect to one or more of the reordered US[k] matrix 33′, the reordered V[k] matrix 35′, the nFG signals 49, the foreground V[k] vectors 51_k, and the ambient HOA coefficients 47, and then perform energy compensation based on the energy analysis to generate energy-compensated ambient HOA coefficients 47′. The energy compensation unit 38 may output the energy-compensated ambient HOA coefficients 47′ to the decorrelation unit 40′. In turn, the decorrelation unit 40′ may implement the techniques of this disclosure to reduce or eliminate correlation between the background signals of the HOA coefficients 47′ to form one or more decorrelated HOA coefficients 47″. The decorrelation unit 40′ may output the decorrelated HOA coefficients 47″ to the psychoacoustic audio coder unit 40.
The spatio-temporal interpolation unit 50 may represent a unit configured to receive the foreground V[k] vectors 51_k for the k-th frame and the foreground V[k-1] vectors 51_{k-1} for the previous frame (hence the k-1 notation) and to perform spatio-temporal interpolation to generate interpolated foreground V[k] vectors. The spatio-temporal interpolation unit 50 may recombine the nFG signals 49 with the foreground V[k] vectors 51_k to recover reordered foreground HOA coefficients. The spatio-temporal interpolation unit 50 may then divide the reordered foreground HOA coefficients by the interpolated V[k] vectors to generate interpolated nFG signals 49′. The spatio-temporal interpolation unit 50 may also output those of the foreground V[k] vectors 51_k that were used to generate the interpolated foreground V[k] vectors, so that an audio decoding device, such as the audio decoding device 24, may generate the interpolated foreground V[k] vectors and thereby recover the foreground V[k] vectors 51_k. The foreground V[k] vectors 51_k used to generate the interpolated foreground V[k] vectors are denoted as the remaining foreground V[k] vectors 53. To ensure that the same V[k] and V[k-1] are used at the encoder and the decoder (to create the interpolated vectors V[k]), quantized/dequantized versions of the vectors may be used at both the encoder and the decoder. The spatio-temporal interpolation unit 50 may output the interpolated nFG signals 49′ to the psychoacoustic audio coder unit 40 and the interpolated foreground V[k] vectors 51_k to the coefficient reduction unit 46.
The coefficient reduction unit 46 may represent a unit configured to perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43, and to output reduced foreground V[k] vectors 55 to the quantization unit 52. The reduced foreground V[k] vectors 55 may have dimensions D: [(N+1)² - (N_BG+1)² - BG_TOT] × nFG. In this respect, the coefficient reduction unit 46 may represent a unit configured to reduce the number of coefficients in the remaining foreground V[k] vectors 53. In other words, the coefficient reduction unit 46 may represent a unit configured to eliminate those coefficients in the foreground V[k] vectors (which form the remaining foreground V[k] vectors 53) that carry little to no directional information. In some examples, the coefficients of the distinct or, in other words, foreground V[k] vectors corresponding to the first-order and zero-order basis functions (which may be denoted N_BG) provide little directional information and can therefore be removed from the foreground V vectors (through a process that may be referred to as "coefficient reduction"). In this example, greater flexibility may be provided not only to identify the coefficients that correspond to N_BG but also to identify additional HOA channels (which may be denoted by the variable TotalOfAddAmbHOAChan) from the set [(N_BG+1)²+1, (N+1)²].
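The size of the reduced foreground V[k] vectors can be computed directly from the dimension formula above. The sketch takes that formula at face value; the function name and the example values (N = 4, N_BG = 1, BG_TOT = 2, nFG = 2) are assumptions chosen only to make the arithmetic concrete.

```python
def reduced_v_shape(order, n_bg, bg_tot, n_fg):
    """Shape [(N+1)^2 - (N_BG+1)^2 - BG_TOT] x nFG of the reduced V[k] vectors."""
    full = (order + 1) ** 2          # coefficients per full V[k] vector
    background = (n_bg + 1) ** 2     # coefficients of order <= N_BG
    remaining = full - background - bg_tot
    if remaining < 0:
        raise ValueError("more coefficients removed than are available")
    return (remaining, n_fg)


# Hypothetical frame: 25 - 4 - 2 = 19 coefficients kept per foreground vector.
shape = reduced_v_shape(order=4, n_bg=1, bg_tot=2, n_fg=2)
```

In this example each foreground vector shrinks from 25 to 19 coefficients before quantization.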
The quantization unit 52 may represent a unit configured to perform any form of quantization to compress the reduced foreground V[k] vectors 55 and generate coded foreground V[k] vectors 57, which it outputs to the bitstream generation unit 42. In operation, the quantization unit 52 may represent a unit configured to compress a spatial component of the sound field, i.e., in this example, one or more of the reduced foreground V[k] vectors 55. The quantization unit 52 may perform any one of the following 12 quantization modes, as indicated by a quantization mode syntax element denoted "NbitsQ":
NbitsQ value    Type of quantization mode
0-3:    Reserved
4:    Vector quantization
5:    Scalar quantization without Huffman coding
6:    6-bit scalar quantization with Huffman coding
7:    7-bit scalar quantization with Huffman coding
8:    8-bit scalar quantization with Huffman coding
…    …
16:    16-bit scalar quantization with Huffman coding
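The table above can be read as a simple lookup. In the sketch below, the rows elided by "…" are assumed to follow the same "n-bit scalar quantization with Huffman coding" pattern as the listed rows; that pattern, like the function name, is an assumption and is not spelled out in the table.

```python
def describe_nbitsq(nbits_q: int) -> str:
    """Map an NbitsQ value to the quantization mode named in the table."""
    if 0 <= nbits_q <= 3:
        return "reserved"
    if nbits_q == 4:
        return "vector quantization"
    if nbits_q == 5:
        return "scalar quantization without Huffman coding"
    if 6 <= nbits_q <= 16:
        # Values between 8 and 16 are elided in the table; pattern assumed.
        return f"{nbits_q}-bit scalar quantization with Huffman coding"
    raise ValueError(f"NbitsQ value {nbits_q} is not listed in the table")


mode = describe_nbitsq(6)
```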
The quantization unit 52 may also perform predicted versions of any of the foregoing types of quantization modes, in which a difference is determined between an element of the V vector of the previous frame (or a weight, when vector quantization is performed) and the corresponding element of the V vector of the current frame (or weight, when vector quantization is performed). The quantization unit 52 may then quantize the difference between the elements or weights of the current frame and the previous frame, rather than the value of the element of the V vector of the current frame itself.
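The predicted mode can be sketched for the scalar case as follows. The uniform quantizer and its step size are assumptions made for illustration; the text does not specify the quantizer, and all function names are hypothetical. The previous frame is taken in its dequantized form so that an encoder and a decoder would predict from the same values.

```python
def quantize_residual(current, previous_dequantized, step):
    """Quantize the element-wise difference between the current and previous V vectors."""
    return [round((c - p) / step) for c, p in zip(current, previous_dequantized)]


def reconstruct(indices, previous_dequantized, step):
    """Decoder side: add the dequantized difference back onto the previous frame."""
    return [p + q * step for q, p in zip(indices, previous_dequantized)]


# Hypothetical V-vector elements; values chosen to be exact in binary floating point.
previous_v = [0.5, -0.25, 0.0]
current_v = [0.75, -0.5, 0.125]
step = 0.125

indices = quantize_residual(current_v, previous_v, step)
decoded_v = reconstruct(indices, previous_v, step)
```

When consecutive frames are similar, the differences are small and tend to need fewer bits than the element values themselves.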
The quantization unit 52 may perform multiple forms of quantization with respect to each of the reduced foreground V[k] vectors 55 to obtain multiple coded versions of the reduced foreground V[k] vectors 55. The quantization unit 52 may select one of the coded versions of the reduced foreground V[k] vectors 55 as the coded foreground V[k] vector 57. In other words, the quantization unit 52 may select one of the non-predicted vector-quantized V vector, the predicted vector-quantized V vector, the non-Huffman-coded scalar-quantized V vector, and the Huffman-coded scalar-quantized V vector to use as the output switched-quantized V vector, based on any combination of the criteria discussed in this disclosure. In some examples, the quantization unit 52 may select a quantization mode from a set of quantization modes that includes a vector quantization mode and one or more scalar quantization modes, and quantize an input V vector based on (or according to) the selected mode. The quantization unit 52 may then provide the selected one of the following to the bitstream generation unit 42 as the coded foreground V[k] vectors 57: the non-predicted vector-quantized V vector (e.g., in terms of weight values or bits indicative thereof), the predicted vector-quantized V vector (e.g., in terms of error values or bits indicative thereof), the non-Huffman-coded scalar-quantized V vector, or the Huffman-coded scalar-quantized V vector. The quantization unit 52 may also provide the syntax element indicating the quantization mode (e.g., the NbitsQ syntax element) and any other syntax elements used to dequantize or otherwise reconstruct the V vector.
The decorrelation unit 40′ included within the audio encoding device 20 may represent a single instance or multiple instances of a unit configured to apply one or more decorrelation transforms to the HOA coefficients 47′ to obtain the decorrelated HOA coefficients 47″. In some examples, the decorrelation unit 40′ may apply a UHJ matrix to the HOA coefficients 47′. In various examples of this disclosure, the UHJ matrix may also be referred to as a "phase-based transform." Applying the phase-based transform may also be referred to herein as "phase-shift decorrelation."
The Ambisonics UHJ format is a development of the Ambisonics surround sound system designed to be compatible with mono and stereo media. The UHJ format includes a hierarchy of systems in which the recorded sound field is reproduced with a degree of accuracy that varies according to the available channels. In various instances, UHJ is also referred to as "C-Format." The initials indicate some of the sources incorporated into the system: U from Universal (UD-4); H from Matrix H; and J from System 45J.
UHJ is a hierarchical system for encoding and decoding directional sound information within Ambisonics technology. Depending on the number of channels available, a system can carry more or less information. UHJ is fully stereo- and mono-compatible. Up to four channels (L, R, T, Q) may be used.
In one form, 2-channel (L, R) UHJ, horizontal (or "planar") surround information can be carried by normal stereo signal channels (CD, FM, digital radio, etc.) and recovered at the listening end using a UHJ decoder. Summing the two channels can yield a compatible mono signal, which can be a more accurate representation of the two-channel version than summing a conventional "panpotted mono" source. If a third channel (T) is available, the third channel can be used to yield improved localization accuracy for the planar surround effect when decoded via a 3-channel UHJ decoder. The third channel may not need to have full audio bandwidth for this purpose, leading to the possibility of so-called "2½-channel" systems, in which the third channel is bandwidth-limited. In one example, the limit may be 5 kHz. The third channel can be broadcast via FM radio, for example, by means of phase-quadrature modulation. Adding a fourth channel (Q) to the UHJ system may allow the encoding of full surround sound with height (sometimes referred to as Periphony), with a level of accuracy identical to 4-channel B-Format.
2-channel UHJ is a format commonly used for the distribution of Ambisonic recordings. 2-channel UHJ recordings can be transmitted via all normal stereo channels, and any of the normal 2-channel media can be used without alteration. UHJ is stereo-compatible in that, without decoding, the listener may perceive a stereo image, but one that is significantly wider than conventional stereo (e.g., so-called "Super Stereo"). The left and right channels can also be summed for a very high degree of mono compatibility. Replayed via a UHJ decoder, the surround capability may be revealed.
An example mathematical representation of the decorrelation unit 40' applying the UHJ matrix (or phase-based transform) is as follows:
UHJ encoding:
S = (0.9397*W) + (0.1856*X);
D = imag(hilbert((-0.3420*W) + (0.5099*X))) + (0.6555*Y);
T = imag(hilbert((-0.1432*W) + (0.6512*X))) - (0.7071*Y);
Q = 0.9772*Z;
Conversion of S and D to Left and Right:
Left = (S + D)/2;
Right = (S - D)/2;
According to some implementations of the calculations above, the assumptions underlying the calculations may include the following: the HOA background channels are first-order ambisonics, FuMa normalized, in the ambisonics channel numbering order W (a00), X (a11), Y (a1-1), Z (a10).
In the calculations listed above, the decorrelation unit 40' may perform scalar multiplication of various matrices by constant values. For example, to obtain the S signal, the decorrelation unit 40' may perform scalar multiplication of the W matrix by the constant value 0.9397 and of the X matrix by the constant value 0.1856. As also illustrated in the calculations listed above, the decorrelation unit 40' may apply a Hilbert transform (denoted by the "hilbert()" function in the UHJ encoding above) in obtaining each of the D and T signals. The "imag()" function in the UHJ encoding above indicates that the imaginary part (in the mathematical sense) of the result of the Hilbert transform is taken.
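The UHJ encoding listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the encoder's actual implementation: the helper names `hilbert_imag` and `uhj_encode` are hypothetical, the signals are plain lists of samples processed in one piece (no framing or delay handling), and the 90-degree phase shift is computed with a naive O(n²) DFT so the sketch needs only the standard library.

```python
import cmath
import math


def hilbert_imag(x):
    """Return the equivalent of imag(hilbert(x)): x shifted in phase by 90 degrees."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
                for f in range(n)]
    # Analytic-signal weights: keep DC (and Nyquist when n is even),
    # double the positive-frequency bins, zero the negative-frequency bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for f in positive:
        weights[f] = 2.0
    # Inverse DFT of the weighted spectrum gives the analytic signal.
    analytic = [sum(weights[f] * spectrum[f] * cmath.exp(2j * math.pi * f * t / n)
                    for f in range(n)) / n
                for t in range(n)]
    return [z.imag for z in analytic]


def uhj_encode(w, x, y, z):
    """Apply the FuMa-normalized UHJ constants from the listing above."""
    s = [0.9397 * wi + 0.1856 * xi for wi, xi in zip(w, x)]
    d_shifted = hilbert_imag([-0.3420 * wi + 0.5099 * xi for wi, xi in zip(w, x)])
    d = [ds + 0.6555 * yi for ds, yi in zip(d_shifted, y)]
    t_shifted = hilbert_imag([-0.1432 * wi + 0.6512 * xi for wi, xi in zip(w, x)])
    t = [ts - 0.7071 * yi for ts, yi in zip(t_shifted, y)]
    q = [0.9772 * zi for zi in z]
    # Conversion of S and D to Left and Right.
    left = [(si + di) / 2 for si, di in zip(s, d)]
    right = [(si - di) / 2 for si, di in zip(s, d)]
    return left, right, t, q
```

With a W-only cosine input, Left + Right reproduces 0.9397*W and Left - Right is the phase-shifted term scaled by -0.3420, which is a quick way to sanity-check the sketch.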
Another example mathematical representation of the decorrelation unit 40' applying the UHJ matrix (or phase-based transform) is as follows:
UHJ encoding:
S = (0.9396926*W) + (0.151520536509082*X);
D = imag(hilbert((-0.3420201*W) + (0.416299273350443*X))) + (0.535173990363608*Y);
T = 0.940604061228740*(imag(hilbert((-0.1432*W) + (0.531702573500135*X))) - (0.577350269189626*Y));
Q = Z;
Conversion of S and D to Left and Right:
Left = (S + D)/2;
Right = (S - D)/2;
In some example implementations of the calculations above, the assumptions underlying the calculations may include the following: the HOA background channels are first-order ambisonics, N3D (or "full three-D") normalized, in the ambisonics channel numbering order W (a00), X (a11), Y (a1-1), Z (a10). Although described herein with respect to N3D normalization, it will be appreciated that the example calculations may also be applied to HOA background channels that are SN3D normalized (or "Schmidt semi-normalized"). N3D and SN3D normalization may differ in terms of the scaling factors used. An example representation of N3D normalization, relative to SN3D normalization, is expressed as follows:
An example of the weighting coefficients used in SN3D normalization is expressed as follows:
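For reference, the conventional definitions of the two normalizations can be written as below for a spherical basis function of order $l$ and sub-order $m$. These are the forms commonly given in the ambisonics literature and are assumed here to be the ones intended; some references omit the constant factor $1/(4\pi)$ from the SN3D weight.

$$N_{l,m}^{N3D} = N_{l,m}^{SN3D}\,\sqrt{2l+1}$$

$$N_{l,m}^{SN3D} = \sqrt{\frac{2-\delta_m}{4\pi}\cdot\frac{(l-|m|)!}{(l+|m|)!}}, \qquad \delta_m = \begin{cases} 1 & \text{if } m = 0 \\ 0 & \text{if } m \neq 0 \end{cases}$$

For first-order channels ($l = 1$) the two normalizations therefore differ by a factor of $\sqrt{3}$, while the zeroth-order channel is unaffected.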
In the calculations listed above, the decorrelation unit 40' may perform scalar multiplication of various matrices by constant values. For example, to obtain the S signal, the decorrelation unit 40' may perform scalar multiplication of the W matrix by the constant value 0.9396926 and of the X matrix by the constant value 0.151520536509082. As also illustrated in the calculations listed above, the decorrelation unit 40' may apply a Hilbert transform (denoted by the "hilbert()" function, or phase-shift decorrelation, in the UHJ encoding above) in obtaining each of the D and T signals. The "imag()" function in the UHJ encoding above indicates that the imaginary part (in the mathematical sense) of the result of the Hilbert transform is taken.
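The two listings above use different constants for the X and Y terms. One way to relate them, offered here as an observation rather than something the text states, is that the N3D constants for X and Y inside S, D, and T match the FuMa constants scaled by sqrt(2/3), up to the rounding of the four-digit FuMa values; a plausible reading is that this factor reflects the different relative gain of W versus X and Y under the two normalizations. The additional overall factor on T and the change from Q = 0.9772*Z to Q = Z are not covered by this check. The dictionary keys and the function name below are hypothetical labels.

```python
import math

# X and Y constants from the FuMa-normalized listing (four significant digits).
fuma = {"S_X": 0.1856, "D_X": 0.5099, "D_Y": 0.6555, "T_X": 0.6512, "T_Y": 0.7071}

# Corresponding constants from the N3D-normalized listing.
n3d = {
    "S_X": 0.151520536509082,
    "D_X": 0.416299273350443,
    "D_Y": 0.535173990363608,
    "T_X": 0.531702573500135,
    "T_Y": 0.577350269189626,
}

# Assumed relative scale between the two sets of first-order constants.
scale = math.sqrt(2.0 / 3.0)


def max_abs_deviation(reference, target, factor):
    """Largest absolute difference between factor * reference[k] and target[k]."""
    return max(abs(reference[k] * factor - target[k]) for k in reference)


deviation = max_abs_deviation(fuma, n3d, scale)
```

The deviation stays below 1e-4, which is consistent with the FuMa constants having been rounded to four digits.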
The decorrelation unit 40' may perform the calculations listed above such that the resulting S and D signals represent left and right audio signals (or in other words, stereo audio signals). In some such scenarios, the decorrelation unit 40' may output the T and Q signals as part of the decorrelated HOA coefficients 47", but a decoding device that receives the bitstream 21 may not process the T and Q signals when rendering to a stereo speaker geometry (or in other words, a stereo speaker configuration). In examples, the HOA coefficients 47' may represent a sound field to be rendered on a mono audio reproduction system. The decorrelation unit 40' may output the S and D signals as part of the decorrelated HOA coefficients 47", and a decoding device that receives the bitstream 21 may combine (or "mix") the S and D signals to form an audio signal to be rendered and/or output in mono audio format. In these examples, the decoding device and/or the reproduction device may recover the mono audio signal in various ways. One example is by mixing the left and right signals (represented by the S and D signals). Another example is by applying a UHJ matrix (or phase-based transform) to decode a W signal (discussed in further detail below with respect to FIG. 5). By producing a natural left signal and a natural right signal in the form of the S and D signals through application of the UHJ matrix (or phase-based transform), the decorrelation unit 40' may implement the techniques of this disclosure to provide potential advantages and/or potential improvements over techniques that apply other decorrelation transforms (such as the mode matrix described in the MPEG-H standard).
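The stereo and mono cases described above can be illustrated with a short sketch. It assumes the Left/Right conversion from the UHJ listings and a mono mix formed by summing Left and Right, in which case the D term cancels and the mix equals S; the function names are hypothetical.

```python
def sd_to_left_right(s, d):
    """Convert S and D sample lists to Left and Right: (S + D)/2 and (S - D)/2."""
    left = [(si + di) / 2 for si, di in zip(s, d)]
    right = [(si - di) / 2 for si, di in zip(s, d)]
    return left, right


def mono_mix(left, right):
    """Sum Left and Right sample by sample; the D contribution cancels."""
    return [l + r for l, r in zip(left, right)]


s = [1.0, 0.5]
d = [0.25, -0.5]
left, right = sd_to_left_right(s, d)
mono = mono_mix(left, right)
```

Here `mono` equals `s`, which is the mono-compatibility property: a stereo renderer uses Left and Right directly, and a mono renderer can sum them without needing the T or Q signals.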
In various examples, the decorrelation unit 40' may apply different decorrelation transforms based on a bit rate of the received HOA coefficients 47'. For example, in a scenario where the HOA coefficients 47' represent a four-channel input, the decorrelation unit 40' may apply the UHJ matrix (or phase-based transform) described above. More specifically, based on the HOA coefficients 47' representing a four-channel input, the decorrelation unit 40' may apply a 4×4 UHJ matrix (or phase-based transform). For example, the 4×4 matrix may be orthogonal to the four-channel input of the HOA coefficients 47'. In other words, in instances where the HOA coefficients 47' represent a smaller number of channels (e.g., four), the decorrelation unit 40' may apply the UHJ matrix as the selected decorrelation transform to decorrelate the background signals of the HOA coefficients 47' and obtain the decorrelated HOA coefficients 47".
According to this example, if the HOA coefficients 47' represent a larger number of channels (e.g., nine), the decorrelation unit 40' may apply a decorrelation transform other than the UHJ matrix (or phase-based transform). For example, in a scenario where the HOA coefficients 47' represent a nine-channel input, the decorrelation unit 40' may apply a mode matrix (e.g., as described in the MPEG-H standard) to decorrelate the HOA coefficients 47'. In examples where the HOA coefficients 47' represent a nine-channel input, the decorrelation unit 40' may apply a 9×9 mode matrix to obtain the decorrelated HOA coefficients 47".
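The selection described in the two preceding paragraphs can be sketched as a simple dispatch on the number of input channels. The cut-off of four channels, the function name, and the string labels are assumptions made for illustration; the text gives four and nine channels only as examples.

```python
def select_decorrelation_transform(num_channels, uhj_max_channels=4):
    """Pick a decorrelation transform label from the channel count.

    uhj_max_channels is a hypothetical cut-off: inputs with at most this many
    channels use the UHJ matrix (phase-based transform); larger inputs use a
    mode matrix with one row and column per channel.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be positive")
    if num_channels <= uhj_max_channels:
        return "uhj_matrix"
    return "mode_matrix"


four_channel_choice = select_decorrelation_transform(4)
nine_channel_choice = select_decorrelation_transform(9)
```

Under these assumptions a four-channel input maps to the UHJ matrix and a nine-channel input maps to the mode matrix.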
In turn, various components of the audio encoding device 20 (such as the psychoacoustic audio coder unit 40) may perceptually code the decorrelated HOA coefficients 47" according to AAC or USAC. The decorrelation unit 40' may apply the phase-shift decorrelation transform (e.g., the UHJ matrix or phase-based transform in the case of a four-channel input) to optimize the AAC/USAC coding for HOA. In examples where the HOA coefficients 47' (and thereby, the decorrelated HOA coefficients 47") represent audio data to be rendered on a stereo reproduction system, the decorrelation unit 40' may apply the techniques of this disclosure to improve or optimize compression, based on AAC and USAC being relatively oriented toward (or optimized for) stereo audio data.
It will be understood that the decorrelation unit 40' may apply the techniques described herein in scenarios where the energy-compensated HOA coefficients 47' include foreground channels, as well as in scenarios where the energy-compensated HOA coefficients 47' do not include any foreground channels. As one example, the decorrelation unit 40' may apply the techniques and/or calculations described above in a scenario where the energy-compensated HOA coefficients 47' include zero (0) foreground channels and four (4) background channels (e.g., a scenario of a lower bit rate).
In some examples, the decorrelation unit 40' may cause the bitstream generation unit 42 to signal, as part of the vector-based bitstream 21, one or more syntax elements indicating that the decorrelation unit 40' applied a decorrelation transform to the HOA coefficients 47'. By providing this indication to a decoding device, the decorrelation unit 40' may enable the decoding device to perform the reciprocal (inverse) transform on the audio data in the HOA domain. In some examples, the decorrelation unit 40' may cause the bitstream generation unit 42 to signal syntax elements indicating which decorrelation transform was applied, such as the UHJ matrix (or other phase-based transform) or the mode matrix.
The decorrelation unit 40' may apply a phase-based transform to the energy-compensated ambient HOA coefficients 47'. The phase-based transform for the first $O_{MIN}$ HOA coefficient sequences of $C_{AMB}(k-1)$ is defined as follows
where the coefficients $d$ are as defined in Table 1, and the signal frames $S(k-2)$ and $M(k-2)$ are defined as follows
$S(k-2) = A_{+90}(k-2) + d(6) \cdot c_{AMB,2}(k-2)$
$M(k-2) = d(4) \cdot c_{AMB,1}(k-2) + d(5) \cdot c_{AMB,4}(k-2)$
and $A_{+90}(k-2)$ and $B_{+90}(k-2)$ are the frames of the +90 degree phase-shifted signals $A$ and $B$, defined as follows
The phase-based transform for the first $O_{MIN}$ HOA coefficient sequences of $C_{P,AMB}(k-1)$ is defined accordingly. The transform described may introduce a delay of one frame.
In the above, $x_{AMB,LOW,1}(k-2)$ through $x_{AMB,LOW,4}(k-2)$ may correspond to the decorrelated ambient HOA coefficients 47". In the equations above, the variable $c_{AMB,1}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (0:0), which may also be referred to as the 'W' channel or component. The variable $c_{AMB,2}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:-1), which may also be referred to as the 'Y' channel or component. The variable $c_{AMB,3}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:0), which may also be referred to as the 'Z' channel or component. The variable $c_{AMB,4}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:1), which may also be referred to as the 'X' channel or component. $c_{AMB,1}(k)$ through $c_{AMB,3}(k)$ may correspond to the ambient HOA coefficients 47'.
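The index assignments above (W, Y, Z, X at indices 1 through 4) are consistent with numbering the channels by order and sub-order as n = order² + order + sub-order + 1. The sketch below assumes that numbering; the function and dictionary names are hypothetical.

```python
def channel_index(order, sub_order):
    """1-based channel index for a spherical basis function (order:sub-order)."""
    if order < 0 or abs(sub_order) > order:
        raise ValueError("sub-order must satisfy -order <= sub_order <= order")
    return order * order + order + sub_order + 1


# Conventional first-order component names keyed by (order, sub-order).
FIRST_ORDER_NAMES = {(0, 0): "W", (1, -1): "Y", (1, 0): "Z", (1, 1): "X"}

# Map each index to its name, e.g. index 2 -> 'Y'.
index_to_name = {channel_index(l, m): name for (l, m), name in FIRST_ORDER_NAMES.items()}
```

Under this assumption `index_to_name` is `{1: 'W', 2: 'Y', 3: 'Z', 4: 'X'}`, matching the channel names given in the paragraph above.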
Table 1 below illustrates an example of coefficients that the decorrelation unit 40' may use to perform the phase-based transform.
Table 1: Coefficients for the phase-based transform
In some examples, various components of the audio encoding device 20 (such as the bitstream generation unit 42) may be configured to transmit only a first-order HOA representation for lower target bit rates (e.g., a target bit rate of 128K or 256K). According to some such examples, the audio encoding device 20 (or components thereof, such as the bitstream generation unit 42) may be configured to discard higher-order HOA coefficients (e.g., coefficients with an order greater than one, or in other words, N>1). However, in examples where the audio encoding device 20 determines that the target bit rate is relatively high, the audio encoding device 20 (e.g., the bitstream generation unit 42) may separate the foreground and background channels, and may assign bits (e.g., in greater amounts) to the foreground channels.
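The bit-rate-dependent behavior described above can be sketched as follows. The sketch assumes the HOA channels are ordered so that the first (N+1)² channels carry orders 0 through N, and it uses a hypothetical threshold of 256,000 bits per second taken from the example figures in the text; the function names are also hypothetical.

```python
def truncate_to_order(hoa_channels, max_order):
    """Keep only the channels of order <= max_order, i.e. the first (max_order + 1)^2."""
    keep = (max_order + 1) ** 2
    return hoa_channels[:keep]


def channels_to_transmit(hoa_channels, target_bitrate, low_rate_threshold=256_000):
    """Return the channels an encoder might transmit for the given target bit rate.

    low_rate_threshold is an assumed value; at or below it only the
    first-order representation (four channels) is kept.
    """
    if target_bitrate <= low_rate_threshold:
        return truncate_to_order(hoa_channels, max_order=1)
    return list(hoa_channels)


second_order_input = list(range(9))  # nine placeholder channels (orders 0..2)
low_rate = channels_to_transmit(second_order_input, target_bitrate=128_000)
high_rate = channels_to_transmit(second_order_input, target_bitrate=512_000)
```

With these assumptions the low-rate case keeps four channels and the high-rate case keeps all nine; the separation into foreground and background channels at higher rates is not modeled here.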
The psychoacoustic audio coder unit 40 included within the audio encoding device 20 may represent multiple instances of a psychoacoustic audio coder, each of which is used to encode a different audio object or HOA channel of each of the decorrelated HOA coefficients 47" and the interpolated nFG signals 49' to generate the encoded ambient HOA coefficients 59 and the encoded nFG signals 61. The psychoacoustic audio coder unit 40 may output the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to the bitstream generation unit 42.
The bitstream generation unit 42 included within the audio encoding device 20 represents a unit that formats data to conform to a known format (which may refer to a format known by a decoding device), thereby generating the vector-based bitstream 21. In other words, the bitstream 21 may represent encoded audio data that has been encoded in the manner described above. In some examples, the bitstream generation unit 42 may represent a multiplexer that receives the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61, and the background channel information 43. The bitstream generation unit 42 may then generate the bitstream 21 based on the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61, and the background channel information 43. In this way, the bitstream generation unit 42 may specify the vectors 57 in the bitstream 21 to obtain the bitstream 21. The bitstream 21 may include a primary or main bitstream and one or more side channel bitstreams.
Although not shown in the example of FIG. 3, the audio encoding device 20 may also include a bitstream output unit that switches the bitstream output from the audio encoding device 20 (e.g., between the direction-based bitstream 21 and the vector-based bitstream 21) based on whether the current frame is to be encoded using direction-based synthesis or vector-based synthesis. The bitstream output unit may perform the switch based on a syntax element, output by the content analysis unit 26, that indicates whether direction-based synthesis was performed (as a result of detecting that the HOA coefficients 11 were generated from a synthetic audio object) or vector-based synthesis was performed (as a result of detecting that the HOA coefficients were recorded). The bitstream output unit may specify the correct header syntax to indicate the switch or the current encoding used for the current frame, along with the respective one of the bitstreams 21.
Moreover, as noted above, the sound field analysis unit 44 may identify BG_TOT ambient HOA coefficients 47, which may change on a frame-by-frame basis (although at times BG_TOT may remain constant or the same across two or more temporally adjacent frames). A change in BG_TOT may result in changes to the coefficients expressed in the reduced foreground V[k] vectors 55. A change in BG_TOT may also result in background HOA coefficients (which may also be referred to as "ambient HOA coefficients") that change on a frame-by-frame basis (although, again, BG_TOT may at times remain constant or the same across two or more temporally adjacent frames). The changes often result in a change of energy for the aspects of the sound field represented by the addition or removal of the additional ambient HOA coefficients and the corresponding removal of coefficients from, or addition of coefficients to, the reduced foreground V[k] vectors 55.
As a result, the sound field analysis unit 44 may further determine when the ambient HOA coefficients change from frame to frame, and generate a flag or other syntax element indicative of the change to the ambient HOA coefficients in terms of being used to represent the ambient components of the sound field (where the change may also be referred to as a "transition" of the ambient HOA coefficients). In particular, the coefficient reduction unit 46 may generate the flag (which may be denoted as an AmbCoeffTransition flag or an AmbCoeffIdxTransition flag), providing the flag to the bitstream generation unit 42 so that the flag may be included in the bitstream 21 (possibly as part of the side channel information).
In addition to specifying the ambient coefficient transition flag, the coefficient reduction unit 46 may also modify how the reduced foreground V[k] vectors 55 are generated. In one example, upon determining that one of the ambient HOA coefficients is in transition during the current frame, the coefficient reduction unit 46 may specify, for each of the V-vectors of the reduced foreground V[k] vectors 55, a vector coefficient (which may also be referred to as a "vector element" or "element") that corresponds to the ambient HOA coefficient in transition. Again, the ambient HOA coefficient in transition may be added to, or removed from, the BG_TOT total number of background coefficients. Therefore, the resulting change in the total number of background coefficients affects whether the ambient HOA coefficient is included in the bitstream, and whether the corresponding element of the V-vectors is included for the V-vectors specified in the bitstream in the second and third configuration modes described above. More information regarding how the coefficient reduction unit 46 may specify the reduced foreground V[k] vectors 55 to overcome the changes in energy is provided in U.S. application Ser. No. 14/594,533, entitled "TRANSITIONING OF AMBIENT HIGHER-ORDER AMBISONIC COEFFICIENTS," filed Jan. 12, 2015.
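The frame-to-frame comparison that drives the transition flag can be sketched as a set difference between the ambient coefficient indices used in consecutive frames. This is a hedged illustration only: the function name, the returned dictionary keys, and the idea of reporting a single boolean per frame are assumptions, and the actual AmbCoeffTransition signaling and the associated V-vector element handling are defined elsewhere in the disclosure and the referenced application.

```python
def ambient_coefficient_transitions(prev_indices, curr_indices):
    """Compare the ambient HOA coefficient indices of two consecutive frames.

    Returns which indices were added to or removed from the background set
    and whether any transition occurred (a stand-in for a transition flag).
    """
    prev_set, curr_set = set(prev_indices), set(curr_indices)
    added = sorted(curr_set - prev_set)
    removed = sorted(prev_set - curr_set)
    return {
        "added": added,
        "removed": removed,
        "in_transition": bool(added or removed),
    }


steady = ambient_coefficient_transitions([1, 2, 3, 4], [1, 2, 3, 4])
changed = ambient_coefficient_transitions([1, 2, 3, 4], [1, 2, 3, 4, 6])
```

In this sketch an index that appears in `added` or `removed` marks the coefficient whose corresponding V-vector element would need the special handling described above.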
Thus, the audio encoding device 20 may represent an example of a device for compressing audio, the device being configured to apply a decorrelation transform to ambient ambisonic coefficients to obtain a decorrelated representation of the ambient ambisonic coefficients, the ambient HOA coefficients having been extracted from a plurality of higher-order ambisonic coefficients and being representative of a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one. In some examples, to apply the decorrelation transform, the device is configured to apply a UHJ matrix to the ambient ambisonic coefficients.
In some examples, the device is further configured to normalize the UHJ matrix according to N3D (full three-D) normalization. In some examples, the device is further configured to normalize the UHJ matrix according to SN3D (Schmidt semi-normalization) normalization. In some examples, the ambient ambisonic coefficients are associated with spherical basis functions having an order of zero or an order of one, and to apply the UHJ matrix to the ambient ambisonic coefficients, the device is configured to perform a scalar multiplication of the UHJ matrix with respect to at least a subset of the ambient ambisonic coefficients. In some examples, to apply the decorrelation transform, the device is configured to apply a mode matrix to the ambient ambisonic coefficients.
According to some examples, to apply the decorrelation transform, the device is configured to obtain a left signal and a right signal from the decorrelated ambient ambisonic coefficients. According to some examples, the device is further configured to signal the decorrelated ambient ambisonic coefficients along with one or more foreground channels. According to some examples, to signal the decorrelated ambient ambisonic coefficients along with the one or more foreground channels, the device is configured to do so in response to a determination that a target bit rate meets or exceeds a predetermined threshold.
In some examples, the device is further configured to signal the decorrelated ambient ambisonic coefficients without signaling any foreground channels. In some examples, to signal the decorrelated ambient ambisonic coefficients without signaling any foreground channels, the device is configured to do so in response to a determination that a target bit rate is below a predetermined threshold. In some examples, the device is further configured to signal an indication that the decorrelation transform has been applied to the ambient ambisonic coefficients. In some examples, the device further includes a microphone array configured to capture the audio data to be compressed.
FIG. 4 is a block diagram illustrating the audio decoding device 24 of FIG. 2 in more detail. As shown in the example of FIG. 4, the audio decoding device 24 may include an extraction unit 72, a direction-based reconstruction unit 90, a vector-based reconstruction unit 92, and a recorrelation unit 81.
Although described below, more information regarding the audio decoding device 24 and the various aspects of decompressing or otherwise decoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.
The extraction unit 72 may represent a unit configured to receive the bitstream 21 and extract the various encoded versions (e.g., a direction-based encoded version or a vector-based encoded version) of the HOA coefficients 11. The extraction unit 72 may determine, from the above-noted syntax element, whether the HOA coefficients 11 were encoded via the direction-based or the vector-based version. When direction-based encoding was performed, the extraction unit 72 may extract the direction-based version of the HOA coefficients 11 and the syntax elements associated with the encoded version (denoted as direction-based information 91 in the example of FIG. 4), passing the direction-based information 91 to the direction-based reconstruction unit 90. The direction-based reconstruction unit 90 may represent a unit configured to reconstruct the HOA coefficients, in the form of HOA coefficients 11', based on the direction-based information 91. The bitstream and the arrangement of syntax elements within the bitstream are described below.
When the syntax element indicates that the HOA coefficients 11 were encoded using vector-based synthesis, the extraction unit 72 may extract the coded foreground V[k] vectors 57 (which may include the coded weights 57 and/or indices 63 or scalar-quantized V-vectors), the encoded ambient HOA coefficients 59, and the corresponding audio objects 61 (which may also be referred to as the encoded nFG signals 61). Each of the audio objects 61 corresponds to one of the vectors 57. The extraction unit 72 may pass the coded foreground V[k] vectors 57 to the V-vector reconstruction unit 74 and provide the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to the psychoacoustic decoding unit 80.
The V-vector reconstruction unit 74 may represent a unit configured to reconstruct the V-vectors from the coded foreground V[k] vectors 57. The V-vector reconstruction unit 74 may operate in a manner reciprocal to that of the quantization unit 52.
The psychoacoustic decoding unit 80 may operate in a manner reciprocal to that of the psychoacoustic audio coder unit 40 shown in the example of FIG. 3, so as to decode the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 and thereby generate the energy-compensated ambient HOA coefficients 47′ and the interpolated nFG signals 49′ (which may also be referred to as the interpolated nFG audio objects 49′). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47′ to the re-correlation unit 81 and pass the nFG signals 49′ to the foreground formulation unit 78. The re-correlation unit 81 may in turn apply one or more re-correlation transforms to the energy-compensated ambient HOA coefficients 47′ to obtain one or more re-correlated HOA coefficients 47″ (or correlated HOA coefficients 47″), and may pass the correlated HOA coefficients 47″ to the HOA coefficient formulation unit 82 (optionally, through the fade unit 770).
Similar to the description above with respect to the decorrelation unit 40′ of the audio encoding device 20, the re-correlation unit 81 may implement the techniques of this disclosure to reduce the correlation between the background channels of the energy-compensated ambient HOA coefficients 47′, thereby reducing or mitigating noise unmasking. In examples where the re-correlation unit 81 applies a UHJ matrix (e.g., an inverse UHJ matrix) as the selected re-correlation transform, the re-correlation unit 81 may improve the compression rate and conserve computing resources by reducing data processing operations. In some examples, the vector-based bitstream 21 may include one or more syntax elements indicating that a decorrelation transform was applied during encoding. Including such syntax elements in the vector-based bitstream 21 may enable the re-correlation unit 81 to perform the reciprocal of the decorrelation transform (e.g., a correlation or re-correlation transform) on the energy-compensated HOA coefficients 47′. In some examples, the signaled syntax elements may indicate which decorrelation transform was applied, such as the UHJ matrix or the mode matrix, thereby enabling the re-correlation unit 81 to select the appropriate re-correlation transform to apply to the energy-compensated HOA coefficients 47′.
In examples where the vector-based reconstruction unit 92 outputs the HOA coefficients 11′ to a reproduction system comprising a stereo system, the re-correlation unit 81 may process the S signal and the D signal (e.g., an intrinsic left signal and an intrinsic right signal) to produce the re-correlated HOA coefficients 47″. For example, because the S signal and the D signal represent an intrinsic left signal and an intrinsic right signal, the reproduction system may use the S signal and the D signal as the two stereo output streams. In examples where the reconstruction unit 92 outputs the HOA coefficients 11′ to a reproduction system comprising a mono audio system, the reproduction system may combine or mix the S signal and the D signal (as represented in the HOA coefficients 11′) to obtain a mono audio output for playback. In the mono audio system example, the reproduction system may add the mixed mono audio output to one or more foreground channels (if any foreground channels are present) to produce the audio output.
With respect to some existing UHJ-capable encoders, the signals are processed with a phase-amplitude matrix to recover a set of signals that resembles B-format. In most cases the signals will actually be B-format, but in the case of 2-channel UHJ there is insufficient information available to reconstruct true B-format signals, so the result is a set of signals that exhibits characteristics similar to those of B-format signals. The information is then passed to the amplitude matrix that generates the speaker feeds via a set of shelf filters, which improve the accuracy and performance of the decoder in smaller listening environments (the shelf filters may be omitted in larger-scale applications). Ambisonics was designed to suit actual rooms (e.g., living rooms) and practical speaker positions: many such rooms are rectangular, so the basic system was designed to decode to four loudspeakers arranged in a rectangle whose side lengths range between 1:2 (width twice the length) and 2:1 (length twice the width), which suits the majority of such rooms. A layout control is typically provided to allow the decoder to be configured for the loudspeaker positions. The layout control is an aspect of ambisonic playback that differs from other surround sound systems: the decoder can be configured specifically for the size and layout of the speaker array. The layout control may take the form of a rotary knob, a 2-way (1:2, 2:1) switch, or a 3-way (1:2, 1:1, 2:1) switch. Four speakers are the minimum required for horizontal surround decoding, and while a four-speaker layout may be suitable for several listening environments, larger spaces may require more speakers to give full surround localization.
An example of the calculations that the re-correlation unit 81 may perform to apply the UHJ matrix (e.g., an inverse UHJ matrix or an inverse phase-based transform) as the re-correlation transform is listed below:
UHJ decoding:
Conversion of left and right to S and D:
S = left + right
D = left - right
W = (0.982*S) + 0.197.*imag(hilbert((0.828*D) + (0.768*T)));
X = (0.419*S) - imag(hilbert((0.828*D) + (0.768*T)));
Y = (0.796*D) - 0.676*T + imag(hilbert(0.187*S));
Z = (1.023*Q);
In some example implementations of the above calculations, the assumptions underlying the calculations may include the following: the HOA background channels are first-order ambisonics, FuMa normalized, in the ambisonic channel numbering order W (a00), X (a11), Y (a11-), Z (a10).
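By way of illustration only, the FuMa-normalized calculations listed above may be sketched in Python as follows. The sketch assumes that `imag(hilbert(x))` denotes the imaginary part of the analytic signal of `x` computed over a single frame; the function names `hilbert_imag` and `uhj_decode_fuma` are hypothetical, and the naive O(N^2) DFT is used only so that the example needs nothing beyond the standard library. It is not a statement of how the re-correlation unit 81 is implemented.

```python
import cmath
import math


def hilbert_imag(x):
    """Return imag(hilbert(x)) for a real sequence x (hypothetical helper).

    A naive O(N^2) DFT is used so the sketch needs only the standard library.
    """
    n = len(x)
    if n == 0:
        return []
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # Analytic-signal weights: keep DC (and Nyquist for even N), double the
    # positive frequencies, and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    half = n // 2
    if n % 2 == 0:
        weights[half] = 1.0
        positive = range(1, half)
    else:
        positive = range(1, half + 1)
    for f in positive:
        weights[f] = 2.0
    # Inverse DFT of the weighted spectrum gives the analytic signal.
    analytic = [
        sum(spectrum[f] * weights[f] * cmath.exp(2j * math.pi * f * t / n)
            for f in range(n)) / n
        for t in range(n)
    ]
    return [a.imag for a in analytic]


def uhj_decode_fuma(left, right, t, q):
    """Apply the FuMa-normalized UHJ decoding equations to one frame."""
    s = [lv + rv for lv, rv in zip(left, right)]  # S = left + right
    d = [lv - rv for lv, rv in zip(left, right)]  # D = left - right
    h_dt = hilbert_imag([0.828 * dv + 0.768 * tv for dv, tv in zip(d, t)])
    h_s = hilbert_imag([0.187 * sv for sv in s])
    w = [0.982 * sv + 0.197 * h for sv, h in zip(s, h_dt)]
    x = [0.419 * sv - h for sv, h in zip(s, h_dt)]
    y = [0.796 * dv - 0.676 * tv + h for dv, tv, h in zip(d, t, h_s)]
    z = [1.023 * qv for qv in q]
    return w, x, y, z
```

A practical implementation would more likely use an FFT-based or filter-based phase shifter; the frame-wise DFT here is chosen purely for brevity.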
Another example of the calculations that the re-correlation unit 81 may perform to apply the UHJ matrix (or the inverse phase-based transform) as the re-correlation transform is listed below:
UHJ decoding:
Conversion of left and right to S and D:
S = left + right;
D = left - right;
h1 = imag(hilbert(1.014088753512236*D + T));
h2 = imag(hilbert(0.229027290950227*S));
W = 0.982*S + 0.160849826442762*h1;
X = 0.513168101113076*S - h1;
Y = 0.974896917627705*D - 0.880208333333333*T + h2;
Z = Q;
In some implementations of the above calculations, the assumptions underlying the calculations may include the following: the HOA background channels are first-order ambisonics, N3D (or “full three-D”) normalized, in the ambisonic channel numbering order W (a00), X (a11), Y (a11-), Z (a10). Although described herein with respect to N3D normalization, it will be appreciated that the example calculations may also be applied to HOA background channels that are SN3D normalized (or “Schmidt semi-normalized”). As described above with respect to FIG. 4, N3D and SN3D normalization may differ in the scaling factors used. An example representation of the scaling factors used in N3D normalization is described above with respect to FIG. 4. An example representation of the weighting coefficients used in SN3D normalization is described above with respect to FIG. 4.
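As a hedged illustration of how the two normalizations differ, the sketch below uses the commonly cited relationship in which an N3D-normalized coefficient of order n equals the SN3D-normalized coefficient multiplied by sqrt(2n + 1). This relationship is stated here as background knowledge rather than taken from this description, and the function names are hypothetical. The channel ordering W, X, Y, Z follows the ordering assumed above.

```python
import math

# Ambisonic order of each first-order channel in the W, X, Y, Z ordering.
FIRST_ORDER_CHANNEL_ORDERS = (0, 1, 1, 1)


def n3d_factor(order):
    """Factor that converts an SN3D coefficient of the given order to N3D."""
    return math.sqrt(2 * order + 1)


def sn3d_to_n3d(frame):
    """Rescale one first-order sample (W, X, Y, Z) from SN3D to N3D."""
    return [v * n3d_factor(o) for v, o in zip(frame, FIRST_ORDER_CHANNEL_ORDERS)]


def n3d_to_sn3d(frame):
    """Rescale one first-order sample (W, X, Y, Z) from N3D to SN3D."""
    return [v / n3d_factor(o) for v, o in zip(frame, FIRST_ORDER_CHANNEL_ORDERS)]
```

Under this relationship the W channel is unchanged and the X, Y, and Z channels differ by a factor of sqrt(3) between the two normalizations.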
In some examples, the energy-compensated HOA coefficients 47′ may represent a horizontal-only layout, such as audio data that does not include any vertical channels. In these examples, the re-correlation unit 81 may not perform the above calculation for the Z signal, because the Z signal represents vertical-direction audio data. Instead, in these examples, the re-correlation unit 81 may perform the above calculations only for the W, X, and Y signals, because the W, X, and Y signals represent horizontal-direction data. In some examples where the energy-compensated HOA coefficients 47′ represent audio data to be rendered on a mono audio reproduction system, the re-correlation unit 81 may derive only the W signal from the above calculations. More specifically, because the resulting W signal represents mono audio data, the W signal may provide all of the necessary data where the energy-compensated HOA coefficients 47′ represent data to be rendered in a mono audio format, or where the reproduction system comprises a mono audio system.
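The channel selection described above may be sketched as follows. The function name and its boolean parameters are hypothetical and are introduced only for illustration.

```python
def channels_to_recorrelate(horizontal_only=False, mono=False):
    """Return the names of the signals the re-correlation step would compute.

    mono: only W is needed, since W carries the mono audio data.
    horizontal_only: Z is skipped, since Z carries vertical-direction data.
    """
    if mono:
        return ("W",)
    if horizontal_only:
        return ("W", "X", "Y")
    return ("W", "X", "Y", "Z")
```

For example, `channels_to_recorrelate(horizontal_only=True)` yields W, X, and Y, so the Z calculation can be skipped entirely.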
Similar to what is described above with respect to the decorrelation unit 40′ of the audio encoding device 20, in examples, the re-correlation unit 81 may apply the UHJ matrix (or the inverse UHJ matrix or the inverse phase-based transform) in scenarios where the energy-compensated HOA coefficients 47′ include a smaller number of background channels, but may apply a mode matrix or an inverse mode matrix (e.g., as described in the MPEG-H standard) in scenarios where the energy-compensated HOA coefficients 47′ include a larger number of background channels.
It will be understood that the re-correlation unit 81 may apply the techniques described herein in scenarios where the energy-compensated HOA coefficients 47′ include foreground channels, as well as in scenarios where the energy-compensated HOA coefficients 47′ do not include any foreground channels. As one example, the re-correlation unit 81 may apply the techniques and/or calculations described above in a scenario (e.g., a lower bitrate scenario) in which the energy-compensated HOA coefficients 47′ include zero (0) foreground channels and eight (8) background channels.
Various components of the audio decoding device 24, such as the re-correlation unit 81, may use a syntax element, such as the flag UsePhaseShiftDecorr, to determine which of two processing methods was applied for decorrelation. In examples where the decorrelation unit 40′ used a spatial transform for decorrelation, the re-correlation unit 81 may determine that the UsePhaseShiftDecorr flag is set to a value of zero.
In cases where the re-correlation unit 81 determines that the UsePhaseShiftDecorr flag is set to a value of one, the re-correlation unit 81 may determine that re-correlation is to be performed using the phase-based transform. If the flag UsePhaseShiftDecorr has a value of 1, the following processing is applied to reconstruct the first four coefficient sequences of the ambient HOA component,
where the coefficients c are as defined in Table 1 below and $A_{+90}(k)$ and $B_{+90}(k)$ are frames of the +90 degree phase-shifted signals A and B, which are defined as follows:
$$A(k) = c(0)\cdot\left[c_{I,\mathrm{AMB},1}(k) - c_{I,\mathrm{AMB},2}(k)\right],$$
$$B(k) = c(1)\cdot\left[c_{I,\mathrm{AMB},1}(k) + c_{I,\mathrm{AMB},2}(k)\right].$$
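The flag check and the formation of the frames A(k) and B(k) may be sketched as follows. This is a minimal sketch only: the function name is hypothetical, the coefficients c(0) and c(1) are passed in as parameters because their values are not reproduced here, and the subsequent +90 degree phase shift and the reconstruction of the four coefficient sequences are deliberately left out.

```python
def phase_shift_inputs(use_phase_shift_decorr, c_i_amb_1, c_i_amb_2, c0, c1):
    """Form the frames A(k) and B(k) when the phase-based transform was used.

    c_i_amb_1, c_i_amb_2: one frame each of the first two decorrelated
    ambient channels. c0, c1: the coefficients c(0) and c(1), supplied by
    the caller (placeholder inputs, not values from this description).
    Returns None when the flag indicates the spatial transform instead.
    """
    if use_phase_shift_decorr != 1:
        return None
    a = [c0 * (u - v) for u, v in zip(c_i_amb_1, c_i_amb_2)]  # A(k)
    b = [c1 * (u + v) for u, v in zip(c_i_amb_1, c_i_amb_2)]  # B(k)
    return a, b
```

Returning `None` for a flag value of zero simply signals to the caller that the spatial-transform path applies; a real decoder would dispatch to that path rather than return a sentinel.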
Table 2 below illustrates example coefficients that the decorrelation unit 40′ may use to implement the phase-based transform.
Table 2: Coefficients of the phase-based transform
In the above equations, the variable $C_{\mathrm{AMB},1}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (0:0), which may also be referred to as the ‘W’ channel or component. The variable $C_{\mathrm{AMB},2}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:-1), which may also be referred to as the ‘Y’ channel or component. The variable $C_{\mathrm{AMB},3}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:0), which may also be referred to as the ‘Z’ channel or component. The variable $C_{\mathrm{AMB},4}(k)$ denotes the HOA coefficients for the k-th frame corresponding to the spherical basis function having an (order:sub-order) of (1:1), which may also be referred to as the ‘X’ channel or component. $C_{\mathrm{AMB},1}(k)$ through $C_{\mathrm{AMB},3}(k)$ may correspond to the ambient HOA coefficients 47′.
The notation $[C_{I,\mathrm{AMB},1}(k) + C_{I,\mathrm{AMB},2}(k)]$ above denotes the term that may alternatively be referred to as ‘S’, which is equivalent to the left channel plus the right channel. The variable $C_{I,\mathrm{AMB},1}(k)$ denotes the left channel generated as a result of the UHJ encoding, and the variable $C_{I,\mathrm{AMB},2}(k)$ denotes the right channel generated as a result of the UHJ encoding. The subscript ‘I’ denotes that the corresponding channel has been decorrelated from the other ambient channels (e.g., by applying the UHJ matrix or the phase-based transform). The notation $[C_{I,\mathrm{AMB},1}(k) - C_{I,\mathrm{AMB},2}(k)]$ denotes the term referred to throughout this disclosure as ‘D’, which represents the left channel minus the right channel. The variable $C_{I,\mathrm{AMB},3}(k)$ denotes the term referred to throughout this disclosure as the variable ‘T’. The variable $C_{I,\mathrm{AMB},4}(k)$ denotes the term referred to throughout this disclosure as the variable ‘Q’.
The notation $A_{+90}(k)$ denotes the positive 90 degree phase shift of A(k), that is, of c(0) multiplied by D (which is also denoted throughout this disclosure by the variable ‘h1’). The notation $B_{+90}(k)$ denotes the positive 90 degree phase shift of B(k), that is, of c(1) multiplied by S (which is also denoted throughout this disclosure by the variable ‘h2’).
The spatio-temporal interpolation unit 76 may operate in a manner similar to that described above with respect to the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 76 may receive the reduced foreground V[k] vectors 55_k and perform spatio-temporal interpolation with respect to the foreground V[k] vectors 55_k and the reduced foreground V[k-1] vectors 55_{k-1} to generate the interpolated foreground V[k] vectors 55_k″. The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k″ to the fade unit 770.
The extraction unit 72 may also output a signal 757, indicating when one of the ambient HOA coefficients is in transition, to the fade unit 770, which may then determine which of the SHC_BG 47′ (where the SHC_BG 47′ may also be denoted as “ambient HOA channels 47′” or “ambient HOA coefficients 47′”) and the elements of the interpolated foreground V[k] vectors 55_k″ are to be faded in or faded out. In some examples, the fade unit 770 may operate in opposite ways with respect to each of the ambient HOA coefficients 47′ and the elements of the interpolated foreground V[k] vectors 55_k″. That is, the fade unit 770 may perform a fade-in or a fade-out, or both a fade-in and a fade-out, with respect to a corresponding one of the ambient HOA coefficients 47′, while performing a fade-in or a fade-out, or both a fade-in and a fade-out, with respect to a corresponding one of the elements of the interpolated foreground V[k] vectors 55_k″. The fade unit 770 may output the adjusted ambient HOA coefficients 47″ to the HOA coefficient formulation unit 82 and the adjusted foreground V[k] vectors 55_k″′ to the foreground formulation unit 78. In this respect, the fade unit 770 represents a unit configured to perform a fade operation with respect to various aspects of the HOA coefficients or derivatives thereof (e.g., in the form of the ambient HOA coefficients 47′ and the elements of the interpolated foreground V[k] vectors 55_k″).
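The opposite fade behavior described above may be illustrated with the following sketch. The shape of the fade is not specified in this description, so a linear ramp over one frame is assumed here, and the function names and the per-sample representation of a V-vector element are hypothetical.

```python
def linear_gains(num_samples, fade_in):
    """Linear gain ramp over one frame: 0 to 1 (fade-in) or 1 to 0 (fade-out).

    The linear shape is an assumption made for this sketch only.
    """
    if num_samples < 2:
        return [1.0] * num_samples
    ramp = [i / (num_samples - 1) for i in range(num_samples)]
    return ramp if fade_in else [1.0 - g for g in ramp]


def opposite_fades(ambient_frame, v_element_frame, ambient_fades_in):
    """Fade an ambient HOA coefficient one way and the matching V-vector
    element the opposite way over the same frame."""
    n = len(ambient_frame)
    amb_gains = linear_gains(n, ambient_fades_in)
    v_gains = linear_gains(n, not ambient_fades_in)
    faded_ambient = [x * g for x, g in zip(ambient_frame, amb_gains)]
    faded_v = [x * g for x, g in zip(v_element_frame, v_gains)]
    return faded_ambient, faded_v
```

With complementary linear ramps the two gains sum to one at every sample, which is one simple way to keep the transition smooth; other ramp shapes would work equally well in this sketch.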
The foreground formulation unit 78 may represent a unit configured to perform matrix multiplication with respect to the adjusted foreground V[k] vectors 55_k″′ and the interpolated nFG signals 49′ to generate the foreground HOA coefficients 65. In this respect, the foreground formulation unit 78 may combine the audio objects 49′ (which is another way of denoting the interpolated nFG signals 49′) with the vectors 55_k″′ to reconstruct the foreground (or, in other words, predominant) aspects of the HOA coefficients 11′. The foreground formulation unit 78 may perform a matrix multiplication of the interpolated nFG signals 49′ by the adjusted foreground V[k] vectors 55_k″′.
The HOA coefficient formulation unit 82 may represent a unit configured to combine the foreground HOA coefficients 65 with the adjusted ambient HOA coefficients 47″ so as to obtain the HOA coefficients 11′. The prime notation reflects that the HOA coefficients 11′ may be similar to, but not the same as, the HOA coefficients 11. The differences between the HOA coefficients 11 and 11′ may result from loss due to transmission over a lossy transmission medium, quantization, or other lossy operations.
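The two formulation steps above may be sketched as follows. The matrix orientations are assumptions made for this sketch: the interpolated nFG signals are taken as a matrix with one row per sample and one column per foreground signal, and the V-vectors as a matrix with one row per HOA coefficient and one column per foreground signal. The function names are hypothetical.

```python
def formulate_foreground(nfg_signals, v_vectors):
    """Multiply the nFG signals by the transposed V-vectors.

    nfg_signals: num_samples rows, nFG columns (assumed layout).
    v_vectors: num_coeffs rows, nFG columns (assumed layout).
    Returns the foreground HOA coefficients: num_samples rows, num_coeffs columns.
    """
    return [
        [sum(sig * v_row[j] for j, sig in enumerate(sample)) for v_row in v_vectors]
        for sample in nfg_signals
    ]


def formulate_hoa(foreground_hoa, ambient_hoa):
    """Add the foreground and ambient HOA coefficients element by element."""
    return [
        [f + a for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(foreground_hoa, ambient_hoa)
    ]
```

The element-wise sum assumes that the ambient coefficients have already been placed in a matrix of the same shape as the foreground coefficients, with zeros for any coefficient that carries no ambient content.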
UHJ is a matrix transform method that has been used to create a 2-channel stereo stream from first-order ambisonic content. UHJ was used in the past to transmit stereo or horizontal-only surround content via FM transmitters. It will be appreciated, however, that UHJ is not limited to use in FM transmitters. In the MPEG-H HOA encoding scheme, the HOA background channels may be pre-processed with a mode matrix to convert the HOA background channels into orthogonal points in the spatial domain. The transformed channels are then perceptually coded via USAC or AAC.
The techniques of this disclosure generally relate to using the UHJ transform (or the phase-based transform), rather than this mode matrix, in applications that code the HOA background channels. Both methods ((1) transformation into the spatial domain via the mode matrix, and (2) the UHJ transform) generally relate to reducing the correlation between the HOA background channels, which correlation may cause the (potentially undesirable) effect of noise unmasking within the decoded sound field.
Thus, in examples, the audio decoding device 24 may represent a device configured to: obtain a decorrelated representation of ambient ambisonic coefficients having at least a left signal and a right signal, the ambient ambisonic coefficients having been extracted from a plurality of higher-order ambisonic coefficients and representing a background component of a sound field described by the plurality of higher-order ambisonic coefficients, wherein at least one of the plurality of higher-order ambisonic coefficients is associated with a spherical basis function having an order greater than one; and generate a speaker feed based on the decorrelated representation of the ambient ambisonic coefficients. In some examples, the device is further configured to apply a re-correlation transform to the decorrelated representation of the ambient ambisonic coefficients to obtain a plurality of correlated ambient ambisonic coefficients.
In some examples, to apply the re-correlation transform, the device is configured to apply an inverse UHJ matrix (or an inverse phase-based transform) to the ambient ambisonic coefficients. According to some examples, the inverse UHJ matrix (or the inverse phase-based transform) has been normalized according to N3D (full three-D) normalization. According to some examples, the inverse UHJ matrix (or the inverse phase-based transform) has been normalized according to SN3D normalization (Schmidt semi-normalization).
According to some examples, the ambient ambisonic coefficients are associated with spherical basis functions having an order of zero or an order of one, and to apply the inverse UHJ matrix (or the inverse phase-based transform), the device is configured to perform a scalar multiplication of the UHJ matrix with respect to the decorrelated representation of the ambient ambisonic coefficients. In some examples, to apply the re-correlation transform, the device is configured to apply an inverse mode matrix to the decorrelated representation of the ambient ambisonic coefficients. In some examples, to generate the speaker feed, the device is configured to generate a left speaker feed based on the left signal and a right speaker feed based on the right signal, the left speaker feed and the right speaker feed being for output by a stereo reproduction system.
In some examples, to generate the speaker feed, the device is configured to use the left signal as the left speaker feed and the right signal as the right speaker feed without applying a re-correlation transform to the right signal and the left signal. According to some examples, to generate the speaker feed, the device is configured to mix the left signal and the right signal for output by a mono audio system. According to some examples, to generate the speaker feed, the device is configured to combine the correlated ambient ambisonic coefficients with one or more foreground channels.
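The stereo and mono cases above may be sketched as follows. The function name is hypothetical, and the equal-weight average used for the mono mix is an assumption of this sketch, since the description does not specify the mixing weights.

```python
def speaker_feeds(left, right, mono=False):
    """Produce speaker feeds directly from the decorrelated left/right signals.

    Stereo: the left and right signals are used as the two feeds as-is,
    with no re-correlation transform applied.
    Mono: the two signals are mixed into one feed (equal-weight average
    assumed for this sketch).
    """
    if mono:
        return [[0.5 * (lv + rv) for lv, rv in zip(left, right)]]
    return [list(left), list(right)]
```

For example, `speaker_feeds(left, right)` returns two feeds for a stereo reproduction system, while `speaker_feeds(left, right, mono=True)` returns a single mixed feed.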
According to some examples, the device is further configured to determine that no foreground channels are available to be combined with the correlated ambient ambisonic coefficients. In some examples, the device is further configured to determine that the sound field is to be output via a mono audio reproduction system, and to decode at least a subset of the decorrelated higher-order ambisonic coefficients that includes the data for output by the mono audio reproduction system. In some examples, the device is further configured to obtain an indication that the decorrelated representation of the ambient ambisonic coefficients was decorrelated with a decorrelation transform. According to some examples, the device further includes a loudspeaker array configured to output the speaker feed generated based on the decorrelated representation of the ambient ambisonic coefficients.
FIG. 5 is a flowchart illustrating exemplary operation of an audio encoding device (such as the audio encoding device 20 shown in the example of FIG. 3) in performing various aspects of the vector-based synthesis techniques described in this disclosure. Initially, the audio encoding device 20 receives the HOA coefficients 11 (106). The audio encoding device 20 may invoke the LIT unit 30, which may apply a LIT with respect to the HOA coefficients to output transformed HOA coefficients (e.g., in the case of SVD, the transformed HOA coefficients may include the US[k] vectors 33 and the V[k] vectors 35) (107).
The audio encoding device 20 may next invoke the parameter calculation unit 32 to perform the analysis described above with respect to any combination of the US[k] vectors 33, the US[k-1] vectors 33, and the V[k] and/or V[k-1] vectors 35 to identify various parameters in the manner described above. That is, the parameter calculation unit 32 may determine at least one parameter based on an analysis of the transformed HOA coefficients 33/35 (108).
The audio encoding device 20 may then invoke the reordering unit 34, which may reorder the transformed HOA coefficients (which, again in the context of SVD, may refer to the US[k] vectors 33 and the V[k] vectors 35) based on the parameter to generate the reordered transformed HOA coefficients 33′/35′ (or, in other words, the US[k] vectors 33′ and the V[k] vectors 35′), as described above (109). The audio encoding device 20 may also invoke the sound field analysis unit 44 during any of the foregoing operations or subsequent operations. As described above, the sound field analysis unit 44 may perform a sound field analysis with respect to the HOA coefficients 11 and/or the transformed HOA coefficients 33/35 to determine the total number of foreground channels (nFG) 45, the order of the background sound field (N_BG), and the number (nBGa) and indices (i) of additional BG HOA channels to send (which may collectively be denoted as the background channel information 43 in the example of FIG. 3) (110).
The audio encoding device 20 may also invoke the background selection unit 48. The background selection unit 48 may determine the background or ambient HOA coefficients 47 based on the background channel information 43 (112). The audio encoding device 20 may further invoke the foreground selection unit 36, which may select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), the reordered US[k] vectors 33′ and the reordered V[k] vectors 35′ that represent the foreground or distinct components of the sound field (113).
The audio encoding device 20 may invoke the energy compensation unit 38. The energy compensation unit 38 may perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for the energy loss due to the removal of various ones of the HOA coefficients by the background selection unit 48 (114), and thereby generate the energy-compensated ambient HOA coefficients 47′.
The audio encoding device 20 may also invoke the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 50 may perform spatio-temporal interpolation with respect to the reordered transformed HOA coefficients 33'/35' to obtain the interpolated foreground signals 49' (which may also be referred to as the "interpolated nFG signals 49'") and the remaining foreground directional information 53 (which may also be referred to as the "V[k] vectors 53") (116). The audio encoding device 20 may then invoke the coefficient reduction unit 46. The coefficient reduction unit 46 may perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43 to obtain reduced foreground directional information 55 (which may also be referred to as the reduced foreground V[k] vectors 55) (118).
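By way of example only, coefficient reduction based on the background channel information may be pictured as dropping those elements of a foreground V[k] vector whose HOA channels are already carried as background. The sketch below assumes zero-based channel indices and assumes that all channels of orders 0 through N_BG, plus the additionally signaled channels, are removed; the function name `reduce_v_vector` is hypothetical.

```python
def reduce_v_vector(v, n_bg, extra_bg_indices):
    """Drop elements of a foreground V[k] vector that correspond to HOA
    channels already sent as background (assumed rule, zero-based indices)."""
    n_low = (n_bg + 1) ** 2  # number of channels of orders 0..n_bg
    skip = set(range(n_low)) | set(extra_bg_indices)
    return [x for i, x in enumerate(v) if i not in skip]

# Third-order vector: (3 + 1)^2 = 16 elements (made-up values).
v = [0.1 * i for i in range(16)]
reduced = reduce_v_vector(v, n_bg=1, extra_bg_indices=[5, 9])
```

Here a first-order background removes four elements and the two additional channels remove two more, leaving ten elements to be quantized.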
The audio encoding device 20 may then invoke the quantization unit 52 to compress, in the manner described above, the reduced foreground V[k] vectors 55 and generate coded foreground V[k] vectors 57 (120). The audio encoding device 20 may also invoke the decorrelation unit 40' to apply phase-shift decorrelation to reduce or eliminate correlation between the background signals of the HOA coefficients 47', thereby forming one or more decorrelated HOA coefficients 47″ (121).
The audio encoding device 20 may also invoke the psychoacoustic audio coder unit 40. The psychoacoustic audio coder unit 40 may psychoacoustically code each vector of the energy-compensated ambient HOA coefficients 47' and the interpolated nFG signals 49' to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61 (122). The audio encoding device 20 may then invoke the bitstream generation unit 42. The bitstream generation unit 42 may generate the bitstream 21 based on the coded foreground directional information 57, the coded ambient HOA coefficients 59, the coded nFG signals 61, and the background channel information 43 (124).
FIG. 6A is a flowchart illustrating exemplary operation of an audio decoding device, such as the audio decoding device 24 shown in the example of FIG. 4, in performing various aspects of the techniques described in this disclosure. Initially, the audio decoding device 24 may receive the bitstream 21 (130). Upon receiving the bitstream, the audio decoding device 24 may invoke the extraction unit 72. Assuming for purposes of discussion that the bitstream 21 indicates that vector-based reconstruction is to be performed, the extraction unit 72 may parse the bitstream to retrieve the information noted above and pass that information to the vector-based reconstruction unit 92.
In other words, the extraction unit 72 may extract, from the bitstream 21 in the manner described above, the coded foreground directional information 57 (which, again, may also be referred to as the coded foreground V[k] vectors 57), the coded ambient HOA coefficients 59, and the coded foreground signals (which may also be referred to as the coded foreground nFG signals 59 or the coded foreground audio objects 59) (132).
The audio decoding device 24 may further invoke the dequantization unit 74. The dequantization unit 74 may entropy decode and dequantize the coded foreground directional information 57 to obtain reduced foreground directional information 55k (136). The audio decoding device 24 may invoke the recorrelation unit 81. The recorrelation unit 81 may apply one or more recorrelation transforms to the energy-compensated ambient HOA coefficients 47' to obtain one or more recorrelated HOA coefficients 47″ (or correlated HOA coefficients 47″), and may pass the correlated HOA coefficients 47″ to the HOA coefficient formulation unit 82 (optionally, through the fade unit 770) (137). The audio decoding device 24 may also invoke the psychoacoustic decoding unit 80. The psychoacoustic decoding unit 80 may decode the encoded ambient HOA coefficients 59 and the encoded foreground signals 61 to obtain the energy-compensated ambient HOA coefficients 47' and the interpolated foreground signals 49' (138). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground formulation unit 78.
The audio decoding device 24 may next invoke the spatio-temporal interpolation unit 76. The spatio-temporal interpolation unit 76 may receive the reordered foreground directional information 55k' and perform spatio-temporal interpolation with respect to the reduced foreground directional information 55k/55k-1 to generate the interpolated foreground directional information 55k″ (140). The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55k″ to the fade unit 770.
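As one hedged illustration of interpolating between the directional information of frame k-1 and frame k, the sketch below produces one interpolated vector per sample using linear weights. Linear weighting and the name `interpolate_v` are assumptions for illustration; other interpolation functions could be used without changing the overall structure.

```python
def interpolate_v(v_prev, v_curr, num_samples):
    """Return one interpolated vector per sample, moving linearly from
    v_prev (frame k-1) toward v_curr (frame k)."""
    out = []
    for n in range(1, num_samples + 1):
        w = n / num_samples  # weight of the current frame, reaches 1.0
        out.append([(1.0 - w) * a + w * b for a, b in zip(v_prev, v_curr)])
    return out

# Two-element vectors and a four-sample frame (made-up values).
steps = interpolate_v([1.0, 0.0], [0.0, 1.0], num_samples=4)
```

With these inputs the last interpolated vector equals the frame-k vector, so consecutive frames join without a discontinuity.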
The audio decoding device 24 may invoke the fade unit 770. The fade unit 770 may receive or otherwise obtain (e.g., from the extraction unit 72) syntax elements indicating when the energy-compensated ambient HOA coefficients 47' are in transition (e.g., the AmbCoeffTransition syntax element). The fade unit 770 may, based on the transition syntax elements and the maintained transition state information, fade in or fade out the energy-compensated ambient HOA coefficients 47', outputting adjusted ambient HOA coefficients 47″ to the HOA coefficient formulation unit 82. The fade unit 770 may also, based on the syntax elements and the maintained transition state information, fade out or fade in the corresponding one or more elements of the interpolated foreground V[k] vectors 55k″, outputting the adjusted foreground V[k] vectors 55k″′ to the foreground formulation unit 78 (142).
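The fade operation may be sketched, purely for illustration, as a per-frame linear ramp driven by a transition flag and a maintained state. The boolean flag, the linear ramp shape, and the name `fade_frame` are assumptions; the actual AmbCoeffTransition semantics are defined by the bitstream syntax and are not reproduced here.

```python
def fade_frame(samples, transition, was_active):
    """Fade one ambient HOA channel over a frame.

    transition: True when the transition flag is set for this frame.
    was_active: maintained state, True if the channel was already audible.
    Returns (faded_samples, new_state).
    """
    if not transition:
        return list(samples), was_active
    n = len(samples)
    if was_active:  # channel is leaving: ramp gain down to zero
        gains = [1.0 - (i + 1) / n for i in range(n)]
    else:           # channel is entering: ramp gain up to one
        gains = [(i + 1) / n for i in range(n)]
    return [g * s for g, s in zip(gains, samples)], not was_active

faded, state = fade_frame([1.0, 1.0, 1.0, 1.0], transition=True, was_active=True)
```

In such a sketch, the corresponding element of the foreground V[k] vector would receive the complementary ramp, so that the element fades in while the ambient channel fades out, and vice versa.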
The audio decoding device 24 may invoke the foreground formulation unit 78. The foreground formulation unit 78 may perform a matrix multiplication of the nFG signals 49' by the adjusted foreground directional information 55k″′ to obtain the foreground HOA coefficients 65 (144). The audio decoding device 24 may also invoke the HOA coefficient formulation unit 82. The HOA coefficient formulation unit 82 may add the foreground HOA coefficients 65 to the adjusted ambient HOA coefficients 47″ to obtain the HOA coefficients 11' (146).
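The matrix multiplication and addition described above can be sketched as follows. The matrix orientations (samples by nFG for the signals, nFG by channels for the directional vectors) and the helper names `matmul` and `formulate_hoa` are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply matrix a (m x p) by matrix b (p x n), given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def formulate_hoa(nfg_signals, v_vectors, ambient):
    """nfg_signals: samples x nFG; v_vectors: nFG x channels;
    ambient: samples x channels. Returns samples x channels."""
    foreground = matmul(nfg_signals, v_vectors)  # foreground HOA coefficients
    return [[f + a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(foreground, ambient)]

nfg = [[1.0], [2.0]]            # 2 samples, 1 foreground signal
v = [[1.0, 0.0, 0.5, 0.0]]      # 1 directional vector over 4 HOA channels
amb = [[0.1, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0]]
hoa = formulate_hoa(nfg, v, amb)
```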
FIG. 6B is a flowchart illustrating exemplary operation of an audio encoding device and an audio decoding device in performing the coding techniques described in this disclosure, in the form of an example encoding and decoding process 160 in accordance with one or more aspects of this disclosure. Although the process 160 may be performed by a variety of devices, for ease of discussion the process 160 is described herein with respect to the audio encoding device 20 and the audio decoding device 24 described above. The encoding and decoding sections of the process 160 are demarcated using a dashed line in FIG. 6B. The process 160 may begin with one or more components of the audio encoding device 20 (e.g., the foreground selection unit 36 and the background selection unit 48) generating foreground channels 164 and first-order HOA background channels 166 from an HOA input using HOA spatial encoding (162). In turn, the decorrelation unit 40' may apply a decorrelation transform (e.g., in the form of a phase-based decorrelation transform or matrix) to the energy-compensated ambient HOA coefficients 47'. More specifically, the audio encoding device 20 may apply a UHJ matrix or a phase-based decorrelation transform (e.g., through scalar multiplication) to the energy-compensated ambient HOA coefficients 47' (168).
In some examples, the decorrelation unit 40' may apply the UHJ matrix (or the phase-based transform) in instances where the decorrelation unit 40' determines that the HOA background channels include a lesser number of channels (e.g., four). Conversely, in these examples, if the decorrelation unit 40' determines that the HOA background channels include a greater number of channels (e.g., nine), the audio encoding device 20 may select a decorrelation transform different from the UHJ matrix (e.g., a mode matrix described in the MPEG-H standard) and apply that decorrelation transform to the HOA background channels. By applying the decorrelation transform (e.g., the UHJ matrix) to the HOA background channels, the audio encoding device 20 may obtain decorrelated HOA background channels.
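The channel-count-based selection described above might be sketched as follows. The text gives only four and nine channels as examples, so the cut-off used here for intermediate counts is hypothetical, as are the function name and the returned labels.

```python
def select_decorrelation_transform(num_background_channels):
    """Return a label for the transform to apply.

    The threshold of four is an assumption; the description only gives
    four and nine channels as examples of 'lesser' and 'greater' counts.
    """
    if num_background_channels <= 4:
        return "uhj_or_phase_based"
    return "mode_matrix"

choice_small = select_decorrelation_transform(4)  # first-order background
choice_large = select_decorrelation_transform(9)  # second-order background
```

A decoder-side recorrelation unit could reuse the same selection rule so that the reciprocal transform matches the one chosen at the encoder.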
As shown in FIG. 6B, the audio encoding device 20 (e.g., by invoking the psychoacoustic audio coder unit 40) may apply temporal encoding (e.g., by applying AAC and/or USAC) to the decorrelated HOA background signals (170) and to any foreground channels (166). It will be appreciated that, in some scenarios, the psychoacoustic audio coder unit 40 may determine that the number of foreground channels is zero (i.e., in these scenarios, the psychoacoustic audio coder unit 40 may not obtain any foreground channels from the HOA input). Because AAC and/or USAC may not be optimized for or otherwise well suited to stereo audio data, the decorrelation unit 40' may apply the decorrelation matrix to reduce or eliminate correlation between the HOA background channels. The reduced correlation exhibited by the decorrelated HOA background channels provides the potential advantage of mitigating or eliminating noise unmasking at the AAC/USAC temporal encoding stage.
In turn, the audio decoding device 24 may perform temporal decoding of the encoded bitstream output by the audio encoding device 20. In the example of the process 160, one or more components of the audio decoding device 24 (e.g., the psychoacoustic decoding unit 80) may perform temporal decoding separately with respect to the foreground channels (if any foreground channels are included in the bitstream) (172) and the background channels (174). Additionally, the recorrelation unit 81 may apply a recorrelation transform to the temporally decoded HOA background channels. As one example, the recorrelation unit 81 may apply the decorrelation transform in a manner reciprocal to the decorrelation unit 40'. For instance, as depicted in the specific example of the process 160, the recorrelation unit 81 may apply a UHJ matrix or a phase-based transform to the temporally decoded HOA background signals (176).
In some examples, the recorrelation unit 81 may apply the UHJ matrix or the phase-based transform if the recorrelation unit 81 determines that the temporally decoded HOA background signals include a lesser number of channels (e.g., four). Conversely, in these examples, if the recorrelation unit 81 determines that the temporally decoded HOA background channels include a greater number of channels (e.g., nine), the recorrelation unit 81 may select a decorrelation transform different from the UHJ matrix (e.g., a mode matrix described in the MPEG-H standard) and apply that decorrelation transform to the HOA background channels.
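The reciprocal relationship between decorrelation at the encoder and recorrelation at the decoder can be illustrated with a stand-in matrix. The sketch below uses a scaled 4x4 Hadamard matrix, which is symmetric and orthogonal and therefore its own inverse. It is not the UHJ matrix, involves no phase shift, and is chosen only because its inverse is easy to verify; temporal encoding and decoding between the two steps are omitted.

```python
# Stand-in mixing matrix (NOT the UHJ matrix): scaled 4x4 Hadamard matrix.
# It is symmetric and orthogonal, so applying it twice restores the input.
H = [[0.5,  0.5,  0.5,  0.5],
     [0.5, -0.5,  0.5, -0.5],
     [0.5,  0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5,  0.5]]

def apply_matrix(matrix, sample):
    """Apply a channel-mixing matrix to one multichannel sample."""
    return [sum(m * x for m, x in zip(row, sample)) for row in matrix]

background = [1.0, 0.25, -0.5, 0.75]          # four HOA background channels
decorrelated = apply_matrix(H, background)    # encoder side
recorrelated = apply_matrix(H, decorrelated)  # decoder side (inverse)
```

For a transform that is not self-inverse, the decoder would instead apply the inverse (or a suitable reciprocal) of the encoder-side matrix.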
Additionally, the HOA coefficient formulation unit 82 may perform HOA spatial decoding of the recorrelated HOA background channels and any available decoded foreground channels (178). In turn, the HOA coefficient formulation unit 82 may render the decoded audio signals to one or more output devices (180), such as loudspeakers and/or headphones (including, but not limited to, output devices with stereo or surround sound capabilities).
The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to the example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.
The movie studios, the music studios, and the gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (e.g., in 2.0, 5.1, and 7.1), such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (e.g., in 2.0 and 5.1), such as by using a DAW. In either case, the coding engines may receive and encode the channel-based audio content based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems.
Another example context in which the techniques may be performed comprises an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, the HOA audio format, on-device rendering, consumer audio, TV and accessories, and car audio systems.
The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using the HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system (i.e., as opposed to requiring a particular configuration such as 5.1, 7.1, etc.), such as the audio playback system 16.
Other examples of contexts in which the techniques may be performed include an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capture, and mobile devices (e.g., smartphones and tablets). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channels.
In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a soundfield. For instance, the mobile device may acquire a soundfield via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired soundfield into HOA coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record a live event (e.g., a meeting, a conference, a play, a concert, etc.), thereby acquiring a soundfield of the live event, and code the recording into HOA coefficients.
The mobile device may also utilize one or more of the playback elements to play back the HOA coded soundfield. For instance, the mobile device may decode the HOA coded soundfield and output, to one or more of the playback elements, a signal that causes the one or more of the playback elements to recreate the soundfield. As one example, the mobile device may utilize wired and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, e.g., to create realistic binaural sound.
In some examples, a particular mobile device may both acquire a 3D soundfield and play back the same 3D soundfield at a later time. In some examples, the mobile device may acquire a 3D soundfield, encode the 3D soundfield into HOA, and transmit the encoded 3D soundfield to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.
Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs that may support editing of HOA signals. For instance, the one or more DAWs may include HOA plugins and/or tools that may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines, which may render a soundfield for playback by the delivery systems.
The techniques may also be performed with respect to exemplary audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone, which may include a plurality of microphones that are collectively configured to record a 3D soundfield. In some examples, the plurality of microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm. In some examples, the audio encoding device 20 may be integrated into the Eigen microphone so as to output the bitstream 21 directly from the microphone.
Another exemplary audio acquisition context may include a production truck that may be configured to receive a signal from one or more microphones, such as one or more Eigen microphones. The production truck may also include an audio encoder, such as the audio encoder 20 of FIG. 3.
In some instances, the mobile device may also include a plurality of microphones that are collectively configured to record a 3D soundfield. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone that may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as the audio encoder 20 of FIG. 3.
A ruggedized video capture device may further be configured to record a 3D soundfield. In some examples, the ruggedized video capture device may be attached to a helmet of a user engaged in an activity. For instance, the ruggedized video capture device may be attached to the helmet of a user who is whitewater rafting. In this way, the ruggedized video capture device may capture a 3D soundfield that represents the action all around the user (e.g., water crashing behind the user, another rafter speaking in front of the user, etc.).
The techniques may also be performed with respect to an accessory-enhanced mobile device, which may be configured to record a 3D soundfield. In some examples, the mobile device may be similar to the mobile devices discussed above, with the addition of one or more accessories. For instance, an Eigen microphone may be attached to the above-noted mobile device to form an accessory-enhanced mobile device. In this way, the accessory-enhanced mobile device may capture a higher-quality version of the 3D soundfield than it could using only the sound capture components integral to the accessory-enhanced mobile device.
Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D soundfield. Moreover, in some examples, headphone playback devices may be coupled to the decoder 24 via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a soundfield may be utilized to render the soundfield on any combination of the speakers, the sound bars, and the headphone playback devices.
A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For instance, the following may be suitable environments: a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full-height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with an earbud playback environment.
In accordance with one or more techniques of this disclosure, a single generic representation of a soundfield may be utilized to render the soundfield on any of the foregoing playback environments. Additionally, the techniques of this disclosure enable a renderer to render a soundfield from the generic representation for playback on playback environments other than those described above. For instance, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if it is not possible to place a right surround speaker), the techniques of this disclosure enable the renderer to compensate with the other six speakers such that playback may be achieved on a 6.1 speaker playback environment.
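To illustrate why a single generic representation can feed an arbitrary speaker layout, the sketch below renders horizontal first-order coefficients to whatever speaker azimuths are available by projecting the coefficients onto each speaker direction (a basic sampling-style decoder). The six-speaker layout, the horizontal-only first-order simplification, the normalization, and the function names are all assumptions for illustration; the low-frequency channel is ignored, and this is not presented as the renderer of this disclosure.

```python
import math

def encode_first_order(sample, azimuth_deg):
    """Horizontal first-order components (W, X, Y) of a plane-wave source."""
    a = math.radians(azimuth_deg)
    return [sample, sample * math.cos(a), sample * math.sin(a)]

def render(coeffs, speaker_azimuths_deg):
    """Project the coefficients onto each speaker direction.
    Works for any number of speakers at any azimuths."""
    w, x, y = coeffs
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        a = math.radians(az)
        feeds.append((w + x * math.cos(a) + y * math.sin(a)) / n)
    return feeds

coeffs = encode_first_order(1.0, azimuth_deg=30.0)
six_speakers = [0.0, 30.0, -30.0, 110.0, -110.0, 180.0]  # hypothetical layout
feeds = render(coeffs, six_speakers)
loudest = six_speakers[feeds.index(max(feeds))]
```

Because the speaker positions enter only through the list of azimuths, removing or moving a speaker changes the rendering step but not the coded representation.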
Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D soundfield of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the baseball stadium), HOA coefficients corresponding to the 3D soundfield may be obtained and transmitted to a decoder, the decoder may reconstruct the 3D soundfield based on the HOA coefficients and output the reconstructed 3D soundfield to a renderer, and the renderer may obtain an indication as to the type of playback environment (e.g., headphones) and render the reconstructed 3D soundfield into signals that cause the headphones to output a representation of the 3D soundfield of the sports game.
In each of the various instances described above, it should be understood that the audio encoding device 20 may perform a method or otherwise comprise means for performing each step of the method that the audio encoding device 20 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio encoding device 20 has been configured to perform.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
Likewise, in each of the various instances described above, it should be understood that the audio decoding device 24 may perform a method or otherwise comprise means for performing each step of the method that the audio decoding device 24 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio decoding device 24 has been configured to perform.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units that include one or more processors as described above.
Various aspects of the techniques have been described. These and other aspects of the techniques are within the scope of the following claims.
Claims (28)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/020,348 | 2014-07-02 | | |
| US62/060,512 | 2014-10-06 | | |
| US14/789,961 | 2015-07-01 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1232013A1 (en) | 2017-12-29 |
| HK1232013B (en) | 2021-08-20 |
Similar Documents
| Publication | Title |
|---|---|
| US11664035B2 (en) | Spatial transformation of ambisonic audio data |
| CN106663433B (en) | Method and apparatus for processing audio data |
| US9984693B2 (en) | Signaling channels for scalable coding of higher order ambisonic audio data |
| CN106471576B (en) | Closed-loop Quantization of Higher Order Ambisonics Coefficients |
| US10134403B2 (en) | Crossfading between higher order ambisonic signals |
| US9959880B2 (en) | Coding higher-order ambisonic coefficients during multiple transitions |
| HK1232013B (en) | Methods and devices for processing audio data |
| HK1233045B (en) | Signaling channels for scalable coding of higher order ambisonic audio data |
| HK1233047B (en) | Signaling layers for scalable coding of higher order ambisonic audio data |
| HK1232013A1 (en) | Reducing correlation between higher order ambisonic (hoa) background channels |