CN110739000B - Audio object coding method suitable for personalized interactive system - Google Patents
- Publication number
- CN110739000B
- Authority
- CN
- China
- Prior art keywords
- matrix
- code stream
- audio
- coding
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
Technical field
The invention belongs to the field of digital audio signal processing and relates specifically to an audio object encoding and decoding method based on multi-step, stage-by-stage downmixing and reconstruction. It is suited to personalized interactive spatial audio systems and allows users to adjust audio objects according to their own needs.
Background
Channel-based spatial audio technology can encode and reconstruct three-dimensional audio scenes and provides a more immersive listening experience than mono or stereo audio; examples include MPEG spatial audio coding and the NHK 22.2 loudspeaker array. It has therefore become increasingly popular. However, traditional channel-based spatial audio systems still have limitations: their flexibility is low, which makes it difficult to support audio services with personalized interactive functions. The new generation of audio coding technology therefore decomposes an audio scene into a series of independent objects and encodes and transmits those objects as the basic elements.
Many researchers and research institutions worldwide have worked on audio object coding and proposed a variety of methods. The most representative is Spatial Audio Object Coding (SAOC), proposed by the well-known German research institute Fraunhofer [Reference 1]. SAOC encodes and transmits a downmix signal of multiple audio objects together with side information, and the decoder separates and reconstructs the audio objects from the downmix according to that side information. SAOC can transmit a large number of audio objects at a low bit rate, which greatly improves the coding efficiency of audio objects and lets users make personalized adjustments and interactions according to their own listening needs [Reference 2].
In the SAOC framework, the same parameters are used as side information for all frequency bins within a subband in order to keep the bit rate low. This causes frequency-domain aliasing distortion that seriously degrades the listening experience: for example, when one audio object is played back, it also contains components of other objects [Reference 3]. The problem further affects subsequent personalized spatial audio interaction services on the user side. Some studies use residual signals to compensate for this distortion and improve decoded quality [Reference 4][Reference 5]. However, these methods improve the listening experience of only one target object; the other objects still suffer from aliasing distortion, so good decoded quality cannot be guaranteed for every audio object.
Reference 1: Breebaart, J., Engdegård, J., Falch, C., et al.: Spatial audio object coding (SAOC) - the upcoming MPEG standard on parametric object based audio coding. In: Audio Engineering Society Convention 124. Audio Engineering Society (2008).
Reference 2: Coleman, P., Franck, A., Francombe, J., et al.: An audio-visual system for object-based audio: From recording to listening. IEEE Transactions on Multimedia 20(8), 1919-1931 (2018).
Reference 3: Wu, T., Hu, R., Wang, X., Ke, S.: Audio object coding based on optimal parameter frequency resolution. Multimedia Tools and Applications, pp. 1-16 (2019).
Reference 4: Kim, K., Seo, J., Beack, S., Kang, K., Hahn, M.: Spatial audio object coding with two-step coding structure for interactive audio service. IEEE Transactions on Multimedia 13(6), 1208-1216 (2011).
Reference 5: Lee, B., Kim, K., Hahn, M.: Efficient residual coding method of spatial audio object coding with two-step coding structure for interactive audio services. IEICE Transactions on Information and Systems 99(7), 1949-1952 (2016).
Summary of the invention
To solve the above technical problems, the invention provides an audio object encoding and decoding method based on multi-step downmixing and reconstruction that achieves high-quality coding at medium and low bit rates and ensures good decoded quality for all audio objects.
The technical solution adopted by the invention is an audio object coding method suitable for a personalized interactive system, characterized by comprising the following steps:
Step A1: divide the input audio object sequences into windowed frames and transform the time-domain signals into the frequency domain to obtain a time-frequency matrix for each audio object;
Step A2: from each object's time-frequency matrix, compute the object's frequency-domain energy, sort the objects by energy, and determine which object is encoded at each step of the multi-step encoding;
Step A3: following the determined coding order, downmix step by step and compute the corresponding side information. Stepwise downmixing means adding the matrices of the object pair input at the current step to obtain a sum matrix; the intermediate downmix signals are not transmitted in the code stream. The side information contains the object residuals and the object gain parameter matrices, where the object gain parameters are computed from the energy ratio of the two input signals of the object pair;
Step A4: decompose the object residuals in the side information into left and right singular matrices and singular values by singular value decomposition;
Step A5: quantize the singular matrices, singular values and object gain parameters to obtain the side information code stream;
Step A6: encode the final downmix signal from step A3 to obtain the downmix code stream;
Step A7: combine the code streams obtained in steps A5 and A6 into the output code stream and transmit it to the decoder.
Compared with existing audio object coding technology, the invention has the following advantages: multi-step, stage-by-stage encoding and decoding uses residuals to compensate for decoding distortion as fully as possible, ensuring good listening quality for every audio object, while singular value decomposition decomposes and compresses the residual information to reduce the bit rate. The invention can therefore decode high-quality audio objects at medium and low bit rates, meeting the needs of personalized interactive audio systems.
Description of drawings
Fig. 1 is a schematic diagram of encoding according to an embodiment of the invention;
Fig. 2 is a schematic diagram of decoding according to an embodiment of the invention.
Detailed description
To help those skilled in the art understand and implement the invention, the technical solution is further described below with reference to the drawings and specific embodiments. It should be understood that the embodiments described here only illustrate and explain the invention and do not limit it.
Building on existing audio object coding methods, the invention proposes an audio object encoding and decoding method based on multi-step downmixing and reconstruction. First, an optimal coding order is determined from the objects' frequency-domain energies, which fixes the object to be encoded, and whose side information is computed, at each step. This yields residual information for every object and effectively reduces signal distortion and cross-object leakage in all reconstructed objects. Singular value decomposition then splits the residual information into three low-dimensional matrices, compressing it and reducing the bit rate.
Referring to Fig. 1, the invention proposes a multi-audio-object encoding method suitable for a personalized interactive system. This embodiment uses four input objects A, B, C and D and comprises the following steps:
Step A1: input audio objects A, B, C and D (which may be different sources such as vocals, piano or guitar), divide each object into windowed frames, and transform the time-domain signal into the frequency domain to obtain the time-frequency matrix of each audio object;
In this embodiment, framing, windowing and the modified discrete cosine transform (MDCT) turn the one-dimensional time-domain sound signal into a two-dimensional frequency-domain spectrogram; the output is object data in matrix form.
The input audio object signals are 16-bit WAV files sampled at 44.1 kHz.
Note that the audio parameters and object types specified here only illustrate the implementation of the invention and do not limit it.
For framing and windowing, each frame is 1024 samples long, the window function is a Hanning window, and consecutive frames overlap by 50% in time; the time-frequency transform is the MDCT with a transform length of 2048 points. The output is a set of audio object signals in matrix form in which the number of rows equals the number of frames and the number of columns equals the number of frequency bins (or the transposed arrangement, with frequency bins as rows and frames as columns).
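The framing and MDCT of step A1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: "frame length 1024" is read as a hop of N = 1024 samples with 2048-sample blocks (50% overlap), so each frame yields 1024 MDCT coefficients, and the output is arranged bins × frames. The text names a Hanning window; the sketch defaults to the sine (square-root Hann) window because the plain Hann window does not satisfy the Princen-Bradley condition w[n]² + w[n+N]² = 1 that MDCT perfect reconstruction needs, but `hann_window` can be passed to follow the stated choice. All function names are illustrative.

```python
import numpy as np

def mdct_basis(n_bins):
    """Cosine kernel C[n, k] of a 2N-point MDCT producing N = n_bins coefficients."""
    n = np.arange(2 * n_bins)[:, None]
    k = np.arange(n_bins)[None, :]
    return np.cos(np.pi / n_bins * (n + 0.5 + n_bins / 2) * (k + 0.5))

def hann_window(n_bins):
    """Periodic Hann window of length 2N (the window type named in the text)."""
    n = np.arange(2 * n_bins)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / (2 * n_bins))

def sine_window(n_bins):
    """Sine (square-root Hann) window; satisfies w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * n_bins)
    return np.sin(np.pi * (n + 0.5) / (2 * n_bins))

def mdct_frames(x, n_bins=1024, window=None):
    """Frame x with hop N and 2N-sample blocks, window each block, apply the MDCT.

    Returns an (n_bins, n_frames) matrix: rows are frequency bins, columns frames.
    """
    if window is None:
        window = sine_window(n_bins)  # assumption: see the note on Hann vs. sine
    x = np.asarray(x, dtype=float)
    # Pad N zeros in front and enough at the end so every sample lies in two blocks.
    n_hops = int(np.ceil(len(x) / n_bins))
    padded = np.zeros((n_hops + 2) * n_bins)
    padded[n_bins:n_bins + len(x)] = x
    n_frames = n_hops + 1
    basis = mdct_basis(n_bins)
    out = np.empty((n_bins, n_frames))
    for t in range(n_frames):
        block = padded[t * n_bins:t * n_bins + 2 * n_bins] * window
        out[:, t] = block @ basis  # N MDCT coefficients of frame t
    return out
```

For a 44.1 kHz object, `mdct_frames(samples)` gives a 1024-row matrix with one column per hop, matching the P × Q layout used later for the residuals.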
Note that the frame length, window type and transform specified here only illustrate the specific implementation steps and do not limit the invention.
Step A2: from each object's time-frequency matrix, compute the object's frequency-domain energy, sort the objects by energy, and determine which object is encoded at each step of the multi-step encoding;
In this embodiment, the frequency-domain energy of each object is computed from its matrix-form data and the objects are sorted in descending order of energy to determine the order in which they are encoded; that is, objects with higher energy are encoded first.
The energy share of each object is computed as follows: $O_i = \frac{\|S_i\|}{\sum_{j=1}^{N} \|S_j\|}$
where ||S_i|| is the total energy of the i-th audio object and O_i is its share of the total energy of all objects. Sorting the objects by O_i in descending order gives D (S_1), B (S_2), A (S_3), C (S_4); objects with larger O_i are encoded first. Note that the range i ∈ [1, 4] and the descending sort order specified here only illustrate the specific implementation steps and do not limit the invention.
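A sketch of the energy ordering in step A2, assuming that ||S_i|| is the sum of squared MDCT coefficients of object i (the defining energy formula is not reproduced in this text) and that O_i is that energy divided by the total over all objects. The function name `energy_order` and the dict-based input are hypothetical.

```python
import numpy as np

def energy_order(objects):
    """Return (order, shares): labels sorted by descending energy share O_i.

    objects: dict mapping an object label to its (bins x frames) MDCT matrix.
    Energy is taken here as the sum of squared coefficients (an assumption).
    """
    labels = list(objects)
    energy = np.array([np.sum(np.square(objects[k])) for k in labels])
    shares = energy / energy.sum()             # O_i = ||S_i|| / sum_j ||S_j||
    idx = np.argsort(-shares, kind="stable")   # largest share is encoded first
    order = [labels[i] for i in idx]
    return order, dict(zip(labels, shares))
```

With four constant test matrices whose energies rank D > B > A > C, the function returns the order D, B, A, C used in the example.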
Step A3: following the coding order, downmix step by step and compute the corresponding side information (object residuals and object gain parameters);
In this embodiment, stepwise downmixing means adding the matrices of the object pair input at the current step to obtain a sum matrix; the intermediate downmix signals are not transmitted in the code stream. The side information contains the object residuals and the object gain parameter matrices, where the object gain parameters are computed from the energy ratio of the two input signals of the object pair;
The object residuals and object gain parameters are calculated as follows:
where R(i) is the residual signal of the (i+1)-th object, G_o(i) is the gain parameter of the (i+1)-th object, and G_d(i) is the gain parameter of the i-th downmix signal; X_i is the downmix signal obtained at step i, P_o(i) is the energy of object i, and P_d(i) is the energy of the downmix signal at step i. In this embodiment N = 4, the number of objects to be encoded.
Note that N = 4 here only illustrates the specific implementation steps and does not limit the invention.
For this example, the multi-step downmix following the coding order from step A2 proceeds as follows. In the first step, objects D and B form an object pair for downmixing and parameter extraction (D is treated as the downmix signal in this step), giving the downmix signal X_1 of the two objects together with the gain parameter G_o(1) and residual R(1) of the second object, B. In the second step, X_1 and A form the object pair, giving the second-step downmix X_2 together with the gain parameter G_o(2) and residual R(2) of the third object, A. In the third step, X_2 and C form the object pair, giving the third-step downmix X_3 (the final downmix signal transmitted to the decoder) together with the gain parameter G_o(3) and residual R(3) of the fourth object, C. The four objects are thus downmixed and their parameters extracted in three steps.
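The gain and residual formulas of step A3 are not reproduced in this text, so the following encoder sketch fills them with one plausible choice, labeled as an assumption throughout: per-bin energies P = S², an object gain G_o = P_o / (P_o + P_d) and a downmix gain G_d = P_d / (P_o + P_d) formed from the energy ratio of the two members of the pair, and a residual R = S - G_o·X equal to what a gain-only estimate of the object from the new downmix misses. The stepwise sum, the pairing order and the treatment of the first object as the initial downmix follow the description above; the function name is hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards the energy ratios against division by zero in silent bins

def stepwise_encode(ordered):
    """Multi-step downmix sketch for objects already sorted by energy.

    ordered: list of equally shaped (bins x frames) MDCT matrices S_1..S_N.
    Returns the final downmix and, for each step i = 1..N-1, the object
    gain G_o(i), the downmix gain G_d(i) and the residual R(i).
    """
    x = ordered[0]  # step 1 treats the highest-energy object as the downmix input
    g_o, g_d, residuals = [], [], []
    for s in ordered[1:]:
        p_o = s ** 2   # per-bin energy of the incoming object (assumed resolution)
        p_d = x ** 2   # per-bin energy of the previous downmix
        go = p_o / (p_o + p_d + EPS)  # assumed energy-ratio gain of the object
        gd = p_d / (p_o + p_d + EPS)  # assumed energy-ratio gain of the downmix
        x = x + s                     # stepwise downmix: element-wise matrix sum
        residuals.append(s - go * x)  # part of s a gain-only estimate misses
        g_o.append(go)
        g_d.append(gd)
    return x, g_o, g_d, residuals
```

Only the final downmix and the per-step gains and residuals leave the function, mirroring the statement that the intermediate downmixes are not transmitted.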
Note that the coding order and number of steps specified here only illustrate the specific implementation steps and do not limit the invention.
Step A4: decompose the object residuals in the side information into left and right singular matrices and singular values by singular value decomposition;
In this embodiment, singular value decomposition reduces the dimensionality of the residual matrices of the objects, limiting the extra data that the residual information adds. Each residual matrix is decomposed into three smaller matrices: the left singular matrix, the singular value matrix and the right singular matrix; for the singular value matrix, only the values on its diagonal are transmitted.
Singular value decomposition (SVD) is a matrix factorization that generalizes eigenvalue decomposition; it reduces a matrix to its constituent parts so that a high-dimensional matrix can be represented by several lower-dimensional matrices, which allows data compression. The decomposition is: $R(i)_{P \times Q} = U \Lambda V^{T}$
where R(i)_{P×Q} is the residual signal of the (i+1)-th object, the number of rows P is half the MDCT length, and the number of columns Q is the number of frames of the audio object. U is the left singular matrix, Λ the singular value matrix and V the right singular matrix; the singular values on the diagonal of Λ are sorted in descending order.
For dimensionality reduction, the first r singular values (here r = 50) and the corresponding singular vectors can be used to approximate R(i): $R(i) \approx U_r \Lambda_r V_r^{T}$
where Λ_r is the leading r × r block of the singular value matrix, and U_r and V_r consist of the first 50 columns of the original left and right singular matrices (equivalently, V_r^T is the first 50 rows of V^T). These three matrices approximate the residual signal with reduced dimensions, compressing the amount of side information.
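The rank-r truncation of step A4, and the matching matrix synthesis of step B4, can be sketched with `numpy.linalg.svd`. By the Eckart-Young theorem, keeping the r largest singular values gives the best rank-r approximation in the Frobenius norm. The function names are illustrative; r = 50 follows the embodiment.

```python
import numpy as np

def svd_truncate(residual, r=50):
    """Keep the r largest singular values of a (P x Q) residual matrix.

    Returns (U_r, sigma_r, Vt_r); only the diagonal of Lambda_r (sigma_r) is kept.
    """
    u, sigma, vt = np.linalg.svd(residual, full_matrices=False)  # sigma is descending
    r = min(r, sigma.size)
    return u[:, :r], sigma[:r], vt[:r, :]

def svd_synthesize(u_r, sigma_r, vt_r):
    """Decoder-side matrix synthesis: R_hat = U_r diag(sigma_r) V_r^T."""
    return (u_r * sigma_r) @ vt_r  # scaling the columns of U_r avoids building diag()
```

Storing U_r, the r singular values and V_r^T takes r(P + Q + 1) numbers instead of PQ. For P = 1024 and a hypothetical clip of Q = 1000 frames, that is 101,250 values instead of 1,024,000, roughly a tenth, before quantization.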
Note that r = 50 here only illustrates the specific implementation steps and does not limit the invention.
Step A5: quantize the singular values, singular matrices and object gain parameters to obtain the side information code stream;
In this embodiment, quantization may be implemented by table lookup. The elements of the residual decomposition matrices and of the gain parameters have different value ranges, so they are normalized before quantization so that a single quantization table can be used. Each element is then mapped to the closest value in the quantization table, and the corresponding quantization index is output as the side information code stream.
Step A6: encode the final downmix signal from step A3 to obtain the downmix code stream;
In this embodiment, the final downmix signal, which is the basis on which the decoder reconstructs the object signals, is encoded with AAC at 128 kbit/s.
Note that encoding the final downmix signal with AAC at 128 kbit/s only illustrates the specific implementation steps and does not limit the invention.
Step A7: combine the code streams obtained in steps A5 and A6 into the output code stream and transmit it to the decoder.
Combining the output code stream means merging the final downmix code stream and the side information code stream and adding flag bits that identify them for parsing. The final downmix code stream is the AAC-encoded output; the side information code stream is the stream of quantization indices produced by quantizing the residual decomposition matrices and the gain parameters.
Referring to Fig. 2, the invention also proposes a multi-audio-object decoding method suitable for a personalized interactive system. This embodiment uses the four objects A, B, C and D and comprises the following steps:
Step B1: parse the received code stream to obtain the side information code stream and the final downmix code stream;
In this embodiment, parsing inverts the procedure used to combine the output code stream, recovering the final downmix code stream and the side information code stream.
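Steps A7 and B1 only state that the two sub-streams are merged with flag bits for parsing. The sketch below assumes a hypothetical framing with a 4-byte sync word followed by two big-endian 32-bit length fields; it uses only Python's standard `struct` module, and the layout is an illustration, not the patent's bitstream syntax.

```python
import struct

MAGIC = b"AOC1"                   # hypothetical sync word marking the start of a frame
HEADER = struct.Struct(">4sII")   # sync word, downmix length, side-info length

def pack_stream(downmix: bytes, side_info: bytes) -> bytes:
    """Step A7: concatenate the two sub-streams behind a header that flags their sizes."""
    return HEADER.pack(MAGIC, len(downmix), len(side_info)) + downmix + side_info

def parse_stream(stream: bytes) -> tuple[bytes, bytes]:
    """Step B1: read the header back and split the payload into the two sub-streams."""
    magic, n_down, n_side = HEADER.unpack_from(stream, 0)
    if magic != MAGIC:
        raise ValueError("not an audio-object stream")
    start = HEADER.size
    downmix = stream[start:start + n_down]
    side = stream[start + n_down:start + n_down + n_side]
    if len(downmix) != n_down or len(side) != n_side:
        raise ValueError("truncated stream")
    return downmix, side
```

Length fields make parsing a direct slice; a real format would also need per-frame sync and error handling, which the text leaves open.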
Step B2: decode the downmix code stream with AAC to obtain the downmix signal;
In this embodiment, the final downmix code stream is the AAC-compressed data; AAC decoding yields the final downmix signal as it was before transmission, apart from AAC coding loss.
Step B3: dequantize the side information code stream to obtain the left and right singular matrices, the singular values and the object gain parameters;
In this embodiment, the side information was normalized before quantization, so it is correspondingly de-normalized after dequantization, recovering the side information as it was before transmission up to quantization error.
Step B4: multiply the left and right singular matrices and the singular values to recover the object residuals;
In this embodiment, matrix synthesis multiplies the left singular matrix, the singular value matrix and the right singular matrix to obtain an approximation of the object residual: $\hat{R}(i) = U_r \Lambda_r V_r^{T}$
Step B5: decode in the reverse of the coding order, using the side information to reconstruct the frequency-domain signals of the audio objects one after another from the transmitted downmix signal;
The object gain parameters separate each object from the corresponding downmix signal, and the result is combined with the residual signal to compensate for aliasing distortion, giving the reconstructed frequency-domain signal of the audio object, as shown in the following formula:
where S′_i is the reconstructed frequency-domain object signal, X′_i is the reconstructed stepwise downmix signal, and G_d(i) is the gain parameter of the downmix signal at each step. The residual term is the residual information obtained at the decoder by matrix synthesis, that is, the output of step B4. Objects are decoded in the reverse of the encoding order, and each object is reconstructed from the stepwise downmix signal in the corresponding decoding step.
For this example, following the decoding order of step B5 and formulas (8), (9) and (10) above, the objects are reconstructed in several steps as follows. In the first step, object C (S′_4) is reconstructed from the final downmix X_3 using the gain parameter G_o(3) and its residual, and the stepwise downmix X′_2 is reconstructed from X_3 using the gain parameter G_d(3). In the second step, object A (S′_3) is reconstructed from X′_2 using G_o(2) and its residual, and X′_1 is reconstructed from X′_2 using G_d(2). In the third step, object B (S′_2) is reconstructed from X′_1 using G_o(1) and its residual, and object D (S′_1) is obtained by subtracting the reconstructed object B from X′_1. Through these three decoding steps the objects are recovered in turn from the corresponding stepwise downmix signals, and the residual information compensates the reconstructed signals, reducing the quality loss caused by aliasing distortion.
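Formulas (8) to (10) are not reproduced in this text, so this decoder sketch assumes the residual was formed at the encoder as R(i) = S_{i+1} - G_o(i)·X_i, which makes S′ = G_o(i)·X′_i + R̂(i) the separation-plus-compensation step. To step back to the previous downmix it subtracts the recovered object, X′_{i-1} = X′_i - S′. When G_o + G_d = 1, this equals G_d(i)·X′_i - R̂(i), i.e. the downmix gain of the text with the residual removed; whether the patent removes the residual there is an assumption. The final step reproduces the text's D = X′_1 - B′. The function name is hypothetical.

```python
import numpy as np

def stepwise_decode(x_final, g_o, residuals):
    """Reverse-order reconstruction sketch.

    x_final:   decoded final downmix (bins x frames)
    g_o:       object gains G_o(1..N-1) in encoding order
    residuals: reconstructed residuals R_hat(1..N-1) in encoding order
    Returns the reconstructed objects S'_1..S'_N in encoding order.
    """
    x = np.asarray(x_final, dtype=float)
    recovered = []
    for go, r in zip(reversed(g_o), reversed(residuals)):
        s = go * x + r     # gain-based separation plus residual compensation
        recovered.append(s)
        x = x - s          # assumption: step back to the previous downmix by subtraction
    recovered.append(x)    # what remains is the first (highest-energy) object
    return recovered[::-1]
```

With unquantized residuals the reconstruction is exact; with quantized, rank-truncated residuals, the error of each recovered object also carries into the earlier downmixes.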
Note that the four objects A, B, C and D and the number of decoding steps here only illustrate the specific implementation steps and do not limit the invention.
Step B6: transform the frequency-domain audio object signals back to the time domain with the inverse time-frequency transform.
In this embodiment, the reconstructed object signals are still in the frequency domain and must be transformed back to the time domain before subsequent rendering, personalized interaction and playback. The inverse transform in the decoding method therefore applies the inverse MDCT to each object's frequency-domain signal and undoes the framing and windowing to obtain a continuous time-domain signal.
Compared with existing audio object coding methods, the invention has the following advantages and features:
Multi-step, stage-by-stage encoding and decoding uses residuals to compensate for decoding distortion as fully as possible, ensuring good listening quality for every audio object, while singular value decomposition decomposes and compresses the residual information to reduce the bit rate. The invention can therefore decode high-quality audio objects at medium and low bit rates, meeting the needs of personalized interactive audio systems.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910972165.8A CN110739000B (en) | 2019-10-14 | 2019-10-14 | Audio object coding method suitable for personalized interactive system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910972165.8A CN110739000B (en) | 2019-10-14 | 2019-10-14 | Audio object coding method suitable for personalized interactive system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110739000A | 2020-01-31 |
CN110739000B | 2022-02-01 |
Family ID: 69270038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910972165.8A Active CN110739000B (en) | 2019-10-14 | 2019-10-14 | Audio object coding method suitable for personalized interactive system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110739000B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112365896B (en) * | 2020-10-15 | 2022-06-14 | 武汉大学 | Object-oriented encoding method based on stacked sparse autoencoder |
CN112885364B (en) * | 2021-01-21 | 2023-10-13 | 维沃移动通信有限公司 | Audio encoding method and decoding method, audio encoding device and decoding device |
CN113096672B (en) * | 2021-03-24 | 2022-06-14 | 武汉大学 | Multi-audio object coding and decoding method applied to low code rate |
CN113314131B (en) * | 2021-05-07 | 2022-08-09 | 武汉大学 | Multistep audio object coding and decoding method based on two-stage filtering |
CN113314130B (en) * | 2021-05-07 | 2022-05-13 | 武汉大学 | An Audio Object Coding and Decoding Method Based on Spectrum Shifting |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067931A (en) * | 2007-05-10 | 2007-11-07 | 芯晟(北京)科技有限公司 | Efficient configurable frequency-domain parametric stereo and multichannel coding and decoding method and system |
WO2008145894A1 (en) * | 2007-05-10 | 2008-12-04 | France Telecom | Audio encoding and decoding method and associated audio encoder, audio decoder and computer programs |
CN101609674A (en) * | 2008-06-20 | 2009-12-23 | 华为技术有限公司 | Codec method, device and system |
JP2010109631A (en) * | 2008-10-29 | 2010-05-13 | Kyocera Corp | Wireless communication system, transmission device, and communication signal transmission method |
CN103778919A (en) * | 2014-01-21 | 2014-05-07 | 南京邮电大学 | Speech coding method based on compressed sensing and sparse representation |
CN103928030A (en) * | 2014-04-30 | 2014-07-16 | 武汉大学 | Gradable audio coding system and method based on sub-band space attention measure |
CN103974076A (en) * | 2014-05-19 | 2014-08-06 | 华为技术有限公司 | Image decoding and coding method, device and system |
CN104064194A (en) * | 2014-06-30 | 2014-09-24 | 武汉大学 | Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency |
CN105556596A (en) * | 2013-07-22 | 2016-05-04 | 弗朗霍夫应用科学研究促进协会 | Multi-channel audio decoder, multi-channel audio encoder, method and computer program for adjusting decorrelated signal contribution based on residual signal |
CN107610710A (en) * | 2017-09-29 | 2018-01-19 | 武汉大学 | Audio coding and decoding method for multiple audio objects |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8280744B2 (en) * | 2007-10-17 | 2012-10-02 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio decoder, audio object encoder, method for decoding a multi-audio-object signal, multi-audio-object encoding method, and non-transitory computer-readable medium therefor |
TWI476761B (en) * | 2011-04-08 | 2015-03-11 | Dolby Lab Licensing Corp | Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols |
EP2690621A1 (en) * | 2012-07-26 | 2014-01-29 | Thomson Licensing | Method and Apparatus for downmixing MPEG SAOC-like encoded audio signals at receiver side in a manner different from the manner of downmixing at encoder side |
US20150371644A1 (en) * | 2012-11-09 | 2015-12-24 | Stormingswiss Gmbh | Non-linear inverse coding of multichannel signals |
CN107533845B (en) * | 2015-02-02 | 2020-12-22 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for processing encoded audio signals |
2019
- 2019-10-14: application CN201910972165.8A filed in CN (patent CN110739000B, status: Active)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067931A (en) * | 2007-05-10 | 2007-11-07 | 芯晟(北京)科技有限公司 | Efficient configurable frequency-domain parametric stereo and multichannel coding and decoding method and system |
WO2008145894A1 (en) * | 2007-05-10 | 2008-12-04 | France Telecom | Audio encoding and decoding method and associated audio encoder, audio decoder and computer programs |
CN101609674A (en) * | 2008-06-20 | 2009-12-23 | 华为技术有限公司 | Codec method, device and system |
JP2010109631A (en) * | 2008-10-29 | 2010-05-13 | Kyocera Corp | Wireless communication system, transmission device, and communication signal transmission method |
CN105556596A (en) * | 2013-07-22 | 2016-05-04 | 弗朗霍夫应用科学研究促进协会 | Multi-channel audio decoder, multi-channel audio encoder, method and computer program for adjusting decorrelated signal contribution based on residual signal |
US20160275958A1 (en) * | 2013-07-22 | 2016-09-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Multi-Channel Audio Decoder, Multi-Channel Audio Encoder, Methods and Computer Program using a Residual-Signal-Based Adjustment of a Contribution of a Decorrelated Signal |
CN103778919A (en) * | 2014-01-21 | 2014-05-07 | 南京邮电大学 | Speech coding method based on compressed sensing and sparse representation |
CN103928030A (en) * | 2014-04-30 | 2014-07-16 | 武汉大学 | Gradable audio coding system and method based on sub-band space attention measure |
CN103974076A (en) * | 2014-05-19 | 2014-08-06 | 华为技术有限公司 | Image decoding and coding method, device and system |
CN104064194A (en) * | 2014-06-30 | 2014-09-24 | 武汉大学 | Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency |
CN107610710A (en) * | 2017-09-29 | 2018-01-19 | 武汉大学 | Audio coding and decoding method for multiple audio objects |
Non-Patent Citations (2)
Title |
---|
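Wu, T., et al. Audio object coding based on optimal parameter frequency resolution. Multimedia Tools and Applications, 2019-03-05, pp. 20723-20738. * |
Wang Jing (王晶) et al. A parametric stereo audio coding extension method combined with the G.719 codec. Transactions of Beijing Institute of Technology (北京理工大学学报), 2014-02-28, Vol. 34, No. 2, pp. 192-196. * |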
Also Published As
Publication number | Publication date |
---|---|
CN110739000A (en) | 2020-01-31 |
Similar Documents
Publication | Title |
---|---|
CN110739000B (en) | Audio object coding method suitable for personalized interactive system | |
US11798568B2 (en) | Methods, apparatus and systems for encoding and decoding of multi-channel ambisonics audio data | |
CN102270452B (en) | Near-transparent or transparent multi-channel encoder/decoder scheme | |
US9514759B2 (en) | Method and apparatus for performing an adaptive down- and up-mixing of a multi-channel audio signal | |
TWI612517B (en) | Switching between predictive and non-predictive quantization techniques in a higher order ambisonics (hoa) framework | |
CN102982805B (en) | A Multi-channel Audio Signal Compression Method Based on Tensor Decomposition | |
US8964994B2 (en) | Encoding of multichannel digital audio signals | |
CN107610710B (en) | Audio coding and decoding method for multiple audio objects | |
CN101202043B (en) | Method and system for encoding and decoding audio signal | |
CN101160619A (en) | Adaptive Residual Audio Coding | |
JP2013506164A (en) | Audio signal decoder, audio signal encoder, upmix signal representation generation method, downmix signal representation generation method, computer program, and bitstream using common object correlation parameter values | |
CN106373583B (en) | Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM | |
CN106981292A (en) | Multichannel spatial audio signal compression and restoration method based on tensor modeling | |
JPWO2010140350A1 (en) | Downmix apparatus, encoding apparatus, and methods thereof | |
CN103413553A (en) | Audio coding method, audio decoding method, coding terminal, decoding terminal and system | |
CN110660401B (en) | An audio object encoding and decoding method based on high and low frequency domain resolution switching | |
CN101031961B (en) | Method and device for processing coded signals | |
US20240153512A1 (en) | Audio codec with adaptive gain control of downmixed signals | |
CN108417219B (en) | Audio object coding and decoding method suitable for streaming media | |
CN112365896B (en) | Object-oriented encoding method based on stacked sparse autoencoder | |
CN113314131B (en) | Multistep audio object coding and decoding method based on two-stage filtering | |
US11176954B2 (en) | Encoding and decoding of multichannel or stereo audio signals | |
CN104347077B (en) | A stereo encoding and decoding method | |
CN101754086A (en) | Decoder and decoding method for multichannel audio coder using sound source location cue | |
CN113314130B (en) | An Audio Object Coding and Decoding Method Based on Spectrum Shifting |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |