CN1258172C - Apparatus and method for encoding and decoding audio signals - Google Patents
- Publication number
- CN1258172C
- Authority
- CN
- China
- Prior art keywords
- integer
- block
- spectrum value
- difference
- extension layer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/0017—Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
- G10L19/032—Quantisation or dequantisation of spectral components
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
Abstract
A time-discrete audio signal is processed (52) to provide a quantization block of quantized spectral values. In addition, an integer spectral representation of the time-discrete audio signal is generated using an integer transform algorithm (56). The quantization block, generated using a psychoacoustic model (54), is inversely quantized and rounded (58), and the difference between the integer spectral values and the inversely quantized, rounded spectral values is then formed. After decoding, the quantization block alone yields a lossy, psychoacoustically encoded and decoded audio signal, whereas during decoding the quantization block together with the combination block yields a losslessly or nearly losslessly encoded and re-decoded audio signal. Because the difference signal is generated in the frequency domain, the encoder/decoder structure is simple.
Description
Technical Field
The present invention relates to audio encoding/decoding and, in particular, to scalable encoding/decoding algorithms comprising a psychoacoustic first extension layer and a second extension layer containing auxiliary audio data for lossless decoding.
Background Art
Modern audio coding methods, such as MPEG Layer 3 (MP3) or MPEG AAC, use transforms such as the so-called modified discrete cosine transform (MDCT) to obtain a block-wise frequency representation of an audio signal. Such audio encoders typically receive a stream of time-discrete audio samples. This stream is windowed to obtain windowed blocks of, for example, 1024 or 2048 windowed audio samples. Various window functions, such as the sine window, can be used for the windowing.
The windowed time-discrete audio samples are then converted into a spectral representation by means of a filter bank. In principle, a Fourier transform, or for specific reasons a variant of it such as the FFT or, as mentioned, the MDCT, can be used for this. The block of audio spectral values at the output of the filter bank can then be processed further as required. In the audio coders cited above, the audio spectrum is then quantized, the quantization levels typically being chosen so that the quantization noise introduced lies below the psychoacoustic masking threshold, i.e., is "masked". Quantization is a lossy operation. To reduce the data volume further, the quantized spectral values are entropy coded, for example by Huffman coding. A bitstream multiplexer then forms a bitstream that can be stored or transmitted from the entropy-coded quantized spectral values and side information such as scale factors.
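To make the idea of keeping the quantization noise at the masking threshold concrete, here is a minimal Python sketch. It assumes a plain uniform quantizer, whose noise power is roughly step²/12 when the signal is large compared with the step, and a single hypothetical masking threshold for one band; all function names are illustrative. Real MP3/AAC encoders use non-uniform quantizers driven by a full psychoacoustic model, which is not modeled here.

```python
import math
import random


def step_for_threshold(masking_threshold: float) -> float:
    """Uniform step whose expected noise power (about step**2 / 12) equals the threshold."""
    return math.sqrt(12.0 * masking_threshold)


def quantize(values, step):
    """Uniform mid-tread quantizer: spectral value -> integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    return [q * step for q in indices]


def noise_power(original, reconstructed):
    """Mean squared quantization error."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


if __name__ == "__main__":
    random.seed(0)
    band = [random.uniform(-100.0, 100.0) for _ in range(4096)]  # hypothetical spectral values
    threshold = 4.0  # hypothetical masking threshold (noise power) for this band
    step = step_for_threshold(threshold)
    measured = noise_power(band, dequantize(quantize(band, step), step))
    print(f"step={step:.3f}, measured noise={measured:.3f}, threshold={threshold}")
```

The measured noise lands close to the threshold on average; an encoder that must stay strictly below it would pick a somewhat smaller step.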
In the audio decoder, a bitstream demultiplexer splits the bitstream into coded quantized spectral values and side information. The entropy-coded quantized spectral values are first entropy decoded to obtain the quantized spectral values. These are then inversely quantized to obtain decoded spectral values that contain quantization noise; this noise, however, lies below the psychoacoustic masking threshold and is therefore inaudible. A synthesis filter bank then converts these spectral values into a time representation to obtain time-discrete decoded audio samples. The synthesis filter bank must use the inverse of the transform algorithm used in the encoder. In addition, the windowing must be undone after the frequency-to-time (inverse) transform.
To obtain good frequency selectivity, modern audio coders typically use overlapping blocks. This is shown in Fig. 4a. First, for example, 2048 time-discrete audio samples are taken and windowed by means 402. Means 402, which implements the window, has a window length of 2N samples and outputs a block of 2N windowed samples. To obtain window overlap, a second block of 2N windowed samples is formed by means 404, which is drawn separately from means 402 in Fig. 4a only for clarity. However, the 2048 samples fed into means 404 are not the time-discrete audio samples immediately following the first window; instead, they contain the second half of the samples windowed by means 402 and only 1024 "new" samples. This overlap, symbolized in Fig. 4a by means 406, amounts to 50%. The MDCT algorithm is then applied by means 408 and 410 to the 2N windowed samples output by means 402 and by means 404, respectively. According to the known MDCT algorithm, means 408 provides N spectral values for the first window, and means 410 likewise provides N spectral values for the second window, the first and second windows overlapping by 50%.
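In the decoder, as shown in Fig. 4b, the N spectral values of the first window are fed into means 412, which performs an inverse modified discrete cosine transform. The same applies to the N spectral values of the second window, which are fed into means 414, likewise performing an inverse modified discrete cosine transform. Means 412 and means 414 each provide 2N samples, for the first and the second window respectively.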
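In means 416, labeled TDAC (time domain aliasing cancellation) in Fig. 4b, the fact that the two windows overlap is taken into account. In particular, a sample y1 of the second half of the first window (i.e., with index N+k) is added to a sample y2 of the first half of the second window (i.e., with index k), so that N decoded time-domain samples are produced at the output, i.e., at the decoder output.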
It should be noted that, through the function of means 416, also referred to as the add function, the windowing performed in the encoder of Fig. 4a is to some extent taken into account automatically, so that no explicit "inverse windowing" has to take place in the decoder of Fig. 4b.
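The overlap-and-cancel mechanism of Figs. 4a and 4b can be checked numerically with the sketch below, a direct O(N²) evaluation in plain Python rather than an efficient filter bank. The conventions are assumptions, not taken from the patent: the common textbook MDCT phase term, a sine window applied before the MDCT and, as many implementations do, again after the inverse MDCT, and a 2/N scale on the inverse so that, with a window whose squared halves sum to 1, overlap-add returns the input. The function names are hypothetical.

```python
import math


def sine_window(two_n: int) -> list[float]:
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _kernel(n: int, k: int, n_half: int) -> float:
    # Common MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block: list[float]) -> list[float]:
    """2N windowed samples -> N spectral values (direct evaluation)."""
    n_half = len(block) // 2
    return [sum(block[n] * _kernel(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(spec: list[float]) -> list[float]:
    """N spectral values -> 2N samples that still contain time-domain aliasing."""
    n_half = len(spec)
    return [2.0 / n_half * sum(spec[k] * _kernel(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]


def analyze_synthesize(signal: list[float], n_half: int) -> list[float]:
    """50 % overlapped windowed MDCT/IMDCT followed by overlap-add (TDAC)."""
    w = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * (2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):  # hop of N samples
        frame = [padded[start + i] * w[i] for i in range(2 * n_half)]
        y = imdct(mdct(frame))
        for i in range(2 * n_half):
            # The second half of this frame overlaps the first half of the next one;
            # their aliasing terms cancel when the outputs are added.
            out[start + i] += y[i] * w[i]
    return out[n_half:n_half + len(signal)]
```

With N = 16, a 64-sample random signal is reconstructed up to floating-point rounding. That residual error is tiny but generally nonzero, which is the issue raised below for lossless coding.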
If the window function implemented by means 402 or 404 is denoted w(k), where the index k represents time, the following condition must be satisfied: the squared window weight w(k) plus the squared window weight w(N+k) must equal 1, i.e., w(k)² + w(N+k)² = 1 for k = 0, ..., N-1. With a sine window, whose weights follow the first half-wave of the sine function, this condition is always satisfied, since the squared sine and the squared cosine of any angle sum to 1.
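A quick numerical check of this condition, assuming the common sine-window definition w(k) = sin(π(k + 1/2)/(2N)) and, for contrast, a squared-sine (Hann-type) window that violates it. The helper names are illustrative.

```python
import math


def sine_window(two_n):
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def hann_window(two_n):
    return [math.sin(math.pi * (n + 0.5) / two_n) ** 2 for n in range(two_n)]


def satisfies_power_complementarity(w, tol=1e-12):
    """Check w(k)**2 + w(N + k)**2 == 1 for k = 0 .. N-1."""
    n_half = len(w) // 2
    return all(abs(w[k] ** 2 + w[n_half + k] ** 2 - 1.0) < tol for k in range(n_half))


print(satisfies_power_complementarity(sine_window(2048)))  # True
print(satisfies_power_complementarity(hann_window(2048)))  # False
```

For the sine window, w(N+k) is exactly the cosine of the angle used for w(k), which is why the check passes for every N.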
A disadvantage of the MDCT windowing method described in Fig. 4a is that windowing multiplies the time-discrete samples by window weights that, for a sine window, are floating-point numbers, since the sine of an angle between 0 and 180 degrees is not an integer unless the angle equals 90 degrees. Hence, even when integer time-discrete samples are windowed, floating-point numbers result after windowing.
Therefore, even when no psychoacoustic coding is used, i.e., when lossless coding is the goal, quantization at the output of means 408 or 410 is necessary to allow reasonably tractable entropy coding.
If known transforms such as the one described with reference to Fig. 4a are to be used for lossless audio coding, either a very fine quantization must be used so that the error caused by rounding the floating-point numbers becomes negligible, or the error signal must additionally be coded, e.g., in the time domain.
Prior-art concepts in which the quantization is made so fine that the error caused by rounding floating-point numbers can be neglected are disclosed, for example, in German patent DE 197 42 201 C1. There, an audio signal is converted into its spectral representation and quantized to obtain quantized spectral values. The quantized spectral values are then inversely quantized, transformed into the time domain and compared with the original audio signal. If the error between the original audio signal and the quantized/inversely quantized audio signal is above an error threshold, the quantizer is set finer in a feedback loop and the comparison is repeated. The iteration stops once the error falls below the threshold. Any residual signal that may remain is coded by a time-domain coder and written into a bitstream which, in addition to the time-domain-coded residual signal, contains the spectral values quantized with the quantizer setting in effect when the iteration stopped. It should be noted that the quantizer need not be controlled by a psychoacoustic model, so the coded spectral values are typically quantized more finely than a psychoacoustic model would require.
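The iterative prior-art loop can be sketched as follows. An orthonormal DCT pair stands in for the filter bank, the error is measured as the maximum absolute time-domain deviation, and the start step, threshold and function names are hypothetical; none of these details are taken from DE 197 42 201 C1. The time-domain coder for the residual is not shown.

```python
import math


def dct(x):
    """Orthonormal DCT-II (stand-in for the analysis filter bank)."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(c[k] * math.sqrt((1 if k == 0 else 2) / n)
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def iterative_quantization(signal, step=64.0, error_threshold=0.5, max_iter=30):
    """Refine the quantizer until the time-domain error drops below the threshold.

    Returns (step, quantized spectrum, time-domain residual)."""
    spectrum = dct(signal)
    for _ in range(max_iter):
        q = [round(v / step) for v in spectrum]
        rec = idct([qi * step for qi in q])  # inverse quantization + inverse transform
        residual = [s - r for s, r in zip(signal, rec)]
        if max(abs(e) for e in residual) < error_threshold:
            break
        step /= 2.0  # feedback: make the quantizer finer and compare again
    return step, q, residual
```

Note that the encoder needs both the forward and the inverse transform inside the loop, which is the kind of overhead the invention aims to avoid.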
The publication "A Design of Lossy and Lossless Scalable Audio Coding" (T. Moriya et al., Proc. ICASSP, 2000) describes a scalable coder comprising, for example, an MPEG coder as a first, lossy data compression module, which receives a block-wise digital signal as input and generates the compressed bitstream. In a local decoder, which is also present, the coding is undone again and a coded/decoded signal is generated. This signal is compared with the original input signal by subtracting the coded/decoded signal from the original input signal. The error signal is then fed into a second module, where a lossless bit conversion is applied. This conversion has two steps. The first step converts from two's-complement format to sign-magnitude format. The second step converts, within a processing block, from a vertical sequence of magnitudes to a horizontal bit sequence. The lossless data conversion is performed so as to maximize the number of zeros, or the number of consecutive zeros in a sequence, in order to obtain the best possible representation of the temporal error signal, which is present as a sequence of digital values. This principle is based on the Bit-Sliced Arithmetic Coding (BSAC) scheme described in the publication "Multi-Layer Bit Sliced Bit Rate Scalable Audio Coder" (103rd AES Convention, Preprint No. 4520, 1997).
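The two conversion steps of the bit converter can be sketched as below. Only the sign-magnitude split and the bit-plane slicing are shown; the arithmetic coding of the planes is omitted, and the bit width and all function names are assumptions.

```python
def to_sign_magnitude(values, bits=16):
    """Step 1: two's-complement integers -> (sign bit, magnitude) pairs."""
    limit = 1 << (bits - 1)
    out = []
    for v in values:
        if not -limit <= v < limit:
            raise ValueError(f"{v} does not fit in {bits}-bit two's complement")
        out.append((1 if v < 0 else 0, abs(v)))
    return out


def bit_planes(magnitudes, bits=16):
    """Step 2: 'vertical' magnitudes -> 'horizontal' bit planes, MSB plane first.

    Small error values yield leading planes that are entirely zero."""
    return [[(m >> p) & 1 for m in magnitudes] for p in range(bits - 1, -1, -1)]


def from_bit_planes(planes):
    """Inverse of bit_planes(): shows that the conversion is lossless."""
    mags = [0] * len(planes[0])
    for plane in planes:
        mags = [(m << 1) | b for m, b in zip(mags, plane)]
    return mags


if __name__ == "__main__":
    error_block = [3, -1, 0, 2, -4, 1]  # hypothetical small time-domain error values
    sm = to_sign_magnitude(error_block, bits=8)
    for plane in bit_planes([m for _, m in sm], bits=8):
        print(plane)
```

Because the error values are small, most of the leading planes consist only of zeros, which is what makes the subsequent coding efficient.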
A disadvantage of the above concept is that the data for the lossless extension layer, i.e., the auxiliary data needed to decode a lossless audio signal, must be obtained in the time domain. Obtaining the coded/decoded signal in the time domain therefore requires complete decoding, including a frequency/time transform, so that the error signal can be computed as the sample-wise difference between the original audio input signal and the coded/decoded audio signal, which is lossy because of the psychoacoustic coding. The particular drawback of this concept is that, when the encoder generates the audio data stream, it needs a complete time/frequency transform, such as a filter bank or an MDCT algorithm, for the forward transform and, in addition, a complete inverse filter bank or complete synthesis algorithm merely to produce the error signal. The encoder must therefore contain full decoder functionality in addition to its actual encoder functionality. If the encoder is implemented in software, this demands both memory and processor resources, which increases the implementation cost of the encoder.
Summary of the Invention
It is the object of the present invention to provide a less expensive concept with which an audio data stream can be generated that can be decoded in a nearly lossless manner.
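This object is achieved by an apparatus for encoding a time-discrete audio signal according to claim 1, a method for encoding a time-discrete audio signal according to claim 21, an apparatus for decoding encoded audio data according to claim 22, a method for decoding encoded audio data according to claim 31, or a computer program according to claim 32 or 33.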
The present invention is based on the finding that auxiliary audio data allowing lossless decoding of the audio signal can be obtained by providing, as usual, a block of quantized spectral values and inversely quantizing it to obtain inversely quantized spectral values, which are lossy because they were quantized using a psychoacoustic model. These inversely quantized spectral values are then rounded to obtain a rounded block of rounded, inversely quantized spectral values. As the reference for forming the difference, the invention uses an integer transform algorithm that generates, from a block of integer time-discrete samples, an integer block containing only integer spectral values. According to the invention, the rounded block and the integer block are combined spectral value by spectral value, i.e., in the frequency domain, so the encoder itself needs no synthesis algorithm, i.e., no inverse filter bank, inverse MDCT algorithm or the like. Because of the integer transform algorithm and the rounding of the inversely quantized values, the combination block of difference spectral values contains only integer values, which can be entropy coded in any known manner. Any entropy coder, such as a Huffman coder or an arithmetic coder, can be used to entropy code the combination block.
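The frequency-domain difference can be illustrated with the following sketch. The IntMDCT itself is not reproduced: the integer block is a hand-written list of hypothetical integer spectral values, and a uniform quantizer with one step size stands in for the psychoacoustically controlled quantizer. The function names are illustrative.

```python
def quantize(spectrum, step):
    """Stand-in for the psychoacoustic quantizer (uniform, single step size)."""
    return [round(x / step) for x in spectrum]


def dequantize_and_round(q, step):
    """Inverse quantization followed by rounding to integers: the 'rounded block'."""
    return [round(v * step) for v in q]


def difference_block(integer_block, q, step):
    """Integer block minus rounded block, formed per spectral value in the
    frequency domain, so no inverse transform is needed in the encoder."""
    rounded = dequantize_and_round(q, step)
    return [i - r for i, r in zip(integer_block, rounded)]


if __name__ == "__main__":
    integer_block = [812, -37, 5, 0, -1290, 64, 3, -2]  # hypothetical IntMDCT output
    step = 7.3  # hypothetical quantizer step size
    q = quantize(integer_block, step)
    print("quantization block:", q)
    print("difference block:  ", difference_block(integer_block, q, step))
```

Every difference value is an integer whose magnitude is bounded by about half the step size, so it can go straight to an entropy coder.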
Likewise, any coder, for example the tools commonly used in known modern audio coders, can be used to code the quantized spectral values of the quantization block.
It is worth noting that the inventive encoding/decoding concept is compatible with modern coding tools such as window switching, TNS, or mid/side coding of multi-channel audio signals.
In a preferred embodiment of the invention, an MDCT is used to provide a quantization block of spectral values quantized using a psychoacoustic model. In addition, a so-called IntMDCT is preferably used as the integer transform algorithm.
In an alternative embodiment of the invention, the usual MDCT can be omitted and the IntMDCT used as an approximation of the MDCT: the integer spectrum obtained by the integer transform algorithm is fed to the psychoacoustic quantizer to obtain quantized IntMDCT spectral values, which are then inversely quantized and rounded again for comparison with the original integer spectral values. In this case only a single transform is needed, namely the IntMDCT, which produces integer spectral values from integer time-discrete samples.
Processors typically work with integers, or every floating-point number is ultimately represented as an integer. If integer arithmetic is used in a processor, explicit rounding of the inversely quantized spectral values can be omitted, because rounded values are always present anyway owing to the processor's arithmetic, i.e., within the precision of the LSB (least significant bit). In this case, fully lossless processing is achieved, i.e., processing that is lossless within the precision of the processor used. Alternatively, however, rounding to a coarser precision can be performed so that the difference signal in the combination block is rounded to a precision determined by the rounding function. Introducing rounding beyond the inherent rounding of the processing system, in order to obtain an encoder that is nearly lossless in the data-compression sense, adds flexibility and thus allows the degree of losslessness of the coding to be controlled.
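One possible reading of the coarser-rounding option is sketched below: each difference value is rounded to a multiple of 2**drop_bits before entropy coding, which bounds the reconstruction error by 2**(drop_bits - 1) and shrinks the values to be coded. The parameter `drop_bits`, the round-half-up rule and the function names are assumptions, not taken from the patent; `drop_bits = 0` gives the fully lossless case.

```python
def coarsen(residual, drop_bits):
    """Round each difference value to the nearest multiple of 2**drop_bits (ties up)
    and return the reduced integers that would be entropy coded."""
    half = (1 << drop_bits) >> 1  # 0 when drop_bits == 0, i.e. lossless
    return [(r + half) >> drop_bits for r in residual]  # >> floors, also for negatives


def expand(coded, drop_bits):
    """Decoder side: scale the transmitted integers back up."""
    return [c << drop_bits for c in coded]


if __name__ == "__main__":
    residual = [-9, -4, -1, 0, 1, 3, 6, 13]  # hypothetical difference values
    for k in range(4):
        rec = expand(coarsen(residual, k), k)
        err = max(abs(a - b) for a, b in zip(residual, rec))
        print(k, coarsen(residual, k), "max error", err)
```

Each extra dropped bit halves the range of the coded values at the cost of a larger, but still strictly bounded, deviation from the lossless integer spectrum.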
The inventive decoder is characterized in that the psychoacoustically coded audio data on the one hand and the auxiliary audio data on the other are extracted from the coded audio data, the auxiliary audio data are entropy decoded if necessary and then processed as follows. First, the quantization block is inversely quantized in the decoder and rounded with the same rounding algorithm as in the encoder, so that it can then be added to the entropy-decoded auxiliary audio data. The decoder then has both a spectral representation of the psychoacoustically compressed audio signal and a lossless spectral representation of the audio signal. The spectral representation of the psychoacoustically compressed audio signal is transformed into the time domain to obtain a lossy coded/decoded audio signal, whereas the lossless representation is transformed into the time domain using the integer transform algorithm inverse to the one used in the encoder, to obtain a lossless or, as described above, nearly lossless coded/decoded audio signal.
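On the decoder side, the two layers can be sketched as follows. The uniform quantizer step and Python's built-in `round()` are assumptions standing in for the real inverse quantizer and rounding function; what matters is that the decoder rounds exactly as the encoder did. The inverse MDCT and inverse IntMDCT that would follow are not shown, and the function names are hypothetical.

```python
def dequantize(q, step):
    return [v * step for v in q]


def decode_layers(q, step, residual=None):
    """Return (lossy spectrum, integer spectrum or None).

    Base layer only: dequantized values (input to the inverse MDCT).
    With the extension layer: rounded dequantized values plus the residual
    (input to the inverse IntMDCT)."""
    lossy = dequantize(q, step)
    if residual is None:
        return lossy, None
    rounded = [round(v) for v in lossy]  # must match the encoder's rounding exactly
    return lossy, [r + d for r, d in zip(rounded, residual)]
```

A decoder that ignores the second layer still produces the normal lossy output, which is what makes the bitstream scalable.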
Brief Description of the Drawings
The above and other objects and features of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block circuit diagram of a preferred apparatus for processing time-discrete audio samples to obtain integer values from which integer spectral values can be determined;
Fig. 2 is a schematic diagram of the decomposition of an MDCT and an inverse MDCT into Givens rotations and two DCT-IV operations;
Fig. 3 is a diagram illustrating the decomposition of the MDCT with 50% overlap into rotations and DCT-IV operations;
Fig. 4a is a schematic block circuit diagram of a known encoder with MDCT and 50% overlap;
Fig. 4b is a block circuit diagram of a known decoder for decoding the values generated by the encoder of Fig. 4a;
Fig. 5 is a block circuit diagram of a preferred encoder according to the present invention;
Fig. 6 is a block circuit diagram of an alternative preferred inventive encoder;
Fig. 7 is a block circuit diagram of a preferred inventive decoder;
Fig. 8a is a schematic diagram of a bitstream with a first extension layer and a second extension layer;
Fig. 8b is a schematic diagram of a bitstream with a first extension layer and several further extension layers;
Fig. 9 is a schematic diagram of binary-coded difference spectral values, illustrating possible scaling with respect to the precision (bits) and/or the frequency (sampling rate) of the difference spectral values.
Detailed Description of the Preferred Embodiments
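With reference to Figs. 5 to 7, an inventive encoder circuit (Figs. 5 and 6) and a preferred inventive decoder circuit (Fig. 7) are discussed below. The inventive encoder shown in Fig. 5 comprises an input 50, to which a time-discrete audio signal is fed, and an output 52, at which the coded audio data are output. The time-discrete audio signal at input 50 is fed into means 52 for providing a quantization block, which outputs a quantization block containing quantized spectral values of the time-discrete audio signal 50 obtained using a psychoacoustic model 54. The inventive encoder further comprises means 56 for generating an integer block using an integer transform algorithm, the integer transform algorithm being operable to generate integer spectral values from integer time-discrete samples.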
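The inventive encoder further comprises means 58 for inversely quantizing the quantization block output by means 52 and, when a precision different from the processor precision is required, a rounding function. As mentioned, if the precision of the processor system is sufficient, the rounding function is already inherently contained in the inverse quantization of the quantization block, since a processor with integer arithmetic cannot output non-integer values in any case. Means 58 thus provides a so-called rounded block containing inversely quantized spectral values that have been rounded to integers either inherently or explicitly. The rounded block and the integer block are both fed to combining means, which forms their difference to provide a difference block of difference spectral values; the term "difference block" means that the difference spectral values are the differences between the integer block and the rounded block.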
The quantization block output by means 52 and the difference block output by the difference-forming means 58 are both fed into processing means 60, which processes the quantization block in the usual way and, for example, entropy codes the difference block. Processing means 60 outputs the coded audio data at output 52; these data contain information on the quantization block as well as information on the difference block.
In a first preferred embodiment, shown in Fig. 6, the time-discrete audio signal is converted into a spectral representation by an MDCT and then quantized. Means 52 for providing the quantization block thus comprises an MDCT means 52a and a quantizer 52b.
In addition, the IntMDCT 56 is preferably used as the integer transform algorithm for generating the integer block.
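In Fig. 6, the processing means 60 of Fig. 5 is shown as a bitstream coding means 60a and an entropy coder 60b: the bitstream coding means 60a bitstream-codes the quantization block output by quantizer 52b, and the entropy coder 60b entropy codes the difference block. Bitstream coder 60a outputs the psychoacoustically coded audio data, while entropy coder 60b outputs the entropy-coded difference block. The output data of blocks 60a and 60b can be combined in a suitable way into a bitstream that carries the psychoacoustically coded audio data as a first extension layer and the auxiliary audio data for lossless decoding as a second extension layer. This scaled bitstream then corresponds to the coded audio data at output 52 of the encoder shown in Fig. 5.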
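In an alternative preferred embodiment, the MDCT block 52a of Fig. 6 can be omitted, as indicated in Fig. 5 by the dashed arrow 62. In this case, the integer spectrum provided by the integer transform means 56 is fed both to the difference-forming means 58 and to the quantizer 52b of Fig. 6. The spectral values produced by the integer transform algorithm are used here, in a sense, as an approximation of the normal MDCT spectrum. The advantage of this embodiment is that only the IntMDCT algorithm has to be present in the encoder, rather than both the IntMDCT and the MDCT algorithms.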
Referring again to Fig. 6, note that the solid boxes and lines represent a normal audio coder conforming to one of the MPEG standards, while the dashed boxes and lines represent the extension of such a normal MPEG coder. No fundamental changes to the normal MPEG coder are therefore necessary: the auxiliary audio data for lossless coding are obtained by adding the integer transform, without changing the basic encoder/decoder structure.
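Fig. 7 shows a block circuit diagram of an inventive decoder for decoding the coded audio data output at output 52 of Fig. 5. The data are first split into psychoacoustically coded audio data on the one hand and auxiliary audio data on the other. The psychoacoustically coded audio data are fed into a normal bitstream decoder 70, while the auxiliary audio data, if they were entropy coded in the encoder, are entropy decoded by entropy decoder 72. At the output of bitstream decoder 70 of Fig. 7 there are quantized spectral values, which can in principle be fed into an inverse quantizer 74 of the same structure as the inverse quantizer used in Fig. 6. If a precision different from the processor precision is required, the decoder also provides a rounding means 76, which implements the same algorithm, i.e., the same rounding function mapping a real number to an integer, as means 58 of Fig. 6. In a decoder-side combiner 78, the rounded, inversely quantized spectral values are combined, preferably by addition, spectral value by spectral value with the entropy-decoded auxiliary audio data, so that the decoder has inversely quantized spectral values at the output of means 74 on the one hand and integer spectral values at the output of combiner 78 on the other.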
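The spectral values at the output of means 74 can then be transformed into the time domain by means 80, which performs an inverse modified discrete cosine transform, to obtain a lossy, psychoacoustically coded and re-decoded audio signal. The output signal of combiner 78 can likewise be transformed into its time representation by means 82, which performs an inverse integer MDCT (IntMDCT), to produce a lossless coded/decoded audio signal or, if a coarser rounding is used, a nearly lossless coded and re-decoded audio signal.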
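A particularly preferred implementation of the entropy coder 60b of Fig. 6 is described next. In typical modern MPEG coders, several code tables are selected according to the average statistics of the quantized spectral values. The same code tables, or codebooks, are preferably used to entropy code the difference block at the output of combiner 58. Since the magnitude of the difference block, i.e., of the residual IntMDCT spectrum, depends on the precision of the quantization, the code table for entropy coder 60b can be selected without additional side information.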
In MPEG-2 AAC, the spectral coefficients, i.e., the quantized spectral values, are grouped into scale factor bands within the quantization block, the spectral values being weighted with a gain factor derived from the scale factor associated with the respective band. Since this known coder concept uses a non-uniform quantizer to quantize the weighted spectral values, the magnitude of the residual values, i.e., of the spectral values at the output of combiner 58, depends not only on the scale factors but also on the quantized values themselves. Because both the scale factors and the quantized spectral values are contained in the bitstream generated by means 60a of Fig. 6, i.e., in the psychoacoustically coded audio data, the codebook for the difference spectral values is preferably selected in the decoder, in line with their expected magnitude, on the basis of the scale factors and quantized values transmitted in the bitstream. Since no side information has to be transmitted for entropy coding the difference spectral values at the output of combiner 58, the entropy coding yields a pure data-rate reduction, without spending any signaling bits in the data stream as side information for entropy coder 60b.
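How the decoder could derive the codebook from data it already has is sketched below. Everything here is hypothetical: the AAC-style 4/3-power inverse quantizer with gain 2^(scale_factor/4) (constant offsets omitted), the bound on the residual, and the codebook table, whose value ranges are only loosely modeled on AAC's. The point illustrated is that the same deterministic function runs in encoder and decoder, so no codebook index needs to be transmitted.

```python
import math

# Hypothetical (codebook number, largest absolute value it can code) pairs.
HYPOTHETICAL_CODEBOOKS = [(1, 1), (3, 2), (5, 4), (7, 7), (9, 12)]
ESCAPE_CODEBOOK = 11  # assumed to cover any larger value via an escape mechanism


def residual_bound(q: int, scale_factor: int) -> int:
    """Largest |difference value| expected for a coefficient quantized to q.

    Assumes reconstruction x = sign(q) * |q|**(4/3) * 2**(scale_factor / 4) and that the
    quantizer never moves a value past a neighbouring reconstruction level."""
    gain = 2.0 ** (scale_factor / 4.0)
    a = abs(q)
    spacing = gain * ((a + 1) ** (4.0 / 3.0) - a ** (4.0 / 3.0))  # gap to the larger neighbour
    return math.floor(spacing + 0.5)  # integer residual after rounding the reconstruction


def select_codebook(q_band: list[int], scale_factor: int) -> int:
    """Choice based only on base-layer data, so encoder and decoder always agree."""
    need = max(residual_bound(q, scale_factor) for q in q_band)
    for book, largest in HYPOTHETICAL_CODEBOOKS:
        if need <= largest:
            return book
    return ESCAPE_CODEBOOK
```

Because the level spacing of a 4/3-power quantizer grows with |q|, bands with large quantized values automatically get codebooks with wider ranges.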
In an audio coder conforming to the MPEG-2 AAC standard, window switching is used to avoid pre-echoes in transient regions of the audio signal. This technique is based on the possibility of choosing the window shape separately in each half of the MDCT window, which allows the block size to vary between successive blocks. Likewise, the integer transform algorithm in the form of the IntMDCT, explained with reference to Figs. 1 to 3, can also use different window shapes in the windowing and time-domain-aliasing part of the MDCT decomposition. The same window decisions are therefore preferably used for the integer transform algorithm and for the transform algorithm that generates the quantization block.
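The IntMDCT of Figs. 1 to 3 realizes its Givens rotations with lifting steps, each followed by rounding, which keeps integer inputs integer while remaining exactly invertible. Below is a minimal sketch of one such integer rotation, assuming the standard three-step lifting factorization with t = (cos α − 1)/sin α and rounding to nearest with ties up; the same construction applies to the π/4 rotation used for mid/side coding. The function names are hypothetical.

```python
import math


def _r(x):
    return math.floor(x + 0.5)  # rounding function shared by forward and inverse


def int_rotate(x, y, angle):
    """Integer approximation of a Givens rotation by 'angle' using three lifting steps."""
    c, s = math.cos(angle), math.sin(angle)
    t = (c - 1.0) / s  # requires sin(angle) != 0
    x = x + _r(t * y)
    y = y + _r(s * x)
    x = x + _r(t * y)
    return x, y  # approximately (c*x - s*y, s*x + c*y)


def int_rotate_inverse(x, y, angle):
    """Undo int_rotate exactly: same lifting steps in reverse order, subtracting."""
    c, s = math.cos(angle), math.sin(angle)
    t = (c - 1.0) / s
    x = x - _r(t * y)
    y = y - _r(s * x)
    x = x - _r(t * y)
    return x, y


print(int_rotate(1000, 0, math.pi / 4))  # (707, 707), exact rotation gives ~(707.1, 707.1)
```

Each lifting step modifies only one variable using a rounded function of the other, so the decoder can recompute exactly the same rounded term and subtract it.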
An MPEG-2 AAC conforming encoder also contains several other coding tools; only TNS (temporal noise shaping) and mid/side (M/S) stereo coding are discussed here. In TNS coding, as in M/S coding, the spectral values are modified before quantization, which increases the difference between the IntMDCT values, i.e. the integer blocks, and the quantized MDCT values. According to the invention, the integer transform algorithm is therefore adapted so that TNS coding and mid/side coding are also applied to the integer spectral values. The TNS technique is based on an adaptive forward prediction of the MDCT values over frequency. The same prediction filter, computed in a signal-adaptive manner by a normal TNS module, is preferably also used to predict the integer spectral values; any non-integer values that arise are rounded down to integers again. This rounding preferably takes place after each prediction step. In the decoder, the original spectrum can be reconstructed by using the inverse filter and the same rounding function. Likewise, M/S coding can be applied to the IntMDCT spectral values on the basis of the lifting scheme, using a rounded Givens rotation by the angle π/4, so that the original IntMDCT values can be reconstructed in the decoder.
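The integer M/S coding mentioned above can be sketched as a rounded Givens rotation by π/4. This minimal Python version assumes the standard three-step lifting factorization of a rotation and uses Python's built-in `round()` (round half to even) as the rounding function. The text only states that a rounded π/4 Givens rotation based on lifting is used, so the factorization, the rounding rule and the names `ms_encode`/`ms_decode` are illustrative assumptions.

```python
import math


def lifting_rotation(x, y, alpha):
    """Rounded Givens rotation by alpha from three lifting steps (integers in, integers out)."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)  # entry of the two outer lifting matrices
    b = math.sin(alpha)                            # entry of the middle lifting matrix
    x = x + round(a * y)
    y = y + round(b * x)
    x = x + round(a * y)
    return x, y


def inverse_lifting_rotation(x, y, alpha):
    """Undo lifting_rotation exactly: same steps in reverse order, subtracting instead of adding."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    b = math.sin(alpha)
    x = x - round(a * y)
    y = y - round(b * x)
    x = x - round(a * y)
    return x, y


def ms_encode(left, right):
    # Rotation by pi/4: side ~ (L - R) / sqrt(2), mid ~ (L + R) / sqrt(2), both integers.
    return lifting_rotation(left, right, math.pi / 4)


def ms_decode(side, mid):
    return inverse_lifting_rotation(side, mid, math.pi / 4)
```

For example, `ms_encode(1000, 600)` gives integers close to the exact values 282.84 and 1131.37, and `ms_decode` recovers `(1000, 600)` exactly because each rounded step is undone with the identical rounded term.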
It should be noted that, in the preferred embodiment with the IntMDCT as the integer transform algorithm, the inventive concept can be applied to all MDCT-based perceptual audio coders. Examples are coders according to MPEG-4 AAC Scalable, MPEG-4 AAC Low Delay, MPEG-4 BSAC, MPEG-4 TwinVQ, Dolby AC-3 and so on.
It should be noted in particular that the inventive concept is backward compatible. The perceptual encoder or decoder is not changed, only extended. The side information for the lossless component can be transmitted in the perceptually coded bitstream in a backward-compatible way, for example in the "ancillary data" field of MPEG-2 AAC. The extension of the existing perceptual decoder, shown in dashed lines in Fig. 7, can evaluate this ancillary data and, together with the quantized MDCT spectrum obtained from the perceptual decoder, reconstruct the IntMDCT spectrum losslessly.
Supplemented by lossless or nearly lossless coding, the inventive psychoacoustic coding concept is particularly suitable for generating, transmitting and decoding scalable data streams. Scalable data streams are known to contain several different extension (scaling) layers, of which at least the lowest extension layer can be transmitted and decoded independently of the higher ones. In scalable processing, further extension or enhancement layers are added to the first extension layer or base layer. A complete encoder can generate a scalable data stream with a first extension layer and, in principle, any number of further extension layers. One advantage of the scalability concept is that, if a broadband transmission channel is available, the scalable data stream generated by the encoder can be transmitted completely, i.e. with all extension layers. If only a narrowband transmission channel is available, the encoded signal can still be sent over it, but only in the form of the first extension layer or a certain number of further extension layers, fewer than the total number generated by the encoder. An encoder connected to and adapted to the channel can, of course, generate the base or first extension layer and only as many further extension layers as the channel allows.
On the decoder side, the scalability concept has the further advantage of backward compatibility. A decoder that can only process the first extension layer simply ignores the second and any further extension layers in the data stream and still produces a useful output signal. A more modern decoder that can process several extension layers of the scalable data stream, on the other hand, can process the same data stream as the base decoder.
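In the present invention, the basic scalability consists in writing the quantized blocks, i.e. the output of bitstream encoder 60a, into a first extension layer 81 of Fig. 8; in terms of Fig. 6, this layer contains the psychoacoustically encoded data, e.g. of one frame. The difference spectral values generated by combiner 58, preferably entropy coded, are written into a second extension layer, designated 82 for this simple scalability in Fig. 8a. For each frame, this second layer thus contains the additional audio data.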
If the transmission channel from the encoder to the decoder is a broadband channel, both extension layers 81 and 82 can be sent to the decoder. If it is a narrowband channel, however, only the first extension layer "fits"; the second extension layer can then simply be stripped from the data stream before transmission, so that the decoder processes only the first extension layer.
On the decoder side, a "base decoder" that can only process psychoacoustically encoded data simply ignores the second extension layer when it receives it over a broadband channel. A complete decoder containing both a psychoacoustic decoding algorithm and an integer decoding algorithm, however, can decode using both the first and the second extension layer, producing a losslessly encoded and decoded output signal.
A preferred embodiment of the invention is shown schematically in Fig. 8a. The psychoacoustically encoded data for the frame is again placed in the first extension layer, while the second extension layer is subdivided more finely, so that several extension layers emerge from it, e.g. a (smaller) second extension layer, a third extension layer, a fourth extension layer and so on.
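The difference spectral values output by adder 58 are particularly suitable for further scaling, as illustrated in Fig. 9. Fig. 9 schematically shows binary-coded spectral values; each row 90 represents one binary-coded difference spectral value. The difference spectral values are sorted by frequency, as indicated by arrow 91, so that a difference spectral value 92 has a higher frequency than difference spectral value 90. The first column of the table in Fig. 9 holds the most significant bit (MSB) of each difference spectral value, the second column the bit with significance MSB-1 and the third column the bit with significance MSB-2. The third-to-last column holds the bit with significance LSB+2, the second-to-last column the bit with significance LSB+1, and the last column the bit with significance LSB, i.e. the least significant bit of a difference spectral value.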
In a preferred embodiment of the invention, for example the 16 most significant bits of the difference spectral values are placed in the second extension layer for precision scaling, and can be entropy coded by entropy encoder 60b if desired. A decoder using the second extension layer thus obtains the difference spectral values with 16-bit precision, so that the second extension layer together with the first yields a losslessly decoded audio signal of CD quality; CD-quality audio is known to consist of 16-bit samples.
If, on the other hand, a studio-quality audio signal is supplied to the encoder, i.e. an audio signal with 24 bits per sample, the encoder can additionally generate a third extension layer containing the last 8 bits of the difference spectral values, entropy coded as required (means 60 of Fig. 6).
A complete decoder that receives the data stream with the first extension layer, the second extension layer (the 16 most significant bits of the difference spectral values) and the third extension layer (the remaining 8 less significant bits of the difference spectral values) can provide a lossless, studio-quality encoded/decoded audio signal; that is, using all three extension layers, it delivers 24-bit sample words at its output.
It should be noted that studio audio signals have longer sample word lengths than consumer audio signals: on an audio CD the word length is 16 bits, whereas in the studio it is 24 or 20 bits.
With the scaling concept in the IntMDCT domain described above, all three precisions (16, 20 or 24 bits), or any other precision in steps as fine as 1 bit, can be scaled and coded.
Here, an audio signal with 24-bit precision is represented in the integer frequency domain by means of the IntMDCT and combined with the quantized output signal of the perceptual, MDCT-based audio coder.
The integer difference values for the lossless representation are then not coded completely in a single extension layer, but are first coded with a lower precision; only the residual values needed for an exact representation are sent in a further extension layer. Alternatively, a difference spectral value can be represented completely in the further extension layer, e.g. with 24 bits, so that the lower extension layer is no longer needed to decode it. This leads to a larger bitstream, but simplifies the decoder when the bandwidth of the transmission channel is not an issue, because the extension layers no longer have to be combined in the decoder; a single extension layer is always sufficient for decoding.
Scalability between 24 and 16 bits can be achieved, for example, by initially not transmitting the lower 8 LSBs shown in Fig. 9.
To transform values transmitted with lower precision back to the time domain, the transmitted values are preferably first scaled back to the original range, e.g. 24 bits, for example by multiplying them by 2^8 (= 256). An inverse IntMDCT is then applied to the correspondingly scaled values.
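The 24/16-bit precision scaling can be sketched in a few lines of Python. The sketch assumes that the difference spectral values fit into 24-bit signed integers and splits them with an arithmetic right shift, so that the 8-bit LSB part is always a non-negative remainder; the layer format and sign handling are assumptions, not taken from the text.

```python
def split_layers(values, low_bits=8):
    """Split signed 24-bit integers into a 16-bit MSB layer and an 8-bit LSB layer."""
    mask = (1 << low_bits) - 1
    msb_layer = [v >> low_bits for v in values]  # arithmetic shift: floor(v / 2**low_bits)
    lsb_layer = [v & mask for v in values]       # remainder, always in [0, 2**low_bits)
    return msb_layer, lsb_layer


def merge_layers(msb_layer, lsb_layer, low_bits=8):
    """Full decoder: second and third extension layers together give the exact values."""
    return [(m << low_bits) + l for m, l in zip(msb_layer, lsb_layer)]


def scale_back(msb_layer, low_bits=8):
    """Decoder with only the 16-bit layer: multiply by 2**low_bits before the inverse IntMDCT."""
    return [m << low_bits for m in msb_layer]
```

With only the MSB layer, `scale_back` reproduces each value to within 2^8 - 1 (always rounding down), which is the 16-bit approximation that is then passed to the inverse IntMDCT.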
In the precision scaling in the frequency domain according to the invention, redundancy in the LSBs is preferably also exploited. If, for example, an audio signal has very little energy in the upper frequency range, this shows up as very small values in the IntMDCT spectrum, e.g. values much smaller than the range that can be represented with 8 bits (-128, ..., 127), and it is reflected in the compressibility of the LSB values of the IntMDCT spectrum. It should also be noted that, for very small difference spectral values, the bits from the MSB down to MSB-n are typically all zero, i.e. the first 1 of the binary-coded difference spectral value appears no earlier than the bit with significance MSB-n-1. In this case, when the difference spectral values in the second extension layer contain only zeros, entropy coding is particularly well suited for further data compression.
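According to a further embodiment of the invention, sampling rate scalability is preferably used for the second extension layer 82 of Fig. 8a. Sampling rate scalability is achieved by placing the difference spectral values up to a first cutoff frequency in the second extension layer, as shown on the right of Fig. 9, and the difference spectral values with frequencies between the first cutoff frequency and the maximum frequency in a further extension layer. Further subdivision is of course possible, forming several extension layers across the whole frequency range.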
In a preferred embodiment of the invention, the second extension layer in Fig. 9 contains the difference spectral values up to 24 kHz, corresponding to a sampling rate of 48 kHz, and the third extension layer contains the difference spectral values from 24 kHz to 48 kHz, corresponding to a sampling rate of 96 kHz.
It should further be noted that not all bits of a difference spectral value have to be coded in the second and third extension layers. In another form of combined scalability, the second extension layer can contain bits MSB to MSB-X of the difference spectral values up to a certain cutoff frequency. The third extension layer can then contain bits MSB to MSB-X of the difference spectral values from that cutoff frequency up to the highest frequency. A fourth extension layer can contain the remaining bits of the difference spectral values up to the cutoff frequency, and a last extension layer the remaining bits of the higher-frequency difference spectral values. This concept divides the table of Fig. 9 into four quadrants, each quadrant representing one extension layer.
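The four-quadrant layering can be sketched in Python as a split of a frequency-ordered list of integer difference spectral values by bin index and by bit position. The cutoff given as a bin index, the number of low bits and the layer names `layer2` to `layer5` are hypothetical choices; a decoder that receives only some layers treats the missing ones as zero, as in the 48/96 kHz example.

```python
def quadrant_layers(diff, cutoff_index, low_bits):
    """Split frequency-ordered difference values into four extension layers (Fig. 9 quadrants)."""
    mask = (1 << low_bits) - 1
    lower, upper = diff[:cutoff_index], diff[cutoff_index:]
    return {
        "layer2": [v >> low_bits for v in lower],  # MSB ... MSB-X, below the cutoff
        "layer3": [v >> low_bits for v in upper],  # MSB ... MSB-X, above the cutoff
        "layer4": [v & mask for v in lower],       # remaining bits, below the cutoff
        "layer5": [v & mask for v in upper],       # remaining bits, above the cutoff
    }


def reconstruct(layers, n_lower, n_upper, low_bits):
    """Rebuild the spectrum from whichever layers arrived; missing layers count as zeros."""
    def get(name, n):
        return layers.get(name, [0] * n)

    lower = [(m << low_bits) + l for m, l in zip(get("layer2", n_lower), get("layer4", n_lower))]
    upper = [(m << low_bits) + l for m, l in zip(get("layer3", n_upper), get("layer5", n_upper))]
    return lower + upper
```

Dropping `layer3` and `layer5` leaves the lower band exact and the upper band zero, which corresponds to decoding at the lower sampling rate.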
As an example of frequency scalability, a preferred embodiment of the invention provides scalability between sampling rates of 48 kHz and 96 kHz. Initially, only the lower half of the IntMDCT spectrum of the signal sampled at 96 kHz is coded in the lossless extension layer and transmitted. If the upper half is not transmitted in addition, the decoder assumes it to be zero. The inverse IntMDCT (of the same length as in the encoder) then yields a 96 kHz signal that contains no energy in the upper frequency range and can therefore be subsampled to 48 kHz without loss of quality.
With regard to the size of the extension layers, the difference spectral values are preferably scaled along the fixed quadrant boundaries of Fig. 9 described above, because each extension layer then actually needs to contain only, e.g., 16 bits or 8 bits, or only the spectral values up to or above the cutoff frequency.
An alternative scaling "softens" the quadrant boundaries of Fig. 9 to some extent. For frequency scalability, this means that no so-called "brick-wall low-pass" is applied, i.e. the difference spectral values are not left unchanged below the cutoff frequency and set to zero above it. Instead, the difference spectral values can be filtered with an arbitrary low-pass filter that already attenuates the spectral values somewhat below the cutoff frequency, while above the cutoff frequency the difference spectral values still carry some, decreasing, energy. The extension layer generated in this way therefore also contains spectral values above the cutoff frequency; since these are relatively small, they can be entropy coded efficiently. In this case, the highest extension layer contains the difference between the complete difference spectral values and the spectral values contained in the second extension layer.
The precision scaling can be softened in the same way. The second extension layer can, for example, also contain spectral values with more than 16 bits, the difference still being carried in the next extension layer. In general, the second extension layer contains the difference spectral values with reduced precision, while the next extension layer transmits the remainder, i.e. the difference between the complete spectral values and the spectral values contained in the second extension layer. In this way, a variable precision reduction is achieved.
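The softened precision scaling can be illustrated with a reduced-precision layer plus a signed residual layer. In this Python sketch the reduced-precision layer rounds each value to the nearest multiple of 2^shift (an assumed rule; the text leaves the reduction open), so the residual is centred around zero instead of being a non-negative LSB remainder.

```python
def reduced_precision_layer(values, shift):
    """Second layer (sketch): values rounded to the nearest multiple of 2**shift, divided by 2**shift."""
    half = (1 << (shift - 1)) if shift > 0 else 0
    return [(v + half) >> shift for v in values]   # floor((v + half) / 2**shift)


def residual_layer(values, coarse, shift):
    """Next layer: complete value minus the value represented by the second layer."""
    return [v - (c << shift) for v, c in zip(values, coarse)]


def combine(coarse, residual, shift):
    """Decoder with both layers: exact reconstruction."""
    return [(c << shift) + e for c, e in zip(coarse, residual)]
```

With `shift = 8` every residual lies in [-128, 127], so it can be entropy coded as a small signed value.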
The inventive encoding or decoding method can be stored on a digital storage medium, such as a floppy disk, with electronically readable control signals that can cooperate with a programmable computer system so that the encoding and/or decoding method is performed. In other words, the invention also consists in a computer program product with program code stored on a machine-readable carrier for performing the encoding and/or decoding method when the program product runs on a computer. The inventive method can thus be realized as a computer program with program code for performing the method when the program runs on a computer.
As an example of an integer transform algorithm, the IntMDCT described in "Audio Coding Based on Integer Transforms" (111th AES Convention, New York, 2001) is presented below. The IntMDCT is particularly attractive because it retains the favorable properties of the MDCT, such as a good spectral representation of the audio signal, critical sampling and block overlap. Since the IntMDCT is a good approximation of the MDCT, only one transform algorithm needs to be used in the encoder of Fig. 5, as indicated by arrow 62 in Fig. 5. The important properties of this particular integer transform algorithm are explained with reference to Figs. 1 to 4.
Fig. 1 shows a preferred inventive device for processing time-discrete samples representing an audio signal, in order to obtain the integer values on which the IntMDCT integer transform algorithm operates. The device of Fig. 1 windows the time-discrete samples and, optionally, converts them into a spectral representation. The time-discrete samples fed to input 10 are windowed with a window w whose length corresponds to 2N time-discrete samples, yielding at output 12 integer windowed samples that are suitable for conversion into a spectral representation by transform means, in particular by means 14 for performing an integer DCT. The integer DCT produces N output values from N input values, in contrast to the MDCT function 408 of Fig. 4a, which, according to the MDCT equation, produces only N spectral values from 2N windowed samples.
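To window the time-discrete samples, means 16 first selects two time-discrete samples that together form a vector of time-discrete samples. One of the samples selected by means 16 lies in the first quarter of the window and the other in the second quarter, as explained in more detail with reference to Fig. 3. A rotation with a 2×2 matrix is applied to the vector generated by means 16; this operation is not performed directly, but by means of several so-called lifting matrices.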
A lifting matrix has the property that it contains only one element that depends on the window w and is not equal to 0 or 1.
The factorization of wavelet transforms into lifting steps is described in "Factoring Wavelet Transforms Into Lifting Steps" (Ingrid Daubechies and Wim Sweldens, preprint, Bell Laboratories, Lucent Technologies, 1996). In general, a lifting scheme is a simple relationship between perfect reconstruction filter pairs that share the same low-pass or high-pass filter. Each pair of complementary filters can be factored into lifting steps, and this applies in particular to Givens rotations. Consider the case in which the polyphase matrix is a Givens rotation. The following equation then holds:
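For reference, the standard factorization of a Givens rotation by the angle α into three lifting matrices (valid whenever sin α ≠ 0) is shown below. It has exactly the structure described in the next paragraph; the sign convention of the rotation and the label (1), taken from the later references to equation (1), are assumptions rather than a verbatim copy of the patent's formula.

$$
\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
=
\begin{pmatrix} 1 & \dfrac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix}
\begin{pmatrix} 1 & \dfrac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix}
\qquad (1)
$$

Multiplying out confirms it: with $a = (\cos\alpha - 1)/\sin\alpha$ we have $a\sin\alpha = \cos\alpha - 1$, the product of the two right-hand factors is $\begin{pmatrix} 1 & a \\ \sin\alpha & \cos\alpha \end{pmatrix}$, and multiplying by the left factor gives $1 + a\sin\alpha = \cos\alpha$ and $a(1 + \cos\alpha) = -\sin\alpha$ in the first row.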
The three lifting matrices on the right-hand side of the equation each have 1s as their main diagonal elements. In each lifting matrix, one off-diagonal element is 0, and the other off-diagonal element depends on the rotation angle α.
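The vector is now multiplied by the third lifting matrix, i.e. the rightmost lifting matrix in the above equation, to obtain a first result vector; this is represented by means 18 in Fig. 1. As indicated by means 20 in Fig. 1, the first result vector is rounded with an arbitrary rounding function that maps real numbers to integers, giving a rounded first result vector at the output of means 20. The rounded first result vector is fed to means 22, which multiplies it by the middle, i.e. second, lifting matrix to obtain a second result vector; means 24 rounds this to give a rounded second result vector. The rounded second result vector is fed to means 26 and multiplied by the leftmost, i.e. first, lifting matrix of the above equation to obtain a third result vector, which is finally rounded by means 28. The result at output 12 is the integer windowed samples; if their spectral representation is desired, they are processed by means 14 to obtain integer spectral values at spectral output 30.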
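The chain of means 18 to 28 can be sketched in Python as three lifting multiplications, each immediately followed by rounding. The sketch assumes the lifting factorization with entries (cos α − 1)/sin α and sin α and uses Python's built-in `round()` as the "arbitrary rounding function"; it is an illustration of the principle, not the patent's implementation.

```python
import math


def integer_givens(x, y, alpha):
    """Means 18 to 28 (sketch): three lifting multiplications, each followed by rounding."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x + round(a * y)  # rightmost lifting matrix (means 18), rounding (means 20)
    y = y + round(s * x)  # middle lifting matrix (means 22), rounding (means 24)
    x = x + round(a * y)  # leftmost lifting matrix (means 26), rounding (means 28)
    return x, y


def integer_givens_inverse(x, y, alpha):
    """Decoder side: the same rounded terms, subtracted in reverse order."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x - round(a * y)
    y = y - round(s * x)
    x = x - round(a * y)
    return x, y
```

The outputs stay within about two units of the exact rotation `(x cos α − y sin α, x sin α + y cos α)` for |α| ≤ π/2, while the inverse reproduces the integer input exactly.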
Means 14 is preferably implemented as an integer DCT.
The discrete cosine transform of type IV (DCT-IV) with length N is given by the following equation:
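In its commonly used orthonormal form, the DCT-IV of length N reads as follows (the normalization factor $\sqrt{2/N}$ is an assumption chosen so that the matrix is orthonormal, as stated in the next paragraph):

$$
X(m) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} x(k) \cos\left(\frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(m + \frac{1}{2}\right)\right), \qquad m = 0, \dots, N-1
$$

A direct floating-point Python evaluation (O(N²), not the integer DCT of means 14) makes two properties easy to check: the matrix is symmetric and orthonormal, so the transform is its own inverse and preserves energy.

```python
import math


def dct_iv(x):
    """Orthonormal DCT-IV by direct summation (floating point, O(N^2))."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[k] * math.cos(math.pi / n * (k + 0.5) * (m + 0.5)) for k in range(n))
        for m in range(n)
    ]
```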
The coefficients of the DCT-IV form an orthonormal N×N matrix. As described in "Multirate Systems and Filter Banks" (P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, 1993), every orthogonal N×N matrix can be decomposed into N(N-1)/2 Givens rotations. Note that other decompositions are also possible.
For a classification of the various DCT algorithms, see H. S. Malvar, "Signal Processing With Lapped Transforms", Artech House, 1992. In general, DCT algorithms differ in the type of their basis functions. The DCT-IV preferred here has non-symmetric basis functions, i.e. a quarter cosine wave, a 3/4 cosine wave, a 5/4 cosine wave, a 7/4 cosine wave and so on, whereas, for example, the type-II discrete cosine transform (DCT-II) has axially symmetric and point-symmetric basis functions: the 0th basis function is a DC component, the first is a half cosine wave, the second a whole cosine wave, and so on. Because the DCT-II gives special treatment to the DC component, it is used in video coding rather than in audio coding, since, unlike in video coding, the DC component is not relevant in audio coding.
The following explains how the rotation angle α of the Givens rotation depends on the window function.
An MDCT with a window length of 2N can be reduced to a type-IV discrete cosine transform of length N. This is achieved by performing the TDAC operation explicitly in the time domain and then applying the DCT-IV. Because of the 50% overlap, the left half of the window for block t overlaps the right half of the preceding block t-1. The overlapping parts of the two consecutive blocks t-1 and t are preprocessed in the time domain, i.e. before the transform, between input 10 and output 12 of Fig. 1, as follows:
The values marked with a tilde are the values at output 12 of Fig. 1, while the x values without a tilde in the above equation are the values at input 10 or after the selection by means 16. The index k runs from 0 to N/2-1, and w denotes the window function.
The TDAC condition for the window function w gives the following relationship:
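A form of this relationship that is consistent with equation (5) is the Princen-Bradley-type condition below: the Givens rotation needs cos α_k = w(N/2+k) and sin α_k = w(N/2-1-k), and cos²α_k + sin²α_k = 1. It is stated here for a window that is symmetric about its centre, which is an assumption; the patent's own notation may differ.

$$
w^2\left(\tfrac{N}{2}-1-k\right) + w^2\left(\tfrac{N}{2}+k\right) = 1, \qquad k = 0, \dots, \tfrac{N}{2}-1
$$

For example, the sine window $w(n) = \sin\left(\pi (n + \tfrac{1}{2}) / (2N)\right)$ satisfies it: with $\theta = \pi(k + \tfrac{1}{2})/(2N)$ the two terms are $\sin^2(\pi/4 - \theta)$ and $\sin^2(\pi/4 + \theta) = \cos^2(\pi/4 - \theta)$.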
For suitable angles α_k, k = 0, 1, ..., N/2-1, this preprocessing in the time domain can be written as a Givens rotation, as explained above.
The angle α of the Givens rotation depends on the window function w as follows:
$$\alpha_k = \arctan\left[\frac{w(N/2-1-k)}{w(N/2+k)}\right] \qquad (5)$$
Note that any window function w can be used, as long as it satisfies the TDAC condition.
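A cascaded encoder and decoder is described below with reference to Fig. 2. The time-discrete samples x(0) to x(2N-1), which are "windowed" together by one window, are first selected by means 16 of Fig. 1 such that sample x(0) and sample x(N-1), i.e. one sample from the first quarter of the window and one from the second quarter, form the vector at the output of means 16. The crossing arrows represent the lifting multiplications and subsequent roundings of means 18, 20 and 22, 24 and 26, 28, which yield the integer windowed samples at the input of the DCT-IV blocks.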
When the first vector has been processed as described above, a second vector is selected from samples x(N/2-1) and x(N/2), i.e. again one sample from the first quarter and one from the second quarter of the window, and is processed by the algorithm described for Fig. 1. All other sample pairs from the first and second quarters of the window are processed in the same way, and the third and fourth quarters of the first window are treated likewise. As shown in Fig. 2, N windowed integer samples are then available at output 12 and are fed to the DCT-IV transform. In particular, the windowed integer samples of the second and third quarters are fed to one DCT. The windowed integer samples of the first quarter of the window are processed in the preceding DCT-IV together with the windowed integer samples of the fourth quarter of the preceding window. Similarly, the windowed integer samples of the fourth quarter in Fig. 2 are fed to a DCT-IV together with the windowed integer samples of the first quarter of the following window. The central integer DCT-IV transform 32 shown in Fig. 2 supplies N integer spectral values y(0) to y(N-1). Since both the windowing and the transform deliver integer output values, these integer spectral values can be entropy coded directly, without any quantization.
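The pairing of samples described for Fig. 2 can be sketched in Python for one half of a window (quarters 1 and 2, i.e. samples x(0) to x(N-1)). The sketch assumes a sine window, computes each angle from equation (5), and rotates each pair (x(N/2-1-k), x(N/2+k)) with three rounded lifting steps using Python's `round()`. Which sample of a pair plays the role of the first vector component is an assumption, and the subsequent integer DCT-IV is not shown.

```python
import math


def sine_window(length):
    """Sine window of length 2N; it satisfies w(k)^2 + w(N-1-k)^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def rounded_rotation(x, y, alpha, sign=1):
    """Three rounded lifting steps; sign=-1 undoes them (the x-y-x step order is palindromic)."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    b = math.sin(alpha)
    x += sign * round(a * y)
    y += sign * round(b * x)
    x += sign * round(a * y)
    return x, y


def tdac_rotate(segment, window, sign=1):
    """Rotate all pairs (x(N/2-1-k), x(N/2+k)) of a length-N segment; sign=-1 is the decoder side."""
    n = len(segment)
    out = list(segment)
    for k in range(n // 2):
        i, j = n // 2 - 1 - k, n // 2 + k
        alpha = math.atan2(window[i], window[j])  # equation (5)
        out[j], out[i] = rounded_rotation(segment[j], segment[i], alpha, sign)
    return out
```

Because each pair is rotated independently, the decoder can undo the pairs in any order; what matters is that every pair uses the same angle and the same rounding function as in the encoder.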
The right half of Fig. 2 shows a decoder. It comprises the inverse transform and the "inverse windowing" and operates inversely to the encoder. For the inverse of the DCT-IV, an inverse DCT-IV is used, as shown in Fig. 2. As also shown in Fig. 2, the output values of the decoder DCT-IV 34 are processed inversely together with the corresponding values of the preceding and following transforms, in order to regenerate the time-discrete audio samples x(0) to x(2N-1) from the windowed integer samples at the output of means 34 and of the preceding and following transforms.
On the output side, the operation is performed by an inverse Givens rotation, i.e. blocks 26, 28, then 22, 24, then 18, 20 are traversed in the opposite direction. This is explained in more detail using the second lifting matrix of equation (1). When (in the encoder) the second result vector is formed by multiplying the rounded first result vector by the second lifting matrix (means 22), the result is:
The values x and y on the right-hand side of equation (6) are integers, but this does not apply to the value x sin α. A rounding function r must therefore be introduced, as in the following equation:
This operation performs the function of means 24.
The inverse mapping in the decoder is defined as follows:
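Written out from the description in the text (a reconstruction, not a verbatim copy), the middle lifting step, its rounded integer version and the decoder-side inverse mapping are:

$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} x \\ x\sin\alpha + y \end{pmatrix} \qquad (6)
$$

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y + r(x\sin\alpha) \end{pmatrix} \qquad (7)
$$

$$
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x' \\ y' - r(x'\sin\alpha) \end{pmatrix} \qquad (8)
$$

Since x' = x, the decoder can compute exactly the same rounded term r(x' sin α) as the encoder and subtract it.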
Because of the minus sign preceding the rounding operation, it is clear that the integer approximation of the lifting step can be inverted without introducing any error. Applying this approximation to each of the three lifting steps yields an integer approximation of the Givens rotation. The rounded rotation (in the encoder) can be inverted (in the decoder) without error by traversing the inverse rounded lifting steps in reverse order, i.e. the algorithm of Fig. 1 is executed from bottom to top during decoding.
If the rounding function r is point-symmetric, the inverse rounded rotation is identical to the rounded rotation by the angle -α, i.e.:
In this case, the lifting matrices for the decoder, i.e. for the inverse Givens rotation, follow directly from equation (1) by simply replacing the term "sin α" with "-sin α".
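The point-symmetry condition can be checked numerically with a short Python sketch. The rotation is parameterized directly by c = cos α and s = sin α, so that the rotation by -α is simply the call with -s. Python's built-in `round()` (round half to even) is point-symmetric, since round(-v) == -round(v), whereas `math.floor` is not; both give exactly invertible rotations, but only the point-symmetric one makes the inverse coincide with the rotation by -α.

```python
import math


def rounded_rotation(x, y, c, s, r=round):
    """Rounded Givens rotation with cos(alpha) = c, sin(alpha) = s and rounding function r."""
    a = (c - 1.0) / s
    x = x + r(a * y)
    y = y + r(s * x)
    x = x + r(a * y)
    return x, y


def inverse_rounded_rotation(x, y, c, s, r=round):
    """Decoder-side inverse: the lifting steps in reverse order, subtracting the rounded terms."""
    a = (c - 1.0) / s
    x = x - r(a * y)
    y = y - r(s * x)
    x = x - r(a * y)
    return x, y
```

With `r=round`, `inverse_rounded_rotation(x, y, c, s)` equals `rounded_rotation(x, y, c, -s)` for every integer pair; with `r=math.floor` the two differ, because floor(-v) = -floor(v) - 1 for non-integer v.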
The decomposition of a normal MDCT with overlapping windows 40 to 60 is illustrated once more below with reference to Fig. 3. Windows 40 to 60 each overlap by 50%. For each window, Givens rotations are first performed within the first and second quarters of the window and within the third and fourth quarters of the window, as indicated by arrows 48. The rotated values, i.e. the windowed integer samples, are then fed to an N-to-N DCT, so that the second and third quarters of a window, or the fourth quarter of a window together with the first quarter of the next window, are converted into a spectral representation by a DCT-IV algorithm.
The normal Givens rotation is thus decomposed into lifting matrices that are applied one after the other, with a rounding step after each lifting matrix multiplication. Floating-point numbers are therefore rounded immediately after they arise, so that the result vector contains only integers before each multiplication by a lifting matrix.
The output values are therefore always integers, and integer input values are preferably used as well. This is not a restriction, since, for example, PCM samples as stored on a CD are integer values whose range depends on the bit width, i.e. on whether the time-discrete digital input values are 16-bit or 24-bit values. As explained, the whole process can be inverted by performing the inverse rotations in reverse order. There is thus an integer approximation of the MDCT with perfect reconstruction, i.e. a lossless transform.
The transform shown provides integer output values instead of floating-point values. It provides perfect reconstruction, so that no error is introduced when a forward transform is followed by a backward transform. According to a preferred embodiment of the invention, this transform replaces the modified discrete cosine transform. Other transforms can also be performed in integer form, however, as long as a decomposition into rotations and a decomposition of the rotations into lifting steps is possible.
The integer MDCT retains most of the favorable properties of the MDCT. It has an overlapping structure, which gives better frequency selectivity than non-overlapping block transforms. Because the TDAC function is already taken into account in the windowing before the transform, critical sampling is maintained, so that the total number of spectral values representing an audio signal equals the total number of input samples.
Compared with a conventional MDCT delivering floating-point values, the described preferred integer transform exhibits increased noise only in spectral regions of low signal level; at significant signal levels this added noise does not become noticeable. Moreover, integer processing lends itself to efficient hardware implementation, since only multiplication steps are used, and these can easily be decomposed into shift and add operations, both of which are simple and fast to implement in hardware. A software implementation is, of course, also possible.
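The shift-and-add decomposition can be illustrated with a fixed-point multiplier. This sketch assumes a lifting coefficient quantized to 14 fractional bits (an arbitrary choice, as are the function names). Because the constant is fixed at design time, multiplying by it reduces to adding a shifted copy of the input for each set bit of the constant, followed by a rounding right shift.

```python
def shift_add_multiply(x: int, const: int) -> int:
    """Compute x * const for a non-negative integer constant using only shifts and additions."""
    if const < 0:
        raise ValueError("const must be non-negative")
    acc, bit = 0, 0
    while const:
        if const & 1:          # this bit of the constant is set:
            acc += x << bit    # add x * 2**bit
        const >>= 1
        bit += 1
    return acc


def fixed_point_scale(x: int, coeff: float, frac_bits: int = 14) -> int:
    """Approximate round(x * coeff) with a coefficient quantized to frac_bits fractional bits."""
    q = round(coeff * (1 << frac_bits))  # in hardware this constant is fixed at design time
    prod = shift_add_multiply(x, abs(q))
    if q < 0:
        prod = -prod
    # Adding half an LSB before the arithmetic right shift rounds to nearest (ties upward).
    return (prod + (1 << (frac_bits - 1))) >> frac_bits
```

For example, `fixed_point_scale(1000, 0.70710678)` yields 707, matching round(1000 · sin(π/4)).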
The integer transform provides a good spectral representation of the audio signal while remaining in the integer domain. When applied to speech portions of an audio signal, it yields good energy compaction. An efficient lossless coding scheme can therefore be built from a simple cascade of windowing/transform, as shown in Figure 1. In particular, stacked coding with escape values, as used in MPEG AAC, is advantageous. Preferably, all values are scaled down by a certain power of two until they fit into a desired code table, and the omitted least significant bits are then coded separately. Compared with the alternative of using larger code tables, this approach requires less memory for storing the code tables. A nearly lossless encoder can also be obtained simply by omitting some of the least significant bits.
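A minimal sketch of the "scale down by a power of two, code the LSBs separately" idea follows. It assumes a hypothetical code table covering values in [-table_max, table_max]; the actual escape mechanism of MPEG AAC differs in detail and is not reproduced here. Values are split with an arithmetic (floor) shift, so v = (v >> s) · 2^s + (v & (2^s − 1)) also holds for negative values, and dropping the LSB part gives the nearly lossless variant with an error below 2^s. All function names are illustrative.

```python
def choose_shift(values: list[int], table_max: int) -> int:
    """Smallest shift s such that every (v >> s) lies in [-table_max, table_max]."""
    if table_max < 1:
        raise ValueError("table_max must be at least 1")
    s = 0
    while any(not -table_max <= (v >> s) <= table_max for v in values):
        s += 1
    return s


def split(values: list[int], s: int) -> tuple[list[int], list[int]]:
    """Split each value into a table-coded part (v >> s) and s raw LSBs (v & (2**s - 1))."""
    mask = (1 << s) - 1
    return [v >> s for v in values], [v & mask for v in values]


def merge(coarse: list[int], lsbs: list[int], s: int) -> list[int]:
    """Lossless reconstruction: (v >> s) << s has s zero LSBs, so OR-ing the LSBs back is exact."""
    return [(c << s) | low for c, low in zip(coarse, lsbs)]


def merge_near_lossless(coarse: list[int], s: int) -> list[int]:
    """Nearly lossless variant: LSBs dropped, error per value in [0, 2**s)."""
    return [c << s for c in coarse]
```

With `values = [3, -17, 250, -1, 0, 1023]` and `table_max = 15`, the chosen shift is 6, and `merge` restores the input exactly.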
Especially for speech signals, entropy coding of the integer spectral values enables high coding gains. For transient portions of the signal the coding gain is low, owing to the flat spectrum of transient signals, i.e. because only a small fraction of the spectral values are equal or nearly equal to zero. As described in J. Herre, J. D. Johnston, "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)", 101st AES Convention, Los Angeles, 1996, preprint 4384, this flatness can, however, be exploited by linear prediction in the frequency domain. There are two alternatives: open-loop prediction and closed-loop prediction. The first alternative, the open-loop predictor, is known as TNS. Quantization after prediction causes the resulting quantization noise to adapt to the temporal structure of the audio signal, thereby preventing pre-echoes in psychoacoustic audio coders. For lossless audio coding, the second alternative, the closed-loop predictor, is more suitable, because closed-loop prediction allows exact reconstruction of the input signal. When this technique is applied to the generated spectrum, a rounding step must be performed after each stage of the prediction filter so that the values remain in the integer domain. By using the inverse filter and the same rounding function, the original spectrum can be reproduced exactly.
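The closed-loop idea can be sketched as integer prediction along the frequency axis: each stage predicts a spectral value from its lower-frequency neighbors, rounds the prediction and subtracts it, so the residual stays integer. The decoder runs the inverse filter with the same rounding and undoes the stages in reverse order. The coefficient values in the usage example are hypothetical (a real coder would derive them per frame and signal them), and the round-half-up rule and function names are assumptions.

```python
import math


def rnd(v: float) -> int:
    """Round to nearest integer, ties upward; encoder and decoder must use the same rule."""
    return math.floor(v + 0.5)


def _prediction(values: list[int], k: int, coeffs: list[float]) -> int:
    """Rounded prediction of values[k] from values[k-1], values[k-2], ... (only already-known ones)."""
    return rnd(sum(a * values[k - 1 - i] for i, a in enumerate(coeffs) if k - 1 - i >= 0))


def predict_stage(spec: list[int], coeffs: list[float]) -> list[int]:
    """One prediction stage along frequency; the residual is integer because the prediction is rounded."""
    return [x - _prediction(spec, k, coeffs) for k, x in enumerate(spec)]


def inverse_stage(res: list[int], coeffs: list[float]) -> list[int]:
    """Inverse filter: rebuild values in order, predicting from already reconstructed ones."""
    spec: list[int] = []
    for k, e in enumerate(res):
        spec.append(e + _prediction(spec, k, coeffs))
    return spec


def encode(spec: list[int], stages: list[list[float]]) -> list[int]:
    for coeffs in stages:
        spec = predict_stage(spec, coeffs)
    return spec


def decode(res: list[int], stages: list[list[float]]) -> list[int]:
    for coeffs in reversed(stages):  # undo the stages in reverse order
        res = inverse_stage(res, coeffs)
    return res
```

Since the decoder sees exactly the values the encoder predicted from, it computes identical rounded predictions, so `decode(encode(spec, stages), stages) == spec` for any coefficients.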
To exploit the redundancy between two channels for data reduction, mid/side coding can also be used losslessly if a rounded rotation by an angle of π/4 is used. Compared with computing the sum and difference of the left and right channels of a stereo signal, this rounded rotation has the advantage of preserving energy. The use of this so-called joint stereo coding can be switched on or off for each band, as is also done in the MPEG AAC standard. Other rotation angles may also be considered in order to reduce the redundancy between the two channels more flexibly.
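The lossless mid/side variant can be sketched as a rounded π/4 rotation built from three lifting steps. The output pair approximates ((L − R)/√2, (L + R)/√2), i.e. side and mid, so its energy stays close to L² + R², whereas a plain sum/difference (L + R, L − R) doubles it. The function names and the round-half-up rule are assumptions of this sketch.

```python
import math


def rnd(v: float) -> int:
    """Round to nearest integer, ties upward."""
    return math.floor(v + 0.5)


S = math.sin(math.pi / 4)                    # lifting coefficient sin(pi/4)
C1 = (math.cos(math.pi / 4) - 1.0) / S       # (cos - 1) / sin = 1 - sqrt(2), about -0.4142


def ms_forward(left: int, right: int) -> tuple[int, int]:
    """Rounded pi/4 rotation: returns (side, mid), approximately ((L - R)/sqrt2, (L + R)/sqrt2)."""
    x, y = left, right
    x += rnd(C1 * y)
    y += rnd(S * x)
    x += rnd(C1 * y)
    return x, y


def ms_inverse(side: int, mid: int) -> tuple[int, int]:
    """Exact inverse: subtract the same rounded lifting terms in reverse order."""
    x, y = side, mid
    x -= rnd(C1 * y)
    y -= rnd(S * x)
    x -= rnd(C1 * y)
    return x, y
```

For L = R = 500, `ms_forward` returns (0, 707): energy 499849 versus 500000 at the input, while sum/difference would give (1000, 0) with energy 1000000.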
Claims (31)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE10217297A DE10217297A1 (en) | 2002-04-18 | 2002-04-18 | Device and method for coding a discrete-time audio signal and device and method for decoding coded audio data |
| DE10217297.8 | 2002-04-18 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1625768A (en) | 2005-06-08 |
| CN1258172C (en) | 2006-05-31 |
Family ID: 28798541
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB028289749A (CN1258172C, Expired - Lifetime) | Apparatus and method for encoding and decoding audio signals | 2002-04-18 | 2002-12-02 |
Country Status (8)
| Country | Link |
|---|---|
| EP (1) | EP1495464B1 (en) |
| JP (1) | JP4081447B2 (en) |
| KR (1) | KR100892152B1 (en) |
| CN (1) | CN1258172C (en) |
| AT (1) | ATE305655T1 (en) |
| CA (1) | CA2482427C (en) |
| DE (2) | DE10217297A1 (en) |
| WO (1) | WO2003088212A1 (en) |
Families Citing this family (39)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| MXPA06003508A (en) * | 2003-09-29 | 2007-01-25 | Agency Science Tech & Res | Method for transforming a digital signal from the time domain into the frequency domain and vice versa. |
| ES2305852T3 (en) * | 2003-10-10 | 2008-11-01 | Agency For Science, Technology And Research | PROCEDURE FOR CODING A DIGITAL SIGNAL IN A SCALABLE BINARY FLOW, PROCEDURE FOR DECODING A SCALABLE BINARY FLOW. |
| DE102004007184B3 (en) * | 2004-02-13 | 2005-09-22 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and apparatus for quantizing an information signal |
| DE102004007200B3 (en) * | 2004-02-13 | 2005-08-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device for audio encoding has device for using filter to obtain scaled, filtered audio value, device for quantizing it to obtain block of quantized, scaled, filtered audio values and device for including information in coded signal |
| DE102004059979B4 (en) * | 2004-12-13 | 2007-11-22 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for calculating a signal energy of an information signal |
| JP2009500657A (en) | 2005-06-30 | 2009-01-08 | エルジー エレクトロニクス インコーポレイティド | Apparatus and method for encoding and decoding audio signals |
| EP1920635B1 (en) | 2005-08-30 | 2010-01-13 | LG Electronics Inc. | Apparatus and method for decoding an audio signal |
| JP2009520212A (en) | 2005-10-05 | 2009-05-21 | エルジー エレクトロニクス インコーポレイティド | Signal processing method and apparatus, encoding and decoding method, and apparatus therefor |
| KR100857114B1 (en) | 2005-10-05 | 2008-09-08 | 엘지전자 주식회사 | Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor |
| US7653533B2 (en) | 2005-10-24 | 2010-01-26 | Lg Electronics Inc. | Removing time delays in signal paths |
| EP1852849A1 (en) * | 2006-05-05 | 2007-11-07 | Deutsche Thomson-Brandt Gmbh | Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream |
| EP1883067A1 (en) * | 2006-07-24 | 2008-01-30 | Deutsche Thomson-Brandt Gmbh | Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream |
| EP1903559A1 (en) | 2006-09-20 | 2008-03-26 | Deutsche Thomson-Brandt Gmbh | Method and device for transcoding audio signals |
| DE102006051673A1 (en) * | 2006-11-02 | 2008-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for reworking spectral values and encoders and decoders for audio signals |
| DE102007003187A1 (en) * | 2007-01-22 | 2008-10-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a signal or a signal to be transmitted |
| KR101149448B1 (en) | 2007-02-12 | 2012-05-25 | 삼성전자주식회사 | Audio encoding and decoding apparatus and method thereof |
| EP2015293A1 (en) * | 2007-06-14 | 2009-01-14 | Deutsche Thomson OHG | Method and apparatus for encoding and decoding an audio signal using adaptively switched temporal resolution in the spectral domain |
| DK2186088T3 (en) * | 2007-08-27 | 2018-01-15 | ERICSSON TELEFON AB L M (publ) | Low complexity spectral analysis / synthesis using selectable time resolution |
| EP2063417A1 (en) * | 2007-11-23 | 2009-05-27 | Deutsche Thomson OHG | Rounding noise shaping for integer transform based encoding and decoding |
| EP2144230A1 (en) | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme having cascaded switches |
| EP2345030A2 (en) * | 2008-10-08 | 2011-07-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Multi-resolution switched audio encoding/decoding scheme |
| EP2555186A4 (en) | 2010-03-31 | 2014-04-16 | Korea Electronics Telecomm | METHOD AND DEVICE FOR ENCODING, AND METHOD AND DEVICE FOR DECODING |
| US8924222B2 (en) | 2010-07-30 | 2014-12-30 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for coding of harmonic signals |
| US9208792B2 (en) | 2010-08-17 | 2015-12-08 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for noise injection |
| JP5799707B2 (en) * | 2011-09-26 | 2015-10-28 | ソニー株式会社 | Audio encoding apparatus, audio encoding method, audio decoding apparatus, audio decoding method, and program |
| EP2830058A1 (en) | 2013-07-22 | 2015-01-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Frequency-domain audio coding supporting transform length switching |
| CN105632503B (en) * | 2014-10-28 | 2019-09-03 | 南宁富桂精密工业有限公司 | Information concealing method and system |
| US10354668B2 (en) * | 2017-03-22 | 2019-07-16 | Immersion Networks, Inc. | System and method for processing audio data |
| EP3471271A1 (en) * | 2017-10-16 | 2019-04-17 | Acoustical Beauty | Improved convolutions of digital signals using a bit requirement optimization of a target digital signal |
| EP3483879A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Analysis/synthesis windowing function for modulated lapped transformation |
| WO2019091576A1 (en) * | 2017-11-10 | 2019-05-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoders, audio decoders, methods and computer programs adapting an encoding and decoding of least significant bits |
| CN107911122A (en) * | 2017-11-13 | 2018-04-13 | 南京大学 | Lossless compression method for distributed optical fiber vibration sensing data based on decomposition and compression |
| US11281312B2 (en) | 2018-01-08 | 2022-03-22 | Immersion Networks, Inc. | Methods and apparatuses for producing smooth representations of input motion in time and space |
| WO2019199995A1 (en) | 2018-04-11 | 2019-10-17 | Dolby Laboratories Licensing Corporation | Perceptually-based loss functions for audio encoding and decoding based on machine learning |
| DE102019204527B4 (en) * | 2019-03-29 | 2020-11-19 | Technische Universität München | CODING / DECODING DEVICES AND METHODS FOR CODING / DECODING VIBROTACTILE SIGNALS |
| KR102250835B1 (en) * | 2019-08-05 | 2021-05-11 | 국방과학연구소 | A compression device of a lofar or demon gram for detecting a narrowband of a passive sonar |
| US11348594B2 (en) * | 2020-06-11 | 2022-05-31 | Qualcomm Incorporated | Stream conformant bit error resilience |
| CN117217268A (en) * | 2022-05-24 | 2023-12-12 | 英属维京群岛商烁星有限公司 | Autoregressive model-based transducer and associated processor |
| CN118571234A (en) * | 2023-02-28 | 2024-08-30 | 华为技术有限公司 | Audio encoding and decoding method and related device |
2002
- 2002-04-18 DE DE10217297A patent/DE10217297A1/en not_active Withdrawn
- 2002-12-02 EP EP02792858A patent/EP1495464B1/en not_active Expired - Lifetime
- 2002-12-02 AT AT02792858T patent/ATE305655T1/en active
- 2002-12-02 CA CA002482427A patent/CA2482427C/en not_active Expired - Lifetime
- 2002-12-02 JP JP2003585070A patent/JP4081447B2/en not_active Expired - Lifetime
- 2002-12-02 CN CNB028289749A patent/CN1258172C/en not_active Expired - Lifetime
- 2002-12-02 KR KR1020047016744A patent/KR100892152B1/en not_active Expired - Lifetime
- 2002-12-02 DE DE50204426T patent/DE50204426D1/en not_active Expired - Lifetime
- 2002-12-02 WO PCT/EP2002/013623 patent/WO2003088212A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| KR20050007312A (en) | 2005-01-17 |
| CA2482427C (en) | 2010-01-19 |
| JP4081447B2 (en) | 2008-04-23 |
| ATE305655T1 (en) | 2005-10-15 |
| EP1495464A1 (en) | 2005-01-12 |
| CN1625768A (en) | 2005-06-08 |
| DE10217297A1 (en) | 2003-11-06 |
| JP2005527851A (en) | 2005-09-15 |
| WO2003088212A1 (en) | 2003-10-23 |
| EP1495464B1 (en) | 2005-09-28 |
| DE50204426D1 (en) | 2005-11-03 |
| AU2002358578A1 (en) | 2003-10-27 |
| CA2482427A1 (en) | 2003-10-23 |
| KR100892152B1 (en) | 2009-04-10 |
| HK1077391A1 (en) | 2006-02-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1258172C (en) | | Apparatus and method for encoding and decoding audio signals |
| US7275036B2 (en) | | Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data |
| CN1675683A (en) | | Device and method for scalable coding and device and method for scalable decoding |
| CN101432802B (en) | | Method and device for losslessly encoding a source signal using a lossy encoded data stream and a lossless extension data stream |
| CN1146130C (en) | | System and method of masking quantization noise of audio signals |
| JP5400143B2 (en) | | Factoring the overlapping transform into two block transforms |
| EP1403854A2 (en) | | Multi-channel audio encoding and decoding |
| US20100274555A1 (en) | | Audio Coding Apparatus and Method Thereof |
| CN1662958A (en) | | Audio coding system using spectral hole filling |
| CN1813286A (en) | | Efficient coding of digital media spectral data using wide-sense perceptual similarity |
| CN1806239A (en) | | Device and method for conversion into a transformed representation or for inversely converting the transformed representation. |
| JP2009524108A (en) | | Complex transform channel coding with extended-band frequency coding |
| CN1310210C (en) | | Audio coding system using characteristics of a decoded signal to adapt synthesized spectral components |
| CN1677490A (en) | | Intensified audio-frequency coding-decoding device and method |
| CN1677493A (en) | | Intensified audio-frequency coding-decoding device and method |
| Geiger et al. | | IntMDCT - A link between perceptual and lossless audio coding |
| CN1677492A (en) | | Intensified audio-frequency coding-decoding device and method |
| WO2012052802A1 (en) | | An audio encoder/decoder apparatus |
| CN101031961B (en) | | Method and device for processing coded signals |
| WO2009022193A2 (en) | | Devices, methods and computer program products for audio signal coding and decoding |
| CN1890712A (en) | | Audio signal coding |
| US20100280830A1 (en) | | Decoder |
| HK1077391B (en) | | Device and method for coding and decoding audio signal |
| CN1862969A (en) | | Adaptive block length, constant converting audio frequency decoding method |
| US20170206905A1 (en) | | Method, medium and apparatus for encoding and/or decoding signal based on a psychoacoustic model |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1077391; Country of ref document: HK |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CP01 | Change in the name or title of a patent holder | Address after: Munich, Germany; Patentee after: FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWANDTEN FORSCHUNG E.V.; Address before: Munich, Germany; Patentee before: Franhofer transportation Application Research Co.,Ltd. |
| | CP01 | Change in the name or title of a patent holder | |
| | CX01 | Expiry of patent term | Granted publication date: 20060531 |
| | CX01 | Expiry of patent term | |