
CN1258172C - Apparatus and method for encoding and decoding audio signals

Info

Publication number: CN1258172C
Application number: CNB028289749A
Authority: CN
Inventors: Ralf Geiger; Thomas Sporer; Karlheinz Brandenburg; Jürgen Herre; Jürgen Koller; Joachim Deguara
Original assignee: Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung eV
Priority date: 2002-04-18
Filing date: 2002-12-02
Publication date: 2006-05-31
Anticipated expiration: 2022-12-02
Also published as: KR20050007312A; CA2482427C; JP4081447B2; ATE305655T1; EP1495464A1; CN1625768A; DE10217297A1; JP2005527851A; WO2003088212A1; EP1495464B1; DE50204426D1; AU2002358578A1; CA2482427A1; KR100892152B1; HK1077391A1

Abstract

A time-discrete audio signal is processed (52) to provide a quantization block of quantized spectral values. In addition, an integer spectral representation is generated from the time-discrete audio signal using an integer transform algorithm (56). The quantization block, which is produced using a psychoacoustic model (54), is inversely quantized and rounded (58), and the difference between the integer spectral values and the rounded, inversely quantized spectral values is then formed. On decoding, the quantization block alone yields a lossy, psychoacoustically encoded and decoded audio signal, whereas the quantization block together with the combination block yields a lossless or near-lossless encoded and decoded audio signal. Because the difference signal is generated in the frequency domain, the encoder/decoder structure remains simple.

Description

Apparatus and method for encoding and decoding audio signals

Technical field

The present invention relates to audio encoding and decoding, and in particular to scalable encoding/decoding algorithms in which the data stream comprises a psychoacoustic first scaling layer and a second scaling layer that contains auxiliary audio data for lossless decoding.

Background art

Modern audio coding methods such as MPEG Layer 3 (MP3) or MPEG AAC use transforms such as the modified discrete cosine transform (MDCT) to obtain a block-wise frequency representation of an audio signal. Such an audio encoder typically receives a stream of time-discrete audio samples. The stream is windowed to obtain blocks of, for example, 1024 or 2048 windowed audio samples. Various window functions can be used for windowing, for example a sine window.

The windowed time-discrete audio samples are then converted to a spectral representation by a filter bank. In principle, a Fourier transform, a variant of the Fourier transform chosen for specific reasons such as the FFT, or the MDCT mentioned above can be used for this purpose. The block of audio spectral values at the filter bank output can then be processed further as required. In the audio encoders cited above, the audio spectral values are quantized next, with the quantization levels typically chosen so that the quantization noise introduced by quantization lies below the psychoacoustic masking threshold, that is, it is "masked". Quantization is a lossy encoding step. To reduce the data volume further, the quantized spectral values are entropy coded, for example by Huffman coding. A bitstream multiplexer then forms a bitstream that can be stored or transmitted from the entropy-coded quantized spectral values, adding side information such as scale factors.

In the audio decoder, a bitstream demultiplexer splits the bitstream into coded quantized spectral values and side information. The entropy-coded quantized spectral values are first entropy decoded to obtain the quantized spectral values. These are then inversely quantized to obtain decoded spectral values, which contain quantization noise; this noise, however, lies below the psychoacoustic masking threshold and is therefore inaudible. The spectral values are then converted to a time-domain representation by a synthesis filter bank to obtain time-discrete decoded audio samples. The synthesis filter bank must use a transform algorithm that is the inverse of the one used in the encoder. In addition, the windowing must be undone after the frequency-to-time transform.

To obtain good frequency selectivity, modern audio encoders typically use block overlap. This case is shown in Fig. 4a. First, for example, 2048 time-discrete audio samples are taken and windowed by means 402. The window implemented by means 402 has a length of 2N samples and supplies a block of 2N windowed samples at its output. To achieve window overlap, a second block of 2N windowed samples is formed by means 404, which is shown separately from means 402 in Fig. 4a only for clarity. The 2048 samples fed into means 404, however, are not the time-discrete audio samples immediately following the first window; instead they contain the second half of the samples windowed by means 402 plus only 1024 "new" samples. This overlap is shown symbolically by means 406 in Fig. 4a and results in a degree of overlap of 50%. The MDCT algorithm is then applied by means 408 to the 2N windowed samples output by means 402, and by means 410 to the 2N windowed samples output by means 404. Means 408 supplies N spectral values for the first window according to the known MDCT algorithm, and means 410 likewise supplies N spectral values, but for the second window, with 50% overlap between the first and second windows.
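The 50% block overlap described above can be illustrated with a minimal Python sketch. The function name `overlapping_blocks` and the toy signal are assumptions made for illustration; they do not come from the patent.

```python
from typing import List


def overlapping_blocks(samples: List[int], n: int) -> List[List[int]]:
    """Split samples into blocks of 2*n samples with a hop of n (50% overlap)."""
    last_start = len(samples) - 2 * n
    return [samples[s:s + 2 * n] for s in range(0, last_start + 1, n)]


# Toy signal with N = 2, so each block holds 2N = 4 samples.
blocks = overlapping_blocks(list(range(8)), n=2)
# Every block starts with the second half of the previous block
# and adds only N "new" samples.
```

With N = 1024 this gives the 2048-sample blocks of the example, each contributing 1024 new samples.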

In the decoder, as shown in Fig. 4b, the N spectral values of the first window are fed to means 412, which performs an inverse modified discrete cosine transform. Likewise, the N spectral values of the second window are fed to means 414, which also performs an inverse modified discrete cosine transform. Means 412 and means 414 each supply 2N samples, for the first window and the second window respectively.

In means 416, labeled TDAC (time-domain aliasing cancellation) in Fig. 4b, the overlap of the two windows is taken into account. Specifically, a sample y₁ from the second half of the first window, that is, with index N+k, is added to a sample y₂ from the first half of the second window, that is, with index k, so that N decoded time-domain samples result at the output, that is, in the decoder.

Note that the function of means 416, also called the add function, automatically accounts to some extent for the windowing performed in the encoder of Fig. 4a, so that no explicit "inverse windowing" has to take place in the decoder of Fig. 4b.
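The overlap-add step can be checked numerically. The following is a minimal sketch rather than the patent's implementation: it assumes the common textbook MDCT/IMDCT definitions, a sine window applied both before the MDCT and again after the IMDCT, and an IMDCT scale factor of 2/N. All function names are hypothetical.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    # Assumed window: w(k) = sin(pi * (k + 0.5) / (2N)) with length = 2N.
    return [math.sin(math.pi * (k + 0.5) / length) for k in range(length)]


def mdct(block: List[float], w: List[float]) -> List[float]:
    n = len(block) // 2
    return [sum(block[k] * w[k]
                * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                for k in range(2 * n))
            for m in range(n)]


def imdct(spec: List[float], w: List[float]) -> List[float]:
    n = len(spec)
    return [w[k] * (2.0 / n)
            * sum(spec[m] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                  for m in range(n))
            for k in range(2 * n)]


def overlap_add_roundtrip(x: List[float], n: int) -> List[float]:
    """MDCT and IMDCT every 50%-overlapping block, then add the overlaps."""
    w = sine_window(2 * n)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n + 1, n):
        y = imdct(mdct(x[start:start + 2 * n], w), w)
        for k in range(2 * n):
            # Sample N+k of one block meets sample k of the next block here.
            out[start + k] += y[k]
    return out
```

Each block on its own contains time-domain aliasing. Only the samples covered by two overlapping blocks are reconstructed, so the first and last N samples of a finite signal remain aliased in this sketch.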

If the window function implemented by means 402 or 404 is denoted w(k), where the index k is the time index, the following condition must hold: the squared window weight w(k) plus the squared window weight w(N+k) equals 1 for k from 0 to N-1. A sine window, whose weights follow the first half-wave of the sine function, always satisfies this condition, because the square of the sine and the square of the cosine of any angle sum to 1.
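Written out, the condition and the sine-window argument look as follows. The half-sample offset in the window definition is a common convention and is assumed here; the text only states that the weights follow the first half-wave of the sine.

```latex
% Window condition for k = 0, ..., N-1:
w^2(k) + w^2(N+k) = 1
% Assumed sine window of length 2N:
w(k) = \sin\theta_k , \qquad \theta_k = \frac{\pi\,(k + \tfrac{1}{2})}{2N}
% Shifting the index by N adds \pi/2 to the angle:
w(N+k) = \sin\!\left(\theta_k + \tfrac{\pi}{2}\right) = \cos\theta_k
% Hence:
w^2(k) + w^2(N+k) = \sin^2\theta_k + \cos^2\theta_k = 1
```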

A disadvantage of the windowing with subsequent MDCT described with reference to Fig. 4a is that, when a sine window is considered, the windowing multiplies the time-discrete samples by floating-point numbers, because the sine of an angle between 0 and 180 degrees is not an integer unless the angle is 90 degrees. Even when integer time-discrete samples are windowed, floating-point numbers result after windowing.
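A short sketch makes the point concrete. The sample values and the function name `sine_windowed` are made up for illustration, and the window formula is the assumed sine window with a half-sample offset.

```python
import math
from typing import List


def sine_windowed(samples: List[int]) -> List[float]:
    """Multiply a block of 2N integer samples by a sine window."""
    length = len(samples)
    return [s * math.sin(math.pi * (k + 0.5) / length)
            for k, s in enumerate(samples)]


block = [100, -200, 300, -400]   # integer PCM samples (made up)
windowed = sine_windowed(block)
# The window weights are irrational, so the products are no longer integers.
has_fraction = any(not v.is_integer() for v in windowed)
```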

Therefore, even when no psychoacoustic encoder is used, that is, when lossless coding is to be achieved, quantization is necessary at the output of means 408 or 410 so that entropy coding remains reasonably manageable.

When known transforms such as the one described with reference to Fig. 4a are used for lossless audio coding, either a very fine quantization must be used so that the error resulting from rounding the floating-point numbers can be neglected, or the error signal must be coded additionally, for example in the time domain.

A prior-art concept in which the quantization is set so finely that the error resulting from rounding the floating-point numbers can be neglected is disclosed, for example, in German patent DE 197 42 201 C1. There, an audio signal is converted to its spectral representation and quantized to obtain quantized spectral values. The quantized spectral values are then inversely quantized, transformed to the time domain, and compared with the original audio signal. If the error, that is, the error between the original audio signal and the quantized/inversely quantized audio signal, lies above an error threshold, the quantizer is set more finely via feedback and the comparison is repeated. The iteration stops when the error falls below the threshold. Any residual signal that still remains is encoded by a time-domain encoder and written to a bitstream which, in addition to the time-domain-encoded residual signal, contains the encoded spectral values quantized with the quantizer setting in effect when the iteration was terminated. Note that the quantizer need not be controlled by a psychoacoustic model, so the encoded spectral values are typically quantized more precisely than a psychoacoustic model would require.

The publication "A Design of Lossy and Lossless Scalable Audio Coding" (T. Moriya et al., Proc. ICASSP, 2000) describes a scalable encoder that comprises, for example, an MPEG encoder as a first lossy data compression module, which receives a block-wise digital signal as its input and generates a compressed bitstream. A local decoder, which is also present, undoes the encoding and generates an encoded/decoded signal. This signal is compared with the original input signal by subtracting the encoded/decoded signal from the original input signal. The error signal is then fed to a second module, where a lossless bit conversion is applied. The conversion has two steps. The first step converts from two's complement format to sign-magnitude format. The second step converts from a vertical magnitude sequence to a horizontal bit sequence within a processing block. The lossless data conversion is performed to maximize the number of zeros, or the number of consecutive zeros in a sequence, so that the time-domain error signal, which is available as a sequence of digital numbers, can be compressed as well as possible. This principle is based on the bit-sliced arithmetic coding (BSAC) scheme presented in the publication "Multi-Layer Bit Sliced Bit Rate Scalable Audio Coder" (103rd AES Convention, Preprint No. 4520, 1997).
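The two conversion steps can be sketched as follows. This is an illustrative reading of the description above, not code from the cited publication. Python integers are not fixed-width two's complement words, so the first step simply separates sign and magnitude. The function names are hypothetical.

```python
from typing import List, Tuple


def to_sign_magnitude(values: List[int]) -> Tuple[List[int], List[int]]:
    """Step 1: split signed error samples into sign bits and magnitudes."""
    signs = [1 if v < 0 else 0 for v in values]
    magnitudes = [abs(v) for v in values]
    return signs, magnitudes


def bit_planes(magnitudes: List[int], width: int) -> List[List[int]]:
    """Step 2: turn the 'vertical' magnitudes into 'horizontal' bit sequences.

    Plane 0 holds the most significant bit of every value, the last plane
    holds the least significant bit.
    """
    return [[(m >> bit) & 1 for m in magnitudes]
            for bit in range(width - 1, -1, -1)]


signs, magnitudes = to_sign_magnitude([3, -1, 0, -2])
planes = bit_planes(magnitudes, width=3)
```

Small error values in sign-magnitude form have zeros in all upper bit planes, so the horizontal sequences for those planes are long runs of zeros, which is what the subsequent entropy coding exploits.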

A disadvantage of the concepts described above is that the data for the lossless scaling layer, that is, the auxiliary data needed to decode the audio signal losslessly, must be obtained in the time domain. This means that a complete decoding, including a frequency/time transform, is required to obtain the encoded/decoded signal in the time domain, so that the error signal can be calculated as the sample-wise difference between the original audio input signal and the encoded/decoded audio signal, which is lossy because of the psychoacoustic coding. This is particularly disadvantageous because the encoder that generates the audio data stream needs both a complete time/frequency transform, such as a filter bank or an MDCT algorithm, for the forward transform and, solely to generate the error signal, a complete inverse filter bank or a complete synthesis algorithm. The encoder must therefore contain full decoder functionality in addition to its inherent encoder functionality. If the encoder is implemented in software, this demands both memory and processor capacity and makes the encoder implementation more expensive.

Summary of the invention

It is the object of the present invention to provide a less costly concept by which an audio data stream can be generated that can be decoded almost losslessly.

This object is achieved by an apparatus for encoding a time-discrete audio signal according to claim 1, a method for encoding a time-discrete audio signal according to claim 21, an apparatus for decoding encoded audio data according to claim 22, a method for decoding encoded audio data according to claim 31, or a computer program according to claim 32 or 33.

The present invention is based on the finding that the auxiliary audio data that enable lossless decoding of the audio signal can be obtained by providing a block of quantized spectral values in the usual way and then inversely quantizing it to obtain inversely quantized spectral values, which are lossy because of the quantization with a psychoacoustic model. These inversely quantized spectral values are then rounded to obtain a rounding block of rounded, inversely quantized spectral values. As the reference for forming the difference, the invention uses an integer transform algorithm that generates, from a block of integer time-discrete samples, an integer block of spectral values containing only integer spectral values. According to the invention, the spectral values in the rounding block and in the integer block are combined spectral value by spectral value, that is, in the frequency domain, so the encoder itself needs no synthesis algorithm such as an inverse filter bank or an inverse MDCT. Because of the integer transform algorithm and the rounded quantization values, the combination block containing the difference spectral values holds only integer values, which can be entropy coded in any known way. Any entropy coder, such as a Huffman coder or an arithmetic coder, can be used for entropy coding the combination block.
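The frequency-domain difference formation can be sketched in a few lines. Everything concrete here is an assumption for illustration: a uniform quantizer with a fixed step `STEP` stands in for the psychoacoustically controlled quantizer, and the spectral values are made-up numbers rather than real MDCT or IntMDCT output.

```python
from typing import List

STEP = 8.0  # hypothetical step size standing in for the psychoacoustic quantizer


def quantize(spectrum: List[float]) -> List[int]:
    """Lossy step: produces the quantization block."""
    return [round(v / STEP) for v in spectrum]


def dequantize_and_round(q: List[int]) -> List[int]:
    """Inverse quantization followed by rounding: produces the rounding block."""
    return [round(v * STEP) for v in q]


def difference_block(int_spectrum: List[int], q: List[int]) -> List[int]:
    """Integer block minus rounding block, spectral value by spectral value."""
    rounded = dequantize_and_round(q)
    return [i - r for i, r in zip(int_spectrum, rounded)]


mdct_spectrum = [103.7, -41.2, 12.9, 0.4]   # made-up floating-point spectrum
int_spectrum = [104, -41, 13, 0]            # made-up integer-transform spectrum
q = quantize(mdct_spectrum)
diff = difference_block(int_spectrum, q)
```

No inverse transform appears anywhere in this path, which is the structural simplification the paragraph describes. The difference values are integers and can be passed directly to an entropy coder.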

The quantized spectral values of the quantization block can likewise be encoded with any desired coder, such as the well-known tools commonly used in modern audio encoders.

Note that the encoding/decoding concept of the invention is compatible with modern coding tools such as window switching, TNS, or mid/side coding for multichannel audio signals.

In a preferred embodiment of the invention, an MDCT is used to provide the quantization block of spectral values quantized using a psychoacoustic model. In addition, the so-called IntMDCT is preferably used as the integer transform algorithm.

In an alternative embodiment of the invention, the usual MDCT can be omitted and the IntMDCT used as an approximation of the MDCT: the integer spectrum obtained by the integer transform algorithm is fed to the psychoacoustic quantizer to obtain quantized IntMDCT spectral values, which are then inversely quantized and rounded again so that they can be compared with the original integer spectral values. In this case only a single transform is required, namely the IntMDCT, which generates integer spectral values from integer time-discrete samples.

Typically, processors operate on integers, or every floating-point number can be represented as an integer. If integer arithmetic is used in a processor, explicit rounding of the inversely quantized spectral values can be omitted, because rounding to the accuracy of the LSB, that is, the least significant bit, is inherent in the processor's arithmetic anyway. In this case processing is completely lossless, that is, lossless within the accuracy of the processor system used. Alternatively, however, rounding to a coarser accuracy can be applied, so that the difference signal in the combination block is rounded to an accuracy determined by a rounding function. Introducing rounding beyond the inherent rounding of the processor system adds flexibility because it allows the degree of "losslessness" of the encoding to be adjusted, producing an encoder that is almost lossless in the data compression sense.
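A rounding function with selectable accuracy might look like the sketch below. The name `round_to_step` and the idea of expressing the accuracy as an integer grid step are assumptions. The text does not spell out exactly how the coarser rounding enters the difference block, so this only illustrates the rounding function itself.

```python
def round_to_step(value: float, step: int = 1) -> int:
    """Round to the nearest multiple of step; step=1 is ordinary integer rounding."""
    return step * round(value / step)


fine = round_to_step(14.9)        # accuracy of one integer step
coarse = round_to_step(14.9, 4)   # coarser grid: error of up to step / 2
```

Whatever rounding function is chosen, encoder and decoder have to use the same one. Note that Python's built-in `round` resolves exact ties to the nearest even integer.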

The decoder according to the invention is characterized in that both the psychoacoustically encoded audio data and the auxiliary audio data are extracted from the encoded audio data, entropy decoded where applicable, and then processed as follows. First, the quantization block is inversely quantized in the decoder and rounded with the same rounding function that was used in the encoder, so that it can then be added to the entropy-decoded auxiliary audio data. The decoder then holds both a psychoacoustically compressed spectral representation of the audio signal and a lossless representation of the audio signal. The psychoacoustically compressed spectral representation is transformed to the time domain to obtain a lossy encoded/decoded audio signal, whereas the lossless representation is transformed to the time domain with an integer transform algorithm that is the inverse of the one used in the encoder, to obtain a lossless or, as explained above, almost lossless encoded/decoded audio signal.

Brief description of the drawings

The above and other objects and features of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings:

Fig. 1 is a block circuit diagram of a preferred means for processing time-discrete audio samples to obtain integer values from which integer spectral values can be determined;

Fig. 2 is a schematic illustration of the decomposition of an MDCT and an inverse MDCT into Givens rotations and two DCT-IV operations;

Fig. 3 illustrates the decomposition of the MDCT with 50% overlap into rotations and DCT-IV operations;

Fig. 4a is a schematic block circuit diagram of a known encoder with MDCT and 50% overlap;

Fig. 4b is a block circuit diagram of a known decoder for decoding the values generated by the encoder of Fig. 4a;

Fig. 5 is a basic block circuit diagram of a preferred encoder according to the invention;

Fig. 6 is a basic block circuit diagram of an alternative preferred encoder according to the invention;

Fig. 7 is a basic block circuit diagram of a preferred decoder according to the invention;

Fig. 8a is a schematic illustration of a bitstream with a first scaling layer and a second scaling layer;

Fig. 8b is a schematic illustration of a bitstream with a first scaling layer and several further scaling layers;

Fig. 9 is a schematic illustration of binary-coded difference spectral values, showing the possible scalings with respect to the accuracy (bits) of the difference spectral values and/or the frequency (sampling rate) of the difference spectral values.

Detailed description of the embodiments

With reference to Figs. 5 to 7, encoder circuits according to the invention (Figs. 5 and 6) and a preferred decoder circuit according to the invention (Fig. 7) are discussed below. The encoder of Fig. 5 has an input 50, into which a time-discrete audio signal is fed, and an output 52, at which encoded audio data are output. The time-discrete audio signal at input 50 is fed into means 52 for providing a quantization block, which supplies at its output a quantization block of the time-discrete audio signal containing quantized spectral values of the time-discrete audio signal 50 obtained using a psychoacoustic model 54. The encoder further comprises means for generating an integer block using an integer transform algorithm 56, where the integer transform algorithm is effective to generate integer spectral values from integer time-discrete samples.

The encoder according to the invention further comprises means 58 for inversely quantizing the quantization block output by means 52 and, when an accuracy different from the processor accuracy is required, a rounding function. If, as explained, the accuracy of the processor system is to be used, the rounding function is already inherent in the inverse quantization of the quantization block, because a processor with integer arithmetic cannot deliver non-integer values in any case. Means 58 thus supplies a so-called rounding block containing inversely quantized spectral values that are integers, that is, values that have been rounded either inherently or explicitly. Both the rounding block and the integer block are fed to combining means that uses difference formation to provide a difference block of difference spectral values, where the term "difference block" indicates that the difference spectral values are the differences between the integer block and the rounding block.

Both the quantization block output by means 52 and the difference block output by the difference forming means 58 are fed to processing means 60, which, for example, processes the quantization block in the usual way and entropy codes the difference block. Processing means 60 outputs the encoded audio data at output 52; these data contain information on the quantization block as well as information on the difference block.

In a first preferred embodiment, shown in Fig. 6, the time-discrete audio signal is converted to a spectral representation by an MDCT and then quantized. Means 52 for providing the quantization block thus comprises MDCT means 52a and a quantizer 52b.

In addition, the integer block is preferably generated using the IntMDCT 56 as the integer transform algorithm.

In Fig. 6, the processing means 60 of Fig. 5 is shown as bitstream encoding means 60a, which bitstream-encodes the quantization block output by means 52b, and an entropy coder 60b, which entropy codes the difference block. Bitstream encoder 60a outputs the psychoacoustically encoded audio data, while entropy coder 60b outputs the entropy-coded difference block. The two output data blocks of blocks 60a and 60b can be combined in a suitable way into a bitstream that carries the psychoacoustically encoded audio data as the first scaling layer and the auxiliary audio data for lossless decoding as the second scaling layer. This scaled bitstream then corresponds to the encoded audio data at output 52 of the encoder shown in Fig. 5.
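One conceivable way of combining the two layers is sketched below. The framing, a 4-byte big-endian length prefix per layer, and the names `ScalableFrame`, `mux`, `demux`, and `base_layer_only` are purely hypothetical. The patent only says that the layers are combined "in a suitable way" and does not define a bitstream syntax here.

```python
import struct
from dataclasses import dataclass


@dataclass
class ScalableFrame:
    base_layer: bytes         # psychoacoustically encoded audio data (from 60a)
    enhancement_layer: bytes  # entropy-coded difference block (from 60b)


def mux(frame: ScalableFrame) -> bytes:
    """Concatenate both layers, each preceded by its length."""
    out = b""
    for layer in (frame.base_layer, frame.enhancement_layer):
        out += struct.pack(">I", len(layer)) + layer
    return out


def demux(data: bytes) -> ScalableFrame:
    """Recover both layers, as a lossless decoder would."""
    layers = []
    pos = 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        layers.append(data[pos:pos + length])
        pos += length
    return ScalableFrame(layers[0], layers[1])


def base_layer_only(data: bytes) -> bytes:
    """Read just the first layer, as a lossy-only decoder would."""
    (length,) = struct.unpack_from(">I", data, 0)
    return data[4:4 + length]
```

The design point this illustrates is that a decoder interested only in the lossy signal can stop after the first layer and ignore the rest of the frame.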

In an alternative preferred embodiment, the MDCT block 52a of Fig. 6 can be omitted, as indicated by the dashed arrow 62 in Fig. 5. In this case, the integer spectrum supplied by the integer transform means 56 is fed both to the difference forming means 58 and to the quantizer 52b of Fig. 6. The spectral values generated by the integer transform are used here, in a sense, as an approximation of the usual MDCT spectrum. The advantage of this embodiment is that only the IntMDCT algorithm has to be present in the encoder, rather than both the IntMDCT and the MDCT algorithms.

Referring again to Fig. 6, note that the solid blocks and lines represent a conventional audio encoder according to one of the MPEG standards, while the dashed blocks and lines represent the extension of such a conventional MPEG encoder. It can thus be seen that no fundamental change to the conventional MPEG encoder is required: the auxiliary audio data for lossless coding are captured by adding the integer transform, without changing the basic structure of the encoder/decoder.

Fig. 7 shows a basic block circuit diagram of a decoder according to the invention for decoding the encoded audio data output at output 52 of Fig. 5. The encoded audio data are first split into the psychoacoustically encoded audio data on the one hand and the auxiliary audio data on the other. The psychoacoustically encoded audio data are fed to a conventional bitstream decoder 70, while the auxiliary audio data, if they were entropy coded in the encoder, are entropy decoded by an entropy decoder 72. Quantized spectral values are present at the output of the bitstream decoder 70 in Fig. 7 and are fed to an inverse quantizer 74, which can in principle have the same structure as the inverse quantizer in the arrangement of Fig. 6. If an accuracy different from the processor accuracy is to be achieved, the decoder also contains rounding means 76, which implements the same algorithm, or the same rounding function, for mapping a real number to an integer as means 58 of Fig. 6. In a decoder-side combiner 78, the rounded, inversely quantized spectral values are combined spectral value by spectral value, preferably by addition, with the entropy-decoded auxiliary audio data. In the decoder, inversely quantized spectral values are thus available at the output of means 74, and integer spectral values are available at the output of combiner 78.

The spectral values at the output of means 74 can then be transformed to the time domain by means 80, which performs an inverse modified discrete cosine transform, to obtain a lossy, psychoacoustically encoded and decoded audio signal. The output signal of combiner 78 is likewise transformed to its time-domain representation by means 82, which performs an inverse integer MDCT (inverse IntMDCT), to produce a lossless encoded/decoded audio signal or, when coarser rounding has been used, an almost lossless encoded and decoded audio signal.
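The decoder-side combination can be sketched in the same toy setting as a uniform quantizer with a fixed, hypothetical step size `STEP`. The function names and the example values are assumptions, and the inverse transforms of means 80 and 82 are left out.

```python
from typing import List

STEP = 8.0  # hypothetical step size; must match the encoder's quantizer


def dequantize(q: List[int]) -> List[float]:
    """Role of inverse quantizer 74: lossy spectral values."""
    return [v * STEP for v in q]


def round_block(values: List[float]) -> List[int]:
    """Role of rounding means 76: same rounding function as in the encoder."""
    return [round(v) for v in values]


def combine(q: List[int], diff: List[int]) -> List[int]:
    """Role of combiner 78: rounded values plus decoded difference values."""
    return [r + d for r, d in zip(round_block(dequantize(q)), diff)]


q = [13, -5, 2, 0]        # quantized spectral values from the first layer
diff = [0, -1, -3, 0]     # entropy-decoded difference values from the second layer
lossy_spectrum = dequantize(q)      # would go to the inverse MDCT
int_spectrum = combine(q, diff)     # would go to the inverse IntMDCT
```

A decoder that ignores `diff` still obtains `lossy_spectrum`, which is why the first layer alone remains decodable.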

A particularly preferred implementation of entropy encoder 60b in FIG. 6 is considered next. In typical modern MPEG encoders, several code tables are available and are selected according to the average statistics of the quantized spectral values. Preferably, the same code tables or codebooks are used to entropy encode the difference block at the output of combiner 58. Since the magnitude of the difference block, i.e. of the residual IntMDCT spectrum, depends on the precision of the quantization, the code table selection of entropy encoder 60b can be performed without additional side information.

In an MPEG-2 AAC encoder, the spectral coefficients, i.e. the quantized spectral values, are grouped into scale factor bands within the quantization block, and the spectral values are weighted by a gain factor derived from the scale factor associated with the respective scale factor band. Since this known encoder concept uses a non-uniform quantizer to quantize the weighted spectral values, the magnitude of the residual values, i.e. of the spectral values at the output of combiner 58, depends not only on the scale factors but also on the quantized values themselves. However, since both the scale factors and the quantized spectral values are contained in the bitstream generated by means 60a of FIG. 6, i.e. in the psychoacoustically encoded audio data, the codebook is preferably selected according to the magnitude of the difference spectral values, and the code table used in the decoder is determined from the scale factors and quantized values transmitted in the bitstream. Since no side information has to be transmitted for the entropy coding of the difference spectral values at the output of combiner 58, the entropy coding only reduces the data rate, without spending any signaling bits in the data stream as side information for entropy encoder 60b.
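The idea of deriving the code table from data the decoder already has can be illustrated with a small sketch. Everything here is an assumption for illustration: the power-law reconstruction `|q|**(4/3) * 2**(sf/4)` is a simplified AAC-style inverse quantizer, and the rule "codebook index = bit length of the estimated residual magnitude" is hypothetical, not taken from the patent or from the MPEG-2 AAC standard. The point is only that the selection is a pure function of `q` and the scale factor, so encoder and decoder arrive at the same table without side information.

```python
import math

def residual_scale(q: int, scalefactor: int) -> int:
    """Rough estimate of the residual magnitude for one spectral line.

    Uses the spacing between neighbouring reconstruction levels of an
    assumed power-law quantizer as the estimate.
    """
    gain = 2.0 ** (scalefactor / 4.0)
    a = abs(q)
    spacing = ((a + 1) ** (4.0 / 3.0) - a ** (4.0 / 3.0)) * gain
    return math.ceil(spacing)

def select_codebook(q: int, scalefactor: int) -> int:
    # Hypothetical rule: index = number of bits needed for the estimate.
    return residual_scale(q, scalefactor).bit_length()

# Encoder and decoder both evaluate the same function on transmitted data.
encoder_choice = select_codebook(10, 16)
decoder_choice = select_codebook(10, 16)
```

A coarser quantization (larger scale factor) yields a larger estimate and therefore a code table suited to larger residual values.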

In an audio encoder conforming to the MPEG-2 AAC standard, window switching is used to avoid pre-echoes in transient regions of the audio signal. This technique relies on the possibility of choosing the window shape separately in each half of the MDCT window, which allows the block size to be changed between successive blocks. Likewise, the integer transform algorithm in the form of the IntMDCT (explained with reference to FIGS. 1 to 3) is designed so that different window shapes can be used in the windowing and time-domain aliasing part of the MDCT decomposition. It is therefore preferable to use the same window decisions for the integer transform algorithm and for the transform algorithm that generates the quantization block.

An MPEG-2 AAC encoder also contains various other coding tools, of which only TNS (temporal noise shaping) and center/side (CS) stereo coding are discussed here. In TNS coding, as in CS coding, the spectral values are modified before quantization. As a consequence, the difference between the IntMDCT values, i.e. the integer block, and the quantized MDCT values increases. According to the invention, the integer transform algorithm is therefore designed so that TNS coding and center/side coding can also be applied to the integer spectral values. The TNS technique is based on an adaptive forward prediction of the MDCT values over frequency. The same prediction filter that a conventional TNS module computes in a signal-adaptive manner is preferably also used to predict the integer spectral values; where this produces non-integer values, they are rounded down so that integer values result again. This rounding preferably takes place after each prediction step. In the decoder, the original spectrum can be reconstructed by applying the inverse filter together with the same rounding function. Likewise, CS coding can be applied to the IntMDCT spectral values by using a rounded Givens rotation with an angle of π/4 based on the lifting scheme. The original IntMDCT values can thus be reconstructed in the decoder.
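Why rounding after each prediction step keeps the prediction reversible can be seen in a short sketch. It is a simplified illustration under stated assumptions: the filter coefficients are arbitrary example values (a real TNS module derives them from the signal), the function names are hypothetical, and the prediction runs over a plain list of integer spectral values.

```python
import math

def predict(history, coeffs):
    # Prediction from already known integer values, rounded down to an integer.
    acc = sum(c * h for c, h in zip(coeffs, reversed(history)))
    return math.floor(acc)

def tns_forward(spectrum, coeffs):
    # Encoder: subtract the rounded prediction from each spectral value.
    order = len(coeffs)
    out = []
    for k, value in enumerate(spectrum):
        history = spectrum[max(0, k - order):k]
        out.append(value - predict(history, coeffs))
    return out

def tns_inverse(residual, coeffs):
    # Decoder: rebuild each value from the values reconstructed so far,
    # using the same coefficients and the same rounding function.
    order = len(coeffs)
    spectrum = []
    for k, value in enumerate(residual):
        history = spectrum[max(0, k - order):k]
        spectrum.append(value + predict(history, coeffs))
    return spectrum

coeffs = [0.8, -0.3]                      # assumed example filter
spectrum = [120, 97, 64, -15, -40, 3, 0]  # integer spectral values
residual = tns_forward(spectrum, coeffs)
rebuilt = tns_inverse(residual, coeffs)
```

Because the decoder feeds exactly the same integers into `predict` as the encoder did, the rounded prediction is identical on both sides and the subtraction is undone without error.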

It should be noted that, in the preferred embodiment with the IntMDCT as the integer transform algorithm, the inventive concept can be applied to any MDCT-based hearing-adapted audio encoder. Merely as examples, such encoders include those according to MPEG-4 AAC Scalable, MPEG-4 AAC Low Delay, MPEG-4 BSAC, MPEG-4 Twin VQ, Dolby AC-3, and so on.

In particular, it should be noted that the inventive concept is backward compatible. The hearing-adapted encoder or decoder is not changed but merely extended. The auxiliary information for the lossless component can be transmitted in the hearing-adapted encoded bitstream in a backward-compatible manner, for example in the "ancillary data" field in MPEG-2 AAC. The addition to the existing hearing-adapted decoder, shown in dashed lines in FIG. 7, can evaluate the auxiliary data and, together with the quantized MDCT spectrum from the hearing-adapted decoder, reconstruct the IntMDCT spectrum losslessly.

The inventive concept of psychoacoustic coding supplemented by lossless or near-lossless coding is particularly well suited to generating, transmitting, and decoding scalable data streams. Scalable data streams are known to contain several extension layers, of which at least the lowest can be transmitted and decoded independently of the higher ones. In scalable processing of data, further extension layers or enhancement layers are added on top of the first extension layer or base layer. A fully equipped encoder can thus produce a scalable data stream with a first extension layer and, in principle, any number of further extension layers. One advantage of the scalability concept is that, if a broadband transmission channel is available, the scalable data stream produced by the encoder can be transmitted completely, i.e. including all extension layers. If only a narrowband transmission channel is available, the encoded signal can still be transmitted over it, but only in the form of the first extension layer, or of the first extension layer plus a number of further extension layers that is smaller than the total number produced by the encoder. Of course, an encoder that is connected to a channel and adapted to it may directly produce the base or first extension layer together with a channel-dependent number of further extension layers.

On the decoder side, the scalability concept also has the advantage of being backward compatible. A decoder that can only process the first extension layer simply ignores the second and further extension layers in the data stream and can still produce a useful output signal. If, however, the decoder is a typically more modern decoder that can process several extension layers of the scalable data stream, it can process the same data stream as the base decoder.

In the present invention, the basic scalability consists in writing the quantization block, i.e. the output of bitstream encoder 60a, into a first extension layer 81 of FIG. 8, which, in view of FIG. 6, contains the psychoacoustically encoded data, for example for one frame. The preferably entropy-encoded difference spectral values generated by combiner 58 are written into a second extension layer, denoted 82 in FIG. 8a in this simple form of scalability, which thus contains the auxiliary audio data for the frame.

If the transmission channel from the encoder to the decoder is a broadband channel, both extension layers 81 and 82 can be sent to the decoder. If, however, the transmission channel is a narrowband channel into which only the first extension layer "fits", the second extension layer can simply be removed from the data stream before transmission, so that the decoder receives only the first extension layer.

On the decoder side, a "base decoder" that can only process psychoacoustically encoded data can simply ignore the second extension layer if it has received it over a broadband channel. If, however, the decoder is a complete decoder containing both a psychoacoustic decoding algorithm and an integer decoding algorithm, it can use the first and second extension layers for decoding to produce a losslessly encoded and re-decoded output signal.

In a preferred embodiment of the invention, outlined with reference to FIG. 8a, the psychoacoustically encoded data for a frame is again placed in the first extension layer. The second extension layer of FIG. 8a, however, is now subdivided more finely, so that several extension layers emerge from the second extension layer of FIG. 8, for example a (smaller) second extension layer, a third extension layer, a fourth extension layer, and so on.

The difference spectral values output by combiner 58 are particularly suitable for further scaling, as illustrated with reference to FIG. 9. FIG. 9 schematically shows binary-coded spectral values. Each row 90 in FIG. 9 represents one binary-coded difference spectral value. The difference spectral values in FIG. 9 are ordered by frequency, as indicated by arrow 91, so that a difference spectral value 92 has a higher frequency than difference spectral value 90. The first column of the table in FIG. 9 holds the most significant bit (MSB) of a difference spectral value, the second column the bit of significance MSB-1, and the third column the bit of significance MSB-2. The third-to-last column holds the bit of significance LSB+2, the second-to-last column the bit of significance LSB+1, and the last column the bit of significance LSB, i.e. the least significant bit of a difference spectral value.

In a preferred embodiment of the invention, precision scaling is achieved by placing, for example, the 16 most significant bits of each difference spectral value in the second extension layer, where they can be entropy encoded by entropy encoder 60b if desired. A decoder that uses the second extension layer obtains the difference spectral values at its output with 16-bit precision, so that the second extension layer together with the first extension layer provides a losslessly decoded audio signal in CD quality. Audio samples in CD quality are known to be 16 bits wide.

If, on the other hand, a studio-quality audio signal is supplied to the encoder, i.e. an audio signal with 24 bits per sample, the encoder can additionally produce a third extension layer containing the last 8 bits of each difference spectral value, again entropy encoded as required (means 60 of FIG. 6).

A complete decoder that receives a data stream with the first extension layer, the second extension layer (the 16 most significant bits of the difference spectral values), and the third extension layer (the remaining 8 lower-order bits of the difference spectral values) can provide a losslessly encoded and re-decoded audio signal in studio quality, i.e. with a sample word width of 24 bits at the decoder output, by using all three extension layers.

It should be noted that audio signals in the studio field have longer sample word lengths than those in the consumer field. In the consumer field, the word width on an audio CD is 16 bits, whereas in the studio field it is 20 or 24 bits.

Based on the concept of scaling in the IntMDCT domain described above, all three precisions (16, 20, or 24 bits), or indeed any precision scaled in steps as small as 1 bit, can be encoded in a scalable manner.

Here, the audio signal represented with 24-bit precision is represented in the integer spectral domain by means of the IntMDCT and is combined in a scalable manner with the output signal of the hearing-adapted, MDCT-based audio encoder.

The integer difference values needed for the lossless representation are now not encoded completely in a single extension layer, but are first encoded with lower precision. Only a further extension layer carries the residual values required for the exact representation. Alternatively, a difference spectral value can be represented completely in the further extension layer, e.g. with 24 bits, so that the underlying extension layer is no longer needed to decode this further extension layer. This leads to a larger bitstream, but when the bandwidth of the transmission channel is not a concern, it simplifies the decoder, because the extension layers no longer have to be combined and a single extension layer is always sufficient for decoding.

For example, scalability between 24 bits and 16 bits is achieved if the lower 8 bits (the LSBs), as shown in FIG. 9, are initially not transmitted.

To transform values transmitted with lower precision back into the time domain, the transmitted values are preferably scaled back to the original range, e.g. 24 bits, for example by multiplying them by 2⁸. An inverse IntMDCT is then applied to the values scaled back in this way.
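The precision scaling described above can be sketched with plain integer arithmetic. This is an illustrative sketch, assuming 24-bit signed difference values that are split into a 16-bit upper part and an 8-bit lower part; the function names are hypothetical.

```python
def split_layers(value: int, low_bits: int = 8):
    """Split one difference value into its upper part and its low_bits LSBs."""
    upper = value >> low_bits                 # arithmetic shift keeps the sign
    lower = value & ((1 << low_bits) - 1)     # always in 0 .. 2**low_bits - 1
    return upper, lower

def merge_layers(upper: int, lower: int, low_bits: int = 8) -> int:
    """Full-precision value when both extension layers were received."""
    return (upper << low_bits) + lower

def scale_back(upper: int, low_bits: int = 8) -> int:
    """Approximation when only the upper layer was received: multiply by 2**8."""
    return upper * (1 << low_bits)

value = -123457                         # example 24-bit difference value
upper, lower = split_layers(value)
exact = merge_layers(upper, lower)      # lossless path
approx = scale_back(upper)              # 16-bit-precision path
```

With both layers the value is restored exactly; with the upper layer alone the error stays below 2⁸, which corresponds to dropping the 8 least significant bits before the inverse IntMDCT.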

With the inventive precision scaling in the frequency domain, it is also preferable to exploit the redundancy in the LSBs. If, for example, an audio signal has very little energy in the upper frequency range, this shows up as very small values in the IntMDCT spectrum, e.g. values much smaller than the range that can be represented with 8 bits (-128, ..., 127), and is reflected in the compressibility of the LSB values of the IntMDCT spectrum. It should also be noted that in very small difference spectral values a number of the leading bits, counted from the MSB downward, are typically zero, and that the first 1 in the binary-coded difference spectral value does not occur before the bit of significance MSB-n-1. In this case, when a difference spectral value in the second extension layer consists only of zeros, entropy coding is particularly well suited to further data compression.

According to a further embodiment of the present invention, sampling rate scalability is preferably used for the second extension layer 82 of FIG. 8a. Sampling rate scalability is achieved by the second extension layer containing the difference spectral values up to a first cutoff frequency, as shown on the right in FIG. 9, while a further extension layer contains the difference spectral values with frequencies between the first cutoff frequency and the maximum frequency. Further subdivision is of course possible, so that several extension layers are formed across the whole frequency range.

In a preferred embodiment of the invention, the second extension layer in FIG. 9 contains the difference spectral values up to a frequency of 24 kHz, corresponding to a sampling rate of 48 kHz. The third extension layer contains the difference spectral values from 24 kHz to 48 kHz, corresponding to a sampling rate of 96 kHz.

It should further be noted that, in the second and third extension layers, not all bits of a difference spectral value have to be encoded. In a further, combined form of scalability, the second extension layer may contain bits MSB to MSB-X of the difference spectral values up to a certain cutoff frequency. The third extension layer may then contain bits MSB to MSB-X of the difference spectral values from the first cutoff frequency up to the highest frequency. The fourth extension layer may contain the remaining bits of the difference spectral values up to the cutoff frequency, and the last extension layer the remaining bits of the difference spectral values at the higher frequencies. This concept divides the table in FIG. 9 into four quadrants, each of which represents one extension layer.
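The four-quadrant split can be sketched by combining a cut along the frequency axis with a cut along the bit axis. This is a minimal sketch under assumptions: the layer names, the `cutoff_index` parameter (the index of the first spectral line above the cutoff frequency), and the example values are chosen for illustration only.

```python
def split_quadrants(values, cutoff_index: int, low_bits: int):
    """Split difference values into four layers (upper/lower bits x low/high band)."""
    mask = (1 << low_bits) - 1
    low_band, high_band = values[:cutoff_index], values[cutoff_index:]
    return {
        "layer2": [v >> low_bits for v in low_band],   # upper bits, below cutoff
        "layer3": [v >> low_bits for v in high_band],  # upper bits, above cutoff
        "layer4": [v & mask for v in low_band],        # remaining bits, below cutoff
        "layer5": [v & mask for v in high_band],       # remaining bits, above cutoff
    }

def merge_quadrants(layers, low_bits: int):
    """Reassemble the full-precision, full-bandwidth difference values."""
    upper = layers["layer2"] + layers["layer3"]
    lower = layers["layer4"] + layers["layer5"]
    return [(u << low_bits) + l for u, l in zip(upper, lower)]

diff_values = [5120, -733, 260, -9, 3, -1, 0, 2]
layers = split_quadrants(diff_values, cutoff_index=4, low_bits=8)
restored = merge_quadrants(layers, low_bits=8)
```

A decoder that receives only some of the layers would substitute zeros for the missing ones, trading precision or bandwidth for bit rate.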

As an example of frequency scalability, a preferred embodiment of the invention provides scalability between sampling rates of 48 kHz and 96 kHz. For the signal sampled at 96 kHz, only the lower half of the IntMDCT spectrum is initially encoded in the lossless extension layer and transmitted. If the upper half is not transmitted in addition, it is assumed to be zero in the decoder. The inverse IntMDCT (of the same length as in the encoder) then yields a 96 kHz signal that contains no energy in the upper frequency range and can therefore be subsampled to 48 kHz without loss of quality.

With regard to the size of the extension layers, the scaling of the difference spectral values described above, with the fixed quadrant boundaries of FIG. 9, is advantageous, because an extension layer then only has to contain, for example, 16 bits or 8 bits, or only the spectral values up to or above the cutoff frequency.

An alternative form of scaling "softens" the quadrant boundaries of FIG. 9 to some extent. In the case of frequency scalability, this means that no so-called brick-wall low-pass is applied, in which the difference spectral values are left unchanged below the cutoff frequency and set to zero above it. Instead, the difference spectral values can be filtered with any low-pass that already attenuates the spectral values somewhat below the cutoff frequency but still leaves some energy above the cutoff frequency, albeit decreasing. The extension layer generated in this way then also contains spectral values above the cutoff frequency. Since these spectral values are relatively small, however, they can be entropy encoded efficiently. In this case, the highest extension layer contains the difference between the complete difference spectral values and the spectral values contained in the second extension layer.
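A soft frequency split of this kind can be sketched as follows. The sketch assumes a hypothetical per-line gain curve that falls off around the cutoff instead of dropping abruptly to zero; the gain values and function names are illustrative only. The lower layer carries the rounded, low-pass-weighted values, and the highest layer carries whatever is missing, so the sum of both layers is exact regardless of the filter shape.

```python
import math

def soft_split(diff_values, gains):
    """Split into a low-pass-weighted layer and the remaining difference."""
    base = [math.floor(g * d + 0.5) for g, d in zip(gains, diff_values)]
    rest = [d - b for d, b in zip(diff_values, base)]
    return base, rest

def soft_merge(base, rest):
    # Adding the two layers restores the complete difference spectral values.
    return [b + r for b, r in zip(base, rest)]

diff_values = [400, -310, 95, -60, 22, -9, 4, -1]
gains = [1.0, 1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.0]   # assumed soft low-pass
base, rest = soft_split(diff_values, gains)
restored = soft_merge(base, rest)
```

Values above the cutoff still appear in `base`, but they are small and therefore cheap to entropy encode.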

Precision scaling can be softened in the same way. The first extension layer then also contains spectral values with, for example, more than 16 bits, and the next extension layer again carries the difference. In general terms, the second extension layer contains the difference spectral values with lower precision, while the next extension layer transmits the remainder, i.e. the difference between the complete spectral values and the spectral values contained in the second extension layer. A variable reduction in precision is achieved in this way.

The inventive encoding or decoding method is preferably stored on an electronic storage medium, such as a floppy disk, with electronically readable control signals that can cooperate with a programmable computer system so that the encoding and/or decoding method is executed. In other words, the invention also exists as a computer program product with program code, stored on a machine-readable carrier, for carrying out the encoding and/or decoding method when the program product runs on a computer. The inventive method can thus be realized as a computer program with program code for carrying out the method when the program runs on a computer.

In the following, the IntMDCT transform algorithm described in "Audio Coding Based on Integer Transforms" (111th AES Convention, New York, 2001) is presented as an example of an integer transform algorithm. The IntMDCT is particularly attractive because it has the desirable properties of the MDCT, such as a good spectral representation of the audio signal, critical sampling, and block overlap. Because the IntMDCT approximates the MDCT well, it is even possible to use only one transform algorithm in the encoder of FIG. 5, as indicated by arrow 62 in FIG. 5. The essential properties of this particular form of integer transform algorithm are explained with reference to FIGS. 1 to 4.

FIG. 1 shows an overview of the preferred inventive device for processing time-discrete samples representing an audio signal in order to obtain the integer values on which the IntMDCT integer transform algorithm operates. The time-discrete samples are windowed and optionally converted to a spectral representation by the device shown in FIG. 1. The time-discrete samples fed into input 10 of the device are windowed with a window w whose length corresponds to 2N time-discrete samples, so that integer windowed samples are obtained at output 12. These are suitable for conversion to a spectral representation by a transform, in particular by means 14 for performing an integer DCT. The integer DCT is designed to produce N output values from N input values, in contrast to the MDCT function 408 of FIG. 4a, which, according to the MDCT equation, produces only N spectral values from 2N windowed values.

To window the time-discrete samples, means 16 first selects two time-discrete samples, which together form a vector of time-discrete samples. One of the samples selected by means 16 lies in the first quarter of the window and the other in the second quarter of the window, as explained in more detail with reference to FIG. 3. A rotation with a 2×2 rotation matrix is applied to the vector generated by means 16, although this operation is not performed directly but by means of several so-called lifting matrices.

A lifting matrix has the property that it contains only one element that depends on the window w and is not equal to 1 or 0.

The factorization of wavelet transforms into lifting steps is described in "Factoring Wavelet Transforms Into Lifting Steps" (Ingrid Daubechies and Wim Sweldens, preprint, Bell Laboratories, Lucent Technologies, 1996). In general, a lifting scheme is a simple relationship between pairs of perfect reconstruction filters that share the same low-pass or high-pass filter. Each pair of complementary filters can be factorized into lifting steps. This applies in particular to Givens rotations. Consider the case in which the polyphase matrix is a Givens rotation. The following then holds:

$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \qquad (1)$

Each of the three lifting matrices on the right-hand side of the equals sign has 1 as its main diagonal elements. In addition, in each lifting matrix one off-diagonal element is equal to 0 and the other off-diagonal element depends on the rotation angle α.

The vector is now multiplied by the third lifting matrix, i.e. the rightmost lifting matrix in the equation above, to obtain a first result vector; this is represented by means 18 in FIG. 1. As indicated by means 20 in FIG. 1, the first result vector is then rounded with an arbitrary rounding function that maps the set of real numbers to the set of integers, so that a rounded first result vector is obtained at the output of means 20. The rounded first result vector is fed to means 22, where it is multiplied by the middle, i.e. second, lifting matrix to obtain a second result vector, which is rounded in means 24 to obtain a rounded second result vector. The rounded second result vector is fed to means 26, where it is multiplied by the leftmost, i.e. first, lifting matrix of the equation above to obtain a third result vector, which is finally rounded by means 28. The integer windowed samples are thus obtained at output 12. If their spectral representation is desired, they are processed by means 14 to obtain integer spectral values at spectral output 30.
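The three lifting steps with intermediate rounding, and their exact inversion, can be sketched as follows. This is a minimal sketch: the rounding function `round_half_up` is one possible choice (the text allows any rounding function), the function names are hypothetical, and the rotation angle must satisfy sin α ≠ 0 so that the lifting coefficient in equation (1) is defined.

```python
import math

def round_half_up(v: float) -> int:
    return math.floor(v + 0.5)

def lifting_rotate(x1: int, x2: int, alpha: float):
    """Integer approximation of a Givens rotation by alpha via three lifting steps."""
    t = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x1 = x1 + round_half_up(t * x2)   # rightmost lifting matrix, then rounding
    x2 = x2 + round_half_up(s * x1)   # middle lifting matrix, then rounding
    x1 = x1 + round_half_up(t * x2)   # leftmost lifting matrix, then rounding
    return x1, x2

def lifting_rotate_inverse(y1: int, y2: int, alpha: float):
    """Undo the three lifting steps in reverse order with the same rounding."""
    t = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    y1 = y1 - round_half_up(t * y2)
    y2 = y2 - round_half_up(s * y1)
    y1 = y1 - round_half_up(t * y2)
    return y1, y2

alpha = math.pi / 4
rotated = lifting_rotate(1000, -250, alpha)
original = lifting_rotate_inverse(rotated[0], rotated[1], alpha)
```

Each step changes only one component by a rounded function of the other, so subtracting the same rounded quantity restores the input exactly, even though the rounded rotation only approximates the true rotation.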

Means 14 is preferably implemented as an integer DCT.

The discrete cosine transform of type IV (DCT-IV) with length N is given by the following equation:

$X_t(m) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} x(k) \cos\left(\frac{\pi}{4N}(2k+1)(2m+1)\right) \qquad (2)$

The coefficients of the DCT-IV form an orthonormal N×N matrix. As described in the publication "Multirate Systems And Filter Banks" (P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, 1993), every orthogonal N×N matrix can be decomposed into N(N-1)/2 Givens rotations. It should be noted that further decompositions also exist.
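The orthonormality of the DCT-IV in equation (2) can be checked numerically with a direct, unoptimized implementation. This is a floating-point sketch for illustration only, not the integer DCT of means 14; the function name and the test vector are assumptions.

```python
import math

def dct_iv(x):
    """Direct O(N^2) evaluation of the DCT-IV of equation (2)."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(
            x[k] * math.cos(math.pi / (4 * n) * (2 * k + 1) * (2 * m + 1))
            for k in range(n)
        )
        for m in range(n)
    ]

x = [1.0, -2.0, 3.5, 0.25, -7.0, 4.0, 0.0, 1.5]
spectrum = dct_iv(x)
back = dct_iv(spectrum)   # the DCT-IV matrix is symmetric, so it is its own inverse
```

Because the matrix is both orthonormal and symmetric, applying the transform twice returns the input up to floating-point error, and the energy of `x` equals the energy of `spectrum`.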

For the classification of the various DCT algorithms, see H. S. Malvar, "Signal Processing With Lapped Transforms", Artech House, 1992. In general, DCT algorithms differ in the type of their basis functions. The DCT-IV preferred here has non-symmetric basis functions, i.e. a quarter cosine wave, a 3/4 cosine wave, a 5/4 cosine wave, a 7/4 cosine wave, and so on, whereas a discrete cosine transform of, for example, type II (DCT-II) has axis-symmetric and point-symmetric basis functions. Its zeroth basis function is a DC component, the first basis function is half a cosine wave, the second basis function is a full cosine wave, and so on. Because the DCT-II gives special weight to the DC component, it is used in video coding but not in audio coding, since, unlike in video coding, the DC component is irrelevant in audio coding.

The following explains how the rotation angle α of the Givens rotation depends on the window function.

An MDCT with a window length of 2N can be reduced to a type-IV discrete cosine transform of length N. This is achieved by performing the TDAC operation explicitly in the time domain and then applying the DCT-IV. With 50% overlap, the left half of the window for block t overlaps the right half of the preceding block, i.e. block t-1. The overlapping part of the two consecutive blocks t-1 and t is preprocessed in the time domain, i.e. before the transform, that is, between input 10 and output 12 of Fig. 1, as follows:

$\begin{pmatrix} \tilde{x}_t(k) \\ \tilde{x}_{t-1}(N-1-k) \end{pmatrix} = \begin{pmatrix} w\left(\frac{N}{2}+k\right) & -w\left(\frac{N}{2}-1-k\right) \\ w\left(\frac{N}{2}-1-k\right) & w\left(\frac{N}{2}+k\right) \end{pmatrix} \begin{pmatrix} x_t\left(\frac{N}{2}+k\right) \\ x_t\left(\frac{N}{2}-1-k\right) \end{pmatrix} \qquad (3)$

The values marked with a tilde are the values at output 12 of Fig. 1, while the x values without a tilde in the above equation are the values at input 10 or after the means 16 for selecting. The index k runs from 0 to (N/2)-1, and w denotes the window function.

From the TDAC condition for the window function w, the following relation holds:

$w\left(\frac{N}{2}+k\right)^2 + w\left(\frac{N}{2}-1-k\right)^2 = 1 \qquad (4)$

For certain angles α_k, k = 0, 1, ..., (N/2)-1, this preprocessing in the time domain can be written as a Givens rotation, as explained above.

The angle α of the Givens rotation depends on the window function w as follows:

$\alpha = \arctan\left[\frac{w(N/2-1-k)}{w(N/2+k)}\right] \qquad (5)$

Note that any window function w can be used as long as it satisfies the TDAC condition.
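To make the relation between window and rotation angle concrete, the following Python sketch evaluates equations (4) and (5) for a sine window. The sine window is only an assumed example of a window meeting the TDAC condition; the text does not prescribe it, and the function names are hypothetical.

```python
import math

def sine_window(j, n):
    """Assumed example window w(j) = sin(pi / (2n) * (j + 1/2)) for a window of length 2n."""
    return math.sin(math.pi / (2 * n) * (j + 0.5))

def tdac_residual(n, window=sine_window):
    """Largest violation of equation (4) over k = 0 .. n/2 - 1 (n must be even)."""
    half = n // 2
    return max(abs(window(half + k, n) ** 2 + window(half - 1 - k, n) ** 2 - 1.0)
               for k in range(half))

def rotation_angle(k, n, window=sine_window):
    """Angle of equation (5) for index k; atan2 equals arctan here since both values are positive."""
    half = n // 2
    return math.atan2(window(half - 1 - k, n), window(half + k, n))

N = 16  # example block length
print(tdac_residual(N))  # close to zero for the sine window
alpha0 = rotation_angle(0, N)
# When equation (4) holds, cos(alpha) and sin(alpha) reproduce the two
# window values that appear in the matrix of equation (3).
print(math.cos(alpha0) - sine_window(N // 2, N))
print(math.sin(alpha0) - sine_window(N // 2 - 1, N))
```

Any other window can be passed through the `window` parameter; if `tdac_residual` is not close to zero, the matrix of equation (3) is no longer a pure rotation.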

In the following, a cascaded encoder and decoder are described with reference to Fig. 2. The time-discrete samples x(0) to x(2N-1), which are "windowed" together by one window, are first selected by the means 16 of Fig. 1 such that the sample x(0) and the sample x(N-1), i.e. one sample from the first quarter of the window and one sample from the second quarter of the window, are selected to form the vector at the output of the means 16. The crossing arrows represent the lifting multiplications and subsequent roundings of the means 18, 20, the means 22, 24 and the means 26, 28, which yield the integer windowed samples at the input of the DCT-IV blocks.

Once the first vector has been processed as described above, a second vector is selected from the samples x(N/2-1) and x(N/2), i.e. again one sample from the first quarter of the window and one sample from the second quarter of the window, and is processed by the algorithm described in Fig. 1. All other sample pairs from the first and second quarters of the window are processed analogously. The same processing is performed for the third and fourth quarters of the first window. As shown in Fig. 2, N "windowed" integer samples are then present at output 12 and are fed into a DCT-IV transform. In particular, the "windowed" integer samples of the second and third quarters are fed into one DCT. The "windowed" integer samples of the first quarter of the window are processed in the preceding DCT-IV together with the "windowed" integer samples of the fourth quarter of the preceding window. Likewise, the "windowed" integer samples of the fourth quarter in Fig. 2 are fed into a DCT-IV transform together with the "windowed" integer samples of the first quarter of the next window. The central integer DCT-IV transform 32 shown in Fig. 2 provides N integer spectral values y(0) to y(N-1). Because the windowing and the transform yield integer output values, these integer spectral values can be entropy coded directly, without any quantization.

The right half of Fig. 2 shows a decoder. The decoder consists of an inverse transform and "inverse windowing" and works inversely to the encoder. As is known, an inverse DCT-IV can be used to invert a DCT-IV, as shown in Fig. 2. The output values of the decoder DCT-IV 34 are then, as shown in Fig. 2, inversely processed together with the corresponding values of the preceding and the following transform, in order to regenerate the time-discrete audio samples x(0) to x(2N-1) from the integer "windowed" samples at the output of the means 34 and of the preceding and following transforms.

The operation on the output side is performed by an inverse Givens rotation, i.e. the blocks 26, 28, then 22, 24, then 18, 20 are traversed in the opposite direction. This is illustrated in more detail for the second lifting matrix of equation (1). When (in the encoder) the second result vector is formed by multiplying the rounded first result vector by the second lifting matrix (means 22), the following mapping results:

$(x, y) \mapsto (x, y + x \sin\alpha) \qquad (6)$

The values x, y on the right-hand side of equation (6) are integers. This does not hold, however, for the value x sin α. Here the rounding function r must be introduced, as shown in the following equation:

$(x, y) \mapsto (x, y + r(x \sin\alpha)) \qquad (7)$

This operation is performed by the means 24.

The inverse mapping (in the decoder) is defined as follows:

$(x', y') \mapsto (x', y' - r(x' \sin\alpha)) \qquad (8)$

Because of the minus sign in front of the rounding operation, it is evident that the integer approximation of the lifting step can be inverted without introducing any error. Applying this approximation to each of the three lifting steps yields an integer approximation of the Givens rotation. The rounded rotation (in the encoder) can be inverted (in the decoder) without introducing an error by traversing the inverse rounded lifting steps in reverse order, i.e. by executing the algorithm of Fig. 1 from bottom to top when decoding.
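The error-free inversion can be sketched in a few lines of Python. The sketch assumes the usual three-step lifting factorization of a rotation, with off-diagonal elements (cos α - 1)/sin α, sin α and (cos α - 1)/sin α; equation (1) is not reproduced in this passage, so this factorization, the choice of rounding function and all function names are assumptions made for illustration.

```python
import math

def round_half_up(value):
    """One possible rounding function r: maps a real number to a nearby integer."""
    return math.floor(value + 0.5)

def lifting_coefficients(alpha):
    """Return (t, s) with t = (cos(alpha) - 1) / sin(alpha) and s = sin(alpha)."""
    s = math.sin(alpha)  # alpha must not be a multiple of pi
    return (math.cos(alpha) - 1.0) / s, s

def rounded_rotation(x, y, alpha, r=round_half_up):
    """Integer approximation of the rotation [[cos, -sin], [sin, cos]] applied to (x, y)."""
    t, s = lifting_coefficients(alpha)
    x = x + r(t * y)  # first lifting step followed by rounding
    y = y + r(s * x)  # second lifting step: (x, y) -> (x, y + r(x sin(alpha)))
    x = x + r(t * y)  # third lifting step followed by rounding
    return x, y

def inverse_rounded_rotation(x, y, alpha, r=round_half_up):
    """Undo rounded_rotation: same steps in reverse order, with minus signs."""
    t, s = lifting_coefficients(alpha)
    x = x - r(t * y)
    y = y - r(s * x)
    x = x - r(t * y)
    return x, y

alpha = 0.3  # example angle
for x, y in [(1000, -250), (-7, 13), (0, 1)]:
    rx, ry = rounded_rotation(x, y, alpha)
    print((x, y), "->", (rx, ry), "->", inverse_rounded_rotation(rx, ry, alpha))
```

Each step changes only one component by a rounded function of the other, unchanged component, which is why subtracting the same rounded value restores the input exactly.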

If the rounding function r is point-symmetric, the inverse rounded rotation is identical to the rounded rotation by the angle -α, which reads:

$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \qquad (9)$

In this case, the lifting matrices for the decoder, i.e. for the inverse Givens rotation, follow directly from equation (1) by simply replacing the term "sin α" with "-sin α".
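The point-symmetry argument can be tried out with Python's built-in `round`, which rounds halves to the nearest even integer and therefore satisfies round(-v) = -round(v). The sketch below again assumes the three-step lifting form with coefficients t = (cos α - 1)/sin α and s = sin α; the function name is hypothetical.

```python
import math

def lifting_rotation(x, y, t, s):
    """Three rounded lifting steps with coefficients t and s, using point-symmetric round()."""
    x = x + round(t * y)
    y = y + round(s * x)
    x = x + round(t * y)
    return x, y

alpha = 0.7  # example angle
s = math.sin(alpha)
t = (math.cos(alpha) - 1.0) / s

# For the angle -alpha the coefficients simply change sign:
# sin(-alpha) = -s and (cos(-alpha) - 1) / sin(-alpha) = -t.
forward = lifting_rotation(123, -45, t, s)
restored = lifting_rotation(forward[0], forward[1], -t, -s)
print(forward, restored)
```

With a rounding function that is not point-symmetric, for example floor(v + 0.5), the rotation by -α would in general differ from the exact inverse at half-integer values, so the decoder would have to use explicit subtraction instead.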

In the following, the decomposition of a conventional MDCT with overlapping windows 40 to 60 is shown once more with reference to Fig. 3. The windows 40 to 60 each overlap by 50%. For each window, Givens rotations are first performed within the first and second quarters of the window and within the third and fourth quarters of the window, as indicated by the arrows 48. The rotated values, i.e. the windowed integer samples, are then fed into an N-to-N DCT such that the second and third quarters of one window, or the fourth quarter of one window and the first quarter of the next window, are jointly converted into a spectral representation by a DCT-IV algorithm.

Thus, the usual Givens rotations are decomposed into lifting matrices that are executed sequentially, with a rounding step introduced after each lifting-matrix multiplication, so that floating-point numbers are rounded as soon as they arise and the result vector contains only integers before each multiplication by the next lifting matrix.
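Putting the pieces together, the integer windowing of one overlap region can be sketched as follows. The sketch assumes a sine window and the three-step lifting form of the rotation; it covers only the time-domain rotation stage of equation (3), not the subsequent integer DCT-IV, and all names are hypothetical.

```python
import math

def window(j, n):
    """Assumed sine window for a window length of 2n."""
    return math.sin(math.pi / (2 * n) * (j + 0.5))

def rotate(x, y, alpha, sign):
    """Rounded lifting rotation (sign=+1) or its exact inverse (sign=-1)."""
    s = math.sin(alpha)
    t = (math.cos(alpha) - 1.0) / s
    x += sign * math.floor(t * y + 0.5)
    y += sign * math.floor(s * x + 0.5)
    x += sign * math.floor(t * y + 0.5)
    return x, y

def angle(k, n):
    """Rotation angle of equation (5) for index k."""
    half = n // 2
    return math.atan2(window(half - 1 - k, n), window(half + k, n))

def integer_window(samples):
    """Apply equation (3) with rounding to the n integer samples of one overlap region."""
    n = len(samples)
    half = n // 2
    return [rotate(samples[half + k], samples[half - 1 - k], angle(k, n), +1)
            for k in range(half)]

def inverse_integer_window(pairs):
    """Recover the original integer samples from the rotated pairs."""
    half = len(pairs)
    n = 2 * half
    samples = [0] * n
    for k, (a, b) in enumerate(pairs):
        samples[half + k], samples[half - 1 - k] = rotate(a, b, angle(k, n), -1)
    return samples

block = [3, -8, 120, 77, -1000, 5, 0, 42]  # example integer samples
pairs = integer_window(block)
print(pairs)
print(inverse_integer_window(pairs) == block)  # True: the rotation stage is lossless
```

In each returned pair, the first entry plays the role of the tilde value for block t and the second the tilde value for block t-1 in equation (3).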

The output values thus always remain integers, and it is preferred to use integer input values as well. This is not a limitation of the invention, since PCM samples such as those stored on a CD are integer values whose range depends on the bit width, i.e. on whether the time-discrete digital input values are sixteen-bit or twenty-one-bit values. As explained, however, the whole process is invertible by performing the inverse rotations in reverse order. There is therefore an integer approximation of the MDCT with perfect reconstruction, i.e. a lossless transform.

The transform shown provides integer output values instead of floating-point values. It provides perfect reconstruction, so that no error is introduced when a forward transform is followed by a backward transform. According to a preferred embodiment of the present invention, this transform is a replacement for the modified discrete cosine transform. Other transforms can also be performed in integer form, however, as long as a decomposition into rotations and a decomposition of the rotations into lifting steps is possible.

The integer MDCT has most of the favorable properties of the MDCT. It has an overlapping structure, which gives better frequency selectivity than non-overlapping block transforms. Because the TDAC operation is already taken into account in the windowing before the transform, critical sampling is maintained, so that the total number of spectral values representing an audio signal equals the total number of input samples.

Compared with a conventional MDCT providing floating-point values, the preferred integer transform described here increases the noise only in spectral regions with very low signal level, while this noise increase is not noticeable at significant signal levels. In return, the integer processing lends itself to an efficient hardware implementation, since only multiplication steps are used, and multiplications can readily be decomposed into shift and add steps, both of which are simple and fast to implement in hardware. A software implementation is of course also possible.

The integer transform provides a good spectral representation of the audio signal while remaining in the integer domain. When applied to speech parts of an audio signal, it results in good energy concentration. With this approach, an efficient lossless coding scheme can be built by simply cascading the windowing/transform shown in Fig. 1. In particular, stacked coding using escape values, as used in MPEG AAC, is advantageous. It is preferable to scale down all values by a certain power of two until they fit into a desired code table, and then to code the omitted least significant bits additionally. Compared with the alternative of using larger code tables, this approach is preferable in terms of the memory needed to store the code tables. A nearly lossless coder can also be obtained by simply omitting some of the least significant bits.
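The power-of-two scaling can be sketched as follows. This is a minimal illustration under stated assumptions, not the entropy coder itself: the code table is represented only by a hypothetical largest magnitude `max_magnitude`, a single common shift is used for the whole block, and the function names are invented for the example.

```python
def split_low_bits(values, max_magnitude):
    """Shift all values right by a common amount until they fit the code table.

    Returns (shift, high_parts, low_parts); low_parts holds the omitted
    least significant bits, which would be coded separately.
    """
    if max_magnitude < 1:
        raise ValueError("max_magnitude must be at least 1")
    shift = 0
    while any(abs(v >> shift) > max_magnitude for v in values):
        shift += 1
    mask = (1 << shift) - 1
    high = [v >> shift for v in values]  # arithmetic shift, works for negative values
    low = [v & mask for v in values]     # non-negative remainder in [0, 2**shift)
    return shift, high, low

def merge_low_bits(shift, high, low):
    """Lossless reconstruction from the scaled values and the omitted bits."""
    return [(h << shift) | l for h, l in zip(high, low)]

values = [0, 5, -3, 100, -77]  # example integer spectral values
shift, high, low = split_low_bits(values, max_magnitude=15)
print(shift, high, low)
print(merge_low_bits(shift, high, low) == values)    # True: lossless
print(merge_low_bits(shift, high, [0] * len(high)))  # nearly lossless: low bits omitted
```

Dropping `low` entirely, as in the last line, corresponds to the nearly lossless variant mentioned above.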

Especially for speech signals, entropy coding of the integer spectral values allows a high coding gain. For transient parts of the signal the coding gain is low, owing to the flat spectrum of transient signals, i.e. owing to the small number of spectral values that are equal or almost equal to 0. As described in J. Herre, J. D. Johnston, "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)", 101st AES Convention, Los Angeles, 1996, preprint 4384, this flatness can nevertheless be exploited by linear prediction in the frequency domain. One alternative is open-loop prediction; another is closed-loop prediction. The first alternative, the open-loop predictor, is called TNS. Quantization after the prediction causes the resulting quantization noise to adapt to the temporal structure of the audio signal and thus prevents pre-echoes in psychoacoustic audio coders. For lossless audio coding the second alternative, the closed-loop predictor, is more suitable, since closed-loop prediction allows exact reconstruction of the input signal. When this technique is applied to the generated spectrum, a rounding step must be performed after each stage of the prediction filter in order to stay in the integer domain. By using the inverse filter and the same rounding function, the original spectrum can be reproduced exactly.
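A simplified sketch of integer prediction over frequency with exact reconstruction is given below. It is an assumption-laden illustration: the filter is a plain FIR predictor with hypothetical coefficients, the prediction is rounded once per spectral value rather than after each filter stage, and the function names are invented.

```python
def predict(previous, coeffs):
    """Rounded prediction from already known neighbouring spectral values."""
    return round(sum(c * p for c, p in zip(coeffs, previous)))

def prediction_residual(spectrum, coeffs):
    """Encoder side: integer residual of a prediction running over frequency."""
    residual = []
    for k, value in enumerate(spectrum):
        previous = spectrum[max(0, k - len(coeffs)):k][::-1]  # nearest lower bin first
        residual.append(value - predict(previous, coeffs))
    return residual

def reconstruct(residual, coeffs):
    """Decoder side: inverse filter using the same rounding function."""
    spectrum = []
    for k, value in enumerate(residual):
        previous = spectrum[max(0, k - len(coeffs)):k][::-1]
        spectrum.append(value + predict(previous, coeffs))
    return spectrum

coeffs = [0.9, -0.2]                        # hypothetical predictor coefficients
spectrum = [40, 38, 35, 30, 31, 29, -4, 0]  # example integer spectral values
residual = prediction_residual(spectrum, coeffs)
print(residual)
print(reconstruct(residual, coeffs) == spectrum)  # True: exact reconstruction
```

Because the decoder rebuilds exactly the integer values the encoder used for its prediction, both sides form identical rounded predictions and no error accumulates.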

To exploit the redundancy between two channels for data reduction, mid-side coding can also be used in a lossless manner if a rounded rotation by an angle of π/4 is used. Compared with the alternative of computing the sum and the difference of the left and right channels of a stereo signal, the rounded rotation has the advantage of preserving energy. The use of so-called joint stereo coding techniques can be switched on or off for each band, as is also done in standard MPEG AAC. Other rotation angles can also be considered in order to reduce the redundancy between two channels more flexibly.
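The energy argument can be illustrated with a rounded rotation by π/4 applied to one left/right pair. The sketch assumes the three-step lifting form of the rotation and a floor-based rounding function; names and sample values are hypothetical.

```python
import math

SIN = math.sin(math.pi / 4)
TAN = (math.cos(math.pi / 4) - 1.0) / SIN  # lifting coefficient, equal to -tan(pi/8)

def r(value):
    """Rounding function used in every lifting step."""
    return math.floor(value + 0.5)

def rotate_forward(left, right):
    """Rounded rotation by pi/4: returns approximately ((L - R)/sqrt(2), (L + R)/sqrt(2))."""
    a, b = left, right
    a += r(TAN * b)
    b += r(SIN * a)
    a += r(TAN * b)
    return a, b

def rotate_backward(a, b):
    """Exact inverse of rotate_forward."""
    a -= r(TAN * b)
    b -= r(SIN * a)
    a -= r(TAN * b)
    return a, b

left, right = 1000, 900  # example samples of a strongly correlated stereo pair
side, mid = rotate_forward(left, right)
print(side, mid)
print(rotate_backward(side, mid) == (left, right))  # True: lossless
print(left**2 + right**2, side**2 + mid**2)         # energy nearly unchanged
print((left + right)**2 + (left - right)**2)        # plain sum/difference doubles the energy
```

For correlated channels most of the energy ends up in one output, which is the redundancy the entropy coder can then exploit.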

Claims

1. An apparatus for encoding a time-discrete audio signal to obtain encoded audio data, comprising:

means (52) for providing a quantization block of the time-discrete audio signal quantized using a psychoacoustic model (54);

means (58) for inversely quantizing the quantization block and for rounding the inversely quantized spectral values, to obtain a rounding block of rounded inversely quantized spectral values;

means (56) for generating an integer block of integer spectral values using an integer transform algorithm, the integer transform algorithm being adapted to generate the integer block of spectral values from a block of integer time-discrete samples;

combining means (58) for forming a difference block depending on a spectral-value-wise difference between the rounding block and the integer block, to obtain a difference block with difference spectral values; and

means (60) for processing the quantization block and the difference block, to generate encoded audio data comprising information on the quantization block and information on the difference block.

2. The apparatus of claim 1, wherein the means (52) for providing is adapted to generate, by means of an MDCT, an MDCT block of MDCT spectral values from a time block of time audio signal values, and

to quantize the MDCT block using the psychoacoustic model, to generate the quantization block comprising quantized MDCT spectral values.

3. The apparatus of claim 2, wherein the means (56) for generating the integer block is adapted to perform an IntMDCT on the time block, to generate the integer block comprising IntMDCT spectral values.

4. The apparatus of any one of the preceding claims, wherein the means (52) for providing is adapted to calculate the quantization block using a floating-point transform algorithm.

5. The apparatus of claim 1, wherein the means (52) for providing is adapted to calculate the quantization block using the integer block generated by the means (56) for generating.

6. The apparatus of claim 1,

wherein the means (60) for processing is adapted to entropy encode (60a) the quantization block, to obtain an entropy-encoded quantization block;

to entropy encode (60b) the rounding block, to obtain an entropy-encoded rounding block; and

to convert the entropy-encoded quantization block into a first extension layer of a scalable data stream representing the encoded audio data, and to convert the entropy-encoded rounding block into a second extension layer of the scalable data stream.

7. The apparatus of claim 6,

wherein the means (60) for processing is adapted to entropy encode the quantization block using one of a plurality of code tables depending on the quantized spectral values, and

wherein the means (60) for processing is adapted to select one of a plurality of code tables, depending on a property of a quantizer usable in the quantization for generating the quantization block, for entropy encoding the difference block.

8. The apparatus of claim 1,

wherein the means (52) for providing is adapted to select one of a plurality of windows, depending on a property of the audio signal, for windowing a time block of audio signal values; and

wherein the means (56) for generating is adapted to make the same window selection for the integer transform algorithm.

9. The apparatus of claim 1,

wherein the means for generating uses an integer transform algorithm comprising:

windowing the time-discrete samples with a window (w) having a length corresponding to 2N time-discrete samples, to provide windowed time-discrete samples for converting the time-discrete samples into a spectral representation by a transform that generates N output values from N input values, wherein the windowing comprises the following sub-steps:

selecting (16) a time-discrete sample from one quarter of the window and a time-discrete sample from another quarter of the window, to obtain a vector of time-discrete samples;

applying to the vector a square rotation matrix whose dimension matches the dimension of the vector, wherein the rotation matrix can be represented by a plurality of lifting matrices, wherein a lifting matrix comprises only one element that depends on the window (w) and is not equal to 1 or 0, and wherein the sub-step of applying comprises the following sub-steps:

multiplying (18) the vector by a lifting matrix, to obtain a first result vector;

rounding a component of the first result vector with a rounding function (r) that maps a real number to an integer, to obtain a rounded first result vector; and

sequentially performing the steps of multiplying (22) by another lifting matrix and rounding (24) until all lifting matrices have been processed, to obtain a rotated vector comprising an integer windowed sample from the one quarter of the window and an integer windowed sample from the other quarter of the window, and

performing the windowing step for all time-discrete samples of the remaining quarters of the window, to obtain 2N filtered integer values; and

converting (14) N windowed integer samples into a spectral representation by an integer DCT of the filtered integer samples of the second and third quarters of the window, to obtain N integer spectral values.

10. The apparatus of claim 1,

wherein the means (52) for providing the quantization block is adapted to perform, before a quantization step (52b), a prediction of spectral values over frequency using a prediction filter, to obtain prediction residual spectral values which, after quantization, represent the quantization block;

wherein a prediction means is further provided, which performs a prediction over frequency of the integer spectral values of the integer block, and wherein a rounding means is further provided for rounding the prediction residual spectral values obtained from the integer spectral values representing the rounding block.

11. The apparatus of claim 1,

wherein the time-discrete audio signal comprises at least two channels,

wherein the means (52) for providing is adapted to perform mid/side coding on spectral values of the time-discrete audio signal, to obtain the quantization block after quantization of the mid/side spectral values, and

wherein the means (56) for generating the integer block is adapted to also perform a mid/side coding corresponding to the mid/side coding of the means (52) for providing.

12. The apparatus of claim 1, wherein the means (60) for processing is adapted to generate an MPEG-2 AAC data stream, in which auxiliary information for the integer transform algorithm is inserted in an ancillary data field.

13. The apparatus of claim 1,

wherein the means (60) for processing is adapted to output the encoded audio data as a data stream having a plurality of extension layers.

14. The apparatus of claim 13,

wherein the means (60) for processing is adapted to insert the information on the quantization block in a first extension layer (81) and the information on the difference block in a second extension layer (82).

15. The apparatus of claim 13,

wherein the means (60) for processing is adapted to insert the information on the quantization block in a first extension layer and the information on the difference block in at least a second and a third extension layer.

16. The apparatus of claim 15,

wherein the second extension layer comprises the difference spectral values with reduced precision, and one or more higher extension layers comprise the remaining part of the difference spectral values.

17. The apparatus of claim 15,

wherein the information on the difference block comprises binary-coded difference spectral values;

wherein the second extension layer for the difference spectral values comprises a number of bits from a most significant bit (MSB) down to a less significant bit (MSB-x); and

wherein the third extension layer comprises a number of bits from a less significant bit (MSB-x-1) down to a least significant bit (LSB).

18. The apparatus of claim 17,

wherein the time-discrete audio signal is represented by samples with a width of 24 bits, and

wherein the means (60) for processing is adapted to insert the 16 more significant bits of the difference spectral values in the second extension layer and the remaining 8 bits of the difference spectral values in the third extension layer, so that a decoder achieves CD quality using the second extension layer and studio quality when the third extension layer is also used.

19. The apparatus of claim 15,

wherein the means (60) for processing is adapted to insert, in the second extension layer, at least a part of the difference spectral values representing a low-pass filtered signal, and to insert, in a further extension layer, the difference between the difference spectral values in the second extension layer and the original difference spectral values.

20. The apparatus of claim 15,

wherein the means (60) for processing is adapted to insert, in the second extension layer, at least a part of the difference spectral values up to a certain cutoff frequency, and to insert, in the third extension layer, at least a part of the difference spectral values from the certain cutoff frequency up to a higher frequency.

21. A method for encoding a time-discrete audio signal to obtain encoded audio data, comprising:

providing (52) a quantization block of spectral values of the time-discrete audio signal quantized using a psychoacoustic model (54);

inversely quantizing (58) the quantization block and rounding the inversely quantized spectral values, to obtain a rounding block of rounded inversely quantized spectral values;

generating (56) an integer block of integer spectral values using an integer transform algorithm that generates the integer block of spectral values from a block of integer time-discrete samples;

forming (58) a difference block depending on a spectral-value-wise difference between the rounding block and the integer block, to obtain a difference block with difference spectral values; and

processing (60) the quantization block and the difference block, to generate encoded audio data comprising information on the quantization block and information on the difference block.

22. An apparatus for decoding encoded audio data that has been generated from a time-discrete audio signal by providing (52) a quantization block of spectral values of the time-discrete audio signal quantized using a psychoacoustic model (54), by inversely quantizing (58) the quantization block and rounding the inversely quantized spectral values to obtain a rounding block of rounded inversely quantized spectral values, by generating (56) an integer block of integer spectral values using an integer transform algorithm that generates the integer block of spectral values from a block of integer time-discrete samples, and by forming (58) a difference block depending on a difference between the spectral values of the rounding block and the integer block, to obtain a difference block with difference spectral values, the apparatus comprising:

means (70) for processing the encoded audio data, to obtain a quantization block and a difference block;

means (74) for inversely quantizing and rounding the quantization block, to obtain an integer inversely quantized quantization block;

means (78) for combining the integer quantization block and the difference block spectral value by spectral value, to obtain a combination block; and

means (82) for generating a time representation of the time-discrete audio signal using the combination block and an integer transform algorithm that is inverse to the integer transform algorithm.

23. The decoding apparatus of claim 22,

wherein the encoded audio data is scalable and comprises a plurality of extension layers;

wherein the means (70) for processing the encoded audio data is adapted to determine the quantization block from the encoded audio data as a first extension layer, and to determine the difference block from the encoded audio data as a second extension layer.

24. The apparatus of claim 22,

wherein the information on the difference block comprises binary-coded difference spectral values,

wherein the encoded audio data is scalable and comprises a plurality of extension layers,

wherein the means (70) for processing the encoded audio data is adapted to determine the quantization block from the encoded audio data as a first extension layer, and to extract a representation of the difference spectral values with reduced precision as a second extension layer.

25. The apparatus of claim 24,

wherein the means (70) for processing the encoded audio data is adapted to extract, as the second extension layer, a number of bits from a most significant bit down to a less significant bit, the less significant bit being more significant than the least significant bit of a difference spectral value, and

wherein the means (82) for generating the time representation of the time-discrete audio signal is adapted to synthetically generate the missing bits of the difference spectral values before applying the integer transform algorithm.

26. The apparatus of claim 25,

wherein the means (82) is adapted, for the synthetic generation, to upscale the second extension layer using a scale factor equal to $2^n$, where n is the number of less significant bits not contained in the second extension layer, or to use a dithering algorithm for the synthetic generation.

27. The apparatus of claim 22,

wherein the encoded audio data is scalable and comprises a plurality of extension layers, and

wherein the means (70) for processing the encoded audio data is adapted to determine the quantization block from the encoded audio data as a first extension layer, and to determine low-pass filtered difference spectral values as a second extension layer.

28. The apparatus of claim 22,

wherein the means (70) for processing the encoded audio data is adapted to determine the quantization block from the encoded audio data as a first extension layer, and to determine difference spectral values up to a first cutoff frequency as a second extension layer, the first cutoff frequency being lower than the maximum frequency of the difference spectral values that can be generated in an encoder.

29. The apparatus of claim 28,

wherein the means (82) for generating the time representation is adapted to set those input values of the full-length integer transform algorithm that lie above the cutoff frequency of the second extension layer to a predetermined value, and, after applying the inverse integer transform algorithm, to downsample the time representation of the time-discrete audio signal by a factor chosen to correspond to the ratio of the maximum frequency of the difference spectral values that can be generated by the encoder to the cutoff frequency.

30. The apparatus of claim 29,

wherein the predetermined value is zero for all input values above the cutoff frequency.

31. A method for decoding encoded audio data that has been generated from a time-discrete audio signal by providing, inversely quantizing, generating, forming and processing, the method comprising:

processing (70) the encoded audio data, to obtain a quantization block and a difference block;

inversely quantizing (74) the quantization block and rounding, to obtain an integer inversely quantized quantization block;

combining (78) the integer quantization block and the difference block spectral value by spectral value, to obtain a combination block; and

generating (82) a time representation of the time-discrete audio signal using the combination block and an integer transform algorithm that is inverse to the integer transform algorithm.
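For orientation only, the difference formation of the encoding claims and the combination of the decoding claims can be sketched in Python as follows. The uniform quantizer with a fixed step size is a hypothetical stand-in for the psychoacoustically controlled quantizer, the transforms themselves are omitted, and all names and values are assumptions made for the illustration.

```python
def quantize(spectrum, step):
    """Hypothetical stand-in for the psychoacoustically controlled quantizer."""
    return [round(v / step) for v in spectrum]

def rounding_block(quantized, step):
    """Inverse quantization followed by rounding to integers."""
    return [round(q * step) for q in quantized]

def encode(float_spectrum, integer_spectrum, step):
    """Return the quantization block and the difference block."""
    quantized = quantize(float_spectrum, step)
    rounded = rounding_block(quantized, step)
    difference = [i - r for i, r in zip(integer_spectrum, rounded)]
    return quantized, difference

def decode(quantized, difference, step):
    """Combine the rounded inversely quantized values with the difference block."""
    return [r + d for r, d in zip(rounding_block(quantized, step), difference)]

def split_layers(difference, low_bits):
    """Split difference values into a more significant and a less significant part."""
    mask = (1 << low_bits) - 1
    return [d >> low_bits for d in difference], [d & mask for d in difference]

def upscale(high, low_bits):
    """Decoder without the lower layer: scale by 2**low_bits (missing bits set to zero)."""
    return [h << low_bits for h in high]

float_spectrum = [100.4, -37.9, 5.2, 0.3]  # e.g. output of a floating-point MDCT
integer_spectrum = [101, -38, 4, 1]        # e.g. output of an integer transform
step = 8.0                                 # hypothetical quantizer step size

quantized, difference = encode(float_spectrum, integer_spectrum, step)
print(quantized, difference)
print(decode(quantized, difference, step) == integer_spectrum)  # True: lossless path

high, low = split_layers(difference, low_bits=2)
print(decode(quantized, upscale(high, 2), step))  # nearly lossless path
```

Decoding the quantization block alone corresponds to the lossy layer; adding the full difference block restores the integer spectrum exactly, and adding only the upper bits gives an intermediate quality.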