CN1901042B - Audio data encoding and decoding system and method for making bit rate adjustable

Info

Publication number: CN1901042B
Application number: CN2006101078352A
Authority: CN
Inventors: 金重会; 金尚煜; 吴殷美
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2002-12-12
Filing date: 2003-09-17
Publication date: 2011-07-06
Anticipated expiration: 2023-09-17
Also published as: CN1525437A; KR20040051369A; KR100908116B1; CN1276406C; CN1901042A

Abstract

A method and apparatus for scalable encoding and decoding of audio data are provided. The scalable encoding method includes: encoding additional information containing scale factor information and coding model information corresponding to a first layer; encoding, in units of symbols and by referring to the coding model information, a plurality of quantized samples corresponding to the first layer, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs); and repeating these steps for each successive layer until a predetermined plurality of layers has been encoded. With this approach, fine-grained scalability (FGS) can be provided with low complexity, and good audio quality is obtained even at the lower layers.

Description

Method and device for scalable encoding and decoding of audio data

This application is a divisional of the patent application filed on September 17, 2003 with application number 03165036.8, entitled "Method and apparatus for scalably encoding and decoding audio data".

Technical field

The present invention relates to encoding and decoding audio data and, more particularly, to a method and apparatus for encoding audio data so that the encoded audio bitstream has a scalable bit rate, and to a method and apparatus for decoding such audio data.

Background art

Owing to recent advances in digital signal processing, audio signals are now in most cases stored and reproduced as digital signals. A digital audio storage/reproduction apparatus converts an audio signal into pulse code modulation (PCM) data, that is, a digital signal, by sampling and quantization. It stores the PCM audio data on an information storage medium such as a compact disc (CD) or a digital versatile disc (DVD) and reproduces the stored signal in response to a user command so that the user can listen to it. Compared with analog methods that use long-play (LP) records or magnetic tape, digital storage and reproduction greatly improves audio quality and markedly reduces the deterioration caused by long storage periods. However, because of the large amount of digital data involved, the digital approach raises problems of storage and transmission.

To solve these problems, various compression methods are used to reduce the size of digital audio signals.

In the Moving Picture Experts Group (MPEG) audio standards standardized by the International Organization for Standardization (ISO), and in AC-2/AC-3 developed by Dolby, the amount of data is reduced by using a psychoacoustic model. As a result, the amount of data can be reduced efficiently regardless of the characteristics of the signal. That is, the MPEG audio standards and the AC-2/AC-3 methods can provide almost the same audio quality as a CD at a bit rate of only 64 to 384 kbps, which is 1/6 to 1/8 of the bit rate of earlier digital encoding methods.

However, these methods search for the optimum state for a fixed bit rate and then perform quantization and encoding. Consequently, if the transmission bandwidth drops because of poor network conditions while the bitstream is being sent over a network, the connection may be interrupted and adequate service can no longer be provided to the user. In addition, when a bitstream needs to be converted into a smaller one, for example to suit a mobile device with limited storage capacity, a re-encoding process must be performed to reduce its size, which increases the amount of computation required.

To solve this problem, the applicant of the present invention filed Korean Patent Application No. 97-61298 on November 19, 1997, entitled "Scalable bit-rate audio encoding/decoding method and apparatus using bit-sliced arithmetic coding (BSAC)", which was granted on April 17, 2000 as Korean Patent No. 261253. With the BSAC technique, a bitstream encoded at a high bit rate can be turned into a bitstream with a low bit rate, and audio can be restored from only part of the bitstream. Therefore, when the network is overloaded, when the decoder has limited performance, or when the user requests a low bit rate, a service with a certain level of audio quality can still be provided by using only part of the bitstream, although quality inevitably falls in proportion to the reduction in bit rate.

However, because the BSAC technique uses arithmetic coding, its complexity is high, and implementing it in an actual device increases cost. In addition, because the BSAC technique uses the modified discrete cosine transform (MDCT) to transform the audio signal, audio quality in the lower layers can be severely degraded.

Summary of the invention

The present invention provides a method and apparatus for scalable encoding and decoding of audio data that provide fine-grained scalability (FGS) with lower complexity.

The present invention also provides a method and apparatus for scalable encoding and decoding of audio data that deliver better audio quality at the lower layers even while providing FGS.

According to one aspect of the present invention, there is provided a method for scalable encoding of audio data, the method including: encoding additional information that corresponds to a first layer and contains scale factor information and coding model information; encoding, in units of symbols and by referring to the coding model information, a plurality of quantized samples corresponding to the first layer, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs); and repeating these steps, increasing the layer ordinal by one each time, until a predetermined plurality of layers has been encoded.

According to another aspect of the present invention, there is provided an encoding method including: slicing audio data so that the sliced audio data corresponds to a plurality of layers; obtaining scale band information and coding band information corresponding to each of the layers; encoding additional information containing scale factor information and coding model information, based on the scale band information and coding band information corresponding to a first layer; quantizing the audio data corresponding to the first layer by referring to the scale factor information to obtain quantized samples; encoding the obtained quantized samples in units of symbols, by referring to the coding model information, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs); and repeating these steps, increasing the layer ordinal by one each time, until a predetermined plurality of layers has been encoded.

The method may further include, before the additional information is encoded, obtaining the bit range allowed in each of the plurality of layers. In that case, the number of encoded bits is counted while the obtained quantized samples are encoded. If the counted number of bits exceeds the bit range allowed for the layer, encoding stops. If the counted number of bits is still below the allowed bit range after all of the layer's quantized samples have been encoded, bits that were left unencoded when encoding of the lower layer finished are encoded to the extent the bit range permits.

In addition, slicing the audio data includes performing a wavelet transform of the audio data and slicing the wavelet-transformed data by referring to a cutoff frequency so that the sliced data corresponds to the plurality of layers.

Encoding the additional information may include differentially encoding the scale factor information and the coding model information.

Encoding the plurality of quantized samples may include Huffman coding. It may also include mapping the quantized samples onto a bit plane and encoding them in units of symbols, within the bit range allowed in the layer to which the samples correspond, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs).

In mapping the quantized samples, K quantized samples are mapped onto the bit plane. In encoding the samples, a scalar value corresponding to each symbol formed of K bits of binary data is obtained, and Huffman coding is performed by referring to the K-bit binary data, the obtained scalar value, and the scalar value of the symbol located above the current symbol on the bit plane, where K is an integer.
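As an illustration of the bit-plane mapping described above, the following Python sketch maps K = 4 quantized samples onto a bit plane and reads out one K-bit symbol per bit position, MSB plane first. The function name `bit_plane_symbols`, the sample values, the 4-bit magnitude width, and the choice of placing the first sample in the top bit of each symbol are assumptions made for illustration; sign handling and the Huffman coding itself are omitted.

```python
def bit_plane_symbols(samples, num_bits=4):
    """Map K non-negative quantized samples onto a bit plane.

    Returns one K-bit symbol (as an integer scalar value) per bit
    position, ordered from the MSB plane down to the LSB plane.
    """
    symbols = []
    for bit in range(num_bits - 1, -1, -1):   # MSB plane first
        symbol = 0
        for s in samples:                     # first sample ends up in the top bit
            symbol = (symbol << 1) | ((s >> bit) & 1)
        symbols.append(symbol)
    return symbols


# Hypothetical input: 9 = 1001, 2 = 0010, 4 = 0100, 0 = 0000
print(bit_plane_symbols([9, 2, 4, 0]))        # [8, 2, 4, 8]
```

Each integer in the result is the scalar value of one symbol. Under the scheme described above, a Huffman coder would refer to that value together with the scalar value of the symbol above it on the bit plane.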

According to another aspect of the present invention, there is provided a method for scalable decoding of audio data encoded in a layered structure, the method including: decoding additional information containing scale factor information and coding model information corresponding to a first layer; decoding the audio data in units of symbols, by referring to the coding model information, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs), to obtain quantized samples; inversely quantizing the obtained quantized samples by referring to the scale factor information; inversely transforming the inversely quantized samples; and repeating these steps, increasing the layer ordinal by one each time, until a predetermined plurality of layers has been decoded.

Decoding the additional information includes differentially decoding the scale factor information and the coding model information.

In decoding the audio data, the quantized samples are obtained by Huffman decoding. Decoding the audio data may further include decoding the audio data in units of symbols, within the bit range allowed in the layer to which the audio data corresponds, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs), and obtaining the quantized samples from the bit plane on which the decoded symbols are arranged.

In decoding the audio data, a 4*K bit plane formed of the decoded symbols is obtained, and in obtaining the quantized samples, K quantized samples are obtained from the 4*K bit plane, where K is an integer.
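The reverse step can be sketched in the same way. The Python code below rebuilds K samples from a list of decoded K-bit symbols arranged MSB plane first, which corresponds to reading a 4*K bit plane column by column. The function name, the symbol layout (first sample in the top bit), and the zero-filling of bit planes that were not received are assumptions made for illustration.

```python
def samples_from_bit_plane(symbols, k=4, num_bits=4):
    """Rebuild K quantized samples from decoded K-bit symbols.

    `symbols` holds one integer per bit plane, MSB plane first. If fewer
    than `num_bits` planes are available (a truncated layer), the missing
    low-order planes are treated as zero.
    """
    samples = [0] * k
    for symbol in symbols:
        for i in range(k):
            bit = (symbol >> (k - 1 - i)) & 1   # bit of sample i in this plane
            samples[i] = (samples[i] << 1) | bit
    missing = num_bits - len(symbols)
    return [s << missing for s in samples]


print(samples_from_bit_plane([8, 2, 4, 8]))   # [9, 2, 4, 0]
print(samples_from_bit_plane([8, 2]))         # [8, 0, 4, 0]
```

The second call shows why MSB-first ordering suits scalable decoding: with only the two most significant planes, the decoder still recovers a coarse approximation of every sample.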

According to another aspect of the present invention, there is provided an apparatus for scalable decoding of audio data encoded in a layered structure, the apparatus including: an unpacking unit that decodes additional information containing scale factor information and coding model information corresponding to a first layer and, by referring to the coding model information, decodes the audio data in units of symbols, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs), to obtain quantized samples; an inverse quantization unit that inversely quantizes the obtained quantized samples by referring to the scale factor information; and an inverse transform unit that inversely transforms the inversely quantized samples.

The unpacking unit differentially decodes the scale factor information and the coding model information, and outputs the quantized samples through Huffman decoding. The unpacking unit decodes the audio data in units of symbols, within the bit range allowed in the layer to which the audio data corresponds, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs), and obtains the quantized samples from the bit plane on which the decoded symbols are arranged. The unpacking unit obtains a 4*K bit plane formed of the decoded symbols and then obtains K quantized samples from the 4*K bit plane, where K is an integer.

According to another aspect of the present invention, there is provided an apparatus for scalable encoding of audio data, the apparatus including: a transform unit that transforms audio data; a quantization unit that quantizes the transformed audio data corresponding to each layer by referring to scale factor information and outputs quantized samples; and a packing unit that encodes additional information containing scale factor information and coding model information corresponding to each layer and, by referring to the coding model information, encodes the plurality of quantized samples from the quantization unit in units of symbols, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs).

The packing unit obtains scale band information and coding band information corresponding to each of the plurality of layers, and encodes the additional information containing the scale factor information and the coding model information based on the scale band information and coding band information corresponding to each layer.

In addition, the packing unit counts the number of encoded bits. If the counted number of bits exceeds the bit range allowed for the layer, encoding stops. If the counted number of bits is still below the allowed bit range after all of the layer's quantized samples have been encoded, the packing unit encodes the bits that were left unencoded when encoding of the lower layer finished, to the extent the bit range permits.

The transform unit performs a wavelet transform on the audio data.

The packing unit slices the wavelet-transformed data by referring to a cutoff frequency so that the sliced data corresponds to the plurality of layers.

The packing unit differentially encodes the scale factor information and the coding model information.

The packing unit Huffman-encodes the quantized samples. In particular, the packing unit maps the plurality of quantized samples onto a bit plane and encodes them in units of symbols, within the bit range allowed in the layer to which the symbols correspond, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs).

The packing unit maps K quantized samples onto the bit plane, obtains a scalar value corresponding to each symbol formed of K bits of binary data, and performs Huffman coding by referring to the K-bit binary data, the obtained scalar value, and the scalar value of the symbol located above the current symbol on the bit plane, where K is an integer.

Brief description of the drawings

The above objects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of an encoding apparatus according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram of a decoding apparatus according to a preferred embodiment of the present invention;

FIG. 3 is a diagram of the structure of a frame forming a bitstream encoded in a layered structure so that the bit rate can be controlled;

FIG. 4 is a detailed diagram of the structure of the additional information;

FIG. 5 is a reference diagram for explaining an encoding method according to the present invention;

FIG. 6 is a reference diagram for explaining the encoding method according to the present invention in more detail;

FIG. 7 is a flowchart for explaining an encoding method according to a preferred embodiment of the present invention;

FIG. 8 is a flowchart for explaining a decoding method according to a preferred embodiment of the present invention; and

FIG. 9 is a flowchart for explaining a decoding method according to another preferred embodiment of the present invention.

Detailed description

Referring to FIG. 1, the encoding apparatus according to the present invention encodes audio data in a layered structure so that the bit rate of the encoded bitstream can be controlled. It includes a transform unit 11, a psychoacoustic unit 12, a quantization unit 13, and a bit packing unit 14.

The transform unit 11 receives pulse code modulation (PCM) audio data, which is a time-domain audio signal, and transforms it into a frequency-domain signal by referring to information on the psychoacoustic model provided by the psychoacoustic unit 12. In the time domain, the characteristics of audio signals that humans can perceive do not differ greatly from those of signals they cannot perceive. In the frequency-domain signal obtained by the transform, however, the characteristics of perceptible and imperceptible signals differ greatly. Compression efficiency can therefore be improved by varying the number of bits allocated to each frequency band. In this embodiment, the transform unit 11 performs a wavelet transform. With the MDCT, the unnecessarily high frequency resolution in the low frequency band means that even slight distortion can cause degradation that the human ear can perceive. With the wavelet transform, the time/frequency resolution is better matched, so more stable audio quality can be provided even in the lower layers, which cover the low frequency bands.

The psychoacoustic unit 12 provides information on the psychoacoustic model, such as attack sensing information, to the transform unit 11, and groups the audio signals transformed by the transform unit 11 into signals of appropriate sub-bands. In addition, the psychoacoustic unit 12 calculates a masking threshold in each sub-band by using the masking effect caused by the interaction between the signals, and provides the threshold to the quantization unit 13. The masking threshold is the maximum level of a signal that humans cannot perceive because of the interaction between signals. In this embodiment, the psychoacoustic unit 12 calculates the masking threshold of stereo components by using binaural masking level depression (BMLD).

The quantization unit 13 scalar-quantizes the audio signal in each band according to the scale factor information corresponding to the audio signal, so that the level of quantization noise in the band is lower than the masking threshold provided by the psychoacoustic unit 12 and humans cannot perceive the noise. The quantization unit 13 then outputs the quantized samples. That is, by using the noise-to-mask ratio (NMR), which is the ratio of the noise generated in each band to the masking threshold calculated in the psychoacoustic unit 12, the quantization unit 13 performs quantization so that the NMR value is 0 dB or less in all bands. An NMR value of 0 dB or less means that humans cannot perceive the quantization noise.
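The 0 dB criterion can be made concrete with a small numerical sketch. The Python code below assumes a uniform scalar quantizer with step size `step`, measures the quantization noise power in one band, and halves the step until the NMR is at most 0 dB. The quantizer, the halving search, the starting step, and all names and values are illustrative assumptions, not the quantizer of the embodiment.

```python
import math


def quantize(band, step):
    """Uniform scalar quantization of one band (illustrative only)."""
    return [round(x / step) for x in band]


def nmr_db(band, step, masking_threshold):
    """Noise-to-mask ratio in dB: quantization noise power over the threshold."""
    indices = quantize(band, step)
    noise = sum((x - q * step) ** 2 for x, q in zip(band, indices)) / len(band)
    if noise == 0:
        return float("-inf")
    return 10 * math.log10(noise / masking_threshold)


def choose_step(band, masking_threshold, step=16.0):
    """Halve the step size until the NMR of the band is 0 dB or less."""
    if masking_threshold <= 0:
        raise ValueError("masking threshold must be positive")
    while nmr_db(band, step, masking_threshold) > 0:
        step /= 2
    return step


band = [10.3, -4.7, 2.2, 0.9]        # hypothetical spectral values
threshold = 0.05                     # hypothetical masking threshold (power)
step = choose_step(band, threshold)
print(step, nmr_db(band, step, threshold) <= 0)
```

A larger masking threshold tolerates a coarser step and therefore fewer bits, which is how the psychoacoustic model translates into compression.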

The bit packing unit 14 encodes the quantized samples and the additional information belonging to each layer, and packs the encoded signal in a layered structure. The additional information includes, for each layer, scale band information, coding band information, the corresponding scale factor information, and coding model information. The scale band information and coding band information may be packed as header information and sent to the decoding apparatus. Alternatively, they may be encoded and packed as additional information of each layer and then sent to the decoding apparatus. In some cases the scale band information and coding band information need not be sent at all, because they are stored in the decoding apparatus in advance.

More specifically, the bit packing unit 14 encodes the additional information containing the scale factor information and coding model information corresponding to the first layer and, by referring to that coding model information, encodes the samples in units of symbols, in order from symbols formed of the most significant bits (MSBs) down to symbols formed of the least significant bits (LSBs). The same process is then repeated for the second layer. That is, encoding proceeds layer by layer until the predetermined plurality of layers has been encoded. In this embodiment, the bit packing unit 14 differentially encodes the scale factor information and the coding model information, and Huffman-encodes the quantized samples. The layered structure of a bitstream encoded according to the present invention is explained later.

The scale band information is information for performing quantization more appropriately according to the frequency characteristics of the audio signal. When the frequency range is divided into a plurality of bands and an appropriate scale factor is assigned to each band, the scale band information indicates the scale bands corresponding to each layer. Each layer thus has at least one scale band, and each scale band has one assigned scale factor. Likewise, the coding band information is information for performing encoding more appropriately according to the frequency characteristics of the audio signal. When the frequency range is divided into a plurality of bands and an appropriate coding model is assigned to each band, the coding band information indicates the coding bands corresponding to each layer. The scale bands and coding bands are divided empirically, and the scale factors and coding models corresponding to them are determined in the same manner.

FIG. 2 is a block diagram of a decoding apparatus according to a preferred embodiment of the present invention.

Referring to FIG. 2, the decoding apparatus decodes the bitstream up to a target layer determined by network conditions, the performance of the decoding apparatus, and the user's selection, so that the bit rate of the bitstream can be controlled. The decoding apparatus includes an unpacking unit 21, an inverse quantization unit 22, and an inverse transform unit 23.

The unpacking unit 21 unpacks the bitstream up to the target layer and decodes the bitstream of each layer. That is, the additional information containing the scale factor information and coding model information corresponding to each layer is decoded, and then, based on the obtained coding model information, the encoded quantized samples belonging to the layer are decoded and the quantized samples are restored. In this embodiment, the unpacking unit 21 differentially decodes the scale factor information and the coding model information, and Huffman-decodes the encoded quantized samples.

Meanwhile, the scale band information and coding band information are obtained from the header information of the bitstream or by decoding the additional information of each layer. Alternatively, the decoding apparatus may store the scale band information and coding band information in advance. The inverse quantization unit 22 inversely quantizes and restores the quantized samples of each layer according to the scale factor information corresponding to the samples. The inverse transform unit 23 performs frequency/time mapping on the restored samples to transform them into time-domain PCM audio data, and outputs the result.

FIG. 3 is a diagram of the structure of a frame forming a bitstream encoded in a layered structure so that the bit rate can be controlled.

Referring to FIG. 3, a frame of a bitstream according to the present invention is encoded by mapping the quantized samples and additional information onto a layered structure in order to obtain fine-grained scalability (FGS). In other words, the bitstream of a lower layer is contained in the bitstream of an enhancement layer. The additional information needed in each layer is allocated to that layer and then encoded.

A header area storing header information is placed at the front of the bitstream, the information of layer 0 is packed after the header area, and the information belonging to layers 1 through N, which are enhancement layers, is then packed in order. The span from the header area through the layer 0 information is called the base layer, the span from the header area through the layer 1 information is called layer 1, and the span from the header area through the layer 2 information is called layer 2. Likewise, the top layer denotes the span from the header area through the layer N information, that is, from the base layer through enhancement layer N. Additional information and encoded audio data are stored as the information of each layer. For example, additional information 2 and encoded quantized samples are stored as the layer 2 information. Here, N is an integer greater than or equal to 1.
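Because each layer is defined as a prefix of the frame, lowering the bit rate amounts to cutting the frame after the target layer. The following Python sketch illustrates this with byte strings; the function name `truncate_to_layer` and the placeholder payloads are assumptions for illustration and do not reflect the actual bitstream syntax.

```python
def truncate_to_layer(header, layer_infos, target_layer):
    """Return the part of a frame needed to decode up to `target_layer`.

    `layer_infos[i]` is the packed information of layer i (additional
    information followed by encoded quantized samples).
    """
    if not 0 <= target_layer < len(layer_infos):
        raise ValueError("target layer out of range")
    return header + b"".join(layer_infos[: target_layer + 1])


header = b"HDR"
layer_infos = [b"[L0]", b"[L1]", b"[L2]"]       # hypothetical payloads

layer1 = truncate_to_layer(header, layer_infos, 1)
layer2 = truncate_to_layer(header, layer_infos, 2)
print(layer1)                       # b'HDR[L0][L1]'
print(layer2.startswith(layer1))    # True: the lower layer is contained in the higher one
```

No re-encoding is involved: a server or decoder only has to stop reading at the end of the target layer.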

FIG. 4 is a detailed diagram of the structure of the additional information.

Referring to FIG. 4, additional information and encoded quantized samples are stored as the information of any given layer. In this embodiment, the additional information includes Huffman coding model information, quantization factor information, additional information on channels, and other additional information. The Huffman coding model information is index information identifying the Huffman coding model to be used for encoding or decoding the quantized samples belonging to the layer to which the information corresponds. The quantization factor information indicates the quantization step size used for quantizing or inversely quantizing the audio data belonging to the layer to which the information corresponds. The additional information on channels is channel-related information such as M/S stereo. The other additional information is flag information indicating whether M/S stereo is used.

In this embodiment, the bit packing unit 14 performs differential encoding on the Huffman coding model information and the quantization factor information. In differential encoding, the difference from the value of the immediately preceding band is encoded. The additional information about channels is Huffman-coded.
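
The differential step can be illustrated with a minimal sketch. The function names, the sample index values, and the starting predictor of 0 are assumptions made for illustration; the text only states that the difference from the immediately preceding band is what gets encoded.

```python
def dpcm_encode(values, initial=0):
    """Return the difference between each value and its predecessor."""
    diffs = []
    prev = initial
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs


def dpcm_decode(diffs, initial=0):
    """Rebuild the original values by accumulating the differences."""
    values = []
    prev = initial
    for d in diffs:
        prev += d
        values.append(prev)
    return values


# Hypothetical Huffman model indices for five consecutive bands.
model_indices = [7, 7, 6, 4, 4]
diffs = dpcm_encode(model_indices)
print(diffs)  # [7, 0, -1, -2, 0]
```

Neighbouring bands tend to have similar values, so the differences cluster near zero, which is what makes them cheap to entropy-code afterwards.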

图5是参考图，用于更具体地解释根据本发明的编码方法。FIG. 5 is a reference diagram for more specifically explaining the encoding method according to the present invention.

Referring to FIG. 5, the quantized samples to be encoded have a 3-layer structure. Hatched rectangles indicate spectral lines containing quantized samples, solid lines indicate scaling segments, and dashed lines indicate coding segments. Scaling segments (1), (2), (3), (4) and (5) and coding segments (1), (2), (3), (4) and (5) belong to layer 0. Scaling segments (5) and (6) and coding segments (6), (7), (8), (9) and (10) belong to layer 1. Scaling segments (6) and (7) and coding segments (11), (12), (13), (14) and (15) belong to layer 2. Layer 0 is defined so that encoding is performed up to band (a), layer 1 so that encoding is performed up to band (b), and layer 2 so that encoding is performed up to band (c).

First, the quantized samples belonging to layer 0 are encoded within the allowed bit range of 100 bits, using the corresponding coding model. In addition, as the additional information of layer 0, scaling segments (1), (2), (3), (4) and (5) and coding segments (1), (2), (3), (4) and (5), which belong to layer 0, are encoded. While the quantized samples are encoded in units of symbols, the number of bits is counted. If the counted number of bits exceeds the allowed bit range, encoding of layer 0 is stopped and encoding of layer 1 begins. The quantized samples of layer 0 that remain uncoded are encoded later, when room remains in the allowed number of bits in layers 0 and 1.

Next, the quantized samples belonging to layer 1 are encoded, using the coding model of whichever of the coding segments belonging to layer 1, that is, coding segments (6), (7), (8), (9) and (10), contains the quantized sample to be encoded. In addition, as the additional information of layer 1, scaling segments (5) and (6) and coding segments (6), (7), (8), (9) and (10), which belong to layer 1, are encoded. If room remains in the allowed bit range of 100 bits even after all the samples corresponding to layer 1 have been encoded, the remaining uncoded bits of layer 0 are encoded until the allowed 100 bits have been counted. If the number of bits counted during encoding exceeds the allowed bit range, encoding of layer 1 is stopped and encoding of layer 2 begins.

Finally, the quantized samples belonging to layer 2 are encoded, using the coding model of whichever of the coding segments belonging to layer 2, that is, coding segments (11), (12), (13), (14) and (15), contains the quantized sample to be encoded. In addition, as the additional information of layer 2, scaling segments (6) and (7) and coding segments (11), (12), (13), (14) and (15), which belong to layer 2, are encoded. If room remains in the allowed bit range of 100 bits even after all the samples corresponding to layer 2 have been encoded, the remaining uncoded bits of layer 0 are encoded until the allowed 100 bits have been counted.

Suppose all the quantized samples of layer 0 were encoded without regard to its allowed bit range, that is, encoding continued even after the number of coded bits exceeded the allowed 100 bits (which means that some bits of the allowed bit range of the next layer, layer 1, are spent on the current layer). The quantized samples belonging to layer 1 then often cannot be encoded. Consequently, when scalable decoding is performed up to layer 1, not all the quantized samples up to the predetermined band (b) corresponding to layer 1 have been encoded, so the decoded quantized samples at frequencies below (b) fluctuate, producing the "birdy" effect, which degrades audio quality.

When the number of layers (the target layers) is determined, the bit ranges are allocated in consideration of the total size of all the audio data to be encoded. Consequently, there is no possibility that encoding fails because of a shortage in the bit range into which the coded bits are placed.

在以和编码处理相反的方式执行解码时，按照允许的比特范围计数比特数。因此，预定层的解码定时点能被识别。When decoding is performed in the reverse manner to the encoding process, the number of bits is counted in the allowed bit range. Therefore, a decoding timing point of a predetermined layer can be identified.

图6是参考图，用以更具体地解释按照本发明的编码方法。FIG. 6 is a reference diagram for more specifically explaining the encoding method according to the present invention.

According to the present invention, the bit packing unit 14 encodes the quantized samples corresponding to each layer by bit-plane coding and Huffman coding. A plurality of quantized samples are mapped on a bit-plane so that they are represented in binary form, and are encoded, within the allowed bit range of each layer, in order from the symbol formed of the MSBs down to the symbol formed of the LSBs. The more important information on the bit-plane is encoded first, and the relatively less important information is encoded later. In this way, the bit rate and the band corresponding to each layer are fixed during encoding, which reduces the distortion known as the "birdy effect".

图6示例了在此情况下的编码例子，其中包括MSB的符号的比特数是4或更少。当量化样本9，2，4，和0被映射在比特平面上时，它们以二进制形式被表示，也就是，分别为1001b，0010b，0100b和0000b。就是说，在本实施例中，作为比特平面上的编码单元的编码块的大小是4*4。FIG. 6 illustrates an encoding example in the case where the number of bits of a symbol including MSB is 4 or less. When quantized samples 9, 2, 4, and 0 are mapped on the bit-plane, they are represented in binary form, ie, 1001b, 0010b, 0100b, and 0000b, respectively. That is, in this embodiment, the size of a coding block as a coding unit on a bit plane is 4*4.

The symbol msb formed of the MSBs is "1000b", the symbol msb-1 formed of the next bits is "0010b", the symbol msb-2 formed of the bits after that is "0100b", and the symbol msb-3 formed of the LSBs is "1000b".
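
As a minimal sketch of this mapping, the following function reads one symbol per bit-plane, MSB plane first. The function name is hypothetical, and the samples are assumed to be non-negative, since sign handling is not described here. For the samples 9, 2, 4 and 0, collecting the top bit of each sample gives 1000b, which is the value used for the msb symbol in the codebook lookups of this description.

```python
def bitplane_symbols(samples, num_planes):
    """Map samples onto a bit-plane and read one symbol per plane, MSB first.

    Each symbol collects the bit at the same position from every sample,
    with the first sample in the leftmost position.
    """
    symbols = []
    for plane in range(num_planes - 1, -1, -1):  # MSB plane first
        bits = "".join(str((s >> plane) & 1) for s in samples)
        symbols.append(bits)
    return symbols


symbols = bitplane_symbols([9, 2, 4, 0], num_planes=4)
print(symbols)  # ['1000', '0010', '0100', '1000']
```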

用于赫夫曼编码的赫夫曼模型信息，即码本索引被示于表1：Huffman model information for Huffman coding, i.e. codebook index is shown in Table 1:

Table 1

Additional information | Significance | Huffman model
0 | 0 | 0
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3, 4
4 | 2 | 5, 6
5 | 3 | 7, 8, 9
6 | 3 | 10, 11, 12
7 | 4 | 13, 14, 15, 16
8 | 4 | 17, 18, 19, 20
9 | 5 | *
10 | 6 | *
11 | 7 | *
12 | 8 | *
13 | 9 | *
14 | 10 | *
15 | 11 | *
16 | 12 | *
17 | 13 | *
18 | 14 | *
* | * | *

According to Table 1, two models can exist even for the same significance level (msb in this embodiment). This is because separate models are produced for quantized samples that exhibit different distributions.

现在将更详细地解释按照表1的图6例子的用于编码的处理过程。The process for encoding according to the example of FIG. 6 of Table 1 will now be explained in more detail.

在一个符号的比特数是4或更小的情况下，按照本发明的赫夫曼编码如公式1所示：In the case where the number of bits of a symbol is 4 or less, Huffman coding according to the present invention is shown in Equation 1:

Huffman code value = Huffman codebook [codebook index] [higher bit-plane] [symbol]    (1)

That is, Huffman coding uses three input variables: the codebook index, the higher bit-plane, and the symbol. The codebook index is the value obtained from Table 1, the higher bit-plane is the symbol located immediately above the symbol currently to be encoded on the bit-plane, and the symbol is the symbol currently to be encoded.

Since the significance (msb) in the example of FIG. 6 is 4, Huffman models 13-16 or 17-20 are selected. If the additional information to be encoded is 7, then:

the codebook index of the symbol formed of the msb bits is 16,

the codebook index of the symbol formed of the msb-1 bits is 15,

the codebook index of the symbol formed of the msb-2 bits is 14, and

the codebook index of the symbol formed of the msb-3 bits is 13.

Since the symbol formed of the msb bits has no higher bit-plane data, the value of the higher bit-plane is taken to be 0, and encoding is performed with the code Huffman codebook [16][0b][1000b]. Since the higher bit-plane of the symbol formed of the msb-1 bits is 1000b, encoding is performed with the code Huffman codebook [15][1000b][0010b]. Since the higher bit-plane of the symbol formed of the msb-2 bits is 0010b, encoding is performed with the code Huffman codebook [14][0010b][0100b]. Since the higher bit-plane of the symbol formed of the msb-3 bits is 0100b, encoding is performed with the code Huffman codebook [13][0100b][1000b].
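
The chain of lookups in this worked example can be sketched as follows. The sketch only builds the three keys of Equation 1 for each bit-plane; the codebook contents themselves are not given in this description, so no code words are produced. The function name and the `top_index` parameter are hypothetical.

```python
def codebook_lookups(symbols, top_index):
    """Return (codebook index, higher bit-plane, symbol) for each plane.

    `symbols` are the bit-plane symbols, MSB first, as integers.
    `top_index` is the codebook index used for the MSB symbol; the index
    drops by one for each lower plane, as in the worked example.
    """
    lookups = []
    upper = 0  # the MSB symbol has no plane above it
    for offset, cur in enumerate(symbols):
        lookups.append((top_index - offset, upper, cur))
        upper = cur  # becomes the context for the next plane down
    return lookups


keys = codebook_lookups([0b1000, 0b0010, 0b0100, 0b1000], top_index=16)
for index, upper, cur in keys:
    print(f"Huffman codebook [{index}][{upper:04b}b][{cur:04b}b]")
```

Using the symbol just encoded as the context for the next plane is what lets each lookup exploit the correlation between adjacent bit-planes.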

比特封装单元14计数编码比特的数目，用层中允许使用的比特的数目比较该计数，如果计数大于允许数目，停止编码。当在下一层中允许空间时，没有被编码的剩余比特被编码并且被放进下一层。在分配到相应层的量化样本被全部编码之后，如果层中允许的比特的数目中仍然有空间，即如果层中有空间，则低层中编码完成之后仍然未编码的量化样本被编码。The bit packing unit 14 counts the number of encoded bits, compares this count with the number of bits allowed to be used in the layer, and stops encoding if the count is greater than the allowed number. When space is allowed in the next layer, the remaining bits that were not coded are coded and put into the next layer. After the quantized samples allocated to the corresponding layer are all coded, if there is still room in the number of bits allowed in the layer, ie, if there is room in the layer, the quantized samples in the lower layer that are still uncoded after the coding is completed are coded.
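
The budget rule can be sketched under simplifying assumptions: every layer gets the same budget, each symbol's code length is known in advance, and a symbol is held back as soon as it would not fit (the text instead stops once the count exceeds the limit). All names and numbers are hypothetical.

```python
from collections import deque


def pack_layers(layer_costs, budget):
    """Distribute coded symbols over layers under a per-layer bit budget.

    layer_costs[i] lists the (hypothetical) code lengths, in bits, of the
    symbols assigned to layer i, in coding order. Returns, per layer, the
    (source_layer, cost) entries actually written into that layer.
    """
    pending = deque()  # symbols left uncoded by lower layers
    packed = []
    for layer, costs in enumerate(layer_costs):
        used = 0
        written = []
        own = deque((layer, c) for c in costs)
        # The layer's own symbols are coded first.
        while own and used + own[0][1] <= budget:
            item = own.popleft()
            used += item[1]
            written.append(item)
        # Leftovers from lower layers are coded only if every own symbol fit.
        if not own:
            while pending and used + pending[0][1] <= budget:
                item = pending.popleft()
                used += item[1]
                written.append(item)
        pending.extend(own)  # own symbols that did not fit wait for later
        packed.append(written)
    return packed


packed = pack_layers([[40, 40, 40], [30, 20], [10]], budget=100)
print(packed)
# [[(0, 40), (0, 40)], [(1, 30), (1, 20), (0, 40)], [(2, 10)]]
```

In this run the third symbol of layer 0 does not fit in the first 100 bits, so it is carried into layer 1 once layer 1's own symbols have been written.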

Meanwhile, if the number of bits of the symbol formed of the msb bits is greater than or equal to 5, the Huffman code value is determined using the position on the current bit-plane. In other words, if the significance is greater than or equal to 5, the data on each bit-plane shows little statistical difference, so the data is Huffman-coded using the same Huffman model. That is, there is one Huffman model per bit-plane.

If the significance is greater than or equal to 5, that is, the number of bits of the symbol is greater than or equal to 5, the Huffman coding of the present invention follows Equation 2:

Huffman code = 20 + bp1    (2)

where 'bp1' is the index of the bit-plane currently to be encoded and is an integer greater than or equal to 1. The constant 20 is added so that the index starts from 21, because the last Huffman model index listed in Table 1 (corresponding to additional information 8) is 20. The additional information for a coding segment therefore simply indicates the significance. In Table 2, the Huffman model is determined by the index of the bit-plane currently to be encoded.

Table 2

Additional information | Significance | Huffman model
9 | 5 | 21-25
10 | 6 | 21-26
11 | 7 | 21-27
12 | 8 | 21-28
13 | 9 | 21-29
14 | 10 | 21-30
15 | 11 | 21-31
16 | 12 | 21-32
17 | 13 | 21-33
18 | 14 | 21-34
19 | 15 | 21-35
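
The model ranges in Table 2 follow from Equation 2 if bp1 is assumed to run from 1 up to the significance; that range is inferred from the table rather than stated explicitly, and the function name below is hypothetical.

```python
def huffman_models(significance):
    """Return the Huffman model indices for a significance of 5 or more.

    The bit-plane index bp1 runs from 1 to `significance`, and each plane
    uses model 20 + bp1, following Equation 2.
    """
    if significance < 5:
        raise ValueError("Equation 2 applies only when significance >= 5")
    return [20 + bp1 for bp1 in range(1, significance + 1)]


print(huffman_models(5))  # [21, 22, 23, 24, 25]
```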

对于附加信息中的量化因子信息和赫夫曼模型信息，在相应于信息的编码段上执行DPCM。当量化因子信息被编码时，在帧的首部信息中用8个比特表示初始的DPCM值。用于赫夫曼模型信息的DPCM的初始值被设置为0。For the quantization factor information and the Huffman model information in the additional information, DPCM is performed on the coded section corresponding to the information. When the quantization factor information is encoded, 8 bits are used to represent the initial DPCM value in the header information of the frame. The initial value of DPCM for Huffman model information is set to 0.

The differences between the encoding method of the present invention and the prior-art BSAC technique are as follows. First, in BSAC, encoding is performed in units of bits, whereas in the present invention it is performed in units of symbols. Second, BSAC uses arithmetic coding, whereas the present invention uses Huffman coding. Arithmetic coding provides a higher compression gain but increases complexity and cost. Therefore, in the present invention, data is Huffman-coded in units of symbols rather than coded in units of bits, in order to reduce complexity and cost.

To control the bit rate, that is, to provide scalability, the bitstream corresponding to one frame is truncated in consideration of the number of bits allowed in each layer, so that decoding is possible with only a small amount of data. For example, if only the bitstream corresponding to 48 kbps is to be decoded, only 1048 bits of the bitstream are used, which yields decoded audio data corresponding to 48 kbps.
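
Because each layer occupies a known number of bits after the header, truncation reduces to a prefix cut. The sketch below assumes the header length and per-layer bit allotments are known to whoever truncates the frame; the function name, parameters, and the toy frame are hypothetical.

```python
def truncate_frame(frame, header_bits, layer_bits, top_layer):
    """Keep the header and layers 0..top_layer of one frame; drop the rest.

    frame       -- the frame as any sliceable sequence, one element per bit
    header_bits -- length of the header area in bits
    layer_bits  -- bits allotted to each layer, base layer first
    """
    keep = header_bits + sum(layer_bits[: top_layer + 1])
    return frame[:keep]


# Toy frame: characters stand in for bits so the layers are easy to see.
frame = "HHHH" + "0000" + "1111" + "2222"
print(truncate_frame(frame, 4, [4, 4, 4], top_layer=1))  # HHHH00001111
```

No re-encoding is involved: a lower bit rate is obtained simply by forwarding a shorter prefix of each frame.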

现在将解释基于上述结构的按照本发明的编码和解码方法。An encoding and decoding method according to the present invention based on the above structure will now be explained.

The encoding device reads PCM audio data, stores it in a memory (not shown), and obtains a masking threshold and additional information from the stored PCM audio data through psychoacoustic modeling. Since PCM audio data is a time-domain signal, it is wavelet-transformed into a frequency-domain signal. The encoding device then obtains quantized samples by quantizing the wavelet-transformed signal according to the quantization segment information and the quantization factor information. As described above, the quantized samples are encoded and packed by bit-slice coding, symbol-unit coding, and Huffman coding.

图7是流程图，用于解释按照本发明优选实施例的编码方法。Fig. 7 is a flowchart for explaining an encoding method according to a preferred embodiment of the present invention.

参考图7，现在将解释编码装置的比特封装单元14编码和封装量化的样本的处理过程。Referring to Fig. 7, the process of encoding and packing quantized samples by the bit packing unit 14 of the encoding device will now be explained.

首先，比特封装单元14根据所提供的目标比特率和附加信息提取相应于每层的信息。该处理在步骤701至703中执行。更具体的，在步骤701获得作为用于每层的截止的基础的截止频率，在步骤702获得对应于每层的量化段信息和编码段信息，并且在步骤703分配比特范围，在该范围内，应当编码的比特在每个层中能够被编码。First, the bit packing unit 14 extracts information corresponding to each layer according to the supplied target bit rate and additional information. This processing is performed in steps 701 to 703 . More specifically, in step 701, the cut-off frequency used as the basis for the cut-off of each layer is obtained, in step 702, the quantized segment information and the encoded segment information corresponding to each layer are obtained, and in step 703, a bit range is allocated, within the range , bits that should be coded can be coded in each layer.

接着，在步骤704中，层索引被设置成基层，并且附加信息(包括量化段信息和编码段信息)在步骤705被编码。Next, in step 704 , the layer index is set to the base layer, and additional information (including quantization section information and coding section information) is encoded in step 705 .

接下来，相应于基层的量化样本被映射在比特平面上，并在步骤706根据由msb比特形成的符号以4*4块为单位进行编码。在步骤707，编码的比特数被计数，并且如果该计数超过当前层的比特范围，则当前层的编码被停止，并且在下一层开始编码。如果在步骤707计数的比特数没有超出比特范围，则在步骤709，过程返回到步骤705以处理下一层。由于基层没有更低的层，步骤708不被执行，但针对基层之后跟着的层执行步骤708。通过上述步骤，直到目标层的所有范围的层均被编码。Next, the quantized samples corresponding to the base layer are mapped on the bit plane, and coded in units of 4*4 blocks according to the symbol formed by msb bits at step 706 . In step 707, the number of encoded bits is counted, and if the count exceeds the bit range of the current layer, the encoding of the current layer is stopped, and encoding is started at the next layer. If the number of bits counted in step 707 does not exceed the bit range, then in step 709, the process returns to step 705 to process the next layer. Since the base layer has no lower layers, step 708 is not performed, but step 708 is performed for the layers following the base layer. Through the above steps, all ranges of layers up to the target layer are encoded.

步骤706，也就是用于编码量化的样本的步骤如下所述：Step 706, that is, the step for encoding quantized samples is as follows:

1.相应于一个层的量化样本被以N样本为单位分组和映射在比特平面上。1. Quantized samples corresponding to one layer are grouped in units of N samples and mapped on a bit plane.

2.根据由映射的二进制数据的msb比特形成的符号执行赫夫曼编码。2. Perform Huffman encoding according to the symbols formed by the msb bits of the mapped binary data.

子步骤2可以如下进行详细解释：Sub-step 2 can be explained in detail as follows:

2.1相应于期望编码的符号的标量值(curVal)被获得。2.1 A scalar value (curVal) corresponding to the symbol desired to be encoded is obtained.

2.2 A Huffman code is obtained with reference to a scalar value (upperVal) corresponding to the symbol in the higher bit-plane, that is, the symbol located immediately above the symbol currently to be encoded.

对于附加信息中的量化因子信息和赫夫曼模型信息，在相应于信息的编码段上执行DPCM。当量化因子信息被编码时，DPCM的初始值在帧的首部信息中由8个比特表示。用于赫夫曼模型信息的DPCM的初始值被设置到0。For the quantization factor information and the Huffman model information in the additional information, DPCM is performed on the coded section corresponding to the information. When quantization factor information is encoded, the initial value of DPCM is represented by 8 bits in header information of a frame. The initial value of DPCM for Huffman model information is set to 0.

图8是流程图，用于解释按照本发明的优选实施例的解码方法。Fig. 8 is a flowchart for explaining a decoding method according to a preferred embodiment of the present invention.

Referring to FIG. 8, the decoding device receives a bitstream formed of audio data encoded in a layered structure and decodes the header information in each frame. Next, in step 801, additional information including the scale factor information and coding model information corresponding to the first layer is decoded. In step 802, with reference to the coding model information, quantized samples are obtained by decoding the bitstream in units of symbols, in order from the symbol formed of the MSBs down to the symbol formed of the LSBs. In step 803, the obtained quantized samples are inversely quantized with reference to the scale factor information, and in step 804 the inversely quantized samples are inversely transformed. Steps 801-804 are repeated, increasing the ordinal number of the layer by one each time, until decoding of a predetermined number of layers is completed.
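
The last part of step 802, turning decoded bit-plane symbols back into quantized samples, can be sketched as follows. The function name is hypothetical, samples are assumed to be non-negative, and the Huffman decoding that produces the symbols is taken as already done.

```python
def samples_from_symbols(symbols, num_planes):
    """Rebuild quantized samples from bit-plane symbols given MSB first.

    Planes that were not received (for example because the stream was
    truncated) are simply missing from `symbols`, so their bits stay zero.
    """
    width = len(symbols[0])
    samples = [0] * width
    for depth, symbol in enumerate(symbols):
        shift = num_planes - 1 - depth  # bit position of this plane
        for i, bit in enumerate(symbol):
            samples[i] |= int(bit) << shift
    return samples


print(samples_from_symbols(["1000", "0010", "0100", "1000"], num_planes=4))
# [9, 2, 4, 0]
print(samples_from_symbols(["1000", "0010"], num_planes=4))
# [8, 0, 4, 0]
```

The second call shows why MSB-first ordering suits scalable decoding: with only the upper planes available, the samples are coarser approximations of the original values rather than unusable data.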

图9是流程图，用于解释按照本发明另一个优选实施例的解码方法。Fig. 9 is a flowchart for explaining a decoding method according to another preferred embodiment of the present invention.

Referring to FIG. 9, a bitstream formed of audio data encoded in a layered structure is received, and in step 901 the cut-off frequency corresponding to each layer is decoded from the header information in each frame. In step 902, the quantization segment information and coding segment information corresponding to each layer are identified by decoding the header information. In step 903, the allowed bit range of each layer is identified. In step 904, the layer index is set to the base layer. In step 905, the additional information of the base layer is decoded, and in step 906 quantized samples are obtained by decoding the bitstream in units of symbols, within the bit range allowed for each layer, in order from the symbol formed of the MSBs down to the symbol formed of the LSBs. In step 907, it is checked whether the current layer is the last one. Steps 905 and 906 are repeated for each layer, increasing the layer number by one each time, until the predetermined target layer is reached. In steps 901-903, the decoding device may hold the cut-off frequency, quantization segment information, coding segment information and bit range in advance instead of obtaining them from the header information stored in each frame of the received bitstream. In that case, the decoding device obtains the information by reading the stored values.

As described above, according to the present invention, bits are encoded in units of symbols after bit slicing, which provides scalability whereby the bit rate can be controlled in a top-down manner, so that the computational load of the encoding device is not much greater than that of a device that provides no scalability. That is, the present invention provides a method and apparatus for scalable encoding and decoding of audio data with low complexity, while providing FGS and good audio quality even at lower layers.

In addition, compared with the MPEG-4 Audio BSAC technique, which uses arithmetic coding, the codec of the present invention, which uses Huffman coding, reduces the amount of computation in the packing/unpacking process to one-eighth of that of BSAC. Even when bit packing according to the present invention is performed to provide FGS, the overhead is small, so the coding gain is the same as when no scalability is provided.

In addition, because the apparatus according to the present invention has a layered structure, regenerating the bitstream so that a server can control the bit rate is very simple, and the complexity of a transcoding apparatus is therefore low.

当通过网络发送音频流时，能根据用户的选择或网络条件控制传输比特率，以便可以提供不停断的服务。When sending audio streams over the network, the transmission bit rate can be controlled according to user choice or network conditions so that uninterrupted service can be provided.

When an audio stream is stored in an information storage medium of limited capacity, the file size can be controlled arbitrarily before storage. If the bit rate is lowered, the band is restricted. The complexity of the filter, which is the most complex component of the codec, is therefore greatly reduced, and the actual complexity of the codec decreases in inverse proportion to the bit rate.

By using the wavelet transform, the time/frequency resolution is higher than that of prior-art MDCT-based coding, so better audio quality is provided even at lower layers.

Claims

1. A method for scalably encoding audio data, comprising:

encoding additional information including scale factor information and coding model information corresponding to a first layer;

Huffman-encoding, in units of symbols, a plurality of quantized samples corresponding to the first layer by referring to the coding model information, in order from a symbol formed of the most significant bits (MSBs) down to a symbol formed of the least significant bits (LSBs); and

repeating the above steps while increasing the ordinal number of the layer by one each time, until encoding of a predetermined plurality of layers is completed.

2. An encoding method, comprising:

slicing audio data so that the sliced audio data corresponds to a plurality of layers;

obtaining scaling segment information and coding segment information corresponding to each of the plurality of layers;

encoding additional information including scale factor information and coding model information, based on the scaling segment information and coding segment information corresponding to a first layer;

obtaining quantized samples by quantizing the audio data corresponding to the first layer with reference to the scale factor information;

Huffman-encoding, in units of symbols, the plurality of obtained quantized samples by referring to the coding model information, in order from a symbol formed of the most significant bits (MSBs) down to a symbol formed of the least significant bits (LSBs); and

repeating the above steps while increasing the ordinal number of the layer by one each time, until encoding of the plurality of layers is completed.

3. The method of claim 2, further comprising, before encoding the additional information,

obtaining a bit range allowed in each of the plurality of layers, wherein, when the plurality of obtained quantized samples are encoded, the number of encoded bits is counted; if the counted number of bits exceeds the corresponding bit range, encoding is stopped; and if the counted number of bits is less than the corresponding bit range even after the quantized samples have all been encoded, bits that remain unencoded after encoding of a lower layer is finished are encoded to the extent that the bit range permits.

4. The method of claim 2, wherein slicing the audio data comprises:

performing a wavelet transform of the audio data; and

slicing the wavelet-transformed data with reference to a cut-off frequency so that the sliced data corresponds to the plurality of layers.

5. The method of claim 1, wherein the encoding of the audio data comprises differentially encoding the scale factor information and the coding model information.

6. The method of claim 1, wherein encoding the plurality of quantized samples comprises:

mapping the plurality of quantized samples on a bit-plane; and

encoding the samples in units of symbols, within the bit range allowed in the layer corresponding to the samples, in order from a symbol formed of the MSBs down to a symbol formed of the LSBs.

7. The method of claim 6, wherein, in mapping the plurality of quantized samples, K quantized samples are mapped on the bit-plane, and, in encoding the samples, a scalar value corresponding to the symbol formed of K bits of binary data is obtained, and Huffman encoding is performed with reference to the K-bit binary data, the obtained scalar value, and a scalar value corresponding to a symbol above the current symbol on the bit-plane, where K is an integer.

8. A method for scalably decoding audio data encoded in a layered structure, comprising:

decoding additional information including scale factor information and coding model information corresponding to a first layer;

Huffman-decoding the audio data in units of symbols by referring to the coding model information, in order from a symbol formed of the MSBs down to a symbol formed of the LSBs, and obtaining quantized samples;

inversely quantizing the obtained quantized samples with reference to the scale factor information;

inversely transforming the inversely quantized samples; and

repeating the above steps while increasing the ordinal number of the layer by one each time, until decoding of a predetermined plurality of layers is completed.

9. The method of claim 8, wherein decoding the additional information comprises differentially decoding the scale factor information and the coding model information.

10. The method of claim 8, wherein decoding the audio data further comprises:

decoding the audio data in units of symbols, within the bit range allowed in the layer corresponding to the audio data, in order from a symbol formed of the MSBs down to a symbol formed of the LSBs; and

obtaining the quantized samples from a bit-plane on which the decoded symbols are arranged.

11. The method of claim 10, wherein, in decoding the audio data, a 4*K bit-plane formed of the decoded symbols is obtained, and, in obtaining the quantized samples, K quantized samples are obtained from the 4*K bit-plane, where K is an integer.

12. An apparatus for scalably decoding audio data encoded in a layered structure, comprising:

an unpacking unit which decodes additional information including scale factor information and coding model information corresponding to a first layer, Huffman-decodes the audio data in units of symbols by referring to the coding model information, in order from a symbol formed of the MSBs down to a symbol formed of the LSBs, and obtains quantized samples;

an inverse quantization unit which inversely quantizes the obtained quantized samples with reference to the scale factor information; and

an inverse transform unit which inversely transforms the inversely quantized samples.

13. The device of claim 12, wherein the unpacking unit differentially decodes the scale factor information and the coding model information.

14. The device of claim 12, wherein the unpacking unit decodes the audio data in units of symbols, in order from symbols formed from MSBs down to symbols formed from LSBs, within the bit range allowed in the layer corresponding to the audio data, and obtains quantized samples from a bit plane on which the decoded symbols are arranged.

15. The device of claim 12, wherein the unpacking unit obtains a 4*K bit plane formed from decoded symbols, and obtains K quantized samples from the 4*K bit plane, where K is an integer.

16. A device for scalably encoding audio data, comprising:

A transform unit that transforms audio data;

A quantization unit that quantizes the transformed audio data corresponding to each layer by referring to scale factor information, and outputs quantized samples; and

A packing unit that encodes additional information including scale factor information and coding model information corresponding to a first layer, and that, by referring to the coding model information, Huffman-encodes the plurality of quantized samples from the quantization unit in units of symbols, in order from symbols formed from most significant bits (MSBs) down to symbols formed from least significant bits (LSBs).

17. The device of claim 16, wherein the packing unit obtains scale band information and coding band information corresponding to each of a plurality of layers, and encodes the additional information, including the scale factor information and the coding model information, based on the scale band information and the coding band information corresponding to each layer.

18. The device of claim 16, wherein the packing unit counts the number of encoded bits and stops encoding if the counted number of bits exceeds the corresponding bit range, and, if the counted number of bits is still less than the corresponding bit range even after all the quantized samples have been encoded, encodes bits that remained unencoded after encoding of a lower layer was finished, to the extent that the bit range permits.
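As an illustration only, the bit counting described in claim 18 can be sketched as follows. The sketch makes several assumptions that the claim does not state: each symbol is represented only by the length of its code in bits, encoding stops just before a symbol would overrun the budget, and data left over from lower layers is kept in a simple queue. The function name `pack_layer` and its parameters are hypothetical.

```python
def pack_layer(symbol_bits, budget, leftovers):
    """Spend a layer's bit budget on its own symbols, then on lower-layer leftovers.

    symbol_bits: code lengths of this layer's symbols, in MSB-to-LSB order.
    budget:      bit range allowed for this layer.
    leftovers:   code lengths deferred from lower layers.
    Returns (bits_used, code_lengths_still_unencoded).
    """
    used = 0
    sent = 0
    for n in symbol_bits:
        if used + n > budget:      # the next symbol would exceed the bit range
            break
        used += n
        sent += 1
    deferred = list(symbol_bits[sent:])
    carried = list(leftovers)
    if not deferred:
        # The layer is fully encoded; use any spare bits for lower-layer leftovers.
        while carried and used + carried[0] <= budget:
            used += carried.pop(0)
    return used, carried + deferred
```

With a budget of 10 bits and code lengths 5, 4 and 6, the first two symbols are sent and the third is deferred. With a budget of 12 bits, code lengths 3 and 2, and leftovers of 6 and 4 bits, the layer is sent in full and the 6-bit leftover also fits, leaving the 4-bit leftover for a higher layer.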

19. The device of claim 16, wherein the transform unit performs a wavelet transform on the audio data.

20. The device of claim 16, wherein the packing unit slices the wavelet-transformed data by referring to a cutoff frequency, so that the sliced data correspond to the plurality of layers.

21. The device of claim 16, wherein the packing unit differentially encodes the scale factor information and the coding model information.

22. The device of claim 16, wherein the packing unit maps the plurality of quantized samples onto a bit plane, and encodes the samples in units of symbols, in order from symbols formed from MSBs down to symbols formed from LSBs, within the bit range allowed in the layer corresponding to the samples.

23. The device of claim 16, wherein the packing unit maps K quantized samples onto a bit plane, obtains a scalar value corresponding to a symbol formed from K-bit binary data, and performs Huffman encoding by referring to the K-bit binary data, the obtained scalar value, and a scalar value corresponding to the symbol one level higher than the current symbol on the bit plane, where K is an integer.
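As an illustration only, the mapping of K quantized samples onto a bit plane and the pairing of each symbol with the scalar value of the symbol one level above it can be sketched as follows. The sketch assumes non-negative sample magnitudes of 4 bits each and takes the scalar value of a symbol to be the integer value of its K bits; the Huffman tables themselves are not shown because the claim does not specify them. The function names `bit_plane_symbols` and `with_upper_context` are hypothetical.

```python
def bit_plane_symbols(samples, nbits=4):
    """Return one scalar value per significance level, MSB level first.

    Assumption: each symbol collects one bit from every sample, with the
    first sample in the leftmost bit position.
    """
    symbols = []
    for level in range(nbits - 1, -1, -1):      # MSB level down to LSB level
        value = 0
        for s in samples:
            value = (value << 1) | ((s >> level) & 1)
        symbols.append(value)
    return symbols


def with_upper_context(symbols):
    """Pair each symbol with the scalar value of the symbol one level above it.

    The topmost symbol has no upper neighbour, so its context is None.
    """
    return [(symbols[i - 1] if i else None, s) for i, s in enumerate(symbols)]
```

For the magnitudes 9, 6, 3 and 1, the four symbols are 1000, 0100, 0110 and 1011 in binary, with scalar values 8, 4, 6 and 11. An encoder along these lines could use each (upper value, current value) pair to choose among Huffman models.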