CN1532808A

CN1532808A - Method and device for encoding and/or decoding audio data using bandwidth extension technology

Info

Publication number: CN1532808A
Application number: CNA031650201A
Authority: CN
Inventors: 金重会; 金尚煜
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2003-03-22
Filing date: 2003-09-17
Publication date: 2004-09-29
Anticipated expiration: 2023-09-17
Also published as: KR100923301B1; KR20040086879A; CN1273955C

Abstract

The present invention provides methods and devices for encoding and decoding audio data using bandwidth extension techniques. The method includes: bandwidth-extension encoding audio data to output bandwidth-limited audio data and generate bandwidth extension information; arithmetically encoding the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer to control the bit rate; and multiplexing the arithmetically encoded bandwidth-limited audio data and the bandwidth extension information.

Description

Method and device for encoding and/or decoding audio data using bandwidth extension technology

This application claims foreign priority from Korean Patent Application No. 2003-17978, filed with the Korean Intellectual Property Office on March 22, 2003, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to encoding and decoding of audio data, and more particularly, to a method and device for encoding and/or decoding audio data using bandwidth extension technology.

Background Art

With the development of digital signal processing technology, audio signals are now mostly stored and played back as digital data. A digital audio storage and/or playback device samples and quantizes an analog audio signal, converts it into pulse code modulation (PCM) audio data, and stores the PCM audio data on an information storage medium such as a compact disc (CD) or digital versatile disc (DVD), so that the data can be played back whenever the user wants to listen to it. Compared with analog storage and/or reproduction on long-playing (LP) records, tapes and the like, digital storage and/or reproduction greatly improves sound quality and significantly reduces the degradation caused by long-term storage. However, the large amount of digital data involved sometimes causes storage and transmission problems.

To solve these problems, a variety of compression techniques that reduce the amount of digital audio data have been adopted. The Moving Picture Experts Group (MPEG) audio standard drafted by the International Organization for Standardization (ISO) and the AC-2/AC-3 technology of Dolby reduce the amount of data using a psychoacoustic model, effectively removing data that is irrelevant to the perceptible characteristics of the signal. As a result, the MPEG audio standard and AC-2/AC-3 provide almost CD-quality sound at bit rates of only 64 kbps to 384 kbps, that is, 1/6 to 1/8 of the bit rate of conventional digital coding.

However, all of the above techniques detect, quantize and encode digital data so as to be optimal at a fixed bit rate. Therefore, when digital data is transmitted over a network and the available transmission bandwidth shrinks because of network conditions, the connection may be broken and the network service interrupted. Similarly, when digital data must be converted into a smaller bit stream to suit a mobile device with limited storage capacity, it has to be re-encoded to reduce the amount of data, which requires considerable computation.

To address this, the applicant filed Korean Patent Application No. 97-61298, entitled "Audio Coding and/or Decoding Method and Apparatus Capable of Controlling Bit Rate Using Bit-Sliced Arithmetic Coding (BSAC) Technology", with the Korean Intellectual Property Office on November 19, 1997; it was granted on April 17, 2002 as Korean Patent Registration No. 261253. With BSAC, a bit stream encoded at a high bit rate can be converted into a bit stream with a lower bit rate. Because reconstruction can use only part of the bit stream, a service of moderate sound quality can still be provided from that part when the network is overloaded, the decoder performs poorly, or the user requests a low bit rate (although quality drops along with the bit rate). Indeed, at low bit rates, decoder performance inevitably degrades.
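What makes this bit-rate conversion cheap is that any leading prefix of the layers is itself decodable, so lowering the rate amounts to cutting each frame after the last layer that fits. A minimal sketch, assuming the byte size of every layer in a frame is known; the function names, frame length, sample rate and layer sizes are hypothetical and not taken from BSAC:

```python
from typing import List


def frame_budget_bytes(bitrate_bps: int, samples_per_frame: int, sample_rate: int) -> int:
    """Bytes available for one frame at the requested bit rate (rounded down)."""
    return bitrate_bps * samples_per_frame // (sample_rate * 8)


def layers_that_fit(layer_sizes: List[int], budget_bytes: int) -> int:
    """Number of leading layers (base layer first) that fit in budget_bytes.

    layer_sizes[0] is the base layer; later entries are enhancement layers
    in decoding order. Each layer only refines the ones before it, so the
    frame can be cut after any layer and the prefix is still decodable.
    """
    used = 0
    kept = 0
    for size in layer_sizes:
        if used + size > budget_bytes:
            break
        used += size
        kept += 1
    return kept


# Hypothetical example: 1024-sample frames at 48 kHz, re-served at 64 kbps.
budget = frame_budget_bytes(64_000, 1024, 48_000)   # 170 bytes per frame
print(layers_that_fit([120, 20, 20, 20, 20], budget))  # 3
```

No re-encoding happens here: the server only drops bytes from the end of each frame, which is the property that distinguishes FGS coding from the fixed-rate schemes discussed above.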

However, BSAC transforms the audio signal using the modified discrete cosine transform (MDCT), which severely degrades the sound quality obtained from the lower layers. Because the MDCT has a constant frequency resolution, its resolution is needlessly high in the regions to which, according to the psychoacoustic model, the human ear is insensitive. Consequently, with the MDCT, sound quality deteriorates as decoding drops from the enhancement layers to the lower layers.

Summary of the Invention

The present invention provides an audio data encoding and/or decoding method and apparatus that can control the bit rate of audio data and reproduce high-quality sound even when only part of the bit stream is used for reconstruction.

The present invention also provides an audio data encoding and/or decoding method and apparatus that structure the bit stream so that high-quality sound can be produced even from a lower layer.

According to an aspect of the present invention, a method of encoding audio data is provided. The method includes: bandwidth-extension encoding audio data to output bandwidth-limited audio data and generate bandwidth extension information; arithmetically encoding the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control the bit rate; and multiplexing the arithmetically encoded bandwidth-limited audio data and the bandwidth extension information.

The arithmetic encoding may include: differentially encoding side information corresponding to the base layer; bit-slice encoding a plurality of quantized sample values corresponding to the base layer; and repeating the differential encoding and bit-sliced encoding for the next enhancement layer until a predetermined number of layers have been encoded.
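Differential coding pays off because side information such as scale factors usually changes little from one band to the next, so the differences are small numbers that an entropy coder can represent cheaply. A minimal sketch of just the differencing step, assuming integer scale factors; the arithmetic-coding stage that would follow is omitted, and the function names and values are hypothetical:

```python
from typing import List


def diff_encode(values: List[int]) -> List[int]:
    """Send the first value as-is and every later value as the change from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def diff_decode(deltas: List[int]) -> List[int]:
    """Invert diff_encode by accumulating the differences."""
    values: List[int] = []
    for i, d in enumerate(deltas):
        values.append(d if i == 0 else values[-1] + d)
    return values


scale_factors = [60, 62, 61, 61, 58]   # hypothetical base-layer scale factors
deltas = diff_encode(scale_factors)    # [60, 2, -1, 0, -3]
assert diff_decode(deltas) == scale_factors
```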

More specifically, the arithmetic encoding may include: differentially encoding side information corresponding to the base layer, the side information including scale factor information and coding model information; bit-slice encoding, with reference to the coding model information, a plurality of quantized sample values corresponding to the base layer; and repeating the differential encoding and bit-sliced encoding for the next enhancement layer until a predetermined number of layers have been encoded.

The quantized sample values are preferably obtained by applying a pseudo-wavelet transform to the audio data.

The encoded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: first the portion of the encoded bandwidth-limited audio data corresponding to the base layer, then the bandwidth extension information, and then the portions of the encoded bandwidth-limited audio data corresponding to the remaining enhancement layers.

Alternatively, the encoded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: first the bandwidth extension information, then the portion of the encoded bandwidth-limited audio data corresponding to the base layer, and then the portions of the encoded bandwidth-limited audio data corresponding to the remaining enhancement layers.
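The two frame layouts can be sketched as a simple concatenation. This assumes each layer and the BWE information are already encoded byte strings; the function name and the placeholder payloads are hypothetical, and real frames would also carry headers or length fields that are not shown:

```python
from typing import List


def multiplex(layers: List[bytes], bwe_info: bytes, bwe_first: bool = False) -> bytes:
    """Lay out one frame in either of the two orders described above.

    layers[0] is the encoded base layer; layers[1:] are the encoded
    enhancement layers in decoding order.
    """
    if not layers:
        raise ValueError("at least the base layer is required")
    base, enhancements = layers[0], layers[1:]
    head = [bwe_info, base] if bwe_first else [base, bwe_info]
    return b"".join(head + enhancements)


print(multiplex([b"BASE", b"E1", b"E2"], b"bwe"))                  # b'BASEbweE1E2'
print(multiplex([b"BASE", b"E1", b"E2"], b"bwe", bwe_first=True))  # b'bweBASEE1E2'
```

In both layouts the BWE information precedes every enhancement layer, so cutting a frame from the end to lower the bit rate removes enhancement data first and leaves the BWE information intact.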

According to another aspect of the present invention, a method of decoding audio data is provided. The method includes: demultiplexing an input audio bit stream to extract bandwidth extension information and bandwidth-limited audio data encoded in a layered structure having a base layer and at least one enhancement layer; arithmetically decoding at least the portion of the bandwidth-limited audio data corresponding to the base layer; and, based on the decoded portion of the bandwidth-limited audio data and with reference to the bandwidth extension information, generating audio data in at least part of the frequency band not covered by the decoded portion, and then adding the generated audio data to the decoded portion of the bandwidth-limited audio data.

The audio data in that frequency band may be generated so as to extend up to the boundary of the decoded portion of the bandwidth-limited audio data. Alternatively, it may be generated so as to extend up to a boundary of the filter bank used for the pseudo-wavelet transform. If the generated audio data does not reach a boundary of that filter bank, an overlapping region between the decoded portion of the bandwidth-limited audio data and the generated audio data is inserted.

The input audio bit stream is demultiplexed in the following order: the data corresponding to the base layer is extracted first, then the bandwidth extension information, and then the data corresponding to the remaining enhancement layers.

Alternatively, the input audio bit stream is demultiplexed in the following order: the bandwidth extension information is extracted first, then the data corresponding to the base layer, and then the data corresponding to the remaining enhancement layers.
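On the decoding side, the same two layouts can be split apart again. A minimal sketch, assuming the decoder already knows the byte lengths of the base layer and of the BWE information (for example from a frame header, which is not modeled here); the function name and payloads are hypothetical. Whatever follows is the enhancement data that survived any truncation:

```python
from typing import Tuple


def demultiplex(frame: bytes, base_len: int, bwe_len: int,
                bwe_first: bool = False) -> Tuple[bytes, bytes, bytes]:
    """Split one frame into (base layer, BWE info, enhancement-layer bytes).

    The trailing bytes hold however many enhancement layers were kept and
    would still have to be parsed layer by layer.
    """
    if bwe_first:
        bwe = frame[:bwe_len]
        base = frame[bwe_len:bwe_len + base_len]
    else:
        base = frame[:base_len]
        bwe = frame[base_len:base_len + bwe_len]
    return base, bwe, frame[base_len + bwe_len:]


# A frame whose last enhancement layer (b"E2") was dropped in transit:
print(demultiplex(b"BASEbweE1", base_len=4, bwe_len=3))  # (b'BASE', b'bwe', b'E1')
```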

The arithmetic decoding may include: differentially decoding side information corresponding to the base layer; bit-slice decoding a plurality of quantized sample values corresponding to the base layer; and repeating the differential decoding and bit-sliced decoding for the next enhancement layer until a predetermined number of layers have been decoded.

More specifically, the arithmetic decoding may include: differentially decoding side information corresponding to the base layer, the side information including scale factor information and coding model information; bit-slice decoding, with reference to the coding model information, a plurality of quantized sample values corresponding to the base layer; and repeating the differential decoding and bit-sliced decoding for the next enhancement layer until a predetermined number of layers have been decoded.

According to still another aspect of the present invention, an apparatus for encoding audio data is provided. The apparatus includes: a bandwidth extension encoder that bandwidth-extension encodes audio data, outputs bandwidth-limited audio data, and generates bandwidth extension information; a fine-grain scalable encoder that arithmetically encodes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control the bit rate; and a multiplexer that multiplexes the arithmetically encoded bandwidth-limited audio data and the bandwidth extension information.

The fine-grain scalable encoder differentially encodes the side information corresponding to the base layer, bit-slice encodes the quantized sample values corresponding to the base layer, and then encodes the side information and bit-slice encodes the quantized sample values corresponding to the next enhancement layer, repeating until a predetermined number of layers have been encoded.

More specifically, the fine-grain scalable encoder differentially encodes the side information corresponding to the base layer, including scale factor information and coding model information; bit-slice encodes, with reference to the coding model information, the quantized sample values corresponding to the base layer; and, for each next enhancement layer until a predetermined number of layers have been encoded, encodes the side information including scale factor information and coding model information and bit-slice encodes the corresponding quantized sample values. The fine-grain scalable encoder preferably obtains the quantized sample values by applying a pseudo-wavelet transform to the audio data.

The multiplexer multiplexes the encoded bandwidth-limited audio data and the bandwidth extension information in the following order: first the portion of the encoded bandwidth-limited audio data corresponding to the base layer, then the bandwidth extension information, and then the portions of the encoded bandwidth-limited audio data corresponding to the remaining enhancement layers.

According to yet another aspect of the present invention, an apparatus for decoding audio data is provided. The apparatus includes: a demultiplexer that demultiplexes an input audio bit stream to extract bandwidth extension information and bandwidth-limited audio data encoded in a layered structure having a base layer and at least one enhancement layer; a fine-grain scalable arithmetic decoder that decodes at least the portion of the bandwidth-limited audio data corresponding to the base layer; and a bandwidth extension decoder that, based on the decoded portion of the bandwidth-limited audio data and with reference to the bandwidth extension information, generates audio data in at least part of the frequency band not covered by the decoded portion and then adds the generated audio data to the decoded portion of the bandwidth-limited audio data.

The fine-grain scalable arithmetic decoder differentially decodes the side information corresponding to the base layer, bit-slice decodes the quantized sample values corresponding to the base layer, and then decodes the side information and bit-slice decodes the quantized sample values corresponding to the next enhancement layer, repeating until a predetermined number of layers have been fully decoded.

The demultiplexer demultiplexes the input audio bit stream in the following order: it extracts the data corresponding to the base layer, then the bandwidth extension information, and then the data corresponding to the remaining enhancement layers. Alternatively, the demultiplexer may extract the bandwidth extension information first, then the data corresponding to the base layer, and then the data corresponding to the remaining enhancement layers.

Brief Description of the Drawings

The features and advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of an encoding apparatus according to the present invention;

FIG. 2 is a detailed block diagram of the encoding apparatus shown in FIG. 1;

FIG. 3 is a block diagram of a decoding apparatus according to the present invention;

FIG. 4 is a detailed block diagram of the decoding apparatus shown in FIG. 3;

FIG. 5 shows the structure of a bit stream output from the fine-grain scalable (FGS) encoder 2;

FIG. 6 shows the detailed structure of the side information shown in FIG. 5;

FIG. 7 shows the structure of a bit stream output from the multiplexer 3 or input to the demultiplexer 7;

FIG. 8 is a schematic diagram explaining the arithmetic encoding and decoding methods performed by the encoding and decoding apparatuses according to the present invention;

FIG. 9 is a schematic diagram explaining in more detail the bandwidth extension decoding performed by the bandwidth extension (BWE) decoder 9;

FIG. 10 is a flowchart illustrating an encoding method according to the present invention; and

FIG. 11 is a flowchart illustrating a decoding method according to the present invention.

Detailed Description of the Embodiments

Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an encoding apparatus according to the present invention. Referring to FIG. 1, the encoding apparatus, which receives and encodes PCM audio data and outputs the result as an audio bit stream, includes a bandwidth extension (BWE) encoder 1, a fine-grain scalable (FGS) encoder 2, and a multiplexer 3.

The BWE encoder 1 BWE-encodes the PCM audio signal, outputs bandwidth-limited audio data, and generates BWE information. BWE encoding refers to a technique that receives audio data, separates out the portion of the audio data in a high frequency band, and generates the side information needed to restore that separated portion. The remaining audio data is referred to as "bandwidth-limited audio data", and the side information is referred to as "BWE information". One example of BWE technology is Spectral Band Replication (SBR), developed by Coding Technologies. SBR is described in detail in Convention Paper 5560, presented at the 112th Audio Engineering Society (AES) Convention, May 10-13, 2002.
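To make the split between "bandwidth-limited audio data" and "BWE information" concrete, here is a deliberately simplified sketch. It is not SBR: real SBR works on a QMF filter bank with time/frequency grids, tonality and noise-floor parameters, none of which are modeled. It assumes one frame of spectral coefficients, a hypothetical crossover bin, and a coarse energy envelope standing in for the BWE information:

```python
from typing import List, Tuple


def bwe_encode(spectrum: List[float], crossover: int,
               band_width: int) -> Tuple[List[float], List[float]]:
    """Split one frame's spectral coefficients at bin `crossover`.

    Returns (band_limited, envelope). band_limited is what would go to the
    core (FGS) coder; envelope is the mean energy of each group of
    `band_width` bins above the crossover and plays the role of the
    BWE information in this sketch.
    """
    if crossover < 0 or band_width <= 0:
        raise ValueError("invalid crossover or band width")
    band_limited = spectrum[:crossover]
    high = spectrum[crossover:]
    envelope = []
    for start in range(0, len(high), band_width):
        band = high[start:start + band_width]
        envelope.append(sum(x * x for x in band) / len(band))
    return band_limited, envelope


low, env = bwe_encode([1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 3.0, 3.0], crossover=4, band_width=2)
print(low, env)  # [1.0, -1.0, 1.0, -1.0] [4.0, 9.0]
```

Only two numbers describe the four high-band coefficients here, which is where the bit saving of bandwidth extension comes from.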

The FGS encoder 2 encodes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control the bit rate. FGS encoding refers to techniques that encode data into a multilayer structure to control the bit rate, that is, to provide fine-grain scalability. The BSAC technique disclosed in Korean Patent Application No. 97-61298 is one example of FGS encoding. The FGS encoder 2 differentially encodes the side information corresponding to the base layer and bit-slice encodes the quantized sample values corresponding to the base layer; it then differentially encodes the side information and bit-slice encodes the quantized sample values corresponding to the next enhancement layer, repeating until a predetermined number of layers have been encoded. Here, the side information includes scale factor information and coding model information, and the quantized sample values are obtained by transforming and quantizing the input audio data. The side information and quantized sample values are described in detail below.

The multiplexer 3 multiplexes the bandwidth-limited PCM audio data encoded by the FGS encoder 2 and the BWE information generated by the BWE encoder 1.

FIG. 2 is a detailed block diagram of the encoding apparatus shown in FIG. 1. Referring to FIG. 2, the encoding apparatus includes the BWE encoder 1, the FGS encoder 2, and the multiplexer 3. Blocks that perform the same functions as in FIG. 1 have the same reference numerals, and their description is not repeated.

Specifically, the FGS encoder 2 includes a pseudo-wavelet transform (PWT) unit 21, a psychoacoustic unit 22, a quantization unit 23, and an FGS arithmetic coding unit 24.

The PWT unit 21 receives time-domain PCM audio data and, with reference to the psychoacoustic model information provided by the psychoacoustic unit 22, applies a pseudo-wavelet transform to convert it into a frequency-domain audio signal. In the time domain, the audio signal components that humans can perceive (hereinafter, perceptual audio signals) do not differ much from those they cannot. In the frequency domain, however, the psychoacoustic model shows that perceptible and imperceptible components differ considerably, so compression efficiency can be improved by allocating a different number of bits to each frequency band. With the MDCT, even slight frequency distortion, caused by its high frequency resolution in the low-frequency band, produces perceptible noise. The PWT, by contrast, has a moderate time/frequency resolution and therefore provides stable sound quality even from the lower layers, which cover only the lower frequency bands.

The psychoacoustic unit 22 provides the PWT unit 21 with psychoacoustic model information such as attack detection information, groups the audio signal transformed by the PWT unit 21 into subband signals, calculates a masking threshold for each subband using the masking effect produced by interaction between the subband signals, and provides the masking thresholds to the quantization unit 23. The masking threshold is the maximum power of a signal that remains inaudible to humans because of this interaction. In this embodiment, the psychoacoustic unit 22 also calculates masking thresholds and related values for the stereo component using the binaural masking level difference (BMLD).

The quantization unit 23 scalar-quantizes each subband signal based on the corresponding scale factor information so that the quantization noise energy in each subband is below the masking threshold provided by the psychoacoustic unit 22, and outputs the quantized sample values; the subband signals thus remain audible while the noise in them does not. In other words, the quantization unit 23 quantizes the subband signals so that the noise-to-mask ratio (NMR), i.e., the ratio of the noise generated in each subband to the masking threshold calculated by the psychoacoustic unit 22, is 0 dB or less over the full bandwidth. An NMR of 0 dB or less means that the quantization noise is inaudible.
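The NMR criterion can be checked numerically. A minimal sketch, assuming a uniform scalar quantizer whose step size stands in directly for the scale factor (how the scale factor information maps to a step size is not specified here), and using the standard high-resolution approximation that uniform quantization noise has power step²/12; all names and values are hypothetical:

```python
import math
from typing import List


def quantize(subband: List[float], step: float) -> List[int]:
    """Uniform scalar quantization with one step size for the whole subband."""
    return [round(x / step) for x in subband]


def dequantize(indices: List[int], step: float) -> List[float]:
    return [q * step for q in indices]


def nmr_db(original: List[float], reconstructed: List[float], mask_power: float) -> float:
    """Noise-to-mask ratio in dB: 10*log10(mean noise power / masking threshold)."""
    noise = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return 10 * math.log10(noise / mask_power) if noise > 0 else -math.inf


def step_for_mask(mask_power: float) -> float:
    """Largest step whose expected noise (step**2 / 12) equals the mask.

    This relies on an approximation, so the resulting NMR is close to,
    not guaranteed below, 0 dB for any particular block of samples.
    """
    return math.sqrt(12.0 * mask_power)


subband = [0.8, -0.3, 0.55, 0.1]   # hypothetical subband samples
mask = 0.001                       # hypothetical masking threshold (power)
step = step_for_mask(mask)
print(nmr_db(subband, dequantize(quantize(subband, step), step), mask))
```

A coarser step saves bits but raises the NMR; the quantizer described above picks, per subband, a step that keeps the NMR at or below 0 dB.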

The FGS arithmetic coding unit 24 encodes the quantized sample values and the side information belonging to each layer into a layered structure. The side information includes scale band information, coding band information, scale factor information, and coding model information for each layer. The scale band information and coding band information may be packed into the header of each frame of the audio bit stream and transmitted to the decoding apparatus. Alternatively, they may be encoded and packed as the side information of each layer and then transmitted to the decoding apparatus. Further, if the scale band information and coding band information are already stored in the decoding apparatus, they need not be transmitted at all.

In more detail, the FGS arithmetic coding unit 24 differentially encodes the side information of the first layer, which includes scale factor information and coding model information, and bit-slice encodes the quantized sample values of the first layer with reference to the coding model information. Bit-sliced encoding, the coding used in the BSAC technique described above, losslessly encodes the most significant bits of the sample values first, then the next most significant bits, and so on down to the least significant bits. The second layer is processed in the same way as the first, so that a predetermined number of layers are encoded one after another. The first layer is called the base layer, and the remaining layers are called enhancement layers. The layered structure is described in more detail later.
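The bit-slicing order can be sketched separately from the arithmetic coder and BSAC's coding models, both of which are omitted here. In this simplification signs are kept in a separate list, which is an assumption and not necessarily how BSAC transmits them; the function names are hypothetical. Decoding only the first few planes yields a coarser approximation of every sample at once, which is why the stream can be truncated gracefully:

```python
from typing import List, Tuple


def bit_slice(values: List[int]) -> Tuple[List[List[int]], List[int], int]:
    """Split quantized integers into magnitude bit planes, most significant first.

    Returns (planes, signs, n_bits): planes[0] holds the MSB of every value,
    planes[-1] the LSB. Signs are kept separately in this simplified sketch.
    """
    mags = [abs(v) for v in values]
    signs = [-1 if v < 0 else 1 for v in values]
    n_bits = max(mags, default=0).bit_length()
    planes = [[(m >> b) & 1 for m in mags] for b in range(n_bits - 1, -1, -1)]
    return planes, signs, n_bits


def unslice(planes: List[List[int]], signs: List[int], n_bits: int) -> List[int]:
    """Rebuild values from the planes received so far; missing low planes count as 0."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = n_bits - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [s * m for s, m in zip(signs, mags)]


planes, signs, n = bit_slice([5, -3, 0, 6])
print(planes)                          # [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
print(unslice(planes[:1], signs, n))   # [4, 0, 0, 4]
print(unslice(planes, signs, n))       # [5, -3, 0, 6]
```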

The scale band information is needed to quantize appropriately according to the frequency characteristics of the audio signal. When the frequency domain is divided into a plurality of bands and each band is assigned an appropriate scale factor, the scale band information indicates the scale bands corresponding to each layer. Each layer therefore belongs to at least one scale band, and each scale band is assigned one scale factor. Likewise, the coding band information is needed to encode appropriately according to the frequency characteristics of the audio signal: when the frequency domain is divided into a plurality of bands and each band is assigned an appropriate coding model, the coding band information indicates the coding bands corresponding to each layer. The scale bands and coding bands are divided appropriately through experiments, after which the corresponding scale factors and coding models are determined.

The multiplexer 3 multiplexes the encoded bandwidth-limited audio data and the BWE information in the following order: first the encoded quantized sample data corresponding to the base layer, then the BWE information, and then the encoded quantized sample data corresponding to the remaining enhancement layers. Alternatively, the multiplexer 3 may use the following order: first the BWE information, then the encoded quantized sample data corresponding to the base layer, and then the encoded quantized sample data corresponding to the remaining enhancement layers.

FIG. 3 is a block diagram of a decoding apparatus according to the present invention. Referring to FIG. 3, the decoding apparatus, which receives and decodes an audio bit stream and then outputs audio data, includes a demultiplexer 7, an FGS decoder 8, and a BWE decoder 9.

The demultiplexer 7 demultiplexes an input audio bit stream to extract the BWE information and the bandwidth-limited audio data, which has been encoded into a layered structure having a base layer and at least one enhancement layer. The bandwidth-limited audio data and the BWE information are the same as described with reference to FIG. 1. The FGS decoder 8 arithmetically decodes at least the portion of the bandwidth-limited audio data corresponding to the base layer. How many layers are decoded depends on factors such as network conditions and user selection.

Based on the portion of the bandwidth-limited audio data arithmetically decoded by the FGS decoder 8, and with reference to the BWE information extracted by the demultiplexer 7, the BWE decoder 9 generates audio data in at least part of the frequency band not covered by that decoded portion, and then adds the generated audio data to the bandwidth-limited audio data decoded by the FGS decoder 8.

Because the present invention uses the PWT, the BWE decoder 9 performs the following process. When decoding is performed with the PWT, the crossover frequency is selected by determining the last point in the frequency domain while the bandwidth-limited audio data is being determined. Because the PWT has low frequency resolution in the high-frequency region, it cannot limit the bandwidth at the determined last point as precisely as the MDCT can. During decoding, the BWE decoder 9 therefore arranges the core part produced by the FGS decoder 8 in the frequency domain, identifies the frequency bandwidth of the core part, and modifies and decodes the BWE part so that it fits the appropriate frequency bandwidth.

For example, suppose that only 8 layers of a 16-layer bitstream encoded at 64 kbps are reconstructed, and that layer 8 corresponds to a frequency of 8.5 kHz. In this case, the BWE decoder 9 must reconstruct data in the range from 8.5 kHz to 15 kHz or wider. Because of the characteristics of the quadrature mirror filter (QMF) bank, the BWE decoder 9 can adjust the frequency bandwidth only in units of the QMF channel bandwidth. If the n-th QMF band boundary lies at 8.3 kHz, the frequency components in the 8.3 kHz to 8.5 kHz range fall in a region that could belong to either the core part or the BWE part, so this overlap must be handled correctly.

The first way to handle the core part and the BWE part is to remove the frequency components in the 8.3 kHz to 8.5 kHz range from the core part; in this method, the FGS decoder 8 performs decoding taking the bandwidth information of the BWE part into account. The second way is to filter the core-part data with the QMF bank used in the BWE decoder 9, generate QMF data by interpolation, and reconstruct the core-part data by inverse QMF filtering of that data.
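The boundary problem in the example above, and the first handling method, can be sketched as follows. The sketch assumes a uniform QMF bank with a hypothetical 64 channels spanning 0 to Fs/2, and a plain list of spectral values with their frequencies; it does not implement the second (interpolating) method, and the function names are illustrative only.

```python
import math


def qmf_split(core_edge_hz, fs_hz, num_channels=64):
    """Locate the QMF channel that straddles the core/BWE crossover.

    Assumes a uniform QMF bank of num_channels channels over 0..fs/2.
    Returns (channel index, lower boundary of that channel in Hz,
    (low, high) overlap band that must go to either the core or BWE part).
    """
    width = fs_hz / 2 / num_channels
    n = math.floor(core_edge_hz / width)
    lower = n * width
    return n, lower, (lower, core_edge_hz)


def trim_core(spectrum, freqs, cutoff_hz):
    """Method 1: zero core-part components at or above the QMF boundary,
    so that the BWE part alone covers the overlap region."""
    return [x if f < cutoff_hz else 0.0 for x, f in zip(spectrum, freqs)]
```

For instance, with an assumed Fs of 48 kHz and 64 channels (375 Hz per channel), a core edge of 8.5 kHz falls in channel 22, whose lower boundary is 8.25 kHz, so the 8.25 kHz to 8.5 kHz components are the ones that method 1 removes from the core part.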

As described above, even when the FGS decoder 8 decodes only the baseband audio data, the BWE decoder 9 creates the audio data of the missing bands and adds it to the baseband audio data. The quality of the decoded audio data can therefore be improved.

FIG. 4 is a detailed block diagram of the decoding apparatus shown in FIG. 3. As shown in FIG. 4, the decoding apparatus includes a demultiplexer 7, an FGS decoder 8, and a BWE decoder 9. Blocks that perform the same functions as in FIG. 3 are given the same reference numerals, and their description is not repeated here.

Specifically, to control the bit rate, the FGS decoder 8 performs decoding up to a target layer, which is determined by the network state, the performance of the decoding apparatus, user selection, and the like. The FGS decoder 8 includes an FGS arithmetic decoding unit 81, an inverse quantization unit 82, and a PWT inverse transform unit 83. The FGS arithmetic decoding unit 81 decodes the audio bitstream up to the target layer. More specifically, the unit 81 decodes the side information of each layer, which includes the scale factor information and coding model information of that layer, and then arithmetically decodes the encoded quantized sample values of each layer based on the coding model information thus obtained, yielding the quantized sample values. The process of obtaining the quantized sample values is explained in detail below.

The scale band information and the coding band information can be obtained from the header information of the audio bitstream or by decoding the side information of each layer. Alternatively, the decoding apparatus may store the scale band information and the coding band information in advance.

The inverse quantization unit 82 inverse-quantizes the quantized sample values of each layer based on the scale factor information of that layer, thereby reconstructing the sample values. The PWT inverse transform unit 83 maps the reconstructed sample values from the frequency domain to the time domain by an inverse pseudo-wavelet transform and outputs the resulting time-domain PCM audio data.
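Band-wise inverse quantization can be sketched as below: each scale band has one scale factor, and every quantized value in the band is scaled by the step size derived from it. The patent only states that the scale factor conveys the quantization step size, so the rule `step = 2 ** (sf / 4)` and the `band_edges` layout are assumptions made for illustration.

```python
def dequantize(q_values, band_edges, scale_factors):
    """Inverse-quantize quantized spectral lines band by band.

    band_edges lists the first line of each scale band plus a final end index;
    scale_factors holds one scale factor per band. The step-size rule
    2 ** (sf / 4) is an assumed example, not taken from the patent.
    """
    if len(band_edges) != len(scale_factors) + 1:
        raise ValueError("need one scale factor per scale band")
    out = []
    for b, sf in enumerate(scale_factors):
        step = 2.0 ** (sf / 4)  # assumed mapping from scale factor to step size
        out.extend(q * step for q in q_values[band_edges[b]:band_edges[b + 1]])
    return out
```

For example, `dequantize([1, -2, 3, 0], [0, 2, 4], [0, 8])` scales the first band by 1 and the second band by 4, giving `[1.0, -2.0, 12.0, 0.0]`.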

The BWE decoder 9 includes a transform unit 91, a high-frequency generation unit 92, an adjustment unit 93, and a synthesis unit 94. The transform unit 91 transforms the time-domain PCM audio data output from the PWT inverse transform unit 83 into frequency-domain data, which is referred to as the low-frequency part. The high-frequency generation unit 92 creates the part not covered by the frequency-domain data, namely the high-frequency part, by copying the low-frequency part with reference to the BWE information and placing the copy above the original low-frequency part. The adjustment unit 93 adjusts the level of the high-frequency part generated by the high-frequency generation unit 92 using the envelope information included in the BWE information. The envelope information, transmitted from the encoding side, describes the envelope of the audio data in the high-frequency part that the encoder split off during BWE encoding. The synthesis unit 94 combines the low-frequency part output from the transform unit 91 with the high-frequency part output from the adjustment unit 93 and outputs PCM audio data.
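The copy-up and envelope-adjustment steps of units 92 to 94 can be sketched on a plain list of frequency-domain coefficients. This is a simplified illustration under stated assumptions: the envelope is taken to be one target RMS value per equal-width high-frequency band (a hypothetical representation of the envelope information), and real BWE decoders typically operate on QMF subband samples rather than a single coefficient list.

```python
import math


def bwe_reconstruct(low, num_high, envelope):
    """Sketch of units 92-94: copy-up, envelope adjustment, and synthesis.

    low: decoded low-frequency coefficients.
    num_high: number of missing high-frequency coefficients to create.
    envelope: assumed target RMS per equal-width high-frequency band.
    """
    if not low or not envelope:
        raise ValueError("need a low-frequency part and envelope data")
    # Unit 92: fill the high band by repeatedly copying the low band upward.
    high = [low[i % len(low)] for i in range(num_high)]
    # Unit 93: rescale each envelope band so its RMS matches the target value.
    band = math.ceil(num_high / len(envelope))
    for b, target in enumerate(envelope):
        seg = range(b * band, min((b + 1) * band, num_high))
        rms = math.sqrt(sum(high[i] ** 2 for i in seg) / max(len(seg), 1))
        gain = target / rms if rms > 0 else 0.0
        for i in seg:
            high[i] *= gain
    # Unit 94: place the adjusted high band directly above the low band.
    return list(low) + high
```

For example, copying `[1.0, -1.0, 2.0, -2.0]` upward with two envelope bands of target RMS 0.5 and 3.0 yields the high part `[0.5, -0.5, 3.0, -3.0]`: the spectral fine structure comes from the low band, while the level comes from the transmitted envelope.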

As described above, although the FGS decoder 8 decodes only the baseband audio data, the BWE decoder 9 reconstructs the missing-band audio data and adds it to the baseband audio data, which improves the quality of the decoded audio.

FIG. 5 shows the structure of the bitstream output from the FGS encoder 2. As shown in FIG. 5, the FGS encoder 2 encodes a bitstream frame by mapping the quantized sample values and side information into a fine-grained scalable (FGS) layered structure. That is, the frame has a layered structure in which the bitstream of each lower layer is contained in the bitstream of the enhancement layers above it. The side information needed by each layer is encoded layer by layer.

A header area storing the header information is located at the beginning of the bitstream, followed by the layer-0 information and then, in order, the information of enhancement layers 1 to N. The base layer spans from the header area to the layer-0 information, layer 1 spans from the header area to the layer-1 information, and layer 2 spans from the header area to the layer-2 information. Likewise, the highest enhancement layer spans from the header area to the layer-N information, that is, it contains the base layer through layer N. Both side information and encoded data are stored as the information of each layer; for example, side information 2 and the encoded quantized sample values of layer 2 are stored as the layer-2 information. Here, N is a natural number greater than or equal to 1.
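Because layer k spans from the header through the layer-k information, valid truncation points are simply cumulative sums of the layer sizes. The sketch below computes them and picks the highest layer that fits a byte budget, which is one plausible way (assumed here, not stated in the patent) to choose the target layer from the available bit rate. Sizes are in bytes and purely illustrative.

```python
from itertools import accumulate


def layer_end_offsets(header_size, layer_sizes):
    """Byte offset (exclusive) at which the layer-k bitstream ends.

    Layer k spans the header plus the information of layers 0..k, so
    truncating the frame at offsets[k] yields a decodable layer-k stream.
    """
    return list(accumulate(layer_sizes, initial=header_size))[1:]


def highest_layer_within(budget, header_size, layer_sizes):
    """Highest layer whose cumulative size fits in budget bytes, or -1 if
    even the base layer does not fit."""
    best = -1
    for k, end in enumerate(layer_end_offsets(header_size, layer_sizes)):
        if end <= budget:
            best = k
    return best
```

With a 4-byte header and layer sizes of 10, 5, 5, and 8 bytes, the truncation points are 14, 19, 24, and 32, so a 25-byte budget allows decoding up to layer 2.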

FIG. 6 shows the detailed structure of the side information shown in FIG. 5. As shown in FIG. 6, the side information and the encoded quantized sample values are stored as the information of any given layer. In the present embodiment, because the quantized sample values are arithmetically encoded, the side information includes arithmetic coding model information, scale factor information, channel side information, and other side information. The arithmetic coding model information is index information of the arithmetic coding model used to encode or decode the quantized sample values contained in the corresponding layer. The scale factor information indicates the quantization step size suitable for quantizing or inverse-quantizing the audio data contained in the corresponding layer. The channel side information is channel-related information such as mid/side (M/S) stereo. The other side information is flag information indicating whether M/S stereo is used.

In the present embodiment, the FGS encoder 2 of the encoding apparatus differentially encodes the side information, including the arithmetic coding model information and the scale factor information. Since each scale band has one scale factor, the scale factors are encoded by first arithmetically encoding the smallest of the scale factors of the scale bands and then arithmetically encoding the differences between that smallest scale factor and the other scale factors. The arithmetic coding model information of each coding band within the allowed bit range is encoded in the same way as the quantization step size, that is, by differential coding.

In the present embodiment, the FGS decoder 8 of the decoding apparatus arithmetically decodes the side information, including the arithmetic coding model information and the scale factor information. Since each scale band has one scale factor, the scale factors are decoded by first arithmetically decoding the smallest of the scale factors of the scale bands and then arithmetically decoding the differences between that smallest scale factor and the other scale factors. The arithmetic coding model information of each coding band within the allowed bit range is arithmetically decoded in the same way as the scale factors.
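The differential scheme for scale factors described in the two preceding paragraphs can be sketched as a pair of functions. The integers produced by `encode_scale_factors` would then be passed to the arithmetic coder, which is not shown; this sketch also emits a (zero) difference for the minimum itself so that band positions are preserved, which is an assumption about the symbol layout.

```python
def encode_scale_factors(scale_factors):
    """Differential representation: the minimum, then each band's offset from it."""
    if not scale_factors:
        return []
    sf_min = min(scale_factors)
    return [sf_min] + [sf - sf_min for sf in scale_factors]


def decode_scale_factors(symbols):
    """Inverse of encode_scale_factors: add the minimum back to each offset."""
    if not symbols:
        return []
    sf_min, diffs = symbols[0], symbols[1:]
    return [sf_min + d for d in diffs]
```

For scale factors `[40, 38, 45, 38]` the encoder emits `[38, 2, 0, 7, 0]`; the offsets are small non-negative integers, which an adaptive arithmetic coding model can represent cheaply.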

FIG. 7 shows the structure of the bitstream output from the multiplexer 3 or input to the demultiplexer 7. As shown in FIG. 7, layer 0, that is, the base layer encoded by the FGS encoder 2, is located at the beginning of the bitstream, followed by the BWE information, which is followed in turn by the enhancement layers, namely layer 1, layer 2, ..., and layer N. Even if the decoding side receives or decodes only the base layer, it can create the audio data of the missing layers from the decoded base-layer audio data by referring to the BWE information.

FIG. 8 is a diagram for explaining the arithmetic encoding and decoding method performed by the encoding and decoding apparatuses according to the present invention. In FIG. 8, the dotted rectangles represent the spectral lines that make up the quantized sample values; A denotes the lines forming the boundaries between layers, and B denotes the boundary lines that divide the spectral lines so as to correspond to the terminal nodes of the PWT tree structure.

The PWT and inverse PWT used in the encoding and decoding methods according to the present invention perform the frequency transform and inverse frequency transform using a tree structure, so that frequencies are represented in a way that more closely resembles the filter bank of the human ear. Each terminal node of the tree structure corresponds to one scale band for arithmetic coding, so each terminal node corresponds to one scale factor.
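The relationship between tree terminal nodes and scale bands can be illustrated with a simple tree-structured transform. The patent does not give the PWT filters or tree shape, so this sketch swaps in Haar analysis filters and a dyadic tree that repeatedly splits only the low band; this gives finer frequency resolution at low frequencies and coarser resolution at high frequencies, as the PWT is described as having.

```python
import math


def haar_split(x):
    """One two-band Haar analysis step: (low band, high band), each half length."""
    s = math.sqrt(2.0)
    low = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return low, high


def tree_transform(x, levels):
    """Split the low band `levels` times; return terminal-node bands, low to high.

    In the scheme described above, each returned band would be one scale band
    carrying one scale factor. Haar filters are a stand-in for the PWT filters.
    """
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details, low = [], list(x)
    for _ in range(levels):
        low, high = haar_split(low)
        details.append(high)  # highest-frequency detail band is appended first
    return [low] + details[::-1]
```

For an 8-sample input and two levels, the terminal nodes have 2, 2, and 4 coefficients: the two low-frequency bands are narrow and the top band is wide.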

The coding band, which is the unit in which the arithmetic coding model information required for arithmetic coding is transmitted, can be determined in consideration of coding efficiency. For example, suppose that the scale bands and coding bands both coincide with the terminal nodes. The layers are then mapped onto the terminal nodes as shown in FIG. 8. Because the data of a terminal node lie in the time domain of the same frequency band, the data of one terminal node are never split across layers when the layers are divided.

The layers are fixed as follows: layer 0 encodes band a, layer 1 encodes band b, layer 2 encodes band c, layer 3 encodes band d, layer 4 encodes band e, layer 5 encodes band f, layer 6 encodes band g, and layer 7 encodes band h.

First, the quantized sample values of layer 0 are arithmetically encoded within the allowed bit range using the corresponding coding model, and the side information of layer 0 is arithmetically encoded. The number of bits used is counted while the quantized sample values of layer 0 are bit-sliced encoded. If the number of bits exceeds the allowed bit range, encoding of layer 0 stops and arithmetic encoding of layer 1 begins. If the allowed bit ranges of layers 1 and 2 have bits left over, the quantized sample values of layer 0 that have not yet been encoded are encoded in that space.

The quantized sample values of layer 1 are encoded using the coding model of layer 1, and the side information of layer 1 is arithmetically encoded. If, after all quantized sample values of layer 1 have been encoded, the allowed bit range of layer 1 still has bits left over, the not-yet-encoded quantized sample values of layer 0 are encoded until the allowed bit range is reached. When the allowed bit range is reached, encoding of layer 1 stops and encoding of layer 2 begins. This process is repeated up to layer 7, completing the encoding.
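The per-layer bit budget with carry-over of unencoded lower-layer values can be simulated as below. The per-unit bit costs are hypothetical inputs: a real arithmetic coder measures the bits while coding, and side information is ignored here. Each layer writes its own units first, stops at the first unit that would overflow its budget, and spends any remaining bits on leftover units from lower layers.

```python
def allocate_layers(costs, budgets):
    """Simulate per-layer bit budgets with carry-over of unencoded lower-layer units.

    costs[k]: bit cost of each coded unit of layer k, in coding order.
    budgets[k]: number of bits allowed for layer k.
    Returns (schedule, pending): schedule[k] lists the (source_layer, unit_index)
    pairs written into layer k; pending lists units never encoded.
    """
    if len(costs) != len(budgets):
        raise ValueError("need one budget per layer")
    pending, schedule = [], []
    for k, budget in enumerate(budgets):
        # The layer's own units come first, then leftovers from lower layers.
        order = [(k, i) for i in range(len(costs[k]))] + pending
        used, written = 0, []
        for src, i in order:
            if used + costs[src][i] > budget:
                break  # stop this layer at the first unit that would overflow
            used += costs[src][i]
            written.append((src, i))
        schedule.append(written)
        # Everything not written is carried forward, lowest layer first.
        pending = sorted(order[len(written):])
    return schedule, pending
```

For example, with unit costs `[[4, 4, 4], [3, 3]]` and budgets `[8, 10]`, layer 0 holds its first two units, and layer 1 holds both of its own units plus the third layer-0 unit. The decoder can reproduce the same schedule by counting bits against the same budgets, which is what lets it find where each layer starts.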

If all quantized sample values of each layer were encoded regardless of the allowed bit range, that is, even when the number of encoded bits exceeded the allowed bit range, part of the bit range allowed for the next layer would be consumed, and some quantized sample values belonging to the next layer might not be encoded. Then, when bit-rate scalable decoding is performed, that is, when only the lower layers rather than all layers are decoded, the quantized sample values within a certain frequency range are not decoded. As a result, the decoded quantized sample values fluctuate up and down across the frequency band, causing a birdy effect that degrades the sound quality.

Because the decoding process, which is the reverse of the encoding process, counts the number of bits against the allowed bit range, it can detect the point at which decoding of a given layer starts.

The spectral lines are encoded from the most significant bit (MSB) toward the least significant bit (LSB). At each terminal node of the tree structure used for the wavelet transform, the data bits in the same bit plane must be encoded together. For example, suppose that a terminal node has quantized sample values whose bit planes, from MSB to LSB, are as follows:

00000000101010110101

11111100000000000000

00001100110000000110

With the MDCT, the quantized sample values are grouped into five 4×4 bit-plane blocks, and encoding proceeds from left to right and from top to bottom. With the PWT, however, all the quantized sample values are treated as a single bit plane set, and encoding proceeds in units of N bits from the most significant bit plane to the least significant, and within each plane from lower to higher frequencies. The most significant bit plane "00000000101010110101" is encoded from left to right in units of N bits, the next bit plane "11111100000000000000" is encoded from left to right in units of N bits, and the least significant bit plane "00001100110000000110" is encoded in units of N bits. Here, N is an integer greater than or equal to 1; in particular, if N is 1, binary encoding is performed. Because arithmetic coding can assign fractional numbers of bits, for example 0.001 bit, to a symbol, a large amount of information can be encoded with only a small number of bits when binary symbols are encoded. That is, the coding efficiency is very high. Huffman coding, another lossless coding method, requires at least one bit per symbol, so its coding efficiency is lower than that of arithmetic coding.

FIG. 9 is a diagram for explaining the BWE decoding performed by the BWE decoder 9. In FIG. 9, the striped portions represent data decoded by the FGS decoder 8, and the dotted portions represent data created by the BWE decoder 9. Assuming that all data up to one quarter of the sampling frequency Fs belong to the base layer, FIG. 9(a) shows the case in which the decoding side decodes only the baseband data, and FIGS. 9(b), (c), and (d) show cases in which the FGS decoder 8 decodes data corresponding to the baseband and at least one enhancement layer. That is, the FGS decoder 8 decodes data so as to control the bit rate, and the BWE decoder 9 creates the missing-band data that the FGS decoder 8 does not decode.

Encoding and decoding methods according to preferred embodiments of the present invention will now be described based on the above structure.

FIG. 10 is a flowchart for explaining an encoding method according to the present invention. Referring to FIG. 10, in step 1001, the encoding apparatus BWE-encodes the audio data, outputs bandwidth-limited audio data, and generates BWE information corresponding to the base layer. The base-layer BWE information, which includes envelope information, is needed by the decoding side to create missing-band audio data from the audio data belonging to the base layer. The encoding apparatus then encodes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control the bit rate. More specifically, the encoding apparatus pseudo-wavelet transforms the bandwidth-limited audio data layer by layer in step 1002, quantizes it in step 1003, and in step 1004 arithmetically encodes it and packs it into the layered structure so as to control the bit rate. In step 1005, the encoding apparatus multiplexes the encoded bandwidth-limited audio data and the BWE information and outputs an audio bitstream. More specifically, the encoding apparatus multiplexes them in the following order: first the portion of the encoded bandwidth-limited audio data corresponding to the base layer, then the BWE information, and then the portion of the encoded bandwidth-limited data corresponding to the remaining enhancement layers. Alternatively, it multiplexes them in the following order: first the BWE information, then the portion of the encoded bandwidth-limited audio data corresponding to the base layer, and then the portion of the encoded bandwidth-limited audio data corresponding to the remaining enhancement layers.

FIG. 11 is a flowchart for explaining a decoding method according to the present invention. Referring to FIG. 11, in step 1101, the decoding apparatus demultiplexes an input audio bitstream to extract the BWE information and the bandwidth-limited audio data that has been encoded into a layered structure having a base layer and at least one enhancement layer. That is, the decoding apparatus demultiplexes the input audio bitstream in one of two orders: it extracts the data corresponding to the base layer, then the BWE information, and then the data corresponding to the remaining enhancement layers; or it extracts the BWE information, then the data corresponding to the base layer, and then the data corresponding to the remaining enhancement layers. The decoding apparatus then decodes at least a portion of the bandwidth-limited audio data, starting from the base layer, so as to control the bit rate. More specifically, the decoding apparatus performs arithmetic decoding up to the target layer in step 1102, inverse quantization in step 1103, and an inverse pseudo-wavelet transform in step 1104 to obtain PCM audio data. In step 1105, based on the PCM audio data obtained in step 1104 and with reference to the BWE information, the decoding apparatus creates PCM audio data in at least part of the frequency band not covered by the PCM audio data obtained in step 1104, and then adds the created PCM audio data to the PCM audio data obtained in step 1104.

As described above, the present invention provides bit-rate scalable encoding and decoding methods and apparatuses that can produce high-quality sound even when only part of the bitstream is recovered.

Arithmetic coding makes it possible to provide fine-grained scalability with a small amount of data, and the PWT gives a frequency resolution that matches the transfer characteristics of the human ear. PWT-based coding therefore achieves better time/frequency resolution than conventional MDCT-based coding, so high-quality sound can be produced even from the lower layers.

Although the present invention has been described in detail with reference to exemplary embodiments, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

1. A method of encoding audio data, the method comprising:

performing bandwidth extension encoding on the audio data to output bandwidth-limited audio data and to generate bandwidth extension information;

arithmetically encoding the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer, thereby controlling the bit rate; and

multiplexing the arithmetically encoded bandwidth-limited audio data and the bandwidth extension information.

2. The method of claim 1, wherein arithmetic coding comprises:

differentially encoding side information corresponding to the base layer;

bit-sliced encoding a plurality of quantized sample values corresponding to the base layer; and

repeating the differential encoding and the bit-sliced encoding for the next enhancement layer until encoding is completed for a plurality of predetermined layers.

3. The method of claim 1, wherein arithmetic coding comprises:

differentially encoding side information corresponding to the base layer, the side information including scale factor information and coding model information;

bit-sliced encoding, with reference to the coding model information, a plurality of quantized sample values corresponding to the base layer; and

repeating the differential encoding and the bit-sliced encoding for the next enhancement layer until encoding is completed for a plurality of predetermined layers.

4. The method of claim 2 or 3, wherein the quantized sample values are obtained by pseudo-wavelet transforming the audio data.

5. The method of claim 1, wherein the encoded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the portion of the encoded bandwidth-limited audio data corresponding to the base layer, then the bandwidth extension information, and then the portion of the encoded bandwidth-limited data corresponding to the remaining enhancement layers.

6. The method of claim 1, wherein the encoded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the bandwidth extension information, then the portion of the encoded bandwidth-limited audio data corresponding to the base layer, and then the portion of the encoded bandwidth-limited data corresponding to the remaining enhancement layers.

7. A method of decoding audio data, the method comprising:

demultiplexing an input audio bitstream to extract bandwidth extension information and bandwidth-limited audio data encoded into a layered structure having a base layer and at least one enhancement layer;

arithmetically decoding at least a portion of the bandwidth-limited audio data, starting from the base layer; and

generating, based on the decoded portion of the bandwidth-limited audio data and with reference to the bandwidth extension information, audio data in at least part of a frequency band not covered by the decoded portion of the bandwidth-limited audio data, and then adding the generated audio data to the decoded portion of the bandwidth-limited audio data.

8. The method of claim 7, wherein the audio data in the part of the frequency band is generated so as to extend up to the boundary of the encoded portion of the bandwidth-limited audio data.

9. The method of claim 8, wherein the audio data in the part of the frequency band is generated so as to extend up to a boundary of the filter bank used for the pseudo-wavelet transform.

10. The method of claim 8, wherein, if the generated audio data does not reach the boundary of the filter bank used for the pseudo-wavelet transform, the overlapping portion between the decoded portion of the bandwidth-limited audio signal and the generated audio data is interpolated.

11. The method of claim 7, wherein the input audio bitstream is demultiplexed in the following order: extracting data corresponding to the base layer from the input audio bitstream, extracting the bandwidth extension information from the input audio bitstream, and extracting data corresponding to the remaining enhancement layers from the input audio bitstream.

12. The method of claim 7, wherein the input audio bitstream is demultiplexed in the following order: extracting the bandwidth extension information from the input audio bitstream, extracting data corresponding to the base layer from the input audio bitstream, and extracting data corresponding to the remaining enhancement layers from the input audio bitstream.

13. The method of claim 7, wherein arithmetic decoding comprises:

differentially decoding side information corresponding to the base layer;

bit-sliced decoding a plurality of quantized sample values corresponding to the base layer; and

repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until decoding is completed for a plurality of predetermined layers.

14. The method of claim 7, wherein arithmetic decoding comprises:

differentially decoding side information corresponding to the base layer, the side information including scale factor information and coding model information;

bit-sliced decoding, with reference to the coding model information, a plurality of quantized sample values corresponding to the base layer; and

repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until decoding is completed for a plurality of predetermined layers.

15. An apparatus for encoding audio data, the apparatus comprising:

a bandwidth extension encoder which performs bandwidth extension encoding on audio data, outputs bandwidth-limited audio data, and generates bandwidth extension information;

a fine-grained scalable encoder which arithmetically encodes the bandwidth-limited audio data into a layered structure comprising a base layer and at least one enhancement layer to control the bit rate; and

a multiplexer which multiplexes the arithmetically coded bandwidth-limited audio data and the bandwidth extension information.
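The claims do not say what the bandwidth extension information contains. A common approach in bandwidth extension coders is to transmit only a coarse spectral envelope of the discarded high band. The sketch below assumes exactly that: the bandwidth-limited data is the spectrum below a cutoff index, and the extension information is the RMS level of each fixed-width high band. The function `bwe_encode` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def bwe_encode(spectrum: List[float], cutoff: int,
               band_width: int) -> Tuple[List[float], List[float]]:
    """Split spectral coefficients into a bandwidth-limited part and BWE info.

    Hypothetical scheme: the BWE information is the RMS level of each
    band of band_width coefficients above the cutoff index.
    """
    low = spectrum[:cutoff]   # passed on to the fine-grained scalable encoder
    high = spectrum[cutoff:]  # discarded, summarized by its envelope only
    envelope = []
    for start in range(0, len(high), band_width):
        band = high[start:start + band_width]
        envelope.append(math.sqrt(sum(x * x for x in band) / len(band)))
    return low, envelope
```

Only `len(envelope)` values are sent for the high band instead of `len(spectrum) - cutoff` coefficients, which is where the bit-rate saving comes from.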

16. The apparatus of claim 15, wherein the fine-grained scalable encoder differentially encodes side information corresponding to the base layer, performs bit-sliced encoding on a plurality of quantized sample values corresponding to the base layer, and encodes the side information and the plurality of quantized sample values corresponding to the next enhancement layer until encoding is completed for a predetermined number of layers.

17. The apparatus of claim 15, wherein the fine-grained scalable encoder differentially encodes side information corresponding to the base layer, the side information including scale factor information and coding model information; performs bit-sliced encoding on a plurality of quantized sample values corresponding to the base layer with reference to the coding model information; and, until encoding is completed for a predetermined number of layers, differentially encodes the side information corresponding to the next enhancement layer, including scale factor information and coding model information, and performs bit-sliced encoding on a plurality of quantized sample values corresponding to the next enhancement layer.
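On the encoder side, claims 16 and 17 pair differential coding of the side information with bit slicing of the quantized samples. The following is a minimal sketch under the same simplifying assumptions as a plain-symbol model: the arithmetic coding stage is omitted, each layer is given its own non-empty list of scale factors and quantized values, and the "coding model" is reduced to the number of bit planes needed for the largest magnitude. The names `encode_layer` and `encode_layers` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def encode_layer(scale_factors: List[int], quantized: List[int]) -> Dict[str, object]:
    """Hypothetical layer encoder: differential side info + bit-sliced samples."""
    # Differential encoding: keep the first scale factor, then the differences.
    deltas = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    # Coding-model stand-in: bit planes needed for the largest magnitude.
    num_planes = max((abs(q) for q in quantized), default=0).bit_length()
    # Bit slicing: each plane holds one bit of every magnitude, MSB plane first.
    planes = [[(abs(q) >> shift) & 1 for q in quantized]
              for shift in range(num_planes - 1, -1, -1)]
    signs = [1 if q >= 0 else -1 for q in quantized]
    return {"sf_first": scale_factors[0], "sf_deltas": deltas,
            "num_planes": num_planes, "bit_planes": planes, "signs": signs}


def encode_layers(layers: Sequence[Tuple[List[int], List[int]]],
                  num_layers: int) -> List[Dict[str, object]]:
    """Encode the base layer and enhancement layers up to num_layers."""
    return [encode_layer(sf, q) for sf, q in layers[:num_layers]]
```

Sending bit planes most significant first means that a stream cut short still carries the coarse part of every sample, which is the property that makes bit slicing suited to fine-grained scalability.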

18. The apparatus according to claim 15, wherein the fine-grained scalable encoder obtains quantized sample values by pseudo-wavelet transforming the audio data.

19. The apparatus of claim 15, wherein the multiplexer multiplexes the encoded bandwidth-limited audio data and the bandwidth extension information by placing, in the following order: the portion of the encoded bandwidth-limited audio data corresponding to the base layer, the bandwidth extension information, and the portion of the encoded bandwidth-limited audio data corresponding to the remaining enhancement layers.
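One plausible reading of the claim 19 order (not stated in the claims themselves) is that, because the bandwidth extension information sits ahead of all enhancement-layer data, a frame can be shortened from the end to lower the bit rate without losing that information. The sketch below assumes each layer is an opaque byte string and that only whole enhancement layers are dropped; `mux_frame` and `fit_layers` are hypothetical names.

```python
from typing import List


def mux_frame(base: bytes, bwe_info: bytes, enhancement_layers: List[bytes]) -> bytes:
    """Concatenate payloads in the claim 19 order: base, BWE info, enhancement."""
    return base + bwe_info + b"".join(enhancement_layers)


def fit_layers(base: bytes, bwe_info: bytes,
               enhancement_layers: List[bytes], max_bytes: int) -> bytes:
    """Keep base + BWE info and as many whole enhancement layers as fit."""
    kept: List[bytes] = []
    size = len(base) + len(bwe_info)
    for layer in enhancement_layers:
        if size + len(layer) > max_bytes:
            break  # higher layers refine lower ones, so stop at the first misfit
        kept.append(layer)
        size += len(layer)
    return mux_frame(base, bwe_info, kept)
```

For instance, limiting a frame built from `b"BB"`, `b"X"` and layers `[b"e1", b"e2"]` to 5 bytes keeps the base layer, the extension information and only the first enhancement layer.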

20. An apparatus for decoding audio data, the apparatus comprising:

a demultiplexer which demultiplexes an input audio bitstream and extracts bandwidth extension information and bandwidth-limited audio data encoded into a layered structure having a base layer and at least one enhancement layer;

a fine-grained scalable arithmetic decoder which decodes at least a portion of the bandwidth-limited audio data corresponding to the base layer; and

a bandwidth extension decoder which, based on the decoded portion of the bandwidth-limited audio data and with reference to the bandwidth extension information, generates audio data in a frequency band at least partly not covered by the decoded portion, and then adds the generated audio data to the decoded portion of the bandwidth-limited audio data.
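The bandwidth extension decoder of claim 20 fills in the band that the decoded layers do not cover. The sketch below swaps in a generic spectral-translation technique: it copies low-band coefficients upward and rescales each copied band to a transmitted RMS level. It assumes the extension information is such a per-band envelope and that the regenerated band starts right after the last decoded coefficient, so decoding fewer layers (a narrower low band) simply moves the start of the regenerated region down. Neither assumption comes from the claims, and `bwe_decode` is a hypothetical name.

```python
import math
from typing import List


def bwe_decode(low: List[float], envelope: List[float], band_width: int) -> List[float]:
    """Regenerate high-band coefficients and append them to the decoded low band.

    Hypothetical scheme: each high band copies band_width low-band coefficients
    (wrapping around the low band) and is rescaled so that its RMS level
    matches the transmitted envelope value.
    """
    if not low:
        raise ValueError("need at least one decoded low-band coefficient")
    high: List[float] = []
    for k, target in enumerate(envelope):
        # Source patch taken from the decoded bandwidth-limited data.
        src = [low[(k * band_width + i) % len(low)] for i in range(band_width)]
        rms = math.sqrt(sum(x * x for x in src) / band_width)
        gain = target / rms if rms > 0 else 0.0
        high.extend(gain * x for x in src)
    # Add the generated band to the decoded portion.
    return low + high
```

With `low = [1.0, 1.0, 2.0, 2.0]`, an envelope of `[3.0, 0.0]` and `band_width = 2`, the first regenerated band becomes `[3.0, 3.0]` and the second is silenced.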

21. The apparatus of claim 20, wherein the fine-grained scalable arithmetic decoder differentially decodes side information corresponding to the base layer, performs bit-sliced decoding on a plurality of quantized sample values corresponding to the base layer, and, until decoding is completed for a predetermined number of layers, differentially decodes the side information corresponding to the next enhancement layer and performs bit-sliced decoding on a plurality of quantized sample values corresponding to the next enhancement layer.

22. The apparatus of claim 20, wherein the demultiplexer demultiplexes the input audio bitstream in the following order: extracting data corresponding to the base layer from the input audio bitstream, extracting the bandwidth extension information from the input audio bitstream, and extracting data corresponding to the remaining enhancement layers from the input audio bitstream.

23. The apparatus of claim 20, wherein the demultiplexer demultiplexes the input audio bitstream in the following order: extracting the bandwidth extension information from the input audio bitstream, extracting data corresponding to the base layer from the input audio bitstream, and extracting data corresponding to the remaining enhancement layers from the input audio bitstream.