WO2012044067A1

WO2012044067A1 - Method and apparatus for decoding an audio signal using an adaptive codebook update

Info

Publication number: WO2012044067A1
Application number: PCT/KR2011/007150
Authority: WO
Inventors: 이미숙
Original assignee: Electronics and Telecommunications Research Institute ETRI
Current assignee: Electronics and Telecommunications Research Institute ETRI
Priority date: 2010-09-28
Filing date: 2011-09-28
Publication date: 2012-04-05
Anticipated expiration: 2013-03-28

Abstract

The present invention relates to a method and apparatus for decoding an audio signal using an adaptive codebook update. According to one embodiment of the present invention, the method for decoding an audio signal comprises the following steps: taking, as an input, an N^th frame, which is a frame with a loss, and an N+1^th frame, which is a normal frame; determining whether or not to update an adaptive codebook of the last subframe of the N^th frame using the N^th frame and the N+1^th frame; updating the adaptive codebook of the last subframe of the N^th frame using the N+1^th frame; and synthesizing an audio signal using the N^th frame. According to the present invention, when frame data is lost, the adaptive codebook of the last subframe of the lost frame is updated using frame data normally received after the loss, so that the decoder returns more quickly to the state before the loss.

Description

Method and apparatus for decoding audio signal using adaptive codebook update

The present invention relates to a method and apparatus for decoding an audio signal, and more particularly, to a method and apparatus for decoding an audio signal using an adaptive codebook update.

To transmit a voice (audio) signal for a voice (audio) call over a communication network, an encoder that compresses the audio signal after conversion to a digital signal and a decoder that restores the audio signal from the encoded data are used. One of the most widely used audio codec (encoder and decoder) technologies is Code Excited Linear Prediction (CELP). A CELP codec represents an audio signal by a synthesis filter that models the vocal tract and by the input signal to that synthesis filter.

Representative CELP codecs include G.729 and the Adaptive Multi-Rate (AMR) codec. The encoders of these codecs extract the coefficients of the synthesis filter from one frame of the input signal, corresponding to 10 or 20 msec, and then divide the frame into 5 msec subframes to obtain the pitch index and gain of the adaptive codebook and the pulse index and gain of the fixed codebook. The decoder generates an excitation signal using the pitch index and gain of the adaptive codebook and the pulse index and gain of the fixed codebook, and restores the audio signal by filtering this excitation signal with the synthesis filter.

While the frame data output from the encoder is being transmitted, frames may be lost depending on the state of the communication network. A frame loss concealment algorithm is used to reduce the resulting degradation of the synthesized signal. Most frame loss concealment algorithms restore the audio signal of the lost frame using the normal frame data received before the loss. However, the error caused by the loss affects not only the lost frame but also the frames normally received afterwards. Therefore, when an audio signal is decoded and a normal frame is received after a frame loss, a method is needed for returning to the state before the loss as quickly as possible.

An object of the present invention is to provide an audio signal decoding method and apparatus that, when frame data is lost, update the adaptive codebook of the last subframe of the lost frame using frame data normally received after the loss, thereby returning more quickly to the state before the frame loss.

The objects of the present invention are not limited to the object mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will be understood more clearly from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve this object, one aspect of the present invention is a method of decoding an audio signal, comprising: receiving an N+1th frame, which is a normal frame transmitted after an Nth frame, which is lost frame data; determining whether to update the adaptive codebook of the last subframe of the Nth frame using the Nth frame and the N+1th frame; updating the adaptive codebook of the last subframe of the Nth frame using the N+1th frame; and synthesizing an audio signal using the Nth frame.

Another aspect of the present invention is an apparatus for decoding an audio signal, comprising: an input unit that receives an N+1th frame, which is a normal frame transmitted after an Nth frame, which is lost frame data; a control unit that determines whether to update the adaptive codebook of the last subframe of the Nth frame using the Nth frame and the N+1th frame, and updates the adaptive codebook of the last subframe of the Nth frame using the N+1th frame; and a decoding unit that synthesizes an audio signal using the Nth frame.

According to the present invention as described above, when frame data is lost, the adaptive codebook of the last subframe of the lost frame is updated using frame data normally received after the loss, so that the decoder can return more quickly to the state before the frame loss.

FIG. 1 is a block diagram showing the configuration of a CELP encoder.

FIG. 2 is a block diagram showing the configuration of a CELP decoder.

FIG. 3 shows a frame sequence transmitted from an encoder to a decoder.

FIG. 4 is a flowchart illustrating a process of restoring the excitation signal subframe by subframe after the synthesis filter coefficients of a lost Nth frame have been restored in the AMR-WB codec.

FIG. 5 is a flowchart illustrating a method of decoding an audio signal according to an embodiment of the present invention.

FIG. 6 is a block diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components.

FIG. 1 is a block diagram showing the configuration of a CELP encoder.

The preprocessing filter 102 scales the input signal and performs high-pass filtering. The input signal may have a length of 10 msec or 20 msec and consists of a plurality of subframes. A subframe generally has a length of 5 msec.

The LPC obtaining unit 104 extracts, from the preprocessed input signal, linear prediction coefficients (LPC) corresponding to the coefficients of the synthesis filter. The LPC obtaining unit 104 then quantizes the extracted LPC and interpolates them with the LPC of the previous frame to obtain the synthesis filter coefficients of each subframe.
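The per-subframe interpolation described above can be sketched as follows. This is only an illustration: the text does not give the interpolation weights or the domain in which interpolation takes place, so the uniform linear ramp and the function name `interpolate_lpc` are assumptions, not the codec's actual procedure.

```python
from typing import List, Sequence


def interpolate_lpc(
    previous: Sequence[float],
    current: Sequence[float],
    num_subframes: int,
) -> List[List[float]]:
    """Return one coefficient set per subframe.

    Assumption: a uniform linear ramp from the previous frame's
    coefficients to the current frame's coefficients, with the last
    subframe using the current frame's coefficients unchanged.
    """
    if len(previous) != len(current):
        raise ValueError("coefficient sets must have the same order")
    if num_subframes <= 0:
        raise ValueError("num_subframes must be positive")
    result = []
    for i in range(num_subframes):
        weight = (i + 1) / num_subframes  # weight of the current frame
        result.append(
            [(1.0 - weight) * p + weight * c for p, c in zip(previous, current)]
        )
    return result
```

With four subframes, the first subframe leans mostly on the previous frame and the last one uses the newly received coefficients only.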

The pitch analyzer 106 analyzes the pitch of the input signal on a subframe basis to obtain the pitch index and gain of the adaptive codebook. The obtained pitch index is used by the adaptive codebook module 112 to reproduce the adaptive codebook value. The fixed codebook search unit 108 searches the fixed codebook for the input signal on a subframe basis to obtain the pulse index and gain of the fixed codebook. The obtained pulse index is used by the fixed codebook module 110 to reproduce the fixed codebook value. The adaptive codebook gain and the fixed codebook gain are quantized by the gain quantization unit 122.

The output of the fixed codebook module 110, reproduced from the pulse index, is multiplied by the quantized fixed codebook gain (114). The output of the adaptive codebook module 112, reproduced from the pitch index, is multiplied by the quantized adaptive codebook gain (116). The gain-scaled adaptive codebook value and fixed codebook value are then added to generate the excitation signal.

The generated excitation signal is input to the synthesis filter 118. The error between the input signal preprocessed by the preprocessing filter 102 and the output signal of the synthesis filter 118 is then filtered by the perceptual weighting filter 120, which reflects human auditory characteristics. The pitch index and quantized gain, and the pulse index and quantized gain, that minimize this error signal are determined and passed to the parameter encoding unit 124. The parameter encoding unit 124 encodes the pitch index of the adaptive codebook, the pulse index of the fixed codebook, the output of the gain quantization unit 122, and the LPC parameters into a form suitable for transmission and outputs frame data. The output frame data is transmitted to the decoder over a network or the like.

FIG. 2 is a block diagram showing the configuration of a CELP decoder.

The decoder restores the fixed codebook 202 and the adaptive codebook 204 from the pulse index and the pitch index transmitted from the encoder, respectively. The output of the fixed codebook 202 is then multiplied by the fixed codebook gain (206), and the output of the adaptive codebook 204 is multiplied by the adaptive codebook gain (208). The gain-scaled adaptive codebook value and fixed codebook value are added to restore the excitation signal. The restored excitation signal is filtered by the synthesis filter 210, whose coefficients are obtained by interpolating the LPC coefficients transmitted from the encoder. The output of the synthesis filter 210 is post-processed by the post-processing unit 212 to restore the audio signal.
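As a rough illustration of the decoder path described above, the following sketch builds an adaptive codebook vector from past excitation, mixes it with a fixed codebook vector, and runs the result through an all-pole synthesis filter. It assumes integer pitch lags, omits post-processing, and uses the sign convention A(z) = 1 + sum of a_k z^-k. All function names are hypothetical and none of this is taken from a reference codec implementation.

```python
from typing import List, Sequence


def adaptive_codebook_vector(
    history: Sequence[float], pitch_lag: int, length: int
) -> List[float]:
    """Repeat past excitation at the given integer pitch lag."""
    if not 0 < pitch_lag <= len(history):
        raise ValueError("pitch lag must lie inside the stored excitation")
    buf = list(history)
    start = len(buf)
    for n in range(length):
        # For lags shorter than the subframe this reuses samples just
        # appended, which gives a periodic extension.
        buf.append(buf[start + n - pitch_lag])
    return buf[start:]


def build_excitation(
    adaptive: Sequence[float],
    fixed: Sequence[float],
    gain_pitch: float,
    gain_code: float,
) -> List[float]:
    """Gain-scale both codebook vectors and add them."""
    return [gain_pitch * a + gain_code * c for a, c in zip(adaptive, fixed)]


def synthesis_filter(
    excitation: Sequence[float], lpc: Sequence[float], memory: Sequence[float]
) -> List[float]:
    """All-pole filter 1/A(z); memory holds past outputs, newest last."""
    if len(memory) != len(lpc):
        raise ValueError("memory length must equal the filter order")
    past = list(memory)
    out = []
    for x in excitation:
        y = x - sum(a * past[-1 - k] for k, a in enumerate(lpc))
        out.append(y)
        past.append(y)
        past.pop(0)
    return out
```

Because the adaptive codebook vector is read from the stored excitation, any error left in that buffer by a lost frame carries over into later frames, which is the effect the invention targets.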

Meanwhile, while the frame data output by the encoder of FIG. 1 is transmitted to the decoder of FIG. 2, frame data may be lost depending on the state of the network. Such loss degrades the quality of the audio signal synthesized by the decoder. To reduce this degradation, most codecs have a built-in frame loss concealment algorithm.

In the AMR-WB (Adaptive Multi-Rate Wideband) codec, when a frame is lost, the data of the previous frame is scaled by a predetermined value that depends on how many consecutive frames have been lost, and the result is used as the data of the lost frame.

For example, as shown in FIG. 3, when the N-1th frame data is received normally but the Nth frame data is lost during transmission, the AMR-WB decoder first restores the synthesis filter coefficients of the lost Nth frame using the synthesis filter coefficients of the N-1th frame. The pulses of the fixed codebook are restored using a random function, and the fixed codebook gain of the previous subframes is median filtered and then scaled to serve as the fixed codebook gain. The pitch index of the adaptive codebook is restored using the pitch index of the last subframe of the N-1th frame, or using the pitch indexes of previous subframes together with a random value, and the gain of the previous subframes is median filtered and then scaled to serve as the adaptive codebook gain. The audio signal of the lost frame is restored using the frame data restored in this way. In FIG. 3, BFI (Bad Frame Indication) indicates whether a frame is a lost frame or a normal frame: if BFI is 0 the frame is a normal frame, and if BFI is 1 the frame is a lost frame.

FIG. 4 is a flowchart illustrating a process of restoring the excitation signal subframe by subframe after the synthesis filter coefficients of a lost Nth frame have been restored in the AMR-WB codec.

Referring to FIG. 4, it is first determined whether the frame has ended (402). It is then determined whether the frame is a lost frame, that is, whether BFI is 1 (404).

If no frame loss has occurred, the pitch index of the adaptive codebook is decoded from the received frame data (406), and the adaptive codebook is decoded using it (408). The fixed codebook is decoded using the pulse index (410), and the gain of each codebook is decoded (412) and multiplied by the corresponding codebook. The two codebooks are then added to synthesize the excitation signal (414), and the synthesized excitation signal is filtered by the synthesis filter (416).

If frame loss has occurred in step 404 (that is, BFI is 1), the pitch index of the adaptive codebook is restored using the pitch of the previous subframe (418), and the adaptive codebook is restored using the restored pitch index (420). The fixed codebook is restored using a random function (422), and the adaptive codebook gain and fixed codebook gain of the previous subframes are each median filtered and then scaled to restore the adaptive codebook gain and fixed codebook gain of the lost frame (424). The excitation signal is synthesized using the restored adaptive codebook and fixed codebook parameters (414), and the synthesized excitation signal is filtered by the synthesis filter (416).
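The gain restoration of step 424 can be sketched as a median over recent subframe gains followed by attenuation. The text says only that the scaling value is predetermined and depends on the continuity of the loss, so the `attenuation` argument and the function name `conceal_gain` are placeholders, not the codec's real values.

```python
from statistics import median
from typing import Sequence


def conceal_gain(previous_gains: Sequence[float], attenuation: float) -> float:
    """Median-filter the gains of earlier subframes, then scale.

    Assumption: `attenuation` stands in for the predetermined scaling
    value, which this sketch does not try to reproduce.
    """
    if not previous_gains:
        raise ValueError("at least one previous gain is required")
    return attenuation * median(previous_gains)


# The same helper serves both codebooks; only the history differs.
pitch_gain = conceal_gain([0.2, 0.9, 0.5, 0.4, 0.6], attenuation=0.5)
```

Taking the median rather than the mean keeps a single outlier gain in the history from dominating the concealed value.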

As described above, when a frame is lost, the process of restoring the pitch index and gain of the adaptive codebook and the pulse index and gain of the fixed codebook differs from the process used when no frame is lost. The subsequent steps of synthesizing the excitation signal and filtering it with the synthesis filter are the same regardless of whether a frame was lost.

When frame data is lost, the loss affects not only the lost frame but also the normal frames received afterwards. To reduce the quality degradation caused by frame loss, it is therefore important not only to restore the audio signal of the lost frame well, but also to return quickly to the state before the loss once frame data is again received normally.

In the present invention, to escape the effects of a frame data loss quickly, the adaptive codebook of the last subframe of the lost frame is updated using the pitch information of the first normal frame data received after the loss.

FIG. 5 is a flowchart illustrating a method of decoding an audio signal according to an embodiment of the present invention. In this embodiment, the Nth frame is a lost frame and the N+1th frame is a normal frame. The adaptive codebook of the last subframe of the Nth frame, as restored by the frame loss concealment algorithm, is updated again using information from the N+1th frame, so that the audio quality degradation caused by the loss of the Nth frame is recovered more quickly when the N+1th frame data is decoded.

Referring to FIG. 5, the first subframe pitch (T0) and the second subframe pitch (T0_2) of the N+1th frame are decoded first (504).

Next, it is determined (506) whether the N+1th frame is the first normal frame received after a lost frame (that is, prev_BFI==1 & BFI==0), whether the current subframe is the first subframe of the N+1th frame (i_subfr==0), and whether the pitch of the last subframe of the lost Nth frame (PrevT0) differs from the pitch of the first subframe of the N+1th frame (T0) (prev_T0!=T0). If these conditions are not all satisfied, the general decoding procedure from step 518 onward is performed.

If all the conditions of step 506 are satisfied, it is determined whether the absolute value of the pitch difference between the first and second subframes of the N+1th frame (T0-T0_2) is smaller than a predetermined reference value (x) (508). If this condition is not satisfied, the general decoding procedure from step 518 onward is performed.
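The two checks of steps 506 and 508 can be combined into a single predicate, sketched below. The variable names follow the identifiers in the text (prev_BFI, BFI, i_subfr, prev_T0, T0, T0_2). The reference value x is not specified in the text, so `threshold` is left as a caller-supplied parameter and the function name is hypothetical.

```python
def should_update_adaptive_codebook(
    prev_bfi: int,
    bfi: int,
    i_subfr: int,
    prev_t0: int,
    t0: int,
    t0_2: int,
    threshold: int,
) -> bool:
    """Return True when both step 506 and step 508 are satisfied."""
    # Step 506: first good frame after a loss, first subframe, and the
    # concealed pitch differs from the newly decoded pitch.
    first_good_after_loss = prev_bfi == 1 and bfi == 0
    first_subframe = i_subfr == 0
    pitch_changed = prev_t0 != t0
    # Step 508: the new frame's first two subframe pitches are close.
    pitch_stable = abs(t0 - t0_2) < threshold
    return (
        first_good_after_loss
        and first_subframe
        and pitch_changed
        and pitch_stable
    )
```

One reading of step 508 is that the update is applied only when the newly received pitch looks stable across the first two subframes, so that an unreliable pitch value is not written back into the adaptive codebook.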

If the condition of step 508 is also satisfied, the adaptive codebook of the last subframe of the Nth frame is updated using the first subframe pitch of the N+1th frame before the excitation signal of the first subframe of the N+1th frame is synthesized, as follows. The adaptive codebook of the last subframe of the Nth frame is restored (510), and a random function for the last subframe of the Nth frame is generated (512). The excitation signal of the last subframe of the Nth frame is then restored (514), and the adaptive codebook of the last subframe of the Nth frame is updated (516).
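Steps 510 to 516 can be sketched as regenerating the excitation of the last subframe of frame N with the new pitch and writing it back into the excitation history. The sketch assumes an integer pitch lag, takes the gains as plain arguments, and uses random signs as a stand-in for the random function of step 512. These choices and the function name `reupdate_last_subframe` are illustrative assumptions, not the patented implementation.

```python
import random
from typing import List, Sequence


def reupdate_last_subframe(
    excitation_history: Sequence[float],
    subframe_len: int,
    new_pitch: int,
    gain_pitch: float,
    gain_code: float,
    rng: random.Random,
) -> List[float]:
    """Rebuild the tail of the excitation buffer with the new pitch.

    The last `subframe_len` samples of `excitation_history` are taken to
    be the concealed excitation of the last subframe of frame N.
    """
    if not 0 < subframe_len < len(excitation_history):
        raise ValueError("subframe must be shorter than the history")
    past = list(excitation_history[:-subframe_len])
    if not 0 < new_pitch <= len(past):
        raise ValueError("pitch lag must lie inside the stored excitation")
    # Step 510: adaptive codebook vector at the pitch of frame N+1.
    work = past[:]
    for n in range(subframe_len):
        work.append(work[len(past) + n - new_pitch])
    adaptive = work[len(past):]
    # Step 512: random fixed-codebook contribution (placeholder signs).
    noise = [rng.choice((-1.0, 1.0)) for _ in range(subframe_len)]
    # Step 514: excitation of the last subframe of frame N.
    excitation = [
        gain_pitch * a + gain_code * c for a, c in zip(adaptive, noise)
    ]
    # Step 516: the updated buffer becomes the adaptive codebook memory.
    return past + excitation
```

The returned buffer has the same length as the input, so it can replace the excitation history before the first subframe of frame N+1 is decoded.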

Excitation signal synthesis for the N+1th frame and filtering by the synthesis filter are then performed. That is, the adaptive codebook is decoded (518), the fixed codebook is decoded (520), and the gains of the adaptive codebook and the fixed codebook are decoded (522). The excitation signal of the frame is synthesized using the decoded adaptive codebook and its gain and the decoded fixed codebook and its gain (524), and the synthesized excitation signal is filtered by the synthesis filter (525).

FIG. 6 is a block diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

The audio signal decoding apparatus 602 according to an embodiment of the present invention includes an input unit 604, a control unit 606, and a decoding unit 608.

The input unit 604 receives frame data of an audio signal encoded by an encoding apparatus. As described above, frames may be lost while this frame data is transmitted to the decoding apparatus. In an embodiment of the present invention, the input unit 604 receives an N+1th frame, which is a normal frame transmitted after an Nth frame, which is a lost frame.

The control unit 606 determines, using the Nth frame and the N+1th frame, whether to update the adaptive codebook of the last subframe of the Nth frame. According to this determination, it updates the adaptive codebook of the last subframe of the Nth frame using the N+1th frame.

In an embodiment of the present invention, the control unit 606 first decodes the first subframe pitch (T0) and the second subframe pitch (T0_2) of the N+1th frame.

Next, the control unit 606 determines whether the N+1th frame is the first normal frame received after a lost frame (that is, prev_BFI==1 & BFI==0), whether the current subframe is the first subframe of the N+1th frame (i_subfr==0), and whether the pitch of the last subframe of the lost Nth frame (PrevT0) differs from the pitch of the first subframe of the N+1th frame (T0) (prev_T0!=T0). If these conditions are not all satisfied, the control unit 606 performs the general decoding procedure.

If all of the above conditions are satisfied, the control unit 606 determines whether the absolute value of the pitch difference between the first and second subframes of the N+1th frame (T0-T0_2) is smaller than a predetermined reference value (x). If this condition is not satisfied, the control unit 606 performs the general decoding procedure.

If this condition is also satisfied, the control unit 606 updates the adaptive codebook of the last subframe of the Nth frame using the first subframe pitch of the N+1th frame, before synthesizing the excitation signal of the first subframe of the N+1th frame. That is, the control unit 606 restores the adaptive codebook of the last subframe of the Nth frame and generates a random function for the last subframe of the Nth frame. The control unit 606 then restores the excitation signal of the last subframe of the Nth frame and updates the adaptive codebook of the last subframe of the Nth frame.

The control unit 606 then performs excitation signal synthesis for the N+1th frame and filtering by the synthesis filter. That is, the control unit 606 decodes the adaptive codebook and the fixed codebook, and decodes the gains of the adaptive codebook and the fixed codebook. Using the decoded adaptive codebook and its gain and the decoded fixed codebook and its gain, the control unit 606 synthesizes the excitation signal of the frame, and the synthesized excitation signal is filtered by the synthesis filter.

In addition, according to the present invention, when a frame is lost, particularly in a section that transitions from unvoiced to voiced sound or in a section where the pitch changes, the decoder recovers from the loss quickly, which reduces the degradation of the synthesized signal caused by the frame loss.

Those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes to the invention described above without departing from its technical idea, so the invention is not limited by the embodiments described above or by the accompanying drawings.

Claims

A method for decoding an audio signal, the method comprising:

receiving an N+1th frame, which is a normal frame transmitted after an Nth frame, which is lost frame data;

determining whether to update an adaptive codebook of the last subframe of the Nth frame using the Nth frame and the N+1th frame;

updating the adaptive codebook of the last subframe of the Nth frame using the N+1th frame; and

synthesizing an audio signal using the Nth frame.

The method of claim 1,

wherein determining whether to update the adaptive codebook of the last subframe of the Nth frame comprises:

a first determination step of determining whether the pitch of the last subframe of the Nth frame and the pitch of the first subframe of the N+1th frame are different from each other;

a second determination step of determining whether the difference between the pitch of the first subframe of the N+1th frame and the pitch of the second subframe of the N+1th frame is smaller than a predetermined reference value; and

determining that the adaptive codebook of the last subframe of the Nth frame is to be updated if the results of both the first determination step and the second determination step are positive.

The method of claim 1,

wherein updating the adaptive codebook of the last subframe of the Nth frame comprises

updating the adaptive codebook of the last subframe of the Nth frame using the pitch of the first subframe of the N+1th frame.

An apparatus for decoding an audio signal, the apparatus comprising:

an input unit for receiving an N+1th frame, which is a normal frame transmitted after an Nth frame, which is lost frame data;

a control unit for determining whether to update an adaptive codebook of the last subframe of the Nth frame using the Nth frame and the N+1th frame, and for updating the adaptive codebook of the last subframe of the Nth frame using the N+1th frame; and

a decoding unit for synthesizing an audio signal using the Nth frame.

The apparatus of claim 4,

wherein the control unit

determines whether the pitch of the last subframe of the Nth frame and the pitch of the first subframe of the N+1th frame are different from each other, determines whether the difference between the pitch of the first subframe of the N+1th frame and the pitch of the second subframe of the N+1th frame is smaller than a predetermined reference value, and, if both determination results are positive, updates the adaptive codebook of the last subframe of the Nth frame.

The apparatus of claim 4,

wherein the control unit

updates the adaptive codebook of the last subframe of the Nth frame using the pitch of the first subframe of the N+1th frame.