CN1150516C

CN1150516C - Voice coding method and voice coder

Info

Publication number: CN1150516C
Application number: CNB971034516A
Authority: CN
Inventors: Ira A. Gerson; Mark A. Jasiuk; Matthew A. Hartman
Original assignee: Motorola Inc
Current assignee: BlackBerry Ltd
Priority date: 1993-03-26
Filing date: 1997-03-12
Publication date: 2004-05-19
Anticipated expiration: 2014-03-07
Also published as: SE518319C2; AU6397094A; JPH07507885A; AU678953B2; AU668817B2; SG47025A1; GB9802900D0; SE9404086D0; CA2135629C; SE0201109D0; CN1166019A; DE4492048C2; FR2706064B1; US5675702A; WO1994023426A1; CN1109697A; AU6084396A; DE4492048T1; JP3042886B2; CA2135629A1

Abstract

A Vector-Sum Excited Linear Prediction (VSELP) speech coder provides improved quality and reduced complexity compared with conventional speech coders. VSELP uses a codebook with a predefined structure, so that the computation required for the codebook search is greatly reduced. The VSELP speech coder employs a single- or multi-segment vector quantizer of the reflection coefficients based on the Fixed-Point Lattice Technique (FLAT). In addition, the speech coder uses a prequantizer to reduce the complexity of the vector codebook search and a high-resolution scalar quantizer to reduce the amount of memory needed to store the reflection coefficient vector codebooks. The result is a speech coder with reduced computation and storage requirements.

Description

Voice coding method and voice coder

Technical field

The present invention relates generally to speech coding using code-excited linear prediction (CELP), also known as stochastic coding or vector-excited speech coding, and more particularly to a vector quantizer method for vector-sum excited linear prediction (VSELP).

Background art

Code-excited linear prediction (CELP) is a speech coding technique used to produce high-quality synthesized speech. This class of speech coding, also known as vector-excited linear prediction, is used in numerous speech communication and speech synthesis applications. CELP is particularly applicable to digital speech encryption and digital radiotelephone communication systems, where speech quality, data rate, size, and cost are significant issues.

In a CELP speech coder, the long-term (pitch) and short-term (formant) predictors that model the characteristics of the input speech signal are incorporated in a set of time-varying filters. Specifically, one long-term filter and one short-term filter may be used. The excitation signal for the filters is chosen from a codebook of stored innovation sequences, or codevectors.

For each frame of speech, an optimum excitation signal is chosen. The speech coder applies an individual codevector to the filters to generate a reconstructed speech signal, which is compared with the original input speech signal to create an error signal. The error signal is then weighted by passing it through a spectral noise weighting filter, whose response is based on human auditory perception. The optimum excitation signal is the selected codevector that produces the weighted error signal with the minimum energy for the current frame of speech.

Typically, linear predictive coding (LPC) is used to model the short-term signal correlation over a block of samples; this model is also referred to as the short-term filter. The short-term signal correlation represents the resonance frequencies of the vocal tract. The LPC coefficients are one set of speech model parameters. Other parameter sets may be used to characterize the excitation signal applied to the short-term predictor filter. These other speech model parameters include line spectral frequencies (LSF), cepstral coefficients, reflection coefficients, log area ratios, and arcsines.

A speech coder typically vector quantizes the excitation signal to reduce the number of bits needed to characterize the signal. The LPC coefficients may be transformed into one of the other parameter sets listed above before quantization. The coefficients may be quantized individually (scalar quantization) or as a set (vector quantization). Scalar quantization is less efficient than vector quantization, but it is cheaper in computation and storage. Vector quantization of LPC parameters is used where coding efficiency is the prime concern.

Multi-segment vector quantization may be used to balance coding efficiency, vector quantizer search complexity, and vector quantizer storage requirements. The first type of multi-segment vector quantization partitions an Np-element LPC parameter vector into n segments, each of which is vector quantized separately. The second type partitions the LPC parameters among n vector codebooks, where each vector codebook spans all Np vector elements. To illustrate vector quantization, assume Np = 10 elements, each represented by 2 bits. Traditional vector quantization would require 2²⁰ codevectors of 10 elements each to represent all possible codevectors. The first type of multi-segment vector quantization with two segments would require 2¹⁰ + 2¹⁰ codevectors of 5 elements each. The second type of multi-segment vector quantization with two segments would require 2¹⁰ + 2¹⁰ codevectors of 5 elements each. Each of these methods of vector quantization offers different benefits in coding efficiency, search complexity, and storage requirements. Thus, state-of-the-art speech coders would benefit from a vector quantization method and apparatus that increases coding efficiency, or reduces search complexity or storage requirements, without a corresponding change in the other requirements.
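The codebook sizes quoted in this illustration can be checked with a few lines of arithmetic. The sketch below only reproduces the numbers given above; the variable names are illustrative and do not come from the patent.

```python
# Worked example from the text: Np = 10 elements, 2 bits per element.
NP = 10
BITS_PER_ELEMENT = 2
total_bits = NP * BITS_PER_ELEMENT  # 20 bits describe the whole vector

# Single (unsegmented) vector quantizer: one codebook indexed by all 20 bits.
full_vq_codevectors = 2 ** total_bits

# Two-segment quantizer: the 20 bits are split evenly, 10 bits per codebook.
bits_per_segment = total_bits // 2
two_segment_codevectors = 2 ** bits_per_segment + 2 ** bits_per_segment

print(full_vq_codevectors)      # 1048576
print(two_segment_codevectors)  # 2048
```

The gap between roughly a million codevectors and about two thousand is what makes the segmented designs attractive for storage and search, at some cost in coding efficiency.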

Summary of the invention

To address the shortcomings of the prior art, the invention provides a speech coding method comprising the steps of:

(a) constructing an excitation codebook of 2^M codevectors from M basis vectors, where M is a natural number;

(b) receiving input speech;

(c) in response to the input speech, calculating reflection coefficient values corresponding to speech parameters representing the input speech;

(d) storing 2^N reflection coefficient values in a table, each reflection coefficient value being addressed by an N-bit code, where N is a natural number;

(e) processing the codevectors to produce synthesized speech;

(f) selecting from the excitation codebook the codevector that minimizes the weighted error of the synthesized speech relative to the input speech, including:

(f1) when a reflection coefficient value is needed during processing, supplying the corresponding N-bit code to the table to look up the reflection coefficient value; and

(f2) when a reflection coefficient value is not needed during processing, storing only the N-bit code, thereby minimizing the storage required for reflection coefficient values.

According to another aspect of the invention, a speech coder is provided, comprising:

a codebook generator that generates an excitation codebook of 2^M codevectors formed from M basis vectors, where M is a natural number;

input means for receiving input speech and producing a data vector;

coding means, coupled to the input means, that generates reflection coefficients corresponding to speech parameters representing the input speech and processes the codevectors to generate synthesized speech;

a vector quantizer for quantizing the reflection coefficients, the vector quantizer including a vector quantizer memory configured to store 2^N reflection coefficient values and having an N-bit input and an output, the memory providing one of the 2^N reflection coefficient values at the output in response to an N-bit address received at the N-bit input, where N is a natural number; and

a codebook search controller, coupled to the codebook generator, for selecting from the excitation codebook the codevector that minimizes the weighted error between the synthesized speech and the data vector. The codebook search controller is also coupled to the vector quantizer: when a reflection coefficient value is needed during processing, it supplies the corresponding N-bit code to the vector quantizer to look up the value, and when a reflection coefficient value is not needed during processing, it stores only the N-bit code.

The invention reduces the complexity of conventional speech coders and reduces their computation and storage requirements.

Brief description of the drawings

FIG. 1 is a block diagram of a radio communication system including a speech coder in accordance with the invention.

FIG. 2 is a block diagram of a speech coder in accordance with the invention.

FIG. 3 is a graph of the arcsine function used in accordance with the invention.

Detailed description of the embodiments

A variation on code-excited linear prediction (CELP) called vector-sum excited linear prediction (VSELP), described here, is the preferred embodiment of the invention. VSELP uses an excitation codebook with a predefined structure, so that the computation required for the codebook search is greatly reduced. This VSELP speech coder uses a single- or multi-segment vector quantizer of the reflection coefficients based on the Fixed-Point Lattice Technique (FLAT). In addition, the speech coder uses a prequantizer to reduce the vector codebook search complexity and a high-resolution scalar quantizer to reduce the amount of memory needed to store the reflection coefficient vector codebooks. The result is a high-performance vector quantizer of the reflection coefficients that is computationally efficient and has reduced storage requirements.

FIG. 1 is a block diagram of a radio communication system 100. The radio communication system 100 includes two transceivers 101, 113, which transmit speech data to and receive speech data from each other. The two transceivers 101, 113 may be part of a trunked radio system, a radiotelephone communication system, or any other radio communication system that transmits and receives speech data. At the transmitter, the speech signal is input into microphone 108, and the speech coder selects the quantized parameters of the speech model. The codes for the quantized parameters are transmitted to the other transceiver 113. At the other transceiver 113, the transmitted codes for the quantized parameters are received by receiver 121 and used to regenerate the speech in the speech decoder 123. The regenerated speech is output to the speaker 124.

FIG. 2 is a block diagram of a VSELP speech coder 200. The VSELP speech coder 200 uses a received code to determine which excitation vector from the codebook to use. The VSELP coder uses an excitation codebook of 2^M codevectors constructed from M basis vectors. Define v_m(n) as the m-th basis vector and u_i(n) as the i-th codevector in the codebook.

Then

$u_i(n) = \sum_{m=1}^{M} \theta_{im}\, v_m(n) \qquad (1.10)$

where 0 ≤ i ≤ 2^M − 1 and 0 ≤ n ≤ N − 1. In other words, each codevector in the codebook is constructed as a linear combination of the M basis vectors. The linear combinations are defined by the θ parameters.

θ_im is defined as

θ_im = +1 if bit m of codeword i = 1

θ_im = −1 if bit m of codeword i = 0

Codevector i is constructed as the sum of the M basis vectors, where the sign (plus or minus) of each basis vector is determined by the state of the corresponding bit in codeword i. Note that if all the bits in codeword i are complemented, the corresponding codevector is the negative of codevector i. Therefore, for every codevector, its negative is also a codevector in the codebook. Because their codewords are complements of each other, these pairs are called complementary codevectors.
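The construction in (1.10) can be sketched in a few lines. This is a minimal illustration rather than the coder's actual implementation: the function names are hypothetical, and mapping "bit m" of the codeword to the m-th least significant bit is an assumption made for the example.

```python
def vselp_codevector(i, basis):
    """Build codevector u_i as a signed sum of the basis vectors, per (1.10).

    Assumption: basis vector m (counting from 0 here) is controlled by bit m
    of codeword i, least significant bit first.
    """
    length = len(basis[0])
    u = [0.0] * length
    for m, v_m in enumerate(basis):
        theta = 1.0 if (i >> m) & 1 else -1.0  # +1 if the bit is set, else -1
        for n in range(length):
            u[n] += theta * v_m[n]
    return u


def build_codebook(basis):
    """Return all 2**M codevectors for M basis vectors."""
    return [vselp_codevector(i, basis) for i in range(2 ** len(basis))]


# Two basis vectors of length 2 give a codebook of four codevectors.
basis = [[1.0, 2.0], [0.5, -1.0]]
codebook = build_codebook(basis)
all_ones = 2 ** len(basis) - 1  # XOR with this mask complements every bit
```

Complementing the codeword flips every sign, so `codebook[i]` is the negative of `codebook[i ^ all_ones]`. Only the M basis vectors need to be stored, not the 2^M codevectors.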

After the appropriate vector has been chosen, the gain block 205 scales the chosen vector by the gain term γ. The output of the gain block 205 is applied to a set of linear filters 207 and 209 to obtain N samples of reconstructed speech. The filters include a "long-term" (or "pitch") filter 207, which inserts pitch periodicity into the excitation. The output of the long-term filter 207 is then applied to the "short-term" (or "formant") filter 209, which adds the spectral envelope to the signal.

The long-term filter 207 incorporates a long-term predictor (LTP) coefficient. The long-term filter 207 attempts to predict the next output sample from one or more samples in the distant past. If only one past sample is used in the predictor, the predictor is a single-tap predictor. Typically one to three taps are used. The transfer function of a long-term ("pitch") filter 207 incorporating a single-tap long-term predictor is given by (1.1):

$B(z) = \frac{1}{1 - \beta z^{-L}} \qquad (1.1)$

B(z) is characterized by two quantities, L and β. L is called the "lag". For voiced speech, L would typically be the pitch period or a multiple of it. L may also be a non-integer value. If L is a non-integer, an interpolating finite impulse response (FIR) filter is used to generate the fractionally delayed samples. β is the long-term (or "pitch") predictor coefficient.
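In the time domain, (1.1) corresponds to the difference equation y(n) = x(n) + β·y(n − L). The sketch below shows this for an integer lag only, with the filter state assumed to start at zero. The function name is illustrative, and the fractional-lag case with an interpolating FIR filter is not covered.

```python
def long_term_filter(x, lag, beta):
    """Single-tap long-term (pitch) synthesis filter, B(z) = 1 / (1 - beta * z^-lag).

    Assumptions of this sketch: integer lag >= 1 and zero initial filter state.
    """
    y = []
    for n, sample in enumerate(x):
        past = y[n - lag] if n >= lag else 0.0  # output from `lag` samples ago
        y.append(sample + beta * past)
    return y


# A unit impulse is repeated every `lag` samples, scaled by beta each time.
impulse_response = long_term_filter([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], lag=2, beta=0.5)
print(impulse_response)  # [1.0, 0.0, 0.5, 0.0, 0.25, 0.0]
```

The repeating, decaying impulse is the pitch periodicity that the long-term filter inserts into the excitation.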

The short-term filter 209 incorporates short-term predictor coefficients α_i, which attempt to predict the next output sample from the preceding Np output samples. Np typically ranges from 8 to 12. In the preferred embodiment, Np equals 10. The short-term filter 209 is equivalent to the traditional LPC synthesis filter. The transfer function of the short-term filter 209 is given by (1.2):

$A(z) = \frac{1}{1 - \sum_{i=1}^{N_p} \alpha_i z^{-i}} \qquad (1.2)$

The short-term filter 209 is characterized by the α_i parameters, which are the direct-form filter coefficients of the all-pole "synthesis" filter. Details concerning the α_i parameters are given below.
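Read as a difference equation, (1.2) says y(n) = x(n) + Σ α_i·y(n − i). The following is a minimal direct-form sketch with an illustrative function name, assuming zero initial state. A real coder would carry the filter state across subframes.

```python
def short_term_synthesis(x, alpha):
    """All-pole synthesis filter A(z) = 1 / (1 - sum_i alpha[i-1] * z^-i).

    Assumption of this sketch: the filter state is zero before the first sample.
    """
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a * y[n - i]  # feedback from previous output samples
        y.append(acc)
    return y


# First-order example: the impulse response decays geometrically.
response = short_term_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])
print(response)  # [1.0, 0.5, 0.25, 0.125]
```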

The various parameters (code, gain, filter parameters) are not all transmitted to the synthesizer (speech decoder) at the same rate. Typically the short-term parameters are updated less often than the code. We define the short-term parameter update rate as the "frame rate" and the interval between updates as a "frame". The code update rate is determined by the vector length N. We define the code update rate as the "subframe rate" and the code update interval as a "subframe". A frame is usually composed of an integer number of subframes. The gain and long-term parameters may be updated at the subframe rate, the frame rate, or some rate in between, depending on the speech coder design.

The codebook search procedure consists of trying each codevector as a possible excitation for the CELP synthesizer. The synthesized speech s′(n) is compared with the input speech s(n) at comparator 211, producing a difference signal e_i(n). This difference signal is filtered by a spectral weighting filter W(z) 213 (and possibly a second weighting filter C(z)) to produce a weighted error signal e′(n). The power in e′(n) is computed at the energy calculator 215. The codevector that produces the minimum weighted error power is chosen as the codevector for that subframe. The spectral weighting filter 213 weights the error spectrum on the basis of perceptual considerations. This weighting filter 213 is a function of the speech spectrum and can be expressed in terms of the α parameters of the short-term (spectral) filter 209:

$W(z) = \frac{1 - \sum_{i=1}^{N_p} \alpha_i z^{-i}}{1 - \sum_{i=1}^{N_p} \tilde{\alpha}_i z^{-i}} \qquad (1.3)$

There are two approaches to calculating the gain γ. The gain can be determined before the codebook search, based on residual energy, and then held fixed for the search. Alternatively, the gain can be optimized for each codevector during the codebook search. The codevector that yields the minimum weighted error is chosen, and its corresponding optimal gain is used for γ. Because the gain is optimized for each codevector, the latter approach generally yields better results. It also implies that the gain term must be updated at the subframe rate. The optimal code and gain for this technique can be computed as follows:

1. Compute y(n), the weighted input signal, for the subframe.

2. Compute d(n), the zero-input response of the B(z) and W(z) (and C(z), if used) filters for the subframe. (The zero-input response is the response of the filters with no input, that is, the decay of the filter states.)

3. Compute p(n) = y(n) − d(n) over the subframe (0 ≤ n ≤ N − 1).

4. For each code i:

a. Compute g_i(n), the zero-state response of B(z) and W(z) (and C(z), if used) to codevector i. (The zero-state response is the filter output with the initial filter states set to zero.)

b. Compute

$C_i = \sum_{n=0}^{N-1} g_i(n)\, p(n) \qquad (1.5)$

that is, the cross-correlation between the filtered codevector i and p(n).

c. Compute

$G_i = \sum_{n=0}^{N-1} \left( g_i(n) \right)^2 \qquad (1.6)$

that is, the power in the filtered codevector i.

5. Choose the i that maximizes

$\frac{(C_i)^2}{G_i} \qquad (1.7)$

6. Using the chosen codeword and its corresponding quantized gain, update the filter states of the B(z) and W(z) (and C(z), if used) filters, so that they hold the same states the synthesizer will have at the start of the next subframe for step 2.

The optimal gain for codevector i is given by (1.8):

$\gamma_i = \frac{C_i}{G_i} \qquad (1.8)$

The total weighted error for codevector i using the optimal gain γ_i is given by (1.9):

$E_i = \left( \sum_{n=0}^{N-1} p^2(n) \right) - \frac{(C_i)^2}{G_i} \qquad (1.9)$
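Steps 4 and 5, together with (1.8) and (1.9), amount to a small search loop. The sketch below assumes the filtered codevectors g_i(n) and the target p(n) have already been computed by the filtering steps, which are outside its scope. The function and variable names are illustrative. A real VSELP coder exploits the codebook structure instead of visiting every codevector independently.

```python
def search_codebook(filtered_codevectors, p):
    """Return (index, gain, error) of the codevector maximizing C_i**2 / G_i.

    filtered_codevectors[i] is g_i(n), the zero-state filter response to
    codevector i; p is the target p(n) = y(n) - d(n).
    """
    target_energy = sum(v * v for v in p)
    best = None
    for i, g in enumerate(filtered_codevectors):
        c = sum(gn * pn for gn, pn in zip(g, p))  # cross-correlation, (1.5)
        energy = sum(gn * gn for gn in g)         # codevector power, (1.6)
        if energy == 0.0:
            continue                              # skip an all-zero codevector
        score = c * c / energy                    # quantity maximized in (1.7)
        if best is None or score > best[0]:
            gain = c / energy                     # optimal gain, (1.8)
            error = target_energy - score         # weighted error, (1.9)
            best = (score, i, gain, error)
    if best is None:
        raise ValueError("no codevector with non-zero energy")
    return best[1], best[2], best[3]


# The second codevector is an exact scaled copy of the target, so its error is zero.
index, gain, error = search_codebook(
    [[1.0, 0.0], [0.5, 1.0], [-1.0, 1.0]], [1.0, 2.0]
)
print(index, gain, error)  # 1 2.0 0.0
```

Because the first term of (1.9) does not depend on i, minimizing the error is the same as maximizing (1.7), which is why the loop only compares scores.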

The short-term predictor parameters are the α_i's of the short-term filter 209 of FIG. 2. These are standard LPC direct-form filter coefficients, and any number of LPC analysis techniques can be used to determine them. In the preferred embodiment, a fast fixed-point covariance lattice algorithm (FLAT) is used. FLAT has all the advantages of lattice algorithms, including guaranteed filter stability, non-windowed analysis, and the ability to quantize the reflection coefficients within the recursion. In addition, FLAT is numerically robust and can be implemented easily on a fixed-point processor.

The short-term predictor parameters are computed from the input speech. No pre-emphasis is used. The analysis length used for computing the parameters is 170 samples (N_A = 170). The order of the predictor is 10 (Np = 10).

This section describes the FLAT algorithm in detail. Let the samples of the input speech that fall in the analysis interval be represented by s(n), 0 ≤ n ≤ N_A − 1. Because FLAT is a lattice algorithm, the technique can be viewed as building an optimum inverse lattice filter (one that minimizes the residual energy) stage by stage. Defining b_j(n) as the backward residual out of stage j of the inverse lattice filter and f_j(n) as the forward residual out of stage j of the inverse lattice filter, we can define

$F_j(i,k) = \sum_{n=N_p}^{N_A-1} f_j(n-i)\, f_j(n-k) \qquad (2.1)$

the autocorrelation of f_j(n);

$B_j(i,k) = \sum_{n=N_p}^{N_A-1} b_j(n-i-1)\, b_j(n-k-1) \qquad (2.2)$

the autocorrelation of b_j(n − 1); and

$C_j(i,k) = \sum_{n=N_p}^{N_A-1} f_j(n-i)\, b_j(n-k-1) \qquad (2.3)$

the cross-correlation between f_j(n) and b_j(n − 1).

Let r_j represent the reflection coefficient for stage j of the inverse lattice. Then

$F_j(i,k) = F_{j-1}(i,k) + r_j \left( C_{j-1}(i,k) + C_{j-1}(k,i) \right) + r_j^2 B_{j-1}(i,k) \qquad (2.4)$

and

$B_j(i,k) = B_{j-1}(i+1,k+1) + r_j \left( C_{j-1}(i+1,k+1) + C_{j-1}(k+1,i+1) \right) + r_j^2 F_{j-1}(i+1,k+1) \qquad (2.5)$

and

$C_j(i,k) = C_{j-1}(i,k+1) + r_j \left( B_{j-1}(i,k+1) + F_{j-1}(i,k+1) \right) + r_j^2 C_{j-1}(k+1,i) \qquad (2.6)$

The formula chosen for determining r_j can be expressed as

$r_j = -2\, \frac{C_{j-1}(0,0) + C_{j-1}(N_p-j,\, N_p-j)}{F_{j-1}(0,0) + B_{j-1}(0,0) + F_{j-1}(N_p-j,\, N_p-j) + B_{j-1}(N_p-j,\, N_p-j)} \qquad (2.7)$

The FLAT algorithm can now be stated as follows:

1. First compute the covariance (autocorrelation) matrix from the input speech:

$\phi(i,k) = \sum_{n=N_p}^{N_A-1} s(n-i)\, s(n-k), \quad 0 \leq i,k \leq N_p \qquad (2.8)$

2. Set

F_0(i,k) = φ(i,k), 0 ≤ i,k ≤ N_p − 1 (2.9)

B_0(i,k) = φ(i+1,k+1), 0 ≤ i,k ≤ N_p − 1 (2.10)

C_0(i,k) = φ(i,k+1), 0 ≤ i,k ≤ N_p − 1 (2.11)

3. Set j = 1.

4. Compute r_j using (2.7).

5. If j = N_p, then done.

6. Compute F_j(i,k), 0 ≤ i,k ≤ N_p − j − 1, using (2.4).

Compute B_j(i,k), 0 ≤ i,k ≤ N_p − j − 1, using (2.5).

Compute C_j(i,k), 0 ≤ i,k ≤ N_p − j − 1, using (2.6).

7. Set j = j + 1; go to step 4.
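The seven steps above can be sketched in floating point as follows. This is an illustration of the recursion only: the name `flat_reflection_coeffs` is hypothetical, the fixed-point arithmetic that gives FLAT its name is not modeled, windowing of φ is not applied, and the cross term in the B update is taken as C(k+1, i+1), by symmetry with (2.4).

```python
def flat_reflection_coeffs(s, order):
    """Compute reflection coefficients r_1..r_order from samples s via (2.4)-(2.11).

    Assumes len(s) > order and a signal with non-zero energy.
    """
    n_a = len(s)

    def phi(i, k):  # covariance matrix entry, (2.8)
        return sum(s[n - i] * s[n - k] for n in range(order, n_a))

    size = order
    F = [[phi(i, k) for k in range(size)] for i in range(size)]          # (2.9)
    B = [[phi(i + 1, k + 1) for k in range(size)] for i in range(size)]  # (2.10)
    C = [[phi(i, k + 1) for k in range(size)] for i in range(size)]      # (2.11)

    r = []
    for j in range(1, order + 1):
        m = order - j
        num = C[0][0] + C[m][m]
        den = F[0][0] + B[0][0] + F[m][m] + B[m][m]
        rj = -2.0 * num / den                                            # (2.7)
        r.append(rj)
        if j == order:
            break
        size = order - j  # arrays shrink by one row and column per stage
        F_new = [[F[i][k] + rj * (C[i][k] + C[k][i]) + rj * rj * B[i][k]
                  for k in range(size)] for i in range(size)]            # (2.4)
        B_new = [[B[i + 1][k + 1]
                  + rj * (C[i + 1][k + 1] + C[k + 1][i + 1])
                  + rj * rj * F[i + 1][k + 1]
                  for k in range(size)] for i in range(size)]            # (2.5)
        C_new = [[C[i][k + 1] + rj * (B[i][k + 1] + F[i][k + 1])
                  + rj * rj * C[k + 1][i]
                  for k in range(size)] for i in range(size)]            # (2.6)
        F, B, C = F_new, B_new, C_new
    return r


# Deterministic test signal; every reflection coefficient stays within [-1, 1].
signal = [float((n * 37) % 17 - 8) for n in range(40)]
coeffs = flat_reflection_coeffs(signal, 4)
```

Because (2.7) divides twice a cross-correlation by the sum of the matching forward and backward energies, each r_j is bounded by 1 in magnitude, which is the source of the stability guarantee mentioned earlier.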

Before solving for the reflection coefficients, the φ array is modified by windowing the autocorrelation functions:

φ′(i,k) = φ(i,k) w(|i−k|) (2.12)

Windowing the autocorrelation function before computing the reflection coefficients is known as spectral smoothing (SST).

From the reflection coefficients r_j, the short-term LPC predictor coefficients α_i can be computed.
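The text does not spell out this conversion. One common way to do it is the standard step-up recursion, sketched below under two assumptions: the lattice convention is f_j(n) = f_{j−1}(n) + r_j·b_{j−1}(n − 1), as implied by (2.4), and the synthesis filter is written as in (1.2), so the predictor coefficients are the negated inverse-filter coefficients. The function name is illustrative.

```python
def reflection_to_lpc(r):
    """Convert reflection coefficients r_1..r_Np to direct-form alpha_1..alpha_Np.

    Step-up recursion on the inverse filter 1 + sum_i a_i z^-i:
        a_i(j) = a_i(j-1) + r_j * a_{j-i}(j-1),   a_j(j) = r_j
    With A(z) as in (1.2), alpha_i = -a_i. Sign conventions are an assumption
    of this sketch.
    """
    a = []
    for rj in r:
        # Update the existing coefficients, then append the new highest-order one.
        a = [a[i] + rj * a[len(a) - 1 - i] for i in range(len(a))] + [rj]
    return [-c for c in a]


print(reflection_to_lpc([-0.5]))  # [0.5]
```

For a first-order predictor with r_1 = −0.5 this gives α_1 = 0.5, meaning each sample is predicted as half of the previous one.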

A 28-bit three-segment vector quantizer of the reflection coefficients is used. The segments of the vector quantizer span reflection coefficients r₁ to r₃, r₄ to r₆, and r₇ to r₁₀, respectively. The bit allocations for the vector quantizer segments are:

Q₁ 11 bits

Q₂ 9 bits

Q₃ 8 bits

To avoid the computational complexity of an exhaustive vector quantizer search, a reflection coefficient vector prequantizer is used at each segment. The prequantizer size at each segment is:

P₁ 6 bits

P₂ 5 bits

P₃ 4 bits

At a given segment, the residual error due to each vector from the prequantizer is computed and stored in temporary memory. This list is searched to identify the four prequantizer vectors with the lowest distortion. The index of each selected prequantizer vector is used to calculate an offset into the vector quantizer table, at which the contiguous subset of quantizer vectors associated with that prequantizer vector begins. The size of each vector quantizer subset at the k-th segment is given by:

$S_k = \frac{2^{Q_k}}{2^{P_k}} \qquad (2.13)$

The four subsets of quantizer vectors associated with the selected prequantizer vectors are searched for the quantizer vector that yields the lowest residual error. Thus at the first segment 64 prequantizer vectors and 128 quantizer vectors are evaluated, at the second segment 32 prequantizer vectors and 64 quantizer vectors, and at the third segment 16 prequantizer vectors and 64 quantizer vectors. The optimal reflection coefficients, computed via the FLAT technique with bandwidth expansion as described above, are converted to an autocorrelation vector before vector quantization.
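The search counts quoted above follow directly from (2.13) and the bit allocations. The short sketch below reproduces them; the names are illustrative, and the assumption that vector quantizer subsets are laid out contiguously in prequantizer-index order comes from the description of the table offset.

```python
Q_BITS = [11, 9, 8]  # quantizer bits per segment (Q1, Q2, Q3)
P_BITS = [6, 5, 4]   # prequantizer bits per segment (P1, P2, P3)
SURVIVORS = 4        # prequantizer vectors kept per segment


def search_counts(q_bits, p_bits, survivors):
    """Per segment: (prequantizer vectors, subset size S_k, quantizer vectors searched)."""
    counts = []
    for q, p in zip(q_bits, p_bits):
        subset_size = 2 ** q // 2 ** p  # S_k from (2.13)
        counts.append((2 ** p, subset_size, survivors * subset_size))
    return counts


def subset_offset(prequantizer_index, subset_size):
    """Offset of the first quantizer vector tied to a prequantizer vector."""
    return prequantizer_index * subset_size


counts = search_counts(Q_BITS, P_BITS, SURVIVORS)
print(counts)  # [(64, 32, 128), (32, 16, 64), (16, 16, 64)]
```

At the first segment, for example, 64 + 128 = 192 vectors are evaluated instead of the 2048 an exhaustive search of the 11-bit codebook would require.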

FLAT算法的自相关方案AFLAT是被用来计算正在被估算的反射系数矢量的残余误差能量。象FLAT一样，当计算最佳反射系数或从当前段的矢量量化器中选择反射系数时，这个算法具有部分地补偿来自以前点阵级中反射系数量化误差的能力。这个改善对于具有高反射系数量化失真的帧是很重要的。现在描述AFLAT算法(在带有预量化器的多段矢量量化器的范围内)：The autocorrelation scheme AFLAT of the FLAT algorithm is used to calculate the residual error energy of the reflection coefficient vector being estimated. Like FLAT, this algorithm has the ability to partially compensate reflection coefficient quantization errors from previous lattice stages when calculating the best reflection coefficient or selecting reflection coefficients from the current stage's vector quantizer. This improvement is important for frames with high reflectance quantization distortion. Now describe the AFLAT algorithm (in the context of a multi-segment vector quantizer with prequantizer):

在0≤i≤Np范围内，根据最佳反射系数计算自相关序列R(i)。或者，可根据其它的LPC参量表示(如直接形式LPC预测器系数α_i′)来计算自相关序列，或直接根据输入语音来计算。In the range of 0≤i≤Np, the autocorrelation sequence R(i) is calculated according to the optimal reflection coefficient. Alternatively, the autocorrelation sequence can be calculated according to other LPC parameter representations (such as direct form LPC predictor coefficients α _i '), or directly from the input speech.

定义AFLAT循环的起始条件：Define the starting conditions for the AFLAT loop:

P₀(i)＝R(i)，0≤i≤N_p-1 (2.14)P ₀ (i)=R(i), 0≤i≤N _p -1 (2.14)

V₀(i)＝R(|i+1|)，1-N_p≤i≤N_p-1 (2.15)V ₀ (i)=R(|i+1|), 1-N _p ≤ i ≤ N _p -1 (2.15)

初始化矢量量化器段指数k：Initialize the vector quantizer segment index k:

k＝1 (2.16)令I_l(k)是第k段中第一点阵级的指数，I_h(k)是第k段的最后点阵级的指数。在第k段估算来自点阵级I_h(k)的残余误差的循环，已知r，来自预量化器的反射系数矢量或来自量化器的反射系数矢量表示如下。k = 1 (2.16) Let I _l (k) be the index of the first lattice stage in the kth segment and _Ih (k) be the index of the last lattice stage in the kth segment. The cycle of estimating the residual error from the lattice level _Ih (k) at segment k, knowing r, the reflection coefficient vector from the prequantizer or the reflection coefficient vector from the quantizer is expressed as follows.

Initialize $j$, the lattice stage index, to point to the beginning of the $k$-th segment:

$j = I_l(k) \qquad (2.17)$

Set the initial conditions $P_{j-1}$ and $V_{j-1}$ to:

$P_{j-1}(i) = \bar{P}_{j-1}(i), \quad 0 \le i \le I_h(k) - I_l(k) + 1 \qquad (2.18)$

$V_{j-1}(i) = \bar{V}_{j-1}(i), \quad -I_h(k) + I_l(k) - 1 \le i \le I_h(k) - I_l(k) + 1 \qquad (2.19)$

Compute the values of $V_j$ and $P_j$ using:

$P_j(i) = (1 + \hat{r}_j^2) P_{j-1}(i) + \hat{r}_j [V_{j-1}(i) + V_{j-1}(-i)], \quad 0 \le i \le I_h(k) - j \qquad (2.20)$

$V_j(i) = V_{j-1}(i+1) + \hat{r}_j^2 V_{j-1}(-i-1) + 2 \hat{r}_j P_{j-1}(|i+1|), \quad j - I_h(k) \le i \le I_h(k) - j \qquad (2.21)$

Increment $j$:

$j = j + 1 \qquad (2.22)$

If $j \le I_h(k)$, go to (2.20).

Given the reflection coefficient vector $\hat{r}$, the residual error out of lattice stage $I_h(k)$ is given by:

$E_r = P_{I_h(k)}(0) \qquad (2.23)$
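The recursion (2.17) to (2.23) for one segment can be sketched as below. It is an illustration under stated assumptions: $P$ and $V$ are stored as dictionaries keyed by $i$, the function name is hypothetical, and the loop keeps only the indices that $P_{I_h(k)}(0)$ actually depends on, which is an implementation choice of this sketch.

```python
def aflat_segment_residual(P_prev, V_prev, r_hat):
    """Residual error E_r for one candidate vector r_hat of a segment.

    P_prev, V_prev -- dicts i -> value holding the conditions entering the
                      segment; for a segment of L stages the sketch needs
                      P(0..L-1) and V(-(L-1)..L-1)
    r_hat          -- the L candidate reflection coefficients of the segment
    """
    P, V = P_prev, V_prev
    L = len(r_hat)
    for m, r in enumerate(r_hat, start=1):
        n = L - m - 1   # largest |i| still needed after this stage
        # (2.20): update P for the indices later stages will read.
        P_next = {i: (1 + r * r) * P[i] + r * (V[i] + V[-i])
                  for i in range(max(n, 0) + 1)}
        # (2.21): update V; the range is empty at the last stage.
        V_next = {i: V[i + 1] + r * r * V[-i - 1] + 2 * r * P[abs(i + 1)]
                  for i in range(-n, n + 1)}
        P, V = P_next, V_next
    return P[0]         # (2.23)
```

As a check, take $R = (1, 0.5, 0.1)$, so that $P_0 = \{1, 0.5\}$ and $V_0(-1), V_0(0), V_0(1) = 1, 0.5, 0.1$. The optimal coefficients for this sequence, under the sign convention implied by (2.20), are $(-0.5, 0.2)$, and the sketch returns $0.72 = R(0)(1 - 0.5^2)(1 - 0.2^2)$. Any other candidate returns a larger value.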

Using the AFLAT recursion outlined above, the residual error due to each vector from the prequantizer at the $k$-th segment is evaluated, the 4 subsets of quantizer vectors to search are identified, and the residual error due to each quantizer vector from the selected 4 subsets is computed. The index of $\tilde{r}$, the quantizer vector that minimizes $E_r$ over all quantizer vectors in the 4 subsets, is encoded with $Q_k$ bits. If $k < 3$, the initial conditions for the recursion at segment $k+1$ need to be computed. Set $j$, the lattice stage index, equal to:

$j = I_l(k) \qquad (2.24)$

Compute:

$\bar{P}_j(i) = (1 + \tilde{r}_j^2) \bar{P}_{j-1}(i) + \tilde{r}_j [\bar{V}_{j-1}(i) + \bar{V}_{j-1}(-i)], \quad 0 \le i \le N_p - j - 1 \qquad (2.25)$

$\bar{V}_j(i) = \bar{V}_{j-1}(i+1) + \tilde{r}_j^2 \bar{V}_{j-1}(-i-1) + 2 \tilde{r}_j \bar{P}_{j-1}(|i+1|), \quad j - N_p + 1 \le i \le N_p - j - 1 \qquad (2.26)$

Increment $j$:

$j = j + 1 \qquad (2.27)$

If $j \le I_h(k)$, go to (2.25).

Increment $k$, the vector quantizer segment index:

$k = k + 1 \qquad (2.28)$

If $k \le 3$, go to (2.17). Otherwise, the indices of the reflection coefficient vectors for the three segments have been chosen, and the search of the reflection coefficient vector quantizer is terminated.
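The complete segment-by-segment search, steps (2.14) to (2.28), can be sketched as follows. This is a simplified illustration: it searches each segment's codebook exhaustively and omits the prequantizer step, it stores $P$ and $V$ as dictionaries keyed by $i$, and all function names are hypothetical. The segment lengths are taken from the lengths of the codebook vectors and are assumed to sum to $N_p$.

```python
def lattice_stage(P, V, r, n):
    """Apply one lattice stage, keeping P(0..max(n,0)) and V(-n..n)."""
    P_new = {i: (1 + r * r) * P[i] + r * (V[i] + V[-i])
             for i in range(max(n, 0) + 1)}
    V_new = {i: V[i + 1] + r * r * V[-i - 1] + 2 * r * P[abs(i + 1)]
             for i in range(-n, n + 1)}
    return P_new, V_new


def segment_residual(P, V, r_vec):
    """E_r of (2.23) for one candidate vector, via (2.20) and (2.21)."""
    L = len(r_vec)
    for m, r in enumerate(r_vec, start=1):
        P, V = lattice_stage(P, V, r, L - m - 1)
    return P[0]


def search_segments(R, codebooks):
    """Pick one vector per segment; return (chosen indices, final E_r)."""
    Np = len(R) - 1
    P_bar = {i: R[i] for i in range(Np)}                    # (2.14)
    V_bar = {i: R[abs(i + 1)] for i in range(1 - Np, Np)}   # (2.15)
    chosen, residual, j = [], R[0], 1
    for k, codebook in enumerate(codebooks, start=1):
        errors = [segment_residual(P_bar, V_bar, r_vec) for r_vec in codebook]
        best = min(range(len(codebook)), key=errors.__getitem__)
        chosen.append(best)
        residual = errors[best]
        if k < len(codebooks):
            # (2.24)-(2.27): carry the chosen vector's effect forward so the
            # next segment sees the quantization error made in this one.
            for r in codebook[best]:
                P_bar, V_bar = lattice_stage(P_bar, V_bar, r, Np - j - 1)
                j += 1
    return chosen, residual
```

Because $\bar{P}$ and $\bar{V}$ are propagated with the quantized coefficients $\tilde{r}$ rather than the optimal ones, each later segment is evaluated against the error already committed. This is the compensation property the text attributes to AFLAT.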

To minimize the storage requirements of the reflection coefficient vector quantizer, eight-bit codes for the individual reflection coefficients are stored in the vector quantizer table instead of the actual reflection coefficient values. The codes are used to look up the reflection coefficient values in a scalar quantization table with 256 entries. The eight-bit codes represent reflection coefficient values obtained by uniformly sampling the arcsine function shown in Figure 3. Reflection coefficient values vary from -1 to +1. The non-linear spacing in the reflection coefficient domain (X-axis) provides more precision when the values are near the extremes of +/-1 and less precision when the values are near 0. For 256 quantization levels, this reduces the spectral distortion due to scalar quantization of the reflection coefficients, compared with uniform sampling in the reflection coefficient domain.
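A 256-entry arcsine-spaced table of this kind can be sketched as below. The exact sampling grid is not given in the text, so the sketch assumes the angle $\arcsin(r)$ is sampled at the midpoints of 256 equal cells spanning $(-\pi/2, \pi/2)$. The names `TABLE`, `encode` and `decode` are hypothetical.

```python
import math

N_LEVELS = 256  # eight-bit codes

# ASSUMED grid: midpoints of 256 equal cells in the arcsine domain.
TABLE = [math.sin(math.pi * ((c + 0.5) / N_LEVELS - 0.5))
         for c in range(N_LEVELS)]


def encode(r):
    """Eight-bit code of the table entry nearest to r in the arcsine domain."""
    y = math.asin(max(-1.0, min(1.0, r)))
    c = int((y / math.pi + 0.5) * N_LEVELS)
    return max(0, min(N_LEVELS - 1, c))


def decode(code):
    """Look up the reflection coefficient value for an eight-bit code."""
    return TABLE[code]
```

A vector quantizer entry then needs one byte per coefficient, and `decode` is called only when the value itself is required. Adjacent table entries are much closer together near +/-1 than near 0, which is the non-linear spacing described above.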

Claims

1. A speech coding method, comprising steps:

(a) constructing an excitation codebook of 2^M code vectors using M basis vectors, where M is a natural number;

(b) receiving input speech;

(c) in response to the input speech, calculating a reflection coefficient value corresponding to a speech parameter representing the input speech;

(d) storing 2^N reflection coefficient values in a table, each reflection coefficient value being addressed by an N-bit code, where N is a natural number;

(e) processing the code vectors to produce a synthesized speech;

(f) selecting a code vector from the excitation codebook, wherein the code vector minimizes the weighted error of the synthesized speech relative to the input speech, comprising:

(f1) when a reflection coefficient value is needed during processing, providing the corresponding N-bit code to the table to look up the reflection coefficient value,

(f2) when the reflection coefficient value is not required during processing, storing only the N-bit code, thereby minimizing the storage requirements for the reflection coefficient values.

2. A method according to claim 1, wherein reflection coefficient values are stored in a vector quantizer of the speech coder, and the reflection coefficient values are scaled non-linearly.

3. A method according to claim 1, wherein the reflection coefficient values are stored in a vector quantizer of the speech coder, and the reflection coefficient values are arcsine-scaled between -1 and +1.

4. A method according to claim 1, wherein reflection coefficient values are stored in a vector quantizer of a speech coder, and said N is equal to eight.

5. A speech coder comprising:

a codebook generator that generates an excitation codebook with 2^M code vectors formed using M basis vectors, where M is a natural number;

an input device that receives input speech and generates a data vector;

encoding means coupled to the input means to generate reflection coefficients corresponding to speech parameters representing the input speech, and the encoding means to process the code vectors to generate synthesized speech;

a vector quantizer for quantizing the reflection coefficients, the vector quantizer comprising a vector quantizer memory configured to store 2^N reflection coefficient values and having an N-bit input and an output, the vector quantizer memory providing one of the 2^N reflection coefficient values at the output in response to an N-bit address received at the N-bit input, where N is a natural number;

a codebook search controller coupled to the codebook generator for selecting a code vector from the excitation codebook such that the weighted error between the synthesized speech and the data vector is minimized, the codebook search controller being coupled to the vector quantizer, wherein, when a reflection coefficient value is required for processing, the codebook search controller provides the corresponding N-bit code to the vector quantizer to look up the reflection coefficient value, and when a reflection coefficient value is not required for processing, the codebook search controller stores only the N-bit code.

6. A speech encoder according to claim 5, wherein each reflection coefficient value is associated with a corresponding N-bit address by an arcsine scaling function.